-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-14104][PYSPARK][ML] All Python param setters should use the _set method
#11939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #54062 has finished for PR 11939 at commit
|
python/pyspark/ml/param/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a previous PR a parameter was given an incorrect type converter, and this was not caught by the tests. Enforcing _setDefault to use the type converter for the param will ensure that all params with default values cannot be given incompatible type converters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
|
Test build #54068 has finished for PR 11939 at commit
|
python/pyspark/ml/feature.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change to _setDefault, I had to change this default to be a list instead of JavaObject. The other option would be to have type converters do nothing if they encounter JavaObjects. It is nice to leave stop words as a JavaObject if they are never accessed explicitly on the Python side. Would appreciate thoughts on this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we ever explicitly access them on the Python side - although a users application might attempt to do that and append stop words to the existing list in which case having it as a list is maybe good. One could get a similar effect by changing getStopWords without having to round trip the list in cases where it isn't ever accessed on the python side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is simple to make this change, so I think it's a good idea. This will help in the future for similar cases or if the list of stopwords grows even larger. I changed getStopWords to return a list always which is better for users, I think. Thanks for the suggestion!
|
cc @holdenk @jkbradley Could you take a look whenever you get a chance? |
|
Test build #54711 has finished for PR 11939 at commit
|
|
Would be good to update to master so can run the latest tests, but at its current point it seems to have gotten all of the direct paramMap sets (although there may be more in master now). It might make sense to also add a clearParam function and then add a note that no one (including developers) should directly access the param map but instead use one of the access functions? |
|
@holdenk I added a |
|
Test build #55767 has finished for PR 11939 at commit
|
|
I'll take a look now |
python/pyspark/ml/evaluation.py
Outdated
| metricName = Param(Params._dummy(), "metricName", | ||
| "metric name in evaluation (areaUnderROC|areaUnderPR)") | ||
| "metric name in evaluation (areaUnderROC|areaUnderPR)", | ||
| TypeConverters.toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specify typeConverter as a keyword arg (here and elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Test build #55930 has finished for PR 11939 at commit
|
|
Thanks for the updates. I just sent #12422 Could you please take a look at it? |
|
Test build #2792 has finished for PR 11939 at commit
|
|
LGTM |
…set` method ## What changes were proposed in this pull request? Param setters in python previously accessed the _paramMap directly to update values. The `_set` method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to `_paramMap` besides the one in the `_set` method to ensure type checking happens. Additional changes: * [SPARK-13068](apache#11663) missed adding type converters in evaluation.py so those are done here * An incorrect `toBoolean` type converter was used for StringIndexer `handleInvalid` param in previous PR. This is fixed here. ## How was this patch tested? Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR. Author: sethah <[email protected]> Closes apache#11939 from sethah/SPARK-14104.
… method ## What changes were proposed in this pull request? #11939 make Python param setters use the `_set` method. This PR fix omissive ones. ## How was this patch tested? Existing tests. cc jkbradley sethah Author: Yanbo Liang <[email protected]> Closes #12531 from yanboliang/setters-omissive.
What changes were proposed in this pull request?
Param setters in python previously accessed the _paramMap directly to update values. The
_setmethod now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to_paramMapbesides the one in the_setmethod to ensure type checking happens.Additional changes:
toBooleantype converter was used for StringIndexerhandleInvalidparam in previous PR. This is fixed here.How was this patch tested?
Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.