-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20762][ML]Make String Params Case-Insensitive #17995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #76962 has finished for PR 17995 at commit
|
|
Test build #76999 has finished for PR 17995 at commit
|
|
Test build #77004 has finished for PR 17995 at commit
|
|
Test build #77009 has started for PR 17995 at commit |
|
Test build #77005 has finished for PR 17995 at commit
|
|
Jenkins, retest this please |
|
Test build #77038 has finished for PR 17995 at commit
|
|
ping @yanboliang |
|
Test build #77568 has finished for PR 17995 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we organize as
val AreaUnderROC: String = "areaUnderROC".toLowerCase
val AreaUnderPR: String = "areaUnderPR".toLowerCase
val supportedMetricNames = Set(AreaUnderROC, AreaUnderPR)
in object BinaryClassificationEvaluator? This should be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supported selector types should always be stored as lower case, please update corresponding code snippet in mllib.feature.ChiSqSelector from:
private[spark] val NumTopFeatures: String = "numTopFeatures"
......
to
private[spark] val NumTopFeatures: String = "numTopFeatures".toLowerCase
......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we can't do this, since MLlib supports set params via other entrances. Currently we can leave as it is, until we resolved #16028.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it a good idea.
0db3a52 to
28941f3
Compare
|
@yanboliang I update this PR and revert changes on |
|
Test build #78607 has finished for PR 17995 at commit
|
|
Test build #78619 has finished for PR 17995 at commit
|
|
Test build #78623 has finished for PR 17995 at commit
|
|
Test build #78621 has finished for PR 17995 at commit
|
## What changes were proposed in this pull request? 1, make param support non-final with `finalFields` option 2, generate `HasSolver` with `finalFields = false` 3, override `solver` in LiR, GLR, and make MLPC inherit `HasSolver` ## How was this patch tested? existing tests Author: Ruifeng Zheng <[email protected]> Author: Zheng RuiFeng <[email protected]> Closes #16028 from zhengruifeng/param_non_final.
|
Test build #79060 has finished for PR 17995 at commit
|
|
Test build #79066 has finished for PR 17995 at commit
|
|
Test build #79065 has finished for PR 17995 at commit
|
|
Test build #79068 has finished for PR 17995 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
${getModelType} -> $getModelType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to keep the original style, since we may add new metric which is not comply with isLargerBetter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it private[BinaryClassificationEvaluator]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for string value in an allowed set of string values in a case-insensitive way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is useless and involves extra computing cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep the original output format for supported family names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NORMAL -> Normal
|
Test build #79258 has finished for PR 17995 at commit
|
|
Test build #79260 has finished for PR 17995 at commit
|
|
Test build #79262 has finished for PR 17995 at commit
|
What changes were proposed in this pull request?
Make String Params Case-Insensitive:
solver,modelType,initMode,metricName,handleInvalid,strategy,stringOrderType,coldStartStrategy,impurity,lossType,featureSubsetStrategy,intermediateStorageLevel,finalStorageLevelLeave alone
ChiSqSelector&StringIndexer, for they are not easy to handle like others.How was this patch tested?
existing tests and added tests