[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug #20566
Test build #87283 has finished for PR 20566 at commit
mgaido91 left a comment
I am just wondering whether we should persist the default params too (in case they are changed across multiple versions) but in a separate section. WDYT?
can't we just make paramMap private[ml]?
Either way is good for me.
In this way I think you can also avoid the MiMa failure...
Looks like it still can't avoid the MiMa failure.
@mgaido91 I also considered the issue of default values changing across versions. I'm not sure which is more reasonable: using the old version's default value or the current version's default value.
@viirya that's a good question. Honestly, my view is that if the user doesn't set a value, they don't care about it, so IMHO it is fine to use the new version's default. But it is also true that changing a default may cause unexpected behavior in user code. So it LGTM, but I'd like to hear others' opinions on this too.
Yeah, IMHO, when the user loads a model saved by an old version into a new version, it is reasonable to run it with the current default value, because the param was not explicitly set and should use the "default" value of the current system. Thanks for your comment. Let's wait for others' opinions.
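The two positions in this thread can be contrasted with a small, hypothetical Scala sketch. It assumes a loader that sees only the user-supplied params persisted by the old version, plus (for the second option) a separately saved section of defaults, as suggested earlier in the review. The param names tol and threshold and all of their values are placeholders for illustration, not actual Spark params or defaults.

```scala
// Hypothetical model of the trade-off discussed above; the names and numbers
// are illustrative placeholders, not Spark params or Spark defaults.
object DefaultResolutionSketch {

  // Defaults shipped by an imaginary "old" and "new" library version.
  val oldDefaults: Map[String, Double] = Map("tol" -> 1e-4, "threshold" -> 0.5)
  val newDefaults: Map[String, Double] = Map("tol" -> 1e-6, "threshold" -> 0.5)

  // What the old version persisted: only the param the user set explicitly.
  val savedUserParams: Map[String, Double] = Map("threshold" -> 0.7)

  // Option A (current-version defaults): an unset param falls back to the
  // default of the version that is loading the model.
  def resolveWithCurrentDefaults(name: String): Double =
    savedUserParams.getOrElse(name, newDefaults(name))

  // Option B (defaults persisted in a separate section): an unset param
  // falls back to the default recorded at save time.
  def resolveWithSavedDefaults(name: String, savedDefaults: Map[String, Double]): Double =
    savedUserParams.getOrElse(name, savedDefaults(name))

  def main(args: Array[String]): Unit = {
    println(resolveWithCurrentDefaults("threshold"))      // 0.7 (explicitly set)
    println(resolveWithCurrentDefaults("tol"))            // 1.0E-6 (new default)
    println(resolveWithSavedDefaults("tol", oldDefaults)) // 1.0E-4 (old default)
  }
}
```

Explicitly set params resolve identically under both options; they only diverge for params the user never touched, which is exactly the case where a changed default could alter behavior after an upgrade.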
Test build #87285 has finished for PR 20566 at commit
Test build #87289 has finished for PR 20566 at commit
I believe this will break persistence for LogisticRegression. I believe the issue is that the [...] I believe LinearRegression may have a similar issue. Our current tests don't seem to cover this kind of thing, so I think we should improve test coverage if we want to make this kind of change.
Not only
Test build #87299 has finished for PR 20566 at commit
Test build #87301 has finished for PR 20566 at commit
retest this please.
Test build #87302 has finished for PR 20566 at commit
Thanks for the patch @viirya
@jkbradley Thanks! I will post the problem and proposed design on the JIRA.
I'll close this in favor of the quick fix #20594, based on the discussion in JIRA. I will re-open it if it is needed later.
What changes were proposed in this pull request?
Since 2.3, Bucketizer supports multiple input/output columns. During transformation we check whether mutually exclusive params are set; e.g., if inputCols and outputCol are both set, an error is thrown.

However, when we write a Bucketizer, the default params and the user-supplied params appear to be merged. All saved params are loaded back and set on the newly created model instance, so the default outputCol param from the HasOutputCol trait ends up in paramMap and becomes a user-supplied param. This makes the exclusive-params check fail.

This patch changes DefaultParamsWriter to save only user-supplied params.

The multi-column QuantileDiscretizer has the same issue.

How was this patch tested?
Modified test.
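The failure mode in the PR description can be illustrated with a toy Scala model. This is a hypothetical sketch, not Spark code: SimpleBucketizer, saveMerged, saveUserSupplied, load, passesCheck and the placeholder default value for outputCol are invented for illustration, and the exclusive-params check is simplified to "no single-column param may be explicitly set alongside a multi-column param". It shows how merging defaults into the saved params turns the default outputCol into a user-supplied param after loading, while saving only user-supplied params keeps the check passing.

```scala
import scala.util.Try

// Hypothetical, heavily simplified model of the persistence bug described above.
// None of these names are Spark APIs; they only mimic the behavior in the PR text.
object BucketizerPersistenceSketch {

  // Stand-in for a Params instance: user-supplied values live in paramMap,
  // defaults live separately in defaultParamMap.
  final case class SimpleBucketizer(
      paramMap: Map[String, Any] = Map.empty,
      // Placeholder default, playing the role of the HasOutputCol default.
      defaultParamMap: Map[String, Any] = Map("outputCol" -> "bucketizer__output")) {

    def set(name: String, value: Any): SimpleBucketizer =
      copy(paramMap = paramMap + (name -> value))

    // Only explicitly set params count; defaults are ignored here.
    def isSet(name: String): Boolean = paramMap.contains(name)

    // Simplified exclusive-params check run at transform time: explicitly
    // setting a single-column param together with a multi-column param fails.
    def checkExclusiveParams(): Unit = {
      val single = isSet("inputCol") || isSet("outputCol")
      val multi = isSet("inputCols") || isSet("outputCols")
      require(!(single && multi), "single- and multi-column params are both set")
    }
  }

  // Behavior before the patch: defaults are merged into what gets saved.
  def saveMerged(b: SimpleBucketizer): Map[String, Any] = b.defaultParamMap ++ b.paramMap

  // Behavior after the patch: only user-supplied params are saved.
  def saveUserSupplied(b: SimpleBucketizer): Map[String, Any] = b.paramMap

  // Loading: every saved entry is set on a fresh instance, so it becomes user-supplied.
  def load(saved: Map[String, Any]): SimpleBucketizer =
    saved.foldLeft(SimpleBucketizer()) { case (b, (k, v)) => b.set(k, v) }

  def passesCheck(b: SimpleBucketizer): Boolean = Try(b.checkExclusiveParams()).isSuccess

  // A multi-column bucketizer; outputCol is only present as a default.
  val original: SimpleBucketizer = SimpleBucketizer()
    .set("inputCols", Seq("a", "b"))
    .set("outputCols", Seq("a_bucket", "b_bucket"))

  def main(args: Array[String]): Unit = {
    println(passesCheck(original))                         // true
    println(passesCheck(load(saveMerged(original))))       // false: default outputCol is now "set"
    println(passesCheck(load(saveUserSupplied(original)))) // true
  }
}
```

A save/load round trip followed by the transform-time validation, as in main, exercises loaded instances rather than only freshly constructed ones, which is one possible shape for the extra persistence coverage raised in review.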