[SPARK-11217][ML] save/load for non-meta estimators and transformers #9454
It is not feasible to set an arbitrary param from outside the instance. This should be a compatible change.
Test build #44978 has finished for PR 9454 at commit
Test build #44980 has finished for PR 9454 at commit
Test build #45124 has finished for PR 9454 at commit
Reviewing now
"non-meta transformers and estimators" --> "transformers and estimators which contain basic (json4s-serializable) Params and no data. This will not handle more complex Params or types with data (e.g., models with coefficients)."
That's it; just minor comments.
@jkbradley I removed
Test build #45224 has finished for PR 9454 at commit
Test build #45225 has finished for PR 9454 at commit
Test build #45244 has finished for PR 9454 at commit
LGTM

I'll merge this into branch-1.6 and master.

The Spark merge script is not happy. Maybe a conflict was just introduced?

Never mind, second time's the charm.
This PR implements the default save/load for non-meta estimators and transformers using the JSON serialization of param values. The saved metadata includes:
* class name
* uid
* timestamp
* paramMap
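To illustrate what this metadata could look like once written out, here is a minimal sketch that renders the four listed fields as one JSON object. The `Metadata` case class, its field names, and the JSON key names are hypothetical, and param values are assumed to arrive already JSON-encoded (in line with the JSON serialization of param values described above); it is not Spark's actual writer.

~~~scala
// Hypothetical record of the saved metadata; key names are illustrative only.
final case class Metadata(
    className: String,
    uid: String,
    timestamp: Long,
    paramMap: Map[String, String]) // param name -> already JSON-encoded value

object Metadata {
  // Minimal JSON string escaping: quotes, backslashes and control characters.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case c if c < ' ' => sb.append("\\").append("u").append("%04x".format(c.toInt))
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  // Renders the metadata as a single JSON object, with params sorted by name.
  def toJson(m: Metadata): String = {
    val params = m.paramMap.toSeq.sortBy(_._1)
      .map { case (k, v) => quote(k) + ":" + v }
      .mkString("{", ",", "}")
    Seq(
      quote("class") + ":" + quote(m.className),
      quote("uid") + ":" + quote(m.uid),
      quote("timestamp") + ":" + m.timestamp,
      quote("paramMap") + ":" + params
    ).mkString("{", ",", "}")
  }
}
~~~

For example, a transformer with uid `myT_1` and a single `inputCol` param would be rendered as `{"class":"org.example.MyTransformer","uid":"myT_1","timestamp":42,"paramMap":{"inputCol":"features"}}`.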
The save/load interface is similar to that of DataFrames. We use the currently active context by default, which should be sufficient for most use cases.
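~~~scala
// Save using the currently active SQLContext
instance.save("path")
// Save with an explicit SQLContext, overwriting any existing output at "path"
instance.write.context(sqlContext).overwrite().save("path")
// Load through the companion object of the instance's class
Instance.load("path")
~~~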
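The fluent `write.context(...).overwrite().save(...)` shape can be sketched without Spark as a small builder. In this sketch `SimpleStore` is a hypothetical in-memory stand-in for the SQLContext, `SimpleStore.active` plays the role of the currently active context, and `SimpleWriter`/`SimpleSaveable` are hypothetical names; only the call pattern mirrors the snippet above.

~~~scala
import scala.collection.mutable

// Hypothetical stand-in for a storage context: an in-memory map from path to content.
final class SimpleStore {
  val files: mutable.Map[String, String] = mutable.Map.empty
}

object SimpleStore {
  // Hypothetical "currently active" context, used when none is given explicitly.
  lazy val active: SimpleStore = new SimpleStore
}

// Builder that collects options and only touches storage in save().
final class SimpleWriter(payload: String) {
  private var store: SimpleStore = SimpleStore.active
  private var shouldOverwrite: Boolean = false

  def context(s: SimpleStore): this.type = { store = s; this }
  def overwrite(): this.type = { shouldOverwrite = true; this }

  def save(path: String): Unit = {
    if (store.files.contains(path) && !shouldOverwrite)
      throw new java.io.IOException(s"Path $path already exists; use overwrite()")
    store.files(path) = payload
  }
}

trait SimpleSaveable {
  def payload: String
  def write: SimpleWriter = new SimpleWriter(payload)
  // Shortcut: active context, no overwrite.
  def save(path: String): Unit = write.save(path)
}
~~~

Keeping `save(path)` as a shortcut for `write.save(path)` means the common case needs no configuration, while the builder leaves room for options such as an explicit context or overwrite.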
The param handling is different from the design doc. We didn't save default and user-set params separately, and when we load an instance back, all parameters are user-set. This does cause issues, but the alternative would also cause issues if we modify the default params.
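A simplified model makes the trade-off concrete. This sketch assumes a hypothetical `SimpleParams` holder (not Spark's `Params`) that keeps defaults and user-set values in separate maps; saving flattens them into one map, and loading marks every saved value as user-set.

~~~scala
import scala.collection.mutable

// Hypothetical params holder with separate default and user-set maps.
class SimpleParams(defaults: Map[String, String]) {
  private val userSet = mutable.Map.empty[String, String]

  def set(name: String, value: String): this.type = { userSet(name) = value; this }
  def isSet(name: String): Boolean = userSet.contains(name)
  def get(name: String): Option[String] = userSet.get(name).orElse(defaults.get(name))

  // Saving merges defaults and user-set values into one flat map,
  // so the distinction between them is lost.
  def paramMapForSave: Map[String, String] = defaults ++ userSet
}

object SimpleParams {
  // Loading marks every saved value as user-set, regardless of the current defaults.
  def load(saved: Map[String, String], currentDefaults: Map[String, String]): SimpleParams = {
    val p = new SimpleParams(currentDefaults)
    saved.foreach { case (k, v) => p.set(k, v) }
    p
  }
}
~~~

With a saved `maxIter` default of 10 and a later default of 20, the loaded instance reports `isSet("maxIter")` as `true` and keeps 10: the original default/user-set distinction is gone, but the saved value is not silently changed by the new default.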
TODOs:
* [x] Java test
* [ ] a follow-up PR to implement default save/load for all non-meta estimators and transformers
cc jkbradley
Author: Xiangrui Meng <[email protected]>
Closes #9454 from mengxr/SPARK-11217.
(cherry picked from commit c447c9d)
Signed-off-by: Joseph K. Bradley <[email protected]>
@mengxr As I implemented save/load for logreg, I found some things I'd like to change: https://issues.apache.org/jira/browse/SPARK-11618. I'll write a PR for these changes before the logreg one. They should not affect our various PRs too much.