Skip to content

Conversation

@mengxr
Copy link
Contributor

@mengxr mengxr commented Nov 4, 2015

This PR implements the default save/load for non-meta estimators and transformers using the JSON serialization of param values. The saved metadata includes:

  • class name
  • uid
  • timestamp
  • paramMap

The save/load interface is similar to DataFrames. We use the current active context by default, which should be sufficient for most use cases.

instance.save("path")
instance.write.context(sqlContext).overwrite().save("path")

Instance.load("path")

The param handling is different from the design doc. We didn't save default and user-set params separately, and when we load it back, all parameters are user-set. This does cause issues. But it also cause other issues if we modify the default params.

TODOs:

  • Java test
  • a follow-up PR to implement default save/load for all non-meta estimators and transformers

cc @jkbradley

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not feasible to set an arbitrary param outside the instance. This should be a compatible change.

@SparkQA
Copy link

SparkQA commented Nov 4, 2015

Test build #44978 has finished for PR 9454 at commit df81d61.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Saver extends BaseSaveLoad\n * trait Saveable\n * abstract class Loader[T] extends BaseSaveLoad\n * trait Loadable[T]\n

@SparkQA
Copy link

SparkQA commented Nov 4, 2015

Test build #44980 has finished for PR 9454 at commit e01e92d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Saver extends BaseSaveLoad\n * trait Saveable\n * abstract class Loader[T] extends BaseSaveLoad\n * trait Loadable[T]\n

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45124 has finished for PR 9454 at commit bc8611d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Writer extends BaseReadWrite\n * trait Writable\n * abstract class Reader[T] extends BaseReadWrite\n * trait Readable[T]\n

@jkbradley
Copy link
Member

Reviewing now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"non-meta transformers and estimators" --> "transformers and estimators which contain basic (json4s-serializable) Params and no data. This will not handle more complex Params or types with data (e.g., models with coefficients)."

@jkbradley
Copy link
Member

That's it; just minor comments

@mengxr mengxr changed the title [WIP][SPARK-11217][ML] save/load for non-meta estimators and transformers [SPARK-11217][ML] save/load for non-meta estimators and transformers Nov 6, 2015
@mengxr
Copy link
Contributor Author

mengxr commented Nov 6, 2015

@jkbradley I removed from and to because from is a Python keyword. Instead, I added load(path) and save(path) as shortcuts.

@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #45224 has finished for PR 9454 at commit a410538.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Writer extends BaseReadWrite\n * trait Writable\n * abstract class Reader[T] extends BaseReadWrite\n * trait Readable[T]\n

@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #45225 has finished for PR 9454 at commit f862b6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Writer extends BaseReadWrite\n * trait Writable\n * abstract class Reader[T] extends BaseReadWrite\n * trait Readable[T]\n

@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #45244 has finished for PR 9454 at commit 7952bd4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Writer extends BaseReadWrite\n * trait Writable\n * abstract class Reader[T] extends BaseReadWrite\n * trait Readable[T]\n

@jkbradley
Copy link
Member

LGTM

@jkbradley
Copy link
Member

I'll merge this with branch-1.6 and master

@jkbradley
Copy link
Member

Spark merge script not happy. Maybe a conflict was just introduced?

@jkbradley
Copy link
Member

Nevermind, second time's the charm

asfgit pushed a commit that referenced this pull request Nov 6, 2015
This PR implements the default save/load for non-meta estimators and transformers using the JSON serialization of param values. The saved metadata includes:

* class name
* uid
* timestamp
* paramMap

The save/load interface is similar to DataFrames. We use the current active context by default, which should be sufficient for most use cases.

~~~scala
instance.save("path")
instance.write.context(sqlContext).overwrite().save("path")

Instance.load("path")
~~~

The param handling is different from the design doc. We didn't save default and user-set params separately, and when we load it back, all parameters are user-set. This does cause issues. But it also cause other issues if we modify the default params.

TODOs:

* [x] Java test
* [ ] a follow-up PR to implement default save/load for all non-meta estimators and transformers

cc jkbradley

Author: Xiangrui Meng <[email protected]>

Closes #9454 from mengxr/SPARK-11217.

(cherry picked from commit c447c9d)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in c447c9d Nov 6, 2015
@jkbradley
Copy link
Member

@mengxr As I implemented save/load for logreg, I found some things I'd like to change: [https://issues.apache.org/jira/browse/SPARK-11618]. I'll write a PR for these changes before logreg. They should not affect our various PRs too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants