[SPARK-11217][ML] save/load for non-meta estimators and transformers #9454

mengxr · 2015-11-04T00:21:53Z

This PR implements the default save/load for non-meta estimators and transformers using the JSON serialization of param values. The saved metadata includes:

* class name
* uid
* timestamp
* paramMap
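To make the metadata layout concrete, here is a minimal sketch of what such a JSON record could look like. The `MetadataSketch` object, the `Metadata` case class, the JSON field names, and the sample values are all assumptions for illustration; the PR itself builds the JSON with json4s, and this hand-rolled version only mirrors the four fields listed above.

```scala
// Hypothetical sketch of the saved metadata; names and field keys are assumptions,
// and this is not the PR's implementation (which relies on json4s).
object MetadataSketch {

  final case class Metadata(
      className: String,             // fully qualified class of the saved instance
      uid: String,                   // unique ID of the instance
      timestamp: Long,               // save time in milliseconds
      paramMap: Map[String, String]) // param name -> value already encoded as JSON

  // Quote a string for JSON, escaping the characters that cannot appear raw.
  private def quote(s: String): String =
    s.map {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case c    => c.toString
    }.mkString("\"", "", "\"")

  // Render the metadata as one JSON object; params are sorted by name
  // so the output is deterministic.
  def toJson(m: Metadata): String = {
    val params = m.paramMap.toSeq.sortBy(_._1)
      .map { case (name, jsonValue) => quote(name) + ":" + jsonValue }
      .mkString("{", ",", "}")
    Seq(
      quote("class") + ":" + quote(m.className),
      quote("uid") + ":" + quote(m.uid),
      quote("timestamp") + ":" + m.timestamp,
      quote("paramMap") + ":" + params
    ).mkString("{", ",", "}")
  }

  def main(args: Array[String]): Unit = {
    val m = Metadata(
      "org.apache.spark.ml.feature.Binarizer", "binarizer_0", 1446596513000L,
      Map("threshold" -> "0.5", "inputCol" -> "\"feature\""))
    println(toJson(m))
  }
}
```

Keeping param values as pre-encoded JSON strings is a simplification; it avoids needing a JSON library in the sketch.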

The save/load interface is similar to DataFrames. We use the current active context by default, which should be sufficient for most use cases.

instance.save("path")
instance.write.context(sqlContext).overwrite().save("path")

Instance.load("path")
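The shape of this interface (a `write` builder with chained options, plus `save`/`load` shortcuts) can be sketched without Spark. Everything below is hypothetical: `SketchWriter`, `SketchWritable`, and the toy `Instance` are stand-ins, the `context(...)` option is omitted, and failing on an existing path unless `overwrite()` is called is an assumption about the intended behavior.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical names throughout: this mimics the shape of the API, not Spark's code.
object WriterSketch {

  // Builder returned by `write`; options are set by chaining, then `save` runs.
  final class SketchWriter(payload: String) {
    private var shouldOverwrite = false

    def overwrite(): this.type = { shouldOverwrite = true; this }

    def save(path: String): Unit = {
      val target = Paths.get(path)
      // Assumed behavior: refuse to clobber an existing path unless asked to.
      if (Files.exists(target) && !shouldOverwrite) {
        throw new java.io.IOException(
          s"Path $path already exists; use write.overwrite().save(path) to replace it.")
      }
      Files.write(target, payload.getBytes(StandardCharsets.UTF_8))
    }
  }

  trait SketchWritable {
    protected def payload: String
    def write: SketchWriter = new SketchWriter(payload)
    // Shortcut for the common case with default options.
    def save(path: String): Unit = write.save(path)
  }

  final case class Instance(uid: String) extends SketchWritable {
    protected def payload: String = uid
  }

  object Instance {
    // Companion-level load, mirroring `Instance.load("path")`.
    def load(path: String): Instance =
      Instance(new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempDirectory("writer-sketch").resolve("instance.txt").toString
    val instance = Instance("binarizer_0")
    instance.save(path)                   // first save succeeds
    instance.write.overwrite().save(path) // replacing an existing path needs overwrite()
    println(Instance.load(path))
  }
}
```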

The param handling is different from the design doc. We don't save default and user-set params separately, so when we load an instance back, all parameters are user-set. This does cause issues, but it also causes other issues if we modify the default params.
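A toy model may help show what "all parameters are user-set" means after a round trip. The `ParamSketch` object and its `Params`, `save`, and `load` names are hypothetical and use plain maps rather than Spark's param classes; the sketch only illustrates the flattening described above.

```scala
// Hypothetical toy model of param handling; names are assumptions, not Spark's classes.
object ParamSketch {

  final case class Params(defaults: Map[String, String], userSet: Map[String, String]) {
    // A param counts as "set" only if the user supplied it explicitly.
    def isSet(name: String): Boolean = userSet.contains(name)
    // User-set values take precedence over defaults.
    def get(name: String): Option[String] = userSet.get(name).orElse(defaults.get(name))
  }

  // Saving flattens defaults and user-set values into a single map ...
  def save(p: Params): Map[String, String] = p.defaults ++ p.userSet

  // ... so loading cannot tell them apart and treats every saved value as user-set.
  def load(saved: Map[String, String], currentDefaults: Map[String, String]): Params =
    Params(currentDefaults, saved)

  def main(args: Array[String]): Unit = {
    val original = Params(Map("threshold" -> "0.0"), Map("inputCol" -> "\"feature\""))
    val loaded = load(save(original), currentDefaults = Map("threshold" -> "1.0"))
    println(original.isSet("threshold")) // default only, not user-set
    println(loaded.isSet("threshold"))   // user-set after the round trip
    println(loaded.get("threshold"))     // saved value wins over the newer default
  }
}
```

In this sketch the loaded instance keeps the value it was saved with even when the defaults change later, at the cost of losing the information about which params the user actually set.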

TODOs:

* Java test
* a follow-up PR to implement default save/load for all non-meta estimators and transformers

cc @jkbradley

mengxr · 2015-11-04T00:22:56Z

mllib/src/main/scala/org/apache/spark/ml/param/params.scala

It is not feasible to set an arbitrary param outside the instance. This should be a compatible change.

SparkQA · 2015-11-04T00:27:17Z

Test build #44978 has finished for PR 9454 at commit df81d61.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * abstract class Saver extends BaseSaveLoad
 * trait Saveable
 * abstract class Loader[T] extends BaseSaveLoad
 * trait Loadable[T]

SparkQA · 2015-11-04T01:41:51Z

Test build #44980 has finished for PR 9454 at commit e01e92d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * abstract class Saver extends BaseSaveLoad
 * trait Saveable
 * abstract class Loader[T] extends BaseSaveLoad
 * trait Loadable[T]

SparkQA · 2015-11-05T17:10:27Z

Test build #45124 has finished for PR 9454 at commit bc8611d.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * abstract class Writer extends BaseReadWrite
 * trait Writable
 * abstract class Reader[T] extends BaseReadWrite
 * trait Readable[T]

jkbradley · 2015-11-05T19:23:02Z

Reviewing now

jkbradley · 2015-11-05T21:48:46Z

mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

"non-meta transformers and estimators" --> "transformers and estimators which contain basic (json4s-serializable) Params and no data. This will not handle more complex Params or types with data (e.g., models with coefficients)."

jkbradley · 2015-11-05T21:49:29Z

That's it; just minor comments

mengxr · 2015-11-06T16:54:25Z

@jkbradley I removed from and to because from is a Python keyword. Instead, I added load(path) and save(path) as shortcuts.

SparkQA · 2015-11-06T17:16:32Z

Test build #45224 has finished for PR 9454 at commit a410538.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * abstract class Writer extends BaseReadWrite
 * trait Writable
 * abstract class Reader[T] extends BaseReadWrite
 * trait Readable[T]

SparkQA · 2015-11-06T17:43:25Z

Test build #45225 has finished for PR 9454 at commit f862b6a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * abstract class Writer extends BaseReadWrite
 * trait Writable
 * abstract class Reader[T] extends BaseReadWrite
 * trait Readable[T]

SparkQA · 2015-11-06T20:54:05Z

Test build #45244 has finished for PR 9454 at commit 7952bd4.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * abstract class Writer extends BaseReadWrite
 * trait Writable
 * abstract class Reader[T] extends BaseReadWrite
 * trait Readable[T]

jkbradley · 2015-11-06T22:26:45Z

LGTM

jkbradley · 2015-11-06T22:48:31Z

I'll merge this with branch-1.6 and master

jkbradley · 2015-11-06T22:50:00Z

Spark merge script not happy. Maybe a conflict was just introduced?

jkbradley · 2015-11-06T22:51:01Z

Nevermind, second time's the charm

This PR implements the default save/load for non-meta estimators and transformers using the JSON serialization of param values. The saved metadata includes:

* class name
* uid
* timestamp
* paramMap

The save/load interface is similar to DataFrames. We use the current active context by default, which should be sufficient for most use cases.

~~~scala
instance.save("path")
instance.write.context(sqlContext).overwrite().save("path")

Instance.load("path")
~~~

The param handling is different from the design doc. We don't save default and user-set params separately, so when we load an instance back, all parameters are user-set. This does cause issues, but it also causes other issues if we modify the default params.

TODOs:

* [x] Java test
* [ ] a follow-up PR to implement default save/load for all non-meta estimators and transformers

cc jkbradley

Author: Xiangrui Meng <[email protected]>

Closes #9454 from mengxr/SPARK-11217.

(cherry picked from commit c447c9d)
Signed-off-by: Joseph K. Bradley <[email protected]>

jkbradley · 2015-11-10T05:37:14Z

@mengxr As I implemented save/load for logreg, I found some things I'd like to change: [https://issues.apache.org/jira/browse/SPARK-11618]. I'll write a PR for these changes before logreg. They should not affect our various PRs too much.

mengxr added 2 commits November 3, 2015 10:56

initial implementation

cd1c7ea

update doc and test

df81d61

mengxr reviewed Nov 4, 2015

fix Scala style

e01e92d

rename save/load to write/read to be compatible with DataFrames API

bc8611d

add a test in Java

dd57812

remove options

59d1c5e

mengxr force-pushed the SPARK-11217 branch from 1a01926 to 59d1c5e Compare November 5, 2015 20:09

jkbradley reviewed Nov 5, 2015

address comments

a410538

mengxr changed the title ~~[WIP][SPARK-11217][ML] save/load for non-meta estimators and transformers~~ [SPARK-11217][ML] save/load for non-meta estimators and transformers Nov 6, 2015

remove from/to

f862b6a

fix test

7952bd4

asfgit closed this in c447c9d Nov 6, 2015
