Skip to content

Conversation

@mengxr
Copy link
Contributor

@mengxr mengxr commented Nov 18, 2015

Add read/write support to the following estimators under spark.ml:

  • CountVectorizer
  • IDF
  • MinMaxScaler
  • StandardScaler (a little awkward because we store some params in spark.mllib model)
  • StringIndexer

Added some necessary method for read/write. Maybe we should add private[ml] trait DefaultParamsReadable and DefaultParamsWritable to save some boilerplate code, though we still need to override load for Java compatibility.

@jkbradley

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included some code clean-up of #6665.

@SparkQA
Copy link

SparkQA commented Nov 18, 2015

Test build #46190 has finished for PR 9798 at commit 3ea051a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class CountVectorizerModelWriter(instance: CountVectorizerModel) extends Writer\n * final class IDF(override val uid: String) extends Estimator[IDFModel] with IDFBase with Writable\n * class MinMaxScalerModelWriter(instance: MinMaxScalerModel) extends Writer\n * class StandardScalerModelWriter(instance: StandardScalerModel) extends Writer\n * class StringIndexModelWriter(instance: StringIndexerModel) extends Writer\n

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are already Params. This is just weird b/c the new and old model params can get out of synch. Fine for now though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I send #9839 to do minor refactor for StandardScaler, it fixed this issue.

@jkbradley
Copy link
Member

@mengxr LGTM except for the Scala style issue.

My comments aren't actionable unless you want to simplify the tests using testEstimatorAndModelReadWrite

@mengxr
Copy link
Contributor Author

mengxr commented Nov 18, 2015

testEstimatorAndModelReadWrite calls fit, which takes some extra time. So I construct the model directly.

@mengxr
Copy link
Contributor Author

mengxr commented Nov 18, 2015

test this please

1 similar comment
@mengxr
Copy link
Contributor Author

mengxr commented Nov 18, 2015

test this please

@SparkQA
Copy link

SparkQA commented Nov 18, 2015

Test build #46262 has finished for PR 9798 at commit a41349f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class CountVectorizerModelWriter(instance: CountVectorizerModel) extends Writer\n * final class IDF(override val uid: String) extends Estimator[IDFModel] with IDFBase with Writable\n * class MinMaxScalerModelWriter(instance: MinMaxScalerModel) extends Writer\n * class StandardScalerModelWriter(instance: StandardScalerModel) extends Writer\n * class StringIndexModelWriter(instance: StringIndexerModel) extends Writer\n

asfgit pushed a commit that referenced this pull request Nov 18, 2015
Add read/write support to the following estimators under spark.ml:

* CountVectorizer
* IDF
* MinMaxScaler
* StandardScaler (a little awkward because we store some params in spark.mllib model)
* StringIndexer

Added some necessary method for read/write. Maybe we should add `private[ml] trait DefaultParamsReadable` and `DefaultParamsWritable` to save some boilerplate code, though we still need to override `load` for Java compatibility.

jkbradley

Author: Xiangrui Meng <[email protected]>

Closes #9798 from mengxr/SPARK-6787.

(cherry picked from commit 7e987de)
Signed-off-by: Xiangrui Meng <[email protected]>
@mengxr
Copy link
Contributor Author

mengxr commented Nov 18, 2015

Merged into master and branch-1.6.

@asfgit asfgit closed this in 7e987de Nov 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants