[ML][Minor] Separate estimator and model params for read/write test. #17151

yanboliang · 2017-03-03T14:01:30Z

What changes were proposed in this pull request?

Since an Estimator and its Model do not always share the same params (see ALSParams and ALSModelParams), we should pass test params for the estimator and the model separately to testEstimatorAndModelReadWrite.
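
To illustrate the motivation, here is a minimal, self-contained sketch that does not use Spark at all. ToyEstimator, ToyModel, mismatches, and the param names are hypothetical stand-ins chosen for illustration; the real helper lives in Spark's test utilities and works on actual Params. It shows that checking a model against the estimator's full param map reports params the model never had, while separate maps check each side only against its own params.

```scala
// Toy stand-ins for Spark ML classes; every name below is hypothetical.
object SeparateParamsSketch {
  // Params the toy model keeps after fitting (training-only ones are dropped).
  val modelParamNames: Set[String] = Set("userCol", "itemCol")

  final case class ToyModel(params: Map[String, Any])

  final case class ToyEstimator(params: Map[String, Any]) {
    // "Fitting" only copies the params the model declares.
    def fit(): ToyModel =
      ToyModel(params.filter { case (name, _) => modelParamNames.contains(name) })
  }

  // Names in `expected` that are missing from `actual` or carry a different value.
  def mismatches(expected: Map[String, Any], actual: Map[String, Any]): Set[String] =
    expected.collect {
      case (name, value) if !actual.get(name).contains(value) => name
    }.toSet

  val estimatorParams: Map[String, Any] =
    Map("userCol" -> "u", "itemCol" -> "i", "maxIter" -> 5, "rank" -> 4)
  val modelParams: Map[String, Any] =
    Map("userCol" -> "u", "itemCol" -> "i")

  def main(args: Array[String]): Unit = {
    val model = ToyEstimator(estimatorParams).fit()
    // One shared map: the model is checked against params it does not have.
    println(mismatches(estimatorParams, model.params))
    // Separate maps: each side is checked only against its own params.
    println(mismatches(estimatorParams, estimatorParams))
    println(mismatches(modelParams, model.params))
  }
}
```

With the shared map, the model check flags the training-only params; with separate maps, both checks come back empty.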

How was this patch tested?

Existing tests.

SparkQA · 2017-03-03T14:29:41Z

Test build #73846 has finished for PR 17151 at commit efff28b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-03T15:09:36Z

Test build #73847 has finished for PR 17151 at commit 8f4f87e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sethah · 2017-03-03T16:18:51Z

mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala

    val categoricalData: DataFrame =
      TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
-    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
+    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,


It would seriously reduce the amount of code changed (and therefore make this much easier to review :p ) to just add an extra constructor:

    def testEstimatorAndModelReadWrite[
        E <: Estimator[M] with MLWritable, M <: Model[M] with MLWritable](
        estimator: E,
        dataset: Dataset[_],
        testParams: Map[String, Any],
        checkModelData: (M, M) => Unit): Unit = {
      testEstimatorAndModelReadWrite(estimator, dataset, testParams, testParams, checkModelData)
    }
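
The delegation pattern in this suggestion can be tried in isolation. The sketch below is only an illustration under stated assumptions: readWriteCheck and Checked are hypothetical names, and the helper merely records which param names each side would be checked against instead of running a real read/write round trip.

```scala
// Hypothetical stand-in for the test helper; it only records what would be checked.
object OverloadSketch {
  final case class Checked(estimatorParamNames: Set[String], modelParamNames: Set[String])

  // General form: estimator and model params are supplied separately.
  def readWriteCheck(
      testEstimatorParams: Map[String, Any],
      testModelParams: Map[String, Any]): Checked =
    Checked(testEstimatorParams.keySet, testModelParams.keySet)

  // Convenience overload: reuse one map for both sides, so existing callers need no change.
  def readWriteCheck(testParams: Map[String, Any]): Checked =
    readWriteCheck(testParams, testParams)

  def main(args: Array[String]): Unit = {
    // Existing call sites keep passing a single map.
    println(readWriteCheck(Map[String, Any]("maxDepth" -> 2)))
    // Suites whose model has fewer params pass two maps.
    println(readWriteCheck(
      Map[String, Any]("rank" -> 4, "userCol" -> "u"),
      Map[String, Any]("userCol" -> "u")))
  }
}
```

The trade-off is the one discussed in this thread: the overload keeps the diff small, while requiring both maps at every call site makes the estimator/model separation visible to anyone copying an existing test.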

@sethah Thanks for the kind reminder. I had planned to write it as you suggest, but then I thought:

Actually, every Model only needs to extend a fraction of the Params of its Estimator, so every testEstimatorAndModelReadWrite(estimator, dataset, testParams, checkModelData) call should eventually become testEstimatorAndModelReadWrite(estimator, dataset, testEstimatorParams, testModelParams, checkModelData). I wrote the test suites explicitly in the latter form to remind developers to separate estimator and model params when adding read/write tests for new algorithms. I'm afraid developers will not be aware of the separation if they consult other test suites and find that almost all test cases pass in only testParams.

Although the current change passes allParamSettings as both testEstimatorParams and testModelParams, that is because these estimators and models share the same param set. Others, like ALS, will pass in separate params. I think we should push forward with refactoring *** and ***Model to separate their params, which could make models more succinct.

If this were a public API, I would totally agree with you. However, this is an internal helper function, and I think all test cases will eventually need to pass in separate params, so I settled the matter in one go.

These are my two cents; I'm still open to hearing your thoughts. If you have a strong opinion, I can update according to your suggestion. Thanks.

Good points. I still think it's better to just add the extra constructor, but I don't feel strongly about it. So we can proceed with whatever you feel is best. Thanks!

I prefer to let any refactoring of these tests happen as-needed. If there are specific cases that need to be done now, we should create JIRAs to track them.

Yeah, actually almost all models extend lots of params that are not necessary; removing them from the models is on my to-do list. I'll create JIRAs to track them. I'll merge this first since it blocks #17117. Thanks.

sethah

LGTM

yanboliang · 2017-03-08T10:05:42Z

Merged into master. Thanks for reviewing.

yanboliang added 2 commits March 3, 2017 05:55

Separate estimator and model params for read/write test.

efff28b

Refactor ALS read/write test.

8f4f87e

yanboliang mentioned this pull request Mar 3, 2017

[SPARK-10780][ML] Support initial model for KMeans. #17117

Closed

sethah reviewed Mar 3, 2017

sethah approved these changes Mar 8, 2017

asfgit closed this in 1fa5886 Mar 8, 2017

yanboliang deleted the test-rw branch March 8, 2017 10:08
