@yanboliang
Contributor

@yanboliang yanboliang commented Mar 3, 2017

What changes were proposed in this pull request?

Since an Estimator and its Model do not always share the same params (see ALSParams and ALSModelParams), we should pass test params for the estimator and the model separately to the function testEstimatorAndModelReadWrite.
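
To make the motivation concrete, here is a minimal, Spark-free sketch of why one shared params map is not enough when the model defines only a subset of the estimator's params. `HasParams`, `ToyParams`, `ToyModel`, `ToyEstimator`, and `mismatches` are hypothetical stand-ins invented for illustration and are not part of Spark; the param names are only loosely modeled on ALS.

```scala
// Toy stand-ins for Spark ML params; none of these types exist in Spark.
trait HasParams {
  def params: Map[String, Any]
}

object ToyParams {
  // Assumption: the model keeps only this subset of the estimator's params.
  val modelParamNames: Set[String] = Set("predictionCol")
}

final case class ToyModel(params: Map[String, Any]) extends HasParams

final case class ToyEstimator(params: Map[String, Any]) extends HasParams {
  // "Fitting" copies over only the params that the model also defines.
  def fit(): ToyModel =
    ToyModel(params.filter { case (name, _) => ToyParams.modelParamNames(name) })
}

object SeparateParamsSketch {
  // Names in `expected` that `target` lacks or holds with a different value.
  def mismatches(target: HasParams, expected: Map[String, Any]): Set[String] =
    expected.collect {
      case (name, value) if !target.params.get(name).contains(value) => name
    }.toSet
}
```

With estimator params `rank`, `maxIter`, and `predictionCol`, checking the fitted toy model against the estimator's map reports `rank` and `maxIter` as mismatches, while checking it against a model-only map reports none.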

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Mar 3, 2017

Test build #73846 has finished for PR 17151 at commit efff28b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 3, 2017

Test build #73847 has finished for PR 17151 at commit 8f4f87e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val categoricalData: DataFrame =
TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
Contributor

@sethah sethah Mar 3, 2017

It would seriously reduce the amount of code changed (and therefore make this much easier to review :p ) to just add an extra constructor:

def testEstimatorAndModelReadWrite[
    E <: Estimator[M] with MLWritable, M <: Model[M] with MLWritable](
      estimator: E,
      dataset: Dataset[_],
      testParams: Map[String, Any],
      checkModelData: (M, M) => Unit): Unit = {
    testEstimatorAndModelReadWrite(estimator, dataset, testParams, testParams, checkModelData)
}

Contributor Author

@sethah Thanks for the kind reminder. I was planning to write it as you suggest, but then I considered the following:

  • Actually, every Model only needs to extend a fraction of the Params of its Estimator, so all calls of the form testEstimatorAndModelReadWrite(estimator, dataset, testParams, checkModelData) should eventually be changed to testEstimatorAndModelReadWrite(estimator, dataset, testEstimatorParams, testModelParams, checkModelData). I explicitly use the latter form in the test suites to remind developers to separate their estimator and model params when adding read/write tests for new algorithms. I'm afraid developers will not be aware of the separation if they refer to other test suites and find that almost all test cases pass in only testParams.
  • In the current change we pass allParamSettings as both testEstimatorParams and testModelParams, but only because they share the same set of params. Others, like ALS, will pass in separate params. I think we should push forward with refactoring *** and ***Model to separate their params, which could make models more succinct.
  • If this were a public API, I would totally agree with you. However, this is an internal auxiliary function, and I think all test cases will eventually need to pass in separate params, so I would rather settle the matter in one go.

These are my two cents; I'm still open to hearing your thoughts. If you have a strong opinion, I can update according to your suggestion. Thanks.
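
The trade-off under discussion can be sketched with a toy helper. This is only an illustration under stated assumptions: `checkReadWrite` and `Checked` are hypothetical names, and the real helper also takes an estimator, a dataset, and a `checkModelData` function, as in the signature quoted above.

```scala
object ReadWriteHelperSketch {
  // Records which param maps a call would verify; a stand-in for real save/load checks.
  final case class Checked(
      estimatorParams: Map[String, Any],
      modelParams: Map[String, Any])

  // Explicit form: the caller states what the model is expected to carry.
  def checkReadWrite(
      estimatorParams: Map[String, Any],
      modelParams: Map[String, Any]): Checked =
    Checked(estimatorParams, modelParams)

  // Convenience overload in the style suggested in review: one map serves both roles.
  def checkReadWrite(testParams: Map[String, Any]): Checked =
    checkReadWrite(testParams, testParams)
}
```

The overload keeps existing call sites unchanged, but it also hides the estimator/model distinction from anyone copying those call sites, which is the concern raised in the first bullet.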

Contributor

Good points. I still think it's better to just add the extra constructor, but I don't feel strongly about it. So we can proceed with whatever you feel is best. Thanks!

Contributor

I prefer to let any refactoring of these tests happen as-needed. If there are specific cases that need to be done now, we should create JIRAs to track them.

Contributor Author

Yeah, actually almost all models extend lots of params that are not necessary, and I'd like to put removing these params from the models on my to-do list. I'll create JIRAs to track them. I'll merge this first since it blocks #17117. Thanks.

Contributor

@sethah sethah left a comment

LGTM

@yanboliang
Contributor Author

yanboliang commented Mar 8, 2017

Merged into master. Thanks for reviewing.

@asfgit asfgit closed this in 1fa5886 Mar 8, 2017
@yanboliang yanboliang deleted the test-rw branch March 8, 2017 10:08