Skip to content

Conversation

@WeichenXu123
Copy link
Contributor

@WeichenXu123 WeichenXu123 commented Nov 1, 2017

What changes were proposed in this pull request?

Add python API for collecting sub-models during CrossValidator/TrainValidationSplit fitting.

How was this patch tested?

UT added.

@SparkQA
Copy link

SparkQA commented Nov 1, 2017

Test build #83290 has finished for PR 19627 at commit c071183.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HasCollectSubModels(Params):
  • class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels, MLReadable, MLWritable):
  • class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,

@WeichenXu123
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Nov 17, 2017

Test build #83953 has finished for PR 19627 at commit 4c3a7ea.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2017

Test build #83954 has finished for PR 19627 at commit 9e27f6b.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Copy link
Contributor Author

My local test passed. This test failure looks like test system issue.

@WeichenXu123 WeichenXu123 changed the title [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API [SPARK-21088][ML] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API Nov 17, 2017
@holdenk
Copy link
Contributor

holdenk commented Nov 18, 2017

What happens when you run check-license locally? I agree it doesn't look like any of these changes would impact the license headers.

@WeichenXu123
Copy link
Contributor Author

@holdenk Find the reason. There is an empty file in the directory. :)

@SparkQA
Copy link

SparkQA commented Nov 19, 2017

Test build #83991 has finished for PR 19627 at commit 758bc24.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2017

Test build #83992 has finished for PR 19627 at commit ae082f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,

@WeichenXu123
Copy link
Contributor Author

@holdenk Thanks!

@WeichenXu123 WeichenXu123 changed the title [SPARK-21088][ML] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API Dec 1, 2017
@jkbradley
Copy link
Member

Is this still WIP or ready?

@WeichenXu123
Copy link
Contributor Author

@jkbradley I think it is better to review #19857 (fix python model specific optimization) and merge it first and then I rebase & update this PR. :)

@WeichenXu123 WeichenXu123 reopened this Apr 10, 2018
@WeichenXu123
Copy link
Contributor Author

@MrBago @yogeshg @jkbradley Updated and ready for review now!

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89111 has finished for PR 19627 at commit 81473b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123 WeichenXu123 changed the title [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API [SPARK-21088][ML] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API Apr 10, 2018
Copy link
Member

@jkbradley jkbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

You'll need to update _from_java and _to_java for CrossValidator and TrainValidationSplit.

Also, please update the PR description.

tvs = TrainValidationSplit(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator,
collectSubModels=True)
tvsModel = tvs.fit(dataset)
assert len(tvsModel.subModels) == len(grid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use self.assertEqual here and elsewhere.

"TypeConverters.toInt"),
("parallelism", "the number of threads to use when running parallel algorithms (>= 1).",
"1", "TypeConverters.toInt"),
("collectSubModels", "whether to collect a list of sub-models trained during tuning",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add the full description from Scala.



class CrossValidator(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):
class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to update _from_java and _to_java as well to pass collectSubModels around. (Same for TrainValidationSplit)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also clarify in the doc for CrossValidatorModel.copy() that it does not copy the extra Params into the subModels. (same for TrainValidationSplitModel)

cvParallelModel = cv.fit(dataset)
self.assertEqual(cvSerialModel.avgMetrics, cvParallelModel.avgMetrics)

def test_expose_sub_models(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests. Can you make one addition: Test the copy() method to make sure it copies the submodels.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89334 has finished for PR 19627 at commit 80f07fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Merging with master
Thanks!

@asfgit asfgit closed this in 0461482 Apr 16, 2018
@WeichenXu123 WeichenXu123 deleted the expose-model-list-py branch April 16, 2018 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants