Skip to content

Conversation

@WeichenXu123
Copy link
Contributor

What changes were proposed in this pull request?

Currently the param of CrossValidator/TrainValidationSplit persist/loading is hardcoding, which is different with other ML estimators. This cause persist bug for new added parallelism param.

I refactor related code, avoid hardcoding persist/load param. And in the same time, it solve the parallelism persisting bug.

This refactoring is very useful because we will add more new params in #19208 , hardcoding param persisting/loading making the thing adding new params very troublesome.

How was this patch tested?

Test added.

@WeichenXu123
Copy link
Contributor Author

cc @BryanCutler Thanks!

@SparkQA
Copy link

SparkQA commented Sep 19, 2017

Test build #81927 has finished for PR 19278 at commit 042b3d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I've left a few comments.

(This might be a task for a future PR) I'm curious why setEstimator() and setEvaluator() aren't part of the ValidatorParams API - they're currently called by both classes that extend ValidatorParams (TrainValidationSplit, CrossValidator) and moving these setter methods up the class hierarchy would allow you to share more loading/persistence code in ValidatorParams.saveImpl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an Option[List[String]] that defaults to None instead of a List[String] that defaults to null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the docstring to state that params included in skipParams aren't set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception is unused & can be removed.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the parallelsim param persistence. I think it is a little confusing that some of the params are loaded manually, and some by DefaultParamReader.getAndSetParams that then requires you to then skip certain params. Is is possible to do this without the list of params to skip? If not, then maybe it would be better not to use DefaultParamsReader.getAndSetParams at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this included by accident?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I will remove it. sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I will remove it. sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also skip estimator and evaluator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Because estimator and evaluator isn't included in metadata. You can check the saveImpl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you also need to skip estimator and evaluator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Because estimator and evaluator isn't included in metadata. You can check the saveImpl.

@WeichenXu123
Copy link
Contributor Author

@BryanCutler The reason I add skipParams is that, if we don't use DefaultParamReader.getAndSetParams, we have to hardcoding all params which are very troublesome. And every time we add new params, we have to also update the hardcoding params, it is very easy to forget and cause bug. But if use DefaultParamReader.getAndSetParams but do not support skipParams, the estimatorParamMaps is difficult to handle.
So this design is a balance of this concerns, although it makes the code a little weird.

@SparkQA
Copy link

SparkQA commented Sep 20, 2017

Test build #81959 has finished for PR 19278 at commit cc30578.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@jkbradley jkbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! One question: I think this maintains backwards compatibility, but would you mind testing that manually by:

  • Exporting a CV model using spark/master
  • Importing that CV model using this PR's branch, and making sure that works?

Thanks!

.setTrainRatio(0.5)
.setEstimatorParamMaps(paramMaps)
.setSeed(42L)
.setParallelism(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the test for the Model too please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The model do not own parallel parameter. This was discussed before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right, thanks

.setNumFolds(20)
.setEstimatorParamMaps(paramMaps)
.setSeed(42L)
.setParallelism(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the test for the model too please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

* TODO: Move to [[Metadata]] method
*/
def getAndSetParams(instance: Params, metadata: Metadata): Unit = {
def getAndSetParams(instance: Params, metadata: Metadata,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix scala style: 1 arg per line for multiline declarations

* This works if all Params (except params included by `skipParams` list) implement
* [[org.apache.spark.ml.param.Param.jsonDecode()]].
*
* The params included in `skipParams` won't be set. This is useful if some params don't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document params using @param


val (metadata, estimator, evaluator, estimatorParamMaps) =
ValidatorParams.loadImpl(path, sc, className)
val numFolds = (metadata.params \ "numFolds").extract[Int]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numFolds is no longer needed

@WeichenXu123
Copy link
Contributor Author

@jkbradley Sure I tested the backwards compatibility. Part of the reason I changed into DefaultParamReader.getAndSetParams is for backwards compatibility.

@SparkQA
Copy link

SparkQA commented Sep 22, 2017

Test build #82057 has finished for PR 19278 at commit 8f78f59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Merging with master
Thanks @WeichenXu123 for the fix and for testing for backwards compatibility!

@asfgit asfgit closed this in f180b65 Sep 23, 2017
@WeichenXu123 WeichenXu123 deleted the fix-tuning-param-bug branch September 23, 2017 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants