

WeichenXu123 commented Sep 26, 2017

What changes were proposed in this pull request?

Push the model-fitting parallelization code down from CrossValidator/TrainValidationSplit into the Estimators.
See the discussion in SPARK-19357.
Design doc: https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing

Scala API:

def fit(dataset: Dataset[_], paramMaps: Array[ParamMap],
    unpersistDatasetAfterFitting: Boolean, executionContext: ExecutionContext,
    modelCallback: (Model[_], Int) => Unit): Unit

Java API:

def fit(dataset: Dataset[_], paramMaps: Array[ParamMap],
    unpersistDatasetAfterFitting: Boolean, executionContext: ExecutionContext,
    modelCallback: VoidFunction2[Model[_], Int]): Unit
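For illustration, here is a minimal sketch of what a default implementation of this multi-ParamMap `fit` could look like: one `Future` per `ParamMap` on the supplied `ExecutionContext`, with the callback invoked as each model finishes. To keep it runnable without Spark, `ParamMap`, `Model`, and `fitOne` are hypothetical stand-ins, and the `unpersistDatasetAfterFitting` flag is omitted.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical stand-ins for Spark ML types so the sketch runs without Spark:
// a ParamMap is a plain Map, and a "model" just records what it was fit with.
object ParallelFitSketch {
  type ParamMap = Map[String, Any]
  final case class Model(params: ParamMap)

  // Stand-in for Estimator.fit(dataset, paramMap): pretend to train one model.
  def fitOne(dataset: Seq[Double], paramMap: ParamMap): Model = Model(paramMap)

  // Multi-ParamMap fit: one Future per ParamMap on the caller's ExecutionContext,
  // invoking the callback with (model, index) as soon as each model is ready.
  def fit(
      dataset: Seq[Double],
      paramMaps: Array[ParamMap],
      executionContext: ExecutionContext,
      modelCallback: (Model, Int) => Unit): Unit = {
    implicit val ec: ExecutionContext = executionContext
    val futures = paramMaps.zipWithIndex.map { case (paramMap, index) =>
      Future {
        val model = fitOne(dataset, paramMap)
        modelCallback(model, index)
      }
    }
    // Block until every model has been fit and its callback has run;
    // a failure in any fit is rethrown here.
    futures.foreach(f => Await.result(f, Duration.Inf))
  }
}
```

Because the callback may run concurrently on several threads, it should be thread-safe; writing each model into its own array slot, e.g. `(model, i) => results(i) = model`, is one simple option.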

Note: In both Scala and Java, developers can use the helper method HasParallelism.getExecutionContext to obtain the ExecutionContext that this API requires.
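`HasParallelism.getExecutionContext` itself is not reproduced here. As a rough sketch of the idea, using a hypothetical `executionContextFor` helper, a parallelism level of 1 can map to a same-thread executor (sequential fitting) and larger values to a bounded pool of daemon threads.

```scala
import java.util.concurrent.{Executor, Executors, ThreadFactory}
import scala.concurrent.ExecutionContext

// Hypothetical helper: an approximation of the idea behind a `parallelism`
// param, not Spark's HasParallelism implementation.
object ExecutionContextSketch {
  // Runs each task inline on the submitting thread, so models are fit one by one.
  private object SameThreadExecutor extends Executor {
    override def execute(task: Runnable): Unit = task.run()
  }

  // Daemon threads do not keep the JVM alive, so the pool needs no explicit shutdown.
  private val daemonThreads: ThreadFactory = new ThreadFactory {
    override def newThread(task: Runnable): Thread = {
      val thread = new Thread(task)
      thread.setDaemon(true)
      thread
    }
  }

  def executionContextFor(parallelism: Int): ExecutionContext = {
    require(parallelism >= 1, s"parallelism must be >= 1 but was $parallelism")
    if (parallelism == 1) {
      ExecutionContext.fromExecutor(SameThreadExecutor)
    } else {
      // At most `parallelism` fits run concurrently; extra tasks wait in the queue.
      ExecutionContext.fromExecutorService(
        Executors.newFixedThreadPool(parallelism, daemonThreads))
    }
  }
}
```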

Discussion: Do we need to provide both a Scala and a Java API, or only a Java API? This PR provides both, so Scala developers can override the Scala API (which is easier to use) and Java developers can override the Java API.
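One possible answer to this question, sketched under the assumption of Scala 2.12 or later: if only a Java-style callback interface is exposed, Scala callers can still pass a lambda through SAM conversion, and a one-line adapter covers existing Scala function values. `ModelCallback`, `fitAll`, and `asCallback` are hypothetical names; `ModelCallback` merely has the same shape as `VoidFunction2`.

```scala
// Hypothetical Java-friendly callback with the same shape as Spark's
// VoidFunction2[Model[_], Int]; defined locally so the sketch runs without Spark.
trait ModelCallback[M] {
  def call(model: M, index: Int): Unit
}

object CallbackBridgeSketch {
  // The single, Java-style entry point an Estimator would override.
  // Here it just "fits" placeholder models named by index, sequentially.
  def fitAll(paramCount: Int, callback: ModelCallback[String]): Unit =
    (0 until paramCount).foreach(i => callback.call(s"model-$i", i))

  // Adapter for Scala code that already holds a function value.
  def asCallback[M](f: (M, Int) => Unit): ModelCallback[M] =
    new ModelCallback[M] {
      override def call(model: M, index: Int): Unit = f(model, index)
    }
}
```

A Scala caller can then pass a lambda directly, e.g. `CallbackBridgeSketch.fitAll(3, (model, index) => println(s"$index: $model"))`, so an Estimator would only need to override the Java-style signature.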

How was this patch tested?

N/A

}

@Since("2.3.0")
def fit(dataset: Dataset[_], paramMaps: Array[ParamMap],
Contributor Author


I will add documentation for this interface later, but it can be reviewed now. Currently the interface looks a little ugly.

SparkQA commented Sep 26, 2017

Test build #82189 has finished for PR 19350 at commit d5625a6.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Contributor Author

retest this please.

SparkQA commented Sep 26, 2017

Test build #82191 has finished for PR 19350 at commit d5625a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 15, 2017

Test build #83889 has finished for PR 19350 at commit 5614390.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 20, 2017

Test build #84023 has finished for PR 19350 at commit 1b4a384.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Fit models in a Future for training in parallel
val modelFutures = paramMaps.map { paramMap =>
  Future[Model[_]] {
    fit(dataset, paramMap).asInstanceOf[Model[_]]
Contributor


How will this work in a pipeline?

If the Estimator in CV is a Pipeline, then here it will call fit(dataset, paramMap) on the Pipeline, which will in turn fit each stage with that paramMap. This is what the current parallel CV does.

But if we have a stage with model-specific optimization (say, for argument's sake, a LinearRegression that can internally optimize maxIter), then its fit will be called with only a single paramMap arg.

So doesn't pushing the parallel fit into Estimator nullify any benefit from model-specific optimizations?
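To make the concern concrete, here is a toy sketch (hypothetical, not `LinearRegression`'s optimizer): gradient descent on `(w - target)^2`, with each ParamMap reduced to a `maxIter` value. Fitting each value independently costs the sum of all `maxIter`s, while an estimator that sees all of them at once can run a single pass to the largest value and snapshot the intermediate models, costing only the maximum. The second path is only possible if the estimator receives the full set of ParamMaps rather than one per call.

```scala
import scala.collection.mutable

// Hypothetical toy estimator: gradient descent minimizing (w - target)^2, with each
// ParamMap reduced to a single maxIter value. Not LinearRegression's optimizer.
object MaxIterSharingSketch {
  final case class Model(weight: Double, iterations: Int)

  // One gradient step: d/dw (w - target)^2 = 2 * (w - target).
  private def step(w: Double, target: Double, lr: Double): Double =
    w - lr * 2.0 * (w - target)

  // Per-ParamMap path: each maxIter is trained from scratch, as when the tuning
  // code calls fit(dataset, paramMap) once per ParamMap. Work = sum of maxIters.
  def fitEach(target: Double, maxIters: Seq[Int], lr: Double = 0.1): Seq[Model] =
    maxIters.map { n =>
      var w = 0.0
      for (_ <- 1 to n) w = step(w, target, lr)
      Model(w, n)
    }

  // Model-specific path: a single run up to the largest maxIter, snapshotting the
  // weight at every requested count. Work = max of maxIters, but it only works if
  // the estimator sees all ParamMaps at once.
  def fitShared(target: Double, maxIters: Seq[Int], lr: Double = 0.1): Seq[Model] = {
    val wanted = maxIters.toSet
    val snapshots = mutable.Map.empty[Int, Model]
    var w = 0.0
    if (wanted.contains(0)) snapshots(0) = Model(w, 0)
    val longest = if (maxIters.isEmpty) 0 else maxIters.max
    for (i <- 1 to longest) {
      w = step(w, target, lr)
      if (wanted.contains(i)) snapshots(i) = Model(w, i)
    }
    maxIters.map(snapshots) // keep the caller's ParamMap order
  }
}
```

Both paths return identical models, e.g. for `maxIters = Seq(20, 5, 10)`, but the shared path runs 20 iterations instead of 35.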

Contributor Author


@MLnick Oh, the design is still under discussion on JIRA and I think it will change. I should mark this as WIP. Thanks!


WeichenXu123 changed the title from [SPARK-22126][ML] Fix model-specific optimization support for ML tuning to [SPARK-22126][ML][WIP] Fix model-specific optimization support for ML tuning on Dec 13, 2017
@WeichenXu123
Contributor Author

WeichenXu123 commented Dec 19, 2017

The design has changed. @MrBago will create a new PR for this later. The new design is here: https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing

WeichenXu123 deleted the fix_model_spec_optim branch on December 19, 2017 02:49

Labels

None yet

Projects

None yet


3 participants