[SPARK-22126][ML][WIP] Fix model-specific optimization support for ML tuning #19350
Conversation
```scala
  @Since("2.3.0")
  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap],
```
I will add docs for this interface later, but it can be reviewed first; currently this interface looks a little ugly.
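For reference, a minimal sketch of the shape this interface could take, with a default implementation mirroring the Future-based loop further down; the class name, return type, and default body are assumptions for illustration, not the final API:

```scala
import scala.concurrent.{ExecutionContext, Future}

import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Dataset

// Hypothetical sketch, not the PR's final API: a multi-ParamMap fit that an
// estimator can override to apply model-specific optimizations. The default
// simply fits each ParamMap independently on the supplied ExecutionContext.
abstract class ParallelFitEstimator[M <: Model[M]] extends Estimator[M] {

  def fit(
      dataset: Dataset[_],
      paramMaps: Array[ParamMap],
      executionContext: ExecutionContext): Array[Future[M]] = {
    paramMaps.map { paramMap =>
      Future {
        // Fall back to the existing single-ParamMap fit for each setting.
        fit(dataset, paramMap)
      }(executionContext)
    }
  }
}
```

An estimator with model-specific optimizations would override this method so it can look at the whole paramMaps array at once instead of fitting each map independently.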
Test build #82189 has finished for PR 19350 at commit

retest this please.

Test build #82191 has finished for PR 19350 at commit

Test build #83889 has finished for PR 19350 at commit

Test build #84023 has finished for PR 19350 at commit
```scala
// Fit models in a Future for training in parallel
val modelFutures = paramMaps.map { paramMap =>
  Future[Model[_]] {
    fit(dataset, paramMap).asInstanceOf[Model[_]]
```
How will this work in a pipeline?
If the Estimator in CV is a Pipeline, then here it will call fit(dataset, paramMap) on the Pipeline, which will in turn fit each stage with that paramMap. This is what the current parallel CV is doing.
But if we have a stage with model-specific optimization (say, for argument's sake, a LinearRegression that can internally optimize maxIter), then its fit will be called with only a single paramMap arg.
So pushing the parallel fit into the Estimator nullifies any benefit from model-specific optimizations?
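To make the concern concrete, here is a hypothetical setup (the stages and grid values are arbitrary, and the dataset is a placeholder): when CV's estimator is a Pipeline, each parallel task ends up calling pipeline.fit(dataset, paramMap) with a single ParamMap, so the LinearRegression stage never sees the full maxIter grid it could optimize over internally.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.ParamGridBuilder
import org.apache.spark.sql.DataFrame

// Placeholder: assume a DataFrame with "words" and "label" columns exists.
val dataset: DataFrame = ???

val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LinearRegression()
val pipeline = new Pipeline().setStages(Array(hashingTF, lr))

val paramMaps = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .addGrid(lr.maxIter, Array(10, 50, 100))
  .build()

// What each Future above effectively runs: one full pipeline fit per ParamMap.
// The LinearRegression stage sees exactly one maxIter value per call, so it
// cannot share optimization work across the maxIter grid.
val models = paramMaps.map(paramMap => pipeline.fit(dataset, paramMap))
```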
@MLnick Oh, the design is still under discussion on JIRA and will be changed, I think. I should mark this WIP. Thanks!
@MLnick I discussed this with @jkbradley and @MrBago offline, and here is the newest proposal:
https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing
Thanks!
Design changed. @MrBago will create a new PR for this later. The new design is here: https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing
What changes were proposed in this pull request?
Push the fitting parallelization code down from CrossValidator/TrainValidationSplit into Estimators.
See discussions in SPARK-19357.
Design doc: https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing
Scala API:
Java API:
Note: In either Scala or Java, a developer can use the helper method HasParallelism.getExecutionContext to get the ExecutionContext object that this API requires.
Discussion: Do we need to provide both a Scala and a Java API, or only a Java API? In this PR, I provide both, so on the Scala side a developer can override the Scala API (which is easier to use), and on the Java side a developer can override the Java API.
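For illustration, a caller-side sketch of how tuning code could drive such an API; `fitAllAndWait` and its `fitAll` function parameter are hypothetical stand-ins for the proposed `est.fit(dataset, paramMaps, executionContext)` overload, and the local thread pool stands in for the ExecutionContext that `HasParallelism.getExecutionContext` provides internally:

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.ml.Model
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Dataset

// Hypothetical caller-side sketch of CrossValidator-style tuning code.
// `fitAll` stands in for the proposed multi-ParamMap fit overload.
def fitAllAndWait(
    dataset: Dataset[_],
    paramMaps: Array[ParamMap],
    parallelism: Int,
    fitAll: (Dataset[_], Array[ParamMap], ExecutionContext) => Array[Future[Model[_]]])
    : Array[Model[_]] = {
  val pool = Executors.newFixedThreadPool(parallelism)
  val executionContext = ExecutionContext.fromExecutorService(pool)
  try {
    // Hand the whole grid to the estimator at once, then block for all models.
    val modelFutures = fitAll(dataset, paramMaps, executionContext)
    modelFutures.map(future => Await.result(future, Duration.Inf))
  } finally {
    pool.shutdown()
  }
}
```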
How was this patch tested?
N/A