[SPARK-21087] [ML] CrossValidator, TrainValidationSplit expose sub models after fitting: Scala #19208

WeichenXu123 · 2017-09-12T15:39:37Z

What changes were proposed in this pull request?

We add a parameter that controls whether to collect the full model list during CrossValidator/TrainValidationSplit training (off by default, so that the change does not cause OOM).

Add a method to CrossValidatorModel/TrainValidationSplitModel that allows users to get the model list.
CrossValidatorModelWriter adds an “option” that allows users to control whether to persist the model list to disk (it is persisted by default).
Note: when persisting the model list, indices are used as the sub-model paths.

How was this patch tested?

Test cases added.

SparkQA · 2017-09-12T15:44:40Z

Test build #81685 has finished for PR 19208 at commit 46d3ab3.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

WeichenXu123 · 2017-09-12T15:48:59Z

cc @jkbradley

WeichenXu123 · 2017-09-12T15:50:20Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+  def save(path: String, persistSubModels: Boolean): Unit = {
+    write.asInstanceOf[CrossValidatorModel.CrossValidatorModelWriter]
+      .persistSubModels(persistSubModels).save(path)
+  }


I added this method because the CrossValidatorModelWriter is private, so users cannot use it. But I don't know whether there is a better solution.

I think users can still access CrossValidatorModelWriter through CrossValidatorModel.write, so the save method is unnecessary.

The private[CrossValidatorModel] annotation on the CrossValidatorModelWriter constructor only means that users can't create instances of the class e.g. via new CrossValidatorModel.CrossValidatorModelWriter(...)

I tried model.write.asInstanceOf[CrossValidatorModel.CrossValidatorModelWriter], but it does not compile: the class is inaccessible.
Do you have another way?

Discussion: another way, I think, is to add an interface def option(key: String, value: String) to the Writer. cc @jkbradley

I agree with the last suggestion of adding def option(key: String, value: String) to mimic the SQL datasource API.

WeichenXu123 · 2017-09-12T15:56:22Z

mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala

+      .map { case ParamPair(p, v) =>
+        p.name -> parse(p.jsonEncode(v))
+      }.toList ++ List("estimatorParamMaps" -> parse(estimatorParamMapsJson))
+    )


Improve the code here so that we don't need to add code for each parameter. We now have 3 newly added parameters (parallelism, collectSubModels, persistSubModelPath), all added only in the CV/TVS estimators. The old code here easily causes bugs if we forget to update it when adding new params.

WeichenXu123 · 2017-09-12T16:03:04Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

        .setEstimatorParamMaps(estimatorParamMaps)
-        .setNumFolds(numFolds)
-        .setSeed(seed)
+      DefaultParamsReader.getAndSetParams(cv, metadata, skipParams = List("estimatorParamMaps"))


Use getAndSetParams instead of setting all params manually. This simplifies the code and keeps read/write compatibility.
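To illustrate the idea of restoring every stored param generically while skipping the ones that need special handling, here is a minimal sketch in plain Scala. It does not use Spark: `SketchEstimator`, `GenericParamLoading`, and the string-valued param map are hypothetical stand-ins, and the `getAndSetParams` signature shown here is an assumption modeled on the call in the diff above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an estimator with named params; not a Spark class.
final class SketchEstimator {
  private val values = mutable.Map.empty[String, String]
  val knownParams: Set[String] = Set("numFolds", "seed", "parallelism")

  // Sets a known param and returns this instance so calls can be chained.
  def set(name: String, value: String): this.type = {
    require(knownParams(name), s"Unknown param: $name")
    values(name) = value
    this
  }

  def get(name: String): Option[String] = values.get(name)
}

object GenericParamLoading {
  // Applies every stored param except those in skipParams, so newly added
  // params are picked up without touching this code.
  def getAndSetParams(
      target: SketchEstimator,
      stored: Map[String, String],
      skipParams: Set[String]): Unit = {
    stored.iterator
      .filterNot { case (name, _) => skipParams(name) }
      .foreach { case (name, value) => target.set(name, value) }
  }
}
```

With this shape, a param added later (such as parallelism) is restored without any change to the loading code.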

SparkQA · 2017-09-12T17:15:09Z

Test build #81686 has finished for PR 19208 at commit ae13440.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2017-09-13T20:34:46Z

CC @hhbyyh and @MLnick Does this look reasonable to you?

And @hhbyyh would you want to split off a new JIRA for your original solution of adding an option to dump models to disk? Then we could revive your PR.

WeichenXu123 · 2017-09-13T23:31:43Z

Oh, sorry about that. I integrated @hhbyyh's old PR into this new one because the "dump models to disk" and "collect models" code seems cohesive, and splitting them would cause conflicts when merging. @jkbradley

jkbradley · 2017-09-14T00:48:51Z

Synced offline: I hadn't looked carefully and seen the 2 issues had been merged. @WeichenXu123 said he will split the work in 2, adding one parameter first.

WeichenXu123 · 2017-09-14T06:37:14Z

@jkbradley I split this PR and removed the code for "dump models to disk", so the PR will be smaller and easier to review. Once this PR is merged, I will create a follow-up PR for "dump models to disk". Thanks!

SparkQA · 2017-09-14T07:04:45Z

Test build #81767 has finished for PR 19208 at commit e0f4ce6.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

WeichenXu123 · 2017-09-14T07:32:21Z

Jenkins, test this please.

SparkQA · 2017-09-14T08:44:58Z

Test build #81772 has finished for PR 19208 at commit e0f4ce6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hhbyyh · 2017-09-14T23:54:46Z

It's OK with me to include the "dump model to disk" #18313 in this or another PR (or not).

After reading the discussion, I feel it's overkill to support a feature like this in two ways (keeping in memory and dumping to disk). Allowing the user to register a custom action after each batch of est.fit(trainingDataset, epm) looks like a more general solution to me; there, the user may dump models to disk, collect them for later use, or evaluate with another metric.

If you want to stick with this approach, which I'm not a fan of, I would only suggest adding logic to estimate the memory the models will cost and stopping the application if OOM is foreseeable.

smurching

Thanks for the work, just a couple of comments!

smurching · 2017-09-18T22:25:48Z

mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala

      ParamDesc[Int]("aggregationDepth", "suggested depth for treeAggregate (>= 2)", Some("2"),
-        isValid = "ParamValidators.gtEq(2)", isExpertParam = true))
+        isValid = "ParamValidators.gtEq(2)", isExpertParam = true),
+      ParamDesc[Boolean]("collectSubModels", "whether to collect sub models when tuning fitting",


Suggestion: reword "whether to collect sub models when tuning fitting" --> "whether to collect a list of sub-models trained during tuning"

smurching · 2017-09-18T22:31:13Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala


+    val collectSubModelsParam = $(collectSubModels)
+
+    var subModels: Array[Array[Model[_]]] = if (collectSubModelsParam) {


Perhaps use an Option[Array[Model[_]]] instead of setting subModels to null?
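A minimal sketch of the Option-based alternative, in plain Scala without Spark. `SketchModel` and `SubModelCollection` are hypothetical names used only for illustration, and the two-dimensional shape (folds by param maps) is taken from the declaration in the diff above.

```scala
// Hypothetical stand-in for a fitted model; not Spark's Model[_].
final case class SketchModel(name: String)

object SubModelCollection {
  // Returns Some(buffer) sized numFolds x numParamMaps when collection is
  // enabled, and None otherwise, so callers never see a null array.
  def allocate(
      collect: Boolean,
      numFolds: Int,
      numParamMaps: Int): Option[Array[Array[SketchModel]]] = {
    if (collect) Some(Array.ofDim[SketchModel](numFolds, numParamMaps)) else None
  }

  // Stores a model only when a buffer exists; a no-op when collection is off.
  def record(
      subModels: Option[Array[Array[SketchModel]]],
      fold: Int,
      paramIndex: Int,
      model: SketchModel): Unit = {
    subModels.foreach(buffer => buffer(fold)(paramIndex) = model)
  }
}
```

The call site then needs no null check: `record` simply does nothing when collection was not requested.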

smurching · 2017-09-18T23:51:57Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

  /** A Python-friendly auxiliary constructor. */
  private[ml] def this(uid: String, bestModel: Model[_], avgMetrics: JList[Double]) = {
-    this(uid, bestModel, avgMetrics.asScala.toArray)
+    this(uid, bestModel, avgMetrics.asScala.toArray, null)


See earlier suggestion, use an Option set to None instead of setting the Array to null

smurching · 2017-09-19T00:32:08Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+  def save(path: String, persistSubModels: Boolean): Unit = {
+    write.asInstanceOf[CrossValidatorModel.CrossValidatorModelWriter]
+      .persistSubModels(persistSubModels).save(path)
+  }


I think users can still access CrossValidatorModelWriter through CrossValidatorModel.write, so the save method is unnecessary.

The private[CrossValidatorModel] annotation on the CrossValidatorModelWriter constructor only means that users can't create instances of the class e.g. via new CrossValidatorModel.CrossValidatorModelWriter(...)

smurching · 2017-09-19T01:22:59Z

mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala

+        val subModelsPath = new Path(path, "subModels")
+        for (paramIndex <- 0 until instance.getEstimatorParamMaps.length) {
+          val modelPath = new Path(subModelsPath, paramIndex.toString).toString
+          instance.subModels(paramIndex).asInstanceOf[MLWritable].save(modelPath)


Should we clean up/remove the partially-persisted subModels if any of these save() calls fail? E.g. let's say we have four subModels and the first three save() calls succeed but the fourth fails - should we delete the folders for the first three submodels?

@WeichenXu123 Actually I don't think we have to worry about this; Pipeline persistence doesn't clean up if a stage fails to persist (see Pipeline.scala)

Ah, it's a good point. But currently the model saving code does not have any exception handling. E.g., with overwrite saving, when the save fails, it does not recover the old directory.
I think these things can be done in separate PRs.
cc @jkbradley What's your opinion?

Good question about cleaning up partially saved models. I agree it'd be nice to do in the future, rather than now.

WeichenXu123 · 2017-09-19T01:38:40Z

@smurching Thanks! I will update later. Note that I will separate part of this PR into a new PR (the separated part will be a bugfix for #16774).

WeichenXu123 · 2017-09-20T01:47:18Z

@smurching I will update this PR after #19278 is merged, because this PR now depends on that one. Thanks!

…st/load bug

## What changes were proposed in this pull request?

Currently the param persist/load code of CrossValidator/TrainValidationSplit is hardcoded, which differs from other ML estimators. This causes a persist bug for the newly added `parallelism` param.

I refactored the related code to avoid hardcoding the persist/load params. At the same time, this fixes the `parallelism` persisting bug. This refactoring is very useful because we will add more new params in #19208, and hardcoded param persisting/loading makes adding new params very troublesome.

## How was this patch tested?

Test added.

Author: WeichenXu <[email protected]>

Closes #19278 from WeichenXu123/fix-tuning-param-bug.

WeichenXu123 · 2017-09-26T11:02:49Z

I will update this PR after #19350 gets merged. We need to address another issue first. Thanks!

SparkQA · 2017-09-27T09:51:03Z

Test build #82233 has finished for PR 19208 at commit 77d05f6.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

WeichenXu123 · 2017-09-27T16:05:54Z

cc @smurching code updated, thanks!

SparkQA · 2017-09-27T16:08:19Z

Test build #82246 has finished for PR 19208 at commit e009ee1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley

I'm sending some comments, but I'm not done yet.

One issue is that users will have a hard time discovering the persistSubModels option. I'd recommend we do the following:

Make the CrossValidatorModelWriter (and TVS writer) public, and add Scala doc to them to describe the option.
Override the write method in CrossValidatorModel so that its return type is CrossValidatorModelWriter (rather than a generic MLWriter). That should make it a little easier for users to find the writer option.
Add a note in the setCollectSubModels method about persistSubModels (to help with discoverability).

jkbradley · 2017-11-03T18:17:48Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

-    this(uid, bestModel, avgMetrics, null)
+  private var _subModels: Option[Array[Array[Model[_]]]] = None
+
+  @Since("2.3.0")


Only use Since annotations for public APIs

jkbradley · 2017-11-03T18:18:22Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

  }

+  @Since("2.3.0")
+  def subModels: Array[Array[Model[_]]] = _subModels.get


Let's add Scala doc. We'll need to explain what the inner and outer arrays are and which one corresponds to the ordering of estimatorParamMaps.

Also, can you please add a better Exception message? If submodels are not available, then we should tell users to set the collectSubModels Param before fitting.
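As an illustration of this request, the sketch below shows an accessor whose failure message tells the user what to change. It is plain Scala, not the Spark class: `SketchTuningModel` is a hypothetical name, sub-models are modeled as strings, and the message wording follows the suggestion made later in this review.

```scala
// Hypothetical stand-in for a tuning model; sub-models are modeled as Strings.
final class SketchTuningModel(subModelsOpt: Option[Array[Array[String]]]) {

  def hasSubModels: Boolean = subModelsOpt.isDefined

  /**
   * Outer index: fold index. Inner index: position in estimatorParamMaps.
   * Throws IllegalArgumentException when sub-models were not collected.
   */
  def subModels: Array[Array[String]] = {
    require(subModelsOpt.isDefined, "subModels not available. To retrieve subModels, " +
      "make sure to set collectSubModels to true before fitting.")
    subModelsOpt.get
  }
}
```

Using `require` rather than a bare `.get` turns a generic NoSuchElementException into an IllegalArgumentException that names the param to set.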

jkbradley · 2017-11-03T18:20:00Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

    /**
-     * Set option for persist sub models.
+     * Extra options for CrossValidatorModelWriter, current support "persistSubModels".
+     * if sub models exsit, the default value for option "persistSubModels" is "true".


typo: exsit -> exist

jkbradley · 2017-11-03T18:24:34Z

mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

+   * `option()` handles extra options. If subclasses need to support extra options, override this
+   * method.
+   */
+  @Since("2.3.0")


Rather than overriding this in each subclass, let's have this option() method collect the specified options in a map which is consumed by the subclass when saveImpl() is called.
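The shape of this suggestion can be sketched in plain Scala as follows. `SketchWriter` and `SketchModelWriter` are hypothetical stand-ins rather than Spark's MLWriter, and storing keys lower-cased is an assumption based on the case-insensitivity note and the "persistsubmodels" lookup that appear later in this review.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical stand-in for a writer base class; this is not Spark's MLWriter.
abstract class SketchWriter {
  // Options collected by option(); keys are stored lower-cased.
  protected val optionMap: mutable.Map[String, String] =
    mutable.HashMap.empty[String, String]

  // Records an option and returns this writer so calls can be chained.
  def option(key: String, value: String): this.type = {
    require(key != null && key.nonEmpty, "option key must be non-empty")
    optionMap.put(key.toLowerCase(Locale.ROOT), value)
    this
  }

  // Subclasses consume optionMap here instead of overriding option().
  protected def saveImpl(path: String): Unit

  def save(path: String): Unit = saveImpl(path)
}

// Hypothetical subclass that only records what it would have saved.
final class SketchModelWriter extends SketchWriter {
  var lastSave: Option[(String, Boolean)] = None

  override protected def saveImpl(path: String): Unit = {
    val persist = optionMap.get("persistsubmodels").exists(_.equalsIgnoreCase("true"))
    lastSave = Some((path, persist))
  }
}
```

A caller would then write something like `writer.option("persistSubModels", "true").save(path)`, and only the subclass needs to know which keys it understands.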

jkbradley · 2017-11-03T20:23:47Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

 @Since("1.6.0")
 object CrossValidatorModel extends MLReadable[CrossValidatorModel] {

+  private[CrossValidatorModel] def copySubModels(subModels: Option[Array[Array[Model[_]]]]) = {


style: state return value explicitly

jkbradley · 2017-11-03T20:25:55Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

 object CrossValidatorModel extends MLReadable[CrossValidatorModel] {

+  private[CrossValidatorModel] def copySubModels(subModels: Option[Array[Array[Model[_]]]]) = {
+    subModels.map { subModels =>


Can this be simplified using map?

subModels.map(_.map(_.map(_.copy(...).asInstanceOf[...])))
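A runnable sketch of the nested-map form with an explicit return type, in plain Scala. `CopyableModel`, `copyWith`, and `CopySketch` are hypothetical names; in the real code the innermost call would be the model's own copy method followed by a cast.

```scala
// Hypothetical stand-in for a model that can be copied with extra params.
final case class CopyableModel(name: String, extra: Map[String, String] = Map.empty) {
  def copyWith(more: Map[String, String]): CopyableModel = copy(extra = extra ++ more)
}

object CopySketch {
  // Option.map handles the "not collected" case; the two Array.map calls walk
  // the outer and inner arrays, producing fresh arrays of copied models.
  def copySubModels(
      subModels: Option[Array[Array[CopyableModel]]],
      extra: Map[String, String]): Option[Array[Array[CopyableModel]]] = {
    subModels.map(_.map(_.map(_.copyWith(extra))))
  }
}
```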

jkbradley · 2017-11-03T20:27:42Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

      import org.json4s.JsonDSL._
-      val extraMetadata = "avgMetrics" -> instance.avgMetrics.toSeq
+      val extraMetadata = ("avgMetrics" -> instance.avgMetrics.toSeq) ~
+        ("shouldPersistSubModels" -> shouldPersistSubModels)


Let's have 1 name for this argument: "persistSubModels"

jkbradley · 2017-11-03T20:28:42Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

      val bestModelPath = new Path(path, "bestModel").toString
      instance.bestModel.asInstanceOf[MLWritable].save(bestModelPath)
+      if (shouldPersistSubModels) {
+        require(instance.hasSubModels, "Cannot get sub models to persist.")


This error message may be unclear. How about adding: "When persisting tuning models, you can only set persistSubModels to true if the tuning was done with collectSubModels set to true. To save the sub-models, try rerunning fitting with collectSubModels set to true."

jkbradley · 2017-11-03T20:29:44Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+        require(instance.hasSubModels, "Cannot get sub models to persist.")
+        val subModelsPath = new Path(path, "subModels")
+        for (splitIndex <- 0 until instance.getNumFolds) {
+          val splitPath = new Path(subModelsPath, splitIndex.toString)


How about naming this with the string "fold":
splitIndex.toString --> "fold" + splitIndex.toString?
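To make the resulting directory layout concrete, here is a small sketch using java.nio paths instead of Hadoop's Path. `SubModelPaths` is a hypothetical helper, and placing one directory per param index under each fold directory is an assumption modeled on the TrainValidationSplit snippet earlier in this thread.

```scala
import java.nio.file.{Path, Paths}

object SubModelPaths {
  // Lists one path per (fold, param index) pair under <root>/subModels,
  // using the "fold" prefix for the outer directory.
  def layout(root: String, numFolds: Int, numParamMaps: Int): Seq[Path] = {
    for {
      fold <- 0 until numFolds
      paramIndex <- 0 until numParamMaps
    } yield Paths.get(root, "subModels", s"fold$fold", paramIndex.toString)
  }
}
```

The prefix makes it obvious when browsing the saved model which directory level is the fold and which is the param index.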

jkbradley · 2017-11-03T22:48:38Z

Done with review. I mainly reviewed CrossValidator since some comments will apply to TrainValidationSplit as well. Thanks for the PR!

SparkQA · 2017-11-06T04:21:35Z

Test build #83469 has finished for PR 19208 at commit f2ef609.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley

Some more small comments, thanks!

jkbradley · 2017-11-06T22:04:31Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+  /**
+   * @return submodels represented in two dimension array. The index of outer array is the
+   *         fold index, and the index of inner array corresponds to the ordering of
+   *         estimatorParamsMaps


typo: estimatorParamMaps

jkbradley · 2017-11-06T22:38:00Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+   *         fold index, and the index of inner array corresponds to the ordering of
+   *         estimatorParamsMaps
+   *
+   * Note: If submodels not available, exception will be thrown. only when we set collectSubModels


reword, and use @throws scaladoc:

@throws IllegalArgumentException if subModels are not available. To retrieve subModels, make sure to set collectSubModels to true before fitting.

(Please fix wording in the error message too.)

jkbradley · 2017-11-06T22:40:47Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+   * "persistSubModels" will cause exception.
+   */
+  @Since("2.3.0")
  class CrossValidatorModelWriter(instance: CrossValidatorModel) extends MLWriter {


Although we're making this public, let's not make all of its APIs public. Can you please make the constructor private and make this class final?
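The pattern being asked for can be sketched in plain Scala like this. `TuningModelSketch` and its nested `Writer` are hypothetical names; the point is only that the type is public and final while construction stays restricted to the enclosing object.

```scala
object TuningModelSketch {
  // Public and final: users can name the type and call its methods,
  // but cannot subclass it or construct it directly.
  final class Writer private[TuningModelSketch] (val target: String) {
    private var persist = false

    def persistSubModels(value: Boolean): this.type = {
      persist = value
      this
    }

    def describe: String = s"save $target (persistSubModels=$persist)"
  }

  // The only way to obtain a Writer from outside this object.
  def write(target: String): Writer = new Writer(target)
}
```

Outside `TuningModelSketch`, `new TuningModelSketch.Writer("m")` is rejected by the compiler, while `TuningModelSketch.write("m")` works and returns the specific writer type.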

jkbradley · 2017-11-06T22:44:39Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

+   * @param instance CrossValidatorModel instance used to construct the writer
+   *
+   * Options:
+   * CrossValidatorModelWriter support an option "persistSubModels", available value is


Fix wording:

CrossValidatorModelWriter supports an option "persistSubModels", with possible values "true" or "false". If you set the collectSubModels Param before fitting, then you can set "persistSubModels" to "true" in order to persist the submodels. By default, "persistSubModels" will be "true" when submodels are available and "false" otherwise. If submodels are not available, then setting "persistSubModels" to "true" will cause an exception.

jkbradley · 2017-11-06T22:45:26Z

mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

  /**
-   * `option()` handles extra options. If subclasses need to support extra options, override this
-   * method.
+   * Map store extra options for this writer.


"Map to store"

jkbradley · 2017-11-06T22:46:52Z

mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

+  protected val optionMap: mutable.Map[String, String] = new mutable.HashMap[String, String]()
+
+  /**
+   * `option()` handles extra options.


"Adds an option to the underlying MLWriter. See the documentation for the specific model's writer for possible options. The option name (key) is case-insensitive."

jkbradley · 2017-11-06T22:50:19Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

-    }
-
    override protected def saveImpl(path: String): Unit = {
+      val persistSubModels = optionMap.getOrElse("persistsubmodels",


Please update this so that, when the value is not convertible to a Boolean, the user sees an error message which states the invalid value and the possible valid values.
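One way such validation could look, sketched in plain Scala. `PersistOption` is a hypothetical helper, the options are modeled as an immutable Map, and the exact message wording is an assumption.

```scala
import java.util.Locale

object PersistOption {
  // Reads the lower-cased "persistsubmodels" key. Falls back to `default` when
  // the option is absent, and rejects anything other than true/false with a
  // message that names the bad value and the accepted ones.
  def parse(options: Map[String, String], default: Boolean): Boolean = {
    options.get("persistsubmodels") match {
      case None => default
      case Some(raw) =>
        raw.toLowerCase(Locale.ROOT) match {
          case "true" => true
          case "false" => false
          case _ =>
            throw new IllegalArgumentException(
              s"Invalid value '$raw' for option persistSubModels; " +
                "possible values are 'true' or 'false'.")
        }
    }
  }
}
```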

jkbradley · 2017-11-06T22:55:26Z

mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala

+   * Note: If set this param, when you save the returned model, you can set an option
+   * "persistSubModels" to be "true" before saving, in order to save these submodels.
+   * You can check documents of
+   * {@link org.apache.spark.ml.tuning.CrossValidatorModel.CrossValidatorModelWriter}


I haven't checked through TrainValidationSplit yet, but please do make sure updates to CrossValidator get applied here (and that the updates are checked for copy errors like this line). Thanks!

I have searched to make sure every place is checked.

SparkQA · 2017-11-07T06:22:49Z

Test build #83530 has finished for PR 19208 at commit 7bacfca.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-07T12:23:04Z

Test build #83534 has finished for PR 19208 at commit 654e4d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley

Just a few tiny items left, thanks!

jkbradley · 2017-11-13T18:07:25Z

mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala

+   * If subModels are not available, then setting "persistSubModels" to "true" will cause
+   * an exception.
+   */
+  final class TrainValidationSplitModelWriter private[tuning] (


Since annotation

jkbradley · 2017-11-13T18:07:38Z

mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala


  @Since("2.0.0")
-  override def write: MLWriter = new TrainValidationSplit.TrainValidationSplitWriter(this)
+  override def write: TrainValidationSplit.TrainValidationSplitWriter = {


Was this meant to be for TrainValidationSplitModel, not the Estimator?

Ah, there are two write methods, one for the Estimator and another for the Model.
We only need to change the return type of the write method for the Model.

jkbradley · 2017-11-13T18:13:42Z

mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala

+    val eval = new BinaryClassificationEvaluator
+    val numFolds = 3
+    val subPath = new File(tempDir, "testCrossValidatorSubModels")
+    val persistSubModelsPath = new File(subPath, "subModels").toString


jkbradley

2 more comments about backwards compatibility. Would you mind testing this manually, saving a model from spark 2.2 and then loading it with a build of this PR?

jkbradley · 2017-11-13T18:16:02Z

mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

      val bestModelPath = new Path(path, "bestModel").toString
      val bestModel = DefaultParamsReader.loadParamsInstance[Model[_]](bestModelPath, sc)
      val avgMetrics = (metadata.metadata \ "avgMetrics").extract[Seq[Double]].toArray
+      val shouldPersistSubModels = (metadata.metadata \ "persistSubModels").extract[Boolean]


I realized this will not be backwards compatible. Let's make this persistSubModels optional so that we assume it is false if it is not in the metadata.
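The compatibility rule described here can be sketched without JSON machinery. In the sketch below the metadata is modeled as a plain `Map[String, String]`, which is a simplification of the JSON metadata in the diff, and `MetadataCompat` is a hypothetical name.

```scala
object MetadataCompat {
  // Reads "persistSubModels" from the metadata. Models saved before the field
  // existed have no such key, so absence is treated as false.
  def persistSubModels(metadata: Map[String, String]): Boolean =
    metadata.get("persistSubModels").exists(_.toBoolean)
}
```

With this rule, loading metadata written by an older version yields false instead of failing on the missing field.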

jkbradley · 2017-11-13T18:16:35Z

mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala

      val bestModelPath = new Path(path, "bestModel").toString
      val bestModel = DefaultParamsReader.loadParamsInstance[Model[_]](bestModelPath, sc)
      val validationMetrics = (metadata.metadata \ "validationMetrics").extract[Seq[Double]].toArray
+      val shouldPersistSubModels = (metadata.metadata \ "persistSubModels").extract[Boolean]


Same here; let's make this optional

SparkQA · 2017-11-14T05:14:15Z

Test build #83823 has finished for PR 19208 at commit 2bb6835.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

WeichenXu123 · 2017-11-14T07:45:45Z

I manually tested backwards compatibility and it works fine. I'm pasting the test code for CrossValidator here.

Run the following code in the spark-2.2 shell first:

import java.io.File
import org.apache.spark.ml.tuning._
import org.apache.spark.ml.{Estimator, Model, Pipeline}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel, OneVsRest}
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.types.StructType
import org.apache.spark.ml.evaluation.{BinaryClassificationEvaluator, Evaluator, MulticlassClassificationEvaluator, RegressionEvaluator}
import org.apache.spark.ml.feature.{Instance, LabeledPoint}

def generateLogisticInput(offset: Double, scale: Double, nPoints: Int, seed: Int): Seq[LabeledPoint] = {
  val rnd = new java.util.Random(seed)
  val x1 = Array.fill[Double](nPoints)(rnd.nextGaussian())
  val y = (0 until nPoints).map { i =>
    val p = 1.0 / (1.0 + math.exp(-(offset + scale * x1(i))))
    if (rnd.nextDouble() < p) 1.0 else 0.0
  }
  val testData = (0 until nPoints).map(i => LabeledPoint(y(i), Vectors.dense(Array(x1(i)))))
  testData
}
import spark.implicits._
val dataset = sc.parallelize(generateLogisticInput(0.0, 1.0, 10, 42), 2).toDF()
val lr = new LogisticRegression
val lrParamMaps = new ParamGridBuilder().addGrid(lr.regParam, Array(0.001, 1000.0)).addGrid(lr.maxIter, Array(0, 3)).build()
val eval = new BinaryClassificationEvaluator
val numFolds = 3
val cv = new CrossValidator().setEstimator(lr).setEstimatorParamMaps(lrParamMaps).setEvaluator(eval).setNumFolds(numFolds)
val cvModel = cv.fit(dataset)
cvModel.save("file:///Users/weichenxu/work/test/s1")

Then run the following code on a build of the current PR (in spark-shell):

val model = org.apache.spark.ml.tuning.CrossValidatorModel.load("file:///Users/weichenxu/work/test/s1")
model.hasSubModels

SparkQA · 2017-11-14T08:05:02Z

Test build #83824 has finished for PR 19208 at commit 7e997da.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

WeichenXu123 · 2017-11-14T08:07:16Z

Jenkins, test this please.

SparkQA · 2017-11-14T10:51:43Z

Test build #83835 has finished for PR 19208 at commit 7e997da.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-14T11:25:26Z

Test build #83834 has finished for PR 19208 at commit 7e997da.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2017-11-15T00:47:54Z

Awesome, thanks for the updates and for checking backwards compatibility!
LGTM
Merging with master

init pr

46d3ab3

WeichenXu123 mentioned this pull request Sep 12, 2017

[SPARK-19357][ML] Adding parallel model evaluation in ML tuning #16774

Closed

WeichenXu123 mentioned this pull request Sep 12, 2017

[SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala #18313

Closed

fix style

ae13440

remove code for dump models to disk

e0f4ce6

WeichenXu123 changed the title ~~[SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala~~ [SPARK-21087] [ML] CrossValidator, TrainValidationSplit expose sub models after fitting: Scala Sep 14, 2017

smurching reviewed Sep 19, 2017

WeichenXu123 mentioned this pull request Sep 19, 2017

[SPARK-22060][ML] Fix CrossValidator/TrainValidationSplit param persist/load bug #19278

Closed

WeichenXu123 added 2 commits September 27, 2017 21:54

merge master & resolve conflicts

a33c4ea

address comment issues

e009ee1

WeichenXu123 force-pushed the expose-model-list branch from 77d05f6 to e009ee1 Compare September 27, 2017 14:59

jkbradley reviewed Nov 3, 2017

address comments from joseph

f2ef609

jkbradley reviewed Nov 6, 2017

address comments from joseph

7bacfca

fix mima

654e4d5

jkbradley reviewed Nov 13, 2017

address minor issues

7e997da

WeichenXu123 force-pushed the expose-model-list branch from 2bb6835 to 7e997da Compare November 14, 2017 05:23

asfgit closed this in 7743980 Nov 15, 2017

WeichenXu123 deleted the expose-model-list branch November 15, 2017 04:05

yanboliang mentioned this pull request Dec 12, 2017

[SPARK-21087] [ML] [FOLLOWUP] Sync SharedParamsCodeGen and sharedParams. #19958

Closed

