[SPARK-18324][ML][DOC] Update ML programming and migration guide for 2.1 release #16076

yanboliang · 2016-11-30T06:09:53Z

What changes were proposed in this pull request?

Update ML programming and migration guide for 2.1 release.

How was this patch tested?

Doc change, no test.

SparkQA · 2016-11-30T06:32:28Z

Test build #69385 has finished for PR 16076 at commit f3f1fc3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-11-30T11:27:55Z

cc @jkbradley @MLnick @sethah

jkbradley

Just a few comments.

One more: We need to note [https://issues.apache.org/jira/browse/SPARK-18291] or revert that change in order to avoid breaking APIs. CC @mengxr

jkbradley · 2016-11-30T22:48:44Z

docs/ml-guide.md

- In `spark.ml.util.MLReader` and `spark.ml.util.MLWriter`, the `context` method has been deprecated in favor of `session`.
-* In `spark.ml.feature.ChiSqSelectorModel`, the `setLabelCol` method has been deprecated since it was not used by `ChiSqSelectorModel`.
+* [SPARK-18592](https://issues.apache.org/jira/browse/SPARK-18592):
+  Deprecate all setter methods for `DecisionTreeClassificationModel`, `GBTClassificationModel`, `RandomForestClassificationModel`, `DecisionTreeRegressionModel`, `GBTRegressionModel` and `RandomForestRegressionModel`


"all setter methods" --> "all Param setter methods except for input/output column Params"

jkbradley · 2016-11-30T22:51:06Z

docs/ml-guide.md

+* `numTrees` in `classification.RandomForestClassificationModel` (This now refers to the Param called `numTrees`)
+* `numTrees` in `regression.RandomForestRegressionModel` (This now refers to the Param called `numTrees`)
+* `model` in `regression.LinearRegressionSummary`
+* `validateParams` in `PipelineStage`


Also: validateParams in Evaluator

jkbradley · 2016-11-30T22:54:53Z

docs/ml-guide.md

+ `KMeans` returns potentially fewer than k cluster centers in cases where k distinct centroids aren't available or aren't selected.
+* [SPARK-17389](https://issues.apache.org/jira/browse/SPARK-17389):
+ `KMeans` reduces the default number of steps from 5 to 2 for the k-means|| initialization mode.
+* [SPARK-18481](https://issues.apache.org/jira/browse/SPARK-18481):


This should go under breaking changes. I'm starting to wonder if we should revert this one change. It doesn't gain us much.

I moved it to breaking changes session. I think the Params setter methods are rarely used, since it takes no effect and should not be exposed for model classes (This is also why we deprecate them from 2.1). Even they are used, it's no breaking changes for Scala users. And for Java users, they always should make forced type conversion, such as:

RandomForestClassificationModel model = (RandomForestClassificationModel)rf.fit(dataFrame).setFeatureSubsetStrategy("sqrt");

So I think it has little affect and actually no breaking happens, so I think it's OK. But I'm open to hear your thoughts.

Ah, you're right. I was thinking the "final" change was breaking for Scala, but it's not since the model constructor is private. I'm OK with moving those back to changes of behavior, or just removing these items completely.

OK, I remove these completely. Thanks.

jkbradley · 2016-11-30T22:56:24Z

docs/ml-guide.md

- `QuantileDiscretizer` now uses `spark.sql.DataFrameStatFunctions.approxQuantile` to find splits (previously used custom sampling logic).
- The output buckets will differ for same input data and params.
+* [SPARK-17870](https://issues.apache.org/jira/browse/SPARK-17870):
+ Fix a bug of `ChiSqSelector` which will likely change its result.


Add the more specific text from the JIRA: "ChiSquareSelector use pValue rather than raw statistic for SelectKBest features."

SparkQA · 2016-12-01T05:45:52Z

Test build #69456 has finished for PR 16076 at commit e784901.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-12-01T05:57:21Z

@jkbradley I addressed other comments except for SPARK-18291, since I think it's SparkR related issue and should be listed at SparkR session of migration guide. I'm preparing a PR to update SparkR migration guide. Thanks.

jkbradley · 2016-12-01T21:37:59Z

Sounds good about SPARK-18291.

I responded inline above about SPARK-18481. Apart from this update, this looks ready to me. Thank you!

SparkQA · 2016-12-02T08:56:54Z

Test build #69559 has finished for PR 16076 at commit f583653.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-12-03T00:27:33Z

LGTM
I'll merge this with master and branch-2.1
Thank you!

…2.1 release ## What changes were proposed in this pull request? Update ML programming and migration guide for 2.1 release. ## How was this patch tested? Doc change, no test. Author: Yanbo Liang <[email protected]> Closes #16076 from yanboliang/spark-18324. (cherry picked from commit 2dc0d7e) Signed-off-by: Joseph K. Bradley <[email protected]>

…2.1 release ## What changes were proposed in this pull request? Update ML programming and migration guide for 2.1 release. ## How was this patch tested? Doc change, no test. Author: Yanbo Liang <[email protected]> Closes apache#16076 from yanboliang/spark-18324.

Programming guide update and migration guide

f3f1fc3

yanboliang changed the title ~~[SPARK-18324][ML][DOC] ML programming guide update and migration guide for 2.1 release~~ [SPARK-18324][ML][DOC] Update ML programming and migration guide for 2.1 release Nov 30, 2016

jkbradley reviewed Nov 30, 2016

View reviewed changes

Address comments.

e784901

Remove SPARK-18481 from breaking changes.

f583653

asfgit closed this in 2dc0d7e Dec 3, 2016

yanboliang deleted the spark-18324 branch December 3, 2016 05:15

[SPARK-18324][ML][DOC] Update ML programming and migration guide for 2.1 release #16076

[SPARK-18324][ML][DOC] Update ML programming and migration guide for 2.1 release #16076

Uh oh!

Conversation

yanboliang commented Nov 30, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

SparkQA commented Nov 30, 2016

Uh oh!

yanboliang commented Nov 30, 2016

Uh oh!

jkbradley left a comment

Choose a reason for hiding this comment

Uh oh!

jkbradley Nov 30, 2016

Choose a reason for hiding this comment

Uh oh!

jkbradley Nov 30, 2016

Choose a reason for hiding this comment

Uh oh!

jkbradley Nov 30, 2016

Choose a reason for hiding this comment

Uh oh!

yanboliang Dec 1, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkbradley Dec 1, 2016

Choose a reason for hiding this comment

Uh oh!

yanboliang Dec 2, 2016

Choose a reason for hiding this comment

Uh oh!

jkbradley Nov 30, 2016

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Dec 1, 2016

Uh oh!

yanboliang commented Dec 1, 2016

Uh oh!

jkbradley commented Dec 1, 2016

Uh oh!

SparkQA commented Dec 2, 2016

Uh oh!

jkbradley commented Dec 3, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yanboliang commented Nov 30, 2016 •

edited

Loading

yanboliang Dec 1, 2016 •

edited

Loading