Commit 161a3f2

Author: Nick Pentreath
[SPARK-23112][DOC] Update ML migration guide with breaking and behavior changes.
Add breaking changes, as well as update behavior changes, to `2.3` ML migration guide.

## How was this patch tested?

Doc only

Author: Nick Pentreath <[email protected]>

Closes #20421 from MLnick/SPARK-23112-ml-guide.
1 parent 695f714 commit 161a3f2

1 file changed

+19
-2
lines changed

docs/ml-guide.md

Lines changed: 19 additions & 2 deletions
@@ -108,7 +108,13 @@ and the migration guide below will explain all changes between releases.

 ### Breaking changes

-There are no breaking changes.
+* The class and trait hierarchy for logistic regression model summaries was changed to be cleaner
+and better accommodate the addition of the multi-class summary. This is a breaking change for user
+code that casts a `LogisticRegressionTrainingSummary` to a
+`BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary`
+method. See [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139) for more detail
+(_note_ this is an `Experimental` API). This _does not_ affect the Python `summary` method, which
+will still work correctly for both multinomial and binary cases.

 ### Deprecations and changes of behavior

@@ -123,8 +129,19 @@ new [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator)
 **Changes of behavior**

 * [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027):
-We are now setting the default parallelism used in `OneVsRest` to be 1 (i.e. serial). In 2.2 and
+The default parallelism used in `OneVsRest` is now set to 1 (i.e. serial). In `2.2` and
 earlier versions, the level of parallelism was set to the default threadpool size in Scala.
+* [SPARK-22156](https://issues.apache.org/jira/browse/SPARK-22156):
+The learning rate update for `Word2Vec` was incorrect when `numIterations` was set greater than
+`1`. This will cause training results to be different between `2.3` and earlier versions.
+* [SPARK-21681](https://issues.apache.org/jira/browse/SPARK-21681):
+Fixed an edge case bug in multinomial logistic regression that resulted in incorrect coefficients
+when some features had zero variance.
+* [SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957):
+Tree algorithms now use mid-points for split values. This may change results from model training.
+* [SPARK-14657](https://issues.apache.org/jira/browse/SPARK-14657):
+Fixed an issue where the features generated by `RFormula` without an intercept were inconsistent
+with the output in R. This may change results from model training in this scenario.

 ## Previous Spark versions
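
Reviewer note on the SPARK-16957 entry in the diff above ("Tree algorithms now use mid-points for split values"): the sketch below is a plain-Python illustration of why that change can alter trained models. It is not Spark's implementation. The function name `candidate_thresholds` is hypothetical, and the non-mid-point branch (a threshold placed on an observed value) is an assumed simplification of the earlier behavior. Spark also chooses its candidates from sampled, binned data rather than from every distinct value.

```python
def candidate_thresholds(values, use_midpoints=True):
    """Return candidate split thresholds for one continuous feature.

    Illustration only (hypothetical helper, not a Spark API).
    """
    distinct = sorted(set(values))
    if use_midpoints:
        # Threshold halfway between each pair of adjacent distinct values.
        return [(lo + hi) / 2.0 for lo, hi in zip(distinct, distinct[1:])]
    # Assumed simplification of the older behavior: the threshold sits on
    # the lower observed value of each adjacent pair.
    return distinct[:-1]


def goes_left(x, threshold):
    """A value is routed to the left child when it is <= the threshold."""
    return x <= threshold


feature = [1.0, 2.0, 2.0, 4.0, 7.0]
midpoint_splits = candidate_thresholds(feature)
observed_splits = candidate_thresholds(feature, use_midpoints=False)
print(midpoint_splits)
print(observed_splits)

# Both choices partition the training values identically, but an unseen
# value between two observed values is routed differently.
print(goes_left(1.2, midpoint_splits[0]), goes_left(1.2, observed_splits[0]))
```

In this sketch the two strategies separate the training values in the same way, but a value such as `1.2` falls on different sides of the first split. That is one way thresholds and predictions can differ between versions.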
