Commit d468715

Create a Migration Guide tab in Spark documentation
1 parent 7f36cd2 commit d468715

17 files changed

+1295
-1158
lines changed

docs/_data/menu-migration.yaml

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
- text: Spark Core
2+
url: core-migration-guide.html
3+
- text: SQL, Datasets and DataFrame
4+
url: sql-migration-guide.html
5+
- text: Structured Streaming
6+
url: ss-migration-guide.html
7+
- text: MLlib (Machine Learning)
8+
url: ml-migration-guide.html
9+
- text: PySpark (Python on Spark)
10+
url: pyspark-migration-guide.html
11+
- text: SparkR (R on Spark)
12+
url: sparkr-migration-guide.html

docs/_data/menu-sql.yaml

Lines changed: 1 addition & 9 deletions
@@ -64,15 +64,7 @@
6464
- text: Usage Notes
6565
url: sql-pyspark-pandas-with-arrow.html#usage-notes
6666
- text: Migration Guide
67-
url: sql-migration-guide.html
68-
subitems:
69-
- text: Spark SQL Upgrading Guide
70-
url: sql-migration-guide-upgrade.html
71-
- text: Compatibility with Apache Hive
72-
url: sql-migration-guide-hive-compatibility.html
73-
- text: SQL Reserved/Non-Reserved Keywords
74-
url: sql-reserved-and-non-reserved-keywords.html
75-
67+
url: sql-migration-old.html
7668
- text: SQL Reference
7769
url: sql-ref.html
7870
subitems:
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
<div class="left-menu-wrapper">
2+
<div class="left-menu">
3+
<h3><a href="migration-guide.html">Migration Guide</a></h3>
4+
{% include nav-left.html nav=include.nav-migration %}
5+
</div>
6+
</div>

docs/_layouts/global.html

Lines changed: 5 additions & 2 deletions
@@ -112,6 +112,7 @@
112112
<li><a href="job-scheduling.html">Job Scheduling</a></li>
113113
<li><a href="security.html">Security</a></li>
114114
<li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
115+
<li><a href="migration-guide.html">Migration Guide</a></li>
115116
<li class="divider"></li>
116117
<li><a href="building-spark.html">Building Spark</a></li>
117118
<li><a href="https://spark.apache.org/contributing.html">Contributing to Spark</a></li>
@@ -126,8 +127,10 @@
126127

127128
<div class="container-wrapper">
128129

129-
{% if page.url contains "/ml" or page.url contains "/sql" %}
130-
{% if page.url contains "/ml" %}
130+
{% if page.url contains "/ml" or page.url contains "/sql" or page.url contains "migration-guide.html" %}
131+
{% if page.url contains "migration-guide.html" %}
132+
{% include nav-left-wrapper-migration.html nav-migration=site.data.menu-migration %}
133+
{% elsif page.url contains "/ml" %}
131134
{% include nav-left-wrapper-ml.html nav-mllib=site.data.menu-mllib nav-ml=site.data.menu-ml %}
132135
{% else %}
133136
{% include nav-left-wrapper-sql.html nav-sql=site.data.menu-sql %}
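
Reviewer note: the branch order in the hunk above matters, because a URL such as `ml-migration-guide.html` matches both the `/ml` test and the `migration-guide.html` test. The following is a minimal Python sketch of that selection logic, not the Jekyll implementation. The function name `select_left_nav` and the sample URLs are hypothetical, and returning `None` when the outer condition fails is an assumption, since the hunk does not show that branch.

```python
from typing import Optional


def select_left_nav(page_url: str) -> Optional[str]:
    """Mirror the order of the Liquid `contains` checks shown in the hunk.

    Liquid's `contains` on a string is a substring test, so `in` is used here.
    """
    shows_left_nav = (
        "/ml" in page_url
        or "/sql" in page_url
        or "migration-guide.html" in page_url
    )
    if not shows_left_nav:
        # Assumption: the hunk does not show what the outer else branch does.
        return None

    # The migration check runs first, so component migration pages such as
    # "/ml-migration-guide.html" get the migration menu, not the ML menu.
    if "migration-guide.html" in page_url:
        return "nav-left-wrapper-migration.html"
    elif "/ml" in page_url:
        return "nav-left-wrapper-ml.html"
    else:
        return "nav-left-wrapper-sql.html"


if __name__ == "__main__":
    for url in ("/ml-guide.html", "/ml-migration-guide.html", "/sql-ref.html"):
        print(url, "->", select_left_nav(url))
```

If the `/ml` test came first, the MLlib and SQL migration pages would presumably keep their old component menus instead of the new migration menu.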

docs/core-migration-guide.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
---
2+
layout: global
3+
title: "Migration Guide: Spark Core"
4+
displayTitle: "Migration Guide: Spark Core"
5+
license: |
6+
Licensed to the Apache Software Foundation (ASF) under one or more
7+
contributor license agreements. See the NOTICE file distributed with
8+
this work for additional information regarding copyright ownership.
9+
The ASF licenses this file to You under the Apache License, Version 2.0
10+
(the "License"); you may not use this file except in compliance with
11+
the License. You may obtain a copy of the License at
12+
13+
http://www.apache.org/licenses/LICENSE-2.0
14+
15+
Unless required by applicable law or agreed to in writing, software
16+
distributed under the License is distributed on an "AS IS" BASIS,
17+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18+
See the License for the specific language governing permissions and
19+
limitations under the License.
20+
---
21+
22+
* Table of contents
23+
{:toc}
24+
25+
## Upgrading from Core 2.4 to 3.0
26+
27+
- In Spark 3.0, the deprecated method `TaskContext.isRunningLocally` has been removed. Local execution had already been removed, so the method always returned `false`.
28+
29+
- In Spark 3.0, the deprecated methods `shuffleBytesWritten`, `shuffleWriteTime`, and `shuffleRecordsWritten` in `ShuffleWriteMetrics` have been removed. Use `bytesWritten`, `writeTime`, and `recordsWritten` instead, respectively.
30+
31+
- In Spark 3.0, the deprecated method `AccumulableInfo.apply` has been removed because creating `AccumulableInfo` is disallowed.
32+
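
Reviewer note: the `ShuffleWriteMetrics` renames listed above are one-to-one, so they can be applied mechanically. The following is a minimal sketch of a text-based rewrite, assuming a purely textual substitution is acceptable for the code base in question. The helper `rewrite_removed_metrics` is hypothetical and not part of Spark; only the three old and new method names come from the guide.

```python
import re

# Old name -> replacement, as listed in the migration guide entry.
RENAMED_METRICS = {
    "shuffleBytesWritten": "bytesWritten",
    "shuffleWriteTime": "writeTime",
    "shuffleRecordsWritten": "recordsWritten",
}

# Match whole identifiers only, so longer names containing these are untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMED_METRICS) + r")\b"
)


def rewrite_removed_metrics(source: str) -> str:
    """Replace the removed method names with their documented successors."""
    return _PATTERN.sub(lambda match: RENAMED_METRICS[match.group(1)], source)


if __name__ == "__main__":
    line = "val written = metrics.shuffleBytesWritten + metrics.shuffleRecordsWritten"
    print(rewrite_removed_metrics(line))
```

Because this works on text rather than on resolved symbols, it would also rename unrelated identifiers that happen to share these names, so the result should be reviewed and compiled before committing.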

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -146,6 +146,7 @@ options for deployment:
146146
* Integration with other storage systems:
147147
* [Cloud Infrastructures](cloud-integration.html)
148148
* [OpenStack Swift](storage-openstack-swift.html)
149+
* [Migration Guide](migration-guide.html): Migration guides for Spark components
149150
* [Building Spark](building-spark.html): build Spark using the Maven system
150151
* [Contributing to Spark](https://spark.apache.org/contributing.html)
151152
* [Third Party Projects](https://spark.apache.org/third-party-projects.html): related third party Spark projects

docs/migration-guide.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
---
2+
layout: global
3+
title: Migration Guide
4+
displayTitle: Migration Guide
5+
license: |
6+
Licensed to the Apache Software Foundation (ASF) under one or more
7+
contributor license agreements. See the NOTICE file distributed with
8+
this work for additional information regarding copyright ownership.
9+
The ASF licenses this file to You under the Apache License, Version 2.0
10+
(the "License"); you may not use this file except in compliance with
11+
the License. You may obtain a copy of the License at
12+
13+
http://www.apache.org/licenses/LICENSE-2.0
14+
15+
Unless required by applicable law or agreed to in writing, software
16+
distributed under the License is distributed on an "AS IS" BASIS,
17+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18+
See the License for the specific language governing permissions and
19+
limitations under the License.
20+
---
21+
22+
This page links to the migration guide for each component to help
23+
users migrate effectively.
24+
25+
* [Spark Core](core-migration-guide.html)
26+
* [SQL, Datasets, and DataFrame](sql-migration-guide.html)
27+
* [Structured Streaming](ss-migration-guide.html)
28+
* [MLlib (Machine Learning)](ml-migration-guide.html)
29+
* [PySpark (Python on Spark)](pyspark-migration-guide.html)
30+
* [SparkR (R on Spark)](sparkr-migration-guide.html)

docs/ml-guide.md

Lines changed: 2 additions & 63 deletions
@@ -113,68 +113,7 @@ transforming multiple columns.
113113
* Robust linear regression with Huber loss
114114
([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
115115

116-
# Migration guide
116+
# Migration Guide
117117

118-
MLlib is under active development.
119-
The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
120-
and the migration guide below will explain all changes between releases.
118+
The migration guide is now archived [on this page](ml-migration-guide.html).
121119

122-
## From 2.4 to 3.0
123-
124-
### Breaking changes
125-
126-
* `OneHotEncoder` which is deprecated in 2.3, is removed in 3.0 and `OneHotEncoderEstimator` is now renamed to `OneHotEncoder`.
127-
128-
### Changes of behavior
129-
130-
* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
131-
In Spark 2.4 and previous versions, when specifying `frequencyDesc` or `frequencyAsc` as
132-
`stringOrderType` param in `StringIndexer`, in case of equal frequency, the order of
133-
strings is undefined. Since Spark 3.0, the strings with equal frequency are further
134-
sorted by alphabet. And since Spark 3.0, `StringIndexer` supports encoding multiple
135-
columns.
136-
137-
## From 2.2 to 2.3
138-
139-
### Breaking changes
140-
141-
* The class and trait hierarchy for logistic regression model summaries was changed to be cleaner
142-
and better accommodate the addition of the multi-class summary. This is a breaking change for user
143-
code that casts a `LogisticRegressionTrainingSummary` to a
144-
`BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary`
145-
method. See [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139) for more detail
146-
(_note_ this is an `Experimental` API). This _does not_ affect the Python `summary` method, which
147-
will still work correctly for both multinomial and binary cases.
148-
149-
### Deprecations and changes of behavior
150-
151-
**Deprecations**
152-
153-
* `OneHotEncoder` has been deprecated and will be removed in `3.0`. It has been replaced by the
154-
new [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator)
155-
(see [SPARK-13030](https://issues.apache.org/jira/browse/SPARK-13030)). **Note** that
156-
`OneHotEncoderEstimator` will be renamed to `OneHotEncoder` in `3.0` (but
157-
`OneHotEncoderEstimator` will be kept as an alias).
158-
159-
**Changes of behavior**
160-
161-
* [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027):
162-
The default parallelism used in `OneVsRest` is now set to 1 (i.e. serial). In `2.2` and
163-
earlier versions, the level of parallelism was set to the default threadpool size in Scala.
164-
* [SPARK-22156](https://issues.apache.org/jira/browse/SPARK-22156):
165-
The learning rate update for `Word2Vec` was incorrect when `numIterations` was set greater than
166-
`1`. This will cause training results to be different between `2.3` and earlier versions.
167-
* [SPARK-21681](https://issues.apache.org/jira/browse/SPARK-21681):
168-
Fixed an edge case bug in multinomial logistic regression that resulted in incorrect coefficients
169-
when some features had zero variance.
170-
* [SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957):
171-
Tree algorithms now use mid-points for split values. This may change results from model training.
172-
* [SPARK-14657](https://issues.apache.org/jira/browse/SPARK-14657):
173-
Fixed an issue where the features generated by `RFormula` without an intercept were inconsistent
174-
with the output in R. This may change results from model training in this scenario.
175-
176-
## Previous Spark versions
177-
178-
Earlier migration guides are archived [on this page](ml-migration-guides.html).
179-
180-
---

docs/ml-migration-guides.md renamed to docs/ml-migration-guide.md

Lines changed: 82 additions & 14 deletions
@@ -1,8 +1,7 @@
11
---
22
layout: global
3-
title: Old Migration Guides - MLlib
4-
displayTitle: Old Migration Guides - MLlib
5-
description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
3+
title: "Migration Guide: MLlib (Machine Learning)"
4+
displayTitle: "Migration Guide: MLlib (Machine Learning)"
65
license: |
76
Licensed to the Apache Software Foundation (ASF) under one or more
87
contributor license agreements. See the NOTICE file distributed with
@@ -20,15 +19,80 @@ license: |
2019
limitations under the License.
2120
---
2221

23-
The migration guide for the current Spark version is kept on the [MLlib Guide main page](ml-guide.html#migration-guide).
22+
* Table of contents
23+
{:toc}
2424

25-
## From 2.1 to 2.2
25+
Note that this migration guide describes the items specific to MLlib.
26+
Many items of SQL migration can be applied when migrating MLlib to higher versions
27+
for DataFrame-based APIs. Please refer to [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.html).
28+
29+
## Upgrading from MLlib 2.4 to 3.0
30+
31+
### Breaking changes
32+
{:.no_toc}
33+
34+
* `OneHotEncoder`, which was deprecated in 2.3, is removed in 3.0, and `OneHotEncoderEstimator` is renamed to `OneHotEncoder`.
35+
36+
### Changes of behavior
37+
{:.no_toc}
38+
39+
* [SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215):
40+
In Spark 2.4 and previous versions, when specifying `frequencyDesc` or `frequencyAsc` as
41+
`stringOrderType` param in `StringIndexer`, in case of equal frequency, the order of
42+
strings is undefined. Since Spark 3.0, the strings with equal frequency are further
43+
sorted alphabetically. Also since Spark 3.0, `StringIndexer` supports encoding multiple
44+
columns.
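
Reviewer note: the tie-breaking rule described in the SPARK-11215 entry can be illustrated without Spark. The following is a minimal pure-Python sketch of the `frequencyDesc` ordering, assuming ties are broken in ascending alphabetical order and using Python's default string comparison, which may differ from Spark's for some non-ASCII labels. The function name `frequency_desc_labels` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def frequency_desc_labels(values: Iterable[str]) -> List[str]:
    """Order labels by descending frequency, then alphabetically for ties.

    The position of a label in the returned list corresponds to the index
    an indexer following this rule would assign to it.
    """
    counts = Counter(values)
    # Sort key: negative count gives descending frequency; the label itself
    # makes the order deterministic when two labels have the same count.
    return sorted(counts, key=lambda label: (-counts[label], label))


if __name__ == "__main__":
    # "a" and "b" both occur twice, so the alphabetical rule decides.
    print(frequency_desc_labels(["b", "a", "c", "a", "b"]))
```

With the pre-3.0 behavior described in the entry, the relative order of `a` and `b` in this input would be undefined, which is why index assignments could vary between runs.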
45+
46+
## Upgrading from MLlib 2.2 to 2.3
47+
48+
### Breaking changes
49+
{:.no_toc}
50+
51+
* The class and trait hierarchy for logistic regression model summaries was changed to be cleaner
52+
and better accommodate the addition of the multi-class summary. This is a breaking change for user
53+
code that casts a `LogisticRegressionTrainingSummary` to a
54+
`BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary`
55+
method. See [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139) for more detail
56+
(_note_ this is an `Experimental` API). This _does not_ affect the Python `summary` method, which
57+
will still work correctly for both multinomial and binary cases.
58+
59+
### Deprecations and changes of behavior
60+
{:.no_toc}
61+
62+
**Deprecations**
63+
64+
* `OneHotEncoder` has been deprecated and will be removed in `3.0`. It has been replaced by the
65+
new [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator)
66+
(see [SPARK-13030](https://issues.apache.org/jira/browse/SPARK-13030)). **Note** that
67+
`OneHotEncoderEstimator` will be renamed to `OneHotEncoder` in `3.0` (but
68+
`OneHotEncoderEstimator` will be kept as an alias).
69+
70+
**Changes of behavior**
71+
72+
* [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027):
73+
The default parallelism used in `OneVsRest` is now set to 1 (i.e. serial). In `2.2` and
74+
earlier versions, the level of parallelism was set to the default threadpool size in Scala.
75+
* [SPARK-22156](https://issues.apache.org/jira/browse/SPARK-22156):
76+
The learning rate update for `Word2Vec` was incorrect when `numIterations` was set greater than
77+
`1`. This will cause training results to be different between `2.3` and earlier versions.
78+
* [SPARK-21681](https://issues.apache.org/jira/browse/SPARK-21681):
79+
Fixed an edge case bug in multinomial logistic regression that resulted in incorrect coefficients
80+
when some features had zero variance.
81+
* [SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957):
82+
Tree algorithms now use mid-points for split values. This may change results from model training.
83+
* [SPARK-14657](https://issues.apache.org/jira/browse/SPARK-14657):
84+
Fixed an issue where the features generated by `RFormula` without an intercept were inconsistent
85+
with the output in R. This may change results from model training in this scenario.
86+
87+
## Upgrading from MLlib 2.1 to 2.2
2688

2789
### Breaking changes
90+
{:.no_toc}
2891

2992
There are no breaking changes.
3093

3194
### Deprecations and changes of behavior
95+
{:.no_toc}
3296

3397
**Deprecations**
3498

@@ -45,9 +109,10 @@ There are no deprecations.
45109
`StringIndexer` now handles `NULL` values in the same way as unseen values. Previously an exception
46110
would always be thrown regardless of the setting of the `handleInvalid` parameter.
47111

48-
## From 2.0 to 2.1
112+
## Upgrading from MLlib 2.0 to 2.1
49113

50114
### Breaking changes
115+
{:.no_toc}
51116

52117
**Deprecated methods removed**
53118

@@ -59,6 +124,7 @@ There are no deprecations.
59124
* `validateParams` in `Evaluator`
60125

61126
### Deprecations and changes of behavior
127+
{:.no_toc}
62128

63129
**Deprecations**
64130

@@ -74,9 +140,10 @@ There are no deprecations.
74140
* [SPARK-17389](https://issues.apache.org/jira/browse/SPARK-17389):
75141
`KMeans` reduces the default number of steps from 5 to 2 for the k-means|| initialization mode.
76142

77-
## From 1.6 to 2.0
143+
## Upgrading from MLlib 1.6 to 2.0
78144

79145
### Breaking changes
146+
{:.no_toc}
80147

81148
There were several breaking changes in Spark 2.0, which are outlined below.
82149

@@ -171,6 +238,7 @@ Several deprecated methods were removed in the `spark.mllib` and `spark.ml` pack
171238
A full list of breaking changes can be found at [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810).
172239

173240
### Deprecations and changes of behavior
241+
{:.no_toc}
174242

175243
**Deprecations**
176244

@@ -221,7 +289,7 @@ Changes of behavior in the `spark.mllib` and `spark.ml` packages include:
221289
`QuantileDiscretizer` now uses `spark.sql.DataFrameStatFunctions.approxQuantile` to find splits (previously used custom sampling logic).
222290
The output buckets will differ for same input data and params.
223291

224-
## From 1.5 to 1.6
292+
## Upgrading from MLlib 1.5 to 1.6
225293

226294
There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are
227295
deprecations and changes of behavior.
@@ -248,7 +316,7 @@ Changes of behavior:
248316
tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the
249317
behavior of the simpler `Tokenizer` transformer.
250318

251-
## From 1.4 to 1.5
319+
## Upgrading from MLlib 1.4 to 1.5
252320

253321
In the `spark.mllib` package, there are no breaking API changes but several behavior changes:
254322

@@ -267,7 +335,7 @@ In the `spark.ml` package, there exists one breaking API change and one behavior
267335
* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
268336
added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
269337

270-
## From 1.3 to 1.4
338+
## Upgrading from MLlib 1.3 to 1.4
271339

272340
In the `spark.mllib` package, there were several breaking changes, but all in `DeveloperApi` or `Experimental` APIs:
273341

@@ -286,7 +354,7 @@ Since the `spark.ml` API was an alpha component in Spark 1.3, we do not list all
286354
However, since 1.4 `spark.ml` is no longer an alpha component, we will provide details on any API
287355
changes for future releases.
288356

289-
## From 1.2 to 1.3
357+
## Upgrading from MLlib 1.2 to 1.3
290358

291359
In the `spark.mllib` package, there were several breaking changes. The first change (in `ALS`) is the only one in a component not marked as Alpha or Experimental.
292360

@@ -313,7 +381,7 @@ Other changes were in `LogisticRegression`:
313381
* The `scoreCol` output column (with default value "score") was renamed to be `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
314382
* In Spark 1.2, `LogisticRegressionModel` did not include an intercept. In Spark 1.3, it includes an intercept; however, it will always be 0.0 since it uses the default settings for [spark.mllib.LogisticRegressionWithLBFGS](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS). The option to use an intercept will be added in the future.
315383

316-
## From 1.1 to 1.2
384+
## Upgrading from MLlib 1.1 to 1.2
317385

318386
The only API changes in MLlib v1.2 are in
319387
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
@@ -339,7 +407,7 @@ The tree `Node` now includes more information, including the probability of the
339407
Examples in the Spark distribution and examples in the
340408
[Decision Trees Guide](mllib-decision-tree.html#examples) have been updated accordingly.
341409

342-
## From 1.0 to 1.1
410+
## Upgrading from MLlib 1.0 to 1.1
343411

344412
The only API changes in MLlib v1.1 are in
345413
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
@@ -365,7 +433,7 @@ simple `String` types.
365433
Examples of the new recommended `trainClassifier` and `trainRegressor` are given in the
366434
[Decision Trees Guide](mllib-decision-tree.html#examples).
367435

368-
## From 0.9 to 1.0
436+
## Upgrading from MLlib 0.9 to 1.0
369437

370438
In MLlib v1.0, we support both dense and sparse input in a unified way, which introduces a few
371439
breaking changes. If your data is sparse, please store it in a sparse format instead of dense to
