Commit d20a976

felixcheung (Felix Cheung) authored and committed

[SPARK-20192][SPARKR][DOC] SparkR migration guide to 2.2.0

## What changes were proposed in this pull request?

Updating R Programming Guide.

## How was this patch tested?

Manually.

Author: Felix Cheung <[email protected]>

Closes #17816 from felixcheung/r22relnote.

1 parent 943a684 commit d20a976

File tree

1 file changed: +8 −0 lines changed


docs/sparkr.md

Lines changed: 8 additions & 0 deletions
```diff
@@ -644,3 +644,11 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma
 ## Upgrading to SparkR 2.1.0
 
 - `join` no longer performs Cartesian Product by default, use `crossJoin` instead.
+
+## Upgrading to SparkR 2.2.0
+
+ - A `numPartitions` parameter has been added to `createDataFrame` and `as.DataFrame`. When splitting the data, the partition position calculation now matches the one in Scala.
+ - The method `createExternalTable` has been deprecated in favor of `createTable`. Either method can be called to create an external or managed table. Additional catalog methods have also been added.
+ - By default, `derby.log` is now saved to `tempdir()`. It is created when the SparkSession is instantiated with `enableHiveSupport` set to `TRUE`.
+ - `spark.lda` was not setting the optimizer correctly; this has been corrected.
+ - Several model summary outputs have been updated to report `coefficients` as a `matrix`, including `spark.logit`, `spark.kmeans`, and `spark.glm`. The model summary output for `spark.gaussianMixture` now includes log-likelihood as `loglik`.
```
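The behavior changes in the notes above can be illustrated with a short SparkR sketch. This is a minimal, hypothetical example assuming a local Spark 2.2.0 installation with an active session; `faithful` is a built-in R dataset, and the table name and paths are placeholders:

```r
library(SparkR)

# With enableHiveSupport = TRUE, derby.log is now written under tempdir()
sparkR.session(enableHiveSupport = TRUE)

# New in 2.2.0: control the number of partitions when converting a local
# R data.frame; split positions now match the Scala calculation
df <- createDataFrame(faithful, numPartitions = 4)

# createExternalTable is deprecated; createTable covers both cases:
# with a path it creates an external table, without one a managed table
dataPath <- file.path(tempdir(), "faithful_pq")  # placeholder path
write.df(df, path = dataPath, source = "parquet")
createTable("faithful_tbl", path = dataPath, source = "parquet")

# spark.glm summaries now expose coefficients as a matrix
model <- spark.glm(df, waiting ~ eruptions, family = "gaussian")
summary(model)$coefficients
```

This sketch only exercises the APIs named in the release notes; it is not part of the commit itself.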
