[SPARK-15177.1] [R] make SparkR model params and default values consistent with MLlib #13801
cc: @shivaram
 #' @note spark.kmeans since 2.0.0
 setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
-          function(data, formula, k, maxIter = 10, initMode = c("random", "k-means||")) {
+          function(data, formula, k = 2, maxIter = 20, initMode = c("k-means||", "random")) {
Just to clarify: this initMode change wasn't present in #13023. Is it intended to match some Spark behavior?
Yes, the default initMode in MLlib is k-means|| instead of random. See https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala#L263.
Ah I see - change LGTM then
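For readers following the thread, here is a minimal sketch of why reordering the `initMode` vector changes the default. It assumes the function resolves the argument with `match.arg()`, the usual R idiom for a vector of choices; `kmeans_args` is a hypothetical stand-in that only resolves its arguments and never touches Spark.

```r
# Hypothetical stand-in for the new spark.kmeans signature.
# It resolves the arguments and returns them; no Spark session is involved.
kmeans_args <- function(k = 2, maxIter = 20, initMode = c("k-means||", "random")) {
  # match.arg() returns the FIRST choice when the caller omits initMode,
  # and raises an error for values outside the declared choices.
  initMode <- match.arg(initMode)
  list(k = k, maxIter = maxIter, initMode = initMode)
}

# No arguments: the first element of the choice vector becomes the default.
defaults <- kmeans_args()

# Explicit arguments still override the defaults.
explicit <- kmeans_args(k = 3, initMode = "random")
```

Under that assumption, listing "k-means||" first is what makes it the default, so swapping the order of the vector is the whole change.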
Changes look fine given what was a part of #13023
test this please
Test build #60913 has finished for PR 13801 at commit
Test build #60915 has finished for PR 13801 at commit
Test build #60917 has finished for PR 13801 at commit
Merged into master and branch-2.0.
…istent with MLlib

## What changes were proposed in this pull request?

This PR is a subset of #13023 by yanboliang to make SparkR model param names and default values consistent with MLlib. I tried to avoid other changes from #13023 to keep this PR minimal. I will send a follow-up PR to improve the documentation.

Main changes:
* `spark.glm`: epsilon -> tol, maxit -> maxIter
* `spark.kmeans`: default k -> 2, default maxIter -> 20, default initMode -> "k-means||"
* `spark.naiveBayes`: laplace -> smoothing, default 1.0

## How was this patch tested?

Existing unit tests.

Author: Xiangrui Meng <[email protected]>

Closes #13801 from mengxr/SPARK-15177.1.

(cherry picked from commit 4f83ca1)
Signed-off-by: Xiangrui Meng <[email protected]>
What changes were proposed in this pull request?
This PR is a subset of #13023 by @yanboliang to make SparkR model param names and default values consistent with MLlib. I tried to avoid other changes from #13023 to keep this PR minimal. I will send a follow-up PR to improve the documentation.
Main changes:
* `spark.glm`: epsilon -> tol, maxit -> maxIter
* `spark.kmeans`: default k -> 2, default maxIter -> 20, default initMode -> "k-means||"
* `spark.naiveBayes`: laplace -> smoothing, default 1.0

How was this patch tested?
Existing unit tests.