@mengxr mengxr commented Jun 21, 2016

What changes were proposed in this pull request?

This PR is a subset of #13023 by @yanboliang to make SparkR model param names and default values consistent with MLlib. I tried to avoid other changes from #13023 to keep this PR minimal. I will send a follow-up PR to improve the documentation.

Main changes:

  • spark.glm: epsilon -> tol, maxit -> maxIter
  • spark.kmeans: default k -> 2, default maxIter -> 20, default initMode -> "k-means||"
  • spark.naiveBayes: laplace -> smoothing, default 1.0
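
For callers that pass the old names as keyword arguments, the renames listed above can be sketched as a small lookup in plain R. This is only an illustration: `renamed` and `migrate_args` are hypothetical helpers, not part of SparkR, and the mapping contains nothing beyond the renames in the list.

```r
# Old -> new argument names, taken from the "Main changes" list.
renamed <- list(
  spark.glm = c(epsilon = "tol", maxit = "maxIter"),
  spark.naiveBayes = c(laplace = "smoothing")
)

# Hypothetical helper: rewrite the names in a list of call arguments.
# Arguments whose names were not renamed are returned unchanged.
migrate_args <- function(fn, args) {
  map <- renamed[[fn]]
  if (is.null(map) || is.null(names(args))) {
    return(args)
  }
  old <- names(args) %in% names(map)
  names(args)[old] <- unname(map[names(args)[old]])
  args
}

glm_args <- migrate_args("spark.glm",
                         list(family = "gaussian", epsilon = 1e-6, maxit = 25))
names(glm_args)
```

The migrated list could then be passed on with something like `do.call`, assuming a SparkR session and data frame are already set up.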

How was this patch tested?

Existing unit tests.

mengxr commented Jun 21, 2016

cc: @shivaram

 #' @note spark.kmeans since 2.0.0
 setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
-          function(data, formula, k, maxIter = 10, initMode = c("random", "k-means||")) {
+          function(data, formula, k = 2, maxIter = 20, initMode = c("k-means||", "random")) {

Just to clarify: this initMode change wasn't present in #13023. Is it intended to match some Spark behavior?

@mengxr mengxr Jun 21, 2016

Ah I see - change LGTM then
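
For context on why reordering the `initMode` vector changes the default: when an R function resolves a vector-valued default with `match.arg()`, the first element is what a caller gets when the argument is omitted. The sketch below assumes `spark.kmeans` handles `initMode` this way, which the diff does not show. `kmeans_args_sketch` is a hypothetical stand-in that reproduces only the argument handling, not the model fitting.

```r
# Stand-in for the new signature; only the argument defaults are sketched.
kmeans_args_sketch <- function(k = 2, maxIter = 20,
                               initMode = c("k-means||", "random")) {
  # match.arg() picks the first choice when initMode is not supplied
  # and raises an error for values outside the listed choices.
  initMode <- match.arg(initMode)
  list(k = k, maxIter = maxIter, initMode = initMode)
}

defaults <- kmeans_args_sketch()
explicit <- kmeans_args_sketch(k = 3, initMode = "random")
```

Under that assumption, moving "k-means||" to the front of the vector is what makes it the default, while "random" remains available when requested explicitly.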

@shivaram

Changes look fine given what was part of #13023.

mengxr commented Jun 21, 2016

test this please

SparkQA commented Jun 21, 2016

Test build #60913 has finished for PR 13801 at commit 39a4c4c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 21, 2016

Test build #60915 has finished for PR 13801 at commit 39a4c4c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 21, 2016

Test build #60917 has finished for PR 13801 at commit 0a712fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

mengxr commented Jun 21, 2016

Merged into master and branch-2.0.

@asfgit asfgit closed this in 4f83ca1 Jun 21, 2016
asfgit pushed a commit that referenced this pull request Jun 21, 2016
…istent with MLlib

## What changes were proposed in this pull request?

This PR is a subset of #13023 by yanboliang to make SparkR model param names and default values consistent with MLlib. I tried to avoid other changes from #13023 to keep this PR minimal. I will send a follow-up PR to improve the documentation.

Main changes:
* `spark.glm`: epsilon -> tol, maxit -> maxIter
* `spark.kmeans`: default k -> 2, default maxIter -> 20, default initMode -> "k-means||"
* `spark.naiveBayes`: laplace -> smoothing, default 1.0

## How was this patch tested?

Existing unit tests.

Author: Xiangrui Meng <[email protected]>

Closes #13801 from mengxr/SPARK-15177.1.

(cherry picked from commit 4f83ca1)
Signed-off-by: Xiangrui Meng <[email protected]>