[SPARK-15177] [SparkR] [ML] SparkR 2.0 QA: New R APIs and API docs for mllib.R #13023
Test build #58225 has finished for PR 13023 at commit
setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDataFrame"),
          function(formula, family = gaussian, data, epsilon = 1e-06, maxit = 25) {
            spark.glm(data, formula, family, epsilon, maxit)
          })
Because glm is an R-compliant function, I left its argument names consistent with native R.
cc @mengxr
Test build #58484 has finished for PR 13023 at commit
#' Fit a k-means model
#'
#' Fit a k-means model, similarly to R's kmeans().
#' Fits a k-means model, similarly to R's kmeans().
This seems to change a few times here. Should it be "Fits a ..." or "Fit a ..."?
Fits. See https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html for examples.
I'm referring to lines
#' Fit a k-means model
#'
#' Fit a k-means model, similarly to R's kmeans().
i.e., the first line and the third line.
For example, it shows up for glm like this http://spark.apache.org/docs/latest/api/R/glm.html

I'd think it would look rather odd if they are not consistent.
Prompted by this comment, I was wondering whether we also need to update the docs for k-means and naive Bayes in http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/sparkr.html. Maybe we can include that change in this PR.
@vectorijk There is a separate PR focused on updating the machine learning section of the SparkR user guide. FYI #13285. Thanks.
@mengxr @yanboliang Is this PR still active? Just checking whether this is something we should track for the 2.0 release.
It would be nice to get this in. @yanboliang is traveling. I can help send a PR based on this one. |
…istent with MLlib

## What changes were proposed in this pull request?

This PR is a subset of #13023 by yanboliang to make SparkR model param names and default values consistent with MLlib. I tried to avoid other changes from #13023 to keep this PR minimal. I will send a follow-up PR to improve the documentation.

Main changes:

* `spark.glm`: epsilon -> tol, maxit -> maxIter
* `spark.kmeans`: default k -> 2, default maxIter -> 20, default initMode -> "k-means||"
* `spark.naiveBayes`: laplace -> smoothing, default 1.0

## How was this patch tested?

Existing unit tests.

Author: Xiangrui Meng

Closes #13801 from mengxr/SPARK-15177.1.
…istent with MLlib

## What changes were proposed in this pull request?

This PR is a subset of #13023 by yanboliang to make SparkR model param names and default values consistent with MLlib. I tried to avoid other changes from #13023 to keep this PR minimal. I will send a follow-up PR to improve the documentation.

Main changes:

* `spark.glm`: epsilon -> tol, maxit -> maxIter
* `spark.kmeans`: default k -> 2, default maxIter -> 20, default initMode -> "k-means||"
* `spark.naiveBayes`: laplace -> smoothing, default 1.0

## How was this patch tested?

Existing unit tests.

Author: Xiangrui Meng

Closes #13801 from mengxr/SPARK-15177.1.

(cherry picked from commit 4f83ca1)
Signed-off-by: Xiangrui Meng
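For readers following the renames, here is a minimal sketch of what calls might look like with the parameter names listed in the commit message above. It assumes SparkR 2.0 or later with a local Spark installation; the datasets (`iris`, `Titanic`), the formulas, and the variable names are illustrative choices and are not taken from this PR.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark 2.0+ installation is available

# createDataFrame() replaces "." in column names with "_" (e.g. Sepal_Length).
df <- createDataFrame(iris)

# spark.glm: tol and maxIter replace the earlier epsilon and maxit.
glmModel <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species,
                      family = gaussian, tol = 1e-6, maxIter = 25)

# spark.kmeans: the values below are the new defaults, written out explicitly.
kmeansModel <- spark.kmeans(df, ~ Sepal_Length + Sepal_Width,
                            k = 2, maxIter = 20, initMode = "k-means||")

# spark.naiveBayes: smoothing replaces the earlier laplace argument.
titanic <- as.data.frame(Titanic)
titanicDF <- createDataFrame(titanic[titanic$Freq > 0, -5])
nbModel <- spark.naiveBayes(titanicDF, Survived ~ Class + Sex + Age,
                            smoothing = 1.0)

summary(glmModel)
sparkR.session.stop()
```

The R-compliant `glm` wrapper is a separate entry point; per the earlier review comment it was intended to keep native R argument names, so the sketch only exercises the `spark.*` functions.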
@yanboliang We are going to split the work into multiple PRs (SPARK-16090). Do you mind closing this PR for now? Thanks!
What changes were proposed in this pull request?

SparkR 2.0 QA: New R APIs and API docs for mllib.R. Main changes include:

* Make the `spark.glm` and `spark.naiveBayes` APIs more consistent with the Spark naming convention. Because most Spark MLlib algorithms do not override the base R functions, we can make the argument names consistent with Spark MLlib rather than base R. Meanwhile, make the default values consistent with MLlib.
  From:
  To:
* `glm` internally calls `spark.glm` to train the model, so we should not duplicate all `spark.glm` tests and should just run a basic test case to check that the API works.

How was this patch tested?

Existing unit tests.