[SPARK-13011] K-means wrapper in SparkR #11124
|
Test build #50945 has finished for PR 11124 at commit
|
| #'\dontrun{ | ||
| #' model <- kmeans(x, centers = 2, algorithm="random") | ||
| #'} | ||
| setMethod("kmeans", signature(x = "DataFrame"), |
Should this match the signature of stats::kmeans in R? It's pretty close already, but matching it exactly might be preferable.
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
I don't think we can make the signature the same as stats::kmeans for now, since KMeans in MLlib doesn't provide the same interface as R's. For example, KMeans in MLlib doesn't support warm start, so centers can only be a "numeric", and it has no arguments like nstart and trace. See the implementation of glm in SparkR, which also cannot match R's signature 100%.
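For context, here is a minimal sketch in plain R (no Spark needed) of the parts of the stats::kmeans interface mentioned above: the nstart argument and passing a matrix of initial centers, which is the warm-start style usage. The variable names and the choice of rows from the built-in iris data are only for illustration.

```r
# Plain-R illustration of the stats::kmeans interface; nothing here touches SparkR.
x <- as.matrix(iris[, 1:4])

# Argument names of R's own kmeans, including nstart and trace.
kmeans_args <- names(formals(stats::kmeans))

# centers given as a number of clusters, with several random restarts.
set.seed(1)
fit_k <- stats::kmeans(x, centers = 3, nstart = 5)

# centers given as a matrix of initial cluster centers (one row per cluster).
# Rows 1, 51 and 101 are an arbitrary choice of three distinct starting points.
init <- x[c(1, 51, 101), ]
fit_init <- stats::kmeans(x, centers = init)
```

A wrapper that only accepts a numeric centers can cover the first call but not the second, which is the mismatch described in the comment above.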
The current signature looks okay to me. Just need to make sure we don't shadow R's own kmeans.
I'll add a test.
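As a rough sketch of what such a check could look like, the snippet below uses a hypothetical stand-in class (FakeDataFrame) instead of SparkR's DataFrame, so it runs without Spark. It assumes the generic is created from the existing function with setGeneric("kmeans"), which keeps stats::kmeans as the default method. The class name, the placeholder method body and the FakeKMeansModel return value are all made up for illustration and are not the PR's actual code.

```r
library(methods)
library(stats)

# Hypothetical stand-in for SparkR's DataFrame, so the sketch runs without Spark.
setClass("FakeDataFrame", representation(data = "matrix"))

# Creating the generic from the existing function keeps stats::kmeans
# as the default method for all other input types.
setGeneric("kmeans")

# The method reuses the generic's formal arguments unchanged.
setMethod("kmeans", signature(x = "FakeDataFrame"),
          function(x, centers, iter.max = 10L, nstart = 1L,
                   algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"),
                   trace = FALSE) {
            # Placeholder body: a real wrapper would call into MLlib here.
            structure(list(k = centers), class = "FakeKMeansModel")
          })

local_data <- as.matrix(iris[, 1:4])

# A local matrix should still reach R's own kmeans through the default method.
set.seed(42)
local_fit <- kmeans(local_data, centers = 3)

# The stand-in class should reach the wrapper method instead.
wrapped_fit <- kmeans(new("FakeDataFrame", data = local_data), centers = 3)
```

If the generic were instead defined with its own argument list, local inputs would no longer fall through to stats::kmeans unless a default method delegating to it were added explicitly.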
|
@yinxusen I made one pass. Could you update the PR and implement both |
|
@mengxr I am on my way implementing |
|
I checked R's output. It seems that It seems like |
|
Test build #51249 has finished for PR 11124 at commit
|
| newIris$Species <- NULL | ||
| training <- suppressWarnings(createDataFrame(sqlContext, newIris)) | ||
|
|
||
| # Cahce the DataFrame here to work around the bug SPARK-13178. |
typo: cache
|
@mengxr I added two summary variables |
|
test it please |
|
Test build #51253 has finished for PR 11124 at commit
|
|
@yinxusen Could you resolve merge conflicts? |
|
@mengxr, doing it now. On Mon, Feb 22, 2016 at 11:57 PM, Xiangrui Meng [email protected] wrote:
Cheers, Xusen Yin (尹绪森) |
|
test it please |
|
Test build #51751 has finished for PR 11124 at commit
|
|
LGTM. Merged into master. Thanks! |
https://issues.apache.org/jira/browse/SPARK-13011