Skip to content

Conversation

@yinxusen
Copy link
Contributor

@yinxusen yinxusen commented Feb 8, 2016

@SparkQA
Copy link

SparkQA commented Feb 9, 2016

Test build #50945 has finished for PR 11124 at commit ca0ea4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#'\dontrun{
#' model <- kmeans(x, centers = 2, algorithm="random")
#'}
setMethod("kmeans", signature(x = "DataFrame"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall this match the signature of stats::kmeans in R? It's pretty close already, but it might be preferable to match it
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can make the signature the same as stats:kmeans for now, since KMeans in MLlib doesn't provide the same interface with R's. E.g. KMeans in MLlib doesn't support warm-start so the centers has only to be a "numeric", and it doesn't have arguments like nstart and trace. Refer to the implementation of glm in SparkR, which also cannot match the signature with R's 100%.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current signature looks okay to me. Just need to make sure we don't shadow R's own kmeans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add test

@mengxr
Copy link
Contributor

mengxr commented Feb 11, 2016

@yinxusen I made one pass. Could you update the PR and implement both summary and fitted? Thanks!

@yinxusen
Copy link
Contributor Author

@mengxr I am on my way implementing summary. The summary of stats:kmeans contains "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault". However I think there is no need to implement all of them in this PR. I choose "cluster", "centers" and "size" in this version. What do you think?

@mengxr
Copy link
Contributor

mengxr commented Feb 12, 2016

I checked R's output. It seems that summary on a k-means model doesn't provide much information:

summary(model)
             Length Class  Mode   
cluster      3      -none- numeric
centers      2      -none- numeric
totss        1      -none- numeric
withinss     2      -none- numeric
tot.withinss 1      -none- numeric
betweenss    1      -none- numeric
size         2      -none- numeric
iter         1      -none- numeric
ifault       1      -none- numeric

It seems like summary is not implemented for k-means model. So I would just implement model$centers and fitted(model, method=c("classes")). Note that we don't provide cluster (cluster assignments) and size (cluster sizes) in Scala.

@SparkQA
Copy link

SparkQA commented Feb 14, 2016

Test build #51249 has finished for PR 11124 at commit 44c7595.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

newIris$Species <- NULL
training <- suppressWarnings(createDataFrame(sqlContext, newIris))

# Cahce the DataFrame here to work around the bug SPARK-13178.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: cache

@yinxusen
Copy link
Contributor Author

@mengxr I add two summary variables cluster and size in KMeansModel to resemble counterparts in R's kmeans. IMHO even though summary(kmeans.model) looks incompleted, the kmeans.model itself contains those variables. I add these two first.

@yinxusen
Copy link
Contributor Author

test it please

@SparkQA
Copy link

SparkQA commented Feb 14, 2016

Test build #51253 has finished for PR 11124 at commit ecb2850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Feb 23, 2016

@yinxusen Could you resolve merge conflicts?

@yinxusen
Copy link
Contributor Author

@mengxr, do it now.

On Mon, Feb 22, 2016 at 11:57 PM, Xiangrui Meng [email protected]
wrote:

@yinxusen https://github.com/yinxusen Could you resolve conflicts?


Reply to this email directly or view it on GitHub
#11124 (comment).

Cheers

Xusen Yin (尹绪森)
LinkedIn: https://cn.linkedin.com/in/xusenyin

@yinxusen
Copy link
Contributor Author

test it please

@SparkQA
Copy link

SparkQA commented Feb 23, 2016

Test build #51751 has finished for PR 11124 at commit 2276323.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Feb 23, 2016

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 8d29001 Feb 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants