
Conversation

@smurakozi
Contributor

What changes were proposed in this pull request?

Converting the clustering tests to also check the transformers against structured streaming, using the ML testing infrastructure implemented in SPARK-22882.
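
For context, a rough sketch of the testing pattern this conversion enables (the suite name, data, and test below are hypothetical, not code from this PR): MLTest's testTransformer runs the same per-row check against both a batch DataFrame and an equivalent streaming Dataset, so one assertion covers both execution modes.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.util.MLTest

class ExampleClusteringSuite extends MLTest {

  import testImplicits._

  test("predictions are valid cluster indices, in batch and streaming mode") {
    val df = Seq(Tuple1(Vectors.dense(0.0, 0.0)), Tuple1(Vectors.dense(5.0, 5.0)))
      .toDF("features")
    val model = new KMeans().setK(2).setSeed(1L).fit(df)
    // The type parameter describes the input rows; the check runs on the selected output column.
    testTransformer[Tuple1[Vector]](df, model, "prediction") { row =>
      val prediction = row.getAs[Int]("prediction")
      assert(Set(0, 1).contains(prediction))
    }
  }
}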

How was this patch tested?

N/A

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

private[clustering] object Encoders {
  implicit val vectorEncoder = ExpressionEncoder[Vector]()
}
Contributor Author

@smurakozi Jan 18, 2018


Is there a better solution to provide an implicit Encoder[Vector] for testTransformer?
Is it ok here, or is there a better place for it?
e.g. org.apache.spark.mllib.util.MLlibTestSparkContext.testImplicits

@jkbradley
Member


Thanks for asking; you shouldn't need to do this. I'll comment on BisectingKMeansSuite.scala
about using testImplicits instead. You basically just need to import testImplicits._ and use Tuple1 for the type param for testTransformer.
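
As a minimal sketch of that suggestion (dataset and model stand in for whatever the suite already has in scope), the custom Encoders object can be dropped in favor of the tuple encoders that testImplicits provides:

import testImplicits._  // supplies an implicit Encoder[Tuple1[Vector]] via the product encoder

// Wrap the single Vector column in a Tuple1 instead of requiring an Encoder[Vector]:
testTransformer[Tuple1[Vector]](dataset.toDF(), model, "prediction") { row =>
  assert(row.getAs[Int]("prediction") >= 0)
}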

@squito
Contributor

squito commented Jan 19, 2018

Jenkins, add to whitelist

@SparkQA

SparkQA commented Jan 19, 2018

Test build #86391 has finished for PR 20319 at commit b6e06e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest
  • class GaussianMixtureSuite extends MLTest with DefaultReadWriteTest

@smurakozi
Contributor Author

@jkbradley could you check out this change, please?

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86479 has finished for PR 20319 at commit dc7e708.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

@smurakozi Thanks for the PR! I have bandwidth to review this now. Do you have time to rebase this to fix the merge conflicts?

@WeichenXu123
Contributor

@smurakozi Thanks for the PR! Could you resolve the conflicts first? Then I will review it. If you're busy, I can also take it over.

@SparkQA

SparkQA commented Apr 9, 2018

Test build #89063 has finished for PR 20319 at commit b2aa3c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@smurakozi
Contributor Author

smurakozi commented Apr 9, 2018

@jkbradley, @WeichenXu123 thanks for checking it out. I've resolved the conflicts; the build is green.

@jkbradley
Member

Reviewing now!

@jkbradley
Member

@jkbradley left a comment


Done with review; thanks!

-   extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+ class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest {

  import Encoders._
@jkbradley
Member


import testImplicits._ instead
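
A sketch of the suggested change, keeping the suite header from the diff above:

class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest {

  import testImplicits._  // instead of the PR's custom Encoders._

  // tests unchanged
}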

// Verify we hit the edge case
assert(numClusters < k && numClusters > 1)

testTransformerByGlobalCheckFunc[Vector](sparseDataset.toDF(), model, "prediction") { rows =>
@jkbradley
Member


Use Tuple1[Vector] instead of Vector
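
A sketch of the corrected call (sparseDataset, model, and numClusters are the values already in this test; the assertion body is illustrative):

testTransformerByGlobalCheckFunc[Tuple1[Vector]](sparseDataset.toDF(), model, "prediction") { rows =>
  val clusters = rows.map(_.getAs[Int]("prediction")).toSet
  assert(clusters.size === numClusters)
}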

val clusters = rows.map(_.getAs[Int](predictionColName)).toSet
assert(clusters.size === k)
assert(clusters === Set(0, 1, 2, 3, 4))
assert(model.computeCost(dataset) < 0.1)
@jkbradley
Member


These checks which do not use "rows" should go outside of testTransformerByGlobalCheckFunc
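
A sketch of the suggested restructuring, reusing the names from the snippet above: model- and cost-based assertions run once outside the helper, and only the row-based assertions stay inside the global check function.

// Checks that do not depend on the transformed rows stay outside:
assert(model.computeCost(dataset) < 0.1)

// Only row-based checks go inside the global check function:
testTransformerByGlobalCheckFunc[Tuple1[Vector]](dataset.toDF(), model, "prediction") { rows =>
  val clusters = rows.map(_.getAs[Int](predictionColName)).toSet
  assert(clusters.size === k)
  assert(clusters === Set(0, 1, 2, 3, 4))
}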

@jkbradley
Member

@smurakozi Do you have time to update this? I did a full review, though it now has a small merge conflict. Thanks!

@jkbradley
Member

I'm going to take this over to get it done, but @smurakozi, you'll be the primary author. I'll link the PR here in a minute.

@jkbradley
Member

Done! Here it is: #21358

@smurakozi Could you please close this issue and help review the new PR if you have time? Thanks!

asfgit pushed a commit that referenced this pull request May 17, 2018
## What changes were proposed in this pull request?

Converting clustering tests to also check code with structured streaming, using the ML testing infrastructure implemented in SPARK-22882.

This PR is a new version of #20319

Author: Sandor Murakozi <[email protected]>
Author: Joseph K. Bradley <[email protected]>

Closes #21358 from jkbradley/smurakozi-SPARK-22884.
@AmplabJenkins

Can one of the admins verify this patch?

