[SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF() #14035

HyukjinKwon · 2016-07-03T06:20:17Z

What changes were proposed in this pull request?

This was suggested in 101663f#commitcomment-17114968.

This PR adds testImplicits to MLlibTestSparkContext so that implicits such as toDF() can be used across ml tests.

This PR also changes all the usages of spark.createDataFrame( ... ) to toDF() where applicable in ml tests in Scala.
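For illustration, here is a minimal sketch of the two styles side by side. It assumes Spark 2.x on the classpath and uses `spark.implicits._` from a local session as a stand-in for `testImplicits`; the object name `ToDFSketch` and the column names are made up for this example and are not taken from the patch.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; not part of the patch.
object ToDFSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("toDF-sketch")
      .getOrCreate()
    // Stand-in for `import testImplicits._` inside an ML test suite.
    import spark.implicits._

    val rows = Seq((0.0, "a"), (1.0, "b"))

    // Old style: go through the session explicitly.
    val viaCreate = spark.createDataFrame(rows).toDF("label", "text")
    // New style: the implicit conversion adds toDF() to the local Seq.
    val viaToDF = rows.toDF("label", "text")

    // Both styles should yield the same columns and the same rows.
    assert(viaCreate.columns.toSeq == viaToDF.columns.toSeq)
    assert(viaCreate.collect().toSeq == viaToDF.collect().toSeq)

    spark.stop()
  }
}
```

Inside a suite that mixes in MLlibTestSparkContext, the import would be `import testImplicits._` rather than the session-level import shown here.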

How was this patch tested?

Existing tests should work.

HyukjinKwon · 2016-07-03T06:21:07Z

cc @mengxr, @yanboliang and @jaceklaskowski

SparkQA · 2016-07-03T07:00:03Z

Test build #61679 has finished for PR 14035 at commit 54c27d4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-03T09:08:57Z

Test build #61681 has finished for PR 14035 at commit 6fdf290.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jaceklaskowski · 2016-07-03T11:51:45Z

mllib/src/test/scala/org/apache/spark/ml/classification/ClassifierSuite.scala

Repeated. What about moving it outside the test methods?

HyukjinKwon · 2016-07-06T04:59:22Z

@mengxr, @yanboliang, Could you review this?

HyukjinKwon · 2016-07-08T23:50:46Z

Hi @mengxr, is this the change you meant? Could you please take a look?

HyukjinKwon · 2016-07-11T23:28:51Z

Gentle ping @mengxr and @yanboliang

HyukjinKwon · 2016-07-15T04:57:39Z

ping @mengxr and @yanboliang ..

HyukjinKwon · 2016-07-21T02:35:06Z

Hm.. I can close this if it looks inappropriate or if it seems to cause a lot of conflicts across PRs. Could you give some feedback please, @mengxr and @yanboliang?

HyukjinKwon · 2016-08-10T00:58:40Z

ping @mengxr and @yanboliang

SparkQA · 2016-08-10T02:44:23Z

Test build #63487 has finished for PR 14035 at commit 5157d77.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-08-16T06:46:45Z

Hi @jkbradley, could you take a look at this one please?

SparkQA · 2016-08-30T07:49:40Z

Test build #64632 has finished for PR 14035 at commit f2990b1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-09-22T06:34:02Z

Test build #65757 has finished for PR 14035 at commit 2cbcabd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-09-22T06:37:05Z

Test build #65758 has finished for PR 14035 at commit 13b1a67.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-09-22T10:34:27Z

Hi @mengxr, @yanboliang and @jkbradley, if these changes are too big, I can keep only testImplicits and let others convert the remaining usages later, without this sweeping change. Could you please take a look?

yanboliang · 2016-09-22T11:04:19Z

Sorry for the late response. I like this change and will have a look soon. Thanks.

HyukjinKwon · 2016-09-22T11:10:59Z

Thank you!!

yanboliang · 2016-09-25T08:35:06Z

@HyukjinKwon I have made a pass and this PR looks good overall. Could you address my minor comments and double-check whether all ML test cases are covered? I found that ChiSqSelectorSuite imports the implicits in a different style, so it would be better to unify them. Then I'd like to get this in. Thanks for working on this.

HyukjinKwon · 2016-09-25T12:03:04Z

Thanks @yanboliang and @jaceklaskowski. I addressed the comments, except for a few that I am not too sure of and that I think are unrelated to this change.

HyukjinKwon · 2016-09-25T12:04:24Z

mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala

+    sparsePoints1 = sparsePoints1Seq.map(FeatureData).toDF()
+    // TODO: If we directly use `toDF` without parallelize, the test in
+    // "Throws error when given RDDs with different size vectors" is failed for an unknown reason.
+    densePoints2 = sc.parallelize(densePoints2Seq, 2).map(FeatureData).toDF()


BTW, a test fails for an unknown reason when I change this to densePoints2Seq.map(FeatureData).toDF().

SparkQA · 2016-09-25T13:53:05Z

Test build #65881 has finished for PR 14035 at commit ad9d7ac.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-09-25T14:18:21Z

Test build #65883 has finished for PR 14035 at commit b60c952.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-09-26T10:42:52Z

mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala

  test("Test Chi-Square selector") {
-    val spark = this.spark
-    import spark.implicits._
+    import testImplicits._


Nit: Actually it should be moved out of this test function and can be shared between all test cases if necessary.

yanboliang · 2016-09-26T11:20:11Z

LGTM, merged into master. Thanks!

HyukjinKwon · 2016-09-26T11:22:59Z

Thank you for reviewing this!

…istributed Dataset.

## What changes were proposed in this pull request?

#14035 added `testImplicits` to ML unit tests and promoted `toDF()`, but left one minor issue in `VectorIndexerSuite`. If we create the DataFrame with `Seq(...).toDF()`, one of the test cases throws a different exception than it does with `sc.parallelize(Seq(...)).toDF()`. After an in-depth study, I found that this is caused by the different behavior of local and distributed Datasets when a UDF fails at an `assert`. If the data is a local Dataset, it throws `AssertionError` directly; if the data is a distributed Dataset, it throws a `SparkException` that wraps the `AssertionError`. I think we should enforce that this test covers both cases.

## How was this patch tested?

Unit test.

Author: Yanbo Liang <[email protected]>

Closes #15261 from yanboliang/spark-16356.
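One way a test could accept both failure shapes is to look at the root cause instead of the outermost exception. The sketch below is plain Scala and is not taken from the follow-up patch; `rootCause` is a hypothetical helper, and `RuntimeException` stands in for `SparkException` so that the example runs without Spark on the classpath.

```scala
import scala.annotation.tailrec

// Hypothetical helper for illustration; not part of Spark or of the follow-up patch.
object RootCauseSketch {
  // Walk the cause chain until there is no further cause.
  @tailrec
  def rootCause(t: Throwable): Throwable = {
    val cause = t.getCause
    if (cause == null || (cause eq t)) t else rootCause(cause)
  }

  def main(args: Array[String]): Unit = {
    // Local Dataset case: the AssertionError surfaces directly.
    val direct: Throwable = new AssertionError("vector size mismatch")
    // Distributed Dataset case: a wrapper exception carries the AssertionError
    // (RuntimeException is an assumption standing in for SparkException).
    val wrapped: Throwable = new RuntimeException("job failed", direct)

    assert(rootCause(direct).isInstanceOf[AssertionError])
    assert(rootCause(wrapped).isInstanceOf[AssertionError])
  }
}
```

Checking the root cause keeps a single assertion valid whether the data was created from a local `Seq` or through `sc.parallelize`.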

jkbradley · 2016-10-25T18:15:09Z

Sorry I'm seeing this so late, but thank you all for the PR & reviews!

@jaceklaskowski Regarding the explicit partitioning in unit tests, that's historical: In the past, we had run into some bugs which only showed up with multiple partitions, so we got in the habit of using multiple partitions. I still think it's a nice idea in general, though perhaps there are fewer such bugs nowadays.
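As a rough sketch of what explicit partitioning looks like next to the plain `toDF()` style, assuming Spark 2.x on the classpath (the object name and the data are made up for this example):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; not taken from the Spark test suites.
object PartitionedToDFSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partitioned-toDF-sketch")
      .getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    val points = Seq((1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0))

    // Explicitly spread the data over two partitions before converting.
    val distributed = sc.parallelize(points, 2).toDF("x", "y")
    assert(distributed.rdd.getNumPartitions == 2)

    // Convert the local Seq directly; Spark chooses the partitioning.
    val local = points.toDF("x", "y")

    // Both DataFrames hold the same rows regardless of how they are partitioned.
    assert(local.collect().toSet == distributed.collect().toSet)

    spark.stop()
  }
}
```

The first form keeps the multi-partition coverage described above; the second is shorter but leaves the partition count to Spark.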

jaceklaskowski reviewed Jul 3, 2016

HyukjinKwon force-pushed the minor-ml-test branch from f2990b1 to 2cbcabd Compare September 22, 2016 04:30

HyukjinKwon force-pushed the minor-ml-test branch 2 times, most recently from e803905 to 13b1a67 Compare September 22, 2016 04:35

HyukjinKwon added 4 commits September 25, 2016 20:09

Promote toDF() instead of createDataFrame

4a04bab

Address comments

d09a469

Uniform testImplicits

30ae934

Add some more tests

ad9d7ac

HyukjinKwon force-pushed the minor-ml-test branch from 13b1a67 to ad9d7ac Compare September 25, 2016 11:59

Remove unrelated changes in SQL tests and fix indentation for imports

b60c952

yanboliang reviewed Sep 26, 2016

asfgit closed this in f234b7c Sep 26, 2016

yanboliang mentioned this pull request Sep 27, 2016

[SPARK-16356][Follow-up][ML] Enforce ML test of exception for local/distributed Dataset. #15261

Closed

HyukjinKwon deleted the minor-ml-test branch January 2, 2018 03:39
