@HyukjinKwon
Member

What changes were proposed in this pull request?

This was suggested in 101663f#commitcomment-17114968.

This PR adds testImplicits to MLlibTestSparkContext so that implicits such as toDF() can be used across the ML tests.

This PR also changes all the usages of spark.createDataFrame( ... ) to toDF() where applicable in ml tests in Scala.
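
For illustration only, here is a minimal sketch of the pattern, assuming Spark 2.x on the classpath (where SQLImplicits exposes _sqlContext). The local testImplicits object is a stand-in for the one this PR adds to MLlibTestSparkContext; the enclosing object name, app name, and column names are made up for the example.

```scala
import org.apache.spark.sql.{SparkSession, SQLContext, SQLImplicits}

object TestImplicitsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("testImplicits-sketch") // hypothetical app name
      .getOrCreate()

    // Stand-in for the testImplicits object of a test trait (assumption:
    // Spark 2.x, where SQLImplicits asks subclasses for `_sqlContext`).
    object testImplicits extends SQLImplicits {
      protected override def _sqlContext: SQLContext = spark.sqlContext
    }
    import testImplicits._

    val data = Seq((0, "a"), (1, "b"))

    // Before: explicit call on the session.
    val before = spark.createDataFrame(data).toDF("id", "label")
    // After: the implicit conversion on a local Seq.
    val after = data.toDF("id", "label")

    // Both styles should produce the same rows.
    assert(before.collect().toSeq == after.collect().toSeq)

    spark.stop()
  }
}
```

Importing from a stable object rather than from `spark.implicits._` means a suite does not need a local `val spark = this.spark` in every test just to bring the implicits into scope.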

How was this patch tested?

Existing tests should work.

@HyukjinKwon
Member Author

cc @mengxr, @yanboliang and @jaceklaskowski

@SparkQA

SparkQA commented Jul 3, 2016

Test build #61679 has finished for PR 14035 at commit 54c27d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 3, 2016

Test build #61681 has finished for PR 14035 at commit 6fdf290.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Repeated. What about moving it outside the test methods?

@HyukjinKwon
Member Author

@mengxr, @yanboliang, Could you review this?

@HyukjinKwon
Member Author

Hi @mengxr, is this the change you meant? Could you please take a look?

@HyukjinKwon
Member Author

Gentle ping @mengxr and @yanboliang

@HyukjinKwon
Member Author

HyukjinKwon commented Jul 15, 2016

ping @mengxr and @yanboliang ..

@HyukjinKwon
Member Author

Hm.. I can close this if it looks inappropriate or seems likely to cause a lot of conflicts across PRs. Could you please give some feedback, @mengxr and @yanboliang?

@HyukjinKwon
Member Author

ping @mengxr and @yanboliang

@SparkQA

SparkQA commented Aug 10, 2016

Test build #63487 has finished for PR 14035 at commit 5157d77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Hi @jkbradley, could you take a look at this one, please?

@SparkQA

SparkQA commented Aug 30, 2016

Test build #64632 has finished for PR 14035 at commit f2990b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon force-pushed the minor-ml-test branch 2 times, most recently from e803905 to 13b1a67 on September 22, 2016 04:35
@SparkQA

SparkQA commented Sep 22, 2016

Test build #65757 has finished for PR 14035 at commit 2cbcabd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 22, 2016

Test build #65758 has finished for PR 14035 at commit 13b1a67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Hi @mengxr, @yanboliang and @jkbradley, if these changes are too big, I can keep just testImplicits and let others fix the usages later, without a sweeping change. Could you please take a look?

@yanboliang
Contributor

Sorry for the late response. I like this change and will have a look soon. Thanks.

@HyukjinKwon
Member Author

Thank you!!

@yanboliang
Contributor

yanboliang commented Sep 25, 2016

@HyukjinKwon I have made a pass and this PR looks good overall. Could you address my minor comments and double check whether all ML test cases are covered? I found that we use a different style of implicit import in ChiSqSelectorSuite, so it would be better to unify them. Then I'd like to get this in. Thanks for working on this.

@HyukjinKwon
Member Author

Thanks @yanboliang and @jaceklaskowski. I addressed the comments, except for a few that I am not too sure of and that I think are unrelated to this change.

sparsePoints1 = sparsePoints1Seq.map(FeatureData).toDF()
// TODO: If we directly use `toDF` without parallelize, the test in
// "Throws error when given RDDs with different size vectors" is failed for an unknown reason.
densePoints2 = sc.parallelize(densePoints2Seq, 2).map(FeatureData).toDF()
Member Author

BTW, a test seems to fail for an unknown reason when I change this to densePoints2Seq.map(FeatureData).toDF().

@SparkQA

SparkQA commented Sep 25, 2016

Test build #65881 has finished for PR 14035 at commit ad9d7ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 25, 2016

Test build #65883 has finished for PR 14035 at commit b60c952.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("Test Chi-Square selector") {
val spark = this.spark
import spark.implicits._
import testImplicits._
Contributor

Nit: Actually it should be moved out of this test function and can be shared between all test cases if necessary.

@yanboliang
Contributor

LGTM, merged into master. Thanks!

@asfgit closed this in f234b7c Sep 26, 2016
@HyukjinKwon
Member Author

Thank you for reviewing this!

asfgit pushed a commit that referenced this pull request Sep 29, 2016
…istributed Dataset.

## What changes were proposed in this pull request?
#14035 added ```testImplicits``` to the ML unit tests and promoted ```toDF()```, but left one minor issue in ```VectorIndexerSuite```. If we create the DataFrame with ```Seq(...).toDF()```, one of the test cases throws a different error/exception than it does with ```sc.parallelize(Seq(...)).toDF()```.
After an in-depth study, I found that this is caused by the different behavior of local and distributed Datasets when the UDF fails at ```assert```. If the data is a local Dataset, it throws ```AssertionError``` directly; if the data is a distributed Dataset, it throws ```SparkException```, which wraps the ```AssertionError```. I think we should make this test cover both cases.
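
As a rough illustration of the two behaviors (not the actual test code), the sketch below uses a plain ```RuntimeException``` as a stand-in for ```SparkException``` so that it runs without Spark. The names ```hasAssertionError```, ```runLocal```, ```runDistributed``` and ```caught``` are hypothetical.

```scala
object WrappedAssertionSketch {
  // Hypothetical helper: walk the cause chain and report whether it
  // contains an AssertionError.
  @annotation.tailrec
  def hasAssertionError(t: Throwable): Boolean = t match {
    case null => false
    case _: AssertionError => true
    case other => hasAssertionError(other.getCause)
  }

  // Stand-in for a UDF whose `assert` fails on a vector of the wrong size.
  def failingUdf(): Nothing = throw new AssertionError("vector size mismatch")

  // "Local" path: the error reaches the caller unchanged.
  def runLocal(): Unit = failingUdf()

  // "Distributed" path: the failure is reported wrapped in another exception
  // (RuntimeException here is only a stand-in for SparkException).
  def runDistributed(): Unit =
    try failingUdf()
    catch { case e: AssertionError => throw new RuntimeException("job aborted", e) }

  // Run `body` and return whatever it threw, if anything.
  def caught(body: => Unit): Option[Throwable] =
    try { body; None } catch { case t: Throwable => Some(t) }

  def main(args: Array[String]): Unit = {
    val local = caught(runLocal())
    val distributed = caught(runDistributed())

    // Local: the AssertionError itself.
    assert(local.exists(_.isInstanceOf[AssertionError]))
    // Distributed: a wrapper whose cause chain contains the AssertionError.
    assert(distributed.exists(_.isInstanceOf[RuntimeException]))
    assert(distributed.exists(hasAssertionError))
  }
}
```

A test that wants to cover both cases can therefore either assert on each exception type separately or check the cause chain, as ```hasAssertionError``` does here.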

## How was this patch tested?
Unit test.

Author: Yanbo Liang <[email protected]>

Closes #15261 from yanboliang/spark-16356.
@jkbradley
Member

Sorry I'm seeing this so late, but thank you all for the PR & reviews!

@jaceklaskowski Regarding the explicit partitioning in unit tests, that's historical: In the past, we had run into some bugs which only showed up with multiple partitions, so we got in the habit of using multiple partitions. I still think it's a nice idea in general, though perhaps there are fewer such bugs nowadays.

@HyukjinKwon deleted the minor-ml-test branch January 2, 2018 03:39