-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF() #14035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @mengxr, @yanboliang and @jaceklaskowski |
|
Test build #61679 has finished for PR 14035 at commit
|
|
Test build #61681 has finished for PR 14035 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeated. What about Moving it outside test methods?
|
@mengxr, @yanboliang, Could you review this? |
|
Hi @mengxr, is this the change you meant? Could you please take a look? |
|
Gentle ping @mengxr and @yanboliang |
|
ping @mengxr and @yanboliang .. |
|
hm.. I can close if it looks inappropriate or it seems making a lot of conflicts across PRs. Could you give some feedback please @mengxr and @yanboliang ? |
|
ping @mengxr and @yanboliang |
|
Test build #63487 has finished for PR 14035 at commit
|
|
Hi @jkbradley, could you take a look for this one please? |
|
Test build #64632 has finished for PR 14035 at commit
|
f2990b1 to
2cbcabd
Compare
e803905 to
13b1a67
Compare
|
Test build #65757 has finished for PR 14035 at commit
|
|
Test build #65758 has finished for PR 14035 at commit
|
|
Hi @mengxr, @yanboliang and @jkbradley, if these changes are so big, I can just leave |
|
Sorry for late response. I like this change and will have a look soon. Thanks. |
|
Thank you!! |
|
@HyukjinKwon I have made a pass and this PR look good overall. Could you address my minor comments and double check whether all ML test cases are covered? Since I found we used implicit import of different style at |
13b1a67 to
ad9d7ac
Compare
|
Thanks @yanboliang and @jaceklaskowski . I addressed comments except for few comments I am not too sure of and I think are not related changes. |
| sparsePoints1 = sparsePoints1Seq.map(FeatureData).toDF() | ||
| // TODO: If we directly use `toDF` without parallelize, the test in | ||
| // "Throws error when given RDDs with different size vectors" is failed for an unknown reason. | ||
| densePoints2 = sc.parallelize(densePoints2Seq, 2).map(FeatureData).toDF() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, It seems a test is failed when I change this to densePoints2Seq.map(FeatureData).toDF() for an unknown reason.
|
Test build #65881 has finished for PR 14035 at commit
|
|
Test build #65883 has finished for PR 14035 at commit
|
| test("Test Chi-Square selector") { | ||
| val spark = this.spark | ||
| import spark.implicits._ | ||
| import testImplicits._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Actually it should be moved out of this test function and can be shared between all test cases if necessary.
|
LGTM, merged into master. Thanks! |
|
Thank you for reviewing this! |
…istributed Dataset. ## What changes were proposed in this pull request? #14035 added ```testImplicits``` to ML unit tests and promoted ```toDF()```, but left one minor issue at ```VectorIndexerSuite```. If we create the DataFrame by ```Seq(...).toDF()```, it will throw different error/exception compared with ```sc.parallelize(Seq(...)).toDF()``` for one of the test cases. After in-depth study, I found it was caused by different behavior of local and distributed Dataset if the UDF failed at ```assert```. If the data is local Dataset, it throws ```AssertionError``` directly; If the data is distributed Dataset, it throws ```SparkException``` which is the wrapper of ```AssertionError```. I think we should enforce this test to cover both case. ## How was this patch tested? Unit test. Author: Yanbo Liang <[email protected]> Closes #15261 from yanboliang/spark-16356.
|
Sorry I'm seeing this so late, but thank you all for the PR & reviews! @jaceklaskowski Regarding the explicit partitioning in unit tests, that's historical: In the past, we had run into some bugs which only showed up with multiple partitions, so we got in the habit of using multiple partitions. I still think it's a nice idea in general, though perhaps there are fewer such bugs nowadays. |
What changes were proposed in this pull request?
This was suggested in 101663f#commitcomment-17114968.
This PR adds
testImplicitstoMLlibTestSparkContextso that some implicits such astoDF()can be sued across ml tests.This PR also changes all the usages of
spark.createDataFrame( ... )totoDF()where applicable in ml tests in Scala.How was this patch tested?
Existing tests should work.