-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17315][Follow-up][SparkR][ML] Fix print of Kolmogorov-Smirnov test summary #15139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #65568 has finished for PR 15139 at commit
|
| expect_equal(stats$p.value, rStats$p.value, tolerance = 1e-4) | ||
| expect_equal(stats$statistic, unname(rStats$statistic), tolerance = 1e-4) | ||
|
|
||
| printStr <- print.summary.KSTest(testResult) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have test for the function and its output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's easy to turn this into capture.output to avoid polluting output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output is relevant to the implementation of KolmogorovSmirnovTestResult.toString, if it was updated, this test will be broken which is not reasonable IMO. We have already checked the summary output value in unit test, I think it's not necessary to check another kinds of these value again. And I believe we will add more summary information later, which will require updating the expected printing summary again and again. In all other ML wrappers, we did not check print summary. Thanks.
|
I agree we don't need to check the full output but we should check the R print method is doing something - without a test, we have it broken before.
Generally we should have a test at minimum for any API we are exporting.
|
|
@felixcheung I agree with you and added test to check the R print method is doing sth. Any more comments, feel free to let me know. Thanks! |
|
Test build #65572 has finished for PR 15139 at commit
|
| degreesOfFreedom = degreesOfFreedom) | ||
| ans <- list(p.value = pValue, statistic = statistic, nullHypothesis = nullHypothesis, | ||
| nullHypothesis.name = distName, nullHypothesis.parameters = distParams, | ||
| degreesOfFreedom = degreesOfFreedom, jobj = jobj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does jobj show up in the summary? this seems to be different to our other implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's ok to show up. This implementation is a little different from others since we would like to reuse the summary information output string which was defined at Scala side, otherwise, we should reimplement a same one at R side. We implement the summary information output string at R side directly for other wrappers such as GLM, since we do not have appropriate toString implementation at Scala side. Maybe we may unify them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly - I'm not sure if there is a better way to do this; but likely better if we have some consistent way to do this for later.
| "statistic = 0.38208[0-9]* \\n", | ||
| "pValue = 0.19849[0-9]* \\n", | ||
| ".*"), perl = TRUE) | ||
| expect_true(length(capture.output(stats)) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my recommendation would be to check for the first line:
Kolmogorov-Smirnov test summary:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, updated.
|
Test build #65656 has finished for PR 15139 at commit
|
|
LGTM, @junyangq do you have any comment? |
|
I will merge this into master. If anyone has more comments, I can address them at follow-up work. Thanks for your review. @felixcheung |
What changes were proposed in this pull request?
#14881 added Kolmogorov-Smirnov Test wrapper to SparkR. I found that
print.summary.KSTestwas implemented inappropriately and result in no effect.Running the following code for KSTest:
Before this PR:


After this PR:
The new implementation is similar with
print.summary.GeneralizedLinearRegressionModelof SparkR andprint.summary.glmof native R.BTW, I removed the comparison of
print.summary.KSTestin unit test, since it's only wrappers of the summary output which has been checked. Another reason is that these comparison will output summary information to the test console, it will make the test output in a mess.How was this patch tested?
Existing test.