Contributor

@yanboliang yanboliang commented Sep 18, 2016

What changes were proposed in this pull request?

#14881 added a Kolmogorov-Smirnov test wrapper to SparkR. I found that print.summary.KSTest was implemented incorrectly and had no effect.

Running the following code for KSTest:

data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25, -1, -0.5))
df <- createDataFrame(data)
testResult <- spark.kstest(df, "test", "norm")
summary(testResult)

Before this PR:
[image: summary output before the change]
After this PR:
[image: summary output after the change]
The new implementation is similar to print.summary.GeneralizedLinearRegressionModel in SparkR and print.summary.glm in native R.
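
For readers unfamiliar with the native R convention referred to here, the following is a minimal base-R sketch of the summary / print.summary pattern. The class name `toyTest`, its fields, and the printed layout are hypothetical and are not the SparkR implementation.

```r
# Hypothetical result object; "toyTest" is not a SparkR class.
result <- structure(list(p.value = 0.2, statistic = 0.38), class = "toyTest")

# summary() returns a plain list tagged with a "summary.<class>" class,
# so that printing dispatches to print.summary.toyTest below.
summary.toyTest <- function(object, ...) {
  ans <- list(p.value = object$p.value, statistic = object$statistic)
  class(ans) <- "summary.toyTest"
  ans
}

# The print method writes to the console and returns its argument invisibly,
# following the convention used by print.summary.glm.
print.summary.toyTest <- function(x, ...) {
  cat("Toy test summary:\n")
  cat("statistic =", x$statistic, "\n")
  cat("pValue =", x$p.value, "\n")
  invisible(x)
}

s <- summary(result)
print(s)
```

Because the summary object carries its own class, typing `summary(result)` at the console prints through the same method without an explicit `print` call.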

BTW, I removed the comparison against the print.summary.KSTest output from the unit test, since it only wraps the summary output, which has already been checked. Another reason is that these comparisons print summary information to the test console and clutter the test output.

How was this patch tested?

Existing test.


SparkQA commented Sep 18, 2016

Test build #65568 has finished for PR 15139 at commit 22365ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

expect_equal(stats$p.value, rStats$p.value, tolerance = 1e-4)
expect_equal(stats$statistic, unname(rStats$statistic), tolerance = 1e-4)

printStr <- print.summary.KSTest(testResult)
Member

@felixcheung felixcheung Sep 18, 2016

I think we should have a test for the function and its output?

Member

It's easy to turn this into capture.output to avoid polluting the output.
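
As a rough illustration of that suggestion, `capture.output` from base R's utils package evaluates an expression and returns what it would have printed as a character vector, so nothing reaches the test console. The named vector below is only a stand-in for the test result and is an assumption made for illustration.

```r
# Stand-in for a printed result; a real test would pass the KSTest summary.
standIn <- c(statistic = 0.38, p.value = 0.2)

# The printed lines are returned instead of being written to the console.
printed <- capture.output(print(standIn))

# The test can still assert that the print method produced something.
stopifnot(is.character(printed), length(printed) > 0)
```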

Contributor Author

@yanboliang yanboliang Sep 18, 2016

The output depends on the implementation of KolmogorovSmirnovTestResult.toString; if that were updated, this test would break, which is not reasonable IMO. We have already checked the summary output values in the unit test, so I don't think it's necessary to check another representation of the same values. I also believe we will add more summary information later, which would require updating the expected printed summary again and again. We do not check the printed summary in any of the other ML wrappers. Thanks.

@felixcheung
Member

@junyangq

Member

felixcheung commented Sep 18, 2016 via email

@yanboliang
Contributor Author

@felixcheung I agree with you and added a test to check that the R print method is doing something. If you have any more comments, feel free to let me know. Thanks!


SparkQA commented Sep 18, 2016

Test build #65572 has finished for PR 15139 at commit 1e5b6c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

degreesOfFreedom = degreesOfFreedom)
ans <- list(p.value = pValue, statistic = statistic, nullHypothesis = nullHypothesis,
nullHypothesis.name = distName, nullHypothesis.parameters = distParams,
degreesOfFreedom = degreesOfFreedom, jobj = jobj)
Member

Does jobj show up in the summary? This seems to be different from our other implementations?

Contributor Author

@yanboliang yanboliang Sep 20, 2016

Yes, but it's OK for it to show up. This implementation is a little different from the others because we would like to reuse the summary output string that is defined on the Scala side; otherwise, we would have to reimplement the same thing on the R side. For other wrappers such as GLM, we implement the summary output string directly on the R side, since there is no appropriate toString implementation on the Scala side. Maybe we can unify them later.
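
The trade-off described here can be sketched in base R as follows. The `backend` list stands in for the JVM object held in `jobj`, and its `summaryString` function stands in for the Scala-side string; both are hypothetical, as are the class name `summary.toyKSTest` and the summary text. SparkR itself would reach the JVM through its own bridge rather than through a closure.

```r
# Hypothetical stand-in for the JVM-side result object (jobj).
backend <- list(
  summaryString = function() {
    "Kolmogorov-Smirnov test summary:\nstatistic = 0.38\npValue = 0.2"
  }
)

# The summary keeps a reference to the backend so printing can reuse its text.
ans <- list(p.value = 0.2, statistic = 0.38, jobj = backend)
class(ans) <- "summary.toyKSTest"

# Printing delegates to the backend string instead of rebuilding it in R.
print.summary.toyKSTest <- function(x, ...) {
  cat(x$jobj$summaryString(), "\n")
  invisible(x)
}

print(ans)
```

The cost of this design, as raised in the review, is that `jobj` becomes a visible field of the summary list.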

Member

Possibly - I'm not sure if there is a better way to do this; but likely better if we have some consistent way to do this for later.

"statistic = 0.38208[0-9]* \\n",
"pValue = 0.19849[0-9]* \\n",
".*"), perl = TRUE)
expect_true(length(capture.output(stats)) != 0)
Member

My recommendation would be to check for the first line:
Kolmogorov-Smirnov test summary:
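
A check along those lines might look like the sketch below, which uses only base R. The `summaryText` string is a made-up example of what the print method could emit; only the header line comes from this thread.

```r
# Made-up printed output; only the first line is taken from the discussion.
summaryText <- "Kolmogorov-Smirnov test summary:\nstatistic = 0.38\npValue = 0.2\n"

# Capture what would be printed and split it into lines.
printed <- capture.output(cat(summaryText))

# Assert only on the stable header, not on numeric formatting.
stopifnot(identical(printed[1], "Kolmogorov-Smirnov test summary:"))
```

Matching only the header keeps the test from breaking when more fields are added to the summary or when the number formatting changes.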

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, updated.


SparkQA commented Sep 20, 2016

Test build #65656 has finished for PR 15139 at commit 5c5c4ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

LGTM, @junyangq do you have any comment?

@yanboliang
Contributor Author

I will merge this into master. If anyone has more comments, I can address them in follow-up work. Thanks for your review, @felixcheung.

@asfgit asfgit closed this in 6902eda Sep 22, 2016
@yanboliang yanboliang deleted the spark-17315 branch September 22, 2016 03:17