[SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite #3735

dbtsai · 2014-12-18T19:48:02Z

The original test doesn't make sense: if you step through it, lossSum is already NaN
and the coefficients are diverging. That's because the step size is too large for SGD,
so the optimization never converges.
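Here is a minimal pure-Python sketch of why an oversized step breaks an L2-regularized update. It does not reproduce the Spark test or its NaN loss. Full-batch gradient descent with a fixed step stands in for MLlib's SGD, and the one-feature dataset `DATA` and the helpers `sigmoid`, `gradient` and `gradient_descent` are hypothetical. Each update can be written as w <- (1 - step * reg) * w - step * loss_grad(w). The logistic-loss gradient stays bounded, so once step * reg > 2 the shrinkage factor exceeds 1 in magnitude and the weight oscillates with growing amplitude.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: math.exp only ever sees a non-positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(w, data, reg):
    # Gradient of the mean logistic loss plus (reg / 2) * w^2, single weight, no intercept.
    loss_grad = sum((sigmoid(w * x) - y) * x for x, y in data) / len(data)
    return loss_grad + reg * w


def gradient_descent(data, step, reg, iters, w0=0.0):
    # Fixed-step full-batch updates: w <- (1 - step * reg) * w - step * loss_grad(w).
    w = w0
    for _ in range(iters):
        w -= step * gradient(w, data, reg)
    return w


# Hypothetical, non-separable toy data: (feature, label) pairs with labels in {0, 1}.
DATA = [(1.0, 1), (-1.0, 0), (2.0, 1), (-0.5, 1), (0.5, 0)]

w_small = gradient_descent(DATA, step=0.1, reg=1.0, iters=1000)  # step * reg = 0.1
w_large = gradient_descent(DATA, step=10.0, reg=1.0, iters=50)   # step * reg = 10 > 2
print(w_small, w_large)
```

With these values `w_small` settles at a stationary point. `w_large` alternates in sign and grows by roughly a factor of 9 per iteration.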

The correct behavior is that the regularized model should have smaller coefficients than
the one trained without regularization. Comparing the values with a relative error
tolerance of 20000.0 doesn't make sense either.
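The intended assertion can be made concrete with a small sketch under the same toy assumptions: a hypothetical `DATA`, hypothetical helpers `fit` and `check_regularization_shrinks`, and full-batch gradient descent in place of MLlib's SGD. Rather than bounding a relative error, it asserts the qualitative property that the L2-regularized coefficient is smaller in magnitude than the unregularized one. With more than one feature, the guaranteed statement is that the L2 norm of the coefficient vector does not grow as the regularization parameter increases. Individual coefficients can move either way.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(data, reg, step=1.0, iters=2000):
    # Full-batch gradient descent on mean logistic loss + (reg / 2) * w^2 (one weight, no intercept).
    w = 0.0
    for _ in range(iters):
        loss_grad = sum((sigmoid(w * x) - y) * x for x, y in data) / len(data)
        w -= step * (loss_grad + reg * w)
    return w


def check_regularization_shrinks(data, reg):
    # The property worth asserting: regularization pulls the coefficient toward zero.
    w_plain = fit(data, reg=0.0)
    w_reg = fit(data, reg=reg)
    assert abs(w_reg) < abs(w_plain), (w_reg, w_plain)
    return w_plain, w_reg


# Hypothetical non-separable data, so the unregularized optimum is finite.
DATA = [(1.0, 1), (-1.0, 0), (2.0, 1), (-0.5, 1), (0.5, 0)]
w_plain, w_reg = check_regularization_shrinks(DATA, reg=1.0)
print(w_plain, w_reg)
```

The data must not be linearly separable. Otherwise the unregularized coefficient has no finite optimum and keeps growing with the iteration count.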

SparkQA · 2014-12-18T19:52:40Z

Test build #24595 has started for PR 3735 at commit b1a3c42.

This patch merges cleanly.

jkbradley · 2014-12-18T20:42:10Z

LGTM, pending Jenkins.

This kind of test seems pretty fragile anyway. I guess it provides a check that the optimization behavior hasn't changed underneath, but at some point it might be better to switch to proper convergence checks. E.g., we could use a well-conditioned problem and compare the results after a bunch of iterations from different starting points. The allowed epsilon could then be calculated properly from the convergenceTol and the objective. Is that too much for now? It could be a JIRA instead.
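The suggested check can be sketched as follows. The toy L2-regularized problem below is `REG`-strongly convex. Any point whose gradient magnitude is below `GRAD_TOL` therefore lies within `GRAD_TOL / REG` of the unique optimum, so runs from different starting points must agree to within `2 * GRAD_TOL / REG`. The gradient-norm stopping rule is an assumption chosen because it gives this clean bound, and it may not match how MLlib interprets `convergenceTol`. `minimize`, `DATA`, and the starting points are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(w, data, reg):
    # Gradient of mean logistic loss + (reg / 2) * w^2 (one weight, no intercept).
    loss_grad = sum((sigmoid(w * x) - y) * x for x, y in data) / len(data)
    return loss_grad + reg * w


def minimize(data, reg, w0, step=1.0, grad_tol=1e-8, max_iters=10_000):
    # Gradient descent that stops once |gradient| < grad_tol (assumed stopping rule).
    w = w0
    for _ in range(max_iters):
        g = gradient(w, data, reg)
        if abs(g) < grad_tol:
            return w
        w -= step * g
    raise RuntimeError(f"no convergence from w0={w0}")


DATA = [(1.0, 1), (-1.0, 0), (2.0, 1), (-0.5, 1), (0.5, 0)]
REG = 1.0
GRAD_TOL = 1e-8
# Strong convexity with modulus REG gives |w - w*| <= |gradient(w)| / REG < GRAD_TOL / REG,
# so two converged runs can differ by at most twice that.
EPS = 2 * GRAD_TOL / REG

starts = [-10.0, -1.0, 0.0, 1.0, 10.0]
solutions = [minimize(DATA, REG, w0, grad_tol=GRAD_TOL) for w0 in starts]
assert max(solutions) - min(solutions) <= EPS, solutions
```

The bound comes from the objective's strong convexity rather than from a hand-picked tolerance, so tightening `GRAD_TOL` tightens `EPS` automatically.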

SparkQA · 2014-12-18T21:14:30Z

Test build #24595 has finished for PR 3735 at commit b1a3c42.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-18T21:14:34Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24595/

dbtsai · 2014-12-18T21:48:21Z

I agree. The test is not good. I'm thinking we could add a couple of well-known datasets, such as iris or the prostate cancer dataset, to the test resources and compare the accuracy with R. At Alpine, we also keep a small R script as a comment in the Scala test that reproduces the coefficients. Of course, we should also test whether it converges to the same solution from different initial conditions. I found this issue while refactoring our internal MLOR code into MLlib, and I will add more tests in the MLOR PR.

mengxr · 2014-12-18T21:56:15Z

Merged into master. Thanks!

Commit b1a3c42: first commit

asfgit closed this in 59a49db Dec 18, 2014

dbtsai deleted the mlortestfix branch December 19, 2014 20:30