-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-15096][ML]:LogisticRegression MultiClassSummarizer numClasses can fail if no valid labels are found #12969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #58042 has finished for PR 12969 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to handle the case where the map is empty? why not handle it directly then?
Just needs one new line of code, like: require(distinctMap.nonEmpty)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or is the correct behavior to return 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning 0 is reasonable. The inline comments of the JIRA, @jkbradley proposed the fix of throwing a better exception.
I am glad to learn using require. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the only result of returning 0 that histogram() returns an empty array? that seems ... fine. It's not meaningful but it's not wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Let me do some test about return 0 directly when the hashtable is empty and make changes accordingly. Thanks for your review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen I made changes to return 0 when the hashtable is empty.
I found that
301 if (numInvalid != 0) {
302 val msg = s"Classification labels should be in {0 to ${numClasses - 1} " +
303 s"Found $numInvalid invalid labels."
304 logError(msg)
305 throw new SparkException(msg)
306 }
So, for the empty case, it will throw an exception anyway after train. Then, returning 0 or throw an exception won't make much difference.
|
Test build #58148 has finished for PR 12969 at commit
|
|
@srowen I have made changes as we discussed. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=== true is redundant everywhere
|
It seems like a good change, because several lower-level methods now return a reasonable result instead of an error. At a higher level, having 0 labels probably still results in some (better) error, or a trivial model, but that has at least improved the situation. |
|
Test build #58583 has finished for PR 12969 at commit
|
…can fail if no valid labels are found ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) Throw better exception when numClasses is empty and empty.max is thrown. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Add a new unit test, which calls histogram with empty numClasses. Author: [email protected] <[email protected]> Closes #12969 from wangmiao1981/logisticR. (cherry picked from commit 354f8f1) Signed-off-by: Sean Owen <[email protected]>
|
Merged to master/2.0 |
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
Throw better exception when numClasses is empty and empty.max is thrown.
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
Add a new unit test, which calls histogram with empty numClasses.