[SPARK-17506][SQL] Improve the check double values equality rule. #15059
|
We already have a 'relTol' operator for this purpose BTW. |
|
But relTol is defined in mllib, and sql does not reference that module. Would it be better to move it to the spark-core project? |
|
Oh I see. Yes if we can move it to Spark's core test module that would be nicer. |
|
Test build #65265 has finished for PR 15059 at commit
|
|
@srowen I've addressed your comment, thank you! |
|
That seems reasonable to pull out some generic testing utils into common from mllib. This looks OK to me. CC maybe @jkbradley ? @MLnick ? just anyone for a second opinion. |
|
Test build #65306 has finished for PR 15059 at commit
|
|
Moving generic testing utils from mllib to common looks OK to me. Actually we have |
|
Yes @yanboliang is correct - we seem to have duplicated the double testing between |
|
@yanboliang Thank you! Will address your comment soon! |
|
Oops, I found that |
|
Yes in theory that's the thing to do, to have a series of increasingly 'core' modules but I think it's a bit late to undo the mono-core module design here. Instead, maybe just for this test, it's OK to 'inline' the relative tolerance logic. |
|
@srowen I've reverted the previous change and inlined the relative tolerance logic, thank you! |
 * Note that if x or y is extremely close to zero, i.e., smaller than Double.MinPositiveValue,
 * the relative tolerance is meaningless, so the exception will be raised to warn users.
 */
private def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {
Seems fine. You could refer to the source of this code and explain the duplication but it's not a big deal.
I've added a comment to indicate the problem. Thank you!
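For readers following the thread, here is a minimal sketch of what an inlined relative tolerance check with the signature quoted above might look like. It is not the code merged in this PR: the body is an assumption modeled on the doc comment, the object name `RelTolSketch` is hypothetical, and `IllegalArgumentException` stands in for whatever test-failure exception the real helper raises.

```scala
object RelTolSketch {
  /**
   * Returns true when x and y agree within the relative tolerance eps.
   * Hypothetical sketch; the body is assumed, not copied from Spark.
   */
  def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {
    val absX = math.abs(x)
    val absY = math.abs(y)
    val diff = math.abs(x - y)
    if (x == y) {
      // Exactly equal values (including 0.0 and -0.0) always match.
      true
    } else if (absX < Double.MinPositiveValue || absY < Double.MinPositiveValue) {
      // One side is zero, so a relative bound is meaningless; fail loudly.
      throw new IllegalArgumentException(
        s"$x or $y is extremely close to zero, so the relative tolerance is meaningless.")
    } else {
      // Scale the allowed difference by the smaller magnitude.
      diff < eps * math.min(absX, absY)
    }
  }
}
```

Scaling by the smaller magnitude is the stricter of the two possible choices, which seems appropriate for a test helper.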
|
Test build #65363 has finished for PR 15059 at commit
|
|
Test build #65364 has finished for PR 15059 at commit
|
(result, expected) match {
  case (result: Array[Byte], expected: Array[Byte]) =>
    java.util.Arrays.equals(result, expected)
  case (result: Double, expected: Spread[Double @unchecked]) =>
BTW, do you mean to remove this case? Does it not apply now?
Yes, it should have been replaced by the new case below.
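As a rough illustration of the exchange above, a type-dispatching comparison in which a plain `Double` pair is checked by relative tolerance might look like the sketch below. The object name `CompareSketch`, the `eps` parameter, the NaN handling, and the fallback case are all assumptions for illustration; the actual cases in `ExpressionEvalHelper` may differ.

```scala
object CompareSketch {
  /** Hypothetical sketch of a result comparison that dispatches on runtime type. */
  def compareResults(result: Any, expected: Any, eps: Double = 1E-8): Boolean =
    (result, expected) match {
      // Byte arrays need element-wise comparison; == on arrays is reference equality.
      case (r: Array[Byte], e: Array[Byte]) =>
        java.util.Arrays.equals(r, e)
      // Assumption: treat two NaNs as equal, since NaN == NaN is false.
      case (r: Double, e: Double) if r.isNaN && e.isNaN =>
        true
      // Doubles match when equal or within a relative tolerance.
      case (r: Double, e: Double) =>
        r == e || math.abs(r - e) < eps * math.min(math.abs(r), math.abs(e))
      // Everything else falls back to plain equality.
      case _ =>
        result == expected
    }
}
```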
|
Merged to master |
## What changes were proposed in this pull request?

In `ExpressionEvalHelper`, we check the equality of two double values by testing whether the expected value lies within the range [target - tolerance, target + tolerance], but this can cause a false negative when the compared values are very large.

Before:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
false
```

In fact, `val1` and `val2` are essentially the same value but with different precisions. We should tolerate this case by comparing against a percentage range, e.g., expected is within the range [target - target * tolerance_percentage, target + target * tolerance_percentage].

After:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
true
```

## How was this patch tested?

Existing test cases.

Author: jiangxingbo <[email protected]>

Closes apache#15059 from jiangxb1987/deq.
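To make the before/after difference concrete, the sketch below applies both range checks from the description to the two sample values. The helper names `withinAbsolute` and `withinRelative` and the tolerance of `1E-8` are assumptions for illustration; the description does not state the tolerance the old check used.

```scala
object ToleranceDemo {
  val val1 = 1.6358558070241E306
  val val2 = 1.6358558070240974E306

  // Old rule: expected lies in [target - tolerance, target + tolerance].
  def withinAbsolute(expected: Double, target: Double, tolerance: Double): Boolean =
    math.abs(expected - target) <= tolerance

  // New rule: the allowed range scales with the magnitude of the target.
  def withinRelative(expected: Double, target: Double, tolerancePct: Double): Boolean =
    math.abs(expected - target) <= math.abs(target) * tolerancePct

  def main(args: Array[String]): Unit = {
    // The two values differ by roughly 2.6E291, far above any small absolute bound.
    println(withinAbsolute(val1, val2, 1E-8)) // prints false
    // Relative to a magnitude near 1.6E306, that difference is about 1.6E-15.
    println(withinRelative(val1, val2, 1E-8)) // prints true
  }
}
```

Near 1.6E306, adjacent doubles are already about 3E290 apart, so no fixed small absolute tolerance can accept two distinct values of this size.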
checkEvaluation(Remainder(positiveLongLit, positiveLongLit), 0L)
checkEvaluation(Remainder(negativeLongLit, negativeLongLit), 0L)

// TODO: the following lines would fail the test due to inconsistency result of interpret
This TODO is not fixed yet; why remove it?
The interpreted and codegen results for the remainder of giant values are equal within the relative tolerance, so maybe this no longer needs to be resolved. Thanks!
Ah, it seemed worth removing because the change does apparently make this test pass, and that's what the comment refers to. If it's still an issue, we can restore a modified version of the comment.
nvm, it's fixed in #15171