[SPARK-17506][SQL] Improve the check double values equality rule. #15059

jiangxb1987 · 2016-09-12T15:40:38Z

What changes were proposed in this pull request?

In ExpressionEvalHelper, we check the equality between two double values by comparing whether the expected value is within the range [target - tolerance, target + tolerance], but this can cause a false negative when the compared values are very large.
Before:

val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
false

In fact, val1 and val2 represent the same value but with different precisions. We should tolerate this case by comparing against a percentage range, e.g., expected is within the range [target - target * tolerance_percentage, target + target * tolerance_percentage].
After:

val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
true
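
To illustrate the difference between the two rules, here is a minimal, self-contained sketch. The names ToleranceDemo, withinAbsolute, and withinRelative, and the tolerance value 1e-8, are assumptions made for illustration only; this is not the code in ExpressionEvalHelper.

```scala
object ToleranceDemo {
  // Absolute rule: is y inside [x - tol, x + tol]?
  def withinAbsolute(x: Double, y: Double, tol: Double): Boolean =
    math.abs(x - y) <= tol

  // Relative rule: is y inside [x - |x| * relTol, x + |x| * relTol]?
  def withinRelative(x: Double, y: Double, relTol: Double): Boolean =
    math.abs(x - y) <= relTol * math.abs(x)

  def main(args: Array[String]): Unit = {
    val val1 = 1.6358558070241E306
    val val2 = 1.6358558070240974E306
    // The gap between the two values is roughly 2.6E291.
    println(withinAbsolute(val1, val2, 1e-8)) // false
    println(withinRelative(val1, val2, 1e-8)) // true
  }
}
```

The gap between the two values is far larger than any small fixed tolerance, yet it is tiny relative to a magnitude of about 1.6E306, which is why only the relative rule accepts the pair.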

How was this patch tested?

Existing test cases.

srowen · 2016-09-12T15:47:56Z

We already have a 'relTol' operator for this purpose BTW.

WeichenXu123 · 2016-09-12T16:50:24Z

But relTol is defined in mllib, and sql does not reference it. Would it be better to move it to the spark-core project?

srowen · 2016-09-12T16:51:16Z

Oh I see. Yes if we can move it to Spark's core test module that would be nicer.

SparkQA · 2016-09-12T20:29:30Z

Test build #65265 has finished for PR 15059 at commit 78f3733.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jiangxb1987 · 2016-09-13T08:31:01Z

@srowen I've addressed your comment, thank you!

srowen · 2016-09-13T08:42:48Z

That seems reasonable to pull out some generic testing utils into common from mllib. This looks OK to me. CC maybe @jkbradley ? @MLnick ? just anyone for a second opinion.

SparkQA · 2016-09-13T10:38:45Z

Test build #65306 has finished for PR 15059 at commit 1721e0c.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class CompareDoubleRightSide(
- implicit class DoubleWithAlmostEquals(val x: Double)
- class TestingUtilsSuite extends SparkFunSuite

yanboliang · 2016-09-14T03:39:26Z

Moving generic testing utils from mllib to common looks OK to me. Actually we have TestingUtils under both spark.ml.util and spark.mllib.util. If we would like to move, we should remove both of them. Thanks!

MLnick · 2016-09-14T06:21:03Z

Yes @yanboliang is correct - we seem to have duplicated the double testing between ml and mllib. I think we can get rid of it in ml.TestingUtils also (but keep the vector/matrix stuff as it still needs to be different between package ml.linalg and mllib.linalg).

jiangxb1987 · 2016-09-14T06:23:34Z

@yanboliang Thank you! Will address your comment soon!

yanboliang · 2016-09-14T08:21:49Z

Oops, I found that ml.TestingUtils is located in the mllib-local module, and we cannot move it to spark-core since mllib-local is not intended to depend on other Spark sub-modules. So we cannot move these generic testing utils from mllib to common, IMO. Maybe we can have one sub-module, like hadoop common, for Spark-wide common utils that does not depend on other modules, but this should be defined further. Thanks! cc @MLnick @srowen

srowen · 2016-09-14T08:23:34Z

Yes in theory that's the thing to do, to have a series of increasingly 'core' modules but I think it's a bit late to undo the mono-core module design here. Instead, maybe just for this test, it's OK to 'inline' the relative tolerance logic.

jiangxb1987 · 2016-09-14T09:33:22Z

@srowen I've reverted the previous change and inlined the relative tolerance logic, thank you!

srowen · 2016-09-14T09:43:00Z

...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala

+   * Note that if x or y is extremely close to zero, i.e., smaller than Double.MinPositiveValue,
+   * the relative tolerance is meaningless, so the exception will be raised to warn users.
+   */
+  private def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {


Seems fine. You could refer to the source of this code and explain the duplication but it's not a big deal.

I've added a comment to indicate the problem. Thank you!
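
For readers without the full diff, the following is a sketch of how a helper with this signature could behave, based only on the doc comment quoted above. The body, the object name RelativeErrorSketch, the use of the smaller magnitude as the scale, and the choice of IllegalArgumentException are all assumptions, not the merged code.

```scala
object RelativeErrorSketch {
  // Hypothetical stand-in for the helper under review; only the signature
  // comes from the diff, everything else is an illustrative assumption.
  def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {
    val absX = math.abs(x)
    val absY = math.abs(y)
    val diff = math.abs(x - y)
    if (x == y) {
      // Exactly equal values (including 0.0 vs 0.0) always match.
      true
    } else if (absX < Double.MinPositiveValue || absY < Double.MinPositiveValue) {
      // One side is zero while the other is not: a relative bound is meaningless.
      throw new IllegalArgumentException(
        s"$x or $y is extremely close to zero, so the relative tolerance is meaningless.")
    } else {
      // Scale the tolerance by the smaller magnitude (assumed choice).
      diff < eps * math.min(absX, absY)
    }
  }
}
```

Under these assumptions, RelativeErrorSketch.relativeErrorComparison(1.0, 1.0 + 1e-10) returns true, relativeErrorComparison(1.0, 1.1) returns false, and relativeErrorComparison(0.0, 1.0) throws instead of silently returning a result.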

SparkQA · 2016-09-14T11:34:59Z

Test build #65363 has finished for PR 15059 at commit f4ef207.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-09-14T12:49:16Z

Test build #65364 has finished for PR 15059 at commit 6f91656.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-09-16T09:17:09Z

...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala

    (result, expected) match {
      case (result: Array[Byte], expected: Array[Byte]) =>
        java.util.Arrays.equals(result, expected)
-      case (result: Double, expected: Spread[Double @unchecked]) =>


BTW do you mean to remove this case? does it not apply now?

Yes, it should have been replaced by the new case below.

srowen · 2016-09-18T15:04:55Z

Merged to master

## What changes were proposed in this pull request?

In `ExpressionEvalHelper`, we check the equality between two double values by comparing whether the expected value is within the range [target - tolerance, target + tolerance], but this can cause a false negative when the compared values are very large.

Before:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
false
```

In fact, `val1` and `val2` represent the same value but with different precisions. We should tolerate this case by comparing against a percentage range, e.g., expected is within the range [target - target * tolerance_percentage, target + target * tolerance_percentage].

After:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
true
```

## How was this patch tested?

Existing test cases.

Author: jiangxingbo

Closes apache#15059 from jiangxb1987/deq.

cloud-fan · 2016-09-21T02:02:50Z

...yst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala

    checkEvaluation(Remainder(positiveLongLit, positiveLongLit), 0L)
    checkEvaluation(Remainder(negativeLongLit, negativeLongLit), 0L)

-    // TODO: the following lines would fail the test due to inconsistency result of interpret


this TODO is not fixed yet, why remove it?

The results of the interpreted and codegen paths for remainder between giant values are equal within relative tolerance, so maybe this no longer needs to be resolved. Thanks!

Ah, it seemed worth removing because the change does apparently make this test pass, and that's what the comment refers to. If it's still an issue, we can restore a modified version of the comment.

nvm, it's fixed in #15171

check the equality of double values with tolerance within percentage range.

78f3733

use relTol to compare double values.

1721e0c

jiangxb1987 added 2 commits September 14, 2016 16:47

Revert "use relTol to compare double values."

b4ad8e3

This reverts commit 1721e0c.

inline the relative tolerance logic.

f4ef207

srowen reviewed Sep 14, 2016

add comment.

6f91656

srowen reviewed Sep 16, 2016


asfgit closed this in 5d3f461 Sep 18, 2016

cloud-fan reviewed Sep 21, 2016

