Skip to content

Conversation

@jiangxb1987
Copy link
Contributor

What changes were proposed in this pull request?

In ExpressionEvalHelper, we check the equality between two double values by comparing whether the expected value is within the range [target - tolerance, target + tolerance], but this can cause a negative false when the compared numerics are very large.
Before:

val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
false

In fact, val1 and val2 are but with different precisions, we should tolerant this case by comparing with percentage range, eg.,expected is within range [target - target * tolerance_percentage, target + target * tolerance_percentage].
After:

val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
true

How was this patch tested?

Exsiting testcases.

@srowen
Copy link
Member

srowen commented Sep 12, 2016

We already have a 'relTol' operator for this purpose BTW.

@WeichenXu123
Copy link
Contributor

but relTol is defined in mllib and sql not reference it, seems better to move it to spark-core project?

@srowen
Copy link
Member

srowen commented Sep 12, 2016

Oh I see. Yes if we can move it to Spark's core test module that would be nicer.

@SparkQA
Copy link

SparkQA commented Sep 12, 2016

Test build #65265 has finished for PR 15059 at commit 78f3733.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

@srowen I've addressed your comment, thank you!

@srowen
Copy link
Member

srowen commented Sep 13, 2016

That seems reasonable to pull out some generic testing utils into common from mllib. This looks OK to me. CC maybe @jkbradley ? @MLnick ? just anyone for a second opinion.

@SparkQA
Copy link

SparkQA commented Sep 13, 2016

Test build #65306 has finished for PR 15059 at commit 1721e0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CompareDoubleRightSide(
    • implicit class DoubleWithAlmostEquals(val x: Double)
    • class TestingUtilsSuite extends SparkFunSuite

@yanboliang
Copy link
Contributor

Moving generic testing utils from mllib to common looks OK to me. Actually we have TestingUtils under both spark.ml.util and spark.mllib.util. If we would like to move, we should remove both of them. Thanks!

@MLnick
Copy link
Contributor

MLnick commented Sep 14, 2016

Yes @yanboliang is correct - we seem to have duplicated the double testing between ml and mllib. I think we can get rid of it in ml.TestingUtils also (but keep the vector/matrix stuff as it still needs to be different between package ml.linalg and mllib.linalg).

@jiangxb1987
Copy link
Contributor Author

@yanboliang Thank you! Will address your comment soon!

@yanboliang
Copy link
Contributor

yanboliang commented Sep 14, 2016

Oops, I found that ml.TestingUtils was located at mllib-local module and we can not move it to spark-core since mllib-local is not intended to depend on other spark sub modules. So we can not move these generic testing utils from mllib to common IMO. May be we can have one sub module like hadoop common for the whole spark common utils which does not depend on other modules, and this should be further defined. Thanks! cc @MLnick @srowen

@srowen
Copy link
Member

srowen commented Sep 14, 2016

Yes in theory that's the thing to do, to have a series of increasingly 'core' modules but I think it's a bit late to undo the mono-core module design here. Instead, maybe just for this test, it's OK to 'inline' the relative tolerance logic.

@jiangxb1987
Copy link
Contributor Author

@srowen I've reverted the previous change and inlined the relative tolerance logic, thank you!

* Note that if x or y is extremely close to zero, i.e., smaller than Double.MinPositiveValue,
* the relative tolerance is meaningless, so the exception will be raised to warn users.
*/
private def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine. You could refer to the source of this code and explain the duplication but it's not a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've add comment to indicate the problem. Thank you!

@SparkQA
Copy link

SparkQA commented Sep 14, 2016

Test build #65363 has finished for PR 15059 at commit f4ef207.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 14, 2016

Test build #65364 has finished for PR 15059 at commit 6f91656.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(result, expected) match {
case (result: Array[Byte], expected: Array[Byte]) =>
java.util.Arrays.equals(result, expected)
case (result: Double, expected: Spread[Double @unchecked]) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do you mean to remove this case? does it not apply now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should have been replaced by the new case below.

@srowen
Copy link
Member

srowen commented Sep 18, 2016

Merged to master

@asfgit asfgit closed this in 5d3f461 Sep 18, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?

In `ExpressionEvalHelper`, we check the equality between two double values by comparing whether the expected value is within the range [target - tolerance, target + tolerance], but this can cause a negative false when the compared numerics are very large.
Before:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
false
```
In fact, `val1` and `val2` are but with different precisions, we should tolerant this case by comparing with percentage range, eg.,expected is within range [target - target * tolerance_percentage, target + target * tolerance_percentage].
After:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
true
```

## How was this patch tested?

Exsiting testcases.

Author: jiangxingbo <[email protected]>

Closes apache#15059 from jiangxb1987/deq.
checkEvaluation(Remainder(positiveLongLit, positiveLongLit), 0L)
checkEvaluation(Remainder(negativeLongLit, negativeLongLit), 0L)

// TODO: the following lines would fail the test due to inconsistency result of interpret
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this TODO is not fixed yet, why remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of interpret and codegen for remainder between giant values are equal within relative tolerance, so maybe this no longer requires to be resolved. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it seemed worth removing because the change does apparently make this test pass, and that's what the comment refers to. If it's still an issue, we can restore a modified version of the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, it's fixed in #15171

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants