[SPARK-26950][SQL][TEST] Make RandomDataGenerator use Float.NaN or Double.NaN for all NaN values #23851
…uble.NaN for all NaN values
|
cc @dbtsai , @cloud-fan , @gatorsmile , @HyukjinKwon |
|
Test build #102576 has finished for PR 23851 at commit
|
|
Retest this please. |
|
it seems the better fix is to wrap expressions with |
|
Test build #102579 has finished for PR 23851 at commit
|
|
Huh! I didn't realize there were many representations of NaN. This seems OK. However this code caught my attention; it seems to be trying to generate floats 'uniformly' with |
|
Thank you for the review, @cloud-fan and @srowen . To @cloud-fan .
We can wrap (transform) the first argument
protected def checkEvaluationWithUnsafeProjection(
    expression: Expression,
    expected: Any,
    inputRow: InternalRow = EmptyRow): Unit = { |
|
Hi, @srowen . For |
|
That's fine, it's not a big deal. I think the intent is to choose from all possible float values with equal probability, which isn't uniform over the numeric range but is 'uniform' over all possible values of a float. |
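To make the distinction concrete, here is a minimal sketch, not the code from this PR. Drawing a random 32-bit pattern and reinterpreting it as a float picks uniformly among bit patterns rather than uniformly over the numeric range. Roughly 0.4% of those patterns (all-ones exponent with a nonzero mantissa) are NaNs with differing payloads. The names `FloatSampling`, `randomFloatBits`, `canonicalNaN`, and `randomFloat` are hypothetical and chosen only for illustration.

```scala
import scala.util.Random

// Hypothetical helper names; an illustration only, not Spark's RandomDataGenerator.
object FloatSampling {
  // Every 32-bit pattern is equally likely, so every float bit pattern
  // (including each distinct NaN payload) is drawn with equal probability.
  def randomFloatBits(rand: Random): Float =
    java.lang.Float.intBitsToFloat(rand.nextInt())

  // Collapse any NaN to the predefined Float.NaN; other values pass through.
  def canonicalNaN(f: Float): Float = if (f.isNaN) Float.NaN else f

  // Same sampling, but with all NaNs normalized to a single representation.
  def randomFloat(rand: Random): Float = canonicalNaN(randomFloatBits(rand))
}
```

Under these assumptions, any NaN returned by `randomFloat` carries the same bits as `Float.NaN`, while the distribution of all non-NaN values is unchanged.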
|
Test build #102586 has finished for PR 23851 at commit
|
…uble.NaN for all NaN values

## What changes were proposed in this pull request?

Apache Spark uses the predefined `Float.NaN` and `Double.NaN` for NaN values, but there exists more NaN values with different binary presentations.

```scala
scala> java.nio.ByteBuffer.allocate(4).putFloat(Float.NaN).array
res1: Array[Byte] = Array(127, -64, 0, 0)

scala> val x = java.lang.Float.intBitsToFloat(-6966608)
x: Float = NaN

scala> java.nio.ByteBuffer.allocate(4).putFloat(x).array
res2: Array[Byte] = Array(-1, -107, -78, -80)
```

Since users can have these values, `RandomDataGenerator` generates these NaN values. However, this causes `checkEvaluationWithUnsafeProjection` failures due to the difference between `UnsafeRow` binary presentation. The following is the UT failure instance. This PR aims to fix this UT flakiness.

- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102528/testReport/

## How was this patch tested?

Pass the Jenkins with the newly added test cases.

Closes #23851 from dongjoon-hyun/SPARK-26950.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit ffef3d4)
Signed-off-by: Wenchen Fan <[email protected]>
|
thanks, merging to master/2.4! |
|
Oh ha I also tried to merge it just now and got weird errors. That's why. |
|
Thank you, @cloud-fan , @srowen , @maropu . |
…uble.NaN for all NaN values (cherry picked from commit ffef3d4) Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit ef67be3) Signed-off-by: Dongjoon Hyun <[email protected]>
|
To prevent flakiness, I merged this to branch-2.3, too. |
What changes were proposed in this pull request?
Apache Spark uses the predefined `Float.NaN` and `Double.NaN` for NaN values, but there exist more NaN values with different binary representations.
Since users can have these values, `RandomDataGenerator` generates these NaN values. However, this causes `checkEvaluationWithUnsafeProjection` failures due to differences in the `UnsafeRow` binary representation. The following is the UT failure instance. This PR aims to fix this UT flakiness.
- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102528/testReport/
How was this patch tested?
Pass the Jenkins with the newly added test cases.
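
The mismatch described above can be reproduced outside Spark. The following is a minimal sketch, not Spark's test code: it compares two NaN floats once by value semantics and once byte by byte, the latter standing in for a comparison of binary row contents. The names `NaNBits`, `rawBytes`, `sameValue`, `sameBytes`, and `otherNaN` are hypothetical.

```scala
import java.nio.ByteBuffer

// Hypothetical helper names; a sketch of the mismatch, not Spark's test code.
object NaNBits {
  // Big-endian bytes of a float, preserving the NaN payload as stored.
  def rawBytes(f: Float): Array[Byte] =
    ByteBuffer.allocate(4).putFloat(f).array

  // floatToIntBits collapses every NaN to the canonical pattern 0x7fc00000,
  // so this comparison treats all NaNs as equal.
  def sameValue(a: Float, b: Float): Boolean =
    java.lang.Float.floatToIntBits(a) == java.lang.Float.floatToIntBits(b)

  // Byte-wise comparison, analogous to comparing two binary rows.
  def sameBytes(a: Float, b: Float): Boolean =
    java.util.Arrays.equals(rawBytes(a), rawBytes(b))

  // The non-canonical NaN used in the PR description.
  val otherNaN: Float = java.lang.Float.intBitsToFloat(-6966608)
}
```

With these definitions, `NaNBits.sameValue(Float.NaN, NaNBits.otherNaN)` holds while `NaNBits.sameBytes(Float.NaN, NaNBits.otherNaN)` does not. This is why normalizing generated NaNs to `Float.NaN` or `Double.NaN` removes the flakiness in byte-level checks.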