-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18053][SQL] compare unsafe and safe complex-type values correctly #15929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #68843 has finished for PR 15929 at commit
|
| val funcCode: String = | ||
| s""" | ||
| public int $compareFunc(ArrayData a, ArrayData b) { | ||
| if (a instanceof UnsafeArrayData && b instanceof UnsafeArrayData && a == b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the instanceof checks? This is java, you are comparing by reference.
| case dt: DataType if isPrimitiveType(dt) => s"$c1 == $c2" | ||
| case dt: DataType if dt.isInstanceOf[AtomicType] => s"$c1.equals($c2)" | ||
| case array: ArrayType => genComp(array, c1, c2) + " == 0" | ||
| case struct: StructType => genComp(struct, c1, c2) + " == 0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about MapType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MapType is not comparable. We currently do not support equals() nor hashcode() for MapData. See https://issues.apache.org/jira/browse/SPARK-18134 for a fun discussion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: ) That is a fun discussion.
Also accidentally found how Preso did it in a PR: https://github.com/prestodb/presto/pull/2469/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not plan to support MapType, could we add a negative test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in #15956
| } else if (left.dataType != BinaryType) { | ||
| input1 == input2 | ||
| } else { | ||
| java.util.Arrays.equals(input1.asInstanceOf[Array[Byte]], input2.asInstanceOf[Array[Byte]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug, right? I mean the previous code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, we may compare an unsafe and safe row/array/map previously.
|
Test build #68915 has finished for PR 15929 at commit
|
|
LGTM. Merging to master/2.1/2.0. Thanks! |
## What changes were proposed in this pull request? In Spark SQL, some expression may output safe format values, e.g. `CreateArray`, `CreateStruct`, `Cast`, etc. When we compare 2 values, we should be able to compare safe and unsafe formats. The `GreaterThan`, `LessThan`, etc. in Spark SQL already handles it, but the `EqualTo` doesn't. This PR fixes it. ## How was this patch tested? new unit test and regression test Author: Wenchen Fan <[email protected]> Closes #15929 from cloud-fan/type-aware. (cherry picked from commit 84284e8) Signed-off-by: Herman van Hovell <[email protected]>
## What changes were proposed in this pull request? In Spark SQL, some expression may output safe format values, e.g. `CreateArray`, `CreateStruct`, `Cast`, etc. When we compare 2 values, we should be able to compare safe and unsafe formats. The `GreaterThan`, `LessThan`, etc. in Spark SQL already handles it, but the `EqualTo` doesn't. This PR fixes it. ## How was this patch tested? new unit test and regression test Author: Wenchen Fan <[email protected]> Closes #15929 from cloud-fan/type-aware. (cherry picked from commit 84284e8) Signed-off-by: Herman van Hovell <[email protected]>
## What changes were proposed in this pull request? In Spark SQL, some expression may output safe format values, e.g. `CreateArray`, `CreateStruct`, `Cast`, etc. When we compare 2 values, we should be able to compare safe and unsafe formats. The `GreaterThan`, `LessThan`, etc. in Spark SQL already handles it, but the `EqualTo` doesn't. This PR fixes it. ## How was this patch tested? new unit test and regression test Author: Wenchen Fan <[email protected]> Closes apache#15929 from cloud-fan/type-aware.
## What changes were proposed in this pull request? In Spark SQL, some expression may output safe format values, e.g. `CreateArray`, `CreateStruct`, `Cast`, etc. When we compare 2 values, we should be able to compare safe and unsafe formats. The `GreaterThan`, `LessThan`, etc. in Spark SQL already handles it, but the `EqualTo` doesn't. This PR fixes it. ## How was this patch tested? new unit test and regression test Author: Wenchen Fan <[email protected]> Closes apache#15929 from cloud-fan/type-aware.
What changes were proposed in this pull request?
In Spark SQL, some expression may output safe format values, e.g.
CreateArray,CreateStruct,Cast, etc. When we compare 2 values, we should be able to compare safe and unsafe formats.The
GreaterThan,LessThan, etc. in Spark SQL already handles it, but theEqualTodoesn't. This PR fixes it.How was this patch tested?
new unit test and regression test