-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18519][SQL] map type can not be used in EqualTo #15956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense, BinaryType is comparable, we should accept it as join key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a few tests with binary keys, that can also be done in a follow-up. The parser supports binary literals, so we can use SQLQueryTestSuite for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already added one: https://github.com/apache/spark/pull/15956/files#diff-999f50ed13c690eaec243e3b3446af08R468 , yea we should add more end-to-end tests in SQLQueryTestSuite in follow-up
|
Test build #68920 has started for PR 15956 at commit |
|
So technically I think map type should be able to be used in equality comparison, but not orderable, so I'm not sure if this is correct. Spark SQL currently has a restriction that we just don't allow map types to be used in equality comparison at all, but I don't want to structure code in a way that would make it harder to support that in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this "wrong"? map type is not supported here is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this to satisfy BinaryOperator, the type checking logic is defined by both BinaryOperator and checkInputDataTypes here. Shall we make BinaryComparison not extends BinaryOperator and duplicate its type checking logic here?
If we wanna support equality comparison for map type in the future, we can just update the type checking logic for |
|
Test build #68932 has finished for PR 15956 at commit
|
|
Test build #68938 has finished for PR 15956 at commit
|
|
Test build #68940 has finished for PR 15956 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't this be expressed? isn't the old ordered a way to express it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because TypeCollection.Ordered doesn't cover all the cases, e.g. ArrayType and StructType are orderable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, Hive only supports atomic type, shall we follow it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, postgres supports binary comparison on array type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, this is different from the previous lilmits:
override def inputType: AbstractDataType = TypeCollection.Ordered
We add the support of StructType, ArrayType and UDT. It sounds like no test case coverage. Could you please add them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me minimize the size of this PR, I'll keep GreaterThan, LessThan, etc. unchanged, and only update EqualTo, EqualNullSafe to forbid map types.
| } | ||
| } | ||
|
|
||
| private def hasMapType(dt: DataType): Boolean = dt match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dt.existsRecursively(_.isInstanceOf[MapType])?
|
|
||
| override def inputType: AbstractDataType = AnyDataType | ||
|
|
||
| override def checkInputDataTypes(): TypeCheckResult = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Can we deduplicate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it's only duplicated twice and will be removed after #15970 , I think it should be fine.
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor things, otherwise LGTM
|
Test build #68997 has finished for PR 15956 at commit
|
|
LGTM pending Jenkins |
|
Test build #69002 has finished for PR 15956 at commit
|
|
LGTM - merging to master/2.1. Thanks! |
## What changes were proposed in this pull request? Technically map type is not orderable, but can be used in equality comparison. However, due to the limitation of the current implementation, map type can't be used in equality comparison so that it can't be join key or grouping key. This PR makes this limitation explicit, to avoid wrong result. ## How was this patch tested? updated tests. Author: Wenchen Fan <[email protected]> Closes #15956 from cloud-fan/map-type. (cherry picked from commit bb152cd) Signed-off-by: Herman van Hovell <[email protected]>
|
@cloud-fan I cannot merge this to 2.0. Can you make a backport if we need one? |
|
Yea, I'll backport |
## What changes were proposed in this pull request? Technically map type is not orderable, but can be used in equality comparison. However, due to the limitation of the current implementation, map type can't be used in equality comparison so that it can't be join key or grouping key. This PR makes this limitation explicit, to avoid wrong result. backport #15956 to 2.0 ## How was this patch tested? updated tests Author: Wenchen Fan <[email protected]> Closes #15988 from cloud-fan/map-type.
## What changes were proposed in this pull request? Technically map type is not orderable, but can be used in equality comparison. However, due to the limitation of the current implementation, map type can't be used in equality comparison so that it can't be join key or grouping key. This PR makes this limitation explicit, to avoid wrong result. ## How was this patch tested? updated tests. Author: Wenchen Fan <[email protected]> Closes apache#15956 from cloud-fan/map-type.
## What changes were proposed in this pull request? Technically map type is not orderable, but can be used in equality comparison. However, due to the limitation of the current implementation, map type can't be used in equality comparison so that it can't be join key or grouping key. This PR makes this limitation explicit, to avoid wrong result. ## How was this patch tested? updated tests. Author: Wenchen Fan <[email protected]> Closes apache#15956 from cloud-fan/map-type.
What changes were proposed in this pull request?
Technically map type is not orderable, but can be used in equality comparison. However, due to the limitation of the current implementation, map type can't be used in equality comparison so that it can't be join key or grouping key.
This PR makes this limitation explicit, to avoid wrong result.
How was this patch tested?
updated tests.