-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-24443][SQL] comparison should accept structurally-equal types #21470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| sql("CREATE TABLE S(C struct<F:int>) USING parquet") | ||
| Seq(("c", "C"), ("C", "c"), ("c.f", "C.F"), ("C.F", "c.f")).foreach { | ||
| case (left, right) => | ||
| checkAnswer(sql(s"SELECT * FROM t, S WHERE t.$left = S.$right"), Seq.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test case wants to test set operation, but here it's testing filter. the new tests should've covered it.
|
Test build #91355 has finished for PR 21470 at commit
|
| require(leftKeys.map(_.dataType) == rightKeys.map(_.dataType), | ||
| "Join keys from two sides should have same types") | ||
| require(leftKeys.map(_.dataType).zip(rightKeys.map(_.dataType)).forall { | ||
| case (l, r) => BinaryOperator.sameType(l, r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test case to cover this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hit a test failure before changing this. This kind of check(assert, require) can only be hitten when there is a bug.
| } | ||
| }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. | ||
| case b @ BinaryOperator(left, right) | ||
| if !BinaryOperator.sameType(left.dataType, right.dataType) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just change this line? The other changes in this file can be done later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findTightestCommonType doesn't accept struct type with different filed names, so the other code are needed.
| var i = 0 | ||
| var continue = fields1.length == fields2.length | ||
| while (i < len && continue) { | ||
| val commonType = findTightestCommonType(fields1(i).dataType, fields2(i).dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we don't want to support type coercion, we can change this line to use fields1(i).dataType == fields2(i).dataType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about nested structs?
| val m = intercept[AnalysisException] { | ||
| sql("SELECT * FROM t, S WHERE c = C") | ||
| }.message | ||
| assert(m.contains("cannot resolve '(t.`c` = S.`C`)' due to data type mismatch")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pinging me, @cloud-fan .
Since this removal is a real behavior change instead of new test coverage of comparator.sql, could you add a documentation for this?
| var i = 0 | ||
| var continue = fields1.length == fields2.length | ||
| while (i < len && continue) { | ||
| val commonType = findTightestCommonType(fields1(i).dataType, fields2(i).dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about nested structs?
| }) | ||
| val newRight = if (right.dataType == newRightST) right else Cast(right, newRightST) | ||
|
|
||
| if (b.inputType.acceptsType(newLeftST) && b.inputType.acceptsType(newRightST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible b only accepts one side (e.g., only newLeftST) but doesn't accept other side?
| findTightestCommonType(left.dataType, right.dataType).map { commonType => | ||
| if (b.inputType.acceptsType(commonType)) { | ||
| // If the expression accepts the tightest common type, cast to that. | ||
| val newLeft = if (left.dataType == commonType) left else Cast(left, commonType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ternary operation seems to crop up a few times in this PR. Maybe we can push it out into a method?
private def castIfNeeded(e: Expression, possibleType: DataType): Expression = {
if (e.dataType == possibleType) data else Cast(e, possibleType)
}
| val len = fields1.length | ||
| var i = 0 | ||
| var continue = fields1.length == fields2.length | ||
| while (i < len && continue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop could be refactored functionally, e.g.
val commonTypes = (fields1 zip fields2).map(f => findTightestCommonType(f._1, f._2))
if (commonTypes.forall(_.isDefined)) {
. . .
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
When comparing struct type values, it's a little hard to make the field names same. Simple queries like
SELECT (a, b) = (1, 2) FROM tmay fail because the field names do not match.In Postgres, when comparing struct type values, it will do safe type coercion and ignore field name difference
This PR follows Postgres and accept structurally-equal types in comparison.
This also fixes the test failures in #21442 .
TODO:
How was this patch tested?
new test cases.