Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented May 31, 2018

What changes were proposed in this pull request?

When comparing struct type values, it's a little hard to make the field names same. Simple queries like
SELECT (a, b) = (1, 2) FROM t may fail because the field names do not match.

In Postgres, when comparing struct type values, it will do safe type coercion and ignore field name difference

# create table t(i smallint, j smallint);
CREATE TABLE

# select Row(i, j) = Row(1, 1) from t;                              
 ?column? 
----------
 t
(1 row)

# select Row(i, j) < Row(1, 1) from t;
 ?column? 
----------
 f
(1 row)

# select Row(i, j) = Row(j, i) from t;
 ?column? 
----------
 t
(1 row)

This PR follows Postgres and accept structurally-equal types in comparison.

This also fixes the test failures in #21442 .

TODO:

  • check Hive and Presto.
  • think about array/map type. Postgres doesn't support type coercion for elements in array/map when comparison.

How was this patch tested?

new test cases.

@cloud-fan
Copy link
Contributor Author

sql("CREATE TABLE S(C struct<F:int>) USING parquet")
Seq(("c", "C"), ("C", "c"), ("c.f", "C.F"), ("C.F", "c.f")).foreach {
case (left, right) =>
checkAnswer(sql(s"SELECT * FROM t, S WHERE t.$left = S.$right"), Seq.empty)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test case wants to test set operation, but here it's testing filter. the new tests should've covered it.

@SparkQA
Copy link

SparkQA commented May 31, 2018

Test build #91355 has finished for PR 21470 at commit 3555e0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

require(leftKeys.map(_.dataType) == rightKeys.map(_.dataType),
"Join keys from two sides should have same types")
require(leftKeys.map(_.dataType).zip(rightKeys.map(_.dataType)).forall {
case (l, r) => BinaryOperator.sameType(l, r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case to cover this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit a test failure before changing this. This kind of check(assert, require) can only be hitten when there is a bug.

}
}.getOrElse(b) // If there is no applicable conversion, leave expression unchanged.
case b @ BinaryOperator(left, right)
if !BinaryOperator.sameType(left.dataType, right.dataType) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just change this line? The other changes in this file can be done later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findTightestCommonType doesn't accept struct type with different filed names, so the other code are needed.

var i = 0
var continue = fields1.length == fields2.length
while (i < len && continue) {
val commonType = findTightestCommonType(fields1(i).dataType, fields2(i).dataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't want to support type coercion, we can change this line to use fields1(i).dataType == fields2(i).dataType

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about nested structs?

val m = intercept[AnalysisException] {
sql("SELECT * FROM t, S WHERE c = C")
}.message
assert(m.contains("cannot resolve '(t.`c` = S.`C`)' due to data type mismatch"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pinging me, @cloud-fan .

Since this removal is a real behavior change instead of new test coverage of comparator.sql, could you add a documentation for this?

var i = 0
var continue = fields1.length == fields2.length
while (i < len && continue) {
val commonType = findTightestCommonType(fields1(i).dataType, fields2(i).dataType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about nested structs?

})
val newRight = if (right.dataType == newRightST) right else Cast(right, newRightST)

if (b.inputType.acceptsType(newLeftST) && b.inputType.acceptsType(newRightST)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible b only accepts one side (e.g., only newLeftST) but doesn't accept other side?

findTightestCommonType(left.dataType, right.dataType).map { commonType =>
if (b.inputType.acceptsType(commonType)) {
// If the expression accepts the tightest common type, cast to that.
val newLeft = if (left.dataType == commonType) left else Cast(left, commonType)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ternary operation seems to crop up a few times in this PR. Maybe we can push it out into a method?

private def castIfNeeded(e: Expression, possibleType: DataType): Expression = {
  if (e.dataType == possibleType) data else Cast(e, possibleType)
}

val len = fields1.length
var i = 0
var continue = fields1.length == fields2.length
while (i < len && continue) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop could be refactored functionally, e.g.

val commonTypes = (fields1 zip fields2).map(f => findTightestCommonType(f._1, f._2))
if (commonTypes.forall(_.isDefined)) {
 . . .

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 11, 2020
@github-actions github-actions bot closed this Jan 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants