-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean #45589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@beliefer @cloud-fan could you please take a look? |
|
@beliefer @cloud-fan sorry to bother you again, could you please take a look if you have time? |
|
The |
|
With hindsight, we shouldn't create the v2 It's too late to change the API now. One suggestion is to follow how we handle CASE WHEN: always use v2 |
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
Outdated
Show resolved
Hide resolved
| None | ||
| } | ||
| case iff: If => generateExpressionWithName("CASE_WHEN", iff.children) | ||
| case iff: If => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check other expressions matched here and see if they may return boolean type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check other expressions matched here and see if they may return boolean type?
there is also Coalesce.
| } | ||
| } | ||
|
|
||
| private def generatePredicateWithName( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one idea: can we update generateExpressionWithName to take the target expression instead of the children? Then we can simply do
val children = e.children
...
if (d.dataType == BooleanType && isPredicate) {
new V2Predicate ...
} else {
new GeneralScalarExpression
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, I will try that.
| case unaryMinus @ UnaryMinus(_, true) => | ||
| generateExpressionWithName("-", unaryMinus, isPredicate) | ||
| case _: BitwiseNot => generateExpressionWithName("~", expr, isPredicate) | ||
| case CaseWhen(branches, elseValue) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CaseWhen always seems to return V2Predicate, is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, but this probably doesn't hurt as V2Predicate extends GeneralScalarExpression. We should still fix it to make code clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I made fix.
| generateExpressionWithName("-", unaryMinus, isPredicate) | ||
| case _: BitwiseNot => generateExpressionWithName("~", expr, isPredicate) | ||
| case caseWhen @ CaseWhen(branches, elseValue) => | ||
| val conditions = branches.map(_._1).flatMap(generateExpression(_, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reserved isPredicate=true for conditions of casewhen
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
Outdated
Show resolved
Hide resolved
| val df1 = sql( | ||
| s""" | ||
| |select * from | ||
| |(select if(i = 1, i, 0) as c from t1) t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is sufficient. If can be a pedicate and before this PR we don't return V2Predicate which causes errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is sufficient.
Ifcan be a pedicate and before this PR we don't return V2Predicate which causes errors.
makes sense to me
| s""" | ||
| |select * from | ||
| |(select if(i = 1, i, 0) as c from t1) t | ||
| |where t.c > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think where if(i = 1, i, 0) > 0 is a valid SQL? BTW, please upper case the SQL keywords in the SQL statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
where if(i = 1, i, 0) > 0is a valid SQL? BTW, please upper case the SQL keywords in the SQL statement.
Indeed, thank you. I have changed.
|
thanks, merging to master! |
|
@wForget can you help to create a 3.5 backport PR? thanks! |
Sure, I will create it as soon as possible, and thanks for your review. |
…n type of boolean Backports #45589 to 3.5 ### What changes were proposed in this pull request? Use V2Predicate to wrap If expr when building v2 expressions. ### Why are the changes needed? The `PushFoldableIntoBranches` optimizer may fold predicate into (if / case) branches and `V2ExpressionBuilder` wraps `If` as `GeneralScalarExpression`, which causes the assertion in `PushablePredicate.unapply` to fail. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? added unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #46074 from wForget/SPARK-47463_3.5. Authored-by: Zhen Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
|
I encountered a similar issue, I am using |
|
@wForget could you please take a look |
Can you provide a sql to reproduce? |
|
I finally have been able to reproduce it: Note that the exception is only thrown when the source is an iceberg table. |
I am not sure! The issue with this query is with the #48621 seems to solve an issue with |
|
late LGTM. |
…n type of boolean (apache#381) Backports apache#45589 to 3.5 ### What changes were proposed in this pull request? Use V2Predicate to wrap If expr when building v2 expressions. ### Why are the changes needed? The `PushFoldableIntoBranches` optimizer may fold predicate into (if / case) branches and `V2ExpressionBuilder` wraps `If` as `GeneralScalarExpression`, which causes the assertion in `PushablePredicate.unapply` to fail. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? added unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#46074 from wForget/SPARK-47463_3.5. Authored-by: Zhen Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> Co-authored-by: Zhen Wang <[email protected]>
What changes were proposed in this pull request?
Use V2Predicate to wrap If expr when building v2 expressions.
Why are the changes needed?
The
PushFoldableIntoBranchesoptimizer may fold predicate into (if / case) branches andV2ExpressionBuilderwrapsIfasGeneralScalarExpression, which causes the assertion inPushablePredicate.unapplyto fail.Does this PR introduce any user-facing change?
No
How was this patch tested?
added unit test
Was this patch authored or co-authored using generative AI tooling?
No