-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-50087] Robust handling of boolean expressions in CASE WHEN for MsSqlServer and future connectors #48621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MaxGekk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milastdbx Could you review this PR, please.
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
| assert(df.collect().length == 2) | ||
| } | ||
|
|
||
| test("SPARK-50087: SqlServer handle booleans in IF in SELECT test") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this provide extra test coverage than the following new test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a specific edge case. If you use a single IF in a SELECT it won't be pushed down, but this is.
...egration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
| val stringArray = e.children().grouped(2).flatMap { arr => | ||
| arr.dropRight(1).map(inputToSQL) :+ | ||
| (arr.last match { | ||
| case p: Predicate if p.name() != "ALWAYS_TRUE" && p.name() != "ALWAYS_FALSE" => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we say more about what's special for ALWAYS_TRUE/ALWAYS_FALSE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALWAYS_TRUE gets translated to 1, and that is an int. If wrapped, it will fail with the same issue we are trying to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALWAYS_TRUE gets translated to 1
Where does it happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override def compileValue(value: Any): Any = value match {
case booleanValue: Boolean => if (booleanValue) 1 else 0
case other => super.compileValue(other)
}
In MsSqlServer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, is it correct? What if the boolean is in a place that do need a boolean type, like a function that takes a boolean? This is unrelated to this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem is that MsSql cannot ask for a boolean as it cannot handle booleans. Any other DB will know how to translate it so we are ok. Only way this poses a problem is if we say CASE WHEN TRUE THEN which makes no sense so we are good
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
MsSqlServerDialect: refactor V2ExpressionSQLBuilder: remove aux
milastdbx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved.
Just please add test for when booleans are not children of case when, if such test does not exist
| | UPPER(name) AS test_type, | ||
| | name, | ||
| | IF( | ||
| | LOWER(name) = 'adfsaef' OR LOWER(name) = 'agadg', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
consider changing test literals to something more meaningful.
its easier to check what answer you are expecting
| |) | ||
| |SELECT * FROM dummy_new limit 1""".stripMargin | ||
| ) | ||
| df.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have some pushdown check ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an assert with External query check
| | ELSE (name = '1') END | ||
| |""".stripMargin | ||
| ) | ||
| df.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure but do we have test case for when type is not boolean ?
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
andrej-db
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double-checked the code.
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
...egration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
…park/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala
Outdated
Show resolved
Hide resolved
|
thanks, merging to master! |
|
late LGTM. |
What changes were proposed in this pull request?
This PR proposes to propagate the
isPredicateinfo inV2ExpressionBuilderand wrap the children of CASE WHEN expression (onlyPredicates) withIIF(<>, 1, 0)for MsSqlServer. This is done to force returning an int instead of a boolean, as SqlServer cannot handle boolean expressions as a return type in CASE WHEN.E.g.
CASE WHEN ... ELSE a = b ENDOld behavior:
CASE WHEN ... ELSE a = b END = 1New behavior:
Since in SqlServer a
= 1is appended to the CASE WHEN, THEN and ELSE blocks must return an int. Therefore the final expression becomes:CASE WHEN ... ELSE IIF(a = b, 1, 0) END = 1Why are the changes needed?
A user cannot work with an MsSqlServer data with CASE WHEN clauses or IF clauses if they wish to return a boolean value.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added tests to MsSqlServerIntegrationSuite
Was this patch authored or co-authored using generative AI tooling?
No