Contributor

@andrej-db andrej-db commented Oct 23, 2024

What changes were proposed in this pull request?

This PR proposes to propagate the isPredicate info in V2ExpressionBuilder and to wrap the predicate children of a CASE WHEN expression with IIF(<>, 1, 0) for MsSqlServer. This forces the branches to return an int instead of a boolean, because SqlServer cannot handle a boolean expression as the return type of a CASE WHEN.

E.g.
CASE WHEN ... ELSE a = b END

Old behavior:
CASE WHEN ... ELSE a = b END = 1

New behavior:
Since in SqlServer "= 1" is appended to the CASE WHEN, the THEN and ELSE blocks must return an int. Therefore the final expression becomes:
CASE WHEN ... ELSE IIF(a = b, 1, 0) END = 1
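
A minimal, self-contained sketch of the rewrite described above. The small expression model (Col, Pred, CaseWhen) and the SqlServerSketch object are hypothetical stand-ins for illustration, not Spark's V2 expression classes. The only behavior taken from this PR is that predicate branches are wrapped in IIF(<>, 1, 0) and that "= 1" is appended when the CASE WHEN is used as a predicate.

```scala
// Hypothetical mini-model of pushed-down expressions (not Spark's V2 classes).
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Pred(sql: String) extends Expr // e.g. a = b
final case class CaseWhen(branches: Seq[(Pred, Expr)], orElse: Expr) extends Expr

object SqlServerSketch {
  // Render a THEN/ELSE value. Predicates become ints via IIF because
  // SQL Server cannot return a boolean from a CASE WHEN branch.
  private def branchValue(e: Expr): String = e match {
    case Pred(sql) => s"IIF($sql, 1, 0)"
    case other     => compile(other)
  }

  def compile(e: Expr): String = e match {
    case Col(name) => name
    case Pred(sql) => sql
    case CaseWhen(branches, orElse) =>
      val whens = branches.map { case (cond, value) =>
        s"WHEN ${cond.sql} THEN ${branchValue(value)}"
      }
      s"CASE ${whens.mkString(" ")} ELSE ${branchValue(orElse)} END"
  }

  // When the CASE WHEN itself is used as a filter, "= 1" is appended.
  def compileAsPredicate(e: CaseWhen): String = s"${compile(e)} = 1"
}
```

Non-predicate branches (such as a plain column) pass through unchanged, so only boolean-valued branches are affected in this sketch.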

Why are the changes needed?

A user cannot query MsSqlServer data with CASE WHEN or IF clauses that return a boolean value.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests to MsSqlServerIntegrationSuite

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 23, 2024
Member

@MaxGekk MaxGekk left a comment

@milastdbx Could you review this PR, please?

assert(df.collect().length == 2)
}

test("SPARK-50087: SqlServer handle booleans in IF in SELECT test") {
Contributor

Does this provide extra test coverage beyond the following new test cases?

Contributor Author

This is a specific edge case. If you use a single IF in a SELECT it won't be pushed down, but this is.

val stringArray = e.children().grouped(2).flatMap { arr =>
  arr.dropRight(1).map(inputToSQL) :+
    (arr.last match {
      case p: Predicate if p.name() != "ALWAYS_TRUE" && p.name() != "ALWAYS_FALSE" =>
Contributor

can we say more about what's special for ALWAYS_TRUE/ALWAYS_FALSE?

Contributor Author

ALWAYS_TRUE gets translated to 1, and that is an int. If wrapped, it will fail with the same issue we are trying to fix.

Contributor Author

@andrej-db andrej-db Oct 24, 2024

Done.

Contributor

> ALWAYS_TRUE gets translated to 1

Where does it happen?

Contributor Author

  override def compileValue(value: Any): Any = value match {
    case booleanValue: Boolean => if (booleanValue) 1 else 0
    case other => super.compileValue(other)
  }

This is in MsSqlServerDialect.
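
To make the exchange above concrete, here is a hedged sketch of why the ALWAYS_TRUE / ALWAYS_FALSE guard matters. compileLiteral mirrors only the boolean case of the compileValue snippet quoted above; NamedPredicate and wrapBranch are hypothetical names used for illustration, not the dialect's real methods.

```scala
object BooleanLiteralSketch {
  // Mirrors the boolean case quoted above: booleans are emitted as 1 / 0.
  def compileLiteral(value: Any): String = value match {
    case b: Boolean => if (b) "1" else "0"
    case other      => other.toString
  }

  // Hypothetical stand-in for a predicate: its name plus its compiled SQL.
  final case class NamedPredicate(name: String, sql: String)

  // Wrap only "real" predicates. ALWAYS_TRUE / ALWAYS_FALSE already compile
  // to the ints 1 / 0, and IIF(1, 1, 0) would fail because IIF expects a
  // boolean expression as its first argument.
  def wrapBranch(p: NamedPredicate): String = p.name match {
    case "ALWAYS_TRUE" | "ALWAYS_FALSE" => p.sql
    case _                              => s"IIF(${p.sql}, 1, 0)"
  }
}
```

Under these assumptions, a comparison such as a = b is wrapped, while a constant predicate is passed through as the int it already compiles to.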

Contributor

Hmm, is it correct? What if the boolean is in a place that does need a boolean type, like a function that takes a boolean? This is unrelated to this PR though.

Contributor Author

The problem is that MsSql cannot ask for a boolean, as it cannot handle booleans. Any other DB will know how to translate it, so we are OK. The only way this poses a problem is if we write CASE WHEN TRUE THEN, which makes no sense, so we are good.

MsSqlServerDialect: refactor
V2ExpressionSQLBuilder: remove aux
Contributor

@milastdbx milastdbx left a comment

Approved.
Just please add a test for when booleans are not children of CASE WHEN, if such a test does not exist.

| UPPER(name) AS test_type,
| name,
| IF(
| LOWER(name) = 'adfsaef' OR LOWER(name) = 'agadg',
Contributor

nit: consider changing the test literals to something more meaningful.

It's easier to check what answer you are expecting.

|)
|SELECT * FROM dummy_new limit 1""".stripMargin
)
df.collect()
Contributor

Should we have some pushdown check?

Contributor Author

I'll add an assert with External query check

| ELSE (name = '1') END
|""".stripMargin
)
df.collect()
Contributor

I'm not sure, but do we have a test case for when the type is not boolean?

Contributor Author

@andrej-db andrej-db left a comment

Double-checked the code.

@cloud-fan
Contributor

thanks, merging to master!

@beliefer
Contributor

late LGTM.
