[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects #46437
I believe that it cannot be included under the current coverage. Can we add some new tests?
@cloud-fan Could you take a look at this fix?
switch (c) {
  case '_' -> builder.append("\\_");
  case '%' -> builder.append("\\%");
  case '\'' -> builder.append("\\\'");
Did you see the comment on escapeSpecialCharsForLikePattern?
Yes, I do. Unfortunately, ' is not a special character that should be escaped this way in a LIKE pattern for all JDBC dialects. The first red flag is that H2 had to remove this escaping; wouldn't we expect dialect-specific overrides to only add characters? The second red flag was JDBCV2Suite, which had a problem because, when displaying the pushdown result, it calls the visitLiteral from V2ExpressionSQLBuilder rather than the one implemented in JDBCDialect. I presume that is why this escape was added in the first place. We need to escape ' only for plain string literals, since in SQL these literals are written as 'value'. That escaping is already done in visitLiteral and should not be done a second time here.
The input of escapeSpecialCharsForLikePattern is already a valid SQL string literal (produced by visitLiteral), so the ' is already escaped.
private[jdbc] class JDBCSQLBuilder extends V2ExpressionSQLBuilder {
  override def visitLiteral(literal: Literal[_]): String = {
    Option(literal.value()).map(v =>
      compileValue(CatalystTypeConverters.convertToScala(v, literal.dataType())).toString)
      .getOrElse(super.visitLiteral(literal))
  }
Got it.
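To illustrate the point made in this thread, here is a minimal sketch of why escaping ' again in the LIKE helper is redundant. It is not the Spark code: `QuoteEscapingSketch`, `toSqlLiteral` and `escapeForLike` are hypothetical names, and the quote-doubling in `toSqlLiteral` is an assumption about how a string literal is typically rendered into SQL.

```java
final class QuoteEscapingSketch {

    // Hypothetical stand-in for the literal-rendering step: wrap the value in
    // single quotes and double any embedded quote (standard SQL escaping).
    public static String toSqlLiteral(String raw) {
        return "'" + raw.replace("'", "''") + "'";
    }

    // Escapes LIKE wildcards in the body of an already-rendered literal.
    // With alsoEscapeQuote = true it mimics the redundant case '\'' branch.
    public static String escapeForLike(String literalBody, boolean alsoEscapeQuote) {
        StringBuilder builder = new StringBuilder();
        for (char c : literalBody.toCharArray()) {
            if (c == '_') {
                builder.append("\\_");
            } else if (c == '%') {
                builder.append("\\%");
            } else if (c == '\'' && alsoEscapeQuote) {
                builder.append("\\'");
            } else {
                builder.append(c);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String literal = toSqlLiteral("it's");                     // 'it''s'
        String body = literal.substring(1, literal.length() - 1);  // it''s
        System.out.println(escapeForLike(body, true));   // it\'\'s : quote escaped twice
        System.out.println(escapeForLike(body, false));  // it''s   : body left intact
    }
}
```

With the extra branch enabled, the already-doubled quote picks up a backslash as well, so the pattern no longer corresponds to the original value; escaping only the wildcards leaves the literal body unchanged.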
}

class H2SQLBuilder extends JDBCSQLBuilder {
  override def escapeSpecialCharsForLikePattern(str: String): String = {
I should have noticed this at the beginning... This bug is hidden because we fixed it only for H2 and we only test it with H2.
    parameters = Map("typeName" -> "sql_variant", "jdbcType" -> "-156"))
}

test("test contains pushdown") {
Do we have a parent suite for these Docker JDBC test suites?
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java
switch (c) {
  case '_' -> builder.append("\\_");
  case '%' -> builder.append("\\%");
  case '\'' -> builder.append("\\\'");
Got it.
LGTM except @cloud-fan's comment.
What changes were proposed in this pull request?
Special case escaping for MySQL and fix issues with redundant escaping for ' character.
Why are the changes needed?
When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Tests for each existing dialect.
Was this patch authored or co-authored using generative AI tooling?
No.
Closes #46437 from mihailom-db/SPARK-48172.
Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 47006a4)
Signed-off-by: Wenchen Fan <[email protected]>
Merged to master/3.5/3.4!
@mihailom-db From the GA results, it seems that only MySQL passes; everything else has problems, and not only with data type support.
Thanks @panbingkun, I reverted this in 4ff5ca8.
Thanks! @yaooqinn @mihailom-db After fixing it, you can resubmit :)
This PR is a fix of #46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects.
### What changes were proposed in this pull request?
Special case escaping for MySQL and fix issues with redundant escaping for ' character. New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests.
### Why are the changes needed?
When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down.
### Does this PR introduce any user-facing change?
Yes
### How was this patch tested?
Tests for each existing dialect.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #46588 from mihailom-db/SPARK-48172.
Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
This PR is a fix of #46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects.
What changes were proposed in this pull request?
Special case escaping for MySQL and fix issues with redundant escaping for ' character. New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests.
Why are the changes needed?
When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Tests for each existing dialect.
Was this patch authored or co-authored using generative AI tooling?
No.
Closes #46588 from mihailom-db/SPARK-48172.
Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9e386b4)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Special-case the escaping for MySQL and fix the redundant escaping of the ' character.
Why are the changes needed?
When startsWith, endsWith and contains are pushed down, they are converted to LIKE. This requires adding escape characters to these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\', which would cause errors when trying to push down.
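As a rough illustration of this conversion, the sketch below renders a startsWith predicate as a LIKE expression with either escape clause. It is a simplified sketch, not the code in this PR: `LikePushdownSketch`, `escapeLikeWildcards`, `startsWithToLike` and the `mysqlStyle` flag are hypothetical names, and the input is assumed to already be a quoted SQL string literal.

```java
final class LikePushdownSketch {

    // Escapes the LIKE wildcards '_' and '%' with a backslash so they match literally.
    public static String escapeLikeWildcards(String value) {
        StringBuilder builder = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '_': builder.append("\\_"); break;
                case '%': builder.append("\\%"); break;
                default: builder.append(c);
            }
        }
        return builder.toString();
    }

    // Renders startsWith as LIKE. quotedPrefix is assumed to be a valid SQL string
    // literal such as 'abc'; its surrounding quotes are stripped and re-added.
    // mysqlStyle selects ESCAPE '\\' (two backslashes) instead of ESCAPE '\'.
    public static String startsWithToLike(String column, String quotedPrefix, boolean mysqlStyle) {
        String body = quotedPrefix.substring(1, quotedPrefix.length() - 1);
        String escapeClause = mysqlStyle ? "ESCAPE '\\\\'" : "ESCAPE '\\'";
        return column + " LIKE '" + escapeLikeWildcards(body) + "%' " + escapeClause;
    }

    public static void main(String[] args) {
        // Default form: name LIKE '50\%\_off%' ESCAPE '\'
        System.out.println(startsWithToLike("name", "'50%_off'", false));
        // MySQL form:   name LIKE '50\%\_off%' ESCAPE '\\'
        System.out.println(startsWithToLike("name", "'50%_off'", true));
    }
}
```

The trailing `%` is the only unescaped wildcard, which is what makes the pattern behave as a prefix match; endsWith and contains would place it at the front or on both sides.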
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Tests for each existing dialect.
Was this patch authored or co-authored using generative AI tooling?
No.