Skip to content

Conversation

@mihailom-db
Copy link
Contributor

@mihailom-db mihailom-db commented May 7, 2024

What changes were proposed in this pull request?

Special case escaping for MySQL and fix issues with redundant escaping for ' character.

Why are the changes needed?

When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\' syntax instead of ESCAPE '' which would cause errors when trying to push down.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Tests for each existing dialect.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 7, 2024
@yaooqinn
Copy link
Member

yaooqinn commented May 7, 2024

Existing tests.

I believe that it cannot be included under the current coverage. Can we add some new tests?

@mihailom-db mihailom-db changed the title [SPARK-48172][SQL] Fix escaping issue for mysql [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects May 8, 2024
@mihailom-db
Copy link
Contributor Author

@cloud-fan Could you take a look at this fix?

switch (c) {
case '_' -> builder.append("\\_");
case '%' -> builder.append("\\%");
case '\'' -> builder.append("\\\'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see the comment of escapeSpecialCharsForLikePattern ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I do. Unfortunately, ' is not a special character that should be escaped for like expression in this way for all JDBCDialects. First red flag is that H2 had to remove this change, wouldn't we expect that the special cases of JDBC have to only add characters? Second red flag was JDBCV2Suite that actually had a problem as it is not calling visitLiteral that is implemented in JDBCDialect, but the one from V2ExpressionSQLBuilder when it was displaying the pushdown result, which is why I would presume this escape was added in the first place. We need to escape ' only when we are using pure string literals, as these literals in sql come in format of 'value'. This addition to escape ' is already done in visitLiteral and should not be done here one more time.

Copy link
Contributor

@cloud-fan cloud-fan May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input of escapeSpecialCharsForLikePattern is already a valid SQL string literal (produced by visitLiteral), so the ' is already escaped.

  private[jdbc] class JDBCSQLBuilder extends V2ExpressionSQLBuilder {
    override def visitLiteral(literal: Literal[_]): String = {
      Option(literal.value()).map(v =>
        compileValue(CatalystTypeConverters.convertToScala(v, literal.dataType())).toString)
        .getOrElse(super.visitLiteral(literal))
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@dongjoon-hyun dongjoon-hyun dismissed their stale review May 10, 2024 03:17

Stale review.

}

class H2SQLBuilder extends JDBCSQLBuilder {
override def escapeSpecialCharsForLikePattern(str: String): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noticed this at the beginning... This bug is hidden because we fixed it only for H2 and we only test it with H2.

parameters = Map("typeName" -> "sql_variant", "jdbcType" -> "-156"))
}

test("test contains pushdown") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a parent suite for this docker JDBC test suites?

switch (c) {
case '_' -> builder.append("\\_");
case '%' -> builder.append("\\%");
case '\'' -> builder.append("\\\'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@beliefer
Copy link
Contributor

LGTM except @cloud-fan 's comment.

@mihailom-db mihailom-db requested a review from cloud-fan May 14, 2024 07:59
@cloud-fan cloud-fan closed this in 47006a4 May 14, 2024
cloud-fan pushed a commit that referenced this pull request May 14, 2024
Special case escaping for MySQL and fix issues with redundant escaping for ' character.

When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down.

Yes

Tests for each existing dialect.

No.

Closes #46437 from mihailom-db/SPARK-48172.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 47006a4)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request May 14, 2024
Special case escaping for MySQL and fix issues with redundant escaping for ' character.

When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\\' syntax instead of ESCAPE '\' which would cause errors when trying to push down.

Yes

Tests for each existing dialect.

No.

Closes #46437 from mihailom-db/SPARK-48172.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 47006a4)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

merged to master/3.5/3.4!

@panbingkun
Copy link
Contributor

panbingkun commented May 14, 2024

@panbingkun
Copy link
Contributor

panbingkun commented May 15, 2024

From the GA's results, it seems that only MySQL is good, and everything else has problems, not only the support of data type LONGTEXT
cc @beliefer @yaooqinn @HyukjinKwon @cloud-fan

@panbingkun
Copy link
Contributor

When I change the data type from LONGTEXT to varchar(100), it seems that there are different problems in MsSqlServerIntegrationSuite, DB2IntegrationSuite, PostgresIntegrationSuite , eg:
image
image
image

@yaooqinn
Copy link
Member

Thank @panbingkun,I reverted this in 4ff5ca8

@panbingkun
Copy link
Contributor

panbingkun commented May 15, 2024

Thank @panbingkun,I reverted this in 4ff5ca8

Thanks! @yaooqinn

@mihailom-db After fix it, you can resubmit :)

cloud-fan pushed a commit that referenced this pull request May 15, 2024
This PR is a fix of #46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects.

### What changes were proposed in this pull request?
Special case escaping for MySQL and fix issues with redundant escaping for ' character.
New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests.

### Why are the changes needed?
When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\' syntax instead of ESCAPE '' which would cause errors when trying to push down.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Tests for each existing dialect.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46588 from mihailom-db/SPARK-48172.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request May 15, 2024
This PR is a fix of #46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects.

Special case escaping for MySQL and fix issues with redundant escaping for ' character.
New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests.

When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\' syntax instead of ESCAPE '' which would cause errors when trying to push down.

Yes

Tests for each existing dialect.

No.

Closes #46588 from mihailom-db/SPARK-48172.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9e386b4)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request May 15, 2024
This PR is a fix of #46437. The previous PR was reverted as `LONGTEXT` is not supported by all dialects.

Special case escaping for MySQL and fix issues with redundant escaping for ' character.
New changes introduced in the fix include change `LONGTEXT` -> `VARCHAR(50)`, as well as fix for table naming in the tests.

When pushing down startsWith, endsWith and contains they are converted to LIKE. This requires addition of escape characters for these expressions. Unfortunately, MySQL uses ESCAPE '\' syntax instead of ESCAPE '' which would cause errors when trying to push down.

Yes

Tests for each existing dialect.

No.

Closes #46588 from mihailom-db/SPARK-48172.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9e386b4)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants