[SPARK-29951][SQL] Make the behavior of Postgre dialect independent of ansi mode config #26584

xuanyuanking · 2019-11-18T20:40:41Z

What changes were proposed in this pull request?

Fix the inconsistent behavior of the built-in SQL functions LEFT/RIGHT.

Why are the changes needed?

As noted in a comment on #26497, the PostgreSQL dialect should not be affected by the ANSI mode config.
When rerunning the existing tests, only the built-in SQL functions LEFT/RIGHT broke this assumption. We fix this by following https://www.postgresql.org/docs/12/sql-keywords-appendix.html, which classifies LEFT/RIGHT as `reserved (can be function or type)`.
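The PostgreSQL rule cited above can be modeled in a few lines of Java, the language of the grammar's `@members` block. This is a minimal, hypothetical sketch rather than Spark's parser code: under SQL-standard keyword behavior, a reserved keyword may not name a function, except for keywords in PostgreSQL's "reserved (can be function or type)" category such as LEFT and RIGHT. The class `FunctionNameRule`, its method, and the keyword sets (a small illustrative subset) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not Spark's actual parser code.
final class FunctionNameRule {
    // Illustrative subset of keywords reserved under SQL-standard keyword behavior.
    private static final Set<String> RESERVED =
        Set.of("SELECT", "FROM", "WHERE", "JOIN", "LEFT", "RIGHT");

    // Reserved keywords that may still appear as function names, mirroring
    // PostgreSQL's "reserved (can be function or type)" category.
    private static final Set<String> RESERVED_BUT_FUNCTION_NAME = Set.of("LEFT", "RIGHT");

    private FunctionNameRule() {}

    /** Returns true if {@code word} may appear in a function-name position. */
    static boolean canBeFunctionName(String word, boolean sqlStandardKeywords) {
        String upper = word.toUpperCase(Locale.ROOT);
        if (!sqlStandardKeywords) {
            // In this sketch, keywords impose no restriction when standard behavior is off.
            return true;
        }
        return !RESERVED.contains(upper) || RESERVED_BUT_FUNCTION_NAME.contains(upper);
    }
}
```

For example, `canBeFunctionName("left", true)` is true while `canBeFunctionName("select", true)` is false, so `LEFT(str, n)` keeps parsing as a function call even when LEFT is otherwise reserved.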

Does this PR introduce any user-facing change?

Yes, the PostgreSQL dialect will no longer be affected by the ANSI mode config.
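The intended decoupling can be sketched as one parser flag derived from two settings. This is a minimal Java sketch, assuming a dialect setting with the values `spark` and `postgresql` and a separate boolean ANSI flag; the `KeywordBehaviorConfig` class and its methods are hypothetical and do not reproduce the actual change to `ParseDriver.scala`.

```java
import java.util.Locale;

// Hypothetical sketch of deriving a single parser flag from two settings.
final class KeywordBehaviorConfig {
    enum Dialect { SPARK, POSTGRESQL }

    private KeywordBehaviorConfig() {}

    /** Parses a dialect name case-insensitively, e.g. "PostgreSQL" or "spark". */
    static Dialect parseDialect(String name) {
        return Dialect.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    /**
     * PostgreSQL dialect: always use SQL-standard keyword behavior, ignoring the ANSI flag.
     * Spark dialect: follow the ANSI flag.
     */
    static boolean sqlStandardKeywordBehavior(Dialect dialect, boolean ansiEnabled) {
        return dialect == Dialect.POSTGRESQL || ansiEnabled;
    }
}
```

The point of the design is that toggling the ANSI flag changes nothing for the PostgreSQL dialect, which is exactly the user-facing guarantee stated above.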

How was this patch tested?

Existing UT.

SparkQA · 2019-11-19T00:25:15Z

Test build #114035 has finished for PR 26584 at commit a250886.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

xuanyuanking · 2019-11-19T01:29:58Z

cc @cloud-fan

SparkQA · 2019-11-19T11:42:08Z

Test build #114094 has finished for PR 26584 at commit 4953646.

This patch fails Scala style tests.
This patch does not merge cleanly.
This patch adds no public classes.

cloud-fan · 2019-11-19T14:03:03Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

 interval
    : negativeSign=MINUS? INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)?
-    | {ansi}? (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)
+    | {use_SQL_standard_keywords}? (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)


@maropu maybe we should have a separate config that allows omitting the leading INTERVAL. pgsql requires the leading INTERVAL:

cloud0fan=# select interval '1' day;
 interval
----------
 1 day
(1 row)

cloud0fan=# select '1' day;
ERROR: syntax error at or near "day"
LINE 1: select '1' day;

Ah, I see. I just followed the behaviour of the SQL standard, Hive, and MySQL, but this looks reasonable to me. I'll file a JIRA for that.

I checked the SQL standard:

<interval literal> ::= INTERVAL [ <sign> ] <interval string> <interval qualifier>
<interval string> ::= <quote> <unquoted interval string> <quote>

Spark SQL doesn't follow it completely, since we allow `interval 1 year 2 days`, but INTERVAL should be required under ANSI mode.
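As a rough illustration of that grammar, here is a minimal Java recognizer for the single-string form `INTERVAL [ <sign> ] '<string>' <qualifier>`. The `allowOmittingInterval` parameter is hypothetical: it plays the role of the `{use_SQL_standard_keywords}?` predicate in the diff above, or of the separate config proposed in this thread. The regex approach, the class name, and the restricted set of qualifier fields are assumptions; Spark's real parser does this in ANTLR and also accepts multi-unit forms such as `interval 1 year 2 days`, which this sketch does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the standard's single-string interval literal.
final class IntervalLiteralSketch {
    private static final String FIELD = "(?:YEAR|MONTH|DAY|HOUR|MINUTE|SECOND)";

    // Group 1 captures the optional leading INTERVAL keyword; group 2 the optional sign.
    // The qualifier is either a single field or "<field> TO <field>".
    private static final Pattern LITERAL = Pattern.compile(
        "^(INTERVAL\\s+)?([+-]\\s*)?'[^']*'\\s+" + FIELD + "(?:\\s+TO\\s+" + FIELD + ")?$",
        Pattern.CASE_INSENSITIVE);

    private IntervalLiteralSketch() {}

    /** Accepts the literal; a missing INTERVAL keyword is allowed only if the flag is set. */
    static boolean accepts(String text, boolean allowOmittingInterval) {
        Matcher m = LITERAL.matcher(text.trim());
        if (!m.matches()) {
            return false;
        }
        boolean hasIntervalKeyword = m.group(1) != null;
        return hasIntervalKeyword || allowOmittingInterval;
    }
}
```

With the flag off, `accepts("'1' DAY", false)` is false, matching the pgsql error shown earlier, while `accepts("INTERVAL '1' DAY", false)` is true.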

SparkQA · 2019-11-19T16:05:44Z

Test build #114096 has finished for PR 26584 at commit d357a9a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-11-19T17:31:08Z

Test build #114102 has finished for PR 26584 at commit f16b6ea.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-11-20T00:02:13Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

+   * When true, use ANSI SQL standard keywords.
   */
-  public boolean ansi = false;
+  public boolean use_SQL_standard_keywords = false;


nit: The name looks a little weird to me. Doesn't this value define behaviours rather than keywords? How about `follow_SQL_standard_behaviours`?

Thanks for the suggestion. How about including both "keyword" and "behavior" in the flag name, since, per the comment here, the flag is still closely related to keywords?

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Lines 954 to 960 in 5a70af7

// When `spark.sql.dialect.spark.ansi.enabled=true`, there are 2 kinds of keywords in Spark SQL.
// - Reserved keywords:
// Keywords that are reserved and can't be used as identifiers for table, view, column,
// function, alias, etc.
// - Non-reserved keywords:
// Keywords that have a special meaning only in particular contexts and can be used as
// identifiers in other contexts. For example, `SELECT 1 WEEK` is an interval literal, but WEEK
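The two keyword kinds described in that grammar comment can be modeled briefly. In this minimal Java sketch, reserved keywords are rejected as identifiers only when SQL-standard keyword behavior is on, while non-reserved keywords such as WEEK (the comment's own example) remain usable. `IdentifierRule` and its keyword sets are hypothetical, illustrative subsets; the authoritative lists live in `SqlBase.g4`.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of reserved vs. non-reserved keywords in identifier positions.
final class IdentifierRule {
    enum Kind { RESERVED, NON_RESERVED, NOT_A_KEYWORD }

    // Illustrative subsets only, not Spark's real keyword lists.
    private static final Set<String> RESERVED = Set.of("SELECT", "FROM", "WHERE", "TABLE");
    private static final Set<String> NON_RESERVED = Set.of("WEEK", "DAY", "YEAR");

    private IdentifierRule() {}

    static Kind classify(String word) {
        String upper = word.toUpperCase(Locale.ROOT);
        if (RESERVED.contains(upper)) return Kind.RESERVED;
        if (NON_RESERVED.contains(upper)) return Kind.NON_RESERVED;
        return Kind.NOT_A_KEYWORD;
    }

    /** Under standard keyword behavior, only reserved keywords are rejected as identifiers. */
    static boolean canBeIdentifier(String word, boolean sqlStandardKeywords) {
        return !sqlStandardKeywords || classify(word) != Kind.RESERVED;
    }
}
```

So a column named `week` is accepted in both modes, whereas a column named `select` is rejected only when the standard keyword behavior is enabled.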

I made this change for now in c05adb9. Please let me know if you disagree :)

yea, looks fine, thanks!

maropu

LGTM except for the single comment.

SparkQA · 2019-11-20T08:05:02Z

Test build #114130 has finished for PR 26584 at commit c05adb9.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-11-20T12:35:31Z

Test build #114143 has finished for PR 26584 at commit 5a4b2ea.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-11-20T16:57:03Z

thanks, merging to master!

xuanyuanking · 2019-11-21T05:09:56Z

Thanks for the review.

…f ansi mode config

Closes apache#26584 from xuanyuanking/SPARK-29951.

Authored-by: Yuanjian Li
Signed-off-by: Wenchen Fan

### What changes were proposed in this pull request?

Reprocess all PostgreSQL-dialect-related PRs, listed in order:

- #25158: PostgreSQL integral division support [revert]
- #25170: UT changes for the integral division support [revert]
- #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert]
- #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert]
- #26112: Date subtraction support [keep the ANSI-compliant part]
- #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert]
- #26463: Cast to boolean support for PostgreSQL dialect [revert]
- #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part]

### Why are the changes needed?

As discussed in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove the PostgreSQL dialect from the code base for several reasons:

1. The current approach makes the codebase complicated and hard to maintain.
2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.

### Does this PR introduce any user-facing change?

Yes, the config `spark.sql.dialect` will be removed.

### How was this patch tested?

Existing UT.

Closes #26763 from xuanyuanking/SPARK-30125.

Lead-authored-by: Yuanjian Li
Co-authored-by: Maxim Gekk
Signed-off-by: Wenchen Fan