Skip to content

Conversation

@xuanyuanking
Copy link
Member

@xuanyuanking xuanyuanking commented Nov 18, 2019

What changes were proposed in this pull request?

Fix the inconsistent behavior of build-in function SQL LEFT/RIGHT.

Why are the changes needed?

As the comment in #26497 (comment), Postgre dialect should not be affected by the ANSI mode config.
During reran the existing tests, only the LEFT/RIGHT build-in SQL function broke the assumption. We fix this by following https://www.postgresql.org/docs/12/sql-keywords-appendix.html: LEFT/RIGHT reserved (can be function or type)

Does this PR introduce any user-facing change?

Yes, the Postgre dialect will not be affected by the ANSI mode config.

How was this patch tested?

Existing UT.

@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114035 has finished for PR 26584 at commit a250886.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114094 has finished for PR 26584 at commit 4953646.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

interval
: negativeSign=MINUS? INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)?
| {ansi}? (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)
| {use_SQL_standard_keywords}? (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu maybe we should have a separated config to allow not specifying the leading INTERVAL. pgsql requires the leading INTERVAL

cloud0fan=# select interval '1' day;
 interval 
----------
 1 day
(1 row)

cloud0fan=# select '1' day;
ERROR:  syntax error at or near "day"
LINE 1: select '1' day;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I see. I just followed the behaviour of the standard, hive, and mysql. But, it looks reasonable to me. So, I'll file jira for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked SQL standard:

<interval literal> ::=
  INTERVAL [ <sign> ] <interval string> <interval qualifier>

<interval string> ::=
  <quote> <unquoted interval string> <quote>

Spark SQL doesn't follow it completely as we allow interval 1 year 2 days, but INTERVAL should be required under ansi mode.

@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114096 has finished for PR 26584 at commit d357a9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114102 has finished for PR 26584 at commit f16b6ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* When true, use ANSI SQL standard keywords.
*/
public boolean ansi = false;
public boolean use_SQL_standard_keywords = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The name looks a little weird to me. This value defines not keywords but behaviours? So, follow_SQL_standard_behaviours ?

Copy link
Member Author

@xuanyuanking xuanyuanking Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, how about keep both keyword and behavior in this flag, cause for the comment here, it is still closely related with keywords?

// When `spark.sql.dialect.spark.ansi.enabled=true`, there are 2 kinds of keywords in Spark SQL.
// - Reserved keywords:
// Keywords that are reserved and can't be used as identifiers for table, view, column,
// function, alias, etc.
// - Non-reserved keywords:
// Keywords that have a special meaning only in particular contexts and can be used as
// identifiers in other contexts. For example, `SELECT 1 WEEK` is an interval literal, but WEEK

I do this temporarily in c05adb9. Please let me know if you don't agree :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, looks fine, thanks!

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the single comment.

@SparkQA
Copy link

SparkQA commented Nov 20, 2019

Test build #114130 has finished for PR 26584 at commit c05adb9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 20, 2019

Test build #114143 has finished for PR 26584 at commit 5a4b2ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in 23b3c4f Nov 20, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@xuanyuanking
Copy link
Member Author

Thanks for the review.

@xuanyuanking xuanyuanking deleted the SPARK-29951 branch November 21, 2019 05:09
xuanyuanking added a commit to xuanyuanking/spark that referenced this pull request Dec 9, 2019
…f ansi mode config

Fix the inconsistent behavior of build-in function SQL LEFT/RIGHT.

As the comment in apache#26497 (comment), Postgre dialect should not be affected by the ANSI mode config.
During reran the existing tests, only the LEFT/RIGHT build-in SQL function broke the assumption. We fix this by following https://www.postgresql.org/docs/12/sql-keywords-appendix.html: `LEFT/RIGHT reserved (can be function or type)`

Yes, the Postgre dialect will not be affected by the ANSI mode config.

Existing UT.

Closes apache#26584 from xuanyuanking/SPARK-29951.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Dec 10, 2019
### What changes were proposed in this pull request?
Reprocess all PostgreSQL dialect related PRs, listing in order:

- #25158: PostgreSQL integral division support [revert]
- #25170: UT changes for the integral division support [revert]
- #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert]
- #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert]
- #26112: Date substraction support [keep the ANSI-compliant part]
- #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert]
- #26463: Cast to boolean support for PostgreSQL dialect [revert]
- #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part]

### Why are the changes needed?
As the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove PostgreSQL dialect form code base for several reasons:
1. The current approach makes the codebase complicated and hard to maintain.
2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.

### Does this PR introduce any user-facing change?
Yes, the config `spark.sql.dialect` will be removed.

### How was this patch tested?
Existing UT.

Closes #26763 from xuanyuanking/SPARK-30125.

Lead-authored-by: Yuanjian Li <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants