- 
                Notifications
    You must be signed in to change notification settings 
- Fork 28.9k
[SPARK-29951][SQL] Make the behavior of Postgre dialect independent of ansi mode config #26584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Test build #114035 has finished for PR 26584 at commit  
 | 
| cc @cloud-fan | 
        
          
                sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | Test build #114094 has finished for PR 26584 at commit  
 | 
4953646    to
    d357a9a      
    Compare
  
            
          
                sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | interval | ||
| : negativeSign=MINUS? INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)? | ||
| | {ansi}? (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval) | ||
| | {use_SQL_standard_keywords}? (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu maybe we should have a separated config to allow not specifying the leading INTERVAL. pgsql requires the leading INTERVAL
cloud0fan=# select interval '1' day;
 interval 
----------
 1 day
(1 row)
cloud0fan=# select '1' day;
ERROR:  syntax error at or near "day"
LINE 1: select '1' day;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, I see. I just followed the behaviour of the standard, hive, and mysql. But, it looks reasonable to me. So, I'll file jira for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked SQL standard:
<interval literal> ::=
  INTERVAL [ <sign> ] <interval string> <interval qualifier>
<interval string> ::=
  <quote> <unquoted interval string> <quote>
Spark SQL doesn't follow it completely as we allow interval 1 year 2 days, but INTERVAL should be required under ansi mode.
| Test build #114096 has finished for PR 26584 at commit  
 | 
| Test build #114102 has finished for PR 26584 at commit  
 | 
| * When true, use ANSI SQL standard keywords. | ||
| */ | ||
| public boolean ansi = false; | ||
| public boolean use_SQL_standard_keywords = false; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The name looks a little weird to me. This value defines not keywords but behaviours? So, follow_SQL_standard_behaviours ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, how about keep both keyword and behavior in this flag, cause for the comment here, it is still closely related with keywords?
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Lines 954 to 960 in 5a70af7
| // When `spark.sql.dialect.spark.ansi.enabled=true`, there are 2 kinds of keywords in Spark SQL. | |
| // - Reserved keywords: | |
| // Keywords that are reserved and can't be used as identifiers for table, view, column, | |
| // function, alias, etc. | |
| // - Non-reserved keywords: | |
| // Keywords that have a special meaning only in particular contexts and can be used as | |
| // identifiers in other contexts. For example, `SELECT 1 WEEK` is an interval literal, but WEEK | 
I do this temporarily in c05adb9. Please let me know if you don't agree :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, looks fine, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the single comment.
f16b6ea    to
    c05adb9      
    Compare
  
            
          
                sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | Test build #114130 has finished for PR 26584 at commit  
 | 
| Test build #114143 has finished for PR 26584 at commit  
 | 
| thanks, merging to master! | 
| Thanks for the review. | 
…f ansi mode config Fix the inconsistent behavior of build-in function SQL LEFT/RIGHT. As the comment in apache#26497 (comment), Postgre dialect should not be affected by the ANSI mode config. During reran the existing tests, only the LEFT/RIGHT build-in SQL function broke the assumption. We fix this by following https://www.postgresql.org/docs/12/sql-keywords-appendix.html: `LEFT/RIGHT reserved (can be function or type)` Yes, the Postgre dialect will not be affected by the ANSI mode config. Existing UT. Closes apache#26584 from xuanyuanking/SPARK-29951. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? Reprocess all PostgreSQL dialect related PRs, listing in order: - #25158: PostgreSQL integral division support [revert] - #25170: UT changes for the integral division support [revert] - #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert] - #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert] - #26112: Date substraction support [keep the ANSI-compliant part] - #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert] - #26463: Cast to boolean support for PostgreSQL dialect [revert] - #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part] ### Why are the changes needed? As the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove PostgreSQL dialect form code base for several reasons: 1. The current approach makes the codebase complicated and hard to maintain. 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. ### Does this PR introduce any user-facing change? Yes, the config `spark.sql.dialect` will be removed. ### How was this patch tested? Existing UT. Closes #26763 from xuanyuanking/SPARK-30125. Lead-authored-by: Yuanjian Li <[email protected]> Co-authored-by: Maxim Gekk <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Fix the inconsistent behavior of build-in function SQL LEFT/RIGHT.
Why are the changes needed?
As the comment in #26497 (comment), Postgre dialect should not be affected by the ANSI mode config.
During reran the existing tests, only the LEFT/RIGHT build-in SQL function broke the assumption. We fix this by following https://www.postgresql.org/docs/12/sql-keywords-appendix.html:
LEFT/RIGHT reserved (can be function or type)Does this PR introduce any user-facing change?
Yes, the Postgre dialect will not be affected by the ANSI mode config.
How was this patch tested?
Existing UT.