[SPARK-5213] [SQL] Pluggable SQL Parser Support #4015
Test build #25465 has started for PR 4015 at commit
nice feature. 👍
Test build #25465 timed out for PR 4015 at commit
Test FAILed.
This assumption may lead to some problems; see the example from #3935 (comment).
I agree with @scwf.
Since our goal is to support a variety of SQL dialects, we cannot expect them all to behave the same way, so the parser priority is a problem.
What about leaving the implementation to each dialect and adding an abstract method in SQLDialect that lets each dialect define its own parsing order?
Sorry if I'm wrong.
I think the difference in DESCRIBE TABLE behaviour between Hive and Spark SQL is a known issue; we added the affected cases to the blacklist in HiveCompatibilitySuite.
Well, even if we moved the extended parser first, I don't think we want to skip the DDLParser, right? Then we would have to implement the parsing fallback (on failure, resort to the DDLParser) for EVERY extended parser. So why NOT just do the fallback in the DDLParser by putting it first? That's exactly the current implementation!
And I don't think the issue @scwf described is a reason to change the code here. A better solution is probably to define a unified DescribeCommand logical node that can be translated into a different execution depending on the context (HiveContext / SQLContext).
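The fallback ordering argued for above (try the DDLParser first, and only hand the query to the dialect's parser when the DDLParser declines it) can be sketched as a small parser chain. This is a self-contained toy under stated assumptions, not Spark's code: `SimpleParser`, `Plan`, `ToyDdlParser`, `ToyDialectParser`, and `FallbackParser` are hypothetical names, and the real parsers produce Catalyst `LogicalPlan`s rather than these stand-ins.

```scala
object ParserChainDemo {
  // Toy stand-ins for Catalyst's LogicalPlan (hypothetical, not Spark classes).
  sealed trait Plan
  final case class CreateTempTable(name: String) extends Plan
  final case class Query(sql: String) extends Plan

  // A parser either returns a plan or declines the input with None.
  trait SimpleParser {
    def parse(sql: String): Option[Plan]
  }

  // Hypothetical DDL parser: only recognizes "CREATE TEMPORARY TABLE <name> ...".
  object ToyDdlParser extends SimpleParser {
    private val Prefix = List("CREATE", "TEMPORARY", "TABLE")

    def parse(sql: String): Option[Plan] = {
      val tokens = sql.trim.split("\\s+").toList
      if (tokens.length >= 4 && tokens.take(3).map(_.toUpperCase) == Prefix)
        Some(CreateTempTable(tokens(3)))
      else None
    }
  }

  // Hypothetical dialect parser: treats everything it receives as a query.
  object ToyDialectParser extends SimpleParser {
    def parse(sql: String): Option[Plan] = Some(Query(sql.trim))
  }

  // Tries `first`; only when it declines does the input go to `fallback`.
  final class FallbackParser(first: SimpleParser, fallback: SimpleParser) extends SimpleParser {
    def parse(sql: String): Option[Plan] = first.parse(sql).orElse(fallback.parse(sql))
  }

  // DDL first, dialect second: the ordering described in the comment above.
  val parser: SimpleParser = new FallbackParser(ToyDdlParser, ToyDialectParser)
}
```

Because the fallback is written once in `FallbackParser`, a dialect in this sketch only has to implement its own grammar instead of repeating the DDL fallback inside every extended parser.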
Agreed, @chenghao-intel: we can define a unified DescribeCommand for that issue. The order of DDLParser and sqlParser is not a big concern, since they cover different ranges of SQL syntax.
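A rough sketch of the unified `DescribeCommand` idea: every dialect's parser emits the same logical node, and each context decides how to execute it. The node's fields, the `Planner` trait, both planner objects, and their output strings are hypothetical illustrations; Spark's actual describe commands and planning strategies differ.

```scala
object DescribeDemo {
  // Hypothetical unified logical node: every dialect's parser emits this same node.
  final case class DescribeCommand(table: String, extended: Boolean)

  // Each context supplies its own way of turning the node into something runnable.
  trait Planner {
    def plan(cmd: DescribeCommand): () => Seq[String]
  }

  // Stand-in for SQLContext: a plain description (output format is made up).
  object SqlContextPlanner extends Planner {
    def plan(cmd: DescribeCommand): () => Seq[String] =
      () => Seq(s"describe ${cmd.table}")
  }

  // Stand-in for HiveContext: Hive-compatible output (output format is made up).
  object HiveContextPlanner extends Planner {
    def plan(cmd: DescribeCommand): () => Seq[String] =
      () => Seq(s"hive describe ${cmd.table}" + (if (cmd.extended) " extended" else ""))
  }
}
```

The parser stays context-agnostic in this sketch; only the planner chosen by the context changes, which is why the DESCRIBE difference need not affect parser ordering.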
2fe7d99 to 336cd89
Test build #25519 has started for PR 4015 at commit
Test build #25519 has finished for PR 4015 at commit
Test PASSed.
typo?
Typo?
336cd89 to 4f7f626
Test build #25803 has started for PR 4015 at commit
Test build #25803 has finished for PR 4015 at commit
Test FAILed.
Test build #25807 has started for PR 4015 at commit
Test build #25807 has finished for PR 4015 at commit
Test FAILed.
Test build #25808 has started for PR 4015 at commit
Test build #25808 has finished for PR 4015 at commit
Test FAILed.
Test build #25866 has started for PR 4015 at commit
Test build #25866 has finished for PR 4015 at commit
Test PASSed.
c8f154d to b0e8084
Test build #25947 has started for PR 4015 at commit
Test build #25947 has finished for PR 4015 at commit
An abstract interface for adding a new SQL dialect. A `Dialect` is responsible for creating a logical plan from a string representation of a query. Since the `LogicalPlan` interface is not a public stable API, custom dialects will likely be tied to specific Spark releases.
Explicitly annotate this as `@DeveloperApi`.
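A minimal sketch of what the suggested interface might look like, together with a toy custom dialect. Assumptions: `DeveloperApi` below is a local stand-in for `org.apache.spark.annotation.DeveloperApi` so the snippet compiles without Spark on the classpath, `LogicalPlan` and `ScanTable` are toy substitutes for Catalyst's plan classes, and `TinySelectDialect` is hypothetical.

```scala
object DialectDemo {
  // Local stand-in for org.apache.spark.annotation.DeveloperApi (assumption: no Spark dependency).
  final class DeveloperApi extends scala.annotation.StaticAnnotation

  // Toy stand-in for Catalyst's LogicalPlan, which is not a stable public API.
  sealed trait LogicalPlan
  final case class ScanTable(table: String, limit: Option[Int]) extends LogicalPlan

  /**
   * An abstract interface for adding a new SQL dialect. A Dialect is responsible for
   * creating a logical plan from a string representation of a query.
   */
  @DeveloperApi
  abstract class Dialect {
    def parse(sqlText: String): LogicalPlan
  }

  // Hypothetical dialect that only understands "SELECT * FROM <table> [LIMIT n]".
  class TinySelectDialect extends Dialect {
    private val SelectStar = """(?i)\s*SELECT\s+\*\s+FROM\s+(\w+)(?:\s+LIMIT\s+(\d+))?\s*;?\s*""".r

    def parse(sqlText: String): LogicalPlan = sqlText match {
      // The LIMIT group is null when absent, hence Option(...).
      case SelectStar(table, limit) => ScanTable(table, Option(limit).map(_.toInt))
      case _ => throw new IllegalArgumentException(s"Unsupported query: $sqlText")
    }
  }
}
```

Since a real dialect returns Catalyst's `LogicalPlan`, a dialect written against this interface would likely need to be rebuilt for each Spark release, as the review comment notes.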
Final comments to improve user documentation. Otherwise LGTM.
test this please
retest this please
@liancheng @rxin @marmbrus can you trigger the unit test for me? Thanks.
I think Jenkins is having some trouble right now.
Jenkins, retest this please.
Test build #31088 has started for PR 4015 at commit
Test build #31088 has finished for PR 4015 at commit
cc @marmbrus
Thanks, merged to master.
Based on #4015, we should not delete `sqlParser` from SQLContext, because doing so makes the MiMa binary-compatibility check fail. Users implement a dialect to provide the fallback for `sqlParser`, so `sqlParser` in SQLContext should be constructed from the current dialect: `protected[sql] val sqlParser = new SparkSQLParser(getSQLDialect().parse(_))`
Author: Cheng Hao
Author: scwf
Closes #5827 from scwf/sqlparser1 and squashes the following commits:
81b9737 [scwf] comment fix
0878bd1 [scwf] remove comments
c19780b [scwf] fix mima tests
c2895cf [scwf] Merge branch 'master' of https://github.com/apache/spark into sqlparser1
493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing
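One detail of `new SparkSQLParser(getSQLDialect().parse(_))` is worth spelling out: in Scala's placeholder syntax, `getSQLDialect().parse(_)` expands to `sql => getSQLDialect().parse(sql)`, so the dialect is looked up again on every call rather than once when `sqlParser` is constructed. The sketch below checks this with toy types; `CommandParser`, `NamedDialect`, the `currentDialect` variable, and the `lookups` counter are hypothetical stand-ins for `SparkSQLParser`, real dialects, and the `spark.sql.dialect` setting.

```scala
object FallbackWiringDemo {
  trait Dialect { def parse(sql: String): String }

  // Hypothetical dialect that tags its output with its own name.
  final class NamedDialect(name: String) extends Dialect {
    def parse(sql: String): String = s"$name:$sql"
  }

  // Hypothetical stand-in for SparkSQLParser: handles SET itself, delegates the rest.
  final class CommandParser(fallback: String => String) {
    def parse(sql: String): String =
      if (sql.trim.toUpperCase.startsWith("SET ")) s"set-command:${sql.trim}"
      else fallback(sql)
  }

  // Mutable setting standing in for spark.sql.dialect, plus a lookup counter.
  var currentDialect: String = "sql"
  var lookups: Int = 0

  def getSQLDialect(): Dialect = {
    lookups += 1
    new NamedDialect(currentDialect)
  }

  // Placeholder syntax: expands to sql => getSQLDialect().parse(sql),
  // so getSQLDialect() runs on each delegated call, not at construction time.
  val sqlParser = new CommandParser(getSQLDialect().parse(_))
}
```

By contrast, per the Scala specification's eta-expansion rules, `getSQLDialect().parse _` should evaluate `getSQLDialect()` once and capture that single dialect, so a later `SET spark.sql.dialect=...` would not reach that parser instance.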
This PR makes the SQL parser pluggable, so users can register their own parser via the Spark SQL CLI.
```
# add the jar to the classpath
$ bin/spark-sql --jars sql99.jar
-- switch to the "hiveql" dialect
spark-sql> SET spark.sql.dialect=hiveql;
spark-sql> SELECT * FROM src LIMIT 1;
-- switch to the "sql" dialect
spark-sql> SET spark.sql.dialect=sql;
spark-sql> SELECT * FROM src LIMIT 1;
-- switch to a custom dialect
spark-sql> SET spark.sql.dialect=com.xxx.xxx.SQL99Dialect;
spark-sql> SELECT * FROM src LIMIT 1;
-- set a non-existent SQL dialect
spark-sql> SET spark.sql.dialect=NotExistedClass;
spark-sql> SELECT * FROM src LIMIT 1;
-- an exception is thrown, and the dialect is reset to the default ("sql" for SQLContext, "hiveql" for HiveContext)
```
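The behaviour shown in the session (short names for built-in dialects, a fully qualified class name for a custom one, and a reset to the default after a failed lookup) can be approximated as follows. This is a hedged sketch, not Spark's implementation: `Session`, `SqlDialect`, `HiveQlDialect`, and the string-returning `parse` are hypothetical, and the real code instantiates a dialect class and returns a `LogicalPlan`.

```scala
import scala.util.control.NonFatal

object DialectSwitchDemo {
  trait Dialect { def parse(sql: String): String }
  class SqlDialect extends Dialect { def parse(sql: String): String = s"sql:$sql" }
  class HiveQlDialect extends Dialect { def parse(sql: String): String = s"hiveql:$sql" }

  // Hypothetical stand-in for a context whose config holds spark.sql.dialect.
  final class Session(defaultDialect: String) {
    private var dialectSetting: Option[String] = None

    def set(value: String): Unit = dialectSetting = Some(value)
    def dialect: String = dialectSetting.getOrElse(defaultDialect)

    // Short names map to built-in dialects; anything else is treated as a class name.
    private def resolve(name: String): Dialect = name match {
      case "sql"    => new SqlDialect
      case "hiveql" => new HiveQlDialect
      case cls =>
        Class.forName(cls).getDeclaredConstructor().newInstance().asInstanceOf[Dialect]
    }

    def sql(text: String): String = {
      val impl =
        try resolve(dialect)
        catch {
          case NonFatal(e) =>
            val bad = dialect
            dialectSetting = None // reset so the next query uses the default dialect
            throw new IllegalStateException(
              s"Instantiating dialect '$bad' failed; reverting to '$defaultDialect'", e)
        }
      impl.parse(text)
    }
  }
}
```

In this sketch the faulty setting is cleared before the exception propagates, so the next query runs with the default dialect, matching the comment at the end of the session above.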
Author: Cheng Hao
Closes apache#4015 from chenghao-intel/sqlparser and squashes the following commits:
493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing