[SPARK-26321][SQL] Improve the behavior of sql text splitting for the spark-sql command line #23276
Conversation
|
Test build #99921 has finished for PR 23276 at commit
|
|
Test build #99923 has finished for PR 23276 at commit
|
|
cc @gatorsmile |
|
Test build #99965 has finished for PR 23276 at commit
|
The failed unit test passes on my laptop. |
|
Maybe this is obvious to you all, but what is the output, and what was expected here? |
|
OK, I will add more description. @srowen This is actually a trivial PR! Please review. |
|
Test build #99999 has finished for PR 23276 at commit
|
|
Let us first update the PR description to explain the problem we want to resolve. Regarding comments, our SQL parser follows PostgreSQL; you can read SqlBase.g4 to confirm it. See the PostgreSQL docs: https://www.postgresql.org/docs/9.4/sql-syntax-lexical.html
|
Thus, you need to consider both double quotes and single quotes, as well as backticks, which Spark SQL uses for quoting identifiers. |
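To make the requirement concrete, here is a minimal sketch of quote-aware splitting in Scala. It only illustrates the idea (escape sequences and SQL comments are deliberately ignored) and is not the code added by this PR:

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative only: split SQL text on semicolons, ignoring semicolons that
// appear inside single quotes, double quotes, or backtick-quoted identifiers.
def naiveSplitSql(sql: String): Seq[String] = {
  val statements = ArrayBuffer.empty[String]
  val current = new StringBuilder
  var openQuote: Option[Char] = None        // the quote character currently open, if any
  for (c <- sql) {
    openQuote match {
      case Some(q) =>
        current.append(c)
        if (c == q) openQuote = None        // closing quote found
      case None =>
        c match {
          case '\'' | '"' | '`' =>
            openQuote = Some(c)
            current.append(c)
          case ';' =>
            statements += current.toString  // statement boundary
            current.clear()
          case _ =>
            current.append(c)
        }
    }
  }
  if (current.nonEmpty) statements += current.toString
  statements.toSeq
}

// Example:
//   naiveSplitSql("select \"^;^\"; select 1")
//   returns Seq("select \"^;^\"", " select 1")
```

On this example the semicolon inside the double quotes is preserved, which is the behavior the PR is after.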
|
OK, I will polish it later. We are not handling BRACKETED_COMMENT; it seems to be a complicated task, and I have to admit I am not familiar with ANTLR. Test cases manually verified in spark-sql: |
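For illustration, these are the kinds of inputs such a splitter has to be careful with (hypothetical examples, not the exact cases verified in this PR); note that the sketch above ignores comments entirely, which is why BRACKETED_COMMENT is the hard part:

```scala
// Hypothetical inputs where a ';' must NOT be treated as a statement boundary:
val trickyInputs = Seq(
  "select 'a;b'",                        // semicolon inside a single-quoted literal
  "select \"a;b\"",                      // semicolon inside a double-quoted literal
  "select 1 as `a;b`",                   // semicolon inside a backtick-quoted identifier
  "select 1 -- line comment with ;",     // semicolon inside a simple (line) comment
  "select /* bracketed ; comment */ 1"   // semicolon inside a BRACKETED_COMMENT
)
```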
|
@gatorsmile Please re-review. |
|
Test build #100421 has finished for PR 23276 at commit
|
|
Test build #100689 has finished for PR 23276 at commit
|
|
Test build #100688 has finished for PR 23276 at commit
|
|
Rebased; still work in progress. |
|
@gatorsmile Please re-review. |
|
Test build #100891 has finished for PR 23276 at commit
|
|
Jenkins ??? |
|
retest this please |
|
|
Test build #100913 has finished for PR 23276 at commit
|
|
Rebased on the latest master with the newer Mockito. @gatorsmile Please review. |
|
Test build #101536 has finished for PR 23276 at commit
|
|
@gatorsmile Conflicts resolved, please re-review. |
|
Test build #101709 has finished for PR 23276 at commit
|
Manually tested; it works fine. |
|
retest this please |
|
Test build #101727 has finished for PR 23276 at commit
|
  }

  // method body imported from Hive and translated from Java to Scala
  override def processLine(line: String, allowInterrupting: Boolean): Int = {
So the default processLine implementation doesn't handle ';' well? Do you mean the Hive SQL shell has this bug as well?
Yes, there is a buggy implementation in Hive.
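Roughly speaking, the buggy behavior is what you get from a plain character-level split on ';' that ignores quoting (illustrative only; not the exact Hive 1.2 code):

```scala
// A plain split ignores quoting, so a semicolon inside a string literal
// breaks the statement into two fragments, neither of which is valid SQL.
val line = "select \"^;^\""
val fragments = line.split(";").toSeq
// fragments == Seq("select \"^", "^\"")
```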
I checked the Hive code, and it seems the ';' is handled well, at least on the master branch: https://github.com/apache/hive/blob/master/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L395
Do you mean only Hive 1.2 has this bug? Maybe we should upgrade Hive.
Yes, and I had not checked the implementation on Hive master. We may judge which implementation is better.
Upgrading the built-in Hive can fix this issue.
Another related issue:
https://issues.apache.org/jira/browse/SPARK-12014
  console.printInfo("Press Ctrl+C again to kill JVM")

  // First, kill any running Spark jobs
  // TODO
I think HiveInterruptUtils.interrupt() should be added here, because SparkSQLCLIDriver has invoked installSignalHandler() to add a HiveInterruptCallback, which would cancel all Spark jobs when HiveInterruptUtils.interrupt() is invoked (see the sketch after the quoted code below). See:
Lines 60 to 81 in bc7592b (SparkSQLCLIDriver.scala):

  installSignalHandler()

  /**
   * Install an interrupt callback to cancel all Spark jobs. In Hive's CliDriver#processLine(),
   * a signal handler will invoke this registered callback if a Ctrl+C signal is detected while
   * a command is being processed by the current thread.
   */
  def installSignalHandler() {
    HiveInterruptUtils.add(new HiveInterruptCallback {
      override def interrupt() {
        // Handle remote execution mode
        if (SparkSQLEnv.sparkContext != null) {
          SparkSQLEnv.sparkContext.cancelAllJobs()
        } else {
          if (transport != null) {
            // Force closing of TCP connection upon session termination
            transport.getSocket.close()
          }
        }
      }
    })
  }
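For illustration, a minimal sketch of what this suggestion might look like inside the Ctrl+C handler; it assumes the sun.misc.Signal/SignalHandler fields and the console and HiveInterruptUtils objects already present in SparkSQLCLIDriver, and it is not the code actually merged:

```scala
// Sketch only: when Ctrl+C is caught while a statement is running, fire the
// callbacks registered via installSignalHandler() so running Spark jobs are
// cancelled (or the session transport is closed), then warn the user.
oldSignal = Signal.handle(interruptSignal, new SignalHandler() {
  override def handle(signal: Signal): Unit = {
    console.printInfo("Press Ctrl+C again to kill JVM")
    // First, kill any running Spark jobs
    HiveInterruptUtils.interrupt()
  }
})
```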
  // Hook up the custom Ctrl+C handler while processing this line
  interruptSignal = new Signal("INT")
  oldSignal = Signal.handle(interruptSignal, new SignalHandler() {
    private val cliThread = Thread.currentThread()
What is the purpose of cliThread? I don't find any usage.
|
ping @sadhen Any update? |
|
Can one of the admins verify this patch? |
|
OK. Thank you @gatorsmile |
|
Created new PR: #25018
What changes were proposed in this pull request?
Improve the behavior of SQL text splitting for the spark-sql command line.
Currently, the spark-sql command line does not split the SQL text correctly; for example, it splits on a ; that appears inside double quotes.

How was this patch tested?
Manually tested with a query whose double-quoted literal contains a semicolon (e.g. select "^;^";). The expected result is ^;^. However, Spark master will split the SQL at the ;. As a result, select "^ is not valid SQL.

Unit Tests: TODO (see the assertion sketch below).

Polish the code on SignalHandler. However, it should be in another PR.
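For the unit tests, a sketch of the intended assertions; the helper name splitSemiColon is an assumption here, not necessarily the method this PR adds:

```scala
// Hypothetical assertions: semicolons inside quotes must not split the text.
assert(splitSemiColon("select \"^;^\"") == Seq("select \"^;^\""))
assert(splitSemiColon("select 'a;b'; select 1") == Seq("select 'a;b'", " select 1"))
```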