
Conversation

@da-liii (Contributor) commented Dec 10, 2018

What changes were proposed in this pull request?

Improve the behavior of SQL text splitting for the spark-sql command line.

Currently, the spark-sql command line does not split the SQL text correctly; for example, a ; inside double quotes is wrongly treated as a statement terminator.
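
As a minimal illustration (not the CLI's actual splitting code), a naive split on ; breaks any statement whose string literal contains one:

    // Illustration only (not the CLI's actual code): a naive split treats the ';'
    // inside the string literal as a statement terminator.
    val line = "select \"^;^\""
    val pieces = line.split(";")   // Array(select "^, ^"); the fragment select "^ is not valid SQL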

How was this patch tested?

Manually tested:

$ ./build/mvn -Phive-thriftserver -DskipTests package
$ bin/spark-sql
> select "^;^";

The expected result is ^;^. However, Spark master splits the SQL at the ;, and as a result select "^ is not valid SQL.

spark-sql> select "\";";
";
spark-sql> select "\';";
';

Unit Tests:

$ build/sbt -Phive-thriftserver
> project hive-thriftserver
> testOnly *SparkSQLCLIDriverSuite
> project catalyst
> testOnly *StringUtilsSuite

TODO

Polish the code around SignalHandler; however, that should be done in another PR.

  • Fix the failing tests
  • Polish StringUtils.split and add more unit tests
  • Add Scaladoc
  • Handle comments

@da-liii changed the title from "[SPARK-26321][SQL] Split a SQL in a correct way" to "[SPARK-26321][SQL] Split a SQL in correct way" on Dec 10, 2018
@SparkQA commented Dec 10, 2018

Test build #99921 has finished for PR 23276 at commit c5792e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 10, 2018

Test build #99923 has finished for PR 23276 at commit 932565f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

cc @gatorsmile

@SparkQA commented Dec 11, 2018

Test build #99965 has finished for PR 23276 at commit bd8c09a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Dec 11, 2018

[info] HiveMetastoreLazyInitializationSuite:
[info] - lazily initialize Hive client (17 seconds, 703 milliseconds)

The failed unit test works fine on my laptop.

@srowen (Member) commented Dec 11, 2018

Maybe this is obvious to you all, but what is the output, and what was expected here?

@da-liii (Contributor, Author) commented Dec 11, 2018

OK, I will add more description.

@srowen This is actually a trivial PR! Please review.

@SparkQA commented Dec 12, 2018

Test build #99999 has finished for PR 23276 at commit 2f77b5d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

Let us first update the PR description to explain the problem we want to resolve. Regarding comments, our SQL parser follows PostgreSQL. You can read SqlBase.g4 to confirm it.

SIMPLE_COMMENT
    : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
    ;

BRACKETED_EMPTY_COMMENT
    : '/**/' -> channel(HIDDEN)
    ;

BRACKETED_COMMENT
    : '/*' ~[+] .*? '*/' -> channel(HIDDEN)
    ;

In PostgreSQL doc: https://www.postgresql.org/docs/9.4/sql-syntax-lexical.html

A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.:

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment
 * with nesting: /* nested block comment */
 */

where the comment begins with /* and extends to the matching occurrence of */. These block comments nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.

A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.

@gatorsmile (Member)

The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.

Thus, you need to consider both double quotes and single quotes, as well as the backtick, which Spark SQL uses for quoting identifiers.
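
A minimal sketch of the kind of quote-aware scanning this implies (not this PR's actual implementation; it assumes backslash escaping inside single/double-quoted literals and ignores comments):

    import scala.collection.mutable.ArrayBuffer

    // Sketch: a ';' only terminates a statement when we are outside '...', "..." and `...`.
    def splitStatements(sql: String): Seq[String] = {
      val parts = ArrayBuffer.empty[String]
      val current = new StringBuilder
      var quote: Option[Char] = None                // the currently open quote character, if any
      var i = 0
      while (i < sql.length) {
        val c = sql.charAt(i)
        quote match {
          case None =>
            if (c == ';') { parts += current.toString.trim; current.clear() }
            else {
              if (c == '\'' || c == '"' || c == '`') quote = Some(c)
              current += c
            }
          case Some(q) =>
            // assumption: backslash escapes the next char inside string literals
            // (backticked identifiers escape the backtick by doubling it instead)
            if ((q == '\'' || q == '"') && c == '\\' && i + 1 < sql.length) {
              current += c; current += sql.charAt(i + 1); i += 1
            } else {
              if (c == q) quote = None
              current += c
            }
        }
        i += 1
      }
      if (current.nonEmpty) parts += current.toString.trim
      parts.toSeq
    }

    // splitStatements("""select "^;^"; select 1""")   // Seq(select "^;^", select 1)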

@da-liii (Contributor, Author) commented Dec 12, 2018

OK, I will polish it later.

We are not handling BRACKETED_COMMENT yet; it seems to be a complicated task.

I have to admit that I am ignorant of ANTLR.

Test cases manually verified via spark.sql (see the sketch after these cases):

    val bracketedComment1 = "select 1 /*;*/" // Good
    val bracketedComment2 = "select 1 /* /* ; */" // Good
    val bracketedComment3 = "select 1 /* */ ; */" // Bad (semicolon not in comment)
    val bracketedComment4 = "select 1 /**/ ; */" // Good
    val bracketedComment5 = "select 1 /**/ ; /**/" // Good
    val bracketedComment6 = "select 1 /**/  ; /* */" // Good
    val bracketedComment7 = "select 1 /* */ ; /* */" // Bad (semicolon not in comment)

    val qQuote1 = "select 1 as `;`" // Good
    val qQuote2 = "select 1 as ```;`" // Good
    val qQuote3 = "select 1 as ``;`" // Bad
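
A hypothetical helper that matches these expectations (per the SqlBase.g4 rules quoted above, Spark's bracketed comments do not nest and the first */ closes the comment; quoting and the /*+ hint quirk are ignored here):

    // Hypothetical sketch: indexes of semicolons that are NOT inside a /* ... */ comment.
    def unquotedSemicolons(sql: String): Seq[Int] = {
      val found = scala.collection.mutable.ArrayBuffer.empty[Int]
      var inComment = false
      var i = 0
      while (i < sql.length) {
        if (!inComment && sql.startsWith("/*", i)) { inComment = true; i += 2 }
        else if (inComment && sql.startsWith("*/", i)) { inComment = false; i += 2 }
        else {
          if (!inComment && sql.charAt(i) == ';') found += i
          i += 1
        }
      }
      found.toSeq
    }

    // unquotedSemicolons("select 1 /*;*/")        // empty: the ';' is commented out
    // unquotedSemicolons("select 1 /* /* ; */")   // empty: comments do not nest
    // unquotedSemicolons("select 1 /* */ ; */")   // Seq(15): the ';' is outside the comment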

@da-liii (Contributor, Author) commented Dec 24, 2018

@gatorsmile Please re-review.

@SparkQA commented Dec 24, 2018

Test build #100421 has finished for PR 23276 at commit e091dae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 3, 2019

Test build #100689 has finished for PR 23276 at commit d4b5787.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 3, 2019

Test build #100688 has finished for PR 23276 at commit a269ae2.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 3, 2019

Rebased, still work in progress.

@da-liii (Contributor, Author) commented Jan 7, 2019

@gatorsmile Please re-review.

@SparkQA commented Jan 7, 2019

Test build #100891 has finished for PR 23276 at commit 912c77e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 7, 2019

Jenkins ???

@gatorsmile (Member)

retest this please

@gatorsmile (Member)

[error] /home/jenkins/workspace/SparkPullRequestBuilder@3/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriverSuite.scala:23: value argThat is not a member of object org.mockito.Matchers
[error] import org.mockito.Matchers.argThat
[error]        ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder@3/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriverSuite.scala:41: not found: value argThat
[error]     when(cli.processCmd(argThat(new SQLMatcher))).thenReturn(0)
[error]                         ^

@SparkQA commented Jan 8, 2019

Test build #100913 has finished for PR 23276 at commit 912c77e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii changed the title from "[SPARK-26321][SQL] Split a SQL in correct way" to "[SPARK-26321][SQL] Split a SQL properly" on Jan 22, 2019
@da-liii changed the title from "[SPARK-26321][SQL] Split a SQL properly" to "[SPARK-26321][SQL] Improve the behavior of sql text splitting for the spark-sql command line" on Jan 22, 2019
@da-liii (Contributor, Author) commented Jan 22, 2019

Rebased on the latest master with the newer Mockito.
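
For reference, a hypothetical sketch of what the change looks like in the test (the SQLMatcher body below is made up): with Mockito 2+, argThat takes an org.mockito.ArgumentMatcher and is imported from org.mockito.ArgumentMatchers rather than org.mockito.Matchers.

    import org.mockito.ArgumentMatcher
    import org.mockito.ArgumentMatchers.argThat
    import org.mockito.Mockito.when

    // Hypothetical matcher: with the newer Mockito, implement org.mockito.ArgumentMatcher
    // rather than the Hamcrest-style matcher expected by org.mockito.Matchers.argThat.
    class SQLMatcher extends ArgumentMatcher[String] {
      override def matches(sql: String): Boolean = sql.trim.nonEmpty
    }

    // when(cli.processCmd(argThat(new SQLMatcher))).thenReturn(0)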

@gatorsmile Please review.

@SparkQA commented Jan 22, 2019

Test build #101536 has finished for PR 23276 at commit a4462f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 26, 2019

@gatorsmile Conflicts resolved, please re-review.

@SparkQA commented Jan 26, 2019

Test build #101709 has finished for PR 23276 at commit c245ad4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 26, 2019

$ build/sbt -Phive-thriftserver
> project hive-thriftserver
> testOnly *SparkSQLCLIDriverSuite
> project catalyst
> testOnly *StringUtilsSuite

Manually tested, it works fine.

@HyukjinKwon (Member)

retest this please

@SparkQA commented Jan 27, 2019

Test build #101727 has finished for PR 23276 at commit c245ad4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

// method body imported from Hive and translated from Java to Scala
override def processLine(line: String, allowInterrupting: Boolean): Int = {
Contributor:

So the default processLine implementation doesn't handle ; well? Do you mean the Hive SQL shell has this bug as well?

Contributor (Author):

Yes, there is a buggy implementation in Hive.

Contributor:

I checked the Hive code and it seems the ; is handled well, at least in the master branch: https://github.com/apache/hive/blob/master/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L395

Do you mean only Hive 1.2 has this bug? Maybe we should upgrade Hive.

Contributor (Author):

Yes, and I had not checked the implementation on Hive master. We may judge which implementation is better.

Member:

console.printInfo("Press Ctrl+C again to kill JVM")

// First, kill any running Spark jobs
// TODO
@turboFei (Member) commented Mar 1, 2019:

I think HiveInterruptUtils.interrupt() should be added here, because SparkSQLCLIDriver has invoked installSignalHandler() to add a HiveInterruptCallback, which cancels all Spark jobs when HiveInterruptUtils.interrupt() is invoked. See:

installSignalHandler()

/**
 * Install an interrupt callback to cancel all Spark jobs. In Hive's CliDriver#processLine(),
 * a signal handler will invoke this registered callback if a Ctrl+C signal is detected while
 * a command is being processed by the current thread.
 */
def installSignalHandler() {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt() {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}
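
A sketch of the suggestion (hypothetical; it mirrors what Hive's CliDriver.processLine does), calling HiveInterruptUtils.interrupt() from the Ctrl+C handler so the callback registered above actually runs:

    import org.apache.hadoop.hive.common.HiveInterruptUtils
    import sun.misc.{Signal, SignalHandler}

    // Hypothetical handler body: on Ctrl+C, fire the callbacks registered via
    // HiveInterruptUtils.add(); installSignalHandler() registered one that cancels
    // all running Spark jobs (or closes the transport in remote mode).
    val interruptHandler = new SignalHandler {
      override def handle(signal: Signal): Unit = {
        HiveInterruptUtils.interrupt()
      }
    }
    Signal.handle(new Signal("INT"), interruptHandler)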

// Hook up the custom Ctrl+C handler while processing this line
interruptSignal = new Signal("INT")
oldSignal = Signal.handle(interruptSignal, new SignalHandler() {
private val cliThread = Thread.currentThread()
Member:

What is the meaning of cliThread? I don't find any usage.

@gatorsmile (Member)

ping @sadhen Any update?

@AmplabJenkins

Can one of the admins verify this patch?

@gatorsmile (Member)

If @sadhen is busy, @wangyum maybe you can take this over? This is very close to being finished.

@wangyum (Member) commented Jun 30, 2019

OK. Thank you @gatorsmile

@wangyum (Member) commented Jul 19, 2019

Created a new PR: #25018
