
Conversation

@da-liii (Contributor) commented Dec 10, 2018

What changes were proposed in this pull request?

Improve the behavior of SQL text splitting for the spark-sql command line.

Currently, the spark-sql command line does not split the SQL text correctly; for example, a ; inside double quotes is wrongly treated as a statement terminator.
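
As a minimal illustration (not the CLI's actual splitting code), a naive split on ; breaks any statement whose string literal contains one:

    // Illustration only (not the CLI's actual code): a naive split treats the ';'
    // inside the string literal as a statement terminator.
    val line = "select \"^;^\""
    val pieces = line.split(";")   // Array(select "^, ^"); the fragment select "^ is not valid SQL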

How was this patch tested?

Manually tested:

$ ./build/mvn -Phive-thriftserver -DskipTests package
$ bin/spark-sql
> select "^;^";

The expected result is ^;^. However, Spark master splits the SQL at the ;, and as a result select "^ is not valid SQL.

spark-sql> select "\";";
";
spark-sql> select "\';";
';

Unit Tests:

$ build/sbt -Phive-thriftserver
> project hive-thriftserver
> testOnly *SparkSQLCLIDriverSuite
> project catalyst
> testOnly *StringUtilsSuite

TODO

Polish the code around SignalHandler; however, that should be done in another PR.

  • Fix the failing tests
  • Polish StringUtils.split and add more unit tests
  • Add Scaladoc
  • Handle comments

@da-liii changed the title from "[SPARK-26321][SQL] Split a SQL in a correct way" to "[SPARK-26321][SQL] Split a SQL in correct way" on Dec 10, 2018
@SparkQA commented Dec 10, 2018

Test build #99921 has finished for PR 23276 at commit c5792e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 10, 2018

Test build #99923 has finished for PR 23276 at commit 932565f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

cc @gatorsmile

@SparkQA commented Dec 11, 2018

Test build #99965 has finished for PR 23276 at commit bd8c09a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Dec 11, 2018

[info] HiveMetastoreLazyInitializationSuite:
[info] - lazily initialize Hive client (17 seconds, 703 milliseconds)

The failed unit test works fine on my laptop.

@srowen (Member) commented Dec 11, 2018

Maybe this is obvious to you all, but what is the output, and what was expected here?

@da-liii (Contributor, Author) commented Dec 11, 2018

OK, I will add more description.

@srowen This is actually a trivial PR! Please review.

@SparkQA commented Dec 12, 2018

Test build #99999 has finished for PR 23276 at commit 2f77b5d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

Let us first update the PR description to explain the problem we want to resolve. Regarding comments, our SQL parser follows PostgreSQL. You can read SqlBase.g4 to confirm it.

SIMPLE_COMMENT
    : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
    ;

BRACKETED_EMPTY_COMMENT
    : '/**/' -> channel(HIDDEN)
    ;

BRACKETED_COMMENT
    : '/*' ~[+] .*? '*/' -> channel(HIDDEN)
    ;

In PostgreSQL doc: https://www.postgresql.org/docs/9.4/sql-syntax-lexical.html

A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.:

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment
 * with nesting: /* nested block comment */
 */

where the comment begins with /* and extends to the matching occurrence of */. These block comments nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.

A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.

@gatorsmile (Member)

The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.

Thus, you need to consider both double quotes and single quotes, as well as the backtick, which Spark SQL uses for quoting identifiers.
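
A minimal sketch of the kind of quote-aware scanning this implies (not this PR's actual implementation; it assumes backslash escaping inside single/double-quoted literals and ignores comments):

    import scala.collection.mutable.ArrayBuffer

    // Sketch: a ';' only terminates a statement when we are outside '...', "..." and `...`.
    def splitStatements(sql: String): Seq[String] = {
      val parts = ArrayBuffer.empty[String]
      val current = new StringBuilder
      var quote: Option[Char] = None                // the currently open quote character, if any
      var i = 0
      while (i < sql.length) {
        val c = sql.charAt(i)
        quote match {
          case None =>
            if (c == ';') { parts += current.toString.trim; current.clear() }
            else {
              if (c == '\'' || c == '"' || c == '`') quote = Some(c)
              current += c
            }
          case Some(q) =>
            // assumption: backslash escapes the next char inside string literals
            // (backticked identifiers escape the backtick by doubling it instead)
            if ((q == '\'' || q == '"') && c == '\\' && i + 1 < sql.length) {
              current += c; current += sql.charAt(i + 1); i += 1
            } else {
              if (c == q) quote = None
              current += c
            }
        }
        i += 1
      }
      if (current.nonEmpty) parts += current.toString.trim
      parts.toSeq
    }

    // splitStatements("""select "^;^"; select 1""")   // Seq(select "^;^", select 1)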

@da-liii (Contributor, Author) commented Dec 12, 2018

OK, I will polish it later.

We are not handling BRACKETED_COMMENT yet; it seems to be a complicated task.

I have to admit that I am ignorant of ANTLR.

Test cases manually verified via spark.sql (see the sketch after these cases):

    val bracketedComment1 = "select 1 /*;*/" // Good
    val bracketedComment2 = "select 1 /* /* ; */" // Good
    val bracketedComment3 = "select 1 /* */ ; */" // Bad (semicolon not in comment)
    val bracketedComment4 = "select 1 /**/ ; */" // Good
    val bracketedComment5 = "select 1 /**/ ; /**/" // Good
    val bracketedComment6 = "select 1 /**/  ; /* */" // Good
    val bracketedComment7 = "select 1 /* */ ; /* */" // Bad (semicolon not in comment)

    val qQuote1 = "select 1 as `;`" // Good
    val qQuote2 = "select 1 as ```;`" // Good
    val qQuote3 = "select 1 as ``;`" // Bad
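
A hypothetical helper that matches these expectations (per the SqlBase.g4 rules quoted above, Spark's bracketed comments do not nest and the first */ closes the comment; quoting and the /*+ hint quirk are ignored here):

    // Hypothetical sketch: indexes of semicolons that are NOT inside a /* ... */ comment.
    def unquotedSemicolons(sql: String): Seq[Int] = {
      val found = scala.collection.mutable.ArrayBuffer.empty[Int]
      var inComment = false
      var i = 0
      while (i < sql.length) {
        if (!inComment && sql.startsWith("/*", i)) { inComment = true; i += 2 }
        else if (inComment && sql.startsWith("*/", i)) { inComment = false; i += 2 }
        else {
          if (!inComment && sql.charAt(i) == ';') found += i
          i += 1
        }
      }
      found.toSeq
    }

    // unquotedSemicolons("select 1 /*;*/")        // empty: the ';' is commented out
    // unquotedSemicolons("select 1 /* /* ; */")   // empty: comments do not nest
    // unquotedSemicolons("select 1 /* */ ; */")   // Seq(15): the ';' is outside the comment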

@da-liii (Contributor, Author) commented Dec 24, 2018

@gatorsmile Please re-review.

@SparkQA commented Dec 24, 2018

Test build #100421 has finished for PR 23276 at commit e091dae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 3, 2019

Test build #100689 has finished for PR 23276 at commit d4b5787.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 3, 2019

Test build #100688 has finished for PR 23276 at commit a269ae2.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 3, 2019

Rebased, still work in progress.

@da-liii (Contributor, Author) commented Jan 7, 2019

@gatorsmile Please re-review.

@SparkQA commented Jan 7, 2019

Test build #100891 has finished for PR 23276 at commit 912c77e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 7, 2019

Jenkins ???

@gatorsmile (Member)

retest this please

@gatorsmile (Member)

[error] /home/jenkins/workspace/SparkPullRequestBuilder@3/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriverSuite.scala:23: value argThat is not a member of object org.mockito.Matchers
[error] import org.mockito.Matchers.argThat
[error]        ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder@3/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriverSuite.scala:41: not found: value argThat
[error]     when(cli.processCmd(argThat(new SQLMatcher))).thenReturn(0)
[error]                         ^

@SparkQA commented Jan 8, 2019

Test build #100913 has finished for PR 23276 at commit 912c77e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii changed the title from "[SPARK-26321][SQL] Split a SQL in correct way" to "[SPARK-26321][SQL] Split a SQL properly" on Jan 22, 2019
@da-liii changed the title from "[SPARK-26321][SQL] Split a SQL properly" to "[SPARK-26321][SQL] Improve the behavior of sql text splitting for the spark-sql command line" on Jan 22, 2019
@da-liii (Contributor, Author) commented Jan 22, 2019

Rebased on the latest master with the newer Mockito.
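
For reference, a hypothetical sketch of what the change looks like in the test (the SQLMatcher body below is made up): with Mockito 2+, argThat takes an org.mockito.ArgumentMatcher and is imported from org.mockito.ArgumentMatchers rather than org.mockito.Matchers.

    import org.mockito.ArgumentMatcher
    import org.mockito.ArgumentMatchers.argThat
    import org.mockito.Mockito.when

    // Hypothetical matcher: with the newer Mockito, implement org.mockito.ArgumentMatcher
    // rather than the Hamcrest-style matcher expected by org.mockito.Matchers.argThat.
    class SQLMatcher extends ArgumentMatcher[String] {
      override def matches(sql: String): Boolean = sql.trim.nonEmpty
    }

    // when(cli.processCmd(argThat(new SQLMatcher))).thenReturn(0)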

@gatorsmile Please review.

@SparkQA commented Jan 22, 2019

Test build #101536 has finished for PR 23276 at commit a4462f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 26, 2019

@gatorsmile Conflicts resolved, please re-review.

@SparkQA commented Jan 26, 2019

Test build #101709 has finished for PR 23276 at commit c245ad4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-liii (Contributor, Author) commented Jan 26, 2019

$ build/sbt -Phive-thriftserver
> project hive-thriftserver
> testOnly *SparkSQLCLIDriverSuite
> project catalyst
> testOnly *StringUtilsSuite

Manually tested, it works fine.

@HyukjinKwon (Member)

retest this please

@SparkQA commented Jan 27, 2019

Test build #101727 has finished for PR 23276 at commit c245ad4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

// method body imported from Hive and translated from Java to Scala
override def processLine(line: String, allowInterrupting: Boolean): Int = {
Contributor:

So the default processLine implementation doesn't handle ; well? Do you mean the Hive SQL shell has this bug as well?

Contributor (Author):

Yes, there is a buggy implementation in Hive.

Contributor:

I checked the Hive code and it seems the ; is handled well, at least in the master branch: https://github.com/apache/hive/blob/master/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L395

Do you mean only Hive 1.2 has this bug? Maybe we should upgrade Hive.

Contributor (Author):

Yes, and I had not checked the implementation on Hive master. We may judge which implementation is better.

Member:

console.printInfo("Press Ctrl+C again to kill JVM")

// First, kill any running Spark jobs
// TODO
@turboFei (Member) commented Mar 1, 2019:

I think HiveInterruptUtils.interrupt() should be added here, because SparkSQLCLIDriver has invoked installSignalHandler() to add a HiveInterruptCallback, which cancels all Spark jobs when HiveInterruptUtils.interrupt() is invoked. See:

installSignalHandler()

/**
 * Install an interrupt callback to cancel all Spark jobs. In Hive's CliDriver#processLine(),
 * a signal handler will invoke this registered callback if a Ctrl+C signal is detected while
 * a command is being processed by the current thread.
 */
def installSignalHandler() {
  HiveInterruptUtils.add(new HiveInterruptCallback {
    override def interrupt() {
      // Handle remote execution mode
      if (SparkSQLEnv.sparkContext != null) {
        SparkSQLEnv.sparkContext.cancelAllJobs()
      } else {
        if (transport != null) {
          // Force closing of TCP connection upon session termination
          transport.getSocket.close()
        }
      }
    }
  })
}
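
A sketch of the suggestion (hypothetical; it mirrors what Hive's CliDriver.processLine does), calling HiveInterruptUtils.interrupt() from the Ctrl+C handler so the callback registered above actually runs:

    import org.apache.hadoop.hive.common.HiveInterruptUtils
    import sun.misc.{Signal, SignalHandler}

    // Hypothetical handler body: on Ctrl+C, fire the callbacks registered via
    // HiveInterruptUtils.add(); installSignalHandler() registered one that cancels
    // all running Spark jobs (or closes the transport in remote mode).
    val interruptHandler = new SignalHandler {
      override def handle(signal: Signal): Unit = {
        HiveInterruptUtils.interrupt()
      }
    }
    Signal.handle(new Signal("INT"), interruptHandler)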

// Hook up the custom Ctrl+C handler while processing this line
interruptSignal = new Signal("INT")
oldSignal = Signal.handle(interruptSignal, new SignalHandler() {
private val cliThread = Thread.currentThread()
Member:

What is the meaning of cliThread? I don't find any usage.

@gatorsmile (Member)

ping @sadhen Any update?

@AmplabJenkins

Can one of the admins verify this patch?

@gatorsmile (Member)

If @sadhen is busy, @wangyum maybe you can take this over? This is very close to being finished.

@wangyum (Member) commented Jun 30, 2019

OK. Thank you @gatorsmile

@wangyum (Member) commented Jul 19, 2019

Created a new PR: #25018
