[SPARK-24345][SQL]Improve ParseError stop location when offending symbol is a token #21334
…token In the case where the offending symbol is a CommonToken, this PR increases the accuracy of the start and stop origin by leveraging the start and stop index information from CommonToken.
Fix character to be relative to the current line
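To illustrate why the position has to be relative to the current line: ANTLR token start and stop indexes are absolute offsets into the whole input, while error positions are reported as a line plus a column within that line. The sketch below is not the PR's code; `LineRelative` and `lineAndColumn` are hypothetical names, and it only shows how an absolute offset maps to a (line, column) pair in a multi-line query.

```scala
object LineRelative {
  // Hypothetical helper: converts an absolute character offset into
  // (line, column), with a 1-based line and a 0-based column,
  // which is the convention ANTLR uses for getLine and getCharPositionInLine.
  def lineAndColumn(input: String, absoluteIndex: Int): (Int, Int) = {
    val before = input.substring(0, absoluteIndex)
    val line = before.count(_ == '\n') + 1
    // lastIndexOf returns -1 when there is no newline, so the column
    // falls back to the absolute index on the first line.
    val column = absoluteIndex - (before.lastIndexOf('\n') + 1)
    (line, column)
  }
}
```

For the two-line input `"SELECT *\nFORM t"`, the token `FORM` starts at absolute offset 9 but at column 0 of line 2, so using the absolute offset as a column would point well past the end of the line.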
Hi @rubenfiszel thanks for the contribution! Can you please take a glance through http://spark.apache.org/contributing.html to see the best way to get your change merged into Apache Spark? I'd suggest you:
Cheers!
@ash211 As requested, I implemented tests and created the associated ticket.
srowen left a comment
Seems reasonable to me
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
Test build #4669 has finished for PR 21334 at commit
In the PR description, could you add a simple example of a parser error that this PR makes more accurate?
  class ErrorParserSuite extends SparkFunSuite {
-   def intercept(sql: String, line: Int, startPosition: Int, messages: String*): Unit = {
+   def intercept(sql: String, line: Int, startPosition: Int, stopPosition: Int,
+       messages: String*): Unit = {
nit:
def intercept(
    sql: String,
    line: Int,
    startPosition: Int,
    stopPosition: Int,
    messages: String*): Unit = {
-   throw new ParseException(None, msg, position, position)
+   val (start, stop) = offendingSymbol match {
+     case token: CommonToken =>
+       val start = Origin(Some(line), Some(token.getCharPositionInLine))
It seems the computation of start can be moved outside? Only the computation of stop differs between CommonToken and other tokens?
Also, just for my understanding, can you please briefly explain the difference between a CommonToken and the other ones?
It's not exactly the same code, but does it have the same result? Looking OK to me but @rubenfiszel could you comment?
From a pure code point of view, it's not equivalent, since it uses token.getCharPositionInLine instead of the method argument.
It might be equivalent, but that would require an invariant to hold (the method argument charPositionInLine == token.getCharPositionInLine), which seems unnecessary since the intent of this specific case is to use the information from the CommonToken directly.
The difference between CommonToken and other types of offending symbols is that for a CommonToken it is clear where the token stops.
We use this internally on our fork of Spark to get nice language-server-protocol errors that are correctly delimited.
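A minimal sketch of the distinction described above, under stated assumptions: `Origin`, `Offending`, `TokenLike`, `OtherSymbol` and `SpanSketch` are stand-ins defined here for illustration, not Spark's or ANTLR's real classes. `TokenLike` mirrors the `CommonToken` getters `getCharPositionInLine`, `getStartIndex` and `getStopIndex`. Placing the stop column one past the token's last character is also an assumption of this sketch; the diff excerpt above does not show how the PR computes stop.

```scala
// Stand-in for Spark's Origin; the real class may carry more fields.
final case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

// Stand-ins for the offending symbol passed to an ANTLR error listener.
sealed trait Offending
final case class TokenLike(charPositionInLine: Int, startIndex: Int, stopIndex: Int)
  extends Offending
case object OtherSymbol extends Offending

object SpanSketch {
  // Returns (start, stop) for an error at the given line.
  def span(offending: Offending, line: Int, charPositionInLine: Int): (Origin, Origin) =
    offending match {
      case t: TokenLike =>
        // ANTLR stop indexes are inclusive, hence the + 1.
        val length = t.stopIndex - t.startIndex + 1
        val start = Origin(Some(line), Some(t.charPositionInLine))
        val stop = Origin(Some(line), Some(t.charPositionInLine + length))
        (start, stop)
      case _ =>
        // Without token bounds, the best available span is a single point.
        val start = Origin(Some(line), Some(charPositionInLine))
        (start, start)
    }
}
```

For `SELECT * FORM t`, the token `FORM` occupies offsets 9 to 12, so `SpanSketch.span(TokenLike(9, 9, 12), 1, 9)` yields a span from column 9 to column 13, whereas a non-token symbol collapses to a single position.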
@rberenguel thanks for your explanation.
Test build #4684 has finished for PR 21334 at commit
Merged to master