[SPARK-49730][SQL] classify syntax errors for pgsql, mysql, sqlserver and h2 #48368
…o syntaxErrorsJdbc
…o syntaxErrorsJdbc
What about internal errors on the custom connector side? Can we catch and classify those exceptions (based on the stack trace)? For example, if there is a mismatch between the database's schema and the DataFrame's schema, would we be able to detect it?
| assert(types(1).equals("class java.lang.String")) | ||
| } | ||
|
|
||
| test("SPARK-49730: syntax error classification") { |
What about tests for GET_SCHEMA and EXECUTE_QUERY?
| }, | ||
| "SYNTAX_ERROR" : { | ||
| "message" : [ | ||
| "Query generated for external DBMS, during compile contains syntax error. Original query: <query>." |
See the entire error message with parent's error message:
| "Query generated for external DBMS, during compile contains syntax error. Original query: <query>." | |
| "Compilation of the query for an external DBMS: <query>." |
| }, | ||
| "EXECUTE_QUERY" : { | ||
| "message" : [ | ||
| "Failed to execute jdbc query: <query>." |
| "Failed to execute jdbc query: <query>." | |
| "Execution of the query: <query>." |
| }, | ||
| "GET_SCHEMA" : { | ||
| "message" : [ | ||
| "Fetching schema for external query or table has failed." |
| "Fetching schema for external query or table has failed." | |
| "Schema fetching for an external query or a table." |
Does this pr solve it? @jovanpavl-db |
|
Which errors should be classified as EXECUTE_QUERY? |
|
I moved |
|
Moved tests to V2 Suites. |
| val parm = parameters.getOrElse(exp._1, | ||
| throw new IllegalArgumentException("Missing parameter" + exp._1)) | ||
| if (!exp._2.matches(parm)) { | ||
| if (!exp._2.matches(parm) && exp._2 != parm) { |
Since I am using the checkErrorMatchPVals function, I should either:
- keep this change, so that a parameter passes if it matches the regex or is exactly equal to the expected value, or
- wrap each exact expected value in an anchored literal pattern, like this: "parameter" -> ("^" + Pattern.quote(val) + "$").
I am using checkErrorMatchPVals in the first place because some parameters can only be checked with a regex, while for others I have the exact value.
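The two options can be compared in a minimal sketch. This is not the Spark test helper itself: the object and method names below are hypothetical, and the sketch assumes the expected value is treated as a regex applied to the actual parameter.

```scala
import java.util.regex.Pattern

// Hypothetical helpers for illustration only; not part of Spark's test utilities.
object ParamMatchSketch {
  // Option 1: accept the parameter if it equals the expected string exactly,
  // or if the expected string, read as a regex, matches the whole parameter.
  // Equality is checked first so that an exact value containing regex
  // metacharacters (for example an unbalanced parenthesis) is never compiled.
  def matchesOrEquals(expected: String, actual: String): Boolean =
    expected == actual || actual.matches(expected)

  // Option 2: turn an exact expected value into an anchored literal pattern,
  // so a regex-only checker can still be used for exact comparisons.
  def exact(value: String): String = "^" + Pattern.quote(value) + "$"
}
```

With option 2 the checker stays regex-only, at the cost of wrapping every exact value at the call site; option 1 keeps call sites short but makes the checker accept two kinds of expected values.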
|
@MaxGekk I am ready for a review. |
MaxGekk
left a comment
@ivanjevtic-db Could you fix the failed tests like:
[info] - Error conditions are correctly formatted *** FAILED *** (112 milliseconds)
[info] "... ]
[info] },
[info] "[EXECUTE_QUERY" : {
[info] "message" : [
[info] "Execution of the query: <query>."
[info] ]
[info] },
|
jenkins trigger all |
@ivanjevtic-db Don't think it works here. |
Fixed |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
In this PR I propose adding classification of JDBC driver exceptions that represent syntax errors on external databases.
Queries that Spark generates and sends to connectors should never contain syntax errors; such an error always indicates a bug in the code.
This classification would help identify those bugs faster and improve quality.
Documentation about error codes for syntax errors:
https://learn.microsoft.com/en-us/sql/relational-databases/errors-events/database-engine-events-and-errors-0-to-999?view=sql-server-ver16
https://www.postgresql.org/docs/current/errcodes-appendix.html
https://mariadb.com/kb/en/mariadb-error-code-reference/
https://www.h2database.com/javadoc/org/h2/api/ErrorCode.html
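As a rough illustration of what such a classification could look like, the sketch below checks one well-known syntax-error code per database, taken from the vendor documentation linked above. The object name, method name, and dialect strings are hypothetical, and the set of codes handled in the actual PR may differ.

```scala
import java.sql.SQLException

// Hypothetical sketch; the real dialect classes in the PR may use other codes.
object SyntaxErrorSketch {
  def isSyntaxError(dialect: String, e: SQLException): Boolean = dialect match {
    // PostgreSQL reports SQLSTATE 42601 (syntax_error).
    case "postgresql" => e.getSQLState == "42601"
    // MySQL/MariaDB vendor code 1064 is ER_PARSE_ERROR.
    case "mysql" => e.getErrorCode == 1064
    // SQL Server error 102 is "Incorrect syntax near ...".
    case "sqlserver" => e.getErrorCode == 102
    // H2 ErrorCode.SYNTAX_ERROR_1 (42000) and SYNTAX_ERROR_2 (42001).
    case "h2" => Set(42000, 42001).contains(e.getErrorCode)
    // Unknown dialects are left unclassified.
    case _ => false
  }
}
```

A caller would map a positive result to the SYNTAX_ERROR sub-condition and fall back to the unclassified one otherwise.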
Why are the changes needed?
Does this PR introduce any user-facing change?
Yes. Previously, customers would get FAILED_JDBC.UNCLASSIFIED or an internal error when a syntax error occurred in JDBC, whereas they will now get FAILED_JDBC.SYNTAX_ERROR.
How was this patch tested?
Integration tests
Was this patch authored or co-authored using generative AI tooling?
No