Skip to content

Conversation

@ivanjevtic-db
Copy link
Contributor

What changes were proposed in this pull request?

In this PR I propose that we add classification of JDBC driver exception that represent syntax errors on external databases.
Queries generated from Spark that are sent to connectors should never have syntax error, and always indicate to a bug in code.

Having this classification would greatly help in identifying bugs faster and improving quality.

Documentation about error codes for syntax errors:
https://learn.microsoft.com/en-us/sql/relational-databases/errors-events/database-engine-events-and-errors-0-to-999?view=sql-server-ver16
https://www.postgresql.org/docs/current/errcodes-appendix.html
https://mariadb.com/kb/en/mariadb-error-code-reference/
https://www.h2database.com/javadoc/org/h2/api/ErrorCode.html

Why are the changes needed?

  1. This gives customer more precise error message about issue being hit.
  2. Help identify bugs faster as syntax error thrown from jdbc connectors should never happen.

Does this PR introduce any user-facing change?

Yes, previously customers would get FAILED_JDBC. UNCLASSIFIED or internal error when syntax error occurs in JDBC, whereas they will now get FAILED_JDBC.SYNTAX_ERROR

How was this patch tested?

Integration tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 7, 2024
@jovanpavl-db
Copy link
Contributor

What about internal errors on custom connector side? Can we catch and classify those exceptions (based on the stack trace). For the example, we could have case where there is a miss match between database's schema and dataframe's schema, we wouldn't be able to detect this?

assert(types(1).equals("class java.lang.String"))
}

test("SPARK-49730: syntax error classification") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about tests for GET_SCHEMA and EXECUTE_QUERY?

},
"SYNTAX_ERROR" : {
"message" : [
"Query generated for external DBMS, during compile contains syntax error. Original query: <query>."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the entire error message with parent's error message:

Suggested change
"Query generated for external DBMS, during compile contains syntax error. Original query: <query>."
"Compilation of the query for an external DBMS: <query>."

},
"EXECUTE_QUERY" : {
"message" : [
"Failed to execute jdbc query: <query>."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Failed to execute jdbc query: <query>."
"Execution of the query: <query>."

},
"GET_SCHEMA" : {
"message" : [
"Fetching schema for external query or table has failed."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Fetching schema for external query or table has failed."
"Schema fetching for an external query or a table."

@ivanjevtic-db
Copy link
Contributor Author

ivanjevtic-db commented Oct 8, 2024

What about internal errors on custom connector side? Can we catch and classify those exceptions (based on the stack trace). For the example, we could have case where there is a miss match between database's schema and dataframe's schema, we wouldn't be able to detect this?

Does this pr solve it? @jovanpavl-db

@ivanjevtic-db
Copy link
Contributor Author

What should be EXECUTE_QUERY errors?

@ivanjevtic-db
Copy link
Contributor Author

I moved CREATE TABLE with table property test from V2JDBCTest to each dialect test suite. The tests reports syntax error for mssql: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near 'a'.

@ivanjevtic-db
Copy link
Contributor Author

Moved tests to V2 Suites.

@github-actions github-actions bot added the CORE label Oct 23, 2024
val parm = parameters.getOrElse(exp._1,
throw new IllegalArgumentException("Missing parameter" + exp._1))
if (!exp._2.matches(parm)) {
if (!exp._2.matches(parm) && exp._2 != parm) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I am using checkErrorMatchPVals function, either I should:

  • have this change so that we check regex or check if the exact value is the same or:
  • from a variable which is the exact match to expected, create something like this: "parameter" -> ("^" + Pattern.quote(val) + "$").

I am using checkErrorMatchPVals in the first place, since I have some parameters that can only be checked with regex, but for some parameters I have the exact value.

@ivanjevtic-db
Copy link
Contributor Author

@MaxGekk I am ready for a review.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanjevtic-db Could you fix the failed tests like:

[info] - Error conditions are correctly formatted *** FAILED *** (112 milliseconds)
[info]   "...  ]
[info]         },
[info]         "[EXECUTE_QUERY" : {
[info]           "message" : [
[info]             "Execution of the query: <query>."
[info]           ]
[info]         },

@ivanjevtic-db
Copy link
Contributor Author

jenkins trigger all

@MaxGekk
Copy link
Member

MaxGekk commented Oct 30, 2024

jenkins trigger all

@ivanjevtic-db Don't think it works here.

@ivanjevtic-db
Copy link
Contributor Author

@ivanjevtic-db Could you fix the failed tests like:

[info] - Error conditions are correctly formatted *** FAILED *** (112 milliseconds)
[info]   "...  ]
[info]         },
[info]         "[EXECUTE_QUERY" : {
[info]           "message" : [
[info]             "Execution of the query: <query>."
[info]           ]
[info]         },

Fixed

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 10, 2025
@github-actions github-actions bot closed this Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants