[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC #41518
Many databases support MERGE SQL, including Oracle.
tableDoesNotSupportError("upserts", table)
Method tableDoesNotSupportError requires a Table, where I have only a table name string.
I see.
So we can refactor
private def tableDoesNotSupportError(cmd: String, table: Table): Throwable
to
private[sql] def tableDoesNotSupportError(cmd: String, tableName: String): Throwable and update the callers.
Good idea, done in 085f9af.
This does not seem necessary.
fixed
upserts or upsert?
I'd go for upsert, as in upsert mode.
Can we avoid adding a parameter?
Please reuse JdbcOptionsInWrite.
This is needed because saveTable is called for all save modes, but we only want the upsert statement in Append mode for an existing table. All other code paths should use a plain insert.
We could reduce code complexity by removing this upsert argument and using options.isUpsert instead, but that would issue upsert statements in situations where no upserts are needed.
Yes, I know. We can avoid adding upsert here. If the options contain the upsert parameters in Append mode, we can use them directly; otherwise, please ignore them.
So we can simplify the implementation.
How do I know the current save mode in saveTable?
Get it from the options.
The save mode is not part of options, it is an argument to createRelation.
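The constraint discussed in this thread can be sketched in isolation. This is only an illustration, not the PR's code: the Mode hierarchy, WriteOptions, and useUpsert below are hypothetical stand-ins, and the sketch assumes the caller (where the save mode is known) computes the flag before invoking the save routine.

```scala
object SaveModeSketch {
  // Hypothetical stand-in for the save modes; not Spark's SaveMode enum.
  sealed trait Mode
  case object Append extends Mode
  case object Overwrite extends Mode
  case object ErrorIfExists extends Mode
  case object Ignore extends Mode

  // Hypothetical options holder; only the upsert flag matters here.
  final case class WriteOptions(isUpsert: Boolean)

  // Upsert statements are wanted only when the user asked for them,
  // the target table already exists, and the mode is Append.
  def useUpsert(mode: Mode, tableExists: Boolean, options: WriteOptions): Boolean =
    options.isUpsert && tableExists && mode == Append
}
```

With this shape, a freshly created table or an Overwrite write still gets plain inserts even when the upsert option is set, which is the behaviour the author argues for.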
Why is the parameter named upsert?
It seems we should add a new save mode. Append cannot describe the semantics of the upsert operation.
It is called upsert because it tells saveTable to use upsert statements rather than insert statements:
val insertStmt = if (upsert) {
  getUpsertStatement(table, rddSchema, tableSchema, isCaseSensitive, dialect, options)
} else {
  getInsertStatement(table, rddSchema, tableSchema, isCaseSensitive, dialect)
}
I am open to introducing the Upsert save mode, but would like to hear other committers' thoughts on this before I go and add it.
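To make the difference between the two statement kinds concrete, here is a minimal sketch of what such statements might look like. It is not the PR's getUpsertStatement: the object and function names are hypothetical, identifier quoting and case sensitivity are ignored, and only PostgreSQL's INSERT ... ON CONFLICT syntax is shown (MySQL and SQL Server use different syntax).

```scala
object UpsertSqlSketch {
  // Plain parameterized INSERT with one placeholder per column.
  def insertSql(table: String, columns: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val placeholders = columns.map(_ => "?").mkString(", ")
    s"INSERT INTO $table ($cols) VALUES ($placeholders)"
  }

  // PostgreSQL-style upsert: on a key conflict, overwrite the non-key columns
  // with the values of the row that was about to be inserted (EXCLUDED).
  def postgresUpsertSql(
      table: String,
      columns: Seq[String],
      keyColumns: Seq[String]): String = {
    require(keyColumns.nonEmpty, "upsert needs at least one key column")
    val keys = keyColumns.mkString(", ")
    val updates = columns
      .filterNot(c => keyColumns.contains(c))
      .map(c => s"$c = EXCLUDED.$c")
      .mkString(", ")
    // If every column is a key column there is nothing to update.
    val action = if (updates.isEmpty) "DO NOTHING" else s"DO UPDATE SET $updates"
    val insert = insertSql(table, columns)
    s"$insert ON CONFLICT ($keys) $action"
  }

  // Mirrors the branch quoted above: the flag picks the statement shape.
  def statementFor(
      upsert: Boolean,
      table: String,
      columns: Seq[String],
      keyColumns: Seq[String]): String =
    if (upsert) postgresUpsertSql(table, columns, keyColumns)
    else insertSql(table, columns)
}
```

For example, statementFor(upsert = true, "people", Seq("id", "name"), Seq("id")) yields an INSERT followed by ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name.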
For a better abstraction, I think we should introduce the Upsert mode as you said. Some data sources do not support the upsert statements that some JDBC databases offer, but we could still implement the Upsert mode by composing other operations, as is done for the Overwrite mode.
See #41611.
085f9af to 7416d0d
7416d0d to 5437219
5437219 to 7618004
@beliefer can you please approve the PR?
7618004 to 4646cd0
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Any update on this? Upsert is very much needed for the JDBC sink.
https://github.com/melin/datatunnel supports JDBC upsert, for example from PostgreSQL to Oracle.
Reopened as #49528.
What changes were proposed in this pull request?
This is a follow-up on #16685 and #16692.
Implements upsert mode for SaveMode.Append of the MySql, MsSql, and Postgres JDBC source.
See #41611 for an alternative using the MERGE INTO command (not supported by MySql).
Why are the changes needed?
The JDBC writer only supports either truncating the existing table or inserting. Duplicates, i.e. rows with identical values in the primary or unique index columns, cause an exception, so existing rows cannot be updated while new rows are inserted.
Re-evaluating a partition due to executor loss will re-insert rows that were already inserted in an earlier attempt, which kills the entire Spark job.
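The retry problem can be illustrated without a database. The sketch below uses a hypothetical in-memory KeyedTable as a stand-in for a table with a primary key; it is an assumption-laden toy, not Spark or JDBC code. Replaying a partition with plain inserts fails on the duplicate key, while replaying it with upserts leaves the table unchanged.

```scala
import scala.collection.mutable
import scala.util.Try

object RetrySketch {
  // Hypothetical stand-in for a table with an integer primary key.
  final class KeyedTable {
    private val rows = mutable.Map.empty[Int, String]

    // Plain insert: a duplicate key is an error, as with a primary key constraint.
    def insert(id: Int, value: String): Unit = {
      if (rows.contains(id)) {
        throw new IllegalStateException(s"duplicate key $id")
      }
      rows.update(id, value)
    }

    // Upsert: insert a new row or overwrite the existing one.
    def upsert(id: Int, value: String): Unit = {
      rows.update(id, value)
    }

    def snapshot: Map[Int, String] = rows.toMap
  }

  // Writes a partition with the given row writer and reports success or failure.
  def writePartition(
      partition: Seq[(Int, String)],
      write: (Int, String) => Unit): Try[Unit] =
    Try(partition.foreach { case (id, value) => write(id, value) })
}
```

Calling writePartition twice with the table's insert method fails on the second attempt, whereas calling it twice with upsert succeeds both times and produces the same snapshot.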
Does this PR introduce any user-facing change?
This adds upsert and upsertKeyColumns options for SaveMode.Append of the JDBC source.
How was this patch tested?
Tests in JdbcSuite and integration suites.
Reopened as #49528.