[SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions #9973
Why a var, the initialization, or following try block? seems like it's one line to init a val.
Hi Sean,
in case a (flaky) driver throws some exception when detecting transaction support, the code should fall back to current standard behaviour: use transactions.
Does this make sense?
Thanks,
Christian
That's not what this does; it's a no-op. It's only a try block.
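To illustrate the point about the bare try block, here is a minimal sketch. The names `bareTry`, `withFallback`, and `probe` are hypothetical and are not taken from the PR. In Scala, a `try` with neither `catch` nor `finally` compiles (typically with a warning) but handles nothing, so a fallback needs an explicit `catch`:

```scala
import scala.util.control.NonFatal

// `probe` is a hypothetical stand-in for a driver metadata call that may throw.

// A try with no catch and no finally handles nothing: exceptions propagate.
def bareTry(probe: => Boolean): Boolean =
  try { probe }

// With a catch, a failing probe falls back to the existing behaviour:
// assume transactions are supported.
def withFallback(probe: => Boolean): Boolean =
  try {
    probe
  } catch {
    case NonFatal(_) => true
  }
```

A caller might pass something like `conn.getMetaData.supportsTransactions()` as the probe. Which metadata call the PR actually uses is not shown in this thread, so treat that as an assumption.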
Test build #2112 has finished for PR 9973 at commit
(These should be removed)
Including suggested changes from Sean
Thanks for your help, Sean!
nit: you need a space before {
@CK50 can you update the pull request title to: [SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions
Logging a warning helps debugging, and these warnings only show up on executor logs anyway. I think we should still log that warning.
can you write it this way to match our style?
} catch {
  case NonFatal(e) =>
    logWarning("Exception while detecting transaction support", e)
    true
}
LGTM except for that NonFatal thing and the indentation.
Test build #2120 has finished for PR 9973 at commit
LGTM except you now may have a line that's too long
Test build #2130 has finished for PR 9973 at commit
@CK50 this is good to go but I just now realized you have opened the PR vs the 1.6 branch. It needs to be vs master always (except in special cases like a back-port that is quite different). Can you just close this one and reopen the same change vs master?
…ing database supports transactions
Fixes [SPARK-11989](https://issues.apache.org/jira/browse/SPARK-11989)
Author: CK50 <[email protected]>
Author: Christian Kurz <[email protected]>
Closes #9973 from CK50/branch-1.6_non-transactional.
@srowen @rxin I have created a forward port:
      }
-     conn.commit()
+     if (supportsTransactions) {
+       conn.commit()
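For context, the guarded commit in the diff above could sit in a per-partition write routine along these lines. This is only a sketch under assumptions: `savePartition` as written here and the `writeRows` callback are hypothetical simplifications, not the PR's actual code. Only `setAutoCommit`, `commit`, `rollback`, and `close` from `java.sql.Connection` are used.

```scala
import java.sql.Connection

// Hypothetical sketch: `writeRows` stands in for the batched INSERT logic.
def savePartition(conn: Connection, supportsTransactions: Boolean)(
    writeRows: Connection => Unit): Unit = {
  var committed = false
  try {
    if (supportsTransactions) {
      conn.setAutoCommit(false) // keep the whole partition in one transaction
    }
    writeRows(conn)
    if (supportsTransactions) {
      conn.commit() // only commit when the database can do so
    }
    committed = true
  } finally {
    try {
      // Roll back a failed partition, but only if transactions are in use.
      if (!committed && supportsTransactions) conn.rollback()
    } finally {
      conn.close()
    }
  }
}
```

Note that the transaction boundary here is one connection, and therefore one partition, which is what the following comments discuss.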
This is still wrong... The commit is partition level.
When the DataFrame has duplicate keys and the target table has unique constraints, we could see non-deterministic results in the target table, since some partitions containing duplicate keys could be rolled back.
You're right, but lots of output semantics are per partition. I don't think we can do the update in one transaction no matter what. This improves the behavior in many cases, so it is worthwhile, but it doesn't make this truly bulletproof or idempotent. The operation can still fail after the commit of any partition and then fail on re-execution.
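The per-partition behavior described above can be modelled with a toy simulation. Everything in it is hypothetical: `UniqueTable` is an in-memory stand-in for a table with a unique key, not a JDBC or Spark API, and each partition is treated as a single all-or-nothing transaction.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a target table with a unique key.
final class UniqueTable {
  private val committed = mutable.Set[Int]()

  def keys: Set[Int] = committed.toSet

  // One partition = one transaction: either all of its rows commit or none do.
  def writePartition(rows: Seq[Int]): Boolean = {
    val staged = mutable.Set[Int]()
    val ok = rows.forall(k => !committed.contains(k) && staged.add(k))
    if (ok) committed ++= staged // commit
    ok                           // false: this partition was rolled back
  }
}

// Writes the partitions in the given order and returns the keys left behind.
def writeAll(partitions: Seq[Seq[Int]]): Set[Int] = {
  val table = new UniqueTable
  partitions.foreach(p => table.writePartition(p))
  table.keys
}
```

With partitions `(1, 2)`, `(2, 3)`, `(4)` written in that order, the table ends up with keys 1, 2, 4. Swapping the first two partitions leaves 2, 3, 4 instead, so the final contents depend on which partition commits first.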
The basic problem is getting multiple connections to work on the same transaction. It is doable but might not be applicable to a general JDBC data source connector. Let us keep it as an open problem. If necessary, we can explore how to do it in JDBC.
Oh, can you really have a transaction across connections somehow? I didn't think that was possible in general. I agree that this is really the ideal behavior but don't know how to implement it in JDBC.
Checked it with @huaxingao, who worked on a JDBC driver team before. Yeah, we are unable to do it using JDBC. In my previous team, we did it using the native connection methods instead of JDBC. It sounds like the connectors for each major RDBMS should not use JDBC because of this requirement. Thank you!
Fixes SPARK-11989