Conversation

@CK50 (Contributor) commented Nov 25, 2015

Fixes SPARK-11989

srowen (Member) commented on the diff:

Why the var, the initialization, or the following try block? It seems like it's one line to init a val.

CK50 (Contributor, author) replied:

Hi Sean,
in case a (flaky) driver throws some exception when detecting transaction support, the code should fall back to current standard behaviour: use transactions.
Does this make sense?
Thanks,
Christian
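
The fallback Christian describes might look like the following sketch. This is not the PR's exact code: `conn` is assumed to be a `java.sql.Connection`, and the two probes are the standard JDBC `DatabaseMetaData` methods for transaction support.

```scala
import java.sql.Connection
import scala.util.control.NonFatal

// Sketch: probe the driver for transaction support; if a (flaky) driver
// throws during detection, fall back to the pre-patch behaviour of
// assuming transactions are supported.
def detectTransactionSupport(conn: Connection): Boolean = {
  try {
    val metadata = conn.getMetaData
    metadata.supportsDataManipulationTransactionsOnly() ||
      metadata.supportsDataDefinitionAndDataManipulationTransactions()
  } catch {
    case NonFatal(_) => true // on any detection failure, keep using transactions
  }
}
```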

srowen (Member) replied:

That's not what this does; it's a no-op. It's only a try block.

@SparkQA commented Nov 25, 2015

Test build #2112 has finished for PR 9973 at commit 0b9ea1a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

A Member commented on the diff:

(These should be removed)

Including suggested changes from Sean
Thanks for your help, Sean!

A Contributor commented on the diff:

nit: you need a space before {

@rxin (Contributor) commented Nov 25, 2015

@CK50 can you update the pull request title to:

[SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions

@CK50 changed the title from "Fixes SPARK-11989 Spark JDBC write only works on techologies with transaction support" to "[SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions" on Nov 25, 2015
@rxin (Contributor) commented Nov 25, 2015

Logging a warning helps debugging, and these warnings only show up on executor logs anyway. I think we should still log that warning.

A Contributor commented on the diff:

can you write it this way to match our style?

} catch {
  case NonFatal(e) =>
    logWarning("Exception while detecting transaction support", e)
    true
}

@rxin (Contributor) commented Nov 25, 2015

LGTM except that nonfatal thing and indentation.

@SparkQA commented Nov 27, 2015

Test build #2120 has finished for PR 9973 at commit 6a1220b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Nov 27, 2015

LGTM except you now may have a line that's too long

@SparkQA commented Nov 30, 2015

Test build #2130 has finished for PR 9973 at commit 00ed367.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Nov 30, 2015

@CK50 this is good to go, but I just now realized you have opened the PR against the 1.6 branch. It needs to be against master always (except in special cases like a back-port that is quite different). Can you close this one and reopen the same change against master?

asfgit pushed a commit that referenced this pull request Nov 30, 2015
…ing database supports transactions

Fixes [SPARK-11989](https://issues.apache.org/jira/browse/SPARK-11989)

Author: CK50 <[email protected]>
Author: Christian Kurz <[email protected]>

Closes #9973 from CK50/branch-1.6_non-transactional.
@rxin (Contributor) commented Nov 30, 2015

@srowen it's ok for this one - i've merged it.

@CK50 for future ones, please target master, unless it is for a specific backport.

@asfgit asfgit closed this in 2db4662 Nov 30, 2015
@CK50 (Contributor, author) commented Nov 30, 2015

@srowen @rxin
Thanks for merging!

I have created a forward port:
[SPARK-11989][SQL] Only use commit in JDBC data source if the underlying database supports transactions #10041

  }
- conn.commit()
+ if (supportsTransactions) {
+   conn.commit()
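
For context, the per-partition write path this diff touches could be sketched as follows. The helper names (`getConnection`, `insertRows`) are hypothetical stand-ins; the real logic lives in Spark's JDBC writer.

```scala
import java.sql.Connection

// Sketch of the per-partition write: disable auto-commit and issue an
// explicit commit only when the database reports transaction support.
// `insertRows` stands in for the batched INSERT loop over the partition.
def writePartition(getConnection: () => Connection,
                   insertRows: Connection => Unit,
                   supportsTransactions: Boolean): Unit = {
  val conn = getConnection()
  try {
    if (supportsTransactions) {
      conn.setAutoCommit(false) // one transaction per partition
    }
    insertRows(conn)
    if (supportsTransactions) {
      conn.commit() // the commit covers only this partition's rows
    }
  } finally {
    conn.close()
  }
}
```

Note the commit's scope: each Spark task commits (or rolls back) only its own partition, which is what the review comments below are pointing out.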
A Member commented on the diff:

This is still wrong... The commit is at the partition level.

A Member commented on the diff:

When the DataFrame has duplicate keys and the target table has unique constraints, we can see non-deterministic results in the target table, since some partitions containing duplicate keys may be rolled back.
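
A concrete (hypothetical) scenario illustrating the race: suppose key 42 appears in two partitions and the target table has a unique constraint on it.

```scala
// Hypothetical illustration, not Spark code: two partitions each try to
// insert key 42 into a table with UNIQUE(key). Whichever task commits
// first wins; the other partition hits the constraint and rolls back.
val partitionA = Seq((42, "a"), (1, "x"))
val partitionB = Seq((42, "b"), (2, "y"))
// Run 1: A commits first -> table keeps (42, "a") and (1, "x"); B rolls back.
// Run 2: B commits first -> table keeps (42, "b") and (2, "y"); A rolls back.
// The surviving rows depend on task scheduling, hence the output is
// non-deterministic across runs.
```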

@srowen (Member) commented Oct 12, 2016

You're right, but lots of output semantics are per-partition. I don't think we can do the update in one transaction no matter what. This improves the behavior in many cases, so it is worthwhile, but it doesn't make the write truly bulletproof or idempotent: the operation can still fail after the commit of any partition, and then fail again on re-execution.

@gatorsmile (Member) commented:

The basic problem is having multiple connections work on the same transaction. It is doable, but might not be applicable to a general JDBC data source connector. Let us keep it as an open problem; if necessary, we can explore how to do it in JDBC.

@srowen (Member) commented Oct 13, 2016

Oh, can you really have a transaction across connections somehow? I didn't think that was possible in general. I agree that this is really the ideal behavior but don't know how to implement it in JDBC.

@gatorsmile (Member) commented:

Checked with @huaxingao, who worked on a JDBC driver team before. Yeah, we are unable to do it using JDBC. In my previous team, we did it using native connection methods instead of JDBC. It sounds like major RDBMS connectors should not use JDBC because of this requirement. Thank you!
