[SPARK-26851][SQL] Fix double-checked locking in CachedRDDBuilder #23768
Test build #102261 has finished for PR 23768 at commit
Ah, good catch! The reference for that is http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html, too. LGTM cc: @cloud-fan @srowen
  @transient cachedPlan: SparkPlan,
  tableName: Option[String])(
- @transient private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
+ @transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
Can this just be a lazy val? I don't see any caller that specifies this argument.
IIUC we use `var` here to support cache buffer clearance? `def clearCache(blocking: Boolean = true)`
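A minimal sketch of why a `var` fits better than a `lazy val` here, assuming the cache must be droppable and rebuildable. `WithLazyVal`, `WithVar`, and `build` are hypothetical names, and `String` stands in for `RDD[CachedBatch]` so the sketch compiles without Spark.

```scala
// Hypothetical classes; String stands in for RDD[CachedBatch].

// A lazy val is computed at most once and can never be reassigned, so there
// is no way to drop the cached value and rebuild it later.
final class WithLazyVal(build: () => String) {
  lazy val buffers: String = build()
  // def clear(): Unit = { buffers = null }  // would not compile: a val cannot be reassigned
}

// A var can be reset to null and rebuilt on the next access, which is what a
// clearCache-style method needs. Every access holds the lock here, for
// simplicity, so this version needs no @volatile.
final class WithVar(build: () => String) {
  private var _buffers: String = null

  def buffers: String = synchronized {
    if (_buffers == null) _buffers = build()
    _buffers
  }

  def clear(): Unit = synchronized { _buffers = null }
}
```

Once the lock is taken off the fast path (double-checked locking), the field also has to be `@volatile`, which is what this PR adds.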
@bersprockets @cloud-fan Oops, I just noticed this causes the Scala 2.11 build to fail:
[error] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala:52: values cannot be volatile
[error] @transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
[error]
It looks like this might be a scalac bug that is only fixed in 2.12; I didn't look too hard, but ended up here:
scala/bug#8873
scala/scala#5294
It might be sufficient to move this to a private field, as I don't think any caller actually sets this value? Let me try.
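A minimal sketch of that workaround, assuming no caller passes the initial value: the `@volatile` field moves out of the constructor parameter list into the class body. `Holder`, `cache`, and `clear` are hypothetical names, and `String` stands in for `RDD[CachedBatch]` so the sketch compiles without Spark.

```scala
// Hypothetical stand-in for CachedRDDBuilder.
//
// Constructor-parameter form, which scalac 2.11 rejected in the build log
// above with "values cannot be volatile":
//   class Holder(val tableName: String)(
//       @transient @volatile private var _buffers: String = null)
//
// Body-field form: the same annotations on an ordinary private field.
class Holder(val tableName: String) extends Serializable {
  @transient @volatile private var _buffers: String = null

  def isCached: Boolean = _buffers != null

  def cache(buffers: String): Unit = synchronized { _buffers = buffers }

  def clear(): Unit = synchronized { _buffers = null }
}
```

The trade-off is that callers can no longer seed the field through the constructor, which is acceptable only if, as suggested, no caller does.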
Thanks @srowen. Going forward, I assume I (and other contributors) should be building at least once with -Pscala-2.11 before submitting or updating a PR?
I wouldn't go that far, as we have another job to check 2.11 after the fact, and these are rare.
@bersprockets yeah that did it: #23864
LGTM! Merging to master!
## What changes were proposed in this pull request?

According to Brian Goetz et al. in *Java Concurrency in Practice*, the double-checked locking (DCL) pattern has worked since Java 5, but only if the resource is declared volatile:

> Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.

`CachedRDDBuilder.cachedColumnBuffers` and `CachedRDDBuilder.clearCache` both use DCL to manage the resource `_cachedColumnBuffers`. The missing ingredient is that `_cachedColumnBuffers` is not volatile. Because of this, `clearCache` may see `_cachedColumnBuffers` as null when in fact it is not, and therefore fail to un-cache the RDD. There may be other, more subtle bugs due to visibility issues.

To avoid these issues, this PR makes `_cachedColumnBuffers` volatile.

## How was this patch tested?

- Existing SQL unit tests
- Existing pyspark-sql tests

Closes apache#23768 from bersprockets/SPARK-26851.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
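The pattern described above can be sketched in plain Scala. This is a hypothetical `LazyResource` class, not Spark's `CachedRDDBuilder`; `build` and `release` stand in for building and un-caching the RDD. Reading the field into a local variable inside `get` is a common DCL refinement that goes beyond what this PR describes.

```scala
// Hypothetical class illustrating double-checked locking on a @volatile field.
final class LazyResource[T >: Null <: AnyRef](build: () => T, release: T => Unit) {
  // Without @volatile, a thread doing the unsynchronized first check may see
  // a stale value: null after another thread built the resource, or a
  // reference to an object whose construction is not yet visible.
  @volatile private var _resource: T = null

  // Returns the resource, building it at most once until clear() is called.
  def get: T = {
    var r = _resource                 // first check: one volatile read, no lock
    if (r == null) {
      synchronized {
        r = _resource                 // second check, now holding the lock
        if (r == null) {
          r = build()
          _resource = r               // volatile write publishes the result
        }
      }
    }
    r // return the local copy, so a concurrent clear() cannot turn this into null
  }

  // Releases the resource if one exists, using the same two-check structure.
  def clear(): Unit = {
    if (_resource != null) {
      synchronized {
        val r = _resource
        if (r != null) {
          release(r)
          _resource = null
        }
      }
    }
  }

  def isCached: Boolean = _resource != null
}
```

The lock is only taken while the resource is missing (or being cleared), so the common path costs a single volatile read, which matches the "small performance impact" in the quotation.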