Skip to content

Conversation

@bersprockets
Copy link
Contributor

@bersprockets bersprockets commented Feb 12, 2019

What changes were proposed in this pull request?

According to Brian Goetz et al in Java Concurrency in Practice, the double checked locking pattern has worked since Java 5, but only if the resource is declared volatile:

Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.

CachedRDDBuilder. cachedColumnBuffers and CachedRDDBuilder.clearCache both use DCL to manage the resource _cachedColumnBuffers. The missing ingredient is that _cachedColumnBuffers is not volatile.

Because of this, clearCache may see _cachedColumnBuffers as null, when in fact it is not, and therefore fail to un-cache the RDD. There may be other, more subtle bugs due to visibility issues.

To avoid these issues, this PR makes _cachedColumnBuffers volatile.

How was this patch tested?

  • Existing SQL unit tests
  • Existing pyspark-sql tests

@SparkQA
Copy link

SparkQA commented Feb 12, 2019

Test build #102261 has finished for PR 23768 at commit 9e7974f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Feb 14, 2019

Ah, good catch! The reference for that is http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html, too.

LGTM

cc: @cloud-fan @srowen

@transient cachedPlan: SparkPlan,
tableName: Option[String])(
@transient private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
@transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be a lazy val? I don't see any caller that specifies this argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we use var here to support cache buffer clearance? def clearCache(blocking: Boolean = true)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bersprockets @cloud-fan Oops, I just noticed this causes the Scala 2.11 build to fail:

[error] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala:52: values cannot be volatile
[error]     @transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
[error]   

It looks like this might be a scalac bug, that is only fixed in 2.12; didn't look too hard but ended up here:
scala/bug#8873
scala/scala#5294

It might be sufficient to move this to a private field, as I don't think any caller actually sets this value? Let me try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @srowen . Going forward, I assume I (and other contributors) should be building at least once with -Pscala-2.11 before submitting or updating a PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't go that far as we have another job to check 2.11 after the fact and these are rare

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bersprockets yeah that did it: #23864

@cloud-fan
Copy link
Contributor

LGTM! merging to master!

@cloud-fan cloud-fan closed this in f34b872 Feb 14, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

According to Brian Goetz et al in Java Concurrency in Practice, the double checked locking pattern has worked since Java 5, but only if the resource is declared volatile:

> Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.

CachedRDDBuilder. cachedColumnBuffers and CachedRDDBuilder.clearCache both use DCL to manage the resource ``_cachedColumnBuffers``. The missing ingredient is that ``_cachedColumnBuffers`` is not volatile.

Because of this, clearCache may see ``_cachedColumnBuffers`` as null, when in fact it is not, and therefore fail to un-cache the RDD. There may be other, more subtle bugs due to visibility issues.

To avoid these issues, this PR makes ``_cachedColumnBuffers`` volatile.

## How was this patch tested?

- Existing SQL unit tests
- Existing pyspark-sql tests

Closes apache#23768 from bersprockets/SPARK-26851.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@bersprockets bersprockets deleted the SPARK-26851 branch May 1, 2022 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants