Commit 91e7a71

aarondav authored and pwendell committed
SPARK-1097: Do not introduce deadlock while fixing concurrency bug
We recently added this lock on 'conf' in order to prevent concurrent creation. However, it turns out that this can introduce a deadlock because Hadoop also synchronizes on the Configuration objects when creating new Configurations (and they do so via a static REGISTRY which contains all created Configurations).

This fix forces all Spark initialization of Configuration objects to occur serially by using a static lock that we control, and thus also prevents introducing the deadlock.

Author: Aaron Davidson <[email protected]>

Closes #1409 from aarondav/1054 and squashes the following commits:

7d1b769 [Aaron Davidson] SPARK-1097: Do not introduce deadlock while fixing concurrency bug

(cherry picked from commit 8867cd0)
Signed-off-by: Patrick Wendell <[email protected]>
1 parent bf1ddc7 commit 91e7a71

1 file changed: +5 -2 lines changed

core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala

Lines changed: 5 additions & 2 deletions
@@ -139,8 +139,8 @@ class HadoopRDD[K, V](
       // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the
       // local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
       // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
-      // synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456)
-      conf.synchronized {
+      // Synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456).
+      HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
         val newJobConf = new JobConf(conf)
         initLocalJobConfFuncOpt.map(f => f(newJobConf))
         HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
@@ -231,6 +231,9 @@ class HadoopRDD[K, V](
 }
 
 private[spark] object HadoopRDD {
+  /** Constructing Configuration objects is not threadsafe, use this lock to serialize. */
+  val CONFIGURATION_INSTANTIATION_LOCK = new Object()
+
   /**
    * The three methods below are helpers for accessing the local map, a property of the SparkEnv of
    * the local process.
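
The pattern in this change can be illustrated in isolation. The following is a minimal sketch, not Spark or Hadoop code: `LegacyRegistry`, `LegacyConf`, `ConfFactory`, and `SerializedConstructionDemo` are hypothetical names. It assumes a class whose constructor mutates shared, unsynchronized state (loosely analogous to the static REGISTRY mentioned in the commit message). All construction goes through one lock object that only our own code can reach, rather than through a lock on an instance that a third-party library may also synchronize on.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.collection.mutable

// Hypothetical shared registry that is NOT thread-safe on its own.
object LegacyRegistry {
  val created = mutable.ArrayBuffer.empty[LegacyConf]
}

// Hypothetical stand-in for a class whose constructor touches shared state.
class LegacyConf(val parent: Option[LegacyConf]) {
  LegacyRegistry.created += this
}

object ConfFactory {
  // Dedicated lock that no other library can acquire, so taking it
  // cannot conflict with locks the library takes on its own objects.
  private val InstantiationLock = new Object()

  def newRoot(): LegacyConf =
    InstantiationLock.synchronized { new LegacyConf(None) }

  def newConf(parent: LegacyConf): LegacyConf =
    InstantiationLock.synchronized { new LegacyConf(Some(parent)) }

  // Read the registry under the same lock for a consistent view.
  def createdCount: Int =
    InstantiationLock.synchronized { LegacyRegistry.created.size }
}

object SerializedConstructionDemo {
  /** Creates threads * perThread objects concurrently and returns how many were registered. */
  def createdConcurrently(threads: Int, perThread: Int): Int = {
    val base = ConfFactory.newRoot()
    val before = ConfFactory.createdCount
    val pool = Executors.newFixedThreadPool(threads)
    val start = new CountDownLatch(1)
    for (_ <- 1 to threads) {
      pool.execute(new Runnable {
        override def run(): Unit = {
          start.await() // release all workers at once to maximize contention
          for (_ <- 1 to perThread) ConfFactory.newConf(base)
        }
      })
    }
    start.countDown()
    pool.shutdown()
    pool.awaitTermination(30, TimeUnit.SECONDS)
    ConfFactory.createdCount - before
  }
}
```

Because every constructor call runs while holding `InstantiationLock`, no registration is lost, and `SerializedConstructionDemo.createdConcurrently(4, 500)` should report all 2000 objects. The trade-off is that construction is fully serialized across the process, which this commit accepts in exchange for avoiding both the concurrent-modification problem and the deadlock risk of locking on `conf` itself.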
