-
Notifications
You must be signed in to change notification settings - Fork 28.9k
SPARK-10611 Clone Configuration for each task for NewHadoopRDD #8763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like a plausible variation of #2684 -- cc @JoshRosen |
|
Test build #1755 has finished for PR 8763 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If HADOOP-10456 is still an issue then I think that we need to be locking on the same lock in both NewHadoopRDD and HadoopRDD. Therefore, I'd be in favor of moving this lock to SparkHadoopUtil or a similar location, then updating both RDD implementations to use that lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HADOOP-10456 is fixed in Hadoop 2.4.1 (https://issues.apache.org/jira/browse/HADOOP-10456), so people will still hit the issue in some configurations.
The concurrency issue with the Configuration/JobConf instantiations seems to be per object, not per class. (i.e. happens when two threads try to copy the same conf object, not when two threads try to instantiate Configuration/JobConf with two separate conf objects.) The lock could've even be allocated per NewHadoopRDD instance or HadoopRDD instance.
So, I think this should be safe, although I can certainly move this to SparkHadoopUtil for clarity. Please let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. In the original version of the fix for SPARK-1097, we actually synchronized on the Configuration being cloned. It looks like the HadoopRDD object lock was added in a followup patch to avoid a deadlock caused by this synchronization: #1409
|
This seems like it could have a pretty serious perf impact. Are we able to do some benchmarking to assess this? |
|
Oh, nevermind, sorry, this is off by default. |
|
Jenkins, test this please |
|
Can someone kick off the build? Also, please let me know if there's any other comments! |
|
Test build #1769 has finished for PR 8763 at commit
|
|
LGTM, since this is off by default. I'm going to merge this into master, but let me know how far back you think this fix should be backported. |
|
Thanks @JoshRosen. Backporting this to Spark 1.5 would be good enough for me. Would that be reasonable? |
|
SGTM; I'll pull it in now for inclusion in 1.5.1. |
This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD. Author: Mingyu Kim <[email protected]> Closes #8763 from mingyukim/mkim/SPARK-10611. (cherry picked from commit 8074208) Signed-off-by: Josh Rosen <[email protected]>
|
Thanks! |
This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD. Author: Mingyu Kim <[email protected]> Closes apache#8763 from mingyukim/mkim/SPARK-10611.
This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD. Author: Mingyu Kim <[email protected]> Closes apache#8763 from mingyukim/mkim/SPARK-10611. (cherry picked from commit 8074208) Signed-off-by: Josh Rosen <[email protected]> (cherry picked from commit a6c3153)
This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD.