[SPARK-2585] Remove special handling of Hadoop JobConf. #1648
Conversation
Jenkins, what are you doing ...

Jenkins, test this please.

QA tests have started for PR 1648. This patch merges cleanly.

Jenkins, retest this please.

QA tests have started for PR 1648. This patch merges cleanly.
Comment looks out of place now.
LGTM, but I'm not entirely familiar with all this code yet.

This is unfortunately not working because of something with HiveConf ....

QA tests have started for PR 1648. This patch merges cleanly.

QA tests have started for PR 1648. This patch merges cleanly.

Jenkins, retest this please.
Is this guaranteed to return a new copy of the conf for every partition or something? Because otherwise I'm not sure I see why we can safely remove the lock.
It is because RDD objects are not reused at all. Each task gets its own deserialized copy of the HadoopRDD and the conf.
It might be worth a comment here then saying that the createJobConf() method really does create a new job conf because [xyz] even though it looks like it's just accessing the broadcast value.
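The pattern being discussed can be sketched roughly as follows. This is a simplified illustration, not the actual HadoopRDD source; the `createJobConf` and `broadcastedConf` names here are assumptions for the sake of the example:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Sketch: although createJobConf() looks like it merely reads the
// broadcast value, the JobConf copy-constructor clones it, and each
// task deserializes its own copy of the RDD anyway. So every partition
// ends up with a private JobConf, which is why no lock is needed.
class HadoopRDDSketch(broadcastedConf: () => Configuration) extends Serializable {
  def createJobConf(): JobConf =
    new JobConf(broadcastedConf()) // copy constructor: returns a fresh JobConf
}
```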
It looks like two unrelated commits from #1675 got pulled into this PR. Do you mind rebasing to exclude them?

QA tests have started for PR 1648. This patch merges cleanly.

QA tests have started for PR 1648. This patch merges cleanly.

QA results for PR 1648:

QA tests have started for PR 1648. This patch merges cleanly.
It looks like deserializing the JobConf could be pretty expensive. Here's part of the deserialization stack trace:

```
Executor task launch worker-0 [RUNNABLE]
java.util.zip.ZipFile.getEntry(long, byte[], boolean)
java.util.zip.ZipFile.getEntry(String)
java.util.jar.JarFile.getEntry(String)
java.util.jar.JarFile.getJarEntry(String)
sun.misc.URLClassPath$JarLoader.getResource(String, boolean)
sun.misc.URLClassPath$JarLoader.findResource(String, boolean)
sun.misc.URLClassPath.findResource(String, boolean)
java.net.URLClassLoader$2.run()  <2 recursive calls>
java.security.AccessController.doPrivileged(PrivilegedAction, AccessControlContext)
java.net.URLClassLoader.findResource(String)
java.lang.ClassLoader.getResource(String)  <2 recursive calls>
java.net.URLClassLoader.getResourceAsStream(String)
javax.xml.parsers.SecuritySupport$4.run()
java.security.AccessController.doPrivileged(PrivilegedAction)
javax.xml.parsers.SecuritySupport.getResourceAsStream(ClassLoader, String)
javax.xml.parsers.FactoryFinder.findJarServiceProvider(String)
javax.xml.parsers.FactoryFinder.find(String, String)
javax.xml.parsers.DocumentBuilderFactory.newInstance()
org.apache.hadoop.conf.Configuration.loadResource(Properties, Object, boolean)
org.apache.hadoop.conf.Configuration.loadResources(Properties, ArrayList, boolean)
org.apache.hadoop.conf.Configuration.getProps()
org.apache.hadoop.conf.Configuration.get(String, String)
org.apache.hadoop.hive.conf.HiveConf.initialize(Class)
org.apache.hadoop.hive.conf.HiveConf.<init>()
sun.reflect.GeneratedConstructorAccessor142.newInstance(Object[])
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Object[])
java.lang.reflect.Constructor.newInstance(Object[])
org.apache.hadoop.util.ReflectionUtils.newInstance(Class, Configuration)
org.apache.hadoop.io.WritableFactories.newInstance(Class, Configuration)
org.apache.hadoop.io.ObjectWritable.readObject(DataInput, ObjectWritable, Configuration)
org.apache.hadoop.io.ObjectWritable.readFields(DataInput)
org.apache.spark.SerializableWritable.readObject(ObjectInputStream)
[...]
org.apache.spark.serializer.JavaDeserializationStream.readObject(ClassTag)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(ByteBuffer, ClassLoader, ClassTag)
org.apache.spark.scheduler.ResultTask.runTask(TaskContext)
org.apache.spark.scheduler.Task.run(long)
org.apache.spark.executor.Executor$TaskRunner.run()
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
java.util.concurrent.ThreadPoolExecutor$Worker.run()
java.lang.Thread.run()
```

This seems to involve fairly expensive searches of the classpath.
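One way to see this cost in isolation is to round-trip a JobConf through its Writable serialization, which exercises the same `Configuration.loadResource` path visible in the trace. A rough micro-benchmark sketch, assuming Hadoop is on the classpath (not a rigorous measurement):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import org.apache.hadoop.mapred.JobConf

// Serialize a JobConf once, then time how long a single deserialization
// takes. The readFields() call triggers Configuration.getProps(), which
// lazily loads XML default resources and scans the classpath.
val conf = new JobConf()
val bytes = new ByteArrayOutputStream()
conf.write(new DataOutputStream(bytes))

val start = System.nanoTime()
val copy = new JobConf()
copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
println(s"JobConf deserialization took ${(System.nanoTime() - start) / 1e6} ms")
```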
QA results for PR 1648:
Just a note: we need to remove this setting before merging it.
We should keep it at 2 to speed up tests ...
This actually speeds up the tests quite a bit, although it might be masking some of the expensive serialization/deserialization issues.
Jenkins, retest this please.

QA tests have started for PR 1648. This patch merges cleanly.

QA results for PR 1648:

It looks like this test failure was due to a BindException causing an unrelated test to fail.

@rxin @JoshRosen yeah the test failure was unrelated. We need to fix one of the streaming tests.
I've spent a bit of time looking into some of the performance issues that we've seen in this patch. Operating under the theory that deserializing Hadoop Configuration / JobConfs was expensive, I tried a few alternative serialization approaches, including using WritableUtils to manually serialize the configuration and writing my own code to read that back into a configuration; this didn't seem to make a huge difference. I'm going to put this fix on hold for now until I have more time to figure out why we're seeing this slowdown.

@ash211 Do you have a way to reliably reproduce the thread-safety issues that you reported in SPARK-2546? That would be helpful in order to know whether I've actually fixed the problem.
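The "manually serialize the configuration" idea described above might look something like the following sketch. This is a hypothetical reconstruction, not the code from the PR; the `writeConf`/`readConf` helper names are invented here. The key detail is constructing the Configuration with `loadDefaults = false`, so that reading it back does not trigger the XML resource loading seen in the stack trace:

```scala
import java.io.{DataInput, DataOutput}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.WritableUtils

// Write every key/value pair explicitly with WritableUtils.
def writeConf(conf: Configuration, out: DataOutput): Unit = {
  val entries = conf.iterator().asScala.toSeq
  WritableUtils.writeVInt(out, entries.size)
  for (e <- entries) {
    WritableUtils.writeString(out, e.getKey)
    WritableUtils.writeString(out, e.getValue)
  }
}

// Read the pairs back into a Configuration that skips default resources.
def readConf(in: DataInput): Configuration = {
  val conf = new Configuration(false) // loadDefaults = false: no classpath scan
  val n = WritableUtils.readVInt(in)
  for (_ <- 0 until n) {
    conf.set(WritableUtils.readString(in), WritableUtils.readString(in))
  }
  conf
}
```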
We have merged the patch to reduce the # of shuffle partitions when testing. Time to revisit or close this PR?
Previously we broadcast the JobConf for HadoopRDD because it is large. Now that we always broadcast RDDs and task closures, it should no longer be necessary to broadcast the JobConf separately.
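The special handling this PR removes followed roughly this shape (a simplified sketch of the old pattern, not the exact HadoopRDD code): the JobConf was wrapped in Spark's `SerializableWritable` and broadcast once, rather than being shipped inside every serialized task.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SerializableWritable, SparkContext}

// Sketch: broadcast the (potentially large) JobConf once per job.
// SerializableWritable adapts Hadoop's Writable serialization to Java
// serialization; executors recover the conf via broadcast.value.value.
def broadcastJobConf(sc: SparkContext, conf: JobConf) =
  sc.broadcast(new SerializableWritable[Configuration](conf))
```

With RDDs and task closures now broadcast automatically, the JobConf travels with the serialized RDD, making this extra broadcast redundant.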