Conversation

@rxin
Contributor

@rxin rxin commented Jul 30, 2014

Previously we broadcast the JobConf for HadoopRDD separately because it is large. Now that we always broadcast RDDs and task closures, it should no longer be necessary to broadcast the JobConf on its own.
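For context, the original motivation for broadcasting can be illustrated with a rough, self-contained sketch. This uses plain Java serialization and a hypothetical LargeConf standing in for JobConf (this is not Spark's or Hadoop's actual code): an object captured in every task closure is serialized once per task, while a broadcast variable is serialized once.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a large JobConf; NOT Spark's or Hadoop's actual class.
class LargeConf extends Serializable {
  val entries: Array[String] = Array.fill(10000)("some.hadoop.key=value")
}

object BroadcastVsClosure {
  // Size of an object under plain Java serialization.
  def serializedSize(obj: Serializable): Long = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    buf.size().toLong
  }

  def main(args: Array[String]): Unit = {
    val conf = new LargeConf
    val numTasks = 200
    // Captured in every task closure: serialized and shipped once per task.
    val perClosureTotal = serializedSize(conf) * numTasks
    // Broadcast variable: serialized once, fetched by executors as needed.
    val broadcastTotal = serializedSize(conf)
    assert(perClosureTotal > broadcastTotal)
    println(s"closure shipping: $perClosureTotal bytes total, broadcast: $broadcastTotal bytes")
  }
}
```

Once the closure itself is broadcast, the large conf rides along with it, which is what makes the separate JobConf broadcast redundant.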

@rxin
Contributor Author

rxin commented Jul 30, 2014

Jenkins, what are you doing ...

@rxin
Contributor Author

rxin commented Jul 30, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17425/consoleFull

@rxin
Contributor Author

rxin commented Jul 30, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17454/consoleFull

Contributor

Comment looks out of place now.

@vanzin
Contributor

vanzin commented Jul 30, 2014

LGTM, but I'm not entirely familiar with all this code yet.

@rxin
Contributor Author

rxin commented Jul 30, 2014

This is unfortunately not working because of something with HiveConf ...

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17490/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17508/consoleFull

@rxin
Contributor Author

rxin commented Jul 31, 2014

Jenkins, retest this please.

Contributor

Is this guaranteed to return a new copy of the conf for every partition or something? Because otherwise I'm not sure I see why we can safely remove the lock.

Contributor Author

It is because RDD objects are not reused at all. Each task gets its own deserialized copy of the HadoopRDD and the conf.
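That point can be demonstrated with a small, self-contained sketch using plain Java serialization and a hypothetical FakeConf in place of the real JobConf (not Spark's actual code): every round trip through the serializer yields a distinct object, so tasks mutating their own copies cannot race, which is why the lock becomes unnecessary.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a mutable JobConf; NOT the real Hadoop class.
class FakeConf(var timeout: String) extends Serializable

object PerTaskCopies {
  // Serialize and immediately deserialize, as the executor does per task.
  def roundTrip[T <: Serializable](obj: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)).readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val original = new FakeConf("600000")
    val task1 = roundTrip(original)   // task 1's private copy
    val task2 = roundTrip(original)   // task 2's private copy
    assert(!(task1 eq task2))         // distinct objects
    task1.timeout = "1"               // mutating one copy...
    assert(task2.timeout == "600000") // ...does not affect the other
    println("each task gets an independent copy")
  }
}
```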

Contributor

It might be worth adding a comment here, then, saying that the createJobConf() method really does create a new job conf because [xyz], even though it looks like it's just accessing the broadcast value.

@JoshRosen
Contributor

It looks like two unrelated commits from #1675 got pulled into this PR. Do you mind rebasing to exclude them?

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17894/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17898/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1648:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17898/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17906/consoleFull

@JoshRosen
Contributor

It looks like deserializing the JobConf could be pretty expensive. Here's part of the deserialization stack trace:

Executor task launch worker-0 [RUNNABLE]
java.util.zip.ZipFile.getEntry(long, byte[], boolean)
java.util.zip.ZipFile.getEntry(String)
java.util.jar.JarFile.getEntry(String)
java.util.jar.JarFile.getJarEntry(String)
sun.misc.URLClassPath$JarLoader.getResource(String, boolean)
sun.misc.URLClassPath$JarLoader.findResource(String, boolean)
sun.misc.URLClassPath.findResource(String, boolean)
java.net.URLClassLoader$2.run()<2 recursive calls>
java.security.AccessController.doPrivileged(PrivilegedAction, AccessControlContext)
java.net.URLClassLoader.findResource(String)
java.lang.ClassLoader.getResource(String)<2 recursive calls>
java.net.URLClassLoader.getResourceAsStream(String)
javax.xml.parsers.SecuritySupport$4.run()
java.security.AccessController.doPrivileged(PrivilegedAction)
javax.xml.parsers.SecuritySupport.getResourceAsStream(ClassLoader, String)
javax.xml.parsers.FactoryFinder.findJarServiceProvider(String)
javax.xml.parsers.FactoryFinder.find(String, String)
javax.xml.parsers.DocumentBuilderFactory.newInstance()
org.apache.hadoop.conf.Configuration.loadResource(Properties, Object, boolean)
org.apache.hadoop.conf.Configuration.loadResources(Properties, ArrayList, boolean)
org.apache.hadoop.conf.Configuration.getProps()
org.apache.hadoop.conf.Configuration.get(String, String)
org.apache.hadoop.hive.conf.HiveConf.initialize(Class)
org.apache.hadoop.hive.conf.HiveConf.<init>()
sun.reflect.GeneratedConstructorAccessor142.newInstance(Object[])
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Object[])
java.lang.reflect.Constructor.newInstance(Object[])
org.apache.hadoop.util.ReflectionUtils.newInstance(Class, Configuration)
org.apache.hadoop.io.WritableFactories.newInstance(Class, Configuration)
org.apache.hadoop.io.ObjectWritable.readObject(DataInput, ObjectWritable, Configuration)
org.apache.hadoop.io.ObjectWritable.readFields(DataInput)
org.apache.spark.SerializableWritable.readObject(ObjectInputStream)
[...]
org.apache.spark.serializer.JavaDeserializationStream.readObject(ClassTag)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(ByteBuffer, ClassLoader, ClassTag)
org.apache.spark.scheduler.ResultTask.runTask(TaskContext)
org.apache.spark.scheduler.Task.run(long)
org.apache.spark.executor.Executor$TaskRunner.run()
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
java.util.concurrent.ThreadPoolExecutor$Worker.run()
java.lang.Thread.run()

This seems to involve fairly expensive searches of the classpath.

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1648:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17906/consoleFull

Contributor

Just a note: we need to remove this setting before merging it.

Contributor Author

We should keep it at 2 to speed up tests ...

Contributor

This actually speeds up the tests quite a bit, although it might be masking some of the expensive serialization/deserialization issues.

@rxin
Contributor Author

rxin commented Aug 5, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1648. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17938/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1648:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17938/consoleFull

@JoshRosen
Contributor

It looks like this test failure was due to a BindException causing an unrelated test to fail.

@pwendell
Contributor

pwendell commented Aug 5, 2014

@rxin @JoshRosen yeah the test failure was unrelated. We need to fix one of the streaming tests.

@JoshRosen
Contributor

I've spent a bit of time looking into some of the performance issues that we've seen in this patch.

After this patch, it looks like some of the mapPartitions stages in the correlationoptimizer.* tests are taking ~7 seconds instead of a few tens of milliseconds (these were called from the SparkSQL Exchange operator). @marmbrus, maybe we should chat about this, since you're more familiar with that code.

Operating under the theory that deserializing Hadoop Configuration / JobConfs was expensive, I tried a few alternative serialization approaches, including using WritableUtils to manually serialize the configuration and writing my own code to read that back into a configuration; this didn't seem to make a huge difference.

I'm going to put this fix on hold for now until I have more time to figure out why we're seeing this slowdown.

@ash211 Do you have a way to reliably reproduce the thread-safety issues that you reported in SPARK-2546? That would be helpful in order to know whether I've actually fixed the problem with clone().

@marmbrus
Contributor

marmbrus commented Aug 5, 2014

@rxin opened #1784 to try and combat the performance issues in tests.

@marmbrus
Contributor

We have merged the patch to reduce the number of shuffle partitions when testing. Time to revisit or close this PR?
