
Conversation

@jey (Contributor) commented Aug 11, 2013

Initial patch to allow one Spark binary to target multiple Hadoop versions (see the sketch after this list). Has a few bugs that I'll submit fixes for shortly:

  • sbt clean doesn't work
  • unnecessarily recompiles some source files when $SPARK_HADOOP_VERSION is changed
  • doesn't work with hadoop-0.23.x
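
A minimal sketch of the env-driven approach, assuming the build reads $SPARK_HADOOP_VERSION to select the Hadoop dependency; the coordinates and the 1.2.1 default are illustrative assumptions, not the patch's actual build code:

    // Hypothetical SparkBuild.scala fragment: choose the Hadoop client artifact
    // from the SPARK_HADOOP_VERSION environment variable at build time.
    val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.2.1")

    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion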

Member (inline review comment):

Small code organization suggestion -- factor this into spark.Utils, because you have another copy of this method in HadoopMapReduceUtil.

Contributor Author (inline review reply):

Just gave that a shot, but it turns out that spark.Utils is private to the package. Do we want to increase its visibility enough to be available from org.apache.hadoop?
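
For context, a minimal sketch of the visibility issue under discussion, assuming the duplicated helper is a reflection-based class lookup; all names here are illustrative, not the PR's actual code:

    package spark

    // Package-private, so code under org.apache.hadoop (like HadoopMapReduceUtil)
    // cannot call it without widening the visibility.
    private[spark] object Utils {
      // Hypothetical shared helper: try the Hadoop 2 class name, fall back to Hadoop 1.
      def firstAvailableClass(first: String, second: String): Class[_] =
        try Class.forName(first)
        catch { case _: ClassNotFoundException => Class.forName(second) }
    }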

@jey (Contributor Author) commented Aug 11, 2013

Another known bug: some unit tests won't pass when running against YARN

@mridulm (Contributor) commented Aug 11, 2013

Any reason to move yarn from inside core to outside? I don't think it makes sense outside of core...

CC'ing @tgravescs

@jey (Contributor Author) commented Aug 11, 2013

We are working to unify the Spark binaries so they won't have to be rebuilt for each Hadoop version. This involved moving YARN support out of core, because the YARN APIs are not available when building against MapReduce v1 versions of Hadoop. The only effect this has on YARN users is that the SPARK_JAR environment variable now has to point to the spark-yarn assembly JAR instead of the spark-core assembly used previously.
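
A minimal sketch of the launch-side change, assuming the YARN client resolves the assembly through SPARK_JAR; the fragment is illustrative, not the actual client code:

    // Hypothetical YARN-client fragment: SPARK_JAR must now point at the
    // spark-yarn assembly rather than the spark-core assembly.
    val sparkJar = sys.env.getOrElse("SPARK_JAR",
      sys.error("SPARK_JAR must be set to the spark-yarn assembly JAR"))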

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/531/

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/533/

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/542/

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/543/

@mateiz (Member) commented Aug 12, 2013

Hey Jey, for the default Hadoop version, I actually suggested 1.2.1, not 1.1.2.

@jey (Contributor Author) commented Aug 12, 2013

Oops. Fixed.

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/554/

@jey (Contributor Author) commented Aug 12, 2013

Here's a pretty strange error from Jenkins. It looks like possibly a bug in Scala's JPropertiesWrapper.toMap method? Or maybe some other thread is calling System.setProperty at around the same time?

[info] DriverSuite:
Exception in thread "main" java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1200)
    at scala.collection.JavaConversions$JPropertiesWrapper$$anon$3.next(JavaConversions.scala:933)
    at scala.collection.JavaConversions$JPropertiesWrapper$$anon$3.next(JavaConversions.scala:930)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.JavaConversions$JPropertiesWrapper$$anon$3.foreach(JavaConversions.scala:930)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
    at scala.collection.JavaConversions$JPropertiesWrapper.foreach(JavaConversions.scala:903)
    at scala.collection.TraversableOnce$class.toMap(TraversableOnce.scala:256)
    at scala.collection.JavaConversions$JPropertiesWrapper.toMap(JavaConversions.scala:903)
    at spark.SparkContext.<init>(SparkContext.scala:254)
    at spark.DriverWithoutCleanup$.main(DriverSuite.scala:53)
    at spark.DriverWithoutCleanup.main(DriverSuite.scala)
[info] - driver should exit after finishing *** FAILED ***
[info]   SparkException was thrown during property evaluation. (DriverSuite.scala:37)
[info]   Message: Process List(./run, spark.DriverWithoutCleanup, local-cluster[2,1,512]) exited with code 1
[info]   Occurred at table row 1 (zero based, not counting headings), which had values (
[info]     master = local-cluster[2,1,512]
[info]   )
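
To illustrate the suspected race, a minimal sketch (not Spark code): iterating java.util.Properties — a Hashtable — through the Scala wrapper while another thread mutates it can fail exactly as in the trace above, and snapshotting with clone() first avoids it, since Hashtable.clone() is synchronized:

    import scala.collection.JavaConversions._

    val props = System.getProperties

    // Throws ConcurrentModificationException if another thread calls
    // System.setProperty while the wrapper is being iterated:
    // val snapshot = props.toMap

    // Safer: copy under the Hashtable's own lock, then convert the copy.
    val snapshot = props.clone().asInstanceOf[java.util.Properties].toMap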

Jenkins, retest this please.

@jey (Contributor Author) commented Aug 12, 2013

BTW, unit tests pass under YARN too after 1a0607a fixed a silly mistake in bin/update-classpath.sh

@jey (Contributor Author) commented Aug 12, 2013

Jenkins, retest this please.

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/555/

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/598/

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/599/

@mridulm (Contributor) commented Aug 15, 2013

Moving the yarn code out of core does not look like a good decision - it does not make sense as a top-level module.
It would be better to simply have a different profile that adds the yarn-specific targets and source subdirectories to the build, with the default non-yarn build excluding the yarn subdirectory.
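
A minimal sketch of the profile-style alternative being proposed, expressed in sbt (a Maven profile would be analogous); the flag name and directory layout are illustrative assumptions:

    // Hypothetical build fragment: compile the YARN sources inside core only
    // when a flag is set, instead of hosting them as a separate top-level module.
    lazy val yarnEnabled = sys.env.contains("SPARK_WITH_YARN")

    lazy val core = (project in file("core")).settings(
      unmanagedSourceDirectories in Compile ++= {
        if (yarnEnabled) Seq(baseDirectory.value / "src" / "hadoop2-yarn" / "scala")
        else Seq.empty
      }
    )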

@jey (Contributor Author) commented Aug 15, 2013

It's simpler to have a different module, especially due to Maven insanity. The only way this materially affects YARN users is that they need to provide the path to the spark-yarn assembly instead of the spark-core assembly when launching a YARN job.

@jey (Contributor Author) commented Aug 15, 2013

I'm happy to consider the profile-based approach, but please list the benefits, as they are not obvious to me.

@mateiz (Member) commented Aug 15, 2013

Yeah, the core problem is that the YARN code won't work unless you are linking against a YARN-enabled Hadoop, and we don't want to publish separate spark-core artifacts for every Hadoop version. YARN users will just have to add spark-yarn as an additional dependency until either Hadoop improves its packaging (e.g. adds some kind of yarn-client package that still works with old Hadoop versions) or YARN is widely enough deployed that we no longer need to run on non-YARN clusters (unlikely in the near future).
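
Concretely, the arrangement described above might look like this from a YARN user's build; the coordinates and version are placeholders, not published artifacts:

    // Hypothetical sbt dependencies: spark-core plus the extra spark-yarn module.
    libraryDependencies ++= Seq(
      "org.spark-project" %% "spark-core" % "0.8.0-SNAPSHOT",
      "org.spark-project" %% "spark-yarn" % "0.8.0-SNAPSHOT"
    )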

@jey (Contributor Author) commented Aug 15, 2013

I'm closing this PR and will submit a new one targeted against mesos/master instead of mesos/branch-0.8.

@jey closed this Aug 15, 2013
@jey mentioned this pull request Aug 15, 2013
@mridulm (Contributor) commented Aug 16, 2013

I am not commenting on including the yarn artifacts within the same spark-core jar, but on where the code is hosted within the Spark source tree. Hosting it as a top-level directory does not make much sense when it has no standalone value, since it is very closely tied to spark core (and specializes it in an implementation-dependent way).
Compare with the other top-level directories - bagel, ml, shark, doc, etc.: it makes sense for them to exist there right now, but not for a yarn directory.

Having two jars makes sense, since the expected functionality is different, and we can generate the different artifacts by driving the build with different profiles. Btw, we would not need to include the mesos jars when building the spark-yarn jar if it is getting separated out.

@mateiz (Member) commented Aug 16, 2013

The folder structure in SBT/Maven just reflects modules, not anything about standalone value. I don't see any reason to complicate the build file to do this. We've actually fought a lot with the profiles in Maven and SBT to try to do conditional builds, and it's really painful. This stuff will still get compiled, tested, etc. together.

xiajunluan pushed a commit to xiajunluan/spark that referenced this pull request May 30, 2014
Author: Patrick Wendell <[email protected]>

Closes mesos#803 from pwendell/mapr-support and squashes the following commits:

8df60e4 [Patrick Wendell] SPARK-1862: Support for MapR in the Maven build.