Hadoop integration #1 (#803)
Conversation
… => SparkHadoopMapReduceUtil
Small code organization suggestion -- factor this into spark.Utils because you have another copy of this method in HadoopMapReduceUtil
Just gave that a shot, but turns out that spark.Utils is private to the package. Do we want to increase its visibility enough to be available from org.apache.hadoop?
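For context, a minimal sketch of the visibility constraint being discussed (the helper name here is hypothetical):

```scala
package spark {
  // Package-private: visible only within spark and its subpackages.
  private[spark] object Utils {
    def sharedHelper(): String = "shared logic" // hypothetical method
  }
}

package org.apache.hadoop.mapred {
  object Caller {
    // Does not compile: Utils is private[spark], so code living under
    // org.apache.hadoop cannot see it. Widening Utils to public (or adding
    // a public forwarder in spark) would be needed to share it from here.
    // def use(): String = spark.Utils.sharedHelper()
  }
}
```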
Another known bug: some unit tests won't pass when running against YARN.

Any reason to move yarn from inside core to outside? I don't think it makes sense outside of core... CC'ing @tgravescs

We are working to unify the Spark binaries so they won't have to be rebuilt for each Hadoop version. This involved moving YARN support out of core because the YARN APIs are not available when building against MapReduce v1 versions of Hadoop. The only effect that has on YARN users is that the spark-yarn assembly must be provided instead of the spark-core assembly when launching a YARN job.
Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/531/

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/533/

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/542/

Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/543/
Hey Jey, for the default Hadoop version, I actually suggested 1.2.1, not 1.1.2.

Oops. Fixed.

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/554/
Here's a pretty strange error from Jenkins; looks like possibly a bug in Scala's JPropertiesWrapper's toMap method? Or maybe some other thread is calling System.setProperty at around the same time? Jenkins, retest this please.
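For illustration, a rough sketch of the suspected race, assuming the failure comes from snapshotting the live system Properties while another thread inserts new keys (the property names below are made up):

```scala
import scala.collection.JavaConverters._

object PropsRaceRepro {
  def main(args: Array[String]) {
    // Structurally modifies the global Properties table from another thread.
    val writer = new Thread(new Runnable {
      def run() {
        for (i <- 1 to 100000) System.setProperty("test.key." + i, "v")
      }
    })
    writer.start()

    // JPropertiesWrapper iterates the underlying Hashtable, whose iterators
    // are fail-fast, so this toMap can intermittently throw
    // ConcurrentModificationException while the writer is running.
    try {
      val snapshot = System.getProperties.asScala.toMap
      println("copied " + snapshot.size + " properties")
    } finally {
      writer.join()
    }
  }
}
```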
BTW, unit tests pass under YARN too after 1a0607a fixed a silly mistake in …
Jenkins, retest this please.

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/555/

Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/598/

Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/599/
Moving yarn code out of core does not look like a good decision - it does not make sense as a top-level module.
It's simpler to have a different module, especially due to Maven insanity. The only way this materially affects YARN users is that they need to provide the path to the spark-yarn assembly instead of the spark-core assembly when launching a YARN job.

I'm happy to consider the profile-based approach, but please list the benefits, as they are not obvious to me.

Yeah, the core problem is that the YARN code won't work unless you are linking against a YARN-enabled Hadoop, and we don't want to be publishing separate spark-core artifacts for every Hadoop version. YARN users will just have to add spark-yarn as an additional dependency until either Hadoop improves its packaging (e.g. adds some kind of yarn-client package that still works with old Hadoop versions) or YARN is widely enough deployed that we don't want to run on non-YARN clusters (unlikely in the near future).
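In SBT terms, the extra dependency would look roughly like this (the coordinates and version below are illustrative, not necessarily the published ones):

```scala
libraryDependencies ++= Seq(
  "org.spark-project" %% "spark-core" % "0.8.0",
  // Only needed when running on YARN; links against YARN-enabled Hadoop.
  "org.spark-project" %% "spark-yarn" % "0.8.0"
)
```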
I'm closing this PR and will submit a new one targeted against mesos/master instead of mesos/branch-0.8 |
I am not commenting about including yarn artifacts within the same spark core jar - but about where the code is hosted within the spark source tree. Hosting it as a top-level directory does not make much sense when it has no standalone value, since it is very closely tied to spark core (and specializes it in an implementation-dependent way). Having two jars makes sense - since the expected functionality is different - and we can generate different artifacts when driven by different profiles. Btw, we would not need to include mesos jars while building the spark yarn jar if it is getting separated out.
The folder structure in SBT/Maven just reflects modules, not anything about standalone value. I don't see any reason to complicate the build file to do this. We've actually fought a lot with the profiles in Maven and SBT to try to do conditional builds, and it's really painful. This stuff will still get compiled, tested, etc. together.
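A sketch of the module layout being described, in SBT 0.12-era Build style (project and directory names are illustrative):

```scala
import sbt._
import Keys._

object SparkBuild extends Build {
  lazy val core = Project("core", file("core"))

  // yarn sits beside core as its own module rather than inside it, but it
  // still depends on core, so both are compiled and tested together.
  lazy val yarn = Project("yarn", file("yarn")).dependsOn(core)
}
```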
Author: Patrick Wendell <[email protected]>

Closes mesos#803 from pwendell/mapr-support and squashes the following commits:

8df60e4 [Patrick Wendell] SPARK-1862: Support for MapR in the Maven build.
Initial patch to allow one spark binary to target multiple hadoop versions. Has a few bugs that I'll submit fixes for shortly:

- `sbt clean` doesn't work
- … `$SPARK_HADOOP_VERSION` is changed
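For illustration, one way a build could pick up the Hadoop version from the environment variable mentioned above (a sketch; the actual build logic may differ, and the 1.2.1 default simply echoes the version suggested earlier in the thread):

```scala
// Read the target Hadoop version from the environment, with a default.
val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.2.1")

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
```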