[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai... #4315
SPARK-3039 (Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API) had been marked as resolved, but the fix did not work for at least some builds. When building for hadoop2, two versions conflicted: avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version). In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on avro-mapred 1.7.5:

Building Spark Project Hive 1.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding avro-mapred from hive-exec's dependencies allows the explicitly listed avro-mapred dependency (1.7.6, hadoop2 classifier) to be picked up.
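A minimal sketch of such an exclusion, assuming standard Maven `<exclusions>` syntax in sql/hive/pom.xml. The literal versions and the `${avro.mapred.classifier}` property (assumed to be declared with an empty default in the root pom) stand in for whatever the real pom uses. This is an illustration, not the PR's exact diff:

```xml
<dependencies>
  <!-- hive-exec 0.13.1a pulls in avro-mapred 1.7.5 transitively; drop that copy -->
  <dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>0.13.1a</version>
    <exclusions>
      <exclusion>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-mapred</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- the explicitly listed avro-mapred is now the only one resolved;
       its classifier (empty or hadoop2) comes from the active Hadoop profile -->
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.6</version>
    <classifier>${avro.mapred.classifier}</classifier>
  </dependency>
</dependencies>
```

With this in place, re-running the dependency:tree goal shown above should list avro-mapred only once, with the classifier chosen by the active profile.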
Can one of the admins verify this patch?
OK, so this works because
The problem was that the Spark project's hive-exec 0.13.1a depends on avro-mapred (see http://central.maven.org/maven2/org/spark-project/hive/hive-exec/0.13.1a/hive-exec-0.13.1a.pom), and its parent defines avro.version as 1.7.5 (see http://central.maven.org/maven2/org/spark-project/hive/hive/0.13.1a/hive-0.13.1a.pom).
The only place hive-exec is being used as a dependency is in:
In sql/hive/pom.xml we also explicitly have a dependency on:
Therefore, if we choose a profile that does not define avro.mapred.classifier
If we choose a profile like hadoop-2.4, we set it to hadoop2 and pull:
However, with changes in 1.3.0-SNAPSHOT, avro-mapred's scope is newly defined as:
That scope is in the main pom.xml:
Same for hive-exec. So competing avro-mapred classes will no longer be included in the spark-assembly.jar. They are not included on the Hadoop classpath (only Avro is), so they need to be supplied by the job. That will be new for Avro users.
But excluding hive-exec's avro-mapred dependency and explicitly pinning avro-mapred to 1.7.6 with the correct classifier will still be necessary if anything like the Maven Enforcer plugin is ever run.
When building with -Phadoop-2.4, my local Maven repo contains only avro-mapred-1.7.6-hadoop2.jar. When building with -Phadoop-0.23, the local repo contains only avro-mapred-1.7.6.jar, as expected.
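To illustrate how the classifier switches between profiles, here is a hedged excerpt of a root pom.xml. It assumes avro.mapred.classifier is declared with an empty default and overridden only in the hadoop-2.4 profile; every other element of the real pom and profile is omitted:

```xml
<project>
  <!-- excerpt: modelVersion, coordinates and all other elements omitted -->
  <properties>
    <!-- default: no classifier, so avro-mapred-1.7.6.jar (hadoop 1 API) is resolved -->
    <avro.mapred.classifier></avro.mapred.classifier>
  </properties>

  <profiles>
    <profile>
      <id>hadoop-2.4</id>
      <properties>
        <!-- selects avro-mapred-1.7.6-hadoop2.jar (hadoop 2 API) -->
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
      </properties>
    </profile>
  </profiles>
</project>
```

Under these assumptions, -Phadoop-2.4 resolves only the hadoop2-classified jar, while a profile such as hadoop-0.23 that leaves the property empty resolves the unclassified jar. This matches the local-repository contents reported above.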
LGTM
LGTM, but it looks like Jenkins hasn't tested this yet. Let me see if I am able to whitelist you:
Jenkins, this is OK to test.
@pwendell or @JoshRosen, could you tell Jenkins this is OK to test? I'm not on the VIP list.
@srowen I've granted you admin permissions on the Spark pull request builder, so you should now be able to whitelist PRs by yourself. Want to comment here to test it out?
Jenkins, this is OK to test.
ok to test
Test build #27015 has started for PR 4315 at commit
Test build #27015 has finished for PR 4315 at commit
Test PASSed.
[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API
Author: medale
Closes #4315 from medale/avro-hadoop2 and squashes the following commits:
1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into avro-hadoop2
51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API had been marked as resolved but did not work for at least some builds due to version conflicts using avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.
(cherry picked from commit 75fdccc)
Signed-off-by: Sean Owen