@medale
Contributor

@medale medale commented Feb 2, 2015

...ns avro-mapred for

hadoop 1 API had been marked as resolved but did not work for at least some
builds due to version conflicts using avro-mapred-1.7.5.jar and
avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.

In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on avro-mapred 1.7.5:

Building Spark Project Hive 1.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this dependency allows the explicitly listed avro-mapred dependency
to be picked up.
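
The diff itself is not reproduced in this thread, so the following is only a minimal sketch of what such an exclusion could look like on the hive-exec dependency in sql/hive/pom.xml. It assumes the hive-exec version is inherited from the dependencyManagement section of the root pom.xml, and it uses the group ID shown in the dependency:tree output above.

```xml
<!-- Sketch only: assumes the version comes from dependencyManagement in the root pom.xml -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <!-- Drop the transitive avro-mapred 1.7.5 so that only the explicitly
         declared avro-mapred (with its classifier) remains in the tree -->
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Maven exclusions match on groupId and artifactId only, so this removes the transitive avro-mapred regardless of its version or classifier.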

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Feb 3, 2015

OK, so this works because spark-hive brings this artifact back in, but with the appropriate classifier? I suppose Maven treats artifacts with different classifiers separately. If so and you verify that the dependency:tree change is what you expect with both Hadoop 1 and 2, LGTM.

@medale
Contributor Author

medale commented Feb 4, 2015

The problem was that the Spark project hive-exec 0.13.1a depends on

<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>${avro.version}</version>
</dependency>

(see http://central.maven.org/maven2/org/spark-project/hive/hive-exec/0.13.1a/hive-exec-0.13.1a.pom)

Its parent defines avro.version as 1.7.5:

<avro.version>1.7.5</avro.version>

(see http://central.maven.org/maven2/org/spark-project/hive/hive/0.13.1a/hive-0.13.1a.pom)

hive-exec is referenced as a dependency in only two POMs:

find . -name pom.xml | xargs grep hive-exec
pom.xml (where we define it in dependencyManagement section)
sql/hive/pom.xml (in actual dependencies)

In sql/hive/pom.xml we also have an explicit dependency on:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <classifier>${avro.mapred.classifier}</classifier>
    </dependency>

Therefore, if we choose a profile that does not define avro.mapred.classifier,
the classifier is left empty (see <avro.mapred.classifier></avro.mapred.classifier> in the main pom.xml)
and we pull avro-mapred-1.7.6.jar (identical to avro-mapred-1.7.6-hadoop1.jar), as it should be.
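
As a rough sketch of that default, the relevant properties in the main pom.xml would look something like the fragment below. The empty classifier element is quoted from the main pom.xml; the avro.version value of 1.7.6 is an assumption inferred from the jar names that get resolved, not copied from the file.

```xml
<properties>
  <!-- Assumed value, inferred from the resolved avro-mapred-1.7.6*.jar artifacts -->
  <avro.version>1.7.6</avro.version>
  <!-- Empty by default: resolves avro-mapred-1.7.6.jar (hadoop1 API) -->
  <avro.mapred.classifier></avro.mapred.classifier>
</properties>
```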

If we choose a profile like hadoop-2.4, the classifier is set to hadoop2 and we pull
avro-mapred-1.7.6-hadoop2.jar, as it should be:

    <profile>
      <id>hadoop-2.4</id>
      <properties>
        <hadoop.version>2.4.0</hadoop.version>
        <protobuf.version>2.5.0</protobuf.version>
        <jets3t.version>0.9.0</jets3t.version>
        <hbase.version>0.98.7-hadoop2</hbase.version>
        <commons.math3.version>3.1.1</commons.math3.version>
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
      </properties>
    </profile>

However, with the changes in 1.3.0-SNAPSHOT, avro-mapred's scope is now defined as:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <classifier>${avro.mapred.classifier}</classifier>
      <scope>${hive.deps.scope}</scope>
    </dependency>

That scope is set as follows:
main pom.xml: <hive.deps.scope>compile</hive.deps.scope>
assembly/pom.xml: <hive.deps.scope>provided</hive.deps.scope>
examples/pom.xml: <hive.deps.scope>provided</hive.deps.scope>

The same applies to hive-exec. So competing avro-mapred classes will no longer be included in spark-assembly.jar. The avro-mapred classes are not on the Hadoop classpath (only core Avro is), so they need to be supplied by the job, which will be new for Avro users. Even so, excluding avro-mapred from the hive-exec dependency and explicitly specifying avro-mapred 1.7.6 with the correct classifier will be necessary if anything like the Maven Enforcer plugin is ever run.
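
To illustrate the enforcer point: the fragment below is a hypothetical configuration, not something the Spark build in this PR contains. It wires the Maven Enforcer plugin's dependencyConvergence rule into a build; the execution id is made up for the example.

```xml
<!-- Hypothetical sketch: not part of this PR or of the Spark build as discussed here -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <!-- Illustrative id, chosen for this example -->
      <id>enforce-dependency-convergence</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fails the build when the dependency tree does not converge
               on a single version of a dependency -->
          <dependencyConvergence/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A rule like this is the kind of check that would flag avro-mapred 1.7.5 arriving through hive-exec alongside the explicitly declared 1.7.6.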

When building with -Phadoop-2.4, my local Maven repo contains only avro-mapred-1.7.6-hadoop2.jar. When building with -Phadoop-0.23, the local repo contains only avro-mapred-1.7.6.jar, as expected.
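
For jobs that now have to supply avro-mapred themselves, a minimal sketch of the dependency in a user's own pom.xml might look like this. It assumes the job targets a Hadoop 2 build of Spark; the version and classifier mirror the artifact named above.

```xml
<!-- Sketch for a user job's pom.xml, assuming a Hadoop 2 cluster -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.6</version>
  <!-- Omit the classifier to get avro-mapred-1.7.6.jar (hadoop1 API) instead -->
  <classifier>hadoop2</classifier>
</dependency>
```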

@cjnolet
Member

cjnolet commented Feb 6, 2015

LGTM

@srowen
Member

srowen commented Feb 6, 2015

LGTM, but looks like Jenkins hasn't tested this yet. Let me see if I am able to whitelist you:

@srowen
Member

srowen commented Feb 6, 2015

Jenkins, this is OK to test.

@srowen
Member

srowen commented Feb 7, 2015

@pwendell or @JoshRosen could you tell Jenkins this is OK to test? I'm not on the VIP list.

@JoshRosen
Contributor

@srowen I've granted you admin permissions on the Spark pull request builder, so you should now be able to whitelist PRs by yourself. Want to comment here to test it out?

@srowen
Member

srowen commented Feb 7, 2015

Jenkins, this is OK to test.

@srowen
Member

srowen commented Feb 7, 2015

ok to test

@SparkQA

SparkQA commented Feb 7, 2015

Test build #27015 has started for PR 4315 at commit 1ab4fa3.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 8, 2015

Test build #27015 has finished for PR 4315 at commit 1ab4fa3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27015/

asfgit pushed a commit that referenced this pull request Feb 8, 2015

Author: medale <[email protected]>

Closes #4315 from medale/avro-hadoop2 and squashes the following commits:

1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into avro-hadoop2
51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API had been marked as resolved but did not work for at least some builds due to version conflicts using avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.

(cherry picked from commit 75fdccc)
Signed-off-by: Sean Owen <[email protected]>
@asfgit asfgit closed this in 75fdccc Feb 8, 2015