
Conversation

@sunchao (Member) commented Sep 24, 2021

What changes were proposed in this pull request?

Fix an issue where Maven may get stuck in an infinite loop when building Spark with the Hadoop 2.7 profile.

Why are the changes needed?

After re-enabling createDependencyReducedPom for maven-shade-plugin, the Spark build stopped working for the Hadoop 2.7 profile and would get stuck in an infinite loop, likely due to a Maven shade plugin bug similar to https://issues.apache.org/jira/browse/MSHADE-148. This seems to be caused by the fact that, under the hadoop-2.7 profile, the variables hadoop-client-runtime.artifact and hadoop-client-api.artifact are both set to hadoop-client, which triggers the issue.

As a workaround, this changes hadoop-client-runtime.artifact to be hadoop-yarn-api when using hadoop-2.7. Since hadoop-yarn-api is a dependency of hadoop-client, this essentially moves the former to the same level as the latter. It should have no effect as both are dependencies of Spark.
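
For illustration, a minimal sketch of the hadoop-2.7 profile properties after this change (property names are from this PR; the surrounding pom structure is abbreviated):

<profile>
  <id>hadoop-2.7</id>
  <properties>
    <!-- Both properties used to resolve to hadoop-client, which triggered the
         maven-shade-plugin loop; the runtime artifact now points to one of
         hadoop-client's own transitive dependencies instead. -->
    <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
    <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
  </properties>
</profile>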

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A

@sunchao sunchao changed the title [SPARK-36835][BUILD][hadoop-2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom [SPARK-36835][FOLLOWUP][BUILD][hadoop-2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom Sep 24, 2021
@github-actions github-actions bot added the BUILD label Sep 24, 2021
@sunchao (Member Author) commented Sep 24, 2021

Let me test both Hadoop profiles here.

@SparkQA commented Sep 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48120/

launcher/pom.xml Outdated
<!--
Only declare for Hadoop 3.2 profile. Otherwise maven-shade-plugin may get stuck in an infinite
loop building the dependency-reduced pom, perhaps due to a maven bug:
https://issues.apache.org/jira/browse/MSHADE-148
Member

Thank you for the pointer. According to the discussion on that issue, it's marked as resolved, but not fixed yet?

@sunchao (Member Author)

Right, it's not fixed yet. See the last comment in the JIRA.

@dongjoon-hyun (Member)

It looks reasonable to me. I believe we need @gengliangwang's sign-off with Hadoop 2.7 testing.

@gengliangwang gengliangwang changed the title [SPARK-36835][FOLLOWUP][BUILD][hadoop-2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom [SPARK-36835][FOLLOWUP][BUILD][TEST-HADOOP2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom Sep 24, 2021
@gengliangwang (Member)

retest this please

@gengliangwang (Member)

Thank you for the fix, @sunchao.

@SparkQA commented Sep 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48120/

@JoshRosen (Contributor) left a comment

Running

build/mvn -DskipTests -Phadoop-2.7 clean install

didn't work for me: I ran into Enforcer errors related to duplicate dependencies:

[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-no-duplicate-dependencies) @ spark-core_2.12 ---
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BanDuplicatePomDependencyVersions failed with message:
Found 1 duplicate dependency declaration in this project:
 - dependencies.dependency[org.apache.hadoop:hadoop-client:jar] ( 2 times )

It seems like there's some sort of interaction between Enforcer and Maven Shade, since I would have expected the non-dependency-reduced build to also fail with the same duplicate dependencies issue.

In order to get this to work, I had to make a similar change in the core and kafka-0-10-assembly builds: 11b34c2

I should probably test this again with all optional profiles / modules enabled (or just copy the profiles used when publishing).
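
For context, the rule named in the log above is maven-enforcer-plugin's standard banDuplicatePomDependencyVersions rule. A minimal sketch of how such an execution is declared (the exact version and phase binding in Spark's pom are assumptions):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-no-duplicate-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fails the build when the same groupId:artifactId is declared
               twice in a project's <dependencies> section. -->
          <banDuplicatePomDependencyVersions/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>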

@SparkQA commented Sep 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48123/

@SparkQA commented Sep 24, 2021

Test build #143608 has finished for PR 34100 at commit 4456fc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48123/

@sunchao (Member Author) commented Sep 24, 2021

Thanks @JoshRosen! It's interesting that this error is not reported in the CI jobs. So is the enforcer rule only executed in the install phase but not in package?

Let me double-check locally and add the changes to all the necessary modules.

@SparkQA commented Sep 24, 2021

Test build #143611 has finished for PR 34100 at commit 4456fc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the YARN label Sep 24, 2021
@sunchao (Member Author) commented Sep 24, 2021

Updated the PR to use a different name for hadoop-client-runtime.artifact, which is probably a simpler approach. Verified locally with:

build/mvn clean install -DskipTests -Phadoop-2.7 -Phive-2.3 -Pmesos -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Pkubernetes -Phadoop-cloud -Phive

and the build is successful.

  <commons-io.version>2.4</commons-io.version>
  <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
- <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+ <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
@JoshRosen (Contributor) commented Sep 24, 2021

Ahhh, this is a clever fix:

Instead of the hadoop-2.7 profile resulting in a duplicate direct dependency on hadoop-client, we now just declare an explicit dependency on one of hadoop-client's transitive dependencies (hadoop-yarn-api in this case). Anything which depends on hadoop-client-runtime.artifact must also depend on hadoop-client-api.artifact, so this doesn't end up changing the set of dependencies pulled in.
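
(For illustration, a sketch of how a module pom consumes these properties; the exact declarations in Spark's module poms may differ slightly:)

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>${hadoop-client-api.artifact}</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <!-- Under hadoop-2.7 this now resolves to hadoop-yarn-api instead of a
       second hadoop-client declaration, avoiding the duplicate. -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>${hadoop-client-runtime.artifact}</artifactId>
  <version>${hadoop.version}</version>
</dependency>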

It looks like we didn't need to do that for hadoop-client-minicluster.artifact because that's only used in the resource-managers/yarn POM and that's already using Maven profiles to control the dependency selection (so the other workaround is fairly non-invasive in that context). In principle, though, I guess we could have changed that to some other transitive dep.


Could you maybe add a one- or two-line comment above these Hadoop 2.7 lines to explain what's going on? And maybe edit the comment at

spark/pom.xml

Lines 251 to 255 in d73562e

<!--
These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client
when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that,
as a result we have to include the same hadoop-client dependency multiple times in hadoop-2.7.
-->
to reflect this change? This fix is clever but a little subtle, so I think a comment calling it out (and maybe mentioning SPARK-36835) would help future readers.

Edit: could you also update the PR description to reflect this final fix?

@sunchao (Member Author) commented Sep 24, 2021

Thanks for taking a look. Yes, I think it's better to apply the same change to hadoop-client-minicluster.artifact. Let me try that; perhaps we won't need the changes in YARN's pom.xml with this.

The side effect of this seems to be that it affects the distance of these dependencies from the root module, and thus may make a difference when Maven tries to resolve a dependency with multiple versions (see here for reference). I was using hadoop-common (which carries lots of dependencies) instead of hadoop-yarn-api and the build failed to compile.
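
To illustrate the mediation concern, a dependency:tree-style sketch (module and version numbers are illustrative): Maven's "nearest wins" rule picks the declaration at the shallower depth, so promoting an artifact to a direct dependency can change which version is resolved:

spark-core
+- org.apache.hadoop:hadoop-yarn-api:2.7.4        (depth 1: nearest, wins mediation)
\- org.apache.hadoop:hadoop-client:2.7.4
   \- (org.apache.hadoop:hadoop-yarn-api:2.7.4 - omitted for duplicate)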

Will update the PR description and the comment in the above pom.xml.

@sunchao (Member Author)

Actually, it may not be so useful to change hadoop-client-minicluster.artifact, since it is in test scope while the other two are in compile scope by default. For some reason it also changes dev/deps/spark-deps-hadoop-2.7-hive-2.3 when I set it to something like hadoop-mapreduce-client-jobclient.

@SparkQA commented Sep 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48127/

@SparkQA commented Sep 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48127/

@sunchao sunchao force-pushed the SPARK-36835-followup branch from 73ba941 to 0c358b3 Compare September 25, 2021 00:59
@SparkQA commented Sep 25, 2021

Test build #143615 has finished for PR 34100 at commit d73562e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48130/

@SparkQA commented Sep 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48130/

@SparkQA commented Sep 25, 2021

Test build #143618 has finished for PR 34100 at commit 0c358b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

I tried build/mvn -DskipTests -Phadoop-2.7 clean install and it works now. Shall we merge this and start RC5?

@LuciferYang (Contributor)

branch-3.2 also seems to need this fix

@JoshRosen (Contributor) left a comment

This looks good to me, so +1 to merging so we can cut a new RC. Thanks again for the fix!

@gengliangwang (Member)

Merging to master/3.2. Thanks all!

gengliangwang pushed a commit that referenced this pull request Sep 26, 2021
[SPARK-36835][FOLLOWUP][BUILD][TEST-HADOOP2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom

Closes #34100 from sunchao/SPARK-36835-followup.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 937a74e)
Signed-off-by: Gengliang Wang <[email protected]>
@dongjoon-hyun (Member)

Thank you!

@gengliangwang (Member)

@sunchao Unfortunately, the build without Hadoop failed after this one.

$ ./build/mvn clean package -DskipTests -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-provided

[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala:30: object MiniYARNCluster is not a member of package org.apache.hadoop.yarn.server
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala:61: not found: type MiniYARNCluster
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala:104: not found: type MiniYARNCluster
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:37: object resourcemanager is not a member of package org.apache.hadoop.yarn.server
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:38: object resourcemanager is not a member of package org.apache.hadoop.yarn.server
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:39: object resourcemanager is not a member of package org.apache.hadoop.yarn.server
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:40: object resourcemanager is not a member of package org.apache.hadoop.yarn.server
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:250: not found: type RMContext
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:252: not found: type RMApp
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:261: not found: type RMApplicationHistoryWriter
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:263: not found: type SystemMetricsPublisher
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:267: not found: type RMAppManager
[ERROR] [Error] /opt/spark-rm/output/spark-3.2.0-bin-without-hadoop/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala:272: not found: type ClientRMService

Could you fix it?

@sunchao (Member Author) commented Sep 26, 2021

@gengliangwang I'll take a look. Is it caused by this PR?

@sunchao (Member Author) commented Sep 26, 2021

Oooh, I see. It's because activating hadoop-provided deactivates the default hadoop-3.2 profile (Maven turns off an activeByDefault profile once another profile declared in the same pom is activated), so the Hadoop dependencies are not found.
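
A minimal sketch of the activation mechanics (profile ids are from this thread; the pom structure is an assumption): a profile marked activeByDefault is switched off as soon as any other profile in the same pom is activated, e.g. via -P:

<profiles>
  <profile>
    <id>hadoop-3.2</id>
    <activation>
      <!-- Deactivated automatically once another profile declared in this
           same pom, such as hadoop-provided, is activated explicitly. -->
      <activeByDefault>true</activeByDefault>
    </activation>
  </profile>
  <profile>
    <id>hadoop-provided</id>
  </profile>
</profiles>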

@sunchao (Member Author) commented Sep 26, 2021

I think adding -Phadoop-3.2 should make it work, but let me think about how to fix this properly:

build/mvn clean package -DskipTests -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-provided -Phadoop-3.2

@gengliangwang (Member)

@sunchao Yes, it is caused by this one. Again, thanks for looking into this.

@sunchao (Member Author) commented Sep 26, 2021

There are two ways to fix this:

  1. Move spark.yarn.isHadoopProvided to the Spark parent pom, so that hadoop-3.2 can become the default profile in the YARN module's pom (a sketch follows below). I don't see any side effect from this; ideally the property could be made more general, such as spark.isHadoopProvided.
  2. Move hadoop-client-runtime.artifact out of the hadoop-3.2 profile. It should fix the build issue, but someone using -Phadoop-provided to test Hadoop 3.2 could still see failures.

I'm inclined toward option 1 here, but let me know if you have any thoughts.
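
A hypothetical sketch of option 1, hoisting the property and its override into the parent pom so the YARN module no longer needs its own profile for it (names are from the discussion; exact placement is an assumption):

<!-- In the parent pom's <properties>: -->
<spark.yarn.isHadoopProvided>false</spark.yarn.isHadoopProvided>

<!-- In the parent pom's <profiles>: -->
<profile>
  <id>hadoop-provided</id>
  <properties>
    <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
  </properties>
</profile>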

@gengliangwang (Member)

@sunchao Yes, let's try option 1.
