[SPARK-36835][FOLLOWUP][BUILD][TEST-HADOOP2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom #34100
Conversation
let me test both Hadoop profiles here

Kubernetes integration test starting
launcher/pom.xml (outdated):

<!--
  Only declare for Hadoop 3.2 profile. Otherwise maven-shade-plugin may get stuck in an infinite
  loop building the dependency-reduced pom, perhaps due to a maven bug:
  https://issues.apache.org/jira/browse/MSHADE-148
-->
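For context on what "only declare for Hadoop 3.2 profile" looks like in practice, here is a minimal sketch of a dependency declared only under a Maven profile; the exact artifact, version property, and placement in launcher/pom.xml are assumptions for illustration:

<profiles>
  <profile>
    <id>hadoop-3.2</id>
    <dependencies>
      <!-- Declared only under this profile so maven-shade-plugin does not
           hit the MSHADE-148-style infinite loop while building the
           dependency-reduced pom under other profiles. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>${hadoop-client-api.artifact}</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>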
Thank you for the pointer. According to the discussion on that issue, it's marked as resolved, but not fixed yet?
Right, it's not fixed yet. See the last comment in the JIRA.
It looks reasonable to me. I believe we need @gengliangwang's sign-off with Hadoop 2.7 testing.

retest this please

Thank you for the fix @sunchao

Kubernetes integration test status failure
Running
build/mvn -DskipTests -Phadoop-2.7 clean install
didn't work for me: I ran into Enforcer errors related to duplicate dependencies:
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-no-duplicate-dependencies) @ spark-core_2.12 ---
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BanDuplicatePomDependencyVersions failed with message:
Found 1 duplicate dependency declaration in this project:
- dependencies.dependency[org.apache.hadoop:hadoop-client:jar] ( 2 times )
It seems like there's some sort of interaction between Enforcer and Maven Shade, since I would have expected the non-dependency-reduced build to also fail with the same duplicate dependencies issue.
In order to get this to work, I had to make a similar change in the core and kafka-0-10-assembly builds: 11b34c2
I should probably test this again with all optional profiles / modules enabled (or just copy the profiles used when publishing).
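The rule being triggered here is Enforcer's built-in banDuplicatePomDependencyVersions. For reference, a minimal sketch of how such an execution is typically wired up in a pom (Spark's actual configuration may differ):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.0.0-M2</version>
  <executions>
    <execution>
      <!-- Execution id matching the one shown in the log output above. -->
      <id>enforce-no-duplicate-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fails the build when a pom declares the same
               groupId:artifactId dependency more than once. -->
          <banDuplicatePomDependencyVersions/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>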
Kubernetes integration test starting

Test build #143608 has finished for PR 34100 at commit

Kubernetes integration test status failure
Thanks @JoshRosen! It's interesting that this error is not reported in the CI jobs, so the enforcer rule is only executed in … Let me double verify locally and add the changes to all the necessary modules.
Test build #143611 has finished for PR 34100 at commit
Updated the PR to use a different name for hadoop-client-runtime.artifact, and the build is successful.
  <commons-io.version>2.4</commons-io.version>
  <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
- <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+ <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
Ahhh, this is a clever fix:
Instead of the hadoop-2.7 profile resulting in a duplicate direct dependency on hadoop-client, we now just declare an explicit dependency on one of hadoop-client's transitive dependencies (hadoop-yarn-api in this case). Anything which depends on hadoop-client-runtime.artifact must also depend on hadoop-client-api.artifact, so this doesn't end up changing the set of dependencies pulled in.
It looks like we didn't need to do that for hadoop-client-minicluster.artifact because that's only used in the resource-managers/yarn POM and that's already using Maven profiles to control the dependency selection (so the other workaround is fairly non-invasive in that context). In principle, though, I guess we could have changed that to some other transitive dep.
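To make the indirection concrete, a sketch of how these property-based artifact names are presumably consumed in the shared dependency declarations (the version property and exact coordinates are illustrative assumptions):

<!-- The same declarations serve every Hadoop profile; only the property
     values differ between hadoop-2.7 and hadoop-3.2. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>${hadoop-client-api.artifact}</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>${hadoop-client-runtime.artifact}</artifactId>
  <version>${hadoop.version}</version>
</dependency>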
Could you maybe add a one- or two-line comment above these Hadoop 2.7 lines to explain what's going on? And maybe edit the comment at
Lines 251 to 255 in d73562e:

<!--
  These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client
  when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that,
  as a result, we have to include the same hadoop-client dependency multiple times in hadoop-2.7.
-->
Edit: could you also update the PR description to reflect this final fix?
Thanks for taking a look. Yes, I think it's better to apply the same for hadoop-client-minicluster.artifact. Let me try that, and perhaps we won't need the changes in YARN's pom.xml with this.

The side effect of this seems to be that it affects the distance of these dependencies from the root module, and thus may make a difference when Maven tries to resolve a dependency with multiple versions (see here for reference). I was using hadoop-common (which carries lots of dependencies) instead of hadoop-yarn-api and it was not able to compile.

Will update the PR description and the comment in the above pom.xml.
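For background on the "distance" point: Maven mediates version conflicts with a nearest-wins strategy, preferring the declaration with the fewest hops from the root pom. A hypothetical illustration (the version property and the conflict scenario are made up):

<!-- Hypothetical: if hadoop-yarn-api also arrives transitively through
     hadoop-client at depth 2, this direct declaration sits at depth 1 and
     wins any version conflict. Promoting a heavyweight artifact such as
     hadoop-common the same way pulls many more transitive dependencies
     closer to the root, which can flip other mediation decisions - a
     plausible reason that variant failed to compile. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>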
Actually it may not be so useful to change hadoop-client-minicluster.artifact since it is test scope while the other two are compile scope by default. For some reason it also changes dev/deps/spark-deps-hadoop-2.7-hive-2.3 when I set it to something like hadoop-mapreduce-client-jobclient.
Kubernetes integration test starting

Kubernetes integration test status failure

73ba941 to 0c358b3
Test build #143615 has finished for PR 34100 at commit
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #143618 has finished for PR 34100 at commit

I tried …

Is there any change in your previous opinion, @JoshRosen?
branch-3.2 also seems to need this fix |
JoshRosen left a comment:
This looks good to me, so +1 to merging so we can cut a new RC. Thanks again for the fix!
Merging to master/3.2. Thanks all!
[SPARK-36835][FOLLOWUP][BUILD][TEST-HADOOP2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom

Closes #34100 from sunchao/SPARK-36835-followup.
Authored-by: Chao Sun <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 937a74e)
Signed-off-by: Gengliang Wang <[email protected]>
Thank you!

@sunchao Unfortunately, the build without Hadoop failed after this one. Could you fix it?

@gengliangwang will take a look - is it caused by this PR?

oooh I see, it is because only one active profile is allowed, and thus when …

I think adding …

@sunchao Yes, it is caused by this one. Again, thanks for looking into this.

There are two ways to fix this: …

I'm inclined to option 1) here, but let me know if you have any thoughts on this.

@sunchao yes, let's try option 1
What changes were proposed in this pull request?

Fix an issue where Maven may get stuck in an infinite loop when building Spark with the Hadoop 2.7 profile.

Why are the changes needed?

After re-enabling createDependencyReducedPom for maven-shade-plugin, the Spark build stopped working for the Hadoop 2.7 profile and would get stuck in an infinite loop, likely due to a Maven shade plugin bug similar to https://issues.apache.org/jira/browse/MSHADE-148. This seems to be caused by the fact that, under the hadoop-2.7 profile, the variables hadoop-client-runtime.artifact and hadoop-client-api.artifact are both hadoop-client, which triggers the issue.

As a workaround, this changes hadoop-client-runtime.artifact to be hadoop-yarn-api when using hadoop-2.7. Since hadoop-yarn-api is a dependency of hadoop-client, this essentially moves the former to the same level as the latter. It should have no effect, as both are dependencies of Spark.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A
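Putting it together, the hadoop-2.7 profile ends up overriding the artifact-name properties roughly as follows (a sketch based on the diff discussed above; other profile contents omitted):

<profile>
  <id>hadoop-2.7</id>
  <properties>
    <!-- Hadoop 2.x ships no shaded client jars, so fall back to
         hadoop-client for the API artifact... -->
    <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
    <!-- ...and use one of its transitive dependencies for the runtime
         artifact, so hadoop-client is not declared twice (the duplicate
         declaration made maven-shade-plugin loop forever while building
         the dependency-reduced pom). -->
    <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
  </properties>
</profile>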