-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-33212][FOLLOW-UP][BUILD] Bring back duplicate dependency check and add more strict Hadoop version check #31203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
xkrogen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fast follow-on Chao!
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #134130 has finished for PR 31203 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #134146 has finished for PR 31203 at commit
|
b5d82b7 to
8650a73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably out of the scope of this PR. I'll open a new one if we agree this is the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a good improvement to me, that table is pretty unsightly as-is.
8650a73 to
6657c87
Compare
xkrogen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should go into VersionUtils? That does claim to be specifically for working with Spark version strings, but it seems relevant...
| def supportHadoopShadedClient(hadoopVersion: String): Boolean = { | ||
| getVersionParts(hadoopVersion).exists { | ||
| case (3, 2, v) if v >= 2 => true | ||
| case _ => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe
case (maj, _, _) if maj > 3 => true
case (3, min, _) if min > 2 => true
case (3, 2, patch) if patch >=2 => true
Seems like we can reasonably assume that future versions of Hadoop will support the shaded client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd better to wait until the future versions come out before changing this (so that we can verify firs). For instance, Hadoop 3.3.0 currently doesn't support shaded client (due to the hadoop-aws issue). But yeah the Hadoop 3.2.2+ should support the shaded client assuming there's no regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting... I just worry about forgetting to update this if/when we bump the Hadoop version in the future and causing a regression. Has the hadoop-aws fix made it to be targeted for Hadoop 3.3.1? If so, can we reasonably assume that 3.2.2+, 3.3.1+, and 3.4.0+ will have it?
It seems you're more tied into what's happening in the Hadoop world than I am these days so I'll take your word in either direction. If we decide not to future-proof it, can we create a follow-up JIRA to revisit it once some future release is out at which time we would be confident in putting a wildcard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your concern is valid. One thing we can do is perhaps adding a test to make sure that the built-in Hadoop version is always compatible with the shaded client. So that in future if we upgrade Hadoop version & forget to do this, the test will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes we can assume that 3.2.2+, 3.3.1+ and 3.4.0+ will all have the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent idea on adding a compatibility test for the built-in Hadoop version!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a good improvement to me, that table is pretty unsightly as-is.
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
Outdated
Show resolved
Hide resolved
Oops I wasn't even aware of |
|
Test build #134382 has finished for PR 31203 at commit
|
- move version parsing to `VersionUtils` - add compatibility test - use `Option` for null
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #134405 has finished for PR 31203 at commit
|
core/src/test/scala/org/apache/spark/util/VersionUtilsSuite.scala
Outdated
Show resolved
Hide resolved
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HadoopVersionInfoSuite.scala
Outdated
Show resolved
Hide resolved
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
Outdated
Show resolved
Hide resolved
|
|
||
| /** | ||
| * Retrieves the major, minor and patch parts from the input `version`. Returns `None` if the | ||
| * input is not of a valid format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should mention that minor/patch versions are filled in as 0 if they're not found. This is different from the behavior of other methods in this class (e.g. majorMinor will give an error if minor is not present)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #134499 has finished for PR 31203 at commit
|
|
Changes LGTM thanks @sunchao ! |
|
Thanks @xkrogen for the thorough review! @dongjoon-hyun @viirya could you take a look? |
| def supportsHadoopShadedClient(hadoopVersion: String): Boolean = { | ||
| VersionUtils.majorMinorPatchVersion(hadoopVersion).exists { | ||
| case (3, 2, v) if v >= 2 => true | ||
| case _ => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I agree that we need to change this again.
And yes we can assume that 3.2.2+, 3.3.1+ and 3.4.0+ will all have the fix.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM (with only one minor comment about comment typo)
| val extraExclusions = if (hadoopVersion.startsWith("3")) { | ||
| // this introduced from lower version of Hive could conflict with jars in Hadoop 3.2+, so | ||
| // exclude here in favor of the ones in Hadoop 3.2+ | ||
| Seq("org.apache.hadoop:hadoop-auth") | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to exclude hadoop-auth anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good with one question.
|
Since the last question is addressed, I'll merge this. Thanks! |
…uld support shaded client" for hadoop-2.7 ### What changes were proposed in this pull request? We added test "built-in Hadoop version should support shaded client" in #31203, but it fails when profile hadoop-2.7 is activated. This change fixes the test by skipping the assertion when Hadoop version is 2. ### Why are the changes needed? The test fails in master branch when profile hadoop-2.7 is activated. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Ran the test with hadoop-2.7 profile. Closes #31391 from bozhang2820/fix-hadoop-2-version-test. Authored-by: Bo Zhang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
… and add more strict Hadoop version check ### What changes were proposed in this pull request? 1. Add back Maven enforcer for duplicate dependencies check 2. More strict check on Hadoop versions which support shaded client in `IsolatedClientLoader`. To do proper version check, this adds a util function `majorMinorPatchVersion` to extract major/minor/patch version from a string. 3. Cleanup unnecessary code ### Why are the changes needed? The Maven enforcer was removed as part of apache#30556. This proposes to add it back. Also, Hadoop shaded client doesn't work in certain cases (see [these comments](apache#30701 (comment)) for details). This strictly checks that the current Hadoop version (i.e., 3.2.2 at the moment) has good support of shaded client or otherwise fallback to old unshaded ones. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes apache#31203 from sunchao/SPARK-33212-followup. Lead-authored-by: Chao Sun <[email protected]> Co-authored-by: Chao Sun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…uld support shaded client" for hadoop-2.7 ### What changes were proposed in this pull request? We added test "built-in Hadoop version should support shaded client" in apache#31203, but it fails when profile hadoop-2.7 is activated. This change fixes the test by skipping the assertion when Hadoop version is 2. ### Why are the changes needed? The test fails in master branch when profile hadoop-2.7 is activated. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Ran the test with hadoop-2.7 profile. Closes apache#31391 from bozhang2820/fix-hadoop-2-version-test. Authored-by: Bo Zhang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
… and add more strict Hadoop version check 1. Add back Maven enforcer for duplicate dependencies check 2. More strict check on Hadoop versions which support shaded client in `IsolatedClientLoader`. To do proper version check, this adds a util function `majorMinorPatchVersion` to extract major/minor/patch version from a string. 3. Cleanup unnecessary code The Maven enforcer was removed as part of apache#30556. This proposes to add it back. Also, Hadoop shaded client doesn't work in certain cases (see [these comments](apache#30701 (comment)) for details). This strictly checks that the current Hadoop version (i.e., 3.2.2 at the moment) has good support of shaded client or otherwise fallback to old unshaded ones. No. Existing tests. Closes apache#31203 from sunchao/SPARK-33212-followup. Lead-authored-by: Chao Sun <[email protected]> Co-authored-by: Chao Sun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…uld support shaded client" for hadoop-2.7 ### What changes were proposed in this pull request? We added test "built-in Hadoop version should support shaded client" in apache#31203, but it fails when profile hadoop-2.7 is activated. This change fixes the test by skipping the assertion when Hadoop version is 2. ### Why are the changes needed? The test fails in master branch when profile hadoop-2.7 is activated. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Ran the test with hadoop-2.7 profile. Closes apache#31391 from bozhang2820/fix-hadoop-2-version-test. Authored-by: Bo Zhang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
IsolatedClientLoader. To do proper version check, this adds a util functionmajorMinorPatchVersionto extract major/minor/patch version from a string.Why are the changes needed?
The Maven enforcer was removed as part of #30556. This proposes to add it back.
Also, Hadoop shaded client doesn't work in certain cases (see these comments for details). This strictly checks that the current Hadoop version (i.e., 3.2.2 at the moment) has good support of shaded client or otherwise fallback to old unshaded ones.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.