@zhangbutao
Contributor

@zhangbutao zhangbutao commented Oct 10, 2024

What changes were proposed in this pull request?

Upgrade Hadoop version from 3.3.6 to 3.4.1

Why are the changes needed?

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

Yes. See the file:
hadoop_upgrade_dep.txt

How was this patch tested?

Existing Tests.

@slfan1989
Contributor

@zhangbutao LGTM +1.

@Aggarwal-Raghav
Contributor

@zhangbutao, thanks for driving this forward.

For the Tez project:

  1. We need to either exclude the logback jar from Hadoop's transitive dependencies in the Tez project or move Tez to Hadoop 3.4.1; otherwise it can cause classloading issues. IIRC, I hit a problem where the logback jar was picked up first and hive-log4j2.properties was not honoured. If possible, please go through the following:
    a. HADOOP-19084. Prune hadoop-common transitive dependencies (#6574) hadoop#6582 (comment)
    b. HADOOP-19153: hadoop-common exports logback as a dependency (This fix is not in hadoop 3.4.0)
  2. ZooKeeper version: I would prefer to keep it in sync across projects:
    a. Hive => 3.8.4
    b. Hadoop 3.4.0 => 3.8.3
    c. Hadoop 3.4.1 => 3.8.4
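
To check which logging backend actually ends up on a given classpath, a small probe along these lines may help. It is only a sketch: the class `ClasspathProbe` and its `locate` helper are hypothetical names made up for illustration, and the two probed classes are the usual entry points of logback-classic and log4j-core, which I am assuming are the relevant ones here.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper; not part of Hive, Tez or Hadoop.
public class ClasspathProbe {

    // Returns the location a class was loaded from, or empty if it cannot be loaded.
    public static Optional<String> locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // JDK classes typically have no code source
                return Optional.of("<JDK or unknown location>");
            }
            return Optional.of(src.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "ch.qos.logback.classic.Logger",        // logback-classic
            "org.apache.logging.log4j.core.Logger"  // log4j-core
        };
        for (String name : probes) {
            System.out.println(name + " -> " + locate(name).orElse("not on classpath"));
        }
    }
}
```

Run with the same classpath as the Tez or Hive process, this prints which jar supplies each logger class. If the logback line reports a jar, the exclusion is probably still needed.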

For hive:

@zhangbutao
Contributor Author

@Aggarwal-Raghav Thanks for the insightful comments! I will check this later.

@zhangbutao
Contributor Author

I haven't figured out why some qtests changed after upgrading Guava. But the changes in 78357d2 only affect column names in the explain output, so I think the Guava upgrade is acceptable.

@ayushtkn
Member

I am not sure we should chase the Guava upgrade as part of the Hadoop upgrade. I believe we can track that separately.

BTW, Hadoop doesn't use guava version specified in its POM, that is kept only for its transitive dependency. It uses the Guava coming from hadoop-thirdparty (HADOOP-17288), which is 30+ as of today and should be 30+ for 3.4.1 as well, if I am not mistaken.

https://github.com/apache/hadoop-thirdparty/blob/trunk/pom.xml#L101

@Aggarwal-Raghav
Contributor

"Hadoop doesn't use guava version specified in its POM, that is kept only for its transitive dependency."

Oh, I was not aware of this. Then maybe we can track it in a separate ticket.

Just FYI: in our codebase we have Guava 32.0.1-jre in Tez (0.10.3), Hadoop (3.3.6) and Hive (4.0.0), and I didn't observe any UT failures there. Something to investigate on my end.

@zhangbutao
Contributor Author

Guava 32.0.1-jre would cause lots of qtest failures, including some class-not-found exceptions.

Guava 27.0-jre would cause some minor explain qtest changes. https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5500/9/pipeline/

So maybe we can upgrade to 27.0 first and then consider upgrading to 32.0.

In short, it makes more sense to study the Guava version carefully in a subsequent ticket before upgrading.

I will revert the guava upgrade in this PR. @Aggarwal-Raghav @ayushtkn

Member

@ayushtkn ayushtkn left a comment

Minor comments; the rest looks good.

Member

@ayushtkn ayushtkn left a comment

LGTM

@ayushtkn ayushtkn merged commit fdd48ef into apache:master Feb 10, 2025
6 checks passed
7 participants