Commit 66a3653

[SPARK-44678][BUILD][3.5] Downgrade Hadoop to 3.3.4
### What changes were proposed in this pull request?

This PR aims to downgrade the Apache Hadoop dependency to 3.3.4 in `Apache Spark 3.5` in order to prevent any regression from `Apache Spark 3.4.x`. In other words, although `Apache Spark 3.5.x` will lose many bug fixes of Apache Hadoop 3.3.5 and 3.3.6, it will be in the same situation as `Apache Spark 3.4.x`.

- SPARK-44197 Upgrade Hadoop to 3.3.6 (#41744)
- SPARK-42913 Upgrade Hadoop to 3.3.5 (#39124)
- SPARK-43448 Remove dummy dependency `hadoop-openstack` (#41133)

On top of reverting SPARK-44197 and SPARK-42913, this PR has an additional dependency exclusion change due to the following.

- SPARK-43880 Organize `hadoop-cloud` in standard maven project structure (#41380)

### Why are the changes needed?

There is a community report on S3A committer performance regression. Although it's a one-line fix, there is no available Hadoop release with that fix at this time.

- HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer (apache/hadoop#5706)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #42345 from dongjoon-hyun/SPARK-44678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
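
The commit message only gives the title of HADOOP-18757, so the following is a hedged illustration rather than the actual Hadoop code. It assumes the regression is related to a general JDK behavior: a `java.util.concurrent.ThreadPoolExecutor` with an unbounded queue only starts extra threads up to `corePoolSize`, so a pool with `corePoolSize = 0` ends up running everything on a single worker. The object and method names (`CorePoolSizeSketch`, `distinctWorkerThreads`) are hypothetical.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object CorePoolSizeSketch {
  /** Runs `tasks` short tasks and returns how many distinct threads executed them. */
  def distinctWorkerThreads(corePoolSize: Int, maxPoolSize: Int, tasks: Int): Int = {
    val pool = new ThreadPoolExecutor(
      corePoolSize, maxPoolSize, 60L, TimeUnit.SECONDS,
      new LinkedBlockingQueue[Runnable]()) // unbounded queue: offers never fail
    val seen = ConcurrentHashMap.newKeySet[String]()
    val done = new CountDownLatch(tasks)
    try {
      (1 to tasks).foreach { _ =>
        pool.execute(() => {
          seen.add(Thread.currentThread().getName) // record which worker ran the task
          Thread.sleep(20)
          done.countDown()
        })
      }
      done.await() // wait until every task has finished
      seen.size
    } finally pool.shutdown()
  }
}
```

Under these assumptions, `CorePoolSizeSketch.distinctWorkerThreads(0, 4, 8)` reports one worker thread, while `CorePoolSizeSketch.distinctWorkerThreads(4, 4, 8)` reports four, which is the kind of difference a `corePoolSize` bump would make.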
1 parent 981cf80 commit 66a3653

4 files changed: +26 -14 lines changed

dev/deps/spark-deps-hadoop-3-hive-2.3

Lines changed: 14 additions & 12 deletions
@@ -24,7 +24,7 @@ audience-annotations/0.5.0//audience-annotations-0.5.0.jar
 avro-ipc/1.11.2//avro-ipc-1.11.2.jar
 avro-mapred/1.11.2//avro-mapred-1.11.2.jar
 avro/1.11.2//avro-1.11.2.jar
-aws-java-sdk-bundle/1.12.367//aws-java-sdk-bundle-1.12.367.jar
+aws-java-sdk-bundle/1.12.262//aws-java-sdk-bundle-1.12.262.jar
 azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar
 azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar
 azure-storage/7.0.1//azure-storage-7.0.1.jar
@@ -66,16 +66,17 @@ gcs-connector/hadoop3-2.2.14/shaded/gcs-connector-hadoop3-2.2.14-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-hadoop-aliyun/3.3.6//hadoop-aliyun-3.3.6.jar
-hadoop-annotations/3.3.6//hadoop-annotations-3.3.6.jar
-hadoop-aws/3.3.6//hadoop-aws-3.3.6.jar
-hadoop-azure-datalake/3.3.6//hadoop-azure-datalake-3.3.6.jar
-hadoop-azure/3.3.6//hadoop-azure-3.3.6.jar
-hadoop-client-api/3.3.6//hadoop-client-api-3.3.6.jar
-hadoop-client-runtime/3.3.6//hadoop-client-runtime-3.3.6.jar
-hadoop-cloud-storage/3.3.6//hadoop-cloud-storage-3.3.6.jar
+hadoop-aliyun/3.3.4//hadoop-aliyun-3.3.4.jar
+hadoop-annotations/3.3.4//hadoop-annotations-3.3.4.jar
+hadoop-aws/3.3.4//hadoop-aws-3.3.4.jar
+hadoop-azure-datalake/3.3.4//hadoop-azure-datalake-3.3.4.jar
+hadoop-azure/3.3.4//hadoop-azure-3.3.4.jar
+hadoop-client-api/3.3.4//hadoop-client-api-3.3.4.jar
+hadoop-client-runtime/3.3.4//hadoop-client-runtime-3.3.4.jar
+hadoop-cloud-storage/3.3.4//hadoop-cloud-storage-3.3.4.jar
+hadoop-openstack/3.3.4//hadoop-openstack-3.3.4.jar
 hadoop-shaded-guava/1.1.1//hadoop-shaded-guava-1.1.1.jar
-hadoop-yarn-server-web-proxy/3.3.6//hadoop-yarn-server-web-proxy-3.3.6.jar
+hadoop-yarn-server-web-proxy/3.3.4//hadoop-yarn-server-web-proxy-3.3.4.jar
 hive-beeline/2.3.9//hive-beeline-2.3.9.jar
 hive-cli/2.3.9//hive-cli-2.3.9.jar
 hive-common/2.3.9//hive-common-2.3.9.jar
@@ -115,6 +116,7 @@ janino/3.1.9//janino-3.1.9.jar
 javassist/3.29.2-GA//javassist-3.29.2-GA.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javolution/5.5.1//javolution-5.5.1.jar
+jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
 jcl-over-slf4j/2.0.7//jcl-over-slf4j-2.0.7.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
@@ -125,7 +127,7 @@ jersey-container-servlet-core/2.40//jersey-container-servlet-core-2.40.jar
 jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar
 jersey-hk2/2.40//jersey-hk2-2.40.jar
 jersey-server/2.40//jersey-server-2.40.jar
-jettison/1.5.4//jettison-1.5.4.jar
+jettison/1.1//jettison-1.1.jar
 jetty-util-ajax/9.4.51.v20230217//jetty-util-ajax-9.4.51.v20230217.jar
 jetty-util/9.4.51.v20230217//jetty-util-9.4.51.v20230217.jar
 jline/2.14.6//jline-2.14.6.jar
@@ -246,7 +248,7 @@ threeten-extra/1.7.1//threeten-extra-1.7.1.jar
 tink/1.9.0//tink-1.9.0.jar
 transaction-api/1.1//transaction-api-1.1.jar
 univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
-wildfly-openssl/1.1.3.Final//wildfly-openssl-1.1.3.Final.jar
+wildfly-openssl/1.0.7.Final//wildfly-openssl-1.0.7.Final.jar
 xbean-asm9-shaded/4.23//xbean-asm9-shaded-4.23.jar
 xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
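
To review a manifest change like the one above, it can help to list only the artifacts whose version moved. The following is a minimal sketch, assuming each manifest line has the shape `name/version/classifier/jarFile` (inferred from the entries shown, where the classifier is usually empty). `DepsManifestSketch`, `Dep`, `parse`, and `versionChanges` are hypothetical names, not part of the Spark build tooling.

```scala
object DepsManifestSketch {
  // One manifest entry: "name/version/classifier/jarFile" (assumed layout).
  final case class Dep(name: String, version: String, classifier: String, jar: String)

  /** Parses one manifest line; returns None when it does not have four fields. */
  def parse(line: String): Option[Dep] =
    line.trim.split("/", -1) match { // limit -1 keeps the empty classifier field
      case Array(name, version, classifier, jar) => Some(Dep(name, version, classifier, jar))
      case _ => None
    }

  /** Returns (name, oldVersion, newVersion) for artifacts present in both lists with different versions. */
  def versionChanges(before: Seq[String], after: Seq[String]): Seq[(String, String, String)] = {
    val oldVersions = before.flatMap(parse).map(d => d.name -> d.version).toMap
    after.flatMap(parse).collect {
      case d if oldVersions.get(d.name).exists(_ != d.version) =>
        (d.name, oldVersions(d.name), d.version)
    }
  }
}
```

Newly added entries such as `hadoop-openstack` and `jaxb-api` are intentionally not reported by `versionChanges`, since they have no earlier version to compare against.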

hadoop-cloud/pom.xml

Lines changed: 10 additions & 0 deletions
@@ -134,6 +134,12 @@
       <artifactId>hadoop-azure</artifactId>
       <version>${hadoop.version}</version>
       <scope>${hadoop.deps.scope}</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.codehaus.jackson</groupId>
+          <artifactId>jackson-core-asl</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <!--
       There's now a hadoop-cloud-storage which transitively pulls in the store JARs,
@@ -145,6 +151,10 @@
       <version>${hadoop.version}</version>
       <scope>${hadoop.deps.scope}</scope>
       <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-common</artifactId>
+        </exclusion>
         <exclusion>
           <!--
             This is a code coverage library introduced by aliyun-java-sdk-core, only for testing

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@
     <slf4j.version>2.0.7</slf4j.version>
     <log4j.version>2.20.0</log4j.version>
     <!-- make sure to update IsolatedClientLoader whenever this version is changed -->
-    <hadoop.version>3.3.6</hadoop.version>
+    <hadoop.version>3.3.4</hadoop.version>
     <!-- SPARK-41247: When updating `protobuf.version`, also need to update `protoVersion` in `SparkBuild.scala` -->
     <protobuf.version>3.23.4</protobuf.version>
     <protoc-jar-maven-plugin.version>3.11.4</protoc-jar-maven-plugin.version>

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ private[hive] object IsolatedClientLoader extends Logging {
         case e: RuntimeException if e.getMessage.contains("hadoop") =>
           // If the error message contains hadoop, it is probably because the hadoop
           // version cannot be resolved.
-          val fallbackVersion = "3.3.6"
+          val fallbackVersion = "3.3.4"
           logWarning(s"Failed to resolve Hadoop artifacts for the version $hadoopVersion. We " +
             s"will change the hadoop version from $hadoopVersion to $fallbackVersion and try " +
             "again. It is recommended to set jars used by Hive metastore client through " +

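The hunk above only shows the changed constant, so here is a minimal sketch of the retry-with-fallback pattern it belongs to, not the actual `IsolatedClientLoader` code. `FallbackSketch`, `resolveWithFallback`, and the `resolve` parameter are hypothetical; the sketch also guards against a null exception message, which the original condition does not do.

```scala
object FallbackSketch {
  // Must stay in sync with <hadoop.version> in pom.xml, per the comment in that file.
  val fallbackVersion = "3.3.4"

  /**
   * Calls `resolve(hadoopVersion)`; if it throws a RuntimeException whose message
   * mentions "hadoop", retries once with `fallbackVersion`.
   * Returns the version that was actually used together with the result.
   */
  def resolveWithFallback[A](hadoopVersion: String)(resolve: String => A): (String, A) =
    try (hadoopVersion, resolve(hadoopVersion))
    catch {
      case e: RuntimeException if Option(e.getMessage).exists(_.contains("hadoop")) =>
        (fallbackVersion, resolve(fallbackVersion))
    }
}
```

A caller would pass its own resolver, for example `FallbackSketch.resolveWithFallback("3.3.6")(v => downloadJars(v))`, where `downloadJars` stands in for whatever artifact resolution the caller uses.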