[SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work #30508

dongjoon-hyun · 2020-11-25T22:57:24Z

What changes were proposed in this pull request?

This mostly reverts SPARK-33212 (cb3fa6c), with three exceptions:

SparkSubmitUtils, which was recently updated by SPARK-33580.
resource-managers/yarn/pom.xml, which was recently updated by SPARK-33104 to add the hadoop-yarn-server-resourcemanager test dependency.
The com.fasterxml.jackson.module:jackson-module-jaxb-annotations dependency in the K8s module, which was recently updated by SPARK-33471, is adjusted.
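Which Hadoop client flavor a build bundles can be seen in Spark's dependency manifests under dev/deps (for example dev/deps/spark-deps-hadoop-3.2-hive-2.3), which list one jar per line. The helper below is a hypothetical sketch, not part of Spark. It assumes each manifest line has the form artifact/version//jar-file, and that a shaded build lists hadoop-client-api while an unshaded build lists plain hadoop-client.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): report which Hadoop client flavor
# a dependency manifest lists. Assumes lines look like artifact/version//jar.
hadoop_client_flavor() {
  local manifest="$1"
  # Compare the artifact name (the text before the first '/') exactly, so that
  # "hadoop-client" does not accidentally match "hadoop-client-api".
  if awk -F/ '$1 == "hadoop-client-api" { found = 1 } END { exit !found }' "$manifest"; then
    echo "shaded"      # shaded client jars, as introduced by SPARK-33212
  elif awk -F/ '$1 == "hadoop-client" { found = 1 } END { exit !found }' "$manifest"; then
    echo "unshaded"    # plain hadoop-client, as used by this PR
  else
    echo "unknown"     # neither artifact found (or the file is unreadable)
  fi
}
```

Usage, from the root of a Spark checkout: `hadoop_client_flavor dev/deps/spark-deps-hadoop-3.2-hive-2.3`.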

Why are the changes needed?

According to HADOOP-16080, since Apache Hadoop 3.1.1 hadoop-aws doesn't work with hadoop-client-api. It fails on write operations, as shown below.

1. Spark distribution with -Phadoop-cloud

$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY
20/11/30 23:01:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1606806088715).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("s3a://dongjoon/users.parquet").show
20/11/30 23:01:34 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          null|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+


scala> Seq(1).toDF.write.parquet("s3a://dongjoon/out.parquet")
20/11/30 23:02:14 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V

2. Spark distribution without -Phadoop-cloud

$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -c spark.eventLog.enabled=true -c spark.eventLog.dir=s3a://dongjoon/spark-events/ --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0
...
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
  at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CI.

dongjoon-hyun · 2020-11-25T23:09:39Z

cc @sunchao

dongjoon-hyun · 2020-11-26T00:15:59Z

This is for testing feasibility as one of the options.

pom.xml

HyukjinKwon · 2020-11-26T00:44:25Z

I'm fine with reverting this first. I'll look into SPARK-33104 separately. Presumably the tests will fail with Hadoop 2.

HyukjinKwon · 2020-11-26T00:46:25Z

@dongjoon-hyun would you mind updating the PR title and description to mention SPARK-33104 and 10bd42c?

sunchao · 2020-11-26T00:48:49Z

Yes, I'm fine with reverting this first while we search for other solutions. Let's hope we can still ship this in the Spark 3.1 release.

dongjoon-hyun · 2020-11-26T00:50:34Z

Thank you, @HyukjinKwon and @sunchao .
This is still a test to check the feasibility of the revert. This PR will wait until next Monday. :)

BTW, I'll update the PR title and description.


dongjoon-hyun · 2020-12-01T08:38:13Z

Hi, All.
To investigate this further during the Apache Spark 3.1 QA timeframe, I filed a new JIRA.
We have a few approaches, including this one and #30556.

SparkQA · 2020-12-02T04:18:22Z

Test build #132008 has finished for PR 30508 at commit 806aa85.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-12-02T04:30:12Z

Test build #132012 has finished for PR 30508 at commit 8bbde84.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-12-02T07:04:32Z

Hi, @HyukjinKwon .
Could you review this PR, please? I will reopen SPARK-33212 after merging this PR.
This will recover hadoop-aws functionality in Apache Spark 3.1.

dongjoon-hyun · 2020-12-02T07:26:28Z

Also, cc @viirya , @dbtsai , @sunchao , @srowen , @AngersZhuuuu , @mridulm , @tgravescs .

viirya

Looks okay; I compared it with SPARK-33212 (cb3fa6c).

HyukjinKwon

I actually already reviewed this; I was wondering which approach you all prefer. LGTM

HyukjinKwon · 2020-12-02T09:23:32Z

Merged to master.

srowen · 2020-12-02T12:31:57Z

dev/deps/spark-deps-hadoop-3.2-hive-2.3

+hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
+hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
+hadoop-client/3.2.0//hadoop-client-3.2.0.jar
+hadoop-common/3.2.0//hadoop-common-3.2.0.jar


This pulls a lot more code into Spark now, like the whole client... Hm, is this going to affect the hadoop-provided distribution? It also downgrades some of the versions above, which may be harmless. Do we really need this just for hadoop-aws?

Oh, @srowen, this is basically a revert. An issue was found with the shaded Hadoop client, so it was reverted here as a safe choice. A proper fix is in progress.

Ah OK, never mind. I'm not following closely.

dongjoon-hyun · 2020-12-02T16:32:39Z

Thank you, @viirya , @HyukjinKwon and @srowen !

yaooqinn · 2021-01-15T06:22:58Z

dev/deps/spark-deps-hadoop-3.2-hive-2.3

+kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
+kerb-identity/1.0.1//kerb-identity-1.0.1.jar
+kerb-server/1.0.1//kerb-server-1.0.1.jar
+kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar


Just out of curiosity, does Spark ever play the role of a KDC at runtime?

This is actually a revert. It was added in ce7ba2e#diff-e45e1eee8dcfd7eaf8a013cec02b67806da3edeabe0f195ac6b4402f67d4b6dcR146

Looks like the original PR doesn't handle any transitive artifact exclusion at all 😸

github-actions bot added BUILD CORE SQL STRUCTURED STREAMING YARN labels Nov 25, 2020

dongjoon-hyun mentioned this pull request Nov 25, 2020

[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile #29843

Closed

dongjoon-hyun requested a review from dbtsai November 25, 2020 23:09


dongjoon-hyun mentioned this pull request Nov 26, 2020

[SPARK-33104][BUILD] Exclude 'org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests' #30133

Closed

dongjoon-hyun requested a review from HyukjinKwon November 26, 2020 00:15

HyukjinKwon reviewed Nov 26, 2020


dongjoon-hyun changed the title ~~Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"~~ Revert SPARK-33212 and SPARK-33104 to recover hadoop-aws for Hadoop 3.x Nov 26, 2020


dongjoon-hyun mentioned this pull request Nov 29, 2020

[SPARK-33495][BUILD] Remove commons-logging.jar's dependency #30470

Closed

dongjoon-hyun changed the title ~~Revert SPARK-33212 and SPARK-33104 to recover hadoop-aws for Hadoop 3.x~~ Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile" Nov 30, 2020


dongjoon-hyun marked this pull request as draft December 1, 2020 07:37

Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"

aa30e62

This reverts commit cb3fa6c. (cherry picked from commit a7dc7f92a392328bcbc95800f09d467a89d18dfe) Signed-off-by: Dongjoon Hyun <[email protected]>

dongjoon-hyun changed the title ~~Revert "[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile"~~ [WIP][SPARK-33618][CORE] Fix hadoop-aws to work Dec 1, 2020

exclude

806aa85

github-actions bot added the KUBERNETES label Dec 2, 2020

Add back

8bbde84

dongjoon-hyun changed the title ~~[WIP][SPARK-33618][CORE] Fix hadoop-aws to work~~ [SPARK-33618][CORE] Fix hadoop-aws to work Dec 2, 2020

dongjoon-hyun marked this pull request as ready for review December 2, 2020 02:56

dongjoon-hyun changed the title ~~[SPARK-33618][CORE] Fix hadoop-aws to work~~ [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work Dec 2, 2020

dongjoon-hyun mentioned this pull request Dec 2, 2020

[WIP][SPARK-33212][BUILD] Provide hadoop-aws-shaded jar in hadoop-cloud module #30556

Closed

viirya approved these changes Dec 2, 2020


HyukjinKwon approved these changes Dec 2, 2020


HyukjinKwon closed this in 290aa02 Dec 2, 2020

srowen reviewed Dec 2, 2020


dongjoon-hyun deleted the SPARK-33212-REVERT branch December 2, 2020 16:32

yaooqinn reviewed Jan 15, 2021



7 participants
