@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Aug 4, 2023

What changes were proposed in this pull request?

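This PR aims to downgrade the Apache Hadoop dependency to 3.3.4 in Apache Spark 3.5 in order to prevent any regression from Apache Spark 3.4.x. In other words, although Apache Spark 3.5.x will lose many bug fixes from Apache Hadoop 3.3.5 and 3.3.6, it will be in the same situation as Apache Spark 3.4.x.
- SPARK-44197 Upgrade Hadoop to 3.3.6 (#41744)
- SPARK-42913 Upgrade Hadoop to 3.3.5 (#39124)
- SPARK-43448 Remove dummy dependency hadoop-openstack (#41133)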

On top of reverting SPARK-44197 and SPARK-42913, this PR includes an additional dependency exclusion change due to the following.
- SPARK-43880 Organize hadoop-cloud in standard maven project structure (#41380)

Why are the changes needed?

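There is a community report of an S3A committer performance regression. Although the fix is a one-liner, no Hadoop release containing it is available at this time.
- HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer (apache/hadoop#5706)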

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

@dongjoon-hyun
Member Author

Thank you, @pan3793 .

Also, cc @LuciferYang , @sunchao , @viirya

Member

@sunchao sunchao left a comment

LGTM

@viirya
Member

viirya commented Aug 4, 2023

Looks good to me.

@dongjoon-hyun
Member Author

Thank you, @sunchao , @viirya , @pan3793 .
Merged to branch-3.5 for Apache Spark 3.5.0 RC2.

dongjoon-hyun added a commit that referenced this pull request Aug 4, 2023
### What changes were proposed in this pull request?

This PR aims to downgrade the Apache Hadoop dependency to 3.3.4 in `Apache Spark 3.5` in order to prevent any regression from `Apache Spark 3.4.x`. In other words, although `Apache Spark 3.5.x` will lose many bug fixes from Apache Hadoop 3.3.5 and 3.3.6, it will be in the same situation as `Apache Spark 3.4.x`.
- SPARK-44197 Upgrade Hadoop to 3.3.6 (#41744)
- SPARK-42913 Upgrade Hadoop to 3.3.5 (#39124)
- SPARK-43448 Remove dummy dependency `hadoop-openstack` (#41133)

On top of reverting SPARK-44197 and SPARK-42913, this PR includes an additional dependency exclusion change due to the following.
- SPARK-43880 Organize `hadoop-cloud` in standard maven project structure (#41380)

### Why are the changes needed?

There is a community report of an S3A committer performance regression. Although the fix is a one-liner, no Hadoop release containing it is available at this time.
- HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer (apache/hadoop#5706)
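
For readers wondering why a `corePoolSize` change alone can matter for throughput, here is a minimal sketch using the plain JDK `java.util.concurrent.ThreadPoolExecutor`. It is not the Hadoop code: the class name `CorePoolSizeDemo`, the helper `peakConcurrency`, the pool sizes, and the 50 ms task are all made up for illustration, and whether the S3A committer's pool was configured exactly this way is an assumption. The sketch only shows the documented JDK behavior that a pool with an unbounded queue starts extra threads up to `corePoolSize` and never grows toward `maximumPoolSize`, because the queue never fills.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Hadoop or Spark.
final class CorePoolSizeDemo {

    // Submits `tasks` short sleeping tasks to a pool backed by an unbounded
    // queue and returns the highest number of tasks observed running at once.
    static int peakConcurrency(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(50); // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // already queued tasks still run to completion
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // corePoolSize 0: a single worker drains the queue, so tasks run serially.
        System.out.println("core=0, max=4: peak = " + peakConcurrency(0, 4, 8));
        // corePoolSize 4: each of the first four submissions starts its own thread.
        System.out.println("core=4, max=4: peak = " + peakConcurrency(4, 4, 8));
    }
}
```

With `corePoolSize` set to 0, the pool runs one task at a time regardless of `maximumPoolSize`. Raising `corePoolSize` is what lets the tasks overlap, which is consistent with the fix being described as a one-liner.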

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #42345 from dongjoon-hyun/SPARK-44678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-44678 branch August 4, 2023 21:21
@LuciferYang
Contributor

late LGTM
