@pan3793 pan3793 commented May 10, 2024

What changes were proposed in this pull request?

Remove the jodd-core jar, which is affected by the vulnerability GHSA-jrg3-qq99-35g7.

Why are the changes needed?

Previously, jodd-core was pulled in as a transitive dependency of Hive. apache/hive#5151 (Hive 2.3.10) removed it, so we can now remove it from Spark as well.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions (GA).

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM (Pending CIs)

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48230][BUILD] Remove unused jodd-core [SPARK-48230][BUILD] Remove unused jodd-core May 10, 2024
@dongjoon-hyun

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun

Hi, @pan3793 .

It seems that we need to re-evaluate this dependency removal. Please see the following, which is related to commons-lang:commons-lang.

@dongjoon-hyun

For existing Hive UDFs that assume the jodd library is present, this could be a breaking change, like #46528.

- import jodd.datetime.JDateTime;
+ import org.apache.hadoop.hive.ql.io.parquet.timestamp.datetime.JDateTime;

Sorry, but let me revert this first. We need a more graceful way of removing it, or an additional verified migration path (documentation and config).

pan3793 commented May 11, 2024

@dongjoon-hyun Okay, since this fails the CI, we should revert the dependency removal first and investigate further later.

As for supporting "legacy Hive UDF jars", I think that if a user imports classes that come from Hive's transitive deps, Spark is not responsible for handling that.

Spark includes only some of Hive's deps (by which I mean all the jars shipped in the Hive binary tgz). For example, Hive ships groovy-all-2.4.4.jar but Spark does not. If a user's UDF imports classes from groovy-all-2.4.4.jar, it should fail on Spark due to ClassNotFound; in that case, users should add the third-party deps themselves.

I believe jodd, commons-lang 2.x, and jackson 1.x are in the same position.
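
As a rough illustration of the point above, a user could check whether the classes their UDF jar expects are visible on the classpath before running it on Spark. The sketch below is only an assumption about how such a check might look: `ClasspathProbe` and `isLoadable` are hypothetical names, not part of Spark or Hive, and the two class names are the ones from the import diff earlier in this thread.

```java
public class ClasspathProbe {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the import diff discussed in this thread.
        String[] candidates = {
            "jodd.datetime.JDateTime",
            "org.apache.hadoop.hive.ql.io.parquet.timestamp.datetime.JDateTime"
        };
        for (String name : candidates) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

The result depends entirely on which jars are on the classpath when this runs. If a required class is reported as not loadable, the user would need to supply that jar themselves.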

pan3793 commented Jul 21, 2025

Hi, @dongjoon-hyun, since we have successfully upgraded Hive to 2.3.10 in Spark 4.0, do you think we can re-evaluate this PR for Spark 4.1?

@dongjoon-hyun

> Hi, @dongjoon-hyun, since we have successfully upgraded Hive to 2.3.10 in Spark 4.0, do you think we can re-evaluate this PR for Spark 4.1?

+1, @pan3793 .

@dongjoon-hyun

cc @peter-toth
