[SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies #46521
cc @dongjoon-hyun and @wangyum |
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-core-asl</artifactId>
    <version>${codehaus.jackson.version}</version>
    <scope>${hive.jackson.scope}</scope>
Can we also remove <hive.jackson.scope>compile</hive.jackson.scope>?
Line 270 in 44f00cc
| <hive.jackson.scope>compile</hive.jackson.scope> |
Line 269 in 2df494f
| <hive.jackson.scope>provided</hive.jackson.scope> |
https://github.com/apache/spark/blob/master/assembly/pom.xml#L272-L277
If we identify issues with Hive 2.3.10 before the 4.0.0 release, we may need to revert this patch and fall back to the SPARK-47119 approach to mitigate the CodeHaus Jackson dependency vulnerabilities; see the comments at
#45201 (comment)
ok
removed
dongjoon-hyun
left a comment
+1, LGTM (Pending CIs)
|
I think the case is already covered by CI. When IsolatedClassLoader is enabled, the |
|
Ya, I know that part, but do we have an end-to-end Hive UDF registration and invocation test case? |
|
@dongjoon-hyun AFAIK, the "Hive UDF execution" always uses built-in Hive jars without IsolatedClassLoader. While "Hive UDF registration" will happen during |
It sounds like we could have a corner case. That's the reason why we need an actual test case to cover it, isn't it?
|
For this PR, I believe we need verification against different HMS versions to make sure.
|
Hmm, let me clarify my view. In short, I think the current CI is sufficient. Spark uses Hive in two cases:
For case 1, the CI already covers it (any older HMS client initialization triggers built-in UDF registration). For case 2, there is no chance of invoking CodeHaus Jackson classes, since Hive 2.3.10 removed them from the codebase entirely.
|
also cc @wangyum @yaooqinn @AngersZhuuuu @cloud-fan |
that's a valid concern, since Spark CI only covers embedded HMS client case, let me test it with the real setup. |
|
Thank you. Please attach the test results to the PR description. |
dongjoon-hyun
left a comment
Please hold on all Hive related dependency change until we recover Maven CIs.
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
@dongjoon-hyun I managed to set up an environment to test the IsolatedClassLoader, and it works as expected. The basic test steps:
Verify that built-in Hive 2.3.10 works well without CodeHaus Jackson jars
Verify that Hive 3.1.3 metastore jars also work well without CodeHaus Jackson jars
|
Thank you for checking, @pan3793 . Are you assuming to rebuild all Hive UDF jars here? I'm wondering if you are presenting the test result with old Hive built-UDF jars here. |
|
BTW, thank you for taking a look at removing this. |
|
I added this to a subtask of SPARK-48231 . |
@dongjoon-hyun I never made such an assumption. Most of the existing UDFs should work without any change, except for UDFs that explicitly import and use classes we removed from new Spark releases. This is not limited to CodeHaus Jackson; the risk arises each time we update
For example, if a CustomUDF built with Hive 2.3.9 uses OkHTTP classes, it works well in Spark 3.5 because Spark ships the OkHTTP jar via K8s client 6. Spark 4.0 removes the OkHTTP jars during the K8s client 7 upgrade, so the CustomUDF would fail with an OkHTTP class-not-found error. To fix it, the user can either shade the deps or add them by
What matters is that we must NOT break the Hive built-in UDF deps; otherwise, it blocks
SPARK-51029 (GitHub PR [1]) removes
In detail, when a user runs a query like
Currently (v4.0.0-rc2), the user must add the
[1] #49725
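The failure mode described in the comment above (a UDF referencing a class whose jar is no longer on the classpath) can be probed with a small standalone check. This is only a minimal sketch: `ClasspathProbe` and `isPresent` are hypothetical names that are not part of Spark or Hive, and the class names probed in `main` are examples of a CodeHaus Jackson 1.x class and an OkHttp class, not a list taken from this PR.

```java
public class ClasspathProbe {

    /**
     * Returns true if the named class can be loaded by the given loader.
     * The class is not initialized, so no static initializers run.
     */
    public static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Example class names only (assumption): adjust to whatever the UDF imports.
        String[] candidates = {
            "org.codehaus.jackson.map.ObjectMapper",
            "okhttp3.OkHttpClient"
        };
        for (String name : candidates) {
            String status = isPresent(name, loader) ? "present" : "missing";
            System.out.println(name + " -> " + status);
        }
    }
}
```

Running such a check with the same classpath as the Spark application gives a quick indication of whether a custom UDF jar needs to shade or bundle a dependency that Spark no longer ships.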
|
In short, my conclusion is, we should and must keep all jars required by Hive built-in UDF to allow |
I fully understand your backgrounds, reasoning, and this conclusion. May I ask why you initiate that discussion on this PR, |
I just found I forgot this stuff, and I think this PR could be reopened, so the comments should be visible to future reviewers.
Given this should be done in 4.1, let's focus on SPARK-51029 and move the discussion to #49725 for now?
|
No, what I meant here was that your concern is legitimate. So, you can raise your concerns to the broader audience, @pan3793 . For example, dev@spark instead of this PR which is completely opposite to your intention. If you want, you can block Apache Spark 4.0.0 RCs by vetoing and initiating a discussion thread to add them all back.
|
|
The RC is supposed to gather that kind of feedback and those difficulties. There is no Apache Spark 4.0.0 until we have a community-blessed one.
|
Got your point; let me respond in the voting mail
|
Thank you, @pan3793 ! And, sorry for your inconvenience. |
|
Now that Spark 4.0.0 has been successfully released with Hive 2.3.10, I think we can continue the process of removing CodeHaus Jackson dependencies. |
dongjoon-hyun
left a comment
Thank you, @pan3793 . I cleared my previous review comments from this PR. For the rest of the process, I'm going to follow the community decision while being away from this PR because I want to be neutral for this specific topic.
|
Thanks @dongjoon-hyun. cc @wangyum @LuciferYang, this is ready for review. Also cc @Madhukar525722, who asked about this before.
yaooqinn
left a comment
Looks fine from my side
|
Please give me two hours to carry out some verification work |
wangyum
left a comment
LGTM.
LuciferYang
left a comment
LGTM
|
Merged into master for Apache Spark 4.1.0. Thanks @pan3793 @yaooqinn @wangyum and @dongjoon-hyun |
### What changes were proposed in this pull request?

CodeHaus Jackson dependencies were pulled from Hive, while in apache/hive#4564 (Hive 2.3.10), it migrated to Jackson 2.x, so we can remove them from Spark now.

### Why are the changes needed?

Remove unused and vulnerable dependencies.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46521 from pan3793/SPARK-48231.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>