[SPARK-51449][BUILD] Restore hive-llap-common to compile scope #50222
Conversation
I personally would treat this as a blocker for the 4.0.0 release.
+1 for restoring this.
Thank you for making a PR, @pan3793 .
To the reviewers: I'm not against this PR, because this is a legitimate request from the community members.
I just want to add some context for the record:
- Apache Spark 4.0.0 RC2 makes this dependency optional intentionally, due to CVE-2024-23953. In RC2, the vulnerability only affects production environments when users opt in by installing the package intentionally.
- This PR will propagate the Apache Hive LLAP vulnerability back into the Apache Spark binary distribution again, although this is not a regression from Apache Spark 3.
- After this PR, it's highly recommended that production environments handle it internally, by patching their internal fork of Spark or Hive based on their own user situations.
I must admit that the AS-IS Apache Spark 4.0.0 RC2 was a bandage until Apache Spark upgrades its Hive dependency to Apache Hive 4.x. Every path (including this one) has its own rationale. So, thank you again.
In our production environment, we will still opt out.
After restoring this dependency, I think it would be best to have a place to document this known issue and provide recommendations to users, such as on the security.html page of the Spark website, or in the 4.0 release notes?
Closing in favor of SPARK-51466 (#50232).
What changes were proposed in this pull request?
Restore `hive-llap-common` from `provided` to `compile` scope; this PR reverts #49725 and #50146 (partially).

Why are the changes needed?
SPARK-51029 (#49725) removes `hive-llap-common` from the Spark binary distributions, which technically breaks the feature "Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs"; more precisely, it changes Hive UDF support from batteries-included to not.

In detail, when a user runs a query like `CREATE TEMPORARY FUNCTION hello AS 'my.HelloUDF'`, it triggers `o.a.h.hive.ql.exec.FunctionRegistry` initialization, which also initializes the Hive built-in UDFs, UDAFs and UDTFs; a `NoClassDefFoundError` then occurs because some built-in UDTFs depend on classes in `hive-llap-common`.

Currently (v4.0.0-rc2), the user must add the `hive-llap-common` jar explicitly, e.g. by using `--packages org.apache.hive:hive-llap-common:2.3.10`, to fix the `NoClassDefFoundError` issue, even though `my.HelloUDF` does not depend on any class in `hive-llap-common`. This is quite confusing.

Does this PR introduce any user-facing change?
Yes. It restores "Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs" to batteries-included, as in earlier releases like Spark 3.5.
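The change itself amounts to a dependency-scope edit in the Maven build. A sketch of the restored declaration follows; the exact module placement is not shown in this thread, and the version is taken from the `--packages` coordinate mentioned above, so treat both as assumptions:

```xml
<!-- Sketch: restore hive-llap-common to the default (compile) scope,
     so the jar ships in the binary distribution again. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-llap-common</artifactId>
  <version>2.3.10</version>
  <!-- previously <scope>provided</scope> (SPARK-51029 / #49725),
       which excluded the jar from the distribution -->
</dependency>
```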
How was this patch tested?
Manually verified; the `NoClassDefFoundError` is gone after restoring `hive-llap-common` to the classpath when calling a Hive UDF.

Was this patch authored or co-authored using generative AI tooling?
No.
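For reference, the failure scenario and the RC2 workaround described above can be sketched as follows. The UDF class `my.HelloUDF` is the placeholder from the PR description; in practice the jar containing it would also be passed via `--jars`:

```
# On Spark 4.0.0-rc2, registering any Hive UDF triggers FunctionRegistry
# initialization and fails with NoClassDefFoundError, regardless of what
# the UDF itself depends on:
spark-sql -e "CREATE TEMPORARY FUNCTION hello AS 'my.HelloUDF'"

# Workaround on RC2: pull hive-llap-common onto the classpath explicitly.
spark-sql --packages org.apache.hive:hive-llap-common:2.3.10 \
  -e "CREATE TEMPORARY FUNCTION hello AS 'my.HelloUDF'"

# With this PR merged, the first command works out of the box again.
```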