[SPARK-51029][BUILD] Remove hive-llap-common compile dependency
#49725
Conversation
Thank you so much, @LuciferYang.
### What changes were proposed in this pull request?

This PR aims to remove the `hive-llap-common` compile dependency from Apache Spark.

### Why are the changes needed?

Technically, Apache Spark is not using this jar. We had better exclude it from the Apache Spark distribution in order to mitigate security concerns.

### Does this PR introduce _any_ user-facing change?

Yes, this is a removal of a dependency which may affect existing Hive UDF jars. Users can add the `hive-llap-common` library to their class path at their own risk, similar to other third-party libraries. The migration guide is updated.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49725 from dongjoon-hyun/SPARK-51029.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 339b036)
Signed-off-by: Dongjoon Hyun <[email protected]>

Merged to master/4.0.
Due to the lack of the hive-llap-client and hive-llap-common test dependencies, testing hive-thriftserver using Maven will hang. Should we add them back as test dependencies?
Based on my analysis at #49736 (comment), it seems that the compile-scope dependency on llap-common cannot be removed.
Yes, as I wrote in the PR description, UDF parts are affected, @LuciferYang.
The purpose of this PR is to eliminate the risk from the Apache Spark side and to give users full freedom to take it or deploy with the patched jar. For example, for Apache Spark 3.5.4, we can expect to delete it. WDYT, @LuciferYang?
For example, this kind of risk issue.
Understood, fine with me. Thank you, @dongjoon-hyun.
Thank you. We can discuss more during the QA and RC period in order to reach the final decision~
Sorry, I can't follow the decision of removing it. For details, please refer to #46521 (comment).
Thank you for the comments, @pan3793.
…on Hive UDF evaluation

### What changes were proposed in this pull request?

Fork a few methods from Hive to eliminate calls to `org.apache.hadoop.hive.ql.exec.FunctionRegistry`, to avoid initializing the Hive built-in UDFs.

### Why are the changes needed?

Currently, when the user runs a query that contains a Hive UDF, it triggers `o.a.h.hive.ql.exec.FunctionRegistry` initialization, which also initializes the [Hive built-in UDFs, UDAFs and UDTFs](https://github.com/apache/hive/blob/rel/release-2.3.10/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L500). Since [SPARK-51029](https://issues.apache.org/jira/browse/SPARK-51029) (#49725) removes hive-llap-common from the Spark binary distributions, a `NoClassDefFoundError` occurs.

```
org.apache.spark.sql.execution.QueryExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/llap/security/LlapSigner$Signable
	at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3373)
	at java.base/java.lang.Class.getConstructor0(Class.java:3578)
	at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2754)
	at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:79)
	at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:208)
	at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:201)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:500)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:160)
	at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118)
	at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117)
	at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132)
	at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132)
	at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:197)
	at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:177)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:187)
	at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeExpression(HiveSessionStateBuilder.scala:171)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$makeFunctionBuilder$1(SessionCatalog.scala:1689)
	...
```

Actually, Spark does not use those Hive built-in functions, but still needs to pull in those transitive deps to make Hive happy. By eliminating Hive built-in UDF initialization, Spark can get rid of those transitive deps and gain a small performance improvement on the first Hive UDF call.

### Does this PR introduce _any_ user-facing change?

No, except for a small performance improvement on the first Hive UDF call.

### How was this patch tested?

Pass GHA to ensure the ported code is correct.

Manually tested that calling a Hive UDF, UDAF and UDTF won't trigger `org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>`:

```
$ bin/spark-sql

// UDF
spark-sql (default)> create temporary function hive_uuid as 'org.apache.hadoop.hive.ql.udf.UDFUUID';
Time taken: 0.878 seconds
spark-sql (default)> select hive_uuid();
840356e5-ce2a-4d6c-9383-294d620ec32b
Time taken: 2.264 seconds, Fetched 1 row(s)

// GenericUDF
spark-sql (default)> create temporary function hive_sha2 as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFSha2';
Time taken: 0.023 seconds
spark-sql (default)> select hive_sha2('ABC', 256);
b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78
Time taken: 0.157 seconds, Fetched 1 row(s)

// UDAF
spark-sql (default)> create temporary function hive_percentile as 'org.apache.hadoop.hive.ql.udf.UDAFPercentile';
Time taken: 0.032 seconds
spark-sql (default)> select hive_percentile(id, 0.5) from range(100);
49.5
Time taken: 0.474 seconds, Fetched 1 row(s)

// GenericUDAF
spark-sql (default)> create temporary function hive_sum as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
Time taken: 0.017 seconds
spark-sql (default)> select hive_sum(*) from range(100);
4950
Time taken: 1.25 seconds, Fetched 1 row(s)

// GenericUDTF
spark-sql (default)> create temporary function hive_replicate_rows as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFReplicateRows';
Time taken: 0.012 seconds
spark-sql (default)> select hive_replicate_rows(3L, id) from range(3);
3	0
3	0
3	0
3	1
3	1
3	1
3	2
3	2
3	2
Time taken: 0.19 seconds, Fetched 9 row(s)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50232 from pan3793/eliminate-hive-udf-init.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
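The lazy class-initialization rule that the stack trace above depends on can be illustrated with a minimal, self-contained Java sketch. The class and method names here are hypothetical, not Spark or Hive APIs; the point is only that `<clinit>` runs on first active use, which is why Hive's `FunctionRegistry` initialization (and its reflective lookup of the missing llap class) fires on the first UDF call rather than at startup:

```java
import java.util.ArrayList;
import java.util.List;

public class ClinitDemo {
    static final List<String> events = new ArrayList<>();

    static class Registry {
        static {
            // A static initializer, analogous to FunctionRegistry.<clinit>:
            // in Hive this is where built-in UDF registration happens,
            // including the reflective lookup that needs LlapSigner$Signable.
            events.add("Registry <clinit>");
        }
        static void register() {
            events.add("register()");
        }
    }

    public static void main(String[] args) {
        events.add("before first use");
        Registry.register(); // <clinit> fires here, not at program start
        System.out.println(events); // [before first use, Registry <clinit>, register()]
    }
}
```

Because initialization is deferred, removing the calls into `FunctionRegistry` is enough to keep the missing class from ever being looked up.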
…tion on Hive UDF evaluation

Backport #50232 to branch-4.0.

### What changes were proposed in this pull request?

Fork a few methods from Hive to eliminate calls to `org.apache.hadoop.hive.ql.exec.FunctionRegistry`, to avoid initializing the Hive built-in UDFs.

### Why are the changes needed?

Currently, when the user runs a query that contains a Hive UDF, it triggers `o.a.h.hive.ql.exec.FunctionRegistry` initialization, which also initializes the [Hive built-in UDFs, UDAFs and UDTFs](https://github.com/apache/hive/blob/rel/release-2.3.10/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L500). Since [SPARK-51029](https://issues.apache.org/jira/browse/SPARK-51029) (#49725) removes hive-llap-common from the Spark binary distributions, a `NoClassDefFoundError` occurs.

```
org.apache.spark.sql.execution.QueryExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/llap/security/LlapSigner$Signable
	at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3373)
	at java.base/java.lang.Class.getConstructor0(Class.java:3578)
	at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2754)
	at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:79)
	at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:208)
	at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:201)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:500)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:160)
	at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118)
	at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117)
	at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132)
	at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132)
	at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:197)
	at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:177)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:187)
	at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeExpression(HiveSessionStateBuilder.scala:171)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$makeFunctionBuilder$1(SessionCatalog.scala:1689)
	...
```

Actually, Spark does not use those Hive built-in functions, but still needs to pull in those transitive deps to make Hive happy. By eliminating Hive built-in UDF initialization, Spark can get rid of those transitive deps and gain a small performance improvement on the first Hive UDF call.

### Does this PR introduce _any_ user-facing change?

No, except for a small performance improvement on the first Hive UDF call.

### How was this patch tested?

Excluded the `hive-llap-*` deps from the STS module and passed all SQL tests (previously some tests failed without the `hive-llap-*` deps, see SPARK-51041).

Manually tested that calling a Hive UDF, UDAF and UDTF won't trigger `org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>`:

```
$ bin/spark-sql

// UDF
spark-sql (default)> create temporary function hive_uuid as 'org.apache.hadoop.hive.ql.udf.UDFUUID';
Time taken: 0.878 seconds
spark-sql (default)> select hive_uuid();
840356e5-ce2a-4d6c-9383-294d620ec32b
Time taken: 2.264 seconds, Fetched 1 row(s)

// GenericUDF
spark-sql (default)> create temporary function hive_sha2 as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFSha2';
Time taken: 0.023 seconds
spark-sql (default)> select hive_sha2('ABC', 256);
b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78
Time taken: 0.157 seconds, Fetched 1 row(s)

// UDAF
spark-sql (default)> create temporary function hive_percentile as 'org.apache.hadoop.hive.ql.udf.UDAFPercentile';
Time taken: 0.032 seconds
spark-sql (default)> select hive_percentile(id, 0.5) from range(100);
49.5
Time taken: 0.474 seconds, Fetched 1 row(s)

// GenericUDAF
spark-sql (default)> create temporary function hive_sum as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
Time taken: 0.017 seconds
spark-sql (default)> select hive_sum(*) from range(100);
4950
Time taken: 1.25 seconds, Fetched 1 row(s)

// GenericUDTF
spark-sql (default)> create temporary function hive_replicate_rows as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFReplicateRows';
Time taken: 0.012 seconds
spark-sql (default)> select hive_replicate_rows(3L, id) from range(3);
3	0
3	0
3	0
3	1
3	1
3	1
3	2
3	2
3	2
Time taken: 0.19 seconds, Fetched 9 row(s)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50264 from pan3793/SPARK-51466-4.0.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
What changes were proposed in this pull request?
This PR aims to remove the `hive-llap-common` compile dependency from Apache Spark.
Why are the changes needed?
Technically, Apache Spark is not using this jar. We had better exclude it from the Apache Spark distribution in order to mitigate security concerns.
Does this PR introduce any user-facing change?
Yes, this is a removal of a dependency which may affect existing Hive UDF jars. Users can add the `hive-llap-common` library to their class path at their own risk, similar to other third-party libraries. The migration guide is updated.
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.
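For users who still need `hive-llap-common` on the class path, a minimal sketch of the workaround described above follows. The `2.3.10` version matches Spark's built-in Hive and is an assumption here; adjust the coordinates or jar path to your environment, and note that re-adding the jar is at your own risk:

```shell
# Resolve the artifact from Maven Central at launch time...
bin/spark-sql --packages org.apache.hive:hive-llap-common:2.3.10

# ...or point --jars at a locally downloaded copy.
bin/spark-sql --jars /path/to/hive-llap-common-2.3.10.jar
```

The same `--packages`/`--jars` options apply to `spark-submit` and `spark-shell` as well.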