Conversation

@dongjoon-hyun (Member) commented Jan 29, 2025

What changes were proposed in this pull request?

This PR aims to remove the `hive-llap-common` compile dependency from Apache Spark.

Why are the changes needed?

Technically, Apache Spark is not using this jar. We had better exclude it from the Apache Spark distribution in order to mitigate security concerns.

Does this PR introduce any user-facing change?

Yes, this is the removal of a dependency, which may affect existing Hive UDF jars. Users can add the `hive-llap-common` library to their classpath at their own risk, just as with other third-party libraries, for example as sketched below.
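
A minimal sketch of restoring the jar at launch time (the coordinates and version shown are illustrative):

```
$ bin/spark-sql --packages org.apache.hive:hive-llap-common:2.3.10
```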

The migration guide is updated.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member, Author)

Thank you so much, @LuciferYang.

dongjoon-hyun added a commit that referenced this pull request Jan 29, 2025
### What changes were proposed in this pull request?

This PR aims to remove `hive-llap-common` compile dependency from Apache Spark.

### Why are the changes needed?

Technically, Apache Spark is not using this jar. We had better exclude it from the Apache Spark distribution in order to mitigate security concerns.

### Does this PR introduce _any_ user-facing change?

Yes, this is the removal of a dependency, which may affect existing Hive UDF jars. Users can add the `hive-llap-common` library to their classpath at their own risk, just as with other third-party libraries.

The migration guide is updated.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49725 from dongjoon-hyun/SPARK-51029.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 339b036)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun deleted the SPARK-51029 branch January 29, 2025 17:33
@dongjoon-hyun (Member, Author)

Merged to master/4.0.

@LuciferYang (Contributor)

Due to the lack of the `hive-llap-client` and `hive-llap-common` test dependencies, testing `hive-thriftserver` with Maven will hang at:

```
Discovery starting.
2025-01-29T19:23:53.496214833Z ScalaTest-main ERROR Filters contains invalid attributes "onMatch", "onMismatch"
2025-01-29T19:23:53.510863632Z ScalaTest-main ERROR Filters contains invalid attributes "onMatch", "onMismatch"
Discovery completed in 1 second, 258 milliseconds.
Run starting. Expected test count is: 634
HiveThriftBinaryServerSuite:
- SPARK-17819: Support default database in connection URIs
- GetInfo Thrift API
```

Should we add them as test dependencies for `hive-thriftserver`, or revert? (A sketch of such test-scoped declarations follows.)
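
For reference, a hypothetical sketch of what the test-scoped declarations in `sql/hive-thriftserver/pom.xml` could look like (the version property name is an assumption):

```
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-llap-common</artifactId>
  <version>${hive.version}</version> <!-- property name is an assumption -->
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-llap-client</artifactId>
  <version>${hive.version}</version> <!-- property name is an assumption -->
  <scope>test</scope>
</dependency>
```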

@LuciferYang (Contributor)

Based on my analysis at #49736 (comment), it seems that the compile-scope dependency on `llap-common` cannot be removed.

@dongjoon-hyun (Member, Author)

Yes, as I wrote in the PR description, the UDF parts are affected, @LuciferYang.

@dongjoon-hyun (Member, Author)

The purpose of this PR is to eliminate the risk on the Apache Spark side and to give users full freedom to take that risk or to deploy with a patched `hive-llap-common`.

For example, with Apache Spark 3.5.4, we can expect to delete `hive-llap-common.jar` and run the Thrift server like the following:

```
$ sbin/start-thriftserver.sh --packages org.apache.hive:hive-llap-common:2.3.9
```

WDYT, @LuciferYang?

@dongjoon-hyun (Member, Author)

For example, consider this kind of security risk issue.

@LuciferYang (Contributor)

Understood, fine with me. Thank you, @dongjoon-hyun.

@dongjoon-hyun (Member, Author)

Thank you. We can discuss more during the QA and RC period in order to reach a final decision~

@pan3793 (Member) commented Mar 7, 2025

Sorry, I can't follow the decision to remove `hive-llap-common-2.3.10.jar` from the Spark dist. It technically breaks the "Support Hive UDF" feature: without this jar, Spark is not able to call even a simple `HelloUDF` that has no third-party deps.

For details, please refer to #46521 (comment)
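
To make this concrete, a `HelloUDF` of that kind could be as small as the following Scala sketch (hypothetical code; Hive UDFs are more commonly written in Java). It has no third-party deps, yet invoking it still fails without `hive-llap-common`, because `GenericUDF.initializeAndFoldConstants` triggers `FunctionRegistry.<clinit>`, as the stack trace in the commit below shows:

```
import org.apache.hadoop.hive.ql.exec.UDF

// A minimal old-style Hive UDF; Hive resolves evaluate() reflectively.
// Registered via: CREATE TEMPORARY FUNCTION hello AS 'HelloUDF';
class HelloUDF extends UDF {
  def evaluate(name: String): String = s"Hello, $name"
}
```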

@dongjoon-hyun (Member, Author)

Thank you for the comments, @pan3793 .

dongjoon-hyun pushed a commit that referenced this pull request Mar 12, 2025
…on Hive UDF evaluation

### What changes were proposed in this pull request?

Fork a few methods from Hive to eliminate calls to `org.apache.hadoop.hive.ql.exec.FunctionRegistry`, thereby avoiding initialization of the Hive built-in UDFs.

### Why are the changes needed?

Currently, when the user runs a query that contains a Hive UDF, it triggers `o.a.h.hive.ql.exec.FunctionRegistry` initialization, which also initializes the [Hive built-in UDFs, UDAFs and UDTFs](https://github.com/apache/hive/blob/rel/release-2.3.10/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L500).

Since [SPARK-51029](https://issues.apache.org/jira/browse/SPARK-51029) (#49725) removes `hive-llap-common` from the Spark binary distributions, a `NoClassDefFoundError` occurs:

```
org.apache.spark.sql.execution.QueryExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/llap/security/LlapSigner$Signable
    at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3373)
    at java.base/java.lang.Class.getConstructor0(Class.java:3578)
    at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2754)
    at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:79)
    at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:208)
    at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:201)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:500)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:160)
    at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118)
    at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117)
    at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132)
    at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132)
    at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:197)
    at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:177)
    at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:187)
    at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeExpression(HiveSessionStateBuilder.scala:171)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$makeFunctionBuilder$1(SessionCatalog.scala:1689)
    ...
```

Actually, Spark does not use those Hive built-in functions, but it still needs to pull in those transitive deps to make Hive happy. By eliminating the Hive built-in UDF initialization, Spark can get rid of those transitive deps and gain a small performance improvement on the first call of a Hive UDF.
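
For illustration, here is a minimal sketch of the forking idea in Scala (hypothetical object and method names; the actual PR forks several Hive methods). The helpers only read the `@UDFType` annotation, but their originals are static methods on `FunctionRegistry`, whose `<clinit>` registers every built-in function:

```
import org.apache.hadoop.hive.ql.udf.UDFType
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF

// Hypothetical fork of FunctionRegistry.isStateful/isDeterministic. Calling
// these copies never touches FunctionRegistry, so its static initializer
// (which needs hive-llap-common on the classpath) is never triggered.
object ForkedFunctionRegistry {
  def isStateful(udf: GenericUDF): Boolean = {
    val t = udf.getClass.getAnnotation(classOf[UDFType])
    t != null && t.stateful()
  }

  def isDeterministic(udf: GenericUDF): Boolean = {
    val t = udf.getClass.getAnnotation(classOf[UDFType])
    t != null && t.deterministic() && !isStateful(udf)
  }
}
```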

### Does this PR introduce _any_ user-facing change?

No, except for a small perf improvement on the first call of a Hive UDF.

### How was this patch tested?

Pass GHA to ensure the porting code is correct.

Manually tested that calling Hive UDF, UDAF and UDTF won't trigger `org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>`:

```
$ bin/spark-sql
// UDF
spark-sql (default)> create temporary function hive_uuid as 'org.apache.hadoop.hive.ql.udf.UDFUUID';
Time taken: 0.878 seconds
spark-sql (default)> select hive_uuid();
840356e5-ce2a-4d6c-9383-294d620ec32b
Time taken: 2.264 seconds, Fetched 1 row(s)

// GenericUDF
spark-sql (default)> create temporary function hive_sha2 as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFSha2';
Time taken: 0.023 seconds
spark-sql (default)> select hive_sha2('ABC', 256);
b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78
Time taken: 0.157 seconds, Fetched 1 row(s)

// UDAF
spark-sql (default)> create temporary function hive_percentile as 'org.apache.hadoop.hive.ql.udf.UDAFPercentile';
Time taken: 0.032 seconds
spark-sql (default)> select hive_percentile(id, 0.5) from range(100);
49.5
Time taken: 0.474 seconds, Fetched 1 row(s)

// GenericUDAF
spark-sql (default)> create temporary function hive_sum as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
Time taken: 0.017 seconds
spark-sql (default)> select hive_sum(*) from range(100);
4950
Time taken: 1.25 seconds, Fetched 1 row(s)

// GenericUDTF
spark-sql (default)> create temporary function hive_replicate_rows as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFReplicateRows';
Time taken: 0.012 seconds
spark-sql (default)> select hive_replicate_rows(3L, id) from range(3);
3	0
3	0
3	0
3	1
3	1
3	1
3	2
3	2
3	2
Time taken: 0.19 seconds, Fetched 9 row(s)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50232 from pan3793/eliminate-hive-udf-init.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
pan3793 added a commit to pan3793/spark that referenced this pull request Mar 13, 2025
…on Hive UDF evaluation

(Commit message identical to the #50232 commit above. Closes apache#50232 from pan3793/eliminate-hive-udf-init. Authored-by: Cheng Pan <[email protected]>. Signed-off-by: Dongjoon Hyun <[email protected]>.)
LuciferYang pushed a commit that referenced this pull request Mar 13, 2025
…tion on Hive UDF evaluation

Backport #50232 to branch-4.0

(The change and rationale are identical to the #50232 commit message above, apart from the testing section below.)

### How was this patch tested?

Exclude `hive-llap-*` deps from the STS module and pass all SQL tests (previously some tests fail without `hive-llap-*` deps, see SPARK-51041)

Manual verification is identical to #50232 above: calling Hive UDF, UDAF and UDTF does not trigger `org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50264 from pan3793/SPARK-51466-4.0.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
anoopj pushed a commit to anoopj/spark that referenced this pull request Mar 15, 2025
…on Hive UDF evaluation

(Commit message identical to the #50232 commit above. Closes apache#50232 from pan3793/eliminate-hive-udf-init. Authored-by: Cheng Pan <[email protected]>. Signed-off-by: Dongjoon Hyun <[email protected]>.)