[SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper #28079
…ntext classloader after transformed
    }
  }

  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
This test is moved to HiveUDFDynamicLoadSuite - now it's being tested with 5 available Hive UDF types.
      clazz = Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
        .asInstanceOf[Class[_ <: AnyRef]]
    }
    val func = clazz.getConstructor().newInstance().asInstanceOf[UDFType]
If we add clazz = null below this line, the new UT (SPARK-31312) fails for the UDF type (only one of the 5 fails, because in the other cases this caches the instance instead).

cc @cloud-fan @maropu
    val jarUrl = getHiveUDFTestJarUrl
    test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
      s"current thread context classloader (${udfInfo.identifier}") {
      testHiveUDFUsingJarWithChangingClassloader(
nit: if the method is only called once, can we inline it?

    test("SPARK-31312 Transformed Hive UDF using jar expression should not be failed to run " +
      s"regardless of current thread context classloader (${udfInfo.identifier})") {
      testHiveUDFUsingJarWithChangingClassloaderWithCopyUDFExpression(
ditto
    assert(Thread.currentThread().getContextClassLoader eq
      spark.sqlContext.sharedState.jarClassLoader)

    val udfExpr = fnCreateHiveUDFExpression()
I think this test should start here. The above test is the same as testHiveUDFUsingJarWithChangingClassloader.

Thanks for the review comments. I've just consolidated two tests into one, and inlined. Please take a look again.

Test build #120634 has finished for PR 28079 at commit

    )

    udfTestInfos.foreach { udfInfo =>
      val jarUrl = getHiveUDFTestJarUrl
We can inline it as well

Test build #120641 has finished for PR 28079 at commit

Test build #120642 has finished for PR 28079 at commit

…unctionWrapper ### What changes were proposed in this pull request? This patch proposes to cache Class instance for the UDF instance in HiveFunctionWrapper to fix the case where Hive simple UDF is somehow transformed (expression is copied) and evaluated later with another classloader (for the case current thread context classloader is somehow changed). In this case, Spark throws CNFE as of now. It's only occurred for Hive simple UDF, as HiveFunctionWrapper caches the UDF instance whereas it doesn't do for `UDF` type. The comment says Spark has to create instance every time for UDF, so we cannot simply do the same. This patch caches Class instance instead, and switch current thread context classloader to which loads the Class instance. This patch extends the test boundary as well. We only tested with GenericUDTF for SPARK-26560, and this patch actually requires only UDF. But to avoid regression for other types as well, this patch adds all available types (UDF, GenericUDF, AbstractGenericUDAFResolver, UDAF, GenericUDTF) into the boundary of tests. Credit to cloud-fan as he discovered the problem and proposed the solution. ### Why are the changes needed? Above section describes why it's a bug and how it's fixed. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? New UTs added. Closes #28079 from HeartSaVioR/SPARK-31312. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 2a6aa8e) Signed-off-by: Wenchen Fan <[email protected]>

thanks, merging to master/3.0!

@HeartSaVioR can you send a backport PR for 2.4? thanks!

Thanks for the quick review and merge! #28086 is for branch-2.4.

late LGTM, thanks for the work, @HeartSaVioR!

(very late) LGTM!
What changes were proposed in this pull request?
This patch proposes to cache the Class instance for the UDF instance in HiveFunctionWrapper. This fixes the case where a Hive simple UDF expression is transformed (the expression is copied) and evaluated later with a different classloader (for example, because the current thread context classloader has changed in the meantime). In this case, Spark currently throws a ClassNotFoundException (CNFE).
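The failure mode can be reproduced outside Spark. The following is a minimal sketch, not Spark code: `ContextLoaderLookupDemo`, `SimpleUdf`, `loadByName`, and `lookupFailsAfterSwitch` are hypothetical names, and a bootstrap-only `URLClassLoader` stands in for "some other context classloader that cannot see the UDF jar".

```scala
import java.net.{URL, URLClassLoader}

object ContextLoaderLookupDemo {
  // Hypothetical stand-in for a Hive simple UDF class shipped in a user jar.
  class SimpleUdf {
    def evaluate(s: String): String = s.toUpperCase
  }

  // Resolves a class by name through the current thread context classloader.
  def loadByName(name: String): Class[_] =
    Thread.currentThread().getContextClassLoader.loadClass(name)

  // Returns true if the by-name lookup fails once the context classloader changes.
  def lookupFailsAfterSwitch(): Boolean = {
    val thread = Thread.currentThread()
    val original = thread.getContextClassLoader
    val name = classOf[SimpleUdf].getName
    try {
      // Stand-in for the jar classloader: it can see SimpleUdf.
      thread.setContextClassLoader(classOf[SimpleUdf].getClassLoader)
      loadByName(name) // succeeds while the right loader is installed

      // Stand-in for a changed context classloader: bootstrap classes only.
      thread.setContextClassLoader(new URLClassLoader(Array.empty[URL], null))
      try {
        loadByName(name)
        false
      } catch {
        case _: ClassNotFoundException => true
      }
    } finally {
      // Always restore the caller's context classloader.
      thread.setContextClassLoader(original)
    }
  }

  def main(args: Array[String]): Unit =
    println(s"lookup fails after switch: ${lookupFailsAfterSwitch()}")
}
```

Any code path that re-resolves the class by name after the context classloader has changed hits the same exception, which is why resolving once and keeping the result is attractive.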
This occurs only for Hive simple UDFs: HiveFunctionWrapper caches the UDF instance for the other types, but not for the `UDF` type. The comment in the code says Spark has to create a new instance every time for UDF, so we cannot simply cache the instance there as well. This patch caches the Class instance instead, and switches the current thread context classloader to the one that loaded the Class instance.
This patch extends the test coverage as well. SPARK-26560 was only tested with GenericUDTF, and this patch strictly requires only UDF. But to avoid regressions for the other types, this patch adds all available types (UDF, GenericUDF, AbstractGenericUDAFResolver, UDAF, GenericUDTF) to the tests.
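The approach described above can be sketched in isolation as follows. This is a simplified illustration, not the actual HiveFunctionWrapper: `CachedClassSketch`, `FunctionWrapperSketch`, `copyWrapper`, `withContextClassLoader`, and `SimpleUdf` are hypothetical names, and restoring the previous context classloader in a `finally` block is a choice made for this sketch rather than something the patch description states.

```scala
import java.net.{URL, URLClassLoader}

object CachedClassSketch {
  // Hypothetical stand-in for a Hive simple UDF class shipped in a user jar.
  class SimpleUdf {
    def evaluate(s: String): String = s.toUpperCase
  }

  // Runs `body` with `loader` installed as the thread context classloader.
  private def withContextClassLoader[A](loader: ClassLoader)(body: => A): A = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(previous)
  }

  // Simplified wrapper: caches the Class, but creates a new instance per call.
  final class FunctionWrapperSketch(
      val functionClassName: String,
      private var clazz: Class[_ <: AnyRef] = null) {

    def createFunction[T <: AnyRef](): T = {
      if (clazz == null) {
        // First use: resolve by name through the context classloader.
        clazz = Thread.currentThread().getContextClassLoader
          .loadClass(functionClassName)
          .asInstanceOf[Class[_ <: AnyRef]]
      }
      // Instantiate under the loader that loaded the cached Class.
      withContextClassLoader(clazz.getClassLoader) {
        clazz.getConstructor().newInstance().asInstanceOf[T]
      }
    }

    // Mimics an expression copy: the cached Class travels with the copy.
    def copyWrapper(): FunctionWrapperSketch =
      new FunctionWrapperSketch(functionClassName, clazz)
  }

  // Resolves once, swaps the context classloader, then uses a copied wrapper.
  def demo(): String = {
    val thread = Thread.currentThread()
    val original = thread.getContextClassLoader
    try {
      thread.setContextClassLoader(classOf[SimpleUdf].getClassLoader)
      val wrapper = new FunctionWrapperSketch(classOf[SimpleUdf].getName)
      wrapper.createFunction[SimpleUdf]() // resolves and caches the Class

      // A bootstrap-only loader cannot see SimpleUdf by name.
      thread.setContextClassLoader(new URLClassLoader(Array.empty[URL], null))
      wrapper.copyWrapper().createFunction[SimpleUdf]().evaluate("ok")
    } finally {
      thread.setContextClassLoader(original)
    }
  }

  def main(args: Array[String]): Unit =
    println(demo()) // prints "OK"
}
```

Because the copy carries the cached Class, the second `createFunction` call never goes back to a by-name lookup, so the changed context classloader no longer matters. Resetting the cached Class to null in the copy would reintroduce the lookup and the failure, which matches the review comment about `clazz = null`.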
Credit to @cloud-fan as he discovered the problem and proposed the solution.
Why are the changes needed?
Above section describes why it's a bug and how it's fixed.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UTs added.