Conversation

LuciferYang commented Sep 6, 2024

What changes were proposed in this pull request?

This pull request changes the default value of the ivySettings parameter in the IvyTestUtils#withRepository function. During construction of the IvySettings object, the DefaultIvyUserDir and DefaultCache of the instance are now adjusted through an additional call to the MavenUtils.processIvyPathArg function:

  1. The DefaultIvyUserDir is set to ${user.home}/.ivy2.5.2.
  2. The DefaultCache is set to the cache directory under the modified IvyUserDir, i.e. ${user.home}/.ivy2.5.2/cache (without this change, it defaults to ${user.home}/.ivy2/cache).

These changes address a bad case in the test process.

Additionally, to allow IvyTestUtils to invoke the MavenUtils.processIvyPathArg function, the visibility of the processIvyPathArg function has been adjusted from private to private[util].
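
For reference, here is a minimal sketch of the resulting behavior. The body of processIvyPathArg is inferred from this description rather than copied from the Spark source, and it assumes Apache Ivy 2.5.2 on the classpath:

```scala
import java.io.File
import org.apache.ivy.core.settings.IvySettings

object IvySettingsDefaultSketch {
  // Inferred from the description above (not the verbatim Spark source):
  // what MavenUtils.processIvyPathArg does when no explicit Ivy path is given.
  def processIvyPathArg(ivySettings: IvySettings, ivyPath: Option[String]): Unit = {
    val ivyUserDir = ivyPath.filterNot(_.trim.isEmpty).getOrElse(
      System.getProperty("user.home") + File.separator + ".ivy2.5.2")
    // DefaultIvyUserDir becomes ${user.home}/.ivy2.5.2 ...
    ivySettings.setDefaultIvyUserDir(new File(ivyUserDir))
    // ... and DefaultCache becomes the cache directory under it.
    ivySettings.setDefaultCache(new File(ivyUserDir, "cache"))
  }

  // The new default value of withRepository's ivySettings parameter is then
  // equivalent to:
  def defaultTestIvySettings(): IvySettings = {
    val settings = new IvySettings
    processIvyPathArg(settings, ivyPath = None)
    settings
  }
}
```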

Why are the changes needed?

This fixes a bad case in the tests; the reproduction steps are as follows:

  1. Clean up files and directories related to mylib-0.1.jar under ~/.ivy2.5.2
  2. Execute the following tests using Java 21:

```
java -version
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu21.36+17-CA (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Zulu21.36+17-CA (build 21.0.4+7-LTS, mixed mode, sharing)
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.application.ReplE2ESuite" -Phive
```

```
Deleting /Users/yangjie01/.ivy2/cache/my.great.lib, exists: false
file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-2a9107ea-4e09-4dfe-a270-921d799837fb/ added as a remote repository with the name: repo-1
:: loading settings :: url = jar:file:/Users/yangjie01/Library/Caches/Coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/apache/ivy/ivy/2.5.2/ivy-2.5.2.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/yangjie01/.ivy2.5.2/cache
The jars for the packages stored in: /Users/yangjie01/.ivy2.5.2/jars
my.great.lib#mylib added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-5827ff8a-7a85-4598-8ced-e949457752e4;1.0
	confs: [default]
	found my.great.lib#mylib;0.1 in repo-1
downloading file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-2a9107ea-4e09-4dfe-a270-921d799837fb/my/great/lib/mylib/0.1/mylib-0.1.jar ...
	[SUCCESSFUL ] my.great.lib#mylib;0.1!mylib.jar (1ms)
:: resolution report :: resolve 4325ms :: artifacts dl 2ms
	:: modules in use:
	my.great.lib#mylib;0.1 from repo-1 in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   1   |   1   |   0   ||   1   |   1   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-5827ff8a-7a85-4598-8ced-e949457752e4
	confs: [default]
	1 artifacts copied, 0 already retrieved (0kB/6ms)
Deleting /Users/yangjie01/.ivy2/cache/my.great.lib, exists: false
[info] - External JAR (6 seconds, 288 milliseconds)
...
[info] Run completed in 40 seconds, 441 milliseconds.
[info] Total number of tests run: 26
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 26, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

  3. Re-execute the above tests using Java 17:

```
java -version
openjdk version "17.0.12" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu17.52+17-CA (build 17.0.12+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.52+17-CA (build 17.0.12+7-LTS, mixed mode, sharing)
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.application.ReplE2ESuite" -Phive
```

```
[info] - External JAR *** FAILED *** (1 second, 626 milliseconds)
[info]   isContain was false Ammonite output did not contain 'Array[Int] = Array(1, 2, 3, 4, 5)':
[info]   scala>  

[info]   scala> // this import will fail 

[info]   scala> import my.great.lib.MyLib 

[info]   scala>  

[info]   scala> // making library available in the REPL to compile UDF 

[info]   scala> import coursierapi.{Credentials, MavenRepository} 
import coursierapi.{Credentials, MavenRepository}
[info]   
[info]   scala> interp.repositories() ++= Seq(MavenRepository.of("file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-6e6bc234-758f-44f1-a8b3-fbb79ed74647/")) 

[info]   
[info]   scala> import $ivy.`my.great.lib:mylib:0.1` 
import $ivy.$
[info]   
[info]   scala>  

[info]   scala> val func = udf((a: Int) => {
[info]            import my.great.lib.MyLib
[info]            MyLib.myFunc(a)
[info]          }) 
func: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction(
[info]     f = ammonite.$sess.cmd28$Helper$$Lambda$3059/0x0000000801da4218@721b2487,
[info]     dataType = IntegerType,
[info]     inputEncoders = ArraySeq(Some(value = PrimitiveIntEncoder)),
[info]     outputEncoder = Some(value = BoxedIntEncoder),
[info]     givenName = None,
[info]     nullable = true,
[info]     deterministic = true
[info]   )
[info]   
[info]   scala>  

[info]   scala> // add library to the Executor 

[info]   scala> spark.addArtifact("ivy://my.great.lib:mylib:0.1?repos=file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-6e6bc234-758f-44f1-a8b3-fbb79ed74647/") 

[info]   
[info]   scala>  

[info]   scala> spark.range(5).select(func(col("id"))).as[Int].collect() 

[info]   scala>  

[info]   scala> semaphore.release() 

[info]   Error Output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info]   Compiling /Users/yangjie01/SourceCode/git/spark-sbt/connector/connect/client/jvm/(console)
[info]   cmd25.sc:1: not found: value my
[info]   import my.great.lib.MyLib
[info]          ^
[info]   Compilation Failed
[info]   org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] User defined function (` (cmd28$Helper$$Lambda$3054/0x0000007002189800)`: (int) => int) failed due to: java.lang.UnsupportedClassVersionError: my/great/lib/MyLib has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0. SQLSTATE: 39000
[info]     org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:195)
[info]     org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
[info]     org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:114)
[info]     org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info]     org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
[info]     org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
[info]     scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
[info]     scala.collection.mutable.Growable.addAll(Growable.scala:61)
[info]     scala.collection.mutable.Growable.addAll$(Growable.scala:57)
[info]     scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:75)
[info]     scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1505)
[info]     scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1498)
[info]     scala.collection.AbstractIterator.toArray(Iterator.scala:1303)
[info]     org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$5(SparkConnectPlanExecution.scala:183)
[info]     org.apache.spark.SparkContext.$anonfun$submitJob$1(SparkContext.scala:2608)
[info]     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info]     org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info]     org.apache.spark.scheduler.Task.run(Task.scala:146)
[info]     org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
[info]     org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info]     org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info]     org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info]     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
[info]     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]     java.lang.Thread.run(Thread.java:840)
[info]   org.apache.spark.SparkException: java.lang.UnsupportedClassVersionError: my/great/lib/MyLib has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0
[info]     java.lang.ClassLoader.defineClass1(Native Method)
[info]     java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
[info]     java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
[info]     java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
[info]     java.net.URLClassLoader$1.run(URLClassLoader.java:427)
[info]     java.net.URLClassLoader$1.run(URLClassLoader.java:421)
[info]     java.security.AccessController.doPrivileged(AccessController.java:712)
[info]     java.net.URLClassLoader.findClass(URLClassLoader.java:420)
[info]     java.lang.ClassLoader.loadClass(ClassLoader.java:592)
[info]     org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:55)
[info]     java.lang.ClassLoader.loadClass(ClassLoader.java:579)
[info]     org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
[info]     java.lang.ClassLoader.loadClass(ClassLoader.java:525)
[info]     org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
[info]     java.lang.ClassLoader.loadClass(ClassLoader.java:592)
[info]     java.lang.ClassLoader.loadClass(ClassLoader.java:525)
[info]     ammonite.$sess.cmd28$Helper.$anonfun$func$1(cmd28.sc:3)
[info]     ammonite.$sess.cmd28$Helper.$anonfun$func$1$adapted(cmd28.sc:1)
[info]     org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:112)
[info]     org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info]     org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
[info]     org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
[info]     scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
[info]     scala.collection.mutable.Growable.addAll(Growable.scala:61)
[info]     scala.collection.mutable.Growable.addAll$(Growable.scala:57)
[info]     scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:75)
[info]     scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1505)
[info]     scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1498)
[info]     scala.collection.AbstractIterator.toArray(Iterator.scala:1303)
[info]     org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$5(SparkConnectPlanExecution.scala:183)
[info]     org.apache.spark.SparkContext.$anonfun$submitJob$1(SparkContext.scala:2608)
[info]     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info]     org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info]     org.apache.spark.scheduler.Task.run(Task.scala:146)
[info]     org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
[info]     org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info]     org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info]     org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info]     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
[info]     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]     java.lang.Thread.run(Thread.java:840) (ReplE2ESuite.scala:117)
```

I suspect the causes of the aforementioned bad case are as follows:

  1. Following #45075 ([SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2), Spark 4.0 adopted ~/.ivy2.5.2 as the default Ivy user directory to address compatibility issues. When the tests are executed with Java 21, the compiled mylib-0.1.jar is published to ~/.ivy2.5.2/cache/my.great.lib/mylib/jars.

  2. However, the getDefaultCache method of a freshly constructed IvySettings instance still returns ~/.ivy2/cache. Consequently, when the purgeLocalIvyCache function is called from withRepository, it cleans the artifact and dependency directories under ~/.ivy2/cache and therefore fails to remove the mylib-0.1.jar that Java 21 published to ~/.ivy2.5.2/cache/my.great.lib/mylib/jars. When the tests are then re-executed with Java 17 and attempt to load this Java 21-compiled mylib-0.1.jar, they fail. The relevant snippets from IvyTestUtils.scala are shown below.

```scala
private[spark] def withRepository(
    artifact: MavenCoordinate,
    dependencies: Option[String],
    rootDir: Option[File],
    useIvyLayout: Boolean = false,
    withPython: Boolean = false,
    withR: Boolean = false,
    ivySettings: IvySettings = new IvySettings)(f: String => Unit): Unit = {
  val deps = dependencies.map(MavenUtils.extractMavenCoordinates)
  purgeLocalIvyCache(artifact, deps, ivySettings)
  val repo = createLocalRepositoryForTests(artifact, dependencies, rootDir, useIvyLayout,
  // ... (truncated)
```

```scala
/** Deletes the test packages from the ivy cache */
private def purgeLocalIvyCache(
    artifact: MavenCoordinate,
    dependencies: Option[Seq[MavenCoordinate]],
    ivySettings: IvySettings): Unit = {
  // delete the artifact from the cache as well if it already exists
  FileUtils.deleteDirectory(new File(ivySettings.getDefaultCache, artifact.groupId))
  dependencies.foreach { _.foreach { dep =>
      FileUtils.deleteDirectory(new File(ivySettings.getDefaultCache, dep.groupId))
    }
  }
}
```
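
The mismatch can be seen with a minimal check (a sketch, assuming Apache Ivy 2.5.2 on the classpath; the printed path is Ivy's stock default, not Spark's):

```scala
import org.apache.ivy.core.settings.IvySettings

object DefaultCacheCheck {
  def main(args: Array[String]): Unit = {
    // A freshly constructed IvySettings knows nothing about Spark's
    // ~/.ivy2.5.2 convention, so it falls back to Ivy's own defaults.
    val settings = new IvySettings
    // Typically prints ${user.home}/.ivy2/cache, while the test artifact was
    // published under ${user.home}/.ivy2.5.2/cache, so the purge misses it.
    println(settings.getDefaultCache)
  }
}
```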

To address this issue, the pull request modifies the default configuration of the IvySettings instance so that purgeLocalIvyCache properly cleans up the corresponding cache files under ~/.ivy2.5.2/cache.

Does this PR introduce any user-facing change?

No, this is a test-only change.

How was this patch tested?

  1. Pass GitHub Actions
  2. Manually executing the tests described above now succeeds, and the ~/.ivy2.5.2/cache/my.great.lib directory is confirmed to be cleaned up (a quick manual check is sketched below).
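
A minimal sketch of that manual check (the path and group ID follow the reproduction above; the object name is illustrative):

```scala
import java.io.File

object CacheCleanupCheck {
  def main(args: Array[String]): Unit = {
    // After the suite runs, the purged group directory should be gone.
    val dir = new File(
      System.getProperty("user.home"), ".ivy2.5.2/cache/my.great.lib")
    println(s"$dir exists: ${dir.exists()}") // expected: false
  }
}
```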

Was this patch authored or co-authored using generative AI tooling?

NO


LuciferYang commented Sep 6, 2024

Testing first; will update the PR description later.

LuciferYang marked this pull request as draft September 6, 2024 02:45
dongjoon-hyun left a comment


Thank you. Looks valid to me. Shall we file a JIRA and convert to a normal PR, @LuciferYang ?

LuciferYang changed the title from "[CORE][TESTS] Change default ivySettings in the IvyTestUtis#withRepository function to use .ivy2.5.2 as the Default Ivy User Dir" to "[SPARK-49533][CORE][TESTS] Change default ivySettings in the IvyTestUtis#withRepository function to use .ivy2.5.2 as the Default Ivy User Dir" Sep 6, 2024
LuciferYang marked this pull request as ready for review September 6, 2024 08:21
@LuciferYang

> Thank you. Looks valid to me. Shall we file a JIRA and convert to a normal PR, @LuciferYang ?

done

dongjoon-hyun left a comment


+1, LGTM. Thank you, @LuciferYang .
Merged to master.

@LuciferYang

Thanks @dongjoon-hyun

IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
LuciferYang deleted the IvyTestUtils-withRepository branch May 2, 2025 05:25