[SPARK-49533][CORE][TESTS] Change default `ivySettings` in the `IvyTestUtils#withRepository` function to use `.ivy2.5.2` as the Default Ivy User Dir
#48006
Conversation
Test first, will update PR description later.
dongjoon-hyun left a comment:
Thank you. Looks valid to me. Shall we file a JIRA and convert to a normal PR, @LuciferYang ?
done
+1, LGTM. Thank you, @LuciferYang.
Merged to master.
Thanks @dongjoon-hyun
[SPARK-49533][CORE][TESTS] Change default `ivySettings` in the `IvyTestUtils#withRepository` function to use `.ivy2.5.2` as the Default Ivy User Dir
### What changes were proposed in this pull request?
This pull request changes the default value of the `ivySettings` parameter in the `IvyTestUtils#withRepository` function. When the default `IvySettings` object is constructed, its `DefaultIvyUserDir` and `DefaultCache` are now adjusted through an additional call to the `MavenUtils.processIvyPathArg` function:
1. The `DefaultIvyUserDir` is set to `${user.home}/.ivy2.5.2`.
2. The `DefaultCache` is set to the `cache` directory under the modified Ivy user dir (Ivy's stock default is `${user.home}/.ivy2/cache`).
These changes address a problematic case in the testing process.
Additionally, to allow `IvyTestUtils` to invoke the `MavenUtils.processIvyPathArg` function, the visibility of `processIvyPathArg` has been changed from `private` to `private[util]`.
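For illustration, the net effect on the default settings can be sketched roughly as follows. This is a minimal, hypothetical sketch (the helper name `defaultTestIvySettings` and the inlined body are assumptions), not the actual diff; the real change reuses `MavenUtils.processIvyPathArg`, whose visibility is relaxed as noted above.
```
import java.io.File

import org.apache.ivy.core.settings.IvySettings

// Hypothetical helper illustrating the defaults that withRepository now uses;
// the PR obtains the same effect via MavenUtils.processIvyPathArg.
def defaultTestIvySettings(): IvySettings = {
  val settings = new IvySettings()
  // Point the Ivy user dir at ~/.ivy2.5.2 instead of the stock ~/.ivy2 ...
  val ivyUserDir = new File(System.getProperty("user.home"), ".ivy2.5.2")
  settings.setDefaultIvyUserDir(ivyUserDir)
  // ... and keep the default cache under that same directory.
  settings.setDefaultCache(new File(ivyUserDir, "cache"))
  settings
}
```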
### Why are the changes needed?
This fixes a problematic case in testing. The reproduction steps are as follows:
1. Clean up the files and directories related to `mylib-0.1.jar` under `~/.ivy2.5.2`.
2. Execute the following tests using Java 21:
```
java -version
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu21.36+17-CA (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Zulu21.36+17-CA (build 21.0.4+7-LTS, mixed mode, sharing)
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.application.ReplE2ESuite" -Phive
```
```
Deleting /Users/yangjie01/.ivy2/cache/my.great.lib, exists: false
file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-2a9107ea-4e09-4dfe-a270-921d799837fb/ added as a remote repository with the name: repo-1
:: loading settings :: url = jar:file:/Users/yangjie01/Library/Caches/Coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/apache/ivy/ivy/2.5.2/ivy-2.5.2.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/yangjie01/.ivy2.5.2/cache
The jars for the packages stored in: /Users/yangjie01/.ivy2.5.2/jars
my.great.lib#mylib added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-5827ff8a-7a85-4598-8ced-e949457752e4;1.0
confs: [default]
found my.great.lib#mylib;0.1 in repo-1
downloading file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-2a9107ea-4e09-4dfe-a270-921d799837fb/my/great/lib/mylib/0.1/mylib-0.1.jar ...
[SUCCESSFUL ] my.great.lib#mylib;0.1!mylib.jar (1ms)
:: resolution report :: resolve 4325ms :: artifacts dl 2ms
:: modules in use:
my.great.lib#mylib;0.1 from repo-1 in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 1 | 1 | 0 || 1 | 1 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-5827ff8a-7a85-4598-8ced-e949457752e4
confs: [default]
1 artifacts copied, 0 already retrieved (0kB/6ms)
Deleting /Users/yangjie01/.ivy2/cache/my.great.lib, exists: false
[info] - External JAR (6 seconds, 288 milliseconds)
...
[info] Run completed in 40 seconds, 441 milliseconds.
[info] Total number of tests run: 26
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 26, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
3. Re-execute the above tests using Java 17:
```
java -version
openjdk version "17.0.12" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu17.52+17-CA (build 17.0.12+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.52+17-CA (build 17.0.12+7-LTS, mixed mode, sharing)
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.application.ReplE2ESuite" -Phive
```
```
[info] - External JAR *** FAILED *** (1 second, 626 milliseconds)
[info] isContain was false Ammonite output did not contain 'Array[Int] = Array(1, 2, 3, 4, 5)':
[info] scala>
[info] scala> // this import will fail
[info] scala> import my.great.lib.MyLib
[info] scala>
[info] scala> // making library available in the REPL to compile UDF
[info] scala> import coursierapi.{Credentials, MavenRepository}
import coursierapi.{Credentials, MavenRepository}
[info]
[info] scala> interp.repositories() ++= Seq(MavenRepository.of("file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-6e6bc234-758f-44f1-a8b3-fbb79ed74647/"))
[info]
[info] scala> import $ivy.`my.great.lib:mylib:0.1`
import $ivy.$
[info]
[info] scala>
[info] scala> val func = udf((a: Int) => {
[info] import my.great.lib.MyLib
[info] MyLib.myFunc(a)
[info] })
func: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction(
[info] f = ammonite.$sess.cmd28$Helper$$Lambda$3059/0x0000000801da4218721b2487,
[info] dataType = IntegerType,
[info] inputEncoders = ArraySeq(Some(value = PrimitiveIntEncoder)),
[info] outputEncoder = Some(value = BoxedIntEncoder),
[info] givenName = None,
[info] nullable = true,
[info] deterministic = true
[info] )
[info]
[info] scala>
[info] scala> // add library to the Executor
[info] scala> spark.addArtifact("ivy://my.great.lib:mylib:0.1?repos=file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-6e6bc234-758f-44f1-a8b3-fbb79ed74647/")
[info]
[info] scala>
[info] scala> spark.range(5).select(func(col("id"))).as[Int].collect()
[info] scala>
[info] scala> semaphore.release()
[info] Error Output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info] Compiling /Users/yangjie01/SourceCode/git/spark-sbt/connector/connect/client/jvm/(console)
[info] cmd25.sc:1: not found: value my
[info] import my.great.lib.MyLib
[info] ^
[info] Compilation Failed
[info] org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] User defined function (` (cmd28$Helper$$Lambda$3054/0x0000007002189800)`: (int) => int) failed due to: java.lang.UnsupportedClassVersionError: my/great/lib/MyLib has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0. SQLSTATE: 39000
[info] org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:195)
[info] org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
[info] org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:114)
[info] org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
[info] org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
[info] scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
[info] scala.collection.mutable.Growable.addAll(Growable.scala:61)
[info] scala.collection.mutable.Growable.addAll$(Growable.scala:57)
[info] scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:75)
[info] scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1505)
[info] scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1498)
[info] scala.collection.AbstractIterator.toArray(Iterator.scala:1303)
[info] org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$5(SparkConnectPlanExecution.scala:183)
[info] org.apache.spark.SparkContext.$anonfun$submitJob$1(SparkContext.scala:2608)
[info] org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] org.apache.spark.scheduler.Task.run(Task.scala:146)
[info] org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
[info] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] java.lang.Thread.run(Thread.java:840)
[info] org.apache.spark.SparkException: java.lang.UnsupportedClassVersionError: my/great/lib/MyLib has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0
[info] java.lang.ClassLoader.defineClass1(Native Method)
[info] java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
[info] java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
[info] java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
[info] java.net.URLClassLoader$1.run(URLClassLoader.java:427)
[info] java.net.URLClassLoader$1.run(URLClassLoader.java:421)
[info] java.security.AccessController.doPrivileged(AccessController.java:712)
[info] java.net.URLClassLoader.findClass(URLClassLoader.java:420)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:592)
[info] org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:55)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:579)
[info] org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:525)
[info] org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:592)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:525)
[info] ammonite.$sess.cmd28$Helper.$anonfun$func$1(cmd28.sc:3)
[info] ammonite.$sess.cmd28$Helper.$anonfun$func$1$adapted(cmd28.sc:1)
[info] org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:112)
[info] org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
[info] org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
[info] scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
[info] scala.collection.mutable.Growable.addAll(Growable.scala:61)
[info] scala.collection.mutable.Growable.addAll$(Growable.scala:57)
[info] scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:75)
[info] scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1505)
[info] scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1498)
[info] scala.collection.AbstractIterator.toArray(Iterator.scala:1303)
[info] org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$5(SparkConnectPlanExecution.scala:183)
[info] org.apache.spark.SparkContext.$anonfun$submitJob$1(SparkContext.scala:2608)
[info] org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] org.apache.spark.scheduler.Task.run(Task.scala:146)
[info] org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
[info] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] java.lang.Thread.run(Thread.java:840) (ReplE2ESuite.scala:117)
```
I suspect the causes of the bad case above are as follows:
1. Following apache#45075 ([SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2), Spark 4.0 adopted `~/.ivy2.5.2` as the default Ivy user directory to address compatibility issues. When the tests are executed with Java 21, the compiled `mylib-0.1.jar` is published to the directory `~/.ivy2.5.2/cache/my.great.lib/mylib/jars`.
2. However, the `getDefaultCache` method of the default `IvySettings` instance still returns `~/.ivy2/cache`. Consequently, when the `purgeLocalIvyCache` function is called within the `withRepository` function, it attempts to clean the `artifact` and `deps` directories under `~/.ivy2/cache`, and therefore fails to remove the `mylib-0.1.jar` under `~/.ivy2.5.2/cache/my.great.lib/mylib/jars` that was published during the Java 21 run. When the tests are re-executed with Java 17, they load this Java 21-compiled `mylib-0.1.jar` (class file version 65.0, newer than the 61.0 that Java 17 supports) and fail with the `UnsupportedClassVersionError` shown above.
https://github.com/apache/spark/blob/9269a0bfed56429e999269dfdfd89aefcb1b7261/common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala#L361-L371
https://github.com/apache/spark/blob/9269a0bfed56429e999269dfdfd89aefcb1b7261/common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala#L392-L403
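The cleanup code linked above keys its deletions off `ivySettings.getDefaultCache`. Below is a simplified, hypothetical sketch of that dependency (the helper `purgeCacheFor` and the use of commons-io are assumptions, not the actual `IvyTestUtils.purgeLocalIvyCache` code): if artifacts were resolved into `~/.ivy2.5.2/cache` while `getDefaultCache` still points at `~/.ivy2/cache`, the delete simply targets the wrong directory.
```
import java.io.File

import org.apache.commons.io.FileUtils // assumed available on the test classpath
import org.apache.ivy.core.settings.IvySettings

// Hypothetical simplification of the cache purge: everything is derived from
// ivySettings.getDefaultCache, so a mismatched default cache means the jar
// published under ~/.ivy2.5.2/cache is never removed.
def purgeCacheFor(ivySettings: IvySettings, groupId: String): Unit = {
  val cacheDir: File = ivySettings.getDefaultCache   // ~/.ivy2/cache before this PR
  val groupDir = new File(cacheDir, groupId)         // e.g. ~/.ivy2/cache/my.great.lib
  println(s"Deleting $groupDir, exists: ${groupDir.exists()}")
  if (groupDir.exists()) {
    FileUtils.deleteDirectory(groupDir)
  }
}
```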
To address this issue, this pull request modifies the default configuration of the `IvySettings` instance so that `purgeLocalIvyCache` can properly clean up the corresponding cache files under `~/.ivy2.5.2/cache`.
### Does this PR introduce _any_ user-facing change?
No, this change is test-only.
### How was this patch tested?
1. Pass GitHub Actions.
2. Manually executing the tests described above now succeeds, and the `~/.ivy2.5.2/cache/my.great.lib` directory is confirmed to be cleaned up promptly (a hypothetical post-run check is sketched below).
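Purely illustrative and not part of the PR, the manual verification can be expressed as a small snippet like this:
```
import java.io.File

// Verify that the test group's cache entry under the Ivy 2.5.2 user dir is gone.
val groupCacheDir = new File(
  System.getProperty("user.home"), ".ivy2.5.2/cache/my.great.lib")
assert(!groupCacheDir.exists(), s"expected $groupCacheDir to be cleaned up")
```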
### Was this patch authored or co-authored using generative AI tooling?
NO
Closes apache#48006 from LuciferYang/IvyTestUtils-withRepository.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>