[SPARK-43744][CONNECT] Fix class loading problem caused by stub user classes not found on the server classpath #42069

zhenlineo · 2023-07-19T05:23:19Z

What changes were proposed in this pull request?

This PR introduces a stub class loader for unpacking Scala UDFs on the driver and the executors. When it encounters a user class that is not found on the server session classpath, the stub class loader tries to stub the class.

This solves a problem with serializing UDFs: the Java serializer might include unnecessary user code, e.g. user classes used in lambda definition signatures in the same class where the UDF is defined.
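As an illustration of how a serialized lambda stays tied to the class that defines it, here is a minimal Java sketch. It is not code from this PR: `LambdaCaptureSketch`, `SerFn`, `UserType` and `unrelated` are hypothetical names, and Scala lambdas may differ in detail. It only shows that the serialized form of a lambda records its enclosing class, so a deserializer has to load that class even when the lambda body never uses the other types mentioned in it.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

final class LambdaCaptureSketch {
    /** A serializable function type standing in for a UDF (hypothetical). */
    interface SerFn extends Function<Integer, Integer>, Serializable {}

    /** Hypothetical user class that the UDF body never touches. */
    static final class UserType {}

    /** Unrelated method in the same class; only its signature mentions UserType. */
    static UserType unrelated(UserType in) {
        return in;
    }

    /** The "UDF": a lambda that works on integers only. */
    static SerFn makeUdf() {
        return x -> x + 1;
    }

    /** Returns the replacement object that Java serialization writes for the lambda. */
    static SerializedLambda describe(Serializable lambda) throws Exception {
        // Serializable lambdas get a synthetic private writeReplace method.
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        return (SerializedLambda) writeReplace.invoke(lambda);
    }

    public static void main(String[] args) throws Exception {
        SerializedLambda form = describe(makeUdf());
        // The serialized form names the enclosing class, not just the lambda body.
        System.out.println("capturing class: " + form.getCapturingClass());
        System.out.println("implementation method: " + form.getImplMethodName());
    }
}
```

Because the capturing class is part of the serialized form, the receiving side has to resolve `LambdaCaptureSketch` as a whole, which is one plausible way for types such as `UserType` to be requested from the class loader although the UDF does not need them.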

If the user code is actually needed to execute the UDF, we return an error message suggesting that the user add the missing classes using the addArtifact method.
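The stubbing idea described above can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the implementation in this PR: `StubClassLoaderSketch`, the `shouldStub` predicate and the `com.example.udf.` prefix are hypothetical, and the stub here is an empty class with no constructor, so it can be loaded to satisfy a signature but cannot be instantiated. The PR's stub instead reports an error that points the user to addArtifact.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

final class StubClassLoaderSketch extends ClassLoader {
    private final Predicate<String> shouldStub;

    StubClassLoaderSketch(ClassLoader parent, Predicate<String> shouldStub) {
        super(parent);
        this.shouldStub = shouldStub;
    }

    /** Called by loadClass only after the parent loader failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (!shouldStub.test(name)) {
            throw new ClassNotFoundException(name);
        }
        byte[] bytes = emptyClassBytes(name);
        return defineClass(name, bytes, 0, bytes.length);
    }

    /** Builds a class file for a public class with the given name, extending Object, with no members. */
    static byte[] emptyClassBytes(String binaryName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                      // magic number
            out.writeShort(0);                             // minor version
            out.writeShort(52);                            // major version (Java 8)
            out.writeShort(5);                             // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);           // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(binaryName.replace('.', '/')); // #2 Utf8
            out.writeByte(7); out.writeShort(4);           // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");           // #4 Utf8
            out.writeShort(0x0021);                        // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                             // this_class = #1
            out.writeShort(3);                             // super_class = #3
            out.writeShort(0);                             // no interfaces
            out.writeShort(0);                             // no fields
            out.writeShort(0);                             // no methods
            out.writeShort(0);                             // no attributes
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new StubClassLoaderSketch(
            StubClassLoaderSketch.class.getClassLoader(),
            name -> name.startsWith("com.example.udf."));
        // Not on any classpath, so the parent fails and the stub is defined.
        Class<?> stub = Class.forName("com.example.udf.Missing", false, loader);
        System.out.println(stub.getName() + " loaded by " + stub.getClassLoader());
    }
}
```

Since `findClass` only runs after the parent loader has failed, classes that the parent can already see are never stubbed; only names that are missing and match the predicate are.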

Why are the changes needed?

To improve the user experience of UDFs. This PR should be merged to master and 3.5.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests for both Scala 2.12 and 2.13.

4 tests in SparkSessionE2ESuite still fail to run with Maven after the fix. The client test jar is installed on the system classpath (added using --jar at server start), whereas the stub class loader can only stub classes missing from the session classpath (added using session.addArtifact).

Moving the test jar to the session classpath causes failures in the tests for flatMapGroupsWithState (SPARK-44576). The move of the test jar to the session classpath will be finished once the flatMapGroupsWithState test failures are fixed.

core/src/main/scala/org/apache/spark/executor/Executor.scala

core/src/main/java/org/apache/spark/util/ChildFirstURLClassLoader.java

LuciferYang · 2023-07-25T07:49:09Z

I checked the Maven tests with this PR and 10 tests failed. Further confirmation is needed on whether all of the failures are related to this PR:

run

build/mvn clean install -DskipTests -Phive
build/mvn clean test -pl connector/connect/client/jvm

FlatMapGroupsWithStateStreamingSuite:
- flatMapGroupsWithState - streaming *** FAILED ***
  org.apache.spark.SparkException: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
  at scala.collection.Iterator.toStream(Iterator.scala:1417)
  at scala.collection.Iterator.toStream$(Iterator.scala:1416)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1431)
  at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354)
  at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431)
  ...
- flatMapGroupsWithState - streaming - with initial state *** FAILED ***
  org.apache.spark.SparkException: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
  at scala.collection.Iterator.toStream(Iterator.scala:1417)
  at scala.collection.Iterator.toStream$(Iterator.scala:1416)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1431)
  at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354)
  at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431)
  ...
- mapGroupsWithState - streaming *** FAILED ***
  org.apache.spark.SparkException: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
  at scala.collection.Iterator.toStream(Iterator.scala:1417)
  at scala.collection.Iterator.toStream$(Iterator.scala:1416)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1431)
  at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354)
  at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431)
  ...
- mapGroupsWithState - streaming - with initial state *** FAILED ***
  org.apache.spark.SparkException: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
  at scala.collection.Iterator.toStream(Iterator.scala:1417)
  at scala.collection.Iterator.toStream$(Iterator.scala:1416)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1431)
  at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354)
  at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431)
  ...
- flatMapGroupsWithState *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 489.0 failed 1 times, most recent failure: Lost task 0.0 in stage 489.0 (TID 1997) (localhost executor driver): java.lang.ClassCastException: org.apache.spark.sql.ClickState cannot be cast to org.apache.spark.sql.ClickState
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(generated.java:87)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$4(Sp...
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
  at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:153)
  at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:183)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
  at org.apache.spark.sql.connect.client.util.QueryTest.checkDataset(QueryTest.scala:54)
  ...
- flatMapGroupsWithState - with initial state *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 494.0 failed 1 times, most recent failure: Lost task 0.0 in stage 494.0 (TID 2006) (localhost executor driver): java.lang.ClassCastException: org.apache.spark.sql.ClickState cannot be cast to org.apache.spark.sql.ClickState
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(generated.java:87)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$4(Sp...
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
  at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:153)
  at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:183)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
  at org.apache.spark.sql.connect.client.util.QueryTest.checkDataset(QueryTest.scala:54)
  ...
- mapGroupsWithState *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 497.0 failed 1 times, most recent failure: Lost task 0.0 in stage 497.0 (TID 2013) (localhost executor driver): java.lang.ClassCastException: org.apache.spark.sql.ClickState cannot be cast to org.apache.spark.sql.ClickState
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(generated.java:87)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$4(Sp...
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
  at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:153)
  at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:183)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
  at org.apache.spark.sql.connect.client.util.QueryTest.checkDataset(QueryTest.scala:54)
  ...
- mapGroupsWithState - with initial state *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 502.0 failed 1 times, most recent failure: Lost task 0.0 in stage 502.0 (TID 2022) (localhost executor driver): java.lang.ClassCastException: org.apache.spark.sql.ClickState cannot be cast to org.apache.spark.sql.ClickState
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(generated.java:87)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$4(Sp...
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
  at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:153)
  at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:183)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
  at org.apache.spark.sql.connect.client.util.QueryTest.checkDataset(QueryTest.scala:54)
  ...
- update class loader after stubbing: new session *** FAILED ***
  java.io.NotSerializableException: org.scalatest.Engine
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
  at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  ...
- update class loader after stubbing: same session *** FAILED ***
  java.io.NotSerializableException: org.scalatest.Engine
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
  at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  ...
*** 10 TESTS FAILED ***
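A `ClassCastException` of the form "ClickState cannot be cast to ClickState", as in the log above, is what the JVM reports when two different class loaders have each defined a class with the same name. Whether that is the root cause of these failures is not confirmed by the log. The Java sketch below only demonstrates the general effect; `ClassIdentitySketch`, `IsolatedLoader` and `demo.ClickState` are hypothetical stand-ins, and the class bytes are an empty placeholder class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassIdentitySketch {
    /** A loader that defines exactly one class itself from the given bytes. */
    static final class IsolatedLoader extends ClassLoader {
        private final String className;
        private final byte[] bytes;

        IsolatedLoader(String className, byte[] bytes) {
            this.className = className;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Builds a class file for an empty public class with the given name. */
    static byte[] emptyClassBytes(String binaryName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                      // magic number
            out.writeShort(0);                             // minor version
            out.writeShort(52);                            // major version (Java 8)
            out.writeShort(5);                             // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);           // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(binaryName.replace('.', '/')); // #2 Utf8
            out.writeByte(7); out.writeShort(4);           // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");           // #4 Utf8
            out.writeShort(0x0021);                        // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                             // this_class = #1
            out.writeShort(3);                             // super_class = #3
            out.writeShort(0);                             // no interfaces
            out.writeShort(0);                             // no fields
            out.writeShort(0);                             // no methods
            out.writeShort(0);                             // no attributes
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Defines the same class name in two independent loaders. */
    static Class<?>[] loadTwice(String name) throws ClassNotFoundException {
        byte[] bytes = emptyClassBytes(name);
        Class<?> first = new IsolatedLoader(name, bytes).loadClass(name);
        Class<?> second = new IsolatedLoader(name, bytes).loadClass(name);
        return new Class<?>[] {first, second};
    }

    public static void main(String[] args) throws Exception {
        Class<?>[] pair = loadTwice("demo.ClickState");
        System.out.println("same name: " + pair[0].getName().equals(pair[1].getName()));
        System.out.println("same class: " + (pair[0] == pair[1]));
        System.out.println("assignable: " + pair[0].isAssignableFrom(pair[1]));
    }
}
```

The two `Class` objects share a name but are distinct types to the JVM, so an instance created through one loader cannot be cast to the class defined by the other.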

.../client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala

connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UDFClassLoadingE2ESuite.scala

...lient/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala

core/src/main/scala/org/apache/spark/internal/config/package.scala

LuciferYang · 2023-07-25T09:10:01Z

I checked the Maven tests with this PR and 10 tests failed. Further confirmation is needed on whether all of the failures are related to this PR.

	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(generated.java:87)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$4(Sp...
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toSparkThrowable(GrpcExceptionConverter.scala:53)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:30)
  at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:38)
  at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:83)
  at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:153)
  at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:183)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2813)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2812)
  at org.apache.spark.sql.connect.client.util.QueryTest.checkDataset(QueryTest.scala:54)
  ...
- update class loader after stubbing: new session *** FAILED ***
  java.io.NotSerializableException: org.scalatest.Engine
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
  at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  ...
- update class loader after stubbing: same session *** FAILED ***
  java.io.NotSerializableException: org.scalatest.Engine
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
  at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  ...
*** 10 TESTS FAILED ***

All of these test failures occur only with this PR, but this PR does fix four test failures in SparkSessionE2ESuite.

zhenlineo · 2023-07-25T15:27:54Z

@LuciferYang Thanks for the detailed review. Let me fix them. The UDF test failures might be caused by the class loading order I use with the stub class loader.

zhenlineo · 2023-07-26T00:53:45Z

@LuciferYang The 10 errors:

8 streaming-related test failures: something is wrong with the class loader; still investigating.
2 new UDF loading test failures: they need the client jar file, so the tests cannot run with mvn clean. I will add a warning to help with these two failures.

LuciferYang · 2023-07-26T16:18:27Z

2 new UDF loading test failures: they need the client jar file, so the tests cannot run with mvn clean. I will add a warning to help with these two failures.

Yes, we should indicate this clearly, as I've noticed that many developers habitually run the build/mvn package test command for testing, which then causes these test failures.

zhenlineo · 2023-07-27T23:01:52Z

@LuciferYang @vicennial This is ready for another look, thanks.
The PR should be merged to 3.5 as this is a bug fix for 3.5 UDFs.
cc @rednaxelafx @juliuszsompolski @hvanhovell

hvanhovell · 2023-07-29T00:04:39Z

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala

    val plan = proto.Plan.newBuilder().setCommand(command).build()

-    client.execute(plan)
+    client.execute(plan).asScala.foreach(_ => ())


Why is this needed?

Currently the registerUDF call is async. I do not think registerUDF should be async, so I added code that blocks until it succeeds or fails.
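A minimal sketch of why draining the result blocks and surfaces errors, assuming (as the diff suggests) that the responses come back as a lazy iterator. `LazyResponses` and `drain` are hypothetical names for illustration and are not part of the Spark Connect client; Java is used only to keep the sketch self-contained.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a lazily evaluated response stream:
// nothing fails until a caller actually pulls elements from it.
final class LazyResponses implements Iterator<String> {
    private final int okCount;       // responses to emit before the stream ends
    private final boolean failAtEnd; // whether the stream ends with an error
    private int emitted = 0;

    LazyResponses(int okCount, boolean failAtEnd) {
        this.okCount = okCount;
        this.failAtEnd = failAtEnd;
    }

    @Override
    public boolean hasNext() {
        if (emitted < okCount) return true;
        if (failAtEnd) throw new IllegalStateException("server reported an error");
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return "response-" + emitted++;
    }

    // Consume and discard every element. Returns only after the stream ends
    // and propagates any error raised while iterating.
    static int drain(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

Constructing the iterator never throws; only `drain` does, which is the effect of `foreach(_ => ())` in the diff.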

hvanhovell · 2023-07-29T00:05:38Z

core/src/main/scala/org/apache/spark/util/StubClassLoader.scala

+    classWriter.visitSource(name + ".java", null)
+
+    // Generate constructor.
+    val ctorWriter = classWriter.visitMethod(


Can you file a follow-up to make this throw an exception?

Do you want to cover the case where the default constructor is called? I had that code, but thought it was not very useful, since in 99% of cases it already fails when calling other constructors or methods, before any default constructor is called. Let me bring back the code...

hvanhovell · 2023-07-29T01:44:34Z

connector/connect/client/jvm/src/test/resources/StubClassDummyUdf.scala

Location is a bit weird, why not in src/test/scala?

This source file cannot be on the classpath, otherwise sbt would include it in the server system classpath, so it lives outside, in resources. We only need the jars and binaries, which are manually installed on the session classpath. The source file is kept just in case anyone wonders what the dummy UDF looks like.

hvanhovell · 2023-07-29T02:37:57Z

core/src/main/scala/org/apache/spark/executor/Executor.scala

+      if (updated) {
+        // When a new url is added for non-default class loader, recreate the class loader
+        // to ensure all classes are updated.
+        state.urlClassLoader = createClassLoader(state.urlClassLoader.getURLs, useStub = true)


Why do we recreate the URL classloader as well? Is that needed?

nvm I get it.
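For readers who do not immediately see it: a class loader keeps the classes it has already defined, so a fresh loader over the same URLs is the simple way to have every name resolved again. The sketch below only illustrates that general JVM behaviour; `LoaderRefresh`, `forDirectory`, and `recreate` are hypothetical names, not the `createClassLoader` used in Executor.scala.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

final class LoaderRefresh {
    // Build a loader over a single directory (the directory need not exist
    // for this illustration).
    static URLClassLoader forDirectory(String dir, ClassLoader parent) {
        try {
            URL url = Paths.get(dir).toUri().toURL();
            return new URLClassLoader(new URL[] { url }, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Build a fresh loader over the same URLs and parent. Classes defined by
    // the old loader are not carried over, so each name is resolved again.
    static URLClassLoader recreate(URLClassLoader old) {
        return new URLClassLoader(old.getURLs(), old.getParent());
    }
}
```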

hvanhovell · 2023-07-29T02:50:13Z

core/src/main/scala/org/apache/spark/internal/config/package.scala

      .createWithDefault(false)
+
+  private[spark] val CONNECT_SCALA_UDF_STUB_CLASSES =
+    ConfigBuilder("spark.connect.scalaUdf.stubClasses")


stubPrefixes?

can you change this in a follow-up?

core/src/main/scala/org/apache/spark/internal/config/package.scala

hvanhovell · 2023-07-29T02:54:19Z

core/src/main/scala/org/apache/spark/executor/Executor.scala

          isolatedSession)
+        // Always reset the thread class loader to ensure if any updates, all threads (not only
+        // the thread that updated the dependencies) can update to the new class loader.
+        Thread.currentThread.setContextClassLoader(isolatedSession.replClassLoader)


I am pretty sure we do this elsewhere as well.

Are you also unsetting it once you are done?
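The set-and-restore pattern the question refers to can be sketched as follows. This is the generic JVM idiom, not what Executor.scala does; `ContextLoaderScope` and `withContextClassLoader` are hypothetical names.

```java
import java.util.function.Supplier;

final class ContextLoaderScope {
    // Run body with `loader` as the thread's context class loader, then put
    // the previous loader back, even if body throws.
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> body) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

Restoring in `finally` matters mostly for pooled threads, where a loader left behind would leak into the next unit of work that runs on the same thread.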

hvanhovell · 2023-07-29T02:55:44Z

...erver/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala

+    val loader = if (SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_CLASSES).nonEmpty) {
+      val stubClassLoader =
+        StubClassLoader(null, SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_CLASSES))
+      new ChildFirstURLClassLoader(


Should this follow the same rules for classpath resolution we have on the executor?

It probably should, to be consistent. Let me fix it in a follow-up.

Actually it is fine. There are 3 existing class loaders:

User CL : classes added using --jar
Sys CL: Spark + sys libs
Session CL: classes added using session.addArtifacts

In Executor:

normal: Sys -> (User + Session) -> Stub

reverse: (User + Session) -> Sys -> Stub

In Driver:

normal: (Sys + User) -> Session -> Stub

reverse: (User -> Sys) -> Session -> Stub

So here what you saw is () -> Session -> Stub.
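The orderings above can be read as "ask each stage in turn, and let the stub answer last". A toy model of that lookup, with each stage reduced to a set of class names, is sketched below. `LookupOrder` and `Source` are hypothetical, and the model ignores that the real stub loader only answers for configured prefixes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LookupOrder {
    // One stage in the chain: a label plus the class names it can provide.
    static final class Source {
        final String label;
        final Set<String> classes;

        Source(String label, String... classes) {
            this.label = label;
            this.classes = new HashSet<>(Arrays.asList(classes));
        }
    }

    // Walk the chain in order and report which stage supplies the class.
    // If no stage has it, the stub stage at the end takes over.
    static String resolve(List<Source> chain, String className) {
        for (Source s : chain) {
            if (s.classes.contains(className)) return s.label;
        }
        return "Stub";
    }
}
```

With a class present in both Sys and (User + Session), the executor's normal order resolves it from Sys and the reverse order resolves it from (User + Session); a class present in neither falls through to the stub.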

hvanhovell

LGTM

hvanhovell · 2023-07-29T02:58:52Z

Merging this, it fixes a pretty big UX issue for UDFs!

…classes not found on the server classpath

### What changes were proposed in this pull request?
This PR introduces a stub class loader for unpacking Scala UDFs in the driver and the executor. When encountering user classes that are not found on the server session classpath, the stub class loader would try to stub the class.

This solves the problem that when serializing UDFs, Java serializer might include unnecessary user code e.g. User classes used in the lambda definition signatures in the same class where the UDF is defined.

If the user code is actually needed to execute the UDF, we will return an error message to suggest the user to add the missing classes using the `addArtifact` method.

### Why are the changes needed?
To enhance the user experience of UDF. This PR should be merged to master and 3.5.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added test both for Scala 2.12 & 2.13

4 tests in SparkSessionE2ESuite still fail to run with maven after the fix because the client test jar is installed on the system classpath (added using --jar at server start), the stub classloader can only stub classes missing from the session classpath (added using `session.addArtifact`).

Moving the test jar to the session classpath causes failures in tests for `flatMapGroupsWithState` (SPARK-44576). Finish moving the test jar to session classpath once `flatMapGroupsWithState` test failures are fixed.

Closes #42069 from zhenlineo/ref-spark-result.

Authored-by: Zhen Li
Signed-off-by: Herman van Hovell
(cherry picked from commit 6d0fed9)
Signed-off-by: Herman van Hovell

juliuszsompolski · 2023-07-29T13:30:22Z

core/src/main/scala/org/apache/spark/internal/config/package.scala

+      .doc("""
+          |Comma-separated list of binary names of classes/packages that should be stubbed during
+          |the Scala UDF serde and execution if not found on the server classpath.
+          |An empty list effectively disables stubbing for all missing classes.
+          |By default, the server stubs classes from the Scala client package.
+          |""".stripMargin)


So by default we will stub if some Spark Connect client code is pulled into the UDF, but not if serialization pulls in some other class that is unrelated to the client and not needed by the UDF, but is merely referenced in the containing class in a way that causes it to be pulled in?
In that case the user would also get a ClassNotFound error?
Do we in that case want the user to add that class using addArtifact, even though it might be unclear to the user why it is relevant to the UDF?
What are the disadvantages of just stubbing everything?

The stub class loader currently would be used for all withSession calls in drivers, and all task runs in executors.
Perhaps we should use the stubbing only for UDF class loading in drivers, with a more aggressive default, e.g. "org, com".
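To make the trade-off concrete, here is a rough sketch of prefix-based matching over a comma-separated setting such as "org, com". It mirrors the `startsWith` check visible in the StubClassLoader diff, but `StubPrefixes`, `parse`, and `shouldStub` are hypothetical names, and trimming whitespace around entries is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class StubPrefixes {
    // Split a comma-separated config value into trimmed, non-empty prefixes.
    static List<String> parse(String configValue) {
        List<String> out = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String p = part.trim();
            if (!p.isEmpty()) out.add(p);
        }
        return out;
    }

    // A class is eligible for stubbing when its binary name starts with
    // any configured prefix. An empty list therefore stubs nothing.
    static boolean shouldStub(List<String> prefixes, String binaryName) {
        for (String p : prefixes) {
            if (binaryName.startsWith(p)) return true;
        }
        return false;
    }
}
```

Because this is a plain string-prefix test, a short prefix like "org" also matches a name such as "organic.Foo"; ending prefixes with a dot would narrow the match to package boundaries.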

Rubber duck questions :-):
What are the risks of being more aggressive and stubbing everything?
Why are the risks smaller if you were to do it only on the driver?
Would it even work without doing it on executors? Executors execute this, so they need to have the stubs to not run into ClassNotFound?

In the description you write

Java serializer might include unnecessary user code e.g. User classes used in the lambda definition signatures in the same class where the UDF is defined.

but with it defaulting to connect client classes only, it will actually not help for "User classes"?

Including @hvanhovell, as he suggested not stubbing user classes.

I generally expect user classes to be present on the classpath; if they are not, the user needs to do something anyway. The internal classes are a bit special because they can be captured by accident, so stubbing makes more sense there.

IDK about that... don't you think that if Spark Connect is used inside a bigger real-life application, there may be many user classes that are unrelated to anything the user wishes to execute on the Spark cluster, just various application business logic, that can get captured by accident just as well?

juliuszsompolski · 2023-07-29T13:34:40Z

...connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala

+      case e: IOException if e.getCause.isInstanceOf[NoSuchMethodException] =>
+        throw new ClassNotFoundException(
+          s"Failed to load class correctly due to ${e.getCause}. " +
+            "Make sure the artifact where the class is defined is installed by calling" +
+            " session.addArtifact.")


In the description you write

If the user code is actually needed to execute the UDF, we will return an error message to suggest the user to add the missing classes using the addArtifact method.

but since this triggers during deserialization, wouldn't this trigger also for a class that is not actually used, just accidentally pulled in, and not captured by the CONNECT_SCALA_UDF_STUB_CLASSES config?

wouldn't this trigger also for a class that is not actually used, just accidentally pulled in, and not captured by the CONNECT_SCALA_UDF_STUB_CLASSES config

The code you highlighted would not catch this class, because the case you describe would fail with a ClassNotFoundException rather than a NoSuchMethodException.

Ah, smart, that's why you catch NoSuchMethodException: it suggests actual use, whereas for a ClassNotFoundException you generate a stub. Now I finally understand, from your other comment with the explanation.
That could be worth a rubber ducky comment here saying that "while ClassNotFoundException may be caused by an unused class accidentally pulled in by the serializer, NoSuchMethodException suggests actual use of the class".

And @hvanhovell's comment about throwing from the default constructor is to cover the case where someone just calls the default constructor, but doesn't use any methods?
Also worth a rubber ducky comment :-)
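The distinction discussed in this thread (a missing class may be incidental, a missing method suggests real use) comes down to inspecting the cause of the deserialization failure. A minimal sketch of that check, modelled on the SparkConnectPlanner snippet above; `DeserializationErrors` and `translate` are hypothetical names and the surrounding deserialization code is omitted.

```java
import java.io.IOException;

final class DeserializationErrors {
    // Decide what to surface for a failed deserialization. A cause of
    // NoSuchMethodException hints that the UDF really uses a class that was
    // only stubbed, so the user is pointed at addArtifact. Anything else is
    // passed through unchanged.
    static Exception translate(IOException e) {
        // instanceof is false for a null cause, so no extra null check is needed.
        if (e.getCause() instanceof NoSuchMethodException) {
            return new ClassNotFoundException(
                "Failed to load class correctly due to " + e.getCause() + ". "
                    + "Make sure the artifact where the class is defined is installed by calling"
                    + " session.addArtifact.",
                e);
        }
        return e;
    }
}
```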

juliuszsompolski · 2023-07-29T13:36:16Z

core/src/main/scala/org/apache/spark/util/StubClassLoader.scala

+    new StubClassLoader(parent, name => binaryName.exists(p => name.startsWith(p)))
+  }
+
+  def generateStub(binaryName: String): Array[Byte] = {


In the description you write

If the user code is actually needed to execute the UDF, we will return an error message to suggest the user to add the missing classes using the addArtifact method.

If I understand correctly, this generated stub should be throwing that error if it actually gets called?

When a user actually uses a class, it would normally look like val clazz = new Clazz(); clazz.callMethod. When this happens, it fails earlier, at compile, when trying to find the method, before we get here (throwing the error from the constructor at runtime).

Throwing an error from the constructor would only help if the user calls val clazz = new Clazz() and does not use the class afterwards.

If you ask why we do not stub the methods that the user would call and throw the error there: the reason is that it is too hard :) We would need to scan the UDF contents. The NoSuchMethodException in SparkConnectPlanner is good enough to throw the error for us.
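As an illustration of the constructor-only case, here is a hand-written stand-in for a stub. The PR generates its stubs as bytecode, so this is only an analogy; `StubbedUserClass` and `StubDemo` are hypothetical names.

```java
// Stand-in for a generated stub: the type exists so that signatures which
// mention it can be resolved, but it cannot be instantiated.
final class StubbedUserClass {
    StubbedUserClass() throws ClassNotFoundException {
        throw new ClassNotFoundException(
            "StubbedUserClass is a stub; add the real class with session.addArtifact");
    }
}

final class StubDemo {
    // Mentions the stub type only in its signature and never constructs it,
    // like a lambda that merely lives next to unrelated user code.
    static int lengthOf(String s, StubbedUserClass unused) {
        return s.length();
    }

    // Reports whether constructing the stub succeeds.
    static boolean canInstantiate() {
        try {
            new StubbedUserClass();
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Code that only names the type keeps working, while the first `new` fails with a message pointing at the missing artifact.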

### What changes were proposed in this pull request?
Made the stub constructor to throw ClassNotFoundException if called. A tiny improvement to not recreate class loaders in executor if stubbing is not enabled.

### Why are the changes needed?
Enhancement to #42069

Should be merged to 3.5.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests.

Closes #42222 from zhenlineo/error-from-constuctor.

Authored-by: Zhen Li
Signed-off-by: Hyukjin Kwon
(cherry picked from commit 5df1d79)
Signed-off-by: Hyukjin Kwon

…ry files

### What changes were proposed in this pull request?
The purpose of this pr is to clean up the binary files used to assist with Scala 2.12 testing. They include:

- `core/src/test/resources/TestHelloV3_2.12.jar` and `core/src/test/resources/TestHelloV2_2.12.jar` added by SPARK-44246(#41789).
- `connector/connect/client/jvm/src/test/resources/udf2.12` and `connector/connect/client/jvm/src/test/resources/udf2.12.jar` added by SPARK-43744(#42069)
- `connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar` added by SPARK-44293(#41844)
- `sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar` added by SPARK-25304(#22308)

### Why are the changes needed?
Spark 4.0 no longer supports Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43106 from LuciferYang/SPARK-45321.

Authored-by: yangjie01
Signed-off-by: yangjie01

tenstriker · 2024-02-29T01:45:37Z

I think this PR is affecting external users as well. We start the spark-connect server with external jars and are hitting a similar ClassCastException due to a class loading issue: https://issues.apache.org/jira/browse/SPARK-46762

### What changes were proposed in this pull request?
This jar was added in #42069 but moved in #43735.

### Why are the changes needed?
To clean up a jar not used.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests should check

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47315 from HyukjinKwon/minor-cleanup-jar-2.

Authored-by: Hyukjin Kwon
Signed-off-by: Martin Grund

github-actions bot added SQL BUILD CORE CONNECT labels Jul 19, 2023

vicennial reviewed Jul 19, 2023

core/src/main/scala/org/apache/spark/executor/Executor.scala (outdated, resolved)

vicennial reviewed Jul 19, 2023

core/src/main/scala/org/apache/spark/executor/Executor.scala (outdated, resolved)

vicennial reviewed Jul 19, 2023

core/src/main/java/org/apache/spark/util/ChildFirstURLClassLoader.java (outdated, resolved)

vicennial reviewed Jul 19, 2023

core/src/main/scala/org/apache/spark/executor/Executor.scala (outdated, resolved)

juliuszsompolski mentioned this pull request Jul 24, 2023

[SPARK-44422][CONNECT] Spark Connect fine grained interrupt #42009

Closed

zhenlineo force-pushed the ref-spark-result branch from e4721af to f60b4aa Compare July 25, 2023 05:38

zhenlineo changed the title ~~[WIP] Fix class loading problem caused by stub user classes not found on the server classpath~~ [SPARK-43744][CONNECT] Fix class loading problem caused by stub user classes not found on the server classpath Jul 25, 2023

zhenlineo added 3 commits July 24, 2023 22:38

Reproduce the error with classNotFound with SparkResult

33d8961

Fix executor class loader

98ec68c

Fix

79f716d

zhenlineo force-pushed the ref-spark-result branch from f60b4aa to 79f716d Compare July 25, 2023 05:43

zhenlineo marked this pull request as ready for review July 25, 2023 05:43

LuciferYang reviewed Jul 25, 2023

github-actions bot removed the BUILD label Jul 25, 2023

Fix after review

31adf15

zhenlineo force-pushed the ref-spark-result branch from cac9ff0 to 31adf15 Compare July 26, 2023 00:32

Make the tests runnable for sbt too

0290c58

hvanhovell reviewed Jul 29, 2023

core/src/main/scala/org/apache/spark/internal/config/package.scala (resolved)

hvanhovell reviewed Jul 29, 2023

hvanhovell approved these changes Jul 29, 2023

hvanhovell closed this in 6d0fed9 Jul 29, 2023

zhenlineo mentioned this pull request Jul 29, 2023

[SPARK-43744][CONNECT][FOLLOW-UP]Throw error from the constructor #42222

Closed

juliuszsompolski reviewed Jul 29, 2023

LuciferYang mentioned this pull request Sep 25, 2023

[SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files #43106

Closed

This was referenced Jul 11, 2024

[MINOR][TESTS] Remove unused test jar (udf_noA.jar) #47309

Merged

[MINOR][TESTS] Remove unused test jar (udf_noA.jar) #47315

Closed
