Skip to content

Commit 62e2824

Browse files
zsxwingcloud-fan
authored andcommitted
[SPARK-28456][SQL] Add a public API Encoder.makeCopy to allow creating Encoder without touching Scala Reflection
## What changes were proposed in this pull request? Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` in multiple `Dataset`s. However, creating an `Encoder` for a complicated class is slow due to Scala Reflection. To eliminate the cost of Scala Reflection, right now I usually use the private API `ExpressionEncoder.copy` as follows: ```scala object FooEncoder { private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]() implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy() } ``` This PR proposes a new method `makeCopy` in `Encoder` so that the above codes can be rewritten using public APIs. ```scala object FooEncoder { private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]() implicit def encoder: Encoder[Foo] = _encoder.makeCopy } ``` The method name is consistent with `TreeNode.makeCopy`. ## How was this patch tested? Jenkins Closes #25209 from zsxwing/encoder-copy. Authored-by: Shixiong Zhu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
1 parent 7ed0088 commit 62e2824

File tree

3 files changed

+12
-0
lines changed

3 files changed

+12
-0
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/Encoder.scala

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,4 +80,10 @@ trait Encoder[T] extends Serializable {
8080
* A ClassTag that can be used to construct an Array to contain a collection of `T`.
8181
*/
8282
def clsTag: ClassTag[T]
83+
84+
/**
85+
* Create a copied [[Encoder]]. The implementation may just copy internal reusable fields to speed
86+
* up the [[Encoder]] creation.
87+
*/
88+
def makeCopy: Encoder[T]
8389
}

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -382,6 +382,8 @@ case class ExpressionEncoder[T](
382382
.map { case(f, a) => s"${f.name}$a: ${f.dataType.simpleString}"}.mkString(", ")
383383

384384
override def toString: String = s"class[$schemaString]"
385+
386+
override def makeCopy: ExpressionEncoder[T] = copy()
385387
}
386388

387389
// A dummy logical plan that can hold expressions and go through optimizer rules.

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -427,6 +427,10 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
427427
testOverflowingBigNumeric(BigInt("9" * 100), "scala very large big int")
428428
testOverflowingBigNumeric(new BigInteger("9" * 100), "java very big int")
429429

430+
encodeDecodeTest("foo" -> 1L, "makeCopy") {
431+
Encoders.product[(String, Long)].makeCopy.asInstanceOf[ExpressionEncoder[(String, Long)]]
432+
}
433+
430434
private def testOverflowingBigNumeric[T: TypeTag](bigNumeric: T, testName: String): Unit = {
431435
Seq(true, false).foreach { allowNullOnOverflow =>
432436
testAndVerifyNotLeakingReflectionObjects(

0 commit comments

Comments
 (0)