Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Jul 19, 2019

What changes were proposed in this pull request?

Because Encoder is not thread safe, the user cannot reuse an Encoder in multiple Datasets. However, creating an Encoder for a complicated class is slow due to Scala Reflection. To eliminate the cost of Scala Reflection, right now I usually use the private API ExpressionEncoder.copy as follows:

object FooEncoder {
  private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
  implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}

This PR proposes a new method makeCopy in Encoder so that the above codes can be rewritten using public APIs.

object FooEncoder {
  private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]()
  implicit def encoder: Encoder[Foo] = _encoder.makeCopy
}

The method name is consistent with TreeNode.makeCopy.

How was this patch tested?

Jenkins

@zsxwing zsxwing changed the title Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala reflection. Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala reflections Jul 19, 2019
@zsxwing zsxwing changed the title Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala reflections [SPARK-28456][SQL]Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala reflections Jul 19, 2019
@zsxwing
Copy link
Member Author

zsxwing commented Jul 19, 2019

cc @marmbrus @cloud-fan

@zsxwing zsxwing changed the title [SPARK-28456][SQL]Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala reflections [SPARK-28456][SQL]Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala Reflection Jul 19, 2019
@SparkQA
Copy link

SparkQA commented Jul 20, 2019

Test build #107930 has finished for PR 25209 at commit a071bf9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28456][SQL]Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala Reflection [SPARK-28456][SQL] Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala Reflection Jul 20, 2019
@zsxwing zsxwing changed the title [SPARK-28456][SQL] Add a public API Encoder.copyEncoder to allow creating Encoder without touching Scala Reflection [SPARK-28456][SQL] Add a public API Encoder.makeCopy to allow creating Encoder without touching Scala Reflection Jul 21, 2019
@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #107972 has finished for PR 25209 at commit 3f18ceb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 62e2824 Jul 22, 2019
@zsxwing zsxwing deleted the encoder-copy branch July 22, 2019 17:54
yiheng pushed a commit to yiheng/spark that referenced this pull request Jul 24, 2019
…ing Encoder without touching Scala Reflection

## What changes were proposed in this pull request?

Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` in multiple `Dataset`s. However, creating an `Encoder` for a complicated class is slow due to Scala Reflection. To eliminate the cost of Scala Reflection, right now I usually use the private API `ExpressionEncoder.copy` as follows:

```scala
object FooEncoder {
  private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
  implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}
```

This PR proposes a new method `makeCopy` in `Encoder` so that the above codes can be rewritten using public APIs.

```scala
object FooEncoder {
  private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]()
  implicit def encoder: Encoder[Foo] = _encoder.makeCopy
}
```

The method name is consistent with `TreeNode.makeCopy`.

## How was this patch tested?

Jenkins

Closes apache#25209 from zsxwing/encoder-copy.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants