@rdblue (Contributor) commented Jun 5, 2019

## What changes were proposed in this pull request?

Add extractors for v2 catalog transforms.

These extractors are used to match transforms that are equivalent to Spark's internal case classes. This makes it easier to work with v2 transforms.

## How was this patch tested?

Added test suite for the new extractors.

@rdblue rdblue changed the title SPARK-27965: Add extractors for v2 catalog transforms. [SPARK-27965][SQL] Add extractors for v2 catalog transforms. Jun 5, 2019
@SparkQA commented Jun 6, 2019

Test build #106219 has finished for PR 24812 at commit 1821443.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
  override def value: T = literal
  override def dataType: DataType = catalyst.expressions.Literal(literal).dataType
  override def describe: String = literal.toString
}
```
@dongjoon-hyun (Member) commented:
It seems that we have this already. Can we reuse LogicalExpressions.literal and remove this?

```diff
-    override def arguments: Array[Expression] = Array(lit(numBuckets), ref)
+    override def arguments: Array[Expression] = Array(LogicalExpressions.literal(numBuckets), ref)
```
@rdblue (Contributor, Author) replied Jun 6, 2019:
Using an anonymous class is part of the test.

The extract functions are intended to correctly match any Transform, NamedReference, or Literal instance that is equivalent. To test that, we need to test with objects that are equivalent according to the Java interface, but that do not actually use Spark's internal case classes.

```scala
private def ref(names: String*): NamedReference = new NamedReference {
  override def fieldNames: Array[String] = names.toArray
  override def describe: String = names.mkString(".")
}
```
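To illustrate why the anonymous class matters, here is a minimal, runnable sketch of the pattern under test. The `NamedReference` trait and `Ref` extractor below are simplified stand-ins for the v2 API and the PR's extractor, not the actual Spark definitions:

```scala
// Simplified stand-in for the v2 NamedReference interface.
trait NamedReference {
  def fieldNames: Array[String]
  def describe: String
}

// Test helper in the style of this PR: an anonymous implementation
// that is NOT one of Spark's internal case classes.
def ref(names: String*): NamedReference = new NamedReference {
  override def fieldNames: Array[String] = names.toArray
  override def describe: String = names.mkString(".")
}

// An extractor declared against the interface matches ANY
// implementation, including the anonymous class above.
object Ref {
  def unapply(named: NamedReference): Some[Seq[String]] =
    Some(named.fieldNames.toSeq)
}

val matched = ref("a", "b") match {
  case Ref(Seq("a", "b")) => true
  case _                  => false
}
```

Because `Ref` inspects only the interface, the match is structural: any object reporting the same `fieldNames` matches, which is exactly the equivalence the test needs to exercise.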
@dongjoon-hyun (Member) commented Jun 6, 2019:
Ditto. Please reuse LogicalExpressions.reference.

```diff
-    transform("identity", ref("a", "b")) match {
+    transform("identity", LogicalExpressions.reference("a.b")) match {
```
@dongjoon-hyun (Member) added:
Of course, you can omit the prefix LogicalExpressions. with the proper import.

```scala
 * Convenience extractor for any Literal.
 */
private object Lit {
  def unapply[T](literal: Literal[T]): Some[(T, DataType)] = {
```
@dongjoon-hyun (Member) commented:
Some -> Option?

@rdblue (Contributor, Author) replied:
This always returns Some so I thought it was correct to use here. This practice was pointed out by @HyukjinKwon here: #24689 (comment)
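A tiny example of the convention being discussed (the `Total` name is hypothetical, purely for illustration): declaring the return type as `Some[...]` rather than `Option[...]` documents in the signature that the extractor can never fail, so the compiler can treat the pattern as irrefutable.

```scala
// Hypothetical extractor that always succeeds: returning Some[Int]
// (not Option[Int]) makes the irrefutability visible in the signature.
object Total {
  def unapply(s: String): Some[Int] = Some(s.length)
}

// Safe even in a bare pattern binding: no MatchError is possible.
val Total(n) = "abc"
```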

```scala
 * Convenience extractor for any NamedReference.
 */
private object Ref {
  def unapply(named: NamedReference): Some[Seq[String]] = {
```
@dongjoon-hyun (Member) commented:
Some -> Option.

```scala
 * Convenience extractor for any Transform.
 */
private object NamedTransform {
  def unapply(transform: Transform): Some[(String, Seq[Expression])] = {
```
@dongjoon-hyun (Member) commented:
Some -> Option.

```scala
    case _ =>
      None
  }
}
```
@dongjoon-hyun (Member) commented:
If we have NamedTransform.unapply and Ref.unapply, the others may not be required, much like the following:

```diff
-      case IdentityTransform(FieldReference(seq)) =>
+      case NamedTransform("identity", Seq(Ref(seq))) =>
-      case YearsTransform(FieldReference(seq)) =>
+      case NamedTransform("years", Seq(Ref(seq))) =>
```

Do we need all of them?

@rdblue (Contributor, Author) replied Jun 6, 2019:

Yes, we need them.

The idea is to make any transform that is equivalent to an IdentityTransform(...) instance of Spark's case class work in a match expression as though it were actually an IdentityTransform instance. That way, Spark can internally use these case classes, even though users may pass instances that are unknown classes. This will reduce future bugs caused by matching IdentityTransform instead of remembering to match the more general NamedTransform("identity", ...).
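The layering described above can be sketched as follows. The `Transform` trait is heavily simplified here, and `IdentityLike` is a hypothetical name standing in for the PR's `IdentityTransform` extractor:

```scala
// Heavily simplified stand-in for the v2 Transform interface.
trait Transform {
  def name: String
  def references: Array[String]
}

// Generic extractor: matches ANY Transform by name and references.
object NamedTransform {
  def unapply(t: Transform): Some[(String, Seq[String])] =
    Some((t.name, t.references.toSeq))
}

// Specific extractor layered on top: call sites can match the
// identity shape directly instead of repeating the "identity"
// string, yet foreign (non-case-class) instances still match.
object IdentityLike {
  def unapply(t: Transform): Option[String] = t match {
    case NamedTransform("identity", Seq(field)) => Some(field)
    case _                                      => None
  }
}

// An instance that is not a Spark case class still matches.
val foreign = new Transform {
  def name: String = "identity"
  def references: Array[String] = Array("a.b")
}
val matched = foreign match {
  case IdentityLike(field) => field == "a.b"
  case _                   => false
}
```

This is the bug-reduction argument in miniature: call sites never need to remember the string `"identity"`, yet they still match any equivalent implementation of the interface.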

@dongjoon-hyun (Member) left a review:

Please remove the redundant utility functions first. As for the extractors, I'm not sure we need all of them; if possible, we should keep to the smallest, most general set.

cc @gatorsmile

@rdblue (Contributor, Author) commented Jun 6, 2019:

@dongjoon-hyun, I replied to your comments and updated this. Please have another look. Thank you!

@dongjoon-hyun (Member) commented:

Thank you for the explanation. I'll review today again, @rdblue .

@SparkQA commented Jun 6, 2019

Test build #106246 has finished for PR 24812 at commit 453bb92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) left a review:

All the changes are internal. I am fine with adding these extractors, even if we do not need them at the current stage. If @dongjoon-hyun has more comments, please address them after we merge it.

LGTM. Thanks! Merged to master.

@gatorsmile gatorsmile closed this in b30655b Jun 7, 2019
emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019

Closes apache#24812 from rdblue/SPARK-27965-add-transform-extractors.

Authored-by: Ryan Blue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
@rdblue rdblue deleted the SPARK-27965-add-transform-extractors branch July 17, 2020 00:43