@rdblue (Contributor) commented Jun 5, 2019

## What changes were proposed in this pull request?

Add extractors for v2 catalog transforms.

These extractors are used to match transforms that are equivalent to Spark's internal case classes. This makes it easier to work with v2 transforms.

## How was this patch tested?

Added test suite for the new extractors.

@rdblue rdblue changed the title SPARK-27965: Add extractors for v2 catalog transforms. [SPARK-27965][SQL] Add extractors for v2 catalog transforms. Jun 5, 2019
@SparkQA commented Jun 6, 2019

Test build #106219 has finished for PR 24812 at commit 1821443.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
  override def value: T = literal
  override def dataType: DataType = catalyst.expressions.Literal(literal).dataType
  override def describe: String = literal.toString
}
```
@dongjoon-hyun (Member) commented:
It seems that we have this already. Can we reuse LogicalExpressions.literal and remove this?

```diff
-    override def arguments: Array[Expression] = Array(lit(numBuckets), ref)
+    override def arguments: Array[Expression] = Array(LogicalExpressions.literal(numBuckets), ref)
```
@rdblue (Contributor, Author) replied Jun 6, 2019:
Using an anonymous class is part of the test.

The extract functions are intended to correctly match any Transform, NamedReference, or Literal instance that is equivalent. To test that, we need to test with objects that are equivalent according to the Java interface, but that do not actually use Spark's internal case classes.

```scala
private def ref(names: String*): NamedReference = new NamedReference {
  override def fieldNames: Array[String] = names.toArray
  override def describe: String = names.mkString(".")
}
```
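To illustrate why the anonymous class matters, here is a minimal, runnable sketch of the pattern under test. The `NamedReference` trait and `Ref` extractor below are simplified stand-ins for the v2 API and the PR's extractor, not the actual Spark definitions:

```scala
// Simplified stand-in for the v2 NamedReference interface.
trait NamedReference {
  def fieldNames: Array[String]
  def describe: String
}

// Test helper in the style of this PR: an anonymous implementation
// that is NOT one of Spark's internal case classes.
def ref(names: String*): NamedReference = new NamedReference {
  override def fieldNames: Array[String] = names.toArray
  override def describe: String = names.mkString(".")
}

// An extractor declared against the interface matches ANY
// implementation, including the anonymous class above.
object Ref {
  def unapply(named: NamedReference): Some[Seq[String]] =
    Some(named.fieldNames.toSeq)
}

val matched = ref("a", "b") match {
  case Ref(Seq("a", "b")) => true
  case _                  => false
}
```

Because `Ref` inspects only the interface, the match is structural: any object reporting the same `fieldNames` matches, which is exactly the equivalence the test needs to exercise.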
@dongjoon-hyun (Member) commented Jun 6, 2019:
Ditto. Please reuse LogicalExpressions.reference.

```diff
-    transform("identity", ref("a", "b")) match {
+    transform("identity", LogicalExpressions.reference("a.b")) match {
```
@dongjoon-hyun (Member) added:
Of course, you can omit the prefix LogicalExpressions. with the proper import.

```scala
 * Convenience extractor for any Literal.
 */
private object Lit {
  def unapply[T](literal: Literal[T]): Some[(T, DataType)] = {
```
@dongjoon-hyun (Member) commented:
Some -> Option?

@rdblue (Contributor, Author) replied:
This always returns Some so I thought it was correct to use here. This practice was pointed out by @HyukjinKwon here: #24689 (comment)
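A tiny example of the convention being discussed (the `Total` name is hypothetical, purely for illustration): declaring the return type as `Some[...]` rather than `Option[...]` documents in the signature that the extractor can never fail, so the compiler can treat the pattern as irrefutable.

```scala
// Hypothetical extractor that always succeeds: returning Some[Int]
// (not Option[Int]) makes the irrefutability visible in the signature.
object Total {
  def unapply(s: String): Some[Int] = Some(s.length)
}

// Safe even in a bare pattern binding: no MatchError is possible.
val Total(n) = "abc"
```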

```scala
 * Convenience extractor for any NamedReference.
 */
private object Ref {
  def unapply(named: NamedReference): Some[Seq[String]] = {
```
@dongjoon-hyun (Member) commented:
Some -> Option.

```scala
 * Convenience extractor for any Transform.
 */
private object NamedTransform {
  def unapply(transform: Transform): Some[(String, Seq[Expression])] = {
```
@dongjoon-hyun (Member) commented:
Some -> Option.

```scala
    case _ =>
      None
  }
}
```
@dongjoon-hyun (Member) commented:
If we have NamedTransform.unapply and Ref.unapply, the others may not be required, much like the following:

```diff
-      case IdentityTransform(FieldReference(seq)) =>
+      case NamedTransform("identity", Seq(Ref(seq))) =>
-      case YearsTransform(FieldReference(seq)) =>
+      case NamedTransform("years", Seq(Ref(seq))) =>
```

Do we need all of them?

@rdblue (Contributor, Author) replied Jun 6, 2019:

Yes, we need them.

The idea is to make any transform that is equivalent to an IdentityTransform(...) instance of Spark's case class work in a match expression as though it were actually an IdentityTransform instance. That way, Spark can internally use these case classes, even though users may pass instances that are unknown classes. This will reduce future bugs caused by matching IdentityTransform instead of remembering to match the more general NamedTransform("identity", ...).
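The layering described above can be sketched as follows. The `Transform` trait is heavily simplified here, and `IdentityLike` is a hypothetical name standing in for the PR's `IdentityTransform` extractor:

```scala
// Heavily simplified stand-in for the v2 Transform interface.
trait Transform {
  def name: String
  def references: Array[String]
}

// Generic extractor: matches ANY Transform by name and references.
object NamedTransform {
  def unapply(t: Transform): Some[(String, Seq[String])] =
    Some((t.name, t.references.toSeq))
}

// Specific extractor layered on top: call sites can match the
// identity shape directly instead of repeating the "identity"
// string, yet foreign (non-case-class) instances still match.
object IdentityLike {
  def unapply(t: Transform): Option[String] = t match {
    case NamedTransform("identity", Seq(field)) => Some(field)
    case _                                      => None
  }
}

// An instance that is not a Spark case class still matches.
val foreign = new Transform {
  def name: String = "identity"
  def references: Array[String] = Array("a.b")
}
val matched = foreign match {
  case IdentityLike(field) => field == "a.b"
  case _                   => false
}
```

This is the bug-reduction argument in miniature: call sites never need to remember the string `"identity"`, yet they still match any equivalent implementation of the interface.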

@dongjoon-hyun (Member) left a review:

Please remove the redundant utility functions first. As for the extractors, I'm not sure we need all of them; if possible, we should keep to the smallest, most general set.

cc @gatorsmile

@rdblue (Contributor, Author) commented Jun 6, 2019:

@dongjoon-hyun, I replied to your comments and updated this. Please have another look. Thank you!

@dongjoon-hyun (Member) commented:

Thank you for the explanation. I'll review today again, @rdblue .

@SparkQA commented Jun 6, 2019

Test build #106246 has finished for PR 24812 at commit 453bb92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) left a review:

All the changes are internal. I am fine with adding these extractors, even if we do not need them at the current stage. If @dongjoon-hyun has more comments, please address them after we merge it.

LGTM. Thanks! Merged to master.

@gatorsmile gatorsmile closed this in b30655b Jun 7, 2019
emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019

Closes apache#24812 from rdblue/SPARK-27965-add-transform-extractors.

Authored-by: Ryan Blue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
@rdblue rdblue deleted the SPARK-27965-add-transform-extractors branch July 17, 2020 00:43