[SPARK-27965][SQL] Add extractors for v2 catalog transforms. #24812
Conversation
Test build #106219 has finished for PR 24812 at commit
...yst/src/test/scala/org/apache/spark/sql/catalog/v2/expressions/TransformExtractorSuite.scala (outdated review threads, resolved)
  override def value: T = literal
  override def dataType: DataType = catalyst.expressions.Literal(literal).dataType
  override def describe: String = literal.toString
}
It seems that we have this already. Can we reuse LogicalExpressions.literal and remove this?
- override def arguments: Array[Expression] = Array(lit(numBuckets), ref)
+ override def arguments: Array[Expression] = Array(LogicalExpressions.literal(numBuckets), ref)
Using an anonymous class is part of the test.
The extract functions are intended to correctly match any Transform, NamedReference, or Literal instance that is equivalent. To test that, we need to test with objects that are equivalent according to the Java interface, but that do not actually use Spark's internal case classes.
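To illustrate the point being made here, the following is a minimal, self-contained sketch (simplified stand-in traits and names, not Spark's actual classes) of why an anonymous class is the right test subject: the extractor reads only the interface's methods, so it must match any implementation, not just Spark's internal case classes.

```scala
// Simplified stand-in for the interface (NOT Spark's actual definition).
trait NamedReference {
  def fieldNames: Array[String]
}

object Ref {
  // Returning Some means this matches ANY NamedReference implementation.
  def unapply(named: NamedReference): Some[Seq[String]] =
    Some(named.fieldNames.toSeq)
}

// An anonymous implementation, like the ones the test suite constructs:
val anon = new NamedReference {
  override def fieldNames: Array[String] = Array("a", "b")
}

// The extractor matches even though anon is not one of Spark's case classes.
val described = anon match {
  case Ref(parts) => parts.mkString(".")
}
```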
private def ref(names: String*): NamedReference = new NamedReference {
  override def fieldNames: Array[String] = names.toArray
  override def describe: String = names.mkString(".")
}
Ditto. Please reuse LogicalExpressions.reference.
- transform("identity", ref("a", "b")) match {
+ transform("identity", LogicalExpressions.reference("a.b")) match {
Of course, you can omit the LogicalExpressions. prefix with the proper import.
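For reference, a sketch of the import being suggested (the package path is inferred from the test file path shown in this PR, so treat it as an assumption):

```scala
// Assumed package path, based on this PR's test file location:
import org.apache.spark.sql.catalog.v2.expressions.LogicalExpressions.{literal, reference}

// With the import in scope, the calls shorten to:
//   transform("identity", reference("a.b"))
//   Array[Expression](literal(numBuckets), ref)
```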
 * Convenience extractor for any Literal.
 */
private object Lit {
  def unapply[T](literal: Literal[T]): Some[(T, DataType)] = {
Some -> Option?
This always returns Some so I thought it was correct to use here. This practice was pointed out by @HyukjinKwon here: #24689 (comment)
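A small self-contained sketch of the convention being discussed (hypothetical extractor name): when unapply can never fail, declaring the return type as Some rather than Option documents that the pattern is irrefutable.

```scala
// Hypothetical extractor: every String has a length, so the match
// can never fail. Declaring Some (not Option) makes that explicit.
object Length {
  def unapply(s: String): Some[Int] = Some(s.length)
}

// An irrefutable pattern can be used directly in a val definition;
// no MatchError is possible here.
val Length(n) = "spark"
```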
 * Convenience extractor for any NamedReference.
 */
private object Ref {
  def unapply(named: NamedReference): Some[Seq[String]] = {
Some -> Option.
 * Convenience extractor for any Transform.
 */
private object NamedTransform {
  def unapply(transform: Transform): Some[(String, Seq[Expression])] = {
Some -> Option.
    case _ =>
      None
  }
}
If we have NamedTransform.unapply and Ref.unapply, the others are not required, much like the following.
- case IdentityTransform(FieldReference(seq)) =>
+ case NamedTransform("identity", Seq(Ref(seq))) =>
- case YearsTransform(FieldReference(seq)) =>
+ case NamedTransform("years", Seq(Ref(seq))) =>
Do we need all of them?
Yes, we need them.
The idea is to make any transform that is equivalent to an instance of Spark's IdentityTransform case class work in a match expression as though it actually were an IdentityTransform instance. That way, Spark can use these case classes internally even though users may pass instances of unknown classes. This will reduce future bugs caused by matching IdentityTransform instead of remembering to match the more general NamedTransform("identity", ...).
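The delegation idea described above can be sketched as follows. This is a simplified, self-contained illustration with hypothetical names (IdentityMatch stands in for the case-class companion's unapply), not Spark's actual implementation:

```scala
// Simplified stand-ins for the v2 interfaces (NOT Spark's definitions).
trait Expression
trait Transform extends Expression {
  def name: String
  def arguments: Seq[Expression]
}

// General extractor: matches every Transform by name and arguments.
object NamedTransform {
  def unapply(t: Transform): Some[(String, Seq[Expression])] =
    Some((t.name, t.arguments))
}

// Specific extractor that delegates to the general one, so it matches
// any Transform named "identity" regardless of its concrete class.
object IdentityMatch {
  def unapply(t: Transform): Option[Seq[Expression]] = t match {
    case NamedTransform("identity", args) => Some(args)
    case _ => None
  }
}

// A user-supplied transform of an unknown (anonymous) class still matches:
val foreign = new Transform {
  override def name: String = "identity"
  override def arguments: Seq[Expression] = Seq.empty
}
val matched = IdentityMatch.unapply(foreign).isDefined
```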
dongjoon-hyun left a comment
Please remove the redundant utility functions first. As for the extractors, I'm not sure we need all of them; if possible, we should keep the smallest, most general set.
cc @gatorsmile
@dongjoon-hyun, I replied to your comments and updated this. Please have another look. Thank you!
Thank you for the explanation. I'll review again today, @rdblue.
Test build #106246 has finished for PR 24812 at commit
All the changes are internal. I am fine with adding these extractors, even if we do not need them at the current stage. If @dongjoon-hyun has more comments, please address them after we merge it.
LGTM. Thanks! Merged to master.
What changes were proposed in this pull request?
Add extractors for v2 catalog transforms.
These extractors are used to match transforms that are equivalent to Spark's internal case classes. This makes it easier to work with v2 transforms.
How was this patch tested?
Added test suite for the new extractors.
Closes apache#24812 from rdblue/SPARK-27965-add-transform-extractors.
Authored-by: Ryan Blue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>