-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-26004][SQL] InMemoryTable support StartsWith predicate push down #23004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #98689 has finished for PR 23004 at commit
|
| list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] && | ||
| l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _) | ||
|
|
||
| case StartsWith(a: AttributeReference, ExtractableLiteral(l)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comment to explain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to pr description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the comment in the line 240, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu Done
| case StartsWith(a: AttributeReference, ExtractableLiteral(l)) => | ||
| statsFor(a).lowerBound.substr(0, Length(l)) <= l && | ||
| l <= statsFor(a).upperBound.substr(0, Length(l)) | ||
| case StartsWith(ExtractableLiteral(l), a: AttributeReference) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, a.startswith(b) and b.startswith(a) are not same but why are they same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, The last one should be removed, DataSourceStrategy has the same logic:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
Lines 512 to 513 in 3d6b68b
| case expressions.StartsWith(a: Attribute, Literal(v: UTF8String, StringType)) => | |
| Some(sources.StringStartsWith(a.name, v.toString)) |
|
Looks fine to me |
|
Test build #99040 has finished for PR 23004 at commit
|
|
Any update? |
|
Test build #103146 has finished for PR 23004 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
SPARK-24638 adds support for Parquet file
StartsWithpredicate push down.InMemoryTablecan also support this feature.This is an example to explain how it works, Imagine that the
idcolumn stored as below:A filter
df.filter($"id".startsWith("2"))orid like '2%'then we substr lowerBound and upperBound:
We can see that we only need to read
p1andp3.How was this patch tested?
unit tests and benchmark tests
benchmark test result: