-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL #10844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #49770 has finished for PR 10844 at commit
|
f7251dd to
04ff99a
Compare
|
Test build #49775 has finished for PR 10844 at commit
|
|
Could you add the id of the JIRA ticket to the titlle? Could you also add a description, explaining why we want this? Seems cool though! |
|
@hvanhovell added, thanks! |
|
@sameeragarwal I just saw it. Thank you! I will do the code changes after this is merged. |
|
Test build #50083 has finished for PR 10844 at commit
|
|
Test build #50161 has finished for PR 10844 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected?
Also I'm not sure I get the scala doc. Maybe getReleventContraints is a better name? It is taking the constraints and removing those that don't apply anymore because we removed columns right?
|
Test build #50371 has finished for PR 10844 at commit
|
|
Thanks @marmbrus, all comments addressed! |
|
Test build #50404 has finished for PR 10844 at commit
|
|
test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider making this getRelevantConstraints(validConstraints) so that each implementor of validContraints does't have to remember to do the filter / canonicalization. They can just focus on augmenting or passing through constraints from children based on the operators semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, added.
|
Test build #50417 has finished for PR 10844 at commit
|
|
Thanks @marmbrus, all comments addressed! |
|
Test build #50429 has finished for PR 10844 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a merging mistake as its duplicated with the method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, fixed!
|
@marmbrus comments addressed! |
|
test this please |
|
Test build #50518 has finished for PR 10844 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we do not need PredicateHelper at here? Maybe it is better to just with PredicateHelper for Filter and Join?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, added this only for filter and join that needed splitConjunctivePredicates
c1a0128 to
2bd2735
Compare
|
comments addressed! |
|
Test build #50603 has finished for PR 10844 at commit
|
|
Thanks! Merging to master. |
Based on the semantics of a query, we can derive a number of data constraints on output of each (logical or physical) operator. For instance, if a filter defines
‘a > 10, we know that the output data of this filter satisfies 2 constraints:‘a > 10isNotNull(‘a)This PR proposes a possible way of keeping track of these constraints and propagating them in the logical plan, which can then help us build more advanced optimizations (such as pruning redundant filters, optimizing joins, among others). We define constraints as a set of (implicitly conjunctive) expressions. For e.g., if a filter operator has constraints =
Set(‘a > 10, ‘b < 100), it’s implied that the outputs satisfy both individual constraints (i.e.,‘a > 10AND‘b < 100).Design Document: https://docs.google.com/a/databricks.com/document/d/1WQRgDurUBV9Y6CWOBS75PQIqJwT-6WftVa18xzm7nCo/edit?usp=sharing