-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-13919] [SQL] [WIP] Resolving the Conflicts of ColumnPruning and PushPredicateThroughProject #11745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13919] [SQL] [WIP] Resolving the Conflicts of ColumnPruning and PushPredicateThroughProject #11745
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -61,15 +61,16 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { | |
| RemoveLiteralFromGroupExpressions) :: | ||
| Batch("Operator Optimizations", FixedPoint(100), | ||
| // Operator push down | ||
| SetOperationPushDown, | ||
| SamplePushDown, | ||
| ReorderJoin, | ||
| OuterJoinElimination, | ||
| PushPredicateThroughJoin, | ||
| PushPredicateThroughProject, | ||
| SetOperationPushDown, | ||
| PushPredicateThroughJoin, | ||
| PushPredicateThroughGenerate, | ||
| PushPredicateThroughAggregate, | ||
| LimitPushDown, | ||
| PushProjectThroughFilter, | ||
| ColumnPruning, | ||
| EliminateOperators, | ||
| // Operator combine | ||
|
|
@@ -91,6 +92,10 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { | |
| SimplifyCasts, | ||
| SimplifyCaseConversionExpressions, | ||
| EliminateSerialization) :: | ||
| // Because ColumnPruning is called after PushPredicateThroughProject, the predicate push down | ||
| // is reversed. This batch is to ensure Filter is pushed below Project, if possible. | ||
| Batch("Push Predicate Through Project", Once, | ||
| PushPredicateThroughProject) :: | ||
| Batch("Decimal Optimizations", FixedPoint(100), | ||
| DecimalAggregates) :: | ||
| Batch("LocalRelation", FixedPoint(100), | ||
|
|
@@ -306,14 +311,28 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { | |
| } | ||
|
|
||
| /** | ||
| * Attempts to eliminate the reading of unneeded columns from the query plan using the following | ||
| * transformations: | ||
| * Attempts to eliminate the reading of unneeded columns from the query plan | ||
| * by pushing Project through Filter. | ||
| * | ||
| * - Inserting Projections beneath the following operators: | ||
| * - Aggregate | ||
| * - Generate | ||
| * - Project <- Join | ||
| * - LeftSemiJoin | ||
| * Note: This rule could reverse the effects of PushPredicateThroughProject. | ||
| * This rule should be run before ColumnPruning for ensuring that Project can be | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little against to depending on rules order too much, sometimes we have to as other solutions are way too complex, but for this issue, can we try to find a more general solution?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I have the same concern. This PR is just to resolve the conflicts based on the current infrastructure. In my opinion, in each batch, we need a few rule sets. The order of rule sets do not matter. In each rule set, the order of rules matters. However, this is a fundamental design change. @marmbrus @rxin might have a better idea in this. |
||
| * pushed as low as possible. | ||
| */ | ||
| object PushProjectThroughFilter extends Rule[LogicalPlan] { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We does not actual PUSH project through filter, we create new Project before to prune some columns. As I said in another PR, we remove the those Project before filter.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @davies The naming of this rule is not right, but I still think this PR fixes the fundamental issue of the conflicts between |
||
| def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { | ||
| case p @ Project(projectList, f: Filter) | ||
| if f.condition.deterministic && projectList.forall(_.deterministic) => | ||
| val required = f.references ++ p.references | ||
| if ((f.inputSet -- required).nonEmpty) { | ||
| p.copy(child = f.copy(child = ColumnPruning.prunedChild(f.child, required))) | ||
| } else { | ||
| p | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Attempts to eliminate the reading of unneeded columns from the query plan | ||
| */ | ||
| object ColumnPruning extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { | ||
|
|
@@ -392,7 +411,7 @@ object ColumnPruning extends Rule[LogicalPlan] { | |
| } | ||
|
|
||
| /** Applies a projection only when the child is producing unnecessary attributes */ | ||
| private def prunedChild(c: LogicalPlan, allReferences: AttributeSet) = | ||
| def prunedChild(c: LogicalPlan, allReferences: AttributeSet): LogicalPlan = | ||
| if ((c.outputSet -- allReferences.filter(c.outputSet.contains)).nonEmpty) { | ||
| Project(c.output.filter(allReferences.contains), c) | ||
| } else { | ||
|
|
@@ -874,6 +893,10 @@ object PruneFilters extends Rule[LogicalPlan] with PredicateHelper { | |
| * that were defined in the projection. | ||
| * | ||
| * This heuristic is valid assuming the expression evaluation cost is minimal. | ||
| * | ||
| * Note: Because PushProjectThroughFilter could reverse the effect of PushPredicateThroughProject, | ||
| * PushPredicateThroughProject needs to be called before the other Predicate Push Down rules for | ||
| * ensuring the predicates can be pushed as low as possible. | ||
| */ | ||
| object PushPredicateThroughProject extends Rule[LogicalPlan] with PredicateHelper { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this role in a separate batch is not correct, some other filter push down rules depend on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not remove it from the original batch. Just added the extra batch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I missed that, sorry.