[SPARK-7303] [SQL] push down project if possible when the child is sort #5838

scwf · 2015-05-01T15:11:49Z

Optimize the case of project(_, sort). An example is:

select key from (select * from testData order by key) t

Before this PR:

== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Physical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]

After this PR:

== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Optimized Logical Plan ==
Sort [key#0 ASC], true
 Project [key#0]
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Physical Plan ==
Sort [key#0 ASC], true
 Exchange (RangePartitioning [key#0 ASC], 5), []
  Project [key#0]
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]

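With this rule, column pruning is applied to the table first and sorting happens afterwards, so the Exchange and Sort operators handle only the key column instead of both key and value.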
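The rewrite can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the Catalyst implementation: the `Relation`, `Sort`, and `Project` classes and the `push_project_through_sort` function are hypothetical stand-ins, and the guard used here (every sort column must survive the projection) is an assumption about when the swap is safe.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """Leaf node: a table with a fixed set of columns (hypothetical)."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Sort:
    """Sorts the child's rows by the given columns (hypothetical)."""
    order: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class Project:
    """Keeps only the given columns of the child (hypothetical)."""
    columns: Tuple[str, ...]
    child: "Plan"


Plan = Union[Relation, Sort, Project]


def push_project_through_sort(plan: Plan) -> Plan:
    """Rewrite Project(Sort(child)) into Sort(Project(child)).

    Assumed guard: the swap is applied only when every sort column is
    kept by the projection; otherwise the plan is returned unchanged.
    """
    if isinstance(plan, Project) and isinstance(plan.child, Sort):
        sort = plan.child
        if set(sort.order) <= set(plan.columns):
            return Sort(sort.order, Project(plan.columns, sort.child))
    return plan


# select key from (select * from testData order by key) t
before = Project(("key",), Sort(("key",), Relation(("key", "value"))))
after = push_project_through_sort(before)
```

Here `after` has the same shape as the optimized logical plan shown for this PR: the sort sits on top and the projection runs directly over the relation.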

AmplabJenkins · 2015-05-01T15:12:12Z

Merged build triggered.

AmplabJenkins · 2015-05-01T15:12:18Z

Merged build started.

SparkQA · 2015-05-01T15:12:56Z

Test build #31561 has started for PR 5838 at commit b09b895.

SparkQA · 2015-05-01T16:13:09Z

Test build #31561 has finished for PR 5838 at commit b09b895.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class ExecutorUIData(

AmplabJenkins · 2015-05-01T16:13:12Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-01T16:13:13Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31561/
Test FAILed.

AmplabJenkins · 2015-05-01T16:57:09Z

Merged build triggered.

AmplabJenkins · 2015-05-01T16:57:18Z

Merged build started.

SparkQA · 2015-05-01T16:59:03Z

Test build #31572 has started for PR 5838 at commit e230155.

SparkQA · 2015-05-01T18:39:43Z

Test build #31572 has finished for PR 5838 at commit e230155.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-01T18:39:47Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-01T18:39:48Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31572/
Test PASSed.

scwf · 2015-05-08T06:33:37Z

@yhuai can you help review this?

scwf · 2015-05-08T06:33:47Z

retest this please

AmplabJenkins · 2015-05-08T06:37:11Z

Merged build triggered.

AmplabJenkins · 2015-05-08T06:37:16Z

Merged build started.

SparkQA · 2015-05-08T06:37:55Z

Test build #32207 has started for PR 5838 at commit e230155.

SparkQA · 2015-05-08T08:33:32Z

Test build #32207 has finished for PR 5838 at commit e230155.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-08T08:33:37Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-08T08:33:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32207/
Test PASSed.

marmbrus · 2015-05-08T18:27:59Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala

This name is very generic; "push project past sort", perhaps. Please also add a test where the push down would be invalid.
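One case where the push down would presumably be invalid is a query such as `select value from (select * from testData order by key) t`: projecting first would drop `key` before the sort can use it. A minimal sketch of such a check in Python follows; `can_push_project_below_sort` is a hypothetical helper for illustration, not a Spark API, and the subset condition is an assumption about what the test should cover.

```python
def can_push_project_below_sort(project_columns, sort_columns):
    """Return True when every sort column is still available after projecting.

    Assumption: a projection may move below a sort only if it keeps
    all of the columns that the sort orders by.
    """
    return set(sort_columns) <= set(project_columns)


# select key from (... order by key): the sort column survives the projection.
valid = can_push_project_below_sort(["key"], ["key"])

# select value from (... order by key): the sort column would be dropped.
invalid = can_push_project_below_sort(["value"], ["key"])
```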

AmplabJenkins · 2015-05-09T01:17:11Z

Merged build triggered.

AmplabJenkins · 2015-05-09T01:17:20Z

Merged build started.

SparkQA · 2015-05-09T01:19:09Z

Test build #32288 has started for PR 5838 at commit b00d833.

SparkQA · 2015-05-09T03:13:46Z

Test build #32288 has finished for PR 5838 at commit b00d833.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-09T03:13:50Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-09T03:13:51Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32288/
Test PASSed.

scwf · 2015-05-09T03:35:48Z

Updated, done.

scwf · 2015-05-12T02:31:44Z

Is this OK to go, @marmbrus?

Author: scwf

This patch had conflicts when merged, resolved by
Committer: Michael Armbrust

Closes #5838 from scwf/pruning and squashes the following commits:

b00d833 [scwf] address michael's comment
e230155 [scwf] fix tests failure
b09b895 [scwf] improve column pruning

(cherry picked from commit 59250fe)
Signed-off-by: Michael Armbrust

marmbrus · 2015-05-13T23:14:40Z

Thanks, merge conflict fixed manually and merged to master and 1.4.


improve column pruning

b09b895

fix tests failure

e230155

marmbrus reviewed May 8, 2015

address michael's comment

b00d833

asfgit closed this in 59250fe May 13, 2015

scwf deleted the pruning branch May 14, 2015 00:51
