SHS-NG M4.2: Port executors page to new backend. #8

vanzin · 2017-04-17T20:11:52Z

The executors page is built on top of the REST API, so the page itself
was easy to hook up to the new code.

Some other pages depend on the ExecutorListener class that is being
removed, though, so they needed to be modified to use data from the
new store. Fortunately, all they seemed to need is the map of executor
logs, so that was somewhat easy too.

The executor timeline graph required some extra code to save the
executor-related events in the UI store. This just implements the
existing functionality, without making any changes related to efficiency
or scalability of that graph.

I had to change some of the test golden files because the old code would
return executors in "random" order (since it used a mutable Map instead
of something that returns a sorted list), and the new code returns executors
in id order.

The static files are still kept in the core/ resources directory since
Jetty's default servlet does not handle fetching static files from multiple
jars.

TODO: add unit tests for the new ExecutorSummary fields being populated.

The executors page is built on top of the REST API, so the page itself was easy to hook up to the new code. Some other pages depend on the `ExecutorListener` class that is being removed, though, so they needed to be modified to use data from the new store. Fortunately, all they seemed to need is the map of executor logs, so that was somewhat easy too. The executor timeline graph required some extra code to save the executor-related events in the UI store. This just implements the existing functionality, without making any changes related to efficiency or scalability of that graph. I had to change some of the test golden files because the old code would return executors in "random" order (since it used a mutable Map instead of something that returns a sorted list), and the new code returns executors in id order.

## What changes were proposed in this pull request? This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added. **Before** ```scala scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain() == Physical Plan == WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5]) : +- INPUT +- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None +- WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9]) : +- INPUT +- LocalTableScan [a#0], [[1],[2]] ``` **After** ```scala scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain() == Physical Plan == WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5]) : +- INPUT +- Exchange hashpartitioning((a#0 + 1)#6, 200), None +- WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6]) : +- INPUT +- LocalTableScan [a#0], [[1],[2]] ``` ## How was this patch tested? Pass the Jenkins tests (with a new testcase) Author: Dongjoon Hyun <[email protected]> Closes apache#12590 from dongjoon-hyun/SPARK-14830. (cherry picked from commit 6e63201) Signed-off-by: Michael Armbrust <[email protected]>

## What changes were proposed in this pull request? This PR aims at improving the way physical plans are explained in spark. Currently, the explain output for physical plan may look very cluttered and each operator's string representation can be very wide and wraps around in the display making it little hard to follow. This especially happens when explaining a query 1) Operating on wide tables 2) Has complex expressions etc. This PR attempts to split the output into two sections. In the header section, we display the basic operator tree with a number associated with each operator. In this section, we strictly control what we output for each operator. In the footer section, each operator is verbosely displayed. Based on the feedback from Maryann, the uncorrelated subqueries (SubqueryExecs) are not included in the main plan. They are printed separately after the main plan and can be correlated by the originating expression id from its parent plan. To illustrate, here is a simple plan displayed in old vs new way. Example query1 : ``` EXPLAIN SELECT key, Max(val) FROM explain_temp1 WHERE key > 0 GROUP BY key HAVING max(val) > 0 ``` Old : ``` *(2) Project [key#2, max(val)#15] +- *(2) Filter (isnotnull(max(val#3)#18) AND (max(val#3)#18 > 0)) +- *(2) HashAggregate(keys=[key#2], functions=[max(val#3)], output=[key#2, max(val)#15, max(val#3)#18]) +- Exchange hashpartitioning(key#2, 200) +- *(1) HashAggregate(keys=[key#2], functions=[partial_max(val#3)], output=[key#2, max#21]) +- *(1) Project [key#2, val#3] +- *(1) Filter (isnotnull(key#2) AND (key#2 > 0)) +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), (key#2 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), GreaterThan(key,0)], ReadSchema: struct<key:int,val:int> ``` New : ``` Project (8) +- Filter (7) +- HashAggregate (6) +- Exchange (5) +- HashAggregate (4) +- Project (3) +- Filter (2) +- Scan parquet default.explain_temp1 (1) (1) Scan parquet default.explain_temp1 [codegen id : 1] Output: [key#2, val#3] (2) Filter [codegen id : 1] Input : [key#2, val#3] Condition : (isnotnull(key#2) AND (key#2 > 0)) (3) Project [codegen id : 1] Output : [key#2, val#3] Input : [key#2, val#3] (4) HashAggregate [codegen id : 1] Input: [key#2, val#3] (5) Exchange Input: [key#2, max#11] (6) HashAggregate [codegen id : 2] Input: [key#2, max#11] (7) Filter [codegen id : 2] Input : [key#2, max(val)#5, max(val#3)#8] Condition : (isnotnull(max(val#3)#8) AND (max(val#3)#8 > 0)) (8) Project [codegen id : 2] Output : [key#2, max(val)#5] Input : [key#2, max(val)#5, max(val#3)#8] ``` Example Query2 (subquery): ``` SELECT * FROM explain_temp1 WHERE KEY = (SELECT Max(KEY) FROM explain_temp2 WHERE KEY = (SELECT Max(KEY) FROM explain_temp3 WHERE val > 0) AND val = 2) AND val > 3 ``` Old: ``` *(1) Project [key#2, val#3] +- *(1) Filter (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#39)) AND (val#3 > 3)) : +- Subquery scalar-subquery#39 : +- *(2) HashAggregate(keys=[], functions=[max(KEY#26)], output=[max(KEY)#45]) : +- Exchange SinglePartition : +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#26)], output=[max#47]) : +- *(1) Project [key#26] : +- *(1) Filter (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#38)) AND (val#27 = 2)) : : +- Subquery scalar-subquery#38 : : +- *(2) HashAggregate(keys=[], functions=[max(KEY#28)], output=[max(KEY)#43]) : : +- Exchange SinglePartition : : +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#28)], output=[max#49]) : : +- *(1) Project [key#28] : : +- *(1) Filter (isnotnull(val#29) AND (val#29 > 0)) : : +- *(1) FileScan parquet default.explain_temp3[key#28,val#29] Batched: true, DataFilters: [isnotnull(val#29), (val#29 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp3], PartitionFilters: [], PushedFilters: [IsNotNull(val), GreaterThan(val,0)], ReadSchema: struct<key:int,val:int> : +- *(1) FileScan parquet default.explain_temp2[key#26,val#27] Batched: true, DataFilters: [isnotnull(key#26), isnotnull(val#27), (val#27 = 2)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp2], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), EqualTo(val,2)], ReadSchema: struct<key:int,val:int> +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), isnotnull(val#3), (val#3 > 3)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), GreaterThan(val,3)], ReadSchema: struct<key:int,val:int> ``` New: ``` Project (3) +- Filter (2) +- Scan parquet default.explain_temp1 (1) (1) Scan parquet default.explain_temp1 [codegen id : 1] Output: [key#2, val#3] (2) Filter [codegen id : 1] Input : [key#2, val#3] Condition : (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#23)) AND (val#3 > 3)) (3) Project [codegen id : 1] Output : [key#2, val#3] Input : [key#2, val#3] ===== Subqueries ===== Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#23 HashAggregate (9) +- Exchange (8) +- HashAggregate (7) +- Project (6) +- Filter (5) +- Scan parquet default.explain_temp2 (4) (4) Scan parquet default.explain_temp2 [codegen id : 1] Output: [key#26, val#27] (5) Filter [codegen id : 1] Input : [key#26, val#27] Condition : (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#22)) AND (val#27 = 2)) (6) Project [codegen id : 1] Output : [key#26] Input : [key#26, val#27] (7) HashAggregate [codegen id : 1] Input: [key#26] (8) Exchange Input: [max#35] (9) HashAggregate [codegen id : 2] Input: [max#35] Subquery:2 Hosting operator id = 5 Hosting Expression = Subquery scalar-subquery#22 HashAggregate (15) +- Exchange (14) +- HashAggregate (13) +- Project (12) +- Filter (11) +- Scan parquet default.explain_temp3 (10) (10) Scan parquet default.explain_temp3 [codegen id : 1] Output: [key#28, val#29] (11) Filter [codegen id : 1] Input : [key#28, val#29] Condition : (isnotnull(val#29) AND (val#29 > 0)) (12) Project [codegen id : 1] Output : [key#28] Input : [key#28, val#29] (13) HashAggregate [codegen id : 1] Input: [key#28] (14) Exchange Input: [max#37] (15) HashAggregate [codegen id : 2] Input: [max#37] ``` Note: I opened this PR as a WIP to start getting feedback. I will be on vacation starting tomorrow would not be able to immediately incorporate the feedback. I will start to work on them as soon as i can. Also, currently this PR provides a basic infrastructure for explain enhancement. The details about individual operators will be implemented in follow-up prs ## How was this patch tested? Added a new test `explain.sql` that tests basic scenarios. Need to add more tests. Closes apache#24759 from dilipbiswal/explain_feature. Authored-by: Dilip Biswal <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

vanzin force-pushed the shs-ng/M4.2 branch from 0a2ca45 to 19218ad Compare April 17, 2017 21:40

vanzin force-pushed the shs-ng/M4.1 branch from 6b5bb75 to 6321bff Compare April 17, 2017 21:40

vanzin force-pushed the shs-ng/M4.2 branch from 19218ad to e75a98d Compare April 25, 2017 17:43

vanzin force-pushed the shs-ng/M4.1 branch from 6321bff to 422c1f2 Compare April 25, 2017 17:43

vanzin force-pushed the shs-ng/M4.2 branch from e75a98d to d6318f8 Compare April 26, 2017 18:11

vanzin force-pushed the shs-ng/M4.1 branch from 422c1f2 to b1a4e06 Compare April 26, 2017 18:11

vanzin force-pushed the shs-ng/M4.2 branch from d6318f8 to 947e9d5 Compare April 26, 2017 23:58

vanzin force-pushed the shs-ng/M4.1 branch from b1a4e06 to 2057aaf Compare April 26, 2017 23:58

vanzin force-pushed the shs-ng/M4.2 branch from 947e9d5 to 2521cae Compare April 27, 2017 18:14

vanzin force-pushed the shs-ng/M4.1 branch from 2057aaf to 8b7aa41 Compare April 27, 2017 18:14

vanzin force-pushed the shs-ng/M4.2 branch from 2521cae to 58c5d00 Compare April 27, 2017 21:31

vanzin force-pushed the shs-ng/M4.1 branch from 8b7aa41 to 96885b3 Compare April 27, 2017 21:31

vanzin force-pushed the shs-ng/M4.2 branch from 58c5d00 to 9dd2f98 Compare April 28, 2017 15:08

vanzin force-pushed the shs-ng/M4.1 branch from 96885b3 to 285a2c2 Compare April 28, 2017 15:08

vanzin force-pushed the shs-ng/M4.2 branch 2 times, most recently from 52e2641 to 5505c83 Compare May 1, 2017 22:58

vanzin force-pushed the shs-ng/M4.1 branch from 285a2c2 to 96990e6 Compare May 1, 2017 22:58

vanzin force-pushed the shs-ng/M4.2 branch from 5505c83 to 974eb38 Compare May 5, 2017 21:19

vanzin force-pushed the shs-ng/M4.1 branch from 96990e6 to 71f8edc Compare May 5, 2017 21:19

vanzin force-pushed the shs-ng/M4.2 branch from 974eb38 to 4fece2f Compare May 5, 2017 22:57

vanzin force-pushed the shs-ng/M4.1 branch from 71f8edc to 211a1a9 Compare May 5, 2017 22:57

vanzin force-pushed the shs-ng/M4.2 branch from 4fece2f to b3701f6 Compare May 8, 2017 17:25

vanzin force-pushed the shs-ng/M4.1 branch from 211a1a9 to 7a8058b Compare May 8, 2017 17:25

vanzin force-pushed the shs-ng/M4.2 branch from b3701f6 to 433d1ec Compare May 9, 2017 01:09

vanzin force-pushed the shs-ng/M4.1 branch from 7a8058b to a362fa0 Compare May 9, 2017 01:09

vanzin force-pushed the shs-ng/M4.2 branch from 433d1ec to f7d6a74 Compare May 15, 2017 20:44

vanzin force-pushed the shs-ng/M4.1 branch from a362fa0 to df51af6 Compare May 15, 2017 20:45

vanzin force-pushed the shs-ng/M4.2 branch from f7d6a74 to 4526ffe Compare May 26, 2017 18:53

vanzin force-pushed the shs-ng/M4.1 branch from df51af6 to cb7b315 Compare May 26, 2017 18:53

vanzin force-pushed the shs-ng/M4.2 branch from 4526ffe to d66024c Compare May 30, 2017 23:04

vanzin force-pushed the shs-ng/M4.1 branch from cb7b315 to 27a92dd Compare May 30, 2017 23:04

vanzin closed this May 30, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SHS-NG M4.2: Port executors page to new backend. #8

SHS-NG M4.2: Port executors page to new backend. #8

Uh oh!

vanzin commented Apr 17, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SHS-NG M4.2: Port executors page to new backend. #8

SHS-NG M4.2: Port executors page to new backend. #8

Uh oh!

Conversation

vanzin commented Apr 17, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants