Skip to content

Conversation

@pengbo
Copy link
Contributor

@pengbo pengbo commented Apr 17, 2019

What changes were proposed in this pull request?

Currently, the SparkSQL UI page shows only actual metric info in each SparkPlan node. However with statistics info may help us understand how the plan is designed and the reason it runs slowly. This PR is to show BroadcastHashJoinExec numOutputRows statistic info on SparkSQL UI page first when it's available.

The main changes:

  1. Add stats field in SparkPlan
  2. Initialize BroadcastHashJoinExec with the stats in Join LogicalPlan
  3. Add stats field in SQLMetric and show it on UI page when it's available

How was this patch tested?

Regarding unit test has been added, manual UI test has been tested

SPARK-27482-1

@pengbo
Copy link
Contributor Author

pengbo commented Apr 17, 2019

CC @cloud-fan @srowen @dongjoon-hyun

@cloud-fan
Copy link
Contributor

cloud-fan commented Apr 17, 2019

I agree with the use case, but the implementation is too hacky. We need a general approach to propagate the statistics from logical plan to physical plan.

@pengbo
Copy link
Contributor Author

pengbo commented Apr 18, 2019

I agree with the use case, but the implementation is too hacky. We need a general approach to propagate the statistics from logical plan to physical plan.

Thanks for your comments.

How about:

  1. Add stats field and getter/setter in SparkPlanStats (trait of SparkPlan)
  2. Add propagateProperty(candidate, plan) before collectPlaceholders in QueryPlanner.plan which is override by SparkPlanner to set the stats value in candidate SparkPlan from logicalPlan

Your feedback is appreciated. @cloud-fan @dongjoon-hyun

@pengbo
Copy link
Contributor Author

pengbo commented Apr 26, 2019

Propagate the statistics from logical plan to physical plan in Strategy.apply method instead

@dongjoon-hyun
Copy link
Member

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27482][SQL][Web UI]: Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page [SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page May 7, 2019
@SparkQA
Copy link

SparkQA commented May 8, 2019

Test build #105235 has finished for PR 24389 at commit b3cd0b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pengbo
Copy link
Contributor Author

pengbo commented May 9, 2019

@dongjoon-hyun @cloud-fan
Back to use propagate template method in QueryPlanner.plan method, as the previous way may break the SparkStrategy Extension compatibility.

@pengbo
Copy link
Contributor Author

pengbo commented May 9, 2019

retest it please

@SparkQA
Copy link

SparkQA commented May 9, 2019

Test build #105279 has finished for PR 24389 at commit 83483b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pengbo
Copy link
Contributor Author

pengbo commented May 17, 2019

Thanks @cloud-fan has created a more general approach
#24626

@pengbo pengbo closed this May 17, 2019
@cloud-fan
Copy link
Contributor

Hi @pengbo, my PR just adds the foundation: physical plan knows the statistics of its corresponding logical plan. But we still need some work to make the statistics available in the SQL UI. We still need your PR after my PR is merged.

@pengbo
Copy link
Contributor Author

pengbo commented May 17, 2019

Hi @pengbo, my PR just adds the foundation: physical plan knows the statistics of its corresponding logical plan. But we still need some work to make the statistics available in the SQL UI. We still need your PR after my PR is merged.

Thanks for info, I will open another one when your PR is merged.

j-baker pushed a commit to palantir/spark that referenced this pull request Jan 25, 2020
It's pretty useful if we can convert a physical plan back to a logical plan, e.g., in apache#24389

This PR introduces a new feature to `TreeNode`, which allows `TreeNode` to carry some extra information via a mutable map, and keep the information when it's copied.

The planner leverages this feature to put the logical plan into the physical plan.

a test suite that runs all TPCDS queries and checks that some common physical plans contain the corresponding logical plans.

Closes apache#24626 from cloud-fan/link.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Peng Bo <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Jan 31, 2020
It's pretty useful if we can convert a physical plan back to a logical plan, e.g., in apache#24389

This PR introduces a new feature to `TreeNode`, which allows `TreeNode` to carry some extra information via a mutable map, and keep the information when it's copied.

The planner leverages this feature to put the logical plan into the physical plan.

a test suite that runs all TPCDS queries and checks that some common physical plans contain the corresponding logical plans.

Closes apache#24626 from cloud-fan/link.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Peng Bo <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants