[SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance #23002

mgaido91 · 2018-11-10T16:07:51Z

What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, `metricIds` is used only to filter the relevant metrics, yet it is a `Seq`, and a sorted one at that. Every membership check against a `Seq` is a linear scan, and the sort buys nothing for a membership test, so this can be quite inefficient when many metrics are involved. This PR proposes using a `Set` instead.
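A minimal sketch of why the change helps. It assumes a simplified model in which each metric is identified only by its accumulator ID and the values to filter arrive as `(accumulatorId, value)` pairs. `PlanMetric` and `MetricFilter` are hypothetical stand-ins, not Spark's API:

```scala
// Hypothetical, simplified stand-in for a plan metric: only the
// accumulator id matters for this illustration.
final case class PlanMetric(accumulatorId: Long)

object MetricFilter {
  // Before: a sorted Seq. Seq.contains is a linear scan, so filtering
  // n updates against m ids costs O(n * m). The O(m log m) sort is
  // never exploited, because no binary search is performed.
  def filterWithSeq(metrics: Seq[PlanMetric], updates: Seq[(Long, Long)]): Seq[(Long, Long)] = {
    val metricIds = metrics.map(_.accumulatorId).sorted
    updates.filter { case (id, _) => metricIds.contains(id) }
  }

  // After: Scala's default immutable Set is hash-based above four
  // elements, so lookups are effectively constant time and the whole
  // filter costs roughly O(n + m).
  def filterWithSet(metrics: Seq[PlanMetric], updates: Seq[(Long, Long)]): Seq[(Long, Long)] = {
    val metricIds = metrics.map(_.accumulatorId).toSet
    updates.filter { case (id, _) => metricIds.contains(id) }
  }
}
```

Both versions return the same result. Only the cost of each lookup changes.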

How was this patch tested?

NA

SparkQA · 2018-11-10T19:33:16Z

Test build #98683 has finished for PR 23002 at commit 7e79041.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2018-11-12T09:41:07Z

cc @cloud-fan @vanzin

cloud-fan · 2018-11-12T14:57:54Z

LGTM, also cc @gengliangwang

gengliangwang · 2018-11-12T15:41:46Z

sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala


  private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
-    val metricIds = exec.metrics.map(_.accumulatorId).sorted
+    val metricIds = exec.metrics.map(_.accumulatorId).toSet


Actually, this one can be merged into `metricTypes`: its keys are already the metric accumulator IDs.
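A sketch of this suggestion under a similarly simplified model. `TypedMetric` and `MergedLookup` are illustrative names, not Spark classes, and the code is not taken from the follow-up commit. A map keyed by accumulator ID already answers the membership question, so a separate ID set is redundant:

```scala
// Hypothetical, simplified metric descriptor: accumulator id plus type.
final case class TypedMetric(accumulatorId: Long, metricType: String)

object MergedLookup {
  // A single Map serves both purposes. Its key set filters the relevant
  // updates, and its values supply each metric's type for aggregation.
  def relevantUpdates(
      metrics: Seq[TypedMetric],
      updates: Seq[(Long, Long)]): Seq[(Long, String, Long)] = {
    val metricTypes: Map[Long, String] =
      metrics.map(m => m.accumulatorId -> m.metricType).toMap
    // get() returns None for unknown ids, so irrelevant updates are
    // dropped without a separate contains() check.
    updates.flatMap { case (id, value) =>
      metricTypes.get(id).map(t => (id, t, value))
    }
  }
}
```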

gengliangwang · 2018-11-12T15:42:50Z

sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala

+    val metricIds = exec.metrics.map(_.accumulatorId).toSet
    val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
    val metrics = exec.stages.toSeq
      .flatMap { stageId => Option(stageMetrics.get(stageId)) }


Consider also changing the following `flatMap` / `filter` / `groupBy` chain into a while loop.

Not sure what you mean here. Why should we use a while loop?

If there are many metrics, a while loop can reduce the number of traversals, and it would not be complicated to write here.

I am also fine with the current code here.

Yes, we can save one traversal, but honestly I am not sure it is worth it. This approach seems cleaner to me.
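Here is a hypothetical sketch of the trade-off. The `AggregateSketch` object, the `(accumulatorId, value)` input shape, and grouping raw values per ID are all assumptions. It contrasts a chained `filter`/`groupBy` with a single `while` loop that filters and groups in one pass:

```scala
import scala.collection.mutable

object AggregateSketch {
  // Chained style: filter and groupBy each traverse the data, and each
  // allocates an intermediate collection along the way.
  def chained(updates: Seq[(Long, Long)], metricTypes: Map[Long, String]): Map[Long, Seq[Long]] =
    updates
      .filter { case (id, _) => metricTypes.contains(id) }
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2) }

  // Loop style: one pass over the input that filters and groups at once.
  def singlePass(updates: Seq[(Long, Long)], metricTypes: Map[Long, String]): Map[Long, Seq[Long]] = {
    val grouped = mutable.LinkedHashMap.empty[Long, mutable.ArrayBuffer[Long]]
    val it = updates.iterator
    while (it.hasNext) {
      val (id, value) = it.next()
      if (metricTypes.contains(id)) {
        grouped.getOrElseUpdate(id, mutable.ArrayBuffer.empty[Long]) += value
      }
    }
    // Convert to immutable structures before returning.
    grouped.map { case (id, values) => id -> values.toList }.toMap
  }
}
```

Both produce the same grouping. The loop avoids the intermediate filtered collection at the cost of mutable state, which is the readability trade-off weighed in the exchange above.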

gengliangwang

LGTM

SparkQA · 2018-11-12T19:18:30Z

Test build #98730 has finished for PR 23002 at commit 031d512.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-11-13T05:47:38Z

thanks, merging to master!

## What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

## How was this patch tested?

NA

Closes apache#23002 from mgaido91/SPARK-26003.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

… performance

This PR is to cherry-pick #23002 to Spark 2.4

---

## What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

## How was this patch tested?

NA

Closes #23002 from mgaido91/SPARK-26003.

Closes #25860 from gatorsmile/cherrypickSPARK-26003.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

[SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance

7e79041

gengliangwang reviewed Nov 12, 2018

address comment

031d512

gengliangwang approved these changes Nov 12, 2018

asfgit closed this in 8d7dbde Nov 13, 2018

gatorsmile mentioned this pull request Sep 19, 2019

[SPARK-26003][SQL][2.4] Improve SQLAppStatusListener.aggregateMetrics performance #25860

Closed