Skip to content

Conversation

@gatorsmile
Copy link
Member

This PR is to cherry-pick #23002 to Spark 2.4


What changes were proposed in this pull request?

In SQLAppStatusListener.aggregateMetrics, we use the metricIds only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

How was this patch tested?

NA

Closes #23002 from mgaido91/SPARK-26003.

## What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

## How was this patch tested?

NA

Closes apache#23002 from mgaido91/SPARK-26003.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@gatorsmile
Copy link
Member Author

cc @mgaido91 @zsxwing @cloud-fan

@zsxwing
Copy link
Member

zsxwing commented Sep 19, 2019

LGTM

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-26003] [Backport-2.4] Improve SQLAppStatusListener.aggregateMetrics performance [SPARK-26003][SQL][2.4] Improve SQLAppStatusListener.aggregateMetrics performance Sep 19, 2019
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@SparkQA
Copy link

SparkQA commented Sep 20, 2019

Test build #111027 has finished for PR 25860 at commit 39c1bcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you all! Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Sep 20, 2019
… performance

This PR is to cherry-pick #23002 to Spark 2.4

---

## What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

## How was this patch tested?

NA

Closes #23002 from mgaido91/SPARK-26003.

Closes #25860 from gatorsmile/cherrypickSPARK-26003.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@mgaido91
Copy link
Contributor

a late LGTM, thanks @gatorsmile @dongjoon-hyun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants