[SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance #23002
Test build #98683 has finished for PR 23002 at commit
LGTM, also cc @gengliangwang
  private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
-   val metricIds = exec.metrics.map(_.accumulatorId).sorted
+   val metricIds = exec.metrics.map(_.accumulatorId).toSet
Actually this one can be merged into metricTypes.
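As a rough illustration of this suggestion, the sketch below drops the separate `metricIds` set and uses the keys of `metricTypes` for the membership check. `Metric`, `relevantUpdates`, and the `(Long, Long)` update pairs are hypothetical simplifications for this example, not the actual Spark types.

```scala
object MergedLookupSketch {
  // Hypothetical, simplified metric descriptor (not the Spark class).
  final case class Metric(accumulatorId: Long, metricType: String)

  // Keep only the updates whose accumulator id belongs to a known metric.
  // The map built for metric types doubles as the membership structure,
  // so no separate id collection is needed.
  def relevantUpdates(
      metrics: Seq[Metric],
      updates: Seq[(Long, Long)]): Seq[(Long, Long)] = {
    val metricTypes = metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
    updates.filter { case (id, _) => metricTypes.contains(id) }
  }
}
```

Whether this is preferable to a dedicated `Set` is a readability trade-off; both give hash-based lookups.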
    val metricIds = exec.metrics.map(_.accumulatorId).toSet
    val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
    val metrics = exec.stages.toSeq
      .flatMap { stageId => Option(stageMetrics.get(stageId)) }
Consider also changing the following flatMap / filter / groupBy into a while loop.
Not sure what you mean here. Why should we use a while loop?
If the number of metrics is large, using a while loop can reduce the number of traversals, and it is not complicated to do in the code here.
I am also fine with the current code here.
Yes, we can save one traversal, but honestly I am not sure it is worth it. This approach seems cleaner to me.
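To make the trade-off discussed in this thread concrete, here is a minimal sketch comparing a chained `filter` / `groupBy` with a single `while` loop that filters and groups in one pass. The object name, method names, and the `(accumulatorId, value)` pairs are assumptions made for the example; this is not the code in `SQLAppStatusListener`.

```scala
import scala.collection.mutable

object SinglePassSketch {
  // Chained version: one traversal to filter, another to group.
  def chained(ids: Set[Long], updates: Seq[(Long, Long)]): Map[Long, Seq[Long]] =
    updates
      .filter { case (id, _) => ids.contains(id) }
      .groupBy(_._1)
      .map { case (id, group) => id -> group.map(_._2) }

  // Single-pass version: filter and group inside one while loop.
  def singlePass(ids: Set[Long], updates: Seq[(Long, Long)]): Map[Long, Seq[Long]] = {
    val grouped = mutable.Map.empty[Long, mutable.ArrayBuffer[Long]]
    val it = updates.iterator
    while (it.hasNext) {
      val (id, value) = it.next()
      if (ids.contains(id)) {
        grouped.getOrElseUpdate(id, mutable.ArrayBuffer.empty[Long]) += value
      }
    }
    // Convert the mutable buffers to an immutable result.
    grouped.map { case (id, values) => id -> values.toList }.toMap
  }
}
```

Both versions should return the same grouping; the loop avoids an intermediate collection at the cost of more imperative code, which is the readability concern raised above.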
gengliangwang
left a comment
LGTM
Test build #98730 has finished for PR 23002 at commit
Thanks, merging to master!
## What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

## How was this patch tested?

NA

Closes apache#23002 from mgaido91/SPARK-26003.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
… performance

This PR is to cherry-pick #23002 to Spark 2.4

---

## What changes were proposed in this pull request?

In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.

## How was this patch tested?

NA

Closes #23002 from mgaido91/SPARK-26003.

Closes #25860 from gatorsmile/cherrypickSPARK-26003.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.
How was this patch tested?
NA
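The change described above can be sketched in isolation as follows. The names `MetricIdFilterSketch`, `filterWithSeq`, `filterWithSet`, and the `(accumulatorId, value)` pairs are hypothetical stand-ins chosen for illustration; only the `.sorted` versus `.toSet` difference comes from the PR itself.

```scala
object MetricIdFilterSketch {
  // Hypothetical stand-in for an accumulator update: (accumulatorId, value).
  type Update = (Long, Long)

  // Before: ids are kept in a sorted Seq. Sorting costs O(n log n) and
  // Seq.contains is a linear scan, so the sort buys nothing for filtering.
  def filterWithSeq(ids: Seq[Long], updates: Seq[Update]): Seq[Update] = {
    val metricIds = ids.sorted
    updates.filter { case (id, _) => metricIds.contains(id) }
  }

  // After: ids are kept in a Set, whose contains is typically
  // a hash lookup rather than a scan.
  def filterWithSet(ids: Seq[Long], updates: Seq[Update]): Seq[Update] = {
    val metricIds = ids.toSet
    updates.filter { case (id, _) => metricIds.contains(id) }
  }
}
```

Both functions should return the same elements in the same order; the difference is only in how much work each membership check does when there are many metric ids.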