[SPARK-2307][Reprise] Correctly report RDD blocks on SparkUI #1255
Conversation
This is actually quite tricky to get right. With this commit, StorageStatusListener will only hold cached blocks (i.e. no blocks with StorageLevel.NONE). This means the StorageTab needs special handling, because it currently relies on dropped blocks having StorageLevel.NONE, rather than disappearing from the storage status list altogether.
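To make the intended invariant concrete, here is a minimal, self-contained sketch of the "cached blocks only" behavior. The class name and types are simplified stand-ins, not the actual `StorageStatusListener` code:

```scala
import scala.collection.mutable

// Simplified stand-ins for Spark's BlockId / StorageLevel / BlockStatus types.
case class BlockId(name: String)
case class StorageLevel(useMemory: Boolean, useDisk: Boolean) {
  // StorageLevel.NONE corresponds to neither memory nor disk being used.
  def isValid: Boolean = useMemory || useDisk
}
case class BlockStatus(level: StorageLevel, memSize: Long, diskSize: Long)

// Sketch of the new invariant: a block that is dropped or unpersisted is removed
// from the map entirely, instead of being kept around with StorageLevel.NONE.
class CachedBlocksOnly {
  private val blocks = mutable.Map[BlockId, BlockStatus]()

  def updateBlock(id: BlockId, status: BlockStatus): Unit = {
    if (status.level.isValid) {
      blocks(id) = status   // still cached: keep the latest status
    } else {
      blocks.remove(id)     // dropped: the block disappears altogether
    }
  }

  def cachedBlocks: Map[BlockId, BlockStatus] = blocks.toMap
}
```

Consumers such as the StorageTab then have to cope with blocks vanishing from the list rather than showing up with a NONE level, which is exactly the special handling mentioned above.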
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
If you already have the executorId here, why doesn't this just index directly into `executorIdToStorageStatus` instead of doing a `find`?
good point... I have no idea
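For reference, a small sketch of the difference the review comment is pointing at, using a placeholder StorageStatus type (in the real listener the map values are richer StorageStatus objects, but the map is already keyed by executor ID, so the scan is redundant work):

```scala
import scala.collection.mutable

object FindVsMapLookup extends App {
  // Placeholder type for illustration only.
  case class StorageStatus(executorId: String, maxMem: Long)

  val executorIdToStorageStatus = mutable.Map(
    "0" -> StorageStatus("0", 512L),
    "1" -> StorageStatus("1", 512L))
  val storageStatusList: Seq[StorageStatus] = executorIdToStorageStatus.values.toSeq

  val execId = "1"

  // Linear scan over every executor's status on each call.
  val viaFind: Option[StorageStatus] = storageStatusList.find(_.executorId == execId)

  // Direct lookup in the map that is already keyed by executor ID.
  val viaMap: Option[StorageStatus] = executorIdToStorageStatus.get(execId)

  println(viaFind == viaMap)  // true: same result, without the scan
}
```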
@andrewor14 this looks good, but one thing: could we write some basic unit tests to cover the behavior of this listener (at least to test the specific case here)? One of the major benefits of going through this event-based model is that it should be pretty easy to write tests.
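For what it's worth, a rough sketch of the kind of listener test being asked for. The `FakeStorageStatusListener` and `FakeBlockUpdate` names are invented for illustration; a real test would post Spark's listener events to the actual `StorageStatusListener`, but the point is that exercising the listener directly needs no SparkContext or UI:

```scala
import scala.collection.mutable

// Invented, simplified event and listener types for illustration only.
case class FakeBlockUpdate(blockId: String, cached: Boolean)

class FakeStorageStatusListener {
  val cachedBlocks = mutable.Set[String]()
  def onBlockUpdate(event: FakeBlockUpdate): Unit = {
    if (event.cached) cachedBlocks += event.blockId
    else cachedBlocks -= event.blockId   // unpersisted blocks vanish from the listener
  }
}

object StorageStatusListenerSuiteSketch extends App {
  val listener = new FakeStorageStatusListener

  // Feed events straight to the listener.
  listener.onBlockUpdate(FakeBlockUpdate("rdd_0_0", cached = true))
  listener.onBlockUpdate(FakeBlockUpdate("rdd_0_1", cached = true))
  assert(listener.cachedBlocks == Set("rdd_0_0", "rdd_0_1"))

  // Dropping a block removes it entirely, rather than leaving a NONE-level entry.
  listener.onBlockUpdate(FakeBlockUpdate("rdd_0_0", cached = false))
  assert(listener.cachedBlocks == Set("rdd_0_1"))

  println("all assertions passed")
}
```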
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
**Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors.

**Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means that as soon as a block is unpersisted by any means, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`. Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`.

----------

If you have been following this chain of PRs like @pwendell, you will quickly notice that this reverts the changes in #1249, which reverts the changes in #1080. In other words, we are adding back the changes from #1080 and fixing SPARK-2307 on top of them. Please ask questions if you are confused.

Author: Andrew Or <[email protected]>

Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits:

45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise
a82ea25 [Andrew Or] Add tests for StorageStatusListener
8773b01 [Andrew Or] Update comment / minor changes
3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI

(cherry picked from commit 3894a49)

Signed-off-by: Patrick Wendell <[email protected]>
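To make the performance argument concrete, here is a minimal sketch under simplified, invented types (this is not the actual `ExecutorPage.scala` or `StorageStatus` code):

```scala
object BlockFilteringSketch extends App {
  // Placeholder block status; real code tracks storage level and disk size too.
  case class BlockStatus(isCached: Boolean, memSize: Long)

  // Before: every UI refresh scans all blocks, cached or not, to filter out
  // the uncached ones before aggregating.
  def memUsedWithScan(allBlocks: Seq[BlockStatus]): Long =
    allBlocks.filter(_.isCached).map(_.memSize).sum

  // After: the listener only ever holds cached blocks, so the per-refresh
  // filter disappears and the page simply sums what is already there.
  def memUsedCachedOnly(cachedBlocks: Seq[BlockStatus]): Long =
    cachedBlocks.map(_.memSize).sum

  val blocks = Seq(BlockStatus(isCached = true, 100L), BlockStatus(isCached = false, 0L))
  println(memUsedWithScan(blocks))                       // 100
  println(memUsedCachedOnly(blocks.filter(_.isCached)))  // 100
}
```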