
Conversation

Contributor

@vanzin vanzin commented Dec 18, 2017

There are two main changes to speed up rendering of the tasks list
when rendering the stage page.

The first one makes the code only load the tasks being shown in the
current page of the tasks table, and information related to only
those tasks. One side-effect of this change is that the graph that
shows task-related events now only shows events for the tasks in
the current page, instead of the previously hardcoded limit of "events
for the first 1000 tasks". That ends up helping with readability,
though.
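
For illustration, a minimal sketch of what page-at-a-time loading looks
like against the KVStore view API (view/index/parent/skip/max are methods
in Spark's kvstore module; the helper itself, its parameters, and the use
of the TaskDataWrapper class are illustrative, not the exact code in this
patch):

import scala.collection.JavaConverters._
import org.apache.spark.util.kvstore.KVStore

def taskPage(
    store: KVStore,
    stageKey: Array[Int],
    sortColumn: String,
    offset: Long,
    pageSize: Long): Seq[TaskDataWrapper] = {
  val iter = store.view(classOf[TaskDataWrapper])
    .index(sortColumn)   // sort by the index backing the requested column
    .parent(stageKey)    // restrict the view to tasks of this stage attempt
    .skip(offset)        // jump straight to the first row of the page...
    .max(pageSize)       // ...and read only the rows on that page
    .closeableIterator()
  try {
    iter.asScala.toList  // only these tasks are ever deserialized
  } finally {
    iter.close()
  }
}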

To make sorting efficient when using a disk store, the task wrapper
was extended to include many new indices: one for each of the sortable
columns in the UI, and one for each metric for which quantiles are
calculated.

The second one changes the way metric quantiles are calculated for stages.
Instead of using the "Distribution" class to process data for all task
metrics, which requires scanning all tasks of a stage, the code now
uses the KVStore "skip()" functionality to only read tasks that contain
interesting information for the quantiles that are desired.
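
As a sketch of that idea (names here are illustrative; the real scanTasks
helper is quoted later in this review), each desired quantile maps to a
rank in the metric's index, and skip() lands directly on the task holding
that value:

def quantileValues(
    store: KVStore,
    stageKey: Array[Int],
    index: String,
    quantiles: Seq[Double],
    count: Long)(value: TaskDataWrapper => Long): IndexedSeq[Double] = {
  quantiles.map { q =>
    // Rank of the task holding the q-th quantile of this metric.
    val rank = math.min((q * count).toLong, count - 1)
    val iter = store.view(classOf[TaskDataWrapper])
      .index(index)
      .parent(stageKey)
      .skip(rank)  // skip straight to that task...
      .max(1L)     // ...and deserialize just the one row
      .closeableIterator()
    try value(iter.next()).toDouble finally iter.close()
  }.toIndexedSeq
}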

This is still not cheap; because there are many metrics that the UI
and API track, the code needs to scan the index for each metric to
gather the information. Savings come mainly from skipping deserialization
when using the disk store, but the in-memory code also seems to be
faster than before (most probably because of other changes in this
patch).

To make subsequent calls faster, some quantiles are cached in the
status store. This makes UIs much faster after the first time a stage
has been loaded.

With the above changes, a lot of code in the UI layer could be simplified.

@vanzin
Contributor Author

vanzin commented Dec 18, 2017

Some previous comments for this PR can be found at: vanzin#41

@vanzin
Contributor Author

vanzin commented Dec 19, 2017

I did a non-scientific test locally, and this change causes the disk store for an app to grow to about double its previous size; while not great, given how much new data is being written, that's probably ok.

@SparkQA

SparkQA commented Dec 19, 2017

Test build #85073 has finished for PR 20013 at commit 5b7f205.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 22, 2017

Test build #85289 has finished for PR 20013 at commit 429d12e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 3, 2018

Test build #85614 has finished for PR 20013 at commit e22bceb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Jan 3, 2018

@squito

@ajbozarth
Member

I'll take a look at this either today or tomorrow

@ajbozarth
Member

I did a read-through; I didn't see any issues from a code standpoint, but I haven't spent much time in your new code, so another pair of eyes would be good. I also didn't check out and run the code; if you want me to, I can do that tomorrow.

metrics.shuffleWriteMetrics.recordsWritten))
if (old != null) calculateMetricsDelta(recordedMetrics, old) else recordedMetrics
val old = this.metrics
val newMetrics = new MetricsTracker()
Member

create a method in MetricsTracker which accepts a TaskMetrics and updates the metrics?

Contributor Author

This would be the only place where it's used, so I don't see any gains.

Member

Changing so many fields here seems ugly... but I respect your preference.

@gengliangwang
Member

@vanzin I haven't run the code. I wonder which change doubles the disk usage: the new indices or the cached quantiles?

@vanzin
Contributor Author

vanzin commented Jan 5, 2018

I wonder which change doubles the disk usage?

It's the new indices, more specifically the values, not the keys. I tried changing the disk layout to write all the indices in a new namespace with a very short key length, and that didn't change the resulting store size at all.

Contributor

@squito squito left a comment

looks reasonable, my comments are nothing important really. I need to do another pass though. Also didn't look closely at the UI-related code.

* Whether to cache information about a specific metric quantile. We cache quantiles at every 0.05
* step, which covers the default values used both in the API and in the stages page.
*/
private def shouldCacheQuantile(q: Double): Boolean = ((q * 100).toInt % 5) == 0
Contributor

this happens to work, but FYI it's a bit dangerous with roundoff -- 0.55 * 100 has an error; luckily it happens to be in a direction fixed by toInt.
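
To make the hazard concrete, here is what standard IEEE-754 doubles give:
a product that errs high, like 0.55 * 100, still truncates to the right
integer, but one that errs low, like 0.29 * 100, does not; math.round would
be the safer spelling.

scala> 0.55 * 100
res0: Double = 55.00000000000001

scala> (0.55 * 100).toInt
res1: Int = 55

scala> 0.29 * 100
res2: Double = 28.999999999999996

scala> (0.29 * 100).toInt
res3: Int = 28

scala> math.round(0.29 * 100)
res4: Long = 29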

// to make the code simpler.
if (cachedQuantiles.size == quantiles.size) {
def toValues(fn: CachedQuantile => Double): IndexedSeq[Double] = {
cachedQuantiles.map(fn).toIndexedSeq
Contributor

minor -- you can skip a copy here that comes with toIndexedSeq. You could use a WrappedArray, which is also an IndexedSeq. You can just drop the toIndexedSeq and it'll happen automatically (implicit conversion in Predef), or you can make it explicit with WrappedArray.make().

scala> val arr = Array(1,2,3,4)
arr: Array[Int] = Array(1, 2, 3, 4)

scala> val x: IndexedSeq[Int] = arr
x: IndexedSeq[Int] = WrappedArray(1, 2, 3, 4)

scala> x.toArray eq arr
res9: Boolean = true

scala> arr.toIndexedSeq.toArray eq arr
res10: Boolean = false

// stabilize once the stage finishes. It's also slow, especially with disk stores.
val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
Contributor

aside: an alternative would be to use a data structure which supports online approximate quantile computation. t-digest is the one I'm familiar with; perhaps there are others. But let's leave exploring that for the future.

Contributor Author

Funny you mention that, my first implementation actually used t-digest. But it's a bit slow for this purpose (the unreleased version at the time was faster) and uses quite a bit of memory, so I gave up trying that for now. It also changes the computed values (since they'd be approximations), which requires changing all the golden files (argh).

There's some code for approximate quantiles in the sql module that could potentially be refactored.
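
For reference, a minimal sketch of the t-digest approach that was tried
and dropped, using the com.tdunning:t-digest library; the taskDurations
value is a hypothetical stand-in for a stream of task metric values, not
anything in this patch:

import com.tdunning.math.stats.TDigest

val taskDurations: Seq[Long] = Seq(120L, 250L, 310L, 980L)  // stand-in data

// Build the digest online; memory stays bounded by the compression setting.
val digest = TDigest.createMergingDigest(100.0)
taskDurations.foreach(d => digest.add(d.toDouble))

// Approximate quantiles on demand, with no rescan of the tasks. The results
// are estimates, which is why the golden files would have had to change.
val median = digest.quantile(0.5)
val p95 = digest.quantile(0.95)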

@SparkQA

SparkQA commented Jan 6, 2018

Test build #85737 has finished for PR 20013 at commit c4e7f61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

weakIntern(info.executorId),
weakIntern(info.host),
weakIntern(info.status),
weakIntern(info.taskLocality.toString()),
Contributor

I wasn't familiar with WeakInterner before, it's neat. Not super important, but for most of these, wouldn't strong interning be fine? I expect the same values to come up over and over again across apps for these:

executorId
host
status
taskLocality
storageLevel

though weak interning makes sense for

exec.hostPort
accumulators

Contributor Author

I don't think weak vs. strong interning makes a whole lot of difference with Java 8 (= no perm gen). I used the weak one because that's what the old code used.

It can probably be extended to more things, but tasks are the bulk of an app's data, and that's what I was focusing on in this PR.
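
For context, the weakIntern helper under discussion is a thin wrapper over
Guava's interner utilities; roughly (a sketch, not the exact code in this
patch):

import com.google.common.collect.Interners

val stringInterner = Interners.newWeakInterner[String]()

// Returns a canonical instance, so the many duplicate executorId / host /
// status / locality strings across tasks all share one object. The interner
// holds them weakly, so canonical copies can be collected once unused.
def weakIntern(s: String): String =
  if (s != null) stringInterner.intern(s) else null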

@squito
Contributor

squito commented Jan 6, 2018

lgtm

@gengliangwang
Member

The code looks good. But this is a lot of changes in the SHS; I suggest running more tests (real workloads) before merging.

@gengliangwang
Member

Tried running with event logs of a TPC-DS benchmark (scale 1) on my local setup; the UI works.
The LevelDB size is about 50% larger.

@vanzin
Contributor Author

vanzin commented Jan 8, 2018

I suggest running more tests (real workloads)

I've tested all of these PRs attached to SPARK-18085 with lots of real and synthetic app logs.

@sameeragarwal
Member

sameeragarwal commented Jan 8, 2018

@vanzin Thanks for the great work! Given this is a sizable change, I'd like to get this in before cutting RC1. (/cc @cloud-fan @gengliangwang)

@SparkQA

SparkQA commented Jan 8, 2018

Test build #85806 has finished for PR 20013 at commit ed59b4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 9, 2018

Test build #85809 has finished for PR 20013 at commit 44c9647.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val now = System.nanoTime()

val metricsDelta = liveTasks.remove(event.taskInfo.taskId).map { task =>
val (updatedTask, metricsDelta) = liveTasks.remove(event.taskInfo.taskId).map { task =>
Contributor

where is updatedTask used?

scanTasks(TaskIndexNames.SHUFFLE_WRITE_TIME) { t => t.shuffleWriteTime }))

// Go through the computed quantiles and cache the values that match the caching criteria.
computedQuantiles.quantiles.zipWithIndex
Contributor

I think it makes sense to cache them for the disk store, but it may be overkill for the in-memory store.

Contributor Author

The cached data is really small, no point in having different code paths to save a few hundred bytes.


this.metrics = newMetrics
if (old.executorDeserializeTime >= 0L) {
old.subtract(newMetrics)
Contributor

shouldn't this be newMetrics.subtract(old)? The returned delta will be added to the old metrics.

Contributor Author

No, this is right. The parameter is actually a delta, so it has to be applied to the old value, except when the old value is not initialized (which is the condition in L162).

But let me double-check this again.

Contributor Author

@vanzin vanzin Jan 9, 2018

Actually, sorry: the parameter is not the delta, the return value is. This code is not right, but the solution is different from what you suggest. I'll also add a test for it.
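
For readers following along, a sketch of the contract being debated, with
illustrative names only (the fix that actually landed differs in detail and
came with the promised test): the method swaps in the new cumulative
snapshot, and its return value is the delta the caller folds into
stage-level aggregates.

// Illustrative only -- not this patch's final code. Fields hold cumulative
// values, with executorDeserializeTime == -1 marking an uninitialized tracker.
def updateAndComputeDelta(newMetrics: MetricsTracker): MetricsTracker = {
  val old = this.metrics
  this.metrics = newMetrics
  if (old.executorDeserializeTime >= 0L) {
    // Delta = new snapshot minus previous snapshot, computed into a fresh
    // object (hypothetical fieldwise helper) so neither snapshot is mutated.
    subtract(newMetrics, old)
  } else {
    newMetrics  // first report for this task: the snapshot is the delta
  }
}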

val taskId: JLong,
@KVIndexParam(value = TaskIndexNames.TASK_INDEX, parent = TaskIndexNames.STAGE)
val index: Int,
@KVIndexParam(value = TaskIndexNames.ATTEMPT, parent = TaskIndexNames.STAGE)
Contributor

do we really need to create an index for every field?

Contributor Author

@vanzin vanzin Jan 9, 2018

Yes, the UI can be sorted by all of these. (And also that makes it possible to calculate the quantiles without having to deserialize everything.)

shuffleReadMetrics = shuffleReadMetrics,
shuffleWriteMetrics = shuffleWriteMetrics
)
executorDeserializeTime = scanTasks(TaskIndexNames.DESER_TIME) { t =>
Member

From here on there are a lot of accesses to the KV store, and I think many of the new indexes are only used here.
I think we can make it simpler: just get the task data with one access to the KV store, and compute the quantiles from that.

Contributor

+1, cache misses should not be very frequent; I think it's fine to be a little expensive when building the cache.

Contributor Author

Deserializing 100k tasks from the disk store is a lot more expensive than this (which only deserializes on the order of number of indices * number of quantiles rows). Believe me, I tested all of the suggestions you guys are making.

@gengliangwang
Member

The major concern is that, with these code changes, memory usage will be much larger with InMemoryStore.
Also, building so many new indexes just for computing computedQuantiles seems like overkill.

@vanzin
Contributor Author

vanzin commented Jan 9, 2018

the memory usage will be much larger with InMemoryStore.

No it won't. The in-memory store does not use extra memory for indices. It sorts the data when the index is read, and discards that memory after the iterator is discarded.
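
A sketch of why that holds (illustrative, not InMemoryStore's actual code):
reading an "index" on the in-memory store just builds a transient sorted
copy, which becomes garbage as soon as the iterator is dropped.

def sortedView[T](elements: Iterable[T], key: T => Long,
    skip: Int, max: Int): Iterator[T] =
  elements.toVector            // transient copy, created only when read
    .sortBy(key)               // sorted by the requested index, on demand
    .iterator
    .slice(skip, skip + max)   // paging; nothing is retained afterwards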

@SparkQA

SparkQA commented Jan 9, 2018

Test build #85867 has finished for PR 20013 at commit 86275b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shuffleWriteRecords += delta.shuffleWriteMetrics.recordsWritten
memoryBytesSpilled += delta.memoryBytesSpilled
diskBytesSpilled += delta.diskBytesSpilled
private class MetricsTracker(
Contributor

can we just use api.v1.TaskMetrics here? They are both immutable views; the only difference is that MetricsTracker flattens the fields.

Contributor Author

ok, but the code becomes uglier because v1.TaskMetrics is kind of an annoying type to use.

buildUpdate()
}

def updateAndGet(kvstore: KVStore, now: Long): TaskDataWrapper = {
Contributor

why do we need it? Since the caller side doesn't care about the return value, we can just update the lastWriteTime in doUpdate, and then call doUpdate instead.

@SparkQA

SparkQA commented Jan 10, 2018

Test build #85899 has finished for PR 20013 at commit 3d66505.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 10, 2018

Test build #85924 has finished for PR 20013 at commit 34d02b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 11, 2018

Author: Marcelo Vanzin <[email protected]>

Closes #20013 from vanzin/SPARK-20657.

(cherry picked from commit 1c70da3)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 1c70da3 Jan 11, 2018
@vanzin vanzin deleted the SPARK-20657 branch January 16, 2018 19:49