SHS-NG M9: Stage page speed up. #41

vanzin · 2017-05-30T23:09:46Z

There are two main changes to speed up rendering of the tasks list
when rendering the stage page.

The first one makes the code load only the tasks being shown in the
current page of the tasks table, and only the information related to
those tasks. One side effect of this change is that the graph that
shows task-related events now only shows events for the tasks in
the current page, instead of the previously hardcoded limit of "events
for the first 1000 tasks". That ends up helping with readability,
though.

To make sorting efficient when using a disk store, the task wrapper
was extended to include many new indices: one for each of the sortable
columns in the UI, and one for each metric for which quantiles are
calculated.
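
As a rough illustration of why per-column indices help, here is a minimal in-memory sketch. It is not the Spark code from this patch: `Task`, `TaskStore`, and the two column names are hypothetical, and it assumes each task is written exactly once. The point is only that once an index is kept sorted on write, reading one page is a slice of that index rather than a sort over every task in the stage.

```python
from bisect import insort
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    duration_ms: int
    gc_time_ms: int


class TaskStore:
    """Hypothetical store that keeps one sorted index per sortable column."""

    SORTABLE = ("duration_ms", "gc_time_ms")  # illustrative column names

    def __init__(self):
        self._tasks = {}
        self._indices = {col: [] for col in self.SORTABLE}

    def write(self, task):
        # Assumes each task is written once; updates would need to remove
        # the old index entries first.
        self._tasks[task.task_id] = task
        for col, index in self._indices.items():
            insort(index, (getattr(task, col), task.task_id))

    def page(self, sort_by, page, page_size):
        # Only the entries for the requested page are looked up.
        index = self._indices[sort_by]
        start = (page - 1) * page_size
        return [self._tasks[tid] for _, tid in index[start:start + page_size]]


store = TaskStore()
for tid, (duration, gc) in enumerate([(500, 5), (100, 9), (300, 1), (200, 7)]):
    store.write(Task(tid, duration, gc))

first_page = store.page("duration_ms", page=1, page_size=2)
print([t.task_id for t in first_page])
```

The trade-off is the one the description hints at: every write now updates several indices, in exchange for cheap sorted page reads.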

The second changes the way metric quantiles are calculated for stages.
Instead of using the "Distribution" class to process data for all task
metrics, which requires scanning all tasks of a stage, the code now
uses the KVStore "skip()" functionality to only read tasks that contain
interesting information for the quantiles that are desired.
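
The idea can be sketched with a small model. This is an assumption-laden illustration, not the KVStore API: `SortedIndex` and `read_at` are hypothetical stand-ins for an index sorted by one metric, where rows stay serialized until they are read. For each requested quantile, the position in the sorted index is computed from the entry count, and only the row at that position is deserialized.

```python
import json


class SortedIndex:
    """Hypothetical stand-in for one metric index of a key-value store."""

    def __init__(self, values):
        # Rows are kept serialized, in sorted order, as a disk store might.
        self._rows = [json.dumps({"value": v}) for v in sorted(values)]
        self.deserialized = 0  # counts how many rows were actually decoded

    def __len__(self):
        return len(self._rows)

    def read_at(self, position):
        # Models skipping straight to a position and decoding one row.
        self.deserialized += 1
        return json.loads(self._rows[position])["value"]


def quantiles_by_skipping(index, quantiles):
    count = len(index)
    if count == 0:
        return []
    # Position of each quantile in the sorted index, clamped to the last row.
    positions = [min(int(q * count), count - 1) for q in quantiles]
    return [index.read_at(p) for p in positions]


durations = SortedIndex(range(1000, 0, -1))  # 1000 task durations
result = quantiles_by_skipping(durations, [0.0, 0.25, 0.5, 0.75, 1.0])
print(result, durations.deserialized)
```

In this model, five quantiles over 1000 tasks decode five rows instead of all 1000, which is where the savings described above would come from. The same scan still has to be repeated for every metric.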

This is still not cheap; because there are many metrics that the UI
and API track, the code needs to scan the index for each metric to
gather the information. Savings come mainly from skipping deserialization
when using the disk store, but the in-memory code also seems to be
faster than before (most probably because of other changes in this
patch).

With the above changes, a lot of code in the UI layer could be simplified.

libratiger · 2017-06-06T05:33:53Z

Is this branch stable enough now?

libratiger · 2017-06-06T07:49:23Z

I just ran the unit tests and found that some tests failed:

stage task summary w shuffle write
stage task summary w shuffle read
stage task list w/ sortBy
stage task list w/ sortBy short names
job progress bars / cells reflect skipped stages

vanzin · 2017-06-06T16:59:05Z

@djvulee there's a couple of things I need to fix in this last patch... if you just reset the branch to the previous commit things should be more stable.

vanzin · 2017-06-06T17:22:06Z

Unit tests should be fixed in this patch too, now.

libratiger · 2017-06-07T04:53:26Z

OK, thanks! I found that the current branch does not handle a failed stage well; it produces the following error:

java.lang.IndexOutOfBoundsException: Page 1 is out of range. Please select a page number between 1 and 0.
at org.apache.spark.ui.PagedDataSource.pageData(PagedTable.scala:56)
at org.apache.spark.ui.PagedTable$class.table(PagedTable.scala:108)
at org.apache.spark.ui.jobs.TaskPagedTable.table(StagePage.scala:702)
at org.apache.spark.ui.jobs.StagePage.liftedTree1$1(StagePage.scala:295)
at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:284)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:88)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
at org.apache.spark.deploy.history.ApplicationCacheCheckFilter.doFilter(ApplicationCache.scala:437)
at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.spark_project.jetty.server.Server.handle(Server.java:524)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
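
The "between 1 and 0" in the message is consistent with a page count computed from an empty task list, for example a failed stage with no tasks to show. The sketch below models that arithmetic in Python. It is not the code in `PagedTable.scala`, the function names are hypothetical, and `safe_page_bounds` is only one possible guard, not necessarily the fix applied in this branch.

```python
def total_pages(data_size, page_size):
    # Ceiling division: zero rows gives zero pages.
    return (data_size + page_size - 1) // page_size


def page_bounds(data_size, page_size, page):
    pages = total_pages(data_size, page_size)
    if page <= 0 or page > pages:
        raise IndexError(
            f"Page {page} is out of range. "
            f"Please select a page number between 1 and {pages}."
        )
    start = (page - 1) * page_size
    return start, min(start + page_size, data_size)


def safe_page_bounds(data_size, page_size, page):
    # Possible guard: treat an empty table as an empty first page
    # instead of raising.
    if data_size == 0:
        return 0, 0
    return page_bounds(data_size, page_size, page)


try:
    page_bounds(0, 100, 1)
except IndexError as err:
    print(err)

print(safe_page_bounds(0, 100, 1))
```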

libratiger · 2017-06-07T09:40:37Z

Another issue is that the SQL tab page throws a NullPointerException (M8 branch).

vanzin · 2017-06-07T19:47:52Z

@djvulee do you have some code that can reproduce the failed stage you're having trouble with? I can't see any issues in my local build. The SQL tab and individual executions also render fine for me.


There are two main changes to speed up rendering of the tasks list when rendering the stage page.

The first one makes the code load only the tasks being shown in the current page of the tasks table, and only the information related to those tasks. One side effect of this change is that the graph that shows task-related events now only shows events for the tasks in the current page, instead of the previously hardcoded limit of "events for the first 1000 tasks". That ends up helping with readability, though.

To make sorting efficient when using a disk store, the task wrapper was extended to include many new indices: one for each of the sortable columns in the UI, and one for each metric for which quantiles are calculated.

The second changes the way metric quantiles are calculated for stages. Instead of using the "Distribution" class to process data for all task metrics, which requires scanning all tasks of a stage, the code now uses the KVStore "skip()" functionality to only read tasks that contain interesting information for the quantiles that are desired.

This is still not cheap; because there are many metrics that the UI and API track, the code needs to scan the index for each metric to gather the information. Savings come mainly from skipping deserialization when using the disk store, but the in-memory code also seems to be faster than before (most probably because of other changes in this patch).

To make subsequent calls faster, some quantiles are cached in the status store. This makes the UI much faster after the first time a stage has been loaded.

With the above changes, a lot of code in the UI layer could be simplified.

vanzin force-pushed the shs-ng/M9 branch from 6781ee5 to 2e4e2cb Compare June 1, 2017 01:53

vanzin force-pushed the shs-ng/M8 branch from 153867b to 0ba3395 Compare June 1, 2017 01:53

vanzin force-pushed the shs-ng/M9 branch from 2e4e2cb to 4169c32 Compare June 1, 2017 20:15

vanzin force-pushed the shs-ng/M8 branch from 0ba3395 to bfb2ba0 Compare June 1, 2017 20:15

vanzin force-pushed the shs-ng/M9 branch from 4169c32 to 6db657c Compare June 2, 2017 16:24

vanzin force-pushed the shs-ng/M8 branch from bfb2ba0 to 3fd19ac Compare June 2, 2017 16:40

vanzin force-pushed the shs-ng/M9 branch 4 times, most recently from b5b270f to 3ee026c Compare June 3, 2017 02:12

vanzin force-pushed the shs-ng/M8 branch from 3fd19ac to 5742f0f Compare June 5, 2017 17:47

vanzin force-pushed the shs-ng/M9 branch from 3ee026c to a77aca6 Compare June 5, 2017 17:47

vanzin force-pushed the shs-ng/M8 branch from 5742f0f to af789ea Compare June 5, 2017 18:26

vanzin force-pushed the shs-ng/M9 branch from a77aca6 to fea935d Compare June 5, 2017 18:26

vanzin force-pushed the shs-ng/M8 branch from af789ea to 89d45ab Compare June 6, 2017 17:22

vanzin force-pushed the shs-ng/M9 branch from fea935d to 72d8db8 Compare June 6, 2017 17:22

vanzin force-pushed the shs-ng/M8 branch from 89d45ab to 2f7a152 Compare June 6, 2017 20:54

vanzin force-pushed the shs-ng/M9 branch from 72d8db8 to 34affb9 Compare June 6, 2017 20:54

vanzin force-pushed the shs-ng/M8 branch from 2f7a152 to e6a6925 Compare June 6, 2017 20:58

vanzin force-pushed the shs-ng/M9 branch from 34affb9 to ad39db2 Compare June 6, 2017 20:58

vanzin force-pushed the shs-ng/M8 branch from e6a6925 to b858d22 Compare June 7, 2017 18:01

vanzin force-pushed the shs-ng/M9 branch from ad39db2 to 44c72c3 Compare June 7, 2017 18:01

vanzin force-pushed the shs-ng/M8 branch from b858d22 to 65cacc3 Compare June 7, 2017 23:47

vanzin force-pushed the shs-ng/M9 branch from f4fd5d3 to 4eca39f Compare June 12, 2017 21:49

vanzin force-pushed the shs-ng/M8 branch from 90f827e to 951d0e5 Compare August 10, 2017 01:36

vanzin force-pushed the shs-ng/M9 branch from 4eca39f to 2b2fbf8 Compare August 10, 2017 01:36

vanzin force-pushed the shs-ng/M9 branch from 2b2fbf8 to ba5bbd3 Compare September 5, 2017 21:37

vanzin force-pushed the shs-ng/M8 branch from 951d0e5 to 9200daa Compare September 28, 2017 17:55

vanzin force-pushed the shs-ng/M9 branch from ba5bbd3 to fb46bb0 Compare September 28, 2017 17:55

vanzin mentioned this pull request Oct 12, 2017

[SPARK-21809] : Change Stage Page to use datatables to support sorting columns and searching apache/spark#19270

Closed

vanzin force-pushed the shs-ng/M8 branch from 9200daa to 0834fc4 Compare October 26, 2017 18:29

vanzin force-pushed the shs-ng/M9 branch from fb46bb0 to 5a9b89e Compare October 26, 2017 18:29

vanzin force-pushed the shs-ng/M8 branch from 0834fc4 to e7267f4 Compare October 26, 2017 21:14

vanzin force-pushed the shs-ng/M9 branch from 5a9b89e to 0aa61a6 Compare October 26, 2017 21:14

vanzin force-pushed the shs-ng/M8 branch from e7267f4 to a07bf0f Compare October 28, 2017 00:18

vanzin force-pushed the shs-ng/M9 branch from 0aa61a6 to 2812a5d Compare October 28, 2017 00:18

vanzin force-pushed the shs-ng/M8 branch from a07bf0f to d766509 Compare November 3, 2017 21:01

vanzin force-pushed the shs-ng/M9 branch from 2812a5d to f7e39c8 Compare November 3, 2017 21:01

vanzin force-pushed the shs-ng/M8 branch from d766509 to 5587ece Compare November 3, 2017 21:11

vanzin force-pushed the shs-ng/M9 branch from f7e39c8 to 51f5891 Compare November 3, 2017 21:12

vanzin force-pushed the shs-ng/M8 branch from 5587ece to dd92052 Compare November 3, 2017 21:55

vanzin force-pushed the shs-ng/M9 branch from 51f5891 to d3fa0bd Compare November 3, 2017 21:55

vanzin force-pushed the shs-ng/M8 branch from dd92052 to f354f93 Compare November 6, 2017 19:34

vanzin force-pushed the shs-ng/M9 branch from d3fa0bd to a0730de Compare November 6, 2017 19:34

vanzin force-pushed the shs-ng/M8 branch 2 times, most recently from 85b9ca1 to 172d0bb Compare December 5, 2017 20:54

vanzin force-pushed the shs-ng/M9 branch from a0730de to 541e62e Compare December 8, 2017 22:38

Marcelo Vanzin added 2 commits December 11, 2017 11:51

SHS-NG M8: Delete stale application data from SHS.

4540d85

Detect the deletion of event log files from storage, and remove data about the related application attempt in the SHS.

vanzin force-pushed the shs-ng/M9 branch from 541e62e to 711944b Compare December 12, 2017 01:56

vanzin mentioned this pull request Dec 18, 2017

[SPARK-20657][core] Speed up rendering of the stages page. apache/spark#20013

Closed

vanzin closed this Dec 18, 2017

vanzin deleted the shs-ng/M9 branch April 25, 2019 16:55
