[SPARK-1683] Track task read metrics. #962

kayousterhout · 2014-06-03T23:37:36Z

This commit adds a new metric in TaskMetrics to record
the input data size and displays this information in the UI.

An earlier version of this commit also added the read time,
which can be useful for diagnosing straggler problems,
but unfortunately that change introduced a significant performance
regression for jobs that don't do much computation. In order to
track read time, we'll need to do sampling.
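
This PR does not track read time, so the following is only a rough Python sketch of what "sampling" could look like, not code from this change: time one read in every `sample_every` and scale the sampled average up to all reads. `SampledReadTimer`, `sample_every`, and the injectable `clock` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Iterable, Iterator


class SampledReadTimer:
    """Hypothetical helper: times one read in every `sample_every` reads."""

    def __init__(self, sample_every: int = 100,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        if sample_every < 1:
            raise ValueError("sample_every must be >= 1")
        self.sample_every = sample_every
        self.clock = clock
        self.reads = 0             # total records read
        self.sampled_reads = 0     # records whose read was timed
        self.sampled_seconds = 0.0

    def wrap(self, source: Iterable) -> Iterator:
        """Yield items from `source`, timing only the sampled reads."""
        it = iter(source)
        while True:
            timed = self.reads % self.sample_every == 0
            start = self.clock() if timed else 0.0
            try:
                item = next(it)
            except StopIteration:
                return
            if timed:
                self.sampled_seconds += self.clock() - start
                self.sampled_reads += 1
            self.reads += 1
            yield item

    def estimated_seconds(self) -> float:
        """Extrapolate total read time from the sampled reads."""
        if self.sampled_reads == 0:
            return 0.0
        return self.sampled_seconds / self.sampled_reads * self.reads
```

The untimed reads pay only for a counter increment and a modulo, which is the point of sampling: the clock calls that caused the regression happen on a small fraction of records. The estimate is only as good as the assumption that sampled reads are representative.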

The screenshots below show the UI with the new "Input" field,
which I added to the stage summary page, the executor summary page,
and the per-stage page.

AmplabJenkins · 2014-06-03T23:37:58Z

Merged build triggered.

AmplabJenkins · 2014-06-03T23:38:09Z

Merged build started.

AmplabJenkins · 2014-06-04T00:19:43Z

Merged build finished.

AmplabJenkins · 2014-06-04T00:19:43Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15418/

rxin · 2014-06-04T02:25:26Z

core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala

Can we just count the size when iterator.close is called (or the method in context)?

AmplabJenkins · 2014-06-09T20:57:50Z

Merged build triggered.

AmplabJenkins · 2014-06-09T21:56:48Z

Merged build started.

AmplabJenkins · 2014-06-09T22:37:49Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-09T22:37:50Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15577/

kayousterhout · 2014-06-10T05:25:53Z

@rxin I changed this to only get the position twice (at the beginning and end), and also added metrics to NewHadoopRDD, which I'd forgotten about previously. I also surrounded the calls to reader.getPos() with try/catch since they can throw exceptions. This should be good to go now (and I verified it works for both Hadoop APIs on EC2).
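
A minimal Python sketch of the approach described in the comment above (it is not the Scala code in this PR): read the reader's position once before iterating and once after, and treat a failing position call as "metric unavailable" rather than failing the task. `PositionReader`, `safe_pos`, and `read_with_metrics` are hypothetical stand-ins; `get_pos()` plays the role of `reader.getPos()`.

```python
import io
from typing import Iterator, List, Optional, Tuple


class PositionReader:
    """Hypothetical stand-in for a record reader that exposes a byte offset."""

    def __init__(self, data: bytes, fail_get_pos: bool = False) -> None:
        self._stream = io.BytesIO(data)
        self._fail_get_pos = fail_get_pos

    def get_pos(self) -> int:
        # The position call is allowed to raise, as noted in the comment above.
        if self._fail_get_pos:
            raise OSError("position unavailable")
        return self._stream.tell()

    def records(self) -> Iterator[bytes]:
        # One record per line, for illustration only.
        for line in self._stream:
            yield line


def safe_pos(reader: PositionReader) -> Optional[int]:
    """Return the reader's position, or None if it cannot be determined."""
    try:
        return reader.get_pos()
    except Exception:
        return None


def read_with_metrics(reader: PositionReader) -> Tuple[List[bytes], Optional[int]]:
    """Read all records and report bytes read as (end position - start position)."""
    start = safe_pos(reader)
    records = list(reader.records())
    end = safe_pos(reader)
    bytes_read = end - start if start is not None and end is not None else None
    return records, bytes_read
```

Calling the position method only twice keeps the per-record path free of extra work, at the cost of reporting nothing when either call fails.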

AmplabJenkins · 2014-06-10T19:52:05Z

Merged build triggered.

AmplabJenkins · 2014-06-10T19:52:12Z

Merged build started.

AmplabJenkins · 2014-06-10T20:17:50Z

Merged build finished.

AmplabJenkins · 2014-06-10T20:17:51Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15635/

pwendell · 2014-06-10T20:17:54Z

I had to restart these tests due to something unrelated. Jenkins, test this please.

AmplabJenkins · 2014-06-10T20:22:06Z

Merged build triggered.

AmplabJenkins · 2014-06-10T20:22:13Z

Merged build started.

AmplabJenkins · 2014-06-10T21:35:01Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-10T21:35:02Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15641/

AmplabJenkins · 2014-06-18T20:39:52Z

Merged build triggered.

AmplabJenkins · 2014-06-18T20:39:57Z

Merged build started.

kayousterhout · 2014-06-18T20:46:31Z

@pwendell I updated this to master so it's good to go for you to review.

AmplabJenkins · 2014-06-18T21:24:03Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-18T21:24:03Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15883/

andrewor14 · 2014-06-24T20:23:47Z

core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala

minor: no need to val here

kayousterhout · 2014-06-24

Ah cool, will fix. Thanks for looking at this, but you might want to hold off a few minutes -- I'm about to push a rebase of this, and I also realized I forgot to add the JSON code for the new TaskMetrics field.

kayousterhout · 2014-06-24T21:02:04Z

This is blocked by #1198 -- the Json tests need to be fixed to actually work before I can add the appropriate Json tests for this change.

pwendell · 2014-06-24T22:02:43Z

Is #1198 blocking you from updating the patch, or just blocking it from having tests that pass?

AmplabJenkins · 2014-06-24T23:00:17Z

Merged build triggered.

AmplabJenkins · 2014-06-24T23:00:25Z

Merged build started.

kayousterhout · 2014-06-24T23:08:31Z

@pwendell I was just trying to avoid doing a bunch more updates to this. I've pushed the rebased version and the new Json tests, but I don't think the new tests will work once the Json tests are fixed -- I can update them once #1198 goes in.

AmplabJenkins · 2014-06-27T20:58:50Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16216/

AmplabJenkins · 2014-06-29T22:00:35Z

Merged build triggered.

AmplabJenkins · 2014-06-29T22:00:43Z

Merged build started.

AmplabJenkins · 2014-06-29T22:41:41Z

Merged build finished.

AmplabJenkins · 2014-06-29T22:41:41Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16246/

kayousterhout · 2014-06-29T22:43:01Z

Jenkins, retest this please

AmplabJenkins · 2014-06-29T22:45:35Z

Merged build triggered.

AmplabJenkins · 2014-06-29T22:45:44Z

Merged build started.

AmplabJenkins · 2014-06-29T23:29:24Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-29T23:29:24Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16247/

AmplabJenkins · 2014-06-30T00:00:35Z

Merged build triggered.

AmplabJenkins · 2014-06-30T00:00:44Z

Merged build started.

AmplabJenkins · 2014-06-30T02:01:57Z

Merged build finished.

AmplabJenkins · 2014-06-30T02:01:57Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16249/

pwendell · 2014-06-30T03:46:01Z

Jenkins, retest this please.

AmplabJenkins · 2014-06-30T03:50:35Z

Merged build triggered.

AmplabJenkins · 2014-06-30T03:50:45Z

Merged build started.

kayousterhout · 2014-06-30T05:03:04Z

I've merged this into master (the Jenkins build finished but got stuck behind another broken build to report success). @sryza I added a comment as you suggested; we can fix this to be more accurate using the file statistics thing you suggested in a later patch.

AmplabJenkins · 2014-06-30T05:16:50Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-30T05:16:50Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16255/

rxin · 2014-06-30T05:18:20Z

oops you guys beat me to it!

This commit adds a new metric in TaskMetrics to record the input data size and displays this information in the UI.

An earlier version of this commit also added the read time, which can be useful for diagnosing straggler problems, but unfortunately that change introduced a significant performance regression for jobs that don't do much computation. In order to track read time, we'll need to do sampling.

The screenshots below show the UI with the new "Input" field, which I added to the stage summary page, the executor summary page, and the per-stage page.

![image](https://cloud.githubusercontent.com/assets/1108612/3167930/2627f92a-eb77-11e3-861c-98ea5bb7a1a2.png)
![image](https://cloud.githubusercontent.com/assets/1108612/3167936/475a889c-eb77-11e3-9706-f11c48751f17.png)
![image](https://cloud.githubusercontent.com/assets/1108612/3167948/80ebcf12-eb77-11e3-87ed-349fce6a770c.png)

Author: Kay Ousterhout <[email protected]>

Closes apache#962 from kayousterhout/read_metrics and squashes the following commits:

f13b67d [Kay Ousterhout] Correctly format input bytes on executor page
8b70cde [Kay Ousterhout] Added comment about potential inaccuracy of bytesRead
d1016e8 [Kay Ousterhout] Udated SparkListenerSuite test
8461492 [Kay Ousterhout] Miniscule style fix
ae04d99 [Kay Ousterhout] Remove input metrics for parallel collections
719f19d [Kay Ousterhout] Style fixes
bb6ec62 [Kay Ousterhout] Small fixes
869ac7b [Kay Ousterhout] Updated Json tests
44a0301 [Kay Ousterhout] Fixed accidentally added line
4bd0568 [Kay Ousterhout] Added input source, renamed Hdfs to Hadoop.
f27e535 [Kay Ousterhout] Updates based on review comments and to fix rebase
bf41029 [Kay Ousterhout] Updated Json tests to pass
0fc33e0 [Kay Ousterhout] Added explicit backward compatibility test
4e52925 [Kay Ousterhout] Added Json output and associated tests.
365400b [Kay Ousterhout] [SPARK-1683] Track task read metrics.

rxin reviewed Jun 4, 2014

andrewor14 reviewed Jun 24, 2014

Added comment about potential inaccuracy of bytesRead

8b70cde

Correctly format input bytes on executor page

f13b67d

asfgit closed this in 7b71a0e Jun 30, 2014

kayousterhout deleted the read_metrics branch June 30, 2014 05:28

wangyum pushed a commit that referenced this pull request May 26, 2023

[CARMEL-6006] Log stack trace to track the source of query cancelation (#962)

a1b61bf
