-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-1683] Track task read metrics. #962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15418/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just count the size when iterator.close is called (or the method in context)?
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
@rxin I changed this to only get the position twice (at the beginning and end), and also added metrics to NewHadoopRDD, which I'd forgotten about previously. I also surrounded the calls to reader.getPos() with try/catch since they can throw exceptions. This should be good to go now (and I verified it works for both Hadoop APIs on EC2). |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15635/ |
|
I had to restart these tests due to something unrelated. Jenkins, test this please. |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
Merged build triggered. |
|
Merged build started. |
|
@pwendell I updated this to master so it's good to go for you to review. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: no need to val here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool will fix. Thanks for looking at this but you might want to hold off a few minutes -- I'm about to push a rebase of this, and also I realized I forgot to add the JSON code for the new TaskMetrics field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
This is blocked by #1198 -- the Json tests need to be fixed to actually work before I can add the appropriate Json tests for this change. |
|
is 1198 blocking you from updating the patch? or just blocking it from having tests that pass? |
|
Merged build triggered. |
|
Merged build started. |
|
@pwendell I was just trying to avoid doing a bunch more updates to this. I've pushed the rebased version and the new Json tests but I don't think the new tests will work once the Json tests are fixed -- I can update them once 1198 goes in. |
|
All automated tests passed. |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16246/ |
|
Jenkins, retest this please |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16249/ |
|
Jenkins, retest this please. |
|
Merged build triggered. |
|
Merged build started. |
|
I've merged this into master (the Jenkins build finished but got stuck behind another broken build to report success). @sryza I added a comment as you suggested; we can fix this to be more accurate using the file statistics thing you suggested in a later patch. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
oops you guys beat me to it! |
This commit adds a new metric in TaskMetrics to record the input data size and displays this information in the UI. An earlier version of this commit also added the read time, which can be useful for diagnosing straggler problems, but unfortunately that change introduced a significant performance regression for jobs that don't do much computation. In order to track read time, we'll need to do sampling. The screenshots below show the UI with the new "Input" field, which I added to the stage summary page, the executor summary page, and the per-stage page.    Author: Kay Ousterhout <[email protected]> Closes apache#962 from kayousterhout/read_metrics and squashes the following commits: f13b67d [Kay Ousterhout] Correctly format input bytes on executor page 8b70cde [Kay Ousterhout] Added comment about potential inaccuracy of bytesRead d1016e8 [Kay Ousterhout] Udated SparkListenerSuite test 8461492 [Kay Ousterhout] Miniscule style fix ae04d99 [Kay Ousterhout] Remove input metrics for parallel collections 719f19d [Kay Ousterhout] Style fixes bb6ec62 [Kay Ousterhout] Small fixes 869ac7b [Kay Ousterhout] Updated Json tests 44a0301 [Kay Ousterhout] Fixed accidentally added line 4bd0568 [Kay Ousterhout] Added input source, renamed Hdfs to Hadoop. f27e535 [Kay Ousterhout] Updates based on review comments and to fix rebase bf41029 [Kay Ousterhout] Updated Json tests to pass 0fc33e0 [Kay Ousterhout] Added explicit backward compatibility test 4e52925 [Kay Ousterhout] Added Json output and associated tests. 365400b [Kay Ousterhout] [SPARK-1683] Track task read metrics.
This commit adds a new metric in TaskMetrics to record
the input data size and displays this information in the UI.
An earlier version of this commit also added the read time,
which can be useful for diagnosing straggler problems,
but unfortunately that change introduced a significant performance
regression for jobs that don't do much computation. In order to
track read time, we'll need to do sampling.
The screenshots below show the UI with the new "Input" field,
which I added to the stage summary page, the executor summary page,
and the per-stage page.