[SPARK-5762] Fix shuffle write time for sort-based shuffle #4559
|
The shuffle write time is meant to be measured in nanoseconds, right? Edited to remove my second complaint which didn't make sense. |
|
Ah thanks for noticing that -- fixed! And yeah the other case works fine because it goes through the DiskBlockObjectWriter (sounds like you figured this out). |
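To illustrate the unit question in this exchange: a minimal sketch of recording a write time in nanoseconds, assuming a metrics field that expects nanoseconds. `WriteMetrics`, `incWriteTime`, and `timeNanos` here are hypothetical stand-ins for illustration, not Spark's actual classes; only `System.nanoTime` and `TimeUnit` are real JDK calls.

```scala
import java.util.concurrent.TimeUnit

object WriteTimeUnits {
  // Hypothetical stand-in for a metrics object whose field is in nanoseconds.
  final class WriteMetrics {
    private var writeTimeNanos = 0L
    def incWriteTime(nanos: Long): Unit = writeTimeNanos += nanos
    def writeTime: Long = writeTimeNanos
  }

  // Runs `body` and returns the elapsed wall-clock time in nanoseconds.
  def timeNanos(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  // Converts a nanosecond reading to milliseconds, e.g. for display.
  def toMillis(nanos: Long): Long = TimeUnit.NANOSECONDS.toMillis(nanos)
}
```

If a millisecond difference (for example from `System.currentTimeMillis`) were added to such a field instead, the reported time would be too small by a factor of one million.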
|
My other concern is that we're also measuring the time spent reading the spill files from disk. Though I suppose there's no way around this because we use |
|
Test build #27337 has finished for PR 4559 at commit
|
|
Test build #27341 has finished for PR 4559 at commit
|
|
I think this was just an oversight, it's good to include that time. The change looks good to me. |
Should we write this metric if an exception was thrown above? If so, maybe we should add this in the finally block. If not, then this is ok.
Good point -- done.
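The finally-block suggestion can be sketched as follows. This is only an illustration under assumptions: `WriteMetrics` and `writeWithTiming` are hypothetical names, and the actual change in this PR may be structured differently.

```scala
object FinallyMetric {
  // Hypothetical metrics holder; the field is in nanoseconds.
  final class WriteMetrics { var writeTimeNanos = 0L }

  // Times `write` and records the elapsed time even if `write` throws.
  def writeWithTiming(metrics: WriteMetrics)(write: () => Unit): Unit = {
    val start = System.nanoTime()
    try {
      write()
    } finally {
      // Runs on both the normal and the exceptional path,
      // so a failed write still contributes to the metric.
      metrics.writeTimeNanos += System.nanoTime() - start
    }
  }
}
```

The exception still propagates to the caller; the finally block only makes sure the time spent before the failure is not lost.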
|
LGTM. @sryza, I think it's okay to include the reading time, since you can argue this whole operation is for writing out the file, even if there is some reading. |
|
I had the same thought as @ksakellis re:reading time. Thanks for looking at this all! |
|
Test build #27366 has finished for PR 4559 at commit
|
|
Test build #27369 has finished for PR 4559 at commit
|
|
Jenkins, retest this please |
Could be foreach since we don't use the return value (not a big deal at all).
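For readers unfamiliar with the distinction, a small sketch of why foreach fits a side-effect-only loop. The partition sizes and the `total*` helpers are made up for illustration and are not taken from the PR's code.

```scala
object ForeachVsMap {
  // Sums sizes via a side effect using map: works, but allocates a
  // throwaway Seq[Unit] that is never used.
  def totalWithMap(sizes: Seq[Long]): Long = {
    var total = 0L
    sizes.map(s => total += s)
    total
  }

  // Same side effect with foreach: returns Unit and allocates no result collection.
  def totalWithForeach(sizes: Seq[Long]): Long = {
    var total = 0L
    sizes.foreach(s => total += s)
    total
  }
}
```

Both produce the same total; foreach simply states the intent (side effects only) more directly.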
|
Test build #27374 has finished for PR 4559 at commit
|
|
Test build #27375 has finished for PR 4559 at commit
|
|
Ok, merging into master, 1.3, and 1.2. Thanks @kayousterhout |
mateiz was excluding the time to write this final file from the shuffle write time intentional?

Author: Kay Ousterhout <[email protected]>

Closes #4559 from kayousterhout/SPARK-5762 and squashes the following commits:

5c6f3d9 [Kay Ousterhout] Use foreach
94e4237 [Kay Ousterhout] Removed open time metrics added inadvertently
ace156c [Kay Ousterhout] Moved metrics to finally block
d773276 [Kay Ousterhout] Use nano time
5a59906 [Kay Ousterhout] [SPARK-5762] Fix shuffle write time for sort-based shuffle

(cherry picked from commit 47c73d4)
Signed-off-by: Andrew Or <[email protected]>
@mateiz was excluding the time to write this final file from the shuffle write time intentional?