-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12221] add cpu time to metrics #10212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when the history server is tries to deserialise a history from an earlier spark version, one which doesn't have a CPU time? As it looks to me through my scan through the code that this is going to fail.
HistoryServerSuite is the regression test here —it does have job histories without the relevant metric.
It would benefit from having another reference test run here for playback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it won't be able to deserialize a history from an earlier version. Would it be better to make this backward-compatible? (Sorry for the super late response)
|
Jenkins, test this please |
|
(I may be trusted enough to start a run..let's see) |
1b77424 to
30752cb
Compare
|
@jisookim0513 are you still around? Do you mind updating the patch so we can trigger tests? |
|
@vanzin sure will do |
|
@vanzin I updated the patch |
|
ok to test |
|
Test build #64085 has finished for PR 10212 at commit
|
|
Test build #64096 has finished for PR 10212 at commit
|
|
Test build #64099 has finished for PR 10212 at commit
|
|
@vanzin this PR had passed all tests. Could you merge it if I fix the recently introduced conflicts? |
|
Sure. Just remember to ping someone, otherwise things get lost in the mountain of e-mails github generates. |
|
Test build #65679 has finished for PR 10212 at commit
|
|
Test build #65686 has finished for PR 10212 at commit
|
|
@vanzin could you merge this? Thanks! |
|
@jisookim0513 unfortunately there are conflicts once more. |
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor things to clean up, otherwise looks ok.
| "numCompleteTasks" : 8, | ||
| "numFailedTasks" : 0, | ||
| "executorRunTime" : 162, | ||
| "executorCpuTime" : 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to add these values to all these files? The code should be able to handle the old logs that don't have the value, and not adding these would be a good test case for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I thought HistoryServerSuite runs with included log files (that don't have CPU time). So this is an expected result since those logs don't have cpu time fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought these were input to the history server, not "golden files" that it checks against... if that's the case, ignore my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, these are expected outputs. I think the inputs are stored under src/test/resources/spark-events.
| metrics.setExecutorRunTime((json \ "Executor Run Time").extract[Long]) | ||
| metrics.setExecutorCpuTime((json \ "Executor CPU Time") match { | ||
| case JNothing => 0 | ||
| case x => x.extract[Long]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move '}' to next line (with the ')'). Also above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix this.
| | }, | ||
| | "Task Metrics": { | ||
| | "Executor Deserialize Time": 300, | ||
| | "Executor Deserialize CPU Time": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use values other than 0 so that you're sure the code is actually parsing the value, instead of falling into the default case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, JsonProtolSuite creates a JSON string from the event created by makeTaskMetrics():
'makeTaskMetrics(300L, 400L, 500L, 600L, 700, 800, hasHadoopInput = true, hasOutput = false))'.
I tried changing makeTaskMetrics() to accept deserialize CPU time and CPU time as arguments , but that ended up violating scalaStyle by having more than 10 parameters..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it would be worth to fix this; just find some way around the style check.
Otherwise, you're not really testing whether the parsing code is actually parsing the field (what if there's a typo somewhere?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I tested it on my testing cluster, but this makes sense. I will add non-zero CPU times by setting the CPU times same as given wall times.
|
Test build #65787 has finished for PR 10212 at commit
|
|
Test build #65815 has finished for PR 10212 at commit
|
|
Failure looks unrelated... retest this please |
|
@vanzin thanks, I was about to ask for a retest :) |
|
LGTM pending tests. |
|
Test build #65834 has finished for PR 10212 at commit
|
|
Merging to master. |
|
@vanzin thanks a lot! |
Currently task metrics don't support executor CPU time, so there's no way to calculate how much CPU time a stage/task took from History Server metrics. This PR enables reporting CPU time.