@jisookim0513
Contributor

Currently task metrics don't support executor CPU time, so there's no way to calculate how much CPU time a stage/task took from History Server metrics. This PR enables reporting CPU time.
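The diff itself is not shown in this thread, so the following is only a minimal sketch of one way per-task CPU time can be sampled on the JVM, using the standard `ThreadMXBean` API. The object name `CpuTimeSketch` and the helper `timed` are hypothetical and are not taken from the patch.

```scala
import java.lang.management.ManagementFactory

object CpuTimeSketch {
  private val threadBean = ManagementFactory.getThreadMXBean

  /** CPU time of the current thread in nanoseconds, or 0 if the JVM cannot report it. */
  def currentThreadCpuNanos(): Long =
    if (threadBean.isCurrentThreadCpuTimeSupported) threadBean.getCurrentThreadCpuTime
    else 0L

  /** Runs `body` on the calling thread and returns (result, wall millis, CPU nanos). */
  def timed[A](body: => A): (A, Long, Long) = {
    val wallStart = System.nanoTime()
    val cpuStart = currentThreadCpuNanos()
    val result = body
    val cpuNanos = currentThreadCpuNanos() - cpuStart
    val wallMillis = (System.nanoTime() - wallStart) / 1000000L
    (result, wallMillis, cpuNanos)
  }
}
```

Wall time and CPU time diverge whenever the thread blocks (I/O, locks, GC pauses), which is why a run-time metric alone cannot stand in for CPU time.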

@jisookim0513 jisookim0513 changed the title add cpu time to metrics [SPARK-12221] add cpu time to metrics Dec 9, 2015
Contributor

What happens when the history server tries to deserialise a history from an earlier Spark version, one that doesn't have a CPU time? From my scan through the code, it looks like this is going to fail.

HistoryServerSuite is the regression test here; it does have job histories without the relevant metric.

It would benefit from having another reference test run here for playback.

Contributor Author

Yeah it won't be able to deserialize a history from an earlier version. Would it be better to make this backward-compatible? (Sorry for the super late response)

@steveloughran
Contributor

Jenkins, test this please

@steveloughran
Contributor

(I may be trusted enough to start a run... let's see)

@jisookim0513 jisookim0513 force-pushed the add-cpu-time-metric branch from 1b77424 to 30752cb Compare March 3, 2016 23:04
@vanzin
Contributor

vanzin commented Aug 4, 2016

@jisookim0513 are you still around? Do you mind updating the patch so we can trigger tests?

@jisookim0513
Contributor Author

@vanzin sure will do

@jisookim0513
Contributor Author

@vanzin I updated the patch

@vanzin
Contributor

vanzin commented Aug 19, 2016

ok to test

@SparkQA

SparkQA commented Aug 19, 2016

Test build #64085 has finished for PR 10212 at commit bd19098.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 19, 2016

Test build #64096 has finished for PR 10212 at commit bf8d4f8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 19, 2016

Test build #64099 has finished for PR 10212 at commit 5a9ea8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jisookim0513
Contributor Author

@vanzin this PR has passed all tests. Could you merge it if I fix the recently introduced conflicts?

@vanzin
Contributor

vanzin commented Sep 20, 2016

Sure. Just remember to ping someone, otherwise things get lost in the mountain of e-mails github generates.

@SparkQA

SparkQA commented Sep 20, 2016

Test build #65679 has finished for PR 10212 at commit d4d6f76.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 21, 2016

Test build #65686 has finished for PR 10212 at commit 2ff5bdc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jisookim0513
Contributor Author

@vanzin could you merge this? Thanks!

@vanzin
Contributor

vanzin commented Sep 22, 2016

@jisookim0513 unfortunately there are conflicts once more.

Contributor

@vanzin left a comment

A few minor things to clean up, otherwise looks ok.

"numCompleteTasks" : 8,
"numFailedTasks" : 0,
"executorRunTime" : 162,
"executorCpuTime" : 0,
Contributor

Do you need to add these values to all these files? The code should be able to handle the old logs that don't have the value, and not adding these would be a good test case for that.

Contributor Author

Hmm, I thought HistoryServerSuite runs with the included log files (which don't have CPU time). So this is the expected result, since those logs don't have CPU time fields.

Contributor

@vanzin Sep 23, 2016

Oh, I thought these were input to the history server, not "golden files" that it checks against... if that's the case, ignore my comment.

Contributor Author

Oh no, these are expected outputs. I think the inputs are stored under src/test/resources/spark-events.

metrics.setExecutorRunTime((json \ "Executor Run Time").extract[Long])
metrics.setExecutorCpuTime((json \ "Executor CPU Time") match {
case JNothing => 0
case x => x.extract[Long]}
Contributor

nit: move '}' to next line (with the ')'). Also above.

Contributor Author

Will fix this.
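The snippet under review falls back to 0 when "Executor CPU Time" is absent, which is what lets logs from earlier Spark versions still load. As a dependency-free sketch of that idea (a plain `Map` stands in for the parsed JSON object here; `EventLogCompat`, `required`, and `optional` are hypothetical names, not part of the patch):

```scala
object EventLogCompat {
  // Stand-in for one parsed "Task Metrics" JSON object: field name -> value.
  type Fields = Map[String, Long]

  /** Field present in every log version: a missing key is treated as an error. */
  def required(fields: Fields, key: String): Long =
    fields.getOrElse(key, throw new NoSuchElementException(s"missing field: $key"))

  /** Field added in a later version: old logs lack it, so fall back to a default. */
  def optional(fields: Fields, key: String, default: Long = 0L): Long =
    fields.getOrElse(key, default)
}
```

For example, `EventLogCompat.optional(Map("Executor Run Time" -> 162L), "Executor CPU Time")` yields the default of 0 for an old log, while a log that carries the field returns its recorded value.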

| },
| "Task Metrics": {
| "Executor Deserialize Time": 300,
| "Executor Deserialize CPU Time": 0,
Contributor

Could you use values other than 0 so that you're sure the code is actually parsing the value, instead of falling into the default case?

Contributor Author

AFAIK, JsonProtocolSuite creates a JSON string from the event created by makeTaskMetrics():
'makeTaskMetrics(300L, 400L, 500L, 600L, 700, 800, hasHadoopInput = true, hasOutput = false))'.
I tried changing makeTaskMetrics() to accept deserialize CPU time and CPU time as arguments, but that ended up violating Scalastyle by having more than 10 parameters.

Contributor

I still think it would be worth fixing this; just find some way around the style check.

Otherwise, you're not really testing whether the parsing code is actually parsing the field (what if there's a typo somewhere?).

Contributor Author

Yeah, I tested it on my testing cluster, but this makes sense. I will add non-zero CPU times by setting them to the same values as the given wall times.
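To illustrate the reviewer's point, here is a small sketch of why a zero-valued fixture cannot catch a misspelled key when the reader defaults to 0. Everything in it is hypothetical (`NonZeroFixtureSketch`, `Times`, `buggyCpuTime`); bundling the values in a case class is just one possible way to keep a helper's parameter list short, not what the patch does.

```scala
object NonZeroFixtureSketch {
  // Hypothetical fixture type: grouping values keeps the helper's parameter count low.
  final case class Times(runTime: Long, cpuTime: Long)

  /** Writes the fixture using the field names seen in the event log. */
  def toFields(t: Times): Map[String, Long] =
    Map("Executor Run Time" -> t.runTime, "Executor CPU Time" -> t.cpuTime)

  /** Reader with a deliberate typo in the key; it falls back to 0 like the default case. */
  def buggyCpuTime(fields: Map[String, Long]): Long =
    fields.getOrElse("Executor Cpu Time", 0L)

  /** True when the value read back equals the value written. */
  def roundTrips(t: Times): Boolean = buggyCpuTime(toFields(t)) == t.cpuTime
}
```

With `Times(400L, 0L)` the round trip appears to succeed even though the reader never finds the field; with `Times(400L, 400L)` the mismatch shows up immediately.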

@SparkQA

SparkQA commented Sep 22, 2016

Test build #65787 has finished for PR 10212 at commit d9c5f8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkContext(config: SparkConf) extends Logging
    • class ChiSqSelector @Since("2.1.0") () extends Serializable

@SparkQA

SparkQA commented Sep 23, 2016

Test build #65815 has finished for PR 10212 at commit f0ef503.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Sep 23, 2016

Failure looks unrelated... retest this please

@jisookim0513
Contributor Author

@vanzin thanks, I was about to ask for a retest :)

@vanzin
Contributor

vanzin commented Sep 23, 2016

LGTM pending tests.

@SparkQA

SparkQA commented Sep 23, 2016

Test build #65834 has finished for PR 10212 at commit f0ef503.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Sep 23, 2016

Merging to master.

@asfgit asfgit closed this in 90a30f4 Sep 23, 2016
@jisookim0513
Contributor Author

@vanzin thanks a lot!

4 participants