[SPARK-20087][CORE] Attach accumulators / metrics to 'TaskKilled' end reason #21165
Changes from all commits
```diff
@@ -1852,7 +1852,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     assertDataStructuresEmpty()
   }

-  test("accumulators are updated on exception failures") {
+  test("accumulators are updated on exception failures and task killed") {
     val acc1 = AccumulatorSuite.createLongAccum("ingenieur")
     val acc2 = AccumulatorSuite.createLongAccum("boulanger")
     val acc3 = AccumulatorSuite.createLongAccum("agriculteur")
@@ -1868,15 +1868,24 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     val accUpdate3 = new LongAccumulator
     accUpdate3.metadata = acc3.metadata
     accUpdate3.setValue(18)
-    val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3)
-    val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo)
+    val accumUpdates1 = Seq(accUpdate1, accUpdate2)
+    val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo)
     val exceptionFailure = new ExceptionFailure(
       new SparkException("fondue?"),
-      accumInfo).copy(accums = accumUpdates)
+      accumInfo1).copy(accums = accumUpdates1)
```
Contributor

Not caused by you, but why do we do a copy instead of passing

Contributor Author

We can avoid the

Contributor Author

Ah, this
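For context on the exchange above, here is a hedged sketch of the `ExceptionFailure` shape the test is working with. The explanation for the `copy` is an inference from the test code, not something confirmed in this thread, and `ExceptionFailureSketch` is a simplified stand-in, not the verbatim Spark source:

```scala
import org.apache.spark.scheduler.AccumulableInfo
import org.apache.spark.util.AccumulatorV2

// Simplified stand-in for org.apache.spark.ExceptionFailure (the real class
// extends TaskFailedReason and restricts `accums` to private[spark]; both
// are dropped here so the sketch compiles standalone).
case class ExceptionFailureSketch(
    className: String,
    description: String,
    accumUpdates: Seq[AccumulableInfo] = Seq.empty,
    var accums: Seq[AccumulatorV2[_, _]] = Nil) {

  // Mirrors the throwable-based auxiliary constructor the test calls.
  // It never sets `accums`, which would explain the `.copy(accums = ...)`
  // in the diff above: the raw accumulators can only be attached afterwards.
  def this(e: Throwable, accumUpdates: Seq[AccumulableInfo]) =
    this(e.getClass.getName, Option(e.getMessage).getOrElse(""), accumUpdates)
}
```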
```diff
     submit(new MyRDD(sc, 1, Nil), Array(0))
     runEvent(makeCompletionEvent(taskSets.head.tasks.head, exceptionFailure, "result"))

     assert(AccumulatorContext.get(acc1.id).get.value === 15L)
     assert(AccumulatorContext.get(acc2.id).get.value === 13L)

+    val accumUpdates2 = Seq(accUpdate3)
+    val accumInfo2 = accumUpdates2.map(AccumulatorSuite.makeInfo)
+
+    val taskKilled = new TaskKilled("test", accumInfo2, accums = accumUpdates2)
+    runEvent(makeCompletionEvent(taskSets.head.tasks.head, taskKilled, "result"))
+
+    assert(AccumulatorContext.get(acc3.id).get.value === 18L)
   }
```
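The key new surface in this test is the enriched `TaskKilled` end reason. Inferring from the call `new TaskKilled("test", accumInfo2, accums = accumUpdates2)`, its shape is roughly the following sketch; the defaults and the omitted `private[spark]` / `TaskFailedReason` parts are assumptions:

```scala
import org.apache.spark.scheduler.AccumulableInfo
import org.apache.spark.util.AccumulatorV2

// Hedged sketch of TaskKilled after this change, inferred from the test.
// In Spark itself this extends TaskFailedReason and marks `accums` as
// private[spark]; both are dropped here so the sketch compiles standalone.
case class TaskKilledSketch(
    reason: String,
    accumUpdates: Seq[AccumulableInfo] = Seq.empty,  // JSON-friendly snapshots
    var accums: Seq[AccumulatorV2[_, _]] = Nil) {    // live accumulator objects
  def toErrorString: String = s"TaskKilled ($reason)"
}
```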
```diff
@@ -2497,6 +2506,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     val accumUpdates = reason match {
       case Success => task.metrics.accumulators()
       case ef: ExceptionFailure => ef.accums
+      case tk: TaskKilled => tk.accums
       case _ => Seq.empty
     }
     CompletionEvent(task, reason, result, accumUpdates ++ extraAccumUpdates, taskInfo)
```
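The match above is only the test helper attaching the right accumulators to the completion event. On the driver side, those accumulators are what ultimately get merged back into the registered originals, which is what the `=== 15L` / `=== 18L` assertions observe. A hedged sketch of that merging, loosely following `DAGScheduler.updateAccumulators` (simplified; note `AccumulatorContext` is a Spark-internal registry):

```scala
import org.apache.spark.util.{AccumulatorContext, AccumulatorV2}

// Hedged sketch: fold task-side partial values into the driver-side
// originals. Error handling and TaskMetrics wiring are omitted.
def mergeAccumUpdates(accumUpdates: Seq[AccumulatorV2[_, _]]): Unit = {
  for (updated <- accumUpdates) {
    // Look up the original accumulator registered on the driver by id.
    AccumulatorContext.get(updated.id).foreach { original =>
      original.asInstanceOf[AccumulatorV2[Any, Any]]
        .merge(updated.asInstanceOf[AccumulatorV2[Any, Any]])
    }
  }
}
```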
Previously we used `AccumulableInfo` to expose accumulator information to end users. Now `AccumulatorV2` is already a public class and we don't need to do that anymore. I think we can just do
Yeah, I noticed `accumUpdates: Seq[AccumulableInfo]` is only used in `JsonProtocol`. Is that for a reason? The current impl is constructed to stay in sync with the existing `TaskEndReason`s such as `ExceptionFailure`. I'd prefer to keep them in sync, which leaves two options for cleanup. @cloud-fan what do you think?
Let's clean up `ExceptionFailure` at the same time, and use only `AccumulatorV2` in this PR.
@cloud-fan After a second look, I don't think we can clean up `ExceptionFailure` unless we can break `JsonProtocol`.
Now the question is: shall we keep the unnecessary `Seq[AccumulableInfo]` in the new APIs, to make the API consistent? I'd prefer not to keep the `Seq[AccumulableInfo]`; we may deprecate it in the existing APIs in the near future.
I'm OK with not keeping `Seq[AccumulableInfo]`, but it means inconsistent logic and API, and it may make future refactoring a bit difficult. Let's see what I can do.

BTW, I think we have already deprecated `AccumulableInfo`. Unless we are planning to remove it in Spark 3.0 and Spark 3.0 is the next release, `AccumulableInfo` will be there for a long time.
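For readers following this thread, a hedged contrast of the two representations under discussion (field lists abridged; see Spark's `AccumulableInfo` and `AccumulatorV2` for the real definitions):

```scala
// AccumulableInfo: an immutable, JSON-friendly snapshot of one accumulator,
// used by JsonProtocol and the listener bus; deprecated as a user-facing API.
case class AccumulableInfoSketch(
    id: Long,
    name: Option[String],
    update: Option[Any],  // partial value reported by a single task
    value: Option[Any])   // accumulated value so far

// AccumulatorV2: the live, mutable accumulator. It is abstract, so restoring
// one from JSON would require knowing which concrete subclass
// (LongAccumulator, CollectionAccumulator, ...) to instantiate.
abstract class AccumulatorV2Sketch[IN, OUT] {
  def id: Long
  def add(v: IN): Unit
  def merge(other: AccumulatorV2Sketch[IN, OUT]): Unit
  def value: OUT
}
```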
Hi @cloud-fan, I have looked at how to remove `Seq[AccumulableInfo]` tonight. It turns out that we cannot, because `JsonProtocol` calls `taskEndReasonFromJson` to reconstruct `TaskEndReason`s. Since `AccumulatorV2` is an abstract class, we cannot simply construct `AccumulatorV2`s from JSON. Even though we are promoting `AccumulatorV2`, we still need `AccumulableInfo` when (de)serializing JSON.
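To make the constraint concrete, here is a minimal deserialization sketch under the assumptions above. The JSON keys ("Kill Reason", "Accumulator Updates") are guesses at the `JsonProtocol` format, not verbatim; the point is that only the `AccumulableInfo` side can be rebuilt, while `accums` necessarily stays empty:

```scala
import org.json4s._

// Hedged sketch: rebuild the JSON-friendly parts of a TaskKilled end reason.
// There is no way to know which concrete AccumulatorV2 subclass produced a
// value, so the raw `accums` cannot be reconstructed from JSON at all.
def taskKilledFromJsonSketch(json: JValue): (String, List[JValue]) = {
  implicit val formats: Formats = DefaultFormats
  val reason = (json \ "Kill Reason").extract[String]
  val accumUpdates = (json \ "Accumulator Updates") match {
    // Each element would go through something like accumulableInfoFromJson.
    case JArray(updates) => updates
    case _ => Nil
  }
  (reason, accumUpdates)
}
```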
I see, that makes sense. Let's keep `AccumulableInfo`.
@cloud-fan So, could you trigger the test and have a look? And it looks like I am not in the whitelist again...