
Conversation

@noodle-fb

What changes were proposed in this pull request?

The ultimate goal is for onTaskEnd listeners to receive metrics when a task is killed intentionally, since that data is currently just thrown away. This is already done for ExceptionFailure, so this change copies the same approach.
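As a rough illustration of the intended behavior, here is a minimal sketch with simplified stand-in types (plain `Seq[Long]` instead of Spark's `AccumulableInfo`, and a toy `TaskEndReason` hierarchy, not the real Spark API): a listener can only recover accumulator updates from end reasons that actually carry them, so `TaskKilled` needs a field for them just as `ExceptionFailure` already has.

```scala
// Toy stand-ins for Spark's TaskEndReason hierarchy (not the real API).
sealed trait TaskEndReason
case object Success extends TaskEndReason
final case class ExceptionFailure(accumUpdates: Seq[Long]) extends TaskEndReason
// The change in question: TaskKilled also carries accumulator updates.
final case class TaskKilled(reason: String, accumUpdates: Seq[Long] = Seq.empty)
  extends TaskEndReason

// An onTaskEnd-style listener can now read metrics for kills, not just failures.
def metricsFrom(reason: TaskEndReason): Seq[Long] = reason match {
  case ExceptionFailure(updates) => updates
  case TaskKilled(_, updates)    => updates
  case _                         => Seq.empty
}
```

Before this change, the `TaskKilled` arm had nothing to return, which is exactly the data loss described above.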

How was this patch tested?

The unit test in DAGSchedulerSuite that tests this for ExceptionFailure was modified to test the same thing for TaskKilled. I also re-tested all the unit tests modified by the last change to TaskKilled, and made sure they all still pass.

For integration tests, I ran a query that caused a speculative task retry on our deployment, and verified that the metrics showed up in our logging for that retry when it was killed.

@HyukjinKwon
Member

Hi @noodle-fb, this doesn't seem like a trivial change that can go without a JIRA. Could you create a JIRA and put its ID in the title (see http://spark.apache.org/contributing.html)?

@JoshRosen
Contributor

/cc @ericl as FYI.

@noodle-fb noodle-fb changed the title Attach accumulators / metrics to 'TaskKilled' end reason [SPARK-20087] Attach accumulators / metrics to 'TaskKilled' end reason Mar 27, 2017
@noodle-fb
Author

@HyukjinKwon edited with the JIRA tag; didn't realize that was the naming convention.

@noodle-fb
Author

@JoshRosen ping? not sure how to github correctly

@jiangxb1987
Contributor

add to whitelist

@jiangxb1987
Contributor

ok to test

@jiangxb1987
Contributor

@noodle-fb could you rebase this so we can review it? Thanks!

@advancedxy
Contributor

@noodle-fb are you still working on this? If not, I may work on it based on your current impl.

I am facing the same issue here. The accumulator updates are lost for killed tasks.

@squito
Contributor

squito commented Feb 2, 2018

@advancedxy this has been quiet for a long time, so I suggest you just take it over. I actually think this is so close to complete that very little would need to be done, and credit would most likely go to @noodle-fb. That said, we may need a little more input on whether or not this is desirable, as it will change the meaning of the aggregated metrics.

override def toErrorString: String = s"TaskKilled ($reason)"
override def countTowardsTaskFailures: Boolean = false

private[spark] def withAccums(accums: Seq[AccumulatorV2[_, _]]): TaskKilled = {
Contributor


I don't think this method is really necessary at all; you could just pass it in the constructor in the places it's used.
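A minimal sketch of the reviewer's suggestion, using a simplified stand-in for the real class (accumulator updates shown as plain `Long`s rather than `AccumulableInfo`, and only the two members quoted above): with the updates as a constructor parameter with a default, call sites build the end reason in one step and no `withAccums` helper is needed.

```scala
// Simplified stand-in for Spark's TaskKilled end reason (not the real class).
final case class TaskKilled(reason: String, accumUpdates: Seq[Long] = Seq.empty) {
  def toErrorString: String = s"TaskKilled ($reason)"
  def countTowardsTaskFailures: Boolean = false
}

// Call sites construct the end reason directly, passing updates when they exist:
val killed = TaskKilled("another attempt succeeded", accumUpdates = Seq(1L, 2L))
```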

} else {
Seq.empty
}
val accUpdates = accums.map(acc => acc.toInfo(Some(acc.value), None))
Contributor


This should be refactored, not repeated three times.
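One way the repeated mapping could be factored out, as a hedged sketch with toy stand-ins for `AccumulatorV2` and `AccumulableInfo` (the `toAccumUpdates` helper name is hypothetical, not from the PR):

```scala
// Toy stand-ins for Spark's accumulator types (not the real API).
trait Acc { def value: Any }
final case class AccInfo(update: Option[Any], value: Option[Any])
final case class CountAcc(value: Any) extends Acc

// The mapping that was repeated three times, written once; each call
// site then just invokes toAccumUpdates(accums).
def toAccumUpdates(accums: Seq[Acc]): Seq[AccInfo] =
  accums.map(acc => AccInfo(update = Some(acc.value), value = None))

val updates = toAccumUpdates(Seq(CountAcc(3L), CountAcc(5L)))
```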

@noodle-fb
Author

@advancedxy, feel free to take this over! @squito, as I remember it, it seemed inconsistent to count metrics for tasks that fail but not for tasks that were killed; machines are doing work in either case. But others might interpret the metrics differently.

@advancedxy
Contributor

All right then, I will take it over. Of course the credit should go to @noodle-fb.

We can discuss whether this behaviour is desirable or not in the JIRA or the new PR.

srowen pushed a commit to srowen/spark that referenced this pull request May 22, 2018
… reason

## What changes were proposed in this pull request?
The ultimate goal is for listeners to onTaskEnd to receive metrics when a task is killed intentionally, since the data is currently just thrown away. This is already done for ExceptionFailure, so this just copies the same approach.

## How was this patch tested?
Updated existing tests.

This is a rework of apache#17422, all credits should go to noodle-fb

Author: Xianjin YE <[email protected]>
Author: Charles Lewis <[email protected]>

Closes apache#21165 from advancedxy/SPARK-20087.
@AmplabJenkins

Can one of the admins verify this patch?
