Commit 20051eb
[SPARK-36540][YARN] YARN-CLIENT mode should check Shutdown message when AMEndpoint disconencted
### What changes were proposed in this pull request?
We meet a case AM lose connection
```
21/08/18 02:14:15 ERROR TransportRequestHandler: Error sending result RpcResponse{requestId=5675952834716124039, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=47 cap=64]}} to xx.xx.xx.xx:41420; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
```
Check the code about client, when AMEndpoint disconnected, will finish Application with SUCCESS final status
```
override def onDisconnected(remoteAddress: RpcAddress): Unit = {
// In cluster mode or unmanaged am case, do not rely on the disassociated event to exit
// This avoids potentially reporting incorrect exit codes if the driver fails
if (!(isClusterMode || sparkConf.get(YARN_UNMANAGED_AM))) {
logInfo(s"Driver terminated or disconnected! Shutting down. $remoteAddress")
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
}
}
```
Normally say in client mode, when application success, driver will stop and AM loss connection, it's ok that exit with SUCCESS, but if there is a not work problem cause disconnected. Still finish with final status is not correct.
Then YarnClientSchedulerBackend will receive application report with final status with success and stop SparkContext cause application failed but mark it as a normal stop.
```
private class MonitorThread extends Thread {
private var allowInterrupt = true
override def run() {
try {
val YarnAppReport(_, state, diags) =
client.monitorApplication(appId.get, logApplicationReport = false)
logError(s"YARN application has exited unexpectedly with state $state! " +
"Check the YARN application logs for more details.")
diags.foreach { err =>
logError(s"Diagnostics message: $err")
}
allowInterrupt = false
sc.stop()
} catch {
case e: InterruptedException => logInfo("Interrupting monitor thread")
}
}
def stopMonitor(): Unit = {
if (allowInterrupt) {
this.interrupt()
}
```
IMO, we should send a `Shutdown` message to yarn client mode AM to make sure the shut down case
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Closes #33780 from AngersZhuuuu/SPARK-36540.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>1 parent dd986e8 commit 20051eb
File tree
5 files changed
+53
-4
lines changed- docs
- resource-managers/yarn/src/main/scala/org/apache/spark
- deploy/yarn
- scheduler/cluster
5 files changed
+53
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
441 | 441 | | |
442 | 442 | | |
443 | 443 | | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
444 | 457 | | |
445 | 458 | | |
446 | 459 | | |
| |||
Lines changed: 13 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
784 | 784 | | |
785 | 785 | | |
786 | 786 | | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
787 | 790 | | |
788 | 791 | | |
789 | 792 | | |
| |||
801 | 804 | | |
802 | 805 | | |
803 | 806 | | |
| 807 | + | |
| 808 | + | |
804 | 809 | | |
805 | 810 | | |
806 | 811 | | |
| |||
843 | 848 | | |
844 | 849 | | |
845 | 850 | | |
846 | | - | |
847 | | - | |
| 851 | + | |
| 852 | + | |
| 853 | + | |
| 854 | + | |
| 855 | + | |
| 856 | + | |
| 857 | + | |
848 | 858 | | |
849 | 859 | | |
850 | 860 | | |
| |||
862 | 872 | | |
863 | 873 | | |
864 | 874 | | |
| 875 | + | |
865 | 876 | | |
866 | 877 | | |
867 | 878 | | |
| |||
Lines changed: 15 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
55 | 70 | | |
56 | 71 | | |
57 | 72 | | |
| |||
Lines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
161 | 161 | | |
162 | 162 | | |
163 | 163 | | |
| 164 | + | |
164 | 165 | | |
165 | 166 | | |
166 | 167 | | |
| |||
Lines changed: 11 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
58 | 58 | | |
59 | 59 | | |
60 | 60 | | |
61 | | - | |
| 61 | + | |
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
| |||
291 | 291 | | |
292 | 292 | | |
293 | 293 | | |
294 | | - | |
| 294 | + | |
295 | 295 | | |
296 | 296 | | |
297 | 297 | | |
| |||
319 | 319 | | |
320 | 320 | | |
321 | 321 | | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
322 | 331 | | |
323 | 332 | | |
324 | 333 | | |
| |||
0 commit comments