Commit 86bf93e
[SPARK-13522][CORE] Executor should kill itself when it's unable to heartbeat to driver more than N times
## What changes were proposed in this pull request?
Sometimes, network disconnection event won't be triggered for other potential race conditions that we may not have thought of, then the executor will keep sending heartbeats to driver and won't exit.
This PR adds a new configuration `spark.executor.heartbeat.maxFailures` to kill Executor when it's unable to heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times.
## How was this patch tested?
unit tests
Author: Shixiong Zhu <[email protected]>
Closes #11401 from zsxwing/SPARK-13522.1 parent c433c0a commit 86bf93e
File tree
2 files changed
+29
-1
lines changed- core/src/main/scala/org/apache/spark/executor
2 files changed
+29
-1
lines changedLines changed: 21 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
114 | 114 | | |
115 | 115 | | |
116 | 116 | | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
117 | 130 | | |
118 | 131 | | |
119 | 132 | | |
| |||
464 | 477 | | |
465 | 478 | | |
466 | 479 | | |
| 480 | + | |
467 | 481 | | |
468 | | - | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
469 | 489 | | |
470 | 490 | | |
471 | 491 | | |
| |||
Lines changed: 8 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
42 | 48 | | |
43 | 49 | | |
44 | 50 | | |
| |||
51 | 57 | | |
52 | 58 | | |
53 | 59 | | |
| 60 | + | |
| 61 | + | |
54 | 62 | | |
55 | 63 | | |
56 | 64 | | |
| |||
0 commit comments