Commit 8de0d36
[SPARK-16925] Master should call schedule() after all executor exit events, not only failures
This patch fixes a bug in Spark's standalone Master which could cause applications to hang if tasks cause executors to exit with zero exit codes.
As an example of the bug, run
```
sc.parallelize(1 to 1, 1).foreachPartition { _ => System.exit(0) }
```
on a standalone cluster which has a single Spark application. This will cause all executors to die but those executors won't be replaced unless another Spark application or worker joins or leaves the cluster (or if an executor exits with a non-zero exit code). This behavior is caused by a bug in how the Master handles the `ExecutorStateChanged` event: the current implementation calls `schedule()` only if the executor exited with a non-zero exit code, so a task which causes a JVM to unexpectedly exit "cleanly" will skip the `schedule()` call.
This patch addresses this by modifying the `ExecutorStateChanged` handler to call `schedule()` unconditionally. This should be safe: adding extra `schedule()` calls can only affect performance and should not introduce correctness bugs.
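To make the difference concrete, here is a minimal, self-contained sketch of the two policies. It is a toy model and not the actual `Master` code: `ToyMaster`, `ExecutorExited`, `liveExecutors`, and the `scheduleOnlyOnFailure` flag are hypothetical names invented for illustration, and the stand-in `schedule()` only restores a single executor slot. It models only the behavior described in this message.

```scala
// Toy model only: every name here is hypothetical and none of it is Spark's API.
object MasterSketch {
  // Stand-in for an executor-exit notification carrying the JVM exit code.
  final case class ExecutorExited(execId: Int, exitCode: Int)

  final class ToyMaster(scheduleOnlyOnFailure: Boolean) {
    var liveExecutors: Int = 1 // the application wants exactly one executor
    var scheduleCalls: Int = 0

    // Stand-in for schedule(): relaunch an executor if one is missing.
    private def schedule(): Unit = {
      scheduleCalls += 1
      if (liveExecutors < 1) liveExecutors = 1
    }

    def handle(event: ExecutorExited): Unit = {
      liveExecutors -= 1
      if (scheduleOnlyOnFailure) {
        // Behavior described as the bug: a zero exit code skips schedule().
        if (event.exitCode != 0) schedule()
      } else {
        // Behavior described as the fix: schedule() runs after every exit.
        schedule()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val before = new ToyMaster(scheduleOnlyOnFailure = true)
    before.handle(ExecutorExited(execId = 0, exitCode = 0))
    println(s"before: live executors = ${before.liveExecutors}") // 0, never replaced

    val after = new ToyMaster(scheduleOnlyOnFailure = false)
    after.handle(ExecutorExited(execId = 0, exitCode = 0))
    println(s"after: live executors = ${after.liveExecutors}") // 1, replaced
  }
}
```

In this model the "before" master is left with no executors after a clean exit, which corresponds to the hang described above, while the "after" master pays for one extra `schedule()` call and recovers.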
I added a regression test in `DistributedSuite`.
Author: Josh Rosen <[email protected]>
Closes apache#14510 from JoshRosen/SPARK-16925.
(cherry picked from commit 4f5f9b6)
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit c162886)
1 parent 083d2d5 commit 8de0d36
2 files changed: +22 -10 lines changed
- core/src/main/scala/org/apache/spark/deploy/master: 7 additions & 10 deletions (old lines 287-296 replaced by new lines 287-292, plus one added line at new line 296)
- core/src/test/scala/org/apache/spark: 15 additions & 0 deletions (new lines 134-148)