Commit 34ac45a
[SPARK-16230][CORE] CoarseGrainedExecutorBackend to self kill if there is an exception while creating an Executor
## What changes were proposed in this pull request?
With the fix from SPARK-13112, I see that `LaunchTask` is always processed after `RegisteredExecutor` is done and so it gets chance to do all retries to startup an executor. There is still a problem that if `Executor` creation itself fails and there is some exception, it gets unnoticed and the executor is killed when it tries to process the `LaunchTask` as `executor` is null : https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L88 So if one looks at the logs, it does not tell that there was problem during `Executor` creation and thats why it was killed.
This PR explicitly catches exception in `Executor` creation, logs a proper message and then exits the JVM. Also, I have changed the `exitExecutor` method to accept `reason` so that backends can use that reason and do stuff like logging to a DB to get an aggregate of such exits at a cluster level
## How was this patch tested?
I am relying on existing tests
Author: Tejas Patil <[email protected]>
Closes #14202 from tejasapatil/exit_executor_failure.
(cherry picked from commit b2f24f9)
Signed-off-by: Shixiong Zhu <[email protected]>1 parent e833c90 commit 34ac45a
File tree
1 file changed
+20
-12
lines changed- core/src/main/scala/org/apache/spark/executor
1 file changed
+20
-12
lines changedLines changed: 20 additions & 12 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
| |||
64 | 65 | | |
65 | 66 | | |
66 | 67 | | |
67 | | - | |
68 | | - | |
| 68 | + | |
69 | 69 | | |
70 | 70 | | |
71 | 71 | | |
| |||
78 | 78 | | |
79 | 79 | | |
80 | 80 | | |
81 | | - | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
82 | 87 | | |
83 | 88 | | |
84 | | - | |
85 | | - | |
| 89 | + | |
86 | 90 | | |
87 | 91 | | |
88 | 92 | | |
89 | | - | |
90 | | - | |
| 93 | + | |
91 | 94 | | |
92 | 95 | | |
93 | 96 | | |
| |||
97 | 100 | | |
98 | 101 | | |
99 | 102 | | |
100 | | - | |
101 | | - | |
| 103 | + | |
102 | 104 | | |
103 | 105 | | |
104 | 106 | | |
| |||
127 | 129 | | |
128 | 130 | | |
129 | 131 | | |
130 | | - | |
131 | | - | |
| 132 | + | |
132 | 133 | | |
133 | 134 | | |
134 | 135 | | |
| |||
147 | 148 | | |
148 | 149 | | |
149 | 150 | | |
150 | | - | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
151 | 159 | | |
152 | 160 | | |
153 | 161 | | |
| |||
0 commit comments