Commit 83136ef
authored
[ML] address potential bug where trained models get stuck in starting after being allocated to node (#88945)
When a model is starting, it has been rarely observed that it will lock up while trying to restore the model objects to the native process.
This would manifest as a trained model being stuck in "starting" while also being assigned to a node. So, there is a native process started and task available on the assigned nodes, but the model state never gets out of "starting".1 parent 70a7276 commit 83136ef
File tree
4 files changed
+71
-42
lines changed- docs/changelog
- x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference
- assignment
- deployment
- persistence
4 files changed
+71
-42
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
Lines changed: 3 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
225 | 225 | | |
226 | 226 | | |
227 | 227 | | |
228 | | - | |
| 228 | + | |
229 | 229 | | |
230 | 230 | | |
231 | | - | |
| 231 | + | |
232 | 232 | | |
233 | 233 | | |
234 | 234 | | |
235 | | - | |
236 | 235 | | |
237 | 236 | | |
238 | 237 | | |
| |||
413 | 412 | | |
414 | 413 | | |
415 | 414 | | |
416 | | - | |
| 415 | + | |
417 | 416 | | |
418 | 417 | | |
419 | 418 | | |
| |||
Lines changed: 44 additions & 29 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| 64 | + | |
64 | 65 | | |
65 | 66 | | |
66 | 67 | | |
| |||
149 | 150 | | |
150 | 151 | | |
151 | 152 | | |
152 | | - | |
153 | | - | |
154 | | - | |
155 | | - | |
156 | | - | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
161 | | - | |
162 | | - | |
163 | | - | |
164 | | - | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
165 | 166 | | |
166 | | - | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
167 | 180 | | |
168 | | - | |
169 | | - | |
170 | | - | |
171 | | - | |
172 | | - | |
173 | | - | |
174 | | - | |
175 | | - | |
176 | | - | |
177 | | - | |
178 | | - | |
179 | | - | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
180 | 193 | | |
181 | 194 | | |
182 | 195 | | |
| |||
404 | 417 | | |
405 | 418 | | |
406 | 419 | | |
407 | | - | |
408 | | - | |
| 420 | + | |
| 421 | + | |
409 | 422 | | |
410 | | - | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
411 | 426 | | |
412 | 427 | | |
413 | 428 | | |
| |||
Lines changed: 18 additions & 9 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
14 | | - | |
15 | 13 | | |
16 | 14 | | |
| 15 | + | |
17 | 16 | | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
| 41 | + | |
41 | 42 | | |
42 | | - | |
| 43 | + | |
| 44 | + | |
43 | 45 | | |
44 | 46 | | |
45 | 47 | | |
| |||
71 | 73 | | |
72 | 74 | | |
73 | 75 | | |
74 | | - | |
| 76 | + | |
75 | 77 | | |
76 | 78 | | |
77 | 79 | | |
| |||
122 | 124 | | |
123 | 125 | | |
124 | 126 | | |
125 | | - | |
126 | 127 | | |
127 | 128 | | |
128 | 129 | | |
| |||
132 | 133 | | |
133 | 134 | | |
134 | 135 | | |
135 | | - | |
136 | | - | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
137 | 146 | | |
138 | 147 | | |
139 | 148 | | |
| |||
182 | 191 | | |
183 | 192 | | |
184 | 193 | | |
185 | | - | |
| 194 | + | |
186 | 195 | | |
187 | 196 | | |
188 | 197 | | |
189 | 198 | | |
190 | 199 | | |
191 | | - | |
| 200 | + | |
192 | 201 | | |
193 | 202 | | |
194 | 203 | | |
| |||
0 commit comments