[SPARK-21656][CORE] spark dynamic allocation should not idle timeout executors when tasks still to run
## What changes were proposed in this pull request?
Right now Spark lets go of executors when they have been idle for 60s (or the configured idle timeout). I have seen Spark let them go when they were idle but still really needed, for example when the scheduler was waiting for node locality and that wait took longer than the default idle timeout. In these jobs the number of executors drops very low (fewer than 10) while there are still around 80,000 tasks left to run.
We should consider not letting executors idle timeout while they are still needed according to the number of tasks remaining to run.
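As an illustration only (not the actual patch), the sketch below uses hypothetical names and inputs (current executor count, executors already marked for removal, outstanding tasks, and task slots per executor) and keeps an idle executor whenever removing it would leave fewer executors than the outstanding tasks require.
```
// Hedged sketch of the proposed guard; names and structure are illustrative,
// not the real ExecutorAllocationManager change.
object IdleTimeoutGuard {

  /** Returns true only if removing one more idle executor still leaves enough
    * executors to cover the outstanding (pending + running) tasks. */
  def canRemoveIdleExecutor(
      currentExecutors: Int,
      markedForRemoval: Int,
      outstandingTasks: Int,
      taskSlotsPerExecutor: Int): Boolean = {
    val executorsNeeded =
      math.ceil(outstandingTasks.toDouble / taskSlotsPerExecutor).toInt
    currentExecutors - markedForRemoval - 1 >= executorsNeeded
  }

  def main(args: Array[String]): Unit = {
    // 80,000 tasks left and only 10 executors: the idle timeout should not fire.
    println(canRemoveIdleExecutor(10, 0, 80000, 4))  // false
    // Only 8 tasks left: letting one of 10 executors go is safe.
    println(canRemoveIdleExecutor(10, 0, 8, 4))      // true
  }
}
```
The real change would live in the removal path of `ExecutorAllocationManager`, but the comparison captures the intent: the idle timeout alone should not shrink the set of executors below what the remaining task count calls for.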
## How was this patch tested?
Tested by manually adding executors to the `executorIdsToBeRemoved` list and checking whether those executors were removed while there were a lot of tasks to run and a high `numExecutorsTarget` value.
Code used
In `ExecutorAllocationManager.start()`
```
// start_time is assumed to be a member var on ExecutorAllocationManager,
// so that schedule() below can read and update it.
start_time = clock.getTimeMillis()
```
In `ExecutorAllocationManager.schedule()`
```
// `now` is the timestamp that schedule() already computes from the clock.
val executorIdsToBeRemoved = ArrayBuffer[String]()
if (now > start_time + 1000 * 60 * 2) {
  logInfo("--- REMOVING 1/2 of the EXECUTORS ---")
  // Push start_time far into the future so this block only fires once.
  start_time += 1000 * 60 * 100
  var counter = 0
  for (x <- executorIds) {
    counter += 1
    if (counter == 2) {
      // Mark every other executor for removal.
      counter = 0
      executorIdsToBeRemoved += x
    }
  }
}
```
Author: John Lee <[email protected]>
Closes apache#18874 from yoonlee95/SPARK-21656.
(cherry picked from commit adf005d)
Signed-off-by: Tom Graves <[email protected]>