
Commit 34c61b9

dagrawal3409 authored and dongjoon-hyun committed
[SPARK-32575][CORE][TESTS] Bump up timeouts in BlockManagerDecommissionIntegrationSuite to reduce flakiness
### What changes were proposed in this pull request?

As reported by HyukjinKwon, the BlockManagerDecommissionIntegrationSuite test is apparently still flaky (even after #29226): #29226 (comment). The new flakiness is because the executors were not launching within the 6-second timeout I had given them when run under GitHub checks. Bumped up the timeouts.

### Why are the changes needed?

To make this test not flaky, so that it gives us a high signal if decommissioning regresses.

### Does this PR introduce _any_ user-facing change?

No, this is a unit-test-only change.

### How was this patch tested?

No new tests. Just GitHub and Jenkins.

Closes #29388 from agrawaldevesh/more_bm_harden.

Authored-by: Devesh Agrawal <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 0ae94ad commit 34c61b9
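
For context, most of the timeouts being bumped are ScalaTest `eventually` blocks, which re-run an assertion at a fixed interval until it passes or the timeout expires. A minimal sketch of that pattern under generous, CI-friendly settings (the `executorUp` flag is a hypothetical stand-in for the suite's real readiness check, not code from this PR):

```scala
import java.util.concurrent.atomic.AtomicBoolean
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Hypothetical stand-in for the suite's real readiness check
// (e.g. "has an executor registered yet?").
val executorUp = new AtomicBoolean(false)
new Thread(() => { Thread.sleep(2000); executorUp.set(true) }).start()

// Polls every 100 ms and only fails if the condition still does not hold
// after 20 seconds: a generous timeout trades a slower worst case for far
// fewer spurious failures on slow CI runners such as GitHub checks.
eventually(timeout(20.seconds), interval(100.milliseconds)) {
  assert(executorUp.get())
}
```

Note that a larger timeout does not slow down the passing case: `eventually` returns as soon as the assertion first succeeds, so the bound only matters on the failure path.
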

1 file changed: +6 −6 lines changed

core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala

Lines changed: 6 additions & 6 deletions
```diff
@@ -40,7 +40,7 @@ class BlockManagerDecommissionIntegrationSuite extends SparkFunSuite with LocalS
   val TaskEnded = "TASK_ENDED"
   val JobEnded = "JOB_ENDED"
 
-  test(s"verify that an already running task which is going to cache data succeeds " +
+  testRetry(s"verify that an already running task which is going to cache data succeeds " +
     s"on a decommissioned executor after task start") {
     runDecomTest(true, false, TaskStarted)
   }
```
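
The switch from `test` to `testRetry` lets the harness re-run this case a bounded number of times before declaring failure, which absorbs rare environmental hiccups. A self-contained sketch of the retry idea (illustrative only; SparkFunSuite's actual `testRetry` helper may differ in signature and bookkeeping):

```scala
import scala.util.{Failure, Success, Try}

// Illustrative retry helper: evaluates `body` up to `n` times, returning the
// first successful result and rethrowing the final failure if all attempts fail.
def retry[T](n: Int)(body: => T): T = {
  require(n > 0, "need at least one attempt")
  Try(body) match {
    case Success(value) => value
    case Failure(_) if n > 1 => retry(n - 1)(body)
    case Failure(e) => throw e
  }
}

// Usage: give a flaky check two chances before failing for real.
retry(2) {
  assert(System.nanoTime() % 2 >= 0) // trivially true; stands in for a flaky check
}
```
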
```diff
@@ -89,7 +89,7 @@ class BlockManagerDecommissionIntegrationSuite extends SparkFunSuite with LocalS
 
     val sleepIntervalMs = whenToDecom match {
       // Increase the window of time b/w task started and ended so that we can decom within that.
-      case TaskStarted => 2000
+      case TaskStarted => 10000
       // Make one task take a really short time so that we can decommission right after it is
       // done but before its peers are done.
       case TaskEnded =>
```
```diff
@@ -176,11 +176,11 @@ class BlockManagerDecommissionIntegrationSuite extends SparkFunSuite with LocalS
       } else {
         10.milliseconds
       }
-      eventually(timeout(6.seconds), interval(intervalMs)) {
+      eventually(timeout(20.seconds), interval(intervalMs)) {
         assert(getCandidateExecutorToDecom.isDefined)
       }
     } else {
-      ThreadUtils.awaitResult(asyncCount, 15.seconds)
+      ThreadUtils.awaitResult(asyncCount, 1.minute)
     }
 
     // Decommission one of the executors.
```
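
`ThreadUtils.awaitResult` is Spark's wrapper around blocking on a `Future`, so this hunk simply gives the job future a full minute to complete instead of 15 seconds. A sketch of the same call shape using only the Scala standard library (Spark's wrapper additionally rewraps exceptions, which is omitted here, and `asyncCount` below is a toy stand-in for the suite's real job):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for the suite's asyncCount: an async job that may be slow on CI.
val asyncCount: Future[Int] = Future { Thread.sleep(500); 42 }

// Blocks until the future completes or one minute elapses; the
// TimeoutException thrown on expiry is what surfaced as flakiness when the
// bound was a too-tight 15 seconds.
val result = Await.result(asyncCount, 1.minute)
assert(result == 42)
```
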
```diff
@@ -194,7 +194,7 @@ class BlockManagerDecommissionIntegrationSuite extends SparkFunSuite with LocalS
     val decomTime = new SystemClock().getTimeMillis()
 
     // Wait for job to finish.
-    val asyncCountResult = ThreadUtils.awaitResult(asyncCount, 15.seconds)
+    val asyncCountResult = ThreadUtils.awaitResult(asyncCount, 1.minute)
     assert(asyncCountResult === numParts)
     // All tasks finished, so accum should have been increased numParts times.
     assert(accum.value === numParts)
```
```diff
@@ -226,7 +226,7 @@ class BlockManagerDecommissionIntegrationSuite extends SparkFunSuite with LocalS
     }
 
     // Wait for our respective blocks to have migrated
-    eventually(timeout(30.seconds), interval(10.milliseconds)) {
+    eventually(timeout(1.minute), interval(10.milliseconds)) {
       if (persist) {
         // One of our blocks should have moved.
         val rddUpdates = blocksUpdated.filter { update =>
```
