Commit 676a1b7

Rollup jobs should be cleaned up before indices are deleted (#38930) (#39152)
Rollup jobs should be stopped and deleted before the indices are removed. An active rollup job can issue a bulk request just as the test ends and the cleanup code deletes all indices. The in-flight bulk request then stalls and errors because the index no longer exists, but this can take longer than the StopRollup timeout. The test then fails, and it often takes several other tests down with it because the job is still active (e.g. other tests cannot create a job with the same name, or fail to stop the job in their cleanup because it is still stalled). This tends to knock over several tests before the bulk finally times out and the job shuts down.

Instead, we simply need to stop jobs first. In-flight bulks will resolve quickly, and we can carry on with deleting indices after the jobs are confirmed inactive.

stop-job.asciidoc tended to trigger this issue because it executed an async stop API call and then exited, which set up the situation above. It can and did happen with other tests, though. As an extra precaution, the doc test was modified to substitute in wait_for_completion to help head off these issues too.
1 parent 058e937 commit 676a1b7

2 files changed: +14 -10 lines changed

docs/reference/rollup/apis/stop-job.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ POST _xpack/rollup/job/sensor/_stop
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:sensor_started_rollup_job]
+// TEST[s/_stop/_stop?wait_for_completion=true&timeout=10s/]
 
 Which will return the response:
 

test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java

Lines changed: 13 additions & 10 deletions
@@ -436,6 +436,19 @@ protected boolean preserveILMPoliciesUponCompletion() {
     }
 
     private void wipeCluster() throws Exception {
+
+        // Cleanup rollup before deleting indices. A rollup job might have bulks in-flight,
+        // so we need to fully shut them down first otherwise a job might stall waiting
+        // for a bulk to finish against a non-existing index (and then fail tests)
+        //
+        // Rollups were introduced in 6.3.0 so any cluster that contains older
+        // nodes won't be able to do *anything* with rollups, including cleanup.
+        if (hasXPack && nodeVersions.first().onOrAfter(Version.V_6_3_0)
+                && false == preserveRollupJobsUponCompletion()) {
+            wipeRollupJobs();
+            waitForPendingRollupTasks();
+        }
+
         if (preserveIndicesUponCompletion() == false) {
             // wipe indices
             try {
@@ -483,16 +496,6 @@ private void wipeCluster() throws Exception {
             wipeClusterSettings();
         }
 
-        /*
-         * Rollups were introduced in 6.3.0 so any cluster that contains older
-         * nodes won't be able to do *anything* with rollups, including cleanup.
-         */
-        if (hasXPack && nodeVersions.first().onOrAfter(Version.V_6_3_0)
-                && false == preserveRollupJobsUponCompletion()) {
-            wipeRollupJobs();
-            waitForPendingRollupTasks();
-        }
-
         if (hasXPack && false == preserveILMPoliciesUponCompletion()) {
             deleteAllPolicies();
         }
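
The ordering this commit enforces (stop rollup jobs, wait for their tasks, then delete indices) can be illustrated with a small standalone sketch. Everything here is hypothetical: `CleanupOrderSketch`, `FakeCluster`, and its methods are stand-ins invented for illustration and are not part of ESRestTestCase or any Elasticsearch API. The sketch only models the invariant that indices must not be deleted while a job is still active.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupOrderSketch {

    /** Hypothetical stand-in for a test cluster; records the order of cleanup steps. */
    static final class FakeCluster {
        final List<String> log = new ArrayList<>();
        boolean jobActive = true;
        boolean indexExists = true;

        /** Models stopping and deleting all rollup jobs. */
        void stopAndDeleteJobs() {
            jobActive = false;
            log.add("stop-jobs");
        }

        /** Models waiting until no rollup tasks remain; fails if a job is still active. */
        void waitForPendingTasks() {
            if (jobActive) {
                throw new IllegalStateException("job still active");
            }
            log.add("wait-tasks");
        }

        /** Models deleting indices; refuses to run under an active job. */
        void deleteIndices() {
            if (jobActive) {
                throw new IllegalStateException("an in-flight bulk could stall against a deleted index");
            }
            indexExists = false;
            log.add("delete-indices");
        }
    }

    /** Cleanup in the order the commit moves to: jobs first, indices afterwards. */
    static List<String> wipe(FakeCluster cluster) {
        cluster.stopAndDeleteJobs();
        cluster.waitForPendingTasks();
        cluster.deleteIndices();
        return cluster.log;
    }

    public static void main(String[] args) {
        System.out.println(wipe(new FakeCluster()));
    }
}
```

In this toy model, calling `deleteIndices()` before `stopAndDeleteJobs()` throws, which mirrors the failure mode described in the commit message: the old ordering removed indices while a job could still have a bulk request in flight.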
