Description
Bug description
I have an application that uses remote partitioned batch jobs, which are sent to the workers via JMS.
A ThreadPoolTaskExecutor is also configured on the worker side, so the chunks can be processed in parallel.
I was testing the graceful shutdown behavior on the worker side.
One of the test cases covers what happens when the processing of a step on the remote side takes longer than the graceful period.
The expected outcome is that, once the graceful period expires, the partition step terminates and the step state is set to STOPPED in the database.
In my case, the application just hangs, and Spring is not able to fully close the Spring context. It is stuck in an endless loop in RepeatTemplate.executeInternal(). This calls TaskExecutorRepeatTemplate.getNextResult(), which calls runnable.expect(), which in turn calls queue.expect(). Since Spring has already tried to interrupt everything at this point, this call fails with an InterruptedException, which is then translated into a RepeatException that carries no cause.
Lines 204 to 217 in e6c2727
try {
    result = getNextResult(context, callback, state);
    executeAfterInterceptors(context, result);
}
catch (Throwable throwable) {
    doHandle(throwable, context, deferred);
}
// N.B. the order may be important here:
if (isComplete(context, result) || isMarkedComplete(context) || !deferred.isEmpty()) {
    running = false;
}
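The reason every pass through the loop fails the same way can be illustrated with a small, dependency-free sketch. It is not the Spring Batch code: the class `InterruptedExpectDemo`, its `expect()` method and the `Semaphore` standing in for the queue are all hypothetical, and it assumes the catch block restores the interrupt flag before throwing a new exception without a cause.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for the worker-side wait; not Spring Batch code.
class InterruptedExpectDemo {

    // Stands in for the queue's internal permit counter (assumption).
    private static final Semaphore PERMITS = new Semaphore(1);

    // The InterruptedException is caught, the interrupt flag is restored,
    // and a new exception WITHOUT a cause is thrown.
    static void expect() {
        try {
            PERMITS.acquire(); // throws at once if the thread is already interrupted
            PERMITS.release(); // keep the demo reusable when not interrupted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // flag restored, so the next call fails too
            throw new IllegalStateException("interrupted while waiting"); // no cause attached
        }
    }

    // Calls expect() and returns the exception it threw, or null on success.
    static RuntimeException tryExpect() {
        try {
            expect();
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate the shutdown interrupt
        for (int i = 0; i < 3; i++) {
            RuntimeException e = tryExpect();
            System.out.println("attempt " + i + ": " + e
                    + ", cause=" + (e == null ? "n/a" : e.getCause()));
        }
        Thread.interrupted(); // clear the flag before exiting
    }
}
```

Because the flag is set again on every failure, each later attempt fails immediately as well, which matches the busy loop seen in the thread dump.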
Several things can go wrong in the RepeatTemplate excerpt above:
- doHandle calls DefaultExceptionHandler, which rethrows whatever it is given. Since the unwrapped throwable is null, throw throwable raises an NPE.
Lines 37 to 39 in e6c2727
public void handleException(RepeatContext context, Throwable throwable) throws Throwable {
    throw throwable;
}
This can be avoided by configuring a custom ExceptionHandler, so that no NPE is thrown.
- If DEBUG logging is enabled, an NPE can also be thrown here, since the unwrapped throwable is null:
Lines 288 to 290 in e6c2727
if (logger.isDebugEnabled()) {
    logger.debug("Handling exception: " + throwable.getClass().getName() + ", caused by: "
            + unwrappedThrowable.getClass().getName() + ": " + unwrappedThrowable.getMessage());
This can also be worked around by turning off DEBUG logging.
- And finally here:
Lines 215 to 217 in e6c2727
if (isComplete(context, result) || isMarkedComplete(context) || !deferred.isEmpty()) {
    running = false;
}
I would expect running to be set to false, but that does not happen because the RepeatContext is still not complete.
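The first two failure modes come down to plain Java behavior: rethrowing or dereferencing the cause of an exception that has none. The sketch below shows this in isolation and adds one possible null-safe unwrap. The class and method names are hypothetical and this is not the Spring Batch implementation.

```java
// Hypothetical stand-ins; these are not the Spring Batch classes.
class NullCauseDemo {

    // Plays the role of a handler that simply rethrows what it is given.
    static void rethrow(Throwable t) throws Throwable {
        throw t; // if t is null, Java raises a NullPointerException here
    }

    // Unwraps the cause of a wrapper exception, hands it to the handler,
    // and returns whatever actually ends up being thrown.
    static Throwable unwrapAndRethrow(RuntimeException wrapper) {
        try {
            rethrow(wrapper.getCause());
            return null;
        } catch (Throwable actual) {
            return actual;
        }
    }

    // Null-safe variant: fall back to the wrapper when it carries no cause.
    static Throwable unwrapSafely(RuntimeException wrapper) {
        Throwable cause = wrapper.getCause();
        return cause != null ? cause : wrapper;
    }

    public static void main(String[] args) {
        RuntimeException causeless = new RuntimeException("interrupted while waiting");
        System.out.println(unwrapAndRethrow(causeless).getClass().getName());
        System.out.println(unwrapSafely(causeless).getMessage());
    }
}
```

Falling back to the wrapper is only one option. Attaching the original InterruptedException as the cause when the wrapper is created would avoid the null in the first place.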
- Using reflection, I was able to add a RepeatListener to the RepeatTemplate which calls context.setTerminateOnly() when the application is shutting down. This breaks the endless loop here, but afterwards AbstractStep again tries to rethrow null after extracting the cause from this RepeatException:
spring-batch/spring-batch-core/src/main/java/org/springframework/batch/core/step/AbstractStep.java
Line 232 in e6c2727
throw e.getCause();
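The effect of the listener workaround can be modeled without Spring Batch. The sketch below is a deliberately simplified loop, not the real RepeatTemplate: `Context`, `Listener` and `run` are hypothetical names, the failing step is simulated, and a `maxIterations` bound is added so the demo always terminates.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the loop described above; all names are hypothetical.
class TerminateFlagDemo {

    static class Context {
        volatile boolean terminateOnly; // set by a listener during shutdown
        boolean complete;               // never set, because every pass fails
    }

    interface Listener {
        void before(Context context);
    }

    // Runs at most maxIterations passes and returns how many were executed.
    static int run(Context context, Listener listener, int maxIterations) {
        int passes = 0;
        boolean running = true;
        while (running && passes < maxIterations) {
            passes++;
            listener.before(context);
            if (context.terminateOnly) {
                break; // the listener asked the loop to stop
            }
            try {
                throw new IllegalStateException("interrupted while waiting");
            } catch (RuntimeException swallowed) {
                // a lenient handler swallows the error and nothing is deferred
            }
            if (context.complete) {
                running = false; // never reached in this scenario
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        AtomicBoolean shuttingDown = new AtomicBoolean(true);
        Listener stopOnShutdown = ctx -> {
            if (shuttingDown.get()) {
                ctx.terminateOnly = true;
            }
        };
        System.out.println("without listener: " + run(new Context(), ctx -> { }, 1000));
        System.out.println("with listener: " + run(new Context(), stopOnShutdown, 1000));
    }
}
```

Without the flag the loop only stops at the artificial bound. With it the loop exits on the first pass, which is the behavior the reflective workaround relies on.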
Environment
- openjdk version "17.0.7" 2023-04-18 LTS
- Spring Batch 5.0.2
- Spring Boot 3.1.1
- PostgreSQL 15.3
Steps to reproduce
See above
Expected behavior
- After the graceful period, Spring should be able to forcefully close the ApplicationContext.
- No NPE or other exception should be thrown.
- The related step state should be saved as STOPPED in the database.
Minimal Complete Reproducible example
TBD, I will try to create a minimalistic example for this.
springbatchissue.zip
Steps to reproduce:
- Unzip the archive.
- Execute ./gradlew jibDockerBuild to create a Docker image.
- Start the stack using docker-compose up.
- Check the logs of the worker. Immediately after the worker receives the first message, execute kill -15 1 to kill it.
You will see that the app does not terminate after the graceful period ends. Execute kill -3 1 and the resulting thread dump will show that it is hanging in an endless loop.