Description
In what version(s) of Spring for Apache Kafka are you seeing this issue?
3.3.0 and 3.3.1
Describe the bug
We have an application using Spring Boot v3.3.6 and Spring Kafka v3.3.0. We have seen that the spring.kafka.listener.active metric leads to an ever-increasing number of activeTask instances of Micrometer's DefaultLongTaskTimer, which are never garbage collected.
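To illustrate the accumulation mechanism, here is a minimal sketch using the plain Micrometer API (not the Spring Kafka code itself; the meter name is only indicative). A long task timer keeps every started sample in its active task set until stop() is called on that sample:

```java
import io.micrometer.core.instrument.LongTaskTimer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class ActiveTaskAccumulationSketch {

    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        LongTaskTimer timer = LongTaskTimer.builder("spring.kafka.listener")
                .register(registry);

        // Every start() registers a sample in the timer's active task set.
        // If stop() is never called on the returned sample, it is retained
        // for the lifetime of the timer and is never garbage collected.
        for (int i = 0; i < 1_000; i++) {
            LongTaskTimer.Sample sample = timer.start();
            // sample.stop() is intentionally never called here
        }

        System.out.println("active tasks: " + timer.activeTasks()); // prints 1000
    }
}
```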
The screenshot below shows the memory and CPU usage of the process, which follows a classic memory-leak trend:
The issue does not appear in another of our applications that uses Spring Kafka v3.2.2.
The symptoms are similar to those in spring-projects/spring-security#14030, where the stop method is not called on the observation.
By debugging the doInvokeRecordListener method in KafkaMessageListenerContainer, we can see that the finally block containing the observation.stop call is not executed because the listener is an instance of RecordMessagingMessageListenerAdapter.
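For reference, the Micrometer Observation lifecycle expects stop() to run once processing ends, typically in a finally block. The simplified sketch below illustrates that pattern; it is not the actual doInvokeRecordListener source:

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

public class ObservationLifecycleSketch {

    void invokeListener(Runnable listenerCall, ObservationRegistry registry) {
        Observation observation =
                Observation.createNotStarted("spring.kafka.listener", registry).start();
        try {
            listenerCall.run();
        } catch (RuntimeException ex) {
            observation.error(ex);
            throw ex;
        } finally {
            // In the reported bug, the code path taken for
            // RecordMessagingMessageListenerAdapter never reaches the equivalent
            // of this stop() call, so the observation is never stopped and its
            // backing long task timer sample leaks.
            observation.stop();
        }
    }
}
```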
When we disable the spring.cloud.stream.kafka.binder.enableObservation property, system resource consumption seems to return to normal.
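For anyone else affected, this is the workaround we applied, shown here as an application.properties sketch (adapt it to your configuration style):

```properties
# Workaround: disable Kafka binder observation until the leak is fixed
spring.cloud.stream.kafka.binder.enableObservation=false
```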
Further compounding the issue, Prometheus regularly scrapes this metric, which uses even more CPU and leads to timeouts or broken pipes on the scraping endpoint (/actuator/prometheus).
See the stack trace associated with the scraping workload: threaddump-1734718231951.zip
To Reproduce
We have been able to create a minimal sample project to reproduce the issue.
This is a simple Kafka producer / consumer using the latest Spring Boot v3.4.1 and Spring Kafka v3.3.1 versions.
We changed the producer's send interval to 1 millisecond (around 1000 messages per second) to speed up the phenomenon.
After about one hour of running the test, we see millions of active task instances, and the count keeps growing:
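For context, here is a rough sketch of the kind of producer/consumer pair used in the sample (class name, topic, and group id are illustrative; the actual project is linked below and also enables observation on the listener container):

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@EnableScheduling
public class LeakReproductionSketch {

    private final KafkaTemplate<String, String> template;

    public LeakReproductionSketch(KafkaTemplate<String, String> template) {
        this.template = template;
    }

    // Send one message every millisecond (~1000 msg/s) to speed up the leak.
    @Scheduled(fixedRate = 1)
    public void produce() {
        template.send("leak-demo-topic", "ping");
    }

    // With listener observation enabled, each consumed record starts an
    // observation; under the reported bug its stop() is never reached,
    // so the active task count keeps growing.
    @KafkaListener(topics = "leak-demo-topic", groupId = "leak-demo-group")
    public void consume(String message) {
        // no-op: the leak is in the instrumentation, not in the business logic
    }
}
```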
Sample