Description
We're experiencing a complete application freeze in our Kubernetes environments (both dev and stg) caused by a deadlock in Sentry's OpenTelemetry integration when using Java 24 virtual threads. All HTTP request processing threads become blocked waiting for the same lock in SentryContextWrapper, making the application unresponsive.
Environment
- Sentry SDK Version: 8.25.0
- Java Version: Java 24 (OpenJDK 24)
- Spring Boot Version: 3.5.6
- Application Server: Tomcat with VirtualThreadExecutor (virtual threads enabled; supported since Spring Boot 3.2)
- Deployment: Kubernetes (both dev and stg environments affected)
- Sentry Integration:
  - Sentry Java Agent (sentry-opentelemetry-agent via -javaagent)
  - Sentry Spring Boot Starter (sentry-spring-boot-jakarta)
Configuration
# application.yml
spring:
  datasource:
    hikari:
      register-mbeans: true
      allow-pool-suspension: true
      leak-detection-threshold: 20000

# Sentry configuration
sentry:
  dsn: "https://[email protected]/4504878114340864"
  sample-rate: 1.0
  traces-sample-rate: 0.2  # Performance tracing at 20% sampling
  environment: ${ENVIRONMENT_NAME}
  send-default-pii: true
  max-request-body-size: always
  logging:
    minimum-event-level: warn
    minimum-breadcrumb-level: info

# Micrometer tracing configuration
management:
  tracing:
    sampling:
      probability: 0.2  # 20% trace sampling

Environment variables:
SENTRY_AUTO_INIT=false
OTEL_LOGS_EXPORTER=none
OTEL_METRICS_EXPORTER=none
OTEL_TRACES_EXPORTER=none

Problem Description
Symptoms
- Application becomes completely unresponsive in K8s environments (both dev and stg)
- No HTTP requests can be processed
- Hikari connection pool appears exhausted (but is actually blocked from starting operations)
- Issue does NOT occur in local development (lower concurrency, Sentry disabled)
Root Cause
Multiple threads (Tomcat NIO poller, virtual thread workers, and master poller) are deadlocked waiting for the same ReentrantLock object (<0x00000007fe0d06d0>) in SentryContextWrapper.forkCurrentScopeInternal().
The deadlock occurs when:
- Tomcat tries to create a new virtual thread for an HTTP request
- Sentry's OpenTelemetry integration tries to fork the current scope
- Lock contention occurs in SynchronizedQueue.toArray() (a minimal sketch of this pattern is shown below)
- All virtual threads become pinned/blocked waiting for this lock
- No new HTTP requests can be processed
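To make the pattern concrete, here is a minimal sketch using only the JDK. All names are hypothetical and this is not Sentry's actual code; it only mirrors the shape visible in the thread dumps, where every virtual-thread submission first runs a scope fork guarded by one shared ReentrantLock.

import java.util.concurrent.locks.ReentrantLock;

// Sketch only: GLOBAL_SCOPE_LOCK stands in for the single lock guarding
// SynchronizedQueue.toArray(); startWithForkedScope() plays the role of the
// context fork that runs before a virtual thread can even be started.
public class SharedLockOnThreadStartSketch {
    private static final ReentrantLock GLOBAL_SCOPE_LOCK = new ReentrantLock();

    static void startWithForkedScope(Runnable task) {
        GLOBAL_SCOPE_LOCK.lock();   // the submitter (e.g. the NIO Poller) blocks here
        try {
            // Scope.clone() -> SynchronizedQueue.toArray() would run here
        } finally {
            GLOBAL_SCOPE_LOCK.unlock();
        }
        Thread.startVirtualThread(task);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10_000; i++) {
            startWithForkedScope(() -> { /* handle request */ });
        }
        Thread.sleep(1_000);  // give the virtual threads time to run
    }
}

Every request handler, and every thread that submits one, takes its turn on the same lock; under moderate concurrency this serializes thread creation, and in our case it stops entirely.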
Thread Dump Evidence
Blocked Thread #1: Tomcat NIO Poller (line 1048)
"http-nio-8088-Poller" #153 [147] daemon prio=5 os_prio=0 cpu=2798.94ms elapsed=10104.08s tid=0x00007f7ee2cc5430 nid=147 waiting on condition [0x00007f7fc4ffe000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@24/Native Method)
- parking to wait for <0x00000007fe0d06d0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java.base@24/LockSupport.java:223)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@24/AbstractQueuedSynchronizer.java:789)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@24/AbstractQueuedSynchronizer.java:1029)
at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@24/ReentrantLock.java:154)
at java.util.concurrent.locks.ReentrantLock.lock(java.base@24/ReentrantLock.java:323)
at io.sentry.util.AutoClosableReentrantLock.acquire(AutoClosableReentrantLock.java:12)
at io.sentry.SynchronizedQueue.toArray(SynchronizedQueue.java:148)
at io.sentry.Scope.<init>(Scope.java:138)
at io.sentry.Scope.clone(Scope.java:1099)
at io.sentry.Scopes.forkedScopes(Scopes.java:110)
at io.sentry.Sentry.forkedRootScopes(Sentry.java:129)
at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScopeInternal(SentryContextWrapper.java:75)
at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScope(SentryContextWrapper.java:46)
at io.sentry.opentelemetry.SentryContextWrapper.wrap(SentryContextWrapper.java:94)
at io.sentry.opentelemetry.SentryContextStorage.root(SentryContextStorage.java:44)
at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.root(Context.java:105)
at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.current(Context.java:93)
at io.opentelemetry.javaagent.bootstrap.Java8BytecodeBridge.currentContext(Java8BytecodeBridge.java:23)
at java.util.concurrent.ForkJoinPool.execute(java.base@24/ForkJoinPool.java:3100)
at java.lang.VirtualThread.submitRunContinuation(java.base@24/VirtualThread.java:350)
at java.lang.VirtualThread.externalSubmitRunContinuationOrThrow(java.base@24/VirtualThread.java:435)
at java.lang.VirtualThread.start(java.base@24/VirtualThread.java:710)
at java.lang.VirtualThread.start(java.base@24/VirtualThread.java:721)
at java.lang.ThreadBuilders$VirtualThreadBuilder.start(java.base@24/ThreadBuilders.java:262)
at org.apache.tomcat.util.threads.VirtualThreadExecutor.execute(VirtualThreadExecutor.java:52)
at org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:1360)
at org.apache.tomcat.util.net.NioEndpoint$Poller.processKey(NioEndpoint.java:842)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:809)
Blocked Thread #2: ForkJoinPool Worker (line 1174)
"ForkJoinPool-1-worker-1" #162 [155] daemon prio=5 os_prio=0 cpu=130508.48ms elapsed=10103.54s tid=0x00007f7ee2d45ad0 nid=155 waiting on condition [0x00007f7fc44fe000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@24/Native Method)
- parking to wait for <0x00000007fe0d06d0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java.base@24/LockSupport.java:223)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@24/AbstractQueuedSynchronizer.java:789)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@24/AbstractQueuedSynchronizer.java:1029)
at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@24/ReentrantLock.java:154)
at java.util.concurrent.locks.ReentrantLock.lock(java.base@24/ReentrantLock.java:323)
at io.sentry.util.AutoClosableReentrantLock.acquire(AutoClosableReentrantLock.java:12)
at io.sentry.SynchronizedQueue.toArray(SynchronizedQueue.java:148)
at io.sentry.Scope.<init>(Scope.java:138)
at io.sentry.Scope.clone(Scope.java:1099)
at io.sentry.Scopes.forkedScopes(Scopes.java:110)
at io.sentry.Sentry.forkedRootScopes(Sentry.java:129)
at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScopeInternal(SentryContextWrapper.java:75)
at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScope(SentryContextWrapper.java:46)
at io.sentry.opentelemetry.SentryContextWrapper.wrap(SentryContextWrapper.java:94)
at io.sentry.opentelemetry.SentryContextStorage.root(SentryContextStorage.java:44)
at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.root(Context.java:105)
at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.shouldPropagateContext(ExecutorAdviceHelper.java:53)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(java.base@24/ScheduledThreadPoolExecutor.java:543)
at java.lang.VirtualThread.schedule(java.base@24/VirtualThread.java:1450)
at java.lang.VirtualThread.afterYield(java.base@24/VirtualThread.java:571)
at java.lang.VirtualThread.runContinuation(java.base@24/VirtualThread.java:309)
Blocked Thread #3: Master Poller (line 1226)
"MasterPoller" #164 [157] daemon prio=5 os_prio=0 cpu=25987.63ms elapsed=10103.52s tid=0x0000564951d91730 nid=157 waiting on condition [0x00007f7fc42fe000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@24/Native Method)
- parking to wait for <0x00000007fe0d06d0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java.base@24/LockSupport.java:223)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@24/AbstractQueuedSynchronizer.java:789)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@24/AbstractQueuedSynchronizer.java:1029)
at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@24/ReentrantLock.java:154)
at java.util.concurrent.locks.ReentrantLock.lock(java.base@24/ReentrantLock.java:323)
at io.sentry.util.AutoClosableReentrantLock.acquire(AutoClosableReentrantLock.java:12)
at io.sentry.SynchronizedQueue.toArray(SynchronizedQueue.java:148)
at io.sentry.Scope.<init>(Scope.java:138)
at io.sentry.Scope.clone(Scope.java:1099)
at io.sentry.Scopes.forkedScopes(Scopes.java:110)
at io.sentry.Sentry.forkedRootScopes(Sentry.java:129)
at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScopeInternal(SentryContextWrapper.java:75)
Key Observations
- Only 1 Hikari thread present: HikariPool-1:housekeeper (maintenance thread)
- No threads waiting for database connections
- All blocking occurs at the same lock: <0x00000007fe0d06d0>
- Bottleneck: SynchronizedQueue.toArray() at line 148
Analysis
Why This Happens
- High virtual thread concurrency: the application runs Java 24 with Tomcat's VirtualThreadExecutor (virtual threads enabled)
- Performance tracing enabled: with traces-sample-rate: 0.2, 20% of HTTP requests create traces
- Lock contention in scope forking: every virtual thread creation triggers SentryContextWrapper.forkCurrentScopeInternal() for trace context propagation (even for non-traced requests)
- Scope cloning bottleneck: Scope.clone() calls SynchronizedQueue.toArray(), which acquires a lock
- Cascade effect: under high concurrency, many virtual threads compete for the same lock; resubmitting a yielded virtual thread's continuation goes through the same path (see Blocked Thread #2), so a lock holder that has yielded may never get rescheduled (a simplified analogue of this cycle is sketched below)
- Complete freeze: even with 20% sampling, the lock contention is severe enough to deadlock the system
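Our reading of the dumps is that this is a true cycle, not just slow contention: the current lock holder is most likely an unmounted virtual thread that can only release the lock after it is rescheduled, but rescheduling itself has to fork the scope and therefore needs the same lock. Below is a deliberately simplified, hypothetical analogue of that cycle using only platform threads and a latch; it is not Sentry's code, just an illustration of why everything stops rather than merely slowing down.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Simplified analogue of the suspected cycle (hypothetical names): the "worker"
// holds the context lock and waits to be rescheduled, while the "scheduler"
// must take the same lock before it can reschedule anything.
public class SchedulerLockCycleSketch {
    static final ReentrantLock contextLock = new ReentrantLock();
    static final CountDownLatch rescheduled = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        Thread worker = Thread.ofPlatform().start(() -> {
            contextLock.lock();        // analogous to forking the scope
            try {
                rescheduled.await();   // waits for the "scheduler" -- forever
            } catch (InterruptedException ignored) {
            } finally {
                contextLock.unlock();
            }
        });

        Thread.sleep(100);             // let the worker acquire the lock first

        Thread scheduler = Thread.ofPlatform().start(() -> {
            contextLock.lock();        // blocks: the worker still holds the lock
            try {
                rescheduled.countDown();  // never reached
            } finally {
                contextLock.unlock();
            }
        });

        worker.join();                 // never returns: the cycle is closed
        scheduler.join();
    }
}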
Why It Appears as "Connection Pool Exhaustion"
- Virtual threads never start → No database operations begin
- Hikari pool shows as "exhausted" because no connections are being used OR released
- This is a secondary symptom, not the root cause
Reproduction Steps
- Deploy a Spring Boot 3.5+ application with Sentry 8.25.0 to Kubernetes
- Enable both the Sentry Java Agent and the Spring Boot Starter
- Configure:
  - sentry.traces-sample-rate=0.2 (20% performance tracing)
  - management.tracing.sampling.probability=0.2 (Micrometer at 20%)
- Run on Java 24 with virtual threads enabled
- Send moderate concurrent HTTP traffic (10-20 concurrent requests; a rough load-generator sketch follows the note below)
- Observe the application freeze after 10-30 minutes
Note: Both our dev and stg environments with identical configuration (20% sampling) experience this deadlock.
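For the traffic step, any sustained concurrent load seems to be enough. The sketch below is a rough, hypothetical load generator (the endpoint path is a placeholder; 8088 is the port from the thread dumps) that keeps 20 workers sending requests until it is stopped.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough load generator: 20 concurrent workers, each sending requests in a loop.
// Stop it manually (Ctrl+C) once the application freeze is observed.
public class LoadGeneratorSketch {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8088/actuator/health")) // placeholder endpoint
                .build();

        try (ExecutorService workers = Executors.newFixedThreadPool(20)) {
            for (int i = 0; i < 20; i++) {
                workers.submit(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        try {
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                        } catch (Exception e) {
                            return; // target unreachable or request interrupted
                        }
                    }
                });
            }
        } // close() waits for the workers, which run until interrupted
    }
}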
Expected Behavior
Sentry should handle virtual thread scope propagation without lock contention, allowing high concurrency without deadlocks.
Actual Behavior
Application freezes completely. All virtual threads block waiting for the same lock in SentryContextWrapper, preventing any HTTP request processing.
Workarounds Tested
Important: The deadlock occurs even with 20% trace sampling (traces-sample-rate=0.2), indicating the issue is not simply about sampling rate but about fundamental lock contention in virtual thread scope propagation.
Under Investigation:
- ❓ Further reduce trace sampling
  - Test with sentry.traces-sample-rate=0.05 (5%)
  - Current 20% sampling still causes a deadlock
  - Testing if very low sampling avoids the issue
- ❓ Remove Sentry Java Agent
  - Keep only the Spring Boot starter (sentry-spring-boot-jakarta)
  - Testing if the agent's bytecode instrumentation causes the contention
  - May lose some automatic instrumentation features
- ❓ Remove sentry-reactor dependency
  - Already removed, testing in progress
  - Unlikely to help (we use Spring MVC, not WebFlux)
Related Issues
- #3312: Virtual thread pinning on MainEventProcessor.ensureHostnameCache (marked as fixed in 8.0.0)
  - Difference: that issue was about synchronized methods in MainEventProcessor
  - This issue: lock contention in SentryContextWrapper during scope forking
Questions
- Is SentryContextWrapper.forkCurrentScopeInternal() optimized for high-concurrency virtual thread scenarios?
- Could scope cloning be made lock-free or use more granular locking? (A rough sketch of what we mean follows the questions.)
- Should the dual setup (Java Agent + Spring Boot Starter) be avoided with virtual threads?
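To illustrate the second question: if the breadcrumb queue were backed by a concurrent collection, cloning a scope could take a weakly consistent snapshot without any global lock. The sketch below is purely hypothetical and is not Sentry's internals; it only shows the kind of data structure change the question is about.

import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical illustration of lock-free scope snapshotting: a concurrent deque
// supports toArray() without blocking writers, at the cost of a weakly
// consistent (but still thread-safe) view.
public class LockFreeBreadcrumbsSketch {
    private final ConcurrentLinkedDeque<String> breadcrumbs = new ConcurrentLinkedDeque<>();

    void add(String breadcrumb) {
        breadcrumbs.add(breadcrumb);   // non-blocking append
    }

    String[] snapshotForClone() {
        // Never blocks and never blocks writers, so forking a scope
        // would not contend on a single global lock.
        return breadcrumbs.toArray(new String[0]);
    }
}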
Additional Context
We're using:
- PostgreSQL with HikariCP (default 10 connections)
- Elasticsearch for search
- MongoDB for document storage
- Spring MVC (not WebFlux)
- Kubernetes deployment with resource limits