@original-brownbear
Contributor

There is a JVM bug that causes `Thread#suspend` calls to randomly take
multiple seconds, which breaks these tests because they call the method
many times in a loop. Increasing the timeout will not work: we may call
`suspend` tens if not hundreds of times, and even a small number of slow
calls adds up to multiple minutes of waiting.

This PR detects the specific issue by timing the `Thread#suspend` calls and
skips the remainder of the test if a call was slowed down by the JVM bug.

Closes #50047
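
The timing check described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the class `SlowSuspendDetector` and its methods are hypothetical names, and the suspend call is passed in as a `Runnable` because `Thread#suspend` is deprecated and may not be usable on newer JDKs. The 3-second threshold mirrors the snippet under review; how the real test skips its remainder is not shown here.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: times a potentially slow call and remembers whether
// any invocation exceeded the threshold.
final class SlowSuspendDetector {
    // Threshold used in the reviewed snippet: 3 seconds, expressed in nanoseconds.
    static final long DEFAULT_THRESHOLD_NANOS = TimeUnit.SECONDS.toNanos(3L);

    private final AtomicBoolean sawSlowSuspendBug = new AtomicBoolean(false);
    private final long thresholdNanos;

    SlowSuspendDetector(long thresholdNanos) {
        this.thresholdNanos = thresholdNanos;
    }

    // Runs the call and returns true if it took longer than the threshold.
    // In the PR's setting the Runnable would wrap thread.suspend().
    boolean timed(Runnable suspendCall) {
        final long startTime = System.nanoTime();
        suspendCall.run();
        if (System.nanoTime() - startTime > thresholdNanos) {
            sawSlowSuspendBug.set(true);
            return true;
        }
        return false;
    }

    // True once any timed call has exceeded the threshold.
    boolean sawSlowSuspendBug() {
        return sawSlowSuspendBug.get();
    }

    // Runs the calls in order and stops at the first slow one, so a single
    // slow call does not turn into minutes of waiting. Returns the number
    // of calls that completed within the threshold.
    int runUntilSlow(Iterable<Runnable> suspendCalls) {
        int completed = 0;
        for (Runnable call : suspendCalls) {
            if (timed(call)) {
                break;
            }
            completed++;
        }
        return completed;
    }
}
```

A caller would check `sawSlowSuspendBug()` after the loop and abandon the rest of the test when it returns true, instead of raising the overall timeout.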

@original-brownbear added the :Distributed Indexing/Distributed, >test, v7.5.2, v7.6.0, v8.0.0 labels Jan 8, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

final long startTime = System.nanoTime();
thread.suspend();
if (System.nanoTime() - startTime > TimeUnit.SECONDS.toNanos(3L)) {
    sawSlowSuspendBug.set(true);
Contributor

Could we make this conditional on the JVM version so that we are sure to keep tracking that JVM bug and to drop this fix when it's no longer needed?

Contributor Author

Technically we could, but right now we're seeing this failure on all versions. This is a little unexpected on JDK 8, so I figured we'd apply the fix there as well to identify whether it's the same issue (slow suspend) or whether the test is blocked somewhere else.

Contributor

@ywelsch ywelsch left a comment

LGTM

@original-brownbear
Contributor Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit 375c140 into elastic:master Jan 14, 2020
@original-brownbear original-brownbear deleted the fix-50047 branch January 14, 2020 13:38
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 14, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 14, 2020
original-brownbear added a commit that referenced this pull request Jan 14, 2020
original-brownbear added a commit that referenced this pull request Jan 14, 2020
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Labels

:Distributed Indexing/Distributed, >test, v7.5.2, v7.6.0, v8.0.0-alpha1

Development

Successfully merging this pull request may close these issues.

LongGCDisruptionTests.testNotBlockingUnsafeStackTraces failed

5 participants