Skip to content

Conversation

@ywangd
Copy link
Member

@ywangd ywangd commented Jul 7, 2020

After discussion with @tvernum, the new SocketTimeoutException is likely due to the authenticate request being put onto the queue instead of the expected rejection. Since the thread pool is intentionally blocked, the authenticate request fails to come back in time.

Here are what likely happend in time order:

  1. The previous test is completed, but somehow its associated authentication task has not yet been cleared out of the threadpool. This is evident as shown in the previous EsRejectedExecutionException failure (Make test more robust for API key auth 429 #59077).
  2. The current test begins, blocking tasks are submitted to the thread pool.
  3. More no-op tasks are submitted in order to fill up the queue. However not all of them can be submitted because of the lingering task from step 1. So instead of 10 tasks, only 9 of them are successfully submitted. We catch the rejection execption and move on.
  4. JVM now decides to clear out the lingering task from step 1 from the thread pool. This means the queue now has one empty spot.
  5. The authentication request is sent by the client. Since there is one empty spot in the queue, the request is queued instead of being rejected and finally fails because of timeout.

So the fix is to ensure that authentication request in step 5 does not get put onto the queue. This can be ahieved by ensuring all blocking tasks from step 2 are alreadying running before move onto step 3. Once active threads are indeed blocked, we can then be sure that no tasks would be popped out of the queue unexpectedly, which in turn guarantees the rejection of the subsequent authentication request.

Resolves: #59149

@ywangd ywangd added >test Issues or PRs that are addressing/adding tests :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.9.0 labels Jul 7, 2020
@ywangd
Copy link
Member Author

ywangd commented Jul 8, 2020

@elasticmachine run elasticsearch-ci/2

2 similar comments
@ywangd
Copy link
Member Author

ywangd commented Jul 8, 2020

@elasticmachine run elasticsearch-ci/2

@ywangd
Copy link
Member Author

ywangd commented Jul 8, 2020

@elasticmachine run elasticsearch-ci/2

@ywangd ywangd marked this pull request as ready for review July 8, 2020 05:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (:Security/Authentication)

@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jul 8, 2020
@ywangd ywangd requested a review from tvernum July 8, 2020 05:34
@ywangd
Copy link
Member Author

ywangd commented Jul 8, 2020

@tvernum The tests have been successfully for 5 consecutive runs. It should be good for review. Thanks!

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywangd ywangd merged commit 9226edb into elastic:master Jul 8, 2020
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 8, 2020
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.
ywangd added a commit that referenced this pull request Jul 8, 2020
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Security Meta label for security team >test Issues or PRs that are addressing/adding tests v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] ApiKeyIntegTests testAuthenticationReturns429WhenThreadPoolIsSaturated timout issues

4 participants