[Test] More resilience for token invalidation test #75319

ywangd · 2021-07-14T03:42:03Z

This PR adds more resilience to testExpiredTokensDeletedAfterExpiration
by wrapping the empty error assertion into an assertBusy block. In rare
cases, it is possible for search to return a deleted but not yet
refreshed document. Trying to invalidate this document then result into
MissDocumentException. This state is transient and should go away once
all shards refresh.

Resolves: #67347

This PR add more resilience to testExpiredTokensDeletedAfterExpiration by wrapping the empty error assertion into an assertBusy block. In rare cases, it is possible for search to return a deleted but not yet refreshed document. Trying to invalidate this document then result into MissDocumentException. This state is transient and should go away once all shards refresh. Resolves: elastic#67347

elasticmachine · 2021-07-14T03:42:06Z

Pinging @elastic/es-security (Team:Security)

ywangd · 2021-07-14T03:47:08Z

...src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/TokenAuthIntegTests.java

-                assertThat(newResponse.getPreviouslyInvalidatedTokens(), equalTo(0));
+                assertThat(newResponse.getErrors(), empty());


I was the reviewer for the previous change (#64757). But somehow I didn't realise that we should wrap getErrors() here instead of getPreviouslyInvalidatedTokens() because:

The original failure (TokenAuthIntegTests testExpiredTokensDeletedAfterExpiration failure #56903) was about getErrors().

getPreviouslyInvalidatedTokens() should always return 0 since the refersh token is never invalidated before. If the cluster behaves really weirdly, we could probably expect getInvalidatedTokens() to be 1, but not getPreviouslyInvalidatedTokens().

...src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/TokenAuthIntegTests.java

tvernum · 2021-07-14T06:42:29Z

...src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/TokenAuthIntegTests.java

        InvalidateTokenResponse invalidateRefreshTokenResponse = restClient.security().invalidateToken(
            invalidateRefreshTokenRequest, SECURITY_REQUEST_OPTIONS);
        assertThat(invalidateRefreshTokenResponse.getInvalidatedTokens(), equalTo(0));
-        assertThat(invalidateRefreshTokenResponse.getErrors(), empty());


One of the problems here is that we have this very complex test, but nothing testing the more normal cases.
As far as I can tell there is nothing to test what happens if you try to invalidate an access token and then invalidate its refresh token. What's that behaviour supposed to be?

And this test is trying to test a behaviour that we mostly don't care about - specifically what happens if you try to invalidate a refresh token after it's been deleted from the index? The answer is it should either behave like a token that never existed, or it should behave like a token that was recently invalidated. We don't really mind which.

I don't think we should ever return errors though - it's not the caller's problem that the cleanup job just ran.

but nothing testing the more normal cases

We can (and should) add more tests. But I'd prefer a separate PR. This one is just to tame the number of CI failures.

And this test is trying to test a behaviour that we mostly don't care about - specifically what happens if you try to invalidate a refresh token after it's been deleted from the index?

If we go by the test name literally, to me, testExpiredTokensDeletedAfterExpiration means it is sufficient to ensure that the expired token document is deleted once the cleanup job runs. It has nothing to do with subsuquent invalidations after the deletion. And like you said, any subsequent invalidation should just behave like the document never exists and we do have a separate test (testInvalidateNotValidRefreshTokens) for that. If this is the case, I'd say the test is sufficient after the second assertBusy block which asserts the document no longer exists after triggering the cleanup job.

So a simpler fix is to just drop all the code after it. We can then have separate PRs to add tests for more common use cases, e.g. invalidate refresh token after access token is invalidated etc.

Part of my concern here is that this test says it's OK to return errors for a period of time after the deletion occurs, as long as they eventually go away.

Why is that? That seems like it's just accepting that there is a bug that we don't want to fix.
And due to the weirdness in these tests overall, it's hard to tell whether that's a bug that manifests more generally, or whether it's just an edge case that we don't care about and should stop testing for.

Because it sounds like we don't care, and we're just trying to get the test to run.
Why do these assertions model the behaviour that we actually want from the code?

I added a new test for "deleting access and refresh tokens one after the other with randomness" to cover the more realistic use case as you suggested.

As for this test, the elaborated code indeed suggests for a bug. There are also other bugs in TokenService and they are all more or less caused by the same dirty read problem. All of them are quite rare and this one is even less a practical issue. So we didn't get time around to tackle them, plus they all have some complexities. Even for this one, code wise it is likely easy to skip any DocumentMissingException and just returns normally. But I am not sure whether this would be covering up other subtle and likely very rare problems. In my opinion, it is at least safer to report a false error instead of silently skip the error (unless we are perfectly sure). All these sounds like a bit too much for a PR that is meant to reduce our CI backlog. I can raise an issue to track the bug, but I'd prefer to keep it separate from this PR.

…csearch/xpack/security/authc/TokenAuthIntegTests.java Co-authored-by: Tim Vernum <[email protected]>

…lidation-test

tvernum

LGTM.
We need to fix this test. We clearly have some underlying issues around requests that come in at the same time as the cleanup job, but they're more theoretical than of practical concern.

ywangd · 2021-08-20T04:45:41Z

We need to fix this test. We clearly have some underlying issues around requests that come in at the same time as the cleanup job, but they're more theoretical than of practical concern.

I agree there are more issues to fix. But I think it is more about the code itself rather than the test. The test is a reflect on how the code logic being non-ideal. Once we get around to fix the code, the test should be also be fixed as part of it.

elasticsearchmachine · 2021-08-20T04:47:06Z

💔 Backport failed

Status	Branch	Result
❌	7.x	Commit could not be cherrypicked due to conflicts

To backport manually run backport --upstream elastic/elasticsearch --pr 75319

This PR add more resilience to testExpiredTokensDeletedAfterExpiration by wrapping the empty error assertion into an assertBusy block. In rare cases, it is possible for search to return a deleted but not yet refreshed document. Trying to invalidate this document then result into MissDocumentException. This state is transient and should go away once all shards refresh. Resolves: elastic#67347

This PR add more resilience to testExpiredTokensDeletedAfterExpiration by wrapping the empty error assertion into an assertBusy block. In rare cases, it is possible for search to return a deleted but not yet refreshed document. Trying to invalidate this document then result into MissDocumentException. This state is transient and should go away once all shards refresh. Resolves: #67347

ywangd added >test Issues or PRs that are addressing/adding tests :Security/Security Security issues without another label v8.0.0 v7.15.0 labels Jul 14, 2021

ywangd requested a review from tvernum July 14, 2021 03:42

elasticmachine added the Team:Security Meta label for security team label Jul 14, 2021

ywangd commented Jul 14, 2021

View reviewed changes

tweak

1673512

tvernum reviewed Jul 14, 2021

View reviewed changes

Update x-pack/plugin/security/src/internalClusterTest/java/org/elasti…

542260f

…csearch/xpack/security/authc/TokenAuthIntegTests.java Co-authored-by: Tim Vernum <[email protected]>

ywangd requested a review from tvernum July 19, 2021 00:06

ywangd added 2 commits August 3, 2021 23:03

Merge remote-tracking branch 'origin/master' into es-67347-token-inva…

5c38f37

…lidation-test

address feedback

41d596e

mark-vieira added v7.16.0 and removed v7.15.0 labels Aug 19, 2021

tvernum approved these changes Aug 20, 2021

View reviewed changes

ywangd added the auto-backport Automatically create backport pull requests when merged label Aug 20, 2021

ywangd merged commit 9d2a944 into elastic:master Aug 20, 2021

jakelandis added v8.0.0-alpha2 and removed v8.0.0 labels Sep 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Test] More resilience for token invalidation test #75319

[Test] More resilience for token invalidation test #75319

Uh oh!

ywangd commented Jul 14, 2021 •

edited

Loading

Uh oh!

elasticmachine commented Jul 14, 2021

Uh oh!

ywangd Jul 14, 2021 •

edited

Loading

Uh oh!

Uh oh!

tvernum Jul 14, 2021

Uh oh!

ywangd Jul 14, 2021

Uh oh!

tvernum Jul 29, 2021

Uh oh!

ywangd Aug 3, 2021

Uh oh!

tvernum left a comment

Uh oh!

ywangd commented Aug 20, 2021

Uh oh!

elasticsearchmachine commented Aug 20, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

		assertThat(newResponse.getPreviouslyInvalidatedTokens(), equalTo(0));
		assertThat(newResponse.getErrors(), empty());

[Test] More resilience for token invalidation test #75319

[Test] More resilience for token invalidation test #75319

Uh oh!

Conversation

ywangd commented Jul 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticmachine commented Jul 14, 2021

Uh oh!

ywangd Jul 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tvernum Jul 14, 2021

Choose a reason for hiding this comment

Uh oh!

ywangd Jul 14, 2021

Choose a reason for hiding this comment

Uh oh!

tvernum Jul 29, 2021

Choose a reason for hiding this comment

Uh oh!

ywangd Aug 3, 2021

Choose a reason for hiding this comment

Uh oh!

tvernum left a comment

Choose a reason for hiding this comment

Uh oh!

ywangd commented Aug 20, 2021

Uh oh!

elasticsearchmachine commented Aug 20, 2021

💔 Backport failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

ywangd commented Jul 14, 2021 •

edited

Loading

ywangd Jul 14, 2021 •

edited

Loading