Skip to content

Conversation

@ywangd
Copy link
Member

@ywangd ywangd commented Jul 14, 2021

This PR adds more resilience to testExpiredTokensDeletedAfterExpiration
by wrapping the empty error assertion into an assertBusy block. In rare
cases, it is possible for search to return a deleted but not yet
refreshed document. Trying to invalidate this document then result into
MissDocumentException. This state is transient and should go away once
all shards refresh.

Resolves: #67347

This PR add more resilience to testExpiredTokensDeletedAfterExpiration
by wrapping the empty error assertion into an assertBusy block. In rare
cases, it is possible for search to return a deleted but not yet
refreshed document. Trying to invalidate this document then result into
MissDocumentException. This state is transient and should go away once
all shards refresh.

Resolves: elastic#67347
@ywangd ywangd added >test Issues or PRs that are addressing/adding tests :Security/Security Security issues without another label v8.0.0 v7.15.0 labels Jul 14, 2021
@ywangd ywangd requested a review from tvernum July 14, 2021 03:42
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jul 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Comment on lines -215 to +216
assertThat(newResponse.getPreviouslyInvalidatedTokens(), equalTo(0));
assertThat(newResponse.getErrors(), empty());
Copy link
Member Author

@ywangd ywangd Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was the reviewer for the previous change (#64757). But somehow I didn't realise that we should wrap getErrors() here instead of getPreviouslyInvalidatedTokens() because:

  1. The original failure (TokenAuthIntegTests testExpiredTokensDeletedAfterExpiration failure #56903) was about getErrors().
  2. getPreviouslyInvalidatedTokens() should always return 0 since the refersh token is never invalidated before. If the cluster behaves really weirdly, we could probably expect getInvalidatedTokens() to be 1, but not getPreviouslyInvalidatedTokens().

InvalidateTokenResponse invalidateRefreshTokenResponse = restClient.security().invalidateToken(
invalidateRefreshTokenRequest, SECURITY_REQUEST_OPTIONS);
assertThat(invalidateRefreshTokenResponse.getInvalidatedTokens(), equalTo(0));
assertThat(invalidateRefreshTokenResponse.getErrors(), empty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the problems here is that we have this very complex test, but nothing testing the more normal cases.
As far as I can tell there is nothing to test what happens if you try to invalidate an access token and then invalidate its refresh token. What's that behaviour supposed to be?

And this test is trying to test a behaviour that we mostly don't care about - specifically what happens if you try to invalidate a refresh token after it's been deleted from the index? The answer is it should either behave like a token that never existed, or it should behave like a token that was recently invalidated. We don't really mind which.

I don't think we should ever return errors though - it's not the caller's problem that the cleanup job just ran.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but nothing testing the more normal cases

We can (and should) add more tests. But I'd prefer a separate PR. This one is just to tame the number of CI failures.

And this test is trying to test a behaviour that we mostly don't care about - specifically what happens if you try to invalidate a refresh token after it's been deleted from the index?

If we go by the test name literally, to me, testExpiredTokensDeletedAfterExpiration means it is sufficient to ensure that the expired token document is deleted once the cleanup job runs. It has nothing to do with subsuquent invalidations after the deletion. And like you said, any subsequent invalidation should just behave like the document never exists and we do have a separate test (testInvalidateNotValidRefreshTokens) for that. If this is the case, I'd say the test is sufficient after the second assertBusy block which asserts the document no longer exists after triggering the cleanup job.

So a simpler fix is to just drop all the code after it. We can then have separate PRs to add tests for more common use cases, e.g. invalidate refresh token after access token is invalidated etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of my concern here is that this test says it's OK to return errors for a period of time after the deletion occurs, as long as they eventually go away.

Why is that? That seems like it's just accepting that there is a bug that we don't want to fix.
And due to the weirdness in these tests overall, it's hard to tell whether that's a bug that manifests more generally, or whether it's just an edge case that we don't care about and should stop testing for.

Because it sounds like we don't care, and we're just trying to get the test to run.
Why do these assertions model the behaviour that we actually want from the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new test for "deleting access and refresh tokens one after the other with randomness" to cover the more realistic use case as you suggested.

As for this test, the elaborated code indeed suggests for a bug. There are also other bugs in TokenService and they are all more or less caused by the same dirty read problem. All of them are quite rare and this one is even less a practical issue. So we didn't get time around to tackle them, plus they all have some complexities. Even for this one, code wise it is likely easy to skip any DocumentMissingException and just returns normally. But I am not sure whether this would be covering up other subtle and likely very rare problems. In my opinion, it is at least safer to report a false error instead of silently skip the error (unless we are perfectly sure). All these sounds like a bit too much for a PR that is meant to reduce our CI backlog. I can raise an issue to track the bug, but I'd prefer to keep it separate from this PR.

…csearch/xpack/security/authc/TokenAuthIntegTests.java

Co-authored-by: Tim Vernum <[email protected]>
@ywangd ywangd requested a review from tvernum July 19, 2021 00:06
Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
We need to fix this test. We clearly have some underlying issues around requests that come in at the same time as the cleanup job, but they're more theoretical than of practical concern.

@ywangd
Copy link
Member Author

ywangd commented Aug 20, 2021

We need to fix this test. We clearly have some underlying issues around requests that come in at the same time as the cleanup job, but they're more theoretical than of practical concern.

I agree there are more issues to fix. But I think it is more about the code itself rather than the test. The test is a reflect on how the code logic being non-ideal. Once we get around to fix the code, the test should be also be fixed as part of it.

@ywangd ywangd added the auto-backport Automatically create backport pull requests when merged label Aug 20, 2021
@ywangd ywangd merged commit 9d2a944 into elastic:master Aug 20, 2021
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts

To backport manually run backport --upstream elastic/elasticsearch --pr 75319

ywangd added a commit to ywangd/elasticsearch that referenced this pull request Aug 20, 2021
This PR add more resilience to testExpiredTokensDeletedAfterExpiration
by wrapping the empty error assertion into an assertBusy block. In rare
cases, it is possible for search to return a deleted but not yet
refreshed document. Trying to invalidate this document then result into
MissDocumentException. This state is transient and should go away once
all shards refresh.

Resolves: elastic#67347
elasticsearchmachine pushed a commit that referenced this pull request Aug 20, 2021
This PR add more resilience to testExpiredTokensDeletedAfterExpiration
by wrapping the empty error assertion into an assertBusy block. In rare
cases, it is possible for search to return a deleted but not yet
refreshed document. Trying to invalidate this document then result into
MissDocumentException. This state is transient and should go away once
all shards refresh.

Resolves: #67347
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Security/Security Security issues without another label Team:Security Meta label for security team >test Issues or PRs that are addressing/adding tests v7.16.0 v8.0.0-alpha2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

testExpiredTokensDeletedAfterExpiration fails with "Error invalidating refresh_token"

6 participants