Skip to content

Conversation

@original-brownbear
Copy link
Contributor

The loading of RepositoryData is not an atomic operation.
It uses a list + get combination of calls.
This lead to accidentally returning an empty repository data
for generations >=0 which can never not exist unless the repository
is corrupted.
In the test #48122 (and other SLM tests) there was a low chance of
running into this concurrent modification scenario and the repository
actually moving two index generations between listing out the
index-N and loading the latest version of it. Since we only keep
two index-N around at a time this lead to unexpectedly absent
snapshots in status APIs.
Fixing the behavior to be more resilient is non-trivial but in the works.
For now I think we should simply throw in this scenario. This will also
help prevent corruption in the unlikely event but possible of running into this
issue in a snapshot create or delete operation on master failover on a
repository like S3 which doesn't have the "no overwrites" protection on
writing a new index-N.

Fixes #48122

backport of #48329

The loading of `RepositoryData` is not an atomic operation.
It uses a list + get combination of calls.
This lead to accidentally returning an empty repository data
for generations >=0 which can never not exist unless the repository
is corrupted.
In the test elastic#48122 (and other SLM tests) there was a low chance of
running into this concurrent modification scenario and the repository
actually moving two index generations between listing out the
index-N and loading the latest version of it. Since we only keep
two index-N around at a time this lead to unexpectedly absent
snapshots in status APIs.
Fixing the behavior to be more resilient is non-trivial but in the works.
For now I think we should simply throw in this scenario. This will also
help prevent corruption in the unlikely event but possible of running into this
issue in a snapshot create or delete operation on master failover on a
repository like S3 which doesn't have the "no overwrites" protection on
writing a new index-N.

Fixes elastic#48122
@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs backport labels Nov 2, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear original-brownbear merged commit d159e5d into elastic:7.5 Nov 2, 2019
@original-brownbear original-brownbear deleted the 48329-7.5 branch November 2, 2019 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants