Skip to content

Conversation

@original-brownbear
Copy link
Contributor

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch SnapshotInfo.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).

There is a small chance here that elastic#67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 26, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@original-brownbear
Copy link
Contributor Author

Sorry for the oversight in #67947 @fcofdez (I thought I had all spots covered but I missed the RepositoriesService indirection :) Now we should be good.

Copy link
Contributor

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch!

@original-brownbear
Copy link
Contributor Author

Thanks Francisco!

@original-brownbear original-brownbear merged commit f5c64af into elastic:master Jan 28, 2021
@original-brownbear original-brownbear deleted the improve-snapshot-status-api branch January 28, 2021 05:58
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 28, 2021
…stic#68023)

There is a small chance here that elastic#67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).
original-brownbear added a commit that referenced this pull request Jan 28, 2021
) (#68092)

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 2, 2021
Same as elastic#68023 but even less likely (couldn't really find a quick way
to write a test for it for that reason).
Fix is the same, fork off to the generic pool for listener handling.
Also, this allows removing the forking in the transport action since we don't do any long
runnning work on the calling thread any longer in the restore method.
original-brownbear added a commit that referenced this pull request Feb 3, 2021
Same as #68023 but even less likely (couldn't really find a quick way
to write a test for it for that reason).
Fix is the same, fork off to the generic pool for listener handling.
Also, this allows removing the forking in the transport action since we don't do any long
runnning work on the calling thread any longer in the restore method.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 3, 2021
Same as elastic#68023 but even less likely (couldn't really find a quick way
to write a test for it for that reason).
Fix is the same, fork off to the generic pool for listener handling.
Also, this allows removing the forking in the transport action since we don't do any long
runnning work on the calling thread any longer in the restore method.
original-brownbear added a commit that referenced this pull request Feb 3, 2021
Same as #68023 but even less likely (couldn't really find a quick way
to write a test for it for that reason).
Fix is the same, fork off to the generic pool for listener handling.
Also, this allows removing the forking in the transport action since we don't do any long
runnning work on the calling thread any longer in the restore method.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 18, 2021
The callback to loading the repository-data may not run on generic in the uncached case
because of the repo data deduplication logic.
The same issue was fixed for the snapshot status API in elastic#68023
original-brownbear added a commit that referenced this pull request May 18, 2021
…#73196)

The callback to loading the repository-data may not run on generic in the uncached case
because of the repo data deduplication logic.
The same issue was fixed for the snapshot status API in #68023
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 29, 2021
…elastic#73196)

The callback to loading the repository-data may not run on generic in the uncached case
because of the repo data deduplication logic.
The same issue was fixed for the snapshot status API in elastic#68023
original-brownbear added a commit that referenced this pull request Jun 29, 2021
…#73196) (#74695)

The callback to loading the repository-data may not run on generic in the uncached case
because of the repo data deduplication logic.
The same issue was fixed for the snapshot status API in #68023
@original-brownbear original-brownbear restored the improve-snapshot-status-api branch April 18, 2023 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.12.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants