
@tlrx
Member

@tlrx tlrx commented Dec 6, 2019

Note: this draft pull request targets the feature/searchable-snapshots branch

This pull request introduces a simple caching mechanism that operates at the level of the Lucene files in searchable snapshot directories.

Several classes have been introduced or changed since #49651: the searchable snapshot directory (SearchableSnapshotDirectory) now contains a representation of the snapshotted shard files (SearchableSnapshotShard), which makes it possible to list the files of a specific snapshot or to read one of them.

A basic implementation of a searchable snapshot shard is BlobStoreSearchableSnapshotShard, which directly accesses a remote blob store repository to list or read files. This implementation takes care of converting the names of Lucene files into blob names in the repository and of loading the appropriate chunks of blobs (the implementation is still very raw and error-prone and must be consolidated).
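To illustrate what "loading the appropriate chunks of blobs" involves, here is a rough sketch of mapping a byte range of a file onto fixed-size chunks. It is not the code from this pull request: `ChunkRange`, `ChunkMapper`, the `.part<N>` naming scheme and the chunk size are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class: one contiguous read inside a single chunk blob.
final class ChunkRange {
    final String blobName;
    final long offsetInChunk;
    final long length;

    ChunkRange(String blobName, long offsetInChunk, long length) {
        this.blobName = blobName;
        this.offsetInChunk = offsetInChunk;
        this.length = length;
    }
}

// Hypothetical helper: splits a read of [position, position + length) into per-chunk reads.
final class ChunkMapper {
    static List<ChunkRange> map(String blobPrefix, long chunkSize, long position, long length) {
        if (chunkSize <= 0 || position < 0 || length < 0) {
            throw new IllegalArgumentException("invalid chunk size or range");
        }
        List<ChunkRange> ranges = new ArrayList<>();
        long pos = position;
        long remaining = length;
        while (remaining > 0) {
            long chunkIndex = pos / chunkSize;      // which chunk holds this position
            long offsetInChunk = pos % chunkSize;   // where the read starts inside that chunk
            long toRead = Math.min(remaining, chunkSize - offsetInChunk);
            // Assumed naming scheme: "<prefix>.part<index>"
            ranges.add(new ChunkRange(blobPrefix + ".part" + chunkIndex, offsetInChunk, toRead));
            pos += toRead;
            remaining -= toRead;
        }
        return ranges;
    }
}
```

A read that crosses a chunk boundary yields one `ChunkRange` per chunk touched, so the caller can fetch each blob separately and concatenate the results.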

Another implementation of a searchable snapshot shard is CachedSearchableSnapshotShard, which caches segments (or portions) of files using a CacheService. This cache service uses the existing LRU org.elasticsearch.common.cache.Cache to cache file segments in memory. This cache is also very raw and should evolve into something more complex that caches file segments on disk. The CachedSearchableSnapshotShard acts as a FilterSearchableSnapshotShard: it delegates the listing or reading of files to another searchable snapshot shard when the requested file segment is not present in the cache (i.e., a cache miss). When the file segment requested by the searchable snapshot directory's index input is present in the cache, it is served directly.
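The delegate-on-miss behaviour described above can be sketched as follows. This is a minimal illustration and not the code from this pull request: `SnapshotShardReader` and `CachingShardReader` are hypothetical names, and a plain `LinkedHashMap` in access order stands in for org.elasticsearch.common.cache.Cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the shard abstraction; not the PR's actual interface.
interface SnapshotShardReader {
    byte[] readSegment(String fileName, long offset, int length);
}

// Caches file segments in memory with LRU eviction and delegates to another reader on a miss.
final class CachingShardReader implements SnapshotShardReader {
    private final SnapshotShardReader delegate;
    private final Map<String, byte[]> cache;

    CachingShardReader(SnapshotShardReader delegate, int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries; // evict the least recently used segment
            }
        };
    }

    @Override
    public synchronized byte[] readSegment(String fileName, long offset, int length) {
        // Assumed cache key: file name plus the exact byte range requested
        String key = fileName + '@' + offset + '+' + length;
        byte[] segment = cache.get(key);
        if (segment == null) {
            // Cache miss: fetch from the wrapped reader (e.g. a blob store backed one)
            segment = delegate.readSegment(fileName, offset, length);
            cache.put(key, segment);
        }
        // Return a copy so callers cannot mutate the cached bytes
        return segment.clone();
    }
}
```

Because the cache only wraps another reader, a test can swap it in or out without changing the directory that sits on top of it.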

Finally, this pull request reuses the tests added in #49651 to test the searchable snapshot directory implementation, randomly enabling or disabling the cache.

@tlrx tlrx added the :Distributed Coordination/Snapshot/Restore label (Anything directly related to the `_snapshot/*` APIs) Dec 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@tlrx
Member Author

tlrx commented Jan 27, 2020

The cache system has been implemented in #50693

@tlrx tlrx closed this Jan 27, 2020
