@fcofdez
Contributor

@fcofdez fcofdez commented Sep 14, 2021

Today MetadataSnapshot#recoveryDiff considers the .liv file as per-commit
rather than per-segment and often transfers them during peer recoveries and
snapshot restores. It also considers differences in .fnm, .dvd and .dvm
files as indicating a difference in the whole segment, even though these files
may be adjusted without changing the segment itself.

This commit adjusts this logic to attach these generational files to the
segments themselves, allowing Elasticsearch only to transfer them if they are
genuinely needed.
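
The grouping described above can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: `RecoveryDiffSketch`, `FileMeta`, `allMatch` and `filesToTransfer` are hypothetical names, files are assumed to arrive already classified as per-commit, per-segment or generational, and the rule "resend generational files alone when the rest of the segment matches" is an assumption about the intended behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a segment-aware diff; not the Elasticsearch implementation. */
final class RecoveryDiffSketch {

    /** Minimal file description; "segment" is null for per-commit files such as segments_N. */
    static final class FileMeta {
        final String name;
        final String segment;
        final boolean generational;
        final String checksum;

        FileMeta(String name, String segment, boolean generational, String checksum) {
            this.name = name;
            this.segment = segment;
            this.generational = generational;
            this.checksum = checksum;
        }
    }

    /** True if every file in the group exists on the target with the same checksum. */
    static boolean allMatch(List<FileMeta> group, Map<String, String> targetChecksums) {
        for (FileMeta f : group) {
            if (!f.checksum.equals(targetChecksums.get(f.name))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the names of the source files that the target would need to receive. */
    static List<String> filesToTransfer(List<FileMeta> source, Map<String, String> targetChecksums) {
        List<FileMeta> perCommit = new ArrayList<>();
        Map<String, List<FileMeta>> core = new LinkedHashMap<>();
        Map<String, List<FileMeta>> generational = new LinkedHashMap<>();
        for (FileMeta f : source) {
            if (f.segment == null) {
                perCommit.add(f);
            } else {
                (f.generational ? generational : core).computeIfAbsent(f.segment, k -> new ArrayList<>()).add(f);
            }
        }

        List<FileMeta> transfer = new ArrayList<>();
        // Per-commit files form a single group: all identical or all resent.
        if (!allMatch(perCommit, targetChecksums)) {
            transfer.addAll(perCommit);
        }
        for (Map.Entry<String, List<FileMeta>> segment : core.entrySet()) {
            List<FileMeta> gens = generational.getOrDefault(segment.getKey(), new ArrayList<>());
            if (!allMatch(segment.getValue(), targetChecksums)) {
                // The segment itself differs: resend it together with its generational files.
                transfer.addAll(segment.getValue());
                transfer.addAll(gens);
            } else if (!allMatch(gens, targetChecksums)) {
                // The segment is unchanged: only its generational files need to be resent.
                transfer.addAll(gens);
            }
        }
        List<String> names = new ArrayList<>();
        for (FileMeta f : transfer) {
            names.add(f.name);
        }
        return names;
    }
}
```

In this model, a segment whose only change is a new `.liv` generation costs one small file transfer instead of a copy of the whole segment.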

Closes #55142

This is basically the same as #55239 but updated

DaveCTurner and others added 7 commits April 15, 2020 12:49
Closes elastic#55142
Resolves an outstanding `//NORELEASE` action related to elastic#50999.
@fcofdez fcofdez added the >enhancement, :Distributed Indexing/Recovery, v8.0.0, Team:Distributed (Obsolete) and v7.16.0 labels Sep 14, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez
Contributor Author

fcofdez commented Sep 14, 2021

@elasticmachine update branch

@fcofdez
Contributor Author

fcofdez commented Sep 16, 2021

This is blocked by #77842 unless we find a workaround for source-only snapshots that share the segment id but whose content can be different.

@fcofdez fcofdez force-pushed the 2020-04-15-dont-copy-liv-file branch from 115680b to da417ce Compare September 23, 2021 15:19
@fcofdez
Contributor Author

fcofdez commented Sep 27, 2021

@elasticmachine update branch

@fcofdez fcofdez force-pushed the 2020-04-15-dont-copy-liv-file branch from 12f94bb to 57af839 Compare September 28, 2021 15:42
@fcofdez
Contributor Author

fcofdez commented Sep 29, 2021

@elasticmachine update branch

@fcofdez
Contributor Author

fcofdez commented Sep 29, 2021

@original-brownbear would you mind reviewing the bits around snapshot FileInfo serialization when you have the chance? In 7f2b8bc I had to introduce some conditional serialization based on the repo version to account for mixed clusters.

Contributor

@original-brownbear original-brownbear left a comment

LGTM as far as the snapshot related changes go :)

@fcofdez fcofdez requested a review from tlrx October 1, 2021 11:00
Member

@tlrx tlrx left a comment

LGTM - I left only minor comments that you can choose to address or not. Sorry for the delay

for (StoreFileMetadata sourceFile : this) {
    if (sourceFile.name().startsWith("_")) {
        final String segmentId = IndexFileNames.parseSegmentName(sourceFile.name());
        final long generation = IndexFileNames.parseGeneration(sourceFile.name());
Member

Suggested change
- final long generation = IndexFileNames.parseGeneration(sourceFile.name());
+ final boolean isGenerationalFile = IndexFileNames.parseGeneration(sourceFile.name()) > 0L;
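
For readers unfamiliar with the naming scheme behind this suggestion, here is a minimal sketch of how a segment name and generation can be read out of a Lucene-style file name. `SegmentFileNames` is a hypothetical stand-in written for illustration; it mirrors my understanding of how Lucene's `IndexFileNames` treats the four name shapes listed in the comment, and the real code should keep calling the Lucene class.

```java
/** Simplified stand-in for illustration only; real code should call Lucene's IndexFileNames. */
final class SegmentFileNames {

    /** Drops everything from the first '.' onwards. */
    static String stripExtension(String name) {
        int dot = name.indexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    /** Returns the segment prefix, e.g. "_0" for both "_0.cfs" and "_0_1.liv". */
    static String parseSegmentName(String name) {
        int idx = name.indexOf('_', 1);   // a second '_' starts a generation or codec suffix
        if (idx < 0) {
            idx = name.indexOf('.');      // otherwise the prefix ends at the extension
        }
        return idx < 0 ? name : name.substring(0, idx);
    }

    /**
     * Returns the generation encoded in the name, or 0 if there is none. Four shapes are assumed:
     * segment.ext, segment_gen.ext, segment_codec_suffix.ext and segment_gen_codec_suffix.ext.
     */
    static long parseGeneration(String name) {
        String[] parts = stripExtension(name).substring(1).split("_");
        if (parts.length == 2 || parts.length == 4) {
            return Long.parseLong(parts[1], Character.MAX_RADIX); // generations are base-36
        }
        return 0L;
    }

    /** The check proposed in the suggestion above: a positive generation marks a generational file. */
    static boolean isGenerationalFile(String name) {
        return parseGeneration(name) > 0L;
    }
}
```

Under these assumptions `_0_1.liv` is generational while `_0.cfs` and `_0_Lucene84_0.doc` are not, which is why a boolean reads more clearly than a raw `long` at the call site.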

 */
- public RecoveryDiff recoveryDiff(MetadataSnapshot recoveryTargetSnapshot) {
+ public RecoveryDiff recoveryDiff(final MetadataSnapshot targetSnapshot) {
      final List<StoreFileMetadata> perCommitSourceFiles = new ArrayList<>();
Member

I'd move the computation of perCommitSourceFiles and perSegmentSourceFiles just before the loop where it is used.


final ShardGeneration indexGeneration;
final boolean writeShardGens = SnapshotsService.useShardGenerations(context.getRepositoryMetaVersion());
final boolean writeFileInfoWriterUUID = SnapshotsService.includeFileInfoWriterUUID(context.getRepositoryMetaVersion());
Member

This one is always used as a String, so it may be worth declaring it as a String.

// If we have the file contents, we directly compare the contents. This is useful to compare segment info
// files of source-only snapshots where the original segment info file shares the same id as the source-only
// segment info file but its contents are different.
if (hashEqualsContents()) {
Member

Should we compute hashEqualsContents once in the constructor and store it as a class member? It looks like we use it every time a StoreFileMetadata is instantiated.
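
A minimal sketch of that idea, under stated assumptions: `FileMetadataSketch` is a hypothetical stand-in, not `StoreFileMetadata`, and the rule used here ("the hash holds the full contents when it is as long as the file") is a simplification chosen for illustration. The real check may involve more than a length comparison.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a file-metadata class; fields and rules are assumptions. */
final class FileMetadataSketch {
    private final long length;
    private final String checksum;
    private final byte[] hash;
    private final boolean hashEqualsContents; // computed once, in the constructor

    FileMetadataSketch(long length, String checksum, byte[] hash) {
        this.length = length;
        this.checksum = checksum;
        this.hash = hash.clone();
        // Assumption: the hash carries the complete contents exactly when it is as long as the file.
        this.hashEqualsContents = this.hash.length == length;
    }

    boolean hashEqualsContents() {
        return hashEqualsContents; // no recomputation on each call
    }

    /** Compares by contents when either side has them, otherwise by length and checksum. */
    boolean isSame(FileMetadataSketch other) {
        if (hashEqualsContents || other.hashEqualsContents) {
            return Arrays.equals(hash, other.hash);
        }
        return length == other.length && checksum.equals(other.checksum);
    }
}
```

Because every field is final, the cached flag cannot go stale, so computing it eagerly trades one comparison per instance for cheaper repeated lookups.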

/**
* Writes blob with resolving the blob name using {@link #blobName} method.
* <p>
* The blob will optionally by compressed.
Member

Suggested change
- * The blob will optionally by compressed.
+ * The blob will optionally be compressed.

final String blobName,
final boolean compress,
final Map<String, String> extraParams,
OutputStream outputStream
Member

let's make this one final too

XContentBuilder builder = XContentFactory.contentBuilder(XContentType.JSON).prettyPrint();
BlobStoreIndexShardSnapshot.FileInfo.toXContent(info, builder);
boolean serializeWriterUUID = randomBoolean();
ToXContent.Params params = new ToXContent.MapParams(
Member

Maybe also test the default behavior with an empty map
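
To illustrate why the empty-map case deserves its own test, here is a rough sketch of parameter-driven serialization. Everything in it is hypothetical: `FileInfoJsonSketch`, the `serialize_writer_uuid` key, the `writer_uuid` field name and the choice of `false` as the default are assumptions for illustration, and a plain `Map` stands in for `ToXContent.Params`.

```java
import java.util.Map;

/** Hypothetical sketch of parameter-driven serialization; not the FileInfo implementation. */
final class FileInfoJsonSketch {
    static final String SERIALIZE_WRITER_UUID = "serialize_writer_uuid"; // hypothetical key

    /** Reads a boolean parameter, falling back to the default when the key is absent. */
    static boolean paramAsBoolean(Map<String, String> params, String key, boolean defaultValue) {
        String value = params.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    /** Emits the optional field only when asked to. No JSON escaping: inputs are assumed plain. */
    static String toJson(String name, String writerUuid, Map<String, String> params) {
        StringBuilder sb = new StringBuilder("{\"name\":\"").append(name).append('"');
        // Assumed default: omit the field, so readers that do not know it are unaffected.
        if (paramAsBoolean(params, SERIALIZE_WRITER_UUID, false)) {
            sb.append(",\"writer_uuid\":\"").append(writerUuid).append('"');
        }
        return sb.append('}').toString();
    }
}
```

A randomized boolean exercises both explicit settings, but only a call with an empty map pins down what happens when the parameter is not supplied at all.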

iwc.setMergePolicy(NoMergePolicy.INSTANCE);
iwc.setUseCompoundFile(random.nextBoolean());
iwc.setOpenMode(IndexWriterConfig.OpenMode.APPEND);
IndexWriter writer = new IndexWriter(store.directory(), iwc);
Member

Not very important, but IndexWriter implements AutoCloseable and can be used in try-with-resources blocks. IndexWriter also commits on close by default, so you can save a few lines (but it's not used like this in other tests so 🤷).
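
The pattern being suggested looks roughly like this. To keep the sketch self-contained it uses a hypothetical `CommittingWriter` stand-in rather than Lucene's `IndexWriter`; the only behaviour borrowed from the comment is that closing the writer commits pending changes.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesSketch {

    /** Stand-in for a writer that commits pending changes when closed; not Lucene's IndexWriter. */
    static final class CommittingWriter implements AutoCloseable {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed;

        CommittingWriter(List<String> committed) {
            this.committed = committed;
        }

        void addDocument(String doc) {
            pending.add(doc);
        }

        @Override
        public void close() {
            committed.addAll(pending); // commit on close
            pending.clear();
        }
    }

    /** Writes all documents and returns what was committed once the writer was closed. */
    static List<String> writeAll(String... docs) {
        List<String> committed = new ArrayList<>();
        try (CommittingWriter writer = new CommittingWriter(committed)) {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
        } // close() runs here, so no explicit commit or close call is needed
        return committed;
    }
}
```

The saving is the explicit commit and close calls, plus the guarantee that the writer is closed even if adding a document throws.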


private final BytesRef writerUuid;

public StoreFileMetadata(String name, long length, String checksum, String writtenBy) {
Member

Should we get rid of this ctor somehow? It's used in RecoveryFileChunkRequest as a way to carry the name/length/etc., but those are serialized separately there, and I wonder if that could introduce bugs later if someone relies on writerUuid in recovery when it's never available there.

Contributor Author

Good catch! Maybe we should serialize the writerUuid there too? It's a bit hacky but that's where we are today 🤔

Member

Instead of adding the serialization of the writerUuid, we could maybe just serialize a StoreFileMetadata. This can be done in a follow-up though.

@fcofdez
Contributor Author

fcofdez commented Oct 5, 2021

@elasticmachine update branch

@fcofdez
Contributor Author

fcofdez commented Oct 5, 2021

@elasticmachine run elasticsearch-ci/part-1
It was a known failure #78675

@fcofdez fcofdez merged commit 310b4ac into elastic:master Oct 5, 2021
@fcofdez
Contributor Author

fcofdez commented Oct 5, 2021

Thanks Armin and Tanguy!

@fcofdez fcofdez added the auto-backport label Oct 5, 2021
@fcofdez fcofdez added and removed the auto-backport label Oct 5, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 5, 2021
Closes elastic#55142
Backport of elastic#77695

Co-authored-by: David Turner <[email protected]>
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 6, 2021
fcofdez added a commit that referenced this pull request Oct 6, 2021
fcofdez added a commit that referenced this pull request Oct 6, 2021
Closes #55142
Backport of #77695

Co-authored-by: David Turner <[email protected]>
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Oct 6, 2021
Labels

auto-backport, :Distributed Indexing/Recovery, >enhancement, Team:Distributed (Obsolete), v7.16.0, v8.0.0-beta1

Development

Successfully merging this pull request may close these issues.

Incremental restores treat segments containing soft-deleted docs as "different"

6 participants