Source-only snapshots create a modified segment info file with the same id as the original segment #77842

@fcofdez

Source-only snapshots create slightly modified segments, because they have to remove certain meta-information that doesn't apply to segments that contain only stored fields. The problem is that the modified segment reuses the original segment info ID, which is supposed to uniquely identify a segment. See:

    BytesRef segmentId = new BytesRef(si.getId());
    boolean exists = existingSegments.containsKey(segmentId);
    if (exists == false) {
        // The rewritten SegmentInfo is passed si.getId(), i.e. the ID of the original segment.
        SegmentInfo newSegmentInfo = new SegmentInfo(targetDirectory, si.getVersion(), si.getMinVersion(), si.name, si.maxDoc(),
            false, si.getCodec(), si.getDiagnostics(), si.getId(), si.getAttributes(), null);

#53463 introduced an optimization that soft-links the stored fields segment files instead of copying them, saving up to 50% in shard storage. Since stored fields files contain a header with the original segment ID for integrity checks, the ID of the modified segment has to be the same as the original segment ID for this optimization to work.
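The constraint can be illustrated with a minimal sketch. This is a simplified model, not Lucene's or Elasticsearch's actual code: the class `StoredFieldsHeaderSketch` and the method `headerMatches` are hypothetical names, and the "header" is reduced to just the segment ID bytes. It only shows why a soft-linked file stops passing an ID-based integrity check once the segment that references it gets a fresh ID.

```java
import java.util.Arrays;

// Hypothetical, simplified model of an ID-based header check (not Lucene's real API).
final class StoredFieldsHeaderSketch {

    // Returns true when the segment ID recorded in a file header equals the ID
    // of the segment that is trying to open the file.
    static boolean headerMatches(byte[] idInFileHeader, byte[] idOfOpeningSegment) {
        return Arrays.equals(idInFileHeader, idOfOpeningSegment);
    }

    public static void main(String[] args) {
        byte[] originalId = {1, 2, 3, 4}; // ID written into the stored fields file header
        byte[] freshId = {9, 9, 9, 9};    // assumed new ID for the rewritten segment

        // Soft-linked file opened by a segment that kept the original ID: check passes.
        System.out.println(headerMatches(originalId, originalId));
        // Same file opened by a segment with a fresh ID: check fails.
        System.out.println(headerMatches(originalId, freshId));
    }
}
```

Under this model, giving the modified segment a new ID would require rewriting (copying) the stored fields files so that their headers carry the new ID, which is exactly the copy the optimization avoids.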

Reusing the ID breaks the contract that Lucene provides regarding the uniqueness of segment IDs, and it prevents some enhancements such as #77695.
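As a rough illustration of why uniqueness matters, consider any component that keys data by segment ID. The sketch below is an assumption made for illustration only: `SegmentIdCollisionSketch`, `register`, and `lookup` are hypothetical names and do not describe what #77695 actually implements. It shows a lookup table that treats "same ID" as "same contents" and therefore silently ignores the modified segment.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical ID-keyed registry; assumes one segment ID maps to exactly one set of contents.
final class SegmentIdCollisionSketch {

    // ByteBuffer is used as the key because its equals/hashCode compare byte contents.
    private final Map<ByteBuffer, String> contentsById = new HashMap<>();

    // Returns true if the entry was stored, false if the ID was already known
    // (in which case the new contents are assumed identical and dropped).
    boolean register(byte[] segmentId, String contentsDescription) {
        ByteBuffer key = ByteBuffer.wrap(segmentId.clone());
        return contentsById.putIfAbsent(key, contentsDescription) == null;
    }

    // Returns the contents registered for the ID, or null if the ID is unknown.
    String lookup(byte[] segmentId) {
        return contentsById.get(ByteBuffer.wrap(segmentId));
    }

    public static void main(String[] args) {
        SegmentIdCollisionSketch registry = new SegmentIdCollisionSketch();
        byte[] id = {1, 2, 3, 4};

        System.out.println(registry.register(id, "original segment"));
        // The source-only segment has different metadata but the same ID, so it is ignored.
        System.out.println(registry.register(id, "source-only segment"));
        System.out.println(registry.lookup(id));
    }
}
```

In this sketch the second `register` call returns `false` and `lookup` keeps returning the original entry, so the two distinct segments cannot be told apart by ID alone.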
