Description
Source-only snapshots create slightly modified segments, since they have to remove certain meta-information that doesn't apply to segments that contain only stored fields. The problem is that the modified segment reuses the original segment info ID, which is supposed to uniquely identify a segment, see:
Lines 215 to 219 in 9958c3c:

    BytesRef segmentId = new BytesRef(si.getId());
    boolean exists = existingSegments.containsKey(segmentId);
    if (exists == false) {
        SegmentInfo newSegmentInfo = new SegmentInfo(targetDirectory, si.getVersion(), si.getMinVersion(), si.name, si.maxDoc(),
            false, si.getCodec(), si.getDiagnostics(), si.getId(), si.getAttributes(), null);
#53463 introduced an optimization that soft-links the stored fields segment files instead of copying them, saving up to 50% in shard storage. Since stored fields files contain a header with the original segment ID for integrity checks, the ID of the modified segment has to be the same as the original segment ID for this optimization to work.
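To make the header constraint concrete, here is a minimal sketch of why a linked stored fields file only validates against the ID it was written with. It does not use Lucene: `StoredFieldsFile`, `headerMatches` and the 4-byte IDs are hypothetical stand-ins for illustration, not the actual header format or check.

```java
import java.util.Arrays;

public class HeaderIdCheck {
    // Hypothetical model of a stored fields file: the header carries the ID
    // of the segment that originally wrote the file.
    record StoredFieldsFile(byte[] headerSegmentId, byte[] payload) {}

    // Hypothetical integrity check: the ID in the header must equal the ID
    // of the segment that claims to own the file.
    static boolean headerMatches(StoredFieldsFile file, byte[] expectedSegmentId) {
        return Arrays.equals(file.headerSegmentId(), expectedSegmentId);
    }

    // Returns {check with the reused ID, check with a freshly generated ID}.
    static boolean[] demo() {
        byte[] originalId = {0x0a, 0x0b, 0x0c, 0x0d};
        byte[] freshId = {0x01, 0x02, 0x03, 0x04};
        // A soft-linked file is byte-for-byte the original, header included.
        StoredFieldsFile linked = new StoredFieldsFile(originalId.clone(), new byte[] {42});
        return new boolean[] {headerMatches(linked, originalId), headerMatches(linked, freshId)};
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("reused id: " + result[0] + ", fresh id: " + result[1]);
    }
}
```

In this model, giving the modified segment a fresh ID would require rewriting the header, and therefore copying the file instead of linking it.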
Reusing the ID breaks the contract that Lucene provides regarding uniqueness of segment IDs, and it prevents some enhancements such as #77695.
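The uniqueness problem can be illustrated with a small sketch. It assumes a hypothetical consumer (`SegmentCache`) that keys on the segment ID alone. The `Segment` record and the file names are illustrative only and are not taken from Elasticsearch or from #77695.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SegmentIdCollision {
    // Hypothetical stand-in for a segment: an ID plus the files it consists of.
    record Segment(byte[] id, Set<String> files) {}

    // Hypothetical cache that assumes a segment ID identifies exactly one segment.
    static final class SegmentCache {
        // ByteBuffer gives content-based equals/hashCode for the raw ID bytes.
        private final Map<ByteBuffer, Segment> byId = new HashMap<>();

        // Returns true if newly registered, false if the ID was already known.
        boolean register(Segment segment) {
            return byId.putIfAbsent(ByteBuffer.wrap(segment.id()), segment) == null;
        }

        Segment lookup(byte[] id) {
            return byId.get(ByteBuffer.wrap(id));
        }
    }

    // Returns true if two segments with different contents collide on one ID.
    static boolean demoCollision() {
        byte[] id = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
        Segment original = new Segment(id, Set.of("_0.si", "_0.fdt", "_0.fdx", "_0.tim", "_0.doc"));
        // The source-only variant keeps only stored fields but reuses the ID.
        Segment sourceOnly = new Segment(id.clone(), Set.of("_0.si", "_0.fdt", "_0.fdx"));

        SegmentCache cache = new SegmentCache();
        cache.register(original);
        boolean registered = cache.register(sourceOnly);
        // The second segment is treated as already known although its contents differ.
        return !registered && !cache.lookup(id).files().equals(sourceOnly.files());
    }

    public static void main(String[] args) {
        System.out.println("collision: " + demoCollision());
    }
}
```

Any component that deduplicates or caches by segment ID would, under this assumption, conflate the full segment with its source-only counterpart.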