Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Aug 8, 2025

Merging shrink TSDB or LogsDB indices can fail in versions 8.19 or 9.1+.
When shrinking an index to a single shard, we use addIndexes, which can
add Lucene segments directly. In this case, FieldInfos can differ
between shards and the new segment. We should use the FieldInfos from
each segment to retrieve the merge stats, instead of the FieldInfos of
the merged segment.

Relates #125403

@dnhatn dnhatn force-pushed the fix-tsdb-merges branch 2 times, most recently from 05076f2 to 2d5f28c Compare August 8, 2025 20:34
@dnhatn dnhatn added the v9.1.2 label Aug 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn marked this pull request as ready for review August 8, 2025 22:21
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nhat, very good find here. LGTM 👍

for (int i = 0; i < mergeState.docValuesProducers.length; i++) {
final FieldInfo fieldInfo = mergeState.fieldInfos[i].fieldInfo(mergedFieldInfo.name);
if (fieldInfo == null) {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to this if statement:

var wrapped = perFieldReader.getDocValuesProducer(fieldInfo);
if (wrapped == null) {
   continue;
}

and so it is safe to continue here? (and we don't have to fallback to UNSUPPORTED)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be safe to continue with the merge optimization. If the FieldInfo is null in some segments, it is safe to skip those and use the merge stats from segments that have values.

@dnhatn
Copy link
Member Author

dnhatn commented Aug 9, 2025

Thanks Martijn!

@dnhatn dnhatn added the auto-backport Automatically create backport pull requests when merged label Aug 9, 2025
@dnhatn dnhatn merged commit 7bd95dc into elastic:main Aug 9, 2025
33 checks passed
@dnhatn dnhatn deleted the fix-tsdb-merges branch August 9, 2025 01:00
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.1

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 132597

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 9, 2025
Merging shrink TSDB or LogsDB indices can fail in versions 8.19 or 9.1+.
When shrinking an index to a single shard, we use addIndexes, which can
add Lucene segments directly. In this case, FieldInfos can differ
between shards and the new segment. We should use the FieldInfos from
each segment to retrieve the merge stats, instead of the FieldInfos of
the merged segment.

Relates elastic#125403
elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2025
…32612)

Merging shrink TSDB or LogsDB indices can fail in versions 8.19 or 9.1+.
When shrinking an index to a single shard, we use addIndexes, which can
add Lucene segments directly. In this case, FieldInfos can differ
between shards and the new segment. We should use the FieldInfos from
each segment to retrieve the merge stats, instead of the FieldInfos of
the merged segment.

Relates #125403
@dnhatn
Copy link
Member Author

dnhatn commented Aug 9, 2025

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 9, 2025
Merging shrink TSDB or LogsDB indices can fail in versions 8.19 or 9.1+.
When shrinking an index to a single shard, we use addIndexes, which can
add Lucene segments directly. In this case, FieldInfos can differ
between shards and the new segment. We should use the FieldInfos from
each segment to retrieve the merge stats, instead of the FieldInfos of
the merged segment.

Relates elastic#125403
elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2025
…32613)

Merging shrink TSDB or LogsDB indices can fail in versions 8.19 or 9.1+.
When shrinking an index to a single shard, we use addIndexes, which can
add Lucene segments directly. In this case, FieldInfos can differ
between shards and the new segment. We should use the FieldInfos from
each segment to retrieve the merge stats, instead of the FieldInfos of
the merged segment.

Relates #125403
martijnvg added a commit that referenced this pull request Aug 11, 2025
Be more strict about not finding a metadata entry based on field info number.
If this were to happen, then this can cause issues in doc value consumer later.
An example of such a failure is what is fixed via #132597, here with shrinking
the local segment and merged field numbers were not aligned causing serious
merge errors in the tsdb doc value consumer. Currently such an issue shouldn't
occur anymore.
ppf2 added a commit that referenced this pull request Aug 11, 2025
Adding #132597 as known issue for 9.1.0-9.1.1
ppf2 added a commit that referenced this pull request Aug 11, 2025
Adding #132597 as known issue for 8.19.0.

Placing it at the very bottom just to be consistent with other release notes in the 8-series (though known issues are better positioned towards the top of the release notes -- after breaking changes).
ppf2 added a commit that referenced this pull request Aug 11, 2025
Adding #132597 as known issue for 8.19.1.

Placing it at the very bottom just to be consistent with other release notes in the 8-series (though known issues are better positioned towards the top of the release notes -- after breaking changes).
ppf2 added a commit that referenced this pull request Aug 11, 2025
Adding #132597 as known issue for 8.19.0.

Placing it at the very bottom just to be consistent with other release notes in the 8-series (though known issues are better positioned towards the top of the release notes -- after breaking changes).
ppf2 added a commit that referenced this pull request Aug 11, 2025
Adding #132597 as known issue for 8.19.1.

Placing it at the very bottom just to be consistent with other release notes in the 8-series (though known issues are better positioned towards the top of the release notes -- after breaking changes).
ppf2 added a commit that referenced this pull request Aug 11, 2025
Adding #132597 as known issue for 9.1.0-9.1.1
ppf2 added a commit that referenced this pull request Aug 12, 2025
Reverting change based on the [comment](#132683 (comment)).

Due to a PR CI issue, the [optimization](#125403) introduced in 8.19.0 that would have caused the [bug](#132597) didn't actually get enabled in 8.19.x versions.  Therefore, 8.19.0 and 8.19.1 do not have either the optimization or the bug.  The optimization will be enabled in 8.19.2 and the bug will be fixed by then.
ppf2 added a commit that referenced this pull request Aug 12, 2025
Reverting change based on the [comment](#132683 (comment)).

Due to a PR CI issue, the [optimization](#125403) introduced in 8.19.0 that would have caused the [bug](#132597) didn't actually get enabled in 8.19.x versions.  Therefore, 8.19.0 and 8.19.1 do not have either the optimization or the bug.  The optimization will be enabled in 8.19.2 and the bug will be fixed by then.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Aug 14, 2025
Be more strict about not finding a metadata entry based on field info number.
If this were to happen, then this can cause issues in doc value consumer later.
An example of such a failure is what is fixed via elastic#132597, here with shrinking
the local segment and merged field numbers were not aligned causing serious
merge errors in the tsdb doc value consumer. Currently such an issue shouldn't
occur anymore.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Aug 14, 2025
Be more strict about not finding a metadata entry based on field info number.
If this were to happen, then this can cause issues in doc value consumer later.
An example of such a failure is what is fixed via elastic#132597, here with shrinking
the local segment and merged field numbers were not aligned causing serious
merge errors in the tsdb doc value consumer. Currently such an issue shouldn't
occur anymore.
elasticsearchmachine pushed a commit that referenced this pull request Aug 14, 2025
Be more strict about not finding a metadata entry based on field info number.
If this were to happen, then this can cause issues in doc value consumer later.
An example of such a failure is what is fixed via #132597, here with shrinking
the local segment and merged field numbers were not aligned causing serious
merge errors in the tsdb doc value consumer. Currently such an issue shouldn't
occur anymore.
elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2025
Be more strict about not finding a metadata entry based on field info number.
If this were to happen, then this can cause issues in doc value consumer later.
An example of such a failure is what is fixed via #132597, here with shrinking
the local segment and merged field numbers were not aligned causing serious
merge errors in the tsdb doc value consumer. Currently such an issue shouldn't
occur anymore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants