_recovery_source sometimes remains after merge #82595

@jtibshirani


If _source is disabled or filtered in the mappings, we add a _recovery_source field to support shard recoveries and CCR. Once it is no longer needed, subsequent merges drop the _recovery_source field to reclaim space.
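For reference, a minimal sketch of an index-creation body that triggers this behavior, assuming a single dense_vector field. The field name `vector`, the `dims` value, and the shard settings are illustrative assumptions, not the actual definition used by the Rally track.

```python
def create_index_body(dims: int) -> dict:
    """Build an index-creation body with _source disabled.

    The field name "vector" and the settings are hypothetical; only the
    "_source": {"enabled": False} mapping is what matters for this issue.
    """
    return {
        "settings": {
            "number_of_shards": 1,
            # No replicas, matching the setup described in this issue.
            "number_of_replicas": 0,
        },
        "mappings": {
            # Disabling _source makes Elasticsearch store _recovery_source instead.
            "_source": {"enabled": False},
            "properties": {
                "vector": {"type": "dense_vector", "dims": dims},
            },
        },
    }
```

The resulting dict can be sent as the JSON body of a `PUT /<index>` request.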

In certain cases, it appears that _recovery_source can stick around even after a merge. I noticed this while running the dense_vector Rally track. This command indexes 100,000 documents with _source disabled, then force merges to 1 segment:

esrally race --track=dense_vector --challenge=index-and-search --track-params="ingest_percentage:10" --on-error abort

At the end, the shard was larger than expected:

195M	data/indices/gPefBjHjTCCxU_EnbSuGrQ/0/index

The disk usage API shows that most of this comes from _recovery_source:

   "_recovery_source" : {
        "total" : "149.9mb",
        "total_in_bytes" : 157209753,
        ....
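A rough sketch of how this number could be pulled out programmatically, assuming the response of the analyze index disk usage API (`POST /<index>/_disk_usage?run_expensive_tasks=true`) nests per-field statistics under `<index>.fields`. The base URL and index name are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def disk_usage_request(base_url: str, index: str) -> urllib.request.Request:
    """Build the POST request for the analyze index disk usage API."""
    url = f"{base_url}/{index}/_disk_usage?run_expensive_tasks=true"
    return urllib.request.Request(url, method="POST")


def field_bytes(response: dict, index: str, field: str) -> int:
    """Return total_in_bytes for one field, or 0 if the field is absent.

    Assumes per-field stats live under response[index]["fields"].
    """
    fields = response.get(index, {}).get("fields", {})
    return fields.get(field, {}).get("total_in_bytes", 0)


def fetch_recovery_source_bytes(base_url: str, index: str) -> int:
    """Query a running cluster and return the size of _recovery_source."""
    with urllib.request.urlopen(disk_usage_request(base_url, index)) as resp:
        return field_bytes(json.load(resp), index, "_recovery_source")
```

After a force merge on an index with no replicas, `fetch_recovery_source_bytes` would be expected to return 0; a large non-zero value is the symptom reported here.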

There are no replicas, so the force merge should have removed _recovery_source. I can reproduce this with both 1 and 2 shards. I haven't found a small-scale reproduction yet.
