Enrich policy execution can be a heavy burden on the master

Today, when an enrich policy is executed and a new copy of the enrich index is built, the elected master coordinates the underlying reindex task, which operates in batches of 10,000 documents by default. A user reported that executing one particular enrich policy would reliably cause their master to fail with an `OutOfMemoryError`. The policy was of type `geo_match`, and the source documents, which contain geoshapes, could be quite large. Their master had a 1GB heap, which is appropriate for their small and well-run cluster, but it appears that it simply could not hold 10,000 geoshapes in memory at once.

As a workaround, they reduced the (undocumented?) `enrich.fetch_size` setting to 5000, which was enough to avoid the OOM, but I'm still concerned about the strain this puts on the master.

A few ideas for possible improvements:

- Move the coordination of the reindexing job onto a different node.
- Strengthen a circuit-breaker to prevent an OOM like this one.
- Adapt the reindex batch size to the resources available.

I think we should definitely do the first; the other two are harder.
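
To make the third idea a bit more concrete, here is a minimal sketch of how a batch size could be derived from a heap budget and an estimated document size. Everything in it is an assumption for illustration: the class `AdaptiveFetchSize`, the method `fetchSize`, the 10% heap budget, and the idea that an average document size is available up front are all hypothetical and not part of the existing enrich code. Estimating the in-memory size of a geoshape reliably is likely the hard part, and this sketch does not attempt it.

```java
public final class AdaptiveFetchSize {
    // Upper bound mirrors the default batch size of 10,000 mentioned above.
    static final int MAX_FETCH_SIZE = 10_000;
    // Always fetch at least one document so the reindex can make progress.
    static final int MIN_FETCH_SIZE = 1;

    /**
     * Hypothetical helper: picks a batch size so that one batch fits in the given budget.
     *
     * @param heapBudgetBytes bytes of heap the coordinating node may spend on a single batch
     * @param avgDocBytes     estimated in-memory size of one source document (an assumption;
     *                        the real size of a parsed geoshape may be much larger than its JSON)
     */
    static int fetchSize(long heapBudgetBytes, long avgDocBytes) {
        if (heapBudgetBytes <= 0 || avgDocBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long fit = heapBudgetBytes / avgDocBytes;
        // Clamp to [MIN_FETCH_SIZE, MAX_FETCH_SIZE].
        return (int) Math.max(MIN_FETCH_SIZE, Math.min(MAX_FETCH_SIZE, fit));
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        long budget = heap / 10; // assumption: spend at most 10% of the heap on one batch
        long avgDocBytes = 200L * 1024; // assumption: roughly 200KB per document
        System.out.println("fetch size: " + fetchSize(budget, avgDocBytes));
    }
}
```

With a 100MB budget and 200KB documents this yields a batch of 512 documents rather than 10,000. A real implementation would still need a circuit breaker as a backstop, since a single outlier document can exceed any average-based estimate.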

See https://discuss.elastic.co/t/why-does-an-enrich-policy-get-executed-on-the-master-node/263241 for more details.

/cc @consulthys 
