Description
Today, when an enrich policy is executed and a new copy of the enrich index is built, we use the elected master to coordinate the underlying reindex task, which processes documents in batches of 10,000 by default. A user reported that a particular enrich policy execution would reliably cause their master to fail with an OutOfMemoryError. The policy in question was of geo_match type, and its source documents, which contain geoshapes, could be quite large. Their master had a 1GB heap, which is appropriate for their small and well-run cluster, but it appears the master simply could not hold 10,000 geoshapes in memory at once.
As a workaround they reduced the (undocumented?) `enrich.fetch_size` setting to 5000, which was enough to avoid the OOM, but I'm still concerned about the strain this work puts on the master.
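A quick back-of-the-envelope calculation shows why halving the batch size helped. This sketch assumes, optimistically, that the entire 1GB heap is available to a single batch. In practice the master also needs heap for the cluster state and everything else, so the real per-document budget is smaller still.

```python
# Rough upper bound on the average document size a single reindex batch can
# hold before the batch alone would fill the heap. Optimistic assumption:
# the whole heap is free for the batch, which is never true on a real master.
HEAP_BYTES = 1 * 1024**3  # the reporter's 1GB master heap


def per_doc_budget(heap_bytes: int, batch_size: int) -> float:
    """Average bytes per document at which the batch would fill the heap."""
    return heap_bytes / batch_size


for batch in (10_000, 5_000):
    kib = per_doc_budget(HEAP_BYTES, batch) / 1024
    print(f"batch={batch}: at most ~{kib:.0f} KiB per document")
```

With the default batch size, any average document larger than roughly 105 KiB exhausts the heap even under this generous assumption. At 5000 the ceiling roughly doubles to about 210 KiB, which is consistent with the workaround being "enough", though not by a comfortable margin.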
A few ideas for possible improvements:
- Move the coordination of the reindexing job onto a different node.
- Strengthen a circuit breaker to prevent an OOM like this one.
- Adapt the reindex batch size to the resources available.
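The second and third ideas could be combined: size each batch from a memory budget and an estimate of the average document size, and refuse any batch that would still exceed the budget instead of running the JVM out of memory. The following is a minimal sketch under those assumptions. `BatchSizer`, `heap_budget_bytes`, `next_batch_size`, `fits`, and `BatchTooLargeError` are hypothetical names for illustration only, not Elasticsearch APIs, and the 100 MiB budget is an arbitrary example value.

```python
from dataclasses import dataclass


class BatchTooLargeError(Exception):
    """Hypothetical breaker-style error raised instead of risking an OOM."""


@dataclass
class BatchSizer:
    """Hypothetical helper that picks a reindex batch size from a memory budget.

    Illustrates idea 3 (adapt the batch size) plus a breaker-style guard
    (idea 2). None of these names exist in Elasticsearch.
    """

    heap_budget_bytes: int    # bytes one batch is allowed to occupy
    min_batch: int = 1
    max_batch: int = 10_000   # mirrors today's default batch size

    def next_batch_size(self, avg_doc_bytes: float) -> int:
        # With no size estimate yet, fall back to the maximum batch size.
        if avg_doc_bytes <= 0:
            return self.max_batch
        # Fit as many documents as the budget allows, clamped to [min, max].
        fit = int(self.heap_budget_bytes // avg_doc_bytes)
        return max(self.min_batch, min(self.max_batch, fit))

    def fits(self, batch_bytes: int) -> bool:
        return batch_bytes <= self.heap_budget_bytes

    def check(self, batch_bytes: int) -> None:
        # Trip like a circuit breaker rather than letting the heap overflow.
        if not self.fits(batch_bytes):
            raise BatchTooLargeError(
                f"batch of {batch_bytes} bytes exceeds budget of "
                f"{self.heap_budget_bytes} bytes"
            )


# Example: a 100 MiB budget, i.e. roughly 10% of a 1GB heap (arbitrary choice).
sizer = BatchSizer(heap_budget_bytes=100 * 1024**2)
print(sizer.next_batch_size(2_000))   # small docs: capped at max_batch
print(sizer.next_batch_size(50_000))  # large geoshape-like docs: smaller batch
```

With small documents the budget is never the limiting factor and the batch stays at 10,000. With ~50 KB documents the sketch drops to 2097 per batch. A real implementation would need a trustworthy size estimate (for example, from the previous batch) and would have to account for the in-memory representation being larger than the serialized bytes.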
I think we should definitely do the first; the other two are harder.
See https://discuss.elastic.co/t/why-does-an-enrich-policy-get-executed-on-the-master-node/263241 for more details.
/cc @consulthys