Query latency increases with multiple shards

**Description of the problem including expected versus actual behavior**:

With https://github.com/elastic/elasticsearch/pull/30783 we reduced the maximum number of concurrent shard requests to the number of nodes, capped at 256. Previously it depended on both the number of nodes and the old default number of shards (5).
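
To make the change concrete, here is a minimal sketch of the two limits. The helper names are hypothetical, and the exact form of the old limit (nodes times 5, capped at 256) is an assumption based on the description above, not a copy of the Elasticsearch code.

```shell
# Sketch only: helper names are hypothetical, and the "nodes * 5" form of the
# old limit is an assumption derived from the issue text.

# Old limit (assumed): number of nodes times the old default shard count (5), capped at 256.
old_default() {
  local nodes=$1
  local value=$(( nodes * 5 ))
  echo $(( value < 256 ? value : 256 ))
}

# New limit: number of nodes, capped at 256.
new_default() {
  local nodes=$1
  echo $(( nodes < 256 ? nodes : 256 ))
}

# Compare both limits for a few cluster sizes.
for nodes in 1 3 100; do
  echo "nodes=$nodes old=$(old_default "$nodes") new=$(new_default "$nodes")"
done
```

Under these assumptions, a single node querying a 5-shard index would go from up to 5 shard requests in flight to 1.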

In our benchmarks we see increased query latency for tracks that explicitly set the number of shards to 5 (e.g. [geonames](https://elasticsearch-benchmarks.elastic.co/#tracks/geonames/nightly/30d) or [geopoints](https://elasticsearch-benchmarks.elastic.co/#tracks/geopoint/nightly/30d)). For example, the 50th percentile latency for the polygon query in [geopoints](https://elasticsearch-benchmarks.elastic.co/#tracks/geopoint/nightly/30d) has increased from 59 ms to 153 ms. Similarly, for the painless_static query in [geonames](https://elasticsearch-benchmarks.elastic.co/#tracks/geonames/nightly/30d) the 50th percentile service time has increased from 504 ms to 1488 ms. I mention service time rather than latency here because the system is completely saturated in the latter case, so latency is not meaningful.
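
As a quick sanity check on the size of the regression, the slowdown factors can be computed from the numbers quoted above. The `slowdown` helper is a hypothetical name used only for this sketch.

```shell
# Ratio of the "after" value to the "before" value, rounded to two decimals.
slowdown() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

slowdown 59 153     # polygon query, 50th percentile latency in ms
slowdown 504 1488   # painless_static query, 50th percentile service time in ms
```

This gives roughly 2.59x for the polygon query and 2.95x for painless_static.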


**Steps to reproduce**:

The problem can be reproduced with the following Rally benchmarks:

```
# state after PR 30783
esrally --revision=d7040ad7b41 --track="geonames" --include-tasks="create-index,index-append,force-merge,painless_static"
# manually revert https://github.com/elastic/elasticsearch/pull/30783 
cd $(awk -F " = " '$1~/^src\.root\.dir/ {print $2}' ~/.rally/rally.ini)/elasticsearch
git revert 2984734197223003dc80ed1ac4e8366f8d49ed1c
# state before (i.e. d7040ad7b41 without 29847341972)
esrally --revision=current --track="geonames" --include-tasks="create-index,index-append,force-merge,painless_static"
# reset your local state again!
git reset --hard origin/master
```

Unfortunately, you have to manually revert the change introduced in https://github.com/elastic/elasticsearch/pull/30783 because [yet another regression](https://github.com/elastic/elasticsearch/issues/30801) landed in between and caused Elasticsearch to OOM (fixed by https://github.com/elastic/elasticsearch/pull/30820).
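
A lighter-weight check, without reverting, might be to rerun a query with the limit set explicitly through the `max_concurrent_shard_requests` search parameter. The sketch below only builds the request URL. The host, port and index name `geonames` are assumptions about a local benchmark setup, and `search_url` is a hypothetical helper.

```shell
# Hypothetical helper: builds a search URL that sets the per-request limit explicitly.
# Host and port are assumptions about a locally running node.
search_url() {
  local index=$1 limit=$2
  echo "http://localhost:9200/${index}/_search?max_concurrent_shard_requests=${limit}"
}

# Print the URL for a limit of 5 concurrent shard requests.
search_url geonames 5

# To send it against a running node (query body omitted here):
# curl -s "$(search_url geonames 5)"
```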

