Performance regression for search query in single node cluster

**Elasticsearch version** (`bin/elasticsearch --version`): 6.3.1

**Plugins installed**: []

**JVM version** (`java -version`): OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)

**OS version** (`uname -a` if on a Unix-like system): Ubuntu 18.04

`Linux es-perf 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux`

**Description of the problem including expected versus actual behavior**:

A single search query running on a single node elasticsearch cluster is much slower with elasticsearch 6.3.1 than it was with 5.5.3 while it would be expected to take the same time.

An inefficient use of the CPU due to the limit of concurrent shard requests per search request (implemented in https://github.com/elastic/elasticsearch/pull/25632) is the likely culprit.

**Steps to reproduce**:

The following steps were tested in a virtual machine with 8 cores and 16 GB of RAM.

1. Install elasticsearch 5.5.3 from deb package
2. Create indices with test data using `populate.sh` (see below)
3. Run a single `match_all` search query while showing CPU usage

```console
$ ./query.sh
Elasticsearch version 5.5.3

%Cpu(s): 55.7 us,  4.0 sy,  0.0 ni, 31.0 id,  9.1 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search
{"query":{"match_all":{}}}

%Cpu(s):  1.1 us,  2.2 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 76.5 us,  0.0 sy,  0.0 ni, 23.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 85.9 us,  1.2 sy,  0.0 ni, 12.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 334,
  "_shards": {
    "total": 100,
    "successful": 100,
    "failed": 0
  }
}
%Cpu(s): 24.7 us,  1.1 sy,  0.0 ni, 74.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 96.7 id,  1.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
```
4. Upgrade to 6.3.1 by installing new package and restarting the service
5. Run the same query again

```console
$ ./query.sh
Elasticsearch version 6.3.1

%Cpu(s): 54.8 us,  4.0 sy,  0.0 ni, 32.0 id,  9.1 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  2.2 us,  1.1 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search
{"query":{"match_all":{}}}

%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 25.6 us,  1.1 sy,  0.0 ni, 72.2 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu(s): 52.8 us,  1.1 sy,  0.0 ni, 46.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 62.1 us,  0.0 sy,  0.0 ni, 37.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 59.8 us,  1.1 sy,  0.0 ni, 39.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 58.8 us,  0.0 sy,  0.0 ni, 41.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 56.5 us,  0.0 sy,  0.0 ni, 43.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 44.3 us,  2.3 sy,  0.0 ni, 53.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 727,
  "_shards": {
    "total": 100,
    "successful": 100,
    "skipped": 0,
    "failed": 0
  }
}
%Cpu(s):  1.1 us,  2.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
```

- Run the same again while setting `max_concurrent_shard_requests` to 16

```console
$ ./query.sh ?max_concurrent_shard_requests=16
Elasticsearch version 6.3.1

%Cpu(s): 53.4 us,  3.9 sy,  0.0 ni, 33.8 id,  8.8 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  2.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search?max_concurrent_shard_requests=16
{"query":{"match_all":{}}}

%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 78.2 us,  2.3 sy,  0.0 ni, 19.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 309,
  "_shards": {
    "total": 100,
    "successful": 100,
    "skipped": 0,
    "failed": 0
  }
}
%Cpu(s):  5.7 us,  2.3 sy,  0.0 ni, 92.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
```

**Summary**:

| Version | Query took (ms) | Peak user CPU usage | Extra query parameter |
| ------- | --------------- | ------------------- | ---------------- |
| 5.5.3   | 334 | 100 % | |
| 6.3.1   | 727 | 62 % (only 5 threads running on a 8-core CPU) | |
| 6.3.1   | 309 | 100 % | max_concurrent_shard_requests=16 |


**Test scripts**:

`populate.sh`:
```bash
#!/bin/bash

# Create index-1 with 10M dummy documents

for j in $(seq 0 10); do
        for i in $(seq 0 1000000); do echo -e '{ "index" : { "_index" : "index-1", "_type" : "doc" } }\n{ "message" : "Hello World!" }'; done | curl -s -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' --data-binary @- | jq '{errors: .errors, took: .took}'
done

# Clone index-1

for index in $(seq 2 20); do
        curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"{\"source\": {\"index\": \"index-1\"}, \"dest\": {\"index\": \"index-$index\"}}" &
done
```

`query.sh`:
```bash
#!/bin/bash

echo -n "Elasticsearch version "
curl -s localhost:9200 | jq -r .version.number
echo

curl -s -X POST "localhost:9200/_cache/clear" > /dev/null

QUERY='{"query":{"match_all":{}}}'

top -d 0.1 -b | grep Cpu &

sleep 0.5
echo
echo "GET /_search$1"
echo $QUERY
echo
curl -s -X GET "localhost:9200/_search$1" -H 'Content-Type: application/json' -d $QUERY | jq "{hits: .hits.total, took: .took, _shards: ._shards}"
sleep 0.5

kill $(jobs -p)
wait
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance regression for search query in single node cluster #31877

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Version	Query took (ms)	Peak user CPU usage	Extra query parameter
5.5.3	334	100 %
6.3.1	727	62 % (only 5 threads running on a 8-core CPU)
6.3.1	309	100 %	max_concurrent_shard_requests=16

Performance regression for search query in single node cluster #31877

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions