Skip to content

Performance regression for search query in single node cluster #31877

@ctrlaltdel

Description

@ctrlaltdel

Elasticsearch version (bin/elasticsearch --version): 6.3.1

Plugins installed: []

JVM version (java -version): OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)

OS version (uname -a if on a Unix-like system): Ubuntu 18.04

Linux es-perf 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

A single search query running on a single node elasticsearch cluster is much slower with elasticsearch 6.3.1 than it was with 5.5.3 while it would be expected to take the same time.

An inefficient use of the CPU due to the limit of concurrent shard requests per search request (implemented in #25632) is the likely culprit.

Steps to reproduce:

The following steps were tested in a virtual machine with 8 cores and 16 GB of RAM.

  1. Install elasticsearch 5.5.3 from deb package
  2. Create indices with test data using populate.sh (see below)
  3. Run a single match_all search query while showing CPU usage
$ ./query.sh
Elasticsearch version 5.5.3

%Cpu(s): 55.7 us,  4.0 sy,  0.0 ni, 31.0 id,  9.1 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search
{"query":{"match_all":{}}}

%Cpu(s):  1.1 us,  2.2 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 76.5 us,  0.0 sy,  0.0 ni, 23.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 85.9 us,  1.2 sy,  0.0 ni, 12.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 334,
  "_shards": {
    "total": 100,
    "successful": 100,
    "failed": 0
  }
}
%Cpu(s): 24.7 us,  1.1 sy,  0.0 ni, 74.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 96.7 id,  1.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
  1. Upgrade to 6.3.1 by installing new package and restarting the service
  2. Run the same query again
$ ./query.sh
Elasticsearch version 6.3.1

%Cpu(s): 54.8 us,  4.0 sy,  0.0 ni, 32.0 id,  9.1 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  2.2 us,  1.1 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search
{"query":{"match_all":{}}}

%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 25.6 us,  1.1 sy,  0.0 ni, 72.2 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu(s): 52.8 us,  1.1 sy,  0.0 ni, 46.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 62.1 us,  0.0 sy,  0.0 ni, 37.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 59.8 us,  1.1 sy,  0.0 ni, 39.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 58.8 us,  0.0 sy,  0.0 ni, 41.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 56.5 us,  0.0 sy,  0.0 ni, 43.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 44.3 us,  2.3 sy,  0.0 ni, 53.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 727,
  "_shards": {
    "total": 100,
    "successful": 100,
    "skipped": 0,
    "failed": 0
  }
}
%Cpu(s):  1.1 us,  2.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
  • Run the same again while setting max_concurrent_shard_requests to 16
$ ./query.sh ?max_concurrent_shard_requests=16
Elasticsearch version 6.3.1

%Cpu(s): 53.4 us,  3.9 sy,  0.0 ni, 33.8 id,  8.8 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  2.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search?max_concurrent_shard_requests=16
{"query":{"match_all":{}}}

%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 78.2 us,  2.3 sy,  0.0 ni, 19.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 309,
  "_shards": {
    "total": 100,
    "successful": 100,
    "skipped": 0,
    "failed": 0
  }
}
%Cpu(s):  5.7 us,  2.3 sy,  0.0 ni, 92.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$

Summary:

Version Query took (ms) Peak user CPU usage Extra query parameter
5.5.3 334 100 %
6.3.1 727 62 % (only 5 threads running on a 8-core CPU)
6.3.1 309 100 % max_concurrent_shard_requests=16

Test scripts:

populate.sh:

#!/bin/bash

# Create index-1 with 10M dummy documents

for j in $(seq 0 10); do
        for i in $(seq 0 1000000); do echo -e '{ "index" : { "_index" : "index-1", "_type" : "doc" } }\n{ "message" : "Hello World!" }'; done | curl -s -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' --data-binary @- | jq '{errors: .errors, took: .took}'
done

# Clone index-1

for index in $(seq 2 20); do
        curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"{\"source\": {\"index\": \"index-1\"}, \"dest\": {\"index\": \"index-$index\"}}" &
done

query.sh:

#!/bin/bash

echo -n "Elasticsearch version "
curl -s localhost:9200 | jq -r .version.number
echo

curl -s -X POST "localhost:9200/_cache/clear" > /dev/null

QUERY='{"query":{"match_all":{}}}'

top -d 0.1 -b | grep Cpu &

sleep 0.5
echo
echo "GET /_search$1"
echo $QUERY
echo
curl -s -X GET "localhost:9200/_search$1" -H 'Content-Type: application/json' -d $QUERY | jq "{hits: .hits.total, took: .took, _shards: ._shards}"
sleep 0.5

kill $(jobs -p)
wait

Metadata

Metadata

Assignees

Labels

:Search/SearchSearch-related issues that do not fall into other categories

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions