Skip to content

[7.x] :qa:mixed-cluster:v7.11.2#mixedClusterTest {p0=cat.nodes/10_basic/Test cat nodes output}  #69757

@jakelandis

Description

@jakelandis

This is a continuation of #69641

On 7.x it appears that for reasons yet to be determined, the regex used by "cat.nodes/10_basic/Test cat nodes output" is extremely expensive, to the point of causing tests to timeout.

On a modern desktop CPU the following command on 7.x can take over 1 minute to run

./gradlew ':qa:mixed-cluster:v7.11.2#mixedClusterTest'  -Dtests.timestamp=$(date +%S) -Dtests.method="test {p0=cat.nodes/10_basic/Test cat nodes output}" --info

For example:

org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT > test {p0=cat.nodes/10_basic/Test cat nodes output} STANDARD_OUT
    [2021-03-02T03:46:19,108][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=cat.nodes/10_basic/Test cat nodes output] before test
    [2021-03-02T03:46:19,189][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] initializing REST clients against [http://[::1]:45603, http://127.0.0.1:46583, http://[::1]:43459, http://127.0.0.1:44891, http://[::1]:33689, http://127.0.0.1:33833, http://[::1]:36341, http://127.0.0.1:43035]
    [2021-03-02T03:46:19,617][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] initializing client, minimum es version [7.11.2], master version, [7.11.2], hosts [http://[::1]:45603, http://127.0.0.1:46583, http://[::1]:43459, http://127.0.0.1:44891, http://[::1]:33689, http://127.0.0.1:33833, http://[::1]:36341, http://127.0.0.1:43035], os [Ubuntu 20.04.1 LTS]
    [2021-03-02T03:48:19,626][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] Clearing template [.logstash-management]
    [2021-03-02T03:48:20,020][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=cat.nodes/10_basic/Test cat nodes output] after test

It has been observed that when running the full test suite that (on the same hardware ^^) to take 4 minutes.

Most all evidence and some testing done be @benwtrent lay blame at the regular expression(s) used in this test.

Further, I have found some successful tests that took 15m to perform this test:
image

It appears that this started with the innocent change introduced by #68605, which merely adds a character to set of character constants. During that 4m local test, I was able to grab the output of a _cat/nodes while the CPU was pegged with lots of work happening in regex (as evident by the stack)

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
127.0.0.1           53          99   2    1.99    4.27     4.11 cdfhilmrstw -      v7.11.2-0
127.0.0.1           62          99   2    1.99    4.27     4.11 cdhilmrstw  -      v7.11.2-3
127.0.0.1           55          99   2    1.99    4.27     4.11 cdfhilmrstw -      v7.11.2-1
127.0.0.1           76          99   2    1.99    4.27     4.11 cdhilmrstw  *      v7.11.2-2

Oddly this is not an issue for master or other branches and seems specific to 7.x. The equivalent command from master is :

./gradlew ':qa:mixed-cluster:v7.13.0#mixedClusterTest'  -Dtests.timestamp=$(date +%S) -Dtests.method="test {p0=cat.nodes/10_basic/Test cat nodes output}" --info

and executes in under 1s on that same hardware ^^.

Build scan:

https://gradle-enterprise.elastic.co/s/az2nnpcpm5dd6

Repro line:

On 7.x

./gradlew ':qa:mixed-cluster:v7.11.2#mixedClusterTest'  -Dtests.timestamp=$(date +%S) -Dtests.method="test {p0=cat.nodes/10_basic/Test cat nodes output}" --info

Reproduces locally?:

The slowness does, but only up to 4m so far. I have not been able to

Applicable branches:

7.x

Failure history:
build-stats

and

gradle enterprise

Failure excerpt:

You will see "p0=cat.nodes/10_basic/Test cat nodes output" as the last test to execute. In the server logs, you will see a long pause with nothing in the logs until the container is killed by the test timeout. No relevant error.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions