Use String.join() to describe a list of tasks #28941

DaveCTurner · 2018-03-08T14:12:01Z

This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and #28920
includes a report that this operation sometimes takes a long time.
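
For illustration, a minimal sketch of the two approaches, assuming the task descriptions are already available as a `List<String>`. The class and method names are hypothetical and this is not the actual `ClusterStateTaskExecutor` code.

```java
import java.util.Arrays;
import java.util.List;

class JoinVersusReduce {
    // Pairwise concatenation: each step copies the whole accumulated string,
    // so total work can grow quadratically with the number of descriptions.
    static String viaReduce(List<String> descriptions) {
        return descriptions.stream()
            .reduce((s1, s2) -> s1 + ", " + s2)
            .orElse("");
    }

    // String.join builds the result in a single pass over the elements.
    static String viaJoin(List<String> descriptions) {
        return String.join(", ", descriptions);
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList("task-a", "task-b", "task-c");
        System.out.println(viaReduce(descriptions)); // task-a, task-b, task-c
        System.out.println(viaJoin(descriptions));   // task-a, task-b, task-c
    }
}
```

Both methods return the same string; only the amount of intermediate copying differs.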

DaveCTurner · 2018-03-08T16:18:08Z

NB I haven't been able to reproduce the slowness reported in #28920, because I don't have easy access to a partitionable cluster with 10k shards. @danielmitterdorfer do you have any opinions about this change and/or good ideas about how to quantify the benefit?

If this change looks ok, I can ask the OP of #28920 to try it.

danielmitterdorfer · 2018-03-09T08:03:36Z

> do you have any opinions about this change

I am not familiar with this part of the code. But if this is the bottleneck then your change makes sense to me.

> good ideas about how to quantify the benefit?

I think this would be a good candidate for a microbenchmark? We have some infrastructure for that in place in the benchmarks module. You can also have a look at an example benchmark that I did recently in the context of #28702. Happy to help out here as well and we can also run it in our microbenchmarking environment (which is tuned to reduce measurement noise).

bleskes

LGTM

bleskes · 2018-03-09T08:55:07Z

server/src/main/java/org/elasticsearch/cluster/ClusterStateTaskExecutor.java

-                return s1 + ", " + s2;
-            }
-        }).orElse("");
+        return String.join(", ", tasks.stream().map(t -> (CharSequence)t.toString()).filter(t -> t.length() == 0)::iterator);


out of curiosity - didn't T::toString work?

No, apparently Iterator<String> cannot be converted to Iterator<CharSequence>:

> Task :server:compileJava
/Users/davidturner/src/elasticsearch-master/server/src/main/java/org/elasticsearch/cluster/ClusterStateTaskExecutor.java:59: error: no suitable method found for join(String,tasks.stre[...]rator)
        return String.join(", ", tasks.stream().map(T::toString).filter(t -> t.length() == 0)::iterator);
                     ^
    method String.join(CharSequence,CharSequence...) is not applicable
      (varargs mismatch; CharSequence is not a functional interface
          multiple non-overriding abstract methods found in interface CharSequence)
    method String.join(CharSequence,Iterable<? extends CharSequence>) is not applicable
      (argument mismatch; bad return type in method reference
          Iterator<String> cannot be converted to Iterator<CharSequence>)
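
The error comes from generics being invariant: an `Iterator<String>` is not an `Iterator<CharSequence>`, so the `::iterator` method reference only matches `Iterable<? extends CharSequence>` once the stream's element type is `CharSequence`. A small sketch of the cast workaround, with `Collectors.joining` shown as one alternative that avoids the cast. The class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CharSequenceJoinSketch {
    // The cast makes this a Stream<CharSequence>, so ::iterator yields an
    // Iterator<CharSequence> and the method reference fits Iterable<? extends CharSequence>.
    static <T> String describeWithCast(List<T> tasks) {
        return String.join(", ", tasks.stream().map(t -> (CharSequence) t.toString())::iterator);
    }

    // Alternative: let a collector do the joining, which needs no cast.
    static <T> String describeWithCollector(List<T> tasks) {
        return tasks.stream().map(Object::toString).collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Integer> tasks = Arrays.asList(1, 2, 3);
        System.out.println(describeWithCast(tasks));      // 1, 2, 3
        System.out.println(describeWithCollector(tasks)); // 1, 2, 3
    }
}
```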

DaveCTurner · 2018-03-09T09:34:17Z

Thanks @danielmitterdorfer for the pointer to the benchmarks module, that's just what I was after. I just overwrote the existing benchmark with this one: https://gist.github.com/DaveCTurner/de8763bc791d860e4fb0c9a9f98df7cd. At 10,000 tasks, each with a 100-byte description, the improvement is from ~500ms to ~120µs:

Result "org.elasticsearch.benchmark.routing.allocation.AllocationBenchmark.measureStreamReduce":
  578.282 ±(99.9%) 14.884 ms/op [Average]
  (min, avg, max) = (534.191, 578.282, 633.771), stdev = 22.278
  CI (99.9%): [563.398, 593.166] (assumes normal distribution)

...


Result "org.elasticsearch.benchmark.routing.allocation.AllocationBenchmark.measureStringJoin":
  0.119 ±(99.9%) 0.004 ms/op [Average]
  (min, avg, max) = (0.114, 0.119, 0.138), stdev = 0.006
  CI (99.9%): [0.115, 0.123] (assumes normal distribution)


# Run complete. Total time: 00:02:14

Benchmark                                Mode  Cnt    Score    Error  Units
AllocationBenchmark.measureStreamReduce  avgt   30  578.282 ± 14.884  ms/op
AllocationBenchmark.measureStringJoin    avgt   30    0.119 ±  0.004  ms/op

This seems worth doing.

danielmitterdorfer · 2018-03-09T09:41:30Z

You're welcome. That's quite a significant difference indeed.

In elastic#28941 we changed the computation of cluster state task descriptions but this introduced a bug in which we only log the empty descriptions (rather than the non-empty ones). This PR fixes that.
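
The bug is visible in the merged line above: the predicate `t.length() == 0` keeps only the empty descriptions. A sketch of the difference, assuming the descriptions are plain strings. The inverted predicate shown here is one plausible correction, not necessarily the exact patch in #34182, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class NonEmptyDescriptionsSketch {
    // As merged in #28941: the filter keeps ONLY the empty descriptions.
    static String buggy(List<String> descriptions) {
        return String.join(", ",
            descriptions.stream().map(s -> (CharSequence) s).filter(s -> s.length() == 0)::iterator);
    }

    // Assumed correction: keep the non-empty descriptions instead.
    static String fixed(List<String> descriptions) {
        return String.join(", ",
            descriptions.stream().map(s -> (CharSequence) s).filter(s -> s.length() > 0)::iterator);
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList("task-a", "", "task-b");
        System.out.println("[" + buggy(descriptions) + "]"); // []
        System.out.println("[" + fixed(descriptions) + "]"); // [task-a, task-b]
    }
}
```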

In elastic#28941 we changed the computation of cluster state task descriptions but this introduced a bug in which we only log the empty descriptions (rather than the non-empty ones). This change fixes that. Backport of elastic#34182.

DaveCTurner added the :Distributed Indexing/Recovery, v7.0.0, v6.3.0 and v5.6.9 labels Mar 8, 2018

DaveCTurner requested a review from bleskes March 8, 2018 14:12

bleskes approved these changes Mar 9, 2018

DaveCTurner merged commit 033a83b into elastic:master Mar 9, 2018

DaveCTurner mentioned this pull request Mar 9, 2018

Use RUNTIME_JAVA_HOME for benchmarks #28961

Closed

clintongormley added the >non-issue label Apr 18, 2018

DaveCTurner mentioned this pull request Oct 1, 2018

Fix logging of cluster state update descriptions #34182

Merged

DaveCTurner mentioned this pull request Oct 2, 2018

Fix logging of cluster state update descriptions #34243

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

DaveCTurner deleted the 2018-03-08-describeTasks-using-String-join branch July 23, 2022 10:44
