Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and #28920
includes a report that this operation sometimes takes a long time.

This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and elastic#28920
includes a report that this operation sometimes takes a long time.
@DaveCTurner DaveCTurner added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.3.0 v5.6.9 labels Mar 8, 2018
@DaveCTurner DaveCTurner requested a review from bleskes March 8, 2018 14:12
@DaveCTurner
Copy link
Contributor Author

NB I haven't been able to reproduce the slowness reported in #28920, because I don't have easy access to a partitionable cluster with 10k shards. @danielmitterdorfer do you have any opinions about this change and/or good ideas about how to quantify the benefit?

If this change looks ok, I can ask the OP of #28920 to try it.

@danielmitterdorfer
Copy link
Member

do you have any opinions about this change

I am not familiar with this part of the code. But if this is the bottleneck then your change makes sense to me.

good ideas about how to quantify the benefit?

I think this would be a good candidate for a microbenchmark? We have some infrastructure for that in place in the benchmarks module. You can also have a look at an example benchmark that I did recently in the context of #28702. Happy to help out here as well and we can also run it in our microbenchmarking environment (which is tuned to reduce measurement noise).

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return s1 + ", " + s2;
}
}).orElse("");
return String.join(", ", tasks.stream().map(t -> (CharSequence)t.toString()).filter(t -> t.length() == 0)::iterator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity - didn't T::toString work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, apparently Iterator<String> cannot be converted to Iterator<CharSequence>:

> Task :server:compileJava 
/Users/davidturner/src/elasticsearch-master/server/src/main/java/org/elasticsearch/cluster/ClusterStateTaskExecutor.java:59: error: no suitable method found for join(String,tasks.stre[...]rator)
        return String.join(", ", tasks.stream().map(T::toString).filter(t -> t.length() == 0)::iterator);
                     ^
    method String.join(CharSequence,CharSequence...) is not applicable
      (varargs mismatch; CharSequence is not a functional interface
          multiple non-overriding abstract methods found in interface CharSequence)
    method String.join(CharSequence,Iterable<? extends CharSequence>) is not applicable
      (argument mismatch; bad return type in method reference
          Iterator<String> cannot be converted to Iterator<CharSequence>)

@DaveCTurner
Copy link
Contributor Author

Thanks @danielmitterdorfer for the pointer to the benchmarks module, that's just what I was after. I just overwrote the existing benchmark with this one: https://gist.github.com/DaveCTurner/de8763bc791d860e4fb0c9a9f98df7cd. At 10,000 tasks, each with a 100-byte description, the improvement is from ~500ms to ~120µs:

Result "org.elasticsearch.benchmark.routing.allocation.AllocationBenchmark.measureStreamReduce":
  578.282 ±(99.9%) 14.884 ms/op [Average]
  (min, avg, max) = (534.191, 578.282, 633.771), stdev = 22.278
  CI (99.9%): [563.398, 593.166] (assumes normal distribution)

...


Result "org.elasticsearch.benchmark.routing.allocation.AllocationBenchmark.measureStringJoin":
  0.119 ±(99.9%) 0.004 ms/op [Average]
  (min, avg, max) = (0.114, 0.119, 0.138), stdev = 0.006
  CI (99.9%): [0.115, 0.123] (assumes normal distribution)


# Run complete. Total time: 00:02:14

Benchmark                                Mode  Cnt    Score    Error  Units
AllocationBenchmark.measureStreamReduce  avgt   30  578.282 ± 14.884  ms/op
AllocationBenchmark.measureStringJoin    avgt   30    0.119 ±  0.004  ms/op

This seems worth doing.

@danielmitterdorfer
Copy link
Member

You're welcome. That's quite a significant difference indeed.

@DaveCTurner DaveCTurner merged commit 033a83b into elastic:master Mar 9, 2018
DaveCTurner added a commit that referenced this pull request Mar 9, 2018
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and #28920
includes a report that this operation sometimes takes a long time.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 9, 2018
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and elastic#28920
includes a report that this operation sometimes takes a long time.
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this pull request Mar 10, 2018
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and elastic#28920
includes a report that this operation sometimes takes a long time.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 12, 2018
* master: (28 commits)
  Maybe die before failing engine (elastic#28973)
  Remove special handling for _all in nodes info
  Remove Booleans use from XContent and ToXContent (elastic#28768)
  Update Gradle Testing Docs (elastic#28970)
  Make primary-replica resync failures less lenient (elastic#28534)
  Remove temporary file 10_basic.yml~
  Use different pipeline id in test. (pipelines do not get removed between tests extending from ESIntegTestCase)
  Use fixture to test the repository-gcs plugin (elastic#28788)
  Use String.join() to describe a list of tasks (elastic#28941)
  Fixed incorrect test try-catch statement
  Plugins: Consolidate plugin and module loading code (elastic#28815)
  percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query.
  Build: Remove rest tests on archive distribution projects (elastic#28952)
  Remove FastStringReader in favor of vanilla StringReader (elastic#28944)
  Remove FastCharArrayReader and FastCharArrayWriter (elastic#28951)
  Continue registering pipelines after one pipeline parse failure. (elastic#28752)
  Build: Fix ability to ignore when no tests are run (elastic#28930)
  [rest-api-spec] update doc link for /_rank_eval
  Switch XContentBuilder from BytesStreamOutput to ByteArrayOutputStream (elastic#28945)
  Factor UnknownNamedObjectException into its own class (elastic#28931)
  ...
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 1, 2018
In elastic#28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This PR fixes that.
DaveCTurner added a commit that referenced this pull request Oct 2, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
DaveCTurner added a commit that referenced this pull request Oct 2, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 2, 2018
In elastic#28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that. Backport of elastic#34182.
DaveCTurner added a commit that referenced this pull request Oct 2, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that. Backport of #34182.
kcm pushed a commit that referenced this pull request Oct 30, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
@DaveCTurner DaveCTurner deleted the 2018-03-08-describeTasks-using-String-join branch July 23, 2022 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >non-issue v5.6.9 v6.3.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants