Shortcut simple patterns ending in `*` #43904

DaveCTurner · 2019-07-03T07:48:34Z

When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.
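
A rough sketch of the two shortcuts described above, for illustration only. It is not the code merged in this PR: the class name `SimpleMatchSketch`, the method name `matches`, and the naive fallback for patterns with several wildcards are assumptions made to keep the example self-contained.

```java
// Illustrative sketch only; names and the fallback strategy are hypothetical.
final class SimpleMatchSketch {

    static boolean matches(String pattern, String str) {
        if (pattern == null || str == null) {
            return false;
        }
        int first = pattern.indexOf('*');
        if (first == -1) {
            // No wildcard at all: plain equality.
            return pattern.equals(str);
        }
        if (first == pattern.length() - 1) {
            // The only wildcard is the trailing one (e.g. "node-name-*"):
            // a single in-place prefix comparison, no substring allocated.
            return str.regionMatches(0, pattern, 0, first);
        }
        if (first == 0 && pattern.indexOf('*', 1) == -1) {
            // The only wildcard is the leading one (e.g. "*-hot"):
            // compare the tail of str in place. A negative offset
            // (str shorter than the suffix) makes regionMatches return false.
            int suffixLength = pattern.length() - 1;
            return str.regionMatches(str.length() - suffixLength, pattern, 1, suffixLength);
        }
        // Naive fallback for everything else: the literal prefix must match,
        // then the wildcard may absorb any number of characters.
        if (!str.regionMatches(0, pattern, 0, first)) {
            return false;
        }
        String rest = pattern.substring(first + 1);
        for (int i = first; i <= str.length(); i++) {
            if (matches(rest, str.substring(i))) {
                return true;
            }
        }
        return false;
    }
}
```

With a filter such as `node-name-*`, the trailing-wildcard branch returns after one prefix comparison and never reaches the recursive fallback.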

Microbenchmark results before this change:

Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
  1113.839 ±(99.9%) 6.338 ns/op [Average]
  (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
  CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
  433.190 ±(99.9%) 0.644 ns/op [Average]
  (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
  CI (99.9%): [432.546, 433.833] (assumes normal distribution)

The microbenchmark in question was:

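import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.util.concurrent.TimeUnit;

@Fork(3)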
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@SuppressWarnings("unused") //invoked by benchmarking framework
public class RegexStartsWithBenchmark {

    private static final String testString = "abcdefghijklmnopqrstuvwxyz";
    private static final String[] patterns;

    static {
        patterns = new String[testString.length() + 1];
        for (int i = 0; i <= testString.length(); i++) {
            patterns[i] = testString.substring(0, i) + "*";
        }
    }

    @Benchmark
    public void performSimpleMatch() {
        for (int i = 0; i < patterns.length; i++) {
            Regex.simpleMatch(patterns[i], testString);
        }
    }
}
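
For a rough sense of scale, the two averages reported above can be turned into a speedup factor and a per-match cost. The sketch below only does that arithmetic; the class and method names are hypothetical, and the per-match figures are averages over all prefix lengths in the benchmark loop, not measurements of any single pattern.

```java
// Arithmetic on the averages quoted in the description; names are hypothetical.
final class BenchmarkArithmetic {

    // Average ns/op before and after the change, copied from the results above.
    static final double BEFORE_NS_PER_OP = 1113.839;
    static final double AFTER_NS_PER_OP = 433.190;

    // Each benchmark invocation loops over testString.length() + 1 patterns.
    static final int MATCHES_PER_OP = "abcdefghijklmnopqrstuvwxyz".length() + 1;

    static double speedup() {
        return BEFORE_NS_PER_OP / AFTER_NS_PER_OP;
    }

    static double nanosPerMatch(double nanosPerOp) {
        return nanosPerOp / MATCHES_PER_OP;
    }

    public static void main(String[] args) {
        System.out.printf("speedup: %.2fx%n", speedup());
        System.out.printf("per match before: %.1f ns%n", nanosPerMatch(BEFORE_NS_PER_OP));
        System.out.printf("per match after: %.1f ns%n", nanosPerMatch(AFTER_NS_PER_OP));
    }
}
```

That works out to roughly a 2.6x improvement, or about 41 ns down to about 16 ns per `simpleMatch` call averaged over the 27 patterns.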


elasticmachine · 2019-07-03T07:48:37Z

Pinging @elastic/es-core-infra

server/src/main/java/org/elasticsearch/common/regex/Regex.java

original-brownbear · 2019-07-03T08:14:39Z

server/src/test/java/org/elasticsearch/common/regex/RegexTests.java

    }
+
+    public void testSimpleMatch() {
+        for (int i = 0; i < 100000; i++) {


Leftover from benchmarking experiments?

Meh 100k iterations is still only 300ms :)

Reduced in 489e82b.

henningandersen

Looking good, but I wonder if we could get rid of more allocations (if that was the culprit?), see comments.

henningandersen · 2019-07-03T08:57:26Z

server/src/main/java/org/elasticsearch/common/regex/Regex.java

            return false;
        }
+        if (firstIndex == pattern.length() - 1) {
+            return str.startsWith(pattern.substring(0, firstIndex));


It is not entirely clear to me where the performance issue comes from, but my guess is that the substring allocations in the recursion below are part of it. I wonder if we should try to avoid the last substring call by using regionMatches?

TIL regionMatches is the name of the function I was trying to find. Yes, this shaves off a bit more time. I've updated the PR description with new benchmark results.

henningandersen · 2019-07-03T08:59:15Z

server/src/main/java/org/elasticsearch/common/regex/Regex.java

+            return str.startsWith(pattern.substring(0, firstIndex));
+        }
        return (str.length() >= firstIndex &&
                pattern.substring(0, firstIndex).equals(str.substring(0, firstIndex)) &&


Depending on the outcome of using regionMatches above, I would suggest also using it here (or str.startsWith if that is deemed faster).
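
To make the suggestion in these two review comments concrete, here is a small sketch comparing the prefix checks quoted in the diffs with a `String#regionMatches` call. The class and method names are hypothetical; the two substring-based bodies are copied from the diff context above, and the third is one way the replacement could look, not necessarily what was merged.

```java
// Hypothetical helper class contrasting the prefix checks discussed in review.
final class PrefixCheckSketch {

    // Trailing-wildcard shortcut as first proposed: allocates one prefix string.
    static boolean viaStartsWith(String pattern, String str, int firstIndex) {
        return str.startsWith(pattern.substring(0, firstIndex));
    }

    // Existing general-case check: explicit length guard plus two substrings.
    static boolean viaSubstringEquals(String pattern, String str, int firstIndex) {
        return str.length() >= firstIndex
            && pattern.substring(0, firstIndex).equals(str.substring(0, firstIndex));
    }

    // Compares the first firstIndex characters in place. No substring is
    // created, and the call returns false when str is shorter than
    // firstIndex, so no separate length guard is needed.
    static boolean viaRegionMatches(String pattern, String str, int firstIndex) {
        return str.regionMatches(0, pattern, 0, firstIndex);
    }
}
```

All three return the same answer for any `firstIndex` between 0 and `pattern.length()`; only the number of temporary strings differs.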

henningandersen

LGTM

original-brownbear

LGTM2


DaveCTurner added the >enhancement, :Core/Infra/Core, v8.0.0, and v7.3.0 labels on Jul 3, 2019

DaveCTurner requested review from henningandersen and original-brownbear July 3, 2019 07:48

Comment was a word

01ef72c

original-brownbear reviewed Jul 3, 2019

server/src/main/java/org/elasticsearch/common/regex/Regex.java (outdated)

DaveCTurner added 2 commits July 3, 2019 09:04

startsWith() can take an int

1cfa3e1

Bah doesn't work

589e8fe

original-brownbear reviewed Jul 3, 2019

DaveCTurner added 2 commits July 3, 2019 09:18

Fewer test iterations

489e82b

String#regionMatches

e93c1da

henningandersen reviewed Jul 3, 2019

henningandersen approved these changes Jul 3, 2019

original-brownbear approved these changes Jul 3, 2019

DaveCTurner merged commit e514966 into elastic:master Jul 3, 2019

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

DaveCTurner deleted the 2019-07-03-regex-startswith branch July 23, 2022 10:44
