@DaveCTurner DaveCTurner commented Jul 3, 2019

When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*`, I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.
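For illustration, here is a minimal sketch of the kind of fast paths described above. It is a hypothetical standalone class (`SimpleMatchSketch` is an illustrative name, not the actual Elasticsearch `Regex` code) and assumes, as `Regex#simpleMatch` does, that `*` is the only wildcard and that `null` arguments never match. Patterns with a single trailing or leading `*` are answered with `String#regionMatches()`, which compares the relevant characters in place instead of allocating a substring; everything else falls back to a simple iterative glob matcher, which is a swapped-in technique and not the recursive substring-based approach used in Elasticsearch.

```java
/**
 * Hypothetical sketch of a "*"-only glob matcher with allocation-free fast
 * paths for "prefix*" and "*suffix". Not the actual Elasticsearch implementation.
 */
final class SimpleMatchSketch {

    private SimpleMatchSketch() {}

    static boolean simpleMatch(String pattern, String str) {
        if (pattern == null || str == null) {
            return false;
        }
        final int firstIndex = pattern.indexOf('*');
        if (firstIndex == -1) {
            // No wildcard at all: plain equality.
            return pattern.equals(str);
        }
        if (firstIndex == pattern.length() - 1) {
            // "prefix*": same result as str.startsWith(prefix), but compares the
            // regions directly instead of allocating pattern.substring(0, firstIndex).
            return str.regionMatches(0, pattern, 0, firstIndex);
        }
        if (firstIndex == 0 && pattern.indexOf('*', 1) == -1) {
            // "*suffix": same result as str.endsWith(suffix). If str is shorter than
            // the suffix the start offset is negative and regionMatches returns false.
            final int suffixLength = pattern.length() - 1;
            return str.regionMatches(str.length() - suffixLength, pattern, 1, suffixLength);
        }
        return globMatch(pattern, str);
    }

    /** General case: iterative matcher that backtracks to the most recent '*'. */
    private static boolean globMatch(String pattern, String str) {
        int p = 0;      // position in pattern
        int s = 0;      // position in str
        int starP = -1; // position of the most recent '*' in pattern
        int starS = 0;  // position in str where that '*' started matching
        while (s < str.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;
                starS = s;
            } else if (p < pattern.length() && pattern.charAt(p) == str.charAt(s)) {
                p++;
                s++;
            } else if (starP != -1) {
                // Let the most recent '*' absorb one more character and retry.
                p = starP + 1;
                s = ++starS;
            } else {
                return false;
            }
        }
        // Whatever remains of the pattern must consist only of wildcards.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

For example, `simpleMatch("node-name-*", "node-name-1")` takes the `prefix*` path and returns `true` without building any intermediate `String`.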

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      433.190 ±(99.9%) 0.644 ns/op [Average]
      (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
      CI (99.9%): [432.546, 433.833] (assumes normal distribution)
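As a rough reading of these two averages: the ratio gives the overall speedup, and dividing by the 27 `Regex.simpleMatch` calls made per operation (one per entry in `patterns` in the benchmark that follows) gives an approximate per-call cost. These are derived figures that ignore loop overhead, not additional measurements; `BenchmarkArithmetic` is a hypothetical helper name.

```java
import java.util.Locale;

/** Back-of-the-envelope arithmetic on the reported JMH averages (ns/op). */
final class BenchmarkArithmetic {

    private BenchmarkArithmetic() {}

    /** How many times faster the "after" average is than the "before" average. */
    static double speedup(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    /** Average cost per call, ignoring loop overhead, given the calls made per op. */
    static double nsPerCall(double nsPerOp, int callsPerOp) {
        return nsPerOp / callsPerOp;
    }

    public static void main(String[] args) {
        final double before = 1113.839; // ns/op before the change
        final double after = 433.190;   // ns/op with the change applied
        final int callsPerOp = 27;      // patterns.length: "*", "a*", ..., "abc...z*"
        System.out.printf(Locale.ROOT, "speedup: %.2fx%n", speedup(before, after));
        System.out.printf(Locale.ROOT, "per call: %.2f ns -> %.2f ns%n",
                nsPerCall(before, callsPerOp), nsPerCall(after, callsPerOp));
    }
}
```

Run as-is, `main` reports a speedup of about 2.57x, with the per-call cost falling from about 41.25 ns to about 16.04 ns.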

The microbenchmark in question was:

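    package org.elasticsearch.common.regex;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;

    import java.util.concurrent.TimeUnit;

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        // "*", "a*", "ab*", ..., "abcdefghijklmnopqrstuvwxyz*": 27 patterns, each with a single trailing wildcard
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            // Regex is org.elasticsearch.common.regex.Regex, in the same package
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }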
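The static initializer above builds 27 patterns, from a lone `*` up to `abcdefghijklmnopqrstuvwxyz*`, and every one of them is a prefix of `testString` followed by `*`, so this benchmark only exercises successful matches against patterns whose single wildcard is trailing. The sketch below rebuilds the same pattern set in plain Java and checks that; it uses `String#startsWith` as a stand-in for `Regex.simpleMatch`, which is an assumption that only holds for this trailing-wildcard shape. `BenchmarkPatterns` is a hypothetical name.

```java
/** Rebuilds the benchmark's pattern set and checks which inputs it exercises. */
final class BenchmarkPatterns {

    static final String TEST_STRING = "abcdefghijklmnopqrstuvwxyz";

    private BenchmarkPatterns() {}

    /** Same construction as the benchmark's static initializer. */
    static String[] buildPatterns() {
        final String[] patterns = new String[TEST_STRING.length() + 1];
        for (int i = 0; i <= TEST_STRING.length(); i++) {
            patterns[i] = TEST_STRING.substring(0, i) + "*";
        }
        return patterns;
    }

    /** Counts matching patterns, treating each "prefix*" as a startsWith check. */
    static int countMatches(String[] patterns) {
        int matches = 0;
        for (String pattern : patterns) {
            // Every pattern built above has exactly one '*', at the very end.
            final String prefix = pattern.substring(0, pattern.length() - 1);
            if (TEST_STRING.startsWith(prefix)) {
                matches++;
            }
        }
        return matches;
    }
}
```

All 27 patterns match, so non-matching inputs and patterns with a leading or interior `*` are not covered by this particular benchmark.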

When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*`, I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation.

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      502.942 ±(99.9%) 0.590 ns/op [Average]
      (min, avg, max) = (501.292, 502.942, 504.490), stdev = 0.883
      CI (99.9%): [502.351, 503.532] (assumes normal distribution)
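Set against the 433.190 ns/op reported above after the `String#regionMatches()` follow-up, this earlier figure lets the two steps be separated. A small sketch of that arithmetic, derived from the published averages rather than from any new measurement; the class and variable names are illustrative only.

```java
import java.util.Locale;

/** Splits the overall improvement into the two steps discussed in review. */
final class StepwiseImprovement {

    private StepwiseImprovement() {}

    /** Fractional reduction in average time going from {@code from} to {@code to}. */
    static double reduction(double from, double to) {
        return (from - to) / from;
    }

    public static void main(String[] args) {
        final double baseline = 1113.839;          // ns/op before the change
        final double trailingWildcardCase = 502.942; // ns/op with the special case only
        final double withRegionMatches = 433.190;    // ns/op after also using regionMatches
        System.out.printf(Locale.ROOT, "special case: %.1f%% lower ns/op%n",
                100 * reduction(baseline, trailingWildcardCase));
        System.out.printf(Locale.ROOT, "regionMatches: a further %.1f%% lower ns/op%n",
                100 * reduction(trailingWildcardCase, withRegionMatches));
    }
}
```

This prints roughly 54.8% for the special case on its own and a further 13.9% for the `regionMatches` step.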

The microbenchmark in question was:

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

    }

    public void testSimpleMatch() {
        for (int i = 0; i < 100000; i++) {
Contributor

Leftover from benchmarking experiments?

Contributor Author

Meh 100k iterations is still only 300ms :)

Contributor Author

Reduced in 489e82b.

Contributor

@henningandersen henningandersen left a comment

Looking good, but I wonder if we could get rid of more allocations (if that was the culprit?), see comments.

            return false;
        }
        if (firstIndex == pattern.length() - 1) {
            return str.startsWith(pattern.substring(0, firstIndex));
Contributor

It is not entirely clear to me where the performance issue comes from, but my guess is that the `substring` allocations in the recursion below are part of it. I wonder if we should try to avoid the last `substring` call by using `regionMatches`?

Contributor Author

TIL `regionMatches` is the name of the function I was trying to find. Yes, this shaves off a bit more time. I've updated the PR description with new benchmark results.

            return str.startsWith(pattern.substring(0, firstIndex));
        }
        return (str.length() >= firstIndex &&
            pattern.substring(0, firstIndex).equals(str.substring(0, firstIndex)) &&
Contributor

Depending on the outcome of using `regionMatches` above, I would suggest also using it here (or `str.startsWith` if that is deemed faster).
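The suggested rewrite can be checked in isolation. For a prefix length `n` between 0 and `pattern.length()`, the two-substring comparison in the diff, `str.startsWith(pattern.substring(0, n))`, and `str.regionMatches(0, pattern, 0, n)` give the same answer, including when `str` is shorter than `n`; only the last avoids allocating substrings. A hypothetical helper class (`PrefixCheck` is an illustrative name) comparing the three forms:

```java
/**
 * Three equivalent ways to test whether str starts with the first n chars of
 * pattern. Precondition (assumed by all three): 0 <= n <= pattern.length().
 */
final class PrefixCheck {

    private PrefixCheck() {}

    /** The form in the diff: allocates two substrings. */
    static boolean viaSubstrings(String pattern, String str, int n) {
        return str.length() >= n && pattern.substring(0, n).equals(str.substring(0, n));
    }

    /** Allocates one substring (the pattern's prefix). */
    static boolean viaStartsWith(String pattern, String str, int n) {
        return str.startsWith(pattern.substring(0, n));
    }

    /** Compares the regions in place; returns false if str is shorter than n. */
    static boolean viaRegionMatches(String pattern, String str, int n) {
        return str.regionMatches(0, pattern, 0, n);
    }
}
```

Which of `regionMatches` and `startsWith` is faster in practice is exactly the question raised above; the sketch only shows that swapping one for the other does not change the result.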

Contributor

@henningandersen henningandersen left a comment

LGTM

Contributor

@original-brownbear original-brownbear left a comment

LGTM2

@DaveCTurner DaveCTurner merged commit e514966 into elastic:master Jul 3, 2019
DaveCTurner added a commit that referenced this pull request Jul 3, 2019
@DaveCTurner DaveCTurner deleted the 2019-07-03-regex-startswith branch July 23, 2022 10:44