Skip to content

Conversation

@spinscale
Copy link
Contributor

@spinscale spinscale commented Feb 1, 2019

The nightly benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826

Benchmark comparison (on my notebook):

Before:

RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   30  140,989 ± 14,582  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   30   11,981 ±  0,066  ns/op

final results comparing against the joda implementations (all benchmarks taken on my osx notebook, so take with a grain of salt)

Benchmark                                                 Mode  Cnt    Score    Error  Units
RoundingBenchmark.timeIntervalRoundingJava                avgt   30  226,556 ±  1,931  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   30  328,073 ±  2,480  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   30   88,096 ±  1,887  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   30  358,745 ± 22,062  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   30  133,689 ±  0,774  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   30  357,008 ± 19,957  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   30   11,695 ±  0,040  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   30   11,905 ±  0,056  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   30    3,189 ±  0,021  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   30    5,465 ±  0,026  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   30    4,509 ±  0,059  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   30   31,511 ±  0,414  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   30    2,991 ±  0,010  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   30    6,456 ±  0,040  ns/op

I also added a duelling test class that runs against the old implementation to see if there are any inconsistencies. I ran this class a couple of million times on my linux without failure.

The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826
@spinscale spinscale added >non-issue :Core/Infra/Core Core issues without another label v7.0.0 labels Feb 1, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@spinscale spinscale added the WIP label Feb 2, 2019
current results show slowdowns in quarter of year and year of century
parsing, the rest is faster/similar in this case.

RoundingBenchmark.timeIntervalRoundingJava                avgt   10  227,639 ± 12,084  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   10  329,288 ± 15,914  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   10   87,215 ±  5,923  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   10  370,219 ± 73,419  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   10  135,884 ±  3,506  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   10  327,793 ± 27,550  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   10   11,724 ±  0,266  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   10   11,785 ±  0,069  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   10   57,007 ±  1,023  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   10   56,605 ±  0,725  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   10   57,992 ±  1,289  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   10   29,979 ±  0,574  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   10   56,557 ±  0,684  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   10    6,344 ±  0,035  ns/op
Use joda time code which solely uses milliseconds and never converts to
any other joda time field, thus being super fast.

Benchmark                                               Mode  Cnt  Score   Error  Units
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava  avgt   10  3,378 ± 0,036  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda  avgt   10  6,460 ± 0,027  ns/op
Benchmark                                                 Mode  Cnt    Score    Error  Units
RoundingBenchmark.timeIntervalRoundingJava                avgt   30  223,906 ±  1,248  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   30  337,049 ± 14,682  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   30   85,784 ±  1,719  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   30  364,764 ± 19,697  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   30  133,247 ±  1,022  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   30  356,372 ± 17,340  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   30   11,778 ±  0,044  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   30   11,922 ±  0,060  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   30   52,529 ±  3,865  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   30   61,306 ±  1,581  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   30   22,152 ±  0,164  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   30   30,734 ±  0,099  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   30    3,352 ±  0,021  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   30    6,453 ±  0,072  ns/op
@spinscale spinscale removed the WIP label Feb 2, 2019
due to object creation in java time

Benchmark                                             Mode  Cnt   Score   Error  Units
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava  avgt   30  18,864 ± 0,636  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda  avgt   30   5,597 ± 0,117  ns/op
now java is same or faster everywhere

Benchmark                                                 Mode  Cnt    Score   Error  Units
RoundingBenchmark.timeIntervalRoundingJava                avgt   10  228,960 ± 4,507  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                avgt   10  324,240 ± 1,546  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava  avgt   10   87,284 ± 0,857  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda  avgt   10  358,485 ± 4,671  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava            avgt   10  137,124 ± 3,098  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda            avgt   10  327,330 ± 2,473  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava       avgt   10   11,772 ± 0,088  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda       avgt   10   11,907 ± 0,132  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava      avgt   10    3,538 ± 0,037  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda      avgt   10    5,567 ± 0,145  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava    avgt   10    4,872 ± 0,126  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda    avgt   10   30,843 ± 2,215  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava    avgt   10    3,382 ± 0,071  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda    avgt   10    6,466 ± 0,065  ns/op
@spinscale spinscale force-pushed the 1902-fix-utc-aggs-performance branch from 9cfb69a to 59db0ae Compare February 2, 2019 23:57
@spinscale
Copy link
Contributor Author

spinscale commented Feb 3, 2019

rally run of nyc_taxis against a recent master commit (on a linux desktop machine)

|   All |                   Min Throughput |       autohisto_agg |      1.2 |  ops/s |
|   All |                Median Throughput |       autohisto_agg |     1.21 |  ops/s |
|   All |                   Max Throughput |       autohisto_agg |     1.22 |  ops/s |
|   All |          50th percentile latency |       autohisto_agg |  33052.1 |     ms |
|   All |          90th percentile latency |       autohisto_agg |  45437.1 |     ms |
|   All |          99th percentile latency |       autohisto_agg |  48204.9 |     ms |
|   All |         100th percentile latency |       autohisto_agg |  48514.8 |     ms |
|   All |     50th percentile service time |       autohisto_agg |  812.235 |     ms |
|   All |     90th percentile service time |       autohisto_agg |  816.315 |     ms |
|   All |     99th percentile service time |       autohisto_agg |  821.443 |     ms |
|   All |    100th percentile service time |       autohisto_agg |  821.872 |     ms |
|   All |                       error rate |       autohisto_agg |        0 |      % |
|   All |                   Min Throughput |  date_histogram_agg |     1.22 |  ops/s |
|   All |                Median Throughput |  date_histogram_agg |     1.22 |  ops/s |
|   All |                   Max Throughput |  date_histogram_agg |     1.23 |  ops/s |
|   All |          50th percentile latency |  date_histogram_agg |  32331.2 |     ms |
|   All |          90th percentile latency |  date_histogram_agg |  44873.6 |     ms |
|   All |          99th percentile latency |  date_histogram_agg |  47697.7 |     ms |
|   All |         100th percentile latency |  date_histogram_agg |  48008.8 |     ms |
|   All |     50th percentile service time |  date_histogram_agg |  816.826 |     ms |
|   All |     90th percentile service time |  date_histogram_agg |  822.268 |     ms |
|   All |     99th percentile service time |  date_histogram_agg |  828.293 |     ms |
|   All |    100th percentile service time |  date_histogram_agg |  830.532 |     ms |
|   All |                       error rate |  date_histogram_agg |        0 |      % |

same run against this branch

|   All |                   Min Throughput |       autohisto_agg |         2 |  ops/s |
|   All |                Median Throughput |       autohisto_agg |         2 |  ops/s |
|   All |                   Max Throughput |       autohisto_agg |         2 |  ops/s |
|   All |          50th percentile latency |       autohisto_agg |   466.499 |     ms |
|   All |          90th percentile latency |       autohisto_agg |   489.928 |     ms |
|   All |          99th percentile latency |       autohisto_agg |   503.417 |     ms |
|   All |         100th percentile latency |       autohisto_agg |    512.86 |     ms |
|   All |     50th percentile service time |       autohisto_agg |   466.408 |     ms |
|   All |     90th percentile service time |       autohisto_agg |   489.372 |     ms |
|   All |     99th percentile service time |       autohisto_agg |   503.324 |     ms |
|   All |    100th percentile service time |       autohisto_agg |   512.777 |     ms |
|   All |                       error rate |       autohisto_agg |         0 |      % |
|   All |                   Min Throughput |  date_histogram_agg |         2 |  ops/s |
|   All |                Median Throughput |  date_histogram_agg |         2 |  ops/s |
|   All |                   Max Throughput |  date_histogram_agg |         2 |  ops/s |
|   All |          50th percentile latency |  date_histogram_agg |    449.92 |     ms |
|   All |          90th percentile latency |  date_histogram_agg |   465.649 |     ms |
|   All |          99th percentile latency |  date_histogram_agg |    483.38 |     ms |
|   All |         100th percentile latency |  date_histogram_agg |   485.472 |     ms |
|   All |     50th percentile service time |  date_histogram_agg |   449.809 |     ms |
|   All |     90th percentile service time |  date_histogram_agg |    465.46 |     ms |
|   All |     99th percentile service time |  date_histogram_agg |   483.307 |     ms |
|   All |    100th percentile service time |  date_histogram_agg |   485.251 |     ms |
|   All |                       error rate |  date_histogram_agg |         0 |      % |

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments. The macrobenchmark results look promising indeed. I did not check the parts that you took from Joda time but I wonder whether we need to attribute this more clearly.

private final ZoneId zoneId = ZoneId.of("Europe/Amsterdam");
private final DateTimeZone timeZone = DateUtils.zoneIdToDateTimeZone(zoneId);

private final long timestamp = 1548879021354L;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a constant value (+ declaring it as final) might allow for additional compiler optimizations that are unrealistic. We also see that some of the results are very low. For example, timeUnitRoundingUtcYearOfCenturyJava takes just 3 nanoseconds which is only around 10 CPU cycles on a 3GHz core. Given the amount of work that the respective method is doing (it's basic arithmetic) it does not seem completely unreasonable but if we just consider the microbenchmark results this would still need a bit further investigation. However, the macrobenchmark results that you posted look fine and give me confidence that performance has improved substantially indeed.


/**
* This rounds down the supplied milliseconds since the epoch down to the next unit. In order to retain performance this method
* should be as fast as possiblee and not try to convert dates to java-time objects if possible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: typo possiblee -> possible.

}

/*
* begin of code that is partially copied from the joda time implementation in order to make calculations about utc rounding much
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to attribute this even more clearly?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left mostly suggestions about the names of things. I think we should have more thorough tests for the copied code, particularly around the boundaries (e.g. month boundaries ±1ms) and with negative numbers.

* @param utcMillis the milliseconds since the epoch
* @return The milliseconds since the epoch rounded down to the beginning of the year
*/
public static long getFirstDayOfYearMillis(long utcMillis) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe roundYear() to align with the naming convention above?

: ((i < 304 * 84375) ? 10 : (i < 334 * 84375) ? 11 : 12)));
}

private static long getYearMillis(int year) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be inlined?

private static final long APPROX_MILLIS_AT_EPOCH_DIVIDED_BY_TWO = (1970L * MILLIS_PER_YEAR) / 2;

// see org.joda.time.chrono.BasicChronology
private static int getYear(long instant) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename instant to utcMillis to align with the arguments elsewhere and to clarify what it means?

}

// see org.joda.time.chrono.BasicGJChronology
private static int getMonthOfYear(long millis, int year) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think millis here is also a utcMillis.

public static long roundQuarterOfYear(long utcMillis) {
int year = DateUtils.getYear(utcMillis);
int month = DateUtils.getMonthOfYear(utcMillis, year);
return DateUtils.of(year, Month.of(month).firstMonthOfQuarter().getValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this constructs a Month object simply to round it down to a multiple of 3. I don't think it loses much clarity to do this by hand instead.

* @return the milliseconds since the epoch of the first of january at midnight of the specified year
*/
// see org.joda.time.chrono.GregorianChronology
private static long calculateFirstDayOfYearMillis(int year) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the mention of Day in this method's name to be a little confusing. Maybe utcMillisAtStartOfYear?

@jasontedor jasontedor removed the v7.0.0 label Feb 6, 2019
@spinscale spinscale merged commit b955220 into elastic:master Feb 11, 2019
spinscale added a commit to spinscale/elasticsearch that referenced this pull request Feb 11, 2019
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826
spinscale added a commit to spinscale/elasticsearch that referenced this pull request Feb 11, 2019
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes elastic#37826
spinscale added a commit that referenced this pull request Feb 11, 2019
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826
spinscale added a commit that referenced this pull request Feb 11, 2019
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826
@jpountz
Copy link
Contributor

jpountz commented Jul 5, 2019

@spinscale I'm assuming there is nothing left to backport and removed the backport pending label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drop in indexing performance and increase in agg latency after mappings/aggs switch to java time in #36363

8 participants