Restore date aggregation performance in UTC case #38221
The benchmarks showed a sharp decrease in aggregation performance for the UTC case. This commit uses the same calculation as joda time, which requires no conversion into any java time object. Also, the check for a fixed offset has been moved into the ctor to reduce the need for runtime calculations. The same goes for the unit's length in milliseconds. Closes elastic#37826
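As a rough illustration of the idea described above (not the code from this PR), the sketch below does the fixed-offset check and the unit length once in the constructor, then rounds to the start of the day with plain `long` arithmetic. The class name `DayRoundingSketch` and its fields are hypothetical, and only day rounding is covered.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names are illustrative and not taken from Elasticsearch. */
final class DayRoundingSketch {
    // Length of the unit in milliseconds, computed once instead of per call.
    private static final long DAY_MILLIS = TimeUnit.DAYS.toMillis(1);

    private final ZoneId zone;
    private final boolean fixedOffset; // decided once in the constructor
    private final long offsetMillis;   // only meaningful when fixedOffset is true

    DayRoundingSketch(ZoneId zone) {
        this.zone = zone;
        this.fixedOffset = zone.getRules().isFixedOffset();
        this.offsetMillis = fixedOffset
            ? zone.getRules().getOffset(Instant.EPOCH).getTotalSeconds() * 1000L
            : 0L;
    }

    long round(long utcMillis) {
        if (fixedOffset) {
            // Fast path: no java.time objects are created.
            // floorMod keeps the result correct for timestamps before 1970.
            long local = utcMillis + offsetMillis;
            return local - Math.floorMod(local, DAY_MILLIS) - offsetMillis;
        }
        // Slow path for zones with transitions: let java.time do the work.
        return Instant.ofEpochMilli(utcMillis)
            .atZone(zone)
            .toLocalDate()
            .atStartOfDay(zone)
            .toInstant()
            .toEpochMilli();
    }
}
```

For UTC the fast path reduces to a single subtraction and a modulo, which is consistent with the single-digit nanosecond results reported further down.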
Pinging @elastic/es-core-infra
current results show slowdowns in quarter of year and year of century parsing, the rest is faster/similar in this case.

RoundingBenchmark.timeIntervalRoundingJava                  avgt  10  227,639 ± 12,084  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                  avgt  10  329,288 ± 15,914  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava    avgt  10   87,215 ±  5,923  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda    avgt  10  370,219 ± 73,419  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava              avgt  10  135,884 ±  3,506  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda              avgt  10  327,793 ± 27,550  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava         avgt  10   11,724 ±  0,266  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda         avgt  10   11,785 ±  0,069  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava        avgt  10   57,007 ±  1,023  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda        avgt  10   56,605 ±  0,725  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava      avgt  10   57,992 ±  1,289  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda      avgt  10   29,979 ±  0,574  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava      avgt  10   56,557 ±  0,684  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda      avgt  10    6,344 ±  0,035  ns/op
Use joda time code which solely uses milliseconds and never converts to any other joda time field, thus being super fast.

Benchmark                                                   Mode Cnt    Score    Error  Units
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava      avgt  10    3,378 ±  0,036  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda      avgt  10    6,460 ±  0,027  ns/op
Benchmark                                                   Mode Cnt    Score    Error  Units
RoundingBenchmark.timeIntervalRoundingJava                  avgt  30  223,906 ±  1,248  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                  avgt  30  337,049 ± 14,682  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava    avgt  30   85,784 ±  1,719  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda    avgt  30  364,764 ± 19,697  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava              avgt  30  133,247 ±  1,022  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda              avgt  30  356,372 ± 17,340  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava         avgt  30   11,778 ±  0,044  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda         avgt  30   11,922 ±  0,060  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava        avgt  30   52,529 ±  3,865  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda        avgt  30   61,306 ±  1,581  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava      avgt  30   22,152 ±  0,164  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda      avgt  30   30,734 ±  0,099  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava      avgt  30    3,352 ±  0,021  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda      avgt  30    6,453 ±  0,072  ns/op
due to object creation in java time

Benchmark                                                   Mode Cnt    Score    Error  Units
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava        avgt  30   18,864 ±  0,636  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda        avgt  30    5,597 ±  0,117  ns/op
now java is same or faster everywhere

Benchmark                                                   Mode Cnt    Score    Error  Units
RoundingBenchmark.timeIntervalRoundingJava                  avgt  10  228,960 ±  4,507  ns/op
RoundingBenchmark.timeIntervalRoundingJoda                  avgt  10  324,240 ±  1,546  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJava    avgt  10   87,284 ±  0,857  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitDayOfMonthJoda    avgt  10  358,485 ±  4,671  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJava              avgt  10  137,124 ±  3,098  ns/op
RoundingBenchmark.timeRoundingDateTimeUnitJoda              avgt  10  327,330 ±  2,473  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJava         avgt  10   11,772 ±  0,088  ns/op
RoundingBenchmark.timeUnitRoundingUtcDayOfMonthJoda         avgt  10   11,907 ±  0,132  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJava        avgt  10    3,538 ±  0,037  ns/op
RoundingBenchmark.timeUnitRoundingUtcMonthOfYearJoda        avgt  10    5,567 ±  0,145  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJava      avgt  10    4,872 ±  0,126  ns/op
RoundingBenchmark.timeUnitRoundingUtcQuarterOfYearJoda      avgt  10   30,843 ±  2,215  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJava      avgt  10    3,382 ±  0,071  ns/op
RoundingBenchmark.timeUnitRoundingUtcYearOfCenturyJoda      avgt  10    6,466 ±  0,065  ns/op
Force-pushed from 9cfb69a to 59db0ae.
rally run of same run against this branch
danielmitterdorfer
left a comment
I left a few comments. The macrobenchmark results look promising indeed. I did not check the parts that you took from Joda time but I wonder whether we need to attribute this more clearly.
private final ZoneId zoneId = ZoneId.of("Europe/Amsterdam");
private final DateTimeZone timeZone = DateUtils.zoneIdToDateTimeZone(zoneId);

private final long timestamp = 1548879021354L;
Using a constant value (+ declaring it as final) might allow for additional compiler optimizations that are unrealistic. We also see that some of the results are very low. For example, timeUnitRoundingUtcYearOfCenturyJava takes just 3 nanoseconds which is only around 10 CPU cycles on a 3GHz core. Given the amount of work that the respective method is doing (it's basic arithmetic) it does not seem completely unreasonable but if we just consider the microbenchmark results this would still need a bit further investigation. However, the macrobenchmark results that you posted look fine and give me confidence that performance has improved substantially indeed.

/**
 * This rounds down the supplied milliseconds since the epoch down to the next unit. In order to retain performance this method
 * should be as fast as possiblee and not try to convert dates to java-time objects if possible
Nit: typo possiblee -> possible.
}

/*
 * begin of code that is partially copied from the joda time implementation in order to make calculations about utc rounding much
Do we need to attribute this even more clearly?
DaveCTurner
left a comment
I left mostly suggestions about the names of things. I think we should have more thorough tests for the copied code, particularly around the boundaries (e.g. month boundaries ±1ms) and with negative numbers.
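One way such boundary tests could look, as a sketch under assumptions: probe every month boundary at -1 ms, 0 ms and +1 ms across a range of years that includes dates before 1970 (negative epoch millis), and compare a candidate rounding function against a slow java.time reference. `MonthBoundaryCheck`, `referenceRoundMonth` and `firstMismatch` are hypothetical names, not part of this PR.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.OptionalLong;
import java.util.function.LongUnaryOperator;

/** Hypothetical test helper; not taken from the Elasticsearch test suite. */
final class MonthBoundaryCheck {

    /** Slow reference: round down to the first of the month in UTC via java.time. */
    static long referenceRoundMonth(long utcMillis) {
        return Instant.ofEpochMilli(utcMillis)
            .atZone(ZoneOffset.UTC)
            .toLocalDate()
            .withDayOfMonth(1)
            .atStartOfDay(ZoneOffset.UTC)
            .toInstant()
            .toEpochMilli();
    }

    /**
     * Probes each month boundary in [fromYear, toYear] at -1 ms, 0 ms and +1 ms.
     * Returns the first timestamp where the candidate disagrees with the reference,
     * or an empty value when they agree on every probe.
     */
    static OptionalLong firstMismatch(LongUnaryOperator candidate, int fromYear, int toYear) {
        for (int year = fromYear; year <= toYear; year++) {
            for (int month = 1; month <= 12; month++) {
                long boundary = LocalDate.of(year, month, 1)
                    .atStartOfDay(ZoneOffset.UTC)
                    .toInstant()
                    .toEpochMilli();
                for (long delta = -1; delta <= 1; delta++) {
                    long probe = boundary + delta;
                    if (candidate.applyAsLong(probe) != referenceRoundMonth(probe)) {
                        return OptionalLong.of(probe);
                    }
                }
            }
        }
        return OptionalLong.empty();
    }
}
```

Starting the range well before 1970, for example `firstMismatch(candidate, 1900, 2100)`, exercises the negative-number case as well as leap-year Februaries.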
 * @param utcMillis the milliseconds since the epoch
 * @return The milliseconds since the epoch rounded down to the beginning of the year
 */
public static long getFirstDayOfYearMillis(long utcMillis) {
Maybe roundYear() to align with the naming convention above?
: ((i < 304 * 84375) ? 10 : (i < 334 * 84375) ? 11 : 12)));
}

private static long getYearMillis(int year) {
I think this can be inlined?
private static final long APPROX_MILLIS_AT_EPOCH_DIVIDED_BY_TWO = (1970L * MILLIS_PER_YEAR) / 2;

// see org.joda.time.chrono.BasicChronology
private static int getYear(long instant) {
Can we rename instant to utcMillis to align with the arguments elsewhere and to clarify what it means?
}

// see org.joda.time.chrono.BasicGJChronology
private static int getMonthOfYear(long millis, int year) {
I think millis here is also a utcMillis.
public static long roundQuarterOfYear(long utcMillis) {
    int year = DateUtils.getYear(utcMillis);
    int month = DateUtils.getMonthOfYear(utcMillis, year);
    return DateUtils.of(year, Month.of(month).firstMonthOfQuarter().getValue());
It looks like this constructs a Month object simply to round it down to a multiple of 3. I don't think it loses much clarity to do this by hand instead.
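For illustration, the by-hand version could be a single integer expression. This is a sketch with a hypothetical helper name (`QuarterMath.firstMonthOfQuarter`), assuming `month` is 1-based as returned by `getMonthOfYear`:

```java
/** Hypothetical helper; the name is illustrative and not from this PR. */
final class QuarterMath {
    /**
     * Maps a 1-based month (1..12) to the first month of its quarter (1, 4, 7 or 10).
     * Integer division truncates, so (month - 1) / 3 is the zero-based quarter index.
     */
    static int firstMonthOfQuarter(int month) {
        return ((month - 1) / 3) * 3 + 1;
    }
}
```

For every month from 1 to 12 this yields the same value as `Month.of(month).firstMonthOfQuarter().getValue()`, without going through the `Month` enum.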
 * @return the milliseconds since the epoch of the first of january at midnight of the specified year
 */
// see org.joda.time.chrono.GregorianChronology
private static long calculateFirstDayOfYearMillis(int year) {
I find the mention of Day in this method's name to be a little confusing. Maybe utcMillisAtStartOfYear?
@spinscale I'm assuming there is nothing left to backport and removed the backport pending label.
The nightly benchmarks showed a sharp decrease in aggregation performance for
the UTC case.
This commit uses the same calculation as joda time, which requires no
conversion into any java time object. Also, the check for a fixed offset
has been moved into the ctor to reduce the need for runtime calculations.
The same goes for the unit's length in milliseconds.
Closes #37826
Benchmark comparison (on my notebook):
Before:
final results comparing against the joda implementations (all benchmarks taken on my osx notebook, so take with a grain of salt)
I also added a duelling test class that runs against the old implementation to see if there are any inconsistencies. I ran this class a couple of million times on my linux without failure.
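The general shape of such a duelling test can be sketched as follows. This is not the test class added in the PR: `DuelSketch`, `countDisagreements` and the two day-rounding functions are hypothetical stand-ins for the old and new implementations.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Random;
import java.util.function.LongUnaryOperator;

/** Hypothetical duelling harness; names are illustrative only. */
final class DuelSketch {
    private static final long DAY_MILLIS = 86_400_000L;
    // Roughly 130 years on either side of the epoch, so negative values are covered.
    private static final long RANGE_MILLIS = 4_102_444_800_000L;

    /** Stand-in for the fast implementation: pure millisecond arithmetic. */
    static long roundDayArithmetic(long utcMillis) {
        return utcMillis - Math.floorMod(utcMillis, DAY_MILLIS);
    }

    /** Stand-in for the reference implementation: goes through java.time. */
    static long roundDayJavaTime(long utcMillis) {
        return Instant.ofEpochMilli(utcMillis).truncatedTo(ChronoUnit.DAYS).toEpochMilli();
    }

    /** Feeds the same random timestamps to both functions and counts how often they differ. */
    static long countDisagreements(LongUnaryOperator a, LongUnaryOperator b, long seed, int iterations) {
        return new Random(seed)
            .longs(iterations, -RANGE_MILLIS, RANGE_MILLIS)
            .filter(millis -> a.applyAsLong(millis) != b.applyAsLong(millis))
            .count();
    }
}
```

A call such as `DuelSketch.countDisagreements(DuelSketch::roundDayArithmetic, DuelSketch::roundDayJavaTime, 42L, 1_000_000)` should return 0 when the two agree. Passing the seed explicitly keeps any failure reproducible.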