-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Rework checking if a year is a leap year #60585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
This way is faster, saving about 8% on the microbenchmark that rounds to the nearest month. That is in the hot path for `date_histogram` which is a very popular aggregation so it seems worth it to at least try and speed it up a little.
| * Blindly believe the numbers that your microbenchmark produces but verify them by measuring e.g. with `-prof perfasm`. | ||
| * Run more threads than your number of CPU cores (in case you run multi-threaded microbenchmarks). | ||
| * Look only at the `Score` column and ignore `Error`. Instead take countermeasures to keep `Error` low / variance explainable. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote all this while I was working on this because I couldn't figure out how to get actual intel assembly instructions. It turns out that if you install the hsdis that they build as part of the openjdk you'll just get jvm opcodes. Those are neat, but not useful to me. Instead you need the hsdis from the "Free Code Manipulation Library". Which you have to build. And it isn't a simple as git clone foo && cd foo && ./configure && make && sudo make install. You need to do what I did.
I can separate this out into a separate change if either change is controversial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems helpful to me... I would appreciate instructions like this if I were getting started :)
server/src/test/java/org/elasticsearch/common/time/DateUtilsRoundingTests.java
Show resolved
Hide resolved
polyfractal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we tried this with different JVM versions and/or processors? This sorta feels like it might be very specific to particular JVMs, or how the JVM optimizes for different processors?
| * Blindly believe the numbers that your microbenchmark produces but verify them by measuring e.g. with `-prof perfasm`. | ||
| * Run more threads than your number of CPU cores (in case you run multi-threaded microbenchmarks). | ||
| * Look only at the `Score` column and ignore `Error`. Instead take countermeasures to keep `Error` low / variance explainable. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems helpful to me... I would appreciate instructions like this if I were getting started :)
Just the bundled one on. None of the assembly that it spits out is particularly strange looking so I expect it'd be ok. I'm not sure that my weekend project is worth a huge amount of digging. If we think it'd turn sour sometimes then I don't think it is worth merging. |
polyfractal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, sounds good to me. Easy enough to revert if it tanks the benchmarks or something unexpected, but likely harmless in the worst case and plus some speed in the best case :)
| if (year % 100 != 0) { | ||
| return true; | ||
| } | ||
| return ((year / 100) & 3) == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I was thinking about - do we see the difference because of the change from year % 400 to (year / 100) & 3, or because of unrolling the short circuit logic? And if it's only the former, does it make more sense to leave this as a one-liner and let the optimizer short circuit it as necessary? I'm more asking out of curiosity about how the optimizer works than out of a need to see this changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the short circuiting is actually the same here. At least, that is how I read the Java. I believe the speed is entirely around the change from (year % 400) == 0 to ((year / 100) & 3) == 0. We certainly could keep it as a one liner, but I think it as harder to read that way. Certainly harder for me to track down the disassembly chunks because the disassembler sticks line numbers in there. Not that you can truly believe them though.
This way is faster, saving about 8% on the microbenchmark that rounds to the nearest month. That is in the hot path for `date_histogram` which is a very popular aggregation so it seems worth it to at least try and speed it up a little.
This way is faster, saving about 8% on the microbenchmark that rounds to the nearest month. That is in the hot path for `date_histogram` which is a very popular aggregation so it seems worth it to at least try and speed it up a little. Co-authored-by: Elastic Machine <[email protected]>
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for
date_histogramwhich isa very popular aggregation so it seems worth it to at least try and
speed it up a little.