Compress geo-point field data. #4387
I think we should keep an option to not compress geo points, so they stay as fast as possible (at the cost of memory, as is the case now).
It'd be great to have more detail here like:
- You can update this online with the PUT mapping API, and it will cause newly added points to be reduced to this precision.
- Something about the memory trade-offs, with ballpark figures like you have in the commit message.
Agreed! Will do it shortly.
This commit makes it possible to trade precision for memory when storing geo points. This new field data impl accepts a `precision` parameter that controls the maximum expected error for storing coordinates. This option can be updated on a live index with the PUT mapping API. Default precision is 1cm, which requires 8 bytes per geo-point (50% memory saving compared to using 2 doubles). Close elastic#4386
Here is a new iteration of this pull request. Sorry to those who commented on the previous one, but I did so much refactoring that I squashed the commits to make it more readable. Here are highlights of this new commit:
However, I kept the default precision at 1cm instead of 3m as suggested by David (which would store geo points on 6 bytes instead of 8). The reason is that, when using 6 bytes, the field data impl is a bit slower because bit-packing routines need to run under the hood, while at 8 bytes, data is stored in 2 int[] arrays. To get some speed back, one needs to configure precision to 1km (which translates to 4 bytes), in which case two short[] arrays will be used under the hood.
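To make the precision-to-bytes mapping above concrete, here is a rough back-of-the-envelope sketch. It is not the code from this pull request: it assumes each coordinate is quantized uniformly over the equatorial circumference, with a step no larger than the requested precision, and that each coordinate is rounded up to whole bytes. The constant `EARTH_EQUATOR_M` and both function names are hypothetical, chosen for illustration. Under those assumptions it reproduces the three figures quoted above (1cm, 3m, 1km).

```python
import math

# Assumption: approximate equatorial circumference of the Earth, in meters.
EARTH_EQUATOR_M = 40_075_017


def bits_per_coordinate(precision_m: float) -> int:
    """Smallest bit count whose quantization step along the equator
    is no larger than the requested precision (illustrative model only)."""
    return max(1, math.ceil(math.log2(EARTH_EQUATOR_M / precision_m)))


def bytes_per_point(precision_m: float) -> int:
    """Two coordinates (lat, lon), each rounded up to whole bytes."""
    return 2 * math.ceil(bits_per_coordinate(precision_m) / 8)


if __name__ == "__main__":
    for label, meters in [("1cm", 0.01), ("3m", 3.0), ("1km", 1000.0)]:
        print(label, bits_per_coordinate(meters), "bits/coord,",
              bytes_per_point(meters), "bytes/point")
```

In this model, 1cm and 1km land on 32 and 16 bits per coordinate, which line up with int and short widths, while 3m lands on 24 bits, which does not match a primitive width. That is consistent with the bit-packing overhead described for the 6-byte case.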
Awesome explanation. It makes a lot of sense! Thanks.
…lastic#4387) Previously, phase X's `after` step had `X` as its associated phase. This causes confusion because we have only entered phase `X` once the `after` step is complete. Therefore, this refactor pushes the after's phase to be associated with the previous phase. This first phase is an exception. The first phase's `after` step is associated with the first phase (not some non-existent prior phase).
This commit makes it possible to trade precision for memory when storing geo points.
GeoPointFieldMapper now accepts a `precision` parameter that controls the
maximum expected error for storing coordinates. This option can be updated on
a live index with the PUT mapping API.
Default precision is 1cm, which requires 8 bytes per geo-point (50% memory
saving compared to using 2 doubles). With this default precision,
GeoDistanceSearchBenchmark reports times which are 12 to 23% slower than
the previous field data implementation.
Close #4386