Description
Today we use doubles in order to encode latitudes and longitudes when loading field data for geo points into memory. This is 16 bytes per geo point.
However, we could take advantage of the fact that latitudes and longitudes fall in a fixed range, and trade some precision for memory. In particular, I've been thinking about using a fixed-length encoding whose precision could be configured in the mappings:
    PUT /test
    {
      "mappings": {
        "test": {
          "properties": {
            "pin": {
              "type": "geo_point",
              "fielddata": {
                "format": "compressed",
                "precision": "1cm"
              }
            }
          }
        }
      }
    }

Here are some values of the number of bytes needed per geo point depending on the expected precision:
| Precision | Bytes per point | Size reduction |
|---|---|---|
| 1km | 4 | 75% |
| 3m | 6 | 62.5% |
| 1cm | 8 | 50% |
| 1mm | 10 | 37.5% |
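
As a rough check on the table above, here is a minimal sketch of how these sizes could be derived. It assumes the precision is the step size along the equator, that latitude and longitude each get the same whole number of bytes, and that the Earth's equatorial circumference is about 40,075 km. The class and method names are hypothetical and the actual implementation may compute this differently.

```java
// Hypothetical helper, not part of Elasticsearch: estimates the storage
// needed for a fixed-length geo point encoding at a given precision.
final class GeoPointSize {
    // Assumption: equatorial circumference of the Earth, in meters.
    static final double EARTH_CIRCUMFERENCE_M = 40_075_017.0;

    // Bytes needed for one coordinate so that one encoding step
    // is no larger than precisionMeters at the equator.
    static int bytesPerCoordinate(double precisionMeters) {
        double steps = EARTH_CIRCUMFERENCE_M / precisionMeters;
        int bits = (int) Math.ceil(Math.log(steps) / Math.log(2));
        return (bits + 7) / 8; // round up to whole bytes
    }

    // A geo point stores one latitude and one longitude.
    static int bytesPerPoint(double precisionMeters) {
        return 2 * bytesPerCoordinate(precisionMeters);
    }

    // Size reduction, in percent, relative to two 8-byte doubles.
    static double reductionPercent(double precisionMeters) {
        return 100.0 * (16 - bytesPerPoint(precisionMeters)) / 16;
    }
}
```

Under these assumptions, `GeoPointSize.bytesPerPoint(0.01)` gives 8 bytes for a precision of 1cm, matching the table row.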
I plan to use 1cm as the default, which I think is a good choice: it is accurate enough for most use cases and requires 4 bytes each for the latitude and the longitude, so both can be stored efficiently in an int[] array for best speed.
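
To illustrate the 4-bytes-per-coordinate case, here is a minimal sketch of one possible fixed-point encoding. It simply scales each coordinate linearly onto the `int` range; this scaling scheme and all names are assumptions for illustration, not necessarily the encoding that will be implemented.

```java
// Hypothetical sketch: store each coordinate as a 32-bit fixed-point value.
final class CompressedGeoPoint {
    // Map a latitude in [-90, 90] onto the int range.
    static int encodeLat(double lat) {
        return (int) Math.round(lat / 90.0 * Integer.MAX_VALUE);
    }

    // Map a longitude in [-180, 180] onto the int range.
    static int encodeLon(double lon) {
        return (int) Math.round(lon / 180.0 * Integer.MAX_VALUE);
    }

    // Inverse of encodeLat, up to the rounding error of the encoding.
    static double decodeLat(int encoded) {
        return encoded / (double) Integer.MAX_VALUE * 90.0;
    }

    // Inverse of encodeLon, up to the rounding error of the encoding.
    static double decodeLon(int encoded) {
        return encoded / (double) Integer.MAX_VALUE * 180.0;
    }
}
```

With this scaling, one longitude step is 180 / (2^31 - 1) degrees, or a little under 1cm at the equator, so a round trip through `encodeLon` and `decodeLon` is off by at most about half of that. The encoded values for many points can then be kept in two `int[]` arrays, one for latitudes and one for longitudes.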
The same encoding could be used to implement doc values support (#4207).
For now, the default format is going to remain exact and based on two double[] arrays, so you need to explicitly opt in to this format by configuring the field data format in the mappings.