Compress geo-point field data #4386

@jpountz

Today we use doubles to encode latitudes and longitudes when loading field data for geo points into memory. This costs 16 bytes per geo point (two 8-byte doubles).

However, we could take advantage of the fact that values are in a fixed range, and maybe trade some precision for memory. In particular, I've been thinking about using a fixed-length encoding whose precision could be set in the mappings:

PUT /test
{
    "mappings": {
        "test": {
            "properties": {
                "pin": {
                    "type": "geo_point",
                    "fielddata": {
                        "format": "compressed",
                        "precision": "1cm"
                    }
                }
            }
        }
    }
}

Here are some values of the number of bytes needed per geo point depending on the expected precision:

| Precision | Bytes per point | Size reduction |
|-----------|-----------------|----------------|
| 1km       | 4               | 75%            |
| 3m        | 6               | 62.5%          |
| 1cm       | 8               | 50%            |
| 1mm       | 10              | 37.5%          |
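
The byte counts above can be reproduced with a rough calculation. The sketch below is only an illustration, not the proposed implementation: it assumes precision is measured as the grid step along the equator, that latitude and longitude use the same number of bits, and that each coordinate is rounded up to whole bytes. The class `GeoPointSizing` and its methods are hypothetical names.

```java
// Hypothetical helper for illustration only; not Elasticsearch code.
final class GeoPointSizing {
    // Approximate length of the equator in meters (assumed reference distance).
    static final double EQUATOR_METERS = 40_075_016.686;

    /** Smallest bit count whose grid step along the equator is at most precisionMeters. */
    static int bitsPerCoordinate(double precisionMeters) {
        int bits = 1;
        while (EQUATOR_METERS / Math.pow(2, bits) > precisionMeters) {
            bits++;
        }
        return bits;
    }

    /** Bytes for one point: latitude plus longitude, each rounded up to whole bytes. */
    static int bytesPerPoint(double precisionMeters) {
        int bytesPerCoordinate = (bitsPerCoordinate(precisionMeters) + 7) / 8;
        return 2 * bytesPerCoordinate;
    }

    public static void main(String[] args) {
        double[] precisions = {1000.0, 3.0, 0.01, 0.001}; // 1km, 3m, 1cm, 1mm
        for (double p : precisions) {
            int bytes = bytesPerPoint(p);
            // Reduction relative to the 16 bytes used by two doubles.
            double reduction = 100.0 * (16 - bytes) / 16;
            System.out.println(p + " m: " + bytes + " bytes, " + reduction + "% smaller");
        }
    }
}
```

Under these assumptions, 1cm needs 32 bits per coordinate, which is why it lines up with plain int storage.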

I plan to use 1cm as the default precision for this format. I think that is a good choice: it should be accurate enough for most use cases and requires 4 bytes each for latitude and longitude, so they can be stored efficiently in int[] arrays for best speed.
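
To make the 4-bytes-per-coordinate idea concrete, here is a minimal sketch of one possible fixed-length encoding. It assumes a simple linear scaling of each coordinate onto the signed int range. The actual encoding is not specified in this issue, and `CompressedGeoPoints`, `encode` and `decode` are hypothetical names.

```java
// Hypothetical sketch: linear quantization of degrees into 32-bit ints.
final class CompressedGeoPoints {
    private final int[] lats; // one encoded latitude per point
    private final int[] lons; // one encoded longitude per point

    // Assumes both input arrays have the same length.
    CompressedGeoPoints(double[] latDegrees, double[] lonDegrees) {
        lats = new int[latDegrees.length];
        lons = new int[lonDegrees.length];
        for (int i = 0; i < latDegrees.length; i++) {
            lats[i] = encode(latDegrees[i], 90.0);
            lons[i] = encode(lonDegrees[i], 180.0);
        }
    }

    /** Maps [-maxAbs, maxAbs] degrees onto [-Integer.MAX_VALUE, Integer.MAX_VALUE]. */
    static int encode(double degrees, double maxAbs) {
        return (int) Math.round(degrees / maxAbs * Integer.MAX_VALUE);
    }

    /** Inverse of encode, exact up to the quantization step. */
    static double decode(int encoded, double maxAbs) {
        return encoded / (double) Integer.MAX_VALUE * maxAbs;
    }

    double lat(int index) {
        return decode(lats[index], 90.0);
    }

    double lon(int index) {
        return decode(lons[index], 180.0);
    }
}
```

With this scaling, the longitude step is 180 / (2^31 - 1) degrees, a little under 1cm along the equator, and each point takes 8 bytes instead of 16.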

The same encoding could be used to implement doc values support (#4207).

For now, the default format is going to remain exact and based on two double[] arrays, so you need to explicitly opt in to the compressed format by configuring the field data format in the mappings.
