
Conversation

@jpountz (Contributor) commented Dec 9, 2013

This commit makes it possible to trade precision for memory when storing geo points.
GeoPointFieldMapper now accepts a `precision` parameter that controls the
maximum expected error when storing coordinates. This option can be updated on
a live index with the PUT mapping API.

The default precision is 1cm, which requires 8 bytes per geo point (a 50% memory
saving compared to using two doubles). With this default precision,
GeoDistanceSearchBenchmark reports times that are 12 to 23% slower than
the previous field data implementation.

Close #4386
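The byte counts above can be sanity-checked with some back-of-the-envelope fixed-point arithmetic. This is only an illustrative sketch (the constant and helper below are not from the patch): at roughly 111 km per degree, a 1 cm error budget needs about 4 billion distinct steps per coordinate, i.e. 32 bits, so a lat/lon pair fits in 8 bytes versus 16 bytes for two raw doubles.

```java
public class GeoPointBudget {
    // Approximate meters spanned by one degree along a meridian.
    static final double METERS_PER_DEGREE = 40_075_000.0 / 360.0;

    // Bits needed per coordinate so the quantization step stays under precisionMeters.
    static int bitsPerCoordinate(double precisionMeters) {
        long steps = (long) Math.ceil(360.0 * METERS_PER_DEGREE / precisionMeters);
        return 64 - Long.numberOfLeadingZeros(steps - 1);
    }

    public static void main(String[] args) {
        int bits = bitsPerCoordinate(0.01);      // 1 cm default precision
        int bytesPerPoint = (2 * bits + 7) / 8;  // lat + lon together
        System.out.println(bits + " bits/coord, " + bytesPerPoint + " bytes/point");
        // prints "32 bits/coord, 8 bytes/point" -- half of two raw doubles (16 bytes)
    }
}
```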

@dadoonet (Contributor) commented Dec 9, 2013

I think we should keep an option not to compress geo points, to keep them as fast as possible (with the memory trade-off, as it is now).
I would set the default to 3 meters, as I think most geo-point use cases don't go below 10 meters.
If 3 meters of precision brings it down to 6 bytes, that seems like a good default value to me.
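The 6-byte figure checks out with quick arithmetic (an illustrative sketch, not code from the patch): Earth's circumference is about 40,075 km, so a 3 m step needs roughly 13.4 million distinct values per coordinate, which fits in 24 bits; two coordinates then take 48 bits, i.e. 6 bytes.

```java
public class ThreeMeterBudget {
    // Bits needed to index `steps` distinct values.
    static int bitsFor(double steps) {
        return (int) Math.ceil(Math.log(steps) / Math.log(2));
    }

    public static void main(String[] args) {
        double steps = 40_075_000.0 / 3.0;  // ~13.4M steps at 3 m precision
        int bits = bitsFor(steps);          // 24 bits per coordinate
        System.out.println(2 * bits / 8 + " bytes per geo point");
        // prints "6 bytes per geo point"
    }
}
```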

A Member commented in review:

It'd be great to have more detail here, like:

  • You can update this online with the PUT mapping API, and it will cause newly added points to be reduced to this precision.
  • Something about the memory trade-offs, with ballpark figures like the ones in the commit message.

@jpountz (Contributor, Author) replied:

Agreed! Will do it shortly.

@ghost ghost assigned jpountz Dec 10, 2013
@jpountz (Contributor, Author) commented Dec 12, 2013

Here is a new iteration of this pull request. Sorry to those who commented on the previous one, but I did so many refactorings that I preferred to squash the commits to make the change more readable. Here are the highlights of this new commit:

  • the default format for geo points is the double[]-based one, called array (as in master), making sure there is no precision loss,
  • the compressed field data format is now called compressed, and the precision option is now part of the field data settings,
  • GeoPointFieldMapper was changed to extend AbstractFieldMapper. This automatically brings in some niceties of AbstractFieldMapper, like out-of-the-box support for field data settings updates (through the update mapping API),
  • I added duel tests between the array and the compressed field data formats,
  • documentation now includes a table that describes the memory savings for various precisions, and makes it explicit that the precision can be updated on a live index.

However, I kept the default precision at 1cm instead of the 3m suggested by David (which would store geo points on 6 bytes instead of 8). The reason is that, at 6 bytes, the field data impl is a bit slower because some bit-packing routines are needed under the hood, while at 8 bytes data is stored in two int[] arrays. To get some speed back, one can configure the precision to 1km (which translates to 4 bytes), in which case two short[] arrays are used under the hood.
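To illustrate the trade-off, here is a toy fixed-point codec (hypothetical code, not the patch's actual implementation): 32-bit codes drop straight into an int[] and 16-bit codes into a short[], while 24-bit codes would need explicit bit packing. Whatever the width, quantization never loses more than half a step.

```java
public class GeoCodec {
    // Toy fixed-point round trip; precision is expressed in degrees.
    static long encode(double degrees, double precisionDegrees) {
        return Math.round(degrees / precisionDegrees);
    }

    static double decode(long code, double precisionDegrees) {
        return code * precisionDegrees;
    }

    public static void main(String[] args) {
        double oneCm = 0.01 / 111_319.0;  // ~1 cm expressed in degrees of latitude
        double lat = 48.858844;           // illustrative coordinate
        double back = decode(encode(lat, oneCm), oneCm);
        // The reconstruction error is bounded by half the quantization step.
        System.out.println(Math.abs(back - lat) <= oneCm / 2);  // prints "true"
    }
}
```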

@dadoonet (Contributor) replied:

Awesome explanation. It makes a lot of sense! Thanks.

@jpountz jpountz closed this Dec 17, 2013
@jpountz jpountz deleted the enhancement/fielddata_geo_points branch December 17, 2013 10:44
talevy added a commit to talevy/elasticsearch that referenced this pull request Apr 25, 2018
…lastic#4387)

Previously, phase X's `after` step had `X` as its associated phase. This causes confusion because
we have only entered phase `X` once the `after` step is complete. Therefore, this refactor
pushes the after's phase to be associated with the previous phase. This first phase is an exception. 
The first phase's `after` step is associated with the first phase (not some non-existent prior phase).
talevy added a commit to talevy/elasticsearch that referenced this pull request May 14, 2018
…lastic#4387)

jasontedor pushed a commit that referenced this pull request Aug 17, 2018
…4387)


Successfully merging this pull request may close these issues.

Compress geo-point field data
