
Conversation

@jpountz (Contributor) commented Dec 9, 2013

This commit makes it possible to trade precision for memory when storing geo points.
GeoPointFieldMapper now accepts a `precision` parameter that controls the
maximum expected error when storing coordinates. This option can be updated on
a live index with the PUT mapping API.

The default precision is 1cm, which requires 8 bytes per geo point (a 50% memory
saving compared to using two doubles). With this default precision,
GeoDistanceSearchBenchmark reports times that are 12 to 23% slower than
the previous field data implementation.

Close #4386
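The byte counts above can be sanity-checked with some back-of-the-envelope fixed-point arithmetic. This is only an illustrative sketch (the constant and helper below are not from the patch): at roughly 111 km per degree, a 1 cm error budget needs about 4 billion distinct steps per coordinate, i.e. 32 bits, so a lat/lon pair fits in 8 bytes versus 16 bytes for two raw doubles.

```java
public class GeoPointBudget {
    // Approximate meters spanned by one degree along a meridian.
    static final double METERS_PER_DEGREE = 40_075_000.0 / 360.0;

    // Bits needed per coordinate so the quantization step stays under precisionMeters.
    static int bitsPerCoordinate(double precisionMeters) {
        long steps = (long) Math.ceil(360.0 * METERS_PER_DEGREE / precisionMeters);
        return 64 - Long.numberOfLeadingZeros(steps - 1);
    }

    public static void main(String[] args) {
        int bits = bitsPerCoordinate(0.01);      // 1 cm default precision
        int bytesPerPoint = (2 * bits + 7) / 8;  // lat + lon together
        System.out.println(bits + " bits/coord, " + bytesPerPoint + " bytes/point");
        // prints "32 bits/coord, 8 bytes/point" -- half of two raw doubles (16 bytes)
    }
}
```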

@dadoonet (Contributor) commented Dec 9, 2013

I think we should keep an option not to compress geo points, to keep them as fast as possible (with the memory trade-off, as it is now).
I would set the default to 3 meters, as I think most geo-point use cases don't go below 10 meters.
If 3 meters of precision brings it down to 6 bytes, that seems like a good default value to me.
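The 6-byte figure checks out with quick arithmetic (an illustrative sketch, not code from the patch): Earth's circumference is about 40,075 km, so a 3 m step needs roughly 13.4 million distinct values per coordinate, which fits in 24 bits; two coordinates then take 48 bits, i.e. 6 bytes.

```java
public class ThreeMeterBudget {
    // Bits needed to index `steps` distinct values.
    static int bitsFor(double steps) {
        return (int) Math.ceil(Math.log(steps) / Math.log(2));
    }

    public static void main(String[] args) {
        double steps = 40_075_000.0 / 3.0;  // ~13.4M steps at 3 m precision
        int bits = bitsFor(steps);          // 24 bits per coordinate
        System.out.println(2 * bits / 8 + " bytes per geo point");
        // prints "6 bytes per geo point"
    }
}
```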

A Member commented in review:

It'd be great to have more detail here, like:

  • You can update this online with the PUT mapping API, and it will cause newly added points to be reduced to this precision.
  • Something about the memory trade-offs, with ballpark figures like the ones in the commit message.

@jpountz (Contributor, Author) replied:

Agreed! Will do it shortly.

@ghost ghost assigned jpountz Dec 10, 2013
@jpountz (Contributor, Author) commented Dec 12, 2013

Here is a new iteration of this pull request. Sorry to those who commented on the previous one, but I did so many refactorings that I preferred to squash the commits to make the change more readable. Here are the highlights of this new commit:

  • the default format for geo points is the double[]-based one, called array (as in master), making sure there is no precision loss,
  • the compressed field data format is now called compressed, and the precision option is now part of the field data settings,
  • GeoPointFieldMapper was changed to extend AbstractFieldMapper. This automatically brings in some niceties of AbstractFieldMapper, like out-of-the-box support for field data settings updates (through the update mapping API),
  • I added duel tests between the array and the compressed field data formats,
  • documentation now includes a table that describes the memory savings for various precisions, and makes it explicit that the precision can be updated on a live index.

However, I kept the default precision at 1cm instead of the 3m suggested by David (which would store geo points on 6 bytes instead of 8). The reason is that, at 6 bytes, the field data impl is a bit slower because some bit-packing routines are needed under the hood, while at 8 bytes data is stored in two int[] arrays. To get some speed back, one can configure the precision to 1km (which translates to 4 bytes), in which case two short[] arrays are used under the hood.
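To illustrate the trade-off, here is a toy fixed-point codec (hypothetical code, not the patch's actual implementation): 32-bit codes drop straight into an int[] and 16-bit codes into a short[], while 24-bit codes would need explicit bit packing. Whatever the width, quantization never loses more than half a step.

```java
public class GeoCodec {
    // Toy fixed-point round trip; precision is expressed in degrees.
    static long encode(double degrees, double precisionDegrees) {
        return Math.round(degrees / precisionDegrees);
    }

    static double decode(long code, double precisionDegrees) {
        return code * precisionDegrees;
    }

    public static void main(String[] args) {
        double oneCm = 0.01 / 111_319.0;  // ~1 cm expressed in degrees of latitude
        double lat = 48.858844;           // illustrative coordinate
        double back = decode(encode(lat, oneCm), oneCm);
        // The reconstruction error is bounded by half the quantization step.
        System.out.println(Math.abs(back - lat) <= oneCm / 2);  // prints "true"
    }
}
```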

@dadoonet (Contributor) replied:

Awesome explanation. It makes a lot of sense! Thanks.

@jpountz jpountz closed this Dec 17, 2013
@jpountz jpountz deleted the enhancement/fielddata_geo_points branch December 17, 2013 10:44
talevy added a commit to talevy/elasticsearch that referenced this pull request Apr 25, 2018
…lastic#4387)

Previously, phase X's `after` step had `X` as its associated phase. This causes confusion because
we have only entered phase `X` once the `after` step is complete. Therefore, this refactor
pushes the after's phase to be associated with the previous phase. This first phase is an exception. 
The first phase's `after` step is associated with the first phase (not some non-existent prior phase).
talevy added a commit to talevy/elasticsearch that referenced this pull request May 14, 2018
…lastic#4387)

jasontedor pushed a commit that referenced this pull request Aug 17, 2018
…4387)


Successfully merging this pull request may close these issues.

Compress geo-point field data
