Description
The current default precision of 5 gives a cell size of roughly 5km x 5km, which is great if the dataset covers the UK or a single US state but can be unwieldy for larger areas such as the whole of the US or the whole world. At precision 5 there are 33,554,432 (32^5) possible geohash buckets to collect during the collect phase on the shards. Whether all of these buckets are actually collected depends a lot on how dense the data is. If all of them are collected, a rough calculation puts the required heap memory at 512MB, but this is a big if: about 2/3 of the earth is ocean, so most world datasets would have little, if any, data for those areas. Memory usage will increase significantly if sub-aggregations are used, especially if they are complex and add more sub-buckets to each geo-grid bucket.
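The bucket counts and the heap estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the figure of 16 bytes per bucket is an assumption inferred by dividing 512MB by 33,554,432, not a measured per-bucket cost, and the function names are made up for illustration.

```python
# Each geohash character encodes 5 bits, so there are 32 possible values per character.
GEOHASH_BASE = 32

# Assumption: 16 bytes per bucket, back-calculated from 512MB / 33,554,432 buckets.
# The real per-bucket cost depends on the implementation and on any sub-aggregations.
ASSUMED_BYTES_PER_BUCKET = 16


def possible_buckets(precision: int) -> int:
    """Upper bound on the number of distinct geohash cells at a given precision."""
    return GEOHASH_BASE ** precision


def worst_case_heap_mib(precision: int,
                        bytes_per_bucket: int = ASSUMED_BYTES_PER_BUCKET) -> float:
    """Heap needed (in MiB) if every possible cell were collected as a bucket."""
    return possible_buckets(precision) * bytes_per_bucket / (1024 * 1024)


for precision in (4, 5):
    print(precision, possible_buckets(precision), worst_case_heap_mib(precision))
```

Each extra level of precision multiplies the worst case by 32, which is why a single step in the default makes such a large difference.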
However, the problem remains that with a default precision of 5, a modest cluster could easily run out of heap on a large dataset, especially if sub-bucket aggregations are used.
We could reduce the default precision to 4, which would give cells roughly 39.1km x 19.5km in size and reduce the number of possible geohash buckets to 1,048,576, using roughly 16MB. At this precision the UK would still be covered by roughly 300 cells, which still sounds reasonable. The disadvantage is that the defaults would be a lot less useful for county/city level use cases, but I think the trade-off might be ok here?
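For reference, the cell sizes quoted here can be approximated from how geohash splits its bits between longitude and latitude. The sketch below assumes a spherical-ish earth with rounded circumference constants and measures width along a given parallel; `cell_size_km` is a hypothetical helper, not part of any library.

```python
import math

# Approximate earth dimensions (rounded; assumptions for this sketch).
EQUATORIAL_CIRCUMFERENCE_KM = 40075.0      # 360 degrees of longitude at the equator
MERIDIAN_HALF_CIRCUMFERENCE_KM = 20004.0   # 180 degrees of latitude, pole to pole


def cell_size_km(precision: int, latitude_deg: float = 0.0) -> tuple[float, float]:
    """Approximate (width, height) in km of a geohash cell at the given latitude."""
    total_bits = 5 * precision
    # Bits alternate starting with longitude, so longitude gets the extra bit
    # when the total is odd.
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    # East-west width shrinks with the cosine of the latitude.
    width = (EQUATORIAL_CIRCUMFERENCE_KM * math.cos(math.radians(latitude_deg))
             / 2 ** lon_bits)
    height = MERIDIAN_HALF_CIRCUMFERENCE_KM / 2 ** lat_bits
    return width, height


for precision in (4, 5):
    width, height = cell_size_km(precision)
    print(f"precision {precision}: {width:.1f}km x {height:.1f}km at the equator")
```

At the equator this gives values close to the 39.1km x 19.5km and 5km x 5km figures above. Passing a non-zero `latitude_deg` shows that cells get narrower east-west at higher latitudes, so a per-country cell count based on equatorial sizes is an underestimate.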