Skip to content

Conversation

@iverase
Copy link
Contributor

@iverase iverase commented Mar 2, 2022

This commit adds a new optional parameter on the vector tiles API called grid_agg with two
possible values, geotile (default) and geohex. This will allow to build the aggs layer using different
grid aggregations, for example we can have a grid aggregation that is built using hexagons.

Note the GeoHexGridAggregation only support data indexed using geo_point field type so this limitation
applies here too.

Docs preview

https://elasticsearch_84553.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/search-vector-tile-api.html

This commit adds a new optional parameter on the vector tiles API called `grid_agg` with two
possible values, geotile (default) and geohex. This will allow to build the aggs layer using different
grid aggregations, for example we can have a grid aggregation that is built using hexagons.

Note the GeoHexGridAggregation only support data indexed using geo_point field type so this limitation
applies here too.
@iverase iverase added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes v8.2.0 labels Mar 2, 2022
@iverase iverase requested review from imotov and jrodewig March 2, 2022 08:56
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 2, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @iverase, I've created a changelog YAML for you.

Copy link
Contributor

@jrodewig jrodewig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iverase.

There were some additional changes needed for the docs. For example, in several places, the docs assume a geotile_grid aggregation is used.

I pushed 6cea458 to address those changes. I'd greatly appreciate your review on that commit.

If that looks good to you, these docs LGTM.

@iverase
Copy link
Contributor Author

iverase commented Mar 3, 2022

@jrodewig I love what you did, thanks!

// zoom 27: 13 | 14 | 14 | 15 | 15 | 15 | 15 | 15 |
// zoom 28: 14 | 14 | 15 | 15 | 15 | 15 | 15 | 15 |
// zoom 29: 14 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
return Math.min(H3.MAX_H3_RES, Math.max(2, (z + gridPrecision - 1) / 2));
Copy link
Contributor

@thomasneirynck thomasneirynck Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @iverase

@nreese and I were discussing this mapping-function earlier, in brining this into Kibana.

We're wondering if this is the right heuristic on how to map a geohex-level to a zoom-level.

What we would like to achieve is a heuristic that approximates density of hex-bins in the tile to that of what we have using grid or clusters.

This heuristic

target_hex_resolution = (<zoom> + grid_precision - 1) / 2

generally tends to create too fine hex-grids for a given grid_precision, compared to comparable tiles using clusters or grid, especially at lower zoom.

How grid_precision interacts with the existing mvt-api is simple: zoom + grid_precision = target_geotile_level. This mapping to target_geotile_level is easy to reason about. Hoping for something similar with geohex.

We would prefer a heuristic that maps target_geotile_zoom to the geohex_zoom based on density of bins.

So more something like:

target_hex_resolution = mapGeotileToHexbin(zoom + grid_precision)

where mapGeoTileToHexbin() is the mapping from level->h3 resolution from this table:

image

Copy link
Contributor Author

@iverase iverase Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that the hex bins resolution for high zooms is too low and can be improved. Still what you propose would not wok either because the density of hex bin in your example is too big.

For geotile and grid_precision 8 we can generate up to 65536 bins which is the default limit for the number of buckets in a bucket aggregation. Therefore the density of hex bins should always be <= density of geotile, otherwise we might go over the limit of the number of buckets.

In your proposal, the density of hex bins can be up to 2.5 times higher than tile bins. We should make sure density is not bigger than 1 (it can be slightly bigger as the number of hex bins consider hexes around the poles which are not part of the mercator projection and for small zoom level when the number of bins < 65536).

Here is my approximation, what I wonder is how to document this.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated how we compute the H3 resolution by using the table above. Let me know what you think @thomasneirynck
@craigtaverner

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

iverase added 2 commits March 11, 2022 12:30
# Conflicts:
#	x-pack/plugin/vector-tile/src/main/java/org/elasticsearch/xpack/vectortile/feature/FeatureFactory.java
#	x-pack/plugin/vector-tile/src/test/java/org/elasticsearch/xpack/vectortile/feature/FeatureFactoriesConsistencyTests.java
@iverase
Copy link
Contributor Author

iverase commented Mar 15, 2022

@jrodewig Could you have a look into the docs again? We update the way we compute the precision for GeoHex so we needed to update the docs as well.

Copy link
Contributor

@nreese nreese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
tried hex bins out with Kibana, elastic/kibana#127170, and it looks great. Adding this to vector tile service makes it so easy to integrate into Kibana, thanks.

Copy link
Contributor

@jrodewig jrodewig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iverase. I pushed 901b5c1 to fix some rendering issues and reorder some of the new docs. If those change look good to you and the CI passes, this LGTM.

@iverase
Copy link
Contributor Author

iverase commented Mar 15, 2022

I open #84993

@elasticmachine run elasticsearch-ci/part-2

@iverase
Copy link
Contributor Author

iverase commented Mar 15, 2022

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

Copy link
Contributor

@craigtaverner craigtaverner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few comments and suggestions, but otherwise I think it looks good.

// 26 4503599627370500 15 569707381193162 0.1265004504
// 27 18014398509482000 15 569707381193162 0.03162511259
// 28 72057594037927900 15 569707381193162 0.007906278149
// 29 288230376151712000 15 569707381193162 0.001976569537
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since H3 only goes down to 1m² and clearly elastic square tiles go down to much smaller tiles, how feasible would it be to increase the number of H3 levels to match the resolution of the tiles? Perhaps two more H3 levels would suffice? Or is it that in H3 the 0..15 range is very clearly hard-coded (eg. only 4 bits used to store the H3 level)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is just hard coded, e.g if you increase the max resolution all the test fail so there is much more into it. Note that hexagons don't divide the sphere perfectly so you have some pentagons (and other shapes) that need to fail the gaps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed in some Uber videos they were describing H3 tiles with fixed length strings, and tiles with less than level 15 would have suffixes made of the character f, so it appears that it is likely that the 0..15 range is correlated to the string length chosen for this encoding. I'd be curious to investigate more at some point.


@Override
public int gridPrecisionToAggPrecision(int z, int gridPrecision) {
return ZOOM2RESOLUTION[GEOTILE.gridPrecisionToAggPrecision(z, gridPrecision)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that we are trying to be too clever here, and trying to force the zoom levels of the H3 tiles to match the zoom levels of the grid. What about pushing this problem to the client? So when doing geotiles, the possible zoom values would be 1 to 29, but when doing H3 the possible values are reduced to 0 to 15. The server side code is much simpler, and clients like kibana need to worry about just what works best for visualization?

Copy link
Contributor Author

@iverase iverase Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider having different precision meaning for geotile and geohex but I think it will make the API hard to use. In addition, we are already not exposing the geotile zoom while aggregating but a grid precision. Therefore I think it makes sense to adjust the grid precision to the density of bins.

}
final int aggPrecision = gridPrecisionToAggPrecision(z, gridPrecision);
if (aggPrecision < 2) {
// we need to consider all data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this? For H3 even level 0 divides the world into 20 tiles, so looking at all data seems to be an inefficient option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that some hex bins will be going over the pole and at the moment we are not able to compute the buffering using cartesian bounding boxes. This is an edge case that only happens when the grid_precision is <= 2 and you are at zoom = 0 or zoom = 1, therefore let consider it if there is some use case where performance is needed. wdyt?

// we need to consider all data
return new Rectangle(-180, 180, GeoTileUtils.LATITUDE_MASK, -GeoTileUtils.LATITUDE_MASK);
}
return new Rectangle(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, the constructor of Rectangle was confusion. I thought you had made a mistake with the 3rd and 4th arguments swapped around, but when I looked at the code for Rectangle it really does have minX, maxX, maxY, minY! I guess this is related to the convention to have the original at the top-left instead of bottom left?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is legacy and like you I found it confusing.

lats[i] = latLng.getLatDeg();
lons[i] = latLng.getLonDeg();
}
lats[boundary.numPoints()] = boundary.getLatLon(0).getLatDeg();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lats[boundary.numPoints()] = boundary.getLatLon(0).getLatDeg();
lats[boundary.numPoints()] = lats[0];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@iverase
Copy link
Contributor Author

iverase commented Mar 15, 2022

#84980

@elasticmachine run elasticsearch-ci/part-2

@iverase iverase merged commit 3f6d460 into elastic:master Mar 16, 2022
@iverase iverase deleted the vectortilehex branch March 16, 2022 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants