
Conversation

@iverase (Contributor) commented Feb 22, 2022

With the upgrade to Lucene 9.1, the Tessellator validates polygons itself and returns a descriptive error message to the user. This allows calling the normaliser only when necessary, since the checks it performed on the polygon are now covered by the Tessellator.

Overall, this change of strategy should produce better error messages for the user when a polygon is invalid.

closes #35349

@iverase iverase added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes v8.2.0 labels Feb 22, 2022
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 22, 2022
@elasticmachine (Collaborator)

Pinging @elastic/es-analytics-geo (Team:Analytics)

@iverase iverase marked this pull request as draft February 22, 2022 13:50
@elasticsearchmachine (Collaborator)

Hi @iverase, I've created a changelog YAML for you.

@iverase (Contributor, Author) commented Feb 22, 2022

@elasticmachine update branch

@iverase iverase requested review from craigtaverner and imotov March 7, 2022 09:19
@elasticsearchmachine (Collaborator)

Hi @iverase, I've updated the changelog YAML for you.

@iverase iverase marked this pull request as ready for review March 7, 2022 09:19
@craigtaverner (Contributor) left a comment

I think this looks like an improvement to the error checking, but I am worried about performance. Do we have regular benchmarks running that would catch a regression?

```diff
         return Collections.emptyList();
     }
-    geometry = GeometryNormalizer.apply(orientation, geometry);
+    if (GeometryNormalizer.needsNormalize(orientation, geometry)) {
```
@craigtaverner (Contributor):

Looking at the code, needsNormalize does quite an extensive check, so I presume this adds some performance cost?

@iverase (Contributor, Author):

needsNormalize should run several times faster than the normalisation.

```diff
     @Override
     public Void visit(Polygon polygon) {
-        addFields(LatLonShape.createIndexableFields(name, GeoShapeUtils.toLucenePolygon(polygon)));
+        addFields(LatLonShape.createIndexableFields(name, GeoShapeUtils.toLucenePolygon(polygon), true));
```
@craigtaverner (Contributor):
The additional checks in the tessellator also add some performance cost.

@iverase (Contributor, Author) replied Mar 7, 2022:

Yes, the hope is that the overhead of checking for intersections in the tessellator is offset by skipping the normalisation step.
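For readers unfamiliar with what "checking for intersections" involves: a minimal, self-contained sketch of a polygon-ring self-intersection check is below. This is not Lucene's Tessellator code — the class `RingCheck`, the brute-force O(n²) pairwise segment test, and the method names are all hypothetical simplifications used only to illustrate the kind of edge-validity pass being discussed (degenerate collinear/touching cases are ignored for brevity).

```java
import java.util.List;

/** Hypothetical sketch (not Lucene's actual code): detect self-intersection
 *  in a closed polygon ring via pairwise segment tests, illustrating the
 *  kind of validity check a tessellator performs before triangulating. */
public class RingCheck {

    // Sign of the cross product (b - a) x (c - a): which side of line a->b the point c lies on.
    static double cross(double ax, double ay, double bx, double by, double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    // Proper intersection test: each segment's endpoints straddle the other's line.
    // Collinear/touching cases are deliberately ignored in this sketch.
    static boolean segmentsIntersect(double[] p1, double[] p2, double[] p3, double[] p4) {
        double d1 = cross(p3[0], p3[1], p4[0], p4[1], p1[0], p1[1]);
        double d2 = cross(p3[0], p3[1], p4[0], p4[1], p2[0], p2[1]);
        double d3 = cross(p1[0], p1[1], p2[0], p2[1], p3[0], p3[1]);
        double d4 = cross(p1[0], p1[1], p2[0], p2[1], p4[0], p4[1]);
        return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
    }

    /** ring: closed list of [x, y] points where first point == last point. */
    public static boolean isSelfIntersecting(List<double[]> ring) {
        int n = ring.size() - 1; // number of edges in the closed ring
        for (int i = 0; i < n; i++) {
            for (int j = i + 2; j < n; j++) {
                if (i == 0 && j == n - 1) continue; // these edges are adjacent via ring closure
                if (segmentsIntersect(ring.get(i), ring.get(i + 1), ring.get(j), ring.get(j + 1))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A "bowtie": edges (0,0)-(2,2) and (2,0)-(0,2) cross, so the ring is invalid.
        List<double[]> bowtie = List.of(
            new double[] { 0, 0 }, new double[] { 2, 2 }, new double[] { 2, 0 },
            new double[] { 0, 2 }, new double[] { 0, 0 });
        // A plain square: a valid, non-self-intersecting ring.
        List<double[]> square = List.of(
            new double[] { 0, 0 }, new double[] { 2, 0 }, new double[] { 2, 2 },
            new double[] { 0, 2 }, new double[] { 0, 0 });
        System.out.println(isSelfIntersecting(bowtie)); // prints true
        System.out.println(isSelfIntersecting(square)); // prints false
    }
}
```

The quadratic pairwise scan above is the naive version of the check; doing it during tessellation (as this PR enables) lets a single pass both validate and triangulate, which is why skipping the separate normalisation step can compensate for the extra work.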

@iverase (Contributor, Author) commented Mar 7, 2022

We don't have a Rally track for geo_shape that runs daily, but we are working on it: elastic/rally-tracks#238

I just ran indexing manually, and all in all the effect on performance seems very small: cumulative indexing time increased by only ~1.75%.

|                                                        Metric |                          Task |   Baseline |   Contender |     Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|------------------------------:|-----------:|------------:|---------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                               |    35.7289 |       36.35 |  0.62105 |    min |   +1.74% |
|             Min cumulative indexing time across primary shard |                               |     2.1934 |     2.29463 |  0.10123 |    min |   +4.62% |
|          Median cumulative indexing time across primary shard |                               |     16.689 |     16.9463 |  0.25733 |    min |   +1.54% |
|             Max cumulative indexing time across primary shard |                               |    16.8465 |      17.109 |  0.26248 |    min |   +1.56% |
|           Cumulative indexing throttle time of primary shards |                               |          0 |           0 |        0 |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                               |          0 |           0 |        0 |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                               |          0 |           0 |        0 |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                               |          0 |           0 |        0 |    min |    0.00% |
|                       Cumulative merge time of primary shards |                               |    14.1262 |     15.0582 |  0.93203 |    min |   +6.60% |
|                      Cumulative merge count of primary shards |                               |         19 |          20 |        1 |        |   +5.26% |
|                Min cumulative merge time across primary shard |                               |    2.57527 |      2.6538 |  0.07853 |    min |   +3.05% |
|             Median cumulative merge time across primary shard |                               |    2.91672 |     3.77475 |  0.85803 |    min |  +29.42% |
|                Max cumulative merge time across primary shard |                               |    8.63423 |      8.6297 | -0.00453 |    min |   -0.05% |
|              Cumulative merge throttle time of primary shards |                               |     10.318 |      10.515 |  0.19703 |    min |   +1.91% |
|       Min cumulative merge throttle time across primary shard |                               |    1.78408 |      1.8912 |  0.10712 |    min |   +6.00% |
|    Median cumulative merge throttle time across primary shard |                               |    1.95482 |     2.36807 |  0.41325 |    min |  +21.14% |
|       Max cumulative merge throttle time across primary shard |                               |     6.5791 |     6.25577 | -0.32333 |    min |   -4.91% |
|                     Cumulative refresh time of primary shards |                               |    1.22522 |     1.20875 | -0.01647 |    min |   -1.34% |
|                    Cumulative refresh count of primary shards |                               |        142 |         143 |        1 |        |   +0.70% |
|              Min cumulative refresh time across primary shard |                               |   0.123433 |    0.127533 |   0.0041 |    min |   +3.32% |
|           Median cumulative refresh time across primary shard |                               |   0.159117 |    0.149583 | -0.00953 |    min |   -5.99% |
|              Max cumulative refresh time across primary shard |                               |   0.942667 |    0.931633 | -0.01103 |    min |   -1.17% |
|                       Cumulative flush time of primary shards |                               |    1.05685 |    0.977583 | -0.07927 |    min |   -7.50% |
|                      Cumulative flush count of primary shards |                               |         34 |          36 |        2 |        |   +5.88% |
|                Min cumulative flush time across primary shard |                               |    0.30235 |     0.23895 |  -0.0634 |    min |  -20.97% |
|             Median cumulative flush time across primary shard |                               |   0.324133 |     0.26285 | -0.06128 |    min |  -18.91% |
|                Max cumulative flush time across primary shard |                               |   0.430367 |    0.475783 |  0.04542 |    min |  +10.55% |
|                                       Total Young Gen GC time |                               |     10.306 |      10.311 |    0.005 |      s |   +0.05% |
|                                      Total Young Gen GC count |                               |       2336 |        2502 |      166 |        |   +7.11% |
|                                         Total Old Gen GC time |                               |          0 |           0 |        0 |      s |    0.00% |
|                                        Total Old Gen GC count |                               |          0 |           0 |        0 |        |    0.00% |
|                                                    Store size |                               |    15.4281 |     15.1183 | -0.30984 |     GB |   -2.01% |
|                                                 Translog size |                               |   0.426228 |    0.185329 |  -0.2409 |     GB |  -56.52% |
|                                        Heap used for segments |                               |          0 |           0 |        0 |     MB |    0.00% |
|                                      Heap used for doc values |                               |          0 |           0 |        0 |     MB |    0.00% |
|                                           Heap used for terms |                               |          0 |           0 |        0 |     MB |    0.00% |
|                                           Heap used for norms |                               |          0 |           0 |        0 |     MB |    0.00% |
|                                          Heap used for points |                               |          0 |           0 |        0 |     MB |    0.00% |
|                                   Heap used for stored fields |                               |          0 |           0 |        0 |     MB |    0.00% |
|                                                 Segment count |                               |         85 |          77 |       -8 |        |   -9.41% |
|                                                Min Throughput |      index-append-linestrings |    3294.77 |     2075.88 | -1218.89 | docs/s |  -36.99% |
|                                               Mean Throughput |      index-append-linestrings |    5403.66 |     5292.28 | -111.379 | docs/s |   -2.06% |
|                                             Median Throughput |      index-append-linestrings |    5295.44 |     5252.72 | -42.7218 | docs/s |   -0.81% |
|                                                Max Throughput |      index-append-linestrings |    7166.22 |     6470.59 | -695.622 | docs/s |   -9.71% |
|                                       50th percentile latency |      index-append-linestrings |    16.7324 |     16.7869 |  0.05447 |     ms |   +0.33% |
|                                       90th percentile latency |      index-append-linestrings |    23.6483 |     23.7894 |  0.14106 |     ms |   +0.60% |
|                                       99th percentile latency |      index-append-linestrings |    61.4017 |     63.9209 |  2.51921 |     ms |   +4.10% |
|                                     99.9th percentile latency |      index-append-linestrings |    132.675 |     141.512 |  8.83686 |     ms |   +6.66% |
|                                    99.99th percentile latency |      index-append-linestrings |    2375.61 |     2341.86 | -33.7483 |     ms |   -1.42% |
|                                      100th percentile latency |      index-append-linestrings |     2497.3 |     2474.51 | -22.7875 |     ms |   -0.91% |
|                                  50th percentile service time |      index-append-linestrings |    16.7324 |     16.7869 |  0.05447 |     ms |   +0.33% |
|                                  90th percentile service time |      index-append-linestrings |    23.6483 |     23.7894 |  0.14106 |     ms |   +0.60% |
|                                  99th percentile service time |      index-append-linestrings |    61.4017 |     63.9209 |  2.51921 |     ms |   +4.10% |
|                                99.9th percentile service time |      index-append-linestrings |    132.675 |     141.512 |  8.83686 |     ms |   +6.66% |
|                               99.99th percentile service time |      index-append-linestrings |    2375.61 |     2341.86 | -33.7483 |     ms |   -1.42% |
|                                 100th percentile service time |      index-append-linestrings |     2497.3 |     2474.51 | -22.7875 |     ms |   -0.91% |
|                                                    error rate |      index-append-linestrings |          0 |           0 |        0 |      % |    0.00% |
|                                                Min Throughput | index-append-multilinestrings |    183.326 |     573.222 |  389.896 | docs/s | +212.68% |
|                                               Mean Throughput | index-append-multilinestrings |    1313.34 |      1292.2 | -21.1443 | docs/s |   -1.61% |
|                                             Median Throughput | index-append-multilinestrings |    1377.24 |     1358.35 |  -18.894 | docs/s |   -1.37% |
|                                                Max Throughput | index-append-multilinestrings |     1436.5 |     1407.65 | -28.8521 | docs/s |   -2.01% |
|                                       50th percentile latency | index-append-multilinestrings |    58.8746 |     60.0739 |  1.19922 |     ms |   +2.04% |
|                                       90th percentile latency | index-append-multilinestrings |     97.748 |      100.27 |  2.52195 |     ms |   +2.58% |
|                                       99th percentile latency | index-append-multilinestrings |    160.413 |     168.742 |  8.32908 |     ms |   +5.19% |
|                                     99.9th percentile latency | index-append-multilinestrings |    360.004 |     407.042 |  47.0379 |     ms |  +13.07% |
|                                      100th percentile latency | index-append-multilinestrings |    533.189 |     510.677 | -22.5116 |     ms |   -4.22% |
|                                  50th percentile service time | index-append-multilinestrings |    58.8746 |     60.0739 |  1.19922 |     ms |   +2.04% |
|                                  90th percentile service time | index-append-multilinestrings |     97.748 |      100.27 |  2.52195 |     ms |   +2.58% |
|                                  99th percentile service time | index-append-multilinestrings |    160.413 |     168.742 |  8.32908 |     ms |   +5.19% |
|                                99.9th percentile service time | index-append-multilinestrings |    360.004 |     407.042 |  47.0379 |     ms |  +13.07% |
|                                 100th percentile service time | index-append-multilinestrings |    533.189 |     510.677 | -22.5116 |     ms |   -4.22% |
|                                                    error rate | index-append-multilinestrings |          0 |           0 |        0 |      % |    0.00% |
|                                                Min Throughput |         index-append-polygons |    3783.58 |     2684.51 | -1099.07 | docs/s |  -29.05% |
|                                               Mean Throughput |         index-append-polygons |    5171.62 |     5078.59 | -93.0297 | docs/s |   -1.80% |
|                                             Median Throughput |         index-append-polygons |    5187.75 |     5119.05 | -68.6972 | docs/s |   -1.32% |
|                                                Max Throughput |         index-append-polygons |    6367.59 |     6851.77 |  484.186 | docs/s |   +7.60% |
|                                       50th percentile latency |         index-append-polygons |    16.0851 |     16.2122 |   0.1271 |     ms |   +0.79% |
|                                       90th percentile latency |         index-append-polygons |    22.7093 |     23.0117 |  0.30236 |     ms |   +1.33% |
|                                       99th percentile latency |         index-append-polygons |    35.6411 |      35.869 |  0.22795 |     ms |   +0.64% |
|                                     99.9th percentile latency |         index-append-polygons |    73.7639 |     82.1317 |  8.36783 |     ms |  +11.34% |
|                                    99.99th percentile latency |         index-append-polygons |    1883.87 |     1879.16 | -4.70606 |     ms |   -0.25% |
|                                      100th percentile latency |         index-append-polygons |    2097.72 |     2132.22 |  34.4987 |     ms |   +1.64% |
|                                  50th percentile service time |         index-append-polygons |    16.0851 |     16.2122 |   0.1271 |     ms |   +0.79% |
|                                  90th percentile service time |         index-append-polygons |    22.7093 |     23.0117 |  0.30236 |     ms |   +1.33% |
|                                  99th percentile service time |         index-append-polygons |    35.6411 |      35.869 |  0.22795 |     ms |   +0.64% |
|                                99.9th percentile service time |         index-append-polygons |    73.7639 |     82.1317 |  8.36783 |     ms |  +11.34% |
|                               99.99th percentile service time |         index-append-polygons |    1883.87 |     1879.16 | -4.70606 |     ms |   -0.25% |
|                                 100th percentile service time |         index-append-polygons |    2097.72 |     2132.22 |  34.4987 |     ms |   +1.64% |
|                                                    error rate |         index-append-polygons |          0 |           0 |        0 |      % |    0.00% |

@imotov (Contributor) left a comment:

LGTM

@iverase iverase merged commit 83d0e8e into elastic:master Mar 10, 2022
@iverase iverase deleted the moveChecksTessellator branch March 10, 2022 08:30


Linked issue (closed by this pull request): [GEO] Explore replacing shape validation in ShapeBuilders with Lucene's Tessellator