Optimize version map for append-only indexing #27752

s1monw · 2017-12-11T13:28:58Z

Today we still maintain a version map even if we only index append-only
or in other words, documents with auto-generated IDs. We can instead maintain
an un-safe version map that will be swapped to a safe version map only if necessary
once we see the first document that requires access to the version map. For instance:

a auto-generated id retry
any kind of deletes
a document with a foreign ID (non-autogenerated

In these cases we forcefully refresh then internal reader and start maintaining
a version map until such a safe map wasn't necessary for two refresh cycles.
Indices / shards that never see an autogenerated ID document will always meintain a version
map and in the case of a delete / retry in a pure append-only index the version map will be
de-optimized for a short amount of time until we know it's safe again to swap back. This
will also minimize the required refreshes.

Today we still maintain a version map even if we only index append-only or in other words, documents with auto-generated IDs. We can instead maintain an un-safe version map that will be swapped to a safe version map only if necessary once we see the first document that requires access to the version map. For instance: * a auto-generated id retry * any kind of deletes * a document with a foreign ID (non-autogenerated In these cases we forcefully refresh then internal reader and start maintaining a version map until such a safe map wasn't necessary for two refresh cycles. Indices / shards that never see an autogenerated ID document will always meintain a version map and in the case of a delete / retry in a pure append-only index the version map will be de-optimized for a short amount of time until we know it's safe again to swap back. This will also minimize the requried refeshes.

s1monw · 2017-12-11T13:30:22Z

this shows significant indexing improvements:

Geopoints:


|                         Metric |         Task |   Baseline |   Contender |     Diff |   Unit |
|-------------------------------:|-------------:|-----------:|------------:|---------:|-------:|
|                  Indexing time |              |    26.9624 |     23.4305 | -3.53188 |    min |
|                     Merge time |              |    5.49877 |     4.53233 | -0.96643 |    min |
|                   Refresh time |              |    1.16957 |    0.993367 |  -0.1762 |    min |
|                     Flush time |              |    0.51805 |    0.468367 | -0.04968 |    min |
|            Merge throttle time |              |    1.76193 |     1.26982 | -0.49212 |    min |
|             Total Young Gen GC |              |     24.711 |      10.726 |  -13.985 |      s |
|               Total Old Gen GC |              |      0.727 |       0.489 |   -0.238 |      s |
|                     Index size |              |    6.03152 |     6.03097 | -0.00055 |     GB |
|                Totally written |              |    20.3174 |     19.3723 | -0.94511 |     GB |
|         Heap used for segments |              |    23.2156 |     22.2079 |  -1.0077 |     MB |
|       Heap used for doc values |              |  0.0111542 |   0.0166664 |  0.00551 |     MB |
|            Heap used for terms |              |    21.0925 |     20.2266 |  -0.8659 |     MB |
|            Heap used for norms |              |          0 |           0 |        0 |     MB |
|           Heap used for points |              |   0.879958 |    0.850516 | -0.02944 |     MB |
|    Heap used for stored fields |              |    1.23195 |     1.11408 | -0.11787 |     MB |
|                  Segment count |              |        172 |         161 |      -11 |        |
|                 Min Throughput | index-append |     182101 |      191317 |  9216.29 | docs/s |
|              Median Throughput | index-append |     183596 |      196235 |  12639.3 | docs/s |
|                 Max Throughput | index-append |     185967 |      199587 |  13620.4 | docs/s |
|        50th percentile latency | index-append |    170.042 |     154.773 | -15.2692 |     ms |
|        90th percentile latency | index-append |    284.776 |     266.989 | -17.7874 |     ms |
|        99th percentile latency | index-append |    694.106 |     724.595 |  30.4894 |     ms |
|      99.9th percentile latency | index-append |    1038.56 |      1844.6 |  806.043 |     ms |
|       100th percentile latency | index-append |    1361.68 |     2422.32 |  1060.64 |     ms |
|   50th percentile service time | index-append |    170.042 |     154.773 | -15.2692 |     ms |
|   90th percentile service time | index-append |    284.776 |     266.989 | -17.7874 |     ms |
|   99th percentile service time | index-append |    694.106 |     724.595 |  30.4894 |     ms |
| 99.9th percentile service time | index-append |    1038.56 |      1844.6 |  806.043 |     ms |
|  100th percentile service time | index-append |    1361.68 |     2422.32 |  1060.64 |     ms |
|                     error rate | index-append |          0 |           0 |        0 |      % |

and geonames:


|                        Metric |         Task |   Baseline |   Contender |     Diff |   Unit |
|------------------------------:|-------------:|-----------:|------------:|---------:|-------:|
|                 Indexing time |              |    22.4665 |     21.8181 | -0.64835 |    min |
|                    Merge time |              |    3.66453 |      3.5914 | -0.07313 |    min |
|                  Refresh time |              |    1.41763 |     1.41457 | -0.00307 |    min |
|                    Flush time |              |     0.3911 |      0.4539 |   0.0628 |    min |
|           Merge throttle time |              |   0.876583 |    0.900417 |  0.02383 |    min |
|            Total Young Gen GC |              |       9.87 |       7.339 |   -2.531 |      s |
|              Total Old Gen GC |              |       0.53 |       0.492 |   -0.038 |      s |
|                    Index size |              |    5.75518 |     5.87197 |  0.11678 |     GB |
|               Totally written |              |    13.1005 |     13.0266 | -0.07381 |     GB |
|        Heap used for segments |              |     21.857 |     21.9353 |  0.07827 |     MB |
|      Heap used for doc values |              |  0.0544395 |   0.0587387 |   0.0043 |     MB |
|           Heap used for terms |              |    20.5231 |     20.5907 |  0.06768 |     MB |
|           Heap used for norms |              |   0.142517 |    0.141418 |  -0.0011 |     MB |
|          Heap used for points |              |    0.28997 |    0.295731 |  0.00576 |     MB |
|   Heap used for stored fields |              |      0.847 |    0.848633 |  0.00163 |     MB |
|                 Segment count |              |        181 |         180 |       -1 |        |
|                Min Throughput | index-append |    70267.3 |     77435.2 |   7167.9 | docs/s |
|             Median Throughput | index-append |    71270.2 |       79231 |  7960.76 | docs/s |
|                Max Throughput | index-append |    71692.2 |       79573 |  7880.85 | docs/s |
|       50th percentile latency | index-append |    381.892 |     336.063 | -45.8289 |     ms |
|       90th percentile latency | index-append |    841.375 |     529.527 | -311.849 |     ms |
|       99th percentile latency | index-append |    1914.11 |     1827.01 | -87.1048 |     ms |
|      100th percentile latency | index-append |    2300.26 |     2282.18 | -18.0827 |     ms |
|  50th percentile service time | index-append |    381.892 |     336.063 | -45.8289 |     ms |
|  90th percentile service time | index-append |    841.375 |     529.527 | -311.849 |     ms |
|  99th percentile service time | index-append |    1914.11 |     1827.01 | -87.1048 |     ms |
| 100th percentile service time | index-append |    2300.26 |     2282.18 | -18.0827 |     ms |
|                    error rate | index-append |          0 |           0 |        0 |      % |

s1monw · 2017-12-11T13:30:54Z

@danielmitterdorfer FYI

s1monw · 2017-12-14T13:51:54Z

@bleskes does this resolve #19813 I'd say it does

bleskes · 2017-12-14T14:05:19Z

I agree it does, but I need to update it as we ended up doing a whole different approach.

…

On Thu, Dec 14, 2017 at 2:52 PM, Simon Willnauer ***@***.***> wrote: @bleskes <https://github.com/bleskes> does this resolve #19813 <#19813> I'd say it does — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#27752 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA9bJ9MK7G17HIlRaXXrGq-6znOWsuv3ks5tASgRgaJpZM4Q9Y87> .

bleskes

LGTM. This is awesome! I left some minor nits for your consideration. I also think we can push this to 6.x too, no?

bleskes · 2017-12-15T07:53:59Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

            assert index.version() == 1L : "can optimize on replicas but incoming version is [" + index.version() + "]";
            plan = IndexingStrategy.optimizedAppendOnly(index.seqNo());
        } else {
+            versionMap.enforceSafeAccess();


just an idea - shall we fold this into the IndexingStrategy? this means that we can set this flag under the static methods such as IndexingStrategy.overrideExistingAsIfNotThere. I think it will be easier to follow the logic.

bleskes · 2017-12-15T07:55:16Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

    private boolean assertDocDoesNotExist(final Index index, final boolean allowDeleted) throws IOException {
-        final VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes());
+        final VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes()); // this uses direct access to the version map -
+        // no refresh needed here


is it true that it's not needed or is it more that we don't want to change refresh semantics in an assert method.

well if I do that here we will never test our logic since we execute this assertion all the time.

bleskes · 2017-12-15T07:55:45Z

core/src/main/java/org/elasticsearch/index/engine/LiveVersionMap.java

+        // that will prevent concurrent updates to the same document ID and therefore we can rely on the happens-before guanratee of the
+        // map reference itself.
+        private boolean unsafe;
+        boolean safeAccessRequested = false;


I think this is a left over.

bleskes · 2017-12-15T07:59:13Z

core/src/main/java/org/elasticsearch/index/engine/LiveVersionMap.java

+
+        boolean shouldInheritSafeAccess() {
+            return needsSafeAccess
+                // previous map was empty and not unsafe but the map before needed it so we maintain it


This comment is still formulated as it was in the beforeRefresh method. Maybe now it should be adapted (this map hasn't seen any operation. We should transfer the value of the previous map so that noop refreshes will not reset it).

Alternatively we can move this logic to the beforeRefresh method as this is the only place it's used at.

bleskes · 2017-12-15T08:04:33Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+    public void testVersionMapAfterAutoIDDocument() throws IOException {
+        ParsedDocument doc = testParsedDocument("1", null, testDocumentWithTextField(),
+            new BytesArray("{}".getBytes(Charset.defaultCharset())), null);
+        Engine.Index operation = appendOnlyReplica(doc, false, 1, randomIntBetween(0, 5));


should we randomize between primary and replica? they slightly different code paths in the engine.

bleskes · 2017-12-15T08:11:47Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

            };
            thread[i].start();
        }
+        try (Engine.Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {


bleskes · 2017-12-15T08:13:00Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+    public void testConcurrentAppendUpdateAndRefresh() throws InterruptedException, IOException {
+        int numDocsPerThread = scaledRandomIntBetween(100, 1000);
+        CountDownLatch latch = new CountDownLatch(2);
+        AtomicInteger threadsRunning = new AtomicInteger();


it seems this can be an atomic boolean? we ended up with just one thread..

bleskes · 2017-12-15T08:13:26Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

    }
+
+    public void testConcurrentAppendUpdateAndRefresh() throws InterruptedException, IOException {
+        int numDocsPerThread = scaledRandomIntBetween(100, 1000);


bleskes · 2017-12-15T08:16:38Z

core/src/test/java/org/elasticsearch/index/engine/LiveVersionMapTests.java

+        assertFalse(map.isUnsafe());
+        assertNotNull(map.getUnderLock(uid("1")));
+        map.beforeRefresh();
+        assertFalse(map.isUnsafe());


add assert for isSafeAccessMode?

bleskes · 2017-12-15T08:16:47Z

core/src/test/java/org/elasticsearch/index/engine/LiveVersionMapTests.java

+        assertNotNull(map.getUnderLock(uid("1")));
+        map.afterRefresh(randomBoolean());
+        assertNull(map.getUnderLock(uid("1")));
+        assertFalse(map.isUnsafe());


add assert for isSafeAccessMode?

s1monw · 2017-12-15T09:51:02Z

I also, I think we can push this to 6.x too, no?

I thinks so. it turned out to be more contained as I though it would be

bleskes

still LGTM . Thanks @s1monw

Today we still maintain a version map even if we only index append-only or in other words, documents with auto-generated IDs. We can instead maintain an un-safe version map that will be swapped to a safe version map only if necessary once we see the first document that requires access to the version map. For instance: * a auto-generated id retry * any kind of deletes * a document with a foreign ID (non-autogenerated In these cases we forcefully refresh then internal reader and start maintaining a version map until such a safe map wasn't necessary for two refresh cycles. Indices / shards that never see an autogenerated ID document will always meintain a version map and in the case of a delete / retry in a pure append-only index the version map will be de-optimized for a short amount of time until we know it's safe again to swap back. This will also minimize the requried refeshes. Closes #19813

* es/6.x: (170 commits) Allow TrimFilter to be used in custom normalizers (#27758) recovery from snapshot should fill gaps (#27850) Remove unused class PreBuiltTokenFilters (#27839) Reject scroll query if size is 0 (#22552) (#27842) Mutes ‘Rollover no condition matched’ YAML test Make randomNonNegativeLong() draw from a uniform distribution (#27856) Adapt rest test after backport. Relates #27833 Handle case where the hole vertex is south of the containing polygon(s) (#27685) Move range field mapper back to core Fix publication of elasticsearch-cli to Maven Do not use system properties when building the HttpAsyncClient (#27829) Optimize version map for append-only indexing (#27752) Add NioGroup for use in different transports (#27737) adapt field collapsing skip test version. relates #27833 Add version support for inner hits in field collapsing (#27822) (#27833) Clarify that number of threads is set by packages Register HTTP read timeout setting Fixes Checkstyle Remove `operationThreaded` from Java API (#27836) Fixes failing BytesSizeValues tests ...

s1monw added :Engine >enhancement v7.0.0 labels Dec 11, 2017

s1monw requested a review from bleskes December 11, 2017 13:28

s1monw added 6 commits December 11, 2017 16:41

Merge branch 'master' into unsafe_version_map

2f9fdde

Merge branch 'master' into unsafe_version_map

cef82c0

pass map on to putUnderLock

c2ade56

add some tests for the optimization

5181730

Merge branch 'master' into unsafe_version_map

f7efcdb

remove imports

f331092

bleskes approved these changes Dec 15, 2017

View reviewed changes

s1monw added 3 commits December 15, 2017 09:31

Merge branch 'master' into unsafe_version_map

5816308

Apply feedback

28d8766

maintain an assertion map even if ops are unsafe

c4f97dc

s1monw added the v6.2.0 label Dec 15, 2017

bleskes reviewed Dec 15, 2017

View reviewed changes

s1monw merged commit d941c64 into elastic:master Dec 15, 2017

bleskes mentioned this pull request Dec 15, 2017

Optimize indexing in create once and never update scenarios #19813

Closed

10 tasks

bleskes added the backport pending label Dec 15, 2017

s1monw removed the backport pending label Dec 15, 2017

clintongormley added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. label Feb 13, 2018

clintongormley added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. labels Feb 13, 2018

dnhatn mentioned this pull request Apr 3, 2018

Can we somehow make the engine's LiveVersionMap tracking optional? #19787

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Optimize version map for append-only indexing #27752

Optimize version map for append-only indexing #27752

Uh oh!

Conversation

s1monw commented Dec 11, 2017

Uh oh!

s1monw commented Dec 11, 2017

Uh oh!

s1monw commented Dec 11, 2017

Uh oh!

s1monw commented Dec 14, 2017

Uh oh!

bleskes commented Dec 14, 2017 via email

Uh oh!

bleskes left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

s1monw commented Dec 15, 2017

Uh oh!

bleskes left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

bleskes left a comment •

edited

Loading