Skip to content

Conversation

@s1monw
Copy link
Contributor

@s1monw s1monw commented Dec 11, 2017

Today we still maintain a version map even if we only index append-only
or in other words, documents with auto-generated IDs. We can instead maintain
an un-safe version map that will be swapped to a safe version map only if necessary
once we see the first document that requires access to the version map. For instance:

  • a auto-generated id retry
  • any kind of deletes
  • a document with a foreign ID (non-autogenerated

In these cases we forcefully refresh then internal reader and start maintaining
a version map until such a safe map wasn't necessary for two refresh cycles.
Indices / shards that never see an autogenerated ID document will always meintain a version
map and in the case of a delete / retry in a pure append-only index the version map will be
de-optimized for a short amount of time until we know it's safe again to swap back. This
will also minimize the required refreshes.

Today we still maintain a version map even if we only index append-only
or in other words, documents with auto-generated IDs. We can instead maintain
an un-safe version map that will be swapped to a safe version map only if necessary
once we see the first document that requires access to the version map. For instance:
 * a auto-generated id retry
 * any kind of deletes
 * a document with a foreign ID (non-autogenerated

In these cases we forcefully refresh then internal reader and start maintaining
a version map until such a safe map wasn't necessary for two refresh cycles.
Indices / shards that never see an autogenerated ID document will always meintain a version
map and in the case of a delete / retry in a pure append-only index the version map will be
de-optimized for a short amount of time until we know it's safe again to swap back. This
will also minimize the requried refeshes.
@s1monw
Copy link
Contributor Author

s1monw commented Dec 11, 2017

this shows significant indexing improvements:

Geopoints:


|                         Metric |         Task |   Baseline |   Contender |     Diff |   Unit |
|-------------------------------:|-------------:|-----------:|------------:|---------:|-------:|
|                  Indexing time |              |    26.9624 |     23.4305 | -3.53188 |    min |
|                     Merge time |              |    5.49877 |     4.53233 | -0.96643 |    min |
|                   Refresh time |              |    1.16957 |    0.993367 |  -0.1762 |    min |
|                     Flush time |              |    0.51805 |    0.468367 | -0.04968 |    min |
|            Merge throttle time |              |    1.76193 |     1.26982 | -0.49212 |    min |
|             Total Young Gen GC |              |     24.711 |      10.726 |  -13.985 |      s |
|               Total Old Gen GC |              |      0.727 |       0.489 |   -0.238 |      s |
|                     Index size |              |    6.03152 |     6.03097 | -0.00055 |     GB |
|                Totally written |              |    20.3174 |     19.3723 | -0.94511 |     GB |
|         Heap used for segments |              |    23.2156 |     22.2079 |  -1.0077 |     MB |
|       Heap used for doc values |              |  0.0111542 |   0.0166664 |  0.00551 |     MB |
|            Heap used for terms |              |    21.0925 |     20.2266 |  -0.8659 |     MB |
|            Heap used for norms |              |          0 |           0 |        0 |     MB |
|           Heap used for points |              |   0.879958 |    0.850516 | -0.02944 |     MB |
|    Heap used for stored fields |              |    1.23195 |     1.11408 | -0.11787 |     MB |
|                  Segment count |              |        172 |         161 |      -11 |        |
|                 Min Throughput | index-append |     182101 |      191317 |  9216.29 | docs/s |
|              Median Throughput | index-append |     183596 |      196235 |  12639.3 | docs/s |
|                 Max Throughput | index-append |     185967 |      199587 |  13620.4 | docs/s |
|        50th percentile latency | index-append |    170.042 |     154.773 | -15.2692 |     ms |
|        90th percentile latency | index-append |    284.776 |     266.989 | -17.7874 |     ms |
|        99th percentile latency | index-append |    694.106 |     724.595 |  30.4894 |     ms |
|      99.9th percentile latency | index-append |    1038.56 |      1844.6 |  806.043 |     ms |
|       100th percentile latency | index-append |    1361.68 |     2422.32 |  1060.64 |     ms |
|   50th percentile service time | index-append |    170.042 |     154.773 | -15.2692 |     ms |
|   90th percentile service time | index-append |    284.776 |     266.989 | -17.7874 |     ms |
|   99th percentile service time | index-append |    694.106 |     724.595 |  30.4894 |     ms |
| 99.9th percentile service time | index-append |    1038.56 |      1844.6 |  806.043 |     ms |
|  100th percentile service time | index-append |    1361.68 |     2422.32 |  1060.64 |     ms |
|                     error rate | index-append |          0 |           0 |        0 |      % |

and geonames:


|                        Metric |         Task |   Baseline |   Contender |     Diff |   Unit |
|------------------------------:|-------------:|-----------:|------------:|---------:|-------:|
|                 Indexing time |              |    22.4665 |     21.8181 | -0.64835 |    min |
|                    Merge time |              |    3.66453 |      3.5914 | -0.07313 |    min |
|                  Refresh time |              |    1.41763 |     1.41457 | -0.00307 |    min |
|                    Flush time |              |     0.3911 |      0.4539 |   0.0628 |    min |
|           Merge throttle time |              |   0.876583 |    0.900417 |  0.02383 |    min |
|            Total Young Gen GC |              |       9.87 |       7.339 |   -2.531 |      s |
|              Total Old Gen GC |              |       0.53 |       0.492 |   -0.038 |      s |
|                    Index size |              |    5.75518 |     5.87197 |  0.11678 |     GB |
|               Totally written |              |    13.1005 |     13.0266 | -0.07381 |     GB |
|        Heap used for segments |              |     21.857 |     21.9353 |  0.07827 |     MB |
|      Heap used for doc values |              |  0.0544395 |   0.0587387 |   0.0043 |     MB |
|           Heap used for terms |              |    20.5231 |     20.5907 |  0.06768 |     MB |
|           Heap used for norms |              |   0.142517 |    0.141418 |  -0.0011 |     MB |
|          Heap used for points |              |    0.28997 |    0.295731 |  0.00576 |     MB |
|   Heap used for stored fields |              |      0.847 |    0.848633 |  0.00163 |     MB |
|                 Segment count |              |        181 |         180 |       -1 |        |
|                Min Throughput | index-append |    70267.3 |     77435.2 |   7167.9 | docs/s |
|             Median Throughput | index-append |    71270.2 |       79231 |  7960.76 | docs/s |
|                Max Throughput | index-append |    71692.2 |       79573 |  7880.85 | docs/s |
|       50th percentile latency | index-append |    381.892 |     336.063 | -45.8289 |     ms |
|       90th percentile latency | index-append |    841.375 |     529.527 | -311.849 |     ms |
|       99th percentile latency | index-append |    1914.11 |     1827.01 | -87.1048 |     ms |
|      100th percentile latency | index-append |    2300.26 |     2282.18 | -18.0827 |     ms |
|  50th percentile service time | index-append |    381.892 |     336.063 | -45.8289 |     ms |
|  90th percentile service time | index-append |    841.375 |     529.527 | -311.849 |     ms |
|  99th percentile service time | index-append |    1914.11 |     1827.01 | -87.1048 |     ms |
| 100th percentile service time | index-append |    2300.26 |     2282.18 | -18.0827 |     ms |
|                    error rate | index-append |          0 |           0 |        0 |      % |

@s1monw
Copy link
Contributor Author

s1monw commented Dec 11, 2017

@danielmitterdorfer FYI

@s1monw
Copy link
Contributor Author

s1monw commented Dec 14, 2017

@bleskes does this resolve #19813 I'd say it does

@bleskes
Copy link
Contributor

bleskes commented Dec 14, 2017 via email

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This is awesome! I left some minor nits for your consideration. I also think we can push this to 6.x too, no?

assert index.version() == 1L : "can optimize on replicas but incoming version is [" + index.version() + "]";
plan = IndexingStrategy.optimizedAppendOnly(index.seqNo());
} else {
versionMap.enforceSafeAccess();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an idea - shall we fold this into the IndexingStrategy? this means that we can set this flag under the static methods such as IndexingStrategy.overrideExistingAsIfNotThere. I think it will be easier to follow the logic.

private boolean assertDocDoesNotExist(final Index index, final boolean allowDeleted) throws IOException {
final VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes());
final VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes()); // this uses direct access to the version map -
// no refresh needed here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it true that it's not needed or is it more that we don't want to change refresh semantics in an assert method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well if I do that here we will never test our logic since we execute this assertion all the time.

// that will prevent concurrent updates to the same document ID and therefore we can rely on the happens-before guanratee of the
// map reference itself.
private boolean unsafe;
boolean safeAccessRequested = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a left over.


boolean shouldInheritSafeAccess() {
return needsSafeAccess
// previous map was empty and not unsafe but the map before needed it so we maintain it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is still formulated as it was in the beforeRefresh method. Maybe now it should be adapted (this map hasn't seen any operation. We should transfer the value of the previous map so that noop refreshes will not reset it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we can move this logic to the beforeRefresh method as this is the only place it's used at.

public void testVersionMapAfterAutoIDDocument() throws IOException {
ParsedDocument doc = testParsedDocument("1", null, testDocumentWithTextField(),
new BytesArray("{}".getBytes(Charset.defaultCharset())), null);
Engine.Index operation = appendOnlyReplica(doc, false, 1, randomIntBetween(0, 5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we randomize between primary and replica? they slightly different code paths in the engine.

};
thread[i].start();
}
try (Engine.Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

public void testConcurrentAppendUpdateAndRefresh() throws InterruptedException, IOException {
int numDocsPerThread = scaledRandomIntBetween(100, 1000);
CountDownLatch latch = new CountDownLatch(2);
AtomicInteger threadsRunning = new AtomicInteger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems this can be an atomic boolean? we ended up with just one thread..

}

public void testConcurrentAppendUpdateAndRefresh() throws InterruptedException, IOException {
int numDocsPerThread = scaledRandomIntBetween(100, 1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numDocs?

assertFalse(map.isUnsafe());
assertNotNull(map.getUnderLock(uid("1")));
map.beforeRefresh();
assertFalse(map.isUnsafe());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert for isSafeAccessMode?

assertNotNull(map.getUnderLock(uid("1")));
map.afterRefresh(randomBoolean());
assertNull(map.getUnderLock(uid("1")));
assertFalse(map.isUnsafe());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert for isSafeAccessMode?

@s1monw s1monw added the v6.2.0 label Dec 15, 2017
@s1monw
Copy link
Contributor Author

s1monw commented Dec 15, 2017

I also, I think we can push this to 6.x too, no?

I thinks so. it turned out to be more contained as I though it would be

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still LGTM . Thanks @s1monw

@s1monw s1monw merged commit d941c64 into elastic:master Dec 15, 2017
s1monw added a commit that referenced this pull request Dec 15, 2017
Today we still maintain a version map even if we only index append-only
or in other words, documents with auto-generated IDs. We can instead maintain
an un-safe version map that will be swapped to a safe version map only if necessary
once we see the first document that requires access to the version map. For instance:
 * a auto-generated id retry
 * any kind of deletes
 * a document with a foreign ID (non-autogenerated

In these cases we forcefully refresh then internal reader and start maintaining
a version map until such a safe map wasn't necessary for two refresh cycles.
Indices / shards that never see an autogenerated ID document will always meintain a version
map and in the case of a delete / retry in a pure append-only index the version map will be
de-optimized for a short amount of time until we know it's safe again to swap back. This
will also minimize the requried refeshes.

Closes #19813
martijnvg added a commit that referenced this pull request Dec 18, 2017
* es/6.x: (170 commits)
  Allow TrimFilter to be used in custom normalizers (#27758)
  recovery from snapshot should fill gaps (#27850)
  Remove unused class PreBuiltTokenFilters (#27839)
  Reject scroll query if size is 0 (#22552) (#27842)
  Mutes ‘Rollover no condition matched’ YAML test
  Make randomNonNegativeLong() draw from a uniform distribution (#27856)
  Adapt rest test after backport. Relates #27833
  Handle case where the hole vertex is south of the containing polygon(s) (#27685)
  Move range field mapper back to core
  Fix publication of elasticsearch-cli to Maven
  Do not use system properties when building the HttpAsyncClient (#27829)
  Optimize version map for append-only indexing (#27752)
  Add NioGroup for use in different transports (#27737)
  adapt field collapsing skip test version. relates #27833
  Add version support for inner hits in field collapsing (#27822) (#27833)
  Clarify that number of threads is set by packages
  Register HTTP read timeout setting
  Fixes Checkstyle
  Remove `operationThreaded` from Java API (#27836)
  Fixes failing BytesSizeValues tests
  ...
@clintongormley clintongormley added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. label Feb 13, 2018
@clintongormley clintongormley added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.2.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants