Optimize version map for append-only indexing #27752
Today we still maintain a version map even if we only index append-only documents, in other words, documents with auto-generated IDs. We can instead maintain an unsafe version map that is swapped to a safe version map only when necessary, i.e. once we see the first document that requires access to the version map. For instance:
* an auto-generated ID retry
* any kind of delete
* a document with a foreign (non-auto-generated) ID

In these cases we forcefully refresh the internal reader and start maintaining a version map until a safe map has not been needed for two refresh cycles. Indices / shards that never see an auto-generated ID document will always maintain a version map, and in the case of a delete / retry in a pure append-only index the version map will be de-optimized for a short amount of time until we know it's safe to swap back again. This will also minimize the required refreshes.
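The swap described above can be pictured as a small state machine. This is a minimal, hypothetical Java sketch, not the engine's code: `OpKind`, `AppendOnlyVersionMapSketch` and the forced-refresh counter are illustrative names, and a delete is stored as a plain entry rather than a tombstone. Swapping back to the unsafe map after the refresh cycles is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical classification of incoming operations; not an Elasticsearch type.
enum OpKind { APPEND_ONLY, AUTO_ID_RETRY, DELETE, EXTERNAL_ID }

// Minimal sketch of an append-only-aware version map. Names are illustrative only.
final class AppendOnlyVersionMapSketch {
    private final Map<String, Long> versions = new HashMap<>();
    private boolean safeAccess = false; // starts "unsafe": append-only docs are not tracked
    private int forcedRefreshes = 0;    // stand-in for refreshing the internal reader

    static boolean requiresSafeAccess(OpKind kind) {
        // Retries of auto-generated IDs, deletes and foreign IDs all need version lookups.
        return kind != OpKind.APPEND_ONLY;
    }

    void apply(OpKind kind, String id, long version) {
        if (!safeAccess && requiresSafeAccess(kind)) {
            // Documents indexed while unsafe have no map entry, so make them visible
            // to the internal reader before relying on version lookups.
            forcedRefreshes++;
            safeAccess = true;
        }
        if (safeAccess) {
            // Simplification: a delete is recorded as a plain entry, not a tombstone.
            versions.put(id, version);
        }
        // Otherwise: append-only fast path, nothing is recorded in the map.
    }

    boolean isSafeAccessMode() { return safeAccess; }
    int forcedRefreshes() { return forcedRefreshes; }
    int trackedEntries() { return versions.size(); }
}
```

Only the first operation that needs lookups pays for a refresh; later ones find the map already in safe mode.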
This PR shows significant indexing improvements for geopoints and geonames.
I agree it does, but I need to update it as we ended up doing a whole
different approach.
On Thu, Dec 14, 2017 at 2:52 PM, Simon Willnauer wrote:
@bleskes does this resolve #19813? I'd say it does.
LGTM. This is awesome! I left some minor nits for your consideration. I also think we can push this to 6.x too, no?
    assert index.version() == 1L : "can optimize on replicas but incoming version is [" + index.version() + "]";
    plan = IndexingStrategy.optimizedAppendOnly(index.seqNo());
} else {
    versionMap.enforceSafeAccess();
Just an idea: shall we fold this into the IndexingStrategy? That means we can set this flag in the static methods such as IndexingStrategy.overrideExistingAsIfNotThere. I think it will make the logic easier to follow.
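A sketch of the suggestion, assuming a simplified plan object. `PlanSketch`, `requiresSafeVersionMap` and `PlanExecutorSketch` are hypothetical; only the factory names `optimizedAppendOnly` and `overrideExistingAsIfNotThere` come from the review, and which factory sets the flag is an assumption.

```java
// Hypothetical, simplified plan object; only the factory names come from the review.
final class PlanSketch {
    final boolean optimizeAppendOnly;
    final boolean requiresSafeVersionMap; // flag folded into the plan, per the suggestion
    final long seqNo;

    private PlanSketch(boolean optimizeAppendOnly, boolean requiresSafeVersionMap, long seqNo) {
        this.optimizeAppendOnly = optimizeAppendOnly;
        this.requiresSafeVersionMap = requiresSafeVersionMap;
        this.seqNo = seqNo;
    }

    static PlanSketch optimizedAppendOnly(long seqNo) {
        return new PlanSketch(true, false, seqNo);
    }

    // Assumption: the non-optimized path is the one that needs safe access.
    static PlanSketch overrideExistingAsIfNotThere(long seqNo) {
        return new PlanSketch(false, true, seqNo);
    }
}

final class PlanExecutorSketch {
    private boolean safeAccessEnforced = false;

    // The executor reads the flag from the plan instead of each planning
    // branch calling enforceSafeAccess() itself.
    void execute(PlanSketch plan) {
        if (plan.requiresSafeVersionMap) {
            safeAccessEnforced = true; // stand-in for versionMap.enforceSafeAccess()
        }
        // ... index the document according to the plan ...
    }

    boolean safeAccessEnforced() { return safeAccessEnforced; }
}
```

The upside is that the decision sits next to the plan it belongs to, so each static factory documents whether it needs a safe map.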
private boolean assertDocDoesNotExist(final Index index, final boolean allowDeleted) throws IOException {
    final VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes()); // this uses direct access to the version map -
    // no refresh needed here
Is it true that it's not needed, or is it more that we don't want to change refresh semantics in an assert method?
Well, if I do that here we will never test our logic, since we execute this assertion all the time.
// that will prevent concurrent updates to the same document ID and therefore we can rely on the happens-before guarantee of the
// map reference itself.
private boolean unsafe;
boolean safeAccessRequested = false;
I think this is a leftover.
boolean shouldInheritSafeAccess() {
    return needsSafeAccess
        // previous map was empty and not unsafe but the map before needed it so we maintain it
This comment is still phrased as it was in the beforeRefresh method. Maybe it should be adapted now (this map hasn't seen any operation, so we should transfer the value of the previous map so that no-op refreshes do not reset it).
Alternatively, we can move this logic into the beforeRefresh method, as that's the only place it's used.
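One way to keep the inheritance decision inside `beforeRefresh` is sketched here under simplifying assumptions: a single map per refresh cycle (no separate old map), hypothetical class names, and safe mode that lasts for the cycle in which it was requested plus the following cycle, with no-op refreshes carrying the previous flag over.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-refresh-cycle state; not the real version map internals.
final class CycleMaps {
    final Map<String, Long> current = new HashMap<>();
    boolean unsafe;           // an append-only op was indexed without a map entry
    boolean needsSafeAccess;  // safe access was requested during this cycle
    final boolean previousNeededSafeAccess;

    CycleMaps(boolean previousNeededSafeAccess) {
        this.previousNeededSafeAccess = previousNeededSafeAccess;
    }

    boolean isSafeAccessMode() {
        return needsSafeAccess || previousNeededSafeAccess;
    }
}

final class RefreshCycleSketch {
    private CycleMaps maps = new CycleMaps(false);

    void enforceSafeAccess() {
        maps.needsSafeAccess = true;
    }

    // Append-only ops are only tracked while the map is in safe access mode.
    void putAppendOnly(String id, long version) {
        if (maps.isSafeAccessMode()) {
            maps.current.put(id, version);
        } else {
            maps.unsafe = true;
        }
    }

    // Any other op (retry, delete, foreign ID) requests safe access first.
    void put(String id, long version) {
        enforceSafeAccess();
        maps.current.put(id, version);
    }

    // The inheritance decision lives only here, as suggested in the review.
    void beforeRefresh() {
        boolean sawNoOperations = maps.current.isEmpty() && !maps.unsafe;
        boolean inherit = maps.needsSafeAccess
            // This map saw no operations: carry the previous flag over so that
            // no-op refreshes do not end safe access mode early.
            || (sawNoOperations && maps.previousNeededSafeAccess);
        maps = new CycleMaps(inherit);
    }

    boolean isSafeAccessMode() {
        return maps.isSafeAccessMode();
    }
}
```

With this shape, a test can assert `isSafeAccessMode()` after each `beforeRefresh()` call to pin down exactly when the map falls back to unsafe mode.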
public void testVersionMapAfterAutoIDDocument() throws IOException {
    ParsedDocument doc = testParsedDocument("1", null, testDocumentWithTextField(),
        new BytesArray("{}".getBytes(Charset.defaultCharset())), null);
    Engine.Index operation = appendOnlyReplica(doc, false, 1, randomIntBetween(0, 5));
Should we randomize between primary and replica? They take slightly different code paths in the engine.
    };
    thread[i].start();
}
try (Engine.Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
+1
public void testConcurrentAppendUpdateAndRefresh() throws InterruptedException, IOException {
    int numDocsPerThread = scaledRandomIntBetween(100, 1000);
    CountDownLatch latch = new CountDownLatch(2);
    AtomicInteger threadsRunning = new AtomicInteger();
It seems this can be an AtomicBoolean? We ended up with just one thread.
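With a single background thread, a boolean stop flag is enough. A minimal sketch of that pattern, assuming a test that indexes on the calling thread while one thread refreshes; `RefresherLoopSketch` is hypothetical and the counter increment stands in for an engine refresh.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper; the counter increment stands in for an engine refresh.
final class RefresherLoopSketch {
    static int runWithBackgroundRefresher(Runnable work) throws InterruptedException {
        AtomicBoolean running = new AtomicBoolean(true); // one thread, so a flag suffices
        AtomicInteger refreshes = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        Thread refresher = new Thread(() -> {
            started.countDown();
            while (running.get()) {
                refreshes.incrementAndGet(); // stand-in for a refresh call
                Thread.yield();
            }
        });
        refresher.start();
        started.await(); // make sure the refresher is live before indexing starts
        try {
            work.run();
        } finally {
            running.set(false);
            refresher.join(); // wait for the loop to observe the flag and exit
        }
        return refreshes.get();
    }
}
```

An AtomicInteger counting running threads only pays off when several threads must signal completion; with one thread, `join()` already tells us it has stopped.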
}
public void testConcurrentAppendUpdateAndRefresh() throws InterruptedException, IOException {
    int numDocsPerThread = scaledRandomIntBetween(100, 1000);
numDocs?
assertFalse(map.isUnsafe());
assertNotNull(map.getUnderLock(uid("1")));
map.beforeRefresh();
assertFalse(map.isUnsafe());
Add an assert for isSafeAccessMode?
assertNotNull(map.getUnderLock(uid("1")));
map.afterRefresh(randomBoolean());
assertNull(map.getUnderLock(uid("1")));
assertFalse(map.isUnsafe());
Add an assert for isSafeAccessMode?
I think so. It turned out to be more contained than I thought it would be.
bleskes left a comment
Still LGTM. Thanks @s1monw
Today we still maintain a version map even if we only index append-only documents, in other words, documents with auto-generated IDs. We can instead maintain an unsafe version map that is swapped to a safe version map only when necessary, i.e. once we see the first document that requires access to the version map. For instance:
* an auto-generated ID retry
* any kind of delete
* a document with a foreign (non-auto-generated) ID

In these cases we forcefully refresh the internal reader and start maintaining a version map until a safe map has not been needed for two refresh cycles. Indices / shards that never see an auto-generated ID document will always maintain a version map, and in the case of a delete / retry in a pure append-only index the version map will be de-optimized for a short amount of time until we know it's safe to swap back again. This will also minimize the required refreshes. Closes #19813
* es/6.x: (170 commits) Allow TrimFilter to be used in custom normalizers (#27758) recovery from snapshot should fill gaps (#27850) Remove unused class PreBuiltTokenFilters (#27839) Reject scroll query if size is 0 (#22552) (#27842) Mutes ‘Rollover no condition matched’ YAML test Make randomNonNegativeLong() draw from a uniform distribution (#27856) Adapt rest test after backport. Relates #27833 Handle case where the hole vertex is south of the containing polygon(s) (#27685) Move range field mapper back to core Fix publication of elasticsearch-cli to Maven Do not use system properties when building the HttpAsyncClient (#27829) Optimize version map for append-only indexing (#27752) Add NioGroup for use in different transports (#27737) adapt field collapsing skip test version. relates #27833 Add version support for inner hits in field collapsing (#27822) (#27833) Clarify that number of threads is set by packages Register HTTP read timeout setting Fixes Checkstyle Remove `operationThreaded` from Java API (#27836) Fixes failing BytesSizeValues tests ...