
Conversation

@spinscale (Contributor) commented Jul 31, 2018

Currently a watch execution results in only one bulk request: the one that writes the
triggered watches that need to be executed into the triggered watches index. However,
the update of the watch status, the creation of the watch history entry, and the
deletion of the entry from the triggered watches index are all single-document
operations.

This can have quite a negative impact once you are executing a lot of watches, as
each execution results in four document writes, three of them being single-document
actions.

This commit switches to a bulk processor instead of single-document actions for
writing watch history entries and deleting triggered watch entries. The defaults,
however, still behave synchronously as before, because the number of concurrent
requests is set to 0. This also fixes a bug where the deletion of the triggered
watch entry was done asynchronously.
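
For illustration, here is a minimal sketch of how such a processor can be wired up with Elasticsearch's BulkProcessor API (6.x signatures). The class name is a placeholder and not the actual Watcher code; the values mirror the defaults of the settings listed further below.

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

class WatcherBulkProcessorSketch {

    static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long executionId, BulkRequest request) {
                    // nothing to do before the bulk request is sent
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                    // check response.hasFailures() and log failed history writes or deletes
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                    // the whole bulk request failed, e.g. while the node was shutting down
                }
            })
            .setBulkActions(1)                                  // xpack.watcher.bulk.actions
            .setConcurrentRequests(0)                           // xpack.watcher.bulk.concurrent_requests
            .setFlushInterval(TimeValue.timeValueSeconds(1))    // xpack.watcher.bulk.flush_interval
            .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.MB)) // xpack.watcher.bulk.size
            .build();
    }
}

With zero concurrent requests, BulkProcessor executes a flushed bulk on the thread that added the last request, which is what keeps the default behaviour synchronous.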

However, if you have a high number of watches being executed, you can configure
watcher to delete the triggered watch entries and write the watch history entries
via batched bulk requests.

The triggered watch deletions should still happen in a timely manner, whereas the
history entries are more likely to be bound by the size threshold, as a single
entry can easily be 20kb.

The following settings have been added:

  • xpack.watcher.bulk.actions (default 1)
  • xpack.watcher.bulk.concurrent_requests (default 0)
  • xpack.watcher.bulk.flush_interval (default 1s)
  • xpack.watcher.bulk.size (default 1mb)
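
As an example only (the values below are illustrative assumptions, not recommendations), a deployment that executes many watches could raise these thresholds, shown here via the node Settings builder:

import org.elasticsearch.common.settings.Settings;

class WatcherBulkTuningSketch {
    static Settings highThroughputSettings() {
        return Settings.builder()
            .put("xpack.watcher.bulk.actions", 100)            // flush once 100 actions have been queued
            .put("xpack.watcher.bulk.concurrent_requests", 1)  // allow one asynchronous bulk request in flight
            .put("xpack.watcher.bulk.flush_interval", "5s")    // or flush at the latest every five seconds
            .put("xpack.watcher.bulk.size", "5mb")             // or once roughly 5mb of requests have accumulated
            .build();
    }
}

The same four keys can equally be set in elasticsearch.yml. With history entries of roughly 20kb each, a 5mb size threshold corresponds to about 250 buffered entries before a size-based flush, while the flush interval bounds how long a triggered watch deletion can stay buffered.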

The drawback of this is, of course, that on a node outage you might end up with
watch history entries not being written, or with watches being executed again
because they have not been deleted from the triggered watches index. The window
for these two cases grows when the bulk processor is configured to wait until
certain thresholds are reached before flushing.

@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@spinscale (Contributor, Author)

@elasticmachine retest this please

@hub-cap (Contributor) left a comment

wow this actually cleaned things up a good bit. Great job. I had one question about sync vs async deletes, and once you answer me on that, this is good to go. I'll pre-approve your loan tho :)

client.delete(request); // FIXME shouldn't we wait before saying the delete was successful
}
logger.trace("successfully deleted triggered watch with id [{}]", wid);
bulkProcessor.add(request);

@hub-cap (Contributor)

Does executing watches take into consideration any deleted requests here? What happens if I delete a watch and then wait to submit the bulk, might that watch trigger again? Should we make deletes the only sync thing? I think that's a fair tradeoff given my very little knowledge here.

@spinscale (Contributor, Author)

Yes, it does. If the watch is deleted and then a triggered watch is picked up, there will be a watch history entry telling you that the referenced watch could not be found.

@spinscale (Contributor, Author)

I thought about this as I wanted to merge and made a change that warrants some discussion. I have added a new commit that uses only one bulk processor and one set of settings instead. Do you think that's better, or should it be reverted so we go with two?

My initial thought was that splitting those makes sense because the documents are extremely different in size, so one could configure different settings for each - but in the end you configure flushing based on an interval or on size, so all of this can be done within one bulk processor. One bulk processor with correctly tuned settings will be as good as two.
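
For illustration, a minimal sketch of why a single processor suffices (assuming 6.x request APIs; the index names, document type, and identifiers are made up, not taken from this PR): index and delete operations can travel in the same bulk request.

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

class SharedBulkSketch {
    // Both operation types go through the same processor and end up in the same
    // bulk request once a flush threshold is reached.
    static void recordExecution(BulkProcessor bulkProcessor, String historyEntryJson, String triggeredWatchId) {
        // write the watch history entry (illustrative index and type names)
        bulkProcessor.add(new IndexRequest(".watcher-history", "doc").source(historyEntryJson, XContentType.JSON));
        // delete the processed entry from the triggered watches index
        bulkProcessor.add(new DeleteRequest(".triggered_watches", "doc", triggeredWatchId));
    }
}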

@hub-cap (Contributor) commented Aug 21, 2018

I think this change makes sense. It's less junk in the services, making them easier to test, and it removes a bunch of nested close logic that we might forget about if we change or add something. As long as we don't start adding this to the components list for Guice, I'm all for it ;)

I also don't see the value in two different sets of settings, given that we can flush on size and time. I agree with your tuning statement above. :shipit:

@hub-cap (Contributor) commented Aug 21, 2018

Be sure to modify your commit message and remove the extra set of settings ;)

@hub-cap (Contributor) commented Aug 31, 2018

Changes look good, good luck with the test failure :)

@spinscale (Contributor, Author)

@elasticmachine retest this please

@spinscale merged commit 1391288 into elastic:master Sep 18, 2018
spinscale added a commit that referenced this pull request Sep 18, 2018