Watcher: Use BulkProcessor in HistoryStore/TriggeredWatchStore #32490
Conversation
Currently a watch execution results in one bulk request, when the triggered watches that need to be executed are written into the triggered watches index. However, the update of the watch status, the creation of the watch history entry, and the deletion of the triggered watch entry are all single-document operations. This can have quite a negative impact once you are executing a lot of watches, as each execution results in four document writes, three of them being single-document actions.

This commit switches from single-document actions to a bulk processor for writing watch history entries and deleting triggered watch entries. The defaults still behave synchronously as before, because the number of concurrent requests is set to 0. This also fixes a bug where the deletion of the triggered watch entry was done asynchronously.

If you have a high number of watches being executed, you can configure Watcher to delete the triggered watch entries and write the watch history entries via bulk requests. The triggered watch deletions should still happen in a timely manner, whereas the history entries might actually be bound by size, as a single entry can easily be 20kb.

The following settings have been added:

- xpack.watcher.triggered_watch_store.bulk.actions (default 1)
- xpack.watcher.triggered_watch_store.bulk.concurrent_requests (default 0)
- xpack.watcher.triggered_watch_store.bulk.flush_interval (default 1s)
- xpack.watcher.triggered_watch_store.bulk.size (default 1mb)
- xpack.watcher.history_store.bulk.actions (default 1)
- xpack.watcher.history_store.bulk.concurrent_requests (default 0)
- xpack.watcher.history_store.bulk.flush_interval (default 1s)
- xpack.watcher.history_store.bulk.size (default 1mb)

The drawback is, of course, that on a node outage you might end up with watch history entries not being written, or with watches needing to be executed again because they have not been deleted from the triggered watches index. The window for these two cases increases when running bulk requests.
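For readers unfamiliar with the `BulkProcessor` API, here is a minimal sketch of how a processor matching the defaults above (1 action, 0 concurrent requests, 1s flush interval, 1mb size) might be built. The class, method, and no-op listener are illustrative assumptions, not the actual Watcher wiring, and the exact builder overload varies by Elasticsearch version.

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

// Illustrative sketch only: mirrors the documented defaults, not the actual Watcher code.
class WatcherBulkProcessorSketch {

    static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client::bulk, new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long executionId, BulkRequest request) {}

                @Override
                public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {}

                @Override
                public void afterBulk(long executionId, BulkRequest request, Throwable failure) {}
            })
            .setBulkActions(1)                                  // flush after every single queued action (default)
            .setConcurrentRequests(0)                           // 0 means bulks execute synchronously, as before
            .setFlushInterval(TimeValue.timeValueSeconds(1))    // flush at least once per second
            .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.MB)) // or once 1mb of requests has accumulated
            .build();
    }
}
```

With these defaults every history write and triggered-watch delete is still flushed immediately; batching only happens once the thresholds are raised.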
Pinging @elastic/es-core-infra
@elasticmachine retest this please
hub-cap left a comment
Wow, this actually cleaned things up a good bit. Great job. I had one question about sync vs async deletes, and once you answer me on that, this is good to go. I'll pre-approve your loan tho :)
Review comment on this diff hunk:

-            client.delete(request); // FIXME shouldn't we wait before saying the delete was successful
-        }
-        logger.trace("successfully deleted triggered watch with id [{}]", wid);
+        bulkProcessor.add(request);
Does executing watches take into consideration any deleted requests here? What happens if I delete a watch and then wait to submit the bulk? Might that watch trigger again? Should we make deletes the only sync thing? I think that's a fair tradeoff given my very little knowledge here.
Yes, it does. If the watch is deleted and then a triggered watch is picked up, there will be a watch history entry telling you that the referenced watch could not be found.
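On the logging concern from the diff above (the FIXME about reporting success before the delete has actually happened): with a bulk processor, per-request success or failure only becomes known in the listener's afterBulk callbacks. The listener below is a hedged sketch of where that reporting could live; the log messages and class name are illustrative, not the listener the PR actually adds.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;

// Illustrative listener: reports the outcome of bulked deletes/index requests once the bulk completes.
class LoggingBulkListener implements BulkProcessor.Listener {

    private static final Logger logger = LogManager.getLogger(LoggingBulkListener.class);

    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        logger.trace("executing bulk [{}] with [{}] actions", executionId, request.numberOfActions());
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            for (BulkItemResponse item : response.getItems()) {
                if (item.isFailed()) {
                    logger.error("bulk item [{}] failed: {}", item.getId(), item.getFailureMessage());
                }
            }
        } else {
            logger.trace("successfully executed bulk [{}]", executionId);
        }
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        logger.error("bulk [" + executionId + "] failed entirely", failure);
    }
}
```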
I thought about this as I wanted to merge and made a change that warrants some discussion. I have added a new commit that uses only one bulk processor and one set of settings instead. Do you think that's better, or should it be reverted so we go with two? My initial thought was that splitting them makes sense because the documents are extremely different in size, so one could configure different settings. But in the end you configure a flush interval or a size-based flush, so all of this can be done within one bulk processor. One bulk processor with correctly tuned settings will be as good as two.
I think this change makes sense. It's less junk in the services, making them easier to test, and it removes a bunch of nested close logic that we might forget if we change/add something. As long as we don't start adding this to the components list for guice, I'm all for it ;) I also don't see the value in 2 different sets of settings given that we can flush on size and time. I agree w/ your tuning statement above.
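To make the single-processor idea concrete, here is a rough sketch of what sharing one `BulkProcessor` between the two stores could look like. `HistoryStore` and `TriggeredWatchStore` below are simplified stand-ins with assumed constructors and method shapes, not the real classes from the PR.

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;

// Hypothetical stand-ins: both stores receive the same BulkProcessor instance, so one set of
// xpack.watcher.bulk.* settings controls flushing for history writes and triggered watch deletes alike.
class SharedBulkProcessorSketch {

    static class HistoryStore {
        private final BulkProcessor bulkProcessor;

        HistoryStore(BulkProcessor bulkProcessor) {
            this.bulkProcessor = bulkProcessor;
        }

        void put(IndexRequest watchHistoryEntry) {
            bulkProcessor.add(watchHistoryEntry);   // history entries are indexed via the shared processor
        }
    }

    static class TriggeredWatchStore {
        private final BulkProcessor bulkProcessor;

        TriggeredWatchStore(BulkProcessor bulkProcessor) {
            this.bulkProcessor = bulkProcessor;
        }

        void delete(DeleteRequest triggeredWatchEntry) {
            bulkProcessor.add(triggeredWatchEntry); // deletes go through the very same processor
        }
    }
}
```

Whether batching actually happens is then purely a function of the shared settings, which is the tuning argument made above.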
be sure to modify your commit msg and remove the extra set of settings ;)
changes look good, gl w the test failure :)
@elasticmachine retest this please
Currently a watch execution results in one bulk request, when the triggered watches that need to be executed are written into the triggered watches index. However, the update of the watch status, the creation of the watch history entry, and the deletion of the triggered watch entry are all single-document operations.

This can have quite a negative impact once you are executing a lot of watches, as each execution results in four document writes, three of them being single-document actions.

This commit switches from single-document actions to a bulk processor for writing watch history entries and deleting triggered watch entries. The defaults still behave synchronously as before, because the number of concurrent requests is set to 0. This also fixes a bug where the deletion of the triggered watch entry was done asynchronously.

If you have a high number of watches being executed, you can configure Watcher to delete the triggered watch entries and write the watch history entries via bulk requests. The triggered watch deletions should still happen in a timely manner, whereas the history entries might actually be bound by size, as a single entry can easily be 20kb.

The following settings have been added:

- xpack.watcher.bulk.actions (default 1)
- xpack.watcher.bulk.concurrent_requests (default 0)
- xpack.watcher.bulk.flush_interval (default 1s)
- xpack.watcher.bulk.size (default 1mb)

The drawback is, of course, that on a node outage you might end up with watch history entries not being written, or with watches needing to be executed again because they have not been deleted from the triggered watches index. The window for these two cases increases when the bulk processor is configured to wait until certain thresholds are reached.
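As an example of the tuning trade-off described above, a deployment executing many watches might raise the thresholds so that several history writes and triggered watch deletes are batched into one bulk request. The setting keys are the ones introduced by this change; the values below are illustrative only, and the same keys could equally be set in elasticsearch.yml.

```java
import org.elasticsearch.common.settings.Settings;

class WatcherBulkTuningExample {

    // Illustrative values: fewer, larger bulk requests at the cost of a bigger window for
    // lost history entries / re-executed watches on a node outage.
    static Settings tunedWatcherBulkSettings() {
        return Settings.builder()
            .put("xpack.watcher.bulk.actions", 100)            // flush after 100 queued actions instead of 1
            .put("xpack.watcher.bulk.concurrent_requests", 1)  // allow one bulk in flight while the next one fills
            .put("xpack.watcher.bulk.flush_interval", "1s")    // still flush at least once per second
            .put("xpack.watcher.bulk.size", "5mb")             // or once roughly 5mb of requests has accumulated
            .build();
    }
}
```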