-
Notifications
You must be signed in to change notification settings - Fork 25.6k
EQL: Introduce sequencing fetch size #59063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Combine the request size and pipe declaration with the query into one For eventQuery push the limit into the query, for sequences keep the limit on the sequence but push the fetch size in the queries.
|
Pinging @elastic/es-ql (:Query Languages/EQL) |
|
@elasticsearchmachine update branch |
|
Reverted merging the master due to #59066... |
|
@elasticsearchmachine update branch |
matriv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. left 2 minor comments
| return this.fetchSize; | ||
| } | ||
|
|
||
| public EqlSearchRequest fetchSize(int size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename the arg to fetchSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will rename it in a follow-up PR. I'd like to merge this in considering it took 3 times due to transient issues and backport it to 7.x
| request.tiebreakerField("event.sequence"); | ||
| // some queries return more than 10 results | ||
| request.size(50); | ||
| request.fetchSize(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to make it a random between let's say 1 and 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does - it has to be 2 since we're talking about a window (1 would work but the algo would have to change - that requires an extra call that I'd rather avoid).
I've pushed it with 2 just to get the build running enough times with this minimum size - I'll follow-up in the until PR with random.
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory. This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired. As such, use in testing a minimum fetch size which exposed a number of bugs: Jumping across data across queries causing valid data to be seen as a gap. Incorrectly resuming searching across pages (again causing data to be discarded). which have been addressed. (cherry picked from commit 2f389a7)
| static final String KEY_EVENT_CATEGORY_FIELD = "event_category_field"; | ||
| static final String KEY_IMPLICIT_JOIN_KEY_FIELD = "implicit_join_key_field"; | ||
| static final String KEY_CASE_SENSITIVE = "case_sensitive"; | ||
| static final String KEY_SIZE = "size"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrodewig A heads-up on these two parameters.
The default size of the response changed to 10 from 50 (aligned with that of the search api).
Additionally the fetch_size parameters has been introduced to indicate how large the page for sequence matching is. A large page means faster results and less search calls but it does so at the expense of memory (more data needs to be returned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up. #59085 should already cover the changes to size. I'll work on documenting the fetch_size param.
| "params":{"v0":43,"v1":"serial_event_id","v2":41} | ||
| ; | ||
|
|
||
| eventQueryDefaultLimit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrodewig tail and pipe should now work for all type of queries; if you encountered problems when trying to use them in the docs let me know since that would be a bug.
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired.
As such, use in testing a minimum fetch size which exposed a number of bugs:
which have been addressed.