EQL: Introduce sequencing fetch size #59063

costin · 2020-07-06T11:47:04Z

The internal sequence algorithm fetches results in pages and then paginates through the dataset. Depending on the dataset and available memory, a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size (the page size used internally) from size, the maximum number of results desired.
Running the tests with a minimal fetch size exposed a number of bugs:

Follow-up queries jumping over data, causing valid data to be seen as a gap.
Incorrectly resuming the search across pages (again causing data to be discarded).

Both have been addressed.
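
To illustrate the decoupling described above, here is a minimal sketch. It is not the code from this PR: the class `FetchSizeSketch` and the method `collect` are hypothetical, and the "search" is simulated by slicing a sorted array. It shows the property the minimal-fetch-size tests rely on: the returned results depend only on size, never on the fetch size.

```java
// Hypothetical illustration only; names are not taken from the Elasticsearch code base.
final class FetchSizeSketch {

    /**
     * Pages through sorted event ids, reading at most fetchSize ids per simulated
     * search call, and stops once size results have been collected.
     */
    static int[] collect(int[] sortedIds, int size, int fetchSize) {
        if (size < 0 || fetchSize < 1) {
            throw new IllegalArgumentException("size must be >= 0 and fetchSize >= 1");
        }
        int[] results = new int[Math.min(size, sortedIds.length)];
        int count = 0;
        int next = 0; // position right after the last id already consumed
        while (count < results.length) {
            // one simulated search call returning at most fetchSize ids
            int end = Math.min(next + fetchSize, sortedIds.length);
            for (int i = next; i < end && count < results.length; i++) {
                results[count++] = sortedIds[i];
            }
            next = end; // resume exactly where the previous page stopped, skipping nothing
        }
        return results;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5, 6, 7};
        // both calls print [1, 2, 3, 4, 5]
        System.out.println(java.util.Arrays.toString(collect(ids, 5, 2)));
        System.out.println(java.util.Arrays.toString(collect(ids, 5, 50)));
    }
}
```

A paging bug of the kind listed above would show up here as the two calls returning different arrays.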

Combine the request size and pipe declaration with the query into one.
For event queries, push the limit into the query; for sequences, keep the limit on the sequence but push the fetch size into the queries.

elasticmachine · 2020-07-06T11:47:06Z

Pinging @elastic/es-ql (:Query Languages/EQL)

costin · 2020-07-06T13:00:40Z

@elasticsearchmachine update branch

costin · 2020-07-06T13:02:51Z

Reverted merging the master due to #59066...

costin · 2020-07-06T13:57:24Z

@elasticsearchmachine update branch

matriv

LGTM. Left 2 minor comments.

matriv · 2020-07-06T15:17:18Z

client/rest-high-level/src/main/java/org/elasticsearch/client/eql/EqlSearchRequest.java

        return this.fetchSize;
    }

    public EqlSearchRequest fetchSize(int size) {


I would rename the arg to fetchSize

Will rename it in a follow-up PR. I'd like to merge this now, considering it took 3 attempts due to transient issues, and backport it to 7.x.
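
For reference, the suggested rename could look roughly like the sketch below. This is a hypothetical stand-in class (`EqlSearchRequestSketch`), not the actual `EqlSearchRequest`; the default value and the lower-bound check are assumptions, the latter based on the remark elsewhere in this review that the fetch size has to be at least 2.

```java
// Hypothetical stand-in for the client request class; not the real EqlSearchRequest.
final class EqlSearchRequestSketch {

    private int fetchSize = 1000; // placeholder default, not taken from this PR

    public int fetchSize() {
        return this.fetchSize;
    }

    // The argument is named after the field it sets, so it cannot be confused with size.
    public EqlSearchRequestSketch fetchSize(int fetchSize) {
        // Assumed validation: the sequence window needs at least two items per page.
        if (fetchSize < 2) {
            throw new IllegalArgumentException("fetch size must be at least 2, got " + fetchSize);
        }
        this.fetchSize = fetchSize;
        return this;
    }

    public static void main(String[] args) {
        EqlSearchRequestSketch request = new EqlSearchRequestSketch().fetchSize(2);
        System.out.println(request.fetchSize()); // prints 2
    }
}
```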

matriv · 2020-07-06T15:18:21Z

...k/plugin/eql/qa/common/src/main/java/org/elasticsearch/test/eql/CommonEqlActionTestCase.java

        request.tiebreakerField("event.sequence");
+        // some queries return more than 10 results
+        request.size(50);
+        request.fetchSize(2);


Does it make sense to make it random, between let's say 1 and 10?

It does, but the minimum has to be 2 since we're talking about a window (1 would work but the algo would have to change, which requires an extra call that I'd rather avoid).
I've pushed it with 2 just to get the build running enough times with this minimum size. I'll follow up in the until PR with a random value.
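
A randomized fetch size for the follow-up could be sketched as below. The class and constant names are hypothetical, and the sketch uses plain `java.util.Random` to stay self-contained; the actual tests would more likely use the randomization helpers of the Elasticsearch test framework. The lower bound is 2 rather than 1, per the window requirement mentioned above.

```java
// Hypothetical helper; bounds follow the discussion above (minimum 2, maximum 10).
final class RandomFetchSizeSketch {

    static final int MIN_FETCH_SIZE = 2;
    static final int MAX_FETCH_SIZE = 10;

    static int randomFetchSize(java.util.Random random) {
        // nextInt(bound) excludes bound, hence the + 1 to make the maximum inclusive
        return MIN_FETCH_SIZE + random.nextInt(MAX_FETCH_SIZE - MIN_FETCH_SIZE + 1);
    }

    public static void main(String[] args) {
        java.util.Random random = new java.util.Random();
        System.out.println("fetch size for this run: " + randomFetchSize(random));
    }
}
```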

The internal sequence algorithm fetches results in pages and then paginates through the dataset. Depending on the dataset and available memory, a larger page size can yield better performance at the expense of memory. This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired. Running the tests with a minimal fetch size exposed a number of bugs: follow-up queries jumping over data, causing valid data to be seen as a gap, and incorrectly resuming the search across pages (again causing data to be discarded). Both have been addressed. (cherry picked from commit 2f389a7)

costin · 2020-07-06T16:16:27Z

client/rest-high-level/src/main/java/org/elasticsearch/client/eql/EqlSearchRequest.java

    static final String KEY_EVENT_CATEGORY_FIELD = "event_category_field";
    static final String KEY_IMPLICIT_JOIN_KEY_FIELD = "implicit_join_key_field";
    static final String KEY_CASE_SENSITIVE = "case_sensitive";
    static final String KEY_SIZE = "size";


@jrodewig A heads-up on these two parameters.
The default size of the response changed from 50 to 10 (aligned with that of the search API).
Additionally, the fetch_size parameter has been introduced to indicate how large the page for sequence matching is. A larger page means faster results and fewer search calls, but at the expense of memory (more data needs to be returned per call).

Thanks for the heads up. #59085 should already cover the changes to size. I'll work on documenting the fetch_size param.
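
As a rough feel for the trade-off, the sketch below estimates how many search calls a given fetch size implies. It is a simplified model under stated assumptions: a single query, every matching event read exactly once, and no extra call to detect the end of the data. The class and method names are hypothetical.

```java
// Simplified model, not the actual sequence algorithm.
final class SearchCallEstimate {

    /** Number of pages needed to read matchingEvents events, fetchSize at a time. */
    static long pagesNeeded(long matchingEvents, int fetchSize) {
        if (matchingEvents < 0 || fetchSize < 1) {
            throw new IllegalArgumentException("matchingEvents must be >= 0 and fetchSize >= 1");
        }
        return (matchingEvents + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(pagesNeeded(1000, 2));    // prints 500
        System.out.println(pagesNeeded(1000, 50));   // prints 20
        System.out.println(pagesNeeded(1000, 1000)); // prints 1
    }
}
```

Under this model the number of calls falls as the fetch size grows, while each call has to hold a proportionally larger page in memory.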

costin · 2020-07-06T16:17:50Z

x-pack/plugin/eql/src/test/resources/queryfolder_tests.txt

 "params":{"v0":43,"v1":"serial_event_id","v2":41}
 ;
+
+eventQueryDefaultLimit


@jrodewig tail and pipe should now work for all types of queries; if you encountered problems when trying to use them in the docs, let me know, since that would be a bug.

Changes:
* Documents the `size` default as `10`.
* Updates `size` param def to note its relation to pipes.
* Updates the `head` and `tail` pipe docs to modify sequences.
* Documents the `fetch_size` parameter.

Relates to #59014 and #59063

costin added 8 commits July 6, 2020 14:39

First attempt in adding fetchSize alongside size

7bfbcc6

Integrate size/fetch size as LimitWithOffset within the planExecutor

d223863

Combine the request size and pipe declaration with the query into one.
For event queries, push the limit into the query; for sequences, keep the limit on the sequence but push the fetch size into the queries.

Add fetch size parameter in the rest client API

a7a4677

skip stages only when there are no candidates on the current stage

0d8c04b

Don't set beginning of follow-up query to avoid them jumping over data

9e15b4f

Update query next only when at least one result has been returned

6c5ca1f

Reinitialize pointer when doing desc/asc queries

4b3f7ef

Increase the default size of results in testing

e0d2f7d

costin added the :Analytics/EQL EQL querying label Jul 6, 2020

costin requested review from astefan and matriv July 6, 2020 11:47

elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Jul 6, 2020

Update EQL client test

9c9c8b2

costin force-pushed the eql/match-size branch from 323310f to 9c9c8b2 Compare July 6, 2020 13:02

Merge remote-tracking branch 'remotes/upstream/master' into eql/match…

a108555

…-size

matriv approved these changes Jul 6, 2020

costin merged commit 2f389a7 into elastic:master Jul 6, 2020

costin deleted the eql/match-size branch July 6, 2020 15:50

costin mentioned this pull request Jul 6, 2020

EQL: Most recent matches returned by default #58646

Closed

jrodewig mentioned this pull request Jul 6, 2020

[DOCS] EQL: Document size limit for pipes #59085

Merged
