
@costin
Member

@costin costin commented Jul 6, 2020

The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and available memory, a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size (the page size) from size (the maximum number of results desired).
As a result, the tests now use a minimal fetch size, which exposed a number of bugs:

  1. Jumping over data across queries, causing valid data to be treated as a gap.
  2. Incorrectly resuming the search across pages (again causing data to be discarded).

Both have been addressed.
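To illustrate the size/fetch size split, here is a minimal sketch of plain keyset (search_after-style) pagination, not the EQL sequence algorithm itself: `size` caps how many results are returned, while `fetchSize` only decides how many hits each search call brings back. It assumes an in-memory list of unique, sorted keys standing in for the index (uniqueness is what a tiebreaker such as `event.sequence` provides), and all class and method names are hypothetical. Resuming strictly after the last hit of each page is the property the second bug above was about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: `size` caps the number of results returned, while
// `fetchSize` only controls how many hits each search call brings back.
// An in-memory list of unique, sorted keys stands in for the index.
final class PagedFetchSketch {
    private final List<Long> sortedIndex;
    int searchCalls = 0;

    PagedFetchSketch(List<Long> sortedIndex) {
        this.sortedIndex = sortedIndex;
    }

    // One "search call": up to fetchSize keys strictly after `after`
    // (a stand-in for search_after-style paging).
    private List<Long> fetchPage(long after, int fetchSize) {
        searchCalls++;
        List<Long> page = new ArrayList<>();
        for (long key : sortedIndex) {
            if (key > after) {
                page.add(key);
                if (page.size() == fetchSize) {
                    break;
                }
            }
        }
        return page;
    }

    // Collects at most `size` hits, paging through the data fetchSize at a time.
    List<Long> collect(int size, int fetchSize) {
        List<Long> results = new ArrayList<>();
        long after = Long.MIN_VALUE;
        while (results.size() < size) {
            List<Long> page = fetchPage(after, fetchSize);
            if (page.isEmpty()) {
                break; // data exhausted
            }
            for (long key : page) {
                if (results.size() == size) {
                    break;
                }
                results.add(key);
            }
            // Resume strictly after the last hit of this page so that
            // no hit is skipped or read twice on the next call.
            after = page.get(page.size() - 1);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Long> index = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L);
        PagedFetchSketch small = new PagedFetchSketch(index);
        System.out.println(small.collect(5, 2) + " in " + small.searchCalls + " calls");
        PagedFetchSketch large = new PagedFetchSketch(index);
        System.out.println(large.collect(5, 10) + " in " + large.searchCalls + " calls");
    }
}
```

Both runs return the same five results; only the number of search calls differs (3 with a fetch size of 2, 1 with a fetch size of 10).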

@costin costin added the :Analytics/EQL EQL querying label Jul 6, 2020
@costin costin requested review from astefan and matriv July 6, 2020 11:47
@elasticmachine
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Jul 6, 2020
@costin
Member Author

costin commented Jul 6, 2020

@elasticsearchmachine update branch

@costin
Member Author

costin commented Jul 6, 2020

Reverted the merge of master due to #59066.

@costin
Member Author

costin commented Jul 6, 2020

@elasticsearchmachine update branch

Contributor

@matriv matriv left a comment

LGTM. Left 2 minor comments.

return this.fetchSize;
}

public EqlSearchRequest fetchSize(int size) {
Contributor

I would rename the argument to fetchSize.

Member Author

I'll rename it in a follow-up PR. I'd like to merge this now, since it took three attempts because of transient build issues, and then backport it to 7.x.

request.tiebreakerField("event.sequence");
// some queries return more than 10 results
request.size(50);
request.fetchSize(2);
Contributor

Does it make sense to make it a random value between, let's say, 1 and 10?

Member Author

It does, but the minimum has to be 2 since we're dealing with a window (1 would work, but the algorithm would have to change, and that requires an extra call I'd rather avoid).
I've pushed it with 2 just to get the build running enough times with this minimum size; I'll follow up with a random value in the until PR.
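A minimal sketch of the randomized fetch size discussed here, assuming the lower bound of 2 from the comment above and an upper bound of 10 from the suggestion. Elasticsearch test classes would typically use `ESTestCase`'s `randomIntBetween(min, max)` (inclusive on both ends and seeded for reproducibility); this standalone version uses `ThreadLocalRandom` instead, so a failing value would need to be logged to be reproduced. The class and constant names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical test helper: pick a random fetch size in [2, 10] instead of the
// fixed value of 2, so repeated builds exercise different page boundaries.
final class RandomFetchSizeSketch {
    // Lower bound from the discussion: the windowed algorithm needs at least 2.
    static final int MIN_FETCH_SIZE = 2;
    static final int MAX_FETCH_SIZE = 10;

    static int randomFetchSize() {
        // nextInt(origin, bound) excludes `bound`, hence the + 1.
        return ThreadLocalRandom.current().nextInt(MIN_FETCH_SIZE, MAX_FETCH_SIZE + 1);
    }

    public static void main(String[] args) {
        int fetchSize = randomFetchSize();
        // In the test above, this value would be passed to request.fetchSize(...).
        System.out.println("fetch size for this run: " + fetchSize);
    }
}
```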

@costin costin merged commit 2f389a7 into elastic:master Jul 6, 2020
@costin costin deleted the eql/match-size branch July 6, 2020 15:50
costin added a commit that referenced this pull request Jul 6, 2020
(cherry picked from commit 2f389a7)
static final String KEY_EVENT_CATEGORY_FIELD = "event_category_field";
static final String KEY_IMPLICIT_JOIN_KEY_FIELD = "implicit_join_key_field";
static final String KEY_CASE_SENSITIVE = "case_sensitive";
static final String KEY_SIZE = "size";
Member Author

@jrodewig A heads-up on these two parameters.
The default size of the response changed from 50 to 10 (aligned with that of the search API).
Additionally, the fetch_size parameter has been introduced to indicate how large the page used for sequence matching is. A larger page means faster results and fewer search calls, at the expense of memory (more data needs to be returned per call).
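To make the fetch_size trade-off concrete, a back-of-the-envelope model, assuming one search call per page and no early termination (the real sequence algorithm may stop earlier or issue extra calls): scanning N candidate events takes ceil(N / fetch_size) search calls, while each call returns up to fetch_size hits that must be held in memory. The names are hypothetical.

```java
// Simplified cost model for fetch_size (hypothetical; assumes one search call
// per page and that every candidate event has to be scanned).
final class FetchSizeTradeoffSketch {
    // Number of search round trips needed to scan `candidates` events.
    static long searchCalls(long candidates, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (candidates + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        long candidates = 10_000;
        for (int fetchSize : new int[] {2, 10, 100, 1_000}) {
            System.out.printf("fetch_size=%d -> %d search calls, up to %d hits per page%n",
                    fetchSize, searchCalls(candidates, fetchSize), fetchSize);
        }
    }
}
```

Under this model, 10,000 candidate events need 5,000 search calls with a fetch_size of 2 but only 10 with a fetch_size of 1,000, at the cost of holding up to 1,000 hits per page.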

Contributor

Thanks for the heads-up. #59085 should already cover the changes to size. I'll work on documenting the fetch_size parameter.

"params":{"v0":43,"v1":"serial_event_id","v2":41}
;

eventQueryDefaultLimit
Member Author

@jrodewig The head and tail pipes should now work for all types of queries; if you run into problems when using them in the docs, let me know, since that would be a bug.
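As a mental model for the head and tail pipes applying uniformly, here is a simplified in-memory sketch that trims a final result list the same way whether it holds events or sequences. It is an illustration under that assumption, not the EQL implementation, which may apply these limits inside the underlying searches instead; the class and method names are hypothetical.

```java
import java.util.List;

// Simplified in-memory model of the `head` and `tail` pipes: they trim the
// final result list the same way whether it holds events or sequences.
final class HeadTailSketch {
    // `| head n`: keep the first n results.
    static <T> List<T> head(List<T> results, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return results.subList(0, Math.min(n, results.size()));
    }

    // `| tail n`: keep the last n results.
    static <T> List<T> tail(List<T> results, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return results.subList(Math.max(0, results.size() - n), results.size());
    }

    public static void main(String[] args) {
        List<String> sequences = List.of("seq1", "seq2", "seq3", "seq4");
        System.out.println(head(sequences, 2)); // [seq1, seq2]
        System.out.println(tail(sequences, 2)); // [seq3, seq4]
    }
}
```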

jrodewig added a commit that referenced this pull request Jul 8, 2020
Changes:
* Documents the `size` default as `10`.
* Updates `size` param def to note its relation to pipes.
* Updates the `head` and `tail` pipe docs to modify sequences.
* Documents the `fetch_size` parameter.

Relates to #59014 and #59063