Create Synchronous EQL querying REST API #49634

@colings86

Description

The first mode of execution for EQL queries will be running ad hoc EQL queries against historical data (i.e. running the query over large amounts of data already stored in an index in a single run). For this issue we will make the API a synchronous request/response where the execution of the query will complete before returning the response. In a later issue we will address long running EQL queries and explore converting this to an asynchronous API.

Request

Parameters on the request should be (note that we can probably define sensible defaults for everything except the index and the rule):

  • Index/index pattern/alias (normal index definition and expansion, including wildcard expansion options and ignore_throttled)
  • Narrowing query (using ES Query DSL) - allowing the user to select a subset of the index on which to run the rule. Defaults to null (no query)
  • EQL rule to run
  • Size of response? - To limit the number of results to hold in memory and return to the user and to allow pagination. Defaults to 50?
  • Search_after for join key? - to enable scrolling through results when there is more than one page. The value for this search_after would be the join value for which all previous join values should be excluded. Defaults to null.
  • Field to use as timestamp. Defaults to @timestamp
  • Field to use as event type (process, file, network etc.). Defaults to event.type
  • Field to use as an implicit join key - This defines a field that will be implicitly added as the first join key. This can be used to prevent sequences and joins matching across e.g. edge nodes if the rule should consider data from edge nodes completely separately. The default for this field would be null, so by default we would only use the join keys specified in the EQL rule. This option will be useful for the Endpoint use case since we need to be able to run the same rules on Elasticsearch as on the Endpoints; when querying the endpoints, each endpoint is considered individually, so we will need some control outside of the rule to get the same behaviour in Elasticsearch.

Note that the parameter names above are suggestions and not intended to be final.

Example minimal request:

GET index-pattern-*/_eql/search?sync_search_threshold=5s
{
  "rule": """
              sequence with maxspan=5h
              [file where user != 'SYSTEM' by file_path]
              [process where user = 'SYSTEM' by process_path]
              """
}

Example request with all options:

GET index-pattern-*/_eql/search?sync_search_threshold=5s
{
  "query": {
    "match" : {
      "foo": "bar"
    }
  },
  "timestamp_field": "@timestamp",
  "event_type_field": "event.type",
  "implicit_join_key_field": "device.id",
  "size": 100,
  "search_after": [ "device-20184", "/user/local/foo.exe", "2019-11-26T00:45:43.542" ],
  "rule": """
            sequence with maxspan=5h
              [file where user != 'SYSTEM' by file_path]
              [process where user = 'SYSTEM' by process_path]
          """
}

Response

Although the response does not need to be tabular, it is much easier for UIs and users to consume the results if the response is easily converted to a table.

Information required in response:

  • Number of results returned
  • Indication whether there are more results?
  • Indication if these are partial results (we will need to decide if we want to support interim results and/or if we want to return results if a shard fails on one of the searches)
  • Rule results

Information required for each rule result:

  • Join values (if applicable)
  • Events that make up the result
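Purely as an illustration of how the fields listed above might map onto a response body (none of these field names are final; "took", "is_partial", "has_more", "results" and "join_keys" are placeholders), a result set containing a single matched sequence could look like:

{
  "took": 85,
  "total": 1,
  "is_partial": false,
  "has_more": true,
  "results": [
    {
      "join_keys": [ "device-20184" ],
      "events": [
        { "event": { "type": "file" },    "user": "alice",  "file_path": "/user/local/foo.exe",    "@timestamp": "2019-11-26T00:45:43.542" },
        { "event": { "type": "process" }, "user": "SYSTEM", "process_path": "/user/local/foo.exe", "@timestamp": "2019-11-26T00:45:44.001" }
      ]
    }
  ]
}

Whether each result groups its events as shown here, or the events come back as one flat list, is the question discussed in the next two sections.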

Current format of results

For EQL queries without pipes, the results are always a flat list of events. This means that if the query is a sequence, the ordering of the events in the results defines the sequence rather than the sequence being defined by structure. For example, if the query was looking for a file event followed by a process event, the results would look like the following:

  1. File event
  2. Process event
  3. File event
  4. Process event
  5. File event
  6. Process event

From the list above you can see that every two events make up an instance of the sequence we are looking for. The downside here is that the client needs to understand the query being run to be able to understand the results. We will probably need to support this style of results output in order to fit in with the way that the endpoint SMP Server currently uses EQL. Note that the SMP Server currently pushes the understanding of the sequences to the user (i.e. it shows the flat events output as returned) for cases where the query is defined by the user. For cases where the server itself defines the EQL (such as in the resolver view) the server has implicit knowledge of what it's asking for, so it knows how to interpret the results.
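To make this concrete, the flat output for the file-then-process example might look something like the following (the documents are invented for illustration); the client has to know that the rule was a two-event sequence in order to pair consecutive hits back into sequence instances:

{
  "hits": [
    { "event": { "type": "file" },    "user": "alice",  "file_path": "/tmp/dropper",     "@timestamp": "2019-11-26T00:45:43.542" },
    { "event": { "type": "process" }, "user": "SYSTEM", "process_path": "/tmp/dropper",  "@timestamp": "2019-11-26T00:45:44.001" },
    { "event": { "type": "file" },    "user": "bob",    "file_path": "/tmp/other",       "@timestamp": "2019-11-26T00:50:01.000" },
    { "event": { "type": "process" }, "user": "SYSTEM", "process_path": "/tmp/other",    "@timestamp": "2019-11-26T00:50:02.250" }
  ]
}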

Alternatives to current result output

For clients like Kibana (and probably SIEM) it would be better if the client does not need to understand the query in order to interpret the results. The difference here compared with the SMP server is that in Kibana the user will define an arbitrary EQL query and expect Kibana to know how to render it in a way that makes sense. This means that Kibana should not have to understand the query (since we don’t want to have to add a query parser in Kibana as well as ES) but does need the results in a generic understandable form. If sequences are defined as structure Kibana can identify sequences without understanding the query itself (it just needs to understand it might get sequences back containing 1 or more events each). Another option would be to have a “sequence group id” field in the response for each event so events in the same sequence can be matched without having to have explicit response structure.
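As a sketch of the "sequence group id" option (the field name "sequence_id" is purely a placeholder), the same flat list of events could carry a per-sequence identifier so that Kibana can group events into sequences without parsing the query:

{
  "hits": [
    { "sequence_id": 0, "event": { "type": "file" },    "user": "alice",  "file_path": "/tmp/dropper" },
    { "sequence_id": 0, "event": { "type": "process" }, "user": "SYSTEM", "process_path": "/tmp/dropper" },
    { "sequence_id": 1, "event": { "type": "file" },    "user": "bob",    "file_path": "/tmp/other" },
    { "sequence_id": 1, "event": { "type": "process" }, "user": "SYSTEM", "process_path": "/tmp/other" }
  ]
}

The structural alternative achieves the same goal by nesting the events of each sequence inside a result object, as sketched in the Response section above.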

The endgame CLI client also has the option to define --flat --columns, which pivots the result data into a table form with the specified columns. This may also be something we would want to support, since it puts the results into a much more consumable form for clients like Kibana and is the kind of operation analysts will naturally reach for following the search anyway.
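If we did support such a pivot, one purely hypothetical response shape (loosely modelled on the columns/rows layout used by the Elasticsearch SQL API; the column list here is just an example, not a proposal) would be:

{
  "columns": [ "user", "file_path", "process_path", "@timestamp" ],
  "rows": [
    [ "alice",  "/tmp/dropper", null,           "2019-11-26T00:45:43.542" ],
    [ "SYSTEM", null,           "/tmp/dropper", "2019-11-26T00:45:44.001" ]
  ]
}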
