From 04316ff4f7d540c6d1ec71dac2395d1ebdfb3778 Mon Sep 17 00:00:00 2001
From: ruflin
Date: Wed, 13 May 2020 09:14:25 +0200
Subject: [PATCH 1/5] Add stream fields

This adds the stream fields, which are used for the new indexing
strategy, to ECS:
https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1

The goal of having these in ECS is to allow any time series shipper to
use these fields and get the benefit of the new indexing strategy.

Before we landed on `stream.*`, quite a few discussions happened about
whether we could use `event.kind`, `event.dataset`, or `event.type`.
With `event.kind` there are two main problems:

* It is not a constant_keyword, so this would be a breaking change.
* It already contains more values than we need.

The first problem also applies to `event.dataset`, even though that
field has the same content. It also felt odd to have some of the fields
under `event.*` and some under `stream.*`.

Another option discussed was to use `datastream.*`, based on the new
Data Stream feature in Elasticsearch, as this is where all data ends
up. But that would suggest these fields are a feature of Data Streams
itself, which they are not. So `stream` seemed to be the best fit.
---
 docs/field-details.asciidoc             | 70 +++++++++++++++++++++++++
 docs/fields.asciidoc                    |  2 +
 generated/beats/fields.ecs.yml          | 55 +++++++++++++++++++
 generated/csv/fields.csv                |  3 ++
 generated/ecs/ecs_flat.yml              | 51 ++++++++++++++++++
 generated/ecs/ecs_nested.yml            | 70 +++++++++++++++++++++++++
 generated/elasticsearch/6/template.json | 14 +++++
 generated/elasticsearch/7/template.json | 14 +++++
 schemas/stream.yml                      | 63 ++++++++++++++++++++++
 use-cases/apm.md                        | 21 --------
 use-cases/auditbeat.md                  | 44 ----------------
 use-cases/beats.md                      | 18 -------
 use-cases/filebeat-apache-access.md     | 29 ----------
 use-cases/kubernetes.md                 | 21 --------
 use-cases/logging.md                    | 22 --------
 use-cases/metricbeat.md                 | 31 -----------
 use-cases/web-logs.md                   | 29 ----------
 17 files changed, 342 insertions(+), 215 deletions(-)
 create mode 100644 schemas/stream.yml
 delete mode 100644 use-cases/apm.md
 delete mode 100644 use-cases/auditbeat.md
 delete mode 100644 use-cases/beats.md
 delete mode 100644 use-cases/filebeat-apache-access.md
 delete mode 100644 use-cases/kubernetes.md
 delete mode 100644 use-cases/logging.md
 delete mode 100644 use-cases/metricbeat.md
 delete mode 100644 use-cases/web-logs.md

diff --git a/docs/field-details.asciidoc b/docs/field-details.asciidoc
index ffe4744f88..bf8101aeb2 100644
--- a/docs/field-details.asciidoc
+++ b/docs/field-details.asciidoc
@@ -5616,6 +5616,76 @@ example: `co.uk`

 // ===============================================================

+|=====
+
+[[ecs-stream]]
+=== Stream Fields
+
+The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
+
+These fields are used to determine into which index the data is shipped in Elasticsearch and allow efficient querying of data. Initially these fields are mainly used by data shipped by the Elastic Agent, but any time series data shipper should switch to using data streams and the new indexing strategy with these fields.
+
+All three fields are `constant_keyword` fields.
+
+==== Stream Field Details
+
+[options="header"]
+|=====
+| Field  | Description | Level
+
+// ===============================================================
+
+| stream.dataset
+| Dataset describes the structure of the data.
+
+The dataset describes the structure of the data. All data shipped into a single dataset should have the same or very similar data structure. For example `system.cpu` and `system.disk` are two different datasets as they have very different fields.
+
+The name of the dataset should be descriptive of the data and it is encouraged to use `.` to combine multiple words. All characters which are allowed in index names can be used for the dataset except `-`.
+
+The default for dataset is `generic`.
+
+type: keyword
+
+
+
+example: `nginx.access`
+
+| extended
+
+// ===============================================================
+
+| stream.namespace
+| Namespace of your stream.
+
+This is the namespace used in your index. The namespace is used to separate the same structure into different Data Streams. For example, if nginx logs are shipped for testing and production into the same cluster, two different namespaces can be used. This allows, for example, assigning different ILM policies.
+
+The default value for a namespace is `default`.
+
+type: constant_keyword
+
+
+
+example: `production`
+
+| extended
+
+// ===============================================================
+
+| stream.type
+| Type of the stream.
+
+The type of the stream can be `logs` or `metrics`. More types can be added in the future, but no other types than the ones described here should be used.
+
+type: constant_keyword
+
+
+
+example: `logs`
+
+| extended
+
+// ===============================================================
+
 |=====

 [[ecs-threat]]
diff --git a/docs/fields.asciidoc b/docs/fields.asciidoc
index ead1723d98..c1ceb224ee 100644
--- a/docs/fields.asciidoc
+++ b/docs/fields.asciidoc
@@ -86,6 +86,8 @@ all fields are defined.

 | <<ecs-source,Source>> | Fields about the source side of a network connection, used with destination.

+| <<ecs-stream,Stream>> | Fields describing the data stream.
+
 | <<ecs-threat,Threat>> | Fields to classify events and alerts according to a threat taxonomy.

 | <<ecs-tls,TLS>> | Fields describing a TLS connection.
diff --git a/generated/beats/fields.ecs.yml b/generated/beats/fields.ecs.yml
index b527d3192b..959a05f752 100644
--- a/generated/beats/fields.ecs.yml
+++ b/generated/beats/fields.ecs.yml
@@ -4380,6 +4380,61 @@
       default_field: false
     description: Short name or login of the user.
     example: albert
+  - name: stream
+    title: Stream
+    group: 2
+    description: 'The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).

+      These fields are used to determine into which index the data is shipped in Elasticsearch
+      and allow efficient querying of data. Initially these fields are mainly used
+      by data shipped by the Elastic Agent, but any time series data shipper should
+      switch to using data streams and the new indexing strategy with these fields.

+      All three fields are `constant_keyword` fields.'
+    footnote: 'Examples: The new indexing strategy is `{stream.type}-{stream.dataset}-{stream.namespace}`.
+      As an example, nginx access logs are shipped into `logs-nginx.access-default`.'
+    type: group
+    fields:
+    - name: dataset
+      level: extended
+      type: keyword
+      ignore_above: 1024
+      description: 'Dataset describes the structure of the data.

+        The dataset describes the structure of the data. All data shipped into a single
+        dataset should have the same or very similar data structure. For example `system.cpu`
+        and `system.disk` are two different datasets as they have very different fields.

+        The name of the dataset should be descriptive of the data and it is encouraged
+        to use `.` to combine multiple words.
+        All characters which are allowed in
+        index names can be used for the dataset except `-`.

+        The default for dataset is `generic`.'
+      example: nginx.access
+      default_field: false
+    - name: namespace
+      level: extended
+      type: constant_keyword
+      description: 'Namespace of your stream.

+        This is the namespace used in your index. The namespace is used to separate
+        the same structure into different Data Streams. For example, if nginx logs
+        are shipped for testing and production into the same cluster, two different
+        namespaces can be used. This allows, for example, assigning different ILM
+        policies.

+        The default value for a namespace is `default`.'
+      example: production
+      default_field: false
+    - name: type
+      level: extended
+      type: constant_keyword
+      description: 'Type of the stream.

+        The type of the stream can be `logs` or `metrics`. More types can be added
+        in the future but no other types than the ones described here should be used.'
+      example: logs
+      default_field: false
   - name: threat
     title: Threat
     group: 2
diff --git a/generated/csv/fields.csv b/generated/csv/fields.csv
index 2859067da8..30a5e2412e 100644
--- a/generated/csv/fields.csv
+++ b/generated/csv/fields.csv
@@ -516,6 +516,9 @@ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
 1.6.0-dev,true,source,source.user.id,keyword,core,,,Unique identifier of the user.
 1.6.0-dev,true,source,source.user.name,keyword,core,,albert,Short name or login of the user.
 1.6.0-dev,true,source,source.user.name.text,text,core,,albert,Short name or login of the user.
+1.6.0-dev,true,stream,stream.dataset,keyword,extended,,nginx.access,Dataset describes the structure of the data.
+1.6.0-dev,true,stream,stream.namespace,constant_keyword,extended,,production,Namespace of your stream.
+1.6.0-dev,true,stream,stream.type,constant_keyword,extended,,logs,Type of the stream.
 1.6.0-dev,true,threat,threat.framework,keyword,extended,,MITRE ATT&CK,Threat classification framework.
 1.6.0-dev,true,threat,threat.tactic.id,keyword,extended,array,TA0040,Threat tactic id.
 1.6.0-dev,true,threat,threat.tactic.name,keyword,extended,array,impact,Threat tactic.
diff --git a/generated/ecs/ecs_flat.yml b/generated/ecs/ecs_flat.yml
index 5f6aa7025b..3b232c032e 100644
--- a/generated/ecs/ecs_flat.yml
+++ b/generated/ecs/ecs_flat.yml
@@ -6669,6 +6669,57 @@ source.user.name:
   original_fieldset: user
   short: Short name or login of the user.
   type: keyword
+stream.dataset:
+  dashed_name: stream-dataset
+  description: 'Dataset describes the structure of the data.

+    The dataset describes the structure of the data. All data shipped into a single
+    dataset should have the same or very similar data structure. For example `system.cpu`
+    and `system.disk` are two different datasets as they have very different fields.

+    The name of the dataset should be descriptive of the data and it is encouraged
+    to use `.` to combine multiple words. All characters which are allowed in index
+    names can be used for the dataset except `-`.

+    The default for dataset is `generic`.'
+  example: nginx.access
+  flat_name: stream.dataset
+  ignore_above: 1024
+  level: extended
+  name: dataset
+  normalize: []
+  short: Dataset describes the structure of the data.
+  type: keyword
+stream.namespace:
+  dashed_name: stream-namespace
+  description: 'Namespace of your stream.

+    This is the namespace used in your index. The namespace is used to separate the
+    same structure into different Data Streams. For example, if nginx logs are shipped
+    for testing and production into the same cluster, two different namespaces can
+    be used. This allows, for example, assigning different ILM policies.

+    The default value for a namespace is `default`.'
+  example: production
+  flat_name: stream.namespace
+  level: extended
+  name: namespace
+  normalize: []
+  short: Namespace of your stream.
+  type: constant_keyword
+stream.type:
+  dashed_name: stream-type
+  description: 'Type of the stream.

+    The type of the stream can be `logs` or `metrics`. More types can be added in
+    the future but no other types than the ones described here should be used.'
+  example: logs
+  flat_name: stream.type
+  level: extended
+  name: type
+  normalize: []
+  short: Type of the stream.
+  type: constant_keyword
 tags:
   dashed_name: tags
   description: List of keywords used to tag each event.
diff --git a/generated/ecs/ecs_nested.yml b/generated/ecs/ecs_nested.yml
index 14cc581f24..b1f4b9142d 100644
--- a/generated/ecs/ecs_nested.yml
+++ b/generated/ecs/ecs_nested.yml
@@ -7698,6 +7698,76 @@ source:
   short: Fields about the source side of a network connection, used with destination.
   title: Source
   type: group
+stream:
+  description: 'The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).

+    These fields are used to determine into which index the data is shipped in Elasticsearch
+    and allow efficient querying of data. Initially these fields are mainly used by
+    data shipped by the Elastic Agent, but any time series data shipper should switch
+    to using data streams and the new indexing strategy with these fields.

+    All three fields are `constant_keyword` fields.'
+  fields:
+    dataset:
+      dashed_name: stream-dataset
+      description: 'Dataset describes the structure of the data.

+        The dataset describes the structure of the data. All data shipped into a single
+        dataset should have the same or very similar data structure. For example `system.cpu`
+        and `system.disk` are two different datasets as they have very different fields.

+        The name of the dataset should be descriptive of the data and it is encouraged
+        to use `.` to combine multiple words. All characters which are allowed in
+        index names can be used for the dataset except `-`.

+        The default for dataset is `generic`.'
+      example: nginx.access
+      flat_name: stream.dataset
+      ignore_above: 1024
+      level: extended
+      name: dataset
+      normalize: []
+      short: Dataset describes the structure of the data.
+      type: keyword
+    namespace:
+      dashed_name: stream-namespace
+      description: 'Namespace of your stream.

+        This is the namespace used in your index. The namespace is used to separate
+        the same structure into different Data Streams. For example, if nginx logs
+        are shipped for testing and production into the same cluster, two different
+        namespaces can be used. This allows, for example, assigning different ILM
+        policies.

+        The default value for a namespace is `default`.'
+      example: production
+      flat_name: stream.namespace
+      level: extended
+      name: namespace
+      normalize: []
+      short: Namespace of your stream.
+      type: constant_keyword
+    type:
+      dashed_name: stream-type
+      description: 'Type of the stream.

+        The type of the stream can be `logs` or `metrics`. More types can be added
+        in the future but no other types than the ones described here should be used.'
+      example: logs
+      flat_name: stream.type
+      level: extended
+      name: type
+      normalize: []
+      short: Type of the stream.
+      type: constant_keyword
+  footnote: 'Examples: The new indexing strategy is `{stream.type}-{stream.dataset}-{stream.namespace}`.
+    As an example, nginx access logs are shipped into `logs-nginx.access-default`.'
+  group: 2
+  name: stream
+  prefix: stream.
+  short: Fields describing the data stream.
+  title: Stream
+  type: group
 threat:
   description: 'Fields to classify events and alerts according to a threat taxonomy
     such as the Mitre ATT&CK framework.
diff --git a/generated/elasticsearch/6/template.json b/generated/elasticsearch/6/template.json
index d5f033b22e..12687778d9 100644
--- a/generated/elasticsearch/6/template.json
+++ b/generated/elasticsearch/6/template.json
@@ -2450,6 +2450,20 @@
         }
       }
     },
+    "stream": {
+      "properties": {
+        "dataset": {
+          "ignore_above": 1024,
+          "type": "keyword"
+        },
+        "namespace": {
+          "type": "constant_keyword"
+        },
+        "type": {
+          "type": "constant_keyword"
+        }
+      }
+    },
     "tags": {
       "ignore_above": 1024,
       "type": "keyword"
diff --git a/generated/elasticsearch/7/template.json b/generated/elasticsearch/7/template.json
index f756237e21..484674f21c 100644
--- a/generated/elasticsearch/7/template.json
+++ b/generated/elasticsearch/7/template.json
@@ -2449,6 +2449,20 @@
         }
       }
     },
+    "stream": {
+      "properties": {
+        "dataset": {
+          "ignore_above": 1024,
+          "type": "keyword"
+        },
+        "namespace": {
+          "type": "constant_keyword"
+        },
+        "type": {
+          "type": "constant_keyword"
+        }
+      }
+    },
     "tags": {
       "ignore_above": 1024,
       "type": "keyword"
diff --git a/schemas/stream.yml b/schemas/stream.yml
new file mode 100644
index 0000000000..cd0e5177be
--- /dev/null
+++ b/schemas/stream.yml
@@ -0,0 +1,63 @@
+---
+- name: stream
+  title: Stream
+  group: 2
+  short: Fields describing the data stream.
+  description: >
+    The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).

+    These fields are used to determine into which index the data is shipped in Elasticsearch and
+    allow efficient querying of data. Initially these fields are mainly used by data shipped by
+    the Elastic Agent, but any time series data shipper should switch to using data streams and
+    the new indexing strategy with these fields.

+    All three fields are `constant_keyword` fields.
+  footnote: >
+    Examples: The new indexing strategy is `{stream.type}-{stream.dataset}-{stream.namespace}`.
+    As an example, nginx access logs are shipped into `logs-nginx.access-default`.
+  type: group
+  fields:

+    - name: type
+      level: extended
+      type: constant_keyword
+      short: Type of the stream.
+      description: >
+        Type of the stream.

+        The type of the stream can be `logs` or `metrics`. More types can be added in the future
+        but no other types than the ones described here should be used.

+      example: logs

+    - name: dataset
+      level: extended
+      type: keyword
+      short: Dataset describes the structure of the data.
+      description: >
+        Dataset describes the structure of the data.

+        The dataset describes the structure of the data. All data shipped into a single dataset
+        should have the same or very similar data structure. For example `system.cpu` and `system.disk`
+        are two different datasets as they have very different fields.

+        The name of the dataset should be descriptive of the data and it is encouraged to use `.` to
+        combine multiple words. All characters which are allowed in index names can be used for the dataset
+        except `-`.

+        The default for dataset is `generic`.
+      example: nginx.access

+    - name: namespace
+      level: extended
+      type: constant_keyword
+      short: Namespace of your stream.
+      description: >
+        Namespace of your stream.

+        This is the namespace used in your index. The namespace is used to separate
+        the same structure into different Data Streams. For example, if nginx logs
+        are shipped for testing and production into the same cluster, two different
+        namespaces can be used. This allows, for example, assigning different ILM policies.

+        The default value for a namespace is `default`.
+      example: production
diff --git a/use-cases/apm.md b/use-cases/apm.md
deleted file mode 100644
index 8a63ae0aa3..0000000000
--- a/use-cases/apm.md
+++ /dev/null
@@ -1,21 +0,0 @@
-## APM use case
-
-ECS usage for the APM data.
-
-### APM fields
-
-
-| Field | Description | Level | Type | Example |
-|---|---|---|---|---|
-| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` |
-| [@timestamp](../README.md#@timestamp) | Timestamp when the event was created in the app / service. | core | date | `2016-05-23T08:05:34.853Z` |
-| *agent.** | *The agent fields are used to describe which agent did send the information.

* | | | | -| [agent.version](../README.md#agent.version) | APM Agent version. | core | keyword | `3.14.0` | -| [agent.name](../README.md#agent.name) | APM agent name. | core | keyword | `elastic-node` | -| *service.** | *The service fields describe the service inside which the APM agent is running.
* | | | | -| [service.id](../README.md#service.id) | Unique identifier of the running service. | core | keyword | `d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6` | -| [service.name](../README.md#service.name) | Name of the service the agent is running in. This is normally a user defined name. | core | keyword | `user-service` | -| [service.version](../README.md#service.version) | Version of the service the agent is running in. This depends on if the service is given a version. | core | keyword | `3.2.4` | - - - diff --git a/use-cases/auditbeat.md b/use-cases/auditbeat.md deleted file mode 100644 index dff825a597..0000000000 --- a/use-cases/auditbeat.md +++ /dev/null @@ -1,44 +0,0 @@ -## Auditbeat use case - -ECS usage in Auditbeat. - -### Auditbeat fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| [event.module](../README.md#event.module) | Auditbeat module name. | core | keyword | `apache` | -| *file.** | *File attributes.
* | | | | -| [file.path](../README.md#file.path) | The path to the file. | extended | keyword | `/home/alice/example.png` | -| [file.target_path](../README.md#file.target_path) | The target path for symlinks. | extended | keyword | | -| [file.type](../README.md#file.type) | The file type (file, dir, or symlink). | extended | keyword | `file` | -| [file.device](../README.md#file.device) | The device. | extended | keyword | `sda` | -| [file.inode](../README.md#file.inode) | The inode representing the file in the filesystem. | extended | keyword | `256383` | -| [file.uid](../README.md#file.uid) | The user ID (UID) or security identifier (SID) of the file owner. | extended | keyword | `1001` | -| [file.owner](../README.md#file.owner) | The file owner's username. | extended | keyword | `alice` | -| [file.gid](../README.md#file.gid) | The primary group ID (GID) of the file. | extended | keyword | `1001` | -| [file.group](../README.md#file.group) | The primary group name of the file. | extended | keyword | `alice` | -| [file.mode](../README.md#file.mode) | The mode of the file in octal representation. | extended | keyword | `416` | -| [file.size](../README.md#file.size) | The file size in bytes (field is only added when `type` is `file`). | extended | long | `16384` | -| [file.mtime](../README.md#file.mtime) | The last modified time of the file (time when content was modified). | extended | date | | -| [file.ctime](../README.md#file.ctime) | The last change time of the file (time when metadata was changed). | extended | date | | -| *hash.** | *Hash fields used in Auditbeat.
The hash field contains cryptographic hashes of data associated with the event (such as a file). The keys are names of cryptographic algorithms. The values are encoded as hexidecimal (lower-case).
All fields in user can have one or multiple entries.
* | | | | -| *hash.blake2b_256* | *BLAKE2b-256 hash of the file.* | (use case) | keyword | | -| *hash.blake2b_384* | *BLAKE2b-384 hash of the file.* | (use case) | keyword | | -| *hash.blake2b_512* | *BLAKE2b-512 hash of the file.* | (use case) | keyword | | -| [hash.md5](../README.md#hash.md5) | MD5 hash. | extended | keyword | | -| [hash.sha1](../README.md#hash.sha1) | SHA-1 hash. | extended | keyword | | -| *hash.sha224* | *SHA-224 hash (SHA-2 family).* | (use case) | keyword | | -| [hash.sha256](../README.md#hash.sha256) | SHA-256 hash (SHA-2 family). | extended | keyword | | -| *hash.sha384* | *SHA-384 hash (SHA-2 family).* | (use case) | keyword | | -| [hash.sha512](../README.md#hash.sha512) | SHA-512 hash (SHA-2 family). | extended | keyword | | -| *hash.sha512_224* | *SHA-512/224 hash (SHA-2 family).* | (use case) | keyword | | -| *hash.sha512_256* | *SHA-512/256 hash (SHA-2 family).* | (use case) | keyword | | -| *hash.sha3_224* | *SHA3-224 hash (SHA-3 family).* | (use case) | keyword | | -| *hash.sha3_256* | *SHA3-256 hash (SHA-3 family).* | (use case) | keyword | | -| *hash.sha3_384* | *SHA3-384 hash (SHA-3 family).* | (use case) | keyword | | -| *hash.sha3_512* | *SHA3-512 hash (SHA-3 family).* | (use case) | keyword | | -| *hash.xxh64* | *XX64 hash of the file.* | (use case) | keyword | | - - - diff --git a/use-cases/beats.md b/use-cases/beats.md deleted file mode 100644 index c96e994b2d..0000000000 --- a/use-cases/beats.md +++ /dev/null @@ -1,18 +0,0 @@ -## Beats use case - -ECS fields used in Beats. - -### Beats fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` | -| *timestamp* | *Timestamp when the event was created.* | (use case) | date | `2016-05-23T08:05:34.853Z` | -| *agent.** | *The agent fields are used to describe by which beat the information was collected.
* | | | | -| [agent.version](../README.md#agent.version) | Beat version. | core | keyword | `6.0.0-rc2` | -| [agent.name](../README.md#agent.name) | Beat name. | core | keyword | `filebeat` | -| [agent.id](../README.md#agent.id) | Unique beat identifier. | core | keyword | `8a4f500d` | - - - diff --git a/use-cases/filebeat-apache-access.md b/use-cases/filebeat-apache-access.md deleted file mode 100644 index a9ef41840f..0000000000 --- a/use-cases/filebeat-apache-access.md +++ /dev/null @@ -1,29 +0,0 @@ -## Filebeat Apache use case - -ECS fields used in Filebeat for the apache module. - -### Filebeat Apache fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` | -| [@timestamp](../README.md#@timestamp) | Timestamp of the log line after processing. | core | date | `2016-05-23T08:05:34.853Z` | -| [message](../README.md#message) | Log message of the event | core | text | `Hello World` | -| [event.module](../README.md#event.module) | Currently fileset.module | core | keyword | `apache` | -| [event.dataset](../README.md#event.dataset) | Currenly fileset.name | core | keyword | `access` | -| [source.ip](../README.md#source.ip) | Source ip of the request. Currently apache.access.remote_ip | core | ip | `192.168.1.1` | -| [user.name](../README.md#user.name) | User name in the request. Currently apache.access.user_name | core | keyword | `ruflin` | -| *http.method* | *Http method, currently apache.access.method* | (use case) | keyword | `GET` | -| *http.url* | *Http url, currently apache.access.url* | (use case) | keyword | `http://elastic.co/` | -| [http.version](../README.md#http.version) | Http version, currently apache.access.http_version | extended | keyword | `1.1` | -| *http.response.code* | *Http response code, currently apache.access.response_code* | (use case) | keyword | `404` | -| *http.response.body_sent.bytes* | *Http response body bytes sent, currently apache.access.body_sent.bytes* | (use case) | long | `117` | -| *http.referer* | *Http referrer code, currently apache.access.referrer
NOTE: In the RFC its misspell as referer and has become accepted standard* | (use case) | keyword | `http://elastic.co/` | -| *user_agent.** | *User agent fields as in schema. Currently under apache.access.user_agent.*
* | | | | -| [user_agent.original](../README.md#user_agent.original) | Original user agent. Currently apache.access.agent | extended | keyword | `http://elastic.co/` | -| *geoip.** | *User agent fields as in schema. Currently under apache.access.geoip.*
These are extracted from source.ip
Should they be under source.geoip?
* | | | | -| *geoip....* | *All geoip fields.* | (use case) | keyword | | - - - diff --git a/use-cases/kubernetes.md b/use-cases/kubernetes.md deleted file mode 100644 index 5588da6060..0000000000 --- a/use-cases/kubernetes.md +++ /dev/null @@ -1,21 +0,0 @@ -## Kubernetes use case - -You can monitor containers running in a Kubernetes cluster by adding Kubernetes-specific information under `kubernetes.` - - -### Kubernetes fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| [container.id](../README.md#container.id) | Unique container id. | core | keyword | `fdbef803fa2b` | -| [container.name](../README.md#container.name) | Container name. | extended | keyword | | -| [host.hostname](../README.md#host.hostname) | Hostname of the host.
It normally contains what the `hostname` command returns on the host machine. | core | keyword | `kube-high-cpu-42` | -| *kubernetes.pod.name* | *Kubernetes pod name* | (use case) | keyword | `foo-webserver` | -| *kubernetes.namespace* | *Kubernetes namespace* | (use case) | keyword | `foo-team` | -| *kubernetes.labels* | *Kubernetes labels map* | (use case) | object | | -| *kubernetes.annotations* | *Kubernetes annotations map* | (use case) | object | | -| *kubernetes.container.name* | *Kubernetes container name. This name is unique within the pod only. It is different from the `container.name` field.* | (use case) | keyword | | - - - diff --git a/use-cases/logging.md b/use-cases/logging.md deleted file mode 100644 index be7efd0b6b..0000000000 --- a/use-cases/logging.md +++ /dev/null @@ -1,22 +0,0 @@ -## Logging use case - -ECS fields used in logging use cases. - -### Logging fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| *id* | *Unique id of the log entry.* | (use case) | keyword | `8a4f500d` | -| *timestamp* | *Timestamp of the log line.* | (use case) | date | `2016-05-23T08:05:34.853Z` | -| [message](../README.md#message) | The log message.
This can contain the full log line or based on the processing only the extracted message part. This is expected to be human readable. | core | text | `Hello World` | -| *hostname* | *Hostname extracted from the log line.* | (use case) | keyword | `www.example.com` | -| *ip* | *IP Address extracted from the log line. Can be IPv4 or IPv6.* | (use case) | ip | `192.168.1.12` | -| [log.level](../README.md#log.level) | Log level field. Is expected to be `WARN`, `ERR`, `INFO` etc. | core | keyword | `ERR` | -| *log.line* | *Line number the log event was collected from.* | (use case) | long | `18` | -| *log.offset* | *Offset of the log event.* | (use case) | long | `12` | -| *source.** | *Describes from where the log entries come from.
* | | | | -| *source.path* | *File path of the file the data is harvested from.* | (use case) | keyword | `/var/log/test.log` | - - - diff --git a/use-cases/metricbeat.md b/use-cases/metricbeat.md deleted file mode 100644 index c573a7897e..0000000000 --- a/use-cases/metricbeat.md +++ /dev/null @@ -1,31 +0,0 @@ -## Metricbeat use case - -ECS fields used Metricbeat. - -### Metricbeat fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` | -| *timestamp* | *Timestamp when the event was created.* | (use case) | date | `2016-05-23T08:05:34.853Z` | -| [agent.version](../README.md#agent.version) | Beat version. | core | keyword | `6.0.0-rc2` | -| [agent.name](../README.md#agent.name) | Beat name. | core | keyword | `filebeat` | -| [agent.id](../README.md#agent.id) | Unique beat identifier. | core | keyword | `8a4f500d` | -| *service.** | *The service fields describe the service for / from which the data was collected.
If logs or metrics are collected from Redis, `service.name` would be `redis`. This allows to find and correlate logs for a specicic service or even version with `service.version`.
* | | | | -| [service.id](../README.md#service.id) | Unique identifier of the running service.
This id should uniquely identify this service. This makes it possible to correlate logs and metrics for one specific service. For example in case of issues with one redis instance, it's possible to filter on the id to see metrics and logs for this single instance. | core | keyword | `d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6` | -| [service.name](../README.md#service.name) | Name of the service data is collected from.
The name is normally the same as the module name. | core | keyword | `elasticsearch` | -| [service.version](../README.md#service.version) | Version of the service the data was collected from.
This allows to look at a data set only for a specific version of a service. | core | keyword | `3.2.4` | -| *service.host* | *Host address that is used to connect to the service.
This normally contains hostname + port.
REVIEW: Should this be service.uri instead, sometimes it's more then just the host? It could also include a path or the protocol.* | (use case) | keyword | `elasticsearch:9200` | -| *request.rtt* | *Request round trip time.
How long did the request take to fetch metrics from the service.
REVIEW: THIS DOES NOT EXIST YET IN ECS.* | (use case) | long | `115` | -| *error.** | *Error namespace
Use for errors which can happen during fetching information for a service.
* | | | | -| [error.message](../README.md#error.message) | Error message returned by the service during fetching metrics. | core | text | | -| [error.code](../README.md#error.code) | Error code returned by the service during fetching metrics. | core | keyword | | -| [host.hostname](../README.md#host.hostname) | Hostname of the system metricbeat is running on or user defined name. | core | keyword | | -| *host.timezone.offset.sec* | *Timezone offset of the host in seconds.* | (use case) | long | | -| [host.id](../README.md#host.id) | Unique host id. | core | keyword | | -| [event.module](../README.md#event.module) | Name of the module this data is coming from. | core | keyword | `mysql` | -| [event.dataset](../README.md#event.dataset) | Name of the dataset.
This contains the information which is currently stored in metricset.name and metricset.module. | core | keyword | `stats` | - - - diff --git a/use-cases/web-logs.md b/use-cases/web-logs.md deleted file mode 100644 index 57f9a96062..0000000000 --- a/use-cases/web-logs.md +++ /dev/null @@ -1,29 +0,0 @@ -## Parsing web server logs use case - -Representing web server access logs in ECS. -This use case uses previous definitions for `http` and `user_agent` fields sets, which were taken out of ECS temporarily for Beta1. Their official definition in ECS is expected to change slightly. -Using the fields as represented here is not expected to conflict with ECS, but may require a transition, when they are re-introduced officially. - -### Parsing web server logs fields - - -| Field | Description | Level | Type | Example | -|---|---|---|---|---| -| [@timestamp](../README.md#@timestamp) | Time at which the response was sent, and the web server log created. | core | date | `2016-05-23T08:05:34.853Z` | -| *http.** | *Fields related to HTTP requests and responses.
* | | | | -| [http.request.method](../README.md#http.request.method) | Http request method. | extended | keyword | `GET, POST, PUT` | -| [http.request.referrer](../README.md#http.request.referrer) | Referrer for this HTTP request. | extended | keyword | `https://blog.example.com/` | -| [http.response.status_code](../README.md#http.response.status_code) | Http response status code. | extended | long | `404` | -| [http.response.body.content](../README.md#http.response.body.content) | The full http response body. | extended | keyword | `Hello world` | -| [http.version](../README.md#http.version) | Http version. | extended | keyword | `1.1` | -| *user_agent.** | *The user_agent fields normally come from a browser request. They often show up in web service logs coming from the parsed user agent string.
* | | | | -| [user_agent.original](../README.md#user_agent.original) | Unparsed version of the user_agent. | extended | keyword | `Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1` | -| *user_agent.device* | *Name of the physical device.* | (use case) | keyword | | -| [user_agent.version](../README.md#user_agent.version) | Version of the physical device. | extended | keyword | `12.0` | -| *user_agent.major* | *Major version of the user agent.* | (use case) | long | | -| *user_agent.minor* | *Minor version of the user agent.* | (use case) | long | | -| *user_agent.patch* | *Patch version of the user agent.* | (use case) | keyword | | -| [user_agent.name](../README.md#user_agent.name) | Name of the user agent. | extended | keyword | `Chrome` | - - - From 98875d4dc0396b904df80bd4660ab6e3e2d23f39 Mon Sep 17 00:00:00 2001 From: ruflin Date: Wed, 13 May 2020 09:42:01 +0200 Subject: [PATCH 2/5] add support for constant_keyword --- code/go/ecs/stream.go | 56 +++++++++++++++++++++++++++++ scripts/cmd/gocodegen/gocodegen.go | 2 +- scripts/generators/es_template.py | 3 ++ use-cases/apm.md | 21 +++++++++++ use-cases/auditbeat.md | 44 +++++++++++++++++++++++ use-cases/beats.md | 18 ++++++++++ use-cases/filebeat-apache-access.md | 29 +++++++++++++++ use-cases/kubernetes.md | 21 +++++++++++ use-cases/logging.md | 22 ++++++++++++ use-cases/metricbeat.md | 31 ++++++++++++++++ use-cases/web-logs.md | 29 +++++++++++++++ 11 files changed, 275 insertions(+), 1 deletion(-) create mode 100644 code/go/ecs/stream.go create mode 100644 use-cases/apm.md create mode 100644 use-cases/auditbeat.md create mode 100644 use-cases/beats.md create mode 100644 use-cases/filebeat-apache-access.md create mode 100644 use-cases/kubernetes.md create mode 100644 use-cases/logging.md create mode 100644 use-cases/metricbeat.md create mode 100644 use-cases/web-logs.md diff --git a/code/go/ecs/stream.go b/code/go/ecs/stream.go new file mode 100644 index 0000000000..47d4ad26d5 --- /dev/null +++ b/code/go/ecs/stream.go @@ -0,0 +1,56 @@ +// Licensed to Elasticsearch B.V. under one or more contributor +// license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright +// ownership. Elasticsearch B.V. licenses this file to you under +// the Apache License, Version 2.0 (the "License"); you may +// not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// Code generated by scripts/gocodegen.go - DO NOT EDIT. + +package ecs + +// The stream fields are part of the new [indexing +// strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1). +// These fields are used to determine into which index the data is shipped in +// Elasticsearch and allow efficient querying of data. Initially these fields +// are mainly used by data shipped by the Elastic Agent but any time series +// data shipper should switch to using data streams and the new indexing +// strategy with these fields. +// All three fields are `constant_keyword` fields. 
+type Stream struct {
+	// Type of the stream.
+	// The type of the stream can be `logs` or `metrics`. More types can be
+	// added in the future but no other types than the ones described here should
+	// be used.
+	Type string `ecs:"type"`
+
+	// Dataset describes the structure of the data.
+	// The dataset describes the structure of the data. All data shipped into a
+	// single dataset should have the same or very similar data structure. For
+	// example `system.cpu` and `system.disk` are two different datasets as
+	// they have very different fields.
+	// The name of the dataset should be descriptive of the data and it is
+	// encouraged to use `.` to combine multiple words. All characters which are
+	// allowed in index names can be used for the dataset except `-`.
+	// The default for dataset is `generic`.
+	Dataset string `ecs:"dataset"`
+
+	// Namespace of your stream.
+	// This is the namespace used in your index. The namespace is used to
+	// separate the same structure into different Data Streams. For example, if
+	// nginx logs are shipped for testing and production into the same cluster,
+	// two different namespaces can be used. This allows, for example,
+	// assigning different ILM policies.
+	// The default value for a namespace is `default`.
+	Namespace string `ecs:"namespace"`
+}
diff --git a/scripts/cmd/gocodegen/gocodegen.go b/scripts/cmd/gocodegen/gocodegen.go
index c202691ce0..6802722efa 100644
--- a/scripts/cmd/gocodegen/gocodegen.go
+++ b/scripts/cmd/gocodegen/gocodegen.go
@@ -274,7 +274,7 @@ func goDataType(fieldName, elasticsearchDataType string) string {
 	}
 
 	switch elasticsearchDataType {
-	case "keyword", "text", "ip", "geo_point":
+	case "keyword", "constant_keyword", "text", "ip", "geo_point":
 		return "string"
 	case "long":
 		return "int64"
diff --git a/scripts/generators/es_template.py b/scripts/generators/es_template.py
index 6a04461008..f5f2b59cda 100644
--- a/scripts/generators/es_template.py
+++ b/scripts/generators/es_template.py
@@ -50,6 +50,9 @@ def entry_for(field):
         elif field['type'] == 'text':
             ecs_helpers.dict_copy_existing_keys(field, field_entry, ['norms'])
 
+        if field['type'] == 'constant_keyword':
+            ecs_helpers.dict_copy_existing_keys(field, field_entry)
+
         if 'multi_fields' in field:
             field_entry['fields'] = {}
             for mf in field['multi_fields']:
diff --git a/use-cases/apm.md b/use-cases/apm.md
new file mode 100644
index 0000000000..8a63ae0aa3
--- /dev/null
+++ b/use-cases/apm.md
@@ -0,0 +1,21 @@
+## APM use case
+
+ECS usage for the APM data.
+
+### APM fields
+
+
+| Field | Description | Level | Type | Example |
+|---|---|---|---|---|
+| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` |
+| [@timestamp](../README.md#@timestamp) | Timestamp when the event was created in the app / service. | core | date | `2016-05-23T08:05:34.853Z` |
+| *agent.** | *The agent fields are used to describe which agent did send the information.

* | | | | +| [agent.version](../README.md#agent.version) | APM Agent version. | core | keyword | `3.14.0` | +| [agent.name](../README.md#agent.name) | APM agent name. | core | keyword | `elastic-node` | +| *service.** | *The service fields describe the service inside which the APM agent is running.
* | | | | +| [service.id](../README.md#service.id) | Unique identifier of the running service. | core | keyword | `d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6` | +| [service.name](../README.md#service.name) | Name of the service the agent is running in. This is normally a user defined name. | core | keyword | `user-service` | +| [service.version](../README.md#service.version) | Version of the service the agent is running in. This depends on if the service is given a version. | core | keyword | `3.2.4` | + + + diff --git a/use-cases/auditbeat.md b/use-cases/auditbeat.md new file mode 100644 index 0000000000..dff825a597 --- /dev/null +++ b/use-cases/auditbeat.md @@ -0,0 +1,44 @@ +## Auditbeat use case + +ECS usage in Auditbeat. + +### Auditbeat fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| [event.module](../README.md#event.module) | Auditbeat module name. | core | keyword | `apache` | +| *file.** | *File attributes.
* | | | | +| [file.path](../README.md#file.path) | The path to the file. | extended | keyword | `/home/alice/example.png` | +| [file.target_path](../README.md#file.target_path) | The target path for symlinks. | extended | keyword | | +| [file.type](../README.md#file.type) | The file type (file, dir, or symlink). | extended | keyword | `file` | +| [file.device](../README.md#file.device) | The device. | extended | keyword | `sda` | +| [file.inode](../README.md#file.inode) | The inode representing the file in the filesystem. | extended | keyword | `256383` | +| [file.uid](../README.md#file.uid) | The user ID (UID) or security identifier (SID) of the file owner. | extended | keyword | `1001` | +| [file.owner](../README.md#file.owner) | The file owner's username. | extended | keyword | `alice` | +| [file.gid](../README.md#file.gid) | The primary group ID (GID) of the file. | extended | keyword | `1001` | +| [file.group](../README.md#file.group) | The primary group name of the file. | extended | keyword | `alice` | +| [file.mode](../README.md#file.mode) | The mode of the file in octal representation. | extended | keyword | `416` | +| [file.size](../README.md#file.size) | The file size in bytes (field is only added when `type` is `file`). | extended | long | `16384` | +| [file.mtime](../README.md#file.mtime) | The last modified time of the file (time when content was modified). | extended | date | | +| [file.ctime](../README.md#file.ctime) | The last change time of the file (time when metadata was changed). | extended | date | | +| *hash.** | *Hash fields used in Auditbeat.
The hash field contains cryptographic hashes of data associated with the event (such as a file). The keys are names of cryptographic algorithms. The values are encoded as hexidecimal (lower-case).
All fields in user can have one or multiple entries.
* | | | | +| *hash.blake2b_256* | *BLAKE2b-256 hash of the file.* | (use case) | keyword | | +| *hash.blake2b_384* | *BLAKE2b-384 hash of the file.* | (use case) | keyword | | +| *hash.blake2b_512* | *BLAKE2b-512 hash of the file.* | (use case) | keyword | | +| [hash.md5](../README.md#hash.md5) | MD5 hash. | extended | keyword | | +| [hash.sha1](../README.md#hash.sha1) | SHA-1 hash. | extended | keyword | | +| *hash.sha224* | *SHA-224 hash (SHA-2 family).* | (use case) | keyword | | +| [hash.sha256](../README.md#hash.sha256) | SHA-256 hash (SHA-2 family). | extended | keyword | | +| *hash.sha384* | *SHA-384 hash (SHA-2 family).* | (use case) | keyword | | +| [hash.sha512](../README.md#hash.sha512) | SHA-512 hash (SHA-2 family). | extended | keyword | | +| *hash.sha512_224* | *SHA-512/224 hash (SHA-2 family).* | (use case) | keyword | | +| *hash.sha512_256* | *SHA-512/256 hash (SHA-2 family).* | (use case) | keyword | | +| *hash.sha3_224* | *SHA3-224 hash (SHA-3 family).* | (use case) | keyword | | +| *hash.sha3_256* | *SHA3-256 hash (SHA-3 family).* | (use case) | keyword | | +| *hash.sha3_384* | *SHA3-384 hash (SHA-3 family).* | (use case) | keyword | | +| *hash.sha3_512* | *SHA3-512 hash (SHA-3 family).* | (use case) | keyword | | +| *hash.xxh64* | *XX64 hash of the file.* | (use case) | keyword | | + + + diff --git a/use-cases/beats.md b/use-cases/beats.md new file mode 100644 index 0000000000..c96e994b2d --- /dev/null +++ b/use-cases/beats.md @@ -0,0 +1,18 @@ +## Beats use case + +ECS fields used in Beats. + +### Beats fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` | +| *timestamp* | *Timestamp when the event was created.* | (use case) | date | `2016-05-23T08:05:34.853Z` | +| *agent.** | *The agent fields are used to describe by which beat the information was collected.
* | | | | +| [agent.version](../README.md#agent.version) | Beat version. | core | keyword | `6.0.0-rc2` | +| [agent.name](../README.md#agent.name) | Beat name. | core | keyword | `filebeat` | +| [agent.id](../README.md#agent.id) | Unique beat identifier. | core | keyword | `8a4f500d` | + + + diff --git a/use-cases/filebeat-apache-access.md b/use-cases/filebeat-apache-access.md new file mode 100644 index 0000000000..a9ef41840f --- /dev/null +++ b/use-cases/filebeat-apache-access.md @@ -0,0 +1,29 @@ +## Filebeat Apache use case + +ECS fields used in Filebeat for the apache module. + +### Filebeat Apache fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` | +| [@timestamp](../README.md#@timestamp) | Timestamp of the log line after processing. | core | date | `2016-05-23T08:05:34.853Z` | +| [message](../README.md#message) | Log message of the event | core | text | `Hello World` | +| [event.module](../README.md#event.module) | Currently fileset.module | core | keyword | `apache` | +| [event.dataset](../README.md#event.dataset) | Currenly fileset.name | core | keyword | `access` | +| [source.ip](../README.md#source.ip) | Source ip of the request. Currently apache.access.remote_ip | core | ip | `192.168.1.1` | +| [user.name](../README.md#user.name) | User name in the request. Currently apache.access.user_name | core | keyword | `ruflin` | +| *http.method* | *Http method, currently apache.access.method* | (use case) | keyword | `GET` | +| *http.url* | *Http url, currently apache.access.url* | (use case) | keyword | `http://elastic.co/` | +| [http.version](../README.md#http.version) | Http version, currently apache.access.http_version | extended | keyword | `1.1` | +| *http.response.code* | *Http response code, currently apache.access.response_code* | (use case) | keyword | `404` | +| *http.response.body_sent.bytes* | *Http response body bytes sent, currently apache.access.body_sent.bytes* | (use case) | long | `117` | +| *http.referer* | *Http referrer code, currently apache.access.referrer
NOTE: In the RFC its misspell as referer and has become accepted standard* | (use case) | keyword | `http://elastic.co/` | +| *user_agent.** | *User agent fields as in schema. Currently under apache.access.user_agent.*
* | | | | +| [user_agent.original](../README.md#user_agent.original) | Original user agent. Currently apache.access.agent | extended | keyword | `http://elastic.co/` | +| *geoip.** | *User agent fields as in schema. Currently under apache.access.geoip.*
These are extracted from source.ip
Should they be under source.geoip?
* | | | | +| *geoip....* | *All geoip fields.* | (use case) | keyword | | + + + diff --git a/use-cases/kubernetes.md b/use-cases/kubernetes.md new file mode 100644 index 0000000000..5588da6060 --- /dev/null +++ b/use-cases/kubernetes.md @@ -0,0 +1,21 @@ +## Kubernetes use case + +You can monitor containers running in a Kubernetes cluster by adding Kubernetes-specific information under `kubernetes.` + + +### Kubernetes fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| [container.id](../README.md#container.id) | Unique container id. | core | keyword | `fdbef803fa2b` | +| [container.name](../README.md#container.name) | Container name. | extended | keyword | | +| [host.hostname](../README.md#host.hostname) | Hostname of the host.
It normally contains what the `hostname` command returns on the host machine. | core | keyword | `kube-high-cpu-42` | +| *kubernetes.pod.name* | *Kubernetes pod name* | (use case) | keyword | `foo-webserver` | +| *kubernetes.namespace* | *Kubernetes namespace* | (use case) | keyword | `foo-team` | +| *kubernetes.labels* | *Kubernetes labels map* | (use case) | object | | +| *kubernetes.annotations* | *Kubernetes annotations map* | (use case) | object | | +| *kubernetes.container.name* | *Kubernetes container name. This name is unique within the pod only. It is different from the `container.name` field.* | (use case) | keyword | | + + + diff --git a/use-cases/logging.md b/use-cases/logging.md new file mode 100644 index 0000000000..be7efd0b6b --- /dev/null +++ b/use-cases/logging.md @@ -0,0 +1,22 @@ +## Logging use case + +ECS fields used in logging use cases. + +### Logging fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| *id* | *Unique id of the log entry.* | (use case) | keyword | `8a4f500d` | +| *timestamp* | *Timestamp of the log line.* | (use case) | date | `2016-05-23T08:05:34.853Z` | +| [message](../README.md#message) | The log message.
This can contain the full log line or based on the processing only the extracted message part. This is expected to be human readable. | core | text | `Hello World` | +| *hostname* | *Hostname extracted from the log line.* | (use case) | keyword | `www.example.com` | +| *ip* | *IP Address extracted from the log line. Can be IPv4 or IPv6.* | (use case) | ip | `192.168.1.12` | +| [log.level](../README.md#log.level) | Log level field. Is expected to be `WARN`, `ERR`, `INFO` etc. | core | keyword | `ERR` | +| *log.line* | *Line number the log event was collected from.* | (use case) | long | `18` | +| *log.offset* | *Offset of the log event.* | (use case) | long | `12` | +| *source.** | *Describes from where the log entries come from.
* | | | | +| *source.path* | *File path of the file the data is harvested from.* | (use case) | keyword | `/var/log/test.log` | + + + diff --git a/use-cases/metricbeat.md b/use-cases/metricbeat.md new file mode 100644 index 0000000000..c573a7897e --- /dev/null +++ b/use-cases/metricbeat.md @@ -0,0 +1,31 @@ +## Metricbeat use case + +ECS fields used Metricbeat. + +### Metricbeat fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| *id* | *Unique id to describe the event.* | (use case) | keyword | `8a4f500d` | +| *timestamp* | *Timestamp when the event was created.* | (use case) | date | `2016-05-23T08:05:34.853Z` | +| [agent.version](../README.md#agent.version) | Beat version. | core | keyword | `6.0.0-rc2` | +| [agent.name](../README.md#agent.name) | Beat name. | core | keyword | `filebeat` | +| [agent.id](../README.md#agent.id) | Unique beat identifier. | core | keyword | `8a4f500d` | +| *service.** | *The service fields describe the service for / from which the data was collected.
If logs or metrics are collected from Redis, `service.name` would be `redis`. This allows to find and correlate logs for a specicic service or even version with `service.version`.
* | | | | +| [service.id](../README.md#service.id) | Unique identifier of the running service.
This id should uniquely identify this service. This makes it possible to correlate logs and metrics for one specific service. For example in case of issues with one redis instance, it's possible to filter on the id to see metrics and logs for this single instance. | core | keyword | `d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6` | +| [service.name](../README.md#service.name) | Name of the service data is collected from.
The name is normally the same as the module name. | core | keyword | `elasticsearch` | +| [service.version](../README.md#service.version) | Version of the service the data was collected from.
This allows to look at a data set only for a specific version of a service. | core | keyword | `3.2.4` | +| *service.host* | *Host address that is used to connect to the service.
This normally contains hostname + port.
REVIEW: Should this be service.uri instead, sometimes it's more then just the host? It could also include a path or the protocol.* | (use case) | keyword | `elasticsearch:9200` | +| *request.rtt* | *Request round trip time.
How long did the request take to fetch metrics from the service.
REVIEW: THIS DOES NOT EXIST YET IN ECS.* | (use case) | long | `115` | +| *error.** | *Error namespace
Use for errors which can happen during fetching information for a service.
* | | | | +| [error.message](../README.md#error.message) | Error message returned by the service during fetching metrics. | core | text | | +| [error.code](../README.md#error.code) | Error code returned by the service during fetching metrics. | core | keyword | | +| [host.hostname](../README.md#host.hostname) | Hostname of the system metricbeat is running on or user defined name. | core | keyword | | +| *host.timezone.offset.sec* | *Timezone offset of the host in seconds.* | (use case) | long | | +| [host.id](../README.md#host.id) | Unique host id. | core | keyword | | +| [event.module](../README.md#event.module) | Name of the module this data is coming from. | core | keyword | `mysql` | +| [event.dataset](../README.md#event.dataset) | Name of the dataset.
This contains the information which is currently stored in metricset.name and metricset.module. | core | keyword | `stats` | + + + diff --git a/use-cases/web-logs.md b/use-cases/web-logs.md new file mode 100644 index 0000000000..57f9a96062 --- /dev/null +++ b/use-cases/web-logs.md @@ -0,0 +1,29 @@ +## Parsing web server logs use case + +Representing web server access logs in ECS. +This use case uses previous definitions for `http` and `user_agent` fields sets, which were taken out of ECS temporarily for Beta1. Their official definition in ECS is expected to change slightly. +Using the fields as represented here is not expected to conflict with ECS, but may require a transition, when they are re-introduced officially. + +### Parsing web server logs fields + + +| Field | Description | Level | Type | Example | +|---|---|---|---|---| +| [@timestamp](../README.md#@timestamp) | Time at which the response was sent, and the web server log created. | core | date | `2016-05-23T08:05:34.853Z` | +| *http.** | *Fields related to HTTP requests and responses.
* | | | | +| [http.request.method](../README.md#http.request.method) | Http request method. | extended | keyword | `GET, POST, PUT` | +| [http.request.referrer](../README.md#http.request.referrer) | Referrer for this HTTP request. | extended | keyword | `https://blog.example.com/` | +| [http.response.status_code](../README.md#http.response.status_code) | Http response status code. | extended | long | `404` | +| [http.response.body.content](../README.md#http.response.body.content) | The full http response body. | extended | keyword | `Hello world` | +| [http.version](../README.md#http.version) | Http version. | extended | keyword | `1.1` | +| *user_agent.** | *The user_agent fields normally come from a browser request. They often show up in web service logs coming from the parsed user agent string.
* | | | | +| [user_agent.original](../README.md#user_agent.original) | Unparsed version of the user_agent. | extended | keyword | `Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1` | +| *user_agent.device* | *Name of the physical device.* | (use case) | keyword | | +| [user_agent.version](../README.md#user_agent.version) | Version of the physical device. | extended | keyword | `12.0` | +| *user_agent.major* | *Major version of the user agent.* | (use case) | long | | +| *user_agent.minor* | *Minor version of the user agent.* | (use case) | long | | +| *user_agent.patch* | *Patch version of the user agent.* | (use case) | keyword | | +| [user_agent.name](../README.md#user_agent.name) | Name of the user agent. | extended | keyword | `Chrome` | + + + From 22977d96490897e1cad001555f7bc4e811c8c6db Mon Sep 17 00:00:00 2001 From: ruflin Date: Wed, 13 May 2020 10:17:46 +0200 Subject: [PATCH 3/5] fix tests --- scripts/generators/es_template.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/generators/es_template.py b/scripts/generators/es_template.py index f5f2b59cda..9bbf510d5c 100644 --- a/scripts/generators/es_template.py +++ b/scripts/generators/es_template.py @@ -51,7 +51,7 @@ def entry_for(field): ecs_helpers.dict_copy_existing_keys(field, field_entry, ['norms']) if field['type'] == 'constant_keyword': - ecs_helpers.dict_copy_existing_keys(field, field_entry) + ecs_helpers.dict_copy_existing_keys(field, field_entry, []) if 'multi_fields' in field: field_entry['fields'] = {} From 2e480e4ecff9831402082c3361ded45580622157 Mon Sep 17 00:00:00 2001 From: ruflin Date: Wed, 13 May 2020 13:13:53 +0200 Subject: [PATCH 4/5] fix dataset to constant_keyword --- docs/field-details.asciidoc | 2 +- generated/beats/fields.ecs.yml | 3 +-- generated/csv/fields.csv | 2 +- generated/ecs/ecs_flat.yml | 3 +-- generated/ecs/ecs_nested.yml | 3 +-- generated/elasticsearch/6/template.json | 3 +-- generated/elasticsearch/7/template.json | 3 +-- schemas/stream.yml | 2 +- 8 files changed, 8 insertions(+), 13 deletions(-) diff --git a/docs/field-details.asciidoc b/docs/field-details.asciidoc index bf8101aeb2..4bd9076a2e 100644 --- a/docs/field-details.asciidoc +++ b/docs/field-details.asciidoc @@ -5644,7 +5644,7 @@ The name of the dataset should be descriptive of the data and it is encourage to The default for dataset is `generic`. -type: keyword +type: constant_keyword diff --git a/generated/beats/fields.ecs.yml b/generated/beats/fields.ecs.yml index 959a05f752..5ea78faa47 100644 --- a/generated/beats/fields.ecs.yml +++ b/generated/beats/fields.ecs.yml @@ -4397,8 +4397,7 @@ fields: - name: dataset level: extended - type: keyword - ignore_above: 1024 + type: constant_keyword description: 'Dataset describes the structure of the data. The dataset describes the structure of the data. All data shipped into a single diff --git a/generated/csv/fields.csv b/generated/csv/fields.csv index 30a5e2412e..e183dba634 100644 --- a/generated/csv/fields.csv +++ b/generated/csv/fields.csv @@ -516,7 +516,7 @@ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description 1.6.0-dev,true,source,source.user.id,keyword,core,,,Unique identifier of the user. 1.6.0-dev,true,source,source.user.name,keyword,core,,albert,Short name or login of the user. 1.6.0-dev,true,source,source.user.name.text,text,core,,albert,Short name or login of the user. 
-1.6.0-dev,true,stream,stream.dataset,keyword,extended,,nginx.access,Dataset describes the structure of the data. +1.6.0-dev,true,stream,stream.dataset,constant_keyword,extended,,nginx.access,Dataset describes the structure of the data. 1.6.0-dev,true,stream,stream.namespace,constant_keyword,extended,,production,Namespace of your stream. 1.6.0-dev,true,stream,stream.type,constant_keyword,extended,,logs,Type of the stream. 1.6.0-dev,true,threat,threat.framework,keyword,extended,,MITRE ATT&CK,Threat classification framework. diff --git a/generated/ecs/ecs_flat.yml b/generated/ecs/ecs_flat.yml index 3b232c032e..82fde63738 100644 --- a/generated/ecs/ecs_flat.yml +++ b/generated/ecs/ecs_flat.yml @@ -6684,12 +6684,11 @@ stream.dataset: The default for dataset is `generic`.' example: nginx.access flat_name: stream.dataset - ignore_above: 1024 level: extended name: dataset normalize: [] short: Dataset describes the structure of the data. - type: keyword + type: constant_keyword stream.namespace: dashed_name: stream-namespace description: 'Namespace of your stream. diff --git a/generated/ecs/ecs_nested.yml b/generated/ecs/ecs_nested.yml index b1f4b9142d..905748d99e 100644 --- a/generated/ecs/ecs_nested.yml +++ b/generated/ecs/ecs_nested.yml @@ -7723,12 +7723,11 @@ stream: The default for dataset is `generic`.' example: nginx.access flat_name: stream.dataset - ignore_above: 1024 level: extended name: dataset normalize: [] short: Dataset describes the structure of the data. - type: keyword + type: constant_keyword namespace: dashed_name: stream-namespace description: 'Namespace of your stream. diff --git a/generated/elasticsearch/6/template.json b/generated/elasticsearch/6/template.json index 12687778d9..2d174c74fa 100644 --- a/generated/elasticsearch/6/template.json +++ b/generated/elasticsearch/6/template.json @@ -2453,8 +2453,7 @@ "stream": { "properties": { "dataset": { - "ignore_above": 1024, - "type": "keyword" + "type": "constant_keyword" }, "namespace": { "type": "constant_keyword" diff --git a/generated/elasticsearch/7/template.json b/generated/elasticsearch/7/template.json index 484674f21c..9570c24ca1 100644 --- a/generated/elasticsearch/7/template.json +++ b/generated/elasticsearch/7/template.json @@ -2452,8 +2452,7 @@ "stream": { "properties": { "dataset": { - "ignore_above": 1024, - "type": "keyword" + "type": "constant_keyword" }, "namespace": { "type": "constant_keyword" diff --git a/schemas/stream.yml b/schemas/stream.yml index cd0e5177be..d3bbe8a012 100644 --- a/schemas/stream.yml +++ b/schemas/stream.yml @@ -31,7 +31,7 @@ - name: dataset level: extended - type: keyword + type: constant_keyword short: Dataset describes the structure of the data. description: > Dataset describes the structure of the data. 
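
A note on the one-line es_template.py change in PATCH 3/5 above: ecs_helpers.dict_copy_existing_keys appears to take an explicit list of keys to copy, so the earlier two-argument call would fail with a TypeError while the templates were generated. Passing an empty list copies nothing extra, leaving the constant_keyword entry with only its `type`; that matches the template changes in this patch, where `ignore_above` is dropped for these fields. A minimal sketch of the assumed helper semantics (the helper body below is an illustrative stand-in, not the repository's exact code):

def dict_copy_existing_keys(source, destination, keys):
    """Illustrative stand-in: copy each listed key from source into destination, if present."""
    for key in keys:
        if key in source:
            destination[key] = source[key]

field = {'name': 'dataset', 'type': 'constant_keyword', 'ignore_above': 1024}
field_entry = {'type': field['type']}

# The pre-fix call dict_copy_existing_keys(field, field_entry) omitted the
# `keys` argument; with the fix, an empty key list copies nothing extra, so
# keyword-only options such as ignore_above stay out of the constant_keyword
# mapping.
dict_copy_existing_keys(field, field_entry, [])
assert field_entry == {'type': 'constant_keyword'}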
From 1c79b49f66cb0a5eada144dd378de563627d0a1c Mon Sep 17 00:00:00 2001 From: ruflin Date: Wed, 10 Jun 2020 10:11:07 +0200 Subject: [PATCH 5/5] add dataset changes --- code/go/ecs/{stream.go => dataset.go} | 22 ++-- docs/field-details.asciidoc | 140 ++++++++++++------------ docs/fields.asciidoc | 4 +- generated/beats/fields.ecs.yml | 109 +++++++++--------- generated/csv/fields.csv | 6 +- generated/ecs/ecs_flat.yml | 101 ++++++++--------- generated/ecs/ecs_nested.yml | 139 +++++++++++------------ generated/elasticsearch/6/template.json | 26 ++--- generated/elasticsearch/7/template.json | 26 ++--- schemas/{stream.yml => dataset.yml} | 28 ++--- 10 files changed, 302 insertions(+), 299 deletions(-) rename code/go/ecs/{stream.go => dataset.go} (79%) rename schemas/{stream.yml => dataset.yml} (68%) diff --git a/code/go/ecs/stream.go b/code/go/ecs/dataset.go similarity index 79% rename from code/go/ecs/stream.go rename to code/go/ecs/dataset.go index 47d4ad26d5..57232e1f3b 100644 --- a/code/go/ecs/stream.go +++ b/code/go/ecs/dataset.go @@ -19,7 +19,7 @@ package ecs -// The stream fields are part of the new [indexing +// The dataset fields are part of the new [indexing // strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1). // These fields are used to determine into which index the data is shipped in // Elasticsearch and allow efficient querying of data. Initially these fields @@ -27,25 +27,25 @@ package ecs // data shipper should switch to using data streams and the new indexing // strategy with these fields. // All three fields are `constant_keyword` fields. -type Stream struct { - // Type of the stream. - // The type of the stream can be `logs` or `metrics`. More types can be +type Dataset struct { + // Type of the dataset. + // The type of the dataset can be `logs` or `metrics`. More types can be // added in the future but no other types then the one describe here should // be used. Type string `ecs:"type"` - // Dataset describes the structure of the data. - // The dataset describes the structure of the data. All data shipped into a - // single dataset should have the same or very similar data structure. For - // example `system.cpu` and `system.disk` are two different datasets as - // they have very different fields. + // Dataset name describes the structure of the data. + // The dataset name describes the structure of the data. All data shipped + // into a single dataset should have the same or very similar data + // structure. For example `system.cpu` and `system.disk` are two different + // datasets as they have very different fields. // The name of the dataset should be descriptive of the data and it is // encourage to use `.` to combine multiple words. All characters which are // allowed in index names can be used for the dataset except `-`. // The default for dataset is `generic`. - Dataset string `ecs:"dataset"` + Name string `ecs:"name"` - // Namespace of your stream. + // Namespace of the dataset. // This is the namespace used in your index. The namespace is used to // separate the same structure into different Data Streams. 
For example if
 	// nginx logs are shipped for testing and production into the same cluster,
diff --git a/docs/field-details.asciidoc b/docs/field-details.asciidoc
index 4bd9076a2e..d9b9a918ec 100644
--- a/docs/field-details.asciidoc
+++ b/docs/field-details.asciidoc
@@ -807,6 +807,76 @@ example: `docker`
 
 |=====
 
+[[ecs-dataset]]
+=== Dataset Fields
+
+The dataset fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
+
+These fields are used to determine into which index the data is shipped in Elasticsearch and allow efficient querying of data. Initially these fields are mainly used by data shipped by the Elastic Agent, but any time series data shipper should switch to using data streams and the new indexing strategy with these fields.
+
+All three fields are `constant_keyword` fields.
+
+==== Dataset Field Details
+
+[options="header"]
+|=====
+| Field | Description | Level
+
+// ===============================================================
+
+| dataset.name
+| Dataset name describes the structure of the data.
+
+The dataset name describes the structure of the data. All data shipped into a single dataset should have the same or very similar data structure. For example `system.cpu` and `system.disk` are two different datasets as they have very different fields.
+
+The name of the dataset should be descriptive of the data and it is encouraged to use `.` to combine multiple words. All characters which are allowed in index names can be used for the dataset except `-`.
+
+The default for dataset is `generic`.
+
+type: constant_keyword
+
+
+
+example: `nginx.access`
+
+| extended
+
+// ===============================================================
+
+| dataset.namespace
+| Namespace of the dataset.
+
+This is the namespace used in your index. The namespace is used to separate the same structure into different Data Streams. For example, if nginx logs are shipped for testing and production into the same cluster, two different namespaces can be used. This allows, for example, assigning different ILM policies.
+
+The default value for a namespace is `default`.
+
+type: constant_keyword
+
+
+
+example: `production`
+
+| extended
+
+// ===============================================================
+
+| dataset.type
+| Type of the dataset.
+
+The type of the dataset can be `logs` or `metrics`. More types can be added in the future, but no other types than the ones described here should be used.
+
+type: constant_keyword
+
+
+
+example: `logs`
+
+| extended
+
+// ===============================================================
+
+|=====
+
 [[ecs-destination]]
 === Destination Fields
 
@@ -5616,76 +5686,6 @@ example: `co.uk`
 
 // ===============================================================
 
-|=====
-
-[[ecs-stream]]
-=== Stream Fields
-
-The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
-
-These fields are used to determine into which index the data is shipped in Elasticsearch and allow efficient querying of data. Initially these fields are mainly used by data shipped by the Elastic Agent but any time series data shipper should switch to using data streams and the new indexing strategy with these fields.
-
-All three fields are `constant_keyword` fields.
-
-==== Stream Field Details
-
-[options="header"]
-|=====
-| Field | Description | Level
-
-// ===============================================================
-
-| stream.dataset
-| Dataset describes the structure of the data.
-
-The dataset describes the structure of the data. All data shipped into a single dataset should have the same or very similar data structure. For example `system.cpu` and `system.disk` are two different datasets as they have very different fields.
-
-The name of the dataset should be descriptive of the data and it is encourage to use `.` to combine multiple words. All characters which are allowed in index names can be used for the dataset except `-`.
-
-The default for dataset is `generic`.
-
-type: constant_keyword
-
-
-
-example: `nginx.access`
-
-| extended
-
-// ===============================================================
-
-| stream.namespace
-| Namespace of your stream.
-
-This is the namespace used in your index. The namespace is used to separate the same structure into different Data Streams. For example if nginx logs are shipped for testing and production into the same cluster, two different namespaces can be used. This allows to assign different ILM policies as an example.
-
-The default value for a namespace is `default`.
-
-type: constant_keyword
-
-
-
-example: `production`
-
-| extended
-
-// ===============================================================
-
-| stream.type
-| Type of the stream.
-
-The type of the stream can be `logs` or `metrics`. More types can be added in the future but no other types then the one describe here should be used.
-
-type: constant_keyword
-
-
-
-example: `logs`
-
-| extended
-
-// ===============================================================
-
 |=====
 
 [[ecs-threat]]
diff --git a/docs/fields.asciidoc b/docs/fields.asciidoc
index c1ceb224ee..b45b6fa276 100644
--- a/docs/fields.asciidoc
+++ b/docs/fields.asciidoc
@@ -32,6 +32,8 @@ all fields are defined.
 
 | <<ecs-container,Container>> | Fields describing the container that generated this event.
 
+| <<ecs-dataset,Dataset>> | Fields about the dataset of this document.
+
 | <<ecs-destination,Destination>> | Fields about the destination side of a network connection, used with source.
 
 | <<ecs-dll,DLL>> | These fields contain information about code libraries dynamically loaded into processes.
 
@@ -86,8 +88,6 @@ all fields are defined.
 
 | <<ecs-source,Source>> | Fields about the source side of a network connection, used with destination.
 
-| <<ecs-stream,Stream>> | Fields about the monitoring agent.
-
 | <<ecs-threat,Threat>> | Fields to classify events and alerts according to a threat taxonomy.
 
 | <<ecs-tls,TLS>> | Fields describing a TLS connection.
 
diff --git a/generated/beats/fields.ecs.yml b/generated/beats/fields.ecs.yml
index 5ea78faa47..7dc98148f3 100644
--- a/generated/beats/fields.ecs.yml
+++ b/generated/beats/fields.ecs.yml
@@ -552,6 +552,61 @@
       ignore_above: 1024
       description: Runtime managing this container.
       example: docker
+  - name: dataset
+    title: Dataset
+    group: 2
+    description: 'The dataset fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
+
+      These fields are used to determine into which index the data is shipped in Elasticsearch
+      and allow efficient querying of data. Initially these fields are mainly used
+      by data shipped by the Elastic Agent, but any time series data shipper should
+      switch to using data streams and the new indexing strategy with these fields.
+
+      All three fields are `constant_keyword` fields.'
+    footnote: 'Examples: The new indexing strategy is `{dataset.type}-{dataset.name}-{dataset.namespace}`.
+      As an example, nginx access logs are shipped into `logs-nginx.access-default`.'
+    type: group
+    fields:
+    - name: name
+      level: extended
+      type: constant_keyword
+      description: 'Dataset name describes the structure of the data.
+
+        The dataset name describes the structure of the data. All data shipped into
+        a single dataset should have the same or very similar data structure. For
+        example `system.cpu` and `system.disk` are two different datasets as they
+        have very different fields.
+
+        The name of the dataset should be descriptive of the data and it is encouraged
+        to use `.` to combine multiple words. All characters which are allowed in
+        index names can be used for the dataset except `-`.
+
+        The default for dataset is `generic`.'
+      example: nginx.access
+      default_field: false
+    - name: namespace
+      level: extended
+      type: constant_keyword
+      description: 'Namespace of the dataset.
+
+        This is the namespace used in your index. The namespace is used to separate
+        the same structure into different Data Streams. For example, if nginx logs
+        are shipped for testing and production into the same cluster, two different
+        namespaces can be used. This allows, for example, assigning different ILM
+        policies.
+
+        The default value for a namespace is `default`.'
+      example: production
+      default_field: false
+    - name: type
+      level: extended
+      type: constant_keyword
+      description: 'Type of the dataset.
+
+        The type of the dataset can be `logs` or `metrics`. More types can be added
+        in the future, but no other types than the ones described here should be used.'
+      example: logs
+      default_field: false
   - name: destination
     title: Destination
     group: 2
@@ -4380,60 +4435,6 @@
     default_field: false
     description: Short name or login of the user.
     example: albert
-  - name: stream
-    title: Stream
-    group: 2
-    description: 'The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
-
-      These fields are used to determine into which index the data is shipped in Elasticsearch
-      and allow efficient querying of data. Initially these fields are mainly used
-      by data shipped by the Elastic Agent but any time series data shipper should
-      switch to using data streams and the new indexing strategy with these fields.
-
-      All three fields are `constant_keyword` fields.'
-    footnote: 'Examples: The new indexing strategy is {stream.type}-{stream.dataset}-{stream.namespace}`.`
-      As an example, nginx access logs are shipped into `logs-nginx.access-default`.'
-    type: group
-    fields:
-    - name: dataset
-      level: extended
-      type: constant_keyword
-      description: 'Dataset describes the structure of the data.
-
-        The dataset describes the structure of the data. All data shipped into a single
-        dataset should have the same or very similar data structure. For example `system.cpu`
-        and `system.disk` are two different datasets as they have very different fields.
-
-        The name of the dataset should be descriptive of the data and it is encourage
-        to use `.` to combine multiple words. All characters which are allowed in
-        index names can be used for the dataset except `-`.
-
-        The default for dataset is `generic`.'
-      example: nginx.access
-      default_field: false
-    - name: namespace
-      level: extended
-      type: constant_keyword
-      description: 'Namespace of your stream.
-
-        This is the namespace used in your index. The namespace is used to separate
-        the same structure into different Data Streams. For example if nginx logs
-        are shipped for testing and production into the same cluster, two different
-        namespaces can be used. This allows to assign different ILM policies as an
-        example.
-
-        The default value for a namespace is `default`.'
-      example: production
-      default_field: false
-    - name: type
-      level: extended
-      type: constant_keyword
-      description: 'Type of the stream.
-
-        The type of the stream can be `logs` or `metrics`. More types can be added
-        in the future but no other types then the one describe here should be used.'
-      example: logs
-      default_field: false
   - name: threat
     title: Threat
     group: 2
diff --git a/generated/csv/fields.csv b/generated/csv/fields.csv
index e183dba634..8431cee602 100644
--- a/generated/csv/fields.csv
+++ b/generated/csv/fields.csv
@@ -58,6 +58,9 @@ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
 1.6.0-dev,true,container,container.labels,object,extended,,,Image labels.
 1.6.0-dev,true,container,container.name,keyword,extended,,,Container name.
 1.6.0-dev,true,container,container.runtime,keyword,extended,,docker,Runtime managing this container.
+1.6.0-dev,true,dataset,dataset.name,constant_keyword,extended,,nginx.access,Dataset name describing the structure of the data.
+1.6.0-dev,true,dataset,dataset.namespace,constant_keyword,extended,,production,Namespace of the dataset.
+1.6.0-dev,true,dataset,dataset.type,constant_keyword,extended,,logs,Type of the dataset.
 1.6.0-dev,true,destination,destination.address,keyword,extended,,,Destination network address.
 1.6.0-dev,true,destination,destination.as.number,long,extended,,15169,Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet.
 1.6.0-dev,true,destination,destination.as.organization.name,keyword,extended,,Google LLC,Organization name.
@@ -516,9 +519,6 @@ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
 1.6.0-dev,true,source,source.user.id,keyword,core,,,Unique identifier of the user.
 1.6.0-dev,true,source,source.user.name,keyword,core,,albert,Short name or login of the user.
 1.6.0-dev,true,source,source.user.name.text,text,core,,albert,Short name or login of the user.
-1.6.0-dev,true,stream,stream.dataset,constant_keyword,extended,,nginx.access,Dataset describes the structure of the data.
-1.6.0-dev,true,stream,stream.namespace,constant_keyword,extended,,production,Namespace of your stream.
-1.6.0-dev,true,stream,stream.type,constant_keyword,extended,,logs,Type of the stream.
 1.6.0-dev,true,threat,threat.framework,keyword,extended,,MITRE ATT&CK,Threat classification framework.
 1.6.0-dev,true,threat,threat.tactic.id,keyword,extended,array,TA0040,Threat tactic id.
 1.6.0-dev,true,threat,threat.tactic.name,keyword,extended,array,impact,Threat tactic.
diff --git a/generated/ecs/ecs_flat.yml b/generated/ecs/ecs_flat.yml
index 82fde63738..212ea884e3 100644
--- a/generated/ecs/ecs_flat.yml
+++ b/generated/ecs/ecs_flat.yml
@@ -667,6 +667,57 @@ container.runtime:
   normalize: []
   short: Runtime managing this container.
   type: keyword
+dataset.name:
+  dashed_name: dataset-name
+  description: 'Dataset name describes the structure of the data.
+
+    The dataset name describes the structure of the data. All data shipped into a
+    single dataset should have the same or very similar data structure. For example
+    `system.cpu` and `system.disk` are two different datasets as they have very different
+    fields.
+
+    The name of the dataset should be descriptive of the data and it is encouraged
+    to use `.` to combine multiple words. All characters which are allowed in index
+    names can be used for the dataset except `-`.
+
+    The default for dataset is `generic`.'
+  example: nginx.access
+  flat_name: dataset.name
+  level: extended
+  name: name
+  normalize: []
+  short: Dataset name describing the structure of the data.
+  type: constant_keyword
+dataset.namespace:
+  dashed_name: dataset-namespace
+  description: 'Namespace of the dataset.
+
+    This is the namespace used in your index. The namespace is used to separate the
+    same structure into different Data Streams. For example, if nginx logs are shipped
+    for testing and production into the same cluster, two different namespaces can
+    be used. This allows, for example, assigning different ILM policies.
+
+    The default value for a namespace is `default`.'
+  example: production
+  flat_name: dataset.namespace
+  level: extended
+  name: namespace
+  normalize: []
+  short: Namespace of the dataset.
+  type: constant_keyword
+dataset.type:
+  dashed_name: dataset-type
+  description: 'Type of the dataset.
+
+    The type of the dataset can be `logs` or `metrics`. More types can be added in
+    the future, but no other types than the ones described here should be used.'
+  example: logs
+  flat_name: dataset.type
+  level: extended
+  name: type
+  normalize: []
+  short: Type of the dataset.
+  type: constant_keyword
 destination.address:
   dashed_name: destination-address
   description: 'Some event destination addresses are defined ambiguously. The event
@@ -6669,56 +6720,6 @@ source.user.name:
   original_fieldset: user
   short: Short name or login of the user.
   type: keyword
-stream.dataset:
-  dashed_name: stream-dataset
-  description: 'Dataset describes the structure of the data.
-
-    The dataset describes the structure of the data. All data shipped into a single
-    dataset should have the same or very similar data structure. For example `system.cpu`
-    and `system.disk` are two different datasets as they have very different fields.
-
-    The name of the dataset should be descriptive of the data and it is encourage
-    to use `.` to combine multiple words. All characters which are allowed in index
-    names can be used for the dataset except `-`.
-
-    The default for dataset is `generic`.'
-  example: nginx.access
-  flat_name: stream.dataset
-  level: extended
-  name: dataset
-  normalize: []
-  short: Dataset describes the structure of the data.
-  type: constant_keyword
-stream.namespace:
-  dashed_name: stream-namespace
-  description: 'Namespace of your stream.
-
-    This is the namespace used in your index. The namespace is used to separate the
-    same structure into different Data Streams. For example if nginx logs are shipped
-    for testing and production into the same cluster, two different namespaces can
-    be used. This allows to assign different ILM policies as an example.
-
-    The default value for a namespace is `default`.'
-  example: production
-  flat_name: stream.namespace
-  level: extended
-  name: namespace
-  normalize: []
-  short: Namespace of your stream.
-  type: constant_keyword
-stream.type:
-  dashed_name: stream-type
-  description: 'Type of the stream.
-
-    The type of the stream can be `logs` or `metrics`. More types can be added in
-    the future but no other types then the one describe here should be used.'
-  example: logs
-  flat_name: stream.type
-  level: extended
-  name: type
-  normalize: []
-  short: Type of the stream.
-  type: constant_keyword
 tags:
   dashed_name: tags
   description: List of keywords used to tag each event.
diff --git a/generated/ecs/ecs_nested.yml b/generated/ecs/ecs_nested.yml
index 905748d99e..fdafdeba96 100644
--- a/generated/ecs/ecs_nested.yml
+++ b/generated/ecs/ecs_nested.yml
@@ -923,6 +923,76 @@ container:
   short: Fields describing the container that generated this event.
   title: Container
   type: group
+dataset:
+  description: 'The dataset fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
+
+    These fields are used to determine into which index the data is shipped in Elasticsearch
+    and allow efficient querying of data. Initially these fields are mainly used by
+    data shipped by the Elastic Agent, but any time series data shipper should switch
+    to using data streams and the new indexing strategy with these fields.
+
+    All three fields are `constant_keyword` fields.'
+  fields:
+    name:
+      dashed_name: dataset-name
+      description: 'Dataset name describes the structure of the data.
+
+        The dataset name describes the structure of the data. All data shipped into
+        a single dataset should have the same or very similar data structure. For
+        example `system.cpu` and `system.disk` are two different datasets as they
+        have very different fields.
+
+        The name of the dataset should be descriptive of the data and it is encouraged
+        to use `.` to combine multiple words. All characters which are allowed in
+        index names can be used for the dataset except `-`.
+
+        The default for dataset is `generic`.'
+      example: nginx.access
+      flat_name: dataset.name
+      level: extended
+      name: name
+      normalize: []
+      short: Dataset name describing the structure of the data.
+      type: constant_keyword
+    namespace:
+      dashed_name: dataset-namespace
+      description: 'Namespace of the dataset.
+
+        This is the namespace used in your index. The namespace is used to separate
+        the same structure into different Data Streams. For example, if nginx logs
+        are shipped for testing and production into the same cluster, two different
+        namespaces can be used. This allows, for example, assigning different ILM
+        policies.
+
+        The default value for a namespace is `default`.'
+      example: production
+      flat_name: dataset.namespace
+      level: extended
+      name: namespace
+      normalize: []
+      short: Namespace of the dataset.
+      type: constant_keyword
+    type:
+      dashed_name: dataset-type
+      description: 'Type of the dataset.
+
+        The type of the dataset can be `logs` or `metrics`. More types can be added
+        in the future, but no other types than the ones described here should be used.'
+      example: logs
+      flat_name: dataset.type
+      level: extended
+      name: type
+      normalize: []
+      short: Type of the dataset.
+      type: constant_keyword
+  footnote: 'Examples: The new indexing strategy is `{dataset.type}-{dataset.name}-{dataset.namespace}`.
+    As an example, nginx access logs are shipped into `logs-nginx.access-default`.'
+  group: 2
+  name: dataset
+  prefix: dataset.
+  short: Fields about the dataset of this document.
+  title: Dataset
+  type: group
 destination:
   description: 'Destination fields describe details about the destination of a packet/event.
 
@@ -7698,75 +7768,6 @@ source:
   short: Fields about the source side of a network connection, used with destination.
title: Source type: group -stream: - description: 'The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1). - - These fields are used to determine into which index the data is shipped in Elasticsearch - and allow efficient querying of data. Initially these fields are mainly used by - data shipped by the Elastic Agent but any time series data shipper should switch - to using data streams and the new indexing strategy with these fields. - - All three fields are `constant_keyword` fields.' - fields: - dataset: - dashed_name: stream-dataset - description: 'Dataset describes the structure of the data. - - The dataset describes the structure of the data. All data shipped into a single - dataset should have the same or very similar data structure. For example `system.cpu` - and `system.disk` are two different datasets as they have very different fields. - - The name of the dataset should be descriptive of the data and it is encourage - to use `.` to combine multiple words. All characters which are allowed in - index names can be used for the dataset except `-`. - - The default for dataset is `generic`.' - example: nginx.access - flat_name: stream.dataset - level: extended - name: dataset - normalize: [] - short: Dataset describes the structure of the data. - type: constant_keyword - namespace: - dashed_name: stream-namespace - description: 'Namespace of your stream. - - This is the namespace used in your index. The namespace is used to separate - the same structure into different Data Streams. For example if nginx logs - are shipped for testing and production into the same cluster, two different - namespaces can be used. This allows to assign different ILM policies as an - example. - - The default value for a namespace is `default`.' - example: production - flat_name: stream.namespace - level: extended - name: namespace - normalize: [] - short: Namespace of your stream. - type: constant_keyword - type: - dashed_name: stream-type - description: 'Type of the stream. - - The type of the stream can be `logs` or `metrics`. More types can be added - in the future but no other types then the one describe here should be used.' - example: logs - flat_name: stream.type - level: extended - name: type - normalize: [] - short: Type of the stream. - type: constant_keyword - footnote: 'Examples: The new indexing strategy is {stream.type}-{stream.dataset}-{stream.namespace}`.` - As an example, nginx access logs are shipped into `logs-nginx.access-default`.' - group: 2 - name: stream - prefix: stream. - short: Fields about the monitoring agent. - title: Stream - type: group threat: description: 'Fields to classify events and alerts according to a threat taxonomy such as the Mitre ATT&CK framework. 
diff --git a/generated/elasticsearch/6/template.json b/generated/elasticsearch/6/template.json
index 2d174c74fa..dfd3dbb789 100644
--- a/generated/elasticsearch/6/template.json
+++ b/generated/elasticsearch/6/template.json
@@ -304,6 +304,19 @@
         }
       }
     },
+    "dataset": {
+      "properties": {
+        "name": {
+          "type": "constant_keyword"
+        },
+        "namespace": {
+          "type": "constant_keyword"
+        },
+        "type": {
+          "type": "constant_keyword"
+        }
+      }
+    },
     "destination": {
       "properties": {
         "address": {
@@ -2450,19 +2463,6 @@
         }
       }
     },
-    "stream": {
-      "properties": {
-        "dataset": {
-          "type": "constant_keyword"
-        },
-        "namespace": {
-          "type": "constant_keyword"
-        },
-        "type": {
-          "type": "constant_keyword"
-        }
-      }
-    },
     "tags": {
       "ignore_above": 1024,
       "type": "keyword"
diff --git a/generated/elasticsearch/7/template.json b/generated/elasticsearch/7/template.json
index 9570c24ca1..064186bb72 100644
--- a/generated/elasticsearch/7/template.json
+++ b/generated/elasticsearch/7/template.json
@@ -303,6 +303,19 @@
         }
       }
     },
+    "dataset": {
+      "properties": {
+        "name": {
+          "type": "constant_keyword"
+        },
+        "namespace": {
+          "type": "constant_keyword"
+        },
+        "type": {
+          "type": "constant_keyword"
+        }
+      }
+    },
     "destination": {
       "properties": {
         "address": {
@@ -2449,19 +2462,6 @@
         }
       }
     },
-    "stream": {
-      "properties": {
-        "dataset": {
-          "type": "constant_keyword"
-        },
-        "namespace": {
-          "type": "constant_keyword"
-        },
-        "type": {
-          "type": "constant_keyword"
-        }
-      }
-    },
     "tags": {
       "ignore_above": 1024,
       "type": "keyword"
diff --git a/schemas/stream.yml b/schemas/dataset.yml
similarity index 68%
rename from schemas/stream.yml
rename to schemas/dataset.yml
index d3bbe8a012..bf1abe6bd9 100644
--- a/schemas/stream.yml
+++ b/schemas/dataset.yml
@@ -1,10 +1,10 @@
 ---
-- name: stream
-  title: Stream
+- name: dataset
+  title: Dataset
   group: 2
-  short: Fields about the monitoring agent.
+  short: Fields about the dataset of this document.
   description: >
-    The stream fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
+    The dataset fields are part of the new [indexing strategy](https://github.com/elastic/kibana/blob/master/docs/ingest_manager/index.asciidoc#indexing-strategy-1).
 
     These fields are used to determine into which index the data is shipped in Elasticsearch and
    allow efficient querying of data. Initially these fields are mainly used by data shipped by
@@ -13,30 +13,30 @@
     All three fields are `constant_keyword` fields.
   footnote: >
-    Examples: The new indexing strategy is {stream.type}-{stream.dataset}-{stream.namespace}`.`
+    Examples: The new indexing strategy is `{dataset.type}-{dataset.name}-{dataset.namespace}`.
     As an example, nginx access logs are shipped into `logs-nginx.access-default`.
   type: group
   fields:
   - name: type
     level: extended
     type: constant_keyword
-    short: Type of the stream.
+    short: Type of the dataset.
     description: >
-      Type of the stream.
+      Type of the dataset.
 
-      The type of the stream can be `logs` or `metrics`. More types can be added in the future but no
+      The type of the dataset can be `logs` or `metrics`. More types can be added in the future but no
       other types then the one describe here should be used.
 
     example: logs
 
-  - name: dataset
+  - name: name
     level: extended
     type: constant_keyword
-    short: Dataset describes the structure of the data.
+    short: Dataset name describing the structure of the data.
     description: >
-      Dataset describes the structure of the data.
+      Dataset name describes the structure of the data.
- The dataset describes the structure of the data. All data shipped into a single dataset + The dataset name describes the structure of the data. All data shipped into a single dataset should have the same or very similar data structure. For example `system.cpu` and `system.disk` are two different datasets as they have very different fields. @@ -50,9 +50,9 @@ - name: namespace level: extended type: constant_keyword - short: Namespace of your stream. + short: Namespace of the dataset. description: > - Namespace of your stream. + Namespace of the dataset. This is the namespace used in your index. The namespace is used to separate the same structure into different Data Streams. For example if nginx logs