Description
Describe the context
While developing ECS, we stepped back and took a critical look at the information we actually want when parsing a user agent. We found a few issues with the current user_agent plugin for Ingest Node.
Let's start with the default parsing for Chrome 70.0.3538.102 on Mac 10.14.1:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
It is parsed as follows:
{
"patch" : "3538",
"major" : "70",
"minor" : "0",
"os" : "Mac OS X 10.14.1",
"os_minor" : "14",
"os_major" : "10",
"name" : "Chrome",
"os_name" : "Mac OS X",
"device" : "Other"
}
To follow ECS, the structure would ideally be:
{
"name" : "Chrome",
"original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
"os" : {
"name" : "Mac OS X",
"version" : "10.14.1",
"full" : "Mac OS X 10.14.1"
},
"device" : "Other",
"version" : "70.0.3538.102"
}
There are a few things to notice here:
- By default, the versions are reported as a complete version string, not broken down. An option to also output the breakdown of the version numbers would still be welcome.
  - Also: full version strings make support for pre-release versions trivial (e.g. -beta1, -rc2, etc.)
- The original ua string is kept around, which can be made optional as well.
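To illustrate why a full version string is the more flexible default, here is a minimal sketch of deriving the breakdown from it on demand. The regular expression, the `build` name for a fourth numeric part, and the `pre_release` key are assumptions made for this example; they are not part of ECS or of the user_agent plugin.

```python
import re

# Hypothetical pattern: dot-separated numeric parts, then an optional
# pre-release suffix introduced by "-" (e.g. "-beta1", "-rc2").
_VERSION_RE = re.compile(
    r"^(?P<numbers>\d+(?:\.\d+)*)(?:-(?P<pre>[0-9A-Za-z.]+))?$"
)

# Names for the optional breakdown; "build" is an assumed name for a
# fourth numeric part. Any parts beyond the fourth are ignored.
_PART_NAMES = ("major", "minor", "patch", "build")


def break_down_version(version):
    """Split a full version string into named parts.

    Returns an empty dict when the string does not match the assumed format.
    """
    match = _VERSION_RE.match(version)
    if match is None:
        return {}
    parts = dict(zip(_PART_NAMES, match.group("numbers").split(".")))
    if match.group("pre"):
        parts["pre_release"] = match.group("pre")
    return parts
```

Going in this direction loses nothing: the breakdown can always be derived from the full string, while the reverse only works if every part was emitted.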
In translating multiple access log Filebeat modules to match ECS, I've had to rename the fields repeatedly. Here are some examples:
- Apache - all fields renamed, but the version numbers are still broken down (which is not in ECS). Currently most access log modules are implemented this way.
- Traefik - fields renamed, and there's a hacky attempt at reconstructing full version fields :-)
- pipeline
- semi-successful version reconstruction: browsing around, you'll see completely empty versions (..), partially reconstructed ones (11.2. instead of 11.2.5) and successful reconstructions (7.62.0). Of course, this can be cleaned up further to eliminate the noise.
- Note that this has not been released yet and will not be released in its current form; this is an experiment ;-)
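The empty (..) and dangling (11.2.) results come from joining the parts unconditionally. A rough sketch of a more defensive join is shown below; `join_version` is a hypothetical helper written for this issue, not the code used in the Traefik pipeline.

```python
def join_version(major=None, minor=None, patch=None):
    """Join version parts with dots, stopping at the first missing part.

    Returns None when no part is available, so callers can omit the field
    instead of emitting "..".
    """
    parts = []
    for part in (major, minor, patch):
        # Treat None and blank strings as "not reported by the parser".
        if part is None or str(part).strip() == "":
            break
        parts.append(str(part).strip())
    return ".".join(parts) if parts else None
```

This removes the noise, but it cannot recover information the parser never emitted: if the patch number is missing, the result is 11.2 even when the real version was 11.2.5.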
I don't mind doing this in a pinch, in order to hit the 7.0 feature freeze. But I don't think it will be a good experience for users who try to follow ECS when they use the user agent processor. They would benefit greatly from having the plugin follow ECS by default, or via some easy-to-enable setting(s).
Describe the feature
Can we update the user agent parser to:
- output field names following the ECS schema
- output the full version string for the agent and for the OS
- output the original agent string at .original
I'm more than happy that we do this via one or more option flags.
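For reference, this is roughly the post-processing that every module currently has to repeat, sketched in Python. The function name `to_ecs_user_agent` and its parameters are made up for this example, and it assumes all values in the flat output are strings, as in the Chrome example above.

```python
from itertools import takewhile


def to_ecs_user_agent(flat, original=None):
    """Reshape the flat user_agent output into the nested ECS-style layout.

    `flat` is the current flat output; `original` is the raw user agent
    string, which the flat output does not carry.
    """

    def joined(*keys):
        # Join consecutive parts that are present; stop at the first gap.
        parts = list(takewhile(bool, (flat.get(k) for k in keys)))
        return ".".join(parts) if parts else None

    os_fields = {
        "name": flat.get("os_name"),
        "version": joined("os_major", "os_minor"),
        "full": flat.get("os"),
    }
    ecs = {
        "name": flat.get("name"),
        "original": original,
        "os": {k: v for k, v in os_fields.items() if v is not None},
        "device": flat.get("device"),
        "version": joined("major", "minor", "patch"),
    }
    # Drop fields that could not be filled in.
    return {k: v for k, v in ecs.items() if v not in (None, {})}
```

Applied to the Chrome example, this yields version 70.0.3538 and os.version 10.14 rather than 70.0.3538.102 and 10.14.1, because the flat output shown above does not include those last parts. That gap is why the full version strings need to come from the parser itself.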
cc @ruflin