@danhermann
Contributor

This PR adds an ingest processor that uses the public suffix list to look up the registered domain and the effective top-level domain (eTLD) of a fully qualified domain name, and then uses those to split out the subdomain as well. It's essentially a port of the Beats processor.

~ curl -H "Content-Type: application/json" -X POST -u elastic:password http://localhost:9200/_ingest/pipeline/_simulate\?verbose --data-binary @- << EOF
{
  "pipeline": {
    "processors": [
      {
        "registered_domain": {
          "field": "url",
          "target_field": "url_registered_domain"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "url": "www.google.com"
      }
    }
  ]
}
EOF

Produces:

...
"_source" : {
  "url_registered_domain" : {
    "subdomain": "www",
    "registered_domain": "google.com",
    "top_level_domain": "com"
  },
  "url": "www.google.com"
}
...
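
For readers unfamiliar with the split, here is a minimal Python sketch of the idea. It is not the processor's implementation: `split_domain` and `SAMPLE_SUFFIXES` are hypothetical names, and the three-entry suffix set is an assumed stand-in for the real public suffix list, whose wildcard and exception rules are not handled here.

```python
# Hypothetical, tiny stand-in for the public suffix list (assumption:
# the real list is far larger and has wildcard/exception rules).
SAMPLE_SUFFIXES = {"com", "org", "co.uk"}


def split_domain(fqdn, suffixes=SAMPLE_SUFFIXES):
    """Split a domain name into subdomain, registered domain, and eTLD.

    Returns None when no suffix matches or when the name is itself a suffix.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    # Scan from the leftmost label so the longest matching suffix wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The whole name is a public suffix: nothing is registered.
                return None
            result = {
                # Registered domain = the suffix plus one label to its left.
                "registered_domain": ".".join(labels[i - 1:]),
                "top_level_domain": candidate,
            }
            if i > 1:
                # Everything left of the registered domain is the subdomain.
                result["subdomain"] = ".".join(labels[: i - 1])
            return result
    return None


print(split_domain("www.google.com"))
```

With this sample set, `www.google.com` splits into the same three values shown in the simulated output above. Leaving out the `subdomain` key when there is none is a choice made for this sketch only.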

Backport of #67611

@danhermann danhermann added >feature :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP backport v7.13.0 labels Apr 14, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 14, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann danhermann merged commit c7f846e into elastic:7.x Apr 14, 2021
@danhermann danhermann deleted the backport_7x_67611_registered_domain_processor branch April 14, 2021 16:49