@danhermann
Contributor

Adds a fingerprint ingest processor that computes a hash of the content of selected document fields, for content fingerprinting use cases.

For example, simulating a pipeline that fingerprints the user field:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "fingerprint": {
          "fields": ["user"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "user": {
          "last_name": "Smith",
          "first_name": "John",
          "date_of_birth": "1980-01-15",
          "is_active": true
        }
      }
    }
  ]
}

This produces the following _source:

"_source" : {
  "fingerprint" : "WbSUPW4zY1PBPehh2AA/sSxiRjw=",
  "user" : {
    "last_name" : "Smith",
    "first_name" : "John",
    "date_of_birth" : "1980-01-15",
    "is_active" : true
  }
}

The processor supports any number of document fields, nested document content, a choice of hash function (MD5, SHA-1, SHA-256, or SHA-512), and a per-processor salt.
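To make the idea concrete, here is a minimal Python sketch of salted content fingerprinting over nested fields. It is an illustration only, not the processor's implementation: the function name `fingerprint`, the JSON canonicalization, the null-byte separators, and the SHA-1 default are all assumptions made for this sketch, so its output will not match the value in the example above. (The SHA-1 default is a guess based on that value being 28 base64 characters, which decodes to 20 bytes, the size of a SHA-1 digest.)

```python
import base64
import hashlib
import json

# Hash names listed in the PR description, mapped to hashlib algorithm names.
METHODS = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256", "SHA-512": "sha512"}


def fingerprint(source, fields, method="SHA-1", salt=""):
    """Hypothetical helper: hash the content of `fields` in `source`.

    Assumption: values are canonicalized as key-sorted JSON. Elasticsearch's
    actual byte-level encoding is not described in this PR and may differ.
    """
    digest = hashlib.new(METHODS[method])
    # The salt is mixed in first, so different salts give different hashes.
    digest.update(salt.encode("utf-8") + b"\x00")
    # Sorting field names makes the result independent of the order given.
    for field in sorted(fields):
        # sort_keys makes nested content hash the same regardless of key order.
        canonical = json.dumps(source[field], sort_keys=True, separators=(",", ":"))
        digest.update(field.encode("utf-8") + b"\x00")
        digest.update(canonical.encode("utf-8") + b"\x00")
    return base64.b64encode(digest.digest()).decode("ascii")


doc = {
    "user": {
        "last_name": "Smith",
        "first_name": "John",
        "date_of_birth": "1980-01-15",
        "is_active": True,
    }
}

print(fingerprint(doc, ["user"]))
print(fingerprint(doc, ["user"], method="SHA-256", salt="my-salt"))
```

In this sketch, two documents whose selected fields hold the same content get the same fingerprint even if their keys arrive in a different order, while changing the salt or the hash method changes the result.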

Closes #53578, though it addresses only the content fingerprinting use case and not anonymization.

Backport of #68415

@danhermann added the >feature, :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), backport, and v7.12.0 labels on Feb 16, 2021
@elasticmachine added the Team:Data Management label (Meta label for data/management team) on Feb 16, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann merged commit a8669e7 into elastic:7.x on Feb 16, 2021