[processor/redaction] Add support for keys patterns and ability to specify mask string #35830

@krokwen-tftc

Component(s)

processor/redaction

Is your feature request related to a problem? Please describe.

I want to redact my HTTP access logs.
We log full request data, including POST data, cookies, and so on.
Many fields contain tokens that I want to hide, and their keys share common patterns such as 'token' or 'apiKey'.
Collecting every variation of these keys and their value formats would be impractical.
I also don't want to remove these keys from the log attributes, because it's important to see whether a field exists.

In addition, a hashing option would be useful: hash the matched value instead of replacing it with the mask string, so that logs can still be correlated by identical hash values without exposing the actual value.

Describe the solution you'd like

Masking option

processors:
  redaction/nginx_access_redact_secrets:
    allow_all_keys: true
    blocked_keys_patterns:
      - ".*token.*"
      - ".*api_key.*"
      - ".*apiKey.*"
      - ".*password.*"
    mask_string: "<redacted>"

The resulting attributes would look like request_args.secret_client_token: <redacted>
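
To make the requested masking semantics concrete, here is a minimal Go sketch. It is not the redaction processor's actual code: `compilePatterns` and `maskBlockedKeys` are hypothetical helper names, and attributes are simplified to a flat `map[string]string`. It only illustrates "keep the key, replace the value when the key matches any pattern".

```go
package main

import "regexp"

// compilePatterns compiles the key patterns once, so matching does not
// recompile a regular expression for every attribute.
// (Hypothetical helper, not part of the redaction processor.)
func compilePatterns(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, err
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// maskBlockedKeys returns a copy of attrs in which every value whose key
// matches at least one pattern is replaced by mask. Keys are never removed,
// so it stays visible that the field existed.
func maskBlockedKeys(attrs map[string]string, patterns []*regexp.Regexp, mask string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		out[k] = v
		for _, re := range patterns {
			if re.MatchString(k) {
				out[k] = mask
				break
			}
		}
	}
	return out
}
```

With the four patterns from the config above and the mask `<redacted>`, an attribute `request_args.secret_client_token` keeps its key and gets the mask as its value, while a non-matching attribute such as `request_args.page` passes through unchanged.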

Or hashing option

processors:
  redaction/nginx_access_hash_secrets:
    allow_all_keys: true
    blocked_keys_patterns:
      - ".*token.*"
      - ".*api_key.*"
      - ".*apiKey.*"
      - ".*password.*"
    hashing: sha1 # defaults to 'none'

The resulting attributes would look like request_args.secret_client_token: <sha1 sum>
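
As a rough illustration of the hashing variant, the sketch below replaces a value with its hex-encoded SHA-1 digest using Go's standard library. `hashValue` is a hypothetical name, and the hex output format is an assumption; the request above does not specify an encoding.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
)

// hashValue returns the hex-encoded SHA-1 digest of v.
// (Hypothetical helper; the hex encoding is an assumption.)
func hashValue(v string) string {
	sum := sha1.Sum([]byte(v))
	return hex.EncodeToString(sum[:])
}
```

Because identical inputs always produce identical digests, two log records carrying the same token end up with the same attribute value, which is what makes correlation possible without exposing the token itself.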

Describe alternatives you've considered

Using the transform processor.
It is more complicated, and it only works because of a bug-feature in the replace_all_patterns OTTL function, for example:

...
statements:
# according to the docs this shouldn't work (it should replace keys, not values), but in practice it does (v0.103)
  - replace_all_patterns(attributes["request_args"]["query"], "key", ".*token.*", "redacted", SHA1, "redacted %s") where IsMap(attributes["request_args"]["query"])
  - replace_all_patterns(attributes["request_args"]["query"], "value", "^redacted.*", "<redacted>") where IsMap(attributes["request_args"]["query"])
# yes, this is a nested map, but it's merged into the attributes root later
...

I also believe this would run faster as dedicated functionality in the redaction processor than as a statement pipeline in the transform processor.

Additional context

No response
