Skip to content

Ingest Script Processor can leak mutable parameter references #91977

@joegallo

Description

@joegallo

Consider a pipeline like the following:

PUT _ingest/pipeline/test-pipeline-1
{
  "processors" : [
    {
      "script": {
        "lang": "painless",
        "params": {
          "list": [1,2,3,4]
        },
        "source": "ctx.list = params.get('list');"
      }
    },
    {
      "append": {
        "field": "list",
        "value": 10
      }
    }
  ]
}

And then repeatedly invoke the _simulate API against that pipeline:

POST /_ingest/pipeline/test-pipeline-1/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "message": "hello"
      }
    }
  ]
}

Each invocation will result in growing a new '10' because the script is leaking a reference to the underlying mutable 'list' parameter:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_id" : "id",
        "_version" : "-3",
        "_source" : {
          "message" : "hello",
          "list" : [
            1,
            2,
            3,
            4,
            10,
            10,
            10,
            10,
            10,
            10,
            10 # weeeeeeeeeeeeeeeeeeeeeee
          ]
        },
        "_ingest" : {
          "timestamp" : "2022-11-28T17:46:35.03053Z"
        }
      }
    }
  ]
}

Fixing this in the calling code is relatively easy, the pipeline should be rewritten as:

PUT _ingest/pipeline/test-pipeline-1
{
  "processors" : [
    {
      "script": {
        "lang": "painless",
        "params": {
          "list": [1,2,3,4]
        },
        "source": "ctx.list = new ArrayList(params.get('list'));"
      }
    },
    {
      "append": {
        "field": "list",
        "value": 10
      }
    }
  ]
}

Now the list parameter is defensively copied and so we're no longer banging on an unintentionally static object.


Note: the params object itself is unmodifiable, you can't invent/inject a new parameter at runtime via script, however it's only shallowly immutable -- the tree of maps and lists that it represents is made of mutable maps and lists.

IMHO we should update ScriptProcessor to either return a shared deeply immutable param tree, or we should duplicate the params tree on each invocation in which case it's still mutable but no longer shared.

The current behavior is harmless enough when ingest pipelines merely leak the mutable reference, but it can cause rare and hard to track down race-condition bugs when the pipelines actually do mutate the shared mutable reference.

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions