
Improve flexibility over index.required_pipeline #49247

@roncohen


Describe the feature:

Following up on #46847, there are a couple of cases where we want to ensure that a specific pipeline runs on every document ingested into an index. For example, you may want to set the event.ingested timestamp or ensure that the name of the API key used is present in the document.

At the same time, we want to give users the flexibility they currently have to use a pipeline of their choosing to process the incoming data. We have index.required_pipeline, but it doesn't come with the flexibility we'd like.

@skearns64 suggested:

Sounds like we need an "append" pipeline, or an option for the required pipeline to "run first" or "run last"

If "append pipeline" means that Elasticsearch automatically runs the append pipeline on every indexed document after the pipeline specified in the request has run, then that option would solve the use cases I'm familiar with.

I've not heard a compelling use case for "run first", but one could exist.
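To make the proposed ordering concrete, here is a minimal sketch in plain Python that simulates it. Nothing here is an Elasticsearch API: `ingest`, `lowercase_message`, and `set_ingested` are hypothetical names, the timestamp is a placeholder, and the rule that the append pipeline also runs after a default pipeline is an assumption, since that interaction is still an open question.

```python
from typing import Callable, Optional

Doc = dict
Pipeline = Callable[[Doc], Doc]


def lowercase_message(doc: Doc) -> Doc:
    # Hypothetical pipeline chosen by the user on the index request.
    return {**doc, "message": doc["message"].lower()}


def set_ingested(doc: Doc) -> Doc:
    # Hypothetical append pipeline; the timestamp is a fixed placeholder.
    return {**doc, "event.ingested": "2019-11-18T00:00:00Z"}


def ingest(
    doc: Doc,
    request_pipeline: Optional[Pipeline] = None,
    default_pipeline: Optional[Pipeline] = None,
    append_pipeline: Optional[Pipeline] = None,
) -> Doc:
    # A pipeline on the request overrides the index default.
    first = request_pipeline or default_pipeline
    if first is not None:
        doc = first(doc)
    # Assumption: the append pipeline always runs last and cannot be skipped.
    if append_pipeline is not None:
        doc = append_pipeline(doc)
    return doc


result = ingest(
    {"message": "HELLO"},
    request_pipeline=lowercase_message,
    append_pipeline=set_ingested,
)
print(result)
```

The user keeps control over the first step, while the index owner still gets a guaranteed final step.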

Some questions that come to mind:

  • Does "append pipeline" let users specify a list of pipelines to append, or only a single pipeline? You can achieve the same result by combining pipelines, but I can imagine a list being convenient.
  • How will it work with index.default_pipeline and index.required_pipeline?
  • I don't know of any use case for index.required_pipeline that index.append_pipeline does not solve, but that could be due to lack of context on my part.
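On the first question, "combining pipelines" can be done today with the existing `pipeline` processor, which calls another pipeline by name. As a rough sketch, the Python below builds the body of a wrapper pipeline that chains several pipelines. The helper `combined_pipeline` and the pipeline names are hypothetical, and the body would still need to be sent to the put pipeline API.

```python
import json


def combined_pipeline(names, description="Runs several pipelines in order"):
    # Each entry is a `pipeline` processor that delegates to a named pipeline.
    return {
        "description": description,
        "processors": [{"pipeline": {"name": name}} for name in names],
    }


# Hypothetical pipeline names, used only for illustration.
body = combined_pipeline(["set-event-ingested", "add-api-key-name"])
print(json.dumps(body, indent=2))
```

A list-valued setting would avoid having to maintain a wrapper pipeline like this for each index.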

cc @ruflin @webmat @clintongormley @jasontedor @bytebilly

(first Elasticsearch issue! 🎉 )
