Describe the feature:
Following up on #46847, there are a couple of cases where we want to ensure that a specific pipeline is run on every document ingested into an index. For example, you may want to set the event.ingested timestamp or ensure that the name of the API Key used is present in the document.
At the same time, we want to give users the flexibility they currently have to use a pipeline of their choosing to process the incoming data. We have index.required_pipeline, but it doesn't come with the flexibility we'd like.
@skearns64 suggested:
Sounds like we need an "append" pipeline, or an option to required to be "run first or run last"
If "append pipeline" means that Elasticsearch will automatically run the "append pipeline" on every indexed document after the pipeline specified with the request has been run, it sounds like the "append pipeline" option would solve the use-cases I'm familiar with.
I've not heard a compelling use case for "run first", but they could exist.
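To make the ordering I have in mind concrete, here is a minimal Python sketch of the "run last" behaviour. It is not Elasticsearch code: the `ingest` function, the two toy pipelines, and the parameter names are hypothetical and only model the proposal, and it assumes that a pipeline given on the request takes precedence over the index default.

```python
from typing import Callable, Dict, Optional

Doc = Dict[str, object]
Pipeline = Callable[[Doc], Doc]


def uppercase_message(doc: Doc) -> Doc:
    # Hypothetical user-chosen pipeline: upper-cases the message field.
    out = dict(doc)
    out["message"] = str(out.get("message", "")).upper()
    return out


def set_ingested(doc: Doc) -> Doc:
    # Hypothetical "append" pipeline: stamps event.ingested.
    # A fixed timestamp is used here purely for illustration.
    out = dict(doc)
    out["event.ingested"] = "2020-01-01T00:00:00Z"
    return out


def ingest(
    doc: Doc,
    request_pipeline: Optional[Pipeline] = None,
    default_pipeline: Optional[Pipeline] = None,
    append_pipeline: Optional[Pipeline] = None,
) -> Doc:
    # Assumption: the request pipeline wins over the index default.
    chosen = request_pipeline if request_pipeline is not None else default_pipeline
    if chosen is not None:
        doc = chosen(doc)
    # The proposed behaviour: the append pipeline always runs, and runs last,
    # so whatever it sets cannot be overridden by the earlier pipeline.
    if append_pipeline is not None:
        doc = append_pipeline(doc)
    return doc


result = ingest(
    {"message": "hi"},
    request_pipeline=uppercase_message,
    append_pipeline=set_ingested,
)
```

In this sketch the user keeps full control over which pipeline processes the document, while the append step still gets the final say on fields such as event.ingested.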
Some questions that come to mind:
- Does "append pipeline" let users specify a list of pipelines to append, or only a single pipeline? You can achieve the same functionality by combining pipelines, but I can imagine it would be convenient to be able to specify a list.
- How will it work with index.default_pipeline and index.required_pipeline?
- I don't know that index.required_pipeline has any use case that index.append_pipeline does not solve, but that could be due to lack of context on my part.
cc @ruflin @webmat @clintongormley @jasontedor @bytebilly
(first Elasticsearch issue! 🎉 )