Description
Coming from this discussion: https://discuss.elastic.co/t/how-to-control-the-indexed-chars-value-on-a-ingest-attachment-pipeline/123073/4
Today we support a global indexed_chars processor parameter. But in some cases, users would like to set this limit per document.
This used to be supported in the mapper-attachments plugin, which extracted the limit value from a meta field in the document sent for indexing.
Here is my proposal.
We should add an option to read this limit from the document itself, through a setting such as indexed_chars_field.
Then we could do something like:
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars_field" : "size"
      }
    }
  ]
}
Then index either:
PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}
This will use the default value (or the one defined by indexed_chars), because the document has no size field.
Or:
PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
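To make the intended precedence explicit, here is a minimal Python sketch of how the limit could be resolved for each document. It is not the plugin's actual code: the function names resolve_indexed_chars and truncate are hypothetical, and the fallback rule (per-document field first, then indexed_chars, then the default) is my reading of the proposal. The 100000 default and the -1 "no limit" convention are those of the existing indexed_chars parameter.

```python
# Default of the existing indexed_chars processor parameter.
DEFAULT_INDEXED_CHARS = 100_000


def resolve_indexed_chars(document, indexed_chars=DEFAULT_INDEXED_CHARS,
                          indexed_chars_field=None):
    """Return the character limit to apply to one document (hypothetical helper).

    Assumed precedence: the value stored in the document under
    `indexed_chars_field`, if present, otherwise the processor-level
    `indexed_chars`.
    """
    if indexed_chars_field is not None:
        value = document.get(indexed_chars_field)
        if value is not None:
            return int(value)
    return indexed_chars


def truncate(text, limit):
    """Keep at most `limit` characters; a negative limit means no limit."""
    return text if limit < 0 else text[:limit]


# Document 1 has no "size" field, so the processor-level value applies.
doc1 = {"data": "BASE64"}
# Document 2 carries its own limit.
doc2 = {"data": "BASE64", "size": 1000}

limit1 = resolve_indexed_chars(doc1, indexed_chars_field="size")
limit2 = resolve_indexed_chars(doc2, indexed_chars_field="size")
```

With the two example documents above, document 1 falls back to the processor-level limit and document 2 is limited to 1000 characters.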
I hope to propose a PR for this soon, unless someone rejects this feature request or proposes another implementation in the meantime.