Enrich processor configuration changes #45466
* Renamed the `enrich_key` option to `field`.
* Replaced the `set_from` and `targets` options with `target_field`. The `target_field` option behaves differently from how `set_from` and `targets` worked: it is the field that will contain the looked-up document.

Relates to elastic#32789
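To make the new `target_field` semantics concrete, here is a minimal sketch in plain Java. It assumes the ingest document and the looked-up enrich document are both modelled as `Map<String, Object>` and that dotted field paths are not involved. `TargetFieldSketch` and `applyEnrichment` are hypothetical names for illustration, not the processor's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class TargetFieldSketch {

    // Hypothetical helper: stores the entire looked-up document under targetField,
    // mirroring the description that target_field "will contain the looked up document".
    static Map<String, Object> applyEnrichment(Map<String, Object> ingestDoc,
                                               String targetField,
                                               Map<String, Object> enrichDoc) {
        Map<String, Object> result = new HashMap<>(ingestDoc);
        // Copy so that later changes to the enrich document do not leak into the ingest document.
        result.put(targetField, new HashMap<>(enrichDoc));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("email", "mardy@example.com");

        Map<String, Object> userDoc = new HashMap<>();
        userDoc.put("email", "mardy@example.com");
        userDoc.put("first_name", "Mardy");

        Map<String, Object> enriched = applyEnrichment(doc, "user", userDoc);
        System.out.println(enriched.get("user"));
    }
}
```

With this model, every field of the enrich document ends up under `user`, which is why the size of the enrich documents directly affects the size of the enriched documents.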
Pinging @elastic/es-core-features

@elasticmachine run elasticsearch-ci/1
jakelandis left a comment
Looking good. A couple of minor suggestions.
Also, I just realized that this updated approach encourages small enrich data sets, since otherwise, by default, a lot of data gets added to your docs. (IMO this is a good thing!)
  | Name | Required | Default | Description |
  | `policy_name` | yes | - | The name of the enrich policy to use. |
- | `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup. |
+ | `field` | yes | - | The field to get the value from for the enrich lookup. |
How about changing:
The field to get the value from for the enrich lookup.
to:
The field in the input document that matches the policy's `match_field` used to retrieve the enrichment data.
Also, we could default this to the `match_field` if we wanted to.
  | `policy_name` | yes | - | The name of the enrich policy to use. |
- | `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup. |
+ | `field` | yes | - | The field to get the value from for the enrich lookup. |
+ | `target_field` | yes | - | The field that will hold the content of the looked up document as json object. |
How about changing:
The field that will hold the content of the looked up document as json object.
to:
The field that will be used for the enrichment data.
- | `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup. |
+ | `field` | yes | - | The field to get the value from for the enrich lookup. |
+ | `target_field` | yes | - | The field that will hold the content of the looked up document as json object. |
  | `ignore_missing` | no | `false` | If `true` and `enrich_key` does not exist, the processor quietly exits without modifying the document |
`enrich_key` -> `field`
Also, should the `false` be in single quotes?
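A minimal sketch of the `ignore_missing` check that this table row describes, using the same `Map`-based document model. Throwing when the field is absent and `ignore_missing` is `false` is an assumption about the default behaviour (the row only documents the `true` case), and `IgnoreMissingSketch`/`shouldSkip` are hypothetical names.

```java
import java.util.Map;

public class IgnoreMissingSketch {

    // Returns true when the processor should exit without touching the document.
    // Assumption: a missing field with ignoreMissing == false is treated as an error.
    static boolean shouldSkip(Map<String, Object> ingestDoc, String field, boolean ignoreMissing) {
        if (ingestDoc.containsKey(field)) {
            return false; // field present: proceed with the enrich lookup
        }
        if (ignoreMissing) {
            return true; // quietly exit, document unchanged
        }
        throw new IllegalArgumentException("field [" + field + "] not present in document");
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("host", "web-1");
        System.out.println(shouldSkip(doc, "email", true)); // true
        System.out.println(shouldSkip(doc, "host", false)); // false
    }
}
```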
- public void testIngestDataWithEnrichProcessor() {
+ public void qtestIngestDataWithEnrichProcessor() {
`qtestIngestDataWithEnrichProcessor`? Is the `q` prefix intentional?
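The concern with the `q` prefix is test discovery. Assuming the test runner follows the JUnit 3 naming convention (public, non-static, `void`, no-argument methods whose names start with `test`), a method renamed this way silently stops running. A minimal reflection-based sketch of that rule; `DiscoverySketch` and `SampleIT` are hypothetical classes, not the real test runner.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DiscoverySketch {

    // Hypothetical test class mirroring the renamed method in the diff.
    public static class SampleIT {
        public void testIndexDocuments() {}
        public void qtestIngestDataWithEnrichProcessor() {}
    }

    // JUnit 3 style rule: public, non-static, void, no-arg, name starts with "test".
    static List<String> discoverTests(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getMethods()) {
            int mods = m.getModifiers();
            if (m.getName().startsWith("test")
                    && Modifier.isPublic(mods)
                    && !Modifier.isStatic(mods)
                    && m.getReturnType() == void.class
                    && m.getParameterCount() == 0) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Only testIndexDocuments is picked up; the q-prefixed method is skipped.
        System.out.println(discoverTests(SampleIT.class)); // [testIndexDocuments]
    }
}
```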
  GetResponse getResponse = client().get(new GetRequest("my-index", Integer.toString(i))).actionGet();
- Map<String, Object> source = getResponse.getSourceAsMap();
- assertThat(source.size(), equalTo(1 + DECORATE_FIELDS.length));
+ Map<?, ?> source = (Map<?, ?>) getResponse.getSourceAsMap().get("user");
Curious: why `<String, Object>` -> `<?, ?>`?
To avoid an unchecked cast; otherwise we would need to add a `@SuppressWarnings` annotation.
(The map here is not the document source itself, but the content under the `user` key.)
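The trade-off in this reply can be reproduced outside the test. The nested value under `user` comes back as a plain `Object`. Casting it to `Map<?, ?>` compiles without warnings, whereas casting it to `Map<String, Object>` is an unchecked cast (the type arguments cannot be verified at runtime because of erasure) that the compiler flags unless suppressed. In this sketch, `source` stands in for the result of `getSourceAsMap()`, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class WildcardCastSketch {

    // No warning: a cast to a wildcard type only checks that the value is a Map.
    static Object userName(Map<String, Object> source) {
        Map<?, ?> user = (Map<?, ?>) source.get("user");
        return user.get("name");
    }

    // Casting to Map<String, Object> cannot be verified at runtime,
    // so the compiler reports an unchecked cast unless it is suppressed.
    @SuppressWarnings("unchecked")
    static Object userNameUnchecked(Map<String, Object> source) {
        Map<String, Object> user = (Map<String, Object>) source.get("user");
        return user.get("name");
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("name", "mardy");
        Map<String, Object> source = new HashMap<>();
        source.put("user", user);

        System.out.println(userName(source));          // mardy
        System.out.println(userNameUnchecked(source)); // mardy
    }
}
```

Since the test only reads from the nested map, `Map<?, ?>` loses nothing and keeps the code warning-free.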
jbaiera left a comment
I just had some corner cases I wanted to air, and some clarifications on using `matchField` vs. `field` in the processor.
  | `policy_name` | yes | - | The name of the enrich policy to use. |
- | `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup. |
  | `ignore_missing` | no | `false` | If `true` and `enrich_key` does not exist, the processor quietly exits without modifying the document |
+ | `field` | yes | - | TThe field in the input document that matches the policies match_field used to retrieve the enrichment data. |
Proofreading: TThe field -> The field
- TermQueryBuilder termQuery = new TermQueryBuilder(enrichKey, value);
+ TermQueryBuilder termQuery = new TermQueryBuilder(field, value);
Should this be `matchField`?
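The question turns on which name belongs on which side of the lookup. A minimal sketch, assuming `field` names where the value is read in the incoming document while `matchField` names the policy's match field in the enrich documents. An in-memory list stands in for the term query, and `LookupSketch`, `findMatch`, and the sample field names (`source_ip`, `ip`, `owner`) are hypothetical. When the two names differ, querying the enrich documents by `field` finds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LookupSketch {

    // Hypothetical lookup: the value comes from the ingest doc's `field`,
    // but the match is evaluated against `matchField` in each enrich document.
    static Map<String, Object> findMatch(Map<String, Object> ingestDoc, String field,
                                         List<Map<String, Object>> enrichDocs, String matchField) {
        Object value = ingestDoc.get(field);
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> candidate : enrichDocs) {
            if (value != null && value.equals(candidate.get(matchField))) {
                hits.add(candidate);
            }
        }
        if (hits.size() > 1) {
            throw new IllegalStateException("more than one doc id matching for [" + matchField + "]");
        }
        return hits.isEmpty() ? null : hits.get(0);
    }

    public static void main(String[] args) {
        // Incoming documents carry "source_ip"; the enrich documents are keyed on "ip".
        Map<String, Object> ingestDoc = Map.of("source_ip", "10.0.0.1");
        List<Map<String, Object>> enrichDocs = List.of(
                Map.of("ip", "10.0.0.1", "owner", "team-a"),
                Map.of("ip", "10.0.0.2", "owner", "team-b"));

        // Matching on matchField finds the team-a document.
        System.out.println(findMatch(ingestDoc, "source_ip", enrichDocs, "ip").get("owner")); // team-a
        // Querying the enrich docs by the ingest-side name finds nothing.
        System.out.println(findMatch(ingestDoc, "source_ip", enrichDocs, "source_ip")); // null
    }
}
```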
  searchBuilder.size(1);
  searchBuilder.trackScores(false);
- searchBuilder.fetchSource(specifications.stream().map(s -> s.sourceField).toArray(String[]::new), null);
+ searchBuilder.fetchSource(null, matchField);
Do we want to always exclude the `matchField` from the result? I can see a case where the field used for the lookup is stored in `user.id` and `target_field` is set to `user`. In that case, the `id` field would be thrown away when the processor overwrites the target field.
Granted, in that case the user should specify that `id` field as the `match_key` AND as an `enrich_value` if they want it to appear in the target field after it is replaced. We should probably pass in the list of enrich values directly and use that, instead of assuming that the id would never be wanted in the final data. It's a small edge case, but it might be important for enriching ECS-structured data.
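The edge case can be reproduced with plain maps. This hypothetical sketch assumes the processor replaces `target_field` wholesale: the document stores its lookup key at `user.id`, `target_field` is `user`, and the enrich document is copied into `user` either with or without the match field excluded. `OverwriteSketch` and `enrich` are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

public class OverwriteSketch {

    // Replaces targetField with a copy of the enrich document,
    // optionally dropping matchField (the behaviour questioned in review).
    static Map<String, Object> enrich(Map<String, Object> doc, String targetField,
                                      Map<String, Object> enrichDoc, String matchField,
                                      boolean excludeMatchField) {
        Map<String, Object> copy = new HashMap<>(enrichDoc);
        if (excludeMatchField) {
            copy.remove(matchField);
        }
        Map<String, Object> result = new HashMap<>(doc);
        result.put(targetField, copy); // the previous value of targetField is discarded
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("id", "42");
        Map<String, Object> doc = new HashMap<>();
        doc.put("user", user); // lookup key lives at user.id

        Map<String, Object> enrichDoc = new HashMap<>();
        enrichDoc.put("id", "42");
        enrichDoc.put("name", "mardy");

        Map<?, ?> excluded = (Map<?, ?>) enrich(doc, "user", enrichDoc, "id", true).get("user");
        Map<?, ?> kept = (Map<?, ?>) enrich(doc, "user", enrichDoc, "id", false).get("user");
        System.out.println(excluded.containsKey("id")); // false: user.id was lost
        System.out.println(kept.containsKey("id"));     // true
    }
}
```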
I see; then let's not filter out the match field. If filtering out the match field is required, this can be done by another processor configured after the enrich processor.
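The suggested split of responsibilities, where the enrich step keeps the match field and a later processor removes it when needed, can be sketched as an ordered list of document transforms. The `UnaryOperator`-based pipeline and both step implementations in `PipelineSketch` are hypothetical stand-ins for ingest processors, not their real implementations.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class PipelineSketch {

    // Stand-in for the enrich processor: copies the whole enrich doc under "user".
    static UnaryOperator<Map<String, Object>> enrichStep(Map<String, Object> enrichDoc) {
        return doc -> {
            Map<String, Object> result = new HashMap<>(doc);
            result.put("user", new HashMap<>(enrichDoc));
            return result;
        };
    }

    // Stand-in for a follow-up processor that removes one key from a nested object.
    static UnaryOperator<Map<String, Object>> removeNestedStep(String parent, String child) {
        return doc -> {
            Map<String, Object> result = new HashMap<>(doc);
            Object nested = result.get(parent);
            if (nested instanceof Map) {
                Map<Object, Object> copy = new HashMap<>((Map<?, ?>) nested);
                copy.remove(child);
                result.put(parent, copy);
            }
            return result;
        };
    }

    // Runs each step in order, feeding the output of one step into the next.
    static Map<String, Object> run(List<UnaryOperator<Map<String, Object>>> steps, Map<String, Object> doc) {
        Map<String, Object> current = doc;
        for (UnaryOperator<Map<String, Object>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> enrichDoc = Map.of("id", "42", "name", "mardy");
        Map<String, Object> out = run(
                List.of(enrichStep(enrichDoc), removeNestedStep("user", "id")),
                new HashMap<>());
        System.out.println(out); // {user={name=mardy}}
    }
}
```

Keeping the removal in its own step means pipelines that do want the match field (such as the `user.id` case) get it by default.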
      return;
  } else if (searchHits.length > 1) {
-     handler.accept(null, new IllegalStateException("more than one doc id matching for [" + enrichKey + "]"));
+     handler.accept(null, new IllegalStateException("more than one doc id matching for [" + field + "]"));
Should this be `matchField` or `field`?
@jbaiera @jakelandis Do you want to take another look?
jbaiera left a comment
LGTM!