Skip to content

Conversation

@martijnvg
Copy link
Member

  • Renamed enrich_key option to field option.
  • Replaced set_from and targets options with target_field.

The target_field option behaves different to how set_from and
targets worked. The target_field is the field that will contain
the looked up document.

Relates to #32789

 * Renamed `enrich_key` option to `field` option.
 * Replaced `set_from` and `targets` options with `target_field`.

The `target_field` option behaves different to how `set_from` and
`targets` worked. The `target_field` is the field that will contain
the looked up document.

Relates to elastic#32789
@martijnvg martijnvg added >non-issue :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Aug 12, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@martijnvg
Copy link
Member Author

@elasticmachine run elasticsearch-ci/1

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. A couple minor suggestions.

Also, I just realized with this updated approach, it encourages small enrich data sets else by default you get a ton of data added to your docs. (IMO this is a good thing!)

| Name | Required | Default | Description
| `policy_name` | yes | - | The name of the enrich policy to use.
| `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup.
| `field` | yes | - | The field to get the value from for the enrich lookup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about :
The field to get the value from for the enrich lookup.
to
The field in the input document that matches the policies match_field used to retrieve the enrichment data

Also, we could default this to the match_field if we wanted to.

| `policy_name` | yes | - | The name of the enrich policy to use.
| `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup.
| `field` | yes | - | The field to get the value from for the enrich lookup.
| `target_field` | yes | - | The field that will hold the content of the looked up document as json object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about :
The field that will hold the content of the looked up document as json object.
to
The field that will be used for the enrichment data

| `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup.
| `field` | yes | - | The field to get the value from for the enrich lookup.
| `target_field` | yes | - | The field that will hold the content of the looked up document as json object.
| `ignore_missing` | no | `false` | If `true` and `enrich_key` does not exist, the processor quietly exits without modifying the document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrich_key -> field

also should the false be in single quotes ?

}

public void testIngestDataWithEnrichProcessor() {
public void qtestIngestDataWithEnrichProcessor() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qtestIngestDataWithEnrichProcessor ?

GetResponse getResponse = client().get(new GetRequest("my-index", Integer.toString(i))).actionGet();
Map<String, Object> source = getResponse.getSourceAsMap();
assertThat(source.size(), equalTo(1 + DECORATE_FIELDS.length));
Map<?, ?> source = (Map<?, ?>) getResponse.getSourceAsMap().get("user");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why <String,Object> --> <?,?>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid unchecked casts, otherwise we need to add a suppress annotation.
(the source is not direct source, but the content under the user key)

@martijnvg martijnvg requested a review from jakelandis August 14, 2019 09:54
Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had some corner cases that I wanted to air, and some clarifications on using matchField vs field in the processor.

| `policy_name` | yes | - | The name of the enrich policy to use.
| `enrich_key` | no | Policy enrich_key | The field to get the value from for the enrich lookup.
| `ignore_missing` | no | `false` | If `true` and `enrich_key` does not exist, the processor quietly exits without modifying the document
| `field` | yes | - | TThe field in the input document that matches the policies match_field used to retrieve the enrichment data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proofreading: TThe field -> The field

}

TermQueryBuilder termQuery = new TermQueryBuilder(enrichKey, value);
TermQueryBuilder termQuery = new TermQueryBuilder(field, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be matchField ?

searchBuilder.size(1);
searchBuilder.trackScores(false);
searchBuilder.fetchSource(specifications.stream().map(s -> s.sourceField).toArray(String[]::new), null);
searchBuilder.fetchSource(null, matchField);
Copy link
Member

@jbaiera jbaiera Aug 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to always exclude the matchField in the result? I can see a case where the field used for looking up data is stored in user.id and the target_field is set to user. In that case, the id field would be thrown away when the processor overwrites the target field.

Granted, in that case, the user should be specifying that id field as the match_key AND as an enrich_value if they want it to appear in the target field after it is replaced. We should probably pass in the list of enrich values directly and use that instead of making the assumption that the id would never be wanted in the final data added. It's a small edge case, but it might be important for enriching ECS structured data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, lets then not filter out the match field. If filtering out the match field is required then this can be done by another processor configured after the enrich processor.

return;
} else if (searchHits.length > 1) {
handler.accept(null, new IllegalStateException("more than one doc id matching for [" + enrichKey + "]"));
handler.accept(null, new IllegalStateException("more than one doc id matching for [" + field + "]"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be matchField or field?

@martijnvg martijnvg requested a review from jbaiera August 15, 2019 08:41
@martijnvg
Copy link
Member Author

@jbaiera @jakelandis Do you want to take another look?

Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@martijnvg martijnvg merged commit 2879e67 into elastic:enrich Aug 22, 2019
martijnvg added a commit that referenced this pull request Aug 22, 2019
Enrich processor configuration changes:
* Renamed `enrich_key` option to `field` option.
* Replaced `set_from` and `targets` options with `target_field`.

The `target_field` option behaves different to how `set_from` and
`targets` worked. The `target_field` is the field that will contain
the looked up document.

Relates to #32789
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >non-issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants