Allow duplicate keys in json processor #74602
Conversation
Pinging @elastic/es-core-features (Team:Core/Features)
danhermann left a comment:
> Merging instead of `putAll` is important as Filebeat sets values for `data_stream.type`, `data_stream.dataset`, and `data_stream.namespace`. If the logs override just one, such as `data_stream.dataset`, currently the whole `data_stream` namespace would be overridden.
This is a breaking change in the JSON processor's behavior that we can't add in a minor release unless it's an opt-in feature.
Also, we need some unit tests for the new `allowDuplicateKeys` option for XContent parsing.
| Name | Required | Default | Description |
| --- | --- | --- | --- |
| `field` | yes | - | The field to be parsed. |
| `target_field` | no | `field` | The field that the converted structured object will be written into. Any existing content in this field will be overwritten. |
| `add_to_root` | no | false | Flag that forces the serialized json to be injected into the top level of the document. `target_field` must not be set when this option is chosen. |
It sounds like the "any existing content in this field will be overwritten" statement will no longer necessarily be true.
The question is whether or not we want to alter the behavior for `target_field`. In my initial attempt, I tried that, but it made things quite complex. One reason is that `target_field` allows the source and target to be a scalar, whereas `add_to_root` requires the source to be a map/object. Also, today the default behavior differs a lot depending on whether `target_field` or `add_to_root` is set.
Setting `target_field` always overrides the target. When `add_to_root` is set, the fields are merged, but only on the first level. If we introduce a new merge option, it would be easier to let it apply only to `add_to_root`; however, that would make it somewhat confusing.
For now, I've considered the partial merge that applies when `add_to_root` is set to be a bug, as it's arguably quite surprising. But I see that this may lead to backward-incompatible behavior if folks rely on the current behavior, as surprising as it may be.
I see three options now:
- Leave it as is and declare the partial merging behavior a bug.
- Introduce `merge_to_root`, which only applies to `add_to_root`, leaving the existing partial merge of `add_to_root` and the overriding behavior of `target_field` intact.
- Introduce `merge`, which applies to both `target_field` and `add_to_root`. If set to `false` (the default), the existing partial merge of `add_to_root` and the overriding behavior of `target_field` remain intact.
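To make the difference concrete, here is a minimal, hypothetical sketch (the class and method names are made up for illustration and are not the code in this PR) of a recursive merge over `Map<String, Object>` values, in contrast to the shallow `Map.putAll` that `add_to_root` effectively does today: nested maps are merged level by level, and only conflicting leaf values are overridden.

```java
import java.util.Map;

public final class DeepMergeSketch {

    /**
     * Recursively merges {@code source} into {@code target}. Where both sides hold a
     * nested map under the same key, the maps are merged; otherwise the value from
     * {@code source} wins. Contrast with {@code target.putAll(source)}, which replaces
     * a conflicting nested object wholesale.
     */
    @SuppressWarnings("unchecked")
    static void deepMerge(Map<String, Object> target, Map<String, Object> source) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object existing = target.get(entry.getKey());
            Object incoming = entry.getValue();
            if (existing instanceof Map && incoming instanceof Map) {
                // Both sides are objects: descend and merge their keys.
                deepMerge((Map<String, Object>) existing, (Map<String, Object>) incoming);
            } else {
                // Scalar, array, or type mismatch: the incoming value wins.
                target.put(entry.getKey(), incoming);
            }
        }
    }
}
```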
Thanks for the explanation. I'll bring this up at our team meeting and see if we can settle on a preferred option.
Based on the outcome of the discussion in the meeting, I've added an `add_to_root_recursive_merge` flag.
Doing that uncovered that CBOR and Smile currently don't support disabling duplicate detection after creating a parser. I'm not sure if that's intentional or a bug in Jackson. It doesn't seem like it should be impossible, as the content is only parsed after the parser has been created. As we only need to disable duplicate detection for JSON, I won't go down the rabbit hole of creating a Jackson PR. Btw, we're still using the deprecated …
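For reference, here is a sketch of the underlying Jackson behavior this builds on, using Jackson directly rather than the XContent-level `allowDuplicateKeys` option added in this PR (the class name and the `log.level` sample payload are illustrative). Jackson's JSON parser leaves `STRICT_DUPLICATE_DETECTION` off by default, so the last value wins; toggling the feature on an already-created JSON parser is what works here but apparently not for CBOR and Smile.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DuplicateKeyDetectionDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"log.level\":\"INFO\",\"log.level\":\"WARN\"}";
        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();

        // Strict: with STRICT_DUPLICATE_DETECTION enabled after parser creation,
        // the parser fails on the second "log.level".
        try (JsonParser strict = factory.createParser(json)) {
            strict.enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);
            mapper.readTree(strict);
        } catch (JsonProcessingException e) {
            System.out.println("rejected: " + e.getOriginalMessage());
        }

        // Lenient: with the feature disabled (Jackson's default for JSON),
        // the last value for a duplicate key wins.
        try (JsonParser lenient = factory.createParser(json)) {
            lenient.disable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);
            JsonNode node = mapper.readTree(lenient);
            System.out.println(node.get("log.level").asText()); // prints WARN
        }
    }
}
```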
Adds support for allowing duplicate keys.
Also, if `add_to_root` is `true`, it will recursively merge the deserialized JSON object instead of overriding conflicting keys via `putAll`.

Background:
This is to support creating an ingest pipeline for ECS JSON logs.
Due to how some logging frameworks work, ECS JSON logs may contain duplicate keys. Instead of failing, the JSON parser should be more lenient and prefer the last value.
Merging instead of `putAll` is important as Filebeat sets values for `data_stream.type`, `data_stream.dataset`, and `data_stream.namespace`. If the logs override just one, such as `data_stream.dataset`, currently the whole `data_stream` namespace would be overridden.
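To illustrate that background, here is a small, self-contained sketch (the class name and the Filebeat field values are illustrative, not code from this PR) showing how a shallow `putAll` into the root replaces the whole `data_stream` object, so a log line that overrides only `data_stream.dataset` drops `data_stream.type` and `data_stream.namespace`; a recursive merge would keep them.

```java
import java.util.HashMap;
import java.util.Map;

public class ShallowMergeDemo {
    public static void main(String[] args) {
        // data_stream fields as set by Filebeat on the ingested document (illustrative values).
        Map<String, Object> dataStream = new HashMap<>();
        dataStream.put("type", "logs");
        dataStream.put("dataset", "generic");
        dataStream.put("namespace", "default");
        Map<String, Object> document = new HashMap<>();
        document.put("data_stream", dataStream);

        // Parsed ECS JSON log line that overrides only data_stream.dataset.
        Map<String, Object> parsedLogLine = new HashMap<>();
        parsedLogLine.put("data_stream", Map.of("dataset", "nginx.access"));

        // Shallow merge via putAll: the whole data_stream object is replaced,
        // losing data_stream.type and data_stream.namespace.
        document.putAll(parsedLogLine);
        System.out.println(document); // {data_stream={dataset=nginx.access}}

        // With the recursive merge proposed in this PR, data_stream.type and
        // data_stream.namespace would survive and only dataset would change.
    }
}
```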