-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add support for querying JSON fields based on key. #34621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for querying JSON fields based on key. #34621
Conversation
|
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we consider all possible splits on . to try to find one with a JSON field name on the left-hand side. I'm a little concerned about the cost of this if the search field is deeply nested. Note that we don't just incur this cost when searching on a valid keyed JSON field, but also when the field is unmapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do share your concern but I don't yet have a good idea of how to improve this. We cannot know the keys for the field mapper upfront (since they are potentially different on each document) and we want the syntax for accessing the keys to be as if they were actual fields.
One option might be to have a different separator between the json field name and the key but this means educating the user of the new syntax and potentially dealing with a collision with existing field names which have used that separator.
With the current implementation we incur the cost only if the mapping contains a json field and the cost increases with the nesting of the field being accessed. Maybe we could improve this by being able to know the maximum nesting of any of the json field names so if we haven't been able to find the json field by the time we descend to that nesting level int he field name we know we can give up? That would mean the cost is proportional to the maximum json nesting in the mapping rather than the field name we are trying to look up. It doesn't really solve this issue but maybe it helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One slight change I would make would be to search forwards rather than backwards. I think we're much more likely to encounter mappings like json_field.deeply.nested.object.that.goes.on.for.ever, with a short field name and then a long object path. Combined with Colin's suggestion of checking the maximum nesting level that should make this more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option might be to have a different separator between the json field name and the key
@colings86 I agree with your assessment -- it would be great to be able to keep the existing separator if possible.
@romseygeek I think that intuition makes sense to me (that we should design for the case that the JSON blob has more extreme nesting than the mapped json field). We limit the depth of mapping already through index.mapping.depth.limit, which defaults to 20. And the draw of a json field is that you can put arbitrary, complex data in there and not have to worry too much about its effect on performance.
My current plan is to reverse the order of the check as suggested, then experiment a bit and follow up with an optimization in another PR. In addition to your suggestion, another idea would be to store the JSON mappers in a trie.
12190dd to
de1533c
Compare
|
@colings86 @romseygeek I've removed the 'WIP' label and think that this PR is ready for another look. Although I will need to follow-up with a change or optimization as we discussed, I anticipate that we'll follow the same overall approach/ structure laid out in this PR. |
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think I'd like to see some YAML tests as well before this gets merged in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: whitespace should probably stay the same as before here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks.
89ff2a9 to
8ed75db
Compare
ed1b9e6 to
dedab89
Compare
|
@elasticmachine test this please |
8ed75db to
b05fbdf
Compare
a914ca5 to
c2c2950
Compare
c2c2950 to
7ebb8e7
Compare
|
Thanks @romseygeek, added a couple REST tests. |
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for all the iterations @jtibshirani
This PR is a bit of a 'work in progress', because I'm not entirely happy with how the field lookup logic turned out. It would be great to get your thoughts.