@jtibshirani
Contributor

This PR is a bit of a 'work in progress', because I'm not entirely happy with how the field lookup logic turned out. It would be great to get your thoughts.

@jtibshirani added the >feature, WIP, and :Search Foundations/Mapping labels Oct 18, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

Contributor Author

@jtibshirani Oct 18, 2018

Here we consider all possible splits on "." to try to find one with a JSON field name on the left-hand side. I'm a little concerned about the cost of this if the search field is deeply nested. Note that we incur this cost not only when searching on a valid keyed JSON field, but also when the field is unmapped.
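
For illustration, the lookup described here can be sketched roughly as follows. This is a minimal Python sketch, not the Java code in this PR: `resolve_keyed_field` and the `json_fields` set are hypothetical names, and trying the longest prefix first is an assumption about the order of the check.

```python
from typing import Optional, Set, Tuple


def resolve_keyed_field(path: str, json_fields: Set[str]) -> Optional[Tuple[str, str]]:
    """Try every split of `path` on '.', longest candidate prefix first.

    Returns (json_field_name, key) if some prefix is a mapped JSON field,
    otherwise None. `json_fields` is a hypothetical set of mapped JSON
    field names.
    """
    parts = path.split(".")
    # A keyed lookup needs at least one segment on each side of the split,
    # so the split index runs from len(parts) - 1 down to 1.
    for i in range(len(parts) - 1, 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in json_fields:
            return prefix, ".".join(parts[i:])
    # An unmapped field still walks every split before giving up.
    return None
```

The number of candidate prefixes grows with the number of dots in the searched field, which is the cost in question: an unmapped path with many segments is checked at every split before the lookup returns nothing.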

Contributor

I do share your concern but I don't yet have a good idea of how to improve this. We cannot know the keys for the field mapper upfront (since they are potentially different on each document) and we want the syntax for accessing the keys to be as if they were actual fields.

One option might be to have a different separator between the json field name and the key but this means educating the user of the new syntax and potentially dealing with a collision with existing field names which have used that separator.

With the current implementation we incur the cost only if the mapping contains a json field, and the cost increases with the nesting of the field being accessed. Maybe we could improve this by tracking the maximum nesting of any of the json field names, so that if we haven't found the json field by the time we descend to that nesting level in the field name, we know we can give up? That would make the cost proportional to the maximum json nesting in the mapping rather than to the depth of the field name we are trying to look up. It doesn't really solve this issue but maybe it helps?

Contributor

One slight change I would make would be to search forwards rather than backwards. I think we're much more likely to encounter mappings like json_field.deeply.nested.object.that.goes.on.for.ever, with a short field name and then a long object path. Combined with Colin's suggestion of checking the maximum nesting level that should make this more efficient.
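
A rough sketch of how the two suggestions might combine, in Python rather than the PR's Java. The names `max_json_depth`, `resolve_forward`, and `json_fields` are hypothetical, and the sketch assumes a mapped JSON field name is never a prefix of another mapped JSON field name, so returning the first (shortest) match is safe.

```python
from typing import Optional, Set, Tuple


def max_json_depth(json_fields: Set[str]) -> int:
    """Largest number of dot-separated segments in any JSON field name."""
    return max((name.count(".") + 1 for name in json_fields), default=0)


def resolve_forward(path: str, json_fields: Set[str], max_depth: int) -> Optional[Tuple[str, str]]:
    """Check prefixes shortest first and stop once `max_depth` segments are used."""
    parts = path.split(".")
    # No JSON field name has more than max_depth segments, and at least one
    # segment must remain on the right-hand side to serve as the key.
    limit = min(max_depth, len(parts) - 1)
    prefix = ""
    for i in range(limit):
        prefix = parts[i] if i == 0 else prefix + "." + parts[i]
        if prefix in json_fields:
            return prefix, ".".join(parts[i + 1:])
    return None
```

With this shape, a path such as json_field.deeply.nested.object.that.goes.on.for.ever matches on the first iteration, and an unmapped path is abandoned after at most `max_depth` checks regardless of how long it is.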

Contributor Author

> One option might be to have a different separator between the json field name and the key

@colings86 I agree with your assessment -- it would be great to be able to keep the existing separator if possible.

@romseygeek That intuition makes sense to me (that we should design for the case where the JSON blob has more extreme nesting than the mapped json field). We already limit the depth of mappings through index.mapping.depth.limit, which defaults to 20. And the draw of a json field is that you can put arbitrary, complex data in there and not have to worry too much about its effect on performance.

My current plan is to reverse the order of the check as suggested, then experiment a bit and follow up with an optimization in another PR. In addition to your suggestion, another idea would be to store the JSON mappers in a trie.
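
To make the trie idea concrete, here is a minimal Python sketch, not a proposal for the actual Java implementation. `FieldTrie`, `insert`, and `resolve` are hypothetical names; the trie is keyed on dot-separated path segments.

```python
from typing import Dict, Optional, Tuple


class FieldTrie:
    """Trie over dot-separated segments of mapped JSON field names."""

    def __init__(self) -> None:
        self.children: Dict[str, "FieldTrie"] = {}
        self.is_json_field = False

    def insert(self, field_name: str) -> None:
        """Register a mapped JSON field name such as 'obj.inner.json'."""
        node = self
        for segment in field_name.split("."):
            node = node.children.setdefault(segment, FieldTrie())
        node.is_json_field = True

    def resolve(self, path: str) -> Optional[Tuple[str, str]]:
        """Walk `path` segment by segment; return (json_field, key) or None."""
        parts = path.split(".")
        node = self
        # The last segment is never consumed as part of the field name,
        # because a keyed lookup needs a non-empty key.
        for i, segment in enumerate(parts[:-1]):
            child = node.children.get(segment)
            if child is None:
                # No mapped JSON field starts with this prefix: stop early.
                return None
            node = child
            if node.is_json_field:
                return ".".join(parts[: i + 1]), ".".join(parts[i + 1:])
        return None
```

The walk stops at the first segment that no JSON field name shares, so an unmapped path usually fails after one lookup, and no prefix strings are built unless a match is found.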

@jtibshirani force-pushed the keyed-json-field branch 2 times, most recently from 12190dd to de1533c Compare October 19, 2018 00:49
@jtibshirani removed the WIP label Oct 22, 2018
@jtibshirani
Contributor Author

jtibshirani commented Oct 22, 2018

@colings86 @romseygeek I've removed the 'WIP' label and think that this PR is ready for another look. Although I will need to follow up with a change or optimization as we discussed, I anticipate that we'll follow the same overall approach/structure laid out in this PR.

Contributor

@romseygeek left a comment

Looks good. I think I'd like to see some YAML tests as well before this gets merged in?

Contributor

nit: whitespace should probably stay the same as before here

Contributor Author

Oops, thanks.

@jtibshirani force-pushed the object-fields branch 3 times, most recently from 89ff2a9 to 8ed75db Compare October 26, 2018 06:27
@jtibshirani force-pushed the keyed-json-field branch 2 times, most recently from ed1b9e6 to dedab89 Compare October 26, 2018 06:55
@jtibshirani
Contributor Author

@elasticmachine test this please

@jtibshirani force-pushed the keyed-json-field branch 2 times, most recently from a914ca5 to c2c2950 Compare October 26, 2018 19:24
@jtibshirani
Contributor Author

Thanks @romseygeek, I added a couple of REST tests.

Contributor

@romseygeek left a comment

LGTM! Thanks for all the iterations @jtibshirani

@jtibshirani merged this pull request into elastic:object-fields Oct 29, 2018
@jtibshirani deleted the keyed-json-field branch October 29, 2018 17:25
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Nov 8, 2018
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Nov 9, 2018
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Mar 19, 2019
Labels

>feature, :Search Foundations/Mapping

4 participants