When parsing JSON fields, also create tokens prefixed with the field key. #34207
Conversation
Pinging @elastic/es-search-aggs
The null character \0 seemed like a reasonable choice for a separator, as (1) it shouldn’t show up too often in field keys, and (2) there is already precedent for it, as we use it when storing percolator queries (PercolatorFieldMapper#FIELD_VALUE_SEPARATOR).
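For concreteness, a minimal sketch of the token shape this separator produces; the `keyedToken` helper name is made up for illustration and is not the PR's actual code:

```java
// Keyed tokens take the form "<key>\0<value>"; '\0' is unlikely to occur in
// field keys and mirrors the precedent of PercolatorFieldMapper#FIELD_VALUE_SEPARATOR.
static String keyedToken(String key, String value) {
    return key + '\0' + value;
}
```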
For prefixed values, the alternative option here would be to check the whole length of the prefixed token, as opposed to just the value. I think that this behavior is more intuitive (and I also don't think we're as concerned about field keys being really long, as opposed to values?)
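A tiny sketch contrasting the two options being discussed, assuming an `ignore_above`-style length limit; the class and method names are made up for illustration:

```java
public class IgnoreAboveSketch {

    // Option A: apply the ignore_above limit to the value alone.
    static boolean skipByValueLength(String key, String value, int ignoreAbove) {
        return value.length() > ignoreAbove;
    }

    // Option B (preferred in this comment): apply the limit to the whole
    // "<key>\0<value>" token, so very long keys also count against the limit.
    static boolean skipByKeyedTokenLength(String key, String value, int ignoreAbove) {
        return key.length() + 1 + value.length() > ignoreAbove;
    }
}
```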
I agree, we should probably put in some kind of soft limit on the depth of these objects at some point; `ignore_above` plus that soft limit will give us an upper bound on the term lengths here anyway.
Makes sense to me, I'll make a note on the meta-issue to add a limit.
colings86 left a comment
I left some comments but they are very minor
Maybe we should call this `rootFieldName` or something, so it's clear everywhere exactly which field is being used?
👍
Should we throw an exception in an else block here? If we encounter something like an array of arrays we should probably reject the document rather than silently ignoring it? The same probably applies for parseObject above?
Oops, I think this is actually a bug, since an array of arrays is valid JSON and should be accepted. Will fix.
I agree it's a good idea to add an else with an exception, so we'll fail fast when encountering something unexpected rather than attempting to proceed in a potentially wrong state.
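A self-contained sketch of the fail-fast idea, using Jackson's `JsonParser` directly to keep it runnable; the PR itself works against Elasticsearch's own parsing abstractions, and `FailFastParsingSketch`, `parseObject`, and `parseArray` are illustrative names rather than the PR's code:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FailFastParsingSketch {

    // Collects leaf values from an object, throwing on any token we don't handle.
    static void parseObject(JsonParser parser, List<String> values) throws IOException {
        for (JsonToken token = parser.nextToken(); token != JsonToken.END_OBJECT; token = parser.nextToken()) {
            if (token == JsonToken.FIELD_NAME) {
                continue;                              // the value arrives on the next token
            } else if (token == JsonToken.START_OBJECT) {
                parseObject(parser, values);           // nested object
            } else if (token == JsonToken.START_ARRAY) {
                parseArray(parser, values);            // nested array
            } else if (token.isScalarValue()) {
                values.add(parser.getText());          // leaf value
            } else {
                // Reject the document rather than silently ignoring the token.
                throw new IllegalStateException("Unexpected token in object: " + token);
            }
        }
    }

    // Same idea for arrays: an array of arrays is valid JSON and simply recurses.
    static void parseArray(JsonParser parser, List<String> values) throws IOException {
        for (JsonToken token = parser.nextToken(); token != JsonToken.END_ARRAY; token = parser.nextToken()) {
            if (token == JsonToken.START_OBJECT) {
                parseObject(parser, values);
            } else if (token == JsonToken.START_ARRAY) {
                parseArray(parser, values);
            } else if (token.isScalarValue()) {
                values.add(parser.getText());
            } else {
                throw new IllegalStateException("Unexpected token in array: " + token);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"key\": \"some value\", \"list\": [[1, 2], {\"key3\": true}]}";
        List<String> values = new ArrayList<>();
        try (JsonParser parser = new JsonFactory().createParser(json)) {
            parser.nextToken();                        // advance onto START_OBJECT
            parseObject(parser, values);
        }
        System.out.println(values);                    // [some value, 1, 2, true]
    }
}
```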
romseygeek left a comment
I think this looks good. My only nit is that I don't like the name 'prefixed', as this seems to me to be an implementation detail. Something like 'keyed' might be better?
Thanks both of you for taking a look. I also like the 'keyed' naming better, will update to that.
colings86 left a comment
LGTM thanks for making the changes
romseygeek left a comment
LGTM too, thanks Julie!
This PR updates the parsing for `json` fields to emit tokens prefixed by the field key, in addition to 'root' tokens containing the unprefixed value.

As an example, given a `json` field called `json_field` and an input like `{ "key": "some value", "key2": { "key3": true } }`, the mapper will produce untokenized string fields with the values `some value`, `key\0some value`, `true`, and `key2.key3\0true`.

An important note about this change: the behavior we want is for searches on the 'root' JSON field to only consider raw values, and not prefixed tokens. For example, given the JSON field `"headers": { "content-type": "application/json" }`, it should not match the query `{ "prefix": { "headers": "content" } }` just because one of the indexed tokens is `content-type\0application/json`. I think this behavior would be confusing, and that we should try to keep the token prefixing as an internal implementation detail. To avoid this issue, I chose to add the prefixed tokens to a new Lucene field called `<field name>._prefixed`. This won't affect how searches are done (the user will still query using the key `headers`, or `headers.content-type`).

Finally, this PR just updates the indexing process -- these prefixed tokens can't be searched yet. The search side will be implemented in a subsequent PR (I'm just a fan of keeping PRs small!)
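A rough sketch of how the indexed fields for the example above could be laid out, using plain Lucene `StringField`s to illustrate the split between the root field and the `._prefixed` field; the `PrefixedFieldSketch` class and `addLeaf` helper are illustrative and not the mapper's actual code:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;

public class PrefixedFieldSketch {

    // Same non-printing separator discussed above.
    private static final char SEPARATOR = '\0';

    // Adds both the root (unprefixed) token and the keyed token for one leaf value.
    static void addLeaf(Document doc, String rootFieldName, String key, String value) {
        // Root token: raw value only, so queries on the root field never see key prefixes.
        doc.add(new StringField(rootFieldName, value, Field.Store.NO));
        // Keyed token: "<key>\0<value>", placed in a separate Lucene field.
        doc.add(new StringField(rootFieldName + "._prefixed", key + SEPARATOR + value, Field.Store.NO));
    }

    public static void main(String[] args) {
        Document doc = new Document();
        // Mirrors the example input {"key": "some value", "key2": {"key3": true}}.
        addLeaf(doc, "json_field", "key", "some value");
        addLeaf(doc, "json_field", "key2.key3", "true");
        // Note: the '\0' separator is a non-printing character in the output.
        doc.getFields().forEach(f -> System.out.println(f.name() + " -> " + f.stringValue()));
    }
}
```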
"headers": { "content-type": "application/json”"}, it should not match the query"prefix": { "headers": "content" } }just because one of the indexed tokens iscontent-type\0application/json. I think this behavior would be confusing, and that we should try to keep the token prefixing as an internal implementation detail. To avoid this issue, I chose to add the prefixed tokens to a new lucene field called<field name>._prefixed. This won't affect how searches are done (the user will still query using the keyheaders, orheaders.content-type).Finally, this PR just updates the indexing process -- these prefixed tokens can’t be searched yet. The search side will be implemented in a subsequent PR (I’m just a fan of keeping PRs small!)