Skip to content

Conversation

@mayya-sharipova
Copy link
Contributor

Add an index level setting index.mapping.nested_objects.limit to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes #26962

@mayya-sharipova mayya-sharipova added >breaking v7.0.0 :Data Management/Indices APIs APIs to create and manage indices and templates labels Nov 15, 2017
@mayya-sharipova mayya-sharipova force-pushed the limit-nested-docs-number branch 2 times, most recently from bf33198 to a5c68ee Compare November 16, 2017 21:47
@cbuescher cbuescher self-assigned this Nov 17, 2017
@cbuescher cbuescher self-requested a review November 17, 2017 11:25
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mayya-sharipova, I did a first round of review and left a couple of comments, mostly minor. The change looks good already, can you please also add test that show that the default values work as expected (e.g. intergration tests) and maybe a rest test for a small setting (like the ones in the unit test you already have)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can probably be a primitive long

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can probably be a primitive long

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is a way to get the default value from the MapperService.INDEX_MAPPING_NESTED_DOCS_LIMIT_SETTING itself, since it seems to be defined there. Would be good to not have to define the constant in two places in case it needs changing in the future. I did some initial digging but need to understand this a bit better myself I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher thanks Christoph! I agree it is strange to define the constant in two places, but so far did not find the way. I will investigate more the way to define it a single place only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, is this only called for nested documents? I see there is another ParseContext implementation with "addDoc()", I guess that one isn't used for parsing nested documents?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher Yes, this method only invoked to add nested documents. The Lucene document for the non nested parts of the document being indexed is added to the documents list in the constructor of this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: long, otherwise it gets outboxed later anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion so make this kind of xContent setup easier to read and less prone to accidental reformatting. I sometimes like to do something a little bit more verbose like this:

XContentBuilder docBuilder = XContentFactory.jsonBuilder();
docBuilder.startObject();
{
      docBuilder.startArray("nested1");
      {
           docBuilder.startObject().field("field1", "11").field("field2", "21").endObject();
           docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
      }
      docBuilder.endArray();
}
docBuilder.endObject();
SourceToParse source1 = SourceToParse.source("test1", "type", "1", docBuilder.bytes(), XContentType.JSON);

It allows to add some indentation that won't break so easily when reformatting the source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting (and a great test by the way), because I just realized that the maximum number of nested docs check is applied across all nested fields in the document. I didn't immediately get that from the desciption in #26962 or the PR so far, maybe worth adding to the migration notest and docs as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add something that this limit applies accros all nested fields in a document. (see above)

@mayya-sharipova
Copy link
Contributor Author

@cbuescher Christoph, thanks for your review. I have tried to address your comments in the 2nd commit, can you please continue/finalize the review when you have available time.
The only thing I am not sure is what you meant by "migration notest" in your comment: "maybe worth adding to the migration notest and docs as well".

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova thanks a lot, this looks very close. I left one more comment about testing the default setting and two nits, scratch my previous comment about "migration notest", I didn't see they were already there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you slightly rephrase so the version this feature was introduced is in the "reason" description (e.g. something like "the XYZ feature has been added in 7.0.0")? I think the skip "reason" is printed when running the tests and it might be useful to see the reason including the version at a glance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case where #nested_docs < limit is already covered by the later tests, could you change this to that the the document has more than the default number of nested docs (simple loop) and gets rejected? This way we can test the default value works if no explicit setting is there. I'd also check if that unit test then takes a lot more time (which I doubt, gut feeling check is okay here I think)

Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
@mayya-sharipova mayya-sharipova force-pushed the limit-nested-docs-number branch from 4984450 to e45f7bb Compare November 21, 2017 22:16
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova thanks a lot, this looks amost ready, I'm just a bit confused about the new test that checks the default setting and why it works (see line comment). Maybe you can clarify that, other than that I think this PR is ready.

docBuilder.startObject().field("field1", "11").field("field2", "21").endObject();
docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
docBuilder.startObject().field("field1", "13").field("field2", "23").endObject();
for(int i=0; i<=defaultMaxNoNestedDocs; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some whitespaces around the operators etc... would be nice

docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
docBuilder.startObject().field("field1", "13").field("field2", "23").endObject();
for(int i=0; i<=defaultMaxNoNestedDocs; i++) {
docBuilder.startObject().field("f", i).endObject();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suprised this triggers the no. fields warnings since "f" is not the field thats defined with the "nested" type above? I would have expected this to not trigger the limit but the exception tested below seems to indicate it triggers the setting. What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher Please correct me if I misunderstood your previous comment: "could you change this to that the the document has more than the default number of nested docs (simple loop) and gets rejected".
This is what this new test is about.
We have a parent nested field nested1 under which we have many json objects with a field f, and since the number of these json objects is more than the default number, we get an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I probably read the test to quickly and missed the startArray() part. You are absolutely right, this is how I expected the test. Sorry for the noise.

public void testLimitNestedDocs() throws Exception{
// setting limit to allow only two nested objects in the whole doc
long nested_docs_limit = 2L;
long maxNoNestedDocs = 2L;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just as feedback, nothing to change really, but I liked the previous variable name better ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher thanks for the feedback, I thought in the previous variable names I did not follow Java camel case standards for variable names (was more of python style)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, and you are right, camel case is preferred. I probably misread the "NoNestedDocs" part of the name as "no nested docs" and that confused me for a second, but either way is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher thanks Christoph, you are rightNoNestedDocs is confusing, I will avoid using No for "number of" in the future.

Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for bearing with me @mayya-sharipova ;-)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mayya-sharipova mayya-sharipova merged commit 57e4d10 into elastic:master Nov 22, 2017
@mayya-sharipova mayya-sharipova deleted the limit-nested-docs-number branch November 22, 2017 15:16
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jun 20, 2019
Mpdreamz pushed a commit to elastic/elasticsearch-net that referenced this pull request Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>breaking :Data Management/Indices APIs APIs to create and manage indices and templates v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants