Add a recommendation against large documents to the docs. #21652
dadoonet
left a comment
Awesome!
41e5706 to 45a41b2
Perhaps it's also worth noting that it can have a negative impact on proximity searches (per @mikemccand comment)?
Thanks @bczifra, I just added a note about it.
LGTM, thanks @jpountz!
since their content will need to be retrieved by the `_search` API to build
the response. Inverting this document can use an amount of memory that is a
multiplier of the original size of the document. Proximity search (phrase
queries for instance) and <<search-request-highlighting,highlighting>> also
Hitting the large document in the response is going to be slow in ES as well because we always go to stored fields for the id. Unless you turn off storing _source. I'm not sure if that needs to be in the list, but it feels important because as it reads now I'd think "well, I just have to avoid phrase queries and highlighting".
I'll do that!
clintongormley
left a comment
LGTM, minor style comments
[[maximum-document-size]]
=== Avoid large documents

Given that the default <<modules-http,`http.max_context_length`>> is set to
that -> than
Given that the default <<modules-http,`http.max_context_length`>> is set to
100MB, Elasticsearch will refuse to index any document that is larger that
that. You might decide to increase that particular setting, but Lucene still
at -> of
100MB, Elasticsearch will refuse to index any document that is larger that
that. You might decide to increase that particular setting, but Lucene still
has a limit at about 2GB.
But even -> Even
But even without considering hard limits, large documents are usually not
practical. Large documents put more stress on network, disk and on memory usage
since their content will need to be retrieved by the `_search` API to build
Inverting -> Indexing (or) Indexing this document into the inverted index
original document.

It is sometimes useful to reconsider what the unit of information should be.
For instance, the fact you want to make books searchable doesn't necesarily
a single document should consist of a whole book
It is sometimes useful to reconsider what the unit of information should be.
For instance, the fact you want to make books searchable doesn't necesarily
mean that a document should consist of a book. It might be a better idea to
use chapters [delete comma]
mean that a document should consist of a book. It might be a better idea to
use chapters, or even paragraphs as documents, and then have a property in
these documents that identifies which book they belong to. This does not only
avoid the issues with large documents, it also makes the search experience
delete likely
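
For readers following the hunk above, the chapter-per-document idea might look roughly like the sketch below. It is only an illustration under assumptions: the field names (`book_id`, `chapter`, `title`, `text`) and the `split_book` helper are hypothetical and do not come from this PR, and no Elasticsearch client call is shown.

```python
from dataclasses import dataclass, asdict


@dataclass
class ChapterDoc:
    book_id: str  # hypothetical property identifying which book the chapter belongs to
    chapter: int  # 1-based chapter number
    title: str
    text: str


def split_book(book_id, chapters):
    """Turn a list of (title, text) pairs into one small document per chapter."""
    return [
        asdict(ChapterDoc(book_id, number, title, text))
        for number, (title, text) in enumerate(chapters, start=1)
    ]


# Each resulting dict could be indexed as its own document instead of
# indexing the whole book as a single large document.
docs = split_book(
    "moby-dick",
    [("Loomings", "Call me Ishmael."), ("The Carpet-Bag", "I stuffed a shirt or two.")],
)
```

Searches can then filter or group on the `book_id` property to recover the book a matching chapter came from.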
294a949 to b970c93
LGTM
* master: (42 commits)
  Add support for merging custom meta data in tribe node (elastic#21552)
  [DOCS] Show EC2's auto attribute (elastic#21474)
  Add information about the removal of store throttling to the migration guide.
  Add a recommendation against large documents to the docs. (elastic#21652)
  Add indices options tests to search api REST tests (elastic#21701)
  Fixing indentation in geospatial querying example. (elastic#21682)
  Fix typo in filters aggregation docs (elastic#21690)
  Add BWC layer for Exceptions (elastic#21694)
  Add checkstyle rule to forbid empty javadoc comments (elastic#20881)
  Docs: Added offline install link for discovery-file plugin
  remove pointless catch exception in TransportSearchAction (elastic#21689)
  Rename ClusterState#lookupPrototypeSafe to `lookupPrototype` and remove previous "unsafe" unused variant (elastic#21686)
  Use a buffer to do character to byte conversion in StreamOutput#writeString (elastic#21680)
  Fix integer overflows when dealing with templates. (elastic#21628)
  Fix highlighting on a stored keyword field (elastic#21645)
  Set execute permissions for native plugin programs (elastic#21657)
  adjust visibility of DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNode#removeDeadMembers public method
  Remove minNodeVersion and corresponding public `getSmallestVersion` getter method from DiscoveryNodes
  ...