Commit b970c93

Fix style.
1 parent caf1769 commit b970c93

1 file changed: +12 -13 lines changed

docs/reference/how-to/general.asciidoc

Lines changed: 12 additions & 13 deletions
@@ -16,30 +16,29 @@ use the <<search-request-scroll,Scroll>> API.
 === Avoid large documents
 
 Given that the default <<modules-http,`http.max_context_length`>> is set to
-100MB, Elasticsearch will refuse to index any document that is larger that
+100MB, Elasticsearch will refuse to index any document that is larger than
 that. You might decide to increase that particular setting, but Lucene still
-has a limit at about 2GB.
+has a limit of about 2GB.
 
-But even without considering hard limits, large documents are usually not
+Even without considering hard limits, large documents are usually not
 practical. Large documents put more stress on network, memory usage and disk,
 even for search requests that do not request the `_source` since Elasticsearch
 needs to fetch the `_id` of the document in all cases, and the cost of getting
 this field is bigger for large documents due to how the filesystem cache works.
-Inverting this document can use an amount of memory that is a
-multiplier of the original size of the document. Proximity search (phrase
-queries for instance) and <<search-request-highlighting,highlighting>> also
-become more expensive since their cost directly depends on the size of the
-original document.
+Indexing this document can use an amount of memory that is a multiplier of the
+original size of the document. Proximity search (phrase queries for instance)
+and <<search-request-highlighting,highlighting>> also become more expensive
+since their cost directly depends on the size of the original document.
 
 It is sometimes useful to reconsider what the unit of information should be.
 For instance, the fact you want to make books searchable doesn't necesarily
-mean that a document should consist of a book. It might be a better idea to
-use chapters, or even paragraphs as documents, and then have a property in
+mean that a document should consist of a whole book. It might be a better idea
+to use chapters or even paragraphs as documents, and then have a property in
 these documents that identifies which book they belong to. This does not only
 avoid the issues with large documents, it also makes the search experience
-likely better. For instance if a user searches for two words `foo` and `bar`,
-a match across different chapters is probably very poor, while a match within
-the same paragraph is likely good.
+better. For instance if a user searches for two words `foo` and `bar`, a match
+across different chapters is probably very poor, while a match within the same
+paragraph is likely good.
 
 [float]
 [[sparsity]]
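
The "Avoid large documents" text in this hunk recommends indexing chapters or paragraphs rather than whole books, with a property that identifies the parent book. A minimal Python sketch of that idea follows. It only builds the small documents and makes no Elasticsearch calls. The field names (`book_id`, `chapter`, `paragraph`, `text`), the helper name `book_to_paragraph_docs`, and the blank-line paragraph split are all assumptions made for illustration, not anything the documentation prescribes.

```python
from typing import Dict, Iterator, List


def book_to_paragraph_docs(book_id: str, chapters: List[str]) -> Iterator[Dict[str, object]]:
    """Yield one small document per paragraph instead of one document per book."""
    for chapter_no, chapter_text in enumerate(chapters, start=1):
        # Assumption: paragraphs are separated by a blank line.
        paragraphs = [p.strip() for p in chapter_text.split("\n\n") if p.strip()]
        for paragraph_no, text in enumerate(paragraphs, start=1):
            yield {
                "book_id": book_id,         # hypothetical field linking back to the book
                "chapter": chapter_no,      # hypothetical field
                "paragraph": paragraph_no,  # hypothetical field
                "text": text,
            }


# Two chapters: `foo` and `bar` sit in different paragraphs of chapter 1,
# and in the same paragraph of chapter 2.
docs = list(book_to_paragraph_docs(
    "book-1",
    ["foo one.\n\nbar two.", "foo and bar together."],
))

# Only the paragraph that contains both words counts as a match.
both_words = [d for d in docs if "foo" in d["text"] and "bar" in d["text"]]
```

With paragraph-sized documents, a search for both `foo` and `bar` can only match where the two words appear close together, which is the search-quality benefit the text describes. Each document would then be indexed individually, or in bulk, with whatever client the application already uses.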
