Expand How to tune for disk usage #25562
Adding sections on shard size (20-30 GB), disabling `_source` when possible, Force Merge, and Shrink.
jasontedor left a comment
I left a comment.
[float]
=== Watch your shard size
I don't think we should specify a size here, as it depends on too many factors. Fast replica recovery, which is coming (#22484), also mitigates one of the drawbacks that you mention.
@jasontedor Makes sense. Do you think we should mention an upper bound, e.g. 50 GB?
It's a good question, but I don't think so. We are still fighting the "30 GB" heap recommendation: too many people see that number and treat it as the magic value for their heap size without considering all the factors involved. I think the verbiage is good, but we should avoid enshrining specific numbers.
@jasontedor I've updated the shard size recommendation.
jasontedor left a comment
I left a few suggestions.
[float]
=== Force Merge
Perhaps turn this around a bit: Elasticsearch stores data in shards. Shards are Lucene indices and are composed of segments. Segments are the actual files on disk, etc.
Makes sense. How about: "Indices in Elasticsearch are stored in one or more shards. Each shard is a Lucene index made up of one or more segments, the actual files on disk. Larger segments are more efficient for storing data. The <<indices-forcemerge,_forcemerge API>> can be [...]"
That sounds good to me.
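The Force Merge step described in the proposed wording can be sketched as a single request to the `_forcemerge` endpoint. This is a minimal sketch, not the documented procedure. The cluster URL `http://localhost:9200` and the index name `logs-2017.07` are hypothetical, and so are the helper names. `max_num_segments` is a real Force Merge parameter. Merging down to one segment is usually only worthwhile for indices that no longer receive writes.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_forcemerge_url(base_url: str, index: str, max_num_segments: int = 1) -> str:
    """Return the Force Merge endpoint URL for `index` (hypothetical helper).

    `max_num_segments` asks Elasticsearch to merge each shard down to at most
    this many segments; 1 gives the most compact on-disk layout.
    """
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be >= 1")
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"


def force_merge(base_url: str, index: str, max_num_segments: int = 1) -> bytes:
    """POST the Force Merge request and return the raw response body.

    Only call this against an index that is no longer being written to.
    """
    req = Request(build_forcemerge_url(base_url, index, max_num_segments), method="POST")
    with urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical cluster and index: print the request instead of sending it.
    print("POST", build_forcemerge_url("http://localhost:9200", "logs-2017.07"))
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect before it is sent against a real cluster.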
=== Watch your shard size
Larger shards are going to be more efficient at storing data. To increase the size of your shards, you can decrease the number of primary shards in an index by <<indices-create-index,creating indices>> with fewer primary shards, creating fewer indices (e.g. by leveraging the <<indices-rollover-index,Rollover API>>), or modifying an existing index using the <<indices-shrink-index,Shrink API>>.
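The Shrink option above has one arithmetic constraint worth spelling out: the Shrink API only accepts a target primary-shard count that is a factor of the source index's count. The source index must also be made read-only (`index.blocks.write: true`), and a copy of every shard must sit on a single node before the call. Below is a minimal sketch of the factor check and the request it would produce. The helper names and the index names `logs-src` and `logs-shrunk` are hypothetical.

```python
from __future__ import annotations

import json


def valid_shrink_targets(source_primaries: int) -> list[int]:
    """Primary-shard counts that `source_primaries` can be shrunk down to.

    The target must be a factor of the source count. The source count itself
    is excluded because keeping it would not shrink anything.
    """
    if source_primaries < 1:
        raise ValueError("source_primaries must be >= 1")
    return [n for n in range(1, source_primaries) if source_primaries % n == 0]


def shrink_request(source: str, target: str,
                   source_primaries: int, target_primaries: int) -> tuple[str, str, str]:
    """Build (method, path, JSON body) for a Shrink API call (hypothetical helper)."""
    if target_primaries not in valid_shrink_targets(source_primaries):
        raise ValueError(
            f"{target_primaries} is not a valid shrink target for {source_primaries} shards"
        )
    body = {"settings": {"index.number_of_shards": target_primaries}}
    return "POST", f"/{source}/_shrink/{target}", json.dumps(body)


if __name__ == "__main__":
    print(valid_shrink_targets(6))
    print(*shrink_request("logs-src", "logs-shrunk", 6, 2))
```

Validating the factor client-side just surfaces the error earlier. Elasticsearch rejects an invalid target count on its own.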
I do wonder if a comment is in order here about how this applies to full recoveries.
Maybe. I'm not sure under which circumstances we'd do a full recovery. Can you suggest a wording?
@jasontedor ping
Sorry, I've been on vacation. I will resume reviewing when I'm fully back tomorrow.
I don't think we need anything elaborate here, something like: "Keep in mind that large shard sizes come with drawbacks such as long full recovery times."
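The "long full recovery times" drawback can be made concrete with a back-of-the-envelope estimate. This sketch assumes the node-level recovery throttle `indices.recovery.max_bytes_per_sec` (default 40mb) is the only bottleneck, and that the throttle is split evenly across concurrent recoveries on a node. Both are simplifying assumptions, and the function name is hypothetical. Real recoveries can be slower because of disk, network, or translog replay, so treat the result as a lower bound. The shard sizes in the example are arbitrary inputs, not sizing recommendations.

```python
# Binary byte units, matching how Elasticsearch interprets "mb" and "gb".
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumption: `indices.recovery.max_bytes_per_sec` is left at its 40mb default.
DEFAULT_RECOVERY_BYTES_PER_SEC = 40 * MIB


def estimate_full_recovery_seconds(
    shard_size_bytes: int,
    bytes_per_sec: float = DEFAULT_RECOVERY_BYTES_PER_SEC,
    concurrent_recoveries: int = 1,
) -> float:
    """Lower-bound time to copy one shard's files during a full recovery.

    Assumes the recovery throttle is the only bottleneck and is shared evenly
    by `concurrent_recoveries` running on the same node.
    """
    if shard_size_bytes < 0:
        raise ValueError("shard_size_bytes must be >= 0")
    if bytes_per_sec <= 0 or concurrent_recoveries < 1:
        raise ValueError("bytes_per_sec must be > 0 and concurrent_recoveries >= 1")
    per_recovery_rate = bytes_per_sec / concurrent_recoveries
    return shard_size_bytes / per_recovery_rate


if __name__ == "__main__":
    # Arbitrary illustrative sizes, not sizing recommendations.
    for size_gib in (10, 50, 200):
        secs = estimate_full_recovery_seconds(size_gib * GIB)
        print(f"{size_gib} GiB shard: at least ~{secs / 60:.0f} min at 40mb/s")
```

Recovery time grows linearly with shard size under this model, which is the tradeoff against the storage efficiency of larger shards.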
@jasontedor Incorporated your suggestions, thanks a lot. How does it look?
jasontedor left a comment
LGTM.
Adding sections on:

- shard size (20-30 GB)
- disabling `_source` when possible
- Force Merge
- Shrink

More suggestions welcome.