
Conversation

@cwurm (Contributor) commented Jul 5, 2017

Adding sections on:

  • Shard size (20-30 GB)
  • Disabling _source when possible
  • Force Merge
  • Shrink

More suggestions welcome.

@cwurm added the >docs (General docs changes) label on Jul 5, 2017
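As a sketch of the `_source` suggestion above: `_source` can be disabled in an index's mappings. This is a minimal, hypothetical example (the `metrics` index and `doc` type names are placeholders); note that with `_source` disabled, features that rely on it, such as reindexing and update requests, are no longer possible:

```console
PUT metrics
{
  "mappings": {
    "doc": {
      "_source": {
        "enabled": false
      }
    }
  }
}
```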
@jasontedor (Member) left a comment:

I left a comment.


[float]
=== Watch your shard size

Member:

I don't think we should specify a size here as it depends on too many factors. With fast replica recovery coming that is another mitigating factor (#22484) to one of the drawbacks that you mention.

Contributor Author:

@jasontedor Makes sense. Do you think we should mention an upper range, e.g. 50 GB?

Member:

It's a good question but I don't think so. We are still fighting the "30 GB" heap recommendation, too many people see that number and think it's the magical number where they should set their heap without enough consideration for all the factors involved. Instead, I think that the verbiage is good but we should avoid enshrining specific numbers.

@cwurm (Contributor Author) commented Jul 7, 2017

@jasontedor I've updated the shard size recommendation.

@jasontedor (Member) left a comment:

I left a few suggestions.


[float]
=== Force Merge

Member:

Perhaps turn this around a bit: Elasticsearch stores data in shards. Shards are Lucene indices and are composed of segments. Segments are the actual files on disk, etc.

Contributor Author:

Makes sense, how about: "Indices in Elasticsearch are stored in one or more shards. Each shard is a Lucene index and made up of one or more segments - the actual files on disk. Larger segments are more efficient for storing data. The <<indices-forcemerge,_forcemerge API>> can be [...]"

Member:

That sounds good to me.
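To make the force-merge discussion concrete, a minimal `_forcemerge` call on an index that is no longer being written to might look like this (the index name is a placeholder; `max_num_segments=1` merges each shard down to a single segment):

```console
POST my-index/_forcemerge?max_num_segments=1
```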

=== Watch your shard size

Larger shards are more efficient at storing data. To increase the size of your shards, you can <<indices-create-index,create indices>> with fewer primary shards, create fewer indices (e.g. by leveraging the <<indices-rollover-index,Rollover API>>), or modify an existing index using the <<indices-shrink-index,Shrink API>>.
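As a hypothetical sketch of the two APIs mentioned (all index and alias names are placeholders): the Shrink API copies an existing index into a new one with fewer primary shards (the source index must be read-only, with a copy of every shard on a single node), while the Rollover API switches an alias to a fresh index once a condition is met:

```console
POST my-index/_shrink/my-shrunk-index
{
  "settings": {
    "index.number_of_shards": 1
  }
}

POST my-alias/_rollover
{
  "conditions": {
    "max_docs": 100000000
  }
}
```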

Member:

I do wonder if a comment is in order here about how this applies to full recoveries.

Contributor Author:

Maybe, I'm not sure under which circumstances we'd do a full recovery. Can you suggest a wording?


Member:

Sorry, I've been on vacation. I will resume reviewing when I'm fully back tomorrow.

Member:

I don't think we need anything elaborate here, something like: "Keep in mind that large shard sizes come with drawbacks such as long full recovery times."

@cwurm (Contributor Author) commented Aug 7, 2017

@jasontedor Incorporated your suggestions, thanks a lot. How does it look?

@jasontedor (Member) left a comment:

LGTM.

@cwurm cwurm merged commit 0120448 into master Aug 21, 2017
@cwurm cwurm deleted the cwurm-docs-disk-usage branch August 21, 2017 19:08
cwurm added a commit that referenced this pull request Aug 23, 2017
cwurm added a commit that referenced this pull request Aug 24, 2017