Skip to content

Conversation

@danielmitterdorfer
Copy link
Member

With this commit we add the .cfs file extension to the list of file
types that are memory-mapped by hybridfs. .cfs files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates #36668

With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates elastic#36668
@danielmitterdorfer danielmitterdorfer added >enhancement v7.0.0 :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.2.0 labels Feb 15, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@danielmitterdorfer
Copy link
Member Author

danielmitterdorfer commented Feb 15, 2019

I have run several experiments to judge the impact of this change. A baseline benchmark with mmapfs for a workload with external ids on a "small" index with roughly 3GB (on a system with 32GB RAM), we see a median throughput of 68700 docs/s. Without this change, hybridfs results in 56900 docs/s (a reduction of 17%). With this change, we are again on par (as expected).

I also ran an update-heavy workload with 40% id conflicts on a larger index of 75GB on a system that only has 8GB available page cache. The baseline (mmapfs) results in 12000 docs/s median indexing throughput whereas hybridfs with this change results in 21800 docs/s.

@danielmitterdorfer danielmitterdorfer merged commit 2ab88e2 into elastic:master Feb 15, 2019
@danielmitterdorfer danielmitterdorfer deleted the hybrid-fs-with-cfs branch February 15, 2019 12:08
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Feb 15, 2019
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates elastic#36668
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Feb 15, 2019
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates elastic#36668
danielmitterdorfer added a commit that referenced this pull request Feb 15, 2019
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates #36668
danielmitterdorfer added a commit that referenced this pull request Feb 15, 2019
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates #36668
@danielmitterdorfer
Copy link
Member Author

danielmitterdorfer commented Feb 15, 2019

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 15, 2019
* elastic/master:
  Avoid double term construction in DfsPhase (elastic#38716)
  Fix typo in DateRange docs (yyy → yyyy) (elastic#38883)
  Introduced class reuses follow parameter code between ShardFollowTasks (elastic#38910)
  Ensure random timestamps are within search boundary (elastic#38753)
  [CI] Muting  method testFollowIndex in IndexFollowingIT
  Update Lucene snapshot repo for 7.0.0-beta1 (elastic#38946)
  SQL: Doc on syntax (identifiers in particular) (elastic#38662)
  Upgrade to Gradle 5.2.1 (elastic#38880)
  Tie break search shard iterator comparisons on cluster alias (elastic#38853)
  Also mmap cfs files for hybridfs (elastic#38940)
  Build: Fix issue with test status logging (elastic#38799)
  Adapt FullClusterRestartIT on master (elastic#38856)
  Fix testAutoFollowing test to use createLeaderIndex() helper method.
  Migrate muted auto follow rolling upgrade test and unmute this test (elastic#38900)
  ShardBulkAction ignore primary response on primary (elastic#38901)
  Recover peers from translog, ignoring soft deletes (elastic#38904)
  Fix NPE on Stale Index in IndicesService (elastic#38891)
  Smarter CCR concurrent file chunk fetching (elastic#38841)
  Fix intermittent failure in ApiKeyIntegTests (elastic#38627)
  re-enable SmokeTestWatcherWithSecurityIT (elastic#38814)
danielmitterdorfer added a commit that referenced this pull request Mar 19, 2019
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates #36668
kovrus added a commit to crate/crate that referenced this pull request Sep 9, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 9, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Sep 10, 2019
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.7.0 v7.0.0-rc2 v7.2.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants