Optimize indexing by replacing the synchronized lock of TranslogWriter.add with Disruptor in async mode #45371

@dengweisysu

Description

When indexing large volumes of data at high speed, the synchronized lock in TranslogWriter.add wastes a lot of time. Contention for this lock happens in the following situations:

  • many other threads are writing to the same index shard
  • a flush is triggered by IndexService.AsyncTranslogFSync
  • a flush is triggered when the translog reaches flush_threshold_size
  • a flush is triggered when rolling the generation

For users who set the translog durability strategy to async, there is still a large performance loss (about 50% in my case) compared with no translog writing at all. So I tried using a RingBuffer (Disruptor) to make adding to the translog lock-free for the writing threads.
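
The handoff idea can be sketched with JDK classes only. This is not the Disruptor API and not Elasticsearch code: `AsyncTranslogSketch`, `add`, `close`, and `runDemo` are hypothetical names, and `ArrayBlockingQueue` is a stand-in that still takes a short internal lock on enqueue, so it is not lock-free the way a Disruptor ring buffer is. What it illustrates is the shape of the change: indexing threads only enqueue an operation, and a single consumer thread does the actual writing, so producers never wait behind file I/O or a sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a translog writer; not Elasticsearch or Disruptor code.
final class AsyncTranslogSketch {
    // Sentinel compared by identity; tells the consumer thread to stop.
    private static final byte[] POISON = new byte[0];

    private final BlockingQueue<byte[]> buffer;
    // Touched only by the consumer thread until close() has joined it.
    private final List<byte[]> written = new ArrayList<>();
    private final Thread consumer;

    AsyncTranslogSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.consumer = new Thread(this::drain, "translog-writer");
        this.consumer.start();
    }

    // Called by indexing threads: enqueue and return; blocks only if the buffer is full.
    void add(byte[] operation) throws InterruptedException {
        buffer.put(operation);
    }

    // Single consumer: the only place where "writing" happens.
    private void drain() {
        try {
            for (byte[] op = buffer.take(); op != POISON; op = buffer.take()) {
                written.add(op); // a real writer would append to the translog file here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Signals the consumer to stop and waits until everything queued has been processed.
    void close() throws InterruptedException {
        buffer.put(POISON);
        consumer.join();
    }

    int writtenCount() {
        return written.size(); // call after close(); join() makes the list visible
    }

    // Runs several producer threads against one sketch and returns the number of operations written.
    static int runDemo(int producers, int opsPerProducer) {
        try {
            AsyncTranslogSketch translog = new AsyncTranslogSketch(1024);
            List<Thread> threads = new ArrayList<>();
            for (int p = 0; p < producers; p++) {
                Thread t = new Thread(() -> {
                    try {
                        for (int i = 0; i < opsPerProducer; i++) {
                            translog.add(new byte[] {1});
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) {
                t.join();
            }
            translog.close();
            return translog.writtenCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

With a bounded buffer, a slow consumer eventually applies back-pressure to the indexing threads instead of growing memory without limit; the capacity of 1024 here is arbitrary.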

Test scenario:

  • machine: 24 CPU cores, 64 GB memory
  • doc count: 10 million
  • field count per doc: 400+
  • translog size per doc: about 3 KB (34 GB of translog generated for 10 million docs)

Elasticsearch node config and index config:

# large size to reduce the number of full flushes
translog.flush_threshold_size: 30G (default 512M)
index.translog.sync_interval: 60s (default 5s)

indices.memory.index_buffer_size: 20% (default 10%)

Elapsed indexing time for each scenario:

  • translog enabled with async durability: 18 minutes
  • translog disabled (source code changed): 12 minutes
  • translog written asynchronously using Disruptor (source code changed): 12 minutes
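
The "about 50%" figure mentioned earlier appears to match these timings: 18 minutes is 50% longer than 12. A small sketch of that arithmetic, using only the numbers reported above (the class and method names are hypothetical):

```java
// Hypothetical helper that restates the reported timings as throughput and overhead.
final class TranslogTimings {
    // Average indexing rate for a run of the given length.
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    // Extra elapsed time relative to the baseline, in percent.
    static double overheadPercent(double minutes, double baselineMinutes) {
        return (minutes - baselineMinutes) / baselineMinutes * 100.0;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L;
        System.out.println(Math.round(docsPerSecond(docs, 18))); // 9259 docs/s with the synchronized translog
        System.out.println(Math.round(docsPerSecond(docs, 12))); // 13889 docs/s without translog or with Disruptor
        System.out.println(overheadPercent(18, 12));             // 50.0
    }
}
```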

reference: https://lmax-exchange.github.io/disruptor/

Labels

:Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.)