Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 30, 2019

Translog#shouldRollGeneration should execute under the read lock for it accesses the current writer.

@dnhatn dnhatn added >bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.2.0 v6.7.3 v7.0.2 labels Apr 30, 2019
@dnhatn dnhatn requested a review from jasontedor April 30, 2019 16:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn requested a review from henningandersen May 1, 2019 14:34
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I you know ways in which this could manifest itself, it would be nice to include into the commit message. I think this should in reality never show, but is easier to reason about with the lock check included.

@jasontedor
Copy link
Member

I am not sure, is this actually a problem? There are issues here which this doesn't protect against. Namely the check and roll are not atomic, so it could be that a thread checks if a roll is needed, and then a roll occurs, and then this thread returns back and performs a roll itself. This change doesn't protect against that.

@dnhatn
Copy link
Member Author

dnhatn commented May 2, 2019

@henningandersen @jasontedor Thanks for looking.

I spotted this while I was investigating a case where more than 140 empty translog files in a single shard. For sure this is not the reason for those empty translog files. That shard has more than 280 primary failovers, and we roll a new translog on each primary promotion.

Although this should never happen in practice, I think this method should be called with lock. I marked this PR as a non-issue.

@jasontedor
Copy link
Member

I agree it's more correct and should be integrated, my point though is that I don't think it matters in practice yet it still leaves behind real issues. 🤷‍♀

@dnhatn dnhatn removed the v7.1.0 label May 6, 2019
@dnhatn
Copy link
Member Author

dnhatn commented May 9, 2019

@henningandersen @jasontedor Thanks for reviewing!

@dnhatn
Copy link
Member Author

dnhatn commented May 9, 2019

process was found dead while waiting for ports files, node{:modules:mapper-extras:integTest-1}

run elasticsearch-ci/1

@dnhatn dnhatn merged commit 80f4846 into elastic:master May 10, 2019
@dnhatn dnhatn deleted the should-rolll-under-lock branch May 10, 2019 12:50
dnhatn added a commit that referenced this pull request May 10, 2019
Translog#shouldRollGeneration should execute under the read lock since
it accesses the current writer.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 10, 2019
* elastic/master: (84 commits)
  [ML] adds geo_centroid aggregation support to data frames (elastic#42088)
  Add documentation for calendar/fixed intervals (elastic#41919)
  Remove global checkpoint assertion in peer recovery (elastic#41987)
  Don't create tempdir for cli scripts (elastic#41913)
  Fix debian-8 update (elastic#42056)
  Cleanup plugin bin directories (elastic#41907)
  Prevent order being lost for _nodes API filters (elastic#42045)
  Change IndexAnalyzers default analyzer access (elastic#42011)
  Remove reference to fs.data.spins in docs
  Mute failing AsyncTwoPhaseIndexerTests
  Remove close method in PageCacheRecycler/Recycler (elastic#41917)
  [ML] adding pivot.max_search_page_size option for setting paging size (elastic#41920)
  Docs: Tweak list formatting
  Simplify handling of keyword field normalizers (elastic#42002)
  [ML] properly nesting objects in document source (elastic#41901)
  Remove extra `ms` from log message (elastic#42068)
  Increase the sample space for random inner hits name generator (elastic#42057)
  Recognise direct buffers in heap size docs (elastic#42070)
  shouldRollGeneration should execute under read lock (elastic#41696)
  Wait for active shard after close in mixed cluster (elastic#42029)
  ...
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Translog#shouldRollGeneration should execute under the read lock since
it accesses the current writer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue v7.2.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants