Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Aug 24, 2018

We generate slightly different NoOps in InternalEngine and TransportShardBulkAction for the same failure.

  1. InternalEngine uses Exception#getFailure to generate a message without the class name.

newOp [NoOp{seqNo=1, primaryTerm=1, reason='Contexts are mandatory in context enabled completion field [suggest_context]'}]

  1. TransportShardBulkAction uses Exception#toString to generate a message with the class name.

NoOp{seqNo=1, primaryTerm=1, reason='java.lang.IllegalArgumentException: Contexts are mandatory in context enabled completion field [suggest_context]'}

If a write operation fails while a replica is recovering, that replica will possibly receive two different NoOps: one from recovery and one from replication. These two different NoOps will trip TranslogWriter#assertNoSeqNumberConflict assertion.

This commit makes sure that we generate the same No-Ops for the same failure.

Closes #32986

We generate NoOps in two places: in InternalEngine and in
TransportShardBulkAction after a write operation failed. However, we
generate slightly different messages for the same failure:

1. InternalEngine uses Exception#getFailure to generate a message
without the class name.

> newOp [NoOp{seqNo=1, primaryTerm=1, reason='Contexts are mandatory in
> context enabled completion field [suggest_context]'}]

2. TransportShardBulkAction uses  Exception#toString  to generate a
message with the class name.

> NoOp{seqNo=1, primaryTerm=1,
> reason='java.lang.IllegalArgumentException: Contexts are mandatory in
> context enabled completion field [suggest_context]'}

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from InternalEngine and
one from TransportShardBulkAction.  Two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflic assertion.

This commit makes sure that we generate the same No-Ops for the same
failure.

Closes elastic#32986
@dnhatn dnhatn added >bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.5.0 v6.4.1 labels Aug 24, 2018
@dnhatn dnhatn requested review from jasontedor and s1monw August 24, 2018 21:41
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Aug 27, 2018

Thanks @s1monw.

@dnhatn dnhatn merged commit 014b323 into elastic:master Aug 27, 2018
@dnhatn dnhatn deleted the fix-noop-failure branch August 27, 2018 19:59
dnhatn added a commit that referenced this pull request Aug 28, 2018
We generate slightly different NoOps in InternalEngine and
TransportShardBulkAction for the same failure.

1. InternalEngine uses Exception#getFailure to generate a message
without the class name: newOp [NoOp{seqNo=1, primaryTerm=1,
reason='Contexts are mandatory in context enabled completion field
[suggest_context]'}].

2. TransportShardBulkAction uses Exception#toString to generate a
message with the class name: NoOp{seqNo=1, primaryTerm=1,
reason='java.lang.IllegalArgumentException: Contexts are mandatory in
context enabled completion field [suggest_context]'}.

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from recovery and one
from replication. These two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflict assertion.

This commit ensures that we generate the same Noop for the same failure.

Closes #32986
dnhatn added a commit that referenced this pull request Aug 28, 2018
We generate slightly different NoOps in InternalEngine and
TransportShardBulkAction for the same failure.

1. InternalEngine uses Exception#getFailure to generate a message
without the class name: newOp [NoOp{seqNo=1, primaryTerm=1,
reason='Contexts are mandatory in context enabled completion field
[suggest_context]'}].

2. TransportShardBulkAction uses Exception#toString to generate a
message with the class name: NoOp{seqNo=1, primaryTerm=1,
reason='java.lang.IllegalArgumentException: Contexts are mandatory in
context enabled completion field [suggest_context]'}.

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from recovery and one
from replication. These two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflict assertion.

This commit ensures that we generate the same Noop for the same failure.

Closes #32986
dnhatn added a commit that referenced this pull request Aug 28, 2018
* master:
  [Rollup] Better error message when trying to set non-rollup index (#32965)
  HLRC: Use Optional in validation logic (#33104)
  Remove unused User class from protocol (#33137)
  ingest: Introduce the dissect processor (#32884)
  [Docs] Add link to es-kotlin-wrapper-client (#32618)
  [Docs] Remove repeating words (#33087)
  Minor spelling and grammar fix (#32931)
  Remove support for deprecated params._agg/_aggs for scripted metric aggregations (#32979)
  Watcher: Simplify finding next date in cron schedule (#33015)
  Run Third party audit with forbidden APIs CLI  (part3/3) (#33052)
  Fix plugin build test on Windows (#33078)
  HLRC+MINOR: Remove Unused Private Method (#33165)
  Remove old unused test script files (#32970)
  Build analysis-icu client JAR (#33184)
  Ensure to generate identical NoOp for the same failure (#33141)
  ShardSearchFailure#readFrom to set index and shardId (#33161)
dnhatn added a commit that referenced this pull request Aug 28, 2018
* 6.x:
  [Rollup] Better error message when trying to set non-rollup index (#32965)
  Remove unused User class from protocol (#33137)
  [DOCS] Adds link to 6.3.0 release highlights
  Test: fix token bwc tests due to bad backport
  Ensure to generate identical NoOp for the same failure (#33141)
  [Docs] Add link to es-kotlin-wrapper-client (#32618)
  [Docs] Remove repeating words (#33087)
  Minor spelling and grammar fix (#32931)
  Run Third party audit with forbidden APIs CLI  (part3/3) (#33052)
  Fix plugin build test on Windows (#33078)
  Watcher: Simplify finding next date in cron schedule (#33015)
  Remove old unused test script files (#32970)
  Build analysis-icu client JAR (#33184)
  Switch remaining tests to new style Requests (#33109)
  Use internal connection manager when fetching remote node info
  Switch remaining x-pack tests to new style Requests (#33108)
  Switch remaining ml tests to new style Requests (#33107)
  Token API supports the client_credentials grant (#33106)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.4.1 v6.5.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants