Skip to content

Conversation

@jimczi
Copy link
Contributor

@jimczi jimczi commented Jun 11, 2019

When set to false, allowPartialSearchResults option does not check if the
shard failures have been reseted to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.

Closes #40743

…tries

When set to false, allowPartialSearchResults option does not check if the
shard failures have been reseted to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.

Closes elastic#40743
@jimczi jimczi added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.3.0 v6.8.1 labels Jun 11, 2019
@jimczi jimczi requested a review from markharwood June 11, 2019 11:10
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Contributor

@markharwood markharwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - test passing here.
Just some commented-out code in the test class that looks like it needs dealing with

}
//} else {
// initializing.add(routing);
//}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments need removing?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jimczi jimczi merged commit 03322ea into elastic:master Jun 12, 2019
@jimczi jimczi deleted the bug/partial_results_successful_retries branch June 12, 2019 13:37
jimczi added a commit that referenced this pull request Jun 12, 2019
…tries (#43095)

When set to false, allowPartialSearchResults option does not check if the
shard failures have been reseted to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.

Closes #40743
jimczi added a commit that referenced this pull request Jun 12, 2019
…tries (#43095)

When set to false, allowPartialSearchResults option does not check if the
shard failures have been reseted to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.

Closes #40743
jimczi added a commit that referenced this pull request Jun 12, 2019
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 12, 2020
Before elastic#57042 the max_buckets test would consistently pass because the
request would consistently fail. In particular, the request would fail on
the data node. After elastic#57042 it only fails on the coordinating node. When
the max_buckets test is run in a mixed version cluster it consistently
fails on *either* the data node or the coordinating node. Except when
the coordinating node is missing elastic#43095. In that case if the one data
node has elastic#57042 and one does not, *and* the one that doesn't gets the
request first, fails it as expected, and then the coordinating node
retries the request on the node with elastic#57042. When that happens the
request fails mysteriously with "partial shard failures" as the error
message but not partial failures reported. This is *exactly* the bug
fixed in elastic#43095.

This updates the test to be skipped in mixed version clusters without
 elastic#43095 because they *sometimes* fail the test spuriously. The request
fails in those cases, just like we expect, but with a mysterious error
message.

Closes elastic#57657
nik9000 added a commit that referenced this pull request Jun 12, 2020
Before #57042 the max_buckets test would consistently pass because the
request would consistently fail. In particular, the request would fail on
the data node. After #57042 it only fails on the coordinating node. When
the max_buckets test is run in a mixed version cluster it consistently
fails on *either* the data node or the coordinating node. Except when
the coordinating node is missing #43095. In that case if the one data
node has #57042 and one does not, *and* the one that doesn't gets the
request first, fails it as expected, and then the coordinating node
retries the request on the node with #57042. When that happens the
request fails mysteriously with "partial shard failures" as the error
message but not partial failures reported. This is *exactly* the bug
fixed in #43095.

This updates the test to be skipped in mixed version clusters without
 #43095 because they *sometimes* fail the test spuriously. The request
fails in those cases, just like we expect, but with a mysterious error
message.

Closes #57657
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Search/Search Search-related issues that do not fall into other categories v6.8.1 v7.3.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Queries using allowPartialSearchResults=false involving only successful retries fail with status 503

5 participants