@nik9000 nik9000 commented Jun 12, 2020

Before #57042 the max_buckets test would consistently pass because the
request would consistently fail on the data node. After #57042 the
request fails only on the coordinating node. When the max_buckets test
runs in a mixed version cluster, the request consistently fails on either
the data node or the coordinating node, with one exception: the
coordinating node is missing #43095, one data node has #57042 and the
other does not, and the node without #57042 gets the request first. That
node fails the request as expected, and the coordinating node then
retries it on the node with #57042. When that happens the request fails
mysteriously with "partial shard failures" as the error message but no
partial failures reported. This is exactly the bug fixed in #43095.

This updates the test to be skipped in mixed version clusters without
#43095 because they sometimes fail the test spuriously. The request
fails in those cases, just like we expect, but with a mysterious error
message.
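
To make the case analysis in the description easier to follow, here is a toy model in Python. It is only a sketch of the outcomes described in this PR, not Elasticsearch code: the names `DataNode`, `run_request`, and the outcome strings are hypothetical, and the model assumes a single shard request that the coordinating node retries on the next node when a data node fails it.

```python
from dataclasses import dataclass

# Hypothetical outcome labels, chosen for this sketch only.
FAILS_ON_DATA_NODE = "clean failure on the data node"
FAILS_ON_COORDINATOR = "clean failure on the coordinating node"
MYSTERIOUS = '"partial shard failures" with no failures reported'


@dataclass(frozen=True)
class DataNode:
    # True if the node includes #57042, in which case it no longer
    # fails the request itself (assumption taken from the description).
    has_57042: bool


def run_request(coordinator_has_43095, nodes_in_try_order):
    """Return how the max_buckets request fails in this toy model."""
    shard_failed_once = False
    for node in nodes_in_try_order:
        if not node.has_57042:
            # A node without #57042 fails the request on the data node;
            # the coordinating node then retries on the next node.
            shard_failed_once = True
            continue
        # A node with #57042 lets the request through, so the failure
        # happens on the coordinating node instead.
        if shard_failed_once and not coordinator_has_43095:
            # An earlier shard failure plus a coordinator without #43095
            # is the case the description calls mysterious.
            return MYSTERIOUS
        return FAILS_ON_COORDINATOR
    # Every node failed the request on the data node.
    return FAILS_ON_DATA_NODE


if __name__ == "__main__":
    old, new = DataNode(has_57042=False), DataNode(has_57042=True)
    for coordinator_has_43095 in (True, False):
        for order in ([old, new], [new, old]):
            print(coordinator_has_43095,
                  [n.has_57042 for n in order],
                  run_request(coordinator_has_43095, order))
```

In this model only one combination produces the mysterious error: a coordinating node without #43095 whose first attempt lands on the node without #57042. Every other combination fails cleanly, which matches the test failing only sometimes.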

Closes #57657

@nik9000 nik9000 added >test Issues or PRs that are addressing/adding tests :Analytics/Aggregations Aggregations v7.9.0 labels Jun 12, 2020
@nik9000 nik9000 requested a review from imotov June 12, 2020 12:00
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 12, 2020
@nik9000
Member Author

nik9000 commented Jun 12, 2020

Note: This targets the 7.x branch because the failure only occurs there. There is no need to land this in master.

Contributor

@imotov imotov left a comment

That was quite a brain teaser. Thanks a lot for digging into it!

@nik9000
Member Author

nik9000 commented Jun 12, 2020

run elasticsearch-ci/packaging-sample-matrix-windows
