Improve Stability of Mock APIs #49518

original-brownbear · 2019-11-24T20:45:06Z

This commit ensures that even for requests that are known to be empty body
we at least attempt to read one bytes from the request body input stream.
This is done to work around the behavior in sun.net.httpserver.ServerImpl.Dispatcher#handleEvent
that will close a TCP/HTTP connection that does not have the eof flag (see sun.net.httpserver.LeftOverInputStream#isEOF)
set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered.
This fixes the numerous connection closing issues because the ServerImpl stops closing connections that it thinks
weren't fully drained.

Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing related issue after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing
of allegedly non-eof connections by ServerImpl and produces the exact kinds of failures we're seeing currently.

Relates #49401, #49429 (even if the exception failing the retried request isn't a SocketException this is potentially the cause of it because the SocketException from the closing of the connection counted towards the overall retry count)

This commit ensures that even for requests that are known to be empty body we at least attempt to read one bytes from the request body input stream. This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent` that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`) set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered. This fixes the numerous connection closing issues because the `ServerImpl` stops closing connections that it thinks weren't fully drained. Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO). I would suggest merging this and closing related issue after verifying that this fixes things on CI. The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing of allegedly non-eof connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently. Relates elastic#49401

elasticmachine · 2019-11-24T20:45:08Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear · 2019-11-24T20:45:59Z

test/fixtures/azure-fixture/src/main/java/fixture/azure/AzureHttpHandler.java


            } else if (Regex.simpleMatch("DELETE /" + container + "/*", request)) {
                // Delete Blob (https://docs.microsoft.com/en-us/rest/api/storageservices/delete-blob)
-                try (InputStream is = exchange.getRequestBody()) {


This should always be empty? (at least it seems it is in practice)

tlrx

LGTM - I agree this should fix the CI issues. Thanks a lot Armin for digging this one!

original-brownbear · 2019-11-25T08:27:46Z

Thanks @tlrx , merging + backporting now :) Let's check build stats tomorrow 🤞

This commit ensures that even for requests that are known to be empty body we at least attempt to read one bytes from the request body input stream. This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent` that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`) set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered. This fixes the numerous connection closing issues because the `ServerImpl` stops closing connections that it thinks weren't fully drained. Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO). I would suggest merging this and closing related issue after verifying that this fixes things on CI. The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing of allegedly non-eof connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently. Relates elastic#49401, elastic#49429

This commit ensures that even for requests that are known to be empty body we at least attempt to read one bytes from the request body input stream. This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent` that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`) set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered. This fixes the numerous connection closing issues because the `ServerImpl` stops closing connections that it thinks weren't fully drained. Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO). I would suggest merging this and closing related issue after verifying that this fixes things on CI. The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing of allegedly non-eof connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently. Relates #49401, #49429

Same as elastic#49518 pretty much but for GCS. Fixing a few more spots where input stream can get closed without being fully drained and adding assertions to make sure it's always drained. Moved the no-close stream wrapper to production code utilities since there's a number of spots in production code where it's also useful (will reuse it there in a follow-up).

Same as #49518 pretty much but for GCS. Fixing a few more spots where input stream can get closed without being fully drained and adding assertions to make sure it's always drained. Moved the no-close stream wrapper to production code utilities since there's a number of spots in production code where it's also useful (will reuse it there in a follow-up).

Same as elastic#49518 pretty much but for GCS. Fixing a few more spots where input stream can get closed without being fully drained and adding assertions to make sure it's always drained. Moved the no-close stream wrapper to production code utilities since there's a number of spots in production code where it's also useful (will reuse it there in a follow-up).

Same as #49518 pretty much but for GCS. Fixing a few more spots where input stream can get closed without being fully drained and adding assertions to make sure it's always drained. Moved the no-close stream wrapper to production code utilities since there's a number of spots in production code where it's also useful (will reuse it there in a follow-up).

original-brownbear added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.6.0 v7.5.1 labels Nov 24, 2019

cs

9468c49

original-brownbear commented Nov 24, 2019

View reviewed changes

original-brownbear requested a review from tlrx November 25, 2019 05:30

tlrx approved these changes Nov 25, 2019

View reviewed changes

original-brownbear merged commit f836754 into elastic:master Nov 25, 2019

original-brownbear deleted the fix-http-handling-mock-apis branch November 25, 2019 08:28

original-brownbear mentioned this pull request Nov 25, 2019

Improve Stability of Mock APIs (#49518) #49524

Merged

original-brownbear mentioned this pull request Nov 25, 2019

Improve Stability of Mock APIs (#49518) #49525

Merged

original-brownbear mentioned this pull request Nov 26, 2019

Improve Stability of GCS Mock API #49592

Merged

original-brownbear mentioned this pull request Nov 26, 2019

Improve Stability of GCS Mock API (#49592) #49597

Merged

original-brownbear mentioned this pull request Nov 26, 2019

Improve Stability of GCS Mock API (#49592) #49598

Merged

original-brownbear restored the fix-http-handling-mock-apis branch August 6, 2020 18:34

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Stability of Mock APIs #49518

Improve Stability of Mock APIs #49518

Uh oh!

original-brownbear commented Nov 24, 2019 •

edited

Loading

Uh oh!

elasticmachine commented Nov 24, 2019

Uh oh!

original-brownbear Nov 24, 2019

Uh oh!

tlrx left a comment

Uh oh!

original-brownbear commented Nov 25, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Improve Stability of Mock APIs #49518

Improve Stability of Mock APIs #49518

Uh oh!

Conversation

original-brownbear commented Nov 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticmachine commented Nov 24, 2019

Uh oh!

original-brownbear Nov 24, 2019

Choose a reason for hiding this comment

Uh oh!

tlrx left a comment

Choose a reason for hiding this comment

Uh oh!

original-brownbear commented Nov 25, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

original-brownbear commented Nov 24, 2019 •

edited

Loading