-
Couldn't load subscription status.
- Fork 9.1k
HADOOP-18883. [ABFS]: Expect-100 JDK bug resolution: prevent multiple server calls #6022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18883. [ABFS]: Expect-100 JDK bug resolution: prevent multiple server calls #6022
Conversation
| && EXPECT_100_JDK_ERROR.equals(e.getMessage())) { | ||
| LOG.debug( | ||
| "Getting output stream failed with expect header enabled, returning back ", | ||
| e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
351 and 352 lines can be merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. It was done by IDE. I think keeping as before shouldnt be a problem. will take this.
| + "and inputStream."); | ||
| return; | ||
| } | ||
| processConnHeadersAndInputStreams(buffer, offset, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the method name can remain same as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this to make it more testable. This is to abstract out the logic which actually reads the responseHeaders and inputStream. And in future, if some more logic would need to be added, that can be added in the processConnHeadersAndInputStreams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
| * and should qualify for retry. | ||
| */ | ||
| public static final int HTTP_CONTINUE = 100; | ||
| public static final String EXPECT_100_JDK_ERROR = "Server rejected operation"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this error string be returned by server only for this exception ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exception message thrown by JDK and not server: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1364
| Mockito.anyInt(), Mockito.anyInt()); | ||
| } | ||
|
|
||
| private void readyMocksForAppendTest(final AbfsHttpOperation[] httpOpForAppendTest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name can be changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please suggest a name that would be more apt. Reasoning for this name as that we indeed ready the mocks for append test. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mockSetupForAppend ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken: mockSetupForAppend
| LambdaTestUtils.intercept(FileNotFoundException.class, () -> { | ||
| os.close(); | ||
| }); | ||
| Assertions.assertThat(httpOpForAppendTest[0].getExpect100failureReceived()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion statements missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion is happening isTrue(). You mean something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion, I meant assertion description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
:::: AGGREGATED TEST RESULT :::: HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: AppendBlob-HNS-OAuth[INFO] Results: Time taken: 45 mins 49 secs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few queries around expectation of any exception case in getOuputStream().
| * and should qualify for retry. | ||
| */ | ||
| public static final int HTTP_CONTINUE = 100; | ||
| public static final String EXPECT_100_JDK_ERROR = "Server rejected operation"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wont the JDK bug were redundant connections are attempted be an issue for any exception case in getOutputStream() ? Shouldn't the case to handle be irrespective of explicit server rejected request case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For any IOException thrown by getOutputStream, no headers / inputStream will be parsed. The flow of code is such that for IOException other expect-100 error, exception will be thrown back to AbfsRestOperation which will retry again as per retry-policy.
| private long sendRequestTimeMs; | ||
| private long recvResponseTimeMs; | ||
| private boolean shouldMask = false; | ||
| private boolean expect100failureReceived = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though the significance of the bug is seen in 100-Continue flow, the state of the previously established connection is the reason to prevent getHeaderField* APIs. Would suggest renaming this field to connectionDisconnectedOnError and also add a code comment that this is a workaround for JDK bug with link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, getOutputStream can throw only two major types of exception:
- Expect-100 error
- Other IOException
In case of other IOException, we should immediately throw back to AbfsRestOperation to do the retry. We can take responseCode only in case of Except-100 error, because there was a valid server call made. In other IOException, they can contain, CT, connection-reset etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have taken it. Refactored the variable name.
| if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE)) { | ||
| if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE) | ||
| && e instanceof ProtocolException | ||
| && EXPECT_100_JDK_ERROR.equals(e.getMessage())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in an earlier comment, the HttpUrlConnection that we hold currently is disconnected while inside this catch block.
Do we want to prevent later API calls that trigger connections irrespective of throttled failures ? If so, setting the status of connectionDisconnectedOnError should be at the start of catch block ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken. The setting of connectionDisconnectedOnError in starting of the catch block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question by @snvijaya is Do we want to prevent later API calls that trigger connections irrespective of any failures?
If yes then why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At httpUrlConnection.getOutputStream, either the error could IOException(including ConnectionTimeout and ReadTimeout) or expect-100 error (this raises ProtocolException which is child of IOException). Server errors if any would be caught in processResponse and the treatment would be same as done with all other apis (analyse if needed to be retried and then RestOperation would retry it).
In the JDK's implementation of getOutputStream, For the IOExceptions, the connection is killed. So, if further APIs are let go ahead, they would be firing a new server call all together. So, other APIs, like getHeaderField() etc, would be returning the data as per the new server call which is undesirable.
Also, the implementation of httpUrlConnection is such that the other APIs (like getHeaderField()), would internally call getInputStream(), which would would first call getOutputStream() (if the sendData flag is true and doesnt hold strOutputStream object). Now, here two things can happen:
- Expect100 failure: no data capture, and again any next API on the httpUrlConnection would fire a new call.
- Status-100 : Now, it is not in the block where data can be put in the outputStream, the stream shall be closed which will raise IOException, and from here it will go back to retry loop. Ref: https://github.com/openjdk/jdk8/blob/master/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1463-L1471
Hence, any further API is prevented on the HttpUrlConnection object which has got an IOException in getOutputStream.
|
|
||
| this.statusDescription = getConnResponseMessage(); | ||
|
|
||
| /* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wont above getResponseCode() and getResponseMessage also lead to connections being established ? If yes, we should probably return right at the start of the method itself if connectionDisconnectedOnError is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of expect-100 error:
ProtocolException is raised: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1364.
Caught by getOutputStream0(), we do: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1488-L1493 -> wherein the responseCode is saved in object field.
In case of getResponseCode: it checks if the field is != -1. If yes, it returns the responseCode it has else, go via getInputStream route: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/HttpURLConnection.java#L520-L522. Here we have it in the field and is not -1.
In case of getResponseMessage(), its kind of a getter. It internally calls getResponseCode(): https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/HttpURLConnection.java#L596-L599.
Hence, these two methods are safe.
|
🎊 +1 overall
This message was automatically generated. |
|
:::: AGGREGATED TEST RESULT :::: HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: AppendBlob-HNS-OAuth[INFO] Results: Time taken: 51 mins 2 secs. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
@steveloughran @mehakmeet , requesting your review please. Thank you so much. |
|
Hi @steveloughran @mehakmeet , requesting your kind review. Thank you so much. |
|
Hi @steveloughran @mehakmeet . Requesting you to kindly review the PR please. This is PR to prevent day-0 JDK bug around expect-100 in abfs. Would be really awesome to get your feedback on this. Thank you so much. |
|
Hi @steveloughran @mehakmeet @mukund-thakur , Requesting you to kindly review the PR please. This is PR to prevent day-0 JDK bug around expect-100 in abfs. Would be really awesome to get your feedback on this. Thank you so much. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
:::: AGGREGATED TEST RESULT :::: HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: AppendBlob-HNS-OAuth[INFO] Results: Time taken: 50 mins 25 secs. |
|
Hi @steveloughran @mehakmeet @mukund-thakur , Requesting you to kindly review the PR please. This is PR to prevent day-0 JDK bug around expect-100 in abfs. Would be really awesome to get your feedback on this. Thank you so much. |
| is enabled, we return back without throwing an exception to | ||
| the caller. The caller is responsible for setting the correct status code. | ||
| If expect header is not enabled, we throw back the exception. | ||
| connectionDisconnectedOnError = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting this field here and using in processResponse() means that we won't be processing response for any IOException. But isn't the intent to not process only in case of JDK error?
So shouldn't this go inside the if (EXPECT_100_JDK_ERROR.equals(e.getMessage().......) check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to prevent any further API called on the httpUrlConnection if it throws IOException for the reason shared in https://github.com/apache/hadoop/pull/6022/files#r1444240890.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
+1
… server calls (apache#6022) Address JDK bug JDK-8314978 related to handling of HTTP 100 responses. https://bugs.openjdk.org/browse/JDK-8314978 In the AbfsHttpOperation, after sendRequest() we call processResponse() method from AbfsRestOperation. Even if the conn.getOutputStream() fails due to expect-100 error, we consume the exception and let the code go ahead. This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after getOutputStream() has failed. These invocation all lead to server calls. This commit aims to prevent this. If connection.getOutputStream() fails due to an Expect-100 error, the ABFS client does not invoke getHeaderField(), getHeaderFields(), getHeaderFieldLong() or getInputStream(). getResponseCode() is safe as on the failure it sets the responseCode variable in HttpUrlConnection object. Contributed by Pranav Saxena
|
Thank you so much @steveloughran for the review. Have raised a backport pr on branch-3.3: #6484. Thank you so much. |
… server calls (#6022) Address JDK bug JDK-8314978 related to handling of HTTP 100 responses. https://bugs.openjdk.org/browse/JDK-8314978 In the AbfsHttpOperation, after sendRequest() we call processResponse() method from AbfsRestOperation. Even if the conn.getOutputStream() fails due to expect-100 error, we consume the exception and let the code go ahead. This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after getOutputStream() has failed. These invocation all lead to server calls. This commit aims to prevent this. If connection.getOutputStream() fails due to an Expect-100 error, the ABFS client does not invoke getHeaderField(), getHeaderFields(), getHeaderFieldLong() or getInputStream(). getResponseCode() is safe as on the failure it sets the responseCode variable in HttpUrlConnection object. Contributed by Pranav Saxena
… server calls (apache#6022) Address JDK bug JDK-8314978 related to handling of HTTP 100 responses. https://bugs.openjdk.org/browse/JDK-8314978 In the AbfsHttpOperation, after sendRequest() we call processResponse() method from AbfsRestOperation. Even if the conn.getOutputStream() fails due to expect-100 error, we consume the exception and let the code go ahead. This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after getOutputStream() has failed. These invocation all lead to server calls. This commit aims to prevent this. If connection.getOutputStream() fails due to an Expect-100 error, the ABFS client does not invoke getHeaderField(), getHeaderFields(), getHeaderFieldLong() or getInputStream(). getResponseCode() is safe as on the failure it sets the responseCode variable in HttpUrlConnection object. Contributed by Pranav Saxena
… server calls (#6022) Address JDK bug JDK-8314978 related to handling of HTTP 100 responses. https://bugs.openjdk.org/browse/JDK-8314978 In the AbfsHttpOperation, after sendRequest() we call processResponse() method from AbfsRestOperation. Even if the conn.getOutputStream() fails due to expect-100 error, we consume the exception and let the code go ahead. This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after getOutputStream() has failed. These invocation all lead to server calls. This commit aims to prevent this. If connection.getOutputStream() fails due to an Expect-100 error, the ABFS client does not invoke getHeaderField(), getHeaderFields(), getHeaderFieldLong() or getInputStream(). getResponseCode() is safe as on the failure it sets the responseCode variable in HttpUrlConnection object. Contributed by Pranav Saxena
… server calls (apache#6022) Address JDK bug JDK-8314978 related to handling of HTTP 100 responses. https://bugs.openjdk.org/browse/JDK-8314978 In the AbfsHttpOperation, after sendRequest() we call processResponse() method from AbfsRestOperation. Even if the conn.getOutputStream() fails due to expect-100 error, we consume the exception and let the code go ahead. This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after getOutputStream() has failed. These invocation all lead to server calls. This commit aims to prevent this. If connection.getOutputStream() fails due to an Expect-100 error, the ABFS client does not invoke getHeaderField(), getHeaderFields(), getHeaderFieldLong() or getInputStream(). getResponseCode() is safe as on the failure it sets the responseCode variable in HttpUrlConnection object. Contributed by Pranav Saxena
… server calls (#6022) Address JDK bug JDK-8314978 related to handling of HTTP 100 responses. https://bugs.openjdk.org/browse/JDK-8314978 In the AbfsHttpOperation, after sendRequest() we call processResponse() method from AbfsRestOperation. Even if the conn.getOutputStream() fails due to expect-100 error, we consume the exception and let the code go ahead. This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after getOutputStream() has failed. These invocation all lead to server calls. This commit aims to prevent this. If connection.getOutputStream() fails due to an Expect-100 error, the ABFS client does not invoke getHeaderField(), getHeaderFields(), getHeaderFieldLong() or getInputStream(). getResponseCode() is safe as on the failure it sets the responseCode variable in HttpUrlConnection object. Contributed by Pranav Saxena
JIRA: https://issues.apache.org/jira/browse/HADOOP-18883
This is inline to JDK bug: https://bugs.openjdk.org/browse/JDK-8314978.
In the AbfsHttpOperation, after
sendRequest()we callprocessResponse()method from AbfsRestOperation. Even if theconn.getOutputStream()fails due to expect-100 error, we consume the exception and let the code go ahead. So, we can havegetHeaderField() / getHeaderFields() / getHeaderFieldLong()which will be triggered after getOutputStream is failed. These invocation will lead to server calls.This pr aims to prevent this. In the solution, if
conn.getOutputStream()is failed due to expect100 error, we will not invoke any getHeaderField(), getHeaderFields(), getHeaderFieldLong(), getInputStream(). getResponseCode() is safe, reason being, when expect100 fails, it sets theresponseCodevariable in HttpUrlConnection object. This would be just a getter for that.For a network trace without the change:
withoutchange.txt
Trace 27 fails the expect100 in abfsHttpOp.getOutputStream(). Then
Trace 36, 45, 54 are the server calls getting made from conn.getHeaderFields(), conn.getHeaderField(), getHeaderFieldLong() in abfsHttpOp.
Trace 65 is because of the retry from client (where we disable expect100 in the header and retry)
Network trace with the change:
withchange.txt
Trace 28 is because of the getOutpuStream(), then as you can see no other call is no repeated server call.
Trace 39 is because of the retry from client (where we disable expect100 in the header and retry)
Test run:
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
HNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:336 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
NonHNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAzureBlobFileSystemRandomRead.testSkipBounds:218->Assert.assertTrue:42->Assert.fail:89 There should not be any network I/O (elapsedTimeMs=38).
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:344->lambda$testAcquireRetry$6:345 » TestTimedOut
[INFO]
[ERROR] Tests run: 572, Failures: 1, Errors: 1, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44
AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:336 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 572, Failures: 0, Errors: 1, Skipped: 54
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41
Time taken: 45 mins 6 secs.
azureuser@Hadoop-VM-EAST2:~/hadoop/hadoop-tools/hadoop-azure$ git log
commit b15468f (HEAD -> expect100resolution, origin/expect100resolution)
Author: Pranav Saxena <>
Date: Tue Sep 5 03:18:01 2023 -0700