HADOOP-17338. Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc #2455
Conversation
looks good.
- some minor nits in the PR, mostly about logging.
- afraid you have to follow the testing_s3a.md rules and tell us which S3 region you ran all the integration tests against. Run with -Dscale, as that does more of the input stream seeking; see the sketch below.
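For reference, the scale run described in the testing docs looks roughly like this (a sketch only; it assumes you have an auth-keys.xml configured as testing_s3a.md describes, and exact flags may vary):

```bash
# run from the hadoop-aws module; -Dscale enables the larger scale tests,
# which exercise more of the input stream seek/abort paths
cd hadoop-tools/hadoop-aws
mvn verify -Dscale
```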
    @Retries.OnceRaw
    private void closeStream(String reason, long length, boolean forceAbort) {
      if (isObjectStreamOpen()) {
      if (!isObjectStreamOpen())
nit: wrap in { }. Also add a comment "stream is already closed"
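i.e. a minimal sketch of the suggested shape (names taken from the diff excerpt above; the rest of the method body is elided here):

```java
  @Retries.OnceRaw
  private void closeStream(String reason, long length, boolean forceAbort) {
    if (!isObjectStreamOpen()) {
      // stream is already closed
      return;
    }
    // ... existing close/abort logic ...
  }
```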
      // remaining object payload is read from S3 while closing the stream.
      LOG.debug("Aborting stream");
      wrappedStream.abort();
      LOG.warn("Aborting stream {}", uri);
revert to debug. We abort() a lot as we seek round files if read policy != random, and don't want application logs overloaded with messages
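i.e. roughly this (a sketch only, keeping the {} expansion but dropping the level back down):

```java
      // aborts are frequent when seeking around a file with a non-random
      // read policy, so log at debug to avoid flooding application logs
      LOG.debug("Aborting stream {}", uri);
      wrappedStream.abort();
```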
     * set
     */
    private volatile boolean closed;
    private S3Object object;
can you add a javadoc comment explaining why this is needed "we have to keep a reference to this to stop garbage collection breaking the input stream"
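e.g. something along these lines (the first sentence is illustrative; the rest is the wording requested above):

```java
  /**
   * The S3 object the wrapped stream reads from.
   * We have to keep a reference to this to stop garbage collection
   * breaking the input stream.
   */
  private S3Object object;
```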
| LOG.info("Got exception while trying to read from stream {}" + | ||
| " trying to recover: " + ioe, uri); | ||
| LOG.info("Got exception while trying to read from stream " + uri | ||
| + ", client: " + client + " object: " + object |
why the change? The previous one used SLF4J expansion, and deliberately only printed the IOE's message, not the stack.
If you want the stack, add a separate log at debug with the full details, and use SLF4J expansion of {} elements in the log string
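For illustration, that split would look something like this (message text is approximate, not the exact wording of the patch):

```java
    // concise message at info: SLF4J {} expansion, exception message only
    LOG.info("Got exception while trying to read from stream {}, trying to recover: {}",
        uri, ioe.toString());
    // full details, including the stack trace, only at debug
    LOG.debug("Failure of read on {}", uri, ioe);
```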
Thanks a lot for the review Steve! Will address the other comments asap.
The reason I made the change here is that the symptom is intermittent, and the stack is helpful when we need to diagnose issues. I hope to catch it when it occurs. My worry is that turning on debug will introduce a lot of other debug messages to the logs. But if exceptions happen here very often, we can move the stack printing to debug. Is that the case here though? Thanks.
leave at debug. We collect statistics on it, so if you see it a lot in the stats you can turn on debug logging for this class only.
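If it comes to that, enabling debug for just this class is a one-line log4j.properties change (assuming the stream class is org.apache.hadoop.fs.s3a.S3AInputStream):

```properties
# debug logging for the S3A input stream only
log4j.logger.org.apache.hadoop.fs.s3a.S3AInputStream=DEBUG
```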
yes, still want things at debug
Oh, noticed another JIRA I have related to improvements here:
Change #2 makes sense if there's an app reading many files in different threads: we want to know what is causing problems. Could you do that in this patch?
build failures are checkstyles. Be interesting to think of how to test this.
@yzhangal you active on this? I'd like to wrap it up this month, and all of it is done other than the log and checkstyle tuning.
Hi @steveloughran, sorry for the delay. I just did a new PR #2497, which hopefully addresses all of your comments here. I tried to run the tests by following https://hadoop.apache.org/docs/r3.1.4/hadoop-aws/tools/hadoop-aws/testing.html#The_submitter_of_any_patch_is_required_to_run_all_the_integration_tests_and_declare_which_S3_region.2Fimplementation_they_used., but I do see errors like timeouts, OOM etc. I wonder if the Hadoop Jenkins builds can be set up to run the s3a tests automatically like other tests. Thanks and happy holiday season.
I would like more test runs, e.g. the daily Jenkins runs, to at least have credentials, but I've yet to come up with a good design for secure execution. It'd need something like …
See: not easy. Put your error stack traces into the PR. A single test failure isn't enough to block a patch if we can identify a cause and say "this is independent". Given you are seeing things I'm not, that's something we need to understand.
Many thanks Steve. I see the challenge of supporting the tests upstream. I reran the hadoop-tools/hadoop-aws tests with the command "mvn verify -Dscale -DtestsThreadCount=8" and only got the following (some of the earlier failures I saw did not show): [ERROR] Errors: … Wonder if the failed tests work for you. Thanks.
Closing this as superseded by #2497.