
Conversation

@yzhangal
Contributor

…Content-Length delimited message body etc

NOTICE

Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

Contributor

@steveloughran left a comment


looks good.
- some minor nits in the PR, mostly about logging.
- afraid you have to follow the testing_s3a.md rules and tell us which S3 region you ran all the integration tests against. Do this with -Dscale, as that does more of the input stream seeking

@Retries.OnceRaw
private void closeStream(String reason, long length, boolean forceAbort) {
if (isObjectStreamOpen()) {
if (!isObjectStreamOpen())
Contributor


nit: wrap in { }. Also add a comment "stream is already closed"
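For illustration, a minimal sketch of the braced early-return form this nit asks for. The class and stream field below are stubs for illustration only, not the actual S3AInputStream code:

```java
// Stub class illustrating the requested brace style; not the real S3AInputStream.
public class CloseStreamSketch {
  private Object wrappedStream = new Object(); // stand-in for the wrapped S3 stream

  public boolean isObjectStreamOpen() {
    return wrappedStream != null;
  }

  public void closeStream(String reason, long length, boolean forceAbort) {
    if (!isObjectStreamOpen()) {
      // stream is already closed
      return;
    }
    // ... close or abort the wrapped stream here ...
    wrappedStream = null;
  }

  public static void main(String[] args) {
    CloseStreamSketch s = new CloseStreamSketch();
    s.closeStream("demo", 0L, false);
    System.out.println("open after close: " + s.isObjectStreamOpen());
  }
}
```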

// remaining object payload is read from S3 while closing the stream.
LOG.debug("Aborting stream");
wrappedStream.abort();
LOG.warn("Aborting stream {}", uri);
Contributor


revert to debug. We abort() a lot as we seek round files if read policy != random, and we don't want application logs overloaded with messages

* set
*/
private volatile boolean closed;
private S3Object object;
Contributor


can you add a javadoc comment explaining why this is needed "we have to keep a reference to this to stop garbage collection breaking the input stream"

LOG.info("Got exception while trying to read from stream {}" +
" trying to recover: " + ioe, uri);
LOG.info("Got exception while trying to read from stream " + uri
+ ", client: " + client + " object: " + object
Contributor


why the change? The previous one used SLF4J expansion, and deliberately only printed the IOE's message, not the stack.
If you want the stack, add a separate log at debug with the full details, and use SLF4J expansion of {} elements in the log string
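To make the SLF4J point concrete, here is a toy re-implementation of the "{}" placeholder expansion the reviewer refers to. This helper is illustrative only; real code calls org.slf4j.Logger directly, where passing a Throwable as the final extra argument also prints its stack trace:

```java
// Toy demo of SLF4J-style "{}" placeholder expansion; not the SLF4J API itself.
public class Slf4jStyleDemo {
  public static String expand(String pattern, Object... args) {
    StringBuilder out = new StringBuilder();
    int argIndex = 0;
    for (int i = 0; i < pattern.length(); i++) {
      if (i + 1 < pattern.length() && pattern.charAt(i) == '{'
          && pattern.charAt(i + 1) == '}' && argIndex < args.length) {
        out.append(args[argIndex++]); // substitute next argument for "{}"
        i++; // skip the closing '}'
      } else {
        out.append(pattern.charAt(i));
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(expand("Aborting stream {}", "s3a://bucket/key"));
  }
}
```

With a real SLF4J logger, `LOG.debug("Got exception reading {}: {}", uri, ioe.toString())` logs the message only, while appending the exception itself as a trailing argument (`LOG.debug("Got exception reading {}", uri, ioe)`) also logs the full stack.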

Contributor Author


Thanks a lot for the review Steve! Will address other comments asap.

The reason I made the change here is that the symptom is intermittent, and the stack is helpful when we need to diagnose issues. I hope to catch it when it occurs. My worry is that turning on debug will introduce a lot of other debug messages to the logs. But if exceptions happen here very often, we can move the stack printing to debug. Is that the case here though? Thanks.

Contributor


leave at debug. We collect statistics on it, so if you see it a lot in the stats you can turn on debug logging for this class only.

@steveloughran steveloughran added bug fs/s3 changes related to hadoop-aws; submitter must declare test endpoint labels Nov 12, 2020
@steveloughran steveloughran self-requested a review November 12, 2020 12:58
@apache apache deleted a comment from hadoop-yetus Nov 16, 2020
Contributor

@steveloughran left a comment


yes, still want things at debug

@steveloughran
Contributor

Oh, noticed another JIRA I have related to improvements here:
https://issues.apache.org/jira/browse/HADOOP-15944

1. on exceptions the full stack should go in @ debug (I think you do this)
2. include the URL of the object in all exceptions

Change #2 makes sense if there's an app reading many files in different threads: we want to know what is causing problems.

Could you do that in this patch?
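A sketch of what change #2 could look like: wrapping the caught IOException so the object's URI appears in the message, while keeping the original exception as the cause for the debug-level stack. The names below are hypothetical, not the actual S3AInputStream code:

```java
import java.io.IOException;

// Hypothetical sketch: put the object URI into rethrown exceptions so that an
// app reading many files across threads can tell which file failed.
public class UriInException {
  public static int read(String uri) throws IOException {
    try {
      return readRaw(uri);
    } catch (IOException e) {
      // Wrap with the URI; keep the original as the cause so the full
      // stack is still available for debug-level logging.
      throw new IOException("Failed to read " + uri + ": " + e.getMessage(), e);
    }
  }

  // Stand-in for the underlying stream read, which here always fails.
  static int readRaw(String uri) throws IOException {
    throw new IOException("connection reset");
  }

  public static void main(String[] args) {
    try {
      read("s3a://bucket/path/file");
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```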

@steveloughran steveloughran self-requested a review November 17, 2020 11:26
@steveloughran
Contributor

build failures are checkstyles

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java:555:    if (!isObjectStreamOpen()): 'if' construct must use '{}'s. [NeedBraces]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java:586:          LOG.warn("When closing {} stream for {}, will abort the stream", uri, reason, e);: Line is longer than 80 characters (found 91). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java:597:          LOG.warn("When aborting {} stream after failing to close it for {}", uri, reason, e);: Line is longer than 80 characters (found 95). [LineLength]

Be interesting to think of how to test this.

@steveloughran
Contributor

@yzhangal you active on this? I'd like to wrap it up this month, and all of it is done other than the log and checkstyle tuning

@yzhangal
Contributor Author

> @yzhangal you active on this? I'd like to wrap it up this month, and all of it is done other than the log and checkstyle tuning

Hi @steveloughran, sorry for the delay. I just did a new PR #2497, which hopefully addresses all of your comments here. I tried to run the tests by following https://hadoop.apache.org/docs/r3.1.4/hadoop-aws/tools/hadoop-aws/testing.html#The_submitter_of_any_patch_is_required_to_run_all_the_integration_tests_and_declare_which_S3_region.2Fimplementation_they_used., but I do see errors like timeouts, OOM, etc. I wonder if the Hadoop Jenkins test can be set up to do the s3a tests automatically like other tests. Thanks and happy holiday season.

@steveloughran
Contributor

> I wonder if the hadoop jenkins test can be set up to do the s3a test automatically like other tests.

1. we can't give it credentials for security reasons -even if we only issued short-lived session credentials, getting them would be as trivial as submitting a PR which printed them. Same for abfs.
2. if someone isn't set up to run the tests, they aren't set up to deal with regressions or debug why their own patch doesn't work.
3. there's an extra benefit: because everyone's config is slightly different (network, endpoints, encryption, etc.) we get better coverage of test configurations by having different people run the tests. It's not unusual for a patch to get merged but then need a followup a few days later when someone else finds a regression in their test setup.

I would like more test runs, e.g the daily jenkins runs, to at least have credentials, but I've yet to come up with a good design for secure execution. It'd need something like

See: not easy.

Put your error stack traces into the PR. A single test failure isn't enough to block a patch if we can identify a cause and say "this is independent". Given you are seeing things I'm not, that's something we need to understand.

@yzhangal
Contributor Author

yzhangal commented Dec 2, 2020

Many thanks Steve. I see the challenge of supporting the tests at upstream.

I reran the hadoop-tools/hadoop-aws tests with the command "mvn verify -Dscale -DtestsThreadCount=8" and only got the following (some of the earlier failures I saw did not show up):

[ERROR] Errors:
[ERROR] ITestS3AContractRootDir.testListEmptyRootDirectory:82->AbstractContractRootDirectoryTest.testListEmptyRootDirectory:196 » TestTimedOut
[ERROR] ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRecursiveRootListing:265 » TestTimedOut
[ERROR] ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive:101 » TestTimedOut
[ERROR] ITestS3ATemporaryCredentials.testSTS:133 » AccessDenied : request session cred...
[INFO]
[ERROR] Tests run: 260, Failures: 0, Errors: 4, Skipped: 45

Wonder if the failed tests work for you.

Thanks.

@steveloughran
Contributor

Closing this as superseded by #2497.

