HADOOP-16221 add option to fail operation on metadata write failure #666
add an entry for this in org.apache.hadoop.fs.s3a.S3ARetryPolicy for whatever policy we think matters. For now, set it to fail
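A simplified sketch of what "add an entry and set it to fail" could look like. This is not the real S3ARetryPolicy code; the class, enum, and the MetadataPersistenceException stand-in below are illustrative only — the actual Hadoop class builds a map from exception classes to retry-policy objects.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for org.apache.hadoop.fs.s3a.S3ARetryPolicy, which maps
// exception classes to retry policies. Only the shape of "register the new
// exception and set it to fail" is shown here.
public class RetryPolicySketch {
    enum Policy { FAIL, RETRY }

    // Stand-in for the MetadataPersistenceException introduced by this patch.
    static class MetadataPersistenceException extends IOException { }

    static final Map<Class<? extends Exception>, Policy> POLICY_MAP = new HashMap<>();
    static {
        // Per the review comment: for now, fail fast rather than retry,
        // since retrying around a failing metastore only delays the error.
        POLICY_MAP.put(MetadataPersistenceException.class, Policy.FAIL);
    }

    static Policy policyFor(Exception e) {
        return POLICY_MAP.getOrDefault(e.getClass(), Policy.RETRY);
    }
}
```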
need to change the policy annotation here
right, if you are getting into the failure handling, you are going to have to look closely at the invoker retry stuff and those @Retry annotations we do our best to keep up to date. This means: any change to stop swallowing DDB exceptions will need changes in the annotation values and commentary on all uses of the finishedWrite() method, and a review of "is everything consistent with my changes?". Yes, it's a pain, and yes, it's manual, but it's there to help us all understand how things fail, what kinds of exceptions get raised, etc. It was that or javadocs: at least with these annotations you could imagine generating a graph of invocations.
🎊 +1 overall
This message was automatically generated.
Thanks for the feedback @steveloughran ! To be honest, the purpose of the retry annotations hadn't really sunk in for me until you mentioned them. I pushed an update but I don't think I'm grasping the annotations well enough to have gotten it correct. It's not clear to me if the annotation is supposed to be about the code within the immediate scope of the method or the transitive scope. For example, the code immediately within the finishedWrite() method is not retried and the method itself does not translate any exceptions that occur within it. As such, I documented it as "OnceRaw". That said, S3Guard.putAndReturn() transitively does include retries, and there is translation on those exceptions. That occurs within DynamoDBMetadataStore.processBatchWriteRequest(). This had me wondering whether finishedWrite() should have been documented as "RetryTranslated". Further complicating things is that finishedWrite() now sometimes (depending on configuration) swallows exceptions. I guess I could have used a string argument passed to the annotation to document that.
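For readers unfamiliar with the annotations under discussion: org.apache.hadoop.fs.s3a.Retries defines documentation-only marker annotations such as OnceRaw and RetryTranslated. The toy re-creation below shows the shape; the annotation comment text and the example method body are illustrative, not taken from the patch.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Toy re-creation of the marker-annotation style used by
// org.apache.hadoop.fs.s3a.Retries. These annotations have no runtime
// behavior; they document whether a method retries internally and whether
// the exceptions it raises have been translated from AWS SDK exceptions.
public class RetriesSketch {
    @Documented @Retention(RetentionPolicy.SOURCE) @Target(ElementType.METHOD)
    @interface OnceRaw { String value() default ""; }

    @Documented @Retention(RetentionPolicy.SOURCE) @Target(ElementType.METHOD)
    @interface RetryTranslated { String value() default ""; }

    // Example use: per the thread's eventual resolution, the annotation
    // describes the whole operation as seen by the caller, including retries
    // and translation performed further down the call chain.
    @RetryTranslated("retries and translation happen in the metastore layer")
    void finishedWrite() { /* ... */ }
}
```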
The DynamoDB metastore putAndReturn() does retry; what's changing now is that raw S3 FS delete exceptions are being swallowed; when eventually S3Guard gives up, that's thrown. Afraid you are going to have to chase through that call chain to see what's going on and make sure things are consistent w.r.t. retry/translation, and add clarifications. If there's some stuff in between which isn't fully annotated (putAndReturn()), this is the chance to fix that. I know it's a pain, but trying to wrap retry() with retry() causes an exponential explosion in the time a failure takes to surface, and we need to keep absolutely on top of that (more than the translated/raw stuff).
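The cost of wrapping retry() in retry() can be shown with a self-contained toy: with 3 attempts at each of two layers, the failing call runs 9 times, and with Hadoop-style backoff delays at each layer the waiting time multiplies the same way. This is a sketch, not Hadoop's actual Invoker.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy demonstration of why retry() wrapped in retry() is dangerous:
// attempt counts multiply across layers.
public class NestedRetrySketch {
    interface Op { void run() throws Exception; }

    static void retry(int attempts, Op op) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try { op.run(); return; } catch (Exception e) { last = e; }
        }
        throw last;
    }

    static int attemptsForNestedFailure() {
        AtomicInteger calls = new AtomicInteger();
        try {
            // Outer retry wraps an inner retry around an always-failing op.
            retry(3, () -> retry(3, () -> {
                calls.incrementAndGet();
                throw new Exception("simulated DynamoDB write failure");
            }));
        } catch (Exception expected) {
            // both layers exhausted
        }
        return calls.get(); // 3 outer attempts x 3 inner attempts = 9 calls
    }
}
```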
Yeah, I think I had the wrong impression of the expectations for the annotations.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
I think I have this wrong. Now my impression is I shouldn't have changed this annotation. This entire operation is effectively RetryTranslated. This method itself retries and translates exceptions on complete-MPU and the finishedWrite() method retries and translates exceptions internal to itself (via the retrying and translating occurring in DynamoDBMetadataStore). Is that how you would think of it?
if AWS exceptions are being caught & stuck through translateException, they are retry-translated. The key thing is: by the time any operation gets to the public FS APIs, they must be translated. Double translation is harmless; it's retry-on-retry that is the problem.
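The "double translation is harmless" point can be sketched in a self-contained way. translateException here is a toy loosely modeled on the idea behind org.apache.hadoop.fs.s3a.S3AUtils.translateException; the signature and the pass-through-if-already-IOException check are part of the sketch, not a claim about the real Hadoop API.

```java
import java.io.IOException;

// Toy illustration of exception translation being idempotent: once an
// AWS-SDK-style RuntimeException has been converted into an IOException,
// translating a second time changes nothing.
public class TranslateSketch {
    // Stand-in for an untranslated AWS SDK runtime exception.
    static class SdkClientException extends RuntimeException {
        SdkClientException(String message) { super(message); }
    }

    static IOException translateException(String operation, Exception e) {
        if (e instanceof IOException) {
            return (IOException) e;   // already translated: pass through
        }
        return new IOException(operation + ": " + e.getMessage(), e);
    }
}
```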
Thanks. Looks like I can revert the change on this line and leave it as it was before, since it already said Retries.RetryTranslated and the extra text I added isn't helping any.
Updated in 8ff5427
I tried to improve the documentation of this class here. I'm curious if I have captured the real intended meaning/purpose of these annotations? If so, hopefully it will help the next developer understand the goal better than I did initially.
Yes, though I'd be stricter about the wrapping of retries. You mustn't retry around a retrying operation as you'll just cause things to take so long that operations will time out anyway. The raw/translated flags are markers about whether to expect AWS-library Runtime Exceptions or translated stuff, or a mixture. It's less critical, but once everything is pure IOE, there's no need to catch and convert again.
Tweaked in ef91a67
💔 -1 overall
This message was automatically generated.

🎊 +1 overall
This message was automatically generated.

🎊 +1 overall
This message was automatically generated.
should use {@value} instead of false in the javadoc like elsewhere in this file
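The convention being referenced, sketched on a hypothetical constant (the real constant name in the patch may differ): with {@value}, the javadoc always renders the constant's actual value, so the documented default can never drift out of sync with the code.

```java
// Sketch of the {@value} javadoc convention the reviewer points to.
// The constant name here is hypothetical.
public class ConstantsSketch {
    /**
     * Whether an S3A write should fail when the S3Guard metadata update
     * fails. Default: {@value}.
     */
    public static final boolean FAIL_ON_METADATA_WRITE_ERROR_DEFAULT = true;
}
```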
updated
since this exception won't happen unless the config allows it, maybe reword to something like "could unacceptably not be saved to the metadata store" as a hint about that?
updated
change @throws from IOException to more specific MetadataPersistenceException
updated
typo, whichcase -> which case
updated
would it be helpful to change key -> p here, so that whoever is reading the error log has the qualified path? also the : at the end of the message string is not necessary
updated
Might prefix this with something like "If the write operation cannot be programmatically retried, ..." since that would probably be the preferred remedy for this exception.
updated
see comment above
updated
excepted -> expected
updated
add another test where the only difference is that this is flipped to false (or never set since that's the default), that verifies no MetadataPersistenceException is thrown
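The pair of behaviors the two tests should pin down can be modeled in a self-contained way. This is a toy model of the decision logic only, not the real S3AFileSystem or ITest code; all names are illustrative.

```java
import java.io.IOException;

// Toy model of what the paired tests verify: with the fail flag on, a
// metastore failure surfaces as a MetadataPersistenceException; with it off
// (the default at the time of this review), the failure is swallowed.
public class FailOnWriteErrorSketch {
    static class MetadataPersistenceException extends IOException { }

    static void finishedWrite(boolean failOnMetadataWriteError,
                              boolean metastoreWriteFails) throws IOException {
        if (metastoreWriteFails) {
            if (failOnMetadataWriteError) {
                throw new MetadataPersistenceException();
            }
            // else: swallow the failure (the real code logs it)
        }
    }

    // Helper so callers can probe the behavior without try/catch noise.
    static boolean surfacesFailure(boolean failFlag, boolean metastoreFails) {
        try {
            finishedWrite(failFlag, metastoreFails);
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```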
Updated. Also noticed the test class was incorrectly named and fixed it. It was really an integration test, and furthermore the convention in this project is that test class names are prefixed, rather than suffixed, with Test or ITest.
💔 -1 overall
This message was automatically generated.

Another case of Yetus being unable to apply the patch from the PR. This time I just squash-rebased on trunk and force-pushed.
🎊 +1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
Looks like the test failures were due to some sort of Jenkins/Yetus infrastructure issue. The logs are full of "java.lang.OutOfMemoryError: unable to create new native thread". I'll use the trick Gabor mentioned on my other PR of amending my commit message and force-pushing to get it to build again.
I think the h1 level heading here was a mistake made when this section was added. I fixed it when addressing the merge conflict with my changes.
yeah, that's happened in a couple of places. thanks
commit a03e02eb93c86f7ff2c5bac3db85a12d1bc166fe
Merge: cc5e44c988a df76cdc
Author: Ben Roling <[email protected]>
Date: Thu Apr 18 16:09:16 2019 -0500

    Merge branch 'trunk' into HADOOP-16221

commit cc5e44c988aeba2a8f1e8899518a55482d52d93d
Merge: 8ff542741ab 04c0437
Author: Ben Roling <[email protected]>
Date: Tue Apr 16 22:40:16 2019 -0500

    Merge branch 'trunk' into HADOOP-16221

commit 8ff542741ab3386a9ce967fb2f36fab2265413f4
Author: Ben Roling <[email protected]>
Date: Tue Apr 16 22:27:14 2019 -0500

    Incorporate PR review feedback

commit 5d02d2d07ba3cc9503c1b5d2d8b75eb5818db5b8
Author: Ben Roling <[email protected]>
Date: Thu Apr 4 15:27:28 2019 -0500

    Fix broken ITestS3AMetadataPersistenceException
    Previously the test could fail if the test file already existed.
    Edit: commit message ammended to force a yetus rebuild.

commit b359f27be46193a99bb23b96eff9fc5b64f9d25e
Author: Ben Roling <[email protected]>
Date: Fri Mar 29 11:31:13 2019 -0500

    HADOOP-16221 add option to fail operation on metadata write failure
Squashed and force-pushed to get Yetus to build it again.
I did another run of the tests (with us-west-2). Tests had errors or failures:

I re-ran each of these individually and they passed:
🎊 +1 overall
This message was automatically generated.
I've just been retesting this... happy with the changes in the operation; just two things I want to make sure we are all good with.

(1) There's no way you'd ever want this to be disabled when in auth mode. But in condition #2, even if we recover, there will be a period of inconsistency. Should we silently swallow this, or raise an exception? I'm coming round to the "this will always be on unless you somehow want to disable it" viewpoint too. Because if you aren't updating the store for some reason (example: you don't have write perms on the table), well, that merits a failure, doesn't it? Accordingly, I'm going to propose saying "swallow metastore update failures" is a special case people should be explicitly asking for.

Returning to this patch then: I'm happy with it, with some small changes:
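The proposal amounts to shipping the property with a default of true and treating false as an explicit opt-in. The property name comes from the patch itself; the surrounding core-site.xml override below is a sketch of how a user who really does want swallowing would express it.

```xml
<!-- Only set this if you explicitly want S3Guard metadata write failures
     to be swallowed (logged but not raised). The shipped default is true:
     fail the write operation. -->
<property>
  <name>fs.s3a.metadatastore.fail.on.write.error</name>
  <value>false</value>
</property>
```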
Great! I'll get a commit posted with the changes soon.
@steveloughran I pushed the changes you suggested. I also ran the tests again against us-west-2. The one error was in ITestDirectoryCommitMRJob, which succeeded on a re-run.
    <property>
      <name>fs.s3a.metadatastore.fail.on.write.error</name>
      <value>false</value>
should change to true here
oops, definitely, thanks
fixed in aeccca1, will run tests again
    Programmatic retries of the original operation would require overwrite=true.
    Suppose the original operation was FileSystem.create(myFile, ovewrite=false).
ovewrite -> overwrite
fixed in 056b52a
    By default, S3AFileSystem write operations will fail when updates to
    S3Guard metadata fail. S3AFileSystem first writes the file to S3 and then
    updates the metadata in S3Guard. If the metadata write fails,
whitespace:end of line
💔 -1 overall
This message was automatically generated.
    By default, S3AFileSystem write operations will fail when updates to
    S3Guard metadata fail. S3AFileSystem first writes the file to S3 and then
    updates the metadata in S3Guard. If the metadata write fails,
whitespace:end of line
💔 -1 overall
This message was automatically generated.
Ran the tests again against us-west-2. The errors and failures were in ITestS3AContractGetFileStatusV1List, ITestS3AContractRename, and ITestDirectoryCommitMRJob. Each passed when run individually.
🎊 +1 overall
This message was automatically generated.
OK, I'm +1 on this. Applying the patch locally and then doing a full retest; if all is well, I'll commit it.
+1, committed after test run |
Initial patch. Curious for any feedback, particularly with regard to the default. I have left the default as matching current behavior, but it doesn't feel right.