
Conversation

@ben-roling
Contributor

Initial patch. Curious for any feedback, particularly with regard to the default. I have left the default matching the current behavior, but it doesn't feel right.

Contributor

add an entry for this in org.apache.hadoop.fs.s3a.S3ARetryPolicy for whatever policy we think matters. For now, set it to fail
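For illustration only, a self-contained sketch of the kind of mapping being suggested, built on hadoop-common's RetryPolicies API rather than the real S3ARetryPolicy internals; the MetadataPersistenceException stand-in and all class names here are assumptions:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public final class MetadataFailurePolicySketch {

  // Stand-in for the exception this patch introduces; included only so the
  // sketch compiles on its own.
  public static class MetadataPersistenceException extends IOException {
    public MetadataPersistenceException(String message) {
      super(message);
    }
  }

  /**
   * Build a policy that retries most IOEs but fails immediately on a
   * metadata persistence failure, i.e. "for now, set it to fail".
   */
  public static RetryPolicy buildPolicy() {
    RetryPolicy fail = RetryPolicies.TRY_ONCE_THEN_FAIL;
    RetryPolicy retry = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        3, 500, TimeUnit.MILLISECONDS);
    Map<Class<? extends Exception>, RetryPolicy> policyMap = new HashMap<>();
    policyMap.put(MetadataPersistenceException.class, fail);
    return RetryPolicies.retryByException(retry, policyMap);
  }

  private MetadataFailurePolicySketch() {
  }
}
```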

Contributor

need to change the policy annotation here

@steveloughran
Contributor

right, if you are getting into the failure handling, you are going to have to look closely at the invoker retry stuff and those @Retry annotations we do our best to keep up to date.

This means: any change to stop swallowing DDB exceptions will need changes in the annotation values and commentary on all uses of the finishedWrite() method, and a review of "is everything consistent with my changes?". Yes, it's a pain, and yes, it's manual, but it's there to help us all understand how things fail, what kind of exceptions get raised, etc. It was that or javadocs: at least with these annotations you could imagine generating some graph of invocations.

  • there must not be retries around retries; WriteOperationsHelper.finalizeMultipartUpload() is doing retries, though. It may be best to pull the finishedWrite() call there into its own once() call.
  • I'm also not sure if we want to wrap the IOE with another IOE, not unless we really think it makes sense either in retry handling or diagnostics. Normally I'd say "don't do it at all", but this time I'm open to discussion, precisely because it lets us change policies.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 27 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 22 Maven dependency ordering for branch
+1 mvninstall 1047 trunk passed
+1 compile 939 trunk passed
+1 checkstyle 193 trunk passed
+1 mvnsite 120 trunk passed
+1 shadedclient 1047 branch has no errors when building and testing our client artifacts.
+1 findbugs 158 trunk passed
+1 javadoc 103 trunk passed
_ Patch Compile Tests _
0 mvndep 22 Maven dependency ordering for patch
+1 mvninstall 76 the patch passed
+1 compile 897 the patch passed
+1 javac 897 the patch passed
+1 checkstyle 194 the patch passed
+1 mvnsite 120 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 1 The patch has no ill-formed XML file.
+1 shadedclient 678 patch has no errors when building and testing our client artifacts.
+1 findbugs 203 the patch passed
+1 javadoc 81 the patch passed
_ Other Tests _
+1 unit 516 hadoop-common in the patch passed.
+1 unit 286 hadoop-aws in the patch passed.
+1 asflicense 47 The patch does not generate ASF License warnings.
6745
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/1/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux de4bfcf25a3a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 7dc0ecc
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/1/testReport/
Max. process+thread count 1348 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/1/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@ben-roling
Contributor Author

Thanks for the feedback @steveloughran !

To be honest, the purpose of the retry annotations hadn't really sunk in for me until you mentioned them. I pushed an update but I don't think I'm grasping the annotations well enough to have gotten it correct. It's not clear to me if the annotation is supposed to be about the code within the immediate scope of the method or the transitive scope. For example, the code immediately within the finishedWrite() method is not retried and the method itself does not translate any exceptions that occur within it. As such, I documented it as "OnceRaw". That said, S3Guard.putAndReturn() transitively does include retries, and there is translation on those exceptions. That occurs within DynamoDBMetadataStore.processBatchWriteRequest(). This had me wondering whether finishedWrite() should have been documented as "RetryTranslated".

Further complicating things is that finishedWrite() now sometimes (depending on configuration) swallows exceptions. I guess I could have used a string argument passed to the annotation to document that.

Contributor

DynamoDB metastore putAndReturn does retry; what's changing now is that raw S3 FS Delete exceptions are being swallowed; when eventually S3Guard gives up, that's thrown. Afraid you are going to have to chase through that call chain to see what's going on and make sure things are consistent w.r.t. retry/translation, with clarifications where needed. If there's some stuff in between which isn't fully annotated (putAndReturn()), this is the chance to fix that. I know it's a pain, but trying to wrap retry() with retry() causes an exponential explosion in the time a failure takes to surface, and we need to keep absolutely on top of that (more than the translated/raw stuff).

Contributor Author

Yeah, I think I had the wrong impression of the expectations for the annotations.

Contributor Author

I think I have this wrong. Now my impression is I shouldn't have changed this annotation. This entire operation is effectively RetryTranslated. This method itself retries and translates exceptions on complete-MPU and the finishedWrite() method retries and translates exceptions internal to itself (via the retrying and translating occurring in DynamoDBMetadataStore). Is that how you would think of it?

Contributor

If AWS exceptions are being caught and stuck through translateException, they are retry-translated. The key thing is: by the time any operation gets to the public FS APIs, they must be translated. Double translation is harmless; retry-on-retry is the thing to avoid.
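As a minimal sketch of what "translated" means here (the method body and names are illustrative; in S3A the conversion is done by S3AUtils.translateException):

```java
import java.io.IOException;

import com.amazonaws.AmazonClientException;

public final class TranslationSketch {

  /**
   * Catch the AWS SDK's unchecked exceptions and convert them to IOEs
   * before they can escape through a public FileSystem API.
   */
  public static void doSomething(String path) throws IOException {
    try {
      callRawSdk(path); // may throw AmazonClientException (unchecked)
    } catch (AmazonClientException e) {
      // In S3A this goes through S3AUtils.translateException(); a plain
      // wrap is used here to keep the sketch self-contained.
      throw new IOException("doSomething on " + path + ": " + e, e);
    }
  }

  private static void callRawSdk(String path) {
    // placeholder for a raw AWS SDK call
  }

  private TranslationSketch() {
  }
}
```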

Contributor Author

Thanks. Looks like I can just revert the change on this line and leave it as it was before, since it already said Retries.RetryTranslated and the extra text I added isn't helping any.

Contributor Author

Updated in 8ff5427

Contributor Author

I tried to improve the documentation of this class here. I'm curious whether I have captured the real intended meaning/purpose of these annotations. If so, hopefully it will help the next developer understand the goal better than I did initially.

Contributor

Yes, though I'd be stricter about the wrapping of retries. You mustn't retry around a retrying operation as you'll just cause things to take so long that operations will time out anyway. The raw/translated flags are markers about whether to expect AWS-library Runtime Exceptions or translated stuff, or a mixture. It's less critical, but once everything is pure IOE, there's no need to catch and convert again.
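To make the "no retry around retry" rule concrete, here is a hypothetical, self-contained sketch; the retry()/once() helpers and method names are stand-ins for the Invoker machinery, not the real hadoop-aws code:

```java
import java.io.IOException;

public final class RetryNestingSketch {

  @FunctionalInterface
  interface IOCall {
    void apply() throws IOException;
  }

  /** Simplistic stand-in for Invoker.retry(): repeat the call a few times. */
  static void retry(String action, int attempts, IOCall op) throws IOException {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        op.apply();
        return;
      } catch (IOException e) {
        last = e;
      }
    }
    throw last;
  }

  /** Stand-in for Invoker.once(): a single attempt, no retry loop. */
  static void once(String action, IOCall op) throws IOException {
    op.apply();
  }

  /** The complete-MPU call is retried here... */
  static void finalizeMultipartUpload() throws IOException {
    retry("complete multipart upload", 3, RetryNestingSketch::completeMultipartUpload);
    // ...but finishedWrite() already retries internally (in the metastore
    // layer), so it is wrapped in once() to avoid retry-around-retry.
    once("finished write", RetryNestingSketch::finishedWrite);
  }

  static void completeMultipartUpload() throws IOException {
    // placeholder for the S3 complete-MPU request
  }

  static void finishedWrite() throws IOException {
    // placeholder: updates S3Guard, retrying and translating internally
  }

  private RetryNestingSketch() {
  }
}
```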

Contributor Author

Tweaked in ef91a67

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 28 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 19 Maven dependency ordering for branch
+1 mvninstall 982 trunk passed
+1 compile 947 trunk passed
+1 checkstyle 193 trunk passed
+1 mvnsite 209 trunk passed
+1 shadedclient 1520 branch has no errors when building and testing our client artifacts.
+1 findbugs 364 trunk passed
+1 javadoc 186 trunk passed
_ Patch Compile Tests _
0 mvndep 42 Maven dependency ordering for patch
+1 mvninstall 198 the patch passed
+1 compile 966 the patch passed
+1 javac 966 the patch passed
-0 checkstyle 192 root: The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8)
+1 mvnsite 115 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 2 The patch has no ill-formed XML file.
+1 shadedclient 674 patch has no errors when building and testing our client artifacts.
+1 findbugs 169 the patch passed
+1 javadoc 102 the patch passed
_ Other Tests _
-1 unit 501 hadoop-common in the patch failed.
+1 unit 284 hadoop-aws in the patch passed.
+1 asflicense 39 The patch does not generate ASF License warnings.
7671
Reason Tests
Failed junit tests hadoop.util.TestDiskCheckerWithDiskIo
hadoop.util.TestReadWriteDiskValidator
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/2/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux dd21068f989b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 56f1e13
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/2/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-666/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/2/testReport/
Max. process+thread count 1347 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/2/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 29 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 23 Maven dependency ordering for branch
+1 mvninstall 980 trunk passed
+1 compile 930 trunk passed
+1 checkstyle 184 trunk passed
+1 mvnsite 108 trunk passed
+1 shadedclient 929 branch has no errors when building and testing our client artifacts.
+1 findbugs 141 trunk passed
+1 javadoc 81 trunk passed
_ Patch Compile Tests _
0 mvndep 21 Maven dependency ordering for patch
+1 mvninstall 74 the patch passed
+1 compile 892 the patch passed
+1 javac 892 the patch passed
-0 checkstyle 193 root: The patch generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9)
+1 mvnsite 121 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 1 The patch has no ill-formed XML file.
+1 shadedclient 688 patch has no errors when building and testing our client artifacts.
+1 findbugs 166 the patch passed
+1 javadoc 102 the patch passed
_ Other Tests _
+1 unit 519 hadoop-common in the patch passed.
+1 unit 284 hadoop-aws in the patch passed.
+1 asflicense 47 The patch does not generate ASF License warnings.
6489
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/3/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux 6e6772c25632 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 56f1e13
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/3/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/3/testReport/
Max. process+thread count 1400 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/3/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 30 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 23 Maven dependency ordering for branch
+1 mvninstall 1001 trunk passed
+1 compile 978 trunk passed
+1 checkstyle 186 trunk passed
+1 mvnsite 120 trunk passed
+1 shadedclient 999 branch has no errors when building and testing our client artifacts.
+1 findbugs 147 trunk passed
+1 javadoc 90 trunk passed
_ Patch Compile Tests _
0 mvndep 21 Maven dependency ordering for patch
+1 mvninstall 72 the patch passed
+1 compile 928 the patch passed
+1 javac 928 the patch passed
-0 checkstyle 187 root: The patch generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9)
+1 mvnsite 111 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 1 The patch has no ill-formed XML file.
+1 shadedclient 637 patch has no errors when building and testing our client artifacts.
+1 findbugs 166 the patch passed
+1 javadoc 92 the patch passed
_ Other Tests _
+1 unit 505 hadoop-common in the patch passed.
+1 unit 285 hadoop-aws in the patch passed.
+1 asflicense 45 The patch does not generate ASF License warnings.
6552
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/4/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux 72b7e2a6ab1c 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 56f1e13
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/4/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/4/testReport/
Max. process+thread count 1509 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/4/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

Contributor

should use {@value} instead of false in the javadoc like elsewhere in this file
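For reference, the pattern being asked for looks roughly like this; the constant is assumed to match the FAIL_ON_METADATA_WRITE_ERROR_DEFAULT name discussed later, and {@value} lets the javadoc pick up the actual constant value instead of a hard-coded "false":

```java
public final class ConstantsSketch {

  /**
   * Default value of the fail-on-metadata-write-error setting: {@value}.
   */
  public static final boolean FAIL_ON_METADATA_WRITE_ERROR_DEFAULT = false;

  private ConstantsSketch() {
  }
}
```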

Contributor Author

updated

Contributor

since this exception won't happen unless the config allows it, maybe reword to something like "could unacceptably not be saved to the metadata store" as a hint about that?

Contributor Author

updated

Contributor

change @throws from IOException to more specific MetadataPersistenceException

Contributor Author

updated

Contributor

typo, whichcase -> which case

Contributor Author

updated

Contributor

would it be helpful to change key -> p here, so that whoever is reading the error log has the qualified path? also the : at the end of the message string is not necessary

Contributor Author

updated

Contributor

Might prefix this with something like "If the write operation cannot be programmatically retried, ..." since that would probably be the preferred remedy for this exception.

Contributor Author

updated

Contributor

see comment above

Contributor Author

updated

Contributor

excepted -> expected

Contributor Author

updated

Contributor

add another test where the only difference is that this is flipped to false (or never set since that's the default), that verifies no MetadataPersistenceException is thrown

Contributor Author

Updated. I also noticed the test class was incorrectly named and fixed it. It was really an integration test, and the convention in the project is that test class names are prefixed, rather than suffixed, with Test or ITest.
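A rough sketch of the test pair being discussed, with a simulated metastore failure standing in for the real fault injection; all class and method names here are illustrative, not the actual ITestS3AMetadataPersistenceException code:

```java
import static org.junit.Assert.fail;

import java.io.IOException;

import org.junit.Test;

public class MetadataPersistenceFlagSketchTest {

  /** Stand-in for the write path with a broken metadata store injected. */
  private static void writeWithBrokenMetastore(boolean failOnWriteError)
      throws IOException {
    if (failOnWriteError) {
      // flag on: the metastore failure surfaces as an exception
      throw new IOException("simulated MetadataPersistenceException");
    }
    // flag off (the default at this point in the review): the failure is
    // logged and swallowed, so the write itself succeeds
  }

  @Test
  public void testFailsWhenFlagEnabled() {
    try {
      writeWithBrokenMetastore(true);
      fail("expected the metadata write failure to be thrown");
    } catch (IOException expected) {
      // expected
    }
  }

  @Test
  public void testSucceedsWhenFlagDisabled() throws IOException {
    // identical scenario, flag flipped: no exception expected
    writeWithBrokenMetastore(false);
  }
}
```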

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 0 Docker mode activated.
-1 patch 7 #666 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.
Subsystem Report/Notes
GITHUB PR #666
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/6/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@ben-roling
Contributor Author

Another case of Yetus being unable to apply the patch from the PR. This time I just squash-rebased on trunk and force-pushed.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 29 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 21 Maven dependency ordering for branch
+1 mvninstall 1004 trunk passed
+1 compile 929 trunk passed
+1 checkstyle 176 trunk passed
+1 mvnsite 107 trunk passed
+1 shadedclient 932 branch has no errors when building and testing our client artifacts.
+1 findbugs 142 trunk passed
+1 javadoc 83 trunk passed
_ Patch Compile Tests _
0 mvndep 19 Maven dependency ordering for patch
+1 mvninstall 71 the patch passed
+1 compile 934 the patch passed
+1 javac 934 the patch passed
-0 checkstyle 186 root: The patch generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9)
+1 mvnsite 126 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 2 The patch has no ill-formed XML file.
+1 shadedclient 676 patch has no errors when building and testing our client artifacts.
+1 findbugs 172 the patch passed
+1 javadoc 103 the patch passed
_ Other Tests _
+1 unit 497 hadoop-common in the patch passed.
+1 unit 289 hadoop-aws in the patch passed.
+1 asflicense 47 The patch does not generate ASF License warnings.
6535
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/5/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux 661f57241b09 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / bfc90bd
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/5/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/5/testReport/
Max. process+thread count 1417 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/5/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 29 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 70 Maven dependency ordering for branch
+1 mvninstall 1093 trunk passed
+1 compile 1097 trunk passed
+1 checkstyle 190 trunk passed
+1 mvnsite 116 trunk passed
+1 shadedclient 988 branch has no errors when building and testing our client artifacts.
+1 findbugs 160 trunk passed
+1 javadoc 86 trunk passed
_ Patch Compile Tests _
0 mvndep 24 Maven dependency ordering for patch
+1 mvninstall 82 the patch passed
+1 compile 1010 the patch passed
+1 javac 1010 the patch passed
-0 checkstyle 199 root: The patch generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9)
+1 mvnsite 115 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 1 The patch has no ill-formed XML file.
+1 shadedclient 653 patch has no errors when building and testing our client artifacts.
+1 findbugs 182 the patch passed
+1 javadoc 84 the patch passed
_ Other Tests _
-1 unit 107 hadoop-common in the patch failed.
-1 unit 49 hadoop-aws in the patch failed.
0 asflicense 32 ASF License check generated no output?
6290
Reason Tests
Failed junit tests hadoop.util.curator.TestChildReaper
hadoop.util.TestDataChecksum
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/7/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux cec799c2f34f 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 215ffc7
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/7/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-666/7/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-666/7/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/7/testReport/
Max. process+thread count 446 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/7/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@ben-roling
Contributor Author

Looks like the test failures were due to some sort of Jenkins/Yetus infrastructure issue. The logs are full of "java.lang.OutOfMemoryError: unable to create new native thread" errors like this one:

[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.103 s <<< FAILURE! - in org.apache.hadoop.util.TestDataChecksum
[ERROR] testCrc32(org.apache.hadoop.util.TestDataChecksum)  Time elapsed: 14.807 s  <<< ERROR!
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at org.apache.hadoop.util.Crc32PerformanceTest.doBench(Crc32PerformanceTest.java:387)

I'll use the trick Gabor mentioned on my other PR of amending my commit message and force-pushing to get it to build again.

@steveloughran steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Apr 5, 2019
Contributor Author

I think the h1 level heading here was a mistake made when this section was added. I fixed it when addressing the merge conflict with my changes.

Contributor

yeah, that's happened in a couple of places. thanks

commit a03e02eb93c86f7ff2c5bac3db85a12d1bc166fe
Merge: cc5e44c988a df76cdc
Author: Ben Roling <[email protected]>
Date:   Thu Apr 18 16:09:16 2019 -0500

    Merge branch 'trunk' into HADOOP-16221

commit cc5e44c988aeba2a8f1e8899518a55482d52d93d
Merge: 8ff542741ab 04c0437
Author: Ben Roling <[email protected]>
Date:   Tue Apr 16 22:40:16 2019 -0500

    Merge branch 'trunk' into HADOOP-16221

commit 8ff542741ab3386a9ce967fb2f36fab2265413f4
Author: Ben Roling <[email protected]>
Date:   Tue Apr 16 22:27:14 2019 -0500

    Incorporate PR review feedback

commit 5d02d2d07ba3cc9503c1b5d2d8b75eb5818db5b8
Author: Ben Roling <[email protected]>
Date:   Thu Apr 4 15:27:28 2019 -0500

    Fix broken ITestS3AMetadataPersistenceException

    Previously the test could fail if the test file already existed.

    Edit: commit message ammended to force a yetus rebuild.

commit b359f27be46193a99bb23b96eff9fc5b64f9d25e
Author: Ben Roling <[email protected]>
Date:   Fri Mar 29 11:31:13 2019 -0500

    HADOOP-16221 add option to fail operation on metadata write failure
@ben-roling
Contributor Author

Squashed and force-pushed to get Yetus to build it again.

@ben-roling
Contributor Author

I did another run of the tests (with us-west-2)

mvn -T 1C verify -Dparallel-tests -DtestsThreadCount=8 -Ds3guard -Ddynamo
[ERROR] Tests run: 805, Failures: 3, Errors: 3, Skipped: 144

Tests had errors or failures:

  • ITestS3AContractGetFileStatusV1List
  • ITestDirectoryCommitMRJob
  • ITestS3GuardToolDynamoDB

I re-ran these each individually and they passed:

mvn -T 1C verify -Dtest=skip -Dit.test=ITestS3AContractGetFileStatusV1List -Ds3guard -Ddynamo
mvn -T 1C verify -Dtest=skip -Dit.test=ITestDirectoryCommitMRJob -Ds3guard -Ddynamo
mvn -T 1C verify -Dtest=skip -Dit.test=ITestS3GuardToolDynamoDB -Ds3guard -Ddynamo

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 26 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 76 Maven dependency ordering for branch
+1 mvninstall 1159 trunk passed
+1 compile 1094 trunk passed
+1 checkstyle 151 trunk passed
+1 mvnsite 131 trunk passed
+1 shadedclient 1061 branch has no errors when building and testing our client artifacts.
+1 findbugs 154 trunk passed
+1 javadoc 97 trunk passed
_ Patch Compile Tests _
0 mvndep 22 Maven dependency ordering for patch
+1 mvninstall 85 the patch passed
+1 compile 940 the patch passed
+1 javac 940 the patch passed
-0 checkstyle 139 root: The patch generated 3 new + 9 unchanged - 0 fixed = 12 total (was 9)
+1 mvnsite 115 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 2 The patch has no ill-formed XML file.
+1 shadedclient 747 patch has no errors when building and testing our client artifacts.
+1 findbugs 169 the patch passed
+1 javadoc 90 the patch passed
_ Other Tests _
+1 unit 503 hadoop-common in the patch passed.
+1 unit 278 hadoop-aws in the patch passed.
+1 asflicense 43 The patch does not generate ASF License warnings.
7069
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/11/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux d90084259595 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / df76cdc
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/11/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/11/testReport/
Max. process+thread count 1350 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/11/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

steveloughran commented Apr 26, 2019

I've just been retesting this... happy with the changes in the operation; just two things I want to make sure we are all good with:

  1. Do we need to wrap any existing IOExceptions raised in the finishedWrite() calls with their own exception? I'm going to say "yes", primarily because that's how we can guarantee that the failure won't trigger any of the retry logic used in existing operations, which assumes that an IOE only ever gets raised during the main operation against S3 rather than the subsequent metastore calls.

  2. Do we make this a new switch or bond it to auth mode?

(1) There's no way you'd ever want this to be disabled when in auth mode.
(2) When not in auth mode, we are meant to be more tolerant of OOB changes in the store, and you could consider a file that has changed in S3 without a metastore update as "just" an OOB update.

But in condition #2, even if we recover, there will be a period of inconsistency. Should we silently swallow this? Or raise an exception?

I'm coming round to the "this will always be on unless you somehow want to disable it" viewpoint too. Because if you aren't updating the store for some reason (example: you don't have write perms to the table), well, that merits a failure, doesn't it?

Accordingly, I'm going to propose

  • we do have the new config option
  • it's true by default. That is, unless you say otherwise, if you can't update the metastore, it's an error.

Saying "swallow metastore update failures" is a special case people should be explicitly asking for.
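For reference, setting the switch in core-site.xml would look roughly like this, shown with the proposed default of true (the property name comes from the patch; the comment text is mine):

```xml
<property>
  <name>fs.s3a.metadatastore.fail.on.write.error</name>
  <!-- proposed default: a failed S3Guard metadata update fails the write -->
  <value>true</value>
</property>
```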

Returning to this patch then, I'm happy with it with some small changes:

  1. we switch the default value of FAIL_ON_METADATA_WRITE_ERROR_DEFAULT to true
  2. change the docs to match.

@ben-roling
Contributor Author

Returning to this patch then, I'm happy with it with some small changes:

  1. we switch the default value of FAIL_ON_METADATA_WRITE_ERROR_DEFAULT to true
  2. change the docs to match.

Great! I'll get a commit posted with the changes soon.

@ben-roling
Contributor Author

@steveloughran I pushed the changes you suggested.

I also ran the tests again against us-west-2:

mvn -T 1C verify -Dparallel-tests -DtestsThreadCount=8 -Ds3guard -Ddynamo
Tests run: 805, Failures: 0, Errors: 1, Skipped: 144

The one error was in ITestDirectoryCommitMRJob, which succeeded on a re-run.

mvn -T 1C verify -Dtest=skip -Dit.test=ITestDirectoryCommitMRJob -Ds3guard -Ddynamo


<property>
<name>fs.s3a.metadatastore.fail.on.write.error</name>
<value>false</value>
Contributor

should change to true here

Contributor Author

oops, definitely, thanks

Contributor Author

fixed in aeccca1, will run tests again


Programmatic retries of the original operation would require overwrite=true.
Suppose the original operation was FileSystem.create(myFile, ovewrite=false).
Contributor

ovewrite -> overwrite

Contributor Author

fixed in 056b52a


By default, S3AFileSystem write operations will fail when updates to
S3Guard metadata fail. S3AFileSystem first writes the file to S3 and then
updates the metadata in S3Guard. If the metadata write fails,


whitespace:end of line

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 21 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 63 Maven dependency ordering for branch
+1 mvninstall 1094 trunk passed
+1 compile 956 trunk passed
+1 checkstyle 142 trunk passed
+1 mvnsite 128 trunk passed
+1 shadedclient 1020 branch has no errors when building and testing our client artifacts.
+1 findbugs 159 trunk passed
+1 javadoc 106 trunk passed
_ Patch Compile Tests _
0 mvndep 22 Maven dependency ordering for patch
+1 mvninstall 78 the patch passed
+1 compile 909 the patch passed
+1 javac 909 the patch passed
-0 checkstyle 138 root: The patch generated 3 new + 9 unchanged - 0 fixed = 12 total (was 9)
+1 mvnsite 124 the patch passed
-1 whitespace 0 The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 xml 1 The patch has no ill-formed XML file.
+1 shadedclient 679 patch has no errors when building and testing our client artifacts.
+1 findbugs 174 the patch passed
+1 javadoc 104 the patch passed
_ Other Tests _
+1 unit 526 hadoop-common in the patch passed.
+1 unit 291 hadoop-aws in the patch passed.
+1 asflicense 50 The patch does not generate ASF License warnings.
6801
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/12/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux 8372537fdeaa 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 59816df
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/12/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-666/12/artifact/out/whitespace-eol.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/12/testReport/
Max. process+thread count 1598 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/12/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.


By default, S3AFileSystem write operations will fail when updates to
S3Guard metadata fail. S3AFileSystem first writes the file to S3 and then
updates the metadata in S3Guard. If the metadata write fails,


whitespace:end of line

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 45 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 66 Maven dependency ordering for branch
+1 mvninstall 1033 trunk passed
+1 compile 1011 trunk passed
+1 checkstyle 142 trunk passed
+1 mvnsite 116 trunk passed
+1 shadedclient 987 branch has no errors when building and testing our client artifacts.
+1 findbugs 162 trunk passed
+1 javadoc 106 trunk passed
_ Patch Compile Tests _
0 mvndep 24 Maven dependency ordering for patch
+1 mvninstall 77 the patch passed
+1 compile 950 the patch passed
+1 javac 950 the patch passed
-0 checkstyle 145 root: The patch generated 3 new + 9 unchanged - 0 fixed = 12 total (was 9)
+1 mvnsite 128 the patch passed
-1 whitespace 0 The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 xml 1 The patch has no ill-formed XML file.
+1 shadedclient 686 patch has no errors when building and testing our client artifacts.
+1 findbugs 175 the patch passed
+1 javadoc 104 the patch passed
_ Other Tests _
+1 unit 511 hadoop-common in the patch passed.
+1 unit 288 hadoop-aws in the patch passed.
+1 asflicense 50 The patch does not generate ASF License warnings.
6839
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/13/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux b6b8d5959456 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 4b4200f
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/13/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-666/13/artifact/out/whitespace-eol.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/13/testReport/
Max. process+thread count 1388 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/13/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@ben-roling
Contributor Author

Ran the tests again against us-west-2.

mvn -T 1C verify -Dparallel-tests -DtestsThreadCount=8 -Ds3guard -Ddynamo
Tests run: 805, Failures: 1, Errors: 2, Skipped: 144

The errors and failures were in ITestS3AContractGetFileStatusV1List, ITestS3AContractRename, and ITestDirectoryCommitMRJob. Each passed when re-run individually:

mvn -T 1C verify -Dtest=skip -Dit.test=ITestS3AContractGetFileStatusV1List -Ds3guard -Ddynamo
mvn -T 1C verify -Dtest=skip -Dit.test=ITestS3AContractRename -Ds3guard -Ddynamo
mvn -T 1C verify -Dtest=skip -Dit.test=ITestDirectoryCommitMRJob -Ds3guard -Ddynamo

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 22 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 20 Maven dependency ordering for branch
+1 mvninstall 1032 trunk passed
+1 compile 967 trunk passed
+1 checkstyle 130 trunk passed
+1 mvnsite 116 trunk passed
+1 shadedclient 899 branch has no errors when building and testing our client artifacts.
+1 findbugs 147 trunk passed
+1 javadoc 86 trunk passed
_ Patch Compile Tests _
0 mvndep 23 Maven dependency ordering for patch
+1 mvninstall 76 the patch passed
+1 compile 1029 the patch passed
+1 javac 1029 the patch passed
-0 checkstyle 144 root: The patch generated 3 new + 9 unchanged - 0 fixed = 12 total (was 9)
+1 mvnsite 115 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 2 The patch has no ill-formed XML file.
+1 shadedclient 663 patch has no errors when building and testing our client artifacts.
+1 findbugs 190 the patch passed
+1 javadoc 93 the patch passed
_ Other Tests _
+1 unit 554 hadoop-common in the patch passed.
+1 unit 277 hadoop-aws in the patch passed.
+1 asflicense 38 The patch does not generate ASF License warnings.
6626
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-666/14/artifact/out/Dockerfile
GITHUB PR #666
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux 2955c20de4f9 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 7fbaa7d
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-666/14/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-666/14/testReport/
Max. process+thread count 1450 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-666/14/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

OK, I'm +1 on this. Applying the patch locally and then doing a full retest; if all is well, I'll commit it.

@steveloughran
Contributor

+1, committed after test run

@ben-roling ben-roling deleted the HADOOP-16221 branch April 30, 2019 13:53
@ben-roling ben-roling restored the HADOOP-16221 branch April 30, 2019 13:53
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
slfan1989 added a commit that referenced this pull request Jun 8, 2024