HADOOP-13230. S3A to optionally retain directory markers #2149
Conversation
Testing against S3 Ireland; I've not run anything other than the (now expanded) cost tests.
Tested S3 Ireland with params ... One failure: the cause is that when status == null is passed in, the failure is raised in the .get() on the future, but the operation expects it to fail in the first read() (which is only true when a file status is supplied). This test path is only followed on an unversioned bucket; presumably that's what my test setup now is, and why it always worked before. Works in the IDE. May be a TTL expiry of the S3Guard record on an overloaded test run.
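A sketch of the two failure paths being described, assuming the openFile() builder API; the class and variable names here are illustrative only:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FutureDataInputStreamBuilder;
import org.apache.hadoop.fs.Path;

final class OpenFileSketch {
  static void open(FileSystem fs, Path path, FileStatus status)
      throws Exception {
    FutureDataInputStreamBuilder builder = fs.openFile(path);
    if (status != null) {
      // a supplied status lets open() skip its existence probe, so a
      // missing object only fails on the first read()
      builder.withFileStatus(status);
    }
    FSDataInputStream in = builder.build().get(); // status == null: fails here
    in.read();                                    // status != null: fails here
  }
}
```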
Force-pushed 9fb5ebf to 7951cd3
Latest update has the marker tool with tests, no docs. We are going to need a full page on directory markers and compatibility, aren't we? Tests in progress; failure on ... Seen this before -assume the cause is that I've turned off bucket checks. Really the test should combine a create and a list, and expect a failure in either place.
See #2170 for the <= 3.2 subset of this patch, needed to not mistake paths with dir markers as empty. As well as not deleting markers on file creation, I'm going to add the option of not doing any re-creation of markers when files are deleted. This reduces IO on file deletion at the cost of a common assumption: "after you delete all files in a directory, the directory still exists". I don't know what that's going to break, so it is going to be explicitly optional, with the recovery policy to be true/false/authoritative.
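For context, a minimal sketch of how a retention policy like this surfaces as client configuration, using the fs.s3a.directory.marker.retention key that appears later in this thread; the bucket name is hypothetical and the wiring illustrative, not the final API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerRetentionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // one of: "delete" (classic behaviour), "keep", "authoritative"
    conf.set("fs.s3a.directory.marker.retention", "keep");
    Path file = new Path("s3a://example-bucket/dir/file");
    try (FileSystem fs = FileSystem.newInstance(file.toUri(), conf)) {
      // with "keep", creating this file no longer deletes the
      // directory marker of its parent path
      fs.create(file).close();
    }
  }
}
```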
Force-pushed 92ef217 to fa83f8d
@iwasakims yeah, been working on that new test. Here's my latest test run: S3 London, bucket KMS-encrypted, build params ... Two failures are related to empty dir markers; the encryption-over-rename one could be unrelated (only just turned KMS on yesterday, but...).
Filed [HADOOP-17167](https://issues.apache.org/jira/browse/HADOOP-17167) over the ITestS3AEncryptionWithDefaultS3Settings failure; unrelated, happens on trunk with my S3 config.
Overnight test run failures. This is with a change not yet pushed up, where during teardown, if the FS policy != keep, we do an audit of the store. The aim is to make sure that 100% of the FS operations don't leave markers in delete mode, so as to be fully compatible with existing releases. One unexpected seek failure (!), and the landsat select test is timing out in parallel (6) test runs, but not standalone. This is consistent for me, new this week. Really weird where it times out though -log4j. Deadlock?
Followup thought: I'm running with log @ debug... this log statement will only be hit in that mode. So until this week I wouldn't have been running in quite the same config. Maybe we've just found a log4j problem?
💔 -1 overall
This message was automatically generated.
Last local batch run; fails with ...
💔 -1 overall
This message was automatically generated.
Change-Id: I93097ff39f7254f18f8d382ad891002502c7112d
Change-Id: I1093fcce9737ac18a5e9a6cfb4f31da6731faf70
- always have one run in auth mode
- cleaning up and javadocing
- plan to pull up a base class and then move the delete and rename ops away

Change-Id: I7fbf885de73eccc0bc842a4987ac12682ff1c63c
…category

Change-Id: Ifadd671a6e39a4a8f4c4a6ab600132bc812ef79e
The latest patch tweaks the marker tool, but also adds PathCapabilities probes to the S3A store so you can see (a) if an instance is marker-aware and (b) whether markers are being kept or deleted on a given path. Look at the docs for details. branch-3.2 doesn't support path capabilities, but it does do it for stream capabilities; I'll extend cloudstore to have a ...
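For application code, a probe might look like the following sketch (capability names taken from the docs quoted further down; assumes the hasPathCapability() API and a hypothetical bucket name):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerCapabilityProbe {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://example-bucket/tables/");
    FileSystem fs = path.getFileSystem(new Configuration());
    // static probe: does this client understand surplus markers at all?
    boolean aware = fs.hasPathCapability(path,
        "fs.s3a.capability.directory.marker.aware");
    // dynamic probes: what will actually happen under this path?
    boolean keep = fs.hasPathCapability(path,
        "fs.s3a.capability.directory.marker.keep");
    boolean delete = fs.hasPathCapability(path,
        "fs.s3a.capability.directory.marker.delete");
    System.out.printf("aware=%s keep=%s delete=%s%n", aware, keep, delete);
  }
}
```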
💔 -1 overall
This message was automatically generated.
* Execute a full recursive rename.
* The source is a file: rename it to the destination.
* @throws IOException failure
* There is a special handling of directly markers here -only leaf markers
typo: directory
* @throws IOException if metadata store update failed
*/
@RetryTranslated
public static boolean refreshEntry(
Why don't we just put again? Or just update the modtime if needed? If we only update the modtime then the method name should reflect that.
And, historically, when a path is listed, if a marker to that path is found, *it has been interpreted as an empty directory.*

## <a name="problem"></a> Scale issues related to directory markers
Maybe put the title in the HTML tag?
changed the title :)
1. There's also the overhead of actually issuing the request and awaiting the response.

Issue #2 has turned out to cause significant problems on some interactions with
Yes, that's because of how markdown works; it will do the ordering itself.
@Test
public void testDeleteEncryptedObjectWithDifferentKey() throws Exception {
  requireUnguardedFilesystem();
  //requireUnguardedFilesystem();
This is commented out; should it be removed instead?
| |-------------------------|-------------------------| | ||
| | `fs.s3a.capability.directory.marker.aware` | Does the filesystem support surplus directory markers? | | ||
| | `fs.s3a.capability.directory.marker.keep` | Does the path retain directory markers? | | ||
| | `fs.s3a.capability.directory.marker.delete` | Does the path delete directory markers? | |
Please add all the possible values of this property to this doc.
It's on a specific path, meaning in auth mode it will actually give you the real retain/delete state, rather than just "we retain on authoritative"; that's why I'd left it out. At this level (application, not shell) my view is that the caller is more interested in what happens at that level... especially as the actual path to authoritativeness isn't visible.
...maybe we should have some props for bucket along with path, e.g. marker.path.keep and marker.path.delete for the dynamic ones?
Tested with fs.s3a.directory.marker.retention=delete against Ireland. No issues. There were some nits and things to change; otherwise LGTM.
It looks good to me overall, as I did not see relevant integration test failures with the patch applied. It is a bit confusing that there are ... It would help reviewing to split the S3Guard-related part and the performance tests into follow-up JIRAs. We can commit the essential part first if it retains the present default (DELETE) behaviour.
Adding the checks to bucket-info, with tests for this, provides a straightforward way to verify that an S3A client is compatible with kept markers. If the command `hadoop s3guard bucket-info -markers aware s3a://bucket/` succeeds, then the client has the modifications to support directory markers above files. If it fails as an unknown option: not compatible.

Change-Id: I2b58501eda160f9c2598bf492908bc6b3bf34f28
💔 -1 overall
This message was automatically generated.
LOG.debug("S3GetFileStatus {}", path);
Preconditions.checkArgument(!needEmptyDirectoryFlag
    || probes.contains(StatusProbeEnum.List),
    "s3GetFileStatus(%s) wants to know if a directory is empty but"
The condition and message don't seem consistent. Please review.
Overall looks good to me apart from some minor nits and doc changes. Really nice documentation.
It is. I'm adding a comment, and I'm going to put in a unit test to verify the condition. Key point: if needEmptyDirFlag is false, we don't care; if it is true, then the probe must contain a list, so ... And obviously, the second test of the flag state is superfluous, which is why IDEs and things complain about it.
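A unit test along these lines could pin the implication down (a sketch only: the helper restates the checkArgument condition rather than calling into S3AFileSystem, and assumes the Head and List members of StatusProbeEnum from the hadoop-aws impl package):

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.EnumSet;
import java.util.Set;

import org.junit.Test;

import org.apache.hadoop.fs.s3a.impl.StatusProbeEnum;

public class TestEmptyDirProbeCondition {

  /** Restates the condition: needEmptyDirectoryFlag implies a List probe. */
  private static boolean valid(boolean needEmptyDirectoryFlag,
      Set<StatusProbeEnum> probes) {
    return !needEmptyDirectoryFlag || probes.contains(StatusProbeEnum.List);
  }

  @Test
  public void testEmptyDirFlagRequiresListProbe() {
    // flag off: any probe set is acceptable
    assertTrue(valid(false, EnumSet.noneOf(StatusProbeEnum.class)));
    // flag on without a List probe: invalid
    assertFalse(valid(true, EnumSet.of(StatusProbeEnum.Head)));
    // flag on with a List probe: valid
    assertTrue(valid(true, EnumSet.of(StatusProbeEnum.List)));
  }
}
```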
The parameterized ones let us explicitly explore behaviours when we turn features on and off, to verify that they do what we expect in terms of keep vs delete. The -Dmarkers= option lets us switch the entire test run to keep/delete, to verify that the whole API works with keep=true and, with auditing enabled, that the delete option isn't somehow retaining markers. It's a PITA to have another option to run with, but making it explicit lets us show, when pasting in our test results, which option we used.
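A condensed sketch of that parameterization (the class name is invented for illustration; the configuration key and base class follow hadoop-aws test conventions):

```java
import java.util.Arrays;
import java.util.Collection;

import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;

@RunWith(Parameterized.class)
public class ITestMarkerPolicySketch extends AbstractS3ATestBase {

  @Parameterized.Parameters(name = "marker-policy-{0}")
  public static Collection<Object[]> params() {
    return Arrays.asList(new Object[][]{
        {"keep"}, {"delete"}, {"authoritative"}});
  }

  private final String policy;

  public ITestMarkerPolicySketch(String policy) {
    this.policy = policy;
  }

  @Override
  protected Configuration createConfiguration() {
    Configuration conf = super.createConfiguration();
    // pin the policy under test, overriding any -Dmarkers= setting
    conf.set("fs.s3a.directory.marker.retention", policy);
    return conf;
  }

  // the test cases then create and delete files under a marker and
  // assert that the marker survives for "keep" but not for "delete"
}
```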
The s3guard bit which Gabor was talking about? If everyone wants it, then I shall reluctantly do it.

The performance tests, I'd like them in. They are looking at the cost of delete/rename and listings. All the existing cost tests broke as fewer head calls were being made, more list etc., and trying to distinguish regressions from legitimate changes of the in-code constants was hard. The OperationCost type with simple aggregation makes it easy to determine what the costs are. Oh, they aren't really performance BTW -cost. Should I put them in a cost/ package instead?
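For readers following along, a minimal sketch of the cost-accounting idea (the real OperationCost class in the patch will differ; the fields and arithmetic here are assumptions):

```java
/** Expected cost of an S3A operation, counted in HEAD and LIST requests. */
final class OperationCost {
  private final int head;
  private final int list;

  OperationCost(int head, int list) {
    this.head = head;
    this.list = list;
  }

  /** Aggregate costs, e.g. cost(getFileStatus) + cost(listStatus). */
  OperationCost plus(OperationCost other) {
    return new OperationCost(head + other.head, list + other.list);
  }

  int head() { return head; }

  int list() { return list; }
}
```

Tests can then assert measured request counts against one aggregated expectation rather than hard-coded constants scattered through the suite.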
Thanks for the explanation. I agree that we need both the parameterized tests and the -Dmarkers= option.
the performance tests, I'd like them in. They are looking at the cost of delete/rename and listings.
The tests are a nice addition, though it takes time to walk through the added code and test results. I'm OK here given that the itests passed and the test logs look fine.
oh, they aren't really performance BTW -cost. Should I put them in a cost/ package instead?
We can do it later, after other performance tests are added. ITestDirectoryMarkerListing could be moved under another package since it is for neither performance nor cost.
I would like to move this forward. +1, pending checkstyle and findbugs warnings.
* specific bucket-level "marker.policy.{delete, keep, authoritative}" probes, which ...
* dynamic probes renamed marker.action.{keep, delete}
Capability logic all moved into DirectoryMarkerPolicyImpl
...which means it is now testable in unit tests. Tests added.
Checkstyles.
Change-Id: I27db716097a3bc1e1fe2639d2e90c1e855658675
@iwasakims thanks for that review. Checkstyles in, plus feedback from Gabor: we now have static path capability probes for bucket policy as well as the dynamic "what will happen on this path" ones. All in DirectoryPolicyImpl, so it can be unit tested too!

Testing: S3 London, -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Dscale, and ...

The first test run failed in ITestS3AConfiguration.testProxyConnection; one I've seen before, where an expected failure on proxies succeeds -this on a bucket where I've disabled existence checks. Doesn't happen when guarded. Hypothesis: when S3Guard is on, the proxy issues surface in initialize(); when off, not until the first FS connection. Filed https://issues.apache.org/jira/browse/HADOOP-17207
💔 -1 overall
This message was automatically generated.
The checkstyles are either unfixable (initialize()) or deliberate (test case naming). I'm going to merge this in now; just doing one final review of the docs. I think I'll add a compatibility statement on the index.md page too, something to repeat for the other releases as I backport the getFileStatus change.
Change-Id: If76d9f3c6918d5c3cfd9bb28d4a97e35654139ea
And it's merged in. Thanks to everyone for their reviews and feedback! Next steps: ...
💔 -1 overall
This message was automatically generated.
doing some followup tuning of the CLI tooling in #2254 ; nothing major. Better for integration test and general message formatting...the things which only surface after playing with the tool for a few days |
|
Thanks Steve. Looking forward to a Hadoop 2 backport! |
|
@liuml07 "Looking forward to a Hadoop 2 backport!" I'll do a hadoop-2 fixup of the list-time changes, but if you look at #2310, delete gets complex, especially handling partial failures of deletes. I'm going to have to say that any branch without that (major) change will be able to read data in a path which keeps markers, but it must not delete things -otherwise S3Guard can get out of sync.
*/
@Override
@SuppressWarnings("deprecation")
public boolean isDirectory(Path f) throws IOException {
@steveloughran it looks like the change to this function was meant to optimize performance, but I am experiencing a performance regression when upgrading Spark from 3.1.2 to 3.5.1, and I found it is caused by this change.
When I do a spark.read.parquet('s3a://path/to/1.parquet', ..., 's3a://path/to/10000.parquet') before this change, ONLY HEAD requests are sent to build the DataFrame. However, after this change, LIST requests are sent, which is significantly slower as I am reading quite a lot of parquet files.
The docstring "it is optimized to a single HEAD" also confuses me because StatusProbeEnum.DIRECTORIES is just an alias for StatusProbeEnum.LIST_ONLY.
Am I missing anything here?
Not good. File a PR, including what you can of the stack of calls.
What is probably happening is that the method calling this is assuming all the paths are directories (which this call is optimised for), but as all the paths are files it ends up doing
LIST path
so yes, it would be a step backwards. The code should be calling getFileStatus to really get everything about a file.
Now, why are you providing a list of many, many files, given that Spark expects to be working on a directory at a time?
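For reference, a sketch of the pattern being suggested, with getFileStatus() doing the work (illustrative only; the class and method names are invented):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class StatusProbeWorkaround {
  /**
   * Instead of isDirectory(path), which after this change issues a LIST
   * probe, ask for the full status: a plain file is answered by a HEAD
   * request, and the returned status says everything about the path.
   */
  static boolean isDir(FileSystem fs, Path path) throws IOException {
    try {
      FileStatus st = fs.getFileStatus(path);
      return st.isDirectory();
    } catch (FileNotFoundException e) {
      // isDirectory() returns false for missing paths
      return false;
    }
  }
}
```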
Thanks for the reply!
Not good. File a PR, including what you can of the stack of calls.
Will do.
Now, why are you providing a list of many, many files, given that Spark expects to be working on a directory at a time?
We have an upstream service generating many parquet files, whose paths are long and deep in hierarchy. I am responsible for ingesting them into an Iceberg table with a PySpark cron job. Reading a list of many files instead of a directory is to work around the slow and recursive S3 (in our case, Ceph RGW) LIST calls, so that I just need to do one LIST call before passing input to spark.read.parquet().
HADOOP-13230
successor to #1861