@steveloughran steveloughran commented Aug 27, 2020

  • Move from -expect to -min and -max; easier for CLI testing. Plus works
  • In -nonauth mode, even when the marker policy is keep, files not under an
    authoritative path count as failures.
  • The bucket-info command also prints the authoritative path, so you can see
    more clearly what is happening.
  • Reporting of command failures is more informative.
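The -min/-max change can be pictured as an inclusive range check on the number of markers a scan finds, with a failure message that states both the count and the expected range. This is a hypothetical sketch: the class name, its methods and the exit code values are illustrative assumptions, not the MarkerTool implementation. It also assumes that an old-style `-expect N` corresponds to `-min N -max N`.

```java
/**
 * Hypothetical sketch of a -min/-max range check on a marker count.
 * MarkerCountRange, check(), describe() and the exit codes are
 * illustrative only; they are not the actual MarkerTool API.
 */
final class MarkerCountRange {

  /** Hypothetical exit code: the count is within range. */
  static final int EXIT_SUCCESS = 0;

  /** Hypothetical exit code: the count is outside the range. */
  static final int EXIT_OUT_OF_RANGE = 1;

  private MarkerCountRange() {
  }

  /**
   * Check a count against an inclusive range.
   * @param found number of markers found by the scan
   * @param min minimum acceptable count (inclusive)
   * @param max maximum acceptable count (inclusive)
   * @return EXIT_SUCCESS if min <= found <= max, else EXIT_OUT_OF_RANGE
   * @throws IllegalArgumentException if min > max
   */
  static int check(int found, int min, int max) {
    if (min > max) {
      throw new IllegalArgumentException(
          String.format("-min %d is greater than -max %d", min, max));
    }
    return (found >= min && found <= max) ? EXIT_SUCCESS : EXIT_OUT_OF_RANGE;
  }

  /** A failure message stating both the count and the expected range. */
  static String describe(int found, int min, int max) {
    return String.format("Marker count %d out of range [%d - %d]",
        found, min, max);
  }

  public static void main(String[] args) {
    // An old-style "-expect 0" corresponds to "-min 0 -max 0".
    System.out.println(check(0, 0, 0));
    // A range tolerates some variation, e.g. between test runs.
    System.out.println(check(3, 1, 5));
    System.out.println(check(7, 1, 5) + ": " + describe(7, 1, 5));
  }
}
```

A range is friendlier for CLI testing than a single expected value because a script can assert "at least one marker" or "no more than N" without knowing the exact count.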

The reason for change #2 is a workflow where you want to audit a directory
even though the marker policy is keep and no authoritative path is configured.
You'd expect -nonauth to report "no auth path", but instead it treats the whole
directory as authoritative.
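One way to picture the -nonauth rule from change #2: treat the qualified authoritative paths as slash-terminated prefixes and count every marker outside all of them as a failure, whatever the marker policy. With no authoritative path configured, every marker fails, which matches the "no auth path" expectation above. This is a hypothetical sketch: NonAuthMarkerFilter and its methods are not part of Hadoop, and markers are modelled as plain URI strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the -nonauth rule: a marker counts as a failure
 * unless it lies under one of the qualified authoritative paths.
 * If no authoritative paths are configured, every marker is a failure.
 */
final class NonAuthMarkerFilter {

  private NonAuthMarkerFilter() {
  }

  /** True if the marker is under any of the slash-terminated auth prefixes. */
  static boolean underAuthPath(String marker, List<String> authPrefixes) {
    for (String prefix : authPrefixes) {
      if (marker.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  /** Return the markers that count as failures in -nonauth mode. */
  static List<String> nonAuthMarkers(List<String> markers,
      List<String> authPrefixes) {
    List<String> failures = new ArrayList<>();
    for (String marker : markers) {
      if (!underAuthPath(marker, authPrefixes)) {
        failures.add(marker);
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    List<String> markers = Arrays.asList(
        "s3a://stevel-london/tables/t1/",
        "s3a://stevel-london/data/d1/");
    // With /tables authoritative, only the marker under data/ fails.
    System.out.println(nonAuthMarkers(markers,
        Arrays.asList("s3a://stevel-london/tables/")));
    // With no authoritative path at all, both markers fail.
    System.out.println(nonAuthMarkers(markers, new ArrayList<>()));
  }
}
```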

https://issues.apache.org/jira/browse/HADOOP-17227

@steveloughran

bucket-info gets improved too, based on experience working with the CLI.

bin/hadoop s3guard bucket-info s3a://stevel-london/
2020-08-28 15:45:02,624 [main] INFO  impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(193)) - Directory markers will be kept on authoritative paths
Filesystem s3a://stevel-london
Location: eu-west-2
Filesystem s3a://stevel-london is using S3Guard with store DynamoDBMetadataStore{region=eu-west-2, tableName=stevel-london, tableArn=arn:aws:dynamodb:eu-west-2:152813717728:table/stevel-london}
Authoritative Metadata Store: fs.s3a.metadatastore.authoritative=false
Authoritative Path: fs.s3a.authoritative.path=/tables
Qualified Authoritative Paths:
	s3a://stevel-london/tables/

	Metadata time to live: (set in fs.s3a.metadatastore.metadata.ttl) = 00:15:00.000
Metadata Store Diagnostics:
	ARN=arn:aws:dynamodb:eu-west-2:152813717728:table/stevel-london
	billing-mode=per-request
	description=S3Guard metadata store in DynamoDB
	name=stevel-london
	persist.authoritative.bit=true
	read-capacity=0
	region=eu-west-2
	retryPolicy=ExponentialBackoffRetry(maxRetries=9, sleepTime=100 MILLISECONDS)
	size=38538
	sse=DISABLED
	status=ACTIVE
	table={AttributeDefinitions: [{AttributeName: child,AttributeType: S}, {AttributeName: parent,AttributeType: S}],TableName: stevel-london,KeySchema: [{AttributeName: parent,KeyType: HASH}, {AttributeName: child,KeyType: RANGE}],TableStatus: ACTIVE,CreationDateTime: Mon Mar 16 20:21:32 GMT 2020,ProvisionedThroughput: {NumberOfDecreasesToday: 0,ReadCapacityUnits: 0,WriteCapacityUnits: 0},TableSizeBytes: 38538,ItemCount: 300,TableArn: arn:aws:dynamodb:eu-west-2:152813717728:table/stevel-london,TableId: 422878eb-823c-4071-826a-3746b7c8fd18,BillingModeSummary: {BillingMode: PAY_PER_REQUEST,LastUpdateToPayPerRequestDateTime: Mon Mar 16 20:21:32 GMT 2020},}
	write-capacity=0

S3A Client
	Signing Algorithm: fs.s3a.signing-algorithm=(unset)
	Endpoint: fs.s3a.endpoint=s3.eu-west-2.amazonaws.com
	Encryption: fs.s3a.server-side-encryption-algorithm=none
	Input seek policy: fs.s3a.experimental.input.fadvise=normal
	Change Detection Source: fs.s3a.change.detection.source=etag
	Change Detection Mode: fs.s3a.change.detection.mode=server

S3A Committers
	The "magic" committer is supported in the filesystem
	S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
	S3A Committer name: fs.s3a.committer.name=directory
	Cluster filesystem staging directory: fs.s3a.committer.staging.tmp.path=tmp/staging
	Local filesystem buffer directory: fs.s3a.buffer.dir=/tmp/hadoop-stevel/s3a
	File conflict resolution: fs.s3a.committer.staging.conflict-mode=append

Security
	Delegation token support is disabled

Security
	The directory marker policy is "authoritative"
	Available Policies: delete, keep, authoritative
	Authoritative paths: fs.s3a.authoritative.path=/tables
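The qualified form that bucket-info prints (s3a://stevel-london/tables/ for fs.s3a.authoritative.path=/tables) can be reproduced by resolving the configured path against the filesystem URI and appending a trailing slash. One plausible reason for the trailing slash is that a prefix match on /tables/ then does not also match a sibling such as /tables2. This sketch uses java.net.URI rather than Hadoop's own path qualification; AuthPathQualifier is a hypothetical name, and it assumes a single absolute path and a filesystem URI ending in "/".

```java
import java.net.URI;

/**
 * Hypothetical sketch: qualify an authoritative path against a filesystem
 * URI, in the form bucket-info displays. Assumes an absolute path such as
 * "/tables" and a filesystem URI that ends in "/".
 */
final class AuthPathQualifier {

  private AuthPathQualifier() {
  }

  static String qualify(final URI fsUri, final String authPath) {
    // An absolute path keeps the scheme and bucket (authority) of fsUri.
    String qualified = fsUri.resolve(authPath).toString();
    // Add a trailing "/" so that prefix checks match only children.
    return qualified.endsWith("/") ? qualified : qualified + "/";
  }

  public static void main(String[] args) {
    URI fs = URI.create("s3a://stevel-london/");
    System.out.println(qualify(fs, "/tables"));
  }
}
```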

@apache apache deleted a comment from hadoop-yetus Aug 28, 2020
@steveloughran steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Aug 28, 2020
@steveloughran

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java:28:import java.time.Duration;:8: Unused import - java.time.Duration. [UnusedImports]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java:763:    public static final String PURPOSE = "destroy the Metadata Store including its contents": Line is longer than 80 characters (found 92). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:332:            String.format("Argument for %s is not a number: %s", option, value));: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:435:    /**: First sentence should end with a period. [JavadocStyle]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:622:  private ScanResult failScan(ScanResult result, int code, String message, Object...args) {: Line is longer than 80 characters (found 91). [LineLength]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/AbstractS3ATestBase.java

@steveloughran

Aah, the builder API raises checkstyle warnings:

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java:762:    public static final String PURPOSE = "destroy the Metadata Store including its": Line is longer than 80 characters (found 83). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:933:    public ScanArgsBuilder withSourceFS(final FileSystem sourceFS) {:58: 'sourceFS' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:939:    public ScanArgsBuilder withPath(final Path path) {:48: 'path' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:945:    public ScanArgsBuilder withDoPurge(final boolean doPurge) {:54: 'doPurge' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:951:    public ScanArgsBuilder withMinMarkerCount(final int minMarkerCount) {:57: 'minMarkerCount' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:957:    public ScanArgsBuilder withMaxMarkerCount(final int maxMarkerCount) {:57: 'maxMarkerCount' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:963:    public ScanArgsBuilder withLimit(final int limit) {:48: 'limit' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:969:    public ScanArgsBuilder withNonAuth(final boolean nonAuth) {:54: 'nonAuth' hides a field. [HiddenField]
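Every HiddenField warning above comes from a builder method whose parameter has the same name as the field it sets. One way to clear them without suppressions is to give the parameters different names; checkstyle's ignoreSetter option should not help here, as it only covers methods named setXxx. This is a hypothetical sketch: ScanArgsSketch and its String-typed fields stand in for the real ScanArgsBuilder, whose FileSystem and Path arguments are not reproduced, and the defaults are assumptions.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: immutable scan arguments built with a fluent builder.
 * Builder parameter names differ from the fields they set, so checkstyle's
 * HiddenField check has nothing to report. Types are simplified to String.
 */
final class ScanArgsSketch {
  final String sourceFS;
  final String path;
  final boolean doPurge;
  final int minMarkerCount;
  final int maxMarkerCount;
  final int limit;
  final boolean nonAuth;

  private ScanArgsSketch(final Builder b) {
    this.sourceFS = b.sourceFS;
    this.path = b.path;
    this.doPurge = b.doPurge;
    this.minMarkerCount = b.minMarkerCount;
    this.maxMarkerCount = b.maxMarkerCount;
    this.limit = b.limit;
    this.nonAuth = b.nonAuth;
  }

  static final class Builder {
    private String sourceFS;
    private String path;
    private boolean doPurge;
    private int minMarkerCount;
    // Hypothetical defaults: no upper bound on markers, no scan limit (0).
    private int maxMarkerCount = Integer.MAX_VALUE;
    private int limit;
    private boolean nonAuth;

    // "fs" rather than "sourceFS": the parameter no longer hides a field.
    Builder withSourceFS(final String fs) {
      sourceFS = fs;
      return this;
    }

    Builder withPath(final String p) {
      path = p;
      return this;
    }

    Builder withDoPurge(final boolean purge) {
      doPurge = purge;
      return this;
    }

    Builder withMinMarkerCount(final int min) {
      minMarkerCount = min;
      return this;
    }

    Builder withMaxMarkerCount(final int max) {
      maxMarkerCount = max;
      return this;
    }

    Builder withLimit(final int maxEntries) {
      limit = maxEntries;
      return this;
    }

    Builder withNonAuth(final boolean nonAuthMode) {
      nonAuth = nonAuthMode;
      return this;
    }

    /** Validate and build: path is mandatory and min must not exceed max. */
    ScanArgsSketch build() {
      Objects.requireNonNull(path, "path");
      if (minMarkerCount > maxMarkerCount) {
        throw new IllegalArgumentException("min > max");
      }
      return new ScanArgsSketch(this);
    }
  }

  public static void main(String[] args) {
    ScanArgsSketch scan = new Builder()
        .withPath("s3a://stevel-london/tables")
        .withMinMarkerCount(0)
        .withMaxMarkerCount(0)
        .build();
    System.out.println(scan.path + " max=" + scan.maxMarkerCount);
  }
}
```

A builder also keeps call sites readable as the set of scan options grows, since each option is named at the point where it is set.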

@steveloughran

  • For testing, it would be good to be able to count objects with -minobjects and -maxobjects options, covering all objects under a path, markers included. That would help verify rename etc., even when S3Guard is enabled.

@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
Change-Id: Ib310e321e5862957fbd92bebfade93231f92b16f
Change-Id: Iddcefb26a7de0fce0c7b6ae0d679590005cd63b6
* fix checkstyle
* use builder API for passing the (growing) set of parameters around

Change-Id: I1ce980a4d7d4f5e9ad7f1c7b7fa4c6fd9806b8f1
@steveloughran steveloughran force-pushed the s3/HADOOP-17227-markers-expect branch from d865487 to 1887f33 Compare September 3, 2020 12:18
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 29m 2s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 shadedclient 15m 7s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 24s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 31s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 1m 5s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 2s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 javac 0m 29s the patch passed
-0 ⚠️ checkstyle 0m 19s hadoop-tools/hadoop-aws: The patch generated 8 new + 11 unchanged - 0 fixed = 19 total (was 11)
+1 💚 mvnsite 0m 33s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 34s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 findbugs 1m 11s the patch passed
_ Other Tests _
+1 💚 unit 1m 30s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
72m 4s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/artifact/out/Dockerfile
GITHUB PR #2254
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint
uname Linux 0ced8146f717 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5e12dc5
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
checkstyle https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/testReport/
Max. process+thread count 413 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/4/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bgaborg

bgaborg commented Sep 3, 2020

Nice improvement Steve, LGTM, +1

Change-Id: I49cc69ad61601fd858005323e73ae5ad7178e82e
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 29m 10s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 shadedclient 15m 24s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 25s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 1m 5s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 3s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 0m 34s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 javac 0m 33s the patch passed
-0 ⚠️ checkstyle 0m 22s hadoop-tools/hadoop-aws: The patch generated 1 new + 11 unchanged - 0 fixed = 12 total (was 11)
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 47s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 findbugs 1m 5s the patch passed
_ Other Tests _
+1 💚 unit 1m 31s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
71m 59s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/artifact/out/Dockerfile
GITHUB PR #2254
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint
uname Linux 6007ceecd302 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5c15815
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
checkstyle https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/testReport/
Max. process+thread count 413 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2254/5/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran merged commit 5346cc3 into apache:trunk Sep 4, 2020
asfgit pushed a commit that referenced this pull request Sep 4, 2020
Contributed by Steve Loughran.
@steveloughran steveloughran deleted the s3/HADOOP-17227-markers-expect branch October 15, 2021 19:43
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
Contributed by Steve Loughran.

Change-Id: Ia36f058456db94c7358bc113ef298652445b03d3