
Conversation

@steveloughran
Contributor

@steveloughran steveloughran commented Apr 27, 2020

Contributed by Steve Loughran.

This patch adds to hadoop-common an API for querying IO classes (especially
input and output streams) for statistics.

It includes a big rework of the S3A Statistics including

  • implementation of the IOStatistics APIs
  • contract tests for those and any other streams which implement the same interfaces and the same bytes read/written counters
  • a split of the existing S3AInstrumentation classes into interface/implementations
  • a troubled attempt to wire up the AWS SDK metrics

The AWS metric binding is breaking some of the S3 region handling code, so it is
disabled; we're still using the old "create client, then set endpoint" logic rather
than the builder API for constructing the S3 client.

Doing the public interface hand-in-hand with that implementation helps evolve
the interface, but it makes for a bigger patch.

There are contract tests for those and any other streams which implement
the same interfaces and the same bytes read/written counters.

Proposed: once the reviewers are happy with the design we can split the two up
into the hadoop-common changes (which can be used in ABFS) and the S3A FS
changes.
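A minimal standalone sketch of the query pattern described above. The interface names mirror the patch (IOStatisticsSource, IOStatistics), but this simplified version is illustrative only, not the Hadoop code itself:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the interfaces the patch adds to hadoop-common.
interface IOStatistics {
  Map<String, Long> counters();
}

interface IOStatisticsSource {
  IOStatistics getIOStatistics();
}

// A stream-like class publishing its counters through the source interface.
class CountingStream implements IOStatisticsSource {
  private long bytesRead;

  void read(int n) {
    bytesRead += n;
  }

  @Override
  public IOStatistics getIOStatistics() {
    // Dynamic view: each counters() call reads the current value.
    return () -> {
      Map<String, Long> m = new HashMap<>();
      m.put("stream_read_bytes", bytesRead);
      return m;
    };
  }
}
```

Callers probe any stream for the source interface, then read the counters by key; streams that don't implement it simply return nothing to report.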

@steveloughran
Contributor Author

Successor to #1820.

Regarding the AWS metric failures: when enabled, code which goes near the landsat and common crawl buckets fails, even when the default endpoint is being used:


org.apache.hadoop.fs.s3a.AWSRedirectException: HEAD on landsat-pds: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-west-2. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: A783303EE9485EA1; S3 Extended Request ID: AT6EVbOELJpaqbsFDDAgH8FRHBv4WGkP4Cssk6N9ANLYIYFbeVqQllVf0/dKCZgKSV6MrPHMTzI=), S3 Extended Request ID: AT6EVbOELJpaqbsFDDAgH8FRHBv4WGkP4Cssk6N9ANLYIYFbeVqQllVf0/dKCZgKSV6MrPHMTzI=:301 Moved Permanently: The bucket is in this region: us-west-2. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: A783303EE9485EA1; S3 Extended Request ID: AT6EVbOELJpaqbsFDDAgH8FRHBv4WGkP4Cssk6N9ANLYIYFbeVqQllVf0/dKCZgKSV6MrPHMTzI=)

	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:234)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:112)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.readLandsatMetadata(ITestSessionDelegationInFileystem.java:574)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.testDelegatedFileSystem(ITestSessionDelegationInFileystem.java:312)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-west-2. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: A783303EE9485EA1; S3 Extended Request ID: AT6EVbOELJpaqbsFDDAgH8FRHBv4WGkP4Cssk6N9ANLYIYFbeVqQllVf0/dKCZgKSV6MrPHMTzI=), S3 Extended Request ID: AT6EVbOELJpaqbsFDDAgH8FRHBv4WGkP4Cssk6N9ANLYIYFbeVqQllVf0/dKCZgKSV6MrPHMTzI=
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4920)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4866)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1294)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.lambda$readLandsatMetadata$2(ITestSessionDelegationInFileystem.java:575)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:110)
	... 17 more

* This package contains support for statistic collection and reporting.
* This is the public API; implementation classes are to be kept elsewhere.
*
* This package is defines two interfaces
Contributor


typo

Contributor Author


thx

@apache apache deleted a comment from hadoop-yetus Apr 29, 2020
* @param eval evaluator for the statistic
* @return the builder.
*/
public DynamicIOStatisticsBuilder add(String key,
Contributor


How can I add a counter of type long through the builder?

Contributor Author


via a lambda expression

LOG.info("Statistics = {}", strVal);
verifyStatisticValue(statistics, STREAM_WRITE_BYTES, 1);
} finally {
fs.delete(path, false);
Contributor

@mehakmeet mehakmeet May 3, 2020


Is fs.delete() required in these tests? Won't teardown() take care of it?

Contributor Author


just being thorough

}

/**
* Keys which the output stream must support.
Contributor


output input

@steveloughran
Contributor Author

How can I add a counter of type long through the builder?

it takes any lambda expression, so you can go add(key, (key) -> this.getLongval()).

Having some method add(key, long) wouldn't work, as that's not an evaluator, only a static value.
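To make the lambda answer concrete, here is a self-contained sketch of the builder pattern (a stand-in for DynamicIOStatisticsBuilder; names and shapes here are illustrative, not the patch's exact API). A long counter is exposed by binding its key to an evaluator that reads it on demand:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.ToLongFunction;

// Stand-in for the patch's DynamicIOStatisticsBuilder: each key is bound to
// an evaluator, so values are computed when probed, not copied at build time.
class DynamicStatsBuilder {
  private final Map<String, ToLongFunction<String>> evaluators = new HashMap<>();

  DynamicStatsBuilder add(String key, ToLongFunction<String> eval) {
    evaluators.put(key, eval);
    return this;
  }

  // Build a probe function over a copy of the registered evaluators.
  Function<String, Long> build() {
    Map<String, ToLongFunction<String>> snap = new HashMap<>(evaluators);
    return key -> snap.get(key).applyAsLong(key);
  }
}
```

So a counter backed by an AtomicLong is registered as add("counter", k -> counter.get()); probing the built statistics re-evaluates the lambda each time, which is why a static add(key, long) would defeat the design.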

@steveloughran
Contributor Author

Failure on a test run, related to stats gathering. But I don't see this when run from the IDE. Race condition?

[ERROR]   ITestCommitOperations.testBulkCommitFiles:650->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Number of records written after commit #2; first commit had 4; first commit ancestors CommitContext{operationState=AncestorState{operation=Commitid=55; dest=s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out; size=6; paths={s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out/file1 s3a://stevel-london/fork-0006 s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out s3a://stevel-london/fork-0006/test s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME}}}; second commit ancestors: CommitContext{operationState=AncestorState{operation=Commitid=55; dest=s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out; size=8; paths={s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out/file1 s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out/subdir/file2 s3a://stevel-london/fork-0006 s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out s3a://stevel-london/fork-0006/test s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME/testBulkCommitFiles/out/subdir s3a://stevel-london/fork-0006/test/DELAY_LISTING_ME}}}: s3guard_metadatastore_record_writes expected:<2> but was:<3>
[ERROR]   ITestS3ACommitterMRJob.test_200_execute:304->Assert.fail:88 Job job_1588787902323_0003 failed in state FAILED with cause Job commit failed: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@2104a338 rejected from org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@7272786a[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 10]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.runParallel(Tasks.java:313)
	at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.run(Tasks.java:148)
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.commitPendingUploads(AbstractS3ACommitter.java:480)
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.commitJobInternal(AbstractS3ACommitter.java:620)
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.commitJob(AbstractS3ACommitter.java:722)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran
Contributor Author

Cannot repeat this; it was an eight-thread parallel run, maybe there was some retry/timeout.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 22 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 55s Maven dependency ordering for branch
+1 💚 mvninstall 20m 44s trunk passed
+1 💚 compile 18m 39s trunk passed
+1 💚 checkstyle 2m 47s trunk passed
+1 💚 mvnsite 2m 19s trunk passed
+1 💚 shadedclient 19m 49s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 50s trunk passed
+0 🆗 spotbugs 1m 15s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 20s trunk passed
-0 ⚠️ patch 1m 36s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for patch
+1 💚 mvninstall 1m 24s the patch passed
+1 💚 compile 16m 37s the patch passed
-1 ❌ javac 16m 37s root generated 1 new + 1870 unchanged - 1 fixed = 1871 total (was 1871)
-0 ⚠️ checkstyle 2m 49s root: The patch generated 10 new + 81 unchanged - 19 fixed = 91 total (was 100)
+1 💚 mvnsite 2m 25s the patch passed
-1 ❌ whitespace 0m 0s The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 shadedclient 13m 57s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 47s the patch passed
+1 💚 findbugs 3m 35s the patch passed
_ Other Tests _
-1 ❌ unit 9m 15s hadoop-common in the patch passed.
+1 💚 unit 1m 34s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 55s The patch does not generate ASF License warnings.
125m 24s
Reason Tests
Failed junit tests hadoop.fs.ftp.TestFTPFileSystem
hadoop.metrics2.source.TestJvmMetrics
hadoop.io.compress.snappy.TestSnappyCompressorDecompressor
hadoop.io.compress.TestCompressorDecompressor
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/artifact/out/Dockerfile
GITHUB PR #1982
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux e2723b83da03 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 192cad9
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/artifact/out/whitespace-eol.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/testReport/
Max. process+thread count 1541 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/5/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus May 11, 2020
@apache apache deleted a comment from hadoop-yetus May 11, 2020
@apache apache deleted a comment from hadoop-yetus May 11, 2020
@steveloughran
Contributor Author

  • docs to specify that after close() stats are immutable
  • same after unbuffer(), until the stream is used again

@steveloughran
Contributor Author

Add methods to add two IOStatistics instances, and to subtract one from the other. This will help merge stats from multiple file reads/writes; and when we add the API to a long-lived instance (e.g. the s3a and abfs connectors), we can isolate IO better (though not across threads).
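A sketch of that arithmetic over simple counter maps (method names here are illustrative, not the patch's API): adding aggregates two sets of statistics, while subtracting a "before" snapshot from an "after" snapshot isolates the IO of one operation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative counter arithmetic over key -> long maps.
class StatsMath {
  // Sum of both maps; keys present on only one side keep their value.
  static Map<String, Long> add(Map<String, Long> a, Map<String, Long> b) {
    Map<String, Long> out = new HashMap<>(a);
    b.forEach((k, v) -> out.merge(k, v, Long::sum));
    return out;
  }

  // after - before, per key: what happened between the two probes.
  static Map<String, Long> subtract(Map<String, Long> after,
      Map<String, Long> before) {
    Map<String, Long> out = new HashMap<>(after);
    before.forEach((k, v) -> out.merge(k, -v, Long::sum));
    return out;
  }
}
```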

@steveloughran
Contributor Author

This patch adds LocalFS/RawLocalFS/ChecksummedFileSystem statistics passthrough and collection.

Getting everything passed through is actually the harder part of the process...
I ended up having to debug things just to work out what is going on there.

The fact that the raw local streams are buffered complicates testing.
The write tests expect the counter to not update until the stream
has closed; I need to expand the read tests this way too.

Although it makes for a bigger patch, it means that we get unit tests in
hadoop-common and that passthrough is all correct. It will also permit
applications to collect IO statistics on local storage operations.

New interface/impl to make it easy to instrument a class;
a map of key -> atomic long is built up as well as the stats mapping,
and all the stream needs to supply is a varargs list of counters

    private final CounterIOStatistics ioStatistics = counterIOStatistics(
        STREAM_READ_BYTES,
        STREAM_READ_EXCEPTIONS,
        STREAM_READ_SEEK_OPERATIONS,
        STREAM_READ_SKIP_OPERATIONS,
        STREAM_READ_SKIP_BYTES);

which can then be set or incremented

          ioStatistics.increment(STREAM_READ_BYTES, 1);
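The quoted snippet can be modelled with a small standalone class (a stand-in for CounterIOStatistics/counterIOStatistics(); the counter-name constants above are the patch's, everything else here is illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// A varargs list of counter names builds a map of AtomicLongs,
// which the stream then sets or increments by key.
class CounterStats {
  private final Map<String, AtomicLong> counters = new HashMap<>();

  static CounterStats counterStats(String... keys) {
    CounterStats s = new CounterStats();
    for (String k : keys) {
      s.counters.put(k, new AtomicLong());
    }
    return s;
  }

  long increment(String key, long delta) {
    return counters.get(key).addAndGet(delta);
  }

  long value(String key) {
    return counters.get(key).get();
  }
}
```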
          

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 22m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 50s Maven dependency ordering for branch
+1 💚 mvninstall 24m 32s trunk passed
+1 💚 compile 21m 38s trunk passed
+1 💚 checkstyle 3m 34s trunk passed
+1 💚 mvnsite 2m 43s trunk passed
+1 💚 shadedclient 24m 38s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 47s trunk passed
+0 🆗 spotbugs 1m 20s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 4m 4s trunk passed
-0 ⚠️ patch 1m 40s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 1m 42s the patch passed
+1 💚 compile 21m 25s the patch passed
-1 ❌ javac 21m 24s root generated 1 new + 1862 unchanged - 1 fixed = 1863 total (was 1863)
-0 ⚠️ checkstyle 3m 10s root: The patch generated 15 new + 160 unchanged - 22 fixed = 175 total (was 182)
+1 💚 mvnsite 2m 30s the patch passed
-1 ❌ whitespace 0m 0s The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 shadedclient 17m 24s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 47s the patch passed
+1 💚 findbugs 4m 13s the patch passed
_ Other Tests _
-1 ❌ unit 9m 57s hadoop-common in the patch passed.
+1 💚 unit 1m 34s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
169m 39s
Reason Tests
Failed junit tests hadoop.metrics2.source.TestJvmMetrics
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/artifact/out/Dockerfile
GITHUB PR #1982
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux e829422f0463 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / d4e3640
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/artifact/out/whitespace-eol.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/testReport/
Max. process+thread count 1599 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/6/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributed by Steve Loughran.

This patch adds to hadoop-common an API for querying IO classes (especially
input and output streams) for statistics.

It includes a big rework of the S3A Statistics including

* implementation of the IOStatistics APIs
* contract tests for those and any other streams which implement the same interfaces and the same bytes read/written counters.
* A split of the existing S3AInstrumentation classes into interface/implementations.
* Troubled attempt to wire up the AWS SDK metrics

The AWS metric binding will need to be split out and addressed separately,
because the wiring up is breaking some of our region handling code.

Doing the public interface hand-in-hand with that implementation helps evolve
the interface, but it makes for a bigger patch.

and contract tests for those and any other streams which implement
the same interfaces and the same bytes read/written counters.

Proposed: once the reviewers are happy with the design we can split the two up
into the hadoop-common changes (which can be used in ABFS) and the S3A FS
changes.

Writing up the package-info class makes me think it is a bit overcomplicated
right now and that maybe we should go for "dynamic always" statistics. If you
want a snapshot, it can be wrapped with IOStatisticsSupport.takeSnapshot().
This will simplify the client code and remove ambiguity in implementations as
to what they should be doing. We either provide callbacks to evaluate values or
references to AtomicLongs/AtomicIntegers which are probed on demand.
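A sketch of that "dynamic always" model with an explicit snapshot step. The patch names the operation IOStatisticsSupport.takeSnapshot(); this standalone version just freezes the current evaluator values into a plain serializable map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Statistics are always dynamic: each key maps to an evaluator probed on
// demand. A snapshot copies the current values into an ordinary HashMap,
// which is serializable and no longer changes.
class DynamicStats {
  private final Map<String, LongSupplier> evaluators = new HashMap<>();

  void bind(String key, LongSupplier eval) {
    evaluators.put(key, eval);
  }

  long probe(String key) {
    return evaluators.get(key).getAsLong();
  }

  HashMap<String, Long> takeSnapshot() {
    HashMap<String, Long> snap = new HashMap<>();
    evaluators.forEach((k, e) -> snap.put(k, e.getAsLong()));
    return snap;
  }
}
```

The live view keeps moving as counters update; the snapshot stays fixed, which is what a client like Spark would serialize.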

Change-Id: I91ae225d7def59602e84729f62a32f77158d97f6
This retains all the logic for using the builder API to create the
AWS SDK S3 client, but as that was failing in our endpoint/region
setup logic, it's been disabled in failing tests and in production.

It's still there, ready to be turned on once someone fixes the
regression.

Change-Id: If3251b7c835ad7da73bc2666cfa743d68f19ed24
…ounters

aren't updating correctly (or wrong ones...)

Change-Id: Ic2ca78fa5694dd579394c17fb01879cad6ebe8e1
Drastically simplify the statistics API by removing any distinction between
snapshot/dynamic; everything is expected to be dynamic. There is a way to
snapshot them, and that is actually serializable (tested). This allows
things like Spark to serialize statistics without extra work.

Add explicit tests for the DynamicIOStatistics which is expected to be the
sole implementation applications are likely to need.

This should be fairly straightforward to work with; IOStatisticsLogging is
the public API to convert statistics to strings robustly, especially in
log statements.
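A sketch of that logging helper's contract (IOStatisticsLogging in the patch; this standalone version is illustrative): stringify statistics for a log statement without ever throwing, since a failure while evaluating statistics inside a log call must not break the caller.

```java
import java.util.Map;
import java.util.TreeMap;

// Robust stringification for log statements: null-safe, and any failure
// while evaluating the statistics is swallowed rather than propagated.
class StatsLogging {
  static String ioStatisticsToString(Map<String, Long> stats) {
    if (stats == null) {
      return "";
    }
    try {
      StringBuilder sb = new StringBuilder("(");
      // Sort keys so the output is stable across runs.
      new TreeMap<>(stats).forEach(
          (k, v) -> sb.append(k).append('=').append(v).append("; "));
      return sb.append(')').toString();
    } catch (RuntimeException e) {
      return "(statistics unavailable: " + e + ")";
    }
  }
}
```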

AWS tests are failing with

* the broken region stuff

* the put request count isn't being updated;
ITestS3AHugeMagicCommits>AbstractSTestS3AHugeFiles.test_010_CreateHugeFile
is failing.

Change-Id: I4d14cbef3a24379828b10ddc6a4cd793cb5d6f73
* Tune IOStatisticsLogging methods and output.

* Review/improve javadocs

* turn off failing network binding tests related to endpoint selection.

* Reinstate direct update of StorageStatistics values in S3AFS.IncrementStatistics

The last one was me thinking that the statistics context was back-updating
the filesystem storage statistics. IMO, it should; for now I just
reinstated the method.

Change-Id: I34033dd04a9cf88f84e2afd88eb70fdd127c5192
* Address Mukund's comments
* Add markdown page on the API
* Add tests for the on-demand stringifier
* Make S3A BlockOutputStreamStatistics and S3AInputStreamStatistics
  implement IOStatisticsSource and provide their (on-demand) IOStatistics
  instances through this API. That is: use it internally as well as
  a public API.

Change-Id: I42b8a17b2cf9af03ca358d4e0ac3ebddf71d12f0
Change-Id: I6aa053bd713dc754bb4428f075dbabae4466914c
cleanup: checkstyle, javadoc and other tweaks
This patch adds LocalFS/RawLocalFS/ChecksummedFileSystem statistics passthrough
 and collection.

Getting everything passed through is actually the harder part of the process...
I ended up having to debug things just to work out what is going on there.

The fact that the raw local streams are buffered complicates testing.
The write tests expect the counter to not update until the stream
has closed; I need to expand the read tests this way too.

Although it makes for a bigger patch, it means that we get unit tests in
hadoop-common and that passthrough is all correct. It will also permit
applications to collect IO statistics on local storage operations.

Change-Id: Ibf9ebdb55aa57ef95199d5ccb6783cbf10216db9
Change-Id: I64e5c0e91d05e245e0615e695fb32f8a83f61958
@steveloughran steveloughran force-pushed the s3/HADOOP-16830-iostatistics branch from 72fe39e to 395ecd5 Compare May 20, 2020 15:27
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 49s Maven dependency ordering for branch
+1 💚 mvninstall 19m 8s trunk passed
+1 💚 compile 17m 8s trunk passed
+1 💚 checkstyle 2m 43s trunk passed
+1 💚 mvnsite 2m 19s trunk passed
+1 💚 shadedclient 19m 38s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 48s trunk passed
+0 🆗 spotbugs 1m 13s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 16s trunk passed
-0 ⚠️ patch 1m 34s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 27s hadoop-aws in the patch failed.
-1 ❌ compile 15m 33s root in the patch failed.
-1 ❌ javac 15m 33s root in the patch failed.
-0 ⚠️ checkstyle 2m 43s root: The patch generated 15 new + 160 unchanged - 22 fixed = 175 total (was 182)
-1 ❌ mvnsite 0m 49s hadoop-aws in the patch failed.
-1 ❌ whitespace 0m 0s The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 14m 12s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 47s the patch passed
-1 ❌ findbugs 0m 48s hadoop-aws in the patch failed.
_ Other Tests _
+1 💚 unit 9m 17s hadoop-common in the patch passed.
-1 ❌ unit 0m 47s hadoop-aws in the patch failed.
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
118m 58s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/Dockerfile
GITHUB PR #1982
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux e76e345154f8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 29b19cd
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/diff-checkstyle-root.txt
mvnsite https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/whitespace-eol.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/patch-findbugs-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/testReport/
Max. process+thread count 3417 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/7/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

+javadocs

Change-Id: I83506eacf2fdec80e0db3c5fe23fb39b61b16abd
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 46s Maven dependency ordering for branch
+1 💚 mvninstall 19m 45s trunk passed
+1 💚 compile 17m 24s trunk passed
+1 💚 checkstyle 2m 45s trunk passed
+1 💚 mvnsite 2m 20s trunk passed
+1 💚 shadedclient 19m 30s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 46s trunk passed
+0 🆗 spotbugs 1m 10s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 16s trunk passed
-0 ⚠️ patch 1m 32s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 27s hadoop-aws in the patch failed.
-1 ❌ compile 15m 27s root in the patch failed.
-1 ❌ javac 15m 27s root in the patch failed.
-0 ⚠️ checkstyle 2m 43s root: The patch generated 15 new + 160 unchanged - 22 fixed = 175 total (was 182)
-1 ❌ mvnsite 0m 49s hadoop-aws in the patch failed.
-1 ❌ whitespace 0m 0s The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 14m 17s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 47s the patch passed
-1 ❌ findbugs 0m 47s hadoop-aws in the patch failed.
_ Other Tests _
+1 💚 unit 9m 18s hadoop-common in the patch passed.
-1 ❌ unit 0m 46s hadoop-aws in the patch failed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
119m 40s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/Dockerfile
GITHUB PR #1982
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux c72bcbdd95a1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 1a3c6bb
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/diff-checkstyle-root.txt
mvnsite https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/whitespace-eol.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/patch-findbugs-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/testReport/
Max. process+thread count 2702 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/8/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

handoff ~all counters to CounterIOStatistics; extending that for
the expanded use (add, diff)

We could actually simplify the input stats interface by just using get/set
on the keys; avoiding that for now keeps a bit of future flexibility

Change-Id: I649e8c5a8e8571b032a5a14590bb71150db1c95b
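The commit above mentions extending counter statistics for add/diff use. A minimal, hypothetical sketch of a map-backed counter holder with get/set plus add and diff aggregation (illustrative names only, not the actual CounterIOStatistics class):

```java
// Hypothetical sketch of a map-backed counter statistics holder with
// add/diff operations; not the real Hadoop CounterIOStatistics API.
import java.util.HashMap;
import java.util.Map;

public class CounterStatsSketch {
    private final Map<String, Long> counters = new HashMap<>();

    public long get(String key) { return counters.getOrDefault(key, 0L); }

    public void set(String key, long value) { counters.put(key, value); }

    /** Add every counter of the other set into this one. */
    public void add(CounterStatsSketch other) {
        other.counters.forEach((k, v) -> counters.merge(k, v, Long::sum));
    }

    /** Subtract the other set's counters, e.g. to get a delta since a snapshot. */
    public void diff(CounterStatsSketch other) {
        other.counters.forEach((k, v) -> counters.merge(k, -v, Long::sum));
    }

    public static void main(String[] args) {
        CounterStatsSketch a = new CounterStatsSketch();
        a.set("stream_read_bytes", 100L);
        CounterStatsSketch b = new CounterStatsSketch();
        b.set("stream_read_bytes", 40L);
        a.diff(b);
        System.out.println(a.get("stream_read_bytes"));  // 60
    }
}
```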
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 1s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 0s Maven dependency ordering for branch
+1 💚 mvninstall 20m 23s trunk passed
+1 💚 compile 20m 15s trunk passed
+1 💚 checkstyle 3m 40s trunk passed
+1 💚 mvnsite 2m 38s trunk passed
+1 💚 shadedclient 25m 6s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 44s trunk passed
+0 🆗 spotbugs 1m 18s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 47s trunk passed
-0 ⚠️ patch 1m 42s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 1m 49s the patch passed
+1 💚 compile 22m 28s the patch passed
-1 ❌ javac 22m 28s root generated 1 new + 1862 unchanged - 1 fixed = 1863 total (was 1863)
-0 ⚠️ checkstyle 3m 39s root: The patch generated 30 new + 160 unchanged - 22 fixed = 190 total (was 182)
+1 💚 mvnsite 2m 31s the patch passed
-1 ❌ whitespace 0m 0s The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 17m 52s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 42s the patch passed
+1 💚 findbugs 4m 13s the patch passed
_ Other Tests _
+1 💚 unit 10m 47s hadoop-common in the patch passed.
+1 💚 unit 1m 30s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 52s The patch does not generate ASF License warnings.
146m 20s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/9/artifact/out/Dockerfile
GITHUB PR #1982
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux 84e4bea4eded 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 9685314
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/9/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/9/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/9/artifact/out/whitespace-eol.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/9/testReport/
Max. process+thread count 1547 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/9/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Change-Id: I2b6aa6b1dd3ac5844f372906228b3b8dfc39c38e
Change-Id: Ife60eabcfe78c54d2e15ad5d7365d1a42ad48ed9
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for branch
+1 💚 mvninstall 21m 37s trunk passed
+1 💚 compile 18m 18s trunk passed
+1 💚 checkstyle 2m 58s trunk passed
+1 💚 mvnsite 2m 9s trunk passed
+1 💚 shadedclient 21m 23s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 32s trunk passed
+0 🆗 spotbugs 1m 8s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 18s trunk passed
-0 ⚠️ patch 1m 26s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 34s the patch passed
+1 💚 compile 18m 1s the patch passed
-1 ❌ javac 18m 1s root generated 1 new + 1862 unchanged - 1 fixed = 1863 total (was 1863)
-0 ⚠️ checkstyle 2m 57s root: The patch generated 20 new + 160 unchanged - 22 fixed = 180 total (was 182)
+1 💚 mvnsite 2m 7s the patch passed
-1 ❌ whitespace 0m 0s The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 15m 40s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 30s the patch passed
+1 💚 findbugs 3m 33s the patch passed
_ Other Tests _
+1 💚 unit 9m 50s hadoop-common in the patch passed.
+1 💚 unit 1m 26s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
129m 50s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/10/artifact/out/Dockerfile
GITHUB PR #1982
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint xml
uname Linux 039c4101d198 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 0c25131
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/10/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/10/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/10/artifact/out/whitespace-eol.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/10/testReport/
Max. process+thread count 3242 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/10/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor Author

checkstyle: I intend to ignore the warnings about the _1, _2 and _3 methods, as they match Scala's; I plan to add tuple/triple classes with those methods to hadoop utils soon
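A hypothetical sketch of what such a tuple class with Scala-style `_1`/`_2` accessors could look like (illustrative only; not the class that was actually added):

```java
// Hypothetical sketch of a tuple with Scala-style accessor names,
// the kind of class the checkstyle warnings are about.
public final class Tuple2<A, B> {
    private final A first;
    private final B second;

    public Tuple2(A first, B second) {
        this.first = first;
        this.second = second;
    }

    /** First element; named to match Scala's Tuple2#_1. */
    public A _1() { return first; }

    /** Second element; named to match Scala's Tuple2#_2. */
    public B _2() { return second; }

    public static void main(String[] args) {
        Tuple2<String, Long> bytesRead = new Tuple2<>("stream_read_bytes", 1024L);
        System.out.println(bytesRead._1() + " = " + bytesRead._2());
    }
}
```

The underscore method names are what trips checkstyle's naming rules, hence the plan to suppress those warnings.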

Change-Id: I10497244526a133091babcf45bf24af6f8d8c3d6
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 6s #1982 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.
Subsystem Report/Notes
GITHUB PR #1982
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1982/11/console
versions git=2.17.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor Author

aah, I need to rebase. @mehakmeet - this is going to complicate your life, sorry. I'll do a whole new PR.

having written the new extensible design, I've decided I don't like it. It is too complex, as I'm trying to support arbitrary-arity tuples of any kind of statistic. It makes iterating/parsing this stuff way too complex.

here's a better idea: we only support a limited set:

  • counter: long
  • min: long
  • max: long
  • mean: (double, long)
  • gauge: long
  1. All but gauge have simple aggregation; for gauge I'll add the values up too, on the assumption that they will be positive (e.g. "number of active reads").
  2. And every set will have its own iterator.
what do people think?
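The proposed aggregation rules can be sketched in a few lines. This is a hypothetical illustration of the semantics described above (counters, min, max, and gauges as longs; mean as a (sum, sample-count) pair so means stay mergeable), not the actual Hadoop API:

```java
// Hypothetical sketch of the proposed limited statistic set and its
// aggregation rules; names are illustrative, not the Hadoop IOStatistics API.
public class StatisticsSketch {

    /** Mean kept as (sum, sample count) so two means can be merged exactly. */
    static final class MeanStatistic {
        final double sum;
        final long samples;
        MeanStatistic(double sum, long samples) {
            this.sum = sum;
            this.samples = samples;
        }
        double mean() { return samples == 0 ? 0 : sum / samples; }
        /** Aggregate two means by adding sums and sample counts. */
        MeanStatistic aggregate(MeanStatistic other) {
            return new MeanStatistic(sum + other.sum, samples + other.samples);
        }
    }

    // Simple aggregation for the long-valued statistics:
    static long aggregateCounter(long a, long b) { return a + b; }
    static long aggregateMin(long a, long b) { return Math.min(a, b); }
    static long aggregateMax(long a, long b) { return Math.max(a, b); }
    // Gauges are also summed, on the assumption that they hold
    // non-negative values such as "number of active reads".
    static long aggregateGauge(long a, long b) { return a + b; }

    public static void main(String[] args) {
        MeanStatistic merged =
            new MeanStatistic(10.0, 2).aggregate(new MeanStatistic(2.0, 2));
        System.out.println("counter: " + aggregateCounter(3, 4));  // 7
        System.out.println("min: " + aggregateMin(3, 4));          // 3
        System.out.println("max: " + aggregateMax(3, 4));          // 4
        System.out.println("mean: " + merged.mean());              // 3.0
    }
}
```

Keeping the mean as a (sum, count) pair rather than a single double is what makes per-stream means aggregate correctly across streams with different sample counts.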

@steveloughran steveloughran deleted the s3/HADOOP-16830-iostatistics branch June 11, 2020 13:23

4 participants