HADOOP-17461. Thread-level IOStatistics in S3A #4352
main comment is that the thread's statistics aggregator should be fetched/stored in the constructor, not in close(). indeed, it could maybe be passed in. when thread-level collection is disabled, s3a fs would just pass in an EmptyIOStatisticsStore whose aggregation is a no-op.
proposed changes to the testing
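A minimal sketch of the pattern suggested in the comment above: the stream picks up the aggregator when it is constructed and merges its own counters into it in close(). Every name here (`Aggregator`, `NOOP`, `ThreadStats`, `SketchStream`) is a hypothetical stand-in and not the Hadoop API; `NOOP` only plays the role that the comment gives to EmptyIOStatisticsStore.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins for illustration only; not the Hadoop classes. */
final class StreamAggregationSketch {

  /** Something that can absorb a stream's final counters. */
  interface Aggregator {
    void aggregate(Map<String, Long> counters);
  }

  /** Used when thread-level collection is disabled: aggregation is a no-op. */
  static final Aggregator NOOP = counters -> { };

  /** Per-thread statistics which sum up whatever is aggregated into them. */
  static final class ThreadStats implements Aggregator {
    final Map<String, Long> totals = new ConcurrentHashMap<>();

    @Override
    public void aggregate(Map<String, Long> counters) {
      counters.forEach((k, v) -> totals.merge(k, v, Long::sum));
    }
  }

  /** A stream which caches the aggregator at construction time. */
  static final class SketchStream implements AutoCloseable {
    private final Aggregator threadAggregator;
    private final Map<String, Long> counters = new ConcurrentHashMap<>();

    SketchStream(Aggregator threadAggregator) {
      // stored here, so a close() issued from another thread still updates
      // the statistics of the thread which opened the stream
      this.threadAggregator = threadAggregator;
    }

    void read(int bytes) {
      counters.merge("stream_read_bytes", (long) bytes, Long::sum);
    }

    @Override
    public void close() {
      threadAggregator.aggregate(counters);
    }
  }

  /** Open a stream, read twice, close; return the total held in stats. */
  static long readAndClose(Aggregator aggregator, ThreadStats stats) {
    try (SketchStream in = new SketchStream(aggregator)) {
      in.read(100);
      in.read(28);
    }
    return stats.totals.getOrDefault("stream_read_bytes", 0L);
  }
}
```

Passing the aggregator in keeps the stream unaware of whether collection is enabled: the disabled case is just a different argument.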
needs indentation
...adoop-common/src/main/java/org/apache/hadoop/fs/statistics/impl/IOStatisticsContextImpl.java
nit: space between current and thread
...adoop-common/src/main/java/org/apache/hadoop/fs/statistics/impl/IOStatisticsContextImpl.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AReadOpContext.java
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AIOStatisticsContext.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java
Thanks for the review @steveloughran, sorry I couldn't address anything until now (got a little ill).
Got your point, so just one concern on that: should the IOStatisticsContext be a static instance in the S3AFileSystem, with just the iostatisticsAggregator passed on to the streams? We would still need the context in the streams to update the WeakReferenceThreadMap after the aggregation, right?

We need a single IOStatisticsContext for all FS instances, so that a task reading from one FS and writing to another would have the stats updated from both actions. And it needs to be in hadoop-common, just like the common audit context, so that code can compile and link against it even without having hadoop-aws or hadoop-azure on the classpath. Making the context a weak ref map ensures that GCs will trigger cleanup. This would lose stats, but not while any stream was active, or if some code picked up a reference to the thread stats before executing work. That is what I plan to do in Spark: we will grab that ref before starting the work and, after it is finished, take a snapshot of it. Oh, and reset the values before work starts; we need that context there too.
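The idea of one process-wide lookup backed by weak references can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `ThreadContextRegistrySketch`, `Context` and `currentContext()` are hypothetical names, and a plain `ConcurrentHashMap` of `WeakReference` values stands in for the WeakReferenceThreadMap mentioned above.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a process-wide, per-thread context lookup. */
final class ThreadContextRegistrySketch {

  /** Per-thread statistics; a single counter keeps the sketch short. */
  static final class Context {
    final AtomicLong bytes = new AtomicLong();
  }

  // thread id -> weakly referenced context. Once nothing else holds a
  // context, a GC may clear the reference and its statistics are lost.
  private static final Map<Long, WeakReference<Context>> CONTEXTS =
      new ConcurrentHashMap<>();

  /** Look up, or create, the context of the calling thread. */
  static Context currentContext() {
    long id = Thread.currentThread().getId();
    WeakReference<Context> ref = CONTEXTS.get(id);
    Context ctx = ref == null ? null : ref.get();
    if (ctx == null) {
      ctx = new Context();
      CONTEXTS.put(id, new WeakReference<>(ctx));
    }
    return ctx;
  }

  /** Two unrelated "filesystems" updating the same thread context. */
  static long readFromOneWriteToAnother() {
    // a strong reference held for the duration of the work pins the context
    Context pinned = currentContext();
    pinned.bytes.set(0);                  // reset before the work starts
    currentContext().bytes.addAndGet(10); // e.g. a read through one FS
    currentContext().bytes.addAndGet(32); // e.g. a write through another FS
    return pinned.bytes.get();            // read the total after the work
  }
}
```

Because the lookup is static and lives outside any filesystem class, both operations land in the same context without either side knowing about the other.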
other than the need to move the thread context into hadoop common, everything here looks pretty complete
...project/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/IOStatisticsContext.java
yes
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java

Had to force push to resolve conflicts. Some changes in the latest commit:

More merge conflicts due to vectored IO merging.
Your tests show that we need the ability to reset the statistics for the current thread.
For the tests we wouldn't need any class-level setup. Instead, the normal setup method would get the current context stats and reset them. This is also exactly what we will need to do when collecting statistics in applications: grab a reference to that context before executing the work and reset it at that point, so that all statistics collected on it at the end of the work will have been added exclusively during that execution.
Proposed: IOStatisticsContext adds resetCurrentThreadStatistics() which does this, calling clear() on the snapshot.
ITestS3AIOStatisticsContext can call resetCurrentThreadStatistics() in its normal setup().
Then have the test cases which verify that stats are shared invoke some method which does the file create/read, taking in a path which can be derived from methodPath. That way: no hardcoded paths in the test suite.
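The reset-in-setup proposal can be sketched like this. It is only an illustration: the class and the `record`, `snapshot` and `runTestCase` helpers are hypothetical, the statistic names are made up, and a `ThreadLocal` is used purely to keep the example short (the PR discusses a weak-reference map instead).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: reset thread statistics before each unit of work. */
final class ResetBeforeWorkSketch {

  // A ThreadLocal is a simplification made for this sketch.
  private static final ThreadLocal<Map<String, Long>> STATS =
      ThreadLocal.withInitial(() -> new TreeMap<String, Long>());

  /** Stand-in for the proposed resetCurrentThreadStatistics(). */
  static void resetCurrentThreadStatistics() {
    STATS.get().clear();
  }

  /** Stand-in for an instrumented operation such as a file create or read. */
  static void record(String key, long value) {
    STATS.get().merge(key, value, Long::sum);
  }

  /** Copy of the current thread's statistics. */
  static Map<String, Long> snapshot() {
    return new TreeMap<>(STATS.get());
  }

  /** Shape of a test case: setup resets, the body only sees its own I/O. */
  static Map<String, Long> runTestCase(long bytesWritten, long bytesRead) {
    resetCurrentThreadStatistics();   // what setup() would do
    record("bytes_written", bytesWritten);
    record("bytes_read", bytesRead);
    return snapshot();
  }
}
```

Whatever an earlier test left behind on the same thread is discarded by the reset, so assertions can use exact values.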
...ct/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/impl/IOStatisticsContext.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AIOStatisticsContext.java
minor tuning of the reset operation, to make things slightly more efficient
...ct/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/impl/IOStatisticsContext.java
Changed the way streams access the thread IOStatistics: we now get them directly from the currently active context in the stream's constructor rather than passing them around through the builders, which didn't seem to add anything given the static nature of the context. Also made the weak reference hold an IOStatisticsSnapshot rather than an aggregator which is then cast, as discussed above.
I'm writing an initial PoC of stats collection in Spark.
Spark wants to do incremental updates as a task goes along, which it does with a heartbeat thread invoking callbacks to collect the thread-level stats of the worker threads.
We can wire that up simply by calling getThreadIOStatistics() in the worker thread and caching the value. But resetting after updates isn't so easy.
I think we are going to need a specific class for each thread which offers the aggregator, snapshot and reset operations.
We can do that in IOStatisticsContext by adding non-static state there, including reset() and snapshot() methods.
In the worker thread I would cache that value:
    val workerContext: IOStatisticsContext = IOStatisticsContext.getContextForCurrentThread
Then to update the task metrics, the operation would be:
    def getCurrentStatistics(): IOStatistics = {
      // snapshot current value
      val current = workerContext.snapshot
      // then reset the stats on the worker thread.
      workerContext.reset
      current
    }
    // which then is used to update the task metrics
    taskMetrics.updateStatistics(getCurrentStatistics())

     *
     * @return the instance of IOStatisticsAggregator for the current thread.
     */
    public IOStatisticsAggregator getThreadIOStatistics() {
let's call this ThreadIOStatisticsAggregator
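The heartbeat-style collection discussed in this thread can be sketched in Java as below. This is a hedged illustration, not the Spark or Hadoop code: `WorkerContextSketch` and its methods are hypothetical, and `snapshotAndReset()` is an addition made for this sketch to show one way of avoiding the window between a separate snapshot and reset in which an update could be dropped.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-thread context offering snapshot and reset. */
final class WorkerContextSketch {

  private final AtomicLong bytesRead = new AtomicLong();

  /** Called by instrumented code on the worker thread. */
  void addBytesRead(long n) {
    bytesRead.addAndGet(n);
  }

  /** Current value, leaving the counter untouched. */
  long snapshot() {
    return bytesRead.get();
  }

  /** Zero the counter. */
  void reset() {
    bytesRead.set(0);
  }

  /** Snapshot and reset as one atomic step, so no update falls in between. */
  long snapshotAndReset() {
    return bytesRead.getAndSet(0);
  }

  /** A worker adds data while a "heartbeat" loop collects the increments. */
  static long collectIncrementally() {
    WorkerContextSketch workerContext = new WorkerContextSketch();
    AtomicLong taskMetrics = new AtomicLong();
    Thread worker = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        workerContext.addBytesRead(1);
      }
    });
    worker.start();
    // heartbeat: fold whatever has accumulated so far into the task metrics
    while (worker.isAlive()) {
      taskMetrics.addAndGet(workerContext.snapshotAndReset());
    }
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    // final update once the worker has finished
    taskMetrics.addAndGet(workerContext.snapshotAndReset());
    return taskMetrics.get();
  }
}
```

The heartbeat only needs the cached context reference, which is why the thread above asks for a per-thread object rather than static helper methods alone.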
see https://github.com/steveloughran/spark/tree/HADOOP-17461-iostatistics and where we'd be doing the collection for the spark RDDs (not the sql code, which is more significant but harder to get started on)

ok, i have some changes but I will add them as a commit for you to review/cherrypick...i want to make sure they line up with a local spark build. key point: interfaces now support static methods, so looking up the current context can/should be a static method in IOStatisticsContext
* move reference map and lookup to a IOStatisticsContextIntegration class
* static method in IOStatisticsContext to relay lookup
* add method to switch a thread's context; needed to aggregate worker thread IO in threads doing work for committers without the need to explicitly collect and pass back the stats
* production code moves to the new methods
* tests move to this and away from looking up the fields in the streams
* stats are reset in s3a test setup
* s3a committers collect data read stats during job commit and include in summary statistics. This is only the stats when reading manifest files, not the actual work.
* tests to print the aggregate of all loaded success files in the run.

Change-Id: I604990f2132b76d38e85ca8b777630225c32158e
* Cost of scan/load of magic files in task commit are collected
* S3A list iterators update the context stats of the thread they were created in in close() calls.
* With close() passthrough working and TaskPool invoking it if the iterator is closeable.

Change-Id: If0a0c2de08d52a74b7c1f9498716d423b97b4003
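Switching a worker thread's context so that pooled work is counted against the submitting thread might look roughly like this. It is a sketch under assumptions: `ContextPropagationSketch`, `setThreadContext` and `propagate` are hypothetical names, a `ThreadLocal` replaces the reference map for brevity, and the wrapper only imitates what a task pool could do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of propagating a context into worker threads. */
final class ContextPropagationSketch {

  static final class Context {
    final AtomicLong filesLoaded = new AtomicLong();
  }

  // ThreadLocal used for brevity; an assumption of this sketch.
  private static final ThreadLocal<Context> CURRENT =
      ThreadLocal.withInitial(Context::new);

  static Context current() {
    return CURRENT.get();
  }

  /** Switch the calling thread's context; null resets it to a fresh one. */
  static void setThreadContext(Context ctx) {
    if (ctx == null) {
      CURRENT.remove();
    } else {
      CURRENT.set(ctx);
    }
  }

  /** Wrap a task so that it runs under the submitting thread's context. */
  static Runnable propagate(Runnable task) {
    Context submitter = current();
    return () -> {
      Context previous = current();
      setThreadContext(submitter);
      try {
        task.run();
      } finally {
        setThreadContext(previous);   // restore the worker's own context
      }
    };
  }

  /** Load "files" in a pool; all updates land in the caller's context. */
  static long loadInParallel(int files) {
    Context ctx = current();
    ctx.filesLoaded.set(0);
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < files; i++) {
        futures.add(pool.submit(
            propagate(() -> current().filesLoaded.incrementAndGet())));
      }
      for (Future<?> f : futures) {
        f.get();
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return ctx.filesLoaded.get();
  }
}
```

With this shape the worker code never collects or passes back statistics explicitly; it just updates whatever context is current.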
This is my draft commit message btw

Adds a new IOStatistics class IOStatisticsContext.
This is the active collector of thread-level statistics for
The S3A Filesystem's input and output streams, and listing
The IOStatisticsContext of a thread can be retrieved and
To collect statistics on a thread:
To instrument filesystem objects for thread-level
TaskPool does the context propagation and reset

Contributed by Mehakmeet Singh
+setting thread context to null resets it
+move merging of fs stats into finally block, after streamStatistics close has been called to update final stats

Change-Id: I913b6a473da12918025e7ec11d4168bd135f0fc5
Pushed the changes, makes sense in case of null IOStatisticsContext. There is one more issue, in case we have Also
*and Steve Loughran |
good point. how about adding a static method to enable it in the integration class? disabling it would be harder...what if contexts had already been generated? easier to only allow the caller to go from off to on. also, had an idea: add a stream capability then add asserts to the test to verify this
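A one-way enable switch plus a capability probe could be sketched as follows. Everything here is hypothetical: the class, the `enable()` method and the capability string are invented for illustration, and tying the capability to the state of the switch at stream construction is an assumption of this sketch rather than something stated in the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of an off-to-on switch and a capability probe. */
final class EnableOnlySwitchSketch {

  /** Made-up capability name, for illustration only. */
  static final String CAPABILITY = "sketch.capability.iocontext";

  private static final AtomicBoolean ENABLED = new AtomicBoolean(false);

  /** One-way switch: callers can only go from off to on. */
  static void enable() {
    ENABLED.set(true);
  }

  static boolean isEnabled() {
    return ENABLED.get();
  }

  /** A stream which can be probed for context support. */
  static final class SketchStream {
    private final boolean contextAware;

    SketchStream() {
      // decided once, at construction time (an assumption of this sketch)
      this.contextAware = isEnabled();
    }

    boolean hasCapability(String capability) {
      return contextAware && CAPABILITY.equals(capability);
    }
  }
}
```

There is deliberately no disable(): streams created while collection was on would otherwise need to be tracked down, which is the difficulty raised in the comment above.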
looks good, some minor tweaks
...ommon/src/main/java/org/apache/hadoop/fs/statistics/impl/IOStatisticsContextIntegration.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
+1 pending the checkstyle fix
ok, one more change, as I write up something on integration. The RawLocal streams should support IOContext too. why so? makes testing integration easier, e.g. for distcp, spark etc. no need to wait until object store tests. This also lines up for moving ITestS3AIOStatisticsContext into hadoop common unit tests as a contract test. I'm not going to make that a requirement of this PR, but for adding abfs in we should do that

-1 to the last patch. pick up 9dd221d from my PR

(the streams need to cache the aggregator and update in close)
This reverts commit de12dbe.
This is important for testing IOStatisticsContext functionality in unit tests as well as a source of actual data.
+fixed the checkstyle

Change-Id: I4f429a6a81729027026dc46bd1519f90a145c205
+1 pending yetus

🎊 +1 overall
This message was automatically generated.

ok, merged. Can you do a PR/cherrypick into branch-3.3 now?
This adds a thread-level collector of IOStatistics, IOStatisticsContext, which can be:
* Retrieved for a thread and cached for access from other threads.
* reset() to record new statistics.
* Queried for live statistics through the IOStatisticsSource.getIOStatistics() method.
* Queried for a statistics aggregator for use in instrumented classes.
* Asked to create a serializable copy in snapshot().

The goal is to make it possible for applications with multiple threads performing different work items simultaneously to be able to collect statistics on the individual threads, and so generate aggregate reports on the total work performed for a specific job, query or similar unit of work.

Some changes in IOStatistics-gathering classes are needed for this feature:
* Caching the active context's aggregator in the object's constructor.
* Updating it in close().

Slightly more work is needed in multithreaded code, such as the S3A committers, which collect statistics across all threads used in task and job commit operations.

Currently the IOStatisticsContext-aware classes are:
* The S3A input stream, output stream and list iterators.
* RawLocalFileSystem's input and output streams.
* The S3A committers.
* The TaskPool class in hadoop-common, which propagates the active context into scheduled worker threads.

Collection of statistics in the IOStatisticsContext is disabled process-wide by default until the feature is considered stable. To enable the collection, set the option fs.thread.level.iostatistics.enabled to "true" in core-site.xml.

Contributed by Mehakmeet Singh and Steve Loughran
Description of PR
Adding thread-level IOStatistics in hadoop-common and implementing it in S3A streams.
How was this patch tested?
Region: ap-south-1
mvn clean verify -Dparallel-tests -DtestsThreadCount=4 -Dscale
All tests ran fine.