
Commit f7b2638

HADOOP-18820: cut v1ProviderReferenced; update audit docs
Cut unused v1ProviderReferenced() method

audit doc update
- auditing is enabled again
- new fs.s3a.audit.execution.interceptors option
- how to log httpclient

Change-Id: Ib0c5b86407e059d53c469151cf3d89ede65e9116
1 parent 644b390 commit f7b2638


3 files changed

+45
-25
lines changed


hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/V2Migration.java

Lines changed: 1 addition & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -45,30 +45,16 @@ private V2Migration() { }
4545

4646
public static final Logger SDK_V2_UPGRADE_LOG = LoggerFactory.getLogger(SDK_V2_UPGRADE_LOG_NAME);
4747

48-
private static final LogExactlyOnce WARN_OF_DIRECTLY_REFERENCED_CREDENTIAL_PROVIDER =
49-
new LogExactlyOnce(SDK_V2_UPGRADE_LOG);
50-
5148
private static final LogExactlyOnce WARN_OF_REQUEST_HANDLERS =
5249
new LogExactlyOnce(SDK_V2_UPGRADE_LOG);
5350

54-
/**
55-
* Notes an AWS V1 credential provider being referenced directly.
56-
* @param name name of the credential provider
57-
*/
58-
public static void v1ProviderReferenced(String name) {
59-
WARN_OF_DIRECTLY_REFERENCED_CREDENTIAL_PROVIDER.debug(
60-
"Directly referencing AWS SDK V1 credential provider {}. AWS SDK V1 credential "
61-
+ "providers will be removed once S3A is upgraded to SDK V2", name);
62-
}
63-
64-
6551
/**
6652
* Notes use of request handlers.
6753
*/
6854
public static void v1RequestHandlersUsed() {
6955
WARN_OF_REQUEST_HANDLERS.warn(
7056
"The request handler interface has changed in AWS SDK V2, use exception interceptors "
71-
+ "once S3A is upgraded to SDK V2");
57+
+ "now S3A is upgraded to SDK V2");
7258
}
7359

7460
}

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md

Lines changed: 36 additions & 10 deletions
@@ -22,7 +22,7 @@ and inside the AWS S3 SDK, immediately before the request is executed.
2222
The full architecture is covered in [Auditing Architecture](auditing_architecture.html);
2323
this document covers its use.
2424

25-
## Important: Auditing is disabled by default
25+
## Important: Auditing is currently enabled
2626

2727
Due to a memory leak from the use of `ThreadLocal` fields, this auditing feature
2828
leaked memory as S3A filesystem instances were created and deleted.
@@ -32,7 +32,7 @@ See [HADOOP-18091](https://issues.apache.org/jira/browse/HADOOP-18091) _S3A audi
3232

3333
To avoid these memory leaks, auditing was disabled by default in the hadoop 3.3.2 release.
3434

35-
As these memory leaks have now been fixed, auditing has been re-enabled.
35+
As these memory leaks have now been fixed, auditing has been re-enabled in Hadoop 3.3.5+
3636

3737
To disable it, set `fs.s3a.audit.enabled` to `false`.
3838

@@ -77,7 +77,7 @@ ideally even identifying the process/job generating load.
7777

7878
## Using Auditing
7979

80-
Auditing is disabled by default.
80+
Auditing is enabled by default.
8181
When auditing enabled, a Logging Auditor will annotate the S3 logs through a custom
8282
HTTP Referrer header in requests made to S3.
8383
Other auditor classes may be used instead.
@@ -88,22 +88,22 @@ Other auditor classes may be used instead.
8888
|--------|---------|---------------|
8989
| `fs.s3a.audit.enabled` | Is auditing enabled? | `true` |
9090
| `fs.s3a.audit.service.classname` | Auditor classname | `org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor` |
91-
| `fs.s3a.audit.request.handlers` | List of extra subclasses of AWS SDK RequestHandler2 to include in handler chain | `""` |
91+
| `fs.s3a.audit.execution.interceptors` | Implementations of AWS v2 SDK `ExecutionInterceptor` to include in handler chain | `""` |
9292
| `fs.s3a.audit.referrer.enabled` | Logging auditor to publish the audit information in the HTTP Referrer header | `true` |
9393
| `fs.s3a.audit.referrer.filter` | List of audit fields to filter | `""` |
9494
| `fs.s3a.audit.reject.out.of.span.operations` | Auditor to reject operations "outside of a span" | `false` |
9595

9696

9797
### Disabling Auditing.
9898

99-
In this release of Hadoop, auditing is disabled.
99+
In this release of Hadoop, auditing is enabled by default.
100100

101101
This can be explicitly set globally or for specific buckets
102102

103103
```xml
104104
<property>
105105
<name>fs.s3a.audit.enabled</name>
106-
<value>false</value>
106+
<value>true</value>
107107
</property>
108108
```
109109

@@ -162,6 +162,23 @@ correlate access by S3 clients to the actual operations taking place.
162162
Note: this logging is described as "Best Effort". There's no guarantee as to
163163
when logs arrive.
164164

165+
### Integration with AWS SDK request processing
166+
167+
The auditing component inserts itself into the AWS SDK request processing
168+
code, so it can attach the referrer header.
169+
170+
It is possible to declare extra classes to add to the processing chain,
171+
all of which must implement the interface `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`.
172+
173+
The list of classes is set in the configuration option `fs.s3a.audit.execution.interceptors`.
174+
175+
Before the upgrade to the V2 SDK, a list of extra subclasses of the AWS SDK `com.amazonaws.handlers.RequestHandler2`
176+
class could be declared in the option `fs.s3a.audit.request.handlers`;
177+
these would be wired up into the V1 request processing pipeline.
178+
179+
This option is now ignored completely, other than printing a warning message the first time a filesystem is created with a non-empty value.
180+
181+
165182
### Rejecting out-of-span operations
166183

167184
The logging auditor can be configured to raise an exception whenever
@@ -201,8 +218,8 @@ The HTTP referrer header is attached by the logging auditor.
201218
If the S3 Bucket is configured to log requests to another bucket, then these logs
202219
entries will include the audit information _as the referrer_.
203220

204-
This can be parsed (consult AWS documentation for a regular expression)
205-
and the http referrer header extracted.
221+
The S3 Server log entries can be parsed (consult AWS documentation for a regular expression)
222+
and the http referrer header extracted.
206223

207224
```
208225
https://audit.example.org/hadoop/1/op_rename/3c0d9b7e-2a63-43d9-a220-3c574d768ef3-3/
@@ -242,13 +259,14 @@ If any of the field values were `null`, the field is omitted.
242259

243260
_Notes_
244261

245-
* Thread IDs are from the current thread in the JVM, so can be compared to those in`````````
262+
* Thread IDs are from the current thread in the JVM, so can be compared to those in
246263
Log4J logs. They are never unique.
247264
* Task Attempt/Job IDs are only ever set during operations involving the S3A committers, specifically
248-
all operations excecuted by the committer.
265+
all operations executed by the committer.
249266
Operations executed in the same thread as the committer's instantiation _may_ also report the
250267
IDs, even if they are unrelated to the actual task. Consider them "best effort".
251268

269+
Thread IDs are generated as follows:
252270
```java
253271
Long.toString(Thread.currentThread().getId())
254272
```
@@ -269,6 +287,8 @@ This is why the span ID is always passed in as part of the URL,
269287
rather than just an HTTP query parameter: even if
270288
the header is chopped, the span ID will always be present.
271289

290+
As of August 2023, this header is not collected in AWS CloudTrail -only S3 Server logs.
291+
272292
## Privacy Implications of HTTP Referrer auditing
273293

274294
When the S3A client makes requests of an S3 bucket, the auditor
@@ -423,6 +443,12 @@ log4j.logger.org.apache.hadoop.fs.s3a.audit=TRACE
423443

424444
This is very noisy and not recommended in normal operation.
425445

446+
If logging of HTTP IO is enabled then the "referer" header is printed as part of every request:
447+
```
448+
log4j.logger.org.apache.http=DEBUG
449+
log4j.logger.software.amazon.awssdk.thirdparty.org.apache.http.client.HttpClient=DEBUG
450+
```
451+
426452
## Integration with S3A Committers
427453

428454
Work submitted through the S3A committer will have the job (query) ID associated

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_upgrade.md

Lines changed: 8 additions & 0 deletions
@@ -371,3 +371,11 @@ the interface `org.apache.hadoop.fs.s3a.audit.AWSAuditEventCallbacks`
371371

372372
Examine the interface and associated implementations to
373373
see how to migrate.
374+
375+
The option `fs.s3a.audit.request.handlers` to declare a list of v1 SDK
376+
`com.amazonaws.handlers.RequestHandler2` implementations to include
377+
in the AWS request chain is no longer supported: a warning is printed
378+
and the value ignored.
379+
380+
The V2 SDK equivalent, classes implementing `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`
381+
can be declared in the configuration option `fs.s3a.audit.execution.interceptors`.
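
Note on the auditing.md text above: it states that the span ID is always carried in the path of the referrer URL rather than only in a query parameter, so it survives even if the header is chopped. A minimal Java sketch of recovering it from a referrer value already extracted from an S3 Server log entry might look like the following. It assumes the path layout seen in the sample URL in the docs (`/hadoop/1/<operation>/<span-id>/`); the class and method names are hypothetical and are not part of Hadoop.

```java
import java.net.URI;

/** Hypothetical helper; not part of the Hadoop codebase. */
public class ReferrerSpanId {

  /**
   * Extract the span ID from an audit referrer URL.
   * Assumed layout, taken from the sample URL in auditing.md:
   * https://audit.example.org/hadoop/1/{operation}/{span-id}/?{query}
   *
   * @param referrer referrer header value
   * @return the span ID, or null if the path does not match the assumed layout
   */
  public static String spanIdOf(String referrer) {
    // Cut the query string off by hand, so that a truncated or oddly
    // escaped query cannot break URI parsing.
    int q = referrer.indexOf('?');
    String base = q >= 0 ? referrer.substring(0, q) : referrer;

    String path = URI.create(base).getPath();
    if (path == null) {
      return null;
    }
    // Expected segments: "", "hadoop", version, operation, span ID.
    String[] parts = path.split("/");
    if (parts.length < 5 || !"hadoop".equals(parts[1])) {
      return null;
    }
    return parts[4];
  }

  public static void main(String[] args) {
    // Sample referrer from the docs, with a query string cut short
    // to imitate a chopped header.
    String chopped = "https://audit.example.org/hadoop/1/op_rename/"
        + "3c0d9b7e-2a63-43d9-a220-3c574d768ef3-3/?op=op_rename&p1=s3a://al";
    System.out.println(spanIdOf(chopped));
  }
}
```

Because only the path is inspected, the lookup does not depend on any of the query parameters having survived truncation.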
