@@ -22,7 +22,7 @@ and inside the AWS S3 SDK, immediately before the request is executed.
 The full architecture is covered in [Auditing Architecture](auditing_architecture.html);
 this document covers its use.

-## Important: Auditing is disabled by default
+## Important: Auditing is currently enabled

 Due to a memory leak from the use of `ThreadLocal` fields, this auditing feature
 leaked memory as S3A filesystem instances were created and deleted.
@@ -32,7 +32,7 @@ See [HADOOP-18091](https://issues.apache.org/jira/browse/HADOOP-18091) _S3A audi

 To avoid these memory leaks, auditing was disabled by default in the hadoop 3.3.2 release.

-As these memory leaks have now been fixed, auditing has been re-enabled.
+As these memory leaks have now been fixed, auditing has been re-enabled in Hadoop 3.3.5+

 To disable it, set `fs.s3a.audit.enabled` to `false`.

@@ -77,7 +77,7 @@ ideally even identifying the process/job generating load.

 ## Using Auditing

-Auditing is disabled by default.
+Auditing is enabled by default.
 When auditing enabled, a Logging Auditor will annotate the S3 logs through a custom
 HTTP Referrer header in requests made to S3.
 Other auditor classes may be used instead.
@@ -88,22 +88,22 @@ Other auditor classes may be used instead.
 |--------|---------|---------------|
 | `fs.s3a.audit.enabled` | Is auditing enabled? | `true` |
 | `fs.s3a.audit.service.classname` | Auditor classname | `org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor` |
-| `fs.s3a.audit.request.handlers` | List of extra subclasses of AWS SDK RequestHandler2 to include in handler chain | `""` |
+| `fs.s3a.audit.execution.interceptors` | Implementations of AWS v2 SDK `ExecutionInterceptor` to include in handler chain | `""` |
 | `fs.s3a.audit.referrer.enabled` | Logging auditor to publish the audit information in the HTTP Referrer header | `true` |
 | `fs.s3a.audit.referrer.filter` | List of audit fields to filter | `""` |
 | `fs.s3a.audit.reject.out.of.span.operations` | Auditor to reject operations "outside of a span" | `false` |


 ### Disabling Auditing.

-In this release of Hadoop, auditing is disabled.
+In this release of Hadoop, auditing is enabled by default.

 This can be explicitly set globally or for specific buckets

 ```xml
 <property>
   <name>fs.s3a.audit.enabled</name>
-  <value>false</value>
+  <value>true</value>
 </property>
 ```

@@ -162,6 +162,23 @@ correlate access by S3 clients to the actual operations taking place.
 Note: this logging is described as "Best Effort". There's no guarantee as to
 when logs arrive.

+### Integration with AWS SDK request processing
+
+The auditing component inserts itself into the AWS SDK request processing
+code, so it can attach the referrer header.
+
+It is possible to declare extra classes to add to the processing chain,
+all of which must implement the interface `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`.
+
+The list of classes is set in the configuration option `fs.s3a.audit.execution.interceptors`.
+
+Before the upgrade to the V2 SDK, a list of extra subclasses of the AWS SDK `com.amazonaws.handlers.RequestHandler2`
+class could be declared in the option `fs.s3a.audit.request.handlers`;
+these would be wired up into the V1 request processing pipeline.
+
+This option is now ignored completely, other than printing a warning message the first time a filesystem is created with a non-empty value.
+
+
 ### Rejecting out-of-span operations

 The logging auditor can be configured to raise an exception whenever
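
The `fs.s3a.audit.execution.interceptors` option added in the hunk above could be set as in the sketch below. The class name `com.example.audit.TracingInterceptor` is hypothetical and not part of Hadoop or this change; it stands in for any class on the classpath that implements `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`.

```xml
<!-- Sketch only: com.example.audit.TracingInterceptor is an assumed class name, -->
<!-- standing in for a user-supplied ExecutionInterceptor implementation. -->
<property>
  <name>fs.s3a.audit.execution.interceptors</name>
  <value>com.example.audit.TracingInterceptor</value>
</property>
```

Per the added documentation, a non-empty value left in the old `fs.s3a.audit.request.handlers` option is ignored apart from a one-time warning, so it would need to be migrated to this option.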
@@ -201,8 +218,8 @@ The HTTP referrer header is attached by the logging auditor.
 If the S3 Bucket is configured to log requests to another bucket, then these logs
 entries will include the audit information _as the referrer_.

-This can be parsed (consult AWS documentation for a regular expression)
-and the http referrer header extracted.
+The S3 Server log entries can be parsed (consult AWS documentation for a regular expression)
+and the http referrer header extracted.

 ```
 https://audit.example.org/hadoop/1/op_rename/3c0d9b7e-2a63-43d9-a220-3c574d768ef3-3/
@@ -242,13 +259,14 @@ If any of the field values were `null`, the field is omitted.

 _Notes_

-* Thread IDs are from the current thread in the JVM, so can be compared to those in`````````
+* Thread IDs are from the current thread in the JVM, so can be compared to those in
   Log4J logs. They are never unique.
 * Task Attempt/Job IDs are only ever set during operations involving the S3A committers, specifically
-  all operations excecuted by the committer.
+  all operations executed by the committer.
   Operations executed in the same thread as the committer's instantiation _may_ also report the
   IDs, even if they are unrelated to the actual task. Consider them "best effort".

+Thread IDs are generated as follows:
 ```java
 Long.toString(Thread.currentThread().getId())
 ```
@@ -269,6 +287,8 @@ This is why the span ID is always passed in as part of the URL,
 rather than just an HTTP query parameter: even if
 the header is chopped, the span ID will always be present.

+As of August 2023, this header is not collected in AWS CloudTrail -only S3 Server logs.
+
 ## Privacy Implications of HTTP Referrer auditing

 When the S3A client makes requests of an S3 bucket, the auditor
@@ -423,6 +443,12 @@ log4j.logger.org.apache.hadoop.fs.s3a.audit=TRACE

 This is very noisy and not recommended in normal operation.

+If logging of HTTP IO is enabled then the "referer" header is printed as part of every request:
+```
+log4j.logger.org.apache.http=DEBUG
+log4j.logger.software.amazon.awssdk.thirdparty.org.apache.http.client.HttpClient=DEBUG
+```
+
 ## Integration with S3A Committers

 Work submitted through the S3A committer will have the job (query) ID associated