Commit a36274d

Ben Roling authored and steveloughran committed
HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite.
Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3 bucket is versioned, the object version. You can then control how to react to a mismatch between the data in the DynamoDB table and that in the store: warn, fail, or, when using versions, return the original value.

This adds two new columns to the table: etag and version. This is transparent to older S3A clients, but when such clients add or update data in the S3Guard table, they will not add these values. As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest Hadoop version.
1 parent 729ccb2 commit a36274d

56 files changed: +3333 -465 lines changed

hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

Lines changed: 9 additions & 9 deletions
@@ -1904,15 +1904,15 @@
   <name>fs.s3a.change.detection.mode</name>
   <value>server</value>
   <description>
-    Determines how change detection is applied to alert to S3 objects
-    rewritten while being read. Value 'server' indicates to apply the attribute
-    constraint directly on GetObject requests to S3. Value 'client' means to do a
-    client-side comparison of the attribute value returned in the response. Value
-    'server' would not work with third-party S3 implementations that do not
-    support these constraints on GetObject. Values 'server' and 'client' generate
-    RemoteObjectChangedException when a mismatch is detected. Value 'warn' works
-    like 'client' but generates only a warning. Value 'none' will ignore change
-    detection completely.
+    Determines how change detection is applied to alert to inconsistent S3
+    objects read during or after an overwrite. Value 'server' indicates to apply
+    the attribute constraint directly on GetObject requests to S3. Value 'client'
+    means to do a client-side comparison of the attribute value returned in the
+    response. Value 'server' would not work with third-party S3 implementations
+    that do not support these constraints on GetObject. Values 'server' and
+    'client' generate RemoteObjectChangedException when a mismatch is detected.
+    Value 'warn' works like 'client' but generates only a warning. Value 'none'
+    will ignore change detection completely.
   </description>
 </property>
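
The four values of fs.s3a.change.detection.mode described above can be illustrated with a small, self-contained Java sketch. It is not the S3A implementation: the class `ChangeDetectionSketch`, the method `onResponse`, and the exception `ObjectChangedException` (a stand-in for RemoteObjectChangedException) are hypothetical names introduced here. The sketch models only the outcome of a mismatch; it does not model where the comparison happens, which is what distinguishes 'server' (constraint sent with the GetObject request) from 'client' (comparison of the returned attribute).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not S3A classes. */
public class ChangeDetectionSketch {

  /** The values of fs.s3a.change.detection.mode listed in the description. */
  public enum Mode { SERVER, CLIENT, WARN, NONE }

  /** Hypothetical stand-in for RemoteObjectChangedException. */
  public static class ObjectChangedException extends IOException {
    public ObjectChangedException(String message) {
      super(message);
    }
  }

  /** Warnings are collected in a list instead of a logger to keep the sketch simple. */
  public static final List<String> WARNINGS = new ArrayList<>();

  /**
   * React to the attribute (etag or version) seen for an object.
   * SERVER and CLIENT both fail on a mismatch, WARN records a warning,
   * and NONE skips the comparison entirely.
   */
  public static void onResponse(Mode mode, String expected, String actual)
      throws IOException {
    if (mode == Mode.NONE || expected == null || expected.equals(actual)) {
      return;
    }
    String message = "expected " + expected + " but found " + actual;
    if (mode == Mode.WARN) {
      WARNINGS.add(message);
    } else {
      // SERVER and CLIENT: the description says both raise an exception.
      throw new ObjectChangedException(message);
    }
  }

  public static void main(String[] args) throws IOException {
    onResponse(Mode.NONE, "etag-1", "etag-2");   // ignored
    onResponse(Mode.WARN, "etag-1", "etag-2");   // recorded as a warning
    System.out.println("warnings: " + WARNINGS.size());
    try {
      onResponse(Mode.CLIENT, "etag-1", "etag-2");
    } catch (ObjectChangedException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

Treating a missing expected value as "nothing to check" is an assumption of this sketch, chosen to mirror the commit message's note that entries written by older clients carry no etag or version.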

hadoop-tools/hadoop-aws/pom.xml

Lines changed: 5 additions & 0 deletions
@@ -406,6 +406,11 @@
       <artifactId>hadoop-common</artifactId>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpcore</artifactId>
+      <scope>provided</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Invoker.java

Lines changed: 84 additions & 0 deletions
@@ -197,6 +197,33 @@ public void retry(String action,
         });
   }
 
+  /**
+   * Execute a void operation with retry processing when doRetry=true, else
+   * just once.
+   * @param doRetry true if retries should be performed
+   * @param action action to execute (used in error messages)
+   * @param path path of work (used in error messages)
+   * @param idempotent does the operation have semantics
+   * which mean that it can be retried even if was already executed?
+   * @param retrying callback on retries
+   * @param operation operation to execute
+   * @throws IOException any IOE raised, or translated exception
+   */
+  @Retries.RetryTranslated
+  public void maybeRetry(boolean doRetry,
+      String action,
+      String path,
+      boolean idempotent,
+      Retried retrying,
+      VoidOperation operation)
+      throws IOException {
+    maybeRetry(doRetry, action, path, idempotent, retrying,
+        () -> {
+          operation.execute();
+          return null;
+        });
+  }
+
   /**
    * Execute a void operation with the default retry callback invoked.
    * @param action action to execute (used in error messages)
@@ -215,6 +242,28 @@ public void retry(String action,
     retry(action, path, idempotent, retryCallback, operation);
   }
 
+  /**
+   * Execute a void operation with the default retry callback invoked when
+   * doRetry=true, else just once.
+   * @param doRetry true if retries should be performed
+   * @param action action to execute (used in error messages)
+   * @param path path of work (used in error messages)
+   * @param idempotent does the operation have semantics
+   * which mean that it can be retried even if was already executed?
+   * @param operation operation to execute
+   * @throws IOException any IOE raised, or translated exception
+   */
+  @Retries.RetryTranslated
+  public void maybeRetry(
+      boolean doRetry,
+      String action,
+      String path,
+      boolean idempotent,
+      VoidOperation operation)
+      throws IOException {
+    maybeRetry(doRetry, action, path, idempotent, retryCallback, operation);
+  }
+
   /**
    * Execute a function with the default retry callback invoked.
    * @param action action to execute (used in error messages)
@@ -265,6 +314,41 @@ public <T> T retry(
         () -> once(action, path, operation));
   }
 
+  /**
+   * Execute a function with retry processing when doRetry=true, else just once.
+   * Uses {@link #once(String, String, Operation)} as the inner
+   * invocation mechanism before retry logic is performed.
+   * @param <T> type of return value
+   * @param doRetry true if retries should be performed
+   * @param action action to execute (used in error messages)
+   * @param path path of work (used in error messages)
+   * @param idempotent does the operation have semantics
+   * which mean that it can be retried even if was already executed?
+   * @param retrying callback on retries
+   * @param operation operation to execute
+   * @return the result of the call
+   * @throws IOException any IOE raised, or translated exception
+   */
+  @Retries.RetryTranslated
+  public <T> T maybeRetry(
+      boolean doRetry,
+      String action,
+      @Nullable String path,
+      boolean idempotent,
+      Retried retrying,
+      Operation<T> operation)
+      throws IOException {
+    if (doRetry) {
+      return retryUntranslated(
+          toDescription(action, path),
+          idempotent,
+          retrying,
+          () -> once(action, path, operation));
+    } else {
+      return once(action, path, operation);
+    }
+  }
+
   /**
    * Execute a function with retry processing and no translation.
    * and the default retry callback.
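
The control flow that the new maybeRetry overloads add (retry when doRetry is true, otherwise invoke exactly once) can be sketched outside Hadoop as follows. This is a simplified illustration under stated assumptions, not the Invoker code: the class `MaybeRetrySketch` and the `maxAttempts` parameter are hypothetical, and a fixed attempt count stands in for the retry policy, idempotency check, retry callback, and exception translation that the real class handles.

```java
import java.io.IOException;

/** Illustrative sketch only; a fixed attempt count replaces the real retry policy. */
public class MaybeRetrySketch {

  /** An operation that returns a value and may fail with an IOException. */
  @FunctionalInterface
  public interface Operation<T> {
    T execute() throws IOException;
  }

  /** Invoke the operation exactly once. */
  public static <T> T once(Operation<T> operation) throws IOException {
    return operation.execute();
  }

  /** Invoke the operation up to maxAttempts times, rethrowing the last failure. */
  public static <T> T retry(int maxAttempts, Operation<T> operation)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return once(operation);
      } catch (IOException e) {
        last = e;  // remember the failure and try again
      }
    }
    throw last;
  }

  /** Retry when doRetry is true, otherwise invoke just once. */
  public static <T> T maybeRetry(boolean doRetry, int maxAttempts,
      Operation<T> operation) throws IOException {
    if (doRetry) {
      return retry(maxAttempts, operation);
    } else {
      return once(operation);
    }
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // Fails on the first two calls, succeeds on the third.
    Operation<String> flaky = () -> {
      if (++calls[0] < 3) {
        throw new IOException("transient failure " + calls[0]);
      }
      return "ok after " + calls[0] + " calls";
    };
    System.out.println(maybeRetry(true, 5, flaky));  // prints "ok after 3 calls"
  }
}
```

With doRetry set to false, the same flaky operation would fail on its first call and the IOException would reach the caller unchanged, so a caller can switch retry handling on or off with a flag rather than by choosing between two different methods.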
