Commit cdb6aac

HADOOP-17409. Remove s3guard from S3A module (#3534)
Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected), the s3a client will raise an exception and refuse to initialize. This ensures there is no mix of S3Guard-enabled and S3Guard-disabled deployments sharing the same configuration but running different Hadoop releases; the feature must be turned off completely.

The "hadoop s3guard" command has been retained, but its supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads".

This is a major change in terms of the number of files changed; before cherry-picking subsequent s3a patches into older releases, this patch will probably need backporting first.

Goodbye S3Guard, your work is done. Time to die.

Contributed by Steve Loughran.

Change-Id: I4b8429640d6debd3928f991ef5fbc6d0aa1cab55
1 parent 47ba977 commit cdb6aac
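As the commit message notes, configuring any metastore other than the null or local store now makes the S3A client refuse to initialize. A minimal sketch of a core-site.xml fragment that would trigger that failure after this change (the property and class names are the real pre-removal values from core-default.xml; whether they appear in a given deployment's configuration is, of course, site-specific):

```xml
<!-- Example only: this legacy S3Guard configuration is now rejected;
     after HADOOP-17409 the S3A filesystem fails fast at initialization
     if the metastore is anything other than the null or local store. -->
<property>
  <name>fs.s3a.metadatastore.impl</name>
  <value>org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore</value>
</property>
```

The fix before upgrading is to delete the stale setting, or set it explicitly to the default `org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore`.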

File tree: 218 files changed, +1234 −36689 lines


hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

Lines changed: 3 additions & 186 deletions
@@ -1220,7 +1220,7 @@
     com.amazonaws.auth.AWSCredentialsProvider.
 
     When S3A delegation tokens are not enabled, this list will be used
-    to directly authenticate with S3 and DynamoDB services.
+    to directly authenticate with S3 and other AWS services.
     When S3A Delegation tokens are enabled, depending upon the delegation
     token binding it may be used
     to communicate wih the STS endpoint to request session/role
@@ -1669,180 +1669,18 @@
   </description>
 </property>
 
-<property>
-  <name>fs.s3a.metadatastore.authoritative</name>
-  <value>false</value>
-  <description>
-    When true, allow MetadataStore implementations to act as source of
-    truth for getting file status and directory listings. Even if this
-    is set to true, MetadataStore implementations may choose not to
-    return authoritative results. If the configured MetadataStore does
-    not support being authoritative, this setting will have no effect.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.metadatastore.metadata.ttl</name>
-  <value>15m</value>
-  <description>
-    This value sets how long an entry in a MetadataStore is valid.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.metadatastore.impl</name>
-  <value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
-  <description>
-    Fully-qualified name of the class that implements the MetadataStore
-    to be used by s3a. The default class, NullMetadataStore, has no
-    effect: s3a will continue to treat the backing S3 service as the one
-    and only source of truth for file and directory metadata.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.metadatastore.fail.on.write.error</name>
-  <value>true</value>
-  <description>
-    When true (default), FileSystem write operations generate
-    org.apache.hadoop.fs.s3a.MetadataPersistenceException if the metadata
-    cannot be saved to the metadata store. When false, failures to save to
-    metadata store are logged at ERROR level, but the overall FileSystem
-    write operation succeeds.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.cli.prune.age</name>
-  <value>86400000</value>
-  <description>
-    Default age (in milliseconds) after which to prune metadata from the
-    metadatastore when the prune command is run. Can be overridden on the
-    command-line.
-  </description>
-</property>
-
 <property>
   <name>fs.s3a.impl</name>
   <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
   <description>The implementation class of the S3A Filesystem</description>
 </property>
 
-<property>
-  <name>fs.s3a.s3guard.ddb.region</name>
-  <value></value>
-  <description>
-    AWS DynamoDB region to connect to. An up-to-date list is
-    provided in the AWS Documentation: regions and endpoints. Without this
-    property, the S3Guard will operate table in the associated S3 bucket region.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table</name>
-  <value></value>
-  <description>
-    The DynamoDB table name to operate. Without this property, the respective
-    S3 bucket name will be used.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.create</name>
-  <value>false</value>
-  <description>
-    If true, the S3A client will create the table if it does not already exist.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.capacity.read</name>
-  <value>0</value>
-  <description>
-    Provisioned throughput requirements for read operations in terms of capacity
-    units for the DynamoDB table. This config value will only be used when
-    creating a new DynamoDB table.
-    If set to 0 (the default), new tables are created with "per-request" capacity.
-    If a positive integer is provided for this and the write capacity, then
-    a table with "provisioned capacity" will be created.
-    You can change the capacity of an existing provisioned-capacity table
-    through the "s3guard set-capacity" command.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.capacity.write</name>
-  <value>0</value>
-  <description>
-    Provisioned throughput requirements for write operations in terms of
-    capacity units for the DynamoDB table.
-    If set to 0 (the default), new tables are created with "per-request" capacity.
-    Refer to related configuration option fs.s3a.s3guard.ddb.table.capacity.read
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.sse.enabled</name>
-  <value>false</value>
-  <description>
-    Whether server-side encryption (SSE) is enabled or disabled on the table.
-    By default it's disabled, meaning SSE is set to AWS owned CMK.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.sse.cmk</name>
-  <value/>
-  <description>
-    The KMS Customer Master Key (CMK) used for the KMS encryption on the table.
-    To specify a CMK, this config value can be its key ID, Amazon Resource Name
-    (ARN), alias name, or alias ARN. Users only need to provide this config if
-    the key is different from the default DynamoDB KMS Master Key, which is
-    alias/aws/dynamodb.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.max.retries</name>
-  <value>9</value>
-  <description>
-    Max retries on throttled/incompleted DynamoDB operations
-    before giving up and throwing an IOException.
-    Each retry is delayed with an exponential
-    backoff timer which starts at 100 milliseconds and approximately
-    doubles each time. The minimum wait before throwing an exception is
-    sum(100, 200, 400, 800, .. 100*2^N-1 ) == 100 * ((2^N)-1)
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.throttle.retry.interval</name>
-  <value>100ms</value>
-  <description>
-    Initial interval to retry after a request is throttled events;
-    the back-off policy is exponential until the number of retries of
-    fs.s3a.s3guard.ddb.max.retries is reached.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.background.sleep</name>
-  <value>25ms</value>
-  <description>
-    Length (in milliseconds) of pause between each batch of deletes when
-    pruning metadata. Prevents prune operations (which can typically be low
-    priority background operations) from overly interfering with other I/O
-    operations.
-  </description>
-</property>
-
 <property>
   <name>fs.s3a.retry.limit</name>
   <value>7</value>
   <description>
     Number of times to retry any repeatable S3 client request on failure,
-    excluding throttling requests and S3Guard inconsistency resolution.
+    excluding throttling requests.
   </description>
 </property>
 
@@ -1851,7 +1689,7 @@
   <value>500ms</value>
   <description>
     Initial retry interval when retrying operations for any reason other
-    than S3 throttle errors and S3Guard inconsistency resolution.
+    than S3 throttle errors.
   </description>
 </property>
 
@@ -1874,27 +1712,6 @@
   </description>
 </property>
 
-<property>
-  <name>fs.s3a.s3guard.consistency.retry.limit</name>
-  <value>7</value>
-  <description>
-    Number of times to retry attempts to read/open/copy files when
-    S3Guard believes a specific version of the file to be available,
-    but the S3 request does not find any version of a file, or a different
-    version.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.consistency.retry.interval</name>
-  <value>2s</value>
-  <description>
-    Initial interval between attempts to retry operations while waiting for S3
-    to become consistent with the S3Guard data.
-    An exponential back-off is used here: every failure doubles the delay.
-  </description>
-</property>
-
 <property>
   <name>fs.s3a.committer.name</name>
   <value>file</value>

hadoop-common-project/hadoop-common/src/site/markdown/AdminCompatibilityGuide.md

Lines changed: 2 additions & 1 deletion
@@ -137,7 +137,8 @@ internal state stores:
 
 * The internal MapReduce state data will remain compatible across minor releases within the same major version to facilitate rolling upgrades while MapReduce workloads execute.
 * HDFS maintains metadata about the data stored in HDFS in a private, internal format that is versioned. In the event of an incompatible change, the store's version number will be incremented. When upgrading an existing cluster, the metadata store will automatically be upgraded if possible. After the metadata store has been upgraded, it is always possible to reverse the upgrade process.
-* The AWS S3A guard keeps a private, internal metadata store that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
+* The AWS S3A guard kept a private, internal metadata store.
+  Now that the feature has been removed, the store is obsolete and can be deleted.
 * The YARN resource manager keeps a private, internal state store of application and scheduler information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
 * The YARN node manager keeps a private, internal state store of application information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
 * The YARN federation service keeps a private, internal state store of application and cluster information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
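With the feature removed, any DynamoDB tables S3Guard previously created are obsolete and can be deleted. A sketch of the cleanup with the AWS CLI, assuming the table name and region shown here are placeholders (the real name is whatever `fs.s3a.s3guard.ddb.table` was set to, defaulting to the bucket name):

```shell
# Sketch only: substitute the table name previously configured in
# fs.s3a.s3guard.ddb.table and the region it was created in.
aws dynamodb delete-table \
    --table-name my-s3guard-table \
    --region us-west-2
```

On releases that still ship the full s3guard tool, `hadoop s3guard destroy` performed the equivalent cleanup before upgrading.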

hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

Lines changed: 5 additions & 12 deletions
@@ -477,19 +477,12 @@ rolled back to the older layout.
 
 ##### AWS S3A Guard Metadata
 
-For each operation in the Hadoop S3 client (s3a) that reads or modifies
-file metadata, a shadow copy of that file metadata is stored in a separate
-metadata store, which offers HDFS-like consistency for the metadata, and may
-also provide faster lookups for things like file status or directory listings.
-S3A guard tables are created with a version marker which indicates
-compatibility.
+The S3Guard metastore used to store metadata in DynamoDB tables;
+as such it had to maintain a compatibility strategy.
+Now that S3Guard is removed, the tables are not needed.
 
-###### Policy
-
-The S3A guard metadata schema SHALL be considered
-[Private](./InterfaceClassification.html#Private) and
-[Unstable](./InterfaceClassification.html#Unstable). Any incompatible change
-to the schema MUST result in the version number of the schema being incremented.
+Applications configured to use an S3A metadata store other than
+the "null" store will fail.
 
 ##### YARN Resource Manager State Store
 

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md

Lines changed: 1 addition & 5 deletions
@@ -343,7 +343,7 @@ stores pretend that they are a FileSystem, a FileSystem with the same
 features and operations as HDFS. This is &mdash;ultimately&mdash;a pretence:
 they have different characteristics and occasionally the illusion fails.
 
-1. **Consistency**. Object stores are generally *Eventually Consistent*: it
+1. **Consistency**. Object may be *Eventually Consistent*: it
 can take time for changes to objects &mdash;creation, deletion and updates&mdash;
 to become visible to all callers. Indeed, there is no guarantee a change is
 immediately visible to the client which just made the change. As an example,
@@ -447,10 +447,6 @@ Object stores have an even vaguer view of time, which can be summarized as
 * The timestamp is likely to be in UTC or the TZ of the object store. If the
 client is in a different timezone, the timestamp of objects may be ahead or
 behind that of the client.
-* Object stores with cached metadata databases (for example: AWS S3 with
-an in-memory or a DynamoDB metadata store) may have timestamps generated
-from the local system clock, rather than that of the service.
-This is an optimization to avoid round-trip calls to the object stores.
 + A file's modification time is often the same as its creation time.
 + The `FileSystem.setTimes()` operation to set file timestamps *may* be ignored.
 * `FileSystem.chmod()` may update modification times (example: Azure `wasb://`).

hadoop-project/src/site/markdown/index.md.vm

Lines changed: 0 additions & 10 deletions
@@ -203,16 +203,6 @@ in both the task configuration and as a Java option.
 Existing configs that already specify both are not affected by this change.
 See the full release notes of MAPREDUCE-5785 for more details.
 
-S3Guard: Consistency and Metadata Caching for the S3A filesystem client
----------------------
-
-[HADOOP-13345](https://issues.apache.org/jira/browse/HADOOP-13345) adds an
-optional feature to the S3A client of Amazon S3 storage: the ability to use
-a DynamoDB table as a fast and consistent store of file and directory
-metadata.
-
-See [S3Guard](./hadoop-aws/tools/hadoop-aws/s3guard.html) for more details.
-
 HDFS Router-Based Federation
 ---------------------
 HDFS Router-Based Federation adds a RPC routing layer that provides a federated

hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml

Lines changed: 0 additions & 14 deletions
@@ -29,20 +29,6 @@
     <Bug pattern="RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE" />
   </Match>
 
-  <!--
-    This extends the serializable S3Object, so findbug checks
-    serializability. It is never serialized however, so its
-    warnings are false positives.
-  -->
-  <Match>
-    <Class name="org.apache.hadoop.fs.s3a.InconsistentS3Object" />
-    <Bug pattern="SE_TRANSIENT_FIELD_NOT_RESTORED" />
-  </Match>
-  <Match>
-    <Class name="org.apache.hadoop.fs.s3a.InconsistentS3Object" />
-    <Bug pattern="SE_NO_SERIALVERSIONID" />
-  </Match>
-
   <!--
     findbugs gets confused by lambda expressions in synchronized methods
     and considers references to fields to be unsynchronized.
