@@ -180,7 +180,11 @@ The S3A Filesystem client supports the notion of input policies, similar
180180to that of the POSIX `fadvise()` API call. This tunes the behavior of the S3A
181181client to optimise HTTP GET requests for the different use cases.
182182
183- ### fadvise ` sequential `
183+ The list of supported options is found in
184+ [FSDataInputStream](../../../../../../hadoop-common-project/hadoop-common/target/site/filesystem/fsdatainputstreambuilder.html).
185+
186+
187+ ### fadvise ` sequential ` , ` whole-file `
184188
185189Read through the file, possibly with some short forward seeks.
186190
@@ -196,6 +200,9 @@ sequential access, as should those reading data from gzipped `.gz` files.
196200Because the "normal" fadvise policy starts off in sequential IO mode,
197201there is rarely any need to explicitly request this policy.
198202
203+ Distcp will automatically request ` whole-file ` access, even on deployments
204+ where the cluster configuration is for ` random ` IO.
205+
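For illustration, a cluster configured for `random` IO, as in the scenario described above, might carry a setting like the following sketch. It uses the `fs.s3a.experimental.input.fadvise` option named elsewhere in this document. The placement in `core-site.xml` is an assumption.

```xml
<!-- sketch: cluster-wide default input policy set to random IO -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
```

Even with such a default in place, distcp would still request `whole-file` access for its reads, as noted above.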
199206### fadvise ` random `
200207
201208Optimised for random IO, specifically the Hadoop ` PositionedReadable `
@@ -243,7 +250,7 @@ basis.
243250to set fadvise policies on input streams. Once implemented,
244251this will become the supported mechanism used for configuring the input IO policy.
245252
246- ### fadvise ` normal ` (default)
253+ ### fadvise ` normal ` or ` adaptive ` (default)
247254
248255The ` normal ` policy starts off reading a file in ` sequential ` mode,
249256but if the caller seeks backwards in the stream, it switches from
@@ -276,7 +283,45 @@ Fix: Use one of the dedicated [S3A Committers](committers.md).
276283
277284## <a name =" tuning " ></a > Options to Tune
278285
279- ### <a name =" pooling " ></a > Thread and connection pool settings.
286+ ### <a name =" flags " ></a > Performance Flags: ` fs.s3a.performance.flag `
287+
288+ This option takes a comma separated list of performance flags.
289+ Think of it as the equivalent of the `-O` optimization options of C/C++ compilers:
290+ a list of settings which deliver speed, but only safely if the person setting them
291+ understands the risks.
292+
293+ * The list of flags MAY change across releases.
294+ * The semantics of specific flags SHOULD NOT change across releases.
295+ * If an option is to be tuned which may relax semantics, a new option MUST be defined.
296+ * Unknown flags are ignored; this is to avoid compatibility problems across releases.
297+ * The option `*` means "turn everything on". This is implicitly unstable across releases.
298+
299+ | *Option* | *Meaning* | *Since* |
300+ | ----------| --------------------| :------|
301+ | ` create ` | Create Performance | 3.4.1 |
302+
303+ The `create` flag has the same semantics as [`fs.s3a.create.performance`](#create-performance).
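As a minimal sketch, the `create` flag described above could be enabled as follows. The placement in `core-site.xml` is an assumption. Per the table above, the flag is only available from release 3.4.1 onwards.

```xml
<!-- sketch: enable the "create" performance flag for the S3A client -->
<property>
  <name>fs.s3a.performance.flag</name>
  <value>create</value>
</property>
```

Further flags, if later releases add any, would be appended to the same value as a comma-separated list.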
304+
305+
306+ ### <a name =" create-performance " ></a > Create Performance ` fs.s3a.create.performance `
307+
308+
309+ The configuration option ` fs.s3a.create.performance ` has the same behavior as
310+ the ` fs.s3a.performance.flag ` flag option ` create ` :
311+
312+ * No overwrite checks are made when creating a file, even if overwrite is set to ` false ` in the application/library code
313+ * No checks are made for an object being written above a path containing other objects (i.e. a "directory")
314+ * No checks are made for a parent path containing an object which is not a directory marker (i.e. a "file")
315+
316+ This saves multiple probes per operation, especially a ` LIST ` call.
317+
318+ It may, however, result in:
319+ * Unintentional overwriting of data
320+ * Creation of directory structures which can no longer be navigated through filesystem APIs.
321+
322+ Use with care, and, ideally, enable versioning on the S3 store.
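A sketch of enabling this option directly is shown below. It assumes that `fs.s3a.create.performance` takes a boolean value, which this section does not state explicitly, so check the release documentation before relying on it.

```xml
<!-- sketch: skip overwrite and parent-path checks when creating files -->
<!-- assumption: the option is a boolean -->
<property>
  <name>fs.s3a.create.performance</name>
  <value>true</value>
</property>
```

Given the risks listed above, this is best limited to workloads known to write only to new paths.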
323+
324+ ### <a name =" threads " ></a > Thread and connection pool settings.
280325
281326Each S3A client interacting with a single bucket, as a single user, has its
282327own dedicated pool of open HTTP connections alongside a pool of threads used
@@ -441,16 +486,16 @@ killer.
4414861 . As discussed [earlier](#threads), use large values for
442487` fs.s3a.threads.max ` and ` fs.s3a.connection.maximum ` .
443488
444- 1 . Make sure that the bucket is using ` sequential ` or ` normal ` fadvise seek policies,
445- that is, ` fs.s3a.experimental.input.fadvise ` is not set to ` random `
446-
4474891 . Perform listings in parallel by setting ` -numListstatusThreads `
448490to a higher number. Make sure that ` fs.s3a.connection.maximum `
449491is equal to or greater than the value used.
450492
4514931 . If using `-delete`, set `fs.trash.interval` to 0 to prevent deleted
452494objects from being copied to a trash directory.
453495
496+ 1. If using distcp to upload to a new path where no data exists,
497+ consider adding the option `create` to the flags in `fs.s3a.performance.flag`.
498+
454499* DO NOT* switch ` fs.s3a.fast.upload.buffer ` to buffer in memory.
455500If one distcp mapper runs out of memory it will fail,
456501and that runs the risk of failing the entire job.
@@ -461,12 +506,6 @@ efficient in terms of HTTP connection use, and reduce the IOP rate against
461506the S3 bucket/shard.
462507
463508``` xml
464-
465- <property >
466- <name >fs.s3a.experimental.input.fadvise</name >
467- <value >normal</value >
468- </property >
469-
470509<property >
471510 <name >fs.s3a.block.size</name >
472511 <value >128M</value >
@@ -481,6 +520,12 @@ the S3 bucket/shard.
481520 <name >fs.trash.interval</name >
482521 <value >0</value >
483522</property >
523+
524+ <!-- optional: only when uploading to a new path where no data exists -->
525+ <property>
526+   <name>fs.s3a.performance.flag</name>
527+   <value>create</value>
528+ </property>
484529```
485530
486531## <a name =" rm " ></a > hadoop shell commands ` fs -rm `
@@ -719,20 +764,20 @@ exception and S3A initialization will fail.
719764
720765Supported values for ` fs.s3a.ssl.channel.mode ` :
721766
722- | ` fs.s3a.ssl.channel.mode ` Value | Description |
723- | -------------------------------| -------------|
724- | ` default_jsse ` | Uses Java JSSE without GCM on Java 8 |
725- | ` default_jsse_with_gcm ` | Uses Java JSSE |
726- | ` default ` | Uses OpenSSL, falls back to ` default_jsse ` if OpenSSL cannot be loaded |
727- | ` openssl ` | Uses OpenSSL, fails if OpenSSL cannot be loaded |
767+ | ` fs.s3a.ssl.channel.mode ` Value | Description |
768+ | --------------------------------- | ------------------------------------------------------------------------|
769+ | ` default_jsse ` | Uses Java JSSE without GCM on Java 8 |
770+ | ` default_jsse_with_gcm ` | Uses Java JSSE |
771+ | ` default ` | Uses OpenSSL, falls back to ` default_jsse ` if OpenSSL cannot be loaded |
772+ | ` openssl ` | Uses OpenSSL, fails if OpenSSL cannot be loaded |
728773
729774The naming convention is set up to preserve backwards compatibility
730775with the ABFS support of [ HADOOP-15669] ( https://issues.apache.org/jira/browse/HADOOP-15669 ) .
731776
732777Other options may be added to ` fs.s3a.ssl.channel.mode ` in the future as
733778further SSL optimizations are made.
734779
735- ### WildFly classpath requirements
780+ ### WildFly classpath and SSL library requirements
736781
737782For OpenSSL acceleration to work, a compatible version of the
738783wildfly JAR must be on the classpath. This is not explicitly declared
@@ -742,14 +787,21 @@ optional.
742787If the wildfly JAR is not found, the network acceleration will fall back
743788to the JVM, always.
744789
745- Note: there have been compatibility problems with wildfly JARs and openSSL
790+ Similarly, the `libssl` library must be compatible with wildfly.
791+
792+ Wildfly requires this native library to be part of an ` openssl ` installation.
793+ Third party implementations may not work correctly.
794+ This can be an issue in FIPS-compliant deployments, where the `libssl` library
795+ is a third-party implementation built with restricted TLS protocols.
796+
797+
798+ There have been compatibility problems with wildfly JARs and openSSL
746799releases in the past: version 1.0.4.Final is not compatible with openssl 1.1.1.
747800An extra complication was older versions of the ` azure-data-lake-store-sdk `
748801JAR used in ` hadoop-azure-datalake ` contained an unshaded copy of the 1.0.4.Final
749802classes, causing binding problems even when a later version was explicitly
750803being placed on the classpath.
751804
752-
753805## <a name =" initilization " ></a > Tuning FileSystem Initialization.
754806
755807### Bucket existence checks