Commit bdbd59c

HADOOP-17004. ABFS: Improve the ABFS driver documentation
Contributed by Bilahari T H.
1 parent 7bb902b commit bdbd59c

1 file changed

+130
-3
lines changed
  • hadoop-tools/hadoop-azure/src/site/markdown

hadoop-tools/hadoop-azure/src/site/markdown/abfs.md

Lines changed: 130 additions & 3 deletions
@@ -257,7 +257,8 @@ will have the URL `abfs://[email protected]/`
257257

258258

259259
You can create a new container through the ABFS connector, by setting the option
260-
`fs.azure.createRemoteFileSystemDuringInitialization` to `true`.
260+
`fs.azure.createRemoteFileSystemDuringInitialization` to `true`. This option
261+
is not supported when the authentication type is SAS.
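As a minimal sketch, the option could be set like this. Placing it in `core-site.xml` is an assumption, and the description text is illustrative wording, not taken from the driver.

```xml
<!-- Sketch only: assumes the property is placed in core-site.xml. -->
<property>
  <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
  <value>true</value>
  <description>Create the container through the ABFS connector if it does not exist.</description>
</property>
```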
261262

262263
If the container does not exist, an attempt to list it with `hadoop fs -ls`
263264
will fail
@@ -317,8 +318,13 @@ driven by them.
317318

318319
What can be changed is what secrets/credentials are used to authenticate the caller.
319320

320-
The authentication mechanism is set in `fs.azure.account.auth.type` (or the account specific variant),
321-
and, for the various OAuth options `fs.azure.account.oauth.provider.type`
321+
The authentication mechanism is set in `fs.azure.account.auth.type` (or the
322+
account-specific variant). The possible values are SharedKey, OAuth, Custom
323+
and SAS. For the OAuth options, set the token provider in
324+
`fs.azure.account.oauth.provider.type`. The supported implementations are
325+
ClientCredsTokenProvider, UserPasswordTokenProvider, MsiTokenProvider and
326+
RefreshTokenBasedTokenProvider. An IllegalArgumentException is thrown if
327+
the specified provider type is not one of these.
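For illustration, an OAuth setup with the client credentials provider might be configured as below. The fully qualified class name (package `org.apache.hadoop.fs.azurebfs.oauth2`) is an assumption here; the text above names only the simple class names.

```xml
<property>
  <name>fs.azure.account.auth.type</name>
  <!-- One of: SharedKey, OAuth, Custom, SAS -->
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <!-- Assumed package name; verify against the hadoop-azure version in use. -->
  <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
</property>
```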
322328

323329
All secrets can be stored in JCEKS files. These are encrypted and password
324330
protected —use them or a compatible Hadoop Key Management Store wherever
@@ -350,6 +356,15 @@ the password, "key", retrieved from the XML/JCECKs configuration files.
350356
*Note*: The source of the account key can be changed through a custom key provider;
351357
one exists to execute a shell script to retrieve it.
352358

359+
A custom key provider class can be specified with the config
360+
`fs.azure.account.keyprovider`. If a key provider class is specified, it
361+
is used to get the account key. Otherwise the simple key provider is used,
362+
which reads the key from the config `fs.azure.account.key`.
363+
364+
To retrieve the key using a shell script, specify the path to the script in the
365+
config `fs.azure.shellkeyprovider.script`. The ShellDecryptionKeyProvider class
366+
uses the specified script to retrieve the key.
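A sketch of the shell-script variant follows. The package of `ShellDecryptionKeyProvider` and the script path `/opt/secrets/get-abfs-key.sh` are assumptions made for the example. The keys are shown exactly as written above, although a deployment may need the account-specific variant.

```xml
<property>
  <name>fs.azure.account.keyprovider</name>
  <!-- Assumed package name for the class named in the text. -->
  <value>org.apache.hadoop.fs.azurebfs.services.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <!-- Hypothetical script path, for illustration only. -->
  <value>/opt/secrets/get-abfs-key.sh</value>
</property>
```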
367+
353368
### <a name="oauth-client-credentials"></a> OAuth 2.0 Client Credentials
354369

355370
OAuth 2.0 credentials of (client id, client secret, endpoint) are provided in the configuration/JCEKS file.
@@ -465,6 +480,13 @@ With an existing Oauth 2.0 token, make a request of the Active Directory endpoin
465480
Refresh token
466481
</description>
467482
</property>
483+
<property>
484+
<name>fs.azure.account.oauth2.refresh.endpoint</name>
485+
<value></value>
486+
<description>
487+
Refresh token endpoint
488+
</description>
489+
</property>
468490
<property>
469491
<name>fs.azure.account.oauth2.client.id</name>
470492
<value></value>
@@ -506,6 +528,13 @@ The Azure Portal/CLI is used to create the service identity.
506528
Optional MSI Tenant ID
507529
</description>
508530
</property>
531+
<property>
532+
<name>fs.azure.account.oauth2.msi.endpoint</name>
533+
<value></value>
534+
<description>
535+
MSI endpoint
536+
</description>
537+
</property>
509538
<property>
510539
<name>fs.azure.account.oauth2.client.id</name>
511540
<value></value>
@@ -542,6 +571,26 @@ and optionally `org.apache.hadoop.fs.azurebfs.extensions.BoundDTExtension`.
542571

543572
The declared class also holds responsibility to implement retry logic while fetching access tokens.
544573

574+
### <a name="delegationtokensupportconfigoptions"></a> Delegation Token Provider
575+
576+
A delegation token provider supplies the ABFS connector with delegation tokens
577+
and renews and cancels them by implementing the
578+
CustomDelegationTokenManager interface.
579+
580+
```xml
581+
<property>
582+
<name>fs.azure.enable.delegation.token</name>
583+
<value>true</value>
584+
<description>Make this true to use delegation token provider</description>
585+
</property>
586+
<property>
587+
<name>fs.azure.delegation.token.provider.type</name>
588+
<value>{fully-qualified-class-name-for-implementation-of-CustomDelegationTokenManager-interface}</value>
589+
</property>
590+
```
591+
If delegation tokens are enabled and the config
592+
`fs.azure.delegation.token.provider.type` is not provided, an IllegalArgumentException is thrown.
593+
545594
### Shared Access Signature (SAS) Token Provider
546595

547596
A Shared Access Signature (SAS) token provider supplies the ABFS connector with SAS
@@ -691,6 +740,84 @@ Config `fs.azure.account.hns.enabled` provides an option to specify whether
691740
Config `fs.azure.enable.check.access` needs to be set true to enable
692741
the AzureBlobFileSystem.access().
693742

743+
### <a name="featureconfigoptions"></a> Primary User Group Options
744+
The group name in FileStatus and AclStatus is set to the
745+
username if the following config is set to true:
746+
`fs.azure.skipUserGroupMetadataDuringInitialization`.
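A minimal example of enabling this behaviour, assuming the property is set in the same configuration file as the other ABFS options:

```xml
<property>
  <name>fs.azure.skipUserGroupMetadataDuringInitialization</name>
  <!-- true: the group name reported in FileStatus/AclStatus equals the username -->
  <value>true</value>
</property>
```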
747+
748+
### <a name="ioconfigoptions"></a> IO Options
749+
The following configs are related to read and write operations.
750+
751+
`fs.azure.io.retry.max.retries`: Sets the number of retries for IO operations.
752+
Currently this is used only for the server call retry logic. It is used within
753+
the AbfsClient class as part of the ExponentialRetryPolicy. The value should be
754+
>= 0.
755+
756+
`fs.azure.write.request.size`: Sets the write buffer size. Specify the value
757+
in bytes. The value should be between 16384 and 104857600, both inclusive (16 KB
758+
to 100 MB). The default value is 8388608 (8 MB).
759+
760+
`fs.azure.read.request.size`: Sets the read buffer size. Specify the value in
761+
bytes. The value should be between 16384 and 104857600, both inclusive (16 KB to
762+
100 MB). The default value is 4194304 (4 MB).
763+
764+
`fs.azure.readaheadqueue.depth`: Sets the readahead queue depth in
765+
AbfsInputStream. If the value is negative, the readahead queue depth
766+
is set to Runtime.getRuntime().availableProcessors(). The default value
767+
is -1.
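The IO options above could be combined as in the sketch below. The buffer sizes and queue depth are the documented defaults. The retry count of 10 is a hypothetical value chosen only for illustration.

```xml
<property>
  <name>fs.azure.io.retry.max.retries</name>
  <!-- Hypothetical value; must be >= 0. -->
  <value>10</value>
</property>
<property>
  <name>fs.azure.write.request.size</name>
  <!-- 8 MB, in bytes; allowed range 16384 to 104857600 -->
  <value>8388608</value>
</property>
<property>
  <name>fs.azure.read.request.size</name>
  <!-- 4 MB, in bytes; allowed range 16384 to 104857600 -->
  <value>4194304</value>
</property>
<property>
  <name>fs.azure.readaheadqueue.depth</name>
  <!-- Negative: use Runtime.getRuntime().availableProcessors() -->
  <value>-1</value>
</property>
```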
768+
769+
### <a name="securityconfigoptions"></a> Security Options
770+
`fs.azure.always.use.https`: Forces the use of HTTPS instead of HTTP when the
771+
flag is set to true. Irrespective of the flag, AbfsClient uses HTTPS if the secure
772+
scheme (ABFSS) is used or OAuth is used for authentication. The default value
773+
is true.
774+
775+
`fs.azure.ssl.channel.mode`: Initializes DelegatingSSLSocketFactory with the
776+
specified SSL channel mode. The value should be a constant of the enum
777+
DelegatingSSLSocketFactory.SSLChannelMode. The default value is
778+
DelegatingSSLSocketFactory.SSLChannelMode.Default.
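As a sketch, both security options with their documented defaults. Giving the channel mode as the bare enum constant name `Default` is an assumption about how the value is parsed.

```xml
<property>
  <name>fs.azure.always.use.https</name>
  <value>true</value>
</property>
<property>
  <name>fs.azure.ssl.channel.mode</name>
  <!-- Assumed to be the constant name from DelegatingSSLSocketFactory.SSLChannelMode. -->
  <value>Default</value>
</property>
```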
779+
780+
### <a name="serverconfigoptions"></a> Server Options
781+
When the config `fs.azure.io.read.tolerate.concurrent.append` is set to true, the
782+
If-Match header sent to the server for read calls is set to `*`; otherwise it
783+
is set to the ETag. This is a mechanism for handling
784+
reads with optimistic concurrency.
785+
Please refer to the following links for further information.
786+
1. https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read
787+
2. https://azure.microsoft.com/de-de/blog/managing-concurrency-in-microsoft-azure-storage-2/
788+
789+
The listStatus API fetches the FileStatus information from the server page by
790+
page. The config `fs.azure.list.max.results` sets the maxResults URI
791+
param, which sets the page size (maximum results per call). The value should
792+
be > 0. The default value is 500. The server caps this
793+
parameter at 5000, so even if the config is above 5000 the response will contain
794+
at most 5000 entries. Please refer to the following link for further information.
795+
https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/list
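A sketch of the two server options together. The page size of 500 is the documented default. Setting `fs.azure.io.read.tolerate.concurrent.append` to true is purely illustrative.

```xml
<property>
  <name>fs.azure.io.read.tolerate.concurrent.append</name>
  <!-- true: If-Match is sent as *; false: If-Match carries the ETag -->
  <value>true</value>
</property>
<property>
  <name>fs.azure.list.max.results</name>
  <!-- Must be > 0; the server returns at most 5000 entries per call -->
  <value>500</value>
</property>
```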
796+
797+
### <a name="throttlingconfigoptions"></a> Throttling Options
798+
ABFS driver has the capability to throttle read and write operations to achieve
799+
maximum throughput by minimizing errors. The errors occur when the account
800+
ingress or egress limits are exceeded and the server side throttles requests.
801+
Server-side throttling causes the retry policy to be used, but the retry policy
802+
sleeps for long periods of time causing the total ingress or egress throughput
803+
to be as much as 35% lower than optimal. The retry policy is also after the
804+
fact, in that it applies after a request fails. On the other hand, the
805+
client-side throttling implemented here happens before requests are made and
806+
sleeps just enough to minimize errors, allowing optimal ingress and/or egress
807+
throughput. By default the throttling mechanism is enabled in the driver. It
808+
can be disabled by setting the config `fs.azure.enable.autothrottling`
809+
to false.
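For example, client-side throttling could be switched off as follows (a minimal sketch; whether disabling it is advisable depends on the workload):

```xml
<property>
  <name>fs.azure.enable.autothrottling</name>
  <!-- Enabled by default; false disables client-side throttling -->
  <value>false</value>
</property>
```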
810+
811+
### <a name="renameconfigoptions"></a> Rename Options
812+
`fs.azure.atomic.rename.key`: Directories for atomic rename support can be
813+
specified as comma-separated values in this config. The driver prints the following
814+
warning log if the source of the rename belongs to one of the configured
815+
directories: "The atomic rename feature is not supported by the ABFS scheme;
816+
however, rename, create and delete operations are atomic if Namespace is
817+
enabled for your Azure Storage account."
818+
By default the value
819+
is "/hbase".
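An illustrative setting with two directories. `/hbase` is the documented default, while `/data/staging` is a hypothetical directory added only to show the comma-separated form.

```xml
<property>
  <name>fs.azure.atomic.rename.key</name>
  <!-- /data/staging is a hypothetical example directory. -->
  <value>/hbase,/data/staging</value>
</property>
```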
820+
694821
### <a name="perfoptions"></a> Perf Options
695822

696823
#### <a name="abfstracklatencyoptions"></a> 1. HTTP Request Tracking Options
