|
257 | 257 |
|
258 | 258 |
|
259 | 259 | You can create a new container through the ABFS connector, by setting the option |
260 | | - `fs.azure.createRemoteFileSystemDuringInitialization` to `true`. |
| 260 | + `fs.azure.createRemoteFileSystemDuringInitialization` to `true`. This option |
| 261 | + is not supported when the auth type is SAS. |
261 | 262 |
|
262 | 263 | If the container does not exist, an attempt to list it with `hadoop fs -ls` |
263 | 264 | will fail |
@@ -317,8 +318,13 @@ driven by them. |
317 | 318 |
|
318 | 319 | What can be changed is what secrets/credentials are used to authenticate the caller. |
319 | 320 |
|
320 | | -The authentication mechanism is set in `fs.azure.account.auth.type` (or the account specific variant), |
321 | | -and, for the various OAuth options `fs.azure.account.oauth.provider.type` |
| 321 | +The authentication mechanism is set in `fs.azure.account.auth.type` (or the |
| 322 | +account-specific variant). The possible values are SharedKey, OAuth, Custom |
| 323 | +and SAS. For the various OAuth options, use the config |
| 324 | +`fs.azure.account.oauth.provider.type`. The supported implementations are |
| 325 | +ClientCredsTokenProvider, UserPasswordTokenProvider, MsiTokenProvider and |
| 326 | +RefreshTokenBasedTokenProvider. An IllegalArgumentException is thrown if |
| 327 | +the specified provider type is not one of the supported implementations. |
322 | 328 |
|
323 | 329 | All secrets can be stored in JCEKS files. These are encrypted and password |
324 | 330 | protected —use them or a compatible Hadoop Key Management Store wherever |
@@ -350,6 +356,15 @@ the password, "key", retrieved from the XML/JCECKs configuration files. |
350 | 356 | *Note*: The source of the account key can be changed through a custom key provider; |
351 | 357 | one exists to execute a shell script to retrieve it. |
352 | 358 |
|
| 359 | +A custom key provider class can be specified with the config |
| 360 | +`fs.azure.account.keyprovider`. If a key provider class is specified, it |
| 361 | +is used to get the account key. Otherwise the Simple key provider is used, |
| 362 | +which reads the key specified in the config `fs.azure.account.key`. |
| 363 | + |
| 364 | +To retrieve the key using a shell script, specify the path to the script in the config |
| 365 | +`fs.azure.shellkeyprovider.script`. The ShellDecryptionKeyProvider class uses the |
| 366 | +script specified to retrieve the key. |
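
As a rough illustration of the two configs described above, a setup that delegates key retrieval to a shell script might look like the following. The fully qualified package of `ShellDecryptionKeyProvider` and the script path are assumptions made for this sketch and are not taken from this document; verify both against your Hadoop release.

```xml
<!-- Sketch only: the package name and the script path are assumed values. -->
<property>
  <name>fs.azure.account.keyprovider</name>
  <value>org.apache.hadoop.fs.azurebfs.services.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <!-- Hypothetical path to the script used to retrieve the key -->
  <value>/path/to/retrieve-key.sh</value>
</property>
```

If `fs.azure.account.keyprovider` is left unset, only `fs.azure.account.key` is consulted, as described above.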
| 367 | + |
353 | 368 | ### <a name="oauth-client-credentials"></a> OAuth 2.0 Client Credentials |
354 | 369 |
|
355 | 370 | OAuth 2.0 credentials of (client id, client secret, endpoint) are provided in the configuration/JCEKS file. |
@@ -465,6 +480,13 @@ With an existing Oauth 2.0 token, make a request of the Active Directory endpoin |
465 | 480 | Refresh token |
466 | 481 | </description> |
467 | 482 | </property> |
| 483 | +<property> |
| 484 | + <name>fs.azure.account.oauth2.refresh.endpoint</name> |
| 485 | + <value></value> |
| 486 | + <description> |
| 487 | + Refresh token endpoint |
| 488 | + </description> |
| 489 | +</property> |
468 | 490 | <property> |
469 | 491 | <name>fs.azure.account.oauth2.client.id</name> |
470 | 492 | <value></value> |
@@ -506,6 +528,13 @@ The Azure Portal/CLI is used to create the service identity. |
506 | 528 | Optional MSI Tenant ID |
507 | 529 | </description> |
508 | 530 | </property> |
| 531 | +<property> |
| 532 | + <name>fs.azure.account.oauth2.msi.endpoint</name> |
| 533 | + <value></value> |
| 534 | + <description> |
| 535 | + MSI endpoint |
| 536 | + </description> |
| 537 | +</property> |
509 | 538 | <property> |
510 | 539 | <name>fs.azure.account.oauth2.client.id</name> |
511 | 540 | <value></value> |
@@ -542,6 +571,26 @@ and optionally `org.apache.hadoop.fs.azurebfs.extensions.BoundDTExtension`. |
542 | 571 |
|
543 | 572 | The declared class also holds responsibility to implement retry logic while fetching access tokens. |
544 | 573 |
|
| 574 | +### <a name="delegationtokensupportconfigoptions"></a> Delegation Token Provider |
| 575 | + |
| 576 | +A delegation token provider supplies the ABFS connector with delegation tokens |
| 577 | +and helps renew and cancel the tokens by implementing the |
| 578 | +CustomDelegationTokenManager interface. |
| 579 | + |
| 580 | +```xml |
| 581 | +<property> |
| 582 | + <name>fs.azure.enable.delegation.token</name> |
| 583 | + <value>true</value> |
| 584 | + <description>Make this true to use delegation token provider</description> |
| 585 | +</property> |
| 586 | +<property> |
| 587 | + <name>fs.azure.delegation.token.provider.type</name> |
| 588 | + <value>{fully-qualified-class-name-for-implementation-of-CustomDelegationTokenManager-interface}</value> |
| 589 | +</property> |
| 590 | +``` |
| 591 | +If delegation tokens are enabled and the config |
| 592 | +`fs.azure.delegation.token.provider.type` is not provided, an IllegalArgumentException is thrown. |
| 593 | + |
545 | 594 | ### Shared Access Signature (SAS) Token Provider |
546 | 595 |
|
547 | 596 | A Shared Access Signature (SAS) token provider supplies the ABFS connector with SAS |
@@ -691,6 +740,84 @@ Config `fs.azure.account.hns.enabled` provides an option to specify whether |
691 | 740 | Config `fs.azure.enable.check.access` needs to be set true to enable |
692 | 741 | the AzureBlobFileSystem.access(). |
693 | 742 |
|
| 743 | +### <a name="featureconfigoptions"></a> Primary User Group Options |
| 744 | +The group name which is part of FileStatus and AclStatus will be set the same as |
| 745 | +the username if the following config is set to true |
| 746 | +`fs.azure.skipUserGroupMetadataDuringInitialization`. |
| 747 | + |
| 748 | +### <a name="ioconfigoptions"></a> IO Options |
| 749 | +The following configs are related to read and write operations. |
| 750 | + |
| 751 | +`fs.azure.io.retry.max.retries`: Sets the number of retries for IO operations. |
| 752 | +Currently this is used only for the server call retry logic, within the |
| 753 | +AbfsClient class as part of the ExponentialRetryPolicy. The value should be |
| 754 | +greater than or equal to 0. |
| 755 | +
|
| 756 | +`fs.azure.write.request.size`: Sets the write buffer size. Specify the value |
| 757 | +in bytes. The value should be between 16384 and 104857600, both inclusive (16 KB |
| 758 | +to 100 MB). The default value is 8388608 (8 MB). |
| 759 | + |
| 760 | +`fs.azure.read.request.size`: Sets the read buffer size. Specify the value in |
| 761 | +bytes. The value should be between 16384 and 104857600, both inclusive (16 KB to |
| 762 | +100 MB). The default value is 4194304 (4 MB). |
| 763 | + |
| 764 | +`fs.azure.readaheadqueue.depth`: Sets the readahead queue depth in |
| 765 | +AbfsInputStream. If the configured value is negative, the readahead queue depth |
| 766 | +is set to Runtime.getRuntime().availableProcessors(). The default value |
| 767 | +is -1. |
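
For reference, the read and write options above could be set explicitly as in the sketch below. The values shown are simply the defaults stated in the text, written out for illustration; this is not a tuning recommendation.

```xml
<!-- Sketch only: values are the defaults quoted above. -->
<property>
  <name>fs.azure.write.request.size</name>
  <!-- 8 MB; allowed range is 16384 to 104857600 bytes -->
  <value>8388608</value>
</property>
<property>
  <name>fs.azure.read.request.size</name>
  <!-- 4 MB; allowed range is 16384 to 104857600 bytes -->
  <value>4194304</value>
</property>
<property>
  <name>fs.azure.readaheadqueue.depth</name>
  <!-- A negative value falls back to the number of available processors -->
  <value>-1</value>
</property>
```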
| 768 | + |
| 769 | +### <a name="securityconfigoptions"></a> Security Options |
| 770 | +`fs.azure.always.use.https`: Enforces the use of HTTPS instead of HTTP when the flag |
| 771 | +is set to true. Irrespective of the flag, AbfsClient will use HTTPS if the secure |
| 772 | +scheme (ABFSS) is used or OAuth is used for authentication. By default this is |
| 773 | +set to true. |
| 774 | + |
| 775 | +`fs.azure.ssl.channel.mode`: Initializes DelegatingSSLSocketFactory with the |
| 776 | +specified SSL channel mode. The value should be a constant of the enum |
| 777 | +DelegatingSSLSocketFactory.SSLChannelMode. The default value is |
| 778 | +DelegatingSSLSocketFactory.SSLChannelMode.Default. |
| 779 | + |
| 780 | +### <a name="serverconfigoptions"></a> Server Options |
| 781 | +When the config `fs.azure.io.read.tolerate.concurrent.append` is set to true, the |
| 782 | +If-Match header sent to the server for read calls is set to `*`; otherwise it |
| 783 | +is set to the ETag. This is a mechanism to handle |
| 784 | +reads with optimistic concurrency. |
| 785 | +Please refer to the following links for further information. |
| 786 | +1. https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read |
| 787 | +2. https://azure.microsoft.com/de-de/blog/managing-concurrency-in-microsoft-azure-storage-2/ |
| 788 | + |
| 789 | +The listStatus API fetches the FileStatus information from the server page by |
| 790 | +page. The config `fs.azure.list.max.results` is used to set the maxResults URI |
| 791 | +param, which sets the page size (maximum results per call). The value should |
| 792 | +be > 0. By default this is 500. The server caps this parameter at 5000, |
| 793 | +so even if the config is above 5000 the response will only |
| 794 | +contain 5000 entries. Please refer to the following link for further information. |
| 795 | +https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/list |
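
A minimal sketch of the two server options above, assuming a workload that reads files while they are being appended to. The page size is the default quoted in the text, and enabling the concurrent-append flag here is an illustrative choice.

```xml
<!-- Sketch only: illustrative values, not recommendations. -->
<property>
  <name>fs.azure.io.read.tolerate.concurrent.append</name>
  <!-- true sends If-Match: * on reads instead of the ETag -->
  <value>true</value>
</property>
<property>
  <name>fs.azure.list.max.results</name>
  <!-- Must be > 0; the server returns at most 5000 entries per page -->
  <value>500</value>
</property>
```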
| 796 | + |
| 797 | +### <a name="throttlingconfigoptions"></a> Throttling Options |
| 798 | +The ABFS driver has the capability to throttle read and write operations to achieve |
| 799 | +maximum throughput by minimizing errors. The errors occur when the account |
| 800 | +ingress or egress limits are exceeded and the server throttles requests. |
| 801 | +Server-side throttling causes the retry policy to be used, but the retry policy |
| 802 | +sleeps for long periods of time causing the total ingress or egress throughput |
| 803 | +to be as much as 35% lower than optimal. The retry policy is also after the |
| 804 | +fact, in that it applies after a request fails. On the other hand, the |
| 805 | +client-side throttling implemented here happens before requests are made and |
| 806 | +sleeps just enough to minimize errors, allowing optimal ingress and/or egress |
| 807 | +throughput. By default the throttling mechanism is enabled in the driver. It |
| 808 | +can be disabled by setting the config `fs.azure.enable.autothrottling` |
| 809 | +to false. |
| 810 | + |
| 811 | +### <a name="renameconfigoptions"></a> Rename Options |
| 812 | +`fs.azure.atomic.rename.key`: Specifies the directories for atomic rename |
| 813 | +support. The driver logs the following |
| 814 | +warning if the source of the rename belongs to one of the configured |
| 815 | +directories: "The atomic rename feature is not supported by the ABFS scheme; |
| 816 | +however, rename, create and delete operations are atomic if Namespace is |
| 817 | +enabled for your Azure Storage account." |
| 818 | +The directories can be specified as comma-separated values. By default the value |
| 819 | +is "/hbase". |
| 820 | + |
694 | 821 | ### <a name="perfoptions"></a> Perf Options |
695 | 822 |
|
696 | 823 | #### <a name="abfstracklatencyoptions"></a> 1. HTTP Request Tracking Options |
|