
Commit 1c87238 (1 parent: 32c5d5d)

Review fixes:
* Simplified hadoopFSsToAccess
* Moved doc to generic area

File tree: 3 files changed (+21, -22 lines)

core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
Lines changed: 5 additions & 6 deletions

@@ -143,17 +143,16 @@ private[deploy] object HadoopFSDelegationTokenProvider {
     val defaultFS = FileSystem.get(hadoopConf)
     val master = sparkConf.get("spark.master", null)
     val stagingFS = if (master != null && master.contains("yarn")) {
-      sparkConf.get(STAGING_DIR)
-        .map(new Path(_).getFileSystem(hadoopConf))
-        .getOrElse(defaultFS)
+      sparkConf.get(STAGING_DIR).map(new Path(_).getFileSystem(hadoopConf))
     } else {
-      defaultFS
+      None
     }

     // Add the list of available namenodes for all namespaces in HDFS federation.
     // If ViewFS is enabled, this is skipped as ViewFS already handles delegation tokens for its
     // namespaces.
-    val hadoopFilesystems = if (!filesystemsToAccess.isEmpty || stagingFS.getScheme == "viewfs") {
+    val hadoopFilesystems = if (!filesystemsToAccess.isEmpty || defaultFS.getScheme == "viewfs" ||
+        (stagingFS.isDefined && stagingFS.get.getScheme == "viewfs")) {
       filesystemsToAccess.map(new Path(_).getFileSystem(hadoopConf)).toSet
     } else {
       val nameservices = hadoopConf.getTrimmedStrings("dfs.nameservices")

@@ -172,6 +171,6 @@ private[deploy] object HadoopFSDelegationTokenProvider {
       (filesystemsWithoutHA ++ filesystemsWithHA).toSet
     }

-    hadoopFilesystems + stagingFS + defaultFS
+    hadoopFilesystems ++ stagingFS + defaultFS
   }
 }
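The core of the change above is that `stagingFS` becomes an `Option[FileSystem]`, so the final expression switches from `+` (add one element) to `++` (add the zero or one elements an `Option` contributes via Scala's implicit `option2Iterable` conversion). A minimal sketch of that semantics, using hypothetical string URIs instead of real `FileSystem` objects:

```scala
// Sketch (hypothetical values, not the actual Spark types): `Set ++ Option`
// adds the wrapped element only when the Option is defined, which is why the
// commit can replace `.getOrElse(defaultFS)` with a plain Option.
object StagingFsMerge {
  // Mirrors `hadoopFilesystems ++ stagingFS + defaultFS` from the diff.
  def merge(filesystems: Set[String], stagingFS: Option[String], defaultFS: String): Set[String] =
    filesystems ++ stagingFS + defaultFS

  def main(args: Array[String]): Unit = {
    // On YARN the staging dir may resolve to its own filesystem: it is included.
    assert(merge(Set("hdfs://nn1"), Some("hdfs://staging"), "hdfs://default") ==
      Set("hdfs://nn1", "hdfs://staging", "hdfs://default"))
    // Outside YARN stagingFS is None, so only the default FS is appended.
    assert(merge(Set("hdfs://nn1"), None, "hdfs://default") ==
      Set("hdfs://nn1", "hdfs://default"))
  }
}
```

Since `++` and `+` share precedence and associate left, the expression groups as `(filesystems ++ stagingFS) + defaultFS`, so the default filesystem is always included.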

docs/running-on-yarn.md
Lines changed: 0 additions & 16 deletions

@@ -490,10 +490,6 @@ for:
   filesystem if `spark.yarn.stagingDir` is not set);
 - if Hadoop federation is enabled, all the federated filesystems in the configuration.
 
-If an application needs to interact with other secure Hadoop filesystems, their URIs need to be
-explicitly provided to Spark at launch time. This is done by listing them in the
-`spark.kerberos.access.hadoopFileSystems` property, described in the configuration section below.
-
 The YARN integration also supports custom delegation token providers using the Java Services
 mechanism (see `java.util.ServiceLoader`). Implementations of
 `org.apache.spark.deploy.yarn.security.ServiceCredentialProvider` can be made available to Spark

@@ -527,18 +523,6 @@ providers can be disabled individually by setting `spark.security.credentials.{s
   <br /> (Works also with the "local" master.)
   </td>
 </tr>
-<tr>
-  <td><code>spark.kerberos.access.hadoopFileSystems</code></td>
-  <td>(none)</td>
-  <td>
-  A comma-separated list of secure Hadoop filesystems your Spark application is going to access. For
-  example, <code>spark.kerberos.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,
-  webhdfs://nn3.com:50070</code>. The Spark application must have access to the filesystems listed
-  and Kerberos must be properly configured to be able to access them (either in the same realm
-  or in a trusted realm). Spark acquires security tokens for each of the filesystems so that
-  the Spark application can access those remote Hadoop filesystems.
-  </td>
-</tr>
 <tr>
   <td><code>spark.yarn.kerberos.relogin.period</code></td>
   <td>1m</td>

docs/security.md
Lines changed: 16 additions & 0 deletions

@@ -752,6 +752,10 @@ configuration has Kerberos authentication turned (`hbase.security.authentication
 Similarly, a Hive token will be obtained if Hive is in the classpath, and the configuration includes
 URIs for remote metastore services (`hive.metastore.uris` is not empty).
 
+If an application needs to interact with other secure Hadoop filesystems, their URIs need to be
+explicitly provided to Spark at launch time. This is done by listing them in the
+`spark.kerberos.access.hadoopFileSystems` property, described in the configuration section below.
+
 Delegation token support is currently only supported in YARN and Mesos modes. Consult the
 deployment-specific page for more information.
 

@@ -769,6 +773,18 @@ The following options provides finer-grained control for this feature:
   application being run.
   </td>
 </tr>
+<tr>
+  <td><code>spark.kerberos.access.hadoopFileSystems</code></td>
+  <td>(none)</td>
+  <td>
+  A comma-separated list of secure Hadoop filesystems your Spark application is going to access. For
+  example, <code>spark.kerberos.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,
+  webhdfs://nn3.com:50070</code>. The Spark application must have access to the filesystems listed
+  and Kerberos must be properly configured to be able to access them (either in the same realm
+  or in a trusted realm). Spark acquires security tokens for each of the filesystems so that
+  the Spark application can access those remote Hadoop filesystems.
+  </td>
+</tr>
 </table>
 
 ## Long-Running Applications
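The documentation being moved into docs/security.md describes a launch-time property. A sketch of how it might be passed on the command line, using the example URIs from the documentation itself (the application class and jar name are hypothetical):

```shell
# Sketch: listing extra secure Hadoop filesystems at launch so Spark acquires
# delegation tokens for them (hdfs://nn*.com URIs taken from the docs; the
# application class and jar are placeholders).
spark-submit \
  --master yarn \
  --conf spark.kerberos.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,webhdfs://nn3.com:50070 \
  --class com.example.MyApp \
  my-app.jar
```

As the documentation notes, the submitting principal must be able to authenticate to each listed filesystem, either in the same Kerberos realm or a trusted one.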
