ArtRand commented Sep 19, 2017

What changes were proposed in this pull request?

tl;dr: Add a class, MesosHadoopDelegationTokenManager, that updates delegation tokens on a schedule on behalf of Spark Drivers. Renewed credentials are broadcast to the executors.

The problem

We recently added Kerberos support to Mesos-based Spark jobs as well as Secrets support to the Mesos Dispatcher (SPARK-16742 and SPARK-20812, respectively). However, the delegation tokens have a defined expiration. This poses a problem for long-running Spark jobs (e.g. Spark Streaming applications). YARN has a solution for this where a thread is scheduled to renew the tokens when they reach 75% of their way to expiration. It then writes the tokens to HDFS for the executors to find (using a monotonically increasing suffix).

This solution

We replace the current method in CoarseGrainedSchedulerBackend, which used to discard the token renewal time, with a protected method fetchHadoopDelegationTokens. Now the individual cluster backends are responsible for overriding this method to fetch and manage token renewal. The delegation tokens themselves are still part of the CoarseGrainedSchedulerBackend as before.
In the case of Mesos, renewed Credentials are broadcast to the executors. This keeps all transfer of Credentials within Spark (as opposed to Spark-to-HDFS), does not require writing Credentials to disk, and does not require any GC of old files.
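For orientation, a rough sketch of the two pieces described above, assembled from snippets quoted later in this review (names such as hadoopDelegationTokenManager and driverEndpoint follow those snippets; this is illustrative, not a verbatim copy of the patch):

    // Mesos backend override: hand serialized tokens to CoarseGrainedSchedulerBackend.
    override def fetchHadoopDelegationTokens(): Option[Array[Byte]] = {
      if (UserGroupInformation.isSecurityEnabled) {
        Some(hadoopDelegationTokenManager.getTokens())
      } else {
        None
      }
    }

    // On renewal, the driver forwards the serialized credentials to every executor.
    private def broadcastDelegationTokens(tokens: Array[Byte]): Unit = {
      logInfo("Sending new tokens to all executors")
      driverEndpoint.send(UpdateDelegationTokens(tokens))
    }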

How was this patch tested?

Manually against a Kerberized HDFS cluster.

Thank you for the reviews.

ArtRand changed the title from "[Spark-21842] Support Kerberos ticket renewal and creation in Mesos" to "[Spark-21842][Mesos] Support Kerberos ticket renewal and creation in Mesos" on Sep 19, 2017
val creds = SparkHadoopUtil.get.deserialize(tokens)
// decode tokens and add them to the credentials
UserGroupInformation.getCurrentUser.addCredentials(SparkHadoopUtil.get.deserialize(tokens))
}
Contributor

use the val above: UserGroupInformation.getCurrentUser.addCredentials(creds)

}
val tempCreds = ugi.getCredentials
val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
var nextRenewalTime = Long.MaxValue
Contributor

Same as spark.yarn.credentials.renewalTime; shouldn't we have a common value somewhere?

Contributor

When the driver is restarted, in the case of YARN the old renewalTime is restored:


Does the code here cover this?

Author

Right now when the MesosCredentialRenewer is initialized, it renews the current tokens and sets the renewal time to whatever the expiration time of those tokens is. On a driver restart, the same thing would happen. We could add spark.yarn.credentials.renewalTime as an override, but if the driver restarts, say 2 days later, spark.yarn.credentials.renewalTime is no longer relevant and it'll just immediately renew anyway.

Relevant code:
https://github.com/mesosphere/spark/blob/spark-21842-450-kerberos-ticket-renewal/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L210
^^ Where the initial renewal time is set
https://github.com/mesosphere/spark/blob/spark-21842-450-kerberos-ticket-renewal/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCredentialRenewer.scala#L66
^^ where we initialize the renewal time if the renewal time has passed

Contributor

Ok, so we always renew when we start by fetching the tokens, got it.

executor.stop()
}
}.start()
case UpdateDelegationTokens(tokens) =>
Contributor
@skonto Sep 19, 2017

Let's add a comment that this is received only in the Mesos case, since CoarseGrainedExecutorBackend is used by both YARN and standalone.


private[spark] object CoarseGrainedExecutorBackend extends Logging {

private def addDelegationTokens(tokens: Array[Byte], sparkConf: SparkConf) {
Contributor
@skonto Sep 19, 2017

I think we should move this to SparkHadoopUtil and re-use methods such as:

def addCurrentUserCredentials(creds: Credentials): Unit = {
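A minimal sketch of what that shared helper could look like, assuming it lives in SparkHadoopUtil and reuses the existing deserialize / addCurrentUserCredentials methods (the setConfiguration call and the log line are illustrative additions):

    def addDelegationTokens(tokens: Array[Byte], sparkConf: SparkConf): Unit = {
      // Make sure UGI has picked up the (Kerberos-enabled) Hadoop configuration.
      UserGroupInformation.setConfiguration(newConfiguration(sparkConf))
      val creds = deserialize(tokens)
      logInfo("Updating delegation tokens for current user.")
      addCurrentUserCredentials(creds)
    }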

for ((x, executorData) <- executorDataMap) {
executorData.executorEndpoint.send(UpdateDelegationTokens(tokens))
}

Contributor

remove space

val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
val creds = SparkHadoopUtil.get.deserialize(bytes)
val intervals = creds.getAllTokens.asScala.flatMap { t =>
Try {
Contributor

t -> token
This method does not return an interval; it just returns the new expiration time.
Compare with:

new MesosCredentialRenewer(
conf,
hadoopDelegationTokenManager.get,
MesosCredentialRenewer.getTokenRenewalInterval(hadoopDelegationCreds.get, conf),
Contributor
@skonto Sep 19, 2017

hadoopDelegationCreds.get call. Should we check against None? Creds are loaded in CoarseGrainedSchedulerBackend, but if they are missing, should we fail here?

Contributor

skonto commented Sep 19, 2017

When an executor is started, it asks the CoarseGrainedSchedulerBackend for the Spark config, which contains the Hadoop credentials.
As we have discussed, this is not safe: using RPC, an arbitrary executor could register with the scheduler and get tokens. Does this code handle that? Do we authenticate executors?
Another topic: what about encryption at the RPC level? The latter is not supported on Mesos (spark.io.encryption.enabled).

Contributor
@susanxhuynh left a comment

Thanks, Art! Would you mind adding a note about broadcasting the tokens to executors in the PR description? Also, see comments.

} catch {
case e: Exception =>
// Log the error and try to write new tokens back in an hour
logWarning("Couldn't broadcast tokens, trying agin in 20 seconds", e)
Contributor

(sp) "again"

broadcastDelegationTokens(creds)
} catch {
case e: Exception =>
// Log the error and try to write new tokens back in an hour
Contributor

Comment says "an hour" but code has 20 seconds.

Author

good catch, I changed the code to match the YARN equivalent.


case UpdateDelegationTokens(tokens) =>
logDebug("Asking each executor to update HDFS delegation tokens")
for ((x, executorData) <- executorDataMap) {
Contributor

(_, executorData) would be more Scala-like.

Contributor

Alternatively executorDataMap.values.foreach(...)

val credentialRenewerThread = new Thread {
setName("MesosCredentialRenewer")
override def run(): Unit = {
val dummy: Option[Array[Byte]] = None
Contributor

What is this for?

Author

whoops!

def scheduleRenewal(runnable: Runnable): Unit = {
val remainingTime = timeOfNextRenewal - System.currentTimeMillis()
// val remainingTime = timeOfNextRenewal - System.currentTimeMillis()
val remainingTime = 5000
Contributor

Why 5000?

Author

well that's embarrassing, just a debugging tool that I forgot to remove.

Contributor
@susanxhuynh left a comment

LGTM

logInfo(s"Attempting to login with ${conf.get("spark.yarn.principal", null)}")
// Get new delegation tokens by logging in with a new UGI
// inspired by AMCredentialRenewer.scala:L174
val ugi = if (mode == "keytab") {

@kalvinnchau

I don't see where it refreshes the delegation tokens until the max lifetime, then re-logs in with the keytab to get new delegation tokens that will last until the next max lifetime.

Does this skip over the potential issues with expiring delegation tokens (after the max lifetime, 7 days by default) by just re-logging in with the keytab every time the delegation tokens need to be refreshed, and then grabbing a new set of delegation tokens?

Author

Hello @kalvinnchau. You are correct: all this does is keep track of when the tokens will expire and renew them at that time. Part of my motivation for doing this is to avoid writing any files to disk (like new TGTs, if that's what you're suggesting). We can simply mount the keytab via the Mesos secrets primitive, then renew the tokens every so often. In order to be consistent I tried to keep this solution as close to YARN as possible.

Contributor

The correct way would be for the credential management code to differentiate between token creation and token renewal; that way it would renew tokens at the renewal interval and create new ones after the max lifetime.

But it seems the original implementation took a shortcut and just creates new ones instead of renewing existing ones; changing that would require changes in the credential provider interfaces, so this is enough for now.
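To make the renew-vs-create distinction concrete, a hedged sketch (not the PR's code; principal and keytabPath are illustrative parameters): Hadoop's Token#renew extends an existing token only up to its max lifetime, after which a fresh login is needed to obtain new tokens.

    import scala.collection.JavaConverters._
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    def renewOrRecreate(
        creds: Credentials,
        hadoopConf: Configuration,
        maxLifetimeReached: Boolean,
        principal: String,
        keytabPath: String): Unit = {
      if (!maxLifetimeReached) {
        // Renewal: extend each existing token; renew() returns the new expiration time.
        creds.getAllTokens.asScala.foreach(_.renew(hadoopConf))
      } else {
        // Creation: past the max lifetime a token cannot be renewed, so log in again
        // with the keytab and obtain a brand-new set of delegation tokens.
        val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)
        // ... obtain fresh tokens under `ugi` and broadcast them to the executors ...
      }
    }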

new MesosCredentialRenewer(
conf,
hadoopDelegationTokenManager.get,
MesosCredentialRenewer.getTokenRenewalTime(hadoopDelegationCreds.get, conf),

@kalvinnchau

This sets the first renewal time to be the expiration time of the token.

It should be similar to the way the next renewal time in the MesosCredentialRenewer class is calculated, so that it renews the first token after 75% of the expiration time has passed:

val currTime = System.currentTimeMillis()
val renewTime = MesosCredentialRenewer.getTokenRenewalTime(hadoopDelegationCreds.get, conf)
val rt = 0.75 * (renewTime - currTime)

val credentialRenewer =
  new MesosCredentialRenewer(
    conf,
    hadoopDelegationTokenManager.get,
    (currTime + rt).toLong,
    driverEndpoint)
credentialRenewer.scheduleTokenRenewal()

Author

ArtRand commented Sep 26, 2017

Hey @kalvinnchau good catch on the first renewal time. I believe I addressed it. Have a look. Thanks again.

@kalvinnchau

@ArtRand thanks! I've been testing a local version of doing that, I'll pull that change in and test it as well.

@kalvinnchau

@ArtRand
I'm running into an issue where it seems like the delegation tokens are being sent to the executors (as in I see the logs stating that the tokens are being broadcast), but the old delegation tokens are still in use.

When the job first started up it created token 777222; then ~18 hours in, the refresh occurred:
[INFO ] 2017-09-26 17:59:20.639 [Credential Refresh Thread-0] DFSClient - Created HDFS_DELEGATION_TOKEN token 772826 for <principal> on ha-hdfs:<namenode>

[INFO ] 2017-09-26 17:59:20.747 [Credential Refresh Thread-0] MesosCredentialRenewer - Sending new tokens to all executors

Then at ~24 hour mark I get the exception:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 772222 for <principal>) is expired

Are you aware of anything that needs to be done to tell the thread to use the newer tokens? Or do the older tokens need to be removed from the UGI?

Author

ArtRand commented Sep 28, 2017

Hey @kalvinnchau thanks for having the patience to try this. This is a curious error though.

If you look at the addAll method called by UserGroupInformation.addCredentials() it should overwrite the current credentials.

I tried to reproduce your error, but being less patient, I changed my HDFS setup to request that the tokens be updated every minute instead of every day by adding the following to hdfs-site.xml:

    <property>
        <name>dfs.namenode.delegation.token.max-lifetime</name>
        <value>60000</value>
    </property>

I added some logging to the executor backend to check if they were indeed being updated.

case UpdateDelegationTokens(tokens) =>
      logInfo("Got request to update tokens")
      val oldCreds = UserGroupInformation.getCurrentUser.getCredentials
      for (t <- oldCreds.getAllTokens.asScala) {
        logInfo(s"Old Creds ${DelegationTokenIdentifier.stringifyToken(t)}")
      }
      val creds = SparkHadoopUtil.get.deserialize(tokens)
      for (t <- creds.getAllTokens.asScala) {
        val s = DelegationTokenIdentifier.stringifyToken(t)
        logInfo(s"Got new tokens $s")
      }
      SparkHadoopUtil.get.addDelegationTokens(tokens, env.conf)
      val newCreds = UserGroupInformation.getCurrentUser.getCredentials
      for (t <- newCreds.getAllTokens.asScala) {
        logInfo(s"New creds ${DelegationTokenIdentifier.stringifyToken(t)}")
      }

and indeed, when I check the logs, the token number has been updated.

17/09/28 03:32:58 INFO CoarseGrainedExecutorBackend: Got request to update tokens
17/09/28 03:32:58 INFO CoarseGrainedExecutorBackend: Old Creds HDFS_DELEGATION_TOKEN token 29 for hdfs on ha-hdfs:hdfs
17/09/28 03:32:59 INFO CoarseGrainedExecutorBackend: Got new tokens HDFS_DELEGATION_TOKEN token 31 for hdfs on ha-hdfs:hdfs
17/09/28 03:32:59 INFO CoarseGrainedExecutorBackend: New creds HDFS_DELEGATION_TOKEN token 31 for hdfs on ha-hdfs:hdfs

then some time later (in fact there was another update in the middle):

17/09/28 03:35:14 INFO CoarseGrainedExecutorBackend: Got request to update tokens
17/09/28 03:35:14 INFO CoarseGrainedExecutorBackend: Old Creds HDFS_DELEGATION_TOKEN token 34 for hdfs on ha-hdfs:hdfs
17/09/28 03:35:14 INFO CoarseGrainedExecutorBackend: Got new tokens HDFS_DELEGATION_TOKEN token 35 for hdfs on ha-hdfs:hdfs
17/09/28 03:35:14 INFO CoarseGrainedExecutorBackend: New creds HDFS_DELEGATION_TOKEN token 35 for hdfs on ha-hdfs:hdfs

I will run a 24h experiment to verify, but hopefully there is a way to validate that the update is working without waiting that long just to debug!

@vanzin Could you eyeball this? Am I missing something obvious?

Contributor

vanzin commented Sep 28, 2017

What you wrote sounds correct. However, I've seen errors like the ones above in the past, but haven't been able to fully debug them due to lack of logs. Part of it is because the user running the app didn't provide me the full logs; but also Spark currently doesn't have the logs you added in your debug code, not even at debug level, so we have only partial information about exactly what's going on from existing logs. (And these errors tend to happen only after x days, so it's kind of a pain to reproduce.)

That all being said, though, as I mentioned, what you wrote sounds correct AFAIK.

Contributor

vanzin commented Sep 28, 2017

Another thing to keep in mind is that different Hadoop versions have different bugs in this area, so if you use a version of the Hadoop client library that suffers from some issue, or are talking to an HDFS version that has some bug, that can cause problems.

Without researching more I only remember an issue that affects HA mode (HDFS-9276), but there might be others.

Author

ArtRand commented Sep 29, 2017

Hello @vanzin

Thanks for taking a look at this. Good to know that there can be downstream errors depending on the situation.

Would very much appreciate a proper review on this work when you have some time, very keen on getting this into the next release.

@kalvinnchau

@ArtRand curious, what version of Hadoop are you building Spark against, and what version is the cluster that you're running?

Author

ArtRand commented Oct 5, 2017

@kalvinnchau I compiled Spark with Hadoop 2.6, I'm running on a DC/OS cluster with Mesos 1.4.0

@SparkQA

SparkQA commented Nov 6, 2017

Test build #83493 has finished for PR 19272 at commit 4558cea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Author

ArtRand commented Nov 6, 2017

Hello @vanzin thanks for continuing to help with this. Please take another look at this refactor.

In this change, there is one place to interact with hadoopDelegationTokens from CoarseGrainedSchedulerBackend: initializeHadoopDelegationTokens. This method contains the logic for initializing the tokens and setting a token renewer. It's also now resource-manager specific. This seems cleaner than having a HadoopDelegationTokenManager in CoarseGrainedSchedulerBackend, because any "token management" will always want to wrap HadoopDelegationTokenManager so you can keep all the necessary information in one place. Of course, happy to discuss further.

Contributor
@vanzin left a comment

I still don't like the API. There are just too many touch points between the different classes in your patch, and non-trivial initialization order requirements. That makes the code brittle and hard to modify later.

s"${delegationTokenProviders.keys.mkString(", ")}.")

/** Construct a [[HadoopDelegationTokenManager]] for the default Hadoop filesystem */
/**
Contributor

This is not really changing anything, so I'd just revert changes to this file. Or, if you really want to, just keep the new @params you're adding below.

// Hadoop delegation tokens to be sent to the executors.
val hadoopDelegationCreds: Option[Array[Byte]] = getHadoopDelegationCreds()
// Hadoop delegation tokens to be sent to the executors, can be updated as necessary.
protected var hadoopDelegationTokens: Option[Array[Byte]] = initializeHadoopDelegationTokens()
Contributor

Why is this protected? There's no reason I can see for subclasses to need access to this field.


// check that the credentials are defined, even though it's likely that auth would have failed
// already if you've made it this far, then start the token renewer
if (hadoopDelegationTokens.isDefined) {
Contributor

You shouldn't do this here, otherwise you need to keep that field protected in the parent class and that adds unnecessary coupling. Instead, do this in initializeHadoopDelegationTokens.

Author
@ArtRand Nov 8, 2017

I agree that I shouldn't need to use the conditional hadoopDelegationTokens.isDefined, however there will need to be some check (UserGroupInformation.isSecurityEnabled or similar) to pass the driverEndpoint to the renewer/manager here. When the initial tokens are generated driverEndpoint is still None because start() hasn't been called yet. So I could schedule the renewal, but I'll still have to at least update the driverEndpoint here.

I could initialize the driverEndpoint in initializeHadoopDelegationTokens for Mesos and change around the logic in start() (for the MesosCoarseGrainedSchedulerBackend) but then you're just switching one conditional for another...

Author

I may have spoken too soon; there might be a way...

Contributor

You could call initializeHadoopDelegationTokens in start after everything that's needed is initialized. It would also better follow the scheduler's lifecycle.

Author
@ArtRand Nov 8, 2017

Check out the patch now. hadoopDelegationTokens now calls initializeHadoopDelegationTokens (renamed fetchHadoopDelegationTokens) by name:

  private val hadoopDelegationTokens: () => Option[Array[Byte]] = fetchHadoopDelegationTokens

where

  override def fetchHadoopDelegationTokens(): Option[Array[Byte]] = {
    if (UserGroupInformation.isSecurityEnabled) {
      Some(hadoopDelegationTokenManager.getTokens())
    } else {
      None
    }
  }

This has the effect of only generating the first set of delegation tokens once the first RetrieveSparkAppConfig message is received. At this point, everything has been initialized because the renewer (renamed MesosHadoopDelegationTokenManager) is evaluated lazily with the correct driverEndpoint.

  private lazy val hadoopDelegationTokenManager: MesosHadoopDelegationTokenManager =
    new MesosHadoopDelegationTokenManager(conf, sc.hadoopConfiguration, driverEndpoint)

It's maybe a bit confusing to just avoid an extra conditional. WDYT?

* serialized) are broadcast to all running executors. On the executor side, when new Tokens are
* received they overwrite the current credentials.
*/
class MesosCredentialRenewer(
Contributor

private[spark]


private val (secretFile, mode) = getSecretFile(conf)

var (tokens: Array[Byte], timeOfNextRenewal: Long) = {
Contributor

private?

(secretFile, mode)
}

def scheduleTokenRenewal(driverEndpoint: RpcEndpointRef): Unit = {
Contributor

Why isn't this done in the constructor? There's a single call to this method, and the renewal interval could very easily be turned into a constructor arg.

Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))
private lazy val hadoopCredentialRenewer: MesosCredentialRenewer =
new MesosCredentialRenewer(
conf, new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))
Contributor

Why pass in a HadoopDelegationTokenManager if it's not used by this class? The renewer can create one itself.


override def initializeHadoopDelegationTokens(): Option[Array[Byte]] = {
if (UserGroupInformation.isSecurityEnabled) {
Some(hadoopCredentialRenewer.tokens)
Contributor
@vanzin Nov 6, 2017

So, seems to me that your "renewer" is doing more than just renewing tokens; it's also being used to generate the initial set. So aside from my comments about initializing the renewer here, you should also probably make this API a little cleaner. Right now there's too much coupling.

The renewer should do renewals only, otherwise it should be called something different.

@SparkQA

SparkQA commented Nov 8, 2017

Test build #83607 has finished for PR 19272 at commit 5f254e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor
@vanzin left a comment

Ok, the API looks a lot better. Still a few things to take care of.

}.start()

case UpdateDelegationTokens(tokenBytes) =>
SparkHadoopUtil.get.addDelegationTokens(tokenBytes, env.conf)
Contributor

Can you add a logInfo saying the tokens are being updated? This has always been helpful when debugging issues with this feature on YARN.

}

case UpdateDelegationTokens(newDelegationTokens) =>
// Update the driver's delegation tokens in case new executors are added later.
Contributor

Stale comment?

sparkProperties,
SparkEnv.get.securityManager.getIOEncryptionKey(),
hadoopDelegationCreds)
hadoopDelegationTokens.apply())
Contributor

Can't you just call fetchHadoopDelegationTokens() directly?

* received they overwrite the current credentials.
*/
private[spark] class MesosHadoopDelegationTokenManager(
conf: SparkConf, hadoopConfig: Configuration,
Contributor

One arg per line.

}

if (principal == null) {
logInfo(s"Using mode: $mode to retrieve Hadoop delegation tokens")
Contributor

You should probably assert that mode is "tgt" in this case.

} catch {
case e: Exception =>
throw new IllegalStateException("Failed to initialize Hadoop delegation tokens\n" +
s"\tPricipal: $principal\n\tmode: $mode\n\tsecret file $secretFile\n\tException: $e")
Contributor

Use e as the cause of the exception you're throwing.

private def getSecretFile(conf: SparkConf): (String, String) = {
val keytab = conf.get(config.KEYTAB).orNull
val tgt = conf.getenv("KRB5CCNAME")
require(keytab != null || tgt != null, "A keytab or TGT required.")
Contributor

Is that really the case? KRB5CCNAME is not a required env variable. It has a default value, and the UGI class will use the credentials from the default location if they're available (and reloading the cache periodically).

So I think you don't really need this, but just to track whether there's a principal and keytab. And you don't need to call getUGIFromTicketCache later on since I'm pretty sure UGI takes care of that for you.


val tempCreds = ugi.getCredentials
val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
var nextRenewalTime = Long.MaxValue
Contributor

val nextRenewalTime = ugi.doAs(new PrivilegedExceptionAction[Long] { ... }
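Spelled out, that suggestion might look roughly like this (a sketch; tokenManager stands in for the class's HadoopDelegationTokenManager, whose obtainDelegationTokens is assumed to populate the credentials and return the earliest next-renewal time):

    val nextRenewalTime = ugi.doAs(new PrivilegedExceptionAction[Long] {
      override def run(): Long = {
        val tempCreds = ugi.getCredentials
        // Obtain fresh tokens into tempCreds and return the earliest renewal time.
        tokenManager.obtainDelegationTokens(hadoopConf, tempCreds)
      }
    })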

}

private def broadcastDelegationTokens(tokens: Array[Byte]) = {
logDebug("Sending new tokens to all executors")
Contributor

I'd make this logInfo (similar message in YARN code has helped me a lot).


private def broadcastDelegationTokens(tokens: Array[Byte]) = {
logDebug("Sending new tokens to all executors")
if (driverEndpoint == null) {
Contributor

Make this a require in the constructor?

@SparkQA

SparkQA commented Nov 10, 2017

Test build #83660 has finished for PR 19272 at commit 8df7e37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 10, 2017

Test build #83661 has finished for PR 19272 at commit 45b46ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Author

ArtRand commented Nov 13, 2017

Hello @vanzin thanks for the continued help with this, anything else needed?

@maverick2202

We want to use this feature, and it would be great if this can be merged. Any idea which Spark release will have it?

Author

ArtRand commented Nov 13, 2017

Hello @maverick2202, hopefully 2.3 (and maybe backported?) but that's up to the Committers.

UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, secretFile.get)
} else {
// if the ticket cache is not explicitly defined, use the default
if (secretFile.isEmpty) {
Contributor

The point I was trying to make is that you do not need any special handling for TGT. The UGI class already does everything you need, you just need to get the current user. It will keep the TGT updated with any changes that happen on disk. You don't need to handle KRB5CCNAME anywhere, because UGI should be doing that for you. If it's not, you need to explain why you need this special handling, because the expected behavior is for this to work without you needing to do anything.

So you can simplify this class by only handling the principal / keytab case, and just using UserGroupInformation.getCurrentUser in the other case. You don't need to keep track of the "mode" or anything else, just whether you're using a principal / keytab pair.
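In code, the simplification being suggested is roughly the following (a sketch only; principal and keytabPath would come from the Spark config, and note the follow-up below where getCurrentUser turned out to return stale credentials in practice):

    val ugi = if (principal != null) {
      // Keytab case: an explicit login is needed so fresh delegation tokens can be created.
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)
    } else {
      // Ticket-cache case: UGI already tracks (and reloads) the default ticket cache,
      // so no special KRB5CCNAME handling should be required.
      UserGroupInformation.getCurrentUser
    }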

Author

How do the executors get the TGT though? KRB5CCNAME would need to be set in the executor containers as well, right? If it is, I suppose you don't need to broadcast the delegation tokens at all because runAsSparkUser takes care of it for you?

I guess now that #19437 has been merged we can disseminate the TGT through Mesos.

Contributor

How do the executors get the TGT though?

They don't. That's why you're creating delegation tokens and sending them to the executor.

Author

Ah gotcha. I misunderstood your previous comment to mean that you wouldn’t need to renew the tokens when using the ticket cache. I’ll simplify the logic. Thanks.

Author
@ArtRand Nov 14, 2017

So getCurrentUser actually doesn't work. I believe for the reason mentioned here. The same expired credentials are returned (causing the renewer to loop).

I understand that optionally using a TGT instead of a keytab is different from the YARN reference implementation, and it's unusual to use it in this case since it expires anyway. Do you think it would be better to avoid the ticket renewal logic altogether when using a TGT, or to keep the older UserGroupInformation.getUGIFromTicketCache-based method?

Contributor

Do you think it would be better to avoid the ticket renewal logic all together when using a TGT

If getCurrentUser does not work, then yes, that's probably the best way forward. Requiring the user to set KRB5CCNAME is not really a good way to go about this. Also because in that case the user still has to make sure the TGT is updated through other means.

Author

Agreed, thanks.

Author

ArtRand commented Nov 14, 2017

@vanzin PTAL.
I removed the awkward mode parameter from the token manager. Now we only start the renewer thread when using a keytab/principal. The condition is logged appropriately.

@SparkQA

SparkQA commented Nov 15, 2017

Test build #83863 has finished for PR 19272 at commit 18d77ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Add or overwrite current user's credentials with serialized delegation tokens,
* also confirms correct hadoop configuration is set.
*/
def addDelegationTokens(tokens: Array[Byte], sparkConf: SparkConf) {
Contributor

Always forget this class is public. Add private[spark].

* @param fraction fraction of the time until expiration
* @return Date when the fraction of the time until expiration has passed
*/
def getDateOfNextUpdate(expirationDate: Long, fraction: Double): Long = {
Contributor

Add private[spark].
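
For reference, a body consistent with the doc comment above would be something like this (a sketch, not necessarily the exact implementation):

    private[spark] def getDateOfNextUpdate(expirationDate: Long, fraction: Double): Long = {
      val currentTime = System.currentTimeMillis()
      // Schedule the next update once `fraction` of the remaining lifetime has elapsed.
      currentTime + (fraction * (expirationDate - currentTime)).toLong
    }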

}

def getTokens(): Array[Byte] = {
tokens
Contributor

tokens is never updated, so fetchHadoopDelegationTokens() will always return the initial set even after it's expired.

Author
@ArtRand Nov 15, 2017

Thanks for catching this, tokens are now updated for late-joining executors. https://github.com/apache/spark/pull/19272/files#diff-765ac3c4db227cd2c5d796f00794016fR145

private def getNewDelegationTokens(): Array[Byte] = {
logInfo(s"Attempting to login to KDC with principal ${principal}")
// Get new delegation tokens by logging in with a new UGI
// inspired by AMCredentialRenewer.scala:L174.
Contributor

nit: mentioning line numbers will make this stale very quickly.

@SparkQA

SparkQA commented Nov 15, 2017

Test build #83912 has finished for PR 19272 at commit 049e4b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

vanzin commented Nov 15, 2017

Merging to master.

asfgit closed this in 1e82335 Nov 15, 2017
Author

ArtRand commented Nov 16, 2017

@vanzin Thanks for the reviews and mentorship!

susanxhuynh pushed a commit to d2iq-archive/spark that referenced this pull request Nov 16, 2017
ArtRand pushed a commit to d2iq-archive/spark that referenced this pull request Nov 18, 2017 (#17)
susanxhuynh pushed a commit to d2iq-archive/spark that referenced this pull request Jan 14, 2018

(Each commit carries the PR description above. Closes apache#19272 from ArtRand/spark-21842-450-kerberos-ticket-renewal.)