
Conversation

sunchao (Member) commented Nov 17, 2021

What changes were proposed in this pull request?

This adds a new config `spark.yarn.am.tokenConfRegex`, similar to `mapreduce.job.send-token-conf` introduced via YARN-5910. It lets the YARN AM pass Hadoop configs, such as `dfs.nameservices`, `dfs.ha.namenodes.*`, and `dfs.namenode.rpc-address.*`, to the RM for renewing delegation tokens.

Why are the changes needed?

YARN-5910 introduced a new config `mapreduce.job.send-token-conf`, which can be used to pass a job's local configuration to the RM, which then uses it when renewing delegation tokens. A typical use case is a YARN cluster that needs to talk to multiple HDFS clusters, where the RM may not have all the configs (e.g., `dfs.nameservices`, `dfs.ha.namenodes.<nameservice>.*`, `dfs.namenode.rpc-address.*`) needed to connect to those clusters when renewing delegation tokens. In this case, clients can use this feature to pass their local HDFS configs to the RM.

Does this PR introduce any user-facing change?

Yes, this introduces a new config, `spark.yarn.am.tokenConfRegex`, to Spark users. It is disabled by default.

How was this patch tested?

It seems difficult to write a unit test for this. I manually tested it against a YARN cluster running Hadoop 3.x and it worked as expected.

```
$SPARK_HOME/bin/spark-shell --master yarn \
            --deploy-mode client \
            --conf spark.driver.extraClassPath="${HADOOP_CONF_DIR}" \
            --conf spark.executor.extraClassPath="${HADOOP_CONF_DIR}" \
            --conf spark.yarn.am.tokenConfRegex="^dfs.nameservices$|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|^dfs.namenode.kerberos.principal|^dfs.namenode.kerberos.principal.pattern" \
            --conf spark.yarn.access.hadoopFileSystems="<HDFS_URI>"
```
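For illustration, the key-filtering this config drives can be sketched in Python (the sample config keys and the `select_token_conf` helper are hypothetical; the actual implementation lives in Spark's YARN client, in Scala):

```python
import re

# The regex from the spark-shell invocation above.
TOKEN_CONF_REGEX = (
    r"^dfs.nameservices$|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$"
    r"|^dfs.client.failover.proxy.provider.*$|^dfs.namenode.kerberos.principal"
    r"|^dfs.namenode.kerberos.principal.pattern"
)

def select_token_conf(conf: dict, pattern: str) -> dict:
    """Keep only entries whose key matches the regex; conceptually, these
    are the configs the AM would ship to the RM for token renewal."""
    rx = re.compile(pattern)
    return {k: v for k, v in conf.items() if rx.search(k)}

# Hypothetical client-side Hadoop configuration.
local_conf = {
    "dfs.nameservices": "ns1",
    "dfs.ha.namenodes.ns1": "nn1,nn2",
    "dfs.namenode.rpc-address.ns1.nn1": "nn1.example.com:8020",
    "dfs.replication": "3",  # not needed by the RM; filtered out
}

print(select_token_conf(local_conf, TOKEN_CONF_REGEX))
```

Only the nameservice, HA, and RPC-address keys survive the filter; unrelated client settings like `dfs.replication` are not sent.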

github-actions bot added the YARN label Nov 17, 2021
sunchao (Member, Author) commented Nov 17, 2021

cc @gaborgsomogyi @xkrogen

SparkQA commented Nov 17, 2021

Test build #145343 has finished for PR 34635 at commit b2411ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 17, 2021

Test build #145345 has finished for PR 34635 at commit 8c6e5b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49814/

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49816/

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49814/

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49816/

"'mapreduce.job.send-token-conf'. Please check YARN-5910 for more details.")
.version("3.3.0")
.stringConf
.createWithDefault("")
Member:
Since this is a regex expression, what does empty string regex mean here as a default value?

Member:
If it's not clear, shall we use .createOptional?

Member Author:
Good suggestion. I think createOptional is better.
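As a quick aside on why the empty-string default was ambiguous: in most regex engines an empty pattern matches (a zero-width prefix of) every string, so `""` could plausibly be read as "send everything" rather than "disabled". A minimal Python check:

```python
import re

# An empty pattern matches any input at position 0, so an empty-string
# default is ambiguous; an absent (createOptional) config avoids this.
print(re.search("", "dfs.nameservices") is not None)  # True
print(re.search("", "") is not None)                  # True
```

Making the config optional sidesteps the question entirely: unset means the feature is off, no regex semantics involved.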

"needs to talk to multiple downstream HDFS clusters, where the YARN RM may not have " +
"configs (e.g., dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*)" +
"to connect to these clusters. This config is very similar to " +
"'mapreduce.job.send-token-conf'. Please check YARN-5910 for more details.")
Member:

We had better mention explicitly that this config is ignored in Hadoop 2.7, because we still ship a Hadoop 2.7 distribution.

Member Author:
Yea I missed that, added.

```
.createWithDefault(false)

private[spark] val AM_SEND_TOKEN_CONF =
ConfigBuilder("spark.yarn.am.sendTokenConf")
```
Member:

nit: `spark.yarn.am.tokenConf` instead of `spark.yarn.am.sendTokenConf`? `sendTokenConf` sounds like a boolean config (send or not send).

Member Author:

I wasn't sure what a good name would be, so I just followed the Hadoop-side config name. `send` here is supposed to mean that the token conf is sent from the AM to the RM.

mridulm (Contributor) Nov 19, 2021:

Use regexConf? Also, add .regex to the config name? (in addition to @dongjoon-hyun's suggestion for the rename).

Take a look at spark.redaction.string.regex for an example.

Member Author:
Thanks, I updated the config name to spark.yarn.am.tokenConfRegex. Let me know if this looks better.

```
}
}
copy.write(dob);
amContainer.setTokensConf(ByteBuffer.wrap(dob.getData))
```
Member:
Just a question. The compilation works with Hadoop 2.7, right?

Member Author:

Oops, it won't work for Hadoop 2.7. Hmm, let me think about how to make it work...

Member Author:

I guess we can follow ResourceRequestHelper and use reflection to look up the method when using Hadoop 3.x, to avoid the compilation error.
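The reflection approach here (resolve the method at runtime and skip the call when it is absent, so the same binary works against older Hadoop) is analogous to this Python sketch; the container classes and helper name are hypothetical stand-ins for the Java/Scala reflection lookup:

```python
def set_tokens_conf_if_supported(am_container, tokens_conf_bytes):
    """Call setTokensConf only when the runtime object provides it,
    mirroring a reflection-based lookup for the Hadoop 2.9+/3.x API."""
    setter = getattr(am_container, "setTokensConf", None)
    if setter is None:
        return False  # older API (e.g. Hadoop 2.7): skip quietly
    setter(tokens_conf_bytes)
    return True

class OldContainer:        # no setTokensConf, like Hadoop 2.7
    pass

class NewContainer:        # has setTokensConf, like Hadoop 2.9+/3.x
    def setTokensConf(self, data):
        self.tokens_conf = data

print(set_tokens_conf_if_supported(OldContainer(), b"conf"))  # False
print(set_tokens_conf_if_supported(NewContainer(), b"conf"))  # True
```

The key property is that the lookup failure is handled gracefully instead of failing at compile or link time, which is exactly why reflection is used on the JVM side.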

dongjoon-hyun (Member) left a comment:
Thank you, @sunchao . It looks reasonable.

Sorry, but I need to ask if you think we can add a test coverage for this.

sunchao (Member, Author) commented Nov 18, 2021

Sorry, but I need to ask if you think we can add a test coverage for this.

I mentioned this a bit in the PR description. It's pretty hard to come up with an e2e test for this, especially with Kerberos involved. I checked a few related PRs such as #31761 and #23525 and they also didn't come with tests.

SparkQA commented Nov 18, 2021

Test build #145369 has finished for PR 34635 at commit 37c430c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49841/

SparkQA commented Nov 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49841/

dongjoon-hyun (Member) left a comment:
+1, LGTM. I agree with you about the test case. Thanks, @sunchao .

```
}
copy.write(dob);

// since this method was added in Hadoop 2.9 and 3.0, we use reflection here to avoid
```
Contributor:

Are we doing this only for 3.x? If not, relax the isHadoop3 condition?

Member Author:

Yes, since Spark only ships built-in profiles for Hadoop 2.7 and 3.3, we have the check here. Do you mean also supporting custom Hadoop versions 2.9+ via -Phadoop.version=2.9.x?

Contributor:
Exactly - both 2.9 and 2.10 for example.

Member Author:
Got it. I added the change.
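Relaxing the check from "Hadoop 3.x only" to "Hadoop 2.9+" boils down to a version comparison like this sketch (the function name is illustrative, not Spark's actual helper):

```python
def supports_set_tokens_conf(hadoop_version: str) -> bool:
    """setTokensConf was added in Hadoop 2.9 and 3.0, so accept any
    version >= 2.9 rather than only 3.x."""
    major, minor = (int(p) for p in hadoop_version.split(".")[:2])
    return (major, minor) >= (2, 9)

for v in ["2.7.4", "2.9.2", "2.10.1", "3.3.1"]:
    print(v, supports_set_tokens_conf(v))
```

Comparing (major, minor) tuples rather than raw strings matters here: a string comparison would wrongly rank "2.10" below "2.9".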

Member Author:
Gently ping @mridulm . Does the latest change look good to you?

sunchao changed the title [SPARK-37205][YARN] Introduce a new config 'spark.yarn.am.sendTokenConf' to support renewing delegation tokens in a multi-cluster environment → [SPARK-37205][YARN] Introduce a new config 'spark.yarn.am.tokenConfRegex' to support renewing delegation tokens in a multi-cluster environment, Nov 19, 2021
SparkQA commented Nov 19, 2021

Test build #145464 has finished for PR 34635 at commit 8ff6be1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49936/

SparkQA commented Nov 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49936/

dongjoon-hyun (Member):

Could you review this once more please, @mridulm ?

SparkQA commented Nov 30, 2021

Test build #145752 has finished for PR 34635 at commit c69cb6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50224/

SparkQA commented Nov 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50223/

SparkQA commented Nov 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50223/

SparkQA commented Nov 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50224/

sunchao (Member, Author) commented Dec 8, 2021

@dongjoon-hyun could you help double-check the new changes and see if they look good to you? If so, I'm going to merge this soon. Thanks.

dongjoon-hyun (Member) left a comment:
+1, LGTM. New changes look good to me, @sunchao .

sunchao closed this in 77a8778 Dec 8, 2021
sunchao (Member, Author) commented Dec 8, 2021

Merged, thanks!

sunchao deleted the SPARK-37205 branch December 8, 2021 17:25
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…rn.am.tokenConfRegex' to support renewing delegation tokens in a multi-cluster environment (apache#1300)

This adds a new config `spark.yarn.am.tokenConfRegex` which is similar to `mapreduce.job.send-token-conf` introduced via [YARN-5910](https://issues.apache.org/jira/browse/YARN-5910). It is used for the YARN AM to pass Hadoop configs, such as `dfs.nameservices`, `dfs.ha.namenodes.*`, `dfs.namenode.rpc-address.*`, etc., to the RM for renewing delegation tokens.

[YARN-5910](https://issues.apache.org/jira/browse/YARN-5910) introduced a new config `mapreduce.job.send-token-conf`, which can be used to pass a job's local configuration to the RM, which then uses it when renewing delegation tokens. A typical use case is a YARN cluster that needs to talk to multiple HDFS clusters, where the RM may not have all the configs (e.g., `dfs.nameservices`, `dfs.ha.namenodes.<nameservice>.*`, `dfs.namenode.rpc-address.*`) needed to connect to those clusters when renewing delegation tokens. In this case, clients can use this feature to pass their local HDFS configs to the RM.

Yes, a new config `spark.yarn.am.tokenConfRegex` will be introduced to Spark users. By default it is disabled.

It seems difficult to come up with a unit test for this. I manually tested it against a YARN cluster with Hadoop version 3.x and it worked as expected.

```
$SPARK_HOME/bin/spark-shell --master yarn \
            --deploy-mode client \
            --conf spark.driver.extraClassPath="${HADOOP_CONF_DIR}" \
            --conf spark.executor.extraClassPath="${HADOOP_CONF_DIR}" \
            --conf spark.yarn.am.tokenConfRegex="^dfs.nameservices$|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|^dfs.namenode.kerberos.principal|^dfs.namenode.kerberos.principal.pattern" \
            --conf spark.yarn.access.hadoopFileSystems="<HDFS_URI>"
```

Closes apache#34635 from sunchao/SPARK-37205.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: Chao Sun <[email protected]>