[SPARK-45278][YARN] Support executor bind address in Yarn executors #47892

gedeh · 2024-08-27T14:00:12Z

What changes were proposed in this pull request?

Pass the --bind-address parameter to YarnCoarseGrainedExecutorBackend when launching a new container in a Yarn cluster. This PR also ensures that YarnAllocator uses the default hostname when the bind address is not configured.

Why are the changes needed?

We've come across an issue with Spark running on Yarn in an Istio-enabled Kubernetes cluster. The previous PR #32633 was not merged because Spark 2.4 was EOL and the 3.x branch didn't get enough traction.

Does this PR introduce any user-facing change?

Yes, a new config specific to Yarn cluster mode is added and the relevant doc is updated.

How was this patch tested?

Tested in Kubernetes with Istio, and added tests to YarnAllocatorSuite.

Thanks!

gedeh · 2024-08-27T14:01:29Z

I have a previous pull request, #42870, which was closed due to inactivity.

gedeh · 2024-08-27T19:43:49Z

Hi @srowen @tokoko, my previous PR was closed, and I believe it was only missing a small adjustment to the doc for Yarn. I'd appreciate it if you could take another look at this one. Thanks!

dongjoon-hyun · 2024-09-11T15:04:26Z

cc @mridulm and @tgravescs

tgravescs · 2024-09-11T15:38:14Z

docs/running-on-yarn.md

This seems odd to me. The only thing this can really be set to in a multi-node system is localhost or 0.0.0.0, right? Otherwise it's not like you can tell it to be node1.foo.com, node2.foo.com, or individual IPs, right? At least not by the user directly.

gedeh · 2024-11-06

Thanks for flagging this. Yes, theoretically it should only work with the two options you mentioned. For the use case of Yarn running in k8s with Istio, the executor needs to bind to 0.0.0.0. See detailed diagram

gedeh · 2024-11-08

I've updated the config to accept either HOSTNAME or ALL_IPS. Thanks for your feedback.
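For readers following the thread, here is a minimal sketch of how a two-valued setting like this could map to a bind address. It is not the code from this PR: the function name `resolve_bind_address` and the fallback to the hostname when nothing is configured are assumptions based only on the description and comments above.

```python
import socket

# The two accepted values named in the comment above.
ALLOWED_MODES = ("HOSTNAME", "ALL_IPS")


def resolve_bind_address(mode=None, hostname=None):
    """Map a bind mode to the address an executor would bind to.

    Hypothetical helper: an unset mode falls back to the hostname,
    mirroring the "default hostname when not configured" behavior
    described in the PR summary.
    """
    if hostname is None:
        hostname = socket.gethostname()
    if mode is None or mode == "HOSTNAME":
        return hostname
    if mode == "ALL_IPS":
        # Wildcard address: listen on every local interface.
        return "0.0.0.0"
    raise ValueError(
        f"Unsupported bind mode {mode!r}; expected one of {ALLOWED_MODES}"
    )
```

Restricting the value to two names sidesteps the objection raised in review: a free-form address cannot be correct for every host in a multi-node cluster, whereas both of these values are meaningful on any node.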

mridulm · 2024-09-12T05:23:44Z

I am not sure I understood this PR - how will the user specify the executor IP to use for the bind address?
Spark executors will run on any host in the cluster - and unless there is only a single host, this won't work.

If this is indeed for such a case - this is not something we should introduce into Apache Spark.

tokoko · 2024-09-12T06:21:22Z

Can't speak for the author's use case, but we needed this because we had multi-homed networking inside our Yarn cluster: InfiniBand for internal traffic and Ethernet for external. I haven't been able to track down how or why, but apparently the default behavior of the executor bind address changed somewhere along the line between Spark 2 and Spark 3. Spark 2 used to always bind to 0.0.0.0; Spark 3 would first find its own IP address, basically by "pinging" its own hostname, and bind to whatever that returned. In our case that was an internal IP, hence client mode stopped working when we transitioned from Spark 2 to Spark 3, as the driver was no longer able to reach the executors.
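The difference between the two binding behaviors described above can be sketched with plain sockets. This is an illustration only, not Spark code; the helper names `listen_on` and `can_connect` are hypothetical, and the loopback address stands in for "one specific interface" so the sketch runs on a single machine.

```python
import socket


def listen_on(bind_address):
    """Open a TCP listener on an OS-assigned port at the given address."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_address, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


def can_connect(address, port, timeout=1.0):
    """Return True if a TCP connection to (address, port) succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Bound to the wildcard address: reachable through any local interface.
    wildcard = listen_on("0.0.0.0")
    print(can_connect("127.0.0.1", wildcard.getsockname()[1]))
    wildcard.close()

    # Bound to one address only: clients must target exactly that address.
    single = listen_on("127.0.0.1")
    print(single.getsockname()[0])
    single.close()
```

On a multi-homed host, a listener bound only to the address that the hostname resolves to is unreachable from a network attached to a different interface, which matches the symptom reported above.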

gedeh · 2024-11-06T16:20:10Z

I am not sure I understood this PR - how will the user specify the executor IP to use for the bind address? Spark executors will run on any host in the cluster - and unless there is only a single host, this won't work.

If this is indeed for such a case - this is not something we should introduce into Apache Spark.

Sorry, I was away for a while. @tokoko's use case is one of them; my use case, which I've also been seeing in the wild, is running Spark on Yarn in a k8s cluster with Istio. Exactly the same symptoms as explained.

And true, maybe the configuration should be a boolean instead of a free-form IP/address? Sorry, I was just following the existing configuration available in Spark. The possible options are likely 0.0.0.0, localhost, or the executor's hostname.

gedeh · 2024-11-06T21:22:56Z

Linking previous comment as context: #42870 (comment)

gedeh · 2024-11-26T20:28:50Z

Hi @dongjoon-hyun @mridulm @tgravescs @tokoko, apologies for the call-out. Just catching up after a few weeks: is there anything else remaining to address in this PR from your side? Thank you in advance.

github-actions · 2025-03-07T00:25:42Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the YARN and DOCS labels Aug 27, 2024

gedeh mentioned this pull request Aug 27, 2024

[SPARK-45278] [YARN] Allow configuring Yarn executor bind address in Yarn #42870

Closed

gedeh mentioned this pull request Aug 27, 2024

[SPARK-45278][YARN] Support executor bind address in Yarn executors for Spark 3.5 #47896

Closed

tgravescs reviewed Sep 11, 2024

Hendra Saputra added 3 commits November 8, 2024 09:49

[SPARK-45278][YARN] Support executor bind address in Yarn executors

5a51833

[SPARK-45278][YARN] Remove unrelated changes for doc

bbf8d7f

[SPARK-45278][YARN] Allow limited configuration option to either HOSTNAME or ALL_IPS

dcad398

gedeh force-pushed the yarn-executor-bind-address branch from a063fdb to dcad398 Compare November 8, 2024 09:53

Merge branch 'apache:master' into yarn-executor-bind-address

2d01329

github-actions bot added the Stale label Mar 7, 2025

github-actions bot closed this Mar 8, 2025

5 participants
