Skip to content

Conversation

@gedeh
Copy link

@gedeh gedeh commented Aug 27, 2024

What changes were proposed in this pull request?

Uptake --bind-address parameter in YarnCoarseGrainedExecutorBackend when launching new container in Yarn cluster. This PR also ensure YarnAllocator uses default hostname when its not configured.

Why are the changes needed?

We've came across issue with Spark running on Yarn in Istio enabled Kubernetes cluster. Previous PR #32633 is not merged because Spark 2.4 was EOL and 3.x branch didnt get enough traction.

Does this PR introduce any user-facing change?

Yes, new config specifically for Yarn cluster mode is added and relevant doc is updated.

How was this patch tested?

Tested in Kubenetes with Istio and added tests to YarnAllocatorSuite

Thanks!

@gedeh
Copy link
Author

gedeh commented Aug 27, 2024

I have previous pull request #42870 and it was closed due to inactivity

@gedeh
Copy link
Author

gedeh commented Aug 27, 2024

Hi @srowen @tokoko my previous PR was closed and I believe only missing small adjustment to the doc for Yarn. Appreciate if you can take a look again this one. Thanks!

@dongjoon-hyun
Copy link
Member

cc @mridulm and @tgravescs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems odd to me. The only thing this can really be set to in a multinode system is localhost or 0.0.0.0, right? Otherwise its not like you can tell it to be node1.foo.com, node2.foo.com or like individual ips right? At least not by the user directly.

Copy link
Author

@gedeh gedeh Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this, yes theoretically it should only works with those 2 options you mentioned. For yarn running in k8s with Istio use case, executor needs to bind to 0.0.0.0. See detailed diagram
image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the config to accept either HOSTNAME or ALL_IPS. Thanks for your feedback

@mridulm
Copy link
Contributor

mridulm commented Sep 12, 2024

I am not sure I understood this PR - how will user specify the executor IP to use for binding address ?
Spark executors will run on any host in the cluster - and unless there is only a single host, this wont work.

If this is indeed for such a case - this is not something we should introduce into Apache Spark.

@tokoko
Copy link

tokoko commented Sep 12, 2024

Can't speak for the author's use case, but we needed this because we had a multi-home networking inside our yarn cluster, infiniband for internal and ethernet for external. I haven't been able to track how/why, but apparently the default behavior of executor bind address changed somewhere along the line between spark 2 and spark 3. Spark 2 used to always bind to 0.0.0.0, Spark 3 would first find it's own ip address basically by "pinging" it's own hostname and bind to whatever that returned. In our case that was an internal ip, hence client mode stopped working when we transitioned from Spark 2 to Spark 3 as driver was no longer able to reach executors.

@gedeh
Copy link
Author

gedeh commented Nov 6, 2024

I am not sure I understood this PR - how will user specify the executor IP to use for binding address ? Spark executors will run on any host in the cluster - and unless there is only a single host, this wont work.

If this is indeed for such a case - this is not something we should introduce into Apache Spark.

Sorry was away for a while. @tokoko use case is one of them, my use case and I've been seeing in the wild is running Spark yarn in k8s cluster with Istio. Same exact symptomps as explained

And true, maybe the configuration should be boolean instead of free form IP/address? Sorry I was just follwing existing configuration available in Spark. The possible option likely 0.0.0.0, localhost, or executor's hostname

@gedeh
Copy link
Author

gedeh commented Nov 6, 2024

LInking previous comment as context #42870 (comment)

@gedeh gedeh force-pushed the yarn-executor-bind-address branch from a063fdb to dcad398 Compare November 8, 2024 09:53
@gedeh
Copy link
Author

gedeh commented Nov 26, 2024

Hi @dongjoon-hyun @mridulm @tgravescs @tokoko apologies to call out, just to catch up after few weeks, wondering is there anything else remaining to address in this PR from your side? Thank you in advance

@github-actions
Copy link

github-actions bot commented Mar 7, 2025

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 7, 2025
@github-actions github-actions bot closed this Mar 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants