[SPARK-17984][YARN][Mesos][Deploy][WIP] add executor launch prefix support #16411

xwu-intel · 2016-12-27T01:05:12Z

What changes were proposed in this pull request?

Completes the Standalone, Yarn, and Mesos support started in #15579 and adapts it to the latest master branch.

How was this patch tested?

Standalone and Yarn modes were tested on a Spark cluster with 1 master + 2 slaves. Mesos mode is not tested due to lack of resources.

How to use

Global Environment Variable
export SPARK_EXECUTOR_LAUNCH_PREFIX="/tmp/spark-numa-example.sh"
Config Files
- Standalone mode: add SPARK_EXECUTOR_LAUNCH_PREFIX="/tmp/spark-numa-example.sh" in conf/spark-env.sh
- Yarn client mode: add spark.yarn.appMasterEnv.SPARK_EXECUTOR_LAUNCH_PREFIX="/tmp/spark-numa-example.sh" in conf/spark-defaults.conf

Attached is an example script that adds an executor launch prefix to enable NUMA-aware binding for executors. The same approach applies to other launch prefixes such as strace, vtune, etc.

spark-numa-example.zip
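For readers without the attachment, a wrapper of this kind might look roughly like the sketch below. It is not the attached script. It assumes the original executor launch command reaches the wrapper as its arguments, and `NUMA_NODE` and `DRY_RUN` are hypothetical variables invented for illustration.

```shell
#!/bin/sh
# Hypothetical launch-prefix wrapper; NOT the script attached to this PR.
# Assumption: the original executor launch command arrives as "$@".
# NUMA_NODE and DRY_RUN are illustrative variables invented for this sketch.

run_prefixed() {
  node="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, for inspection.
    echo numactl "--cpunodebind=${node}" "--membind=${node}" "$@"
    return 0
  fi
  # exec replaces this shell, so the executor itself becomes the bound process.
  exec numactl "--cpunodebind=${node}" "--membind=${node}" "$@"
}

# Only launch when the script is actually given a command to wrap.
if [ "$#" -gt 0 ]; then
  run_prefixed "${NUMA_NODE:-0}" "$@"
fi
```

Swapping `numactl ...` for another tool such as `strace -f` would follow the same pattern: the wrapper must still end by running the command it was handed, otherwise no executor starts.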

srowen · 2016-12-27T20:49:39Z

Why is this different from #16411 ?

xwu-intel · 2016-12-28T00:42:41Z

@srowen do you mean #15579?

- Fixes some character-escaping issues in #15579 that occur when the command string contains special characters such as '
- Adds Standalone and Mesos support, since #15579 only supports Yarn mode

srowen · 2016-12-28T10:26:01Z

Oops yes wrong copy/paste.
We have two overlapping PRs then. It'd be better to collaborate on one. If the other is inactive, can you close it, @sheepduke? That said, I am not sure we're going to merge this anyway.

xwu-intel · 2016-12-29T02:49:22Z

@srowen sheepduke is my colleague who just left. I am continuing to refine his work. It's OK to close #15579 . @sheepduke

tgravescs · 2017-01-03T16:26:31Z

I'm a little worried this is very open-ended and could cause a lot of issues with users using it wrong. This opens up customers to basically do anything they want while launching an executor, even not launching an executor at all, since this really replaces the normal executor launch command with this script. It relies on the customer's script to actually launch the executor based on the command passed in.

If this goes in, it definitely needs a much better explanation and docs on how to properly write and use it. I would rather see it be more of a true pre-init script than a total replacement. Perhaps the Spark executor launch command is a script that prepends the user's commands but then makes sure it still calls the normal java executor launch command.
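The pre-init idea in the paragraph above could be sketched roughly as follows. This is a hypothetical illustration, not part of the patch: the function name `run_with_pre_init` and the hook argument are assumptions. The point is that the framework, not the user's script, stays responsible for running the original command.

```shell
#!/bin/sh
# Sketch of the "pre-init, then always launch" idea; hypothetical, not part of the patch.
# Usage: run_with_pre_init <hook-path-or-empty> <original command...>

run_with_pre_init() {
  hook="$1"
  shift
  if [ -n "$hook" ] && [ -x "$hook" ]; then
    # A failing hook is reported but does not block the executor launch.
    "$hook" || echo "pre-init hook failed: $hook" >&2
  fi
  # The original command is always run here; a real launcher would
  # likely use `exec "$@"` so no extra shell process lingers.
  "$@"
}
```

Under this design a broken user hook can no longer prevent the executor from starting, which is the main risk raised above. The trade-off is that a hook run as a separate process cannot wrap the executor with tools such as numactl or strace, which need to be the parent command.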

Also what about yarn cluster mode?

Do you have test results from configuring NUMA that show definite improvements? How does this compare to Automatic NUMA balancing, which I believe is on by default in RHEL 7? I realize perhaps most machines aren't running RHEL 7 yet, but I'm wondering if it was tried.

Does numactl require special privileges (like root) to do certain operations?

The script looks very basic, which I understand is fine for an example, but it seems like there are definitely things missing and things people could get wrong.
For instance, how do you handle multiple containers on a node? How does this work when you specify an executor to have X cores?

Note I haven't done any tuning of numa myself so sorry if some of these questions seem obvious.
How do processes with NUMA configured interact with processes without it? It seems like tuning things right could be quite hard, especially when running on something like Yarn where other applications aren't using the same logic to configure NUMA.

xwu-intel · 2017-01-09T06:00:12Z

@tgravescs Thanks for your comments. There are two things we have tried.

- Adding a prefix command on executor launch: I agree this opens the door for users to do anything when launching the executor. This patch is intended for profiling and debugging and may not be fit for production. I am not sure it's the best form of implementation, but it fixed our problem quickly.
- NUMA: The attached NUMA script is only an example to show how to use this patch. Users can customize it to fulfill their specific needs. Automatic NUMA balancing is enabled by default on our system. As mentioned in the original Redhat slides, it can only deal with certain cases and still cannot beat manual pinning. From our experiments, not all cases have big NUMA penalties. We should use platform tools such as Intel VTune to identify whether there is a NUMA problem and tune case by case.
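As a quick way to check the automatic NUMA balancing setting mentioned above, a small helper along these lines may be useful. It is a sketch that assumes a Linux kernel exposing `/proc/sys/kernel/numa_balancing`; the function name is hypothetical.

```shell
#!/bin/sh
# Report whether automatic NUMA balancing appears to be on.
# Assumption: Linux exposes the setting at /proc/sys/kernel/numa_balancing;
# the optional argument overrides the path (useful for testing).

numa_balancing_status() {
  f="${1:-/proc/sys/kernel/numa_balancing}"
  if [ ! -r "$f" ]; then
    # Kernel built without NUMA balancing, or not Linux at all.
    echo unknown
    return 0
  fi
  case "$(cat "$f")" in
    0) echo disabled ;;
    *) echo enabled ;;
  esac
}
```

Running `numactl --hardware` alongside this shows the node topology, which helps when deciding whether manual pinning is even applicable on a given machine.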

AmplabJenkins · 2018-06-09T00:24:14Z

Can one of the admins verify this patch?

add executor launch prefix support

62c8552

xwu-intel mentioned this pull request Dec 29, 2016

[SPARK-17984][YARN][Mesos][Deploy][WIP] Added support for executor launch with leading arguments #15579

Closed

Sync upstream

a637c6a

srowen mentioned this pull request Aug 20, 2018

[BUILD] Close stale PRs #22159

Closed

asfgit closed this in b8788b3 Aug 21, 2018
