
Conversation

@jongyoul
Member

  • Divided task and executor resources
  • Added spark.mesos.executor.cores and fixed docs

- Divided task and executor resources
- Added `spark.mesos.executor.cpus` and fixed docs
@jongyoul
Member Author

/cc @tnachen @pwendell This PR addresses @pwendell's TODO. Please review it.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25997 has started for PR 4170 at commit 71703c8.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25998 has started for PR 4170 at commit f655eee.

  • This patch merges cleanly.

Contributor

Spark code has been referring to cpus as cores, so I assume you want to name this spark.mesos.executor.cores.
I think we should reword this to something like:
The number of cores to request for running the Mesos executor.
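For reference, a minimal sketch of how the suggested setting could be used from application code; the key name follows the suggestion above and the values are purely illustrative:

```scala
import org.apache.spark.SparkConf

// Illustrative values only, not recommended defaults.
val conf = new SparkConf()
  .set("spark.mesos.executor.cores", "1") // cores reserved for the Mesos executor itself
  .set("spark.task.cpus", "1")            // cores requested per task (existing setting)
```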

Member Author

I agree. I was also confused between cpus and cores; I think cores is the more appropriate term. Thanks.

- changed term from `cpus` to `cores`
- Reworded docs
@SparkQA

SparkQA commented Jan 23, 2015

Test build #25999 has started for PR 4170 at commit 9054535.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25998 has finished for PR 4170 at commit f655eee.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25997 has finished for PR 4170 at commit 71703c8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25998/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25997/
Test FAILed.

- changed variable name from `executorCpus` to `executorCores`
- Fixed failed test case.
@SparkQA

SparkQA commented Jan 23, 2015

Test build #26003 has started for PR 4170 at commit a28b666.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #25999 has finished for PR 4170 at commit 9054535.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25999/
Test FAILed.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #26003 has finished for PR 4170 at commit a28b666.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26003/
Test PASSed.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #26015 has started for PR 4170 at commit d714e8b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #26015 has finished for PR 4170 at commit d714e8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26015/
Test PASSed.

@jongyoul
Member Author

/cc @mateiz Could you please review this PR, which is about offering resources to the executor and tasks?

@mateiz
Contributor

mateiz commented Jan 28, 2015

Sorry, what was the problem here? Executors do use memory; in fact, they use all of the memory of that JVM (the only reason we assign memory to tasks is that Mesos didn't support tasks with 0 memory).

@jongyoul
Member Author

@mateiz We agree that one executor running multiple tasks is the intended behaviour. In this situation, MesosScheduler offers CPUS_PER_TASK resources to the executor when we launch a separate task. If we launch two tasks on different slaves, we offer 4 * CPUS_PER_TASK (= 2 for the executors and 2 for the tasks) to run only two tasks. @pwendell thinks that's too much, and that one core is enough for the executor; in my PR, I make the executor's cores configurable. For memory, we just offered memory to the executor only. If we launch two tasks again, we offer 2 * calculateTotalMemory(sc) to all tasks; I think we offer two executor memories and two task memories. I agree that the executor uses memory by itself, but we should fix the amount of that value. If we complete two tasks on the same slave - same framework, same executor, same container - Mesos' UI shows only calculateTotalMemory for that framework.

@jongyoul
Member Author

Thus, I believe the executor should have its own cores and memory set on ExecutorInfo, and each task should have its own cores and memory set on TaskInfo when it is launched.
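As a rough sketch of that split (illustrative constants and helper, not the actual MesosSchedulerBackend code, though the builder calls follow the Mesos protobuf API):

```scala
import org.apache.mesos.Protos._

object ResourceSplitSketch {
  // Illustrative values; in the backend these would come from SparkConf.
  val CPUS_PER_TASK = 2.0
  val executorCores = 1.0
  val executorMemMb = 5 * 1024.0

  private def scalar(name: String, amount: Double): Resource =
    Resource.newBuilder()
      .setName(name)
      .setType(Value.Type.SCALAR)
      .setScalar(Value.Scalar.newBuilder().setValue(amount))
      .build()

  // Executor-level resources: its own cores plus the JVM memory, attached to ExecutorInfo.
  val executorResources = Seq(scalar("cpus", executorCores), scalar("mem", executorMemMb))

  // Task-level resources: only the per-task cores, attached to each TaskInfo.
  def taskResources = Seq(scalar("cpus", CPUS_PER_TASK))
}
```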

@jongyoul
Member Author

@mateiz That's a sample screenshot.
screen shot 2015-01-28 at 10 55 07 am

@mateiz
Contributor

mateiz commented Jan 28, 2015

Sorry, I'm not sure I understand about the memory. There should be the same amount of memory for each executor. More executors means more total memory. But the way that was being calculated before is fine, what's wrong with it?

For CPUs, I understand wanting to start executors off at 0 CPUs and just having 1 per task. I actually thought we did that initially, not sure why it's 1.

@mateiz
Contributor

mateiz commented Jan 28, 2015

Basically I don't understand what your patch is fixing. What's an example of config settings that gave the wrong allocation before, and what will they give now?

@jongyoul
Member Author

Sorry, I haven't shown you my configuration: 5G for SPARK_EXECUTOR_MEMORY and 5 for spark.task.cpus. In my screenshot, we launch two tasks on the same machine. Don't you think it would be right to offer the task memory twice? My PR gives correct resource-accounting information to Mesos' master. For CPUs, I don't know the proper value for the executor's cpus, only that it shouldn't be CPUS_PER_TASK. Please recommend a value.
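For concreteness, that setup expressed as Spark settings (the SparkConf form of the SPARK_EXECUTOR_MEMORY environment variable is shown for illustration):

```scala
import org.apache.spark.SparkConf

// The configuration described above: 5 GB per executor and 5 cores per task.
val conf = new SparkConf()
  .set("spark.executor.memory", "5g") // equivalent to SPARK_EXECUTOR_MEMORY=5G
  .set("spark.task.cpus", "5")
```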

@mateiz
Contributor

mateiz commented Jan 28, 2015

Right, as I said, it doesn't make sense to offer task memory twice. Each executor is a single JVM, and JVMs cannot scale their memory up and down. The executor's memory is set to the same value that we configure the JVM with via -Xmx. There's no way to make tasks use more memory than that, no matter how many tasks are running there.

@tnachen
Contributor

tnachen commented Jan 28, 2015

@jongyoul Sorry, I didn't get to finish reviewing the PR, and I agree with Matei that in Spark's usage of Mesos it doesn't make sense to give tasks memory, as we share the same executor that is kept running.

@jongyoul
Member Author

I don't know the behaviour in coarse-grained mode, but in fine-grained mode we use multiple JVMs for running tasks. We run spark-class via the launcher, which means we launch a JVM per task. Am I wrong? If I've misunderstood how Mesos works, I'm sorry.

@jongyoul
Member Author

I believed that when the Mesos driver calls launchTasks, the container runs the command bin/spark-class every time a task runs. In my Q&A email to the Mesos list, @tnachen answered that one container can run multiple commands simultaneously, and some of my tests show two tasks running simultaneously because they write to the same log file at the same time. Digging into the code, I found no limit on launching tasks in a Mesos container. However, @mateiz told me that one executor only runs a single JVM and launches a single task at any time.

@tnachen
Contributor

tnachen commented Jan 28, 2015

If you read the fine-grained mode source code, you'll notice that Spark is using the slave id as the executor id, which is what we discussed on the Mesos mailing list: the executor will be re-used if all tasks reuse the same executor id.
Therefore, it only launches one executor per slave, and if the executor dies Mesos will relaunch it when a task asks for it again.
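A hedged sketch of that reuse mechanism (the object and method are illustrative, not the actual fine-grained backend code; the builder calls follow the Mesos protobuf API):

```scala
import org.apache.mesos.Protos._

object ExecutorReuseSketch {
  // Keying the executor id on the slave id means every task sent to that slave targets
  // the same executor, so Mesos reuses it instead of starting a new one per task.
  def executorInfoFor(slaveId: SlaveID, command: CommandInfo): ExecutorInfo =
    ExecutorInfo.newBuilder()
      .setExecutorId(ExecutorID.newBuilder().setValue(slaveId.getValue)) // executor id == slave id
      .setCommand(command)
      .build()
}
```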

@jongyoul
Member Author

@tnachen Yes, I fully understand that the executor is reused while a framework is alive. However, what about the case where we launch two tasks on the same executor? What you've answered is that they are launched at the same time, isn't it?

@tnachen
Contributor

tnachen commented Jan 28, 2015

@jongyoul So an executor can only "launch" one task at a time, but it can have multiple tasks running simultaneously, as you mentioned.

It doesn't matter whether they're all part of the same launchTasks message or separate ones; as long as the framework and executor id are the same, they will be launched in the same executor.

@jongyoul
Member Author

@tnachen @mateiz Sorry for taking up so much of your time. I've found that only one executor process runs at any time, and I understand the executor can have multiple tasks at the same time. I had believed each executor was launched separately when the driver calls launchTasks.

@jongyoul
Member Author

I'll close this PR. It's the wrong approach.

@elyast
Contributor

elyast commented Mar 13, 2015

One comment, however: if you run multiple Spark applications, then even though executor-id == slave-id, multiple executors can be started on the same host (and every one of them will consume 1 CPU without scheduling any tasks). This can be painful when you want to run multiple streaming applications on Mesos in fine-grained mode, because each streaming driver's executors will consume 1 CPU...

executors

The screenshot illustrates the situation on a single slave, where two executors are running for 2 different Spark applications (one is a streaming app, the second is Zeppelin); as you can see, with 0 active tasks the CPU consumption is 2.

@tnachen
Contributor

tnachen commented Mar 13, 2015

@elyast Yes, you are correct; it is only applicable per Spark app. It is entirely possible to make the executor cpu less than 1 (as it's based on shares), but it's not possible for now to share the Mesos executor across apps.

@elyast
Contributor

elyast commented Mar 13, 2015

Sure, it's totally fine not to share, but at least it should be possible to configure the allocation. Allocating 1 CPU per executor may just be too much; obviously it depends how CPU-intensive its work is, but I guess @mateiz knows that much better than me.

@jongyoul
Member Author

@elyast Thanks for your interest in this PR, which was about core and memory resources. I misunderstood how Mesos works, especially on the memory side, so I closed this PR. However, I agree with you that one core for the executor is sometimes too much. @tnachen we cannot do anything about the memory issue, but changing the executor cores is meaningful, and we should fix the fact that the executor initially gets the same number of cores as CPUS_PER_TASK. See the TODO(pwendell) in MesosSchedulerBackend.scala; I want to address this TODO by extracting a configuration parameter. What do you think?
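A minimal sketch of the knob being proposed (the key name anticipates the later SPARK-6350 work; the fallback value stands in for the currently hard-coded cores and is not a recommended default):

```scala
import org.apache.spark.SparkConf

// Read the executor's cores from configuration instead of hard-coding them; fractional
// values are possible since Mesos CPU resources are share-based.
val conf = new SparkConf()
val mesosExecutorCores: Double = conf.getDouble("spark.mesos.executor.cores", 1.0)
```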

@tnachen
Contributor

tnachen commented Mar 15, 2015

I think making it a configurable parameter sounds reasonable to me.

@jongyoul
Member Author

@tnachen @elyast I've filed a new issue about configuring the Mesos executor cores: https://issues.apache.org/jira/browse/SPARK-6350

@elyast
Contributor

elyast commented Mar 17, 2015

cool thanks
