Conversation

@CodingCat
Contributor

https://issues.apache.org/jira/browse/SPARK-5337

Currently, we don't take spark.task.cpus into account when scheduling applications in Master, so we may fall into one of the following cases:

  1. the executor gets N cores but we need M cores to run a single task, where N < M, so the executor cannot run any task
  2. the executor gets N cores and we need M cores per task, where N > M && N % M != 0, so some cores in the executor are wasted
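The two cases can be illustrated with a small sketch (hypothetical helper names, not code from the PR): with spark.task.cpus = M, an executor holding N cores can run only N / M concurrent tasks, leaving N % M cores permanently idle.

```scala
// Hypothetical illustration of the two cases (not code from the PR).
object TaskSlots {
  // Number of tasks an executor with `executorCores` can run concurrently
  // when each task needs `cpusPerTask` cores.
  def runnableTasks(executorCores: Int, cpusPerTask: Int): Int =
    executorCores / cpusPerTask

  // Cores on this executor that no task can ever use.
  def wastedCores(executorCores: Int, cpusPerTask: Int): Int =
    executorCores % cpusPerTask

  def main(args: Array[String]): Unit = {
    // Case 1: N = 2 < M = 3 -- the executor cannot run a single task.
    assert(runnableTasks(2, 3) == 0)
    // Case 2: N = 7 > M = 3 and N % M != 0 -- one core is wasted.
    assert(wastedCores(7, 3) == 1)
  }
}
```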

The patch for YARN was submitted by @WangTaoTheTonic: #4123

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25862 has finished for PR 4129 at commit d383b44.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25863 has finished for PR 4129 at commit e28e19f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25864 has finished for PR 4129 at commit 523b3b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WangTaoTheTonic
Contributor

What if app.coresLeft < app.desc.coreNumPerTask?
And should we fix the same issue in Mesos mode in this PR?

@CodingCat
Contributor Author

Hi @WangTaoTheTonic, if app.coresLeft < app.desc.coreNumPerTask, we should not assign more cores to the executor, as those cores would just be wasted since we cannot run a task with them.

In the PR, it's just

 while (toAssign >= app.desc.coreNumPerTask) {

and

 if (coresToUse >= app.desc.coreNumPerTask) {
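For context, a minimal self-contained sketch of what such a guarded assignment loop could look like (hypothetical names and a simplified round-robin, not the PR's actual Master code). The `progressed` flag is an assumption added here: without some such guard, a state where toAssign >= coreNumPerTask but no single worker can supply a full chunk would spin forever.

```scala
// Hypothetical simplification of a spread-out scheduling loop
// (not the PR's actual Master code).
object SpreadOutSketch {
  // Assign cores round-robin across workers, only in multiples of
  // coreNumPerTask, so no partially usable chunks are handed out.
  def assignCores(coresLeft: Int,
                  workerFreeCores: Array[Int],
                  coreNumPerTask: Int): Array[Int] = {
    val assigned = Array.fill(workerFreeCores.length)(0)
    var toAssign = math.min(coresLeft, workerFreeCores.sum)
    var progressed = true
    // `progressed` guards against spinning forever when cores remain
    // overall but no single worker can host one more full task.
    while (toAssign >= coreNumPerTask && progressed) {
      progressed = false
      var pos = 0
      while (pos < workerFreeCores.length && toAssign >= coreNumPerTask) {
        if (workerFreeCores(pos) - assigned(pos) >= coreNumPerTask) {
          assigned(pos) += coreNumPerTask
          toAssign -= coreNumPerTask
          progressed = true
        }
        pos += 1
      }
    }
    assigned
  }
}
```

With spark.task.cpus = 3, two workers with 4 free cores each get 3 cores assigned apiece, and two workers with 2 free cores each get nothing rather than unusable fragments.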

@CodingCat CodingCat changed the title [SPARK-5337] respect spark.task.cpus when scheduling Applications [SPARK-5337][Mesos][Standalone] respect spark.task.cpus when scheduling Applications Jan 21, 2015
@CodingCat
Contributor Author

Just uploaded the fix for Mesos.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25888 has finished for PR 4129 at commit 619c8b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Too many app.desc.coreNumPerTask here.

@CodingCat
Contributor Author

Hi @WangTaoTheTonic, I just addressed your comments.

Any other comments?

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25892 has finished for PR 4129 at commit d54c5f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25893 has finished for PR 4129 at commit 9f2a8be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

It seems corePerTask and coreNumPerTask are somewhat redundant.

Contributor Author

ah? what do you mean?

Contributor

Is corePerTask necessary? Why not just assign conf.getInt("spark.task.cpus", 1) to coreNumPerTask?

Contributor Author

Oh, I tried to embed the validation logic into the assignment, so it ended up looking like this.

Contributor

 val coreNumPerTask = conf.getInt("spark.task.cpus", 1)
 if (coreNumPerTask < 1) {
   throw new IllegalArgumentException(
     s"spark.task.cpus is set to an invalid value: $coreNumPerTask")
 }

Would that be a better way?

@WangTaoTheTonic
Contributor

Besides, the number of cores needed by an application is set via spark.cores.max.

What will happen if we set spark.cores.max < spark.task.cpus? I think the submitted application will hang forever.
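One way to avoid that hang would be a fail-fast check at submission time. The following is a hypothetical sketch (CoreSettings.validate is not a method in the PR or in Spark), rejecting the configuration up front instead of letting the application wait forever:

```scala
// Hypothetical submission-time validation (not part of the PR):
// fail fast instead of letting the application hang forever.
object CoreSettings {
  def validate(coresMax: Option[Int], cpusPerTask: Int): Unit = {
    require(cpusPerTask >= 1,
      s"spark.task.cpus is set to an invalid value: $cpusPerTask")
    // If spark.cores.max is set, the cap must cover at least one task.
    coresMax.foreach { max =>
      require(max >= cpusPerTask,
        s"spark.cores.max ($max) is smaller than spark.task.cpus " +
        s"($cpusPerTask); no task could ever be scheduled")
    }
  }
}
```

For example, `CoreSettings.validate(Some(2), 4)` would throw an IllegalArgumentException immediately, while `CoreSettings.validate(Some(8), 4)` passes.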

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25974 has finished for PR 4129 at commit bca1080.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25975 has finished for PR 4129 at commit 8b088b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27373 has finished for PR 4129 at commit 43852cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

How about fine-grained mode in Mesos?

@CodingCat
Contributor Author

@WangTaoTheTonic any other comments?

@SparkQA

SparkQA commented Mar 12, 2015

Test build #28528 has finished for PR 4129 at commit 6d76f12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Why remove sc.eventLogCodec?

@WangTaoTheTonic
Contributor

Looks good logically, but we need to run some tests on this commit, especially in Mesos mode (fine- and coarse-grained).

@pwendell @JoshRosen Any comments?

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28560 has finished for PR 4129 at commit 744e91b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

If I read the code correctly, I don't need to change that, as it starts the Mesos task with the correct CPU setup: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L142

@tnachen
Contributor

tnachen commented Mar 23, 2015

@CodingCat sorry you're right, I didn't realize CPUS_PER_TASK was configured to that flag. LGTM

Contributor

To reduce duplicated code, you can get this from scheduler.CPUS_PER_TASK.

@andrewor14
Contributor

@CodingCat good to see this being fixed. However, as it stands the existing solution does not seem completely correct. I pointed out a scenario where we will fall into an infinite scheduling loop if the conditions are not met. Also, once you have the time would you mind bringing this up to date? I believe your other patch I just merged in today conflicts quite significantly with your changes here.

@CodingCat
Contributor Author

sure, will work on it soon

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30601 has finished for PR 4129 at commit da8d446.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class MapConfigProvider extends ConfigProvider
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30602 has finished for PR 4129 at commit 55d9143.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30603 has finished for PR 4129 at commit c10f980.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class MapConfigProvider extends ConfigProvider
  • This patch does not change any dependencies.

@CodingCat
Contributor Author

@andrewor14, I updated the patch. How about the current version?

@andrewor14
Contributor

@CodingCat there have been large changes to the standalone scheduler in 1.5. I don't think this patch in its current state can be easily merged anymore. If you have time, would you mind closing this and reopening a new patch against the latest master?

@CodingCat
Contributor Author

Sure, let me do it over the weekend.

@CodingCat CodingCat closed this Sep 2, 2015