Conversation

@CodingCat
Contributor

https://issues.apache.org/jira/browse/SPARK-5337

Currently, we don't take spark.task.cpus into account when scheduling applications in Master, so we may fall into one of the following cases:

  1. the executor gets N cores but we need M cores to run a single task, where N < M, so the executor cannot run any task
  2. the executor gets N cores and we need M cores per task, where N > M && N % M != 0, so some cores in the executor are wasted
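The two cases can be illustrated with a small sketch (hypothetical helper names, not code from the PR): with spark.task.cpus = M, an executor holding N cores can run only N / M concurrent tasks, leaving N % M cores permanently idle.

```scala
// Hypothetical illustration of the two cases (not code from the PR).
object TaskSlots {
  // Number of tasks an executor with `executorCores` can run concurrently
  // when each task needs `cpusPerTask` cores.
  def runnableTasks(executorCores: Int, cpusPerTask: Int): Int =
    executorCores / cpusPerTask

  // Cores on this executor that no task can ever use.
  def wastedCores(executorCores: Int, cpusPerTask: Int): Int =
    executorCores % cpusPerTask

  def main(args: Array[String]): Unit = {
    // Case 1: N = 2 < M = 3 -- the executor cannot run a single task.
    assert(runnableTasks(2, 3) == 0)
    // Case 2: N = 7 > M = 3 and N % M != 0 -- one core is wasted.
    assert(wastedCores(7, 3) == 1)
  }
}
```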

The patch for YARN was submitted by @WangTaoTheTonic: #4123

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25862 has finished for PR 4129 at commit d383b44.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25863 has finished for PR 4129 at commit e28e19f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25864 has finished for PR 4129 at commit 523b3b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WangTaoTheTonic
Contributor

What if app.coresLeft < app.desc.coreNumPerTask?
And should we fix the same issue in Mesos mode in this PR?

@CodingCat
Contributor Author

Hi @WangTaoTheTonic, if app.coresLeft < app.desc.coreNumPerTask, we should not assign more cores to the executor, as those cores would just be wasted since we cannot run a task with them.

In the PR, it's just

 while (toAssign >= app.desc.coreNumPerTask) {

and

 if (coresToUse >= app.desc.coreNumPerTask) {
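For context, a minimal self-contained sketch of what such a guarded assignment loop could look like (hypothetical names and a simplified round-robin, not the PR's actual Master code). The `progressed` flag is an assumption added here: without some such guard, a state where toAssign >= coreNumPerTask but no single worker can supply a full chunk would spin forever.

```scala
// Hypothetical simplification of a spread-out scheduling loop
// (not the PR's actual Master code).
object SpreadOutSketch {
  // Assign cores round-robin across workers, only in multiples of
  // coreNumPerTask, so no partially usable chunks are handed out.
  def assignCores(coresLeft: Int,
                  workerFreeCores: Array[Int],
                  coreNumPerTask: Int): Array[Int] = {
    val assigned = Array.fill(workerFreeCores.length)(0)
    var toAssign = math.min(coresLeft, workerFreeCores.sum)
    var progressed = true
    // `progressed` guards against spinning forever when cores remain
    // overall but no single worker can host one more full task.
    while (toAssign >= coreNumPerTask && progressed) {
      progressed = false
      var pos = 0
      while (pos < workerFreeCores.length && toAssign >= coreNumPerTask) {
        if (workerFreeCores(pos) - assigned(pos) >= coreNumPerTask) {
          assigned(pos) += coreNumPerTask
          toAssign -= coreNumPerTask
          progressed = true
        }
        pos += 1
      }
    }
    assigned
  }
}
```

With spark.task.cpus = 3, two workers with 4 free cores each get 3 cores assigned apiece, and two workers with 2 free cores each get nothing rather than unusable fragments.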

@CodingCat CodingCat changed the title [SPARK-5337] respect spark.task.cpus when scheduling Applications [SPARK-5337][Mesos][Standalone] respect spark.task.cpus when scheduling Applications Jan 21, 2015
@CodingCat
Contributor Author

Just uploaded the fix for Mesos.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25888 has finished for PR 4129 at commit 619c8b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Too many app.desc.coreNumPerTask here.

@CodingCat
Contributor Author

Hi @WangTaoTheTonic, I just addressed your comments.

Any other comments?

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25892 has finished for PR 4129 at commit d54c5f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25893 has finished for PR 4129 at commit 9f2a8be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

It seems corePerTask and coreNumPerTask are somewhat redundant.

Contributor Author

ah? what do you mean?

Contributor

Is corePerTask necessary? Why not just assign conf.getInt("spark.task.cpus", 1) to coreNumPerTask?

Contributor Author

Oh, I tried to embed the validation logic into the assignment, so it ended up looking like this.

Contributor

 val coreNumPerTask = conf.getInt("spark.task.cpus", 1)
 if (coreNumPerTask < 1) {
   throw new IllegalArgumentException(
     s"spark.task.cpus is set to an invalid value: $coreNumPerTask")
 }

Would that be a better way?

@WangTaoTheTonic
Contributor

Besides, the number of cores needed by an application is set via spark.cores.max.

What will happen if we set spark.cores.max < spark.task.cpus? I think the submitted application will hang forever.
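One way to avoid that hang would be a fail-fast check at submission time. The following is a hypothetical sketch (CoreSettings.validate is not a method in the PR or in Spark), rejecting the configuration up front instead of letting the application wait forever:

```scala
// Hypothetical submission-time validation (not part of the PR):
// fail fast instead of letting the application hang forever.
object CoreSettings {
  def validate(coresMax: Option[Int], cpusPerTask: Int): Unit = {
    require(cpusPerTask >= 1,
      s"spark.task.cpus is set to an invalid value: $cpusPerTask")
    // If spark.cores.max is set, the cap must cover at least one task.
    coresMax.foreach { max =>
      require(max >= cpusPerTask,
        s"spark.cores.max ($max) is smaller than spark.task.cpus " +
        s"($cpusPerTask); no task could ever be scheduled")
    }
  }
}
```

For example, `CoreSettings.validate(Some(2), 4)` would throw an IllegalArgumentException immediately, while `CoreSettings.validate(Some(8), 4)` passes.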

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25974 has finished for PR 4129 at commit bca1080.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25975 has finished for PR 4129 at commit 8b088b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27373 has finished for PR 4129 at commit 43852cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

How about fine-grained mode in Mesos?

@CodingCat
Contributor Author

@WangTaoTheTonic any other comments?

@SparkQA

SparkQA commented Mar 12, 2015

Test build #28528 has finished for PR 4129 at commit 6d76f12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Why remove sc.eventLogCodec?

@WangTaoTheTonic
Contributor

Looks good logically, but we need to run some tests on this commit, especially in Mesos mode (fine- and coarse-grained).

@pwendell @JoshRosen Any comments?

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28560 has finished for PR 4129 at commit 744e91b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

If I read the code correctly, I don't need to change that, as it starts the Mesos task with the correct CPU setup: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L142

@tnachen
Contributor

tnachen commented Mar 23, 2015

@CodingCat sorry you're right, I didn't realize CPUS_PER_TASK was configured to that flag. LGTM

Contributor

To reduce duplicated code, you can get this from scheduler.CPUS_PER_TASK.

@andrewor14
Contributor

@CodingCat good to see this being fixed. However, as it stands the existing solution does not seem completely correct. I pointed out a scenario where we will fall into an infinite scheduling loop if the conditions are not met. Also, once you have the time would you mind bringing this up to date? I believe your other patch I just merged in today conflicts quite significantly with your changes here.

@CodingCat
Contributor Author

sure, will work on it soon

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30601 has finished for PR 4129 at commit da8d446.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class MapConfigProvider extends ConfigProvider
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30602 has finished for PR 4129 at commit 55d9143.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30603 has finished for PR 4129 at commit c10f980.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class MapConfigProvider extends ConfigProvider
  • This patch does not change any dependencies.

@CodingCat
Contributor Author

@andrewor14, I updated the patch. How about the current version?

@andrewor14
Contributor

@CodingCat there have been large changes to the standalone scheduler in 1.5. I don't think this patch in its current state can be easily merged anymore. If you have time, would you mind closing this and reopening a new patch against the latest master?

@CodingCat
Contributor Author

Sure, let me do it over the weekend.

@CodingCat CodingCat closed this Sep 2, 2015