[SPARK-5337][Mesos][Standalone] respect spark.task.cpus when scheduling Applications #4129
Conversation
Test build #25862 has finished for PR 4129 at commit
Test build #25863 has finished for PR 4129 at commit
Test build #25864 has finished for PR 4129 at commit
What if app.coresLeft < app.desc.coreNumPerTask?
Hi @WangTaoTheTonic, if app.coresLeft < app.desc.coreNumPerTask, we should not assign more cores to the executor... those cores would just be wasted, since we cannot run a task with them... in the PR, it's just and
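The point above can be sketched as follows. This is illustrative only, not the actual Master scheduling code; `coresToAssign` and its parameter names are hypothetical. The idea is that cores are handed out only in multiples of `spark.task.cpus`, so a remainder smaller than one task's requirement is never assigned:

```scala
// Hypothetical sketch of the idea discussed above, not Spark's actual code:
// assign cores in multiples of coresPerTask so that no assigned core can
// end up unusable by the task scheduler.
object CoreAssignSketch {
  def coresToAssign(coresLeft: Int, coresFree: Int, coresPerTask: Int): Int = {
    val usable = math.min(coresLeft, coresFree)
    // Round down to a multiple of coresPerTask; the remainder could never
    // run a task, so it is left unassigned.
    (usable / coresPerTask) * coresPerTask
  }
}
```

For example, with 3 cores left in the app, 8 free on the worker, and `spark.task.cpus = 2`, only 2 cores would be assigned; the odd core stays free rather than being wasted.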
Just uploaded the fix for Mesos...
Test build #25888 has finished for PR 4129 at commit
Too many app.desc.coreNumPerTask here.
Hi @WangTaoTheTonic, I just addressed your comments. Any other comments?
Test build #25892 has finished for PR 4129 at commit
Test build #25893 has finished for PR 4129 at commit
It seems corePerTask and coreNumPerTask are somewhat redundant.
ah? what do you mean?
Is corePerTask necessary? Why not just assign conf.getInt("spark.task.cpus", 1) to coreNumPerTask?
Oh... I tried to embed the validation logic into the assignment, so it looks like this:
```scala
val coreNumPerTask = conf.getInt("spark.task.cpus", 1)
if (coreNumPerTask < 1) {
  throw new IllegalArgumentException(
    s"spark.task.cpus is set to an invalid value $coreNumPerTask")
}
```
Is there a better way?
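One way to fold the validation into the assignment, as suggested, is Scala's `require`, which throws `IllegalArgumentException` when the condition is false. A minimal sketch, assuming the validation is pulled into an illustrative helper (`validatedCoresPerTask` is not code from this PR):

```scala
// Sketch: validate and assign in one step via require, avoiding the
// separate corePerTask temporary. require throws IllegalArgumentException
// with the given message when the condition is false.
object CpusPerTaskSketch {
  def validatedCoresPerTask(raw: Int): Int = {
    require(raw >= 1, s"spark.task.cpus is set to an invalid value $raw")
    raw
  }
}
```

In the real code this helper would wrap the value read from `conf.getInt("spark.task.cpus", 1)`.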
Besides, the # of cores needed by an application is set via What will happen if we set
Test build #25974 has finished for PR 4129 at commit
Test build #25975 has finished for PR 4129 at commit
Force-pushed 8b088b2 to 43852cb
Test build #27373 has finished for PR 4129 at commit
How about fine-grained mode in Mesos?
@WangTaoTheTonic any other comments?
Test build #28528 has finished for PR 4129 at commit
Why remove sc.eventLogCodec?
Looks good logically. But we need to do some tests based on this commit, especially in Mesos mode (fine/coarse-grained). @pwendell @JoshRosen Any comments?
Test build #28560 has finished for PR 4129 at commit
If I read the code correctly, I don't need to change that, as it starts the Mesos task with the correct CPU setup: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L142
@CodingCat sorry you're right, I didn't realize CPUS_PER_TASK was configured to that flag. LGTM
To reduce duplicate code, you can get this from scheduler.CPUS_PER_TASK.
@CodingCat good to see this being fixed. However, as it stands the existing solution does not seem completely correct. I pointed out a scenario where we will fall into an infinite scheduling loop if the conditions are not met. Also, once you have the time, would you mind bringing this up to date? I believe your other patch I just merged in today conflicts quite significantly with your changes here.
Sure, will work on it soon.
Test build #30601 has finished for PR 4129 at commit
Test build #30602 has finished for PR 4129 at commit
Test build #30603 has finished for PR 4129 at commit
@andrewor14, I updated the patch. How about the current version?
@CodingCat there have been large changes to the standalone scheduler in 1.5. I don't think this patch in its current state can be easily merged anymore. If you have time, would you mind closing this and reopening a new patch against the latest master?
Sure, let me do it over the weekend.
https://issues.apache.org/jira/browse/SPARK-5337
Currently, we don't consider spark.task.cpus when scheduling applications in Master, so we may fall into one of the following cases
The patch for YARN was submitted by @WangTaoTheTonic: #4123
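To illustrate the failure mode the PR targets (numbers and names here are hypothetical, for illustration only): if an executor is granted fewer cores than `spark.task.cpus`, or an odd remainder on top of a multiple, the task scheduler can never place a task on those leftover cores.

```scala
// Illustrative arithmetic only, not Spark code: how many tasks fit on an
// executor granted `executorCores`, given spark.task.cpus = cpusPerTask,
// and how many of its cores can never run a task.
object WastedCoresSketch {
  def runnableTasks(executorCores: Int, cpusPerTask: Int): Int =
    executorCores / cpusPerTask

  def wastedCores(executorCores: Int, cpusPerTask: Int): Int =
    executorCores % cpusPerTask
}
```

For instance, an executor holding a single core under `spark.task.cpus = 2` can run zero tasks, so without the scheduling-side check its core is simply wasted.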