
Conversation

@zuotingbing commented Oct 26, 2018

What changes were proposed in this pull request?

We should filter the workOffers to keep only those with freeCores >= CPUS_PER_TASK, for better performance.
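
For context, a minimal, self-contained sketch of the proposed idea (the WorkerOffer case class, the executor names, and cpusPerTask = 2 below are stand-ins for illustration, not Spark's actual classes or this PR's diff):

    // Minimal sketch (not Spark's actual classes): drop offers that cannot host
    // even one task before handing them to the scheduler.
    case class WorkerOffer(executorId: String, host: String, freeCores: Int)

    object FilterOffersSketch {
      def main(args: Array[String]): Unit = {
        val cpusPerTask = 2 // stand-in for spark.task.cpus (CPUS_PER_TASK); assumed value
        val offers = Seq(
          WorkerOffer("1", "host-a", 4), // enough free cores for two tasks
          WorkerOffer("2", "host-b", 1), // below cpusPerTask: useless this round
          WorkerOffer("3", "host-c", 0)  // fully busy
        )
        // The PR's idea: only offers with freeCores >= CPUS_PER_TASK are worth
        // shuffling and iterating over in TaskSchedulerImpl.resourceOffers().
        val usable = offers.filter(_.freeCores >= cpusPerTask)
        println(usable.map(_.executorId)) // List(1)
      }
    }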

How was this patch tested?

Existing tests.

@zuotingbing (Author) commented:

[screenshot: 2018-10-26_162822]

@AmplabJenkins commented:

Can one of the admins verify this patch?

@zuotingbing changed the title from "[SPARK-25852][Core] we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors" to "[SPARK-25852][Core] we should filter the workOffers of which freeCores>0 for better performance" on Oct 26, 2018
Member commented:

I don't know this code well but the comment above implies that it means to make an offer on all executors?
What is the performance impact anyway?

@zuotingbing (Author) commented Oct 29, 2018:

On our production cluster there are many executors and TaskSets/tasks. As we know, tasks are assigned to executors in a round-robin manner, and resource offers are revived every second by default ("spark.scheduler.revive.interval", "1s"). It seems to make no sense to schedule tasks to executors which have no free cores.
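
For reference, the interval mentioned above is an ordinary Spark configuration; a minimal sketch of setting it explicitly (the object and app name are placeholders, and "1s" is simply the default the comment cites):

    import org.apache.spark.SparkConf

    object ReviveIntervalExample {
      // The driver revives resource offers on a fixed interval; "1s" is the default,
      // so makeOffers() keeps firing every second even when executors have no free cores.
      val conf: SparkConf = new SparkConf()
        .setAppName("revive-interval-example")
        .set("spark.scheduler.revive.interval", "1s")
    }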

Member commented:

I'm saying there is a comment a few lines above that describes this as a fake offer that explicitly intends to contact all executors. I think we need to figure out if that's still relevant. I don't have git in front of me but check git blame or GitHub to see when this was written?

@zuotingbing (Author) commented:

I checked the git log. At the beginning, the makeOffers function did not filter out any resources, which is probably why the comment says "Make fake resource offers on all executors":

    // Make fake resource offers on all executors
    def makeOffers() {
      launchTasks(scheduler.resourceOffers(
        executorHost.toArray.map { case (id, host) => new WorkerOffer(id, host, freeCores(id)) }))
    }

@zuotingbing (Author) commented Oct 30, 2018:

BTW, if freeCores < CPUS_PER_TASK, the following code in resourceOffers() is inefficient, since o.cores / CPUS_PER_TASK = 0:

    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
    val availableSlots = shuffledOffers.map(o => o.cores / CPUS_PER_TASK).sum
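
To make the integer-division point concrete, a tiny self-contained example (the values are hypothetical):

    object ZeroSlotOffer {
      def main(args: Array[String]): Unit = {
        val cpusPerTask = 2 // assumed value of spark.task.cpus
        val freeCores   = 1 // an offer from a nearly-busy executor
        // Integer division: 1 / 2 == 0, so this offer contributes zero task slots,
        // yet it is still shuffled and iterated over in resourceOffers().
        val slots = freeCores / cpusPerTask
        println(s"available slots from this offer: $slots") // prints 0
      }
    }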

@zuotingbing changed the title from "[SPARK-25852][Core] we should filter the workOffers of which freeCores>0 for better performance" to "[SPARK-25852][Core] we should filter the workOffers of which freeCores>=CPUS_PER_TASK for better performance" on Oct 30, 2018
@zuotingbing changed the title from "[SPARK-25852][Core] we should filter the workOffers of which freeCores>=CPUS_PER_TASK for better performance" to "[SPARK-25852][Core] we should filter the workOffers with freeCores>=CPUS_PER_TASK for better performance" on Oct 30, 2018
@srowen (Member) commented Oct 30, 2018:

Yes, that is the comment I have been referring to. So it seems you can't filter, right? It's not scheduling work here. @jiangxb1987, do you know this part of the code?

@jiangxb1987 (Contributor) commented:

What do you mean by "better performance"? If that means we can spend less time in TaskSchedulerImpl.resourceOffers(), then I agree it's true, but AFAIK it has never been reported that this can be a bottleneck for the whole cluster, so maybe the perf gain is trivial here. If you expect better task distribution over existing executors, then I don't see any case that can be improved by this proposed change. Please correct me if I'm wrong.

@jiangxb1987 (Contributor) commented:

It may happen that a busy executor is marked as lost and later re-registers with the driver; in that case we currently call makeOffers(), and that adds the executor into TaskSchedulerImpl.hostToExecutors. This is a bad implementation, since it should not depend on the makeOffers() function to update an unrelated protected val, but that's what we have for now.

@srowen mentioned this pull request Nov 3, 2018
@asfgit closed this in 463a676 Nov 4, 2018
@zuotingbing deleted the SPARK-25852 branch March 16, 2021 01:18
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#22859
Closes apache#22849
Closes apache#22591
Closes apache#22322
Closes apache#22312
Closes apache#19590

Closes apache#22934 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>