
Conversation

@zuotingbing commented Oct 26, 2018

What changes were proposed in this pull request?

We should filter the workOffers to keep only those with freeCores >= CPUS_PER_TASK, for better performance.
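
For context, a minimal, self-contained sketch of the proposed idea (the WorkerOffer case class, the executor names, and cpusPerTask = 2 below are stand-ins for illustration, not Spark's actual classes or this PR's diff):

    // Minimal sketch (not Spark's actual classes): drop offers that cannot host
    // even one task before handing them to the scheduler.
    case class WorkerOffer(executorId: String, host: String, freeCores: Int)

    object FilterOffersSketch {
      def main(args: Array[String]): Unit = {
        val cpusPerTask = 2 // stand-in for spark.task.cpus (CPUS_PER_TASK); assumed value
        val offers = Seq(
          WorkerOffer("1", "host-a", 4), // enough free cores for two tasks
          WorkerOffer("2", "host-b", 1), // below cpusPerTask: useless this round
          WorkerOffer("3", "host-c", 0)  // fully busy
        )
        // The PR's idea: only offers with freeCores >= CPUS_PER_TASK are worth
        // shuffling and iterating over in TaskSchedulerImpl.resourceOffers().
        val usable = offers.filter(_.freeCores >= cpusPerTask)
        println(usable.map(_.executorId)) // List(1)
      }
    }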

How was this patch tested?

Existing tests.

@zuotingbing (Author) commented:

[screenshot: 2018-10-26_162822]

@AmplabJenkins commented:

Can one of the admins verify this patch?

@zuotingbing changed the title from "[SPARK-25852][Core] we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors" to "[SPARK-25852][Core] we should filter the workOffers of which freeCores>0 for better performance" on Oct 26, 2018
Member commented:

I don't know this code well but the comment above implies that it means to make an offer on all executors?
What is the performance impact anyway?

@zuotingbing (Author) commented Oct 29, 2018:

On our production cluster there are many executors and TaskSets/tasks. As we know, tasks are assigned to executors in a round-robin manner, and resource offers are revived every second by default ("spark.scheduler.revive.interval", "1s"). It seems to make no sense to schedule tasks to executors which have no free cores.
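
For reference, the interval mentioned above is an ordinary Spark configuration; a minimal sketch of setting it explicitly (the object and app name are placeholders, and "1s" is simply the default the comment cites):

    import org.apache.spark.SparkConf

    object ReviveIntervalExample {
      // The driver revives resource offers on a fixed interval; "1s" is the default,
      // so makeOffers() keeps firing every second even when executors have no free cores.
      val conf: SparkConf = new SparkConf()
        .setAppName("revive-interval-example")
        .set("spark.scheduler.revive.interval", "1s")
    }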

Member commented:

I'm saying there is a comment a few lines above that describes this as a fake offer that explicitly intends to contact all executors. I think we need to figure out if that's still relevant. I don't have git in front of me but check git blame or GitHub to see when this was written?

@zuotingbing (Author) commented:

I checked the git log. At the beginning, the makeOffers function did not filter out any resources, which is probably why the comment says "Make fake resource offers on all executors":

    // Make fake resource offers on all executors
    def makeOffers() {
      launchTasks(scheduler.resourceOffers(
        executorHost.toArray.map { case (id, host) => new WorkerOffer(id, host, freeCores(id)) }))
    }

@zuotingbing (Author) commented Oct 30, 2018:

BTW, if freeCores < CPUS_PER_TASK, the following code in resourceOffers() is inefficient, since o.cores / CPUS_PER_TASK = 0:

    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
    val availableSlots = shuffledOffers.map(o => o.cores / CPUS_PER_TASK).sum
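
To make the integer-division point concrete, a tiny self-contained example (the values are hypothetical):

    object ZeroSlotOffer {
      def main(args: Array[String]): Unit = {
        val cpusPerTask = 2 // assumed value of spark.task.cpus
        val freeCores   = 1 // an offer from a nearly-busy executor
        // Integer division: 1 / 2 == 0, so this offer contributes zero task slots,
        // yet it is still shuffled and iterated over in resourceOffers().
        val slots = freeCores / cpusPerTask
        println(s"available slots from this offer: $slots") // prints 0
      }
    }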

@zuotingbing changed the title from "[SPARK-25852][Core] we should filter the workOffers of which freeCores>0 for better performance" to "[SPARK-25852][Core] we should filter the workOffers of which freeCores>=CPUS_PER_TASK for better performance" on Oct 30, 2018
@zuotingbing changed the title from "[SPARK-25852][Core] we should filter the workOffers of which freeCores>=CPUS_PER_TASK for better performance" to "[SPARK-25852][Core] we should filter the workOffers with freeCores>=CPUS_PER_TASK for better performance" on Oct 30, 2018
@srowen (Member) commented Oct 30, 2018:

Yes, that is the comment I have been referring to. So it seems you can't filter, right? It's not scheduling work here. @jiangxb1987, do you know this part of the code?

@jiangxb1987 (Contributor) commented:

What do you mean by "better performance"? If that means we can spend less time in TaskSchedulerImpl.resourceOffers(), then I agree it's true, but AFAIK it has never been reported that this can be a bottleneck for the whole cluster, so maybe the perf gain is trivial here. If you expect better task distribution over existing executors, then I don't see any case that can be improved by this proposed change. Please correct me if I'm wrong.

@jiangxb1987 (Contributor) commented:

It may happen that a busy executor is marked as lost and later re-registers with the driver; in that case we currently call makeOffers(), and that adds the executor into TaskSchedulerImpl.hostToExecutors. This is a bad implementation, since it should not depend on the makeOffers() function to update an unrelated protected val, but that's what we have for now.

@srowen mentioned this pull request Nov 3, 2018
@asfgit closed this in 463a676 Nov 4, 2018
@zuotingbing deleted the SPARK-25852 branch March 16, 2021 01:18
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#22859
Closes apache#22849
Closes apache#22591
Closes apache#22322
Closes apache#22312
Closes apache#19590

Closes apache#22934 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>