Skip to content

Conversation

@Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Dec 23, 2018

What changes were proposed in this pull request?

availableCpus decrease as tasks allocated, so, we should update availableSlots by availableCpus for barrier taskset to avoid unnecessary resourceOffer process.

How was this patch tested?

existed.

@Ngone51
Copy link
Member Author

Ngone51 commented Dec 23, 2018

ping @jiangxb1987 , please take a look, thanks.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

*/
def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {

def availableSlots(availableCpus: Array[Int]): Int = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is not always correct, please refer to test("don't schedule for a barrier taskSet if available slots are less than pending tasks")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean for "The result is not always correct" ? The way of calculating slots num ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, please refer to the test case above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen and tested that suite before submitting this pr. And I follow the original way of calculating the slots, what's the difference ?

jiangxb1987 pushed a commit that referenced this pull request Sep 27, 2019
…` before checking available slots for barrier taskSet

### What changes were proposed in this pull request?

availableSlots are computed before the for loop looping over all TaskSets in resourceOffers. But the number of slots changes in every iteration, as in every iteration these slots are taken. The number of available slots checked by a barrier task set has therefore to be recomputed in every iteration from availableCpus.

### Why are the changes needed?

Bugfix.
This could make resourceOffer attempt to start a barrier task set, even though it has not enough slots available. That would then be caught by the `require` in line 519, which will throw an exception, which will get caught and ignored by Dispatcher's MessageLoop, so nothing terrible would happen, but the exception would prevent resourceOffers from considering further TaskSets.
Note that launching the barrier TaskSet can still fail if other requirements are not satisfied, and still can be rolled-back by throwing exception in this `require`. Handling it more gracefully remains a TODO in SPARK-24818, but this fix at least should resolve the situation when it's unable to launch because of insufficient slots.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added UT

Closes #23375

Closes #25946 from juliuszsompolski/SPARK-29263.

Authored-by: Juliusz Sompolski <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
(cherry picked from commit 420abb4)
Signed-off-by: Xingbo Jiang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants