Skip to content

Commit 989dcb6

Browse files
committed
tick: Handle broadcast wakeup of multiple cpus
Some brilliant hardware implementations wake multiple cores when the broadcast timer fires. This leads to the following interesting problem: CPU0 CPU1 wakeup from idle wakeup from idle leave broadcast mode leave broadcast mode restart per cpu timer restart per cpu timer go back to idle handle broadcast (empty mask) enter broadcast mode programm broadcast device enter broadcast mode programm broadcast device So what happens is that due to the forced reprogramming of the cpu local timer, we need to set a event in the future. Now if we manage to go back to idle before the timer fires, we switch off the timer and arm the broadcast device with an already expired time (covered by forced mode). So in the worst case we repeat the above ping pong forever. Unfortunately we have no information about what caused the wakeup, but we can check current time against the expiry time of the local cpu. If the local event is already in the past, we know that the broadcast timer is about to fire and send an IPI. So we mark ourself as an IPI target even if we left broadcast mode and avoid the reprogramming of the local cpu timer. This still leaves the possibility that a CPU which is not handling the broadcast interrupt is going to reach idle again before the IPI arrives. This can't be solved in the core code and will be handled in follow up patches. Reported-by: Jason Liu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: LAK <[email protected]> Cc: John Stultz <[email protected]> Cc: Arjan van de Veen <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Tested-by: Santosh Shilimkar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
1 parent 26517f3 commit 989dcb6

File tree

1 file changed

+58
-1
lines changed

1 file changed

+58
-1
lines changed

kernel/time/tick-broadcast.c

Lines changed: 58 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -393,6 +393,7 @@ int tick_resume_broadcast(void)
393393

394394
static cpumask_var_t tick_broadcast_oneshot_mask;
395395
static cpumask_var_t tick_broadcast_pending_mask;
396+
static cpumask_var_t tick_broadcast_force_mask;
396397

397398
/*
398399
* Exposed for debugging: see timer_list.c
@@ -483,6 +484,10 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
483484
}
484485
}
485486

487+
/* Take care of enforced broadcast requests */
488+
cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
489+
cpumask_clear(tick_broadcast_force_mask);
490+
486491
/*
487492
* Wakeup the cpus which have an expired event.
488493
*/
@@ -518,6 +523,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
518523
struct clock_event_device *bc, *dev;
519524
struct tick_device *td;
520525
unsigned long flags;
526+
ktime_t now;
521527
int cpu;
522528

523529
/*
@@ -545,7 +551,16 @@ void tick_broadcast_oneshot_control(unsigned long reason)
545551
WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
546552
if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
547553
clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
548-
if (dev->next_event.tv64 < bc->next_event.tv64)
554+
/*
555+
* We only reprogram the broadcast timer if we
556+
* did not mark ourself in the force mask and
557+
* if the cpu local event is earlier than the
558+
* broadcast event. If the current CPU is in
559+
* the force mask, then we are going to be
560+
* woken by the IPI right away.
561+
*/
562+
if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
563+
dev->next_event.tv64 < bc->next_event.tv64)
549564
tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
550565
}
551566
} else {
@@ -566,6 +581,47 @@ void tick_broadcast_oneshot_control(unsigned long reason)
566581
tick_broadcast_pending_mask))
567582
goto out;
568583

584+
/*
585+
* If the pending bit is not set, then we are
586+
* either the CPU handling the broadcast
587+
* interrupt or we got woken by something else.
588+
*
589+
* We are not longer in the broadcast mask, so
590+
* if the cpu local expiry time is already
591+
* reached, we would reprogram the cpu local
592+
* timer with an already expired event.
593+
*
594+
* This can lead to a ping-pong when we return
595+
* to idle and therefor rearm the broadcast
596+
* timer before the cpu local timer was able
597+
* to fire. This happens because the forced
598+
* reprogramming makes sure that the event
599+
* will happen in the future and depending on
600+
* the min_delta setting this might be far
601+
* enough out that the ping-pong starts.
602+
*
603+
* If the cpu local next_event has expired
604+
* then we know that the broadcast timer
605+
* next_event has expired as well and
606+
* broadcast is about to be handled. So we
607+
* avoid reprogramming and enforce that the
608+
* broadcast handler, which did not run yet,
609+
* will invoke the cpu local handler.
610+
*
611+
* We cannot call the handler directly from
612+
* here, because we might be in a NOHZ phase
613+
* and we did not go through the irq_enter()
614+
* nohz fixups.
615+
*/
616+
now = ktime_get();
617+
if (dev->next_event.tv64 <= now.tv64) {
618+
cpumask_set_cpu(cpu, tick_broadcast_force_mask);
619+
goto out;
620+
}
621+
/*
622+
* We got woken by something else. Reprogram
623+
* the cpu local timer device.
624+
*/
569625
tick_program_event(dev->next_event, 1);
570626
}
571627
}
@@ -707,5 +763,6 @@ void __init tick_broadcast_init(void)
707763
#ifdef CONFIG_TICK_ONESHOT
708764
alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
709765
alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
766+
alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT);
710767
#endif
711768
}

0 commit comments

Comments
 (0)