Commit 9c0b4bb

vingu-linaro authored and Ingo Molnar committed
sched/cpufreq: Rework schedutil governor performance estimation
The current method of taking uclamp hints into account when estimating the
target frequency can end in a situation where the selected target frequency
is higher than the uclamp hints, even though there is no real need for it.
Such cases mainly happen because we currently mix the traditional scheduler
utilization signal with the uclamp performance hints. By adding these two
metrics together, we lose important information when it comes to selecting
the target frequency, and we have to make assumptions which can't fit all
cases.

Rework the interface between the scheduler and the schedutil governor in
order to propagate all the information down to the cpufreq governor.

The effective_cpu_util() interface changes and now returns the actual
utilization of the CPU with two optional outputs:

- The minimum performance for this CPU: typically the capacity to handle
  the deadline tasks and the interrupt pressure, but also the uclamp_min
  request when available.

- The maximum targeted performance for this CPU, which reflects the
  maximum level that we would like not to exceed. By default it will be
  the CPU capacity, but it can be reduced because of some performance
  hints set with uclamp. The value can be lower than the actual
  utilization and/or the minimum performance level.

A new sugov_effective_cpu_perf() interface is also available to compute
the final performance level that is targeted for the CPU, after applying
some cpufreq headroom and taking all inputs into account.

With these two functions, schedutil is now able to decide when it must go
above the uclamp hints. It also has a generic way to get the minimum
performance level.

The dependency between the energy model and the cpufreq governor and its
headroom policy doesn't exist anymore: eenv_pd_max_util() asks schedutil
for the targeted performance after applying the impact of the waking task.

[ mingo: Refined the changelog & C comments. ]

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent 50181c0 commit 9c0b4bb

File tree

5 files changed (+89 −83 lines)


include/linux/energy_model.h

Lines changed: 0 additions & 1 deletion

```diff
@@ -243,7 +243,6 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	scale_cpu = arch_scale_cpu_capacity(cpu);
 	ps = &pd->table[pd->nr_perf_states - 1];
 
-	max_util = map_util_perf(max_util);
 	max_util = min(max_util, allowed_cpu_cap);
 	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
 
```

kernel/sched/core.c

Lines changed: 38 additions & 52 deletions

```diff
@@ -7467,64 +7467,63 @@ int sched_core_idle_cpu(int cpu)
  * required to meet deadlines.
  */
 unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
-				 enum cpu_util_type type,
-				 struct task_struct *p)
+				 unsigned long *min,
+				 unsigned long *max)
 {
-	unsigned long dl_util, util, irq, max;
+	unsigned long util, irq, scale;
 	struct rq *rq = cpu_rq(cpu);
 
-	max = arch_scale_cpu_capacity(cpu);
-
-	if (!uclamp_is_used() &&
-	    type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) {
-		return max;
-	}
+	scale = arch_scale_cpu_capacity(cpu);
 
 	/*
 	 * Early check to see if IRQ/steal time saturates the CPU, can be
 	 * because of inaccuracies in how we track these -- see
 	 * update_irq_load_avg().
 	 */
 	irq = cpu_util_irq(rq);
-	if (unlikely(irq >= max))
-		return max;
+	if (unlikely(irq >= scale)) {
+		if (min)
+			*min = scale;
+		if (max)
+			*max = scale;
+		return scale;
+	}
+
+	if (min) {
+		/*
+		 * The minimum utilization returns the highest level between:
+		 * - the computed DL bandwidth needed with the IRQ pressure which
+		 *   steals time to the deadline task.
+		 * - The minimum performance requirement for CFS and/or RT.
+		 */
+		*min = max(irq + cpu_bw_dl(rq), uclamp_rq_get(rq, UCLAMP_MIN));
+
+		/*
+		 * When an RT task is runnable and uclamp is not used, we must
+		 * ensure that the task will run at maximum compute capacity.
+		 */
+		if (!uclamp_is_used() && rt_rq_is_runnable(&rq->rt))
+			*min = max(*min, scale);
+	}
 
 	/*
 	 * Because the time spend on RT/DL tasks is visible as 'lost' time to
 	 * CFS tasks and we use the same metric to track the effective
 	 * utilization (PELT windows are synchronized) we can directly add them
 	 * to obtain the CPU's actual utilization.
-	 *
-	 * CFS and RT utilization can be boosted or capped, depending on
-	 * utilization clamp constraints requested by currently RUNNABLE
-	 * tasks.
-	 * When there are no CFS RUNNABLE tasks, clamps are released and
-	 * frequency will be gracefully reduced with the utilization decay.
 	 */
 	util = util_cfs + cpu_util_rt(rq);
-	if (type == FREQUENCY_UTIL)
-		util = uclamp_rq_util_with(rq, util, p);
-
-	dl_util = cpu_util_dl(rq);
+	util += cpu_util_dl(rq);
 
 	/*
-	 * For frequency selection we do not make cpu_util_dl() a permanent part
-	 * of this sum because we want to use cpu_bw_dl() later on, but we need
-	 * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such
-	 * that we select f_max when there is no idle time.
-	 *
-	 * NOTE: numerical errors or stop class might cause us to not quite hit
-	 * saturation when we should -- something for later.
+	 * The maximum hint is a soft bandwidth requirement, which can be lower
+	 * than the actual utilization because of uclamp_max requirements.
 	 */
-	if (util + dl_util >= max)
-		return max;
+	if (max)
+		*max = min(scale, uclamp_rq_get(rq, UCLAMP_MAX));
 
-	/*
-	 * OTOH, for energy computation we need the estimated running time, so
-	 * include util_dl and ignore dl_bw.
-	 */
-	if (type == ENERGY_UTIL)
-		util += dl_util;
+	if (util >= scale)
+		return scale;
 
 	/*
 	 * There is still idle time; further improve the number by using the
@@ -7535,28 +7534,15 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
 	 *   U' = irq + --------- * U
 	 *                  max
 	 */
-	util = scale_irq_capacity(util, irq, max);
+	util = scale_irq_capacity(util, irq, scale);
 	util += irq;
 
-	/*
-	 * Bandwidth required by DEADLINE must always be granted while, for
-	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
-	 * to gracefully reduce the frequency when no tasks show up for longer
-	 * periods of time.
-	 *
-	 * Ideally we would like to set bw_dl as min/guaranteed freq and util +
-	 * bw_dl as requested freq. However, cpufreq is not yet ready for such
-	 * an interface. So, we only do the latter for now.
-	 */
-	if (type == FREQUENCY_UTIL)
-		util += cpu_bw_dl(rq);
-
-	return min(max, util);
+	return min(scale, util);
 }
 
 unsigned long sched_cpu_util(int cpu)
 {
-	return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
+	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
 }
 #endif /* CONFIG_SMP */
 
```

kernel/sched/cpufreq_schedutil.c

Lines changed: 25 additions & 10 deletions

```diff
@@ -47,7 +47,7 @@ struct sugov_cpu {
 	u64			last_update;
 
 	unsigned long		util;
-	unsigned long		bw_dl;
+	unsigned long		bw_min;
 
 	/* The field below is for single-CPU policies only: */
 #ifdef CONFIG_NO_HZ_COMMON
@@ -143,7 +143,6 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
 
-	util = map_util_perf(util);
 	freq = map_util_freq(util, freq, max);
 
 	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
@@ -153,14 +152,30 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
+unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
+				       unsigned long min,
+				       unsigned long max)
+{
+	/* Add dvfs headroom to actual utilization */
+	actual = map_util_perf(actual);
+	/* Actually we don't need to target the max performance */
+	if (actual < max)
+		max = actual;
+
+	/*
+	 * Ensure at least minimum performance while providing more compute
+	 * capacity when possible.
+	 */
+	return max(min, max);
+}
+
 static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
-	unsigned long util = cpu_util_cfs_boost(sg_cpu->cpu);
-	struct rq *rq = cpu_rq(sg_cpu->cpu);
+	unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);
 
-	sg_cpu->bw_dl = cpu_bw_dl(rq);
-	sg_cpu->util = effective_cpu_util(sg_cpu->cpu, util,
-					  FREQUENCY_UTIL, NULL);
+	util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
+	sg_cpu->bw_min = min;
+	sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
 }
 
 /**
@@ -306,7 +321,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
  */
 static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
 {
-	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
+	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)
 		sg_cpu->sg_policy->limits_changed = true;
 }
 
@@ -407,8 +422,8 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 	    sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
 		sg_cpu->util = prev_util;
 
-	cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_dl),
-				   map_util_perf(sg_cpu->util), max_cap);
+	cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
+				   sg_cpu->util, max_cap);
 
 	sg_cpu->sg_policy->last_freq_update_time = time;
 }
```

kernel/sched/fair.c

Lines changed: 19 additions & 3 deletions

```diff
@@ -7793,7 +7793,7 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
 	for_each_cpu(cpu, pd_cpus) {
 		unsigned long util = cpu_util(cpu, p, -1, 0);
 
-		busy_time += effective_cpu_util(cpu, util, ENERGY_UTIL, NULL);
+		busy_time += effective_cpu_util(cpu, util, NULL, NULL);
 	}
 
 	eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
@@ -7816,7 +7816,7 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
 	for_each_cpu(cpu, pd_cpus) {
 		struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL;
 		unsigned long util = cpu_util(cpu, p, dst_cpu, 1);
-		unsigned long eff_util;
+		unsigned long eff_util, min, max;
 
 		/*
 		 * Performance domain frequency: utilization clamping
@@ -7825,7 +7825,23 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
 		 * NOTE: in case RT tasks are running, by default the
 		 * FREQUENCY_UTIL's utilization can be max OPP.
 		 */
-		eff_util = effective_cpu_util(cpu, util, FREQUENCY_UTIL, tsk);
+		eff_util = effective_cpu_util(cpu, util, &min, &max);
+
+		/* Task's uclamp can modify min and max value */
+		if (tsk && uclamp_is_used()) {
+			min = max(min, uclamp_eff_value(p, UCLAMP_MIN));
+
+			/*
+			 * If there is no active max uclamp constraint,
+			 * directly use task's one, otherwise keep max.
+			 */
+			if (uclamp_rq_is_idle(cpu_rq(cpu)))
+				max = uclamp_eff_value(p, UCLAMP_MAX);
+			else
+				max = max(max, uclamp_eff_value(p, UCLAMP_MAX));
+		}
+
+		eff_util = sugov_effective_cpu_perf(cpu, eff_util, min, max);
 		max_util = max(max_util, eff_util);
 	}
 
```

kernel/sched/sched.h

Lines changed: 7 additions & 17 deletions

```diff
@@ -2994,24 +2994,14 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #endif
 
 #ifdef CONFIG_SMP
-/**
- * enum cpu_util_type - CPU utilization type
- * @FREQUENCY_UTIL:	Utilization used to select frequency
- * @ENERGY_UTIL:	Utilization used during energy calculation
- *
- * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
- * need to be aggregated differently depending on the usage made of them. This
- * enum is used within effective_cpu_util() to differentiate the types of
- * utilization expected by the callers, and adjust the aggregation accordingly.
- */
-enum cpu_util_type {
-	FREQUENCY_UTIL,
-	ENERGY_UTIL,
-};
-
 unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
-				 enum cpu_util_type type,
-				 struct task_struct *p);
+				 unsigned long *min,
+				 unsigned long *max);
+
+unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
+				       unsigned long min,
+				       unsigned long max);
+
 
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the
```
