
Commit 3488af0

sjp38 authored and akpm00 committed
mm/damon/core: handle zero {aggregation,ops_update} intervals
Patch series "mm/damon/core: fix handling of zero non-sampling intervals".

DAMON's internal intervals accounting logic does not correctly handle non-sampling intervals of zero, because of a wrong assumption. This could cause unexpected monitoring behavior, and even an indefinite hang of DAMON sysfs interface user threads in the case of a zero aggregation interval. Fix those by updating the intervals accounting logic. For details of the root cause and solutions, please refer to the commit messages of the fixes.

This patch (of 2):

DAMON's logic for determining whether it is time to do aggregation and ops update assumes next_{aggregation,ops_update}_sis are always set larger than the current passed_sample_intervals. It therefore further assumes that incrementing passed_sample_intervals every sampling interval will eventually make it reach next_{aggregation,ops_update}_sis. Based on that, the logic takes the action and updates next_{aggregation,ops_update}_sis only if passed_sample_intervals is equal to the respective count.

If the aggregation interval or the ops update interval is zero, however, next_aggregation_sis or next_ops_update_sis is set equal to the current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before the next_{aggregation,ops_update}_sis check. Hence, passed_sample_intervals becomes larger than next_{aggregation,ops_update}_sis, and the logic keeps deciding it is not the time to do the action and update next_{aggregation,ops_update}_sis, until an overflow happens. In other words, DAMON effectively stops doing aggregations or ops updates forever, and users cannot get monitoring results.

Based on the documents and common sense, a reasonable behavior for such inputs is doing an aggregation and an ops update every sampling interval. Handle the case by removing the assumption.

Note that this can cause a real problem for DAMON sysfs interface users in the case of a zero aggregation interval. When a user starts DAMON with a zero aggregation interval and requests online DAMON parameter tuning via the DAMON sysfs interface, the request is handled by the aggregation callback. Until the callback finishes the work, the user who requested the online tuning just waits. Hence, the user will be stuck until passed_sample_intervals overflows.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4472edf ("mm/damon/core: use number of passed access sampling as a timer")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [6.7.x]
Signed-off-by: Andrew Morton <[email protected]>
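
To give a rough sense of what "until passed_sample_intervals overflows" can mean in wall-clock time, the following is a back-of-the-envelope sketch, not kernel code. It assumes the counter is an unsigned integer of the platform's `unsigned long` width, that it advances once per sampling interval, and that the sampling interval is 5000 microseconds; all three are assumptions made for illustration rather than facts stated in this commit. The helper name `seconds_until_wrap` is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: approximate seconds needed for an unsigned counter
 * of `counter_bits` bits to wrap around, if it is incremented once every
 * `sample_interval_us` microseconds. The stalled counter is only one step
 * ahead of its target, so it needs about 2^counter_bits increments.
 */
static double seconds_until_wrap(unsigned int counter_bits,
		double sample_interval_us)
{
	double nr_increments = 1.0;
	unsigned int i;

	assert(sample_interval_us > 0.0);

	/* Compute 2^counter_bits without depending on libm. */
	for (i = 0; i < counter_bits; i++)
		nr_increments *= 2.0;

	return nr_increments * sample_interval_us / 1e6;
}
```

Under those assumptions, `seconds_until_wrap(32, 5000.0) / 86400.0` is a little over 248 days, and `seconds_until_wrap(64, 5000.0)` corresponds to billions of years, which is why the message above treats the stall as effectively permanent.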
1 parent faa242b commit 3488af0


1 file changed, +3 -3 lines changed


mm/damon/core.c

Lines changed: 3 additions & 3 deletions
@@ -2000,7 +2000,7 @@ static int kdamond_fn(void *data)
 		if (ctx->ops.check_accesses)
 			max_nr_accesses = ctx->ops.check_accesses(ctx);
 
-		if (ctx->passed_sample_intervals == next_aggregation_sis) {
+		if (ctx->passed_sample_intervals >= next_aggregation_sis) {
 			kdamond_merge_regions(ctx,
 					max_nr_accesses / 10,
 					sz_limit);
@@ -2018,7 +2018,7 @@ static int kdamond_fn(void *data)
 
 		sample_interval = ctx->attrs.sample_interval ?
 			ctx->attrs.sample_interval : 1;
-		if (ctx->passed_sample_intervals == next_aggregation_sis) {
+		if (ctx->passed_sample_intervals >= next_aggregation_sis) {
 			ctx->next_aggregation_sis = next_aggregation_sis +
 				ctx->attrs.aggr_interval / sample_interval;
 
@@ -2028,7 +2028,7 @@ static int kdamond_fn(void *data)
 				ctx->ops.reset_aggregated(ctx);
 		}
 
-		if (ctx->passed_sample_intervals == next_ops_update_sis) {
+		if (ctx->passed_sample_intervals >= next_ops_update_sis) {
 			ctx->next_ops_update_sis = next_ops_update_sis +
 				ctx->attrs.ops_update_interval /
 				sample_interval;
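
The effect of switching the comparisons from `==` to `>=` can be illustrated with a small user-space model. This is a minimal sketch and not the kernel implementation: `struct model_ctx`, `count_aggregations()` and `check_model()` are hypothetical names, and the model only mirrors the accounting described in the commit message (the deadline is first set to the current count plus `aggr_interval / sample_interval`, and the counter is incremented before the check).

```c
#include <assert.h>

/* Hypothetical stand-in for the two struct damon_ctx fields involved. */
struct model_ctx {
	unsigned long passed_sample_intervals;
	unsigned long next_aggregation_sis;
};

/*
 * Run nr_samples sampling intervals and return how many times the
 * aggregation branch was taken. use_ge selects the fixed (>=) check;
 * otherwise the old (==) check is used.
 */
static unsigned long count_aggregations(int use_ge,
		unsigned long aggr_interval, unsigned long sample_interval,
		unsigned long nr_samples)
{
	struct model_ctx ctx = { 0, 0 };
	unsigned long nr_aggregations = 0;
	unsigned long i;

	/* Same guard as in the diff: avoid dividing by zero. */
	sample_interval = sample_interval ? sample_interval : 1;
	ctx.next_aggregation_sis = ctx.passed_sample_intervals +
		aggr_interval / sample_interval;

	for (i = 0; i < nr_samples; i++) {
		unsigned long next_aggregation_sis = ctx.next_aggregation_sis;
		int due;

		/* The counter is incremented before the check. */
		ctx.passed_sample_intervals++;
		due = use_ge ?
			ctx.passed_sample_intervals >= next_aggregation_sis :
			ctx.passed_sample_intervals == next_aggregation_sis;
		if (due) {
			nr_aggregations++;
			ctx.next_aggregation_sis = next_aggregation_sis +
				aggr_interval / sample_interval;
		}
	}
	return nr_aggregations;
}

/* Sanity checks encoding the behavior the commit message describes. */
static void check_model(void)
{
	/* Zero interval, old check: the branch is never taken. */
	assert(count_aggregations(0, 0, 5000, 1000) == 0);
	/* Zero interval, fixed check: taken every sampling interval. */
	assert(count_aggregations(1, 0, 5000, 1000) == 1000);
	/* Non-zero interval: both checks agree (every 20th sample). */
	assert(count_aggregations(0, 100000, 5000, 1000) == 50);
	assert(count_aggregations(1, 100000, 5000, 1000) == 50);
}
```

In this model the old check never fires for a zero interval because the counter is already one past the deadline on the first comparison, while the fixed check fires on every sample. For a non-zero interval the counter reaches each deadline exactly, so both comparisons give the same result.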
