Skip to content

Commit bce29ac

Browse files
Daniel Bristot de Oliveirarostedt
authored andcommitted
trace: Add osnoise tracer
In the context of high-performance computing (HPC), the Operating System Noise (*osnoise*) refers to the interference experienced by an application due to activities inside the operating system. In the context of Linux, NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the system. Moreover, hardware-related jobs can also cause noise, for example, via SMIs. The osnoise tracer leverages the hwlat_detector by running a similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing all the sources of *osnoise* during its execution. Using the same approach of hwlat, osnoise takes note of the entry and exit point of any source of interferences, increasing a per-cpu interference counter. The osnoise tracer also saves an interference counter for each source of interference. The interference counter for NMI, IRQs, SoftIRQs, and threads is increased anytime the tool observes these interferences' entry events. When a noise happens without any interference from the operating system level, the hardware noise counter increases, pointing to a hardware-related noise. In this way, osnoise can account for any source of interference. At the end of the period, the osnoise tracer prints the sum of all noise, the max single noise, the percentage of CPU available for the thread, and the counters for the noise sources. Usage Write the ASCII text "osnoise" into the current_tracer file of the tracing system (generally mounted at /sys/kernel/tracing). For example:: [root@f32 ~]# cd /sys/kernel/tracing/ [root@f32 tracing]# echo osnoise > current_tracer It is possible to follow the trace by reading the trace trace file:: [root@f32 tracing]# cat trace # tracer: osnoise # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth MAX # || / SINGLE Interference counters: # |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+ # TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD # | | | |||| | | | | | | | | | | <...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1 <...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3 <...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21 <...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0 <...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41 <...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2 <...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1 <...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19 In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the tracer prints a message at the end of each period for each CPU that is running an osnoise/CPU thread. The osnoise specific fields report: - The RUNTIME IN USE reports the amount of time in microseconds that the osnoise thread kept looping reading the time. - The NOISE IN US reports the sum of noise in microseconds observed by the osnoise tracer during the associated runtime. - The % OF CPU AVAILABLE reports the percentage of CPU available for the osnoise thread during the runtime window. - The MAX SINGLE NOISE IN US reports the maximum single noise observed during the runtime window. - The Interference counters display how many each of the respective interference happened during the runtime window. Note that the example above shows a high number of HW noise samples. The reason being is that this sample was taken on a virtual machine, and the host interference is detected as a hardware interference. Tracer options The tracer has a set of options inside the osnoise directory, they are: - osnoise/cpus: CPUs at which a osnoise thread will execute. - osnoise/period_us: the period of the osnoise thread. - osnoise/runtime_us: how long an osnoise thread will look for noise. - osnoise/stop_tracing_us: stop the system tracing if a single noise higher than the configured value happens. Writing 0 disables this option. - osnoise/stop_tracing_total_us: stop the system tracing if total noise higher than the configured value happens. Writing 0 disables this option. - tracing_threshold: the minimum delta between two time() reads to be considered as noise, in us. When set to 0, the default value will be used, which is currently 5 us. Additional Tracing In addition to the tracer, a set of tracepoints were added to facilitate the identification of the osnoise source. - osnoise:sample_threshold: printed anytime a noise is higher than the configurable tolerance_ns. - osnoise:nmi_noise: noise from NMI, including the duration. - osnoise:irq_noise: noise from an IRQ, including the duration. - osnoise:softirq_noise: noise from a SoftIRQ, including the duration. - osnoise:thread_noise: noise from a thread, including the duration. Note that all the values are *net values*. For example, if while osnoise is running, another thread preempts the osnoise thread, it will start a thread_noise duration at the start. Then, an IRQ takes place, preempting the thread_noise, starting a irq_noise. When the IRQ ends its execution, it will compute its duration, and this duration will be subtracted from the thread_noise, in such a way as to avoid the double accounting of the IRQ execution. This logic is valid for all sources of noise. Here is one example of the usage of these tracepoints:: osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2 In this example, a noise sample of 8 microseconds was reported in the last line, pointing to two interferences. Looking backward in the trace, the two previous entries were about the migration thread running after a timer IRQ execution. The first event is not part of the noise because it took place one millisecond before. It is worth noticing that the sum of the duration reported in the tracepoints is smaller than eight us reported in the sample_threshold. The reason roots in the overhead of the entry and exit code that happens before and after any interference execution. This justifies the dual approach: measuring thread and tracing. Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com Cc: Phil Auld <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Kate Carcia <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Alexandre Chartre <[email protected]> Cc: Clark Willaims <[email protected]> Cc: John Kacur <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Daniel Bristot de Oliveira <[email protected]> [ Made the following functions static: trace_irqentry_callback() trace_irqexit_callback() trace_intel_irqentry_callback() trace_intel_irqexit_callback() Added to include/trace.h: osnoise_arch_register() osnoise_arch_unregister() Fixed define logic for LATENCY_FS_NOTIFY Reported-by: kernel test robot <[email protected]> ] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
1 parent 6880c98 commit bce29ac

File tree

13 files changed

+2072
-4
lines changed

13 files changed

+2072
-4
lines changed

Documentation/trace/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ Linux Tracing Technologies
2323
histogram-design
2424
boottime-trace
2525
hwlat_detector
26+
osnoise-tracer
2627
intel_th
2728
ring-buffer-design
2829
stm
Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,152 @@
1+
==============
2+
OSNOISE Tracer
3+
==============
4+
5+
In the context of high-performance computing (HPC), the Operating System
6+
Noise (*osnoise*) refers to the interference experienced by an application
7+
due to activities inside the operating system. In the context of Linux,
8+
NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
9+
system. Moreover, hardware-related jobs can also cause noise, for example,
10+
via SMIs.
11+
12+
hwlat_detector is one of the tools used to identify the most complex
13+
source of noise: *hardware noise*.
14+
15+
In a nutshell, the hwlat_detector creates a thread that runs
16+
periodically for a given period. At the beginning of a period, the thread
17+
disables interrupt and starts sampling. While running, the hwlatd
18+
thread reads the time in a loop. As interrupts are disabled, threads,
19+
IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the
20+
cause of any gap between two different reads of the time roots either on
21+
NMI or in the hardware itself. At the end of the period, hwlatd enables
22+
interrupts and reports the max observed gap between the reads. It also
23+
prints a NMI occurrence counter. If the output does not report NMI
24+
executions, the user can conclude that the hardware is the culprit for
25+
the latency. The hwlat detects the NMI execution by observing
26+
the entry and exit of a NMI.
27+
28+
The osnoise tracer leverages the hwlat_detector by running a
29+
similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing
30+
all the sources of *osnoise* during its execution. Using the same approach
31+
of hwlat, osnoise takes note of the entry and exit point of any
32+
source of interferences, increasing a per-cpu interference counter. The
33+
osnoise tracer also saves an interference counter for each source of
34+
interference. The interference counter for NMI, IRQs, SoftIRQs, and
35+
threads is increased anytime the tool observes these interferences' entry
36+
events. When a noise happens without any interference from the operating
37+
system level, the hardware noise counter increases, pointing to a
38+
hardware-related noise. In this way, osnoise can account for any
39+
source of interference. At the end of the period, the osnoise tracer
40+
prints the sum of all noise, the max single noise, the percentage of CPU
41+
available for the thread, and the counters for the noise sources.
42+
43+
Usage
44+
-----
45+
46+
Write the ASCII text "osnoise" into the current_tracer file of the
47+
tracing system (generally mounted at /sys/kernel/tracing).
48+
49+
For example::
50+
51+
[root@f32 ~]# cd /sys/kernel/tracing/
52+
[root@f32 tracing]# echo osnoise > current_tracer
53+
54+
It is possible to follow the trace by reading the trace trace file::
55+
56+
[root@f32 tracing]# cat trace
57+
# tracer: osnoise
58+
#
59+
# _-----=> irqs-off
60+
# / _----=> need-resched
61+
# | / _---=> hardirq/softirq
62+
# || / _--=> preempt-depth MAX
63+
# || / SINGLE Interference counters:
64+
# |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+
65+
# TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD
66+
# | | | |||| | | | | | | | | | |
67+
<...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1
68+
<...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3
69+
<...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21
70+
<...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0
71+
<...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41
72+
<...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2
73+
<...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1
74+
<...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19
75+
76+
In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
77+
tracer prints a message at the end of each period for each CPU that is
78+
running an osnoise/ thread. The osnoise specific fields report:
79+
80+
- The RUNTIME IN USE reports the amount of time in microseconds that
81+
the osnoise thread kept looping reading the time.
82+
- The NOISE IN US reports the sum of noise in microseconds observed
83+
by the osnoise tracer during the associated runtime.
84+
- The % OF CPU AVAILABLE reports the percentage of CPU available for
85+
the osnoise thread during the runtime window.
86+
- The MAX SINGLE NOISE IN US reports the maximum single noise observed
87+
during the runtime window.
88+
- The Interference counters display how many each of the respective
89+
interference happened during the runtime window.
90+
91+
Note that the example above shows a high number of HW noise samples.
92+
The reason being is that this sample was taken on a virtual machine,
93+
and the host interference is detected as a hardware interference.
94+
95+
Tracer options
96+
---------------------
97+
98+
The tracer has a set of options inside the osnoise directory, they are:
99+
100+
- osnoise/cpus: CPUs at which a osnoise thread will execute.
101+
- osnoise/period_us: the period of the osnoise thread.
102+
- osnoise/runtime_us: how long an osnoise thread will look for noise.
103+
- osnoise/stop_tracing_us: stop the system tracing if a single noise
104+
higher than the configured value happens. Writing 0 disables this
105+
option.
106+
- osnoise/stop_tracing_total_us: stop the system tracing if total noise
107+
higher than the configured value happens. Writing 0 disables this
108+
option.
109+
- tracing_threshold: the minimum delta between two time() reads to be
110+
considered as noise, in us. When set to 0, the default value will
111+
will be used, which is currently 5 us.
112+
113+
Additional Tracing
114+
------------------
115+
116+
In addition to the tracer, a set of tracepoints were added to
117+
facilitate the identification of the osnoise source.
118+
119+
- osnoise:sample_threshold: printed anytime a noise is higher than
120+
the configurable tolerance_ns.
121+
- osnoise:nmi_noise: noise from NMI, including the duration.
122+
- osnoise:irq_noise: noise from an IRQ, including the duration.
123+
- osnoise:softirq_noise: noise from a SoftIRQ, including the
124+
duration.
125+
- osnoise:thread_noise: noise from a thread, including the duration.
126+
127+
Note that all the values are *net values*. For example, if while osnoise
128+
is running, another thread preempts the osnoise thread, it will start a
129+
thread_noise duration at the start. Then, an IRQ takes place, preempting
130+
the thread_noise, starting a irq_noise. When the IRQ ends its execution,
131+
it will compute its duration, and this duration will be subtracted from
132+
the thread_noise, in such a way as to avoid the double accounting of the
133+
IRQ execution. This logic is valid for all sources of noise.
134+
135+
Here is one example of the usage of these tracepoints::
136+
137+
osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
138+
osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
139+
migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
140+
osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8812 ns interferences 2
141+
142+
In this example, a noise sample of 8 microseconds was reported in the last
143+
line, pointing to two interferences. Looking backward in the trace, the
144+
two previous entries were about the migration thread running after a
145+
timer IRQ execution. The first event is not part of the noise because
146+
it took place one millisecond before.
147+
148+
It is worth noticing that the sum of the duration reported in the
149+
tracepoints is smaller than eight us reported in the sample_threshold.
150+
The reason roots in the overhead of the entry and exit code that happens
151+
before and after any interference execution. This justifies the dual
152+
approach: measuring thread and tracing.

arch/x86/kernel/Makefile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += ftrace_$(BITS).o
102102
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
103103
obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o
104104
obj-$(CONFIG_X86_TSC) += trace_clock.o
105+
obj-$(CONFIG_TRACING) += trace.o
105106
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
106107
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
107108
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o

0 commit comments

Comments
 (0)