- 
                Notifications
    You must be signed in to change notification settings 
- Fork 8.2k
Add Instrumentation subsystem #70253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cbedb8c    to
    0580526      
    Compare
  
    Currently, CORTEX_M_DWT option does not refresh its value when TIMING_FUNCTIONS, on which it depends, is selected via menuconfig. This occurs because during the early west build steps the aarch32-specific Kconfig file is parsed with TIMING_FUNCTIONS=n, resulting in CORTEX_M_DWT being set to 'n' (not set) in .config. Later, when the user runs west with -t menuconfig and sets TIMING_FUNCTIONS=y using the menus or any other option that selects TIMING_FUNCTIONS=y indirectly, CORTEX_M_DWT is already set to 'n' and remains set as such. That is inconvenient, and, moreoever, since CORTEX_M_DWT gets set if TIMING_FUNCTIONS is set in prj.conf, the current behavior is not consistent, i.e. via the menuconfig setting TIMING_FUNCTIONS=y doesn't imply selecting CORTEX_M_DWT automatically. This commit addresses this issue by selecting CORTEX_M_DWT=y if TIMING_FUNCTIONS becomes set to 'y', otherwise allowing it to be set via the prompt. This is possible by making the CORTEX_M_DWT prompt conditional. Hence, in menuconfig, if TIMING_FUNCTIONS gets set CORTEX_M_DWT is set automatically. If TIMING_FUNCTIONS is not set, a prompt is presented to allow selecting CORTEX_M_DWT manually. Signed-off-by: Gustavo Romero <[email protected]>
Adds a new compiler flag to enable function instrumentation injection at compile time for compilers that support it. Adds support for this flag to GCC when `CONFIG_INSTRUMENTATION` is enabled, setting the `-finstrument-functions` flag during compilation. Signed-off-by: Kevin Townsend <[email protected]> Signed-off-by: Gustavo Romero <[email protected]>
Adds a new '__no_instrumentation__' preprocessor definition for GCC, mapping it to '__attribute__((__no_instrument_function__))'. This new macro is used to disable code instrumentation (enabled when '-finstrument-functions' flag is passed to the compiler) for specific functions when placed before a function definition. Signed-off-by: Kevin Townsend <[email protected]> Signed-off-by: Gustavo Romero <[email protected]>
This commit adds the instrumentation subsystem. Signed-off-by: Gustavo Romero <[email protected]> Signed-off-by: Kevin Townsend <[email protected]>
This commit enables users to pass -DQEMU_SOCKET=y for 'west' when selecting to run the build on a QEMU VM. This allows the target's serial port to be served to the host via a TCP/IP port for convenience, easing communication with the target from the external world. Signed-off-by: Erik Schilling <[email protected]> Signed-off-by: Gustavo Romero <[email protected]>
This commit adds 'zaru', a CLI tool for the instrumentation subsystem. Signed-off-by: Gustavo Romero <[email protected]> Signed-off-by: Kevin Townsend <[email protected]> Signed-off-by: Erik Schilling <[email protected]>
Adds a sample to show the tracing and profiling features when the instrumentation subsystem is enabled (CONFIG_INSTRUMENTATION=y). The sample consists in two threads in ping-pong mode, taking turns to execute loops that spend some CPU cycles. This allows a couple of context switches and so to demonstrate tracing of thread scheduling events. The sample also has an example on how to capture and show traces and profile info via the CLI tool zaru and also how to export the traces so they can be loaded in the Perfetto Trace Viewer tool. Signed-off-by: Gustavo Romero <[email protected]> Signed-off-by: Kevin Townsend <[email protected]>
0580526    to
    30ffae9      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for continuing this! Some comments on having a glance at it, haven't tried it but something I will give a try when I have some time
| /** CPU number. */ | ||
| uint8_t cpu : 3; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have boards with > 8 CPU cores in them, how would this work with those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have boards with > 8 CPU cores in them, how would this work with those?
But we do have a limit to 12 CPU max anyway, or do we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That 12 limit is going to be... challenging. Limiting to 8 even more so.
| void *thread_id; | ||
| #ifdef CONFIG_THREAD_NAME | ||
| /* Thread name (that can be compacted with the correlate lookup table). */ | ||
| char thread_name[20]; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably use the Kconfig for the length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is required and not a "should probably".  THREAD_MAX_NAME_LEN  is already larger than 20, and gets set even larger in some conditionals.
| void instr_dump_deltas_uart(void); | ||
|  | ||
| /** | ||
| * @brief Shared callabck handler to process entry/exit events. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*callback
| struct instr_record *instr_rb_get_item(struct instr_record *record); | ||
| int instr_rb_get_item_wrapping(struct instr_record *record); | ||
|  | ||
| /* TODO(gromero): get_item_size() probably not for external use. */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need fixing?
|  | ||
| #include <zephyr/instrumentation/instrumentation.h> | ||
|  | ||
| #define DEBUG 0 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use Kconfig
| k_thread_name_copy(curr_thread, record->context.thread_name, 20); | ||
| } else { /* Not in a thread context */ | ||
| strncpy(record->context.thread_name, "thread-none", 12); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be using sizeof() of the variable and #define the none
| /* | ||
| * No zero-copy way since buffer wraps. | ||
| * | ||
| * TODO(gromero): refactor below as a | ||
| * single helper function? | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need addressing?
| * Always finish instrumentation once 'main' returns. | ||
| */ | ||
| if (callee == main) { | ||
| instr_disable(); | ||
| return; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that you can't do instrumentation on a custom function e.g. application starts with an empty main and you want to do it based on a different thread?
| /* | ||
| * TODO(gromero): add check for claim being called before this func ? | ||
| * i.e. it's forbidden to call this function before calling | ||
| * get_item_claim(). | ||
| */ | ||
|  | ||
| /* | ||
| * TODO(gromero): can item size be deduced from item at 'head', so arg | ||
| * 'record' can be eliminated? | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a few of these, do they need addressing?
| /* Wrap buffer. */ | ||
| head = 0; | ||
|  | ||
| /* Second, write remaining item bytes not written yet. */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a query on these, assuming that this is running in a multi-core system, how does this protect against 2 cores trying to do things with the ring buffer at once?
| 
 Thank you so much for another review round! @nordicjm @kartben While I'm going through the comments, I'd like to point out that now it's super easy to experiment with it with a QEMU VM: In one terminal, build and run the sample for the  In another terminal, set the tracing/profiling section and collect data using  Then, for instance, open the traces in Perfetto, i.e. access Perfetto UI -> "Open trace file" and load  For profiling info, use  HTH! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort in getting all of these changes and support done! I'm excited to see this change coming in. A few minor questions and clarifications needed as it is today.
I'm a bit concerned how this will work on multiple core systems, as called out by other reviewers as well, but feel that can be a follow-up to enable as any challenges are better understood.
| @@ -0,0 +1,198 @@ | |||
| /* | |||
| * Copyright 2023 Linaro | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, since it's now 2024, do these need to be updated?
| /** CPU number. */ | ||
| uint8_t cpu : 3; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That 12 limit is going to be... challenging. Limiting to 8 even more so.
| /** CPU number. */ | ||
| uint8_t cpu : 3; | ||
| /** Reserved for future use. */ | ||
| uint8_t rsvd : 2; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed it in the RFC, what future plans do you foresee for the rsvd bits here?
| void *thread_id; | ||
| #ifdef CONFIG_THREAD_NAME | ||
| /* Thread name (that can be compacted with the correlate lookup table). */ | ||
| char thread_name[20]; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is required and not a "should probably".  THREAD_MAX_NAME_LEN  is already larger than 20, and gets set even larger in some conditionals.
| */ | ||
|  | ||
| #ifndef _INSTRUMENTATION_TIMESTAMP_H | ||
| #define _INSTRUMENTATION_TIMESTAMP_H | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier public headers followed a consistent format, this one uses another. Any specific reason why?
| endif() | ||
|  | ||
| if (CONFIG_INSTRUMENTATION_EXCLUDE_FILE_LIST) | ||
| zephyr_compile_options(-finstrument-functions-exclude-function-list=${CONFIG_INSTRUMENTATION_EXCLUDE_FILE_LIST}) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these flags backwards?  -functions-excluded-funciton-list= be part of line 20 and functions-exclude-file-list= part of 24?  If they are not backwards, it may be worth adding a comment to explicitly call out why these are correctly located.
| The function name to be matched is its user-visible name. The match is | ||
| done on substrings: if the function name in the list is a substring of | ||
| a function name, it is considered to be a match. The function names in | ||
| the list are separate by a comman, for instance: sym0,sym1,... | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additionally, typically a space after a , sym0, sym1, ...
| if (address) { | ||
| printk("trigger: %p\n", address); | ||
| } else { | ||
| printk("trigger not set.\n"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not trigger:   same question applies to line 67
| if (endptr != beginptr) { | ||
| instr_set_stop_func((void *)address); | ||
| } else { | ||
| printk("invalid argument for 'stopper' in: '%s'\n", cmd); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong on a usage model here.  I envisioning an automated usage here that would have UART scraping as part of the process to generate profiling/tracing trends over time.  If that is indeed the case, does it make sense to keep these kinds of messages all the same formatting to enable scraping tools easier?  For example: stopper: invalid argument in %s would match what is going on for line 65
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding my comments about doxygen comments missing, you may check doc coverage here: https://builds.zephyrproject.io/zephyr/pr/70253/api-coverage/zephyr/instrumentation/index.html
Thanks!
| /* | ||
| * This sample shows the instrumentation subsystem tracing and profiling | ||
| * features. It basically consists in two threads in a ping-pong mode, taking | ||
| * turns to execute loops that spend some CPU cycles. | ||
| * | ||
| * After one flashes this sample to the target, it must be possible to | ||
| * collect and visualize traces and profiling info using the instrumentation CLI | ||
| * tool, zaru. Please _note_ that this subsystem uses the retained_mem driver, | ||
| * hence it's necessary to add the proper devicetree overlay for the target | ||
| * board (see './boards/b_u585i_iot02a.overlay' in this dir. as a reference). | ||
| * | ||
| * Example: | ||
| * | ||
| * $ export ZEPHYR_BASE=~/zephyrproject/zephyr | ||
| * | ||
| * $ west build -p always -b b_u585i_iot02a samples/subsys/instrumentation -t flash | ||
| * (wait ~2 seconds so the sample finishes 2 rounds of ping-pong between 'main' | ||
| * and 'thread_A') | ||
| * | ||
| * (check instrumentation status) | ||
| * $ scripts/zaru.py status | ||
| * | ||
| * (set the tracing / profiling location; in this sample the function | ||
| * 'get_sem_and_exec_function' is the one interesting to allow the observation | ||
| * of context switches) | ||
| * $ scripts/zaru.py trace -v -c get_sem_and_exec_function | ||
| * | ||
| * (reboot target so tracing/profiling at the location is effective) | ||
| * $ scripts/zaru.py reboot | ||
| * (wait ~2 seconds so the sample finishes 2 rounds of ping-pong between 'main' | ||
| * and 'thread_A') | ||
| * | ||
| * (get traces) | ||
| * $ scripts/zaru.py trace -v | ||
| * | ||
| * (.. and get profile) | ||
| * $ scripts/zaru.py profile -v -n 10 | ||
| * | ||
| * Or alternatively, export the traces to Perfetto: | ||
| * | ||
| * (it's necessary to reboot because 'zaru.py trace' dumped the buffer and it's | ||
| * now empty) | ||
| * $ scripts/zaru.py reboot | ||
| * (wait ~2 seconds so the sample finishes 2 rounds of ping-pong between 'main' | ||
| * and 'thread_A') | ||
| * | ||
| * $ scripts/zaru.py trace -v --perfetto --output perfetto_zephyr.json | ||
| * (then go to http://perfetto.dev, Trace Viewer, and load perfetto_zephyr.json) | ||
| * | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an actual README for the code sample. It looks like you've done most of the work of documenting here already, so hopefully this won't be a lot of work :)
https://docs.zephyrproject.org/latest/samples/sample_definition_and_criteria.html#samples-must-be-documented
Let me know if I can do anything to help
| INSTR_EVENT_ENTRY = 0, /* Callee entry event record, followed by instr_event. */ | ||
| INSTR_EVENT_EXIT, /* Callee exit event record, followed by instr_event. */ | ||
| INSTR_EVENT_PROFILE, /* Profile events */ | ||
| INSTR_EVENT_SCHED_IN, /* Thread switched in scheduler event */ | ||
| INSTR_EVENT_SCHED_OUT, /* Thread switched out scheduler event */ | ||
| INSTR_EVENT_NUM, /* Add more events above this one */ | ||
| INSTR_EVENT_INVALID /* Invalid or no event generated after promotion */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this is API, it needs proper doxygen comments
| INSTR_EVENT_ENTRY = 0, /* Callee entry event record, followed by instr_event. */ | |
| INSTR_EVENT_EXIT, /* Callee exit event record, followed by instr_event. */ | |
| INSTR_EVENT_PROFILE, /* Profile events */ | |
| INSTR_EVENT_SCHED_IN, /* Thread switched in scheduler event */ | |
| INSTR_EVENT_SCHED_OUT, /* Thread switched out scheduler event */ | |
| INSTR_EVENT_NUM, /* Add more events above this one */ | |
| INSTR_EVENT_INVALID /* Invalid or no event generated after promotion */ | |
| INSTR_EVENT_ENTRY = 0, /**< Callee entry event record, followed by instr_event. */ | |
| INSTR_EVENT_EXIT, /**< Callee exit event record, followed by instr_event. */ | |
| INSTR_EVENT_PROFILE, /**< Profile events */ | |
| INSTR_EVENT_SCHED_IN, /**< Thread switched in scheduler event */ | |
| INSTR_EVENT_SCHED_OUT, /**< Thread switched out scheduler event */ | |
| INSTR_EVENT_NUM, /**< Add more events above this one */ | |
| INSTR_EVENT_INVALID /**< Invalid or no event generated after promotion */ | 
| * @brief Header for the event records. | ||
| */ | ||
| struct instr_header { | ||
| /* Event type */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /* Event type */ | |
| /** Event type */ | 
| enum instr_event_types type; | ||
| } __packed; | ||
|  | ||
| struct instr_event_context { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| struct instr_event_context { | |
| /** | |
| * @brief.... | |
| */ | |
| struct instr_event_context { | 
| /** | ||
| * @brief Checks if tracing feature is available. | ||
| * | ||
| */ | ||
| bool instr_tracing_supported(void); | ||
|  | ||
| /** | ||
| * @brief Checks if profiling feature is available. | ||
| * | ||
| */ | ||
| bool instr_profiling_supported(void); | ||
|  | ||
| /** | ||
| * @brief Checks if subsystem is ready to be initialized. Must called be before | ||
| * instr_init(). | ||
| */ | ||
| bool instr_fundamentals_initialized(void); | ||
|  | ||
| /** | ||
| * @brief Performs initialisation required by the system. | ||
| */ | ||
| int instr_init(void); | ||
|  | ||
| /** | ||
| * @brief Tells if instrumentation subsystem is properly initialized. | ||
| */ | ||
| bool instr_initialized(void); | ||
|  | ||
| /** | ||
| * @brief Tells if instrumentation is enabled, i.e. can be turned on. | ||
| */ | ||
| bool instr_enabled(void); | ||
|  | ||
| /** | ||
| * @brief Enables instrumentation. | ||
| */ | ||
| int instr_enable(void); | ||
|  | ||
| /** | ||
| * @brief Disables instrumentation. | ||
| */ | ||
| int instr_disable(void); | ||
|  | ||
| /** | ||
| * @brief Turns on instrumentation (start recording events). | ||
| */ | ||
| int instr_turn_on(void); | ||
|  | ||
| /** | ||
| * @brief Turns off instrumentation (stop recording events). | ||
| */ | ||
| int instr_turn_off(void); | ||
|  | ||
| /** | ||
| * @brief Tells if instrumentation is turned on. | ||
| */ | ||
| bool instr_turned_on(void); | ||
|  | ||
| /** | ||
| * @brief Tells if instrumentation can collect traces. | ||
| */ | ||
| bool instr_trace_enabled(void); | ||
|  | ||
| /** | ||
| * @brief Tells if instrumentation can collect profile info. | ||
| */ | ||
| bool instr_profile_enabled(void); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document return values (@return or @retval)
| void instr_rb_init(void); | ||
|  | ||
| struct instr_record *instr_rb_put_item_claim(enum instr_event_types type); | ||
| void instr_rb_put_item_finish(enum instr_event_types type); | ||
| int instr_rb_put_item_wrapping(struct instr_record *record); | ||
|  | ||
| struct instr_record *instr_rb_get_item_claim(void); | ||
| void instr_rb_get_item_finish(struct instr_record *record); | ||
| struct instr_record *instr_rb_get_item(struct instr_record *record); | ||
| int instr_rb_get_item_wrapping(struct instr_record *record); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be documented? or are they meant to be internal API?
| /** | ||
| * @brief Initialize timestamp | ||
| * | ||
| */ | ||
| int instr_timestamp_init(void); | ||
|  | ||
| /** | ||
| * @brief Get current timestamp in nanoseconds | ||
| * | ||
| */ | ||
| uint64_t instr_timestamp_ns(void); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing @return
| @gromero are you still interested in working on this PR? Thanks! | 
| 
 @kartben Hi Benjamin! YES, I am. Just being hard to find the cycles to address the last comments. I want to ask if it's ok to merge the code at first without support for multiple CPUs. That simplifies things a bit at the moment. If so, then it would remain the use of Zephyr ring buffer for simplification, so only one task would stay on my side :) | 
| 
 Yes, I think that would be fine, but will let others chime in too. There are also other pending request for changes that would need to be addressed. Thanks! | 
| This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. | 
| @erwango Thanks for removing the  | 
| This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. | 
| @gromero Hi, Gustavo! Thank you indeed for your efforts! And a couple of cents from me: 
 | 
| This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. | 
Hi Community,
This series introduces the Instrumentation subsystem and its associated CLI tool
zaru. It leverages the compiler function instrumentation flags to collect tracing and profiling information in Zephyr. For details about it, please see [0].The Instrumentation subsystem has been tested under physical boards
b_u585i_iot02aandnucleo_wba52cgand also under QEMU VM virtual machinemps2_an385. Please note that, since it relies on the retention subsystem, if another board different from the ones mentioned is used for testing this series the proper device-tree overlay for the board needs to be added toboards/dir accordingly. Check examples insamples/subsys/instrumentation/boards/*.overlay.Previous discussions on this series (RFC) can be found here;
#57371
[0] RFC: #57373
Thank you!
Cheers,
Gustavo