Profiler: which steps are contained in others and which are "leaf" operations? #6405

@tdvginz

Looking at the following profiler report:

Profiler Report

Action                          |  Mean duration (s)    |Num calls              |  Total time (s)       |  Percentage %         |
-----------------------------------------------------------------------------------------------------------------------------
Total                           |  -                    |_                      |  5.5595e+04           |  100 %                |
-----------------------------------------------------------------------------------------------------------------------------
run_training_epoch              |  1.0899e+04           |5                      |  5.4494e+04           |  98.02                |
run_training_batch              |  1.2468               |40031                  |  4.9912e+04           |  89.777               |
training_step_and_backward      |  1.2069               |40031                  |  4.8313e+04           |  86.902               |
model_forward                   |  0.67271              |40031                  |  2.6929e+04           |  48.438               |
optimizer_step_and_closure_0    |  1.3005               |20015                  |  2.603e+04            |  46.82                |
model_backward                  |  0.53271              |40031                  |  2.1325e+04           |  38.357               |
evaluation_step_and_end         |  0.29524              |10626                  |  3137.2               |  5.6429               |
on_validation_epoch_end         |  328.91               |6                      |  1973.4               |  3.5496               |
on_train_start                  |  1006.1               |1                      |  1006.1               |  1.8098               |
on_train_batch_end              |  0.0038002            |40030                  |  152.12               |  0.27362              |
get_train_batch                 |  0.0032787            |40031                  |  131.25               |  0.23608              |
on_validation_end               |  12.656               |6                      |  75.933               |  0.13658              |

We see that the Percentage column doesn't sum to 100. Indeed, model_forward is contained inside each run_training_batch, so its time is already counted as part of the run_training_batch time.
But what about model_backward? Is it contained inside model_forward? I assume not, but if the profiler could show its output in a "graph"-like structure, where you can see which operations are "leaf" nodes and which are part of some other operation, it would make the analysis much easier.
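
To make the request concrete, here is a minimal sketch of the kind of view I mean. It computes each action's "self" time (its total minus the totals of its direct children) from the numbers in the report above. The `PARENT` mapping is an assumption for illustration only: the one relationship stated above is that model_forward runs inside run_training_batch, and the rest is guessed from the action names and would need to be checked against the trainer code. optimizer_step_and_closure_0 is left out because its place in the hierarchy is unclear from the report.

```python
# Total time (s) per action, copied from the report above.
TOTAL = {
    "run_training_epoch": 5.4494e04,
    "run_training_batch": 4.9912e04,
    "training_step_and_backward": 4.8313e04,
    "model_forward": 2.6929e04,
    "model_backward": 2.1325e04,
}

# ASSUMPTION: nesting guessed from the action names, not taken from the profiler.
PARENT = {
    "run_training_batch": "run_training_epoch",
    "training_step_and_backward": "run_training_batch",
    "model_forward": "training_step_and_backward",
    "model_backward": "training_step_and_backward",
}


def self_times(total, parent):
    """Return total time minus the time of direct children, per action."""
    own = dict(total)
    for child, par in parent.items():
        own[par] -= total[child]
    return own


def depth(name, parent):
    """Number of ancestors an action has in the assumed hierarchy."""
    d = 0
    while name in parent:
        name = parent[name]
        d += 1
    return d


def render(total, parent):
    """Render the actions as an indented tree with total and self time."""
    own = self_times(total, parent)
    lines = []

    def visit(name):
        indent = "  " * depth(name, parent)
        lines.append(f"{indent}{name}: total={total[name]:.0f}s self={own[name]:.0f}s")
        for child in total:  # keep the order used in the report
            if parent.get(child) == name:
                visit(child)

    for name in total:
        if name not in parent:  # roots have no parent
            visit(name)
    return "\n".join(lines)


print(render(TOTAL, PARENT))
```

Under this assumed nesting, the leaves are model_forward and model_backward, and the self time of every parent stays non-negative, which is at least consistent with model_backward being a sibling of model_forward rather than a child of it.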

Example (taken from the PyTorch docs):
image
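
A graph-like view of this sort needs the profiler to remember which action was already open when another one started. As a rough sketch of one way that could work (not how the Lightning profiler is implemented), the hypothetical `TreeProfiler` below keeps a stack of open actions and stores timings under the full path of each action. The class and its method names are made up for illustration; only the action names are borrowed from the report.

```python
import time
from contextlib import contextmanager


class TreeProfiler:
    """Hypothetical profiler that records each action under its parent path."""

    def __init__(self):
        self._stack = []   # names of the actions that are currently open
        self.total = {}    # path tuple -> accumulated seconds
        self.calls = {}    # path tuple -> number of calls

    @contextmanager
    def profile(self, name):
        path = tuple(self._stack) + (name,)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self._stack.pop()
            self.total[path] = self.total.get(path, 0.0) + elapsed
            self.calls[path] = self.calls.get(path, 0) + 1

    def leaves(self):
        """Paths that are not a proper prefix of any other recorded path."""
        return [
            p for p in self.total
            if not any(len(q) > len(p) and q[:len(p)] == p for q in self.total)
        ]

    def summary(self):
        """Indented listing; sorting the path tuples puts parents before children."""
        lines = []
        for path in sorted(self.total):
            indent = "  " * (len(path) - 1)
            lines.append(
                f"{indent}{path[-1]}: {self.calls[path]} calls, {self.total[path]:.4f}s"
            )
        return "\n".join(lines)


profiler = TreeProfiler()
with profiler.profile("run_training_batch"):
    with profiler.profile("model_forward"):
        time.sleep(0.01)
    with profiler.profile("model_backward"):
        time.sleep(0.01)
print(profiler.summary())
```

With timings keyed by path, the report can indent children under their parent, and `leaves()` answers the "which operations are leaf nodes" question directly.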

Labels: feature (is an improvement or enhancement), help wanted (open to be worked on)
