Accessing performance of model in progress bar #4326

@swethmandava

When I run training, the progress bar shows iterations per second. How can I access that value? I wrote a simple callback:

import time

from pytorch_lightning.callbacks import ProgressBar

class PerfCallback(ProgressBar):

    def __init__(self):
        super().__init__()  # don't forget this :)

    def on_train_start(self, trainer, pl_module):
        super().on_train_start(trainer, pl_module)
        # running totals, reset at the start of training
        self.total_runtime = 0
        self.unpadded_tokens = 0
        self.all_tokens = 0
        self.total_steps = 0

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx, dataloader_idx):
        # timestamp taken right before the batch is processed
        self.t0 = time.time()

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        # time between batch start and batch end only
        runtime = time.time() - self.t0
        self.total_runtime += runtime
        print("on batch end runtime=%.2f, it/s = %.2f" % (runtime, 1 / runtime))

However, my prints indicate ~1.7 it/s, but the progress bar shows 6.07 s/it.
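To narrow down the mismatch, here is a minimal sketch in plain Python (no Lightning) that tracks two numbers side by side: the time spent between batch start and batch end, and the full gap between consecutive batch ends. `StepTimer` and its method names are hypothetical, and the idea that the progress bar reports the second kind of number is only my assumption, not something I have confirmed.

```python
import time


class StepTimer:
    """Hypothetical helper: tracks step-only time and end-to-end time per iteration."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock          # injectable so the timer can be tested
        self.step_time = 0.0        # total time between batch start and batch end
        self.wall_time = 0.0        # total time between consecutive batch ends
        self.steps = 0
        self._t_start = None
        self._t_prev_end = None

    def batch_start(self):
        now = self.clock()
        self._t_start = now
        if self._t_prev_end is None:
            # first batch: use its start as the reference point for wall time
            self._t_prev_end = now

    def batch_end(self):
        now = self.clock()
        self.step_time += now - self._t_start
        # includes anything that happened before this batch started,
        # e.g. waiting on the data loader
        self.wall_time += now - self._t_prev_end
        self._t_prev_end = now
        self.steps += 1

    def step_rate(self):
        """Iterations per second counting only time inside the step."""
        return self.steps / self.step_time

    def wall_rate(self):
        """Iterations per second counting everything between batch ends."""
        return self.steps / self.wall_time
```

Calling `batch_start()` and `batch_end()` from `on_train_batch_start` and `on_train_batch_end` would show whether the two rates diverge in my run. If they do, the time is going somewhere outside the window my callback measures.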

In my training_step, I also pass some stats (batch size, number of tokens) through the logs by returning the following:

{"loss": loss_tensors[0], "log":logs, "progress_bar":{"global_step":self.global_step}}

for more relevant perf metrics. However, in my callback, print(outputs) shows an empty list. Am I missing anything?

Labels: bug, help wanted
