Closed
Labels: bug, help wanted
Description
When I run training, the progress bar shows iterations/sec. How can I access that value? I wrote a simple hook:
import time

from pytorch_lightning.callbacks import ProgressBar


class PerfCallback(ProgressBar):
    def __init__(self):
        super().__init__()  # don't forget this :)

    def on_train_start(self, trainer, pl_module):
        super().on_train_start(trainer, pl_module)
        # Running totals accumulated over the whole training run
        self.total_runtime = 0
        self.unpadded_tokens = 0
        self.all_tokens = 0
        self.total_steps = 0

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx, dataloader_idx):
        # Start the per-batch timer
        self.t0 = time.time()

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        # Time elapsed between the batch-start and batch-end hooks only
        runtime = time.time() - self.t0
        self.total_runtime += runtime
        print("on batch end runtime=%.2f, it/s = %.2f" % (runtime, 1 / runtime))
However, my prints indicate ~1.7 it/s, but the progress bar shows 6.07 s/it.
In my training_step, I also return some stats (batch size, number of tokens) in logs by returning the following:
{"loss": loss_tensors[0], "log":logs, "progress_bar":{"global_step":self.global_step}}
for more relevant perf metrics. However, in my callback, print(outputs) shows an empty list. Is there anything I'm missing?
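A note on the two rates: they are in different units (it/s versus s/it), and timing only the span between the batch-start and batch-end hooks leaves out anything that happens between batches, such as data loading. That may account for part of the gap, though this is not confirmed here. The following is a minimal, framework-independent sketch for comparing the two measurements. `ThroughputMeter` and its method names are hypothetical and not part of PyTorch Lightning.

```python
import time


class ThroughputMeter:
    """Hypothetical helper: compares hook-to-hook time with wall-clock time."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # injectable so the meter can be tested
        self._first_start = None     # time of the very first batch start
        self._batch_start = None     # time of the current batch start
        self._last_end = None        # time of the most recent batch end
        self.compute_time = 0.0      # sum of (end - start) over all batches
        self.steps = 0

    def batch_start(self):
        now = self._clock()
        if self._first_start is None:
            self._first_start = now
        self._batch_start = now

    def batch_end(self):
        now = self._clock()
        self.compute_time += now - self._batch_start
        self._last_end = now
        self.steps += 1

    def compute_rate(self):
        """Iterations per second, counting only time between start and end hooks."""
        if self.compute_time <= 0:
            return float("nan")
        return self.steps / self.compute_time

    def wall_rate(self):
        """Iterations per second over wall-clock time, including gaps between batches."""
        if self._last_end is None or self._last_end <= self._first_start:
            return float("nan")
        return self.steps / (self._last_end - self._first_start)
```

Calling `batch_start()` from `on_train_batch_start` and `batch_end()` from `on_train_batch_end` and then printing both rates would show whether time spent outside the two hooks explains the difference.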