Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1  #4092

@noamwies

Description

🚀 Feature

Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1

Motivation

When training large models, gradient synchronization is costly, so the actual speedup from 2 GPUs is much lower than 200%.

Pitch

We can use DDP's no_sync feature to avoid synchronization in steps that don't call optimizer_step.
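
For illustration, a minimal sketch of the idea (not Lightning's implementation): enter DDP's no_sync() context for the backward passes of non-final accumulation steps, so the gradient all-reduce only happens on the micro-batch that also calls optimizer.step(). The `train` function and the `dataloader`, `optimizer`, and `accumulation_steps` names are assumptions for the example, not Lightning internals.

```python
# Minimal sketch, assuming `model` is already wrapped in
# torch.nn.parallel.DistributedDataParallel (DDP).
import contextlib
import torch

def train(model, dataloader, optimizer, accumulation_steps: int = 4):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(dataloader):
        # Only the last micro-batch of each accumulation window should
        # trigger DDP's gradient all-reduce.
        is_sync_step = (step + 1) % accumulation_steps == 0
        # model.no_sync() suppresses DDP's all-reduce for this backward pass.
        ctx = contextlib.nullcontext() if is_sync_step else model.no_sync()
        with ctx:
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)
            (loss / accumulation_steps).backward()
        if is_sync_step:
            optimizer.step()   # gradients were averaged across ranks above
            optimizer.zero_grad()
```

With this pattern, the accumulation window performs a single all-reduce instead of one per micro-batch, which is where the expected speedup comes from.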

Labels

feature (Is an improvement or enhancement), help wanted (Open to be worked on)
