SimpleFSDP Status Tracking #1980

@ruisizhang123

SimpleFSDP consists of two major components: (1) frontend composability with other parallelisms and distributed training techniques; (2) backend optimization in torch.compile to overlap communication with computation.

Frontend Composability

Dense model (llama3)

  • [Done] Parallelisms: TP/PP/CP

  • [Done] Other techniques: distributed checkpointing / mixed-precision training / meta initialization / activation checkpointing

  • [Done] Float8 training. Numeric difference (@pianpwk): with the inductor backend we see numeric differences because of its Triton kernel implementations, but with the aot_eager backend we get bit-wise numeric equivalence.
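
To illustrate how two implementations of the same math can pass a tolerance check yet fail a bit-wise check, here is a minimal sketch in plain Python (no torch or Triton involved). It assumes the difference comes from reduction order, which is one common cause of kernel-level numeric drift but is not confirmed as the cause of the Float8 difference above. The helper names `bits`, `sequential_sum`, and `pairwise_sum` are hypothetical and exist only for this example.

```python
import math
import struct


def bits(x: float) -> str:
    """Return the IEEE-754 double-precision bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


def sequential_sum(values):
    """Left-to-right accumulation, like a simple serial loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-style reduction, a stand-in for a kernel that reduces in a different order."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


values = [0.1] * 10
a = sequential_sum(values)
b = pairwise_sum(values)

# Bit-wise check: the strict criterion (identical bit patterns).
bitwise_equal = bits(a) == bits(b)
# Tolerance check: the looser criterion typically used when kernels differ.
close = math.isclose(a, b, rel_tol=1e-12)

print(f"sequential={a!r} pairwise={b!r}")
print(f"bitwise_equal={bitwise_equal} close={close}")
```

Here the two sums agree within tolerance but not bit for bit, because floating-point addition is not associative. A bit-wise comparison is therefore only a meaningful target when both runs execute the same operations in the same order.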

MoE model (DSV3)

Backend Optimization

Manual bucketing & reordering

Auto bucketing & reordering
