
FSDP error when load from state_dict #17566

@KevinCrp

Bug description

Hi,

I'm using FSDPStrategy to train a model (strategy = FSDPStrategy()). Training completes without errors, and at the end I get a .ckpt file.
When I load it with load_from_checkpoint(), I get an error about missing keys in the state_dict:

RuntimeError: Error(s) in loading state_dict for Model:
        Missing key(s) in state_dict: "net.conv_list.0.att", "net.conv_list.0.bias", "net.conv_list.1.att", "net.conv_list.1.bias", "net.conv_list.2.att", "net.conv_list.2.bias", "net.mlp.norms.0.module.weight", "net.mlp.norms.0.module.bias", "net.mlp.norms.1.module.weight", "net.mlp.norms.1.module.bias".

How can I solve it?
Kevin

What version are you seeing the problem on?

v2_0

How to reproduce the bug

No response

Error messages and logs

See the RuntimeError quoted in the description above.
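To narrow down which parameters are absent from the saved file, one option is to compare the keys the model expects with the keys actually stored in the checkpoint, grouped by parent module. The sketch below is a hypothetical, dependency-free helper (`diff_state_dict_keys` and `group_by_parent` are illustrative names, not Lightning APIs). With a real run you would pass it `model.state_dict().keys()` and `torch.load(ckpt_path, map_location="cpu")["state_dict"].keys()`, since Lightning checkpoints store the weights under the `"state_dict"` entry. The key lists in the demo are made up: a subset of the names from the error plus one hypothetical key that the checkpoint does contain.

```python
from collections import defaultdict


def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected), both sorted.

    missing: keys the model defines but the checkpoint lacks.
    unexpected: keys present in the checkpoint that the model does not define.
    """
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


def group_by_parent(keys):
    """Group parameter names by their parent module path.

    Example: "net.conv_list.0.att" -> parent "net.conv_list.0", leaf "att".
    """
    groups = defaultdict(list)
    for key in keys:
        parent, _, leaf = key.rpartition(".")
        groups[parent].append(leaf)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative key sets only, not read from a real checkpoint.
    # "net.conv_list.0.lin.weight" is a hypothetical key assumed to be saved.
    model_keys = [
        "net.conv_list.0.att",
        "net.conv_list.0.bias",
        "net.conv_list.0.lin.weight",
        "net.mlp.norms.0.module.weight",
        "net.mlp.norms.0.module.bias",
    ]
    ckpt_keys = ["net.conv_list.0.lin.weight"]

    missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
    print(group_by_parent(missing))
    print(unexpected)
```

Grouping by parent module makes it easier to see whether the missing entries share a pattern (for example, only certain parameter types of the conv and norm layers), which helps decide whether the problem lies in how the checkpoint was saved or in how the model is rebuilt before loading.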

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0): 2.0.0
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0): 2.0.0
#- Python version (e.g., 3.9): 3.8.16
#- OS (e.g., Linux): Linux
#- CUDA/cuDNN version: 12.0/cudnn8
#- GPU models and configuration: 
#- How you installed Lightning(`conda`, `pip`, source): conda
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

cc @awaelchli @carmocca
