Closed
Labels
bug (Something isn't working), strategy: fsdp (Fully Sharded Data Parallel), ver: 2.0.x
Milestone
Description
Bug description
Hi,
I'm using the FSDPStrategy to train a model (strategy = FSDPStrategy()). Training runs fine, and at the end I get a .ckpt file.
When I load it with load_from_checkpoint(), I get an error about missing keys in the state_dict:
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict: "net.conv_list.0.att", "net.conv_list.0.bias", "net.conv_list.1.att", "net.conv_list.1.bias", "net.conv_list.2.att", "net.conv_list.2.bias", "net.mlp.norms.0.module.weight", "net.mlp.norms.0.module.bias", "net.mlp.norms.1.module.weight", "net.mlp.norms.1.module.bias".

How can I solve it?
Kevin
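To narrow down an error like the one above, it may help to compare the parameter names the model expects with the names actually stored in the checkpoint. The sketch below is only a diagnostic aid, not a fix. `diff_state_dict_keys` is a hypothetical helper (not part of Lightning or PyTorch), and the key `net.conv_list.0.weight` is made up for illustration. In a real run, the expected keys would presumably come from `model.state_dict().keys()` and the saved keys from the `"state_dict"` entry of the loaded .ckpt file.

```python
from typing import Dict, Iterable, List


def diff_state_dict_keys(
    expected_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Dict[str, List[str]]:
    """Compare two sets of parameter names (hypothetical helper).

    "missing" lists keys the model expects but the checkpoint lacks;
    "unexpected" lists keys the checkpoint has but the model does not.
    """
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    return {
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
    }


# Illustrative data: the first two names appear in the error message,
# "net.conv_list.0.weight" is an assumed name used only for this example.
expected = [
    "net.conv_list.0.att",
    "net.conv_list.0.bias",
    "net.conv_list.0.weight",
]
saved = ["net.conv_list.0.weight"]

report = diff_state_dict_keys(expected, saved)
print("missing:", report["missing"])
print("unexpected:", report["unexpected"])
```

If the missing list matches the keys in the RuntimeError, the checkpoint file itself lacks those parameters, which points at how the checkpoint was saved rather than at how it is loaded.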
What version are you seeing the problem on?
v2_0
How to reproduce the bug
No response
Error messages and logs
# Error messages and logs here please
Environment
Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0): 2.0.0
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0): 2.0.0
#- Python version (e.g., 3.9): 3.8.16
#- OS (e.g., Linux): Linux
#- CUDA/cuDNN version: 12.0/cudnn8
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source): conda
#- Running environment of LightningApp (e.g. local, cloud):
More info
No response
anicolson