Status: Closed
Labels: bug, distributed, help wanted
Description
🐛 Bug
When I use DDP, I get this CUDA out of memory error.
But it works in DP mode with the same batch size, and I don't understand why.
Also, why does memory accumulate on GPU 0? Even though I am not using GPU 0, it shows a lot of memory consumption.
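One common cause of unexpected memory on GPU 0 under DDP (an assumption here, not confirmed from this report) is loading a checkpoint or creating tensors without an explicit device: `torch.load` without `map_location` materializes tensors on the default device, so every DDP process allocates on `cuda:0`. A minimal sketch of pinning each process to its own device, assuming the launcher exports `LOCAL_RANK` (both `torch.distributed.launch` and Lightning's DDP do):

```python
import os

def ddp_device(default_rank: int = 0) -> str:
    """Device string for this DDP process, derived from LOCAL_RANK.

    Falls back to default_rank when LOCAL_RANK is not set (e.g. when
    running a single process outside a DDP launcher).
    """
    local_rank = int(os.environ.get("LOCAL_RANK", default_rank))
    return f"cuda:{local_rank}"

# Hypothetical usage -- "model.ckpt" is a placeholder path; loading it
# without map_location would put the tensors on cuda:0 in every process:
# state = torch.load("model.ckpt", map_location=ddp_device())
```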
Please reproduce using the BoringModel
```python
trainer = Trainer(
    fast_dev_run=False,
    gpus=args.gpu,
    max_epochs=args.epoch,
    distributed_backend='ddp',  # distributed_backend='dp'
    logger=tb_logger,
)
trainer.fit(model=model, train_dataloader=train_loader, val_dataloaders=val_loader)
```
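One likely explanation for OOM under DDP but not DP at the "same" batch size (an assumption, not verified against this model): `DataParallel` scatters each batch across the visible GPUs, while DDP runs one process per GPU and each process's DataLoader delivers the full `batch_size`. A quick sketch of the effective per-GPU load under each backend:

```python
def per_gpu_batch(batch_size: int, n_gpus: int, backend: str) -> int:
    """Approximate number of samples each GPU processes per step.

    Assumes DP scatters one batch across all GPUs, while DDP gives every
    process (one per GPU) its own full-size batch.
    """
    if backend == "dp":
        return batch_size // n_gpus
    if backend == "ddp":
        return batch_size
    raise ValueError(f"unknown backend: {backend}")

# With batch_size=64 on 4 GPUs, DP puts ~16 samples on each GPU while
# DDP puts all 64 on each, so DDP needs roughly 4x the per-GPU memory.
```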
To Reproduce
Use following BoringModel and post here
Environment
Note: Bugs with code are solved faster! Colab Notebooks should be made public!
- IDE: Please use our python bug_report_model.py template.
- Colab Notebook: Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

```bash
wget https://raw.githubusercontent.com/PyTorchLightning/pytorch-lightning/master/tests/collect_env_details.py
# For security purposes, please check the contents of collect_env_details.py before running it.
python collect_env_details.py
```
- PyTorch Version (e.g., 1.0): 1.6.0
- PyTorch Lightning version: 1.2.10
- OS (e.g., Linux): Ubuntu 18.04
- How you installed PyTorch (conda, pip, source): conda
- Python version: 3.x
- CUDA/cuDNN version: 10.2
Additional context