Description
🐛 Bug
A colleague (@siyuanfeng-tri) and I sometimes get confused about how argparse interprets the --gpus flag. I see the following docs:
https://pytorch-lightning.readthedocs.io/en/1.2.1/advanced/multi_gpu.html#select-gpu-devices
But we're sometimes unsure when the parsed value will be treated as a count of GPUs (int/str) and when it will be treated as a list of device indices (List[int]).
Are there docs for this? If not, could this be clarified somehow?
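For reference, here is a minimal sketch of the mapping as I read the table in the linked 1.2.1 docs. `parse_gpus` and its `num_available` parameter are hypothetical, not Lightning's actual implementation. The assumed rules: an int is a count of GPUs, a string (other than "-1") is a comma-separated list of device indices, and a list is already explicit indices.

```python
from typing import List, Optional, Union


def parse_gpus(
    gpus: Union[None, int, str, List[int]], num_available: int
) -> Optional[List[int]]:
    """Hypothetical re-implementation of the documented gpus normalization.

    Returns the list of device indices to use, or None for CPU.
    """
    if gpus is None:
        return None
    if isinstance(gpus, str):
        # Assumed rule: "-1" means all GPUs; any other string is a
        # comma-separated list of device *indices*, so "3" -> [3].
        if gpus.strip() == "-1":
            return list(range(num_available))
        return [int(x) for x in gpus.split(",") if x.strip()]
    if isinstance(gpus, int):
        if gpus == 0:
            return None  # 0 means CPU
        if gpus == -1:
            return list(range(num_available))
        # Assumed rule: an int is a *count*, so 3 -> the first 3 GPUs.
        return list(range(gpus))
    # A list is taken as explicit device indices.
    return list(gpus)


print(parse_gpus(3, num_available=4))    # count: first three devices
print(parse_gpus("3", num_available=4))  # index: device 3 only
```

The same digit yields two very different device selections depending only on its Python type, which is the crux of the confusion.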
Please reproduce using the BoringModel
see notebook
The main complaint is that gpus=3 implies gpus=[0, 1, 2], while gpus="3" implies gpus=[3].
Combine that with argparse implicitly converting the CLI string to an int, and you get a rather confusing public interface.
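To illustrate the argparse side, here is a minimal sketch of the kind of type conversion that could sit behind --gpus. `gpus_type` is a hypothetical stand-in, not Lightning's code; the assumption is that values containing a comma stay strings while everything else is converted to an int.

```python
import argparse
from typing import Union


def gpus_type(value: str) -> Union[int, str]:
    # Hypothetical conversion for --gpus: comma-separated values stay
    # strings (device indices); anything else becomes an int (a count).
    if "," in value:
        return value
    return int(value)


parser = argparse.ArgumentParser()
parser.add_argument("--gpus", type=gpus_type, default=None)

args_count = parser.parse_args(["--gpus", "3"])   # CLI text "3" -> int 3
args_index = parser.parse_args(["--gpus", "3,"])  # CLI text "3," stays a str

print(repr(args_count.gpus), repr(args_index.gpus))
```

Under these assumptions, typing `--gpus 3` on the command line yields the int 3 (three GPUs), even though passing the string "3" straight to Trainer would select GPU 3. Only the trailing-comma form keeps the index meaning.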
To Reproduce
example notebook: https://colab.research.google.com/drive/1pe9_F2S73-gQ3hOeh_MMiGhmbXmGURDQ?usp=sharing
Expected behavior
Less confusing, more explicit options? (Perhaps my complaint is really with the implicit, type-dependent behavior of Trainer(gpus=...) itself.)
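As one possible direction, here is a sketch of what "more explicit" could look like on the CLI side: separate, mutually exclusive flags for a count and for device indices. The `--num-gpus` and `--gpu-ids` flags and `parse_gpu_ids` are hypothetical names for illustration only; they are not existing Lightning options.

```python
import argparse
from typing import List


def parse_gpu_ids(value: str) -> List[int]:
    # Explicit device indices, e.g. "1,3" -> [1, 3] and "3" -> [3].
    return [int(x) for x in value.split(",") if x.strip()]


parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
# Hypothetical flags: the meaning comes from the flag name, not the value's type.
group.add_argument("--num-gpus", type=int, help="how many GPUs to use")
group.add_argument("--gpu-ids", type=parse_gpu_ids, help="comma-separated device indices")

by_count = parser.parse_args(["--num-gpus", "3"])
by_index = parser.parse_args(["--gpu-ids", "3"])
print(by_count.num_gpus, by_index.gpu_ids)
```

Because the two meanings live in different flags, "3" can never silently switch between a count and an index.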
Environment
- PyTorch Version (e.g., 1.7.1):
- OS: Ubuntu 18.04
- How you installed PyTorch: pip
- Python version: 3.6.9
- CUDA/cuDNN version: N/A
- GPU models and configuration: N/A
- Any other relevant information: N/A
Additional context
N/A