
cli: Confused on (str, int, List[int]) variants for argparse for --gpus flag? #6228

@EricCousineau-TRI


🐛 Bug

A colleague (@siyuanfeng-tri) and I sometimes get confused about how the value of the --gpus flag is interpreted once it has gone through argparse. I see the following docs:
https://pytorch-lightning.readthedocs.io/en/1.2.1/advanced/multi_gpu.html#select-gpu-devices

But it's unclear to us when the parsed value will be treated as a GPU count (int/str) and when it will be treated as device indices (List[int]).

Are there docs for this? If not, can that be clarified somehow?

Please reproduce using the BoringModel

see notebook

The main complaint is that gpus=3 implies gpus=[3], while gpus="3" implies gpus=[0,1,2].
Mix that with argparse's implicit conversion from str to int, and you get a kinda weird public interface.
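
The argparse half of this can be seen without Lightning at all. Below is a minimal sketch, assuming two hypothetical parsers that differ only in the `type=` given to `--gpus`. Neither is claimed to be how Lightning registers the flag. It only shows that the same command line can hand either a str or an int to whatever consumes it.

```python
import argparse

# Hypothetical parser A: no type= given, so argparse keeps the raw string.
parser_str = argparse.ArgumentParser()
parser_str.add_argument("--gpus", default=None)

# Hypothetical parser B: type=int, so argparse converts the string to an int
# before the value ever reaches something like Trainer(gpus=...).
parser_int = argparse.ArgumentParser()
parser_int.add_argument("--gpus", type=int, default=None)

argv = ["--gpus", "3"]
as_str = parser_str.parse_args(argv).gpus
as_int = parser_int.parse_args(argv).gpus

# Same command line, two different Python types.
print(type(as_str).__name__, repr(as_str))
print(type(as_int).__name__, repr(as_int))
```

If the consumer treats str and int differently, the user typing `--gpus 3` cannot tell from the command line alone whether they asked for a count or an index.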

To Reproduce

example notebook: https://colab.research.google.com/drive/1pe9_F2S73-gQ3hOeh_MMiGhmbXmGURDQ?usp=sharing

Expected behavior

Less confusing / more explicit options? (maybe my complaint is with weird implicit behavior of Trainer(gpus=...)?)
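
As one possible shape for "more explicit", here is a rough sketch using two separate flags. The flag names `--num-gpus` and `--gpu-ids` and the helper `resolve_gpus` are hypothetical, made up for illustration, and are not existing Lightning options. The idea is that the count and the device indices never share a flag, so the type of the value no longer changes its meaning.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: a count and a list of indices are separate flags."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--num-gpus", type=int, default=None,
                       help="how many GPUs to use")
    group.add_argument("--gpu-ids", type=int, nargs="+", default=None,
                       help="which device indices to use")
    return parser


def resolve_gpus(args: argparse.Namespace) -> Optional[List[int]]:
    """Return an explicit list of device indices, or None for CPU."""
    if args.gpu_ids is not None:
        return list(args.gpu_ids)
    if args.num_gpus:
        # Assumption for this sketch: a count of N means devices 0..N-1.
        return list(range(args.num_gpus))
    return None


parser = build_parser()
by_count = resolve_gpus(parser.parse_args(["--num-gpus", "3"]))
by_index = resolve_gpus(parser.parse_args(["--gpu-ids", "3"]))
cpu_only = resolve_gpus(parser.parse_args([]))
```

The resolved value is always either None or a List[int], which is the form the docs describe as device indices, so nothing downstream has to guess.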

Environment

  • PyTorch Version (e.g., 1.7.1):
  • OS: Ubuntu 18.04
  • How you installed PyTorch: pip
  • Python version: 3.6.9
  • CUDA/cuDNN version: N/A
  • GPU models and configuration: N/A
  • Any other relevant information: N/A

Additional context

N/A


Labels

bug (Something isn't working), docs (Documentation related), help wanted (Open to be worked on), priority: 1 (Medium priority task), question (Further information is requested)
