CombinedLoader changes sampling in DDP #7013

@lukashermann

🐛 Bug

Under DDP, validation sampling behaves differently with a CombinedLoader than with a single dataloader. With a single dataloader, the validation set is split across the GPUs. With the CombinedLoader, it is not split, and every GPU iterates over the full validation set. The problem goes away when a DistributedSampler is passed to the dataloader explicitly.
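A minimal sketch of that workaround, assuming plain PyTorch map-style datasets: each validation set gets its own DataLoader with an explicit `torch.utils.data.distributed.DistributedSampler` (shuffle=False). `RangeDataset` and `make_val_loader` are hypothetical names used only for illustration. Inside a real DDP run, `num_replicas` and `rank` would normally be left as `None` so the sampler reads them from the initialized process group. Here they are passed explicitly so the sharding can be checked in a single process. Wrapping the resulting loaders in a CombinedLoader is not shown.

```python
from typing import Optional

import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler


class RangeDataset(Dataset):
    """Hypothetical toy dataset whose items are simply their own indices."""

    def __init__(self, length: int) -> None:
        self.length = length

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> torch.Tensor:
        return torch.tensor(index)


def make_val_loader(
    dataset: Dataset,
    batch_size: int,
    num_replicas: Optional[int] = None,
    rank: Optional[int] = None,
) -> DataLoader:
    """Hypothetical helper: validation DataLoader with an explicit DistributedSampler.

    If num_replicas/rank are None, DistributedSampler takes the world size and
    rank from the initialized process group (as inside a DDP run).
    shuffle=False keeps the validation order deterministic.
    """
    sampler = DistributedSampler(
        dataset, num_replicas=num_replicas, rank=rank, shuffle=False
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


# Outside of DDP, pass num_replicas and rank explicitly to inspect each shard.
for rank in range(2):
    loader = make_val_loader(RangeDataset(64), batch_size=8, num_replicas=2, rank=rank)
    seen = torch.cat(list(loader)).tolist()
    print(rank, seen)
```

With two replicas, rank 0 should receive the even indices and rank 1 the odd ones, which is the split reported for the single dataloader in this issue.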

Reproduction based on the BoringModel:

https://gist.github.com/lukashermann/b19964ba32c9bde241be3e54deea01ad

To Reproduce

Run the file and check the command-line output.

Expected behavior

Single dataloader:
device cuda:1 [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63]
device cuda:0 [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62]

Combined dataloader:
device cuda:0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
device cuda:1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
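The two listings are consistent with how `DistributedSampler` partitions indices when shuffle=False: the index list is padded to a multiple of the number of replicas, and each rank takes every `num_replicas`-th index starting at its own rank. The pure-Python sketch below mimics that rule for illustration only. `strided_shard` and `unsharded` are hypothetical helpers, not PyTorch or Lightning APIs, and the padding step is simplified. It contrasts the strided split with the combined-loader output, where no distributed sampler appears to be applied and every rank sees every index.

```python
import math
from typing import List


def strided_shard(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    """Mimic DistributedSampler(shuffle=False, drop_last=False) index selection.

    The index list is padded by repeating its leading indices until its length
    is a multiple of num_replicas; each rank then takes every num_replicas-th
    index starting at its own rank. (Simplified: assumes the padding is not
    longer than the dataset itself.)
    """
    indices = list(range(dataset_len))
    total_size = math.ceil(dataset_len / num_replicas) * num_replicas
    padding = total_size - len(indices)
    indices += indices[:padding]
    return indices[rank:total_size:num_replicas]


def unsharded(dataset_len: int) -> List[int]:
    """Behavior observed with the CombinedLoader here: every rank sees all indices."""
    return list(range(dataset_len))


for rank in range(2):
    print(f"rank {rank} sharded:  ", strided_shard(64, 2, rank))
    print(f"rank {rank} unsharded:", unsharded(64))
```

When the dataset length is not divisible by the number of replicas, the padding means some indices are seen twice, so each rank still gets the same number of samples.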

Is this behavior intended?

Environment

  • CUDA:
    - GPU:
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
    - available: True
    - version: 11.1
  • Packages:
    - numpy: 1.19.2
    - pyTorch_debug: False
    - pyTorch_version: 1.8.0
    - pytorch-lightning: 1.3.0rc1
    - tqdm: 4.53.0
  • System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor: x86_64
    - python: 3.8.5
    - version: #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021

Labels

  • bug (Something isn't working)
  • data handling (Generic data-related topic)
  • distributed (Generic distributed-related topic)
  • help wanted (Open to be worked on)
  • priority: 2 (Low priority task)
