When using DDP with multiple GPUs, every GPU's validation and test loop runs over the entire validation/test dataset, rather than over a per-process shard.
Expected behavior is that the dataset is divided appropriately across the GPUs.
I am using current master (cloned Mar 14), Ubuntu 19.10, CUDA 10.1, Python 3.7.5, PyTorch 1.4, in a venv environment.
The problem appears to be in `auto_add_sampler()` in `data_loading.py`: it creates a `DistributedSampler` for the training dataset but not for the validation or test datasets, so each process iterates the full dataset (see the sketch below).
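For illustration, here is a minimal sketch of what the fix might look like. The signature of `auto_add_sampler()` and the `train` flag are assumptions for the sake of the example, not Lightning's actual internals; only `DistributedSampler` and the `DataLoader` attributes used here are part of PyTorch's public API.

```python
# Hypothetical sketch of auto_add_sampler() that also shards val/test data.
# The function signature is assumed, not taken from Lightning's source.
import torch
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def auto_add_sampler(dataloader: DataLoader, train: bool) -> DataLoader:
    # Wrap the dataset for val/test as well as train, so that under DDP
    # each process only sees its own shard of the data.
    sampler = DistributedSampler(
        dataloader.dataset,
        num_replicas=torch.distributed.get_world_size(),
        rank=torch.distributed.get_rank(),
        shuffle=train,  # keep val/test iteration order deterministic
    )
    # Rebuild the DataLoader with the distributed sampler, preserving
    # the user's original settings.
    return DataLoader(
        dataloader.dataset,
        batch_size=dataloader.batch_size,
        sampler=sampler,
        num_workers=dataloader.num_workers,
        pin_memory=dataloader.pin_memory,
        drop_last=dataloader.drop_last,
    )
```

With a wrapper like this applied to the val/test loaders, each of N processes would evaluate roughly 1/N of the dataset instead of all of it.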