Description
🐛 Bug
This bug is rather difficult to diagnose because it only occurs under specific external conditions. The issue is with torchvision.datasets.utils.download_file_from_google_drive(): it does not gracefully handle large files that have exceeded their daily download quota.
To Reproduce
The following two prerequisites must be met in order to detect this issue.
- The quota for the Google Drive file must be exceeded for the day.
- The file should be relatively large (> 3 GB). It's not clear how file size affects this behavior, but I've hit it multiple times with large files.
from torchvision.datasets.utils import download_file_from_google_drive
# use the WIDERFACE training data file as an example
file_id = '0B6eKvaijfFUDQUUwd21EckhUbWs'
root = 'data_folder'
filename = 'WIDER_train.zip'
md5 = '3fedf70df600953d25982bcd13d91ba2'
download_file_from_google_drive(file_id, root, filename, md5)
This will lead to the Python session getting killed.
The Python process hangs on the call to torchvision.datasets.utils._quota_exceeded(...). My best guess is that the code in this function performs a string search that is either inefficient or forces Python to search the entire data payload (resulting in a timeout).
def _quota_exceeded(response: "requests.models.Response") -> bool:  # type: ignore[name-defined]
    return "Google Drive - Quota exceeded" in response.text
Expected behavior
Calling download_file_from_google_drive(...) should not kill the session when download quota thresholds have been met on large files.
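One possible direction, as a minimal sketch rather than a tested patch: inspect only the first few kilobytes of the response for the quota message instead of searching the whole payload. The helper name peek_for_quota_marker and the max_bytes limit are hypothetical and not part of torchvision. The sketch also assumes that the quota page is a small HTML document whose message appears near the start of the body, and that the request is made with stream=True so the chunks can come from response.iter_content(...).

```python
import itertools
from typing import Iterable, Iterator, Tuple

# Same message the current check looks for, as bytes because we inspect raw chunks.
QUOTA_MARKER = b"Google Drive - Quota exceeded"


def peek_for_quota_marker(
    chunks: Iterable[bytes],
    marker: bytes = QUOTA_MARKER,
    max_bytes: int = 64 * 1024,  # hypothetical limit; assumes the message is near the start
) -> Tuple[bool, Iterator[bytes]]:
    """Check only the leading bytes of a chunked body for the quota message.

    Returns (quota_exceeded, replay), where replay yields the full body again:
    the bytes that were peeked at, followed by the chunks not yet consumed.
    """
    chunks = iter(chunks)
    head = []
    seen = 0
    for chunk in chunks:
        head.append(chunk)
        seen += len(chunk)
        if seen >= max_bytes:
            break  # stop reading; the rest of the payload stays unread
    prefix = b"".join(head)
    replay = itertools.chain([prefix] if prefix else [], chunks)
    return marker in prefix, replay


# Hypothetical usage with requests (not executed here):
#   response = session.get(url, params=params, stream=True)
#   exceeded, content = peek_for_quota_marker(response.iter_content(chunk_size=32768))
#   if exceeded:
#       raise RuntimeError("The daily quota of this file is exceeded.")
#   for chunk in content:
#       file_handle.write(chunk)
```

With this shape the check reads at most roughly max_bytes before deciding, and the returned iterator still yields the complete body, so the download itself does not need a second request.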
Environment
PyTorch version: 1.8.0.dev20201021
Is debug build: True
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Clang version: 10.0.1-++20200708122807+ef32c611aa2-1~exp1~20200707223407.61
CMake version: version 3.18.1
Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.1.105
GPU models and configuration: GPU 0: GeForce GTX 1650
Nvidia driver version: 450.80.02
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.8.0.dev20201021
[pip3] torchvision==0.9.0a0+9984146
[conda] blas 1.0 mkl
[conda] cpuonly 1.0 0 pytorch-nightly
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.2.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.19.1 py38hbc911f0_0
[conda] numpy-base 1.19.1 py38hfa32c7d_0
[conda] pytorch 1.8.0.dev20201021 py3.8_cpu_0 [cpuonly] pytorch-nightly
[conda] torchvision 0.8.0a0+1fbd0b7 pypi_0 pypi
cc @pmeier