Getting headers don't contain content-disposition error when downloading data from Google Drive #204

@Nayef211

🐛 Describe the bug

I intermittently get the error below when running the following code to download torchtext datasets from Google Drive.

>>> from torchtext.datasets.amazonreviewfull import AmazonReviewFull
>>> dataset = AmazonReviewFull(split="train")
>>> next(iter(dataset))

Stack trace

  File "/home/nayef211/.local/lib/python3.8/site-packages/torchdata/datapipes/iter/load/online.py", line 98, in __iter__
    yield _get_response_from_google_drive(url, timeout=self.timeout)
  File "/home/nayef211/.local/lib/python3.8/site-packages/torchdata/datapipes/iter/load/online.py", line 74, in _get_response_from_google_drive
    raise RuntimeError("Internal error: headers don't contain content-disposition.")
RuntimeError: Internal error: headers don't contain content-disposition.
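
For readers unfamiliar with the failing check, here is a minimal sketch of what "headers don't contain content-disposition" means. It is not torchdata's actual code: `looks_like_file_download` is a hypothetical helper, and the example header values are made up for illustration. The assumption is that a real file download carries a Content-Disposition header, while an HTML page returned in its place (for example a warning page) does not.

```python
def looks_like_file_download(headers):
    """Return True if a header mapping has a Content-Disposition entry.

    Hypothetical helper for illustration only. HTTP header names are
    case-insensitive, so names are compared in lowercase.
    """
    return any(name.lower() == "content-disposition" for name in headers)


# Made-up header sets: one resembling a file response, one an HTML page.
file_headers = {"Content-Disposition": 'attachment; filename="train.csv"'}
html_headers = {"Content-Type": "text/html; charset=utf-8"}

print(looks_like_file_download(file_headers))
print(looks_like_file_download(html_headers))
```

Logging the response headers when the second case occurs may help confirm whether the failure coincides with a non-file response.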

According to @parmeet:

This error existed in torchtext even before the migration, so I don't think any changes in torchdata would have triggered it. In my experience, I have seen this when the quota is exceeded. It is a transient error, though I never dug deeper into whether there is a way to increase the quota limit, or whether there are other alternatives to prevent this from happening.

The relevant discussion for this error can be found in the PR thread pytorch/text#1594 (comment).
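
Since the error is described as transient, one possible workaround on the caller's side is to retry with exponential backoff. The following is only a sketch under that assumption: `retry_transient` and its parameters are hypothetical names, not part of torchdata or torchtext, and if the cause really is an exceeded quota, short retries may not be enough.

```python
import time


def retry_transient(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper. Only a RuntimeError whose message mentions
    "content-disposition" is retried; anything else is re-raised at once.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except RuntimeError as err:
            is_last = attempt == attempts - 1
            if "content-disposition" not in str(err) or is_last:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


# Possible usage with the snippet from this report (needs network access):
# from torchtext.datasets.amazonreviewfull import AmazonReviewFull
# sample = retry_transient(
#     lambda: next(iter(AmazonReviewFull(split="train")))
# )
```

Matching on the message text is fragile and is used here only because the error is raised as a plain RuntimeError.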

cc @ejguan

Versions

PyTorch version: 1.11.0.dev20220111
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Stream 8 (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-3)
Clang version: Could not collect
CMake version: version 3.19.6
Libc version: glibc-2.28

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.6.13-0_fbk18_hardened_6007_g4c10224f1437-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-sphinx-theme==0.0.24
[pip3] torch==1.11.0.dev20220111
[pip3] torchdata==0.3.0a0+ec32ee4
[pip3] torchtext==0.12.0a0+b04ae99
[conda] blas                      1.0                         mkl
[conda] cpuonly                   2.0                           0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640
[conda] numpy                     1.21.4                   pypi_0    pypi
[conda] pytorch                   1.11.0.dev20220111     py3.8_cpu_0    pytorch-nightly
[conda] pytorch-mutex             1.0                         cpu    pytorch
[conda] pytorch-sphinx-theme      0.0.24                   pypi_0    pypi
[conda] torchtext                 0.12.0a0+e691934          pypi_0    pypi
