Downloads from Google Drive return empty files / are still broken #4108

## 🐛 Bug

All downloaded files are empty when downloading from Google Drive via `torchvision.datasets.utils.download_file_from_google_drive` (or functions that resolve to it, e.g. `download_url`).

## To Reproduce

Steps to reproduce the behavior:

1. Install torchvision (tested with the `master` branch at 959666891589c35e6d225943253f523cffbae4cc).
2. Run the code below:
```python
from torchvision.datasets.utils import download_url, download_file_from_google_drive

# Caltech101: download_url resolves to download_file_from_google_drive for this URL
try:
    download_url(
        "http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz",
        "./caltech101",
        filename="101_ObjectCategories.tar.gz",
        md5="b224c7392d521a49829488ab0f1120d9",
    )
except Exception:
    # Ignore any error here (e.g. a failed MD5 check) so the second download still runs
    pass

# mini-ImageNet: direct Google Drive download by file ID
folder = './miniimagenet'
gdrive_id = '16V_ZlkW4SsnNDtnGmaBRq2OoPmUOc5mY'
gz_filename = 'mini-imagenet.tar.gz'
gz_md5 = 'b38f1eb4251fb9459ecc8e7febf9b2eb'
download_file_from_google_drive(gdrive_id, folder, gz_filename, md5=gz_md5)
```
3. Afterwards, both `101_ObjectCategories.tar.gz` and `mini-imagenet.tar.gz` are empty files.

## Expected behavior

The download fails with an explicit error if the Google Drive quota is exceeded, and succeeds otherwise.

## Additional context & Error source

- Related issues: #3708, #2992
- Related PRs: #3710, #3035

The issue stems from the fact that Google Drive enforces a download quota, and the returned `response.status_code` cannot be used to detect an exceeded quota reliably (see #2992). As a workaround, we therefore need to search the payload for the corresponding error string via `_quota_exceeded`. Before #3710, this required reading the whole payload, which was infeasible, so the check was disabled in #3035.
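
A minimal sketch of a quota check that inspects only the first chunk of the payload instead of the whole body. The function name `quota_exceeded_in_first_chunk` is hypothetical (not torchvision API), and the marker string `"Google Drive - Quota exceeded"` is an assumption about the text of Google Drive's quota error page.

```python
# Assumed text contained in Google Drive's "quota exceeded" HTML page
QUOTA_MARKER = "Google Drive - Quota exceeded"


def quota_exceeded_in_first_chunk(first_chunk: bytes) -> bool:
    """Return True if the first chunk of a download looks like the quota error page.

    Only this single chunk is inspected, so the cost of the check is bounded by
    the chunk size, independent of the size of the file being downloaded.
    """
    # Undecodable bytes (binary archive data, or a multi-byte character cut at
    # the chunk boundary) are dropped instead of raising UnicodeDecodeError
    text = first_chunk.decode("utf-8", errors="ignore")
    return QUOTA_MARKER in text
```

For example, a gzip header such as `b"\x1f\x8b..."` is not flagged, while an HTML page whose title contains the marker is.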

However, the solution proposed in #3710 breaks `torchvision.datasets.utils.download_file_from_google_drive`. The reason is that the content of a streaming `Response` should only be iterated over once; see the [requests documentation](https://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow) on streaming content.

In contrast, we currently construct two iterators from the same streaming `Response`: the first [here](https://github.com/pytorch/vision/blob/master/torchvision/datasets/utils.py#L189), the second [here](https://github.com/pytorch/vision/blob/master/torchvision/datasets/utils.py#L252).

As a result, the second iterator yields no data, and the files written to disk are empty.

## Proposed solution

Since we do not want to send the same request to Google Drive twice (the second response may differ from the first), I suggest the following:
1. Construct the iterator and extract its first chunk inside `download_file_from_google_drive`.
2. Pass only the first chunk to `_quota_exceeded`.
3. If the quota check passes, pass the first chunk and the partially consumed iterator to `_save_response_content`.
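
The three steps above can be sketched with a single iterator: peek at the first chunk, run the quota check on it, then write that chunk followed by the rest of the *same* iterator via `itertools.chain`. This is a stdlib-only sketch under assumptions: `chunks` stands in for `response.iter_content(chunk_size)` of a streaming `requests` response, the quota marker string is assumed, and the names `save_streamed_download` and `QuotaExceededError` are hypothetical, not torchvision API.

```python
import itertools
from typing import Iterable, Iterator


class QuotaExceededError(RuntimeError):
    """Hypothetical error raised when Google Drive reports an exceeded quota."""


def _looks_like_quota_page(first_chunk: bytes) -> bool:
    # Assumed marker text of the Google Drive quota error page
    return "Google Drive - Quota exceeded" in first_chunk.decode("utf-8", errors="ignore")


def save_streamed_download(chunks: Iterable[bytes], destination: str) -> int:
    """Write a streamed body to `destination`, checking the quota on the first chunk only.

    `chunks` is iterated exactly once, so this also works for single-pass
    iterators such as a streaming response body. Returns the bytes written.
    """
    content: Iterator[bytes] = iter(chunks)

    # 1. Extract the first non-empty chunk (skipping empty keep-alive chunks)
    first_chunk = b""
    for chunk in content:
        if chunk:
            first_chunk = chunk
            break

    # 2. Run the quota check on the first chunk only
    if _looks_like_quota_page(first_chunk):
        raise QuotaExceededError("Google Drive download quota exceeded")

    # 3. Write the first chunk, then the rest of the same, partially consumed iterator
    written = 0
    with open(destination, "wb") as f:
        for chunk in itertools.chain([first_chunk], content):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written
```

Chaining the already consumed first chunk back in front of the iterator avoids both a second request and a second iterator over the same response, which is exactly what breaks the current implementation.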

I have begun working on such a solution here: 

https://github.com/ORippler/vision/tree/fix_google_drive_quotacheck

@pmeier 
