Caltech101, Caltech256 downloads are broken due to Google Drive redirect "scan for viruses" popup #5716

@ellisbrown

🐛 Describe the bug

It seems that Caltech101 and Caltech256 both currently fail to download with torchvision==0.12.0. Both tarfile download links redirect to Google Drive. Unfortunately, both links return an HTML "virus scan warning" page instead of the actual tarfile (the raw HTML is reproduced under "Contents of downloaded file"):
[image: Google Drive virus scan warning page]

I found an older issue, #4904, that appears to describe a similar bug.

The errors differ slightly between the two datasets, so I've pasted both below. Note: I tried this on a remote Linux host and on my local MacBook, with the same result.
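A quick way to confirm that a downloaded file is an HTML page rather than an archive is to look at its first bytes. The sketch below is only an illustration: `sniff_download` is a hypothetical helper, not part of torchvision, and it checks just three signatures (the gzip magic number, the `ustar` marker at offset 257 of a tar header, and a leading HTML doctype or tag).

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff_download(path):
    """Hypothetical helper: classify a file as 'gzip', 'tar', 'html', or 'unknown'.

    Only the first 512 bytes (one tar block) are inspected.
    """
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # POSIX and GNU tar headers both carry "ustar" at byte offset 257.
    if head[257:262] == b"ustar":
        return "tar"
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Under these assumptions, calling `sniff_download("./caltech101/101_ObjectCategories.tar.gz")` on the file described in this report would be expected to return `"html"`, since the file starts with `<!DOCTYPE html>`.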

Caltech101

Replication steps:

from torchvision.datasets import Caltech101
Caltech101(".", download=True)

Error Stack Trace:

2216.0 bytesExtracting ./caltech101/101_ObjectCategories.tar.gz to ./caltech101
Traceback (most recent call last):
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1674, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1651, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1514, in __init__
    self.firstmember = self.next()
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 2318, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1104, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'<!')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 50, in __init__
    self.download()
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 131, in download
    download_and_extract_archive(
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 434, in download_and_extract_archive
    extract_archive(archive, extract_root, remove_finished)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 409, in extract_archive
    extractor(from_path, to_path, compression)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 272, in _extract_tar
    with tarfile.open(from_path, f"r:{compression[1:]}" if compression else "r") as tar:
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1621, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1678, in gzopen
    raise ReadError("not a gzip file")
tarfile.ReadError: not a gzip file

Caltech256

Replication steps:

from torchvision.datasets import Caltech256
Caltech256(".", download=True)

Error Stack Trace:

2213.0 bytesExtracting ./caltech256/256_ObjectCategories.tar to ./caltech256
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 172, in __init__
    self.download()
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/caltech.py", line 230, in download
    download_and_extract_archive(
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 434, in download_and_extract_archive
    extract_archive(archive, extract_root, remove_finished)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 409, in extract_archive
    extractor(from_path, to_path, compression)
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 272, in _extract_tar
    with tarfile.open(from_path, f"r:{compression[1:]}" if compression else "r") as tar:
  File "/Users/ellisbrown/miniconda3/envs/pytorch/lib/python3.8/tarfile.py", line 1608, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
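Both errors can be reproduced without any network access by saving a small HTML page under an archive file name and opening it with `tarfile`. This is a minimal sketch, assuming the same open modes that appear in the tracebacks above (`"r:gz"` for the `.tar.gz` file and plain `"r"` for the `.tar` file); `try_open` and the placeholder HTML bytes are illustrative, not torchvision code.

```python
import os
import tarfile
import tempfile

# Stand-in for the downloaded file: an HTML page saved under an archive name.
HTML_BYTES = b"<!DOCTYPE html>\n<html><head><title>warning</title></head></html>\n"


def try_open(path, mode):
    """Try to open `path` as a tar archive; return the error text, or None on success."""
    try:
        with tarfile.open(path, mode):
            return None
    except tarfile.ReadError as exc:
        return str(exc)


with tempfile.TemporaryDirectory() as tmp:
    fake_gz = os.path.join(tmp, "101_ObjectCategories.tar.gz")
    fake_tar = os.path.join(tmp, "256_ObjectCategories.tar")
    for p in (fake_gz, fake_tar):
        with open(p, "wb") as f:
            f.write(HTML_BYTES)
    gz_error = try_open(fake_gz, "r:gz")  # forced gzip mode, as for Caltech101
    tar_error = try_open(fake_tar, "r")   # auto-detect mode, as for Caltech256

print(gz_error)
print(tar_error)
```

This suggests the two tracebacks differ only because of the open mode: the forced gzip mode fails while reading the gzip header, whereas the auto-detect mode tries every format and then gives up.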

Contents of downloaded file:

Note: this is the same for caltech101/101_ObjectCategories.tar.gz and caltech256/256_ObjectCategories.tar

<!DOCTYPE html>
<html>
	<head>
		<title>Google Drive - Virus scan warning</title>
		<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
		<style nonce="h28dxYpLOAY/48+kGat7VA">/* Copyright 2022 Google Inc. All Rights Reserved. */
.goog-inline-block{position:relative;display:-moz-inline-box;display:inline-block}* html .goog-inline-block,*:first-child+html .goog-inline-block{display:inline}.goog-link-button{position:relative;color:#15c;text-decoration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-download-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-bottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}</style>
		<link rel="icon" href="null"/>
	</head>
	<body>
		<div class="uc-main">
			<div id="uc-dl-icon" class="image-container">
				<div class="drive-sprite-aux-download-file"></div>
			</div>
			<div id="uc-text">
				<p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p>
				<p class="uc-warning-subcaption">
					<span class="uc-name-size">
						<a href="/open?id=137RyRjvTBkBiIfeYBNZBtViDHQ6_Ewsp">101_ObjectCategories.tar.gz</a> (126M)
					</span> is too large for Google to scan for viruses. Would you still like to download this file?
				</p>
				<form id="downloadForm" action="https://docs.google.com/uc?export=download&amp;id=137RyRjvTBkBiIfeYBNZBtViDHQ6_Ewsp&amp;confirm=t" method="post">
					<input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/>
				</form>
			</div>
		</div>
		<div class="uc-footer">
			<hr class="uc-footer-divider">
			</div>
		</body>
	</html>
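The warning page above carries a "Download anyway" form whose `action` URL includes a `confirm=t` parameter. As a rough sketch of how that URL could be pulled out of the page, the standard-library `html.parser` can be used; `DownloadFormParser` and `confirm_url_from_warning` are hypothetical names, and whether requesting the extracted URL actually returns the archive is an assumption not verified here.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class DownloadFormParser(HTMLParser):
    """Hypothetical parser: records the action of the form with id 'downloadForm'."""

    def __init__(self):
        super().__init__()
        self.action = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "form" and attr_map.get("id") == "downloadForm" and self.action is None:
            # HTMLParser already unescapes entities such as &amp; in attribute values.
            self.action = attr_map.get("action")


def confirm_url_from_warning(html_text):
    """Return the form action URL from the warning page, or None if no such form exists."""
    parser = DownloadFormParser()
    parser.feed(html_text)
    return parser.action


# The form element copied from the downloaded file shown above.
SAMPLE = (
    '<form id="downloadForm" '
    'action="https://docs.google.com/uc?export=download&amp;'
    'id=137RyRjvTBkBiIfeYBNZBtViDHQ6_Ewsp&amp;confirm=t" method="post">'
)

url = confirm_url_from_warning(SAMPLE)
query = parse_qs(urlparse(url).query)
print(url)
print(query["confirm"])
```

Note that the form declares `method="post"`, and the page layout is controlled by Google and may change, so this is a diagnostic aid rather than a dependable download path.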

Versions

Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.3 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2)
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.11 (default, Aug 16 2021, 12:04:33) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-12.3-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] numpy 1.22.3 pypi_0 pypi
[conda] torch 1.11.0 pypi_0 pypi
[conda] torchaudio 0.11.0 pypi_0 pypi
[conda] torchvision 0.12.0 pypi_0 pypi

cc @pmeier @YosuaMichael
