One of the three datasets returned by Multi30k seems to be bugged. #2001
Description
🐛 Bug
Describe the bug
The archive for the test split downloaded by Multi30k doesn't match the expected SHA256 hash. The precise error is:
RuntimeError: The computed hash 0681be16a532912288a91ddd573594fbdd57c0fbb81486eff7c55247e35326c2 of C:\Users\raaaa/.cache\torch\text\datasets\Multi30k\mmt16_task1_test.tar.gz does not match the expectedhash 6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36. Delete the file manually and retry.
This exception is thrown by __iter__ of HashCheckerIterDataPipe(hash_dict={'C:\\Users\\raaaa/.cache\\torch\\text\\datasets\\Multi30k\\mmt16_task1_test.tar.gz': '6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36'}, hash_type='sha256', rewind=True, source_datapipe=MapperIterDataPipe)
I did what the message suggested: I deleted the file manually and ran the script again, but the same error occurs.
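To check whether the re-downloaded archive really differs from the expected hash, independently of torchtext, the file can be hashed directly. This is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical and not part of torchtext or torchdata.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()
```

Calling `sha256_of_file` on the cached `mmt16_task1_test.tar.gz` named in the error message should reproduce the "computed hash" value if the mismatch comes from the downloaded file itself rather than from the checker.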
To Reproduce
Steps to reproduce the behavior:
Paste the following into a new Python file and run it.
import torchtext


def _main():
    train, val, test = torchtext.datasets.Multi30k(language_pair=("de", "en"))

    # The following works fine because `val` and `train` datasets are fine.
    # for thing in val:
    #     print(thing)
    #     break

    # Invoking the generator (which is `test`) in the following way triggers the error.
    for thing in test:
        print(thing)
        break


if __name__ == "__main__":
    _main()
You should see the error I pasted above.
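To confirm that only one of the three splits is affected, each split can be probed separately and the outcome recorded instead of stopping at the first exception. The sketch below is a generic helper; `probe_splits` is a hypothetical name, and it accepts any mapping of names to iterables, so it does not depend on torchtext itself.

```python
def probe_splits(splits):
    """Try to read one item from each named iterable and report the outcome."""
    results = {}
    for name, split in splits.items():
        try:
            next(iter(split))
            results[name] = "ok"
        except StopIteration:
            # The iterable produced no items at all.
            results[name] = "empty"
        except Exception as exc:
            # Record the failure and keep probing the remaining splits.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

With the script above, this would be called as `probe_splits({"train": train, "val": val, "test": test})`. Based on the behavior described here, `test` is the entry expected to report the `RuntimeError`.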
Expected behavior
I expect iterating over the test split to work without error, as it does for train and val.
Environment
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Pro
GCC version: (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: N/A
Python version: 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22000-SP0
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 526.86
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy==0.950
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.4
[pip3] torch==1.13.0+cu117
[pip3] torchaudio==0.13.0+cu117
[pip3] torchdata==0.5.0
[pip3] torchtext==0.14.0
[pip3] torchvision==0.14.0+cu117
[conda] Could not collect