Skip to content

Conversation

@AniTho
Copy link

@AniTho AniTho commented Oct 3, 2020

Added the function to download LSUN data from the site, extract it and then delete zip file

@pmeier
Copy link
Contributor

pmeier commented Oct 3, 2020

Closes #2746. @AniTho I'll review in 1-2 days.

Copy link
Contributor

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR @AniTho! I have a few comments how to make it more robust. Furthermore, the linter is failing with

./torchvision/datasets/lsun.py:97:1: W293 blank line contains whitespace

Could you fix that?

# for each class, create an LSUNClassDataset
self.dbs = []
for c in self.classes:
self._download(c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download should be optional. Other datasets have a download=True flag in the constructor that indicates whether the download should happen or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will make the changes.
Thanks for your support and guidance.

Comment on lines +102 to +106
if not(os.path.isfile(os.path.join(self.root, file_name))):
download_url(url, self.root, file_name)
print("Extracting File")
extract_archive(file_path, self.root, True)
print("Done!!!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this simpler by using

def download_and_extract_archive(
url: str,
download_root: str,
extract_root: Optional[str] = None,
filename: Optional[str] = None,
md5: Optional[str] = None,
remove_finished: bool = False,
) -> None:

self.length = count

def _download(self, c):
url = 'http://dl.yf.io/lsun/scenes/{}'.format(c + '_lmdb.zip')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the download you could check if the lmdb file is present and if its MD5 checks out with

def check_integrity(fpath: str, md5: Optional[str] = None) -> bool:
if not os.path.isfile(fpath):
return False
if md5 is None:
return True
return check_md5(fpath, md5)

If that is the case you don't need to download anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmeier Do I have to find md5 hash, if yes how? otherwise I can just directly check if the file exists or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I have to find md5 hash [...] otherwise I can just directly check if the file exists or not

Although not terribly common, download errors happen. Without a checksum, you have no idea if the file maybe is corrupt. Thus it we prefer to have a MD5 checksum for most files. In your case that would mean for the .zip and .lmdb files of every class.

if yes how?

You can download and extract all archives with the download logic you already have. Afterwards you can use a variety of tools. Depending on your setup you might have md5sum already installed. If you don't want additional software, you can use torchvision.datasets.utils.calculate_md5

def calculate_md5(fpath: str, chunk_size: int = 1024 * 1024) -> str:

@codecov
Copy link

codecov bot commented Oct 6, 2020

Codecov Report

Merging #2748 into master will increase coverage by 0.07%.
The diff coverage is 93.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2748      +/-   ##
==========================================
+ Coverage   73.05%   73.12%   +0.07%     
==========================================
  Files          96       96              
  Lines        8298     8317      +19     
  Branches     1291     1293       +2     
==========================================
+ Hits         6062     6082      +20     
  Misses       1838     1838              
+ Partials      398      397       -1     
Impacted Files Coverage Δ
torchvision/transforms/functional.py 80.36% <33.33%> (-1.77%) ⬇️
torchvision/io/image.py 86.00% <80.00%> (+6.00%) ⬆️
torchvision/transforms/functional_tensor.py 72.54% <97.67%> (+2.46%) ⬆️
torchvision/io/__init__.py 100.00% <100.00%> (ø)
torchvision/transforms/functional_pil.py 66.19% <100.00%> (+0.97%) ⬆️
torchvision/transforms/transforms.py 80.92% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9a8220...c0188c8. Read the comment docs.

@pmeier
Copy link
Contributor

pmeier commented Apr 7, 2022

Closing this since it is stale and the prototype version of LSUN in #5390 includes downloads.

@pmeier pmeier closed this Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants