
@mpashchenkov mpashchenkov commented Dec 10, 2020

Added a Python script that downloads models from https://github.com/onnx/models.
Behavior (UPDATED):

  • The script creates two directories, onnx_models_cache and onnx_models;
  • onnx_models_cache contains the result of git clone (small files);
  • onnx_models contains the large ONNX models for later use;
  • The script checks the SHA of each model in the onnx_models folder and skips the download if the model already exists and passes the SHA check;
  • You can remove onnx_models and run the script again; the models will then be downloaded again;
  • The script keeps onnx_models_cache at a constant size by removing the lfs folder from it;
  • The script has been tested on Linux and Windows;
  • Names of models that were not downloaded are placed in failedModels.
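
The skip-if-valid check from the list above could be sketched roughly as follows. This is an illustration only, not the code from download_onnx_models.py: the helper names (file_sha256, needs_download, select_models) are hypothetical, and SHA-256 is an assumption, since the description only says "sha".

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(model_path, expected_sha256):
    """True if the model is missing or its digest does not match."""
    if not os.path.isfile(model_path):
        return True
    return file_sha256(model_path) != expected_sha256.lower()


def select_models(models_dir, expected):
    """Hypothetical helper: `expected` maps a relative path to its SHA-256.

    Returns the relative paths that still have to be downloaded.
    """
    return sorted(
        rel for rel, sha in expected.items()
        if needs_download(os.path.join(models_dir, rel), sha)
    )
```

Reading in chunks keeps memory use flat for large model files, and a mismatching digest is treated the same as a missing file, so a truncated download is simply fetched again.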

You can also call download_onnx_models.py with a model_name or print argument:

  • model_name - only the model with that name is downloaded;
  • print - prints all models that the script can download.
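
A minimal sketch of how the two command-line modes above might be dispatched. The function name choose_action, the no-argument default (download everything), and the model names used when calling it are assumptions for illustration; the actual argument handling in download_onnx_models.py may differ.

```python
def choose_action(argv, known_models):
    """Map command-line arguments to an (action, models) pair.

    argv is expected to look like sys.argv: the script name first,
    followed by an optional single argument.
    """
    if len(argv) < 2:
        # Assumption: with no argument, every known model is downloaded.
        return ("download", list(known_models))
    arg = argv[1]
    if arg == "print":
        return ("print", list(known_models))
    if arg in known_models:
        return ("download", [arg])
    raise SystemExit("Unknown model: " + arg)
```

For example, choose_action(["download_onnx_models.py", "print"], names) selects the listing mode, while passing one of the known names restricts the download to that model.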

Added new directories:

  • testdata/gapi;
  • testdata/gapi/onnx;

Member

alalek commented Dec 10, 2020

testdata/gapi/onnx/images/laptop.jpg

Legal/copyright information is required for the image used.

Member

alalek commented Dec 11, 2020

git clone
git lfs pull

Until they are downloaded, these LFS files still EXIST on disk, but they store only LFS metadata.
This makes a "missing test file" problem really hard to debug from the test logs.


The Git repository itself should go into a dedicated "download_cache" directory.

What about the repository commit revision?
What about a file hash sum (MD5/SHA256), to ensure that each file is valid, not truncated, etc.?
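
One way to surface the "file exists but holds only LFS metadata" case early is to detect Git LFS pointer files explicitly. The sketch below relies on the Git LFS pointer format, whose first line is "version https://git-lfs.github.com/spec/v1" followed by "oid" and "size" lines; the function names are hypothetical and this is not part of the script under review.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file looks like a Git LFS pointer rather than real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def parse_lfs_pointer(text):
    """Extract the hash algorithm, object id and size from pointer text."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if " " in line
    )
    algo, _, oid = fields["oid"].partition(":")
    return {"algo": algo, "oid": oid, "size": int(fields["size"])}
```

A test harness could call is_lfs_pointer on each model before loading it and report "model not pulled from LFS" instead of an opaque parse failure. The oid recorded in the pointer is also a ready-made expected hash for the downloaded file.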

# and "lfs" folder will exist during download
lfs_cache_path = CACHE_DIR + '.git/lfs'
if os.path.exists(lfs_cache_path):
    rmdir_with_data(lfs_cache_path)

Author

@alalek, this is a questionable step. The cache directory receives additional data after git lfs pull is called. I currently remove this folder, but I am not sure that is right. Should the cache directory keep a constant size (180 MB), or should it retain the received data (500 MB)? Keeping the lfs folder avoids downloading the models again after the onnx_models folder is deleted.

@mpashchenkov
Author

@smirnov-alexey, can you review this?

smirnov-alexey left a comment

Looks good to me

OrestChura left a comment

Some comments, most are just questions to double-check

OrestChura left a comment

A few last cavils in the comments, but I'm all good with the script. Thanks!

@mpashchenkov
Author

@alalek, does the script behave correctly (see the description)? If so, can we merge it?

@dmatveev

@alalek can we merge this thing? I believe it is required for more correct coverage of ONNX models than we have now.

Member

alalek left a comment

Overall looks good!

Comment on lines 73 to 80
def clear_pulled_files():
    # "lfs" directory contains some data
    # Remove this folder then "onnx_models_cache" will have standard size - 180 MB
    # If we remove "onnx_models" folder then models will be downloaded again
    # and "lfs" folder will exist during download
    lfs_cache_path = CACHE_DIR + '.git/lfs'
    if os.path.exists(lfs_cache_path):
        rmdir_with_data(lfs_cache_path)

Member

Do we really need to clean '.git' internals? Does it break anything?

Author

No, but without the cleanup the models won't be downloaded again; they will be moved out of the cache instead. And to update the models you would have to remove the cache directory.
This is the main question about the behavior: do we need to update the models from GitHub when the models directory (NOT the cache) is removed?

Author

The cache is no longer cleared.

Comment on lines 86 to 78
    os.system('git clone --recursive https://github.com/onnx/models.git onnx_models_cache')
else:

Member

git clone

The script needs to fetch and check out the latest commits from the upstream repository if the folder already exists.

Author

Added a git pull of the master branch.
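
The clone-or-update behavior discussed in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged code: the function names are hypothetical, subprocess.run with check=True is swapped in for os.system so that a failing git command raises an error, and --ff-only is an added safety choice.

```python
import os
import subprocess

REPO_URL = "https://github.com/onnx/models.git"


def sync_commands(cache_dir, branch="master"):
    """Return the git commands needed to create or refresh the cache."""
    if not os.path.isdir(os.path.join(cache_dir, ".git")):
        # No repository yet: do the initial clone.
        return [["git", "clone", "--recursive", REPO_URL, cache_dir]]
    # Repository already exists: switch to the branch and update it.
    return [
        ["git", "-C", cache_dir, "checkout", branch],
        ["git", "-C", cache_dir, "pull", "--ff-only", "origin", branch],
    ]


def sync_cache(cache_dir, branch="master"):
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in sync_commands(cache_dir, branch):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the decision logic easy to check without network access.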


CUR_DIR = os.getcwd()
# This directory contains result of "git clone"
CACHE_DIR = CUR_DIR + '/onnx_models_cache/'

Member

'/onnx_models_cache/'

Please use '/.cache/onnx_models/'

Author

Fixed
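
As a side note on the path handling in this thread: building the paths with os.path.join avoids depending on a trailing slash in CACHE_DIR, which the CACHE_DIR + '.git/lfs' concatenation relies on. The helper name cache_paths below is hypothetical, and this is a sketch rather than the code that was merged.

```python
import os


def cache_paths(base_dir):
    """Build the cache and LFS paths under base_dir without manual slashes."""
    cache_dir = os.path.join(base_dir, ".cache", "onnx_models")
    return {
        "cache": cache_dir,
        "lfs": os.path.join(cache_dir, ".git", "lfs"),
    }
```

Calling cache_paths(os.getcwd()) mirrors the CUR_DIR-based layout requested above.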

@mpashchenkov mpashchenkov requested a review from alalek December 13, 2021 09:07
@mpashchenkov
Author

@alalek, could this be merged if all comments have been applied correctly?

@alalek alalek merged commit 37294e3 into opencv:4.x Dec 23, 2021
@alalek alalek mentioned this pull request Dec 30, 2021
@alalek alalek mentioned this pull request Feb 22, 2022
5 participants