Skip to content

Conversation

@diablodale
Copy link
Contributor

another fix for opencv/opencv#20575. The downloader will now delete invalid (bad hash) files. This prevents later failures in running OpenCV code which attempts to use invalid files.

tested on 3.4. Approach should also work in 4.x

@alalek
Copy link
Member

alalek commented Aug 30, 2021

@diablodale I added "rename" feature in the second commit. This is useful for further investigation of problems (e.g., separate problems of empty file or "captcha" requests).
Please take a look.

BTW, ResNet50 downloads unique content from OneDrive, so new and new files are created.

return self.verify()
candidate_verify = self.verify()
if not candidate_verify:
if os.path.exists(self.filename):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to extract this block of code into a new class method, e.g. handle_bad_download:

if not self.verify():
    self.handle_bad_download();
    return False

return True

os.remove(self.filename)
try:
if hasattr(self, 'sha_actual') and self.sha_actual:
rename_target = self.filename + '.' + str(self.sha_actual)
Copy link
Contributor Author

@diablodale diablodale Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to have a single invalid name. Not to have always changing invalid names. Difficult to manage cleanup, ever increase drive usage, etc. 👎 Simpler code too.

It is easily possible for a download's sha_actual to change on every attempt. The results in unbounded number of invalid files. This is a problem I suggest we don't create. (Similar to log files that don't manage themselves.) If some developer out there needs per-download-attempt-backup-files, they can rename it themselves.

I do not think there is any useful mainstream scenario that needs per-download-attempt-backup-files.
A single xxxx.invalid for the most recent attempt is all that is needed and for it to be replaced with a new invalid attempt. AND...to be removed when an attempt succeeds!

@diablodale
Copy link
Contributor Author

BTW, ResNet50 downloads unique content from OneDrive, so new and new files are created.

Ok, then what needs to change in the downloader? If I take the resnet50 URL from download_models.py and copy it to my browser, it continues to fail with:

This item might not exist or is no longer available
This item might have been deleted, expired, or you might not have permission to view it. Contact the owner of this item for more information.

@alalek
Copy link
Member

alalek commented Sep 1, 2021

Updated to address review comments.

what needs to change in the downloader? If I take the resnet50 URL from download_models.py and copy it to my browser, it continues to fail

This can be fixed by new public working link for this model (current link is dead - this happens with "external" resources).
Or place correct file on the target location (scripts would check hash sum and would not try to download anything if content is valid).

@diablodale
Copy link
Contributor Author

This can be fixed by new public working link for this model (current link is dead - this happens with "external" resources).
Or place correct file on the target location (scripts would check hash sum and would not try to download anything if content is valid).

Oops. I misunderstood your original comment, "...ResNet50 downloads unique content from OneDrive, so new and new files are created." Does your original comment and "...can be fixed by new public working link..." mean the same thing? If not, please help me understand your original comment more.

os.remove(rename_target)
finally:
os.rename(self.filename, rename_target)
print(' downloaded content is renamed to ' + rename_target)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend clearly stating this is an error condition and the action taken. Current wording is vague. For example, a change of
"downloaded content is renamed to"
to
"renaming invalid file to".

That makes the two error conditions express their action
"renaming invalid file to"
"deleting invalid file"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@alalek
Copy link
Member

alalek commented Sep 1, 2021

ResNet50 downloads unique content from OneDrive, so new and new files are created

OneDrive returns some stub message on HTTP request for missing or not available files. This stub is different (with different hash sums).

...can be fixed by new public working link...

The current link doesn't work. Another valid link should help here (but we don't have such link).

@alalek
Copy link
Member

alalek commented Sep 1, 2021

👍

@alalek alalek merged commit 0b9e766 into opencv:3.4 Sep 1, 2021
@alalek alalek mentioned this pull request Sep 1, 2021
@alalek alalek mentioned this pull request Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants