-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add a metadata backfill task #15368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a metadata backfill task #15368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit regarding available config versus hard-coding subdomains. Otherwise, should be fine?
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
# Hash the metadata and add it to the File instance | ||
file_.metadata_file_sha256_digest = ( | ||
hashlib.sha256(wheel_metadata_contents).hexdigest().lower() | ||
) | ||
file_.metadata_file_blake2_256_digest = ( | ||
hashlib.blake2b(wheel_metadata_contents, digest_size=256 // 8) | ||
.hexdigest() | ||
.lower() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume the digests here are written to the DB immediately? (Rather than when _file
's destructor is called)
If so, it seems like there is a failure mode here that could leave wheels without their metadata file?
ie: If the backfill task's worker process gets killed/restarted between writing the digests and storing the metadata file in the storage backend (or if the write to the storage backend fails). In such a case, the next time the backfill tasks runs, the wheel in question wouldn't be returned by the DB query since the digests would then be set - even though the metadata file was never uploaded.
If setting the digests was moved after writing to the storage backend, then in the case of an interrupted backfill task, the inconsistency between storage backend (which would have the uploaded file) and DB digest state (which wouldn't yet have the digest) would instead self-heal on the next run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a non-issue, the changes to the database aren't persisted until the entire request has completed and a transaction is committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good to know. (The fact that everything was configured to use transactions isn't obvious from code inspection of this single task.)
Towards resolving #8254, this adds a periodic task that backfills wheel metadata in reverse chronological order.