Conversation

Member

@di di commented Feb 13, 2024

Towards resolving #8254, this adds a periodic task that backfills wheel metadata in reverse chronological order.

@di di requested a review from a team as a code owner February 13, 2024 00:10
Member

@ewdurbin ewdurbin left a comment


One nit regarding available config versus hard-coding subdomains. Otherwise, should be fine?

@di di requested a review from ewdurbin February 13, 2024 00:37
@di di requested a review from ewdurbin February 13, 2024 16:43
@ewdurbin ewdurbin merged commit fd35a53 into pypi:main Feb 13, 2024

sentry-io bot commented Feb 13, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OperationalError: ["raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if t... warehouse.packaging.tasks.backfill_metadata View Issue
  • ‼️ OperationalError: ["raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if t... warehouse.packaging.tasks.backfill_metadata View Issue

Did you find this useful? React with a 👍 or 👎
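The OperationalError reported above is SQLAlchemy's query-invoked autoflush firing mid-task. A minimal sketch of the `session.no_autoflush` block the error message suggests; the `Widget` model and in-memory SQLite engine here are stand-ins, not Warehouse's actual schema:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()


class Widget(Base):
    __tablename__ = "widgets"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Widget(name="pending"))
    # Outside this block, issuing a query would autoflush the pending
    # INSERT mid-task; inside it, the flush is deferred.
    with session.no_autoflush:
        count = session.query(Widget).filter_by(name="pending").count()
    # The pending object was not flushed, so the query doesn't see it.
    assert count == 0
```

Deferring the flush avoids the mid-query write that raised the OperationalError, at the cost of queries not seeing objects added earlier in the same task.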

Comment on lines +94 to +102
# Hash the metadata and add it to the File instance
file_.metadata_file_sha256_digest = (
    hashlib.sha256(wheel_metadata_contents).hexdigest().lower()
)
file_.metadata_file_blake2_256_digest = (
    hashlib.blake2b(wheel_metadata_contents, digest_size=256 // 8)
    .hexdigest()
    .lower()
)
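The quoted snippet can be exercised stand-alone with nothing but `hashlib`; the sample `wheel_metadata_contents` bytes below are an illustrative stand-in:

```python
import hashlib

wheel_metadata_contents = b"Metadata-Version: 2.1\nName: example\n"

sha256_digest = hashlib.sha256(wheel_metadata_contents).hexdigest().lower()
blake2_digest = (
    hashlib.blake2b(wheel_metadata_contents, digest_size=256 // 8)
    .hexdigest()
    .lower()
)

# digest_size=256 // 8 requests a 32-byte (256-bit) BLAKE2b digest,
# so both hex digests are 64 characters long.
assert len(sha256_digest) == 64
assert len(blake2_digest) == 64
```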

@edmorley edmorley Feb 14, 2024


I presume the digests here are written to the DB immediately? (Rather than when file_'s destructor is called.)

If so, it seems like there is a failure mode here that could leave wheels without their metadata file?

i.e.: if the backfill task's worker process gets killed/restarted between writing the digests and storing the metadata file in the storage backend (or if the write to the storage backend fails). In such a case, the next time the backfill task runs, the wheel in question wouldn't be returned by the DB query, since the digests would already be set, even though the metadata file was never uploaded.

If setting the digests was moved after writing to the storage backend, then in the case of an interrupted backfill task, the inconsistency between storage backend (which would have the uploaded file) and DB digest state (which wouldn't yet have the digest) would instead self-heal on the next run.
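The ordering being suggested can be sketched as follows. `FakeStorage`, `FakeFile`, and `backfill_one` are hypothetical stand-ins for illustration, not Warehouse's real objects:

```python
import hashlib


class FakeStorage:
    """Stand-in for the storage backend."""

    def __init__(self):
        self.blobs = {}

    def store(self, path, data):
        self.blobs[path] = data


class FakeFile:
    """Stand-in for the File model row."""

    metadata_file_sha256_digest = None


def backfill_one(file_, storage, metadata_contents, path):
    # 1. Upload first: an interruption here leaves the digest column NULL,
    #    so the next backfill run still selects this file and retries,
    #    self-healing the storage/DB inconsistency.
    storage.store(path, metadata_contents)
    # 2. Record the digest only after the upload succeeded.
    file_.metadata_file_sha256_digest = (
        hashlib.sha256(metadata_contents).hexdigest().lower()
    )


storage = FakeStorage()
file_ = FakeFile()
backfill_one(file_, storage, b"Metadata-Version: 2.1\n", "pkg.whl.metadata")
assert "pkg.whl.metadata" in storage.blobs
assert len(file_.metadata_file_sha256_digest) == 64
```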

Member


This is a non-issue: the changes to the database aren't persisted until the entire request has completed and the transaction is committed.


Ah good to know. (The fact that everything was configured to use transactions isn't obvious from code inspection of this single task.)

@di di deleted the backfill-metadata-task branch February 14, 2024 02:07
@di di mentioned this pull request Feb 14, 2024