Significantly enhance the safety of metadata manipulation #221
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #221   +/-   ##
==========================================
  Coverage   100.00%   100.00%
==========================================
  Files           38        39     +1
  Lines         2227      2447   +220
  Branches       426       335    -91
==========================================
+ Hits          2227      2447   +220
Wow that's a lot of change!
See inline comments; maybe we should discuss it live once you've looked at them.
Here's my attempt at making those explicit, dedicated metadata types easier to understand and use. I wrote this quickly back then; a long time passed before I looked at it again, and I had to fix the tests to make it work.

I chose an approach in which the flexibility is built into the base Metadata class, using class variables that get overridden by the subclasses and can be shadowed by instances if need be. One thing that's a tad annoying is that

Another take was to stop accepting bytes as well as strings as inputs where there's no good reason to. For instance, DateMetadata now takes a date or datetime only. Supporting those extras is an additional burden and brings no real value for our use case.

Major changes that have to be introduced (we'll work on the CHANGELOG if we pursue this)

We need to validate this, but it frees us from a lot of meaningless tests and really simplifies development and maintenance.
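To make the class-variable approach concrete, here is a minimal sketch of the pattern, not the PR's actual code. The attribute names `is_required` and `max_length`, the constructor signatures, and the `validate` method are assumptions made up for illustration; only the idea (defaults on the base class, overridden by subclasses, shadowed on a single instance) comes from the comment above.

```python
class Metadata:
    # Class-level defaults (hypothetical names), shared by every instance.
    is_required = False
    max_length = None  # None means "no limit"

    def __init__(self, value: str, name: str) -> None:
        self.name = name
        self.value = value
        self.validate()

    def validate(self) -> None:
        if self.is_required and not self.value:
            raise ValueError(f"{self.name} is required")
        if self.max_length is not None and len(self.value) > self.max_length:
            raise ValueError(f"{self.name} is longer than {self.max_length}")


class TitleMetadata(Metadata):
    # The subclass overrides the class-level defaults.
    is_required = True
    max_length = 30

    def __init__(self, value: str) -> None:
        super().__init__(value, name="Title")


# An instance attribute shadows the class variable for that instance only.
relaxed = TitleMetadata("A short title")
relaxed.max_length = None
relaxed.value = "x" * 50
relaxed.validate()  # passes: the limit is lifted on this instance only
```

Attribute lookup checks the instance first, then the class, then its bases, which is why the shadowing leaves `TitleMetadata.max_length` untouched for every other instance.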
I wonder if we should offer a way to create the StandardMetadataSet with values directly (to make it less verbose). It would be quite easy now with something like this (assuming the expects_* decorators add that type):

    class StandardMetadataList:
        ...

        @classmethod
        def from_values(
            cls,
            Name: NameMetadata.input_type,
            Language: LanguageMetadata.input_type,
            Title: TitleMetadata.input_type,
            Creator: CreatorMetadata.input_type,
            Publisher: PublisherMetadata.input_type,
            Date: DateMetadata.input_type,
            Illustration_48x48_at_1: DefaultIllustrationMetadata.input_type,
            Description: DescriptionMetadata.input_type,
            LongDescription: LongDescriptionMetadata.input_type | None = None,
            Tags: TagsMetadata.input_type | None = None,
            Scraper: ScraperMetadata.input_type | None = None,
            Flavour: FlavourMetadata.input_type | None = None,
            Source: SourceMetadata.input_type | None = None,
            License: LicenseMetadata.input_type | None = None,
            Relation: RelationMetadata.input_type | None = None,
        ): ...

All tests are passing. I did not write new ones but I had to update
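
The from_values idea can be sketched in miniature as follows. This is an illustrative reduction to two fields, not the proposed implementation: the two metadata classes here are simplified stand-ins with made-up validation, and a dataclass is assumed for the container.

```python
import datetime
from dataclasses import dataclass


class NameMetadata:
    # Simplified stand-in: validates at initialization time.
    def __init__(self, value: str) -> None:
        if not value:
            raise ValueError("Name must not be empty")
        self.value = value


class DateMetadata:
    # Simplified stand-in: accepts a date or datetime only.
    def __init__(self, value: datetime.date) -> None:
        if not isinstance(value, datetime.date):
            raise TypeError("Date expects a date or datetime")
        self.value = value


@dataclass
class StandardMetadataList:
    Name: NameMetadata
    Date: DateMetadata

    @classmethod
    def from_values(cls, Name: str, Date: datetime.date) -> "StandardMetadataList":
        # Wrap each raw value in its dedicated type, which validates it.
        return cls(Name=NameMetadata(Name), Date=DateMetadata(Date))


metadata = StandardMetadataList.from_values(
    Name="wikipedia_en_all", Date=datetime.date(2024, 1, 1)
)
```

The caller passes plain values, and each value is still validated by its own metadata type, so the shortcut does not bypass any check.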
- Explicit callback definition
- Simplified delete_callback to be a dumb callback (not chaining)

Reasoning: coverage reported a lot of missing lines on zim/metadata.py with the previous version. Also includes auto-linting where the new ruff complained.

In order to properly expose the input type in __init__ (for pyright and user assistance), use one base class (subclassing Metadata) per input type. Can't get rid of the `Any` on the `Metadata` init (otherwise it would mean re-implementing the init everywhere). Used the opportunity to remove the `expecting` classvar and modified the tests accordingly.
- Also fixed a minor issue in bytes reading by seeking back to the previous position and not zero.
- Also shared the binary reading logic inside the main base class (it was already there) so it can be reused in illustration.
- Now explicitly states the type of stored data (which can differ from the inputs in the somewhat flexible ones).
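
The "seek back to the previous position and not zero" fix can be illustrated with a small sketch. The helper name is hypothetical, and reading from the start of the stream is an assumption; the commit message only states that the position is restored afterwards.

```python
import io
from typing import BinaryIO


def read_all_preserving_position(stream: BinaryIO) -> bytes:
    """Hypothetical helper: read the whole stream, then restore the caller's position."""
    previous = stream.tell()
    try:
        stream.seek(0)
        return stream.read()
    finally:
        # Restore where the caller was, rather than rewinding to zero.
        stream.seek(previous)


buffer = io.BytesIO(b"abcdef")
buffer.seek(2)  # the caller has already consumed two bytes
content = read_all_preserving_position(buffer)
```

Rewinding to zero would silently change what the caller reads next; restoring the saved offset leaves the stream exactly as it was handed over.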
Thanks a lot, nothing left to add, I like it! Glad we've made this "not-negligible" move. I just force-pushed to fixup commits and rebase on main.
Fix #205
This is a full rewrite of #217, so I've opened a new PR: the changes since the last review no longer made sense from my PoV.

- zim.metadata.check_metadata_conventions
- zim.creator.Creator.config_metadata by using these types and being more strict:
  - StandardMetadata class for standard metadata, including the list of mandatory ones
  - X- prefix
  - fail_on_missing_prefix argument
- add_metadata, use same metadata types
- zim.creator.Creator.start with new types, and drop all metadata from memory after being passed to the libzim
- zim.creator.convert_and_check_metadata (not useful anymore, simply use the proper metadata type)
- MANDATORY_ZIM_METADATA_KEYS and DEFAULT_DEV_ZIM_METADATA from constants to zim.metadata to avoid circular dependencies
- inputs.unique_values utility function to compute the list of unique values from a given list, while preserving the initial list order
- In __init__ of zim.creator.Creator, rename disable_metadata_checks to check_metadata_conventions for clarity and brevity
  - zim.metadata.check_metadata_conventions, so if you have many creators running in parallel, they can't have different settings; the last one initialized will "win"

Nota:
- tests/zim/test_zim_creator.py to tests/zim/test_metadata.py since most checks are now done at metadata initialization instead of when config_metadata or start are called, but coverage is similar
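
For readers unfamiliar with order-preserving deduplication, the behaviour described for inputs.unique_values can be sketched as below. This is one common way to do it and not necessarily how the library implements it; the signature is an assumption, and the items must be hashable.

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def unique_values(items: Iterable[T]) -> list[T]:
    # dict keys keep insertion order (Python 3.7+), so the first
    # occurrence of each value decides its place in the result.
    return list(dict.fromkeys(items))


tags = unique_values(["wikipedia", "_category:science", "wikipedia", "nopic"])
```

A plain `set` would also remove duplicates but gives no ordering guarantee, which matters when the resulting list ends up in a metadata value such as Tags.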