Generate Python msgspec.Struct classes from the Schema.org vocabulary for high-performance data validation and serialization.
Inspired by pydantic_schemaorg.
Provide a tool to automatically generate efficient Python data structures based on Schema.org, using the msgspec library. This enables fast serialization, deserialization, and validation of Schema.org structured data.
This project was developed using a combination of AI tools:
- Cursor IDE: The primary development environment
- Claude 3.7 Sonnet: Used as the primary AI coding agent
- Gemini 2.5: Used for brainstorming and architecture planning
The entire project was developed using this AI-assisted workflow, from initial concept to final implementation.
While AI assisted in development, all code was reviewed and tested.
- Schema Acquisition: Downloads the latest Schema.org vocabulary (JSON-LD).
- Type Mapping: Maps Schema.org types (Text, Number, Date, URL, etc.) to Python types (str, int | float, datetime.date, URL, bool).
- Code Generation: Creates msgspec.Struct definitions from Schema.org types, including type hints and docstrings.
- Proper Inheritance: Preserves the Schema.org class hierarchy using Python inheritance (Book inherits from CreativeWork, which inherits from Thing).
- JSON-LD Compatibility: All models support JSON-LD fields (@id, @type, @context) that serialize correctly.
- Property Cardinality: Implements Schema.org's multiple-value property model, where properties can take both single values and lists of values.
- Category Organization: Organizes generated classes into subdirectories (CreativeWork, Person, etc.).
- Circular Dependency Resolution: Uses forward references ("TypeName") and TYPE_CHECKING imports.
- Python Compatibility: Handles reserved keywords.
- Convenient Imports: All generated classes are importable from msgspec_schemaorg.models.
- ISO8601 Date Handling: Utility function parse_iso8601 for date/datetime strings.
- Type Specificity: Sorts type unions to prioritize more specific types (e.g., Integer before Number).
- URL Validation: Validates URL fields using a centralized URL type with pattern validation.
- Comprehensive Testing: Includes tests for model generation, validation, inheritance, and usage.
pip install msgspec-schemaorg
Or install from source for development:
git clone https://github.com/mikewolfd/msgspec-schemaorg.git
cd msgspec-schemaorg
pip install -e .
import msgspec
from msgspec_schemaorg.models import Person, PostalAddress
# Create Struct instances
address = PostalAddress(
    streetAddress="123 Main St",
    addressLocality="Anytown",
    postalCode="12345",
    addressCountry="US"
)
person = Person(
    name="Jane Doe",
    jobTitle="Software Engineer",
    address=address,
    # JSON-LD fields
    id="https://example.com/people/jane",
    context="https://schema.org"
)
# Encode to JSON
json_bytes = msgspec.json.encode(person)
print(json_bytes.decode())
# Output: {"name":"Jane Doe","jobTitle":"Software Engineer","address":{"streetAddress":"123 Main St","addressLocality":"Anytown","postalCode":"12345","addressCountry":"US"},"@id":"https://example.com/people/jane","@context":"https://schema.org","@type":"Person"}
Run the generation script. This fetches the schema and creates Python models in msgspec_schemaorg/models/.
python scripts/generate_models.py
Options:
- --schema-url URL: Specify Schema.org data URL.
- --output-dir DIR: Set output directory for generated code.
- --save-schema: Save the downloaded schema JSON locally.
- --clean: Clean the output directory before generation.
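The options above combine in the usual command-line way. The sketch below mirrors them with `argparse` purely to show how the flags relate; it is not the project's script, and the `None` defaults and the `build/models` path are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(
        description="Generate msgspec models from Schema.org (sketch)."
    )
    # Defaults are placeholders; the real script's defaults are not documented here.
    parser.add_argument("--schema-url", metavar="URL", default=None,
                        help="Schema.org data URL")
    parser.add_argument("--output-dir", metavar="DIR", default=None,
                        help="Output directory for generated code")
    parser.add_argument("--save-schema", action="store_true",
                        help="Save the downloaded schema JSON locally")
    parser.add_argument("--clean", action="store_true",
                        help="Clean the output directory before generation")
    return parser


# Equivalent to passing: --clean --output-dir build/models
args = build_parser().parse_args(["--clean", "--output-dir", "build/models"])
```

Boolean flags such as `--clean` and `--save-schema` take no value, while `--schema-url` and `--output-dir` each expect one argument.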
Import and use the generated Struct classes as shown in the Quick Start. All models are available under msgspec_schemaorg.models.
from msgspec_schemaorg.models import BlogPosting, Person, Organization, ImageObject
# Create nested objects
blog_post = BlogPosting(
    name="Understanding Schema.org with Python",
    headline="How to Use Schema.org Types in Python",
    author=Person(name="Jane Author"),
    publisher=Organization(name="TechMedia Inc."),
    image=ImageObject(url="https://example.com/images/header.jpg"),
    datePublished="2023-09-15",  # ISO8601 date string
    # JSON-LD fields
    id="https://example.com/blog/schema-org-python",
    context="https://schema.org"
)
All Schema.org models preserve the original class hierarchy:
from msgspec_schemaorg.models import Thing, CreativeWork, Book
# All Schema.org types inherit ultimately from Thing
isinstance(Book(), Thing) # True
isinstance(Book(), CreativeWork) # True
# Properties are inherited
book = Book(name="The Great Gatsby")
print(book.name)  # Inherited from Thing
All models have JSON-LD fields for linked data integration:
from msgspec_schemaorg.models import Product
import msgspec
import json
# Create a product with JSON-LD fields
product = Product(
    name="Smartphone",
    id="https://example.com/products/123",  # Maps to @id
    context="https://schema.org",  # Maps to @context
    type="Product"  # Maps to @type (usually has default value)
)
# Encode to JSON
json_bytes = msgspec.json.encode(product)
data = json.loads(json_bytes)
# JSON-LD fields are properly serialized with @ prefix
print(data["@id"]) # https://example.com/products/123
print(data["@context"]) # https://schema.org
print(data["@type"])  # Product
Use the parse_iso8601 utility for date strings:
from msgspec_schemaorg.utils import parse_iso8601
from msgspec_schemaorg.models import BlogPosting
published_date = parse_iso8601("2023-09-15") # -> datetime.date
modified_time = parse_iso8601("2023-09-20T14:30:00Z") # -> datetime.datetime
post = BlogPosting(datePublished=published_date, dateModified=modified_time)
print(post.datePublished.year)  # 2023
URL fields are automatically validated using a centralized URL type:
import msgspec
from msgspec_schemaorg.models import WebSite
# Valid URL
website = WebSite(name="My Website", url="https://example.com")
# Invalid URL during decoding raises ValidationError
try:
    msgspec.json.decode(
        b'{"name":"Invalid Site", "url":"not-a-valid-url"}',
        type=WebSite
    )
except msgspec.ValidationError as e:
    print(f"Validation Error: {e}")
Use run.py for common tasks:
python run.py generate # Generate models
python run.py test # Run all tests
python run.py example # Run basic example
python run.py all # Generate models and run tests/examples
Run the test suite:
python run_tests.py
Or run specific test groups:
python run_tests.py unittest
python run_tests.py examples
python run_tests.py imports
python run_tests.py inheritance # Test the inheritance structure
The tests cover model generation, imports, date parsing, URL validation, inheritance, and example script execution.
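As an illustration of the kind of date-parsing check mentioned above, the following sketch uses a stand-in helper built on the standard library. `parse_iso8601_sketch` is a hypothetical name and is not the project's `parse_iso8601`; it only reproduces the documented behavior of returning a date for date-only strings and a datetime otherwise.

```python
import datetime as dt
from typing import Union


def parse_iso8601_sketch(value: str) -> Union[dt.date, dt.datetime]:
    """Stand-in for illustration only; not the library's implementation."""
    if "T" in value:
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
        # so normalize it to an explicit UTC offset first.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        return dt.datetime.fromisoformat(value)
    return dt.date.fromisoformat(value)


published = parse_iso8601_sketch("2023-09-15")
modified = parse_iso8601_sketch("2023-09-20T14:30:00Z")

# Date-only input yields a plain date; a timestamp yields an aware datetime.
assert type(published) is dt.date
assert published.year == 2023
assert isinstance(modified, dt.datetime)
assert modified.utcoffset() == dt.timedelta(0)
```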
- Primitives: Schema.org types like Text, Number, Date, URL are mapped to Python types (str, int | float, datetime.date, URL, bool).
- Specificity: Type unions are sorted (e.g., Integer before Number).
- Literals: Boolean constants use Literal[True] / Literal[False].
- URLs: Validated using a consistent URL type with pattern validation.
- Inheritance: Schema.org hierarchy is preserved through Python class inheritance.
- JSON-LD: All models support standard JSON-LD fields (@id, @type, @context).
- Enumerations: Schema.org enumerations are available as Python Enum classes in the msgspec_schemaorg.enums package, organized by category (e.g., msgspec_schemaorg.enums.intangible).
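The primitive mapping and specificity ordering described above can be sketched with a lookup table and a sort. The table, the rank values, and the function name are assumptions for illustration (in particular, Integer mapping to int and the exact ordering beyond Integer before Number); the generator's real tables may differ.

```python
from typing import Dict, List

# Hypothetical mapping from Schema.org primitives to Python type names.
PRIMITIVE_MAP: Dict[str, str] = {
    "Boolean": "bool",
    "Integer": "int",
    "Number": "int | float",
    "Date": "datetime.date",
    "URL": "URL",
    "Text": "str",
}

# Lower rank means more specific; only "Integer before Number" comes from the text.
SPECIFICITY_RANK: Dict[str, int] = {
    "Boolean": 0, "Integer": 1, "Number": 2, "Date": 3, "URL": 4, "Text": 5,
}


def python_types(schema_types: List[str]) -> List[str]:
    """Return de-duplicated Python type names, most specific first."""
    ordered = sorted(schema_types, key=lambda t: SPECIFICITY_RANK[t])
    result: List[str] = []
    for schema_type in ordered:
        # A mapping such as "int | float" contributes each member once.
        for piece in PRIMITIVE_MAP[schema_type].split(" | "):
            if piece not in result:
                result.append(piece)
    return result


union_members = python_types(["Text", "Number", "Integer"])
```

Placing the narrower type first matters when a decoder tries union members in order, since a broad type such as str would otherwise accept values meant for a stricter one.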
Access and use Schema.org enumeration values as Python Enums:
from msgspec_schemaorg.enums.intangible import DeliveryMethod, MediaAuthenticityCategory
import msgspec
# Create an offer with enum value
offer = {
    "name": "Fast Delivery Package",
    "price": 15.99,
    "availableDeliveryMethod": DeliveryMethod.LockerDelivery,
    "priceCurrency": "USD"
}
# Encode to JSON (enums serialize to their string values)
json_bytes = msgspec.json.encode(offer)
print(json_bytes.decode())
# Output includes: "availableDeliveryMethod": "LockerDelivery"
# List all enum values
for method in DeliveryMethod:
    print(f" - {method.name}: {method.value}")
# Access enum metadata
print(f"ID: {DeliveryMethod.ParcelService.__schema_id__}")
print(f"Label: {DeliveryMethod.ParcelService.__schema_label__}")
print(f"Comment: {DeliveryMethod.ParcelService.__schema_comment__}")
# Use enums in model classes
from msgspec_schemaorg.models import MediaReview, Person
review = MediaReview(
    name="Image Analysis",
    author=Person(name="Media Reviewer"),
    mediaAuthenticityCategory=MediaAuthenticityCategory.OriginalMediaContent
)
Enum classes are organized by category in the msgspec_schemaorg.enums package. The most commonly used enums are in the msgspec_schemaorg.enums.intangible module.
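For readers curious how enum members can carry metadata such as `__schema_id__`, here is one possible standard-library pattern. It is a sketch under assumptions: the class name `DeliveryMethodSketch`, the use of a custom `__new__`, and the metadata strings are all illustrative and may not match how the generated enums are built.

```python
from enum import Enum


class DeliveryMethodSketch(str, Enum):
    """Hypothetical enum whose members carry Schema.org metadata."""

    def __new__(cls, value: str, schema_id: str, label: str):
        obj = str.__new__(cls, value)
        obj._value_ = value
        # Names ending in double underscores are not name-mangled.
        obj.__schema_id__ = schema_id
        obj.__schema_label__ = label
        return obj

    # Metadata strings below are placeholders for illustration.
    LockerDelivery = ("LockerDelivery", "https://schema.org/LockerDelivery", "LockerDelivery")
    ParcelService = ("ParcelService", "https://schema.org/ParcelService", "ParcelService")


member = DeliveryMethodSketch("ParcelService")  # look up a member by its value
names = [m.name for m in DeliveryMethodSketch]
```

Because the members are also `str` instances in this sketch, they compare equal to their plain string values.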
- Core Schema Only: Extensions (e.g., health/medical) are not included.
- Optional Properties: All properties are generated as optional (| None).
- Extra Fields Ignored by Default: By default, msgspec ignores fields present in the input data but not defined in the Struct. To raise an error for unknown fields, Structs must be defined with forbid_unknown_fields=True.
Contributions are welcome! Please see CONTRIBUTING.md.
This project is licensed under the MIT License - see the LICENSE file for details.