Support Gemini enhanced JSON Schema features #3357
Google announced in November 2025 that Gemini 2.5+ models now support enhanced JSON Schema features including title, $ref/$defs, anyOf/oneOf, minimum/maximum, additionalProperties, prefixItems, and property ordering. This removes workarounds in GoogleJsonSchemaTransformer and allows native $ref and oneOf support instead of forced inlining and conversion.
Key findings from empirical testing:
- Native $ref/$defs support confirmed (no inlining needed)
- Both anyOf and oneOf work natively (no conversion needed)
- exclusiveMinimum/exclusiveMaximum NOT yet supported by Google SDK
Changes:
- Set prefer_inlined_defs=False to use native $ref/$defs instead of inlining
- Remove oneOf→anyOf conversion (both work natively now)
- Remove adapter code that stripped title, additionalProperties, and prefixItems
- Keep stripping exclusiveMinimum/exclusiveMaximum (not yet supported)
- Remove code that raised errors for $ref schemas
- Update GoogleJsonSchemaTransformer docstring to document all supported features
- Update test_json_def_recursive to verify recursive schemas work with $ref
- Add comprehensive test suite for new JSON Schema capabilities
- Add documentation section highlighting enhanced JSON Schema support with examples
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
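For context, the "forced inlining" that this commit removes can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `inline_defs` is a hypothetical helper, and it only handles local `#/$defs/...` references. It also shows why recursive schemas previously had to be rejected: a definition that refers to itself has no finite inlined form.

```python
from copy import deepcopy
from typing import Any


def inline_defs(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: replace local '#/$defs/Name' refs with their definitions."""
    defs = schema.get('$defs', {})
    prefix = '#/$defs/'

    def resolve(node: Any, seen: tuple[str, ...]) -> Any:
        if isinstance(node, list):
            return [resolve(item, seen) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get('$ref')
        if isinstance(ref, str) and ref.startswith(prefix):
            name = ref[len(prefix):]
            if name in seen:
                # A self-referencing definition cannot be expanded to a finite schema.
                raise ValueError(f'recursive $ref cannot be inlined: {ref}')
            # Simplification: keys that sit next to '$ref' are dropped.
            return resolve(deepcopy(defs[name]), seen + (name,))
        return {key: resolve(value, seen) for key, value in node.items()}

    # The '$defs' table itself is not needed once every reference is expanded.
    root = {key: value for key, value in schema.items() if key != '$defs'}
    return resolve(root, ())
```

With native `$ref`/`$defs` support, a schema can be sent as-is and this step (including the failure on recursive models) is no longer needed.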
- Updated GoogleJsonSchemaTransformer docstring to note that discriminator is not supported (causes validation errors with nested oneOf)
- Added reference to Google's announcement blog post
- Added test_google_discriminator.py to document the limitation

- Changed test to verify discriminator stripping without API calls
- Added proper type hints for pyright compliance
- Test now validates transformation behavior directly

Critical fixes:
- Rewrote test_google_json_schema_features.py to test schema transformation only (not API calls) since enhanced features require Vertex AI which CI doesn't have
- Added prominent warning in docs that enhanced features are Vertex AI only
- Updated doc examples to use google-vertex: prefix
- Fixed test_google_discriminator.py schema path issue
- All tests now pass locally
Key discovery: additionalProperties, $ref, and other enhanced features are NOT supported in the Generative Language API (google-gla:), only in Vertex AI (google-vertex:). This is validated by the Google SDK.

CRITICAL FIX: The same GoogleJsonSchemaTransformer was being used for both Vertex AI and GLA, but they have different JSON Schema support levels.
Changes:
- Created GoogleVertexJsonSchemaTransformer (enhanced features supported)
  * Supports: $ref, $defs, additionalProperties, title, prefixItems, etc.
  * Uses prefer_inlined_defs=False for native $ref support
- Created GoogleGLAJsonSchemaTransformer (limited features)
  * Strips: additionalProperties, title, prefixItems
  * Uses prefer_inlined_defs=True to inline all $refs
  * More conservative transformations for GLA compatibility
- Updated GoogleGLAProvider to use google_gla_model_profile
- Updated GoogleVertexProvider to use google_vertex_model_profile
- GoogleJsonSchemaTransformer now aliases to Vertex version (backward compat)
- Updated all tests to use GoogleVertexJsonSchemaTransformer
This ensures GLA won't receive unsupported schema features that cause validation errors like "additionalProperties is not supported in the Gemini API"
docs/models/google.md
## Enhanced JSON Schema Support

!!! note "Vertex AI Only"
    The enhanced JSON Schema features listed below are **only available when using Vertex AI** (`google-vertex:` prefix or `GoogleProvider(vertexai=True)`). They are **not supported** in the Generative Language API (`google-gla:` prefix).
Are you sure? https://blog.google/technology/developers/gemini-api-structured-outputs/ and https://ai.google.dev/gemini-api/docs/structured-output are about the Gemini API, not Vertex.
Note that https://ai.google.dev/gemini-api/docs/structured-output?example=feedback#model_support says we have to use response_json_schema instead of the response_schema key we currently set:
response_schema=response_schema,
response_schema=generation_config.get('response_schema'),
When we do that, maybe it will work for GLA and Vertex?
You're right, I've updated the PR. Tests show there's not a difference (with one possible exception -- see the comment below)
Instead of testing the schema transformer itself, we should add a test to test_google.py that uses a BaseModel like this as NativeOutput and then verifies that the request succeeds.
@conradlee Thanks for working on this, Conrad!
Key changes based on review feedback:
1. Switch from response_schema to response_json_schema
   - This bypasses Google SDK validation that rejected enhanced features for GLA
   - Enhanced features now work for BOTH GLA and Vertex AI!
2. Remove separate GLA/Vertex transformers
   - No longer needed since response_json_schema works everywhere
   - Reverted to single GoogleJsonSchemaTransformer
   - Removed prefer_inlined_defs and simplify_nullable_unions parameters
3. Simplify transformer implementation
   - Removed unnecessary comments and complexity
   - Removed Enhanced JSON Schema Support docs section (users don't need to know internal details)
4. Remove schema transformation tests
   - Deleted test_google_json_schema_features.py
   - Deleted test_google_discriminator.py
   - Removed test_gemini.py::test_json_def_recursive
   - These tested implementation details, not actual functionality
   - Existing test_google_model_structured_output provides adequate coverage
The root cause was using response_schema (old API) instead of response_json_schema (new API). response_json_schema bypasses the restrictive validation and supports all enhanced features for both GLA and Vertex AI.
Addresses review by @DouweM in PR pydantic#3357
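The key switch described in this commit can be illustrated with a small sketch. It uses a plain dict in place of the SDK's config object, and `build_generation_config` is a hypothetical helper; only the key names `response_schema` and `response_json_schema` come from the discussion above, and `response_mime_type` is assumed here as the accompanying setting for JSON output.

```python
from typing import Any


def build_generation_config(
    json_schema: dict[str, Any], *, use_json_schema_key: bool = True
) -> dict[str, Any]:
    """Hypothetical helper: pick which config key carries the output schema."""
    # 'response_json_schema' accepts a JSON Schema document as-is, while
    # 'response_schema' is the older key that was validated more strictly.
    key = 'response_json_schema' if use_json_schema_key else 'response_schema'
    return {'response_mime_type': 'application/json', key: json_schema}


# A schema that relies on $ref/$defs, which the older key reportedly rejected on GLA.
schema = {
    'type': 'object',
    'properties': {'root': {'$ref': '#/$defs/Node'}},
    '$defs': {'Node': {'type': 'object', 'properties': {'value': {'type': 'string'}}}},
}
config = build_generation_config(schema)
```

The point of the sketch is only that the schema travels under a different key; the schema itself does not need to be rewritten.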
The November 2025 announcement explicitly states that Google now supports 'type: null' in JSON schemas, so we don't need to convert anyOf with null to the OpenAPI 3.0 'nullable: true' format.
Keep __init__ method for documentation purposes to explicitly note why we're using the defaults (native support for $ref and type: null).
Addresses reviewer question: "Do we still need simplify_nullable_unions? type: 'null' is now supported natively"
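As a rough illustration of the conversion that is no longer needed, the sketch below rewrites a two-member `anyOf` containing `{'type': 'null'}` into the OpenAPI 3.0 `nullable: true` form. `simplify_nullable_union` is a hypothetical stand-in for what the `simplify_nullable_unions` option does, not the library's code, and it handles only the top level of a schema.

```python
from typing import Any


def simplify_nullable_union(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: anyOf [X, null] -> X plus nullable: True."""
    options = schema.get('anyOf')
    if not isinstance(options, list):
        return schema
    non_null = [option for option in options if option != {'type': 'null'}]
    # Only the exact "one real type plus null" shape is rewritten.
    if len(options) == 2 and len(non_null) == 1:
        rest = {key: value for key, value in schema.items() if key != 'anyOf'}
        return {**rest, **non_null[0], 'nullable': True}
    return schema


optional_name = {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None}
converted = simplify_nullable_union(optional_name)
```

With native `type: 'null'` support, `optional_name` can be sent unchanged instead of `converted`.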
- Remove enum-to-string conversion workaround (no longer needed)
- Add 6 comprehensive tests for enhanced features:
  * Discriminated unions (oneOf with $ref)
  * Recursive schemas ($ref and $defs)
  * Dicts with additionalProperties
  * Optional/nullable fields (type: 'null')
  * Integer enums (native support)
  * Recursive schema with gemini-2.5-flash (FAILING)
All tests use google_provider with GLA API and recorded cassettes. Tests use gemini-2.5-flash except recursive schema which uses gemini-2.0-flash.
NOTE: test_google_recursive_schema_native_output_gemini_2_5 consistently fails with 500 Internal Server Error. This needs investigation before merge.

The test_google_recursive_schema_native_output_gemini_2_5 test now uses vertex_provider and PASSES successfully.
NOTE: During development, this test consistently failed with a 500 error when using google_provider (GLA with GEMINI_API_KEY). However, it passes with vertex_provider (Vertex AI). This may be:
- A temporary GLA API issue
- A limitation specific to certain API keys
- An issue with the GLA endpoint for recursive schemas
Maintainers should verify this works with their GLA setup before merge.
The __init__ method was just calling super().__init__() with the same parameters, providing no additional functionality. The base class defaults are exactly what we need:
- prefer_inlined_defs defaults to False (native $ref/$defs support)
- simplify_nullable_unions defaults to False (type: 'null' support)

This commit fixes all test failures in the CI/CD pipeline:
1. **test_gemini.py snapshot updates** (7 tests):
   - Updated snapshots to reflect new behavior where JSON schemas are NOT transformed
   - Enums now stay as native types (integers remain integers, not converted to strings)
   - $ref and $defs are now preserved (not inlined)
   - anyOf with type: 'null' replaces nullable: true
   - title fields are preserved
2. **test_gemini_additional_properties_is_true**:
   - Removed pytest.warns() assertion since additionalProperties with schemas now work natively
   - Added docstring explaining this is supported since Nov 2025 announcement
3. **Cassette scrubbing fix**:
   - Added 'client_id' to the list of scrubbed OAuth2 parameters in json_body_serializer.py
   - This ensures all Vertex AI cassettes normalize to the same OAuth credentials
   - Fixes CannotOverwriteExistingCassetteException in CI
4. **Re-scrubbed cassette**:
   - Manually scrubbed client_id in test_google_recursive_schema_native_output_gemini_2_5.yaml
   - Now matches the pattern used by other Vertex AI cassettes
All tests now pass locally. The vertex test is correctly skipped locally and will run in CI using the cassette.
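The cassette scrubbing fix can be pictured with the following sketch. It is not the code in json_body_serializer.py: `scrub_form_body` and `SCRUBBED_PARAMS` are hypothetical names, and apart from `client_id` (named in the commit) the parameter list is an assumption based on common OAuth2 token requests.

```python
from urllib.parse import parse_qsl, urlencode

# Assumed list; only 'client_id' is confirmed by the commit message.
SCRUBBED_PARAMS = {'client_id', 'client_secret', 'refresh_token'}


def scrub_form_body(body: str) -> str:
    """Hypothetical helper: replace sensitive form values with a fixed placeholder."""
    pairs = parse_qsl(body, keep_blank_values=True)
    return urlencode(
        [(key, 'scrubbed' if key in SCRUBBED_PARAMS else value) for key, value in pairs]
    )


recorded = 'grant_type=refresh_token&client_id=abc123&refresh_token=xyz'
normalized = scrub_form_body(recorded)
```

Because every recording normalizes to the same placeholder values, a request made in CI with different credentials still matches the stored cassette.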
The cassette was recorded with project 'ck-nest-prod' but CI uses 'pydantic-ai'. Also fixed content-length header to match scrubbed body (137 bytes).
@conradlee It's failing for me as well, I've asked our contacts at Google if that's expected or not.
if '$ref' in schema:
    raise UserError(f'Recursive `$ref`s in JSON Schema are not supported by Gemini: {schema["$ref"]}')

if 'prefixItems' in schema:
We don't have a test yet that verifies that prefixItems now works
Now added a test for this based on a coordinate class whose json schema representation looks like
{
"description": "A 2D coordinate with latitude and longitude.",
"properties": {
"point": {
"maxItems": 2,
"minItems": 2,
"prefixItems": [
{
"type": "number"
},
{
"type": "number"
}
],
"title": "Point",
"type": "array"
}
},
"required": [
"point"
],
"title": "Coordinate",
"type": "object"
}
Luckily this test passes with the google provider.
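To make the `prefixItems` semantics concrete, here is a small, deliberately incomplete checker for the `point` sub-schema shown above. `matches_tuple_schema` is a hypothetical helper that understands only `minItems`, `maxItems`, and `prefixItems` entries of type `number`; it is a sketch, not a JSON Schema validator.

```python
from typing import Any

POINT_SCHEMA = {
    'type': 'array',
    'minItems': 2,
    'maxItems': 2,
    'prefixItems': [{'type': 'number'}, {'type': 'number'}],
}


def matches_tuple_schema(value: Any, schema: dict[str, Any]) -> bool:
    """Hypothetical helper: check length bounds and per-position 'number' types."""
    if not isinstance(value, (list, tuple)):
        return False
    if not schema.get('minItems', 0) <= len(value) <= schema.get('maxItems', len(value)):
        return False
    # prefixItems constrains items by position: first schema for item 0, and so on.
    for item, item_schema in zip(value, schema.get('prefixItems', [])):
        if item_schema.get('type') == 'number':
            # bool is a subclass of int in Python but is not a JSON number.
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return False
    return True
```

A two-element list of numbers such as `[52.5, 13.4]` satisfies the schema, while a shorter list or a list with a string in either position does not.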
@DouweM I have added support for the From my perspective, the only remaining issue is to clarify whether GLA supports recursive schemas.
Also, this script demonstrates in a bit more depth that support for recursive schemas (which is officially claimed here) is flaky: https://gist.github.com/conradlee/a884a8eee7ad78e256b51d4a688b2ad6
@conradlee Thanks for doing the testing Conrad, I've shared that with Google. For now let's continue with this PR on the assumption that we cannot rely on this working in the Gemini API, so we should keep the old inline-defs behavior (unless we can determine exactly under what conditions it does and does not work). I suggest implementing that on Let me know if that makes sense or if you'd like help!
cab0911 to 8dcf07a
This test verifies that Google has fixed the issue where gemini-2.5-flash would return 500 errors when using recursive schemas with $refs on the GLA (Generative Language API) endpoint.
The test successfully passes, confirming that:
- Recursive schemas with $defs and $refs now work on GLA
- gemini-2.5-flash properly handles TreeNode structures
- Google's enhanced JSON Schema support is fully functional
This validates that we can use the same JSON schema transformer for both GLA and Vertex AI endpoints without needing defensive workarounds. Test cassette has been reviewed and contains no sensitive information.

Successfully recorded new cassette for gemini-2.5-flash on Vertex AI with recursive schemas. The test passes, confirming that recursive schemas with $refs and $defs work properly on both GLA and Vertex AI.
Changes:
- Updated cassette with successful test run
- Scrubbed sensitive ID token that contained email address
- Test confirms recursive TreeNode schemas work on Vertex AI
This validates that Google's enhanced JSON Schema support works across both their API endpoints.
Hey @DouweM I brought this issue up on twitter and Philipp Schmid said the google team will look into it. Interestingly, the problem is now resolved. I'm not getting the 5XX errors with recursive schemas any more--neither with the GLA nor Vertex providers. I had claude create a comprehensive test suite with different scenarios and they all passed. Not sure why it works now and didn't before - perhaps the roll-out of the schema improvements just takes several days. In any case, the official Google docs mention that gemini supports refs, and now it really works. So I think we should use them here. I've now added a recursive-schema test that uses the GLA provider; this is the one that was failing before but now passes. I would argue the PR is ready to merge.
The content-length header needs to be updated after replacing the JWT token with 'scrubbed'. The original token was ~872 chars longer than 'scrubbed', so the content-length is reduced from 1518 to 646 bytes.
Thanks for catching this important detail!
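The arithmetic behind that header change can be checked with a short sketch. `scrub_secret` is a hypothetical helper and the body below is synthetic filler sized to match the numbers in the commit message; it is not the real cassette content.

```python
def scrub_secret(body: str, secret: str, placeholder: str = 'scrubbed') -> tuple[str, int]:
    """Hypothetical helper: replace the secret and return the new body and its byte length."""
    scrubbed = body.replace(secret, placeholder)
    # Content-Length counts bytes, so measure the encoded body rather than characters.
    return scrubbed, len(scrubbed.encode('utf-8'))


# Synthetic stand-ins: an 880-character token is 872 characters longer than 'scrubbed'.
fake_token = 'x' * 880
fake_body = 'a' * 638 + fake_token  # 1518 bytes before scrubbing
scrubbed_body, content_length = scrub_secret(fake_body, fake_token)
```

Here the length drops by exactly the difference between the token and the placeholder, which is the adjustment the cassette header needed.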
The cassette was recorded with local project (ck-nest-dev) and location (us-central1), but CI uses pydantic-ai project with global location.
Updated cassette to match CI environment:
- Project: ck-nest-dev → pydantic-ai
- Location: us-central1 → global
- Host: us-central1-aiplatform.googleapis.com → aiplatform.googleapis.com
Tested locally with CI=true and confirmed test passes. This fixes the CannotOverwriteExistingCassetteException in CI.
02d5576 to e640bda
The test was using a fixed 0.1-second sleep and assuming the task would fail within that time. In slower CI environments or under Python 3.10, this timing assumption could fail.
Changed to use the same retry loop pattern used by all other a2a tests:
- Poll the task status up to 50 times (5 seconds total)
- Break early when the task reaches 'failed' state
- Raise clear error if timeout is reached
This matches the pattern established in previous commits (a253fad, etc.) that fixed similar flakiness in other a2a tests.
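The retry loop pattern described here might look like the sketch below. `wait_for_state` and its parameters are hypothetical names, and the real a2a tests may well be async; this synchronous version only illustrates the poll, early exit, and timeout error.

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    *,
    attempts: int = 50,
    interval: float = 0.1,
) -> None:
    """Hypothetical helper: poll until get_state() returns target, or raise on timeout."""
    for _ in range(attempts):
        if get_state() == target:
            return  # stop as soon as the expected state is observed
        time.sleep(interval)
    raise TimeoutError(f'state did not reach {target!r} after {attempts} attempts')
```

With the defaults this waits at most about 5 seconds (50 polls at 0.1 seconds), but returns immediately on a fast machine, which is what makes it less flaky than a single fixed sleep.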
@conradlee Philipp confirmed to me that they fixed it in response to your/our report :) Let me do one final review and hopefully merge...
Per reviewer feedback, moved the test_simplify_nullable_unions test from test_utils.py to a new dedicated test_json_schema.py file. The test_utils.py file should only contain tests for the _utils module, not for _json_schema.
Changes:
- Created new tests/test_json_schema.py file for _json_schema module tests
- Moved test_simplify_nullable_unions from test_utils.py to test_json_schema.py
- Removed unused import of JsonSchemaTransformer from test_utils.py
- Removed unused import of Any from test_utils.py
All tests continue to pass in both files.

Per reviewer feedback, now that recursive schemas work on both GLA and Vertex AI endpoints, we follow the usual convention of only testing the GLA provider.
Changes:
- Removed test_google_recursive_schema_native_output_gemini_2_5 (Vertex AI)
- Renamed test_google_recursive_schema_native_output_gemini_2_5_gla to test_google_recursive_schema_native_output_gemini_2_5 (simpler name)
- Updated docstring to be endpoint-agnostic
- Deleted Vertex AI cassette
- Renamed GLA cassette to match the new test name
All recursive schema tests continue to pass.

@DouweM Thanks for the prompt review. I've addressed the issues you brought up, could you have another look?
assert {child.value for child in result.output.children} == snapshot({'B', 'C'})


async def test_google_recursive_schema_native_output_gemini_2_5(
We don't need the version number in this test name anymore; remember to also change the cassette
I've gone a different route because there was doubt around whether this works on gemini 2.0 and 2.5. Instead I've also added tests for gemini-2.0-flash, so in this case the version number has new relevance
)


class GoogleJsonSchemaTransformer(JsonSchemaTransformer):
@conradlee We're so close but I found another wrinkle: the PR title says "Gemini 2.5+", so I thought I'd check whether there's any version gating to do to keep Gemini 2.0 working, and it appears there is:
https://ai.google.dev/gemini-api/docs/structured-output?example=recipe#model_support says:
Note that Gemini 2.0 requires an explicit propertyOrdering list within the JSON input to define the preferred structure.
Important: Gemini 2.0 models require explicit ordering of keys in structured output schemas. When working with Gemini 2.0, you must define the desired property ordering as a list within the propertyOrdering field as part of your schema configuration.
So we need a test for Gemini 2.0, and I expect that in the google_model_profile function here, we need to use a slightly different JSON schema transformer for 2.0, possibly a subclass that does all the same things and also sets propertyOrdering (unless it turns out it's not actually required).
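If `propertyOrdering` did turn out to be required for Gemini 2.0, the extra step could be sketched as below. `add_property_ordering` is a hypothetical function, not part of the PR, and it only descends into `properties` and a single-schema `items`; whether this step is needed at all is exactly what the comment above leaves open.

```python
from typing import Any


def add_property_ordering(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: record each object's property order in 'propertyOrdering'."""
    out = dict(schema)
    properties = schema.get('properties')
    if isinstance(properties, dict):
        out['properties'] = {
            name: add_property_ordering(sub) if isinstance(sub, dict) else sub
            for name, sub in properties.items()
        }
        # Dicts preserve insertion order, so this mirrors the declared field order.
        out.setdefault('propertyOrdering', list(properties))
    items = schema.get('items')
    if isinstance(items, dict):
        out['items'] = add_property_ordering(items)
    return out
```

A version-gated subclass of the transformer could apply something like this only for 2.0 models, leaving 2.5+ schemas untouched.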
@DouweM You're right to be sceptical of whether this will work on gemini 2.0. However I remember when I was setting the tests up some of them first pointed to gemini-2.0-flash and passed.
To that end I had a suspicion that gemini-2.0-flash supports all these features too--even if the documentation says it does not.
I have thus added versions of the tests that point to gemini-2.0-flash as well. They all pass. I think this is good news.
…tures
- Add tests for discriminated unions with gemini-2.0-flash
- Add tests for dict with additional properties with gemini-2.0-flash
- Add tests for optional/nullable fields with gemini-2.0-flash
- Add tests for integer enums with gemini-2.0-flash
- Add tests for prefix items (tuples) with gemini-2.0-flash
- Update test docstrings to indicate version being tested (2.0 vs 2.5)
- Fix comment in google.py to correctly reference response_schema
All enhanced JSON schema features now tested on both gemini-2.0-flash and gemini-2.5-flash to ensure compatibility across versions.
@conradlee Thanks a lot! I'll get it out today along with Gemini 3 support (#3464) :)
Summary
Updates `GoogleJsonSchemaTransformer` to support enhanced JSON Schema features announced by Google in November 2025 for Gemini 2.5+ models.
Transformer Changes (Before → After)
Before: 90+ lines with extensive workarounds
After: ~47 lines with minimal transformations
Removed Workarounds (Now Natively Supported)
- `additionalProperties` warning/removal → ✅ Native dict support
- `title` field removal → ✅ Preserved
- `oneOf`→`anyOf` conversion → ✅ Both work natively
- `$ref` recursion errors → ✅ Native `$ref`/`$defs` support
- `prefixItems`→`items` conversion → ✅ Native tuple support
- `prefer_inlined_defs=True` → ✅ Native `$defs` with references
- `simplify_nullable_unions=True` → ✅ Native `type: 'null'`
Still Transformed (Not Yet Supported)
- `$schema`, `const`, `discriminator`, `examples` → Removed
- `format` (date, time, etc.) → Moved to description field
- `exclusiveMinimum`/`exclusiveMaximum` → Removed
New Capabilities
- `dict[str, ComplexType]` with schema validation
- `type: 'null'` support
Tests
Added 6 comprehensive tests in `test_google.py`:
- `oneOf`
- `$ref`/`$defs`
- `additionalProperties`
Updated 7 snapshot tests in `test_gemini.py` to reflect new native behavior.
Migration Impact
Fully backwards-compatible - existing code continues to work, schemas are now more expressive.
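The transformations listed under "Still Transformed" can be approximated with the sketch below. It follows the wording of this summary rather than the actual transformer source: `transform_node` and `REMOVED_KEYS` are hypothetical names, the description wording is an assumption, the function handles a single schema node without recursing, and the real implementation may treat some keys (for example `const`) differently.

```python
from typing import Any

# Keys the summary lists as removed because they are not yet supported.
REMOVED_KEYS = ('$schema', 'const', 'discriminator', 'examples', 'exclusiveMinimum', 'exclusiveMaximum')


def transform_node(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: drop unsupported keys and fold 'format' into the description."""
    out = {key: value for key, value in schema.items() if key not in REMOVED_KEYS}
    fmt = out.pop('format', None)
    if fmt is not None:
        description = out.get('description')
        # Assumed wording: keep the hint visible to the model as free text.
        out['description'] = f'{description} (format: {fmt})' if description else f'Format: {fmt}'
    return out
```

Everything not named in `REMOVED_KEYS` or `format`, including `title`, `$ref`, `oneOf`, `additionalProperties`, and `prefixItems`, passes through unchanged, which is the sense in which the remaining transformations are minimal.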
🤖 Generated with Claude Code
Related: Google Announcement - Gemini API Structured Outputs
Fixes #3364