
Conversation

mkmeral commented Oct 10, 2025

Add Gemini Live API Support for Bidirectional Streaming

Description

This PR adds support for Google's Gemini Live API as a bidirectional streaming model provider, enabling real-time audio conversations with native audio input/output, image/video input, and automatic transcription.

Key Features

Gemini Live Model Provider (gemini_live.py)

  • Uses official google-genai SDK for robust WebSocket communication
  • Native audio streaming with 16kHz input and 24kHz output
  • Real-time audio transcription of both input and output (see the receive-loop sketch after this list)
  • Image/video frame input support for multimodal conversations
  • Automatic VAD-based interruption handling
  • Tool calling integration
  • Message history support
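
To make the event flow concrete, here is a minimal receive-loop sketch. The receive() iterator and the event keys (audioOutput, transcript, interruptionDetected) are assumptions inferred from the feature list above, and play_audio/stop_playback are hypothetical placeholders, not SDK functions:

def play_audio(chunk: bytes) -> None:
    """Placeholder: feed a 24 kHz PCM chunk to the speaker."""

def stop_playback() -> None:
    """Placeholder: flush queued audio after a VAD interruption."""

async def handle_events(agent) -> None:
    """Route session events by type (event keys are assumed, not verified)."""
    async for event in agent.receive():
        if "audioOutput" in event:
            play_audio(event["audioOutput"])           # model speech, 24 kHz PCM
        elif "transcript" in event:
            print("Transcript:", event["transcript"])  # input or output transcription
        elif "interruptionDetected" in event:
            stop_playback()                            # user spoke over the model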

Enhanced Bidirectional Streaming

  • Added ImageInputEvent type for sending images/video frames
  • Added TranscriptEvent type for audio transcriptions (separate from text output)
  • Extended BidirectionalAgent.send() to accept text, audio, and image inputs (as sketched after this list)
  • Updated abstract BidirectionalModelSession interface with send_image_content()
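
A hypothetical call site for the extended send(); the dict shapes below mirror the audio and ImageInputEvent type names in this PR but are assumptions, not verified schemas:

async def send_multimodal(agent, pcm_chunk: bytes, jpeg_bytes: bytes) -> None:
    """Feed one text, audio, and image input through the unified send()."""
    await agent.send("What do you see?")   # plain text

    await agent.send({                     # assumed audio event shape
        "audioData": pcm_chunk,            # raw 16 kHz PCM, the input rate per this PR
        "format": "pcm",
        "sampleRate": 16000,
    })

    await agent.send({                     # assumed ImageInputEvent shape
        "imageData": jpeg_bytes,           # one JPEG-encoded camera frame
        "mimeType": "image/jpeg",
    })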

Test Suite Enhancements

  • Updated test to support both Gemini Live and Nova Sonic
  • Added camera capture for real-time video frame streaming at 1 FPS (see the capture sketch after this list)
  • Demonstrates audio + video multimodal interaction
  • Falls back to Nova Sonic if no Gemini API key provided
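
A rough sketch of the 1 FPS capture loop: the OpenCV calls are standard, while the send() payload shape is an assumption based on the ImageInputEvent type added in this PR:

import asyncio

import cv2

async def stream_camera(agent, device: int = 0) -> None:
    """Capture webcam frames and forward them to the session at ~1 FPS."""
    cap = cv2.VideoCapture(device)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            encoded, jpeg = cv2.imencode(".jpg", frame)  # JPEG-encode the frame
            if encoded:
                await agent.send({"imageData": jpeg.tobytes(),
                                  "mimeType": "image/jpeg"})
            await asyncio.sleep(1.0)                     # throttle to ~1 FPS
    finally:
        cap.release()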

Implementation Details

The implementation follows the same architectural patterns as Nova Sonic:

  • Provider-agnostic event conversion
  • Clean separation between session management and model interface
  • Simplified configuration: all Gemini Live API parameters pass through directly to the SDK
  • Proper async/await patterns, with a context manager for the connection lifecycle (sketched after this list)
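
For reference, the underlying google-genai pattern the provider wraps is an async context manager; this is a simplified sketch of that SDK usage, not the provider's actual code:

import asyncio

from google import genai

async def main() -> None:
    client = genai.Client(api_key="your-api-key")
    async with client.aio.live.connect(
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        config={"response_modalities": ["AUDIO"]},
    ) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Hello"}]}
        )
        async for message in session.receive():  # server events for this turn
            ...

asyncio.run(main())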

Configuration Example

from strands.experimental.bidirectional_streaming.models.gemini_live import GeminiLiveBidirectionalModel

model = GeminiLiveBidirectionalModel(
    model_id="gemini-2.5-flash-native-audio-preview-09-2025",
    api_key="your-api-key",
    params={
        "response_modalities": ["AUDIO"],
        "input_audio_transcription": {},   # Enable input transcription
        "output_audio_transcription": {},  # Enable output transcription
    }
)
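
A hypothetical way to put the configured model to work; the BidirectionalAgent constructor and start() call are assumptions based on this PR's description, not verified signatures:

from strands.experimental.bidirectional_streaming.agent.agent import BidirectionalAgent

async def run(model) -> None:
    agent = BidirectionalAgent(model=model)  # assumed constructor
    await agent.start()                      # assumed session start
    await agent.send("Hello!")               # text, audio, and images all go through send()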

Related Issues

Documentation PR

Type of Change

New feature

Testing

How have you tested the change?

  • Tested real-time audio conversations with Gemini Live API
  • Verified audio transcription (input and output) works correctly
  • Tested image/video frame streaming from camera
  • Verified tool calling integration
  • Tested message history support
  • Confirmed interruption handling via VAD
  • Verified fallback to Nova Sonic when no API key provided
  • Ran hatch fmt for code formatting

Test Environment

  • Python 3.12+
  • Dependencies: google-genai, pyaudio, opencv-python, pillow
  • Tested with GOOGLE_AI_API_KEY environment variable

Files Changed

  1. New: src/strands/experimental/bidirectional_streaming/models/gemini_live.py (501 lines)
  2. Modified: src/strands/experimental/bidirectional_streaming/agent/agent.py - Added image input support
  3. Modified: src/strands/experimental/bidirectional_streaming/models/bidirectional_model.py - Added abstract send_image_content() method
  4. Modified: src/strands/experimental/bidirectional_streaming/models/novasonic.py - Added stub for image input (not supported)
  5. Modified: src/strands/experimental/bidirectional_streaming/types/bidirectional_streaming.py - Added ImageInputEvent and TranscriptEvent types
  6. Modified: src/strands/experimental/bidirectional_streaming/tests/test_bidirectional_streaming.py - Enhanced test with Gemini Live and camera support

Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Murat Kaan Meral added 8 commits October 6, 2025 15:22
- Add input_audio_transcription and output_audio_transcription parameter pass-through in _build_live_config()
- These parameters enable real-time transcription of both user speech (input) and model audio responses (output)
- Remove debug logging and temporary debug files (gemini_live_events.jsonl, debug_transcripts.py)
- Clean up unused json import

The transcription parameters were being set in the test configuration but weren't being passed through to the SDK because _build_live_config() only handled specific parameters. Now transcription events will be properly emitted via the transcript event type.
Instead of cherry-picking specific parameters, just pass through all config from params directly to the SDK. This is simpler and more flexible - users can configure any Gemini Live API parameter without us having to explicitly handle each one.

The previous approach was unnecessarily complicated with manual parameter filtering.
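
As a rough illustration of that pass-through (the _build_live_config name comes from this commit message; forwarding params as keyword arguments to the SDK's LiveConnectConfig is an assumption):

from google.genai import types

def _build_live_config(params: dict | None) -> types.LiveConnectConfig:
    # Forward every user-supplied parameter to the SDK unchanged,
    # rather than cherry-picking known keys.
    return types.LiveConnectConfig(**(params or {}))
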
- Add proper error logging in close() method
- Remove empty line in send_tool_result() try block
- Add newline at end of file
- Improve code consistency
- Add GeminiLiveBidirectionalModel and GeminiLiveSession to models __init__.py
- Add ImageInputEvent and TranscriptEvent to types __init__.py
- Ensures new types and model are properly exported for external use

Owner commented:

suggestion: I think for testing purposes I would prefer a new test file dedicated to the Gemini model provider.
