Owner

@mehtarac mehtarac commented Oct 9, 2025

Description

This PR adds OpenAI Realtime API support to the Strands bidirectional streaming system, enabling real-time audio conversations. Users can now choose between Amazon Nova Sonic and OpenAI Realtime API providers.

Relevant documentation:

Files Added

src/strands/experimental/bidirectional_streaming/models/openai.py

OpenAI model provider implementation

Key additions:

  • OpenAIRealtimeBidirectionalModel class for creating connections
  • OpenAIRealtimeSession class for WebSocket session management
  • WebSocket communication with OpenAI Realtime API
  • Voice activity detection event handling

src/strands/experimental/bidirectional_streaming/tests/test_bidi_openai.py

Integration test for OpenAI voice chat

Key features:

  • End-to-end audio testing with PyAudio microphone and speaker
  • Concurrent audio input/output processing
  • Real-time conversation with OpenAI's voice assistant

Files Modified

src/strands/experimental/bidirectional_streaming/types/bidirectional_streaming.py

Extended type definitions

Additions:

  • VoiceActivityEvent type for speech detection events (for OpenAI)
  • UsageMetricsEvent type for token usage tracking

Running the Test Script

Prerequisites:

  • Set OPENAI_API_KEY environment variable
  • Install pyaudio for audio I/O

Commands:

# Set API key
export OPENAI_API_KEY="sk-your-api-key-here"

# Run the test
python src/strands/experimental/bidirectional_streaming/tests/test_bidi_openai.py

Expected behavior:

  • Script starts OpenAI Realtime connection
  • Audio input from microphone is sent to OpenAI
  • OpenAI voice responses play through speakers
  • User speech and assistant responses are transcribed to console
  • Press Ctrl+C to stop the test

Related Issues

strands-agents#217

Documentation PR

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 500,
Collaborator

@JackYPCOnline JackYPCOnline Oct 9, 2025

Not sure if it is useful, but I saw two configurations, just for your reference: 1, 2
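
For context, the four settings quoted above look like the turn_detection block of a Realtime `session.update` event. A minimal sketch, assuming the settings are sent that way; the helper name `build_vad_session_update` is hypothetical and is not part of this PR:

```python
import json


def build_vad_session_update(
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> str:
    """Serialize a session.update event carrying server-side VAD settings.

    Hypothetical helper: the defaults mirror the values quoted in the diff.
    """
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }
    return json.dumps(event)
```

Exposing the three numeric values as parameters would let callers tune how quickly a pause is treated as end of speech without touching the event structure.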

system_prompt: str | None = None,
tools: list[ToolSpec] | None = None,
messages: Messages | None = None,
**kwargs,
Collaborator

This **kwargs is now unused?

Owner Author

IMO, **kwargs in create_bidirectional_connection should stay, since it's part of the abstract interface and may be used by other implementations or future extensions.


+1, all abstract methods should have **kwargs; that will allow us to extend these methods with more inputs later on.
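
To illustrate the point about keeping **kwargs on the abstract interface, here is a minimal sketch. The class names `BidirectionalModelSketch` and `EchoModel`, the simplified parameter types, and the `voice` option are all hypothetical stand-ins, not the types used in this PR:

```python
import abc
import asyncio
from typing import Any, Optional


class BidirectionalModelSketch(abc.ABC):
    """Hypothetical stand-in for the abstract provider interface."""

    @abc.abstractmethod
    async def create_bidirectional_connection(
        self,
        system_prompt: Optional[str] = None,
        tools: Optional[list] = None,
        messages: Optional[list] = None,
        **kwargs: Any,
    ) -> dict:
        """Open a session; **kwargs leaves room for provider-specific options."""


class EchoModel(BidirectionalModelSketch):
    """Toy provider that ignores extra options but reports which ones it saw."""

    async def create_bidirectional_connection(
        self, system_prompt=None, tools=None, messages=None, **kwargs
    ):
        return {"system_prompt": system_prompt, "ignored": sorted(kwargs)}


if __name__ == "__main__":
    # A caller can pass an option this provider does not know about
    # without the call failing with a TypeError.
    coro = EchoModel().create_bidirectional_connection(system_prompt="hi", voice="alloy")
    print(asyncio.run(coro))
```

With this shape, a new keyword can be added for one provider later without changing the signature that every other implementation overrides.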


await agent.send(audio_event)

except asyncio.TimeoutError:
Collaborator

We could discuss tomorrow when we want to fail hard and when we want to handle it silently.
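
As input for that discussion, the two policies could be made explicit with a flag. This is only a sketch under assumptions: `next_chunk`, the queue-based input, and the `fail_hard` parameter are hypothetical and do not reflect how the test script is actually structured:

```python
import asyncio


async def next_chunk(queue: asyncio.Queue, timeout: float, fail_hard: bool):
    """Wait for the next audio chunk, with an explicit timeout policy.

    fail_hard=True  -> re-raise the timeout so the caller stops.
    fail_hard=False -> swallow the timeout and return None so the loop continues.
    """
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        if fail_hard:
            raise
        return None
```

Making the choice a parameter keeps the silent path visible at the call site instead of hiding it in a bare except branch.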


logger.debug("OpenAI Realtime session initialized: %s", self.session_id)

def _require_active(self) -> bool:

nit: I'd expect _require_active to throw an exception if the session is not active.
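
A minimal sketch of the raising variant suggested here. `SessionNotActiveError`, `SessionSketch`, and `send_text` are hypothetical names used only for illustration; the real session class in this PR has a different shape:

```python
class SessionNotActiveError(RuntimeError):
    """Hypothetical error raised when a closed or unopened session is used."""


class SessionSketch:
    """Toy session showing a guard that raises instead of returning bool."""

    def __init__(self) -> None:
        self._active = False

    def _require_active(self) -> None:
        # Raise rather than return False, so callers cannot forget the check.
        if not self._active:
            raise SessionNotActiveError("session is not active")

    def send_text(self, text: str) -> str:
        self._require_active()
        return f"sent: {text}"
```

If the boolean behavior is kept instead, a name such as `_is_active` might match it better.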

if "project" in self.config:
    headers.append(("OpenAI-Project", self.config["project"]))

websocket = await websockets.connect(url, additional_headers=headers)

Should the websocket be created in session initialization? Though considering we'll merge the two, it probably doesn't matter much.
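
One way to keep that question open is to separate building the connection parameters from opening the socket, so either class can own the `websockets.connect` call. A sketch under assumptions: `build_connection_params`, the `model` config key, and the endpoint URL are not taken from this PR; only the `OpenAI-Project` header comes from the diff above:

```python
from urllib.parse import urlencode

# Assumed endpoint for the Realtime API; verify against the provider docs.
REALTIME_URL = "wss://api.openai.com/v1/realtime"


def build_connection_params(config: dict, api_key: str):
    """Return (url, headers) for a Realtime WebSocket connection.

    Hypothetical helper: performs no I/O, so it can be unit tested
    without a network connection or a real API key.
    """
    url = f"{REALTIME_URL}?{urlencode({'model': config['model']})}"
    headers = [("Authorization", f"Bearer {api_key}")]
    if "project" in config:
        headers.append(("OpenAI-Project", config["project"]))
    return url, headers
```

The result could then be passed to `websockets.connect(url, additional_headers=headers)` from whichever place ends up owning the connection after the merge.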

@mehtarac mehtarac merged commit ce97c1d into main Oct 29, 2025
1 of 12 checks passed

4 participants