@JackYPCOnline JackYPCOnline commented Oct 23, 2025

Description

This PR refactors the bidirectional streaming interface to provide a cleaner, more generic API that aligns with industry patterns (Google Gemini, OpenAI) while maintaining strong typing and provider flexibility.

This PR should also kick off the bar-raise discussion. We should agree on the interface before making any code changes to model providers.

What has changed?

1. Unified Send Interface

Before:

await session.send_text_content("Hello")
await session.send_audio_content(audio_event)
await session.send_image_content(image_event)
await session.send_tool_result(tool_id, result)

After:

await session.send_events("Hello")
await session.send_events(AudioInputEvent(...))
await session.send_events(ImageInputEvent(...))
await session.send_events(ToolResultInputEvent(...))

Need to discuss: do we want to separate static content (image, text, etc.) from realtime content (audio for now, maybe video in the future)?
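To make the unified entry point concrete, here is a minimal sketch of how a single send_events() might route its argument. The event field names (audioData in particular), the classify helper, the REALTIME_KINDS set, and RecordingSession are all hypothetical and exist only for illustration; they are not the PR's implementation. Because TypedDict values are plain dicts at runtime, the sketch inspects keys instead of calling isinstance on the event classes.

```python
import asyncio
from typing import Any, Dict, List, Tuple, TypedDict, Union


# Hypothetical stand-ins for the PR's event types; field names are assumptions.
class AudioInputEvent(TypedDict):
    audioData: bytes


class ImageInputEvent(TypedDict, total=False):
    image_url: str
    imageData: bytes
    mimeType: str


class ToolResultInputEvent(TypedDict):
    tool_use_id: str
    result: Dict[str, Any]


SendContent = Union[str, AudioInputEvent, ImageInputEvent, ToolResultInputEvent]

# Assumption: only audio is treated as realtime for now.
REALTIME_KINDS = {"audio"}


def classify(content: SendContent) -> str:
    """Return the kind of content by inspecting its type or keys."""
    if isinstance(content, str):
        return "text"
    if "audioData" in content:
        return "audio"
    if "image_url" in content or "imageData" in content:
        return "image"
    if "tool_use_id" in content:
        return "tool_result"
    raise TypeError(f"Unsupported content: {content!r}")


class RecordingSession:
    """Toy session that records what would be forwarded to a provider."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bool]] = []

    async def send_events(self, content: SendContent) -> None:
        kind = classify(content)
        # Store the kind and whether it would count as realtime content.
        self.sent.append((kind, kind in REALTIME_KINDS))


async def demo() -> List[Tuple[str, bool]]:
    session = RecordingSession()
    await session.send_events("Hello")
    await session.send_events({"audioData": b"\x00\x01"})
    await session.send_events({"image_url": "https://example.com/cat.png"})
    await session.send_events({"tool_use_id": "t-1", "result": {"ok": True}})
    return session.sent
```

If static and realtime content end up on separate methods, the same classification could decide which method a caller is routed to.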

2. New Typed Events

2.1 Added ToolResultInputEvent

class ToolResultInputEvent(TypedDict):
    tool_use_id: str
    result: Dict[str, Any]

We should discuss: the SDK already has a ToolResult class, so why do we need this one? What is the difference?
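One way to frame that question is to check whether ToolResultInputEvent carries anything the existing type cannot. The sketch below converts the new event into a ToolResult-shaped dict. ToolResultLike is a local stand-in whose fields (toolUseId, status, content) are an assumption about the SDK's ToolResult, and to_tool_result is a hypothetical helper; check the real definition before relying on either.

```python
from typing import Any, Dict, List, Literal, TypedDict


class ToolResultInputEvent(TypedDict):
    tool_use_id: str
    result: Dict[str, Any]


# Local stand-in: the assumed shape of the SDK's existing ToolResult.
class ToolResultLike(TypedDict):
    toolUseId: str
    status: Literal["success", "error"]
    content: List[Dict[str, Any]]


def to_tool_result(
    event: ToolResultInputEvent,
    status: Literal["success", "error"] = "success",
) -> ToolResultLike:
    """Map the proposed event onto the assumed existing ToolResult shape."""
    return {
        "toolUseId": event["tool_use_id"],
        "status": status,
        # Wrap the raw result as a single JSON content block (assumed convention).
        "content": [{"json": event["result"]}],
    }
```

Under these assumptions the mapping loses nothing, and the existing shape additionally carries a status, which would support reusing ToolResult directly.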

2.2 Enhanced ImageInputEvent

Supports OpenAI/Gemini realtime API patterns:

class ImageInputEvent(TypedDict):
    image_url: Optional[str]      # Data URLs, hosted URLs, file IDs
    imageData: Optional[bytes]    # Raw bytes alternative
    mimeType: Optional[str]       # Required with imageData
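The comments on the fields imply a constraint that the type alone does not enforce: mimeType is required with imageData. A minimal validation sketch follows. validate_image_event is a hypothetical helper, and the rule that exactly one of image_url or imageData must be set is an assumption drawn from the "alternative" wording, not something the PR states.

```python
from typing import Optional, TypedDict


class ImageInputEvent(TypedDict):
    image_url: Optional[str]      # Data URLs, hosted URLs, file IDs
    imageData: Optional[bytes]    # Raw bytes alternative
    mimeType: Optional[str]       # Required with imageData


def validate_image_event(event: ImageInputEvent) -> None:
    """Raise ValueError if the event violates the documented field rules."""
    has_url = event.get("image_url") is not None
    has_data = event.get("imageData") is not None
    # Assumption: the two sources are mutually exclusive alternatives.
    if has_url == has_data:
        raise ValueError("Provide exactly one of image_url or imageData")
    # From the field comment: mimeType is required with imageData.
    if has_data and not event.get("mimeType"):
        raise ValueError("mimeType is required with imageData")
```

A check like this could run inside send_events() before anything is forwarded to a provider.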

Let's discuss the interface along the following dimensions:

  • Flat vs. two-layer model interface
    • One file or two files
  • Unified vs. separate send functions

3. Model/Session Separation Clarified

Model (Stateless):

  • Configuration management
  • Session factory
  • Client initialization

Session (Stateful):

  • Active connection state
  • Real-time communication
  • Event streaming
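The split above can be sketched as two abstract classes: a model that only holds configuration and creates sessions, and a session that owns the connection state. The method names create_session and close, as well as EchoModel and EchoSession, are hypothetical and used only for illustration. The in-memory queue stands in for a real provider connection.

```python
import abc
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple


class BidirectionalModelSession(abc.ABC):
    """Stateful: owns one active connection."""

    @abc.abstractmethod
    async def send_events(self, content: Any) -> None: ...

    @abc.abstractmethod
    def receive_events(self) -> AsyncIterator[Dict[str, Any]]:
        """Implementations are expected to be async generators."""

    @abc.abstractmethod
    async def close(self) -> None: ...


class BidirectionalModel(abc.ABC):
    """Stateless: holds configuration and acts as a session factory."""

    @abc.abstractmethod
    async def create_session(self) -> BidirectionalModelSession: ...


class EchoSession(BidirectionalModelSession):
    def __init__(self, config: Dict[str, Any]) -> None:
        self._config = config
        self._queue: asyncio.Queue = asyncio.Queue()
        self._active = True  # connection state lives on the session

    async def send_events(self, content: Any) -> None:
        if not self._active:
            raise RuntimeError("session is closed")
        await self._queue.put({"echo": content, "model_id": self._config["model_id"]})

    async def receive_events(self) -> AsyncIterator[Dict[str, Any]]:
        # Drain whatever is queued; a real session would stream until closed.
        while not self._queue.empty():
            yield await self._queue.get()

    async def close(self) -> None:
        self._active = False


class EchoModel(BidirectionalModel):
    def __init__(self, config: Dict[str, Any]) -> None:
        self._config = config  # configuration only, no connection state

    async def create_session(self) -> BidirectionalModelSession:
        return EchoSession(self._config)


async def demo() -> Tuple[List[Dict[str, Any]], bool]:
    model = EchoModel({"model_id": "demo-model"})
    first = await model.create_session()
    second = await model.create_session()
    await first.send_events("hi")
    received = [event async for event in first.receive_events()]
    await first.close()
    return received, first is second
```

Since the model keeps no per-connection state, one model instance can hand out any number of independent sessions.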

4. Interface Simplification

  • Provider-specific abstractions (_format_tools_for_provider)
  • Unify the send functions into a single send_events()

Breaking change

  • Method signatures changed from specific to generic

  • send_text_content() → send_events()
  • send_audio_content() → send_events()
  • send_image_content() → send_events()
  • send_tool_result() → send_events()
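The PR does not propose a deprecation period. If one were wanted, the old method names could be kept temporarily as thin wrappers that warn and forward to send_events(). The sketch below shows the idea for one method; LegacyCompatibleSession is a hypothetical class, not part of the SDK.

```python
import asyncio
import warnings
from typing import Any, List


class LegacyCompatibleSession:
    """Toy session exposing the new API plus one deprecated wrapper."""

    def __init__(self) -> None:
        self.sent: List[Any] = []

    async def send_events(self, content: Any) -> None:
        # New unified entry point; here it only records the content.
        self.sent.append(content)

    async def send_text_content(self, text: str) -> None:
        # Old name kept as a shim: warn, then forward to the new method.
        warnings.warn(
            "send_text_content() is deprecated; use send_events()",
            DeprecationWarning,
            stacklevel=2,
        )
        await self.send_events(text)


async def demo() -> List[Any]:
    session = LegacyCompatibleSession()
    await session.send_text_content("Hello")   # old call site, still works
    await session.send_events("Hello again")   # new call site
    return session.sent
```

The other three methods would follow the same pattern, each forwarding its argument unchanged or wrapped in the matching event type.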

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

logger = logging.getLogger(__name__)


class BidirectionalModelSession(abc.ABC):
Owner

I think we should move away from the name Session so as not to confuse it with our SessionManager. I suggest "Connection".

"""

@abc.abstractmethod
async def receive_events(self) -> AsyncIterable[BidirectionalStreamEvent]:
Owner

nit: can this be send() and receive(), similar to the agent methods?

metadata: Optional[Dict[str, Any]]


class ToolResultInputEvent(TypedDict):
Owner

I think we can use the existing ToolResult?

Collaborator Author

remove this class.

pass

@abc.abstractmethod
def _format_tools_for_provider(self, tool_specs: list[ToolSpec]) -> Any:
Collaborator Author

remove private function

async def send_events(self, content: Union[str, ImageInputEvent, AudioInputEvent, ToolResultInputEvent]) -> None:
"""Send structured content (text, images, audio, tool results) to the model.
Args:
Collaborator Author

One other thing: Realtime Event| other event.
