[FEATURE] Add Native Audio Support to ContentBlock Type #866

@westonbrown

Problem Statement

The Strands SDK's ContentBlock type currently supports image, video, and document content but lacks audio support. As multimodal AI models increasingly support audio input/output, this gap forces developers to use untyped workarounds, breaking type safety and SDK consistency.

The recently merged LlamaCpp provider (PR #585) demonstrates this limitation: it must cast to Dict[str, Any] to handle audio for models like Qwen2.5-Omni, losing the type safety that makes Strands reliable.

Proposed Solution

Extend the SDK's media types to include audio, following the established pattern for other media types:

# src/strands/types/media.py
AudioFormat = Literal["wav", "mp3", "flac"]

class AudioSource(TypedDict):
    """Contains the content of audio data."""
    bytes: bytes

class AudioContent(TypedDict):
    """Audio to include in a message."""
    format: AudioFormat
    source: AudioSource

# src/strands/types/content.py
class ContentBlock(TypedDict, total=False):
    # ... existing fields ...
    audio: AudioContent  # Add alongside image, video, document
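
As a rough illustration of how the proposed types might be used, the sketch below redefines them locally (they are not in the SDK yet) and builds an audio content block. The `make_audio_block` helper and the trimmed-down `ContentBlock` with only `text` and `audio` fields are assumptions for this example, not part of the proposal.

```python
from typing import Literal, TypedDict

# Local stand-ins for the proposed SDK types (assumption: not yet in strands).
AudioFormat = Literal["wav", "mp3", "flac"]


class AudioSource(TypedDict):
    bytes: bytes


class AudioContent(TypedDict):
    format: AudioFormat
    source: AudioSource


class ContentBlock(TypedDict, total=False):
    # Trimmed for the example; the real ContentBlock has more fields.
    text: str
    audio: AudioContent


def make_audio_block(data: bytes, fmt: AudioFormat) -> ContentBlock:
    """Hypothetical helper: wrap raw audio bytes in a typed content block."""
    return {"audio": {"format": fmt, "source": {"bytes": data}}}


block = make_audio_block(b"RIFF....", "wav")
```

With these definitions a type checker can flag an unsupported format string or a missing `source` key when the block is constructed, rather than the mistake surfacing inside a provider.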

Use Case

This enhancement would benefit:

  • Model Providers: Bedrock (Nova Sonic), LlamaCpp (Qwen2.5-Omni)
  • Applications: Voice assistants, transcription services, audio analysis, real-time conversation

Alternative Solutions

No response

Additional Context

The LlamaCpp implementation currently handles audio as:

# Current workaround in llamacpp.py
if "audio" in content:
    audio_content = cast(Dict[str, Any], content)  # Loss of type safety
    audio_data = base64.b64encode(audio_content["audio"]["source"]["bytes"])

With native support, all model providers could handle audio consistently and safely within the SDK's type system.
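
For comparison, here is a minimal sketch of what the same provider-side handling might look like once `audio` is a typed field, with no `cast` needed. The types are redefined locally for the example, and `encode_audio` is a hypothetical function name, not the actual code in llamacpp.py.

```python
import base64
from typing import Literal, Optional, TypedDict

# Local stand-ins for the proposed SDK types (assumption: not yet in strands).
AudioFormat = Literal["wav", "mp3", "flac"]


class AudioSource(TypedDict):
    bytes: bytes


class AudioContent(TypedDict):
    format: AudioFormat
    source: AudioSource


class ContentBlock(TypedDict, total=False):
    text: str
    audio: AudioContent


def encode_audio(content: ContentBlock) -> Optional[str]:
    """Return the block's audio as base64 text, or None if it has no audio."""
    audio = content.get("audio")
    if audio is None:
        return None
    # Key access is checked against AudioContent, so no cast is required.
    return base64.b64encode(audio["source"]["bytes"]).decode("ascii")
```

Because `content.get("audio")` is typed as an optional `AudioContent`, the nested key lookups are verified by the type checker instead of passing through `Dict[str, Any]`.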
