Description
Problem Statement
Most LLM APIs accept image inputs either as base64 data or as a URL that refers to an image hosted somewhere on the web. Currently, however, the ImageContent typed dict supports only bytes input, which is converted to base64 data and sent to the model API by the specific model provider.
Proposed Solution
I'm not sure whether additional changes are required, but a good first step would be to extend ImageContent so that it accepts an image source URL in addition to bytes. Alternatively, create two schemas for image input: one for bytes (renaming ImageContent would probably be fine) and one for a URL. Here is a sample:
- ImageBytesContent
from typing import Literal, TypedDict

ImageFormat = Literal["png", "jpeg", "gif", "webp"]
"""Supported image formats."""

class ImageSourceBytes(TypedDict):
    """Contains the content of an image.

    Attributes:
        source_bytes: The binary content of the image.
    """
    source_bytes: bytes

class ImageBytesContent(TypedDict):
    """An image to include in a message.

    Attributes:
        format: The format of the image (e.g., "png", "jpeg").
        source_bytes: The source containing the image's binary content.
    """
    format: ImageFormat
    source_bytes: ImageSourceBytes
- ImageSourceURLContent
class ImageSourceURLContent(TypedDict):
    """An image to include in a message, referenced by URL.

    Attributes:
        source_url: The URL from which the image can be retrieved.
    """
    source_url: str
Providers can then convert image message contents depending on which image input TypedDict they receive.
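To make the provider-side conversion concrete, here is a minimal sketch of how it might look. It is only an illustration under stated assumptions: the `ImageInput` alias and the `to_image_url_part` helper are hypothetical names, not part of the existing codebase, and the output shape follows the OpenAI-style `image_url` content part, where bytes are sent as a base64 data URL. Other providers would need their own mapping.

```python
import base64
from typing import Any, Literal, TypedDict, Union

ImageFormat = Literal["png", "jpeg", "gif", "webp"]


class ImageSourceBytes(TypedDict):
    source_bytes: bytes


class ImageBytesContent(TypedDict):
    format: ImageFormat
    source_bytes: ImageSourceBytes


class ImageSourceURLContent(TypedDict):
    source_url: str


# Hypothetical alias for "either kind of image input".
ImageInput = Union[ImageBytesContent, ImageSourceURLContent]


def to_image_url_part(image: ImageInput) -> dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style image content part."""
    if "source_url" in image:
        # URL input: pass the URL through, no download needed.
        url = image["source_url"]
    else:
        # Bytes input: encode as a base64 data URL.
        raw = image["source_bytes"]["source_bytes"]
        encoded = base64.b64encode(raw).decode("ascii")
        url = f"data:image/{image['format']};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}
```

With this shape, the same helper handles both `{"source_url": "https://example.com/cat.png"}` and `{"format": "png", "source_bytes": {"source_bytes": b"..."}}`, distinguishing them by which key is present.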
Use Case
You can pass an image as LLM input not only as bytes but also as a URL. This is handy when you don't have the image locally, because it removes the need to download the image before sending it to the LLM.
Alternatives Solutions
No response
Additional Context
No response