Commit cfdfddc

Merge branch 'main' into fix-tracing-setup-docs
2 parents f64c952 + d146551 commit cfdfddc

3 files changed: +38 −0 lines

docs/models/litellm.md

Lines changed: 17 additions & 0 deletions
@@ -71,3 +71,20 @@ if __name__ == "__main__":
 
     asyncio.run(main(model, api_key))
 ```
+
+## Tracking usage data
+
+If you want LiteLLM responses to populate the Agents SDK usage metrics, pass `ModelSettings(include_usage=True)` when creating your agent.
+
+```python
+from agents import Agent, ModelSettings
+from agents.extensions.models.litellm_model import LitellmModel
+
+agent = Agent(
+    name="Assistant",
+    model=LitellmModel(model="your/model", api_key="..."),
+    model_settings=ModelSettings(include_usage=True),
+)
+```
+
+With `include_usage=True`, LiteLLM requests report token and request counts through `result.context_wrapper.usage` just like the built-in OpenAI models.
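The usage object this diff refers to behaves like a simple accumulator across model calls. The sketch below is an illustrative, self-contained model of that aggregation — the field names (`requests`, `input_tokens`, `output_tokens`, `total_tokens`) and the `add` method mirror what the docs describe but are written here as a standalone stand-in, not the SDK's actual class.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Illustrative stand-in for the SDK's usage object (field names assumed).
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0

    def add(self, other: "Usage") -> None:
        # Each model call's reported usage is folded into the running total.
        self.requests += other.requests
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens
        self.total_tokens += other.total_tokens


# Two model calls during one run (e.g. a tool call followed by a final answer):
run_usage = Usage()
run_usage.add(Usage(requests=1, input_tokens=50, output_tokens=20, total_tokens=70))
run_usage.add(Usage(requests=1, input_tokens=80, output_tokens=30, total_tokens=110))
print(run_usage.total_tokens)  # 180
```

This is why `include_usage=True` matters for LiteLLM models: if a provider never reports per-call counts, there is nothing to fold into the run total.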

docs/usage.md

Lines changed: 18 additions & 0 deletions
@@ -28,6 +28,24 @@ print("Total tokens:", usage.total_tokens)
 
 Usage is aggregated across all model calls during the run (including tool calls and handoffs).
 
+### Enabling usage with LiteLLM models
+
+LiteLLM providers do not report usage metrics by default. When you are using [`LitellmModel`](models/litellm.md), pass `ModelSettings(include_usage=True)` to your agent so that LiteLLM responses populate `result.context_wrapper.usage`.
+
+```python
+from agents import Agent, ModelSettings, Runner
+from agents.extensions.models.litellm_model import LitellmModel
+
+agent = Agent(
+    name="Assistant",
+    model=LitellmModel(model="your/model", api_key="..."),
+    model_settings=ModelSettings(include_usage=True),
+)
+
+result = await Runner.run(agent, "What's the weather in Tokyo?")
+print(result.context_wrapper.usage.total_tokens)
+```
+
 ## Accessing usage with sessions
 
 When you use a `Session` (e.g., `SQLiteSession`), each call to `Runner.run(...)` returns usage for that specific run. Sessions maintain conversation history for context, but each run's usage is independent.
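The point in that last context line — history accumulates across runs while usage stays per-run — can be sketched with a toy runner. Everything below (`ToySession`, `toy_run`, the word-count "tokens") is hypothetical scaffolding for illustration, not the SDK API.

```python
class ToySession:
    """Hypothetical stand-in for a session: stores conversation history only."""

    def __init__(self) -> None:
        self.history: list[str] = []


def toy_run(session: ToySession, user_input: str) -> dict:
    # History grows across runs...
    session.history.append(user_input)
    reply = f"echo: {user_input}"
    session.history.append(reply)
    # ...but usage is computed fresh for this run alone.
    usage = {"requests": 1, "total_tokens": len(user_input.split()) + len(reply.split())}
    return {"output": reply, "usage": usage}


session = ToySession()
first = toy_run(session, "What's the weather in Tokyo?")
second = toy_run(session, "And in Osaka?")
print(len(session.history))         # 4  -- history spans both runs
print(second["usage"]["requests"])  # 1  -- usage covers only the second run
```

The design choice this models: a session is a context store, not a metrics store, so aggregating usage across runs is left to the caller.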

src/agents/run.py

Lines changed: 3 additions & 0 deletions
@@ -272,6 +272,7 @@ async def run(
                 We recommend only using this if you are exclusively using OpenAI models;
                 other model providers don't write to the Conversation object,
                 so you'll end up having partial conversations stored.
+            session: A session for automatic conversation history management.
         Returns:
             A run result containing all the inputs, guardrail results and the output of the last
             agent. Agents may perform handoffs, so we don't know the specific type of the output.
@@ -329,6 +330,7 @@ def run_sync(
             previous_response_id: The ID of the previous response, if using OpenAI models via the
                 Responses API, this allows you to skip passing in input from the previous turn.
             conversation_id: The ID of the stored conversation, if any.
+            session: A session for automatic conversation history management.
         Returns:
             A run result containing all the inputs, guardrail results and the output of the last
             agent. Agents may perform handoffs, so we don't know the specific type of the output.
@@ -383,6 +385,7 @@ def run_streamed(
             previous_response_id: The ID of the previous response, if using OpenAI models via the
                 Responses API, this allows you to skip passing in input from the previous turn.
             conversation_id: The ID of the stored conversation, if any.
+            session: A session for automatic conversation history management.
         Returns:
             A result object that contains data about the run, as well as a method to stream events.
         """
