Prevent Token Wastage When Input Guardrails Delay Error Generation #867

## Description of the Issue
On the first run, the main agent and the guardrail agent run in parallel via `asyncio.gather()`. If the guardrail agent is slow to raise its error, the main agent proceeds to `_run_single_turn`, which sends a request to the LLM. The LLM processes the request and returns a response, so those tokens are wasted even though the run ultimately fails. This behavior contradicts the official documentation, which states:
"If the guardrail detects malicious usage, it can immediately raise an error, which stops the expensive model from running and saves you time/money."

## Potential Solutions
1. **Introduce a flag such as `is_expensive_task: bool`.** Developers could use it to signal that the main agent must wait for guardrail validation before proceeding, avoiding unnecessary token usage.

2. **Handle the sequencing internally.** The SDK could be refactored so that the main agent waits for the guardrail's result before calling the LLM, at least in expensive or high-risk scenarios.

3. **Run the agents sequentially instead of in parallel.** This would prevent premature LLM calls, but it adds the guardrail's latency to every run and may not suit performance-critical applications.

A hybrid or configurable approach could provide the best balance between efficiency and safety, giving developers flexibility based on their use case.
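A minimal sketch of what such a configurable approach could look like, written with plain `asyncio` rather than the SDK's internals. Everything here is an assumption for illustration: `run_turn`, `GuardrailTripped`, and the `demo` helper are hypothetical, and `is_expensive_task` is the flag proposed above, not an existing SDK parameter.

```python
import asyncio
from typing import Awaitable, Callable


class GuardrailTripped(Exception):
    """Hypothetical error raised when the guardrail rejects the input."""


async def run_turn(
    user_input: str,
    guardrail: Callable[[str], Awaitable[None]],
    model_call: Callable[[str], Awaitable[str]],
    is_expensive_task: bool = False,
) -> str:
    if is_expensive_task:
        # Sequential: the model is only called once the guardrail passes.
        await guardrail(user_input)
        return await model_call(user_input)
    # Parallel: lower latency, but the model call may finish (and be
    # billed) before a slow guardrail raises.
    _, output = await asyncio.gather(
        guardrail(user_input), model_call(user_input)
    )
    return output


def demo(is_expensive_task: bool) -> int:
    """Return how many simulated model requests one failing turn makes."""
    calls = 0

    async def guardrail(text: str) -> None:
        await asyncio.sleep(0.05)  # slower than the model call
        raise GuardrailTripped(text)

    async def model_call(text: str) -> str:
        nonlocal calls
        await asyncio.sleep(0.01)
        calls += 1
        return "response"

    try:
        asyncio.run(
            run_turn("malicious input", guardrail, model_call, is_expensive_task)
        )
    except GuardrailTripped:
        pass
    return calls
```

With `is_expensive_task=True` the simulated model is never called when the guardrail trips; with the default it is called once. The trade-off is that the sequential path adds the guardrail's latency to every successful run.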


