Description
Description of the Issue
On the first turn of a run, the main agent and the guardrail agent run in parallel using asyncio.gather(). However, if the guardrail agent takes longer to raise its error, the main agent keeps going and executes _run_single_turn, which sends a request to the LLM. The LLM processes the request and returns a response, so the tokens are spent even though the guardrail later rejects the input. This behavior of the SDK contradicts the official documentation, which clearly states:
"If the guardrail detects malicious usage, it can immediately raise an error, which stops the expensive model from running and saves you time/money."
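The race can be reproduced without the SDK. The following is a plain-asyncio simulation, not SDK code: guardrail, model_call, parallel_turn and TripwireTriggered are hypothetical stand-ins, and the 1,000-token figure is made up. It shows that asyncio.gather() does not hold back the second coroutine, so when the guardrail is slower than the model, the response (and its token cost) already exists by the time the error is raised.

```python
import asyncio


class TripwireTriggered(Exception):
    """Hypothetical stand-in for a guardrail tripwire exception."""


async def guardrail(delay: float) -> None:
    # Simulated guardrail agent: needs `delay` seconds, then rejects the input.
    await asyncio.sleep(delay)
    raise TripwireTriggered("malicious input detected")


async def model_call(latency: float, ledger: list) -> str:
    # Simulated main-agent LLM request: once it completes, its tokens are billed.
    await asyncio.sleep(latency)
    ledger.append(1_000)  # hypothetical token count for one response
    return "model response"


async def parallel_turn(guardrail_delay: float, model_latency: float) -> int:
    """Run guardrail and model call concurrently with asyncio.gather(); return tokens spent."""
    ledger: list = []
    try:
        await asyncio.gather(guardrail(guardrail_delay), model_call(model_latency, ledger))
    except TripwireTriggered:
        pass  # the guardrail fired, but only after the model had already answered
    return sum(ledger)


if __name__ == "__main__":
    # Guardrail needs 0.2 s, the model answers in 0.05 s: the tokens are already spent.
    print(asyncio.run(parallel_turn(guardrail_delay=0.2, model_latency=0.05)))  # 1000
```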
Potential Solutions
Introduce a flag such as is_expensive_task: bool. This would let developers signal that the task must wait for guardrail validation before proceeding, avoiding unnecessary token usage.
Handle the sequencing internally. The SDK could be refactored so that the main agent waits for the guardrail's result before calling the LLM, at least in expensive or high-risk scenarios.
Run the agents sequentially instead of in parallel. This would prevent premature LLM calls, but it may slow down execution and might not be suitable for performance-critical applications.
A hybrid or configurable approach could provide the best balance between efficiency and safety, giving developers flexibility based on their use case.
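A minimal sketch of the configurable approach, again simulated with plain asyncio rather than the SDK. is_expensive_task is the flag proposed in the first option, not an existing SDK parameter, and run_turn, guardrail, model_call and TripwireTriggered are hypothetical. When the flag is set, the guardrail runs first and the model is called only if it passes; otherwise both start together and the model task is cancelled as soon as the guardrail trips.

```python
import asyncio
from typing import Optional, Tuple


class TripwireTriggered(Exception):
    """Hypothetical stand-in for a guardrail tripwire exception."""


async def guardrail(delay: float, trips: bool) -> None:
    # Simulated guardrail agent: takes `delay` seconds, then passes or rejects.
    await asyncio.sleep(delay)
    if trips:
        raise TripwireTriggered("malicious input detected")


async def model_call(latency: float, ledger: list) -> str:
    # Simulated main-agent LLM request; tokens are billed when it completes.
    await asyncio.sleep(latency)
    ledger.append(1_000)  # hypothetical token count
    return "model response"


async def run_turn(
    *, is_expensive_task: bool, guardrail_delay: float, model_latency: float, trips: bool
) -> Tuple[Optional[str], int]:
    """Return (model output or None, tokens spent) for one guarded turn."""
    ledger: list = []
    if is_expensive_task:
        # Sequential mode: the model is only called after the guardrail passes.
        try:
            await guardrail(guardrail_delay, trips)
        except TripwireTriggered:
            return None, sum(ledger)
        return await model_call(model_latency, ledger), sum(ledger)

    # Parallel mode: start the model call immediately, cancel it if the guardrail trips.
    model_task = asyncio.create_task(model_call(model_latency, ledger))
    try:
        await guardrail(guardrail_delay, trips)
    except TripwireTriggered:
        model_task.cancel()  # has no effect if the model already answered
        try:
            await model_task
        except asyncio.CancelledError:
            pass
        return None, sum(ledger)
    return await model_task, sum(ledger)


if __name__ == "__main__":
    slow_guardrail = dict(guardrail_delay=0.2, model_latency=0.05, trips=True)
    print(asyncio.run(run_turn(is_expensive_task=True, **slow_guardrail)))   # (None, 0)
    print(asyncio.run(run_turn(is_expensive_task=False, **slow_guardrail)))  # (None, 1000)
```

Cancellation only limits waste in parallel mode: if the model has already answered, the tokens are spent, and cancelling an in-flight HTTP request locally may not stop the provider from billing for it. The sequential mode is the only one that guarantees no model call on rejected input, at the cost of adding the guardrail's latency to every turn.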