Improve OpenAI error handling #918
Merged
Description
This PR adds proper exception handling to the OpenAI model provider so that context window overflow and rate limiting errors are correctly distinguished. Previously, the OpenAI model provider did not handle these errors, causing them to bubble up as raw OpenAI SDK exceptions instead of being converted to the SDK's standardized exception types.
Key Changes
- Added error handling to the `stream()` and `structured_output()` methods (a sketch of the mapping is shown below)
- Context length errors (`context_length_exceeded`) → `ContextWindowOverflowException`
- Rate limit errors (`rate_limit_exceeded`) → `ModelThrottledException`
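A minimal sketch of what this mapping could look like, assuming the openai-python v1 SDK; the wrapper function and the exception import path below are illustrative, not the PR's actual diff:

```python
# Illustrative only: the PR touches the provider's stream()/structured_output(),
# but this standalone wrapper shows the same error mapping. The import path for
# the SDK exceptions is an assumption -- adjust it to the project layout.
import openai

from strands.types.exceptions import (
    ContextWindowOverflowException,
    ModelThrottledException,
)


async def create_with_error_mapping(client: openai.AsyncOpenAI, **request):
    try:
        return await client.chat.completions.create(**request)
    except openai.BadRequestError as e:
        # 400 with code "context_length_exceeded" -> context window overflow
        if getattr(e, "code", None) == "context_length_exceeded":
            raise ContextWindowOverflowException(str(e)) from e
        raise
    except openai.RateLimitError as e:
        # 429 (TPM or RPM) -> throttling, regardless of the specific message
        raise ModelThrottledException(str(e)) from e
```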
OpenAI Error Types
OpenAI has two separate token limits that cause different error types requiring different handling:
1. Context Length Limit: 400 `BadRequestError` with error code `context_length_exceeded`
2. Rate Limit - TPM: 429 `RateLimitError` with error code `rate_limit_exceeded`
Why This Matters
Large context requests can be rejected at the rate limiting layer before reaching context length validation. The TPM limit acts as a "first line of defense" - if a single request would consume more tokens than the per-minute allowance, it gets blocked immediately as a rate limit violation rather than proceeding to context validation.
This means even with models supporting large context windows (like GPT-4o's 128k), users are effectively limited by their TPM quota for single large requests.
Implementation Decision: All Rate Limits as Throttling
Important: We cannot reliably differentiate between token rate limiting triggered by a single oversized request and rate limiting triggered by cumulative usage across recent requests. Depending on specific error messages to tell these apart is fragile, because the wording is not a stable API contract and varies across OpenAI-compatible providers.
Therefore, all rate limit errors are treated as throttling and handled with retry logic, regardless of whether they're token-based or request-based. This provides consistent, robust behavior across all OpenAI-compatible providers.
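For callers, this means throttling is recoverable by waiting and retrying. A minimal retry-with-backoff sketch, assuming the caller handles `ModelThrottledException` itself (the agent loop may already do this internally):

```python
# Hypothetical retry helper, not part of the PR: exponential backoff with jitter
# for throttling errors surfaced as ModelThrottledException.
import asyncio
import random

from strands.types.exceptions import ModelThrottledException


async def call_with_retries(fn, *args, max_attempts: int = 5, base_delay: float = 1.0, **kwargs):
    for attempt in range(max_attempts):
        try:
            return await fn(*args, **kwargs)
        except ModelThrottledException:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter, so retries spread out across the minute window
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```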
Related Issues
Partially addresses #860
Note: This PR fixes the missing exception handling in the OpenAI model provider, but does not solve the underlying issue reported in #860. The user is experiencing TPM rate limit errors and expecting the `SummarizingConversationManager` to handle them by reducing context. However, rate limit errors should be handled with retry logic, not context reduction, as reducing context size won't help with TPM limits, which are time-based quotas.

Type of Change
Bug fix
Testing
How have you tested the change?
- Ran `hatch run prepare`
- Verified that context length errors are converted to `ContextWindowOverflowException`
- Verified that rate limit errors are converted to `ModelThrottledException`
- Covered both the `stream()` and `structured_output()` methods

Test Coverage
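An illustrative sketch of this kind of unit test, exercising a stand-in mapping helper rather than the provider's real methods (the helper and import path below are assumptions for illustration, not the PR's actual tests):

```python
import httpx
import openai

from strands.types.exceptions import (
    ContextWindowOverflowException,
    ModelThrottledException,
)


def map_openai_error(error: openai.APIError) -> Exception:
    """Stand-in for the provider's error mapping (hypothetical helper)."""
    if isinstance(error, openai.BadRequestError) and getattr(error, "code", None) == "context_length_exceeded":
        return ContextWindowOverflowException(str(error))
    if isinstance(error, openai.RateLimitError):
        return ModelThrottledException(str(error))
    return error


def _api_error(cls, status: int, body: dict | None) -> openai.APIStatusError:
    # openai-python v1 status errors are built from an httpx.Response plus the error body
    request = httpx.Request("POST", "https://api.openai.com/v1/chat/completions")
    return cls("boom", response=httpx.Response(status, request=request), body=body)


def test_context_length_exceeded_maps_to_overflow():
    error = _api_error(openai.BadRequestError, 400, {"code": "context_length_exceeded"})
    assert isinstance(map_openai_error(error), ContextWindowOverflowException)


def test_rate_limit_maps_to_throttled():
    error = _api_error(openai.RateLimitError, 429, None)
    assert isinstance(map_openai_error(error), ModelThrottledException)
```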
Behavior Changes
Before: context window overflow and rate limit errors bubbled up as raw OpenAI SDK exceptions (`openai.BadRequestError`, `openai.RateLimitError`).
After: these errors are converted to the SDK's standardized `ContextWindowOverflowException` and `ModelThrottledException` types.
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.