@mkmeral mkmeral commented Sep 24, 2025

Description

This PR adds proper exception handling to the OpenAI model provider so that context window overflow and rate limiting errors are correctly distinguished. Previously, the OpenAI model provider did not handle these errors, so they bubbled up as raw OpenAI SDK exceptions instead of being converted to the SDK's standardized exception types.

Key Changes

  1. Added proper exception handling for both stream() and structured_output() methods
  2. Context window errors (HTTP 400, context_length_exceeded) → ContextWindowOverflowException
  3. Rate limit errors (HTTP 429, rate_limit_exceeded) → ModelThrottledException (see the sketch after this list)
  4. Comprehensive test coverage for all error scenarios
  5. Updated docstrings to document the new exception behavior
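
For reference, a minimal sketch of the mapping in items 2 and 3 (illustrative only, not the actual diff; it assumes the SDK exceptions are importable from strands.types.exceptions and that the provider uses the async OpenAI client):

import openai

from strands.types.exceptions import (
    ContextWindowOverflowException,
    ModelThrottledException,
)


async def _request_with_mapped_errors(client: openai.AsyncOpenAI, request: dict):
    """Send a chat completion request, translating provider errors into SDK exceptions."""
    try:
        return await client.chat.completions.create(**request)
    except openai.BadRequestError as e:
        # HTTP 400 whose error code is "context_length_exceeded" → context overflow
        if getattr(e, "code", None) == "context_length_exceeded":
            raise ContextWindowOverflowException(str(e)) from e
        raise  # other 400s pass through unchanged
    except openai.RateLimitError as e:
        # HTTP 429 (token- or request-based) → throttling, retried by the event loop
        raise ModelThrottledException(str(e)) from e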

OpenAI Error Types

OpenAI enforces two separate token limits, each surfacing as a different error type that requires different handling:

1. Context Length Limit (400 BadRequestError)

  • Model's maximum context window (e.g., 128,000 tokens for GPT-4o)
  • Triggers when the total conversation exceeds this limit
  • Error code: context_length_exceeded
  • Solution: Reduce context size (automatic with conversation manager)

2. Rate Limit - TPM (429 RateLimitError)

  • Tokens Per Minute allowance for the account (e.g., 30,000 TPM)
  • Triggers when a single request exceeds the per-minute token budget
  • Error message: "Request too large for [model] on tokens per min (TPM)"
  • Solution: Wait and retry (automatic with exponential backoff)

Why This Matters

Large context requests can be rejected at the rate limiting layer before reaching context length validation. The TPM limit acts as a "first line of defense" - if a single request would consume more tokens than the per-minute allowance, it gets blocked immediately as a rate limit violation rather than proceeding to context validation.

This means even with models supporting large context windows (like GPT-4o's 128k), users are effectively limited by their TPM quota for single large requests.

Implementation Decision: All Rate Limits as Throttling

Important: We cannot reliably differentiate between token rate limiting caused by:

  • A single large request exceeding TPM limits
  • Multiple smaller requests hitting cumulative TPM limits

Depending on specific error messages is fragile because:

  • Different providers (OpenAI, Azure OpenAI, Groq) may use different error formats
  • Error messages can change over time
  • Message parsing is brittle and error-prone

Therefore, all rate limit errors are treated as throttling and handled with retry logic, regardless of whether they're token-based or request-based. This provides consistent, robust behavior across all OpenAI-compatible providers.
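
As a rough illustration of that retry behavior (the SDK's agent event loop does this automatically; the helper and constants below are assumptions made for the sketch, not the SDK's actual implementation):

import random
import time

from strands.types.exceptions import ModelThrottledException

MAX_ATTEMPTS = 6
BASE_DELAY_S = 4


def call_with_backoff(invoke):
    """Call invoke() and retry throttled attempts with jittered exponential backoff."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return invoke()
        except ModelThrottledException:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # out of retries, surface the throttle to the caller
            time.sleep(BASE_DELAY_S * (2 ** attempt) + random.uniform(0, 1))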

Related Issues

Partially addresses #860

Note: This PR fixes the missing exception handling in the OpenAI model provider, but does not solve the underlying issue reported in #860. The user reporting that issue is hitting TPM rate limit errors and expects the SummarizingConversationManager to handle them by reducing context. However, rate limit errors should be handled with retry logic, not context reduction: shrinking the context won't help with TPM limits, which are time-based quotas.

Type of Change

Bug fix

Testing

How have you tested the change?

  • I ran hatch run prepare
  • Added comprehensive unit tests covering all error scenarios:
    • Context window overflow errors → ContextWindowOverflowException
    • Token-based rate limit errors → ModelThrottledException
    • Request-based rate limit errors → ModelThrottledException
    • Other BadRequestError exceptions pass through unchanged
  • Tested both stream() and structured_output() methods
  • Verified existing functionality remains unchanged
  • Manual testing with large context scenarios

Test Coverage

# New tests added:
- test_stream_context_overflow_exception
- test_stream_other_bad_request_errors_passthrough  
- test_structured_output_context_overflow_exception
- test_stream_rate_limit_as_throttle
- test_stream_request_rate_limit_as_throttle
- test_structured_output_rate_limit_as_throttle
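
The shape of these tests is roughly as follows (fixture names such as openai_model and messages are illustrative, not copied from the actual test module):

import unittest.mock

import openai
import pytest

from strands.types.exceptions import ContextWindowOverflowException


@pytest.mark.asyncio
async def test_stream_context_overflow_exception(openai_model, messages):
    # The mocked OpenAI client raises a 400 whose body carries code "context_length_exceeded"
    openai_model.client.chat.completions.create.side_effect = openai.BadRequestError(
        "maximum context length exceeded",
        response=unittest.mock.MagicMock(),
        body={"code": "context_length_exceeded"},
    )

    with pytest.raises(ContextWindowOverflowException):
        async for _ in openai_model.stream(messages):
            pass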

Behavior Changes

Before

# Raw OpenAI exceptions bubbled up
agent("large context")  # → openai.RateLimitError (unhandled)

After

# Proper exception handling with automatic retry
agent("large context")  # → ModelThrottledException → automatic retry with backoff

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly (docstrings)
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mkmeral mkmeral changed the title OpenAI context overflow Improve OpenAI error handling Sep 24, 2025
Unshure previously approved these changes Sep 24, 2025

@Unshure Unshure left a comment
Any way we can add integ tests for this as well? Can we update https://github.com/strands-agents/sdk-python/blob/main/tests_integ/test_context_overflow.py to include an OpenAI test?

@mkmeral mkmeral merged commit f5e2070 into strands-agents:main Sep 25, 2025
11 of 12 checks passed