
[BUG] SummarizingConversationManager doesn't work for OpenAI models #860

@girishp1983

Description


Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.7.1

Python Version

3.13.7

Operating System

MacOS

Installation Method

pip

Steps to Reproduce

openai_model = OpenAIModel(
    client_args={
        "api_key": "*************************************************************",
        "base_url": "https://api.groq.com/openai/v1",
        "_strict_response_validation": False
    },
    model_id="moonshotai/kimi-k2-instruct-0905",
    params={
        "temperature": 0.7,
        "max_tokens": 8192
    }
)

# Create conversation manager with summarizing strategy
# This will summarize older messages to prevent context length exceeded errors
conversation_manager = SummarizingConversationManager(
    summary_ratio=0.3,
    preserve_recent_messages=3,
    summarization_agent=openai_model
)

# Create agent with OpenAI model, Exa tools, and conversation manager
agent = Agent(
    model=openai_model,
    tools=[exa_search_and_contents],  # Using custom tool instead of separate exa_search and exa_get_contents
    system_prompt=system_prompt,
    conversation_manager=conversation_manager
)

Expected Behavior

When the model fails to respond because of a context length issue, this conversation manager is supposed to shrink the context by summarizing older messages.

Actual Behavior

Some platforms, such as Groq, enforce a tokens-per-minute (TPM) limit. Once the conversation has grown too long, every request consistently exceeds that limit, so every call fails with an exception; retrying later does not help. In such cases SummarizingConversationManager should trigger.

However, SummarizingConversationManager never does, because it only reacts to ContextWindowOverflowException; this failure is a TPM (rate-limit) error, not a context-window overflow.
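To illustrate the trigger path, here is a minimal stand-in (not the actual SDK code; only the ContextWindowOverflowException name mirrors Strands, everything else is hypothetical): summarization runs only when that specific exception is caught, so a provider rate-limit error sails past untouched.

```python
# Minimal stand-in showing why a TPM error is never handled: only
# ContextWindowOverflowException triggers context reduction; any other
# exception (e.g. Groq's HTTP 413 TPM error) propagates unchanged.

class ContextWindowOverflowException(Exception):
    """Stand-in for the Strands SDK exception."""

class RateLimitError(Exception):
    """Stand-in for the provider's HTTP 413 TPM error."""

def run_with_context_management(call_model, reduce_context):
    try:
        return call_model()
    except ContextWindowOverflowException:
        reduce_context()     # summarization kicks in
        return call_model()  # retried with a smaller context
    # RateLimitError is NOT caught here, so it bubbles up to the caller.

events = []

def flaky_model():
    # First call overflows the context window, second call succeeds.
    if not events:
        events.append("overflow")
        raise ContextWindowOverflowException("context too long")
    return "ok"

result = run_with_context_management(flaky_model, lambda: events.append("summarized"))
print(result)  # -> ok
print(events)  # -> ['overflow', 'summarized']
```

A call that raises RateLimitError instead would escape `run_with_context_management` entirely, which matches the traceback below.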

ERROR:backend_strands_multi_agent_with_groq_custom_tool_updated:Error in multi-agent research workflow: Error code: 413 - {'error': {'message': 'Request too large for model moonshotai/kimi-k2-instruct-0905 in organization org_01hvwykqttfc1brn6gydxkxgre service tier on_demand on tokens per minute (TPM): Limit 250000, Requested 252062, please reduce your message size and try again. Need more tokens? Visit https://groq.com/self-serve-support/ to request higher limits.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}

Additional Context

I also compared the anthropic.py, bedrock.py, and openai.py files in the SDK.

The first two do raise ContextWindowOverflowException when the context is too long. However, openai.py never raises ContextWindowOverflowException under any circumstance, so that file also needs to be fixed.
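A sketch of what the missing translation layer could look like (the `context_length_exceeded` error code is a real OpenAI API detail, but the wrapper and the stand-in error class are hypothetical, so the snippet runs without the openai package installed):

```python
# Hypothetical error-translation layer for an OpenAI-compatible provider,
# mirroring what anthropic.py and bedrock.py already do.

class ContextWindowOverflowException(Exception):
    """Stand-in for the Strands SDK exception."""

class BadRequestError(Exception):
    """Stand-in for openai.BadRequestError; carries the API error code."""
    def __init__(self, message, code):
        super().__init__(message)
        self.code = code

def call_openai(raw_call):
    try:
        return raw_call()
    except BadRequestError as e:
        # OpenAI reports an oversized prompt as code "context_length_exceeded".
        if e.code == "context_length_exceeded":
            raise ContextWindowOverflowException(str(e)) from e
        raise  # all other provider errors pass through untouched

def too_long():
    raise BadRequestError("maximum context length exceeded",
                          "context_length_exceeded")

try:
    call_openai(too_long)
except ContextWindowOverflowException as e:
    print("translated:", e)  # -> translated: maximum context length exceeded
```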

Possible Solution

  1. Fix openai.py to raise ContextWindowOverflowException when the provider reports a context-length error.
  2. For all models, raise the same (or a similar) exception when the tokens-per-minute (TPM) limit is exceeded, so that SummarizingConversationManager can handle it.
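For the second point, a provider-agnostic classifier could treat a token-based 413 the same as a context overflow. The payload fields (status 413, `type: "tokens"`, `code: "rate_limit_exceeded"`) come from the Groq error above; the helper itself is hypothetical:

```python
# Hypothetical classifier: decide whether a provider error should trigger
# context reduction, based on the Groq error payload shown above.

def is_context_overflow(status_code, error_body):
    err = error_body.get("error", {})
    # Classic context-window overflow (OpenAI-style).
    if err.get("code") == "context_length_exceeded":
        return True
    # Groq-style TPM limit: HTTP 413 with a token-based rate limit.
    if (status_code == 413
            and err.get("type") == "tokens"
            and err.get("code") == "rate_limit_exceeded"):
        return True
    return False

groq_413 = {"error": {"message": "Request too large ... (TPM)",
                      "type": "tokens", "code": "rate_limit_exceeded"}}
print(is_context_overflow(413, groq_413))                                    # -> True
print(is_context_overflow(429, {"error": {"code": "rate_limit_exceeded"}}))  # -> False
```

A request-based 429 rate limit is deliberately excluded: waiting fixes it, so shrinking the context would discard history for nothing.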

Related Issues

No response

Metadata

Labels

bug (Something isn't working)
