
Tool handler issue when using Opus 4.1 with streaming #1298

@matthicksj


Initial Checks

Description

Tool handlers fail to execute after cache miss when streaming context exists

Context

This bug was discovered while using Claude Opus 4.1 with large thinking budgets (>8192 tokens) for complex multi-agent orchestration. Opus 4.1's interleaved thinking capability allows it to think between tool calls, making it ideal for adaptive planning scenarios. However, thinking budgets above 8192 tokens require streaming to be enabled, which triggers this bug.
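The streaming requirement can be illustrated by the shape of the request itself. A minimal sketch (the model id, token counts, and the `build_request` helper are illustrative, not taken from our deployment; parameter names follow the Anthropic Messages API):

```python
# Sketch of a request that crosses the thinking-budget threshold.
# Budgets above 8192 tokens require streaming, which is what pulls the
# streaming context into the async environment and triggers this bug.
def build_request(budget_tokens: int) -> dict:
    return {
        "model": "claude-opus-4-1",  # example model id
        "max_tokens": 32000,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": "Plan the investigation."}],
        # Streaming becomes mandatory once the budget exceeds 8192 tokens.
        "stream": budget_tokens > 8192,
    }

assert build_request(16000)["stream"] is True
assert build_request(4096)["stream"] is False
```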

Description

The MCP Python SDK has a critical bug where tool handlers fail to execute when all three conditions are met:

  1. MCP servers are spawned fresh for each request (stateless architecture)
  2. A streaming context exists anywhere in the async environment (e.g., Claude API client with stream=True)
  3. The tool cache is empty (first tool request to server)

This specifically affects production deployments using stateless MCP servers with Claude Opus 4.1's advanced features, but can occur with any streaming-enabled AI client.

Environment

  • MCP SDK Version: 1.13.0 - 1.13.1 (confirmed bug present)
  • Python Version: 3.10+ (tested on 3.11.9)
  • Operating System: macOS, Linux (confirmed on both)
  • Related Dependencies:
    • anthropic[vertex] 0.55.0
    • anyio 4.9.0

Minimal Reproduction Test

This test demonstrates the nested handler invocation bug that causes tool execution failures in streaming contexts.

Test File: test_nested_handler_bug.py

"""Test that reproduces the nested handler invocation bug from issue #1298."""

from typing import Any
import anyio
import pytest
from mcp.client.session import ClientSession
from mcp.server.lowlevel import Server
from mcp.types import ListToolsRequest, TextContent, Tool


@pytest.mark.anyio
async def test_nested_handler_invocation_bug():
    """Verify that cache refresh uses nested handler invocation (the bug).
    
    Issue #1298: Tool handlers fail when cache refresh triggers
    nested handler invocation via self.request_handlers[ListToolsRequest](None),
    which disrupts async execution flow in streaming contexts.
    
    Expected behavior:
    - WITHOUT FIX: Test fails, detecting nested handler invocation
    - WITH FIX: Test passes, no nested invocation occurs
    """
    server = Server("test-server")
    
    # Track handler invocations to detect the bug
    handler_invocations = []
    
    @server.list_tools()
    async def list_tools():
        await anyio.sleep(0.001)
        return [
            Tool(
                name="test_tool",
                description="Test tool",
                inputSchema={"type": "object", "properties": {}}
            )
        ]
    
    @server.call_tool()
    async def call_tool(name: str, arguments: dict[str, Any]):
        return [TextContent(type="text", text="Tool executed successfully")]
    
    # Intercept the ListToolsRequest handler to detect nested invocation
    original_handler = server.request_handlers.get(ListToolsRequest)
    
    async def interceptor(req):
        # req is None for nested invocations (the bug!)
        # req is a proper request object for normal invocations
        if req is None:
            handler_invocations.append("NESTED - BUG DETECTED")
            print("❌ NESTED handler invocation detected (bug present)")
        else:
            handler_invocations.append("normal")
            print("✓ Normal handler invocation")
        
        if original_handler:
            return await original_handler(req)
        return None
    
    server.request_handlers[ListToolsRequest] = interceptor
    
    # Setup communication channels
    from mcp.shared.message import SessionMessage
    
    server_to_client_send, server_to_client_receive = anyio.create_memory_object_stream[SessionMessage](10)
    client_to_server_send, client_to_server_receive = anyio.create_memory_object_stream[SessionMessage](10)
    
    async def run_server():
        await server.run(
            client_to_server_receive,
            server_to_client_send,
            server.create_initialization_options()
        )
    
    async with anyio.create_task_group() as tg:
        tg.start_soon(run_server)
        
        async with ClientSession(server_to_client_receive, client_to_server_send) as session:
            await session.initialize()
            
            # Clear the cache to force a refresh on next tool call
            # This is the trigger for the bug
            server._tool_cache.clear()
            
            # Make a tool call - this triggers cache refresh
            result = await session.call_tool("test_tool", {})
            
            # Verify the tool call succeeded
            assert result is not None
            assert not result.isError
            assert result.content[0].text == "Tool executed successfully"
            
            # Check if nested handler invocation occurred
            has_nested = any("NESTED" in inv for inv in handler_invocations)
            
            if has_nested:
                print(f"\n❌ BUG CONFIRMED: {handler_invocations}")
                print("Nested handler invocation disrupts async execution in streaming contexts")
            else:
                print(f"\n✅ FIX VERIFIED: {handler_invocations}")
                print("Direct function call avoids nested handler invocation")
            
            # The bug is present if nested handler invocation occurs
            assert not has_nested, (
                "Nested handler invocation detected during cache refresh. "
                "This pattern (calling request_handlers[ListToolsRequest](None)) "
                "disrupts async execution in streaming contexts (issue #1298)."
            )
        
        tg.cancel_scope.cancel()


if __name__ == "__main__":
    # Run the test directly (anyio is already imported above)
    anyio.run(test_nested_handler_invocation_bug)
    print("\n✅ Test passed - fix is working correctly!")

Running the Test

With pytest:

pytest test_nested_handler_bug.py -v -s

Standalone (without pytest):

python test_nested_handler_bug.py

Expected Output

WITHOUT the fix (bug present):

❌ NESTED handler invocation detected (bug present)
✓ Normal handler invocation

❌ BUG CONFIRMED: ['NESTED - BUG DETECTED', 'normal']
Nested handler invocation disrupts async execution in streaming contexts

AssertionError: Nested handler invocation detected during cache refresh...

WITH the fix (bug resolved):

✓ Normal handler invocation

✅ FIX VERIFIED: ['normal']
Direct function call avoids nested handler invocation

✅ Test passed - fix is working correctly!

What This Test Demonstrates

  1. The Bug Pattern: When the tool cache is empty and a tool is called, _get_cached_tool_definition() calls self.request_handlers[ListToolsRequest](None) to refresh the cache.

  2. Why It's Problematic: This nested handler invocation (handler calling handler) disrupts the async execution context, especially in streaming environments where multiple async operations are interleaved.
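The problematic pattern can be reduced to a self-contained toy (hypothetical `Dispatcher` class, not SDK code) showing the two dispatch paths the interceptor above distinguishes:

```python
import asyncio

# Hypothetical sketch (not SDK code): a handler table whose entries expect a
# real request object. A cache refresh that re-enters the table with None is
# a second, nested dispatch rather than a plain function call.
class Dispatcher:
    def __init__(self):
        self.request_handlers = {}
        self.invocations = []

    async def handle_list_tools(self, req):
        # A real handler receives a request object; nested refreshes pass None.
        self.invocations.append("nested" if req is None else "normal")
        return ["test_tool"]

    async def refresh_cache_buggy(self):
        # Bug pattern: handler-calling-handler with a fake (None) request.
        return await self.request_handlers["list_tools"](None)

async def main():
    d = Dispatcher()
    d.request_handlers["list_tools"] = d.handle_list_tools
    await d.request_handlers["list_tools"]("real-request")  # normal dispatch
    await d.refresh_cache_buggy()                           # nested dispatch
    return d.invocations

print(asyncio.run(main()))  # ['normal', 'nested']
```

In the real SDK the nested dispatch additionally interacts with the surrounding async machinery, which is where the streaming context comes in.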

Root Cause Analysis

The bug is in src/mcp/server/lowlevel/server.py in the _get_cached_tool_definition method:

# BUGGY CODE (v1.13.0-1.13.1)
async def _get_cached_tool_definition(self, tool_name: str) -> types.Tool | None:
    if tool_name not in self._tool_cache:
        if types.ListToolsRequest in self.request_handlers:
            # BUG: Direct handler invocation breaks in streaming contexts
            await self.request_handlers[types.ListToolsRequest](None)
    # ...

Invoking the request handler directly creates nested handler execution, which disrupts the async execution flow whenever a streaming context exists anywhere in the environment. After the cache refresh completes, execution never continues on to the tool handler.
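One possible fix is to keep a direct reference to the registered list-tools callback and call it for cache refreshes, bypassing the request-handler machinery entirely. A sketch of that pattern (hypothetical `ToolServer` class, not the SDK's actual patch):

```python
import asyncio

# Illustrative fix pattern (not SDK code): the decorator stores the plain
# coroutine function, and the cache refresh calls it directly instead of
# re-entering request_handlers with a None request.
class ToolServer:
    def __init__(self):
        self._tool_cache = {}
        self._list_tools_fn = None  # direct reference kept at registration

    def list_tools(self):
        def decorator(fn):
            self._list_tools_fn = fn
            return fn
        return decorator

    async def _get_cached_tool_definition(self, tool_name):
        if tool_name not in self._tool_cache and self._list_tools_fn:
            # Direct call: no nested handler dispatch, no fake request object.
            for tool in await self._list_tools_fn():
                self._tool_cache[tool] = tool
        return self._tool_cache.get(tool_name)

server = ToolServer()

@server.list_tools()
async def list_tools():
    return ["test_tool"]

print(asyncio.run(server._get_cached_tool_definition("test_tool")))  # test_tool
```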

Reproducibility

The issue is consistently reproducible with the provided test case. It occurs whenever all of the following hold:

  • Starting with a fresh MCP server (no cached tools)
  • Having any streaming context active in the environment
  • Making the first tool call to the server

Impact

Affected Users

  • Production deployments using stateless/fresh-spawn MCP servers
  • Claude Opus 4.1 users with thinking budgets > 8192 tokens (requires streaming)
  • Multi-agent architectures with adaptive orchestration
  • Any MCP usage where AI clients have streaming enabled

Severity

  • Silent failure - tool handlers don't execute but no error is raised
  • Breaks core MCP functionality in common deployment patterns
  • No obvious workaround without code changes

Workarounds

Until fixed, users can:

  1. Disable streaming in AI clients (limits capabilities)
  2. Pre-warm cache with list_tools() before tool calls
  3. Use persistent servers instead of fresh spawns
  4. Apply the patch manually to their installation
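Workaround 2 can be wrapped in a small helper. A sketch (here `session` stands for an initialized MCP `ClientSession`, shown with a stand-in stub so the ordering is self-contained and checkable; a real session would be created as in the reproduction test above):

```python
import asyncio

async def call_with_prewarm(session, name, arguments):
    # Workaround 2: force a tools/list round-trip first so the server-side
    # tool cache is populated before the first tools/call arrives.
    await session.list_tools()
    return await session.call_tool(name, arguments)

# Stand-in stub that records call order; it only demonstrates the pattern.
class StubSession:
    def __init__(self):
        self.calls = []

    async def list_tools(self):
        self.calls.append("list_tools")

    async def call_tool(self, name, arguments):
        self.calls.append("call_tool")
        return f"ran {name}"

stub = StubSession()
asyncio.run(call_with_prewarm(stub, "test_tool", {}))
assert stub.calls == ["list_tools", "call_tool"]
```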

Related Issues

Additional Context

Use Case Background

We're using MCP servers to provide tools for Claude Opus 4.1's interleaved thinking feature, which enables the model to reflect and adapt its strategy between tool calls. This is particularly powerful for:

  • Multi-agent orchestration where each tool queries different specialized agents
  • Complex investigations that require adaptive planning based on partial results
  • Error recovery scenarios where the model needs to adjust its approach mid-execution

The requirement for streaming (to support thinking budgets >8192 tokens) combined with our stateless server architecture (for scalability) triggers this bug consistently.

Investigation Notes

  • The bug only manifests when all three conditions are present
  • Removing any single condition (persistent servers, disabled streaming, or pre-warmed cache) prevents the bug
  • The issue appears to be related to how nested async handler execution interacts with streaming contexts
  • This may be related to StdioServerTransport Fails to Invoke Tool Handlers on Linux (Python SDK) #1278 which shows similar symptoms in a different context

Metadata


    Labels

    P1 (significant bug affecting many users, highly requested feature), bug (something isn't working), ready for work (enough information for someone to start working on)
