
Add backpressure when rapidly creating new stateful Streamable HTTP sessions without closing them #677


Closed
wants to merge 3 commits from halter73/stateful-session-backpressure

Conversation

@halter73 (Contributor) commented Aug 5, 2025

  • Also reduce the default MaxIdleSessionCount to 10,000

    • Note this does not include "active" requests. Any client can keep its session alive by holding a GET request open to receive unsolicited messages (i.e., messages that don't correspond to a POST request) from the server over SSE; a minimal client sketch follows this list.
  • I tested the AspNetCoreMcpSample using https://github.com/wg/wrk/ and the following command:

    $ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua <session-id>
    Running 15s test @ http://172.20.240.1:3001/
      32 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     9.26ms    9.85ms 293.26ms   93.73%
        Req/Sec     0.98k   163.43     3.06k    79.90%
      472786 requests in 15.10s, 161.69MB read
    Requests/sec:  31310.41
    Transfer/sec:     10.71MB
    

    And the following Lua script:

    local counter = 0
    
    -- Give each wrk thread its own disjoint id range so JSON-RPC ids don't collide across threads.
    function setup(thread)
       thread:set("counter", counter * 1000000)
       counter = counter + 1
    end
    
    -- Every request is a JSON POST to the MCP endpoint, reusing the session id passed
    -- as the first script argument on the command line.
    function init(args)
       wrk.method = "POST"
       wrk.headers["Accept"] = "application/json,text/event-stream"
       wrk.headers["Content-Type"] = "application/json"
       wrk.headers["Mcp-Session-Id"] = args[1]
    end
    
    -- Build a tools/call request for the sample "echo" tool with a monotonically increasing JSON-RPC id.
    function request()
       local id = wrk.thread:get("counter")
       wrk.thread:set("counter", id + 1)
    
       wrk.body = string.format('{"method":"tools/call","params":{"name":"echo","arguments":{"message":"abc"}},"jsonrpc":"2.0","id":%d}', id)
    
       return wrk.format()
    end
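
As referenced in the note above, a client keeps its session active by holding a long-lived GET against the MCP endpoint with its Mcp-Session-Id and an SSE Accept header. Here is a minimal client sketch (illustrative only, not code from the SDK or the sample; it assumes a console app with top-level statements, and the URL and session id are placeholders):

// Keep a Streamable HTTP session "active" by holding open a GET that receives
// unsolicited server messages over SSE. URL and session id are placeholders.
using var http = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Get, "http://172.20.240.1:3001/");
request.Headers.Add("Accept", "text/event-stream");
request.Headers.Add("Mcp-Session-Id", "<session-id>");

// ResponseHeadersRead starts reading the SSE stream as soon as headers arrive; the
// response body never completes while the session stays open, which is what keeps
// the session counted as active rather than idle.
using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
await using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string? line;
while ((line = await reader.ReadLineAsync()) is not null)
{
    Console.WriteLine(line); // each raw SSE line from the server
}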

The ~30,000 RPS isn't too bad considering I only get about 5,000 RPS more using the same parameters against the following minimal endpoint:

app.MapPost("/minimal", (JsonRpcMessage message) => message);
$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/minimal -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/minimal
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.96ms   12.84ms 329.63ms   94.40%
    Req/Sec     1.13k   204.75     3.70k    76.82%
  545396 requests in 15.10s, 138.66MB read
Requests/sec:  36117.74
Transfer/sec:      9.18MB

However, @DavidParks8 made another benchmark that created a new session per request, which quickly overloaded the server. I emulated this benchmark by simply commenting out the wrk.headers["Mcp-Session-Id"] = args[1] line of the Lua script, and sure enough, after just over 100,000 requests the server became overwhelmed. I attached a debugger and saw that the GC could not keep up with disposing all the McpServers/McpSessions, and the hundreds of thread pool threads that had been started to call DisposeAsync and unreference the pruned idle sessions stalled waiting on the GC, leading to thread pool starvation.

[Parallel Stacks screenshot showing GC-induced thread pool starvation]
$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    23.35ms   49.85ms 636.69ms   92.86%
    Req/Sec   592.05    275.30     2.57k    66.37%
  136024 requests in 15.10s, 46.51MB read
Requests/sec:   9008.73
Transfer/sec:      3.08MB
$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 15.09s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

With these changes, since we only allow up to 11,000 sessions to exist given the new default idle session limit of 10,000, we'll only prune roughly 1,000 idle sessions every 5 seconds if the client doesn't gracefully close the session with a DELETE request.
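
For hosts that want a different ceiling, the limit is configurable when registering the transport. The sketch below assumes the AddMcpServer().WithHttpTransport(...) pattern used by the ASP.NET Core sample; MaxIdleSessionCount and IdleTimeout are the options referenced in this PR, while the specific values shown are just illustrative.

// Minimal sketch, assuming the ASP.NET Core MCP transport exposes its session options
// through WithHttpTransport; check HttpServerTransportOptions for the exact property
// names in the SDK version you are using.
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddMcpServer()
    .WithHttpTransport(options =>
    {
        // 10,000 is the new default from this PR; 20,000 here is just an example override.
        options.MaxIdleSessionCount = 20_000;

        // Illustrative value: how long a session may sit idle before the background
        // service prunes it, independent of the count-based limit above.
        options.IdleTimeout = TimeSpan.FromMinutes(30);
    });

var app = builder.Build();
app.MapMcp();
app.Run();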

The new default steady-states at approximately 200 new sessions per second. We can look at improving the maximum new-session rate once we have better object pooling to avoid GC pressure when creating new sessions so rapidly. We could also look into proactively pruning idle sessions when new sessions are waiting rather than waiting on the background service.

Here are results showing that the new-session RPS remains stable after this change even if you try to open more concurrent connections to allow more parallel requests. Memory usage also looks stable at about 400 MB in these tests.

$ ./wrk -t32 -c1024 -d60s http://172.20.240.1:3001/ -s scripts/mcp.lua --timeout 15s
Running 1m test @ http://172.20.240.1:3001/
  32 threads and 1024 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.01s     2.31s   10.01s    55.56%
    Req/Sec    67.96    148.70     3.89k    91.58%
  18266 requests in 1.00m, 6.41MB read
Requests/sec:    303.92
Transfer/sec:    109.28KB
$ ./wrk -t32 -c1024 -d60s http://172.20.240.1:3001/ -s scripts/mcp.lua --timeout 15s
Running 1m test @ http://172.20.240.1:3001/
  32 threads and 1024 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.03s     2.25s    6.48s    50.25%
    Req/Sec    71.43    142.28     3.36k    93.49%
  19143 requests in 1.00m, 6.73MB read
Requests/sec:    318.57
Transfer/sec:    114.62KB

These changes do not have any apparent impact on single-session performance, which remains a little over 30k RPS.

@halter73 halter73 changed the title Add backpressure when rapidly creating new stateful Streamable HTTP sessions Add backpressure when rapidly creating new stateful Streamable HTTP sessions without closing them Aug 5, 2025

// Acquire semaphore when session becomes inactive (reference count goes to 0) to slow
// down session creation until idle sessions are disposed by the background service.
await session._idleSessionSemaphore.WaitAsync();
Contributor

Could this wait forever since there is no cancellationToken?

Contributor Author

In theory, no. Since the session is already marked as idle by virtue of the reference count going to zero, the corresponding call to Release() that unblocks this should not be inhibited.

For this to wait, we must be at least at 110% of the MaxIdleSessionCount. Either the idle tracking background service will eventually dispose enough sessions to get below 110% and call Release(), or another request will reactivate the session and call Release(). And any sessions waiting for this call to WaitAsync() to complete are already eligible for pruning if they haven't been reactivated.
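
To make that flow concrete, here is a minimal, self-contained sketch of the pattern under discussion (illustrative names, not the SDK's actual types): the gate starts with roughly 110% of MaxIdleSessionCount permits, a session acquires a permit when its reference count drops to zero, and the permit is returned either when the background service disposes the idle session or when a new request reactivates it.

// Illustrative sketch only; the real implementation lives in the SDK's Streamable HTTP
// transport and differs in detail.
using System.Threading;
using System.Threading.Tasks;

public sealed class IdleSessionGate
{
    private readonly SemaphoreSlim _idleSlots;

    public IdleSessionGate(int maxIdleSessionCount)
    {
        // Permit ~110% of the configured idle cap before transitions to idle start waiting.
        _idleSlots = new SemaphoreSlim(maxIdleSessionCount + maxIdleSessionCount / 10);
    }

    // Called when a session's reference count drops to zero. If we're already at the
    // 110% ceiling, this waits, which is what slows down new-session churn until the
    // idle tracking background service prunes sessions (or a request reactivates this one).
    public Task OnSessionBecameIdleAsync() => _idleSlots.WaitAsync();

    // Called when an idle session is disposed by the pruning service, or when a new
    // request reactivates it; either path frees a slot and unblocks one waiter.
    public void OnSessionPrunedOrReactivated() => _idleSlots.Release();
}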

@halter73 (Contributor Author) commented:

I'm closing this in favor of #701, which is an improvement over this PR because it disposes and unroots extra idle sessions inline before starting new ones, so there's no need to wait for the IdleTrackingBackgroundService to clean up the next time its periodic timer fires. The periodic timer still exists to handle normal IdleTimeouts when the MaxIdleSessionCount is not reached.

@halter73 halter73 closed this Aug 12, 2025
@halter73 halter73 deleted the halter73/stateful-session-backpressure branch August 12, 2025 20:46