Fix: Prevent session manager shutdown on individual session crash #841

Open: wants to merge 5 commits into main

soby commented May 29, 2025

Previously, an unhandled exception in a single MCP session's MCPServer.run() task could propagate to the StreamableHTTPSessionManager's main task group. That caused the entire task group to be cancelled, shutting down the session manager and terminating every active session.

This commit addresses the issue by:

  1. Wrapping the self.app.run(...) call within the run_server (for stateful requests) and run_stateless_server (for stateless requests) inner functions in StreamableHTTPSessionManager with a try...except Exception block.
  2. Logging any caught exceptions along with the session ID (for stateful requests) to aid in debugging the crashed session.

This change ensures that if a single session hits an unexpected error and crashes, only that session is affected. The StreamableHTTPSessionManager keeps running, and all other active sessions remain operational, so one misbehaving client can no longer take down the whole server.

Motivation and Context

An unhandled exception in any one session, such as a network error, would render the whole server unusable until it was restarted.

How Has This Been Tested?

Tested manually by forcing client disconnects and confirming that the server logs the unhandled error but keeps running and remains stable.

Breaking Changes

No breaking changes

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

soby (Author) commented May 29, 2025

To fix a test failure, I updated the cleanup logic so that it removes only sessions that ended in an error, not sessions that were explicitly terminated. Terminated sessions are retained and used to decide between returning 404 and 400. However, I don't see anywhere that they are ever removed, so I assume they accumulate until the server restarts. Fixing that is outside the scope of this PR, but it is likely not desirable behavior.
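The cleanup rule described in this comment can be illustrated with a small hypothetical registry. The class and state names are invented for the sketch, and the exact status mapping (terminated ID returns 404, unknown or crashed ID returns 400) is an assumption about how the retained sessions feed the 404 vs 400 decision, not a copy of the SDK's logic:

```python
from enum import Enum
from http import HTTPStatus


class SessionState(Enum):
    ACTIVE = "active"
    TERMINATED = "terminated"  # ended deliberately
    ERRORED = "errored"        # the session's run task crashed


class SessionRegistry:
    """Hypothetical registry illustrating the cleanup rule from the comment."""

    def __init__(self) -> None:
        self.sessions: dict[str, SessionState] = {}

    def open(self, session_id: str) -> None:
        self.sessions[session_id] = SessionState.ACTIVE

    def on_exit(self, session_id: str, state: SessionState) -> None:
        if state is SessionState.ERRORED:
            # Crashed sessions are removed outright.
            self.sessions.pop(session_id, None)
        else:
            # Terminated sessions are kept so later requests can be told apart
            # from requests carrying an unknown ID. Nothing ever removes them.
            self.sessions[session_id] = state

    def status_for(self, session_id: str) -> HTTPStatus:
        state = self.sessions.get(session_id)
        if state is None:
            return HTTPStatus.BAD_REQUEST  # assumed: unknown or crashed ID -> 400
        if state is SessionState.TERMINATED:
            return HTTPStatus.NOT_FOUND  # assumed: terminated ID -> 404
        return HTTPStatus.OK
```

Because on_exit never evicts TERMINATED entries, self.sessions grows by one entry per terminated session for the life of the process, which is the accumulation the comment points out.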
