
Cursor deadlock when connected to a hidden replica with directConnection=true #1332

Closed
@aleksandr-gorokhov

Description


Versions/Environment

  1. What version of Rust are you using?
    rustc 1.84.1 (e71f9a9a9 2025-01-27)
  2. What operating system are you using?
    macOS Sequoia 15.3.1
  3. What versions of the driver and its dependencies are you using? (Run
    cargo pkgid mongodb & cargo pkgid bson)
    registry+https://github.com/rust-lang/crates.io-index#[email protected]
    registry+https://github.com/rust-lang/crates.io-index#[email protected]
  4. What version of MongoDB are you using? (Check with the MongoDB shell using db.version())
    "6.0.20"
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)?
    Replica Set

Describe the bug

When connected to a hidden secondary replica set member with directConnection=true, the MongoDB Rust driver fails to retrieve any documents beyond the first batch, which the server caps at 16MB. The cause is the driver's server selection logic.

BE SPECIFIC:

  • What is the expected behavior and what is actually happening?
    Actual:
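```rust
use futures::stream::TryStreamExt; // provides `try_next` on the cursor
use mongodb::bson::{doc, Document};

// Runs inside an async fn returning mongodb::error::Result<()>,
// with `collection: mongodb::Collection<Document>` already set up.
let mut cursor = collection.find(doc! {}).batch_size(10000).await?;
let mut batch: Vec<Document> = Vec::new();

loop {
    match cursor.try_next().await {
        Ok(Some(doc)) => {
            batch.push(doc);
            println!("Batch length: {}", batch.len());
        }
        Ok(None) => break, // No more documents
        Err(e) => {
            // Past the first 16MB batch, every get_more fails server selection here.
            println!("Error fetching document: {}", e);
            continue;
        }
    }
}
```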

A single batch from the server is capped at 16MB of data, which in my case is around 3k documents. After that, the cursor sends a get_more to fetch the next batch. However, the server selection logic determines that no data-bearing server is available for the get_more, because RsOther is not included in the is_data_bearing function.
As a result, the cursor gets stuck and cannot fetch additional documents.
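To illustrate why a hidden member is excluded, here is a minimal std-only Rust sketch of the classification step. It assumes simplified rules modeled on the Server Discovery and Monitoring (SDAM) specification, under which a `hello` reply from a replica set member with `hidden: true` is classified as RSOther. `HelloReply`, `classify` and this `is_data_bearing` are hypothetical stand-ins, not the driver's internal code. The model also omits the Unknown, RSGhost and LoadBalancer types.

```rust
/// Simplified server types, loosely named after the SDAM specification.
/// Hypothetical model, not the driver's actual `ServerType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerType {
    Standalone,
    Mongos,
    RsPrimary,
    RsSecondary,
    RsArbiter,
    RsOther,
}

/// The subset of `hello` reply fields this sketch looks at (illustrative names).
struct HelloReply {
    set_name: Option<&'static str>,
    is_writable_primary: bool,
    secondary: bool,
    arbiter_only: bool,
    hidden: bool,
    msg_isdbgrid: bool,
}

/// Classify a server from its hello reply using simplified SDAM-style rules.
fn classify(reply: &HelloReply) -> ServerType {
    if reply.msg_isdbgrid {
        return ServerType::Mongos;
    }
    match reply.set_name {
        None => ServerType::Standalone,
        // A hidden member is RSOther even though it is a secondary.
        Some(_) if reply.hidden => ServerType::RsOther,
        Some(_) if reply.is_writable_primary => ServerType::RsPrimary,
        Some(_) if reply.secondary => ServerType::RsSecondary,
        Some(_) if reply.arbiter_only => ServerType::RsArbiter,
        Some(_) => ServerType::RsOther,
    }
}

/// Data-bearing check as described in this report: RsOther is not included.
fn is_data_bearing(t: ServerType) -> bool {
    matches!(
        t,
        ServerType::Standalone | ServerType::Mongos | ServerType::RsPrimary | ServerType::RsSecondary
    )
}

fn main() {
    // A hidden secondary of replica set "rs0", like mongo3:27019 in the error above.
    let hidden_member = HelloReply {
        set_name: Some("rs0"),
        is_writable_primary: false,
        secondary: true,
        arbiter_only: false,
        hidden: true,
        msg_isdbgrid: false,
    };
    let t = classify(&hidden_member);
    println!("{:?}, data-bearing: {}", t, is_data_bearing(t));
}
```

The same member with `hidden: false` would be classified as a secondary and count as data-bearing. Only the `hidden` flag moves it out of the selectable set.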

```
Kind: Server selection timeout: None of the available servers suitable for criteria Predicate. Topology: { Type: Single, Servers: [ { Address: mongo3:27019, Type: RsOther, Average RTT: 5.854334ms, Last Update Time: 2025-03-14 13:39:21.591 +00:00:00, Max Wire Version: 17, Min Wire Version: 0, Replica Set Name: rs0, Replica Set Version: 1 } ] }, labels: {}
```

Expected:
The cursor should keep issuing get_more requests to the server it is connected to until all matching documents have been retrieved.

  • Do you have any ideas on why this may be happening that could give us a
    clue in the right direction?

A hidden replica set member is classified as RsOther, and RsOther servers cannot be selected for a get_more request. Because directConnection=true restricts the topology to that single server, there is no other candidate either. Every get_more therefore fails server selection, and the cursor can never fetch the remaining documents.
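The selection failure can be modeled in a few lines. This is a hypothetical std-only sketch. It assumes, as described above, that the get_more predicate requires both the cursor's server address and a data-bearing server. `ServerDesc`, `Topology` and `select` are illustrative names, not the driver's API. With directConnection=true the topology holds only the hidden member, so the predicate matches nothing.

```rust
/// Hypothetical description of one server; not the driver's internal type.
#[derive(Debug, Clone)]
struct ServerDesc {
    address: &'static str,
    data_bearing: bool, // false for a hidden member (RsOther)
}

/// With directConnection=true the topology is Single: it only ever holds
/// the one server from the connection string.
struct Topology {
    servers: Vec<ServerDesc>,
}

/// Return the servers matching `predicate`, or an error if none do.
fn select<'a>(
    topology: &'a Topology,
    predicate: impl Fn(&ServerDesc) -> bool,
) -> Result<Vec<&'a ServerDesc>, String> {
    let suitable: Vec<&ServerDesc> = topology.servers.iter().filter(|s| predicate(*s)).collect();
    if suitable.is_empty() {
        Err("None of the available servers suitable for criteria Predicate".to_string())
    } else {
        Ok(suitable)
    }
}

fn main() {
    let topology = Topology {
        servers: vec![ServerDesc { address: "mongo3:27019", data_bearing: false }],
    };
    let cursor_address = "mongo3:27019"; // server that returned the first batch

    // get_more predicate as described in this report: pinned address AND data-bearing.
    let get_more = select(&topology, |s| s.address == cursor_address && s.data_bearing);
    println!("{:?}", get_more.map(|v| v.len()));

    // For comparison only: a predicate that matches on the pinned address alone.
    let address_only = select(&topology, |s| s.address == cursor_address);
    println!("{:?}", address_only.map(|v| v.len()));
}
```

The address-only comparison merely shows that the one server in the topology is reachable by address. It is not a claim about how the driver was or should be fixed.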

  • Are there multiple ways of triggering this bug (perhaps more than one
    function produce a crash)?

This issue should affect any cursor that needs a get_more, i.e. whose results do not fit in the first batch. With a batch_size large enough to retrieve more than 16MB of data, that happens as soon as the first 16MB batch is consumed, and the cursor then fails to fetch additional documents because of the server selection constraints.
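A rough way to see when a get_more becomes necessary: the server fills each batch up to batch_size documents or the 16MB limit, whichever comes first, so any result set larger than one batch needs at least one get_more. The std-only sketch below assumes a 16 MiB cap and an average document size of about 5.6 KB (16 MiB divided by the roughly 3k documents mentioned above). The 10,000-document total in the example is hypothetical, and the real server also accounts for per-document overhead, so treat the numbers as approximate.

```rust
/// Assumed maximum size of one batch of documents in a server reply (16 MiB).
const MAX_BATCH_BYTES: u64 = 16 * 1024 * 1024;

/// How many documents fit in one batch, given the requested batch_size and an
/// assumed average document size in bytes (both must be at least 1).
fn docs_per_batch(batch_size: u64, avg_doc_bytes: u64) -> u64 {
    batch_size.min(MAX_BATCH_BYTES / avg_doc_bytes)
}

/// Number of get_more round trips needed after the first batch.
fn get_mores_needed(total_docs: u64, batch_size: u64, avg_doc_bytes: u64) -> u64 {
    let per_batch = docs_per_batch(batch_size, avg_doc_bytes);
    // ceil(total / per_batch) batches in total, minus the initial find batch.
    total_docs.div_ceil(per_batch).saturating_sub(1)
}

fn main() {
    // Assumed average size: 16 MiB / ~3,000 documents.
    let avg_doc_bytes = 5_592;
    println!("docs per batch: {}", docs_per_batch(10_000, avg_doc_bytes));
    // Hypothetical collection of 10,000 documents queried with batch_size(10000).
    println!("get_mores needed: {}", get_mores_needed(10_000, 10_000, avg_doc_bytes));
}
```

Under these assumptions batch_size(10000) still yields about 3,000 documents per batch, so any query returning more than that needs a get_more, and on a hidden member every get_more hits the selection failure.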

To Reproduce

  1. Set up a MongoDB replica set.
  2. Configure one of the replicas as hidden.
  3. Connect to the hidden replica using directConnection=true.
  4. Create a cursor with a batch_size large enough to exceed 16MB of data (or your system’s default limit).
  5. Start polling the cursor.
  6. Observe that the cursor gets stuck and cannot fetch additional documents.
