MONGOCRYPT-723 support $lookup #954

Merged 21 commits on Feb 20, 2025

Conversation

kevinAlbs
Contributor

@kevinAlbs kevinAlbs commented Feb 14, 2025

Summary

Support encrypting "aggregate" commands with $lookup

Tested with libmongocrypt patch: https://spruce.mongodb.com/version/67af62d2eff4dc0007490c8f
Tested with C driver patch pointing to this branch: https://spruce.mongodb.com/version/67af5960bad7a30007a37b90

Background & Motivation

Parsing

Enough of the aggregate pipeline is parsed to find occurrences of $lookup. $lookup can be nested within $lookup, $facet, and $unionWith.

mongocryptd/crypt_shared return an error for $unionWith and $facet, for example: "Aggregation stage $unionWith is not allowed or supported with automatic encryption." Regardless, this PR parses within $unionWith and $facet for completeness.
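
The nested search described above can be sketched as a recursive walk. This is a minimal illustration, not libmongocrypt's implementation: the `toy_stage_t` struct and both `toy_` functions are hypothetical, real pipelines are BSON documents, and the sub-pipelines of a $facet are assumed to be flattened into a single array here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one parsed pipeline stage.
 * Real pipelines are BSON; this struct exists only for illustration. */
typedef struct toy_stage {
    const char *name;            /* stage operator, e.g. "$lookup" or "$match" */
    const char *from;            /* "from" collection of a $lookup, or NULL */
    const struct toy_stage *sub; /* nested pipeline stages, or NULL */
    size_t n_sub;                /* number of entries in sub */
} toy_stage_t;

/* Record coll in out unless already present. Returns false if out is full. */
static bool toy_add_unique(const char **out, size_t cap, size_t *n, const char *coll) {
    for (size_t i = 0; i < *n; i++) {
        if (strcmp(out[i], coll) == 0) {
            return true;
        }
    }
    if (*n == cap) {
        return false;
    }
    out[(*n)++] = coll;
    return true;
}

/* Walk stages, descending into $lookup, $facet, and $unionWith, and collect
 * the "from" collection of every $lookup found. Returns false if out is full. */
static bool toy_collect_lookups(const toy_stage_t *stages, size_t n_stages,
                                const char **out, size_t cap, size_t *n) {
    for (size_t i = 0; i < n_stages; i++) {
        const toy_stage_t *s = &stages[i];
        bool is_lookup = strcmp(s->name, "$lookup") == 0;
        bool can_nest = is_lookup || strcmp(s->name, "$facet") == 0 ||
                        strcmp(s->name, "$unionWith") == 0;
        if (is_lookup && s->from && !toy_add_unique(out, cap, n, s->from)) {
            return false;
        }
        if (can_nest && s->sub && !toy_collect_lookups(s->sub, s->n_sub, out, cap, n)) {
            return false;
        }
    }
    return true;
}
```

Stages outside the three listed operators are not descended into, which mirrors the caveat that a future pipeline extension could hide a $lookup from this kind of walk.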

Parsing in libmongocrypt may require future changes. If a future server version extends the aggregate pipeline, libmongocrypt may not find all $lookup stages. However, if libmongocrypt does not pass a schema for a needed collection, mongocryptd/crypt_shared returns an error (e.g. "Missing encryption schema for namespace: db.c2") rather than assuming the collection has no schema. This was deemed acceptable in the technical design document "Support $lookup in CSFLE and QE".

mongocryptd/crypt_shared version check

A version check is added to ensure mongocryptd/crypt_shared supports multiple schemas. This is intended to improve the error that would otherwise be returned from a too-old mongocryptd/crypt_shared. Testing against mongocryptd 8.0 produces the following errors:

  • jsonSchema or encryptionInformation is required when passing multiple CSFLE schemas.
  • Exactly one namespace is supported with encryptionInformation when passing multiple QE schemas.

Instead, libmongocrypt returns an error like the following:

  • When using mongocryptd: Encrypting 'aggregate' requires multiple schemas. Detected mongocryptd with wire version 17, but need 26. Upgrade mongocryptd to 8.1 or newer.
  • When using crypt_shared: Encrypting 'aggregate' requires multiple schemas. Detected crypt_shared with version 8.0.4, but need 8.1. Upgrade crypt_shared to 8.1 or newer.
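
A check of this shape could produce the mongocryptd message. This is a sketch only: the function and macro names are hypothetical, and the threshold of 26 is taken from the example error text above, not from the library source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Wire version named as required in the example error ("but need 26"). */
#define TOY_MIN_WIRE_VERSION_MULTI_SCHEMA 26

/* Hypothetical helper: returns true if the detected mongocryptd wire version
 * is new enough. On failure, writes a descriptive message into err. */
static bool toy_check_mongocryptd_wire_version(int detected, char *err, size_t err_len) {
    assert(err != NULL && err_len > 0);
    if (detected >= TOY_MIN_WIRE_VERSION_MULTI_SCHEMA) {
        err[0] = '\0';
        return true;
    }
    snprintf(err, err_len,
             "Encrypting 'aggregate' requires multiple schemas. "
             "Detected mongocryptd with wire version %d, but need %d. "
             "Upgrade mongocryptd to 8.1 or newer.",
             detected, TOY_MIN_WIRE_VERSION_MULTI_SCHEMA);
    return false;
}
```

Failing early with the detected and required versions tells the user what to upgrade, which the two server-side errors quoted earlier do not.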

Opt-in to support multi-collection commands

The state MONGOCRYPT_CTX_NEED_MONGO_COLLINFO requests that the driver send listCollections to check for server-side schemas. Before this PR, this state expected at most one result from listCollections. Quoting integrating.md:

Return the first result (if any) with mongocrypt_ctx_mongo_feed

To support $lookup, drivers are now expected to return all results from listCollections.

libmongocrypt cannot distinguish between "did not pass all results" (following old protocol) and "server did not have results". libmongocrypt applies empty schemas to collections that have no known schema. If a driver upgrades libmongocrypt and does not implement the new protocol, an empty schema might be applied to a collection that really has a server-side schema.

To ensure drivers upgrading libmongocrypt follow the new protocol, drivers must call mongocrypt_setopt_enable_multiple_collinfo to enable support for multi-collection commands. Without opting in, libmongocrypt returns an error if a command requires multiple collections. This avoids the risk that a driver upgrades libmongocrypt without implementing the new protocol.
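
The opt-in gate can be modeled in a few lines. This is a toy model under stated assumptions, not libmongocrypt's code: `toy_crypt_t`, `toy_status_t`, and both functions are hypothetical stand-ins, with `toy_enable_multiple_collinfo` playing the role of mongocrypt_setopt_enable_multiple_collinfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant piece of library state. */
typedef struct {
    bool multiple_collinfo_enabled; /* set when the driver opts in */
} toy_crypt_t;

typedef enum { TOY_OK, TOY_ERR_OPT_IN_REQUIRED } toy_status_t;

/* Models the driver signaling that it returns all listCollections results. */
static void toy_enable_multiple_collinfo(toy_crypt_t *crypt) {
    assert(crypt != NULL);
    crypt->multiple_collinfo_enabled = true;
}

/* n_collections: number of distinct collections the command references.
 * Single-collection commands are unaffected by the opt-in. */
static toy_status_t toy_check_collinfo(const toy_crypt_t *crypt, size_t n_collections) {
    assert(crypt != NULL);
    if (n_collections > 1 && !crypt->multiple_collinfo_enabled) {
        return TOY_ERR_OPT_IN_REQUIRED;
    }
    return TOY_OK;
}
```

Returning an error when the flag is unset is the point of the design: a driver still following the old one-result protocol fails loudly instead of having an empty schema silently applied to a collection that has a server-side schema.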

Increasing TEST_DATA_COUNT

TEST_DATA_COUNT was increased to support the new tests. Increasing it too far caused a segfault on Windows (possibly too large a stack frame). MONGOCRYPT-775 was filed to investigate separately (not strictly needed for this PR).

Require opting in to enable multi-collection support.
Require drivers to signal that the new protocol for MONGOCRYPT_CTX_NEED_MONGO_COLLINFO is implemented.
Used to implement opt-in check
Useful to further configure `mongocrypt_t` before test
Test $lookup is supported.
Test $unionWith and $facet are parsed.
Test opt-in is required to support multiple collections.
Test mongocryptd/crypt_shared version is checked.
To fix observed test failures on Evergreen
@kevinAlbs kevinAlbs marked this pull request as ready for review February 14, 2025 16:36
@kevinAlbs kevinAlbs requested a review from eramongodb February 18, 2025 17:32
kevinAlbs and others added 2 commits February 18, 2025 12:34
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
@kevinAlbs kevinAlbs merged commit 33fdf65 into mongodb:master Feb 20, 2025
53 checks passed
} tester_mongocrypt_flags;

-/* Arbitrary max of 2048 instances of temporary test data. Increase as needed.
+/* Arbitrary max of 2148 instances of temporary test data. Increase as needed.
+ * TODO(MONGOCRYPT-775) increasing further (e.g. 3000+) causes a segfault on Windows test runs. Revise.

Putting "static" on the "tester" definition in main() should fix this; it's a pretty large thing to keep on the stack.
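
A minimal sketch of that suggestion, assuming a tester struct whose size scales with TEST_DATA_COUNT. The `toy_tester_t` layout and `toy_get_tester` are hypothetical and are not the real tester definition; only the count of 2148 comes from the diff.

```c
#include <assert.h>
#include <stddef.h>

#define TEST_DATA_COUNT 2148 /* value from the diff above */

/* Hypothetical layout: several per-entry arrays make the struct large. */
typedef struct {
    char *file_paths[TEST_DATA_COUNT];
    void *file_data[TEST_DATA_COUNT];
    size_t file_lens[TEST_DATA_COUNT];
    size_t n_used;
} toy_tester_t;

static toy_tester_t *toy_get_tester(void) {
    /* "static" gives the object static storage duration: it is zero-initialized
     * and does not occupy the function's stack frame. */
    static toy_tester_t tester;
    return &tester;
}
```

With static storage, the struct's size no longer counts against the thread's stack limit, so raising TEST_DATA_COUNT only grows the program's data segment.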

4 participants