Skip to content

Commit 33fdf65

Browse files
MONGOCRYPT-723 support $lookup (#954)
* add opt-in to multi-collection support with `mongocrypt_setopt_enable_multiple_collinfo` Require opting in to enable multi-collection support. Require drivers to signal that the new protocol for `MONGOCRYPT_CTX_NEED_MONGO_COLLINFO` is implemented. * add `mc_schema_broker_has_multiple_requests` Used to implement opt-in check * parse $lookup * add version check for mongocryptd * add version check for crypt shared * add option to skip init in tests Useful to further configure `mongocrypt_t` before test * test multi-collection commands Test $lookup is supported. Test $unionWith and $facet are parsed. Test opt-in is required to support multiple collections. Test mongocryptd/crypt_shared version is checked. * increase `TEST_DATA_COUNT` To fix observed test failures on Evergreen --------- Co-authored-by: Ezra Chung <[email protected]>
1 parent 3e2f462 commit 33fdf65

File tree

124 files changed

+5356
-45
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

124 files changed

+5356
-45
lines changed

integrating.md

+13-2
Original file line numberDiff line numberDiff line change
@@ -140,6 +140,15 @@ All contexts.
140140

141141
#### State: `MONGOCRYPT_CTX_NEED_MONGO_COLLINFO` ####
142142

143+
> [!IMPORTANT]
144+
> <a name="multi-collection-commands"></a> **Multi-collection commands**: prior to 1.13.0, drivers were expected to pass _at most one result_ from `listCollections`. In 1.13.0, drivers are expected to pass _all results_ from `listCollections` to support multi-collection commands (e.g. aggregate with `$lookup`).
145+
>
146+
> Drivers must call `mongocrypt_setopt_enable_multiple_collinfo` to indicate the new behavior is implemented and opt-in to support for multi-collection commands. This opt-in is to prevent the following bug scenario:
147+
> > A driver upgrades to 1.13.0, but does not update prior behavior which passes at most one result of a multi-collection command.
148+
> > A multi-collection command requests schemas for both `db.c1` and `db.c2`.
149+
> > The driver only passes the result for `db.c1` even though `db.c2` also has a result.
150+
> > Therefore, libmongocrypt incorrectly believes `db.c2` has no schema.
151+
143152
**libmongocrypt needs**...
144153

145154
A result from a listCollections cursor.
@@ -148,7 +157,7 @@ A result from a listCollections cursor.
148157

149158
1. Run listCollections on the encrypted MongoClient with the filter
150159
provided by `mongocrypt_ctx_mongo_op`
151-
2. Return the first result (if any) with `mongocrypt_ctx_mongo_feed` or proceed to the next step if nothing was returned.
160+
2. Pass all results (if any) with calls to `mongocrypt_ctx_mongo_feed` or proceed to the next step if nothing was returned. Results may be passed in any order.
152161
3. Call `mongocrypt_ctx_mongo_done`
153162

154163
**Applies to...**
@@ -157,6 +166,8 @@ auto encrypt
157166

158167
#### State: `MONGOCRYPT_CTX_NEED_MONGO_COLLINFO_WITH_DB` ####
159168

169+
See [note](#multi-collection-commands) about multi-collection commands.
170+
160171
**libmongocrypt needs**...
161172

162173
Results from a listCollections cursor from a specified database.
@@ -165,7 +176,7 @@ Results from a listCollections cursor from a specified database.
165176

166177
1. Run listCollections on the encrypted MongoClient with the filter
167178
provided by `mongocrypt_ctx_mongo_op` on the database provided by `mongocrypt_ctx_mongo_db`.
168-
2. Return the first result (if any) with `mongocrypt_ctx_mongo_feed` or proceed to the next step if nothing was returned.
179+
2. Pass all results (if any) with calls to `mongocrypt_ctx_mongo_feed` or proceed to the next step if nothing was returned. Results may be passed in any order.
169180
3. Call `mongocrypt_ctx_mongo_done`
170181

171182
**Applies to...**

src/mc-schema-broker-private.h

+3
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,9 @@ void mc_schema_broker_use_rangev2(mc_schema_broker_t *sb);
4141
// Returns error if two requests have different databases (not-yet supported).
4242
bool mc_schema_broker_request(mc_schema_broker_t *sb, const char *db, const char *coll, mongocrypt_status_t *status);
4343

44+
// mc_schema_broker_has_multiple_requests returns true if there are requests for multiple unique collections
45+
bool mc_schema_broker_has_multiple_requests(const mc_schema_broker_t *sb);
46+
4447
// mc_schema_broker_append_listCollections_filter appends a filter to use with the listCollections command.
4548
// Example: { "name": { "$in": [ "coll1", "coll2" ] } }
4649
// The filter matches all not-yet-satisfied collections.

src/mc-schema-broker.c

+5
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,11 @@ bool mc_schema_broker_request(mc_schema_broker_t *sb, const char *db, const char
9393
return true;
9494
}
9595

96+
bool mc_schema_broker_has_multiple_requests(const mc_schema_broker_t *sb) {
97+
BSON_ASSERT_PARAM(sb);
98+
return sb->ll_len > 1;
99+
}
100+
96101
void mc_schema_broker_destroy(mc_schema_broker_t *sb) {
97102
if (!sb) {
98103
return;

src/mongocrypt-ctx-encrypt.c

+220-19
Original file line numberDiff line numberDiff line change
@@ -2220,9 +2220,161 @@ static bool needs_ismaster_check(mongocrypt_ctx_t *ctx) {
22202220
BSON_ASSERT_PARAM(ctx);
22212221

22222222
bool using_mongocryptd = !ectx->bypass_query_analysis && !ctx->crypt->csfle.okay;
2223-
// The "create" and "createIndexes" command require an isMaster check when
2224-
// using mongocryptd. See MONGOCRYPT-429.
2225-
return using_mongocryptd && (0 == strcmp(ectx->cmd_name, "create") || 0 == strcmp(ectx->cmd_name, "createIndexes"));
2223+
2224+
if (!using_mongocryptd) {
2225+
return false;
2226+
}
2227+
2228+
if (mc_schema_broker_has_multiple_requests(ectx->sb)) {
2229+
// Only mongocryptd 8.1 (wire version 26) supports multiple schemas with csfleEncryptionSchemas.
2230+
return true;
2231+
}
2232+
// MONGOCRYPT-429: The "create" and "createIndexes" command are only supported on mongocrypt 6.0 (wire version 17).
2233+
if (0 == strcmp(ectx->cmd_name, "create") || 0 == strcmp(ectx->cmd_name, "createIndexes")) {
2234+
return true;
2235+
}
2236+
2237+
return false;
2238+
}
2239+
2240+
// `find_collections_in_pipeline` finds other collection names in an aggregate pipeline that may need schemas.
2241+
static bool find_collections_in_pipeline(mc_schema_broker_t *sb,
2242+
bson_iter_t pipeline_iter,
2243+
const char *db,
2244+
mstr_view path,
2245+
mongocrypt_status_t *status) {
2246+
bson_iter_t array_iter;
2247+
if (!BSON_ITER_HOLDS_ARRAY(&pipeline_iter) || !bson_iter_recurse(&pipeline_iter, &array_iter)) {
2248+
CLIENT_ERR("failed to recurse pipeline at path: %s", path.data);
2249+
return false;
2250+
}
2251+
2252+
while (bson_iter_next(&array_iter)) {
2253+
bson_iter_t stage_iter;
2254+
const char *stage_key = bson_iter_key(&array_iter);
2255+
2256+
if (!BSON_ITER_HOLDS_DOCUMENT(&array_iter) || !bson_iter_recurse(&array_iter, &stage_iter)
2257+
|| !bson_iter_next(&stage_iter)) {
2258+
CLIENT_ERR("failed to recurse stage at path: %s.%s", path.data, stage_key);
2259+
return false;
2260+
}
2261+
2262+
const char *stage = bson_iter_key(&stage_iter);
2263+
// Check for $lookup.
2264+
if (0 == strcmp(stage, "$lookup")) {
2265+
bson_iter_t lookup_iter;
2266+
if (!BSON_ITER_HOLDS_DOCUMENT(&stage_iter) || !bson_iter_recurse(&stage_iter, &lookup_iter)) {
2267+
CLIENT_ERR("failed to recurse $lookup at path: %s.%s", path.data, stage_key);
2268+
return false;
2269+
}
2270+
2271+
while (bson_iter_next(&lookup_iter)) {
2272+
const char *field = bson_iter_key(&lookup_iter);
2273+
if (0 == strcmp(field, "from")) {
2274+
if (!BSON_ITER_HOLDS_UTF8(&lookup_iter)) {
2275+
CLIENT_ERR("expected string, but '%s' for 'from' field at path: %s.%s",
2276+
mc_bson_type_to_string(bson_iter_type(&lookup_iter)),
2277+
path.data,
2278+
stage_key);
2279+
return false;
2280+
}
2281+
const char *from = bson_iter_utf8(&lookup_iter, NULL);
2282+
if (!mc_schema_broker_request(sb, db, from, status)) {
2283+
return false;
2284+
}
2285+
}
2286+
2287+
if (0 == strcmp(field, "pipeline")) {
2288+
mstr subpath = mstr_append(path, mstrv_lit("."));
2289+
mstr_inplace_append(&subpath, mstrv_view_cstr(stage_key));
2290+
mstr_inplace_append(&subpath, mstrv_lit(".$lookup.pipeline"));
2291+
if (!find_collections_in_pipeline(sb, lookup_iter, db, subpath.view, status)) {
2292+
mstr_free(subpath);
2293+
return false;
2294+
}
2295+
mstr_free(subpath);
2296+
}
2297+
}
2298+
}
2299+
2300+
// Check for $facet.
2301+
if (0 == strcmp(stage, "$facet")) {
2302+
bson_iter_t facet_iter;
2303+
if (!BSON_ITER_HOLDS_DOCUMENT(&stage_iter) || !bson_iter_recurse(&stage_iter, &facet_iter)) {
2304+
CLIENT_ERR("failed to recurse $facet at path: %s.%s", path.data, stage_key);
2305+
return false;
2306+
}
2307+
2308+
while (bson_iter_next(&facet_iter)) {
2309+
const char *field = bson_iter_key(&facet_iter);
2310+
mstr subpath = mstr_append(path, mstrv_lit("."));
2311+
mstr_inplace_append(&subpath, mstrv_view_cstr(stage_key));
2312+
mstr_inplace_append(&subpath, mstrv_lit(".$facet."));
2313+
mstr_inplace_append(&subpath, mstrv_view_cstr(field));
2314+
if (!find_collections_in_pipeline(sb, facet_iter, db, subpath.view, status)) {
2315+
mstr_free(subpath);
2316+
return false;
2317+
}
2318+
mstr_free(subpath);
2319+
}
2320+
}
2321+
2322+
// Check for $unionWith.
2323+
if (0 == strcmp(stage, "$unionWith")) {
2324+
bson_iter_t unionWith_iter;
2325+
if (!BSON_ITER_HOLDS_DOCUMENT(&stage_iter) || !bson_iter_recurse(&stage_iter, &unionWith_iter)) {
2326+
CLIENT_ERR("failed to recurse $unionWith at path: %s.%s", path.data, stage_key);
2327+
return false;
2328+
}
2329+
2330+
while (bson_iter_next(&unionWith_iter)) {
2331+
const char *field = bson_iter_key(&unionWith_iter);
2332+
if (0 == strcmp(field, "coll")) {
2333+
if (!BSON_ITER_HOLDS_UTF8(&unionWith_iter)) {
2334+
CLIENT_ERR("expected string, but got '%s' for 'coll' field at path: %s.%s",
2335+
mc_bson_type_to_string(bson_iter_type(&unionWith_iter)),
2336+
path.data,
2337+
stage_key);
2338+
return false;
2339+
}
2340+
const char *coll = bson_iter_utf8(&unionWith_iter, NULL);
2341+
if (!mc_schema_broker_request(sb, db, coll, status)) {
2342+
return false;
2343+
}
2344+
}
2345+
2346+
if (0 == strcmp(field, "pipeline")) {
2347+
mstr subpath = mstr_append(path, mstrv_lit("."));
2348+
mstr_inplace_append(&subpath, mstrv_view_cstr(stage_key));
2349+
mstr_inplace_append(&subpath, mstrv_lit(".$unionWith.pipeline"));
2350+
if (!find_collections_in_pipeline(sb, unionWith_iter, db, subpath.view, status)) {
2351+
mstr_free(subpath);
2352+
return false;
2353+
}
2354+
mstr_free(subpath);
2355+
}
2356+
}
2357+
}
2358+
}
2359+
2360+
return true;
2361+
}
2362+
2363+
static bool
2364+
find_collections_in_agg(mongocrypt_binary_t *cmd, mc_schema_broker_t *sb, const char *db, mongocrypt_status_t *status) {
2365+
bson_t cmd_bson;
2366+
if (!_mongocrypt_binary_to_bson(cmd, &cmd_bson)) {
2367+
CLIENT_ERR("failed to convert command to BSON");
2368+
return false;
2369+
}
2370+
2371+
bson_iter_t iter;
2372+
if (!bson_iter_init_find(&iter, &cmd_bson, "pipeline")) {
2373+
// Command may be malformed. Let server error.
2374+
return true;
2375+
}
2376+
2377+
return find_collections_in_pipeline(sb, iter, db, mstrv_lit("aggregate.pipeline"), status);
22262378
}
22272379

22282380
bool mongocrypt_ctx_encrypt_init(mongocrypt_ctx_t *ctx, const char *db, int32_t db_len, mongocrypt_binary_t *cmd) {
@@ -2313,6 +2465,22 @@ bool mongocrypt_ctx_encrypt_init(mongocrypt_ctx_t *ctx, const char *db, int32_t
23132465
}
23142466
}
23152467

2468+
if (0 == strcmp(ectx->cmd_name, "aggregate")) {
2469+
if (!find_collections_in_agg(cmd, ectx->sb, ectx->cmd_db, ctx->status)) {
2470+
_mongocrypt_ctx_fail(ctx);
2471+
return false;
2472+
}
2473+
2474+
if (mc_schema_broker_has_multiple_requests(ectx->sb)) {
2475+
if (!ctx->crypt->multiple_collinfo_enabled) {
2476+
return _mongocrypt_ctx_fail_w_msg(ctx,
2477+
"aggregate includes a $lookup stage, but libmongocrypt is not "
2478+
"configured to support encrypting a "
2479+
"command with multiple collections");
2480+
}
2481+
}
2482+
}
2483+
23162484
if (ctx->opts.kek.provider.aws.region || ctx->opts.kek.provider.aws.cmk) {
23172485
return _mongocrypt_ctx_fail_w_msg(ctx, "aws masterkey options must not be set");
23182486
}
@@ -2341,11 +2509,8 @@ bool mongocrypt_ctx_encrypt_init(mongocrypt_ctx_t *ctx, const char *db, int32_t
23412509
bson_free(cmd_val);
23422510
}
23432511

2344-
/* The "create" and "createIndexes" command require sending an isMaster
2345-
* request to mongocryptd. */
2512+
// Check if an isMaster request to mongocryptd is needed to detect feature support:
23462513
if (needs_ismaster_check(ctx)) {
2347-
/* We are using mongocryptd. We need to ensure that mongocryptd
2348-
* maxWireVersion >= 17. */
23492514
ectx->ismaster.needed = true;
23502515
ctx->state = MONGOCRYPT_CTX_NEED_MONGO_MARKINGS;
23512516
return true;
@@ -2355,6 +2520,10 @@ bool mongocrypt_ctx_encrypt_init(mongocrypt_ctx_t *ctx, const char *db, int32_t
23552520
}
23562521

23572522
#define WIRE_VERSION_SERVER_6 17
2523+
#define WIRE_VERSION_SERVER_8_1 26
2524+
// The crypt_shared version format is defined in mongo_crypt-v1.h.
2525+
// Example: server 6.2.1 is encoded as 0x0006000200010000
2526+
#define CRYPT_SHARED_8_1 0x0008000100000000ull
23582527

23592528
/* mongocrypt_ctx_encrypt_ismaster_done is called when:
23602529
* 1. The max wire version of mongocryptd is known.
@@ -2367,20 +2536,52 @@ static bool mongocrypt_ctx_encrypt_ismaster_done(mongocrypt_ctx_t *ctx) {
23672536

23682537
ectx->ismaster.needed = false;
23692538

2370-
/* The "create" and "createIndexes" command require bypassing on mongocryptd
2371-
* older than version 6.0. */
23722539
if (needs_ismaster_check(ctx)) {
2373-
if (ectx->ismaster.maxwireversion < WIRE_VERSION_SERVER_6) {
2374-
// Bypass auto encryption.
2375-
// Satisfy schema request with an empty schema.
2376-
if (!mc_schema_broker_satisfy_remaining_with_empty_schemas(ectx->sb,
2377-
NULL /* do not cache */,
2378-
ctx->status)) {
2379-
return _mongocrypt_ctx_fail(ctx);
2540+
// MONGOCRYPT-429: "create" and "createIndexes" require bypassing on mongocryptd older than version 6.0.
2541+
if (0 == strcmp(ectx->cmd_name, "create") || 0 == strcmp(ectx->cmd_name, "createIndexes")) {
2542+
if (ectx->ismaster.maxwireversion < WIRE_VERSION_SERVER_6) {
2543+
// Bypass auto encryption.
2544+
// Satisfy schema request with an empty schema.
2545+
if (!mc_schema_broker_satisfy_remaining_with_empty_schemas(ectx->sb,
2546+
NULL /* do not cache */,
2547+
ctx->status)) {
2548+
return _mongocrypt_ctx_fail(ctx);
2549+
}
2550+
ctx->nothing_to_do = true;
2551+
ctx->state = MONGOCRYPT_CTX_READY;
2552+
return true;
2553+
}
2554+
}
2555+
2556+
if (mc_schema_broker_has_multiple_requests(ectx->sb)) {
2557+
// Ensure mongocryptd supports multiple schemas.
2558+
if (ectx->ismaster.maxwireversion < WIRE_VERSION_SERVER_8_1) {
2559+
mongocrypt_status_t *status = ctx->status;
2560+
CLIENT_ERR("Encrypting '%s' requires multiple schemas. Detected mongocryptd with wire version %" PRId32
2561+
", but need %" PRId32 ". Upgrade mongocryptd to 8.1 or newer.",
2562+
ectx->cmd_name,
2563+
ectx->ismaster.maxwireversion,
2564+
WIRE_VERSION_SERVER_8_1);
2565+
_mongocrypt_ctx_fail(ctx);
2566+
return false;
2567+
}
2568+
}
2569+
}
2570+
2571+
if (ctx->crypt->csfle.okay) {
2572+
if (mc_schema_broker_has_multiple_requests(ectx->sb)) {
2573+
// Ensure crypt_shared supports multiple schemas.
2574+
uint64_t version = ctx->crypt->csfle.get_version();
2575+
const char *version_str = ctx->crypt->csfle.get_version_str();
2576+
if (version < CRYPT_SHARED_8_1) {
2577+
mongocrypt_status_t *status = ctx->status;
2578+
CLIENT_ERR("Encrypting '%s' requires multiple schemas. Detected crypt_shared with version %s, but "
2579+
"need 8.1. Upgrade crypt_shared to 8.1 or newer.",
2580+
ectx->cmd_name,
2581+
version_str);
2582+
_mongocrypt_ctx_fail(ctx);
2583+
return false;
23802584
}
2381-
ctx->nothing_to_do = true;
2382-
ctx->state = MONGOCRYPT_CTX_READY;
2383-
return true;
23842585
}
23852586
}
23862587

src/mongocrypt-private.h

+1
Original file line numberDiff line numberDiff line change
@@ -130,6 +130,7 @@ struct _mongocrypt_t {
130130
/// Pointer to the global csfle_lib object. Should not be freed directly.
131131
mongo_crypt_v1_lib *csfle_lib;
132132
bool retry_enabled;
133+
bool multiple_collinfo_enabled;
133134
};
134135

135136
typedef enum {

src/mongocrypt.c

+6
Original file line numberDiff line numberDiff line change
@@ -166,6 +166,12 @@ bool mongocrypt_setopt_retry_kms(mongocrypt_t *crypt, bool enable) {
166166
return true;
167167
}
168168

169+
bool mongocrypt_setopt_enable_multiple_collinfo(mongocrypt_t *crypt) {
170+
ASSERT_MONGOCRYPT_PARAM_UNINIT(crypt);
171+
crypt->multiple_collinfo_enabled = true;
172+
return true;
173+
}
174+
169175
bool mongocrypt_setopt_kms_provider_aws(mongocrypt_t *crypt,
170176
const char *aws_access_key_id,
171177
int32_t aws_access_key_id_len,

src/mongocrypt.h

+11
Original file line numberDiff line numberDiff line change
@@ -322,6 +322,17 @@ bool mongocrypt_setopt_log_handler(mongocrypt_t *crypt, mongocrypt_log_fn_t log_
322322
MONGOCRYPT_EXPORT
323323
bool mongocrypt_setopt_retry_kms(mongocrypt_t *crypt, bool enable);
324324

325+
/**
326+
* Enable support for multiple collection schemas. Required to support $lookup.
327+
*
328+
* @param[in] crypt The @ref mongocrypt_t object.
329+
* @pre @ref mongocrypt_init has not been called on @p crypt.
330+
* @returns A boolean indicating success. If false, an error status is set.
331+
* Retrieve it with @ref mongocrypt_ctx_status
332+
*/
333+
MONGOCRYPT_EXPORT
334+
bool mongocrypt_setopt_enable_multiple_collinfo(mongocrypt_t *crypt);
335+
325336
/**
326337
* Configure an AWS KMS provider on the @ref mongocrypt_t object.
327338
*

0 commit comments

Comments
 (0)