Skip to content

PYTHON-5061 - Add an API to extend the bson TypeRegistry #2345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 21, 2025

Conversation

NoahStapp
Copy link
Contributor

No description provided.

@NoahStapp NoahStapp requested a review from blink1073 May 20, 2025 14:54
if isinstance(codec, TypeEncoder):
self._validate_type_encoder(codec)
is_valid_codec = True
self._encoder_map[codec.python_type] = codec.transform_python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there thread safety concerns here? What happens when add_codec is called when the registry is being used for bson encoding/decoding? What about with the C extensions and/or free threading mode?

Copy link
Contributor Author

@NoahStapp NoahStapp May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If add_codec is called while the registry is in use, the new codec added would only be available for encoding/decoding after the add_codec call completes. Yes, this is technically not thread-safe, but is extending a TypeRegistry while already encoding/decoding something we actually expect users to do?

If it is, avoiding that race condition would require us to detect if a TypeRegistry is already in use for encoding/decoding and block add_codec until the processing completes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we keep the Registry immutable. After all, CodecOptions is intended to be immmutable and the registry is part of CodecOptions. This new api should return a new registry instance and keep the old one unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason CodecOptions is intended to be immutable besides the race condition outlined above? Returning a new registry instance would then require users to create a new CodecOptions with that registry, along with all of the other settings they may have configured in the original. That seems significantly more cumbersome to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutating is problematic. Consider one section of your app is using CodecOptions, and then another unrelated thread changes some of the settings (eg via add_codec or some other option), that could violate the assumptions of the other code and lead to TypeErrors.

I think I would even prefer an api like this:

orig_codecs = coll.codec_options.type_registry.codecs
new_registry = TypeRegistry([new_codec]+orig_codecs)
new_coll = coll.with_options(codec_options=coll.codec_options.with_options(type_registry=new_registry))

All we need to add is a type_registry.codecs property. Sure this is cumbersome but it does the job, follows our existing conventions, and should be rarely used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're saying instead of adding an API to add codecs, just add an API to return all of the current codecs? Ignoring the double underscore on TypeRegistry.__type_codecs, doesn't that already exist and make this change unneeded in the first place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but __type_codecs is private. We need to expose it in a public api (as well as fallback_encoder).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the change is making those two attributes publicly available. I like that approach.

@NoahStapp
Copy link
Contributor Author

@ShaneHarvey made a good point that this can be done with fewer potentially far-reaching changes by simply making the following pattern available publicly:

orig_codecs = coll.codec_options.type_registry.codecs
fallback_encoder = coll.codec_options.type_registry.fallback_encoder
new_registry = TypeRegistry([new_codec]+orig_codecs, fallback_encoder)
new_coll = coll.with_options(codec_options=coll.codec_options.with_options(type_registry=new_registry))

--------------------------------------
PyMongo 4.14 brings a number of changes including:

- Added :meth:`bson.codec_options.TypeRegistry.codecs` and :meth:`bson.codec_options.TypeRegistry.fallback_encoder` properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShaneHarvey
ShaneHarvey previously approved these changes May 20, 2025
@NoahStapp NoahStapp merged commit 717fb47 into mongodb:master May 21, 2025
76 of 78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants