marshmallow_dataclasses.class_schema() fails in multithreaded app due to thread-safety issue #282
Because forward references are allowed in type annotations. E.g.

    from __future__ import annotations
    from dataclasses import dataclass

    @dataclass
    class A:
        b: B

    # Cannot construct schema for A yet, because type B is not yet defined.

    @dataclass
    class B:
        x: int

I haven't looked at this in a while, but I vaguely recall that things can get even more twisted when types are defined in function-local scopes.
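The ordering problem can be reproduced with the standard library alone. The sketch below is only an illustration and does not use marshmallow_dataclass: it uses a string annotation in place of `from __future__ import annotations`, and `typing.get_type_hints` stands in for whatever resolution the schema builder performs internally (that mapping is an assumption).

```python
import typing
from dataclasses import dataclass


@dataclass
class A:
    b: "B"  # forward reference: B does not exist yet


# Resolving A's annotations now fails because the name B is undefined.
try:
    typing.get_type_hints(A, globals(), locals())
    resolved_before_b = True
except NameError:
    resolved_before_b = False


@dataclass
class B:
    x: int


# Once B is defined, the same call succeeds.
hints_after_b = typing.get_type_hints(A, globals(), locals())
```

Anything that needs the real type of `A.b` has to wait until `B` exists, which is why the schema cannot be built at the point where `A` is decorated.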
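Again, I haven't looked at this for a bit, so I may be mistaken, but my gut response is: Constructing Marshmallow schemas is not thread-safe. The metaclass for marshmallow.Schema invokes class_registry.register; therefore, defining subclasses of marshmallow.Schema is not thread-safe. (That may be what you are already saying.)
That being the case, my first reaction is that we (marshmallow_dataclass) should not be expected to go to any pains to make class_schema() (our construction process for creating Marshmallow schema classes) any more thread-safe than Marshmallow is. If creating schema classes in Marshmallow is not thread-safe, it should not be surprising that using marshmallow_dataclass to construct Marshmallow schema classes is similarly un-thread-safe. (Creating instances of Marshmallow schema classes is thread-safe. At least, it should be.)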
Thank you for following up, Jeff. One part of the challenge is that none of
this is documented as being not thread-safe, so we stumble on the problem
if we’re lucky enough to catch it during dev and testing, and spend
non-trivial time troubleshooting it.
The other part is that there is no recommended workaround. What would you
recommend as a safe way to use marshmallow data class in a multithreaded
environment?
Finally, you mention elsewhere that one needs to access .Schema attribute
of the data class in order to avoid memory leaks. Is calling
marshmallow_dataclass.class_schema() on the dataclass equivalent to
accessing its .Schema attribute for this purpose?
Many thanks,
Vitaly
…On Mon, May 5, 2025 at 2:48 AM Jeff Dairiki ***@***.***> wrote:
*dairiki* left a comment (lovasoa/marshmallow_dataclass#282)
<#282 (comment)>
It turns out that class_schema() may invoke
marshmallow.class_registry.register(), which is not thread-safe.
Again, I haven't looked at this for a bit, so I may be mistaken, but my
gut response is:
Constructing Marshmallow schemas is not thread-safe. The metaclass for
marshmallow.Schema invokes class_registry.register; therefore, defining
subclasses of marshmallow.Schema is not thread-safe. (That may be what
you are already saying.)
That being the case, my first reaction is that we (marshmallow_dataclass)
should not be expected to go to any pains to make class_schema() (our
construction process for creating Marshmallow schema classes) any more
thread-safe than Marshmallow is. If creating schema classes in Marshmallow
is not thread-safe, it should not be surprising that using
marshmallow_dataclass to construct Marshmallow schema classes is
similarly un-thread-safe. (Creating *instances* of Marshmallow schema
classes is thread-safe. At least, it should be.)
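One way to act on that distinction (class construction is not thread-safe, instance creation should be) is to funnel schema-class construction through a single lock and cache the result. The following is a minimal sketch under stated assumptions, not a recommendation from the maintainers: `get_schema`, `_schema_lock`, and `_schema_cache` are hypothetical names, and `build_schema` is a stand-in for the real call to `marshmallow_dataclass.class_schema`, so the example runs with the standard library only.

```python
import threading
from dataclasses import dataclass

_schema_lock = threading.Lock()
_schema_cache = {}
build_calls = []  # records how often the factory actually runs


def build_schema(cls):
    """Hypothetical stand-in for marshmallow_dataclass.class_schema(cls)."""
    build_calls.append(cls)
    return type(cls.__name__ + "Schema", (), {})


def get_schema(cls):
    """Return the cached schema class for cls, building it at most once."""
    with _schema_lock:
        schema = _schema_cache.get(cls)
        if schema is None:
            schema = _schema_cache[cls] = build_schema(cls)
        return schema


@dataclass
class Point:
    x: int
    y: int


results = []


def worker():
    results.append(get_schema(Point))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note the limit of this approach: the lock only serializes callers that go through `get_schema`. Any other code that defines Marshmallow schema classes concurrently would still bypass it.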
@dairiki I'm only becoming familiar with this issue now, but I think the Marshmallow maintainers recently (~5 months ago) might be making progress towards fixing this in their Many issues involving non-thread-safety of nested Schema definitions have been closed after they merged this PR, so I'm hopeful the context object is used by the class_registry and that would mean this change would help the class_registry become thread-safe somehow?
I came across a marshmallow-dataclass-related race condition in my new code that was resulting in the marshmallow exception when using class_schema() while marshalling and unmarshalling dataclasses from threads of a multi-threaded application:

I root caused the exception to the following. My code was invoking marshmallow_dataclass.class_schema() from threads:

It turns out that class_schema() may invoke marshmallow.class_registry.register(), which is not thread-safe. It performs operations on a global dict, and marshmallow.class_registry.register() is in excess of a single GIL-protected byte code.

I have not yet come up with a simple test case, but it's easy to see via simple code inspection that marshmallow.class_registry.register() (and class_schema() as a result) is not thread-safe: https://github.com/marshmallow-code/marshmallow/blob/a651fbcb85f4a2fc1bbc9df168e66476854fe178/src/marshmallow/class_registry.py#L26-L69.

I was able to work around the race condition by forcing generation of the schema at import time by calling class_schema() on all of my dataclasses at the top level of the module. However, in a large code base with multiple contributors, it's nearly impossible to ensure that a workaround like that won't be missed, since some of the scenarios might not trigger the race condition in the test environment, but would show up in production later.

On a related note: .Schema() or class_schema()() is invoked? Presumably, if someone is using marshmallow_dataclass.dataclass, they will be marshalling and/or unmarshalling with that class, so pre-generating the schema wouldn't be a waste. And if they are not using it for marshalling/unmarshalling, then they should just be using the builtin dataclasses.dataclass, instead.
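To illustrate why a multi-step update of a shared dict is a problem, here is a deliberately simplified sketch. It is not marshmallow's actual register() code; `registry`, `check`, and `act` are hypothetical names. The two steps of a check-then-act update are split apart so that the interleaving a thread switch could produce is replayed deterministically, without real threads.

```python
# Simplified stand-in for a module-level registry; NOT marshmallow's code.
registry = {}


def check(name):
    """Step 1 of a check-then-act update: is the key missing?"""
    return name not in registry


def act(name, value, was_missing):
    """Step 2: write based on the (possibly stale) answer from step 1."""
    if was_missing:
        registry[name] = [value]  # overwrites whatever is there now
    else:
        registry[name].append(value)


# An interleaving that a thread switch between the two steps can produce:
a_missing = check("Foo")        # "thread A" sees the key as missing
b_missing = check("Foo")        # "thread B" also sees it as missing
act("Foo", "A.Foo", a_missing)  # A stores ["A.Foo"]
act("Foo", "B.Foo", b_missing)  # B overwrites it with ["B.Foo"]

lost_update = registry["Foo"]   # A's entry is gone
```

Each individual dict operation is safe under the GIL, but the decision made in step 1 can be stale by the time step 2 runs. Whether this particular interleaving is the one behind the exception reported above is not established here.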