Skip to content

Massive memory leak since 8.5.5 / leaking the entire stack frame and all local variables everywhere #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
luk2302 opened this issue May 27, 2022 · 7 comments · Fixed by #258 · May be fixed by #214
Closed

Massive memory leak since 8.5.5 / leaking the entire stack frame and all local variables everywhere #198

luk2302 opened this issue May 27, 2022 · 7 comments · Fixed by #258 · May be fixed by #214

Comments

@luk2302
Copy link

luk2302 commented May 27, 2022

In #187 a change was implemented that included the current stack frame in an internal cache, therefore causing all local variables of the entire call stack to never be freed. Demo:

from dataclasses import dataclass
import marshmallow_dataclass
from marshmallow.schema import BaseSchema
from marshmallow_dataclass import _internal_class_schema


class Outer:
    def __del__(self):
        print("deleting outer...")


@dataclass(frozen=True)
class Sample:
    a: str
    def __del__(self):
        print("deleting sample...")


def inner_demo():
    schema = marshmallow_dataclass.class_schema(Sample, base_schema=BaseSchema)()
    data = schema.load({"a": "a string"})
    return data


def demo():
    d = inner_demo()
    o = Outer()


for i in range(10):
    demo()
    print(_internal_class_schema.cache_info())

print("done with demo, no object deleted so far")
_internal_class_schema.cache_clear()
print("cleared cache")

No __del__ is called on either Sample or Outer during the method invocations, only after the cache_clear can the local variables be garbage collected. If you run the demo with 8.4.4 or just replace _internal_class_schema(clazz, base_schema, clazz_frame) with _internal_class_schema(clazz, base_schema, None) in marshmallow_dataclass.class_schema. Additionally this also demonstrates that the caching does no longer work as there no cache hits at all anymore.

@luk2302 luk2302 changed the title Massive memory leak since 8.5.5 Massive memory leak since 8.5.5 / leaking the entire stack frame and all local variables everywhere Jun 10, 2022
dairiki added a commit to dairiki/marshmallow_dataclass that referenced this issue Sep 24, 2022
dairiki added a commit to dairiki/marshmallow_dataclass that referenced this issue Sep 24, 2022
dairiki added a commit to dairiki/marshmallow_dataclass that referenced this issue Sep 24, 2022
dairiki added a commit to dairiki/marshmallow_dataclass that referenced this issue Sep 25, 2022
dairiki added a commit to dairiki/marshmallow_dataclass that referenced this issue Jan 19, 2023
dairiki added a commit to dairiki/marshmallow_dataclass that referenced this issue Jan 19, 2023
mvanderlee pushed a commit to mvanderlee/marshmallow_dataclass that referenced this issue Mar 10, 2024
mvanderlee pushed a commit to mvanderlee/marshmallow_dataclass that referenced this issue Mar 10, 2024
@LostInDarkMath
Copy link

Hi,
I came across this issue because our software has a memory leak in production. After days of troubleshooting and narrowing down the problem, I came across this ticket.

Are there any plans to fix this in the near future?
And how can it be that such a critical bug remains unfixed for 2 years?

@lovasoa
Copy link
Owner

lovasoa commented Mar 19, 2024

I understand your frustration, but please refrain from "how can it be", when referring to free and open source software. Instead, you can offer time or money to free software projects to unblock situations like this.

dairiki added a commit that referenced this issue Mar 20, 2024
* Test for memory leaks as described in #198

* Possible fix for #198: memory leak

* Optimization: avoid holding frame reference when locals == globals

* Get caller frame at decoration-time

Here we are more careful about which caller's locals we use to
resolve forward type references.  We want the callers locals
at decoration-time — not at decorator-construction time.

Consider:
```py
frozen_dataclass = marshmallow_dataclass.dataclass(frozen=True)

def f():
    @custom_dataclass
    class A:
        b: "B"

    @custom_dataclass
    class B:
        x: int
```

The locals we want in this case are the one from where the
custom_dataclass decorator is called, not from where
marshmallow_dataclass.dataclass is called.

* Add ability to pass explicit localns (and globalns) to class_schema

When class_schema is called, it doesn't need the caller's whole stack
frame.  What it really wants is a `localns` to pass to
`typing.get_type_hints` to be used to resolve type references.

Here we add the ability to pass an explicit `localns` parameter to
`class_schema`.  We also add the ability to pass an explicit
`globalns`, because ... might as well — it might come in useful.
(Since we need these only to pass to `get_type_hints`, we might
as well match `get_type_hints` API as closely as possible.)

* test: check for frame leakage when decorators throw exceptions

* Fix mypy by setting python to the minimum supported version, 3.8

---------

Co-authored-by: Jeff Dairiki <[email protected]>
@dairiki
Copy link
Collaborator

dairiki commented Mar 20, 2024

This is (mostly) fixed by #258 (marshmallow_dataclass >= 8.6.1).

When necessary, references to frames are still held until the schema is constructed (by accessing the .Schema attribute). Memory leakage is still possible if large numbers of marshmallow_dataclasses are constructed without accessing their .Schema attribute.

@vitaly-krugl
Copy link

vitaly-krugl commented May 4, 2025

Hi @dairiki - I wanted to follow up regarding:

Memory leakage is still possible if large numbers of marshmallow_dataclasses are constructed without accessing their .Schema attribute.

Does this mean that if a large number of these dataclasses are constructed using the dataclass constructor directly (without unmarshalling from data), then we still have a memory leak situation in the latest version of marshmallow-dataclass?

If so, what is the point of deferring schema construction, if deferring leads to serious issues? The deferring is also responsible for a terrible race condition in multi-threaded applications - see #282. Can the marshmallow_dataclass.dataclass decorator construct the schema inline instead (without deferring), and put all of these issues to rest?

Also, in this context, I am assuming that calling class_schema() on the dataclass is equivalent to accessing its .Schema attribute. Is this correct?

If I have a marshmallow dataclass and schema generation defined at top level of module like below, do I still need to be concerned about memory leaks after your fix?

@marshmallow_dataclass.dataclass
class MyDataclass:
  s: str
  i: int

MyDataclassSchema = marshmallow_dataclass.class_schema(MyDataclass)()

@vkruglik-aka
Copy link

Hi @lovasoa, sorry to bother you - could you please chime in on my comment above - #198 (comment)? Many thanks in advance.

@lovasoa
Copy link
Owner

lovasoa commented May 26, 2025

Hi! I'm sorry, I don't use this package myself anymore and currently do not have a lot of bandwidth to maintain it. Happy to add new maintainers, though!

@vkruglik-aka
Copy link

vkruglik-aka commented May 27, 2025

Thank you for following up, @lovasoa. Has there been an attempt to get the core marshmallow project to take marshmallow-dataclass under its wing? They seem like a great pairing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants