Skip to content

fix: Make non-embedded module clear their locals before deallocation #3798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions include/pybind11/detail/internals.h
Original file line number Diff line number Diff line change
Expand Up @@ -510,11 +510,19 @@ struct local_internals {
};

/// Works like `get_internals`, but for things which are locally registered.
/// There is one variable storing the local internals for each module, but every embedded module
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unclear if I understand this correctly. I'm wondering is

but all embedded modules share the same locals.

correct? If not I'm confused, if yes that would be my suggested wording.

Copy link
Contributor Author

@StarQTius StarQTius Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry if it does not sound clear, English is not my native language. What I meant is that since the static local variable of get_local_internals does not get dynamically linked when a non-embedded module is loaded by the interpreter, you end up with one definition of the locals for the interpreter, and one definition for each non-embedded modules.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in that case my suggested wording is clearer. But I'm not a native speaker, too!
Let's see what @Skylion007 thinks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm having serious 2nd thoughts.

E.g. at Google, Python and all extension modules for a given binary (or test) are linked together statically. Each binary or test is hermetic.

Others may use RTLD_GLOBAL. Probably rare, but maybe in certain environment that's what's needed?

What does this PR mean in those situations? I'm not sure. Does someone else have a clear understanding of how the m_clear functionality plays out across all environments?

Guessing a lot: I think to be completely safe, we'd need some kind of lookup table, to keep track of which module_def pointers are associated with a given local_internals pointer, and if more than one, we cannot safely clear_local_internals until the last one disappears.

Or we could have static std::unordered_map<module_def *, local_internals> with all the required logic to register & clear types given a module_def pointer, but search for types over all keys of the map.

It seems pretty tricky to get the m_clear functionality correct in all environments.

Given those uncertainties, and that pybind11 existed to this point without using m_clear, maybe this should be an opt-in, e.g. via some #define, or runtime option? That would be a safer and less sudden way to find out what's possible and what not.

/// shares the same locals.
inline local_internals &get_local_internals() {
static local_internals locals;
return locals;
}

/// Clears the locally registered types and exception translators.
inline void clear_local_internals() {
detail::get_local_internals().registered_types_cpp.clear();
detail::get_local_internals().registered_exception_translators.clear();
}

/// Constructs a std::string with the given arguments, stores it in `internals`, and returns its
/// `c_str()`. Such strings objects have a long storage duration -- the internal strings are only
/// cleared when the program exits or after interpreter shutdown (when embedding), and so are
Expand Down
10 changes: 6 additions & 4 deletions include/pybind11/embed.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,11 @@
static ::pybind11::module_::module_def PYBIND11_CONCAT(pybind11_module_def_, name); \
static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \
static PyObject PYBIND11_CONCAT(*pybind11_init_wrapper_, name)() { \
auto m = ::pybind11::module_::create_extension_module( \
PYBIND11_TOSTRING(name), nullptr, &PYBIND11_CONCAT(pybind11_module_def_, name)); \
auto m = ::pybind11::detail::create_extension_module_impl( \
PYBIND11_TOSTRING(name), \
nullptr, \
&PYBIND11_CONCAT(pybind11_module_def_, name), \
true); \
try { \
PYBIND11_CONCAT(pybind11_init_, name)(m); \
return m.ptr(); \
Expand Down Expand Up @@ -200,8 +203,7 @@ inline void finalize_interpreter() {
}
// Local internals contains data managed by the current interpreter, so we must clear them to
// avoid undefined behaviors when initializing another interpreter
detail::get_local_internals().registered_types_cpp.clear();
detail::get_local_internals().registered_exception_translators.clear();
detail::clear_local_internals();

Py_Finalize();

Expand Down
67 changes: 44 additions & 23 deletions include/pybind11/pybind11.h
Original file line number Diff line number Diff line change
Expand Up @@ -1127,6 +1127,14 @@ class cpp_function : public function {
}
};

class module_;

PYBIND11_NAMESPACE_BEGIN(detail)

module_ create_extension_module_impl(const char *, const char *, PyModuleDef *, bool);

PYBIND11_NAMESPACE_END(detail)

/// Wrapper for Python extension modules
class module_ : public object {
public:
Expand Down Expand Up @@ -1221,32 +1229,45 @@ class module_ : public object {
``def`` should point to a statically allocated module_def.
\endrst */
static module_ create_extension_module(const char *name, const char *doc, module_def *def) {
// module_def is PyModuleDef
// Placement new (not an allocation).
def = new (def)
PyModuleDef{/* m_base */ PyModuleDef_HEAD_INIT,
/* m_name */ name,
/* m_doc */ options::show_user_defined_docstrings() ? doc : nullptr,
/* m_size */ -1,
/* m_methods */ nullptr,
/* m_slots */ nullptr,
/* m_traverse */ nullptr,
/* m_clear */ nullptr,
/* m_free */ nullptr};
auto *m = PyModule_Create(def);
if (m == nullptr) {
if (PyErr_Occurred()) {
throw error_already_set();
}
pybind11_fail("Internal error in module_::create_extension_module()");
}
// TODO: Should be reinterpret_steal for Python 3, but Python also steals it again when
// returned from PyInit_...
// For Python 2, reinterpret_borrow was correct.
return reinterpret_borrow<module_>(m);
return detail::create_extension_module_impl(name, doc, def, false);
}
};

PYBIND11_NAMESPACE_BEGIN(detail)

// Initializes a 'PyModuleDef' instance and create an extension module out of it.
inline module_ create_extension_module_impl(const char *name,
const char *doc,
PyModuleDef *def,
bool is_embedded) {
// module_def is PyModuleDef
// Placement new (not an allocation).
def = new (def) PyModuleDef{
/* m_base */ PyModuleDef_HEAD_INIT,
/* m_name */ name,
/* m_doc */ options::show_user_defined_docstrings() ? doc : nullptr,
/* m_size */ -1,
/* m_methods */ nullptr,
/* m_slots */ nullptr,
/* m_traverse */ nullptr,
/* m_clear */ nullptr,
// Non-embedded modules must clean their own locals since they can only be accessed by them
/* m_free */ is_embedded ? nullptr : +[](void *) { detail::clear_local_internals(); }};
auto *m = PyModule_Create(def);
if (m == nullptr) {
if (PyErr_Occurred()) {
throw error_already_set();
}
pybind11_fail("Internal error in module_::create_extension_module_impl()");
}
// TODO: Should be reinterpret_steal for Python 3, but Python also steals it again when
// returned from PyInit_...
// For Python 2, reinterpret_borrow was correct.
return reinterpret_borrow<module_>(m);
}

PYBIND11_NAMESPACE_END(detail)

// When inside a namespace (or anywhere as long as it's not the first item on a line),
// C++20 allows "module" to be used. This is provided for backward compatibility, and for
// simplicity, if someone wants to use py::module for example, that is perfectly safe.
Expand Down
7 changes: 7 additions & 0 deletions tests/test_embed/external_module.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,15 @@ PYBIND11_MODULE(external_module, m) {
int v;
};

class B {};

py::class_<A>(m, "A").def(py::init<int>()).def_readwrite("value", &A::v);

// PR #3798 : Local internals must be cleared on finalization so they can be registered again.
// This class is not explicitly used in the test, but it is still registered / unregistered
// when the interpreter stops and restarts.
py::class_<B>(m, "B", py::module_local());

m.def("internals_at",
[]() { return reinterpret_cast<uintptr_t>(&py::detail::get_internals()); });
}
1 change: 1 addition & 0 deletions tests/test_embed/test_interpreter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,7 @@ TEST_CASE("Restart the interpreter") {
== py::module_::import("external_module").attr("internals_at")().cast<uintptr_t>());

// Restart the interpreter.
// PR #3798 : Local internals must be cleared on finalization so they can be registered again.
py::finalize_interpreter();
REQUIRE(Py_IsInitialized() == 0);

Expand Down