Skip to content

Call PySys_SetArgv when initializing interpreter. #2341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Aug 26, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
1801811
Call PySys_SetArgv when initializing interpreter.
drmoose Jul 29, 2020
c3e96c2
Document argc/argv parameters in initialize_interpreter.
drmoose Jul 30, 2020
5145e58
Remove manual memory management from set_interpreter_argv in favor of…
drmoose Jul 30, 2020
62243f2
Use size_t for indexers in set_interpreter_argv.
drmoose Jul 30, 2020
32c1445
Minimize macros for flow control in set_interpreter_argv.
drmoose Jul 30, 2020
3ff967d
Fix 'unused variable' warning on Py2
drmoose Jul 30, 2020
27ad85e
whitespace
drmoose Jul 30, 2020
fb3de9f
Define wide_char_arg_deleter outside set_interpreter_argv.
bstaletic Aug 25, 2020
3b8438e
Do sys.path workaround in C++ rather than eval.
bstaletic Aug 26, 2020
0566160
Factor out wchar conversion to a separate function.
bstaletic Aug 28, 2020
74243c5
Restore widened_argv variable declaration.
drmoose Aug 28, 2020
c55ab94
Fix undeclared widened_arg variable on some paths.
drmoose Aug 28, 2020
fe38a24
Use delete[] to match new wchar_t[].
drmoose Aug 28, 2020
3aa548f
Fix compiler errors
drmoose Aug 28, 2020
1d7f2b3
Use PY_VERSION_HEX for a cleaner CVE-2008-5983 mode check.
drmoose Aug 28, 2020
d5c1df9
Fix typo
drmoose Aug 28, 2020
aa3b8b8
Use explicit type for deleter so delete[] works cross-compiler.
drmoose Aug 28, 2020
0627b12
Always use PySys_SetArgvEx because pybind11 doesn't support pythons t…
drmoose Aug 28, 2020
5c38bc5
Remove pointless ternary operator.
drmoose Aug 28, 2020
8358be4
Use unique_ptr.reset instead of a second initialization.
drmoose Aug 28, 2020
8d64831
Rename add_program_dir_to_path parameter to clarify intent.
drmoose Aug 31, 2020
3ad3c6d
Merge remote-tracking branch 'upstream/master' into set_interpreter_argv
drmoose Oct 10, 2020
abc7b38
Add defined() check before evaluating HAVE_BROKEN_MBSTOWCS.
drmoose Oct 10, 2020
65881fe
Merge branch 'master' into set_interpreter_argv
Skylion007 Aug 9, 2021
7763c78
Apply clang-tidy fixes
Skylion007 Aug 12, 2021
6ddc929
Merge branch 'master' of https://github.com/pybind/pybind11 into set_…
Skylion007 Aug 16, 2021
ff2976d
Pre-commit
Skylion007 Aug 16, 2021
912e1c9
refactor: use const for set_interpreter_argv
henryiii Aug 21, 2021
52d88c3
Merge branch 'master' of https://github.com/pybind/pybind11 into set_…
Skylion007 Aug 21, 2021
1ede69e
Try to fix const issue and allocate vector properly
Skylion007 Aug 21, 2021
318495e
fix: copy strings on Python 2
henryiii Aug 21, 2021
96cb69f
Merge branch 'master' into set_interpreter_argv
rwgk Aug 26, 2021
e5c3305
Applying clang-format-diff relative to master.
rwgk Aug 26, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
118 changes: 110 additions & 8 deletions include/pybind11/embed.h
Original file line number Diff line number Diff line change
Expand Up @@ -87,29 +87,126 @@ struct embedded_module {
}
};

/// Python 2.x/3.x-compatible version of `PySys_SetArgv`
inline void set_interpreter_argv(int argc, char** argv, bool add_current_dir_to_path) {
// Before it was special-cased in python 3.8, passing an empty or null argv
// caused a segfault, so we have to reimplement the special case ourselves.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: shouldn't we add an debug assert that argc >= 0? We are casting a signed int to an unsigned size_t

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's covered by the argc <= 0 below, combined with argc = 1.

char** safe_argv = argv;
if (nullptr == argv || argc <= 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (nullptr == argv || argc <= 0) {
if (argv == nullptr || argc <= 0) {

I prefer the variable first (such as in the second half of this line, in fact).

safe_argv = new char*[1];
if (nullptr == safe_argv) return;
safe_argv[0] = new char[1];
if (nullptr == safe_argv[0]) {
delete[] safe_argv;
return;
}
safe_argv[0][0] = '\0';
Copy link
Collaborator

@henryiii henryiii Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function really modify the char** it is passed? Here it's modifying a new one, for example. It doesn't seem directly obvious why this is not const. I found const char* const* preferable in some cases in CLI11, IIRC.

You can always be more strict; if you have a char**, you can call a const char* const* on it. But you can't (without feelling really bad about yourself about casting, anyway) go the other way.

argc = 1;
}
#if PY_MAJOR_VERSION >= 3
// SetArgv* on python 3 takes wchar_t, so we have to convert.
wchar_t** widened_argv = new wchar_t*[static_cast<unsigned>(argc)];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why static_cast<unsigned>? Is it to avoid sign conversion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Although I couldn't reproduce it on any of my local compilers, github's clang gave an error there. argc is guaranteed to be positive by the if guard above, so the cast is safe.

for (int ii = 0; ii < argc; ++ii) {
# if PY_MINOR_VERSION >= 5
// From Python 3.5 onwards, we're supposed to use Py_DecodeLocale to
// generate the wchar_t version of argv.
widened_argv[ii] = Py_DecodeLocale(safe_argv[ii], nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're trying to avoid sign conversions, ii should be unsigned as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but that would make the loop condition a signed comparison error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be cleaner if I did

size_t argc_unsigned = static_cast<unsigned>(argc)

at the top of the function and then just used size_ts throughout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 ...nope. If I do that I get narrowing conversion errors on the PySys_SetArgv calls.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the impossible things to get completely right without using some casts. The array indices are size_t, yet argv is an int.

I'm not going to nag about this, but there is some inconsistency in this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved slightly (ii is now size_t) but still a little awkward.

# define FREE_WIDENED_ARG(X) PyMem_RawFree(X)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be PyMem_RawFree() or PyMem_Free()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_RawFree() was recommended specifically by the Py_DecodeLocale docs

# else
// Before Python 3.5, we're stuck with mbstowcs, which may or may not
// actually work. Mercifully, pyconfig.h provides this define:
# ifdef HAVE_BROKEN_MBSTOWCS
size_t count = strlen(safe_argv[ii]);
# else
size_t count = mbstowcs(nullptr, safe_argv[ii], 0);
# endif
widened_argv[ii] = nullptr;
if (count != static_cast<size_t>(-1)) {
widened_argv[ii] = new wchar_t[count + 1];
mbstowcs(widened_argv[ii], safe_argv[ii], count + 1);
}
# define FREE_WIDENED_ARG(X) delete[] X
# endif
if (nullptr == widened_argv[ii]) {
// Either we ran out of memory or had a unicode encoding issue.
// Free what we've encoded so far and bail.
for (--ii; ii >= 0; --ii)
FREE_WIDENED_ARG(widened_argv[ii]);
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we raise an exception here, instead of silently doing nothing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think silently doing nothing is actually OK here. Consider:

  • All this means is that PySys_SetArgvEx won't be called, but current pybind doesn't call it, so probably 95% of users won't notice or care.
  • Assuming locales are configured correctly on your machine, you are probably out of memory. As such, your exception handler is not likely to work.
  • Assuming locales are configured incorrectly, a failure here will be harder to debug than the inevitable failures in the user's python code.

That said, I'm not married to the decision. It just seemed like a reasonable fallback to revert to the old behavior.

}
}

# if PY_MINOR_VERSION < 1 || (PY_MINOR_VERSION == 1 && PY_MICRO_VERSION < 3)
# define NEED_PYRUN_TO_SANITIZE_PATH 1
// don't have SetArgvEx yet
PySys_SetArgv(argc, widened_argv);
# else
PySys_SetArgvEx(argc, widened_argv, add_current_dir_to_path ? 1 : 0);
# endif

// PySys_SetArgv makes new PyUnicode objects so we can clean up this memory
if (nullptr != widened_argv) {
for (int ii = 0; ii < argc; ++ii)
if (nullptr != widened_argv[ii])
FREE_WIDENED_ARG(widened_argv[ii]);
delete[] widened_argv;
}
# undef FREE_WIDENED_ARG
#else
// python 2.x
# if PY_MINOR_VERSION < 6 || (PY_MINOR_VERSION == 6 && PY_MICRO_VERSION < 6)
# define NEED_PYRUN_TO_SANITIZE_PATH 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a macro?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. My worry was that if I made it a bool, a smart optimizer would be able to see that the PyRun_SimpleString call is unreachable code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really, really don't like using macros for flow control.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed how this code is gated in the rewrite; it's still gated by the PY_*_VERSION macros (because I'm not sure how to not do that), but I avoid the #define...#undef, which I agree was hideous.

// don't have SetArgvEx yet
PySys_SetArgv(argc, safe_argv);
# else
PySys_SetArgvEx(argc, safe_argv, add_current_dir_to_path ? 1 : 0);
# endif
#endif

#ifdef NEED_PYRUN_TO_SANITIZE_PATH
# undef NEED_PYRUN_TO_SANITIZE_PATH
if (!add_current_dir_to_path)
PyRun_SimpleString("import sys; sys.path.pop(0)\n");
#endif

// if we allocated new memory to make safe_argv, we need to free it
if (safe_argv != argv) {
delete[] safe_argv[0];
delete[] safe_argv;
}
}

PYBIND11_NAMESPACE_END(detail)

/** \rst
Initialize the Python interpreter. No other pybind11 or CPython API functions can be
called before this is done; with the exception of `PYBIND11_EMBEDDED_MODULE`. The
optional parameter can be used to skip the registration of signal handlers (see the
`Python documentation`_ for details). Calling this function again after the interpreter
has already been initialized is a fatal error.
optional `init_signal_handlers` parameter can be used to skip the registration of
signal handlers (see the `Python documentation`_ for details). Calling this function
again after the interpreter has already been initialized is a fatal error.

If initializing the Python interpreter fails, then the program is terminated. (This
is controlled by the CPython runtime and is an exception to pybind11's normal behavior
of throwing exceptions on errors.)

The remaining optional parameters, `argc`, `argv`, and `add_current_dir_to_path` are
used to populate ``sys.argv`` and ``sys.path``.
See the |PySys_SetArgvEx documentation|_ for details.

.. _Python documentation: https://docs.python.org/3/c-api/init.html#c.Py_InitializeEx
.. |PySys_SetArgvEx documentation| replace:: ``PySys_SetArgvEx`` documentation
.. _PySys_SetArgvEx documentation: https://docs.python.org/3/c-api/init.html#c.PySys_SetArgvEx
\endrst */
inline void initialize_interpreter(bool init_signal_handlers = true) {
inline void initialize_interpreter(bool init_signal_handlers = true,
int argc = 0,
char** argv = nullptr,
bool add_current_dir_to_path = true) {
if (Py_IsInitialized())
pybind11_fail("The interpreter is already running");

Py_InitializeEx(init_signal_handlers ? 1 : 0);

// Make .py files in the working directory available by default
module::import("sys").attr("path").cast<list>().append(".");
detail::set_interpreter_argv(argc, argv, add_current_dir_to_path);
}

/** \rst
Expand Down Expand Up @@ -171,6 +268,8 @@ inline void finalize_interpreter() {
Scope guard version of `initialize_interpreter` and `finalize_interpreter`.
This a move-only guard and only a single instance can exist.

See `initialize_interpreter` for a discussion of its constructor arguments.

.. code-block:: cpp

#include <pybind11/embed.h>
Expand All @@ -182,8 +281,11 @@ inline void finalize_interpreter() {
\endrst */
class scoped_interpreter {
public:
scoped_interpreter(bool init_signal_handlers = true) {
initialize_interpreter(init_signal_handlers);
scoped_interpreter(bool init_signal_handlers = true,
int argc = 0,
char** argv = nullptr,
bool add_current_dir_to_path = true) {
initialize_interpreter(init_signal_handlers, argc, argv, add_current_dir_to_path);
}

scoped_interpreter(const scoped_interpreter &) = delete;
Expand Down
24 changes: 24 additions & 0 deletions tests/test_embed/test_interpreter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ class Widget {

std::string the_message() const { return message; }
virtual int the_answer() const = 0;
virtual std::string argv0() const = 0;

private:
std::string message;
Expand All @@ -31,6 +32,7 @@ class PyWidget final : public Widget {
using Widget::Widget;

int the_answer() const override { PYBIND11_OVERLOAD_PURE(int, Widget, the_answer); }
std::string argv0() const override { PYBIND11_OVERLOAD_PURE(std::string, Widget, argv0); }
};

PYBIND11_EMBEDDED_MODULE(widget_module, m) {
Expand Down Expand Up @@ -282,3 +284,25 @@ TEST_CASE("Reload module from file") {
result = module.attr("test")().cast<int>();
REQUIRE(result == 2);
}

TEST_CASE("sys.argv gets initialized properly") {
py::finalize_interpreter();
{
py::scoped_interpreter default_scope;
auto module = py::module::import("test_interpreter");
auto py_widget = module.attr("DerivedWidget")("The question");
const auto &cpp_widget = py_widget.cast<const Widget &>();
REQUIRE(cpp_widget.argv0() == "");
}

{
char* argv[] = { strdup("a.out") };
py::scoped_interpreter argv_scope(true, 1, argv);
free(argv[0]);
auto module = py::module::import("test_interpreter");
auto py_widget = module.attr("DerivedWidget")("The question");
const auto &cpp_widget = py_widget.cast<const Widget &>();
REQUIRE(cpp_widget.argv0() == "a.out");
}
py::initialize_interpreter();
}
4 changes: 4 additions & 0 deletions tests/test_embed/test_interpreter.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# -*- coding: utf-8 -*-
from widget_module import Widget
import sys


class DerivedWidget(Widget):
Expand All @@ -8,3 +9,6 @@ def __init__(self, message):

def the_answer(self):
return 42

def argv0(self):
return sys.argv[0]