Skip to content

Do not set docstring for function when it's empty #2745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 28, 2020

Conversation

lqf96
Copy link
Contributor

@lqf96 lqf96 commented Dec 25, 2020

Description

In pybind11, automatic docstring generated from function signature and user-provided documents can be suppressed by disabling all docstring-related options. However, this currently produces an empty docstring, which is unnecessary and interferes with situations where users want to manually add docstring to the bound function later directly through the Python C API (for example, PyTorch currently does this to its manual C bindings and I find it difficult to extend this to work with pybind11-bound functions without touching pybind11's implementation details). This pull request modifies this behavior, such that when all docstring-related options are disabled and the resulting docstring is empty, no docstring is set on the function.

Suggested changelog entry:

* Leave docstring unset when all docstring-related options are disabled, rather than set an empty string.

@lqf96 lqf96 force-pushed the func-docstring-none branch from cf33e61 to abe59a9 Compare December 25, 2020 09:09
Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interferes with situations where users want to manually add docstring to the bound function later directly through the Python C API

Could you elaborate on this? How exactly does it interfere?

Just curious to know what you mean. Apart from that, obviously, I'm not against not doing work when it's not necessary, so this looks good to me :-)

I find it difficult to extend this to work with pybind11-bound functions without touching pybind11's implementation details

Agreed. I hope that on the medium/long-term, pybind11 would allow a more fine-grained control over customizing this.

@lqf96
Copy link
Contributor Author

lqf96 commented Dec 26, 2020

interferes with situations where users want to manually add docstring to the bound function later directly through the Python C API

Could you elaborate on this? How exactly does it interfere?

Just curious to know what you mean. Apart from that, obviously, I'm not against not doing work when it's not necessary, so this looks good to me :-)

I have been working on a series of pull requests for PyTorch and one of them (pytorch/pytorch#49771) is replacing legacy manual bindings of torch.Generator (PyTorch's equivalent of random.Random and numpy.random.RandomState) with pybind11 bindings. The legacy bindings does not have any docstrings associated with them when they are created using the Python C API. Rather, it is added later through an internal function (torch._C._add_docstr), which sets the docstring for binding functions or properties from the C side. In this way all docstrings can be kept in a separated Python file and added to the APIs during initialization.

Previously, torch._C._add_docstr only handles manual bindings. When I'm trying to extend this function to work with pybind11 bindings, I find out that even if all docstring options are disabled, the resulting instancemethod would have an empty docstring. To replace these docstrings with PyTorch ones, I need to first free the existing ml_doc are then assign it with a copy of the PyTorch docstring created by strdup, otherwise there will be memory leaks. However, the way ml_doc is allocated and released by pybind11 is clearly implementation detail, and by doing so I'm utilizing this implementation detail which is not a good thing.

With this pull request, the ml_doc will be empty and I no longer need to free the existing docstring. However, to set the ml_doc field I still need to strdup PyTorch-provided docstrings, so the problem of using implementation detail is not totally resolved. I plan to come up with another PR to acknowledge this. I also feel that even without all the context, this is still a natural thing to do because in pure Python if you don't set the docstring of a function, the __doc__ field will be None, and this PR matches that behavior exactly.

@YannickJadoul
Copy link
Collaborator

That's very elaborate, thanks! :-)

I need to first free the existing ml_doc are then assign it with a copy of the PyTorch docstring created by strdup, otherwise there will be memory leaks.

And yep, I had completely forgotten about this! Ofc, you could set the __doc__ Python attribute (which is what I thought you were doing), but that would probably be a lot less efficient :-)

However, the way ml_doc is allocated and released by pybind11 is clearly implementation detail, and by doing so I'm utilizing this implementation detail which is not a good thing.

Yep, though expecting that pybind11 will create PyCFunctions might also be considered an implementation detail? One of the roads to fixing #2722 might involve steering away from those; but nothing's been decided or even seriously tried, though.

@lqf96
Copy link
Contributor Author

lqf96 commented Dec 26, 2020

And yep, I had completely forgotten about this! Ofc, you could set the __doc__ Python attribute (which is what I thought you were doing), but that would probably be a lot less efficient :-)

Will this is actually impossible :( Trying to set the __doc__ attribute will result in an error saying that __doc__ is readonly for instancemethods.

BTW I just pushed another commit to remove the redundant pointer check logic. Now it looks cleaner :)

@YannickJadoul
Copy link
Collaborator

And yep, I had completely forgotten about this! Ofc, you could set the __doc__ Python attribute (which is what I thought you were doing), but that would probably be a lot less efficient :-)

Will this is actually impossible :( Trying to set the __doc__ attribute will result in an error saying that __doc__ is readonly for instancemethods.

Argh, yes, sorry. I should have known that. Those silly C-function wrappers.

BTW I just pushed another commit to remove the redundant pointer check logic. Now it looks cleaner :)

I saw; thanks! Looks good, now :-) I had already approved, but I'll invite a few others, to see if we can get one or a few more sanity checks :-)

std::free(const_cast<char *>(func->m_ml->ml_doc));
func->m_ml->ml_doc = strdup(signatures.c_str());
std::free(const_cast<char *>(func->m_ml->ml_doc));
func->m_ml->ml_doc = nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn’t this be in an else branch below the next line? Or maybe oven better, make it a ternary operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just made it a ternary operator... and I almost forget that ternary operator is lazily evaluated like the if statement in C++. Haven't been writing C++ for some time...

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, as-is or with the ternary operator as suggested by Henry. Thanks Qifan!

@lqf96
Copy link
Contributor Author

lqf96 commented Dec 27, 2020

Just pushed another commit to address @henryiii 's comment. If there's no more change needed I can squash the commits.

@YannickJadoul
Copy link
Collaborator

If there's no more change needed I can squash the commits.

We'll automatically squash when merging ;-)

@henryiii, if there's nothing left, go ahead and push that button.

@henryiii henryiii merged commit d587a2f into pybind:master Dec 28, 2020
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 28, 2020
@henryiii
Copy link
Collaborator

We only need nice history if there's a reason to keep more than one commit for a PR, like a large one or one where a file changed names and also was edited. Thanks!

@lqf96 lqf96 deleted the func-docstring-none branch December 28, 2020 04:10
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants