Skip to content

bpo-33625: Release GIL for grp.getgr{nam,gid} and pwd.getpw{nam,uid} #7081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Sep 7, 2018

Conversation

william-gr
Copy link
Contributor

@william-gr william-gr commented May 23, 2018

Especially when using more complex nss modules the call might that an
unknown amount of time depending on the service in question.
This makes sures others threads are not blocked waiting the call to
finish.

https://bugs.python.org/issue33625

@tiran
Copy link
Member

tiran commented May 23, 2018

LGTM, thanks!

Please use the blurb tool to add a news entry.

@william-gr
Copy link
Contributor Author

@tiran NEWS added, let me know if thats not to your liking, its my first time doing this ; )

Thanks!

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about getpwall()? And is it worth to release GIL in the grp and the spwd modules?

@@ -0,0 +1,2 @@
Release GIL on `pwd.getpwnam` and `pwd.getpwuid`. Patch by William
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx complains about using the default role. Write as :func:`pwd.getpwnam` or ``pwd.getpwnam()``.

@william-gr
Copy link
Contributor Author

@serhiy-storchaka Regarding the getpwall() I thought about it but I wondered about the impact of releasing GIL on every loop iteration. I am not familar with cpython code, would that be bad?

Regarding the other modules, I would think so. Do you want me to do that in this same PR?

@serhiy-storchaka
Copy link
Member

You are right, and switching a thread between setpwent() and getpwent() is not safe. We would need to add a special global lock for getpwall(). Yet one example when GIL makes the code safer and simpler.

@tiran
Copy link
Member

tiran commented May 24, 2018

Can you either fix getgrgid and getgrnam in this PR or open up another PR. Don't bother about spwd. The module shouldn't be used any more.

@william-gr
Copy link
Contributor Author

@serhiy-storchaka You are absolutely right about re-entrant functions. However I am not sure whats the best way to check if they are available in the system in python project. Should I add a check in configure.ac for the existence of each of them? HAVE_GETPWNAM_R, HAVE_GETPWUID_R ?

@william-gr william-gr changed the title bpo-33625: Release GIL for getpwnam and getpwuid bpo-33625: Release GIL for grp.getgr(nam,gid} and pwd.getpw{nam,uid} May 24, 2018
@william-gr
Copy link
Contributor Author

I have changed grp module and used the re-entrant versions of them. Let me know what you think. Thanks!

@william-gr william-gr changed the title bpo-33625: Release GIL for grp.getgr(nam,gid} and pwd.getpw{nam,uid} bpo-33625: Release GIL for grp.getgr{nam,gid} and pwd.getpw{nam,uid} May 24, 2018
@william-gr
Copy link
Contributor Author

I am now handling ERANGE of _r functions, will update patch soon.

@william-gr william-gr force-pushed the pwd-release-gil branch 2 times, most recently from cc6368f to 1e68811 Compare May 24, 2018 15:16
@william-gr
Copy link
Contributor Author

Updated handling ERANGE, let me know how that looks and/or any suggestions/changes.

Thank you again.

@william-gr
Copy link
Contributor Author

Windows-PR seems to be failing but I dont know why, I cant find logs in the link. How should I proceed?

int status, bufsize = NSS_BUFLEN_GROUP;
struct group *grpbuf = NULL;

p = malloc(sizeof(struct group));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a constant size buffer. It is better to use a local variable for it.

struct group grp;
...
status = getgrgid_r(gid, &grp, buf, bufsize, &p);

struct group *grpbuf = NULL;

p = malloc(sizeof(struct group));
buf = malloc(bufsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use PyMem_RawMalloc().


do {
status = getgrgid_r(gid, p, buf, bufsize, &grpbuf);
if(grpbuf == NULL && status == ERANGE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a space between if and ( for conforming PEP 7.

buf = malloc(bufsize);

do {
status = getgrgid_r(gid, p, buf, bufsize, &grpbuf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use p instead of grpbuf.


do {
status = getgrgid_r(gid, p, buf, bufsize, &grpbuf);
if(grpbuf == NULL && status == ERANGE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is doubled in the while below. It would be better to write:

if (grpbuf != NULL || status != ERANGE) {
    break;
}

and use while (1).

do {
status = getgrgid_r(gid, p, buf, bufsize, &grpbuf);
if(grpbuf == NULL && status == ERANGE) {
free(buf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be more efficient to use realloc.

status = getgrgid_r(gid, p, buf, bufsize, &grpbuf);
if(grpbuf == NULL && status == ERANGE) {
free(buf);
if((bufsize << 1) > (NSS_BUFLEN_GROUP << 10)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this check for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make sure we are not allocating too big of a buffer (its a maximum cap)

@william-gr
Copy link
Contributor Author

@serhiy-storchaka thanks for the review. Let me know if I have addressed them to your liking.

@william-gr william-gr force-pushed the pwd-release-gil branch 3 times, most recently from 7702a5f to 84031ff Compare May 24, 2018 19:53
@william-gr
Copy link
Contributor Author

Is there a way I can trigger a new macOS build? The failure does not seem related to the commit.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New review. My comments apply to the all modified functions.

#endif
if (p == NULL) {
if (buf != NULL) {
PyMem_RawFree(buf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_RawFree(NULL) is valid (does nothing), the if() is useless.

break;
}
bufsize <<= 1;
p = PyMem_RawRealloc(buf, bufsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's strange to reuse the "struct group *p;" variable to reallocate the "char *buf = NULL;" variable: C types are not the same!? I wold prefer to see a different variable (ex: "buf2").


bufsize = sysconf(_SC_GETGR_R_SIZE_MAX);
if (bufsize == -1) {
bufsize = 1024;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to add a "#define DEFAULT_BUFFER_SIZE 1024" at top level, rather than using an hardcoded constant here?

if (p == NULL) {
if (nomem == 1) {
PyErr_NoMemory();
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: the PEP 7 requires

}
else {

buf = (char *) p;
}

if (status != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to move this just after the get...() call.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@william-gr
Copy link
Contributor Author

I have made the requested changes; please review again

Thank you

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Honestly, the new PR is way better than the first version! I was going to merge it, when I spotted another issue: you don't check if PyMem_Malloc() fails. IMHO it's not a good idea to call get*() functions with buf=NULL. Maybe the function fails, maybe you get a crash... I would prefer to avoid the risk of crash :-) I proposed to move code which allocates the memory to avoid redundancy and to make sure that get*() are never called with buf=NULL.

By the way, I also proposed to fix a mojibake (encoding) issue while we are on these functions.

Maybe the mojibake issue can be fixed in a separated PR, since I don't think that we are going to backport this one to 3.7 and older. So maybe wait until this PR is merged, and then write a second PR to fix the mojibake issue, and we can easily backport the second PR to all supported branches.

PyErr_NoMemory();
}
else {
PyErr_Format(PyExc_KeyError, "getgrnam(): name not found: %s", name_chars);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I see a bug which is not related to your change, but while we modify these functions, it would be nice to fix it.

name_chars is encoded to the filesystem encoding, whereas %s decodes it from UTF-8. If the filesystem encoding is not UTF-8, you get mojibake.

The fix is simple: use %S format and pass name: name is always a Unicode string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If name is always a Unicode string, you can use %U.

It may be worth to use %R for the case if the name contains invisible characters or trailing whitespaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
else {
PyErr_Format(PyExc_KeyError,
"getpwnam(): name not found: %s", name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: please use the %S format here, but if you modify this function, would you mind to also fix the name of the parameter? Rename "arg" to "name", as in Doc/library/pwd.rst. Maybe rename the char* name to name_chars, as in grp_getgrnam_impl().

if (bufsize == -1) {
bufsize = DEFAULT_BUFFER_SIZE;
}
buf = PyMem_RawMalloc(bufsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, wait, you don't check if PyMem_Malloc() failed here? I'm not sure that it's safe to call get*() functions with buf=NULL.

I have a proposition. Move the memory allocation at the start of the loop, and always use PyMem_Realloc(). PyMem_Realloc(buf, bufsize) behaves as PyMem_Malloc(). Pseudo-code:

/* PyMem_Malloc code removed from here */
while (1) {
        buf2 = PyMem_RawRealloc(buf, bufsize);
        if (buf2 == NULL) {
            nomem = 1;
            break;
        }
        buf = buf2; /* buf cannot be NULL from this point */
        (...)
    /* no more PyMem_Realloc code here neither */
}

(I don't propose to add these comments, it's just to explain my idea ;-))

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@william-gr
Copy link
Contributor Author

I have made the requested changes; please review again

Thanks for pointing that out and your patience with the reviews. I am glad you pointed the Malloc issue, I also found other problem with latest Realloc change I had made.

I will take your advise and create another PR for mojibake.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

@vstinner
Copy link
Member

vstinner commented Sep 7, 2018

I merged your PR, thanks!

Would you mind to write a second PR to fix the encoding issue that I spootted? I explained how to fix it (use %S format).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants