gh-59705: Add _thread.set_name() function #127338

Merged on Dec 6, 2024 (31 commits)

@vstinner
Member

vstinner commented Nov 27, 2024

On Linux, threading.Thread now passes the thread name to the operating system.

configure now checks whether the pthread_setname_np() function is available.

@vstinner
Member Author

This implementation is very basic on purpose. I plan to add support for more platforms in follow-up PRs.

  • On Linux, set_name() does nothing if the name is longer than 15 bytes. Should the function truncate silently to 15 bytes instead? I don't think that raising an exception is very convenient here.
  • Setting Thread.name after Thread.start() doesn't call set_name() again. set_name() is called only once per thread, at startup.
  • I didn't add automated tests since I don't want to add a get_name() function (use Thread.name to get a thread name).

Demo 1 (main thread):

$ ./python
>>> import os
>>> pid=os.getpid()
>>> with open(f"/proc/{pid}/task/{pid}/comm") as fp: print(f"comm = {fp.read()!r}")
... 
comm = 'python\n'

>>> import _thread; _thread.set_name("demo")
>>> with open(f"/proc/{pid}/task/{pid}/comm") as fp: print(f"comm = {fp.read()!r}")
... 
comm = 'demo\n'

Demo 2 (thread):

$ ./python
>>> import threading, os, time
>>> os.getpid()
81921
>>> t=threading.Thread(target=time.sleep, args=(60,), name="sleeper")
>>> t.start()
^Z

$ cat /proc/81921/task/81927/comm 
sleeper

See also a previous attempt to implement the feature: #14578

@vstinner
Member Author

> I didn't add automated tests since I don't want to add a get_name() function (use Thread.name to get a thread name).

I changed my mind and added a private _thread._get_name() function for tests.

@vstinner
Member Author

@pitrou @encukou @serhiy-storchaka: Would you mind reviewing this change? It makes threading.Thread pass the thread name to the operating system.

@vstinner
Member Author

> On Linux, set_name() does nothing if the name is longer than 15 bytes. Should the function truncate silently to 15 bytes instead? I don't think that raising an exception is very convenient here.

I modified _thread.set_name(name) to truncate name to 15 bytes on Linux.

Truncating the string in threading.Thread would be more complicated, since it requires encoding the string to the filesystem encoding, detecting the operating system (Linux), and hardcoding the 15-byte limit there. IMO it's more convenient to truncate in _thread.set_name().
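
The truncation described above can be sketched in pure Python. This is only an illustration, not the C code from the PR: `encode_thread_name` and `LINUX_NAME_MAXLEN` are hypothetical names, and the "replace" error handler is an assumption (the thread only raises it as a possibility).

```python
import sys

# Assumption: 15 bytes is the Linux limit quoted in this thread.
LINUX_NAME_MAXLEN = 15


def encode_thread_name(name: str, maxlen: int = LINUX_NAME_MAXLEN) -> bytes:
    """Encode name to the filesystem encoding, then keep at most maxlen bytes."""
    # Assumption: unencodable characters are replaced rather than rejected.
    encoded = name.encode(sys.getfilesystemencoding(), "replace")
    return encoded[:maxlen]


print(encode_thread_name("a-very-long-thread-name"))
```

Doing the cut after encoding is what makes the limit a byte limit: the caller never needs to know the encoding or the platform.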

Member

@encukou encukou left a comment

This looks great, thank you!

The truncation is not pretty with non-ASCII names. I guess codepoint-preserving truncation is not worth the effort, and Linux tools need to deal with thread names being arbitrary bytes.

But, we can test the edge cases, to ensure this quality-of-life enhancement doesn't start raising exceptions in working code.
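
To make the non-ASCII concern concrete, here is a small sketch comparing byte-wise truncation with a codepoint-preserving variant. Both helper names are hypothetical, and UTF-8 is hardcoded only to keep the example deterministic; the PR itself uses the filesystem encoding.

```python
def truncate_bytes(name: str, maxlen: int = 15) -> bytes:
    """Byte-wise truncation: may cut a multi-byte character in half."""
    return name.encode("utf-8")[:maxlen]


def truncate_codepoints(name: str, maxlen: int = 15) -> bytes:
    """Drop a trailing partial character so the result stays valid UTF-8."""
    cut = name.encode("utf-8")[:maxlen]
    # The input was valid UTF-8, so only the tail can be incomplete;
    # "ignore" discards that dangling lead byte.
    return cut.decode("utf-8", "ignore").encode("utf-8")


name = "é" * 8  # 16 bytes in UTF-8, one byte over the limit
print(truncate_bytes(name))
print(truncate_codepoints(name))
```

The first result ends with a lone lead byte, which is the "not pretty" case; the second gives up one more byte to stay decodable.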

@vstinner
Member Author

@encukou: I addressed your reviews. Please review the updated PR.

I added tests on long names and non-ASCII names.

@vstinner
Member Author

@encukou: Maybe the "replace" error handler can be used, instead of not setting the name if the name cannot be encoded to the filesystem encoding. What do you think?
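
The two behaviours being weighed here can be sketched as follows. ASCII stands in for a restrictive filesystem encoding, and both helper names are hypothetical.

```python
def encode_strict(name: str, encoding: str):
    """Return the encoded name, or None when it cannot be encoded."""
    try:
        return name.encode(encoding, "strict")
    except UnicodeEncodeError:
        return None  # the caller would then leave the thread name unset


def encode_replace(name: str, encoding: str) -> bytes:
    """Always return a name; unencodable characters become '?'."""
    return name.encode(encoding, "replace")


print(encode_strict("worker-€", "ascii"))
print(encode_replace("worker-€", "ascii"))
```

With "replace" the thread always gets some name, at the cost of a lossy one.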

@serhiy-storchaka
Member

You can use FS_NONASCII. You can also use TESTFN_UNDECODABLE to test that it works with arbitrary bytes and TESTFN_UNENCODABLE to test for encoding error.

Is it a hard limit for the size? Is it the same on other platforms? I would prefer to use a named constant instead of magic numbers 15, 16, 17.

@vstinner
Member Author

@serhiy-storchaka:

> Is it a hard limit for the size?

Yes. Using a longer name fails with ERANGE.
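
One way to observe the ERANGE failure is to call pthread_setname_np() directly through ctypes. This is a rough, Linux-only sketch: `setname_too_long` is a hypothetical helper, and treating pthread_t as an unsigned long is an assumption about the platform ABI. The 16-byte name is rejected, so the current thread name is left untouched.

```python
import ctypes
import errno
import sys


def setname_too_long():
    """Return pthread_setname_np()'s result for a 16-byte name, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(None)
        setname = libc.pthread_setname_np
        pthread_self = libc.pthread_self
    except (OSError, AttributeError):
        return None
    # Assumption: pthread_t fits in an unsigned long on this platform.
    pthread_self.restype = ctypes.c_ulong
    setname.argtypes = [ctypes.c_ulong, ctypes.c_char_p]
    setname.restype = ctypes.c_int
    # pthread functions return the error number instead of setting errno.
    return setname(pthread_self(), b"x" * 16)


rc = setname_too_long()
if rc is not None:
    print(errno.errorcode.get(rc, rc))
```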

> Is it the same on other platforms?

It's 16 bytes on Linux and 64 bytes on macOS, so no, it's not the same.

> I would prefer to use a named constant instead of magic numbers 15, 16, 17.

I failed to find a public constant for these limits. For example, the Darwin MAXTHREADNAMESIZE constant is private (I'm not 100% sure: I don't have macOS, so I cannot check manually; I only read the code).
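
Without a public constant, the limits can only be hardcoded per platform. A minimal sketch of such a table, using the sizes quoted above: `NAME_BUFFER_SIZE` and `name_maxlen` are hypothetical names, and reading those sizes as including the trailing NUL byte is an assumption (it matches the 15-byte limit mentioned for Linux).

```python
import sys

# Sizes quoted in this thread; assumed to include the trailing NUL byte.
NAME_BUFFER_SIZE = {"linux": 16, "darwin": 64}


def name_maxlen(platform: str = sys.platform):
    """Return the longest usable name in bytes, or None for unknown platforms."""
    size = NAME_BUFFER_SIZE.get(platform)
    return None if size is None else size - 1


print(name_maxlen())
```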

@vstinner
Member Author

> You can use FS_NONASCII. You can also use TESTFN_UNDECODABLE to test that it works with arbitrary bytes and TESTFN_UNENCODABLE to test for encoding error.

Ok, I added tests using FS_NONASCII and TESTFN_UNENCODABLE.

@serhiy-storchaka
Member

> AFAIK, to support cross-compilation, configure is limited to compiling & linking. It shouldn't run the built code.

Well, then hardcoding limits for known platforms is okay. We could also determine it at runtime, but this would be too complicated.

> On Linux, the thread name can be retrieved by reading /proc: see my examples #127338 (comment).

It is the content of the file, not its name. There is a flaw in this example: what encoding do you use to decode it? How do you read the name of a thread in another process with a different locale?

@vstinner
Member Author

vstinner commented Dec 5, 2024

@serhiy-storchaka: I addressed your review, please review the updated PR.

@serhiy-storchaka:

> It is the content of the file, not its name. There is a flaw in this example: what encoding do you use to decode it?

open() uses the current LC_CTYPE locale encoding by default. _thread.set_name() uses the filesystem encoding.

> How do you read the name of a thread in another process with a different locale?

You have the same problem with file content. It's not a new problem.

IMO the Python filesystem encoding is a better choice than UTF-8 for the thread name.
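
To sidestep the locale default of open(), the comm file can be read as bytes and decoded explicitly with the filesystem encoding. A sketch under that assumption: `read_comm` is a hypothetical helper, and it returns None where /proc is not available.

```python
import os
import sys
import threading


def read_comm(tid=None):
    """Return the kernel-side name of a thread of this process, or None."""
    if tid is None:
        tid = threading.get_native_id()
    try:
        # Binary mode: no implicit locale decoding happens here.
        with open(f"/proc/self/task/{tid}/comm", "rb") as fp:
            raw = fp.read()
    except OSError:
        return None
    # os.fsdecode() uses the filesystem encoding and its error handler.
    return os.fsdecode(raw.rstrip(b"\n"))


print(sys.getfilesystemencoding(), read_comm())
```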

@serhiy-storchaka
Member

> You have the same problem with file content. It's not a new problem.

It is only a large problem if you use the locale encoding for file content. If you use a fixed encoding (e.g. UTF-8) or write the encoding as metadata before writing the encoded content, it is not a problem, or a lesser one.
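
The failure mode under discussion is easy to reproduce: the same bytes give different text depending on the encoding the reader assumes. The example name and the Latin-1 reader are illustrative choices only.

```python
name = "tâche"
raw = name.encode("utf-8")  # bytes as written by a UTF-8 process

same_encoding = raw.decode("utf-8")     # reader agrees with the writer
other_encoding = raw.decode("latin-1")  # reader assumes a different locale

print(same_encoding, other_encoding)
```

A fixed encoding removes the guesswork on the reading side, which is the argument made above.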

Member

@serhiy-storchaka serhiy-storchaka left a comment

I disagree with some decisions, but do not want to block the PR. We can fix this later.

Co-authored-by: Serhiy Storchaka <[email protected]>
errno = rc;
return PyErr_SetFromErrno(PyExc_OSError);
}

Member

pthread_getname_np should add a trailing NUL byte, but like everything here, that's platform-specific. I suggest being defensive here.

Suggested change
name[Py_ARRAY_LENGTH(name)-1] = 0;

Member

On what platform does it not add the null byte?

Member Author

The null byte is added on all supported platforms. Previously, I made sure that the buffer always ended with a null byte, but @serhiy-storchaka asked me to remove that. Let's be optimistic. We can adjust the code later if needed.
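
For readers less familiar with C buffers, the defensive termination discussed in this thread can be mimicked in Python with ctypes. This is purely an illustration of the idea, not code from the PR: `read_name` is a hypothetical helper, and the buffer is filled by hand to simulate a call that did not terminate its output.

```python
import ctypes


def read_name(buf) -> bytes:
    """Force a trailing NUL byte, then read the name up to the first NUL."""
    buf[len(buf) - 1] = b"\0"
    return buf.value


buf = (ctypes.c_char * 16)()
buf.raw = b"x" * 16  # simulate a buffer filled without a terminator

print(read_name(buf))
```

The cost of being defensive is that the last byte of the buffer can never carry name data.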

Member

@encukou encukou left a comment

I agree, ship it :)
Or make a few more changes if you want it to be more perfect.

@vstinner vstinner enabled auto-merge (squash) December 6, 2024 16:20
@vstinner vstinner merged commit 67b18a1 into python:main Dec 6, 2024
41 checks passed
@vstinner vstinner deleted the thread_set_name branch December 6, 2024 16:27
Comment on lines 2426 to 2427
size_t len = PyBytes_GET_SIZE(name_encoded);
if (len > PYTHREAD_NAME_MAXLEN) {
Member

You can also inline len. It is only used here.

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
On Linux, threading.Thread now sets the thread name to the operating
system.

* configure now checks if pthread_getname_np()
  and pthread_setname_np() functions are available.
* Add PYTHREAD_NAME_MAXLEN macro.
* Add _thread._NAME_MAXLEN constant for test_threading.

Co-authored-by: Serhiy Storchaka <[email protected]>
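
Since both _thread.set_name() and _thread._NAME_MAXLEN come from this change, code that wants to inspect them should probe for them rather than assume they exist. A minimal sketch: both may be missing on older Python versions or on platforms without the pthread functions, and _NAME_MAXLEN is private and intended for test_threading.

```python
import _thread

# Both attributes are optional: probe instead of importing them directly.
maxlen = getattr(_thread, "_NAME_MAXLEN", None)
has_set_name = hasattr(_thread, "set_name")

print(f"set_name available: {has_set_name}, _NAME_MAXLEN: {maxlen}")
```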
@kulikjak
Contributor

kulikjak commented Apr 2, 2025

I am sorry I didn't get to this earlier; on Solaris, the tests are green and all works as expected.

There is one small issue with Solaris detection in test_set_name and I opened #132012 to resolve that.

jamadden added a commit to gevent/gevent that referenced this pull request Apr 14, 2025