-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-40170: Add _PyObject_CheckBuffer() internal function #19541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@vstinner Hi, victor. Looks like |
I chose to not add an internal _PyObject_CheckBuffer() on purpose, to simplify the implementation. I don't think that it's worth it. @methane, @pablogsal: What do you think? |
This change is related to my commit ef5c615 which converted the PyObject_CheckBuffer() macro to a function. |
I feel this API is not performance-critical. So I agree with @vstinner. Would you provide a benchmark if you think this API is important? |
Oh, I was unware victor's intension before :( I run Mean +- std dev: [python3.9d] 1.84 us +- 0.29 us -> [python] 1.76 us +- 0.23 us: 1.04x faster (-4%) Mean +- std dev: [python3.9d] 1.87 us +- 0.32 us -> [python] 1.82 us +- 0.18 us: 1.02x faster (-2%) Mean +- std dev: [python3.9d] 1.96 us +- 0.35 us -> [python] 1.84 us +- 0.22 us: 1.06x faster (-6%) little improvment ;( |
Remark about your benchmark: the std dev is quite big in you take in account the small difference (50 nanoseconds). For benchmarks which are close to nanoseconds, it's better to use CPU isolation if possible: https://pyperf.readthedocs.io/en/latest/system.html Rather than running the benchmark 3 times, you can ask pyperf to compute more values. For example, use -p 100 to spawn 100 processes (sequentially) rather than 20. See the doc: Moreover, what is the D in "python3.9d"? Is it a debug build? Benchmarks are more reliable when run on a binary compiled in release mode with LTO + PGO. |
Honestly, I don't think that it's worth it to bother with this micro-optimization. Calling PyObject_CheckBuffer() is likely to take less than 50 nanoseconds. I close the issue. It's all about tradeoffs. It depends if a function is commonly used or not. Here I don't think that it's worth it o bother with inlining. Moreover, using LTO, the compiler may be allowed to inline PyObject_CheckBuffer() anyway, especially when using -fno-semantic-interposition which is used by default in Clang. FYI We modified the Python package in Fedora to use -fno-semantic-interposition so GCC can inline function calls from libpython to libpython (Pyhon is built with --enable-shared on Fedora to get libpython). |
Wow, thanks a million, victor. Learned much from your info.
|
Please don't run benchmarks on a debug build: they contain many debug checks which are run at runtime. |
Copy that. Thanks for your guide. |
https://bugs.python.org/issue40170