bpo-40170: Add _PyObject_CheckBuffer() internal function #19541


Closed
shihai1991 wants to merge 2 commits from the bpo_40170_1 branch

Conversation

shihai1991
Member

@shihai1991 shihai1991 commented Apr 15, 2020

@shihai1991
Member Author

@vstinner Hi, Victor. It looks like _PyObject_CheckBuffer() should be added, right?

@vstinner
Member

> @vstinner Hi, Victor. It looks like _PyObject_CheckBuffer() should be added, right?

I chose to not add an internal _PyObject_CheckBuffer() on purpose, to simplify the implementation. I don't think that it's worth it.

@methane, @pablogsal: What do you think?

@vstinner
Member

This change is related to my commit ef5c615 which converted the PyObject_CheckBuffer() macro to a function.

@methane
Member

methane commented Apr 16, 2020

I feel this API is not performance-critical. So I agree with @vstinner.

Would you provide a benchmark if you think this API is important?

@shihai1991
Member Author

shihai1991 commented Apr 16, 2020

> I feel this API is not performance-critical. So I agree with @vstinner.
>
> Would you provide a benchmark if you think this API is important?

Oh, I was unaware of Victor's intention before :(

I ran `$ ./python -m pyperf timeit --compare-to python3.9d "bytearray(range(10))"` three times:

Mean +- std dev: [python3.9d] 1.84 us +- 0.29 us -> [python] 1.76 us +- 0.23 us: 1.04x faster (-4%)
Not significant!

Mean +- std dev: [python3.9d] 1.87 us +- 0.32 us -> [python] 1.82 us +- 0.18 us: 1.02x faster (-2%)
Not significant!

Mean +- std dev: [python3.9d] 1.96 us +- 0.35 us -> [python] 1.84 us +- 0.22 us: 1.06x faster (-6%)

Little improvement ;(

@vstinner
Member

> I ran (...) three times: (...)

Remark about your benchmark: the std dev is quite big if you take into account the small difference (50 nanoseconds). For benchmarks where the differences are close to nanoseconds, it's better to use CPU isolation if possible: https://pyperf.readthedocs.io/en/latest/system.html

Rather than running the benchmark 3 times, you can ask pyperf to compute more values. For example, use -p 100 to spawn 100 processes (sequentially) rather than 20. See the doc:
https://pyperf.readthedocs.io/en/latest/run_benchmark.html#runs-values-warmups-outer-and-inner-loops

Moreover, what is the D in "python3.9d"? Is it a debug build? Benchmarks are more reliable when run on a binary compiled in release mode with LTO + PGO.

@vstinner
Member

> Little improvement ;(

Honestly, I don't think that it's worth it to bother with this micro-optimization. Calling PyObject_CheckBuffer() is likely to take less than 50 nanoseconds. I close the issue.

It's all about tradeoffs. It depends on whether a function is commonly used or not. Here I don't think that it's worth it to bother with inlining.

Moreover, using LTO, the compiler may be allowed to inline PyObject_CheckBuffer() anyway, especially when using -fno-semantic-interposition which is used by default in Clang. FYI we modified the Python package in Fedora to use -fno-semantic-interposition so GCC can inline function calls from libpython to libpython (Python is built with --enable-shared on Fedora to get libpython).
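For reference, a build along the lines of the Fedora change described above might look roughly like this (a sketch of the relevant flags, not the actual Fedora package spec):

```shell
# Build a shared libpython and allow the compiler to inline
# libpython-to-libpython calls despite ELF symbol interposition rules.
CFLAGS="-fno-semantic-interposition" ./configure --enable-shared
make
```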

@vstinner vstinner closed this Apr 16, 2020
@shihai1991
Member Author

> Little improvement ;(
>
> Honestly, I don't think that it's worth it to bother with this micro-optimization. Calling PyObject_CheckBuffer() is likely to take less than 50 nanoseconds. I close the issue.
>
> It's all about tradeoffs. It depends on whether a function is commonly used or not. Here I don't think that it's worth it to bother with inlining.
>
> Moreover, using LTO, the compiler may be allowed to inline PyObject_CheckBuffer() anyway, especially when using -fno-semantic-interposition which is used by default in Clang. FYI we modified the Python package in Fedora to use -fno-semantic-interposition so GCC can inline function calls from libpython to libpython (Python is built with --enable-shared on Fedora to get libpython).

Wow, thanks a million, Victor. I learned a lot from your info.

> Moreover, what is the D in "python3.9d"? Is it a debug build?

Yes, I used `./configure --with-pydebug --with-trace-refs && make install` to install the master version.

@shihai1991 shihai1991 deleted the bpo_40170_1 branch April 16, 2020 15:17
@vstinner
Member

> Yes, I used `./configure --with-pydebug --with-trace-refs && make install` to install the master version.

Please don't run benchmarks on a debug build: they contain many debug checks which are run at runtime.
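A release configuration suitable for benchmarking uses CPython's standard optimization switches (PGO via `--enable-optimizations`, plus LTO), matching the advice above:

```shell
# Release build: PGO + LTO, no debug assertions, no refcount tracing.
./configure --enable-optimizations --with-lto
make
```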

@shihai1991
Member Author

> Yes, I used `./configure --with-pydebug --with-trace-refs && make install` to install the master version.
>
> Please don't run benchmarks on a debug build: they contain many debug checks which are run at runtime.

Copy that. Thanks for your guidance.

5 participants