Fix buffer_info for ctypes buffers (pybind#2502) #2503

fritzr · 2020-09-17T17:29:41Z

Failing test which exhibits issue #2502, followed by a fix for the issue.

henryiii · 2020-09-19T00:41:34Z

Adding a new cpp file (tests/test_ctypes_buffer.cpp) will increase the compile time; I think it would be best to put this into another test, probably test_buffers?

henryiii · 2020-09-19T01:09:01Z

Also, I think this is a simpler fix and doesn't make a new function (patch shown on top of yours):

diff --git a/include/pybind11/buffer_info.h b/include/pybind11/buffer_info.h
index ace8af63..13f5dcda 100644
--- a/include/pybind11/buffer_info.h
+++ b/include/pybind11/buffer_info.h
@@ -52,7 +52,13 @@ struct buffer_info {
     : buffer_info(const_cast<T*>(ptr), sizeof(T), format_descriptor<T>::format(), size, readonly) { }

     explicit buffer_info(Py_buffer *view, bool ownview = true)
-    : buffer_info(from_view(view)) {
+    : buffer_info(view->buf,
+                  view->itemsize,
+                  view->format,
+                  view->ndim,
+                  {view->shape, view->shape + view->ndim},
+                  view->strides ? std::vector<ssize_t>(view->strides, view->strides + view->ndim) : std::vector<ssize_t>((std::size_t) view->ndim),
+                  view->readonly) {
         this->m_view = view;
         this->ownview = ownview;
     }
@@ -91,23 +97,6 @@ private:
                 detail::any_container<ssize_t> &&shape_in, detail::any_container<ssize_t> &&strides_in, bool readonly)
     : buffer_info(ptr, itemsize, format, ndim, std::move(shape_in), std::move(strides_in), readonly) { }

-    static buffer_info from_view(Py_buffer *view) {
-      if (view->strides)
-       return {
-         view->buf, view->itemsize, view->format, view->ndim,
-         {view->shape, view->shape + view->ndim},
-         {view->strides, view->strides + view->ndim},
-         static_cast<bool>(view->readonly)
-       };
-      else
-       return {
-         view->buf, view->itemsize, view->format, view->ndim,
-         {view->shape, view->shape + view->ndim},
-         std::vector<ssize_t>(static_cast<std::vector<ssize_t>::size_type>(view->ndim)),
-         static_cast<bool>(view->readonly)
-       };
-    }
-
     Py_buffer *m_view = nullptr;
     bool ownview = false;
 };

I can commit it if you want me to.

henryiii · 2020-09-19T01:12:14Z

Also, shouldn't strides be all 1 instead of all 0?

YannickJadoul · 2020-09-19T21:54:30Z

I like @henryiii's suggestion. It seems more logical to have this as a localized, ternary-operator check, rather than a whole new function.

And yes, it should just fit into test_buffers.cpp or so, no?

Also, shouldn't strides be all 1 instead of all 0?

Actually, no, even 1 seems wrong. I thought strides contain the actual amount of bytes to jump for each dimension. This should probably reuse this:

pybind11/include/pybind11/numpy.h

Lines 792 to 799 in 5a8ec8e

    
           static std::vector<ssize_t> c_strides(const std::vector<ssize_t> &shape, ssize_t itemsize) { 
        
               auto ndim = shape.size(); 
        
               std::vector<ssize_t> strides(ndim, itemsize); 
        
               if (ndim > 0) 
        
                   for (size_t i = ndim - 1; i > 0; --i) 
        
                       strides[i - 1] = strides[i] * shape[i]; 
        
               return strides; 
        
           }

YannickJadoul · 2020-09-19T22:02:06Z

Also, can we get an idea/explanation of where this nullptr value for the strides field comes from? Is this a valid value in Py_Buffer?

It's not entirely clear to me what it means if strides is null, from the docs. Does it mean that it is C-contiguous?

https://docs.python.org/3/c-api/buffer.html#buffer-structure does say

int ndim
The number of dimensions the memory represents as an n-dimensional array. If it is 0, buf points to a single item representing a scalar. In this case, shape, strides and suboffsets MUST be NULL.

But that seems not to be the case in your example?

henryiii · 2020-09-19T23:31:29Z

I believe this issue is py 2 only?

YannickJadoul · 2020-09-19T23:40:28Z

I believe this issue is py 2 only?

I think you're right, but the Python 2 docs are the same, it seems. Also, from that docs page:

PyBUF_STRIDES | This implies PyBUF_ND. The returned buffer must provide strides information (i.e. the strides cannot be NULL). This would be used when the consumer can handle strided, discontiguous arrays. Handling strides automatically assumes you can handle shape. The exporter can raise an error if a strided representation of the data is not possible (i.e. without the suboffsets).

And we are passing that flag:

pybind11/include/pybind11/pytypes.h

Lines 1363 to 1372 in 5e6ec49

    
           buffer_info request(bool writable = false) const { 
        
               int flags = PyBUF_STRIDES | PyBUF_FORMAT; 
        
               if (writable) flags |= PyBUF_WRITABLE; 
        
               auto *view = new Py_buffer(); 
        
               if (PyObject_GetBuffer(m_ptr, view, flags) != 0) { 
        
                   delete view; 
        
                   throw error_already_set(); 
        
               } 
        
               return buffer_info(view); 
        
           }

So it does seems like ctypes is misbehaving and violating the buffer protocol, no?

fritzr · 2020-09-21T16:16:26Z

Adding a new cpp file (tests/test_ctypes_buffer.cpp) will increase the compile time; I think it would be best to put this into another test, probably test_buffers?

Yes, sorry, I didn't see that file. Done.

Also, I think this is a simpler fix and doesn't make a new function (patch shown on top of yours):

That's better. Done.

fritzr · 2020-09-21T16:26:49Z

Regarding strides... From https://docs.python.org/3/c-api/buffer.html#buffer-request-types:

Request	shape	strides	suboffsets
PyBUF_INDIRECT	yes	yes	if needed
PyBUF_STRIDES	yes	yes	NULL
PyBUF_ND	yes	NULL	NULL
PyBUF_SIMPLE	NULL	NULL	NULL

In practice ctypes appears to ignore the flags in its getbufferproc and fills in the buffer as if it was requested via PyBUF_ND. So in a way yes, ctypes is misbehaving. Looking at the sources this appears to be true for both Python 2 and Python 3.

Regarding the value of strides, I think using py::array::c_strides to create the buffer_info::strides from a NULL Py_buffer.strides is the way to go as @YannickJadoul suggested. Since it is a protected method we need to move it to reuse the code from both places, perhaps moved to py::detail::c_strides in buffer_info.h?

diff --git a/include/pybind11/buffer_info.h b/include/pybind11/buffer_info.h
index 7fab749..d7b5075 100644
--- a/include/pybind11/buffer_info.h
+++ b/include/pybind11/buffer_info.h
@@ -13,6 +13,20 @@
 
 PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE)
 
+PYBIND11_NAMESPACE_BEGIN(detail)
+
+// Default, C-style strides
+inline std::vector<ssize_t> c_strides(const std::vector<ssize_t> &shape, ssize_t itemsize) {
+    auto ndim = shape.size();
+    std::vector<ssize_t> strides(ndim, itemsize);
+    if (ndim > 0)
+       for (size_t i = ndim - 1; i > 0; --i)
+           strides[i - 1] = strides[i] * shape[i];
+    return strides;
+}
+
+PYBIND11_NAMESPACE_END(detail)
+
 /// Information record describing a Python buffer object
 struct buffer_info {
     void *ptr = nullptr;          // Pointer to the underlying storage
@@ -56,7 +70,7 @@ struct buffer_info {
            {view->shape, view->shape + view->ndim},
             view->strides
            ? std::vector<ssize_t>(view->strides, view->strides + view->ndim)
-           : std::vector<ssize_t>(static_cast<std::vector<ssize_t>::size_type>(view->ndim)),
+           : detail::c_strides(shape, itemsize),
            view->readonly) {
         this->m_view = view;
         this->ownview = ownview;
diff --git a/include/pybind11/numpy.h b/include/pybind11/numpy.h
index 03e1ed6..f482c15 100644
--- a/include/pybind11/numpy.h
+++ b/include/pybind11/numpy.h
@@ -560,7 +560,7 @@ public:
           const void *ptr = nullptr, handle base = handle()) {
 
         if (strides->empty())
-            *strides = c_strides(*shape, dt.itemsize());
+            *strides = detail::c_strides(*shape, dt.itemsize());
 
         auto ndim = shape->size();
         if (ndim != strides->size())
@@ -788,16 +788,6 @@ protected:
             throw std::domain_error("array is not writeable");
     }
 
-    // Default, C-style strides
-    static std::vector<ssize_t> c_strides(const std::vector<ssize_t> &shape, ssize_t itemsize) {
-        auto ndim = shape.size();
-        std::vector<ssize_t> strides(ndim, itemsize);
-        if (ndim > 0)
-            for (size_t i = ndim - 1; i > 0; --i)
-                strides[i - 1] = strides[i] * shape[i];
-        return strides;
-    }
-
     // F-style strides; default when constructing an array_t with `ExtraFlags & f_style`
     static std::vector<ssize_t> f_strides(const std::vector<ssize_t> &shape, ssize_t itemsize) {
         auto ndim = shape.size();
@@ -865,7 +855,7 @@ public:
 
     explicit array_t(ShapeContainer shape, const T *ptr = nullptr, handle base = handle())
         : array_t(private_ctor{}, std::move(shape),
-                ExtraFlags & f_style ? f_strides(*shape, itemsize()) : c_strides(*shape, itemsize()),
+                ExtraFlags & f_style ? f_strides(*shape, itemsize()) : detail::c_strides(*shape, itemsize()),
                 ptr, base) { }
 
     explicit array_t(ssize_t count, const T *ptr = nullptr, handle base = handle())

Or should we simply move it to a public static method of array?

YannickJadoul · 2020-09-21T16:30:38Z

Regarding the value of strides, I think using py::array::c_strides to create the buffer_info::strides from a NULL Py_buffer.strides is the way to go as @YannickJadoul suggested.

Can we get a few tests/ctypes specification, or so, that show that this should always be c_strides?

Since it is a protected method we need to move it to reuse the code from both places, perhaps moved to py::detail::c_strides in buffer_info.h?

That's most likely the way to go, yes :-)

YannickJadoul · 2020-09-21T16:31:12Z

Since it is a protected method we need to move it to reuse the code from both places, perhaps moved to py::detail::c_strides in buffer_info.h?

That's most likely the way to go, yes :-)

For consistency, please pull out f_strides, then, as well.

include/pybind11/buffer_info.h

c_strides and f_strides are moved from numpy.h (py::array) to buffer_info.h (py::detail) so they can be used from the buffer_info Py_buffer constructor.

fritzr · 2020-09-21T20:14:42Z

Strides are now filled out "for real" plus some better ctypes tests.

For additional confidence, numpy itself contains the following code when creating an array from a Py_buffer view in _array_from_buffer_3118 which handles NULL strides:

https://github.com/numpy/numpy/blob/f981a7ff4b02204f215eb089ce77f9a3f3906b1e/numpy/core/src/multiarray/ctors.c#L1208-L1221

        if (view->strides != NULL) {
            for (k = 0; k < nd; ++k) {
                strides[k] = view->strides[k];
            }
        }
        else {
            d = view->len;
            for (k = 0; k < nd; ++k) {
                if (view->shape[k] != 0) {
                    d /= view->shape[k];
                }
                strides[k] = d;
            }
        }

fritzr · 2020-09-21T20:36:30Z

Interestingly, CI tests show that in pypy2 the py::buffer_info for a bytes object ends up with the readonly field set to false. This causes the test failure, since I tried to use from_buffer or from_buffer_copy based on the py::buffer_info::readonly field.

This seems like a bug in PyPy since AFAICT this could only happen if the bf_getbuffer function for bytes objects really sets Py_buffer->readonly to zero. I don't have an active pypy installation right now to debug exactly why this occurs. Should I change the test to something more "reliable" like trying to use from_buffer and falling back to from_buffer_copy if we see a TypeError?

YannickJadoul

For additional confidence, numpy itself contains the following code when creating an array from a Py_buffer view in _array_from_buffer_3118 which handles NULL strides:

OK, that is the kind of convincing argument I wanted to see. Thanks a lot for doing the effort of finding that! :-)

Interestingly, CI tests show that in pypy2 the py::buffer_info for a bytes object ends up with the readonly field set to false.

This is weird indeed. If I saw correctly, this is in a separate test from the changes you made? Do you have any intuition what's going on?

This seems like a bug in PyPy since AFAICT this could only happen if the bf_getbuffer function for bytes objects really sets Py_buffer->readonly to zero. I don't have an active pypy installation right now to debug exactly why this occurs. Should I change the test to something more "reliable" like trying to use from_buffer and falling back to from_buffer_copy if we see a TypeError?

Pfffff, if we trace the bug to PyPy, it's better to just ignore this test for PyPy 2 and catch errors for the other interpreters, I'd say.

(Approving, given that the PyPy tests succeed, ofc)

YannickJadoul · 2020-09-21T21:38:20Z

This is weird indeed. If I saw correctly, this is in a separate test from the changes you made? Do you have any intuition what's going on?

Nevermind this. The error annotations are in the wrong place, in the diff.

fritzr · 2020-09-21T21:56:54Z

For additional confidence, numpy itself contains the following code when creating an array from a Py_buffer view in _array_from_buffer_3118 which handles NULL strides:

OK, that is the kind of convincing argument I wanted to see. Thanks a lot for doing the effort of finding that! :-)

Glad to! I also tried some tests like so (pseudocode):

assert m.get_buffer_info(np.ctypeslib.as_array(ctypes_array)) == m.get_buffer_info(ctypes_array)

These pass (yay!), but recent versions of numpy will issue a RuntimeWarning that ctypes.Array format strings are not PEP 3118-compliant in Python 2. Do you think I should add these cases and allow/suppress the warning in PY2?

This seems like a bug in PyPy since AFAICT this could only happen if the bf_getbuffer function for bytes objects really sets Py_buffer->readonly to zero. I don't have an active pypy installation right now to debug exactly why this occurs. Should I change the test to something more "reliable" like trying to use from_buffer and falling back to from_buffer_copy if we see a TypeError?

Pfffff, if we trace the bug to PyPy, it's better to just ignore this test for PyPy 2 and catch errors for the other interpreters, I'd say.

Sounds good. I've marked a skip for that part of the test with PyPy 2. Now to go write them a PR.... Maybe. ;)

henryiii · 2020-09-21T22:18:03Z

What is "recent versions of numpy"? NumPy dropped Python 2 support, so it can't be the most recent version. ;)

YannickJadoul · 2020-09-21T22:21:45Z

Do you think I should add these cases and allow/suppress the warning in PY2?

If you thinks it adds anything, go ahead, but otherwise, I think these tests should capture enough?

Sounds good. I've marked a skip for that part of the test with PyPy 2. Now to go write them a PR.... Maybe. ;)

I saw. Looks good to me. If you can approximately figure out what goes wrong and is unexpected, do please log an issue with PyPy (and feel free to tag me, so we can remove this from pybind11 if it gets solved :-) ).

fritzr · 2020-09-22T16:00:17Z

What is "recent versions of numpy"? NumPy dropped Python 2 support, so it can't be the most recent version. ;)

Specifically the warning was introduced in numpy v1.5.0. Only 10 years old...!

Do you think I should add these cases and allow/suppress the warning in PY2?

If you thinks it adds anything, go ahead, but otherwise, I think these tests should capture enough?

Yes I think the current tests are effective enough.

Sounds good. I've marked a skip for that part of the test with PyPy 2. Now to go write them a PR.... Maybe. ;)

I saw. Looks good to me. If you can approximately figure out what goes wrong and is unexpected, do please log an issue with PyPy (and feel free to tag me, so we can remove this from pybind11 if it gets solved :-) ).

I've minimally reproduced the issue so it's definitely a bug in pypy, therefore I opened pypy#3307 for this and mentioned you in the description.

YannickJadoul · 2020-09-22T16:56:36Z

@fritzr I saw! Thanks for the extra work and for tagging me :-)

YannickJadoul · 2020-09-22T16:58:08Z

Not sure who're the numpy/buffer protocol experts here. Maybe @rwgk as well, after #2223?

rwgk · 2020-10-03T15:19:54Z

https://github.com/numpy/numpy/blob/f981a7ff4b02204f215eb089ce77f9a3f3906b1e/numpy/core/src/multiarray/ctors.c#L1208-L1221

I think it could be useful to add this link as a comment to the corresponding code in pybind11. If questions come up in the future, the comment will make it easy to start comparing with numpy developments.

rwgk

Looks great to me!
I looked pretty carefully, nothing stood out to me.
From the comments I see that @fritzr and @YannickJadoul looked at the details way more than I could myself. I'm convinced this is good to merge, for the 2.6.0 release.

rwgk · 2020-10-03T15:29:40Z

Not sure who're the numpy/buffer protocol experts here. Maybe @rwgk as well, after #2223?

I don't consider myself a numpy/buffer expert, at all :-)
Although I've done a lot of numerical work in general.

YannickJadoul · 2020-10-03T21:09:10Z

OK, so let's merge this then, and get it into 2.6.0. Thanks again, @fritzr! :-)

rwgk · 2020-10-04T15:33:39Z

This PR was included in Google-global testing: no issues found.

fritzr added 2 commits September 17, 2020 12:52

tests: New test for ctypes buffers (pybind#2502)

8c1ae99

fix: fix buffer_info segfault on views with no stride (pybind11#2502)

f565e3a

fritzr changed the title ~~tests: New test for ctypes buffers (pybind#2502)~~ Fix buffer_info for ctypes buffers (pybind#2502) Sep 17, 2020

fritzr added 2 commits September 17, 2020 15:30

Explicit conversions in buffer_info to make clang happy (pybind#2502)

73cf4c2

Another explicit cast in buffer_info constructor for clang (pybind#2502)

1fff837

fritzr added 2 commits September 21, 2020 11:46

Simpler implementation of buffer_info constructor from Py_buffer.

a2d4c07

Move test_ctypes_buffer into test_buffers

76de987

YannickJadoul reviewed Sep 21, 2020

View reviewed changes

include/pybind11/buffer_info.h Show resolved Hide resolved

fritzr added 3 commits September 21, 2020 15:35

Comment on why view->strides may be NULL (and fix some whitespace)

86246e3

Use c_strides() instead of zero when view->strides is NULL.

a4790c8

c_strides and f_strides are moved from numpy.h (py::array) to buffer_info.h (py::detail) so they can be used from the buffer_info Py_buffer constructor.

Increase ctypes buffer test coverage in test_buffers.

355bd37

YannickJadoul approved these changes Sep 21, 2020

View reviewed changes

Split ctypes tests and skip one which is broken in PyPy2.

18a2aa5

YannickJadoul requested review from henryiii and bstaletic September 22, 2020 16:56

YannickJadoul requested a review from rwgk September 22, 2020 16:58

henryiii approved these changes Oct 3, 2020

View reviewed changes

rwgk approved these changes Oct 3, 2020

View reviewed changes

henryiii added this to the v2.6.0 milestone Oct 3, 2020

YannickJadoul merged commit e8ad33b into pybind:master Oct 3, 2020

YannickJadoul mentioned this pull request Oct 3, 2020

[BUG] Segfault in py::buffer::request on a ctypes array #2502

Closed

rwgk mentioned this pull request Feb 10, 2023

FWD pybind11 google/pybind11clif#2503

Closed

Fix buffer_info for ctypes buffers (pybind#2502) #2503

Fix buffer_info for ctypes buffers (pybind#2502) #2503

Uh oh!

Conversation

fritzr commented Sep 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

henryiii commented Sep 19, 2020

Uh oh!

henryiii commented Sep 19, 2020

Uh oh!

henryiii commented Sep 19, 2020

Uh oh!

YannickJadoul commented Sep 19, 2020

Uh oh!

YannickJadoul commented Sep 19, 2020

Uh oh!

henryiii commented Sep 19, 2020

Uh oh!

YannickJadoul commented Sep 19, 2020

Uh oh!

fritzr commented Sep 21, 2020

Uh oh!

fritzr commented Sep 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

YannickJadoul commented Sep 21, 2020

Uh oh!

YannickJadoul commented Sep 21, 2020

Uh oh!

Uh oh!

fritzr commented Sep 21, 2020

Uh oh!

fritzr commented Sep 21, 2020

Uh oh!

YannickJadoul left a comment

Choose a reason for hiding this comment

Uh oh!

YannickJadoul commented Sep 21, 2020

Uh oh!

fritzr commented Sep 21, 2020

Uh oh!

henryiii commented Sep 21, 2020

Uh oh!

YannickJadoul commented Sep 21, 2020

Uh oh!

fritzr commented Sep 22, 2020

Uh oh!

YannickJadoul commented Sep 22, 2020

Uh oh!

YannickJadoul commented Sep 22, 2020

Uh oh!

rwgk commented Oct 3, 2020

Uh oh!

rwgk left a comment

Choose a reason for hiding this comment

Uh oh!

rwgk commented Oct 3, 2020

Uh oh!

YannickJadoul commented Oct 3, 2020

Uh oh!

rwgk commented Oct 4, 2020

Uh oh!

Uh oh!

fritzr commented Sep 17, 2020 •

edited

Loading

fritzr commented Sep 21, 2020 •

edited

Loading