.. _buffer:

Implementing the buffer protocol
================================

Cython objects can expose memory buffers to Python code
by implementing the "buffer protocol".
This chapter shows how to implement the protocol
and make use of the memory managed by an extension type from NumPy.


A matrix class
--------------

The following Cython/C++ code implements a matrix of floats,
where the number of columns is fixed at construction time
but rows can be added dynamically.

::

    # matrix.pyx
    # distutils: language = c++
    from libcpp.vector cimport vector

    cdef class Matrix:
        cdef unsigned ncols
        cdef vector[float] v

        def __cinit__(self, unsigned ncols):
            self.ncols = ncols

        def add_row(self):
            """Adds a row, initially zero-filled."""
            # std::vector has no extend(); resize() appends zero-initialized floats.
            self.v.resize(self.v.size() + self.ncols)

There are no methods to do anything productive with the matrices' contents.
We could implement custom ``__getitem__``, ``__setitem__``, etc. for this,
but instead we'll use the buffer protocol to expose the matrix's data to Python
so we can use NumPy to do useful work.

Implementing the buffer protocol requires adding two methods,
``__getbuffer__`` and ``__releasebuffer__``,
which Cython handles specially.

::

    # distutils: language = c++
    from cpython cimport Py_buffer
    from libcpp.vector cimport vector

    cdef class Matrix:
        cdef Py_ssize_t ncols
        cdef Py_ssize_t shape[2]
        cdef Py_ssize_t strides[2]
        cdef vector[float] v

        def __cinit__(self, Py_ssize_t ncols):
            self.ncols = ncols

        def add_row(self):
            """Adds a row, initially zero-filled."""
            # std::vector has no extend(); resize() appends zero-initialized floats.
            self.v.resize(self.v.size() + self.ncols)

        def __getbuffer__(self, Py_buffer *buffer, int flags):
            cdef Py_ssize_t itemsize = sizeof(self.v[0])

            # Integer division: the number of complete rows stored so far.
            self.shape[0] = self.v.size() // self.ncols
            self.shape[1] = self.ncols

            # Stride 1 is the distance, in bytes, between two items in a row;
            # this is the distance between two adjacent items in the vector.
            # Stride 0 is the distance between the first elements of adjacent rows.
            self.strides[1] = <Py_ssize_t>(  <char *>&(self.v[1])
                                           - <char *>&(self.v[0]))
            self.strides[0] = self.ncols * self.strides[1]

            buffer.buf = <char *>&(self.v[0])
            buffer.format = 'f'                     # float
            buffer.internal = NULL                  # see References
            buffer.itemsize = itemsize
            buffer.len = self.v.size() * itemsize   # product(shape) * itemsize
            buffer.ndim = 2
            buffer.obj = self
            buffer.readonly = 0
            buffer.shape = self.shape
            buffer.strides = self.strides
            buffer.suboffsets = NULL                # for pointer arrays only

        def __releasebuffer__(self, Py_buffer *buffer):
            pass

The method ``Matrix.__getbuffer__`` fills a descriptor structure,
called a ``Py_buffer``, that is defined by the Python C-API.
It contains a pointer to the actual buffer in memory,
as well as metadata about the shape of the array and the strides
(step sizes to get from one element or row to the next).
Its ``shape`` and ``strides`` members are pointers
that must point to arrays of type and size ``Py_ssize_t[ndim]``.
These arrays have to stay alive as long as any buffer views the data,
so we store them on the ``Matrix`` object as members.
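
The shape and stride arithmetic can be tried out in plain Python, without
compiling anything. The following is only an illustrative sketch: it assumes a
``bytearray`` as a stand-in for the ``vector[float]`` storage, and the sizes
``nrows`` and ``ncols`` are made up for the example. A ``memoryview`` exposes
the same fields that ``__getbuffer__`` fills in.

::

    import struct

    ncols = 3
    nrows = 2
    itemsize = struct.calcsize("f")          # 4 bytes for a C float

    # Flat, row-major storage, standing in for the Matrix's vector[float].
    storage = bytearray(nrows * ncols * itemsize)

    # View the flat bytes as a 2-D array of floats.
    view = memoryview(storage).cast("f", shape=[nrows, ncols])

    # These attributes mirror the Py_buffer fields of the same names.
    print(view.format, view.itemsize, view.ndim)   # f 4 2
    print(view.shape, view.strides, view.nbytes)   # (2, 3) (12, 4) 24

    # Element (i, j) lives at byte offset i * strides[0] + j * strides[1].
    i, j = 1, 2
    offset = i * view.strides[0] + j * view.strides[1]
    struct.pack_into("f", storage, offset, 1.5)
    print(view[i, j])                              # 1.5

Here ``strides[0]`` is ``ncols * itemsize`` and ``nbytes`` corresponds to
``buffer.len``, matching the assignments in ``__getbuffer__`` above.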

The code is not yet complete, but we can already compile it
and test the basic functionality.

::

    >>> from matrix import Matrix
    >>> import numpy as np
    >>> m = Matrix(10)
    >>> np.asarray(m)
    array([], shape=(0, 10), dtype=float32)
    >>> m.add_row()
    >>> a = np.asarray(m)
    >>> a[:] = 1
    >>> m.add_row()
    >>> a = np.asarray(m)
    >>> a
    array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)

Now we can view the ``Matrix`` as a NumPy ``ndarray``,
and modify its contents using standard NumPy operations.
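
The zero-copy sharing behind this can also be observed with the standard
library alone. The sketch below assumes ``array.array`` as a stand-in exporter
in place of ``Matrix``, and ``memoryview`` as the consumer in place of NumPy;
writes made through the view show up in the object that owns the memory.

::

    from array import array

    # array.array exports its memory through the buffer protocol,
    # playing the role of Matrix here.
    row = array("f", [0.0] * 4)

    view = memoryview(row)       # requests a buffer from the exporter
    view[0] = 1.0                # write through the view ...
    view[3] = 2.5

    print(row.tolist())          # ... and the owner sees it: [1.0, 0.0, 0.0, 2.5]
    view.release()               # hand the buffer back to the exporter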


Memory safety and reference counting
------------------------------------

The ``Matrix`` class as implemented so far is unsafe.
The ``add_row`` operation can move the underlying buffer,
which invalidates any NumPy (or other) view on the data.
If you try to access values after an ``add_row`` call,
you'll get outdated values or a segfault.

This is where ``__releasebuffer__`` comes in.
We can add a reference count to each matrix,
and lock it for mutation whenever a view exists.

::

    cdef class Matrix:
        # ...
        cdef int view_count

        def __cinit__(self, Py_ssize_t ncols):
            self.ncols = ncols
            self.view_count = 0

        def add_row(self):
            if self.view_count > 0:
                raise ValueError("can't add row while being viewed")
            self.v.resize(self.v.size() + self.ncols)

        def __getbuffer__(self, Py_buffer *buffer, int flags):
            # ... as before

            self.view_count += 1

        def __releasebuffer__(self, Py_buffer *buffer):
            self.view_count -= 1
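
Python's built-in ``bytearray`` follows the same locking pattern, which makes
it a convenient way to see the intended behaviour. In this sketch the
``bytearray`` is only an analogy for ``Matrix``; note that it raises
``BufferError`` where the code above raises ``ValueError``.

::

    data = bytearray(8)          # resizable exporter, standing in for Matrix
    view = memoryview(data)      # an active view pins the buffer

    try:
        data.extend(b"\x00" * 8)  # would reallocate, like add_row()
    except BufferError as exc:
        blocked = True
        print("refused:", exc)
    else:
        blocked = False

    view.release()               # counterpart of __releasebuffer__
    data.extend(b"\x00" * 8)     # now the resize succeeds
    print(len(data))             # 16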


Flags
-----

We skipped some input validation in the code.
The ``flags`` argument to ``__getbuffer__`` comes from ``np.asarray``
(and other clients) and is a bitwise OR of flags
that describe the kind of array that is requested.
Strictly speaking, if the request is ``PyBUF_SIMPLE`` or ``PyBUF_ND``
(that is, strides were not requested),
or if it includes ``PyBUF_F_CONTIGUOUS``,
``__getbuffer__`` must raise a ``BufferError``.
These constants can be ``cimport``'ed from ``cpython.buffer``.

(The matrix-in-vector structure actually conforms to ``PyBUF_ND``,
but that would prohibit ``__getbuffer__`` from filling in the strides.
A single-row matrix is F-contiguous, but a larger matrix is not.)
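
The bit tests involved can be sketched in plain Python. The numeric values
below are copied by hand from CPython's C headers and should be treated as an
assumption to re-check against your Python version; ``can_serve`` is a
hypothetical helper, not part of Cython or CPython. In Cython the same
comparisons would go at the top of ``__getbuffer__``, using the constants from
``cpython.buffer``.

::

    # Assumed values, mirroring the PyBUF_* macros in CPython's headers.
    PyBUF_SIMPLE = 0x0000
    PyBUF_ND = 0x0008
    PyBUF_STRIDES = 0x0010 | PyBUF_ND
    PyBUF_C_CONTIGUOUS = 0x0020 | PyBUF_STRIDES
    PyBUF_F_CONTIGUOUS = 0x0040 | PyBUF_STRIDES


    def can_serve(flags, nrows, ncols):
        """Hypothetical check: can the strided, row-major buffer satisfy flags?"""
        # The buffer always carries strides, so the consumer must have asked for them.
        if (flags & PyBUF_STRIDES) != PyBUF_STRIDES:
            return False
        # Row-major data is also column-major only if it has at most one row
        # or at most one column.
        wants_fortran = (flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS
        if wants_fortran and nrows > 1 and ncols > 1:
            return False
        return True


    print(can_serve(PyBUF_ND, 2, 10))             # False
    print(can_serve(PyBUF_STRIDES, 2, 10))        # True
    print(can_serve(PyBUF_F_CONTIGUOUS, 2, 10))   # False
    print(can_serve(PyBUF_F_CONTIGUOUS, 1, 10))   # True

Where ``can_serve`` returns ``False``, ``__getbuffer__`` would raise
``BufferError`` instead of filling in the ``Py_buffer``.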


References
----------

The buffer interface used here is set out in
:PEP:`3118`, Revising the buffer protocol.

A tutorial for using this API from C is on Jake Vanderplas's blog,
`An Introduction to the Python Buffer Protocol
<https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/>`_.

Reference documentation is available for
`Python 3 <https://docs.python.org/3/c-api/buffer.html>`_
and `Python 2 <https://docs.python.org/2.7/c-api/buffer.html>`_.
The Py2 documentation also describes an older buffer protocol
that is no longer in use;
since Python 2.6, the :PEP:`3118` protocol has been implemented,
and the older protocol is only relevant for legacy code.