Blame external/pybind11/docs/advanced/pycpp/numpy.rst

Packit 534379
.. _numpy:
Packit 534379
Packit 534379
NumPy
Packit 534379
#####
Packit 534379
Packit 534379
Buffer protocol
Packit 534379
===============
Packit 534379
Packit 534379
Python supports an extremely general and convenient approach for exchanging
Packit 534379
data between plugin libraries. Types can expose a buffer view [#f2]_, which
Packit 534379
provides fast direct access to the raw internal data representation. Suppose we
Packit 534379
want to bind the following simplistic Matrix class:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    class Matrix {
Packit 534379
    public:
Packit 534379
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
Packit 534379
            m_data = new float[rows*cols];
Packit 534379
        }
Packit 534379
        float *data() { return m_data; }
Packit 534379
        size_t rows() const { return m_rows; }
Packit 534379
        size_t cols() const { return m_cols; }
Packit 534379
    private:
Packit 534379
        size_t m_rows, m_cols;
Packit 534379
        float *m_data;
Packit 534379
    };
Packit 534379
Packit 534379
The following binding code exposes the ``Matrix`` contents as a buffer object,
Packit 534379
making it possible to cast Matrices into NumPy arrays. It is even possible to
Packit 534379
completely avoid copy operations with Python expressions like
Packit 534379
``np.array(matrix_instance, copy = False)``.
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Packit 534379
       .def_buffer([](Matrix &m) -> py::buffer_info {
Packit 534379
            return py::buffer_info(
Packit 534379
                m.data(),                               /* Pointer to buffer */
Packit 534379
                sizeof(float),                          /* Size of one scalar */
Packit 534379
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
Packit 534379
                2,                                      /* Number of dimensions */
Packit 534379
                { m.rows(), m.cols() },                 /* Buffer dimensions */
Packit 534379
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
Packit 534379
                  sizeof(float) }
Packit 534379
            );
Packit 534379
        });
Packit 534379
Packit 534379
Supporting the buffer protocol in a new type involves specifying the special
Packit 534379
``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
Packit 534379
``def_buffer()`` method with a lambda function that creates a
Packit 534379
``py::buffer_info`` description record on demand describing a given matrix
Packit 534379
instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
Packit 534379
specification.
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    struct buffer_info {
Packit 534379
        void *ptr;
Packit 534379
        ssize_t itemsize;
Packit 534379
        std::string format;
Packit 534379
        ssize_t ndim;
Packit 534379
        std::vector<ssize_t> shape;
Packit 534379
        std::vector<ssize_t> strides;
Packit 534379
    };
Packit 534379
Packit 534379
To create a C++ function that can take a Python buffer object as an argument,
Packit 534379
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
Packit 534379
in a great variety of configurations, hence some safety checks are usually
Packit 534379
necessary in the function body. Below, you can see an basic example on how to
Packit 534379
define a custom constructor for the Eigen double precision matrix
Packit 534379
(``Eigen::MatrixXd``) type, which supports initialization from compatible
Packit 534379
buffer objects (e.g. a NumPy matrix).
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    /* Bind MatrixXd (or some other Eigen type) to Python */
Packit 534379
    typedef Eigen::MatrixXd Matrix;
Packit 534379
Packit 534379
    typedef Matrix::Scalar Scalar;
Packit 534379
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
Packit 534379
Packit 534379
    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Packit 534379
        .def("__init__", [](Matrix &m, py::buffer b) {
Packit 534379
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
Packit 534379
Packit 534379
            /* Request a buffer descriptor from Python */
Packit 534379
            py::buffer_info info = b.request();
Packit 534379
Packit 534379
            /* Some sanity checks ... */
Packit 534379
            if (info.format != py::format_descriptor<Scalar>::format())
Packit 534379
                throw std::runtime_error("Incompatible format: expected a double array!");
Packit 534379
Packit 534379
            if (info.ndim != 2)
Packit 534379
                throw std::runtime_error("Incompatible buffer dimension!");
Packit 534379
Packit 534379
            auto strides = Strides(
Packit 534379
                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
Packit 534379
                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
Packit 534379
Packit 534379
            auto map = Eigen::Map<Matrix, 0, Strides>(
Packit 534379
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
Packit 534379
Packit 534379
            new (&m) Matrix(map);
Packit 534379
        });
Packit 534379
Packit 534379
For reference, the ``def_buffer()`` call for this Eigen data type should look
Packit 534379
as follows:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    .def_buffer([](Matrix &m) -> py::buffer_info {
Packit 534379
        return py::buffer_info(
Packit 534379
            m.data(),                                /* Pointer to buffer */
Packit 534379
            sizeof(Scalar),                          /* Size of one scalar */
Packit 534379
            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
Packit 534379
            2,                                       /* Number of dimensions */
Packit 534379
            { m.rows(), m.cols() },                  /* Buffer dimensions */
Packit 534379
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
Packit 534379
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
Packit 534379
                                                     /* Strides (in bytes) for each index */
Packit 534379
        );
Packit 534379
     })
Packit 534379
Packit 534379
For a much easier approach of binding Eigen types (although with some
Packit 534379
limitations), refer to the section on :doc:`/advanced/cast/eigen`.
Packit 534379
Packit 534379
.. seealso::
Packit 534379
Packit 534379
    The file :file:`tests/test_buffers.cpp` contains a complete example
Packit 534379
    that demonstrates using the buffer protocol with pybind11 in more detail.
Packit 534379
Packit 534379
.. [#f2] http://docs.python.org/3/c-api/buffer.html
Packit 534379
Packit 534379
Arrays
Packit 534379
======
Packit 534379
Packit 534379
By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
Packit 534379
restrict the function so that it only accepts NumPy arrays (rather than any
Packit 534379
type of Python object satisfying the buffer protocol).
Packit 534379
Packit 534379
In many situations, we want to define a function which only accepts a NumPy
Packit 534379
array of a certain data type. This is possible via the ``py::array_t<T>``
Packit 534379
template. For instance, the following function requires the argument to be a
Packit 534379
NumPy array containing double precision values.
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    void f(py::array_t<double> array);
Packit 534379
Packit 534379
When it is invoked with a different type (e.g. an integer or a list of
Packit 534379
integers), the binding code will attempt to cast the input into a NumPy array
Packit 534379
of the requested type. Note that this feature requires the
Packit 534379
:file:`pybind11/numpy.h` header to be included.
Packit 534379
Packit 534379
Data in NumPy arrays is not guaranteed to packed in a dense manner;
Packit 534379
furthermore, entries can be separated by arbitrary column and row strides.
Packit 534379
Sometimes, it can be useful to require a function to only accept dense arrays
Packit 534379
using either the C (row-major) or Fortran (column-major) ordering. This can be
Packit 534379
accomplished via a second template argument with values ``py::array::c_style``
Packit 534379
or ``py::array::f_style``.
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
Packit 534379
Packit 534379
The ``py::array::forcecast`` argument is the default value of the second
Packit 534379
template parameter, and it ensures that non-conforming arguments are converted
Packit 534379
into an array satisfying the specified requirements instead of trying the next
Packit 534379
function overload.
Packit 534379
Packit 534379
Structured types
Packit 534379
================
Packit 534379
Packit 534379
In order for ``py::array_t`` to work with structured (record) types, we first
Packit 534379
need to register the memory layout of the type. This can be done via
Packit 534379
``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
Packit 534379
expects the type followed by field names:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    struct A {
Packit 534379
        int x;
Packit 534379
        double y;
Packit 534379
    };
Packit 534379
Packit 534379
    struct B {
Packit 534379
        int z;
Packit 534379
        A a;
Packit 534379
    };
Packit 534379
Packit 534379
    // ...
Packit 534379
    PYBIND11_MODULE(test, m) {
Packit 534379
        // ...
Packit 534379
Packit 534379
        PYBIND11_NUMPY_DTYPE(A, x, y);
Packit 534379
        PYBIND11_NUMPY_DTYPE(B, z, a);
Packit 534379
        /* now both A and B can be used as template arguments to py::array_t */
Packit 534379
    }
Packit 534379
Packit 534379
The structure should consist of fundamental arithmetic types, ``std::complex``,
Packit 534379
previously registered substructures, and arrays of any of the above. Both C++
Packit 534379
arrays and ``std::array`` are supported. While there is a static assertion to
Packit 534379
prevent many types of unsupported structures, it is still the user's
Packit 534379
responsibility to use only "plain" structures that can be safely manipulated as
Packit 534379
raw memory without violating invariants.
Packit 534379
Packit 534379
Vectorizing functions
Packit 534379
=====================
Packit 534379
Packit 534379
Suppose we want to bind a function with the following signature to Python so
Packit 534379
that it can process arbitrary NumPy array arguments (vectors, matrices, general
Packit 534379
N-D arrays) in addition to its normal arguments:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    double my_func(int x, float y, double z);
Packit 534379
Packit 534379
After including the ``pybind11/numpy.h`` header, this is extremely simple:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    m.def("vectorized_func", py::vectorize(my_func));
Packit 534379
Packit 534379
Invoking the function like below causes 4 calls to be made to ``my_func`` with
Packit 534379
each of the array elements. The significant advantage of this compared to
Packit 534379
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
Packit 534379
entirely on the C++ side and can be crunched down into a tight, optimized loop
Packit 534379
by the compiler. The result is returned as a NumPy array of type
Packit 534379
``numpy.dtype.float64``.
Packit 534379
Packit 534379
.. code-block:: pycon
Packit 534379
Packit 534379
    >>> x = np.array([[1, 3],[5, 7]])
Packit 534379
    >>> y = np.array([[2, 4],[6, 8]])
Packit 534379
    >>> z = 3
Packit 534379
    >>> result = vectorized_func(x, y, z)
Packit 534379
Packit 534379
The scalar argument ``z`` is transparently replicated 4 times.  The input
Packit 534379
arrays ``x`` and ``y`` are automatically converted into the right types (they
Packit 534379
are of type  ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
Packit 534379
``numpy.dtype.float32``, respectively).
Packit 534379
Packit 534379
.. note::
Packit 534379
Packit 534379
    Only arithmetic, complex, and POD types passed by value or by ``const &``
Packit 534379
    reference are vectorized; all other arguments are passed through as-is.
Packit 534379
    Functions taking rvalue reference arguments cannot be vectorized.
Packit 534379
Packit 534379
In cases where the computation is too complicated to be reduced to
Packit 534379
``vectorize``, it will be necessary to create and access the buffer contents
Packit 534379
manually. The following snippet contains a complete example that shows how this
Packit 534379
works (the code is somewhat contrived, since it could have been done more
Packit 534379
simply using ``vectorize``).
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    #include <pybind11/pybind11.h>
Packit 534379
    #include <pybind11/numpy.h>
Packit 534379
Packit 534379
    namespace py = pybind11;
Packit 534379
Packit 534379
    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
Packit 534379
        py::buffer_info buf1 = input1.request(), buf2 = input2.request();
Packit 534379
Packit 534379
        if (buf1.ndim != 1 || buf2.ndim != 1)
Packit 534379
            throw std::runtime_error("Number of dimensions must be one");
Packit 534379
Packit 534379
        if (buf1.size != buf2.size)
Packit 534379
            throw std::runtime_error("Input shapes must match");
Packit 534379
Packit 534379
        /* No pointer is passed, so NumPy will allocate the buffer */
Packit 534379
        auto result = py::array_t<double>(buf1.size);
Packit 534379
Packit 534379
        py::buffer_info buf3 = result.request();
Packit 534379
Packit 534379
        double *ptr1 = (double *) buf1.ptr,
Packit 534379
               *ptr2 = (double *) buf2.ptr,
Packit 534379
               *ptr3 = (double *) buf3.ptr;
Packit 534379
Packit 534379
        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
Packit 534379
            ptr3[idx] = ptr1[idx] + ptr2[idx];
Packit 534379
Packit 534379
        return result;
Packit 534379
    }
Packit 534379
Packit 534379
    PYBIND11_MODULE(test, m) {
Packit 534379
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
Packit 534379
    }
Packit 534379
Packit 534379
.. seealso::
Packit 534379
Packit 534379
    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
Packit 534379
    example that demonstrates using :func:`vectorize` in more detail.
Packit 534379
Packit 534379
Direct access
Packit 534379
=============
Packit 534379
Packit 534379
For performance reasons, particularly when dealing with very large arrays, it
Packit 534379
is often desirable to directly access array elements without internal checking
Packit 534379
of dimensions and bounds on every access when indices are known to be already
Packit 534379
valid.  To avoid such checks, the ``array`` class and ``array_t<T>`` template
Packit 534379
class offer an unchecked proxy object that can be used for this unchecked
Packit 534379
access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
Packit 534379
where ``N`` gives the required dimensionality of the array:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
    m.def("sum_3d", [](py::array_t<double> x) {
Packit 534379
        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
Packit 534379
        double sum = 0;
Packit 534379
        for (ssize_t i = 0; i < r.shape(0); i++)
Packit 534379
            for (ssize_t j = 0; j < r.shape(1); j++)
Packit 534379
                for (ssize_t k = 0; k < r.shape(2); k++)
Packit 534379
                    sum += r(i, j, k);
Packit 534379
        return sum;
Packit 534379
    });
Packit 534379
    m.def("increment_3d", [](py::array_t<double> x) {
Packit 534379
        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
Packit 534379
        for (ssize_t i = 0; i < r.shape(0); i++)
Packit 534379
            for (ssize_t j = 0; j < r.shape(1); j++)
Packit 534379
                for (ssize_t k = 0; k < r.shape(2); k++)
Packit 534379
                    r(i, j, k) += 1.0;
Packit 534379
    }, py::arg().noconvert());
Packit 534379
Packit 534379
To obtain the proxy from an ``array`` object, you must specify both the data
Packit 534379
type and number of dimensions as template arguments, such as ``auto r =
Packit 534379
myarray.mutable_unchecked<float, 2>()``.
Packit 534379
Packit 534379
If the number of dimensions is not known at compile time, you can omit the
Packit 534379
dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
Packit 534379
``arr.unchecked<T>()``.  This will give you a proxy object that works in the
Packit 534379
same way, but results in less optimizable code and thus a small efficiency
Packit 534379
loss in tight loops.
Packit 534379
Packit 534379
Note that the returned proxy object directly references the array's data, and
Packit 534379
only reads its shape, strides, and writeable flag when constructed.  You must
Packit 534379
take care to ensure that the referenced array is not destroyed or reshaped for
Packit 534379
the duration of the returned object, typically by limiting the scope of the
Packit 534379
returned instance.
Packit 534379
Packit 534379
The returned proxy object supports some of the same methods as ``py::array`` so
Packit 534379
that it can be used as a drop-in replacement for some existing, index-checked
Packit 534379
uses of ``py::array``:
Packit 534379
Packit 534379
- ``r.ndim()`` returns the number of dimensions
Packit 534379
Packit 534379
- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
Packit 534379
  the ``const T`` or ``T`` data, respectively, at the given indices.  The
Packit 534379
  latter is only available to proxies obtained via ``a.mutable_unchecked()``.
Packit 534379
Packit 534379
- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
Packit 534379
Packit 534379
- ``ndim()`` returns the number of dimensions.
Packit 534379
Packit 534379
- ``shape(n)`` returns the size of dimension ``n``
Packit 534379
Packit 534379
- ``size()`` returns the total number of elements (i.e. the product of the shapes).
Packit 534379
Packit 534379
- ``nbytes()`` returns the number of bytes used by the referenced elements
Packit 534379
  (i.e. ``itemsize()`` times ``size()``).
Packit 534379
Packit 534379
.. seealso::
Packit 534379
Packit 534379
    The file :file:`tests/test_numpy_array.cpp` contains additional examples
Packit 534379
    demonstrating the use of this feature.
Packit 534379
Packit 534379
Ellipsis
Packit 534379
========
Packit 534379
Packit 534379
Python 3 provides a convenient ``...`` ellipsis notation that is often used to
Packit 534379
slice multidimensional arrays. For instance, the following snippet extracts the
Packit 534379
middle dimensions of a tensor with the first and last index set to zero.
Packit 534379
Packit 534379
.. code-block:: python
Packit 534379
Packit 534379
   a = # a NumPy array
Packit 534379
   b = a[0, ..., 0]
Packit 534379
Packit 534379
The function ``py::ellipsis()`` function can be used to perform the same
Packit 534379
operation on the C++ side:
Packit 534379
Packit 534379
.. code-block:: cpp
Packit 534379
Packit 534379
   py::array a = /* A NumPy array */;
Packit 534379
   py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];