NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:
The points about sequence size and speed are particularly important in
scientific computing. As a simple example, consider the case of
multiplying each element in a 1-D sequence with the corresponding
element in another sequence of the same length. If the data are
stored in two Python lists, a
and b
, we could iterate over
each element:
c = [] for i in range(len(a)): c.append(a[i]*b[i])
This produces the correct answer, but if a
and b
each contain
millions of numbers, we will pay the price for the inefficiencies of
looping in Python. We could accomplish the same task much more
quickly in C by writing (for clarity we neglect variable declarations
and initializations, memory allocation, etc.)
for (i = 0; i < rows; i++): { c[i] = a[i]*b[i]; }
This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense of the benefits gained from coding in Python. Furthermore, the coding work required increases with the dimensionality of our data. In the case of a 2-D array, for example, the C code (abridged as before) expands to
for (i = 0; i < rows; i++): { for (j = 0; j < columns; j++): { c[i][j] = a[i][j]*b[i][j]; } }
NumPy gives us the best of both worlds: element-by-element operations are the "default mode" when an ndarray is involved, but the element-by-element operation is speedily executed by pre-compiled C code. In NumPy
c = a * b
does what the earlier examples do, at near-C speeds, but with the code simplicity we expect from something based on Python. Indeed, the NumPy idiom is even simpler! This last example illustrates two of NumPy's features which are the basis of much of its power: vectorization and broadcasting.
Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just "behind the scenes" in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:
for
loops.Broadcasting is the term used to describe the implicit
element-by-element behavior of operations; generally speaking, in
NumPy all operations, not just arithmetic operations, but
logical, bit-wise, functional, etc., behave in this implicit
element-by-element fashion, i.e., they broadcast. Moreover, in the
example above, a
and b
could be multidimensional arrays of the
same shape, or a scalar and an array, or even two arrays of with
different shapes, provided that the smaller array is "expandable" to
the shape of the larger in such a way that the resulting broadcast is
unambiguous. For detailed "rules" of broadcasting see
numpy.doc.broadcasting.
NumPy fully supports an object-oriented approach, starting, once again, with ndarray. For example, ndarray is a class, possessing numerous methods and attributes. Many of its methods mirror functions in the outer-most NumPy namespace, giving the programmer complete freedom to code in whichever paradigm she prefers and/or which seems most appropriate to the task at hand.