Blame doc/sort.rst

Packit 67cb25
.. index:: sorting, heapsort
Packit 67cb25
Packit 67cb25
*******
Packit 67cb25
Sorting
Packit 67cb25
*******
Packit 67cb25
Packit 67cb25
This chapter describes functions for sorting data, both directly and
Packit 67cb25
indirectly (using an index).  All the functions use the *heapsort*
Packit 67cb25
algorithm.  Heapsort is an :math:`O(N \log N)` algorithm which operates
Packit 67cb25
in-place and does not require any additional storage.  It also provides
Packit 67cb25
consistent performance, the running time for its worst-case (ordered
Packit 67cb25
data) being not significantly longer than the average and best cases.
Packit 67cb25
Note that the heapsort algorithm does not preserve the relative ordering
Packit 67cb25
of equal elements---it is an *unstable* sort.  However the resulting
Packit 67cb25
order of equal elements will be consistent across different platforms
Packit 67cb25
when using these functions.
Packit 67cb25
Packit 67cb25
Sorting objects
Packit 67cb25
===============
Packit 67cb25
Packit 67cb25
The following function provides a simple alternative to the standard
Packit 67cb25
library function :func:`qsort`.  It is intended for systems lacking
Packit 67cb25
:func:`qsort`, not as a replacement for it.  The function :func:`qsort`
Packit 67cb25
should be used whenever possible, as it will be faster and can provide
Packit 67cb25
stable ordering of equal elements.  Documentation for :func:`qsort` is
Packit 67cb25
available in the GNU C Library Reference Manual.
Packit 67cb25
Packit 67cb25
The functions described in this section are defined in the header file
Packit 67cb25
:file:`gsl_heapsort.h`.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: comparison functions, definition
Packit 67cb25
Packit 67cb25
.. function:: void gsl_heapsort (void * array, size_t count, size_t size, gsl_comparison_fn_t compare)
Packit 67cb25
Packit 67cb25
   This function sorts the :data:`count` elements of the array :data:`array`,
Packit 67cb25
   each of size :data:`size`, into ascending order using the comparison
Packit 67cb25
   function :data:`compare`.  The type of the comparison function is defined by
Packit 67cb25
Packit 67cb25
   .. type:: gsl_comparison_fn_t
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         int (*gsl_comparison_fn_t) (const void * a, const void * b)
Packit 67cb25
Packit 67cb25
   A comparison function should return a negative integer if the first
Packit 67cb25
   argument is less than the second argument, :code:`0` if the two arguments
Packit 67cb25
   are equal and a positive integer if the first argument is greater than
Packit 67cb25
   the second argument.
Packit 67cb25
Packit 67cb25
   For example, the following function can be used to sort doubles into
Packit 67cb25
   ascending numerical order.
Packit 67cb25
Packit 67cb25
   ::
Packit 67cb25
Packit 67cb25
      int
Packit 67cb25
      compare_doubles (const double * a, const double * b)
Packit 67cb25
      {
Packit 67cb25
        if (*a > *b)
Packit 67cb25
          return 1;
Packit 67cb25
        else if (*a < *b)
Packit 67cb25
          return -1;
Packit 67cb25
        else
Packit 67cb25
          return 0;
Packit 67cb25
      }
Packit 67cb25
Packit 67cb25
   The appropriate function call to perform the sort is::
Packit 67cb25
Packit 67cb25
      gsl_heapsort (array, count, sizeof(double), compare_doubles);
Packit 67cb25
Packit 67cb25
   Note that unlike :func:`qsort` the heapsort algorithm cannot be made into
Packit 67cb25
   a stable sort by pointer arithmetic.  The trick of comparing pointers for
Packit 67cb25
   equal elements in the comparison function does not work for the heapsort
Packit 67cb25
   algorithm.  The heapsort algorithm performs an internal rearrangement of
Packit 67cb25
   the data which destroys its initial ordering.
Packit 67cb25
Packit 67cb25
.. index:: indirect sorting
Packit 67cb25
Packit 67cb25
.. function:: int gsl_heapsort_index (size_t * p, const void * array, size_t count, size_t size, gsl_comparison_fn_t compare)
Packit 67cb25
Packit 67cb25
   This function indirectly sorts the :data:`count` elements of the array
Packit 67cb25
   :data:`array`, each of size :data:`size`, into ascending order using the
Packit 67cb25
   comparison function :data:`compare`.  The resulting permutation is stored
Packit 67cb25
   in :data:`p`, an array of length :data:`n`.  The elements of :data:`p` give the
Packit 67cb25
   index of the array element which would have been stored in that position
Packit 67cb25
   if the array had been sorted in place.  The first element of :data:`p`
Packit 67cb25
   gives the index of the least element in :data:`array`, and the last
Packit 67cb25
   element of :data:`p` gives the index of the greatest element in
Packit 67cb25
   :data:`array`.  The array itself is not changed.
Packit 67cb25
Packit 67cb25
Sorting vectors
Packit 67cb25
===============
Packit 67cb25
Packit 67cb25
The following functions will sort the elements of an array or vector,
Packit 67cb25
either directly or indirectly.  They are defined for all real and integer
Packit 67cb25
types using the normal suffix rules.  For example, the :code:`float`
Packit 67cb25
versions of the array functions are :func:`gsl_sort_float` and
Packit 67cb25
:func:`gsl_sort_float_index`.  The corresponding vector functions are
Packit 67cb25
:func:`gsl_sort_vector_float` and :func:`gsl_sort_vector_float_index`.  The
Packit 67cb25
prototypes are available in the header files :file:`gsl_sort_float.h`
Packit 67cb25
:file:`gsl_sort_vector_float.h`.  The complete set of prototypes can be
Packit 67cb25
included using the header files :file:`gsl_sort.h` and
Packit 67cb25
:file:`gsl_sort_vector.h`.
Packit 67cb25
Packit 67cb25
There are no functions for sorting complex arrays or vectors, since the
Packit 67cb25
ordering of complex numbers is not uniquely defined.  To sort a complex
Packit 67cb25
vector by magnitude compute a real vector containing the magnitudes
Packit 67cb25
of the complex elements, and sort this vector indirectly.  The resulting
Packit 67cb25
index gives the appropriate ordering of the original complex vector.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: sorting vector elements
Packit 67cb25
   single: vector, sorting elements of
Packit 67cb25
Packit 67cb25
.. function:: void gsl_sort (double * data, const size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function sorts the :data:`n` elements of the array :data:`data` with
Packit 67cb25
   stride :data:`stride` into ascending numerical order.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_sort2 (double * data1, const size_t stride1, double * data2, const size_t stride2, size_t n)
Packit 67cb25
Packit 67cb25
   This function sorts the :data:`n` elements of the array :data:`data1` with
Packit 67cb25
   stride :data:`stride1` into ascending numerical order, while making the
Packit 67cb25
   same rearrangement of the array :data:`data2` with stride :data:`stride2`,
Packit 67cb25
   also of size :data:`n`.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_sort_vector (gsl_vector * v)
Packit 67cb25
Packit 67cb25
   This function sorts the elements of the vector :data:`v` into ascending
Packit 67cb25
   numerical order.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_sort_vector2 (gsl_vector * v1, gsl_vector * v2)
Packit 67cb25
Packit 67cb25
   This function sorts the elements of the vector :data:`v1` into ascending
Packit 67cb25
   numerical order, while making the same rearrangement of the vector :data:`v2`.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: indirect sorting, of vector elements
Packit 67cb25
Packit 67cb25
.. function:: void gsl_sort_index (size_t * p, const double * data, size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function indirectly sorts the :data:`n` elements of the array
Packit 67cb25
   :data:`data` with stride :data:`stride` into ascending order, storing the
Packit 67cb25
   resulting permutation in :data:`p`.  The array :data:`p` must be allocated with
Packit 67cb25
   a sufficient length to store the :data:`n` elements of the permutation.
Packit 67cb25
   The elements of :data:`p` give the index of the array element which would
Packit 67cb25
   have been stored in that position if the array had been sorted in place.
Packit 67cb25
   The array :data:`data` is not changed.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_vector_index (gsl_permutation * p, const gsl_vector * v)
Packit 67cb25
Packit 67cb25
   This function indirectly sorts the elements of the vector :data:`v` into
Packit 67cb25
   ascending order, storing the resulting permutation in :data:`p`.  The
Packit 67cb25
   elements of :data:`p` give the index of the vector element which would
Packit 67cb25
   have been stored in that position if the vector had been sorted in
Packit 67cb25
   place.  The first element of :data:`p` gives the index of the least element
Packit 67cb25
   in :data:`v`, and the last element of :data:`p` gives the index of the
Packit 67cb25
   greatest element in :data:`v`.  The vector :data:`v` is not changed.
Packit 67cb25
Packit 67cb25
Selecting the k smallest or largest elements
Packit 67cb25
============================================
Packit 67cb25
Packit 67cb25
The functions described in this section select the :math:`k` smallest
Packit 67cb25
or largest elements of a data set of size :math:`N`.  The routines use an
Packit 67cb25
:math:`O(kN)` direct insertion algorithm which is suited to subsets that
Packit 67cb25
are small compared with the total size of the dataset. For example, the
Packit 67cb25
routines are useful for selecting the 10 largest values from one million
Packit 67cb25
data points, but not for selecting the largest 100,000 values.  If the
Packit 67cb25
subset is a significant part of the total dataset it may be faster
Packit 67cb25
to sort all the elements of the dataset directly with an :math:`O(N \log N)`
Packit 67cb25
algorithm and obtain the smallest or largest values that way.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_smallest (double * dest, size_t k, const double * src, size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function copies the :data:`k` smallest elements of the array
Packit 67cb25
   :data:`src`, of size :data:`n` and stride :data:`stride`, in ascending
Packit 67cb25
   numerical order into the array :data:`dest`.  The size :data:`k` of the subset must be
Packit 67cb25
   less than or equal to :data:`n`.  The data :data:`src` is not modified by
Packit 67cb25
   this operation.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_largest (double * dest, size_t k, const double * src, size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function copies the :data:`k` largest elements of the array
Packit 67cb25
   :data:`src`, of size :data:`n` and stride :data:`stride`, in descending
Packit 67cb25
   numerical order into the array :data:`dest`. :data:`k` must be
Packit 67cb25
   less than or equal to :data:`n`. The data :data:`src` is not modified by
Packit 67cb25
   this operation.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_vector_smallest (double * dest, size_t k, const gsl_vector * v)
Packit 67cb25
              int gsl_sort_vector_largest (double * dest, size_t k, const gsl_vector * v)
Packit 67cb25
Packit 67cb25
   These functions copy the :data:`k` smallest or largest elements of the
Packit 67cb25
   vector :data:`v` into the array :data:`dest`. :data:`k`
Packit 67cb25
   must be less than or equal to the length of the vector :data:`v`.
Packit 67cb25
Packit 67cb25
The following functions find the indices of the :math:`k` smallest or
Packit 67cb25
largest elements of a dataset.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_smallest_index (size_t * p, size_t k, const double * src, size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function stores the indices of the :data:`k` smallest elements of
Packit 67cb25
   the array :data:`src`, of size :data:`n` and stride :data:`stride`, in the
Packit 67cb25
   array :data:`p`.  The indices are chosen so that the corresponding data is
Packit 67cb25
   in ascending numerical order.  :data:`k` must be
Packit 67cb25
   less than or equal to :data:`n`. The data :data:`src` is not modified by
Packit 67cb25
   this operation.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_largest_index (size_t * p, size_t k, const double * src, size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function stores the indices of the :data:`k` largest elements of
Packit 67cb25
   the array :data:`src`, of size :data:`n` and stride :data:`stride`, in the
Packit 67cb25
   array :data:`p`.  The indices are chosen so that the corresponding data is
Packit 67cb25
   in descending numerical order.  :data:`k` must be
Packit 67cb25
   less than or equal to :data:`n`. The data :data:`src` is not modified by
Packit 67cb25
   this operation.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_sort_vector_smallest_index (size_t * p, size_t k, const gsl_vector * v)
Packit 67cb25
              int gsl_sort_vector_largest_index (size_t * p, size_t k, const gsl_vector * v)
Packit 67cb25
Packit 67cb25
   These functions store the indices of the :data:`k` smallest or largest
Packit 67cb25
   elements of the vector :data:`v` in the array :data:`p`. :data:`k` must be less than or equal to the length of the vector
Packit 67cb25
   :data:`v`.
Packit 67cb25
Packit 67cb25
Computing the rank
Packit 67cb25
==================
Packit 67cb25
Packit 67cb25
The *rank* of an element is its order in the sorted data.  The rank
Packit 67cb25
is the inverse of the index permutation, :math:`p`.  It can be computed
Packit 67cb25
using the following algorithm::
Packit 67cb25
Packit 67cb25
  for (i = 0; i < p->size; i++) 
Packit 67cb25
    {
Packit 67cb25
      size_t pi = p->data[i];
Packit 67cb25
      rank->data[pi] = i;
Packit 67cb25
    }
Packit 67cb25
Packit 67cb25
This can be computed directly from the function
Packit 67cb25
:code:`gsl_permutation_inverse(rank,p)`.
Packit 67cb25
Packit 67cb25
The following function will print the rank of each element of the vector
Packit 67cb25
:math:`v`::
Packit 67cb25
Packit 67cb25
  void
Packit 67cb25
  print_rank (gsl_vector * v)
Packit 67cb25
  {
Packit 67cb25
    size_t i;
Packit 67cb25
    size_t n = v->size;
Packit 67cb25
    gsl_permutation * perm = gsl_permutation_alloc(n);
Packit 67cb25
    gsl_permutation * rank = gsl_permutation_alloc(n);
Packit 67cb25
Packit 67cb25
    gsl_sort_vector_index (perm, v);
Packit 67cb25
    gsl_permutation_inverse (rank, perm);
Packit 67cb25
Packit 67cb25
    for (i = 0; i < n; i++)
Packit 67cb25
      {
Packit 67cb25
        double vi = gsl_vector_get(v, i);
Packit 67cb25
        printf ("element = %d, value = %g, rank = %d\n",
Packit 67cb25
                 i, vi, rank->data[i]);
Packit 67cb25
      }
Packit 67cb25
Packit 67cb25
    gsl_permutation_free (perm);
Packit 67cb25
    gsl_permutation_free (rank);
Packit 67cb25
  }
Packit 67cb25
Packit 67cb25
Examples
Packit 67cb25
========
Packit 67cb25
Packit 67cb25
The following example shows how to use the permutation :math:`p` to print
Packit 67cb25
the elements of the vector :math:`v` in ascending order::
Packit 67cb25
Packit 67cb25
  gsl_sort_vector_index (p, v);
Packit 67cb25
Packit 67cb25
  for (i = 0; i < v->size; i++)
Packit 67cb25
    {
Packit 67cb25
      double vpi = gsl_vector_get (v, p->data[i]);
Packit 67cb25
      printf ("order = %d, value = %g\n", i, vpi);
Packit 67cb25
    }
Packit 67cb25
Packit 67cb25
The next example uses the function :func:`gsl_sort_smallest` to select
Packit 67cb25
the 5 smallest numbers from 100000 uniform random variates stored in an
Packit 67cb25
array,
Packit 67cb25
Packit 67cb25
.. include:: examples/sortsmall.c
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
The output lists the 5 smallest values, in ascending order,
Packit 67cb25
Packit 67cb25
.. include:: examples/sortsmall.txt
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
References and Further Reading
Packit 67cb25
==============================
Packit 67cb25
Packit 67cb25
The subject of sorting is covered extensively in the following,
Packit 67cb25
Packit 67cb25
* Donald E. Knuth, The Art of Computer Programming: Sorting and
Packit 67cb25
  Searching (Vol 3, 3rd Ed, 1997), Addison-Wesley, ISBN 0201896850.
Packit 67cb25
Packit 67cb25
The Heapsort algorithm is described in the following book,
Packit 67cb25
Packit 67cb25
* Robert Sedgewick, Algorithms in C, Addison-Wesley, 
Packit 67cb25
  ISBN 0201514257.