Blame doc/rstat.rst

Packit 67cb25
.. index::
Packit 67cb25
   single: running statistics
Packit 67cb25
   single: online statistics
Packit 67cb25
Packit 67cb25
******************
Packit 67cb25
Running Statistics
Packit 67cb25
******************
Packit 67cb25
Packit 67cb25
This chapter describes routines for computing running statistics,
Packit 67cb25
also known as online statistics, of data. These routines are
Packit 67cb25
suitable for handling large datasets for which it may be
Packit 67cb25
inconvenient or impractical to store in memory all at once.
Packit 67cb25
The data can be processed in a single pass, one point at a time.
Packit 67cb25
Each time a data point is added to the accumulator, internal
Packit 67cb25
parameters are updated in order to compute the current mean, variance,
Packit 67cb25
standard deviation, skewness, and kurtosis. These statistics are
Packit 67cb25
exact, and are updated with numerically stable single-pass algorithms.
Packit 67cb25
The median and arbitrary quantiles are also available, however these
Packit 67cb25
calculations use algorithms which provide approximations, and grow
Packit 67cb25
more accurate as more data is added to the accumulator.
Packit 67cb25
Packit 67cb25
The functions described in this chapter are declared in the header file
Packit 67cb25
:file:`gsl_rstat.h`.
Packit 67cb25
Packit 67cb25
Initializing the Accumulator
Packit 67cb25
============================
Packit 67cb25
Packit 67cb25
.. type:: gsl_rstat_workspace
Packit 67cb25
Packit 67cb25
   This workspace contains parameters used to calculate various statistics
Packit 67cb25
   and are updated after each data point is added to the accumulator.
Packit 67cb25
Packit 67cb25
.. function:: gsl_rstat_workspace * gsl_rstat_alloc (void)
Packit 67cb25
Packit 67cb25
   This function allocates a workspace for computing running statistics.
Packit 67cb25
   The size of the workspace is :math:`O(1)`.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_rstat_free (gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function frees the memory associated with the workspace :data:`w`.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_rstat_reset (gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function resets the workspace :data:`w` to its initial state,
Packit 67cb25
   so it can begin working on a new set of data.
Packit 67cb25
Packit 67cb25
Adding Data to the Accumulator
Packit 67cb25
==============================
Packit 67cb25
Packit 67cb25
.. function:: int gsl_rstat_add (const double x, gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function adds the data point :data:`x` to the statistical
Packit 67cb25
   accumulator, updating calculations of the mean, variance,
Packit 67cb25
   standard deviation, skewness, kurtosis, and median.
Packit 67cb25
Packit 67cb25
.. function:: size_t gsl_rstat_n (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the number of data so far added to the accumulator.
Packit 67cb25
Packit 67cb25
Current Statistics
Packit 67cb25
==================
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_min (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the minimum value added to the accumulator.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_max (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the maximum value added to the accumulator.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_mean (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the mean of all data added to the accumulator,
Packit 67cb25
   defined as
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\Hat\mu} = {1 \over N} \sum x_i
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\mu = (1/N) \sum x_i
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_variance (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the variance of all data added to the accumulator,
Packit 67cb25
   defined as
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_sd (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the standard deviation of all data added to the
Packit 67cb25
   accumulator, defined as the square root of the variance given above.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_sd_mean (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the standard deviation of the mean, defined as
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
   
Packit 67cb25
      .. math:: \Hat\sigma_{\Hat\mu} = {\Hat\sigma \over \sqrt{N}}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         sd_mean = \Hat\sigma / \sqrt{N}
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_rms (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the root mean square of all data added to the
Packit 67cb25
   accumulator, defined as
Packit 67cb25
Packit 67cb25
   .. math:: rms = \sqrt{{1 \over N} \sum x_i^2}
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_skew (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the skewness of all data added to the accumulator,
Packit 67cb25
   defined as
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         skew = {1 \over N} \sum 
Packit 67cb25
          {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_kurtosis (const gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the kurtosis of all data added to the accumulator,
Packit 67cb25
   defined as
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         kurtosis = \left( {1 \over N} \sum 
Packit 67cb25
          {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 
Packit 67cb25
          \right) 
Packit 67cb25
          - 3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4)  - 3
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_median (gsl_rstat_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns an estimate of the median of the data added to
Packit 67cb25
   the accumulator.
Packit 67cb25
Packit 67cb25
Quantiles
Packit 67cb25
=========
Packit 67cb25
Packit 67cb25
The functions in this section estimate quantiles dynamically without
Packit 67cb25
storing the entire dataset, using the algorithm of Jain and Chlamtec, 1985.
Packit 67cb25
Only five points (markers) are stored which represent the minimum
Packit 67cb25
and maximum of the data, as well as current estimates of the
Packit 67cb25
:math:`p/2`-, :math:`p`-, and :math:`(1+p)/2`-quantiles. Each time
Packit 67cb25
a new data point is added, the marker positions and heights are
Packit 67cb25
updated.
Packit 67cb25
Packit 67cb25
.. type:: gsl_rstat_quantile_workspace
Packit 67cb25
Packit 67cb25
   This workspace contains parameters for estimating quantiles of the
Packit 67cb25
   current dataset
Packit 67cb25
Packit 67cb25
.. function:: gsl_rstat_quantile_workspace * gsl_rstat_quantile_alloc (const double p)
Packit 67cb25
Packit 67cb25
   This function allocates a workspace for the dynamic estimation of
Packit 67cb25
   :data:`p`-quantiles, where :data:`p` is between :math:`0` and :math:`1`.
Packit 67cb25
   The median corresponds to :math:`p = 0.5`. The size of the workspace
Packit 67cb25
   is :math:`O(1)`.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_rstat_quantile_free (gsl_rstat_quantile_workspace * w)
Packit 67cb25
Packit 67cb25
   This function frees the memory associated with the workspace :data:`w`.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_rstat_quantile_reset (gsl_rstat_quantile_workspace * w)
Packit 67cb25
Packit 67cb25
   This function resets the workspace :data:`w` to its initial state,
Packit 67cb25
   so it can begin working on a new set of data.
Packit 67cb25
Packit 67cb25
.. function:: int gsl_rstat_quantile_add (const double x, gsl_rstat_quantile_workspace * w)
Packit 67cb25
Packit 67cb25
   This function updates the estimate of the :math:`p`-quantile with
Packit 67cb25
   the new data point :data:`x`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_rstat_quantile_get (gsl_rstat_quantile_workspace * w)
Packit 67cb25
Packit 67cb25
   This function returns the current estimate of the :math:`p`-quantile.
Packit 67cb25
Packit 67cb25
Examples
Packit 67cb25
========
Packit 67cb25
Packit 67cb25
Here is a basic example of how to use the statistical functions:
Packit 67cb25
Packit 67cb25
.. include:: examples/rstat.c
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
The program should produce the following output,
Packit 67cb25
Packit 67cb25
.. include:: examples/rstat.txt
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
This next program estimates the lower quartile, median and upper
Packit 67cb25
quartile from 10,000 samples of a random Rayleigh distribution,
Packit 67cb25
using the :math:`P^2` algorithm of Jain and Chlamtec. For
Packit 67cb25
comparison, the exact values are also computed from the sorted
Packit 67cb25
dataset.
Packit 67cb25
Packit 67cb25
.. include:: examples/rquantile.c
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
The program should produce the following output,
Packit 67cb25
Packit 67cb25
.. include:: examples/rquantile.txt
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
References and Further Reading
Packit 67cb25
==============================
Packit 67cb25
Packit 67cb25
The algorithm used to dynamically estimate :math:`p`-quantiles is described
Packit 67cb25
in the paper,
Packit 67cb25
Packit 67cb25
* R. Jain and I. Chlamtac.
Packit 67cb25
  *The P^2 algorithm for dynamic calculation of quantiles and histograms without storing observations*,
Packit 67cb25
  Communications of the ACM, Volume 28 (October), Number 10, 1985,
Packit 67cb25
  p. 1076-1085.