|
Packit |
67cb25 |
.. index::
|
|
Packit |
67cb25 |
single: running statistics
|
|
Packit |
67cb25 |
single: online statistics
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
******************
|
|
Packit |
67cb25 |
Running Statistics
|
|
Packit |
67cb25 |
******************
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This chapter describes routines for computing running statistics,
|
|
Packit |
67cb25 |
also known as online statistics, of data. These routines are
|
|
Packit |
67cb25 |
suitable for handling large datasets for which it may be
|
|
Packit |
67cb25 |
inconvenient or impractical to store in memory all at once.
|
|
Packit |
67cb25 |
The data can be processed in a single pass, one point at a time.
|
|
Packit |
67cb25 |
Each time a data point is added to the accumulator, internal
|
|
Packit |
67cb25 |
parameters are updated in order to compute the current mean, variance,
|
|
Packit |
67cb25 |
standard deviation, skewness, and kurtosis. These statistics are
|
|
Packit |
67cb25 |
exact, and are updated with numerically stable single-pass algorithms.
|
|
Packit |
67cb25 |
The median and arbitrary quantiles are also available, however these
|
|
Packit |
67cb25 |
calculations use algorithms which provide approximations, and grow
|
|
Packit |
67cb25 |
more accurate as more data is added to the accumulator.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
The functions described in this chapter are declared in the header file
|
|
Packit |
67cb25 |
:file:`gsl_rstat.h`.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
Initializing the Accumulator
|
|
Packit |
67cb25 |
============================
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. type:: gsl_rstat_workspace
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This workspace contains parameters used to calculate various statistics
|
|
Packit |
67cb25 |
and are updated after each data point is added to the accumulator.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: gsl_rstat_workspace * gsl_rstat_alloc (void)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function allocates a workspace for computing running statistics.
|
|
Packit |
67cb25 |
The size of the workspace is :math:`O(1)`.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: void gsl_rstat_free (gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function frees the memory associated with the workspace :data:`w`.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: int gsl_rstat_reset (gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function resets the workspace :data:`w` to its initial state,
|
|
Packit |
67cb25 |
so it can begin working on a new set of data.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
Adding Data to the Accumulator
|
|
Packit |
67cb25 |
==============================
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: int gsl_rstat_add (const double x, gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function adds the data point :data:`x` to the statistical
|
|
Packit |
67cb25 |
accumulator, updating calculations of the mean, variance,
|
|
Packit |
67cb25 |
standard deviation, skewness, kurtosis, and median.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: size_t gsl_rstat_n (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the number of data so far added to the accumulator.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
Current Statistics
|
|
Packit |
67cb25 |
==================
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_min (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the minimum value added to the accumulator.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_max (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the maximum value added to the accumulator.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_mean (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the mean of all data added to the accumulator,
|
|
Packit |
67cb25 |
defined as
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: not texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. math:: {\Hat\mu} = {1 \over N} \sum x_i
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
\Hat\mu = (1/N) \sum x_i
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_variance (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the variance of all data added to the accumulator,
|
|
Packit |
67cb25 |
defined as
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: not texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. math:: {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_sd (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the standard deviation of all data added to the
|
|
Packit |
67cb25 |
accumulator, defined as the square root of the variance given above.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_sd_mean (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the standard deviation of the mean, defined as
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: not texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. math:: \Hat\sigma_{\Hat\mu} = {\Hat\sigma \over \sqrt{N}}
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
sd_mean = \Hat\sigma / \sqrt{N}
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_rms (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the root mean square of all data added to the
|
|
Packit |
67cb25 |
accumulator, defined as
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. math:: rms = \sqrt{{1 \over N} \sum x_i^2}
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_skew (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the skewness of all data added to the accumulator,
|
|
Packit |
67cb25 |
defined as
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: not texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. math::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
skew = {1 \over N} \sum
|
|
Packit |
67cb25 |
{\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_kurtosis (const gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the kurtosis of all data added to the accumulator,
|
|
Packit |
67cb25 |
defined as
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: not texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. math::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
kurtosis = \left( {1 \over N} \sum
|
|
Packit |
67cb25 |
{\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4
|
|
Packit |
67cb25 |
\right)
|
|
Packit |
67cb25 |
- 3
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. only:: texinfo
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
::
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_median (gsl_rstat_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns an estimate of the median of the data added to
|
|
Packit |
67cb25 |
the accumulator.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
Quantiles
|
|
Packit |
67cb25 |
=========
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
The functions in this section estimate quantiles dynamically without
|
|
Packit |
67cb25 |
storing the entire dataset, using the algorithm of Jain and Chlamtec, 1985.
|
|
Packit |
67cb25 |
Only five points (markers) are stored which represent the minimum
|
|
Packit |
67cb25 |
and maximum of the data, as well as current estimates of the
|
|
Packit |
67cb25 |
:math:`p/2`-, :math:`p`-, and :math:`(1+p)/2`-quantiles. Each time
|
|
Packit |
67cb25 |
a new data point is added, the marker positions and heights are
|
|
Packit |
67cb25 |
updated.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. type:: gsl_rstat_quantile_workspace
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This workspace contains parameters for estimating quantiles of the
|
|
Packit |
67cb25 |
current dataset
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: gsl_rstat_quantile_workspace * gsl_rstat_quantile_alloc (const double p)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function allocates a workspace for the dynamic estimation of
|
|
Packit |
67cb25 |
:data:`p`-quantiles, where :data:`p` is between :math:`0` and :math:`1`.
|
|
Packit |
67cb25 |
The median corresponds to :math:`p = 0.5`. The size of the workspace
|
|
Packit |
67cb25 |
is :math:`O(1)`.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: void gsl_rstat_quantile_free (gsl_rstat_quantile_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function frees the memory associated with the workspace :data:`w`.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: int gsl_rstat_quantile_reset (gsl_rstat_quantile_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function resets the workspace :data:`w` to its initial state,
|
|
Packit |
67cb25 |
so it can begin working on a new set of data.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: int gsl_rstat_quantile_add (const double x, gsl_rstat_quantile_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function updates the estimate of the :math:`p`-quantile with
|
|
Packit |
67cb25 |
the new data point :data:`x`.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. function:: double gsl_rstat_quantile_get (gsl_rstat_quantile_workspace * w)
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This function returns the current estimate of the :math:`p`-quantile.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
Examples
|
|
Packit |
67cb25 |
========
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
Here is a basic example of how to use the statistical functions:
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. include:: examples/rstat.c
|
|
Packit |
67cb25 |
:code:
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
The program should produce the following output,
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. include:: examples/rstat.txt
|
|
Packit |
67cb25 |
:code:
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
This next program estimates the lower quartile, median and upper
|
|
Packit |
67cb25 |
quartile from 10,000 samples of a random Rayleigh distribution,
|
|
Packit |
67cb25 |
using the :math:`P^2` algorithm of Jain and Chlamtec. For
|
|
Packit |
67cb25 |
comparison, the exact values are also computed from the sorted
|
|
Packit |
67cb25 |
dataset.
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. include:: examples/rquantile.c
|
|
Packit |
67cb25 |
:code:
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
The program should produce the following output,
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
.. include:: examples/rquantile.txt
|
|
Packit |
67cb25 |
:code:
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
References and Further Reading
|
|
Packit |
67cb25 |
==============================
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
The algorithm used to dynamically estimate :math:`p`-quantiles is described
|
|
Packit |
67cb25 |
in the paper,
|
|
Packit |
67cb25 |
|
|
Packit |
67cb25 |
* R. Jain and I. Chlamtac.
|
|
Packit |
67cb25 |
*The P^2 algorithm for dynamic calculation of quantiles and histograms without storing observations*,
|
|
Packit |
67cb25 |
Communications of the ACM, Volume 28 (October), Number 10, 1985,
|
|
Packit |
67cb25 |
p. 1076-1085.
|