Blame doc/statistics.rst

Packit 67cb25
.. index::
Packit 67cb25
   single: statistics
Packit 67cb25
   single: mean
Packit 67cb25
   single: standard deviation
Packit 67cb25
   single: variance
Packit 67cb25
   single: estimated standard deviation
Packit 67cb25
   single: estimated variance
Packit 67cb25
   single: t-test
Packit 67cb25
   single: range
Packit 67cb25
   single: min
Packit 67cb25
   single: max
Packit 67cb25
Packit 67cb25
**********
Packit 67cb25
Statistics
Packit 67cb25
**********
Packit 67cb25
Packit 67cb25
This chapter describes the statistical functions in the library.  The
Packit 67cb25
basic statistical functions include routines to compute the mean,
Packit 67cb25
variance and standard deviation.  More advanced functions allow you to
Packit 67cb25
calculate absolute deviations, skewness, and kurtosis as well as the
Packit 67cb25
median and arbitrary percentiles.  The algorithms use recurrence
Packit 67cb25
relations to compute average quantities in a stable way, without large
Packit 67cb25
intermediate values that might overflow. 
Packit 67cb25
Packit 67cb25
The functions are available in versions for datasets in the standard
Packit 67cb25
floating-point and integer types.  The versions for double precision
Packit 67cb25
floating-point data have the prefix :code:`gsl_stats` and are declared in
Packit 67cb25
the header file :file:`gsl_statistics_double.h`.  The versions for integer
Packit 67cb25
data have the prefix :code:`gsl_stats_int` and are declared in the header
Packit 67cb25
file :file:`gsl_statistics_int.h`.   All the functions operate on C 
Packit 67cb25
arrays with a :code:`stride` parameter specifying the spacing between 
Packit 67cb25
elements.  
Packit 67cb25
Packit 67cb25
Mean, Standard Deviation and Variance
Packit 67cb25
=====================================
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_mean (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the arithmetic mean of :data:`data`, a dataset of
Packit 67cb25
   length :data:`n` with stride :data:`stride`.  The arithmetic mean, or
Packit 67cb25
   *sample mean*, is denoted by :math:`\Hat\mu` and defined as,
Packit 67cb25
Packit 67cb25
   .. math:: \Hat\mu = {1 \over N} \sum x_i
Packit 67cb25
Packit 67cb25
   where :math:`x_i` are the elements of the dataset :data:`data`.  For
Packit 67cb25
   samples drawn from a gaussian distribution the variance of
Packit 67cb25
   :math:`\Hat\mu` is :math:`\sigma^2 / N`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_variance (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the estimated, or *sample*, variance of
Packit 67cb25
   :data:`data`, a dataset of length :data:`n` with stride :data:`stride`.  The
Packit 67cb25
   estimated variance is denoted by :math:`\Hat\sigma^2` and is defined by,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
Packit 67cb25
Packit 67cb25
   where :math:`x_i` are the elements of the dataset :data:`data`.  Note that
Packit 67cb25
   the normalization factor of :math:`1/(N-1)` results from the derivation
Packit 67cb25
   of :math:`\Hat\sigma^2` as an unbiased estimator of the population
Packit 67cb25
   variance :math:`\sigma^2`.  For samples drawn from a Gaussian distribution
Packit 67cb25
   the variance of :math:`\Hat\sigma^2` itself is :math:`2 \sigma^4 / N`.
Packit 67cb25
Packit 67cb25
   This function computes the mean via a call to :func:`gsl_stats_mean`.  If
Packit 67cb25
   you have already computed the mean then you can pass it directly to
Packit 67cb25
   :func:`gsl_stats_variance_m`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_variance_m (const double data[], size_t stride, size_t n, double mean)
Packit 67cb25
Packit 67cb25
   This function returns the sample variance of :data:`data` relative to the
Packit 67cb25
   given value of :data:`mean`.  The function is computed with :math:`\Hat\mu`
Packit 67cb25
   replaced by the value of :data:`mean` that you supply,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_sd (const double data[], size_t stride, size_t n)
Packit 67cb25
              double gsl_stats_sd_m (const double data[], size_t stride, size_t n, double mean)
Packit 67cb25
Packit 67cb25
   The standard deviation is defined as the square root of the variance.
Packit 67cb25
   These functions return the square root of the corresponding variance
Packit 67cb25
   functions above.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_tss (const double data[], size_t stride, size_t n)
Packit 67cb25
              double gsl_stats_tss_m (const double data[], size_t stride, size_t n, double mean)
Packit 67cb25
Packit 67cb25
   These functions return the total sum of squares (TSS) of :data:`data` about
Packit 67cb25
   the mean.  For :func:`gsl_stats_tss_m` the user-supplied value of
Packit 67cb25
   :data:`mean` is used, and for :func:`gsl_stats_tss` it is computed using
Packit 67cb25
   :func:`gsl_stats_mean`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\rm TSS} = \sum (x_i - mean)^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         TSS =  \sum (x_i - mean)^2
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_variance_with_fixed_mean (const double data[], size_t stride, size_t n, double mean)
Packit 67cb25
Packit 67cb25
   This function computes an unbiased estimate of the variance of
Packit 67cb25
   :data:`data` when the population mean :data:`mean` of the underlying
Packit 67cb25
   distribution is known *a priori*.  In this case the estimator for
Packit 67cb25
   the variance uses the factor :math:`1/N` and the sample mean
Packit 67cb25
   :math:`\Hat\mu` is replaced by the known population mean :math:`\mu`,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_sd_with_fixed_mean (const double data[], size_t stride, size_t n, double mean)
Packit 67cb25
Packit 67cb25
   This function calculates the standard deviation of :data:`data` for a
Packit 67cb25
   fixed population mean :data:`mean`.  The result is the square root of the
Packit 67cb25
   corresponding variance function.
Packit 67cb25
Packit 67cb25
Absolute deviation
Packit 67cb25
==================
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_absdev (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the absolute deviation from the mean of
Packit 67cb25
   :data:`data`, a dataset of length :data:`n` with stride :data:`stride`.  The
Packit 67cb25
   absolute deviation from the mean is defined as,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: absdev  = {1 \over N} \sum |x_i - {\Hat\mu}|
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         absdev  = (1/N) \sum |x_i - \Hat\mu|
Packit 67cb25
Packit 67cb25
   where :math:`x_i` are the elements of the dataset :data:`data`.  The
Packit 67cb25
   absolute deviation from the mean provides a more robust measure of the
Packit 67cb25
   width of a distribution than the variance.  This function computes the
Packit 67cb25
   mean of :data:`data` via a call to :func:`gsl_stats_mean`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_absdev_m (const double data[], size_t stride, size_t n, double mean)
Packit 67cb25
Packit 67cb25
   This function computes the absolute deviation of the dataset :data:`data`
Packit 67cb25
   relative to the given value of :data:`mean`,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: absdev  = {1 \over N} \sum |x_i - mean|
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         absdev  = (1/N) \sum |x_i - mean|
Packit 67cb25
Packit 67cb25
   This function is useful if you have already computed the mean of
Packit 67cb25
   :data:`data` (and want to avoid recomputing it), or wish to calculate the
Packit 67cb25
   absolute deviation relative to another value (such as zero, or the
Packit 67cb25
   median).
Packit 67cb25
Packit 67cb25
.. index:: skewness, kurtosis
Packit 67cb25
Packit 67cb25
Higher moments (skewness and kurtosis)
Packit 67cb25
======================================
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_skew (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the skewness of :data:`data`, a dataset of length
Packit 67cb25
   :data:`n` with stride :data:`stride`.  The skewness is defined as,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         skew = {1 \over N} \sum 
Packit 67cb25
          {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
Packit 67cb25
Packit 67cb25
   where :math:`x_i` are the elements of the dataset :data:`data`.  The skewness
Packit 67cb25
   measures the asymmetry of the tails of a distribution.
Packit 67cb25
Packit 67cb25
   The function computes the mean and estimated standard deviation of
Packit 67cb25
   :data:`data` via calls to :func:`gsl_stats_mean` and :func:`gsl_stats_sd`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_skew_m_sd (const double data[], size_t stride, size_t n, double mean, double sd)
Packit 67cb25
Packit 67cb25
   This function computes the skewness of the dataset :data:`data` using the
Packit 67cb25
   given values of the mean :data:`mean` and standard deviation :data:`sd`,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: skew = {1 \over N} \sum {\left( x_i - mean \over sd \right)}^3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         skew = (1/N) \sum ((x_i - mean)/sd)^3
Packit 67cb25
Packit 67cb25
   These functions are useful if you have already computed the mean and
Packit 67cb25
   standard deviation of :data:`data` and want to avoid recomputing them.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_kurtosis (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the kurtosis of :data:`data`, a dataset of length
Packit 67cb25
   :data:`n` with stride :data:`stride`.  The kurtosis is defined as,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         kurtosis = \left( {1 \over N} \sum 
Packit 67cb25
          {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 
Packit 67cb25
          \right) 
Packit 67cb25
          - 3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4)  - 3
Packit 67cb25
Packit 67cb25
   The kurtosis measures how sharply peaked a distribution is, relative to
Packit 67cb25
   its width.  The kurtosis is normalized to zero for a Gaussian
Packit 67cb25
   distribution.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_kurtosis_m_sd (const double data[], size_t stride, size_t n, double mean, double sd)
Packit 67cb25
Packit 67cb25
   This function computes the kurtosis of the dataset :data:`data` using the
Packit 67cb25
   given values of the mean :data:`mean` and standard deviation :data:`sd`,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         kurtosis = {1 \over N}
Packit 67cb25
           \left( \sum {\left(x_i - mean \over sd \right)}^4 \right) 
Packit 67cb25
           - 3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3
Packit 67cb25
Packit 67cb25
   This function is useful if you have already computed the mean and
Packit 67cb25
   standard deviation of :data:`data` and want to avoid recomputing them.
Packit 67cb25
Packit 67cb25
Autocorrelation
Packit 67cb25
===============
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_lag1_autocorrelation (const double data[], const size_t stride, const size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the lag-1 autocorrelation of the dataset :data:`data`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
Packit 67cb25
         \over
Packit 67cb25
         \sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
Packit 67cb25
                \over
Packit 67cb25
                \sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_lag1_autocorrelation_m (const double data[], const size_t stride, const size_t n, const double mean)
Packit 67cb25
Packit 67cb25
   This function computes the lag-1 autocorrelation of the dataset
Packit 67cb25
   :data:`data` using the given value of the mean :data:`mean`.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: covariance, of two datasets
Packit 67cb25
Packit 67cb25
Covariance
Packit 67cb25
==========
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_covariance (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the covariance of the datasets :data:`data1` and
Packit 67cb25
   :data:`data2` which must both be of the same length :data:`n`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y)
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         covar = (1/(n - 1)) \sum_{i = 1}^{n} (x_i - \Hat x) (y_i - \Hat y)
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_covariance_m (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n, const double mean1, const double mean2)
Packit 67cb25
Packit 67cb25
   This function computes the covariance of the datasets :data:`data1` and
Packit 67cb25
   :data:`data2` using the given values of the means, :data:`mean1` and
Packit 67cb25
   :data:`mean2`.  This is useful if you have already computed the means of
Packit 67cb25
   :data:`data1` and :data:`data2` and want to avoid recomputing them.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: correlation, of two datasets
Packit 67cb25
Packit 67cb25
Correlation
Packit 67cb25
===========
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_correlation (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n)
Packit 67cb25
Packit 67cb25
   This function efficiently computes the Pearson correlation coefficient
Packit 67cb25
   between the datasets :data:`data1` and :data:`data2` which must both be of
Packit 67cb25
   the same length :data:`n`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         r = {cov(x, y) \over \Hat\sigma_x \Hat\sigma_y} =
Packit 67cb25
         {{1 \over n-1} \sum (x_i - \Hat x) (y_i - \Hat y)
Packit 67cb25
         \over
Packit 67cb25
         \sqrt{{1 \over n-1} \sum (x_i - {\Hat x})^2}
Packit 67cb25
         \sqrt{{1 \over n-1} \sum (y_i - {\Hat y})^2}
Packit 67cb25
         }
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         r = cov(x, y) / (\Hat\sigma_x \Hat\sigma_y)
Packit 67cb25
           = {1/(n-1) \sum (x_i - \Hat x) (y_i - \Hat y)
Packit 67cb25
              \over
Packit 67cb25
              \sqrt{1/(n-1) \sum (x_i - \Hat x)^2} \sqrt{1/(n-1) \sum (y_i - \Hat y)^2}
Packit 67cb25
             }
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_spearman (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n, double work[])
Packit 67cb25
Packit 67cb25
   This function computes the Spearman rank correlation coefficient between
Packit 67cb25
   the datasets :data:`data1` and :data:`data2` which must both be of the same
Packit 67cb25
   length :data:`n`. Additional workspace of size 2 * :data:`n` is required in
Packit 67cb25
   :data:`work`. The Spearman rank correlation between vectors :math:`x` and
Packit 67cb25
   :math:`y` is equivalent to the Pearson correlation between the ranked
Packit 67cb25
   vectors :math:`x_R` and :math:`y_R`, where ranks are defined to be the
Packit 67cb25
   average of the positions of an element in the ascending order of the values.
Packit 67cb25
Packit 67cb25
Weighted Samples
Packit 67cb25
================
Packit 67cb25
Packit 67cb25
The functions described in this section allow the computation of
Packit 67cb25
statistics for weighted samples.  The functions accept an array of
Packit 67cb25
samples, :math:`x_i`, with associated weights, :math:`w_i`.  Each sample
Packit 67cb25
:math:`x_i` is considered as having been drawn from a Gaussian
Packit 67cb25
distribution with variance :math:`\sigma_i^2`.  The sample weight
Packit 67cb25
:math:`w_i` is defined as the reciprocal of this variance, :math:`w_i = 1/\sigma_i^2`.
Packit 67cb25
Setting a weight to zero corresponds to removing a sample from a dataset.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wmean (const double w[], size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the weighted mean of the dataset :data:`data` with
Packit 67cb25
   stride :data:`stride` and length :data:`n`, using the set of weights :data:`w`
Packit 67cb25
   with stride :data:`wstride` and length :data:`n`.  The weighted mean is defined as,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\mu = (\sum w_i x_i) / (\sum w_i)
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wvariance (const double w[], size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the estimated variance of the dataset :data:`data`
Packit 67cb25
   with stride :data:`stride` and length :data:`n`, using the set of weights
Packit 67cb25
   :data:`w` with stride :data:`wstride` and length :data:`n`.  The estimated
Packit 67cb25
   variance of a weighted dataset is calculated as,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}} 
Packit 67cb25
                         \sum w_i (x_i - \Hat\mu)^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) 
Packit 67cb25
                         \sum w_i (x_i - \Hat\mu)^2
Packit 67cb25
Packit 67cb25
   Note that this expression reduces to an unweighted variance with the
Packit 67cb25
   familiar :math:`1/(N-1)` factor when there are :math:`N` equal non-zero
Packit 67cb25
   weights.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wvariance_m (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean)
Packit 67cb25
Packit 67cb25
   This function returns the estimated variance of the weighted dataset
Packit 67cb25
   :data:`data` using the given weighted mean :data:`wmean`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wsd (const double w[], size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   The standard deviation is defined as the square root of the variance.
Packit 67cb25
   This function returns the square root of the corresponding variance
Packit 67cb25
   function :func:`gsl_stats_wvariance` above.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wsd_m (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean)
Packit 67cb25
Packit 67cb25
   This function returns the square root of the corresponding variance
Packit 67cb25
   function :func:`gsl_stats_wvariance_m` above.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wvariance_with_fixed_mean (const double w[], size_t wstride, const double data[], size_t stride, size_t n, const double mean)
Packit 67cb25
Packit 67cb25
   This function computes an unbiased estimate of the variance of the weighted
Packit 67cb25
   dataset :data:`data` when the population mean :data:`mean` of the underlying
Packit 67cb25
   distribution is known *a priori*.  In this case the estimator for
Packit 67cb25
   the variance replaces the sample mean :math:`\Hat\mu` by the known
Packit 67cb25
   population mean :math:`\mu`,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: \Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         \Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wsd_with_fixed_mean (const double w[], size_t wstride, const double data[], size_t stride, size_t n, const double mean)
Packit 67cb25
Packit 67cb25
   The standard deviation is defined as the square root of the variance.
Packit 67cb25
   This function returns the square root of the corresponding variance
Packit 67cb25
   function above.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wtss (const double w[], const size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
              double gsl_stats_wtss_m (const double w[], const size_t wstride, const double data[], size_t stride, size_t n, double wmean)
Packit 67cb25
Packit 67cb25
   These functions return the weighted total sum of squares (TSS) of
Packit 67cb25
   :data:`data` about the weighted mean.  For :func:`gsl_stats_wtss_m` the
Packit 67cb25
   user-supplied value of :data:`wmean` is used, and for :func:`gsl_stats_wtss`
Packit 67cb25
   it is computed using :func:`gsl_stats_wmean`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: {\rm TSS} = \sum w_i (x_i - wmean)^2
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         TSS =  \sum w_i (x_i - wmean)^2
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wabsdev (const double w[], size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the weighted absolute deviation from the weighted
Packit 67cb25
   mean of :data:`data`.  The absolute deviation from the mean is defined as,
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: absdev = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wabsdev_m (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean)
Packit 67cb25
Packit 67cb25
   This function computes the absolute deviation of the weighted dataset
Packit 67cb25
   :data:`data` about the given weighted mean :data:`wmean`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wskew (const double w[], size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the weighted skewness of the dataset :data:`data`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: skew = {{\sum w_i ((x_i - {\Hat x})/{\Hat \sigma})^3} \over {\sum w_i}}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         skew = (\sum w_i ((x_i - \Hat x)/\Hat \sigma)^3) / (\sum w_i)
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wskew_m_sd (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean, double wsd)
Packit 67cb25
Packit 67cb25
   This function computes the weighted skewness of the dataset :data:`data`
Packit 67cb25
   using the given values of the weighted mean and weighted standard
Packit 67cb25
   deviation, :data:`wmean` and :data:`wsd`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wkurtosis (const double w[], size_t wstride, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function computes the weighted kurtosis of the dataset :data:`data`.
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: kurtosis = {{\sum w_i ((x_i - {\Hat x})/{\Hat \sigma})^4} \over {\sum w_i}} - 3
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         kurtosis = ((\sum w_i ((x_i - \Hat x)/\Hat \sigma)^4) / (\sum w_i)) - 3
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_wkurtosis_m_sd (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean, double wsd)
Packit 67cb25
Packit 67cb25
   This function computes the weighted kurtosis of the dataset :data:`data`
Packit 67cb25
   using the given values of the weighted mean and weighted standard
Packit 67cb25
   deviation, :data:`wmean` and :data:`wsd`.
Packit 67cb25
Packit 67cb25
Maximum and Minimum values
Packit 67cb25
==========================
Packit 67cb25
Packit 67cb25
The following functions find the maximum and minimum values of a
Packit 67cb25
dataset (or their indices).  If the data contains :code:`NaN`-s then a
Packit 67cb25
:code:`NaN` will be returned, since the maximum or minimum value is
Packit 67cb25
undefined.  For functions which return an index, the location of the
Packit 67cb25
first :code:`NaN` in the array is returned.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_max (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the maximum value in :data:`data`, a dataset of
Packit 67cb25
   length :data:`n` with stride :data:`stride`.  The maximum value is defined
Packit 67cb25
   as the value of the element :math:`x_i` which satisfies :math:`x_i \ge x_j`
Packit 67cb25
   for all :math:`j`.
Packit 67cb25
Packit 67cb25
   If you want instead to find the element with the largest absolute
Packit 67cb25
   magnitude you will need to apply :func:`fabs` or :func:`abs` to your data
Packit 67cb25
   before calling this function.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_min (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the minimum value in :data:`data`, a dataset of
Packit 67cb25
   length :data:`n` with stride :data:`stride`.  The minimum value is defined
Packit 67cb25
   as the value of the element :math:`x_i` which satisfies :math:`x_i \le x_j`
Packit 67cb25
   for all :math:`j`.
Packit 67cb25
Packit 67cb25
   If you want instead to find the element with the smallest absolute
Packit 67cb25
   magnitude you will need to apply :func:`fabs` or :func:`abs` to your data
Packit 67cb25
   before calling this function.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_stats_minmax (double * min, double * max, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function finds both the minimum and maximum values :data:`min`,
Packit 67cb25
   :data:`max` in :data:`data` in a single pass.
Packit 67cb25
Packit 67cb25
.. function:: size_t gsl_stats_max_index (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the index of the maximum value in :data:`data`, a
Packit 67cb25
   dataset of length :data:`n` with stride :data:`stride`.  The maximum value is
Packit 67cb25
   defined as the value of the element :math:`x_i` which satisfies 
Packit 67cb25
   :math:`x_i \ge x_j`
Packit 67cb25
   for all :math:`j`.  When there are several equal maximum
Packit 67cb25
   elements then the first one is chosen.
Packit 67cb25
Packit 67cb25
.. function:: size_t gsl_stats_min_index (const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the index of the minimum value in :data:`data`, a
Packit 67cb25
   dataset of length :data:`n` with stride :data:`stride`.  The minimum value
Packit 67cb25
   is defined as the value of the element :math:`x_i` which satisfies
Packit 67cb25
   :math:`x_i \ge x_j`
Packit 67cb25
   for all :math:`j`.  When there are several equal
Packit 67cb25
   minimum elements then the first one is chosen.
Packit 67cb25
Packit 67cb25
.. function:: void gsl_stats_minmax_index (size_t * min_index, size_t * max_index, const double data[], size_t stride, size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the indexes :data:`min_index`, :data:`max_index` of
Packit 67cb25
   the minimum and maximum values in :data:`data` in a single pass.
Packit 67cb25
Packit 67cb25
Median and Percentiles
Packit 67cb25
======================
Packit 67cb25
Packit 67cb25
The median and percentile functions described in this section operate on
Packit 67cb25
sorted data in :math:`O(1)` time. There is also a routine for computing
Packit 67cb25
the median of an unsorted input array in average :math:`O(n)` time using
Packit 67cb25
the quickselect algorithm. For convenience we use *quantiles*, measured on a scale
Packit 67cb25
of 0 to 1, instead of percentiles (which use a scale of 0 to 100).
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_median_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the median value of :data:`sorted_data`, a dataset
Packit 67cb25
   of length :data:`n` with stride :data:`stride`.  The elements of the array
Packit 67cb25
   must be in ascending numerical order.  There are no checks to see
Packit 67cb25
   whether the data are sorted, so the function :func:`gsl_sort` should
Packit 67cb25
   always be used first.
Packit 67cb25
Packit 67cb25
   When the dataset has an odd number of elements the median is the value
Packit 67cb25
   of element :math:`(n-1)/2`.  When the dataset has an even number of
Packit 67cb25
   elements the median is the mean of the two nearest middle values,
Packit 67cb25
   elements :math:`(n-1)/2` and :math:`n/2`.  Since the algorithm for
Packit 67cb25
   computing the median involves interpolation this function always returns
Packit 67cb25
   a floating-point number, even for integer data types.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_median (double data[], const size_t stride, const size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the median value of :data:`data`, a dataset
Packit 67cb25
   of length :data:`n` with stride :data:`stride`. The median is found
Packit 67cb25
   using the quickselect algorithm. The input array does not need to be
Packit 67cb25
   sorted, but note that the algorithm rearranges the array and so the input
Packit 67cb25
   is not preserved on output.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_quantile_from_sorted_data (const double sorted_data[], size_t stride, size_t n, double f)
Packit 67cb25
Packit 67cb25
   This function returns a quantile value of :data:`sorted_data`, a
Packit 67cb25
   double-precision array of length :data:`n` with stride :data:`stride`.  The
Packit 67cb25
   elements of the array must be in ascending numerical order.  The
Packit 67cb25
   quantile is determined by the :data:`f`, a fraction between 0 and 1.  For
Packit 67cb25
   example, to compute the value of the 75th percentile :data:`f` should have
Packit 67cb25
   the value 0.75.
Packit 67cb25
Packit 67cb25
   There are no checks to see whether the data are sorted, so the function
Packit 67cb25
   :func:`gsl_sort` should always be used first.
Packit 67cb25
Packit 67cb25
   The quantile is found by interpolation, using the formula
Packit 67cb25
Packit 67cb25
   .. only:: not texinfo
Packit 67cb25
Packit 67cb25
      .. math:: \hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}
Packit 67cb25
Packit 67cb25
   .. only:: texinfo
Packit 67cb25
Packit 67cb25
      ::
Packit 67cb25
Packit 67cb25
         quantile = (1 - \delta) x_i + \delta x_{i+1}
Packit 67cb25
Packit 67cb25
   where :math:`i` is :code:`floor((n - 1)f)` and :math:`\delta` is
Packit 67cb25
   :math:`(n-1)f - i`.
Packit 67cb25
Packit 67cb25
   Thus the minimum value of the array (:code:`data[0*stride]`) is given by
Packit 67cb25
   :data:`f` equal to zero, the maximum value (:code:`data[(n-1)*stride]`) is
Packit 67cb25
   given by :data:`f` equal to one and the median value is given by :data:`f`
Packit 67cb25
   equal to 0.5.  Since the algorithm for computing quantiles involves
Packit 67cb25
   interpolation this function always returns a floating-point number, even
Packit 67cb25
   for integer data types.
Packit 67cb25
Packit 67cb25
.. @node Statistical tests
Packit 67cb25
.. @section Statistical tests
Packit 67cb25
Packit 67cb25
.. FIXME, do more work on the statistical tests
Packit 67cb25
Packit 67cb25
.. -@deftypefun double gsl_stats_ttest (const double data1[], double data2[], size_t n1, size_t n2)
Packit 67cb25
.. -@deftypefunx Statistics double gsl_stats_int_ttest (const double data1[], double data2[], size_t n1, size_t n2)
Packit 67cb25
Packit 67cb25
.. The function :func:`gsl_stats_ttest` computes the t-test statistic for
Packit 67cb25
.. the two arrays :data:`data1`[] and :data:`data2`[], of lengths :data:`n1` and
Packit 67cb25
.. -:data:`n2` respectively.
Packit 67cb25
Packit 67cb25
.. The t-test statistic measures the difference between the means of two
Packit 67cb25
.. datasets.
Packit 67cb25
Packit 67cb25
Order Statistics
Packit 67cb25
================
Packit 67cb25
Packit 67cb25
The :math:`k`-th *order statistic* of a sample is equal to its :math:`k`-th smallest value.
Packit 67cb25
The :math:`k`-th order statistic of a set of :math:`n` values :math:`x = \left\{ x_i \right\}, 1 \le i \le n` is
Packit 67cb25
denoted :math:`x_{(k)}`. The median of the set :math:`x` is equal to :math:`x_{\left( \frac{n}{2} \right)}` if
Packit 67cb25
:math:`n` is odd, or the average of :math:`x_{\left( \frac{n}{2} \right)}` and :math:`x_{\left( \frac{n}{2} + 1 \right)}`
Packit 67cb25
if :math:`n` is even. The :math:`k`-th smallest element of a length :math:`n` vector can be found
Packit 67cb25
in average :math:`O(n)` time using the quickselect algorithm.
Packit 67cb25
Packit 67cb25
.. function:: gsl_stats_select(double data[], const size_t stride, const size_t n, const size_t k)
Packit 67cb25
Packit 67cb25
   This function finds the :data:`k`-th smallest element of the input array :data:`data`
Packit 67cb25
   of length :data:`n` and stride :data:`stride` using the quickselect method. The
Packit 67cb25
   algorithm rearranges the elements of :data:`data` and so the input array is not preserved
Packit 67cb25
   on output.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: robust location estimators
Packit 67cb25
   single: location estimation
Packit 67cb25
   single: estimation, location
Packit 67cb25
Packit 67cb25
Robust Location Estimates
Packit 67cb25
=========================
Packit 67cb25
Packit 67cb25
A *location estimate* refers to a typical or central value which best describes a given
Packit 67cb25
dataset. The mean and median are both examples of location estimators. However, the
Packit 67cb25
mean has a severe sensitivity to data outliers and can give erroneous values when
Packit 67cb25
even a small number of outliers are present. The median on the other hand, has
Packit 67cb25
a strong insensitivity to data outliers, but due to its non-smoothness it can
Packit 67cb25
behave unexpectedly in certain situations. GSL offers the following alternative
Packit 67cb25
location estimators, which are robust to the presence of outliers.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: trimmed mean
Packit 67cb25
   single: truncated mean
Packit 67cb25
   single: mean, trimmed
Packit 67cb25
   single: mean, truncated
Packit 67cb25
Packit 67cb25
Trimmed Mean
Packit 67cb25
------------
Packit 67cb25
Packit 67cb25
The trimmed mean, or *truncated mean*, discards a certain number of smallest and largest
Packit 67cb25
samples from the input vector before computing the mean of the remaining samples. The
Packit 67cb25
amount of trimming is specified by a factor :math:`\alpha \in [0,0.5]`. Then the
Packit 67cb25
number of samples discarded from both ends of the input vector is
Packit 67cb25
:math:`\left\lfloor \alpha n \right\rfloor`, where :math:`n` is the length of the input.
Packit 67cb25
So to discard 25% of the samples from each end, one would set :math:`\alpha = 0.25`.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_trmean_from_sorted_data (const double alpha, const double sorted_data[], const size_t stride, const size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the trimmed mean of :data:`sorted_data`, a dataset
Packit 67cb25
   of length :data:`n` with stride :data:`stride`. The elements of the array
Packit 67cb25
   must be in ascending numerical order.  There are no checks to see
Packit 67cb25
   whether the data are sorted, so the function :func:`gsl_sort` should
Packit 67cb25
   always be used first. The trimming factor :math:`\alpha` is given in :data:`alpha`.
Packit 67cb25
   If :math:`\alpha \ge 0.5`, then the median of the input is returned.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: Gastwirth estimator
Packit 67cb25
Packit 67cb25
Gastwirth Estimator
Packit 67cb25
-------------------
Packit 67cb25
Packit 67cb25
Gastwirth's location estimator is a weighted sum of three order statistics,
Packit 67cb25
Packit 67cb25
.. only:: not texinfo
Packit 67cb25
Packit 67cb25
   .. math:: gastwirth = 0.3 \times Q_{\frac{1}{3}} + 0.4 \times Q_{\frac{1}{2}} + 0.3 \times Q_{\frac{2}{3}}
Packit 67cb25
Packit 67cb25
.. only:: texinfo
Packit 67cb25
Packit 67cb25
   ::
Packit 67cb25
Packit 67cb25
      gastwirth = 0.3 * Q_{1/3} + 0.4 * Q_{1/2} + 0.3 * Q_{2/3}
Packit 67cb25
Packit 67cb25
where :math:`Q_{\frac{1}{3}}` is the one-third quantile, :math:`Q_{\frac{1}{2}}` is the one-half
Packit 67cb25
quantile (i.e. median), and :math:`Q_{\frac{2}{3}}` is the two-thirds quantile.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_gastwirth_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n)
Packit 67cb25
Packit 67cb25
   This function returns the Gastwirth location estimator of :data:`sorted_data`, a dataset
Packit 67cb25
   of length :data:`n` with stride :data:`stride`.  The elements of the array
Packit 67cb25
   must be in ascending numerical order.  There are no checks to see
Packit 67cb25
   whether the data are sorted, so the function :func:`gsl_sort` should
Packit 67cb25
   always be used first.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: robust scale estimators
Packit 67cb25
   single: scale estimation
Packit 67cb25
   single: estimation, scale
Packit 67cb25
Packit 67cb25
Robust Scale Estimates
Packit 67cb25
======================
Packit 67cb25
Packit 67cb25
A *robust scale estimate*, also known as a robust measure of scale, attempts to quantify
Packit 67cb25
the statistical dispersion (variability, scatter, spread) in a set of data which may contain outliers.
Packit 67cb25
In such datasets, the usual variance or standard deviation scale estimate can be rendered useless
Packit 67cb25
by even a single outlier.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: median absolute deviation
Packit 67cb25
Packit 67cb25
.. _sec_mad-statistic:
Packit 67cb25
Packit 67cb25
Median Absolute Deviation (MAD)
Packit 67cb25
-------------------------------
Packit 67cb25
Packit 67cb25
The median absolute deviation (MAD) is defined as
Packit 67cb25
Packit 67cb25
.. only:: not texinfo
Packit 67cb25
Packit 67cb25
   .. math:: MAD = 1.4826 \times \textrm{median} \left\{ \left| x_i - \textrm{median} \left( x \right) \right| \right\}
Packit 67cb25
Packit 67cb25
.. only:: texinfo
Packit 67cb25
Packit 67cb25
   ::
Packit 67cb25
Packit 67cb25
      MAD = 1.4826 median { | x_i - median(x) | }
Packit 67cb25
Packit 67cb25
In words, first the median of all samples is computed. Then the median
Packit 67cb25
is subtracted from all samples in the input to find the deviation of each sample
Packit 67cb25
from the median. The median of all absolute deviations is then the MAD.
Packit 67cb25
The factor :math:`1.4826` makes the MAD an unbiased estimator of the standard deviation for Gaussian data.
Packit 67cb25
The median absolute deviation has an asymptotic efficiency of 37%.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_mad0 (const double data[], const size_t stride, const size_t n, double work[])
Packit 67cb25
.. function:: double gsl_stats_mad (const double data[], const size_t stride, const size_t n, double work[])
Packit 67cb25
Packit 67cb25
   These functions return the median absolute deviation of :data:`data`, a dataset
Packit 67cb25
   of length :data:`n` and stride :data:`stride`.
Packit 67cb25
   The :code:`mad0` function calculates
Packit 67cb25
   :math:`\textrm{median} \left\{ \left| x_i - \textrm{median} \left( x \right) \right| \right\}`
Packit 67cb25
   (i.e. the :math:`MAD` statistic without the bias correction scale factor).
Packit 67cb25
   These functions require additional workspace of size :code:`n` provided in :data:`work`.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: Sn statistic
Packit 67cb25
Packit 67cb25
.. _sec_Sn-statistic:
Packit 67cb25
Packit 67cb25
:math:`S_n` Statistic
Packit 67cb25
---------------------
Packit 67cb25
Packit 67cb25
The :math:`S_n` statistic developed by Croux and Rousseeuw is defined as
Packit 67cb25
Packit 67cb25
.. only:: not texinfo
Packit 67cb25
Packit 67cb25
   .. math:: S_n = 1.1926 \times c_n \times \textrm{median}_i \left\{ \textrm{median}_j \left( \left| x_i - x_j \right| \right) \right\}
Packit 67cb25
Packit 67cb25
.. only:: texinfo
Packit 67cb25
Packit 67cb25
   ::
Packit 67cb25
Packit 67cb25
      S_n = 1.1926 * c_n * median_i { median_j ( | x_i - x_j | ) }
Packit 67cb25
Packit 67cb25
For each sample :math:`x_i, 1 \le i \le n`, the median of the values :math:`\left| x_i - x_j \right|` is computed for all
Packit 67cb25
:math:`x_j, 1 \le j \le n`. This yields :math:`n` values, whose median then gives the final :math:`S_n`.
Packit 67cb25
The factor :math:`1.1926` makes :math:`S_n` an unbiased estimate of the standard deviation for Gaussian data.
Packit 67cb25
The factor :math:`c_n` is a correction factor to correct bias in small sample sizes. :math:`S_n` has an asymptotic
Packit 67cb25
efficiency of 58%.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_Sn0_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[])
Packit 67cb25
.. function:: double gsl_stats_Sn_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[])
Packit 67cb25
Packit 67cb25
   These functions return the :math:`S_n` statistic of :data:`sorted_data`, a dataset
Packit 67cb25
   of length :data:`n` with stride :data:`stride`.  The elements of the array
Packit 67cb25
   must be in ascending numerical order.  There are no checks to see
Packit 67cb25
   whether the data are sorted, so the function :func:`gsl_sort` should
Packit 67cb25
   always be used first. The :code:`Sn0` function calculates
Packit 67cb25
   :math:`\textrm{median}_i \left\{ \textrm{median}_j \left( \left| x_i - x_j \right| \right) \right\}`
Packit 67cb25
   (i.e. the :math:`S_n` statistic without the bias correction scale factors).
Packit 67cb25
   These functions require additional workspace of size
Packit 67cb25
   :code:`n` provided in :data:`work`.
Packit 67cb25
Packit 67cb25
.. index::
Packit 67cb25
   single: Qn statistic
Packit 67cb25
Packit 67cb25
.. _sec_Qn-statistic:
Packit 67cb25
Packit 67cb25
:math:`Q_n` Statistic
Packit 67cb25
---------------------
Packit 67cb25
Packit 67cb25
The :math:`Q_n` statistic developed by Croux and Rousseeuw is defined as
Packit 67cb25
Packit 67cb25
.. only:: not texinfo
Packit 67cb25
Packit 67cb25
   .. math:: Q_n = 2.21914 \times d_n \times \left\{ \left| x_i - x_j \right|, i < j \right\}_{(k)}
Packit 67cb25
Packit 67cb25
.. only:: texinfo
Packit 67cb25
Packit 67cb25
   ::
Packit 67cb25
Packit 67cb25
      Q_n = 2.21914 * d_n * { | x_i - x_j |, i < j }_{(k)}
Packit 67cb25
Packit 67cb25
The factor :math:`2.21914` makes :math:`Q_n` an unbiased estimate of the standard deviation for Gaussian data.
Packit 67cb25
The factor :math:`d_n` is a correction factor to correct bias in small sample sizes. The order statistic
Packit 67cb25
is
Packit 67cb25
Packit 67cb25
.. only:: not texinfo
Packit 67cb25
Packit 67cb25
   .. math:: k = \left(
Packit 67cb25
                   \begin{array}{c}
Packit 67cb25
                     \left\lfloor \frac{n}{2} \right\rfloor + 1 \\
Packit 67cb25
                     2
Packit 67cb25
                   \end{array}
Packit 67cb25
                 \right)
Packit 67cb25
Packit 67cb25
.. only:: texinfo
Packit 67cb25
Packit 67cb25
   ::
Packit 67cb25
Packit 67cb25
      k = ( floor(n/2) + 1 )
Packit 67cb25
          (       2        )
Packit 67cb25
Packit 67cb25
:math:`Q_n` has an asymptotic efficiency of 82%.
Packit 67cb25
Packit 67cb25
.. function:: double gsl_stats_Qn0_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[], int work_int[])
Packit 67cb25
              double gsl_stats_Qn_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[], int work_int[])
Packit 67cb25
Packit 67cb25
   These functions return the :math:`Q_n` statistic of :data:`sorted_data`, a dataset
Packit 67cb25
   of length :data:`n` with stride :data:`stride`. The elements of the array
Packit 67cb25
   must be in ascending numerical order.  There are no checks to see
Packit 67cb25
   whether the data are sorted, so the function :func:`gsl_sort` should
Packit 67cb25
   always be used first. The :code:`Qn0` function calculates
Packit 67cb25
   :math:`\left\{ \left| x_i - x_j \right|, i < j \right\}_{(k)}`
Packit 67cb25
   (i.e. :math:`Q_n` without the bias correction scale factors).
Packit 67cb25
   These functions require additional workspace of size
Packit 67cb25
   :code:`3n` provided in :data:`work` and integer workspace of size :code:`5n`
Packit 67cb25
   provided in :data:`work_int`.
Packit 67cb25
Packit 67cb25
Examples
Packit 67cb25
========
Packit 67cb25
Packit 67cb25
Here is a basic example of how to use the statistical functions:
Packit 67cb25
Packit 67cb25
.. include:: examples/stat.c
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
The program should produce the following output,
Packit 67cb25
Packit 67cb25
.. include:: examples/stat.txt
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
Here is an example using sorted data,
Packit 67cb25
Packit 67cb25
.. include:: examples/statsort.c
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
This program should produce the following output,
Packit 67cb25
Packit 67cb25
.. include:: examples/statsort.txt
Packit 67cb25
   :code:
Packit 67cb25
Packit 67cb25
References and Further Reading
Packit 67cb25
==============================
Packit 67cb25
Packit 67cb25
The standard reference for almost any topic in statistics is the
Packit 67cb25
multi-volume *Advanced Theory of Statistics* by Kendall and Stuart.
Packit 67cb25
Packit 67cb25
* Maurice Kendall, Alan Stuart, and J. Keith Ord.
Packit 67cb25
  *The Advanced Theory of Statistics* (multiple volumes)
Packit 67cb25
  reprinted as *Kendall's Advanced Theory of Statistics*.
Packit 67cb25
  Wiley, ISBN 047023380X.
Packit 67cb25
Packit 67cb25
Many statistical concepts can be more easily understood by a Bayesian
Packit 67cb25
approach.  The following book by Gelman, Carlin, Stern and Rubin gives a
Packit 67cb25
comprehensive coverage of the subject.
Packit 67cb25
Packit 67cb25
* Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.
Packit 67cb25
  *Bayesian Data Analysis*.
Packit 67cb25
  Chapman & Hall, ISBN 0412039915.
Packit 67cb25
Packit 67cb25
For physicists the Particle Data Group provides useful reviews of
Packit 67cb25
Probability and Statistics in the "Mathematical Tools" section of its
Packit 67cb25
Annual Review of Particle Physics. 
Packit 67cb25
Packit 67cb25
* *Review of Particle Properties*,
Packit 67cb25
  R.M. Barnett et al., Physical Review D54, 1 (1996)
Packit 67cb25
Packit 67cb25
The Review of Particle Physics is available online at
Packit 67cb25
the website http://pdg.lbl.gov/.
Packit 67cb25
Packit 67cb25
The following papers describe robust scale estimation,
Packit 67cb25
Packit 67cb25
* C. Croux and P. J. Rousseeuw, *Time-Efficient algorithms for two highly robust
Packit 67cb25
  estimators of scale*, Comp. Stat., Physica, Heidelberg, 1992.
Packit 67cb25
* P. J. Rousseeuw and C. Croux, *Explicit scale estimators with high breakdown point*,
Packit 67cb25
  L1-Statistical Analysis and Related Methods, pp. 77-92, 1992.