Tree - source-git/gsl - CentOS Git server

source-git / gsl

Blame doc/statistics.rst

Blob History Raw

Packit	67cb25	`.. index::`
Packit	67cb25	`single: statistics`
Packit	67cb25	`single: mean`
Packit	67cb25	`single: standard deviation`
Packit	67cb25	`single: variance`
Packit	67cb25	`single: estimated standard deviation`
Packit	67cb25	`single: estimated variance`
Packit	67cb25	`single: t-test`
Packit	67cb25	`single: range`
Packit	67cb25	`single: min`
Packit	67cb25	`single: max`
Packit	67cb25
Packit	67cb25	`**********`
Packit	67cb25	`Statistics`
Packit	67cb25	`**********`
Packit	67cb25
Packit	67cb25	`This chapter describes the statistical functions in the library. The`
Packit	67cb25	`basic statistical functions include routines to compute the mean,`
Packit	67cb25	`variance and standard deviation. More advanced functions allow you to`
Packit	67cb25	`calculate absolute deviations, skewness, and kurtosis as well as the`
Packit	67cb25	`median and arbitrary percentiles. The algorithms use recurrence`
Packit	67cb25	`relations to compute average quantities in a stable way, without large`
Packit	67cb25	`intermediate values that might overflow.`
Packit	67cb25
Packit	67cb25	`The functions are available in versions for datasets in the standard`
Packit	67cb25	`floating-point and integer types. The versions for double precision`
Packit	67cb25	floating-point data have the prefix :code:`gsl_stats` and are declared in
Packit	67cb25	the header file :file:`gsl_statistics_double.h`. The versions for integer
Packit	67cb25	data have the prefix :code:`gsl_stats_int` and are declared in the header
Packit	67cb25	file :file:`gsl_statistics_int.h`. All the functions operate on C
Packit	67cb25	arrays with a :code:`stride` parameter specifying the spacing between
Packit	67cb25	`elements.`
Packit	67cb25
Packit	67cb25	`Mean, Standard Deviation and Variance`
Packit	67cb25	`=====================================`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_mean (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the arithmetic mean of :data:`data`, a dataset of
Packit	67cb25	length :data:`n` with stride :data:`stride`. The arithmetic mean, or
Packit	67cb25	sample mean, is denoted by :math:`\Hat\mu` and defined as,
Packit	67cb25
Packit	67cb25	`.. math:: \Hat\mu = {1 \over N} \sum x_i`
Packit	67cb25
Packit	67cb25	where :math:`x_i` are the elements of the dataset :data:`data`. For
Packit	67cb25	`samples drawn from a gaussian distribution the variance of`
Packit	67cb25	:math:`\Hat\mu` is :math:`\sigma^2 / N`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_variance (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	`This function returns the estimated, or sample, variance of`
Packit	67cb25	:data:`data`, a dataset of length :data:`n` with stride :data:`stride`. The
Packit	67cb25	estimated variance is denoted by :math:`\Hat\sigma^2` and is defined by,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2`
Packit	67cb25
Packit	67cb25	where :math:`x_i` are the elements of the dataset :data:`data`. Note that
Packit	67cb25	the normalization factor of :math:`1/(N-1)` results from the derivation
Packit	67cb25	of :math:`\Hat\sigma^2` as an unbiased estimator of the population
Packit	67cb25	variance :math:`\sigma^2`. For samples drawn from a Gaussian distribution
Packit	67cb25	the variance of :math:`\Hat\sigma^2` itself is :math:`2 \sigma^4 / N`.
Packit	67cb25
Packit	67cb25	This function computes the mean via a call to :func:`gsl_stats_mean`. If
Packit	67cb25	`you have already computed the mean then you can pass it directly to`
Packit	67cb25	:func:`gsl_stats_variance_m`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_variance_m (const double data[], size_t stride, size_t n, double mean)`
Packit	67cb25
Packit	67cb25	This function returns the sample variance of :data:`data` relative to the
Packit	67cb25	given value of :data:`mean`. The function is computed with :math:`\Hat\mu`
Packit	67cb25	replaced by the value of :data:`mean` that you supply,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_sd (const double data[], size_t stride, size_t n)`
Packit	67cb25	`double gsl_stats_sd_m (const double data[], size_t stride, size_t n, double mean)`
Packit	67cb25
Packit	67cb25	`The standard deviation is defined as the square root of the variance.`
Packit	67cb25	`These functions return the square root of the corresponding variance`
Packit	67cb25	`functions above.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_tss (const double data[], size_t stride, size_t n)`
Packit	67cb25	`double gsl_stats_tss_m (const double data[], size_t stride, size_t n, double mean)`
Packit	67cb25
Packit	67cb25	These functions return the total sum of squares (TSS) of :data:`data` about
Packit	67cb25	the mean. For :func:`gsl_stats_tss_m` the user-supplied value of
Packit	67cb25	:data:`mean` is used, and for :func:`gsl_stats_tss` it is computed using
Packit	67cb25	:func:`gsl_stats_mean`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: {\rm TSS} = \sum (x_i - mean)^2`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`TSS = \sum (x_i - mean)^2`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_variance_with_fixed_mean (const double data[], size_t stride, size_t n, double mean)`
Packit	67cb25
Packit	67cb25	`This function computes an unbiased estimate of the variance of`
Packit	67cb25	:data:`data` when the population mean :data:`mean` of the underlying
Packit	67cb25	`distribution is known a priori. In this case the estimator for`
Packit	67cb25	the variance uses the factor :math:`1/N` and the sample mean
Packit	67cb25	:math:`\Hat\mu` is replaced by the known population mean :math:`\mu`,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: {\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_sd_with_fixed_mean (const double data[], size_t stride, size_t n, double mean)`
Packit	67cb25
Packit	67cb25	This function calculates the standard deviation of :data:`data` for a
Packit	67cb25	fixed population mean :data:`mean`. The result is the square root of the
Packit	67cb25	`corresponding variance function.`
Packit	67cb25
Packit	67cb25	`Absolute deviation`
Packit	67cb25	`==================`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_absdev (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	`This function computes the absolute deviation from the mean of`
Packit	67cb25	:data:`data`, a dataset of length :data:`n` with stride :data:`stride`. The
Packit	67cb25	`absolute deviation from the mean is defined as,`
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: absdev = {1 \over N} \sum \|x_i - {\Hat\mu}\|`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`absdev = (1/N) \sum \|x_i - \Hat\mu\|`
Packit	67cb25
Packit	67cb25	where :math:`x_i` are the elements of the dataset :data:`data`. The
Packit	67cb25	`absolute deviation from the mean provides a more robust measure of the`
Packit	67cb25	`width of a distribution than the variance. This function computes the`
Packit	67cb25	mean of :data:`data` via a call to :func:`gsl_stats_mean`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_absdev_m (const double data[], size_t stride, size_t n, double mean)`
Packit	67cb25
Packit	67cb25	This function computes the absolute deviation of the dataset :data:`data`
Packit	67cb25	relative to the given value of :data:`mean`,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: absdev = {1 \over N} \sum \|x_i - mean\|`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`absdev = (1/N) \sum \|x_i - mean\|`
Packit	67cb25
Packit	67cb25	`This function is useful if you have already computed the mean of`
Packit	67cb25	:data:`data` (and want to avoid recomputing it), or wish to calculate the
Packit	67cb25	`absolute deviation relative to another value (such as zero, or the`
Packit	67cb25	`median).`
Packit	67cb25
Packit	67cb25	`.. index:: skewness, kurtosis`
Packit	67cb25
Packit	67cb25	`Higher moments (skewness and kurtosis)`
Packit	67cb25	`======================================`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_skew (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function computes the skewness of :data:`data`, a dataset of length
Packit	67cb25	:data:`n` with stride :data:`stride`. The skewness is defined as,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math::`
Packit	67cb25
Packit	67cb25	`skew = {1 \over N} \sum`
Packit	67cb25	`{\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3`
Packit	67cb25
Packit	67cb25	where :math:`x_i` are the elements of the dataset :data:`data`. The skewness
Packit	67cb25	`measures the asymmetry of the tails of a distribution.`
Packit	67cb25
Packit	67cb25	`The function computes the mean and estimated standard deviation of`
Packit	67cb25	:data:`data` via calls to :func:`gsl_stats_mean` and :func:`gsl_stats_sd`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_skew_m_sd (const double data[], size_t stride, size_t n, double mean, double sd)`
Packit	67cb25
Packit	67cb25	This function computes the skewness of the dataset :data:`data` using the
Packit	67cb25	given values of the mean :data:`mean` and standard deviation :data:`sd`,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: skew = {1 \over N} \sum {\left( x_i - mean \over sd \right)}^3`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`skew = (1/N) \sum ((x_i - mean)/sd)^3`
Packit	67cb25
Packit	67cb25	`These functions are useful if you have already computed the mean and`
Packit	67cb25	standard deviation of :data:`data` and want to avoid recomputing them.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_kurtosis (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function computes the kurtosis of :data:`data`, a dataset of length
Packit	67cb25	:data:`n` with stride :data:`stride`. The kurtosis is defined as,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math::`
Packit	67cb25
Packit	67cb25	`kurtosis = \left( {1 \over N} \sum`
Packit	67cb25	`{\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4`
Packit	67cb25	`\right)`
Packit	67cb25	`- 3`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3`
Packit	67cb25
Packit	67cb25	`The kurtosis measures how sharply peaked a distribution is, relative to`
Packit	67cb25	`its width. The kurtosis is normalized to zero for a Gaussian`
Packit	67cb25	`distribution.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_kurtosis_m_sd (const double data[], size_t stride, size_t n, double mean, double sd)`
Packit	67cb25
Packit	67cb25	This function computes the kurtosis of the dataset :data:`data` using the
Packit	67cb25	given values of the mean :data:`mean` and standard deviation :data:`sd`,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math::`
Packit	67cb25
Packit	67cb25	`kurtosis = {1 \over N}`
Packit	67cb25	`\left( \sum {\left(x_i - mean \over sd \right)}^4 \right)`
Packit	67cb25	`- 3`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3`
Packit	67cb25
Packit	67cb25	`This function is useful if you have already computed the mean and`
Packit	67cb25	standard deviation of :data:`data` and want to avoid recomputing them.
Packit	67cb25
Packit	67cb25	`Autocorrelation`
Packit	67cb25	`===============`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_lag1_autocorrelation (const double data[], const size_t stride, const size_t n)`
Packit	67cb25
Packit	67cb25	This function computes the lag-1 autocorrelation of the dataset :data:`data`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math::`
Packit	67cb25
Packit	67cb25	`a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)`
Packit	67cb25	`\over`
Packit	67cb25	`\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)`
Packit	67cb25	`\over`
Packit	67cb25	`\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_lag1_autocorrelation_m (const double data[], const size_t stride, const size_t n, const double mean)`
Packit	67cb25
Packit	67cb25	`This function computes the lag-1 autocorrelation of the dataset`
Packit	67cb25	:data:`data` using the given value of the mean :data:`mean`.
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: covariance, of two datasets`
Packit	67cb25
Packit	67cb25	`Covariance`
Packit	67cb25	`==========`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_covariance (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n)`
Packit	67cb25
Packit	67cb25	This function computes the covariance of the datasets :data:`data1` and
Packit	67cb25	:data:`data2` which must both be of the same length :data:`n`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y)`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`covar = (1/(n - 1)) \sum_{i = 1}^{n} (x_i - \Hat x) (y_i - \Hat y)`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_covariance_m (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n, const double mean1, const double mean2)`
Packit	67cb25
Packit	67cb25	This function computes the covariance of the datasets :data:`data1` and
Packit	67cb25	:data:`data2` using the given values of the means, :data:`mean1` and
Packit	67cb25	:data:`mean2`. This is useful if you have already computed the means of
Packit	67cb25	:data:`data1` and :data:`data2` and want to avoid recomputing them.
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: correlation, of two datasets`
Packit	67cb25
Packit	67cb25	`Correlation`
Packit	67cb25	`===========`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_correlation (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n)`
Packit	67cb25
Packit	67cb25	`This function efficiently computes the Pearson correlation coefficient`
Packit	67cb25	between the datasets :data:`data1` and :data:`data2` which must both be of
Packit	67cb25	the same length :data:`n`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math::`
Packit	67cb25
Packit	67cb25	`r = {cov(x, y) \over \Hat\sigma_x \Hat\sigma_y} =`
Packit	67cb25	`{{1 \over n-1} \sum (x_i - \Hat x) (y_i - \Hat y)`
Packit	67cb25	`\over`
Packit	67cb25	`\sqrt{{1 \over n-1} \sum (x_i - {\Hat x})^2}`
Packit	67cb25	`\sqrt{{1 \over n-1} \sum (y_i - {\Hat y})^2}`
Packit	67cb25	`}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`r = cov(x, y) / (\Hat\sigma_x \Hat\sigma_y)`
Packit	67cb25	`= {1/(n-1) \sum (x_i - \Hat x) (y_i - \Hat y)`
Packit	67cb25	`\over`
Packit	67cb25	`\sqrt{1/(n-1) \sum (x_i - \Hat x)^2} \sqrt{1/(n-1) \sum (y_i - \Hat y)^2}`
Packit	67cb25	`}`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_spearman (const double data1[], const size_t stride1, const double data2[], const size_t stride2, const size_t n, double work[])`
Packit	67cb25
Packit	67cb25	`This function computes the Spearman rank correlation coefficient between`
Packit	67cb25	the datasets :data:`data1` and :data:`data2` which must both be of the same
Packit	67cb25	length :data:`n`. Additional workspace of size 2 * :data:`n` is required in
Packit	67cb25	:data:`work`. The Spearman rank correlation between vectors :math:`x` and
Packit	67cb25	:math:`y` is equivalent to the Pearson correlation between the ranked
Packit	67cb25	vectors :math:`x_R` and :math:`y_R`, where ranks are defined to be the
Packit	67cb25	`average of the positions of an element in the ascending order of the values.`
Packit	67cb25
Packit	67cb25	`Weighted Samples`
Packit	67cb25	`================`
Packit	67cb25
Packit	67cb25	`The functions described in this section allow the computation of`
Packit	67cb25	`statistics for weighted samples. The functions accept an array of`
Packit	67cb25	samples, :math:`x_i`, with associated weights, :math:`w_i`. Each sample
Packit	67cb25	:math:`x_i` is considered as having been drawn from a Gaussian
Packit	67cb25	distribution with variance :math:`\sigma_i^2`. The sample weight
Packit	67cb25	:math:`w_i` is defined as the reciprocal of this variance, :math:`w_i = 1/\sigma_i^2`.
Packit	67cb25	`Setting a weight to zero corresponds to removing a sample from a dataset.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wmean (const double w[], size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the weighted mean of the dataset :data:`data` with
Packit	67cb25	stride :data:`stride` and length :data:`n`, using the set of weights :data:`w`
Packit	67cb25	with stride :data:`wstride` and length :data:`n`. The weighted mean is defined as,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: {\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`\Hat\mu = (\sum w_i x_i) / (\sum w_i)`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wvariance (const double w[], size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the estimated variance of the dataset :data:`data`
Packit	67cb25	with stride :data:`stride` and length :data:`n`, using the set of weights
Packit	67cb25	:data:`w` with stride :data:`wstride` and length :data:`n`. The estimated
Packit	67cb25	`variance of a weighted dataset is calculated as,`
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math::`
Packit	67cb25
Packit	67cb25	`\Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}}`
Packit	67cb25	`\sum w_i (x_i - \Hat\mu)^2`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2)))`
Packit	67cb25	`\sum w_i (x_i - \Hat\mu)^2`
Packit	67cb25
Packit	67cb25	`Note that this expression reduces to an unweighted variance with the`
Packit	67cb25	familiar :math:`1/(N-1)` factor when there are :math:`N` equal non-zero
Packit	67cb25	`weights.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wvariance_m (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean)`
Packit	67cb25
Packit	67cb25	`This function returns the estimated variance of the weighted dataset`
Packit	67cb25	:data:`data` using the given weighted mean :data:`wmean`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wsd (const double w[], size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	`The standard deviation is defined as the square root of the variance.`
Packit	67cb25	`This function returns the square root of the corresponding variance`
Packit	67cb25	function :func:`gsl_stats_wvariance` above.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wsd_m (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean)`
Packit	67cb25
Packit	67cb25	`This function returns the square root of the corresponding variance`
Packit	67cb25	function :func:`gsl_stats_wvariance_m` above.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wvariance_with_fixed_mean (const double w[], size_t wstride, const double data[], size_t stride, size_t n, const double mean)`
Packit	67cb25
Packit	67cb25	`This function computes an unbiased estimate of the variance of the weighted`
Packit	67cb25	dataset :data:`data` when the population mean :data:`mean` of the underlying
Packit	67cb25	`distribution is known a priori. In this case the estimator for`
Packit	67cb25	the variance replaces the sample mean :math:`\Hat\mu` by the known
Packit	67cb25	population mean :math:`\mu`,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: \Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wsd_with_fixed_mean (const double w[], size_t wstride, const double data[], size_t stride, size_t n, const double mean)`
Packit	67cb25
Packit	67cb25	`The standard deviation is defined as the square root of the variance.`
Packit	67cb25	`This function returns the square root of the corresponding variance`
Packit	67cb25	`function above.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wtss (const double w[], const size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25	`double gsl_stats_wtss_m (const double w[], const size_t wstride, const double data[], size_t stride, size_t n, double wmean)`
Packit	67cb25
Packit	67cb25	`These functions return the weighted total sum of squares (TSS) of`
Packit	67cb25	:data:`data` about the weighted mean. For :func:`gsl_stats_wtss_m` the
Packit	67cb25	user-supplied value of :data:`wmean` is used, and for :func:`gsl_stats_wtss`
Packit	67cb25	it is computed using :func:`gsl_stats_wmean`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: {\rm TSS} = \sum w_i (x_i - wmean)^2`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`TSS = \sum w_i (x_i - wmean)^2`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wabsdev (const double w[], size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	`This function computes the weighted absolute deviation from the weighted`
Packit	67cb25	mean of :data:`data`. The absolute deviation from the mean is defined as,
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: absdev = {{\sum w_i \|x_i - \Hat\mu\|} \over {\sum w_i}}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`absdev = (\sum w_i \|x_i - \Hat\mu\|) / (\sum w_i)`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wabsdev_m (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean)`
Packit	67cb25
Packit	67cb25	`This function computes the absolute deviation of the weighted dataset`
Packit	67cb25	:data:`data` about the given weighted mean :data:`wmean`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wskew (const double w[], size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function computes the weighted skewness of the dataset :data:`data`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: skew = {{\sum w_i ((x_i - {\Hat x})/{\Hat \sigma})^3} \over {\sum w_i}}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`skew = (\sum w_i ((x_i - \Hat x)/\Hat \sigma)^3) / (\sum w_i)`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wskew_m_sd (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean, double wsd)`
Packit	67cb25
Packit	67cb25	This function computes the weighted skewness of the dataset :data:`data`
Packit	67cb25	`using the given values of the weighted mean and weighted standard`
Packit	67cb25	deviation, :data:`wmean` and :data:`wsd`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wkurtosis (const double w[], size_t wstride, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function computes the weighted kurtosis of the dataset :data:`data`.
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: kurtosis = {{\sum w_i ((x_i - {\Hat x})/{\Hat \sigma})^4} \over {\sum w_i}} - 3`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`kurtosis = ((\sum w_i ((x_i - \Hat x)/\Hat \sigma)^4) / (\sum w_i)) - 3`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_wkurtosis_m_sd (const double w[], size_t wstride, const double data[], size_t stride, size_t n, double wmean, double wsd)`
Packit	67cb25
Packit	67cb25	This function computes the weighted kurtosis of the dataset :data:`data`
Packit	67cb25	`using the given values of the weighted mean and weighted standard`
Packit	67cb25	deviation, :data:`wmean` and :data:`wsd`.
Packit	67cb25
Packit	67cb25	`Maximum and Minimum values`
Packit	67cb25	`==========================`
Packit	67cb25
Packit	67cb25	`The following functions find the maximum and minimum values of a`
Packit	67cb25	dataset (or their indices). If the data contains :code:`NaN`-s then a
Packit	67cb25	:code:`NaN` will be returned, since the maximum or minimum value is
Packit	67cb25	`undefined. For functions which return an index, the location of the`
Packit	67cb25	first :code:`NaN` in the array is returned.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_max (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the maximum value in :data:`data`, a dataset of
Packit	67cb25	length :data:`n` with stride :data:`stride`. The maximum value is defined
Packit	67cb25	as the value of the element :math:`x_i` which satisfies :math:`x_i \ge x_j`
Packit	67cb25	for all :math:`j`.
Packit	67cb25
Packit	67cb25	`If you want instead to find the element with the largest absolute`
Packit	67cb25	magnitude you will need to apply :func:`fabs` or :func:`abs` to your data
Packit	67cb25	`before calling this function.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_min (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the minimum value in :data:`data`, a dataset of
Packit	67cb25	length :data:`n` with stride :data:`stride`. The minimum value is defined
Packit	67cb25	as the value of the element :math:`x_i` which satisfies :math:`x_i \le x_j`
Packit	67cb25	for all :math:`j`.
Packit	67cb25
Packit	67cb25	`If you want instead to find the element with the smallest absolute`
Packit	67cb25	magnitude you will need to apply :func:`fabs` or :func:`abs` to your data
Packit	67cb25	`before calling this function.`
Packit	67cb25
Packit	67cb25	`.. function:: void gsl_stats_minmax (double * min, double * max, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function finds both the minimum and maximum values :data:`min`,
Packit	67cb25	:data:`max` in :data:`data` in a single pass.
Packit	67cb25
Packit	67cb25	`.. function:: size_t gsl_stats_max_index (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the index of the maximum value in :data:`data`, a
Packit	67cb25	dataset of length :data:`n` with stride :data:`stride`. The maximum value is
Packit	67cb25	defined as the value of the element :math:`x_i` which satisfies
Packit	67cb25	:math:`x_i \ge x_j`
Packit	67cb25	for all :math:`j`. When there are several equal maximum
Packit	67cb25	`elements then the first one is chosen.`
Packit	67cb25
Packit	67cb25	`.. function:: size_t gsl_stats_min_index (const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the index of the minimum value in :data:`data`, a
Packit	67cb25	dataset of length :data:`n` with stride :data:`stride`. The minimum value
Packit	67cb25	is defined as the value of the element :math:`x_i` which satisfies
Packit	67cb25	:math:`x_i \ge x_j`
Packit	67cb25	for all :math:`j`. When there are several equal
Packit	67cb25	`minimum elements then the first one is chosen.`
Packit	67cb25
Packit	67cb25	`.. function:: void gsl_stats_minmax_index (size_t * min_index, size_t * max_index, const double data[], size_t stride, size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the indexes :data:`min_index`, :data:`max_index` of
Packit	67cb25	the minimum and maximum values in :data:`data` in a single pass.
Packit	67cb25
Packit	67cb25	`Median and Percentiles`
Packit	67cb25	`======================`
Packit	67cb25
Packit	67cb25	`The median and percentile functions described in this section operate on`
Packit	67cb25	sorted data in :math:`O(1)` time. There is also a routine for computing
Packit	67cb25	the median of an unsorted input array in average :math:`O(n)` time using
Packit	67cb25	`the quickselect algorithm. For convenience we use quantiles, measured on a scale`
Packit	67cb25	`of 0 to 1, instead of percentiles (which use a scale of 0 to 100).`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_median_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the median value of :data:`sorted_data`, a dataset
Packit	67cb25	of length :data:`n` with stride :data:`stride`. The elements of the array
Packit	67cb25	`must be in ascending numerical order. There are no checks to see`
Packit	67cb25	whether the data are sorted, so the function :func:`gsl_sort` should
Packit	67cb25	`always be used first.`
Packit	67cb25
Packit	67cb25	`When the dataset has an odd number of elements the median is the value`
Packit	67cb25	of element :math:`(n-1)/2`. When the dataset has an even number of
Packit	67cb25	`elements the median is the mean of the two nearest middle values,`
Packit	67cb25	elements :math:`(n-1)/2` and :math:`n/2`. Since the algorithm for
Packit	67cb25	`computing the median involves interpolation this function always returns`
Packit	67cb25	`a floating-point number, even for integer data types.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_median (double data[], const size_t stride, const size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the median value of :data:`data`, a dataset
Packit	67cb25	of length :data:`n` with stride :data:`stride`. The median is found
Packit	67cb25	`using the quickselect algorithm. The input array does not need to be`
Packit	67cb25	`sorted, but note that the algorithm rearranges the array and so the input`
Packit	67cb25	`is not preserved on output.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_quantile_from_sorted_data (const double sorted_data[], size_t stride, size_t n, double f)`
Packit	67cb25
Packit	67cb25	This function returns a quantile value of :data:`sorted_data`, a
Packit	67cb25	double-precision array of length :data:`n` with stride :data:`stride`. The
Packit	67cb25	`elements of the array must be in ascending numerical order. The`
Packit	67cb25	quantile is determined by the :data:`f`, a fraction between 0 and 1. For
Packit	67cb25	example, to compute the value of the 75th percentile :data:`f` should have
Packit	67cb25	`the value 0.75.`
Packit	67cb25
Packit	67cb25	`There are no checks to see whether the data are sorted, so the function`
Packit	67cb25	:func:`gsl_sort` should always be used first.
Packit	67cb25
Packit	67cb25	`The quantile is found by interpolation, using the formula`
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: \hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`quantile = (1 - \delta) x_i + \delta x_{i+1}`
Packit	67cb25
Packit	67cb25	where :math:`i` is :code:`floor((n - 1)f)` and :math:`\delta` is
Packit	67cb25	:math:`(n-1)f - i`.
Packit	67cb25
Packit	67cb25	Thus the minimum value of the array (:code:`data[0*stride]`) is given by
Packit	67cb25	:data:`f` equal to zero, the maximum value (:code:`data[(n-1)*stride]`) is
Packit	67cb25	given by :data:`f` equal to one and the median value is given by :data:`f`
Packit	67cb25	`equal to 0.5. Since the algorithm for computing quantiles involves`
Packit	67cb25	`interpolation this function always returns a floating-point number, even`
Packit	67cb25	`for integer data types.`
Packit	67cb25
Packit	67cb25	`.. @node Statistical tests`
Packit	67cb25	`.. @section Statistical tests`
Packit	67cb25
Packit	67cb25	`.. FIXME, do more work on the statistical tests`
Packit	67cb25
Packit	67cb25	`.. -@deftypefun double gsl_stats_ttest (const double data1[], double data2[], size_t n1, size_t n2)`
Packit	67cb25	`.. -@deftypefunx Statistics double gsl_stats_int_ttest (const double data1[], double data2[], size_t n1, size_t n2)`
Packit	67cb25
Packit	67cb25	.. The function :func:`gsl_stats_ttest` computes the t-test statistic for
Packit	67cb25	.. the two arrays :data:`data1`[] and :data:`data2`[], of lengths :data:`n1` and
Packit	67cb25	.. -:data:`n2` respectively.
Packit	67cb25
Packit	67cb25	`.. The t-test statistic measures the difference between the means of two`
Packit	67cb25	`.. datasets.`
Packit	67cb25
Packit	67cb25	`Order Statistics`
Packit	67cb25	`================`
Packit	67cb25
Packit	67cb25	The :math:`k`-th order statistic of a sample is equal to its :math:`k`-th smallest value.
Packit	67cb25	The :math:`k`-th order statistic of a set of :math:`n` values :math:`x = \left\{ x_i \right\}, 1 \le i \le n` is
Packit	67cb25	denoted :math:`x_{(k)}`. The median of the set :math:`x` is equal to :math:`x_{\left( \frac{n}{2} \right)}` if
Packit	67cb25	:math:`n` is odd, or the average of :math:`x_{\left( \frac{n}{2} \right)}` and :math:`x_{\left( \frac{n}{2} + 1 \right)}`
Packit	67cb25	if :math:`n` is even. The :math:`k`-th smallest element of a length :math:`n` vector can be found
Packit	67cb25	in average :math:`O(n)` time using the quickselect algorithm.
Packit	67cb25
Packit	67cb25	`.. function:: gsl_stats_select(double data[], const size_t stride, const size_t n, const size_t k)`
Packit	67cb25
Packit	67cb25	This function finds the :data:`k`-th smallest element of the input array :data:`data`
Packit	67cb25	of length :data:`n` and stride :data:`stride` using the quickselect method. The
Packit	67cb25	algorithm rearranges the elements of :data:`data` and so the input array is not preserved
Packit	67cb25	`on output.`
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: robust location estimators`
Packit	67cb25	`single: location estimation`
Packit	67cb25	`single: estimation, location`
Packit	67cb25
Packit	67cb25	`Robust Location Estimates`
Packit	67cb25	`=========================`
Packit	67cb25
Packit	67cb25	`A location estimate refers to a typical or central value which best describes a given`
Packit	67cb25	`dataset. The mean and median are both examples of location estimators. However, the`
Packit	67cb25	`mean has a severe sensitivity to data outliers and can give erroneous values when`
Packit	67cb25	`even a small number of outliers are present. The median on the other hand, has`
Packit	67cb25	`a strong insensitivity to data outliers, but due to its non-smoothness it can`
Packit	67cb25	`behave unexpectedly in certain situations. GSL offers the following alternative`
Packit	67cb25	`location estimators, which are robust to the presence of outliers.`
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: trimmed mean`
Packit	67cb25	`single: truncated mean`
Packit	67cb25	`single: mean, trimmed`
Packit	67cb25	`single: mean, truncated`
Packit	67cb25
Packit	67cb25	`Trimmed Mean`
Packit	67cb25	`------------`
Packit	67cb25
Packit	67cb25	`The trimmed mean, or truncated mean, discards a certain number of smallest and largest`
Packit	67cb25	`samples from the input vector before computing the mean of the remaining samples. The`
Packit	67cb25	amount of trimming is specified by a factor :math:`\alpha \in [0,0.5]`. Then the
Packit	67cb25	`number of samples discarded from both ends of the input vector is`
Packit	67cb25	:math:`\left\lfloor \alpha n \right\rfloor`, where :math:`n` is the length of the input.
Packit	67cb25	So to discard 25% of the samples from each end, one would set :math:`\alpha = 0.25`.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_trmean_from_sorted_data (const double alpha, const double sorted_data[], const size_t stride, const size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the trimmed mean of :data:`sorted_data`, a dataset
Packit	67cb25	of length :data:`n` with stride :data:`stride`. The elements of the array
Packit	67cb25	`must be in ascending numerical order. There are no checks to see`
Packit	67cb25	whether the data are sorted, so the function :func:`gsl_sort` should
Packit	67cb25	always be used first. The trimming factor :math:`\alpha` is given in :data:`alpha`.
Packit	67cb25	If :math:`\alpha \ge 0.5`, then the median of the input is returned.
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: Gastwirth estimator`
Packit	67cb25
Packit	67cb25	`Gastwirth Estimator`
Packit	67cb25	`-------------------`
Packit	67cb25
Packit	67cb25	`Gastwirth's location estimator is a weighted sum of three order statistics,`
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: gastwirth = 0.3 \times Q_{\frac{1}{3}} + 0.4 \times Q_{\frac{1}{2}} + 0.3 \times Q_{\frac{2}{3}}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`gastwirth = 0.3 * Q_{1/3} + 0.4 * Q_{1/2} + 0.3 * Q_{2/3}`
Packit	67cb25
Packit	67cb25	where :math:`Q_{\frac{1}{3}}` is the one-third quantile, :math:`Q_{\frac{1}{2}}` is the one-half
Packit	67cb25	quantile (i.e. median), and :math:`Q_{\frac{2}{3}}` is the two-thirds quantile.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_gastwirth_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n)`
Packit	67cb25
Packit	67cb25	This function returns the Gastwirth location estimator of :data:`sorted_data`, a dataset
Packit	67cb25	of length :data:`n` with stride :data:`stride`. The elements of the array
Packit	67cb25	`must be in ascending numerical order. There are no checks to see`
Packit	67cb25	whether the data are sorted, so the function :func:`gsl_sort` should
Packit	67cb25	`always be used first.`
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: robust scale estimators`
Packit	67cb25	`single: scale estimation`
Packit	67cb25	`single: estimation, scale`
Packit	67cb25
Packit	67cb25	`Robust Scale Estimates`
Packit	67cb25	`======================`
Packit	67cb25
Packit	67cb25	`A robust scale estimate, also known as a robust measure of scale, attempts to quantify`
Packit	67cb25	`the statistical dispersion (variability, scatter, spread) in a set of data which may contain outliers.`
Packit	67cb25	`In such datasets, the usual variance or standard deviation scale estimate can be rendered useless`
Packit	67cb25	`by even a single outlier.`
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: median absolute deviation`
Packit	67cb25
Packit	67cb25	`.. _sec_mad-statistic:`
Packit	67cb25
Packit	67cb25	`Median Absolute Deviation (MAD)`
Packit	67cb25	`-------------------------------`
Packit	67cb25
Packit	67cb25	`The median absolute deviation (MAD) is defined as`
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: MAD = 1.4826 \times \textrm{median} \left\{ \left\| x_i - \textrm{median} \left( x \right) \right\| \right\}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`MAD = 1.4826 median { \| x_i - median(x) \| }`
Packit	67cb25
Packit	67cb25	`In words, first the median of all samples is computed. Then the median`
Packit	67cb25	`is subtracted from all samples in the input to find the deviation of each sample`
Packit	67cb25	`from the median. The median of all absolute deviations is then the MAD.`
Packit	67cb25	The factor :math:`1.4826` makes the MAD an unbiased estimator of the standard deviation for Gaussian data.
Packit	67cb25	`The median absolute deviation has an asymptotic efficiency of 37%.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_mad0 (const double data[], const size_t stride, const size_t n, double work[])`
Packit	67cb25	`.. function:: double gsl_stats_mad (const double data[], const size_t stride, const size_t n, double work[])`
Packit	67cb25
Packit	67cb25	These functions return the median absolute deviation of :data:`data`, a dataset
Packit	67cb25	of length :data:`n` and stride :data:`stride`.
Packit	67cb25	The :code:`mad0` function calculates
Packit	67cb25	:math:`\textrm{median} \left\{ \left\| x_i - \textrm{median} \left( x \right) \right\| \right\}`
Packit	67cb25	(i.e. the :math:`MAD` statistic without the bias correction scale factor).
Packit	67cb25	These functions require additional workspace of size :code:`n` provided in :data:`work`.
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: Sn statistic`
Packit	67cb25
Packit	67cb25	`.. _sec_Sn-statistic:`
Packit	67cb25
Packit	67cb25	:math:`S_n` Statistic
Packit	67cb25	`---------------------`
Packit	67cb25
Packit	67cb25	The :math:`S_n` statistic developed by Croux and Rousseeuw is defined as
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: S_n = 1.1926 \times c_n \times \textrm{median}_i \left\{ \textrm{median}_j \left( \left\| x_i - x_j \right\| \right) \right\}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`S_n = 1.1926 * c_n * median_i { median_j ( \| x_i - x_j \| ) }`
Packit	67cb25
Packit	67cb25	For each sample :math:`x_i, 1 \le i \le n`, the median of the values :math:`\left\| x_i - x_j \right\|` is computed for all
Packit	67cb25	:math:`x_j, 1 \le j \le n`. This yields :math:`n` values, whose median then gives the final :math:`S_n`.
Packit	67cb25	The factor :math:`1.1926` makes :math:`S_n` an unbiased estimate of the standard deviation for Gaussian data.
Packit	67cb25	The factor :math:`c_n` is a correction factor to correct bias in small sample sizes. :math:`S_n` has an asymptotic
Packit	67cb25	`efficiency of 58%.`
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_Sn0_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[])`
Packit	67cb25	`.. function:: double gsl_stats_Sn_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[])`
Packit	67cb25
Packit	67cb25	These functions return the :math:`S_n` statistic of :data:`sorted_data`, a dataset
Packit	67cb25	of length :data:`n` with stride :data:`stride`. The elements of the array
Packit	67cb25	`must be in ascending numerical order. There are no checks to see`
Packit	67cb25	whether the data are sorted, so the function :func:`gsl_sort` should
Packit	67cb25	always be used first. The :code:`Sn0` function calculates
Packit	67cb25	:math:`\textrm{median}_i \left\{ \textrm{median}_j \left( \left\| x_i - x_j \right\| \right) \right\}`
Packit	67cb25	(i.e. the :math:`S_n` statistic without the bias correction scale factors).
Packit	67cb25	`These functions require additional workspace of size`
Packit	67cb25	:code:`n` provided in :data:`work`.
Packit	67cb25
Packit	67cb25	`.. index::`
Packit	67cb25	`single: Qn statistic`
Packit	67cb25
Packit	67cb25	`.. _sec_Qn-statistic:`
Packit	67cb25
Packit	67cb25	:math:`Q_n` Statistic
Packit	67cb25	`---------------------`
Packit	67cb25
Packit	67cb25	The :math:`Q_n` statistic developed by Croux and Rousseeuw is defined as
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: Q_n = 2.21914 \times d_n \times \left\{ \left\| x_i - x_j \right\|, i < j \right\}_{(k)}`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`Q_n = 2.21914 * d_n * { \| x_i - x_j \|, i < j }_{(k)}`
Packit	67cb25
Packit	67cb25	The factor :math:`2.21914` makes :math:`Q_n` an unbiased estimate of the standard deviation for Gaussian data.
Packit	67cb25	The factor :math:`d_n` is a correction factor to correct bias in small sample sizes. The order statistic
Packit	67cb25	`is`
Packit	67cb25
Packit	67cb25	`.. only:: not texinfo`
Packit	67cb25
Packit	67cb25	`.. math:: k = \left(`
Packit	67cb25	`\begin{array}{c}`
Packit	67cb25	`\left\lfloor \frac{n}{2} \right\rfloor + 1 \\`
Packit	67cb25	`2`
Packit	67cb25	`\end{array}`
Packit	67cb25	`\right)`
Packit	67cb25
Packit	67cb25	`.. only:: texinfo`
Packit	67cb25
Packit	67cb25	`::`
Packit	67cb25
Packit	67cb25	`k = ( floor(n/2) + 1 )`
Packit	67cb25	`( 2 )`
Packit	67cb25
Packit	67cb25	:math:`Q_n` has an asymptotic efficiency of 82%.
Packit	67cb25
Packit	67cb25	`.. function:: double gsl_stats_Qn0_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[], int work_int[])`
Packit	67cb25	`double gsl_stats_Qn_from_sorted_data (const double sorted_data[], const size_t stride, const size_t n, double work[], int work_int[])`
Packit	67cb25
Packit	67cb25	These functions return the :math:`Q_n` statistic of :data:`sorted_data`, a dataset
Packit	67cb25	of length :data:`n` with stride :data:`stride`. The elements of the array
Packit	67cb25	`must be in ascending numerical order. There are no checks to see`
Packit	67cb25	whether the data are sorted, so the function :func:`gsl_sort` should
Packit	67cb25	always be used first. The :code:`Qn0` function calculates
Packit	67cb25	:math:`\left\{ \left\| x_i - x_j \right\|, i < j \right\}_{(k)}`
Packit	67cb25	(i.e. :math:`Q_n` without the bias correction scale factors).
Packit	67cb25	`These functions require additional workspace of size`
Packit	67cb25	:code:`3n` provided in :data:`work` and integer workspace of size :code:`5n`
Packit	67cb25	provided in :data:`work_int`.
Packit	67cb25
Packit	67cb25	`Examples`
Packit	67cb25	`========`
Packit	67cb25
Packit	67cb25	`Here is a basic example of how to use the statistical functions:`
Packit	67cb25
Packit	67cb25	`.. include:: examples/stat.c`
Packit	67cb25	`:code:`
Packit	67cb25
Packit	67cb25	`The program should produce the following output,`
Packit	67cb25
Packit	67cb25	`.. include:: examples/stat.txt`
Packit	67cb25	`:code:`
Packit	67cb25
Packit	67cb25	`Here is an example using sorted data,`
Packit	67cb25
Packit	67cb25	`.. include:: examples/statsort.c`
Packit	67cb25	`:code:`
Packit	67cb25
Packit	67cb25	`This program should produce the following output,`
Packit	67cb25
Packit	67cb25	`.. include:: examples/statsort.txt`
Packit	67cb25	`:code:`
Packit	67cb25
Packit	67cb25	`References and Further Reading`
Packit	67cb25	`==============================`
Packit	67cb25
Packit	67cb25	`The standard reference for almost any topic in statistics is the`
Packit	67cb25	`multi-volume Advanced Theory of Statistics by Kendall and Stuart.`
Packit	67cb25
Packit	67cb25	`* Maurice Kendall, Alan Stuart, and J. Keith Ord.`
Packit	67cb25	`The Advanced Theory of Statistics (multiple volumes)`
Packit	67cb25	`reprinted as Kendall's Advanced Theory of Statistics.`
Packit	67cb25	`Wiley, ISBN 047023380X.`
Packit	67cb25
Packit	67cb25	`Many statistical concepts can be more easily understood by a Bayesian`
Packit	67cb25	`approach. The following book by Gelman, Carlin, Stern and Rubin gives a`
Packit	67cb25	`comprehensive coverage of the subject.`
Packit	67cb25
Packit	67cb25	`* Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.`
Packit	67cb25	`Bayesian Data Analysis.`
Packit	67cb25	`Chapman & Hall, ISBN 0412039915.`
Packit	67cb25
Packit	67cb25	`For physicists the Particle Data Group provides useful reviews of`
Packit	67cb25	`Probability and Statistics in the "Mathematical Tools" section of its`
Packit	67cb25	`Annual Review of Particle Physics.`
Packit	67cb25
Packit	67cb25	`* Review of Particle Properties,`
Packit	67cb25	`R.M. Barnett et al., Physical Review D54, 1 (1996)`
Packit	67cb25
Packit	67cb25	`The Review of Particle Physics is available online at`
Packit	67cb25	`the website http://pdg.lbl.gov/.`
Packit	67cb25
Packit	67cb25	`The following papers describe robust scale estimation,`
Packit	67cb25
Packit	67cb25	`* C. Croux and P. J. Rousseeuw, *Time-Efficient algorithms for two highly robust`
Packit	67cb25	`estimators of scale*, Comp. Stat., Physica, Heidelberg, 1992.`
Packit	67cb25	`* P. J. Rousseeuw and C. Croux, Explicit scale estimators with high breakdown point,`
Packit	67cb25	`L1-Statistical Analysis and Related Methods, pp. 77-92, 1992.`

source-git / gsl

Source Code

Blame doc/statistics.rst