Tree - source-git/gsl - CentOS Git server

source-git / gsl

Blame statistics/TODO

Blob History Raw

Packit	67cb25	`# -- org --`
Packit	67cb25	`#+CATEGORY: statistics`
Packit	67cb25
Packit	67cb25	`* From: James Theiler <jt@lanl.gov>`
Packit	67cb25	`To: John Lamb <J.D.Lamb@btinternet.com>`
Packit	67cb25	`Cc: gsl-discuss@sources.redhat.com`
Packit	67cb25	`Subject: Re: Collecting statistics for time dependent data?`
Packit	67cb25	`Date: Thu, 9 Dec 2004 14:18:36 -0700 (MST)`
Packit	67cb25
Packit	67cb25	`On Thu, 9 Dec 2004, John Lamb wrote:`
Packit	67cb25
Packit	67cb25	`] Raimondo Giammanco wrote:`
Packit	67cb25	`] > Hello,`
Packit	67cb25	`] >`
Packit	67cb25	`] > I was wondering if there is a way to compute "running" statistics with`
Packit	67cb25	`] > gsl.`
Packit	67cb25	`] >`
Packit	67cb25	`] Yes you can do it, but there's nothing in GSL that does it and its eay`
Packit	67cb25	`] enough that you don't need GSL. Something like (untested)`
Packit	67cb25	`]`
Packit	67cb25	`] double update_mean( double* mean, int* n, double x ){`
Packit	67cb25	`] if( *n == 1 )`
Packit	67cb25	`] *mean = x;`
Packit	67cb25	`] else`
Packit	67cb25	`] mean = (1 - (double)1 / n ) * *mean + x / n;`
Packit	67cb25	`] }`
Packit	67cb25	`]`
Packit	67cb25	`] will work and you can derive a similar method for updating the variance`
Packit	67cb25	`] using the usual textbook formula.`
Packit	67cb25	`]`
Packit	67cb25	`] var[x] = (1/n) sum x^2_i - mean(x)^2`
Packit	67cb25	`]`
Packit	67cb25	`] I don't know if there is a method that avoids the rounding errors. I`
Packit	67cb25	`] don't know why so many textbooks repeat this formula without the`
Packit	67cb25	`] slightest warning that it can go so badly wrong.`
Packit	67cb25	`]`
Packit	67cb25	`]`
Packit	67cb25
Packit	67cb25	`Stably updating mean and variance is remarkably nontrivial. There was`
Packit	67cb25	`a series of papers in Comm ACM that discussed the issue; the final one`
Packit	67cb25	`(that I know of) refers back to the earlier ones, and it can be found`
Packit	67cb25	`in D.H.D. West, Updating mean and variance estimates: an improved`
Packit	67cb25	`method, Comm ACM 22:9, 532 (1979) [* I see Luke Stras just sent this`
Packit	67cb25	`reference! *]. I'll just copy out the pseudocode since the paper is`
Packit	67cb25	`old enough that it might not be easy to find. This, by the way, is`
Packit	67cb25	`generalized for weighted data, so it assumes that you get a weight and`
Packit	67cb25	`a data value (W_i and X_i) that you use to update the estimates XBAR`
Packit	67cb25	`and S2:`
Packit	67cb25
Packit	67cb25	`SUMW = W_1`
Packit	67cb25	`M = X_1`
Packit	67cb25	`T = 0`
Packit	67cb25	`For i=2,3,...,n`
Packit	67cb25	`{`
Packit	67cb25	`Q = X_i - M`
Packit	67cb25	`TEMP = SUM + W_i // typo: He meant SUMW`
Packit	67cb25	`R = Q*W_i/TEMP`
Packit	67cb25	`M = M + R`
Packit	67cb25	`T = T + RSUMWQ`
Packit	67cb25	`SUMW = TEMP`
Packit	67cb25	`}`
Packit	67cb25	`XBAR = M`
Packit	67cb25	`S2 = Tn/((n-1)SUMW)`
Packit	67cb25
Packit	67cb25
Packit	67cb25
Packit	67cb25	`jt`
Packit	67cb25
Packit	67cb25	`--`
Packit	67cb25	`James Theiler Space and Remote Sensing Sciences`
Packit	67cb25	`MS-B244, ISR-2, LANL Los Alamos National Laboratory`
Packit	67cb25	`Los Alamos, NM 87545 http://nis-www.lanl.gov/~jt`
Packit	67cb25
Packit	67cb25
Packit	67cb25	`* Look at STARPAC ftp://ftp.ucar.edu/starpac/ and Statlib`
Packit	67cb25	`http://lib.stat.cmu.edu/ for more ideas`
Packit	67cb25
Packit	67cb25	`* Try using the Kahan summation formula to improve accuracy for the`
Packit	67cb25	`NIST tests (see Brian for details, below is a sketch of the algorithm).`
Packit	67cb25
Packit	67cb25	`sum = x(1)`
Packit	67cb25	`c = 0`
Packit	67cb25
Packit	67cb25	`DO i = 2, 1000000, 1`
Packit	67cb25	`y = x(i) - c`
Packit	67cb25	`t = sum + y`
Packit	67cb25	`c = (t - sum) - y`
Packit	67cb25	`sum = t`
Packit	67cb25	`ENDDO`
Packit	67cb25
Packit	67cb25	`* Prevent incorrect use of unsorted data for quartile calculations`
Packit	67cb25	`using a typedef for sorted data (?)`
Packit	67cb25
Packit	67cb25	`* Rejection of outliers`
Packit	67cb25
Packit	67cb25	`* Time series. Auto correlation, cross-correlation, smoothing (moving`
Packit	67cb25	`average), detrending, various econometric things. Integrated`
Packit	67cb25	`quantities (area under the curve). Interpolation of noisy data/fitting`
Packit	67cb25	`-- maybe add that to the existing interpolation stuff.What about`
Packit	67cb25	`missing data and gaps?`
Packit	67cb25
Packit	67cb25	`There is a new GNU package called gretl which does econometrics`
Packit	67cb25
Packit	67cb25	`* Statistical tests (equal means, equal variance, etc).`
Packit	67cb25

source-git / gsl

Source Code

Blame statistics/TODO