Blame netem/README.distribution

Packit d3f73b
Notes about distribution tables from Nistnet
Packit d3f73b
-------------------------------------------------------------------------------
Packit d3f73b
I. About the distribution tables
Packit d3f73b
Packit d3f73b
The table used for "synthesizing" the distribution is essentially a scaled,
Packit d3f73b
translated, inverse to the cumulative distribution function.
Packit d3f73b
Packit d3f73b
Here's how to think about it: Let F() be the cumulative distribution
Packit d3f73b
function for a probability distribution X.  We'll assume we've scaled
Packit d3f73b
things so that X has mean 0 and standard deviation 1, though that's not
Packit d3f73b
so important here.  Then:
Packit d3f73b
Packit d3f73b
	F(x) = P(X <= x) = \int_{-inf}^x f
Packit d3f73b
Packit d3f73b
where f is the probability density function.
Packit d3f73b
Packit d3f73b
F is monotonically increasing, so has an inverse function G, with range
Packit d3f73b
0 to 1.  Here, G(t) = the x such that P(X <= x) = t.  (In general, G may
Packit d3f73b
have singularities if X has point masses, i.e., points x such that
Packit d3f73b
P(X = x) > 0.)
Packit d3f73b
Packit d3f73b
Now we create a tabular representation of G as follows:  Choose some table
Packit d3f73b
size N, and for the ith entry, put in G(i/N).  Let's call this table T.
Packit d3f73b
Packit d3f73b
The claim now is, I can create a (discrete) random variable Y whose
Packit d3f73b
distribution has the same approximate "shape" as X, simply by letting
Packit d3f73b
Y = T(U), where U is a discrete uniform random variable with range 1 to N.
Packit d3f73b
To see this, it's enough to show that Y's cumulative distribution function,
Packit d3f73b
(let's call it H), is a discrete approximation to F.  But
Packit d3f73b
Packit d3f73b
	H(x) = P(Y <= x)
Packit d3f73b
	     = (# of entries in T <= x) / N   -- as Y chosen uniformly from T
Packit d3f73b
	     = i/N, where i is the largest integer such that G(i/N) <= x
Packit d3f73b
	     = i/N, where i is the largest integer such that i/N <= F(x)
Packit d3f73b
	     		-- since G and F are inverse functions (and F is
Packit d3f73b
	     		   increasing)
Packit d3f73b
	     = floor(N*F(x))/N
Packit d3f73b
Packit d3f73b
as desired.
Packit d3f73b
Packit d3f73b
II. How to create distribution tables (in theory)
Packit d3f73b
Packit d3f73b
How can we create this table in practice? In some cases, F may have a
Packit d3f73b
simple expression which allows evaluating its inverse directly.  The
Packit d3f73b
pareto distribution is one example of this.  In other cases, and
Packit d3f73b
especially for matching an experimentally observed distribution, it's
Packit d3f73b
easiest simply to create a table for F and "invert" it.  Here, we give
Packit d3f73b
a concrete example, namely how the new "experimental" distribution was
Packit d3f73b
created.
Packit d3f73b
Packit d3f73b
1. Collect enough data points to characterize the distribution.  Here, I
Packit d3f73b
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
Packit d3f73b
That's far more data than is really necessary, but it was fairly painless to
Packit d3f73b
collect it, so...
Packit d3f73b
Packit d3f73b
2. Normalize the data so that it has mean 0 and standard deviation 1.
Packit d3f73b
Packit d3f73b
3. Determine the cumulative distribution.  The code I wrote creates a table
Packit d3f73b
covering the range -10 to +10, with granularity .00005.  Obviously, this
Packit d3f73b
is absurdly over-precise, but since it's a one-time only computation, I
Packit d3f73b
figured it hardly mattered.
Packit d3f73b
Packit d3f73b
4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
Packit d3f73b
(here, 4096) entry be x*TABLEFACTOR (here, 8192).  This creates a table
Packit d3f73b
for the ("normalized") inverse of size TABLESIZE, covering its domain 0
Packit d3f73b
to 1 with granularity 1/TABLESIZE.  Note that even with the granularity
Packit d3f73b
used in creating the table for F, it's possible not all the entries in
Packit d3f73b
the table for G will be filled in.  So, make a pass through the
Packit d3f73b
inverse's table, filling in any missing entries by linear interpolation.
Packit d3f73b
Packit d3f73b
III. How to create distribution tables (in practice)
Packit d3f73b
Packit d3f73b
If you want to do all this yourself, I've provided several tools to help:
Packit d3f73b
Packit d3f73b
1. maketable does the steps 2-4 above, and then generates the appropriate
Packit d3f73b
header file.  So if you have your own time distribution, you can generate
Packit d3f73b
the header simply by:
Packit d3f73b
Packit d3f73b
	maketable < time.values > header.h
Packit d3f73b
Packit d3f73b
2. As explained in the other README file, the somewhat sleazy way I have
Packit d3f73b
of generating correlated values needs correction.  You can generate your
Packit d3f73b
own correction tables by compiling makesigtable and makemutable with
Packit d3f73b
your header file.  Check the Makefile to see how this is done.
Packit d3f73b
Packit d3f73b
3. Warning: maketable, makesigtable and especially makemutable do
Packit d3f73b
enormous amounts of floating point arithmetic.  Don't try running
Packit d3f73b
these on an old 486.  (NIST Net itself will run fine on such a
Packit d3f73b
system, since in operation, it just needs to do a few simple integral
Packit d3f73b
calculations.  But getting there takes some work.)
Packit d3f73b
Packit d3f73b
4. The tables produced are all normalized for mean 0 and standard
Packit d3f73b
deviation 1.  How do you know what values to use for real?  Here, I've
Packit d3f73b
provided a simple "stats" utility.  Give it a series of floating point
Packit d3f73b
values, and it will return their mean (mu), standard deviation (sigma),
Packit d3f73b
and correlation coefficient (rho).  You can then plug these values
Packit d3f73b
directly into NIST Net.