Blame netem/README.distribution

Packit Service 3880ab
Notes about distribution tables from Nistnet
Packit Service 3880ab
-------------------------------------------------------------------------------
Packit Service 3880ab
I. About the distribution tables
Packit Service 3880ab
Packit Service 3880ab
The table used for "synthesizing" the distribution is essentially a scaled,
Packit Service 3880ab
translated, inverse to the cumulative distribution function.
Packit Service 3880ab
Packit Service 3880ab
Here's how to think about it: Let F() be the cumulative distribution
Packit Service 3880ab
function for a probability distribution X.  We'll assume we've scaled
Packit Service 3880ab
things so that X has mean 0 and standard deviation 1, though that's not
Packit Service 3880ab
so important here.  Then:
Packit Service 3880ab
Packit Service 3880ab
	F(x) = P(X <= x) = \int_{-inf}^x f
Packit Service 3880ab
Packit Service 3880ab
where f is the probability density function.
Packit Service 3880ab
Packit Service 3880ab
F is monotonically increasing, so has an inverse function G, with range
Packit Service 3880ab
0 to 1.  Here, G(t) = the x such that P(X <= x) = t.  (In general, G may
Packit Service 3880ab
have singularities if X has point masses, i.e., points x such that
Packit Service 3880ab
P(X = x) > 0.)
Packit Service 3880ab
Packit Service 3880ab
Now we create a tabular representation of G as follows:  Choose some table
Packit Service 3880ab
size N, and for the ith entry, put in G(i/N).  Let's call this table T.
Packit Service 3880ab
Packit Service 3880ab
The claim now is, I can create a (discrete) random variable Y whose
Packit Service 3880ab
distribution has the same approximate "shape" as X, simply by letting
Packit Service 3880ab
Y = T(U), where U is a discrete uniform random variable with range 1 to N.
Packit Service 3880ab
To see this, it's enough to show that Y's cumulative distribution function,
Packit Service 3880ab
(let's call it H), is a discrete approximation to F.  But
Packit Service 3880ab
Packit Service 3880ab
	H(x) = P(Y <= x)
Packit Service 3880ab
	     = (# of entries in T <= x) / N   -- as Y chosen uniformly from T
Packit Service 3880ab
	     = i/N, where i is the largest integer such that G(i/N) <= x
Packit Service 3880ab
	     = i/N, where i is the largest integer such that i/N <= F(x)
Packit Service 3880ab
	     		-- since G and F are inverse functions (and F is
Packit Service 3880ab
	     		   increasing)
Packit Service 3880ab
	     = floor(N*F(x))/N
Packit Service 3880ab
Packit Service 3880ab
as desired.
Packit Service 3880ab
Packit Service 3880ab
II. How to create distribution tables (in theory)
Packit Service 3880ab
Packit Service 3880ab
How can we create this table in practice? In some cases, F may have a
Packit Service 3880ab
simple expression which allows evaluating its inverse directly.  The
Packit Service 3880ab
pareto distribution is one example of this.  In other cases, and
Packit Service 3880ab
especially for matching an experimentally observed distribution, it's
Packit Service 3880ab
easiest simply to create a table for F and "invert" it.  Here, we give
Packit Service 3880ab
a concrete example, namely how the new "experimental" distribution was
Packit Service 3880ab
created.
Packit Service 3880ab
Packit Service 3880ab
1. Collect enough data points to characterize the distribution.  Here, I
Packit Service 3880ab
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
Packit Service 3880ab
That's far more data than is really necessary, but it was fairly painless to
Packit Service 3880ab
collect it, so...
Packit Service 3880ab
Packit Service 3880ab
2. Normalize the data so that it has mean 0 and standard deviation 1.
Packit Service 3880ab
Packit Service 3880ab
3. Determine the cumulative distribution.  The code I wrote creates a table
Packit Service 3880ab
covering the range -10 to +10, with granularity .00005.  Obviously, this
Packit Service 3880ab
is absurdly over-precise, but since it's a one-time only computation, I
Packit Service 3880ab
figured it hardly mattered.
Packit Service 3880ab
Packit Service 3880ab
4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
Packit Service 3880ab
(here, 4096) entry be x*TABLEFACTOR (here, 8192).  This creates a table
Packit Service 3880ab
for the ("normalized") inverse of size TABLESIZE, covering its domain 0
Packit Service 3880ab
to 1 with granularity 1/TABLESIZE.  Note that even with the granularity
Packit Service 3880ab
used in creating the table for F, it's possible not all the entries in
Packit Service 3880ab
the table for G will be filled in.  So, make a pass through the
Packit Service 3880ab
inverse's table, filling in any missing entries by linear interpolation.
Packit Service 3880ab
Packit Service 3880ab
III. How to create distribution tables (in practice)
Packit Service 3880ab
Packit Service 3880ab
If you want to do all this yourself, I've provided several tools to help:
Packit Service 3880ab
Packit Service 3880ab
1. maketable does the steps 2-4 above, and then generates the appropriate
Packit Service 3880ab
header file.  So if you have your own time distribution, you can generate
Packit Service 3880ab
the header simply by:
Packit Service 3880ab
Packit Service 3880ab
	maketable < time.values > header.h
Packit Service 3880ab
Packit Service 3880ab
2. As explained in the other README file, the somewhat sleazy way I have
Packit Service 3880ab
of generating correlated values needs correction.  You can generate your
Packit Service 3880ab
own correction tables by compiling makesigtable and makemutable with
Packit Service 3880ab
your header file.  Check the Makefile to see how this is done.
Packit Service 3880ab
Packit Service 3880ab
3. Warning: maketable, makesigtable and especially makemutable do
Packit Service 3880ab
enormous amounts of floating point arithmetic.  Don't try running
Packit Service 3880ab
these on an old 486.  (NIST Net itself will run fine on such a
Packit Service 3880ab
system, since in operation, it just needs to do a few simple integral
Packit Service 3880ab
calculations.  But getting there takes some work.)
Packit Service 3880ab
Packit Service 3880ab
4. The tables produced are all normalized for mean 0 and standard
Packit Service 3880ab
deviation 1.  How do you know what values to use for real?  Here, I've
Packit Service 3880ab
provided a simple "stats" utility.  Give it a series of floating point
Packit Service 3880ab
values, and it will return their mean (mu), standard deviation (sigma),
Packit Service 3880ab
and correlation coefficient (rho).  You can then plug these values
Packit Service 3880ab
directly into NIST Net.