
Packit Service 
3880ab 
Notes about distribution tables from Nistnet


Packit Service 
3880ab 



Packit Service 
3880ab 
I. About the distribution tables


Packit Service 
3880ab 


Packit Service 
3880ab 
The table used for "synthesizing" the distribution is essentially a scaled,


Packit Service 
3880ab 
translated, inverse to the cumulative distribution function.


Packit Service 
3880ab 


Packit Service 
3880ab 
Here's how to think about it: Let F() be the cumulative distribution


Packit Service 
3880ab 
function for a probability distribution X. We'll assume we've scaled


Packit Service 
3880ab 
things so that X has mean 0 and standard deviation 1, though that's not


Packit Service 
3880ab 
so important here. Then:


Packit Service 
3880ab 


Packit Service 
3880ab 
F(x) = P(X <= x) = \int_{inf}^x f


Packit Service 
3880ab 


Packit Service 
3880ab 
where f is the probability density function.


Packit Service 
3880ab 


Packit Service 
3880ab 
F is monotonically increasing, so has an inverse function G, with range


Packit Service 
3880ab 
0 to 1. Here, G(t) = the x such that P(X <= x) = t. (In general, G may


Packit Service 
3880ab 
have singularities if X has point masses, i.e., points x such that


Packit Service 
3880ab 
P(X = x) > 0.)


Packit Service 
3880ab 


Packit Service 
3880ab 
Now we create a tabular representation of G as follows: Choose some table


Packit Service 
3880ab 
size N, and for the ith entry, put in G(i/N). Let's call this table T.


Packit Service 
3880ab 


Packit Service 
3880ab 
The claim now is, I can create a (discrete) random variable Y whose


Packit Service 
3880ab 
distribution has the same approximate "shape" as X, simply by letting


Packit Service 
3880ab 
Y = T(U), where U is a discrete uniform random variable with range 1 to N.


Packit Service 
3880ab 
To see this, it's enough to show that Y's cumulative distribution function,


Packit Service 
3880ab 
(let's call it H), is a discrete approximation to F. But


Packit Service 
3880ab 


Packit Service 
3880ab 
H(x) = P(Y <= x)


Packit Service 
3880ab 
= (# of entries in T <= x) / N  as Y chosen uniformly from T


Packit Service 
3880ab 
= i/N, where i is the largest integer such that G(i/N) <= x


Packit Service 
3880ab 
= i/N, where i is the largest integer such that i/N <= F(x)


Packit Service 
3880ab 
 since G and F are inverse functions (and F is


Packit Service 
3880ab 
increasing)


Packit Service 
3880ab 
= floor(N*F(x))/N


Packit Service 
3880ab 


Packit Service 
3880ab 
as desired.


Packit Service 
3880ab 


Packit Service 
3880ab 
II. How to create distribution tables (in theory)


Packit Service 
3880ab 


Packit Service 
3880ab 
How can we create this table in practice? In some cases, F may have a


Packit Service 
3880ab 
simple expression which allows evaluating its inverse directly. The


Packit Service 
3880ab 
pareto distribution is one example of this. In other cases, and


Packit Service 
3880ab 
especially for matching an experimentally observed distribution, it's


Packit Service 
3880ab 
easiest simply to create a table for F and "invert" it. Here, we give


Packit Service 
3880ab 
a concrete example, namely how the new "experimental" distribution was


Packit Service 
3880ab 
created.


Packit Service 
3880ab 


Packit Service 
3880ab 
1. Collect enough data points to characterize the distribution. Here, I


Packit Service 
3880ab 
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).


Packit Service 
3880ab 
That's far more data than is really necessary, but it was fairly painless to


Packit Service 
3880ab 
collect it, so...


Packit Service 
3880ab 


Packit Service 
3880ab 
2. Normalize the data so that it has mean 0 and standard deviation 1.


Packit Service 
3880ab 


Packit Service 
3880ab 
3. Determine the cumulative distribution. The code I wrote creates a table


Packit Service 
3880ab 
covering the range 10 to +10, with granularity .00005. Obviously, this


Packit Service 
3880ab 
is absurdly overprecise, but since it's a onetime only computation, I


Packit Service 
3880ab 
figured it hardly mattered.


Packit Service 
3880ab 


Packit Service 
3880ab 
4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE


Packit Service 
3880ab 
(here, 4096) entry be x*TABLEFACTOR (here, 8192). This creates a table


Packit Service 
3880ab 
for the ("normalized") inverse of size TABLESIZE, covering its domain 0


Packit Service 
3880ab 
to 1 with granularity 1/TABLESIZE. Note that even with the granularity


Packit Service 
3880ab 
used in creating the table for F, it's possible not all the entries in


Packit Service 
3880ab 
the table for G will be filled in. So, make a pass through the


Packit Service 
3880ab 
inverse's table, filling in any missing entries by linear interpolation.


Packit Service 
3880ab 


Packit Service 
3880ab 
III. How to create distribution tables (in practice)


Packit Service 
3880ab 


Packit Service 
3880ab 
If you want to do all this yourself, I've provided several tools to help:


Packit Service 
3880ab 


Packit Service 
3880ab 
1. maketable does the steps 24 above, and then generates the appropriate


Packit Service 
3880ab 
header file. So if you have your own time distribution, you can generate


Packit Service 
3880ab 
the header simply by:


Packit Service 
3880ab 


Packit Service 
3880ab 
maketable < time.values > header.h


Packit Service 
3880ab 


Packit Service 
3880ab 
2. As explained in the other README file, the somewhat sleazy way I have


Packit Service 
3880ab 
of generating correlated values needs correction. You can generate your


Packit Service 
3880ab 
own correction tables by compiling makesigtable and makemutable with


Packit Service 
3880ab 
your header file. Check the Makefile to see how this is done.


Packit Service 
3880ab 


Packit Service 
3880ab 
3. Warning: maketable, makesigtable and especially makemutable do


Packit Service 
3880ab 
enormous amounts of floating point arithmetic. Don't try running


Packit Service 
3880ab 
these on an old 486. (NIST Net itself will run fine on such a


Packit Service 
3880ab 
system, since in operation, it just needs to do a few simple integral


Packit Service 
3880ab 
calculations. But getting there takes some work.)


Packit Service 
3880ab 


Packit Service 
3880ab 
4. The tables produced are all normalized for mean 0 and standard


Packit Service 
3880ab 
deviation 1. How do you know what values to use for real? Here, I've


Packit Service 
3880ab 
provided a simple "stats" utility. Give it a series of floating point


Packit Service 
3880ab 
values, and it will return their mean (mu), standard deviation (sigma),


Packit Service 
3880ab 
and correlation coefficient (rho). You can then plug these values


Packit Service 
3880ab 
directly into NIST Net.
