C.3. Advanced Usage Guide

This section amplifies the Usage Guide with material on how to choose a random generator for your simulation, what default generators have been implemented for the distribution objects, and information on the set of test programs used.

C.3.1. Choosing a Generator

C.3.1.1. Choosing A Generator

In this version of Random for Swarm there are a total of 36 different generators implemented. If you are a serious simulationist you need to select which one(s) to use for your model.

Here are some factors to consider:

  1. We want a generator to have as good statistical properties as possible. We measure this by subjecting the generators to various tests. I have subjected the implemented generators to two sets of tests, Diehard and ENT. (Look in directory /testR6 of the test package.) (The Generators quality table summarizes the test results. The highlights are:

    1. The LCG and SCG generators are of very poor quality (they fail many of the tests), and should never be used. [They are likely to disappear in the next release.]

    2. The lagged-Fibonacci generators (ACG, SWBi, PSWB) all fail the Diehard `Birthday Spacings Test', and are therefore only conditionally recommended for use.

    3. The other 32-bit generators pass all tests, and I therefore have no reason not to recommend them all for use.

    4. The 31-bit generators all fail certain tests because one bit has a constant value. Beyond that they all seem to be ok.

  2. We want a generator to have a long enough period for our purpose, and in general the longer the period the better. (However, note that to generate 264 random numbers from a fast generator like MT19937 would take 2.1 million years on a 486/66, so in practice any generator with a period close to 260 or larger will be acceptable.) The PMMLCG generators, although they are of acceptable quality otherwise, have a period less than 231 which we can exhaust in under 3 hours. So use those only for quick `toy' applications.

  3. We want a generator to execute efficiently, the faster the better. The Generators data table indicates the execution times for the generators as implemented.

  4. We want a generator to take as little resources (memory) as possible. The Generators data table indicates the size of each generator's state in bits.

  5. Other things being equal, we want our generator to have a single full period rather than a number of shorter periods, since we may not know whether a particular starting seed will put us into a long or a short subcycle. This disqualifies SWB, and possibly the new MWC and RWC generators (Marsaglia does not say how many periods these generators have, but only gives the period length.)

C.3.1.2. Strategy For Using Random Generators

There are 3 possible strategies for using random generators:

  1. Use one single generator for your whole simulation, and have all `users' of randomness (distributions and other objects) call this single generator in an interleaved fashion. For this purpose, generators such as MT19937, TT800, TT775, TT443 (and possibly PSWB and SWB) seem particularly well suited since they have immense periods. The code in random/random.m shows how to connect several distribution objects to a single random generator. The generator randomGenerator supplied there is of class MT19937.

  2. Use a single generator of long period, but divide this long period up into a number of non-overlapping (hence statistically independent) segments, and let each `user' of randomness draw their random numbers from separate segments. Doing this requires that we be able to quickly access these separate segments. The `split' generators C2LCGX and C4LCGX implement such a scheme. You can specify at creation how you want the period of these generators subdivided. (See the source code for details.)

  3. Use a separate random generator for each `user' of randomness. In this case, you need to make sure that two or more generators of the same type are not started with the same seed, since in this case their output will be highly correlated. Provide your own seeds, or use RANDOMSEED or NEXTSEED.

The distribution objects, if created with the `createWithDefault: aZone' method, will create for themselves a fresh generator, with each class of distribution using a different class of generator. All the generators in this case are initialized with NEXTSEED (or RANDOMSEED, if you start the program with --varyseed).

C.3.1.3. Generator Quality

These tables shows the results of testing the generator objects statistically:

Table C-1. Random Library: Generator Statistical Tests

Generatortiming uS (unsigned)bits stateperiod (s) # lengthtests failed (x)
    abcdefghijklmnopqrstuvwx
30-bit output
SCG3.32816501 255x.xx2xxxxxxx..x..xx.xx..
32-bit output
LCG1 2.564 32 1 232 x...xxxxxxx.........x...
LCG2 2.564 32 1 232 x...xxxxxxx.........x...
LCG3 2.564 32 1 232 x...xxxxxxx.........x...
ACG 2.702 1760 1 255 x.......................
SWB1 3.285 1185 64 21178 x.......................
SWB2 3.285 7691536 2757 x.......................
SWB3 3.285 673 192 2664 x.......................
PSWB 3.452 1377 1 21376 x.......................
MWCA 3.399 64 ?? 259 ........................
MWCB 3.420 64 ?? 259 ........................
MT19937 3.69819968 1 219937........................
TT800 4.654 800 1 2800 ........................
C3MWC 6.387 192 ?? 2118 ........................
RWC2 7.445 96 ?? 292 ........................
RWC8 14.649 288 ?? 2250 ........................
31-bit output
C2TAUS1 4.078 621 260 1..-1x111x1.........x...
C2TAUS2 4.078 621 260 1..-1x111x1.........x...
C2TAUS3 4.078 621 260 1..-1x111x1.........x...
TT775 4.6547751 2775 1..-1x111x1.........x...
TT403 4.6704031 2403 1..-1x111x1.........x...
PMMLCG1 4.715 311 231 1..-1x111x1.........x...
PMMLCG2 4.715 311 231 1..-1x111x1.........x...
PMMLCG3 4.715 311 231 1..-1x111x1.........x...
PMMLCG4 4.715 311 231 1..-1x111x1.........x...
PMMLCG5 4.715 311 231 1..-1x111x1.........x...
PMMLCG6 4.715 311 231 1..-1x111x1.........x...
PMMLCG7 4.715 311 231 1..-1x111x1.........x...
PMMLCG8 4.715 311 231 1..-1x111x1.........x...
PMMLCG9 4.715 311 231 1..-1x111x1.........x...
C2LCGX 7.029 621 260 (Split)1..-1x111x1.........x...
MRG5 9.6741551 2155 1..-1x111x1.........x...
MRG6 10.4491861 2186 1..-1x111x1.........x...
MRG7 10.9132171 2217 1..-1x111x1.........x...
C4LCGX 11.6881241 2210 (Split)1..-1x111x1.........x...
C2MRG3 13.4591861 2185 1..-1x111x1.........x...
Test codeTest suiteExplanation of test codes
aDiehardBirthday Spacings Test
bDiehardOverlapping 5-permutation Test
cDiehardBinary Rank Test (31x31)
dDiehardBinary Rank Test (32x32)
eDiehardBinary Rank Test (6x8)
fDiehardBitstream Test (overlapping 20-tuples)
gDiehardOPSO (Overlapping Pairs, Sparse Occupancy)
hDiehardOQSO (Overlapping Quadruples, Sparse Occupancy)
iDiehardDNA Test
jDiehardCount-the-1's Test (integers)
kDiehardCount-the-1's Test (specific bytes)
lDiehardParking Lot Test
mDiehardMinimum Distance Test
nDiehard3D Spheres Test
oDiehardSqueeze Test
pDiehardOverlapping Sums Test
qDiehardRuns Test
rDiehardCraps Test
sENTEntropy Test
tENTCompression test
uENTChi-Square Test
vENTArithmetic Mean
wENTMonte Carlo value for Pi
xENTSerial Correlation Coefficient
IndicationExplanation of indications
.The generator passed this test
1The generator passed this test, except for the lowest bit
2The generator passed this test, except for the 2 lowest bits
xThe generator failed this test completely
-This test cannot be passed by this generator (too few bits)

Notes.

  • For 31-bit generators, it is normal to fail 1 part of these tests:a,e,g,h,i,k. This is because we left-shift the output of these generators by 1 bit, so that the lowest bit is always 0.

  • For 31-bit generators, it is normal to fail test d (32x32 rank test).

  • All 31-bit generators also fail tests f, j, and u (and most others don't.) This is *likely* due to the constant low bit.

Choosing A Generator.

  • Clearly unacceptable generators: LCG1, LCG2, LCG3, SCG.

  • Conditionally recommended generators: ACG, SWB1, SWB2, SWB3, PSWB.

  • Use with caution: the PMMLCGx generators (due to their very short period 231).

  • All other generators are recommended at this time.

C.3.1.4. More generator data

And this table gives further data about the generators:

Table C-2. Random Library: Generator Data

GeneratorSeedsModulusCycles (length)BitsSpeedState
 
(a) Simple Short Generators
LCG112321 m321.44232
LCG212321 m321.44232
LCG312321 m321.44232
PMMLCG11231-11 m-1310.78431
PMMLCG21231-11 m-1310.78431
PMMLCG31231-11 m-1310.78431
PMMLCG41231-11 m-1310.78431
PMMLCG51231-1051 m-1310.78431
PMMLCG61231-2251 m-1310.78431
PMMLCG71231-3251 m-1310.78431
PMMLCG81231-851 m-1310.78431
PMMLCG91231-2491 m-1310.78431
(b) Simple Long Generators
ACG 55 232 1 255 321.369760
SCG 55 109 1 255 301.111650
SWB1 37+c232 64 21178321.126185
SWB2 24+c232 1536 2757 321.126769
SWB3 21+c232 192 2664 321.126673
PSWB 43+c232-51 21376 321.070 1377
TT403 13 231 1 2403 310.792403
TT775 25 231 1 2775 310.795775
TT800 25 232 1 2800 320.795800
MT19937 624 232 1 219937321.00019937
MRG5 5 231-1 1 2155 310.382155
MRG6 6 231-1 1 2186 310.354 186
MRG7 7 231-1 1 2217 310.339217
MWCA 2 232 ? 259 321.088 64
MWCB 2 232 ? 259 321.088 64
RWC2 3 232 ? 292 320.497 96
RWC8 18s 232 ? 2250 320.252288
(c) Long Generators with Splitting Facilities: (none)
(d) Combined Generators
C2TAUS1 (M)2231-11 260 310.907 62
C2TAUS2 (M)2231-11 260 310.907 62
C2TAUS3 (M)2231-11 260 310.907 62
C2MRG3 (M)6231-11 2185310.275186
C3MWC (M)6232 ? 2118320.570192
(e) Combined Generators with Splitting Facilities
C2LCG (M,S)2231-851 260 310.526 62
C4LCG (M,S)4231-1 1 2120310.316124
GeneratorCalculating cycle lengths
LCGicycle = m = 232
PMMLCGicycle = m-1 < 231
ACGcycle = 2r
SCGcycle = 2r
SWBicycle = ( 232r - 232s ) / #cycles
PSWBcycle = mr - ms
TGFSRicycle = 2w*N - 1
MRGicycle = mi - 1
C2MRG3cycle = (m13 - 1)*(m23 - 1)/2
C2TAUSicycle = Mask1 * Mask2
C2LCGcycle = (m1 * m2) / 2
C4LCGcycle = (m1 * m2 * m3 * m4) / 8
MWCA, MWCBcycle = ? (not specified by Marsaglia)
C3MWCcycle = ? " "
RWC2cycle = ? " "
RWC8cycle = ? " "

C.3.2. Default Generators for the Distributions

C.3.2.1. Random Library: Default Generators

When distributions are created using the createWithDefaults: aZone method, they create their own generator and initialize it with NEXTSEED (or with RANDOMSEED, if you started the program with the --varyseed switch).

The generators used are as follows:

Table C-3. Random Library: Default Generators

Distribution Generator
RandomBitDistusesC2TAUS1gen
BernoulliDistusesC2TAUS2gen
UniformIntegerDistusesTT403gen
UniformUnsignedDistusesTT775gen
UniformDoubleDistusesTT800gen
NormalDistusesMWCAgen
LogNormalDistusesMWCBgen
ExponentialDistusesC2TAUS3gen
GammaDistusesPSWBgen

These generators were chosen on the basis of quality and execution speed.

C.3.2.2. Utility Generator And Distributions

There are 4 default random objects defined in random/random.m. These are:
id <MT19937gen>          randomGenerator;
id <UniformIntegerDist>   uniformIntRand;
id <UniformUnsignedDist>  uniformUnsRand;
id <UniformDoubleDist>    uniformDblRand;

These objects may be called from anywhere in your program. Note (a): the generator is initialized with NEXTSEED or RANDOMSEED depending on the use of the --varyseed command line option. Note (b): the distribution objects are created without default statistical parameters.

C.3.3. Random Library Test Programs

  1. In a separate tar file, available at the SFI ftp site, there are a set of programs which exercise aspects of the generator objects' functionality. The following (very utilitarian) programs are available:

    • testR0. a program which exercises every generator and distribution, verifying correct operation and comparing output to that obtained on the author's system.

    • testR1. a program which prints out diagnostic output for code in random.m.

    • testR2. a program which asks each distribution and generator to describe itself using the Swarm xprint method.

    • testR3. a program which asks each distribution and generator to describe itself using the objects' 'describe' method.

    • testR4. a program that performs timing tests on each generator and distribution, computing the time it takes to call each object 10,000,000 times.

    • testR6. a program that generates a binary file containing 2.5M variates from a specified generator, for purposes of statistical testing (e.g. with ENT or Diehard.)

    • testR7. a program which records, for each generator, the value of unsignedMax, the number of output bits, and the value of lengthOfSeedVector.

    • testR9. a program which records, for each generator and distribution, the object's 'magic number' and the buffer size needed for getState/setState.

  2. Statistical testing: the generators have been put through the tests in the /testR6/ENT and /testR6/Diehard directories. The raw data can be found there. The results are summarized in the test log files found in the same test distribution. The distribution objects have not been tested statistically yet.

The code here represents an effort to implement several efficient, reasonably safe generators. The algorithms come from reading the literature (Bibliography). These algorithms have been implemented as accurately as possible and run through some simple tests. Some generators, included here for historical reasons only, are known to have bad statistical properties, and their use is deprecated. See Advanced Usage Guidefor information on the quality of the included generators.

While the objects in this library are believed to function correctly, the prudent and paranoid modeller would do well testing them in some domain-specific way. One easy way to do this is to run an experiment twice: once with one class of generator (say, PMMLCG), and once with another (say, SWB). If the results differ radically, then you can suspect one of the generators. If they don't, well, the generators still might be faulty!

The generators supplied with this release have been subjected to statistical testing using George Marsaglia's Diehard tests as well as John Walker's entropy tests (ENT). The results of these tests are summarized in Generator quality table. Other data on the properties of the generators are found there as well. These data support a discussion of how to choose one or more generators for your simulation.

The ENT test is included in the tarball of test programs mentioned above. The Diehard tests are copyright and hence are not, but they can be downloaded from the web at: ftp://ftp.csis.hku.hk/pub/random .

The distribution objects have not been statistically tested.