Monte Carlo methods for Statistical Inference:

Generation of Random Numbers

Hung Chen

hchen@math.ntu.edu.tw Department of Mathematics

National Taiwan University

25th February 2004

Meets at NS 104 on Wednesdays from 9:10 to 12.

Outline

Introduction
- Inverse CDF Method
- Generating Discrete Random Variates
- Limitations

Classical Uniform Variate Generator
- Lehmer congruential generator (LCG)
- Tausworthe feedback shift register generator
- Combinations of random number generators

Non-uniform Variate Generation
- Simulating Stochastic Processes
  Poisson process
  Brownian motion
- Acceptance/Rejection Methods
- Bayesian Analysis
- Simulating Multivariate Random Variates

Gibbs Sampling and MCMC
- Brief Introduction to Markov Chains
- MCMC
- Gibbs sampling

Introduction

Simulation is used heavily when analytical study of a statistical procedure becomes intractable.

Simulation of random variables and random processes using computers is among the fastest growing areas of computational statistics.

Many statistical techniques rely on simulating random variables.

- One traditional area is the use of random numbers to sample from a population.

- More recent applications include simulation of high-dimensional, complex stochastic systems that are beyond analytical study.

- In many practical situations the probability distributions are far too complicated to analyze; it is often easier to simulate them on a computer and analyze the resulting samples instead.

The study of random variables through simulation is becoming a powerful tool in the hands of statisticians.

Monte Carlo experimentation is the use of simulated random numbers to estimate some functional of a probability distribution.

The building block of any simulation study is non-uniform variate generation.

- Many algorithms are available.

- Example: generate a normal random variable.

Box-Muller method (polar method)

If X and Y are independent standard normal random variables, then

  Θ = tan^{-1}(Y/X),  R = sqrt(X^2 + Y^2)

give Θ uniform on [0, 2π] and R^2 exponential with mean 2. Reversing this transformation:

(0) Generate U_1, U_2 iid U(0, 1).
(1) X_1 = (−2 ln U_1)^{1/2} cos(2πU_2)
(2) X_2 = (−2 ln U_1)^{1/2} sin(2πU_2)
(3) Then X_1, X_2 are iid N(0, 1).
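Steps (0)-(2) translate directly into code; as an illustrative Python sketch (the function name `box_muller` is ours, not from the notes):

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent U(0,1) variates to two independent N(0,1) variates."""
    r = math.sqrt(-2.0 * math.log(u1))   # R, with R^2 exponential of mean 2
    theta = 2.0 * math.pi * u2           # angle uniform on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

# sanity check: sample mean near 0, sample second moment near 1
random.seed(0)
zs = [z for _ in range(10000) for z in box_muller(random.random(), random.random())]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```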

Inverse method

If X ∼ F, then F(X) ∼ U(0, 1).

- In the above methods, it is assumed that we can produce an endless flow of iid uniform random variates.

On the computer, we generally settle for pseudorandom numbers, that is, numbers that appear to be random but are actually deterministic.

CDF transformation method: X = F^{-}(U), U ∼ U(0, 1), where

  F^{-}(u) = inf{x | F(x) ≥ u}

is the generalized inverse of the cdf F.

- For a standard exponential random variable, the transformation

  X = −log(U)

  yields one exponential variate for each uniform variate.

- How do we simulate the process of flipping a coin with probability of heads p?
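Both transformations can be sketched in a few lines of Python (illustrative function names, not from the notes): X = −log(U)/λ for the exponential, and a threshold at p for the coin flip.

```python
import math
import random

def exp_variate(u, lam=1.0):
    """Inverse-CDF draw: if U ~ U(0,1), then -log(U)/lam is Exponential(rate lam)."""
    return -math.log(u) / lam

def coin_flip(u, p):
    """One coin flip with P(heads) = p: heads iff U falls in [0, p)."""
    return 1 if u < p else 0

random.seed(1)
n = 20000
frac_heads = sum(coin_flip(random.random(), 0.3) for _ in range(n)) / n
exp_mean = sum(exp_variate(random.random(), 2.0) for _ in range(n)) / n
```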

For a discrete random variable, although the inverse of the cdf does not exist, the inverse cdf method can still be used.

- The value of the discrete random variable is chosen as the smallest value within its countable range such that the cdf is no less than the value of the uniform variate.

For a multivariate random variable, the inverse cdf method yields a level curve in the range of the random variable; hence, the method is not directly useful for multivariate random variables.

- Multivariate random variates can be generated by applying the inverse cdf method first to a univariate marginal and then to a sequence of univariate conditionals.

Discrete Random Variables

A discrete random variable takes only a countable number of values, with pre-defined probabilities.

A discrete random variable is characterized by its probability mass function, defined by P(x_1) = p_1, P(x_2) = p_2, ..., P(x_n) = p_n, ..., such that 0 ≤ p_i ≤ 1 for all i and Σ_i p_i = 1.

Commonly used discrete random variables are binomial, Poisson, geometric, and negative binomial.

How do we generate a Poisson random variable with parameter λ?

The probability mass function is given by:

  p_i = exp(−λ) λ^i / i!,  i = 0, 1, 2, ....

Note that

  P(X = i + 1) / P(X = i) = λ / (i + 1).

F_X(i + 1) can be written in the following iterative form:

  F_X(i + 1) = F_X(i) + P(X = i) · λ / (i + 1).
The algorithm is

i. Generate U according to U[0, 1].

ii. Set i = 0, p = exp(−λ), and F = p.

iii. If U < F, set X = i and stop.

iv. Set p = p · λ/(i + 1), F = F + p, i = i + 1.

v. Go to Step (iii).
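Steps i-v above can be sketched as the following Python function (an illustration; the guard on p is ours, protecting against floating-point saturation of F for u very close to 1):

```python
import math
import random

def poisson_variate(lam, u):
    """Steps i-v: walk the cdf using p_{i+1} = p_i * lam/(i+1) until F >= u."""
    i = 0
    p = math.exp(-lam)          # P(X = 0)
    F = p
    while u >= F and p > 1e-300:
        p = p * lam / (i + 1)   # step iv
        F = F + p
        i = i + 1
    return i

# sample mean should be close to lam
random.seed(2)
mean = sum(poisson_variate(3.0, random.random()) for _ in range(20000)) / 20000
```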

Definition. For a given random variable with a specified probability mass function {(x_i, p_i), i = 0, 1, 2, ...}, the process of selecting a value x_i with probability p_i is called simulation. If this selection is performed many times, generating a sequence {X_j}, then

  (1/n) Σ_{j=1}^{n} I_{{x_i}}(X_j) → p_i.

Uniform Random Number Generation

Use algebraic methods to generate sequences of numbers that mimic the behavior of a uniform random variable.

- These numbers are called pseudorandom numbers.

- A uniform pseudorandom number generator is a mapping f that, starting from an initial value x_0, generates a sequence

  x_0, f(x_0), f(f(x_0)), f(f(f(x_0))), ....

Since f is computed on a computer (without the use of a random number generator!), it is a deterministic mapping. That is, given x_0, the remaining sequence is fixed every time the sequence is computed.

The elements of such a sequence should have the following properties:

1. Patterns among the numbers appearing in a sequence should be minimized.

2. The correlation between neighboring elements should be reasonably small.

3. The values should be distributed nearly uniformly over the whole range of possible values.

4. The sequences should have large periods, where a period is defined to be the duration after which a sequence repeats itself.

5. There exists a set of goodness-of-fit tests for testing the probability distributions associated with observed random variables. The elements of a pseudorandom sequence should perform reasonably in these goodness-of-fit tests.

No random number generator is capable of generating truly (a) uniform and (b) independent variates.

A slight defect in the RNG may have a dramatic effect on the whole simulation study.

- Deng and Chhikara (1992)
- If U_1, U_2, ..., U_n are iid U(0, 1), then

  Z_n = (Σ_{i=1}^{n} U_i − n/2) / sqrt(n/12) ≈ N(0, 1).

  What if the assumption of iid and/or "U(0, 1)" fails?

Classical uniform variate generator

Linear congruential generator [Lehmer (1951)]

- X_i = (B X_{i−1} + A) mod m.

- U_i = X_i / m.

- The LCG has been used in almost all computer systems and packages.

- Popular LCG (e.g., IMSL, SAS):
  (a) B = 16807, A = 0, m = 2^31 − 1.
  (b) Its period is m − 1 ≈ 2.1 × 10^9.
- Comments

  The period (≤ m) depends on B, A, m, and X_0.

  The period is too short by today's standards.

  Large-scale simulation studies are more and more common.

  An LCG is uniform in 1-dimensional space, but it cannot generate the set of all lattice points in k-space, S_k, for k ≥ 2.

  Consider S_2 = {(I, J) | 0 ≤ I, J < m} and plot the pairs (U_i, U_{i+1}), i = 0, 1, 2, ....

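The LCG-16807 stream above can be sketched as follows (an illustrative Python version, with the parameters given on the previous slide as defaults):

```python
def lcg(seed, n, B=16807, A=0, m=2**31 - 1):
    """Lehmer stream: X_i = (B*X_{i-1} + A) mod m, returned as U_i = X_i/m."""
    x = seed
    us = []
    for _ in range(n):
        x = (B * x + A) % m
        us.append(x / m)
    return us

us = lcg(seed=1, n=1000)
```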

Feedback shift register [Tausworthe (1965)]

- a_j = Σ_{i=1}^{k} c_i a_{j−i} (mod 2), where a_i, c_i ∈ {0, 1} and c_k = 1.

- The mth random variate is the d-bit binary number

  0.a_{md} a_{md+1} ... a_{md+d−1}  (base 2),

  so the successive variates are 0.a_0 a_1 ... a_{d−1}, 0.a_d a_{d+1} ... a_{2d−1}, and so on.

It can have an extremely long period, 2^k − 1 (if the c_i are properly selected), for large k.

Good theoretical k-space uniformity.

Poor empirical performance.

Combination generators:

Wichmann and Hill (1982): add three LCGs and take the fractional part.

- X_i = A X_{i−1} mod m_1
- Y_i = B Y_{i−1} mod m_2
- Z_i = C Z_{i−1} mod m_3
- U_i = (X_i/m_1 + Y_i/m_2 + Z_i/m_3) mod 1

Comments:

- The period is LCM(m_1 − 1, m_2 − 1, m_3 − 1). For m_1 = 30269, m_2 = 30307, m_3 = 30323, the period is about 6.95 × 10^12.

- About 3000 times longer period than LCG-16807.

- About three times slower than an LCG.

- No theoretical justification for uniformity has been provided.
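A sketch of the combination generator in Python; the multipliers 171, 172, and 170 are the ones published by Wichmann and Hill (1982), which the notes do not list explicitly:

```python
def wichmann_hill(x, y, z, n):
    """Three small LCG streams summed mod 1 (Wichmann-Hill combination)."""
    us = []
    for _ in range(n):
        x = (171 * x) % 30269
        y = (172 * y) % 30307
        z = (170 * z) % 30323
        us.append((x / 30269 + y / 30307 + z / 30323) % 1.0)
    return us

us = wichmann_hill(1, 2, 3, 2000)
mean = sum(us) / len(us)
```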

A statistical justification is given in Deng and George (1990):

- Suppose that X_1 and X_2 are independent random variables over [0, 1] with pdfs f_1(x_1) and f_2(x_2) respectively.

- Assume |f_1(x_1) − 1| ≤ ε_1 and |f_2(x_2) − 1| ≤ ε_2.

- Let Y = X_1 + X_2 mod 1 and denote its pdf by f(y).

- Conclusion: |f(y) − 1| ≤ ε_1 ε_2.

In general, let Y = Σ_{i=1}^{n} X_i mod 1 and denote its pdf by f(y). Then

  |f(y) − 1| ≤ Π_{i=1}^{n} ε_i.

Exponential and Poisson RVs

The exponential density function is defined by

  f(x) = λ exp(−λx) if 0 ≤ x < ∞, and 0 otherwise.

Here λ is any positive constant, depending on the experiment.

The exponential density is often used to describe experiments involving a question of the form: how long until something happens?

For example, the exponential density is often used to study the time between emissions of particles from a radioactive source.

"Memoryless" property:

Let T be an exponentially distributed random variable with parameter λ. Then

  P(T > r + s | T > r) = P(T > s).

There is a very important relationship between the exponential density and the Poisson distribution.

Define X_1, X_2, ... to be a sequence of independent exponentially distributed random variables with parameter λ.

Think of X_i as the amount of time between the ith and (i + 1)st emissions of a particle by a radioactive source.

Consider a time interval of length t, and let Y denote the random variable which counts the number of emissions that occur in this interval.

Find the distribution function of Y (clearly, Y is a discrete random variable).

Let S_n denote the sum X_1 + X_2 + ... + X_n; then it is easy to see that

  P(Y = n) = P(S_n ≤ t and S_{n+1} > t) = P(S_n ≤ t) − P(S_{n+1} ≤ t).

The density of S_n is given by the following formula:

  g_n(x) = λ (λx)^{n−1} exp(−λx) / (n − 1)! if x > 0, and 0 otherwise.

It is a gamma density with parameters λ and n.

It is easy to show by induction on n that the cumulative distribution function of S_n is given by:

  G_n(x) = 1 − exp(−λx) [1 + λx/1! + ... + (λx)^{n−1}/(n − 1)!] if x > 0, and 0 otherwise.

Taking the difference, P(Y = n) = G_n(t) − G_{n+1}(t) = exp(−λt)(λt)^n / n!, which we recognize as the probability that a Poisson-distributed random variable with parameter λt takes the value n.

The above relationship will allow us to simulate a Poisson distribution, once we have found a way to simulate an exponential density.

To simulate a Poisson random variable W with parameter λ, we

- generate a sequence of values of an exponentially distributed random variable with the same parameter λ;

- keep track of the subtotals S_k of these values;

- stop generating the sequence when the subtotal first exceeds 1 (by the relationship above with t = 1, the number of rate-λ arrivals in a unit time interval is Poisson(λ));

- if S_n ≤ 1 < S_{n+1}, return the value n as a simulated value for W.
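The procedure above, sketched in Python (an illustration, assuming the reconstruction with unit time interval): accumulate Exponential(λ) interarrivals until the subtotal passes 1.

```python
import math
import random

def poisson_from_exponentials(lam, rng):
    """Return n with S_n <= 1 < S_{n+1} for Exponential(rate lam) subtotals."""
    total, n = 0.0, -1
    while total <= 1.0:
        total += -math.log(rng.random()) / lam   # one Exponential(lam) draw
        n += 1
    return n

rng = random.Random(3)
mean = sum(poisson_from_exponentials(4.0, rng) for _ in range(20000)) / 20000
```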

Simulating Poisson Processes

A point process consisting of randomly occurring points in the plane is said to be a two-dimensional Poisson process with rate λ if

1. the number of points in any given region of area A is Poisson distributed with mean λA; and

2. the numbers of points in disjoint regions are independent.

Let O be the origin in R^2 and let R_i be the ith nearest Poisson point to O, i ≥ 1 (R_0 = O).

It can be shown that the areas π(R_i^2 − R_{i−1}^2) are exponentially distributed with rate λ.

By symmetry, the respective angles of the Poisson points are independent and uniform on [0, 2π].

The following algorithm simulates a two-dimensional Poisson process in a ball of radius r centered at O, C(r).

1. Generate independent exponentials X_1, X_2, ... with rate 1, stopping at

   N = min{n : X_1 + X_2 + ... + X_n > λπr^2}.

2. If N = 1, stop; there are no points in C(r). Otherwise, for i = 1, 2, ..., N − 1, set

   R_i = sqrt((X_1 + X_2 + ... + X_i)/(λπ)).

3. Generate independent uniform [0, 1] random variables U_1, U_2, ..., U_{N−1}.

4. Return the N − 1 Poisson points in C(r) whose polar coordinates are (R_i, 2πU_i), i = 1, ..., N − 1.
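Steps 1-4 above can be sketched as a single Python function returning polar coordinates (the name `poisson_disk` is illustrative):

```python
import math
import random

def poisson_disk(lam, r, rng):
    """Points of a rate-lam planar Poisson process in C(r), as (R_i, angle) pairs."""
    points = []
    s = 0.0
    while True:
        s += -math.log(rng.random())       # rate-1 exponential increment
        if s > lam * math.pi * r * r:      # step 1: stop once past lam*pi*r^2
            return points                  # step 2: possibly no points at all
        radius = math.sqrt(s / (lam * math.pi))               # step 2
        points.append((radius, 2.0 * math.pi * rng.random())) # steps 3-4

rng = random.Random(4)
counts = [len(poisson_disk(2.0, 1.0, rng)) for _ in range(5000)]
mean_count = sum(counts) / len(counts)   # expected count is lam*pi*r^2 = 2*pi
```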

Brownian motion

Finance application:

You may know something about the celebrated Black-Scholes formula of finance. The problem addressed by the formula is determining how much an "option" should cost; the option considered here is the "call" option.

A call option on a certain stock is the right to buy a share of the stock at a certain fixed price (the strike price) at a certain fixed time in the future (the maturity date).

If I buy a call option from you, I pay you a certain amount of money in return for the right to force you to sell me a share of the stock, if I want it, at the strike price, K, on the maturity date, t_1.

The problem is, what is the right amount of money for me to pay for this right?

- The meaning of the term "right" here relates to the economic term arbitrage.

- An arbitrage opportunity is the opportunity to make money instantly and without risk; that is, you get some money for sure, right now.

- Such free lunches are not supposed to exist, or at least should be rare and short-lived.

The basic reason for believing this is that many people are looking for such opportunities to make money.

If the price of commodity A were so low, for example, that some clever financial transaction involving buying commodity A and perhaps selling some others were guaranteed to make an instantaneous profit, then many eager arbitrage seekers would try to perform the transaction many times.

The resulting increased demand for commodity A would cause its price to increase, thereby destroying the arbitrage opportunity.

Assume that there is a financial instrument called a bond such that its "interest rate" or "riskless" rate of return is r; that is, $1 in a riskless investment today becomes $exp(rt) at time t:

  dB(t)/dt = rB(t),

where B(t) is the bond price at time t.

Let the stock price at time t be X(t).

A little thought shows that the value of the option at time t_1 is the random variable (X(t_1) − K)_+, since it makes sense for me to exercise the option if and only if X(t_1) > K.

Let Y (t) denote the magic, no-arbitrage price for the option that we are seeking.

Assume that Y(t) may be expressed as some function f(X(t), t) of X(t) and t; our goal is to determine the function f.

Assume a simple probabilistic model for the evolution of the stock price: suppose X is the geometric Brownian motion having stochastic differential

  dX = μX dt + σX dW.

Thus X is the exponential of a Brownian motion with drift.

Note that riskless investments change as exp(linear function), and stocks change as exp(Brownian motion).

What we are really assuming is that returns, that is, proportional changes in the stock price, are stationary and independent over different time intervals.
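Under these assumptions the stock path can be simulated exactly on a time grid using log-normal increments; a Python sketch (parameter values and the name `gbm_path` are illustrative):

```python
import math
import random

def gbm_path(x0, mu, sigma, t, n, rng):
    """Sample X at n equal steps on [0, t] for dX = mu*X dt + sigma*X dW."""
    dt = t / n
    path = [x0]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # exact increment: X(t+dt) = X(t) * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
        path.append(path[-1] * math.exp((mu - 0.5 * sigma**2) * dt
                                        + sigma * math.sqrt(dt) * z))
    return path

rng = random.Random(6)
path = gbm_path(100.0, 0.05, 0.2, 1.0, 252, rng)
```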

The formulation of this process was inspired by the physical phenomenon of Brownian motion, the irregular jiggling movement exhibited by a small particle suspended in a fluid, named after the botanist Robert Brown, who observed and studied it in 1827.

A physical explanation of Brownian motion was given by Einstein, who analyzed it as the cumulative effect of innumerable collisions of the suspended particle with the molecules of the fluid.

Einstein's analysis provided historically important support for the atomic theory of matter, which was still controversial at the time, shortly after 1900.

The mathematical theory of Brownian motion was given a firm foundation by Norbert Wiener in 1923; the mathematical model we will study is also known as the "Wiener process."

Brownian motion and diffusions are used all the time in models in all sorts of fields, such as finance (in modeling the prices of stocks, for example), economics, queueing theory, engineering, and biology.

Just as a pollen particle is continually buffeted by collisions with water molecules, the price of a stock is buffeted by the actions of many individual investors.

Construction of Brownian motion on the time interval [0; 1]:

Connect-the-dots approach: At each stage of the construction we obtain a more and more detailed picture of a sample path.

W(0) = 0.

For W(1), we generate a N(0, 1) random variable Z_1 and take Z_1 to be W(1), since W(1) ∼ N(0, 1).

Given that the path passes through the two points (0, 0) and (1, Z_1), the conditional expectation is the linear interpolation X^{(0)}(t) = Z_1 t.

This will be our first crude approximation to a sample path.

Next let's simulate a value for W(1/2).

- Given the values we have already generated for W(0) and W(1), we know that W(1/2) ∼ N(Z_1/2, 1/4), since the conditional variance is (1/2)(1/2).

- Generate another independent standard normal random variable Z_2 and take W(1/2) to be X^{(0)}(1/2) + (1/2)Z_2.

- Define the approximation X^{(1)} to be the piecewise linear path joining the three points (0, 0), (1/2, W(1/2)), and (1, W(1)).

Simulate W(1/4) and W(3/4).

- E(W(t) | W(0), W(1/2), W(1)) = X^{(1)}(t).

- The conditional variance of both W(1/4) and W(3/4) is (1/4)(1/4)/(1/2) = 1/8.

- Generate two more independent standard normal random variables Z_3 and Z_4, and define

  W(1/4) = X^{(1)}(1/4) + (1/sqrt(8)) Z_3,
  W(3/4) = X^{(1)}(3/4) + (1/sqrt(8)) Z_4.

- The approximation X^{(2)} is the piecewise linear interpolation of the simulated values we have obtained for the times 0, 1/4, 1/2, 3/4, and 1.

In general, to get from X^{(n)} to X^{(n+1)}, we generate 2^n new standard normal random variables Z_{2^n+1}, Z_{2^n+2}, ..., Z_{2^{n+1}}, multiply these by the appropriate conditional standard deviation sqrt(2^{−n−2}) = 2^{−(n/2)−1}, and add them to the values X^{(n)}(1/2^{n+1}), X^{(n)}(3/2^{n+1}), ..., X^{(n)}(1 − 1/2^{n+1}) to get the new values X^{(n+1)}(1/2^{n+1}), X^{(n+1)}(3/2^{n+1}), ..., X^{(n+1)}(1 − 1/2^{n+1}).
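The refinement step just described can be sketched in Python: each level fills in midpoints perturbed by the conditional standard deviation 2^{−(n/2)−1} (the function name `brownian_levy` is ours):

```python
import random

def brownian_levy(n_levels, rng):
    """Midpoint construction of W on [0,1]; returns values on the grid k/2^n_levels."""
    w = [0.0, rng.gauss(0.0, 1.0)]            # W(0) = 0, W(1) ~ N(0,1)
    for level in range(n_levels):
        sd = 2.0 ** (-(level / 2.0) - 1.0)    # conditional sd at this refinement
        new = [w[0]]
        for i in range(len(w) - 1):
            mid = 0.5 * (w[i] + w[i + 1]) + sd * rng.gauss(0.0, 1.0)
            new.extend([mid, w[i + 1]])
        w = new
    return w

rng = random.Random(7)
w = brownian_levy(8, rng)   # 2^8 + 1 = 257 grid values
incs = [w[i + 1] - w[i] for i in range(len(w) - 1)]
scaled_var = sum(d * d for d in incs) / len(incs) * 256   # should be near 1
```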

Claim. With probability 1, the sequence of functions X^{(1)}, X^{(2)}, ... converges uniformly over the interval [0, 1].

- The limit of a uniformly convergent sequence of continuous functions is a continuous function.

- To appreciate the need for uniformity of convergence in order to be guaranteed that the limit function is continuous, recall the following standard example: for n = 1, 2, ..., consider the function t^n for t ∈ [0, 1]. As n → ∞, this converges to 0 for all t < 1 but to 1 for t = 1, so the limit is not a continuous function.

- Define the maximum difference M_n between X^{(n+1)} and X^{(n)} by

  M_n = max_{t ∈ [0,1]} |X^{(n+1)}(t) − X^{(n)}(t)|.

- Note that if Σ_n M_n < ∞, then the sequence of functions X^{(1)}, X^{(2)}, ... converges uniformly over [0, 1].

- It is sufficient to show that P{Σ_n M_n < ∞} = 1.

- Observe that

  M_n = 2^{−(n/2)−1} max{|Z_{2^n+1}|, |Z_{2^n+2}|, ..., |Z_{2^{n+1}}|}.

- Note that

  Σ_{n=1}^{∞} P{|Z_n| > sqrt(c log n)} ≤ (2/sqrt(2π)) Σ_{n=1}^{∞} exp(−(1/2) c log n)/sqrt(c log n)
    = sqrt(2/(πc)) Σ_{n=1}^{∞} 1/(n^{c/2} (log n)^{1/2}),

  which is finite for c > 2.

- By the Borel-Cantelli lemma,

  P{|Z_n| > sqrt(c log n) infinitely often} = 0.

- Taking c > 2, this implies that with probability 1,

  M_n ≤ 2^{−(n/2)−1} sqrt(c log(2^{n+1}))

  holds for all sufficiently large n.

Since this bound is summable in n, we have Σ_n M_n < ∞ with probability 1, which completes the proof of the above claim.

Acceptance/Rejection Method

This method assumes that we have a method for simulating from some density function g, and our task is to utilize samples from g to simulate from a given density function f.

g can be fairly arbitrary except for one condition mentioned below.

The basic idea is to simulate from g and accept the samples with probability proportional to the ratio f/g.

- Requirement: let C be a constant such that

  f(Y)/g(Y) ≤ C for all Y.

Simulation procedure:

(1) Simulate Y from the density g and simulate U from uniform [0, 1].

(2) If U ≤ f(Y)/[C g(Y)], then set X = Y; else go to step (1).
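Steps (1)-(2) can be sketched generically in Python (illustrative names; the trial target f(x) = 2x on [0, 1] with g uniform and C = 2 is our example, not from the notes):

```python
import random

def rejection_sample(f, g_sampler, g_pdf, C, rng):
    """Draw Y ~ g until U <= f(Y)/(C g(Y)); the accepted Y has density f."""
    while True:
        y = g_sampler(rng)
        if rng.random() <= f(y) / (C * g_pdf(y)):
            return y

# example: f(x) = 2x on [0, 1], g = U(0,1), C = 2, so E[X] = 2/3
rng = random.Random(8)
xs = [rejection_sample(lambda x: 2.0 * x, lambda r: r.random(),
                       lambda x: 1.0, 2.0, rng) for _ in range(20000)]
mean = sum(xs) / len(xs)
```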

Validity of Acceptance/Rejection Method

Let X be the value obtained and n the number of iterations required to reach this value. Then

  P(X ≤ x) = P(Y_n ≤ x) = P(Y ≤ x | U ≤ f(Y_n)/(C g(Y_n)))

  = P(Y ≤ x, U ≤ f(Y_n)/(C g(Y_n))) / P(U ≤ f(Y_n)/(C g(Y_n)))

  = [∫_{−∞}^{x} ∫_{0}^{f(y)/(C g(y))} du g_Y(y) dy] / [∫_{−∞}^{∞} ∫_{0}^{f(y)/(C g(y))} du g_Y(y) dy]

  = [∫_{−∞}^{x} (f(y)/(C g(y))) g_Y(y) dy] / [∫_{−∞}^{∞} (f(y)/(C g(y))) g_Y(y) dy],

since Y and U are independent random variables. (Their joint density function is the product of the marginals, g(y) · 1.)

As x → ∞, the left side tends to 1 while the numerator tends to ∫_{−∞}^{∞} (f(y)/C) dy = 1/C; therefore the denominator equals 1/C, i.e.,

  C ∫_{−∞}^{∞} (f(y)/(C g(y))) g_Y(y) dy = 1,

and hence P(X ≤ x) = ∫_{−∞}^{x} f(y) dy. We conclude that X is a random variable with probability density f.

Efficiency: for a given value of Y, we accept Y by generating a uniform U and comparing U with f(Y)/(C g(Y)).

We accept Y with probability f(Y)/(C g(Y)).

Since each iteration of the loop involves independent realizations, we can compute the probability of accepting Y as X:

  K = P(U ≤ f(Y)/(C g(Y))) = 1/C.

If C is large, then the process of generating samples from f using this method will be slow.

What is the distribution of n?

Illustration: use the acceptance/rejection method to generate samples from the standard normal density function.

Find g with support on (−∞, ∞).

- Sampling from the standard exponential density function (g(x) = exp(−x)) can be done quickly.

- Note that the support of g is [0, ∞) and f is symmetric about 0. Convert the problem to the generation of a half-normal variate.

Generate X = |Z| with density function

  f(x) = (2/sqrt(2π)) exp(−x^2/2), x ≥ 0.

Determine C.

- The bound on the ratio of f to g:

  f(x)/g(x) = sqrt(2e/π) exp(−(x − 1)^2/2) ≤ sqrt(2e/π) = C.

- f(x)/(C g(x)) = exp(−(x − 1)^2/2).

Algorithm 1

1. Generate Y, an exponential random variable with mean 1, and U, a uniform [0, 1] random variable.

2. If U ≤ exp(−(Y − 1)^2/2), set X = Y; otherwise return to (1).

Algorithm 2

Observe that U ≤ exp(−(Y − 1)^2/2) iff −log(U) ≥ (Y − 1)^2/2, and that −log(U) is exponential with rate 1.

1. Generate Y_1 and Y_2, two samples from an exponential random variable with mean 1.

2. If Y_2 ≥ (Y_1 − 1)^2/2, set X = Y_1; otherwise return to (1).

Having generated X, the absolute value of a standard normal, we can generate a sample from the standard normal:

1. Generate a uniform [0, 1] random variable U, independent of the X obtained by the algorithm described above.

2. If U ∈ (0, 1/2], set Z = X; else set Z = −X.
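Algorithm 2 plus the random sign, sketched in Python (the name `std_normal` is illustrative):

```python
import math
import random

def std_normal(rng):
    """Half-normal by rejection from Exponential(1), then a symmetric sign."""
    while True:
        y1 = -math.log(rng.random())          # Exponential(1)
        y2 = -math.log(rng.random())
        if y2 >= (y1 - 1.0) ** 2 / 2.0:       # accept with prob exp(-(y1-1)^2/2)
            break
    return y1 if rng.random() <= 0.5 else -y1

rng = random.Random(9)
zs = [std_normal(rng) for _ in range(20000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```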

Example on R-programming:

Generate deviates from a beta distribution with parameters α and β:

  f(x) = (1/B(α, β)) x^{α−1} (1 − x)^{β−1}.

It has finite support [0, 1].

Choose g as the uniform density on [0, 1].

Need to find the mode of f. Solve

  (α − 1)/x − (β − 1)/(1 − x) = 0

and obtain x_mode = (α − 1)/(α + β − 2).

C = x_mode^{α−1} (1 − x_mode)^{β−1} Γ(α + β)/(Γ(α) Γ(β))
R program:

alpha <- 2; beta <- 7; nsimu <- 1000
xmode <- (alpha - 1)/(alpha + beta - 2)
dmax <- xmode^(alpha - 1) * (1 - xmode)^(beta - 1) *
  gamma(alpha + beta)/(gamma(alpha) * gamma(beta))
y <- runif(nsimu)
x <- na.omit(ifelse(runif(nsimu) <= dbeta(y, alpha, beta)/dmax, y, NA))

Note that dmax ≈ 3.18 in this case, so we expect to get around 1000/3.18 ≈ 314 deviates.

Remarks

There is no clear rule for finding g.

- g(y) should be similar in shape to f(y), and Cg(y) should dominate f(y).

The constant C may not be easy to find.

- As an example, how do we determine C for the posterior distribution

  p(θ | y) ∝ (2 + θ)^{125} (1 − θ)^{38} θ^{34}?

If it is hard to apply the rejection method, what can be used instead?

Simulating Multivariate Random Variates

With multivariate distributions, one is often faced with enormous problems in random variate generation.

Von Neumann's rejection method [von Neumann 1963] requires a case-by-case study.

- It is difficult to determine a usable majorizing density.

The conditional method (generate one random variate; generate the next one conditional on the first one, and so forth) often requires difficult-to-compute marginal densities.

Consider the generation of a multivariate normal with mean μ and variance-covariance matrix Σ = (σ_{ij})_{p×p}.

- X_{p×1} has a multivariate normal distribution iff X can be written as

  X = μ + AZ,

  where μ_{p×1} and A_{p×p} are constant, X = (X_1, ..., X_p)^T, Z = (Z_1, ..., Z_p)^T, and the Z_j are independent standard normal variables.

- Σ = AA^T.

A is nonsingular iff Σ is positive definite.

- By the spectral decomposition theorem, there exists an orthogonal P such that Σ = P D P^T, where D is the diagonal matrix whose diagonal entries are the nonnegative eigenvalues of Σ.

- If rank(Σ) = p, then Σ^{1/2} = P D^{1/2} P^T.
Useful R commands:

- solve: solve a system of equations.
- eigen: compute eigenvalues and eigenvectors.
- backsolve: solve an upper or lower triangular system.
- chol: compute the Cholesky factorization of a real symmetric positive-definite square matrix.
- qr: the QR decomposition of a matrix.

- Write X = ((X^{(1)})^T, (X^{(2)})^T)^T; the conditional distribution of X^{(2)} given X^{(1)} = x^{(1)} is normal with mean μ^{(2)} + Σ_{21} Σ_{11}^{−1}(x^{(1)} − μ^{(1)}) and variance Σ_{22} − Σ_{21} Σ_{11}^{−1} Σ_{12}.

- X_1 is generated as N(μ_1, σ_{11}),

- X_2 is generated as N(μ_2 + σ_{12}(X_1 − μ_1)/σ_{11}, σ_{22} − σ_{12}^2/σ_{11}),

and so on.

To summarize, multivariate random variates can be generated either by transforming iid univariate variates (X = μ + AZ) or by the conditional method.
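The transformation route X = μ + AZ can be sketched with a hand-rolled Cholesky factor (Σ = AA^T with A lower triangular); this Python illustration mirrors what chol plus a matrix product would do in R:

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular A with A A^T = sigma (sigma symmetric positive definite)."""
    p = len(sigma)
    a = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                a[i][j] = (sigma[i][j] - s) / a[j][j]
    return a

def mvn_sample(mu, sigma, rng):
    """X = mu + A Z with Z iid N(0,1), so X ~ N(mu, sigma)."""
    a = cholesky(sigma)
    p = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return [mu[i] + sum(a[i][k] * z[k] for k in range(p)) for i in range(p)]

rng = random.Random(10)
xs = [mvn_sample([1.0, -1.0], [[2.0, 0.6], [0.6, 1.0]], rng) for _ in range(20000)]
cov01 = sum((x[0] - 1.0) * (x[1] + 1.0) for x in xs) / len(xs)   # near 0.6
```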