# HungChen MonteCarlomethodsforStatisticalInference:GenerationofRandomNumbers

## Full text

(1)

Monte Carlo methods for Statistical Inference:

Generation of Random Numbers

Hung Chen

hchen@math.ntu.edu.tw Department of Mathematics

National Taiwan University

25th February 2004

Meet at NS 104 On Wednesday from 9:10 to 12.

(2)

Outline

 Introduction

{ Inverse CDF Method

{ Generate Discrete Random Variates { Limitation

 Classical Uniform Variate Generator

{ Lehmer congruential generator (LCG)

{ Tausworthe feedback shift register generator { Combination of random number generators

 Non-uniform Variate Generation { Simulating Stochastic Processes

 Poisson process

(3)

 Brownian motion

{ Acceptance/Rejection Methods { Bayesian Analysis

{ Simulating Multivariate Random Variates

 Gibbs Sampling and MCMC

{ Brief Introduction on Markov Chain { MCMC

{ Gibbs sampling

(4)

Classical Uniform Variate Generator

Simulation is used heavily when analytical study of a statistical procedure becomes intractable.

 Simulation of random variables and random pro- cesses using computers is among the fastest growing areas of computational statistics.

 Many statistical techniques rely on simulating ran- dom variables.

{ One traditional area is the use of random numbers to sample from a population.

{ More recent applications include simulation of high- dimensional, complex stochastic systems that are beyond analytical studies.

(5)

{ In many practical situations the probability dis- tributions are far too complicated to analyze and often it is easier to simulate these distributions on computers and the resulting samples can be analyzed instead.

 The study of a random variable through simulations is becoming a powerful tool in the hands of the statisticians.

Monte Carlo experimentation is the use of simulated random numbers to estimate some functional of a prob- ability distribution.

 Building block in any simulation study is non-uniform variate generation.

{ Many algorithms are available.

(6)

{ Example: Generate normal random variable.

 Box-Muller method (Polar method)

If X and Y are independent and standard normal random variables then for

 = tan 1

Y X



; R = p

X2 + Y 2

 is uniform in [0; 2] and R2 is exponential with mean 2.

(0) U1; U2 iid U(0; 1)

(1) X1 = ( 2 ln U1)1=2 cos(2U2) (2) X2 = ( 2 ln U1)1=2 sin(2U2) (3) X1; X2 iid N(0; 1)

 Inverse method

If X  F , then F (X)  U(0; 1).

(7)

{ In the above methods, it assumes that we can produce an endless ow of a iid uniform random variate generators.

 On the computer, we generally settle for pseudo- random numbers, that is, numbers that appear to be random but actually deterministic.

 CDF transformation method X = F (U), U  U(0; 1) where

F (u) = inffx j F (x)  ug is the generalized inverse of the cdf F .

{ For a standard exponential random variable, the transformation

X = log(U)

yields one exponential for each uniform variable.

(8)

{ How to simulate the process of ipping a coin with probability of head p?

 For a discrete random variable, although the inverse of the cdf does not exist, the inverse cdf method can still be used.

{ The value of the discrete random variable is cho- sen as the smallest value within its countable range such that the cdf is no less than the value of the uniform variate.

 For a multivariate random variable, the inverse cdf method yields a level curve in the range of the ran- dom variable; hence, the method is not directly use- ful for multivariate random variable.

{ Multivariate random variates can be generated us-

(9)

ing the inverse cdf method rst on a univariate marginal and then on a sequence of univariate conditionals.

(10)

Discrete Random Variables

A discrete random variable takes only a countable num- ber of values with pre-de ned probabilities.

 A discrete random variable is characterized by its probability mass function de ned as P (x1) = p1; P (x2) = p2; : : : ; P (xn) = pn; : : : such that such that for all i, 0  pi  1, and P

i pi = 1.

 Commonly used discrete random variables are bino- mial, Poisson, geometric and negative-binomial. As an

 How do we generate a Poisson random variable with parameter ?

(11)

The probability mass function is given by:

pi = exp( )i

i! ; i = 0; 1; 2; : : : : Note that

P (X = i + 1)

P (X = i) =  i + 1:

FX(i + 1) can be written in the following interative form:

FX(i + 1) = FX(i) + P (X = i)  i + 1: The algorithm is

i. Generate U according to U[0; 1].

ii. Set i = 0, p = exp( ), and F = p.

iii. If U < F , set X = i and stop.

(12)

iv. Set p = p=(i + 1), F = F + p, i = i + 1 v. Go to Step (iii).

De nition For a given random variable, with a speci ed proba- bility mass function f(xi; pi); i = 0; 1; 2; : : :g, the pro- cess of selecting a value xi with probability pi is called Simulation. If this selection is performed many times, generating a sequence fXjg, then

1 n

Xn j=1

IXj(fxig) ! pi:

(13)

Uniform Random Number Generation

 Use algebraic methods to generate sequences of numbers that mimic the behavior of a uniform ran- dom variable.

{ These numbers are called pseudorandom num- bers.

{ A uniform pseudorandom number generator is a mapping f that, starting from an initial value x0, generates a sequence

x0; f(x0); f(f(x0)); f(f(f(x0))); : : : :

Since f is computed on a computer (without the use of random number generator!!), it is a deter- ministic mapping. That is, given x0 the remaining

(14)

sequence is xed everytime the sequence is com- puted.

The elements of such a sequence should have the following properties:

1. The patterns between the numbers the appearing in a sequence should be minimized.

2. The correlation between the neighboring elements should be reasonably small.

3. The values should be distributed nearly uniformly over the whole the range of possible values.

4. The sequences should have large periods, where a period is de ned to be duration after which a sequence repeats itself.

5. There exist a set of goodness of t tests for test-

(15)

ing the probability distributions associated with the observed random variables. The elements of a pseudorandom sequence should provide a reason- able performance in these goodness of t tests.

 No random number generator is capable of gener- ating (a) uniform and (b) independent variates.

 Slight defect in RNG may have dramatic e ect on whole simulation study.

{ Deng and Chhikara (1992) { If U1; U2; : : : ; Un iid  U(0; 1),

Zn =

Pn

i=1 Ui n2

pn=12  N(0; 1):

What if the assumption of iid and/or \U(0; 1)"

fail?

(16)

 Classical uniform variate generator

Linear congruential generator [Lehmer (1951)]

{ Xi = BXi 1 + A mod n.

{ Ui = Xi=m

{ LCG has been used in almost all computer systems and packages.

{ Popular LCG (e.g., IMSL, SAS) (a) B = 16807, A = 0, m = 231 1.

(b) Its period is m 1  2:1  109. { Comments

 Period ( m) depends on B; A; m; X0.

 The period is too short by today's standard.

Large-scale simulation study is more and more common.

(17)

 uniformity in 1-dimensional space

 LCG cannot generate set of all lattice points in k space, Sk, for k  2.

 Consider S2 = f(I; J) j 0  I; J  mg and do plots of (Ui; Ui+1), i = 0; 1; 2; : : :

 Insert p7 and 8 of Deng's note.

 Feedback shift register [Tausworthe (1965)]

{ aj = Pk

i=1 ciaj i(mod 2) where ai; ci 2 f0; 1g, ck = 1

{ The mth random variate is the d bits binary num-

(18)

ber

0:a0a1 : : : ad 1 base 2

0:amdamd+1 : : : amd+d 1 base 2 : : :

 It can have an extremely long period, 2k 1, (if ci's are properly selected) for a large k,

 good theoretical k-space uniformity

 Poor empirical performance

 Combination generators:

Wichmann and Hill (1982): Add three LCGs and take its fractional part.

{ Xi = AXi 1 mod m1

(19)

{ Yi = BYi 1 mod m2 { Zi = CZi 1 mod m3

{ Ui = Xi=m1 + Yi=m2 + Zi=m3 mod 1 Comments:

{ Period is LCM(m1 1; m2 1; m3 1).

For m1 = 30269; m2 = 30307; m3 = 30323, its period is 6:95  1012.

{ About 3000 times longer period than LCG-16807.

{ About three times slower than LCG.

{ No theoretical justi cation for uniformity provided.

 Statistical justi cation given in Deng and George (1990)

{ Suppose that X1 and X2 are independent r.v. over

(20)

[0; 1] with pdfs f1(x1) and f2(x2) respectively.

{ j f1(x1) 1 j 1, j f2(x2) 1 j 2

{ Let Y = X1+X2 mod 1 and denote its pdf by f(y).

{ Conclusion: j f(y) 1 j 12. In general, Y = Pn

i=1 Xi mod 1 and denote its pdf by f(y). Then

j f(y) 1 j

Yn i=1

i:

(21)

Exponential and Poisson RVs

The exponential density function is de ned by f(x) =

  exp( x); if 0  x < 1,

0; otherwise.

Here  is any positive constant, depending on the ex- periment.

 The exponential density is often used describe ex- periments involving a question of the form: How long until something happens?

 For example, the exponential density is often used to study the time between emissions of particles from a radioactive source.

 \Memoryless" property:

Let T be an exponentially distributed random vari-

(22)

able with parameter .

It says that P (T > r + s j T > r) = P (T > s).

There is a very important relationship between the ex- ponential density and the Poisson distribution.

 De ne X1; X2; : : : to be a sequence of independent exponentially distributed random variables with pa- rameter .

 Think of Xi as denoting the amount of time between the ith and (i + 1)st emissions of a particle by a radioactive source.

 Consider a time interval of length t, and we let Y de- note the random variable which counts the number of emissions that occur in this time interval.

 Find the distribution function of Y (clearly, Y is a

(23)

discrete random variable).

 Let Sn denote the sum X1 + X2 + : : : + Xn, then it is easy to see that

P (Y = n) = P (Sn  t and Sn+1 > t)

= P (Sn  t) P (Sn+1  t):

 The density of Sn is given by the following formula:

gn(x) =

( (x)(n 1)!n 1 exp( x); if x > 0;

0; otherwise.

It is a gamma density with parameters  and n.

 It is easy to show by induction on n that the cumu-

(24)

lative distribution function of Sn is given by:

Gn(x) = 8<

: 1 exp( x)



1 + x1! +    + (x)(n 1)!n 1



; if x > 0;

0; otherwise.

We recognize easily that it is the probability of tak- ing on the value n by a Poisson-distributed random variable, with parameter t.

 The above relationship will allow us to simulate a Poisson distribution, once we have found a way to simulate an exponential density.

 To simulate a Poisson random variable W with pa- rameter , we

{ Generate a sequence of values of an exponentially distributed random variable with the same param-

(25)

eter.

{ Keep track of the subtotals Sk of these values.

{ We stop generating the sequence when the subto- tal rst exceeds .

{ Assume that we nd that Sn   < Sn+1. Then the value n is returned as a simulated value for W .

(26)

Simulating Poisson Processes

A point process consisting of randomly occurring points in the plane is said to be a two-dimensional Poisson process having rate , if

1. the number of points in any given region of area A is Poisson distributed with mean A; and

2. the number of points in disjoint regions are indepen- dent.

Let O be the origin in R2 and Ri be the ith nearest Poisson point to O, i  1 (R0 = O).

It can be shown that

 (R2i R2i 1) are exponentially distributed with rate

.

(27)

 By symmetry, the respective angles of the Poisson points are independent and uniform [0; 2].

The following algorithm simulates a two-dimensional Poisson process in a ball of radius r centered at O, C(r).

1. Generate independent exponentials X1; X2; : : : with rate 1, stopping at

N = min



n : X1 + X2 +    + Xn

 > r2



2. if N = 1, stop, there are no points in C(r). Other- wise, for i = 1; 2; : : : ; N 1, set

Ri = p

(X1 + X2 +    + Xi)=:

3. Generate independent uniform [0; 1] random variables U1; U2; : : : ; UN 1.

(28)

4. Return the N 1 Poisson points in C(r) whose polar coordinates are (Ri; 2Ui); i = 1; : : : ; N 1.

(29)

Brownian motion Finance Application:

As you may know something about the celebrated Black- Scholes formula of nance. The problem addressed by the formula is determining how much an \option"

should cost. This option is called the \call" options.

 A call option on a certain stock is the right to buy a share of the stock at a certain xed price (the strike price) at a certain xed time in the future (the maturity date).

 If I buy a call option from you, I am paying you a certain amount of money in return for the right to force you to sell me a share of the stock, if I want it, at the strike price, K, on the maturity date, t1.

(30)

 The problem is, what is the right amount of money for me to pay for this right?

{ The meaning of the term right here relates to the economic term arbitrage.

{ An arbitrage opportunity is the opportunity to make money instantly and without risk. That is, you get some money for sure, right now.

{ Such free lunches are not supposed to exist, or at least should be rare and short-lived.

The basic reason for believing this is that many people are looking for such opportunities to make money.

 If the price of commodity A were so low, for ex- ample, that some clever nancial transaction in-

(31)

volving buying commodity A and perhaps selling some others were guaranteed to make an instan- taneous pro t, then many eager arbitrage seek- ers would try to perform the transaction many times.

 The resulting increased demand for commod- ity A would cause its price to increase, thereby destroying the arbitrage opportunity.

 It assume that there is a nancial instrument called bond such that its \interest rate" or the \riskless"

rate of return be r, that is, \$1 in a riskless invest- ment today becomes \$exp(rt) at time t.

dB(t)

dt = rB(t);

where B(t) is the bond price at time t.

(32)

 Let the stock price at time t be X(t).

 A little thought shows that the value of the option at time t1 is the random variable (X(t1) K)+, since it makes sense for me to exercise the option if and only if X(t1) > K.

 Let Y (t) denote the magic, no-arbitrage price for the option that we are seeking.

Assume that Y (t) may be expressed as some func- tion f(X(t); t) of X(t) and t; our goal is to determine the function f.

 Assume a simple probabilistic model for the evolu- tion of the stock price: suppose X is the geometric Brownian motion having stochastic di erential

dX = Xdt + XdW:

(33)

Thus, X is the exponential of a Brownian motion with drift.

 Note that the riskless investments change as exp(linear function), and stocks change as exp(Brownian mo- tion).

What we are really assuming is that returns, that is, proportional changes in the stock price, are station- ary and independent over di erent time intervals.

The formulation of this process was inspired by the physical phenomenon of Brownian motion, which is the irregular jiggling sort of movement exhibited by a small particle suspended in a uid, named after the botanist Robert Brown who observed and studied it in 1827.

 A physical explanation of Brownian motion was given

(34)

by Einstein, who analyzed Brownian motion as the cumulative e ect of innumerable collisions of the suspended particle with the molecules of the uid.

 Einstein's analysis provided historically important sup- port for the atomic theory of matter, which was still a matter of controversy at the time-shortly after 1900.

 The mathematical theory of Brownian motion was given a rm foundation by Norbert Wiener in 1923;

the mathematical model we will study is also known as the \Wiener process."

Brownian motion and di usions are used all the time in models in all sorts of elds, such as nance (in mod- eling the prices of stocks, for example), economics,

(35)

queueing theory, engineering, and biology.

 Just as a pollen particle is continually bu eted by collisions with water molecules, the price of a stock is bu eted by the actions of many individual in- vestors.

Construction of Brownian motion on the time interval [0; 1]:

 Connect-the-dots approach: At each stage of the construction we obtain a more and more detailed picture of a sample path.

 W (0) = 0

 For W (1), we generate a N(0; 1) random variable Z1 and take Z1 to be W (1) since W (1)  N(0; 1).

(36)

 Given that the path passes through the two points (0; 0) and (1; Z1), the conditional expectation is the linear interpolation X(0)(t) = Z1t.

This will be our rst crude approximation to a sam- ple path.

 Next let's simulate a value for W (1=2).

{ Given the values we have already generated for

W (0) and W (1), we know that W (1=2)  N(Z1=2; (1=2)(1=2)).

{ Generate another independent standard random variable Z2 and take W (1=2) to be X(0)(1=2) + (1=2)Z2.

{ De ne the approximation X(1) to be the piece- wise linear path joining the three points (0; 0), (1=2; W (1=2)), and (1; W (1)).

(37)

 Simulate W (1=4) and W (3=4).

{ E(W (t) j W (0); W (1=2); W (1)) = X(1)(t)

{ Conditional variance of both W (1=4) and W (3=4) is (1=4)(1=4)=(1=2) = 1=8.

{ Generate two more independent standard random variables Z3 and Z4, and de ne

W (1=4) = X(1)(1=4) + p1

8Z3; W (3=4) = X(1)(3=4) + p1

8Z4:

{ The approximation X(2) to be the piecewise lin- ear interpolation of the simulated values we have obtained for the times 0, 1=4, 1=2, 3=4, and 1.

 In general, to get from X(n) to X(n+1), we generate

(38)

2n new standard normal random variables Z2n+1; Z2n+2; : : : ; Z2n+1, multiply these by the appropriate conditional stan-

dard deviation p

2 n 2 = 2 (n=2) 1, and add to the values X(n)(1=2n+1); X(n)(3=2n+1); : : : ; X(n)(1 1=2n+1)

to get the new values X(n+1)(1=2n+1); X(n+1)(3=2n+1); : : : ; X(n+

1)(1 1=2n+1).

 Claim. With probability 1, the sequence of functions X(1); X(2); : : : converges uniformly over the interval [0; 1].

{ The limit of a uniformly convergent sequence of continuous functions is a continuous function.

{ To appreciate the need for uniformity of conver- gence in order to be guaranteed that the limit function is continuous, recall the following stan-

(39)

dard example.

For n = 1; 2; : : :, consider the function tn for t 2 [0; 1]. Then as n ! 1, this converges to 0 for all t < 1 whereas it converges to 1 for t = 1, so that the limit is not a continuous function.

{ De ne the maximum di erence Mn between X(n+1) and X(n) by

Mn = max

t2[0;1] j X(n+1)(t) X(n)(t) j : { Note that if P

Mn < 1, then the sequence of functions X(1); X(2); : : : converges uniformly over [0; 1].

{ It is sucient to show that P fP

Mn < 1g = 1.

(40)

{ Observe that

Mn = 2 n=2 1 maxfj Z2n+1 j; j Z2n+2 j; : : : ; j Z2n+1 jg:

{ Note that X1

n=1

P fj Zn j> p

c log ng  2p1 2

X1 n=1

e (1=2)c log n

pc log n

= p2 c

X1 n=1

1

nc=2(log n)1=2 which is nite for c > 2.

{ By the Borel-Cantelli lemma, P fj Zn j> p

c log n in nitely ofteng = 0:

{ Taking c > 2, the fact implies that with probability 1,

Mn  2 n=2 1q

c log(2n+1)

(41)

holds for all suciently large n.

We have P

Mn < 1 with probability 1, which completes the proof of the above claim.

(42)

Acceptance/Rejection Method

This method assumes that we have a method for sim- ulating from some density function g and our task is utilize samples from g to simulate from a given density function f.

g can be fairly arbitrary except for one condition men- tioned below.

 The basic idea is to simulate from g and accept the samples with probability proportional to the ratio f=g.

{ Requirement: Let C be a constant such that f(Y )

g(Y )  C; for all Y :

 Simulation procedure:

(43)

(1) Simulate Y from the density g and simulate U from uniform [0; 1].

(2) If U  f(Y )=[Cg(Y )] then X = Y else go to step 1.

(44)

Validity of Acceptance/Rejection Method

 Let X be the value obtained and n be the number of iterations required to reach this value.

P (X  x) = P (Yn  x) = P



Y  x j U  f(Yn) Cg(Yn)



= P 

Y  x; U  Cg(Yf(Yn)

n)

 P 

Y  x; U  Cg(Yf(Yn)

n)



=

R x

1

R f(y)=Cg(y)

0 1dugY (y)dudy R 1

1

R f(y)=Cg(y)

0 1dugY (y)dudy

=

R x

1 f(y)

Cg(y)gY (y)dy R 1

1 f(y)

Cg(y)gY (y)dy;

(45)

since Y and U are independent random variables.

(Their joint density function is the product of the marginals g(y)  1)

 As x ! 1, the left side goes to 1 and the integral on the right side also goes to 1.

Therefore,

C

Z 1

1

f(y)

Cg(y)gY (y)dy = 1 and P (X  x) = R x

1 f(Y )dY . We conclude that X is random with probability density f.

Eciency: For a given value of Y we accept Y by gen- erating a uniform U and comparing U with f(Y )=Cg(Y ).

 Accept Y with probability f(Y )=Cg(Y ).

(46)

 Each iteration in the loop involves independent real- izations, we can compute the probability of accept- ing Y as X according to

P



U  f(Y ) Cg(Y )



= K = 1 C:

If C is large then the process, of generating samples from f using this method, will be slow.

 What is the distribution of n?

Illustration: Use acceptance/rejection method to gen- erate sample from standard normal density function.

 Find g with support on ( 1; 1).

{ Sampling from the standard exponential density function (g(x) = exp( x)) can be done quickly.

(47)

{ Note that the support of g is on [0; 1) and f is symmetric at 0. Convert the problem to the generation of half-normal variate.

 Generate X =j Z j with density function f(x) = p2

2 exp x2 2

!

; x  0:

 Determine C.

{ The bound on the ratio of f to g:

f(x) g(x) =

r2e

 exp (x 1)2 2

!



r2e

 = C:

{ f(x)=Cg(x) = exp( (x 1)2=2).

 Algorithm 1

(48)

1. Generate Y , an exponential random variable with mean 1, and U, a uniform [0; 1] random variable.

2. If U  exp( (Y 1)2=2) set X = Y , otherwise return to (1).

 Algorithm 2

Observe that log(U)  (Y 1)2=2 and log(U) is exponential with rate 1.

1. Generate Y1 and Y2, two samples from exponential random variable with mean 1.

2. If Y2  (Y1 1)2=2 set X = Y1, otherwise return to (1).

Having generated a random variable which is the ab- solute value of a standard normal, we can generate sample from standard normal.

(49)

1. Generate U a uniform random variable the algo- rithm described above.

2. If U 2 (0; 1=2] set Z = X, else set Z = X.

Example on R-programming:

Generate deviates from a beta distribution with param- eters and .

f(x) = 1

B( ; )x 1(1 x) 1:

 It has a nite support [0; 1].

 Choose g as a uniform.

 Need to nd the mode f. Solve 1

x

1

1 x = 0

and obtain xmode = ( 1)=( + 2).

(50)

 C = (xmode) 1(1 xmode) 1 ( + )=( ( ) ( )) R-program

 alpha<- 2; beta<- 7; nsimu<- 1000

 xmode<- (alpha -1)/(alpha+ beta -2)

 dmax<- xmode^(alpha -1)*(1-xmode)^(beta-1)*

gamma(alpha+beta)/(gamma(alpha)*gamma(beta))

 y<- runif(nsimu)

 x<- na.omit(ifelse(runif(nsimu)<=dbeta(y,alpha, beta)/dmax,y,NA))

Note that dmax 3:18 in this case, we expect to get around 1000=3:18 deviates.

Remarks

 No clear rule to nd g.

(51)

{ g(y) should be similar and dominate f(y).

 The constant C maynot be easy to nd.

{ As an example, how do we determine C for the posterior distribution

p( j y) / (2 + )125(1 )3834

If it is hard to apply rejection method, what can be used?

(52)

Simulating Multivariate Random Variates

With multivariate distributions, one is often faced with enormous problems for random variate generation.

 Von Neumann's rejection method [von Neumann 1963] requires a case-by-case study.

{ It is dicult to determine a usable majorizing den- sity.

 The conditional method (generate one random vari- ate; generate the next one conditional on the rst one, and so forth) requires often dicult-to-compute marginal densities.

Consider the generation of multivariate normal with mean 0 and variance-covariance matrix  = (ij)pp.

(53)

{ Xp1 has a multivariate normal distribution i X can be written as

X =  + AZ

where pp, A are constant and X = (X1; : : : ; Xp)T where the Zj are independent standard normal variables.

{  = AAT

A is nonsingular i  is positive de nite.

{ By the spectral decomposition theorem, there ex- ists P orthogonal such that  = PTDP.

Here D is the diagonal matrix whose diagonal en- tries are nonnegative eigenvalues of .

{ If rank() = p, 1=2 = PD1=2PT. Useful R-command:

(54)

 solve: Solve a system of equations.

 eigen: Computes eigenvalues and eigenvectors.

 backsolve: Solve an upper or lower Triangular System.

 chol: Compute the Cholesky factorization of a real symmetric positive-de nite square matrix.

 qr: The QR decomposition of a matrix

{ Write X = ((X(1))T; (X(2))T)T, the conditional dis- tribution of X(2) given X(1) = x(1) is normal with mean (2) + 21111(x(1) (1)) and variance 22

2111121.

{ X1 is generated as N(1; 11),

{ X2 is generated as N(2+12X1=11; 22 122 =11),

(55)

and so on,

 Generate multivariate random variates by use of ei- ther iid univariates followed by a transformation.

Updating...

## References

Related subjects :