Monte Carlo Integration


Digital Image Synthesis Yung-Yu Chuang

12/6/2007

with slides by Pat Hanrahan and Torsten Moller

Introduction

• The integral equations generally don’t have analytic solutions, so we must turn to numerical methods.

• Standard methods like Trapezoidal integration or Gaussian quadrature are not effective for high-dimensional and discontinuous integrals.

$$L_o(p,\omega_o) = L_e(p,\omega_o) + \int_{S^2} f(p,\omega_o,\omega_i)\,L_i(p,\omega_i)\cos\theta_i\,d\omega_i$$

Numerical quadrature

• Suppose we want to calculate $I = \int_a^b f(x)\,dx$, but can’t solve it analytically. The approximations through quadrature rules have the form

$$\hat{I} = \sum_{i=1}^{n} w_i f(x_i)$$

which is essentially the weighted sum of samples of the function at various points.

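Midpoint rule

• Convergence: the error is O(n^-2), assuming f has a continuous second derivative.

Trapezoid rule

• Convergence: the error is O(n^-2), assuming f has a continuous second derivative.

Simpson’s rule

• Similar to trapezoid but using a quadratic polynomial approximation

• Convergence: the error is O(n^-4), assuming f has a continuous fourth derivative.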
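The three rules can be sketched as composite rules over n equal panels. This is a minimal illustration, not code from the course; the names `Integrand`, `MidpointRule`, `TrapezoidRule` and `SimpsonRule` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Integrand = std::function<double(double)>;

// Composite midpoint rule: sample f at the center of each of n panels.
double MidpointRule(const Integrand& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += f(a + (i + 0.5) * h);
    return h * sum;
}

// Composite trapezoid rule: end points get weight 1/2, interior points weight 1.
double TrapezoidRule(const Integrand& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return h * sum;
}

// Composite Simpson's rule: weights 1, 4, 2, 4, ..., 4, 1 (n must be even).
double SimpsonRule(const Integrand& f, double a, double b, int n) {
    assert(n > 0 && n % 2 == 0);
    const double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int i = 1; i < n; ++i) sum += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);
    return h * sum / 3.0;
}
```

All three are weighted sums of function samples, as in the general quadrature form; they differ only in where the samples are placed and how they are weighted.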

Curse of dimensionality and discontinuity

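• For an s-dimensional function f, a tensor-product rule applies the 1D rule with n samples along each dimension, so it needs n^s samples in total.

• If the 1D rule has a convergence rate of O(n^-r), the s-dimensional rule would require a much larger number (n^s) of samples to work as well as the 1D one. Thus, when n denotes the total number of samples, the convergence rate is only O(n^(-r/s)).

• If f is discontinuous, convergence is O(n^(-1/s)) in s dimensions.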

Randomized algorithms

• Las Vegas vs. Monte Carlo

• Las Vegas: uses randomness but always gives the right answer.

• Monte Carlo: gives the right answer on average. Results depend on the random numbers used, but are statistically likely to be close to the right answer.


Monte Carlo integration

• Monte Carlo integration: uses sampling to estimate the values of integrals. It only requires the ability to evaluate the integrand at arbitrary points, making it easy to implement and applicable to many problems.

• If n samples are used, it converges at the rate of O(n^(-1/2)). That is, to cut the error in half, it is necessary to evaluate four times as many samples.

• Images by Monte Carlo methods are often noisy.

Most current methods try to reduce noise.

Monte Carlo methods

• Advantages

– Easy to implement

– Easy to think about (but be careful of statistical bias)

– Robust when used with complex integrands and domains (shapes, lights, …)

– Efficient for high dimensional integrals

• Disadvantages

– Noisy

– Slow (many samples needed for convergence)

Basic concepts

• X is a random variable

• Applying a function to a random variable gives another random variable, Y=f(X).

• CDF (cumulative distribution function): $P(x) \equiv \Pr\{X \le x\}$

• PDF (probability density function): $p(x) \equiv \frac{dP(x)}{dx}$; nonnegative and integrates to 1

• canonical uniform random variable ξ (provided by standard library and easy to transform to other distributions)

Discrete probability distributions

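• Discrete events $X_i$ with probability $p_i$, where $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$

• Cumulative PDF (distribution): $P_i = \sum_{j=1}^{i} p_j$

• Construction of samples: to randomly select an event, draw a uniform random variable $U$ and select $X_i$ if $P_{i-1} < U \le P_i$

[Figure: staircase plot of the cumulative distribution P_i; a uniform random variable U on the vertical axis from 0 to 1 selects the event X₃.]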
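The selection rule can be sketched as a linear scan over the running sum of the probabilities. This is a minimal sketch; the function name `SampleDiscrete` and the 0-based indexing are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the 0-based index i of the selected event: the first i whose
// cumulative probability P_i satisfies u <= P_i (so P_{i-1} < u <= P_i).
std::size_t SampleDiscrete(const std::vector<double>& p, double u) {
    assert(!p.empty());
    double cdf = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        cdf += p[i];  // running sum P_i = p_0 + ... + p_i
        if (u <= cdf) return i;
    }
    return p.size() - 1;  // guard against round-off when u is close to 1
}
```

For many events, a precomputed CDF with a binary search would avoid the linear scan; the scan is kept here because it mirrors the rule directly.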


Continuous Probability Distributions

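• PDF $p(x)$: $p(x) \ge 0$

$$\Pr(\alpha \le X \le \beta) = \int_\alpha^\beta p(x)\,dx = P(\beta) - P(\alpha)$$

• CDF $P(x)$: $P(x) = \Pr(X < x)$

$$P(x) = \int_0^x p(x')\,dx', \qquad P(1) = 1$$

[Figure: the PDF p(x) and the CDF P(x) of the uniform distribution on [0, 1].]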

Expected values

• Average value of a function f(x) over some distribution of values p(x) over its domain D

$$E_p[f(x)] = \int_D f(x)\,p(x)\,dx$$

• Example: cos function over [0, π], p is uniform

$$E_p[\cos x] = \int_0^\pi \cos x\,\frac{1}{\pi}\,dx = 0$$

Variance

• Expected squared deviation from the expected value

• Fundamental concept for quantifying the error in Monte Carlo methods

$$V[f(x)] = E\left[\left(f(x) - E[f(x)]\right)^2\right]$$

Properties

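$$E[a f(x)] = a\,E[f(x)]$$

$$E\left[\sum_i f(X_i)\right] = \sum_i E[f(X_i)]$$

• For uniform random variables $X_i \in [a,b]$, the Monte Carlo estimator

$$F_N = \frac{b-a}{N}\sum_{i=1}^{N} f(X_i)$$

has the property that its expected value $E[F_N]$ equals the integral $\int_a^b f(x)\,dx$.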
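A minimal sketch of the uniform-sampling estimator F_N = (b−a)/N Σ f(X_i) follows. The function name `EstimateUniform` and the use of `std::mt19937` as the random number source are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <random>

// Monte Carlo estimate of the integral of f over [a, b] using N uniform samples.
double EstimateUniform(const std::function<double(double)>& f,
                       double a, double b, int N, unsigned seed) {
    assert(N > 0 && a < b);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(a, b);
    double sum = 0.0;
    for (int i = 0; i < N; ++i) sum += f(uniform(rng));  // f(X_i), X_i uniform on [a, b)
    return (b - a) * sum / N;
}
```

The result changes with the seed, but its expected value is the integral; for example, integrating x² over [0, 1] should give a value close to 1/3.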

General Monte Carlo estimator

• Given a random variable X drawn from an arbitrary PDF p(x), the estimator is

$$F_N = \frac{1}{N}\sum_{i=1}^{N}\frac{f(X_i)}{p(X_i)}$$

• Although the convergence rate of the MC estimator, O(N^(-1/2)), is slower than that of other integration methods in low dimensions, it is independent of the dimension, making Monte Carlo the only practical method for high-dimensional integrals.
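The general estimator can be sketched with the integrand, the PDF, and a sampling function passed in as callables. The names `EstimateWithPdf` and `EstimateSquareWithLinearPdf` are hypothetical, and the example PDF p(x) = 2x (sampled by X = √u) is an assumption chosen only because its CDF is easy to invert.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// F_N = (1/N) * sum_i f(X_i) / p(X_i), where X_i = sample(u) has density p.
double EstimateWithPdf(const std::function<double(double)>& f,
                       const std::function<double(double)>& pdf,
                       const std::function<double(double)>& sample,
                       int N, unsigned seed) {
    assert(N > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        const double u = 1.0 - uniform(rng);  // in (0, 1], avoids X = 0 where p(X) = 0
        const double X = sample(u);
        sum += f(X) / pdf(X);
    }
    return sum / N;
}

// Example: integrate f(x) = x^2 over [0, 1] (exact value 1/3) with p(x) = 2x.
// The CDF is P(x) = x^2, so samples are generated as X = sqrt(u).
double EstimateSquareWithLinearPdf(int N, unsigned seed) {
    return EstimateWithPdf([](double x) { return x * x; },
                           [](double x) { return 2.0 * x; },
                           [](double u) { return std::sqrt(u); },
                           N, seed);
}
```

Here f/p = x/2 varies less than f itself does under uniform sampling, so the estimate should be less noisy for the same N.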

Convergence of Monte Carlo

• Chebyshev’s inequality: let X be a random variable with expected value μ and variance σ². For any real number k>0,

$$\Pr\{|X-\mu| \ge k\sigma\} \le \frac{1}{k^2}$$

• For example, for $k=\sqrt{2}$, it shows that at least half of the values lie in the interval $(\mu-\sqrt{2}\sigma,\ \mu+\sqrt{2}\sigma)$.

• Let $Y_i = f(X_i)/p(X_i)$; the MC estimate $F_N$ becomes

$$F_N = \frac{1}{N}\sum_{i=1}^{N} Y_i$$

Convergence of Monte Carlo

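• According to Chebyshev’s inequality,

$$\Pr\left\{\,|F_N - E[F_N]| \ge \left(\frac{V[F_N]}{\delta}\right)^{1/2}\right\} \le \delta$$

• Because the $Y_i$ are independent and identically distributed,

$$V[F_N] = V\left[\frac{1}{N}\sum_{i=1}^{N} Y_i\right] = \frac{1}{N^2}V\left[\sum_{i=1}^{N} Y_i\right] = \frac{1}{N^2}\sum_{i=1}^{N} V[Y_i] = \frac{1}{N^2}\,N\,V[Y] = \frac{1}{N}V[Y]$$

• Plugging into Chebyshev’s inequality,

$$\Pr\left\{\,|F_N - I| \ge \frac{1}{\sqrt{N}}\left(\frac{V[Y]}{\delta}\right)^{1/2}\right\} \le \delta$$

So, for a fixed threshold δ, the error decreases at the rate N^(-1/2).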

Properties of estimators

• An estimator $F_N$ is called unbiased if for all N

$$E[F_N] = Q$$

That is, the expected value is independent of N.

• Otherwise, the bias of the estimator is defined as

$$\beta[F_N] = E[F_N] - Q$$

• If the bias goes to zero as N increases, the estimator is called consistent

$$\lim_{N\to\infty}\beta[F_N] = 0, \qquad \lim_{N\to\infty}E[F_N] = Q$$

Example of a biased consistent estimator

• Suppose we are doing antialiasing on a 1D pixel. To determine the pixel value, we need to evaluate $I = \int_0^1 w(x) f(x)\,dx$, where $w(x)$ is the filter function with $\int_0^1 w(x)\,dx = 1$.

• A common way to evaluate this is

$$F_N = \frac{\sum_{i=1}^{N} w(X_i) f(X_i)}{\sum_{i=1}^{N} w(X_i)}$$

• When N=1, we have

$$E[F_1] = E\left[\frac{w(X_1) f(X_1)}{w(X_1)}\right] = E[f(X_1)] = \int_0^1 f(x)\,dx \ne I$$

Example of a biased consistent estimator

• When N=2, we have

$$E[F_2] = \int_0^1\!\!\int_0^1 \frac{w(x_1) f(x_1) + w(x_2) f(x_2)}{w(x_1) + w(x_2)}\,dx_1\,dx_2 \ne I$$

• However, when N is very large, the bias approaches zero

$$F_N = \frac{\frac{1}{N}\sum_{i=1}^{N} w(X_i) f(X_i)}{\frac{1}{N}\sum_{i=1}^{N} w(X_i)}$$

$$\lim_{N\to\infty} E[F_N] = \frac{\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} w(X_i) f(X_i)}{\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} w(X_i)} = \frac{\int_0^1 w(x) f(x)\,dx}{\int_0^1 w(x)\,dx} = \int_0^1 w(x) f(x)\,dx = I$$

Choosing samples

• $F_N = \frac{1}{N}\sum_{i=1}^{N}\frac{f(X_i)}{p(X_i)}$

• Carefully choosing the PDF from which samples are drawn is an important technique to reduce variance. We want f/p to have a low variance. Hence, it is necessary to be able to draw samples from the chosen PDF.

• How to sample an arbitrary distribution from a variable of uniform distribution?

– Inversion

– Rejection

– Transform

Inversion method

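• Cumulative probability distribution function

$$P(x) = \Pr(X < x)$$

• Construction of samples: solve for $X = P^{-1}(U)$

• Must know:

1. The integral of p(x)

2. The inverse function $P^{-1}(x)$

[Figure: the CDF P(x) rising from 0 to 1; a uniform value U on the vertical axis maps to the sample X on the horizontal axis.]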

Proof for the inversion method

• Let U be a uniform random variable whose CDF is $P_u(x)=x$. We will show that $Y=P^{-1}(U)$ has the CDF P(x).

$$\Pr\{Y \le x\} = \Pr\{P^{-1}(U) \le x\} = \Pr\{U \le P(x)\} = P_u(P(x)) = P(x)$$

The second equality holds because P is monotonic: $P(x_1) \le P(x_2)$ if $x_1 \le x_2$.

Thus, Y’s CDF is exactly P(x).

Inversion method

• Compute CDF P(x)

• Compute P^-1(x)

• Obtain ξ

• Compute X_i=P^-1(ξ)

Example: Power Function

• Assume $p(x) = (n+1)\,x^n$ on [0, 1]; the factor (n+1) normalizes the PDF because

$$\int_0^1 x^n\,dx = \left.\frac{x^{n+1}}{n+1}\right|_0^1 = \frac{1}{n+1}$$

$$P(x) = x^{n+1}$$

$$X \sim p(x) \;\Rightarrow\; X = P^{-1}(U) = \sqrt[n+1]{U}$$

• Trick: $Y = \max(U_1, U_2, \ldots, U_n, U_{n+1})$

$$\Pr(Y < x) = \prod_{i=1}^{n+1}\Pr(U_i < x) = x^{n+1}$$

It is used in sampling Blinn’s microfacet model. (It only works for sampling power distributions.)

Similarly, a trick to obtain an approximately Gaussian distribution is to take the average of several uniform random variables.
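Both ways of sampling the power distribution can be sketched as follows. The function names are hypothetical, and the max trick assumes the caller supplies exactly n+1 independent uniform values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Inversion: P(x) = x^(n+1), so X = u^(1/(n+1)) has density (n+1) x^n on [0, 1].
double SamplePowerInversion(int n, double u) {
    assert(n >= 0 && u >= 0.0 && u <= 1.0);
    return std::pow(u, 1.0 / (n + 1));
}

// Max trick: the maximum of n+1 independent uniform values has the same CDF x^(n+1).
double SamplePowerMax(const std::vector<double>& uniforms) {
    assert(!uniforms.empty());
    return *std::max_element(uniforms.begin(), uniforms.end());
}
```

The inversion version needs one uniform value and one `pow` call; the max version needs n+1 uniform values but no transcendental function, and it is only available for integer n.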


Example: exponential distribution

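• Useful for rendering participating media.

$$p(x) = c\,e^{-ax}$$

$$\int_0^\infty c\,e^{-ax}\,dx = 1 \;\Rightarrow\; c = a$$

• Compute the CDF: $P(x) = \int_0^x a\,e^{-as}\,ds = 1 - e^{-ax}$

• Compute the inverse: $P^{-1}(x) = -\frac{1}{a}\ln(1-x)$

• Sample: $X = -\frac{1}{a}\ln(1-\xi)$, or equivalently $X = -\frac{1}{a}\ln\xi$, because 1−ξ is also uniformly distributed.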
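The result of the derivation can be sketched in a few lines. The function name `SampleExponential` is hypothetical; the form with ln(1−u) is used so that u = 0 is a valid input.

```cpp
#include <cassert>
#include <cmath>

// Inversion sampling of p(x) = a * exp(-a x) on [0, infinity):
// X = -ln(1 - u) / a for a uniform u in [0, 1).
double SampleExponential(double a, double u) {
    assert(a > 0.0 && u >= 0.0 && u < 1.0);
    return -std::log(1.0 - u) / a;
}
```

As a check, u = P(x) = 1 − e^(−ax) maps back to x; for a = 2 and u = 1 − e^(−1) the sample is 0.5.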

Rejection method

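• Sometimes, we can’t integrate the PDF into a CDF or invert the CDF

• Algorithm

Pick $U_1$ and $U_2$

Accept $U_1$ if $U_2 < f(U_1)$

• Wasteful?

Efficiency = Area under f / Area of rectangle

$$I = \int_0^1 f(x)\,dx = \iint_{y<f(x)} dx\,dy$$

[Figure: the curve y = f(x) inside the unit rectangle; points below the curve are accepted.]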

Rejection method

• Rejection method is a dart-throwing method without performing integration and inversion.

1. Find q(x) so that p(x) < Mq(x)

2. Dart throwing

a. Choose a pair (X, ξ), where X is sampled from q(x)

b. If ξ < p(X)/(Mq(X)), return X; otherwise, repeat

• Equivalently, we pick a point (X, ξMq(X)). If it lies beneath p(X), the sample is accepted.

Why it works

• For each iteration, we generate $X_i$ from q. The sample is returned if ξ < p(X_i)/(Mq(X_i)), which happens with probability p(X_i)/(Mq(X_i)).

• So, the probability density of returning x in one iteration is

$$q(x)\,\frac{p(x)}{M q(x)} = \frac{p(x)}{M}$$

whose integral is 1/M

• Thus, when a sample is returned (with probability 1/M), $X_i$ is distributed according to p(x).
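The dart-throwing loop and the 1/M acceptance probability can be sketched as follows. The target PDF p(x) = 6x(1−x) on [0, 1], the uniform proposal q(x) = 1, the bound M = 2, and all names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <random>

// Hypothetical target PDF on [0, 1]: p(x) = 6x(1 - x), whose maximum is 1.5.
double TargetPdf(double x) { return 6.0 * x * (1.0 - x); }

struct RejectionStats {
    double mean;        // sample mean of the accepted values
    double acceptance;  // accepted samples divided by total trials
};

// Draws `count` samples from TargetPdf by rejection and reports statistics.
RejectionStats RunRejectionSampling(int count, unsigned seed) {
    assert(count > 0);
    const double M = 2.0;  // q(x) = 1 on [0, 1], so p(x) < M q(x) everywhere
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    long trials = 0;
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        for (;;) {
            ++trials;
            const double X = uniform(rng);   // X sampled from q
            const double xi = uniform(rng);  // canonical uniform variable
            if (xi < TargetPdf(X) / M) {     // accept with probability p(X) / (M q(X))
                sum += X;
                break;
            }
        }
    }
    return {sum / count, static_cast<double>(count) / trials};
}
```

With M = 2 roughly half of the trials should be accepted, and the mean of the accepted samples should be close to 0.5, the mean of this target PDF. A tighter bound such as M = 1.5 would waste fewer trials.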


Example: sampling a unit disk

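void RejectionSampleDisk(float *x, float *y) {
    float sx, sy;
    // Draw points uniformly in the square [-1,1]^2 until one falls inside the unit disk.
    do {
        sx = 1.f - 2.f * RandomFloat();
        sy = 1.f - 2.f * RandomFloat();
    } while (sx*sx + sy*sy > 1.f);
    *x = sx; *y = sy;
}

π/4 ≈ 78.5% of the samples are good. The acceptance rate gets worse in higher dimensions; for example, for a sphere inside a cube it is π/6 ≈ 52.4%.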

Transformation of variables

• Given a random variable X with distribution $p_x(x)$ and a random variable Y = y(X), where y is one-to-one, i.e. monotonic (assumed increasing here), we want to derive the distribution of Y, $p_y(y)$.

• CDF: $P_y(y(x)) = \Pr\{Y \le y(x)\} = \Pr\{X \le x\} = P_x(x)$

• PDF: differentiating both sides with respect to x,

$$\frac{dP_y(y(x))}{dx} = \frac{dP_x(x)}{dx}$$

$$\frac{dP_y(y)}{dy}\,\frac{dy}{dx} = p_y(y)\,\frac{dy}{dx} = p_x(x)$$

$$p_y(y) = \left(\frac{dy}{dx}\right)^{-1} p_x(x)$$

[Figure: the CDFs P_x(x) and P_y(y) plotted against x and y.]

Example

• Suppose $p_x(x) = 2x$ and $Y = \sin X$. Then

$$p_y(y) = (\cos x)^{-1}\,p_x(x) = \frac{2x}{\cos x} = \frac{2\sin^{-1} y}{\sqrt{1-y^2}}$$

Transformation method

• A problem with applying the above method is that we usually have some PDF to sample from, not a given transformation.

• Given a source random variable X with $p_x(x)$ and a target distribution $p_y(y)$, try to transform X into another random variable Y so that Y has the distribution $p_y(y)$.

• We first have to find a transformation y(x) so that $P_x(x) = P_y(y(x))$. Thus,

$$y(x) = P_y^{-1}(P_x(x))$$

Transformation method

• Let’s prove that the above transform works.

We first prove that the random variable $Z = P_x(X)$ has a uniform distribution. If so, then $P_y^{-1}(Z)$ should have the distribution $P_y$ by the inversion method.

$$\Pr\{Z \le x\} = \Pr\{P_x(X) \le x\} = \Pr\{X \le P_x^{-1}(x)\} = P_x(P_x^{-1}(x)) = x$$

Thus, Z is uniform and the transformation works.

• It is an obvious generalization of the inversion method, in which X is uniform and $P_x(x)=x$.

Example

• Source and target PDFs: $p_x(x) = x$, $p_y(y) = e^y$

• CDFs: $P_x(x) = \frac{x^2}{2}$, $P_y(y) = e^y$

• Inverse of the target CDF: $P_y^{-1}(y) = \ln y$

$$y(x) = P_y^{-1}(P_x(x)) = \ln\frac{x^2}{2} = 2\ln x - \ln 2$$

Thus, if X has the distribution $p_x(x) = x$, then the random variable $Y = 2\ln X - \ln 2$ has the distribution $p_y(y) = e^y$.

Multiple dimensions

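• For a bijective transformation $Y = T(X)$ with Jacobian $J_T$, the densities are related by $p_y(T(x)) = |J_T(x)|^{-1}\,p_x(x)$.

We often need the other way around, $p_x(x) = |J_T(x)|\,p_y(T(x))$.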

Spherical coordinates

• The spherical coordinate representation of directions is

$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta$$

$$|J_T| = r^2\sin\theta$$

$$p(r,\theta,\phi) = r^2\sin\theta\;p(x,y,z)$$

Spherical coordinates

• Now, look at the relation between spherical directions and solid angle

$$d\omega = \sin\theta\,d\theta\,d\phi$$

• Hence, the density in terms of θ, φ

$$p(\theta,\phi)\,d\theta\,d\phi = p(\omega)\,d\omega$$

$$p(\theta,\phi) = \sin\theta\;p(\omega)$$

Multidimensional sampling

• Separable case: independently sample X from $p_x$ and Y from $p_y$, where $p(x,y)=p_x(x)\,p_y(y)$

• Often, this is not possible. Compute the marginal density function p(x) first.

$$p(x) = \int p(x,y)\,dy$$

• Then, compute the conditional density function

$$p(y|x) = \frac{p(x,y)}{p(x)}$$

• Use 1D sampling with p(x) and p(y|x).

Sampling a hemisphere

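• Sample a hemisphere uniformly, i.e. $p(\omega) = c$

$$1 = \int_\Omega p(\omega)\,d\omega \;\Rightarrow\; c = \frac{1}{2\pi}, \qquad p(\omega) = \frac{1}{2\pi}, \qquad p(\theta,\phi) = \frac{\sin\theta}{2\pi}$$

• Sample θ first

$$p(\theta) = \int_0^{2\pi} p(\theta,\phi)\,d\phi = \int_0^{2\pi}\frac{\sin\theta}{2\pi}\,d\phi = \sin\theta$$

• Now sample φ

$$p(\phi|\theta) = \frac{p(\theta,\phi)}{p(\theta)} = \frac{1}{2\pi}$$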

Sampling a hemisphere

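• Now, we use the inversion technique to sample these PDFs

$$P(\theta) = \int_0^{\theta}\sin\alpha\,d\alpha = 1 - \cos\theta$$

$$P(\phi|\theta) = \int_0^{\phi}\frac{1}{2\pi}\,d\alpha = \frac{\phi}{2\pi}$$

• Inverting these (and replacing 1−ξ₁ by ξ₁, which is also uniform):

$$\theta = \cos^{-1}\xi_1, \qquad \phi = 2\pi\xi_2$$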

Sampling a hemisphere

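• Convert these to Cartesian coordinates

$$x = \sin\theta\cos\phi = \cos(2\pi\xi_2)\sqrt{1-\xi_1^2}$$

$$y = \sin\theta\sin\phi = \sin(2\pi\xi_2)\sqrt{1-\xi_1^2}$$

$$z = \cos\theta = \xi_1$$

• Similar derivation for a full sphere

• For a unit disk in polar coordinates (r, θ), the same procedure gives

$$p(\theta|r) = \frac{p(r,\theta)}{p(r)}, \qquad P(r) = r^2, \qquad P(\theta|r) = \frac{\theta}{2\pi}$$

$$r = \sqrt{\xi_1}, \qquad \theta = 2\pi\xi_2$$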
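The two inversion results can be sketched directly from the formulas. The types `Vec3` and `Vec2` and the function names are hypothetical stand-ins, not the course's or any library's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };
struct Vec2 { double x, y; };

// Uniform direction on the hemisphere around +z:
// z = cos(theta) = u1, phi = 2 * pi * u2.
Vec3 UniformSampleHemisphere(double u1, double u2) {
    assert(u1 >= 0.0 && u1 <= 1.0 && u2 >= 0.0 && u2 <= 1.0);
    const double z = u1;
    const double r = std::sqrt(std::max(0.0, 1.0 - z * z));  // sin(theta)
    const double phi = 2.0 * kPi * u2;
    return {r * std::cos(phi), r * std::sin(phi), z};
}

// Uniform point on the unit disk: r = sqrt(u1), theta = 2 * pi * u2.
Vec2 UniformSampleDisk(double u1, double u2) {
    assert(u1 >= 0.0 && u1 <= 1.0 && u2 >= 0.0 && u2 <= 1.0);
    const double r = std::sqrt(u1);
    const double theta = 2.0 * kPi * u2;
    return {r * std::cos(theta), r * std::sin(theta)};
}
```

The square root in the disk case matters: using r = u1 directly would cluster samples near the center, because the area of a ring grows with r.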


Shirley’s mapping

$$r = U_1, \qquad \theta = \frac{\pi}{4}\,\frac{U_2}{U_1}$$

Sampling a triangle

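• The triangle is the region $u \ge 0$, $v \ge 0$, $u + v \le 1$. Its area is

$$A = \int_0^1\!\!\int_0^{1-u} dv\,du = \int_0^1 (1-u)\,du = \left.-\frac{(1-u)^2}{2}\right|_0^1 = \frac{1}{2}$$

so the uniform density is $p(u,v) = 2$.

• Conditional density: $p(u|v) \equiv \frac{p(u,v)}{p(v)}$

[Figure: the triangle in the (u, v) plane bounded by the two axes and the line u + v = 1, with the point (u, 1−u) marked on that line.]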

Sampling a triangle

• Here u and v are not independent! $p(u,v) = 2$

• Marginal density

$$p(u) \equiv \int p(u,v)\,dv = \int_0^{1-u} 2\,dv = 2(1-u)$$

• Conditional probability

$$p(v|u) = \frac{p(u,v)}{p(u)} = \frac{1}{1-u}$$

• CDFs

$$P(u_0) = \int_0^{u_0} 2(1-u)\,du = 1 - (1-u_0)^2$$

$$P(v_0|u_0) = \int_0^{v_0} p(v|u_0)\,dv = \int_0^{v_0}\frac{1}{1-u_0}\,dv = \frac{v_0}{1-u_0}$$

• Inverting these (and replacing 1−U₁ by U₁, which is also uniform):

$$u_0 = 1 - \sqrt{U_1}, \qquad v_0 = \sqrt{U_1}\,U_2$$
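The resulting mapping from two uniform values to a point in the triangle can be sketched as follows; the struct `TrianglePoint` and the function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct TrianglePoint { double u, v; };

// Uniform sample in the triangle u >= 0, v >= 0, u + v <= 1:
// u = 1 - sqrt(U1), v = sqrt(U1) * U2.
TrianglePoint UniformSampleTriangle(double U1, double U2) {
    assert(U1 >= 0.0 && U1 <= 1.0 && U2 >= 0.0 && U2 <= 1.0);
    const double s = std::sqrt(U1);
    return {1.0 - s, U2 * s};
}
```

Because u + v = 1 − √U₁(1 − U₂) ≤ 1, every sample stays inside the triangle, so no rejection step is needed.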

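Cosine weighted hemisphere

$$p(\omega) \propto \cos\theta$$

$$\int_\Omega p(\omega)\,d\omega = 1, \qquad d\omega = \sin\theta\,d\theta\,d\phi$$

$$\int_0^{2\pi}\!\!\int_0^{\pi/2} c\,\cos\theta\sin\theta\,d\theta\,d\phi = 1$$

$$c\,2\pi\int_0^{\pi/2}\cos\theta\sin\theta\,d\theta = 1 \;\Rightarrow\; c = \frac{1}{\pi}$$

$$p(\theta,\phi) = \frac{1}{\pi}\cos\theta\sin\theta$$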

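Cosine weighted hemisphere

$$p(\theta,\phi) = \frac{1}{\pi}\cos\theta\sin\theta$$

$$p(\theta) = \int_0^{2\pi}\frac{1}{\pi}\cos\theta\sin\theta\,d\phi = 2\cos\theta\sin\theta = \sin 2\theta$$

$$p(\phi|\theta) = \frac{p(\theta,\phi)}{p(\theta)} = \frac{1}{2\pi}$$

$$P(\theta) = -\frac{1}{2}\cos 2\theta + \frac{1}{2} = \xi_1 \;\Rightarrow\; \theta = \frac{1}{2}\cos^{-1}(1 - 2\xi_1)$$

$$P(\phi|\theta) = \frac{\phi}{2\pi} = \xi_2 \;\Rightarrow\; \phi = 2\pi\xi_2$$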

Cosine weighted hemisphere

• Malley’s method: uniformly generates points on the unit disk and then generates directions by projecting them up to the hemisphere above it.

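Vector CosineSampleHemisphere(float u1, float u2) {
    Vector ret;
    // Sample a point uniformly on the unit disk.
    ConcentricSampleDisk(u1, u2, &ret.x, &ret.y);
    // Project it up to the hemisphere: z = sqrt(1 - x^2 - y^2).
    ret.z = sqrtf(max(0.f, 1.f - ret.x*ret.x - ret.y*ret.y));
    return ret;
}

[Figure: a point (r, φ) on the unit disk projected up to the hemisphere.]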

Cosine weighted hemisphere

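• Why does Malley’s method work?

• Unit disk sampling: $p(r,\phi) = \frac{r}{\pi}$

• Map to hemisphere: $(r,\phi) \Rightarrow (\sin\theta,\phi)$

$$X = (r,\phi), \qquad Y = T(X) = (\theta,\phi), \qquad r = \sin\theta,\ \phi = \phi$$

$$p_y(T(x)) = |J_T(x)|^{-1}\,p_x(x)$$

Since $r = \sin\theta$, the Jacobian determinant of the inverse map $(\theta,\phi) \mapsto (r,\phi)$ is

$$|J_T(x)|^{-1} = \begin{vmatrix}\cos\theta & 0\\ 0 & 1\end{vmatrix} = \cos\theta$$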

Cosine weighted hemisphere

• With $r = \sin\theta$, $p(r,\phi) = \frac{r}{\pi}$, and the Jacobian determinant of $(\theta,\phi) \mapsto (r,\phi)$ equal to $\cos\theta$, we obtain

$$p(\theta,\phi) = \cos\theta\;p(r,\phi) = \frac{\cos\theta\sin\theta}{\pi}$$

which is exactly the cosine-weighted density.

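Sampling Phong lobe

$$p(\omega) \propto \cos^n\theta, \qquad p(\omega) = c\,\cos^n\theta$$

$$\int_{\phi=0}^{2\pi}\!\int_{\theta=0}^{\pi/2} c\,\cos^n\theta\sin\theta\,d\theta\,d\phi = 1$$

$$-2\pi c\int_{\cos\theta=1}^{0}\cos^n\theta\,d\cos\theta = \frac{2\pi c}{n+1} = 1 \;\Rightarrow\; c = \frac{n+1}{2\pi}$$

$$p(\theta,\phi) = \frac{n+1}{2\pi}\cos^n\theta\sin\theta$$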

Sampling Phong lobe

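$$p(\theta,\phi) = \frac{n+1}{2\pi}\cos^n\theta\sin\theta$$

$$p(\theta) = \int_0^{2\pi}\frac{n+1}{2\pi}\cos^n\theta\sin\theta\,d\phi = (n+1)\cos^n\theta\sin\theta$$

$$P(\theta') = \int_0^{\theta'}(n+1)\cos^n\theta\sin\theta\,d\theta = -(n+1)\int_{\cos\theta=1}^{\cos\theta'}\cos^n\theta\,d\cos\theta = \left.-\cos^{n+1}\theta\,\right|_{\cos\theta=1}^{\cos\theta'} = 1 - \cos^{n+1}\theta'$$

Setting $P(\theta) = 1 - \xi_1$ (1−ξ₁ is also uniform) gives

$$\theta = \cos^{-1}\left(\xi_1^{\frac{1}{n+1}}\right)$$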

Sampling Phong lobe

$$p(\phi|\theta) = \frac{p(\theta,\phi)}{p(\theta)} = \frac{\frac{n+1}{2\pi}\cos^n\theta\sin\theta}{(n+1)\cos^n\theta\sin\theta} = \frac{1}{2\pi}$$

$$P(\phi'|\theta) = \int_0^{\phi'}\frac{1}{2\pi}\,d\phi = \frac{\phi'}{2\pi}$$

$$\phi = 2\pi\xi_2$$

Sampling Phong lobe

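$$(\theta,\phi) = \left(\cos^{-1}\left(\xi_1^{\frac{1}{n+1}}\right),\ 2\pi\xi_2\right)$$

When n=1, it is actually equivalent to the cosine-weighted hemisphere

$$(\theta,\phi) = \left(\frac{1}{2}\cos^{-1}(1 - 2\xi_1),\ 2\pi\xi_2\right)$$

because the two CDFs are identical:

$$P(\theta) = 1 - \cos^{n+1}\theta = 1 - \cos^2\theta \qquad \text{(Phong lobe, n=1)}$$

$$P(\theta) = -\frac{1}{2}\cos 2\theta + \frac{1}{2} = -\frac{1}{2}(1 - 2\sin^2\theta) + \frac{1}{2} = \sin^2\theta = 1 - \cos^2\theta \qquad \text{(cosine-weighted)}$$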
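The Phong lobe result can be sketched as a direction sampler plus its solid-angle PDF. The type `Vec3` and the function names are hypothetical; the lobe is assumed to be centered on the +z axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };

// Sample a direction with p(omega) = (n + 1) / (2 pi) * cos^n(theta) around +z:
// cos(theta) = u1^(1 / (n + 1)), phi = 2 * pi * u2.
Vec3 SamplePhongLobe(double n, double u1, double u2) {
    assert(n >= 0.0 && u1 >= 0.0 && u1 <= 1.0 && u2 >= 0.0 && u2 <= 1.0);
    const double cosTheta = std::pow(u1, 1.0 / (n + 1.0));
    const double sinTheta = std::sqrt(std::max(0.0, 1.0 - cosTheta * cosTheta));
    const double phi = 2.0 * kPi * u2;
    return {sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta};
}

// Density with respect to solid angle for a direction with the given cos(theta).
double PhongLobePdf(double n, double cosTheta) {
    assert(n >= 0.0 && cosTheta >= 0.0 && cosTheta <= 1.0);
    return (n + 1.0) / (2.0 * kPi) * std::pow(cosTheta, n);
}
```

With n = 1 the PDF reduces to cos θ / π, matching the cosine-weighted hemisphere; larger n concentrates the samples around the +z axis.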