Chapter 3 Variance Reduction Methodology
3.3 IS For SN Factor
國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
from the factor. Hence, we need to exploit proper IS method again to reduce the impact of factor.
Fig. 3-2 Comparisom of −F zm( ) /m under different loading values with b=0.2, m=100, q=0.1.
3.3 IS For SN Factor
As discussion in section 3.2, the key to improve efficiency of (3.2) is based on the elimination of residual randomness. Consider the (3.2) with ( )θ z =θm( ; )z q , any estimator pmq has the variance decomposition as following
[ mq] [ [ mq| ]] [ [ mq| ]]
Var p =E Var p Z +Var E p Z
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
Applying IS to conditioning Z only makes [E Var p[ mq| ]]Z small. To get further improvement of efficiency, GL focus on the second term in the variance decomposition.
By simple algebra, we know that the zero-variance IS density for Var E p[ [ mq| ]]Z is
1 [ { }| ] ( | ; ; )
[ { m }]E I Lm mq Z z f z
E I L mq > = μ σ λ
>
It is pity that sampling from this density is generally infeasible because of the normalization constant. For ( , , )μ σ λ =(0,1,0) , GL thus suggest using original distribution with a appropriate mode as optimal density. Rather than choose IS density arbitrarily, it is intuitively clear that shifting the mode makes the likelihood ratio inside expectation to be small. For symmetric density, such strategy may make a substantial variance reduction. Whenever zero variance density cannot be approximated only by shifting the mode, however, this algorithm becomes less beneficial. For instance, when the structure of [ {E I Lm >mq}|Z =z f z] ( | ; ; )μ σ λ has a width which is very different from original density, shifting a drift will turn out to be ineffective mechanism.
Faced with a similar problem, Capriotti (2008) uses Levenberg-Marquardt method to provide a reasonable IS density which is not limited the determination of drift. The implementation to determine the optimal parameters, however, incurs another efficiency problem. To keep the algorithm efficient, particular care for robust parameter estimation needs to be taken in preliminary Monte Carlo simulation. This leads algorithm to be more complicated.
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
Instead of solving robust problem, we adopt another strategy to vanish randomness from Z . Let a new estimator applying IS for factor and conditional on factor to be
I L{ m>mq e} −2{θm( ; )Z q Lm−ψLm(θm( ; ), )}Z q Z W Z( ) (3.3)
Here W Z( )= f z( | ; ; ) /μ σ λ f*( | ; ; )z μ σ λ and f*( | ; ; )z μ σ λ denotes the IS density.
We put emphasis on the variance of (3.3) rather than variance decomposition. Observe the second moment of (3.3)
2 ( ; ) 2 ( ( ; ), ) 2
{ m } m z q Lm Lm m z q z ( ) E I L⎡⎢⎣ >mq e− θ + ψ θ W Z ⎤⎥⎦
≤E I L⎡⎢⎣ { m>mq e} 2Fm( )ZW2( )Z ⎤⎥⎦ ≤
∫
e2Fm( )zW2( ) ( | ; ; )Z f* z μ σ λ dzDirectly minimizing second moment is difficult, a surrogate to guide proper IS density factor is required and thus we consider the upper bound of second moment. To avoid integrating intricate function F z , we choose a different approximation. By the m( ) simple differentiation, we know F z is concave and then we have a loose upper m( ) bound by the first order Taylor expansion at point tm.
2Fm( )z 2( ) ( | ; ; )* 2{Fm( )tm Fm'( )(tm z tm)} 2( ) ( | ; ; )*
e W Z f z μ σ λ dz≤ e + − W Z f z μ σ λ dz
∫ ∫
=E e[{ {Fm( )tm +Fm'( )(tm Z t−m)}W Z( )} ]2
‧
Considering Jensen’s inequality, the inequality holds if
' '
then the formulation yields
'( )
this connection of (3.4) and (3.5) implies that we can design a new exponential twisted IS density where tm =F tm'( )m , namely m arg max{ ( )m 2/ 2}
t
t = F t −t
Once we have selected the new IS density of Z , the algorithm proceeds as follows:
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
1. Compute m arg max{ ( )m 2/ 2}
t
t = F t −t
2. Sample Z from f*( | ; ; )z μ σ λ
3. Twisting the conditional default probability 4. Return the estimator
( ; ) ( ( ; ), ) log ( )
ˆET { m } m Z q Lm Lm m Z q Z t Zm MZ tm p =I L >mq e−θ +ψ θ e− +
Fig. 3-3 Graph of F tm( )+F t zm'( )( − for a single factor t) with b=0.2,ρ=0.3,q=0.1 and m=1,000.
By simple algebra, we know that the new IS distribution for Z is the closed skew normal CSN t( ,1, ,m λ λ− tm,1) if the original distribution is SN(0,1, )λ . In fact, the appropriate IS distribution is not limited to the original distribution family and this result shows the difficulty in searching the optimal IS density. For more properties of closed skew normal one can see Graciela et al (2004).
‧
Consider a single factor homogeneous portfolio with the factor ~Z SN(0,1, )λ and
~ (0,1,0)
i SN
ε . Suppose the default and loss threshold are b m and mq respectively. Then the estimator ˆpET satisfy
(a)For λ≥0
Proof: The result would follow from a similar discussion in GL (2004). To start, we consider lower bound and upper bound of liminf and limsup respectively.
(a)λ≥0: First, we show that
‧
By conditional property, for anyυ>0, we have the following result
[ { m }]
E I L >mq = [ [ {E E I Lm >mq}| ( )p Z ≥ +q υ]]
=P L( m >mq p Z| ( )≥ +q υ) ( ( )P p Z ≥ +q υ) ≥P L( m >mq p Z| ( )= +q υ) ( ( )P p Z > +q υ)
The inequality holds because L is binomially distributed with parameter m m and ( )p Z .
Applying the lower bound (3.62) of Johnson et al. (1993), we have a lower bound for the conditional probabilityP L( m >mq p Z| ( )= +q υ)≥1/ 2.Substituting the result into the lower bound, we have
[ { m }]
‧
Applying l’Hospital’s rule, we obtain
2 1
which proves the formulation.
Next we show that
‧
By differentiation, we know that Fm( )⋅ is an increasing and concave function, thus we know for any tm∈ℜ
‧
where ( )φ ⋅ denotes the density function of standard normal random variable. For the first inequality, we get
'
By l’Hospital’s rule, we thus have
‧
By Jensen’s inequality, we complete the proof.
(b)λ<0:
First, we show the lower bound
‧
By the lower bound (3.62) of Johnson et al. (1993), we have
[ { m }] probability is lower bounded by
0 ( ( )zm 0) ( ( ( )zm 0)) κ φ υ κ+ Φ λ υ κ+
and we obtain
‧
Combining all results, we get the formulation.
‧
The inequality holds because of second mean value theorem for integral, so we know
‧
‧
Combining those inequalities, we obtain
By the limsup and liminf, we have
2 2
‧
‧
by the central limit theorem, for m large enough,
E I mq[ { <Lm <m q( +δ)}| ( )p Z ≤q] rule, by following the same steps discussed before then we get
‧
the same steps discussed before then we get2 2
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
Combining all results, we get the formulation.
2 2
2
2 2
1 1 2
lim log {(ˆ ) } ( )
ET 1
m
E p b m
λ
ρ λ
→∞
= − +
+ □
Note That Theorem 3.1 shows that the estimator is asymptotical optimal only in the case λ≥0. With − +1 2 < < , the second moment decreases faster than the first λ 0 moment, but not twice as fast. For λ≤ − +1 2, however, the second moment even decreases slower than the first moment. This result implies that sampling from the new measure is no more effective than general Monte Carlo.
Fig. 3-4 Comparison of rate function under λ<0 for a single homogeneous model.
‧
method with parameter F tm'( )m does not achieve the maximum utility. Review the properties of exponentially twisted procedure, we know that the maximal variance reduction occurs when parameter F tm'( )m makes the mean value of f*( | ; ; )z μ σ λ(3.6) show that the maximal utility will happen if we apply exponential twist method to factor. The phenomenon coincides with that GL suggest. However, if λ<0, the mean value of f*( | 0;1; )z λ is
‧
algorithm is less efficient if λ is getting smaller. This result in (3.7) also corresponds with Fig. 3-4. Namely, negative λ incurs a width which makes the variance increase.In the next Chapter, we will tailor the algorithm to eliminate the effect from shape parameter λ .
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y Chapter 4 The New Method for SN Factor
In Chapter 3, we know that asymptotical efficiency can not be achieved because the nonlinear behavior of F Z is non-negligible. This suggests that to obtain further m( ) variance reduction we need to address the other component of F Z . Glasserman et al m( ) (1999) attempted to use stratification technique to decrease variability except for linear part. Here, we completely limit ourselves to IS methodology to build an efficient algorithm. Our approach emphasizes on choosing density “form” of Z rather than shifting, scaling or exponentially twisting. In the next section, we begin with a more general result in Chiang, Yueh, and Hsie (2007) (henceforth CYH) but for a different model.
4.1 Extension of CYH Importance Sampling Algorithm
The key idea in CYH is to find a simple alternative characterization of default event. To motivate the algorithm we take, observe the following proposition:
Proposition 4.1
Consider a single factor model where c = , i 1 bi = and b ρi= ; Random variables ρ Z and εi follow SN(0,1, )λ and SN(0,1,0) respectively. Then the set {Lm>mq} is equivalent to the event {Z >H[mq+1]} if H[mq+1] is denoted as [mq +1]th order statistics of { }Hi im=1, where
1 2 i i
H b m ρ ε
ρ
− −
=
‧
Proposition 4.1 indicates a simple alternative characterization for the event {Lm>mq}. It provides a simpler way to ensure that for every replication where the set we interest always takes place. By Proposition 4.1, we create an estimator of single SN factor model as following
{ m } r
I L >mq L (4.1)
where Lr = −1 F HZ( [mq+1]) denotes the likelihood ratio and F is the cumulative Z density function of Z . Clearly, (4.1) is not restricted to what the distribution of Z is.
This means that the algorithm is allowed to general case. We will consider behavior of (4.1) in the following theorem. By analyzing the asymptotical performance, we can find a useful guideline for choosing appropriate IS density of Z .
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
Theorem 4.1
Consider a single factor model where c = , i 1 bi= , b ρi= and ( , )ρ Z ε follow the i same distribution assumption in Proposition 1, then (4.1) has bounded relative error.
Proof: First, we exploit the result shown in Lucas et al (2003). Assume that S is a j latent variable which obeys the general factor model
( , )
j j
S =g f ε
where f is common factor, ε is specific risk factor, and ( , )j g ⋅ ⋅ defines the functional form of the factor model. Lucas et al (2003) used the Theorem 12.13 of Williams (1991) and indicate that
* . *
1
lim1 n { j }a s ( j | )
n j
I S s P S s f
→∞n
=
< → <
∑
By the same argument as Lucas et al (2003), we then have
lim m a s. [ { i }| ]
m
L E I X b m Z
→∞ m → >
So, we have
‧
The second moment of (4.1) is written as (by Theorem 1.10 of Shao (1998) and Theorem 4.3.1 of Sen and Singer (1993))
[ { m } ]2r
‧
The method works well for portfolio whose tail behavior is dominated by a “key”
random variable. To show that the results have content, we give two specific examples.
For the first example, consider the model
( 1 2 )
Here Z is a standard normal random variable, ε are i.i.d standard normal random i variables, and S is chi-square distribution with r degrees of freedom. Applying the key idea behind Proposition 1, we can find the set {S<(Hi)[mq+1]} where
( 1 2 ) /
i i i i
H = ρZ+ −ρ ε b m. It is the simpler expression of {Lm>mq}. Note that the sample of Z is generated from the original distribution. By a analogous algorithm in section 3.3, we get a efficient estimation of {Lm>mq}. In this model, the random variable S plays a key role to vanish variability. Changing the measure of S is sufficient to achieve substantial variance reduction.
Table 4-1:Comparison of different methods for m=250; ν =12;q=0.25
.
‧
Table 4-1 shows the performance of two estimators. Algorithm 1 is the suggestion of Bassamboo et al (2008) and we know that it has the bounded relative error. In the last column, we list the sample variance ratio V R.
2 4 simulation in Table 4-1. But, note that the implementation of the new method is more easily.
The next example illustrates the normal case discussed in Glasserman (2004). All obligors are divided into two blocks. The first block consists of m obligors whose 1 marginal default probability isp . This block is dominated by the factor 1 Z and has a 1 common loading a . The second block comprises the last 1 m−m1 obligors. All obligors in the second block have marginal default probability p and affected only by 2 factor Z with a common loading 2 a . This model is 2
‧
Table 4-2 shows the performance of general Monte Carlo and the CYH method.
Note that variance reduction is measured relative to general Monte Carlo simulation.
The CYH method provides an excellent performance than general Monte Carlo. The behavior of loss distribution seems to be successfully captured by the CYH method.
Table 4-2:Comparison for( ,m m a a1; ; ; ;1 2 p p1 2) (1,000;150;0.8;0.7;0.05;0.001)=
( )
P L>
Monte Carlo (Run 10 )5 CYH Method (Run 10 ) 3 Estimation S.E Estimation S.E
.
Although using order statistic increases simplicity, the flexibility is restricted simultaneously. For instance, if all the exposures c are different from each other, then i the sorting and partitioning procedures make the method time consuming. Obviously, the original problem in estimating rare event is transferred into another one. For more discussions about determination of key random variable is referred in Lucas et al (2003).
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
4.2 The Proposed algorithm for Skew Factor Model
Note the conclusion in Theorem 4.1, we know the likelihood L has an excellent r utility in variance reduction. Although the new method is inflexible to tackle inhomogeneous portfolio, it provides a way to build IS density for Z . In the following, we will introduce the strategy to search an effective IS algorithm.
Consider the result described in section 3.3, we know that vanishing linear variability of F z can increase the efficiency of simulation except for m( ) λ <0. Therefore, the procedure of eliminating the linear part of F z is essential. This m( ) means the new likelihood ratio LNewr ( )z = f z( | 0;1; ) /λ fNew( )z must contain the function exp(−t zm ), namely
( ) exp( )
New
r m
L z ∝ −t z (4.2)
Furthermore, in the second part of Theorem 3.1, we find that the nonlinear behavior of
m( )
F z seriously effect the efficiency of variance reduction. To eliminate this effect
from the nonlinear part, we consider the limit regime rather than integral itself. We focus on modifying the other part of density of Z but for vanishing nonlinear part of
m( )
F z directly. With the definition of asymptotically optimal, we need to find a
likelihood ratio which decrease as fast as possible if we apply IS to Z .
In Theorem 4.1, L is of the bounded relative property. It is reasonable to utilize r
‧
asymptotical decay rate of L to create appropriate IS density. In multifactor and r inhomogeneity case, however, getting a likelihood ratio like L is difficult. So we turn r attention to setting where expectation of likelihood ratio decays in the same rate of L . r In other words, we expect the following equation holds
lim log 1 then determined. Note that the combinative way leads to not only vanishing the linear effect but considering the nonlinear part of F z simultaneously. m( )
To represent our procedure precisely, we consider the setting where
~ (0,1, )
If m is large sufficiently, an approximation for the integral (e.g, Shao 1998, Chap. 1 and Sen and Singer 1993, Chap. 4) suggests that
1( ) 1 2
‧
where ξ denotes a constant. The last equation holds because of the second mean value theorem for integral. Note that the choice of an appropriate IS density in such procedure is not unique. For instance, the following density
( ) 2 ( ) (| | ( ))
N
New m m
f z = φ z−t Φ λ z+t
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
is another feasible one. By the similar argument, a appropriate IS density for λ ≥ is 0
( ) 2 ( ) ( )
( | 0;1; )
t zm
P
fNew z e z z
M t φ λ
= λ Φ (4.5)
Especially, when λ =0, fNewP becomes the IS density GL suggest. For notational simplicity, considering ( , )μ σ =(0,1), we build our algorithm as following:
1. Calculate m arg max{ ( )m 2/ 2}
t
t = F t −t .
2. Check value λ of Z , choose (4.5) as IS density of Z if λ ≥ and (4.4) 0 otherwise.
3. Set t to the IS density. m
4. Sampling Z and calculate the product LNewr ( )z of each likelihood ratio.
5. Compute ( ; )θm z q .
6. Return the estimate {I Lm>mq L} r
where Lr =e−θm( ; )Z q Lm+ψ θm(m( ; ), )Z q Z LNewr is the combined likelihood ratio. If we repeat step 1 to 6 times, an estimator ˆpNew can be constructed by averaging the values of
the estimates, we have a estimation for (P Lm>mq) under skew normal copula model.
Once we have selected a new parameter vector t which satisfies m
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
arg max{ ( ) 1 } 2
T
m m
t
t = F t − t t ,
choosing (4.4) or (4.5) and component of t for single IS density; we can easily extend m the single factor to multiple factors.
‧
4.3 Asymptotic Optimality
We now consider the performance of the estimator ˆpNew. The strength of our proposed lies in its variance reduction efficiency established by the following theorem:
Theorem 4.2
Consider the same assumption in Theorem 4.1, then (a) For λ ≥ 0
By the similar argument in Theorem 3.1, we know for arbitrary τ >0
‧
‧
By the similar discussion in Theorem 3.1, we have
2 2
Using the second mean value theorem for integral, for a small value ζ , we get
2 2
‧
Combining all the result and applying Jensen’s inequality we complete the proof. □
This result indicates that our proposed IS algorithm should be effective in estimating loss distribution. Even though the assumption in Theorem 4.2 is for homogeneous single factor model, the proposed algorithm is practicably applied to multifactor and inhomogeneity cases. Note that our proposed algorithm does not require what density the factor Z should follow. When the specific factors are of arbitrary distribution, we need only to modify the associated fNew( )z to satisfy equation (4.3). In next chapter, our numerical results for skew normal factor model also confirm the expectation.
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y Chapter 5 Implementation Issues
In this chapter we compare performance of the new estimator ˆpNew with general Monte Carlo simulation. We investigate sensitivity to λ , b , ρ and q . The broad conclusions are that the new algorithm provides significant improvement over the performance of general Monte Carlo simulation. This improvement increase as the event becomes rare. This result supports our theoretical conclusions that the sample variance ratio, as measured by the ratio of the standard deviation of general Monte Carlo simulation to the standard deviation of ˆpNew, remains well behaved as the probability of large losses becomes increasingly rare.
For implementation of new algorithm, (4.4) is easily generated using the inverse transform method. However, the cumulative distribution associated with (4.5) does not have a closed form. It is not straightforward to use the inverse transform methods to generate samples from this distribution. Instead, we use a root-finding method of numerical integration to generate samples we need.
Our first example is a single factor portfolio of m =1,000 and ~Z SN(0,1, )λ . The model parameters are chosen to be q =0.4, b =0.0345, 0.3ρ = and exposure
i 1
c = . We generate 5,000 samples for proposed algorithm and 100,000 samples for
general Monte Carlo simulation. Table 5-1 reports samples variance ratio for several values of λ in estimating (P Lm>mq).
‧
Table 5-1: Variance Reduction for decreasing λ.
( m )
At small value of λ, the variance ratio becomes very large. The performance of ˆNew
p is significantly better than general Monte Carlo simulation. The improvement is substantial especially for negative value of λ. Note that the variance ratio rapidly changes when negative value λ varies slowly.
Table 5-2: Variance Reduction for increasing b.
( m )
Table 5-2 shows the performance of the proposed algorithm as b changes. Again
‧
c = . We generate 5,000 samples for proposed algorithm and 100,000 samples for
general Monte Carlo simulation. In last column, we observe that all performances are significantly better than general Monte Carlo simulation. The variance ratio improves as
b increases.
Table 5-3 shows performance of the proposed algorithm as factor loading ρ changes. In this case, the parameters of model are m =1,000, q =0.4, λ = −1,
0.0345
b = and c = . We generate 5,000 samples for proposed algorithm and i 1 100,000 samples for general Monte Carlo simulation. All results perform significantly better than general Monte Carlo simulation, especially when ρ decrease.
Table 5-3: Variance Reduction for increasing ρ .
( m )
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
from the interval (-1,0) The exposures c is kept fixed at 1; i b and i a are distributed ij uniformly from (0.02, 0.07) and (0, 1/ 5 ) respectively. Table 5-4 compares the performance of the proposed algorithm with general Monte Carlo simulation as q change. The general Monte Carlo simulation results are based on 50,000 replications whereas the number of IS replications is 1,000. When the loss level is small, the proposed algorithm is a bit better than general Monte Carlo simulation. At large values of q , the {Lm>mq} becomes rare and then the variance ratio becomes large. The improvement is obvious for q in the range of 0.45 to 0.5.
Table 5-4: Variance Reduction for increasing q.
( m )
P L >mq
Method
General Monte Carlo (Runs:5 10× 5)
ˆNew p
(Runs:1 10× 3) q Prob. est S E. Prob. est S E.
. V R
0.35 1.90 10× −3 1.94 10× −4 1.91 10× −3 1.79 10× −4 59 0.40 8.01 10× −4 1.26 10× −4 7.55 10× −4 7.22 10× −5 153 0.45 2.20 10× −4 6.63 10× −5 2.72 10× −4 2.59 10× −5 325 0.50 1.00 10× −4 4.47 10× −5 1.00 10× −4 1.09 10× −5 827
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y Chapter 6 Concluding Remarks
In this thesis, we have proposed a new algorithm for estimation of tail probability in skew normal copula model. We started with the case of applying exponential twist technique to the default random variables conditional on common factors. However, GL show that the conditional IS estimator does not achieve asymptotical optimal unless the correlation between obligors is very weak. Therefore, GL further suggest shift the mean of underlying factor to eliminate the residual variability. This procedure makes the algorithm asymptotical optimal.
Different from the normal copula model, however, the leptokurtic and asymmetric characters of skew normal result in the situation where second moment of IS estimator converge in unintelligible decreasing rate. So, to choose IS density of underlying factors becomes intricate. To improve the efficiency of simulation, we intuitively consider the usual exponential twist to eliminate the linear part of F z . Surprisingly, using m( ) exponential twist does not guarantee variance reduction. A way to speed up the decreasing rate of likelihood ratio is necessary.
Further analyze the failure of case (3.7), we know that the achievement of optimality depends on the location of mean value. Once the mean value locate on a specific point tm, we will obtain the maximal utility of simulation. We had considered a new exponential twist density eH z( )f z( | 0;1; ) /λ MZ( ( ))H z where H t( )m =F tm'( )m and
( )m
H t simultaneously satisfy the following equation
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
( )
( | 0;1; ) ( ( ))
H tm z
m
Z m
z e f z dz t
M H t λ
∞
−∞ →
∫
However, searching the function H ⋅( ) is not easy in practice and that is to be the one direction of future work. We next extend the CYH method to solve utility problem. By finding an asymptotical behavior, we can decide the IS density of factors.
Note that our proposed algorithm is also applied to other factor assumption.
Because our consideration of building IS density put emphasis on adjusting the width of a distribution to mimic the form of optimal density but for the determination of the optimal shifting. We have successfully extended single factor assumption in CYH to multifactor cases and illustrated its effectiveness in more complex cases through numerical results. The other direction of future work is to extend the approach to
Because our consideration of building IS density put emphasis on adjusting the width of a distribution to mimic the form of optimal density but for the determination of the optimal shifting. We have successfully extended single factor assumption in CYH to multifactor cases and illustrated its effectiveness in more complex cases through numerical results. The other direction of future work is to extend the approach to