AN INTRODUCTION TO EXTREME ORDER STATISTICS AND ACTUARIAL APPLICATIONS

(1)

AN INTRODUCTION TO EXTREME ORDER

STATISTICS AND ACTUARIAL APPLICATIONS

2004 ERM Symposium, Chicago April 26, 2004

Sessions CS 1E, 2E, 3E:

Extreme Value Forum

H. N. Nagaraja (hnn@stat.ohio-state.edu) Ohio State University, Columbus

Hour 1: ORDER STATISTICS - BASIC THEORY & ACTUARIAL

EXAMPLES

Hour 2: EXTREME VALUES - BASIC MODELS

Hour 3: EXTREME VALUES - INFERENCE AND APPLICATIONS

(2)

Hour 1: ORDER STATISTICS - BASIC THEORY & ACTUARIAL

EXAMPLES

1. Exact Distributions of Order Statistics

• Uniform Parent

• Exponential

• Normal

• Gumbel

2. Order Statistics Related Quantities of Actuarial In- terest

• Last Survivor Policy

• Multiple Life Annuities

• Percentiles and VaR (Value at Risk)

• Tail Probabilities

• Conditional Tail Expectation (CTE)

• Exceedances and Mean Excess Function 3. Large Sample Distribution of Order Statistics

• Central Order Statistics (Central Percentiles)

• Extreme Order Statistics

• Intermediate Order Statistics

(3)

4. Sum of top order statistics & sample CTE 5. Data Examples and Model Fitting

• Nationwide Data

• Danish Fire-loss Data

• Missouri Flood level Data

• Venice Sea Level Data

Hour 2: EXTREME VALUES - BASIC MODELS

1. Limit Distributions For the Maximum

• Fr´echet

• Weibull

• Gumbel (or the extreme-value)

2. Generalized Extreme Value (GEV) Distribution 3. Limit Distributions for the Other Upper Extremes 4. Limit Distributions for the Minimum

5. Exceedances - Generalized Pareto Distribution (GPD) 6. Examples

(4)

Hour 3: EXTREME VALUES - INFERENCE AND APPLICATIONS

1. GEV Distribution- Maximum Likelihood Estimates 2. Blocked Maxima Approach

3. Inference based on top k extremes

• Pickands’ Estimator

• Hills Estimator

• Estimating High Quantiles

4. GPD and Estimates Based on Exceedances 5. General Diagnostic Plots

• QQ Plot

• Empirical Mean Excess Function Plot 6. Examples

7. General Remarks 8. Software Resources 9. References

(5)

Hour 1

1 Exact Distribution of An Order Statistic If the random variables X₁, ..., X_n are arranged in order of magnitude and then written as

X₍₁₎ ≤ · · · ≤ X_(n),

we call X_(i) the ith order statistic (i = 1, ..., n). We as- sume X_i to be statistically independent and identically distributed - a random sample from a continuous pop- ulation with cumulative distribution function (cdf) F (x) and probability density function (pdf) f (x).

CDF and PDF of X_r

F_(r)(x) = Pr{X_(r) ≤ x}

=

Xn i=r

µn i

¶

Fⁱ(x)[1 − F (x)]ⁿ⁻ⁱ Minimum : F₍₁₎(x) = 1 − [1 − F (x)]ⁿ Maximum : F_(n)(x) = [F (x)]ⁿ

f_(r)(x) = n!

(r − 1)!(n − r)!F^r−1(x)[1 − F (x)]^n−rf (x).

(6)

Standard Uniform Parent – PDFs f (x) = 1, 0 ≤ x ≤ 1.

0 1 2 3 4 5

pdf at x

0.2 0.4 0.6 0.8 1

x

Figure 1: The probability density function (pdf) of the standard uniform population and the pdfs of the first, second, third, fourth and largest order statistics from a random sample of size 5.

(7)

Standard Uniform Parent – CDFs

F (x) = x, 0 ≤ x ≤ 1.

E(X_(r)) = r

n + 1; V ar(X_(r)) = r(n − r + 1) (n + 1)² .

0 0.2 0.4 0.6 0.8 1

cdf at x

0.2 0.4 0.6 0.8 1

x

Figure 2: The cumulative distribution function (cdf) of the standard uniform population and the cdfs of the first (minimum), third (median), and the largest (maximum) order statistics from a random sample of size 5.

(8)

Standard Normal Parent – PDFs f (x; µ, σ) = 1

√2πσ exp{− 1

2σ²(x−µ)²}, −∞ < x < ∞.

Here µ is the mean and σ is the standard deviation.

Standard Normal: µ = 0; σ = 1.

−4 −2 0 2 4

0.00.20.40.60.8

x

Probability Density Function

1 5

2 4

3

Figure 3: The pdf of the standard normal population and the pdfs of the first, second, third, fourth and the largest order statistics from a random sample of size 5.

(9)

Standard Normal Parent – PDFs of the Maxima

−4 −2 0 2 4 6

0.00.20.40.60.81.01.2

x

10 50

100 200

Figure 4: The pdf of the standard normal parent and the pdfs of the maximum from random samples of size n = 10, 50, 100, 200.

(10)

Standard Exponential Parent – PDFs f (x; λ) = λ exp{−λx}, 0 ≤ x < ∞.

Here mean = std. dev. = 1/λ; Std. Exponential: λ = 1.

Moments for the Max:

E(X_(n)) ≈ log(n) − 0.5772; V ar(X_(n)) ≈ π²/6 ≈ 1.64.

0 1 2 3 4

0.00.20.40.60.81.01.2

x

16

17

18

19

20

Figure 5: The pdfs of the standard exponential parent and those of the top 5 order statistics from a random sample of size 20.

(11)

Standard Gumbel Parent – PDFs

F (x; a, b) = exp{− exp[−(x − µ)/σ]}, −∞ < x < ∞.

Interesting Fact: X_(n) is Gumbel with µ and σ/n.

Standard Gumbel: µ = 0, σ = 1;

E(X) = −γ(Euler’s constant) ≈ −0.5772;

Variance = π²/6.

−2 0 2 4 6

0.00.20.40.60.8

x

48

49

50

Figure 6: The pdfs of the standard Gumbel parent and those of the top 3 order statistics from a random sample of size n = 50.

(12)

2 Order Statistics Related Quantities of Actuarial Interest

Last Survivor Policy and Multiple Life Annuities

(a) Last Survivor Policy: Distribution of the Max- imum; of interest will be the mean, standard deviation and percentiles of the distribution.

(b) Multiple Life Annuities: Distribution of the rth order statistic from a sample of size n; say the third to die out of five.

Percentiles and VaR (Value at Risk)

Upper pth percentile of the Distribution of X. Here X could be the random variable corresponding to profits and losses (P&L) or returns on a financial instrument over a certain time horizon. Let

x_p = F⁻¹(1 − p),

where F⁻¹ is the inverse cdf or the quantile function.

If p = 0.25, or p = .05 say – central order statistics, p = 0.01 or p = 0.001 – extreme order statistics. VaR = Value-at-Risk is generally an extreme percentile.

(13)

Exceedances and Tail Probabilities

Exceedance is an event that an observation from the population of X exceeds a given threshold c. Probability exceeding or below a certain threshold c; probability of ruin.

Pr(X > c).

When c is an extreme threshold, we need EVT.

Conditional Tail Expectation (CTE)

Suppose the top 100p% of the population are selected and one is interested in the average of the selected group.

E(X|X > x_p) = µ_p = 1 p

Z _∞

x_p

xf (x)dx.

Mean Excess Function

Suppose c is the threshold and the interest is the av- erage excess above it.

E(X − c|X > c) = m(c) = 1 1 − F (c)

Z _∞

c

(x − c)f (x)dx

= 1

1 − F (c) Z _∞

c

[1 − F (x)]dx.

(14)

3 Large Sample Distribution of Order Statis- tics

• Central Order Statistics (Central Percentiles)

• Extreme Order Statistics

• Intermediate Order Statistics

The asymptotic theory of order statistics is concerned with the distribution of X_r:n, suitably standardized, as n approaches ∞. If r/n ≈ p, fundamentally different results are obtained according as

(a) 0 < p < 1 (central or quantile case) (b) r or n − r is held fixed (extreme case)

(c) p = 0 or 1, with r = r(n) (intermediate case).

In Cases (a) and (c) the limit distribution is generally normal. In case (b), it is one of the three extreme-value distributions, or a member of the family of Generalized Extreme Value (GEV) distributions.

(15)

Central Case

Result 1. Let 0 < p < 1, and assume r ≈ np, and 0 < f (F⁻¹(p)) < ∞. Then the asymptotic distri- bution of

n¹²(X_r:n − F⁻¹(p)) is normal with zero mean and variance

p(1 − p) [f (F⁻¹(p))]².

• With 0 < p₁ < p₂ < 1,

³

n¹²(X_r₁_:n − F⁻¹(p₁)), n¹²(X_r₂_:n − F⁻¹(p₂))

´ is bivariate normal with correlations

p₁(1 − p₂) p₂(1 − p₁).

• These limiting means, variances and correlations can be used to approximate the moments of central order statistics.

• One can use this result to estimate and find confidence intervals for central percentiles.

• One can find the large sample distribution of, e.g., the sample interquartile range (IQR).

(16)

Intermediate Case

Result 2. Suppose the upper-tail of F satisfies some smoothness conditions (von Mises conditions).

As r → ∞ and r/n → 0, with p_n = r/n, nf (x_p_n)r⁻¹² (X_n−r+1:n − z_p_n) is asymptotically standard normal.

• x_p is the upper pth quantile.

• There are 3 types of von Mises conditions - tied to EVT.

• Here and in the quantile case, one can obtain local estimates of the pdf f using neighboring order statis- tics.

Upper and Lower Extremes

None of the limit distributions is normal. There are 3 families of distributions that generate the limit distributions for the upper extremes, and for the lower extremes.

• A nonsymmetric parent can have different types of limit distributions for the max. and min.

(17)

• The limit distribution for the kth maximum is related to, but different from that of the maximum.

Example. Consider a random sample from a standard exponential population. That is,

Pr(X ≤ x) = 1 − exp(−x), x ≥ 0.

Then, for large n X_(n) − log(n) ≈ Gumbel with location parameter 0 and scale 1. That is,

Pr(X_(n)−log(n) ≤ x) ≈ exp[− exp(−x)], for all real x.

In contrast, the sample minimum is exactly Exponential with mean 1/n without any location shift. That is,

Pr(X₍₁₎ ≤ x) = 1 − exp(−nx), x ≥ 0.

Other Upper Extremes

Pr(X_(n−k)−log(n) ≤ x) ≈ exp[− exp(−x)]

Xk j=0

exp(−jx) j! . Other Lower Extremes

Pr(nX_(k) ≤ x) ≈ 1 − exp(−x) Xk

j=0

(x)^j

j! , x ≥ 0.

(18)

4 Sum of top order statistics & sample CTE Suppose we are interested in estimating the CTE µ_p = E(X|X > x_p) where x_p is the upper pth quantile and p is not very small (say 5%) and let k/n ≈ p. Define the sample CTE as

D_k = 1 k

Xn j=n−k+1

X_(j). Result 3. For large n,

√k(D_k − µ_p) ≈ N(0, σ_p² + p(x_p − µ_p)²).

where σ_p² = V ar(X|X > x_p).

• On the right, x_p, µ_p and σ_p² can be estimated from the sample using the sample quantile X_(n−k), D_k and S_D² , the sample variance of the top k order statistics.

• The CTE and Mean Excess Function are related.

E(X|X > c) = c + m(c).

(19)

5 Data Examples and Model Fitting

Capital Requirements Data

Data Source: Steve Craighead, Nationwide Insurance, Columbus Values are negatives of the cap-requirements

-10000000 30000000 60000000

-10000000 0 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000 90000000

.001 .01 .05.10 .25 .50 .75 .90.95 .99 .999

-4 -3 -2 -1 0 1 2 3 4

Normal Quantile Plot

Quantiles

100.0% maximum 89490993

99.5% 36103098

97.5% 11807434

90.0% 2782728.4

75.0% quartile 207245.65

50.0% median -2.094e+6

25.0% quartile -3.8605e6

10.0% -5.1367e6

2.5% -6.5963e6

0.5% -8.0968e6

0.0% minimum -9.133e+6

Moments

Mean -1.0354e6

Std Dev 5983789.7

Std Err Mean 189318.73

upper 95% Mean -663845.7 lower 95% Mean -1.4069e6

N 999

Bivariate Fit of mexf By (-capreq)_lag1

10000000 20000000 30000000

mexf

19

(20)

Danish Fire Insurance Claims Data Data Source: http://www.math.ethz.ch/~mcneil/

(Alexander McNeil Swiss Federal Institute of Technology) Original Source: Rytgaard (ASTIN Bulletin)

0 100 200

Loss-in-DKM 01/01/1978 01/01/1980 01/01/1982 01/01/1984 01/01/1986 01/01/1988 01/01/1990 01/01/1992

Date

0 10 20 30 40 50 60

Quantiles (Note: Top 3 values were excl.)

100.0% maximum 65.707

99.5% 32.402

97.5% 15.921

90.0% 5.506

75.0% quartile 2.964

50.0% median 1.775

25.0% quartile 1.321

10.0% 1.113

2.5% 1.020

0.5% 1.000

0.0% minimum 1.000

Moments (Top 3 excl.)

Mean 3.1308527

Std Dev 4.6580174

Std Err Mean 0.1001319

upper 95% Mean 3.3272175 lower 95% Mean 2.9344879

N 2164

Bivariate Fit of mexf By loss_lag1

0 20 40 60 70 90 110 120

mexf

0 10 20 30 40 50 60 70 80 90100 120 140 loss_lag1

(21)

Missouri River Annual Maximum Flow Data for the Towns of Boonville and Hermann

Data Source: US Army Corp of Engineers

-100000 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 1100000

Boonville

1880 1900 1920 1940 1960 1980 2000 Year

0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 1100000

Hermann

1880 1900 1920 1940 1960 1980 2000 Year

0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000

Extreme-Value and Normal fits for Boonville

100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000

Extreme Value Fit for Hermann

0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000

Boonville

100000 300000 500000 700000 900000 Hermann

Pearson Correlation

Boonv ille Hermann

1.0000 0.8936

0.8936 1.0000 Boonv ille Hermann

(22)

Venice Annual Extreme Sea level Readings (1931-1981) Data Source: Reiss and Thomas (1997); XTREMES software.

50 100 150 200

max1

1920 1930 1940 1950 1960 1970 1980 1990 y ear

Highest Annual level

70 80 90 100 110 120 130 140 150

max2

1920 1930 1940 1950 1960 1970 1980 1990 y ear

Second Best Annual level

60 80 100 120 140 160 180 200

max1

1930 1940 1950 1960 1970 1980 1990 y ear

max1 = -990.0706 + 0.5673303 year

60 80 100 120 140 160 180

Extreme Value Fit for mod.max1

Pairwise Pearson Correlations

max1 max2 max3 max4 max5 max6

1.0000 0.8103 0.7746 0.6724 0.6741 0.6685

0.8103 1.0000 0.9455 0.9145 0.8872 0.8626

0.7746 0.9455 1.0000 0.9524 0.9208 0.8901

0.6724 0.9145 0.9524 1.0000 0.9662 0.9253

0.6741 0.8872 0.9208 0.9662 1.0000 0.9682

0.6685 0.8626 0.8901 0.9253 0.9682 1.0000

max1 max2 max3 max4 max5 max6

(23)

Hour 2: EXTREME VALUES - BASIC MODELS

1 Sample Maximum

For an arbitrary parent distribution, X_n:n, the largest in a random sample of n from a population with cdf F (x), even after suitable standardization, may not possess a limiting distribution.

Result 3. If F (x) is such that (X_(n) − a_n)/b_n has a limit distribution for large n, then the limiting cdf must be one of just three types:

(Fr´echet) G₁(x; α) = 0 x ≤ 0, α > 0,

= exp(−x^−α) x > 0;

(Weibull) G₂(x; α) = exp [−(−x)^α] x ≤ 0, α > 0,

= 1 x > 0;

(Gumbel) G₃(x) = exp¡

−e^−x¢

− ∞ < x < ∞.

(1)

(24)

Notes

• For a given parent cdf F there can be only one type of G.

• Norming constants a_n and b_n need to be estimated from the data and formulas depend on the type of G and parent cdf F . Convenient choices are given below in Result 4.

• For the sample minimum, the limit distributions are similar and have one-to-one relationship with the family in (1).

• necessary and sufficient conditions for each of the 3 possibilities are known and are very technical. They depend on the right-tail thickness of F .

• For many insurance applications, the right-tail is Pareto like resulting in limiting Fr´echet distribution.

• Simpler sufficient conditions are available for continuous distributions.

• There are parent cdfs for which no (nondegenerate) limit distribution exists for the max.

(25)

Formulas for Norming Constants For the Maximum Result 4. The location shift a_n and scale shift b_n are

(i) a_n = 0 and b_n = x_1/n if G = G₁

(ii)a_n = F⁻¹(1) and b_n = F⁻¹(1) − x_1/n if G = G₂ (iii) a_n = x_1/n and b_n = E(X − a_n|X > a_n) ≈ [nf (a_n)]⁻¹ if G = G₃

Domains For Common Distributions

Fr´echet: F has infinite upper limit.

Cauchy; Pareto; Burr; Stable with index < 2; Loggamma.

Weibull: F Bounded Above.

Uniform; Power law (upper end); Beta.

Gumbel: Bounded or unbounded F .

Exponential; Weibull; Gamma; Normal; Benktander-type I and type II.

(26)

An Example: Pareto Distribution

Suppose the cdf is

F (x) = 1 − x^−α, x ≥ 1, α > 0.

Then

Pr(X_(n) ≤ n^1/αx) ≈ exp{−x^−α} for all x > 0.

Note: The α of the limit distribution of the maximum is the α of the parent distribution representing the tail thickness.

Another Example Suppose the cdf is

F (x) = 1 − (log(x))⁻¹, x ≥ e.

Then you cannot normalize X_(n) so that we get a nondegenerate limit distribution.

(27)

2 Generalized Extreme-Value Distribution The G_k in (1) are members of the family of generalized extreme-value (GEV) distributions

G_ξ(y; µ, σ) =









 exp

h

−(1 + ξ^y−µ_σ )⁻¹^ξ i

, 1 + ξ^y−µ_σ > 0, ξ 6= 0, exp¡

−e^{−[(y−µ)/σ]}¢

, −∞ < y < ∞, ξ = 0.

(2) ξ is the shape, µ is the location and σ is the scale pa- rameter.

Connections

G₁(y; α) when ξ > 0 , σ = 1, µξ = 1 and α = 1/ξ, G₂(y; α) when ξ < 0, σ = 1, and α = −1/ξ

G₃(y) when ξ = 0, σ = 1 and µ = 0.

(28)

Moments of GEV

Take µ = 0, σ = 1.

When ξ 6= 0,

Mean = 1

ξ [Γ(1 − ξ) − 1] , ξ < 1;

Variance = 1 ξ²

£Γ(1 − 2ξ) − Γ²(1 − ξ)¤

, ξ < 1/2;

Mode = 1 ξ

£(1 + ξ)^−ξ − 1¤ .

When ξ = 0, we have the Gumbel distribution and Mean = γ = 0.5772.. = Euler’s constant;

Variance = π²/6; Mode = 0.

Quantile Function of GEV

Take µ = 0, σ = 1.

When ξ 6= 0,

G⁻¹_ξ (q) = 1 ξ

©[− log(q)]^−ξ − 1ª

For the Gumbel cdf, G⁻¹₀ (q) = − log[− log(q)].

(29)

3 Limit Distributions of the top Extremes kth Order Statistic

Result 5.If F (x) is such that (X_(n) − a_n)/b_n has a limiting cdf G, then, for a fixed k, the limiting cdf of (X_(n−k+1) − a_n)/b_n is of the form:

G^(k)(x) = G(x) Xk−1

j=0

[− log G(x)]^j

j! .

Notes

• The same G and the same set of norming constants work.

• The form of G^(k)(x) has a Poisson partial sum form.

Pareto Example Contd.

Suppose the cdf is

F (x) = 1 − x^−α, x ≥ 1, α > 0.

Then

Pr(X_(n−k+1) ≤ n^1/αx) ≈ exp{−x^−α} Xk−1

j=0

x^−jα j!

for all x > 0.

(30)

Limiting Joint Distribution of the top kth Order Statistics

Result 6. If (X_(n) − a_n)/b_n has limiting cdf G and g is the pdf of G then the k-dimensional vector

µX_(n) − a_n

b_n , . . . , X_(n−k+1) − a_n b_n

¶

(3) has a limit distribution with joint pdf

g_(1,...,k)(w₁, . . . , w_k) = G(w_k) Yk

i=1

g(w_i)

G(w_i), w₁ > · · · > w_k. (4) Pareto Example Contd.

Suppose the cdf is

F (x) = 1 − x^−α, x ≥ 1, α > 0.

Then ³

X_(n)/n^1/α, . . . , X_(n−k+1)/n^1/α

´ has limiting joint pdf

g_(1,...,k)(w₁, . . . , w_k) = exp{−w^−α_k } Yk

i=1

α w^α+1_i , for all w₁ > · · · > w_k > 0.

(31)

Notes

• Result 6 is useful for making inference when only the top k observations are available.

• The joint density in (4) will involve the location, scale and shape parameters associated with the General- ized Extreme Value Distribution of interest.

• Methods of Maximum Likelihood or Moments for the Estimation of the parameters will involve this joint density.

Discrete Parent Populations

While the results hold, no familiar discrete distribution is in the domain of maximal attraction (or minimal attraction).

(32)

4 Limiting Distribution of the Sample Mini- mum

Result similar to Result 3 holds for the asymptotic dis- tribution of the standardized minimum, (X₍₁₎ − a^∗_n)/b^∗_n.

Result 7.The three possible limiting cdfs for the minimum are as follows:

G^∗₁(x; α) = 1 − exp£

−(−x)^−α¤

x ≤ 0, α > 0,

= 1 x > 0;

G^∗₂(x; α) = 0 x ≤ 0, α > 0, (5)

= 1 − exp(−x^α) x > 0;

G^∗₃(x) = 1 − exp (−e^x) − ∞ < x < ∞.

Clearly, G^∗_i(x) = 1 − G_i(−x), i = 1, 2, 3. where G_i is given by (1).

Notes

• Here G^∗₂(x; α) is the real Weibull distribution!

• Norming constants can be obtained using Result 4 with appropriate modifications.

• Instead of dealing with the sample minimum from X one can see it as the negative of the maximum from

(33)

−X. Thus need to know only the upper extremes well.

Pareto Example Contd.

Suppose the cdf is

F (x) = 1 − x^−α, x ≥ 1, α > 0.

Then

P (X₍₁₎ > x) = [1 − F (x)]ⁿ = x^−nα, x ≥ 1.

So

P {n(X₍₁₎ − 1) > y} = [1 + y

n]^−nα, y ≥ 0.

This approaches exp(−y^α) as n increases. So the limiting cdf is G^∗₂(x; α).

Joint Limiting Distribution of Max and Min.

Any “lower” extreme X_(r) is asymptotically indepen- dent of any “upper” extreme X_(n+1−k). This result is very useful for finding the limiting distributions of statistics such as the range and the midrange.

(34)

5 Distributions of Exceedances and the Gen- eralized Pareto Distribution (GPD)

When F is unbounded to the right, there is an inter- esting connection between the limit distribution for X_(n) for large n and the limit behavior as t → ∞ of stan- dardized excess life (X − a(t))/b(t), conditioned on the event {X > t}.

Balkema and de Haan (1974), and Pickands (1975) characterize the family of limiting distributions for the excess life.

Result 8. For a nondegenerate cdf H(x) if

t→∞lim Pr

½X − a(t)

b(t) > x¯

¯X > t

¾

= 1 − H(x) then H(x) is a Pareto-type cdf having the form

H_ξ(x) =









1 − (1 + ξx)⁻¹^ξ, x ≥ 0, if ξ > 0 1 − e^−x, x ≥ 0, if ξ = 0

(6)

if and only if Pr

µX_(n) − a_n

b_n ≤ x

¶

→ G_ξ(x)

where G_ξ is the GEV cdf given in (2) with the same shape parameter ξ ≥ 0.

(35)

Moments of GPD When ξ 6= 0,

Mean = 1

1 − ξ, ξ < 1;

Variance = £

(1 − 2ξ)(1 − ξ)²¤₋₁

, ξ < 1/2;

Mode = 1 ξ

£(1 + ξ)^−ξ − 1¤ .

Mean Excess Function = E(X−t|X > t) = 1 + ξt

1 − ξ , ξ < 1.

When ξ = 0, we have the standard exponential distri- bution and

Mean = Standard Deviation = 1 = Mean Excess Function.

Quantile Function of GPD When ξ 6= 0,

H_ξ⁻¹(q) = 1 ξ

©(1 − q)^−ξ − 1ª .

For the Exponential cdf, H₀⁻¹(q) = − log(1 − q).

Note

In addition to the shape parameter ξ the GPD has one more parameter, namely the scale parameter σ.

(36)

Hour 3: INFERENCE FOR EXTREME VALUE MODELS

1 Generalized Extreme Value Distribution Let Y be a random variable having a generalized extreme- value (GEV) distribution with shape parameter ξ, loca- tion parameter µ and scale parameter σ. The cdf is (see (2))

G_ξ(y; µ, σ) =









 exp

h

−(1 + ξ^y−µ_σ )⁻¹^ξ i

, 1 + ξ^y−µ_σ > 0, ξ 6= 0, exp¡

−e^{−[(y−µ)/σ]}¢

, −∞ < y < ∞, ξ = 0.

The pdf is

g_ξ(y; µ, σ) =











1

σG_ξ(y; µ, σ)(1 + ξ^y−µ_σ )⁻¹^ξ,

ξ(y − µ) > −σ, ξ 6= 0,

1

σ exp¡

−e^{−[(y−µ)/σ]}¢

e^{−[(y−µ)/σ]},

−∞ < y < ∞, ξ = 0.

(7)

(37)

1.1 Maximum Likelihood Method

• For a given data (Y₁, . . . , Y_m) whose joint likelihood L (i.e., probability density function) is known but for an unknown parameter θ, we choose the parameter value such that the likelihood L is maximized.

• It is generally chosen by looking at the first deriva- tives with respect to θ (if θ represents a multiparam- eter) of the logarithm of the likelihood (log L) and equating to 0 and solving for θ. That is,

∂ log L

∂θ = 0

is solved and the solutions are checked to see which one corresponds to the largest possible value for the likelihood L.

• This general recipe produces estimates called Maxi- mum Likelihood Estimates (MLE) (say ˆθ) that have good large-sample properties under “regularity conditions”.

• One condition is that the range of the data is free of the unknown parameter we are trying to estimate (GEV with ξ 6= 0 is a “nonregular situation”.)

• In particular, if m is the sample size and m is large,

(38)

generally, √

m(ˆθ − θ) ≈ N(0, σ_θ²)

where σ_θ² is related to the average Fisher information in the sample.

• This variance σ_θ² can be estimated by the reciprocal of

− 1 m

∂²log L

∂θ²

¯¯

θ=ˆθ

Example. If we have a random sample of size m from an exponential distribution with pdf

f (y; θ) = 1

θ exp{−y/θ}, y ≥ 0, the log-likelihood is given by

log L(θ) = −m log(θ) − 1 θ

Xm i=1

y_i.

Differentiating this with respect to θ and equating to 0 we get only one solution namely,

θ = y,ˆ

the sample mean. Var(Y ) = θ² and this is the reciprocal of the Fisher information per observation. We know (central limit theorem) that

√m(ˆθ − θ) ≈ N(0, θ²).

(39)

2 Blocked Maxima Approach

Suppose we have a LARGE sample of size n from a dis- tribution with cdf F whose sample maximum is in the domain of maximal attraction. We are interested in the large percentiles and upper tails of F .

Suppose we can divide the total sample into m nonover- lapping blocks each of size r where r itself is big (say over 100). Note that n = m · r. Let Y₁ be the largest in the first block,..., Y_m be the largest of the mth block. Then Y₁, . . . , Y_m behaves like a random sample from the GEV distribution with pdf g(y) given by (7).

Note: If we know that F is in the domain of attraction of Gumbel, then ξ = 0 and hence we will have one less parameter and we will satisfy the regularity conditions for nice properties for the MLE to hold. We will consider it first.

Gumbel Distribution

From (7) we can write the joint pdf of Y₁, . . . , Y_m. The log-likelihood, log L(µ, σ) will be

−m log σ − Xm

i=1

µy_i − µ σ

¶

− Xm

i=1

exp

½

−

µy_i − µ σ

¶¾ .

(40)

We differentiate this with respect to µ and σ and ob- tain two equations. Solving iteratively using numerical techniques, we obtain the MLE (ˆµ, ˆσ). If m is large, the vector is approximately normally distributed with mean (µ, σ) and covariance matrix

1 m

6σ² π²

Ãπ²

6 + (1 − γ)² (1 − γ)

(1 − γ) 1

!

where γ is the Euler’s constant 0.5772... This can be used to provide confidence intervals for the parameter estimates.

GEV Distribution

The log-likelihood for the GEV parameters is given by

−m log σ −(1 + ¹_ξ)P_m

i=1log£

1 + ξ¡_y

i−µ σ

¢¤

−P_m

i=1

£1 + ξ ¡_y

i−µ σ

¢¤_−1/ξ ,

provided ξ(y_i − µ) > −σ for i = 1, . . . , m. Since the range depends on the unknown parameters, the MLEs may not have the usual nice properties.

(41)

Smith (1985)

• When ξ > −0.5, MLEs have the usual properties.

• −1 < ξ ≤ −0.5: MLEs are generally obtainable but do not have the standard asymptotic properties.

• ξ < −1: MLEs are unlikely to be obtainable from numerical techniques- likelihood is too bumpy.

• If ξ ≤ −0.5, we have a short bounded upper tail for F - not encountered; Fr´echet has ξ > 0.

• ( ˆξ, ˆµ, ˆσ) is asymptotically normal with mean (ξ, µ, σ) and the covariance matrix can be approximated using the inverse of the observed Fisher information matrix.

Large Quantile Estimation

Let z_p be the upper pth quantile of the GEV. Then its estimate is given by

ˆ z_p =







 ˆ µ − ^ˆ^σ_ˆ

ξ(1 − y_p^ξ^ˆ), if ˆξ 6= 0 ˆ

µ − ˆσ log(y_p), if ξ = 0 (Gumbel),

where y_p = − log(1 − p) is the upper pth quantile of the standard exponential population. Confidence inter- val for this percentile estimate can be computed using

(42)

the covariance matrix of the MLEs of the individual parameters and linear approximations.

What we need are the estimates of the large quantiles of F . Suppose

F (x_p₀) = 1−p₀ or F^r(x_p₀) = Pr(X_(r) ≤ x_p₀) = (1−p₀)^r. Then with p = 1 − (1 − p₀)^r we have

ˆ

x_p₀ = ˆz_p

given above. For example, suppose we need to estimate 99.5th (= 1 − p₀) percentile of the parent population and we have used blocks of size 100. Then p = 1 − (0.995)¹⁰⁰ ≈ 0.39 and we use this percentile estimate ˆz_.39.

Tail Probability Estimation

Pr(X > c) = 1 − F (c)

= 1 − [F^r(c)]^1/r

≈ 1 − [G_ξ(c; µ, σ)]^1/r.

Use the estimates of ξ, µ, and σ in the above formula to estimate the tail probability for the population.

(43)

3 Inference Using Top k Order Statistics Suppose n is large, but not large enough to make blocks of subsamples of substantial size. One can think of using the top k order statistics. If k is small when compared to n, one can use the joint pdf given in Result 6, namely,

g(y_k) Yk

i=1

g(y_i)

G(y_i), y₁ > · · · > y_k

where g and G correspond to the GEV distribution. Thus, the likelihood will be

exp (

−

·

1 + ξ(y_i − µ σ )

¸₋¹

ξ

) _k Y

i=1

1 σ

·

1 + ξ(y_i − µ σ )

¸₋¹

ξ−1

, where ξ(y_i − µ) > −σ, i = 1, . . . , k. Using this like- lihood, one can obtain the MLEs of ξ, µ, σ. These esti- mates can be used to estimate the high percentiles of F by using p = 1 − (1 − p₀)ⁿ and the ˆz_p given earlier.

Remarks

• If we have data of top k order statistics from several blocks, say m, then we use the product of these like- lihoods and determine the MLEs using the combined likelihood.

(44)

• How large k can we take? A tricky question. Larger k means decreased standard error of the estimators, but increased bias as we move away from G towards F . Typically one increases k and looks at a plot of the estimates of the parameters; when they stabilize, one stops.

Quick Estimators of ξ

(a) Hill’s (1975) Estimator

• Assume ξ > 0 (Fr´echet type). This means upper limit of F is infinite.

ξˆ_H = 1 k

Xn i=n−k+1

log(X_(i)) − log(X_(n−k)).

• This is the mean excess function for the log(X) val- ues! log(X) values are defined for X > 0 and this is the case for the upper extremes from large samples.

• The Hill estimator is also asymptotically normal.

• This is the MLE of ξ assuming µ = 0 and σ = 1.

The MLE of α = 1/ ˆξ.

• Choice of k is determined by the examination of the plot of (k, ˆξ_H(k)).

(45)

(b) Pickands’ Estimator

• Works for any ξ real.

ξˆ_P = 1

log 2log

½ X_(n−k+1) − X_(n−2k+1) X_(n−2k+1) − X_(n−4k+1)

¾ .

• Here k is large but k/n is small.

• Choice of k is determined by the examination of the plot of (k, ˆξ_P(k)).

Remark

There are other estimators that are refinements of Hill’s and Pickands’. They have explicit forms and hence com- putation is easy, but there are no clear preferences.

(46)

4 Generalized Pareto Distribution – Estimates Based on Exceedances

Let t be the threshold and we use only the X_i that exceed t. Then, for t large, the exceedances Y ^∗ = X −t have cdfs that can be approximated by a Pareto-type cdf having the form

H_ξ(y; σ) =









1 − (1 + ξ^y_σ)⁻¹^ξ, y > 0; ξy > −σ, if ξ 6= 0 1 − e^−y/σ, y ≥ 0, if ξ = 0

(8) If there are k exceedances above t, say y₁^∗, . . . , y_k^∗, the MLEs of ξ and σ are obtained by using the log-likelihood

−k log(σ) − (1 + 1 ξ)

Xk i=1

log(1 + ξy_i^∗ k )

and for the case ξ = 0 we have the MLE based on the exponential distribution.

• The variance covariance matrix of ( ˆξ, ˆσ) is approxi- mated by using the inverse of the sample Fisher information matrix.

• Let z_p be the upper pth percentile of H_ξ(y; σ). Then an estimate of the upper pth percentile of F is given

(47)

by

ˆ

x_p = t + σˆ ξˆ

(µPr{X > t}

p

¶ξˆ

− 1 )

where we assume Pr{X > t} > p and Pr{X > t} is estimated by the sample proportion exceeding t.

• Choice of the threshold t - how close the exceedances are modeled by the Generalized Pareto distribution.

• Such data are called Peaks Over Threshold (POT) data.

(48)

5 General Diagnostic Plots

5.1 QQ plots

Plots of Sample Order Statistics against the Quantiles of the fitted/hypothesized distribution. Generally

¡X_(i), F⁻¹(_n+1ⁱ )¢

for i = 1, . . . , n, is plotted. Sometimes (i − ¹₂)/n is used in place of _n+1ⁱ . Linear trend indicates good fit.

5.2 Mean Excess Function Plots

Plots of (k, e_n(k)), k = 1, 2, ... that tell us how close the data agrees with the assumed distribution. Here e_n(k) is the sample mean excess function

e_n(k) = 1 k

Xn i=n−k+1

X_(i) − X_(n−k) = D_k − X_(n−k).

5.3 Plots for Choosing the top k Order Statistics

Plots of (k, ˆθ(k)) where ˆθ(k) is the estimate of the param- eter θ based on top k order statistics; it is accompanied by standard errors of the estimate. Used to examine the trade-off characteristics between bias and variance, and choose k for making inference.

(49)

6 General Remarks

1. The distribution of the maximum of a Poisson number of IID excesses over a high threshold is a GEV.

(Crucial for stop-loss treaties.)

2. Relaxing Model Assumptions- Adjusting for Trend.

3. Stationarity with mild dependence structures.

4. Non-identical but independent, with no dominating distribution.

5. Linear processes (ARMA etc.)

(50)

7 Examples

Extreme Sea Levels in Venice Data Largest annual sea levels from 1931-1981.

a) Maxima of blocks of n = 365 days.

b) Scatterplot and Adjusting for trend.

c) Fitting GEV.

d) Estimating large percentile for 2005.

Reference: Ferreiras article in Reiss and Thomas (1997);

XTREMES software.

(51)

8 Computational/Software Resources

• XTREMES.

A free-standing software that comes with Reiss and Thomas’ book (academic edition- with limited data capacity). Professional edition is also available. Web- site:

http://www.xtremes.math.uni-siegen.de/xtremes/

• Alexander McNeil’s Website:

http://www.math.ethz.ch/ mcneil/software.html Has free software attachment that works with S-PLUS.

• http://www.maths.bris.ac.uik/ masgc/ismev/summary.html Associated with Coles’ book.

(52)

9 References

1. Arnold B. C., Balakrishnan N., and Nagaraja, H. N.

(1992). A First Course in Order Statistics. John Wiley, New York.

An elementary introduction to Order Statistics.

2. Balkema, A. A. and de Haan, L. (1974). Residual lifetime at great age. Annals of Probability 2, 792- 804.

3. Cebri´an, A. C., Denuit, M., and Lambert, P. (2003).

Generalized Pareto fit to the Society of Actuaries’

Large Claims Database. North American Actuarial Journal 7, No. 3, 18-36.

Shows how the GPD model outperforms parametric counter parts for this data set.

4. Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Verlag, Lon- don.

A truly enjoyable applied introduction to extremes;

an excellent starting point. Has S-Plus codes and help to run them. Has two data examples from finance.

5. Craighead, S. (2004). Personal communication - Cap- ital Requirements Data.

(53)

6. David H. A. and Nagaraja, H. N. (2003). Order Statistics, Third Edition. John Wiley, New York.

An encyclopedic reference - a dense treatment.

7. Embrechts P., Kl¨uppelberg C., and Mikosch, T. (1997).

Modelling Extremal Events for Insurance and Fi- nance. Springer Verlag, Berlin.

A compendium of theory and applications - requires a strong mathematical background; has examples from Insurance and Finance.

8. Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163-1174.

9. McNeil, A. J. Webpage: http://www.math.ethz.ch/ mcneil/

An excellent resource for software, datasets, and pa- pers.

10. McNeil, A. J. (1996). Estimating the tails of loss severity distributions using extreme value theory. (his webpage)

11. Nagaraja, H. N., Choudhary, P.K., and Matalas, N.

(2002). Number of records in a bivariate sample with application to Missouri river flood data. Methodol-

(54)

ogy and Computing in Applied Probability 4, 377- 392.

12. Pickands, J. (1975). Statistical inference using ex- treme order statistics. Annals of Statistics 3, 119- 131.

13. Reiss R.-D. and Thomas, M. (1997). Statistical Anal- ysis of Extreme Values. Birkh¨auser Verlag, Basel, Switzerland.

XTREMES software comes with the book.

14. SAS JMP software, Version 5.1. (2003). SAS Insti- tute, Cary NC.

15. Smith, R. L. (1985). Maximum likelihood estimation in a class of non-regular cases. Biometrika 72, 67- 90.