A Reliability-based Selective Index for Regional Flood Frequency Analysis Methods


Gwo-Fong Lin* and Lu-Hsien Chen

Department of Civil Engineering, National Taiwan University, Taipei 10617, Taiwan

Abstract:

In this paper, a new index is proposed for the selection of the best regional frequency analysis method. First, based on the theory of reliability, the new selective index is developed. The variances of three regional T-year event estimators are then derived. The proposed methodology is applied to an actual watershed. For each regional method, the reliability of various T-year regional estimates is computed. Finally, the reliability-based selective index graph is constructed from which the best regional method can be determined. In addition, the selection result is compared with that based on the traditional index, root mean square error. The proposed new index is recommended as an alternative to the existing indices such as root mean square error, because the influence of uncertainty and the accuracy of estimates are considered. Copyright 2003 John Wiley & Sons, Ltd.

KEY WORDS regional flood frequency analysis; reliability; selective index

INTRODUCTION

Regional flood frequency analysis is often needed to estimate T-year event magnitude at sites without observed data. Several regional flood frequency analysis methods have been proposed. How to choose the most suitable or the best regional method for an ungauged site is an important task. For the selection, usually one compares the regional and at-site estimates. For such a purpose, the most common measure of the performance of the regional estimates is the root mean square error (Wiltshire, 1986; Burn, 1988, 1990; Cunnane, 1988; Ribeiro-Correa et al., 1995; GREHYS, 1996, 1997). The root mean square error (RMSE) of the T-year event estimate is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_s}\sum_{i=1}^{N_s}\left(\frac{\hat{q}^{i}_{R,T}-\hat{q}^{i}_{AS,T}}{\hat{q}^{i}_{AS,T}}\right)^{2}} \tag{1}$$

where $N_s$ is the number of stations, and $\hat{q}^{i}_{AS,T}$ and $\hat{q}^{i}_{R,T}$ are respectively the at-site and regional T-year event estimates of Station i. The regional method that yields the smallest value of root mean square error is the best method.
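As an illustration of Equation (1), the following minimal sketch computes the relative RMSE from paired regional and at-site estimates. The function name `relative_rmse` and the three-station flows are hypothetical and are not taken from the paper.

```python
import math

def relative_rmse(regional, at_site):
    """Equation (1): root mean square of the relative differences."""
    if len(regional) != len(at_site) or len(regional) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [((r - a) / a) ** 2 for r, a in zip(regional, at_site)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical 100-year estimates (m^3/s) at three stations.
regional_q = [1100.0, 450.0, 2000.0]
at_site_q = [1000.0, 500.0, 2000.0]
print(round(relative_rmse(regional_q, at_site_q), 4))
```

Because each difference is divided by the at-site estimate, stations with large and small floods contribute on the same relative scale.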

An important issue facing hydrologists in frequency analysis, whether at-site or regional, is natural uncertainty together with model and parameter uncertainty. In the selection of a regional method, the influence of uncertainty has so far been neglected. This limitation has prompted a search for an improved approach to selecting the most suitable regional method. The objective of this paper is to find a new index that is based on reliability. A reliability-based selective index graph is constructed from which the best regional method can be determined. The advantages of the proposed methodology are demonstrated through an actual application.

* Correspondence to: Gwo-Fong Lin, Department of Civil Engineering, National Taiwan University, Taipei 10617, Taiwan. E-mail: [email protected]


A RELIABILITY-BASED SELECTIVE INDEX

When performing a regional analysis, one expects that the regional T-year estimate be as close to the true T-year event as possible for an ungauged site. In addition, one desires that the regional estimate will not underestimate the true event. The probability that the regional estimate will not underestimate the true event is:

$$r = \mathrm{Prob}[q_R \ge q_{AS}] \tag{2}$$

where $q_R$ is the regional estimator and $q_{AS}$ is the at-site estimator for a particular return period T (which is regarded as true relative to the regional estimate). Equation (2) can be further written as:

$$r = \mathrm{Prob}[SM \ge 0] \tag{3}$$

where SM is the safety margin ($SM = q_R - q_{AS}$). The mean of SM is given by:

$$\mu_{SM} = E[q_R - q_{AS}] = \mu_{q_R} - \mu_{q_{AS}} \tag{4}$$

where $\mu_{q_R}$ is the mean of the regional estimator and $\mu_{q_{AS}}$ is the mean of the at-site estimator. Furthermore, the variance of SM can be obtained as:

$$\sigma^2_{SM} = \sigma^2_{q_R} + \sigma^2_{q_{AS}} - 2\,\mathrm{Cov}[q_R, q_{AS}] \tag{5}$$

where $\sigma^2_{q_R}$ is the variance of the regional estimator, $\sigma^2_{q_{AS}}$ is the variance of the at-site estimator, and $\mathrm{Cov}[q_R, q_{AS}]$ is the covariance of the regional and at-site estimators. If the regional and at-site estimators are independent, then

$$\sigma^2_{SM} = \sigma^2_{q_R} + \sigma^2_{q_{AS}} \tag{6}$$

The assumption of a normal distribution for SM is quite satisfactory when $r \le 0.99$ (Ang, 1973). If SM is assumed to be normally distributed, then the standardized random variable $U = (SM - \mu_{SM})/\sigma_{SM}$ is also normally distributed. Hence, Equation (3) becomes:

$$r = \mathrm{Prob}[U \ge -\mu_{SM}/\sigma_{SM}] = F_U(\mu_{SM}/\sigma_{SM}) \tag{7}$$

where U is the standardized normal random variable and $F_U(u)$ is the cumulative distribution function of the standardized random variable U.
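The reliability in Equation (7) needs only the means, variances and, optionally, the covariance of the two estimators. A minimal sketch follows, assuming the normal approximation for SM; the function name `reliability` and the example moments are hypothetical.

```python
from statistics import NormalDist

def reliability(mean_r, var_r, mean_as, var_as, cov=0.0):
    """Equation (7): r = F_U(mu_SM / sigma_SM) with SM = q_R - q_AS assumed normal."""
    mu_sm = mean_r - mean_as                   # Equation (4)
    var_sm = var_r + var_as - 2.0 * cov        # Equation (5); cov = 0 gives Equation (6)
    if var_sm <= 0.0:
        raise ValueError("variance of the safety margin must be positive")
    return NormalDist().cdf(mu_sm / var_sm ** 0.5)

# Hypothetical moments: the regional estimator is 10 units above the at-site one.
print(round(reliability(110.0, 50.0, 100.0, 50.0), 4))
```

When the two means coincide the reliability is 0.5 whatever the variances, which is why values near 0.5 indicate an essentially unbiased regional estimate.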

Statistical estimates are often presented with an interval. The concept of interval estimates is adopted herein for the development of the new selective index. The new selective index $\omega_\beta$ is defined as the probability of the regional estimate lying in the interval between $(1-\beta)q_{AS}$ and $(1+\beta)q_{AS}$. In other words, $\omega_\beta$ is the probability of the regional estimate lying within $\beta q_{AS}$ of the true event magnitude $q_{AS}$. Mathematically, the new selective index $\omega_\beta$ is written as:

$$\omega_\beta = r_{1-\beta} - r_{1+\beta} \tag{8}$$

where

$$r_{1\pm\beta} = \mathrm{Prob}[q_R \ge (1 \pm \beta)\,q_{AS}] \tag{9}$$

From Equations (8) and (9), one can see that $\omega_\beta$ increases as the value of $\beta$ increases. When performing the regional frequency analysis, one can construct the relationship of $\omega_\beta$ and $\beta$ for each regional method. The graph containing the curves of $\omega_\beta$ versus $\beta$ for various methods is called the 'selective index graph' herein. For a fixed value of $\beta$, the method with a larger value of $\omega_\beta$ is a better method according to the definition of the selective index $\omega_\beta$ (see Equation (8)). That is, the best method has the largest probability that the regional estimate lies within $\beta q_{AS}$ of the true event magnitude.
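The selective index of Equations (8) and (9) can be sketched in the same way. The code below assumes that the margin $q_R - c\,q_{AS}$, with $c = 1 \pm \beta$, is normally distributed with mean $\mu_{q_R} - c\,\mu_{q_{AS}}$ and variance $\sigma^2_{q_R} + c^2\sigma^2_{q_{AS}} - 2c\,\mathrm{Cov}[q_R, q_{AS}]$. This is one plausible reading rather than a procedure stated in the text, and the function names and moments are hypothetical.

```python
from statistics import NormalDist

def prob_not_below(c, mean_r, var_r, mean_as, var_as, cov=0.0):
    """Prob[q_R >= c * q_AS], treating q_R - c * q_AS as normal (an assumption)."""
    mean_margin = mean_r - c * mean_as
    var_margin = var_r + c * c * var_as - 2.0 * c * cov
    return NormalDist().cdf(mean_margin / var_margin ** 0.5)

def selective_index(beta, mean_r, var_r, mean_as, var_as, cov=0.0):
    """Equation (8): omega_beta = r_(1-beta) - r_(1+beta)."""
    lower = prob_not_below(1.0 - beta, mean_r, var_r, mean_as, var_as, cov)
    upper = prob_not_below(1.0 + beta, mean_r, var_r, mean_as, var_as, cov)
    return lower - upper

# Hypothetical moments: regional estimator 950 +/- 150, at-site estimator 1000 +/- 100.
curve = [selective_index(b / 10.0, 950.0, 150.0 ** 2, 1000.0, 100.0 ** 2)
         for b in range(1, 11)]
print([round(w, 3) for w in curve])
```

Evaluating the function over a grid of $\beta$ values for each regional method gives the curves of a selective index graph.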

VARIANCE OF REGIONAL ESTIMATORS

As shown in the previous section, we need to know the means and variances of both the regional and at-site estimators. The means and variances of various at-site estimators are already available. In this section, we try to derive the variances of the regional estimators corresponding to three regional methods as follows.

The station-year method

For an ungauged site, the regional T-year event estimator, $\hat{q}_{R,T}$, is given by:

$$\hat{q}_{R,T} = \hat{x}_T\,\hat{\mu} \tag{10}$$

where $\hat{\mu}$ is the estimator of mean annual maximum flow for the ungauged site, and $\hat{x}_T$ is called the standardized regional T-year event estimator. The standardized annual maximum flow for Year j at Station i is defined as:

$$x_{ij} = q_{ij}/\bar{q}_i \tag{11}$$

where $\bar{q}_i$ is the mean annual maximum flow at Station i.

The generalized extreme value (GEV) distribution is adopted herein. The cumulative distribution function of the GEV distribution is:

$$F_X(x) = \exp\left\{-\left[1 - \frac{\kappa(x-\xi)}{\alpha}\right]^{1/\kappa}\right\} \tag{12}$$

where $\xi$ is the location parameter, $\alpha$ is the scale parameter and $\kappa$ is the shape parameter. Extreme value distributions Type 1, 2 and 3 correspond to $\kappa = 0$, $\kappa < 0$ and $\kappa > 0$, respectively. Equation (12) applies for $-\infty < x < \infty$, $\xi + \alpha/\kappa \le x < \infty$ and $-\infty < x \le \xi + \alpha/\kappa$ for extreme value distributions Type 1, 2 and 3, respectively. The at-site T-year event estimator, $\hat{q}_{AS,T}$, is expressed as:

$$\hat{q}_{AS,T} = \hat{\xi} + \frac{\hat{\alpha}}{\hat{\kappa}}\left\{1 - K_T^{\hat{\kappa}}\right\} \tag{13}$$

where

$$K_T = -\ln\left(1 - \frac{1}{T}\right) \tag{14}$$

The mean and approximate variance of $\hat{q}_{AS,T}$ can be obtained as (Lu and Stedinger, 1992):

$$E[\hat{q}_{AS,T}] = \xi + \frac{\alpha}{\kappa}\left[1 - \Gamma(1+\kappa)\right] \tag{15}$$

$$\mathrm{Var}[\hat{q}_{AS,T}] = \frac{\alpha^2 \exp\left[a_0(p) + a_1(p)\exp(-\kappa) + a_2(p)\kappa^2 + a_3(p)\kappa^3\right]}{n} \tag{16}$$

where p is the cumulative probability; n is the sample size; $a_0(p)$, $a_1(p)$, $a_2(p)$ and $a_3(p)$ are coefficients depending on p. Equation (16) provides good estimates of $\mathrm{Var}[\hat{q}_{AS,T}]$ for $n \ge 10$ and $|\kappa| \le 0.3$. One can refer to Lu and Stedinger (1992) for $a_0$, $a_1$, $a_2$ and $a_3$.
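The GEV quantile of Equations (13) and (14) is the inverse of the distribution function in Equation (12), and this can be checked numerically. The sketch below uses hypothetical parameter values and treats the Type 1 case as the limit of a vanishing shape parameter. The coefficients $a_0(p)$ to $a_3(p)$ of Equation (16) are not reproduced here, because they must be taken from Lu and Stedinger (1992).

```python
import math

def gev_cdf(x, xi, alpha, kappa):
    """Equation (12); the Gumbel form is used when kappa is zero."""
    if kappa == 0.0:
        return math.exp(-math.exp(-(x - xi) / alpha))
    z = 1.0 - kappa * (x - xi) / alpha
    if z <= 0.0:
        # outside the support: below the lower bound (Type 2) or above the upper bound (Type 3)
        return 0.0 if kappa < 0.0 else 1.0
    return math.exp(-z ** (1.0 / kappa))

def gev_quantile(T, xi, alpha, kappa):
    """Equations (13)-(14): T-year event of the GEV distribution."""
    k_t = -math.log(1.0 - 1.0 / T)
    if kappa == 0.0:
        return xi - alpha * math.log(k_t)       # Gumbel limit
    return xi + alpha / kappa * (1.0 - k_t ** kappa)

# Hypothetical parameters: xi = 500, alpha = 120, kappa = -0.1 (Type 2).
q100 = gev_quantile(100, 500.0, 120.0, -0.1)
print(round(q100, 1), round(gev_cdf(q100, 500.0, 120.0, -0.1), 4))
```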

Assuming $\hat{x}_T$ and $\hat{\mu}$ are independent, one can obtain the variance of the regional T-year event estimator, $\hat{q}_{R,T}$, as:

$$\mathrm{Var}[\hat{q}_{R,T}] = E[\hat{x}_T]^2\,\mathrm{Var}[\hat{\mu}] + E[\hat{\mu}]^2\,\mathrm{Var}[\hat{x}_T] \tag{17}$$

where $\hat{\mu}$ can be obtained from a regression model which is a function of mean daily flow, basin area, basin slope, etc. The model is written as:

$$M = BC \tag{18}$$

where

$$M = [\mu_1\ \mu_2\ \cdots\ \mu_m]^T \tag{19}$$

$$B = \begin{bmatrix} 1 & b_{11} & b_{12} & \cdots & b_{1n} \\ 1 & b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & b_{m1} & b_{m2} & \cdots & b_{mn} \end{bmatrix} \tag{20}$$

$$C = [c_0\ c_1\ \cdots\ c_n]^T \tag{21}$$

where $\mu_m$ is the mean annual maximum flow for Station m, $b_{mn}$ is the nth basin characteristic for Station m, and C is a column vector containing the regression coefficients. The variance of the estimate $\hat{M}$ is given by:

$$\mathrm{Var}[\hat{M}] = \sigma^2 B_0'(B'B)^{-1}B_0 \tag{22}$$

where $B_0$ is the condition vector and $\sigma^2$ is the error variance of the regression model. Once $E[\hat{x}_T]$, $\mathrm{Var}[\hat{x}_T]$ and $\mathrm{Var}[\hat{\mu}]$ are obtained from Equations (15), (16) and (22), the variance of the regional estimator can be determined from Equation (17).
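Equation (17) is the first-order variance of a product of two independent estimators. The exact variance of such a product contains one further term, $\mathrm{Var}[\hat{x}_T]\,\mathrm{Var}[\hat{\mu}]$, which is small when both coefficients of variation are small. The sketch below compares the two forms for hypothetical moments; the function names are illustrative only.

```python
def var_station_year(mean_x, var_x, mean_mu, var_mu):
    """Equation (17): first-order variance of q = x_T * mu, with x_T and mu independent."""
    return mean_x ** 2 * var_mu + mean_mu ** 2 * var_x

def var_product_exact(mean_x, var_x, mean_mu, var_mu):
    """Exact variance of a product of independent variables (adds Var[x] * Var[mu])."""
    return var_station_year(mean_x, var_x, mean_mu, var_mu) + var_x * var_mu

# Hypothetical moments: x_T = 2.0 +/- 0.2 and mu = 500 +/- 50 m^3/s.
first_order = var_station_year(2.0, 0.2 ** 2, 500.0, 50.0 ** 2)
exact = var_product_exact(2.0, 0.2 ** 2, 500.0, 50.0 ** 2)
print(first_order, exact)
```

For these values the neglected term is about 0.5% of the total, so the first-order form is a reasonable working approximation.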

The index-flood method

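The Gumbel distribution (extreme value distribution Type 1) is adopted herein. The at-site T-year event estimator can be written as:

$$\hat{q}_{AS,T} = \xi + \alpha y \tag{23}$$

where $\alpha$ and $\xi$ are parameters estimated using L-moments, and $y = -\ln[-\ln(1 - 1/T)]$ is the Gumbel reduced variate. In terms of the sample mean $\bar{x}$ and sample standard deviation s, the estimators of $\alpha$ and $\xi$ are written as:

$$\hat{\alpha} = 0.7797\,s \tag{24}$$

$$\hat{\xi} = \bar{x} - 0.5772\,\hat{\alpha} \tag{25}$$

The variance of the at-site T-year event estimator using L-moments is (Phien, 1987):

$$\mathrm{Var}[\hat{q}_{AS,T}] = \frac{\alpha^2\left[(1.1128 - 0.9066/n) + (0.4574 - 1.1722/n)\,y + (0.8046 - 0.1855/n)\,y^2\right]}{n-1} \tag{26}$$

where n is the number of years of record. As n increases, Equation (26) converges to:

$$\mathrm{Var}[\hat{q}_{AS,T}] = \frac{\alpha^2\left[1.1128 + 0.4574\,y + 0.8046\,y^2\right]}{n} \tag{27}$$

Equation (27) is the first-order result $\mathrm{Var}[\hat{\xi}] + 2y\,\mathrm{Cov}[\hat{\alpha}, \hat{\xi}] + y^2\,\mathrm{Var}[\hat{\alpha}]$ evaluated with the asymptotic moments of the estimators given in Equations (33)-(35).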
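A short numerical sketch of Equations (23), (26) and (27) is given below. It assumes that y is the reduced variate $-\ln[-\ln(1 - 1/T)]$, so that the cross term in y is positive, which is the sign implied by a positive covariance between the scale and location estimators. The station values are hypothetical.

```python
import math

def reduced_variate(T):
    """Gumbel reduced variate y = -ln[-ln(1 - 1/T)]."""
    return -math.log(-math.log(1.0 - 1.0 / T))

def gumbel_quantile(T, xi, alpha):
    """Equation (23)."""
    return xi + alpha * reduced_variate(T)

def var_quantile_finite(T, alpha, n):
    """Equation (26), with the cross term taken as positive (assumption, see lead-in)."""
    y = reduced_variate(T)
    bracket = ((1.1128 - 0.9066 / n)
               + (0.4574 - 1.1722 / n) * y
               + (0.8046 - 0.1855 / n) * y * y)
    return alpha ** 2 * bracket / (n - 1)

def var_quantile_asymptotic(T, alpha, n):
    """Equation (27)."""
    y = reduced_variate(T)
    return alpha ** 2 * (1.1128 + 0.4574 * y + 0.8046 * y * y) / n

# Hypothetical station: alpha = 100 m^3/s, 30 years of record, T = 100 years.
v26 = var_quantile_finite(100, 100.0, 30)
v27 = var_quantile_asymptotic(100, 100.0, 30)
print(round(abs(v26 - v27) / v27, 4))
```

The printed value is the relative difference between the finite-sample and asymptotic variances for this one case.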

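The difference between Equations (26) and (27) is less than 5% for n > 20. The regional T-year event estimator can be written as:

$$\hat{q}_{R,T} = \hat{\xi}\left(1 + y\,\frac{\tilde{\alpha}}{\tilde{\xi}}\right) \tag{28}$$

$$\frac{\tilde{\alpha}}{\tilde{\xi}} = \frac{1}{N}\sum_{i=1}^{M} n_i\,\frac{\hat{\alpha}_i}{\hat{\xi}_i} \tag{29}$$

where $n_i$ is the number of years of record for the ith station, N is the total years of record, and M is the number of stations.

The variance of the regional T-year event estimator is expressed as:

$$\mathrm{Var}[\hat{q}_{R,T}] = \left(1 + y\,\frac{\tilde{\alpha}}{\tilde{\xi}}\right)^2 \mathrm{Var}[\hat{\xi}] + 2\left(1 + y\,\frac{\tilde{\alpha}}{\tilde{\xi}}\right) y\,E[\hat{\xi}]\,\mathrm{Cov}\left[\hat{\xi}, \frac{\tilde{\alpha}}{\tilde{\xi}}\right] + \xi^2 y^2\,\mathrm{Var}\left[\frac{\tilde{\alpha}}{\tilde{\xi}}\right] \tag{30}$$

When $\hat{\xi}$ and $\tilde{\alpha}/\tilde{\xi}$ are independent, Equation (30) becomes:

$$\mathrm{Var}[\hat{q}_{R,T}] = \left(1 + y\,\frac{\tilde{\alpha}}{\tilde{\xi}}\right)^2 \mathrm{Var}[\hat{\xi}] + \xi^2 y^2\,\mathrm{Var}\left[\frac{\tilde{\alpha}}{\tilde{\xi}}\right] \tag{31}$$

The variance of the at-site $\hat{\alpha}/\hat{\xi}$ can be estimated using the first-order approximation:

$$\mathrm{Var}\left[\frac{\hat{\alpha}}{\hat{\xi}}\right] = \frac{1}{\xi^2}\,\mathrm{Var}[\hat{\alpha}] + \frac{\alpha^2}{\xi^4}\,\mathrm{Var}[\hat{\xi}] - \frac{2\alpha}{\xi^3}\,\mathrm{Cov}[\hat{\alpha}, \hat{\xi}] \tag{32}$$

One can refer to Lu and Stedinger (1992) for $\mathrm{Var}[\hat{\alpha}]$, $\mathrm{Var}[\hat{\xi}]$ and $\mathrm{Cov}[\hat{\alpha}, \hat{\xi}]$:

$$\mathrm{Var}[\hat{\alpha}] = \frac{0.8046\,\alpha^2}{n} \tag{33}$$

$$\mathrm{Var}[\hat{\xi}] = \frac{1.1128\,\alpha^2}{n} \tag{34}$$

$$\mathrm{Cov}[\hat{\alpha}, \hat{\xi}] = \frac{0.2287\,\alpha^2}{n} \tag{35}$$

Substituting Equations (33)-(35) into Equation (32) gives $\mathrm{Var}[\hat{\alpha}/\hat{\xi}]$, which is then substituted into Equation (29) to yield:

$$\mathrm{Var}\left[\frac{\tilde{\alpha}}{\tilde{\xi}}\right] = \frac{1}{N^2}\sum_{i=1}^{M} n_i\left(\frac{\hat{\alpha}_i}{\hat{\xi}_i}\right)^2\left[0.8046 - 0.4574\,\frac{\hat{\alpha}_i}{\hat{\xi}_i} + 1.1128\left(\frac{\hat{\alpha}_i}{\hat{\xi}_i}\right)^2\right] \tag{36}$$

Finally, substituting Equations (34) and (36) into Equation (31), one can obtain the variance of the regional T-year event estimator, $\mathrm{Var}[\hat{q}_{R,T}]$.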
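The chain from Equation (29) to Equation (36) and then Equation (31) can be written out as three small functions. The sketch assumes the independent case of Equation (31). The function names, record lengths and Gumbel parameters are hypothetical.

```python
def regional_ratio(n_years, alphas, xis):
    """Equation (29): record-length-weighted mean of alpha_i / xi_i."""
    total = sum(n_years)
    return sum(n * a / x for n, a, x in zip(n_years, alphas, xis)) / total

def var_regional_ratio(n_years, alphas, xis):
    """Equation (36): variance of the weighted mean ratio."""
    total = sum(n_years)
    acc = 0.0
    for n, a, x in zip(n_years, alphas, xis):
        c = a / x
        acc += n * c ** 2 * (0.8046 - 0.4574 * c + 1.1128 * c ** 2)
    return acc / total ** 2

def var_index_flood(y, xi_hat, var_xi_hat, ratio, var_ratio):
    """Equation (31): xi_hat and the regional ratio treated as independent."""
    return (1.0 + y * ratio) ** 2 * var_xi_hat + xi_hat ** 2 * y ** 2 * var_ratio

# Hypothetical region of three gauged stations.
n_years = [25, 40, 32]
alphas = [90.0, 150.0, 60.0]
xis = [400.0, 700.0, 250.0]
ratio = regional_ratio(n_years, alphas, xis)
print(round(ratio, 4), var_regional_ratio(n_years, alphas, xis))
```

With a single station, `var_regional_ratio` reduces to Equation (32) evaluated with Equations (33)-(35), which gives a quick consistency check on the coefficients.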

The regional-regression model

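In this paper, the regional T-year event estimate is expressed as:

$$\hat{y}_T = \ln \hat{q}_{R,T} = \hat{\beta}_0 + \sum_{j=1}^{n} \hat{\beta}_j \ln B_{ij} \tag{37}$$

where $\hat{q}_{R,T}$ is the regional T-year event estimate, $B_{ij}$ is the jth basin characteristic (e.g., basin area, lake index, watershed slope, longitude and latitude) at Station i, and $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_n$ are regression coefficients. The regression coefficients can be obtained from the following model:

$$Y_T = X\beta + \varepsilon \tag{38}$$

where

$$Y_T = [y_{T,1}\ y_{T,2}\ \cdots\ y_{T,m}]^T \tag{39}$$

$$X = \begin{bmatrix} 1 & \ln B_{11} & \cdots & \ln B_{1n} \\ \vdots & \vdots & & \vdots \\ 1 & \ln B_{m1} & \cdots & \ln B_{mn} \end{bmatrix} \tag{40}$$

$$\beta = [\beta_0\ \beta_1\ \cdots\ \beta_n]^T \tag{41}$$

$$\varepsilon = [\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m]^T \tag{42}$$

and $y_{T,m}$ is the T-year logarithmic flow at Station m. The estimator of the parameter $\beta$ is given by:

$$\hat{\beta} = [X'X]^{-1}X'Y_T \tag{43}$$

If the errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m$ are independent, each with zero mean and variance $\sigma^2_\varepsilon$, the variance of the estimator $\hat{\beta}$ is:

$$\sigma^2_{\hat{\beta}} = \sigma^2_\varepsilon (X'X)^{-1} \tag{44}$$

Once the regression coefficients $\beta_0, \beta_1, \ldots, \beta_n$ are obtained, from Equation (37) the regional T-year event estimator is given by:

$$\hat{q}_{R,T} = \exp[\hat{y}_T] \tag{45}$$

and its variance is:

$$\mathrm{Var}[\hat{q}_{R,T}] = \exp[2\hat{y}_T]\,\mathrm{Var}[\hat{y}_T] \tag{46}$$

where $\hat{y}_T$ and $\mathrm{Var}[\hat{y}_T]$ can be obtained from:

$$\hat{Y}_T = X\hat{\beta} \tag{47}$$

$$\mathrm{Var}[\hat{Y}_T] = \sigma^2_\varepsilon X(X'X)^{-1}X' \tag{48}$$

Because $\sigma^2_\varepsilon$ is usually unknown, an unbiased estimate of $\sigma^2_\varepsilon$ is given by:

$$s^2 = \frac{\sum_{i=1}^{n_s}\left(y_{T,i} - \hat{y}_{T,i}\right)^2}{n_s - k - 1} \tag{49}$$

where $y_T$ and $\hat{y}_T$ are the T-year observed and estimated logarithmic flows, respectively, $n_s$ is the sample size and k is the number of independent variables, which is equal to the number of regression coefficients minus one.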
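For a single basin characteristic, the matrix expressions in Equations (43) to (49) reduce to the familiar simple-regression formulas. The sketch below is restricted to that one-predictor case, with basin area as the predictor, as an illustrative simplification. It evaluates the prediction variance at an arbitrary site through the quadratic form $x_0'(X'X)^{-1}x_0$ and then applies Equations (45) and (46). The data and function names are hypothetical.

```python
import math

def fit_log_regression(areas, flows):
    """Least-squares fit of ln(q) = b0 + b1 * ln(A) (Equation (43), one predictor)."""
    x = [math.log(a) for a in areas]
    y = [math.log(q) for q in flows]
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (m - 1 - 1)                      # Equation (49) with k = 1
    return {"b0": b0, "b1": b1, "s2": s2, "m": m, "x_bar": x_bar, "sxx": sxx}

def predict_flow(area, fit):
    """Equations (45)-(46) at a site with basin area `area`."""
    x0 = math.log(area)
    y_hat = fit["b0"] + fit["b1"] * x0
    # x0' (X'X)^-1 x0 written out for an intercept plus one predictor
    leverage = 1.0 / fit["m"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    var_y = fit["s2"] * leverage
    return math.exp(y_hat), math.exp(2.0 * y_hat) * var_y

# Hypothetical basin areas (km^2) and 100-year floods (m^3/s) at five stations.
fit = fit_log_regression([50.0, 120.0, 300.0, 650.0, 1100.0],
                         [210.0, 420.0, 900.0, 1600.0, 2500.0])
print(predict_flow(400.0, fit))
```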

APPLICATION AND DISCUSSION

In this paper, actual data of the Tanshui river basin in northern Taiwan are used. Locations of water-level stations are shown in Figure 1. The average slope of the Tanshui River is 1/45, and the area of the basin is 2726 km². There are 16 water-level stations in this basin.

Figure 1. Locations of water-level stations in the Tanshui river basin (• water-level station)

First, at-site (i.e., single-site) frequency analysis for each station is performed. The 2-, 5-, 10-, 25-, 50-, 100- and 200-year floods (called at-site estimates) and their corresponding variances are then found. In dealing with the regional frequency analysis, each time one of the 16 stations is assumed to be ungauged. Based on the data of the remaining 15 stations, the regional estimates (2-, 5-, 10-, 25-, 50-, 100- and 200-year floods) and their corresponding variances for the ungauged station are then determined. In total, we performed regional analysis 16 times for each regional method. Regarding the regression on the basin characteristics, we choose basin area, lake index, basin slope, longitude and latitude as the basin characteristics. However, we find that lake index, basin slope, longitude and latitude can be neglected according to the statistical t-test.

Once the means and variances of the at-site and regional estimators are found, the reliability of regional estimates can be determined from Equation (7). Table I shows the average reliability of regional estimates for the three regional methods based on only the observed data. We use the term ‘average’ because we perform the regional analysis 16 times. Hence, a value of reliability corresponding to a T-year event in Table I is the average of 16 values. Among three methods the index-flood method yields the highest values of reliability for various T-year estimates as shown in Table I. That is, the index-flood method has the lowest risk of underestimating a T-year event.
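The averaging over 16 analyses described above is a leave-one-out loop: each station is treated as ungauged in turn and a score is computed from the remaining stations. A generic sketch follows. The helper name `leave_one_out_average`, the `score` callback and the toy flood values are hypothetical stand-ins for the actual regional methods.

```python
def leave_one_out_average(stations, score):
    """Average score(target, others) with each station held out once."""
    stations = list(stations)
    results = []
    for i, target in enumerate(stations):
        others = stations[:i] + stations[i + 1:]   # the stations kept as gauged
        results.append(score(target, others))
    return sum(results) / len(results)

# Toy illustration: each "station" is a mean annual flood (hypothetical values) and
# the score is the relative error of predicting it by the mean of the other stations.
floods = [420.0, 510.0, 380.0, 600.0]

def relative_error(target, others):
    return (sum(others) / len(others) - target) / target

print(leave_one_out_average(floods, relative_error))
```

In the application, the score would be the reliability of Equation (7) computed for the held-out station.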

To go further into the assessment of the regional estimates, the technique of Monte Carlo simulation is applied to generate synthetic streamflows. We assume these models are correct and simulate from them using the parameters estimated. The average reliability of regional estimates for the three regional methods based on simulated data is summarized in Table II. The result in Table II is similar to that in Table I. Figure 2 compares the reliability of station-year estimates based on the observed and simulated data, respectively. In a like manner, the comparisons for the index-flood estimates and regional-regression estimates are shown in Figures 3 and 4. It is found that the results based on the observed data are in good agreement with those based on the simulated data. The comparison indicates that the observed data closely follow the model for the statistics we calculated.
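Synthetic annual maxima of the kind used in such a simulation can be generated by inverting the fitted distribution function. The sketch below assumes a Gumbel parent with hypothetical parameters and refits it with Equations (24) and (25). The text does not state which distribution, seed or record length was used, so those choices are illustrative.

```python
import math
import random
import statistics

def simulate_gumbel(xi, alpha, n_years, rng):
    """Draw annual maxima by inverting F(x) = exp(-exp(-(x - xi) / alpha))."""
    sample = []
    while len(sample) < n_years:
        u = rng.random()
        if u > 0.0:                              # avoid log(0)
            sample.append(xi - alpha * math.log(-math.log(u)))
    return sample

def fit_gumbel(sample):
    """Equations (24)-(25): alpha = 0.7797 s and xi = mean - 0.5772 alpha."""
    alpha = 0.7797 * statistics.stdev(sample)
    return statistics.mean(sample) - 0.5772 * alpha, alpha

rng = random.Random(2003)                        # fixed seed so the run is repeatable
synthetic = simulate_gumbel(500.0, 100.0, 20000, rng)
xi_hat, alpha_hat = fit_gumbel(synthetic)
print(round(xi_hat, 1), round(alpha_hat, 1))
```

With a long synthetic record the refitted parameters should fall close to the generating values, which is the kind of agreement the comparison of observed and simulated results relies on.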


Table I. The reliability of regional estimates based on the observed data

Return period (year)   Station-year method   Index-flood method   Regional-regression model
2                      0.452                 0.524                0.411
5                      0.447                 0.531                0.368
10                     0.440                 0.532                0.429
25                     0.428                 0.533                0.433
50                     0.419                 0.533                0.451
100                    0.413                 0.534                0.469
200                    0.409                 0.534                0.484
Mean                   0.430                 0.532                0.435

Table II. The reliability of regional estimates based on the simulated data

Return period (year)   Station-year method   Index-flood method   Regional-regression model
2                      0.422                 0.504                0.390
5                      0.437                 0.521                0.348
10                     0.430                 0.509                0.379
25                     0.408                 0.553                0.424
50                     0.444                 0.548                0.461
100                    0.430                 0.547                0.486
200                    0.426                 0.554                0.501
Mean                   0.428                 0.534                0.427

Figure 2. The reliability of station-year estimates based on the observed and simulated data, respectively (reliability versus return period in years)

Finally, we proceed to establish the relationship of $\omega_\beta$ and $\beta$ for each regional method. The variation of selective index $\omega_\beta$ with $\beta$ (from 0.1 to 1.0) for various regional methods is presented in Table III and Figure 5 (called the reliability-based selective index graph). We can see that the selective index $\omega_\beta$ increases as the value of $\beta$ increases for any method, as expected. This is in agreement with the theoretical analysis. As indicated in the aforementioned theoretical analysis, the method with a larger value of $\omega_\beta$ is a better method when the value of $\beta$ is fixed. Hence, according to Table III and Figure 5, the regional-regression model is the best one. To compare the selection result with that based on the traditional index, root mean square error, we compute the root mean square error of various T-year event estimates for the three regional methods. As shown in Table IV, the station-year method has the smallest mean RMSE and is therefore the best method if we use RMSE as the selective index.

Figure 3. The reliability of index-flood estimates based on the observed and simulated data, respectively (reliability versus return period in years)

Figure 4. The reliability of regional-regression estimates based on the observed and simulated data, respectively (reliability versus return period in years)

Table III. Values of the reliability-based selective index $\omega_\beta$ for various regional methods

β      Station-year method   Index-flood method   Regional-regression model
0.1    0.12                  0.15                 0.18
0.2    0.24                  0.29                 0.36
0.3    0.34                  0.41                 0.52
0.4    0.43                  0.51                 0.65
0.5    0.50                  0.59                 0.77
0.6    0.57                  0.65                 0.85
0.7    0.62                  0.70                 0.90
0.8    0.67                  0.73                 0.93
0.9    0.71                  0.77                 0.95
1.0    0.73                  0.79                 0.96
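The two selection rules can be applied mechanically to the tabulated values. The sketch below copies the selective index values from Table III and the mean RMSE values from Table IV, then picks the method with the largest index at each $\beta$ and the method with the smallest mean RMSE. The variable names are illustrative.

```python
# Selective index values from Table III, for beta = 0.1, 0.2, ..., 1.0.
omega = {
    "station-year": [0.12, 0.24, 0.34, 0.43, 0.50, 0.57, 0.62, 0.67, 0.71, 0.73],
    "index-flood": [0.15, 0.29, 0.41, 0.51, 0.59, 0.65, 0.70, 0.73, 0.77, 0.79],
    "regional-regression": [0.18, 0.36, 0.52, 0.65, 0.77, 0.85, 0.90, 0.93, 0.95, 0.96],
}
# Mean RMSE over the seven return periods, from Table IV.
mean_rmse = {"station-year": 0.222, "index-flood": 0.247, "regional-regression": 0.246}

# Larger omega is better at a fixed beta; smaller RMSE is better.
best_by_omega = [max(omega, key=lambda m: omega[m][i]) for i in range(10)]
best_by_rmse = min(mean_rmse, key=mean_rmse.get)

print(set(best_by_omega), best_by_rmse)
```

The two rules disagree on these data, which is the contrast the comparison between the indices is meant to show.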

Figure 5. The variation of the reliability-based selective index $\omega_\beta$ with $\beta$ for various regional methods

Table IV. The RMSE for various regional methods

Return period (year)   Station-year method   Index-flood method   Regional-regression model
2                      0.225                 0.285                0.289
5                      0.225                 0.221                0.277
10                     0.205                 0.191                0.182
25                     0.225                 0.200                0.188
50                     0.225                 0.266                0.204
100                    0.225                 0.278                0.252
200                    0.225                 0.288                0.334
Mean                   0.222                 0.247                0.246

SUMMARY AND CONCLUSIONS

In this paper, a new selective index is proposed for the selection of the best regional flood frequency analysis method. The new selective index is based on the theory of reliability. Actual application of the proposed methodology is performed. Three regional frequency analysis methods, namely the station-year, index-flood and regional-regression models, are considered in the application. The selection result based on the new index is different from that based on the traditional index (root mean square error). The proposed new index has advantages over existing indices, because it considers the influence of variance of estimates.

REFERENCES

Ang AH-S. 1973. Structural risk analysis and reliability-based design. Journal of the Structural Engineering Division, ASCE 99(ST9): 1891–1910.

Burn DH. 1988. Delineation of groups for regional flood frequency analysis. Journal of Hydrology 104: 345–361.

Burn DH. 1990. An appraisal of the 'region of influence' approach to flood frequency analysis. Hydrological Science Journal 35(2): 149–165.

Cunnane C. 1988. Methods and merits of regional flood frequency analysis. Journal of Hydrology 100: 269–290.

GREHYS (Groupe de recherche en hydrologie statistique). 1996. Presentation and review of some methods for regional flood frequency analysis. Journal of Hydrology 186: 63–84.

GREHYS (Groupe de recherche en hydrologie statistique). 1997. Intercomparison of regional flood frequency procedures Canadian rivers. Journal of Hydrology 186: 85–103.

Lu LH, Stedinger JR. 1992. Variance of two- and three-parameter GEV/PWM quantile estimators: formulae, confidence intervals, and a comparison. Journal of Hydrology 138: 247–267.

Phien HN. 1987. A method of parameter estimator for the extreme value type I distribution. Journal of Hydrology 90: 251–268.

Ribeiro-Correa J, Cavadias G, Rousselle J. 1995. Identification of hydrological neighborhoods using canonical correlation analysis. Journal of Hydrology 173: 71–89.