Bayesian Uncertainty Quantification for OBSM Surrogate

2 Uncertainty Quantification for Overcomplete Bases Sur- Sur-rogate Construction

2.2 Bayesian Uncertainty Quantification for OBSM Surrogate

To derive the uncertainty of the OBSM prediction in (1), we can see that the difficulty stems from the fact that the OBSM lacks a stochastic component to facilitate the quantification of uncertainties of the predictions. A Bayesian approach is used here by imposing priors on the coefficients of the OBSM model. Then we can generate the posterior samples for each prediction to derive its prediction uncertainty. We elaborate on this scheme in the following.

2.2.1 Stochastic Search Variable Selection for Uncertainty Quantification in OBSM Suppose we have identified m basis functions with non-zero coefficients via the OBSM to construct a surrogate model, Pm

j=1c_jφ_j(x). To quantify the uncertainty of the OBSM prediction, we use a Bayesian variable selection approach, Stochastic Search Variable Selection (SSVS), proposed by George and McCulloch (1993, 1997).

For the OBSM, we make the following statistical model assumption:

y(x) =

j=1

cjφj(x) + (x), (5)

where (x) are the independent normal errors with zero mean and variance σ². Therefore on the explored point set P_exp = {x₁, . . . , x_n}, we have

Y =

j=1

c_jφ˜_j + ˜.

Here, ˜ = ((x1), . . . , (xn))^> is an n×1 error vector that follows a multivariate normal distribution with zero mean and covariance matrix σ²I_n, and I_nis the n × n identity matrix. SSVS is originally

proposed for the variable selection problems, where an m × 1 vector of latent variables, γ = (γ₁, . . . , γ_m)^>, is augmented into the model (5) to indicate which variables are selected. However, our purpose here is not on variable selection. These indicators are used to denote the contributions of the bases in the OBSM. That is, γ_i = 1 indicates that ˜φ_i is important (i.e., a significant contributor), and γ_i = 0 indicates the opposite.

The prior assumptions for SSVS are given as follows. The prior distribution of γ_i is P (γ_i = 0) = p_i and P (γ_i = 1) = 1 − p_i. The prior distribution of c_i is a mixture normal distribution and is dependent on γ_i. That is,

[c_i|γ_i = 0] ∼ N (0, τ_0i), and [c_i|γ_i = 1] ∼ N (0, τ_1i). (6) If ˜φ_i is unimportant, then the prior distribution is N (0, τ_0i) with a small τ_0i. If ˜φ_i is important, a large τ_1i is chosen. In particular, we can write τ_1i = C_iτ_0i, where C_i > 1. The prior distributions of (γ_i, c_i) are assumed to be independent for i = 1, . . . , m. These distributions are independent of the prior distribution of the residual variance σ², which is an inverse gamma distribution satisfying σ² ∼ IG(ν/2, νλ/2).

Now we focus on the the posterior samples of the coefficients instead of focusing on the indicator samples. Let c = (c₁, . . . , c_m)^> be the vector of the non-zero coefficients c_i. Since the posterior distribution of c cannot be derived directly, the SSVS is implemented via a Gibbs sampling scheme, where each iteration samples from the conditional distributions [c|γ, Y, σ], [γ|c, Y, σ], and [σ|Y, c, γ].

Suppose we run SSVS for (T + T₀) times and then discard the T₀ initial samples, we can collect T posterior samples of c, c⁽¹⁾, . . . , c^{(T )}.

To quantify the uncertainty of the prediction value of OBSM at a point x^∗, the posterior samples of the coefficients c^(t) are used to get the posterior samples of the prediction values at x^∗ via

y^(t)(x^∗) =

j=1

c^(t)_j φ_j(x^∗) + ^(t),

where ^(t) is a sample from N (0, (σ^(t))²) and t = 1, . . . , T. Consequently, the uncertainty of the OBSM prediction, y(x^∗), can be quantified by a statistic of these posterior samples of y^(t) for t = 1, . . . , T . One example is the 95% credible interval for y(x^∗), denoted as [y_{(0.025T )}, y_{(0.975T )}], where y_{(0.025T )}and y_{(0.975T )}are respectively the 2.5 and 97.5 percentiles of the samples y⁽¹⁾, . . . , y^{(T )}.

2.2.2 Parameter Tuning for SSVS

To implement SSVS for deriving the uncertainty of the OBSM predictor, we pre-specify the tuning parameters in the prior distributions as follows. First, for the parameters ν and λ in the prior distribution of σ², we follow Chipman et al. (1997) by choosing ν = 2 and λ as the 99% quantile of the inverse gamma prior that is close to pV ar(Y ). Second, for the probability, p_i, since we should not have any prior information about which φ_i is important, we simply set p_i = 0.5 as suggested by George and McCulloch (1993, 1997).

Last but not the least, the parameters C_i and τ_0i in the mixture normal prior of c_i play important roles in SSVS. George and McCulloch (1993) and Beattie et al. (2002) suggested choosing the best set-up from several pre-specified candidates of (τ_0i, C_i). Chipman et al. (1997) proposed another approach by simply fixing Ci = 10 and setting τ0i = ∆Y /(3∆ ˜φi), where ∆ ˜φi = max( ˜φ_i) − min( ˜φ_i) represents the large change in ˜φ_i and ∆Y = pV ar(Y )/5. However, these suggestions are tailored for the variable selection problems. To focus on the stated UQ problem, we propose the following tuning procedure for (C_i, τ_0i). We choose τ_0i= ∆Y /(3∆ ˜φ_i) as in Chipman et al. (1997) and set C_i = C. Since the uncertainty of the OBSM predictor is quantified via credible interval, we use the following procedure to tune C.

T1. For a given value of C, every prediction value must be in the corresponding credible interval.

T2. Among all the C’s that satisfy T1, the best value of C is chosen by minimizing the loss function defined in (7). This choice is based on the rationale of the “equal-tailed” credible interval for each point x. In particular, we define

L(C) = {X

(E(y(x)) − y^M_C(x))²}^1/2, (7)

where E(y(x)) = P

jc_jφ_j(x) is the prediction value obtained by the OBSM and y_C^M(x) is the median value of the posterior samples for the point x generated by fixing the value of C.

2.3 An Illustration

A numerical experiment is used here to demonstrate how the approach can be used to quantify the uncertainties of the OBSM predictors. This example is based on a real computational problem considered in Chen et al. (2011) and Wang et al. (2001). Chen et al. (2011) studied the problem

(a) True surface (b) Surrogate surface

Figure 2: (a) The objective function fL over the 41 × 41 grid and (b) The surrogate surface obtained by the OBSM over the 41 × 41 grid. The white dots are used to denote the initial design points.

of identifying positive Lyapunov exponents (LE) for a dynamical system simulating bistable laser diodes. In this problem, two parameters Sp1 (the pump rate) and mc (the modulation current) are tuned so that the corresponding LE is positive, which indicates that the light output of the dynamical system is chaotic. Let f_L(S_p1, m_c) denote the largest LE of the dynamical system corresponding to the point (S_p1, m_c). Chen et al. (2011) showed that the OBSM performed well in constructing the surrogate surface for f_L(S_p1, m_c).

The experimental region is set as X = {(S_p1, m_c)|20 ≤ S_p1 ≤ 30, 5 ≤ m_c ≤ 15} and X is divided into 41 × 41 grid points. Figure 2(a) shows the true function surface f_L over the 41 × 41 grid. This function surface is smooth in the left part but oscillates violently in the right part.

Suppose the initial design is the 21-point uniform design as in Chen et al. (2011), i.e. n = 21, and then the responses are collected over these points. To implement the OBSM, following Chen et al. (2011), we use the Gabor basis function in Eq. (3). The parameters in the Gabor basis dictio-nary are: (i) σ_v² = 2σ²_uand σ_u ∈ {0.5, 1, 1.5, 6}, (ii) λ =√

2πσ_u and θ ∈ {0,^π₈,^2π₈ ,^3π₈ ,^4π₈ ,^5π₈ ,^6π₈ ,^7π₈ }, and (iii) ϕ = 0. The number of the Gabor dictionary is 21 × 4 × 8 × 1. The matching pursuit algorithm is iterated 40 times. That is, we choose 40 bases for surrogate construction. Figure 2(b) shows the surrogate surface constructed by the OBSM.

To quantify uncertainties of the OBSM predictors, SSVS in Section 2.2.1 is used to generate

Table 1: The tuning results for C = 10, 20, 30, positive LE example C Coverage rate L(C) =k OBSM - Median k

10 1647/1681 6.74364970056741 20 1681/1681 6.69279511035165 30 1681/1681 7.99894240494143

the posterior prediction samples for each grid point. Except for the parameter C, the other tuning parameters are chosen as those shown in Section 2.2.2. To tune C, the candidate set of C is chosen as {10, 20, 30}. For each value of C, the SSVS is iterated 20,000 times, and 10,000 posterior samples are collected after discarding the first 10,000 iterations. Suppose we are interested in 95% credible interval. Then based on the tuning criteria, T1 and T2, the tuning results with respect to C are summarized in Table 1. When C = 10, the first criterion T1 is violated, because the coverage rate 1, 647/1, 681 is less than 1. For the other two values, C = 20 and 30, the coverage rate is equal to 1. Consequently, we choose C = 20 because it has a lower value of L(C). Figure 3(a) shows that the OBSM prediction values are all close to the median values of the posterior samples and the 95% credible intervals for all grid points are displayed in Figure 3(b). Figures 3(c) and (d) give the zoomed-in display of credible intervals for the first 1, 000 grid points and the last 681 points separately.

Since the length of the credible interval can be treated as a proxy of the variance of the prediction value, we identify the 10 points with larger lengths of the credible intervals and the 10 points with smaller lengths of the intervals in Figure 4. The black dot (·) denotes points of the ten shortest credible intervals and the white plus sign (+) points of the ten longest credible intervals.

We find that the points with larger CI’s are all in the right hand side where the true surface is oscillatory. The points with smaller CI’s are all in the left part where the true surface is smooth.

This observation suggests that we have very stable prediction values for the left part. However, consider the model for the right hand side. From Figure 4, we find that the large variation points (marked by the white plus signs) occur in the area where the surrogate model changes more widely.

This may explain why prediction variations are large in this area.

在文檔中 Computer Experiment Analysis with Mixed Types of Factors (頁 35-40)