2 Uncertainty Quantification for Overcomplete Bases Sur- Sur-rogate Construction
2.2 Bayesian Uncertainty Quantification for OBSM Surrogate
To derive the uncertainty of the OBSM prediction in (1), we can see that the difficulty stems from the fact that the OBSM lacks a stochastic component to facilitate the quantification of uncertainties of the predictions. A Bayesian approach is used here by imposing priors on the coefficients of the OBSM model. Then we can generate the posterior samples for each prediction to derive its prediction uncertainty. We elaborate on this scheme in the following.
2.2.1 Stochastic Search Variable Selection for Uncertainty Quantification in OBSM Suppose we have identified m basis functions with non-zero coefficients via the OBSM to construct a surrogate model, Pm
j=1cjφj(x). To quantify the uncertainty of the OBSM prediction, we use a Bayesian variable selection approach, Stochastic Search Variable Selection (SSVS), proposed by George and McCulloch (1993, 1997).
For the OBSM, we make the following statistical model assumption:
y(x) =
m
X
j=1
cjφj(x) + (x), (5)
where (x) are the independent normal errors with zero mean and variance σ2. Therefore on the explored point set Pexp = {x1, . . . , xn}, we have
Y =
m
X
j=1
cjφ˜j + ˜.
Here, ˜ = ((x1), . . . , (xn))> is an n×1 error vector that follows a multivariate normal distribution with zero mean and covariance matrix σ2In, and Inis the n × n identity matrix. SSVS is originally
proposed for the variable selection problems, where an m × 1 vector of latent variables, γ = (γ1, . . . , γm)>, is augmented into the model (5) to indicate which variables are selected. However, our purpose here is not on variable selection. These indicators are used to denote the contributions of the bases in the OBSM. That is, γi = 1 indicates that ˜φi is important (i.e., a significant contributor), and γi = 0 indicates the opposite.
The prior assumptions for SSVS are given as follows. The prior distribution of γi is P (γi = 0) = pi and P (γi = 1) = 1 − pi. The prior distribution of ci is a mixture normal distribution and is dependent on γi. That is,
[ci|γi = 0] ∼ N (0, τ0i), and [ci|γi = 1] ∼ N (0, τ1i). (6) If ˜φi is unimportant, then the prior distribution is N (0, τ0i) with a small τ0i. If ˜φi is important, a large τ1i is chosen. In particular, we can write τ1i = Ciτ0i, where Ci > 1. The prior distributions of (γi, ci) are assumed to be independent for i = 1, . . . , m. These distributions are independent of the prior distribution of the residual variance σ2, which is an inverse gamma distribution satisfying σ2 ∼ IG(ν/2, νλ/2).
Now we focus on the the posterior samples of the coefficients instead of focusing on the indicator samples. Let c = (c1, . . . , cm)> be the vector of the non-zero coefficients ci. Since the posterior distribution of c cannot be derived directly, the SSVS is implemented via a Gibbs sampling scheme, where each iteration samples from the conditional distributions [c|γ, Y, σ], [γ|c, Y, σ], and [σ|Y, c, γ].
Suppose we run SSVS for (T + T0) times and then discard the T0 initial samples, we can collect T posterior samples of c, c(1), . . . , c(T ).
To quantify the uncertainty of the prediction value of OBSM at a point x∗, the posterior samples of the coefficients c(t) are used to get the posterior samples of the prediction values at x∗ via
y(t)(x∗) =
m
X
j=1
c(t)j φj(x∗) + (t),
where (t) is a sample from N (0, (σ(t))2) and t = 1, . . . , T. Consequently, the uncertainty of the OBSM prediction, y(x∗), can be quantified by a statistic of these posterior samples of y(t) for t = 1, . . . , T . One example is the 95% credible interval for y(x∗), denoted as [y(0.025T ), y(0.975T )], where y(0.025T )and y(0.975T )are respectively the 2.5 and 97.5 percentiles of the samples y(1), . . . , y(T ).
2.2.2 Parameter Tuning for SSVS
To implement SSVS for deriving the uncertainty of the OBSM predictor, we pre-specify the tuning parameters in the prior distributions as follows. First, for the parameters ν and λ in the prior distribution of σ2, we follow Chipman et al. (1997) by choosing ν = 2 and λ as the 99% quantile of the inverse gamma prior that is close to pV ar(Y ). Second, for the probability, pi, since we should not have any prior information about which φi is important, we simply set pi = 0.5 as suggested by George and McCulloch (1993, 1997).
Last but not the least, the parameters Ci and τ0i in the mixture normal prior of ci play important roles in SSVS. George and McCulloch (1993) and Beattie et al. (2002) suggested choosing the best set-up from several pre-specified candidates of (τ0i, Ci). Chipman et al. (1997) proposed another approach by simply fixing Ci = 10 and setting τ0i = ∆Y /(3∆ ˜φi), where ∆ ˜φi = max( ˜φi) − min( ˜φi) represents the large change in ˜φi and ∆Y = pV ar(Y )/5. However, these suggestions are tailored for the variable selection problems. To focus on the stated UQ problem, we propose the following tuning procedure for (Ci, τ0i). We choose τ0i= ∆Y /(3∆ ˜φi) as in Chipman et al. (1997) and set Ci = C. Since the uncertainty of the OBSM predictor is quantified via credible interval, we use the following procedure to tune C.
T1. For a given value of C, every prediction value must be in the corresponding credible interval.
T2. Among all the C’s that satisfy T1, the best value of C is chosen by minimizing the loss function defined in (7). This choice is based on the rationale of the “equal-tailed” credible interval for each point x. In particular, we define
L(C) = {X
x
(E(y(x)) − yMC(x))2}1/2, (7)
where E(y(x)) = P
jcjφj(x) is the prediction value obtained by the OBSM and yCM(x) is the median value of the posterior samples for the point x generated by fixing the value of C.
2.3 An Illustration
A numerical experiment is used here to demonstrate how the approach can be used to quantify the uncertainties of the OBSM predictors. This example is based on a real computational problem considered in Chen et al. (2011) and Wang et al. (2001). Chen et al. (2011) studied the problem
(a) True surface (b) Surrogate surface
Figure 2: (a) The objective function fL over the 41 × 41 grid and (b) The surrogate surface obtained by the OBSM over the 41 × 41 grid. The white dots are used to denote the initial design points.
of identifying positive Lyapunov exponents (LE) for a dynamical system simulating bistable laser diodes. In this problem, two parameters Sp1 (the pump rate) and mc (the modulation current) are tuned so that the corresponding LE is positive, which indicates that the light output of the dynamical system is chaotic. Let fL(Sp1, mc) denote the largest LE of the dynamical system corresponding to the point (Sp1, mc). Chen et al. (2011) showed that the OBSM performed well in constructing the surrogate surface for fL(Sp1, mc).
The experimental region is set as X = {(Sp1, mc)|20 ≤ Sp1 ≤ 30, 5 ≤ mc ≤ 15} and X is divided into 41 × 41 grid points. Figure 2(a) shows the true function surface fL over the 41 × 41 grid. This function surface is smooth in the left part but oscillates violently in the right part.
Suppose the initial design is the 21-point uniform design as in Chen et al. (2011), i.e. n = 21, and then the responses are collected over these points. To implement the OBSM, following Chen et al. (2011), we use the Gabor basis function in Eq. (3). The parameters in the Gabor basis dictio-nary are: (i) σv2 = 2σ2uand σu ∈ {0.5, 1, 1.5, 6}, (ii) λ =√
2πσu and θ ∈ {0,π8,2π8 ,3π8 ,4π8 ,5π8 ,6π8 ,7π8 }, and (iii) ϕ = 0. The number of the Gabor dictionary is 21 × 4 × 8 × 1. The matching pursuit algorithm is iterated 40 times. That is, we choose 40 bases for surrogate construction. Figure 2(b) shows the surrogate surface constructed by the OBSM.
To quantify uncertainties of the OBSM predictors, SSVS in Section 2.2.1 is used to generate
Table 1: The tuning results for C = 10, 20, 30, positive LE example C Coverage rate L(C) =k OBSM - Median k
10 1647/1681 6.74364970056741 20 1681/1681 6.69279511035165 30 1681/1681 7.99894240494143
the posterior prediction samples for each grid point. Except for the parameter C, the other tuning parameters are chosen as those shown in Section 2.2.2. To tune C, the candidate set of C is chosen as {10, 20, 30}. For each value of C, the SSVS is iterated 20,000 times, and 10,000 posterior samples are collected after discarding the first 10,000 iterations. Suppose we are interested in 95% credible interval. Then based on the tuning criteria, T1 and T2, the tuning results with respect to C are summarized in Table 1. When C = 10, the first criterion T1 is violated, because the coverage rate 1, 647/1, 681 is less than 1. For the other two values, C = 20 and 30, the coverage rate is equal to 1. Consequently, we choose C = 20 because it has a lower value of L(C). Figure 3(a) shows that the OBSM prediction values are all close to the median values of the posterior samples and the 95% credible intervals for all grid points are displayed in Figure 3(b). Figures 3(c) and (d) give the zoomed-in display of credible intervals for the first 1, 000 grid points and the last 681 points separately.
Since the length of the credible interval can be treated as a proxy of the variance of the prediction value, we identify the 10 points with larger lengths of the credible intervals and the 10 points with smaller lengths of the intervals in Figure 4. The black dot (·) denotes points of the ten shortest credible intervals and the white plus sign (+) points of the ten longest credible intervals.
We find that the points with larger CI’s are all in the right hand side where the true surface is oscillatory. The points with smaller CI’s are all in the left part where the true surface is smooth.
This observation suggests that we have very stable prediction values for the left part. However, consider the model for the right hand side. From Figure 4, we find that the large variation points (marked by the white plus signs) occur in the area where the surrogate model changes more widely.
This may explain why prediction variations are large in this area.