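2.4 Variable Selection (Chapter 2: Statistical Methodology) - A Study of Computational Statistical Methods for Integrated Circuit Optimization and Sensitivity Analysis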

In response surface work it is customary to construct the full model corresponding to the situation at hand. That is, in steepest ascent we usually build the full first-order model, and in the analysis of a second-order model we usually construct the full quadratic. An experimenter may encounter situations where the full model may not be appropriate; that is, a model based on a subset of the regressors in the full model may be superior. Variable selection or model-building techniques may be used to identify the best subset of regressors to include in a regression model [39].

Variable selection is carried out by statistical analysis of the fitted response surface models. Input factors that have a significant effect on an individual response can be identified systematically using statistical techniques. Variations in these significant input factors produce the greatest fluctuations in device performance. This analysis is extremely useful for understanding which areas of manufacturing require greater control.

(1) Half-normal plot and t test

As in screening designs, significant model terms can be identified from the half-normal plot of the model coefficients. This method was originally proposed for analyzing two-level factorial experiments and is applicable in cases where no degrees of freedom are available for estimating the variance of the error term. The effects are plotted on half-normal probability paper, and those standing apart from the rest are identified as potentially real effects [40]. Probability plotting may also be used for experiments having three levels. One approach is to express the effects in terms of linear and quadratic components and construct the normal probability plot of those components, standardized to have the same variance [41][42].
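The plotting positions behind a half-normal plot can be sketched in a few lines. The snippet below is a minimal illustration, not the procedure used in this work: it assumes the common convention of pairing the i-th smallest absolute effect (out of m) with the quantile Φ⁻¹(0.5 + 0.5(i − 0.5)/m). The effect values and the function name `half_normal_points` are hypothetical.

```python
from statistics import NormalDist


def half_normal_points(effects):
    """Pair each ordered |effect| with its half-normal quantile.

    `effects` maps a term label to its estimated effect. The i-th smallest
    absolute effect (i = 1..m) is paired with the quantile
    Phi^{-1}(0.5 + 0.5 * (i - 0.5) / m), an assumed plotting convention.
    """
    m = len(effects)
    ordered = sorted(effects.items(), key=lambda item: abs(item[1]))
    standard_normal = NormalDist()
    points = []
    for i, (label, value) in enumerate(ordered, start=1):
        p = 0.5 + 0.5 * (i - 0.5) / m
        points.append((label, standard_normal.inv_cdf(p), abs(value)))
    return points


# Hypothetical effect estimates from an unreplicated two-level design.
effects = {"A": 0.3, "B": -5.2, "C": 0.1, "AB": -0.4,
           "AC": 0.2, "BC": 4.1, "ABC": -0.25}
points = half_normal_points(effects)
for label, quantile, magnitude in points:
    print(f"{label:>3}  quantile={quantile:.3f}  |effect|={magnitude:.2f}")
```

Plotting the magnitude against the quantile, the small effects should fall roughly on a line through the origin, while terms such as B and BC in this made-up data would stand apart from it.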

The half-normal plot is an informal graphical method that relies on visual judgment. A formal test of effect significance is the t test for the least squares estimate $\hat{\beta}$. It can be shown that the least squares estimate $\hat{\beta}$ has a multivariate normal distribution with mean vector $\beta$ and variance-covariance matrix $\sigma^2 (X^T X)^{-1}$, i.e.,

$$\hat{\beta} \sim MN(\beta, \sigma^2 (X^T X)^{-1}), \tag{2.17}$$

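where $MN$ stands for multivariate normal. The $(i, j)$th entry of the variance-covariance matrix is $\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j)$, and the $j$th diagonal element is $\mathrm{Cov}(\hat{\beta}_j, \hat{\beta}_j) = \mathrm{Var}(\hat{\beta}_j)$. Therefore, the distribution of an individual $\hat{\beta}_j$ is $N(\beta_j, \sigma^2 (X^T X)^{-1}_{jj})$, which suggests that for testing the null hypothesis

$$H_0 : \beta_j = 0, \tag{2.18}$$

the following t statistic be used:

$$\frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 (X^T X)^{-1}_{jj}}} \sim t_{N-p-1} \quad (\text{under } H_0). \tag{2.19}$$

Here $\hat{\sigma}^2$ is the residual mean square estimate of $\sigma^2$, and $(X^T X)^{-1}_{jj}$ is the $j$th diagonal element of $(X^T X)^{-1}$. Under $H_0$, the statistic has a t distribution with $N - p - 1$ degrees of freedom.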
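As a rough illustration of the t statistic in (2.19), the sketch below fits a model with an intercept and p regressors by least squares and divides each coefficient by its estimated standard error. It assumes NumPy is available. The data are synthetic and the function name `t_statistics` is hypothetical; comparing each |t| with a critical value of the t distribution is left to the reader.

```python
import numpy as np


def t_statistics(X, y):
    """Least squares estimates and their t statistics.

    X is an N x p array of regressors (no intercept column); an intercept
    is added here, so the residual degrees of freedom are N - p - 1.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    residual = y - design @ beta_hat
    dof = n - p - 1
    sigma2_hat = float(residual @ residual) / dof      # residual mean square
    std_err = np.sqrt(sigma2_hat * np.diag(xtx_inv))   # sqrt of diagonal of sigma^2 (X'X)^-1
    return beta_hat, beta_hat / std_err, dof


# Synthetic example: the second regressor has no real effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=30)
beta_hat, t, dof = t_statistics(X, y)
for j, (b, tj) in enumerate(zip(beta_hat, t)):
    print(f"beta_{j} = {b: .3f}   t = {tj: .2f}   (dof = {dof})")
```

Forming the explicit inverse keeps the code close to the formula; for ill-conditioned designs a QR-based or least-squares solver would be the more careful choice.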

(2) Stepwise regression

An alternative approach to variable selection is stepwise regression. It is one of several methods that evaluate only a small number of subset regression models by either adding or deleting regressors one at a time. Stepwise regression is a popular combination of two procedures, forward selection and backward elimination [38].

The forward selection procedure begins with the assumption that there are no regressors in the model other than the intercept. An effort is made to find an optimal subset by inserting regressors into the model one at a time. The first regressor selected for entry into the equation is the one that has the largest simple correlation with the response variable $y$. Suppose that this regressor is $x_1$. This is also the regressor that produces the largest value of the F-statistic for testing significance of regression. This regressor is entered if the F-statistic exceeds a preselected F-value, say $F_{IN}$ (or F-to-enter). The second regressor chosen for entry is the one that now has the largest correlation with $y$ after adjusting for the effect of the first regressor entered ($x_1$) on $y$. We refer to these correlations as partial correlations. They are the simple correlations between the residuals from the regression $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1$ and the residuals from the regressions of the other candidate regressors on $x_1$, say $\hat{x}_j = \hat{\alpha}_{0j} + \hat{\alpha}_{1j} x_1$, $j = 2, 3, \ldots, K$.
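The residual-based definition of the partial correlations can be illustrated as follows. This is a minimal sketch assuming NumPy: the data are synthetic, column 0 plays the role of the regressor already entered, and the helper names `residuals` and `partial_correlations` are hypothetical.

```python
import numpy as np


def residuals(target, x_entered):
    """Residuals from the simple regression of `target` on `x_entered` (with intercept)."""
    A = np.column_stack([np.ones(len(x_entered)), x_entered])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef


def partial_correlations(X, y, entered=0):
    """Partial correlation of every other column with y, adjusting for column `entered`."""
    x_entered = X[:, entered]
    y_resid = residuals(y, x_entered)
    result = {}
    for j in range(X.shape[1]):
        if j == entered:
            continue
        xj_resid = residuals(X[:, j], x_entered)
        # Simple correlation between the two sets of residuals.
        result[j] = float(np.corrcoef(y_resid, xj_resid)[0, 1])
    return result


# Synthetic data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(size=50)
partial = partial_correlations(X, y, entered=0)
best = max(partial, key=lambda j: abs(partial[j]))
print(partial, "-> next candidate:", best)
```

The candidate with the largest absolute partial correlation is the one that forward selection would consider next.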

Suppose that at Step 2 the regressor with the highest partial correlation with $y$ is $x_2$. This implies that the largest partial F-statistic is

$$F = \frac{SS_R(x_2 \mid x_1)}{MS_E(x_1, x_2)}. \tag{2.20}$$

If this F-value exceeds $F_{IN}$, then $x_2$ is added to the model. In general, at each step the regressor having the highest partial correlation with $y$ (or equivalently the largest partial F-statistic given the other regressors already in the model) is added to the model if its partial F-statistic exceeds the preselected entry level $F_{IN}$ [38]. The procedure terminates either when the partial F-statistic at a particular step does not exceed $F_{IN}$ or when the last candidate regressor is added to the model.
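The forward selection loop described above might be sketched as follows, assuming NumPy. The partial F-statistic of a candidate is computed as the reduction in residual sum of squares from adding it, divided by the residual mean square of the enlarged model, in the manner of (2.20). The cutoff `f_in=4.0`, the synthetic data, and the function names are illustrative assumptions, not values taken from this work.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares for the model with an intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def forward_selection(X, y, f_in=4.0):
    """Add one regressor at a time while the largest partial F exceeds f_in."""
    n, k = X.shape
    selected = []
    while len(selected) < k:
        sse_current = sse(X, y, selected)
        best_f, best_j = -np.inf, None
        for j in range(k):
            if j in selected:
                continue
            trial = selected + [j]
            sse_trial = sse(X, y, trial)
            mse_trial = sse_trial / (n - len(trial) - 1)
            f = (sse_current - sse_trial) / mse_trial   # partial F for candidate j
            if f > best_f:
                best_f, best_j = f, j
        if best_f <= f_in:       # no candidate exceeds F_IN: stop
            break
        selected.append(best_j)
    return selected


# Synthetic data: only columns 1 and 3 drive the response.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5))
y = 4.0 * X[:, 1] - 3.0 * X[:, 3] + rng.normal(size=60)
selected = forward_selection(X, y)
print("entered, in order:", selected)
```

Refitting every trial model from scratch is wasteful but keeps the sketch short; production code would update the fit incrementally.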

Forward selection begins with no regressors in the model and attempts to insert variables until a suitable model is obtained. Backward elimination attempts to find a good model by working in the opposite direction. That is, we begin with a model that includes all $K$ candidate regressors. Then the partial F-statistic (or a t-statistic, which is equivalent) is computed for each regressor as if it were the last variable to enter the model. The smallest of these partial F-statistics is compared with a preselected value, $F_{OUT}$ (or F-to-remove); if the smallest partial F-value is less than $F_{OUT}$, that regressor is removed from the model. Now a regression model with $K - 1$ regressors is constructed, the partial F-statistics for this new model are calculated, and the procedure is repeated. The backward elimination algorithm terminates when the smallest partial F-value is not less than the preselected cutoff value $F_{OUT}$ [38].
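Backward elimination can be sketched in the same style, again assuming NumPy. Each regressor in the current model is treated as if it were the last to enter, and the one with the smallest partial F is dropped when that value falls below the cutoff. The cutoff `f_out=4.0`, the synthetic data, and the function names are hypothetical choices for illustration.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares for the model with an intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def backward_elimination(X, y, f_out=4.0):
    """Start from all K regressors and drop the weakest one while its partial F < f_out."""
    n, k = X.shape
    kept = list(range(k))
    while kept:
        sse_full = sse(X, y, kept)
        mse_full = sse_full / (n - len(kept) - 1)
        worst_f, worst_j = np.inf, None
        for j in kept:
            reduced = [c for c in kept if c != j]
            # Partial F of j as if it were the last variable to enter.
            f = (sse(X, y, reduced) - sse_full) / mse_full
            if f < worst_f:
                worst_f, worst_j = f, j
        if worst_f >= f_out:     # smallest partial F is not below F_OUT: stop
            break
        kept.remove(worst_j)
    return kept


# Synthetic data: only columns 1 and 3 drive the response.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))
y = 4.0 * X[:, 1] - 3.0 * X[:, 3] + rng.normal(size=60)
kept = backward_elimination(X, y)
print("retained regressors:", kept)
```

The sketch requires more observations than candidate regressors plus one, since the full model must leave positive residual degrees of freedom.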

Backward elimination is often a very good variable selection procedure. It is particularly favored by analysts who like to see the effect of including all the candidate regressors, so that nothing obvious will be missed. The two procedures described above suggest a number of possible combinations. One of the most popular is the stepwise regression algorithm, whose flowchart is shown in Fig. 2.4. This is a modification of forward selection in which, at each step, all regressors entered into the model previously are reassessed via their partial F- or t-statistics. A regressor added at an earlier step may now be redundant because of the relationship between it and the regressors now in the equation. If the partial F-statistic for a variable is less than $F_{OUT}$, that variable is dropped from the model.

Stepwise regression requires two cutoff values, $F_{IN}$ and $F_{OUT}$. Some analysts prefer to choose $F_{IN} = F_{OUT}$, although this is not necessary. Sometimes we choose $F_{IN} > F_{OUT}$, making it more difficult to add a regressor than to delete one [38].
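Combining the two procedures, a stepwise loop might look like the sketch below, assuming NumPy. After each forward step, every regressor already in the model is reassessed through its partial F and dropped if it falls below the removal cutoff. The cutoffs, the `max_steps` guard against cycling, the synthetic data, and all function names are illustrative assumptions rather than the implementation used in this work.

```python
import numpy as np


def sse(X, y, cols):
    """Residual sum of squares for the model with an intercept plus columns `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def partial_f(X, y, cols, j):
    """Partial F of regressor j given the other regressors in `cols` (j must be in cols)."""
    others = [c for c in cols if c != j]
    sse_with = sse(X, y, cols)
    mse_with = sse_with / (len(y) - len(cols) - 1)
    return (sse(X, y, others) - sse_with) / mse_with


def stepwise(X, y, f_in=4.0, f_out=4.0, max_steps=100):
    """Forward selection with reassessment of entered regressors after each step."""
    k = X.shape[1]
    selected = []
    for _ in range(max_steps):          # guard against cycling
        candidates = [j for j in range(k) if j not in selected]
        if not candidates:
            break
        # Forward step: candidate with the largest partial F.
        best_j = max(candidates, key=lambda j: partial_f(X, y, selected + [j], j))
        if partial_f(X, y, selected + [best_j], best_j) <= f_in:
            break
        selected.append(best_j)
        # Backward check: drop regressors whose partial F fell below f_out.
        while len(selected) > 1:
            worst_j = min(selected, key=lambda j: partial_f(X, y, selected, j))
            if partial_f(X, y, selected, worst_j) >= f_out:
                break
            selected.remove(worst_j)
    return selected


# Synthetic data: only columns 1 and 3 drive the response.
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 5))
y = 4.0 * X[:, 1] - 3.0 * X[:, 3] + rng.normal(size=60)
chosen = stepwise(X, y, f_in=4.0, f_out=3.9)
print("stepwise model:", chosen)
```

With independent synthetic regressors the removal step rarely fires; it matters when candidate regressors are strongly correlated with one another.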


[Figure 2.4 is a flowchart whose image was not recovered. Legible box labels: "Calculate correlation matrix"; "Choose X which is the largest correlation between X and Y to build regression model"; "Choose X which partial F ratio is the largest ratio"; a decision box ending "in the model significant?" with a "Yes" branch; "Discard non-significant X".]

Figure 2.4: A flowchart of the stepwise regression algorithm used in our work.
