In response surface work it is customary to construct the full model corresponding to the situation at hand. That is, in steepest ascent we usually build the full first-order model, and in the analysis of a second-order model we usually construct the full quadratic. An experimenter may encounter situations where the full model may not be appropriate; that is, a model based on a subset of the regressors in the full model may be superior. Variable selection or model-building techniques may be used to identify the best subset of regressors to include in a regression model [39].
Variable selection is determined by statistical analysis of the generated response surface models. Input factors showing a significant effect on an individual response can be systematically determined using statistical techniques. Variations of these significant input factors will produce the greatest fluctuations in device performance. This analysis is extremely useful in understanding which areas of manufacturing require greater control.
(1) Half-normal plot and t test
As in a screening design, a technique for identifying significant model terms can be based
2.4 : Variable Selection 19
on the half-normal plot of model coefficients. This method was originally proposed for analyzing two-level factorial experiments and is applicable in cases where no degrees of freedom are available for estimating the error variance. The effects are plotted on half-normal probability paper, and those standing apart from the rest are identified as potentially real effects [40]. Probability plotting may also be used for experiments with three levels. One approach is to express the effects in terms of linear and quadratic components and to construct the normal probability plot of those components, standardized to have the same variance [41][42].
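The coordinates of a half-normal plot can be sketched in a few lines. The example below is only an illustration: the effect values are hypothetical, the function name `half_normal_points` is ours, and the plotting position $(i - 0.5)/m$ is one common convention that we assume here rather than one prescribed by [40].

```python
from statistics import NormalDist

def half_normal_points(effects):
    """Return (quantile, |effect|, label) triples for a half-normal plot.

    `effects` maps a term label to its estimated effect.
    """
    # Order the terms by absolute effect, smallest first.
    ordered = sorted(effects.items(), key=lambda kv: abs(kv[1]))
    m = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, (label, value) in enumerate(ordered, start=1):
        # Assumed plotting position (i - 0.5) / m, mapped to the upper half
        # of the standard normal: Phi^{-1}(0.5 + 0.5 * (i - 0.5) / m).
        q = std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / m)
        points.append((q, abs(value), label))
    return points

# Hypothetical effect estimates from an unreplicated 2^3 factorial.
effects = {"A": 11.5, "B": -0.8, "C": 0.6, "AB": 0.4,
           "AC": -9.7, "BC": 0.3, "ABC": -0.5}
points = half_normal_points(effects)
for q, abs_effect, label in points:
    print(f"{label:>4s}  quantile={q:.3f}  |effect|={abs_effect:.2f}")
```

Plotting the absolute effects against the quantiles, the small effects should fall roughly on a line through the origin, while terms such as A and AC in this made-up data would stand apart in the upper right corner.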
The half-normal plots are informal graphical methods involving visual judgment. A formal test of effect significance is the t test for the least squares estimate $\hat{\beta}$. It can be shown that the least squares estimate $\hat{\beta}$ has a multivariate normal distribution with mean vector $\beta$ and variance-covariance matrix $\sigma^2 (X^T X)^{-1}$, i.e.,
$$\hat{\beta} \sim MN(\beta, \sigma^2 (X^T X)^{-1}), \tag{2.17}$$
where $MN$ stands for multivariate normal. The $(i, j)$th entry of the variance-covariance matrix is $\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j)$ and the $j$th diagonal element is $\mathrm{Cov}(\hat{\beta}_j, \hat{\beta}_j) = \mathrm{Var}(\hat{\beta}_j)$. Therefore, the distribution of the individual $\hat{\beta}_j$ is $N(\beta_j, \sigma^2 (X^T X)^{-1}_{jj})$, which suggests that for testing the null hypothesis
$$H_0 : \beta_j = 0, \tag{2.18}$$
20 Chapter 2 : Statistical Methodology
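the following t statistic be used:
$$\frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 (X^T X)^{-1}_{jj}}} \sim t_{N-p-1} \quad (\text{under } H_0), \tag{2.19}$$
where $(X^T X)^{-1}_{jj}$ denotes the $j$th diagonal element of $(X^T X)^{-1}$ and $\hat{\sigma}^2$ is the residual mean square. Under $H_0$, it has a t distribution with $N - p - 1$ degrees of freedom.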
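The t statistic of Eq. (2.19) can be computed directly from the model matrix. The following is a minimal sketch using NumPy on synthetic data. The function name `coefficient_t_statistics` and the data are assumptions made for illustration, and the model is assumed to contain an intercept plus $p$ regressors.

```python
import numpy as np

def coefficient_t_statistics(X, y):
    """Fit y on an intercept plus the columns of X by least squares.

    Returns (beta_hat, t_stats, dof), where t_stats[j] is
    beta_hat[j] / sqrt(sigma2_hat * [(X^T X)^{-1}]_jj).
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])   # intercept column first
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    resid = y - design @ beta_hat
    dof = n - p - 1                              # N - p - 1 degrees of freedom
    sigma2_hat = resid @ resid / dof             # residual mean square
    std_err = np.sqrt(sigma2_hat * np.diag(xtx_inv))
    return beta_hat, beta_hat / std_err, dof

# Synthetic data (an assumption): x1 is active, x2 has no effect.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 5.0 + 3.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

beta_hat, t_stats, dof = coefficient_t_statistics(X, y)
print("t statistics:", np.round(t_stats, 2), "with", dof, "degrees of freedom")
```

Each $|t_j|$ would then be compared with the chosen critical value of the $t_{N-p-1}$ distribution. The explicit matrix inverse mirrors the formula and is adequate for a small, well-conditioned model matrix; a QR-based solver is usually preferred numerically.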
(2) Stepwise regression
An alternative approach to variable selection is stepwise regression. It is one of various methods that evaluate only a small number of subset regression models by either adding or deleting regressors one at a time. Stepwise regression is a popular combination of the forward selection and backward elimination procedures [38].
The forward selection procedure begins with the assumption that there are no regressors in the model other than the intercept. An effort is made to find an optimal subset by inserting regressors into the model one at a time. The first regressor selected for entry into the equation is the one that has the largest simple correlation with the response variable $y$. Suppose that this regressor is $x_1$. This is also the regressor that will produce the largest value of the F-statistic for testing significance of regression. This regressor is entered if the F-statistic exceeds a preselected F-value, say $F_{IN}$ (or F-to-enter). The second regressor chosen for entry is the one that now has the largest correlation with $y$ after adjusting for the effect of the first regressor entered ($x_1$) on $y$. We refer to these correlations as partial correlations. They are the simple correlations between the residuals from the regression $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1$ and the residuals from the regressions of the other candidate regressors on
$x_1$, say $\hat{x}_j = \hat{\alpha}_{0j} + \hat{\alpha}_{1j} x_1$, $j = 2, 3, \ldots, K$.
Suppose that at Step 2 the regressor with the highest partial correlation with $y$ is $x_2$. This implies that the largest partial F-statistic is
$$F = \frac{SS_R(x_2 \mid x_1)}{MS_E(x_1, x_2)}. \tag{2.20}$$
If this F-value exceeds $F_{IN}$, then $x_2$ is added to the model. In general, at each step the regressor having the highest partial correlation with $y$ (or equivalently the largest partial F-statistic given the other regressors already in the model) is added to the model if its partial F-statistic exceeds the preselected entry level $F_{IN}$ [38]. The procedure terminates either when the partial F-statistic at a particular step does not exceed $F_{IN}$ or when the last candidate regressor is added to the model.
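The forward selection steps described above can be sketched as follows. This is an illustrative sketch under our own assumptions, not the implementation used in this work: the helper names `rss` and `forward_selection`, the default cutoff `f_in=4.0`, and the synthetic data are all hypothetical. The partial F-statistic is computed as the reduction in the residual sum of squares divided by the mean square error of the enlarged model, as in Eq. (2.20).

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the columns in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_selection(X, y, f_in=4.0):
    """Add one regressor at a time while the largest partial F exceeds f_in.

    Assumes more observations than candidate regressors plus one.
    """
    n, k = X.shape
    selected = []
    while len(selected) < k:
        rss_old = rss(X, y, selected)
        best_j, best_f = None, -np.inf
        for j in range(k):
            if j in selected:
                continue
            rss_new = rss(X, y, selected + [j])
            # MSE of the enlarged model: intercept + len(selected) + 1 terms.
            mse_new = rss_new / (n - len(selected) - 2)
            f = (rss_old - rss_new) / mse_new    # partial F for candidate j
            if f > best_f:
                best_j, best_f = j, f
        if best_f <= f_in:
            break                                # no candidate exceeds F_IN
        selected.append(best_j)
    return selected

# Synthetic data (an assumption): only columns 0 and 2 affect the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = 2.0 + 4.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=40)

selected = forward_selection(X, y)
print("entered, in order:", selected)
```

Ranking candidates by partial F is equivalent to ranking them by partial correlation with $y$, so the residual-on-residual regressions need not be formed explicitly.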
Forward selection begins with no regressors in the model and attempts to insert variables until a suitable model is obtained. Backward elimination attempts to find a good model by working in the opposite direction. That is, we begin with a model that includes all $K$ candidate regressors. Then the partial F-statistic (or a t-statistic, which is equivalent) is computed for each regressor as if it were the last variable to enter the model. The smallest of these partial F-statistics is compared with a preselected value, $F_{OUT}$ (or F-to-remove), and if the smallest partial F-value is less than $F_{OUT}$, that regressor is removed from the model.
Now a regression model with K − 1 regressors is constructed, the partial F-statistics for
this new model are calculated, and the procedure is repeated. The backward elimination algorithm terminates when the smallest partial F-value is not less than the preselected cutoff value $F_{OUT}$ [38].
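Backward elimination can be sketched in the same style. Again, this is a hedged illustration rather than the code used in this work: the names `rss` and `backward_elimination`, the default `f_out=4.0`, and the synthetic data are assumptions.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the columns in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def backward_elimination(X, y, f_out=4.0):
    """Start from all K regressors and drop the weakest while its F < f_out.

    Assumes more observations than candidate regressors plus one.
    """
    n, k = X.shape
    model = list(range(k))
    while model:
        rss_full = rss(X, y, model)
        mse_full = rss_full / (n - len(model) - 1)
        # Partial F of each regressor as if it were the last one to enter.
        partial_f = {
            j: (rss(X, y, [c for c in model if c != j]) - rss_full) / mse_full
            for j in model
        }
        worst = min(partial_f, key=partial_f.get)
        if partial_f[worst] >= f_out:
            break                                # smallest F not below F_OUT
        model.remove(worst)
    return model

# Synthetic data (an assumption): only columns 0 and 2 affect the response.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 2.0 + 4.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=40)

kept = backward_elimination(X, y)
print("retained regressors:", kept)
```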
Backward elimination is often a very good variable selection procedure. It is particularly favored by analysts who like to see the effect of including all the candidate regressors, just so that nothing obvious will be missed.
The two procedures described above suggest a number of possible combinations. One of the most popular is the stepwise regression algorithm, whose flowchart is shown in Fig. 2.4. This is a modification of forward selection in which at each step all regressors entered into the model previously are reassessed via their partial F- or t-statistics. A regressor added at an earlier step may now be redundant because of the relationship between it and regressors now in the equation. If the partial F-statistic for a variable is less than $F_{OUT}$, that variable is dropped from the model.
Stepwise regression requires two cutoff values, $F_{IN}$ and $F_{OUT}$. Some analysts prefer to choose $F_{IN} = F_{OUT}$, although this is not necessary. Sometimes we choose $F_{IN} > F_{OUT}$, making it more difficult to add a regressor than to delete one [38].
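Combining the two procedures gives the stepwise algorithm: a forward step followed by a reassessment of every regressor already in the model. The sketch below is one possible reading of that description, not the implementation used in this work. The names `rss`, `partial_f`, and `stepwise`, the default cutoffs, and the synthetic data are assumptions. The check that `f_in >= f_out` is our own guard, added because a removal cutoff larger than the entry cutoff could make the procedure cycle.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the columns in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def partial_f(X, y, cols, j):
    """Partial F of regressor j (j in cols), given the other columns in cols."""
    rss_with = rss(X, y, cols)
    rss_without = rss(X, y, [c for c in cols if c != j])
    mse = rss_with / (len(y) - len(cols) - 1)
    return (rss_without - rss_with) / mse

def stepwise(X, y, f_in=4.0, f_out=4.0):
    """Forward selection with a backward reassessment after every entry."""
    if f_in < f_out:
        raise ValueError("this sketch assumes f_in >= f_out")
    k = X.shape[1]
    model = []
    while len(model) < k:
        # Forward step: find the candidate with the largest partial F.
        entry = {j: partial_f(X, y, model + [j], j)
                 for j in range(k) if j not in model}
        best = max(entry, key=entry.get)
        if entry[best] <= f_in:
            break
        model.append(best)
        # Backward step: drop regressors whose partial F fell below f_out.
        while model:
            removal = {j: partial_f(X, y, model, j) for j in model}
            worst = min(removal, key=removal.get)
            if removal[worst] >= f_out:
                break
            model.remove(worst)
    return model

# Synthetic data (an assumption): only columns 0 and 2 affect the response.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
y = 2.0 + 4.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=60)

model = stepwise(X, y)
print("final model:", model)
```

With `f_in >= f_out`, a regressor that has just been removed cannot immediately re-enter, because its entry F against the reduced model equals the removal F that has just fallen below `f_out`.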
[Flowchart boxes: calculate the correlation matrix; choose the X with the largest correlation with Y to build the regression model; choose the X whose partial F ratio is the largest; decision "significant in the model?" (Yes branch); discard non-significant X.]
Figure 2.4: A flowchart of the stepwise regression algorithm used in our work.