Optimum design of experiments for statistical inference


Steven G. Gilmour

University of Southampton, UK

and Luzia A. Trinca

Universidade Estadual Paulista, Botucatu, Brazil

[Read before The Royal Statistical Society on Wednesday, November 16th, 2011, the President, Professor V. S. Isham, in the Chair]

Summary. One attractive feature of optimum design criteria, such as D- and A-optimality, is that they are directly related to statistically interpretable properties of the designs that are obtained, such as minimizing the volume of a joint confidence region for the parameters. However, the assumed relationships with inferential procedures are valid only if the variance of experimental units is assumed to be known. If the variance is estimated, then the properties of the inferences depend also on the number of degrees of freedom that are available for estimating the error variance. Modified optimality criteria are defined, which correctly reflect the utility of designs with respect to some common types of inference. For fractional factorial and response surface experiments, the designs that are obtained are quite different from those which are optimal under the standard criteria, with many more replicate points required to estimate error. The optimality of these designs assumes that inference is the only purpose of running the experiment, but in practice interpretation of the point estimates of parameters and checking for lack of fit of the treatment model assumed are also usually important. Thus, a compromise between the new criteria and others is likely to be more relevant to many practical situations. Compound criteria are developed, which take account of multiple objectives, and are applied to fractional factorial and response surface experiments. The resulting designs are more similar to standard designs but still have sufficient residual degrees of freedom to allow effective inferences to be carried out. The new procedures developed are applied to three experiments from the food industry to see how the designs used could have been improved and to several illustrative examples. The design optimization is implemented through a simple exchange algorithm.

Keywords: A-optimality; Blocking; Compound criterion; D-optimality; Exchange algorithm; Factorial design; Lack of fit; Pure error; Response surface

1. Introduction

Experiments with complex treatment structures, such as fractional factorial, response surface or mixtures designs, are very common in industrial research and development, as well as in many laboratory-based sciences. In such experiments, variance-based optimality criteria are increasingly used, owing to their availability in software packages, to choose the treatment design and, if appropriate, to arrange the design in blocks. An advantage of criteria such as D-optimality, A- (or more generally L-)optimality, G-optimality, etc., is that they have meaningful interpretations in relation to the statistical analysis to be performed on the data from the experiment.

For example, a D-optimum design minimizes the volume of a joint confidence region for the parameters, an A-optimum design minimizes the average variance of parameter estimates and minimizes the average squared width of the corresponding confidence intervals and so on. Very importantly, multiple objectives can be met through the use of compound criteria, although this is not yet common in practice. For a comprehensive and accessible description of optimality criteria and their applications, see Atkinson et al. (2007).

Address for correspondence: Steven G. Gilmour, School of Mathematics, University of Southampton, Highfield, Southampton, SO17 1BJ, UK.

E-mail: s.gilmour@soton.ac.uk

The work that is presented here was motivated mainly by experimental research carried out for the food industry. Fractional factorial and response surface designs are widely used for such experiments, but the relatively large run-to-run variation that is caused by the use of biological materials means that some of the very small designs that are used in some other industrial applications are ineffective. The analysis of data involves inferences based on confidence intervals or hypothesis tests, to ensure that fitted response surfaces are not overinterpreted as showing effects which could be due to random variation. Despite the supposed relationships between optimality criteria and common methods of analysing the data from these experiments, many experimenters still prefer to use standard designs, such as central composite designs (CCDs). One advantage of classical designs is that, unlike most optimum designs, they include replicate points.

In almost all experiments, the statistical inference procedures that are used rely on an internal estimate of the variance between experimental units, which is often called ‘error’. In experiments in which the proposed model for treatment effects includes fewer parameters than there are treatments, there are some differences in practice about which estimate of error is used. The default in many statistical computing packages, especially if a general regression program is used, and the procedure that is described in many regression textbooks (which are often aimed at analysing observational data), is to use the residual mean square from the fitted model. However, most textbooks on the design and analysis of experiments recommend separating lack of fit from ‘pure error’ and using the pure error mean square for inference.

We strongly recommend the latter method of estimating error. However, once this has been accepted, the usual definitions of optimum design criteria no longer have all of the statistical interpretations claimed for them, since the sizes of confidence intervals and regions, or equivalently the power of hypothesis tests, depend not only on the variance matrix of the parameter estimates, but also on the degrees of freedom for pure error. The objectives of this paper are to show how the criteria should be correctly defined in order to have the properties claimed, to show that they can be applied in some of the same ways as the traditional criteria and to illustrate by examples that quite different designs can be optimal under the new criteria.

In Section 2, we discuss the analysis of data and, in particular, the estimation of error for inferential procedures. In Section 3, adjusted definitions of various optimality criteria which have the desired interpretations and examples of their use to choose designs are given. In Section 4 compound criteria which allow a compromise between different experimental objectives are developed and illustrated. Finally, some overall lessons are drawn in Section 5.

The programs that were used to obtain near optimum designs can be obtained from http://www.blackwellpublishing.com/rss

2. Inference from designed experiments

The analysis of data from factorial-type designs typically includes fitting one or more polynomial models and carrying out hypothesis tests on these models, in particular to compare models of different orders and to test whether the model is better than a null model, but also to test individual high order parameters. Once a reasonable model has been found it is interpreted by estimating the parameters. Except in rare cases in which the variance of experimental units, $\sigma^2$, can be assumed to be known, all such tests require an estimate of $\sigma^2$. We assume that estimating $\sigma^2$ is not of interest in itself but is required only for carrying out inference that is related to the treatment comparisons.

For a completely randomized design, in general, we assume that an experiment has been run with $t$ treatments defined by combinations of the levels of the factors and that the responses can be modelled as

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, t, \quad j = 1, \ldots, n_i, \tag{1}$$

where $Y_{ij}$ is the response from the $j$th replicate of treatment $i$, $\mu_i$ is the expected response from treatment $i$, $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2 I$. We shall refer to this as the full treatment model. Experimenters often try to make interpretation easier and more informative by fitting a submodel with

$$\mu_i = \beta_0 + f(x_i)'\beta, \qquad i = 1, \ldots, t, \tag{2}$$

where $x_i$ represents the levels of the factors $X_1, \ldots, X_q$ in treatment $i$, $f$ is a $(p-1)$-dimensional function of these levels and $\beta$ is a $(p-1)$-dimensional vector of parameters. In this paper we shall assume that $f$ represents a polynomial which respects functional marginality, but the same arguments would apply to other linear models.
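The split of residual degrees of freedom implied by models (1) and (2) can be sketched in a few lines. The sketch assumes the usual decomposition for a completely randomized design: with $n$ runs and $t$ distinct treatments, the full treatment model (1) leaves $n - t$ degrees of freedom for pure error, and the submodel (2) with $p$ parameters leaves $t - p$ for lack of fit (taking $t \ge p$). The function name and the small design are hypothetical illustrations.

```python
def error_degrees_of_freedom(runs, p):
    """Return (pure error df, lack-of-fit df) for a completely randomized design.

    runs: one tuple of factor levels per experimental unit.
    p:    number of parameters in submodel (2), including the intercept.
    """
    n = len(runs)            # number of experimental units
    t = len(set(runs))       # number of distinct treatments
    pure_error = n - t       # residual df after fitting the full treatment model (1)
    lack_of_fit = t - p      # treatment df not accounted for by submodel (2)
    return pure_error, lack_of_fit


# Hypothetical example: a 2 x 2 factorial plus three centre points,
# analysed with a first-order model (intercept and two slopes, p = 3).
design = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0, 0), (0, 0)]
print(error_degrees_of_freedom(design, p=3))
```

Only replicated treatments contribute to pure error, which is why designs without replicate points leave nothing for the pure error estimate.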

The first question that we shall address is whether, when performing inference from model (2), we should use the estimate of $\sigma^2$ from fitting model (2), or the pure error estimate obtained from fitting model (1). To illustrate the difference that it might make, consider exercise 11.6 of Box and Draper (2007). They analysed data from a three-factor rotatable CCD with four centre points, with one gross outlier (from a factorial point) removed. The pure error estimate of $\sigma^2$ is $s^2 = 10.77$ on 3 degrees of freedom, whereas the estimate that is obtained from the second-order model, i.e. pooling the pure error and second-order model lack-of-fit degrees of freedom, is $s_p^2 = 7.03$ on 7 degrees of freedom. The mean square for second-order effects is 30.90 and, using $s_p^2$, Box and Draper found that the test of the second-order parameters gives a p-value of 0.037 and went on to do further interpretation of this model. However, using $s^2$, the test of the second-order parameters gives a test statistic of $F = 2.87$, which on 6 and 3 degrees of freedom gives a p-value of 0.208, which would suggest that there is little justification for further interpreting the second-order model. A more clear-cut recommendation could have been given if the design had allowed more than 3 degrees of freedom for pure error.
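The two tests in this example can be checked numerically. The sketch below uses only the Python standard library and evaluates the upper tail of the F-distribution by Simpson's rule applied to the incomplete beta integral; this numerical route, the function name and the step count are choices made for illustration, not part of the original analysis. It assumes integer degrees of freedom and a positive test statistic.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=2000):
    """P(F > f_value) for an F(df1, df2) variable, by Simpson's rule.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f), and the
    substitution t = u**2 so that the integrand is smooth near zero.
    """
    a, b = df2 / 2.0, df1 / 2.0
    upper = math.sqrt(df2 / (df2 + df1 * f_value))
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(u):
        return 2.0 * u ** (2 * a - 1) * (1.0 - u * u) ** (b - 1)

    h = upper / steps                      # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0 / math.exp(log_beta)


ms_second_order = 30.90                    # mean square for second-order terms, 6 df
estimates = [("pure error", 10.77, 3), ("pooled", 7.03, 7)]
for label, s2, df in estimates:
    f_stat = ms_second_order / s2
    print(label, round(f_stat, 2), round(f_upper_tail(f_stat, 6, df), 3))
```

Run as written, this should give p-values close to the 0.208 and 0.037 quoted above, so the change of conclusion comes both from the smaller variance estimate and from the larger denominator degrees of freedom.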

This is not an isolated example. A few pages earlier in the same book (Box and Draper (2007), pages 668–669), one example has $s^2 = 5.667$ and $s_p^2 = 19.612$, although lack of fit is not significant, and the next example has $s^2 = 220.5$ and $s_p^2 = 81.7$. Although in these two examples the qualitative conclusions are not changed greatly, standard errors could be quite drastically underestimated or overestimated and confidence intervals can be too narrow or too wide. Clearly, it is an important decision which estimate of $\sigma^2$ is used, but there is no unanimity among researchers. In many of the textbooks which users of factorial and response surface designs use, there is no acknowledgement of the complexity of the issue. Khuri and Cornell (1996), Dean and Voss (1999), Box and Draper (2007) and Atkinson et al. (2007) recommend using $s_p^2$. Cochran and Cox (1957) and John (1971) recommend using $s^2$. Cornell (2002) seems to recommend both in different sections and Hinkelmann and Kempthorne (2005) seem to give a different recommendation from Hinkelmann and Kempthorne (2008).

Although most researchers do not discuss the issue and their practice varies, those who do discuss it generally come down in favour of using pure error, i.e. $s^2$. Among some of the classic texts, Draper and Smith (1998), page 48, state that $s^2$

‘will usually provide an estimate of $\sigma^2$ which is much more reliable than we can obtain from any other source’,

adding that

‘For this reason, it is sensible when designing experiments to arrange for (replicates)’.

In the context of factorial experiments, Scheffé (1959), pages 126–127, strongly condemned the practice of pooling sums of squares from non-significant interactions because it leads to biased estimation of $\sigma^2$ (since pooling of zero effects will only be done when their estimates turn out to be small) and leads to procedures whose statistical properties are not known. He also recommends

‘designing the experiment so that there will be a sufficient number of d.f. for (pure) error’.

Cox (1958) agrees that $\sigma^2$ ‘is best estimated’ by pure error, but is more forgiving and allows the use of $s_p^2$ if ‘only one observation is made on each treatment’. Davies (1956) agrees and Wu and Hamada (2009) tend towards the same view but make a less clear-cut recommendation.

We accept the view that inferential procedures should be carried out by using the unbiased pure error estimator of $\sigma^2$, while acknowledging that statistical inference is often not the most important part of the analysis and interpretation of experimental data. The main problem with using $s_p^2$ is that the biases induced are unmeasurable and the inferences are therefore difficult to interpret. In any particular experiment, if carrying out inference is regarded as being important, then the design should be chosen to make that inference as informative as possible. In the next section, we show how the standard optimality criteria must be modified to do this. Later, we shall consider compound criteria for situations in which inference represents a part, but not the whole, of the means of analysing the data from the experiment.

3. Design criteria for inferential procedures

3.1. Definitions

If the data from an experiment are to be analysed primarily by using confidence intervals or regions and/or hypothesis tests, then the experiment should be designed to ensure that these procedures will be as informative as possible. For simplicity of presentation and to clarify the relationships with standard criteria, we shall assume that the aim is to obtain unbiased confidence intervals or regions of minimal length or volume. The same criteria will maximize the power of hypothesis tests which can be obtained from these confidence intervals or regions, e.g. t-tests of individual parameters or F-tests of sets of parameters. To carry out these procedures the design must allow a sufficient number of degrees of freedom for estimating error. We can, in fact, specify formally the appropriate criterion.

The usual statistical justification for D-optimality is that it minimizes the volume of the joint confidence region for the parameters; see, for example, Atkinson et al. (2007), page 135. This is based on the fact that the volume of the confidence region is proportional to $|X'X|^{-1/2}$, where $X$ is the polynomial model matrix, given the treatment design, with $i$th row $(1 \;\; f(x_i)')$, and so the D-criterion minimizes $1/|X'X|$. Although this is correct, as noted by Kiefer (1959), ‘with $\sigma^2$ known or else (pure error degrees of freedom) the same for all designs’, in a general form, the volume of a $100(1-\alpha)\%$ confidence region (see Draper and Smith (1998), page 145) is proportional to

$$(F_{p,d;1-\alpha})^{p/2}\,|X'X|^{-1/2},$$

where $p$ is the number of parameters in the model, $d$ is the number of pure error degrees of freedom and $F_{p,d;1-\alpha}$ is the $(1-\alpha)$-quantile of the F-distribution with $p$ numerator and $d$ denominator degrees of freedom. Thus the D-criterion should be to minimize

$$(F_{p,d;1-\alpha})^{p} / |X'X|.$$

We shall refer to this as the $DP(\alpha)$ criterion. In this paper, we shall use $\alpha = 0.05$ for illustration and refer to the criterion simply as DP, but the required confidence level should be considered carefully for each experiment. Despite the above quotation, Kiefer (1959) did not suggest this additional step, since he did not separate lack of fit from pure error.
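As a minimal sketch of how the DP criterion might be evaluated for a given design, the code below counts the pure error degrees of freedom $d$, forms $X'X$ and multiplies $1/|X'X|$ by $(F_{p,d;1-\alpha})^p$. Everything numerical here is an assumption made for illustration (a Simpson-rule tail probability, bisection for the quantile, and a hypothetical first-order model in two factors); it is not the authors' program.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=1000):
    """P(F > f_value) via Simpson's rule on the incomplete beta integral."""
    a, b = df2 / 2.0, df1 / 2.0
    upper = math.sqrt(df2 / (df2 + df1 * f_value))
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    g = lambda u: 2.0 * u ** (2 * a - 1) * (1.0 - u * u) ** (b - 1)
    h = upper / steps
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3.0 / math.exp(log_beta)


def f_quantile(prob, df1, df2):
    """Quantile F_{df1,df2;prob}, found by bisection on the upper tail."""
    alpha = 1.0 - prob
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:    # expand until bracketed
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def dp_criterion(runs, model_row, alpha=0.05):
    """(F_{p,d;1-alpha})^p / |X'X|; smaller is better, inf if not estimable."""
    x = [model_row(run) for run in runs]
    p = len(x[0])
    d = len(runs) - len(set(runs))               # pure error degrees of freedom
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    det = determinant(xtx)
    if d == 0 or det <= 0.0:
        return math.inf
    return f_quantile(1.0 - alpha, p, d) ** p / det


first_order = lambda run: [1.0, run[0], run[1]]  # hypothetical model: 1, x1, x2
corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
replicated = corners * 2                         # d = 4
centre_heavy = corners + [(0, 0)] * 4            # d = 3
unreplicated = corners + [(-1, 0), (1, 0), (0, -1), (0, 1)]   # d = 0
for name, runs in [("replicated", replicated), ("centre_heavy", centre_heavy),
                   ("unreplicated", unreplicated)]:
    print(name, dp_criterion(runs, first_order))
```

Returning infinity when $d = 0$ is a convention of this sketch: with no replicate points there is no pure error estimate, so the confidence region cannot be formed at all.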

Similarly, $D_S$-optimality is intended to minimize the volume of a joint confidence region for a subset of $p_2$ of the parameters by minimizing $|(M^{-1})_{22}|$, where $M = X'X$ and $(M^{-1})_{22}$ is the portion of its inverse corresponding to the subset of the parameters of interest. To take account of pure error estimation correctly, the $(DP)_S$-criterion is to minimize

$$(F_{p_2,d;1-\alpha})^{p_2}\,|(M^{-1})_{22}|.$$

This criterion should be used, for example, when a major objective of the experiment is to compare the first-order model with the second-order model. Then the higher order terms will form the subset and minimizing the volume of a confidence region for them will be equivalent to maximizing the power of a test for their existence. Note that if the parameters of interest are the treatment parameters and the nuisance parameter(s) is or are the intercept or the intercept plus block effects, then standard $D_S$-optimality reduces to D-optimality. With the new criterion, this is no longer true, owing to the reduction in the numerator degrees of freedom.

Throughout this paper, we shall use $D_S$ to refer to the situation where all parameters of the polynomial treatment model (excluding the intercept) contribute to the criterion, unless otherwise stated.

L-optimality is intended to minimize the mean of the variances of several linear functions of the parameters, defined by $L'(\beta_0 \;\; \beta')'$, by minimizing $\mathrm{tr}\{W(X'X)^{-1}\}$, where $W = LL'$. If $W$ is diagonal, this reduces to weighted-A-optimality and if all the diagonal elements are equal we obtain A-optimality, whereas, if $p_2$ of them are equal and the rest are 0, we obtain $A_S$-optimality. Note that A- and $A_S$-optimality are scale dependent, i.e. they are not invariant to linear reparameterizations, so, whenever a design is described as A or $A_S$ optimal, we shall state with respect to which scaling. If $L$ has a single column, the criterion reduces to c-optimality. With L-optimality the property that is claimed for the criterion can be related to point estimation and so the standard criteria are acceptable in this sense. However, the width of a confidence interval for the $i$th linear function of the parameters is proportional to $\sqrt{F_{1,d;1-\alpha}\, l_i'(X'X)^{-1} l_i}$, where $l_i'$ is the $i$th row of the matrix $L'$, and so the mean of the squared lengths of such intervals is minimized by minimizing

$$F_{1,d;1-\alpha}\,\mathrm{tr}\{W(X'X)^{-1}\},$$

which we refer to as the LP-criterion, with the letter P also being used for the special cases, such as AP-optimality. It might be advisable to replace $F_{1,d;1-\alpha}$ with a similar quantity corrected for multiple testing, but this is rarely done in the analysis of data from experiments of this type.
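To give a feel for the size of the multiplier in the LP-criterion, the following values use the identity $F_{1,d;1-\alpha} = t_{d;1-\alpha/2}^2$ with $\alpha = 0.05$ and standard tabulated $t$ quantiles (rounded; the choice of $d$ values is purely illustrative).

```latex
\[
\begin{array}{c|ccccc}
d              & 1      & 2     & 3     & 5    & 10   \\ \hline
t_{d;0.975}    & 12.706 & 4.303 & 3.182 & 2.571 & 2.228 \\
F_{1,d;0.95}   & 161.45 & 18.51 & 10.13 & 6.61  & 4.96
\end{array}
\]
```

Moving from 1 to 3 pure error degrees of freedom therefore reduces the LP-criterion by a factor of about 16 for a fixed value of $\mathrm{tr}\{W(X'X)^{-1}\}$, whereas moving from 5 to 10 reduces it by only about a quarter.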

The usual form of G-optimality minimizes the maximum variance of the estimated response over the design region by minimizing $\max_x \{(1 \;\; f(x)')(X'X)^{-1}(1 \;\; f(x)')'\}$. Again this is suitable for point estimation but, if (pointwise) confidence intervals for the mean response or prediction intervals for new units are required, we should define the GP-criterion to be to minimize

$$F_{1,d;1-\alpha}\,\max_x \{(1 \;\; f(x)')(X'X)^{-1}(1 \;\; f(x)')'\}.$$


The same general idea can be used for any other design optimality criterion which relates to the experiment’s ability to allow statistical inference of any type to be performed. Other obvious examples include I-optimality and T-optimality; see Atkinson et al. (2007) for these and other criteria.

We make the following comments about these new criteria.

(a) An echo of the general idea can be found in the last section of Fisher (1966), pages 242–245, in the context of sample size calculations. However, Fisher’s method is based on fiducial probability, which might have hindered its acceptance. As he wrote, it

‘is unintelligible only to those who over a long period resisted the cogency of the fiducial argument’.

(b) In experiments in which the only treatment design question is how many (greater than 0) replicates there should be of each treatment, e.g. experiments with unstructured treatments, the number of distinct treatments is constant, so $d$ depends only on the total number of experimental units and hence the new criteria are identical to the standard criteria.

(c) Asymptotically, as d → ∞, the new criteria converge to the standard criteria. Hence, in very large experiments, the designs that are chosen will be the same.

(d) The concept of continuous design is not meaningful with the new criteria, since the quantiles of the F-distributions are not proportional to $n$. Consequently, there are no equivalence theorems. Hence, for finding optimum or near optimum designs, there is usually no alternative to either an exhaustive search or a discrete optimization heuristic, such as an exchange algorithm.

(e) The standard versions of most criteria are meaningful in terms of point estimation, but for the D-criterion and its variants the standard version really has no statistical interpretation and should be abandoned.
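Comment (c) can be made explicit with a standard limit for F quantiles, written here in the notation used for the DP criterion:

```latex
\[
\lim_{d \to \infty} F_{p,d;1-\alpha} = \frac{\chi^2_{p;1-\alpha}}{p},
\qquad \text{so that} \qquad
\frac{(F_{p,d;1-\alpha})^{p}}{|X'X|}
\;\longrightarrow\;
\left(\frac{\chi^2_{p;1-\alpha}}{p}\right)^{p} \frac{1}{|X'X|} .
\]
```

The limiting multiplier does not depend on the design, so for large $d$ the DP criterion ranks designs in the same order as the standard D-criterion; the same argument applies to the other criteria with $p$ replaced by the relevant numerator degrees of freedom.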

Although these criteria are new, the idea of including enough degrees of freedom to estimate pure error is common in response surface methodology. Following Dykstra (1959), several researchers have considered partially replicated two-level designs, although optimality is not considered. More recently, Liao and Chai (2004) and Dasgupta and Jacroux (2010) evaluated their partially replicated designs by using the standard D-criterion. However, in all these cases, the choice of the number of pure error degrees of freedom is made informally and separately from optimality considerations.

We have implemented some of the new criteria in a standard exchange algorithm for constructing optimum designs. A candidate set of treatments (combinations of factors’ levels) is formed. A random initial design, selected from the candidate set, starts the search and the complete procedure is repeated for a number of different initial designs (tries), as is usual for exchange algorithms. The search proceeds by systematically making exchanges between treatments in the current design and in the candidate set and accepting any exchange that improves the criterion function. Many other types of exchange algorithm exist, e.g. making use of the ideas of excursion (making more than one exchange before evaluating the design) or stochastic ideas, such as accepting with small probability an exchange which makes the design worse. For typical response surface problems, the simple exchange algorithm that we use is as good as any other, although for specific examples it might be possible to do slightly better. The numbers of pure error degrees of freedom are obtained by labelling the treatment combinations and counting the number of treatment levels at each exchange tried. These algorithms will find near optimum designs, but there is no guarantee that they will find the global optimum.
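The search described above can be sketched as a simple point exchange. This is an illustrative reconstruction from the verbal description, not the authors' program: the function names, the number of tries, the seed and the toy problem (a first-order model in two factors, four runs, the standard D-criterion) are all assumptions. Any criterion function that returns a value to be minimized, such as a DP-type criterion, could be passed in instead.

```python
import itertools
import math
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def d_criterion(runs):
    """1 / |X'X| for a first-order model with intercept (smaller is better)."""
    x = [[1.0, *run] for run in runs]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    det = determinant(xtx)
    return 1.0 / det if det > 1e-9 else math.inf


def point_exchange(start, candidates, criterion):
    """Sweep over runs and candidates, accepting every improving exchange."""
    design = list(start)
    best = criterion(design)
    improved = True
    while improved:
        improved = False
        for i in range(len(design)):
            for cand in candidates:
                trial = design[:i] + [cand] + design[i + 1:]
                value = criterion(trial)
                if value < best - 1e-12:      # strict improvement only
                    design, best, improved = trial, value, True
    return design, best


def search(candidates, n_runs, criterion, tries=10, seed=0):
    """Repeat the exchange from several random starts and keep the best."""
    rng = random.Random(seed)
    results = []
    for _ in range(tries):
        start = [rng.choice(candidates) for _ in range(n_runs)]
        results.append(point_exchange(start, candidates, criterion))
    return min(results, key=lambda pair: pair[1])


candidates = list(itertools.product([-1, 0, 1], repeat=2))
design, value = search(candidates, n_runs=4, criterion=d_criterion)
print(sorted(design), value)
```

For this toy problem the $2^2$ factorial attains the upper bound $|X'X| = 64$, so the search can be checked against a known optimum; in realistic problems only near optimality can be expected, as noted above.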


3.2. Examples

To show how the optimum designs using the new criteria differ from those using the standard criteria, we present some illustrative examples. $D_S$- and $A_S$-optimum designs are found for polynomial models, the nuisance parameter being the intercept, such that $(M^{-1})_{22} = (X'Q_0X)^{-1}$, where $Q_0 = I - (1/n)\mathbf{1}\mathbf{1}'$ and the $X$-matrix does not include the column of 1s. When aiming at a second-order model in a cubic region of experimentation under the $A_S$-criterion, we use weights to bring the different effects to the same scale, i.e. linear and interaction terms are given weight 1 and quadratic effects are given weight 0.25. This ensures that the optimum design for each individual parameter gives the same variance for the relevant parameter and can be considered as a simple application of the scaling that was recommended for non-linear models by Atkinson et al. (1993). The elements of the diagonal of $W$ were scaled to add up to 1. In a spherical region, all parameters are given equal weights because they contribute equally to the polynomial approximation to the unknown true response function. We use the notation of subset designs to describe the treatments that are used. For $q$ factors, $S_r$ is defined as the subset from the full factorial, in which $r$ factors appear as $\pm 1$ (or plus or minus another constant in a spherical region) and, if $r < q$, $q - r$ factors appear as 0; see Gilmour (2006) and Ahmad and Gilmour (2010) for more details on subset designs. The following tables show $D_S$-, $(DP)_S$-, $A_S$-, $(AP)_S$- and, in some cases, compound optimum designs for different numbers of factors and runs. Compound criteria are discussed in Section 4. To discriminate better between the designs, the tables also show pure error, PE, and lack of fit, LoF, degrees of freedom and efficiencies (in percentages) with respect to the best design found under each criterion.
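The subset notation is easy to make concrete. The sketch below generates $S_r$ for $q$ factors and combines replicated subsets into a design; the function names and the dictionary interface are hypothetical, and a single common level is used for all active factors, which matches the cubic-region case.

```python
import itertools


def subset_points(q, r, level=1.0):
    """Points of S_r for q factors: r factors at +/-level, the other q - r at 0."""
    points = []
    for active in itertools.combinations(range(q), r):
        for signs in itertools.product([-level, level], repeat=r):
            point = [0.0] * q
            for factor, sign in zip(active, signs):
                point[factor] = sign
            points.append(tuple(point))
    return points


def combine(q, copies_by_subset):
    """Build a design such as 2*S_3 + S_1 + 4*S_0 from a {r: copies} mapping."""
    design = []
    for r, copies in copies_by_subset.items():
        design.extend(subset_points(q, r) * copies)
    return design


designs = {
    "S3 + S1 + 2S0 (CCD)": combine(3, {3: 1, 1: 1, 0: 2}),
    "S2 + 4S0 (modified Box-Behnken)": combine(3, {2: 1, 0: 4}),
    "2S3 + S1 + 4S0": combine(3, {3: 2, 1: 1, 0: 4}),
}
for name, runs in designs.items():
    pure_error_df = len(runs) - len(set(runs))
    print(name, len(runs), pure_error_df)
```

For three factors, $S_3$, $S_2$, $S_1$ and $S_0$ contain 8, 12, 6 and 1 points respectively, so the three designs above have 16, 16 and 26 runs, with 1, 3 and 11 pure error degrees of freedom.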

3.2.1. Example 1 (n = 16; q = 3; p = 10)

In Table 1 we show a few designs, under the second-order model, for three factors in 16 runs. Using the $D_S$- or $A_S$-criteria resulted in the same design I which, as usual, includes very few points close to the centre of the design region. There are no replicated treatments and thus the design does not allow pure error estimation. It does, however, allow 6 degrees of freedom for lack of fit. Using the pure error versions of the criteria resulted in 6 and 5 degrees of freedom for the $(DP)_S$- and $(AP)_S$-criteria, designs II and III, respectively. Design II is very extreme in the sense that it has no degrees of freedom for checking lack of fit. In both designs there is replication of points in the corners and on the faces of the cube, but no centre points. This is quite different from the usual practice of experimenters. Design IV is $(AP)_S$ ($\alpha = 0.05/9$) optimal, i.e. we used a Bonferroni adjustment for multiple comparisons. Although designs II and IV are equivalent in terms of their pure error degrees of freedom they have different properties for estimating the effects: the former being better for joint inferences on the treatment parameters; the latter for inferences on individual parameters. For comparison we also evaluated two subset designs, a CCD with two centre points ($S_3 + S_1 + 2S_0$), allowing 1 degree of freedom for pure error, and a modified Box–Behnken design ($S_2 + 4S_0$), allowing 3 degrees of freedom for pure error. The CCD is quite similar to the $(AP)_S$-optimum design in terms of properties of the information matrix ($D_S$-eff = 93.15; $A_S$-eff = 90.75) but poorer for pure error estimation ($(DP)_S$-eff = 1.91; $(AP)_S$-eff = 4.31). The Box–Behnken design is a compromise in terms of pure error and lack-of-fit degrees of freedom, but is poor in terms of variance properties ($D_S$-eff = 74.94; $A_S$-eff = 66.34; $(DP)_S$-eff = 41.95; $(AP)_S$-eff = 50.17). The desirability of such compromises will be discussed further in Section 4.
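The efficiencies under the new criteria can be related to those under the standard criteria. Assuming the usual definitions (a $D_S$-efficiency is a ratio of determinants raised to the power $1/p_2$, an $A_S$-efficiency is a ratio of traces), the pure error versions differ only by a ratio of F quantiles. The sketch below applies this to the efficiencies reported for designs II and III in Table 1; the quantiles are rounded values from standard F tables and the function names are hypothetical.

```python
# Upper 5% points F_{df1,df2;0.95}, rounded values taken from standard tables.
F_QUANTILE = {(9, 5): 4.772, (9, 6): 4.099, (1, 5): 6.608, (1, 6): 5.987}


def dp_s_efficiency(ds_eff, d, ds_eff_best, d_best, p2=9):
    """(DP)_S-efficiency (%) relative to the best (DP)_S design found.

    ds_eff, ds_eff_best: D_S-efficiencies of the design and of the best design.
    d, d_best:           their pure error degrees of freedom.
    """
    quantile_ratio = F_QUANTILE[(p2, d_best)] / F_QUANTILE[(p2, d)]
    return 100.0 * quantile_ratio * ds_eff / ds_eff_best


def ap_s_efficiency(as_eff, d, as_eff_best, d_best):
    """(AP)_S-efficiency (%) relative to the best (AP)_S design found."""
    quantile_ratio = F_QUANTILE[(1, d_best)] / F_QUANTILE[(1, d)]
    return 100.0 * quantile_ratio * as_eff / as_eff_best


# Design III (5 pure error df) judged against design II (6 df) under (DP)_S.
print(round(dp_s_efficiency(93.03, 5, 83.09, 6), 2))
# Design II (6 pure error df) judged against design III (5 df) under (AP)_S.
print(round(ap_s_efficiency(66.52, 6, 86.27, 5), 2))
```

Under these assumptions the results agree, to rounding, with the tabulated values of 96.17 and 85.10, which shows how a modest loss in the information matrix is traded against a single extra degree of freedom for pure error.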

3.2.2. Example 2 (n = 18; q = 3; p = 10)

For the experiment that was described in exercise 11.6 of Box and Draper (2007), which was mentioned in Section 2, several optimum designs are shown in Table 2. The efficiencies for the Box and Draper design (using the axial points at $\sqrt{3}$, which is slightly different from those actually used) are $D_S$-eff = 96.36, $A_S$-eff = 98.68, $(DP)_S$-eff = 42.46 and $(AP)_S$-eff = 63.10. Note that design III is very similar to this design. Again the designs that are built by the new criteria give quite extreme designs which allow little or no testing for lack of fit.

Table 1. Optimum† designs and their properties for three three-level factors in 16 runs under the second-order model (example 1)

Criterion         D_S or A_S        (DP)_S            (AP)_S
Design            I                 II                III

                  X1  X2  X3        X1  X2  X3        X1  X2  X3
                  −1  −1  −1        −1  −1   1        −1  −1  −1
                  −1  −1   1        −1  −1   1        −1  −1   1
                  −1   1  −1        −1   1  −1        −1   1  −1
                  −1   1   1        −1   1  −1        −1   1  −1
                   1  −1  −1        −1   1   1        −1   1   1
                   1  −1   1         1  −1   1         1  −1  −1
                   1   1  −1         1  −1   1         1  −1   1
                   1   1   1         1   1   1         1  −1   1
                   1  −1   0         1   0  −1         1   1  −1
                   1   1   0         1   0  −1         1   1   1
                   1   0  −1         0  −1  −1         1   0   0
                   1   0   1         0  −1  −1         1   0   0
                   0   1   1        −1   0   0         0   1   0
                  −1   0   0        −1   0   0         0   1   0
                   0  −1   0         0   1   0         0   0   1
                   0   0  −1         0   0   1         0   0   1

df (PE; LoF)‡     (0; 6)            (6; 0)            (5; 1)
D_S-eff           100.00            83.09             93.03
A_S-eff           100.00            66.52             86.27
(DP)_S-eff        0.00              100.00            96.17
(AP)_S-eff        0.00              85.10             100.00

(continued)

3.2.3. Example 3: cassava bread (n = 26; q = 3; p = 10)

Escouto (2000) performed an experiment to formulate a recipe for gluten-free bread based on cassava flour. Gluten-free food is recommended to people with coeliac disease, which is an autoimmune disorder of the small intestine which occurs in genetically predisposed individuals, and there is a large market for gluten-free versions of staple foods. The experiment involved three factors: $X_1$, the amount of powdered albumen (egg white) with levels 10, 20 and 30 g; $X_2$, the amount of yeast with levels 5, 10 and 15 g; $X_3$, the amount of ground cassava flour with levels 45, 55 and 65 g. Other substances such as salt, sugar, vegetable fat, powdered milk, fermented cassava starch and water were maintained at constant levels in all recipes, as were factors that were associated with the mixing and baking processes. A modified CCD, with duplicated factorial points and four centre points ($2S_3 + S_1 + 4S_0$) in 26 runs was planned, allowing 11 degrees of freedom for pure error. Several organoleptic characteristics of the product were evaluated as response variables. The objective was to find a formulation which would present similar characteristics to wheat-based white bread. Several alternative designs are presented in Table 3.

Again the $D_S$- and $A_S$-criteria give identical designs (design I) allowing 9 degrees of freedom for pure error. Using the $(DP)_S$- and $(AP)_S$-criteria resulted in 15 and 12 degrees of freedom for pure error respectively. Using a Bonferroni correction for multiple comparisons, the $(AP)_S$ ($\alpha = 0.05/9$) optimum design (design IV) allows 13 degrees of freedom for pure error. These designs show that, for carrying out inference, it is beneficial to sacrifice a little in terms of the traditional criteria to improve the estimation of $\sigma^2$.

For comparison we show in Table 4 the properties of a few subset designs, including the design ($2S_3 + S_1 + 4S_0$) which was actually used in the experiment. We note that this design allows similar numbers of pure error degrees of freedom to design III but it is poorer with respect to the other properties. A modified Box–Behnken design ($2S_2 + 2S_0$) allows the same number of pure error degrees of freedom as the $(AP)_S$ ($\alpha = 0.05/9$) optimum design but it is also poorer with respect to the other properties. The $3^3$-factorial excluding the centre point ($S_3 + S_2 + S_1$) is quite attractive with respect to its variance properties but allows no degrees of freedom for pure error. Other attempts to construct designs from the $S_r$ subsets are quite inferior with respect to all the properties evaluated.

Table 1 (continued)

                  Design for        Designs for the following compound criteria§:
                  criterion
Criterion         (AP)_S§§          κ = (0, 0.2, 0, 0.8)§§,     κ = (0, 0.2, 0, 0.8)
                                    κ = (0.2, 0, 0, 0.8)
Design            IV                V                           VI

                  X1  X2  X3        X1  X2  X3                  X1  X2  X3
                  −1  −1  −1        −1  −1  −1                  −1  −1  −1
                  −1   1  −1        −1  −1   1                  −1   1  −1
                   1  −1  −1        −1   1  −1                  −1   1   1
                   1   1  −1        −1   1   1                   1  −1  −1
                  −1   0   1         1  −1  −1                   1  −1  −1
                  −1   0   1         1  −1   1                   1  −1   1
                   1   0   1         1   1  −1                   1   1  −1
                   0  −1   1         1   1   1                   1   1   1
                   0   1   1        −1   0   0                  −1  −1   0
                   0   1   1         1   0   0                  −1   0   1
                  −1   0   0         0  −1   0                   0  −1   1
                  −1   0   0         0  −1   0                   1   0   0
                   0   0  −1         0   0  −1                   0   1   0
                   0   0  −1         0   0  −1                   0   0  −1

df (PE; LoF)‡     (6; 0)            (4; 2)                      (3; 3)
D_S-eff           80.94             95.10                       98.68
A_S-eff           73.09             89.46                       96.03
(DP)_S-eff        97.42             78.21                       55.24
(AP)_S-eff        93.50             88.89                       72.63

†Where a heading indicates two criteria, the design is optimal for both.

‡Degrees of freedom for pure error and lack of fit.

§The compound criterion is defined in equation (5) in Section 4.1.

§§Confidence level corrected for multiple comparisons.

3.2.4. Example 4: oil extraction (n = 40; q = 5; p = 20)

An experiment on the extraction of oil from oilseeds was described by Rosenthal et al. (2001).

(10)

Table 2. Optimum designs and their properties for three factors in a spherical region (α Dp

.3=2/) in 18 runs under the second-order model (example 2)

DS—I (DP)S—II AS—III

X1 X2 X3 X1 X2 X3 X1 X2 X3

−1 −1 −1 1 −1 −1 −1 −1 −1

−1 −1 1 1 −1 −1 −1 −1 −1

−1 1 −1 1 1 −1 −1 −1 1

−1 1 1 1 1 −1 −1 1 −1

1 −1 1 0 −α α −1 1 1

1 1 −1 0 −α α 1 −1 −1

1 1 1 0 α α 1 −1 1

0 −α −α 0 α α 1 1 −1

α 0 −α −α 0 −α 1 1 1

α −α 0 −α 0 −α −√

3 0 0

0 0 −√

3 −α 0 α √

3 0 0

0 0 √

3 −α −α 0 0 −√

3 0

0 −√

3 0 −α −α 0 0 √

3 0

0 √

3 0 −α α 0 0 0 −√

−√ 3

3 0 0 −α α 0 0 0 √

√ 3

3 0 0 √

3 0 0 0 0 0

0 0 0 √

3 0 0 0 0 0

0 0 0 0 0 0 0 0 0

df (PE; LoF) (1;7) (8;0) (3;5)

DS-eff 100.00 87.28 98.75

AS-eff 96.49 73.56 100.00

.DP/S-eff 1.61 100.00 43.50

.AP/S-eff 3.87 89.58 63.94

(continued )

The experiment involved five factors, one of them qualitative (type of enzyme) at two levels (X1), and a second-order model was expected to fit the data. The actual experiment included 50 runs, 10 of which had no enzyme added and thus two of the other factors were not defined. For illustration we shall consider designs for the 40 runs with enzyme added. For this part of the experiment, the design used was¹₂S5+S2+4S1which allows 6 degrees of freedom for pure error.

This design and four optimum designs are shown in Table 5. The DS-optimum design (design I) allows just 1 degree of freedom for pure error, whereas the AS-optimum design (design III) allows none. The (DP)S-optimum design (design II) has 20 degrees of freedom for pure error, but no degrees of freedom for lack of fit. The effect of this on the other properties of the design is quite large. The (AP)S-optimum design allows 15 degrees of freedom for pure error and 5 for lack of fit and might seem more reasonable to many experimenters.
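The split of residual degrees of freedom quoted for each design can be checked by counting distinct treatments: in an unblocked design with n runs, t distinct treatments and p model parameters, pure error has n − t degrees of freedom and lack of fit has t − p. The following is a minimal sketch of that count, not the authors' code; the function name `df_split` is hypothetical, and the check uses the AS-optimum design (design III) of Table 2 because the rows of Table 5 are not reproduced here.

```python
import numpy as np

def df_split(design, p):
    """Return (pure error df, lack-of-fit df) for an unblocked design.

    design: (n, q) array of factor settings.
    p: number of model parameters, including the intercept.
    """
    # Round so that numerically equal points are recognised as replicates;
    # adding 0.0 turns any -0.0 into 0.0.
    design = np.round(np.asarray(design, dtype=float), 8) + 0.0
    n = design.shape[0]
    t = np.unique(design, axis=0).shape[0]   # number of distinct treatments
    return n - t, t - p

r3 = np.sqrt(3.0)
factorial = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
axial = [(-r3, 0, 0), (r3, 0, 0), (0, -r3, 0),
         (0, r3, 0), (0, 0, -r3), (0, 0, r3)]
# Design III of Table 2: the 2^3 factorial with (-1, -1, -1) duplicated,
# six axial points and three centre points (18 runs in total).
design_iii = [(-1, -1, -1)] + factorial + axial + [(0, 0, 0)] * 3

pe, lof = df_split(design_iii, p=10)
print(pe, lof)   # 3 5, matching the (3; 5) reported in Table 2
```

The same function applies to any of the unblocked designs; blocked designs need the rank-based count used for d_B in Section 3.3.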

3.2.5. Example 5 (n = 16; q = 8; p = 9)

Table 6 shows designs for eight two-level factors in 16 runs, under the first-order model. This situation differs from the others considered in that the DS- and AS-optimum design (design I) has the same information matrix as a regular 1/16 fractional factorial, which does not allow for pure error estimation. The (DP)S-optimum design (design II) is an irregular fraction which allows 7 degrees of freedom for pure error and none for lack of fit. Again this shows that, for inference, it is worth sacrificing a considerable amount in terms of the D-criterion to gain pure error degrees of freedom. The (AP)S-optimum design (design III) is even more irregular, with one factor having 12 runs at one level and four runs at the other. By sacrificing orthogonality, we lose the ability to perform other types of analysis, such as normal or half-normal plots of effects. If this, rather than formal inference, were regarded as the main purpose of the experiment then we should use the standard AS-criterion to reflect this.

Table 2 (continued)

                Design for          Designs for the following compound criteria†:
                criterion
                  (AP)S—IV          κ = (0.5, 0, 0, 0.5)—V   κ = (0, 0.5, 0, 0.5)—VI
                X1    X2    X3      X1    X2    X3           X1    X2    X3
                −1     1     1      −1    −1     1           −α    −α     0
                 1    −1     1       1    −1    −1           −α     α     0
                 1    −1     1       1    −1    −1            α    −α     0
                 1     1    −1       1    −1     1            α     α     0
                 1     1     1       0     α    −α           −α     0    −α
                 1     1     1       0     α    −α           −α     0     α
                −α    −α     0       0     α     α           −α     0     α
                −α    −α     0      −α     0    −α            α     0    −α
                 0    −α    −α      −α     0    −α            α     0     α
                 0    −α    −α      −α     α     0            0    −α    −α
                −α     0    −α      −α     α     0            0     α    −α
                −α     0    −α       α     α     0            0    −α     α
                √3     0     0       0     0    √3            0     α     α
                 0    √3     0       0   −√3     0            0    −α     α
                 0     0    √3      √3     0     0            0     0     0
                 0     0     0       0     0     0            0     0     0

df (PE; LoF)        (7; 1)              (6; 2)                   (5; 3)
DS-eff              90.69               93.50                    93.49
AS-eff              86.34               88.55                    96.58
(DP)S-eff           95.75               88.55                    76.05
(AP)S-eff           100.00              95.78                    94.66

†The compound criterion is defined in equation (5) in Section 4.1.
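To see why pure error degrees of freedom are worth so much for inference in example 5, it helps to look at the F-quantile that scales the (DP)S-criterion. The sketch below is illustrative only: it assumes SciPy's `scipy.stats.f.ppf` for the quantiles and a 5% level, and takes the eight treatment parameters of example 5 as the numerator degrees of freedom.

```python
from scipy.stats import f as f_dist

num_df = 8      # p - 1 treatment parameters in example 5
alpha = 0.05    # assumed significance level for the global F-test

# Upper quantile F_{p-1, d; 1-alpha} for d = 1, ..., 7 pure error df.
quantiles = {d: f_dist.ppf(1 - alpha, num_df, d) for d in range(1, 8)}
for d, q in quantiles.items():
    print(f"d = {d}: F = {q:.2f}")
```

The quantile falls steeply over the first few pure error degrees of freedom and much more slowly afterwards, which is the gain that the (DP)S-criterion trades against the loss in the determinant of the information matrix.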

3.3. Blocked designs

Since many multifactor designs use moderately large numbers of runs, it is common that they must be run in blocks. For example, in industrial experiments the blocks often correspond to days or shifts.

In a blocked design the full treatment model for the response in unit j of block i with treatment k applied to it may be written as

$$Y_{ij(k)} = \mu_i + \tau_k + \varepsilon_{ij}, \tag{3}$$

where $\mu_i$ is the expected response in block $i$, $\tau_k$ is the effect of treatment $k$, $i = 1, \ldots, b$, $j = 1, \ldots, n_i$ ($\sum_{i=1}^{b} n_i = n$), $k = 1, \ldots, t$, $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $V(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$. Block effects may be fixed or random but, as discussed in Gilmour and Trinca (2006), in the design phase it is safer to consider them as fixed since the variance components are not known. This ensures that in the most difficult case, with a large block variance component, the design is optimal, whereas, when the block variance component is small, though the design might be suboptimal for this case, it will give better estimation than in the case of a large block variance component. We shall follow this advice here.

Table 3. Optimum designs and their properties for three three-level factors in 26 runs under the second-order model (example 3)

‡Confidence level corrected for multiple comparison.

As with completely randomized designs, we shall try to fit submodels for $\tau_k$ such as

$$\tau_k = \mathbf{f}(\mathbf{x}_k)'\boldsymbol{\beta}, \tag{4}$$

where $\mathbf{f}(\mathbf{x}_k)$ and $\boldsymbol{\beta}$ are defined as in equation (2). Note that $\mathbf{x}_k$ may be written as $\mathbf{x}_{ij}$, i.e. the levels of the q factors that were applied to unit j of block i.

Table 4. Subset designs and their properties for three three-level factors in 26 runs under the second-order model (example 3)

Design              df (PE; LoF)    DS-eff    AS-eff    (DP)S-eff    (AP)S-eff
2S3 + S1 + 4S0      (11; 5)         90.89     82.43     86.56        83.16
S3 + 2S1 + 6S0      (11; 5)         72.68     66.63     69.22        67.22
S3 + S2 + S1        (0; 16)         94.27     92.82     0.00         0.00
2S2 + 2S0           (13; 3)         78.71     70.79     79.99        74.13
S2 + 2S1 + 2S0      (7; 9)          58.81     45.26     44.12        39.56
S2 + S1 + 8S0       (7; 9)          55.78     43.89     41.85        38.36

By fitting model (3) we obtain an unbiased estimator of σ² under the usual assumption of additive treatment effects, plus a valid randomization. Using the same argument as for unblocked designs, for inferences on the β-parameters we should use this estimate of error variance and so should use a design which allows its estimation. Note that, as is usually the case, some treatment effects may be confounded with blocks.

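The variance matrix of the least squares estimator $\hat{\boldsymbol{\beta}}$ is

$$\operatorname{var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{M}_B^{-1})_{22} = \sigma^2 (\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1},$$

where

$$\mathbf{M}_B = \begin{pmatrix} \mathbf{Z}'\mathbf{Z} & \mathbf{Z}'\mathbf{X} \\ \mathbf{X}'\mathbf{Z} & \mathbf{X}'\mathbf{X} \end{pmatrix},$$

$\mathbf{X}$ is defined as before (the column of 1s is excluded and so $\mathbf{X}$ has $p - 1$ columns), $\mathbf{Q} = \mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$ and $\mathbf{Z}$ is the $n \times b$ matrix whose columns are indicators for blocks. The subscript B stands for the blocked design.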

Criteria for designs allowing for blocks, both the usual ones and the new ones proposed in this paper, are defined as before with $\mathbf{M}_B$ or $\mathbf{X}'\mathbf{Q}\mathbf{X}$ taking the place of $\mathbf{X}'\mathbf{X}$ or $\mathbf{X}'\mathbf{Q}_0\mathbf{X}$, and the appropriate numerator degrees of freedom. In particular, DS-optimum designs are also D-optimum designs because $|\mathbf{M}_B| = |\mathbf{X}'\mathbf{Q}\mathbf{X}|\,|\mathbf{Z}'\mathbf{Z}|$ and $|\mathbf{Z}'\mathbf{Z}|$ is constant. However, (DP)S-optimum designs are not DP-optimum designs, owing to the reduction in numerator degrees of freedom.
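The determinant identity behind the equivalence of DS- and D-optimality for blocked designs is the standard Schur complement factorization, and it is easy to confirm numerically. The snippet below is only an illustrative check with NumPy; the block sizes and the randomly generated model matrix are arbitrary choices made for the example, not designs from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, b, k = 12, 3, 4          # runs, blocks, treatment-model columns (p - 1)

# Block indicator matrix Z (n x b) for three blocks of four runs each.
Z = np.kron(np.eye(b), np.ones((n // b, 1)))
# Arbitrary illustrative treatment model matrix without the column of 1s.
X = rng.choice([-1.0, 0.0, 1.0], size=(n, k))

# Q projects onto the space orthogonal to the block indicators.
Q = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
M_B = np.block([[Z.T @ Z, Z.T @ X],
                [X.T @ Z, X.T @ X]])

lhs = np.linalg.det(M_B)
rhs = np.linalg.det(X.T @ Q @ X) * np.linalg.det(Z.T @ Z)
print(np.isclose(lhs, rhs))
```

Because $|\mathbf{Z}'\mathbf{Z}|$ depends only on the block sizes, which are fixed, ranking designs by $|\mathbf{M}_B|$ or by $|\mathbf{X}'\mathbf{Q}\mathbf{X}|$ gives the same ordering.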

Whereas the DP-criterion minimizes $(F_{p+b-1,d_B;1-\alpha})^{p+b-1}/|\mathbf{M}_B|$, the (DP)S-criterion minimizes $(F_{p-1,d_B;1-\alpha})^{p-1}/|\mathbf{X}'\mathbf{Q}\mathbf{X}|$, where $d_B$ is the pure error degrees of freedom from the blocked design.
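As a rough illustration of how the blocked (DP)S-criterion might be evaluated, the sketch below computes $d_B$ as n minus the rank of the combined block and treatment indicator matrices and then forms $(F_{p-1,d_B;1-\alpha})^{p-1}/|\mathbf{X}'\mathbf{Q}\mathbf{X}|$. The function names are hypothetical, SciPy's `scipy.stats.f.ppf` is assumed for the quantile, and returning infinity when no pure error is available is a convention chosen here rather than something stated in the paper.

```python
import numpy as np
from scipy.stats import f as f_dist

def block_indicators(blocks):
    """n x b indicator matrix Z for a vector of block labels."""
    _, idx = np.unique(np.asarray(blocks), return_inverse=True)
    return np.eye(idx.max() + 1)[idx]

def pure_error_df_blocked(points, blocks):
    """d_B = n - rank([Z : T]), with T the treatment indicator matrix."""
    points = np.round(np.asarray(points, dtype=float), 8) + 0.0
    _, trt = np.unique(points, axis=0, return_inverse=True)
    T = np.eye(trt.max() + 1)[trt.ravel()]
    Z = block_indicators(blocks)
    return len(points) - np.linalg.matrix_rank(np.hstack([Z, T]))

def dps_blocked(X, points, blocks, alpha=0.05):
    """(F_{p-1,d_B;1-alpha})^(p-1) / |X'QX|; smaller is better."""
    X = np.asarray(X, dtype=float)
    Z = block_indicators(blocks)
    Q = np.eye(len(X)) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    sign, logdet = np.linalg.slogdet(X.T @ Q @ X)
    d_B = pure_error_df_blocked(points, blocks)
    if d_B == 0 or sign <= 0:
        return np.inf   # assumed convention: no pure error, or singular X'QX
    k = X.shape[1]      # p - 1
    return float(np.exp(k * np.log(f_dist.ppf(1 - alpha, k, d_B)) - logdet))

# Toy check: one two-level factor, first-order model, two blocks of four runs.
points = np.array([[-1], [-1], [1], [1], [-1], [-1], [1], [1]], dtype=float)
blocks = [1, 1, 1, 1, 2, 2, 2, 2]
X = points              # single linear term; no column of 1s
d_B = pure_error_df_blocked(points, blocks)
value = dps_blocked(X, points, blocks)
print(d_B, value)
```

In the toy design there are 8 runs, 2 blocks and 2 treatments, so $d_B = 8 - 3 = 5$ and $\mathbf{X}'\mathbf{Q}\mathbf{X} = 8$.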

An exchange algorithm is also applied to construct near-optimum blocked designs. This is a simple extension of the algorithm that was briefly described in Section 3.1. A random initial design starts the search and the complete procedure is repeated for a number of different initial designs. Exchanges which improve the criterion are accepted. For the new criteria, obtaining the pure error degrees of freedom requires the calculation of $d_B = n - \operatorname{rank}(\mathbf{Z} : \mathbf{T})$, where $\mathbf{T}$ is the $n \times t$ indicator matrix for treatments, which reduces the computational benefits of updating formulae.
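The search described above can be sketched as a simple point-exchange loop. This is a generic illustration under stated assumptions, not the authors' implementation: it is written for an unblocked design, it uses the DS-criterion (log determinant of the intercept-adjusted information matrix) as a stand-in objective, and the names `model_matrix`, `ds_logdet` and `point_exchange` are hypothetical. Any criterion that returns "larger is better" values could be passed in instead.

```python
import itertools
import numpy as np

def model_matrix(points):
    """Second-order model terms (linear, quadratic, interaction), no intercept."""
    pts = np.asarray(points, dtype=float)
    q = pts.shape[1]
    cols = [pts[:, i] for i in range(q)]
    cols += [pts[:, i] ** 2 for i in range(q)]
    cols += [pts[:, i] * pts[:, j]
             for i, j in itertools.combinations(range(q), 2)]
    return np.column_stack(cols)

def ds_logdet(points):
    """log |X'Q0X| with Q0 the centring matrix; -inf if singular."""
    X = model_matrix(points)
    Xc = X - X.mean(axis=0)
    sign, logdet = np.linalg.slogdet(Xc.T @ Xc)
    return logdet if sign > 0 else -np.inf

def point_exchange(candidates, n, criterion, n_starts=10, seed=0):
    """Repeat a greedy exchange search from several random initial designs."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    best_design, best_value = None, -np.inf
    for _ in range(n_starts):
        idx = rng.integers(len(candidates), size=n)   # random initial design
        value = criterion(candidates[idx])
        improved = True
        while improved:
            improved = False
            for run in range(n):
                for c in range(len(candidates)):
                    trial = idx.copy()
                    trial[run] = c                    # swap one run for a candidate
                    trial_value = criterion(candidates[trial])
                    if trial_value > value + 1e-10:   # accept only improvements
                        idx, value, improved = trial, trial_value, True
        if value > best_value:
            best_design, best_value = candidates[idx], value
    return best_design, best_value

# Two three-level factors, 8 runs, full second-order model (p = 6).
grid = np.array(list(itertools.product([-1, 0, 1], repeat=2)), dtype=float)
design, value = point_exchange(grid, n=8, criterion=ds_logdet)
print(design)
print(value)
```

For the pure-error-based criteria the objective has to recompute the pure error degrees of freedom for every trial exchange, which is why the usual rank-one updating formulae for the determinant help less.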

3.3.1. Example 6: pastry dough (n = 28; b = 7; q = 3; p = 10)

This experiment was described by Trinca and Gilmour (1999). The main objective was to discover how some factors that are involved in an extrusion process for mixing dough could be


Table 5. Designs and their properties for five factors, one at two levels and four at three levels, in 40 runs, under the second-order model (example 4; see also Table 8 in Section 4.3.4)


Table 6. Optimum designs and their properties for eight two-level factors in 16 runs, under the linear effects model (example 5)

DS or AS—I (DP)S—II

X1 X2 X3 X4 X5 X6 X7 X8 X1 X2 X3 X4 X5 X6 X7 X8

−1 −1 −1 −1 1 1 −1 1 −1 −1 −1 1 −1 −1 −1 −1

−1 −1 −1 1 −1 −1 −1 −1 −1 −1 1 −1 1 1 1 −1

−1 −1 1 −1 −1 1 1 −1 −1 −1 1 −1 1 1 1 −1

−1 −1 1 1 1 −1 −1 −1 −1 −1 1 1 1 1 1 1

−1 1 −1 −1 1 −1 1 −1 −1 −1 1 1 1 1 1 1

−1 1 −1 1 −1 1 1 1 −1 1 −1 −1 −1 1 1 1

−1 1 1 −1 −1 −1 −1 1 −1 1 −1 −1 −1 1 1 1

−1 1 1 1 1 1 1 1 −1 1 1 −1 1 −1 −1 1

1 −1 −1 −1 −1 −1 1 1 1 −1 −1 −1 1 1 −1 1

1 −1 −1 1 1 1 −1 1 1 −1 −1 −1 1 1 −1 1

1 −1 1 −1 1 −1 1 1 1 −1 1 −1 −1 −1 1 1

1 −1 1 1 −1 1 1 −1 1 −1 1 −1 −1 −1 1 1

1 1 −1 −1 −1 1 −1 −1 1 1 −1 1 1 −1 1 −1

1 1 −1 1 1 −1 1 −1 1 1 −1 1 1 −1 1 −1

1 1 1 −1 1 1 −1 −1 1 1 1 1 −1 1 −1 −1

1 1 1 1 −1 −1 −1 1 1 1 1 1 −1 1 −1 −1

df (PE; LoF) (0; 7) (7; 0)

DS-eff 100.00 88.69

AS-eff 100.00 81.08

(DP)S-eff 0.00 100.00

(AP)S-eff 0.00 95.25

(AP)S, κ = (0, 0.5, 0, 0.5)†—III κ = (0.5, 0, 0, 0.5)†—IV

X1 X2 X3 X4 X5 X6 X7 X8 X1 X2 X3 X4 X5 X6 X7 X8

−1 −1 −1 1 −1 −1 1 −1 −1 −1 1 −1 −1 1 1 −1

−1 −1 −1 1 −1 −1 1 −1 −1 −1 1 −1 1 −1 1 −1

−1 −1 1 1 1 −1 −1 1 −1 −1 1 1 1 1 −1 1

−1 1 −1 −1 −1 −1 −1 1 −1 1 −1 −1 −1 1 −1 1

−1 1 −1 −1 −1 −1 −1 1 −1 1 −1 −1 1 −1 −1 1

−1 1 −1 1 1 1 −1 −1 −1 1 −1 1 1 1 1 −1

−1 1 1 −1 1 −1 1 −1 1 −1 −1 −1 1 1 −1 −1

−1 1 1 1 −1 1 1 1 1 −1 −1 1 −1 −1 1 1

1 −1 −1 −1 1 1 1 1 1 1 1 −1 1 1 1 1

1 −1 1 −1 −1 1 −1 −1 1 1 1 −1 1 1 1 1

1 1 −1 1 1 −1 1 1 1 1 1 1 −1 −1 −1 −1

1 1 1 1 −1 −1 −1 −1 1 1 1 1 −1 −1 −1 −1

df (PE; LoF) (6; 1) (6; 1)

DS-eff 93.06 93.06

AS-eff 91.14 87.80

(DP)S-eff 94.27 94.27

(AP)S-eff 100.00 96.34

varied to control the properties of the pastry. Three controllable factors of interest were the flow rate of water into the mix, the initial moisture content in the mix and the screw speed. As the properties of the dough varied from day to day because of uncontrollable factors, the design was divided into seven blocks (days) of four runs each. Table 7 shows some alternative designs, along with the design that was used. The DS- and AS-optimum design (design I) allows 2 degrees of freedom for error. The (DP)S-optimum design allows 12 degrees of freedom for error, whereas the (AP)S-optimum design allows 10. Again we note that, for the (DP)S-optimum design, there are no spare degrees of freedom for lack of fit. Because of the extra restrictions required by blocking, we see that the designs which allow for pure error estimation cost more in terms of the traditional criteria than in completely randomized designs. The design that was actually used is clearly inferior to the new designs.

Table 7. Designs and their properties for three three-level factors in seven blocks of four, under the second-order model (example 6; see also Table 9 in Section 4.3.6)

Block Designs for the following criteria: Design used—IV

DS or AS—I (DP)S—II (AP)S—III

X1 X2 X3 X1 X2 X3 X1 X2 X3 X1 X2 X3

1 −1 1 −1 −1 −1 1 −1 −1 1 −1 −1 −1

−1 −1 1 −1 1 −1 −1 1 −1 −1 1 1

1 −1 −1 1 −1 −1 1 −1 −1 1 −1 1

1 1 1 1 1 1 1 1 1 1 1 −1

2 −1 −1 −1 −1 −1 1 1 −1 −1 −1 −1 1

1 1 −1 −1 1 −1 1 1 1 −1 1 −1

1 −1 1 1 −1 −1 0 −1 1 1 −1 −1

0 0 0 1 1 1 1 0 0 1 1 1

3 −1 1 1 −1 −1 1 −1 1 1 −1 1 −1

1 −1 1 −1 1 −1 1 1 −1 0 −1 0

1 1 −1 1 −1 −1 −1 −1 0 1 0 0

0 0 0 1 1 1 0 0 −1 0 0 1

4 −1 1 −1 −1 1 1 −1 1 1 1 −1 1

−1 −1 0 1 1 −1 1 1 −1 −1 0 0

1 0 −1 −1 0 0 −1 −1 0 0 1 0

0 1 1 0 −1 0 0 0 −1 0 0 −1

5 −1 1 1 −1 1 1 −1 1 −1 −1 −1 −1

1 1 0 1 1 −1 1 −1 −1 1 1 1

−1 0 −1 −1 0 0 0 −1 1 0 0 0

0 −1 1 0 −1 0 1 0 0 0 0 0

6 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 1

1 −1 0 1 −1 1 1 −1 1 1 1 −1

−1 0 1 1 1 0 −1 0 1 0 0 0

0 1 −1 0 0 −1 0 1 0 0 0 0

7 −1 −1 1 −1 −1 −1 −1 −1 −1 −1 1 1

−1 1 0 1 −1 1 1 −1 1 1 −1 −1

1 0 1 1 1 0 −1 0 1 0 0 0

0 −1 −1 0 0 −1 0 1 0 0 0 0

df (PE; LoF) (2; 10) (12; 0) (10; 2) (7; 5)

DS-eff 100.00 88.19 94.87 80.02

AS-eff 100.00 78.40 92.48 73.31

(DP)S-eff 16.36 100.00 99.59 69.01

(AP)S-eff 29.00 88.65 100.00 70.38

4. Compound criteria

4.1. Definition

The designs that are produced by the new criteria are quite extreme and experimenters might be reluctant to use designs which are so different from what they are used to. This attitude might be correct sometimes, depending on the objectives of the experiment. We would argue strongly in favour of carefully considering what kinds of data analysis will be used to meet the experimenters’ objectives and then carefully choosing the optimality criterion to match that data analysis. If a joint confidence region or a global F-test of the treatment parameters will be the only relevant analysis, then a (DP)S-optimum design should be chosen.

In many experiments several types of data analysis are important, not all of them requiring an estimate of error. In particular, the analysis might involve all or most of the following:

(a) a global F-test of the treatment parameters, for which we should use (DP)S-optimality;

(b) t-tests of the individual treatment parameters, for which we should use weighted-AP-optimality;

(c) point estimation of the individual treatment parameters, for which we should use weighted-A-optimality;

(d) checking for lack of fit of the assumed treatment model and, if appropriate, fitting a few higher order terms.

For the last of these analyses, there is no obvious design optimality criterion to use. Atkinson (1972) proposed a criterion for finding designs which are powerful for detecting lack of fit. This requires a prior estimate of the sizes of the higher order parameters. Jones and Mitchell (1978) relaxed this by using a minimax or average version over a range of parameter values. More recently, Goos et al. (2005) combined this criterion with others to produce designs which are both model robust and model sensitive. However, all of these criteria aim specifically at testing for lack of fit, whereas we are also interested in being able to estimate a few higher order parameters and discriminating between models. To avoid having to consider too many different criteria, we use the degree-of-freedom efficiency that was proposed by Daniel (1976), pages 177–178, as a simple way of incorporating all of these requirements. The degree-of-freedom efficiency is the proportion of our experimental resource which is used to estimate the effects of treatments. Clearly this is directly in conflict with our pure error criteria and a good design must be a compromise.

We now combine all of these in a compound criterion, following exactly the methodology that was described by Atkinson et al. (2007). We first define the following efficiencies, for the design with treatment model matrix $\mathbf{X}$ which has $d$ degrees of freedom for pure error:

(a) the (DP)S-efficiency,

$$E_1 = \frac{|\mathbf{X}'\mathbf{Q}_0\mathbf{X}|^{1/(p-1)}\, F_{p-1,d_D;1-\alpha_1}}{F_{p-1,d;1-\alpha_1}\, |\mathbf{X}_{DP}'\mathbf{Q}_0\mathbf{X}_{DP}|^{1/(p-1)}},$$

where $\mathbf{X}_{DP}$ is the model matrix for the (DP)S-optimum design, which has $d_D$ degrees of freedom for pure error, and the global F-test will be performed at the $100\alpha_1\%$ level of significance;

(b) the weighted-AP-efficiency,

$$E_2 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_{AP}'\mathbf{X}_{AP})^{-1}\}\, F_{1,d_A;1-\alpha_2}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}\, F_{1,d;1-\alpha_2}},$$

where $\mathbf{X}_{AP}$ is the model matrix for the weighted-AP-optimum design, which has $d_A$ degrees of freedom for pure error, and the individual t-tests will be calculated at the $100\alpha_2\%$ level of significance;

(c) the weighted-A-efficiency,

$$E_3 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_A'\mathbf{X}_A)^{-1}\}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}},$$

where $\mathbf{X}_A$ is the model matrix for the weighted-A-optimum design;

(d) the degree-of-freedom efficiency,

$$E_4 = \frac{n - d}{n}.$$

Next we combine these criteria with weights $\kappa_1, \ldots, \kappa_4$ respectively to obtain $E = E_1^{\kappa_1} E_2^{\kappa_2} E_3^{\kappa_3} E_4^{\kappa_4}$. After ignoring terms which do not depend on the design to be optimized, this means that we choose a design to maximize

$$\frac{|\mathbf{X}'\mathbf{Q}_0\mathbf{X}|^{\kappa_1/(p-1)}\,(n - d)^{\kappa_4}}{F_{p-1,d;1-\alpha_1}^{\kappa_1}\, F_{1,d;1-\alpha_2}^{\kappa_2}\, \operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}^{\kappa_2+\kappa_3}}. \tag{5}$$

The weights $\kappa$ should be chosen to reflect the relative importance of different aspects of the analysis and, in some experiments, some of the weights might be 0.
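A minimal sketch of how criterion (5) could be evaluated for a candidate design is given below. Several choices here are assumptions rather than the paper's specification: $\mathbf{Q}_0$ is taken to be the centring matrix $\mathbf{I} - \mathbf{1}\mathbf{1}'/n$, the trace term is computed from $(\mathbf{X}'\mathbf{Q}_0\mathbf{X})^{-1}$ (equivalent to using the full model matrix with zero weight on the intercept), $\mathbf{W}$ defaults to the identity, a design without pure error is scored 0 whenever an F-quantile is needed, and the function name `compound_criterion` is hypothetical. SciPy's `scipy.stats.f.ppf` supplies the quantiles.

```python
import numpy as np
from scipy.stats import f as f_dist

def compound_criterion(X, d, kappa, w=None, alpha1=0.05, alpha2=0.05):
    """Value of criterion (5), to be maximized.

    X: n x (p - 1) treatment model matrix without the column of 1s.
    d: pure error degrees of freedom of the design.
    kappa: weights (kappa1, kappa2, kappa3, kappa4).
    w: diagonal of W (assumed identity if omitted).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape                       # k = p - 1
    k1, k2, k3, k4 = kappa
    if d == 0 and (k1 > 0 or k2 > 0):
        return 0.0                       # assumed convention: no pure error
    Xc = X - X.mean(axis=0)              # Q0 X under the centring assumption
    info = Xc.T @ Xc                     # X'Q0X
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        return 0.0                       # treatment parameters not estimable
    w = np.ones(k) if w is None else np.asarray(w, dtype=float)
    trace = np.sum(w * np.diag(np.linalg.inv(info)))
    log_value = (k1 * logdet / k + k4 * np.log(n - d)
                 - (k2 + k3) * np.log(trace))
    if k1 > 0:
        log_value -= k1 * np.log(f_dist.ppf(1 - alpha1, k, d))
    if k2 > 0:
        log_value -= k2 * np.log(f_dist.ppf(1 - alpha2, 1, d))
    return float(np.exp(log_value))

# Two 8-run designs for a first-order model in two factors (p = 3).
factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
replicated = np.vstack([factorial, factorial])            # d = 8 - 4 = 4
with_centres = np.vstack([factorial, np.zeros((4, 2))])   # d = 8 - 5 = 3
kappa = (0.5, 0, 0, 0.5)
v_rep = compound_criterion(replicated, d=4, kappa=kappa)
v_cen = compound_criterion(with_centres, d=3, kappa=kappa)
print(v_rep, v_cen)
```

Working on the log scale avoids overflow in the determinant for larger models, and only ratios of the returned values between designs are meaningful because the constant terms of the efficiencies have been dropped.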

4.2. Compound criteria for blocking

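The compound criteria that were defined in Section 4.1 can easily be extended to blocked designs. For blocking, $\mathbf{X}'\mathbf{X}$ or $\mathbf{X}'\mathbf{Q}_0\mathbf{X}$ should be replaced by $\mathbf{M}_B$ or $\mathbf{X}'\mathbf{Q}\mathbf{X}$ in the efficiency formulae. Hence, we have the following efficiencies:

(a) the (DP)S-efficiency,

$$E_1 = \frac{|\mathbf{X}'\mathbf{Q}\mathbf{X}|^{1/(p-1)}\, F_{p-1,d_{BD};1-\alpha_1}}{F_{p-1,d_B;1-\alpha_1}\, |\mathbf{X}_{DP}'\mathbf{Q}\mathbf{X}_{DP}|^{1/(p-1)}},$$

where $\mathbf{X}_{DP}$ is the treatment model matrix for $\boldsymbol{\beta}$ for the (DP)S-optimum blocked design, which has $d_{BD}$ degrees of freedom for pure error, and the global F-test will be calculated at the $100\alpha_1\%$ level of significance;

(b) the weighted-(AP)S-efficiency,

$$E_2 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_{AP}'\mathbf{Q}\mathbf{X}_{AP})^{-1}\}\, F_{1,d_{BA};1-\alpha_2}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}\, F_{1,d_B;1-\alpha_2}},$$

where $\mathbf{X}_{AP}$ is the treatment model matrix for $\boldsymbol{\beta}$ for the AP-optimum design, which has $d_{BA}$ degrees of freedom for pure error, and the individual t-tests will be calculated at the $100\alpha_2\%$ level of significance;

(c) the weighted-AS-efficiency,

$$E_3 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_A'\mathbf{Q}\mathbf{X}_A)^{-1}\}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}},$$

where $\mathbf{X}_A$ is the treatment model matrix for $\boldsymbol{\beta}$ for the A-optimum design;

(d) the degree-of-freedom efficiency,

$$E_4 = \frac{n - b + 1 - d_B}{n - b + 1}.$$

Ignoring constants, the combined criterion is

$$\frac{|\mathbf{X}'\mathbf{Q}\mathbf{X}|^{\kappa_1/(p-1)}\,(n - b + 1 - d_B)^{\kappa_4}}{F_{p-1,d_B;1-\alpha_1}^{\kappa_1}\, F_{1,d_B;1-\alpha_2}^{\kappa_2}\, \operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}^{\kappa_2+\kappa_3}}. \tag{6}$$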
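The step from the four blocked efficiencies to criterion (6) can be written out explicitly. The following is a worked expansion using only the definitions above; the symbol $C$ for the collected constant is introduced here for illustration and does not appear in the paper.

```latex
\begin{align*}
E &= E_1^{\kappa_1} E_2^{\kappa_2} E_3^{\kappa_3} E_4^{\kappa_4} \\
  &= C \times
     \frac{|\mathbf{X}'\mathbf{Q}\mathbf{X}|^{\kappa_1/(p-1)}\,
           (n - b + 1 - d_B)^{\kappa_4}}
          {F_{p-1,d_B;1-\alpha_1}^{\kappa_1}\,
           F_{1,d_B;1-\alpha_2}^{\kappa_2}\,
           \operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}^{\kappa_2+\kappa_3}},
\\[6pt]
C &= \frac{F_{p-1,d_{BD};1-\alpha_1}^{\kappa_1}\,
           F_{1,d_{BA};1-\alpha_2}^{\kappa_2}\,
           \operatorname{tr}\{\mathbf{W}(\mathbf{X}_{AP}'\mathbf{Q}\mathbf{X}_{AP})^{-1}\}^{\kappa_2}\,
           \operatorname{tr}\{\mathbf{W}(\mathbf{X}_{A}'\mathbf{Q}\mathbf{X}_{A})^{-1}\}^{\kappa_3}}
          {|\mathbf{X}_{DP}'\mathbf{Q}\mathbf{X}_{DP}|^{\kappa_1/(p-1)}\,(n - b + 1)^{\kappa_4}}.
\end{align*}
```

Since $C$ involves only the single-criterion optimum designs, $n$ and $b$, it is the same for every candidate design, so maximizing $E$ is equivalent to maximizing expression (6).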

4.3. Examples

Designs were built, using various compound criteria, for several of the examples that were presented in earlier sections. Many different compound criteria can give the same optimum design and sometimes they can be the same as the designs that are obtained by using simple criteria. In what follows we discuss some of the more interesting designs that were found.

4.3.1. Example 1 (n = 16; q = 3; p = 10), continued

In Table 1 two designs (designs V and VI) are shown which place a high weight on degree-of-freedom efficiency, and the effect of this is very clear. The usual criteria allow no degrees of freedom for pure error and the pure-error-based criteria do not allow any test for lack of fit. The designs that are produced by the compound criteria are less extreme, offering a compromise between pure error and lack-of-fit degrees of freedom. We also note that it is possible to produce better designs than classical designs (e.g. regular subset designs) with similar numbers of degrees of freedom for pure error, e.g. comparing the modified Box–Behnken and design VI.

The structure of design V is interesting, containing a full S3 subset, with duplicates of two points, and six axial points, only two of which are a pair, the others being two replicates of one of each of the other pairs. This structure also seems like a compromise between the very tightly defined structure of subset designs and the apparently messy structures of D-optimum designs. This design, which is optimal for two different weight patterns for the compound criteria, is very similar to the CCD in terms of the variance properties, but it allows much better estimation of pure error. It looks very attractive for practical use.

4.3.2. Example 2 (n = 18; q = 3; p = 10), continued

A similar type of compromise can be seen in the compound optimum designs in Table 2 for the experiment in a spherical region. Designs V and VI can be recommended for practical use, even though they are quite different from the standard designs.

4.3.3. Example 3: cassava bread (n = 26; q = 3; p = 10), continued

Some of the obvious compound optimum designs, as shown in Table 3, turn out to be the same as some of the other optimum designs that were found. In a situation like this, the criteria are not far from converging to each other, although we still see that the design used can be optimized for different objectives.

4.3.4. Example 4: oil extraction (n = 40; q = 5; p = 20), continued

This is another case where the designs in Table 5 give almost all degrees of freedom to lack of fit, for the standard criteria, and to pure error for the new criteria, whereas classical designs, such as that used in the experiment, tend to split the residual degrees of freedom more evenly. Two compound optimum designs are shown in Table 8. Again, we find that they seem to represent a very good compromise, having a reasonable split between pure error and lack-of-fit degrees of freedom and very good variance properties. Design VI, for example, has fairly similar degrees of freedom to those of the design actually used, but it has much lower variances of the treatment parameter estimators. If this consulting problem arose now, this is the design that we would recommend.

4.3.5. Example 5 (n = 16; q = 8; p = 9), continued

Even for the critical case of eight factors in 16 runs that was shown in Table 6, it is possible to obtain some weight patterns for the compound criterion which allow degrees of freedom for both pure error and lack-of-fit estimation. Designs III and IV have the same value of the DS- and (DP)S-criteria. Design IV looks attractive, since it also seems ‘more regular’ than the (DP)S- and (AP)S-optimum designs in that each factor is either level balanced or has an imbalance of two runs. Design III has higher weighted-A-efficiency but is highly irregular. For structures of this type, there might be scope for further study of aliasing patterns and their relationship with the new optimality criteria.

4.3.6. Example 6: pastry dough (n = 28; b = 7; q = 3; p = 10), continued

For the pastry dough experiment, two compound optimum designs are shown in Table 9, which again show an attractive compromise between variance properties and allocation of degrees of freedom. Design VI has the same allocation of degrees of freedom to pure error and lack of fit as the design actually used in the experiment, but it has considerably better estimation properties. In hindsight, this might be the design that we would recommend now.

These examples have shown that the properties of the design can be tailored to specific objectives by choosing appropriate weights (κs) in the compound criterion. As is frequently the case with weighted criteria, in practice the experimenter should try a few different settings of the κs and evaluate the resulting designs to decide which one to use. Some experimenters might appreciate even simpler rules of thumb and, on the basis of the examples given here, along with several others, we could recommend using relative weights of 0, 1 or 4 for analyses which are of no importance, of secondary importance and of primary importance respectively. Small changes in weights (except at 0) do not greatly affect the relative properties of different designs.
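The rule of thumb can be turned into a weight vector in a line or two. The helper below is a hypothetical convenience, not part of the paper: it rescales relative importance scores (0, 1 or 4) so that the κs sum to 1, matching the form of the weight patterns shown in the tables. Rescaling is harmless because multiplying every κ by the same positive constant raises the compound efficiency to a fixed positive power and so leaves the ranking of designs unchanged.

```python
def kappa_from_importance(relative_weights):
    """Rescale relative importance scores so that the weights sum to 1."""
    total = sum(relative_weights)
    if total <= 0:
        raise ValueError("at least one analysis must have positive weight")
    return tuple(w / total for w in relative_weights)

# Assumed scenario: global F-test of primary importance (4), individual
# t-tests and lack-of-fit checking secondary (1), point estimation ignored (0).
kappa = kappa_from_importance((4, 1, 0, 1))
print(kappa)
```

Trying two or three such patterns and comparing the resulting designs, as suggested above, is usually more informative than fine-tuning a single set of weights.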

5. Discussion

In fitting polynomial models the validity of using higher order terms as an error estimate depends on the validity of the model, whereas the validity of pure error as an error estimate is not dependent on the fit of the model. Classical design criteria were developed assuming that an independent error estimate was available and thus, when that is not so, the usual criteria do not have the properties that they are intended to have. We have proposed modifications to the usual criteria so that the resulting designs take into account the necessity of obtaining a valid estimate of error for proper inferences about the parameters of the model. The corrected criteria, based on the quantiles of appropriate F-distributions for inferences, may result in quite extreme designs that do not allow any lack-of-fit checks. However, by using compound criteria which incorporate


Table 8. Compound optimum designs and their properties for five factors, one at two levels and four at three levels, in 40 runs, under the second-order model (example 4; see also Table 5)

Designs for the following compound criteria:

κ = (0.5, 0, 0, 0.5)—V κ = (0, 0.5, 0, 0.5)—VI

X1 X2 X3 X4 X5 X1 X2 X3 X4 X5

−1 −1 −1 −1 −1 −1 −1 −1 −1 1

−1 −1 −1 1 1 −1 −1 −1 1 −1

−1 −1 1 −1 1 −1 −1 −1 1 −1

−1 −1 1 1 −1 −1 −1 1 −1 −1

−1 −1 1 1 −1 −1 −1 1 1 1

−1 1 −1 −1 1 −1 1 −1 −1 −1

−1 1 −1 −1 1 −1 1 −1 1 1

−1 1 −1 1 −1 −1 1 −1 1 1

−1 1 −1 1 −1 −1 1 1 1 −1

−1 1 1 −1 −1 −1 1 1 1 −1

−1 1 1 −1 −1 1 −1 −1 −1 −1

−1 1 1 1 1 1 −1 −1 −1 −1

1 −1 −1 −1 1 1 −1 −1 1 1

1 −1 −1 1 −1 1 −1 1 −1 1

1 −1 1 −1 −1 1 −1 1 1 −1

1 1 −1 −1 −1 1 1 −1 −1 1

1 1 −1 −1 −1 1 1 −1 1 −1

1 1 −1 1 1 1 1 1 1 1

1 1 1 −1 1 1 1 1 1 1

1 1 1 −1 1 −1 1 1 −1 0

1 1 1 1 −1 1 1 −1 1 0

1 1 1 1 −1 1 1 1 −1 0

1 1 −1 1 1 −1 1 −1 0 −1

1 −1 1 1 0 −1 1 1 0 1

−1 1 1 0 1 1 1 1 0 −1

1 −1 1 0 1 −1 −1 0 −1 −1

−1 −1 0 −1 1 −1 1 0 −1 1

1 −1 0 1 1 1 1 0 −1 −1

−1 0 −1 1 1 −1 0 1 −1 1

1 0 1 1 1 1 0 1 −1 −1

−1 −1 −1 0 0 −1 −1 1 0 0

1 1 −1 0 0 −1 0 −1 −1 0

−1 1 0 1 0 1 0 −1 0 1

−1 0 1 −1 0 1 0 0 1 −1

−1 0 0 0 −1 1 −1 0 0 0

1 0 0 0 −1 −1 0 0 1 0

df (PE; LoF) (12; 8) (9; 11)

DS-eff 98.36 98.34

AS-eff 92.43 95.84

(DP)S-eff 89.09 77.22

(AP)S-eff 98.27 94.53