©2012 Royal Statistical Society 0035–9254/12/61345
**61**,*Part 3*,*pp. 345–401*

**Optimum design of experiments for statistical inference**

Steven G. Gilmour

*University of Southampton, UK*
and Luzia A. Trinca

*Universidade Estadual Paulista, Botucatu, Brazil*

*[Read before The Royal Statistical Society on Wednesday, November 16th, 2011, the President, Professor V. S. Isham, in the Chair]*

*Summary.* One attractive feature of optimum design criteria, such as D- and A-optimality, is that
they are directly related to statistically interpretable properties of the designs that are obtained,
such as minimizing the volume of a joint confidence region for the parameters. However, the
assumed relationships with inferential procedures are valid only if the variance of experimental
units is assumed to be known. If the variance is estimated, then the properties of the inferences
depend also on the number of degrees of freedom that are available for estimating the error
variance. Modified optimality criteria are defined, which correctly reflect the utility of designs
with respect to some common types of inference. For fractional factorial and response surface
experiments, the designs that are obtained are quite different from those which are optimal
under the standard criteria, with many more replicate points required to estimate error. The
optimality of these designs assumes that inference is the only purpose of running the experiment,
but in practice interpretation of the point estimates of parameters and checking for lack of fit of
the treatment model assumed are also usually important. Thus, a compromise between the new
criteria and others is likely to be more relevant to many practical situations. Compound criteria
are developed, which take account of multiple objectives, and are applied to fractional factorial
and response surface experiments. The resulting designs are more similar to standard designs
but still have sufficient residual degrees of freedom to allow effective inferences to be carried
out. The new procedures developed are applied to three experiments from the food industry to
see how the designs used could have been improved and to several illustrative examples. The
design optimization is implemented through a simple exchange algorithm.

*Keywords*: A-optimality; Blocking; Compound criterion; D-optimality; Exchange algorithm; Factorial design; Lack of fit; Pure error; Response surface

**1.** **Introduction**

Experiments with complex treatment structures, such as fractional factorial, response surface or
mixtures designs, are very common in industrial research and development, as well as in many
laboratory-based sciences. In such experiments, variance-based optimality criteria are increasingly used, owing to their availability in software packages, to choose the treatment design and, if appropriate, to arrange the design in blocks. An advantage of criteria such as D-optimality, A- (or more generally L-)optimality, G-optimality, etc., is that they have meaningful interpretations in relation to the statistical analysis to be performed on the data from the experiment. For example, a D-optimum design minimizes the volume of a joint confidence region for the parameters, an A-optimum design minimizes the average variance of parameter estimates and minimizes the average squared width of the corresponding confidence intervals and so on. Very importantly, multiple objectives can be met through the use of compound criteria, although this is not yet common in practice. For a comprehensive and accessible description of optimality criteria and their applications, see Atkinson *et al.* (2007).

*Address for correspondence*: Steven G. Gilmour, School of Mathematics, University of Southampton, Highfield, Southampton, SO17 1BJ, UK.
E-mail: s.gilmour@soton.ac.uk

The work that is presented here was motivated mainly by experimental research carried out for the food industry. Fractional factorial and response surface designs are widely used for such experiments, but the relatively large run-to-run variation that is caused by the use of biological materials means that some of the very small designs that are used in some other industrial applications are ineffective. The analysis of data involves inferences based on confidence intervals or hypothesis tests, to ensure that fitted response surfaces are not overinterpreted as showing effects which could be due to random variation. Despite the supposed relationships between optimality criteria and common methods of analysing the data from these experiments, many experimenters still prefer to use standard designs, such as central composite designs (CCDs). One advantage of classical designs is that, unlike most optimum designs, they include replicate points.

In almost all experiments, the statistical inference procedures that are used rely on an internal estimate of the variance between experimental units, which is often called ‘error’. In experiments in which the proposed model for treatment effects includes fewer parameters than there are treatments, there are some differences in practice about which estimate of error is used. The default in many statistical computing packages, especially if a general regression program is used, and the procedure that is described in many regression textbooks (which are often aimed at analysing observational data), is to use the residual mean square from the fitted model. However, most textbooks on the design and analysis of experiments recommend separating lack of fit from ‘pure error’ and using the pure error mean square for inference.

We strongly recommend the latter method of estimating error. However, once this has been accepted, the usual definitions of optimum design criteria no longer have all of the statistical interpretations claimed for them, since the sizes of confidence intervals and regions, or equivalently the power of hypothesis tests, depend not only on the variance matrix of the parameter estimates, but also on the degrees of freedom for pure error. The objectives of this paper are to show how the criteria should be correctly defined in order to have the properties claimed, to show that they can be applied in some of the same ways as the traditional criteria and to illustrate by examples that quite different designs can be optimal under the new criteria.

In Section 2, we discuss the analysis of data and, in particular, the estimation of error for inferential procedures. In Section 3, adjusted definitions of various optimality criteria which have the desired interpretations and examples of their use to choose designs are given. In Section 4 compound criteria which allow a compromise between different experimental objectives are developed and illustrated. Finally, some overall lessons are drawn in Section 5.

The programs that were used to obtain near optimum designs can be obtained from http://www.blackwellpublishing.com/rss

**2.** **Inference from designed experiments**

The analysis of data from factorial-type designs typically includes fitting one or more polynomial models and carrying out hypothesis tests on these models, in particular to compare models of different orders and to test whether the model is better than a null model, but also to test individual high order parameters. Once a reasonable model has been found it is interpreted by estimating the parameters. Except in rare cases in which the variance of experimental units, $\sigma^2$, can be assumed to be known, all such tests require an estimate of $\sigma^2$. We assume that estimating $\sigma^2$ is not of interest in itself but is required only for carrying out inference that is related to the treatment comparisons.

For a completely randomized design, in general, we assume that an experiment has been run with $t$ treatments defined by combinations of the levels of the factors and that the responses can be modelled as

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, t, \quad j = 1, \ldots, n_i, \tag{1}$$

where $Y_{ij}$ is the response from the $j$th replicate of treatment $i$, $\mu_i$ is the expected response from treatment $i$, $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $V(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$. We shall refer to this as the full treatment model. Experimenters often try to make interpretation easier and more informative by fitting a submodel with

$$\mu_i = \beta_0 + \mathbf{f}(\mathbf{x}_i)'\boldsymbol{\beta}, \qquad i = 1, \ldots, t, \tag{2}$$

where $\mathbf{x}_i$ represents the levels of the factors $X_1, \ldots, X_q$ in treatment $i$, $\mathbf{f}$ is a $(p-1)$-dimensional function of these levels and $\boldsymbol{\beta}$ is a $(p-1)$-dimensional vector of parameters. In this paper we shall assume that $\mathbf{f}$ represents a polynomial which respects functional marginality, but the same arguments would apply to other linear models.
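As a concrete illustration of model (2), the sketch below expands one treatment into the row $(1\ \mathbf{f}(\mathbf{x})')$ of the model matrix for a full second-order polynomial, which respects functional marginality because every lower order term is present. The function name `second_order_row` and the ordering of the terms (linear, quadratic, interactions) are assumptions made for this example, not notation from the paper.

```python
from itertools import combinations


def second_order_row(x):
    """Return (1, f(x)') for a full second-order polynomial in the factors x.

    Hypothetical helper: term order is intercept, linear terms,
    pure quadratic terms, then two-factor interactions.
    """
    x = list(x)
    quadratic = [xi * xi for xi in x]
    interactions = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1.0] + x + quadratic + interactions


# One treatment of a three-factor experiment: q = 3 gives p = 10 parameters,
# i.e. 1 intercept + 3 linear + 3 quadratic + 3 interaction terms.
row = second_order_row((1, 0, -1))
p = len(row)
print(p, row)
```

With $q = 3$ factors this gives $p = 10$, the value that appears in the three-factor examples of Section 3.2.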

The first question that we shall address is whether, when performing inference from model (2), we should use the estimate of $\sigma^2$ from fitting model (2), or the pure error estimate obtained from fitting model (1). To illustrate the difference that it might make, consider exercise 11.6 of Box and Draper (2007). They analysed data from a three-factor rotatable CCD with four centre points, with one gross outlier (from a factorial point) removed. The pure error estimate of $\sigma^2$ is $s^2 = 10.77$ on 3 degrees of freedom, whereas the estimate that is obtained from the second-order model, i.e. pooling the pure error and second-order model lack-of-fit degrees of freedom, is $s_p^2 = 7.03$ on 7 degrees of freedom. The mean square for second-order effects is 30.90 and, using $s_p^2$, Box and Draper found that the test of the second-order parameters gives a $p$-value of 0.037 and went on to do further interpretation of this model. However, using $s^2$, the test of the second-order parameters gives a test statistic of $F = 2.87$, which on 6 and 3 degrees of freedom gives a $p$-value of 0.208, which would suggest that there is little justification for further interpreting the second-order model. A more clear-cut recommendation could have been given if the design had allowed more than 3 degrees of freedom for pure error.
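The two tests can be checked from the summary statistics quoted above. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the tail probability of the $F$-distribution is obtained from the regularized incomplete beta function via a standard continued fraction, which is an implementation choice for this illustration rather than anything taken from Box and Draper (2007) or from the paper.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-12, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df1, df2):
    """P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_value))


ms_second_order = 30.90            # mean square for second-order effects, 6 df
s2_pure, df_pure = 10.77, 3        # pure error estimate
s2_pooled, df_pooled = 7.03, 7     # pooled (pure error + lack of fit) estimate

f_pure = ms_second_order / s2_pure
f_pooled = ms_second_order / s2_pooled
p_pure = f_upper_tail(f_pure, 6, df_pure)
p_pooled = f_upper_tail(f_pooled, 6, df_pooled)
print(f"pure error: F = {f_pure:.2f}, p = {p_pure:.3f}")
print(f"pooled:     F = {f_pooled:.2f}, p = {p_pooled:.3f}")
```

The same mean square for second-order effects leads to opposite conclusions purely because the denominator and its degrees of freedom change.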

This is not an isolated example. A few pages earlier in the same book (Box and Draper (2007), pages 668–669), one example has $s^2 = 5.667$ and $s_p^2 = 19.612$, although lack of fit is not significant, and the next example has $s^2 = 220.5$ and $s_p^2 = 81.7$. Although in these two examples the qualitative conclusions are not changed greatly, standard errors could be quite drastically underestimated or overestimated and confidence intervals can be too narrow or too wide. Clearly, it is an important decision which estimate of $\sigma^2$ is used, but there is no unanimity among researchers. In many of the textbooks which users of factorial and response surface designs use, there is no acknowledgement of the complexity of the issue. Khuri and Cornell (1996), Dean and Voss (1999), Box and Draper (2007) and Atkinson *et al.* (2007) recommend using $s_p^2$. Cochran and Cox (1957) and John (1971) recommend using $s^2$. Cornell (2002) seems to recommend both in different sections and Hinkelmann and Kempthorne (2005) seem to give a different recommendation from Hinkelmann and Kempthorne (2008).

Although most researchers do not discuss the issue and their practice varies, those who do discuss it generally come down in favour of using pure error, i.e. $s^2$. Among some of the classic texts, Draper and Smith (1998), page 48, state that $s^2$

‘will usually provide an estimate of $\sigma^2$ which is much more reliable than we can obtain from any other source’,

adding that

‘For this reason, it is sensible when designing experiments to arrange for (replicates)’.

In the context of factorial experiments, Scheffé (1959), pages 126–127, strongly condemned the practice of pooling sums of squares from non-significant interactions because it leads to biased estimation of $\sigma^2$ (since pooling of zero effects will only be done when their estimates turn out to be small) and leads to procedures whose statistical properties are not known. He also recommended

‘designing the experiment so that there will be a sufficient number of d.f. for (pure) error’.

Cox (1958) agrees that $\sigma^2$ ‘is best estimated’ by pure error, but is more forgiving and allows the use of $s_p^2$ if ‘only one observation is made on each treatment’. Davies (1956) agrees and Wu and Hamada (2009) tend towards the same view but make a less clear-cut recommendation.

We accept the view that inferential procedures should be carried out by using the unbiased pure error estimator of $\sigma^2$, while acknowledging that statistical inference is often not the most important part of the analysis and interpretation of experimental data. The main problem with using $s_p^2$ is that the biases induced are unmeasurable and the inferences are therefore difficult to interpret. In any particular experiment, if carrying out inference is regarded as being important, then the design should be chosen to make that inference as informative as possible. In the next section, we show how the standard optimality criteria must be modified to do this. Later, we shall consider compound criteria for situations in which inference represents a part, but not the whole, of the means of analysing the data from the experiment.

**3.** **Design criteria for inferential procedures**
*3.1.* *Definitions*

If the data from an experiment are to be analysed primarily by using confidence intervals or regions and/or hypothesis tests, then the experiment should be designed to ensure that these procedures will be as informative as possible. For simplicity of presentation and to clarify the relationships with standard criteria, we shall assume that the aim is to obtain unbiased confidence intervals or regions of minimal length or volume. The same criteria will maximize the power of hypothesis tests which can be obtained from these confidence intervals or regions, e.g. $t$-tests of individual parameters or $F$-tests of sets of parameters. To carry out these procedures the design must allow a sufficient number of degrees of freedom for estimating error. We can, in fact, specify formally the appropriate criterion.

The usual statistical justification for D-optimality is that it minimizes the volume of the joint confidence region for the parameters; see, for example, Atkinson *et al.* (2007), page 135. This is based on the fact that the volume of the confidence region is proportional to $|\mathbf{X}'\mathbf{X}|^{-1/2}$, where $\mathbf{X}$ is the polynomial model matrix, given the treatment design, with $i$th row $(1\ \mathbf{f}(\mathbf{x}_i)')$, and so the D-criterion minimizes $1/|\mathbf{X}'\mathbf{X}|$. Although this is correct, as noted by Kiefer (1959), ‘with $\sigma^2$ known or else (pure error degrees of freedom) the same for all designs’, in a general form, the volume of a $100(1-\alpha)\%$ confidence region (see Draper and Smith (1998), page 145) is proportional to

$$(F_{p,d;1-\alpha})^{p/2}\,|\mathbf{X}'\mathbf{X}|^{-1/2},$$

where $p$ is the number of parameters in the model, $d$ is the number of pure error degrees of freedom and $F_{p,d;1-\alpha}$ is the $(1-\alpha)$-quantile of the $F$-distribution with $p$ numerator and $d$ denominator degrees of freedom. Thus the D-criterion should be to minimize

$$(F_{p,d;1-\alpha})^{p} / |\mathbf{X}'\mathbf{X}|.$$

We shall refer to this as the $DP(\alpha)$ criterion. In this paper, we shall use $\alpha = 0.05$ for illustration and refer to the criterion simply as $DP$, but the required confidence level should be considered carefully for each experiment. Despite the above quotation, Kiefer (1959) did not suggest this additional step, since he did not separate lack of fit from pure error.
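A small numerical comparison shows how the $F$-quantile can reverse the ranking given by the standard D-criterion. The sketch below is a toy case that is not taken from the paper: a first-order model in two factors ($p = 3$) with $n = 8$ runs. The function names are hypothetical, and the quantiles $F_{3,d;0.95}$ are hard-coded, rounded values from standard $F$ tables so that only the standard library is needed.

```python
# Rounded table values of F_{3,d;0.95} for d = 1..7 (assumed inputs).
F_3_D_95 = {1: 215.71, 2: 19.16, 3: 9.28, 4: 6.59, 5: 5.41, 6: 4.76, 7: 4.35}


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return result


def information_det(design):
    """|X'X| for the first-order model with rows (1, x1, x2)."""
    rows = [(1.0, x1, x2) for x1, x2 in design]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    return det(xtx)


def pure_error_df(design):
    """d = number of runs minus number of distinct treatments."""
    return len(design) - len(set(design))


def dp_value(design, p=3):
    """DP criterion (F_{p,d;0.95})^p / |X'X|; infinite if it cannot be evaluated."""
    d = pure_error_df(design)
    m = information_det(design)
    if d not in F_3_D_95 or m <= 1e-9:
        return float("inf")
    return F_3_D_95[d] ** p / m


# Design A: duplicated 2^2 factorial (4 distinct treatments, d = 4).
design_a = [(-1, -1), (-1, 1), (1, -1), (1, 1)] * 2
# Design B: three corners replicated 3, 3 and 2 times (d = 5, no lack-of-fit df).
design_b = [(-1, -1)] * 3 + [(1, -1)] * 3 + [(-1, 1)] * 2

for name, design in (("A", design_a), ("B", design_b)):
    print(name, information_det(design), pure_error_df(design), dp_value(design))
```

Design A has the larger $|\mathbf{X}'\mathbf{X}|$ and so is preferred by the standard D-criterion, but design B has the smaller $DP$ value because its extra pure error degree of freedom reduces the $F$-quantile by more than enough to compensate.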

Similarly, $D_S$-optimality is intended to minimize the volume of a joint confidence region for a subset of $p_2$ of the parameters by minimizing $|(\mathbf{M}^{-1})_{22}|$, where $\mathbf{M} = \mathbf{X}'\mathbf{X}$ and $(\mathbf{M}^{-1})_{22}$ is the portion of its inverse corresponding to the subset of the parameters of interest. To take account of pure error estimation correctly, the $(DP)_S$ criterion is to minimize

$$(F_{p_2,d;1-\alpha})^{p_2}\,|(\mathbf{M}^{-1})_{22}|.$$

This criterion should be used, for example, when a major objective of the experiment is to compare the first-order model with the second-order model. Then the higher order terms will form the subset and minimizing the volume of a confidence region for them will be equivalent to maximizing the power of a test for their existence. Note that if the parameters of interest are the treatment parameters and the nuisance parameter(s) is or are the intercept or the intercept plus block effects, then standard $D_S$-optimality reduces to D-optimality. With the new criterion, this is no longer true, owing to the reduction in the numerator degrees of freedom. Throughout this paper, we shall use $D_S$ to refer to the situation where all parameters of the polynomial treatment model (excluding the intercept) contribute to the criterion, unless otherwise stated.

L-optimality is intended to minimize the mean of the variances of several linear functions of the parameters, defined by $\mathbf{L}'(\beta_0\ \boldsymbol{\beta}')'$, by minimizing $\mathrm{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}$, where $\mathbf{W} = \mathbf{L}\mathbf{L}'$. If $\mathbf{W}$ is diagonal, this reduces to weighted-A-optimality and if all the diagonal elements are equal we obtain A-optimality, whereas, if $p_2$ of them are equal and the rest are 0, we obtain $A_S$-optimality. Note that A- and $A_S$-optimality are scale dependent, i.e. they are not invariant to linear reparameterizations, so, whenever a design is described as A or $A_S$ optimal, we shall state with respect to which scaling. If $\mathbf{L}$ has a single column, the criterion reduces to c-optimality. With L-optimality the property that is claimed for the criterion can be related to point estimation and so the standard criteria are acceptable in this sense. However, the width of a confidence interval for the $i$th linear function of the parameters is proportional to $\sqrt{F_{1,d;1-\alpha}\,\mathbf{l}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{l}_i}$, where $\mathbf{l}_i'$ is the $i$th row of the matrix $\mathbf{L}'$, and so the mean of the squared lengths of such intervals is minimized by minimizing

$$F_{1,d;1-\alpha}\,\mathrm{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\},$$

which we refer to as the $LP$-criterion, with the letter $P$ also being used for the special cases, such as $AP$-optimality. It might be advisable to replace $F_{1,d;1-\alpha}$ with a similar quantity corrected for multiple testing, but this is rarely done in the analysis of data from experiments of this type.

The usual form of G-optimality minimizes the maximum variance of the estimated response over the design region by minimizing $\max_{\mathbf{x}}\{(1\ \mathbf{f}(\mathbf{x})')(\mathbf{X}'\mathbf{X})^{-1}(1\ \mathbf{f}(\mathbf{x})')'\}$. Again this is suitable for point estimation but, if (pointwise) confidence intervals for the mean response or prediction intervals for new units are required, we should define the $GP$-criterion to be to minimize

$$F_{1,d;1-\alpha}\,\max_{\mathbf{x}}\{(1\ \mathbf{f}(\mathbf{x})')(\mathbf{X}'\mathbf{X})^{-1}(1\ \mathbf{f}(\mathbf{x})')'\}.$$

The same general idea can be used for any other design optimality criterion which relates to the experiment’s ability to allow statistical inference of any type to be performed. Other obvious examples include I-optimality and T-optimality; see Atkinson *et al.* (2007) for these and other criteria.

We make the following comments about these new criteria.

(a) An echo of the general idea can be found in the last section of Fisher (1966), pages 242–245, in the context of sample size calculations. However, Fisher’s method is based on fiducial probability, which might have hindered its acceptance. As he wrote, it ‘is unintelligible only to those who over a long period resisted the cogency of the fiducial argument’.

(b) In experiments in which the only treatment design question is how many (greater than 0) replicates there should be of each treatment, e.g. experiments with unstructured treatments, the number of distinct treatments is constant, so $d$ depends only on the total number of experimental units and hence the new criteria are identical to the standard criteria.

(c) Asymptotically, as $d \to \infty$, the new criteria converge to the standard criteria. Hence, in very large experiments, the designs that are chosen will be the same.

(d) The concept of continuous design is not meaningful with the new criteria, since the quantiles of the $F$-distributions are not proportional to $n$. Consequently, there are no equivalence theorems. Hence, for finding optimum or near optimum designs, there is usually no alternative to either an exhaustive search or a discrete optimization heuristic, such as an exchange algorithm.

(e) The standard versions of most criteria are meaningful in terms of point estimation, but for the D-criterion and its variants the standard version really has no statistical interpretation and should be abandoned.
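Comment (c) rests on a standard distributional limit that is not spelled out above: as the denominator degrees of freedom grow, the $F$-quantile tends to a scaled $\chi^2$-quantile that no longer depends on $d$. Written out for the $DP$ criterion, as a sketch of the argument rather than a result quoted from the paper:

```latex
\lim_{d \to \infty} F_{p,d;1-\alpha} = \frac{\chi^2_{p;1-\alpha}}{p},
\qquad\text{so that, for large } d,\qquad
\frac{(F_{p,d;1-\alpha})^{p}}{|\mathbf{X}'\mathbf{X}|}
\approx
\left(\frac{\chi^2_{p;1-\alpha}}{p}\right)^{p}
\frac{1}{|\mathbf{X}'\mathbf{X}|}.
```

The multiplier is then effectively the same for all competing designs, so the ranking of designs is driven by $|\mathbf{X}'\mathbf{X}|$ alone, as under the standard D-criterion. The same reasoning applies to the $LP$- and $GP$-criteria with $p = 1$ in the quantile.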

Although these criteria are new, the idea of including enough degrees of freedom to estimate pure error is common in response surface methodology. Following Dykstra (1959), several researchers have considered partially replicated two-level designs, although optimality is not considered. More recently, Liao and Chai (2004) and Dasgupta and Jacroux (2010) evaluated their partially replicated designs by using the standard D-criterion. However, in all these cases, the choice of the number of pure error degrees of freedom is made informally and separately from optimality considerations.

We have implemented some of the new criteria in a standard exchange algorithm for constructing optimum designs. A candidate set of treatments (combinations of factors’ levels) is formed. A random initial design, selected from the candidate set, starts the search and the complete procedure is repeated for a number of different initial designs (tries), as is usual for exchange algorithms. The search proceeds by systematically making exchanges between treatments in the current design and in the candidate set and accepting any exchange that improves the criterion function. Many other types of exchange algorithm exist, e.g. making use of the ideas of excursion (making more than one exchange before evaluating the design) or stochastic ideas, such as accepting with small probability an exchange which makes the design worse. For typical response surface problems, the simple exchange algorithm that we use is as good as any other, although for specific examples it might be possible to do slightly better. The numbers of pure error degrees of freedom are obtained by labelling the treatment combinations and counting the number of treatment levels at each exchange tried. These algorithms will find near optimum designs, but there is no guarantee that they will find the global optimum.
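The search described above can be sketched in a few lines. This is not the authors' program: it is a minimal illustration under stated assumptions, namely a first-order model in two factors ($p = 3$), a $3^2$ candidate set, $n = 8$ runs, the $DP$ criterion with hard-coded rounded table values of $F_{3,d;0.95}$, and hypothetical function names throughout.

```python
import random

# Rounded table values of F_{3,d;0.95} for d = 1..7 (assumed inputs).
F_3_D_95 = {1: 215.71, 2: 19.16, 3: 9.28, 4: 6.59, 5: 5.41, 6: 4.76, 7: 4.35}


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return result


def dp_value(design):
    """DP criterion for the model with rows (1, x1, x2); smaller is better."""
    d = len(design) - len(set(design))  # pure error df: runs minus distinct treatments
    rows = [(1.0, x1, x2) for x1, x2 in design]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    m = det(xtx)
    if d not in F_3_D_95 or m <= 1e-9:
        return float("inf")  # no pure error df, or singular information matrix
    return F_3_D_95[d] ** 3 / m


def exchange(candidates, n_runs, criterion, n_tries=20, seed=0):
    """Simple point exchange: accept any single swap that improves the criterion."""
    rng = random.Random(seed)
    best_design, best_value = None, float("inf")
    for _ in range(n_tries):
        design = [rng.choice(candidates) for _ in range(n_runs)]  # random start
        value = criterion(design)
        improved = True
        while improved:
            improved = False
            for i in range(n_runs):
                for cand in candidates:
                    if cand == design[i]:
                        continue
                    trial = design[:i] + [cand] + design[i + 1:]
                    trial_value = criterion(trial)
                    if trial_value < value:
                        design, value = trial, trial_value
                        improved = True
        if value < best_value:
            best_design, best_value = sorted(design), value
    return best_design, best_value


candidates = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
best_design, best_value = exchange(candidates, 8, dp_value)
print(best_design, best_value)
```

Because each accepted exchange strictly decreases the criterion, every try terminates at a design that no single exchange can improve; as noted above, this is a local optimum and repeating the search from several starts only makes the global optimum more likely, not certain.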

*3.2.* *Examples*

To show how the optimum designs using the new criteria differ from those using the standard criteria, we present some illustrative examples. $D_S$- and $A_S$-optimum designs are found for polynomial models, the nuisance parameter being the intercept, such that $(\mathbf{M}^{-1})_{22} = (\mathbf{X}'\mathbf{Q}_0\mathbf{X})^{-1}$, where $\mathbf{Q}_0 = \mathbf{I} - (1/n)\mathbf{1}\mathbf{1}'$ and the $\mathbf{X}$-matrix does not include the column of 1s. When aiming at a second-order model in a cubic region of experimentation under the $A_S$-criterion, we use weights to bring the different effects to the same scale, i.e. linear and interaction terms are given weight 1 and quadratic effects are given weight 0.25. This ensures that the optimum design for each individual parameter gives the same variance for the relevant parameter and can be considered as a simple application of the scaling that was recommended for non-linear models by Atkinson *et al.* (1993). The elements of the diagonal of $\mathbf{W}$ were scaled to add up to 1. In a spherical region, all parameters are given equal weights because they contribute equally to the polynomial approximation to the unknown true response function. We use the notation of subset designs to describe the treatments that are used. For $q$ factors $S_r$ is defined as the subset from the full factorial, in which $r$ factors appear as $\pm 1$ (or plus or minus another constant in a spherical region) and, if $r < q$, $q - r$ factors appear as 0; see Gilmour (2006) and Ahmad and Gilmour (2010) for more details on subset designs. The following tables show $D_S$-, $(DP)_S$-, $A_S$-, $(AP)_S$- and, in some cases, compound optimum designs for different numbers of factors and runs. Compound criteria are discussed in Section 4. To discriminate better between the designs, the tables also show pure error, PE, and lack of fit, LoF, degrees of freedom and efficiencies (in percentages) with respect to the best design found under each criterion.
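The subset notation translates directly into code. The sketch below generates $S_r$ for $q$ factors and counts the pure error and lack-of-fit degrees of freedom of a design built from subsets, taking the lack-of-fit degrees of freedom as the number of distinct treatments minus $p$. The function names are hypothetical, and the three designs constructed are the CCD, the modified Box–Behnken design and the cassava bread design described in the examples that follow.

```python
from itertools import combinations, product


def subset(q, r, level=1.0):
    """Points of S_r: r of the q factors at +/-level, the other q - r at 0."""
    points = []
    for active in combinations(range(q), r):
        for signs in product((-level, level), repeat=r):
            point = [0.0] * q
            for idx, s in zip(active, signs):
                point[idx] = s
            points.append(tuple(point))
    return points


def degrees_of_freedom(design, p):
    """Return (pure error df, lack-of-fit df) for a model with p parameters."""
    t = len(set(design))  # number of distinct treatments
    return len(design) - t, t - p


p = 10  # second-order model in q = 3 factors
ccd = subset(3, 3) + subset(3, 1) + 2 * subset(3, 0)          # S3 + S1 + 2S0
bbd = subset(3, 2) + 4 * subset(3, 0)                         # S2 + 4S0
cassava = 2 * subset(3, 3) + subset(3, 1) + 4 * subset(3, 0)  # 2S3 + S1 + 4S0

for name, design in (("CCD", ccd), ("Box-Behnken", bbd), ("cassava", cassava)):
    print(name, len(design), degrees_of_freedom(design, p))
```

Each $S_r$ contains $\binom{q}{r} 2^r$ points, so for $q = 3$ the subsets $S_3$, $S_2$, $S_1$ and $S_0$ have 8, 12, 6 and 1 points respectively.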

*3.2.1.* *Example 1 (n = 16; q = 3; p = 10)*

In Table 1 we show a few designs, under the second-order model, for three factors in 16 runs. Using the $D_S$- or $A_S$-criteria resulted in the same design I which, as usual, includes very few points close to the centre of the design region. There are no replicated treatments and thus the design does not allow pure error estimation. It does, however, allow 6 degrees of freedom for lack of fit. Using the pure error versions of the criteria resulted in 6 and 5 degrees of freedom for the $(DP)_S$- and $(AP)_S$-criteria, designs II and III, respectively. Design II is very extreme in the sense that it has no degrees of freedom for checking lack of fit. In both designs there is replication of points in the corners and on the faces of the cube, but no centre points. This is quite different from the usual practice of experimenters. Design IV is $(AP)_S(\alpha = 0.05/9)$ optimal, i.e. we used a Bonferroni adjustment for multiple comparisons. Although designs II and IV are equivalent in terms of their pure error degrees of freedom they have different properties for estimating the effects: the former being better for joint inferences on the treatment parameters; the latter for inferences on individual parameters. For comparison we also evaluated two subset designs, a CCD with two centre points ($S_3 + S_1 + 2S_0$), allowing 1 degree of freedom for pure error and a modified Box–Behnken design ($S_2 + 4S_0$), allowing 3 degrees of freedom for pure error. The CCD is quite similar to the $(AP)_S$-optimum design in terms of properties of the information matrix ($D_S$-eff = 93.15; $A_S$-eff = 90.75) but poorer for pure error estimation ($(DP)_S$-eff = 1.91; $(AP)_S$-eff = 4.31). The Box–Behnken design is a compromise in terms of pure error and lack-of-fit degrees of freedom, but is poor in terms of variance properties ($D_S$-eff = 74.94; $A_S$-eff = 66.34; $(DP)_S$-eff = 41.95; $(AP)_S$-eff = 50.17). The desirability of such compromises will be discussed further in Section 4.

**Table 1.** Optimum† designs and their properties for three three-level factors in 16 runs under the second-order model (example 1)

Designs for the following criteria: $D_S$ or $A_S$ (design I), $(DP)_S$ (design II), $(AP)_S$ (design III).

| I: $X_1$ | I: $X_2$ | I: $X_3$ | II: $X_1$ | II: $X_2$ | II: $X_3$ | III: $X_1$ | III: $X_2$ | III: $X_3$ |
|---|---|---|---|---|---|---|---|---|
| −1 | −1 | −1 | −1 | −1 | 1 | −1 | −1 | −1 |
| −1 | −1 | 1 | −1 | −1 | 1 | −1 | −1 | 1 |
| −1 | 1 | −1 | −1 | 1 | −1 | −1 | 1 | −1 |
| −1 | 1 | 1 | −1 | 1 | −1 | −1 | 1 | −1 |
| 1 | −1 | −1 | −1 | 1 | 1 | −1 | 1 | 1 |
| 1 | −1 | 1 | 1 | −1 | 1 | 1 | −1 | −1 |
| 1 | 1 | −1 | 1 | −1 | 1 | 1 | −1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | −1 | 1 |
| 1 | −1 | 0 | 1 | 0 | −1 | 1 | 1 | −1 |
| 1 | 1 | 0 | 1 | 0 | −1 | 1 | 1 | 1 |
| 1 | 0 | −1 | 0 | −1 | −1 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 | −1 | −1 | 1 | 0 | 0 |
| 0 | 1 | 1 | −1 | 0 | 0 | 0 | 1 | 0 |
| −1 | 0 | 0 | −1 | 0 | 0 | 0 | 1 | 0 |
| 0 | −1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0 | 0 | −1 | 0 | 0 | 1 | 0 | 0 | 1 |

| Property | I | II | III |
|---|---|---|---|
| df (PE; LoF)‡ | (0; 6) | (6; 0) | (5; 1) |
| $D_S$-eff | 100.00 | 83.09 | 93.03 |
| $A_S$-eff | 100.00 | 66.52 | 86.27 |
| $(DP)_S$-eff | 0.00 | 100.00 | 96.17 |
| $(AP)_S$-eff | 0.00 | 85.10 | 100.00 |

*(continued)*

*3.2.2.* *Example 2 (n = 18; q = 3; p = 10)*

For the experiment that was described in exercise 11.6 of Box and Draper (2007), which was mentioned in Section 2, several optimum designs are shown in Table 2. The efficiencies for the Box and Draper design (using the axial points at $\sqrt{3}$, which is slightly different from those actually used) are $D_S$-eff = 96.36, $A_S$-eff = 98.68, $(DP)_S$-eff = 42.46 and $(AP)_S$-eff = 63.10. Note that design III is very similar to this design. Again the designs that are built by the new criteria give quite extreme designs which allow little or no testing for lack of fit.

**Table 1** *(continued)*

Design for criterion $(AP)_S$§§ (design IV); designs for the following compound criteria§: $\boldsymbol{\kappa} = (0, 0.2, 0, 0.8)$§§ and $\boldsymbol{\kappa} = (0.2, 0, 0, 0.8)$ (design V), $\boldsymbol{\kappa} = (0, 0.2, 0, 0.8)$ (design VI).

| IV: $X_1$ | IV: $X_2$ | IV: $X_3$ | V: $X_1$ | V: $X_2$ | V: $X_3$ | VI: $X_1$ | VI: $X_2$ | VI: $X_3$ |
|---|---|---|---|---|---|---|---|---|
| −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 |
| −1 | 1 | −1 | −1 | −1 | 1 | −1 | 1 | −1 |
| 1 | −1 | −1 | −1 | 1 | −1 | −1 | 1 | 1 |
| 1 | 1 | −1 | −1 | 1 | 1 | 1 | −1 | −1 |
| −1 | 0 | 1 | 1 | −1 | −1 | 1 | −1 | −1 |
| −1 | 0 | 1 | 1 | −1 | 1 | 1 | −1 | 1 |
| 1 | 0 | 1 | 1 | 1 | −1 | 1 | 1 | −1 |
| 1 | 0 | 1 | 1 | 1 | −1 | 1 | 1 | −1 |
| 0 | −1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | −1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | −1 | 0 | 0 | −1 | −1 | 0 |
| 0 | 1 | 1 | 1 | 0 | 0 | −1 | 0 | 1 |
| −1 | 0 | 0 | 0 | −1 | 0 | 0 | −1 | 1 |
| −1 | 0 | 0 | 0 | −1 | 0 | 1 | 0 | 0 |
| 0 | 0 | −1 | 0 | 0 | −1 | 0 | 1 | 0 |
| 0 | 0 | −1 | 0 | 0 | −1 | 0 | 0 | −1 |

| Property | IV | V | VI |
|---|---|---|---|
| df (PE; LoF)‡ | (6; 0) | (4; 2) | (3; 3) |
| $D_S$-eff | 80.94 | 95.10 | 98.68 |
| $A_S$-eff | 73.09 | 89.46 | 96.03 |
| $(DP)_S$-eff | 97.42 | 78.21 | 55.24 |
| $(AP)_S$-eff | 93.50 | 88.89 | 72.63 |

†Where a heading indicates two criteria, the design is optimal for both.
‡Degrees of freedom for pure error and lack of fit.
§The compound criterion is defined in equation (5) in Section 4.1.
§§Confidence level corrected for multiple comparisons.

*3.2.3.* *Example 3: cassava bread (n = 26; q = 3; p = 10)*

Escouto (2000) performed an experiment to formulate a recipe for gluten-free bread based on cassava flour. Gluten-free food is recommended to people with coeliac disease, which is an autoimmune disorder of the small intestine which occurs in genetically predisposed individuals, and there is a large market for gluten-free versions of staple foods. The experiment involved three factors: $X_1$, the amount of powdered albumen (egg white) with levels 10, 20 and 30 g; $X_2$, the amount of yeast with levels 5, 10 and 15 g; $X_3$, the amount of ground cassava flour with levels 45, 55 and 65 g. Other substances such as salt, sugar, vegetable fat, powdered milk, fermented cassava starch and water were maintained at constant levels in all recipes, as were factors that were associated with the mixing and baking processes. A modified CCD, with duplicated factorial points and four centre points ($2S_3 + S_1 + 4S_0$) in 26 runs was planned, allowing 11 degrees of freedom for pure error. Several organoleptic characteristics of the product were evaluated as response variables. The objective was to find a formulation which would present similar characteristics to wheat-based white bread. Several alternative designs are presented in Table 3.

Again the $D_S$- and $A_S$-criteria give identical designs (design I) allowing 9 degrees of freedom for pure error. Using the $(DP)_S$- and $(AP)_S$-criteria resulted in 15 and 12 degrees of freedom for pure error respectively. Using a Bonferroni correction for multiple comparisons, the $(AP)_S(\alpha = 0.05/9)$ optimum design (design IV) allows 13 degrees of freedom for pure error. These designs show that, for carrying out inference, it is beneficial to sacrifice a little in terms of the traditional criteria to improve the estimation of $\sigma^2$.

For comparison we show in Table 4 the properties of a few subset designs, including the design ($2S_3 + S_1 + 4S_0$) which was actually used in the experiment. We note that this design allows similar numbers of pure error degrees of freedom to design III but it is poorer with respect to the other properties. A modified Box–Behnken design ($2S_2 + 2S_0$) allows the same number of pure error degrees of freedom as the $(AP)_S(\alpha = 0.05/9)$ optimum design but it is also poorer with respect to the other properties. The $3^3$-factorial excluding the centre point ($S_3 + S_2 + S_1$) is quite attractive with respect to its variance properties but allows no degrees of freedom for pure error. Other attempts to construct designs from the $S_r$ subsets are quite inferior with respect to all the properties evaluated.

*3.2.4.* *Example 4: oil extraction (n = 40; q = 5; p = 20)*

*An experiment on the extraction of oil from oilseeds was described by Rosenthal et al. (2001).*

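**Table 2.** Optimum designs and their properties for three factors in a spherical region ($\alpha = \sqrt{3/2}$) in 18 runs under the second-order model (example 2)

Designs for the following criteria: $D_S$ (design I), $(DP)_S$ (design II), $A_S$ (design III).

| I: $X_1$ | I: $X_2$ | I: $X_3$ | II: $X_1$ | II: $X_2$ | II: $X_3$ | III: $X_1$ | III: $X_2$ | III: $X_3$ |
|---|---|---|---|---|---|---|---|---|
| −1 | −1 | −1 | 1 | −1 | −1 | −1 | −1 | −1 |
| −1 | −1 | 1 | 1 | −1 | −1 | −1 | −1 | −1 |
| −1 | 1 | −1 | 1 | 1 | −1 | −1 | −1 | 1 |
| −1 | 1 | 1 | 1 | 1 | −1 | −1 | 1 | −1 |
| 1 | −1 | 1 | 0 | $-\alpha$ | $\alpha$ | −1 | 1 | 1 |
| 1 | 1 | −1 | 0 | $-\alpha$ | $\alpha$ | 1 | −1 | −1 |
| 1 | 1 | 1 | 0 | $\alpha$ | $\alpha$ | 1 | −1 | 1 |
| 0 | $-\alpha$ | $-\alpha$ | 0 | $\alpha$ | $\alpha$ | 1 | 1 | −1 |
| $\alpha$ | 0 | $-\alpha$ | $-\alpha$ | 0 | $-\alpha$ | 1 | 1 | 1 |
| $\alpha$ | $-\alpha$ | 0 | $-\alpha$ | 0 | $-\alpha$ | $-\sqrt{3}$ | 0 | 0 |
| 0 | 0 | $-\sqrt{3}$ | $-\alpha$ | 0 | $\alpha$ | $\sqrt{3}$ | 0 | 0 |
| 0 | 0 | $\sqrt{3}$ | $-\alpha$ | $-\alpha$ | 0 | 0 | $-\sqrt{3}$ | 0 |
| 0 | $-\sqrt{3}$ | 0 | $-\alpha$ | $-\alpha$ | 0 | 0 | $\sqrt{3}$ | 0 |
| 0 | $\sqrt{3}$ | 0 | $-\alpha$ | $\alpha$ | 0 | 0 | 0 | $-\sqrt{3}$ |
| $-\sqrt{3}$ | 0 | 0 | $-\alpha$ | $\alpha$ | 0 | 0 | 0 | $\sqrt{3}$ |
| $\sqrt{3}$ | 0 | 0 | $\sqrt{3}$ | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | $\sqrt{3}$ | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| Property | I | II | III |
|---|---|---|---|
| df (PE; LoF) | (1; 7) | (8; 0) | (3; 5) |
| $D_S$-eff | 100.00 | 87.28 | 98.75 |
| $A_S$-eff | 96.49 | 73.56 | 100.00 |
| $(DP)_S$-eff | 1.61 | 100.00 | 43.50 |
| $(AP)_S$-eff | 3.87 | 89.58 | 63.94 |

*(continued)*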

The experiment involved five factors, one of them qualitative (type of enzyme) at two levels (X1), and a second-order model was expected to fit the data. The actual experiment included 50 runs, 10 of which had no enzyme added and thus two of the other factors were not defined. For illustration we shall consider designs for the 40 runs with enzyme added. For this part of the experiment, the design used was $\tfrac{1}{2}S_5 + S_2 + 4S_1$, which allows 6 degrees of freedom for pure error.

This design and four optimum designs are shown in Table 5. The $D_S$-optimum design (design I) allows just 1 degree of freedom for pure error, whereas the $A_S$-optimum design (design III) allows none. The $(DP)_S$-optimum design (design II) has 20 degrees of freedom for pure error, but no degrees of freedom for lack of fit. The effect of this on the other properties of the design is quite large. The $(AP)_S$-optimum design allows 15 degrees of freedom for pure error and 5 for lack of fit and might seem more reasonable to many experimenters.
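The (PE; LoF) splits quoted for the completely randomized designs can be checked by counting distinct treatments: the pure error degrees of freedom are $n$ minus the number of distinct design points, and the lack-of-fit degrees of freedom are the number of distinct points minus $p$. The following is a minimal sketch of that count, not the authors' code; the function name `residual_df` is ours, and the data are design III of Table 2 ($n = 18$, $p = 10$), chosen because it is small enough to type out.

```python
import math
from collections import Counter


def residual_df(design, p):
    """Return (pure error df, lack-of-fit df) for an unblocked design.

    design: list of treatment tuples, one tuple per run
    p: number of parameters in the treatment model, intercept included
    """
    n = len(design)
    distinct = len(Counter(design))  # number of distinct treatments
    return n - distinct, distinct - p


r3 = math.sqrt(3.0)
# Design III of Table 2: factorial points (one duplicated),
# six axial points at distance sqrt(3), and three centre points.
design_iii = (
    [(-1, -1, -1)] * 2
    + [(-1, -1, 1), (-1, 1, -1), (-1, 1, 1), (1, -1, -1),
       (1, -1, 1), (1, 1, -1), (1, 1, 1)]
    + [(-r3, 0, 0), (r3, 0, 0), (0, -r3, 0),
       (0, r3, 0), (0, 0, -r3), (0, 0, r3)]
    + [(0, 0, 0)] * 3
)

print(residual_df(design_iii, p=10))
```

For this design the count reproduces the (3; 5) split reported in Table 2. The same function applies to any of the unblocked designs in this section once the runs are listed as tuples.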

*3.2.5.* *Example 5 (n = 16; q = 8; p = 9)*

Table 6 shows designs for eight two-level factors in 16 runs, under the first-order model. This situation differs from the others considered in that the $D_S$- and $A_S$-optimum design (design I) has the same information matrix as a regular 1/16 fractional factorial, which does not allow for pure error estimation. The $(DP)_S$-optimum design (design II) is an irregular fraction which

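**Table 2** *(continued)*

*Designs for the following criteria:* IV, $(AP)_S$; V, compound criterion† with $\boldsymbol{\kappa} = (0.5, 0, 0, 0.5)$; VI, compound criterion† with $\boldsymbol{\kappa} = (0, 0.5, 0, 0.5)$

| IV: X1 | IV: X2 | IV: X3 | V: X1 | V: X2 | V: X3 | VI: X1 | VI: X2 | VI: X3 |
|---|---|---|---|---|---|---|---|---|
| −1 | 1 | 1 | −1 | −1 | 1 | −α | −α | 0 |
| −1 | 1 | 1 | −1 | −1 | 1 | −α | −α | 0 |
| 1 | −1 | 1 | 1 | −1 | −1 | −α | α | 0 |
| 1 | −1 | 1 | 1 | −1 | −1 | α | −α | 0 |
| 1 | 1 | −1 | 1 | −1 | 1 | α | α | 0 |
| 1 | 1 | 1 | 0 | α | −α | −α | 0 | −α |
| 1 | 1 | 1 | 0 | α | −α | −α | 0 | α |
| −α | −α | 0 | 0 | α | α | −α | 0 | α |
| −α | −α | 0 | −α | 0 | −α | α | 0 | −α |
| 0 | −α | −α | −α | 0 | −α | α | 0 | α |
| 0 | −α | −α | −α | α | 0 | 0 | −α | −α |
| −α | 0 | −α | −α | α | 0 | 0 | α | −α |
| −α | 0 | −α | α | α | 0 | 0 | −α | α |
| √3 | 0 | 0 | 0 | 0 | √3 | 0 | α | α |
| 0 | √3 | 0 | 0 | −√3 | 0 | 0 | −α | α |
| 0 | 0 | √3 | √3 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| Property | IV | V | VI |
|---|---|---|---|
| df (PE; LoF) | (7; 1) | (6; 2) | (5; 3) |
| $D_S$-eff | 90.69 | 93.50 | 93.49 |
| $A_S$-eff | 86.34 | 88.55 | 96.58 |
| $(DP)_S$-eff | 95.75 | 88.55 | 76.05 |
| $(AP)_S$-eff | 100.00 | 95.78 | 94.66 |

†The compound criterion is defined in equation (5) in Section 4.1.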

allows 7 degrees of freedom for pure error and none for lack of fit. Again this shows that, for inference, it is worth sacrificing a considerable amount in terms of the $D$-criterion to gain pure error degrees of freedom. The $(AP)_S$-optimum design (design III) is even more irregular, with one factor having 12 runs at one level and four runs at the other. Of course, by sacrificing orthogonality, we lose the ability to perform other types of analysis, such as normal or half-normal plots of effects. If this, rather than formal inference, were regarded as the main purpose of the experiment then we should use the standard $A_S$-criterion to reflect this.

*3.3.* *Blocked designs*

Since many multifactor designs use moderately large numbers of runs, it is common that they must be run in blocks. For example, in industrial experiments the blocks often correspond to days or shifts.

In a blocked design the full treatment model for the response in unit $j$ of block $i$ with treatment $k$ applied to it may be written as

$$Y_{ij(k)} = \mu_i + \tau_k + \varepsilon_{ij}, \qquad (3)$$

where $\mu_i$ is the expected response in block $i$, $\tau_k$ is the effect of treatment $k$, $i = 1, \ldots, b$, $j = 1, \ldots, n_i$ ($\sum_{i=1}^{b} n_i = n$), $k = 1, \ldots, t$, $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $V(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$. Block effects may be fixed or random

**Table 3.** Optimum designs and their properties for three three-level factors in 26 runs under the second-order
model (example 3)

†The compound criterion is deﬁned in equation (5) in Section 4.1.

‡Conﬁdence level corrected for multiple comparison.

but, as discussed in Gilmour and Trinca (2006), in the design phase it is safer to consider them as ﬁxed since the variance components are not known. This ensures that in the most difﬁcult case, with a large block variance component, the design is optimal, whereas, when the block variance component is small, though the design might be suboptimal for this case, it will give better estimation than in the case of a large block variance component. We shall follow this advice here.

As with completely randomized designs, we shall try to fit submodels for $\tau_k$ such as

$$\tau_k = \mathbf{f}(\mathbf{x}_k)' \boldsymbol{\beta}, \qquad (4)$$

**Table 4.** Subset designs and their properties for three three-level factors in 26 runs under the second-order model (example 3)

| Design | df (PE; LoF) | $D_S$-eff | $A_S$-eff | $(DP)_S$-eff | $(AP)_S$-eff |
|---|---|---|---|---|---|
| $2S_3 + S_1 + 4S_0$ | (11; 5) | 90.89 | 82.43 | 86.56 | 83.16 |
| $S_3 + 2S_1 + 6S_0$ | (11; 5) | 72.68 | 66.63 | 69.22 | 67.22 |
| $S_3 + S_2 + S_1$ | (0; 16) | 94.27 | 92.82 | 0.00 | 0.00 |
| $2S_2 + 2S_0$ | (13; 3) | 78.71 | 70.79 | 79.99 | 74.13 |
| $S_2 + 2S_1 + 2S_0$ | (7; 9) | 58.81 | 45.26 | 44.12 | 39.56 |
| $S_2 + S_1 + 8S_0$ | (7; 9) | 55.78 | 43.89 | 41.85 | 38.36 |

where $\mathbf{f}(\mathbf{x}_k)$ and $\boldsymbol{\beta}$ are defined as in equation (2). Note that $\mathbf{x}_k$ may be written as $\mathbf{x}_{ij}$, i.e. the levels of the $q$ factors that were applied to unit $j$ of block $i$.

By fitting model (3) we obtain an unbiased estimator of $\sigma^2$ under the usual assumption of additive treatment effects, plus a valid randomization. Using the same argument as for unblocked designs, for inferences on the $\boldsymbol{\beta}$-parameters we should use this estimate of error variance and so should use a design which allows its estimation. Note that, as is usually the case, some treatment effects may be confounded with blocks.

The variance matrix of the least squares estimator $\hat{\boldsymbol{\beta}}$ is

$$\operatorname{var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{M}_B^{-1})_{22} = \sigma^2 (\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1},$$

where

$$\mathbf{M}_B = \begin{pmatrix} \mathbf{Z}'\mathbf{Z} & \mathbf{Z}'\mathbf{X} \\ \mathbf{X}'\mathbf{Z} & \mathbf{X}'\mathbf{X} \end{pmatrix},$$

$\mathbf{X}$ is defined as before (the column of 1s is excluded and so $\mathbf{X}$ has $p - 1$ columns), $\mathbf{Q} = \mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$ and $\mathbf{Z}$ is the $n \times b$ matrix whose columns are indicators for blocks. The subscript $B$ stands for the blocked design.

Criteria for designs allowing for blocks, the usual ones and the new ones proposed in this paper, are defined as before with $\mathbf{M}_B$ or $\mathbf{X}'\mathbf{Q}\mathbf{X}$ taking the place of $\mathbf{X}'\mathbf{X}$ or $\mathbf{X}'\mathbf{Q}_0\mathbf{X}$, and the appropriate numerator degrees of freedom. In particular, $D_S$-optimum designs are also $D$-optimum designs because $|\mathbf{M}_B| = |\mathbf{X}'\mathbf{Q}\mathbf{X}|\,|\mathbf{Z}'\mathbf{Z}|$ and $|\mathbf{Z}'\mathbf{Z}|$ is constant. However, $(DP)_S$-optimum designs are not $DP$-optimum designs, owing to the reduction in numerator degrees of freedom.

Whereas the $DP$-criterion minimizes $(F_{p+b-1,\,d_B;\,1-\alpha})^{p+b-1} / |\mathbf{M}_B|$, the $(DP)_S$-criterion minimizes $(F_{p-1,\,d_B;\,1-\alpha})^{p-1} / |\mathbf{X}'\mathbf{Q}\mathbf{X}|$, where $d_B$ is the pure error degrees of freedom from the blocked design.
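The two matrix identities used above, $(\mathbf{M}_B^{-1})_{22} = (\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}$ and $|\mathbf{M}_B| = |\mathbf{X}'\mathbf{Q}\mathbf{X}|\,|\mathbf{Z}'\mathbf{Z}|$, can be checked numerically. The sketch below does this with NumPy for a small made-up blocked design (two blocks of four runs, two factors, first-order terms only); the design and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical blocked design: two blocks of four runs, two factors.
X = np.array([[-1, -1], [-1, 1], [1, -1], [0, 0],
              [1, 1], [1, 1], [-1, 0], [0, 1]], dtype=float)
blocks = np.array([0, 0, 0, 0, 1, 1, 1, 1])

n = len(blocks)
b = int(blocks.max()) + 1
Z = np.eye(b)[blocks]                              # n x b block indicators
Q = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T   # projects out block means

M_B = np.block([[Z.T @ Z, Z.T @ X],
                [X.T @ Z, X.T @ X]])
XQX = X.T @ Q @ X

# Determinant identity: |M_B| = |X'QX| |Z'Z|
det_lhs = np.linalg.det(M_B)
det_rhs = np.linalg.det(XQX) * np.linalg.det(Z.T @ Z)

# Lower-right block of the inverse equals (X'QX)^{-1}
lower_right = np.linalg.inv(M_B)[b:, b:]

print(np.isclose(det_lhs, det_rhs))
print(np.allclose(lower_right, np.linalg.inv(XQX)))
```

Because $|\mathbf{Z}'\mathbf{Z}|$ depends only on the block sizes, ranking designs by $|\mathbf{M}_B|$ or by $|\mathbf{X}'\mathbf{Q}\mathbf{X}|$ gives the same ordering, which is the point made in the text.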

An exchange algorithm is also applied to construct near optimum blocked designs. This is a simple extension of the algorithm that was briefly described in Section 3.1. A random initial design starts the search and the complete procedure is repeated for a number of different initial designs. Exchanges which improve the criterion are accepted. For the new criteria the calculation of $d_B = n - \operatorname{rank}(\mathbf{Z} : \mathbf{T})$, where $\mathbf{T}$ is the $n \times t$ matrix indicator for treatments, is required for obtaining the pure error degrees of freedom, reducing the computational benefits of updating formulae.
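The rank calculation for $d_B$ is straightforward to express directly. The following sketch builds the indicator matrices $\mathbf{Z}$ and $\mathbf{T}$ from block and treatment labels and returns $n - \operatorname{rank}(\mathbf{Z} : \mathbf{T})$. The function name and the toy labels are our own; in practice the treatment labels would be obtained by numbering the distinct rows of the design.

```python
import numpy as np


def pure_error_df_blocked(blocks, treatments):
    """d_B = n - rank([Z T]) for a blocked design.

    blocks, treatments: one label per run (any hashable, sortable labels)
    """
    blocks = np.asarray(blocks)
    treatments = np.asarray(treatments)
    n = len(blocks)
    # Indicator matrices: one column per distinct block / treatment.
    Z = (blocks[:, None] == np.unique(blocks)[None, :]).astype(float)
    T = (treatments[:, None] == np.unique(treatments)[None, :]).astype(float)
    return n - int(np.linalg.matrix_rank(np.hstack([Z, T])))


# Toy example: two blocks of four runs, four distinct treatments.
blocks = [1, 1, 1, 1, 2, 2, 2, 2]
treatments = [0, 0, 1, 2, 1, 1, 2, 3]
print(pure_error_df_blocked(blocks, treatments))
```

In this toy design the blocks are connected through treatments 1 and 2, so $\operatorname{rank}(\mathbf{Z} : \mathbf{T}) = b + t - 1 = 5$ and $d_B = 8 - 5 = 3$. The rank has to be recomputed after each candidate exchange, which is why the text notes that the usual updating formulae lose some of their benefit.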

*3.3.1.* *Example 6: pastry dough (n = 28; b = 7; q = 3; p = 10)*

This experiment was described by Trinca and Gilmour (1999). The main objective was to discover how some factors that are involved in an extrusion process for mixing dough could be

**Table 5.** Designs and their properties for five factors, one at two levels and four at three levels, in 40 runs, under the second-order model (example 4; see also Table 8 in Section 4.3.4)


**Table 6.** Optimum designs and their properties for eight two-level factors in 16 runs, under the linear effects
model (example 5)

*Designs for the following criteria:*

$D_S$ or $A_S$ (design I)   $(DP)_S$ (design II)

X1 X2 X3 X4 X5 X6 X7 X8 X1 X2 X3 X4 X5 X6 X7 X8

−1 −1 −1 −1 1 1 −1 1 −1 −1 −1 1 −1 −1 −1 −1

−1 −1 −1 1 −1 −1 −1 −1 −1 −1 1 −1 1 1 1 −1

−1 −1 1 −1 −1 1 1 −1 −1 −1 1 −1 1 1 1 −1

−1 −1 1 1 1 −1 −1 −1 −1 −1 1 1 1 1 1 1

−1 1 −1 −1 1 −1 1 −1 −1 −1 1 1 1 1 1 1

−1 1 −1 1 −1 1 1 1 −1 1 −1 −1 −1 1 1 1

−1 1 1 −1 −1 −1 −1 1 −1 1 −1 −1 −1 1 1 1

−1 1 1 1 1 1 1 1 −1 1 1 −1 1 −1 −1 1

1 −1 −1 −1 −1 −1 1 1 1 −1 −1 −1 1 1 −1 1

1 −1 −1 1 1 1 −1 1 1 −1 −1 −1 1 1 −1 1

1 −1 1 −1 1 −1 1 1 1 −1 1 −1 −1 −1 1 1

1 −1 1 1 −1 1 1 −1 1 −1 1 −1 −1 −1 1 1

1 1 −1 −1 −1 1 −1 −1 1 1 −1 1 1 −1 1 −1

1 1 −1 1 1 −1 1 −1 1 1 −1 1 1 −1 1 −1

1 1 1 −1 1 1 −1 −1 1 1 1 1 −1 1 −1 −1

1 1 1 1 −1 −1 −1 1 1 1 1 1 −1 1 −1 −1

df (PE; LoF) (0; 7) (7; 0)

$D_S$-eff 100.00 88.69

$A_S$-eff 100.00 81.08

$(DP)_S$-eff 0.00 100.00

$(AP)_S$-eff 0.00 95.25

$(AP)_S$, $\boldsymbol{\kappa} = (0, 0.5, 0, 0.5)$† (design III)   $\boldsymbol{\kappa} = (0.5, 0, 0, 0.5)$† (design IV)

X1 X2 X3 X4 X5 X6 X7 X8 X1 X2 X3 X4 X5 X6 X7 X8

−1 −1 −1 1 −1 −1 1 −1 −1 −1 1 −1 −1 1 1 −1

−1 −1 −1 1 −1 −1 1 −1 −1 −1 1 −1 1 −1 1 −1

−1 −1 1 1 1 −1 −1 1 −1 −1 1 1 1 1 −1 1

−1 −1 1 1 1 −1 −1 1 −1 −1 1 1 1 1 −1 1

−1 1 −1 −1 −1 −1 −1 1 −1 1 −1 −1 −1 1 −1 1

−1 1 −1 −1 −1 −1 −1 1 −1 1 −1 −1 1 −1 −1 1

−1 1 −1 1 1 1 −1 −1 −1 1 −1 1 1 1 1 −1

−1 1 −1 1 1 1 −1 −1 −1 1 −1 1 1 1 1 −1

−1 1 1 −1 1 −1 1 −1 1 −1 −1 −1 1 1 −1 −1

−1 1 1 −1 1 −1 1 −1 1 −1 −1 −1 1 1 −1 −1

−1 1 1 1 −1 1 1 1 1 −1 −1 1 −1 −1 1 1

−1 1 1 1 −1 1 1 1 1 −1 −1 1 −1 −1 1 1

1 −1 −1 −1 1 1 1 1 1 1 1 −1 1 1 1 1

1 −1 1 −1 −1 1 −1 −1 1 1 1 −1 1 1 1 1

1 1 −1 1 1 −1 1 1 1 1 1 1 −1 −1 −1 −1

1 1 1 1 −1 −1 −1 −1 1 1 1 1 −1 −1 −1 −1

df (PE; LoF) (6; 1) (6; 1)

$D_S$-eff 93.06 93.06

$A_S$-eff 91.14 87.80

$(DP)_S$-eff 94.27 94.27

$(AP)_S$-eff 100.00 96.34

†The compound criterion is deﬁned in equation (5) in Section 4.1.

varied to control the properties of the pastry. Three controllable factors of interest were the flow rate of water into the mix, the initial moisture content in the mix and the screw speed. As the properties of the dough varied from day to day because of uncontrollable factors, the design was divided into seven blocks (days) of four runs each. Table 7 shows some alternative designs, along with the design that was used. The $D_S$- and $A_S$-optimum design (design I) allows 2 degrees of freedom for error. The $(DP)_S$-optimum design allows 12 degrees of freedom for error, whereas the $(AP)_S$-optimum design allows 10. Again we note that, for the $(DP)_S$-optimum design, there are no spare degrees of freedom for lack of fit. Because of the extra restrictions required by blocking, we see that the designs which allow for pure error estimation cost more in terms of the traditional criteria than in completely randomized designs. The design that was actually used is clearly inferior to the new designs.

**Table 7.** Designs and their properties for three three-level factors in seven blocks of four, under the second-order model (example 6; see also Table 9 in Section 4.3.6)

*Block*   *Designs for the following criteria:*   *Design used (design IV)*

$D_S$ or $A_S$ (design I)   $(DP)_S$ (design II)   $(AP)_S$ (design III)

X1 X2 X3 X1 X2 X3 X1 X2 X3 X1 X2 X3

1 −1 1 −1 −1 −1 1 −1 −1 1 −1 −1 −1

−1 −1 1 −1 1 −1 −1 1 −1 −1 1 1

1 −1 −1 1 −1 −1 1 −1 −1 1 −1 1

1 1 1 1 1 1 1 1 1 1 1 −1

2 −1 −1 −1 −1 −1 1 1 −1 −1 −1 −1 1

1 1 −1 −1 1 −1 1 1 1 −1 1 −1

1 −1 1 1 −1 −1 0 −1 1 1 −1 −1

0 0 0 1 1 1 1 0 0 1 1 1

3 −1 1 1 −1 −1 1 −1 1 1 −1 1 −1

1 −1 1 −1 1 −1 1 1 −1 0 −1 0

1 1 −1 1 −1 −1 −1 −1 0 1 0 0

0 0 0 1 1 1 0 0 −1 0 0 1

4 −1 1 −1 −1 1 1 −1 1 1 1 −1 1

−1 −1 0 1 1 −1 1 1 −1 −1 0 0

1 0 −1 −1 0 0 −1 −1 0 0 1 0

0 1 1 0 −1 0 0 0 −1 0 0 −1

5 −1 1 1 −1 1 1 −1 1 −1 −1 −1 −1

1 1 0 1 1 −1 1 −1 −1 1 1 1

−1 0 −1 −1 0 0 0 −1 1 0 0 0

0 −1 1 0 −1 0 1 0 0 0 0 0

6 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 −1 1

1 −1 0 1 −1 1 1 −1 1 1 1 −1

−1 0 1 1 1 0 −1 0 1 0 0 0

0 1 −1 0 0 −1 0 1 0 0 0 0

7 −1 −1 1 −1 −1 −1 −1 −1 −1 −1 1 1

−1 1 0 1 −1 1 1 −1 1 1 −1 −1

1 0 1 1 1 0 −1 0 1 0 0 0

0 −1 −1 0 0 −1 0 1 0 0 0 0

df (PE; LoF) (2; 10) (12; 0) (10; 2) (7; 5)

$D_S$-eff 100.00 88.19 94.87 80.02

$A_S$-eff 100.00 78.40 92.48 73.31

$(DP)_S$-eff 16.36 100.00 99.59 69.01

$(AP)_S$-eff 29.00 88.65 100.00 70.38

**4.** **Compound criteria**
*4.1.* *Definition*

The designs that are produced by the new criteria are quite extreme and experimenters might be reluctant to use designs which are so different from what they are used to. This attitude might be correct sometimes, depending on the objectives of the experiment. We would argue strongly in favour of carefully considering what kinds of data analysis will be used to meet the experimenters' objectives and then carefully choosing the optimality criterion to match that data analysis. If a joint confidence region or a global $F$-test of the treatment parameters will be the only relevant analysis, then a $(DP)_S$-optimum design should be chosen.

In many experiments several types of data analysis are important, not all of them requiring an estimate of error. In particular, the analysis might involve all or most of the following:

(a) a global $F$-test of the treatment parameters, for which we should use $(DP)_S$-optimality;

(b) $t$-tests of the individual treatment parameters, for which we should use weighted-$AP$-optimality;

(c) point estimation of the individual treatment parameters, for which we should use weighted-$A$-optimality;

(d) checking for lack of fit of the assumed treatment model and, if appropriate, fitting a few higher order terms.

For the last of these analyses, there is no obvious design optimality criterion to use. Atkinson (1972) proposed a criterion for finding designs which are powerful for detecting lack of fit. This requires a prior estimate of the sizes of the higher order parameters. Jones and Mitchell (1978) relaxed this by using a minimax or average version over a range of parameter values. More recently, Goos *et al.* (2005) combined this criterion with others to produce designs which are both model robust and model sensitive. However, all of these criteria aim specifically at testing for lack of fit, whereas we are also interested in being able to estimate a few higher order parameters and discriminating between models. To avoid having to consider too many different criteria, we use the degree-of-freedom efficiency that was proposed by Daniel (1976), pages 177–178, as a simple way of incorporating all of these requirements. The degree-of-freedom efficiency is the proportion of our experimental resource which is used to estimate the effects of treatments. Clearly this is directly in conflict with our pure error criteria and a good design must be a compromise.

We now combine all of these in a compound criterion, following exactly the methodology that was described by Atkinson *et al.* (2007). We first define the following efficiencies, for the design with treatment model matrix $\mathbf{X}$ which has $d$ degrees of freedom for pure error:

(a) the $(DP)_S$-efficiency,

$$E_1 = \frac{|\mathbf{X}'\mathbf{Q}_0\mathbf{X}|^{1/(p-1)}\, F_{p-1,\,d_D;\,1-\alpha_1}}{F_{p-1,\,d;\,1-\alpha_1}\, |\mathbf{X}_{DP}'\mathbf{Q}_0\mathbf{X}_{DP}|^{1/(p-1)}},$$

where $\mathbf{X}_{DP}$ is the model matrix for the $(DP)_S$-optimum design, which has $d_D$ degrees of freedom for pure error, and the global $F$-test will be performed at the $100\alpha_1\%$ level of significance;

(b) the weighted-$AP$-efficiency,

$$E_2 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_{AP}'\mathbf{X}_{AP})^{-1}\}\, F_{1,\,d_A;\,1-\alpha_2}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}\, F_{1,\,d;\,1-\alpha_2}},$$

where $\mathbf{X}_{AP}$ is the model matrix for the weighted-$AP$-optimum design, which has $d_A$ degrees of freedom for pure error, and the individual $t$-tests will be calculated at the $100\alpha_2\%$ level of significance;

(c) the weighted-$A$-efficiency,

$$E_3 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_A'\mathbf{X}_A)^{-1}\}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}},$$

where $\mathbf{X}_A$ is the model matrix for the weighted-$A$-optimum design;

(d) the degree-of-freedom efficiency,

$$E_4 = \frac{n - d}{n}.$$

Next we combine these criteria with weights $\kappa_1, \ldots, \kappa_4$ respectively to obtain $E = E_1^{\kappa_1} E_2^{\kappa_2} E_3^{\kappa_3} E_4^{\kappa_4}$. After ignoring terms which do not depend on the design to be optimized, this means that we choose a design to maximize

$$\frac{|\mathbf{X}'\mathbf{Q}_0\mathbf{X}|^{\kappa_1/(p-1)}\, (n - d)^{\kappa_4}}{F_{p-1,\,d;\,1-\alpha_1}^{\kappa_1}\, F_{1,\,d;\,1-\alpha_2}^{\kappa_2}\, \operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{X})^{-1}\}^{\kappa_2+\kappa_3}}. \qquad (5)$$

The weights $\boldsymbol{\kappa}$ should be chosen to reflect the relative importance of different aspects of the analysis and, in some experiments, some of the weights might be 0.
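To make criterion (5) concrete, the following is a minimal sketch of how it might be evaluated for a candidate unblocked design; it is not the authors' implementation. Several choices are assumptions of ours: the function name `compound_criterion`; taking $\mathbf{X}$ as the $n \times (p-1)$ model matrix without the intercept column; defaulting $\mathbf{W}$ to the identity; and computing the trace term from $(\mathbf{X}'\mathbf{Q}_0\mathbf{X})^{-1}$, which equals the non-intercept block of the inverse of the full information matrix and so presumes that $\mathbf{W}$ gives no weight to the intercept. A design with no pure error is given the value 0 whenever $\kappa_1$ or $\kappa_2$ is positive, in line with the zero efficiencies reported in the tables. The $F$ quantiles come from `scipy.stats.f.ppf`.

```python
import numpy as np
from scipy.stats import f as f_dist


def compound_criterion(X, kappa, W=None, alpha1=0.05, alpha2=0.05):
    """Value of criterion (5) for an unblocked design (larger is better).

    X: n x (p-1) model matrix without the intercept column (assumption)
    kappa: weights (kappa1, kappa2, kappa3, kappa4)
    W: (p-1) x (p-1) weight matrix; identity if omitted (assumption)
    """
    X = np.asarray(X, dtype=float)
    n, p1 = X.shape
    k1, k2, k3, k4 = kappa
    W = np.eye(p1) if W is None else np.asarray(W, dtype=float)

    # Pure error df: runs minus distinct rows of the model matrix.
    d = n - np.unique(X, axis=0).shape[0]
    if d == 0 and (k1 > 0 or k2 > 0):
        return 0.0  # no pure error, so the F quantiles are undefined

    Xc = X - X.mean(axis=0)   # Q0 X: each column centred
    info = Xc.T @ Xc          # X' Q0 X
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        return 0.0  # singular: treatment parameters not estimable

    value = np.exp(k1 * logdet / p1) * (n - d) ** k4
    if k1 > 0:
        value /= f_dist.ppf(1 - alpha1, p1, d) ** k1
    if k2 > 0:
        value /= f_dist.ppf(1 - alpha2, 1, d) ** k2
    if k2 + k3 > 0:
        value /= np.trace(W @ np.linalg.inv(info)) ** (k2 + k3)
    return float(value)


# Toy comparison: a 2^2 factorial, unreplicated and duplicated,
# under a first-order model with kappa = (0.5, 0, 0, 0.5).
factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
replicated = np.vstack([factorial, factorial])
print(compound_criterion(replicated, kappa=(0.5, 0.0, 0.0, 0.5)))
print(compound_criterion(factorial, kappa=(0.5, 0.0, 0.0, 0.5)))
```

A function of this kind would serve as the objective inside an exchange algorithm such as the one mentioned in Section 3.1, with the pure error degrees of freedom recomputed after every candidate exchange.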

*4.2.* *Compound criteria for blocking*

The compound criteria that were deﬁned in Section 4.1 can easily be extended to blocked designs.

For blocking, $\mathbf{X}'\mathbf{X}$ or $\mathbf{X}'\mathbf{Q}_0\mathbf{X}$ should be replaced by $\mathbf{M}_B$ or $\mathbf{X}'\mathbf{Q}\mathbf{X}$ in the efficiency formulae.

Hence, we have the following efficiencies:

(a) the $(DP)_S$-efficiency,

$$E_1 = \frac{|\mathbf{X}'\mathbf{Q}\mathbf{X}|^{1/(p-1)}\, F_{p-1,\,d_{BD};\,1-\alpha_1}}{F_{p-1,\,d_B;\,1-\alpha_1}\, |\mathbf{X}_{DP}'\mathbf{Q}\mathbf{X}_{DP}|^{1/(p-1)}},$$

where $\mathbf{X}_{DP}$ is the treatment model matrix for $\boldsymbol{\beta}$ for the $(DP)_S$-optimum blocked design, which has $d_{BD}$ degrees of freedom for pure error, and the global $F$-test will be calculated at the $100\alpha_1\%$ level of significance;

(b) the weighted-$(AP)_S$-efficiency,

$$E_2 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_{AP}'\mathbf{Q}\mathbf{X}_{AP})^{-1}\}\, F_{1,\,d_{BA};\,1-\alpha_2}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}\, F_{1,\,d_B;\,1-\alpha_2}},$$

where $\mathbf{X}_{AP}$ is the treatment model matrix for $\boldsymbol{\beta}$ for the $AP$-optimum design, which has $d_{BA}$ degrees of freedom for pure error, and the individual $t$-tests will be calculated at the $100\alpha_2\%$ level of significance;

(c) the weighted-$A_S$-efficiency,

$$E_3 = \frac{\operatorname{tr}\{\mathbf{W}(\mathbf{X}_A'\mathbf{Q}\mathbf{X}_A)^{-1}\}}{\operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}},$$

where $\mathbf{X}_A$ is the treatment model matrix for $\boldsymbol{\beta}$ for the $A$-optimum design;

(d) the degree-of-freedom efficiency,

$$E_4 = \frac{n - b + 1 - d_B}{n - b + 1}.$$

Ignoring constants, the combined criterion is

$$\frac{|\mathbf{X}'\mathbf{Q}\mathbf{X}|^{\kappa_1/(p-1)}\, (n - b + 1 - d_B)^{\kappa_4}}{F_{p-1,\,d_B;\,1-\alpha_1}^{\kappa_1}\, F_{1,\,d_B;\,1-\alpha_2}^{\kappa_2}\, \operatorname{tr}\{\mathbf{W}(\mathbf{X}'\mathbf{Q}\mathbf{X})^{-1}\}^{\kappa_2+\kappa_3}}. \qquad (6)$$
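As a quick arithmetic check of the blocked degree-of-freedom efficiency, consider the pastry dough experiment of Table 7, for which $n = 28$ and $b = 7$, so that $n - b + 1 = 22$. The values below are our own arithmetic using the pure error degrees of freedom reported in Table 7; they do not appear in the tables.

```latex
\[
E_4 = \frac{n - b + 1 - d_B}{n - b + 1} = \frac{22 - d_B}{22}
\]
\begin{align*}
\text{design I } (d_B = 2):     &\quad E_4 = 20/22 \approx 0.909 \\
\text{design II } (d_B = 12):   &\quad E_4 = 10/22 \approx 0.455 \\
\text{design III } (d_B = 10):  &\quad E_4 = 12/22 \approx 0.545 \\
\text{design used } (d_B = 7):  &\quad E_4 = 15/22 \approx 0.682
\end{align*}
```

The $(DP)_S$-optimum design therefore scores lowest on this component, which is the conflict between pure error and degree-of-freedom efficiency that the compound criterion is meant to balance.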

*4.3.* *Examples*

Designs were built, using various compound criteria, for several of the examples that were presented in earlier sections. Many different compound criteria can give the same optimum design and sometimes they can be the same as the designs that are obtained by using simple criteria.

In what follows we discuss some of the more interesting designs that were found.

*4.3.1.* *Example 1 (n = 16; q = 3; p = 10), continued*

In Table 1 two designs (designs V and VI) are shown which place a high weight on degree-of-freedom efficiency, and the effect of this is very clear. The usual criteria allow no degrees of freedom for pure error and the pure-error-based criteria do not allow any test for lack of fit. The designs that are produced by the compound criteria are less extreme, offering a compromise between pure error and lack-of-fit degrees of freedom. We also note that it is possible to produce better designs than classical designs (e.g. regular subset designs) with similar numbers of degrees of freedom for pure error, e.g. comparing the modified Box–Behnken and design VI. The structure of design V is interesting, containing a full $S_3$ subset, with duplicates of two points, and six axial points, only two of which are a pair, the others being two replicates of one of each of the other pairs. This structure also seems like a compromise between the very tightly defined structure of subset designs and the apparently messy structures of $D$-optimum designs.

This design, which is optimal for two different weight patterns for the compound criteria, is very similar to the CCD in terms of the variance properties, but it allows much better estimation of pure error. It looks very attractive for practical use.

*4.3.2.* *Example 2 (n = 18; q = 3; p = 10), continued*

A similar type of compromise can be seen in the compound optimum designs in Table 2 for the experiment in a spherical region. Designs V and VI can be recommended for practical use, even though they are quite different from the standard designs.

*4.3.3.* *Example 3: cassava bread (n = 26; q = 3; p = 10), continued*

Some of the obvious compound optimum designs, as shown in Table 3, turn out to be the same as some of the other optimum designs that were found. In a situation like this, the criteria are not far from converging to each other, although we still see that the design used can be optimized for different objectives.

*4.3.4.* *Example 4: oil extraction (n = 40; q = 5; p = 20), continued*

This is another case where the designs in Table 5 give almost all degrees of freedom to lack of fit, for the standard criteria, and to pure error for the new criteria, whereas classical designs, such as that used in the experiment, tend to split the residual degrees of freedom more evenly. Two compound optimum designs are shown in Table 8. Again, we find that they seem to represent a very good compromise, having a reasonable split between pure error and lack-of-fit degrees of freedom and very good variance properties. Design VI, for example, has fairly similar degrees of freedom to those of the design actually used, but it has much lower variances of the treatment parameter estimators. If this consulting problem arose now, this is the design that we would recommend.

*4.3.5.* *Example 5 (n = 16; q = 8; p = 9), continued*

Even for the critical case of eight factors in 16 runs that was shown in Table 6, it is possible to obtain some weight patterns for the compound criterion which allow degrees of freedom for both pure error and lack-of-fit estimation. Designs III and IV have the same value of the $D_S$- and $(DP)_S$-criteria. Design IV looks attractive, since it also seems 'more regular' than the $(DP)_S$- and $(AP)_S$-optimum designs in that each factor is either level balanced or has an imbalance of two runs. Design III has higher weighted-$A$-efficiency but is highly irregular. For structures of this type, there might be scope for further study of aliasing patterns and their relationship with the new optimality criteria.

*4.3.6.* *Example 6: pastry dough (n = 28; b = 7; q = 3; p = 10), continued*

For the pastry dough experiment, two compound optimum designs are shown in Table 9, which again show an attractive compromise between variance properties and allocation of degrees of freedom. Design VI has the same allocation of degrees of freedom to pure error and lack of ﬁt as the design actually used in the experiment, but it has considerably better estimation properties.

In hindsight, this might be the design that we would recommend now.

These examples have shown that the properties of the design can be tailored to specific objectives by choosing appropriate weights ($\kappa$s) in the compound criterion. As is frequently the case with weighted criteria, in practice the experimenter should try a few different settings of the $\kappa$s and evaluate the resulting designs to decide which one to use. Some experimenters might appreciate even simpler rules of thumb and, on the basis of the examples given here, along with several others, we could recommend using relative weights of 0, 1 or 4 for analyses which are of no importance, of secondary importance and of primary importance respectively. Small changes in weights (except at 0) do not greatly affect the relative properties of different designs.
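The rule of thumb can be turned into a weight vector mechanically. The sketch below maps importance labels for the four analyses of Section 4.1 to relative weights of 0, 1 or 4 and rescales them to sum to 1, as the $\boldsymbol{\kappa}$ vectors in the tables do. The function name, the label strings and the rescaling step are our own illustrative choices; rescaling does not change which design maximizes the criterion, since raising the criterion to a positive power preserves the ordering of designs.

```python
def weights_from_importance(importance):
    """Map importance labels to a kappa vector that sums to 1.

    importance: four labels, in the order
        (global F-test, individual t-tests, point estimation, df efficiency),
    each one of "none", "secondary" or "primary".
    """
    scores = {"none": 0, "secondary": 1, "primary": 4}
    raw = [scores[level] for level in importance]
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one analysis must have nonzero importance")
    return tuple(r / total for r in raw)


# Example: the global F-test is the primary aim; t-tests and
# lack-of-fit checking are secondary; point estimation is not needed.
kappa = weights_from_importance(["primary", "secondary", "none", "secondary"])
print(kappa)
```

The resulting vector can be passed as the weights in the compound criterion, and a few alternative labelings can be tried to see how sensitive the chosen design is.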

**5.** **Discussion**

In fitting polynomial models the validity of using higher order terms as an error estimate depends on the validity of the model, whereas the validity of pure error as an error estimate is not dependent on the fit of the model. Classical design criteria were developed assuming that an independent error estimate was available and thus, when that is not so, the usual criteria do not have the properties that they are intended to have. We have proposed modifications to the usual criteria so that the resulting designs take into account the necessity of obtaining a valid estimate of error for proper inferences about the parameters of the model. The corrected criteria, based on the quantiles of appropriate $F$-distributions for inferences, may result in quite extreme designs that do not allow any lack-of-fit checks. However, by using compound criteria which incorporate

**Table 8.** Compound optimum designs and their properties for five factors, one at two
levels and four at three levels, in 40 runs, under the second-order model (example
4; see also Table 5)

*Designs for the following compound criteria:*

$\boldsymbol{\kappa} = (0.5, 0, 0, 0.5)$ (design V)   $\boldsymbol{\kappa} = (0, 0.5, 0, 0.5)$ (design VI)

X1 X2 X3 X4 X5 X1 X2 X3 X4 X5

−1 −1 −1 −1 −1 −1 −1 −1 −1 1

−1 −1 −1 −1 −1 −1 −1 −1 −1 1

−1 −1 −1 1 1 −1 −1 −1 1 −1

−1 −1 1 −1 1 −1 −1 −1 1 −1

−1 −1 1 1 −1 −1 −1 1 −1 −1

−1 −1 1 1 −1 −1 −1 1 1 1

−1 1 −1 −1 1 −1 1 −1 −1 −1

−1 1 −1 −1 1 −1 1 −1 1 1

−1 1 −1 1 −1 −1 1 −1 1 1

−1 1 −1 1 −1 −1 1 1 1 −1

−1 1 1 −1 −1 −1 1 1 1 −1

−1 1 1 −1 −1 1 −1 −1 −1 −1

−1 1 1 1 1 1 −1 −1 −1 −1

1 −1 −1 −1 1 1 −1 −1 1 1

1 −1 −1 −1 1 1 −1 −1 1 1

1 −1 −1 1 −1 1 −1 1 −1 1

1 −1 −1 1 −1 1 −1 1 −1 1

1 −1 1 −1 −1 1 −1 1 1 −1

1 −1 1 −1 −1 1 −1 1 1 −1

1 1 −1 −1 −1 1 1 −1 −1 1

1 1 −1 −1 −1 1 1 −1 1 −1

1 1 −1 1 1 1 1 1 1 1

1 1 1 −1 1 1 1 1 1 1

1 1 1 −1 1 −1 1 1 −1 0

1 1 1 1 −1 1 1 −1 1 0

1 1 1 1 −1 1 1 1 −1 0

1 1 −1 1 1 −1 1 −1 0 −1

1 −1 1 1 0 −1 1 1 0 1

−1 1 1 0 1 1 1 1 0 −1

1 −1 1 0 1 −1 −1 0 −1 −1

−1 −1 0 −1 1 −1 1 0 −1 1

1 −1 0 1 1 1 1 0 −1 −1

−1 0 −1 1 1 −1 0 1 −1 1

1 0 1 1 1 1 0 1 −1 −1

−1 −1 −1 0 0 −1 −1 1 0 0

1 1 −1 0 0 −1 0 −1 −1 0

−1 1 0 1 0 1 0 −1 0 1

−1 0 1 −1 0 1 0 0 1 −1

−1 0 0 0 −1 1 −1 0 0 0

1 0 0 0 −1 −1 0 0 1 0

df (PE; LoF) (12; 8) (9; 11)

$D_S$-eff 98.36 98.34

$A_S$-eff 92.43 95.84

$(DP)_S$-eff 89.09 77.22

$(AP)_S$-eff 98.27 94.53