The first control chart was invented by Walter Shewhart, who made significant contributions to the quality of manufactured products (Juran, 1997). Since then, control charts have been one of the main tools of statistical process control (SPC). They are used to identify special causes of variability in a process through a graphical representation of a quality characteristic of the process under investigation (Hoyer and Ellis, 1996a, b, c; Nelson, 1999).
There are two distinct phases in control charting practice (see, e.g. Woodall, 2000). In Phase I, control charts are used for retrospectively testing whether the process is in control. In Phase II, control charts are used for monitoring the process for detecting any change from the in-control state. Woodall (2000) remarked that significant effort for process understanding and process improvement is often required in the transition from Phase I to Phase II.
In practice, the process parameters (e.g., mean and standard deviation) needed for constructing control limits in Phase II are usually unknown. Therefore, to estimate the process parameters, we often need to collect a set of process data and decide whether they come from an in-control process. If not, one needs to inspect the out-of-control data points for assignable causes. We may eliminate assignable causes, if found, to stabilize the process (Woodall, 2000), or discard some of these out-of-control samples and estimate the process parameters from the remaining, presumably in-control, data (Jones and Champ, 2002). Some estimation problems in constructing control charts are explicitly mentioned in Woodall and Montgomery (1999); also see Reynolds and Stoumbos (2001), Nedumaran and Pignatiello (2001), and Albers and Kallenberg (2004a, b). For instance, Jones and Champ (2002) pointed out that if one thinks of checking the m samples in a Phase I X control chart for stability as a sequence of m hypothesis tests for the mean, then these tests are dependent when the mean and standard deviation are estimated from all m samples. To achieve a fixed overall false-alarm rate, the control limits must therefore be based on the joint distribution of the m control charting statistics.
In the early days, control charts were designed to control the individual false-alarm rate for each sample, no matter how many samples were tested (e.g., Hunter, 1998; Vermaat et al., 2003). More recently, many studies of Phase I analysis have instead been designed to control the overall false-alarm rate. With a fixed individual false-alarm rate, the overall false-alarm rate grows as the number of samples m grows. This is why many authors adjust the individual false-alarm rate with Bonferroni-type procedures in order to provide a reasonable overall false-alarm rate (see, e.g., Borror and Champ, 2001; Champ and Chou, 2003; Mahmoud and Woodall, 2004; Nedumaran and Pignatiello, 2000).
Nedumaran and Pignatiello (2005) discussed many issues in constructing retrospective X control chart limits so as to control the overall probability of a false alarm at a desired level.
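To illustrate how a fixed individual false-alarm rate inflates the overall rate, the following minimal Python sketch assumes, purely for illustration, that the m tests are independent; as noted above, this does not hold exactly when the parameters are estimated from all m samples. The function names and the individual rate 0.0027 (the usual three-sigma rate under normality) are illustrative choices and are not taken from the cited works.

```python
def overall_false_alarm_rate(alpha_individual: float, m: int) -> float:
    """P(at least one false alarm) among m independent in-control tests.

    Assumption: the m tests are independent, which is only an
    approximation for Phase I charts with estimated parameters.
    """
    return 1.0 - (1.0 - alpha_individual) ** m


def bonferroni_individual_rate(alpha_overall: float, m: int) -> float:
    """Bonferroni adjustment: split the overall rate evenly over m tests."""
    return alpha_overall / m


if __name__ == "__main__":
    for m in (10, 50, 100):
        unadjusted = overall_false_alarm_rate(0.0027, m)
        adjusted = overall_false_alarm_rate(bonferroni_individual_rate(0.05, m), m)
        print(f"m={m:3d}  unadjusted overall={unadjusted:.4f}  "
              f"Bonferroni-adjusted overall={adjusted:.4f}")
```

Under the independence assumption, the Bonferroni-adjusted overall rate never exceeds the target, at the cost of a much smaller individual rate per sample when m is large.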
However, as mentioned before, Bonferroni-type methods tend to have little power in detecting out-of-control samples, especially when the number of samples is large. The FDR method proposed by Benjamini and Hochberg (1995) has the advantage of having higher power compared to the Bonferroni method for testing multiple hypotheses.
Consider the problem of testing $m$ (null) hypotheses, of which $m_0$ hypotheses are true. Let $R_0$ and $R$ denote the number of rejected true hypotheses and the total number of rejected hypotheses, respectively. Table 1 summarizes the four possible outcomes of the $m$ tests.
Table 1: Possible outcomes from $m$ hypothesis tests

| | Declared non-significant | Declared significant | Total |
|---|---|---|---|
| True null hypotheses | $m_0 - R_0$ | $R_0$ | $m_0$ |
| Non-true null hypotheses | $m_1 - R_1$ | $R_1$ | $m_1$ |
| Total | $m - R$ | $R$ | $m$ |
The false discovery rate, FDR, is defined as the expected proportion of erroneously rejected null hypotheses. With $R_0$ and $R$ representing, respectively, the number of true null hypotheses rejected and the total number of null hypotheses rejected in a multiple testing procedure, let $Q = R_0/R$ if $R > 0$ and $Q = 0$ if $R = 0$. Then $E(Q)$ is the FDR. The family-wise error rate (FWER) is defined as $P(R_0 \ge 1)$, which is different from the "signal probability" defined as $P(R \ge 1)$. Note that when the process is in control, the FWER and the signal probability both equal the overall false-alarm rate.
Benjamini and Hochberg (1995) also proved that:
(a) When all the null hypotheses are true, the FDR is the same as the FWER. This is obvious since in this case $R_0 = R$, and thus $E(Q) = P(R \ge 1) = P(R_0 \ge 1) = \text{FWER}$.
(b) When only part of the null hypotheses are true and the others are false, the FDR is smaller than or equal to the FWER. The proof is also simple: when $R_0 = 0$, then $Q = 0$; when $R_0 \ge 1$, then $Q = R_0/R \le 1$. Thus $I_{\{R_0 \ge 1\}} \ge Q$. Taking expectations on both sides leads to $P(R_0 \ge 1) \ge E(Q)$.
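Properties (a) and (b) can be checked numerically with a small Monte Carlo sketch. The setup below is hypothetical and chosen only for illustration: each hypothesis is rejected when its p-value falls below a fixed threshold, p-values of true nulls are uniform on [0, 1], and p-values of false nulls are generated as the fourth power of a uniform variate so that they tend to be small. None of these choices comes from the cited papers.

```python
import random


def simulate_fdr_fwer(m0, m1, threshold, n_rep=20000, seed=0):
    """Estimate (FDR, FWER) for a fixed-threshold rejection rule.

    m0: number of true null hypotheses (uniform p-values).
    m1: number of false null hypotheses (p = U**4, an illustrative
        assumption that makes these p-values stochastically small).
    """
    rng = random.Random(seed)
    sum_q = 0.0          # running sum of Q = R0 / R (0 when R = 0)
    count_r0_pos = 0     # number of replications with R0 >= 1
    for _ in range(n_rep):
        r0 = sum(rng.random() <= threshold for _ in range(m0))
        r1 = sum(rng.random() ** 4 <= threshold for _ in range(m1))
        r = r0 + r1
        sum_q += r0 / r if r > 0 else 0.0
        if r0 >= 1:
            count_r0_pos += 1
    return sum_q / n_rep, count_r0_pos / n_rep


if __name__ == "__main__":
    print("all nulls true   (FDR, FWER):", simulate_fdr_fwer(10, 0, 0.01))
    print("some nulls false (FDR, FWER):", simulate_fdr_fwer(8, 2, 0.01))
```

Because $Q \le I_{\{R_0 \ge 1\}}$ holds in every replication, the estimated FDR cannot exceed the estimated FWER, and the two estimates coincide when no false nulls are present.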
Thus, any procedure that controls the FWER also controls the FDR, and the FDR offers a less stringent multiple-testing criterion than the FWER. The FDR may be more appropriate for some applications, particularly those involving a large number of hypothesis tests, such as microarray data analysis in bioinformatics.
Benjamini and Hochberg (1995) proved by induction that the following FDR method controls the FDR at level α when the p-values of the observed test statistics (under the null hypothesis) are independent and identically distributed as uniform [0, 1].
Step 1: Compute the p-values of the observed test statistics under the null hypothesis.
Step 2: Order the p-values as $p_{(1)} \le \dots \le p_{(m)}$.
Step 3: Calculate $k^* = \max\{1 \le k \le m : p_{(k)} \le \alpha k/m\}$.
Step 4: If $k^*$ exists, then reject the null hypotheses corresponding to $\{p_{(1)}, \dots, p_{(k^*)}\}$; otherwise, reject nothing.
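Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the p-values from Step 1 are already available as a list; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not part of the original procedure's description.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices of hypotheses rejected by the FDR method.

    p_values: p-values of the observed test statistics (Step 1).
    alpha:    desired FDR level.
    """
    m = len(p_values)
    # Step 2: order the p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step 3: k* is the largest k with p_(k) <= alpha * k / m.
    k_star = 0  # 0 means that no such k exists
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * k / m:
            k_star = k
    # Step 4: reject the hypotheses with the k* smallest p-values.
    return sorted(order[:k_star])


if __name__ == "__main__":
    p = [0.02, 0.03, 0.035, 0.04]  # illustrative p-values only
    print("FDR method rejects:", benjamini_hochberg(p, alpha=0.05))
    print("Bonferroni rejects:",
          [i for i, pv in enumerate(p) if pv <= 0.05 / len(p)])
```

The loop scans all k rather than stopping at the first failure, because the procedure takes the largest k satisfying the inequality: a small p-value may fail its own threshold and still be rejected when a larger ordered p-value passes.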
Benjamini and Yekutieli (2001) proved that this same procedure also controls the FDR when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. Some other related research works on FDR include Finner and Roters (2001, 2002) and Sarkar (2002).