National Chiao Tung University
Department of Industrial Engineering and Management
Doctoral Dissertation
Sampling Properties of the Yield Index Spk With Estimation Accuracy and Sample Size Information
Student: Ya-Ching Cheng (鄭雅靜)
Advisor: Prof. W. L. Pearn (彭文理)
A Thesis
Submitted to Department of Industrial Engineering and Management
College of Management
National Chiao Tung University
in Partial Fulfillment of the Requirements
for the Degree of
Doctor
in
Industrial Engineering and Management
January 2009
Hsinchu, Taiwan
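Sampling Properties of the Yield Index Spk and the Accuracy of Yield Estimation with Sample Size Information
Student: Ya-Ching Cheng
Advisor: Prof. W. L. Pearn
Ph.D. Program, Department of Industrial Engineering and Management, National Chiao Tung University
Abstract (Chinese version)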
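Boyles (1994) proposed a yield index called Spk, which provides an exact and accurate measure of the yield of normal processes. Lee et al. (2002) proposed a normal approximation for estimating the Spk value of a process. In this dissertation, we extend these earlier results and consider the sampling distribution of the yield index in three different situations: (i) multiple samples, (ii) the convolution approximation, and (iii) acceptance sampling. For multiple samples, we derive the sampling distribution of the yield index estimator and find that, for the same Spk value, the estimator varies most when the process mean lies at the center of the specification limits. Lower confidence bounds of Spk are tabulated for some commonly used capability requirements. To make the derived normal approximation easier for users to accept, we examine the actual type I error and compare it with the preset significance level. We also compute the sample sizes required for the normal approximation to converge to within a designated difference from the true value. Finally, a real-world example of a rechargeable lithium battery process is presented to illustrate how practitioners can apply the lower confidence bounds of Spk to multiple samples.
Next, we use the convolution approximation to estimate the Spk value of a process and compare it with the normal approximation. The comparison shows that the convolution method is indeed more accurate than the normal approximation in estimating both the process yield and Spk. Based on the convolution method, we construct an efficient step-by-step procedure showing how to estimate the process yield. We also study the accuracy of the convolution method and provide the sample sizes required for designated power levels and for designated convergence requirements. In the last part of this dissertation, we consider acceptance decisions based on the Spk value. An acceptance sampling plan is a decision rule that lets the buyer and the vendor determine whether inspected goods meet the product quality requirements. We propose a variables sampling plan based on Spk for processes whose fraction of defectives is very small (at the PPM level). By solving a pair of nonlinear equations simultaneously, we establish an efficient method for determining the required sample size and the critical acceptance value. Based on the designed sampling plan, practitioners can determine the number of items to inspect from a lot and the corresponding critical acceptance value.
Keywords: process capability, production yield, multiple samples, lower confidence bound, critical value, power of test, acceptance sampling plan, lot sentencing, fraction of defectives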
Sampling Properties of the Yield Index Spk With Estimation Accuracy and Sample Size Information
Student:Ya-Ching Cheng
Advisor:Dr. W. L. Pearn
Department of Industrial Engineering and Management
National Chiao Tung University
Abstract
The yield index Spk proposed by Boyles (1994) provides an exact measure of the production yield of normal processes. Lee et al. (2002) considered a normal approximation for estimating Spk. In this thesis, we extend those results and consider the sampling distribution of the yield index under three conditions: (i) multiple samples, (ii) the convolution method, and (iii) acceptance sampling. For multiple samples, we derive the sampling distribution of the estimator $\hat S'_{pk}$ of Spk. We also observe that, for the same Spk, the variance of $\hat S'_{pk}$ is largest when the process mean is at the center of the specification limits. Lower bounds of Spk are tabulated for some commonly used capability requirements. To assess the normally approximated distribution of $\hat S'_{pk}$, we check the actual type I error and compare it with the preset significance level. We also compute the sample sizes required for the normal approximation to converge to the actual Spk within a designated accuracy. Then, a real-world application to one-cell rechargeable Li-ion battery packs illustrates how practitioners can apply the lower bounds to actual data collected in multiple samples.
Next, we consider a convolution approximation for estimating Spk and compare it with the normal approximation. The comparison shows that the convolution method provides a more accurate estimate than the normal approximation, both of Spk and of the production yield. We develop an efficient step-by-step procedure based on the convolution method to show how to estimate the production yield. We also investigate the accuracy of the convolution method, which yields useful information about the sample sizes required for designated power levels and for convergence. Finally, we consider acceptance determination based on the Spk index. Acceptance sampling plans give the vendor and the buyer decision rules for lot sentencing that meet their product quality needs. We propose a variables sampling plan based on the index Spk for processes that require a very low (PPM-level) fraction of defectives. We develop an effective method for obtaining the required sample size n and the critical acceptance value c0 by solving two nonlinear equations simultaneously. Based on the designed sampling plan, practitioners can determine the number of production items to sample for inspection and the corresponding critical acceptance value for lot sentencing.
Keywords: Process capability, production yield, multiple samples, lower confidence bound, critical value, power of test, acceptance sampling plan, lot sentencing, fraction of defectives
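Acknowledgements
First and foremost, I owe the completion of this dissertation to my family, who have always supported me: my parents, my dear eldest sister, and my boyfriend, who has waited for me for many years. I am also deeply grateful to the two teachers who patiently taught and guided me. One is my dissertation advisor, Professor W. L. Pearn. The other is Professor Hui-Nien Hung (洪慧念) of the Institute of Statistics, National Chiao Tung University, who was also my master's thesis advisor. Oh! I also thank my fellow sufferers who kept me company through the long years in the laboratory (no, I mean my senior and junior labmates), especially my senior Yu-Ting (于婷), who gave me so much emotional comfort for so long. Thank you all; I am truly glad to have had this group of friends during this time.
Finally, I would like to thank Heaven! Thank you for letting this dissertation be completed smoothly!
Ya-Ching Cheng
Wednesday, December 3, 2008 (ROC year 97)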
Contents
Abstract (Chinese version)
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
CHAPTER 1
INTRODUCTION
1.1. PROCESS CAPABILITY INDICES
1.2. LITERATURE REVIEW
1.3. MOTIVATION OF RESEARCH
CHAPTER 2
ESTIMATING PROCESS YIELD BASED ON SPK FOR MULTIPLE SAMPLES
2.1. INTRODUCTION
2.2. THE YIELD INDEX SPK
2.3. ESTIMATING SPK UNDER MULTIPLE SAMPLES
2.4. LOWER CONFIDENCE BOUND OF SPK
2.5. ACCURACY OF THE NORMAL APPROXIMATION
2.6. AN APPLICATION EXAMPLE
CHAPTER 3
PROCEDURE OF THE CONVOLUTION METHOD FOR ESTIMATING PRODUCTION YIELD WITH SAMPLE SIZE INFORMATION
3.1. INTRODUCTION
3.2. THE YIELD INDEX SPK
3.3. NORMAL APPROXIMATION OF $\hat S_{pk}$: $\hat S'_{pk}$
3.4. CONVOLUTION APPROXIMATION OF $\hat S_{pk}$: $\hat S''_{pk}$
3.5. COMPARISONS OF BOTH APPROXIMATIONS
3.5.1. Comparison of probability curves
3.5.2. Comparison of critical values
3.6. ACCURACY ANALYSIS
3.6.1. Sample size required for designated power
3.6.2. Sample size required for convergence
CHAPTER 4
PRODUCT ACCEPTANCE DETERMINATION BASED ON THE SPK INDEX
4.1. INTRODUCTION
4.2. PROCESS CAPABILITY INDICES
4.2.1. Process capability indices Ca, Cp, Cpk, Cpm, Cpmk
4.2.2. The yield index Spk
4.2.3. Sampling distribution of the estimated Spk
4.3. DESIGNING SPK VARIABLES SAMPLING PLAN
4.4. ACCURACY OF SPK VARIABLES SAMPLING PLANS
4.5. AN APPLICATION EXAMPLE
CHAPTER 5
CONCLUSIONS AND FUTURE WORKS
REFERENCES
APPENDIX
APPENDIX I: TAYLOR EXPANSION OF $\hat S_{pk}$ FOR MULTIPLE SAMPLES
APPENDIX II: CALCULATION OF LOWER BOUNDS FOR MULTIPLE SAMPLES
List of Tables
Table 2-1. (a) The Cpk value, calculated yield, and actual yield of five different processes in Figure 2-1; (b) the Spk value, calculated yield, and actual yield of five different processes in Figure 2-1
Table 2-2. Some different processes and corresponding mnVar($\hat S'_{pk}$) with Spk = 1.0
Table 2-3. Approximated LB for various m, $\hat S_{pk}$, n = 5(5)50, and α = 0.05, 0.025, 0.01
Table 2-4. Simulated type I errors α for various m, n, and Spk with 10,000 lower bounds
Table 2-5. Ratios of the average of 10,000 lower bounds to the real Spk, i.e. $\overline{LB}/S_{pk}$
Table 2-6. Sample sizes required for the normal approximation to converge with α = 0.05
Table 2-7. The collected electrical characteristic data of 12 samples each of size 50
Table 3-1. Various Spk values and the corresponding production yields as well as non-conformities in PPM
Table 3-2. Contradiction between $\hat S_{pk}$ and test statistic T in Lee's method
Table 3-3. Critical values of the two approximations versus the simulated ones
Table 3-4. Sample size required for designated power levels of the convolution method
Table 3-5. Sample size required for the convolution approximation to converge
Table 4-1. (n, c0) values for α-risk = 0.01, 0.025(0.025)0.10, β-risk = 0.01, 0.025(0.025)0.10 with various (SAQL, SLTPD)
Table 4-2. Probabilities of accepting the lot for α-risk = 0.01, 0.025(0.025)0.10, β-risk = 0.01, 0.025(0.025)0.10 with various (SAQL, SLTPD) by simulation with N = 10000
Table 4-3. Specification of Y5V/BME/1206/22uF/6.3V
Table 4-4. The sample data with 55 observations (unit: mm)
List of Figures
Figure 2-1. Distribution of five processes with USL = 36.0, LSL = 24.0
Figure 2-2. Histogram of 10000 lower bounds with (a) Spk = 1.00, (b) Spk = 1.33, (c) Spk = 1.50, (d) Spk = 1.67
Figure 2-3. The $\bar X$-S charts based on the collected 12 samples each of size 50
Figure 3-1. Histograms of $\hat S_{pk}$ with simulation parameters Spk = 1.0 and ξ = 0
Figure 3-2. ξ versus critical values for various n and Spk
Figure 3-3. The PDF (l.h.s.) and CDF (r.h.s.) of $\hat S'_{pk}$ and $\hat S''_{pk}$ as well as the density and distribution curves of $\hat S_{pk}$ via simulation
Figure 3-4. Power curves for testing (a) H0: Spk ≤ 1.0 vs H1: Spk > 1.0, n = 30; (b) H0: Spk ≤ 1.0 vs H1: Spk > 1.0, n = 50; (c) H0: Spk ≤ 1.5 vs H1: Spk > 1.5, n = 30; (d) H0: Spk ≤ 1.5 vs H1: Spk > 1.5, n = 50
Figure 4-1. (a) Surface plot of S1, (b) Surface plot of S2
Figure 4-2. Surface plot of S1 and S2
Figure 4-3. (a) The required sample size n as surface plot with α = 0.01(0.01)0.10 and β = 0.01(0.01)0.10 under (SAQL, SLTPD) = (1.33, 1.00); (b) the critical acceptance value c0 as surface plot with α = 0.01(0.01)0.10 and β = 0.01(0.01)0.10 under (SAQL, SLTPD) = (1.33, 1.00); (c) the required sample size n as surface plot with α = 0.01(0.01)0.10 and β = 0.01(0.01)0.10 under (SAQL, SLTPD) = (1.50, 1.33); (d) the critical acceptance value c0 as surface plot with α = 0.01(0.01)0.10 and β = 0.01(0.01)0.10 under (SAQL, SLTPD) = (1.50, 1.33)
Figure 4-4. Construction of multi-layer ceramic capacitor
Chapter 1
Introduction
1.1. Process Capability Indices
Production yield has long been a standard criterion in the manufacturing industry for measuring process performance. It is defined as the percentage of processed product units that fall within the manufacturing specification limits. Product units that fall outside the manufacturing tolerance incur additional cost to the factory for scrapping or repair. All passed product units, which incur no additional cost, are equally accepted by the producer. Numerous process capability indices (PCIs) have been proposed to the manufacturing industry as numerical measures of production yield and process performance. These indices, such as Cp, Cpk, Cpm, Cpmk, and Spk, relate the actual process performance to the manufacturing specifications, and they have been a focus of recent research in the statistical and quality assurance literature. The explicit forms of the indices are defined as follows:
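$$C_a = 1 - \frac{|\mu - m|}{d}, \qquad C_p = \frac{USL - LSL}{6\sigma},$$
$$C_{pk} = \min\left\{\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right\} = C_a C_p, \qquad C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^2 + (\mu - T)^2}},$$
$$C_{pmk} = \min\left\{\frac{USL - \mu}{3\sqrt{\sigma^2 + (\mu - T)^2}},\ \frac{\mu - LSL}{3\sqrt{\sigma^2 + (\mu - T)^2}}\right\}, \quad\text{and}\quad S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{USL - \mu}{\sigma}\right) + \frac{1}{2}\Phi\left(\frac{\mu - LSL}{\sigma}\right)\right\},$$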
where USL and LSL are the upper and lower specification limits, respectively, μ is the process mean, σ is the process standard deviation, m = (USL + LSL)/2 is the midpoint of the specification limits, d = (USL − LSL)/2 is half the length of the specification interval, and T is the target value. $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal variable, and $\Phi^{-1}(\cdot)$ is the inverse function of $\Phi(\cdot)$.
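The definitions above can be checked numerically. Below is a minimal sketch that evaluates all six indices for a normal process with known μ and σ. The function name `capability_indices` is hypothetical. Taking T at the midpoint m when no target is supplied is our illustrative assumption; the text keeps T general. Only the Python standard library is used: `statistics.NormalDist` supplies Φ (`cdf`) and Φ⁻¹ (`inv_cdf`).

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def capability_indices(usl, lsl, mu, sigma, target=None):
    """Return Ca, Cp, Cpk, Cpm, Cpmk and Spk for a normal process.

    `target` (T) defaults to the midpoint m; this default is an
    illustrative assumption, not part of the definitions.
    """
    m = (usl + lsl) / 2                     # midpoint of the specification limits
    d = (usl - lsl) / 2                     # half-length of the specification interval
    t = m if target is None else target
    tau = sqrt(sigma ** 2 + (mu - t) ** 2)  # spread measured about the target T
    ca = 1 - abs(mu - m) / d
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    cpm = (usl - lsl) / (6 * tau)
    cpmk = min(usl - mu, mu - lsl) / (3 * tau)
    # Spk = (1/3) Phi^{-1}{ Phi((USL - mu)/sigma)/2 + Phi((mu - LSL)/sigma)/2 }
    spk = PHI.inv_cdf(0.5 * PHI.cdf((usl - mu) / sigma)
                      + 0.5 * PHI.cdf((mu - lsl) / sigma)) / 3
    return {"Ca": ca, "Cp": cp, "Cpk": cpk, "Cpm": cpm, "Cpmk": cpmk, "Spk": spk}


if __name__ == "__main__":
    # Process C of Figure 2-1: USL = 36, LSL = 24, mu = 31, sigma = 5/3
    for name, value in capability_indices(36.0, 24.0, 31.0, 5 / 3).items():
        print(f"{name:5s} {value:.6f}")
```

The sketch evaluates the Spk definition directly. That is adequate for moderate capability levels. For very capable processes, Φ rounds to 1 in floating point, and a tail-based form (1 − Φ computed with `math.erfc`) would be needed.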
1.2. Literature Review
The index Cp measures the overall process variation relative to the specification tolerance. It therefore reflects only process precision, that is, product consistency (see Juran [20]; Kane [21]). Because of its simple design, Cp cannot reflect the tendency of process centering. Several indices similar in nature to Cp, such as Cpk, Cpm, and Cpmk, have been proposed to reflect deviations of the process mean from the target value. These indices take into account both the magnitude of the process variance and the process location. The Cpk index was developed because the Cp index cannot adequately handle cases in which the process mean is not centered. However, a large value of Cpk says nothing about where the mean lies within the tolerance interval. The Cpk index has been regarded as a yield-based index, since it provides bounds on the production yield of a normally distributed process. The Cp and Cpk indices are appropriate measures of progress for quality improvement paradigms in which reducing variability is the guiding principle and production yield is the primary measure of success.
Taguchi, on the other hand, emphasizes the loss in a product’s worth, rather than the production yield, when one of its characteristics departs from the target value. Hsiang and
Taguchi [16] introduced the index Cpm, which was also proposed independently by Chan et al. [4].
The Cpm index is related to the idea of squared error loss, $\mathrm{loss}(X) = (X - T)^2$, and has been called the Taguchi index. It combines the process variation about the target value with the manufacturing specifications preset in the factory, and so reflects the degree of process targeting. Chan et al. [4] also discussed the sampling properties of the natural estimator of
Cpm. Boyles [2] provided a definitive analysis of Cpm and its usefulness in measuring process
targeting. Pearn and Shu [38] provided explicit formulas with efficient algorithms to obtain the
lower confidence bound of Cpm using the maximum likelihood estimator (MLE) of Cpm. Pearn et al.
[39] developed a two-phase supplier selection procedure based on the Cpm index providing useful
information about sample size required for a designated selection power.
Pearn et al. [35] proposed a third-generation capability index, Cpmk, which combines the merits of the three indices Cp, Cpk, and Cpm. The index Cpmk alerts the user whenever the process variance increases or the process mean deviates from its target value. It responds to the departure of the process mean from the target value T faster than Cp, Cpk, and Cpm, while remaining sensitive to changes in process variation. Vännman and Kotz [49] obtained the distribution of the estimated Cp(u, v) for cases with an on-center target. Taking u = 1 and v = 1 gives the distribution of the estimated Cp(1, 1) = Cpmk. Chen and Hsu [7] derived the asymptotic sampling distribution of the estimated Cpmk. They showed that it is a consistent, asymptotically unbiased estimator of Cpmk, and that it is asymptotically normal when the fourth moment of the characteristic X is finite. Wright [51] derived an explicit but rather complicated expression for the probability density function (PDF) of the estimated Cpmk. Pearn and Lin [37] instead expressed the CDF and PDF of the estimated Cpmk as a mixture of the chi-square and normal distributions. Their CDF form considerably simplifies the analysis of the statistical properties of the estimated Cpmk.
1.3. Motivation of Research
Process yield is the most common and standard criterion used in the manufacturing industry for measuring process performance. The indices Cpm and Cpmk are designed to emphasize the loss in a product's worth when one of its characteristics departs from the target value T. The index Cp can provide a yield estimate only for on-center processes, in which case
Yield = 2Φ(3Cp) − 1.
For processes whose mean departs from the center, the process yield is less than 2Φ(3Cp) − 1. The index Cpk provides only an interval estimate of the process yield (Boyles [2]), namely 2Φ(3Cpk) − 1 ≤ Yield ≤ Φ(3Cpk). Only the yield index Spk provides an exact measure of the production yield.
The dissertation is organized as follows. Chapter 2 considers estimating the process yield based on the Spk index for multiple samples. Chapter 3 proposes the convolution method for estimating the process yield. Chapter 4 develops product acceptance determination based on the Spk index. The final chapter presents our conclusions and discusses future work.
Chapter 2
Estimating Process Yield Based on Spk for Multiple Samples
2.1. Introduction
Production yield is defined as the percentage of processed product units that pass inspection, that is, units whose product characteristic falls within the manufacturing tolerance. For processes with two-sided manufacturing specifications, the process yield can be calculated as Yield = F(USL) − F(LSL), where F(·) is the cumulative distribution function of the process characteristic. If the process characteristic follows the normal distribution, the process yield can be written as Yield = Φ[(USL − μ)/σ] − Φ[(LSL − μ)/σ]. Cpk, however, gives only bounds on the yield. For example, if Cpk = c, the process yield lies between 2Φ(3c) − 1 and Φ(3c); in general, 2Φ(3Cpk) − 1 ≤ Yield ≤ Φ(3Cpk) (Boyles [2]). To overcome this shortcoming, Boyles [3] proposed the yield index Spk. Spk has a one-to-one relationship with the process yield: Yield = 2Φ(3Spk) − 1.
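Unlike the Cpk bounds, the one-to-one map between Spk and yield can be inverted exactly. The sketch below converts in both directions and reports the nonconforming fraction in PPM. The helper names are hypothetical, and only the standard library is used.

```python
from statistics import NormalDist

PHI = NormalDist()


def yield_from_spk(spk):
    """Exact yield of a normal process: Yield = 2 Phi(3 Spk) - 1."""
    return 2 * PHI.cdf(3 * spk) - 1


def spk_from_yield(y):
    """Inverse map: Spk = Phi^{-1}((1 + Yield) / 2) / 3."""
    return PHI.inv_cdf((1 + y) / 2) / 3


def cpk_yield_bounds(cpk):
    """Boyles' bounds: 2 Phi(3 Cpk) - 1 <= Yield <= Phi(3 Cpk)."""
    return 2 * PHI.cdf(3 * cpk) - 1, PHI.cdf(3 * cpk)


if __name__ == "__main__":
    for s in (1.0, 1.33, 1.5, 1.67, 2.0):
        y = yield_from_spk(s)
        print(f"Spk = {s:.2f}  yield = {y:.8f}  nonconforming = {1e6 * (1 - y):.4f} PPM")
```

Note that `cpk_yield_bounds(1.0)` returns an interval of roughly 0.9973 to 0.99865. By contrast, `yield_from_spk` returns a single value.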
Most results on the statistical properties of estimated capability indices are based on a single sample. In process control, however, capability indices are commonly estimated from past "in-control" data collected in multiple samples, particularly when a daily or weekly production control plan is used to monitor process stability. Using estimators based on several small samples and interpreting the results as if they came from a single sample may lead to incorrect conclusions. To use past in-control data from multiple samples in capability decisions, one should take into account the distribution of the estimated capability index under multiple samples. For multiple samples, Kirmani et al. [22] investigated the distribution of capability index estimators based on the sample standard deviations of the samples. Li et al. [27] investigated the distribution of estimators of Cp and Cpk based on the sample ranges. Vännman and Hubele [48] considered the indices in the class defined by Cp(u, v) when the estimators of the parameters μ and σ are based on multiple samples.
In this chapter, we investigate the behavior of an estimator of Spk for multiple samples. Section 2.2 compares the yield index Spk with the most commonly used index, Cpk, and reviews some results on Spk for a single sample. Section 2.3 derives the sampling distribution of the estimator of Spk under multiple samples, which leads to a normal approximation. Section 2.4 shows that, for the same Spk, the spread of $\hat S'_{pk}$ is largest when the process mean is at the center of the specification limits. To be conservative, we therefore calculate the lower bounds of Spk from the derived distribution of $\hat S'_{pk}$ under this largest-variance case. Section 2.5 assesses the accuracy of the normal approximation of $\hat S'_{pk}$ using histograms of simulated lower bounds and the actual type I errors. Finally, Section 2.6 gives an application example showing how to use the lower bounds listed in our tables.
2.2. The Yield Index Spk
We consider a group of five processes, shown in Figure 2-1. For these processes (A to E), USL = 36.0 and LSL = 24.0. The means are μ = 30.0, 30.5, 31.0, 31.5, 32.0 and the standard deviations are σ = 2.0, 11/6, 5/3, 1.5, 4/3, respectively. As Table 2-1(a) shows, all five processes have the same Cpk value and the same Cpk-calculated yield. Table 2-1(b) gives the Spk values and their calculated yields. The "Actual Yield" in Tables 2-1(a) and 2-1(b) is defined as Φ((USL − μ)/σ) − Φ((LSL − μ)/σ). For these five processes, the yield calculated from Cpk guarantees only a lower bound on the yield. The yield calculated from Spk, in contrast, equals the actual yield of each process.
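As an illustrative check, the sketch below recomputes the quantities in Table 2-1 from the means and standard deviations listed above. The function name is our own. The code uses the exact fractions 11/6, 5/3 and 4/3 for σ rather than the rounded values printed in the table.

```python
from statistics import NormalDist

PHI = NormalDist()
USL, LSL = 36.0, 24.0
# (process, mean, standard deviation) for processes A-E of Figure 2-1
PROCESSES = [("A", 30.0, 2.0), ("B", 30.5, 11 / 6), ("C", 31.0, 5 / 3),
             ("D", 31.5, 1.5), ("E", 32.0, 4 / 3)]


def table_2_1_row(mu, sigma):
    """Cpk, Spk and the yields they imply, next to the actual yield."""
    cpk = min(USL - mu, mu - LSL) / (3 * sigma)
    spk = PHI.inv_cdf(0.5 * PHI.cdf((USL - mu) / sigma)
                      + 0.5 * PHI.cdf((mu - LSL) / sigma)) / 3
    return {
        "Cpk": cpk,
        "Cpk yield": 2 * PHI.cdf(3 * cpk) - 1,   # only a lower bound on the yield
        "Spk": spk,
        "Spk yield": 2 * PHI.cdf(3 * spk) - 1,   # coincides with the actual yield
        "Actual": PHI.cdf((USL - mu) / sigma) - PHI.cdf((LSL - mu) / sigma),
    }


if __name__ == "__main__":
    for name, mu, sigma in PROCESSES:
        row = table_2_1_row(mu, sigma)
        print(name, "  ".join(f"{k}={v:.6f}" for k, v in row.items()))
```

All five rows print Cpk = 1.000000. The "Spk yield" column, however, matches the "Actual" column process by process.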
For a single sample, Lee et al. [26] derived the distribution of an estimator of Spk. The estimator can be expressed as
$$\hat S_{pk} = S_{pk} + \frac{W}{6\sqrt{n}\,\phi(3S_{pk})} + O_p\!\left(\frac{1}{n}\right),$$
where W is normally distributed with mean zero and variance $a^2 + b^2$,
$$a = \frac{1}{\sqrt{2}}\left\{\frac{1 - C_{dr}}{C_{dp}}\,\phi\!\left(\frac{1 - C_{dr}}{C_{dp}}\right) + \frac{1 + C_{dr}}{C_{dp}}\,\phi\!\left(\frac{1 + C_{dr}}{C_{dp}}\right)\right\}, \qquad b = \phi\!\left(\frac{1 - C_{dr}}{C_{dp}}\right) - \phi\!\left(\frac{1 + C_{dr}}{C_{dp}}\right).$$
Figure 2-1. Distribution of five processes with USL = 36.0, LSL = 24.0.
Table 2-1(a). The Cpk value, calculated yield, and actual yield of five different processes in Figure 2-1.
Process μ σ Cpk Calculated Yield Actual Yield
A 30.0 2.00 1.0 0.9973 0.9973
B 30.5 1.83 1.0 0.9973 0.9985
C 31.0 1.67 1.0 0.9973 0.9986
D 31.5 1.50 1.0 0.9973 0.9986
E 32.0 1.33 1.0 0.9973 0.9987
Table 2-1(b). The Spk value, calculated yield, and actual yield of five different processes in Figure 2-1.
Process μ σ Spk Calculated Yield Actual Yield
A 30.0 2.00 1.000000 0.9973 0.9973
B 30.5 1.83 1.055311 0.9985 0.9985
C 31.0 1.67 1.067441 0.9986 0.9986
D 31.5 1.50 1.068365 0.9986 0.9986
E 32.0 1.33 1.068385 0.9987 0.9987
Here $\phi(\cdot)$ is the probability density function (PDF) of the standard normal variable, $C_{dr} = (\mu - m)/d$, and $C_{dp} = \sigma/d$. Therefore, $\hat S_{pk}$ is asymptotically normally distributed with mean Spk and variance $(a^2 + b^2)/[36n\,\phi^2(3S_{pk})]$. Furthermore, Pearn and Chuang [34] investigated the accuracy of this estimator in terms of its mean square error for some commonly used quality requirements.
These results, like most results on estimated capability indices, rely on a single sample. As noted in Section 2.1, interpreting estimates from several small samples as if they came from one sample may lead to incorrect conclusions. We therefore next investigate the sampling distribution of $\hat S_{pk}$ under multiple samples.
2.3. Estimating Spk Under Multiple Samples
Suppose that the studied characteristic of the process is normally distributed and that we have m samples, each of size n. Let $x_{ij}$, i = 1,…, m; j = 1,…, n, be the characteristic values of the m × n observations, with mean μ and variance σ². Assume that the process is in statistical control while the samples are taken, and that it is monitored using an $\bar X$-chart together with an S-chart. For the ith sample, let $\bar x_i$ and $s_i^2$ denote the sample mean and sample variance, respectively, and let N denote the total number of observations:
$$\bar x_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ij} - \bar x_i)^2, \qquad N = \sum_{i=1}^{m} n = mn.$$
As an estimator of μ, we use the overall sample mean
$$\hat\mu = \bar{\bar x} = \frac{1}{m}\sum_{i=1}^{m}\bar x_i = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}.$$
We consider two variance estimators for estimating Spk (Hubele and Vännman [18]). One estimator of σ² is the pooled variance estimator
$$\hat\sigma^2 = s_p^2 = \frac{\sum_{i=1}^{m}(n-1)s_i^2}{mn} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij} - \bar x_i)^2.$$
The other is the un-pooled variance estimator
$$\hat\sigma^2 = s_u^2 = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij} - \bar{\bar x})^2.$$
The natural estimator of Spk is
$$\hat S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{USL - \hat\mu}{\hat\sigma}\right) + \frac{1}{2}\Phi\left(\frac{\hat\mu - LSL}{\hat\sigma}\right)\right\}.$$
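The estimators can be computed directly from m samples of equal size n. Below is a minimal sketch. The function name and the sample data in the usage example are hypothetical, and the divisors follow the formulas as written in this section. Instead of Φ⁻¹{½Φ(·) + ½Φ(·)}, the code evaluates the algebraically equivalent tail form −Φ⁻¹{½[Q(u) + Q(l)]}, where Q = 1 − Φ is computed with `math.erfc`. This avoids a floating-point failure when Φ rounds to 1 for very capable processes.

```python
from math import erfc, sqrt
from statistics import NormalDist, fmean

PHI = NormalDist()


def upper_tail(z):
    """Q(z) = 1 - Phi(z), accurate far into the tail."""
    return 0.5 * erfc(z / sqrt(2))


def spk_multiple_samples(samples, usl, lsl, pooled=True):
    """Natural estimator of Spk from m samples of common size n.

    pooled=True : sigma^2 = (1/mn) sum_i sum_j (x_ij - xbar_i)^2   (s_p^2)
    pooled=False: sigma^2 = (1/mn) sum_i sum_j (x_ij - xbarbar)^2  (s_u^2)
    Returns (mu_hat, sigma_hat, spk_hat).
    """
    m, n = len(samples), len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("every sample must have the same size n")
    means = [fmean(s) for s in samples]          # sample means xbar_i
    mu_hat = fmean(means)                        # overall mean (equal sample sizes)
    if pooled:
        ss = sum((x - xb) ** 2 for s, xb in zip(samples, means) for x in s)
    else:
        ss = sum((x - mu_hat) ** 2 for s in samples for x in s)
    sigma_hat = sqrt(ss / (m * n))
    tail = 0.5 * (upper_tail((usl - mu_hat) / sigma_hat)
                  + upper_tail((mu_hat - lsl) / sigma_hat))
    spk_hat = -PHI.inv_cdf(tail) / 3             # equals Phi^{-1}{...}/3 above
    return mu_hat, sigma_hat, spk_hat


if __name__ == "__main__":
    data = [[29.8, 30.4, 31.1, 29.5, 30.2],      # hypothetical m = 3 samples, n = 5
            [30.9, 29.7, 30.1, 30.6, 29.9],
            [30.3, 31.2, 29.4, 30.0, 30.8]]
    print(spk_multiple_samples(data, usl=32.0, lsl=28.0))
    print(spk_multiple_samples(data, usl=32.0, lsl=28.0, pooled=False))
```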
Clearly, the sampling distribution of $\hat S_{pk}$ is a very complex function of $\hat\mu$ and $\hat\sigma$. However, a useful approximation can be obtained from the following expansion of $\hat S_{pk}$. For convenience, we use the notation of Lee et al. [26]: $C_{dr} = (\mu - m)/d$, $C_{dp} = \sigma/d$, $\hat C_{dr} = (\hat\mu - m)/d$, and $\hat C_{dp} = \hat\sigma/d$. The estimator of Spk can then be rewritten as
$$\hat S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{1 - \hat C_{dr}}{\hat C_{dp}}\right) + \frac{1}{2}\Phi\left(\frac{1 + \hat C_{dr}}{\hat C_{dp}}\right)\right\}.$$
Let $Z = \sqrt{mn}\,(\hat\mu - \mu)$ and $Y = \sqrt{mn}\,(\hat\sigma^2 - \sigma^2)$. Note that $\hat\mu$ is a complete sufficient statistic for μ, and that $\hat\sigma^2$ (whether $s_p^2$ or $s_u^2$) is an ancillary statistic. By Basu's theorem, Z and Y are therefore independent. Since the first two moments of $\hat\mu$ and $\hat\sigma^2$ exist, the Central Limit Theorem gives, as mn goes to infinity, that Y converges to $N(0, 2\sigma^4)$ under both estimators $s_p^2$ and $s_u^2$, and that Z converges to $N(0, \sigma^2)$. Consequently, by Taylor's expansion, $\hat S_{pk}$ can be expressed as
$$\hat S_{pk} = S_{pk} + \frac{W}{6\sqrt{mn}\,\phi(3S_{pk})} + O_p\!\left(\frac{1}{mn}\right),$$
where
$$W = -\left[\frac{1 - C_{dr}}{C_{dp}}\,\phi\!\left(\frac{1 - C_{dr}}{C_{dp}}\right) + \frac{1 + C_{dr}}{C_{dp}}\,\phi\!\left(\frac{1 + C_{dr}}{C_{dp}}\right)\right]\frac{Y}{2\sigma^2} - \left[\phi\!\left(\frac{1 - C_{dr}}{C_{dp}}\right) - \phi\!\left(\frac{1 + C_{dr}}{C_{dp}}\right)\right]\frac{Z}{\sigma}.$$
W is normally distributed with mean zero and variance $a^2 + b^2$, where
$$a = \frac{1}{\sqrt{2}}\left\{\frac{1 - C_{dr}}{C_{dp}}\,\phi\!\left(\frac{1 - C_{dr}}{C_{dp}}\right) + \frac{1 + C_{dr}}{C_{dp}}\,\phi\!\left(\frac{1 + C_{dr}}{C_{dp}}\right)\right\}, \qquad b = \phi\!\left(\frac{1 - C_{dr}}{C_{dp}}\right) - \phi\!\left(\frac{1 + C_{dr}}{C_{dp}}\right),$$
and φ is the PDF of the standard normal distribution (see Appendix I for the explicit derivation). We let
$$\hat S'_{pk} = \hat S_{pk} - O_p\!\left(\frac{1}{mn}\right) = S_{pk} + \frac{W}{6\sqrt{mn}\,\phi(3S_{pk})}.$$
Thus $\hat S'_{pk}$ is normally distributed:
$$\hat S'_{pk} \sim N\!\left(S_{pk},\ \frac{a^2 + b^2}{36\,mn\,\phi^2(3S_{pk})}\right).$$
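The variance of $\hat S'_{pk}$ depends only on $C_{dr}$, $C_{dp}$ and mn, so it is easy to evaluate. The minimal sketch below computes a, b and the variance; the function names are hypothetical. Applied to the first process of Table 2-2 (μ = 8, σ = 2, LSL = 2, USL = 14), it gives a ≈ 0.0188027, b = 0 and mn·Var = 0.5.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def lee_ab(mu, sigma, usl, lsl):
    """Coefficients a and b, written in terms of C_dr and C_dp."""
    mid, d = (usl + lsl) / 2, (usl - lsl) / 2
    cdr, cdp = (mu - mid) / d, sigma / d
    u, l = (1 - cdr) / cdp, (1 + cdr) / cdp     # (USL - mu)/sigma and (mu - LSL)/sigma
    a = (u * PHI.pdf(u) + l * PHI.pdf(l)) / sqrt(2)
    b = PHI.pdf(u) - PHI.pdf(l)
    return a, b


def var_spk_prime(mu, sigma, usl, lsl, m, n):
    """Var(S'_pk) = (a^2 + b^2) / (36 m n phi(3 Spk)^2); m samples of size n."""
    a, b = lee_ab(mu, sigma, usl, lsl)
    spk = PHI.inv_cdf(0.5 * PHI.cdf((usl - mu) / sigma)
                      + 0.5 * PHI.cdf((mu - lsl) / sigma)) / 3
    return (a * a + b * b) / (36 * m * n * PHI.pdf(3 * spk) ** 2)


if __name__ == "__main__":
    print(lee_ab(8.0, 2.0, 14.0, 2.0))
    print(var_spk_prime(8.0, 2.0, 14.0, 2.0, m=3, n=50))   # about 0.5 / 150
```

On center ($C_{dr} = 0$), b vanishes and a = √2·3Spk·φ(3Spk). The variance then reduces to $S_{pk}^2/(2mn)$, which is the form used for the conservative lower bounds in Section 2.4.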
To test process performance, we consider the following null and alternative hypotheses:
H0: Spk ≤ c (the process is incapable),
H1: Spk > c (the process is capable),
where c is a specified value. The test statistic is
$$T = \frac{6\sqrt{mn}\,\phi(3\hat S_{pk})\,(\hat S_{pk} - c)}{\sqrt{\hat a^2 + \hat b^2}},$$
where $\hat a$ and $\hat b$ are the estimates of a and b obtained by replacing $C_{dr}$ and $C_{dp}$ with $\hat C_{dr}$ and $\hat C_{dp}$, respectively. The null hypothesis H0 is rejected at level α if T > $z_\alpha$, where $z_\alpha$ is the upper 100α% point of the standard normal distribution. An approximate 100(1 − α)% confidence interval for Spk is
$$\left(\hat S_{pk} - z_{\alpha/2}\,\frac{\sqrt{\hat a^2 + \hat b^2}}{6\sqrt{mn}\,\phi(3\hat S_{pk})},\ \ \hat S_{pk} + z_{\alpha/2}\,\frac{\sqrt{\hat a^2 + \hat b^2}}{6\sqrt{mn}\,\phi(3\hat S_{pk})}\right).$$
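The test and interval above can be sketched as a single plug-in computation. In this sketch, the function name `spk_test`, its argument order and its return layout are hypothetical choices. The inputs are the estimates $\hat\mu$ and $\hat\sigma$ from m samples of size n.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def spk_test(mu_hat, sigma_hat, usl, lsl, m, n, c, alpha=0.05):
    """Large-sample test of H0: Spk <= c vs H1: Spk > c, plus a two-sided CI.

    Returns (spk_hat, T, reject_H0, (lower, upper)).
    """
    mid, d = (usl + lsl) / 2, (usl - lsl) / 2
    cdr, cdp = (mu_hat - mid) / d, sigma_hat / d     # plug-in C_dr and C_dp
    u, l = (1 - cdr) / cdp, (1 + cdr) / cdp
    a = (u * PHI.pdf(u) + l * PHI.pdf(l)) / sqrt(2)  # a-hat
    b = PHI.pdf(u) - PHI.pdf(l)                      # b-hat
    s_hat = PHI.inv_cdf(0.5 * PHI.cdf(u) + 0.5 * PHI.cdf(l)) / 3
    se = sqrt(a * a + b * b) / (6 * sqrt(m * n) * PHI.pdf(3 * s_hat))
    t = (s_hat - c) / se                             # the statistic T above
    reject = t > PHI.inv_cdf(1 - alpha)              # compare with z_alpha
    z_half = PHI.inv_cdf(1 - alpha / 2)              # z_{alpha/2} for the interval
    return s_hat, t, reject, (s_hat - z_half * se, s_hat + z_half * se)


if __name__ == "__main__":
    # On-center process with estimated Spk = 1.0 from m = 3 samples of n = 50
    print(spk_test(8.0, 2.0, 14.0, 2.0, m=3, n=50, c=0.9))
```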
Table 2-2. Some different processes and corresponding mnVar($\hat S'_{pk}$) with Spk = 1.0.
μ σ Cdr Cdp a b mnVar($\hat S'_{pk}$)
8.0000 2.0000 0.0000000 0.3333333 0.0188027 0.0000000 0.499999935
8.0452 1.9995 0.0075315 0.3332483 0.0187932 0.0006001 0.499999793
8.0908 1.9979 0.0151253 0.3329907 0.0187643 0.0012004 0.499995833
8.1371 1.9953 0.0228474 0.3325531 0.0187158 0.0018014 0.499978579
8.1846 1.9915 0.0307724 0.3319220 0.0186472 0.0024033 0.499930778
8.2339 1.9865 0.0389895 0.3310765 0.0185575 0.0030066 0.499826688
8.2857 1.9799 0.0476115 0.3299852 0.0184455 0.0036117 0.499628976
8.3407 1.9716 0.0567897 0.3286013 0.0183095 0.0042190 0.499284463
8.4004 1.9611 0.0667404 0.3268532 0.0181470 0.0048292 0.498717347
8.4668 1.9478 0.0777965 0.3246260 0.0179547 0.0054431 0.497816042
8.5431 1.9303 0.0905222 0.3217203 0.0177273 0.0060618 0.496407586
8.6361 1.9065 0.1060173 0.3177420 0.0174563 0.0066871 0.494199792
2.4. Lower Confidence Bound of Spk
For the same Spk, the variance of $\hat S'_{pk}$ increases as the process mean approaches the center of the specification limits, and it is largest when the mean is at the center. The reason is as follows. Consider two processes with the same Spk, i.e., the same process yield. The one whose mean is farther from the center must have a smaller process variance to achieve that yield, and a smaller process variance gives a smaller variance of $\hat S'_{pk}$. Table 2-2 lists some different processes and the corresponding values of mnVar($\hat S'_{pk}$) with LSL = 2.0, USL = 14.0, and Spk = 1.0.
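The text does not say how the σ column of Table 2-2 was obtained. The sketch below assumes that it comes from holding Spk = 1.0 fixed and solving for σ at each listed mean, using bisection (our choice of root finder). It then evaluates mnVar($\hat S'_{pk}$) and should show the variance falling as the mean moves off center. The helper names and the bisection bracket [1, 4] are illustrative assumptions for this specification width.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()
LSL, USL, TARGET_SPK = 2.0, 14.0, 1.0


def spk(mu, sigma):
    return PHI.inv_cdf(0.5 * PHI.cdf((USL - mu) / sigma)
                       + 0.5 * PHI.cdf((mu - LSL) / sigma)) / 3


def sigma_for_spk(mu, target=TARGET_SPK, lo=1.0, hi=4.0):
    """Bisection on sigma: for fixed mu, Spk decreases as sigma grows."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if spk(mu, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mn_var(mu, sigma):
    """mn * Var(S'_pk) = (a^2 + b^2) / (36 phi(3 Spk)^2)."""
    m, d = (USL + LSL) / 2, (USL - LSL) / 2
    u, l = (d - (mu - m)) / sigma, (d + (mu - m)) / sigma
    a = (u * PHI.pdf(u) + l * PHI.pdf(l)) / sqrt(2)
    b = PHI.pdf(u) - PHI.pdf(l)
    return (a * a + b * b) / (36 * PHI.pdf(3 * spk(mu, sigma)) ** 2)


if __name__ == "__main__":
    for mu in (8.0, 8.0452, 8.1846, 8.4004, 8.6361):
        s = sigma_for_spk(mu)
        print(f"mu = {mu:.4f}  sigma = {s:.4f}  mnVar = {mn_var(mu, s):.9f}")
```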
Table 2-3. Approximated LB for various m, $\hat S_{pk}$, n = 5(5)50, and α = 0.05, 0.025, 0.01.
Columns are grouped by $\hat S_{pk}$ = 1.0, 1.33, 1.5, 1.67, 2.0; within each group, α = 0.05, 0.025, 0.01.
m n | 1.0 | 1.33 | 1.5 | 1.67 | 2.0
3 5 | 0.7690 0.7364 0.7018 | 1.0253 0.9819 0.9358 | 1.1535 1.1046 1.0528 | 1.2817 1.2274 1.1698 | 1.5380 1.4729 1.4037
3 10 | 0.8248 0.7980 0.7690 | 1.0997 1.0640 1.0253 | 1.2372 1.1970 1.1535 | 1.3747 1.3301 1.2817 | 1.6496 1.5961 1.5380
3 15 | 0.8522 0.8287 0.8030 | 1.1362 1.1050 1.0707 | 1.2783 1.2431 1.2046 | 1.4204 1.3813 1.3384 | 1.7044 1.6575 1.6061
3 20 | 0.8694 0.8482 0.8248 | 1.1592 1.1309 1.0997 | 1.3041 1.2723 1.2372 | 1.4491 1.4137 1.3747 | 1.7388 1.6964 1.6496
3 25 | 0.8815 0.8620 0.8403 | 1.1754 1.1493 1.1204 | 1.3223 1.2930 1.2605 | 1.4693 1.4367 1.4006 | 1.7631 1.7240 1.6807
3 30 | 0.8907 0.8725 0.8522 | 1.1876 1.1633 1.1362 | 1.3361 1.3088 1.2783 | 1.4846 1.4542 1.4204 | 1.7815 1.7450 1.7044
3 35 | 0.8980 0.8808 0.8616 | 1.1973 1.1744 1.1488 | 1.3470 1.3212 1.2925 | 1.4968 1.4681 1.4361 | 1.7961 1.7617 1.7233
3 40 | 0.9040 0.8876 0.8694 | 1.2053 1.1835 1.1592 | 1.3560 1.3315 1.3041 | 1.5067 1.4795 1.4491 | 1.8080 1.7753 1.7388
3 45 | 0.9090 0.8934 0.8759 | 1.2119 1.1912 1.1679 | 1.3635 1.3401 1.3139 | 1.5150 1.4890 1.4600 | 1.8180 1.7868 1.7519
3 50 | 0.9132 0.8983 0.8815 | 1.2176 1.1977 1.1754 | 1.3699 1.3475 1.3223 | 1.5221 1.4972 1.4693 | 1.8265 1.7966 1.7631
6 5 | 0.8248 0.7980 0.7690 | 1.0997 1.0640 1.0253 | 1.2372 1.1970 1.1535 | 1.3747 1.3301 1.2817 | 1.6496 1.5961 1.5380
6 10 | 0.8694 0.8482 0.8248 | 1.1592 1.1309 1.0997 | 1.3041 1.2723 1.2372 | 1.4491 1.4137 1.3747 | 1.7388 1.6964 1.6496
6 15 | 0.8907 0.8725 0.8522 | 1.1876 1.1633 1.1362 | 1.3361 1.3088 1.2783 | 1.4846 1.4542 1.4204 | 1.7815 1.7450 1.7044
6 20 | 0.9040 0.8876 0.8694 | 1.2053 1.1835 1.1592 | 1.3560 1.3315 1.3041 | 1.5067 1.4795 1.4491 | 1.8080 1.7753 1.7388
6 25 | 0.9132 0.8983 0.8815 | 1.2176 1.1977 1.1754 | 1.3699 1.3475 1.3223 | 1.5221 1.4972 1.4693 | 1.8265 1.7966 1.7631
6 30 | 0.9202 0.9063 0.8907 | 1.2269 1.2084 1.1876 | 1.3803 1.3595 1.3361 | 1.5337 1.5106 1.4846 | 1.8404 1.8127 1.7815
6 35 | 0.9257 0.9127 0.8980 | 1.2342 1.2169 1.1973 | 1.3885 1.3690 1.3470 | 1.5428 1.5212 1.4967 | 1.8514 1.8254 1.7961
6 40 | 0.9301 0.9178 0.9040 | 1.2401 1.2238 1.2053 | 1.3952 1.3768 1.3560 | 1.5503 1.5298 1.5067 | 1.8603 1.8357 1.8080
6 45 | 0.9338 0.9222 0.9089 | 1.2451 1.2295 1.2119 | 1.4008 1.3833 1.3634 | 1.5565 1.5370 1.5150 | 1.8677 1.8444 1.8179
6 50 | 0.9370 0.9259 0.9132 | 1.2493 1.2345 1.2176 | 1.4056 1.3888 1.3698 | 1.5618 1.5432 1.5221 | 1.8741 1.8518 1.8265
For assurance purposes, the lower bounds in Table 2-3 are calculated under the condition that the process mean is at the center of the specification limits. This conservative approach guards against wrongly concluding that an incapable process is capable. To find the minimum process yield (equivalently, the minimum Spk), a practitioner can take the necessary samples from the "stable" process, calculate $\hat S_{pk}$, and check the lower bound. The lower bound is the minimum Spk of the process at the 100(1 − α)% confidence level.
For the convenience of practitioners, we also developed a Matlab program to calculate the lower bounds (see Appendix II). Table 2-3 shows the lower bounds LB computed from the normal approximation for $\hat S_{pk}$ = 1.0, 1.33, 1.5, 1.67, 2.0, n = 5(5)50, m = 3, 6, and α = 0.05, 0.025, 0.01. For example, suppose m = 3 samples, each of size n = 50, give the estimate $\hat S_{pk}$ = 1.67. We then conclude with 95% confidence that the process has Spk of at least 1.5221.
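Appendix II's Matlab program is not reproduced here. The sketch below is our reading of the on-center normal approximation. On center, $\mathrm{Var}(\hat S'_{pk}) = S_{pk}^2/(2mn)$. Solving $\hat S_{pk} = LB + z_\alpha\,LB/\sqrt{2mn}$ for LB then gives $LB = \hat S_{pk}/(1 + z_\alpha/\sqrt{2mn})$. In the spot checks we made, this closed form agrees with Table 2-3 to within about 0.0001, provided the column headed 1.33 is read as $\hat S_{pk}$ = 4/3 and the column headed 1.67 as 5/3. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def spk_lower_bound(spk_hat, m, n, alpha=0.05):
    """Conservative 100(1 - alpha)% lower bound, assuming an on-center process.

    Solves spk_hat = LB + z_alpha * LB / sqrt(2 m n), i.e. the variance
    Spk^2 / (2 m n) is evaluated at the bound itself.
    """
    z = NormalDist().inv_cdf(1 - alpha)          # upper alpha point z_alpha
    return spk_hat / (1 + z / sqrt(2 * m * n))


if __name__ == "__main__":
    print(round(spk_lower_bound(5 / 3, m=3, n=50), 4))   # 1.5221
```

Because the bound depends on m and n only through mn, the rows m = 3, n = 10 and m = 6, n = 5 of Table 2-3 coincide.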
Figure 2-2(a). Histogram of 10000 lower bounds with Spk = 1.00.
Figure 2-2(b). Histogram of 10000 lower bounds with Spk = 1.33.
Figure 2-2(c). Histogram of 10000 lower bounds with Spk = 1.50.
Figure 2-2(d). Histogram of 10000 lower bounds with Spk = 1.67.
To assess the normal approximation of the distribution of $\hat S'_{pk}$, we run 10,000 simulation replications to generate 10,000 estimates $\hat S_{pk}$. We calculate their lower bounds and compare them with the real (preset) Spk for various commonly used quality requirements. Figures 2-2(a) to 2-2(d) show histograms of the 10,000 lower bounds with α = 0.05, m = 3, n = 50, and Spk = 1.00, 1.33, 1.50, and 1.67, respectively. Table 2-4 displays the actual type I errors for various m, n, and Spk, each based on 10,000 simulated lower bounds.
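Below is a simplified Monte Carlo sketch of this check; it is not the authors' simulation code. Instead of generating m × n raw observations, it draws the sufficient statistics directly: $\hat\mu \sim N(\mu, \sigma^2/(mn))$ and a pooled variance distributed as $\sigma^2\chi^2_{m(n-1)}/(m(n-1))$. Several choices are our own assumptions: the unbiased divisor m(n − 1) (the divisor mn of Section 2.3 is a one-line change), the specification limits ±3, the on-center mean, and the seed. The function returns the empirical type I error and the ratio of the average lower bound to Spk, so its output need not match Tables 2-4 and 2-5 exactly.

```python
import random
from math import erfc, sqrt
from statistics import NormalDist, fmean

PHI = NormalDist()


def upper_tail(z):
    """1 - Phi(z), computed with erfc so it stays accurate far into the tail."""
    return 0.5 * erfc(z / sqrt(2))


def simulate_lower_bounds(spk, m, n, alpha=0.05, reps=10_000, seed=1):
    """Return (empirical type I error, mean LB / Spk) for an on-center process.

    Illustrative assumptions: LSL = -3, USL = 3, mu = 0, sigma = 1 / spk, and
    the pooled variance uses the unbiased divisor m(n - 1).
    """
    rng = random.Random(seed)
    usl, lsl, mu, sigma = 3.0, -3.0, 0.0, 1.0 / spk
    z = PHI.inv_cdf(1 - alpha)
    k = m * (n - 1)                                    # df of the pooled variance
    exceed, lbs = 0, []
    for _ in range(reps):
        # Draw the sufficient statistics directly instead of m*n raw values.
        mu_hat = rng.gauss(mu, sigma / sqrt(m * n))
        s = sigma * sqrt(rng.gammavariate(k / 2, 2.0) / k)   # chi2_k = Gamma(k/2, 2)
        tail = 0.5 * (upper_tail((usl - mu_hat) / s) + upper_tail((mu_hat - lsl) / s))
        s_hat = -PHI.inv_cdf(tail) / 3                 # same as the Phi^{-1}{...} form
        lb = s_hat / (1 + z / sqrt(2 * m * n))         # on-center lower bound
        lbs.append(lb)
        exceed += lb > spk                             # bound wrongly exceeds true Spk
    return exceed / reps, fmean(lbs) / spk


if __name__ == "__main__":
    print(simulate_lower_bounds(1.0, m=3, n=50))
```

Drawing the sufficient statistics is exact under normality, because $\hat\mu$ and the pooled variance are independent. It also keeps 10,000 replications fast in pure Python.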
Table 2-4. Simulated type I errors α for various m, n, and Spk with 10,000 lower bounds.
m Spk n=10 n=20 n=30 n=50 n=100 n=150 n=200
1 1.00 0.1520 0.1156 0.1015 0.0812 0.0788 0.0680 0.0695
1 1.33 0.1593 0.1147 0.0987 0.0839 0.0750 0.0680 0.0694
1 1.50 0.1634 0.1161 0.1015 0.0869 0.0707 0.0677 0.0656
1 1.67 0.1629 0.1132 0.1053 0.0842 0.0764 0.0673 0.0663
1 2.00 0.1691 0.1188 0.1012 0.0845 0.0737 0.0701 0.0684
2 1.00 0.1126 0.0897 0.0817 0.0748 0.0655 0.0628 0.0602
2 1.33 0.1124 0.0901 0.0823 0.0715 0.0677 0.0632 0.0614
2 1.50 0.1148 0.0940 0.0842 0.0706 0.0684 0.0692 0.0663
2 1.67 0.1158 0.0939 0.0823 0.0754 0.0702 0.0588 0.0584
2 2.00 0.1223 0.0904 0.0844 0.0715 0.0639 0.0613 0.0641
3 1.00 0.1017 0.0838 0.0742 0.0692 0.0629 0.0579 0.0563
3 1.33 0.1011 0.0825 0.0726 0.0651 0.0671 0.0560 0.0556
3 1.50 0.1016 0.0791 0.0784 0.0638 0.0632 0.0600 0.0567
3 1.67 0.1048 0.0854 0.0681 0.0681 0.0624 0.0586 0.0556
3 2.00 0.1063 0.0805 0.0772 0.0693 0.0664 0.0644 0.0576
6 1.00 0.0816 0.0693 0.0705 0.0634 0.0619 0.0573 0.0555
6 1.33 0.0831 0.0750 0.0669 0.0576 0.0624 0.0545 0.0578
6 1.50 0.0832 0.0727 0.0677 0.0634 0.0599 0.0611 0.0555
6 1.67 0.0861 0.0744 0.0673 0.0576 0.0609 0.0603 0.0575
6 2.00 0.0763 0.0741 0.0699 0.0681 0.0626 0.0568 0.0558
9 1.00 0.0739 0.0675 0.0687 0.0573 0.0583 0.0561 0.0553
9 1.33 0.0752 0.0655 0.0652 0.0546 0.0590 0.0546 0.0550
9 1.50 0.0762 0.0671 0.0610 0.0628 0.0528 0.0561 0.0535
9 1.67 0.0804 0.0688 0.0616 0.0623 0.0597 0.0577 0.0554
9 2.00 0.0748 0.0713 0.0641 0.0636 0.0609 0.0533 0.0498
12 1.00 0.0707 0.0643 0.0593 0.0569 0.0553 0.0534 0.0557
12 1.33 0.0751 0.0657 0.0559 0.0570 0.0625 0.0568 0.0521
12 1.50 0.0671 0.0652 0.0641 0.0590 0.0559 0.0559 0.0545
12 1.67 0.0728 0.0682 0.0625 0.0597 0.0587 0.0577 0.0554
12 2.00 0.0705 0.0599 0.0645 0.0575 0.0542 0.0538 0.0520
Table 2-5. Ratios of the average of 10,000 lower bounds to the real Spk, i.e., $\overline{LB}/S_{pk}$.
m   Spk    n=10    n=20    n=30    n=50    n=100   n=150   n=200
1   1.00   0.8056  0.8286  0.8483  0.8726  0.9030  0.9178  0.9275
    1.33   0.8122  0.8287  0.8487  0.8729  0.9032  0.9179  0.9273
    1.50   0.8156  0.8302  0.8505  0.8730  0.9029  0.9178  0.9271
    1.67   0.8171  0.8312  0.8511  0.8732  0.9032  0.9186  0.9274
    2.00   0.8239  0.8341  0.8510  0.8739  0.9038  0.9177  0.9278
2   1.00   0.8283  0.8611  0.8810  0.9030  0.9274  0.9388  0.9460
    1.33   0.8314  0.8636  0.8802  0.9025  0.9276  0.9396  0.9472
    1.50   0.8319  0.8627  0.8817  0.9021  0.9285  0.9400  0.9469
    1.67   0.8304  0.8628  0.8815  0.9037  0.9277  0.9391  0.9466
    2.00   0.8358  0.8629  0.8816  0.9020  0.9277  0.9394  0.9471
3   1.00   0.8496  0.8820  0.8982  0.9179  0.9390  0.9493  0.9552
    1.33   0.8493  0.8806  0.8979  0.9170  0.9402  0.9497  0.9555
    1.50   0.8484  0.8810  0.8983  0.9179  0.9390  0.9494  0.9555
    1.67   0.8506  0.8815  0.8976  0.9176  0.9393  0.9493  0.9558
    2.00   0.8516  0.8812  0.8987  0.9185  0.9401  0.9498  0.9556
6   1.00   0.8807  0.9095  0.9241  0.9394  0.9563  0.9632  0.9682
    1.33   0.8803  0.9100  0.9239  0.9388  0.9559  0.9634  0.9684
    1.50   0.8810  0.9106  0.9241  0.9394  0.9561  0.9636  0.9683
    1.67   0.8825  0.9096  0.9241  0.9387  0.9556  0.9635  0.9685
    2.00   0.8819  0.9102  0.9240  0.9400  0.9563  0.9635  0.9679
9   1.00   0.8988  0.9245  0.9369  0.9499  0.9630  0.9697  0.9734
    1.33   0.8994  0.9236  0.9357  0.9495  0.9637  0.9700  0.9735
    1.50   0.8988  0.9248  0.9361  0.9499  0.9634  0.9699  0.9736
    1.67   0.8995  0.9244  0.9363  0.9500  0.9638  0.9702  0.9737
    2.00   0.8996  0.9247  0.9369  0.9499  0.9635  0.9697  0.9735
12  1.00   0.9104  0.9329  0.9442  0.9561  0.9681  0.9736  0.9770
    1.33   0.9100  0.9330  0.9434  0.9558  0.9683  0.9736  0.9769
    1.50   0.9093  0.9333  0.9441  0.9557  0.9682  0.9738  0.9771
    1.67   0.9095  0.9337  0.9441  0.9560  0.9682  0.9736  0.9773
    2.00   0.9106  0.9323  0.9446  0.9556  0.9677  0.9735  0.9772
The results in Table 2-4 show that when m = 12 and n = 50, the confidence level of the normal approximation is almost equal to the preset 1 − α (all confidence levels exceed 94%). Individual simulated lower bounds vary considerably, whereas their average has a much smaller variance (by the Central Limit Theorem). We therefore calculate the average of the lower bounds and compare it to the real Spk. Table 2-5 shows the ratios of the average lower bound to the real Spk. Note that, for given m and n, the ratio $\overline{LB}/S_{pk}$ is almost the same regardless of the real Spk.
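Thus, it is reasonable to estimate the true Spk from these ratios. For example, when m = 3 and n = 200, practitioners can repeat the sampling procedure, obtain the average lower bound $\overline{LB}$, and estimate the real Spk by $\overline{LB}/0.9558$. Here 0.9558 is the largest ratio in Table 2-5 for m = 3 and n = 200, which yields the smallest and therefore most conservative estimate.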
Table 2-6. Sample sizes required for the normal approximation to converge with α = 0.05.
                   Designated accuracy ε
m   Spk    0.10   0.09   0.08   0.07   0.06   0.05   0.04   0.03   0.02   0.01
1   1.00    193    238    301    392    534    769   1201   2135   4802  19208
    1.33    342    422    534    697    949   1366   2135   3795   8537  34147
    1.50    433    534    676    882   1201   1729   2702   4802  10805  43217
    1.67    534    659    834   1089   1483   2135   3335   5929  13339  53354
    2.00    769    949   1201   1568   2135   3074   4802   8537  19208  76830
2   1.00     97    119    151    196    267    385    601   1068   2401   9604
    1.33    171    211    267    349    475    683   1068   1898   4269  17074
    1.50    217    267    338    441    601    865   1351   2401   5403  21609
    1.67    267    330    417    545    742   1068   1668   2965   6670  26677
    2.00    385    475    601    784   1068   1537   2401   4269   9604  38415
3   1.00     65     80    101    131    178    257    401    712   1601   6403
    1.33    114    141    178    233    317    456    712   1265   2846  11383
    1.50    145    178    226    294    401    577    901   1601   3602  14406
    1.67    178    220    278    363    495    712   1112   1977   4447  17785
    2.00    257    317    401    523    712   1025   1601   2846   6403  25610
6   1.00     33     40     51     66     89    129    201    356    801   3202
    1.33     57     71     89    117    159    228    356    633   1423   5692
    1.50     73     89    113    147    201    289    451    801   1801   7203
    1.67     89    110    139    182    248    356    556    989   2224   8893
    2.00    129    159    201    262    356    513    801   1423   3202  12805
9   1.00     22     27     34     44     60     86    134    238    534   2135
    1.33     38     47     60     78    106    152    238    422    949   3795
    1.50     49     60     76     98    134    193    301    534   1201   4802
    1.67     60     74     93    121    165    238    371    659   1483   5929
    2.00     86    106    134    175    238    342    534    949   2135   8537
12  1.00     17     20     26     33     45     65    101    178    401   1601
    1.33     29     36     45     59     80    114    178    317    712   2846
    1.50     37     45     57     74    101    145    226    401    901   3602
    1.67     45     55     70     91    124    178    278    495   1112   4447
    2.00     65     80    101    131    178    257    401    712   1601   6403
We further consider how large the sample size n must be to ensure that the estimator is close enough to the real Spk, within a designated accuracy ε (Pearn et al. [36]). Table 2-6 displays the sample sizes required for the normal approximation to converge to the real Spk within a designated accuracy ε = 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, and 0.01, respectively. The derivation is briefly as follows:
$$\Pr\{|\hat S'_{pk} - S_{pk}| \le \varepsilon\} \ge 1-\alpha \;\Rightarrow\; \Pr\left\{\frac{|\hat S'_{pk} - S_{pk}|}{\sqrt{\operatorname{Var}(\hat S'_{pk})}} \le \frac{\varepsilon}{\sqrt{\operatorname{Var}(\hat S'_{pk})}}\right\} \ge 1-\alpha$$
$$\Rightarrow\; \frac{\varepsilon}{\sqrt{\operatorname{Var}(\hat S'_{pk})}} \ge \Phi^{-1}(1-\alpha/2) \;\Rightarrow\; \frac{a^2+b^2}{36mn\,\phi^2(3S_{pk})} \le \frac{\varepsilon^2}{\left[\Phi^{-1}(1-\alpha/2)\right]^2} \;\Rightarrow\; n \ge \frac{(a^2+b^2)\left[\Phi^{-1}(1-\alpha/2)\right]^2}{36m\,\phi^2(3S_{pk})\,\varepsilon^2}.$$
For example, for m = 9 and Spk = 1.33 with risk α = 0.05, a sample size of n ≥ 3795 ensures that the difference between the estimate Ŝ′pk and the real Spk is smaller than 0.01. Thus, if the estimate Ŝpk = 1.33, we can conclude that the actual performance Spk > 1.32 at the 95% confidence level. This convergence study is not intended for practical use. Rather, it illustrates the behavior and the rate of convergence of the normal approximation.
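The last inequality can be evaluated directly. The Python sketch below is ours, not the program used to build Table 2-6, and all function names are hypothetical. `required_n` applies the bound with a and b computed from Cdr and Cdp. `required_n_on_center` uses the on-center simplification: Cdr = 0 gives b = 0 and a = √2·3Spk·φ(3Spk), so the bound reduces to Spk²z²/(2mε²) with z = Φ⁻¹(1 − α/2). We assume that Table 2-6 was computed for on-center processes, with 1.33 and 1.67 read as 4/3 and 5/3. Under that assumption the sketch reproduces, for example, 19208 (m = 1, Spk = 1.00, ε = 0.01) and 3795 (m = 9, Spk = 1.33, ε = 0.01).

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def spk_from_c(cdr: float, cdp: float) -> float:
    """Spk = (1/3) Phi^{-1}{Phi(u)/2 + Phi(v)/2}, u = (1-Cdr)/Cdp, v = (1+Cdr)/Cdp."""
    u, v = (1 - cdr) / cdp, (1 + cdr) / cdp
    # 1 - [Phi(u)/2 + Phi(v)/2] is the average upper tail; erfc keeps it accurate.
    tail = 0.25 * (math.erfc(u / math.sqrt(2)) + math.erfc(v / math.sqrt(2)))
    return -_STD.inv_cdf(tail) / 3


def required_n(cdr: float, cdp: float, m: int, eps: float, alpha: float = 0.05) -> int:
    """Smallest n with n >= (a^2+b^2) z^2 / (36 m phi^2(3 Spk) eps^2)."""
    u, v = (1 - cdr) / cdp, (1 + cdr) / cdp
    a = (u * _STD.pdf(u) + v * _STD.pdf(v)) / math.sqrt(2)
    b = _STD.pdf(u) - _STD.pdf(v)
    z = _STD.inv_cdf(1 - alpha / 2)
    phi3 = _STD.pdf(3 * spk_from_c(cdr, cdp))
    return math.ceil((a * a + b * b) * z * z / (36 * m * phi3**2 * eps**2))


def required_n_on_center(spk: float, m: int, eps: float, alpha: float = 0.05) -> int:
    """On-center shortcut (Cdr = 0): the bound reduces to Spk^2 z^2 / (2 m eps^2)."""
    z = _STD.inv_cdf(1 - alpha / 2)
    return math.ceil(spk**2 * z**2 / (2 * m * eps**2))


if __name__ == "__main__":
    print(required_n_on_center(1.00, 1, 0.01))   # Table 2-6: 19208
    print(required_n(0.0, 1 / 3, 1, 0.01))       # same case through a and b: 19208
    print(required_n_on_center(4 / 3, 9, 0.01))  # Table 2-6 (Spk = 1.33): 3795
```

The general function is useful for off-center processes. For a fixed Spk it returns smaller values than the on-center shortcut, in line with the largest-variance property at ξ = 0.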
2.6. An Application Example
The integrated circuits (IC) industry has been among the most prominent industries in recent years. Integrated circuits are used in a wide variety of products, such as office automation equipment (copiers, facsimile machines, printers, etc.), vending machines, banking terminals, CD or DVD players, and battery chargers. We investigated a company in Taiwan that manufactures one-cell rechargeable Li-ion battery packs. These packs offer low current consumption, high withstand voltage, high-accuracy voltage detection, overcurrent and short-circuit protection, and a wide operating temperature range. Among these features, the most important one is high-accuracy voltage detection: if the voltage detector fails, the lifetime or reliability of the Li-ion battery pack is reduced. The preset upper and lower specification limits of the overcharge detector are USL = 4.40 V and LSL = 4.30 V, and the target value is set to T = 4.35 V.
Figure 2-3. The X̄ and S charts based on the 12 collected samples, each of size 50.
Table 2-7. The collected electrical characteristic data of 12 samples, each of size 50.
Subsample  1       2       3       4       5       6       7       8       9       10      11      12
x̄_i        4.3526  4.3483  4.3544  4.3490  4.3563  4.3542  4.3482  4.3537  4.3535  4.3505  4.3476  4.3502
s_i        0.0133  0.0120  0.0124  0.0093  0.0104  0.0114  0.0119  0.0174  0.0126  0.0112  0.0104  0.0102
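Suppose the minimal precision requirement for this process is set to Spk = 1.0. We calculate the overall sample mean $\bar{x} = 4.35154$, the pooled sample standard deviation $s_p = 0.01192$, and the un-pooled sample standard deviation $s_u = 0.01225$. Taking $\hat\mu = \bar{x}$ and $\hat\sigma = s_p$ (or $s_u$),
$$\hat S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{USL-\hat\mu}{\hat\sigma}\right)+\frac{1}{2}\Phi\left(\frac{\hat\mu-LSL}{\hat\sigma}\right)\right\} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{4.40-4.35154}{0.01192}\right)+\frac{1}{2}\Phi\left(\frac{4.35154-4.30}{0.01192}\right)\right\} = 1.3871,$$
or, with the un-pooled standard deviation,
$$\hat S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{4.40-4.35154}{0.01225}\right)+\frac{1}{2}\Phi\left(\frac{4.35154-4.30}{0.01225}\right)\right\} = 1.3503.$$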
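The plug-in calculation above can be checked with a minimal Python sketch that uses only the standard library (`statistics.NormalDist` and `math.erfc`). The function name `spk_hat` is ours. The sketch works with the upper tails 1 − Φ(·) through `erfc`. This avoids the loss of precision that occurs when forming Φ(4.07) ≈ 1 directly and then inverting a number extremely close to 1.

```python
import math
from statistics import NormalDist


def upper_tail(x: float) -> float:
    """1 - Phi(x), computed with erfc to keep precision far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def spk_hat(xbar: float, s: float, lsl: float, usl: float) -> float:
    """Plug-in estimate (1/3) Phi^{-1}{Phi(u)/2 + Phi(v)/2}, u = (USL-xbar)/s, v = (xbar-LSL)/s."""
    u = (usl - xbar) / s
    v = (xbar - lsl) / s
    # 1 - [Phi(u)/2 + Phi(v)/2] equals the average of the two upper tails.
    tail = 0.5 * upper_tail(u) + 0.5 * upper_tail(v)
    # Phi^{-1}(1 - tail) = -Phi^{-1}(tail), by symmetry of the standard normal.
    return -NormalDist().inv_cdf(tail) / 3


if __name__ == "__main__":
    for s in (0.01192, 0.01225):  # pooled and un-pooled standard deviations
        print(s, round(spk_hat(4.35154, s, 4.30, 4.40), 4))
```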
Using the program in Appendix II, we find the lower confidence bound to be 1.3242 (or 1.2890). Thus, we conclude with 95% confidence that the true process capability Spk is no less than 1.3242 (or 1.2890).
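The Appendix II program is not reproduced here. The following is only a hedged consistency check under our own assumption. Suppose the bound L uses the on-center variance of the normal approximation, Spk²/(2mn) (the case Cdr = 0, b = 0, which gives the largest variance for a given Spk), evaluated at the bound itself. Solving L = Ŝpk − z_α·L/√(2mn) then gives L = Ŝpk/(1 + z_α/√(2mn)). For m = 12 and n = 50 this matches 1.3242 and 1.2890 to about four decimals. This formula and the name `lb_on_center` are our hypothesis and should not be taken as the documented algorithm.

```python
import math
from statistics import NormalDist


def lb_on_center(spk_hat: float, m: int, n: int, alpha: float = 0.05) -> float:
    """Hypothetical lower bound L solving L = spk_hat - z_alpha * L / sqrt(2 m n).

    Our assumption: the on-center variance Spk^2 / (2 m n) of the normal
    approximation, evaluated at the bound itself."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper alpha point, about 1.6449 for alpha = 0.05
    return spk_hat / (1 + z / math.sqrt(2 * m * n))


if __name__ == "__main__":
    for s in (1.3871, 1.3503):
        print(s, round(lb_on_center(s, m=12, n=50), 4))
```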
To estimate the real Spk, the factory manager could implement a weekly control system: repeat the sampling procedure over, say, 10 consecutive weeks and then calculate the average lower bound. For example, suppose the ten weekly lower bounds average $\overline{LB}$ = 1.4527. From Table 2-5, the largest ratio $\overline{LB}/S_{pk}$ for m = 12 and n = 50 is 0.9561. The manager can therefore estimate the real Spk ≈ 1.4527/0.9561 ≈ 1.52. The corresponding process yield can then be estimated as 0.999994885, or equivalently, a fraction of defectives of 5.115 ppm.
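The two arithmetic steps of this example (dividing by the tabulated ratio, then mapping Spk to yield through Yield = 2Φ(3Spk) − 1, as given in Chapter 3) are sketched below. The function names are hypothetical. The fraction defective 2[1 − Φ(3Spk)] is computed as `erfc(3*Spk/sqrt(2))`, which avoids subtracting two nearly equal numbers.

```python
import math


def spk_from_average_lb(avg_lb: float, ratio: float) -> float:
    """Estimate Spk by dividing the average lower bound by a Table 2-5 ratio."""
    return avg_lb / ratio


def yield_and_ppm(spk: float) -> tuple:
    """Yield = 2 Phi(3 Spk) - 1 and the matching fraction defective in PPM."""
    # 1 - Yield = 2 [1 - Phi(3 Spk)] = erfc(3 Spk / sqrt(2)), computed without cancellation.
    defective = math.erfc(3 * spk / math.sqrt(2))
    return 1.0 - defective, defective * 1e6


if __name__ == "__main__":
    est = spk_from_average_lb(1.4527, 0.9561)
    print(round(est, 4))                 # 1.5194, reported as 1.52
    print(yield_and_ppm(round(est, 2)))  # yield and PPM at Spk = 1.52
```

The yield is evaluated at the rounded value 1.52, as in the text. Using the unrounded 1.5194 would give a slightly larger PPM.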
Chapter 3
Procedure of the Convolution Method for Estimating Production Yield With Sample Size Information
3.1. Introduction
As mentioned in Chapter 1, numerous process capability indices (PCIs) have been proposed to provide numerical measures of production yield and process performance. Indices such as Ca, Cp, Cpk, Cpm, and Cpmk are defined to emphasize process centering, process precision, or process loss. Only the index Spk provides an exact measure of the production yield. Note that capability indices are designed to monitor the performance of stable normal or near-normal processes with symmetric tolerances. In practice, the process mean μ and the process variance σ² are unknown. To calculate the index value, sample data must be collected, and sampling errors may introduce a great degree of uncertainty into the assessments. As the use of capability indices becomes more widespread, users are becoming more aware of the impact of the estimators and their distributions. They are learning that capability measures must be reported as confidence intervals or via capability testing. Statistical properties of the estimators of these indices under various process conditions have been investigated extensively, including Chan et al. [4], Pearn et al. [35], Kotz and Johnson [23], Vännman and Kotz [49], Vännman [47], Kotz and Lovelace [25], Chen [5], Zhang [54], Kotz and Johnson [24], Lee et al. [26], Xie et al. [53], Spiring et al. [42], Pearn et al. [38], Pearn et al. [36], Pearn et al. [39], Montgomery [31], and Wu [52]. In this chapter, we propose a convolution method based on the Spk index for estimating the production yield from a single sample.
3.2. The Yield Index Spk
Boyles [3] proposed a yield measurement index, referred to as Spk, based on the production yield of normal processes. The yield index Spk, as defined previously, can alternatively be expressed as
$$S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{1-C_{dr}}{C_{dp}}\right)+\frac{1}{2}\Phi\left(\frac{1+C_{dr}}{C_{dp}}\right)\right\},$$
where $C_{dr} = (\mu - m)/d$, $C_{dp} = \sigma/d$, m = (USL + LSL)/2 is the midpoint of the specification limits, and d = (USL − LSL)/2 is the half length of the specification interval.
As mentioned previously, the index Cpk can only provide bounds on the production yield. The indices Cpm and Cpmk are defined in relation to the customer's loss. Only the yield index Spk provides a one-to-one correspondence with the production yield, which can be expressed as
$$\text{Yield} = 2\Phi(3S_{pk}) - 1.$$
Table 3-1 summarizes the corresponding production yields and non-conformities in parts per million (PPM) for Spk = 1.0(0.1)2.0, including the most commonly used performance requirements 1.00, 1.33, 1.50, 1.67, and 2.00. For example, if a process has capability index value Spk = 1.50, then the process yield is 0.999993205, and the corresponding non-conformity rate is roughly 7 PPM.
Table 3-1. Various Spk values and the corresponding production yields and non-conformities in PPM.
Spk    Yield         PPM
1.00   0.997300204   2699.796
1.10   0.999033152    966.848
1.20   0.999681783    318.217
1.30   0.999903807     96.193
1.33   0.999933927     66.073
1.40   0.999973309     26.691
1.50   0.999993205      6.795
1.60   0.999998413      1.587
1.67   0.999999456      0.544
1.70   0.999999660      0.340
1.80   0.999999933      0.067
1.90   0.999999988      0.012
2.00   0.999999998      0.002
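Because Yield = 2Φ(3Spk) − 1 is one-to-one, Table 3-1 can be regenerated and also inverted, for instance to find the Spk needed to meet a PPM target. The following standard-library sketch uses hypothetical function names. PPM is computed as 10⁶·erfc(3Spk/√2), which equals 10⁶(1 − Yield). The inverse uses Spk = −Φ⁻¹(p/2)/3 with p the fraction defective.

```python
import math
from statistics import NormalDist


def ppm_from_spk(spk: float) -> float:
    """Non-conformities in PPM: 1e6 * (1 - Yield) = 1e6 * erfc(3 Spk / sqrt(2))."""
    return 1e6 * math.erfc(3 * spk / math.sqrt(2))


def spk_from_ppm(ppm: float) -> float:
    """Invert Yield = 2 Phi(3 Spk) - 1: Spk = -Phi^{-1}(p / 2) / 3, p = ppm / 1e6."""
    return -NormalDist().inv_cdf(ppm / 2e6) / 3


if __name__ == "__main__":
    for spk in (1.00, 1.33, 1.50, 1.67, 2.00):
        ppm = ppm_from_spk(spk)
        print(f"{spk:.2f}  {1 - ppm / 1e6:.9f}  {ppm:.3f}")
```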
Let X1, …, Xn be a random sample of the characteristic from a normal process. The natural estimator of Spk is
$$\hat S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{USL-\bar X}{S}\right)+\frac{1}{2}\Phi\left(\frac{\bar X-LSL}{S}\right)\right\},$$
which can also be expressed as
$$\hat S_{pk} = \frac{1}{3}\Phi^{-1}\left\{\frac{1}{2}\Phi\left(\frac{1-\hat C_{dr}}{\hat C_{dp}}\right)+\frac{1}{2}\Phi\left(\frac{1+\hat C_{dr}}{\hat C_{dp}}\right)\right\},$$
where $\hat C_{dr} = (\bar X - m)/d$ and $\hat C_{dp} = S/d$ are natural estimators of Cdr and Cdp, respectively, $\bar X = n^{-1}\sum_{i=1}^{n} X_i$ is the sample mean, and $S^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar X)^2$ is the sample variance. The distribution of the natural estimator of Spk is mathematically intractable, because it is a complex function of the statistics X̄ and S² (or Ĉdr and Ĉdp). However, we can profile the sampling distribution of Ŝpk by simulation. Figure 3-1 shows histograms of Ŝpk with simulation parameters Spk = 1.0, ξ = (μ − m)/σ = 0, and sample sizes n = 20, 30, 50, and 80, each with 10,000 simulated values of Ŝpk. The histograms reveal that the probability density function (PDF) of Ŝpk is nearly bell-shaped and symmetric about the real Spk for large sample sizes, and slightly skewed to the right for small sample sizes.
Figure 3-1. Histograms of 10,000 simulated Ŝpk values for Spk = 1.0 and ξ = 0 (panels: n = 20, n = 30, n = 50, n = 80).
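The simulation behind Figure 3-1 can be sketched as follows. This is our reconstruction under stated assumptions, not the thesis's original code. We scale the specification limits to LSL = −1 and USL = 1 (so m = 0 and d = 1), and the random seed is arbitrary. With μ = m the true value is Spk = d/(3σ), so σ = 1/(3Spk) gives the required on-center process. `summarize` reports the mean, the standard deviation, and the moment skewness of the simulated estimates.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def spk_hat(xs, lsl, usl):
    """Plug-in estimate (1/3) Phi^{-1}{Phi(u)/2 + Phi(v)/2} from one sample."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    u, v = (usl - xbar) / s, (xbar - lsl) / s
    # Average upper tail, computed with erfc for accuracy.
    tail = 0.25 * (math.erfc(u / math.sqrt(2)) + math.erfc(v / math.sqrt(2)))
    return -_STD.inv_cdf(tail) / 3


def simulate_spk_hat(n, reps=10_000, spk=1.0, seed=2009):
    """Draw `reps` estimates for an on-center process (xi = 0) with true Spk = spk.

    Specs are scaled to LSL = -1, USL = 1 (m = 0, d = 1); with mu = m the true
    value is d / (3 sigma), so sigma = 1 / (3 spk)."""
    rng = random.Random(seed)
    sigma = 1.0 / (3.0 * spk)
    return [spk_hat([rng.gauss(0.0, sigma) for _ in range(n)], -1.0, 1.0)
            for _ in range(reps)]


def summarize(values):
    """Mean, standard deviation and moment skewness of a list of floats."""
    k = len(values)
    mean = sum(values) / k
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (k - 1))
    skew = sum((x - mean) ** 3 for x in values) / (k * sd**3)
    return mean, sd, skew


if __name__ == "__main__":
    for n in (20, 30, 50, 80):
        print(n, [round(x, 4) for x in summarize(simulate_spk_hat(n))])
```

The spread should shrink roughly like 1/√(2n), the on-center standard deviation of the normal approximation. A histogram of `simulate_spk_hat(n)` gives the corresponding panel of the figure.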
Many researchers have focused on the sampling distribution of Ŝpk. Lee et al. [26] derived a normal approximation to the distribution of the estimated Spk. Pearn et al. [34] investigated the accuracy of the normal approximation computationally and suggested that a sample size greater than 150 is required for the normal approximation to be sufficiently accurate. Pearn and Cheng [33] further derived a normal approximation to the distribution of the estimated Spk under multiple samples, and investigated the sample sizes required to converge to Spk within a designated accuracy. Chen [6] considered the formula of the normal approximation messy and cumbersome to deal with. Chen [6] applied four bootstrap methods to find lower confidence bounds on Spk and showed that the standard bootstrap (SB) method significantly outperforms the other three bootstrap methods in coverage fraction. We note, however, that the bootstrap re-sampling method yields a different solution each time, whereas the theoretical sampling distribution approach provides a unique lower bound for the same sample estimates.
The distribution of Ŝpk is analytically intractable, but approximate distributions of Ŝpk can be obtained. In the following sections, two approximate distributions are considered and compared with the distribution of the estimated Spk obtained via simulation.
3.3. Normal Approximation of Ŝpk: Ŝ′pk
Lee et al. [26] considered a normal approximation of Ŝpk, which is denoted Ŝ′pk in this chapter. Ŝ′pk is normally distributed with mean Spk and variance (a² + b²)/[36nφ²(3Spk)], i.e.,
$$\hat S'_{pk} \sim N\left(S_{pk},\ \frac{a^2+b^2}{36n\,\phi^2(3S_{pk})}\right),$$
where
$$a = \frac{1}{\sqrt{2}}\left\{\frac{1-C_{dr}}{C_{dp}}\,\phi\left(\frac{1-C_{dr}}{C_{dp}}\right)+\frac{1+C_{dr}}{C_{dp}}\,\phi\left(\frac{1+C_{dr}}{C_{dp}}\right)\right\}, \qquad b = \phi\left(\frac{1-C_{dr}}{C_{dp}}\right)-\phi\left(\frac{1+C_{dr}}{C_{dp}}\right).$$
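The variance formula can be coded directly. The sketch below uses our own names and only the standard library. It also gives a single numerical illustration, not a proof, of the on-center property attributed to Pearn et al. [36] later in this section. At Cdr = 0 the variance reduces to Spk²/(2n). For the same Spk = 1, an off-center process with Cdr = 0.1 (its Cdp found by bisection) gives a slightly smaller variance.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def spk_from_c(cdr, cdp):
    """Spk = (1/3) Phi^{-1}{Phi((1-Cdr)/Cdp)/2 + Phi((1+Cdr)/Cdp)/2}."""
    u, v = (1 - cdr) / cdp, (1 + cdr) / cdp
    tail = 0.25 * (math.erfc(u / math.sqrt(2)) + math.erfc(v / math.sqrt(2)))
    return -_STD.inv_cdf(tail) / 3


def var_s_prime(cdr, cdp, n):
    """Variance (a^2 + b^2) / [36 n phi^2(3 Spk)] of the normal approximation."""
    u, v = (1 - cdr) / cdp, (1 + cdr) / cdp
    a = (u * _STD.pdf(u) + v * _STD.pdf(v)) / math.sqrt(2)
    b = _STD.pdf(u) - _STD.pdf(v)
    return (a * a + b * b) / (36 * n * _STD.pdf(3 * spk_from_c(cdr, cdp)) ** 2)


def cdp_for_spk(cdr, target, lo=0.05, hi=1.0):
    """Bisection for the Cdp giving Spk = target at fixed Cdr (Spk falls as Cdp grows)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if spk_from_c(cdr, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(var_s_prime(0.0, 1 / 3, 50))      # on-center, Spk = 1: Spk^2/(2n) = 0.01
    cdp = cdp_for_spk(0.1, 1.0)
    print(cdp, var_s_prime(0.1, cdp, 50))   # same Spk, off-center
```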
The normal approximation is useful for statistical inference on Spk. Consider the following null and alternative hypotheses:
H0 : Spk ≤ C, a specified value; H1 : Spk > C.
The decision rule at significance level α is to reject the null hypothesis H0 if the sample statistic Ŝpk is equal to or larger than the critical value c0, where c0 satisfies
$$\Pr\{\hat S'_{pk} \ge c_0 \mid H_0: S_{pk} \le C\} \le \alpha.$$
Lee et al. [26] suggested performing the hypothesis test with the test statistic
$$T = \frac{6\sqrt{n}\,\phi(3\hat S_{pk})\,(\hat S_{pk} - C)}{\sqrt{\hat a^2 + \hat b^2}},$$
where â and b̂ are the natural estimators of a and b, obtained by replacing Cdr and Cdp with Ĉdr and Ĉdp, respectively. The decision rule is then to reject the null hypothesis H0 if T ≥ zα, where zα is the upper 100α% point of the standard normal distribution.
This approach is intuitive and reasonable, but it introduces additional sampling error from estimating a and b (or Cdr and Cdp) by â and b̂ (or Ĉdr and Ĉdp), which makes the test less reliable. For example, in Table 3-2 the sample estimate of Spk for Process B is larger than that for Process A, yet, contradictorily, Process B yields a smaller test statistic T. Table 3-2 gives several such pairs for testing H0: Spk ≤ 1.0 versus H1: Spk > 1.0. In each pair, the process with the larger sample estimate of Spk (Processes B, D, F, and H) has the smaller test statistic T.
Table 3-2. Contradiction between Ŝpk and the test statistic T in Lee's method.
Process  X̄         S         Ĉdr       Ĉdp       Ŝpk       T
A        7.695115  1.365970  0.139023  0.273194  1.114490  0.807547
B        7.674245  1.372115  0.134849  0.274423  1.114555  0.807412
C        7.707630  1.335160  0.141526  0.267032  1.134942  1.207505
D        7.681125  1.342895  0.136225  0.268579  1.135032  1.207252
E        7.683340  1.314965  0.136668  0.262993  1.156439  1.063747
F        7.650165  1.324405  0.130033  0.264881  1.156573  1.063459
G        7.700125  1.219685  0.140025  0.243937  1.234395  1.929673
H        7.680760  1.224995  0.136152  0.244999  1.234452  1.929267
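The contradiction for Processes A and B can be checked by computing Ŝpk and T from Ĉdr and Ĉdp, with C = 1.0. The table does not state the sample size. n = 30 is our inference, because it reproduces the tabulated T for A and B, so treat it as an assumption. The other pairs appear to use different sample sizes, and the sketch does not attempt them. The function name `lee_T` is hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def lee_T(cdr_hat, cdp_hat, n, c=1.0):
    """Return (S_hat, T) with T = 6 sqrt(n) phi(3 S_hat)(S_hat - C) / sqrt(a_hat^2 + b_hat^2)."""
    u, v = (1 - cdr_hat) / cdp_hat, (1 + cdr_hat) / cdp_hat
    # Plug-in Spk through the average upper tail.
    tail = 0.25 * (math.erfc(u / math.sqrt(2)) + math.erfc(v / math.sqrt(2)))
    s_hat = -_STD.inv_cdf(tail) / 3
    # Natural estimators of a and b.
    a_hat = (u * _STD.pdf(u) + v * _STD.pdf(v)) / math.sqrt(2)
    b_hat = _STD.pdf(u) - _STD.pdf(v)
    t = 6 * math.sqrt(n) * _STD.pdf(3 * s_hat) * (s_hat - c) / math.hypot(a_hat, b_hat)
    return s_hat, t


if __name__ == "__main__":
    for name, cdr, cdp in (("A", 0.139023, 0.273194), ("B", 0.134849, 0.274423)):
        s, t = lee_T(cdr, cdp, n=30)  # n = 30 is an assumption (see above)
        print(name, round(s, 6), round(t, 6))
```

B has the larger Ŝpk but the smaller T, because â and b̂ change with the estimated centering.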
Pearn et al. [36] showed that, for a specific Spk (e.g., Spk = C), the variance of Ŝ′pk is largest for on-center processes, i.e., with ξ = (μ − m)/σ = 0. Consequently, when ξ = 0 the critical value for testing H0: Spk ≤ C versus H1: Spk > C is the largest, and the test statistic T is the smallest. Hence, for practical purposes, we can obtain the test statistic (or critical value) with ξ = 0 without having to further estimate the parameter ξ (or the parameters a and b). The test statistic T obtained in this way is increasing in Ŝpk, so no contradiction arises.
Pearn et al. [36] listed, in Table III of the published paper, the critical values c0 of the Ŝ′pk approach, which were obtained from the probability
$$\Pr\{\hat S'_{pk} \ge c_0 \mid S_{pk} \le C \text{ and } \xi = 0\} \le \alpha.$$
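Under the normal approximation with ξ = 0, we have a = √2·3Spk·φ(3Spk) and b = 0. Hence Ŝ′pk ~ N(Spk, Spk²/(2n)). The rejection probability increases in Spk, because both the mean and the spread grow while c0 > C. The condition above is therefore binding at Spk = C, which gives the closed form c0 = C(1 + zα/√(2n)). This is a sketch derived only from the stated approximation. We have not compared it against Table III of Pearn et al. [36], and the function name is ours.

```python
import math
from statistics import NormalDist


def critical_value_on_center(c: float, n: int, alpha: float = 0.05) -> float:
    """c0 with Pr{S'_pk >= c0 | Spk = C, xi = 0} = alpha under
    S'_pk ~ N(C, C^2 / (2n)), the on-center form of the normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper 100*alpha% point
    return c * (1 + z / math.sqrt(2 * n))


if __name__ == "__main__":
    for n in (30, 50, 150):
        print(n, round(critical_value_on_center(1.0, n), 4))
```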
Lee et al. [26] showed that the normal distribution of Ŝ′pk provides an adequate approximation to the actual distribution of Ŝpk for a sufficiently large sample size. However, Pearn et al. [36] noted that the normal approximation significantly under-calculates the critical values for small sample sizes. They recommended a sample size greater than 150 in real applications, for which the under-calculation is at most 0.02. Since the critical value of the Ŝ′pk approach is significantly under-calculated for small sample sizes, an improvement is needed.
3.4. Convolution Approximation of Ŝpk: Ŝ″pk
The critical value obtained from the normal approximation is significantly under-calculated for small sample sizes. We therefore seek an improvement by considering a convolution approximation of the estimated Spk. First, we define the two random variables
$$Z = \frac{\sqrt{n}\,(\bar X - \mu)}{\sigma} \quad\text{and}\quad Y = \frac{\sqrt{n}\,(S^2 - \sigma^2)}{2\sigma^2}.$$
The two random variables Z and Y are independent, since X̄ and S² are independent for samples from a normal process. Because the process is normal, Z follows the standard normal distribution N(0, 1) exactly (for non-normal data this would hold only approximately, by the Central Limit Theorem). Y can be expressed as a function of a chi-square random variable with n − 1 degrees of freedom, i.e.,
$$Z \sim N(0,1), \qquad Y \sim \frac{\sqrt{n}\,\left(\chi^2_{n-1} - (n-1)\right)}{2(n-1)}.$$
Then, we can rewrite Ŝpk as the following analytical expansion:
$$\hat S_{pk} = S_{pk} + D_1 Z + D_2 Y + D_3 Z^2 + D_4 ZY + D_5 Y^2 + O_p\left(\frac{1}{n\sqrt{n}}\right),$$
where
$$D_1 = -\frac{\lambda_0}{6\sqrt{n}\,\phi(3S_{pk})}, \qquad D_2 = -\frac{\lambda_1}{6\sqrt{n}\,\phi(3S_{pk})}, \qquad D_3 = \frac{1}{n}\left(\frac{S_{pk}\lambda_0^2}{8[\phi(3S_{pk})]^2} - \frac{\lambda_1}{12\phi(3S_{pk})}\right),$$
$$D_4 = \frac{1}{n}\left(\frac{\lambda_0 - \lambda_2}{6\phi(3S_{pk})} + \frac{S_{pk}\lambda_0\lambda_1}{4[\phi(3S_{pk})]^2}\right), \qquad D_5 = \frac{1}{n}\left(\frac{3\lambda_1 - \lambda_3}{12\phi(3S_{pk})} + \frac{S_{pk}\lambda_1^2}{8[\phi(3S_{pk})]^2}\right),$$
and
$$\lambda_k = \left(\frac{1-C_{dr}}{C_{dp}}\right)^{k}\phi\left(\frac{1-C_{dr}}{C_{dp}}\right) + (-1)^{k+1}\left(\frac{1+C_{dr}}{C_{dp}}\right)^{k}\phi\left(\frac{1+C_{dr}}{C_{dp}}\right), \qquad k = 0, 1, 2, 3.$$
Let $\hat S''_{pk} = S_{pk} + D_1 Z + D_2 Y + D_3 Z^2 + D_4 ZY + D_5 Y^2$. The cumulative distribution function (CDF) of Ŝ″pk, $F_{\hat S''_{pk}}(x)$, can then be derived from the probability
$$F_{\hat S''_{pk}}(x) = \Pr\{\hat S''_{pk} - x \le 0\} = \Pr\left\{D_3\left(Z + \frac{D_1 + D_4 Y}{2D_3}\right)^2 + \frac{E_1 (Y + E_3)^2}{4D_3} - \frac{\Delta(x)}{4D_3E_1} \le 0\right\},$$
where $E_1 = 4D_3D_5 - D_4^2$, $E_2 = D_1D_4 - 2D_2D_3$, $E_3 = -E_2/E_1$, $E_4 = 4D_3S_{pk} - D_1^2$, and $\Delta(x) = E_2^2 - E_1(E_4 - 4D_3x)$. The explicit form of the CDF of Ŝ″pk is presented in the Appendix. The CDF of Ŝ″pk consists of eight parts according to the signs of D3, E1, and y0 + E3, where $y_0 = -\sqrt{n}/2$ is the minimal value of the variable Y. Applying Leibniz's rule for differentiating an integral, we can also obtain the probability density function (PDF) of Ŝ″pk.
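The expansion and its CDF can also be evaluated numerically. The sketch below is ours and does not reproduce the Appendix's closed-form eight-part CDF. It (i) evaluates D1 to D5 from the formulas above, (ii) compares Ŝ″pk with the exact estimator at one (Z, Y) point, using X̄ = μ + σZ/√n and S² = σ²(1 + 2Y/√n), and (iii) computes F(x) by conditioning on Y. Given Y = y, the event Ŝ″pk ≤ x is a quadratic inequality in Z ~ N(0, 1), whose probability follows from its roots. The outer expectation over Y = √n(W − k)/(2k), with W ~ χ²ₖ and k = n − 1, is taken by Simpson's rule. All function names and the integration settings are our assumptions.

```python
import math
from statistics import NormalDist

_STD = NormalDist()
_SQ2 = math.sqrt(2)


def spk_uv(u, v):
    """(1/3) Phi^{-1}{Phi(u)/2 + Phi(v)/2}, evaluated through upper tails."""
    tail = 0.25 * (math.erfc(u / _SQ2) + math.erfc(v / _SQ2))
    return -_STD.inv_cdf(tail) / 3


def expansion(cdr, cdp, n):
    """Spk and (D1, ..., D5) as defined in the text, from lambda_0..lambda_3."""
    u0, v0 = (1 - cdr) / cdp, (1 + cdr) / cdp
    lam = [u0**k * _STD.pdf(u0) + (-1) ** (k + 1) * v0**k * _STD.pdf(v0) for k in range(4)]
    spk = spk_uv(u0, v0)
    f = _STD.pdf(3 * spk)
    d1 = -lam[0] / (6 * math.sqrt(n) * f)
    d2 = -lam[1] / (6 * math.sqrt(n) * f)
    d3 = (spk * lam[0] ** 2 / (8 * f * f) - lam[1] / (12 * f)) / n
    d4 = ((lam[0] - lam[2]) / (6 * f) + spk * lam[0] * lam[1] / (4 * f * f)) / n
    d5 = ((3 * lam[1] - lam[3]) / (12 * f) + spk * lam[1] ** 2 / (8 * f * f)) / n
    return spk, (d1, d2, d3, d4, d5)


def s_hat_exact(cdr, cdp, n, z, y):
    """Exact estimator for given Z, Y (requires 1 + 2Y/sqrt(n) > 0)."""
    u0, v0 = (1 - cdr) / cdp, (1 + cdr) / cdp
    zp, s = z / math.sqrt(n), math.sqrt(1 + 2 * y / math.sqrt(n))
    return spk_uv((u0 - zp) / s, (v0 + zp) / s)


def s_hat_2nd(spk, d, z, y):
    """Second-order approximation S''_pk = Spk + D1 Z + D2 Y + D3 Z^2 + D4 ZY + D5 Y^2."""
    d1, d2, d3, d4, d5 = d
    return spk + d1 * z + d2 * y + d3 * z * z + d4 * z * y + d5 * y * y


def prob_quadratic_le0(a, b, c):
    """Pr{a Z^2 + b Z + c <= 0} for Z ~ N(0, 1), assuming a != 0."""
    disc = b * b - 4 * a * c
    if disc <= 0:
        return 0.0 if a > 0 else 1.0  # the quadratic never changes sign
    r = math.sqrt(disc)
    lo, hi = sorted(((-b - r) / (2 * a), (-b + r) / (2 * a)))
    inside = _STD.cdf(hi) - _STD.cdf(lo)
    return inside if a > 0 else 1.0 - inside


def cdf_s2(x, spk, d, n, steps=4000):
    """F(x) = E_W[Pr{S''_pk <= x | Y}], Y = sqrt(n)(W - k)/(2k), W ~ chi2_k, k = n - 1.

    Composite Simpson's rule over W; `steps` must be even."""
    d1, d2, d3, d4, d5 = d
    k = n - 1
    lo, hi = max(0.0, k - 12 * math.sqrt(2 * k)), k + 12 * math.sqrt(2 * k)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        w = lo + i * h
        if w <= 0:
            continue  # chi-square density is 0 at w = 0 when k > 2
        log_pdf = (k / 2 - 1) * math.log(w) - w / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
        y = math.sqrt(n) * (w - k) / (2 * k)
        p = prob_quadratic_le0(d3, d1 + d4 * y, spk - x + d2 * y + d5 * y * y)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * p * math.exp(log_pdf)
    return total * h / 3
```

For an on-center process with Spk = 1 and n = 50, `expansion(0.0, 1/3, 50)` gives D1 = D4 = 0, D2 = −1/√50, D3 = −0.01, and D5 = 0.03. The D2 and D5 values agree with the second-order expansion of Ĉp = Cp/(S/σ).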
Again, we consider the following hypotheses:
H0 : Spk≤ C, a specified value; H1 : Spk > C.