A comparison of two methods for transforming non-normal manufacturing data


Int J Adv Manuf Technol DOI 10.1007/s00170-005-0279-3

ORIGINAL ARTICLE

S. H. Chung . W. L. Pearn . Y. S. Yang


Received: 5 February 2004 / Accepted: 5 May 2005 / Published online: 22 March 2006 © Springer-Verlag London Limited 2006

Abstract Many statistical methods applied to manufacturing quality control and operations management have been developed under the assumption that the process characteristic investigated is normally distributed. If the process characteristic is not normally distributed, a popular approach is to transform the non-normal data into normal data. In this paper, we consider the Box-Cox transformation and compare the transformation power using two different parameter estimation methods: the maximum likelihood estimator (MLE) and the method of percentiles (MOP). The performance comparison is based on the pass rate under the Shapiro-Wilk normality test. The results show that, in general, the MOP has a better pass rate, while the MLE has smaller power variation for most cases investigated. For small sample sizes (n=5, 10), both methods perform equally well. For large sample sizes, the MOP is recommended due to its simplicity and significantly higher pass rate.

Keywords Box-Cox transformation . Non-normal distribution . Parameter estimation

1 Introduction

For most industrial applications, normality is assumed because of its analytical convenience and the availability of effective statistical methods. However, for many engineering operations, such as those involving locating pins or automatic sensors, the manufacturing data are often truncated or otherwise appear to be non-normal. Pyzdek [9] gave an example of non-normal data, performed a process performance analysis, and demonstrated how the non-normal characteristic can significantly affect the results and conclusions of the data analysis, thus conveying incorrect process information. If the process characteristic is not normally distributed, a popular approach is to transform the non-normal data into normal data. Box and Cox [1] modified the family of power transformations introduced by Tukey [13] which, in its simple form, consists of the transformations $T_\lambda: y \to y^{(\lambda)}$ defined as

$$ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y, & \lambda = 0 \end{cases} \tag{1.1} $$

The transformation in Eq. 1.1 is defined for y > 0. It is hoped that, for some value of λ, the transformed data can be fitted to a normal distribution. Box and Cox [1] considered the maximum likelihood method and the Bayesian approach for estimating the parameter λ. Draper and Cox [2] derived an analytical expression for the accuracy of the maximum likelihood estimate of λ. Hinkley [5] proposed an analytical procedure to estimate the transformation parameter based on order statistics. Hinkley [5] further proposed a simple method for choosing an appropriate power transformation for non-normal data. Hernandes and Johnson [4] introduced an approach that optimizes the normal transformation by minimizing the Kullback-Leibler information; however, their procedure for selecting the best value of λ uses the information function, and is therefore restricted to cases with a large sample size n. Taylor [12] used the skewness coefficient as a measure of process symmetry to estimate the transformation power, and showed that the use of the skewness coefficient could significantly improve the transformation power for some special cases. Han [3] investigated a non-parametric approach and proposed an estimator for the transformation based on Kendall's rank correlation. Because of the complexity of computing the maximum likelihood estimate of an extended Box-Cox model, Ogwang [8] proposed a simple algorithm which accommodates the classical model in Box and Cox [1]. An illustrative example is presented in Ogwang [8] to demonstrate the good performance of the algorithm.
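As a concrete reading of Eq. 1.1, the transformation can be sketched in a few lines of Python. The function names `box_cox` and `box_cox_sample` are illustrative choices, not part of the paper; the sketch only assumes that the data are strictly positive, as Eq. 1.1 requires.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of Eq. 1.1 for a single positive value y."""
    if y <= 0:
        raise ValueError("Eq. 1.1 is defined only for y > 0")
    if lam == 0:
        # The limiting case of (y**lam - 1) / lam as lam tends to zero.
        return math.log(y)
    return (y ** lam - 1.0) / lam


def box_cox_sample(sample, lam):
    """Apply Eq. 1.1 element-wise to a sample (hypothetical helper)."""
    return [box_cox(y, lam) for y in sample]
```

For example, `box_cox(0.32, 0.0)` is simply the natural logarithm of 0.32, and `box_cox(4.0, 0.5)` evaluates (2 - 1)/0.5.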

In this paper we consider the Box-Cox transformation [1] and compare the transformation power using two different parameter estimation methods: the maximum likelihood estimator (MLE) and the method of percentiles (MOP). The performance comparison is based on the pass rate under the Shapiro-Wilk normality test [11].

S. H. Chung (*) . W. L. Pearn . Y. S. Yang
Department of Industrial Engineering & Management, National Chiao Tung University, Taiwan, Republic of China

2 Box-Cox transformation based on MLE

Suppose that $Y_1, \ldots, Y_n$ are continuous, positive, independent and identically distributed random variables. The restriction to positive values is necessary for the transformation in Eq. 1.1 to be valid. Consider the power transformation in Eq. 1.1 and its extension, simplified from Han [3]:

$$ y_i^{(\lambda)} = \beta + e_i, \quad i = 1, 2, \ldots, n, \tag{2.1} $$

where β is a constant and $e_i$ is the error term. Assume that the errors are independent and approximately normally distributed with mean zero and variance $\sigma^2$. The Jacobian of the transformation from $e_i$ to $y_i$ is $y_i^{\lambda - 1}$, and the log-likelihood of the observed sample, $y_1, y_2, \ldots, y_n$, is given by

$$ L(\lambda) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[y_i^{(\lambda)} - \beta\right]^2 + (\lambda - 1)\sum_{i=1}^{n}\ln y_i \tag{2.2} $$

The maximum likelihood estimator can be obtained by maximizing the log-likelihood function of Eq. 2.2 with respect to β, $\sigma^2$ and λ. If each observation is divided by the geometric mean of the sample, $\dot{y} = \exp\left(n^{-1}\sum \ln y_i\right)$, the Jacobian term of Eq. 2.2 becomes zero, because the logarithms of the scaled observations sum to zero. Schlesselman [10] noted that the constant β in the model is essential to the scale invariance property of the Box-Cox transformation. This implies that we can obtain the maximum likelihood estimate of the parameter λ from the scaled data, and then retrieve λ of the original unscaled transformation. Correspondingly, dividing both sides of Eq. 2.1 by $\dot{y}^{\lambda}$, with some manipulation we obtain the following:

$$ y_i^{*(\lambda)} = \beta^* + e_i^*, \quad i = 1, 2, \ldots, n \tag{2.3} $$

where $y_i^* = y_i / \dot{y}$, $\beta^* = \left[\beta - \dot{y}^{(\lambda)}\right] / \dot{y}^{\lambda}$, and $e_i^* = e_i / \dot{y}^{\lambda}$. The log-likelihood of the scaled sample $y_1^*, \ldots, y_n^*$ is given by

$$ L^*(\lambda) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^{*2} - \frac{1}{2\sigma^{*2}}\sum_{i=1}^{n}\left[y_i^{*(\lambda)} - \beta^*\right]^2 \tag{2.4} $$

where $\sigma^{*2}$ is the variance of $e_i^*$. The maximum likelihood estimators of the parameters in Eq. 2.3 can be derived by maximizing Eq. 2.4 with respect to $\beta^*$, $\sigma^{*2}$ and λ. Taking partial derivatives with respect to $\beta^*$, $\sigma^{*2}$ and λ, and setting them to zero, we obtain the following normal equations:

$$ \sum_{i=1}^{n} e_i^* = 0 \tag{2.5} $$

$$ \sigma^{*2} = \sum_{i=1}^{n} e_i^{*2} \Big/ n \tag{2.6} $$

$$ \frac{1}{\lambda^2\sigma^{*2}}\sum_{i=1}^{n} e_i^*\left[\lambda y_i^{*\lambda}\ln y_i^* - \left(y_i^{*\lambda} - 1\right)\right] = 0 \tag{2.7} $$

From Eq. 2.5, we solve for the maximum likelihood estimator of $\beta^*$ to obtain

$$ \hat{\beta}^* = \bar{y}^{*(\lambda)} \tag{2.8} $$

where $\bar{y}^{*(\lambda)}$ is the mean of the transformed scaled observations. By plugging Eq. 2.8 into Eq. 2.6, the maximum likelihood estimator of $\sigma^{*2}$ is

$$ \hat{\sigma}^{*2} = \sum_{i=1}^{n}\left[y_i^{*(\lambda)} - \bar{y}^{*(\lambda)}\right]^2 \Big/ n \tag{2.9} $$
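The scaling by the geometric mean and the plug-in estimates of Eqs. 2.8 and 2.9 can be illustrated with a short Python sketch. All function names here are hypothetical. The grid search at the end is not the procedure developed in this paper; it is a plain, swapped-in illustration of what maximizing Eq. 2.4 over λ means, with the grid range and step chosen as assumptions.

```python
import math


def scale_by_geometric_mean(sample):
    """Return y_i / ydot, where ydot is the geometric mean of the sample."""
    log_mean = sum(math.log(y) for y in sample) / len(sample)
    ydot = math.exp(log_mean)
    return [y / ydot for y in sample]


def transform(values, lam):
    """Eq. 1.1 applied element-wise (natural log when lam == 0)."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def profile_log_likelihood(sample, lam):
    """Eq. 2.4 with beta* and sigma*^2 replaced by Eqs. 2.8 and 2.9."""
    n = len(sample)
    z = transform(scale_by_geometric_mean(sample), lam)
    beta_hat = sum(z) / n                                  # Eq. 2.8
    sigma2_hat = sum((v - beta_hat) ** 2 for v in z) / n   # Eq. 2.9
    # With the plug-in variance, the sum-of-squares term reduces to n / 2.
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(sigma2_hat)
            - 0.5 * n)


def grid_search_lambda(sample, lo=-2.0, hi=2.0, step=0.05):
    """Assumption for illustration: choose the grid value of lambda
    with the largest profile log-likelihood."""
    k_lo, k_hi = round(lo / step), round(hi / step)
    grid = [k * step for k in range(k_lo, k_hi + 1)]
    return max(grid, key=lambda lam: profile_log_likelihood(sample, lam))
```

After scaling, the logarithms of the observations sum to zero, which is why the Jacobian term of Eq. 2.2 no longer appears in Eq. 2.4.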

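Multiplying both sides of Eq. 2.7 by $\lambda\sigma^{*2}$, and setting $\hat{e}_i^* = y_i^{*(\lambda)} - \bar{y}^{*(\lambda)}$, we obtain

$$ \sum_{i=1}^{n}\hat{e}_i^*\left[y_i^{*\lambda}\ln y_i^* - y_i^{*(\lambda)}\right] = \sum_{i=1}^{n}\hat{e}_i^*\left[y_i^{*\lambda}\ln y_i^* - \hat{\beta}^* - \hat{e}_i^*\right] = 0 \tag{2.10} $$

Since Eq. 2.5 implies $\hat{\beta}^*\sum_{i=1}^{n}\hat{e}_i^* = 0$, Eq. 2.10 can be rewritten as

$$ \sum_{i=1}^{n}\hat{e}_i^*\left[y_i^{*\lambda}\ln y_i^* - \hat{e}_i^*\right] = \sum_{i=1}^{n}\hat{e}_i^*\left[f(\lambda) - \hat{e}_i^*\right] = 0 \tag{2.11} $$

where $f(\lambda) = y_i^{*\lambda}\ln y_i^*$.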

Ogwang [8] suggested using the first-order Taylor expansion to approximate f(λ) around $\lambda_0$, to avoid tedious computation; this provides a unique solution that is as good as that obtained using higher-order expansions. Therefore, Eq. 2.11 can be rewritten and solved for λ as:

$$ \hat{\lambda} = \frac{\sum_{i=1}^{n}\hat{e}_i^{*2} + \sum_{i=1}^{n}\hat{e}_i^*\left[\lambda_0 f'(\lambda_0) - f(\lambda_0)\right]}{\sum_{i=1}^{n}\hat{e}_i^* f'(\lambda_0)} \tag{2.12} $$
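The step from Eq. 2.11 to Eq. 2.12 can be written out as follows. This is a sketch that assumes the residuals $\hat{e}_i^*$ are held fixed at their values for $\lambda_0$ and that $f(\lambda) = y_i^{*\lambda}\ln y_i^*$ for each observation, so that $f'(\lambda_0)$ follows by ordinary differentiation:

```latex
\begin{align*}
f(\lambda) &\approx f(\lambda_0) + f'(\lambda_0)\,(\lambda - \lambda_0),
  \qquad f'(\lambda_0) = y_i^{*\lambda_0}\left(\ln y_i^*\right)^2, \\
0 &= \sum_{i=1}^{n} \hat{e}_i^*\left[f(\lambda_0) + f'(\lambda_0)\,(\lambda - \lambda_0) - \hat{e}_i^*\right], \\
\lambda \sum_{i=1}^{n} \hat{e}_i^* f'(\lambda_0)
  &= \sum_{i=1}^{n} \hat{e}_i^{*2}
   + \sum_{i=1}^{n} \hat{e}_i^*\left[\lambda_0 f'(\lambda_0) - f(\lambda_0)\right].
\end{align*}
```

Dividing the last line by $\sum_{i=1}^{n}\hat{e}_i^* f'(\lambda_0)$ gives the expression in Eq. 2.12.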

Based on the above estimation, we develop the following procedure, which is simple for practitioners to implement, to calculate the power of the Box-Cox transformation using the maximum likelihood estimator:

Step 1

Choose a number as an initial guess $\lambda_0$ of λ for a given random sample.

Step 2

Transform the original sample using Eq. 1.1 with parameter $\lambda_0$, then calculate $\hat{\beta}^*$ and $\hat{\sigma}^{*2}$ based on Eqs. 2.8 and 2.9, respectively.

Step 3

Use Eq. 2.12 to solve for a current value $\lambda_c$ of λ.

Step 4

Check whether the difference between $\lambda_c$ and $\lambda_0$ is less than a predetermined precision level. If not, reset $\lambda_0$ to $\lambda_c$ and repeat steps 2 and 3 until the difference between $\lambda_c$ and $\lambda_0$ is smaller than the predetermined precision level.

Step 5

Use the $\lambda_c$ obtained in step 4 as the optimal value for $\hat{\lambda}$. Apply the Shapiro-Wilk test to check the normality of the transformed sample.
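Steps 1 to 4 above can be sketched in Python as follows. This is one possible reading of the procedure and not the authors' implementation: the function names, the default starting value of 1.0, the tolerance and the iteration cap are all assumptions, and the normality check of step 5 is left out.

```python
import math


def mle_update(sample, lam0):
    """One application of Eq. 2.12, starting from lam0 (steps 2 and 3)."""
    n = len(sample)
    log_mean = sum(math.log(y) for y in sample) / n
    u = [math.log(y) - log_mean for y in sample]        # ln y_i* (scaled data)
    if lam0 == 0:
        z = u[:]                                        # Eq. 1.1 at lambda = 0
    else:
        z = [(math.exp(lam0 * ui) - 1.0) / lam0 for ui in u]
    beta_hat = sum(z) / n                               # Eq. 2.8
    e = [zi - beta_hat for zi in z]                     # residuals e_i*
    f = [math.exp(lam0 * ui) * ui for ui in u]          # f(lam0)
    fp = [math.exp(lam0 * ui) * ui * ui for ui in u]    # f'(lam0)
    num = sum(ei * ei for ei in e) + sum(
        ei * (lam0 * fpi - fi) for ei, fpi, fi in zip(e, fp, f))
    den = sum(ei * fpi for ei, fpi in zip(e, fp))
    if den == 0:
        raise ZeroDivisionError("denominator of Eq. 2.12 is zero")
    return num / den


def estimate_lambda_mle(sample, lam0=1.0, tol=1e-6, max_iter=200):
    """Iterate Eq. 2.12 until successive values differ by less than tol.

    lam0, tol and max_iter are assumptions of this sketch. If the cap is
    reached, the last value is returned without a convergence guarantee.
    """
    for _ in range(max_iter):
        lam_c = mle_update(sample, lam0)
        if abs(lam_c - lam0) < tol:
            return lam_c
        lam0 = lam_c
    return lam0
```

One design point worth noting: at $\lambda_0 = 0$ the bracket in Eq. 2.11 vanishes for every observation, because the transformed scaled value is then $\ln y_i^*$ and its mean is zero. A start at exactly zero therefore stays at zero in this sketch, which is why a nonzero default start is used.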

3 Box-Cox transformation based on percentile estimator

Hinkley [5] proposed an alternative method that requires much less calculation than the maximum likelihood method. We note that the method discussed in Hinkley [5] is also sensitive to outliers. Assume that the sample $Y_1, \ldots, Y_n$ can be described by Eq. 2.1 with the common distribution function F(y). The percentile $\eta_p$ is defined by $F(\eta_p) = p$ (0 < p < 1). If there is a λ such that the transformed data $y^{(\lambda)}$ defined in Eqs. 1.1 and 2.1 follow a normal distribution, then the transformed $\eta_p$ and $\eta_{1-p}$ percentiles will be symmetric about the median. This property suggests using the order statistics of the random sample that correspond to the tail probabilities p and 1 − p for some p. As pointed out by Hinkley [5], $y^{(\lambda)}$ is seldom perfectly symmetrical for most λ. However, we anticipate that there might be a value of λ that makes the transformed data nearly symmetric. Therefore, we look for the power transformation in Eq. 1.1 for which

$$ \eta_{0.5}^{\lambda} - \eta_{p}^{\lambda} = \eta_{1-p}^{\lambda} - \eta_{0.5}^{\lambda} \tag{3.1} $$

Let $Y_{(1)}, \ldots, Y_{(n)}$ be the ordered values and $\tilde{Y}$ the median of $Y_1, \ldots, Y_n$. Clearly, except for the trivial solution, $\tilde{Y} = Y_{(r)} = Y_{(n-r+1)}$, other solutions for λ must be found by solving the following Eq. 3.2:

$$ \tilde{Y}^{\lambda} - Y_{(r)}^{\lambda} = Y_{(n-r+1)}^{\lambda} - \tilde{Y}^{\lambda}, \quad \text{where } r = [np] \tag{3.2} $$

Equation 3.2 provides exact solutions for λ. Another solution to Eq. 3.2 is λ = 0. This occurs when

$$ \tilde{Y} \big/ Y_{(r)} = Y_{(n-r+1)} \big/ \tilde{Y} \tag{3.3} $$

which is the condition for the sample percentiles of ln y to be symmetric about the median. If λ ≠ 0, dividing both sides of Eq. 3.2 by $\tilde{Y}^{\lambda}$ gives the form below:

$$ \left(Y_{(r)} \big/ \tilde{Y}\right)^{\lambda} + \left(Y_{(n-r+1)} \big/ \tilde{Y}\right)^{\lambda} = 2 \tag{3.4} $$

Hinkley [5] showed that there must exist one nonzero solution to Eq. 3.4, and suggested using multiple values of p, say $p_1 < \ldots < p_m < 1/2$, to obtain multiple instances of Eq. 3.4, and then summing them to obtain a reasonably efficient estimator of λ. The resulting equation is given below:

$$ \sum_{j=1}^{m} c_j\left[\left(\frac{Y_{(r_j)}}{\tilde{Y}}\right)^{\lambda} + \left(\frac{Y_{(n-r_j+1)}}{\tilde{Y}}\right)^{\lambda}\right] = 2\sum_{j=1}^{m} c_j \tag{3.5} $$

where $r_j = [np_j]$ and $c_1, \ldots, c_m$ are arbitrary weights. Hinkley [5] showed that using equal weights $c_j$, with $p_j \le 0.05$ and m = 3, which averages out the asymmetry of $y^{(\lambda)}$, increases the precision of the transformation. We may solve Eq. 3.5 for a good approximation of λ. However, if the following condition is met, then the solution $\hat{\lambda} = 0$ is chosen:

$$ \sum_{j=1}^{m} c_j \ln\left(Y_{(r_j)}\, Y_{(n-r_j+1)}\right) = 2\sum_{j=1}^{m} c_j \ln\tilde{Y} \tag{3.6} $$
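A minimal sketch of how Eq. 3.5 might be solved numerically is given below, assuming equal weights $c_j = 1$. The bisection search, the function names and the guard that keeps $r_j \ge 1$ are choices made for this illustration; they are not taken from Hinkley [5] or from this paper. The sketch relies on the fact that the left side of Eq. 3.5 minus the right side is zero at λ = 0 and is a sum of exponentials in λ, so the sign of its slope at zero indicates on which side the nonzero root lies.

```python
import math
import statistics


def percentile_pairs(sample, probs):
    """Return (Y_(r)/median, Y_(n-r+1)/median) for r = [n p], p in probs."""
    ys = sorted(sample)
    n = len(ys)
    med = statistics.median(ys)
    pairs = []
    for p in probs:
        r = max(1, math.floor(n * p))   # guard (assumption): keep r >= 1
        pairs.append((ys[r - 1] / med, ys[n - r] / med))
    return pairs


def h(lam, pairs):
    """Left side minus right side of Eq. 3.5 with equal weights c_j = 1."""
    return sum(a ** lam + b ** lam - 2.0 for a, b in pairs)


def solve_percentile_lambda(pairs):
    """Nonzero root of Eq. 3.5 by bisection; 0.0 if Eq. 3.6 holds."""
    slope0 = sum(math.log(a) + math.log(b) for a, b in pairs)  # h'(0)
    if abs(slope0) < 1e-12:
        return 0.0                       # condition of Eq. 3.6
    if slope0 > 0:
        # The root is negative: solve the mirrored problem and flip the sign.
        return -solve_percentile_lambda([(1.0 / b, 1.0 / a) for a, b in pairs])
    hi = 1.0
    for _ in range(60):                  # expand until h(hi) > 0
        if h(hi, pairs) > 0:
            break
        hi *= 2.0
    else:
        raise ValueError("no sign change found for Eq. 3.5")
    lo = hi
    for _ in range(200):                 # shrink until h(lo) < 0
        if h(lo, pairs) < 0:
            break
        lo /= 2.0
    for _ in range(200):                 # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid, pairs) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A call such as `solve_percentile_lambda(percentile_pairs(sample, [0.01, 0.03, 0.05]))` would follow the suggestion of m = 3 and $p_j \le 0.05$, although for small n several $r_j$ may then coincide.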

Hinkley [6] proposed a similar method for choosing a symmetrizing transformation based on the degree of asymmetry of the sample, which is measured by

$$ d = \frac{\text{sample mean} - \text{sample median}}{\text{sample scale}} \tag{3.7} $$

If the underlying distribution is symmetric, then the mean and the median must be identical. Thus, sample data drawn from such a distribution should reflect this property, and a good estimate of λ should minimize the magnitude of d.


Based on the above argument, with λ restricted to −2 ≤ λ ≤ 2 (as recommended by Tukey [13]), a step-by-step procedure for computing the power of the Box-Cox transformation based on MOP may be presented as follows:

Step 1

Choose −2 as an initial guess $\lambda_0$ of λ for a given random sample.

Step 2

Transform the original sample by taking the power $\lambda_0$ and then find the sample mean, $\bar{Y}^{(\lambda)}$; the sample median, $\tilde{Y}^{(\lambda)}$; and the sample inter-quartile range, r, for the transformed random sample $Y_1^{(\lambda)}, Y_2^{(\lambda)}, \ldots, Y_n^{(\lambda)}$.

Step 3

Calculate d defined in Eq. 3.7 using the inter-quartile range as the sample scale.

Step 4

Check whether the magnitude of d is less than a predetermined precision level. If not, increase $\lambda_0$ by 0.05 and repeat steps 2 and 3 until the magnitude of d is smaller than the predetermined precision level.

Step 5

Use the λ derived from step 4 as the optimal estimate $\hat{\lambda}$. Employ the Shapiro-Wilk test to check the normality of the transformed sample.
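The MOP steps above can be sketched in Python as follows. This is an illustrative variant rather than the exact procedure: instead of stopping at the first λ whose |d| falls below a precision level, it scans the whole grid from −2 to 2 in steps of 0.05 and keeps the λ with the smallest |d|. The function names are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions.

```python
import math
import statistics


def box_cox_sample(sample, lam):
    """Eq. 1.1 applied element-wise (natural log when lam == 0)."""
    if lam == 0:
        return [math.log(y) for y in sample]
    return [(y ** lam - 1.0) / lam for y in sample]


def asymmetry_d(values):
    """Eq. 3.7 with the inter-quartile range as the sample scale."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (statistics.mean(values) - statistics.median(values)) / (q3 - q1)


def estimate_lambda_mop(sample, lo=-2.0, hi=2.0, step=0.05):
    """Scan lambda over [lo, hi] and return the value with the smallest |d|.

    Assumption of this sketch: the full grid is scanned rather than
    stopping at the first lambda below a precision level.
    """
    k_lo, k_hi = round(lo / step), round(hi / step)
    best_lam, best_abs_d = None, math.inf
    for k in range(k_lo, k_hi + 1):
        lam = k * step
        abs_d = abs(asymmetry_d(box_cox_sample(sample, lam)))
        if abs_d < best_abs_d:
            best_lam, best_abs_d = lam, abs_d
    return best_lam
```

For a sample whose square roots are evenly spaced, such as `[1, 4, 9, 16, 25]`, the transformed data are exactly symmetric at λ = 0.5, so the scan selects that value.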

4 Implementation and application

To illustrate the Box-Cox transformation using the two estimators, we consider the data given in Table 1, collected from a forging process that makes a specific type of piston ring for automotive engines. Before starting the data analysis, the normality assumption must be checked. Figure 1 gives a standard normal plot together with the standardized original data, which do not appear to be normal. The original data are therefore transformed to data that are likely to be normal so that the data analysis can be performed.

After estimating λ by MOP and MLE, the original data are transformed using Eq. 1.1 with powers of 0.25 and 0, respectively. The transformed piston ring data obtained using the two estimators, MOP and MLE, are given in Tables 2 and 3, respectively. In Fig. 2, the curve marked by crosses is the normal plot for the data transformed using MOP, and the curve marked by triangles is the one for the data transformed using MLE. The near normality of both sets of transformed data indicates that the transformation is effective; the normality assumption required for applying the statistical methods may be regarded as satisfied, and the manufacturing quality analysis can be performed.

5 Comparison of the transformation power

Table 1 Thirty piston ring diameters for automotive engines

0.32 0.47 0.52 0.59 0.77 0.81 0.81 0.90 0.96 1.18
1.20 1.20 1.31 1.35 1.43 1.51 1.62 1.74 1.87 1.89
1.95 2.05 2.10 2.20 2.48 2.81 3.00 3.09 3.37 4.75

Fig. 1 Normal plot of standardized original data indicated by circles

Table 2 The thirty diameters of the piston rings transformed by MOP

2.0085 2.3120 2.3967 2.5057 2.7470 2.7947 2.7947 2.8960 2.9594 3.1690
3.1865 3.1865 3.2794 3.3116 3.3742 3.4341 3.5127 3.5941 3.6776 3.6900
3.7268 3.7863 3.8152 3.8715 4.0196 4.1789 4.2643 4.3033 4.4196 4.9052

Table 3 The thirty diameters of the piston rings transformed by MLE

–1.1394 –0.7550 –0.6539 –0.5276 –0.2614 –0.2107 –0.2107 –0.1054 –0.0408 0.1655
0.1823 0.1823 0.2700 0.3001 0.3577 0.4121 0.4824 0.5539 0.6259 0.6366
0.6678 0.7178 0.7419 0.7885 0.9083 1.0332 1.0986 1.1282 1.2149 1.5581

To compare the transformation power of the two estimators (MLE and MOP), we select a set of distributions that are widely used in engineering applications to model different process characteristics. These distributions can be grouped into three categories: (1) negatively skewed distributions, including the beta (5, 1) distribution; (2) positively skewed distributions, including the gamma, F, and lognormal distributions; and (3) symmetric distributions, including the beta (0.5, 0.5) distribution, uniform distributions with flat kurtosis, and T distributions with sharp kurtosis. To obtain useful information, 1,000 samples of each of the sizes 5, 10, 25, 50, 100, and 500 are generated for each distribution in the simulation. Each sample is transformed using the optimal $\hat{\lambda}$ obtained with each of the two estimators. To test the normality of the transformed data, the Shapiro-Wilk test is employed; Madansky [7] showed that the Shapiro-Wilk test is, in general, more powerful than other goodness-of-fit tests. Sample means and variances of the estimated optimal powers, $\hat{\lambda}$, are calculated, and the pass rates of the transformed data under the Shapiro-Wilk test are then computed. A flow chart illustrating the simulation procedure is presented in Fig. 3.
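The simulation loop described above can be outlined in Python as shown below. This is a skeleton under stated assumptions, not the authors' code: the estimator and the normality test are passed in as callables, all names are hypothetical, and the significance level mentioned in the comment is an assumption because the text does not state one.

```python
import math
import random
import statistics


def box_cox_sample(sample, lam):
    """Eq. 1.1 applied element-wise (natural log when lam == 0)."""
    if lam == 0:
        return [math.log(y) for y in sample]
    return [(y ** lam - 1.0) / lam for y in sample]


def simulate(sampler, estimator, passes_normality, n, n_samples=1000, seed=0):
    """Return (pass rate in %, mean of lambda-hat, variance of lambda-hat).

    sampler(rng)        draws one positive observation from the distribution
    estimator(sample)   returns lambda-hat (for example an MLE or MOP routine)
    passes_normality(z) returns True when normality of z is not rejected;
                        one option (an assumption, not stated in the text)
                        is: lambda z: scipy.stats.shapiro(z).pvalue > 0.05
    """
    rng = random.Random(seed)
    lams = []
    passed = 0
    for _ in range(n_samples):
        sample = [sampler(rng) for _ in range(n)]
        lam = estimator(sample)
        lams.append(lam)
        if passes_normality(box_cox_sample(sample, lam)):
            passed += 1
    pass_rate = 100.0 * passed / n_samples
    return pass_rate, statistics.mean(lams), statistics.variance(lams)
```

Keeping the normality test as a parameter means the skeleton runs with the standard library alone, and a Shapiro-Wilk implementation can be plugged in where one is available.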

5.1 Negatively skewed distributions

We first consider the beta distributions with heavy left tails. Figure 4 shows a random sample of size 100 generated from beta (10, 2), and Figs. 5 and 6 show the histograms of the samples transformed by MOP and MLE, respectively. In correcting the asymmetry, MOP seems to be more effective than MLE. Figure 7 gives the pass rates for beta (5, 1) for various sample sizes. MOP outperforms MLE in pass rate except for n = 5 and n = 500. In Fig. 8, we note that the transformation is rather effective using MOP with powers between 1.5 and 2, whereas MLE yields values of $\hat{\lambda}$ that are all near zero. The variation of $\hat{\lambda}$ using MOP is significantly greater than that using MLE, as seen in Fig. 9.

5.2 Positively skewed distributions

Next, we consider the positively skewed distributions, including the gamma family, F, some beta distributions, and lognormal distributions. A random sample drawn from beta (2, 10) is presented to show the transformation in Figs. 10, 11 and 12. Both MOP and MLE seem to be effective in centering the data and lightening the tails.

Among the positively skewed distributions considered, exponential (1), Weibull (5, 1), and χ²(1) are highly positively skewed. According to Fig. 13, MOP has higher pass rates than MLE in most cases for different sample sizes. MOP provides nearly constant pass rates. For MLE, the larger the sample size, the smaller the pass rate. When n is equal to 5 or 10, as shown in Fig. 14, the best choice of $\hat{\lambda}$ is zero no matter which method is employed. MLE suggests the use of the square root as the power of the transformation for the exponential and chi-square distributions, and zero for the Weibull distribution, whereas MOP advises values of $\hat{\lambda}$ of 0.25 and 1.25, respectively. The variances of $\hat{\lambda}$ estimated by both methods are consistent for most distributions, with the exception of the Weibull distribution due to its extreme skewness (Fig. 15).

Fig. 2 Normal plots of standardized transformed data indicated by crosses for MOP and triangles for MLE

Fig. 3 Flow chart for simulation procedure (boxes: generate 1,000 samples of each size for each specific distribution; perform the Shapiro-Wilk test to assess normality; if normality of the sample is rejected, use MLE and MOP to estimate λ; use the estimated λ to transform the sample as in model (1.1); perform the Shapiro-Wilk test for each transformed sample; compute the pass rates of the transformed samples that are normally distributed)

Fig. 4 A negatively-skewed random sample of size 100 drawn from beta (10, 2)

Fig. 5 Histogram of the transformed sample by MOP, $\hat{\lambda} = 2$

Fig. 6 Histogram of the transformed sample by MLE, $\hat{\lambda} = 0$

Fig. 7 Pass rates for beta (5, 1) random sample (x-axis: sample size; y-axis: pass rate (%); series: MOP, MLE)

Fig. 8 Means for $\hat{\lambda}$ values of beta (5, 1) random sample (x-axis: sample size; y-axis: mean)

Fig. 9 Variances for $\hat{\lambda}$ values of beta (5, 1) random sample (x-axis: sample size; y-axis: variance)

Fig. 10 A positive-skewed random sample of size 100 drawn from beta (2, 10)

For the F distributions, the pass rates of the normality tests are shown in Fig. 16; the pass rate appears to be uniform except when the sample size is as large as 500. No matter which method is employed, zero is the best choice for the power parameter λ when the sample size is as small as five. Figure 17 gives the estimated values of $\hat{\lambda}$ used to transform the random samples drawn from F distributions into normal ones. MOP suggests negative powers for mildly skewed F distributions, whereas MLE gives positive ones. Once again, MOP is less efficient than MLE because of its larger variances of $\hat{\lambda}$, as shown in Fig. 18. For the lognormal distributions investigated here, both methods produce much the same results, as seen in Figs. 19, 20 and 21.

Fig. 11 The sample transformed from Fig. 10 by MOP with $\hat{\lambda} = 0.1$

Fig. 12 The sample transformed from Fig. 10 by MLE with $\hat{\lambda} = 0$

Fig. 13 Pass rates for highly positively skewed random samples (x-axis: sample size; y-axis: pass rate (%); series: Exp(1), W(5,1) and Chi2(1), each for MOP and MLE)

Fig. 14 Means for $\hat{\lambda}$ values of highly positively skewed random samples (x-axis: sample size; y-axis: mean)

Fig. 15 Variances for $\hat{\lambda}$ values of highly positively skewed random samples (x-axis: sample size; y-axis: variance)

The exceptional case is the lognormal distribution with a small scale parameter, 0.1. For nearly symmetric lognormal distributions, MLE is superior to MOP in pass rate. Figures 22, 23 and 24 summarize the simulation results for the gamma, chi-square, and Rayleigh distributions. For chi-square distributions, the larger the degrees of freedom, the higher the pass rates, especially when the sample size is very large. The advantage of MOP over MLE in pass rate grows as the sample size increases. Zero is the only transformation power estimated by MLE, while the majority of the $\hat{\lambda}$ values obtained by MOP indicate transformations using powers between 0.2 and 0.3. The trends for the Rayleigh and gamma distributions are the same as those for the chi-square distributions.

Fig. 16 Pass rates for random samples of F distribution (x-axis: sample size; y-axis: pass rate (%); series: F(5,5), F(5,10) and F(10,5), each for MOP and MLE)

Fig. 17 Means for $\hat{\lambda}$ values of random samples drawn from F distributions (x-axis: sample size; y-axis: mean)

Fig. 18 Variances for $\hat{\lambda}$ values of random samples drawn from F distributions (x-axis: sample size; y-axis: variance)

Fig. 19 Pass rates for random samples of lognormal distributions (x-axis: sample size; y-axis: pass rate (%); series: LN(0,0.1), LN(0,0.5) and LN(0,1), each for MOP and MLE)

Fig. 20 Means for $\hat{\lambda}$ values of random samples drawn from lognormal distributions (x-axis: sample size; y-axis: mean)

Fig. 21 Variances for $\hat{\lambda}$ values of random samples drawn from lognormal distributions (x-axis: sample size; y-axis: variance)

5.3 Symmetric distributions

Four different symmetric distributions are included in our investigation. We consider the beta distribution with parameters (0.5, 0.5) as a symmetric one with concave heavy tails. In contrast to the beta (0.5, 0.5) distribution, the T distribution represents symmetric leptokurtic distributions. In addition, uniform distributions with constant symmetry and Rayleigh distributions with parameters 10 and 20, which are nearly symmetric, are considered. Figures 25, 26 and 27 demonstrate the transformation for the platykurtic beta (0.5, 0.5), while Figs. 28, 29 and 30 depict the leptokurtic beta (40, 40). Since the distribution shown in Fig. 25 appears to have very heavy tails, neither MOP nor MLE performs well in the normal transformation. Nevertheless, both MOP and MLE work well for the leptokurtic beta (40, 40), as shown in Figs. 28, 29 and 30.

For the beta (0.5, 0.5) and uniform distributions, Figs. 31 and 34 show the same pattern because the samples are drawn from heavy-tailed distributions. The pass rates of MOP are greater than those of MLE for moderate sample sizes from n = 10 to n = 100. The pass rates decrease as the sample size increases no matter which method is applied. Figures 32 and 35 show that the transformation power suggested by MLE for the uniform and beta distributions is zero for all sample sizes, which is the same as that suggested by MOP when the sample size is no greater than ten. With MOP, $\hat{\lambda}$ should be greater than 0.8 and less than 1 when the sample size is greater than or equal to 25. Referring to Figs. 33 and 36, the variances decrease as the sample size increases from 25, because a large sample size increases the precision of the estimation.

For the leptokurtic distributions, 1,000 samples were generated from each T distribution with different degrees of freedom. MOP and MLE seem to perform about equally in pass rate. As can be seen from Fig. 37, higher kurtosis of the distribution results in a lower pass rate. The estimated values of λ obtained using MLE (see Fig. 38) are zero, while the powers estimated by MOP are between 1.6 and 1.8 for medium sample sizes, but drop steeply to almost the square root as the sample size goes up to 500. Unlike for the platykurtic distributions, the variances increase with the sample size, as shown in Fig. 39.

Fig. 22 Pass rates for random samples of positively skewed distributions (x-axis: sample size; y-axis: pass rate (%); series: Chi2(5), Chi2(10), Gam(10,1) and Rayl(1), each for MOP and MLE)

Fig. 23 Means for $\hat{\lambda}$ values of random samples drawn from positively skewed distributions (x-axis: sample size; y-axis: mean)

Fig. 24 Variances for $\hat{\lambda}$ values of random samples drawn from positively skewed distributions (x-axis: sample size; y-axis: variance)

Fig. 25 A platykurtic and symmetric random sample with heavy tails of size 100 drawn from beta (0.5, 0.5)

Since the Rayleigh distributions considered here are very nearly symmetric, their behavior is similar to that of the T distributions in pass rates and in means of the estimated powers, as shown in Figs. 40 and 41. Only the variances do not follow the pattern of the T distributions, as shown in Fig. 42.

Fig. 28 A leptokurtic random sample of size 100 drawn from beta (40, 40)

Pass rates for beta (0.5, 0.5) (x-axis: sample size; y-axis: pass rate (%); series: MOP, MLE)

6 Conclusions and recommendations

In the practice of manufacturing quality control and operations management, many applications of statistical methods require that the process characteristic of interest be normally distributed. If the data collected are not normally distributed, using normal-based techniques may result in incorrect conclusions. For such situations, the most popular approach is to transform the non-normal data into normal data. In this paper, we considered the Box-Cox transformation and compared the transformation power using two different parameter estimation methods: the maximum likelihood estimator (MLE) and the method of percentiles (MOP). The performance comparison is based on the pass rate under the Shapiro-Wilk normality test. We note that small and medium samples generally result in higher pass rates than large samples. The results also show that, in general, the MOP has a better pass rate, while the MLE has smaller power variation for most cases investigated. For small sample sizes (n = 5, 10), both methods perform equally well. For large sample sizes, the MOP is recommended due to its simplicity and significantly higher pass rate.

Fig. 32 Means for $\hat{\lambda}$ values of random samples drawn from beta (0.5, 0.5) (x-axis: sample size; y-axis: means)

Fig. 33 Variances for $\hat{\lambda}$ values of random samples drawn from beta (0.5, 0.5) (x-axis: sample size; y-axis: variances)

Fig. 34 Pass rates for random samples of uniform distributions (x-axis: sample size; y-axis: pass rate (%); series: U(0,1), U(0,10) and U(0,20), each for MOP and MLE)

Fig. 35 Means for $\hat{\lambda}$ values of random samples drawn from uniform distributions (x-axis: sample size; y-axis: means)

Fig. 36 Variances for $\hat{\lambda}$ values of random samples drawn from uniform distributions (x-axis: sample size; y-axis: variances)

Pass rates for T distributions (x-axis: sample size; y-axis: pass rate (%); series: T(2), T(10) and T(20), each for MOP and MLE)

References

1. Box GEP, Cox DR (1964) An analysis of transformations. J Roy Stat Soc B 26:211–252
2. Draper NR, Cox DR (1969) On distributions and their transformations to normality. J Roy Stat Soc B 31:472–476
3. Han AK (1987) A non-parametric analysis of transformation. J Econometrics 35:191–209
4. Hernandes F, Johnson RA (1980) The large sample behavior of transformation to normality. J Am Stat Assoc 75:855–861
5. Hinkley DV (1975) On power transformations to symmetry. Biometrika 62(1):101–111
6. Hinkley DV (1976) On quick choice of power transformation. Appl Stat 26(1):67–69
7. Madansky A (1988) Prescriptions for working statisticians. Springer, New York
8. Ogwang T, Gouranga Rao UL (1997) A simple algorithm for estimating Box-Cox models. Statistician 46(3):399–409
9. Pyzdek T (1995) Why normal distributions aren’t [all that normal]. Qual Eng 7(4):767–777
10. Schlesselman J (1971) Power families. A note on the Box-Cox transformation. J Roy Stat Soc B 33:307–371
11. Shapiro SS, Wilk MB (1965) An analysis of variance test for normality. Biometrika 52(3–4):591–611
12. Taylor JMG (1985) Power transformations to symmetry. Biometrika 72(1):145–152
13. Tukey JW (1957) The comparative anatomy of transformations. Ann Math Stat 28:602–632

Fig. 38 Means for $\hat{\lambda}$ values of random samples drawn from T distributions (x-axis: sample size; y-axis: means)

Fig. 39 Variances for $\hat{\lambda}$ values of random samples drawn from T distributions (x-axis: sample size; y-axis: variances)

Fig. 40 Pass rates for random samples of Rayleigh distributions (x-axis: sample size; y-axis: pass rate (%); series: R(10) and R(20), each for MOP and MLE)

Fig. 41 Means for $\hat{\lambda}$ values of random samples drawn from Rayleigh distributions (x-axis: sample size; y-axis: means)

Fig. 42 Variances for $\hat{\lambda}$ values of random samples drawn from Rayleigh distributions (x-axis: sample size; y-axis: variances)