National Chiao Tung University
Institute of Statistics
Master's Thesis

Likelihood Inference under the Transformed Truncated Normal
Mode Regression Model
(變換截常態眾數迴歸模型的概似推論)

Student: Yen-Tin Sen
Advisor: Dr. Chih-Rung Chen
A Thesis
Submitted to Institute of Statistics
College of Science
National Chiao Tung University
In Partial Fulfillment of the Requirements
for the Degree of Master
in
Statistics
June 2010
Hsinchu, Taiwan
Likelihood Inference under the Transformed Truncated Normal Mode Regression Model
(變換截常態眾數迴歸模型的概似推論)

Student: Yen-Tin Sen
Advisor: Dr. Chih-Rung Chen
Institute of Statistics, National Chiao Tung University

Abstract (Chinese)

After a data set is transformed, the range of the transformed values may exclude part of the real line; in that case, the transformed data cannot satisfy the conventional normality assumption. We therefore propose a transformed truncated normal mode regression model together with its likelihood inference, apply it to two real data sets, and compare the truncated normality assumption with the conventional normality assumption. Finally, we compare the computational complexity of the transformed truncated normal mode, mean, and median regression models.
Likelihood Inference under the Transformed Truncated
Normal Mode Regression Model
Student: Yen-Tin Sen
Advisor: Dr. Chih-Rung Chen
Institute of Statistics
National Chiao Tung University
Abstract
In this thesis, likelihood inference under the transformed truncated normal mode regression model is proposed for the case where the range of the transformation may differ from the whole real line. The proposed methodology is then applied to two real data sets in Box and Cox (1964), where the truncated normality assumption is compared with the conventional normality assumption. Finally, the proposed model is compared with the transformed truncated normal mean and median regression models in terms of computational complexity.
Acknowledgements

I am most grateful to my advisor, Dr. Chih-Rung Chen, whose guidance made this thesis possible. He devoted far more effort to his students than I could have imagined, and his enthusiasm for research taught me not only how to solve problems but also the importance of care and rigor. I will take him as a model and carry that passion and spirit of inquiry forward. I also thank the faculty of the Institute for their teaching over the past two years, the administrative staff for their help when I first arrived in this new environment, and my classmates, whose mutual learning and friendly competition benefited me greatly. I am glad to have spent these two years at the Institute of Statistics with them.

Finally, I thank my parents for providing a stable life that allowed me to concentrate on my studies without other worries. It is their support, my advisor's guidance, and my classmates' care that enabled me to complete my degree smoothly.

Yen-Tin Sen
Institute of Statistics, National Chiao Tung University
June 2010
Contents

1 Introduction
2 Transformed Truncated Normal Mode Regression Model
  2.1 Transformed Truncated Normal Mode Regression Model
  2.2 Maximum Likelihood Estimation
  2.3 Hypothesis Testing and Confidence Regions
  2.4 Prediction Region of Future Observations
3 Two Real Data Sets
  3.1 A Biological Experiment Using a 3×4 Factorial Design
  3.2 A Textile Experiment Using a Single Replicate of a 3^3 Design
4 Conclusions and Discussion
List of Tables

1  Survival times (1 unit = 10 hours) of animals in a 3×4 factorial experiment.
2  MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
3  MLEs without interaction under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
4  MLEs with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
5  MLEs without interaction and with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
6  Cycles to failure of worsted yarn: a 3^3 factorial experiment without replication.
7  MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
8  MLEs without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
9  MLEs with λ = 0 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
10 MLEs with λ = 0 and without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
List of Figures

1  Some different modified power transformations.
2  (a) Residual plot against fitted values for the original data under the two-way ANOVA effects model for Example 3.1. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.1.
3  (a) Normal probability plot under the false normality assumption for Example 3.1. (b) Normal probability plot under the truncated normality assumption for Example 3.1.
4  (a) Normal probability plot under the false normality assumption without interactions for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions for Example 3.1.
5  (a) Normal probability plot under the false normality assumption with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption with λ = −1 for Example 3.1.
6  (a) Normal probability plot under the false normality assumption without interactions and with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions and with λ = −1 for Example 3.1.
7  (a) Residual plot against fitted values for the original data under the quadratic regression model for Example 3.2. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.2.
8  (a) Normal probability plot under the false normality assumption for Example 3.2. (b) Normal probability plot under the truncated normality assumption for Example 3.2.
9  (a) Normal probability plot under the false normality assumption without quadratic effects and interactions for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions for Example 3.2.
10 (a) Normal probability plot under the false normality assumption with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption with λ = 0 for Example 3.2.
11 (a) Normal probability plot under the false normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2.
1 Introduction
The techniques for linear models are justified by assuming simplicity of systematic structure, constancy of error variances, normality of distributions, and independence of responses. In analyzing data which do not satisfy the traditional assumptions for linear models, Tukey (1957) suggested two alternatives: either a new analysis must be devised or the data must be transformed to satisfy the assumptions. If a satisfactory transformation can be found, it is usually easier to use the conventional techniques for linear models to analyze the transformed data than to develop a new method to analyze the original data.
It is common practice simply to assume the following normal regression model:
$$y_i = f(x_i; \beta) + \varepsilon_i \qquad (1)$$
for $i = 1, \ldots, n$, where $y_i$ is the response for subject $i$; $x_i$ is a known covariate vector for subject $i$; $\beta$ is an unknown finite-dimensional regression parameter vector; $f(\cdot; \beta)$ is a known regression function for each $\beta$, e.g., $f(x_i; \beta) = x_i^T \beta$ or $\exp\{x_i^T \beta\}$; and the $\varepsilon_i$s are i.i.d. $N(0, \sigma^2)$ errors with unknown positive standard deviation $\sigma$. Notice that the mean, median, and mode of $y_i$ all equal $f(x_i; \beta)$ for $i = 1, \ldots, n$.
When there are heteroscedastic errors and/or departures from normality in the data, one possible approach is to transform the data. A widely used family of transformations for positive continuous data is the family of modified power transformations
$$u^{(\lambda)} \equiv \begin{cases} (u^\lambda - 1)/\lambda & \text{for } \lambda \neq 0, \\ \log(u) & \text{for } \lambda = 0. \end{cases} \qquad (2)$$
Figure 1 shows some different modified power transformations.
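The family in (2) takes only a few lines to implement; the following Python sketch (the function name `modified_power` is ours, not from the thesis) also illustrates the behavior of the family near $\lambda = 0$:

```python
import numpy as np

def modified_power(u, lam):
    """Modified power transformation (2): (u^lam - 1)/lam for lam != 0,
    log(u) for lam = 0; defined for positive u."""
    u = np.asarray(u, dtype=float)
    if lam == 0.0:
        return np.log(u)
    return (u**lam - 1.0) / lam
```

For fixed $u > 0$, $(u^\lambda - 1)/\lambda \to \log u$ as $\lambda \to 0$, so the family is continuous in $\lambda$, which is what Figure 1 displays across several values of $\lambda$.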
In such situations, Box and Cox (1964) proposed the following Box-Cox transformed linear normal regression model:
$$y_i^{(\lambda)} = x_i^T \beta + \varepsilon_i \qquad (3)$$
for $i = 1, \ldots, n$, where $y_i$ has support $(0, \infty)$ and $\lambda$ is an unknown real-valued transformation parameter.
When heteroscedastic errors and departures from normality cannot be simultaneously removed from the data by any single transformation, Carroll and Ruppert (1988) proposed the following Box-Cox transformed heteroscedastic normal regression model:
$$y_i^{(\lambda)} = f(x_i; \beta) + \varepsilon_i \qquad (4)$$
for $i = 1, \ldots, n$, where the $\varepsilon_i$s are independent errors distributed as $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ such that $z_i$ is a known covariate vector for subject $i$, e.g., $z_i$ is a known function of $x_i$; $\gamma$ is an unknown finite-dimensional parameter vector; $g(\cdot, \cdot; \gamma)$ is a known positive function for each $\gamma$, e.g., $g(f(x_i; \beta), z_i; \gamma) = \exp\{f(x_i; \beta)\, \gamma_1 + z_i^T \gamma_2\}$ with $\gamma \equiv (\gamma_1, \gamma_2^T)^T$; and $\sigma$ is an unknown positive scale parameter. In Carroll and Ruppert (1988), the constants $1/g^2(f(x_i; \beta), z_i; \gamma)$ are called the true weights. Notice that model (3) is a special case of model (4) when $f(x_i; \beta) = x_i^T \beta$ and $g(f(x_i; \beta), z_i; \gamma) = 1$ for $i = 1, \ldots, n$.
However, $y_i^{(\lambda)} \in (-1/\lambda, \infty)$ for $\lambda > 0$, $(-\infty, \infty)$ ($\equiv \mathbb{R}$) for $\lambda = 0$, and $(-\infty, -1/\lambda)$ for $\lambda < 0$, respectively. Thus, except for $\lambda = 0$, $y_i^{(\lambda)}$ cannot be normally distributed. Hence, Poirier (1978) modified model (3) to the following Box-Cox transformed linear truncated normal mode regression model:
$$y_i^{(\lambda)} = x_i^T \beta + \varepsilon_i \qquad (5)$$
for $i = 1, \ldots, n$, where the $\varepsilon_i$s are independent errors distributed as either $N(0, \sigma^2)$ for $\lambda = 0$ or truncated $N(0, \sigma^2)$ for $\lambda \neq 0$ with unknown positive scale parameter $\sigma$. Notice that, for $i = 1, \ldots, n$, $x_i^T \beta$ is the mode of $y_i^{(\lambda)}$ when it lies in the support of $y_i^{(\lambda)}$; however, it is generally neither the mean nor the median of $y_i^{(\lambda)}$.
In Chen and Wang (2003), three widely used families of transformations with ranges possibly different from $\mathbb{R}$ are reviewed as follows:

Example 1.1 The family of shifted power transformations (Box and Cox, 1964)
$$h(u; \lambda) \equiv (u - a)^{(\lambda)} = \begin{cases} [(u - a)^\lambda - 1]/\lambda & \text{for } \lambda \neq 0, \\ \log(u - a) & \text{for } \lambda = 0, \end{cases} \qquad (6)$$
can be used to transform data with known support $(a, \infty)$, where $a \in \mathbb{R}$, e.g., $a = 0$. Then the range $h((a, \infty); \lambda)$ is $(-1/\lambda, \infty)$ for $\lambda > 0$, $\mathbb{R}$ for $\lambda = 0$, and $(-\infty, -1/\lambda)$ for $\lambda < 0$, respectively. Similarly, the family of transformations
$$h(u; \lambda) \equiv -(b - u)^{(\lambda)} = \begin{cases} [1 - (b - u)^\lambda]/\lambda & \text{for } \lambda \neq 0, \\ -\log(b - u) & \text{for } \lambda = 0, \end{cases} \qquad (7)$$
can be used to transform data with known support $(-\infty, b)$, where $b \in \mathbb{R}$, e.g., $b = 0$. Then the range $h((-\infty, b); \lambda)$ is $(-\infty, 1/\lambda)$ for $\lambda > 0$, $\mathbb{R}$ for $\lambda = 0$, and $(1/\lambda, \infty)$ for $\lambda < 0$, respectively.
Example 1.2 The family of folded power transformations (Tukey, 1977)
$$h(u; \lambda) \equiv (u - a)^{(\lambda)} - (b - u)^{(\lambda)} = \begin{cases} [(u - a)^\lambda - (b - u)^\lambda]/\lambda & \text{for } \lambda \neq 0, \\ \log[(u - a)/(b - u)] & \text{for } \lambda = 0, \end{cases} \qquad (8)$$
can be used to transform data with known support $(a, b)$, where $-\infty < a < b < \infty$, e.g., $(a, b) = (0, 1)$. Then the range $h((a, b); \lambda)$ is $(-(b - a)^\lambda/\lambda, (b - a)^\lambda/\lambda)$ for $\lambda > 0$ and $\mathbb{R}$ for $\lambda \leq 0$, respectively.
Example 1.3 The family of shifted modulus power transformations (John and Draper, 1980)
$$h(u; \lambda) \equiv \mathrm{sgn}(u - \lambda_2)\,(|u - \lambda_2| + 1)^{(\lambda_1)} = \begin{cases} \mathrm{sgn}(u - \lambda_2)\,[(|u - \lambda_2| + 1)^{\lambda_1} - 1]/\lambda_1 & \text{for } \lambda_1 \neq 0, \\ \mathrm{sgn}(u - \lambda_2)\,\log(|u - \lambda_2| + 1) & \text{for } \lambda_1 = 0, \end{cases} \qquad (9)$$
can be used to transform data with support $\mathbb{R}$, where $\lambda \equiv (\lambda_1, \lambda_2)^T$ and $\mathrm{sgn}(u) = 1$ for $u > 0$, $0$ for $u = 0$, and $-1$ for $u < 0$, respectively. Then the range $h(\mathbb{R}; \lambda)$ is $\mathbb{R}$ for $\lambda_1 \geq 0$ and $(1/\lambda_1, -1/\lambda_1)$ for $\lambda_1 < 0$, respectively.
In order to cover the families in Examples 1.1-1.3, Chen and Wang (2003) modified model (4) to the following transformed truncated normal median regression model:
$$h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \qquad (10)$$
for $i = 1, \ldots, n$, where $y_i$ has known support $(a, b)$ ($\subset \mathbb{R}$), e.g., $(a, b) = (0, \infty)$, $(0, 1)$, or $\mathbb{R}$; $\lambda$ is an unknown finite-dimensional transformation parameter vector; $h(\cdot; \lambda)$ is a known strictly increasing and differentiable real-valued function on $(a, b)$, e.g., as in Examples 1.1-1.3; and the $\varepsilon_i$s are independent errors distributed as either $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ or truncated $N(\mu_i(\lambda, \beta, \sigma, \gamma), g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ with median $0$ for some $\mu_i(\lambda, \beta, \sigma, \gamma) \in \mathbb{R}$. Notice that, for $i = 1, \ldots, n$, $f(x_i; \beta)$ is the median of $h(y_i; \lambda)$; however, it is generally neither the mean nor the mode of $h(y_i; \lambda)$.
In Section 2, the transformed truncated normal mode regression model is proposed to extend model (5) and then the corresponding likelihood inference is discussed thoroughly. In Section 3, the proposed methodology is applied to two real data sets in Box and Cox (1964). Finally, conclusions and discussion are given in Section 4.
2 Transformed Truncated Normal Mode Regression Model
In this section, the transformed truncated normal mode regression model is proposed to extend model (5) and then the corresponding likelihood inference is discussed thoroughly.
2.1 Transformed Truncated Normal Mode Regression Model
Consider the following transformed truncated normal mode regression model:
$$h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \qquad (11)$$
for $i = 1, \ldots, n$, where $y_i$ is the response for subject $i$ with known support $(a, b)$ ($\subset \mathbb{R}$), e.g., $(0, \infty)$, $(0, 1)$, or $\mathbb{R}$; $\lambda$ is an unknown finite-dimensional transformation parameter vector; $h(\cdot; \lambda)$ is a known strictly increasing and differentiable real-valued function on $(a, b)$, e.g., as in Examples 1.1-1.3 in Section 1; $x_i$ is a known covariate vector for subject $i$; $\beta$ is an unknown finite-dimensional regression parameter vector; $f(\cdot; \beta)$ is a known regression function for each $\beta$, e.g., $f(x_i; \beta) = x_i^T \beta$ or $\exp\{x_i^T \beta\}$; and the $\varepsilon_i$s are independent errors distributed as either $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ or truncated $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ such that $z_i$ is a known covariate vector for subject $i$, e.g., $z_i$ is a known function of $x_i$; $\gamma$ is an unknown finite-dimensional parameter vector; $g(\cdot, \cdot; \gamma)$ is a known positive function for each $\gamma$, e.g., $g(f(x_i; \beta), z_i; \gamma) = \exp\{f(x_i; \beta)\, \gamma_1 + z_i^T \gamma_2\}$ with $\gamma \equiv (\gamma_1, \gamma_2^T)^T$; and $\sigma$ is an unknown positive scale parameter. Notice that, for $i = 1, \ldots, n$, $f(x_i; \beta)$ is the mode of $h(y_i; \lambda)$ when it lies in the support of $h(y_i; \lambda)$; however, it is generally neither the mean nor the median of $h(y_i; \lambda)$.
2.2 Maximum Likelihood Estimation
Let $\theta$ ($\equiv (\theta_1, \ldots, \theta_d)^T$) denote the $d$-dimensional parameter vector $(\lambda^T, \beta^T, \sigma, \gamma^T)^T$ in the parameter space $\Theta$, where $\Theta$ is a non-empty open subset of the $d$-dimensional Euclidean space $\mathbb{R}^d$. Let $\Phi(\cdot)$ denote the cumulative distribution function (c.d.f.) of $N(0, 1)$ and $\phi(\cdot)$ the probability density function (p.d.f.) of $N(0, 1)$. Set
$$e_i(u; \theta) \equiv \frac{h(u; \lambda) - f(x_i; \beta)}{g(f(x_i; \beta), z_i; \gamma)\, \sigma} \equiv \frac{h(u; \lambda) - f_i(\beta)}{g_i(\beta, \gamma)\, \sigma} \qquad (12)$$
for $u \in [a, b]$, $\theta \in \Theta$, and $i = 1, \ldots, n$, where $h(a; \lambda) \equiv \lim_{u \downarrow a} h(u; \lambda)$ and $h(b; \lambda) \equiv \lim_{u \uparrow b} h(u; \lambda)$.
Under model (11), the p.d.f. of $y_i$ is
$$p_i(y_i; \theta) = \frac{\phi(e_i(y_i; \theta))\, h'(y_i; \lambda)}{g_i(\beta, \gamma)\, \sigma\, [\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]} \cdot 1_{(a,b)}(y_i) \qquad (13)$$
for $i = 1, \ldots, n$, where $h'(y_i; \lambda) \equiv \partial h(u; \lambda)/\partial u|_{u = y_i} \equiv h_i'(\lambda)$ and $1_{(a,b)}(y_i) = 1$ for $y_i \in (a, b)$ and $0$ otherwise. Set $y \equiv (y_1, \ldots, y_n)^T$, $h_i(\lambda) \equiv h(y_i; \lambda)$, and $e_i(\theta) \equiv e_i(y_i; \theta)$ for $\theta \in \Theta$ and $i = 1, \ldots, n$. Then, given $y$, the likelihood function for $\theta$ is
$$L(\theta) \equiv \prod_{i=1}^{n} \frac{\phi(e_i(\theta))\, h_i'(\lambda)}{g_i(\beta, \gamma)\, \sigma\, [\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]} \qquad (14)$$
and the log-likelihood function for $\theta$ is
$$\log[L(\theta)] \equiv \ell(\theta) \equiv \sum_{i=1}^{n} \ell_i(\theta), \qquad (15)$$
where
$$\ell_i(\theta) = \log[\phi(e_i(\theta))] + \log[h_i'(\lambda)] - \log[g_i(\beta, \gamma)] - \log(\sigma) - \log[\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]. \qquad (16)$$
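As an illustration, the log-likelihood (15)-(16) can be coded directly for the homoscedastic Box-Cox special case ($h(u; \lambda) = u^{(\lambda)}$, $f(x_i; \beta) = x_i^T \beta$, $g \equiv 1$, support $(0, \infty)$). This is a hypothetical sketch, not the thesis's own code:

```python
import numpy as np
from scipy.stats import norm

def boxcox_trunc_loglik(theta, y, X):
    """Log-likelihood (15)-(16) for the homoscedastic Box-Cox special case:
    h(u; lam) = u^(lam), f(x; beta) = x'beta, g = 1, support (0, inf).
    theta packs (lam, beta_1, ..., beta_p, log sigma)."""
    lam = theta[0]
    beta = np.asarray(theta[1:-1])
    sigma = np.exp(theta[-1])          # parameterize by log sigma to keep sigma > 0
    h = np.log(y) if lam == 0 else (y**lam - 1.0) / lam   # h(y_i; lam)
    hprime = y**(lam - 1.0)                               # h'(y_i; lam)
    mu = X @ beta
    e = (h - mu) / sigma                                  # e_i(theta) in (12)
    # h((0, inf); lam) is (-1/lam, inf) for lam > 0, R for lam = 0,
    # and (-inf, -1/lam) for lam < 0, so only one truncation limit is finite.
    if lam > 0:
        Z = 1.0 - norm.cdf((-1.0 / lam - mu) / sigma)
    elif lam < 0:
        Z = norm.cdf((-1.0 / lam - mu) / sigma)
    else:
        Z = 1.0
    # ell_i in (16): log phi(e_i) + log h'_i - log sigma - log normalizing mass
    return np.sum(norm.logpdf(e) + np.log(hprime) - np.log(sigma) - np.log(Z))
```

For $\lambda = 0$ the truncation mass $Z$ equals 1 and the expression reduces to the ordinary log-likelihood of a log-transformed normal model with Jacobian term $\log h_i'(\lambda) = -\log y_i$.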
Assume that
$$\frac{\partial}{\partial \theta} \int_a^b p_i(y_i; \theta)\, dy_i = \int_a^b \frac{\partial p_i(y_i; \theta)}{\partial \theta}\, dy_i \qquad (17)$$
and
$$\frac{\partial^2}{\partial \theta\, \partial \theta^T} \int_a^b p_i(y_i; \theta)\, dy_i = \int_a^b \frac{\partial^2 p_i(y_i; \theta)}{\partial \theta\, \partial \theta^T}\, dy_i \qquad (18)$$
for $\theta \in \Theta$ and $i = 1, \ldots, n$. Then the score function for $\theta$ is
$$\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ell_i(\theta)}{\partial \theta} \equiv \sum_{i=1}^{n} S_i(\theta) \equiv S(\theta), \qquad (19)$$
the observed Fisher information for $\theta$ is
$$-\frac{\partial^2 \ell(\theta)}{\partial \theta\, \partial \theta^T} = -\sum_{i=1}^{n} \frac{\partial^2 \ell_i(\theta)}{\partial \theta\, \partial \theta^T} \equiv \sum_{i=1}^{n} J_i(\theta) \equiv J(\theta), \qquad (20)$$
and the expected Fisher information for $\theta$ is
$$\mathrm{Cov}_\theta(S(\theta)) = \sum_{i=1}^{n} \mathrm{Cov}_\theta(S_i(\theta)) \equiv \sum_{i=1}^{n} I_i(\theta) \equiv I(\theta), \qquad (21)$$
where $S_i(\theta)$ and $J_i(\theta)$ are given in Appendices A and B, respectively. By equation (17), $E_\theta(S_i(\theta)) = 0_{d \times 1}$ for $\theta \in \Theta$ and $i = 1, \ldots, n$, where $0_{d \times 1}$ denotes the $d \times 1$ vector $(0, \ldots, 0)^T$. By equations (17) and (18), $E_\theta(J_i(\theta)) = I_i(\theta)$ for $\theta \in \Theta$ and $i = 1, \ldots, n$.
Assume that, given $y$, there exists a unique maximum likelihood estimate (MLE) $\hat{\theta}(y)$ ($\equiv \hat{\theta}$) of $\theta$. Then $\hat{\theta}$ solves the score equation $S(\hat{\theta}) = 0_{d \times 1}$ for $\theta$. One possible approach to evaluate $\hat{\theta}$ is as follows: first choose a good initial value $\hat{\theta}^{(0)}$ and then iterate
$$\hat{\theta}^{(k+1)} = \hat{\theta}^{(k)} + M^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \qquad (22)$$
for $k = 0, 1, 2, \ldots$ until $\|S(\hat{\theta}^{(k+1)})\| < \varepsilon$ for some small positive value $\varepsilon$, e.g., $\varepsilon = 10^{-3}$, where $\|a\| \equiv (a^T a)^{1/2}$ for $a \in \mathbb{R}^d$. When $M(\hat{\theta}^{(k)}) = I(\hat{\theta}^{(k)})$ for $k = 0, 1, 2, \ldots$, the iteration is called the Fisher scoring method; however, it can take too much time to evaluate the $I(\hat{\theta}^{(k)})$s because there is generally no closed-form formula for each $I(\hat{\theta}^{(k)})$. When $M(\hat{\theta}^{(k)}) = J(\hat{\theta}^{(k)})$ for $k = 0, 1, 2, \ldots$, the iteration is called the Newton-Raphson method. It is usually difficult to find a good initial value for the Newton-Raphson method, especially when $d$ is not a small positive integer. Moreover, it is not guaranteed that $\ell(\hat{\theta}^{(k+1)}) > \ell(\hat{\theta}^{(k)})$ for $k = 0, 1, 2, \ldots$. Thus, a modified Newton-Raphson method is suggested as follows: first choose a good initial value $\hat{\theta}^{(0)}$ and a fraction $\lambda \in (0, 1)$, e.g., $\lambda = 1/2$. When $\hat{\theta}^{(k)}$ is obtained and $S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \neq 0$ for some non-negative integer $k$, iterate
$$\hat{\theta}^{(k+1,j)} \equiv \hat{\theta}^{(k)} + \mathrm{sgn}\big(S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)})\big)\, \lambda^j\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \qquad (23)$$
for $j = 0, 1, 2, \ldots, m_k$, where $m_k$ is the first $j$ such that $\ell(\hat{\theta}^{(k+1,j)}) > \ell(\hat{\theta}^{(k)})$. Set $\hat{\theta}^{(k+1)} \equiv \hat{\theta}^{(k+1,m_k)}$ for $k = 0, 1, 2, \ldots$ until $\|S(\hat{\theta}^{(k+1)})\| < \varepsilon$ for some small positive value $\varepsilon$, e.g., $\varepsilon = 10^{-3}$.
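The iteration (22)-(23) is essentially Newton's method made monotone by a backtracking (step-halving) line search. A generic Python sketch (function names and calling convention are ours; `loglik`, `score`, and `obs_info` stand for $\ell$, $S$, and $J$):

```python
import numpy as np

def modified_newton(loglik, score, obs_info, theta0, frac=0.5, eps=1e-3, max_iter=200):
    """Modified Newton-Raphson (22)-(23): the Newton step J^{-1}S is given the
    sign of S'J^{-1}S (so it is an ascent direction) and damped by frac^j
    until the log-likelihood strictly increases."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        S = score(theta)
        if np.linalg.norm(S) < eps:                  # stopping rule ||S|| < eps
            break
        step = np.linalg.solve(obs_info(theta), S)   # J^{-1}(theta) S(theta)
        direction = np.sign(S @ step) * step         # sgn(S'J^{-1}S) J^{-1}S
        j = 0                                        # backtracking: j = 0, 1, ..., m_k
        while j < 60 and loglik(theta + frac**j * direction) <= loglik(theta):
            j += 1
        theta = theta + frac**j * direction
    return theta
```

The `j < 60` cap is a numerical safeguard absent from the idealized description; by the expansion (24), a finite $m_k$ always exists in exact arithmetic.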
When $S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \neq 0$ for some non-negative integer $k$, it follows from the first-order Taylor expansion that
$$\ell(\hat{\theta}^{(k+1,j)}) = \ell(\hat{\theta}^{(k)}) + S^T(\hat{\theta}^{(k)}) \left[ \mathrm{sgn}\big(S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)})\big)\, \lambda^j\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \right] + o(\lambda^j) = \ell(\hat{\theta}^{(k)}) + \lambda^j\, |S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)})| + o(\lambda^j) \qquad (24)$$
as $j \to \infty$, which implies that $\ell(\hat{\theta}^{(k+1,j)}) > \ell(\hat{\theta}^{(k)})$ for large $j$ and thus $m_k$ is well-defined.
Now consider the case where the sample size $n$ tends to infinity. Assume that the following conditions hold:

(i) the minimum eigenvalue of $I(\theta)$ tends to infinity as $n \to \infty$;

(ii) $E_\theta(\max_{1 \leq i \leq n} |\partial \ell_i(\theta)/\partial \theta_j|) / [\mathrm{Var}_\theta(\partial \ell(\theta)/\partial \theta_j)]^{1/2} \to 0$ as $n \to \infty$ for $j = 1, \ldots, d$;

(iii) $I^{-1/2}(\theta)\, J(\theta)\, I^{-1/2}(\theta) \stackrel{p}{\to} I_d$ as $n \to \infty$, where $I_d$ denotes the identity matrix of order $d$; and

(iv) $[\mathrm{diag}\{I_{11}(\theta), \ldots, I_{dd}(\theta)\}]^{-1/2}\, I(\theta)\, [\mathrm{diag}\{I_{11}(\theta), \ldots, I_{dd}(\theta)\}]^{-1/2} \to \Sigma(\theta)$ as $n \to \infty$, where $I_{jj}(\theta)$ denotes the $j$th diagonal element of $I(\theta)$ for $j = 1, \ldots, d$ and $\Sigma(\theta)$ is a positive definite covariance matrix.
Let $M(\theta)$ denote either $I(\theta)$ or $J(\theta)$. Then, by Theorem 1.80 of Prakasa Rao (1999),
$$M^{-1/2}(\theta)\, S(\theta) \stackrel{d}{\to} N_d(0_{d \times 1}, I_d) \qquad (25)$$
as $n \to \infty$, where $N_d(0_{d \times 1}, I_d)$ denotes the $d$-variate normal distribution with mean vector $0_{d \times 1}$ and covariance matrix $I_d$. Assume that
$$I^{-1/2}(\theta)\, \{S(\hat{\theta}) - [S(\theta) - J(\theta)(\hat{\theta} - \theta)]\} = o_p(1) \qquad (26)$$
as $n \to \infty$. Then, by condition (iii) and equations (25) and (26),
$$M^{1/2}(\theta)(\hat{\theta} - \theta) = M^{-1/2}(\theta)\, S(\theta) + o_p(1) \stackrel{d}{\to} N_d(0_{d \times 1}, I_d) \qquad (27)$$
as $n \to \infty$. Thus, by condition (i) and equation (27), $\hat{\theta}$ is a weakly consistent estimator of $\theta$. Assume that $I^{-1/2}(\theta)\, I(\hat{\theta})\, I^{-1/2}(\theta) \stackrel{p}{\to} I_d$ and $J^{-1/2}(\theta)\, J(\hat{\theta})\, J^{-1/2}(\theta) \stackrel{p}{\to} I_d$ as $n \to \infty$. Then, by equation (27),
$$M^{1/2}(\hat{\theta})(\hat{\theta} - \theta) = M^{1/2}(\theta)(\hat{\theta} - \theta) + o_p(1) \stackrel{d}{\to} N_d(0_{d \times 1}, I_d) \qquad (28)$$
as $n \to \infty$.
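In practice, the asymptotic normality in (28) yields componentwise Wald intervals $\hat{\theta}_j \pm z_{\alpha/2}\, [M^{-1}(\hat{\theta})]_{jj}^{1/2}$, with $M$ the observed or expected information at the MLE. A small sketch (function name ours):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(theta_hat, M_hat, alpha=0.05):
    """Componentwise Wald intervals from (28):
    theta_hat_j +/- z_{alpha/2} * sqrt([M^{-1}(theta_hat)]_{jj}),
    where M_hat is the observed or expected Fisher information at the MLE."""
    se = np.sqrt(np.diag(np.linalg.inv(M_hat)))   # asymptotic standard errors
    z = norm.ppf(1.0 - alpha / 2.0)               # z_{alpha/2}
    return theta_hat - z * se, theta_hat + z * se
```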
2.3 Hypothesis Testing and Confidence Regions
In this subsection, let $\omega$ ($\equiv (\psi^T, \chi^T)^T \in \Omega \subset \mathbb{R}^d$) be a one-to-one reparameterization of $\theta$ such that $\det(\partial \theta/\partial \omega^T) \neq 0$ and $\partial^2 \theta_j/\partial \chi\, \partial \chi^T$ is a continuous function of $\chi$ for $j = 1, \ldots, d$, where $\psi$ is the $d_0$-dimensional parameter vector of interest and $\chi$ is a $(d - d_0)$-dimensional nuisance parameter vector with $d_0 \in \{1, \ldots, d\}$. Here $\chi$ does not exist when $d_0 = d$. Suppose that we are interested in testing the null hypothesis $H_0\colon \psi = \psi_0$ versus the alternative hypothesis $H_1\colon \psi \neq \psi_0$.

Set $S_\psi(\chi) \equiv \partial \ell(\theta)/\partial \chi$, $I_\psi(\chi) \equiv \mathrm{Cov}_\omega(S_\psi(\chi))$, and $J_\psi(\chi) \equiv -\partial S_\psi(\chi)/\partial \chi^T$ for $\omega \in \Omega$. Then $S_\psi(\chi) = [\partial \theta^T/\partial \chi]\, S(\theta)$, $I_\psi(\chi) = [\partial \theta^T/\partial \chi]\, I(\theta)\, [\partial \theta/\partial \chi^T]$, and
$$J_\psi(\chi) = \frac{\partial \theta^T}{\partial \chi}\, J(\theta)\, \frac{\partial \theta}{\partial \chi^T} - \sum_{j=1}^{d} \frac{\partial^2 \theta_j}{\partial \chi\, \partial \chi^T}\, S(\theta)_j \qquad (29)$$
for $\omega \in \Omega$, where $S(\theta) \equiv (S(\theta)_1, \ldots, S(\theta)_d)^T$. Assume that, given $y$, there exists a unique MLE $\hat{\chi}_\psi(y)$ ($\equiv \hat{\chi}_\psi$) of $\chi$ for fixed $\psi$. Then $\hat{\chi}_\psi$ solves the score equation $S_\psi(\hat{\chi}_\psi) = 0_{(d - d_0) \times 1}$ for $\chi$, where $0_{(d - d_0) \times 1}$ denotes the $(d - d_0) \times 1$ vector $(0, \ldots, 0)^T$.

Set $W(\psi) \equiv 2[\ell(\hat{\theta}) - \ell(\theta(\psi, \hat{\chi}_\psi))]$. Assume that
$$I_\psi^{-1/2}(\chi)\, J_\psi(\hat{\chi}_\psi)\, I_\psi^{-1/2}(\chi) \stackrel{p}{\to} I_{d - d_0}, \qquad I_\psi^{1/2}(\chi)(\hat{\chi}_\psi - \chi) = I_\psi^{-1/2}(\chi)\, S_\psi(\chi) + o_p(1), \qquad (30)$$
$$\ell(\theta) = \ell(\hat{\theta}) + S^T(\hat{\theta})(\theta - \hat{\theta}) - \tfrac{1}{2}(\theta - \hat{\theta})^T J(\hat{\theta})(\theta - \hat{\theta}) + o_p(1), \qquad (31)$$
and
$$\ell(\theta) = \ell(\theta(\psi, \hat{\chi}_\psi)) + S_\psi^T(\hat{\chi}_\psi)(\chi - \hat{\chi}_\psi) - \tfrac{1}{2}(\chi - \hat{\chi}_\psi)^T J_\psi(\hat{\chi}_\psi)(\chi - \hat{\chi}_\psi) + o_p(1) \qquad (32)$$
as $n \to \infty$. Then, by equations (27) and (28),
$$W(\psi) = S^T(\theta)\, I^{-1/2}(\theta) \left\{ I_d - I^{1/2}(\theta)\, \frac{\partial \theta}{\partial \chi^T} \left[ \frac{\partial \theta^T}{\partial \chi}\, I(\theta)\, \frac{\partial \theta}{\partial \chi^T} \right]^{-1} \frac{\partial \theta^T}{\partial \chi}\, I^{1/2}(\theta) \right\} I^{-1/2}(\theta)\, S(\theta) + o_p(1) \stackrel{d}{\to} \chi^2_{d_0} \qquad (33)$$
as $n \to \infty$.

Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. The likelihood ratio test with asymptotic size $\alpha$ rejects $H_0\colon \psi = \psi_0$ if and only if the likelihood ratio test statistic $W(\psi_0) > \chi^2_{\alpha, d_0}$, where $\chi^2_{\alpha, d_0}$ denotes the upper $\alpha$ quantile of the $\chi^2$ distribution with $d_0$ degrees of freedom. To evaluate $W(\psi_0)$, we need to evaluate $\hat{\chi}_{\psi_0}$; one possible approach is to use the modified Newton-Raphson method of Section 2.2. Therefore, $\{\psi_0\colon W(\psi_0) \leq \chi^2_{\alpha, d_0}\}$ is an asymptotic size $1 - \alpha$ confidence region for $\psi$.
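Given the two maximized log-likelihoods, the asymptotic p-value of the likelihood ratio test follows from (33) in one line; a sketch using SciPy (function name ours):

```python
from scipy.stats import chi2

def lrt_pvalue(loglik_full, loglik_null, d0):
    """Asymptotic p-value of the likelihood ratio test: by (33),
    W = 2[l(theta_hat) - l(theta(psi_0, chi_hat))] -> chi^2_{d0} under H0."""
    W = 2.0 * (loglik_full - loglik_null)
    return chi2.sf(W, df=d0)   # upper-tail probability of chi^2_{d0}
```

This is the computation behind the reported p-values in Section 3 (e.g., 0.3168 and 0.3487 for the no-interaction hypotheses).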
2.4 Prediction Region of Future Observations
Suppose that
$$h(y_{n+j}; \lambda) = f(x_{n+j}; \beta) + \varepsilon_{n+j} \qquad (34)$$
for $j = 1, \ldots, m$, where $m$ is a known positive integer; $y_{n+j}$ is the future observation for subject $n + j$ with support $(a, b)$; $x_{n+j}$ is a known covariate vector for subject $n + j$; $\varepsilon_{n+j}$ is an error distributed as either $N(0, g^2(f(x_{n+j}; \beta), z_{n+j}; \gamma)\, \sigma^2)$ or truncated $N(0, g^2(f(x_{n+j}; \beta), z_{n+j}; \gamma)\, \sigma^2)$ with known covariate vector $z_{n+j}$; and $\varepsilon_1, \ldots, \varepsilon_{n+m}$ are independent. For $\theta \in \Theta$, $u \in [a, b]$, and $j = 1, \ldots, m$, set
$$e_{n+j}(u; \theta) \equiv \frac{h(u; \lambda) - f(x_{n+j}; \beta)}{g(f(x_{n+j}; \beta), z_{n+j}; \gamma)\, \sigma} \equiv \frac{h(u; \lambda) - f_{n+j}(\beta)}{g_{n+j}(\beta, \gamma)\, \sigma}. \qquad (35)$$
Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. For $\theta \in \Theta$ and $j = 1, \ldots, m$, let $\Phi_{n+j}(\cdot; \theta)$ denote the c.d.f. of $\varepsilon_{n+j}$ and $q_{n+j,\alpha}(\theta)$ the $\alpha$ quantile of $y_{n+j}$. Then
$$q_{n+j,\alpha}(\theta) = h^{-1}(f_{n+j}(\beta) + \Phi_{n+j}^{-1}(\alpha; \theta); \lambda) \qquad (36)$$
with MLE $q_{n+j,\alpha}(\hat{\theta})$ for $\theta \in \Theta$ and $j = 1, \ldots, m$, where
$$\Phi_{n+j}^{-1}(\alpha; \theta) = g_{n+j}(\beta, \gamma)\, \sigma\, \Phi^{-1}\big((1 - \alpha)\, \Phi(e_{n+j}(a; \theta)) + \alpha\, \Phi(e_{n+j}(b; \theta))\big) \equiv g_{n+j}(\beta, \gamma)\, \sigma\, \Phi_{n+j}^{-1}(\alpha, a, b; \theta). \qquad (37)$$
Assume that $q_{n+j,\alpha}(\theta)$ is a continuously differentiable function of $\theta$ with $\partial q_{n+j,\alpha}(\theta)/\partial \theta \neq 0_{d \times 1}$ for $\theta \in \Theta$ and $j = 1, \ldots, m$. Then
$$\frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta} = \frac{\partial f_{n+j}(\beta)/\partial \theta + \sigma\, \Phi_{n+j}^{-1}(\alpha, a, b; \theta)\, \partial g_{n+j}(\beta, \gamma)/\partial \theta}{h'(q_{n+j,\alpha}(\theta); \lambda)} + \frac{g_{n+j}(\beta, \gamma)\, \Phi_{n+j}^{-1}(\alpha, a, b; \theta)\, \partial \sigma/\partial \theta + g_{n+j}(\beta, \gamma)\, \sigma\, \partial \Phi_{n+j}^{-1}(\alpha, a, b; \theta)/\partial \theta}{h'(q_{n+j,\alpha}(\theta); \lambda)} - \frac{\partial h(u; \lambda)/\partial \theta|_{u = q_{n+j,\alpha}(\theta)}}{h'(q_{n+j,\alpha}(\theta); \lambda)} \qquad (38)$$
for $\theta \in \Theta$ and $j = 1, \ldots, m$, where
$$\frac{\partial \Phi_{n+j}^{-1}(\alpha, a, b; \theta)}{\partial \theta} = \frac{(1 - \alpha)\, \partial \Phi(e_{n+j}(a; \theta))/\partial \theta + \alpha\, \partial \Phi(e_{n+j}(b; \theta))/\partial \theta}{\phi(\Phi_{n+j}^{-1}(\alpha, a, b; \theta))} \qquad (39)$$
with both $\partial \Phi(e_{n+j}(a; \theta))/\partial \theta$ and $\partial \Phi(e_{n+j}(b; \theta))/\partial \theta$ evaluated by formulas similar to those in Appendix A. By equations (27) and (28),
$$\left[ \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta^T}\, M^{-1}(\theta)\, \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta} \right]^{-1/2} [q_{n+j,\alpha}(\hat{\theta}) - q_{n+j,\alpha}(\theta)] \stackrel{d}{\to} N(0, 1) \qquad (40)$$
and
$$\left[ \left. \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta^T} \right|_{\theta = \hat{\theta}} M^{-1}(\hat{\theta}) \left. \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}} \right]^{-1/2} [q_{n+j,\alpha}(\hat{\theta}) - q_{n+j,\alpha}(\theta)] \stackrel{d}{\to} N(0, 1) \qquad (41)$$
as $n \to \infty$ for $\theta \in \Theta$ and $j = 1, \ldots, m$, where $M$ denotes either $I$ or $J$. Set $\alpha_m \equiv [1 - (1 - \alpha)^{1/m}]/2$. Since $y_{n+1}, \ldots, y_{n+m}$ are independent,
$$P_\theta\!\left( \bigcap_{j=1}^{m} \{y_{n+j} \in [q_{n+j,\alpha_m}(\theta), q_{n+j,1-\alpha_m}(\theta)]\} \right) = \prod_{j=1}^{m} P_\theta\big(\{y_{n+j} \in [q_{n+j,\alpha_m}(\theta), q_{n+j,1-\alpha_m}(\theta)]\}\big) = (1 - 2\alpha_m)^m = 1 - \alpha, \qquad (42)$$
which implies that $[q_{n+1,\alpha_m}(\theta), q_{n+1,1-\alpha_m}(\theta)] \times \cdots \times [q_{n+m,\alpha_m}(\theta), q_{n+m,1-\alpha_m}(\theta)]$ is a size $1 - \alpha$ prediction region for $(y_{n+1}, \ldots, y_{n+m})^T$ with MLE $[q_{n+1,\alpha_m}(\hat{\theta}), q_{n+1,1-\alpha_m}(\hat{\theta})] \times \cdots \times [q_{n+m,\alpha_m}(\hat{\theta}), q_{n+m,1-\alpha_m}(\hat{\theta})]$.
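For the homoscedastic Box-Cox special case ($g \equiv 1$, support $(0, \infty)$), the quantile in (36)-(37) can be computed directly; a hypothetical sketch (function name ours):

```python
import numpy as np
from scipy.stats import norm

def boxcox_pred_quantile(alpha, mu, sigma, lam):
    """alpha quantile (36)-(37) of a future y whose Box-Cox transform y^(lam)
    is truncated N(mu, sigma^2) on h((0, inf); lam) (homoscedastic case, g = 1)."""
    if lam > 0:                       # support of y^(lam) is (-1/lam, inf)
        Fa, Fb = norm.cdf((-1.0 / lam - mu) / sigma), 1.0
    elif lam < 0:                     # support of y^(lam) is (-inf, -1/lam)
        Fa, Fb = 0.0, norm.cdf((-1.0 / lam - mu) / sigma)
    else:                             # lam = 0: no truncation
        Fa, Fb = 0.0, 1.0
    z = norm.ppf((1.0 - alpha) * Fa + alpha * Fb)   # Phi^{-1}_{n+j}(alpha, a, b; theta)
    hq = mu + sigma * z                              # quantile on the transformed scale
    return np.exp(hq) if lam == 0 else (1.0 + lam * hq) ** (1.0 / lam)
```

An equal-tailed prediction interval for a single future $y$ is then $[q(\alpha_m), q(1 - \alpha_m)]$ with $\alpha_m$ as above; for $\lambda = 0$ the quantile reduces to $\exp(\mu + \sigma\, \Phi^{-1}(\alpha))$.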
3 Two Real Data Sets
In this section, the proposed methodology is applied to two real data sets in Box and Cox (1964).
3.1 A Biological Experiment Using a 3×4 Factorial Design
Table 1 shows the survival times of animals in a 3 × 4 factorial experiment, the factors being A with three poisons and B with four treatments. Each combination of these two factors is replicated for four animals, the allocation to animals being completely randomized. The two-way analysis-of-variance (ANOVA) effects model is
$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk} \qquad (43)$$
for $i = 1, 2, 3$ and $j, k = 1, 2, 3, 4$, where $y_{ijk}$ is the $k$th observation for the $i$th poison of factor A and the $j$th treatment of factor B; $\mu$ is the overall mean; $\tau_i$ is the main effect of the $i$th level of factor A; $\beta_j$ is the main effect of the $j$th level of factor B; $(\tau\beta)_{ij}$ is the interaction between the $i$th level of factor A and the $j$th level of factor B; and the $\varepsilon_{ijk}$s are i.i.d. $N(0, \sigma^2)$ errors with unknown positive standard deviation $\sigma$. Figure 2(a) shows the residual plot against fitted values for the original data under the two-way ANOVA effects model. It is seen that $\mathrm{Var}(y_{ijk})$ increases as $E(y_{ijk})$ increases.
Now consider the following Box-Cox transformed truncated normal mode regression model:
$$y_{ijk}^{(\lambda)} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk} \qquad (44)$$
for $i = 1, 2, 3$ and $j, k = 1, 2, 3, 4$, where each $y_{ijk}$ has support $(0, \infty)$, $\lambda$ is an unknown real-valued transformation parameter, and the $\varepsilon_{ijk}$s are independent errors distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$ with unknown positive scale parameter $\sigma$. Figure 2(b) shows the residual plot against fitted values for the transformed data under the Box-Cox transformed truncated normal mode regression model.
One possible way to find a good initial value $\hat{\theta}^{(0)}$ in this case is given in Appendix C, and the construction of the normal probability plot in this case is described in Appendix D. Table 2 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively. Figure 3 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively.
First, test the null hypothesis $H_0\colon (\tau\beta)_{ij} = 0$ for all $(i, j)$ versus the alternative $H_1\colon (\tau\beta)_{ij} \neq 0$ for some $(i, j)$. The asymptotic p-value is 0.3168, so the null hypothesis $H_0$ is not rejected.
Table 3 shows the MLEs without interactions under the false normality assumption and the truncated normality assumption, respectively. Figure 4 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without interactions.
Similarly, we are also interested in testing the null hypothesis $H_0\colon \lambda = -1$ and $(\tau\beta)_{ij} = 0$ for all $(i, j)$ versus the alternative $H_1\colon \lambda \neq -1$ or $(\tau\beta)_{ij} \neq 0$ for some $(i, j)$. The asymptotic p-value is 0.2829 under the truncated normality assumption, so the null hypothesis $H_0$ is again not rejected.
Table 4 shows the MLEs with λ = −1 under the false normality assumption and the truncated normality assumption, respectively. Figure 5 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, with λ = −1. Table 5 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively, without interactions and with λ = −1. Figure 6 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without interactions and with λ = −1.
Suppose that
$$y_{i_l j_l k_l}^{(\lambda)} = \mu + \tau_{i_l} + \beta_{j_l} + \varepsilon_{i_l j_l k_l} \qquad (45)$$
for $l = 1, \ldots, m$, where $m$ is a positive integer; $y_{i_l j_l k_l}$ is the $k_l$th observation for the $i_l$th poison of factor A and the $j_l$th treatment of factor B; all $(i_l, j_l, k_l)$s are different with $k_l \geq 5$; $\varepsilon_{i_l j_l k_l}$ is an error distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$; and the $\varepsilon_{ijk}$s and $\varepsilon_{i_l j_l k_l}$s are independent.

Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. For $l = 1, \ldots, m$, let $\Phi_{i_l j_l k_l}(\cdot; \theta)$ denote the c.d.f. of $\varepsilon_{i_l j_l k_l}$ and $q_{i_l j_l k_l,\alpha}(\theta)$ the $\alpha$ quantile of $y_{i_l j_l k_l}$. Then
$$q_{i_l j_l k_l,\alpha}(\theta) = \left[ 1 + \lambda \left( \mu + \tau_{i_l} + \beta_{j_l} + \Phi_{i_l j_l k_l}^{-1}(\alpha; \theta) \right) \right]^{1/\lambda} \qquad (46)$$
for $l = 1, \ldots, m$, where
$$\Phi_{i_l j_l k_l}^{-1}(\alpha; \theta) = \sigma\, \Phi^{-1}\!\left( \alpha\, \Phi\!\left( \frac{-1/\lambda - \mu - \tau_{i_l} - \beta_{j_l}}{\sigma} \right) \right). \qquad (47)$$
Thus, $[q_{i_1 j_1 k_1,\alpha_m}(\theta), q_{i_1 j_1 k_1,1-\alpha_m}(\theta)] \times \cdots \times [q_{i_m j_m k_m,\alpha_m}(\theta), q_{i_m j_m k_m,1-\alpha_m}(\theta)]$ is a size $1 - \alpha$ prediction region for $(y_{i_1 j_1 k_1}, \ldots, y_{i_m j_m k_m})^T$ with MLE $[q_{i_1 j_1 k_1,\alpha_m}(\hat{\theta}), q_{i_1 j_1 k_1,1-\alpha_m}(\hat{\theta})] \times \cdots \times [q_{i_m j_m k_m,\alpha_m}(\hat{\theta}), q_{i_m j_m k_m,1-\alpha_m}(\hat{\theta})]$, where $\alpha_m \equiv [1 - (1 - \alpha)^{1/m}]/2$.
3.2 A Textile Experiment Using a Single Replicate of a $3^3$ Design
Table 6 shows the numbers of cycles to failure, $y$, obtained in a single replicate of a $3^3$ factorial experiment in which the factors are

$x_1$: length of test specimen (250, 300, 350 mm),
$x_2$: amplitude of loading cycle (8, 9, 10 mm),
$x_3$: load (40, 45, 50 gm).
In Table 6, the levels of $x_1$, $x_2$, and $x_3$ are coded as $-1$, $0$, $1$, respectively. Consider the following quadratic regression model:
$$y_i = \beta_0 + \sum_{j=1}^{3} \beta_j x_{ij} + \sum_{1 \leq j \leq k \leq 3} \beta_{jk} x_{ij} x_{ik} + \varepsilon_i \qquad (48)$$
for $i = 1, \ldots, 27$, where $y_i$ is the response for $(x_1, x_2, x_3) = (x_{i1}, x_{i2}, x_{i3})$; $\beta_0$ is the intercept; the $\beta_j$s and $\beta_{jk}$s are regression coefficients; and the $\varepsilon_i$s are i.i.d. $N(0, \sigma^2)$ errors with unknown positive standard deviation $\sigma$. Figure 7(a) shows the residual plot against fitted values for the original data under the quadratic regression model. There is an obvious pattern in Figure 7(a).
Now consider the following Box-Cox transformed truncated normal mode regression model:
$$y_i^{(\lambda)} = \beta_0 + \sum_{j=1}^{3} \beta_j x_{ij} + \sum_{1 \leq j \leq k \leq 3} \beta_{jk} x_{ij} x_{ik} + \varepsilon_i \qquad (49)$$
for $i = 1, \ldots, 27$, where $y_i$ has support $(0, \infty)$ and the $\varepsilon_i$s are independent errors distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$ with unknown positive standard deviation $\sigma$. Figure 7(b) shows the residual plot against fitted values for the transformed data under the Box-Cox transformed truncated normal mode regression model.
Table 7 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively. It is seen that the MLEs under the false normality assumption are nearly the same as those under the truncated normality assumption. Figure 8 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively.
First, test the null hypothesis $H_0\colon \beta_{jk} = 0$ for all $(j, k)$ versus the alternative $H_1\colon \beta_{jk} \neq 0$ for some $(j, k)$. The asymptotic p-value is 0.3487, so the null hypothesis $H_0$ is not rejected.
Table 8 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively, without quadratic effects and interactions. Figure 9 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without quadratic effects and interactions.
Similarly, we are also interested in testing the null hypothesis $H_0\colon \lambda = 0$ and $\beta_{jk} = 0$ for all $(j, k)$ versus the alternative $H_1\colon \lambda \neq 0$ or $\beta_{jk} \neq 0$ for some $(j, k)$. The asymptotic p-value is 0.4313 under the truncated normality assumption, so the null hypothesis $H_0$ is again not rejected.
Table 9 shows the MLEs with $\lambda = 0$ under the false normality assumption and the truncated normality assumption, respectively. Figure 10 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, with $\lambda = 0$. Table 10 shows the MLEs under the false normality assumption and the truncated normality assumption, respectively, without quadratic effects and interactions and with $\lambda = 0$. Figure 11 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without quadratic effects and interactions and with $\lambda = 0$.
Suppose that
$$y_l^{(\lambda)} = \beta_0 + \sum_{j=1}^{3} \beta_j x_{lj} + \varepsilon_l \qquad (50)$$
for $l = 27 + 1, \ldots, 27 + m$, where $m$ is a positive integer; $y_l$ is the $l$th observation for $(x_1, x_2, x_3) = (x_{l1}, x_{l2}, x_{l3})$; $\varepsilon_l$ is the $(l - 27)$th future error distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$; and $\varepsilon_1, \ldots, \varepsilon_{27+m}$ are independent. Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. For $l = 27 + 1, \ldots, 27 + m$, let $\Phi_l(\cdot; \theta)$ denote the c.d.f. of $\varepsilon_l$ and $q_{l,\alpha}(\theta)$ the $\alpha$ quantile of $y_l$. Then
$$q_{l,\alpha}(\theta) = \left\{ 1 + \lambda \left[ \beta_0 + \sum_{j=1}^{3} \beta_j x_{lj} + \Phi_l^{-1}(\alpha; \theta) \right] \right\}^{1/\lambda} \qquad (51)$$
for $l = 27 + 1, \ldots, 27 + m$, where
$$\Phi_l^{-1}(\alpha; \theta) = \sigma\, \Phi^{-1}\!\left( \alpha\, \Phi\!\left( \frac{-1/\lambda - \beta_0 - \sum_{j=1}^{3} \beta_j x_{lj}}{\sigma} \right) \right). \qquad (52)$$
Thus, $[q_{27+1,\alpha_m}(\theta), q_{27+1,1-\alpha_m}(\theta)] \times \cdots \times [q_{27+m,\alpha_m}(\theta), q_{27+m,1-\alpha_m}(\theta)]$ is a size $1 - \alpha$ prediction region for $(y_{27+1}, \ldots, y_{27+m})^T$ with MLE $[q_{27+1,\alpha_m}(\hat{\theta}), q_{27+1,1-\alpha_m}(\hat{\theta})] \times \cdots \times [q_{27+m,\alpha_m}(\hat{\theta}), q_{27+m,1-\alpha_m}(\hat{\theta})]$, where $\alpha_m \equiv [1 - (1 - \alpha)^{1/m}]/2$.
4 Conclusions and Discussion
Now consider the following transformed truncated normal mean regression model:
$$h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \qquad (53)$$
for $i = 1, \ldots, n$, where $y_i$ is the response for subject $i$ with known support $(a, b)$ ($\subset \mathbb{R}$); $\lambda$ is an unknown finite-dimensional transformation parameter vector; $h(\cdot; \lambda)$ is a known strictly increasing and differentiable real-valued function on $(a, b)$; $x_i$ is a known covariate vector for subject $i$; $\beta$ is an unknown finite-dimensional regression parameter vector; $f(\cdot; \beta)$ is a known regression function for each $\beta$; and the $\varepsilon_i$s are independent errors distributed as either $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ or truncated $N(\mu_i(\theta), \sigma_i^2(\theta))$ such that $\mu_i(\theta)$ is an unknown mean parameter and $\sigma_i(\theta)$ is an unknown positive standard deviation parameter; $z_i$ is a known covariate vector for subject $i$; $\gamma$ is an unknown finite-dimensional parameter vector; $g(\cdot, \cdot; \gamma)$ is a known positive function for each $\gamma$; and $\sigma$ is an unknown positive scale parameter. Notice that, for $i = 1, \ldots, n$, $f(x_i; \beta)$ is the mean of $h(y_i; \lambda)$ when it lies in the support of $h(y_i; \lambda)$, and $g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2$ is the variance of $h(y_i; \lambda)$.
By Johnson and Kotz (1994), well-known formulas are available for the moments of the truncated normal distribution. Suppose that ε_i ∼ N(μ_i(θ), σ_i²(θ)) is restricted to the interval (a_i, b_i), and set a_i′ = [a_i − μ_i(θ)]/σ_i(θ) and b_i′ = [b_i − μ_i(θ)]/σ_i(θ). Then ε_i conditional on {a_i < ε_i < b_i} has the truncated normal probability density function
\[
f(\varepsilon_i; \mu_i(\theta), \sigma_i(\theta), a_i, b_i)
= \frac{\phi([\varepsilon_i - \mu_i(\theta)]/\sigma_i(\theta))}{\sigma_i(\theta)\,[\Phi(b_i') - \Phi(a_i')]}. \tag{54}
\]
Then
\[
E_\theta(\varepsilon_i \mid \{a_i < \varepsilon_i < b_i\})
= \mu_i(\theta) + \frac{\phi(a_i') - \phi(b_i')}{\Phi(b_i') - \Phi(a_i')}\,\sigma_i(\theta) \tag{55}
\]
and
\[
\mathrm{Var}_\theta(\varepsilon_i \mid \{a_i < \varepsilon_i < b_i\})
= \sigma_i^2(\theta)\left[ 1 + \frac{a_i'\phi(a_i') - b_i'\phi(b_i')}{\Phi(b_i') - \Phi(a_i')}
- \left( \frac{\phi(a_i') - \phi(b_i')}{\Phi(b_i') - \Phi(a_i')} \right)^{\!2} \right]. \tag{56}
\]
Because equations (55) and (56) must be solved simultaneously for μ_i(θ) and σ_i(θ), evaluating the MLEs and carrying out the corresponding likelihood inference is time-consuming.
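Formulas (55) and (56) themselves are easy to implement. The following Python sketch (function and variable names are ours) computes the mean and variance of a doubly truncated normal distribution using only the standard library:

```python
from statistics import NormalDist

_nd = NormalDist()  # standard normal: _nd.pdf = phi, _nd.cdf = Phi

def truncnorm_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) truncated to (a, b), Eqs. (55)-(56)."""
    a0 = (a - mu) / sigma
    b0 = (b - mu) / sigma
    Z = _nd.cdf(b0) - _nd.cdf(a0)          # normalizing constant Phi(b') - Phi(a')
    dphi = _nd.pdf(a0) - _nd.pdf(b0)
    mean = mu + sigma * dphi / Z           # Eq. (55)
    var = sigma**2 * (1.0                  # Eq. (56)
                      + (a0 * _nd.pdf(a0) - b0 * _nd.pdf(b0)) / Z
                      - (dphi / Z)**2)
    return mean, var
```

For symmetric truncation around μ the mean is unchanged and the variance shrinks, which is a convenient sanity check.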
Consider the following transformed truncated normal median regression model:
\[
h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \tag{57}
\]
for i = 1, ..., n, where the ε_i's are independent errors distributed as either N(0, g²(f(x_i; β), z_i; γ) σ²) or truncated N(μ_i(θ), σ_i²(θ)). Notice that, for i = 1, ..., n, f(x_i; β) is the median of h(y_i; λ), and g(f(x_i; β), z_i; γ) σ is the interquartile range of h(y_i; λ).
One way to obtain the μ_i(θ)'s is to use the Newton-Raphson method, but μ_i(θ) generally has no closed form that can be evaluated directly, so evaluating the μ_i(θ)'s takes too much time.
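To illustrate why each μ_i(θ) is costly: it solves a nonlinear equation of the form "the truncated mean equals a target value," which must be done iteratively for every observation. A minimal sketch (our own names; bisection is used in place of Newton-Raphson for robustness) for a normal distribution truncated above at b:

```python
from statistics import NormalDist

_nd = NormalDist()

def trunc_mean(mu, sigma, b):
    """Mean of N(mu, sigma^2) truncated to (-inf, b)."""
    b0 = (b - mu) / sigma
    return mu - sigma * _nd.pdf(b0) / _nd.cdf(b0)

def solve_mu(target, sigma, b, tol=1e-10, max_iter=200):
    """Find mu with trunc_mean(mu, sigma, b) == target (requires target < b).
    trunc_mean is increasing in mu and always below mu, so mu >= target."""
    lo, hi = target, target + 10.0 * sigma
    while trunc_mean(hi, sigma, b) < target:   # widen bracket if needed
        hi += 10.0 * sigma
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if trunc_mean(mid, sigma, b) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

One such root-finding problem per observation, nested inside the outer likelihood maximization, is exactly the computational burden the mean parameterization incurs.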
In this paper, we propose the transformed truncated normal mode regression model. An important advantage of the proposed model is that its MLEs are easy and fast to compute. Under the proposed model, we use the MLEs and the likelihood function to carry out hypothesis testing and to construct statistical intervals, and we compare the MLEs under the truncated normality assumption with those under the false normality assumption.
Under the false normality assumption, the log-likelihood function for θ is
\[
\log[L(\theta)] \equiv \ell(\theta) \equiv \sum_{i=1}^{n} \ell_i(\theta), \tag{58}
\]
where
\[
\ell_i(\theta) = \log[\phi(e_i(\theta))] + \log[h_i'(\lambda)] - \log[g_i(\beta, \gamma)] - \log(\sigma). \tag{59}
\]
Then the score function for θ is
\[
\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ell_i(\theta)}{\partial \theta} \equiv \sum_{i=1}^{n} S_i(\theta) \equiv S(\theta). \tag{60}
\]
We compare equations (59) and (60) with equations (16) and (19).
Suppose first that the standard deviation is fixed. If the sample size is not large, then the difference between the score functions for θ under the false normality assumption and under the truncated normality assumption is small. Hence, the MLEs under the false normality assumption are similar to those under the truncated normality assumption.
Suppose instead that the sample size is fixed. If the standard deviation is very small, then e_i(b; θ) generally tends to ∞ and e_i(a; θ) tends to −∞, so the truncation-adjustment terms in the score function vanish. Thus the difference between the score functions for θ under the two assumptions is again small, and the MLEs under the false normality assumption are similar to those under the truncated normality assumption.
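This claim is easy to check numerically. Under the Box-Cox example of Appendix A (a = 0, λ > 0, g_i = 1), the only term the truncated-normality score adds is the factor φ(e_i(0; θ))/[1 − Φ(e_i(0; θ))] at the standardized truncation point. The sketch below (names ours) shows it decaying rapidly as σ decreases:

```python
from statistics import NormalDist

_nd = NormalDist()

def truncation_correction(mean, lam, sigma):
    """Magnitude of the extra score factor phi(e0) / [1 - Phi(e0)], where
    e0 = (-1/lam - mean)/sigma is the standardized truncation point (lam > 0).
    As sigma -> 0, e0 -> -inf and the factor vanishes."""
    e0 = (-1.0 / lam - mean) / sigma
    return _nd.pdf(e0) / (1.0 - _nd.cdf(e0))
```

For instance, with mean = 1 and λ = 1/2, halving σ from 1 to 0.5 shrinks the factor by several orders of magnitude, so the two score functions nearly coincide.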
In Tables 2-5, there are no significant differences between the MLEs under the false normality assumption and those under the truncated normality assumption. A possible reason is that the sample size in Example 3.1 is not large.
In Tables 6-10, the MLEs under the false normality assumption are nearly the same as those under the truncated normality assumption. Possible reasons are that λ̂ and σ̂ are close to 0 and that the sample size in Example 3.2 is also not large.
When the range of the response transformation is possibly different from ℝ, the likelihood inference under the conventional normality assumption is inappropriate and thus should not be used. Therefore, when the range of the response transformation is possibly different from ℝ, we may assume that the proposed model holds and use the likelihood inference under the proposed model in Section 2.
References
[1] Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26, 211–252.
[2] Carroll, R. J. and Ruppert, D. (1988) Transformation and Weighting in Regression. Chapman and Hall, New York.
[3] Chen, C.-R. and Wang, L.-C. (2003) Likelihood inference under the general response transformation model with heteroscedastic errors. Taiwanese Journal of Mathematics, 7, 261–273.
[4] John, J. A. and Draper, N. R. (1980) An alternative family of transformations. Applied Statistics, 29, 190–197.
[5] Johnson, N. L. and Kotz, S. (1994) Distributions in Statistics: Continuous Univariate Distributions. John Wiley & Sons, New York.
[6] McLachlan, G. J. (2008) The EM Algorithm and Extensions. John Wiley & Sons, Hoboken.
[7] Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression. Addison-Wesley, Reading, Massachusetts.
[8] Poirier, D. J. (1978) The use of the Box–Cox transformation in limited dependent variable models. Journal of the American Statistical Association, 73, 284–287.
[9] Prakasa Rao, B. L. S. (1999) Semimartingales and Their Statistical Inference. Chapman & Hall/CRC, Boca Raton.
[10] Tukey, J. W. (1957) On the comparative anatomy of transformations. The Annals of Mathematical Statistics, 28, 602–632.
Appendix A
For i = 1, ..., n,
\[
S_i(\theta) = \frac{\phi'(e_i(\theta))}{\phi(e_i(\theta))}\,\frac{\partial e_i(\theta)}{\partial \theta}
+ \frac{\partial h_i'(\lambda)/\partial \theta}{h_i'(\lambda)}
- \frac{\partial \sigma/\partial \theta}{\sigma}
- \frac{\partial g_i(\beta, \gamma)/\partial \theta}{g_i(\beta, \gamma)}
- \frac{\partial \Phi(e_i(b; \theta))/\partial \theta - \partial \Phi(e_i(a; \theta))/\partial \theta}{\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))},
\]
where
\[
\phi'(e_i(\theta)) = -e_i(\theta)\,\phi(e_i(\theta))
\]
and, for e_i(u; θ) ∈ ℝ,
\[
\frac{\partial e_i(u; \theta)}{\partial \lambda} = \frac{\partial h(u; \lambda)/\partial \lambda}{\sigma\, g_i(\beta, \gamma)},
\qquad
\frac{\partial e_i(u; \theta)}{\partial \beta} = -\frac{\partial f(x_i; \beta)/\partial \beta}{\sigma\, g_i(\beta, \gamma)}
- \frac{h(u; \lambda) - f(x_i; \beta)}{\sigma\, g_i^2(\beta, \gamma)}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta},
\]
\[
\frac{\partial e_i(u; \theta)}{\partial \sigma} = -\frac{e_i(u; \theta)}{\sigma},
\qquad
\frac{\partial e_i(u; \theta)}{\partial \gamma} = -\frac{e_i(u; \theta)}{g_i(\beta, \gamma)}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma},
\qquad
\frac{\partial \Phi(e_i(u; \theta))}{\partial \theta} = \phi(e_i(u; \theta))\,\frac{\partial e_i(u; \theta)}{\partial \theta}\,1_{\mathbb{R}}(e_i(u; \theta)).
\]
As an example, when a = 0, b = ∞, h(u; λ) = u^{(λ)}, f_i(β) = x_i^T β, and g_i(β, γ) = 1 for u ∈ (0, ∞) and i = 1, ..., n,
\[
S_i(\theta) = -e_i(\theta)\,\frac{\partial e_i(\theta)}{\partial \theta}
+ y_i^{1-\lambda}\,\frac{\partial h_i'(\lambda)}{\partial \theta}
- \sigma^{-1}\,\frac{\partial \sigma}{\partial \theta}
+ \frac{\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))}\,\frac{\partial e_i(0; \theta)}{\partial \theta},
\]
where
\[
e_i(\theta) = \frac{y_i^{(\lambda)} - x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \lambda} = \frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma \lambda},
\qquad
\frac{\partial e_i(\theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \sigma} = -\frac{e_i(\theta)}{\sigma},
\]
\[
e_i(0; \theta) = -\frac{1/\lambda + x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \lambda} = \frac{1}{\sigma \lambda^2},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \sigma} = -\frac{e_i(0; \theta)}{\sigma},
\]
\[
\frac{\partial \sigma}{\partial \sigma} = 1,
\qquad
h_i'(\lambda) = y_i^{\lambda - 1},
\qquad
\frac{\partial h_i'(\lambda)}{\partial \lambda} = \log(y_i)\, y_i^{\lambda - 1}.
\]
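The closed-form score above can be validated against numerical differentiation. The sketch below (our own function names; scalar x_i and β for simplicity; a = 0, b = ∞, g_i = 1, λ > 0) implements the per-observation log-likelihood ℓ_i and score S_i for the Box-Cox example:

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def boxcox(y, lam):
    return (y**lam - 1.0) / lam  # lam != 0 assumed

def loglik_i(y, x, lam, beta, sigma):
    """Per-observation log-likelihood: log phi(e_i) + (lam-1) log y
    - log sigma - log[1 - Phi(e_i(0))] (truncated normality, a = 0)."""
    e = (boxcox(y, lam) - x * beta) / sigma
    e0 = (-1.0 / lam - x * beta) / sigma      # standardized truncation point
    return (math.log(_nd.pdf(e)) + (lam - 1.0) * math.log(y)
            - math.log(sigma) - math.log(1.0 - _nd.cdf(e0)))

def score_i(y, x, lam, beta, sigma):
    """Closed-form score (Appendix A) for theta = (lam, beta, sigma)."""
    e = (boxcox(y, lam) - x * beta) / sigma
    e0 = (-1.0 / lam - x * beta) / sigma
    mills = _nd.pdf(e0) / (1.0 - _nd.cdf(e0))
    de = {"lam": (math.log(y) * y**lam - boxcox(y, lam)) / (sigma * lam),
          "beta": -x / sigma, "sigma": -e / sigma}
    de0 = {"lam": 1.0 / (sigma * lam**2), "beta": -x / sigma, "sigma": -e0 / sigma}
    return {k: (-e * de[k]
                + (math.log(y) if k == "lam" else 0.0)   # y^(1-lam) dh'/dlam = log y
                - (1.0 / sigma if k == "sigma" else 0.0)
                + mills * de0[k])
            for k in ("lam", "beta", "sigma")}
```

Central finite differences of `loglik_i` in each coordinate should reproduce `score_i` to several decimal places, which is a cheap check on the algebra above.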
Appendix B
For i = 1, ..., n,
\[
J_i(\theta) = -\frac{\phi''(e_i(\theta))\,[\partial e_i(\theta)/\partial \theta][\partial e_i(\theta)/\partial \theta^T] + \phi'(e_i(\theta))\,[\partial^2 e_i(\theta)/\partial \theta\, \partial \theta^T]}{\phi(e_i(\theta))}
+ \frac{[\phi'(e_i(\theta))]^2\,[\partial e_i(\theta)/\partial \theta][\partial e_i(\theta)/\partial \theta^T]}{\phi^2(e_i(\theta))}
\]
\[
\quad - \frac{\partial^2 h_i'(\lambda)/\partial \theta\, \partial \theta^T}{h_i'(\lambda)}
+ \frac{[\partial h_i'(\lambda)/\partial \theta][\partial h_i'(\lambda)/\partial \theta^T]}{(h_i')^2(\lambda)}
+ \frac{\partial^2 \sigma/\partial \theta\, \partial \theta^T}{\sigma}
- \frac{[\partial \sigma/\partial \theta][\partial \sigma/\partial \theta^T]}{\sigma^2}
\]
\[
\quad + \frac{\partial^2 g_i(\beta, \gamma)/\partial \theta\, \partial \theta^T}{g_i(\beta, \gamma)}
- \frac{[\partial g_i(\beta, \gamma)/\partial \theta][\partial g_i(\beta, \gamma)/\partial \theta^T]}{g_i^2(\beta, \gamma)}
+ \frac{\partial^2 \Phi(e_i(b; \theta))/\partial \theta\, \partial \theta^T - \partial^2 \Phi(e_i(a; \theta))/\partial \theta\, \partial \theta^T}{\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))}
\]
\[
\quad - \frac{[\partial \Phi(e_i(b; \theta))/\partial \theta - \partial \Phi(e_i(a; \theta))/\partial \theta][\partial \Phi(e_i(b; \theta))/\partial \theta^T - \partial \Phi(e_i(a; \theta))/\partial \theta^T]}{[\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]^2},
\]
where
\[
\phi'(e_i(\theta)) = -e_i(\theta)\,\phi(e_i(\theta)),
\qquad
\phi''(e_i(\theta)) = [e_i^2(\theta) - 1]\,\phi(e_i(\theta)),
\]
the first derivatives of e_i(u; θ) are as in Appendix A, and, for e_i(u; θ) ∈ ℝ,
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \lambda^T} = \frac{\partial^2 h(u; \lambda)/\partial \lambda\, \partial \lambda^T}{g_i(\beta, \gamma)\, \sigma},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \beta\, \partial \beta^T}
= -\frac{\partial^2 f_i(\beta)/\partial \beta\, \partial \beta^T}{g_i(\beta, \gamma)\, \sigma}
+ \frac{[\partial f_i(\beta)/\partial \beta][\partial g_i(\beta, \gamma)/\partial \beta^T] + [\partial g_i(\beta, \gamma)/\partial \beta][\partial f_i(\beta)/\partial \beta^T]}{g_i^2(\beta, \gamma)\, \sigma}
+ 2\,\frac{h(u; \lambda) - f_i(\beta)}{g_i^3(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta^T}
- \frac{h(u; \lambda) - f_i(\beta)}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial^2 g_i(\beta, \gamma)}{\partial \beta\, \partial \beta^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \sigma^2} = 2\,\frac{e_i(u; \theta)}{\sigma^2},
\qquad
\frac{\partial^2 e_i(u; \theta)}{\partial \gamma\, \partial \gamma^T}
= 2\,\frac{e_i(u; \theta)}{g_i^2(\beta, \gamma)}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T}
- \frac{e_i(u; \theta)}{g_i(\beta, \gamma)}\,\frac{\partial^2 g_i(\beta, \gamma)}{\partial \gamma\, \partial \gamma^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \beta^T} = -\frac{\partial h(u; \lambda)/\partial \lambda}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta^T},
\qquad
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \sigma} = -\frac{\partial h(u; \lambda)/\partial \lambda}{g_i(\beta, \gamma)\, \sigma^2},
\qquad
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \gamma^T} = -\frac{\partial h(u; \lambda)/\partial \lambda}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \beta\, \partial \sigma}
= \frac{\partial f_i(\beta)/\partial \beta}{g_i(\beta, \gamma)\, \sigma^2}
+ \frac{h(u; \lambda) - f_i(\beta)}{g_i^2(\beta, \gamma)\, \sigma^2}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \beta\, \partial \gamma^T}
= \frac{\partial f_i(\beta)/\partial \beta}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T}
+ 2\,\frac{h(u; \lambda) - f_i(\beta)}{g_i^3(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T}
- \frac{h(u; \lambda) - f_i(\beta)}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial^2 g_i(\beta, \gamma)}{\partial \beta\, \partial \gamma^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \sigma\, \partial \gamma} = \frac{e_i(u; \theta)}{g_i(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma},
\qquad
\frac{\partial \Phi(e_i(u; \theta))}{\partial \theta} = \phi(e_i(u; \theta))\,\frac{\partial e_i(u; \theta)}{\partial \theta}\,1_{\mathbb{R}}(e_i(u; \theta)),
\]
and
\[
\frac{\partial^2 \Phi(e_i(u; \theta))}{\partial \theta\, \partial \theta^T}
= -e_i(u; \theta)\,\phi(e_i(u; \theta))\,\frac{\partial e_i(u; \theta)}{\partial \theta}\,\frac{\partial e_i(u; \theta)}{\partial \theta^T}\,1_{\mathbb{R}}(e_i(u; \theta))
+ \phi(e_i(u; \theta))\,\frac{\partial^2 e_i(u; \theta)}{\partial \theta\, \partial \theta^T}\,1_{\mathbb{R}}(e_i(u; \theta)).
\]
As an example, when a = 0, b = ∞, h(u; λ) = u^{(λ)}, f_i(β) = x_i^T β, and g_i(β, γ) = 1 for u ∈ (0, ∞) and i = 1, ..., n,
\[
J_i(\theta) = [1 - e_i^2(\theta)]\,\frac{\partial e_i(\theta)}{\partial \theta}\frac{\partial e_i(\theta)}{\partial \theta^T}
+ e_i(\theta)\,\frac{\partial^2 e_i(\theta)}{\partial \theta\, \partial \theta^T}
+ e_i^2(\theta)\,\frac{\partial e_i(\theta)}{\partial \theta}\frac{\partial e_i(\theta)}{\partial \theta^T}
- y_i^{1-\lambda}\,\frac{\partial^2 h_i'(\lambda)}{\partial \theta\, \partial \theta^T}
+ y_i^{2(1-\lambda)}\,\frac{\partial h_i'(\lambda)}{\partial \theta}\frac{\partial h_i'(\lambda)}{\partial \theta^T}
\]
\[
\quad + \sigma^{-1}\,\frac{\partial^2 \sigma}{\partial \theta\, \partial \theta^T}
- \sigma^{-2}\,\frac{\partial \sigma}{\partial \theta}\frac{\partial \sigma}{\partial \theta^T}
+ \frac{e_i(0; \theta)\,\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))}\,\frac{\partial e_i(0; \theta)}{\partial \theta}\frac{\partial e_i(0; \theta)}{\partial \theta^T}
- \frac{\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))}\,\frac{\partial^2 e_i(0; \theta)}{\partial \theta\, \partial \theta^T}
- \left[ \frac{\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))} \right]^2 \frac{\partial e_i(0; \theta)}{\partial \theta}\frac{\partial e_i(0; \theta)}{\partial \theta^T},
\]
where
\[
e_i(\theta) = \frac{y_i^{(\lambda)} - x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \lambda} = \frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma \lambda},
\qquad
\frac{\partial e_i(\theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \sigma} = -\frac{e_i(\theta)}{\sigma},
\]
\[
\frac{\partial^2 e_i(\theta)}{\partial \lambda^2} = \frac{[\log(y_i)]^2 y_i^{\lambda} - [\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}]/\lambda}{\sigma \lambda}
- \frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma \lambda^2},
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \beta\, \partial \beta^T} = 0,
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \sigma^2} = 2\,\frac{e_i(\theta)}{\sigma^2},
\]
\[
\frac{\partial^2 e_i(\theta)}{\partial \lambda\, \partial \beta} = 0,
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \lambda\, \partial \sigma} = -\frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma^2 \lambda},
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \beta\, \partial \sigma} = \frac{x_i}{\sigma^2},
\]
\[
e_i(0; \theta) = -\frac{1/\lambda + x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \lambda} = \frac{1}{\sigma \lambda^2},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \sigma} = -\frac{e_i(0; \theta)}{\sigma},
\]
\[
\frac{\partial^2 e_i(0; \theta)}{\partial \lambda^2} = -\frac{2}{\sigma \lambda^3},
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \beta\, \partial \beta^T} = 0,
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \sigma^2} = 2\,\frac{e_i(0; \theta)}{\sigma^2},
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \lambda\, \partial \beta} = 0,
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \lambda\, \partial \sigma} = -\frac{1}{\sigma^2 \lambda^2},
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \beta\, \partial \sigma} = \frac{x_i}{\sigma^2},
\]
\[
\frac{\partial \sigma}{\partial \sigma} = 1,
\qquad
h_i'(\lambda) = y_i^{\lambda - 1},
\qquad
\frac{\partial h_i'(\lambda)}{\partial \lambda} = \log(y_i)\, y_i^{\lambda - 1},
\qquad
\frac{\partial^2 h_i'(\lambda)}{\partial \lambda^2} = [\log(y_i)]^2\, y_i^{\lambda - 1}.
\]
Appendix C
Consider the following Box-Cox transformed truncated normal two-way ANOVA model:
\[
y_{ijk}^{(\lambda)} = \mu_{ij} + \varepsilon_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}
\]
for i = 1, ..., a; j = 1, ..., b; and k = 1, ..., n, where a, b, n ∈ {2, 3, ...} and the ε_{ijk}'s are independent errors distributed as either N(0, σ²) or truncated N(0, σ²) with unknown positive standard deviation σ.
(i) Choose several initial values λ̂^{(0)} in a non-empty set S, e.g., S = {−2, −7/4, −3/2, −5/4, −1, −3/4, −1/2, −1/4, 0, 1/4, 1/2, 3/4, 1, 5/4, 3/2, 7/4, 2}.
(ii) For each λ̂^{(0)} in S, choose the initial values
\[
\hat\mu^{(0)} \equiv \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\qquad
\hat\tau_i^{(0)} \equiv \bar y_{i\cdot\cdot}^{(\hat\lambda^{(0)})} - \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\qquad
\hat\beta_j^{(0)} \equiv \bar y_{\cdot j\cdot}^{(\hat\lambda^{(0)})} - \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\]
\[
\widehat{(\tau\beta)}_{ij}^{(0)} \equiv \bar y_{ij\cdot}^{(\hat\lambda^{(0)})} - \bar y_{i\cdot\cdot}^{(\hat\lambda^{(0)})} - \bar y_{\cdot j\cdot}^{(\hat\lambda^{(0)})} + \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\qquad
\hat\sigma^{2(0)} \equiv \frac{1}{abn - ab - 1} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left[ y_{ijk}^{(\hat\lambda^{(0)})} - \bar y_{ij\cdot}^{(\hat\lambda^{(0)})} \right]^2
\]
for i = 1, ..., a; j = 1, ..., b; and k = 1, ..., n, where the dot averages of the transformed observations are
\[
\bar y_{\cdots}^{(\lambda)} \equiv \frac{1}{abn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^{(\lambda)},
\qquad
\bar y_{i\cdot\cdot}^{(\lambda)} \equiv \frac{1}{bn} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^{(\lambda)},
\qquad
\bar y_{\cdot j\cdot}^{(\lambda)} \equiv \frac{1}{an} \sum_{i=1}^{a} \sum_{k=1}^{n} y_{ijk}^{(\lambda)},
\qquad
\bar y_{ij\cdot}^{(\lambda)} \equiv \frac{1}{n} \sum_{k=1}^{n} y_{ijk}^{(\lambda)}.
\]
(iii) Denote these θ̂^{(0)}'s as θ̂^{(0,1)}, θ̂^{(0,2)}, ..., θ̂^{(0,|S|)}, where |S| denotes the number of elements in S. Choose θ̂^{(0)} as the θ̂^{(0,ℓ*)} with ℓ(θ̂^{(0,ℓ*)}) = max_{1 ≤ ℓ ≤ |S|} ℓ(θ̂^{(0,ℓ)}).
In Example 3.1, the choice λ̂^{(0)} = −3/4 yields the largest initial log-likelihood, ℓ(θ̂^{(0)}) = 55.6467; this θ̂^{(0)} is then used as the starting value for the iterations.
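The initialization in steps (i)-(ii) amounts to computing classical two-way ANOVA effect estimates on the Box-Cox-transformed data for each trial λ̂^{(0)}. A Python sketch of step (ii) (our own names; y given as an a × b × n nested list of positive responses):

```python
import math

def boxcox(y, lam):
    return (y**lam - 1.0) / lam if lam != 0 else math.log(y)

def initial_values(y, lam0):
    """ANOVA-style initial estimates of step (ii) for one fixed lambda^(0)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    z = [[[boxcox(y[i][j][k], lam0) for k in range(n)]
          for j in range(b)] for i in range(a)]
    grand = sum(z[i][j][k] for i in range(a) for j in range(b) for k in range(n)) / (a*b*n)
    row = [sum(z[i][j][k] for j in range(b) for k in range(n)) / (b*n) for i in range(a)]
    col = [sum(z[i][j][k] for i in range(a) for k in range(n)) / (a*n) for j in range(b)]
    cell = [[sum(z[i][j]) / n for j in range(b)] for i in range(a)]
    mu = grand
    tau = [r - grand for r in row]
    beta = [c - grand for c in col]
    tb = [[cell[i][j] - row[i] - col[j] + grand for j in range(b)] for i in range(a)]
    sse = sum((z[i][j][k] - cell[i][j])**2
              for i in range(a) for j in range(b) for k in range(n))
    sigma2 = sse / (a*b*n - a*b - 1)
    return mu, tau, beta, tb, sigma2
```

Looping this over the grid S and keeping the candidate with the largest log-likelihood reproduces step (iii); the effect estimates satisfy the usual sum-to-zero constraints by construction.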
Appendix D
Suppose that the transformed truncated normal mode two-way ANOVA model is
\[
y_{ijk}^{(\lambda)} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}
\]
for i = 1, 2, 3 and j, k = 1, 2, 3, 4, where each y_{ijk} has support (0, ∞) and the ε_{ijk}'s are independent errors distributed as either N(0, σ²) or truncated N(0, σ²) with unknown positive standard deviation σ and with support (−∞, −1/λ − μ − τ_i − β_j − (τβ)_{ij}) for λ < 0. Thus, the c.d.f. of ε_{ijk}/σ is
\[
P_\theta(\{\varepsilon_{ijk}/\sigma < u\}) = \frac{\Phi(u)}{\Phi([-1/\lambda - \mu - \tau_i - \beta_j - (\tau\beta)_{ij}]/\sigma)}.
\]
By the probability integral transformation,
\[
\frac{\Phi(\varepsilon_{ijk}/\sigma)}{\Phi([-1/\lambda - \mu - \tau_i - \beta_j - (\tau\beta)_{ij}]/\sigma)} \sim \mathrm{uniform}(0, 1),
\]
which implies that
\[
\Phi^{-1}\!\left( \frac{\Phi(\varepsilon_{ijk}/\sigma)}{\Phi([-1/\lambda - \mu - \tau_i - \beta_j - (\tau\beta)_{ij}]/\sigma)} \right) \sim N(0, 1).
\]
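For the normal probability plots, each residual is mapped to an N(0, 1) score via this probability integral transformation. A minimal Python sketch (names ours), where `upper` stands for the truncation point −1/λ − μ − τ_i − β_j − (τβ)_{ij}:

```python
from statistics import NormalDist

_nd = NormalDist()

def normal_score(eps, sigma, upper):
    """Map a truncated-normal residual eps with support (-inf, upper) to an
    N(0,1) score via the probability integral transformation above."""
    u = _nd.cdf(eps / sigma) / _nd.cdf(upper / sigma)  # ~ uniform(0,1)
    return _nd.inv_cdf(u)                              # ~ N(0,1)
```

When the truncation point is many standard deviations away, the denominator is essentially 1 and the score reduces to the usual standardized residual eps/σ, which is why the truncated and false-normality probability plots look so similar in the examples.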
Table 1: Survival times (1 unit = 10 hours) of animals in a 3×4 factorial experiment.

                    B (Treatment)
A (Poison)      1       2       3       4
    1         0.31    0.82    0.43    0.45
              0.45    1.10    0.45    0.71
              0.46    0.88    0.63    0.66
              0.43    0.72    0.76    0.62
    2         0.36    0.92    0.44    0.56
              0.29    0.61    0.35    1.02
              0.40    0.49    0.31    0.71
              0.23    1.24    0.40    0.38
    3         0.22    0.30    0.23    0.30
              0.21    0.37    0.25    0.36
              0.18    0.38    0.24    0.31
              0.23    0.29    0.22    0.33
Table 2: MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
λ̂            −0.8073            −0.8077
μ̂            −1.4175            −1.4179
τ̂1            0.6797             0.6799
τ̂2            0.2878             0.2879
β̂1           −0.7383            −0.7386
β̂2            0.6451             0.6453
β̂3           −0.2778            −0.2779
(τβ)̂11        0.1359             0.1360
(τβ)̂12       −0.0658            −0.0659
(τβ)̂13        0.2160             0.2161
(τβ)̂21       −0.1043            −0.1043
(τβ)̂22        0.1142             0.1142
(τβ)̂23       −0.1234            −0.1234
σ̂             0.3567             0.3569
Table 3: MLEs without interaction under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
λ̂            −0.7440            −0.7441
μ̂            −1.3576            −1.3577
τ̂1            0.6393             0.6394
τ̂2            0.2696             0.2697
β̂1           −0.7383            −0.6938
β̂2            0.6123             0.6124
β̂3           −0.2778            −0.2644
σ̂             0.3636             0.3637
Table 4: MLEs with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
μ̂            −1.6232            −1.6228
τ̂1            0.8213             0.8219
τ̂2            0.3526             0.3524
β̂1           −0.8961            −0.8965
β̂2            0.7596             0.7608
β̂3           −0.3240            −0.3244
(τβ)̂11        0.2111             0.2106
(τβ)̂12       −0.1211            −0.1193
(τβ)̂13        0.2632             0.2626
(τβ)̂21       −0.1017            −0.1015
(τβ)̂22        0.1126             0.1120
(τβ)̂23       −0.1194            −0.1192
σ̂             0.4235             0.4241
Table 5: MLEs without interaction and with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
μ̂            −1.6232            −1.6216
τ̂1            0.8213             0.8242
τ̂2            0.3527             0.3512
β̂1           −0.7383            −0.8978
β̂2            0.7596             0.7635
β̂3           −0.3240            −0.3256
σ̂             0.4607             0.4627
Table 6: Cycles to failure of worsted yarn: 3³ factorial experiment without replication.

x1    x2    x3    Cycles to failure
−1    −1    −1      674
−1    −1     0      370
−1    −1     1      292
−1     0    −1      338
−1     0     0      266
−1     0     1      210
−1     1    −1      170
−1     1     0      118
−1     1     1       90
 0    −1    −1     1414
 0    −1     0     1198
 0    −1     1      634
 0     0    −1     1022
 0     0     0      620
 0     0     1      438
 0     1    −1      442
 0     1     0      332
 0     1     1      220
 1    −1    −1     3636
 1    −1     0     3184
 1    −1     1     2000
 1     0    −1     1568
 1     0     0     1070
 1     0     1      566
 1     1    −1     1140
 1     1     0      884
 1     1     1      360
Table 7: MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
λ̂            −0.2158            −0.2158
β̂0            3.4929             3.4929
β̂1            0.2142             0.2142
β̂2           −0.1626            −0.1626
β̂3           −0.0954            −0.0954
β̂12           0.0541            −0.0541
β̂13           0.0232            −0.0232
β̂23          −0.0124            −0.0124
β̂11          −0.0219             0.0219
β̂22          −0.0030             0.0030
β̂33          −0.0164            −0.0164
σ̂             0.0435             0.0435
Table 8: MLEs without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
λ̂            −0.0363            −0.0363
β̂0            5.6577             5.6577
β̂1            0.6611             0.6611
β̂2           −0.5010            −0.5010
β̂3           −0.2950            −0.2950
σ̂             0.1541             0.1541
Table 9: MLEs with λ = 0 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
β̂0            6.4763             6.4763
β̂1            0.8324             0.8324
β̂2           −0.6310            −0.6310
β̂3           −0.3716            −0.3716
β̂12          −0.0383            −0.0383
β̂13          −0.0684            −0.0684
β̂23          −0.0208            −0.0208
β̂11          −0.1275            −0.1275
β̂22          −0.0176            −0.0176
β̂33          −0.0466            −0.0466
σ̂             0.1758             0.1758
Table 10: MLEs with λ = 0 and without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
β̂0            6.3486             6.3486
β̂1            0.8323             0.8323
β̂2           −0.6310            −0.6310
β̂3           −0.3716            −0.3716
σ̂             0.1950             0.1950
Figure 2: (a) Residual plot against fitted values for the original data under the two-way ANOVA effects model for Example 3.1. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.1.
Figure 3: (a) Normal probability plot under the false normality assumption for Example 3.1. (b) Normal probability plot under the truncated normality assumption for Example 3.1.
Figure 4: (a) Normal probability plot under the false normality assumption without interactions for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions for Example 3.1.
Figure 5: (a) Normal probability plot under the false normality assumption with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption with λ = −1 for Example 3.1.
Figure 6: (a) Normal probability plot under the false normality assumption without interactions and with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions and with λ = −1 for Example 3.1.
Figure 7: (a) Residual plot against fitted values for the original data under the quadratic regression model for Example 3.2. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.2.
Figure 8: (a) Normal probability plot under the false normality assumption for Example 3.2. (b) Normal probability plot under the truncated normality assumption for Example 3.2.
Figure 9: (a) Normal probability plot under the false normality assumption without quadratic effects and interactions for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions for Example 3.2.
Figure 10: (a) Normal probability plot under the false normality assumption with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption with λ = 0 for Example 3.2.
Figure 11: (a) Normal probability plot under the false normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2.