National Chiao Tung University
Institute of Statistics
Master's Thesis

Likelihood Inference under the Transformed Truncated Normal
Mode Regression Model
(變換截常態眾數迴歸模型的概似推論)

Student: Yen-Tin Sen
Advisor: Dr. Chih-Rung Chen
A Thesis
Submitted to Institute of Statistics
College of Science
National Chiao Tung University
In Partial Fulfillment of the Requirements
for the Degree of Master
in
Statistics
June 2010
Hsinchu, Taiwan
Likelihood Inference under the Transformed Truncated Normal Mode Regression Model
(變換截常態眾數迴歸模型的概似推論)

Student: Yen-Tin Sen
Advisor: Dr. Chih-Rung Chen
Institute of Statistics, National Chiao Tung University

Abstract (Chinese)

After a data set is transformed, the range of the transformed values may exclude part of the real line; in that case, the transformed data cannot satisfy the conventional normality assumption. We therefore propose a transformed truncated normal mode regression model together with its likelihood inference, apply it to two real data sets, and compare the truncated normality assumption with the conventional normality assumption. Finally, we compare the computational complexity of the transformed truncated normal mode, mean, and median regression models.
Likelihood Inference under the Transformed Truncated
Normal Mode Regression Model
Student: Yen-Tin Sen
Advisor: Dr. Chih-Rung Chen
Institute of Statistics
National Chiao Tung University
Abstract
In this thesis, likelihood inference under the transformed truncated normal mode regression model is proposed for the case where the range of the transformation may differ from the whole real line. The proposed methodology is then applied to two real data sets in Box and Cox (1964), where the truncated normality assumption is compared with the conventional normality assumption. Finally, the proposed model is compared with the transformed truncated normal mean and median regression models in terms of computational complexity.
Acknowledgements

I am most grateful to my advisor, Dr. Chih-Rung Chen, whose guidance made this thesis possible. He devoted far more effort to his students than I could have imagined, and his enthusiasm for research taught me not only how to solve problems but also the importance of care and rigor. I will take him as a model and carry that passion and spirit of inquiry forward. I also thank the faculty of the Institute for their teaching over the past two years, the administrative staff for their help when I first arrived in this new environment, and my classmates, whose mutual learning and friendly competition benefited me greatly. I am glad to have spent these two years at the Institute of Statistics with them.

Finally, I thank my parents for providing a stable life that allowed me to concentrate on my studies without other worries. It is their support, my advisor's guidance, and my classmates' care that enabled me to complete my degree smoothly.

Yen-Tin Sen
Institute of Statistics, National Chiao Tung University
June 2010
Contents

1 Introduction
2 Transformed Truncated Normal Mode Regression Model
  2.1 Transformed Truncated Normal Mode Regression Model
  2.2 Maximum Likelihood Estimation
  2.3 Hypothesis Testing and Confidence Regions
  2.4 Prediction Region of Future Observations
3 Two Real Data Sets
  3.1 A Biological Experiment Using a 3×4 Factorial Design
  3.2 A Textile Experiment Using a Single Replicate of a 3^3 Design
4 Conclusions and Discussion
List of Tables

1  Survival times (1 unit = 10 hours) of animals in a 3×4 factorial experiment.
2  MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
3  MLEs without interaction under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
4  MLEs with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
5  MLEs without interaction and with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.
6  Cycles to failure of worsted yarn: a 3^3 factorial experiment without replication.
7  MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
8  MLEs without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
9  MLEs with λ = 0 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
10 MLEs with λ = 0 and without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.
List of Figures

1  Some different modified power transformations.
2  (a) Residual plot against fitted values for the original data under the two-way ANOVA effects model for Example 3.1. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.1.
3  (a) Normal probability plot under the false normality assumption for Example 3.1. (b) Normal probability plot under the truncated normality assumption for Example 3.1.
4  (a) Normal probability plot under the false normality assumption without interactions for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions for Example 3.1.
5  (a) Normal probability plot under the false normality assumption with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption with λ = −1 for Example 3.1.
6  (a) Normal probability plot under the false normality assumption without interactions and with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions and with λ = −1 for Example 3.1.
7  (a) Residual plot against fitted values for the original data under the quadratic regression model for Example 3.2. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.2.
8  (a) Normal probability plot under the false normality assumption for Example 3.2. (b) Normal probability plot under the truncated normality assumption for Example 3.2.
9  (a) Normal probability plot under the false normality assumption without quadratic effects and interactions for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions for Example 3.2.
10 (a) Normal probability plot under the false normality assumption with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption with λ = 0 for Example 3.2.
11 (a) Normal probability plot under the false normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2.
1 Introduction
The techniques for linear models are justified by assuming simplicity of systematic structure, constancy of error variances, normality of distributions, and independence of responses. In analyzing data which do not satisfy the traditional assumptions for linear models, Tukey (1957) suggested two alternatives: either a new analysis must be devised or the data must be transformed to satisfy the assumptions. If a satisfactory transformation can be found, it is usually easier to use the conventional techniques for linear models to analyze the transformed data than to develop a new method to analyze the original data.
It is common practice simply to assume the following normal regression model:
$$y_i = f(x_i; \beta) + \varepsilon_i \qquad (1)$$
for $i = 1, \ldots, n$, where $y_i$ is the response for subject $i$; $x_i$ is a known covariate vector for subject $i$; $\beta$ is an unknown finite-dimensional regression parameter vector; $f(\cdot; \beta)$ is a known regression function for each $\beta$, e.g., $f(x_i; \beta) = x_i^T \beta$ or $\exp\{x_i^T \beta\}$; and the $\varepsilon_i$s are i.i.d. $N(0, \sigma^2)$ errors with unknown positive standard deviation $\sigma$. Notice that the mean, median, and mode of $y_i$ all equal $f(x_i; \beta)$ for $i = 1, \ldots, n$.
When there are heteroscedastic errors and/or departures from normality in the data, one possible approach is to transform the data. A widely used family of transformations for positive continuous data is the family of modified power transformations
$$u^{(\lambda)} \equiv \begin{cases} (u^\lambda - 1)/\lambda & \text{for } \lambda \neq 0, \\ \log(u) & \text{for } \lambda = 0. \end{cases} \qquad (2)$$
Figure 1 shows some different modified power transformations.
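The family in (2) takes only a few lines to implement; the following Python sketch (the function name `modified_power` is ours, not from the thesis) also illustrates the behavior of the family near $\lambda = 0$:

```python
import numpy as np

def modified_power(u, lam):
    """Modified power transformation (2): (u^lam - 1)/lam for lam != 0,
    log(u) for lam = 0; defined for positive u."""
    u = np.asarray(u, dtype=float)
    if lam == 0.0:
        return np.log(u)
    return (u**lam - 1.0) / lam
```

For fixed $u > 0$, $(u^\lambda - 1)/\lambda \to \log u$ as $\lambda \to 0$, so the family is continuous in $\lambda$, which is what Figure 1 displays across several values of $\lambda$.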
In such situations, Box and Cox (1964) proposed the following Box-Cox transformed linear normal regression model:
$$y_i^{(\lambda)} = x_i^T \beta + \varepsilon_i \qquad (3)$$
for $i = 1, \ldots, n$, where $y_i$ has support $(0, \infty)$ and $\lambda$ is an unknown real-valued transformation parameter.
When heteroscedastic errors and departures from normality cannot be simultaneously removed from the data by any single transformation, Carroll and Ruppert (1988) proposed the following Box-Cox transformed heteroscedastic normal regression model:
$$y_i^{(\lambda)} = f(x_i; \beta) + \varepsilon_i \qquad (4)$$
for $i = 1, \ldots, n$, where the $\varepsilon_i$s are independent errors distributed as $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ such that $z_i$ is a known covariate vector for subject $i$, e.g., $z_i$ is a known function of $x_i$; $\gamma$ is an unknown finite-dimensional parameter vector; $g(\cdot, \cdot; \gamma)$ is a known positive function for each $\gamma$, e.g., $g(f(x_i; \beta), z_i; \gamma) = \exp\{f(x_i; \beta)\, \gamma_1 + z_i^T \gamma_2\}$ with $\gamma \equiv (\gamma_1, \gamma_2^T)^T$; and $\sigma$ is an unknown positive scale parameter. In Carroll and Ruppert (1988), the constants $1/g^2(f(x_i; \beta), z_i; \gamma)$ are called the true weights. Notice that model (3) is a special case of model (4) when $f(x_i; \beta) = x_i^T \beta$ and $g(f(x_i; \beta), z_i; \gamma) = 1$ for $i = 1, \ldots, n$.
However, $y_i^{(\lambda)} \in (-1/\lambda, \infty)$ for $\lambda > 0$, $(-\infty, \infty)$ ($\equiv \mathbb{R}$) for $\lambda = 0$, and $(-\infty, -1/\lambda)$ for $\lambda < 0$, respectively. Thus, except for $\lambda = 0$, $y_i^{(\lambda)}$ cannot be normally distributed. Hence, Poirier (1978) modified model (3) to the following Box-Cox transformed linear truncated normal mode regression model:
$$y_i^{(\lambda)} = x_i^T \beta + \varepsilon_i \qquad (5)$$
for $i = 1, \ldots, n$, where the $\varepsilon_i$s are independent errors distributed as either $N(0, \sigma^2)$ for $\lambda = 0$ or truncated $N(0, \sigma^2)$ for $\lambda \neq 0$ with unknown positive scale parameter $\sigma$. Notice that, for $i = 1, \ldots, n$, $x_i^T \beta$ is the mode of $y_i^{(\lambda)}$ when it lies in the support of $y_i^{(\lambda)}$; however, it is generally neither the mean nor the median of $y_i^{(\lambda)}$.
In Chen and Wang (2003), three widely used families of transformations with ranges possibly different from $\mathbb{R}$ are reviewed as follows:

Example 1.1 The family of shifted power transformations (Box and Cox, 1964)
$$h(u; \lambda) \equiv (u - a)^{(\lambda)} = \begin{cases} [(u - a)^\lambda - 1]/\lambda & \text{for } \lambda \neq 0, \\ \log(u - a) & \text{for } \lambda = 0, \end{cases} \qquad (6)$$
can be used to transform data with known support $(a, \infty)$, where $a \in \mathbb{R}$, e.g., $a = 0$. Then the range $h((a, \infty); \lambda)$ is $(-1/\lambda, \infty)$ for $\lambda > 0$, $\mathbb{R}$ for $\lambda = 0$, and $(-\infty, -1/\lambda)$ for $\lambda < 0$, respectively. Similarly, the family of transformations
$$h(u; \lambda) \equiv -(b - u)^{(\lambda)} = \begin{cases} [1 - (b - u)^\lambda]/\lambda & \text{for } \lambda \neq 0, \\ -\log(b - u) & \text{for } \lambda = 0, \end{cases} \qquad (7)$$
can be used to transform data with known support $(-\infty, b)$, where $b \in \mathbb{R}$, e.g., $b = 0$. Then the range $h((-\infty, b); \lambda)$ is $(-\infty, 1/\lambda)$ for $\lambda > 0$, $\mathbb{R}$ for $\lambda = 0$, and $(1/\lambda, \infty)$ for $\lambda < 0$, respectively.
Example 1.2 The family of folded power transformations (Tukey, 1977)
$$h(u; \lambda) \equiv (u - a)^{(\lambda)} - (b - u)^{(\lambda)} = \begin{cases} [(u - a)^\lambda - (b - u)^\lambda]/\lambda & \text{for } \lambda \neq 0, \\ \log[(u - a)/(b - u)] & \text{for } \lambda = 0, \end{cases} \qquad (8)$$
can be used to transform data with known support $(a, b)$, where $-\infty < a < b < \infty$, e.g., $(a, b) = (0, 1)$. Then the range $h((a, b); \lambda)$ is $(-(b - a)^\lambda/\lambda, (b - a)^\lambda/\lambda)$ for $\lambda > 0$ and $\mathbb{R}$ for $\lambda \leq 0$, respectively.
Example 1.3 The family of shifted modulus power transformations (John and Draper, 1980)
$$h(u; \lambda) \equiv \mathrm{sgn}(u - \lambda_2)\,(|u - \lambda_2| + 1)^{(\lambda_1)} = \begin{cases} \mathrm{sgn}(u - \lambda_2)\,[(|u - \lambda_2| + 1)^{\lambda_1} - 1]/\lambda_1 & \text{for } \lambda_1 \neq 0, \\ \mathrm{sgn}(u - \lambda_2)\,\log(|u - \lambda_2| + 1) & \text{for } \lambda_1 = 0, \end{cases} \qquad (9)$$
can be used to transform data with support $\mathbb{R}$, where $\lambda \equiv (\lambda_1, \lambda_2)^T$ and $\mathrm{sgn}(u) = 1$ for $u > 0$, $0$ for $u = 0$, and $-1$ for $u < 0$, respectively. Then the range $h(\mathbb{R}; \lambda)$ is $\mathbb{R}$ for $\lambda_1 \geq 0$ and $(1/\lambda_1, -1/\lambda_1)$ for $\lambda_1 < 0$, respectively.
In order to cover the families in Examples 1.1-1.3, Chen and Wang (2003) modified model (4) to the following transformed truncated normal median regression model:
$$h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \qquad (10)$$
for $i = 1, \ldots, n$, where $y_i$ has known support $(a, b)$ ($\subset \mathbb{R}$), e.g., $(a, b) = (0, \infty)$, $(0, 1)$, or $\mathbb{R}$; $\lambda$ is an unknown finite-dimensional transformation parameter vector; $h(\cdot; \lambda)$ is a known strictly increasing and differentiable real-valued function on $(a, b)$, e.g., as in Examples 1.1-1.3; and the $\varepsilon_i$s are independent errors distributed as either $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ or truncated $N(\mu_i(\lambda, \beta, \sigma, \gamma), g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ with median $0$ for some $\mu_i(\lambda, \beta, \sigma, \gamma) \in \mathbb{R}$. Notice that, for $i = 1, \ldots, n$, $f(x_i; \beta)$ is the median of $h(y_i; \lambda)$; however, it is generally neither the mean nor the mode of $h(y_i; \lambda)$.
In Section 2, the transformed truncated normal mode regression model is proposed to extend model (5) and then the corresponding likelihood inference is discussed thoroughly. In Section 3, the proposed methodology is applied to two real data sets in Box and Cox (1964). Finally, conclusions and discussion are given in Section 4.
2 Transformed Truncated Normal Mode Regression Model
In this section, the transformed truncated normal mode regression model is proposed to extend model (5) and then the corresponding likelihood inference is discussed thoroughly.
2.1 Transformed Truncated Normal Mode Regression Model
Consider the following transformed truncated normal mode regression model:
$$h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \qquad (11)$$
for $i = 1, \ldots, n$, where $y_i$ is the response for subject $i$ with known support $(a, b)$ ($\subset \mathbb{R}$), e.g., $(0, \infty)$, $(0, 1)$, or $\mathbb{R}$; $\lambda$ is an unknown finite-dimensional transformation parameter vector; $h(\cdot; \lambda)$ is a known strictly increasing and differentiable real-valued function on $(a, b)$, e.g., as in Examples 1.1-1.3 in Section 1; $x_i$ is a known covariate vector for subject $i$; $\beta$ is an unknown finite-dimensional regression parameter vector; $f(\cdot; \beta)$ is a known regression function for each $\beta$, e.g., $f(x_i; \beta) = x_i^T \beta$ or $\exp\{x_i^T \beta\}$; and the $\varepsilon_i$s are independent errors distributed as either $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ or truncated $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ such that $z_i$ is a known covariate vector for subject $i$, e.g., $z_i$ is a known function of $x_i$; $\gamma$ is an unknown finite-dimensional parameter vector; $g(\cdot, \cdot; \gamma)$ is a known positive function for each $\gamma$, e.g., $g(f(x_i; \beta), z_i; \gamma) = \exp\{f(x_i; \beta)\, \gamma_1 + z_i^T \gamma_2\}$ with $\gamma \equiv (\gamma_1, \gamma_2^T)^T$; and $\sigma$ is an unknown positive scale parameter. Notice that, for $i = 1, \ldots, n$, $f(x_i; \beta)$ is the mode of $h(y_i; \lambda)$ when it lies in the support of $h(y_i; \lambda)$; however, it is generally neither the mean nor the median of $h(y_i; \lambda)$.
2.2 Maximum Likelihood Estimation
Let $\theta$ ($\equiv (\theta_1, \ldots, \theta_d)^T$) denote the $d$-dimensional parameter vector $(\lambda^T, \beta^T, \sigma, \gamma^T)^T$ in the parameter space $\Theta$, where $\Theta$ is a non-empty open subset of the $d$-dimensional Euclidean space $\mathbb{R}^d$. Let $\Phi(\cdot)$ denote the cumulative distribution function (c.d.f.) of $N(0, 1)$ and $\phi(\cdot)$ the probability density function (p.d.f.) of $N(0, 1)$. Set
$$e_i(u; \theta) \equiv \frac{h(u; \lambda) - f(x_i; \beta)}{g(f(x_i; \beta), z_i; \gamma)\, \sigma} \equiv \frac{h(u; \lambda) - f_i(\beta)}{g_i(\beta, \gamma)\, \sigma} \qquad (12)$$
for $u \in [a, b]$, $\theta \in \Theta$, and $i = 1, \ldots, n$, where $h(a; \lambda) \equiv \lim_{u \downarrow a} h(u; \lambda)$ and $h(b; \lambda) \equiv \lim_{u \uparrow b} h(u; \lambda)$.
Under model (11), the p.d.f. of $y_i$ is
$$p_i(y_i; \theta) = \frac{\phi(e_i(y_i; \theta))\, h'(y_i; \lambda)}{g_i(\beta, \gamma)\, \sigma\, [\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]} \cdot 1_{(a,b)}(y_i) \qquad (13)$$
for $i = 1, \ldots, n$, where $h'(y_i; \lambda) \equiv \partial h(u; \lambda)/\partial u|_{u = y_i} \equiv h_i'(\lambda)$ and $1_{(a,b)}(y_i) = 1$ for $y_i \in (a, b)$ and $0$ otherwise. Set $y \equiv (y_1, \ldots, y_n)^T$, $h_i(\lambda) \equiv h(y_i; \lambda)$, and $e_i(\theta) \equiv e_i(y_i; \theta)$ for $\theta \in \Theta$ and $i = 1, \ldots, n$. Then, given $y$, the likelihood function for $\theta$ is
$$L(\theta) \equiv \prod_{i=1}^{n} \frac{\phi(e_i(\theta))\, h_i'(\lambda)}{g_i(\beta, \gamma)\, \sigma\, [\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]} \qquad (14)$$
and the log-likelihood function for $\theta$ is
$$\log[L(\theta)] \equiv \ell(\theta) \equiv \sum_{i=1}^{n} \ell_i(\theta), \qquad (15)$$
where
$$\ell_i(\theta) = \log[\phi(e_i(\theta))] + \log[h_i'(\lambda)] - \log[g_i(\beta, \gamma)] - \log(\sigma) - \log[\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]. \qquad (16)$$
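As an illustration, the log-likelihood (15)-(16) can be coded directly for the homoscedastic Box-Cox special case ($h(u; \lambda) = u^{(\lambda)}$, $f(x_i; \beta) = x_i^T \beta$, $g \equiv 1$, support $(0, \infty)$). This is a hypothetical sketch, not the thesis's own code:

```python
import numpy as np
from scipy.stats import norm

def boxcox_trunc_loglik(theta, y, X):
    """Log-likelihood (15)-(16) for the homoscedastic Box-Cox special case:
    h(u; lam) = u^(lam), f(x; beta) = x'beta, g = 1, support (0, inf).
    theta packs (lam, beta_1, ..., beta_p, log sigma)."""
    lam = theta[0]
    beta = np.asarray(theta[1:-1])
    sigma = np.exp(theta[-1])          # parameterize by log sigma to keep sigma > 0
    h = np.log(y) if lam == 0 else (y**lam - 1.0) / lam   # h(y_i; lam)
    hprime = y**(lam - 1.0)                               # h'(y_i; lam)
    mu = X @ beta
    e = (h - mu) / sigma                                  # e_i(theta) in (12)
    # h((0, inf); lam) is (-1/lam, inf) for lam > 0, R for lam = 0,
    # and (-inf, -1/lam) for lam < 0, so only one truncation limit is finite.
    if lam > 0:
        Z = 1.0 - norm.cdf((-1.0 / lam - mu) / sigma)
    elif lam < 0:
        Z = norm.cdf((-1.0 / lam - mu) / sigma)
    else:
        Z = 1.0
    # ell_i in (16): log phi(e_i) + log h'_i - log sigma - log normalizing mass
    return np.sum(norm.logpdf(e) + np.log(hprime) - np.log(sigma) - np.log(Z))
```

For $\lambda = 0$ the truncation mass $Z$ equals 1 and the expression reduces to the ordinary log-likelihood of a log-transformed normal model with Jacobian term $\log h_i'(\lambda) = -\log y_i$.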
Assume that
$$\frac{\partial}{\partial \theta} \int_a^b p_i(y_i; \theta)\, dy_i = \int_a^b \frac{\partial p_i(y_i; \theta)}{\partial \theta}\, dy_i \qquad (17)$$
and
$$\frac{\partial^2}{\partial \theta\, \partial \theta^T} \int_a^b p_i(y_i; \theta)\, dy_i = \int_a^b \frac{\partial^2 p_i(y_i; \theta)}{\partial \theta\, \partial \theta^T}\, dy_i \qquad (18)$$
for $\theta \in \Theta$ and $i = 1, \ldots, n$. Then the score function for $\theta$ is
$$\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ell_i(\theta)}{\partial \theta} \equiv \sum_{i=1}^{n} S_i(\theta) \equiv S(\theta), \qquad (19)$$
the observed Fisher information for $\theta$ is
$$-\frac{\partial^2 \ell(\theta)}{\partial \theta\, \partial \theta^T} = -\sum_{i=1}^{n} \frac{\partial^2 \ell_i(\theta)}{\partial \theta\, \partial \theta^T} \equiv \sum_{i=1}^{n} J_i(\theta) \equiv J(\theta), \qquad (20)$$
and the expected Fisher information for $\theta$ is
$$\mathrm{Cov}_\theta(S(\theta)) = \sum_{i=1}^{n} \mathrm{Cov}_\theta(S_i(\theta)) \equiv \sum_{i=1}^{n} I_i(\theta) \equiv I(\theta), \qquad (21)$$
where $S_i(\theta)$ and $J_i(\theta)$ are given in Appendices A and B, respectively. By equation (17), $E_\theta(S_i(\theta)) = 0_{d \times 1}$ for $\theta \in \Theta$ and $i = 1, \ldots, n$, where $0_{d \times 1}$ denotes the $d \times 1$ vector $(0, \ldots, 0)^T$. By equations (17) and (18), $E_\theta(J_i(\theta)) = I_i(\theta)$ for $\theta \in \Theta$ and $i = 1, \ldots, n$.
Assume that, given $y$, there exists a unique maximum likelihood estimate (MLE) $\hat{\theta}(y)$ ($\equiv \hat{\theta}$) of $\theta$. Then $\hat{\theta}$ solves the score equation $S(\hat{\theta}) = 0_{d \times 1}$ for $\theta$. One possible approach to evaluate $\hat{\theta}$ is as follows: first choose a good initial value $\hat{\theta}^{(0)}$ and then iterate
$$\hat{\theta}^{(k+1)} = \hat{\theta}^{(k)} + M^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \qquad (22)$$
for $k = 0, 1, 2, \ldots$ until $\|S(\hat{\theta}^{(k+1)})\| < \varepsilon$ for some small positive value $\varepsilon$, e.g., $\varepsilon = 10^{-3}$, where $\|a\| \equiv (a^T a)^{1/2}$ for $a \in \mathbb{R}^d$. When $M(\hat{\theta}^{(k)}) = I(\hat{\theta}^{(k)})$ for $k = 0, 1, 2, \ldots$, the iteration is called the Fisher scoring method; however, it can take too much time to evaluate the $I(\hat{\theta}^{(k)})$s because there is generally no closed-form formula for each $I(\hat{\theta}^{(k)})$. When $M(\hat{\theta}^{(k)}) = J(\hat{\theta}^{(k)})$ for $k = 0, 1, 2, \ldots$, the iteration is called the Newton-Raphson method. It is usually difficult to find a good initial value for the Newton-Raphson method, especially when $d$ is not a small positive integer. Moreover, it is not guaranteed that $\ell(\hat{\theta}^{(k+1)}) > \ell(\hat{\theta}^{(k)})$ for $k = 0, 1, 2, \ldots$. Thus, a modified Newton-Raphson method is suggested as follows: first choose a good initial value $\hat{\theta}^{(0)}$ and a fraction $\lambda \in (0, 1)$, e.g., $\lambda = 1/2$. When $\hat{\theta}^{(k)}$ is obtained and $S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \neq 0$ for some non-negative integer $k$, iterate
$$\hat{\theta}^{(k+1,j)} \equiv \hat{\theta}^{(k)} + \mathrm{sgn}\big(S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)})\big)\, \lambda^j\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \qquad (23)$$
for $j = 0, 1, 2, \ldots, m_k$, where $m_k$ is the first $j$ such that $\ell(\hat{\theta}^{(k+1,j)}) > \ell(\hat{\theta}^{(k)})$. Set $\hat{\theta}^{(k+1)} \equiv \hat{\theta}^{(k+1,m_k)}$ for $k = 0, 1, 2, \ldots$ until $\|S(\hat{\theta}^{(k+1)})\| < \varepsilon$ for some small positive value $\varepsilon$, e.g., $\varepsilon = 10^{-3}$.
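The iteration (22)-(23) is essentially Newton's method made monotone by a backtracking (step-halving) line search. A generic Python sketch (function names and calling convention are ours; `loglik`, `score`, and `obs_info` stand for $\ell$, $S$, and $J$):

```python
import numpy as np

def modified_newton(loglik, score, obs_info, theta0, frac=0.5, eps=1e-3, max_iter=200):
    """Modified Newton-Raphson (22)-(23): the Newton step J^{-1}S is given the
    sign of S'J^{-1}S (so it is an ascent direction) and damped by frac^j
    until the log-likelihood strictly increases."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        S = score(theta)
        if np.linalg.norm(S) < eps:                  # stopping rule ||S|| < eps
            break
        step = np.linalg.solve(obs_info(theta), S)   # J^{-1}(theta) S(theta)
        direction = np.sign(S @ step) * step         # sgn(S'J^{-1}S) J^{-1}S
        j = 0                                        # backtracking: j = 0, 1, ..., m_k
        while j < 60 and loglik(theta + frac**j * direction) <= loglik(theta):
            j += 1
        theta = theta + frac**j * direction
    return theta
```

The `j < 60` cap is a numerical safeguard absent from the idealized description; by the expansion (24), a finite $m_k$ always exists in exact arithmetic.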
When $S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \neq 0$ for some non-negative integer $k$, it follows from the first-order Taylor expansion that
$$\ell(\hat{\theta}^{(k+1,j)}) = \ell(\hat{\theta}^{(k)}) + S^T(\hat{\theta}^{(k)}) \left[ \mathrm{sgn}\big(S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)})\big)\, \lambda^j\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)}) \right] + o(\lambda^j) = \ell(\hat{\theta}^{(k)}) + \lambda^j\, |S^T(\hat{\theta}^{(k)})\, J^{-1}(\hat{\theta}^{(k)})\, S(\hat{\theta}^{(k)})| + o(\lambda^j) \qquad (24)$$
as $j \to \infty$, which implies that $\ell(\hat{\theta}^{(k+1,j)}) > \ell(\hat{\theta}^{(k)})$ for large $j$ and thus $m_k$ is well-defined.
Now consider the case where the sample size $n$ tends to infinity. Assume that the following conditions hold:

(i) the minimum eigenvalue of $I(\theta)$ tends to infinity as $n \to \infty$;

(ii) $E_\theta(\max_{1 \leq i \leq n} |\partial \ell_i(\theta)/\partial \theta_j|) / [\mathrm{Var}_\theta(\partial \ell(\theta)/\partial \theta_j)]^{1/2} \to 0$ as $n \to \infty$ for $j = 1, \ldots, d$;

(iii) $I^{-1/2}(\theta)\, J(\theta)\, I^{-1/2}(\theta) \stackrel{p}{\to} I_d$ as $n \to \infty$, where $I_d$ denotes the identity matrix of order $d$; and

(iv) $[\mathrm{diag}\{I_{11}(\theta), \ldots, I_{dd}(\theta)\}]^{-1/2}\, I(\theta)\, [\mathrm{diag}\{I_{11}(\theta), \ldots, I_{dd}(\theta)\}]^{-1/2} \to \Sigma(\theta)$ as $n \to \infty$, where $I_{jj}(\theta)$ denotes the $j$th diagonal element of $I(\theta)$ for $j = 1, \ldots, d$ and $\Sigma(\theta)$ is a positive definite covariance matrix.
Let $M(\theta)$ denote either $I(\theta)$ or $J(\theta)$. Then, by Theorem 1.80 of Prakasa Rao (1999),
$$M^{-1/2}(\theta)\, S(\theta) \stackrel{d}{\to} N_d(0_{d \times 1}, I_d) \qquad (25)$$
as $n \to \infty$, where $N_d(0_{d \times 1}, I_d)$ denotes the $d$-variate normal distribution with mean vector $0_{d \times 1}$ and covariance matrix $I_d$. Assume that
$$I^{-1/2}(\theta)\, \{S(\hat{\theta}) - [S(\theta) - J(\theta)(\hat{\theta} - \theta)]\} = o_p(1) \qquad (26)$$
as $n \to \infty$. Then, by condition (iii) and equations (25) and (26),
$$M^{1/2}(\theta)(\hat{\theta} - \theta) = M^{-1/2}(\theta)\, S(\theta) + o_p(1) \stackrel{d}{\to} N_d(0_{d \times 1}, I_d) \qquad (27)$$
as $n \to \infty$. Thus, by condition (i) and equation (27), $\hat{\theta}$ is a weakly consistent estimator of $\theta$. Assume that $I^{-1/2}(\theta)\, I(\hat{\theta})\, I^{-1/2}(\theta) \stackrel{p}{\to} I_d$ and $J^{-1/2}(\theta)\, J(\hat{\theta})\, J^{-1/2}(\theta) \stackrel{p}{\to} I_d$ as $n \to \infty$. Then, by equation (27),
$$M^{1/2}(\hat{\theta})(\hat{\theta} - \theta) = M^{1/2}(\theta)(\hat{\theta} - \theta) + o_p(1) \stackrel{d}{\to} N_d(0_{d \times 1}, I_d) \qquad (28)$$
as $n \to \infty$.
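In practice, the asymptotic normality in (28) yields componentwise Wald intervals $\hat{\theta}_j \pm z_{\alpha/2}\, [M^{-1}(\hat{\theta})]_{jj}^{1/2}$, with $M$ the observed or expected information at the MLE. A small sketch (function name ours):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(theta_hat, M_hat, alpha=0.05):
    """Componentwise Wald intervals from (28):
    theta_hat_j +/- z_{alpha/2} * sqrt([M^{-1}(theta_hat)]_{jj}),
    where M_hat is the observed or expected Fisher information at the MLE."""
    se = np.sqrt(np.diag(np.linalg.inv(M_hat)))   # asymptotic standard errors
    z = norm.ppf(1.0 - alpha / 2.0)               # z_{alpha/2}
    return theta_hat - z * se, theta_hat + z * se
```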
2.3 Hypothesis Testing and Confidence Regions
In this subsection, let $\omega$ ($\equiv (\psi^T, \chi^T)^T \in \Omega \subset \mathbb{R}^d$) be a one-to-one reparameterization of $\theta$ such that $\det(\partial \theta/\partial \omega^T) \neq 0$ and $\partial^2 \theta_j/\partial \chi\, \partial \chi^T$ is a continuous function of $\chi$ for $j = 1, \ldots, d$, where $\psi$ is the $d_0$-dimensional parameter vector of interest and $\chi$ is a $(d - d_0)$-dimensional nuisance parameter vector with $d_0 \in \{1, \ldots, d\}$. Here $\chi$ does not exist when $d_0 = d$. Suppose that we are interested in testing the null hypothesis $H_0\colon \psi = \psi_0$ versus the alternative hypothesis $H_1\colon \psi \neq \psi_0$.

Set $S_\psi(\chi) \equiv \partial \ell(\theta)/\partial \chi$, $I_\psi(\chi) \equiv \mathrm{Cov}_\omega(S_\psi(\chi))$, and $J_\psi(\chi) \equiv -\partial S_\psi(\chi)/\partial \chi^T$ for $\omega \in \Omega$. Then $S_\psi(\chi) = [\partial \theta^T/\partial \chi]\, S(\theta)$, $I_\psi(\chi) = [\partial \theta^T/\partial \chi]\, I(\theta)\, [\partial \theta/\partial \chi^T]$, and
$$J_\psi(\chi) = \frac{\partial \theta^T}{\partial \chi}\, J(\theta)\, \frac{\partial \theta}{\partial \chi^T} - \sum_{j=1}^{d} \frac{\partial^2 \theta_j}{\partial \chi\, \partial \chi^T}\, S(\theta)_j \qquad (29)$$
for $\omega \in \Omega$, where $S(\theta) \equiv (S(\theta)_1, \ldots, S(\theta)_d)^T$. Assume that, given $y$, there exists a unique MLE $\hat{\chi}_\psi(y)$ ($\equiv \hat{\chi}_\psi$) of $\chi$ for fixed $\psi$. Then $\hat{\chi}_\psi$ solves the score equation $S_\psi(\hat{\chi}_\psi) = 0_{(d - d_0) \times 1}$ for $\chi$, where $0_{(d - d_0) \times 1}$ denotes the $(d - d_0) \times 1$ vector $(0, \ldots, 0)^T$.

Set $W(\psi) \equiv 2[\ell(\hat{\theta}) - \ell(\theta(\psi, \hat{\chi}_\psi))]$. Assume that
$$I_\psi^{-1/2}(\chi)\, J_\psi(\hat{\chi}_\psi)\, I_\psi^{-1/2}(\chi) \stackrel{p}{\to} I_{d - d_0}, \qquad I_\psi^{1/2}(\chi)(\hat{\chi}_\psi - \chi) = I_\psi^{-1/2}(\chi)\, S_\psi(\chi) + o_p(1), \qquad (30)$$
$$\ell(\theta) = \ell(\hat{\theta}) + S^T(\hat{\theta})(\theta - \hat{\theta}) - \tfrac{1}{2}(\theta - \hat{\theta})^T J(\hat{\theta})(\theta - \hat{\theta}) + o_p(1), \qquad (31)$$
and
$$\ell(\theta) = \ell(\theta(\psi, \hat{\chi}_\psi)) + S_\psi^T(\hat{\chi}_\psi)(\chi - \hat{\chi}_\psi) - \tfrac{1}{2}(\chi - \hat{\chi}_\psi)^T J_\psi(\hat{\chi}_\psi)(\chi - \hat{\chi}_\psi) + o_p(1) \qquad (32)$$
as $n \to \infty$. Then, by equations (27) and (28),
$$W(\psi) = S^T(\theta)\, I^{-1/2}(\theta) \left\{ I_d - I^{1/2}(\theta)\, \frac{\partial \theta}{\partial \chi^T} \left[ \frac{\partial \theta^T}{\partial \chi}\, I(\theta)\, \frac{\partial \theta}{\partial \chi^T} \right]^{-1} \frac{\partial \theta^T}{\partial \chi}\, I^{1/2}(\theta) \right\} I^{-1/2}(\theta)\, S(\theta) + o_p(1) \stackrel{d}{\to} \chi^2_{d_0} \qquad (33)$$
as $n \to \infty$.

Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. The likelihood ratio test with asymptotic size $\alpha$ rejects $H_0\colon \psi = \psi_0$ if and only if the likelihood ratio test statistic $W(\psi_0) > \chi^2_{\alpha, d_0}$, where $\chi^2_{\alpha, d_0}$ denotes the upper $\alpha$ quantile of the $\chi^2$ distribution with $d_0$ degrees of freedom. To evaluate $W(\psi_0)$, we need to evaluate $\hat{\chi}_{\psi_0}$; one possible approach is to use the modified Newton-Raphson method of Section 2.2. Therefore, $\{\psi_0\colon W(\psi_0) \leq \chi^2_{\alpha, d_0}\}$ is an asymptotic size $1 - \alpha$ confidence region for $\psi$.
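Given the two maximized log-likelihoods, the asymptotic p-value of the likelihood ratio test follows from (33) in one line; a sketch using SciPy (function name ours):

```python
from scipy.stats import chi2

def lrt_pvalue(loglik_full, loglik_null, d0):
    """Asymptotic p-value of the likelihood ratio test: by (33),
    W = 2[l(theta_hat) - l(theta(psi_0, chi_hat))] -> chi^2_{d0} under H0."""
    W = 2.0 * (loglik_full - loglik_null)
    return chi2.sf(W, df=d0)   # upper-tail probability of chi^2_{d0}
```

This is the computation behind the reported p-values in Section 3 (e.g., 0.3168 and 0.3487 for the no-interaction hypotheses).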
2.4 Prediction Region of Future Observations
Suppose that
$$h(y_{n+j}; \lambda) = f(x_{n+j}; \beta) + \varepsilon_{n+j} \qquad (34)$$
for $j = 1, \ldots, m$, where $m$ is a known positive integer; $y_{n+j}$ is the future observation for subject $n + j$ with support $(a, b)$; $x_{n+j}$ is a known covariate vector for subject $n + j$; $\varepsilon_{n+j}$ is an error distributed as either $N(0, g^2(f(x_{n+j}; \beta), z_{n+j}; \gamma)\, \sigma^2)$ or truncated $N(0, g^2(f(x_{n+j}; \beta), z_{n+j}; \gamma)\, \sigma^2)$ with known covariate vector $z_{n+j}$; and $\varepsilon_1, \ldots, \varepsilon_{n+m}$ are independent. For $\theta \in \Theta$, $u \in [a, b]$, and $j = 1, \ldots, m$, set
$$e_{n+j}(u; \theta) \equiv \frac{h(u; \lambda) - f(x_{n+j}; \beta)}{g(f(x_{n+j}; \beta), z_{n+j}; \gamma)\, \sigma} \equiv \frac{h(u; \lambda) - f_{n+j}(\beta)}{g_{n+j}(\beta, \gamma)\, \sigma}. \qquad (35)$$
Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. For $\theta \in \Theta$ and $j = 1, \ldots, m$, let $\Phi_{n+j}(\cdot; \theta)$ denote the c.d.f. of $\varepsilon_{n+j}$ and $q_{n+j,\alpha}(\theta)$ the $\alpha$ quantile of $y_{n+j}$. Then
$$q_{n+j,\alpha}(\theta) = h^{-1}(f_{n+j}(\beta) + \Phi_{n+j}^{-1}(\alpha; \theta); \lambda) \qquad (36)$$
with MLE $q_{n+j,\alpha}(\hat{\theta})$ for $\theta \in \Theta$ and $j = 1, \ldots, m$, where
$$\Phi_{n+j}^{-1}(\alpha; \theta) = g_{n+j}(\beta, \gamma)\, \sigma\, \Phi^{-1}\big((1 - \alpha)\, \Phi(e_{n+j}(a; \theta)) + \alpha\, \Phi(e_{n+j}(b; \theta))\big) \equiv g_{n+j}(\beta, \gamma)\, \sigma\, \Phi_{n+j}^{-1}(\alpha, a, b; \theta). \qquad (37)$$
Assume that $q_{n+j,\alpha}(\theta)$ is a continuously differentiable function of $\theta$ with $\partial q_{n+j,\alpha}(\theta)/\partial \theta \neq 0_{d \times 1}$ for $\theta \in \Theta$ and $j = 1, \ldots, m$. Then
$$\frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta} = \frac{\partial f_{n+j}(\beta)/\partial \theta + \sigma\, \Phi_{n+j}^{-1}(\alpha, a, b; \theta)\, \partial g_{n+j}(\beta, \gamma)/\partial \theta}{h'(q_{n+j,\alpha}(\theta); \lambda)} + \frac{g_{n+j}(\beta, \gamma)\, \Phi_{n+j}^{-1}(\alpha, a, b; \theta)\, \partial \sigma/\partial \theta + g_{n+j}(\beta, \gamma)\, \sigma\, \partial \Phi_{n+j}^{-1}(\alpha, a, b; \theta)/\partial \theta}{h'(q_{n+j,\alpha}(\theta); \lambda)} - \frac{\partial h(u; \lambda)/\partial \theta|_{u = q_{n+j,\alpha}(\theta)}}{h'(q_{n+j,\alpha}(\theta); \lambda)} \qquad (38)$$
for $\theta \in \Theta$ and $j = 1, \ldots, m$, where
$$\frac{\partial \Phi_{n+j}^{-1}(\alpha, a, b; \theta)}{\partial \theta} = \frac{(1 - \alpha)\, \partial \Phi(e_{n+j}(a; \theta))/\partial \theta + \alpha\, \partial \Phi(e_{n+j}(b; \theta))/\partial \theta}{\phi(\Phi_{n+j}^{-1}(\alpha, a, b; \theta))} \qquad (39)$$
with both $\partial \Phi(e_{n+j}(a; \theta))/\partial \theta$ and $\partial \Phi(e_{n+j}(b; \theta))/\partial \theta$ evaluated by formulas similar to those in Appendix A. By equations (27) and (28),
$$\left[ \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta^T}\, M^{-1}(\theta)\, \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta} \right]^{-1/2} [q_{n+j,\alpha}(\hat{\theta}) - q_{n+j,\alpha}(\theta)] \stackrel{d}{\to} N(0, 1) \qquad (40)$$
and
$$\left[ \left. \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta^T} \right|_{\theta = \hat{\theta}} M^{-1}(\hat{\theta}) \left. \frac{\partial q_{n+j,\alpha}(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}} \right]^{-1/2} [q_{n+j,\alpha}(\hat{\theta}) - q_{n+j,\alpha}(\theta)] \stackrel{d}{\to} N(0, 1) \qquad (41)$$
as $n \to \infty$ for $\theta \in \Theta$ and $j = 1, \ldots, m$, where $M$ denotes either $I$ or $J$. Set $\alpha_m \equiv [1 - (1 - \alpha)^{1/m}]/2$. Since $y_{n+1}, \ldots, y_{n+m}$ are independent,
$$P_\theta\!\left( \bigcap_{j=1}^{m} \{y_{n+j} \in [q_{n+j,\alpha_m}(\theta), q_{n+j,1-\alpha_m}(\theta)]\} \right) = \prod_{j=1}^{m} P_\theta\big(\{y_{n+j} \in [q_{n+j,\alpha_m}(\theta), q_{n+j,1-\alpha_m}(\theta)]\}\big) = (1 - 2\alpha_m)^m = 1 - \alpha, \qquad (42)$$
which implies that $[q_{n+1,\alpha_m}(\theta), q_{n+1,1-\alpha_m}(\theta)] \times \cdots \times [q_{n+m,\alpha_m}(\theta), q_{n+m,1-\alpha_m}(\theta)]$ is a size $1 - \alpha$ prediction region for $(y_{n+1}, \ldots, y_{n+m})^T$ with MLE $[q_{n+1,\alpha_m}(\hat{\theta}), q_{n+1,1-\alpha_m}(\hat{\theta})] \times \cdots \times [q_{n+m,\alpha_m}(\hat{\theta}), q_{n+m,1-\alpha_m}(\hat{\theta})]$.
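For the homoscedastic Box-Cox special case ($g \equiv 1$, support $(0, \infty)$), the quantile in (36)-(37) can be computed directly; a hypothetical sketch (function name ours):

```python
import numpy as np
from scipy.stats import norm

def boxcox_pred_quantile(alpha, mu, sigma, lam):
    """alpha quantile (36)-(37) of a future y whose Box-Cox transform y^(lam)
    is truncated N(mu, sigma^2) on h((0, inf); lam) (homoscedastic case, g = 1)."""
    if lam > 0:                       # support of y^(lam) is (-1/lam, inf)
        Fa, Fb = norm.cdf((-1.0 / lam - mu) / sigma), 1.0
    elif lam < 0:                     # support of y^(lam) is (-inf, -1/lam)
        Fa, Fb = 0.0, norm.cdf((-1.0 / lam - mu) / sigma)
    else:                             # lam = 0: no truncation
        Fa, Fb = 0.0, 1.0
    z = norm.ppf((1.0 - alpha) * Fa + alpha * Fb)   # Phi^{-1}_{n+j}(alpha, a, b; theta)
    hq = mu + sigma * z                              # quantile on the transformed scale
    return np.exp(hq) if lam == 0 else (1.0 + lam * hq) ** (1.0 / lam)
```

An equal-tailed prediction interval for a single future $y$ is then $[q(\alpha_m), q(1 - \alpha_m)]$ with $\alpha_m$ as above; for $\lambda = 0$ the quantile reduces to $\exp(\mu + \sigma\, \Phi^{-1}(\alpha))$.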
3 Two Real Data Sets
In this section, the proposed methodology is applied to two real data sets in Box and Cox (1964).
3.1 A Biological Experiment Using a 3×4 Factorial Design
Table 1 shows the survival times of animals in a 3 × 4 factorial experiment, the factors being A with three poisons and B with four treatments. Each combination of these two factors is replicated for four animals, the allocation to animals being completely randomized. The two-way analysis-of-variance (ANOVA) effects model is
$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk} \qquad (43)$$
for $i = 1, 2, 3$ and $j, k = 1, 2, 3, 4$, where $y_{ijk}$ is the $k$th observation for the $i$th poison of factor A and the $j$th treatment of factor B; $\mu$ is the overall mean; $\tau_i$ is the main effect of the $i$th level of factor A; $\beta_j$ is the main effect of the $j$th level of factor B; $(\tau\beta)_{ij}$ is the interaction between the $i$th level of factor A and the $j$th level of factor B; and the $\varepsilon_{ijk}$s are i.i.d. $N(0, \sigma^2)$ errors with unknown positive standard deviation $\sigma$. Figure 2(a) shows the residual plot against fitted values for the original data under the two-way ANOVA effects model. It is seen that $\mathrm{Var}(y_{ijk})$ increases as $E(y_{ijk})$ increases.
Now consider the following Box-Cox transformed truncated normal mode regression model:
$$y_{ijk}^{(\lambda)} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk} \qquad (44)$$
for $i = 1, 2, 3$ and $j, k = 1, 2, 3, 4$, where each $y_{ijk}$ has support $(0, \infty)$, $\lambda$ is an unknown real-valued transformation parameter, and the $\varepsilon_{ijk}$s are independent errors distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$ with unknown positive scale parameter $\sigma$. Figure 2(b) shows the residual plot against fitted values for the transformed data under the Box-Cox transformed truncated normal mode regression model.
One possible way to find a good initial value $\hat{\theta}^{(0)}$ in this case is given in Appendix C, and the construction of the normal probability plot in this case is described in Appendix D. Table 2 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively. Figure 3 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively.
First, test the null hypothesis $H_0\colon (\tau\beta)_{ij} = 0$ for all $(i, j)$ versus the alternative $H_1\colon (\tau\beta)_{ij} \neq 0$ for some $(i, j)$. The asymptotic p-value is 0.3168, so the null hypothesis $H_0$ is not rejected.
Table 3 shows the MLEs without interactions under the false normality assumption and the truncated normality assumption, respectively. Figure 4 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without interactions.
Similarly, we are also interested in testing the null hypothesis $H_0\colon \lambda = -1$ and $(\tau\beta)_{ij} = 0$ for all $(i, j)$ versus the alternative $H_1\colon \lambda \neq -1$ or $(\tau\beta)_{ij} \neq 0$ for some $(i, j)$. The asymptotic p-value is 0.2829 under the truncated normality assumption, so the null hypothesis $H_0$ is again not rejected.
Table 4 shows the MLEs with λ = −1 under the false normality assumption and the truncated normality assumption, respectively. Figure 5 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, with λ = −1. Table 5 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively, without interactions and with λ = −1. Figure 6 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without interactions and with λ = −1.
Suppose that
$$y_{i_l j_l k_l}^{(\lambda)} = \mu + \tau_{i_l} + \beta_{j_l} + \varepsilon_{i_l j_l k_l} \qquad (45)$$
for $l = 1, \ldots, m$, where $m$ is a positive integer; $y_{i_l j_l k_l}$ is the $k_l$th observation for the $i_l$th poison of factor A and the $j_l$th treatment of factor B; all $(i_l, j_l, k_l)$s are different with $k_l \geq 5$; $\varepsilon_{i_l j_l k_l}$ is an error distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$; and the $\varepsilon_{ijk}$s and $\varepsilon_{i_l j_l k_l}$s are independent.

Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. For $l = 1, \ldots, m$, let $\Phi_{i_l j_l k_l}(\cdot; \theta)$ denote the c.d.f. of $\varepsilon_{i_l j_l k_l}$ and $q_{i_l j_l k_l,\alpha}(\theta)$ the $\alpha$ quantile of $y_{i_l j_l k_l}$. Then
$$q_{i_l j_l k_l,\alpha}(\theta) = \left[ 1 + \lambda \left( \mu + \tau_{i_l} + \beta_{j_l} + \Phi_{i_l j_l k_l}^{-1}(\alpha; \theta) \right) \right]^{1/\lambda} \qquad (46)$$
for $l = 1, \ldots, m$, where
$$\Phi_{i_l j_l k_l}^{-1}(\alpha; \theta) = \sigma\, \Phi^{-1}\!\left( \alpha\, \Phi\!\left( \frac{-1/\lambda - \mu - \tau_{i_l} - \beta_{j_l}}{\sigma} \right) \right). \qquad (47)$$
Thus, $[q_{i_1 j_1 k_1,\alpha_m}(\theta), q_{i_1 j_1 k_1,1-\alpha_m}(\theta)] \times \cdots \times [q_{i_m j_m k_m,\alpha_m}(\theta), q_{i_m j_m k_m,1-\alpha_m}(\theta)]$ is a size $1 - \alpha$ prediction region for $(y_{i_1 j_1 k_1}, \ldots, y_{i_m j_m k_m})^T$ with MLE $[q_{i_1 j_1 k_1,\alpha_m}(\hat{\theta}), q_{i_1 j_1 k_1,1-\alpha_m}(\hat{\theta})] \times \cdots \times [q_{i_m j_m k_m,\alpha_m}(\hat{\theta}), q_{i_m j_m k_m,1-\alpha_m}(\hat{\theta})]$, where $\alpha_m \equiv [1 - (1 - \alpha)^{1/m}]/2$.
3.2 A Textile Experiment Using a Single Replicate of a $3^3$ Design
Table 6 shows the numbers of cycles to failure, $y$, obtained in a single replicate of a $3^3$ factorial experiment in which the factors are

$x_1$: length of test specimen (250, 300, 350 mm),
$x_2$: amplitude of loading cycle (8, 9, 10 mm),
$x_3$: load (40, 45, 50 gm).
In Table 6, the levels of $x_1$, $x_2$, and $x_3$ are coded as $-1$, $0$, $1$, respectively. Consider the following quadratic regression model:
$$y_i = \beta_0 + \sum_{j=1}^{3} \beta_j x_{ij} + \sum_{1 \leq j \leq k \leq 3} \beta_{jk} x_{ij} x_{ik} + \varepsilon_i \qquad (48)$$
for $i = 1, \ldots, 27$, where $y_i$ is the response for $(x_1, x_2, x_3) = (x_{i1}, x_{i2}, x_{i3})$; $\beta_0$ is the intercept; the $\beta_j$s and $\beta_{jk}$s are regression coefficients; and the $\varepsilon_i$s are i.i.d. $N(0, \sigma^2)$ errors with unknown positive standard deviation $\sigma$. Figure 7(a) shows the residual plot against fitted values for the original data under the quadratic regression model. There is an obvious pattern in Figure 7(a).
Now consider the following Box-Cox transformed truncated normal mode regression model:
$$y_i^{(\lambda)} = \beta_0 + \sum_{j=1}^{3} \beta_j x_{ij} + \sum_{1 \leq j \leq k \leq 3} \beta_{jk} x_{ij} x_{ik} + \varepsilon_i \qquad (49)$$
for $i = 1, \ldots, 27$, where $y_i$ has support $(0, \infty)$ and the $\varepsilon_i$s are independent errors distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$ with unknown positive standard deviation $\sigma$. Figure 7(b) shows the residual plot against fitted values for the transformed data under the Box-Cox transformed truncated normal mode regression model.
Table 7 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively. It is seen that the MLEs under the false normality assumption are nearly the same as those under the truncated normality assumption. Figure 8 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively.
First, test the null hypothesis $H_0\colon \beta_{jk} = 0$ for all $(j, k)$ versus the alternative $H_1\colon \beta_{jk} \neq 0$ for some $(j, k)$. The asymptotic p-value is 0.3487, so the null hypothesis $H_0$ is not rejected.
Table 8 shows the MLEs under the false normality assumption and under the truncated normality assumption, respectively, without quadratic effects and interactions. Figure 9 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without quadratic effects and interactions.
Similarly, we are also interested in testing the null hypothesis $H_0\colon \lambda = 0$ and $\beta_{jk} = 0$ for all $(j, k)$ versus the alternative $H_1\colon \lambda \neq 0$ or $\beta_{jk} \neq 0$ for some $(j, k)$. The asymptotic p-value is 0.4313 under the truncated normality assumption, so the null hypothesis $H_0$ is again not rejected.
Table 9 shows the MLEs with $\lambda = 0$ under the false normality assumption and the truncated normality assumption, respectively. Figure 10 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, with $\lambda = 0$. Table 10 shows the MLEs under the false normality assumption and the truncated normality assumption, respectively, without quadratic effects and interactions and with $\lambda = 0$. Figure 11 shows the normal probability plots under the false normality assumption and the truncated normality assumption, respectively, without quadratic effects and interactions and with $\lambda = 0$.
Suppose that
$$y_l^{(\lambda)} = \beta_0 + \sum_{j=1}^{3} \beta_j x_{lj} + \varepsilon_l \qquad (50)$$
for $l = 27 + 1, \ldots, 27 + m$, where $m$ is a positive integer; $y_l$ is the $l$th observation for $(x_1, x_2, x_3) = (x_{l1}, x_{l2}, x_{l3})$; $\varepsilon_l$ is the $(l - 27)$th future error distributed as either $N(0, \sigma^2)$ or truncated $N(0, \sigma^2)$; and $\varepsilon_1, \ldots, \varepsilon_{27+m}$ are independent. Let $\alpha \in (0, 1)$ be fixed, e.g., $\alpha = 0.05$. For $l = 27 + 1, \ldots, 27 + m$, let $\Phi_l(\cdot; \theta)$ denote the c.d.f. of $\varepsilon_l$ and $q_{l,\alpha}(\theta)$ the $\alpha$ quantile of $y_l$. Then
$$q_{l,\alpha}(\theta) = \left\{ 1 + \lambda \left[ \beta_0 + \sum_{j=1}^{3} \beta_j x_{lj} + \Phi_l^{-1}(\alpha; \theta) \right] \right\}^{1/\lambda} \qquad (51)$$
for $l = 27 + 1, \ldots, 27 + m$, where
$$\Phi_l^{-1}(\alpha; \theta) = \sigma\, \Phi^{-1}\!\left( \alpha\, \Phi\!\left( \frac{-1/\lambda - \beta_0 - \sum_{j=1}^{3} \beta_j x_{lj}}{\sigma} \right) \right). \qquad (52)$$
Thus, $[q_{27+1,\alpha_m}(\theta), q_{27+1,1-\alpha_m}(\theta)] \times \cdots \times [q_{27+m,\alpha_m}(\theta), q_{27+m,1-\alpha_m}(\theta)]$ is a size $1 - \alpha$ prediction region for $(y_{27+1}, \ldots, y_{27+m})^T$ with MLE $[q_{27+1,\alpha_m}(\hat{\theta}), q_{27+1,1-\alpha_m}(\hat{\theta})] \times \cdots \times [q_{27+m,\alpha_m}(\hat{\theta}), q_{27+m,1-\alpha_m}(\hat{\theta})]$, where $\alpha_m \equiv [1 - (1 - \alpha)^{1/m}]/2$.
4 Conclusions and Discussion
Now consider the following transformed truncated normal mean regression model:
$$h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \qquad (53)$$
for $i = 1, \ldots, n$, where $y_i$ is the response for subject $i$ with known support $(a, b)$ ($\subset \mathbb{R}$); $\lambda$ is an unknown finite-dimensional transformation parameter vector; $h(\cdot; \lambda)$ is a known strictly increasing and differentiable real-valued function on $(a, b)$; $x_i$ is a known covariate vector for subject $i$; $\beta$ is an unknown finite-dimensional regression parameter vector; $f(\cdot; \beta)$ is a known regression function for each $\beta$; and the $\varepsilon_i$s are independent errors distributed as either $N(0, g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2)$ or truncated $N(\mu_i(\theta), \sigma_i^2(\theta))$ such that $\mu_i(\theta)$ is an unknown mean parameter and $\sigma_i(\theta)$ is an unknown positive standard deviation parameter; $z_i$ is a known covariate vector for subject $i$; $\gamma$ is an unknown finite-dimensional parameter vector; $g(\cdot, \cdot; \gamma)$ is a known positive function for each $\gamma$; and $\sigma$ is an unknown positive scale parameter. Notice that, for $i = 1, \ldots, n$, $f(x_i; \beta)$ is the mean of $h(y_i; \lambda)$ when it lies in the support of $h(y_i; \lambda)$, and $g^2(f(x_i; \beta), z_i; \gamma)\, \sigma^2$ is the variance of $h(y_i; \lambda)$.
By Johnson and Kotz (1994), well-known formulas are available for the moments of the truncated normal distribution. Suppose that ε_i ∼ N(μ_i(θ), σ_i²(θ)) is restricted to the interval (a_i, b_i), and set a_i′ = [a_i − μ_i(θ)]/σ_i(θ) and b_i′ = [b_i − μ_i(θ)]/σ_i(θ). Then ε_i conditional on {a_i < ε_i < b_i} has the truncated normal probability density function
\[
f(\varepsilon_i; \mu_i(\theta), \sigma_i(\theta), a_i, b_i)
= \frac{\phi([\varepsilon_i - \mu_i(\theta)]/\sigma_i(\theta))}{\sigma_i(\theta)\,[\Phi(b_i') - \Phi(a_i')]}. \tag{54}
\]
Then
\[
E_\theta(\varepsilon_i \mid \{a_i < \varepsilon_i < b_i\})
= \mu_i(\theta) + \frac{\phi(a_i') - \phi(b_i')}{\Phi(b_i') - \Phi(a_i')}\,\sigma_i(\theta) \tag{55}
\]
and
\[
\mathrm{Var}_\theta(\varepsilon_i \mid \{a_i < \varepsilon_i < b_i\})
= \sigma_i^2(\theta)\left[ 1 + \frac{a_i'\phi(a_i') - b_i'\phi(b_i')}{\Phi(b_i') - \Phi(a_i')}
- \left( \frac{\phi(a_i') - \phi(b_i')}{\Phi(b_i') - \Phi(a_i')} \right)^{\!2} \right]. \tag{56}
\]
Because equations (55) and (56) must be solved simultaneously for μ_i(θ) and σ_i(θ), evaluating the MLEs and carrying out the corresponding likelihood inference is time-consuming.
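Formulas (55) and (56) themselves are easy to implement. The following Python sketch (function and variable names are ours) computes the mean and variance of a doubly truncated normal distribution using only the standard library:

```python
from statistics import NormalDist

_nd = NormalDist()  # standard normal: _nd.pdf = phi, _nd.cdf = Phi

def truncnorm_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) truncated to (a, b), Eqs. (55)-(56)."""
    a0 = (a - mu) / sigma
    b0 = (b - mu) / sigma
    Z = _nd.cdf(b0) - _nd.cdf(a0)          # normalizing constant Phi(b') - Phi(a')
    dphi = _nd.pdf(a0) - _nd.pdf(b0)
    mean = mu + sigma * dphi / Z           # Eq. (55)
    var = sigma**2 * (1.0                  # Eq. (56)
                      + (a0 * _nd.pdf(a0) - b0 * _nd.pdf(b0)) / Z
                      - (dphi / Z)**2)
    return mean, var
```

For symmetric truncation around μ the mean is unchanged and the variance shrinks, which is a convenient sanity check.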
Consider the following transformed truncated normal median regression model:
\[
h(y_i; \lambda) = f(x_i; \beta) + \varepsilon_i \tag{57}
\]
for i = 1, ..., n, where the ε_i's are independent errors distributed as either N(0, g²(f(x_i; β), z_i; γ) σ²) or truncated N(μ_i(θ), σ_i²(θ)). Notice that, for i = 1, ..., n, f(x_i; β) is the median of h(y_i; λ), and g(f(x_i; β), z_i; γ) σ is the interquartile range of h(y_i; λ).
One way to obtain the μ_i(θ)'s is to use the Newton-Raphson method, but μ_i(θ) generally has no closed form that can be evaluated directly, so evaluating the μ_i(θ)'s takes too much time.
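To illustrate why each μ_i(θ) is costly: it solves a nonlinear equation of the form "the truncated mean equals a target value," which must be done iteratively for every observation. A minimal sketch (our own names; bisection is used in place of Newton-Raphson for robustness) for a normal distribution truncated above at b:

```python
from statistics import NormalDist

_nd = NormalDist()

def trunc_mean(mu, sigma, b):
    """Mean of N(mu, sigma^2) truncated to (-inf, b)."""
    b0 = (b - mu) / sigma
    return mu - sigma * _nd.pdf(b0) / _nd.cdf(b0)

def solve_mu(target, sigma, b, tol=1e-10, max_iter=200):
    """Find mu with trunc_mean(mu, sigma, b) == target (requires target < b).
    trunc_mean is increasing in mu and always below mu, so mu >= target."""
    lo, hi = target, target + 10.0 * sigma
    while trunc_mean(hi, sigma, b) < target:   # widen bracket if needed
        hi += 10.0 * sigma
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if trunc_mean(mid, sigma, b) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

One such root-finding problem per observation, nested inside the outer likelihood maximization, is exactly the computational burden the mean parameterization incurs.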
In this paper, we propose the transformed truncated normal mode regression model. An important advantage of the proposed model is that its MLEs are easy and fast to compute. Under the proposed model, we use the MLEs and the likelihood function to carry out hypothesis testing and to construct statistical intervals, and we compare the MLEs under the truncated normality assumption with those under the false normality assumption.
Under the false normality assumption, the log-likelihood function for θ is
\[
\log[L(\theta)] \equiv \ell(\theta) \equiv \sum_{i=1}^{n} \ell_i(\theta), \tag{58}
\]
where
\[
\ell_i(\theta) = \log[\phi(e_i(\theta))] + \log[h_i'(\lambda)] - \log[g_i(\beta, \gamma)] - \log(\sigma). \tag{59}
\]
Then the score function for θ is
\[
\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ell_i(\theta)}{\partial \theta} \equiv \sum_{i=1}^{n} S_i(\theta) \equiv S(\theta). \tag{60}
\]
We compare equations (59) and (60) with equations (16) and (19).
Suppose first that the standard deviation is fixed. If the sample size is not large, then the difference between the score functions for θ under the false normality assumption and under the truncated normality assumption is small. Hence, the MLEs under the false normality assumption are similar to those under the truncated normality assumption.
Suppose instead that the sample size is fixed. If the standard deviation is very small, then e_i(b; θ) generally tends to ∞ and e_i(a; θ) tends to −∞, so the truncation-adjustment terms in the score function vanish. Thus the difference between the score functions for θ under the two assumptions is again small, and the MLEs under the false normality assumption are similar to those under the truncated normality assumption.
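This claim is easy to check numerically. Under the Box-Cox example of Appendix A (a = 0, λ > 0, g_i = 1), the only term the truncated-normality score adds is the factor φ(e_i(0; θ))/[1 − Φ(e_i(0; θ))] at the standardized truncation point. The sketch below (names ours) shows it decaying rapidly as σ decreases:

```python
from statistics import NormalDist

_nd = NormalDist()

def truncation_correction(mean, lam, sigma):
    """Magnitude of the extra score factor phi(e0) / [1 - Phi(e0)], where
    e0 = (-1/lam - mean)/sigma is the standardized truncation point (lam > 0).
    As sigma -> 0, e0 -> -inf and the factor vanishes."""
    e0 = (-1.0 / lam - mean) / sigma
    return _nd.pdf(e0) / (1.0 - _nd.cdf(e0))
```

For instance, with mean = 1 and λ = 1/2, halving σ from 1 to 0.5 shrinks the factor by several orders of magnitude, so the two score functions nearly coincide.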
In Tables 2-5, there are no significant differences between the MLEs under the false normality assumption and those under the truncated normality assumption. A possible reason is that the sample size in Example 3.1 is not large.
In Tables 6-10, the MLEs under the false normality assumption are nearly the same as those under the truncated normality assumption. Possible reasons are that λ̂ and σ̂ are close to 0 and that the sample size in Example 3.2 is also not large.
When the range of the response transformation is possibly different from ℝ, the likelihood inference under the conventional normality assumption is inappropriate and thus should not be used. Therefore, when the range of the response transformation is possibly different from ℝ, we may assume that the proposed model holds and use the likelihood inference under the proposed model in Section 2.
References
[1] Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26, 211–252.
[2] Carroll, R. J. and Ruppert, D. (1988) Transformation and Weighting in Regression. Chapman and Hall, New York.
[3] Chen, C.-R. and Wang, L.-C. (2003) Likelihood inference under the general response transformation model with heteroscedastic errors. Taiwanese Journal of Mathematics, 7, 261–273.
[4] John, J. A. and Draper, N. R. (1980) An alternative family of transformations. Applied Statistics, 29, 190–197.
[5] Johnson, N. L. and Kotz, S. (1994) Distributions in Statistics: Continuous Univariate Distributions. John Wiley & Sons, New York.
[6] McLachlan, G. J. (2008) The EM Algorithm and Extensions. John Wiley & Sons, Hoboken.
[7] Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression. Addison-Wesley, Reading, Massachusetts.
[8] Poirier, D. J. (1978) The use of the Box–Cox transformation in limited dependent variable models. Journal of the American Statistical Association, 73, 284–287.
[9] Prakasa Rao, B. L. S. (1999) Semimartingales and Their Statistical Inference. Chapman & Hall/CRC, Boca Raton.
[10] Tukey, J. W. (1957) On the comparative anatomy of transformations. The Annals of Mathematical Statistics, 28, 602–632.
Appendix A
For i = 1, ..., n,
\[
S_i(\theta) = \frac{\phi'(e_i(\theta))}{\phi(e_i(\theta))}\,\frac{\partial e_i(\theta)}{\partial \theta}
+ \frac{\partial h_i'(\lambda)/\partial \theta}{h_i'(\lambda)}
- \frac{\partial \sigma/\partial \theta}{\sigma}
- \frac{\partial g_i(\beta, \gamma)/\partial \theta}{g_i(\beta, \gamma)}
- \frac{\partial \Phi(e_i(b; \theta))/\partial \theta - \partial \Phi(e_i(a; \theta))/\partial \theta}{\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))},
\]
where
\[
\phi'(e_i(\theta)) = -e_i(\theta)\,\phi(e_i(\theta))
\]
and, for e_i(u; θ) ∈ ℝ,
\[
\frac{\partial e_i(u; \theta)}{\partial \lambda} = \frac{\partial h(u; \lambda)/\partial \lambda}{\sigma\, g_i(\beta, \gamma)},
\qquad
\frac{\partial e_i(u; \theta)}{\partial \beta} = -\frac{\partial f(x_i; \beta)/\partial \beta}{\sigma\, g_i(\beta, \gamma)}
- \frac{h(u; \lambda) - f(x_i; \beta)}{\sigma\, g_i^2(\beta, \gamma)}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta},
\]
\[
\frac{\partial e_i(u; \theta)}{\partial \sigma} = -\frac{e_i(u; \theta)}{\sigma},
\qquad
\frac{\partial e_i(u; \theta)}{\partial \gamma} = -\frac{e_i(u; \theta)}{g_i(\beta, \gamma)}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma},
\qquad
\frac{\partial \Phi(e_i(u; \theta))}{\partial \theta} = \phi(e_i(u; \theta))\,\frac{\partial e_i(u; \theta)}{\partial \theta}\,1_{\mathbb{R}}(e_i(u; \theta)).
\]
As an example, when a = 0, b = ∞, h(u; λ) = u^{(λ)}, f_i(β) = x_i^T β, and g_i(β, γ) = 1 for u ∈ (0, ∞) and i = 1, ..., n,
\[
S_i(\theta) = -e_i(\theta)\,\frac{\partial e_i(\theta)}{\partial \theta}
+ y_i^{1-\lambda}\,\frac{\partial h_i'(\lambda)}{\partial \theta}
- \sigma^{-1}\,\frac{\partial \sigma}{\partial \theta}
+ \frac{\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))}\,\frac{\partial e_i(0; \theta)}{\partial \theta},
\]
where
\[
e_i(\theta) = \frac{y_i^{(\lambda)} - x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \lambda} = \frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma \lambda},
\qquad
\frac{\partial e_i(\theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \sigma} = -\frac{e_i(\theta)}{\sigma},
\]
\[
e_i(0; \theta) = -\frac{1/\lambda + x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \lambda} = \frac{1}{\sigma \lambda^2},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \sigma} = -\frac{e_i(0; \theta)}{\sigma},
\]
\[
\frac{\partial \sigma}{\partial \sigma} = 1,
\qquad
h_i'(\lambda) = y_i^{\lambda - 1},
\qquad
\frac{\partial h_i'(\lambda)}{\partial \lambda} = \log(y_i)\, y_i^{\lambda - 1}.
\]
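The closed-form score above can be validated against numerical differentiation. The sketch below (our own function names; scalar x_i and β for simplicity; a = 0, b = ∞, g_i = 1, λ > 0) implements the per-observation log-likelihood ℓ_i and score S_i for the Box-Cox example:

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def boxcox(y, lam):
    return (y**lam - 1.0) / lam  # lam != 0 assumed

def loglik_i(y, x, lam, beta, sigma):
    """Per-observation log-likelihood: log phi(e_i) + (lam-1) log y
    - log sigma - log[1 - Phi(e_i(0))] (truncated normality, a = 0)."""
    e = (boxcox(y, lam) - x * beta) / sigma
    e0 = (-1.0 / lam - x * beta) / sigma      # standardized truncation point
    return (math.log(_nd.pdf(e)) + (lam - 1.0) * math.log(y)
            - math.log(sigma) - math.log(1.0 - _nd.cdf(e0)))

def score_i(y, x, lam, beta, sigma):
    """Closed-form score (Appendix A) for theta = (lam, beta, sigma)."""
    e = (boxcox(y, lam) - x * beta) / sigma
    e0 = (-1.0 / lam - x * beta) / sigma
    mills = _nd.pdf(e0) / (1.0 - _nd.cdf(e0))
    de = {"lam": (math.log(y) * y**lam - boxcox(y, lam)) / (sigma * lam),
          "beta": -x / sigma, "sigma": -e / sigma}
    de0 = {"lam": 1.0 / (sigma * lam**2), "beta": -x / sigma, "sigma": -e0 / sigma}
    return {k: (-e * de[k]
                + (math.log(y) if k == "lam" else 0.0)   # y^(1-lam) dh'/dlam = log y
                - (1.0 / sigma if k == "sigma" else 0.0)
                + mills * de0[k])
            for k in ("lam", "beta", "sigma")}
```

Central finite differences of `loglik_i` in each coordinate should reproduce `score_i` to several decimal places, which is a cheap check on the algebra above.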
Appendix B
For i = 1, ..., n,
\[
J_i(\theta) = -\frac{\phi''(e_i(\theta))\,[\partial e_i(\theta)/\partial \theta][\partial e_i(\theta)/\partial \theta^T] + \phi'(e_i(\theta))\,[\partial^2 e_i(\theta)/\partial \theta\, \partial \theta^T]}{\phi(e_i(\theta))}
+ \frac{[\phi'(e_i(\theta))]^2\,[\partial e_i(\theta)/\partial \theta][\partial e_i(\theta)/\partial \theta^T]}{\phi^2(e_i(\theta))}
\]
\[
\quad - \frac{\partial^2 h_i'(\lambda)/\partial \theta\, \partial \theta^T}{h_i'(\lambda)}
+ \frac{[\partial h_i'(\lambda)/\partial \theta][\partial h_i'(\lambda)/\partial \theta^T]}{(h_i')^2(\lambda)}
+ \frac{\partial^2 \sigma/\partial \theta\, \partial \theta^T}{\sigma}
- \frac{[\partial \sigma/\partial \theta][\partial \sigma/\partial \theta^T]}{\sigma^2}
\]
\[
\quad + \frac{\partial^2 g_i(\beta, \gamma)/\partial \theta\, \partial \theta^T}{g_i(\beta, \gamma)}
- \frac{[\partial g_i(\beta, \gamma)/\partial \theta][\partial g_i(\beta, \gamma)/\partial \theta^T]}{g_i^2(\beta, \gamma)}
+ \frac{\partial^2 \Phi(e_i(b; \theta))/\partial \theta\, \partial \theta^T - \partial^2 \Phi(e_i(a; \theta))/\partial \theta\, \partial \theta^T}{\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))}
\]
\[
\quad - \frac{[\partial \Phi(e_i(b; \theta))/\partial \theta - \partial \Phi(e_i(a; \theta))/\partial \theta][\partial \Phi(e_i(b; \theta))/\partial \theta^T - \partial \Phi(e_i(a; \theta))/\partial \theta^T]}{[\Phi(e_i(b; \theta)) - \Phi(e_i(a; \theta))]^2},
\]
where
\[
\phi'(e_i(\theta)) = -e_i(\theta)\,\phi(e_i(\theta)),
\qquad
\phi''(e_i(\theta)) = [e_i^2(\theta) - 1]\,\phi(e_i(\theta)),
\]
the first derivatives of e_i(u; θ) are as in Appendix A, and, for e_i(u; θ) ∈ ℝ,
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \lambda^T} = \frac{\partial^2 h(u; \lambda)/\partial \lambda\, \partial \lambda^T}{g_i(\beta, \gamma)\, \sigma},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \beta\, \partial \beta^T}
= -\frac{\partial^2 f_i(\beta)/\partial \beta\, \partial \beta^T}{g_i(\beta, \gamma)\, \sigma}
+ \frac{[\partial f_i(\beta)/\partial \beta][\partial g_i(\beta, \gamma)/\partial \beta^T] + [\partial g_i(\beta, \gamma)/\partial \beta][\partial f_i(\beta)/\partial \beta^T]}{g_i^2(\beta, \gamma)\, \sigma}
+ 2\,\frac{h(u; \lambda) - f_i(\beta)}{g_i^3(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta^T}
- \frac{h(u; \lambda) - f_i(\beta)}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial^2 g_i(\beta, \gamma)}{\partial \beta\, \partial \beta^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \sigma^2} = 2\,\frac{e_i(u; \theta)}{\sigma^2},
\qquad
\frac{\partial^2 e_i(u; \theta)}{\partial \gamma\, \partial \gamma^T}
= 2\,\frac{e_i(u; \theta)}{g_i^2(\beta, \gamma)}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T}
- \frac{e_i(u; \theta)}{g_i(\beta, \gamma)}\,\frac{\partial^2 g_i(\beta, \gamma)}{\partial \gamma\, \partial \gamma^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \beta^T} = -\frac{\partial h(u; \lambda)/\partial \lambda}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta^T},
\qquad
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \sigma} = -\frac{\partial h(u; \lambda)/\partial \lambda}{g_i(\beta, \gamma)\, \sigma^2},
\qquad
\frac{\partial^2 e_i(u; \theta)}{\partial \lambda\, \partial \gamma^T} = -\frac{\partial h(u; \lambda)/\partial \lambda}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \beta\, \partial \sigma}
= \frac{\partial f_i(\beta)/\partial \beta}{g_i(\beta, \gamma)\, \sigma^2}
+ \frac{h(u; \lambda) - f_i(\beta)}{g_i^2(\beta, \gamma)\, \sigma^2}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \beta\, \partial \gamma^T}
= \frac{\partial f_i(\beta)/\partial \beta}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T}
+ 2\,\frac{h(u; \lambda) - f_i(\beta)}{g_i^3(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \beta}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma^T}
- \frac{h(u; \lambda) - f_i(\beta)}{g_i^2(\beta, \gamma)\, \sigma}\,\frac{\partial^2 g_i(\beta, \gamma)}{\partial \beta\, \partial \gamma^T},
\]
\[
\frac{\partial^2 e_i(u; \theta)}{\partial \sigma\, \partial \gamma} = \frac{e_i(u; \theta)}{g_i(\beta, \gamma)\, \sigma}\,\frac{\partial g_i(\beta, \gamma)}{\partial \gamma},
\qquad
\frac{\partial \Phi(e_i(u; \theta))}{\partial \theta} = \phi(e_i(u; \theta))\,\frac{\partial e_i(u; \theta)}{\partial \theta}\,1_{\mathbb{R}}(e_i(u; \theta)),
\]
and
\[
\frac{\partial^2 \Phi(e_i(u; \theta))}{\partial \theta\, \partial \theta^T}
= -e_i(u; \theta)\,\phi(e_i(u; \theta))\,\frac{\partial e_i(u; \theta)}{\partial \theta}\,\frac{\partial e_i(u; \theta)}{\partial \theta^T}\,1_{\mathbb{R}}(e_i(u; \theta))
+ \phi(e_i(u; \theta))\,\frac{\partial^2 e_i(u; \theta)}{\partial \theta\, \partial \theta^T}\,1_{\mathbb{R}}(e_i(u; \theta)).
\]
As an example, when a = 0, b = ∞, h(u; λ) = u^{(λ)}, f_i(β) = x_i^T β, and g_i(β, γ) = 1 for u ∈ (0, ∞) and i = 1, ..., n,
\[
J_i(\theta) = [1 - e_i^2(\theta)]\,\frac{\partial e_i(\theta)}{\partial \theta}\frac{\partial e_i(\theta)}{\partial \theta^T}
+ e_i(\theta)\,\frac{\partial^2 e_i(\theta)}{\partial \theta\, \partial \theta^T}
+ e_i^2(\theta)\,\frac{\partial e_i(\theta)}{\partial \theta}\frac{\partial e_i(\theta)}{\partial \theta^T}
- y_i^{1-\lambda}\,\frac{\partial^2 h_i'(\lambda)}{\partial \theta\, \partial \theta^T}
+ y_i^{2(1-\lambda)}\,\frac{\partial h_i'(\lambda)}{\partial \theta}\frac{\partial h_i'(\lambda)}{\partial \theta^T}
\]
\[
\quad + \sigma^{-1}\,\frac{\partial^2 \sigma}{\partial \theta\, \partial \theta^T}
- \sigma^{-2}\,\frac{\partial \sigma}{\partial \theta}\frac{\partial \sigma}{\partial \theta^T}
+ \frac{e_i(0; \theta)\,\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))}\,\frac{\partial e_i(0; \theta)}{\partial \theta}\frac{\partial e_i(0; \theta)}{\partial \theta^T}
- \frac{\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))}\,\frac{\partial^2 e_i(0; \theta)}{\partial \theta\, \partial \theta^T}
- \left[ \frac{\phi(e_i(0; \theta))}{1 - \Phi(e_i(0; \theta))} \right]^2 \frac{\partial e_i(0; \theta)}{\partial \theta}\frac{\partial e_i(0; \theta)}{\partial \theta^T},
\]
where
\[
e_i(\theta) = \frac{y_i^{(\lambda)} - x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \lambda} = \frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma \lambda},
\qquad
\frac{\partial e_i(\theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(\theta)}{\partial \sigma} = -\frac{e_i(\theta)}{\sigma},
\]
\[
\frac{\partial^2 e_i(\theta)}{\partial \lambda^2} = \frac{[\log(y_i)]^2 y_i^{\lambda} - [\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}]/\lambda}{\sigma \lambda}
- \frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma \lambda^2},
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \beta\, \partial \beta^T} = 0,
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \sigma^2} = 2\,\frac{e_i(\theta)}{\sigma^2},
\]
\[
\frac{\partial^2 e_i(\theta)}{\partial \lambda\, \partial \beta} = 0,
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \lambda\, \partial \sigma} = -\frac{\log(y_i)\, y_i^{\lambda} - y_i^{(\lambda)}}{\sigma^2 \lambda},
\qquad
\frac{\partial^2 e_i(\theta)}{\partial \beta\, \partial \sigma} = \frac{x_i}{\sigma^2},
\]
\[
e_i(0; \theta) = -\frac{1/\lambda + x_i^T \beta}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \lambda} = \frac{1}{\sigma \lambda^2},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \beta} = -\frac{x_i}{\sigma},
\qquad
\frac{\partial e_i(0; \theta)}{\partial \sigma} = -\frac{e_i(0; \theta)}{\sigma},
\]
\[
\frac{\partial^2 e_i(0; \theta)}{\partial \lambda^2} = -\frac{2}{\sigma \lambda^3},
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \beta\, \partial \beta^T} = 0,
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \sigma^2} = 2\,\frac{e_i(0; \theta)}{\sigma^2},
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \lambda\, \partial \beta} = 0,
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \lambda\, \partial \sigma} = -\frac{1}{\sigma^2 \lambda^2},
\qquad
\frac{\partial^2 e_i(0; \theta)}{\partial \beta\, \partial \sigma} = \frac{x_i}{\sigma^2},
\]
\[
\frac{\partial \sigma}{\partial \sigma} = 1,
\qquad
h_i'(\lambda) = y_i^{\lambda - 1},
\qquad
\frac{\partial h_i'(\lambda)}{\partial \lambda} = \log(y_i)\, y_i^{\lambda - 1},
\qquad
\frac{\partial^2 h_i'(\lambda)}{\partial \lambda^2} = [\log(y_i)]^2\, y_i^{\lambda - 1}.
\]
Appendix C
Consider the following Box-Cox transformed truncated normal two-way ANOVA model:
\[
y_{ijk}^{(\lambda)} = \mu_{ij} + \varepsilon_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}
\]
for i = 1, ..., a; j = 1, ..., b; and k = 1, ..., n, where a, b, n ∈ {2, 3, ...} and the ε_{ijk}'s are independent errors distributed as either N(0, σ²) or truncated N(0, σ²) with unknown positive standard deviation σ.
(i) Choose several initial values λ̂^{(0)} in a non-empty set S, e.g., S = {−2, −7/4, −3/2, −5/4, −1, −3/4, −1/2, −1/4, 0, 1/4, 1/2, 3/4, 1, 5/4, 3/2, 7/4, 2}.
(ii) For each λ̂^{(0)} in S, choose the initial values
\[
\hat\mu^{(0)} \equiv \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\qquad
\hat\tau_i^{(0)} \equiv \bar y_{i\cdot\cdot}^{(\hat\lambda^{(0)})} - \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\qquad
\hat\beta_j^{(0)} \equiv \bar y_{\cdot j\cdot}^{(\hat\lambda^{(0)})} - \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\]
\[
\widehat{(\tau\beta)}_{ij}^{(0)} \equiv \bar y_{ij\cdot}^{(\hat\lambda^{(0)})} - \bar y_{i\cdot\cdot}^{(\hat\lambda^{(0)})} - \bar y_{\cdot j\cdot}^{(\hat\lambda^{(0)})} + \bar y_{\cdots}^{(\hat\lambda^{(0)})},
\qquad
\hat\sigma^{2(0)} \equiv \frac{1}{abn - ab - 1} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left[ y_{ijk}^{(\hat\lambda^{(0)})} - \bar y_{ij\cdot}^{(\hat\lambda^{(0)})} \right]^2
\]
for i = 1, ..., a; j = 1, ..., b; and k = 1, ..., n, where the dot averages of the transformed observations are
\[
\bar y_{\cdots}^{(\lambda)} \equiv \frac{1}{abn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^{(\lambda)},
\qquad
\bar y_{i\cdot\cdot}^{(\lambda)} \equiv \frac{1}{bn} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^{(\lambda)},
\qquad
\bar y_{\cdot j\cdot}^{(\lambda)} \equiv \frac{1}{an} \sum_{i=1}^{a} \sum_{k=1}^{n} y_{ijk}^{(\lambda)},
\qquad
\bar y_{ij\cdot}^{(\lambda)} \equiv \frac{1}{n} \sum_{k=1}^{n} y_{ijk}^{(\lambda)}.
\]
(iii) Denote these θ̂^{(0)}'s as θ̂^{(0,1)}, θ̂^{(0,2)}, ..., θ̂^{(0,|S|)}, where |S| denotes the number of elements in S. Choose θ̂^{(0)} as the θ̂^{(0,ℓ*)} with ℓ(θ̂^{(0,ℓ*)}) = max_{1 ≤ ℓ ≤ |S|} ℓ(θ̂^{(0,ℓ)}).
In Example 3.1, the choice λ̂^{(0)} = −3/4 yields the largest initial log-likelihood, ℓ(θ̂^{(0)}) = 55.6467; this θ̂^{(0)} is then used as the starting value for the iterations.
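The initialization in steps (i)-(ii) amounts to computing classical two-way ANOVA effect estimates on the Box-Cox-transformed data for each trial λ̂^{(0)}. A Python sketch of step (ii) (our own names; y given as an a × b × n nested list of positive responses):

```python
import math

def boxcox(y, lam):
    return (y**lam - 1.0) / lam if lam != 0 else math.log(y)

def initial_values(y, lam0):
    """ANOVA-style initial estimates of step (ii) for one fixed lambda^(0)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    z = [[[boxcox(y[i][j][k], lam0) for k in range(n)]
          for j in range(b)] for i in range(a)]
    grand = sum(z[i][j][k] for i in range(a) for j in range(b) for k in range(n)) / (a*b*n)
    row = [sum(z[i][j][k] for j in range(b) for k in range(n)) / (b*n) for i in range(a)]
    col = [sum(z[i][j][k] for i in range(a) for k in range(n)) / (a*n) for j in range(b)]
    cell = [[sum(z[i][j]) / n for j in range(b)] for i in range(a)]
    mu = grand
    tau = [r - grand for r in row]
    beta = [c - grand for c in col]
    tb = [[cell[i][j] - row[i] - col[j] + grand for j in range(b)] for i in range(a)]
    sse = sum((z[i][j][k] - cell[i][j])**2
              for i in range(a) for j in range(b) for k in range(n))
    sigma2 = sse / (a*b*n - a*b - 1)
    return mu, tau, beta, tb, sigma2
```

Looping this over the grid S and keeping the candidate with the largest log-likelihood reproduces step (iii); the effect estimates satisfy the usual sum-to-zero constraints by construction.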
Appendix D
Suppose that the transformed truncated normal mode two-way ANOVA model is
\[
y_{ijk}^{(\lambda)} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}
\]
for i = 1, 2, 3 and j, k = 1, 2, 3, 4, where each y_{ijk} has support (0, ∞) and the ε_{ijk}'s are independent errors distributed as either N(0, σ²) or truncated N(0, σ²) with unknown positive standard deviation σ and with support (−∞, −1/λ − μ − τ_i − β_j − (τβ)_{ij}) for λ < 0. Thus, the c.d.f. of ε_{ijk}/σ is
\[
P_\theta(\{\varepsilon_{ijk}/\sigma < u\}) = \frac{\Phi(u)}{\Phi([-1/\lambda - \mu - \tau_i - \beta_j - (\tau\beta)_{ij}]/\sigma)}.
\]
By the probability integral transformation,
\[
\frac{\Phi(\varepsilon_{ijk}/\sigma)}{\Phi([-1/\lambda - \mu - \tau_i - \beta_j - (\tau\beta)_{ij}]/\sigma)} \sim \mathrm{uniform}(0, 1),
\]
which implies that
\[
\Phi^{-1}\!\left( \frac{\Phi(\varepsilon_{ijk}/\sigma)}{\Phi([-1/\lambda - \mu - \tau_i - \beta_j - (\tau\beta)_{ij}]/\sigma)} \right) \sim N(0, 1).
\]
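For the normal probability plots, each residual is mapped to an N(0, 1) score via this probability integral transformation. A minimal Python sketch (names ours), where `upper` stands for the truncation point −1/λ − μ − τ_i − β_j − (τβ)_{ij}:

```python
from statistics import NormalDist

_nd = NormalDist()

def normal_score(eps, sigma, upper):
    """Map a truncated-normal residual eps with support (-inf, upper) to an
    N(0,1) score via the probability integral transformation above."""
    u = _nd.cdf(eps / sigma) / _nd.cdf(upper / sigma)  # ~ uniform(0,1)
    return _nd.inv_cdf(u)                              # ~ N(0,1)
```

When the truncation point is many standard deviations away, the denominator is essentially 1 and the score reduces to the usual standardized residual eps/σ, which is why the truncated and false-normality probability plots look so similar in the examples.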
Table 1: Survival times (1 unit = 10 hours) of animals in a 3×4 factorial experiment.

                    B (Treatment)
A (Poison)      1       2       3       4
    1         0.31    0.82    0.43    0.45
              0.45    1.10    0.45    0.71
              0.46    0.88    0.63    0.66
              0.43    0.72    0.76    0.62
    2         0.36    0.92    0.44    0.56
              0.29    0.61    0.35    1.02
              0.40    0.49    0.31    0.71
              0.23    1.24    0.40    0.38
    3         0.22    0.30    0.23    0.30
              0.21    0.37    0.25    0.36
              0.18    0.38    0.24    0.31
              0.23    0.29    0.22    0.33
Table 2: MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
λ̂            −0.8073            −0.8077
μ̂            −1.4175            −1.4179
τ̂1            0.6797             0.6799
τ̂2            0.2878             0.2879
β̂1           −0.7383            −0.7386
β̂2            0.6451             0.6453
β̂3           −0.2778            −0.2779
(τβ)̂11        0.1359             0.1360
(τβ)̂12       −0.0658            −0.0659
(τβ)̂13        0.2160             0.2161
(τβ)̂21       −0.1043            −0.1043
(τβ)̂22        0.1142             0.1142
(τβ)̂23       −0.1234            −0.1234
σ̂             0.3567             0.3569
Table 3: MLEs without interaction under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
λ̂            −0.7440            −0.7441
μ̂            −1.3576            −1.3577
τ̂1            0.6393             0.6394
τ̂2            0.2696             0.2697
β̂1           −0.7383            −0.6938
β̂2            0.6123             0.6124
β̂3           −0.2778            −0.2644
σ̂             0.3636             0.3637
Table 4: MLEs with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
μ̂            −1.6232            −1.6228
τ̂1            0.8213             0.8219
τ̂2            0.3526             0.3524
β̂1           −0.8961            −0.8965
β̂2            0.7596             0.7608
β̂3           −0.3240            −0.3244
(τβ)̂11        0.2111             0.2106
(τβ)̂12       −0.1211            −0.1193
(τβ)̂13        0.2632             0.2626
(τβ)̂21       −0.1017            −0.1015
(τβ)̂22        0.1126             0.1120
(τβ)̂23       −0.1194            −0.1192
σ̂             0.4235             0.4241
Table 5: MLEs without interaction and with λ = −1 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.1.

MLE         False Normality   Truncated Normality
μ̂            −1.6232            −1.6216
τ̂1            0.8213             0.8242
τ̂2            0.3527             0.3512
β̂1           −0.7383            −0.8978
β̂2            0.7596             0.7635
β̂3           −0.3240            −0.3256
σ̂             0.4607             0.4627
Table 6: Cycles to failure of worsted yarn: 3³ factorial experiment without replication.

x1    x2    x3    Cycles to failure
−1    −1    −1      674
−1    −1     0      370
−1    −1     1      292
−1     0    −1      338
−1     0     0      266
−1     0     1      210
−1     1    −1      170
−1     1     0      118
−1     1     1       90
 0    −1    −1     1414
 0    −1     0     1198
 0    −1     1      634
 0     0    −1     1022
 0     0     0      620
 0     0     1      438
 0     1    −1      442
 0     1     0      332
 0     1     1      220
 1    −1    −1     3636
 1    −1     0     3184
 1    −1     1     2000
 1     0    −1     1568
 1     0     0     1070
 1     0     1      566
 1     1    −1     1140
 1     1     0      884
 1     1     1      360
Table 7: MLEs under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
λ̂            −0.2158            −0.2158
β̂0            3.4929             3.4929
β̂1            0.2142             0.2142
β̂2           −0.1626            −0.1626
β̂3           −0.0954            −0.0954
β̂12           0.0541            −0.0541
β̂13           0.0232            −0.0232
β̂23          −0.0124            −0.0124
β̂11          −0.0219             0.0219
β̂22          −0.0030             0.0030
β̂33          −0.0164            −0.0164
σ̂             0.0435             0.0435
Table 8: MLEs without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
λ̂            −0.0363            −0.0363
β̂0            5.6577             5.6577
β̂1            0.6611             0.6611
β̂2           −0.5010            −0.5010
β̂3           −0.2950            −0.2950
σ̂             0.1541             0.1541
Table 9: MLEs with λ = 0 under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
β̂0            6.4763             6.4763
β̂1            0.8324             0.8324
β̂2           −0.6310            −0.6310
β̂3           −0.3716            −0.3716
β̂12          −0.0383            −0.0383
β̂13          −0.0684            −0.0684
β̂23          −0.0208            −0.0208
β̂11          −0.1275            −0.1275
β̂22          −0.0176            −0.0176
β̂33          −0.0466            −0.0466
σ̂             0.1758             0.1758
Table 10: MLEs with λ = 0 and without quadratic terms under the false normality assumption and the truncated normality assumption, respectively, for Example 3.2.

MLE         False Normality   Truncated Normality
β̂0            6.3486             6.3486
β̂1            0.8323             0.8323
β̂2           −0.6310            −0.6310
β̂3           −0.3716            −0.3716
σ̂             0.1950             0.1950
Figure 2: (a) Residual plot against fitted values for the original data under the two-way ANOVA effects model for Example 3.1. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.1.
Figure 3: (a) Normal probability plot under the false normality assumption for Example 3.1. (b) Normal probability plot under the truncated normality assumption for Example 3.1.
Figure 4: (a) Normal probability plot under the false normality assumption without interactions for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions for Example 3.1.
Figure 5: (a) Normal probability plot under the false normality assumption with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption with λ = −1 for Example 3.1.
Figure 6: (a) Normal probability plot under the false normality assumption without interactions and with λ = −1 for Example 3.1. (b) Normal probability plot under the truncated normality assumption without interactions and with λ = −1 for Example 3.1.
Figure 7: (a) Residual plot against fitted values for the original data under the quadratic regression model for Example 3.2. (b) Residual plot against fitted values for the transformed data under the Box-Cox transformed mode regression model for Example 3.2.
Figure 8: (a) Normal probability plot under the false normality assumption for Example 3.2. (b) Normal probability plot under the truncated normality assumption for Example 3.2.
Figure 9: (a) Normal probability plot under the false normality assumption without quadratic effects and interactions for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions for Example 3.2.
Figure 10: (a) Normal probability plot under the false normality assumption with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption with λ = 0 for Example 3.2.
Figure 11: (a) Normal probability plot under the false normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2. (b) Normal probability plot under the truncated normality assumption without quadratic effects and interactions and with λ = 0 for Example 3.2.