單調無母數迴歸在廣義線性模型上之研究

(1)

國立交通大學

統計學研究所

碩士論文

單調無母數迴歸在廣義線性模型上之應用

Nonparametric Monotone Regression for

Generalized Linear Models

研究生：文誠智

指導教授：洪志真博士

(2)

單調無母數迴歸在廣義線性模型上之應用

Nonparametric Monotone Regression for

Generalized Linear Models

研究生：文誠智

Student

：Cheng-Chih Wen

指導教授：洪志真

Advisor

：Dr. Jyh-Jen Horng Shiau

國立交通大學

統計學研究所

碩士論文

A Thesis

Submitted to Institute of Statistics

College of Science

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master

in

Statistics

June 2007

Hsinchu, Taiwan, Republic of China

(3)

單調無母數迴歸在廣義線性模型上之研究

研究生：文誠智指導教授：洪志真博士

國立交通大學統計學研究所

摘要

本篇文章裡，為解決WAT(Wafer Acceptance Test)-EC(Engineering

Control) 的問題，我們發展了單調無母數迴歸。藉由 Gu(2002) ，

Zhang(2004)所提出的方法加以結合及修正，將反應變數拓展至整個指

數族上，與此相關的演算法也會在本文中提出。我們利用Natural Cubic

Splines的性質發展出有效率的計算法，並用模擬資料來探討其效率。

當反應變數為Bernoulli或Poisson分部時，其模擬的結果都有不錯的

表現。此外，在“真實函數＂具有單調性的情形下，有單調限制估計

量之ASE(Averages Square Error)與無單調限制並沒有明顯差異。然

而，當無單調限制之估計量呈現出非單調時，則單調限制估計量在ASE

上之表現會明顯優於前者。最後，我們將說明如何利用此方法來篩選

EC中的WAT測試項目並且建立適當的管制上下限。

(4)

Nonparametric Monotone Regression for

Generalized Linear Models

Student: Cheng-Chih Wen

Advisor: Dr. Jyh-Jen Horng Shiau

Institute of Statistic

National Chiao Tung University

Abstract

In this study, motivated by the WAT-EC problem, we develop a nonparametric monotone smoothing spline smoother for analyzing responses from exponential families by combining the methodologies provided in Gu (2002) and Zhang (2004) along with our modification. An algorithm with implementation details is provided. Computation is efficient because we utilize the characteristics of the natural cubic splines. The effectiveness of the proposed method is studied by simulation. The simulation results demonstrate that the proposed method performs well in the regression models with both the Bernoulli and Poisson responses. When the “true” function is monotonic, the proposed monotone estimator performs about the same as the unconstrained smoother in terms of the averaged squared error for the cases when the latter performs well. On the other hand, constrained smoother outperforms the unconstrained smoother when the unconstrained smoother produces non-monotone estimates. As an illustrative example, we demonstrate the proposed method can be used in screening WAT test items for more stringent engineering control and in setting appropriate control limits.

(5)

誌謝

陳鄰安教授曾經說過：「數學可以因數學而數學，統計不能因統計而統計」，我

用了兩年的碩士生涯才體會到這兩句話的涵義。不僅如此，除了在統計方面學到許多

新知識，也認識很多好朋友。對於此篇論文的完成，我要感謝我的指導老師洪志真

教授仔細認真的批閱，以及教導我做學問應有的態度，並耐心回答我在論文上所遇到

的問題，使我順利完成論文。也要感謝我的研究所同學們俊睿、益銘、花花、雪芳、

永在和建威，給我鼓勵與歡笑，帶給我美好的碩士生回憶。

最後要感謝博士班的碩慧學姐，除了論文上的協助外，也時常不吝惜分享其人生

經驗與價值觀，讓我了解自己欠缺的知識，並在我人生的規劃中給予適時的建議，受

益良多。

此外還要感謝家人的支持，以及女朋友的體諒，使我可以毫無顧慮地專心完成論

文。僅將此論文獻給我的老師、學姐、家人以及朋友們

文誠智謹誌于

國立交通大學統計研究所

中華民國九十六年六月

(6)

1 Introduction

The Wafer Acceptance Test (WAT) in semiconductor manufacturing is aimed at mon-itoring whether the electric characteristics of devices, such as voltage, current, and re-sistance, are regular or not. In Fab, every wafer must go through WAT device testing. In addition, a more stringent control, hereinafter referred to as “engineering control (EC)”, is further imposed on passing wafers. For implementing EC, engineers would need to select a set of critical WAT test items and determine more stringent “control” limits (than that of the regular WAT) for each of them. A wafer will be “held” up for further investigation when any of the EC items fails (i.e., exceeding the prescribed EC limits). However, without any objective assessment on how critical the WAT test items are toward the yield, engineers tend to select potential items as many as possible (sometimes even in hundreds) to perform the extra engineering control. Unfortunately, more than tolerable number of false alarms often occur with this practice. As a result, annoyed by the excess number of false alarms, engineers tend to ignore them, despite the extra efforts and costs spent in performing EC. Hence, to make EC more effective, it would be helpful if an assessment tool for screening EC test items is available for engineers and for evaluating the adequacy of the predetermined EC control limits as well.

Motivated by the above problem, as an assessment tool, we propose developing an EC performance curve for each WAT test item. The proposed EC performance curve of a test item aims at presenting the relationship between the EC passing rate and the

circuit probe (Cp) yield of the wafers, where the EC passing rate is the probability that

a randomly selected wafer passes this EC test item and the Cp yield is the proportion

of the chips on the wafer that pass the functional test called the circuit probe test. The logic behind defining the EC performance curve as such is that an effective control of

a critical EC test item would improve the process, which in turn leads to a higher Cp

yield. Thus an EC test item that can discriminate the Cp yield to some extent would be

(8)

1 displays the EC performance curves of three EC test items showing respectively the

probabilities, denoted by p(Cp) as a function of the Cp yield, of a wafer passing these

EC test items. The dashed curve depicts that the passing probability is one or almost

one for Cp > .15, indicating that most of wafers with Cp greater than .15 would pass

this test easily. In other words, such an EC test item can only discriminate low-yield wafers. Conversely, the dotted curve discriminates only wafers with high yields. The

solid curve discriminates better for the middle values of the Cp yield. Thus engineers

can pick the EC test items based on the process under study or monitoring. For

example, for a process with a fairly high Cp yield, an EC test with a performance curve

similar to the dotted curve may be a good candidate for the engineering control. We remark that a set of inappropriate control limits may make the performance curve of a critical-in-nature EC test item lose some or all of its discrimination power. Thus engineers may be able to adjust the control limits of an EC test to an appropriate level so that this particular EC test would have a desirable discrimination power for effective control.

Furthermore, although the WAT-EC tests and the circuit probe tests are quite different in nature, one tends to expect that the two test results of the same wafer should be somewhat positively associated. Thus it is reasonable to assume that the EC performance curves are monotonic.

An example of some similarity in nature as our WAT-EC problem is the item responses estimation problem discussed in Rossi, Wang, and Ramsay (2002). The data set consists of the responses of N examinees to n question items in a test. Assume that each item is answered either right or wrong. The authors proposed to estimate nonparametrically the probability that examinee j gets item i right from the discrete data and a covariate such as the IQ score for each of the N examinees via the EM Algorithm. Moreover, the discrimination power of the test items was also discussed in the paper.

To develop the EC performance curve for an EC test item, we adopt the non-parametric regression approach to estimating the functional relationship between the

(9)

passing rate and the Cp yield. In the nonparametric regression approach, the only

assumption on the regression function is smoothness and no functional form needs to be specified, which provides a great advantage of flexibility in function estimation and some convenience in modeling. The price we pay for adopting the nonparametric regres-sion approach instead of the parametric regresregres-sion approach is the slight inefficiency. However, this inefficiency only happens when the specified parametric regression model is adequate.

The statistical model we consider for the WAT-EC data is as follows. The indepen-dent variable (i.e., the covariate) of the regression is a random variable X representing

the Cp yield of a wafer. The dependent variable Y is the corresponding pass/fail result

of the WAT-EC test item for that wafer. Recall that p(x) is the passing probability of

the wafer with the Cp yield X = x. Then the dependent variable Y has a Bernoulli

distribution with a passing probability p(x). As mentioned before, we will estimate the EC performance curve p(·) by nonparametric monotone regression.

Although this study is motivated by the WAT-EC application, the nonparametric monotone regression estimation method developed in this work can be applied to ap-plications with the dependent variable Y from the exponential family. For example, the number of particles on a wafer with a covariate affecting the number of particles may be modeled by a Poisson distribution in which the mean number of particles may be of interest and could be described as a monotone function of the covariate. Or the number of defects in a product item may be again modeled by a Poisson distribution and the covariate could be a process condition that is monotonically associated with the number of defects when the product item was manufactured.

The topic of monotone function smoothing has been discussed for quite a long time in the literature. One of the major techniques used in the monotone regression focuses on the first derivative of the function to be estimated. Under the assumption that the random errors follow the normal distribution, Ramsay (1998) proposed expressing the first derivative of a monotone function as the exponential of a smooth function and estimating the smooth exponent by a B-spline. Under the same Gaussian model

(10)

for random errors, by adopting the penalized least squares approach, Zhang (2004) developed a simple method trying to obtain a monotone function estimate by forc-ing the estimated first derivative of the function to be non-negative or non-positive in computation. Based on the method constructed by Ramsay (1998), Wang (2000) ex-tended the distribution to the exponential family and developed a two-step algorithm to implement a monotonic regression technique. For more research works on nonpara-metric monotone regression, see Ramsay (1998), Wang (2000), Zhang (2004), and the references cited therein.

In this study, we adopt a different approach from that in Wang (2000). Instead of the exponent approach by Ramsay (1998), We combine the penalized likelihood approach given in Gu (2002) for estimating the parameter function when responses are from exponential families with the Zhang’s approach of forcing the estimated parameter function to be monotonic. With this approach, it is more natural that we use smoothing splines rather than B-splines. We remark that Zhang’s method has a hard-to-see flaw and with this flaw the monotonicity of the estimated function cannot be guaranteed. We modify Zhang’s method to ensure the monotonicity.

The rest of the paper is organized as follows. Section 2 reviews the three components we use in developing our estimation method, including smoothing splines as described in Green and Silverman (1994), Zhang’s approach to forcing a smoothing spline estimate to be monotonic, and Gu’s approach and algorithm for nonparametric regression when data are from exponential families. Section 3 describes the estimation method and the algorithm we propose in this paper. Section 4 examines the effectiveness of the proposed method by a simulation study for the cases of Bernoulli data and Poisson data. A comparison study is conducted to demonstrate the value of adding the monotone constraint. Section 5 returns to the motivated WAE-EC example and illustrates how to use the proposed method as an assessment tool. Section 6 concludes the paper with a brief summary and some discussions.

(11)

2 Literature Review

In the following subsections, we review the three components we use in developing the proposed method, including smoothing splines (in natural cubic splines), monotone smoothing splines (also in natural cubic splines), and smoothing splines for responses from exponential families.

2.1 Smoothing Splines - Natural Cubic Splines

Smoothing splines has been a very popular smoothing technique for decades. For computational purpose, we only review smoothing cubic splines as described in Green and Silverman (1994). For other aspects of smoothing splines, readers are referred to Wahba (1990), Eubank (1999), and Gu (2002) and the research works cited therein.

Consider the problem of fitting a curve from a set of noisy data {(x1, Y1), . . . , (xn, Yn)},

where xi ∈ [a, b], i = 1, . . . , n. Let S2[a, b] denote the space of functions that are

dif-ferentiable on [a, b] with absolutely continuous first derivative and square integrable

second derivative. Given any function g in S2[a, b], let the penalized sum of squares of

g be S(g) = 1 n n X i=1 {Yi− g(xi)} 2 + λ Z b a {g00(x)}2dx, (1)

where λ > 0 is the smoothing parameter controlling the tradeoff between the closeness

to data and the smoothness of the fitted curve. The smoothing spline estimate ˆg is

defined as the minimizer of the functional S(g) over all g ∈ S2[a, b].

Suppose the real numbers x1, . . . , xn are given on interval [a, b] such that a ≤ x1 <

x2 < . . . < xn ≤ b. A function g defined on [a, b] is called a cubic spline when

the following two conditions are satisfied: (i) g is a cubic polynomial on each of the

subintervals (a, x1), (x1, x2), . . . , (xn, b); (ii) the first and second derivatives of g are

continuous at each knot xi. If, in addition, the second and third derivatives of g are

zero at a and b, then g is said to be a natural cubic spline. More specifically, a natural

cubic spline is linear on the two boundary subintervals [a, x1] and [xn, b]. It is well

(12)

Wahba, 1990).

Suppose that g is a natural cubic spline on interval [a, b] with knots x1, . . . , xn.

Denote

gi = g(xi) and γi = g00(xi) for i = 1, . . . , n.

According to the definition of a natural cubic spline, the second derivative of g at x1

and xn are zero, that is, γ1 = γn = 0. Let g be the vector (g1, . . . , gn)

T

and γ be

the vector (γ2, . . . , γn−1)

T

. They constructed the following two matrices Q and R for

computation. Denote hi = xi+1− xi for i = 1, . . . , n − 1. Let Q be the n × (n − 2)

matrix with elements qij given by

qj−1,j = h−1j−1, qj,j = −h−1j−1− h −1

j , qj+1,j = h−1j ,

and qij = 0 for |i − j| ≥ 2 for i = 1, . . . , n and j = 2, . . . , n − 1. Note that the top

left element of Q is q12 and the bottom right element is qn,n−1. R is the symmetric

(n − 2) × (n − 2) band matrix with nonzero elements rij given by

rii = 1 3(hi−1+ hi) for 2 ≤ i ≤ n − 1, ri,i+1 = ri+1,i = 1 6hi for 2 ≤ i ≤ n − 2.

The matrix R is strictly diagonal dominant, that is, |rii| > Pj6=i|rij| for each i. It

follows that R is strictly positive definite. Define an n × n matrix K by

K = QR−1QT.

Green and Silverman (1994) proved that the vectors g and γ can specify a natural

cubic spline g if and only if the condition QTg = Rγ is satisfied. When this condition

holds, the roughness penalty satisfies Z b

a

{g00(t)}2dt = γTRγ = gTKg.

Return to the curve fitting problem. Since the smoothing spline estimator ˆg is a

(13)

need to search over a finite-dimensional class of functions, i.e., the natural cubic splines

with knots at the xi’s, instead of the infinite-dimensional space S2[a, b].

Let g be the natural cubic spline formed by the vectors g and γ, and matrices Q and R. Rewrite S(g) in terms of these vectors and matrices as follows. Let Y =

(Y1, . . . , Yn)T. Express the residual sum of squares about g as (Y − g)T(Y − g) and

the roughness penalty term R

{g00_(x)}2_{dx as g}T_{Kg to obtain} S(g) = 1 n(Y − g) T (Y − g) + λgTKg = 1 n n gT(I + nλK)g − 2YTg + YTYo.

Since K is non-negative definite, the matrix I + nλK is strictly positive definite. It therefore follows that S(g) has a unique minimum, which can be expressed as

g = (I + nλK)−1Y. (2)

Green and Silverman (1994) showed that the vector g can define the smoothing spline ˆ

g uniquely. That is, over the space of all natural cubic splines with knots xi, S(g) has

the unique minimum satisfying (2). Furthermore, the value of g(x) at any point x can

be specified by the vectors g and γ, where γ can be obtained by solving QTg = Rγ.

More specifically, on each subinterval [xi, xi+1], 1 ≤ i ≤ n − 1, it can be shown that

g(x) = (x − xi) gi+1+ (xi+1− x) gi hi −1 6(x − xi)(xi+1− x) 1 + x − xi hi γi+1+ 1 + xi+1− x hi γi for xi ≤ x ≤ xi+1.

If x is in the two boundary subintervals, by the fact that a natural cubic spline is linear on the boundary subintervals, we have

g(x) = g1− (x1− x)g0(x1) for x ≤ x1,

g(x) = gn+ (x − xn)g0(xn) for x ≥ xn,

where g0(x1) and g0(xn) are derivatives of g at x1 and xn, respectively, which can be

obtained by g0(x1) = g2− g1 x2− x1 − 1 6(x2− x1) γ2, g0(xn) = gn− gn−1 xn− xn−1 + 1 6(xn− xn−1) γn−1.

(14)

2.2 Monotone Smoothing Splines

Zhang (2004) proposed a simple and efficient monotone smoother based on smoothing spline estimation. The main idea is to impose a monotone constraint on the derivative of the estimated regression function.

Assume that data {(xi, Yi), i = 1, . . . , n} are sampled from the following

nonpara-metric regression model

Yi = f (xi) + i, i = 1, . . . , n,

where f (x) is an unknown smooth function with thrice continuous derivatives on

inter-val [a, b] and the random errors i’s are white noise with mean zero and standard

devi-ation σ. For simplicity, assume that the design points xi satisfy a ≤ x1 < . . . < xn ≤ b.

A smooth estimator of f can be defined as the minimizer ˆf of the following penalized

least squares criterion: 1 n n X i=1 {Yi− f (xi)}2+ λ Z {f000(x)}2dx, (3)

where again λ > 0 is the smoothing parameter.

To derive closed-form formulas for both ˆf (x) and its derivative, write f (x) in terms

of g(x) = f0(x) as

f (x) = f (x1) +

Z x

x1

g(u)du. (4)

Substitute (4) into (3) to obtain the following regularization criterion: 1 n n X i=1 {Yi− f (x1) − Z xi x1 g(x)dx} 2 + λ Z {g00(x)}2dx. (5)

Let ( ˆf (x1), ˆg) be the minimizer of (5).

By treating g as a natural cubic spline with knots xi’s for i = 1, . . . , n, Zhang

(2004) established the relationship between the function f and its derivative g using

the method of Green and Silverman (1994) and gave the closed-form formulas for ˆf

and ˆg, respectively, as in the following.

Denote fi = f (xi), gi = g(xi), and γi = g00(xi) for i = 1, . . . , n. Since g is a natural

(15)

γ = (γ2, . . . , γn−1)T. Let hi = xi+1−xi, i = 1, 2, . . . , n−1, and Q, R, K be the matrices

as defined in Green and Silverman (1994, pages 12,13) (also defined in Subsection 2.1). According to their Theorem 2.1, it can be shown that g is a natural cubic spline with

knots x1, . . . , xn if and only if γ = R−1QTg. As mentioned before,

K = QR−1QT and

Z b

a

g00(x)2dx = gTKg.

Denote the n-dimensional column vector of zeros by 0n . Let C = (c1, . . . , cn)T and

D = (d1, . . . , dn)T, where c1 = 0n, d1 = 0n−2, and, for i = 2, 3, . . . , n,

ci = (h1, h1+ h2, . . . , hi−2+ hi−1, hi−1, 0, . . . , 0)

T , di = (h31+ h 3 2, . . . , h 3 i−2+ h 3 i−1, h 3 i−1, 0, . . . , 0) T .

Note that ci and di are n × 1 and (n − 2) × 1 vectors, respectively. According to

Proposition 1 in Zhang (2004), the vector f can be expressed in terms of C, D, Q, R, and g as f = f (x1) · 1n+ { 1 2C − 1 24DR −1 QT}g.

To find the estimator of the function f , denote

M = 1 2C − 1 24DR −1 QT , ˜g = f1 g ! , ˜M = 1n M , and ˜K = 1 0 T n 0n K ! . Then the minimizer of the regularization problem (5) is

ˆ ˜ g = fˆ1 ˆ g ! = ( ˜MTM + nλ ˜˜ K)−1M˜TY, (6)

where Y = (Y1, · · · , Yn)T. We then have the estimator ˆf = ˜M ˆ˜g.

We remark that Zhang (2004) has a typo that the matrix ˜M is defined as an

(n + 1) × (n + 1) matrix ˜ M = 1 0 T n 1n M ! ,

which cannot be right since the vector Y in (6) is an n × 1 vector.

Zhang (2004) developed a simple method with an attempt to obtain to a monotone estimate. Without loss of generality, assume that f is decreasing, hence g is

(16)

this goal, Zhang (2004) replaced ˆg(x) by ˆg+(x) = max(ˆg(x), 0). He then estimated f by ˆ f = ˜M fˆ1 ˆ g+ ! , where ˆg+ = (ˆg+(x1), · · · , ˆg+(xn))T.

Unfortunately, the monotonicity of ˆf cannot be guaranteed. The reason is that the

i-th element of ˆf , denoted by ˆfi, is not exactly the value of ˆf (x1) +

Rxi

x1 gˆ+(u)du as

desired. Instead, it is ˆf (x1) +

Rxi

x1 g˜+(u)du, where the function ˜g+(·) is the natural cubic

spline interpolating {(xi, ˆg+(xi), i = 1, ..., n} and there is no guarantee that it would

be nonnegative.

2.3 Regression with Responses from Exponential Families

For the response variable Y from an exponential family distribution along with a co-variate x, consider the following conditional density function

f (y|x) = exp{yη(x) − q(η(x))

φ + c(y, φ)}, (7)

where q, and c are known functions, η(x) is the parameter function to be estimated, and φ > 0 is a known dispersion parameter independent of x. It is well known that

E[Y |x] = q0(η(x)) = µ(x) and V ar[Y |x] = q00(η(x))φ.

Assume the responses Yi corresponding to the covariate xi, i = 1, · · · , n, are i.i.d.

from the exponential family distribution (7). Then the penalized log-likelihood func-tional can be expressed as

−1 n n X i=1 {Yiη(xi) − q(η(xi))} + λ 2J (η), (8)

where J (λ) is the roughness penalty. Gu (2002) showed that the minimizer of (8) can

be computed via the Newton iteration, which updates ˜η (the η obtained in the last

iteration) by the minimizer of the following penalized weighted least squares functional 1 n n X i=1 ˜ ωi{ ˜Yi− η(xi)} 2 + λJ (η),

where ˜Yi = ˜η(xi) − ˜ui/ ˜wi with ˜ui = −Yi + q0(˜η(xi)) and ˜wi = q00(˜η(xi)). Note that

˜

(17)

3 Proposed Method

3.1 Penalized Weighted Least Squares Function Estimator for

Exponential Families

Consider the nonparametric regression for a generalized linear model specified within

the exponential family. The data observed are i.i.d. samples (xi, Yi), for i = 1, . . . , n,

from an exponential family distribution. The covariate variable xi is within some

interval [a, b] and the density of Yi is

f (y|x) = exp{yη(x) − q(η(x))

φ + c(y, φ)},

where q and c are known functions, and φ > 0 is either known or considered as a nuisance parameter. The unknown parameter function η(x) is the central aim of esti-mation. In our study, η(x) needs to meet both smoothness and monotone conditions on interval [a, b]. Instead of maximizing the log-likelihood, we choose to minimize the following penalized log-likelihood functional

−1 n n X i=1 {Yiη(xi) − q(η(xi))} + λ 2 Z b a [D(m)η(x)]2dx, (9)

where λ is the smoothing parameter.

Gu (2002) gave a quadratic approximation of −Yiη(xi) + q(η(xi)) at ˜η(xi) as

1 2ω˜i{η(xi) − ˜η(xi) + ˜ ui ˜ wi } 2 + Ci,

where ˜ui = −Yi + q0(˜η(xi)), ˜ωi = q00(˜η(xi)), and Ci is not related to η(xi). Without

imposing the constraint that η(x) is monotonic, the minimizer of the penalized log likelihood functional (9) can be obtained by recursively finding the minimizer of the penalized weighted least squares functional

l = 1 n n X i=1 ˜ ωi{ ˜Yi− η(xi)} 2 + λ Z b a [D(m)η(x)]2dx (10)

(18)

3.2 Monotone Natural Spline Estimator

To impose the monotone constraint in the estimation of η(x), we focus on the estimation of the first derivative of η(x). Since η(x) is non-increasing (or non-decreasing), the

derivative η0(x) is non-positive (or non-negative). There are reasons for dealing with

η0(x). Firstly, in practice, it is easier to impose non-positiveness (or non-negativeness)

on η0(x) than to impose monotonicity on η(x). Secondly, we can derive a closed-form

formula for η0(x) easily by the property of natural cubic splines as given in Green and

Silverman (1994).

Assume that xi, i = 1, . . . , n, are real numbers on interval [a, b], which satisfy

a ≤ x1 < x2 < . . . < xn ≤ b. Since we are more focused on the smoothness of η0(x)

than that of η(x), it is natural to choose m = 3 in the roughness penalty functional of (10). For any x ∈ [a, b],

η(x) = η(a) + Z x

a

g(u)du. Substituting the above expression into (10), we have

l = 1 n n X i=1 ˜ ωi{ ˜Yi− η(a) − Z xi a g(x)dx} 2 + λ Z b a [g00(x)]2dx. (11)

We remark that Zhang (2004) specified that ˆg, the minimizer of (5), is a natural

cubic spline; however, according to Wahba (1990), ˆg should be a piecewise quartic

polynomial in [x1, xn] and linear in the two boundary subintervals if the function space

to search for the minimizer is S2[a, b].

Mainly for the computational purpose and also for simplicity, we shall restrict the

function space to search for the minimizer of (11), denoted by ˆg, to be the class of

natural cubic splines. That is, ˆg is set to be a natural cubic spline. We remark that

the class of natural cubic splines is a smaller space than S2[a, b] but it is rich enough

for approximating any reasonably smooth function.

Using the same notation as in Green and Silverman (1994), let η = (η1, . . . , ηn),

γ = (γ2, . . . , γn−1)

T

, g = (g1, . . . , gn)

T

, where ηi = η(xi), gi = g(xi), and γi = g00(xi)

(19)

Zhang (2004) by expressing η(x) as η(x) = η(a)+Rx

a g(u)du instead of η(x1)+

Rx

x1g(u)du.

Then the matrices of C and D need to be modified as well. Denote η(a) by ηa.

Let hi = xi+1−xi, for i = 1, . . . , n−1, h0 = x1−a, k1 = h0(2+h0/h1), k2 = −h02/h1,

and k3 = −2h20h1. Let the matrices Q, R, and K be defined as in Subsection 2.1.

Modify the matrices C and D by replacing ci with ci + k1 · e1n+ k2 · en2 and di with

di+ k3· en1, where eki is the k-dimensional unit vector with the i th element being 1 and

0 elsewhere. In words, the matrix C is modified by adding k1 to the first column and

k2 to the second column. Similarly, D is modified by adding k3 to the first column.

Let M = 1₂C − ₂₄1DR−1QT _{as before.}

Proposition 1 gives the relationship between η and g. All the proofs in this paper are given in the Appendix A.

Proposition 1 η = ηa· 1n+ M g.

The main reason that we consider η(a) rather than η(x1) is that the matrix M

constructed in this way is invertible while the matrix M in Zhang (2004) is not. Another advantage is that η(a) can be specified by the vector g, utilizing the fact that a natural

cubic spline is linear on the boundary subinterval [a, x1], see the Appendix A.

Proposition 2 shows that the matrix M in Proposition 1 is invertible. In the follow-ing, when the dimension of a matrix or vector helps readfollow-ing, we will add the dimension

as the subscript in the notation. For example, Mn,n denotes the n by n matrix M .

Proposition 2 Mn,n is invertible for all n ≥ 3.

According to Proposition 1, the vector η can be specified by g and ηa. Given ηa,

(11) can be written in matrix form as

l = 1 n( ˜Y − M g) T _˜ W ( ˜Y − M g) + λgTKg = 1 n{g T_(MT_{W M + nλK)g − 2 ˜}_˜ _YT_{W M g + ˜}_˜ _YT_{W ˜}_˜_Y}, ₍₁₂₎ where ˜Y = { ˜Y1− ηa, . . . , ˜Yn− ηa} T and ˜W = diag(˜ω1, . . . , ˜ωn).

Since K is semi-positive definite, the matrix MT_{W M is strictly positive definite by}_˜

(20)

at

ˆ

g = (MTW M + nλK)˜ −1MTW ˜˜Y. (13)

Suppose the parameter function ˆη is monotone increasing. Then ˆg must be non-negative

everywhere. To achieve this, we follow the same approach adopted by Zhang (2004)

by setting ˆg(xi) to 0 when ˆg(xi) < 0. Let ˆg+= (ˆg+(x1), · · · , ˆg+(xn))T, where ˆg+(xi) =

max(ˆg(xi), 0). We then use ˆg+ to construct vector ˆη instead of using ˆg. That is,

ˆ

η = ˆηa· 1n+ M ˆg+. (14)

It is interesting to observe that M ˆg+ can be conceived as integrating the natural cubic

spline that interpolates the points {(xi, ˆg+(xi)), i = 1, . . . , n}. Denote the interpolating

natural cubic spline by ˜g+. Unfortunately, when ˆg+(xk) = ˆg+(xk+1) = 0 for some k, it

is impossible to have ˆg+(x) ≥ 0 for all x ∈ [xk, xk+1]. Then the monotonicity of ˆη is

lost. To remedy this problem, we modify the estimate to ensure the monotonicity as follows.

For xi ≤ x ≤ xi+1, i = 1, ..., n − 1, let

ˆ ηmon(x) = ˆη(a) + i X j=1 τj Z xj xj−1 ˆ g+(u)du + τx Z x xi ˆ g+(u)du,

where τi is an indicator function defined as 1 if ˆηi > ˆηi−1 and 0 otherwise and τx = 1

when Rx

xigˆ+(u)du > 0 and 0 otherwise. Note that the value of

Rx

xiˆg+(u)du can be

obtained as described in Subsection 2.1 since ˆg+ is a natural cubic spline. It is obvious

ˆ

ηmon is monotonic.

For computational purpose, we shall express ˆη_mon = (ηmon,1, ..., ηmon,n)T in matrix

form. Let ˜M be the (n + 1) × (n + 1) matrix given by

˜ M = 1 0 T n 1n M ! .

Let N be the n × (n + 1) matrix with elements nij, 1 ≤ i, j ≤ n, given by nii = −1,

ni,i+1 = 1 for 1 ≤ i ≤ n and nij = 0 elsewhere. Let S = (s1, s2, . . . , sn) be the n × n

matrix given by si = (0Ti−1, τi · 1Tn−i+1)

T

for i = 1, . . . , n.

Proposition 3 ηˆ_mon= ˆηa· 1n+ SN ˜M ¯g , where ¯g =

ˆ ηa ˆ g+ ! .

(21)

Note that N ˜M ¯g = N ˜M ηˆa ˆ g+ ! = N       ˆ ηa ˆ η1 .. . ˆ ηn       =       ˆ η1− ˆηa ˆ η2− ˆη1 .. . ˆ ηn− ˆηn−1       ,

which computes the integral Rxi+1

xi ˜g+(u)du for each subinterval. The effect of S is to

accumulate these integrals up to xi but skip those integrals Rxxii+1g(u)du for which

ˆ

ηi ≤ ˆηi−1.

3.3 Algorithm

We propose using the back-fitting approach to obtain estimators ˆηa and ˆg+. For

back-fitting, see Hastie and Tibshirani (1990). Recall that the log-likelihood function for the exponential family is

l = −1 n n X i=1 {Yiη(xi) − q(η(xi))} + λ 2 Z b a [g00(x)]2dx, (15)

where η(xi) = ηa+ M [i, .]g, if g is a natural cubic spline. Given g, we can get a suitable

value of ηa by solving the equation

∂l ∂ηa = −1 n n X i=1 {Yi− ∂q(η(xi)) ∂ηa } = 0.

Substitute the new ηa into (12) to obtain the new estimate of g by (13). Thus,

re-peat the iteration until the value of ηa converges. Since the equation ∂l/∂ηa = 0 is

non-linear, we propose using the Newton-Ralphson method to get the new estimate of η(a). For details of the Newton-Ralphson method, see, for example, Burden and Faires (2001).

Back-fitting Algorithm Step 1

Given the covariate (x1, . . . , xn), calculate the n × n matrix C, n × (n − 2) matrices

Q and D, (n − 2) × (n − 2) matrix R, then compute the matrices K = QR−1QT and

(22)

Step 2

Begin with iteration k = 0. Set ˆg(0) _{= (ˆ}_g(0)

1 , . . . , ˆgn(0))

T

and ˆη(0)

a to some initial values.

Set the tolerance level T. Step 3

Given ˆg(k), update ˆη_a(k) by ˆη_a(k+1), the minimizer of (15), with the Newton-Ralphson

method. Step 4

Construct the n × n diagonal matrix ˜W and ˜Y with ˆg(k) _{and ˆ}_η(k+1)

a . Then update ˆg(k) by ˆg(k+1) _{= (M}T_{W M + nλK)}_˜ −1_MT_{W ˜}_˜_Y. Step 5 If ˆ η(k+1)(a)−ˆη(k)(a) ˆ η(k)_(a)

< T , stop iterating and set vector ˆη = ˆη

(k+1)

a · 1n+ SN ˜M ¯g+.

Oth-erwise, set ˆg(k)_{← ˆ}_g(k+1)

+ and ˆη(k)a ← ˆη(k+1)a , return to Step 3.

4 Simulation Studies

4.1 Performance Evaluation

To evaluate the effectiveness of the proposed method, we apply the monotone regression smoother on some data generated from exponential family models, including Bernoulli and Poisson data as illustrative examples. The smoothing parameter λ is chosen by the Generalized Cross Validation (GCV) method proposed by Craven and Wahba (1979).

4.1.1 Bernoulli Data

n observations, {(xi, Yi), i = 1, · · · , n}, are generated independently, in which xi’s are

generated independently from interval [0, 1] uniformly and Yi is generated from the

Bernoulli distribution with the probability function P (Yi = 1|xi) = p(xi), where p(x)

(23)

the Bernoulli response Y given the covariate x can be written as f (y|x) = exp{y η(x) − log(1 + exp(η(x)))}, where η(x) = log(p(x)/(1 − p(x))).

As an illustrative example, we choose n=200 and p(x) = 1−(1−x4.5₎2.5_{for x ∈ [0, 1].}

That is, xi ∼ U (0, 1), Yi ∼ Bernoulli(p(xi)), i = 1, . . . , 200. The simulation results

are displayed in Figure 2. The solid line is the target function p(x), the dotted line is the estimated smooth curve under monotone constraint with smoothing parameter λ = 0.00005 chosen by GCV. It is observed that the estimated function is fairly close to the target function for this example.

In Figure 2(a), the dots are raw data {(xi, Yi)}, while the dots in Figure 2(b) are

binned data. For binning data, the interval [0, 1] is divided into 25 equally spaced subintervals. For each subinterval, we count the points and calculate the proportion of the 1’s in that subinterval. The points plotted on Figure 2(b) are the proportion

of 10s versus the midpoint of the corresponding subinterval. While it is hard to read

from the raw data the information of p(x), the binned data can follow the trend of the underlying target function p(x) pretty well.

4.1.2 Poisson Data

The conditional density of the Poisson distribution can be expressed as

f (y|x) = φ(x)

y

exp (−φ(x))

y! = exp{yη(x) − e

η(x)_{− log(y!)},}

where η(x) = log φ(x). Assume the target function is

φ(x) = log(x2+ 1) for 1 ≤ x ≤ 3.

200 covariates {xi} are generated independently from U (1, 3), and the response Yi

follows Poisson(φ(xi)), for 1 ≤ i ≤ 200. Figure 3 shows the fitting results. In Figure 3,

the solid line is the target function φ(x) and the dotted line is the estimated parameter

(24)

see how well the GCV estimate performs, we also show the “optimal” estimated curve

(the dash-dot line) for which λopt = 0.5 about this example is the λ minimizing the

averaged squared error (ASE) defined as

ASE( ˆφ(x)) = 1 n n X i=1 ˆ φ(xi) − φ(xi) 2 .

Note that ˆφ(·) depends on λ.

Again, the dots in Figure 3(a) are raw data {(xi, Yi)} while the dots in Figure 3(b)

are the binned data. When comparing the two estimated curves in terms of ASE, we

find the values of ASE are 0.01190 and 0.01189 for λopt and λGCV, respectively. There

is no obvious difference between the curves corresponding to λ = 1.5 and λ = 0.5. Consider another example in which the curvature of the mean function has more variation. Let φ(x) = 1 e − e −x + x −sin(π x) π for 1 ≤ x ≤ 3.

Similarly, generate 200 xi’s from U (1, 3) randomly and Yi’s accordingly. λGCV = 0.25

while λopt = 2.5 × 10−5 by minimizing ASE criterion. The results are shown in Figure

4. The estimated curve with λGCV has obvious departures from the target function φ.

On the other hand, the “optimal” curve with λopt captures the main trend of the target

function. In addition, the ASE is 0.008 for λopt and is 0.031 for λGCV. Note that, as

observed from Figure 4(b), the binned data are so noisy that it is hard to expect a data-driven method like GCV would perform well.

4.2 Monotone vs. Constraint-Free Smoothing Spline

Estima-tor

We are interested in knowing whether adding the monotone constraint is value-added in estimation of monotone functions. In other words, if the underlying function is monotonic, would a regular (unconstrained) smoother performs poorer than or as well as a constrained smoother? Also, how often a regular smoother would produce a non-monotone estimate when the true function is monotonic? To answer these questions,

(25)

we compare the proposed method with the method given in Gu (2002) by a simulation study.

Consider the Bernoulli example in Subsection 4.1.1. The true parameter function

under study is of the form p(x) = 1 − (1 − xα₎β_{, where α and β control the shape and}

the speed of going upward of the function. When α ≥ 1 and β ≥ 1, p(x) is increasing for all x ∈ [0, 1]. If β is much larger than α, the curve p(x) climbs up to 1 rapidly. On the other hand, if β is much smaller than α, the curve reaches 1 slowly. Thus, three settings are studied: (α, β) = (1.98, 28), (6.28, 17.67), and (6.9, 1.1) (in the order from fast to slow). Figure 1 shows these three functions. The effect of the sample size is also studied with n = 50, 100, and 200.

Let ˆpm(x) denote the estimate under the monotone constraint and ˆpg(x) denote the

unconstrainted estimate developed by Gu (2002). Define ASE(ˆp) = 1_nPn

i=1(ˆp(xi) − p(xi))2

and calculate ASE(ˆpm) and ASE(ˆpg) for the same data set. For each setting of (α, β)

and n, repeat the trial 10000 times. Table 1 displays the percentage of monotone ˆ

pg in 10000 trials. It is observed that Gu’s estimate (ˆpg) tends to produce more

non-monotone estimates for smaller n. Table 2 shows the proportion of ASE(ˆpg) ≥

ASE(ˆpm). We see that, for all n under study, when the true function rises slowly, ˆpm

performs better than ˆpg, while ˆpg performs better than ˆpm for functions rising rapidly.

And for all three target functions, the performance of ˆpm gets better when the sample

size n gets larger. For illustration, Figure 5 shows some cases with monotone ˆpg and

some with non-monotone ˆpg along with the corresponding ˆpm.

For distribution comparison, boxplots of the 10000 ASE(ˆpg) and the corresponding

10000 ASE(ˆpm) are displayed in Figures 6-8. The sample quartiles of these estimates

are given in Table 3. It seems there is no significant difference between the distributions

of ASE(ˆpg) and ASE(ˆpm). In summary, ˆpm performs as well as ˆpg in terms of ASE,

while ensuring monotonicity.

To see how bad (well) the unconstrained smoother can be when it produces a non-monotone (non-monotone) estimate, we generate 10000 cases of non-non-monotone (non-monotone)

ˆ

(26)

10000 ˆpg versus the corresponding 10000 ˆpm under the condition that ˆpg is monotonic

(in panel (a)) or not monotonic (in panel (b)). It is clear to see from these figures that

the constrained smoother outperforms the unconstrained one when ˆpg is not

mono-tonic, while performing as well as ˆpg when ˆpg is monotonic. In summary, constraining

monotonicity prevents the chance of poor estimation while attaining about the same performance for cases the unconstrained method performs well.

5 Examples

Return to the motivated WAT-EC example described above. In this section, we demon-strate the proposed method can be useful in setting appropriate EC limits to achieve better discrimination power and in the process of screening WAT test items for further engineering control.

For the first purpose, we generate responses Yi’s directly by checking if the

measure-ments are within the control limits. Suppose that 200 wafers are taken from some lots and measured the voltage at some testkeys on each wafer. Consider the relationship

between the Cp yield and the mean voltage µ(Cp,i) for the i-th wafer satisfying

µ(Cp,i) = 1 + 0.1 exp(−1.5 Cp,i3 ) Ti,

where Cp,i is the yield of the i-th wafer generated randomly from the beta distribution

Beta(8, 2) and Ti is a random variable with probability P (Ti = 1) = P (Ti = −1) = 0.5,

for i = 1 . . . , 200. Beta(8, 2) is chosen to mimic the reality because it is skewed to the

left. The function µ(Cp) indicates that the mean voltage approaches to the target

voltage level 1 as Cp increases to 1. Suppose the distribution of voltage for the i-th

wafer follows the normal distribution with mean µ(Cp,i) and standard deviation 0.1. 9

measurements are generated for each wafer and the mean voltage ¯viis computed. Figure

12 presents a histogram of these 200 mean voltages, for which the sample mean is 1.005 and sample standard deviation is 0.06. Suppose an engineer sets the upper (UCL) and

lower control limits (UCL) at LCL=0.88 and UCL=1.12, respectively. Define Yi = 1 if

¯

(27)

rate for Cp yield is estimated and displayed in Figure 13(a). The passing rate curve

seems to lack the discrimination power for Cp ≥ .6 while showing high discrimination

power for Cp in (0.4, 0.5). Such an EC performance curve is not useful for most of the

current processes in IC industries If this test item is critical in nature, the undesirable low discrimination power may be caused by the ad hoc choices of control limits. To demonstrate this, we reset the control limits to LCL=.93 and UCL=1.07. Then the EC performance curve displayed in Figure 13(b) indicates a fairly good discrimination power in the range of (.4, 1). A further reduction of the control limits to LCL=.97 and UCL=1.03 is too stringent, since the passing rate drops down to about 60% or below

even for very high Cp.

Moreover, consider the case that there is no significant relationship between the Cp

yield and voltage. Assume Cp,i ∼ Beta(8, 2) and mean voltage µi ∼ Beta(10, 10) + 0.5,

for i = 1, . . . , 200, and they are independent. Generate 200 ¯vi accordingly as described

above. For these 200 ¯vi, the sample mean is 0.99 and sample standard deviation is

0.10. Let UCL=1.1 and LCL=0.9. The corresponding passing rate presented in Figure

13(d) is flat for most of the Cp range. If we change the control limits, the resulting

passing rate changes only in the level but has a similar pattern. A test items with such kind of EC performance curve indicates the test results may be not so much related to

the Cp yield. Including this test item for EC may merely increase the number of false

alarms. Consequently, this test item should not be chosen for engineering control.

6 Conclusions

In this study, motivated by the WAT-EC problem, we develop a nonparametric mono-tone smoothing spline smoother for analyzing responses from exponential families by combining the methodologies provided in Gu (2002) and Zhang (2004) along with our modification. An algorithm with implementation details is provided. Computation is efficient because we utilize the characteristics of the natural cubic splines. The simu-lation results demonstrate that the proposed method performs well in the regression

(28)

models with both the Bernoulli and Poisson responses. When the “true” function is monotonic, the proposed monotone estimator performs about the same as the uncon-strained smoother in terms of the averaged squared error for the cases when the latter performs well. On the other hand, constrained smoother outperforms the unconstrained smoother when the unconstrained smoother produces non-monotone estimates. Thus, the choice is obvious. If the function is monotonic in nature, then we should choose the method with the monotone constraint imposed.

As an illustrative example, we demonstrate the proposed method can be used in screening test items for engineering control and in setting appropriate control limits.

With the nonparametric regression in nature, the proposed method has the great advantage of model flexibility. Also since the method can be applied to all kinds of data as long as they follow the exponential family, it can find many potential applications in areas such as industries, medicine, education, social studies, and so on.

(29)

A

Appendix: Proofs

Proposition 1.

η = ηa· 1n+ M g.

Proof. Define xi− xi−1= hi−1, for i = 2, . . . , n, and let h0 = x1− a. Since the function

g is linear in subinterval [a, x1], it shows that

Z x1

a

g(u)du = h0(g1 + g(a))

2 . (16)

By Green and Silverman (1994),

g0₁ = g2− g1 x2− x1 − 1 6(x2− x1)γ2 and g(x) = g1− (x1− x)g10 for x ≤ x1.

Then g(a) can be expressed in terms of g1, g2, and γ2 as

g(a) = (1 +h0 h1 )g1− h0 h1 g2+ 1 6h0h1γ2.

Substituting the above expression into (16), we get Z x1 a g(u)du = 1 2{h0(2 + h0 h1 )g1 − h02 h1 g2} − 1 24{−2h0 2 h1γ2}. (17)

According to the proposition in Zhang (2004), Z xi x1 g(x)dx = 1 2c T i g − 1 24d T i γ for 1 ≤ i ≤ n,

where g and ci are n × 1 vectors, γ and di are (n − 2) × 1 vectors. ci and di are

the same as that in Zhang (2004). Denoting k1 = (2 + h0/h1)h0, k2 = −h02/h1, and

k3 = −2h02h1, we have ηi = η(a) + Z xi a g(x)dx = η(a) + Z x1 a g(x)dx + Z xi x1 g(x)dx = η(a) + 1 2(k1g1 + k2g2+ c T i g) − 1 24(d T i γ + k3γ2) = η(a) + 1 2(ci+ k1· e n 1 + k2· en2) T g − 1 24(di+ k3· e n−2 1 ) T γ, (18) where ek

i is the k-dimensional unit vector with the i th element being 1 and 0 elsewhere.

(30)

Proposition 2. Mn,n is invertible for all n ≥ 3.

Proof. Let Gn,n = 1₂Cn,nand Hn,n−2 = ₂₄1Dn,n−2. Then Mn,n = Gn,n−Hn,n−2Rn−2,n−2−1 QTn−2,n.

It suffices to show that det(Mn,n) 6= 0. Define the (2n − 2) × (2n − 2) matrix N2n−2 by

N2n−2 = " Gn,n Hn,n−2 QT_n−2,n Rn−2,n−2 # . By the properties of block matrix decomposition,

" Gn,n Hn,n−2 QT n−2,n Rn−2,n−2 # = " I HR−1 O I # " Mn,n O O R # " I O R−1QT _I # .

Since the matrix R is invertible, we then know that det(Mn,n) ∝ det(N2n−2). Thus, we

only need to prove that det(N2n−2) 6= 0 for all n ≥ 3. More specifically, we simplify

the matrix N by some elementary matrix operations into the matrix N0

det(N2n−2) ∝ det(N2n−20 ) = G0_n,n H_n,n−20 Q0T n−2,n R 0 n−2,n−2 , (19)

where G0_n,n is the matrix with entries gi,j given by gi,i = gi,i+1 = h−1i−1 for 2 ≤ i ≤ n,

g1,1 = k1/2, g1,2 = k2/2, and gi,j = 0 elsewhere; the matrix Hn,n−20 has elements

νi,j, for 2 ≤ i ≤ (n − 1), νi,i−2 = νi,i−1 = h₁₂i−1, ν1,1 =

−h2 0h1 12 , ν2,1 = h1 12,νn,n−2 = hn−1 12 , and 0 elsewhere; Q 0_T

n−2,n is the (n − 2) × n matrix with elements qi,j, where

qi,i+1 = −2h−1i − 2h −1

i+1 for 1 ≤ i ≤ (n − 2) and 0 elsewhere; the matrix R

0

n−2,n−2 is

a symmetric band matrix with entries ri,i, ri,i = 1₄(hi + hi+1) for 1 ≤ i ≤ (n − 2),

ri,i+1 = ri+1,i = ₁₂1hi+1 for 1 ≤ i ≤ (n − 3), and 0 otherwise. When n = 2, N20 is a

2 × 2 matrix, where the first row is (k1/2, k2/2) and the second row is (h−11 , h

−1

1 ). It is

clearly that det(N₂0) > 0 since k1 > k2.

According to Lemma 1 stated below, for all n ≥ 3, the determinant of N_2n−20 can

be expressed in terms of N_2(n−1)−20 by

det(N_2n−20 ) = 1

4det(N

0

2(n−1)−2) + Kn,

where Knis a positive number. Since det(N20) > 0, it follows clearly that det(N

0

2n−2) > 0

(31)

For illustration, Appendix B gives N_2n−20 for n = 4 and 5 and expresses N_2∗5−20 in

terms of N_2∗4−20 .

Lemma 1. For the square matrix N_2n−20 defined by (19), for n ≥ 3,

det(N_2n−20 ) = 1

4det(N

0

2(n−1)−2) + Kn, (20)

for some Kn> 0.

Proof. We will show the equality (20) by induction. Step 1:

When n = 2, it is clear that det(N₂0) = h0 (h0+ h1)/h12 > 0.

When n = 3, det(N₄0) = h0(3h0(h1+ h2) + h1(2h1+ 3h2)) 12h2 1h22 = h0 (3 h0+ 3 h1) 12 h12 +h0 3 h0h12+ 2 h13 12 h12h22 + h0 6 h0h1+ 5 h12 12 h12h2 = 1 4det(N 0 2) + K3, where K3 = h0(3 h0h12+2 h13) 12 h12h22 + h0(6 h0h1+5 h12)

12 h12h2 > 0. Thus (20) holds for n = 3.

When n = 4, we calculate the determinant of N_2∗4−20 . It follows that

det(N₆0) = 1 4det(N 0 4) + K4, where K4 = h0 9 h0h12h22+ 6 h13h22+ 12 h0h1h23+ 10 h12h23+ 4 h0h24+ 4 h1h24 144 h12h22h32 + h0 18 h0h12h2+ 12 h13h2 + 30 h0h1h22+ 25 h12h22+ 12 h0h23+ 12 h1h23 144 h12h22h3 .

It is clear that (20) holds since h0, h1, h2, h3 > 0.

Step 2:

We would like to prove (20) for general n by induction. For a fixed k > 5, assume

(32)

When n = k, consider the matrix form for N_2k−20 N_2k−20 = " G0_k,k H_k,k−20 Q0T k−2,k R 0 k−2,k−2 # .

By the structures of G0,H0,Q0, and R0, we can write the determinant of N_2k−20 in terms

of N_2(k−1)−20 . More specifically, we interchange some column vectors of N_2k−20 , such that the i-th column vector becomes the (i−1)-th column vector for i = k+1, . . . , 2k−1, and the k-th column vector becomes the (2k − 1)-th column vector by some permutation operations. Do the same thing with the row vectors. We then obtain

det(N_2k−20 ) = N_2k−40 U2k−4,2 V2,2k−4 T2,2 (21)

for some matrices U2k−4,2, V2,2k−4, and T2,2. The matrix in (21) can be transformed into

a lower triangle with some elementary matrix operations. More specifically, the block matrix in (21) can be written as

" N_2k−40 U2k−4,2 V2,2k−4 T2,2 # = " N_2k−40 O2k−4,2 V2,2k−4 T2,20 # · " I −N_2k−40−1 U2k−4,2 O I # .

We then obtain det(N_2k−20 ) = det(N_2k−40 )·det(T_2,20 ). The matrix T_2,20 is an upper triangle

matrix, where T_2,20 [1, 1] = h−1_k−1 and

T_2,20 [2, 2] = hk−1+ hk−2 4 − h2 k−2 12 N 0₋₁ 2k−4[2k − 4, 2k − 4] + (2h−1_k−2+ 2h−1_k−1) ( h2 k−2 12 + hk−2 12 N 0₋₁ 2k−4[k − 1, 2k − 4] ) . (22)

By induction hypothesis, det(N_2n−20 ) = 1

4det(N

0

2(n−1)−2) + Kn holds for n = k − 1 and

n = k − 2, and det(N_2(k−3)−20 ) > 0, we get N_2k−40−1 [2k − 4, 2k − 4] < 4/hk−2 by Lemma 2,

and N_2k−40−1 [k − 1, 2k − 4] > −hk−2/3 by Lemma 3. It follows that

T_2,20 [2, 2] > hk−1+ hk−2 4 − h2 k−2 12 4 hk−2 + (2h−1_k−2+ 2h−1_k−1) ( h2 k−2 12 − hk−2 12 hk−2 3 ) > h 2 k−2 18 (2h −1 k−2+ 2h −1 k−1) − hk−2 3 + hk−1+ hk−2 4 = hk−2 36 + h2 k−2 9hk−1 + hk−1 4 > hk−1 4 . Then det(N_2k−20 ) = T_2,20 [2, 2] 1 hk−1 det(N_2k−40 ) = hk−1 4 + T 0 2,2[2, 2] − hk−1 4 !! 1 hk−1 det(N_2k−40 ).

(33)

Define Kk = T_2,20 [2, 2] − hk−1 4 _det(N0 2k−4) hk−1 . Since T 0 2,2[2, 2] > hk−1 4 and det(N2k−4) > 0, then det(N_2k−20 ) = 1 4det(N 0 2k−4) + Kk > 0 as required.

Lemma 2. If det(N_2(k−1)−20 ) = 1₄det(N_2(k−2)−20 ) + Kk−1 is satisfied, then

N_2k−40−1 [2k − 4, 2k − 4] < 4

hk−2

for k ≥ 4.

Proof. It is known that the elements of the inverse of N2k−4 can be expressed in terms

of its cofactor matrix.

N_2k−40−1 [2k − 4, 2k − 4] = det(cof actor(N

0

2k−4[2k − 4, 2k − 4]))

det(N_2k−40 ) .

By the definition of N_2k−40 , it is easy to see that

det(cof actor(N_2k−40 [2k − 4, 2k − 4])) = 1

hk−2

det(N_2(k−2)−20 ).

Since det(N_2(k−1)−20 ) = 1₄det(N_2(k−2)−20 ) + Kk−1 is satisfied by induction hypothesis, we

obtain N_2k−40−1 [2k − 4, 2k − 4] = 1 hk−2det(N 0 2(k−2)−2) 1 4det(N 0 2(k−2)−2) + Kk−1 = 4 hk−2+ Y < 4 hk−2 as required, since Y = 4∗Kk−1 (hk−2det(N_2k−40 )) > 0.

Lemma 3. If det(N_2n−20 ) = 1₄det(N_2(n−1)−20 ) + Kn is satisfied for n = k − 1, n = k − 2,

and det(N_2n−20 ) > 0 for n = k − 3, then

N_2k−40−1 [k − 1, 2k − 4] > −hk−2

3 for k ≥ 5.

Proof. Since N0N0−1 = I, it is obvious that N0[k − 1, ·]N0−1[·, 2k − 4] = 0, where

N0[k − 1, ·] is the (k − 1)-th row of N0 and N0−1[·, 2k − 4] is the (2k − 4)-th column of

N0−1. More specifically, it can be shown that

N_2k−40−1 [k − 2, 2k − 4] + N_2k−40−1 [k − 1, 2k − 4] = −h 2 k−2 12 N 0₋₁ 2k−4[2k − 4, 2k − 4].

(34)

Consider the cofactor of N_2k−40 [k − 2, 2k − 4]. By the structure of N_2k−40 , we obtain

N_2k−40−1 [k − 2, 2k − 4] ∝ det(cof ator(N_2k−40 [k − 2, 2k − 4])) = − hk−3

12hk−2

det(A), where A is a (2k − 6) × (2k − 6) matrix and all elements in A are the same as the

elements in N_2k−60 except A[2k − 6, k − 3] and A[2k − 6, 2k − 6], where A[2k − 6, k − 3] =

−2h−1_k−2−3h−1_k−3 < N_2k−60 [2k−6, k−3] and A[2k−6, 2k−6] = hk−3

4 + hk−2

6 . Similarly, using

elementary matrix operations, the det(N_2k−60 ) and det(A) can be formed by N_2(k−3)−20 .

We obtain that det(N_2k−60 ) = N_2k−80 O2k−8,2 V_2,2k−80 X2,2 and det(A) = N_2k−80 O2k−8,2 V_2,2k−80 Y2,2 ,

where both of X and Y are upper triangle matrices with X[1, 1] = Y [1, 1] = h−1_k−3, and

X[2, 2] = hk−2 4 + hk−3 4 − c − N 0 2k−6[2k − 6, k − 3]d, Y [2, 2] = hk−2 6 + hk−3 4 − c − A[2k − 6, k − 3]d, where c = h 2 k−3 12 N 0₋₁ 2k−8[2k − 8, 2k − 8] and d = h2 k−3 12 + hk−3 12 N 0₋₁ 2k−8[k − 4, 2k − 8]. Since

det(N_2k−60 ) = det(N_2k−80 ) · X[1, 1] · X[2, 2] = 1₄det(N_2k−80 ) + Kk−2 > 0 is satisfied, we

find that Kk−2= h−1k−3 hk−3 4 − c − N 0 2k−6[2k − 6, k − 3]d ! > 0. It follows that hk−3 4 − c − A[2k − 6, k − 3]d > hk−3 4 − c − N 0 2k−6[2k − 6, k − 3]d > 0.

Then det(A) = det(N_2k−80 ) · Y [1, 1] · Y [2, 2] > 0 and N_2k−40−1 [k − 2, 2k − 4] < 0. Since

N_2k−40−1 [2k − 4, 2k − 4] < 4

hk−2 is satisfied by Lemma 2, we obtain the the inequality

N_2k−40−1 [k − 1, 2k − 4] > −hk−2

(35)

B

Appendix: The Construction of N

_2n−20

in

Propo-sition 2

When n = 4, we find N_2∗4−20 =              h0 2+h0 h1 2 −h02 2 h1 0 0 −₍h02h1) 12 0 1 h1 1 h1 0 0 h1 12 0 0 _h1 2 1 h2 0 h2 12 h2 12 0 0 _h1 3 1 h3 0 h3 12 0 −2_h 1 − 2 h2 0 0 h1+h2 4 h2 12 0 0 −2_h 2 − 2 h3 0 h2 12 h2+h3 4              . If n = 5, then N_2∗5−20 =                   h0 2+h0_h1 2 −h02 2 h1 0 0 0 −₍h02h1) 12 0 0 1 h1 1 h1 0 0 0 h1 12 0 0 0 _h1 2 1 h2 0 0 h2 12 h2 12 0 0 0 _h1 3 1 h3 0 0 h3 12 h3 12 0 0 0 _h1 4 1 h4 0 0 h4 12 0 −2_h 1 − 2 h2 0 0 0 h1+h2 4 h2 12 0 0 0 −2_h 2 − 2 h3 0 0 h2 12 h2+h3 4 h3 12 0 0 0 −2_h 3 − 2 h4 0 0 h3 12 h3+h4 4                   .

By some permutation operations, det(N_2∗5−20 ) can be expressed in terms of det(N_2∗4−20 ).

That is, det(N_2∗5−20 ) = h0 2+h0_h1 2 −h02 2 h1 0 0 −₍h02h1) 12 0 0 0 1 h1 1 h1 0 0 h1 12 0 0 0 0 _h1 2 1 h2 0 h2 12 h2 12 0 0 0 0 1 h3 1 h3 0 h3 12 0 h3 12 0 −2_h 1 − 2 h2 0 0 h1+h2 4 h2 12 0 0 0 0 −2_h 2 − 2 h3 0 h2 12 h2+h3 4 0 h3 12 0 0 0 _h1 4 0 0 1 h4 h4 12 0 0 0 −2_h 3 − 2 h4 0 h3 12 0 h3+h4 4 = N_2∗4−20 U6,2 V2,6 T2,2 .

(36)

References

[1] Burden, R. L. and Faires, J. D. (2001). Numerical Analysis, Brooks/Cole, Aus-tralia.

[2] Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerical Mathematik, 31, 377-403.

[3] de Boor, C. (2001). A Practial Guide to Splines: with 32 figures, Rev. ed., Springer, New York.

[4] Eubank, R. L.(1990). Nonparametric Regression and Spline Smoothing, 2nd ed., Marcel Dekker, New York.

[5] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Chapman and Hall, London.

[6] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and General-ized Linear Models: a Roughness Penalty Approach, Chapman and Hall, London. [7] Gu, C. (2002). Smoothing Spline ANOVA Models, Springer, New York.

[8] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed., Springer, New York.

[9] Rossi, N., Wang, X., and Ramsay, J. O. (2002). Nonparametric item response function estimates with the EM algorithm, Journal of Educational and Behavioral Statistics, 27, 291-317.

[10] Wahba, G. (1991). Spline Models for Observational Data, Philadelphia, Pennsyl-vania/SIAM.

[11] Wang, Z. (2000). An algorithm for generalized monotonic smoothing, Journal of Applied Statistics, 27, 495-507.

(37)

[12] Zhang, J. T. (2004). A simple and efficient monotone smoother using smoothing splines, Nonparametric Statistics, 16(5), 779-796.

(38)

Table 1: The proportion of Monotony in 10000 repeats n = 50 n = 100 n = 200 α = 1.98 β = 28 77.89% 91.38% 96.2% ˆ pg α = 6.28 β = 17.67 79.43% 93.46% 97.27% a=6.9 β = 1.1 85.54% 93.36% 94.67%

Table 2: Proportion of ASE( ˆpg)

ASE( ˆpm) ≥ 1 n = 50 n = 100 n = 200 α = 1.98 β = 28 39.91% 43.76% 48.34% α = 6.28 β = 17.67 49.27% 52.5% 56.12% α = 6.9 β = 1.1 56.3% 61.45% 66.47%

Table 3: The distribution summary for ASE(ˆpg) and ASE(ˆpm), where Q1 is the first

quartile , Q3 is the third quartile. The unit is 10−3.

Q1 Median Mean Q3 ˆ pg 2.3 4.9 7.0 9.3 n = 50 ˆ pm 2.3 4.9 6.8 9.1 ˆ pg 1.6 2.8 4 5.1 α = 1.98 β = 28 n = 100 pˆm 1.5 2.7 3.7 4.9 ˆ pg 1.0 1.7 2.2 2.8 n = 200 ˆ pm 0.83 1.5 2.0 2.6 ˆ pg 1.2 2.9 4.6 6.1 n = 50 ˆ pm 1.2 2.9 4.6 6.1 ˆ pg 0.71 1.7 2.7 3.6 α = 6.28 β = 17.67 n = 100 pˆm 0.7 1.6 2.5 3.4 ˆ pg 0.52 1.08 1.62 2.14 n = 200 ˆ pm 0.44 0.95 1.39 1.87 ˆ pg 1.4 3.1 4.8 6.3 n = 50 ˆ pm 1.8 3.2 5.6 6.9 ˆ pg 0.91 2.0 2.9 3.8 α = 6.9 β = 1.1 n = 100 pˆm 1.1 2.2 3.0 4.0 ˆ pg 0.54 1.1 1.6 2.1 n = 200 ˆ pm 0.59 1.2 1.6 1.2

(39)

Table 4: For sample size n = 100 and 10000 monotone ˆpg, the Distribution

summary of ASE(ˆpg) and ASE(ˆpm), where the Q1 is the first quartile ,

Q3 is the third quartile. The unit is 10−3.

Q1 Median Mean Q3 ˆ pg 0.7 1.6 2.5 3.3 α = 1.98 β = 28 pˆm 0.8 1.7 2.6 3.5 ˆ pg 0.8 1.8 2.8 3.7 α = 6.28 β = 17.67 pˆm 0.7 1.7 2.6 3.5 ˆ pg 1.3 2.5 3.4 4.4 α = 6.9 β = 1.1 pˆm 1.1 2.2 3.1 4.1

Table 5: For sample size n = 100 and 10000 non-monotone ˆpg, the

dis-tribution summary of ASE(ˆpg) and ASE(ˆpm), where the Q1 is the first

quartile , Q3 is the third quartile. The unit is 10−3.

Q1 Median Mean Q3 ˆ pg 5.4 8.1 9.2 11.5 α = 1.98 β = 28 pˆm 0.8 1.8 2.7 3.6 ˆ pg 5.0 7.4 8.4 10.5 α = 6.28 β = 17.67 pˆm 0.8 1.8 2.8 3.7 ˆ pg 4.1 6.4 7.5 9.7 α = 6.9 b=1.1 pˆm 1.1 2.3 3.3 4.3

(40)

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Cp P (( Cp ))

Figure 1: Examples of three EC performance curves.

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●● ● ●●●●●●●●●●●● ● ●●●●●● ● ●●●● ● ● ●●●●●●●●● ●● ● ●● ● ● ● ● ●●● ● ●● ●● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ●●●● ● ●● ● ● ●● ● ●●●● ●●●● ● ● ●●●●●●●●●●●●●● Bernoulli (a) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Bernoulli (b)

Figure 2: Bernoulli data. The solid line is the target function p(x) = 1 − (1 − x4.5)2.5.

The dotted line is the estimated curve with monotone constraint. Dots in the left panel

(41)

1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 x λλ (( x )) ●● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ●●● ● ● ● ●● ●●●● ● ● ●● ●●● ●● ● ● ● ● ● ●● ●●●●● ●●● ● ● ● ● ●● ● ● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ● ● ●● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● Poisson (a) 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 x λλ (( x )) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Poisson (b)

Figure 3: Poisson data. The solid line is the target function φ(x) = log(x2_{+ 1). Dotted}

line is the estimated curve by GCV method. Dash-dot line is the estimated curve with

smoothing parameter 1.5. Dots in the left panel are points (xi, yi), i = 1, . . . , 200. Dots

in the right panel are binned data.

1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 x λλ (( x )) ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●●●● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ●● ● ● ●● ●● ● ● ●● ●●● ● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● Poisson (a) 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 x λλ (( x )) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Poisson (b)

Figure 4: Poisson data. The solid line is the target function φ(x) = 1

e−e

−x_+x−sin(π x) π .

The dotted line is the estimated curve by GCV method. Dash-dot line is the estimated

curve with smoothing parameter 2.5 × 10−5. Dots in the left panel are points (xi, yi),

(42)

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ●●●●●●●●●●● ●● ● ●●●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ●●● ● ●● ●●●●●●● ●● ●●●● ●●●●● ●● ●●●●●●●● ● ● ●●●● ●●● ● ●●● ●●●● ●●● ●●● ●● ●●●●●●● Target Gu Mono (a) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ●●● ●●●●● ●●●●● ● ● ● ●● ● ●●● ● ●●●● ●● ●●●●●●●● ●●●●●● ●●● ● ●●●●● ●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●● Target Gu Mono (b) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ● ●●● ●● ● ●●●●●●●● ●●●●● ●●●●● ● ●●●●●●●●●●●● ●● ●●●● ●●●● ● ● ●●● ●●● ● ● ●● ● ● ● ● ● ●● ●●●●●●● ●●● ● ●●●● ●●●●●●●●●●●●●● ●●●● Target Gu Mono (c) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ●●●● ●●●●●●● ● ●●●● ●●●●● ●●●● ●●●● ●●● ●● ●●●●●●●●●●● ●●●● ●●● ● ● ● ●●●● ● ●● ●● ● ●● ● ●●●●●●●● ●●●●●● ● ● ●●●●●●●●●●●●●●●● Target Gu Mono (d) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ● ●●●● ●●●● ●●●●●●● ●●●●●●● ● ●●● ●●●●●●●● ● ●●● ●●●●●● ● ●●● ●●●●●●●●●●●● ●●●●●●●●● ● ●●● ● ●●●●● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ●●●● Target Gu Mono (e) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 x p(x) ●●●●●●●●● ●●●●●●● ●● ● ●●● ●●●●●●●●●●●●●● ●●●●● ●●●●●●●●● ●● ● ●●●●●●●●●●●●●●●●● ●●●● ● ● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ●●● ●●● Target Gu Mono (f)

Figure 5: Illustrative examples. The solid line is the target function p(x) = 1 −

(1 − xα)β. For panels from top to bottom, (α, β) are (1.98,28), (6.28,17.67), and

(6.9,1.1), respectively. The dashed line is the estimated curve ˆpg by Gu(2002). The

dash-dot line is the estimated curve ˆpm with monotone constraint. The dots are data

points. The left three panels shows the cases with monotone ˆpg, while ˆpg in the right

單調無母數迴歸在廣義線性模型上之研究

國 立 交 通 大 學

統計學研究所

碩 士 論 文

單調無母數迴歸在廣義線性模型上之應用

Nonparametric Monotone Regression for

Generalized Linear Models

研 究 生 ： 文誠智

指導教授 ： 洪志真 博士

單調無母數迴歸在廣義線性模型上之應用

Nonparametric Monotone Regression for

Generalized Linear Models

研 究 生：文誠智

Student

：Cheng-Chih Wen

指導教授：洪志真

Advisor

：Dr. Jyh-Jen Horng Shiau

國 立 交 通 大 學

統計學研究所

碩 士 論 文

A Thesis

Submitted to Institute of Statistics

College of Science

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master

in

Statistics

June 2007

Hsinchu, Taiwan, Republic of China

單調無母數迴歸在廣義線性模型上之研究

研究生：文誠智 指導教授：洪志真 博士

國立交通大學統計學研究所

摘要

本篇文章裡，為解決WAT(Wafer Acceptance Test)-EC(Engineering

Control) 的 問 題 ， 我 們 發 展 了 單 調 無 母 數 迴 歸 。 藉 由 Gu(2002) ，

Zhang(2004)所提出的方法加以結合及修正，將反應變數拓展至整個指

數族上，與此相關的演算法也會在本文中提出。我們利用Natural Cubic

Splines的性質發展出有效率的計算法，並用模擬資料來探討其效率。

當反應變數為Bernoulli或Poisson分部時，其模擬的結果都有不錯的

表現。此外，在“真實函數＂具有單調性的情形下，有單調限制估計

量之ASE(Averages Square Error)與無單調限制並沒有明顯差異。然

而，當無單調限制之估計量呈現出非單調時，則單調限制估計量在ASE

上之表現會明顯優於前者。最後，我們將說明如何利用此方法來篩選

EC中的WAT測試項目並且建立適當的管制上下限。

Nonparametric Monotone Regression for

Generalized Linear Models

Student: Cheng-Chih Wen

Advisor: Dr. Jyh-Jen Horng Shiau

Institute of Statistic

National Chiao Tung University

誌 謝

陳鄰安教授曾經說過：「數學可以因數學而數學，統計不能因統計而統計」，我

用了兩年的碩士生涯才體會到這兩句話的涵義。不僅如此，除了在統計方面學到許多

新知識，也認識很多好朋友。對於此篇論文的完成，我要感謝我的指導老師 洪志真

教授仔細認真的批閱，以及教導我做學問應有的態度，並耐心回答我在論文上所遇到

的問題，使我順利完成論文。也要感謝我的研究所同學們俊睿、益銘、花花、雪芳、

永在和建威，給我鼓勵與歡笑，帶給我美好的碩士生回憶。

最後要感謝博士班的碩慧學姐，除了論文上的協助外，也時常不吝惜分享其人生

經驗與價值觀，讓我了解自己欠缺的知識，並在我人生的規劃中給予適時的建議，受

益良多。

此外還要感謝家人的支持，以及女朋友的體諒，使我可以毫無顧慮地專心完成論

文。僅將此論文獻給我的老師、學姐、家人以及朋友們

文 誠 智 謹誌于

國立交通大學統計研究所

中華民國九十六年六月

Contents

1

Introduction

2

Literature Review

2.1

Smoothing Splines - Natural Cubic Splines

2.2

Monotone Smoothing Splines

2.3

Regression with Responses from Exponential Families

3

國立交通大學

碩士論文

研究生：文誠智

指導教授：洪志真博士

研究生：文誠智

國立交通大學

碩士論文

研究生：文誠智指導教授：洪志真博士

Control) 的問題，我們發展了單調無母數迴歸。藉由 Gu(2002) ，

誌謝

新知識，也認識很多好朋友。對於此篇論文的完成，我要感謝我的指導老師洪志真

文誠智謹誌于