CHAPTER I: INTRODUCTION TO BANKRUPTCY PREDICTION METHODS
1.4 The SLM
In this section, the formulation of the SLM using the prospective sample and that using the case-control sample will be given. The SLM is defined similarly to the LLM (1) by replacing the linear relationship α + β x of the continuous predictor X in the logit function of the LLM with an unknown function H(x).
Given the prospective sample (Yi, xi, zi), i = 1, · · ·, n, the SLM is defined by assuming the bankruptcy probability for the company with the predictor values (X, Z) = (x, z) to be
\[ p(Y = 1 \mid X = x, Z = z) = \frac{\exp\{H(x) + \theta z\}}{1 + \exp\{H(x) + \theta z\}}, \tag{4} \]
or, written in the form of the logit function of the bankruptcy probability,
\[ \mathrm{logit}\{p(Y = 1 \mid X = x, Z = z)\} = \log\left\{ \frac{p(Y = 1 \mid X = x, Z = z)}{1 - p(Y = 1 \mid X = x, Z = z)} \right\} = H(x) + \theta z. \]
Here we assume only that H(x) is a smooth function of the value x of the continuous predictor X; otherwise it is left unspecified. Also, θ is a 1×q vector of logistic parameters, as it is in the LLM (1). Clearly, this is a very flexible prediction model. For the company with predictor values (x0, z0), the predicted probability of bankruptcy is thus defined as
\[ \hat{p}(Y = 1 \mid X = x_0, Z = z_0) = \frac{\exp\{\hat{H}(x_0) + \hat{\theta} z_0\}}{1 + \exp\{\hat{H}(x_0) + \hat{\theta} z_0\}}, \tag{5} \]
the logistic distribution function evaluated at the predictive score Ĥ(x0) + θ̂ z0. Here Ĥ(x0) and θ̂ are estimates derived by applying the local likelihood method to the prospective sample from the SLM (4).
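As a minimal illustration of how (4) and (5) turn a predictive score into a probability, the following sketch evaluates the logistic function at H(x) + θ z and checks that the logit recovers the score. The function names and the numerical values of H(x), θ, and z are hypothetical and chosen only for the example.

```python
import math

def logistic(s):
    """Numerically stable evaluation of exp(s) / (1 + exp(s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def slm_probability(h_x, theta, z):
    """Bankruptcy probability as in (4): logistic of the score H(x) + theta z.

    h_x   : value of H at the continuous predictor (assumed known here)
    theta : sequence of q logistic parameters
    z     : sequence of q values of the remaining predictors
    """
    score = h_x + sum(t * zj for t, zj in zip(theta, z))
    return logistic(score)

# Hypothetical values: H(x) = -1.2, theta = (0.5, -0.3), z = (1, 2),
# so the predictive score is -1.2 + 0.5 - 0.6 = -1.3.
p = slm_probability(h_x=-1.2, theta=[0.5, -0.3], z=[1.0, 2.0])
print(p, logit(p))
```

In practice H(x0) and θ are unknown, and the same computation is applied to their estimates Ĥ(x0) and θ̂.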
The local likelihood approach for producing Ĥ(x0) and θ̂ in (5) is now introduced. This approach is composed of three steps. In the first step, an initial local likelihood estimate Ĥ1(x0) of H(x0) is generated. There exist many methods for estimating H(x0). One of the conceptually simplest is the local likelihood method; see Tibshirani and Hastie (1987). This method first chooses a positive scalar constant bθ, called the bandwidth, and defines a neighborhood of x0 as
\[ N(x_0; b_\theta) = \{ t = (t_1, \ldots, t_d)^T : |t_j - x_{0j}| \le b_\theta, \text{ for } j = 1, \ldots, d \}, \]
where x0 = (x01, · · ·, x0d)T. The idea of the local likelihood method is then to combine two concepts: the weighted likelihood method applied to the partial sample
\[ S(x_0; b_\theta) = \{ (Y_i, x_i, z_i) : x_i \in N(x_0; b_\theta), \text{ for } i = 1, \ldots, n \}, \]
and the first order Taylor approximation
\[ H(x_i) \approx H(x_0) + H^{(1)}(x_0)^T (x_i - x_0) \equiv \alpha + \beta (x_i - x_0), \]
for each xi ∈ N(x0; bθ). The larger the value of bθ, the more data points S(x0; bθ) contains. The parameters α and β are a scalar and a 1×d vector, respectively, as they are in the LLM, but they now stand for the unknown quantities H(x0) and H(1)(x0)T, respectively, where H(1)(x0) is the d×1 vector of first partial derivatives of H evaluated at x0.
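The construction of the neighborhood N(x0; bθ) and the partial sample S(x0; bθ) can be sketched as follows. The toy observations, the point x0, and the two bandwidth values are hypothetical; the sketch only illustrates that the partial sample grows with the bandwidth.

```python
def in_neighborhood(x, x0, b):
    """True if every coordinate of x lies within b of the matching coordinate of x0."""
    return all(abs(xj - x0j) <= b for xj, x0j in zip(x, x0))

def local_sample(sample, x0, b):
    """Partial sample S(x0; b): the observations (Y, x, z) whose x lies in N(x0; b)."""
    return [(y, x, z) for (y, x, z) in sample if in_neighborhood(x, x0, b)]

# Hypothetical observations (Y_i, x_i, z_i) with d = 2 and q = 1.
sample = [
    (0, (0.10, 0.20), (1.0,)),
    (1, (0.50, 0.40), (0.0,)),
    (0, (0.90, 0.90), (1.0,)),
    (1, (0.45, 0.55), (1.0,)),
]
x0 = (0.5, 0.5)

small = local_sample(sample, x0, b=0.15)  # narrow neighborhood
large = local_sample(sample, x0, b=0.45)  # wide neighborhood
print(len(small), len(large))
```

With these values the narrow neighborhood keeps only the two observations closest to x0, while the wide one keeps all four.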
Specifically, to produce Ĥ1(x0), the bankruptcy probability model suggested by the above arguments,
\[ p(Y = 1 \mid X = x, Z = z) = \frac{\exp\{\alpha + \beta (x - x_0) + \theta z\}}{1 + \exp\{\alpha + \beta (x - x_0) + \theta z\}}, \tag{6} \]
is imposed on the prospective sample (Yi, xi, zi), i = 1, · · ·, n, from the SLM (4) with xi ∈ N(x0; bθ). Given the value of bθ and the resulting bankruptcy probability model (6) for these observations, the local
log-likelihood function of η = (α, β, θ)T is defined as the log-likelihood of model (6) in which the contribution of each observation is weighted by a kernel function K(·). The kernel is used to compute the weight assigned to the data and is usually taken to be a symmetric and unimodal probability density function over [−1, 1]. Hence it gives positive weight to the data inside the neighborhood sample S(x0; bθ) and weight 0 to the data outside. Larger weights are given to data points with X values closer to x0 and smaller weights to those with X values farther from x0. Results in the literature show, however, that the choice of the density function K(·) is not very important in local fitting. A popular choice of K(·) is the Epanechnikov kernel, defined as
\[ K(u) = \tfrac{3}{4} (1 - u^2)\, I(|u| \le 1); \]
see Wand and Jones (1995). It is popular because of its computational convenience and its optimal performance (for example, it minimizes the mean squared error among all nonnegative kernel functions).
Maximizing the local log-likelihood with respect to η leads to the normal equations (7), and the resulting estimate α̂ of α is taken as the initial local likelihood estimate Ĥ1(x0) of H(x0). By the same arguments used for the consistency of the maximum likelihood estimate α̂ derived by (3) for the LLM (1), Ĥ1(x0) is a consistent estimate of H(x0). For this fact, see also Fan, Heckman, and Wand (1995).
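The role of the kernel can be illustrated with a short sketch of the Epanechnikov weights. The way the weight of a d-dimensional observation is formed here, as a product of one-dimensional kernels evaluated at (xij − x0j)/b, is an assumption made for the illustration; it is one common choice that is consistent with the neighborhood N(x0; bθ), not necessarily the exact weighting used in the text. The points and the bandwidth are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = (3/4)(1 - u^2) for |u| <= 1, and 0 otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_weight(x, x0, b):
    """Assumed product-kernel weight of an observation x relative to x0.

    The weight is positive only if every coordinate of x is within b of x0,
    that is, only if x lies inside the neighborhood N(x0; b).
    """
    w = 1.0
    for xj, x0j in zip(x, x0):
        w *= epanechnikov((xj - x0j) / b)
    return w

x0 = (0.5, 0.5)
b = 0.2
w_center = kernel_weight((0.5, 0.5), x0, b)  # at x0 itself: largest weight
w_near = kernel_weight((0.6, 0.5), x0, b)    # inside the neighborhood
w_out = kernel_weight((0.8, 0.5), x0, b)     # outside the neighborhood: weight 0
print(w_center, w_near, w_out)
```

The weights decrease as the observation moves away from x0 and vanish outside the neighborhood, which is the behavior described above.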
Note that the concept of local inference is well established in regression analysis; see also Wand and Jones (1995). Two major strategies are used in the local likelihood approach: a linear approximation (the first order Taylor approximation) for each H(xi) with xi ∈ N(x0; bθ), and the use of the partial (local) sample S(x0; bθ) to derive the maximum local likelihood estimates. The method is directly analogous to the LLM, except that the fitting is done locally.
In the second step, the estimate θ̂ required in (5) is generated by applying a simple logistic regression analysis. To estimate θ, we replace the unknown quantity H(xi) in the SLM (4) with its initial local likelihood estimate Ĥ1(xi), for each i = 1, · · ·, n, fit the bankruptcy probability by the resulting model
\[ p(Y = 1 \mid X = x_i, Z = z_i) = \frac{\exp\{\alpha_0 + \hat{H}_1(x_i) + \theta z_i\}}{1 + \exp\{\alpha_0 + \hat{H}_1(x_i) + \theta z_i\}}, \tag{8} \]
and use the prospective sample from the SLM (4) to maximize the corresponding pseudo profile log-likelihood function with respect to φ = (α0, θ)T. Here α0 is a normalizing constant which makes the bankruptcy probability function (8) integrate to 1.
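Specifically, using the bankruptcy probability model (8) and the prospective sample from the SLM (4), the pseudo profile log-likelihood function of φ = (α0, θ)T is
\[ \hat{\ell}_{SLM}(\phi) = \sum_{i=1}^{n} \left[ Y_i \{\alpha_0 + \hat{H}_1(x_i) + \theta z_i\} - \log\left( 1 + \exp\{\alpha_0 + \hat{H}_1(x_i) + \theta z_i\} \right) \right]. \]
Maximizing this function with respect to φ, through the corresponding normal equations (9), gives the required estimate θ̂ of θ in (5). By the results in Hosmer and Lemeshow (1989), the consistency of θ̂ for θ can be seen.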
Finally, in the third step, the local likelihood estimate Ĥ(x0) required in (5) is produced. To produce the value of Ĥ(x0), we follow the same arguments as in the first step: replace the unknown quantity θ with θ̂ obtained in the second step, use a bandwidth bH, fit the bankruptcy probability by the resulting model
\[ p(Y = 1 \mid X = x_i, Z = z_i) = \frac{\exp\{\alpha^* + \beta (x_i - x_0) + \hat{\theta} z_i\}}{1 + \exp\{\alpha^* + \beta (x_i - x_0) + \hat{\theta} z_i\}} \tag{10} \]
for the prospective sample from the SLM (4) with xi ∈ N(x0; bH), and maximize the corresponding pseudo profile local log-likelihood function with respect to ξ = (α∗, β)T. Here α∗ and β stand for H(x0) + α1 and H(1)(x0)T, respectively, where α1 is a normalizing constant which makes the bankruptcy probability function (10) integrate to 1.
Specifically, using the bandwidth bH, the resulting bankruptcy probability model (10), and the prospective sample from the SLM (4), the pseudo profile local log-likelihood function of ξ = (α∗, β)T is the log-likelihood of model (10) with the contribution of each observation weighted by the kernel K(·), as in the first step; maximizing it leads to the normal equations (11) and to the estimates (α̂∗, β̂). Combining the consistency of θ̂ and the consistency of the maximum likelihood estimates (α̂∗, β̂) for (α∗, β), we see that the value of α1 converges to 0 as the sample size of the prospective data becomes large. Hence α̂∗ is a consistent estimate of H(x0), and the required estimate Ĥ(x0) of H(x0) in (5) may be taken as Ĥ(x0) = α̂∗. For this fact, see also Fan, Heckman, and Wand (1995).
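The three steps can be put together in a small numerical sketch. It is a sketch under stated assumptions, not the implementation behind the results in this work: X and Z are taken to be scalar (d = q = 1), each observation is weighted by the Epanechnikov kernel evaluated at (xi − x0)/b, the normal equations are solved by Newton-Raphson with a tiny ridge term added for numerical stability, and the data are simulated from a hypothetical H(x) = sin(2πx) with θ = 1. All function names and bandwidth values are illustrative.

```python
import math
import random

def logistic(s):
    """Numerically stable exp(s) / (1 + exp(s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system a v = b."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][k] * v[k] for k in range(r + 1, n))) / m[r][r]
    return v

def fit(rows, ys, ws, offs, iters=12):
    """Newton-Raphson for a weighted logistic log-likelihood with fixed offsets."""
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(iters):
        g = [0.0] * k                      # score vector (normal equations)
        h = [[0.0] * k for _ in range(k)]  # information matrix
        for u, y, w, o in zip(rows, ys, ws, offs):
            p = logistic(o + sum(c * uj for c, uj in zip(coef, u)))
            for a in range(k):
                g[a] += w * (y - p) * u[a]
                for b in range(k):
                    h[a][b] += w * p * (1.0 - p) * u[a] * u[b]
        for a in range(k):
            h[a][a] += 1e-10               # tiny ridge (an added safeguard)
        coef = [c + s for c, s in zip(coef, solve(h, g))]
    return coef

def local_fit(data, x0, b, theta=None):
    """Kernel-weighted local fit at x0; theta=None also estimates theta (step 1)."""
    rows, ys, ws, offs = [], [], [], []
    for y, x, z in data:
        w = epanechnikov((x - x0) / b)
        if w > 0.0:
            rows.append([1.0, x - x0, z] if theta is None else [1.0, x - x0])
            ys.append(y)
            ws.append(w)
            offs.append(0.0 if theta is None else theta * z)
    return fit(rows, ys, ws, offs)[0]      # intercept: local estimate of H at x0

def true_h(x):
    """Hypothetical smooth H, used only to simulate data."""
    return math.sin(2.0 * math.pi * x)

random.seed(0)
data = []
for _ in range(250):
    x, z = random.random(), random.gauss(0.0, 1.0)
    y = 1 if random.random() < logistic(true_h(x) + 1.0 * z) else 0
    data.append((y, x, z))

b_theta, b_h, x0 = 0.25, 0.25, 0.5
# Step 1: initial estimates of H at every sample point.
h1 = [local_fit(data, x, b_theta) for _, x, _ in data]
# Step 2: logistic fit of (alpha0, theta) with the step-1 estimates as offsets.
alpha0_hat, theta_hat = fit([[1.0, z] for _, _, z in data],
                            [y for y, _, _ in data], [1.0] * len(data), h1)
# Step 3: local fit of (alpha*, beta) at x0 with theta fixed at theta_hat.
h_hat = local_fit(data, x0, b_h, theta=theta_hat)
p_hat = logistic(h_hat + theta_hat * 0.0)  # predicted probability (5) at z0 = 0
print(theta_hat, h_hat, p_hat)
```

Treating the fixed part of the score as an offset (Ĥ1(xi) in step 2, θ̂ zi in step 3) lets one routine serve all three steps; only the regressors, weights, and offsets change.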
By the consistency of Ĥ(x0) and θ̂, the corresponding predicted bankruptcy probability exp{Ĥ(x0) + θ̂ z0} / [1 + exp{Ĥ(x0) + θ̂ z0}] in (5) approaches the true bankruptcy probability exp{H(x0) + θ z0} / [1 + exp{H(x0) + θ z0}] in (4) for the company with predictor values (x0, z0), as the sample size of the prospective data becomes large. For this reason, the predicted probability (5) will be used in Section 1.8 to construct a bankruptcy prediction device for prospective data from the SLM (4).
On the other hand, the local likelihood estimates for H(x0) and θ are now given for the case-control sample from the SLM (4), treating that sample as if it were a prospective sample from the SLM (4). Applying the case-control sample (Yi = 0, xi, zi) for i ≤ n0 and (Yi = 1, xi, zi) for i > n0 from the SLM (4) to the normal equations (7), (9), and (11), the local likelihood estimates Ĥ1(x0) and Ĥ(x0) for H(x0) and θ̂ for θ can be produced. The consistency of both Ĥ1(x0) and Ĥ(x0) for H(x0) + c∗, and of θ̂ for θ, will be shown in Chapter II. Here
\[ c^* = \log\{ p(Y = 0) / p(Y = 1) \} + \log(n_1 / n_0) \]
has been defined in Section 1.3.
Unfortunately, because Ĥ(x0) is consistent for H(x0) + c∗ and the unknown quantity c∗ is generally not equal to 0, the resulting predicted bankruptcy probability exp{Ĥ(x0) + θ̂ z0} / [1 + exp{Ĥ(x0) + θ̂ z0}], obtained by plugging these Ĥ(x0) and θ̂ into (5), does not converge to the true bankruptcy probability exp{H(x0) + θ z0} / [1 + exp{H(x0) + θ z0}] in (4), but approaches
\[ \frac{\exp\{c^* + H(x_0) + \theta z_0\}}{1 + \exp\{c^* + H(x_0) + \theta z_0\}} \]
for the company with predictor values (x0, z0). This is the major difference between applying the SLM to the prospective sample and to the case-control sample. Although the predicted bankruptcy probability (5) derived from the case-control sample from the SLM (4) does not estimate the true bankruptcy probability, we will see in Section 1.8 that it can still be used to develop a bankruptcy prediction device for case-control data from the SLM (4). The same conclusions have also been reached for the LLM in Section 1.3.
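The size of the shift caused by case-control sampling can be illustrated numerically. The sketch below evaluates c∗ and compares the true probability in (4) with the limit of the case-control prediction. The marginal bankruptcy probability p(Y = 1) = 0.02, the sample sizes n0 = n1 = 500, and the score H(x0) + θ z0 = −4 are hypothetical values chosen only for the example.

```python
import math

def logistic(s):
    """Numerically stable exp(s) / (1 + exp(s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def logit(p):
    return math.log(p / (1.0 - p))

def case_control_shift(p_bankrupt, n0, n1):
    """c* = log{p(Y = 0) / p(Y = 1)} + log(n1 / n0)."""
    return math.log((1.0 - p_bankrupt) / p_bankrupt) + math.log(n1 / n0)

p_bankrupt = 0.02      # hypothetical marginal probability p(Y = 1)
n0, n1 = 500, 500      # hypothetical numbers of observations with Y = 0 and Y = 1
c_star = case_control_shift(p_bankrupt, n0, n1)

score = -4.0           # hypothetical value of H(x0) + theta z0
p_true = logistic(score)            # true bankruptcy probability (4)
p_limit = logistic(c_star + score)  # limit of the case-control prediction
print(c_star, p_true, p_limit)
```

On the logit scale the two probabilities differ by exactly c∗ for every company. Because the shift is the same constant for all (x0, z0), it changes the level of the predicted probabilities but not their ordering across companies.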