Bayes Inference for Technological
Substitution Data with Data-based
Transformation
LYNN KUO
University of Connecticut, Storrs, USA
JACK LEE
National Chiao Tung University, Hsinchu, Taiwan
PETER CHENG
Ming Chuan University, Taipei, Taiwan
JEFFREY PAI
University of Manitoba, Winnipeg, Canada ABSTRACT
Bayesian inference via Gibbs sampling is studied for forecasting techno-logical substitutions. The Box±Cox transformation is applied to the time series AR(1) data to enhance the linear model ®t. We compute Bayes point and interval estimates for each of the parameters from the Gibbs sampler. The unknown parameters are the regression coecients, the power in the Box±Cox transformation, the serial correlation coecient, and the vari-ance of the disturbvari-ance terms. In addition, we forecast the future techno-logical substitution rate and its interval. Model validation and model choice issues are also addressed. Two numerical examples with real data sets are given.
KEY WORDS AR(1); Box±Cox transformation; Metropolis-within-Gibbs
sampling; model choice; prediction
INTRODUCTION
As technology advances, new products replace the old ones. For example, colour televisions replace black-and-white televisions, digital telephone switching systems replace analog switching systems, etc. The rate of change, called technology penetration, is de®ned as the ratio of the number of new products to the combined total of new and old products. A typical growth curve such as Gompertz, logistic, normal, or Weibull is not quite adequate in ®tting and forecasting the penetration data. For example, let Ftbe the technology penetration at time t. Then under a
logistic growth curve assumption, we expect to see a linear growth function for log Ft= 1 ÿ Ft.
However, when dealing with real data, Lee and Lu (1987, 1989) and Keramidas and Lee (1990)
CCC 0277±6693/97/020065±18 Received August 1995
found that a more ¯exible transformation of yt Ft= 1 ÿ Ft gave much superior predictive
accuracies. This is primarily due to the violations of the linearity assumption for the growth function and of the independence assumption for the disturbance terms. Lee and Lu found that the residuals of a Box±Cox (1964) transformation of yt ®tted with a linear growth function
exhibit strong positive ®rst order autoregressive dependence (AR(1) dependence). They incorporate the Box±Cox transformation in their frequentist's analysis to enhance the linearity of the growth function and to model the proper dependence structure among the observations. In this paper, we pursue a Bayesian analysis along the same line. For a penetration data set with colour television, we show our forecast improves upon the predictor given in Lee and Lu.
A family of data-based transformed models for forecasting technological substitutions has been empirically shown in Lee and Lu (1987, 1989) to be quite useful for short-term forecasts. These models are more general than the four well-known S-shaped growth curve models: logistic, normal, Weibull, and Gompertz. Basically, we ®rst select a transformation from the following choices: M1 yt Ft= 1 ÿ Ft M2 yt expfFÿ1 Ftg; M3 yt ÿ log 1 ÿ Ft; and M4 yt ÿ 1=log Ft 1
Then, we apply the widely used Box±Cox transformation to the AR(1)-dependent data. More speci®cally, let y1, . . . , ynbe a set of ®rst-stage transformed n observations. The second-stage
transformation is de®ned by y lt yt n lÿ 1=l when l 6 0 log yt n when l 0 2 where n is a known constant such that yt n > 0 for all t. In practice n is set to 0 if all yt's are
positive. Furthermore, the y lt ; t 1; . . . ; n, are assumed to be normally distributed with the
mean linear in the regression parameters and with the covariance matrix s2S, where
S 1 ÿ r2ÿ1V; V c
ab; cab rj aÿb j, for a; b 1; . . . ; n, and ÿ1 < r < 1. In other words,
y l y l
1 ; . . . ; y ln 0 xbb a 3
where xbb is the growth function, bb a; b0, and x is the design matrix with 1's in the ®rst column
and t1to tnor log(t1) to log(tn) in the second column, depending on the growth curve model being
considered (see Lee and Lu, 1987). The vector of disturbance terms a is assumed to be distributed as a multivariate normal with mean vector 0 and covariance matrix s2S. In case a higher degree
polynomial is assumed for the growth function, the dimensions of x and bb can be adjusted easily. For example, if a second-degree polynomial in time is assumed for the growth function, then the third column of x is t2
1 to t2nor [log(t1)]2to [log(tn)]2.
The above families of transformations include the four commonly used link functions: the logit, the probit, the complementary log±log and log±log functions as de®ned in McCullagh and Nelder (1989, p. 108). These four link functions are obtained by applying the transformations in equations (1) and (2) with l 0.
Lee and Lu (1987, 1989) derive the maximum likelihood estimates of the model in equation (3). This paper studies the problem from a Bayesian point of view. In addition to Bayesian inference
on the parameters, we address prediction for future technology penetrations and predictive intervals. Instead of the plug-in method given in Lee and Lu, we obtain a Bayesian predictive distribution. A Gibbs sampler is used to approximate this predictive distribution and its functionals. This is a Markov chain Monte Carlo algorithm that simulates variates idealistically from the posterior distribution of bb; r; l, and s given the data. The transitional matrix of the Markov chain is written as a product of full conditional densities. The stationary distribution of this Markov chain is the desired posterior density. In addition to Bayesian inference and prediction, we address the issue of model adequacy and model selection using prequential predictive densities. Model selection addresses the question of which of the four ®rst stage transformations together with the Box±Cox transformation is the best. Model adequacy checks whether any of the models are appropriate in forecasting.
We re-analyse two real data sets. One is the technology penetration data for colour televisions provided by Nielson (1985). The other is the penetration data for electronic telephone switching systems of a telephone company. The ®rst data set has been analysed by Lee and Lu (1987, 1989). We show our Bayes solutions to all four models in equation (1) together with the Box±Cox transformation are adequate in ®tting the data. Moreover, we show our forecast intervals improve upon the frequentist's ones given in Lee and Lu with better accuracy and shorter lengths.
We seek non-informative priors, so the analysis is more focused on the likelihood. The parameters bb; r; l, and s are assumed to have independent prior distributions. The prior on bb is assumed to be a bivariate normal with mean ma; mb0and covariance matrix
Sp s 2 a 0 0 s2 b 4 The prior on l and r can be quite arbitrary, because the Metropolis (1953) algorithm is used to generate these variates. In fact, we have chosen the prior on r uniform on ÿ1; 1, and the prior on l to be ¯at (improper) on ÿ1; 1. We have also explored the proper prior on l that is uniform over ÿ4; 4, and found no appreciable dierence. The prior density on s is chosen from a conjugate family
p s /sg 11 eÿZ=2s2
g > 0; Z > 0
We will denote this modi®ed inverse gamma density by IG0 g; Z. Note this prior is equivalent to
choosing s2 with the inverse gamma density IG g=2; 2=Z as de®ned by Berger (1985, p. 561). If
the non-informative prior p s 1=s is desired, we can let g ! 0 and Z ! 0. When sa! 1 and
sb! 1 as being done in our numerical examples, then our prior on bb; s; and r is equivalent to
the one given by Zellner and Tiao (1964), where Bayesian analysis of the model without the Box± Cox transformation was studied.
When n is small, the parameters are hard to estimate. However, with several independent concurrent short series sharing a common AR(1) covariance structure, we will then have a general growth curve model with AR(1) covariance structure. Keramidas and Lee (1990) apply the maximum likelihood method. Lee and Liu (1996) extend the algorithm here to Bayesian predictive inference for the general growth curve model with AR(1) dependence.
Gibbs sampling has been applied extensively to various problems. We mention only the related work in time series: Albert and Chib (1993); Carlin, Polson, and Stoer (1992); Chib (1993); Chib and Greenberg (1994); Marriott et al. (1995); and McCulloch and Tsay (1994).
The next section develops the conditional densities used in the Gibbs sampler. A detailed discussion of the Gibbs sampler and the Metropolis algorithm is given in Appendix 2. Then we develop the predictive distribution and the one-step forecast. We discuss model adequacy and model selection. Finally, two numerical examples based on real penetration data are given.
MODEL AND ALGORITHM
In this section, we describe the general model and specify the conditional densities used in the Gibbs sampler. Let us ®rst transform Ftto ytvia equation (1). Then we transform ytto y lt by
applying the Box±Cox transformation (2). Let us note that because of the special structure for S, its inverse is Sÿ1 1 ÿr 0 0 0 ÿr 1 r2 ÿr 0 0 0 ÿr 1 r2 0 0 ... ... ... ... ... 0 0 0 ÿr 1 0 B B B B B @ 1 C C C C C A
Given the model y lt a bt at; where at follows the AR(1) model described in the
Introduction, we can write our likelihood (cf. Ljung and Box, 1980; Lee and Lu, 1987) as L bb; s; r; l; y / sÿn 1 ÿ r21=2 exp ÿ 1 2s2S y; bb; r; l Yn i 1 ylÿ1 i 5 where
S y; bb; r; l y lÿ xbb0Sÿ1 y lÿ xbb
with
x 1 1 1 11 2 3 n
0
1 t:
It is worth noting that the above models with l 0 in the Box±Cox transformation reduce to the four existing growth curve models used in technology substitution data. They are: (1) Logistic (Fisher and Pry, 1971): logFt= 1 ÿ Ft a bt at; (2) Normal (Stapleton, 1976): Fÿ1 Ft
a bt at; (3) Weibull (Sharif and Islam, 1980): logÿlog 1 ÿ Ft a b log t at; and (4)
Gompertz (1825): ÿlog ÿlog Ft a bt at. For the Weibull model, log t is used for t in
de®ning x. We can also write the likelihood in equation (5) equivalently in terms of F F1; . . . ; Fn0 by a change of variable technique. Then we obtain four expressions
corres-ponding to the four transformations in equation (1) respectively. We can pursue inference and prediction using either the likelihood in y or the likelihood in F. Most of the subsequent developments are derived from equation (5) mostly because it explains the AR(1) process more directly.
Combining the likelihood function (5) with the prior p bb; s; r; l discussed in the Introduction, we obtain the posterior density
Then the Gibbs algorithm proceeds as follows:
(1) Generate a given b; r; l; s; y from the normal distribution N mas2 bs2a s2 as2 a ; s2s2 a s2 as2 a ; where a 10Sÿ11 and b 10Sÿ1 y lÿ bt:
Consequently, if sa ! 1 in the prior, we generate a from the conditional distribution
N b=a; s2=a.
(2) Generate b given a; r; l; s; y from the normal distribution: N mbs2 ds2b s2 cs2 b ; s2s2b s2 cs2 b ! ; where
c t0Sÿ1t and d t0Sÿ1 y lÿ a1:
Consequently, if sb! 1 in the prior, we generate b from the conditional distribution
N d=c; s2=c.
(3) Generate r given a; b; l; s; y using the Metropolis (1953) method, where f r / 1 ÿ r21=2 eÿ 1=2s2S y;bb;r;l
p r: (4) Generate l given a; b; r; s; y using the Metropolis method, where
f l / eÿ 1=2s2S y;bb;r;lYn i 1 ylÿ1 i ( ) p l:
(5) Generate s given a; b; r; l; y from the modi®ed inverse gamma distribution IG0 g n; Z S y; bb; r; l:
The Metropolis algorithm used in steps (3) and (4) is described in Appendix 2. An alternative blocking algorithm that combines steps (1) and (2) into one step should be more ecient especially when a and b are highly correlated. Let S x0S
ÿ1x=s2 Sÿ1p . That is, we generate
the vector bb given s; r; l from the following bivariate normal density: a b N Sÿ1 x0Sÿ1y l s2 Sÿ1p mmab ; S ÿ1
instead of generating a conditioning on b and the rest of the parameters and similarly for b in two steps. We implemented the latter without blocking because the correlation coecient between a and b is not high in our numerical examples.
FORECAST
Having obtained the posterior distribution of the parameters, we can use it to predict the future values of y. We will illustrate this by the one-step forecast. Generalization to forecast the lth future period is straightforward and discussed at the end of this section. Suppose we are at the nth period. Let Dndenote the data set fy1; . . . ; yng. Let xi 1; i. We suppress x1; . . . ; xn; xn1in
the notation for convenience, because they are the known covariates. Let yy denote the parameters bb; r; l; s. Let yy k;s denote the variate of yy drawn in the kth iteration and sth
replication of the Gibbs sampler given the data set Dn. The same superscript notation will be
applied to subsets of yy and their functionals. We usually take k suciently large. Prediction for the n 1th period follows from the predictive density
f Yn 1j Dn
Z
f Yn 1j Dn; yyp yy j Dn dyy
where Yn1 denotes the random future observation at period n 1. This density can be
approximated by Monte Carlo integration from the Gibbs sample: ^f Yn 1j Dn 1r
Xr s 1
f Yn 1j Dn; yy k;s 6
This predictive density can be obtained by simply drawing y sn 1 for s 1; . . . ; r from f Yn 1j Dn; yy k;s for each yy k;s in the replications. The mean of this predictive distribution is
computed from
E Yn 1j Dn E E Yn 1j Dn; yy j Dn 7
To evaluate the inner expectation, let us consider two cases. We assume n 0 in equation (2) for simplicity.
Case (1): l 6 0.
It follows from equation (2) that
yn1 y ln 1l 11=l 8 where y ln 1 xn 1bb an1 and an 1 ran en1: Note an y ln ÿ xnbb y l nÿ 1 l ÿ xnbb: Therefore, y ln 1 xn 1bb r y l nÿ 1 l ÿ xnbb en 1:
Therefore, from equation (8), we have yn 1 xn1bb r y l nÿ 1 l ÿ xnbb en 1 l 1 1=l : 9 Case (2): l 0.
We can derive the following equation by a similar method:
yn 1 expxn1bb r log ynÿ xnbb en 1: 10
Let y k;sn 1 denote the functional (9) evaluated at the kth iteration and sth replication of the sampler, i.e. y k;sn 1 xn 1bb k;s r k;s y l k;s n ÿ 1 l k;s ÿ xnbb k;s e k;sn 1 l k;s 1 1=l k;s 11 Note that e k;sn 1 is generated from the N 0; s k;s2 distribution in the kth iteration and sth
replication of the Gibbs sampler.
Therefore, combining equations (7), (9), and (6), when l 6 0, we predict Yn 1by
^yn 11r
Xr s 1
y k;sn 1: 12
Alternatively, we can predict Yn 1 using the median of the Gibbs sample,
~yn 1 median y k;sn 1
n or
s 1
: 13
Prediction intervals and quantiles of the functional Yn 1 can be computed similarly from the
sample y k;sn 1; s 1; . . . ; r.
Similarly, when l 0, we can use the following variates to predict Yn 1:
y k;sn 1 expxn 1bb k;s r k;s log ynÿ xnbb k;s e k;sn 1: 14
Then we use the mean or the median of the sample to predict yn 1 and construct predictive
intervals from the Gibbs sample (14).
Theoretically, the posterior probability of the case l 0 is 0 when the prior on l is a continuous measure. However, we had to program the two cases l 0 and l 6 0 of the forecast rule separately to avoid an over¯ow problem.
Having obtained ^yn 1, we apply the inverse transformations of equation (1) to predict Fn 1,
i.e. (1) ^Fn 1 ^yn 1= ^yn 1 1; (2) ^Fn 1 F log ^yn 1; (3) ^Fn 1 1 ÿ exp ÿ^yn 1; and (4)
^
Fn 1 expfÿ1=^yn 1g. The Bayesian predictive intervals for Fn 1are calculated from the Gibbs
sample of yn 1 in equations (11), (14), and the above transformations.
In addition to the naive inverse transformation, we can add a bias-corrected term which involves only the second derivative of the inverse transformation in the Taylor series expansion. However, as noted in Keramidas and Lee (1990, p. 628), the bias-correction does not guarantee to produce better forecasts as the ®rst derivative vanishes. Granger and Newbold (1976) studied
other forecast rules for the transformed variables using Hermite polynomials. We have chosen only the naive inverse transformation in this paper for his simplicity. Moreover, the naive ones have demonstrated desirable forecast performance as in the section of numerical examples.
Now we brie¯y discuss how we generalize the above one-step prediction rule to multi-step prediction. Let Yf Yn 1; . . . ; Yn l denote the vector of future observations. Then the joint
density of Yfgiven Dncan be written as
f Yfj Dn f Yn 1j Dn; yyf Yn 2j Yn 1; Dn; yy f Yn lj Yn 1; . . . ; Yn lÿ1; Dn; yy
Therefore, the samples from the joint future density can be drawn sequentially with each step similar to the single step in equation (6); a generated new future observation in each step is incorporated as data into the next step of the sequential procedure.
MODEL VALIDATION AND MODEL CHOICE
Both model adequacy and model selection issues are discussed in this section. Model adequacy is checked by comparing the observed yn 1to its 95% predictive interval developed in the previous
section for each n in a series. A model is judged to be adequate if about 95% of the intervals contain the observed future values.
For model selection, we consider three criteria: mean squared error (MSE), mean absolute relative deviation (MARD), and the prequential pseudo-Bayes factor (PPBF).
To de®ne the PPBF, let us de®ne the prequential conditional predictive ordinate (PCPO), which is the predictive density of Yn 1evaluated at the future observed value yn 1. Therefore, we
have from equation (6), dn 1 ^f yn 1j Dn 1rXr s 1 1 2p p s k;s exp ÿ yl k;sÿ1 n 1 l k;s ÿ xn 1bb k;sÿ r k;sa k;sn !2 2 s k;s2 8 > > > > > < > > > > > : 9 > > > > > = > > > > > ; yl k;sÿ1 n 1 15 where a k;s n y l k;s n ÿ xbb k;s:
Consequently, the PCPO for Fn1is computed by
cn 1 ^f Fn 1j Dn ^f yn 1j DndFdy y yn 1
16 where dy/dF is computed from equation (1).
Let N denote the total number of periods in the data set. We would like to monitor data for a reasonable time period, say for t 1; . . . ; I, before considering model selection. Then we evaluate the following three prediction measurements for each of the models for the periods I 1 to N. All the prediction rules are computed prequentially as described in the previous section, i.e.
^
in equation (1) together with the AR(1) model for the Box±Cox transformation, we can evaluate the following three measurements:
MSE 1 N ÿ I XN t I 1 ^Ftÿ Ft2 17 MARD N ÿ I1 XN t I 1 j ^Ftÿ Ftj Ft 18 PPBF YN t I 1 ct 19
The best model is the one with smallest MSE, smallest MARD, or the largest PPBF.
Note the PPBF is dierent from the pseudo Bayes factor in Geisser and Eddy (1979) where the cross-validation idea is used. In fact, the PPBF evaluates the conditional joint predictive density of the data yI 1; . . . ; yN given y1; . . . ; yI.
To compute the PPBF in equation (19), we need to run each of the N ÿ I Gibbs samplers from scratch for each prequential conditional predictive ordinate. The sequential imputation algorithm developed by Kong, Liu, and Wong (1994) and Liu and Chen (1995) can be modi®ed to suit for this situation. The sequential imputation algorithm provides a desirable alternative approach to this problem. With the new data yI 1; . . . ; yNcoming in sequentially, the algorithm
sequentially impute the unknown parameters using the importance sampling weights based on the same old data batch y1; . . . ; yI. As long as the new data are not wildly dierent from the old
batch, the importance sampling weights should be well behaved. NUMERICAL EXAMPLES
Analysis based on two real-life data sets is summarized here. The ®rst data set, listed in Table I, initially provided by Nielson Inc., is the penetration data for colour television for the period 1956±85. The second data set, listed in Table IV, is the penetrations of electronic telephone switching systems for the years from 1967 to 1984 for a telephone company. The numbers Ft
listed are the fractions of the total number of new technology users (users of colour television, electronic switching systems, etc.) divided by the total number of new and old technology users (users of colour and black and white televisions, subscribers for electronic (new) and electro-mechanical (old) switching systems).
Table I. Technology penetration data for colour TV
Year Penetrations Year Penetrations Year Penetrations
1956 0.00052 1966 0.09694 1976 0.73606 1957 0.00219 1967 0.16325 1977 0.77065 1958 0.00394 1968 0.24175 1978 0.77984 1959 0.00569 1969 0.32034 1979 0.80926 1960 0.00743 1970 0.35744 1980 0.83028 1961 0.00943 1971 0.41015 1981 0.82916 1962 0.01208 1972 0.48631 1982 0.87595 1963 0.01889 1973 0.55355 1983 0.88703 1964 0.03120 1974 0.62251 1984 0.90465 1965 0.05332 1975 0.68394 1985 0.91519
Diuse priors on the parameters are chosen for both data sets. The parameters in equation (4) are chosen such that sa! 1 and sb! 1. Given this limiting case, our posterior does not
depend on ma or mb. The prior on r is uniform ÿ1; 1 to re¯ect the equal likelihood of r. The
prior on l is chosen to be the improper ¯at prior, i.e. p l 1 for all l. The prior on s is also improper, namely, p s 1=s.
Table II lists the point estimates, estimates of the standard deviations, and the percentiles for each of the parameters for the complete colour television data set (1956 to 1985) and the model with the reciprocal link transformation (M4). The Bayes estimates for the parameters are not signi®cantly dierent among the four link transformations; therefore we omit the results for the other link transformations. Initially, the Bayes estimates are computed from the Gibbs sampler with 53 iterations and 500 replications and with 50 loops in each Metropolis algorithm to generate r and l for each iteration and replication of the Gibbs sampler. The starting points for the replications for each parameter are chosen from random perturbations around the maximum likelihood estimates. The convergence of the Gibbs sampler is monitored by examining their empirical quantiles. This leads us to iterate the above Gibbs sampler ten more times to yield the results in Tables II and III. The Bayesian predictive intervals can easily be obtained from Table II. For example, the 95% intervals are read from the 2.5% and 97.5% columns.
For the ®rst data set, we produce the one-step-ahead prediction rule for the year 1966 using both the mean and the median of the Gibbs sample based on the data set for the year 1956 to 1965. Then we predict the subsequent years prequentially, i.e. using the data from 1956 to 1966 to predict the penetration for 1967, etc. In Figure 1, the one-step forecasts for the years 1966 to 1985 using the medians predictors are plotted for each of the four link transformations. They are also compared with the actually observed data. This ®gure shows that the logistic transformation (M1) produces the best forecasts except for the years 1974±1976 and 1982.
Figure 2 provides model adequacy checks in which the 95% Bayesian credible intervals are plotted pointwise for each of the one-step-ahead prediction rules. The ®gure shows that all four models are adequate because the future observed values are always contained in the intervals. The con®dence bands become narrower as time increases as expected because more data are available. It can be observed that the four models dier less on the left tails of the distributions than on the right tails.
The 90% and 95% frequentist's predictive intervals can be computed from a Student-t distribution. Let ^y denote the frequentist's forecast of y
n given in the Appendix 1. Then
^y l+ta=2 n ÿ 2^s will yield a 100 1 ÿ a% predictive interval for the time t n, where ^s is
com-puted similarly to equations (A15) and (A17) of Lee and Lu (1987, p. 76). Hence approximately 100 1 ÿ a% predictive intervals for yn can be obtained from ^y^lÿ ^lta=2 n ÿ 2^s1=^l < yn<
^y^l ^lta=2 n ÿ 2^s1=^l: The frequentist's predictive rules are transformed back to obtain the
predictive rules ^F. In Figure 3, these frequentist's 95% intervals are plotted against the Bayesian intervals for the years 1966 to 1985, both with the logistic link. It can be seen that the Bayesian
Table II. Gibbs approximation to the estimates for colour TV data
Mean S.D. 2.5% 5% 25% 50% 75% 95% 97.5% ^a ÿ2:38 0.30 ÿ2:94 ÿ2:79 ÿ2:54 ÿ2:38 ÿ2:25 ÿ1:88 ÿ1:76 ^b 0.15 0.01 0.13 0.14 0.15 0.16 0.16 0.17 0.17 ^r 0.90 0.07 0.75 0.77 0.87 0.92 0.96 0.99 0.99 ^l ÿ0:08 0.06 ÿ0:20 ÿ0:18 ÿ0:12 ÿ0:08 ÿ0:04 0.02 0.04 ^s 0.08 0.01 0.06 0.06 0.07 0.08 0.09 0.10 0.10
Figure 1. Plot of the Bayesian one-step forecasts for each of the four links versus years
predicting intervals are not only shorter in length than the frequentist's ones but also more accurate in prediction.
Table III addresses the model selection question. The three criteria MSE, MARD, and PPBF are computed for the last 20 periods, i.e. 1966 to 1985. The three measurements are denoted by MSE(B), MARD(B), and PPBF. Furthermore, MSE and MARD are also computed using the frequentist's one-step prediction rule. They are denoted by MSE(F) and MARD(F). The table shows that the Bayesian predictor produces smaller MSE and MARD than the frequentist's predictor. Table III also shows that the logistic transformation (M1) yields the smallest MSE and MARD, Bayesian or frequentist's, among the four transformations. This is con®rmed by Figure 2. On the other hand, the reciprocal log transformation (M4) yields the largest PPBF. This is con®rmed by Figure 4, where the prequential CPOs for the years 1966 to 1985 are plotted for each of the link functions. It is not entirely surprising that we select dierent models due to dierent criteria. The skewedness of the predictive density revealed in Figure 2 manifests this
Figure 3. Plot of 95% Bayesian and frequentist's one-step-ahead predictive intervals and for the logistic link versus years
Table III. Model selection for the colour TV data
MSE(B) MARD(B) PPBF(log) MSE(F) MARD(F)
M1 0.00034 0.047 37.39 0.00038 0.052
M2 0.00040 0.051 46.74 0.00052 0.058
M3 0.00070 0.063 40.99 0.00054 0.064
phenomenon. Figure 4 further shows that models 2 and 4 are quite comparable and the logistic (M1) model ®ts well for the earlier periods up to 1972.
Table V lists the Bayes estimates for the various parameters and its credible intervals based on the telephone switching data set. Note that the estimates for b are positive for both data sets. This
Figure 4. Plot of the prequential conditional predictive ordinates (PCPO) for each of the four links versus years
Table IV. Penetrations of electronic telephone switching systems
Year Penetrations Year Penetrations Year Penetrations
1967 0.00071 1973 0.11259 1979 0.30486 1968 0.00386 1974 0.14553 1980 0.32404 1969 0.00447 1975 0.16736 1981 0.35516 1970 0.02003 1976 0.19083 1982 0.37708 1971 0.02961 1977 0.20130 1983 0.41134 1972 0.05974 1978 0.26299 1984 0.43709
Table V. Gibbs approximation to the estimates for the telephone data set
Mean S.D. 2.5% 5% 25% 50% 75% 95% 97.5% ^a ÿ2:23 0.30 ÿ2:83 ÿ2:64 ÿ2:36 ÿ2:21 ÿ2:08 ÿ1:85 ÿ1:79 ^b 0.11 0.01 0.08 0.09 0.10 0.11 0.12 0.14 0.14 ^r 0.70 0.20 0.30 0.34 0.55 0.73 0.87 0.97 0.98 ^l 0.46 0.05 0.35 0.38 0.43 0.46 0.50 0.55 0.57 ^s 0.06 0.02 0.03 0.04 0.05 0.06 0.07 0.09 0.10
is reasonable because we would expect substitutions to increase with time. The estimate for l for the TV data is much smaller than that of the telephone data indicating that the dependence for the latter is much stronger. Table VI lists the model selection measures for the last eight periods, i.e. 1977 to 1984. It reveals that the logistic transformation (M1) is preferred by the prequential pseudo-Bayes factor. The probit transformation (M2) is preferred by the Bayesian MSE and MARD criteria. The Bayesian prediction rules are better than the frequentist's ones in prediction. Moreover, our ®gures show that the Bayesian prediction rules have shorter intervals than the frequentist's ones and more accurate; all four link transformations, with the Box±Cox transformation and AR(1) models, are adequate. These ®gures are omitted.
APPENDIX 1: FREQUENTIST'S PREDICTION RULE The frequentist's prediction rule is given here. Let y y
n 1; . . . ; yn l0 be an l-dimensional
vector of future observations to be predicted on the basis of the past observations y y1; . . . ; yn0. Let x denote the design matrix corresponding to the future observations.
Furthermore, let Cov y l0
; y l0
0 s2= 1 ÿ r2 cab, where cab rj aÿb j; a; b 1; . . . ; n l:
When the unknown parameters are replaced by their maximum likelihood estimates from equation (A3), the forecasts of y can be written as
^y l 1 f1 ^lx^bb ^rr y ^ln ÿ xn^bbg1=^lÿ n when ^l 6 0 expx^bb ^rr y ^l n ÿ xn^bb ÿ n when ^l 0 ( A1 where ^bb x0V^ÿ1xÿ1x0V^ÿ1y ^l; ^rr ^r; ^r2; . . . ; ^rl0, 1 1; . . . ; 10, ^V ^c ab; ^cab ^rj aÿb j, xn
and y ^ln are the nth row of x and y ^l respectively, and ^l and ^r are the MLEs of l and r,
respectively. For the one-step-ahead forecast, we set l 1 in the above discussion and let ^y denote the ®rst row of ^y l 1. We assume n is 0 in the two examples.
APPENDIX 2: GIBBS SAMPLING
We describe brie¯y the Gibbs sampling procedure. Suppose we desire to estimate f U1; U2; . . . ; Upj Dn, the posterior joint density of U1; U2; . . . ; Up, given the data Dn. The
algorithm assumes that we can generate variates from the conditional densities f Uij fUjgj 6 i; Dn.
The algorithm proceeds as follows. Let us start with initial values, U 01 ; U 02 ; . . . ; U 0
p . Generate
a value U 11 from the conditional density f U1j U 02 ; . . . ; U 0p ; Dn. Similarly, generate a value
U 12 from the conditional density f U2j U 11 ; U 03 ; . . . ; U 0p ; Dn, and continue until the value
U 1
p is generated from the conditional density f Upj U 11 ; . . . ; U 1pÿ1; Dn. With this new
realization of the values U 1 U 1
1 ; . . . ; U 1p replacing the old values, we can continue to
iterate until the kth realization. Under very mild regularity conditions, this Markov chain
Table VI. Model selection for the telephone data set
MSE(B) MARD(B) PPBF(log) MSE(F) MARD(F)
M1 0.00028 0.047 25.72 0.00058 0.061
M2 0.00021 0.038 20.92 0.00141 0.106
M3 0.00049 0.059 17.88 0.00317 0.173
converges to a stationary distribution for large k. The vector U k1 ; . . . ; U k
p has a distribution
that is approximately equal to f U; . . . ; Upj Dn. By starting independent initial choices, we can
also replicate the above iterations r times. Let U i;s U i;s
1 ; . . . ; U i;sp denote the realization
of U for the ith iteration and sth replication. The posterior moments, functionals, and credible sets can be computed from the empirical measure assigning weight 1/r to each U k;s1 ; . . . ; U k;s
p ; s 1; . . . ; r, to compute posterior features.
For the issue of convergence and the issue of pragmatic choices of k and r, consult Geman and Geman (1984), Tanner and Wong (1987), Gelfand and Smith (1990), Gelfand et al. (1990), Gelman and Rubin (1992), and Tierney (1994).
When the conditional densities are not easily identi®ed, such as in cases without conjugate priors, the Metropolis (1953) algorithm or importance sampling methods can be employed. We describe the Metropolis algorithm brie¯y. Suppose we desire to sample a variate from the following generic density:
f U1j U2; . . . ; Up; Dn R1 f Dnj U1; . . . ; Upp U1; U2; . . . ; Up
ÿ1f Dnj U1; . . . ; Upp U1; U2; . . . ; Up dU1 A2
Let f (U1) denote the conditional density in equation (A2), suppressing the conditioning
variables for brevity. Let us de®ne a transition kernel q(U1, X ), which maps U1to X. If U1is a
real variable with range in ÿ1; 1, we can construct q such that X U1 s0Z with Z being
the standard normal random variate and s02re¯ecting the conditional variance of U
1in equation
(A2). Then the Metropolis algorithm proceeds as follows: Step 1: start with any point U 01 , and stage indicator j 0.
Step 2: generate a point X according to the transition kernel q U j 1 ; X :
Step 3: update U j 1 to U j 11 X with probability p minf1; f X=f U j 1 g, stay at U j 1 with probability 1 ÿ p.
Step 4: repeat steps 2 and 3 by increasing the stage indicator until the process reaches a stationary distribution.
Chapter 9 of Hammersley and Handscomb (1964) provides a discussion of why this algorithm works. Note that this algorithm is de®ned by using the ratio of two values of equation (A2). Therefore, all we need is to know the functional form of the likelihood and prior. This spares us the task of evaluating the normalizing constant.
If U1is a bounded variable with range in (a, b), we can use a distribution with support on (a, b)
(for example, uniform over (a, b) distribution) to generate X. In this case, the transition kernel q(U1, X ) does not depend on U1. Alternatively, we can apply the transformation, such as
U0 log U1ÿ a= b ÿ U
1, to map (a, b) into ÿ1; 1, then use the normal transition kernel
described earlier to generate X using the density of U0. We implemented the latter procedure in
our numerical examples because the normal transitional kernel seems to yield a more ecient algorithm than the uniform transitional kernel. On using the normal kernel, we ®rst transform r to r02 ÿ1; 1 by r0 log 1 r= 1 ÿ r. Then we apply the Metropolis algorithm to the
function
f r0 / e3r0=2
1 er03eÿ 1=2s
2S0 y;bb;r0;l
where S0 y; bb; r0; l is obtained from S y; bb; r; l with r replaced by er0
ÿ 1= er0
1. We also need to specify sr0(denoted by s0in the above Metropolis algorithm). The quantity sr0is usually
chosen to re¯ect the conditional standard deviation of r0 given bb; l; s; y. We propose three
methods to estimate the variance s2 r0:
(1) Start with an arbitrary guess of sr0; use the Gibbs sampler of the r0s to estimate empirically
its variance.
(2) Obtain estimates for the mean and variance of r, denoted by mr and s2
r, from the SAS
AUTOREG procedure given a ®xed l. Apply the delta method to obtain an estimate of the variance of r0, i.e. s2
r0 s2r 2= 1 ÿ m2r2.
(3) Derive the pro®le likelihood of r and l as in Lee and Lu (1987), l r; l; y ÿn
2log ^s2 r; l l ÿ 1 Xn t 1
log yt12 log 1 ÿ r2 A3
where
^s2 r; l 1
n y lÿ x^bb0Sÿ1 y lÿ x^bb with
^bb x0Sÿ1xÿ1x0Sÿ1y l
Then, invert the sample information matrix at the maximum likelihood estimates of equation (A3) to obtain the preliminary variance estimates of r and l respectively. This method is adopted in our analysis, because it has the advantage of obtaining both variances. Then we apply the delta method to obtain the variance estimates for r and l. These variance estimates are further re®ned by empirical evaluations of the variance estimates from the Gibbs sampler because the conditional variances are required in the Metropolis algorithm. Having obtained r0 from the Metropolis algorithm, we transform
the r0back to r by r er0
ÿ 1= er0
1.
An operation like that applied to r can also be applied to the variate l, except that the above method (3) of using the SAS AUTOREG is not applicable for l.
Conditioning on l, our model is a special case of the models that are discussed by Albert and Chib (1993), Chib (1993) and Chib and Greenberg (1994). Therefore, the generation of r can be done more eciently using the idea in Chib and Greenberg. That is the full conditional distribution of r for the uniform prior can be written as
r j data; bb; s2/ C r N ^r; oÿ1I ÿ1;1
where C r is the density of the initial observation and o Xn ÿ1 i 1 y li ÿ a ÿ bi2 and ^r oÿ1Xn ÿ1 i 1
Then the Metropolis step can be done by drawing a r (say, r) from N ^r; oÿ1. The chain stays at
the current value if r does not satisfy stationarity; otherwise it moves to r with probability
given by
min C r C r ; 1
ACKNOWLEDGEMENTS
Kuo's research was partially supported by the National Science Foundation Grant DMS-9008021. Lee's research was partially supported by the National Research Council Grants 82-208-M009-054 and 85-2121-M-009-004. Pai's research was partially supported by the Research Foundation, University of Connecticut.
REFERENCES
Albert, J. and Chib, S., `Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts', Journal of Business and Economic Statistics, 11 (1993), 1±16.
Berger, J. O., Statistical Decision Theory and Bayesian Analysis, 2nd edn, New York: Springer-Verlag, 1985.
Box, G. E. P. and Cox, D. R., `An analysis of transformations', Journal of the Royal Statistical Society, A, 26 (1964), 211±252.
Carlin, B., Polson, N. and Stoer, D., `A Monte Carlo approach to nonnormal and non-linear state-space modeling', Journal of the American Statistical Association, 87 (1992), 493±500.
Chib, S., `Bayes regression with autoregressive errors, a Gibbs sampling approach', Journal of Econo-metrics, 58 (1993), 275±294.
Chib, S. and Greenberg, E., `Bayes inference in regression models with ARMA(p, q) errors', Journal of Econometrics, 64 (1994), 183±206.
Fisher, J. C. and Pry, R. H., `A simple substitution model of technological change', Technological Forecasting and Social Change, 3 (1971), 75±88.
Geisser, S. and Eddy, W., `A predictive approach to model selection', Journal of the American Statistical Association, 74 (1979), 153±160.
Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M., `Illustration of Bayesian inference in normal data models using Gibbs sampling', Journal of the American Statistical Association, 85 (1990), 972±985.
Gelfand, A. E. and Smith, A. F. M., `Sampling-based approaches to calculating marginal densities', Journal of the American Statistical Association, 85 (1990), 398±409.
Gelman, A. E. and Rubin, D., `Inference from iterative simulation using multiple sequences', Statistical Science, 7 (1992), 457±472.
Geman, S. and Geman, D., `Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images', IEEE Transactions on Pattern Analysis and Machine Intelligence, 6 (1984), 721±741.
Gompertz, B., `On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies', Philosophical Transactions of the Royal Society of London, 115 (1825), 513±585.
Granger, C. W. J. and Newbold, P., `Forecasting transformed series', Journal of the Royal Statistical Society, B, 38 (1976), 189±203.
Hammersley, J. M. and Handscomb, D. C., Monte Carlo Methods, London: Chapman and Hall, 1964. Keramidas, E. M. and Lee, J. C., `Forecasting technological substitutions with concurrent short time
series', Journal of the American Statistical Association, 85 (1990), 625±632.
Kong, A., Liu, J. S. and Wong, W. H., `Sequential imputations and Bayesian missing data problems', Journal of the American Statistical Association, 89 (1994), 278±288.
Lee, J. C. and Lu, K. W., `On a family of data-based transformed models useful in forecasting techno-logical substitutions', Technotechno-logical Forecasting and Social Change, 31 (1987), 61±78.
Lee, J. C. and Lu, K. W., `Algorithm and practice of forecasting technological substitutions with data-based transformed models', Technological Forecasting and Social Change, 36 (1989), 401±414.
Lee, J. C. and Liu, K. C., `Bayesian analysis of a general growth curve model with power transformations and AR(1) dependence', paper presented at the 1996 joint statistical meeting in Chicago.
Liu, J. S. and Chen, R., `Blind deconvolution via sequential imputations', Journal of the American Statistical Association, 90 (1995), 567±576.
Ljung, G. M. and Box, G. E. P., `Analysis of variance with autocorrelated observations', Scandinavian Journal of Statistics, 7 (1980), 172±180.
Marriott, J., Ravishanker, N., Gelfand, A. and Pai, J., `Bayesian analysis for ARMA processes: complete sampling based inference under exact likelihoods', in Berry, D., Chaloner, K. and Geweke, J. (eds), Bayesian Statistics and Econometrics: Essays in Honor of Arnold Zellner, (1994), 241±254.
McCullagh, P. and Nelder, J. A., Generalized Linear Models, 2nd edn. London: Chapman and Hall, 1989. McCulloch, R. and Tsay, R. S., `Bayesian analysis of autoregressive time series via the Gibbs sampler',
Journal of Time Series Analysis, 15 (1994), 235±250.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E., `Equations of state calculations by fast computing machines', Journal of Chemical Physics, 21 (1953), 1087±1091.
Nielson, A. C. Inc., private communications (1985).
Sharif, M. N. and Islam, M. N., `The Weibull distribution as a general model for forecasting technological change', Technological Forecasting and Social Change, 18 (1980), 247±256.
Stapleton, E., `The normal distribution as a model of technological substitution', Technological Forecasting and Social Change, 8 (1976), 325±334.
Tanner, M. and Wong, W., `The calculation of posterior distributions by data augmentation' (with discussion), Journal of the American Statistical Association, 81 (1987), 82±86.
Tierney, L., `Markov chains for exploring posterior distributions' (with discussion), The Annals of Statistics, 22 (1994), 1701±1758.
Zellner, A. and Tiao, G. (1964), `Bayesian analysis of the regression model with autocorrelated errors', Journal of the American Statistical Association, 59 (1964), 763±778.
Author's biographies:
Lynn Kuo is a professor in the Department of Statistics at the University of Connecticut. She received her PhD from the University of California at Los Angeles. Her current research interests include Bayesian computation, software reliability, and event history analysis.
Jack Lee is professor and director at the Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan. He received his PhD from the State University of New York at Bualo. His current research interests include time series, growth curve, and multivariate analysis.
Peter Cheng is an associate professor in the Department of Statistics at Ming Chuan University, Taipei, Taiwan. He received his PhD from Columbia University. His research interests include bioassays with historical controls and isotonic regression methods for tumour analysis.
Jerey Pai is an assistant professor at the Warren Center for Actuarial Studies and Research, Faculty of Management, University of Manitoba, Winnipeg, Canada. He received his PhD from the University of Connecticut. His research interests include time series, volatility and compound loss models.
Authors' addresses:
Lynn Kuo, Department of Statistics, University of Connecticut, Storrs, CT 06269-3120, USA. Jack Lee, Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan.
Peter Cheng, Department of Statistics, Ming Chuan University, Taipei, Taiwan.
Jeerey Pai, Warren Center for Actuarial Studies & Research, Faculty of Management, University of Manitoba, Winnipeg, Manitoba, R3T 5V4 Canada.