Bayes inference for technological substitution data with data-based transformation

(1)

Bayes Inference for Technological

Substitution Data with Data-based

Transformation

LYNN KUO

University of Connecticut, Storrs, USA

JACK LEE

National Chiao Tung University, Hsinchu, Taiwan

PETER CHENG

Ming Chuan University, Taipei, Taiwan

JEFFREY PAI

University of Manitoba, Winnipeg, Canada ABSTRACT

Bayesian inference via Gibbs sampling is studied for forecasting techno-logical substitutions. The Box±Cox transformation is applied to the time series AR(1) data to enhance the linear model ®t. We compute Bayes point and interval estimates for each of the parameters from the Gibbs sampler. The unknown parameters are the regression coecients, the power in the Box±Cox transformation, the serial correlation coecient, and the vari-ance of the disturbvari-ance terms. In addition, we forecast the future techno-logical substitution rate and its interval. Model validation and model choice issues are also addressed. Two numerical examples with real data sets are given.

KEY WORDS AR(1); Box±Cox transformation; Metropolis-within-Gibbs

sampling; model choice; prediction

INTRODUCTION

As technology advances, new products replace the old ones. For example, colour televisions replace black-and-white televisions, digital telephone switching systems replace analog switching systems, etc. The rate of change, called technology penetration, is de®ned as the ratio of the number of new products to the combined total of new and old products. A typical growth curve such as Gompertz, logistic, normal, or Weibull is not quite adequate in ®tting and forecasting the penetration data. For example, let Ftbe the technology penetration at time t. Then under a

logistic growth curve assumption, we expect to see a linear growth function for log Ft=1 ÿ Ft.

However, when dealing with real data, Lee and Lu (1987, 1989) and Keramidas and Lee (1990)

CCC 0277±6693/97/020065±18 Received August 1995

(2)

found that a more ¯exible transformation of yt Ft=1 ÿ Ft gave much superior predictive

accuracies. This is primarily due to the violations of the linearity assumption for the growth function and of the independence assumption for the disturbance terms. Lee and Lu found that the residuals of a Box±Cox (1964) transformation of yt ®tted with a linear growth function

exhibit strong positive ®rst order autoregressive dependence (AR(1) dependence). They incorporate the Box±Cox transformation in their frequentist's analysis to enhance the linearity of the growth function and to model the proper dependence structure among the observations. In this paper, we pursue a Bayesian analysis along the same line. For a penetration data set with colour television, we show our forecast improves upon the predictor given in Lee and Lu.

A family of data-based transformed models for forecasting technological substitutions has been empirically shown in Lee and Lu (1987, 1989) to be quite useful for short-term forecasts. These models are more general than the four well-known S-shaped growth curve models: logistic, normal, Weibull, and Gompertz. Basically, we ®rst select a transformation from the following choices: M1 y_t Ft=1 ÿ Ft M2 yt expfFÿ1Ftg; M3 yt ÿ log1 ÿ Ft; and M4 yt ÿ 1=log Ft 1

Then, we apply the widely used Box±Cox transformation to the AR(1)-dependent data. More speci®cally, let y1, . . . , ynbe a set of ®rst-stage transformed n observations. The second-stage

transformation is de®ned by ylt yt n l_{ÿ 1=l} _{when l 6 0} log yt n when l 0 2 where n is a known constant such that yt n > 0 for all t. In practice n is set to 0 if all yt's are

positive. Furthermore, the ylt ; t 1; . . . ; n, are assumed to be normally distributed with the

mean linear in the regression parameters and with the covariance matrix s2_{S, where}

S 1 ÿ r2ÿ1_{V; V c}

ab; cab rj aÿb j, for a; b 1; . . . ; n, and ÿ1 < r < 1. In other words,

yl_yl

1 ; . . . ; yln 0 xbb a 3

where xbb is the growth function, bb a; b0_{, and x is the design matrix with 1's in the ®rst column}

and t1to tnor log(t1) to log(tn) in the second column, depending on the growth curve model being

considered (see Lee and Lu, 1987). The vector of disturbance terms a is assumed to be distributed as a multivariate normal with mean vector 0 and covariance matrix s2_{S. In case a higher degree}

polynomial is assumed for the growth function, the dimensions of x and bb can be adjusted easily. For example, if a second-degree polynomial in time is assumed for the growth function, then the third column of x is t2

1 to t2nor [log(t1)]2to [log(tn)]2.

The above families of transformations include the four commonly used link functions: the logit, the probit, the complementary log±log and log±log functions as de®ned in McCullagh and Nelder (1989, p. 108). These four link functions are obtained by applying the transformations in equations (1) and (2) with l 0.

Lee and Lu (1987, 1989) derive the maximum likelihood estimates of the model in equation (3). This paper studies the problem from a Bayesian point of view. In addition to Bayesian inference

(3)

on the parameters, we address prediction for future technology penetrations and predictive intervals. Instead of the plug-in method given in Lee and Lu, we obtain a Bayesian predictive distribution. A Gibbs sampler is used to approximate this predictive distribution and its functionals. This is a Markov chain Monte Carlo algorithm that simulates variates idealistically from the posterior distribution of bb; r; l, and s given the data. The transitional matrix of the Markov chain is written as a product of full conditional densities. The stationary distribution of this Markov chain is the desired posterior density. In addition to Bayesian inference and prediction, we address the issue of model adequacy and model selection using prequential predictive densities. Model selection addresses the question of which of the four ®rst stage transformations together with the Box±Cox transformation is the best. Model adequacy checks whether any of the models are appropriate in forecasting.

We re-analyse two real data sets. One is the technology penetration data for colour televisions provided by Nielson (1985). The other is the penetration data for electronic telephone switching systems of a telephone company. The ®rst data set has been analysed by Lee and Lu (1987, 1989). We show our Bayes solutions to all four models in equation (1) together with the Box±Cox transformation are adequate in ®tting the data. Moreover, we show our forecast intervals improve upon the frequentist's ones given in Lee and Lu with better accuracy and shorter lengths.

We seek non-informative priors, so the analysis is more focused on the likelihood. The parameters bb; r; l, and s are assumed to have independent prior distributions. The prior on bb is assumed to be a bivariate normal with mean m_a; m_b0_{and covariance matrix}

Sp s 2 a 0 0 s2 b 4 The prior on l and r can be quite arbitrary, because the Metropolis (1953) algorithm is used to generate these variates. In fact, we have chosen the prior on r uniform on ÿ1; 1, and the prior on l to be ¯at (improper) on ÿ1; 1. We have also explored the proper prior on l that is uniform over ÿ4; 4, and found no appreciable dierence. The prior density on s is chosen from a conjugate family

ps /_s_{g 1}1 eÿZ=2s2

g > 0; Z > 0

We will denote this modi®ed inverse gamma density by IG0_{g; Z. Note this prior is equivalent to}

choosing s2 _{with the inverse gamma density IGg=2; 2=Z as de®ned by Berger (1985, p. 561). If}

the non-informative prior ps 1=s is desired, we can let g ! 0 and Z ! 0. When sa! 1 and

sb! 1 as being done in our numerical examples, then our prior on bb; s; and r is equivalent to

the one given by Zellner and Tiao (1964), where Bayesian analysis of the model without the Box± Cox transformation was studied.

When n is small, the parameters are hard to estimate. However, with several independent concurrent short series sharing a common AR(1) covariance structure, we will then have a general growth curve model with AR(1) covariance structure. Keramidas and Lee (1990) apply the maximum likelihood method. Lee and Liu (1996) extend the algorithm here to Bayesian predictive inference for the general growth curve model with AR(1) dependence.

Gibbs sampling has been applied extensively to various problems. We mention only the related work in time series: Albert and Chib (1993); Carlin, Polson, and Stoer (1992); Chib (1993); Chib and Greenberg (1994); Marriott et al. (1995); and McCulloch and Tsay (1994).

(4)

The next section develops the conditional densities used in the Gibbs sampler. A detailed discussion of the Gibbs sampler and the Metropolis algorithm is given in Appendix 2. Then we develop the predictive distribution and the one-step forecast. We discuss model adequacy and model selection. Finally, two numerical examples based on real penetration data are given.

MODEL AND ALGORITHM

In this section, we describe the general model and specify the conditional densities used in the Gibbs sampler. Let us ®rst transform Ftto ytvia equation (1). Then we transform ytto ylt by

applying the Box±Cox transformation (2). Let us note that because of the special structure for S, its inverse is Sÿ1 1 ÿr 0 0 0 ÿr 1 r2 _ÿr ₀ ₀ 0 ÿr 1 r2 ₀ ₀ ... ... ... ... ... 0 0 0 ÿr 1 0 B B B B B @ 1 C C C C C A

Given the model yl_t a bt at; where at follows the AR(1) model described in the

Introduction, we can write our likelihood (cf. Ljung and Box, 1980; Lee and Lu, 1987) as Lbb; s; r; l; y / sÿn_{1 ÿ r}21=2 _{exp ÿ} 1 2s2Sy; bb; r; l _Y_n i 1 ylÿ1 i 5 where

Sy; bb; r; l yl_{ÿ xbb}0_Sÿ1_yl_{ÿ xbb}

with

x 1 1 1 1_{1 2 3 n}

0

1 t:

It is worth noting that the above models with l 0 in the Box±Cox transformation reduce to the four existing growth curve models used in technology substitution data. They are: (1) Logistic (Fisher and Pry, 1971): logFt=1 ÿ Ft a bt at; (2) Normal (Stapleton, 1976): Fÿ1Ft

a bt at; (3) Weibull (Sharif and Islam, 1980): logÿlog1 ÿ Ft a b log t at; and (4)

Gompertz (1825): ÿlogÿlog Ft a bt at. For the Weibull model, log t is used for t in

de®ning x. We can also write the likelihood in equation (5) equivalently in terms of F F1; . . . ; Fn0 by a change of variable technique. Then we obtain four expressions

corres-ponding to the four transformations in equation (1) respectively. We can pursue inference and prediction using either the likelihood in y or the likelihood in F. Most of the subsequent developments are derived from equation (5) mostly because it explains the AR(1) process more directly.

Combining the likelihood function (5) with the prior pbb; s; r; l discussed in the Introduction, we obtain the posterior density

(5)

Then the Gibbs algorithm proceeds as follows:

(1) Generate a given b; r; l; s; y from the normal distribution N mas2 bs2a s2_as2 a ; s2_s2 a s2_as2 a ; where a 10_Sÿ1_{1 and b 1}0_Sÿ1_yl_{ÿ bt:}

Consequently, if sa ! 1 in the prior, we generate a from the conditional distribution

Nb=a; s2_=a.

(2) Generate b given a; r; l; s; y from the normal distribution: N mbs2 ds2b s2_cs2 b ; s2s2b s2_cs2 b ! ; where

c t0_Sÿ1_{t and d t}0_Sÿ1_yl_{ÿ a1:}

Consequently, if sb! 1 in the prior, we generate b from the conditional distribution

Nd=c; s2_=c.

(3) Generate r given a; b; l; s; y using the Metropolis (1953) method, where f r / 1 ÿ r21=2 _eÿ1=2s2_Sy;bb;r;l

pr: (4) Generate l given a; b; r; s; y using the Metropolis method, where

f l / eÿ1=2s2_Sy;bb;r;lYn i 1 ylÿ1 i ( ) pl:

(5) Generate s given a; b; r; l; y from the modi®ed inverse gamma distribution IG0_{g n; Z Sy; bb; r; l:}

The Metropolis algorithm used in steps (3) and (4) is described in Appendix 2. An alternative blocking algorithm that combines steps (1) and (2) into one step should be more ecient especially when a and b are highly correlated. Let S_x0_S

ÿ1x=s2 Sÿ1p . That is, we generate

the vector bb given s; r; l from the following bivariate normal density: a b N Sÿ1 x0Sÿ1yl s2 Sÿ1p m_ma_b ; S ÿ1

instead of generating a conditioning on b and the rest of the parameters and similarly for b in two steps. We implemented the latter without blocking because the correlation coecient between a and b is not high in our numerical examples.

(6)

FORECAST

Having obtained the posterior distribution of the parameters, we can use it to predict the future values of y. We will illustrate this by the one-step forecast. Generalization to forecast the lth future period is straightforward and discussed at the end of this section. Suppose we are at the nth period. Let Dndenote the data set fy1; . . . ; yng. Let xi 1; i. We suppress x1; . . . ; xn; xn1in

the notation for convenience, because they are the known covariates. Let yy denote the parameters bb; r; l; s. Let yyk;s _{denote the variate of yy drawn in the kth iteration and sth}

replication of the Gibbs sampler given the data set Dn. The same superscript notation will be

applied to subsets of yy and their functionals. We usually take k suciently large. Prediction for the n 1th period follows from the predictive density

f Yn 1j Dn

Z

f Yn 1j Dn; yypyy j Dn dyy

where Yn1 denotes the random future observation at period n 1. This density can be

approximated by Monte Carlo integration from the Gibbs sample: ^fYn 1j Dn 1_r

Xr s 1

f Yn 1j Dn; yyk;s 6

This predictive density can be obtained by simply drawing ys_{n 1} for s 1; . . . ; r from f Yn 1j Dn; yyk;s for each yyk;s in the replications. The mean of this predictive distribution is

computed from

EYn 1j Dn EEYn 1j Dn; yy j Dn 7

To evaluate the inner expectation, let us consider two cases. We assume n 0 in equation (2) for simplicity.

Case (1): l 6 0.

It follows from equation (2) that

yn1 yl_{n 1}l 11=l 8 where yl_{n 1} xn 1bb an1 and an 1 ran en1: Note an yln ÿ xnbb y l nÿ 1 l ÿ xnbb: Therefore, yl_{n 1} xn 1bb r y l nÿ 1 l ÿ xnbb en 1:

(7)

Therefore, from equation (8), we have yn 1 xn1bb r y l nÿ 1 l ÿ xnbb en 1 l 1 1=l : 9 Case (2): l 0.

We can derive the following equation by a similar method:

yn 1 expxn1bb rlog ynÿ xnbb en 1: 10

Let yk;s_{n 1} denote the functional (9) evaluated at the kth iteration and sth replication of the sampler, i.e. yk;s_{n 1} xn 1bbk;s rk;s y lk;s n ÿ 1 lk;s ÿ xnbbk;s ek;s_{n 1} lk;s₁ 1=lk;s 11 Note that ek;s_{n 1} is generated from the N0; sk;s2_{distribution in the kth iteration and sth}

replication of the Gibbs sampler.

Therefore, combining equations (7), (9), and (6), when l 6 0, we predict Yn 1by

^yn 11_r

Xr s 1

yk;s_{n 1}: 12

Alternatively, we can predict Yn 1 using the median of the Gibbs sample,

~yn 1 median yk;sn 1

n o_r

s 1

: 13

Prediction intervals and quantiles of the functional Yn 1 can be computed similarly from the

sample yk;s_{n 1}; s 1; . . . ; r.

Similarly, when l 0, we can use the following variates to predict Yn 1:

yk;s_{n 1} expxn 1bbk;s rk;slog ynÿ xnbbk;s ek;sn 1: 14

Then we use the mean or the median of the sample to predict yn 1 and construct predictive

intervals from the Gibbs sample (14).

Theoretically, the posterior probability of the case l 0 is 0 when the prior on l is a continuous measure. However, we had to program the two cases l 0 and l 6 0 of the forecast rule separately to avoid an over¯ow problem.

Having obtained ^yn 1, we apply the inverse transformations of equation (1) to predict Fn 1,

i.e. (1) ^Fn 1 ^yn 1= ^yn 1 1; (2) ^Fn 1 Flog ^yn 1; (3) ^Fn 1 1 ÿ expÿ^yn 1; and (4)

^

Fn 1 expfÿ1=^yn 1g. The Bayesian predictive intervals for Fn 1are calculated from the Gibbs

sample of yn 1 in equations (11), (14), and the above transformations.

In addition to the naive inverse transformation, we can add a bias-corrected term which involves only the second derivative of the inverse transformation in the Taylor series expansion. However, as noted in Keramidas and Lee (1990, p. 628), the bias-correction does not guarantee to produce better forecasts as the ®rst derivative vanishes. Granger and Newbold (1976) studied

(8)

other forecast rules for the transformed variables using Hermite polynomials. We have chosen only the naive inverse transformation in this paper for his simplicity. Moreover, the naive ones have demonstrated desirable forecast performance as in the section of numerical examples.

Now we brie¯y discuss how we generalize the above one-step prediction rule to multi-step prediction. Let Yf Yn 1; . . . ; Yn l denote the vector of future observations. Then the joint

density of Yfgiven Dncan be written as

f Yfj Dn f Yn 1j Dn; yyf Yn 2j Yn 1; Dn; yy f Yn lj Yn 1; . . . ; Yn lÿ1; Dn; yy

Therefore, the samples from the joint future density can be drawn sequentially with each step similar to the single step in equation (6); a generated new future observation in each step is incorporated as data into the next step of the sequential procedure.

MODEL VALIDATION AND MODEL CHOICE

Both model adequacy and model selection issues are discussed in this section. Model adequacy is checked by comparing the observed yn 1to its 95% predictive interval developed in the previous

section for each n in a series. A model is judged to be adequate if about 95% of the intervals contain the observed future values.

For model selection, we consider three criteria: mean squared error (MSE), mean absolute relative deviation (MARD), and the prequential pseudo-Bayes factor (PPBF).

To de®ne the PPBF, let us de®ne the prequential conditional predictive ordinate (PCPO), which is the predictive density of Yn 1evaluated at the future observed value yn 1. Therefore, we

have from equation (6), dn 1 ^fyn 1j Dn 1_rXr s 1 1 2p p sk;s exp ÿ ylk;s_ÿ1 n 1 lk;s ÿ xn 1bbk;sÿ rk;sak;sn !₂ 2sk;s2 8 > > > > > < > > > > > : 9 > > > > > = > > > > > ; ylk;s_ÿ1 n 1 15 where ak;s n yl k;s n ÿ xbbk;s:

Consequently, the PCPO for Fn1is computed by

cn 1 ^fFn 1j Dn ^fyn 1j Dn_dFdy y yn 1

16 where dy/dF is computed from equation (1).

Let N denote the total number of periods in the data set. We would like to monitor data for a reasonable time period, say for t 1; . . . ; I, before considering model selection. Then we evaluate the following three prediction measurements for each of the models for the periods I 1 to N. All the prediction rules are computed prequentially as described in the previous section, i.e.

^

(9)

in equation (1) together with the AR(1) model for the Box±Cox transformation, we can evaluate the following three measurements:

MSE 1 N ÿ I XN t I 1 ^Ftÿ Ft2 17 MARD _{N ÿ I}1 XN t I 1 j ^Ftÿ Ftj Ft 18 PPBF YN t I 1 ct 19

The best model is the one with smallest MSE, smallest MARD, or the largest PPBF.

Note the PPBF is dierent from the pseudo Bayes factor in Geisser and Eddy (1979) where the cross-validation idea is used. In fact, the PPBF evaluates the conditional joint predictive density of the data yI 1; . . . ; yN given y1; . . . ; yI.

To compute the PPBF in equation (19), we need to run each of the N ÿ I Gibbs samplers from scratch for each prequential conditional predictive ordinate. The sequential imputation algorithm developed by Kong, Liu, and Wong (1994) and Liu and Chen (1995) can be modi®ed to suit for this situation. The sequential imputation algorithm provides a desirable alternative approach to this problem. With the new data yI 1; . . . ; yNcoming in sequentially, the algorithm

sequentially impute the unknown parameters using the importance sampling weights based on the same old data batch y₁; . . . ; yI. As long as the new data are not wildly dierent from the old

batch, the importance sampling weights should be well behaved. NUMERICAL EXAMPLES

Analysis based on two real-life data sets is summarized here. The ®rst data set, listed in Table I, initially provided by Nielson Inc., is the penetration data for colour television for the period 1956±85. The second data set, listed in Table IV, is the penetrations of electronic telephone switching systems for the years from 1967 to 1984 for a telephone company. The numbers Ft

listed are the fractions of the total number of new technology users (users of colour television, electronic switching systems, etc.) divided by the total number of new and old technology users (users of colour and black and white televisions, subscribers for electronic (new) and electro-mechanical (old) switching systems).

Table I. Technology penetration data for colour TV

Year Penetrations Year Penetrations Year Penetrations

1956 0.00052 1966 0.09694 1976 0.73606 1957 0.00219 1967 0.16325 1977 0.77065 1958 0.00394 1968 0.24175 1978 0.77984 1959 0.00569 1969 0.32034 1979 0.80926 1960 0.00743 1970 0.35744 1980 0.83028 1961 0.00943 1971 0.41015 1981 0.82916 1962 0.01208 1972 0.48631 1982 0.87595 1963 0.01889 1973 0.55355 1983 0.88703 1964 0.03120 1974 0.62251 1984 0.90465 1965 0.05332 1975 0.68394 1985 0.91519

(10)

Diuse priors on the parameters are chosen for both data sets. The parameters in equation (4) are chosen such that sa! 1 and sb! 1. Given this limiting case, our posterior does not

depend on ma or mb. The prior on r is uniform ÿ1; 1 to re¯ect the equal likelihood of r. The

prior on l is chosen to be the improper ¯at prior, i.e. pl 1 for all l. The prior on s is also improper, namely, ps 1=s.

Table II lists the point estimates, estimates of the standard deviations, and the percentiles for each of the parameters for the complete colour television data set (1956 to 1985) and the model with the reciprocal link transformation (M4). The Bayes estimates for the parameters are not signi®cantly dierent among the four link transformations; therefore we omit the results for the other link transformations. Initially, the Bayes estimates are computed from the Gibbs sampler with 53 iterations and 500 replications and with 50 loops in each Metropolis algorithm to generate r and l for each iteration and replication of the Gibbs sampler. The starting points for the replications for each parameter are chosen from random perturbations around the maximum likelihood estimates. The convergence of the Gibbs sampler is monitored by examining their empirical quantiles. This leads us to iterate the above Gibbs sampler ten more times to yield the results in Tables II and III. The Bayesian predictive intervals can easily be obtained from Table II. For example, the 95% intervals are read from the 2.5% and 97.5% columns.

For the ®rst data set, we produce the one-step-ahead prediction rule for the year 1966 using both the mean and the median of the Gibbs sample based on the data set for the year 1956 to 1965. Then we predict the subsequent years prequentially, i.e. using the data from 1956 to 1966 to predict the penetration for 1967, etc. In Figure 1, the one-step forecasts for the years 1966 to 1985 using the medians predictors are plotted for each of the four link transformations. They are also compared with the actually observed data. This ®gure shows that the logistic transformation (M1) produces the best forecasts except for the years 1974±1976 and 1982.

Figure 2 provides model adequacy checks in which the 95% Bayesian credible intervals are plotted pointwise for each of the one-step-ahead prediction rules. The ®gure shows that all four models are adequate because the future observed values are always contained in the intervals. The con®dence bands become narrower as time increases as expected because more data are available. It can be observed that the four models dier less on the left tails of the distributions than on the right tails.

The 90% and 95% frequentist's predictive intervals can be computed from a Student-t distribution. Let ^y _{denote the frequentist's forecast of y}

n given in the Appendix 1. Then

^yl+ta=2n ÿ 2^s will yield a 1001 ÿ a% predictive interval for the time t n, where ^s is

com-puted similarly to equations (A15) and (A17) of Lee and Lu (1987, p. 76). Hence approximately 1001 ÿ a% predictive intervals for yn can be obtained from ^y^lÿ ^lta=2n ÿ 2^s1=^l < yn<

^y^l_^lt_{a=2n ÿ 2}_^s1=^l_{: The frequentist's predictive rules are transformed back to obtain the}

predictive rules ^F. In Figure 3, these frequentist's 95% intervals are plotted against the Bayesian intervals for the years 1966 to 1985, both with the logistic link. It can be seen that the Bayesian

Table II. Gibbs approximation to the estimates for colour TV data

Mean S.D. 2.5% 5% 25% 50% 75% 95% 97.5% ^a ÿ2:38 0.30 ÿ2:94 ÿ2:79 ÿ2:54 ÿ2:38 ÿ2:25 ÿ1:88 ÿ1:76 ^b 0.15 0.01 0.13 0.14 0.15 0.16 0.16 0.17 0.17 ^r 0.90 0.07 0.75 0.77 0.87 0.92 0.96 0.99 0.99 ^l ÿ0:08 0.06 ÿ0:20 ÿ0:18 ÿ0:12 ÿ0:08 ÿ0:04 0.02 0.04 ^s 0.08 0.01 0.06 0.06 0.07 0.08 0.09 0.10 0.10

(11)

Figure 1. Plot of the Bayesian one-step forecasts for each of the four links versus years

(12)

predicting intervals are not only shorter in length than the frequentist's ones but also more accurate in prediction.

Table III addresses the model selection question. The three criteria MSE, MARD, and PPBF are computed for the last 20 periods, i.e. 1966 to 1985. The three measurements are denoted by MSE(B), MARD(B), and PPBF. Furthermore, MSE and MARD are also computed using the frequentist's one-step prediction rule. They are denoted by MSE(F) and MARD(F). The table shows that the Bayesian predictor produces smaller MSE and MARD than the frequentist's predictor. Table III also shows that the logistic transformation (M1) yields the smallest MSE and MARD, Bayesian or frequentist's, among the four transformations. This is con®rmed by Figure 2. On the other hand, the reciprocal log transformation (M4) yields the largest PPBF. This is con®rmed by Figure 4, where the prequential CPOs for the years 1966 to 1985 are plotted for each of the link functions. It is not entirely surprising that we select dierent models due to dierent criteria. The skewedness of the predictive density revealed in Figure 2 manifests this

Figure 3. Plot of 95% Bayesian and frequentist's one-step-ahead predictive intervals and for the logistic link versus years

Table III. Model selection for the colour TV data

MSE(B) MARD(B) PPBF(log) MSE(F) MARD(F)

M1 0.00034 0.047 37.39 0.00038 0.052

M2 0.00040 0.051 46.74 0.00052 0.058

M3 0.00070 0.063 40.99 0.00054 0.064

(13)

phenomenon. Figure 4 further shows that models 2 and 4 are quite comparable and the logistic (M1) model ®ts well for the earlier periods up to 1972.

Table V lists the Bayes estimates for the various parameters and its credible intervals based on the telephone switching data set. Note that the estimates for b are positive for both data sets. This

Figure 4. Plot of the prequential conditional predictive ordinates (PCPO) for each of the four links versus years

Table IV. Penetrations of electronic telephone switching systems

Year Penetrations Year Penetrations Year Penetrations

1967 0.00071 1973 0.11259 1979 0.30486 1968 0.00386 1974 0.14553 1980 0.32404 1969 0.00447 1975 0.16736 1981 0.35516 1970 0.02003 1976 0.19083 1982 0.37708 1971 0.02961 1977 0.20130 1983 0.41134 1972 0.05974 1978 0.26299 1984 0.43709

Table V. Gibbs approximation to the estimates for the telephone data set

Mean S.D. 2.5% 5% 25% 50% 75% 95% 97.5% ^a ÿ2:23 0.30 ÿ2:83 ÿ2:64 ÿ2:36 ÿ2:21 ÿ2:08 ÿ1:85 ÿ1:79 ^b 0.11 0.01 0.08 0.09 0.10 0.11 0.12 0.14 0.14 ^r 0.70 0.20 0.30 0.34 0.55 0.73 0.87 0.97 0.98 ^l 0.46 0.05 0.35 0.38 0.43 0.46 0.50 0.55 0.57 ^s 0.06 0.02 0.03 0.04 0.05 0.06 0.07 0.09 0.10

(14)

is reasonable because we would expect substitutions to increase with time. The estimate for l for the TV data is much smaller than that of the telephone data indicating that the dependence for the latter is much stronger. Table VI lists the model selection measures for the last eight periods, i.e. 1977 to 1984. It reveals that the logistic transformation (M1) is preferred by the prequential pseudo-Bayes factor. The probit transformation (M2) is preferred by the Bayesian MSE and MARD criteria. The Bayesian prediction rules are better than the frequentist's ones in prediction. Moreover, our ®gures show that the Bayesian prediction rules have shorter intervals than the frequentist's ones and more accurate; all four link transformations, with the Box±Cox transformation and AR(1) models, are adequate. These ®gures are omitted.

APPENDIX 1: FREQUENTIST'S PREDICTION RULE The frequentist's prediction rule is given here. Let y_y

n 1; . . . ; yn l0 be an l-dimensional

vector of future observations to be predicted on the basis of the past observations y y₁; . . . ; yn0. Let x denote the design matrix corresponding to the future observations.

Furthermore, let Covyl0

; yl0

0_s2_{=1 ÿ r}2_c_ab_{, where c}_ab_rj aÿb j_{; a; b 1; . . . ; n l:}

When the unknown parameters are replaced by their maximum likelihood estimates from equation (A3), the forecasts of y _{can be written as}

^y_{l 1} f1 ^lx^bb ^rr y^ln ÿ xn^bbg1=^lÿ n when ^l 6 0 expx^bb ^rr y^l n ÿ xn^bb ÿ n when ^l 0 ( A1 where ^bb x0_V_^ÿ1_xÿ1_x0_V_^ÿ1_y^l_{; ^rr ^r; ^r}2_{; . . . ; ^r}l0_{, 1 1; . . . ; 1}0_{, ^}_{V ^c} ab; ^cab ^rj aÿb j, xn

and y^ln are the nth row of x and y^l respectively, and ^l and ^r are the MLEs of l and r,

respectively. For the one-step-ahead forecast, we set l 1 in the above discussion and let ^y denote the ®rst row of ^y_{l 1}. We assume n is 0 in the two examples.

APPENDIX 2: GIBBS SAMPLING

We describe brie¯y the Gibbs sampling procedure. Suppose we desire to estimate f U1; U2; . . . ; Upj Dn, the posterior joint density of U1; U2; . . . ; Up, given the data Dn. The

algorithm assumes that we can generate variates from the conditional densities f Uij fUjgj 6 i; Dn.

The algorithm proceeds as follows. Let us start with initial values, U0₁ ; U0₂ ; . . . ; U0

p . Generate

a value U1₁ from the conditional density f U1j U02 ; . . . ; U0p ; Dn. Similarly, generate a value

U1₂ from the conditional density f U2j U11 ; U03 ; . . . ; U0p ; Dn, and continue until the value

U1

p is generated from the conditional density f Upj U11 ; . . . ; U1pÿ1; Dn. With this new

realization of the values U1_U1

1 ; . . . ; U1p replacing the old values, we can continue to

iterate until the kth realization. Under very mild regularity conditions, this Markov chain

Table VI. Model selection for the telephone data set

MSE(B) MARD(B) PPBF(log) MSE(F) MARD(F)

M1 0.00028 0.047 25.72 0.00058 0.061

M2 0.00021 0.038 20.92 0.00141 0.106

M3 0.00049 0.059 17.88 0.00317 0.173

(15)

converges to a stationary distribution for large k. The vector Uk₁ ; . . . ; Uk

p has a distribution

that is approximately equal to f U; . . . ; Upj Dn. By starting independent initial choices, we can

also replicate the above iterations r times. Let Ui;s_Ui;s

1 ; . . . ; Ui;sp denote the realization

of U for the ith iteration and sth replication. The posterior moments, functionals, and credible sets can be computed from the empirical measure assigning weight 1/r to each Uk;s₁ ; . . . ; Uk;s

p ; s 1; . . . ; r, to compute posterior features.

For the issue of convergence and the issue of pragmatic choices of k and r, consult Geman and Geman (1984), Tanner and Wong (1987), Gelfand and Smith (1990), Gelfand et al. (1990), Gelman and Rubin (1992), and Tierney (1994).

When the conditional densities are not easily identi®ed, such as in cases without conjugate priors, the Metropolis (1953) algorithm or importance sampling methods can be employed. We describe the Metropolis algorithm brie¯y. Suppose we desire to sample a variate from the following generic density:

f U₁j U2; . . . ; Up; Dn R1 f Dnj U1; . . . ; UppU1; U2; . . . ; Up

ÿ1f Dnj U1; . . . ; UppU1; U2; . . . ; Up dU1 A2

Let f (U1) denote the conditional density in equation (A2), suppressing the conditioning

variables for brevity. Let us de®ne a transition kernel q(U1, X ), which maps U1to X. If U1is a

real variable with range in ÿ1; 1, we can construct q such that X U1 s0Z with Z being

the standard normal random variate and s02_{re¯ecting the conditional variance of U}

1in equation

(A2). Then the Metropolis algorithm proceeds as follows: Step 1: start with any point U0₁ , and stage indicator j 0.

Step 2: generate a point X according to the transition kernel qU j ₁ ; X :

Step 3: update U j ₁ to U j 1₁ X with probability p minf1; f X=f U j ₁ g, stay at U j ₁ with probability 1 ÿ p.

Step 4: repeat steps 2 and 3 by increasing the stage indicator until the process reaches a stationary distribution.

Chapter 9 of Hammersley and Handscomb (1964) provides a discussion of why this algorithm works. Note that this algorithm is de®ned by using the ratio of two values of equation (A2). Therefore, all we need is to know the functional form of the likelihood and prior. This spares us the task of evaluating the normalizing constant.

If U1is a bounded variable with range in (a, b), we can use a distribution with support on (a, b)

(for example, uniform over (a, b) distribution) to generate X. In this case, the transition kernel q(U1, X ) does not depend on U1. Alternatively, we can apply the transformation, such as

U0_logU₁_{ÿ a=b ÿ U}

1, to map (a, b) into ÿ1; 1, then use the normal transition kernel

described earlier to generate X using the density of U0_{. We implemented the latter procedure in}

our numerical examples because the normal transitional kernel seems to yield a more ecient algorithm than the uniform transitional kernel. On using the normal kernel, we ®rst transform r to r0_{2 ÿ1; 1 by r}0_{log1 r=1 ÿ r. Then we apply the Metropolis algorithm to the}

function

f r0_/ e3r0=2

1 er03eÿ1=2s

2_S0_y;bb;r0_;l

where S0_{y; bb; r}0_{; l is obtained from Sy; bb; r; l with r replaced by e}r0

ÿ 1=er0

1. We also need to specify sr0(denoted by s0in the above Metropolis algorithm). The quantity s_r0is usually

(16)

chosen to re¯ect the conditional standard deviation of r0 _{given bb; l; s; y. We propose three}

methods to estimate the variance s2 r0:

(1) Start with an arbitrary guess of sr0; use the Gibbs sampler of the r0s to estimate empirically

its variance.

(2) Obtain estimates for the mean and variance of r, denoted by m_r and s2

r, from the SAS

AUTOREG procedure given a ®xed l. Apply the delta method to obtain an estimate of the variance of r0_{, i.e. s}2

r0 s2_r2=1 ÿ m2_r2.

(3) Derive the pro®le likelihood of r and l as in Lee and Lu (1987), lr; l; y ÿn

2log ^s2r; l l ÿ 1 Xn t 1

log yt1₂ log1 ÿ r2 A3

where

^s2_{r; l}1

nylÿ x^bb0Sÿ1ylÿ x^bb with

^bb x0_Sÿ1_xÿ1_x0_Sÿ1_yl

Then, invert the sample information matrix at the maximum likelihood estimates of equation (A3) to obtain the preliminary variance estimates of r and l respectively. This method is adopted in our analysis, because it has the advantage of obtaining both variances. Then we apply the delta method to obtain the variance estimates for r and l. These variance estimates are further re®ned by empirical evaluations of the variance estimates from the Gibbs sampler because the conditional variances are required in the Metropolis algorithm. Having obtained r0 _{from the Metropolis algorithm, we transform}

the r0_{back to r by r e}r0

ÿ 1=er0

1.

An operation like that applied to r can also be applied to the variate l, except that the above method (3) of using the SAS AUTOREG is not applicable for l.

Conditioning on l, our model is a special case of the models that are discussed by Albert and Chib (1993), Chib (1993) and Chib and Greenberg (1994). Therefore, the generation of r can be done more eciently using the idea in Chib and Greenberg. That is the full conditional distribution of r for the uniform prior can be written as

r j data; bb; s2_{/ Cr N^r; o}ÿ1_I ÿ1;1

where Cr is the density of the initial observation and o Xn ÿ1 i 1 yl_i ÿ a ÿ bi2 and ^r oÿ1Xn ÿ1 i 1

(17)

Then the Metropolis step can be done by drawing a r (say, r_{) from N^r; o}ÿ1_{. The chain stays at}

the current value if r _{does not satisfy stationarity; otherwise it moves to r} _{with probability}

given by

min Cr Cr ; 1

ACKNOWLEDGEMENTS

Kuo's research was partially supported by the National Science Foundation Grant DMS-9008021. Lee's research was partially supported by the National Research Council Grants 82-208-M009-054 and 85-2121-M-009-004. Pai's research was partially supported by the Research Foundation, University of Connecticut.

REFERENCES

Albert, J. and Chib, S., `Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts', Journal of Business and Economic Statistics, 11 (1993), 1±16.

Berger, J. O., Statistical Decision Theory and Bayesian Analysis, 2nd edn, New York: Springer-Verlag, 1985.

Box, G. E. P. and Cox, D. R., `An analysis of transformations', Journal of the Royal Statistical Society, A, 26 (1964), 211±252.

Carlin, B., Polson, N. and Stoer, D., `A Monte Carlo approach to nonnormal and non-linear state-space modeling', Journal of the American Statistical Association, 87 (1992), 493±500.

Chib, S., `Bayes regression with autoregressive errors, a Gibbs sampling approach', Journal of Econo-metrics, 58 (1993), 275±294.

Chib, S. and Greenberg, E., `Bayes inference in regression models with ARMA(p, q) errors', Journal of Econometrics, 64 (1994), 183±206.

Fisher, J. C. and Pry, R. H., `A simple substitution model of technological change', Technological Forecasting and Social Change, 3 (1971), 75±88.

Geisser, S. and Eddy, W., `A predictive approach to model selection', Journal of the American Statistical Association, 74 (1979), 153±160.

Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M., `Illustration of Bayesian inference in normal data models using Gibbs sampling', Journal of the American Statistical Association, 85 (1990), 972±985.

Gelfand, A. E. and Smith, A. F. M., `Sampling-based approaches to calculating marginal densities', Journal of the American Statistical Association, 85 (1990), 398±409.

Gelman, A. E. and Rubin, D., `Inference from iterative simulation using multiple sequences', Statistical Science, 7 (1992), 457±472.

Geman, S. and Geman, D., `Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images', IEEE Transactions on Pattern Analysis and Machine Intelligence, 6 (1984), 721±741.

Gompertz, B., `On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies', Philosophical Transactions of the Royal Society of London, 115 (1825), 513±585.

Granger, C. W. J. and Newbold, P., `Forecasting transformed series', Journal of the Royal Statistical Society, B, 38 (1976), 189±203.

Hammersley, J. M. and Handscomb, D. C., Monte Carlo Methods, London: Chapman and Hall, 1964. Keramidas, E. M. and Lee, J. C., `Forecasting technological substitutions with concurrent short time

series', Journal of the American Statistical Association, 85 (1990), 625±632.

Kong, A., Liu, J. S. and Wong, W. H., `Sequential imputations and Bayesian missing data problems', Journal of the American Statistical Association, 89 (1994), 278±288.

(18)

Lee, J. C. and Lu, K. W., `On a family of data-based transformed models useful in forecasting techno-logical substitutions', Technotechno-logical Forecasting and Social Change, 31 (1987), 61±78.

Lee, J. C. and Lu, K. W., `Algorithm and practice of forecasting technological substitutions with data-based transformed models', Technological Forecasting and Social Change, 36 (1989), 401±414.

Lee, J. C. and Liu, K. C., `Bayesian analysis of a general growth curve model with power transformations and AR(1) dependence', paper presented at the 1996 joint statistical meeting in Chicago.

Liu, J. S. and Chen, R., `Blind deconvolution via sequential imputations', Journal of the American Statistical Association, 90 (1995), 567±576.

Ljung, G. M. and Box, G. E. P., `Analysis of variance with autocorrelated observations', Scandinavian Journal of Statistics, 7 (1980), 172±180.

Marriott, J., Ravishanker, N., Gelfand, A. and Pai, J., `Bayesian analysis for ARMA processes: complete sampling based inference under exact likelihoods', in Berry, D., Chaloner, K. and Geweke, J. (eds), Bayesian Statistics and Econometrics: Essays in Honor of Arnold Zellner, (1994), 241±254.

McCullagh, P. and Nelder, J. A., Generalized Linear Models, 2nd edn. London: Chapman and Hall, 1989. McCulloch, R. and Tsay, R. S., `Bayesian analysis of autoregressive time series via the Gibbs sampler',

Journal of Time Series Analysis, 15 (1994), 235±250.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E., `Equations of state calculations by fast computing machines', Journal of Chemical Physics, 21 (1953), 1087±1091.

Nielson, A. C. Inc., private communications (1985).

Sharif, M. N. and Islam, M. N., `The Weibull distribution as a general model for forecasting technological change', Technological Forecasting and Social Change, 18 (1980), 247±256.

Stapleton, E., `The normal distribution as a model of technological substitution', Technological Forecasting and Social Change, 8 (1976), 325±334.

Tanner, M. and Wong, W., `The calculation of posterior distributions by data augmentation' (with discussion), Journal of the American Statistical Association, 81 (1987), 82±86.

Tierney, L., `Markov chains for exploring posterior distributions' (with discussion), The Annals of Statistics, 22 (1994), 1701±1758.

Zellner, A. and Tiao, G. (1964), `Bayesian analysis of the regression model with autocorrelated errors', Journal of the American Statistical Association, 59 (1964), 763±778.

Author's biographies:

Lynn Kuo is a professor in the Department of Statistics at the University of Connecticut. She received her PhD from the University of California at Los Angeles. Her current research interests include Bayesian computation, software reliability, and event history analysis.

Jack Lee is professor and director at the Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan. He received his PhD from the State University of New York at Bualo. His current research interests include time series, growth curve, and multivariate analysis.

Peter Cheng is an associate professor in the Department of Statistics at Ming Chuan University, Taipei, Taiwan. He received his PhD from Columbia University. His research interests include bioassays with historical controls and isotonic regression methods for tumour analysis.

Jerey Pai is an assistant professor at the Warren Center for Actuarial Studies and Research, Faculty of Management, University of Manitoba, Winnipeg, Canada. He received his PhD from the University of Connecticut. His research interests include time series, volatility and compound loss models.

Authors' addresses:

Lynn Kuo, Department of Statistics, University of Connecticut, Storrs, CT 06269-3120, USA. Jack Lee, Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan.

Peter Cheng, Department of Statistics, Ming Chuan University, Taipei, Taiwan.

Jeerey Pai, Warren Center for Actuarial Studies & Research, Faculty of Management, University of Manitoba, Winnipeg, Manitoba, R3T 5V4 Canada.