Forecasting energy consumption in Taiwan using hybrid nonlinear models


H.T. Pao *

Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, 30010 Hsinchu, Taiwan, ROC

Article info

Article history: Received 9 September 2008; received in revised form 25 March 2009; accepted 22 April 2009; available online 15 August 2009

Keywords: Energy consumption; Artificial neural networks; Encompassing test; SEGARCH models; Multi-step-ahead forecasting

Abstract

Electricity and petroleum together account for almost 90% of total energy consumption in Taiwan, so it is critical to model and forecast them accurately. For univariate modeling, this paper proposes two new hybrid nonlinear models that combine a linear model with an artificial neural network (ANN) to develop adjusted forecasts, taking into account heteroscedasticity in the model's input. Both hybrid models can decrease round-off and prediction errors for multi-step-ahead forecasting. The results suggest that the new hybrid models generally produce forecasts which, on the basis of out-of-sample forecast encompassing tests and comparisons of three different statistical measures, routinely dominate the forecasts from conventional linear models. The superiority of the hybrid ANNs is due to their flexibility in accounting for potentially complex nonlinear relationships that are not easily captured by linear models. Furthermore, all of the linear and nonlinear models produce highly accurate forecasts, since the mean absolute percentage forecast error (MAPE) results are less than 5%. Overall, the inclusion of heteroscedastic variations in the input layer of the hybrid univariate model could help improve the modeling accuracy for multi-step-ahead forecasting.

© 2009 Published by Elsevier Ltd.

1. Introduction

Worldwide energy consumption is rising sharply, owing to increasing human population, continuing pressures for better living standards and emphasis on large-scale industrialization in developing countries, thus sustaining positive economic growth rates. Taiwan's energy consumption increased sharply from 49.67 million kiloliters of oil equivalent (KLOE) in 1990 to 113.85 million KLOE in 2007. The annual growth rate was 5.00% during this period. Among the various forms of energy consumed in 2007, electricity accounted for 51.18%, petroleum 38.35%, and the others 10.47% (Bureau of Energy, Ministry of Economic Affairs in Taiwan). Total electricity consumption rose sharply from 82.65 billion kWh in 1990 to 229.20 billion kWh in 2007, with an annual growth rate of 6.18%. Petroleum consumption increased from 22.97 million KLOE in 1990 to 43.66 million KLOE in 2007, with an annual growth rate of 3.85%. Given this growth, the accuracy of energy demand forecasting is important not only for energy utilities themselves but also for consumers.
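The growth rates quoted above are consistent with a compound annual growth rate over the 17 years from 1990 to 2007. The following sketch recomputes them from the endpoint values given in the text; the function name and the compound-growth interpretation are assumptions made for illustration, not something the paper states explicitly.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate in percent (assumed definition)."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


YEARS = 2007 - 1990  # 17 annual steps between the two endpoints

total_rate = annual_growth_rate(49.67, 113.85, YEARS)        # million KLOE
electricity_rate = annual_growth_rate(82.65, 229.20, YEARS)  # billion kWh
petroleum_rate = annual_growth_rate(22.97, 43.66, YEARS)     # million KLOE

print(f"total: {total_rate:.2f}%")
print(f"electricity: {electricity_rate:.2f}%")
print(f"petroleum: {petroleum_rate:.2f}%")
```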

A sound forecasting technique is essential for accurate investment planning in energy production/generation and distribution. Multivariate modeling along with co-integrated techniques or regression analysis has been used in a number of studies to analyze and forecast energy consumption [1–6]. One limitation of multivariate models is that they depend on the availability and reliability of data on independent variables over the forecasting period, which requires further efforts in data collection and estimation. On the other hand, univariate time series analysis provides another modeling approach, which only requires the historical data of the variable of interest to forecast its future evolution behavior. The univariate Box–Jenkins autoregressive integrated moving average (ARIMA) [7] analysis has been widely used for modeling and forecasting in many medical, environmental, financial, and engineering applications [8–11]. In addition, Zhou et al. [12] presented a univariate trigonometric grey prediction approach for forecasting electricity demand in China.

Recently, artificial neural network (ANN) techniques have also gained popularity in energy demand and load forecasting. For short-term forecasting, Gonzalez and Zamarreno [13] proposed specifications for a self-exciting neural network (NN) model to forecast energy consumption in buildings. Lauret [14] proposed the use of Bayesian regularization as a technique to estimate the parameters of a NN in order to forecast load. Since Bayesian methods handle uncertainty in the modeling explicitly, Lauret [14] concluded that the Bayesian NN approach to modeling offers significant advantages over classical NN learning methods for short-term load forecasting. Hipper et al. [15] reviewed and evaluated the literature on load forecasting using NNs. Amjady and Keynia [16] proposed a hybrid method composed of wavelet transform, NN and evolutionary algorithm for load forecasting. Additionally, some mid-term forecast models have also been proposed and implemented in this field [17–20]. For long-term energy forecasting, Padmakumari et al. [21] used fuzzy NNs for long-term load forecasting. Kermanshahi and Iwamiya [22] used back-propagation networks and Jordan recurrent networks to forecast Japan's electricity energy consumption until 2020. Pao [23] concluded that the forecasting performance of ANN for Taiwan's energy consumption is higher than that of the other linear models. Moreover, Jebaraj and Iniyan [24] surveyed the literature and gave a brief overview of the different types of energy modeling and forecasting.

* Tel.: +886 3 5131578; fax: +886 3 5710102. E-mail address: htpao@cc.nctu.edu.tw

Energy, journal homepage: www.elsevier.com/locate/energy

0360-5442/$ – see front matter © 2009 Published by Elsevier Ltd. doi:10.1016/j.energy.2009.04.026

For a univariate time series forecasting problem, the inputs of the NN are the past lagged observations of the data series, and the outputs are the future values. For multi-step-ahead forecasting, however, one or more output nodes can be used. If one output node is employed, then the iterative forecasting approach is assumed, and each forecast value is used iteratively as input for the next forecasts. In contrast, if the number of output nodes is equal to the length of the forecasting horizon, then the direct forecasting approach is used, in which the future values can be predicted directly from the network outputs [25]. The iterative forecasting approach may generate more prediction errors, because the forecast values are iteratively used as inputs for the next forecasts. The direct forecasting approach can raise serious round-off errors, because the number of output nodes is equal to the length of the forecasting horizon [17].

The aim of this paper is to focus on multi-step-ahead forecasts for energy consumption in Taiwan using univariate modeling. In order to avoid excessive round-off and prediction errors, taking heteroscedastic variations into account, a new hybrid univariate nonlinear network is proposed with two input nodes generated by a linear model, level forecasts ($\hat{y}_t$) and volatility forecasts ($\hat{\sigma}_t$), and a single output node $y_t$. The forecast encompassing tests and three different statistical measures are used to assess the out-of-sample performance of the proposed techniques to prevent data mining-induced overfitting.

The rest of this paper is organized as follows. In Section 2, two proposed new hybrid ANN models are described. Section 3 presents the performance evaluation methods by using statistical measures and forecast encompassing tests. The model construction and model comparisons are explained in Sections 4 and 5, respectively. The last section summarizes and concludes the paper.

2. Methodology

This section describes several linear models: the exponential smoothing model (Winters), the exponential form of the generalized autoregressive conditional heteroscedasticity (EGARCH) and seasonal EGARCH (SEGARCH) models, the combined Winters with volatility EGARCH model (WARCH), and an artificial neural network (ANN) nonlinear model. They are briefly described below as the basis on which to present two new hybrid nonlinear models: SEGARCH–ANN and WARCH–ANN. Both hybrid models are formed by combining a linear model with a NN to predict Taiwan's consumption of electricity and petroleum.

2.1. Winters models

Exponential (EXPO) smoothing methods are often useful for forecasting a time series whose parameters change slowly over time. These methods can be implemented by using the Box–Jenkins methodology [7]. For seasonal data, an exponential smoothing approach is the Winters method. In particular, an ARIMA$(0,1,1)(0,1,1)_S$ model may be a good alternative to the additive Winters method, where $S$ is the seasonal periodicity. The additive Winters ARIMA$(0,1,1)(0,1,1)_S$ model can be written as

$$(1 - B)(1 - B^S) z_t = (1 - \theta_1 B)(1 - \Theta_1 B^S)\,\nu_t, \tag{1}$$

where $B$ is a backward shift operator and $\nu_t$ is a random error. Let $\hat{z}_t$ be the forecasting time series; then the residual $\nu'_t\ (= z_t - \hat{z}_t)$ time series is both detrended and deseasonalized.
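The left-hand side of Eq. (1) applies one regular and one seasonal difference to the series. The sketch below illustrates only that differencing operator on a synthetic monthly series (the series, the function name, and the use of NumPy are assumptions for illustration); it does not estimate the moving-average parameters. It also shows why $S + 1$ observations are lost, which for $S = 12$ and 156 training months leaves 143 values.

```python
import numpy as np


def regular_and_seasonal_difference(z: np.ndarray, season: int = 12) -> np.ndarray:
    """Apply (1 - B)(1 - B^S) to z; the first S + 1 values are lost."""
    seasonal = z[season:] - z[:-season]   # (1 - B^S) z_t
    return seasonal[1:] - seasonal[:-1]   # (1 - B) applied to the result


# Synthetic series (an assumption): linear trend plus a fixed 12-month pattern.
t = np.arange(156)
z = 100.0 + 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 12)

w = regular_and_seasonal_difference(z)
print(len(z), len(w))  # number of observations before and after differencing
```

For this synthetic input the differenced series is numerically zero, because a linear trend and an exactly periodic seasonal pattern are both removed by the operator.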

2.2. EGARCH and seasonal EGARCH (SEGARCH) models

The generalized autoregressive conditional heteroscedasticity (GARCH) model is an approach to modeling time series with heteroscedastic errors [26]. Nelson and Cao [27] argued that the nonnegative constraints on the parameters $\alpha_i$ and $\gamma_i$ in the linear GARCH model are too restrictive. There are no restrictions on these parameters in the exponential form of the GARCH model (EGARCH). In this model, the conditional variance $\sigma_t^2$ is an asymmetric function of the lagged disturbances $\varepsilon_{t-i}$. The EGARCH regression model can be written as

$$z_t = x_t'\beta + \varepsilon_t, \qquad \varepsilon_t = \sigma_t e_t, \qquad \ln\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i\, g(e_{t-i}) + \sum_{j=1}^{p} \gamma_j \ln\sigma_{t-j}^2, \tag{2}$$

where $g(e_t) = \theta e_t + |e_t| - E|e_t|$ and $e_t \sim N(0, 1)$.

Note that $E|e_t| = \sqrt{2/\pi}$ if $e_t \sim N(0, 1)$. The function $g(e_t)$ is linear in $e_t$ with slope $\theta + 1$ if $e_t$ is positive, and with slope $\theta - 1$ if $e_t$ is negative.
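The asymmetry of $g(e_t)$ and the log-variance recursion in Eq. (2) can be checked numerically. The sketch below is a minimal illustration for the special case $p = q = 1$; the function names and any parameter values passed to them are hypothetical and are not the paper's estimates.

```python
import numpy as np

E_ABS_STD_NORMAL = np.sqrt(2.0 / np.pi)  # E|e_t| for e_t ~ N(0, 1)


def g(e, theta):
    """g(e_t) = theta * e_t + |e_t| - E|e_t|, as defined under Eq. (2)."""
    e = np.asarray(e, dtype=float)
    return theta * e + np.abs(e) - E_ABS_STD_NORMAL


def egarch11_log_variance(e, omega, alpha, gamma, theta, log_var0):
    """ln sigma_t^2 = omega + alpha * g(e_{t-1}) + gamma * ln sigma_{t-1}^2."""
    log_var = np.empty(len(e) + 1)
    log_var[0] = log_var0  # starting value for the recursion (an assumption)
    for t in range(1, len(e) + 1):
        log_var[t] = omega + alpha * g(e[t - 1], theta) + gamma * log_var[t - 1]
    return log_var
```

Because the recursion is written for $\ln\sigma_t^2$, the implied variance $\exp(\ln\sigma_t^2)$ is positive for any real parameter values, which is why the EGARCH form needs no sign constraints on $\alpha_i$ and $\gamma_j$.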

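The seasonal intervention model employs dummy variables to forecast the time series. The model with autoregressive errors and EGARCH variances (SEGARCH) is expressed as follows:

$$
\begin{aligned}
z_t &= \alpha_0 + \alpha_1 t + d_1 x_{s1,t} + d_2 x_{s2,t} + \cdots + d_{10} x_{s10,t} + d_{11} x_{s11,t} + \nu_t,\\
\nu_t &= \varepsilon_t - \phi_1 \nu_{t-1} - \cdots - \phi_m \nu_{t-m},\\
\varepsilon_t &= \sigma_t e_t, \qquad \ln\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i\, g(e_{t-i}) + \sum_{j=1}^{p} \gamma_j \ln\sigma_{t-j}^2,
\end{aligned}
\tag{3}
$$

where $g(e_t) = \theta e_t + |e_t| - E|e_t|$, $e_t \sim IN(0, 1)$, and

$$
x_{s1,t} = \begin{cases} 1 & \text{if period } t \text{ is January}\\ 0 & \text{otherwise} \end{cases}
\quad \cdots \quad
x_{s11,t} = \begin{cases} 1 & \text{if period } t \text{ is November}\\ 0 & \text{otherwise.} \end{cases}
$$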
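The regressors of the mean equation in Eq. (3) are an intercept, a linear trend, and eleven monthly dummies with December as the reference month. The following sketch builds that design matrix for a monthly series starting in January; the function name and layout are assumptions for illustration, and the AR error and EGARCH variance parts of the model are not estimated here.

```python
import numpy as np


def seasonal_design_matrix(n_obs: int, first_month: int = 1) -> np.ndarray:
    """Columns: intercept, trend t = 1..n, dummies for January..November."""
    trend = np.arange(1, n_obs + 1)
    # Calendar month of each observation: 1 = January, ..., 12 = December.
    month = (first_month - 1 + np.arange(n_obs)) % 12 + 1
    dummies = (month[:, None] == np.arange(1, 12)[None, :]).astype(float)
    return np.column_stack([np.ones(n_obs), trend, dummies])


X = seasonal_design_matrix(180)  # January 1993 to December 2007
print(X.shape)
```

December rows carry zeros in all eleven dummy columns, so each coefficient $d_k$ measures the shift of month $k$ relative to December.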

This model could be called an AR(m)-SEGARCH(p,q) regression model (henceforth SEGARCH (m,p,q)). The optimal lag length m is determined based on the information criteria, AIC and SBC, and the Durbin–Watson (DW) statistic. Both the Portmanteau Q statistic [28] and the Lagrange multiplier (LM) test [29] are used to determine the lag lengths p and q of the ARCH model. These tests are significant (p < 0.0001) for lags between 1 and 12, which indicates that a very high order ARCH process is needed to model the heteroscedasticity. Both the forecasted values of $\hat{z}_t$ and $\hat{\sigma}_t$ are used as inputs of the SEGARCH–ANN model.

2.3. The combined Winters with volatility EGARCH model (WARCH)

Let $\nu'_t = z_t - \hat{z}_t$ be the t-th residual, where $z_t$ is the observed value and $\hat{z}_t$ is the predicted value given by the Winters model. Therefore, $\{\nu'_t\}$ is a detrended and deseasonalized time series. The Ljung–Box $Q^*$ statistics are used to test the autocorrelation. If the p-values of $Q^*$ are less than 0.05, this is evidence that $\{\nu'_t\}$ is highly autocorrelated. To construct the EGARCH model for $\{\nu'_t\}$, the three statistical tests, DW, AIC and SBC, are used for the autocorrelation to determine the lag length m, and both the Q and LM tests are used for the ARCH process to determine the lag lengths p and q. Once these tests indicate heteroscedasticity with p < 0.05 for lags between 1 and 12, the EGARCH model can be used to produce a forecasted conditional error variance $\hat{\sigma}_t^2$ by modeling the residuals $\{\nu'_t\}$ with heteroscedastic errors. The proposed two-step WARCH (m,p,q) model combines a Winters model in the first step to obtain the detrended and deseasonalized residuals $\{\nu'_t\}$ with the AR(m)-EGARCH (p,q) model in the second step to produce the estimated heteroscedastic error variance $\hat{\sigma}_t$ for the historical and forecast periods. The WARCH (m,p,q) model is expressed as

$$
\begin{aligned}
\text{Step 1:}\quad & (1 - B)(1 - B^S) z_t = (1 - \theta_1 B)(1 - \Theta_1 B^S)\,\nu_t, \qquad \nu'_t = z_t - \hat{z}_t;\\
\text{Step 2:}\quad & \nu'_t = \varepsilon_t - \phi_1 \nu'_{t-1} - \cdots - \phi_m \nu'_{t-m}, \qquad \varepsilon_t = \sigma_t e_t,\\
& \ln\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i\, g(e_{t-i}) + \sum_{j=1}^{p} \gamma_j \ln\sigma_{t-j}^2,
\end{aligned}
\tag{4}
$$

where $g(e_t) = \theta e_t + |e_t| - \sqrt{2/\pi}$ and $e_t \sim IN(0, 1)$. Both forecasted values of $\hat{z}_t$ and $\hat{\sigma}_t$ from the Winters and ARCH steps, respectively, are used in the WARCH–ANN model discussed below.

2.4. Artiﬁcial neural network (ANN) model

NNs can be described as an attempt by humans to mimic the functioning of the human brain. The models are analytical techniques modeled after the processes of learning in the cognitive system and the neurological functions of the brain and are capable of predicting new observations (of specific variables) from other observations (of the same or other variables) after executing a process of so-called learning from existing data [20]. The models can be built without explicitly formulating the possible relationship that exists between variables. Theoretical results show that NNs are also able to sufficiently approximate arbitrary mappings to the desired accuracy if given a large enough network [30]. In this sense, NNs may be seen as multivariate, nonlinear and nonparametric methods, and they should be expected to model complex nonlinear relationships much better than the traditional linear models.

Fig. 1 shows a popular three-layer feedforward NN model. It consists of one input layer with m input variables, one hidden layer with h hidden nodes, and one output layer with a single output node. The hidden layer performs nonlinear transformations on the inputs from the input layer and feeds the transformed values to the output layer. The connection weights and node biases are the model parameters. The model estimation process is called network training. Usually in applications of ANNs, the total available data are split into a training set and a test set. The training set is used to calibrate the network model, while the test set is used to evaluate its forecasting ability. During the training procedure, an overall error measure is minimized to get the estimates of the parameters of the models. More detailed materials about NN learning can be found in Bishop [31].

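For m-step-ahead forecasting (m > 1), both iterative and direct forecasting approaches can be used. The iterative forecasting approach with p input nodes has a mapping function of the form

$$
\begin{aligned}
y_{t+1} &= f(y_t, y_{t-1}, \ldots, y_{t-p+1}),\\
y_{t+2} &= f(\hat{y}_{t+1}, y_t, \ldots, y_{t-p+2}),\\
y_{t+3} &= f(\hat{y}_{t+2}, \hat{y}_{t+1}, y_t, \ldots, y_{t-p+3}),\\
&\;\;\vdots\\
y_{t+m} &= f(\hat{y}_{t+m-1}, \hat{y}_{t+m-2}, \ldots, \hat{y}_{t+1}, y_t, \ldots, y_{t-p+m}).
\end{aligned}
\tag{5}
$$

The direct forecasting approach has a mapping function of the form

$$(y_{t+m}, y_{t+m-1}, \ldots, y_{t+1}) = f(y_t, y_{t-1}, \ldots, y_{t-p+1}). \tag{6}$$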

The iterative forecasting approach may generate more prediction errors, because the forecast values are iteratively used as inputs for the next forecasts. The direct forecasting approach, however, is subject to serious round-off errors, because the number of output nodes is equal to the length of the forecasting horizon.
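The difference between the two mappings can be made concrete with a small sketch. Below, `one_step` and `multi_step` stand for already-trained models and are hypothetical placeholders; the toy rule used in the example is an assumption chosen only so the feedback of forecasts is easy to follow.

```python
from typing import Callable, List, Sequence


def iterative_forecast(one_step: Callable[[Sequence[float]], float],
                       history: Sequence[float], p: int, m: int) -> List[float]:
    """Eq. (5): roll a one-step model forward, feeding forecasts back as inputs."""
    window = list(history[-p:])  # y_{t-p+1}, ..., y_t (oldest value first)
    forecasts = []
    for _ in range(m):
        y_next = one_step(window)
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # drop the oldest value, append the forecast
    return forecasts


def direct_forecast(multi_step: Callable[[Sequence[float]], Sequence[float]],
                    history: Sequence[float], p: int) -> List[float]:
    """Eq. (6): a single model call returns all m future values at once."""
    return list(multi_step(list(history[-p:])))


def toy_one_step(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained one-step model: linear extrapolation."""
    return 2 * window[-1] - window[-2]


iterated = iterative_forecast(toy_one_step, [1.0, 2.0, 3.0, 4.0], p=2, m=3)
direct = direct_forecast(lambda w: [w[-1] + k for k in (1, 2, 3)], [1.0, 2.0, 3.0, 4.0], p=2)
print(iterated, direct)
```

In the iterative loop any error in an early forecast enters every later input window, which is the error accumulation described above; the direct variant avoids the feedback at the cost of one output per horizon step.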

In order to avoid excessive round-off and prediction errors, taking heteroscedastic variations into account, a new hybrid univariate network with two input nodes, $\hat{\sigma}_t$ and $\hat{y}_t$, and a single output node is proposed. The form of the mapping function can be expressed as

$$y_{t+1} = f(\hat{\sigma}_{t+1}, \hat{y}_{t+1}), \quad y_{t+2} = f(\hat{\sigma}_{t+2}, \hat{y}_{t+2}), \quad \ldots, \quad y_{t+m} = f(\hat{\sigma}_{t+m}, \hat{y}_{t+m}), \tag{7}$$

where $(\hat{\sigma}_{t+1}, \hat{y}_{t+1}), (\hat{\sigma}_{t+2}, \hat{y}_{t+2}), \ldots, (\hat{\sigma}_{t+m}, \hat{y}_{t+m})$ can be predicted by using a linear model. This hybrid model, using a univariate modeling approach for multi-step-ahead forecasting, is described in the next section.

2.5. The hybrid SEGARCH–ANN and WARCH–ANN models

The practical advantage of ANN models is that the relationships between input and output variables do not need to be specified in advance, since the method itself establishes these relationships through a training process. Also, ANNs do not require any assumptions on the underlying population distributions.

[Fig. 1: three-layer feedforward NN with an input layer ($x_1, x_2, \ldots, x_m$), a hidden layer ($h$ nodes, weights $w_{11}, \ldots, w_{hm}$, biases $b_1, \ldots, b_h$), and an output layer (output $y$, weights $v_1, \ldots, v_h$, bias $b_0$).]

Both the SEGARCH and WARCH linear approaches are outlined above. The two-step WARCH (m,p,q) model produces $\hat{z}_t$ and the detrended and deseasonalized residual $\nu'_t$ in the first Winters step, and the estimated heteroscedastic error variance $\hat{\sigma}_t$ in the second ARCH step for the historical and forecast periods. The SEGARCH model generates predicted values $\hat{z}_t$ and its conditional standard deviation estimates $\hat{\sigma}_t$ for the historical and forecast periods. The step proposed here takes only the two values $\hat{z}_t$ and $\hat{\sigma}_t$ as inputs to an ANN model.

The new hybrid nonlinear univariate model is constructed by using a two-step process. In step 1, a linear model with an error volatility component (SEGARCH or WARCH) is estimated to generate values for both the level forecasts $\hat{z}_t$ and the volatility forecasts $\hat{\sigma}_t$. In step 2, both of the values estimated in step 1, $\hat{\sigma}_t$ and $\hat{z}_t$, are plugged into an ANN model with the corresponding output target $z_t$. These models are called the WARCH–ANN and SEGARCH–ANN models. All of the proposed hybrid ANN models are formed by combining a linear model with a NN to develop an adjusted forecast for Taiwan's electricity and petroleum consumption levels.

In order to prevent data mining-induced overfitting, this paper uses out-of-sample tests to compare the multi-step-ahead forecasting capabilities of the WARCH, SEGARCH, WARCH–ANN, and SEGARCH–ANN univariate models, where WARCH and SEGARCH are the benchmark models.

3. Forecasting evaluation methods

For the purpose of evaluating out-of-sample forecast capability, two different testing approaches are used. The first test associates the three evaluation statistics, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage forecast error (MAPE), to each model. They are expressed as below:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (P_i - A_i)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |P_i - A_i|, \qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left|\frac{P_i - A_i}{A_i}\right|, \tag{8}
$$

where $P_i$ and $A_i$ are the i-th forecasting and actual values, respectively, and n is the total number of predictions. Lewis [32] interprets the MAPE results as a means to judge the accuracy of the forecast: less than 10% is a highly accurate forecast, 10–20% is a good forecast, 20–50% is a reasonable forecast, and more than 50% is an inaccurate forecast.
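The three statistics in Eq. (8) and the Lewis [32] reading of MAPE are straightforward to compute. The sketch below is one possible implementation with NumPy; the function names are illustrative, the example numbers are made up, and the treatment of values falling exactly on a band boundary is an assumption, since the source gives only the ranges.

```python
import numpy as np


def forecast_errors(predicted, actual):
    """Return (RMSE, MAE, MAPE in percent) as defined in Eq. (8)."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    err = p - a
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / a)) * 100.0)
    return rmse, mae, mape


def lewis_category(mape: float) -> str:
    """Map a MAPE value (percent) to the bands attributed to Lewis [32]."""
    if mape < 10:
        return "highly accurate"
    if mape <= 20:  # boundary handling is an assumption
        return "good"
    if mape <= 50:
        return "reasonable"
    return "inaccurate"


# Made-up forecasts and actual values, for illustration only.
rmse, mae, mape = forecast_errors([110.0, 190.0], [100.0, 200.0])
print(rmse, mae, mape, lewis_category(mape))
```

Note that MAPE is undefined when any actual value is zero; that case does not arise for consumption series such as those studied here.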

The second test for forecast encompassing was introduced by Chong and Hendry [33]. This test formalizes the intuition that model i should be preferred to model j if model i can explain what model j cannot explain, without model j being able to explain what model i cannot explain. Granger and Newbold [34] argued that forecast encompassing is a more stringent requirement than forecast accuracy. Clements and Hendry [35] proposed that the encompassing test be implemented by testing the significance of the $\alpha_1$ and $\beta_1$ coefficients in the following two regression equations:

$$E_i = \alpha_0 + \alpha_1 D_{ij} + \omega_t, \qquad E_j = \beta_0 + \beta_1 D_{ij} + \nu_t, \tag{9}$$

where $E_i$ and $E_j$ denote the forecast errors for model i and model j ($E_i = P_i - A$, $E_j = P_j - A$, with $A$ the actual values), respectively; $D_{ij}$ denotes the differences between the forecasts of models i and j ($D_{ij} = P_i - P_j$); and $\omega_t$ and $\nu_t$ are random errors. The null hypothesis is that neither model encompasses (outperforms) the other. If $\alpha_1$ is significantly different from zero and $\beta_1$ is not, then the null hypothesis is rejected in favor of the alternative hypothesis that model j encompasses model i. Conversely, if $\beta_1$ is significant but $\alpha_1$ is not, then this is evidence that model i encompasses model j. If neither $\alpha_1$ nor $\beta_1$ is significant, or if both $\alpha_1$ and $\beta_1$ are significant, then we fail to reject the null hypothesis and conclude that neither model encompasses the other. Table 4 reports the results of the forecast encompassing tests.
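The two regressions in Eq. (9) are ordinary least squares fits with a single regressor, so their slope t-statistics can be computed directly. The sketch below uses NumPy only and synthetic forecasts; the function names, the random data, and the noise levels are assumptions for illustration, and p-values (which would need a t-distribution from a statistics library) are left out.

```python
import numpy as np


def slope_t_stat(y, x):
    """OLS of y on a constant and x; return (slope, t-statistic of the slope)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)   # residual variance, 2 parameters
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    return float(beta[1]), float(beta[1] / np.sqrt(cov[1, 1]))


def encompassing_regressions(p_i, p_j, actual):
    """Fit both equations of Eq. (9); return ((a1, t_a1), (b1, t_b1))."""
    p_i, p_j, actual = (np.asarray(v, dtype=float) for v in (p_i, p_j, actual))
    e_i, e_j, d_ij = p_i - actual, p_j - actual, p_i - p_j
    return slope_t_stat(e_i, d_ij), slope_t_stat(e_j, d_ij)


# Synthetic 24-month example: model j is much closer to the actual values.
rng = np.random.default_rng(0)
actual = 10.0 + rng.normal(size=24)
p_j = actual + 0.1 * rng.normal(size=24)
p_i = actual + 1.0 * rng.normal(size=24)

(a1, t_a1), (b1, t_b1) = encompassing_regressions(p_i, p_j, actual)
print(a1, t_a1, b1, t_b1)
```

Since $E_j = E_i - D_{ij}$, the two slope estimates always satisfy $\hat{\beta}_1 = \hat{\alpha}_1 - 1$, so the two regressions carry the same information viewed from each model's side.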

4. Experimental results

In this section, the performance of the alternative modeling approaches is compared using two seasonal time series: electricity consumption and petroleum consumption in Taiwan. The period under examination extends from January 1993 to December 2007 with a total of 180 observations for each series. The period from January 1993 to December 2005 is treated as the estimation (or training) period for the models. The subsequent period, from January 2006 through December 2007, is the testing or out-of-sample period.

4.1. Electricity consumption series

As shown in Fig. 2, the time series data of Taiwan's electricity consumption show strong seasonality and growth trends. The peak season for each year generally occurs in June or July, because energy use is greatest in the summer.

[Fig. 2: monthly electricity consumption in Taiwan, January 1993 to December 2007; vertical axis in billions.]

4.1.1. Winters and WARCH models for electricity

The electricity consumption series $z_t$, given in Fig. 2, assumes that the seasonality and the growth trend exist in the historical data and extend to the future with the same pattern. The statistical properties can be examined by using the autocorrelation function (acf) and the partial autocorrelation function (pacf). The result reveals that $z_t$ is non-stationary. The first regular differences and first seasonal differences are calculated in order to remove the growth trend and the seasonality characteristics. In this step, the first 13 observations are lost. The stationary time series thus acquired can be used to identify the Winters model. The estimated equation is presented as follows:

$$(1 - B)(1 - B^{12}) z_t = (1 - 0.820B)(1 - 0.671B^{12})\,\nu_t. \tag{10}$$

Let $\hat{z}_t$ be the point prediction of $z_t$ and $\nu'_t = z_t - \hat{z}_t$ be the t-th residual, where $\{\nu'_t\}$ is the detrended and deseasonalized time series with 143 observations. The unit root null hypothesis of the Augmented Dickey–Fuller (ADF) test can be rejected for the residuals $\{\nu'_t\}$ at the 5% level of significance, since the ADF test statistic $-12.27$ is lower than the critical value $-2.88$. This indicates that the series is stationary. The p-values for the Ljung–Box test statistics are less than 0.05, which means that $\{\nu'_t\}$ is highly autocorrelated. Both the Q and LM tests are significant with p < 0.05 for lags between 1 and 12, which indicates that an ARCH process is needed to model heteroscedasticity. The conditional error variance $\hat{\sigma}_t$ for $\{\nu'_t\}$ can be forecasted by estimating the parameters of the EGARCH process. The new WARCH model is constructed by combining the Winters model with the AR(13,18,23,24)-EGARCH (q = (1,24)) model describing the error variance, which can be expressed as

$$
\begin{aligned}
\text{Step 1:}\quad & (1 - B)(1 - B^{12}) z_t = (1 - 0.820B)(1 - 0.671B^{12})\,\nu_t, \qquad \nu'_t = z_t - \hat{z}_t;\\
\text{Step 2:}\quad & \nu'_t = 39783 - 0.148\,\nu'_{t-13} + 0.183\,\nu'_{t-18} - 0.330\,\nu'_{t-23} + 0.041\,\nu'_{t-24} + \varepsilon_t,\\
& \varepsilon_t = \sigma_t e_t, \qquad \ln\sigma_t^2 = 25.623 + 0.541\,g(e_{t-1}) + 0.725\,g(e_{t-24}),
\end{aligned}
\tag{11}
$$

where $g(e_t) = 0.042\,e_t + |e_t| - \sqrt{2/\pi}$ and $e_t \sim IN(0, 1)$.

Once estimated, Eq. (11) can be used to compute $\hat{z}_t$ and $\hat{\sigma}_t$ in the historical and forecast periods from the Winters and ARCH steps, respectively. Both the values of $\hat{z}_t$ and $\hat{\sigma}_t$ are used as the input variables in the WARCH–ANN model whose corresponding output value is $z_t$.

4.1.2. SEGARCH model

The electricity consumption series $z_t$ given in Fig. 2 exhibits a reasonable deterministic linear trend and monthly seasonal variation. Seasonal intervention models are employed to forecast this time series. The derived SEGARCH model with autoregressive error AR(1,2,8)-SEGARCH (q = (1,24)) is

$$
\begin{aligned}
z_t &= 5609782 + 58818\,t - 243591\,x_{s1,t} - 1049180\,x_{s2,t} - 40242\,x_{s3,t} + 25837\,x_{s4,t} + 788991\,x_{s5,t} + 1284619\,x_{s6,t}\\
&\quad + 1902389\,x_{s7,t} + 2334958\,x_{s8,t} + 1684728\,x_{s9,t} + 1498304\,x_{s10,t} + 818588\,x_{s11,t} + \nu_t,\\
\nu_t &= \varepsilon_t - 0.230\,\nu_{t-1} - 0.196\,\nu_{t-2} - 0.102\,\nu_{t-8}, \qquad \varepsilon_t = \sigma_t e_t,\\
\ln\sigma_t^2 &= 25.716 + 0.360\,g(e_{t-1}) + 0.533\,g(e_{t-24}),
\end{aligned}
\tag{12}
$$

where $g(e_t) = 0.477\,e_t + |e_t| - \sqrt{2/\pi}$, $e_t \sim IN(0, 1)$, and

$$
x_{s1,t} = \begin{cases} 1 & \text{if period } t \text{ is January}\\ 0 & \text{otherwise} \end{cases}
\quad \cdots \quad
x_{s11,t} = \begin{cases} 1 & \text{if period } t \text{ is November}\\ 0 & \text{otherwise.} \end{cases}
$$

The estimated parameters and estimated values of $\hat{z}_t$ and $\hat{\sigma}_t$ are obtained simultaneously for the sample and forecast periods. The values of both $\hat{z}_t$ and $\hat{\sigma}_t$ are used as input variables in the SEGARCH–ANN model, where the corresponding output value is $z_t$.

4.1.3. WARCH–ANN and SEGARCH–ANN models

In this step, the $\hat{z}_t$ and $\hat{\sigma}_t$ values from the WARCH and SEGARCH estimation steps are regarded as input variables to the WARCH–ANN and SEGARCH–ANN models, in which $z_t$ is used in the output layer. All networks are trained with the data from January 1993 to December 2005 and forecast for 24 months from January 2006 to December 2007. A back-propagation learning algorithm is used in the training process. More than 50 experiments are conducted to determine the best combination of the learning rates, momentum, and number of hidden nodes. Throughout the training, the NeuralWare [36] utility 'SAVEBEST' is used to monitor and save the lowest RMSE from the training set. For WARCH–ANN, the best RMSE result is obtained by using a learning rate of 0.08, a momentum rate of 0.1, and 3 nodes in a single hidden layer that uses the generalized delta learning rule and a sigmoid transfer function ($y = 1/(1 + e^{-x})$). The best architecture for the network is {2:3:1}. For the SEGARCH–ANN, the data period and the estimation details for the ANN are the same as those discussed above. The best architecture of the network is {2:4:1}. The results are reported in Table 1.

Table 1
The best results of neural networks for electricity consumption data.

                                 WARCH–ANN    SEGARCH–ANN
Input nodes                      2            2
The number of hidden neurons     3            4
Learning rate                    0.04         0.1
Momentum                         0.1          0.1
RMSE of training data            0.019        0.016

[Fig. 3: monthly petroleum consumption in Taiwan, January 1993 to December 2007; vertical axis in millions.]
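To make step 2 of the hybrid approach concrete, the sketch below trains a small sigmoid network with two inputs (volatility and level forecasts) and one output using gradient descent with momentum. It is not the NeuralWare setup used in the paper: the batch update, weight initialization, epoch count, input scaling to (0, 1), and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_2h1(X, y, hidden=3, lr=0.08, momentum=0.1, epochs=5000, seed=0):
    """Batch back-propagation with momentum for a {2:hidden:1} sigmoid network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                        # hidden activations
        out = sigmoid(H @ W2 + b2)                      # network output
        d_out = (out - y) * out * (1 - out) / len(X)    # gradient at output pre-activation
        d_hid = (d_out @ W2.T) * H * (1 - H)            # back-propagated to hidden layer
        grads = [X.T @ d_hid, d_hid.sum(0), H.T @ d_out, d_out.sum(0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g
            p += v                                      # in-place weight update
    return lambda X_new: sigmoid(sigmoid(X_new @ W1 + b1) @ W2 + b2)


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic, pre-scaled stand-ins for the step-1 outputs over 156 training months.
rng = np.random.default_rng(1)
level = np.linspace(0.1, 0.9, 156) + 0.05 * np.sin(2 * np.pi * np.arange(156) / 12)
vol = rng.uniform(0.1, 0.9, 156)
X = np.column_stack([vol, level])
y = (0.1 + 0.8 * level ** 1.2 + 0.05 * vol).reshape(-1, 1)

rmse_before = rmse(train_2h1(X, y, epochs=0)(X), y)
rmse_after = rmse(train_2h1(X, y)(X), y)
print(rmse_before, rmse_after)
```

In an out-of-sample setting the returned predictor would be applied to the step-1 forecasts for the 24 test months, which is what lets the network forecast several steps ahead without feeding its own outputs back in.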

4.2. Petroleum consumption series

The petroleum consumption time series is recorded monthly from 1993 to 2007. These data include trend and seasonal variations, as shown in Fig. 3. As with the electricity data, the values from 1993 to 2005 (156 observations) are used for estimating the models, and the monthly data from January 2006 to December 2007 are used for testing.

4.2.1. Winters, WARCH, and SEGARCH models

Investigation of the acf for the petroleum data reveals that the series is non-stationary. The first regular differences and first seasonal differences are calculated in order to remove the growth trend and the seasonality characteristics. This process loses the first 13 observations. The resulting stationary time series (143 observations) can be used to identify the Winters model. The estimated equation is presented below:

$$(1 - B)(1 - B^{12}) y_t = (1 - 0.804B)(1 - 0.668B^{12})\,\nu_t. \tag{13}$$

The unit root null hypothesis of the ADF test can be rejected for the residuals $\{\nu'_t\}$ at the 5% level of significance, since the ADF test statistic $-11.02$ is lower than the critical value $-2.88$. This indicates that the series is stationary. Both the Q and LM tests are significant with p < 0.05 for lags between 1 and 12. This enables us to estimate the conditional error variances $\hat{\sigma}_t^2$ by using the EGARCH process in the historical and forecast periods. The final WARCH((16),0,5) model, combining the Winters model with the AR(16)-EGARCH (q = (5)) model, is expressed as follows:

$$
\begin{aligned}
\text{Step 1:}\quad & (1 - B)(1 - B^{12}) y_t = (1 - 0.804B)(1 - 0.668B^{12})\,\nu_t, \qquad \nu'_t = y_t - \hat{y}_t;\\
\text{Step 2:}\quad & \nu'_t = 3935 + 0.215\,\nu'_{t-16} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t e_t,\\
& \ln\sigma_t^2 = 23.061 + 0.356\,g(e_{t-5}),
\end{aligned}
\tag{14}
$$

where $g(e_t) = 0.339\,e_t + |e_t| - \sqrt{2/\pi}$ and $e_t \sim IN(0, 1)$.

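For the SEGARCH model with autoregressive errors, the final specification for the petroleum series is AR(1,5)-SEGARCH (q = (1,24)), and the estimated model is as follows:

$$
\begin{aligned}
y_t &= 1919088 + 8615\,t - 97131\,x_{s1,t} - 374934\,x_{s2,t} - 59440\,x_{s3,t} - 134773\,x_{s4,t} - 54148\,x_{s5,t} - 112180\,x_{s6,t}\\
&\quad - 54368\,x_{s7,t} - 85199\,x_{s8,t} - 176545\,x_{s9,t} - 62519\,x_{s10,t} - 88123\,x_{s11,t} + \nu_t,\\
\nu_t &= 0.208\,\nu_{t-1} - 0.302\,\nu_{t-5} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t e_t,\\
\ln\sigma_t^2 &= 22.682 + 0.2293\,g(e_{t-1}) - 0.4362\,g(e_{t-24}),
\end{aligned}
\tag{15}
$$

where $g(e_t) = 0.458\,e_t + |e_t| - \sqrt{2/\pi}$, $e_t \sim IN(0, 1)$, and

$$
x_{s1,t} = \begin{cases} 1 & \text{if period } t \text{ is January}\\ 0 & \text{otherwise} \end{cases}
\quad \cdots \quad
x_{s11,t} = \begin{cases} 1 & \text{if period } t \text{ is November}\\ 0 & \text{otherwise.} \end{cases}
$$

4.2.2. WARCH–ANN and SEGARCH–ANN models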

The predicted values and the conditional variance estimates from the WARCH and SEGARCH models for petroleum are used to estimate the ANN models in this section. The back-propagation learning algorithm is used in the training process. For the WARCH–ANN model, the best RMS error is obtained using a learning rate of 0.4, a momentum of 0.1, and 5 nodes in a single hidden layer ({2-5-1}). For the SEGARCH–ANN model, the best RMS error result is obtained by using a learning rate of 0.06, a momentum of 0.05, and 5 hidden nodes ({2-5-1}). The results are reported in Table 2.

Table 2
The best results of neural networks for petroleum consumption data.

                                 WARCH–ANN    SEGARCH–ANN
Input nodes                      2            2
The number of hidden neurons     5            5
Learning rate                    0.4          0.06
Momentum                         0.1          0.05
RMSE of training data            0.029        0.023

[Fig. 4: actual values and WARCH, WARCH–ANN, SEGARCH, and SEGARCH–ANN forecasts, January 2006 to December 2007; vertical axis in billions.]

5. Out-of-sample forecasting performance comparison

In this section, the out-of-sample forecasting ability of four models (WARCH, SEGARCH, WARCH–ANN and SEGARCH–ANN) is evaluated over a 24-month forecast period, where WARCH and SEGARCH are the benchmark models. For the years 2006 and 2007, the forecasts from the four models as well as the actual values for both types of energy are shown in Figs. 4 and 5. Clark [37] showed that out-of-sample forecast comparisons can help prevent data mining-induced overfitting. While the hybrid ANN would clearly be expected to dominate in the sample, since it nests the linear model, there is in fact no a priori guarantee that the hybrid ANNs will dominate with out-of-sample data. Indeed, it is possible that the ANN could overfit the data in the sample and thus produce out-of-sample forecasts that are inferior to forecasts from the linear model [38]. Thus, three different statistical measures and forecast encompassing tests are employed to evaluate the out-of-sample forecast capability of each of the linear and hybrid nonlinear models.

5.1. Root mean square, mean absolute and mean absolute percentage forecast errors

Table 3 reports the RMSE, MAE and MAPE for each model for the out-of-sample period from January 2006 to December 2007. The results show that the WARCH model is better than the SEGARCH model, SEGARCH–ANN is better than the SEGARCH model, and WARCH–ANN is the best of the four models for both electricity and petroleum consumption. However, SEGARCH–ANN is better than the WARCH model on petroleum consumption only. All of the models have highly accurate forecasts, because the MAPE results are less than 5% [32].

Furthermore, none of the comparisons by RMSE, MAE, or MAPE can indicate whether any one model's performance is significantly better than that of the other models in a formal statistical sense [38]. Therefore, in the next section we present an additional means of comparison between forecasting models, namely comparison by forecast encompassing, which allows us to test whether one model has significantly better performance than another.

5.2. Forecast encompassing tests

The forecast encompassing test was applied to out-of-sample comparisons for nested models by Clark and McCracken [39]. The results from the encompassing tests reported in Table 4 paint a picture similar to that in Table 3. This table reports the estimated coefficients $\hat{\alpha}_1$ and $\hat{\beta}_1$ from Eq. (9), together with their t-statistics and significance levels. As Table 4 shows (Panel A for electricity consumption and Panel B for petroleum consumption), the differences D14, D24 and D34 between the WARCH–ANN forecasts and those of the other three models (WARCH, SEGARCH and SEGARCH–ANN) explain the forecasting errors E1, E2 and E3 of the respective alternative models well. Moreover, the forecasting errors (E4) of the WARCH–ANN model cannot be accounted for by any of the differences D14, D24 or D34, for either fuel. Pairwise encompassing comparisons reveal that WARCH–ANN is the only model whose forecast is not encompassed by the other models, and that WARCH–ANN significantly encompasses the other models. Thus, WARCH–ANN can be considered the dominant forecasting device for both types of energy consumption. Table 4 also reveals that SEGARCH–ANN significantly encompasses the SEGARCH model. A graphical representation of the encompassing tests is shown in Fig. 6 for both types of energy.
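The logic of a single encompassing regression can be sketched as follows. The sketch assumes that the forecast error of model i is regressed by ordinary least squares on a constant and the forecast difference D_ij, and that the t-statistic of the slope is then inspected; the exact specification of Eq. (9) may differ in detail. The function name `encompassing_tstat` and the toy numbers are hypothetical.

```python
import math


def encompassing_tstat(error_i, diff_ij):
    """OLS of error_i on a constant and diff_ij; return (slope, t-statistic).

    Assumed form: E_i = a0 + a1 * D_ij + u, with the usual OLS standard error.
    """
    n = len(error_i)
    if n != len(diff_ij) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_d = sum(diff_ij) / n
    mean_e = sum(error_i) / n
    sxx = sum((d - mean_d) ** 2 for d in diff_ij)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(diff_ij, error_i))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_d
    # Residual sum of squares; two parameters are estimated, hence n - 2.
    ssr = sum((e - intercept - slope * d) ** 2 for d, e in zip(diff_ij, error_i))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return slope, slope / std_err


# Toy data (hypothetical): the errors depend strongly on the forecast difference.
diff = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
error = [0.0, 0.0, 6.0, -2.0, 4.0, 4.0]
slope, t_stat = encompassing_tstat(error, diff)
print(f"slope={slope:.2f}  t={t_stat:.2f}")
```

A slope that differs significantly from zero indicates that the other model's forecast carries information missing from model i's forecast, so model i does not encompass model j. An insignificant slope is consistent with model i encompassing model j.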

Fig. 5. Actual and model values for petroleum consumption data. Series shown: Actual, WARCH, WARCH–ANN, SEGARCH and SEGARCH–ANN; vertical axis in millions.

Table 3
Comparative forecasting performance of energy consumption.

                 WARCH        WARCH–ANN                      SEGARCH–ANN                    SEGARCH
Panel A: electricity energy consumption
Input nodes      -            $\hat{z}_t$; $\hat{\sigma}_t$  $\hat{z}_t$; $\hat{\sigma}_t$  -
RMSE             643744.33    531545.14                      596013.96                      824500.08
MAE              474189.18    404184.25                      464632.42                      606629.27
MAPE             2.90%        2.56%                          2.98%                          3.65%
Panel B: petroleum energy consumption
Input nodes      -            $\hat{z}_t$; $\hat{\sigma}_t$  $\hat{z}_t$; $\hat{\sigma}_t$  -
RMSE             165753.68    134832.21                      148234.91                      204369.84
MAE              134300.13    112542.53                      122320.08                      167031.13
MAPE             4.08%        3.51%                          3.71%                          4.88%

As a result, it should be clear that the ANN step reproduces the predicted values from the initial linear model and that the ANN step encompasses the linear step (step 1). Moreover, a poor linear model may produce a poor hybrid nonlinear model. The output of the SEGARCH model is poorer than that of the WARCH model, so the WARCH–ANN results are better than the SEGARCH–ANN ones. The significantly superior performance of the hybrid nonlinear ANN models compared with the conventional linear models in the out-of-sample tests suggests that flexible hybrid ANNs may be able to account for potentially complex nonlinear relationships not easily captured by linear models.
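The nesting argument can be made concrete with a small sketch. It assumes a hybrid of the form "linear forecast plus a one-hidden-layer correction", fed with the linear forecast and an estimate of the conditional standard deviation, with output weights initialised to zero so that the untrained network returns the linear forecast unchanged. This architecture, the function names and the synthetic data are assumptions for illustration only; they are not the network configuration used in the paper.

```python
import math
import random


def init_params(hidden=3, seed=0):
    """Random input weights, zero output weights (assumed initialisation)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(hidden)]
    return w, [0.0] * hidden, 0.0


def predict(params, z, s):
    """Hybrid forecast: linear forecast z plus an ANN correction term."""
    w, v, c = params
    a = [math.tanh(w1 * z + w2 * s + b) for w1, w2, b in w]
    return z + sum(vh * ah for vh, ah in zip(v, a)) + c


def mse(params, z_hat, sigma_hat, y):
    """In-sample mean squared error of the hybrid forecast."""
    return sum((predict(params, z, s) - t) ** 2
               for z, s, t in zip(z_hat, sigma_hat, y)) / len(y)


def train(params, z_hat, sigma_hat, y, lr=0.02, epochs=1000):
    """Full-batch gradient descent on the MSE; returns new parameters."""
    w0, v0, c = params
    w = [row[:] for row in w0]  # copy so the starting parameters stay intact
    v = list(v0)
    n, hidden = len(y), len(v0)
    for _ in range(epochs):
        gw = [[0.0] * 3 for _ in range(hidden)]
        gv = [0.0] * hidden
        gc = 0.0
        for z, s, t in zip(z_hat, sigma_hat, y):
            a = [math.tanh(w1 * z + w2 * s + b) for w1, w2, b in w]
            r = z + sum(vh * ah for vh, ah in zip(v, a)) + c - t  # residual
            gc += 2.0 * r / n
            for h in range(hidden):
                gv[h] += 2.0 * r * a[h] / n
                back = 2.0 * r * v[h] * (1.0 - a[h] ** 2) / n
                gw[h][0] += back * z
                gw[h][1] += back * s
                gw[h][2] += back
        c -= lr * gc
        for h in range(hidden):
            v[h] -= lr * gv[h]
            for k in range(3):
                w[h][k] -= lr * gw[h][k]
    return w, v, c


# Synthetic data (hypothetical): the target departs nonlinearly from z_hat.
z_hat = [math.sin(0.5 * t) for t in range(24)]
sigma_hat = [0.5 + t / 23 for t in range(24)]
y = [z + 0.5 * (s - 1.0) + 0.3 * z * (s - 1.0) for z, s in zip(z_hat, sigma_hat)]

start = init_params()
fitted = train(start, z_hat, sigma_hat, y)
loss_before = mse(start, z_hat, sigma_hat, y)   # equals the linear model's MSE
loss_after = mse(fitted, z_hat, sigma_hat, y)
print(f"in-sample MSE: linear={loss_before:.4f}  hybrid={loss_after:.4f}")
```

Because the untrained network coincides with the linear forecast, training can only be expected to help in the sample. The sketch says nothing about out-of-sample behaviour, which is exactly why the comparisons in this section are made on held-out data.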

6. Conclusion

Forecasting energy consumption is of utmost importance to the reconstruction process under way in Taiwan, specifically in its energy generation systems. Developing forecasting tools from Taiwanese data is challenging because of Taiwan’s rapid economic growth and increasing demand for both electricity and petroleum. Taking the heteroscedastic variation into account, this study proposes two hybrid univariate nonlinear models, SEGARCH–ANN and WARCH–ANN, for multi-step-ahead forecasting. Both hybrid ANNs can decrease round-off and prediction errors in such forecasts.

The out-of-sample forecasting performance of each model is assessed by three statistical measures (RMSE, MAE and MAPE) and by encompassing tests. The results of the statistical measures suggest that the WARCH model is better than the SEGARCH model, SEGARCH–ANN is better than the SEGARCH model, and WARCH–ANN is the best of the four models for both electricity and petroleum consumption. However, SEGARCH–ANN is better than the WARCH model on all three measures for petroleum consumption only. All of the models have highly accurate forecasts, since the MAPE results are less than 5%. Furthermore, the WARCH–ANN model significantly encompasses the other three models, while SEGARCH–ANN significantly encompasses the SEGARCH model for both types of energy. So it should be clear that the ANN step reproduces the predicted values from the initial linear model and encompasses the linear step. The WARCH–ANN model is the dominant forecasting approach for both electricity and petroleum.

In summary, the preponderance of the statistical evidence presented in this paper suggests that the proposed hybrid ANN forecasts generally outperform the forecasts from a variety of linear models in predicting Taiwan’s energy consumption. The practical significance of this result is evident from the out-of-sample nature of the tests employed. Although the hybrid ANN nests the linear model as a special case and would therefore be expected to dominate this model in the sample, there was no a priori guarantee that hybrid ANNs would dominate out-of-sample, especially if the ANNs overfit the in-sample data. The fact that the proposed hybrid ANNs did outperform the conventional linear models in the out-of-sample tests therefore suggests that flexible hybrid ANNs may be able to account for potentially complex nonlinear relationships not easily captured by linear models. Furthermore, the information on the interactions and nonlinear integrating effects between $(\hat{z}_t, \hat{\sigma}_{z_t})$ and $z_t$ is important, because both types of energy consumption are well accommodated by the hybrid nonlinear algorithm. Overall, the inclusion of heteroscedastic variations in the input layer of the hybrid univariate model could help improve the modeling accuracy of multi-step-ahead forecasts.

Acknowledgments

The author would like to thank the anonymous referees for their valuable suggestions and useful comments.

References

[1] Himanshu AA, Lester CH. Electricity demand for Sri Lanka: a time series analysis. Energy 2008;33:724–39.

[2] Fatai K, Oxley L, Scrimgeour FG. Modeling and forecasting the demand for electricity in New Zealand: a comparison of alternative approaches. The Energy Journal 2003;24(1):75–102.

[3] Hamzacebi C. Forecasting of Turkey’s net electricity energy consumption on sector bases. Energy Policy 2007;35:2009–16.

[4] Mohamed Z, Bodger P. Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy 2005;30(10):1833–43.

[5] Gorucu FB, Gumrah F. Evaluation and forecasting of gas consumption by statistical analysis. Energy Sources 2004;26:267–76.

[6] Yang M, Yu X. China’s rural electricity market – a quantitative analysis. Energy 2004;29(7):961–77.

[7] Box GEP, Jenkins GM. Time series analysis: forecasting and control. San Francisco: Holden-Day; 1976.

[8] Bowden N, Payne JE. Short term forecasting of electricity prices for MISO hubs: evidence from ARIMA–EGARCH models. Energy Economics 2008;30(6):3186–97.

[9] Ediger VS, Akar S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007;35(3):1701–8.

Table 4
Encompassing tests of forecasting performance of alternative models.

Dependent variable: forecasting errors. Independent variable: forecasting difference from two models.

                      D12^b          D13^b          D23^b          D14^b          D24^b            D34^b
Panel A: electricity consumption (forecast period: from January 2006 to December 2007)
E1^a (WARCH)          1.52* (2.98)   0.52* (1.26)   -              0.81* (2.84)   -                -
E2^a (SEGARCH)        2.52* (4.95)   -              1.34* (3.57)   -              0.97* (4.73)     -
E3^a (SEGARCH–ANN)    -              0.48* (1.15)   0.34 (0.90)    -              -                0.70* (3.08)
E4^a (WARCH–ANN)      -              -              -              0.19 (0.69)    0.03 (0.16)      0.30 (1.33)
Panel B: petroleum consumption (forecast period: from January 2006 to December 2007)
E1^a (WARCH)          0.90 (1.17)    1.66 (1.24)    -              0.82* (2.71)   -                -
E2^a (SEGARCH)        0.10 (0.21)    -              1.00* (4.55)   -              −0.10* (−3.02)   -
E3^a (SEGARCH–ANN)    -              0.66 (1.03)    0.13 (0.21)    -              -                −0.65* (−2.90)
E4^a (WARCH–ANN)      -              -              -              0.12 (0.09)    1.18 (1.05)      1.65 (1.08)

Which indicates that statistical signiﬁcance is at the 0.1 level. The values in parentheses are t-statistics.

a: 1 = WARCH, 2 = SEGARCH, 3 = SEGARCH–ANN, 4 = WARCH–ANN; E_i denotes the forecast error for model i.

b: D_ij denotes the difference between the forecasts from model i and model j.

*: indicates statistical significance at the 0.05 level.


[10] Pappas SS, Ekonomou L, Karamousantas DC, Chatzarakis GE, Katsikas SK, Liatsis P. Electricity demand loads modeling using AutoRegressive moving average (ARMA) models. Energy 2008;33(9):1353–60.

[11] Saab S, Badr E, Nasr G. Univariate modeling and forecasting of energy consumption: the case of electricity in Lebanon. Energy 2001;26:1–14.

[12] Zhou P, Ang BW, Poh KL. A trigonometric grey prediction approach to forecasting electricity demand. Energy 2006;31(14):2839–47.

[13] Gonzalez PA, Zamarreno JA. Prediction of hourly energy consumption in buildings based on a feedback artificial neural network. Energy and Buildings 2005;37(6):595–601.

[14] Lauret P, Fock E, Randrianarivony RN, Manicom-Ramasamy JF. Bayesian neural network approach to short time load forecasting. Energy Conversion and Management 2008;49(5):1156–66.

[15] Hippert HS, Pedreira CE, Souza RC. Neural networks for short-term load forecasting: a review and evaluation. IEEE Transactions on Power Systems 2001;16(1):44–55.

[16] Amjady N, Keynia F. Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm. Energy 2009;34(1):46–57.

[17] Pao HT. Forecasting electricity market pricing using artificial neural networks. Energy Conversion and Management 2007;48:907–12.

[18] Mirasgedis S, Sarafidis Y, Georgopoulou E, Lalas DP, Moschovits A, Karagiannis F, et al. Models for mid-term electricity demand forecasting incorporating weather influences. Energy 2006;31:208–27.

[19] Amjady N, Keynia F. Midterm load forecasting of power systems by a new prediction method. Energy Conversion and Management 2008;49(10):2678–87.

[20] Tso GKF, Yau KKW. Predicting electricity energy consumption: a comparison of regression analysis, decision tree and neural network. Energy 2007;32(9):1761–8.

[21] Padmakumari K, Mohandas KP, Thiruvengadam S. Long term distribution demand forecasting using neuro fuzzy computations. International Journal of Electrical Power and Energy Systems 1999;21(5):315–22.

[22] Kermanshahi B, Iwamiya H. Up to year 2020 load forecasting using neural nets. International Journal of Electrical Power and Energy Systems 2002;24:789–97.

[23] Pao HT. Comparing linear and nonlinear forecasts for Taiwan’s electricity consumption. Energy 2006;31:1793–805.

[24] Jebaraj S, Iniyan S. A review of energy models. Renewable and Sustainable Energy Reviews 2006;10:281–311.

[25] Hu MY, Zhang G, Jiang CX, Patuwo BE. A cross-validation analysis of neural network out-of-sample performance in exchange rate forecasting. Decision Sciences 1999;30(1):197–216.

[26] McKenzie E. General exponential smoothing and the equivalent ARMA process. Journal of Forecasting 1984;3:333–44.

[27] Nelson DB, Cao CQ. Inequality constraints in the univariate GARCH model. Journal of Business and Economic Statistics 1992;10(2):229–35.

[28] McLeod AI, Li WK. Diagnostic checking ARMA time series models using squared-residual autocorrelations. Journal of Time Series Analysis 1983;4:269–73.

[29] Engle RF. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 1982;50:987–1007.

[30] Zhang GP. An investigation of neural networks for linear time-series forecasting. Computers and Operations Research 2001;28:1183–202.

[31] Bishop CM. Neural networks for pattern recognition. Oxford: Oxford University Press; 1995.

[32] Lewis CD. Industrial and business forecasting methods. London: Butterworth; 1982.

[33] Chong YY, Hendry DF. Econometric evaluation of linear macroeconomic models. Review of Economic Studies 1986;53:671–90.

[34] Granger CWJ, Newbold P. Forecasting economic time series. Orlando, FL: Academic Press; 1986.

[35] Clements MP, Hendry DF. Forecasting economic time series. UK: Cambridge University Press; 1998.

[36] NeuralWare. Neural computing: NeuralWorks Professional II/PLUS and NeuralWorks Explorer. NeuralWare; 1993.

[37] Clark TE. Can out-of-sample forecast comparisons help prevent overfitting? Journal of Forecasting 2004;23:115–39.

[38] Donaldson RG, Kamstra M. Forecast combining with neural networks. Journal of Forecasting 1996;15:49–61.

[39] Clark TE, McCracken MW. Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics 2001;105:85–110.