The Evolution of Forecast Combinations and the Concept of Combination across Disciplines



Department of Economics College of Social Sciences National Taiwan University

Master Thesis

On the evolution of the forecast combinations and the pooling ideas

Yu-Fang Sung

Advisor: Kuo-Yuan Liang, Ph.D.

January 2015


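Acknowledgments

This thesis rests on values that prize originality, and it critically reviews the credit given to C. Granger, the Nobel laureate in economics, for proposing forecast combination.

I thank Professor Kuo-Yuan Liang (President of the Yuanta-Polaris Research Institute) for allowing me to write this kind of critical essay on ideas and history within an economics department, and for giving me this freedom of inquiry. I entered the department with enthusiasm because I admired the ability of economics to handle issues through model-based analysis. Over time, however, I came into contact with voices from other disciplines that questioned and challenged it. I once tried to bring these views together in a class report, hoping that discussion would lead to deeper insight, but they met with indifference and the challenges I raised were dismissed. Thanks to Professor Liang's academic open-mindedness, I had a better opportunity to sort through the literature carefully, to develop a tighter argument based on evidence, and to express some of my own questions and doubts. Professor Liang's many years of expertise in forecast combination and his deep familiarity with the literature provided many important viewpoints and materials, which kept this work from becoming a superficial critique.

I thank Professor Kamhon Kan (Director of the Institute of Economics, Academia Sinica, Taiwan) for the three challenges he raised, which made me reflect on the weaknesses of my argument. First, the spirit of pooling ideas in forecast combination appeared long ago in the mean, a descriptive statistic, so the scope of my literature review needed to be revised. Second, if the popularity of forecast combination was really brought about by the development of statistical software, I should first show that computing a forecast combination was a laborious task. Third, I should not use overly strong wording in the abstract merely to give the thesis an attractive selling point; I should persuade by reason.

I thank Professor Win Lin Chou (formerly Professor of Economics at the Chinese University of Hong Kong, now a senior research fellow at a Hong Kong-based economic research centre) for reminding me that a thesis should state its own findings or contributions clearly. In a professional spirit, Professor Chou frankly challenged the contribution of this thesis at the oral defense, which prompted me to reflect on my limitations and to improve both the content and its presentation. Professor Chou also took the trouble to point out, one by one, the places where the text was muddled or ungrammatical, and suggested corrections. His care and rigor were moving.

I thank Professor Hui-Lin Lin (Dean of the College of Social Sciences, National Taiwan University) for her diligent administration of the college and her care for students, which ensured that scholarships reached the students who needed them; I benefited from this as well. Her advice that day on receiving guests also taught me something.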




Abstract

Although Granger is widely recognized as a founding father of the theory of forecast combination, two pieces of evidence challenge the usual attribution of this contribution. First, the forecast combination formula and the derivation of its weights are exactly the same as in Markowitz (1952). Second, before Granger and Ramanathan (1984), Crane and Crotty (1967) had already combined forecasts through multiple regression. Furthermore, forecast combination was heavily criticized before 1974 but widely accepted afterwards. These facts indicate that the rising importance of forecast combination is due to reasons other than its innovative contribution. Studies show that the development of computer technology has played a role.

Beyond repositioning Granger's status, this thesis notes that various combination practices were developed after 1969 as well, such as ensemble forecasting in meteorology, earthquake forecasting and seismic analysis in seismology, and loop analysis in ecology. Researchers interested in the idea of combination may consult this thesis for an overview of these findings.

Keywords: forecast combinations; portfolio selection; forecast encompassing; judgmental forecast; meta-analysis; ensemble forecast; combining earthquake forecasts; seismic analysis; loop analysis


Contents

Acknowledgments
Abstract
Contents
Exhibits
Tables

Chapter 1 Introduction
1.1 Theoretical development of forecast combination
1.2 Impacts of forecast combination
1.3 Contributions of this thesis

Chapter 2 From pooling ideas to the innovation of forecast combinations
2.1 Portfolio selections (Markowitz 1999): pooling assets and mathematical equivalence
2.2 Two-stage forecasting model (Crane and Crotty 1967): combining forecasts through multiple regression technique
2.3 Combining multiple estimates before Bates and Granger (1969)
2.4 Granger’s remark

Chapter 3 The change in economists' and statisticians' attitudes
3.1 Early appreciator
3.2 Late appreciator: the development of computer technology

Chapter 4 Evolution of forecast combinations and forecast encompassing
4.1 Interaction between operational research and forecast combinations
4.2 Late development
4.3 Forecasts encompassing
4.4 A modified exhibit

Chapter 5 The ensemble ideas in natural science
5.1 Ensemble forecasting in meteorology
5.2 Seismology and Ecology

Chapter 6 The ideas of combination in psychology
6.1 Meta-analysis
6.2 Mechanical combination of individual judgments

Chapter 7 Uncertainty and complexity

Reference


Exhibits

Exhibit 1 Evolution of forecast combinations
Exhibit 2 Cumulative number of articles published on combined forecasts (adapted from Clemen 1989)
Exhibit 3 Theoretical development made by Dickinson (1973, 1975)
Exhibit 4 Operational research model closely related to forecast combinations
Exhibit 5 Connection between Bayesian Models for Combining Probability Distributions in the Normal settings and forecast combinations
Exhibit 6 Evolution of forecasts encompassing
Exhibit 7 A modified exhibit based on Clemen (1989)
Exhibit 8 Concept of ensemble forecast (adapted from Fritsch et al. 2000, 572)
Exhibit 9 Evolution of ensemble forecasts
Exhibit 10 An illusion of the statistical upshot of forecast combination


Tables

Table 1 Similarities between portfolio and combination of forecasts
Table 2 Bayesian equivalence of the theories of forecast combination
Table 3 Comparison between meta-analysis and forecast combinations


Chapter 1 Introduction

1.1 Theoretical development of forecast combination

Forecast combination is a simple and pragmatic way to produce potentially better forecasts. In many cases, for example, only the individual forecasts are available, not the information on which they are based, so combining them is appropriate. The method is generally credited to Granger. Forecast combination methods generally fall into three categories: the variance-covariance method, the regression-based method, and the Bayesian method (Diebold 2007; Liang and Shih 1994).

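We begin with the variance-covariance method (Bates and Granger 1969; Newbold and Granger 1974). Suppose there are $M$ unbiased forecasts $\mathbf{F}_T' = (F_{1,T}, \ldots, F_{M,T})$ of some quantity $X_T$. The linear combination is

$$C_T = \mathbf{w}_T'\mathbf{F}_T, \qquad \mathbf{w}_T'\mathbf{1} = 1, \qquad 0 \le w_{i,T} \le 1 \ \text{for all } i,$$

where $C_T$ is the combined forecast, $\mathbf{w}_T' = (w_{1,T}, \ldots, w_{M,T})$ is the vector of weights, and $\mathbf{1}' = (1, \ldots, 1)$.

Minimizing the variance of the combined forecast error results in

$$\mathbf{w}_T = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}},$$

where

$$\Sigma = E(\mathbf{e}_T\mathbf{e}_T'), \qquad \mathbf{e}_T = X_T\mathbf{1} - \mathbf{F}_T.$$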
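The minimum-variance weights, the inverse error covariance matrix applied to a vector of ones and rescaled to sum to one, can be illustrated numerically. The following is a minimal sketch, not the thesis's own computation: the function name `min_variance_weights` and the simulated errors are assumptions made for illustration, the sample second-moment matrix stands in for the unknown covariance matrix, and the bounds on individual weights are not enforced.

```python
import numpy as np

def min_variance_weights(errors):
    """Return w = (S^-1 1) / (1' S^-1 1), with S the sample second-moment matrix.

    errors: array of shape (T, M), one column of forecast errors per forecast.
    The 0 <= w_i <= 1 bounds are NOT imposed here (assumption: they do not bind).
    """
    errors = np.asarray(errors, dtype=float)
    T, M = errors.shape
    sigma = errors.T @ errors / T               # estimate of E(e e') for unbiased forecasts
    ones = np.ones(M)
    s_inv_ones = np.linalg.solve(sigma, ones)   # S^-1 1 without forming the inverse
    return s_inv_ones / (ones @ s_inv_ones)

# Illustrative data (assumption): three unbiased forecasts with different error variances.
rng = np.random.default_rng(0)
errors = rng.normal(scale=[1.0, 2.0, 3.0], size=(500, 3))
w = min_variance_weights(errors)
combined_error = errors @ w                     # error of the combined forecast
```

By construction, the in-sample mean squared error of `combined_error` cannot exceed that of any single forecast, since each single forecast corresponds to a feasible weight vector.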

From a theoretical perspective, Newbold and Granger (1974) is an extension of Bates and Granger (1969) that extends the 2-by-2 variance-covariance matrix to a k-by-k one. The k-by-k variance-covariance matrix requires intensive computation. Fortunately, Granger and Ramanathan (1984) proposed the regression-based method and thereby popularized the use of forecast combination. Suppose there are $M$ forecasts $y_{t+h,t,i}$, $i = 1, \ldots, M$, made at time $t$ for $t+h$:

$$y_{t+h} = \sum_{i=1}^{M} w_i\, y_{t+h,t,i} + e_{t+h,t}.$$

By simply regressing the realizations $y_{t+h}$ on the forecasts, one obtains the weight $w_i$ for each forecast. In fact, the optimal variance-covariance combining weights have a regression interpretation: they are the coefficients $w_i$ estimated subject to $\sum_i w_i = 1$ with the intercept excluded, which is known as method B. In practice, it is usually preferable not to constrain the weights to sum to 1 while still excluding the intercept (method A) or, more flexibly, to lift both constraints (method C) (Liang and Shih 1994). The theoretical development of forecast combination is summarized in Exhibit 1.
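The three regression variants (A: no intercept, unconstrained weights; B: no intercept, weights summing to one; C: intercept and unconstrained weights) can be sketched with ordinary least squares. This is an illustrative sketch under stated assumptions: the function name `combine_by_regression` and the simulated data are hypothetical, and method B is implemented by the usual substitution of the sum-to-one constraint into the regression.

```python
import numpy as np

def combine_by_regression(y, F, method="A"):
    """Estimate combining weights by least squares; returns (intercept, weights).

    y: realizations, shape (T,); F: forecasts, shape (T, M).
    """
    y = np.asarray(y, dtype=float)
    F = np.asarray(F, dtype=float)
    T, M = F.shape
    if method == "A":   # no intercept, unconstrained weights
        w, *_ = np.linalg.lstsq(F, y, rcond=None)
        return 0.0, w
    if method == "B":   # no intercept, weights sum to 1
        # Substitute w_M = 1 - sum(w_1..w_{M-1}): regress (y - f_M) on (f_i - f_M).
        last = F[:, -1]
        w_head, *_ = np.linalg.lstsq(F[:, :-1] - last[:, None], y - last, rcond=None)
        return 0.0, np.append(w_head, 1.0 - w_head.sum())
    if method == "C":   # intercept included, unconstrained weights
        X = np.column_stack([np.ones(T), F])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef[0], coef[1:]
    raise ValueError("method must be 'A', 'B', or 'C'")

# Illustrative data (assumption): two noisy forecasts of the same target.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
F = np.column_stack([y + rng.normal(scale=0.5, size=200),
                     y + rng.normal(scale=1.0, size=200)])
results = {m: combine_by_regression(y, F, m) for m in "ABC"}
```

Because the three specifications are nested, the in-sample sum of squared errors is smallest for method C and largest for method B; out-of-sample performance need not follow the same ordering.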

Exhibit 1 Evolution of forecast combinations

1.2 Impacts of forecast combination

Today, forecast combination has been applied in diverse areas and has influenced the theoretical development of economics. According to Clemen (1989), pooling forecasts has been common practice in institutions such as the ASA/NBER business outlook surveys since 1968, the consensus macroeconomic forecasts of Blue Chip Economic Enterprises since 1976, and the economic forecasts of The Financial Times since some time before 1986. It has been applied successfully to forecasting economic variables such as inflation, money supply, exchange rates, stock prices, and sales. Among these areas, forecast combination is used especially heavily in financial engineering. Applications of combined forecasting are not limited to economics: outcomes of football games, wilderness area use, check volume, and many other quantities have also been forecast in this way.

Forecast combination has also influenced the development of economic theory. Rational expectations and efficiency theory have taken root in the spirit of forecast combination (Holden and Peel 1989; Holden et al. 1985), for they all share the same theoretical position of maximum information (Bunn 1989).

1.3 Contributions of this thesis

This thesis makes three contributions.

First, although Granger is widely recognized as a founding father of the theory of forecast combination, two pieces of evidence challenge the usual attribution of this contribution: (i) the forecast combination formula and the derivation of its weights are exactly the same as in Markowitz (1952); and (ii) before Granger and Ramanathan (1984), Crane and Crotty (1967) had already combined forecasts through multiple regression. Furthermore, forecast combination was heavily criticized before 1974 but widely accepted afterwards. These facts indicate that the rising importance of forecast combination is due to reasons other than its innovative contribution. This thesis finds that the development of computer technology played a role. To support this explanation, the thesis surveys early pooling approaches that may have inspired Bates and Granger (1969). A brief introduction to the subsequent evolution of forecast combinations is included.

Second, the thesis presents ideas of combination beyond economic forecasting and their interdisciplinary relations, such as ensemble forecasting in meteorology, earthquake forecasting and seismic analysis in seismology, and loop analysis in ecology. Researchers interested in the idea of combination may consult this thesis for an overview of these findings.

Third, the thesis adjusts and expands the exhibit “historical development of combining forecasts literature” in Clemen (1989), an influential¹ review of forecast combination.

1 Cited 1,531 times on Google Scholar as of January 17, 2015.


Chapter 2 From pooling ideas to the innovation of forecast combinations

Pooling techniques have long been part of daily practice. The simplest example is the mean, a basic descriptive statistic of a distribution. Beyond mathematics, combining forecasts subjectively is another simple application of pooling ideas. People knew that a sounder forecasting estimate can be obtained by combining and averaging estimates (Board 1963). According to the National Industrial Conference Board, “[o]ne of the oldest and simplest methods of forecasting” (12) sales is to pool and average² the views of managers.

People employ combining techniques in forecasting because they can “broaden the base of forecasting” (12) and thus “obtain a sounder forecast of sales than could be made by a single estimator” (12). These are the same reasons given by Bates and Granger (1969). Further applications of pooling ideas, which are likely to have inspired the forecast combination method, are described below.

2.1 Portfolio selections (Markowitz 1999): pooling assets and mathematical equivalence

Risk diversification (pooling assets) was common practice among business investors long before Markowitz proposed his “Portfolio selection” (1952). For example, in The Merchant of Venice, Shakespeare has the merchant Antonio say: “My ventures are not in one bottom trusted, nor to one place; nor is my whole estate upon the fortune of this present year; therefore, my merchandise makes me not sad.” (Act I, Scene 1). The same behavior can be found among modern businessmen: the investment trusts of Scotland and England in the middle of the 19th century provided diversification for their customers, and their practices influenced modern investment companies. According to Markowitz (1999), Wiesenberger's annual reports (since 1941) showed that these firms held large numbers of securities to diversify risk.

2 Not in the numerical sense. In psychology, this is called “clinical method”. The decision-maker

To an investor who wants to minimize the risk of his investment, it is not an asset's own risk that matters, but rather the contribution the asset makes to the variance of his entire asset allocation package. Before Markowitz (1952), however, mathematical tools only helped investors calculate the variance of an individual asset. Markowitz's innovation was to calculate the variance of a weighted sum, that is, of the whole asset allocation package:

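Suppose one has $M$ assets with returns $\mathbf{R}' = (R_1, \ldots, R_M)$ and expected returns $\boldsymbol{\mu} = E(\mathbf{R})$. The expected return of a portfolio is

$$\mathbf{x}'\boldsymbol{\mu}, \qquad \mathbf{x}' = (x_1, \ldots, x_M), \qquad \mathbf{1}' = (1, \ldots, 1).$$

The variance of a portfolio is

$$\mathbf{x}'\Sigma\mathbf{x}, \qquad \Sigma = E(\mathbf{e}\mathbf{e}'), \qquad \mathbf{e} = \mathbf{R} - \boldsymbol{\mu},$$

and the constraint is $\mathbf{x}'\mathbf{1} = 1$, $x_i \ge 0$.

By the Lagrange method, the optimal investment allocation is

$$\mathbf{x} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}},$$

which is exactly the optimal weight in Newbold and Granger (1974). The mathematical equivalence suggests that Granger may have learned from Markowitz.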
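The Lagrangian step behind this result can be spelled out for the problem of minimizing the portfolio variance $\mathbf{x}'\Sigma\mathbf{x}$ subject to $\mathbf{x}'\mathbf{1} = 1$. The sketch below is a standard derivation rather than a reproduction of Markowitz's own presentation; it assumes that the nonnegativity constraint $x_i \ge 0$ does not bind and that $\Sigma$ is invertible, and the multiplier $\lambda$ is introduced here for illustration.

```latex
\begin{aligned}
\mathcal{L}(\mathbf{x},\lambda) &= \mathbf{x}'\Sigma\mathbf{x} - 2\lambda\,(\mathbf{x}'\mathbf{1} - 1),\\
\frac{\partial\mathcal{L}}{\partial\mathbf{x}} &= 2\Sigma\mathbf{x} - 2\lambda\mathbf{1} = \mathbf{0}
  \;\Rightarrow\; \mathbf{x} = \lambda\,\Sigma^{-1}\mathbf{1},\\
\mathbf{x}'\mathbf{1} = 1 &\;\Rightarrow\; \lambda = \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}},\\
\mathbf{x} &= \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}.
\end{aligned}
```

Replacing asset returns by forecast errors and $\mathbf{x}$ by the combining weights gives the same expression as the minimum-variance forecast combination.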

Some well-known scholars were already aware of the analogy between portfolio selection and forecast combination. Timmermann (2006) wrote that “the portfolio is the combination of forecasts and the source of risk reflects incomplete information about the target variable and model misspecification possibly due to non-stationarities in the underlying data generating process.” Winkler (1989) likewise wrote that “[j]ust as investors create diversified portfolios to reduce risk, a combined forecast can be thought of as having a smaller risk of an extremely large error than an individual forecast.” Table 1 summarizes the similarities.

Table 1 Similarities between portfolio and combination of forecasts

Because of this mathematical equivalence, Markowitz’s risk-reduction effect can explain the improvement in the accuracy of forecast combinations under the minimum error-variance criterion. Three examples follow.

(1) Armstrong (2006) examined numerous methods for reducing forecast error and concluded that one advantage of forecast combination is to “spread risk”. Arguably, his comment is rooted in Markowitz (1952). The portfolio method minimizes the overall risk of a portfolio by spreading risk through combining. As a forerunner of forecast combination, the portfolio method may have inspired Armstrong to describe the advantage of forecast combination as risk spreading.

(2) Hibon and Evgeniou (2005) also used the idea of risk reduction to explain the practical strength of forecast combination. They first used a simple model-selection criterion to select among forecasts and found a significant improvement in the accuracy of the selected combined forecast over that of the selected individual forecasts. Regarding this result, Hibon and Evgeniou (2005) said “[t]hese results indicate that the advantage of combining forecasts is not that the best possible combinations perform better than the best possible individual forecasts, but that it is less risky in practice to combine forecasts than to select an individual forecasting method.” By analogy, one can say that the advantage of a portfolio is not that it will perform better than the asset with the best possible return and the least risk, but that it is less risky and more practical than putting all one's money on an ideal asset that may not exist.

(3) To spread risk, one should diversify assets across industries with different economic characteristics; the more distinct the characteristics, the better. The same rule applies to forecast combination: to improve forecasting ability further, the components of the combination should be chosen from models or people with distinct information sets (Granger 1989; Wallis 2011).

All three examples show the theoretical similarity between portfolio selection and forecast combination (or, more critically, that forecast combination inherits its nature from portfolio selection).

2.2 Two-stage forecasting model (Crane and Crotty 1967): combining forecasts through multiple regression technique

Crane and Crotty (1967) is arguably another forerunner of Granger’s forecast combination. They combined forecasts through multiple regression, using one regressor produced by an exponential smoothing model (a time series model) together with other regressors (a multiple regression model). The two-stage forecasting model exploits the complementary characteristics of time series models and multiple regression models. A time series model uses the information contained in the historical movement pattern and is good at detecting and adjusting to changes in the forecast series, but it does not use the information in independent variables and fails to predict major changes in the trend; combining one into the other therefore improves forecasting ability. Notably, the two-stage forecasting model was not even an invention of Crane and Crotty (1967): it ‘has been successfully applied to a problem important to asset management in banks, the forecasting of demand deposit’ (505).

2.3 Combining multiple estimates before Bates and Granger (1969)

In the field of statistics, moreover, combining multiple estimates has been practiced since 1936 (Clemen 1989). Early proposals were made by Edgerton and Kolbe (1936) and Horst (1936). These authors respectively minimized the sum of squares of the differences of the standard scores for the estimates and maximized the pairwise separation among the sample points. Both techniques were similar in essence to least squares, although modern researchers no longer use them. Not long before 1969, a minimum squared-error combination of estimates was provided by Halperin (1961).

2.4 Granger’s remark

Granger described the idea of forecast combination as his own. In the obituary of Granger published in the International Journal of Forecasting, Elliott (2009) suggested that the idea of Granger causality likely inspired Granger to study forecasting further. That is, the question of whether one variable causes another turned into the question of whether the variable is valuable for forecasting another. Granger himself attributed the finding to his observation based on the work of Barnard (1963): he found that the result could be improved by taking a simple average. In his words, he and Bates then “just developed the idea” (Teräsvirta 1995, 587) that the prediction is quite possibly better if one combines the forecasts rather than throwing one of them away. Granger had said the same several years earlier, when he was invited to review the topic of forecast combination (Granger 1989); there, too, he traced the theoretical finding to his observation on Barnard (1963).

However, the mathematical equivalence between Markowitz (1952) and Newbold and Granger (1974), and the similarity in method between Crane and Crotty (1967) and Granger and Ramanathan (1984), both suggest that Granger may have learned from others.

Chapter 3 The change in economists' and statisticians' attitudes

Although forecast combination is widely accepted and has many applications today, it was heavily criticized by economists and statisticians when the idea was first introduced (Newbold and Granger 1974). The main concern of mainstream econometricians was to understand structural economic relationships through modelling, so they put most of their effort into aggregating information and constructing a robust model. In their view, a “good” model guarantees a “good” forecast. In contrast, forecast combination combines forecasts rather than the information on which those forecasts are based. This practice was at odds with the forecasting climate of the time.

3.1 Early appreciator

Although forecast combination is inferior to combining information directly, it can deal “with short, dirty time series with tools that managerial users of the forecast can understand” (Newbold and Granger 1974, 152). This comment was made by Mr. Stern, who worked in industry.

Early appreciators were practitioners. Mr. Craddock, who worked at the Meteorological Office, cited his own experience in support of forecast combination. He said:

Whatever the views held on the combining of forecasts of time series obtained by different methods, there is no doubt that combined long-range weather forecasts, each based on several predictions founded on different physical principles, are better on average than the predictions given by any single method (156).

This evidence shows that practitioners tended to value the pragmatic benefits of the pooling approach. Moreover, Bates and Granger (1969) was published in an operational research journal rather than an economics or econometrics journal.

3.2 Late appreciator: the development of computer technology

The change in attitude reflects an increasing appreciation of practical value. The popularization of forecast combination may be attributed to the development of computer technology³. Before the regression-based method was proposed (Granger and Ramanathan 1984), calculating a forecast combination was computationally intensive, because researchers had to calculate the k-by-k variance-covariance matrix.

Statistical software arrived only gradually: development of SAS began at North Carolina State University in 1966; SPSS was developed in 1968; and Minitab was developed at the Pennsylvania State University in 1972. Only then did regression analysis become easier to use; previously it could take up to 24 hours to receive the result of a single regression. This high time cost may be one of the reasons that kept experts from studying analytical tools for forecast combination further. Because statistical computing was still immature, Newbold and Granger (1974), which promoted the ‘fully automatic’ value of forecast combinations, drew heavy attacks from econometricians. It is likely also the reason why Crane and Crotty (1967) and other early forerunners were generally ignored.

Things began to change after 1975. The exhibit in Clemen (1989) shows an upward trend in the cumulative number of articles on forecast combination after 1975 (Exhibit 2).

The development of computer technology enabled empirical researchers to carry out large-scale computation, which favored the combining approach. The M-competition, a study that used 1001 time series to evaluate and compare the accuracy of different forecasting methods, showed that the simple average outperformed all the individual methods (Makridakis et al. 1982). A further investigation of combination confirmed the robustness of the averaging (combining) approach (Makridakis and Winkler 1983). The numerous empirical results in support of forecast combination and the timely publication of Granger and Ramanathan (1984) together popularized its use.

3 Developments in techniques or methods facilitate the development or popularization of theories. For example, the Newton-Raphson method was used to solve non-linear systems of equations, but because it required intensive computation, macroeconomic theorists hardly used it. Only with the popularization of an easier computational technique, the Gauss-Seidel method, in the 1970s were macroeconomic theorists able to develop more complex models (Evans 1969).


Exhibit 2 Cumulative number of articles published on combined forecasts (adapted from Clemen 1989)


Chapter 4 Evolution of forecast combinations and forecast encompassing

Forecast combination was generally not accepted by econometricians at first, so its development began in the field of operational research, where Bates and Granger published their paper.

4.1 Interaction between operational research and forecast combinations

Following Bates and Granger (1969), a stream of articles appeared in the same journal, Operational Research Quarterly (ORQ), including articles by Bunn (1975, 1977), Öller (1978), and Dickinson (1973, 1975). Dickinson (1973, 1975) investigated the estimation of the weights and their sampling distribution. Dickinson (1973) used the minimum-variance criterion to analyze the sampling distributions of the weights, deriving confidence limits for the estimates of the weights and of the variance of a combined forecast. The theoretical analysis showed that the weight estimates are unreliable, indicating only a limited improvement in accuracy. Dickinson (1975) later continued this study, showing that the minimum-variance criterion yields an error variance no greater than that of any of the component forecasts. More recently, Liang et al. (2006) refined Dickinson's (1973) results on the distribution of the optimal combining weights by establishing a model of an inverted linear combination of two dependent F-variates. Moreover, they generalized the combining model to the case of three independent competing forecasts.

Dickinson (1975) also discussed the statistical properties of the weight estimators with respect to the occurrence of negative weights. Bunn (1985) enriched the study of the sign of the weights by examining sign conditions under various weighting schemes: not only the error-variance-minimizing method but also equal weighting, optimal weighting under an independence assumption, and three variations of a Bayesian combination. While Bunn’s work gives a prototype for combining pairs of forecasts, Liang (1992) derived a general framework for combining multiple forecasts and provided a practical way to check the sign of the weights quickly. A brief summary is provided in Exhibit 3.


Exhibit 3 Theoretical development made by Dickinson (1973, 1975)

Bunn (1975, 1977), whose work on forecast combination also appeared in ORQ, suggested a Bayesian outranking approach to enhance performance in small samples. Bunn (1975) showed that a decision maker can meaningfully assign subjective probabilities, over a set of forecasting models, that one forecast will outperform another, and can update them through a Bayesian process. Bunn (1977) used simulation experiments to compare this ‘outperformance method’, which uses subjective probabilities about the relative forecasting ability of each predictor, with the minimum-variance method, and found that the former outperforms the latter when there is little prior information (fewer than 10 observations and possibly fewer than 30).

Öller (1978) developed the Bayesian framework with a set of self-scoring weights derived from the experts themselves. Each expert was asked to distribute subjectively a given sum of confidence weights over his own forecasts. When the sum of the confidence weights is limited, these weights can serve as weights for computing combined forecasts. Through a Bayesian process, records of the experts' previous performance can be used to adjust the confidence weights attached to the individual forecasts.

Operational research models are closely related to forecast combinations (Exhibit 4). The proposal and publication of the forecast combination method enriched the study of operational research; meanwhile, operational research models help forecast combination deal with multiple objectives. According to Clemen (1989), Lawrence and Reeves (1981), Reeves and Lawrence (1982), and Gulledge Jr et al. (1986) used multiple objective linear programming to minimize a composite of various error statistics, and Wall and Correia (1989) developed a preference-optimization approach.

Exhibit 4 Operational research model closely related to forecast combinations

4.2 Late development

The literature after Bates and Granger (1969) is enormous. The material selected in this chapter is based mainly on Granger’s review “Combining forecasts–twenty years later” (1989). More recent reviews exist (Wallis 2011), but they are essentially rooted in Granger (1989).

4.2.1 Different information set

In the original Bates and Granger (1969) setting, the two forecasts were based on the same information set. Separating each individual’s information from common information, Granger concludes that equal-weight combination is useful when forecasters share common information. Moreover, it is useful to include more forecasts in the combination, even forecasts based on the same information set, because a new forecast can improve the combination by adding individual information to the common information. Granger also illustrates the usefulness of negative weights.

An equivalent setting was developed independently by Kim (2001), working in accounting and finance (Wallis 2011).

4.2.2 Simple extensions

In the original Bates and Granger (1969) setting, past values of the forecasts were not used efficiently. Researchers are encouraged to include past values in the combination. If the data are not a stationary series, Granger recommended Hallman and Kamstra (1989) as well as other generalizations. Multiple-step forecasts are also worth trying.

Another extension is time-varying weights. Granger suggests a time-varying parameter regression using the Kalman filter; see also Engle et al. (1984).

Although Granger encourages including as many forecasts as possible, combination becomes complicated when many forecasts are available. Selecting among them is therefore important. One suggestion is trimming: rank all the forecasts by squared forecast error and keep only the best. Another is a decomposition (Figlewski 1983) in which the forecast error is split into two parts (a common error and an individual error) and the weights are determined by the relative sizes of the variances of the two components.

Combining forecasts with horizons longer than one period causes problems, because one forecast may outperform another in the short term but become worse in the long term, so the weights can vary with the horizon. Based on co-integration ideas, forecasts consistent with both short-term and long-term models at all horizons have been suggested (Engle et al. 1989). Transforming the scale of the data by taking logarithms may help. Granger has also encouraged the development of Bayesian updating schemes.
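The trimming idea, ranking forecasts by past squared error and keeping only the best, can be sketched as follows. This is an illustration under stated assumptions rather than a procedure taken from the literature cited here: the function name `trimmed_equal_weight`, the `keep` parameter, the equal weighting of the surviving forecasts, and the simulated data are all hypothetical choices.

```python
import numpy as np

def trimmed_equal_weight(y_hist, F_hist, F_new, keep):
    """Average the `keep` forecasts with the lowest historical mean squared error.

    y_hist: past realizations, shape (T,); F_hist: past forecasts, shape (T, M);
    F_new: current forecasts, shape (M,). Returns (indices kept, combined forecast).
    """
    y_hist = np.asarray(y_hist, dtype=float)
    F_hist = np.asarray(F_hist, dtype=float)
    mse = np.mean((F_hist - y_hist[:, None]) ** 2, axis=0)
    best = np.argsort(mse)[:keep]               # indices of the best-performing forecasts
    return best, float(np.mean(np.asarray(F_new, dtype=float)[best]))

# Illustrative data (assumption): four forecasters with increasing noise.
rng = np.random.default_rng(2)
y_hist = rng.normal(size=300)
noise_sd = np.array([0.2, 0.4, 1.5, 3.0])
F_hist = y_hist[:, None] + rng.normal(size=(300, 4)) * noise_sd
F_new = np.array([1.0, 1.2, 0.1, 4.0])
best, combined = trimmed_equal_weight(y_hist, F_hist, F_new, keep=2)
```

With these simulated data the two least noisy forecasters are retained, so the combined forecast is the average of their current forecasts.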

4.2.3 Combining probability distributions

More complicated extensions are associated with testing the conditions for encompassing. One necessary condition for model P to encompass model Q is that the economically relevant features (variance, confidence interval, quartile, etc.) of the one-step forecast from P dominate those of the corresponding forecast from Q. Testing whether a combination encompasses requires knowledge of how to combine probability distributions. A related question is the combination of quantiles: a pair of quantile forecasts, for example the first and third quartiles, can be combined to form a forecast of the interquantile range (Granger et al. 1989).

Combining probability distributions is more pertinent to the problem; it has been largely ignored by economists but well developed in business and management schools (Clemen 1989). Since no completely satisfactory combining technique existed in the literature, Granger proposed a possible method: first, find the quantile function corresponding to each distribution function; second, combine the two quantile functions and invert the result to obtain a sensible combined distribution function (Granger 1989).
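The two-step idea, combining quantile functions and then inverting the combination, can be sketched numerically. The text does not specify how the quantile functions are combined, so the weighted average used below is an assumption, as are the function names and the two normal distributions chosen for illustration.

```python
from statistics import NormalDist
import numpy as np

def combine_quantile_functions(dists, weights, probs):
    """Weighted average of the quantile functions Q_i(p) on a grid of probabilities."""
    q = np.array([[d.inv_cdf(p) for p in probs] for d in dists])
    return np.asarray(weights, dtype=float) @ q

def combined_cdf(x, probs, combined_q):
    """Invert the combined quantile function by linear interpolation to obtain F(x)."""
    return np.interp(x, combined_q, probs)

# Illustrative distributions (assumption): N(0, 1) and N(2, 3), equally weighted.
probs = np.linspace(0.001, 0.999, 999)
dists = [NormalDist(0.0, 1.0), NormalDist(2.0, 3.0)]
combined_q = combine_quantile_functions(dists, [0.5, 0.5], probs)
median = combined_cdf(1.0, probs, combined_q)
```

For two normal distributions, averaging quantile functions yields a normal distribution whose mean and standard deviation are the averages of the originals (here mean 1 and standard deviation 2), which is why the combined distribution function is close to 0.5 at 1.0. This differs from averaging the density functions, which would give a mixture.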

4.2.3.1 Axiom approaches

However, on the topic of the mathematical combination of probability distributions, it is inevitable to discuss axiomatic approaches, which focus on axiom-based aggregation formulas (Clemen and Winkler 1999). The aggregation of probability distributions has long been developed in management science and risk analysis journals; two common approaches are the ‘linear opinion pool’ and the ‘logarithmic opinion pool’.

The ‘linear opinion pool’ was proposed by Stone (1961) in the article “The opinion pool”, which put forward a weighted linear combination of the forecasters' probabilities. It combines subjective probability distributions mathematically to reach a group consensus. Let $f_i(\theta)$ represent the probability distribution of subject $i$ for a parameter $\theta$. The consensus, denoted as a single distribution $f(\theta)$, can be written as a weighted average:

$$f(\theta) = \sum_i w_i f_i(\theta), \quad \text{where } w_i \ge 0 \text{ and } \sum_i w_i = 1.$$

Several weighting schemes have been proposed for the method, including the simple average, weights by ranking, weights by self-rating, and weights according to previous performance. Simply put, the weights are determined subjectively (Clemen and Winkler 1999).
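A minimal sketch of the linear opinion pool for discrete distributions follows. The function name, the three-outcome example and the weights are hypothetical; the weights are simply given, in line with the subjective weighting schemes listed above.

```python
def linear_opinion_pool(distributions, weights):
    """Weighted average of probability vectors defined over the same outcomes."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    n_outcomes = len(distributions[0])
    return [
        sum(w * dist[k] for dist, w in zip(distributions, weights))
        for k in range(n_outcomes)
    ]


# Two hypothetical experts assessing three mutually exclusive outcomes.
experts = [
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
pooled = linear_opinion_pool(experts, [0.75, 0.25])
print(pooled)
```

Because the weights are non-negative and sum to one, the pooled vector is again a probability distribution.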

Another axiomatic approach is the logarithmic approach. The ‘logarithmic opinion pool’ is usually written in geometric form as

$$\frac{\prod_i f_i^{w_i}(y)}{\int \prod_i f_i^{w_i}(y)\, dy}.$$

Logarithmic combination has attracted scholars' attention because of a strength of its own: it is convenient to manipulate. Whether one first combines the individual distributions and then updates the combined distribution by Bayes' rule, or first updates the individual distributions and then combines them, the logarithmic pool yields the same result; this property is known as external Bayesianity (Clemen and Winkler 1999, Wallis 2011).
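External Bayesianity can be checked numerically on a small discrete example. The sketch below assumes weights that sum to one and a likelihood shared by all experts; the numbers and function names are hypothetical.

```python
import math


def normalize(values):
    """Rescale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def log_opinion_pool(distributions, weights):
    """Normalized weighted geometric mean of probability vectors."""
    n_outcomes = len(distributions[0])
    raw = [
        math.prod(dist[k] ** w for dist, w in zip(distributions, weights))
        for k in range(n_outcomes)
    ]
    return normalize(raw)


def bayes_update(prior, likelihood):
    """Posterior proportional to prior times likelihood."""
    return normalize([p * l for p, l in zip(prior, likelihood)])


experts = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
weights = [0.7, 0.3]           # assumed to sum to one
likelihood = [0.9, 0.4, 0.1]   # hypothetical evidence observed by everyone

pool_then_update = bayes_update(log_opinion_pool(experts, weights), likelihood)
update_then_pool = log_opinion_pool([bayes_update(e, likelihood) for e in experts], weights)
print(pool_then_update, update_then_pool)
```

The two orders agree because the likelihood enters the pooled product raised to the sum of the weights, which is one.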

4.2.3.2 Bayesian Approaches

Around the 1980s, rising interest in the Bayesian approach shifted attention from the axiomatic approach to the development of Bayesian combination models (Bunn 1989)⁴.

4 According to Bunn (1989), at the time he published Bunn (1975, 1977), there was an increasing acceptability of the “Bayesian approach to using multiple experts and different sources of evidence” (162), and this trend “reinforced the alternative idea of using multiple models for forecasting” (162).


Winkler (1968) and Morris (1974) proposed a general Bayesian updating scheme to combine information and assess differential weights. Though some authors credit Morris (1974) with first establishing the Bayesian consensus model (Hall and Mitchell 2007), Winkler (1968) was probably the first researcher to propose the basic framework. This Bayesian framework, called the "natural conjugate method", investigates the consensus of subjective probability distributions. He assumed that $f(\theta)$ represents a prior distribution, where $\theta$ is the uncertain variable and $i$ is information, and stated Bayes' theorem in the form

$$f(\theta \mid i) = \frac{f(\theta)\, l(i \mid \theta)}{\int f(\theta)\, l(i \mid \theta)\, d\theta},$$

where $l(i \mid \theta)$ is originally interpreted as a sampling distribution or a likelihood function.

Morris (1974) enriched this Bayesian interpretation by decomposing the information z into two parts: one from the expert (denoted as $e$) and the other from the decision maker (denoted as $d$). The decision maker's prior probability assessment of $\theta$, $f(\theta \mid d)$, is altered upon receiving the expert's probability assessment of $\theta$, $g(\theta \mid e)$. The likelihood function $l(\cdot)$ therefore expresses how credible the decision maker subjectively finds the expert's probability assessment. The posterior probability distribution of the decision maker can be written as

$$f(\theta \mid g(\theta \mid e), d) = \frac{f(\theta \mid d)\, l[g(\theta \mid e) \mid d]}{\int f(\theta \mid d)\, l[g(\theta \mid e) \mid d]\, d\theta},$$

where $\int f(\theta \mid d)\, l[g(\theta \mid e) \mid d]\, d\theta$ is the aggregation of the probability assessments of both the decision maker and the expert. Owing to this sophisticated reinterpretation of Winkler (1968), Morris (1974) has been credited as the first theoretical paper wholly consistent with the Bayesian view of probability. Notably, Morris (1974) was published in the same journal, Management Science, as Winkler (1968), which suggests a line of inheritance between them.

Decisions in the face of uncertainty should be based on all available information, which requires combining information obtained from models and experts; in the real world, however, experts share common training and experience, so some dependence among them is inevitable. To address this issue, Winkler (1981) presents a theoretical model that formally allows dependence among experts without requiring a prior for a particular form of consensus density function. Results for the normal case were presented, and the consensus distribution was found to be sensitive to the degree of dependence.

Inspired by Winkler (1981), Agnew (1985) further extended the Bayesian consensus model to the case in which dependent experts provide probability assessments of multiple unknown parameters. Moreover, he developed a Bayesian sequential updating procedure, which uses the experts' past performance to determine the weights in each period.

The literature expanded but was frustrated by the practical difficulty of finding the likelihood function. Because of this, effort has gone into practical models for aggregating single probabilities and probability distributions (Clemen and Winkler 1999).

4.2.3.2.1 Bayesian combinations of event probabilities

For Bayesian combinations of event probabilities, there are the independence approach, the Genest and Schervish approach, the Bernoulli approach, and the normal approach (Clemen and Winkler 1999).

4.2.3.2.2 Bayesian models for combining probability distributions

On the other front, Bayesian models have been developed for combining probability distributions for the continuous occurrence probability of a certain event.

The normal model has been important in this field. According to Liang and Shih (1994), the typical minimum-variance model for combining forecasts (Bates and Granger 1969, Newbold and Granger 1974) is consistent with the normal model (Winkler 1981, Bordley 1982). Moreover, a rewritten regression model (Granger and Ramanathan 1984) is equivalent to the normal model as well (Bordley 1986). The relations are shown in Exhibit 5.


Exhibit 5 Connection between Bayesian Models for Combining Probability Distributions in the Normal settings and forecast combinations

Under some necessary assumptions, Bayesian combinations of probability distributions are equivalent to forecast combination (Liang and Shih 1994). The Bayesian model in Winkler (1981) is equivalent to Newbold and Granger (1974) under the assumptions of a normally distributed prior, a normally distributed likelihood and location invariance.

Following Winkler (1981), the Bayesian model in Bordley (1982) is also equivalent to Newbold and Granger (1974) under the assumptions of a uniformly distributed prior, a normally distributed likelihood, a known variance-covariance matrix and location invariance.

Although the prior in Bordley (1982) is not normal, it is unimodal and symmetric⁵, so this is generally not a problem (Clemen and Winkler 1999). In addition, the mean of the posterior density in Bordley (1986) is equivalent to the rewritten regression model in Granger and Ramanathan (1984) under the assumptions of a normally distributed prior and a normally distributed likelihood. A summary is provided in Table 2.
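The equivalence between the normal model and the minimum-variance combination can be made concrete with a short derivation. This is a sketch under assumptions of my own, not a quotation from the cited papers: $k$ forecasts $x = (x_1, \dots, x_k)'$ of $\theta$, forecast errors $x - \theta\mathbf{1}$ that are jointly normal with mean zero and known covariance matrix $\Sigma$, and a flat prior on $\theta$.

```latex
% Posterior of theta under the assumed normal model with a flat prior
\theta \mid x \sim N(\mu^{*}, \sigma^{*2}), \qquad
\mu^{*} = \frac{\mathbf{1}'\Sigma^{-1}x}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}, \qquad
\sigma^{*2} = \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}.

% The posterior mean is a weighted average of the forecasts, with weights summing to one
w = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}, \qquad \mathbf{1}'w = 1.

% Two-forecast case, with error variances sigma_1^2, sigma_2^2 and covariance sigma_12
w_1 = \frac{\sigma_2^{2} - \sigma_{12}}{\sigma_1^{2} + \sigma_2^{2} - 2\sigma_{12}}, \qquad
w_2 = 1 - w_1.
```

In the two-forecast case the weight depends on both error variances and on their covariance, which is the form of the minimum-variance weight associated with Bates and Granger (1969).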

Table 2 Bayesian equivalence of the theories of forecast combination

The theoretical evolution continues. Anandalingam and Chen (1989) generalized the results of Winkler (1981) and Bordley (1982, 1986), deriving each of their models under different conditions. Liang and Shih (1994) further relaxed the assumption that the decision maker's prior is unbiased.

5 Even if the unimodal prior is just roughly symmetric, that would not be a problem (Clemen and Winkler 1999).

Although the normal model has been popular, it has some shortcomings; the most obvious is that a normal prior is required. As a consequence, several extensions have been proposed (Clemen and Winkler 1999).

4.3 Forecasts encompassing

Nelson (1972) and Cooper and Nelson (1975) raised the issue of forecast encompassing using exactly the same formula as Bates and Granger (1969). The apparent similarity of forecast encompassing and forecast combinations once induced me to categorize Nelson and Cooper's work as merely an extension of Granger's work, but the essential difference in their ideas prevents me from doing so. In contrast to Bates and Granger (1969), who skipped the evaluation step and were satisfied with combining information simply by combining multiple forecasts, Nelson (1972) and Cooper and Nelson (1975) evaluated the informational increment that an econometric model offers over a time-series model, with the intention of synthesizing a model from the combined information set. Nelson (1972) concludes that significant weights on both models in the combining regression reflect the inability of either model to include all available information. Cooper and Nelson (1975) followed the previous study, examining the reduction in prediction errors in a post-sample test through the significance of the t-statistics in the combining regression.

4.3.1 Similar to Bates and Granger (1969) at first appearance

Nelson (1972) wrote down a formula for composite forecasts in order to evaluate the prediction performance of the FRB-MIT-PENN (FMP) econometric model of the U.S. economy, using simple time-series models, empirical representations of individual endogenous variables as stochastic processes of integrated autoregressive moving average (ARIMA) form, to establish standards of accuracy. The formula is

$$A_t = \beta\,(FMP)_t + (1 - \beta)\,(ARIMA)_t + \varepsilon_t,$$

from which $\hat{\beta}$ is derived as

$$\hat{\beta} = \frac{\sum \left[(FMP)_t - (ARIMA)_t\right]\left[A_t - (ARIMA)_t\right]}{\sum \left[(FMP)_t - (ARIMA)_t\right]^2},$$

which after transformation is exactly the same as the product of the variance-covariance method in Bates and Granger (1969),

$$\hat{\beta} = \frac{\sum u_{2t}^2 - \sum u_{1t}u_{2t}}{\sum u_{1t}^2 + \sum u_{2t}^2 - 2\sum u_{1t}u_{2t}}.$$

However, he did not use $\hat{\beta}$ as a weight to optimize forecasting ability; rather, he interpreted the results in another way.
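The transformation between the two expressions can be verified numerically. The sketch assumes that $u_{1t} = A_t - (FMP)_t$ and $u_{2t} = A_t - (ARIMA)_t$ are the forecast errors of the two models, a definition inferred from the algebra rather than stated in the text, so that $(FMP)_t - (ARIMA)_t = u_{2t} - u_{1t}$. The data are hypothetical.

```python
def beta_regression(actual, fmp, arima):
    """Least-squares slope from regressing (A - ARIMA) on (FMP - ARIMA), no intercept."""
    numerator = sum((f - r) * (a - r) for a, f, r in zip(actual, fmp, arima))
    denominator = sum((f - r) ** 2 for f, r in zip(fmp, arima))
    return numerator / denominator


def beta_error_form(actual, fmp, arima):
    """The same weight written in terms of the two forecast-error series."""
    u1 = [a - f for a, f in zip(actual, fmp)]    # assumed: errors of the FMP model
    u2 = [a - r for a, r in zip(actual, arima)]  # assumed: errors of the ARIMA model
    s11 = sum(x * x for x in u1)
    s22 = sum(x * x for x in u2)
    s12 = sum(x * y for x, y in zip(u1, u2))
    return (s22 - s12) / (s11 + s22 - 2.0 * s12)


# Hypothetical actual values and the two competing forecasts.
actual = [2.0, 2.5, 3.1, 2.8, 3.6]
fmp = [2.3, 2.4, 3.5, 2.6, 3.9]
arima = [1.8, 2.9, 2.8, 3.0, 3.2]

b_reg = beta_regression(actual, fmp, arima)
b_err = beta_error_form(actual, fmp, arima)
print(b_reg, b_err)
```

Both functions return the same number up to rounding, since the numerator $\sum (u_{2t} - u_{1t})u_{2t}$ and the denominator $\sum (u_{2t} - u_{1t})^2$ expand to the sums in the second expression.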

4.3.2 Inherited the spirit of Markowitz (1952)

Nelson (1972) drew an analogy between his work and that of Markowitz (1952), saying that the $\hat{\beta}$ in the encompassing formula is just “as the weight for a minimum variance two-asset portfolio depends on the covariance of returns as well as on return variances” (Nelson 1972, 911). In brief, both measure the influence of one thing (a forecasting model or a portfolio) by its impact on the whole rather than by its individual properties.

Rather than measuring the individual errors of each model, they use a composite forecast as a benchmark to measure the reduction in expected loss associated with a combined formula. The evidence showed that the composite models were largely more accurate than the FMP models but more accurate than the ARIMA models only in some cases, reflecting the inefficiency of certain FMP models in the sense that combining an FMP model with an ARIMA model significantly reduced the forecasting error of the FMP model.

Nelson (1972) takes the viewpoint of the decision maker, making an overall evaluation of the information contained in the models. The question of whether one model or the other is more accurate is irrelevant: the decision maker's objective is to minimize expected loss, so the contribution of a model should be measured by comparison with the composite model. He wrote (Nelson 1972):

[F]rom the viewpoint of the decision maker the question of whether one set of predictions or the other is more accurate is irrelevant. Since his objective is to minimize expected loss, he will purchase any piece of information which reduces expected loss by more than its cost. Thus, the value of the ARIMA predictions, for example, is not measured by their individual errors, but rather by the contribution which they are able to make to the reduction in expected loss associated with a composite prediction or a set of composite predictions (913).

The underlying idea is just the same as in Markowitz (1952). Markowitz's work showed that it is of little use to look at a security's own risk; to an investor, the important thing is the contribution the security makes to the variance of the entire portfolio.

Comparable in idea to Markowitz, Nelson recognized that what matters to a decision maker is the overall contribution one model makes to the forecasting ability of the composite forecast, rather than the accuracy of each model evaluated individually.

4.3.3 Forecast combinations and encompassing

The connection between forecast combinations and encompassing has been discussed in several studies. Note, however, that neither Nelson (1972) nor Cooper and Nelson (1975) used the term ‘forecast encompassing’; it was not until the 1980s that the term was coined (Mizon 1984).

This model evaluation technique, which essentially coincides with the forecast combination formula, thereafter aroused interest in the connection between forecast encompassing and forecast combinations.

After Mizon (1984), given the changing characteristics of the economic system and the consequent inability of any individual model to perform acceptably overall, Chong and Hendry (1986) were motivated to investigate the situations in which system evaluation techniques are suitable. Their motivation indicates that the paper was in line with Nelson's work: in essence, efforts to improve model specification were encouraged. They investigated four system evaluation techniques, one of which was forecast encompassing. Comparing the empirical and theoretical suitability of forecast encompassing with that of conventional methods, they concluded that forecast encompassing is both feasible and more promising than the others.

Following Chong and Hendry (1986), Diebold (1989) adopted the view of the combining regression as an information encompassing test. Although his work argued for the pragmatic virtues of forecast combinations, Diebold still doubted the efficiency of combining forecasts. In the end, he emphasized using the combining regression to facilitate the combination of information sets.

Fang (2003) extended Diebold (1989), demonstrating encompassing tests not only as tools for model specification but also, conversely, as tools for explaining the accuracy improvement of forecast combinations. Liang and Ryu (2003) further showed encompassing tests to be a valuable principle for choosing the forecasts in the combining regression. Though without reference to Fang (2003), Liang and Ryu (2003) also established a two-way interaction between forecast combinations and forecast encompassing.

Recently, more complex econometric models have been connected to the study of forecast encompassing, such as nested models, quantile forecasts and probability forecasts (Clements and Harvey 2010; Clements and Harvey 2011).

A brief summary is provided in Exhibit 6.

Exhibit 6 Evolution of forecasts encompassing

4.4 A modified exhibit

From the perspective of theoretical development, the evolution of forecast combination is presented in Chapter 1. From the perspective of the early application of pooling ideas, the impacts of portfolio selection and the two-stage forecasting model on forecast combination are suggested (Chapter 2). The later development of forecast combination is briefly introduced in this chapter, where the impact of portfolio selection on forecast encompassing is also suggested.


In sum, a modified exhibit based on Clemen (1989)’s review is provided in Exhibit 7.

Exhibit 7 A modified exhibit based on Clemen (1989)


Chapter 5 The ensemble ideas in natural science

Natural science suffers no less from uncertainty than social science. Chaos in fluid dynamics, the absence of an always-best model for earthquake prediction and the complexity of ecological systems all deeply trouble scientists who pursue accurate prediction. Combination techniques are therefore employed by natural scientists. In a sense, they are just like a businessman who faces uncertainty in business and addresses the problem by seeking a better prediction method.

Researchers interested in the theory of combination may consult this thesis for a brief understanding of how other researchers came up with the combination idea and how it connects to Granger's forecast combinations. Put simply, this chapter stresses the motivation for using combination ideas, their evolution, and their connection to forecast combinations.

5.1 Ensemble forecasting in meteorology

Numerical weather forecasting was developed long ago, but the innately chaotic nature of the climate system makes precise prediction impossible. Slightly before 1969, forecasts were averaged to obtain more precise results. The concept of combining soon took root in meteorologists' minds.

The motivation for meteorologists to develop techniques of forecast combination is uncertainty, just as it was for Markowitz: the main spirit of Markowitz's portfolio theory is the uncertainty underlying portfolios. Faced with uncertainty, both take the same approach, averaging estimates to generate a representative estimate.

This field may not be familiar to economists, so I will first introduce its history and operational method.

5.1.1 Development of numerical weather prediction

It was not until the mid-19th century that scientists came up with the idea that it should be possible to forecast weather from calculations based upon natural laws (FitzRoy 1863). Later, along with developments in physics (Bjerknes 1904) and computer technology, Charney (1951) succeeded in applying his barotropic equation set (Richardson 2007). Weather forecasting has thereby come to dominate meteorology through its application in numerical weather prediction (NWP).


Forecasting accuracy was poor in the 1950s, even for a two-day forecast (Kalnay et al. 1998). However, the forecasting error was reduced remarkably in the following decade. Technology brought breakthroughs; take a leading forecasting agency, the Met Office in the United Kingdom, as an example. From 1954 to 1966, the first operational system was established; from 1967 to 1978, computers already enabled scientists to work with a 10-level model that solved the Navier–Stokes equations for several weather characteristics (including fluid motion, thermodynamics, heat transfer and the continuity equations); from 1976 to 1992, NWP advanced to the mesoscale, an intermediate scale between that of weather systems and that of microclimates, while a new 15-level model was developed to replace the 10-level model in 1982 for use in global aviation; more recently, effort has been put into the development of a unified climate–forecast model (Golding et al. 2004). The improvement of NWP has basically followed the improvement of computer equipment.

No wonder D. Hendry once complained to the British government that “when the official weather forecasting service missed correctly forecasting a particularly damaging storm the response was to buy larger computers for the forecasters; when the economic forecasters failed to predict a major economic event it was decided to substantially reduce support for research in our area” (Granger 2001, 478).

5.1.2 Introduction of ensemble forecasting

Ensemble forecasting is a branch of numerical weather prediction (NWP) that allows us to estimate the uncertainty in a weather forecast as well as the most likely outcome. Instead of running the NWP model once, the model is run many times from very slightly different initial conditions to deal with the chaotic nature of weather. Two procedures are often used to modify the model settings within an ensemble. One is the multi-model ensemble, which uses more than one model within the ensemble. The other is the multi-physics ensemble, which uses the same model but with different combinations of physical parameterization settings (Organization 2012).

It follows that, rather than a deterministic forecast, an ensemble forecast produces a probability that an event will occur at a particular location. In fact, ensemble forecasts produce more than probability forecasts, as covered later. Exhibit 8 illustrates how a multi-model ensemble samples the uncertainty of the forecast.


Exhibit 8 Concept of ensemble forecast (adapted from Fritsch et al. 2000, 572).

6.1.3 Operational method

Ensemble forecasts are applied to weather forecasts of differing range; the types include global, regional and convective-scale ensemble forecasts, with observation domains ranging from 70 kilometers to 1 kilometer. The standard ensemble forecast products include not only the probability forecast mentioned above but also the ensemble mean, ensemble spread, quantiles, spaghetti maps, postage stamp maps and site-specific meteograms (Organization 2012).
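The simplest of these products can be sketched for a single location. The example assumes equally weighted members, takes the spread to be the standard deviation across members, and uses hypothetical rainfall values and a hypothetical threshold; operational systems are far more elaborate.

```python
from statistics import mean, pstdev


def ensemble_summary(members, threshold):
    """Ensemble mean, spread and probability of exceeding a threshold."""
    exceed = sum(1 for value in members if value > threshold)
    return {
        "mean": mean(members),
        "spread": pstdev(members),             # assumed definition of spread
        "probability": exceed / len(members),  # share of members above the threshold
    }


# Hypothetical 24-hour rainfall forecasts (mm) from eight ensemble members.
members = [0.0, 1.2, 0.4, 3.5, 2.1, 0.0, 5.0, 0.8]
summary = ensemble_summary(members, threshold=1.0)
print(summary)
```

Four of the eight members exceed the threshold, so the probability forecast of the event is one half, while the spread indicates how far the members disagree.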

Assigning weights in a combination is most critical in ensemble forecasting. The process generally starts from sampling uncertainties and assigns more weight to forecasts with high resolution. Examples of what high-resolution numerical forecasts can capture are clearly differing weather characteristics within a location, such as the differing thermal, wind and precipitation patterns in a valley or on a slope, or the different weather patterns on opposite sides of a weather barrier (e.g. the Alps, Andes or Appalachians) and the resulting distinct thermal, wind and precipitation patterns.

Because forecasts in these areas are high-resolution and therefore highly precise, forecasters call them “high resolution” or “high control”. Forecasters assign weights in the ensemble by comparing the relative capabilities of ensemble members with the high-resolution/control forecast (EPS 2006).

Subjective modification of the weights in an attempt to improve the distribution is accepted but not encouraged. Subjective modification may help for certain very short-period forecasts and for local forecasts over a small area, but not for longer-period forecasts or forecasts over a large area. Using the whole ensemble distribution in a probabilistic approach is strongly recommended (Organization 2012).

5.1.3 Motivation and evolution of ensemble forecast

5.1.3.1 Chaos

Lorenz (1963) mathematically described the unstable character of fluid dynamics. Given a finite system of deterministic nonlinear ordinary differential equations designed to represent hydrodynamic flow, the solution is sensitive to the initial conditions. Slightly differing initial states will quite frequently end up in considerably different states, which correspond to calm or stormy weather ('Edward Lorenz'). Moreover, the chaotic nature of fluid dynamics and the limited observational data inevitably lead to uncertainty in the estimate of the true initial state. As a consequence, it is hard for a single set of initial states to properly predict the final state.

This influential paper established the field of chaos theory and led to the theoretical development of ensemble forecasting.
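Sensitivity to initial conditions can be illustrated with the three-variable system commonly associated with Lorenz (1963). The sketch assumes the usual parameter values (10, 28 and 8/3), a fourth-order Runge-Kutta integrator with a step of 0.01, and an initial gap of 1e-8; none of these choices comes from the text above, and the function names are hypothetical.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three-variable Lorenz system (assumed parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def max_separation(state_a, state_b, steps=4000, dt=0.01):
    """Largest distance reached between two trajectories over the run."""
    largest = 0.0
    for _ in range(steps):
        state_a = rk4_step(state_a, dt)
        state_b = rk4_step(state_b, dt)
        distance = sum((a - b) ** 2 for a, b in zip(state_a, state_b)) ** 0.5
        largest = max(largest, distance)
    return largest


initial_gap = 1e-8
separation = max_separation((1.0, 1.0, 1.0), (1.0 + initial_gap, 1.0, 1.0))
print(separation)
```

Two runs that start almost identically end up far apart, which is why a single run from one estimated initial state is an unreliable guide and an ensemble of perturbed runs is used instead.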

Incidentally, Sanders (1963) investigated the uncertain nature of subjective probability forecasts by averaging subjective weather forecasts, illustrating the superiority of the combined procedure. Later, Bosart (1975) also tested the performance of averaged subjective weather forecasts and confirmed the results of Sanders (1963).

5.1.3.2 Early development

A stochastic method was applied to “assess the value of new or improved data” (Epstein 1969, 739). The results of the Monte Carlo experiment show that stochastic dynamic predictions significantly outperformed the traditional deterministic procedure in terms of mean square error.

Ensemble practice continued. As mentioned before, in 1974 Craddock, who worked at the Meteorological Office, was invited to the discussion of Newbold and Granger (1974) and commented on forecast combination from the perspective of ensemble weather forecasting. Sharing the idea of combination, professors of economics and business cooperated with meteorologists. Winkler and Murphy (1977) showed that the average of subjective precipitation probability forecasts outperformed the individual opinions of numerical weather forecasters. Clemen and Murphy (1986) averaged subjective weather forecasts and model output statistics, again finding the combined product superior to the individual ones. Inspired by meteorology, professors of economics and business applied forecast combination techniques in the context of weather prediction. Clemen (1985) built on the Bayesian framework outlined by Morris (1974) to discuss, in the context of precipitation probability forecasts, whether a human weather forecaster can bring new information to the mechanical guidance forecast and whether one human weather forecaster brings incremental information over another.

To sum up, meteorologists have tried to model the weather system since 1951, but its innately chaotic nature prevents them from doing so easily. Ensemble forecasting has been practiced since 1963 but did not mature until 1999, thanks to ideas from forecast combinations (Newbold and Granger 1974) and advances in computing power.

Professors of economics and business were also inspired by meteorologists, conducting forecast combination research in the context of weather forecasting, such as applying forecast combination techniques (Winkler and Murphy 1977) or investigating informational contributions in weather forecast combinations (Clemen 1985). A brief summary is provided in Exhibit 9.

Ensemble forecasting provides more information to weather forecasters. In the process of producing an ensemble forecast, researchers benefit from comparisons. If the estimated forecasts vary a lot, researchers can calculate a probability that reflects the uncertainty of the final product. On the other hand, if the estimated forecasts are all very similar, forecasters may have more confidence in the accuracy of the final product ('Ensemble Forecasting'). The application of ensemble forecasts is not restricted to weather forecasting; it also extends to flood forecasting and many other projects (Cloke and Pappenberger 2009).


Exhibit 9 Evolution of ensemble forecasts

5.2 Seismology and Ecology

To enrich the findings on the application of combination ideas, the combination ideas embedded in seismology and ecology are illustrated in this thesis as well, which potentially enlarges the modified exhibit in Section 4.4.

5.2.1 Seismology

The well-known fact that “[e]arthquake prediction research has been conducted for over 100 years with no obvious successes” (Geller 1997, 425) gave me an impetus to dive into this branch of study. Whether to address uncertainty in the choice of proper prediction models (Cooper and Nelson 1975; Nelson 1972) or to improve forecasting accuracy (Sanders 1963; Bates and Granger 1969), combination techniques have been widely applied as a natural answer. However, it was not until the late 1990s that seismologists started to use combination techniques in papers. Most reviews of seismology still ignore the developing combination techniques (Ben-Menahem 1995; Agnew 2002). Seismologists still mainly use the term ‘combination’ in a physical sense, such as the ‘combination of point forces’.

Without doubt, practical concern is the most important motive for seismologists to use combination techniques. After all, the more precisely we can avoid the destruction of earthquakes, the safer we are. Combination techniques have been applied in earthquake forecasting and also in earthquake engineering, such as seismic response spectra and nonlinear dynamic analysis, to reach reliable estimates.

Marzocchi et al. (2012) illustrated the uncertainty problem that seismologists have met with and clearly pointed out the practical value of merging models, saying that:

[I]n practice, we never know which of the candidate models will be the best in a long testing phase. We also note that the best candidate model may capture one important part of the earthquake generation process well, while others might suitably represent secondary, or at least more subtle, features (2577).

The spirit and the response of the cited paragraph are just the same as those in economic forecasting. Both use combination techniques to address uncertainty problems (Winkler 1989, 608):

In our uncertain and rapidly changing world, I think that adhering strictly to this [the development of ‘true’ model] ideal is counterproductive in most important forecasting situations. I prefer to view forecasts as information and the combining of forecasts as the aggregation of information. The key question is how best to accomplish this aggregation (606).

Regarding the uncertainty problem, neither Marzocchi et al. (2012) nor several scholars of economic forecasting agree with the practice to “simply adopt the model that has performed best so far and disregard all others” (2577). They all argue for forecast combination on the ground that some of its component models may outperform the best model to date in other periods.

5.2.1.1 Earthquake forecasting

Combination techniques appeared relatively late in earthquake forecasting. Fedotov et al. (1977) mentioned the idea while comparing two statistical methods for earthquakes, and devoted just one sentence to it in the middle of the paper: “Thus, a combined use of various methods seems to be one of the hopeful ways of increasing efficiency of prediction” (320). Nothing more appears in the introduction or the conclusion, as if nobody would care about the suggestion.

To my knowledge⁶, seismologists did not come up with the idea of combination again until the 1990s. Sobolev et al. (1991) followed Fedotov et al. (1977) in compiling maps of expected earthquakes, and both found positive forecasting ability in real time. More pertinent to this paper, Sobolev et al. (1991) discussed the issue of combination further. Using the Bayesian formula, they combined the probability of expectation of a large earthquake with prognostic precursors. Up to 1989, three earthquakes occurred within the areas of expected earthquakes but outside their centers, while two areas indicated a high probability of an earthquake but no strong earthquakes had yet been reported. Critics, though admitting that this paper pushed forward the concept of using combination techniques to improve forecasting ability, pointed out that the interdependence of the combined elements may be a potential problem (Shebalin et al. 2014).

6 Though Sobolev et al. (1991) mentioned Aki (1981) as a forerunner of precursor combinations as well, I found that Aki (1981)’s work is more pertinent to a universal measurement that would be useful for

Other than the Bayesian method, because “[i]t is well known that combining many models ……may yield higher performances than any individual member” (37), seismologists are working on ways to combine forecasts with their own methods. Shebalin et al. (2014) proposed a rate combination method. It transforms different model outcomes (for example, some measured in levels, some measured in counts) onto one base and then multiplies the derived parameter by the earthquake occurrence rate to obtain a new combined model.

Since the 2000s, more papers have aimed at combining short-term and long-term models, and these exhibit a ‘suspected’ close relation between earthquake forecasting and forecast combinations. I use the word ‘suspected’ because, though there is no direct citation of Granger or other scholars in economics and business, the similarity in tools and spirit indicates a strong tie between the two subjects.

Rhoades and Gerstenberger (2009) combined the short-term earthquake probability (STEP) forecasting model with the long-range earthquake forecasting model EEPAS.

Each model is typically based on time, density and location, and uses a Poisson function to generate a prediction: an earthquake occurrence rate. The authors combine the rate-based models by using the relative performance of each model as the measure for choosing the weights in a weighted average formula, just as in forecast combinations.

Moreover, they also evaluate model performance with conventional statistical tools such as the AIC. Technically, the difference between forecast combinations and their mixture models is just that seismologists tend to write down a likelihood function and use the simplex method, a popular algorithm for linear programming, to solve the optimization problem. Rhoades (2013) extended the study to a wider range.
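The idea of choosing a mixture weight by likelihood can be sketched as follows. This is not the procedure of Rhoades and Gerstenberger (2009): the two rate series and the observed counts are hypothetical, the Poisson likelihood over cells is an assumed simplification, and a plain grid search stands in for the simplex method.

```python
import math


def poisson_log_likelihood(rates, counts):
    """Log-likelihood of observed counts under independent Poisson rates."""
    return sum(
        n * math.log(r) - r - math.lgamma(n + 1)
        for r, n in zip(rates, counts)
    )


def mixture_rates(weight, short_term, long_term):
    """Weighted average of two rate forecasts, cell by cell."""
    return [weight * s + (1.0 - weight) * l for s, l in zip(short_term, long_term)]


def best_mixture_weight(short_term, long_term, counts, grid=1001):
    """Grid search over weights in [0, 1] (a stand-in for the simplex method)."""
    candidates = [k / (grid - 1) for k in range(grid)]
    return max(
        candidates,
        key=lambda w: poisson_log_likelihood(mixture_rates(w, short_term, long_term), counts),
    )


# Hypothetical expected earthquake counts per cell from two models, and observed counts.
short_term = [0.2, 1.5, 0.1, 0.9]
long_term = [0.6, 0.5, 0.4, 0.7]
counts = [0, 2, 0, 1]

weight = best_mixture_weight(short_term, long_term, counts)
combined = mixture_rates(weight, short_term, long_term)
print(weight, combined)
```

Because the grid includes the weights 0 and 1, the fitted mixture can never have a lower in-sample likelihood than either component model on its own.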

5.2.1.2 Seismic analysis

Seismic analysis is an earthquake building engineering and is a subset of