Leading Indicator Identification - Applying Bayes' Theorem and Discriminant Analysis to Build a Financial Crisis Early Warning System

For the 48 input variables, each with 120 samples, we first use F statistics for feature selection. Table 5.2 presents the results of using ANOVA to evaluate the F statistic and the p-level of every input variable, ordered by significance. The F value tests the statistical significance of the differences among the means of the classified samples.

Table 5.2: Key variables extracted by F statistics for ARGENTINA

| Variable | F statistic | p-level | Variable | F statistic | p-level |
|---|---|---|---|---|---|
| TD/FR | 175.46345 | 0.00000 | DC/GDP | 10.97127 | 0.00129 |
| M2/FR | 90.87242 | 0.00000 | FR/IM | 10.35467 | 0.00175 |
| (EX - IM)/GDP | 82.22062 | 0.00000 | 12m M1/FR G | 8.55043 | 0.00428 |
| RIR | 81.08383 | 0.00000 | 12mIM G | 8.22778 | 0.00504 |
| CA/GDP | 78.96180 | 0.00000 | TD/GDP | 8.07963 | 0.00544 |
| INR | 71.51459 | 0.00000 | 12mM1 G | 7.61407 | 0.00690 |
| mM2 G | 55.57029 | 0.00000 | M1/FR | 7.01714 | 0.00940 |
| FR/EX | 54.72298 | 0.00000 | mMMM G | 6.58445 | 0.01179 |
| mIM G | 53.12155 | 0.00000 | FR G | 6.49024 | 0.01239 |
| MMM(M2/RM) | 51.80418 | 0.00000 | RER | 6.08802 | 0.01533 |
| mM1 G | 44.70306 | 0.00000 | SD/GDP | 4.39464 | 0.03860 |
| mGDP G | 42.54848 | 0.00000 | EMR | 4.07456 | 0.04624 |
| mGDPC G | 42.52536 | 0.00000 | 12mEX G | 3.15495 | 0.07877 |
| m DC/GDP G | 36.86279 | 0.00000 | GDPC | 1.95798 | 0.16486 |
| 12mMMM G | 30.82983 | 0.00000 | FBY | 1.23840 | 0.26848 |
| m M2/FR G | 30.57065 | 0.00000 | 12mGDPC G | 1.05769 | 0.30625 |
| m CBD G | 28.65698 | 0.00000 | 12mGDP G | 1.04226 | 0.30979 |
| m M1/FR G | 25.39791 | 0.00000 | 12mINR | 0.96480 | 0.32838 |
| RRA | 24.18581 | 0.00000 | NSR | 0.86332 | 0.35507 |
| mNSR G | 22.89918 | 0.00001 | 12mNSR G | 0.68292 | 0.41057 |
| 12m CBD G | 21.40104 | 0.00001 | SD/FR | 0.42338 | 0.51676 |
| 12m M2/FR G | 18.52652 | 0.00004 | mEX G | 0.30513 | 0.58193 |
| 12mFR G | 16.00145 | 0.00012 | 12mM2 G | 0.13088 | 0.71829 |
| 12m DC/GDP G | 13.84032 | 0.00033 | CBD | 0.07593 | 0.78346 |

For a given variable, the larger the F value, the more significantly the variable affects the response variable. Compared with the other variables, TD/FR is the most significant by F statistic. Variables with p-level > 0.05 are relatively insignificant, so we remove the following 12 variables: 12mEX G, GDPC, FBY, 12mGDPC G, 12mGDP G, 12mINR, NSR, 12mNSR G, SD/FR, mEX G, 12mM2 G and CBD. The remaining 36 variables are selected as the key variables of the first stage of variable extraction.
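The first-stage screening above can be sketched as follows. This is only an illustration under stated assumptions: the function name `f_screen`, the synthetic two-class data and the column names are hypothetical, and SciPy's one-way ANOVA stands in for whatever statistical package was actually used for Table 5.2.

```python
import numpy as np
from scipy.stats import f_oneway

def f_screen(X, y, names, alpha=0.05):
    """Rank variables by one-way ANOVA F and keep those with p-level <= alpha."""
    rows = []
    for j, name in enumerate(names):
        # One group of observations per class of the response variable.
        groups = [X[y == g, j] for g in np.unique(y)]
        f_value, p_value = f_oneway(*groups)
        rows.append((name, float(f_value), float(p_value)))
    rows.sort(key=lambda r: r[1], reverse=True)  # largest F first
    kept = [name for name, _, p in rows if p <= alpha]
    return rows, kept

# Synthetic stand-in data (not the thesis data): 120 samples, two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
signal = rng.normal(size=120) + 2.0 * y   # class means differ
noise = rng.normal(size=120)              # class means equal
X = np.column_stack([signal, noise])

ranking, kept = f_screen(X, y, ["signal", "noise"])
for name, f_value, p_value in ranking:
    print(f"{name:8s} F={f_value:10.5f} p={p_value:.5f}")
```

Sorting by F reproduces the ordering used in Table 5.2, and the `alpha` cut corresponds to the p-level > 0.05 removal rule.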

For the key variables extracted by F statistics, we continue with the Spearman correlation for further feature selection. The modulus of a correlation coefficient represents the strength of the relationship between two variables. Here we use the Spearman correlation to find out which variables vary with the forward crisis variable. Table 5.3 presents the results of the Spearman correlation for the 36 key variables, arranged in order of importance. Variables with coefficients greater than 0.2 are considered acceptable. Among them, TD/FR has the highest correlation coefficient with the forward crisis variable. In addition, (EX - IM)/GDP, CA/GDP, RIR, FR/EX and M2/FR have correlation coefficients greater than 0.5. These six variables therefore have the strongest relationships with the forward crisis variable. In the second stage of variable extraction, the variables with p-level > 0.05, namely mMMM G and TD/GDP, are removed, and the remaining 34 variables are regarded as the final key variables.

Table 5.3: Key variables extracted by Spearman correlation for ARGENTINA

| Variable | Correlation Coefficient | p-level | Variable | Correlation Coefficient | p-level |
|---|---|---|---|---|---|
| TD/FR | 0.57104 | 0.00000 | m M2/FR G | 0.33710 | 0.00057 |
| (EX - IM)/GDP | 0.54215 | 0.00000 | INR | 0.33026 | 0.00074 |
| CA/GDP | 0.53188 | 0.00000 | mM1 G | 0.32630 | 0.00087 |
| RIR | 0.53126 | 0.00000 | m M1/FR G | 0.32531 | 0.00090 |
| FR/EX | 0.52875 | 0.00000 | 12m M1/FR G | 0.30174 | 0.00217 |
| M2/FR | 0.51016 | 0.00000 | mNSR G | 0.28502 | 0.00387 |
| MMM(M2/RM) | 0.49730 | 0.00000 | 12mM1 G | 0.28433 | 0.00396 |
| 12mMMM G | 0.42268 | 0.00001 | 12mIM G | 0.27646 | 0.00513 |
| mGDP G | 0.41769 | 0.00001 | DC/GDP | 0.26588 | 0.00720 |
| mGDPC G | 0.41671 | 0.00002 | RER | 0.23755 | 0.01676 |
| RRA | 0.41580 | 0.00002 | m DC/GDP G | 0.23293 | 0.01907 |
| 12m DC/GDP G | 0.41165 | 0.00002 | FR/IM | 0.22951 | 0.02096 |
| 12m M2/FR G | 0.38952 | 0.00006 | FR G | 0.22114 | 0.02626 |
| mIM G | 0.38919 | 0.00006 | M1/FR | 0.21327 | 0.03225 |
| mM2 G | 0.36757 | 0.00016 | EMR | 0.20541 | 0.03934 |
| m CBD G | 0.36659 | 0.00016 | SD/GDP | 0.20344 | 0.04130 |
| 12m CBD G | 0.36562 | 0.00017 | mMMM G | 0.17298 | 0.08366 |
| 12mFR G | 0.36180 | 0.00020 | TD/GDP | 0.12188 | 0.22470 |

After the final key variables are extracted, we identify the leading indicators among them with two methods. The first is factor analysis, which is employed in system BEWS-A. The second is discriminant analysis, which is employed in system BEWS-B.
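As a rough illustration of the second-stage Spearman screening, the sketch below ranks variables by the modulus of their Spearman coefficient with a binary crisis flag and applies the two cut-offs mentioned in the text (coefficient above 0.2, p-level not above 0.05). The function name `spearman_screen`, the parameter names and the synthetic data are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import spearmanr

def spearman_screen(X, y, names, min_abs_rho=0.2, alpha=0.05):
    """Rank variables by |Spearman rho| with y and apply both cut-offs."""
    rows = []
    for j, name in enumerate(names):
        rho, p_value = spearmanr(X[:, j], y)
        rows.append((name, float(rho), float(p_value)))
    rows.sort(key=lambda r: abs(r[1]), reverse=True)  # strongest first
    kept = [name for name, rho, p in rows
            if abs(rho) > min_abs_rho and p <= alpha]
    return rows, kept

# Synthetic stand-in data (not the thesis data).
rng = np.random.default_rng(0)
crisis = np.repeat([0, 1], 60)                    # hypothetical forward crisis flag
signal = 2.0 * crisis + rng.normal(size=120)      # moves with the crisis flag
noise = rng.normal(size=120)                      # unrelated to the crisis flag
X = np.column_stack([signal, noise])

ranking, kept = spearman_screen(X, crisis, ["signal", "noise"])
for name, rho, p_value in ranking:
    print(f"{name:8s} rho={rho:8.5f} p={p_value:.5f}")
```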

For BEWS-A, factor analysis is performed to uncover the relationships among the key variables and to classify them. The purpose is to reduce the number of key variables and remove redundancy. In the factor analysis we use PCA to extract successive factors, and the extracted factors are orthogonal to each other. Figure 5.6 plots the eigenvalues evaluated in PCA. Following the Kaiser criterion, we set an eigenvalue threshold of 1.0 to determine the number of factors, and therefore retain only the first 8 factors. Within each factor, the factor loading of a variable represents the correlation between the variable and the factor axis. If the modulus of a variable's loading on a specific factor is high, we can infer that the variable is strongly correlated with that factor. Figure 5.7 lists the factor loadings of all key variables on all factors; the marked fields are those for which the correlation is high enough.

Figure 5.6: Plot of eigenvalues evaluated in factor extraction

Because the factors are independent of each other, only one field is marked in each row, so each variable belongs to exactly one factor. In this way we divide the key variables into 8 classes. In each factor, more than one variable is highly correlated with the factor, which tells us that some redundancy must exist. Taking factor 1 as an example, many marked variables are highly correlated with it, including RER, CA/GDP, M2/FR, TD/FR, FR/EX, 12mM1 G, MMM(M2/RM), 12mMMM G, DC/GDP and (EX - IM)/GDP. These variables belong to the same class and are highly related to each other. Therefore, we choose only one variable per factor as the leading indicator, namely the variable with the highest Spearman correlation coefficient with the forward crisis variable. Table 5.4 summarizes the identified leading indicators for all of the factors.
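The factor-based selection described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the procedure used to produce Figures 5.6 and 5.7: it assumes PCA on the correlation matrix without rotation, and it assigns each variable to the factor on which its loading has the largest modulus, which stands in for the "marked" fields. The names `kaiser_factors`, `pick_indicators`, `crisis_corr` and the synthetic data are hypothetical.

```python
import numpy as np

def kaiser_factors(X, threshold=1.0):
    """PCA on the correlation matrix; keep factors with eigenvalue > threshold."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > threshold))          # Kaiser criterion
    # Loading = correlation between a standardized variable and a factor.
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return eigvals, loadings

def pick_indicators(loadings, crisis_corr, names):
    """One variable per factor: the member with the largest |crisis_corr|."""
    owner = np.argmax(np.abs(loadings), axis=1)   # factor each variable belongs to
    chosen = {}
    for factor in range(loadings.shape[1]):
        members = np.flatnonzero(owner == factor)
        if members.size:
            best = members[np.argmax(np.abs(crisis_corr[members]))]
            chosen[factor] = names[best]
    return chosen

# Synthetic stand-in data: four variables driven by f1, two driven by f2.
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 200))
cols = [f1 + 0.3 * rng.normal(size=200) for _ in range(4)]
cols += [f2 + 0.3 * rng.normal(size=200) for _ in range(2)]
X = np.column_stack(cols)
names = ["a1", "a2", "a3", "a4", "b1", "b2"]
# Hypothetical Spearman coefficients with the forward crisis variable.
crisis_corr = np.array([0.5, 0.3, 0.2, 0.1, 0.4, 0.6])

eigvals, loadings = kaiser_factors(X)
print(pick_indicators(loadings, crisis_corr, names))
```

Keeping a single representative per factor is what removes the redundancy among variables that load on the same factor.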

Figure 5.7: Factor loadings of the key variables and the factors extracted in factor analysis

Table 5.4: Leading indicators identified in BEWS-A for ARGENTINA

Table 5.5 presents the correlation matrix for the 8 leading indicators and the forward crisis variable. The highest correlation coefficient between any two leading indicators is smaller than 0.60, so it is reasonable to believe that no substantial redundancy exists among the leading indicators. Moreover, the scatterplots in Figure 5.8 show that these leading indicators are indeed correlated with the forward crisis variable, mY4Q ERW (also called the response variable). Figures 5.9 and 5.10 show the posterior probability series obtained with regime shift detection for the 8 leading indicators in comparison with the forward crisis variable. The posterior probabilities for the samples with crises are clearly greater than those for the samples without crises.
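The redundancy check on Table 5.5 can be reproduced directly from the reported coefficients. The sketch below copies the matrix from the table and looks for the most strongly correlated pair of leading indicators, excluding the ERW row and column; the helper name `most_correlated_pair` is hypothetical.

```python
import numpy as np

# Values copied from Table 5.5; row and column order follow the table.
labels = ["ERW", "TD/FR", "RRA", "mIM G", "mGDP G", "RIR", "INR", "mM2 G", "FR/IM"]
corr = np.array([
    [1.00, 0.57, 0.42, 0.39, 0.42, 0.53, 0.33, 0.37, 0.23],
    [0.57, 1.00, 0.56, 0.13, 0.17, 0.59, 0.25, 0.30, -0.07],
    [0.42, 0.56, 1.00, 0.01, -0.03, 0.54, 0.41, 0.43, 0.12],
    [0.39, 0.13, 0.01, 1.00, 0.43, 0.21, 0.22, 0.02, 0.23],
    [0.42, 0.17, -0.03, 0.43, 1.00, 0.20, 0.03, 0.08, 0.21],
    [0.53, 0.59, 0.54, 0.21, 0.20, 1.00, 0.42, 0.37, 0.14],
    [0.33, 0.25, 0.41, 0.22, 0.03, 0.42, 1.00, 0.23, 0.57],
    [0.37, 0.30, 0.43, 0.02, 0.08, 0.37, 0.23, 1.00, 0.25],
    [0.23, -0.07, 0.12, 0.23, 0.21, 0.14, 0.57, 0.25, 1.00],
])

def most_correlated_pair(corr, labels, skip=1):
    """Largest |r| among the off-diagonal entries, ignoring the first `skip` variables."""
    sub = np.abs(corr[skip:, skip:])
    rows, cols = np.triu_indices_from(sub, k=1)   # each pair once, no diagonal
    best = np.argmax(sub[rows, cols])
    names = labels[skip:]
    return names[rows[best]], names[cols[best]], float(sub[rows[best], cols[best]])

print(most_correlated_pair(corr, labels))
```

With the values of Table 5.5 the strongest pair is TD/FR and RIR at 0.59, which is the figure behind the "smaller than 0.60" statement.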

Table 5.5: Correlation matrix for leading indicators and the forward crisis variable

| | ERW | TD/FR | RRA | mIM G | mGDP G | RIR | INR | mM2 G | FR/IM |
|---|---|---|---|---|---|---|---|---|---|
| ERW | 1.00 | 0.57 | 0.42 | 0.39 | 0.42 | 0.53 | 0.33 | 0.37 | 0.23 |
| TD/FR | 0.57 | 1.00 | 0.56 | 0.13 | 0.17 | 0.59 | 0.25 | 0.30 | -0.07 |
| RRA | 0.42 | 0.56 | 1.00 | 0.01 | -0.03 | 0.54 | 0.41 | 0.43 | 0.12 |
| mIM G | 0.39 | 0.13 | 0.01 | 1.00 | 0.43 | 0.21 | 0.22 | 0.02 | 0.23 |
| mGDP G | 0.42 | 0.17 | -0.03 | 0.43 | 1.00 | 0.20 | 0.03 | 0.08 | 0.21 |
| RIR | 0.53 | 0.59 | 0.54 | 0.21 | 0.20 | 1.00 | 0.42 | 0.37 | 0.14 |
| INR | 0.33 | 0.25 | 0.41 | 0.22 | 0.03 | 0.42 | 1.00 | 0.23 | 0.57 |
| mM2 G | 0.37 | 0.30 | 0.43 | 0.02 | 0.08 | 0.37 | 0.23 | 1.00 | 0.25 |
| FR/IM | 0.23 | -0.07 | 0.12 | 0.23 | 0.21 | 0.14 | 0.57 | 0.25 | 1.00 |

Figure 5.8: Scatterplots of posterior probabilities of extracted leading indicators and the forward crisis variable for ARGENTINA

For BEWS-B, discriminant analysis is the technique used to identify leading indicators. In this thesis we adopt forward stepwise analysis. The discrimination model is built step by step: the key variables are included one at a time, and their F-to-enter and F-to-remove values are examined. After all of the steps are finished, the variables remaining in the model are the leading indicators for BEWS-B. These leading indicators contribute the most to the discrimination between the groups of samples and are used to construct the discriminant functions. Table 5.6 presents the results of the discriminant model with the identified leading indicators. Wilks' lambda is generally used to denote the statistical significance of the discriminatory power of the current model. Its value ranges from 1.0 (no discriminatory power) to 0.0 (perfect discriminatory power). Each value in the second column of Table 5.6 denotes the Wilks' lambda after the respective variable is entered into the model. Partial lambda is the Wilks' lambda for the unique contribution of the respective variable to the discrimination between groups. Because a lambda of 0.0 denotes perfect discriminatory power, the smaller the partial lambda, the greater the contribution of the respective variable to the overall discrimination. As Table 5.6 shows, the partial lambda indicates that M2/FR contributes most, M1/FR second most, 12mIM G third most, and 12mFR G least to the overall discrimination. Thus we can conclude that M2/FR and M1/FR are the major variables that allow us to discriminate between samples with and without crisis.

Table 5.6: Leading indicators identified in BEWS-B and the discriminant model for ARGENTINA

| Leading Indicator | Wilks' Lambda | Partial Lambda | F-remove (1,106) | p-level | Tolerance | 1-Toler. (R-Sqr.) |
|---|---|---|---|---|---|---|
| 12mFR G | 0.30226 | 0.99983 | 0.01831 | 0.89262 | 0.34330 | 0.65677 |
| M2/FR | 0.52296 | 0.57789 | 77.42519 | 0.00000 | 0.06484 | 0.93516 |
| FR/IM | 0.31784 | 0.95083 | 5.48172 | 0.02109 | 0.21375 | 0.78625 |
| m CBD G | 0.31630 | 0.95547 | 4.94027 | 0.02836 | 0.12466 | 0.87534 |
| RRA | 0.30269 | 0.99841 | 0.16917 | 0.68168 | 0.06766 | 0.93234 |
| 12m DC/GDP G | 0.31665 | 0.95439 | 5.06542 | 0.02647 | 0.11063 | 0.88937 |
| 12mIM G | 0.32733 | 0.92327 | 8.80913 | 0.00371 | 0.15825 | 0.84176 |
| M1/FR | 0.34929 | 0.86521 | 16.51314 | 0.00009 | 0.05886 | 0.94115 |
| mM2 G | 0.30401 | 0.99407 | 0.63207 | 0.42837 | 0.12691 | 0.87309 |
| MMM(M2/RM) | 0.31515 | 0.95895 | 4.53707 | 0.03548 | 0.03842 | 0.96158 |
| INR | 0.31026 | 0.97406 | 2.82338 | 0.09585 | 0.27147 | 0.72853 |
| m M2/FR G | 0.31290 | 0.96584 | 3.74873 | 0.05551 | 0.18320 | 0.81680 |
| FR G | 0.30703 | 0.98431 | 1.68988 | 0.19644 | 0.19231 | 0.80769 |

The Tolerance is defined as 1 minus the R-square of the respective variable with all other variables in the model, and this value gives an indication of the redundancy of the respective variable. For example, when the variable INR is about to enter the model it has a tolerance value of 0.27, so INR can be considered 73% redundant with the variables already included. In this system we set the Tolerance threshold at its default value of 0.01. If a variable that is more than 99% redundant with the other variables is included in the model, its practical contribution to the improvement of the discriminatory power is dubious, and the variable will be removed.
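The quantities reported in Table 5.6 can be illustrated with a small NumPy sketch. It is a simplified reading of the definitions in the text, not the stepwise routine of the software used in the thesis: Wilks' lambda is computed as the ratio of the determinants of the within-group and total scatter matrices, partial lambda as the ratio of the lambda with and without a variable, and tolerance as 1 minus a plain regression R-square (statistical packages may compute it from within-group matrices instead). All function names and the synthetic data are hypothetical; the conversion in `f_to_remove` uses the denominator degrees of freedom 106 shown in the table header.

```python
import numpy as np

def wilks_lambda(X, y):
    """det(W) / det(T): 1.0 means no separation, 0.0 perfect separation."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                     # total scatter
    W = np.zeros_like(T)
    for g in np.unique(y):
        d = X[y == g] - X[y == g].mean(axis=0)
        W += d.T @ d                              # pooled within-group scatter
    return np.linalg.det(W) / np.linalg.det(T)

def partial_lambda(X, y, j):
    """Lambda of the model divided by lambda of the model without column j."""
    return wilks_lambda(X, y) / wilks_lambda(np.delete(X, j, axis=1), y)

def tolerance(X, j):
    """1 - R^2 from regressing column j on the other columns (with intercept)."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    total = X[:, j] - X[:, j].mean()
    return float(resid @ resid / (total @ total))

def f_to_remove(partial, df_den, df_num=1):
    """F statistic implied by a partial lambda for the given degrees of freedom."""
    return (1.0 - partial) / partial * df_den / df_num

# Synthetic stand-in data (not the thesis data).
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 60)
x0 = rng.normal(size=120) + 2.0 * y       # discriminates the two groups
x1 = rng.normal(size=120)                 # pure noise
x2 = x0 + 0.1 * rng.normal(size=120)      # almost a copy of x0
X = np.column_stack([x0, x1, x2])

print("Wilks' lambda:", wilks_lambda(X[:, :2], y))
print("partial lambdas:", [partial_lambda(X[:, :2], y, j) for j in range(2)])
print("tolerance of x2:", tolerance(X, 2))
print("F-remove for partial lambda 0.57789:", f_to_remove(0.57789, 106))
```

Applied to the M2/FR row of Table 5.6, `f_to_remove(0.57789, 106)` agrees with the tabulated F-remove value up to the rounding of the partial lambda. In the synthetic data, `x2` shows a tolerance close to zero, which is the situation the 0.01 threshold is meant to catch.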

Figure 5.9: Comparison between posterior probabilities of each leading indicator and the forward crisis variable

Figure 5.10: Comparison between posterior probabilities of each leading indicator and the forward crisis variable
