
Step 3: Derive the expression of A(y), which is defined as {a | a ∈ Φo-1(y) AND rank([WH ω(a)]) = rank(WH)}; and Step 4: Derive the expression of f-1(y), which is defined as {x | WHx = ω(a) for some a ∈ A(y)}.

We take Network I to illustrate the explicit outcomes and the insights obtained within the extracting process. Φo-1(y) = {a | 15.1206a1 - 34.366a2 + 5.6589a3 - 21.9999a4 = y - 100.4744}. Φo-1(y) is defined by a linear equation. Thus, for each non-void value y, Φo-1(y) is a hyperplane in (-1, 1)4. As y changes, the Φo-1(y) form parallel hyperplanes in (-1, 1)4; for changes in y of the same magnitude, the corresponding hyperplanes are spaced the same distance apart. The activation space is entirely covered by these parallel Φo-1(y) hyperplanes, ordered by the values of the non-void y.
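For illustration, the following Python sketch (the helper name phi_o is ours, not part of the derivation) evaluates this output-level scalar field using only the coefficients quoted in the Φo-1(y) expression above, and shows that equal steps along the common normal direction change y by equal amounts:

import numpy as np

# Output-layer map of Network I, read directly from the Phi_o^-1(y) expression above.
w_o = np.array([15.1206, -34.366, 5.6589, -21.9999])
b_o = 100.4744

def phi_o(a):
    # The unique output value y whose Phi_o^-1(y) hyperplane passes through a.
    return float(w_o @ a) + b_o

a_point = np.array([0.1, -0.2, 0.3, 0.0])
step = 0.05 * w_o / np.linalg.norm(w_o)   # step along the common normal of the hyperplanes
print(phi_o(a_point), phi_o(a_point + step), phi_o(a_point + 2 * step))
# Equal steps along the normal change y by equal amounts, so the Phi_o^-1(y)
# hyperplanes are parallel and equally spaced for equal changes in y.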

Furthermore, the center of these parallel hyperplanes is the Φo-1(100.4744) hyperplane. These parallel hyperplanes form a (linear) scalar field: for each point a of the activation space, there is only one output value y whose Φo-1(y) hyperplane passes through point a; all points on the same Φo-1(y) hyperplane are associated with the same y value.

Note that the equation xTwh⋅H = tanh-1(ah) - bhH appearing in the hth component on the right-hand side of (5) is separable from the equations of the other components. Given an activation value ah, {x ∈ ℜ3 | xTwh⋅H = tanh-1(ah) - bhH} defines a hyperplane in the input space, since all components of wh⋅H and bhH are given constants. For the hth hidden node, the hyperplanes associated with various ah values are parallel and form a (linear) scalar activation field in the input space [16]: for each point x of the input space, there is only one activation value ah whose corresponding hyperplane passes through point x; all points on this hyperplane are associated with the same ah value. Furthermore, each hidden node gives rise to an activation field in the input space, and the four hidden nodes set up four independent activation fields in the input space.
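A small numerical sketch of one such activation field follows; the weights and bias below are placeholders standing in for the Table II entries, whose values are not reproduced here:

import numpy as np

# Hypothetical row of W^H and bias b^H for one hidden node (placeholders, not Table II values).
w_h = np.array([0.5, -1.2, 0.8])
b_h = 0.3

def activation(x):
    # a_h = tanh(x^T w_h + b_h) for this hidden node.
    return np.tanh(x @ w_h + b_h)

# Pick a target activation a_h; its hyperplane is x^T w_h = arctanh(a_h) - b_h.
a_h = 0.4
c = np.arctanh(a_h) - b_h
x0 = c * w_h / (w_h @ w_h)             # one point satisfying x0^T w_h = c
t = np.array([w_h[1], -w_h[0], 0.0])   # a direction orthogonal to w_h, i.e. within the hyperplane
print(activation(x0), activation(x0 + 2.0 * t))   # both equal a_h: same activation along the hyperplane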

The intersection over h = 1, ..., 4 of the sets {x | xTwh⋅H = tanh-1(ah) - bhH} in (5) can be denoted by {x | WHx = ω(a)}, where ω(a) ≡ (ω1(a1), ω2(a2), ω3(a3), ω4(a4))T and ωh(ah) ≡ tanh-1(ah) - bhH for all 1 ≤ h ≤ 4. Given the activation values of a, ω(a) is simply a vector of known component values, and the representation

WHx = ω(a) (6)

is a system of four simultaneous linear equations with three unknowns. Furthermore, WHx = ω(a) is a set of inconsistent simultaneous equations if rank([WH ω(a)]) = rank(WH) + 1 [5, p. 108], and thus the corresponding point a is void. This discussion establishes Lemma 1 below.

Lemma 1. An activation value a is void if rank([WH ω(a)]) = rank(WH) + 1; otherwise, a is non-void.
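A minimal Python sketch of the Lemma 1 test, using numpy's matrix_rank; the WH and bH values below are placeholders standing in for the Table II entries:

import numpy as np

def is_void(W_H, b_H, a):
    # Lemma 1: a is void iff rank([W_H | omega(a)]) = rank(W_H) + 1.
    omega = np.arctanh(a) - b_H                     # omega_h(a_h) = tanh^-1(a_h) - b_h^H
    augmented = np.column_stack([W_H, omega])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(W_H) + 1

# Hypothetical 4x3 hidden-layer weights and biases (placeholders for Table II).
W_H = np.array([[ 0.5, -1.2,  0.8],
                [ 1.0,  0.3, -0.7],
                [-0.4,  0.9,  1.1],
                [ 0.2, -0.6,  0.5]])
b_H = np.array([0.3, -0.1, 0.4, 0.0])

a = np.array([0.2, -0.3, 0.5, 0.1])
print(is_void(W_H, b_H, a))   # True for almost every a: four equations in three unknowns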

For Network I, whose WH and bH are given in Table II, the requirement of Lemma 1 reduces to (7): any activation value a satisfying (7) corresponds to a non-void point; otherwise, it is a void point. Moreover, for each non-void a, (6) determines a unique point in the input space.

tanh-1(a4) = 2.646686748 + 3.248238694 tanh-1(a1) - 0.801390022 tanh-1(a2) + 0.931270242 tanh-1(a3) (7)

Thus, the non-void set, which consists of all non-void a's, equals {a | tanh-1(a4) = 2.646686748 + 3.248238694 tanh-1(a1) - 0.801390022 tanh-1(a2) + 0.931270242 tanh-1(a3)}, which is a (non-linear) 3-manifold in (-1, 1)4. A p-manifold is a Hausdorff space X with a countable basis such that each point x of X has a neighborhood that is homeomorphic to an open subset of ℜp. A 1-manifold is often called a curve, and a 2-manifold is called a surface [4]. For our purpose, it suffices to consider Euclidean spaces, the most common members of the family of Hausdorff spaces.
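The non-void set can be parameterized by (a1, a2, a3) directly from (7); a short Python sketch (the function name is ours) makes this explicit:

import numpy as np

def a4_on_nonvoid_set(a1, a2, a3):
    # Solve (7) for a4: the non-void set of Network I is the graph of this map,
    # a (non-linear) 3-manifold embedded in (-1, 1)^4.
    z = (2.646686748
         + 3.248238694 * np.arctanh(a1)
         - 0.801390022 * np.arctanh(a2)
         + 0.931270242 * np.arctanh(a3))
    return np.tanh(z)

print(a4_on_nonvoid_set(0.1, -0.2, 0.3))   # always lies in (-1, 1)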

The internal-preimage of a non-void value y, A(y) ≡ {a | 15.1206a1 - 34.366a2 + 5.6589a3 - 21.9999a4 = y - 100.4744, tanh-1(a4) = 2.646686748 + 3.248238694 tanh-1(a1) - 0.801390022 tanh-1(a2) + 0.931270242 tanh-1(a3)}, is a 2-manifold in (-1, 1)4. The group of the (non-empty) A(y)'s forms an internal-preimage field in the activation space: there is one and only one A(y) passing through each non-void a; for any a on A(y'), its output value is equal to y'; and the A(y)'s are geometrically aligned orderly according to the positions of the Φo-1(y)'s, since each A(y) is the intersection of Φo-1(y) and the non-void set. Thus, the non-void value y can be represented as 15.1206a1 - 34.366a2 + 5.6589a3 - 21.9999 tanh(2.646686748 + 3.248238694 tanh-1(a1) - 0.801390022 tanh-1(a2) + 0.931270242 tanh-1(a3)) + 100.4744.
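A Python sketch of this explicit representation of y on the non-void set (the helper name y_of is ours):

import numpy as np

def y_of(a1, a2, a3):
    # y written explicitly in (a1, a2, a3), with a4 eliminated through (7).
    a4 = np.tanh(2.646686748
                 + 3.248238694 * np.arctanh(a1)
                 - 0.801390022 * np.arctanh(a2)
                 + 0.931270242 * np.arctanh(a3))
    return 15.1206 * a1 - 34.366 * a2 + 5.6589 * a3 - 21.9999 * a4 + 100.4744

# A(y') collects all non-void points whose y_of value equals y'.
print(y_of(0.1, -0.2, 0.3))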

Any point in A(y) links, through (6), to the preimage f-1(y).

Thus, for any non-void y, the corresponding preimage f-1(y) of Network I equals {(Tt, rc, rt)T | Tt = 8.619766318 - 16.38982949 tanh-1(a1) + 11.9539307 tanh-1(a2) - 2.216810863 tanh-1(a3), rc = 0.012291155 + 0.01700793 tanh-1(a1) - 0.02138228 tanh-1(a2) + 0.017746672 tanh-1(a3), rt = 0.046189981 + 0.052432722 tanh-1(a1) - 0.015121128 tanh-1(a2) + 0.026740939 tanh-1(a3), 5.6589a3 - 21.9999 tanh(2.646686748 + 3.248238694 tanh-1(a1) - 0.801390022 tanh-1(a2) + 0.931270242 tanh-1(a3)) = y - 100.4744 - 15.1206a1 + 34.366a2, -1 < a1 < 1, -1 < a2 < 1, -1 < a3 < 1}, which is a 2-manifold in the input space.
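A Python sketch of the map from an activation point of A(y) back to the input point (Tt, rc, rt), using the closed-form coefficients quoted above (the helper name is ours):

import numpy as np

def input_point(a1, a2, a3):
    # Map an activation point of A(y) back to the input space of Network I,
    # using the closed-form expressions of the preimage f^-1(y) above.
    u1, u2, u3 = np.arctanh(a1), np.arctanh(a2), np.arctanh(a3)
    Tt = 8.619766318 - 16.38982949 * u1 + 11.9539307 * u2 - 2.216810863 * u3
    rc = 0.012291155 + 0.01700793 * u1 - 0.02138228 * u2 + 0.017746672 * u3
    rt = 0.046189981 + 0.052432722 * u1 - 0.015121128 * u2 + 0.026740939 * u3
    return Tt, rc, rt

print(input_point(0.1, -0.2, 0.3))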

The input space is entirely covered by an orderly grouping of f-1(y)'s that forms a preimage field in the input space. That is, there is one and only one f-1(y) passing through each x; the corresponding output value of the network to this x is the y value associated with this f-1(y); and the f-1(y)'s are aligned orderly, though not necessarily with equal spacing.

Similar extracting processes can be applied to Networks II and III to obtain their corresponding preimages, shown in (8) and (9), respectively.

tanh(0.13672399 - 0.685087351 tanh-1(a1) + 0.78937922 tanh-1(a2) - 0.687474652 tanh-1(a3)) = y - 93.6583 + … (8)

… tanh(-1.599920472 + 0.474275763 tanh-1(a1) + 0.629597938 tanh-1(a2) + 0.129397058 tanh-1(a3)) = y - 104.8248 + 14.0352a1 + 16.7297a2, -1 < a1 < 1, -1 < a2 < 1, -1 < a3 < 1} (9)

Since the dimension of the input space in all networks in Table 2 is three, we can also draw the three-dimensional preimage of each obtained SLFN based upon its corresponding mathematical expression.
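One possible way to draw such a three-dimensional preimage for Network I is sketched below; the sampling strategy (sweep (a1, a2), solve the constraint for a3, map to the input space) and the helper names are our own illustration, assuming numpy, scipy, and matplotlib are available:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq

def y_of(a1, a2, a3):
    # Output of Network I on the non-void set, with a4 eliminated through (7).
    a4 = np.tanh(2.646686748 + 3.248238694 * np.arctanh(a1)
                 - 0.801390022 * np.arctanh(a2) + 0.931270242 * np.arctanh(a3))
    return 15.1206 * a1 - 34.366 * a2 + 5.6589 * a3 - 21.9999 * a4 + 100.4744

def input_point(a1, a2, a3):
    # Closed-form map from an activation point back to (Tt, rc, rt), as quoted above.
    u1, u2, u3 = np.arctanh(a1), np.arctanh(a2), np.arctanh(a3)
    return (8.619766318 - 16.38982949 * u1 + 11.9539307 * u2 - 2.216810863 * u3,   # Tt
            0.012291155 + 0.01700793 * u1 - 0.02138228 * u2 + 0.017746672 * u3,    # rc
            0.046189981 + 0.052432722 * u1 - 0.015121128 * u2 + 0.026740939 * u3)  # rt

# Sample the 2-manifold f^-1(y) for one fixed y: sweep (a1, a2), solve for a3, map to inputs.
y_target, pts = 100.0, []
for a1 in np.linspace(-0.9, 0.9, 40):
    for a2 in np.linspace(-0.9, 0.9, 40):
        g = lambda a3: y_of(a1, a2, a3) - y_target
        if g(-0.999999) * g(0.999999) < 0:          # a root in a3 exists for this (a1, a2)
            pts.append(input_point(a1, a2, brentq(g, -0.999999, 0.999999)))

Tt, rc, rt = map(np.array, zip(*pts))
ax = plt.figure().add_subplot(projection="3d")
ax.scatter(Tt, rc, rt, s=2)
ax.set_xlabel("Tt"); ax.set_ylabel("rc"); ax.set_zlabel("rt")
plt.show()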

With the explicit outcomes and insights obtained from the above preimage-extracting process, we, acting as the representative practitioner, examine several beliefs.

We know that whether a bond is a premium bond or a discount bond can be determined by comparing the market interest rate with the contractual coupon rate. Specifically, if the coupon rate is greater than the market interest rate, the bond is priced at a premium; otherwise, the bond is traded at a discount.
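This belief follows from the standard present-value pricing of an annual-coupon bond (coupon payments are assumed annual, as in Table I); a minimal Python check, with an illustrative face value of 100:

def bond_price(face, coupon_rate, market_rate, years):
    # Present value of an annual-coupon bond (standard pricing formula, not taken from the networks).
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1 + market_rate) ** years
    return pv_coupons + pv_face

# Coupon rate above the market rate -> premium (price > face); below -> discount.
print(bond_price(100, 0.08, 0.05, 10))   # > 100, premium bond
print(bond_price(100, 0.03, 0.05, 10))   # < 100, discount bond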

This belief leads to the insight that the preimage of a reliable SLFN should be parallel to the plane defined by rt = rc. As shown in Fig. 1, the obtained preimages reveal this tendency. Thus, this belief has "high" credibility.

Another piece of common sense is that a bond with a greater coupon rate than another should be priced higher at a given interest rate; that is, a higher coupon rate implies a higher price. The preimage of each SLFN in Fig. 1 does demonstrate a positive relationship between the coupon rate and the interest rate along the same preimage. Thus, we may conjecture that a higher interest rate results in a lower bond price, and this piece of common sense has "high" credibility.

Furthermore, observing Fig. 2, we find that the curvatures for the premium bonds and the discount bonds are different in Networks I and II. In the rt and Tt coordinates, the preimages for premium bonds appear to be concave and those for discount bonds appear to be convex. This may imply another insight that is consistent with the viewpoint of bond pricing but has "low" credibility. Namely, to keep the same price, the interest rate has to change more as the term to maturity of the premium (or discount) bond gets shorter. In other words, the interest rate has a more significant influence on short-term bonds than on long-term bonds in keeping the same price.

IV. CONCLUSIONS AND FUTURE WORK

This study adds to the literature by introducing the knowledge internalization process for the SLFN practitioner.

A possible avenue of further enquiry is to explore newly issued financial instruments, for which only limited instances are available, using the process set out here. In addition, the application of the knowledge-internalization process to real-world data and the externalization of beliefs into explicit knowledge are issues meriting further study.

This study also demonstrates the appropriateness of adopting mathematical analysis in place of the more usual data analysis to extract the accurate preimage from SLFNs.

With the extraction of preimages for each SLFN, the practitioner may obtain insights into unexplored data for which no suitable analysis tool is yet known, because the SLFN is known for its ability to act as a universal approximator. Further, with these insights, the practitioner may adopt the tools best suited to the empirical data, tools that help him or her establish a model for the variables more easily and properly, and thereby gain more useful insights and understanding. For example, when the dependent variable of interest is monotone along a certain factor that is a linear or nonlinear combination of the explanatory (or independent) variables, the extraction of preimages may be of great benefit: in such a case, the extracted preimages appear to be parallel to this combined factor. With such an insight, the practitioner may adopt the common regression method or other suitable tools after properly transforming the variables. Of course, this argument is a topic for future research.

REFERENCES

[1] R. Andrews, J. Diederich, and A. B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, Vol. 8, no. 6, pp. 373-389, 1995.
[2] B. Baesens, R. Setiono, C. Mues, and J. Vanthienen, "Using neural network rule extraction and decision tables for credit-risk evaluation," Management Science, Vol. 49, no. 3, pp. 312-329, 2003.
[3] F. Maire, "Rule-extraction by backpropagation of polyhedra," Neural Networks, Vol. 12, pp. 717-725, 1999.
[4] J. Munkres, Topology: A First Course. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[5] K. Murty, Linear Programming. New York, NY: John Wiley & Sons, 1983.
[6] I. Nonaka and H. Takeuchi, The Knowledge-Creating Company. Oxford: Oxford University Press, 1995.
[7] J. Rabuñal, J. Dorado, A. Pazos, J. Pereira, and D. Rivero, "A new approach to the extraction of ANN rules and to their generalization capacity through GP," Neural Computation, Vol. 16, pp. 1483-1523, 2004.
[8] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in D. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press, pp. 318-362, 1986.
[9] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press, 1986.
[10] K. Saito and R. Nakano, "Extracting regression rules from neural networks," Neural Networks, Vol. 15, no. 10, pp. 1279-1288, 2002.
[11] R. Setiono, W. K. Leow, and J. M. Zurada, "Extraction of rules from artificial neural networks for nonlinear regression," IEEE Transactions on Neural Networks, Vol. 13, no. 3, pp. 564-577, 2002.
[12] R. Setiono and H. Liu, "NeuroLinear: From neural networks to oblique decision rules," Neurocomputing, Vol. 17, no. 1, pp. 1-24, 1997.
[13] D. Sgroi and D. Zizzo, "Neural networks and bounded rationality," Physica A, Vol. 375, pp. 717-725, 2007.
[14] I. A. Taha and J. Ghosh, "Symbolic interpretation of artificial neural networks," IEEE Transactions on Knowledge and Data Engineering, Vol. 11, no. 3, pp. 448-463, 1999.
[15] A. B. Tickle, R. Andrews, M. Golea, and J. Diederich, "The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks," IEEE Transactions on Neural Networks, Vol. 9, no. 6, pp. 1057-1068, 1998.
[16] R. Tsaih, "An explanation of reasoning neural networks," Mathematical and Computer Modelling, Vol. 28, pp. 37-44, 1998.
[17] R. Tsaih, Y. Hsu, and C. Lai, "Forecasting S&P 500 stock index futures with a hybrid AI system," Decision Support Systems, Vol. 23, no. 2, pp. 161-174, 1998.
[18] R. R. Zhou, S. F. Chen, and Z. Q. Chen, "A statistics based approach for extracting priority rules from trained neural networks," in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, Como, Italy. Los Alamitos, CA: IEEE Computer Society, pp. 401-406, 2000.

TABLE I
THE 18 HYPOTHETICAL BONDS WITH DIFFERENT COMBINATIONS OF TERM TO MATURITY AND CONTRACTUAL INTEREST RATE.
a. Coupon payments are assumed to be made annually.

TABLE II
FINAL WEIGHTS AND BIASES OF NETWORKS I, II AND III, RESPECTIVELY.

Fig. 1. Preimage graphs along the rt and rc plane for Networks I, II, and III (from top to bottom), respectively.
