return on sales. The relationship among these indicators can indeed be used to analyze the timing of resource exploration in firm behavior. Several quantitative methods, such as linear time-series models with a single-output, multi-input structure (AR, MA, ARX, ARMA, and ARMAX [17][18]), are applicable for modeling the dynamic interaction among the five indicators mentioned above. Once the model has been trained, it interprets the timing of resource exploration in firm behavior through the coefficients of its explanatory variables. However, a trained ARMAX model cannot reach the optimum with respect to the performance criterion of mean absolute percent error. Thus, three nonlinear models, the back-propagation neural network (BPNN) [19][20], the adaptive neuro-fuzzy inference system (ANFIS) [21][22], and adaptive support vector regression (ASVR) [23][24], are also provided in this study so that the performance of each model can be compared quantitatively.

2. METHODS

Several linear and nonlinear models are described in this section. Three major models are introduced here: auto-regressive moving-average regression, the back-propagation neural network, and segmented adaptive support vector regression.

2.1. Auto-regressive moving-average regression (ARMAX)

A general polynomial black-box model [19] allows a flexible description of the response variables y_j(t), j = 1, 2, ..., ny, the explanatory variables u_i(t), i = 1, 2, ..., nu, and the disturbances e_l(t), l = 1, 2, ..., ne. The most typical single-output, multi-input system is written as

A(q) y(t) = [B_1(q)/F_1(q)] u_1(t - nk_1) + ... + [B_{nu}(q)/F_{nu}(q)] u_{nu}(t - nk_{nu}) + [C(q)/D(q)] e(t)   (1)

where q is the shift operator, nk_1, ..., nk_{nu} are the delays of the input signals, and A, B_1, ..., B_{nu}, C, D, and F_1, ..., F_{nu} are polynomials in the lag operator q^{-1}. Before the model parameters are estimated, both the orders of the polynomials A, B_1, ..., B_{nu}, C, D, F_1, ..., F_{nu} and the input delays nk_1, ..., nk_{nu} must be determined, to make sure the selected structure is the best. This model can perform simulation, forecasting, and parameter estimation of univariate time series; the auto-regressive moving-average regression (ARMAX) special case [19] is widely used in financial time-series applications such as asset-return problems and is described by

A(q) y(t) = B_1(q) u_1(t - nk_1) + ... + B_{nu}(q) u_{nu}(t - nk_{nu}) + C(q) e(t)   (2)

where the polynomials A, B_1, ..., B_{nu}, C in the lag operator q^{-1} are expressed as

A(q) = 1 + a_1 q^{-1} + ... + a_{na} q^{-na}   (3)
B_i(q) = b_1^i + b_2^i q^{-1} + ... + b_{nb_i}^i q^{-nb_i + 1},  i = 1, ..., nu   (4)
C(q) = 1 + c_1 q^{-1} + ... + c_{nc} q^{-nc}   (5)

In this study, the five indicators are plugged into the ARMAX structure to model their relationship and thereby analyze the timing of resource exploration in firm behavior. More precisely, the growth rate of long-term investment is designated as the output y(t), while the firm size, the return on total assets, the return on common equity, and the return on sales are assigned to the input signals u_1(t), u_2(t), u_3(t), and u_4(t), respectively. Selecting the structure with the smallest loss function on the validation data set is a good way to choose the input delays nk_1, ..., nk_{nu} in Eq. (2). Alternatively, one can select the structure with the best fit given the number of parameters to be used in modeling. In this way, the appropriate orders of the polynomials A(q), B_1(q), ..., B_{nu}(q), C(q) in Eqs. (3)-(5) can be obtained. After the best ARMAX structure has been selected, the parameters of the designated structure are estimated from the training and validation data sets.

2.2. Back-propagation neural network (BPNN)

A well-known intelligent computing machine, the three-layer back-propagation neural network (BPNN) [20], is used to model a nonlinear structure for analyzing the timing of resource exploration in firm behavior. A 4x10x1 multilayer perceptron is used: the input layer has 4 neurons to catch the input patterns, the hidden layer has 10 neurons to propagate the intermediate signals, and the output layer has 1 neuron to display the computed result (the output y(t)), as shown in Fig. 1. The input pattern consists of four input signals: the firm size u_1(t), the return on total assets u_2(t), the return on common equity u_3(t), and the return on sales u_4(t). The growth rate of long-term investment y(t) is designated as the output. For training this three-layer BPNN, the tangent-sigmoid transfer function is applied as the activation in the hidden layer, the symmetric saturating linear transfer function is employed as the activation in the output layer, and Bayesian regularization derived from the Levenberg-Marquardt training method is used as the learning algorithm.
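To make the ARMAX estimation of Eqs. (2)-(5) concrete, the sketch below fits the AR and X parts of such a model by ordinary least squares on a synthetic series. This is not the authors' code: the function name and data are hypothetical, and the MA part C(q)e(t) is omitted, since estimating moving-average coefficients requires an iterative scheme (for example a prediction-error method).

```python
import numpy as np

# Hedged sketch: least-squares fit of the AR and X parts of an ARMAX-style
# model y(t) = sum_j a_j y(t-j) + sum_i b_i u_i(t-nk_i) + e(t).
# The MA polynomial C(q) is deliberately ignored here.
def fit_arx(y, inputs, na, nk):
    """y: output series; inputs: list of input series;
    na: number of AR lags; nk: delay of each input (one lag per input)."""
    start = max([na] + nk)
    rows = []
    for t in range(start, len(y)):
        row = [y[t - j] for j in range(1, na + 1)]                  # AR regressors
        row += [inputs[i][t - nk[i]] for i in range(len(inputs))]   # X regressors
        rows.append(row)
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y[start:]), rcond=None)
    return coef  # [a_1 .. a_na, b_1 .. b_nu]

# Noise-free synthetic data: y(t) = 0.5*y(t-1) + 0.3*u(t-1)
rng = np.random.default_rng(0)
u = list(rng.normal(size=200))
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[t - 1] + 0.3 * u[t - 1])

coef = fit_arx(y, [u], na=1, nk=[1])
print(coef)  # close to [0.5, 0.3]
```

On noise-free data the estimator recovers the generating coefficients exactly; in practice a dedicated time-series library would also estimate the MA terms.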

Moreover, the number of training epochs is set to 1000 and the training stop criterion is set to 10^{-6} in this case.

2.3. Adaptive neuro-fuzzy inference system (ANFIS)

The acronym ANFIS derives from "adaptive neuro-fuzzy inference system" [21]. Using a given input/output data set, ANFIS constructs a fuzzy inference system (FIS) whose membership-function parameters are tuned using either a back-propagation gradient-descent algorithm alone or in combination with a least-squares method [22]. This allows a fuzzy system to learn from the data it is modeling. ANFIS applies a hybrid learning algorithm to identify the parameters of Sugeno-type fuzzy inference systems. ANFIS can also be invoked with an optional argument for model validation; the type of validation that takes place with this option is a check for model overfitting, and the argument is a data set called the checking data set. A learning data set may thus be thought of as a training data set plus a checking data set. ANFIS has been shown to be very useful for modeling nonlinear systems with complex dynamic behavior [21], for instance in non-periodic short-term forecasting. This is because of the inherently distributive architecture of ANFIS and its efficient learning algorithms for adapting the system's parameters. However, ANFIS must obey four constraints: (a) each output is specified as a first- or zeroth-order Sugeno-type function; (b) all output membership functions must be single outputs of the same type, either linear or constant, with defuzzification by the weighted-average method; (c) the number of output membership functions must equal the number of rules; (d) unity weight is applied to each rule.

A typical rule in a Sugeno fuzzy model has the form

If Input1 = x and Input2 = y, then Output z = ax + by + c   (6)

For a zero-order Sugeno model, the output level z is a constant (a = b = 0). The output level z_i of each rule is weighted by the firing strength w_i of the rule. For example, for an AND rule with Input1 = x and Input2 = y, the firing strength is

w_i = AndMethod(F_1(x), F_2(y))   (7)

where F_1(.) and F_2(.) are the membership functions for Inputs 1 and 2. The final output of the system is the weighted average of all rule outputs:

Final Output = [ Σ_{i=1}^{N} w_i z_i ] / [ Σ_{i=1}^{N} w_i ]   (8)

A Sugeno rule operates [21] according to the diagram shown in Fig. 2.

2.4. Adaptive support vector regression (ASVR)

Support vector machines stand alongside neural networks as one of the standard tools for machine learning and data mining [23][24]. Initially developed for solving classification problems, SV technology can also be applied successfully to regression, i.e. functional approximation, problems. Unlike pattern recognition problems, where the desired outputs are discrete values such as Booleans, here the targets are real-valued functions [25]. We consider approximating functions by support vector regression (SVR) of the form

f(x, w) = Σ_{i=1}^{l} w_i φ_i(x)   (9)

where the φ_i(x) denote the features. In order to introduce the relevant concepts of SV regression gradually, a simple linear regression is considered first:

f(x, w) = w^T φ(x) + b   (10)

Furthermore, Vapnik introduced a general type of loss function, the linear loss function with an ε-insensitivity zone:

|y − f(x, w)|_ε = 0 if |y − f(x, w)| ≤ ε;  |y − f(x, w)| − ε otherwise   (11)

A new empirical risk is introduced for performing support vector regression:

R_emp(w, b)_ε = (1/l) Σ_{i=1}^{l} |y_i − w^T φ(x_i) − b|_ε   (12)

According to the learning theory of SVMs, the objective is to minimize the empirical risk and the squared norm of the weight vector simultaneously. Thus, a linear regression hyperplane f(x, w) = w^T φ(x) + b is estimated by minimizing

R(w, ξ, ξ*) = (1/2)||w||^2 + C ( Σ_{i=1}^{l} ξ_i + Σ_{i=1}^{l} ξ_i* )   (13)

under the constraints

y_i − w^T φ(x_i) − b ≤ ε + ξ_i,  i = 1, ..., l   (14)
w^T φ(x_i) + b − y_i ≤ ε + ξ_i*,  i = 1, ..., l   (15)
ξ_i ≥ 0,  i = 1, ..., l   (16)
ξ_i* ≥ 0,  i = 1, ..., l   (17)

where the constant C governs the trade-off between the approximation error and the estimation error decided by the weight-vector norm ||w||; this design parameter is chosen by the user. ξ_i and ξ_i* are slack variables measuring the upper and lower bounds of the outputs.

This quadratic optimization is equivalent to applying the Karush-Kuhn-Tucker (KKT) conditions for regression, i.e. maximizing the dual-variable Lagrangian L_d(α, α*):
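Vapnik's ε-insensitive loss of Eq. (11) and the empirical risk of Eq. (12) are simple to state in code. The sketch below is ours, not from the paper, and the function names are hypothetical.

```python
import numpy as np

# Eq. (11): linear loss with an eps-insensitivity zone; residuals inside
# the tube of half-width eps contribute nothing.
def eps_insensitive_loss(y, f, eps):
    return np.maximum(np.abs(y - f) - eps, 0.0)

# Eq. (12): empirical risk = mean eps-insensitive deviation of the fit.
def empirical_risk(y, f, eps):
    return float(np.mean(eps_insensitive_loss(y, f, eps)))

# Example: only the middle residual (0.4) exceeds the tube, so the risk
# is (0.4 - 0.1) / 3.
y = np.array([1.0, 2.0, 3.0])
f = np.array([1.05, 2.4, 3.0])
print(empirical_risk(y, f, eps=0.1))
```

The tube width ε thus directly controls which training points become support vectors in the constrained problem of Eqs. (13)-(17).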

L_d(α, α*) = −(1/2) Σ_{i,j=1}^{l} (α_i − α_i*)(α_j − α_j*) φ(x_i)^T φ(x_j) − ε Σ_{i=1}^{l} (α_i + α_i*) + Σ_{i=1}^{l} (α_i − α_i*) y_i   (18)

subject to the constraints

Σ_{i=1}^{l} α_i = Σ_{i=1}^{l} α_i*   (19)
C ≥ α_i ≥ 0,  i = 1, ..., l   (20)
C ≥ α_i* ≥ 0,  i = 1, ..., l   (21)

After calculating the Lagrange multipliers α_i and α_i*, the optimal weight vector of the regression hyperplane is found as

w_0 = Σ_{i=1}^{l} (α_i − α_i*) φ(x_i)   (22)

and the optimal bias of the regression hyperplane as

b_0 = (1/l) Σ_{i=1}^{l} ( y_i − φ(x_i)^T w_0 )   (23)

In nonlinear regression cases, a kernel function, typically a polynomial, RBF, or sigmoid function, is adopted to replace the scalar product φ(x_i)^T φ(x_j) with K(x_i, x_j) in Eq. (18).

If the terms are defined on the training data set, the output of SVR for a new input pattern z_i is obtained as [25]

y = g β + b_0   (24)

where the vector g is constructed by g = z_i^T x, the matrix x = [x_1 x_2 ... x_l] stands for the patterns in the training data set, z_i = [z_{i1}, z_{i2}, ..., z_{iN}]^T represents the new input pattern, and β_i = (α_i − α_i*).

The algorithm applied to the constrained optimization of support vector regression is called adaptive support vector regression (ASVR); it is designed to explore the two free parameters C and ε so that the computational burden of the quadratic programming (QP) is greatly reduced and the search converges quickly to a near-optimal solution. The unit-step function is written as

u(t − t_0) = 0 if t < t_0;  1 if t ≥ t_0   (25)

the delta function as

delta(t − t_0) = 0 if t ≠ t_0;  1 if t = t_0   (26)

and the sign function as

sign(h) = 1 if h ≥ 0;  −1 if h < 0   (27)

Let the original data sequence be expressed as

X_N = {x^(0)(1), x^(0)(2), ..., x^(0)(n)}   (28)

The adaptive support vector regression (ASVR) algorithm proceeds in the following steps, from Eq. (29) to Eq. (46).

Step 1: normalize the data sequence of Eq. (28):

X̃_N = X_N / max_i |x^(0)(i)|,  X_N = [x^(0)(1), x^(0)(2), ..., x^(0)(n)]^T,  X̃_N = [x̃^(0)(1), x̃^(0)(2), ..., x̃^(0)(n)]^T   (29)

Step 2: construct a simple linear regression through the normalized data points x̃^(0)(1), x̃^(0)(2), ..., x̃^(0)(n):

x̃^(0)(k) = ϕ k + ψ,  k = 1, 2, ..., n   (30)

where ϕ is the slope and ψ the bias of this line.

Step 3: Eq. (30) turns into a normal equation whose least-squares solution Θ is obtained from

X̃ = Ω Θ   (31)
Θ = (Ω^T Ω)^{-1} Ω^T X̃   (32)

where X̃ = [x̃^(0)(1), x̃^(0)(2), ..., x̃^(0)(n)]^T, Θ = [ϕ, ψ]^T (33), and Ω is the n x 2 matrix whose k-th row is [k, 1] (34).

Step 4: take differences of the sequence in Eq. (28) to obtain Δ_{N−1}, and normalize Δ_{N−1} to Λ_{N−1}:

Δ_{N−1} = [x^(0)(2) − x^(0)(1), x^(0)(3) − x^(0)(2), ..., x^(0)(n) − x^(0)(n−1)]^T = [δ^(0)(2), δ^(0)(3), ..., δ^(0)(n)]^T,
Λ_{N−1} = Δ_{N−1} / max_i |δ^(0)(i)| = [δ̃^(0)(2), δ̃^(0)(3), ..., δ̃^(0)(n)]^T

Step 5: from the sequences Δ_{N−1} and Λ_{N−1}, compute the total deviation κ, the coefficient of oscillation μ, the ratio of the final deviation to the mean deviation ρ, the ratio of the average of the last two deviations to the mean deviation ϑ, and the coefficient of weighted oscillation at the final deviation σ:

κ = Σ_{i=2}^{n} |δ̃^(0)(i)|   (35)
μ = ( |Σ_{i=2}^{n} δ̃^(0)(i)| + e ) / (κ + e),  e = 10^{-6}   (36)
ρ = |δ̃^(0)(n)| / (κ / (n − 1))   (37)
ϑ = ( |δ̃^(0)(n−1) + δ̃^(0)(n)| / 2 ) / (κ / (n − 1))   (38)
σ = tanh(ρ^2 / μ)   (39)

Step 6: after that, a brief formula based on the expressions of Eqs. (35)-(39) to determine the
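The step-wise statistics above can be sketched in a few lines. The following is our reconstruction, not the authors' code; in particular the exact form of the oscillation coefficient μ is inferred from a garbled original, so treat it as an assumption.

```python
import numpy as np

# Reconstruction (assumed, not verbatim) of ASVR Steps 1-5, Eqs. (29)-(39).
def asvr_stats(x, e=1e-6):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xn = x / np.max(np.abs(x))                  # Step 1 (Eq. 29): normalize
    k = np.arange(1, n + 1)
    phi, psi = np.polyfit(k, xn, 1)             # Steps 2-3 (Eqs. 30-34): slope, bias
    dn = np.diff(xn)                            # Step 4: differences ...
    dn = dn / np.max(np.abs(dn))                # ... normalized to unit max
    kappa = float(np.sum(np.abs(dn)))           # Eq. (35): total deviation
    mu = (abs(np.sum(dn)) + e) / (kappa + e)    # Eq. (36): oscillation coeff. (assumed form)
    mean_dev = kappa / (n - 1)
    rho = abs(dn[-1]) / mean_dev                # Eq. (37): final / mean deviation
    theta = (abs(dn[-2] + dn[-1]) / 2) / mean_dev   # Eq. (38): last-two average / mean
    sigma = float(np.tanh(rho ** 2 / mu))       # Eq. (39): weighted oscillation
    return phi, psi, kappa, mu, rho, theta, sigma

# A strictly increasing sequence has no oscillation, so mu = 1 and the
# unit-step terms u(mu - 1) in Step 6 switch on.
print(asvr_stats([1.0, 2.0, 3.0, 4.0, 5.0]))
```

With this reading, μ lies in (0, 1], equalling 1 for monotone sequences, which is consistent with its later uses in n = ceil(1/μ) and u(μ − 1).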

value of ε is established in Eqs. (40)-(42):

q = exp{ −tanh(|ϕ|)^{1/2} σ^{(1 − u(μ−1))} − u(μ−1) ρ^{ϑ} }   (40)

v = q^{(1 − delta(κ))} if sign(ϕ) = 1;  v = [ q^{μ(1 − u(μ−1))} ϑ^{2u(μ−1)} ]^{(1 − delta(κ))} if sign(ϕ) = −1   (41)

ε = v × [ max(X_N) − min(X_N) ] / 2   (42)

Once the value of v has been determined, the ε of Eq. (42) is set, and the constrained optimization of Eqs. (18)-(21) then runs for several iterations to search for the optimal w_0 and b_0 of Eqs. (22)-(23). In support vector regression, an increased value of the parameter C heavily penalizes large empirical errors, while an increased value of ε reduces the number of support vectors and loosens the bound on the empirical error [26]. Therefore, how to handle the trade-off between C and ε in SVR becomes a very important issue. In this research, we propose constructing the relationship between ε and C on the basis of the modified Bessel function of the second kind of order n [27], as expressed below. A specific integer n is obtained as a function of the coefficient of oscillation μ described in Eq. (36):

n = ⌈1/μ⌉   (43)

where the operator ⌈1/μ⌉ denotes the smallest integer not less than 1/μ. Then

C = K_n(ε) = (−1)^{n+1} { ln(ε/2) + γ } I_n(ε) + (1/2) Σ_{k=0}^{n−1} (−1)^k [ (n−k−1)! / k! ] (ε/2)^{2k−n} + [ (−1)^n / 2 ] Σ_{k=0}^{∞} (ε/2)^{n+2k} { Φ(k) + Φ(n+k) } / [ k!(n+k)! ]   (44)

I_n(ε) = Σ_{k=0}^{∞} (ε/2)^{n+2k} / [ k! Γ(n+k+1) ]   (45)

where γ = 0.5772156... is Euler's constant and Φ(p) = 1 + 1/2 + 1/3 + ... + 1/p, with Φ(0) = 0.

In this way, the tunable free parameters of SVR are set automatically, and we refer to the result as adaptive support vector regression (ASVR). In addition, another free parameter, the width σ of the radial-basis kernel function, is exploited here to optimize SVR learning. As we know, the radial basis function is usually utilized as a kind of activation function when training an intelligent machine. Furthermore, the radial basis function can be applied to linear or nonlinear dynamics, and it is more frequently adopted as a kernel function for SVR learning than polynomial or tangent-sigmoid functions are. Thus, how to select an optimal σ from the given training data set is a critical issue. However, there is no unified rule of thumb for searching an appropriate scale for the free parameter σ of the radial-basis kernel, even though several studies [28][29] have proposed methods to explore a near-optimal σ. Accordingly, optimal learning in SVMs becomes a challenging problem. In this study, we propose a novel scheme to compute a near-optimal value of the free parameter σ, denoted σ_rbkf, automatically and succinctly from the provided training data sequence {x_1, x_2, ..., x_l}:

σ_rbkf = υ · [ Σ_{i=1}^{l} (x_i − x̄)^2 / (l − 1) ]^{1/2},  x̄ = Σ_{j=1}^{l} x_j / l   (46)

In this way, the computational burden of searching for an appropriate free parameter σ is greatly reduced.

3. EMPIRICAL SIMULATION AND DISCUSSIONS

A collection of data on five indicators, (i) the growth rate of long-term investment (GRLTI), (ii) the firm size (FS), (iii) the return on total assets (ROA), (iv) the return on common equity (ROE), and (v) the return on sales (ROS), covering 30 corporations from TEJ [30], is used here to explain the timing of resource exploration in firm behavior under real-world dynamics in changing environments. To ease data manipulation, preprocessing transforms the indicator GRLTI linearly with an appropriate bias and applies a natural logarithm to the indicator FS. The first phase, designed as the training/learning stage for modeling the linear structure of ARMAX as well as the nonlinear structures of BPNN and SASVR, is a posterior analysis of 383 historical observations over the 10-year period from 1995 to 2004. The second phase, the prior validation stage, then simulates the empirical results to examine system performance by interpolation from the trained models. The estimated ARMAX model, with a scalar bias (bias2 = 2000.5), obtained from computer simulation is

y(t) = 0.3023 y(t−1) + 0.1651 y(t−2) + 0.04129 y(t−3) + 0.2023 y(t−4) + 0.04312 u_1(t−3) − 0.004114 u_2(t−2) − 0.0023 u_3(t−1) − 0.0056 u_4(t−3) + e(t) − 0.2227 e(t−1) − 0.1648 e(t−2)   (47)

From Eq. (47) we can read off directly that the current GRLTI is related to its four most recent lags, and is coupled to the third lag of FS, the second lag of ROA, the first lag of ROE, and the third lag of ROS. In the auto-regressive respect, GRLTI is definitely auto-correlated with a few of its most recent historical values. Furthermore, the three indicators ROA, ROE, and ROS affect the current GRLTI negatively; in other words, an increase in ROA, ROE, or ROS will depress the current GRLTI.
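The kernel-width rule of Eq. (46) amounts to a scaled sample standard deviation of the training inputs. A minimal sketch follows, assuming a user-chosen scale factor υ (`upsilon` below is a hypothetical default; the excerpt does not fix its value).

```python
import numpy as np

# Eq. (46), as we read it: sigma_rbkf = upsilon * sample standard deviation
# of the training inputs (Bessel-corrected, i.e. divisor l - 1).
def sigma_rbkf(x, upsilon=1.0):
    x = np.asarray(x, dtype=float)
    xbar = x.sum() / len(x)
    return upsilon * np.sqrt(np.sum((x - xbar) ** 2) / (len(x) - 1))

print(sigma_rbkf([1.0, 2.0, 3.0, 4.0, 5.0]))  # sqrt(2.5), about 1.5811
```

Tying the kernel width to the spread of the data is a common heuristic; the scheme's contribution is making the choice automatic rather than grid-searched.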

This development meets the prediction of the prospect-theory perspective [14]. Nevertheless, the indicator FS promotes the current GRLTI, so that the larger FS is, the higher GRLTI will be. It is also noted that, strictly speaking, the residual terms e(t), e(t−1), and e(t−2) in the MA part usually take small values. The MA part therefore cannot affect GRLTI significantly, even though these terms have relatively large coefficients, as shown in Fig. 3. Obviously, the MA part is trivial in this ARMAX model as a result of the small residuals; in contrast, the AR and X parts of the estimated ARMAX model predominantly determine GRLTI, and their corresponding coefficients are displayed in Fig. 4. As a matter of fact, the most recent lags of GRLTI are essentially related to the performance of GRLTI, and secondly we must also take FS into account when examining changes in GRLTI. The performance criteria [31] of mean square error (MSE), mean absolute deviation (MAD), and mean absolute percent error (MAPE) are used to compare the empirical simulations of the competing models (ARMAX, BPNN, ANFIS, and ASVR) on the same sample data set, as listed in Table 1. In this comparison, the nonlinear models perform better on MSE, MAD, and MAPE than the linear ARMAX model: the MAPE of the trained ARMAX model, under different biases, never falls below 5%, which implies that the accuracy of its empirical simulation is insufficient. By contrast, the nonlinear models BPNN, ANFIS, and ASVR achieve higher modeling effectiveness. Furthermore, the goodness of fit of the proposed methods, as listed in Table 1, is tested by the Q-test [17]; the null hypothesis cannot be rejected because all p-values exceed the 5% level of significance. In other words, all trained models pass the test of fit.
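The three criteria reported in Table 1 follow the standard definitions; the sketch below is our code, not the paper's, and simply makes those definitions explicit.

```python
import numpy as np

# Standard definitions of the performance criteria used for model comparison.
def mse(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean((y - yhat) ** 2))

def mad(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)))  # requires nonzero targets

# Example: errors of -0.1 and +0.2 on targets 1.0 and 2.0.
print(mse([1.0, 2.0], [1.1, 1.8]),
      mad([1.0, 2.0], [1.1, 1.8]),
      mape([1.0, 2.0], [1.1, 1.8]))
```

Note that MAPE is scale-free, which is why the paper can use a 5% threshold across models, whereas MSE and MAD depend on the units of the (bias-shifted) series.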
As for model validation, ASVR achieves the best Akaike information criterion (AIC) and Bayesian information criterion (BIC) [18], that is, the best reliability among the models. However, the individual interaction between any single input indicator and the output indicator cannot be read off from the nonlinear structures of BPNN, ANFIS, or ASVR. In the linear structure of the ARMAX model, we can directly see which input indicator affects the output indicator, and thereby explain the timing of resource exploration in firm behavior. Moreover, even though ordinary SVR takes a long time to train its model, the proposed ASVR requires much less computation time for modeling, as listed in Table 1.

4. CONCLUDING REMARKS

The following statements summarize the accomplishments of the proposed methods.

(a) In the linear structure of the ARMAX model, the four input indicators (the firm size, the return on total assets, the return on common equity, and the return on sales) significantly affect changes in the output indicator (the growth rate of long-term investment) at different levels. That is, the resulting ARMAX model can explain the growth rate of long-term investment, which can help a decision-maker explore new resources under the pressure of a changing external environment. This dynamics can also be regarded as the timing at which the risky attitude of managers shifts from risk-avoiding exploiting activities to relatively risk-taking exploring activities.

(b) In the ARMAX model, the firm size affects the growth rate of long-term investment positively, whereas the return on total assets, the return on common equity, and the return on sales influence it negatively. Besides, 1-lag and 2-lag innovations are also involved in the resulting output. There are four lags within the auto-regression of the ARMAX model, and the auto-regression largely dominates the growth rate of long-term investment, since the coefficients of the four auto-regressive lags are relatively larger than those of the other parts. Secondly, the contribution of the 3-lag firm size to the output cannot be neglected, because of its large coefficient.

(c) For a given long data stream, estimation using ASVR not only reduces the MSE considerably, attaining better generalization, but also improves the MAPE markedly, boosting localization, compared with ordinary SVR. Clearly, the nonlinear structure of ASVR obtains more satisfactory results than the linear structure of the ARMAX model or the nonlinear structures of BPNN and ANFIS. However, the nonlinear models BPNN, ANFIS, and ASVR cannot tell us the exact contribution of each individual factor to the output, the growth rate of long-term investment, because those factors are hidden in the nonlinear system.

5. ACKNOWLEDGEMENTS

This work is partially supported by the National Science Council, Taiwan, Republic of China, under grant number NSC 93-2218-E-143-001.

6. REFERENCES

[1] E. Penrose, The Theory of the Growth of the Firm, Oxford University Press: Oxford, 1959.
[2] R. P. Rumelt, "Toward a Strategic Theory of the Firm," in Competitive Strategic Management, pp.

556-570, R. B. Lamb (Ed.), Prentice-Hall: Englewood Cliffs, NJ, 1984.
[3] J. B. Barney, "Types of Competition and the Theory of Strategy: Towards an Integrative Framework," Academy of Management Review, vol. 11, no. 4, pp. 791-800, 1986.
[4] J. B. Barney, "Firm Resources and Sustained Competitive Advantage," Journal of Management, vol. 17, no. 1, pp. 99-120, 1991.
[5] J. B. Barney, "Looking Inside for Competitive Advantage," Academy of Management Executive, vol. 9, no. 4, pp. 49-61, 1995.
[6] I. Dierickx and K. Cool, "Asset Stock Accumulation and Sustainability of Competitive Advantage," Management Science, vol. 35, no. 12, pp. 1504-1514, 1989.
[7] M. A. Peteraf, "The Cornerstones of Competitive Advantage: A Resource-Based View," Strategic Management Journal, vol. 14, pp. 179-191, 1993.
[8] B. Wernerfelt, "A Resource-Based View of the Firm," Strategic Management Journal, vol. 5, pp. 171-180, 1984.
[9] C. E. Helfat and M. A. Peteraf, "The Dynamic Resource-Based View: Capability Lifecycles," Strategic Management Journal, vol. 24, pp. 997-1010, 2003.
[10] J. G. March, "Exploration and Exploitation in Organizational Learning," Organization Science, vol. 2, pp. 71-87, 1991.
[11] R. M. Cyert and J. G. March, A Behavioral Theory of the Firm, Prentice-Hall: Englewood Cliffs, NJ, 1963.
[12] D. A. Levinthal and J. G. March, "The Myopia of Learning," Strategic Management Journal, vol. 14, pp. 95-112, 1993.
[13] H. A. Simon, Administrative Behavior: A Study of Decision-Making Processes in Administrative Organization, Free Press: New York, 1997.
[14] D. Kahneman and A. Tversky, "Prospect Theory: An Analysis of Decision Under Risk," Econometrica, pp. 263-291, 1979.
[15] F. T. Rothaermel and D. L. Deeds, "Exploration and Exploitation Alliances in Biotechnology: A System of New Product Development," Strategic Management Journal, vol. 25, no. 3, pp. 201-221, 2004.
[16] J. A. Schumpeter, The Theory of Economic Development, Transaction Publishers: New Brunswick, NJ, 1934.
[17] J. D. Hamilton, Time Series Analysis, Princeton University Press: Princeton, NJ, 1994.
[18] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting & Control, Prentice-Hall: New Jersey, 1994.
[19] B. L. Bowerman and R. T. O'Connell, Forecasting and Time Series: An Applied Approach, Duxbury Press: Belmont, 1993.
[20] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Ed., Prentice Hall: New Jersey, 1999.
[21] J.-S. R. Jang, "ANFIS: Adaptive-Network-based Fuzzy Inference Systems," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, pp. 665-685, May 1993.
[22] J. Neter, W. Wasserman, and M. H. Kutner, Applied Linear Statistical Models, 2nd Ed., Irwin: Homewood, IL, 1985.
[23] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag: New York, 1995.
[24] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods), Cambridge University Press: London, 2000.
[25] S. R. Gunn, Support Vector Machines for Classification and Regression, Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, 1997.
[26] V. Kecman, Learning and Soft Computing, MIT Press: Massachusetts, 2001.
[27] E. Kreyszig, Advanced Engineering Mathematics, 8th Ed., Wiley: New York, 1999.
[28] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines," http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
[29] Y. L. Murphey, Z. H. Chen, M. Putrus, and L. Feldkamp, "SVM Learning from Large Training Data Set," Proc. IEEE IJCNN, pp. 860-865, 2003.
[30] TEJ database, Taiwan Economic Journal Co. Ltd., Taiwan, 2004. http://www.tej.com.tw/
[31] B. R. Chang, "Hybrid BPNN-Weighted Grey-CLMS Forecasting," Journal of Information Science and Engineering, vol. 21, no. 1, pp. 209-221, January 2005.

TABLE 1. The performance criteria mean square error (MSE), mean absolute deviation (MAD), and mean absolute percent error (MAPE) for the five indicators used to analyze the timing of resource exploration in firm behavior, with 383 sampled data from 1995 to 2004. Data preprocessing applies linear transforms with two different biases (bias1 = 1000.5 and bias2 = 2000.5). Goodness of fit is tested by the Q-test at the level of significance α = 0.05. Model validation is evaluated by the Akaike information criterion (AIC) and Bayesian information criterion (BIC), and the computation time is recorded as well.

Criteria   ARMAX(bias1)  ARMAX(bias2)  BPNN(bias2)  ANFIS(bias2)  SVR(bias1)  ASVR(bias2)
MSE        0.0025        0.0023        0.0021       0.0021        0.0021      0.0000215
MAD        0.0063        0.0085        0.0066       0.0068        0.0457      0.0046
MAPE       0.0532        0.0511        0.0349       0.0368        0.3856      0.0392
p-VALUE    0.4415        0.1583        0.4823       0.6262        0.0733      0.4797
AIC        -1205.0       -1230.6       -1258.5      -1261.2       -2520.9     -2293.5
BIC        -1189.2       -1214.8       -1242.7      -1245.4       -2505.1     -2285.8

Method abbreviations: ARMAX, auto-regressive moving-average regression; BPNN, back-propagation neural network; ANFIS, adaptive neuro-fuzzy inference system; ASVR, adaptive support vector regression.

Figure 1. A typical three-layer BPNN architecture; the tangent-sigmoid and pure-linear transfer functions are used as the activations of the hidden layer and the output layer, respectively. [Diagram: inputs u_1(t)-u_4(t), hidden layer, output y(t).]

Figure 2. A fuzzy inference through a single rule conducted by ANFIS. [Diagram: input membership functions F_1(x) and F_2(y); the AND method yields the induced weight (firing strength) w_i; output level z = ax + by + c.]

Figure 3. The coefficients of the corresponding explanatory variables of the AR, MA, and X parts of the ARMAX model. [Bar chart over y(t-1), ..., y(t-4), u_1(t-3), u_2(t-2), u_3(t-1), u_4(t-3), e(t), e(t-1), e(t-2); coefficient values from -1.00 to 1.50.]

Figure 4. The coefficients of the corresponding explanatory variables of the AR and X parts of the ARMAX model. [Bar chart over y(t-1), ..., y(t-4), u_1(t-3), u_2(t-2), u_3(t-1), u_4(t-3); coefficient values from -0.10 to 0.40.]
