


can successfully reduce the chance of mode mixing, and is therefore a substantial improvement over the original EMD.

2.3. Artificial neural networks

Artificial neural networks (ANNs) are a class of intelligent learning models that have been widely used for data prediction in many application domains. They have been developed over the last 50 years; the first and simplest ANN model, the perceptron, was proposed by Frank Rosenblatt in 1957.

Nowadays, the most popular ANN model is the feed-forward back-propagation neural network (BPNN). It adopts the Widrow-Hoff learning rule, i.e. the least mean squares (LMS) rule (Hagan et al., 1996), and uses different algorithms such as the steepest descent method, Newton's method and the Levenberg-Marquardt (LM) algorithm to train the network. ANNs are designed to imitate the biological neural system; typically, they consist of three components: neurons, connection weights and transfer functions.

Figure 2.3 Simple structure of a three-layer neural network (input, hidden and output layers, with weights, biases, transfer functions and neurons)
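As a minimal illustration of the Widrow-Hoff (LMS) learning rule mentioned above, the following Python sketch updates the weights of a single linear neuron in proportion to its prediction error. The learning rate `eta`, the factor of 2 from differentiating the squared error, and the toy data are illustrative assumptions, not values taken from this thesis.

```python
import numpy as np

def lms_update(w, b, x, t, eta=0.01):
    """One Widrow-Hoff (LMS) update for a single linear neuron.

    w, b : current weights and bias
    x    : input vector
    t    : target value
    eta  : learning rate (illustrative assumption)
    """
    a = np.dot(w, x) + b          # linear neuron output
    e = t - a                     # error between target and output
    w = w + 2 * eta * e * x       # step along the negative gradient of e^2
    b = b + 2 * eta * e
    return w, b

# toy example: one update step on a random pattern
rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.0
x, t = rng.normal(size=3), 1.0
w, b = lms_update(w, b, x, t)
```

Back-propagation generalizes this idea to multilayer networks by propagating the error backwards through the layers.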


The connection weights represent the strength of the links between neurons: a larger weight means a stronger connection, while a smaller weight means a weaker one. Each neuron sums the weighted input signals that converge on it and passes the result to a transfer function, which generates the output value. The transfer functions are designed to restrict the range of the output values. Because different kinds of ANNs are used in different ways, they need different transfer functions to generate different kinds of results.

The feed-forward BPNN, the most popular ANN model for time-series prediction, uses three kinds of transfer functions: the log-sigmoid function, the hyperbolic tangent function and the linear function. The log-sigmoid and hyperbolic tangent functions are often used in the hidden layer; the log-sigmoid function produces output values between 0 and 1, while the hyperbolic tangent function produces output values between -1 and 1. Both functions are differentiable, which the training algorithms require. The linear function, which can produce values over the whole real line, is usually placed in the output layer.
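To make the roles of the weights, bias and transfer functions concrete, the following sketch (a hypothetical NumPy illustration, not code from the thesis) implements the three transfer functions and a single neuron that sums its weighted inputs, adds the bias and passes the result through the chosen transfer function. The names logsig, tansig and purelin follow a common naming convention and are assumptions of this sketch.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):
    """Hyperbolic tangent transfer function: output in (-1, 1)."""
    return np.tanh(n)

def purelin(n):
    """Linear transfer function: output can be any real number."""
    return n

def neuron(x, w, b, transfer=logsig):
    """Weighted sum of the inputs plus bias, passed through a transfer function."""
    n = np.dot(w, x) + b
    return transfer(n)

# toy usage: the same net input through each transfer function
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.1, 0.4, -0.3])
b = 0.2
print(neuron(x, w, b, logsig), neuron(x, w, b, tansig), neuron(x, w, b, purelin))
```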

In this thesis, we used the feed-forward BPNN to model the decomposed IMFs and the residual component. In the BPNN there is an important quantity called the "mean square error function", which is a function of the weights. Since our goal was to minimize the mean square error function, we had to adjust the connection weights iteratively by training the network. The mean square error function can be written as:

$$F(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N} e_i^{2} = \frac{1}{N}\sum_{i=1}^{N} \left(t_i - a_i\right)^{2}$$

where $a_i$ is the final output value (itself a function of the weights), $t_i$ is the target value, $e_i$ is the error between $a_i$ and $t_i$, and $N$ is the number of outputs. On the other hand, the output value of the $i$th neuron in the $m$th layer is a nonlinear function of the output values of the $(m-1)$th layer's neurons:

$$a_i^{m} = f^{m}\!\left(\sum_{j} w_{ij}^{m}\, a_j^{m-1} + b_i^{m}\right)$$

The function $f^{m}$ is the transfer function discussed above, $w_{ij}^{m}$ is the weight between the $i$th neuron of the $m$th layer and the $j$th neuron of the $(m-1)$th layer, $b_i^{m}$ is the bias of the $i$th neuron of the $m$th layer, $a_j^{m-1}$ is the output value of the $(m-1)$th layer, and $a_i^{m}$ is the output value of the $m$th layer.
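The layer equation and the mean square error function above can be expressed compactly in matrix form. The following NumPy sketch is only illustrative: the function names (`forward_layer`, `mse`) and the toy layer sizes are assumptions, but the computation follows the formulas given above.

```python
import numpy as np

def forward_layer(a_prev, W, b, transfer):
    """Output of one layer: a^m = f^m(W^m a^{m-1} + b^m)."""
    return transfer(W @ a_prev + b)

def mse(targets, outputs):
    """Mean square error F(w): mean of (t_i - a_i)^2."""
    e = targets - outputs
    return np.mean(e ** 2)

# toy forward pass through one hidden layer and a linear output layer
rng = np.random.default_rng(1)
a0 = rng.normal(size=4)                        # network input
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)  # hidden layer weights and biases
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)  # output layer weights and biases

a1 = forward_layer(a0, W1, b1, np.tanh)        # hidden layer output
a2 = forward_layer(a1, W2, b2, lambda n: n)    # linear output layer
print(mse(np.array([0.7]), a2))
```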

In the history of the BPNN, several algorithms have been used to train the network and adjust the weights. Here we adopted the Levenberg-Marquardt (LM) algorithm, which combines the advantages of the steepest descent method and Newton's method:

$$W^{m}(k+1) = W^{m}(k) - \left[H(k) + \mu_k I\right]^{-1} g(k)$$

where $W^{m}(k)$ is the matrix of weights in the $m$th layer after the $k$th adjustment, $H(k)$ is the Hessian matrix, i.e. the second derivative of the mean square error function, $g(k)$ is the first derivative (the gradient) of $F(\mathbf{w})$, $I$ is the identity matrix, and $\mu_k$ is the control parameter.

The control parameter $\mu_k$ changes from iteration to iteration and is set to a large value at the beginning. In that regime the LM algorithm is equivalent to the steepest descent method, which converges quickly while the current estimate is still far from the optimal point; however, as the estimate approaches the optimum the convergence slows down considerably, so steepest descent alone would take much longer to find the optimal result.

In the later stage of training $\mu_k$ becomes very small, and the LM algorithm becomes equivalent to Newton's method, which converges quickly when the estimate is close to the optimal result.
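A minimal sketch of the Levenberg-Marquardt update described above is given below. The `gradient` and `hessian` callables and the rule for adapting the control parameter (shrink $\mu_k$ after a successful step, grow it otherwise) are generic assumptions rather than the exact scheme of any particular toolbox; in practice the Hessian is usually approximated from the Jacobian of the network errors.

```python
import numpy as np

def lm_step(w, gradient, hessian, mu):
    """One Levenberg-Marquardt update:
    w(k+1) = w(k) - [H(k) + mu_k * I]^(-1) g(k).

    Large mu -> behaves like (scaled) steepest descent.
    Small mu -> behaves like Newton's method.
    """
    H = hessian(w)
    g = gradient(w)
    I = np.eye(len(w))
    return w - np.linalg.solve(H + mu * I, g)

def train_lm(w, loss, gradient, hessian, mu=1.0, n_iter=100):
    """Adapt mu: shrink it when a step reduces the loss, grow it otherwise."""
    for _ in range(n_iter):
        w_new = lm_step(w, gradient, hessian, mu)
        if loss(w_new) < loss(w):
            w, mu = w_new, mu / 10.0   # accept step, move toward Newton's method
        else:
            mu *= 10.0                 # reject step, move toward steepest descent
    return w

# toy usage on a quadratic loss F(w) = ||w - 1||^2
loss = lambda w: np.sum((w - 1.0) ** 2)
gradient = lambda w: 2.0 * (w - 1.0)
hessian = lambda w: 2.0 * np.eye(len(w))
print(train_lm(np.zeros(3), loss, gradient, hessian))
```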

The main reason we chose the BPNN as our prediction tool is that the BPNN is usually regarded as a "universal approximator" (Hornik et al., 1989). Hornik et al. showed that a three-layer BPNN can approximate any continuous function arbitrarily well, using an identity (i.e. linear) transfer function in the output layer and sigmoid-type functions (i.e. the log-sigmoid or hyperbolic tangent function) in the hidden layer.

In practice, networks with one or occasionally two hidden layers are widely used and perform well. In this thesis we utilized a four-layer feed-forward BPNN. The numbers of neurons in the two hidden layers were set to the number of IMFs and to that number plus two, respectively, since the number of neurons in a hidden layer can range from one-half to two times the sum of the numbers of input and output nodes (Mendelsohn, 1993).

The complete four-layer BPNN applied here is shown below:

Figure 2.4 Structure of the complete four-layer BPNN
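As a concrete illustration of this sizing rule, the hypothetical sketch below builds the weight matrices of a four-layer feed-forward network whose two hidden layers contain `n_imf` and `n_imf + 2` neurons; the function name, the random initialization and the example sizes are assumptions, not the exact configuration used in the thesis.

```python
import numpy as np

def build_bpnn(n_inputs, n_imf, n_outputs=1, seed=0):
    """Weight matrices and biases of a four-layer feed-forward BPNN.

    Hidden layer sizes follow the rule used in this thesis:
    n_imf neurons in the first hidden layer, n_imf + 2 in the second.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, n_imf, n_imf + 2, n_outputs]
    weights = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i]))
               for i in range(len(sizes) - 1)]
    biases = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
    return weights, biases

# e.g. 8 inputs and 8 IMFs -> hidden layers with 8 and 10 neurons, one output
weights, biases = build_bpnn(n_inputs=8, n_imf=8)
print([W.shape for W in weights])   # [(8, 8), (10, 8), (1, 10)]
```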


Below is a flowchart of the BPNN training process, which helps to clarify the procedure.

Figure 2.5 Flowchart of the BPNN training process

The flowchart consists of the following steps:
1. Set all the parameters of the neural network.
2. Generate the initial weights from random numbers.
3. Calculate the output values of the hidden layers and the output layer.
4. Calculate the mean square error function.
5. Adjust the weights and biases.
6. Check whether the error function has reached its minimum or the maximum number of training epochs has been reached; if not, return to step 3, otherwise stop training.
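The loop in Figure 2.5 can be sketched as follows; `forward`, `mse` and `adjust_weights` are placeholders for the forward pass, the error function and the weight adjustment (e.g. one LM step) described above, and the stopping thresholds are illustrative assumptions.

```python
def train_bpnn(network, inputs, targets, forward, mse, adjust_weights,
               error_goal=1e-5, max_epochs=1000):
    """Training loop following the flowchart in Figure 2.5.

    The network parameters are assumed to be set and the initial weights
    generated (randomly) before this function is called.
    """
    error = float("inf")
    for epoch in range(max_epochs):                  # maximum number of training epochs
        outputs = forward(network, inputs)           # hidden and output layer values
        error = mse(targets, outputs)                # mean square error function
        if error <= error_goal:                      # error function small enough?
            break                                    # -> stop training
        network = adjust_weights(network, inputs, targets)  # otherwise adjust weights and biases
    return network, error
```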

