
1. Overview of Neural Network

An artificial neural network (ANN) can be defined as a model of reasoning based on the human brain (Negnevitsky, 2005). An artificial neural network consists of a number of interconnected neurons, which are analogous to the biological neurons in the brain and are connected by weighted links that pass signals from one neuron to another. A biological neuron consists of a cell body called the soma, a number of fibers called dendrites, and a single long fiber called the axon. Dendrites branch into a network around the soma, while the axon stretches out to the dendrites and somas of other neurons. Fig. 9 illustrates the schematic of a biological neural network. The neurons are connected to the external environment through the input and output layers. A neuron receives several signals from its input links, computes an activation level, and sends the information as an output signal through its output links. The output signal can be the final solution to the problem or an input to other neurons. Neural networks have the capability to learn, using experience to improve their performance.

Figure 9 Biological neural network. Source: Negnevitsky (2005)

A neural network has the ability to learn and generalize, which means it can produce reasonable outputs for inputs not encountered during the training phase. This gives ANNs the ability to solve complex problems that are otherwise difficult. However, an ANN cannot solve a problem by working in isolation; it needs to be integrated into a consistent system engineering approach. ANNs offer several useful properties and capabilities, such as: nonlinearity, input-output mapping, adaptivity, evidential response, contextual information, fault tolerance, uniformity of analysis and design, and neurobiological analogy.

There are three basic elements in an ANN model, as shown in Fig. 10. First, the interconnecting links, or synapses, are each characterized by a weight: an input signal connected to the neuron is multiplied by the corresponding synaptic weight, which may take a positive or a negative value. Second, an adder sums the input signals, weighted by the respective synapses of the neuron. Third, an activation function limits the permissible amplitude range of the output signal to some finite value. An external bias is also applied in the neuron model in Fig. 10; the bias increases or decreases the net input of the activation function.

Figure 10 Nonlinear model of a neuron

The weights are the basic means of long-term memory in neural networks, and learning is performed by repeated adjustment of these weights. Each neuron receives a number of input signals through its connections, while its output signal is transmitted through the neuron's outgoing connection. The neuron computes the weighted sum of the input signals and then compares it with a threshold value. For example, if the net input is less than the threshold, the neuron output is -1; when the net input is greater than or equal to the threshold, the neuron becomes activated and the output takes the value +1. The mathematical model of a neuron is described in Eq. (8) as follows:

$$u_k = \sum_{j=1}^{m} w_{kj} x_j \qquad (8)$$

and,

$$y_k = \varphi(u_k + b_k) \qquad (9)$$

where x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the synaptic weights of neuron k; uk is the linear combiner output; bk is the bias; ϕ(⋅) is the activation function; and yk is the output signal of the neuron.
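As an illustration, the neuron model of Eqs. (8)-(9) can be sketched in a few lines of Python; the weights, inputs, bias, and the choice of a logistic activation below are illustrative assumptions, not values from the source:

```python
import math

def neuron_output(x, w, b, phi):
    """Compute y_k = phi(u_k + b_k), where u_k = sum_j w_kj * x_j (Eqs. 8-9)."""
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))  # linear combiner output u_k
    return phi(u + b)                             # activation applied to u_k + b_k

# Illustrative example with a logistic sigmoid activation
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
y = neuron_output(x=[1.0, 0.5], w=[0.4, -0.2], b=0.1, phi=sigmoid)
```

Any of the activation functions discussed in the next section can be passed in as `phi`.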

2. Activation Functions

Activation functions define the output of a neuron in terms of the induced local field. There are three basic types of activation functions: the threshold function, the sigmoid function, and the piecewise-linear function. These activation functions are depicted in Fig. 11.

Figure 11 ANN activation functions: threshold, sigmoid, and piecewise-linear

Threshold function

The threshold activation function is also called the hard-limit function or Heaviside function. This type of activation function is expressed as:

$$\varphi(\upsilon) = \begin{cases} 1, & \text{if } \upsilon \ge 0 \\ 0, & \text{if } \upsilon < 0 \end{cases} \qquad (10)$$

The output of a neuron k employing such a threshold function is:

$$y_k = \begin{cases} 1, & \text{if } \upsilon_k \ge 0 \\ 0, & \text{if } \upsilon_k < 0 \end{cases} \qquad (11)$$

where υk is the induced local field of the neuron as follows:

$$\upsilon_k = \sum_{j=1}^{m} w_{kj} x_j + b_k \qquad (12)$$

Much of the literature refers to this neuron as the McCulloch-Pitts model, developed by McCulloch and Pitts in 1943 (Haykin, 1999). In this model, when the induced local field of a neuron is non-negative, the output value is 1; when it is negative, the output value is 0. This function is often used in classification and pattern recognition tasks.
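A minimal sketch of the McCulloch-Pitts neuron of Eqs. (10)-(12); the AND-gate weights w = [1, 1] and bias b = -1.5 are a common textbook illustration, not values taken from the source:

```python
def threshold(v):
    """Heaviside / hard-limit activation (Eq. 10): 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0

def mcculloch_pitts(x, w, b):
    """Threshold applied to the induced local field v_k (Eqs. 11-12)."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b
    return threshold(v)

# Illustrative weights realizing a logical AND of two binary inputs
and_gate = lambda x1, x2: mcculloch_pitts([x1, x2], [1.0, 1.0], -1.5)
```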

Sigmoid function

The graph of the sigmoid function is s-shaped, and it is the most common form of activation function in neural networks. The sigmoid function can be defined as a strictly increasing function that exhibits a smooth balance between linear and nonlinear behavior. It can be formulated as in Eq. (13).

$$\varphi(\upsilon) = \frac{1}{1 + \exp(-a\upsilon)} \qquad (13)$$

where a is the slope parameter of the sigmoid function. Different slopes can be obtained by varying the parameter a; the sigmoid function becomes a threshold function as the slope parameter approaches infinity. The sigmoid function assumes a continuous range of values from 0 to 1 and is differentiable. The sigmoid activation function is used in back-propagation neural networks.
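Eq. (13) and the effect of the slope parameter a can be sketched as follows (a minimal illustration; the sample values are assumptions):

```python
import math

def sigmoid(v, a=1.0):
    """Logistic sigmoid phi(v) = 1 / (1 + exp(-a*v)) with slope parameter a (Eq. 13)."""
    return 1.0 / (1.0 + math.exp(-a * v))

# As the slope parameter grows, the sigmoid approaches the threshold function
steep = sigmoid(0.2, a=50.0)   # very close to 1
shallow = sigmoid(0.2, a=1.0)  # roughly 0.55
```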

Piecewise-linear function

A piecewise-linear activation function, also called a linear function, provides an output value equal to the neuron's weighted input within its linear region. This activation function is often used for linear approximation. The piecewise-linear activation function is described in Eq. (14).

$$\varphi(\upsilon) = \begin{cases} 1, & \upsilon \ge +\tfrac{1}{2} \\ \upsilon, & +\tfrac{1}{2} > \upsilon > -\tfrac{1}{2} \\ 0, & \upsilon \le -\tfrac{1}{2} \end{cases} \qquad (14)$$

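Eq. (14) translates directly into code (a minimal sketch):

```python
def piecewise_linear(v):
    """Piecewise-linear activation (Eq. 14): saturates at 1 above +1/2,
    at 0 below -1/2, and passes v through unchanged in between."""
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v
```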

3. Learning Rules and Methods

An artificial neural network has the ability to learn from its environment and can improve its performance through the learning process. An ANN learns about the environment through adjustment of its weights and bias levels. There are several learning rules in ANNs, such as error-correction learning, Boltzmann learning, Hebbian learning, and competitive learning.

Error correction learning

In error-correction learning, the weights are modified in order to reduce the error value. The error can be reduced gradually by using the error signal (the difference between the target output and the actual output). The error signal is defined as follows:

$$e_k(n) = d_k(n) - y_k(n) \qquad (15)$$

where ek(n) is the error signal, dk(n) is the desired response (target output), and yk(n) is the output signal of neuron k.
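The error signal of Eq. (15) drives the weight adjustment. The update below is the standard delta-rule step, dw_kj = eta * e_k * x_j, from Haykin (1999); it is included as an illustration rather than as the exact procedure of the source:

```python
def error_correction_step(w, x, d, y, eta=0.1):
    """Compute e = d - y (Eq. 15) and apply the delta-rule update
    w_j <- w_j + eta * e * x_j to each weight."""
    e = d - y
    return [w_j + eta * e * x_j for w_j, x_j in zip(w, x)]
```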

Boltzmann learning

The Boltzmann learning rule is a stochastic learning algorithm based on a design called the Boltzmann machine (Haykin, 1999). The neurons form a recurrent structure and operate in a symmetrical binary manner: the 'on' state is denoted by +1 and the 'off' state by -1. The weights are symmetric (wij = wji). This rule is similar to error-correction learning but differs in the way the error value is adjusted. The neurons in Boltzmann learning fall into two functional groups: visible and hidden. Visible neurons provide an interface between the network and the environment, while hidden neurons always operate freely. The network operates in two modes: the clamped condition and the free-running condition. In the clamped condition, the visible neurons are all clamped onto specific states determined by the environment, whereas in the free-running condition all neurons are allowed to operate freely. The Boltzmann learning rule is:

$$\Delta w_{kj} = \eta\left(\rho_{kj}^{+} - \rho_{kj}^{-}\right), \quad j \ne k \qquad (16)$$

where η is the learning-rate parameter, ρ+kj is the correlation between the states of neurons j and k with the network in the clamped condition, and ρ−kj is the correlation between the states of neurons j and k with the network in the free-running condition. Both ρ+kj and ρ−kj range in value from -1 to +1.
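Once the two correlations have been estimated, Eq. (16) reduces to a one-line weight change (estimating the correlations requires sampling the Boltzmann machine in both conditions, which is outside this sketch):

```python
def boltzmann_update(rho_clamped, rho_free, eta=0.1):
    """Boltzmann weight change (Eq. 16): eta * (rho+ - rho-),
    where both correlations lie in [-1, +1]."""
    return eta * (rho_clamped - rho_free)
```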

Hebbian learning

A Hebbian synapse increases in strength when the presynaptic and postsynaptic signals are positively correlated; when these signals are uncorrelated or negatively correlated, the strength decreases. The Hebbian learning rule is also called the activity product rule. The simplest form of Hebbian learning can be represented as in Eq. (17).

$$\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n) \qquad (17)$$

where η is a positive learning-rate constant, and wkj is the synaptic weight of neuron k with presynaptic signal xj and postsynaptic signal yk.
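Eq. (17) as a sketch in Python (the sample values in the test are illustrative):

```python
def hebbian_update(w, y_k, x, eta=0.01):
    """Activity-product rule (Eq. 17): dw_kj = eta * y_k(n) * x_j(n).
    Correlated pre- and postsynaptic activity strengthens the weight;
    anti-correlated activity weakens it."""
    return [w_j + eta * y_k * x_j for w_j, x_j in zip(w, x)]
```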

Competitive learning

In competitive learning, the output neurons compete among themselves to become active, and only one output neuron is active at any one time. This feature makes competitive learning rules highly suitable for finding important features and classifying input patterns. The competitive learning rule is defined in Eq. (18).

$$\Delta w_{kj} = \begin{cases} \eta\,(x_j - w_{kj}), & \text{if neuron } k \text{ wins the competition} \\ 0, & \text{if neuron } k \text{ loses the competition} \end{cases} \qquad (18)$$

where wkj is the synaptic weight connecting input node j to neuron k, xj is the input pattern, and η is the learning-rate parameter.
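A sketch of Eq. (18), with the winner chosen as the neuron whose weight vector has the greatest total input (dot product with the input pattern), which is one common winner criterion assumed here:

```python
def competitive_step(weights, x, eta=0.1):
    """One competitive-learning step (Eq. 18). The winning neuron moves its
    weight vector toward the input pattern; losing neurons are unchanged."""
    # Winner: neuron with the greatest total input (dot product with x)
    totals = [sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in weights]
    winner = totals.index(max(totals))
    return [
        [wj + eta * (xj - wj) for wj, xj in zip(w_k, x)] if k == winner else list(w_k)
        for k, w_k in enumerate(weights)
    ]
```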

Generally, learning methods can be classified into three categories: supervised learning, unsupervised learning, and hybrid learning. In supervised learning, every input from the training set is accompanied by the desired output. The difference between the actual output and the predicted output is used to modify and adjust the weights, so that the prediction comes very close to the actual value. In unsupervised learning, or self-organized learning, the learning process proceeds without a teacher: the network organizes itself to group similar input vectors without labeled data. One way to perform unsupervised learning is to use the competitive learning rule with an input layer and a competitive layer. The input layer receives the available data set, while the neurons in the competitive layer compete with each other to respond to features of the input data; the neuron with the greatest total input wins the competition and turns on, while the others switch off. In the hybrid learning method, the learning process combines supervised and unsupervised learning: some of the weight adjustments are decided by the supervised method, while others are obtained through unsupervised learning.

4. Problems in Learning Period

In neural networks, overfitting (or overtraining) is a problem that occurs during training: the training error is driven to a very small value, but the error becomes large when new data are presented. This happens because the network has memorized the training examples but has not learned to generalize to new circumstances. There are several methods to overcome the overfitting problem. The first is to use a network that is just large enough to provide an adequate fit: the more complex the network, the more complex the functions it can create. However, if it is difficult to know the proper size of the network in advance, early stopping and regularization can be used to improve generalization. In the early stopping method, the available data are divided into three subsets. The first subset is the training set, which is used to compute the gradient and to update the weights and biases. The second subset is the validation set; during training, the error on the validation set is monitored. During the initial phase of training, the errors on both the training set and the validation set decrease, but when the network begins to overfit the data, the error on the validation set starts to increase. Training is then stopped, and the weights and biases at the minimum of the validation error are returned. The last subset is the test set, which is not used during training; it is used to compare different models and to plot the test-set error during the training process. The other method to reduce overfitting is regularization, in which generalization is improved by modifying the performance function, normally chosen to be the sum of squares of the network errors on the training set.
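The early stopping procedure described above can be sketched as a training loop that monitors the validation error. Here `train_step` and `val_error` are hypothetical callables standing in for one training epoch and a validation-set evaluation, and the small patience window is an assumption to tolerate noise (the simplest form stops on the first increase):

```python
def train_with_early_stopping(train_step, val_error, max_epochs=100, patience=5):
    """Run training epochs until the validation error stops improving.
    Returns the epoch and error at the validation-error minimum."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_step()            # one pass over the training set
        err = val_error()       # monitored validation-set error
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break           # validation error began to rise: overfitting
    return best_epoch, best_err
```

In a full implementation the weights and biases at `best_epoch` would also be saved and restored, as the text describes.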

5. Multilayer Neural Network

A multilayer neural network is a feed-forward neural network with one or more hidden layers. The multilayer perceptron consists of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 12. The input layer accepts input signals from the outside world and distributes the data to all neurons in the hidden layer. The output layer accepts output signals from the hidden layer and establishes the output of the entire network. An example of a multilayer feed-forward network is the back-propagation neural network (BPNN), whose structure can handle nonlinear relationships between the input and output layers through supervised learning.

Figure 12 Architecture of multilayer perceptron

The number of hidden layers is very important, and many techniques are used to decide the architecture. A hidden layer does not interact with the external environment directly; however, it influences the final output. Therefore, the number of hidden layers and the number of neurons in each hidden layer should be chosen carefully. If the number of neurons is too small, the training error will be high and under-fitting can occur. On the other hand, selecting too many neurons in the hidden layer will make the training error low, but over-fitting will occur instead. Generally, a single hidden layer is used in the architecture of a BPNN. There are several rules of thumb for selecting the number of neurons in the hidden layer, such as:

1) The number of neurons should be between the size of the input layer and the size of the output layer;

2) The number of neurons should be 2/3 of the size of the input layer plus the size of the output layer;


3) The number of hidden neurons should not be more than twice the size of the input layer.
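The three rules of thumb can be tabulated for a given architecture. Rule 2 is read here as two-thirds of the input-layer size plus the output-layer size, which is one common interpretation (an assumption, since the source phrasing is ambiguous):

```python
def hidden_size_candidates(n_input, n_output):
    """Candidate hidden-layer sizes from the three rules of thumb above."""
    rule1 = (min(n_input, n_output), max(n_input, n_output))  # between input and output sizes
    rule2 = round(2 * n_input / 3 + n_output)                 # 2/3 input size + output size
    rule3_max = 2 * n_input                                   # at most twice the input size
    return rule1, rule2, rule3_max
```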

By considering these rules, the appropriate number of hidden layers and hidden neurons can be decided. A BPNN works by adjusting the weights through a forward pass and a backward pass based on the errors. This process continues until the error value is within an acceptable range. In the learning process, the weight values of the network are adjusted based on the input information and the output results. Better output can be obtained by providing more detail in the input training classification and a greater amount of learning information. Since the learning and verification data for the BPNN are limited by the range of the activation functions, the data must be normalized by the following equation:

$$P_N = \frac{(P - P_{\min})(D_{\max} - D_{\min})}{P_{\max} - P_{\min}} + D_{\min} \qquad (19)$$

where PN is the normalized data, P is the original data, Pmax is the maximum value of the data, Pmin is the minimum value of the data, Dmax is the expected maximum value of the normalized data, and Dmin is the expected minimum value of the normalized data. When applying the neural network to the system, the input and output values of the network fall in the range [0.1, 0.9].
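Eq. (19) as a sketch, with default Dmin = 0.1 and Dmax = 0.9 matching the [0.1, 0.9] range mentioned above:

```python
def normalize(p, p_min, p_max, d_min=0.1, d_max=0.9):
    """Min-max normalization (Eq. 19): map [p_min, p_max] onto [d_min, d_max]."""
    return (p - p_min) * (d_max - d_min) / (p_max - p_min) + d_min
```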

There are several conditions for terminating network learning in a BPNN: when the root mean square error (RMSE) between the expected value and the network output value is reduced to a preset value; when the preset number of learning cycles has been reached; or when cross-validation takes place between the training samples and the test data. The first two conditions are related to preset values. This research adopts the first and second approaches, gradually increasing the network training time to decrease the RMSE until it is stable and acceptable. The RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n} \left(y_k - d_k\right)^2} \qquad (20)$$

where n, dk, and yk are the number of training samples, the actual value for training sample k, and the predicted value of the neural network for training sample k, respectively.
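Eq. (20) is a direct transcription into code:

```python
import math

def rmse(d, y):
    """Root mean square error (Eq. 20) between targets d_k and network outputs y_k."""
    n = len(d)
    return math.sqrt(sum((yk - dk) ** 2 for dk, yk in zip(d, y)) / n)
```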
