

2. Introduction to Artificial Neural Network

2.1.2 Technical Background of ANN

An artificial neural network is an algorithm that simulates the function of the brain. To make an artificial neural network function properly, the algorithm must be divided into two major steps: the first step is called the learning phase and the second step is the retrieving phase. In the learning phase, we adjust the system parameters of the artificial neural network, such as the weights and biases. In the retrieving phase, we use the artificial neural network to predict the result for new input data.

A neuron in an artificial neural network functions similarly to a neuron in the brain. A schematic showing this analogy can be found in Figure 2.1.

Xi is the input, i = 1, …, n.

Wi is the weighting coefficient, i = 1, …, n.

P is the bias.

f( ) is the activation function.

Y is the output.

Fig. 2.1. Schematic showing the analogy between an artificial neuron and a biological neuron.

A typical structure of a biological neuron. Dendrites: input from other neurons. Axon: output connecting to other neurons. Synapse: the gap between neurons. Cell body and nucleus: the body of the neuron.

As shown in Fig. 2.1, {Xi}, i = 1, …, n, denotes the excitations from other neurons, {Wi}, i = 1, …, n, represents a set of weighting coefficients describing the connection strengths between neurons, and P denotes a threshold level beyond which the neuron will fire.

Y is the excitation delivered to other neurons. Depending on the performance desired, a variety of activation functions have been adopted in different artificial neural networks.

Several useful forms of the activation function are the step function, the sign function, the hyperbolic tangent function, and the sigmoid function. The latter two are defined as

Sigmoid(x) = 1 / (1 + e^{−(x − b)}),    Hyperbolic tangent: tanh(x − b),    (2-1)

where b is the intercept on the x axis.


Fig. 2.2. Some useful forms of the activation function for an ANN: the step, sign, hyperbolic tangent, and sigmoid functions.
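For reference, the four activation functions named above can be written compactly in Python/NumPy; this is only an illustrative sketch, with the threshold b following Eq. (2-1).

```python
import numpy as np

def step(x, b=0.0):
    """Step function: 0 below the threshold b, 1 at or above it."""
    return np.where(x >= b, 1.0, 0.0)

def sign(x, b=0.0):
    """Sign function: -1 below the threshold b, +1 at or above it."""
    return np.where(x >= b, 1.0, -1.0)

def sigmoid(x, b=0.0):
    """Sigmoid of Eq. (2-1): 1 / (1 + exp(-(x - b)))."""
    return 1.0 / (1.0 + np.exp(-(x - b)))

def hyper_tangent(x, b=0.0):
    """Hyperbolic tangent shifted by the intercept b, as in Eq. (2-1)."""
    return np.tanh(x - b)
```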

2.2 Typical Structures of ANN and the Corresponding Application Examples

2.2.1 The Perceptron Architecture

The perceptron is the first practical ANN and is typically used for data classification. Figure 2.3 shows the structure of a perceptron, which consists of an input layer and an output layer.

In Fig. 2.3, one node symbol denotes a normal node with Y = X, and the other denotes a neural node with Y = f(X).

Fig. 2.3. A typical perceptron structure

A perceptron ANN usually uses either a step function or a sign function as its activation function, depending on the training data set used. Denoting Xi as the input of the i-th input node and Wij as the weighting coefficient between the i-th input node and the j-th neural node, the output of the j-th neural node can be expressed as

Yj = f( Σ_{i=1}^{N} Xi Wij + Pj ).

Here N is the total number of input nodes, Pj is the bias of the neural node, and Yj is the output of the j-th neural node. Let Dj be the desired output at the j-th node and Ej the difference between Yj and Dj. We can use Ej to adjust the ANN system parameters Wij and Pj by following the learning rule shown below:

Wij(t+1) = Wij(t) + η Ej Xi,    (2-2)

Pj(t+1) = Pj(t) + η Ej,    (2-3)

where η denotes the learning rate, with a typical magnitude between 0 and 1, Ej = Dj − Yj, and t is the iteration index.

The training process can be divided into the following steps:

1. First, random numbers between zero and one are used to initialize the weighting coefficients and biases.

2. An input data pattern is presented at the input, and the output and the related errors E are calculated.

3. The errors and the learning rules are used to update the weighting coefficients and biases.

4. Steps 2 and 3 are repeated until the desired result is achieved with a satisfactorily small error (see the sketch below).
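A minimal NumPy sketch of this four-step training loop is given below. The function name, array shapes, step activation, and stopping tolerance are illustrative assumptions rather than the exact code used in this work.

```python
import numpy as np

def train_perceptron(X, D, eta=0.5, max_epochs=1000, tol=1e-3, seed=None):
    """Train a single-layer perceptron with the rules of Eqs. (2-2) and (2-3).

    X: (num_samples, N) input patterns; D: (num_samples, M) desired outputs.
    """
    rng = np.random.default_rng(seed)
    # Step 1: initialize weights and biases with random numbers in [0, 1).
    W = rng.random((X.shape[1], D.shape[1]))
    P = rng.random(D.shape[1])
    for epoch in range(max_epochs):
        mse = 0.0
        for x, d in zip(X, D):
            y = (x @ W + P >= 0).astype(float)   # Step 2: forward pass, step activation
            e = d - y                            # error E_j = D_j - Y_j
            W += eta * np.outer(x, e)            # Step 3: Eq. (2-2)
            P += eta * e                         #         Eq. (2-3)
            mse += np.mean(e ** 2)
        if mse / len(X) < tol:                   # Step 4: stop at a small enough error
            break
    return W, P
```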

2.2.2 Application Example of Perceptron

In this section, we apply a perceptron to data classification to illustrate how this network works. The application example is to classify the 26 letters of the English alphabet. The 26 letters were presented as 12×12-pixel images at the input of the perceptron. Figure 2.4 presents some of the letter images. The output of the ANN is represented by an array of twenty-six digits, such that the output for the letter A is (100000…0), for B is (010000…0), for C is (001000…0), etc.

Fig. 2.4. Some English alphabet figures used in this example.

We converted each letter image (from the first to the last pixel) into an integer array of 144 elements and used the array as the input data. If a pixel is black, the corresponding integer is one; otherwise it is zero. We therefore have 26 input data sets, each being a one-dimensional array of 144 components of ones and zeros. For the output, we use a 26×26 identity matrix with each row corresponding to the desired output, resulting in a perceptron with 144 input neurons and 26 output neurons. The learning rate is set to 0.5, and the weighting coefficients are set to 0.25 by adjusting the bias. As a result, the learning process converges with the learning curve shown in Fig. 2.5, indicating that the mean square error (MSE) approaches zero. As shown in Fig. 2.6, the ANN can successfully recognize the letters with an extremely low error rate.
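The preparation of the 26 training pairs described above can be sketched as follows. The dictionary of letter images is a stand-in generated with random pixels; in practice it would hold the rasterized 12×12 letter figures of Fig. 2.4.

```python
import numpy as np

rng = np.random.default_rng(0)
letters = [chr(c) for c in range(ord('A'), ord('Z') + 1)]

# Stand-in for the rasterized 12x12 letter images described in the text; in
# practice each entry would hold the actual black/white pixels of one letter.
letter_images = {c: (rng.random((12, 12)) > 0.5).astype(int) for c in letters}

# Flatten each 12x12 image into a 144-element vector of ones (black) and zeros (white).
X = np.array([letter_images[c].reshape(144) for c in letters], dtype=float)

# Desired outputs: the rows of a 26x26 identity matrix, e.g. A -> (1, 0, ..., 0).
D = np.eye(26)

# The perceptron then has 144 input nodes and 26 output nodes; it can be trained
# with the update rules of Eqs. (2-2) and (2-3), e.g. with the sketch given earlier.
```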


Fig. 2.5. The learning curve (mean square error versus iteration) of the perceptron ANN for the 26 alphabet letters.

Fig. 2.6. The confusion matrix (x axis: output; y axis: desired output).

The learning curve is different for each training cycle of the alphabet letters because the initial weighting coefficients are generated by a random-number generator and are therefore different for each training cycle. Nevertheless, the final results obtained with different initial sets of weighting coefficients are similar.

The weighting coefficients and biases can be represented by matrices with dimensions of 144×26 and 26×1, respectively. Fig. 2.7 presents the original matrix form of the weighting coefficients. Fig. 2.8 shows the reshaped form of the weighting matrix, with dimensions of 12×312, in which the weighting coefficients are aligned with the corresponding pixel positions in each letter image. It was found that the weighting coefficients have large magnitudes at the positions corresponding to the black pixels in the letter images. The weighting coefficients with larger magnitudes reveal the important pixels in the letter images. The ANN after training may invoke those pixels to recognize the characteristic features of the letters. We found that those important pixels are clustered in the central region.
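The rearrangement of the 144×26 weighting matrix into the 12×312 form of Fig. 2.8 amounts to a column-wise reshape, for example as in the following sketch (W stands for the trained weighting matrix; random values are used here as a placeholder).

```python
import numpy as np

# W is the trained 144x26 weighting matrix (stand-in values used here).
W = np.random.default_rng(0).random((144, 26))

# Each column holds the 144 weights feeding one output letter; reshape every
# column back into a 12x12 map and place the 26 maps side by side, giving the
# 12x312 arrangement of Fig. 2.8 whose pixels line up with the letter images.
W_maps = W.T.reshape(26, 12, 12)
W_reshaped = np.hstack(list(W_maps))    # shape (12, 312)
```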

Fig. 2.7. The matrix of weighting coefficients (in its original form).

Fig. 2.8. English alphabets and the corresponding reshaped weighting matrices

For example, if an ANN tries to distinguish the letter O from X, it may first examine the pixels located in the central region of the letters to find some important distinguishing features. However, this may not be the only way to distinguish them. There may be several possible forms of the weighting matrix that meet our goal. A weighting matrix that yields the power of recognition is a solution to an equation that connects the desired output vector to the product of the input vector and the weighting matrix. Since there are 26 letters, we have a set of linear equations with 144×26 variables and 26×26 constraints. It is not surprising that many solutions may exist. In the case that the equations do not have a solution, the training process would fail and no satisfactory weighting matrix could be produced.

Although a perceptron structure is suited for data classification and recognition, it cannot be used for function approximation. Researchers have therefore developed another type of ANN structure, called the back-propagation neural network, to expand the applicability of ANNs.

2.2.3 Back-Propagation Artificial Neural Network

The back-propagation artificial neural network is useful in many applications of ANNs.

The typical structure of a back-propagation ANN is illustrated in Figure 2.9; it has a multilayer configuration. Here W is the matrix of weighting coefficients of the ANN. For an input data pattern X, the ANN generates an output Y, which is compared with the desired output D. A back-propagation ANN typically possesses a structure of three to four layers, including one input layer, one output layer, and one or two hidden layers. The activation function implemented in a back-propagation ANN is the sigmoid function in both the hidden layers and the output layer, which enhances the performance in the learning phase for nonlinear data structures.

Fig. 2.9. Typical configuration of a back-propagation artificial neural network.

The learning procedure of a back-propagation ANN is summarized as follows:

Firstly, the input data and the corresponding desired output are sent to the ANN. The input data propagate forward layer by layer, from the input layer through the hidden layers to the output layer. The resulting error, defined as the difference between the output and the desired response, is calculated. The weighting coefficients and biases of the ANN are then adjusted in order to minimize the error. Because the ANN has a multilayer structure, we need to invoke the chain rule to calculate the gradient of the error function with respect to the system parameters.

The learning rule of the back-propagation ANN is described as follows. We denote Wij as the weighting coefficient between the i-th input neuron and the j-th hidden neuron, and Wjk as the weighting coefficient between the j-th hidden neuron and the k-th output neuron. The biases of the j-th hidden neuron and the k-th output neuron are denoted as Pj and Pk, respectively. Yk is the output from the k-th output neuron, with Dk the desired output. The forward propagation can be written as

Hj = F(netj) = F( Σ_{i=1}^{n} Xi Wij + Pj ),    Yk = F(netk) = F( Σ_{j=1}^{m} Hj Wjk + Pk ).

Here F is the activation function, Hj is the output of the j-th hidden neuron, the sum of the variables inside the activation function is defined as a new variable "net", and n and m are the numbers of input neurons and hidden neurons, respectively. The error function is defined as

E = (1/2) Σ_{k=1}^{l} (Dk − Yk)²,

where l is the number of output neurons.

We use the gradient descent algorithm to minimize the error function, leading to the following equations for adjusting the weighting coefficients and bias parameters:

Wjk(t+1) = Wjk(t) − η ∂E/∂Wjk,    (2-7)

Pk(t+1) = Pk(t) − η ∂E/∂Pk,    (2-8)

where t is the iteration index and η is the rate constant of the learning process.

By using the chain rule, we can calculate the partial derivatives in Eqs. 2-7 and 2-8, which leads to (details in Appendix I)

δk = (Dk − Yk) F′(netk),    Wjk(t+1) = Wjk(t) + η δk Hj,    Pk(t+1) = Pk(t) + η δk.    (2-10)

We then use δk to adjust the biases at the nodes in the hidden layer and the weighting coefficients between the input layer and the hidden layer by

δj = F′(netj) Σ_{k=1}^{l} δk Wjk,    Wij(t+1) = Wij(t) + η δj Xi,    Pj(t+1) = Pj(t) + η δj.    (2-11)

Because the activation function used is a sigmoid function, we can simplify F′(netj) to a product of sigmoid functions, as shown below:

F′(net) = F(net) [1 − F(net)].

Eqs. 2-10 and 2-11 form the basis of the learning rule needed to train the ANN.
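A compact NumPy sketch of one training iteration implementing Eqs. (2-10) and (2-11) for a network with a single hidden layer is shown below; the function name, array shapes, and calling convention are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, d, W_ih, P_h, W_ho, P_o, eta=0.1):
    """One forward/backward pass for a one-hidden-layer back-propagation ANN.

    x: input vector (n,); d: desired output (l,);
    W_ih: (n, m) input-to-hidden weights; W_ho: (m, l) hidden-to-output weights.
    """
    # Forward propagation.
    h = sigmoid(x @ W_ih + P_h)                 # hidden-layer outputs H_j
    y = sigmoid(h @ W_ho + P_o)                 # output-layer outputs Y_k

    # Deltas; F'(net) = F(net)(1 - F(net)) for the sigmoid.
    delta_k = (d - y) * y * (1.0 - y)           # output layer, Eq. (2-10)
    delta_j = h * (1.0 - h) * (W_ho @ delta_k)  # hidden layer, Eq. (2-11)

    # Parameter updates of Eqs. (2-10) and (2-11).
    W_ho += eta * np.outer(h, delta_k)
    P_o  += eta * delta_k
    W_ih += eta * np.outer(x, delta_j)
    P_h  += eta * delta_j
    return W_ih, P_h, W_ho, P_o, 0.5 * np.sum((d - y) ** 2)
```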

2.2.4 Application of Back-Propagation Artificial Neural Network for Function Approximation

Function approximation is the most useful feature of an ANN. This functionality can be used to approximate the relationship between the input and output of a physical system. Here we consider a simple application example of a back-propagation ANN that approximates the relationship between the refractive indices of two media and the critical angle at their interface, which is governed by Snell's law. The critical angle of light passing through the interface of two media is known to be

θc = sin⁻¹(n1 / n2),    (2-13)

where n1 and n2 are the refractive indices of the two media. For this case, we construct a back-propagation ANN with two input nodes, four hidden nodes, and one output node.
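One possible way to prepare training pairs for this example is sketched below; the sampling ranges of the refractive indices and the normalization of the critical angle to the (0, 1) range of the sigmoid output node are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample pairs of refractive indices with n1 < n2 so that Eq. (2-13) is defined.
n1 = rng.uniform(1.0, 1.5, size=500)
n2 = n1 + rng.uniform(0.05, 1.0, size=500)

# Critical angle from Eq. (2-13), normalized for the sigmoid output node.
theta_c = np.arcsin(n1 / n2)                 # radians
X = np.column_stack([n1, n2])                # two input nodes
D = (theta_c / (np.pi / 2)).reshape(-1, 1)   # one output node, scaled to (0, 1)
```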

The learning rate is set to 0.1 for all weighting and bias parameters. As shown in Fig. 2.10, the convergence of the learning process is poor if the number of hidden nodes is less than three, whereas no further improvement in the convergence is obtained when the number of hidden nodes is more than four.


Fig. 2.10. Learning curves of a back-propagation ANN with different numbers of hidden nodes.

After training, the ANN can reproduce the training data and predict the test data, as shown in Figure 2.11. The predicted values of the ANN agree very well with the theoretical curve.

Fig. 2.11. The input-to-output characteristic curve of the back-propagation ANN with (a) the training data and (b) the test data.

2.2.5 Application of Back-Propagation Artificial Neural Network for Time Series Prediction

Time series prediction is useful for predicting weather temperatures, sunspot periods, chaotic dynamics, and so on. Time series prediction is very similar to function approximation. Here we focus on how to use an ANN to predict the behavior of a chaotic series. We created a chaotic series with the following formula:

Y0 = 0.01,    Yt+1 = 4 Yt (1 − Yt),    (2-14)

which results in the logistic map with r = 4.

For time series prediction, the input to the ANN is the data we have prepared. For example, we can use Y1 to Y3 from Eq. 2-14 as the input data and Y4 as the desired output of the ANN. Similarly, we can shift by one position and take Y2 to Y4 as the input and Y5 as the desired output. In this way, we can prepare numerous data sets to train the ANN. We chose the learning rate to be 0.01 for all weighting and bias parameters. The learning curves obtained with different sets of system parameters are presented in Figure 2.12. After training, we can use the ANN to predict the remaining data in the time series. The predicted values of the test data and training data are plotted in Fig. 2.13, indicating that the performance of the ANN is excellent, with the predicted values almost identical to the real values of the chaotic series.
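The chaotic series of Eq. (2-14) and the sliding-window training pairs described above can be generated as in the following sketch; the series length and the window size of three are illustrative choices.

```python
import numpy as np

def logistic_series(length, y0=0.01, r=4.0):
    """Generate the chaotic series of Eq. (2-14): Y_{t+1} = r * Y_t * (1 - Y_t)."""
    y = np.empty(length)
    y[0] = y0
    for t in range(length - 1):
        y[t + 1] = r * y[t] * (1.0 - y[t])
    return y

series = logistic_series(1000)

# Sliding windows: (Y_t, Y_{t+1}, Y_{t+2}) as input, Y_{t+3} as the desired output.
window = 3
X = np.array([series[t:t + window] for t in range(len(series) - window)])
D = series[window:].reshape(-1, 1)
```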


Fig. 2.12(a). The learning curves of an ANN used for time series prediction, where the input prepared for training the ANN consists of one, two, or three serial values, respectively.


Fig. 2.12(b). The learning curves (error versus iteration) of an ANN with different numbers of hidden nodes (1, 5, and 10).

Fig. 2.13. The time series (top) and the input-to-output characteristic curves (bottom) of a back-propagation ANN with the training data (left) and the test data (right).

2.3 Summary

From the case studies shown in this chapter, we found that ANNs are useful for numerous applications. Most applications of ANNs use the back-propagation structure. Function approximation is useful for simulating the behavior of a physical system in which the underlying processes are unclear: we can build an ANN to simulate the relationship between the input and output of the physical system. The most sensitive issue of an ANN is the training process, which requires much CPU time and may yield poor performance if an inappropriate ANN structure is implemented or the network is trained with inappropriate data sets.

Chapter 3 Complete Characterization of Ultrashort Coherent Optical Pulses with SHG Spectral Measurement

3.1 Introduction

As explained in Chapter 1, the complete field characterization of coherent optical pulses is the first step toward using these pulses for optical metrology. Several techniques have been developed to offer the complete characterization of coherent optical pulses, such as frequency-resolved optical gating (FROG), first reported by D. Kane and R. Trebino [2], and spectral-phase interferometry for direct electric field reconstruction (SPIDER), developed by T. Tanabe, et al. [3].

The basic concept of FROG is quite similar to the autocorrelation measurement, but FROG measures the spectra at different time delays instead of the optical intensity only. The spectral phase, and thus the complete field information of the coherent pulse under study, is then retrieved via an iterative algorithm. SPIDER can directly measure the spectral phase of a coherent pulse with a spectral-shearing interferometer, which separates the incoming coherent pulse into two parts, sends one part through a linear spectral phase modulator and the other through a linear temporal phase modulator, and then superposes the two parts to yield a spectral-shearing interferogram. The spectral phase, and therefore the complete field information of the coherent pulse, can be deduced directly from the interferogram without involving any further iterative calculation.

Along the development of complete coherent pulse characterization, Dorrer et al. invoked a self-referencing device based on the concept of shearing interferometry in the space and frequency domains to perform the spatio-temporal characterization of ultrashort light pulses [22]. Weiner et al. [23] demonstrated the spreading of femtosecond optical pulses into picosecond-duration pseudo-noise bursts. In this case, pulse spreading was accomplished by encoding pseudorandom binary phase codes onto the optical frequency spectrum; subsequent decoding of the spectral phases restores the original pulse. Shelton et al. generated a coherently synthesized optical pulse from two independent mode-locked femtosecond lasers, providing a route to extend the coherent bandwidth available for ultrafast science [24]. Applications of coherent light pulse characterization techniques in femto-chemistry have been well reviewed in [25].

Another attractive approach to characterizing a coherent laser pulse is to use an adaptive feedback-controlled apparatus to tailor the spectral phase of the pulse so as to achieve the maximum second harmonic generation output from a nonlinear optical crystal [26]. In this way, the compensating spectral phases carry the spectral phase information of the coherent pulse under study.

Controlling the quantum evolution of a complex system is an important advance in optical metrology; the technique is now known as coherent or quantum control. Adaptive coherent pulse control [27-30] is the most successful scheme used for quantum control. Several algorithms have been developed to tailor a coherent optical field for a specific target on the basis of fitness information [31-36]. In this regard, a freezing-phase concept has been proposed for adaptive coherent control with a femtosecond pulse shaper [26].

The main goal of this study is to develop an artificial neural network (ANN) model that can be used to retrieve the spectral phase of a coherent pulse directly from the spectrum of the second harmonic generation (SHG) produced in a nonlinear optical crystal. The SHG spectrum is affected by both the SHG process and the spectrum of the incident light pulse. In this chapter, we develop an ANN that retrieves the spectral phase, and therefore the complete field information of a coherent pulse (phase and spectrum), from the measured SHG spectrum.

We assume the temporal profile of a coherent pulse is known, so that we only need to adjust the spectral phase of the input pulse to generate the maximum SHG output from a nonlinear crystal. From the measured SHG spectrum, we retrieve the spectral phase of the input coherent pulse with an artificial neural network. If the approach is successful, we can retrieve the complete field information of a coherent pulse in real time directly from a measured SHG spectrum without time-consuming computation. The apparatus needs only a nonlinear optical (NLO) crystal and a spectrometer.

3.2 Theory

Consider an incident coherent optical pulse E(ω) = A(ω) e^{iφ(ω)}, with a spectrum A(ω) and a spectral phase distribution φ(ω). The second harmonic generation spectrum can be expressed as

ISHG(2ω) = | ∫ E(ω′) E(2ω − ω′) dω′ |².    (3-1)

We assume the spectrum of the coherent pulse to be Gaussian and the spectral phase profile to be properly described by a polynomial of order six,

φ(ω) = Σ_{k=0}^{6} ak (ω − ω0)^k,

since the phase terms of order higher than six are usually much smaller than the low-order terms and can be cut off. In general, the phase terms of order zero and one do not have any effect on the SHG. The spectral phase profile can therefore be further simplified by including only the terms of order two to six.

Note that, from a theoretical point of view, it should be impossible to retrieve the spectral phase of a coherent pulse directly from the SHG spectrum alone. Therefore, in the following we conduct some simulations to test the feasibility of the approach.

3.3 Simulation 1

3.3.1 Preparation of the training data set

To prepare the training data for the ANN, we sampled the spectrum and the phase of a coherent Gaussian pulse to generate 64 data points each. The second harmonic spectrum is represented by an array of 127 data points because the second harmonic spectrum is calculated via a convolution operation.
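The construction of one training sample can be sketched as follows; the frequency grid, the Gaussian width, and the random polynomial coefficients are illustrative assumptions. Note that the discrete convolution of two 64-point arrays indeed yields 2·64 − 1 = 127 points.

```python
import numpy as np

rng = np.random.default_rng(0)

# 64-point frequency grid centered on the carrier frequency (dimensionless units).
w = np.linspace(-1.0, 1.0, 64)

# Gaussian spectral amplitude and a polynomial spectral phase of orders 2..6.
A = np.exp(-w ** 2 / 0.2)
coeffs = rng.uniform(-1.0, 1.0, size=5)           # a2 ... a6
phi = sum(a * w ** k for a, k in zip(coeffs, range(2, 7)))

E = A * np.exp(1j * phi)

# Discrete version of Eq. (3-1): the SHG spectrum is the squared magnitude of
# the autoconvolution of the complex field; a 64-point input gives 127 points.
I_shg = np.abs(np.convolve(E, E)) ** 2            # shape (127,)

# One training pair: input = SHG spectrum (127 points), target = spectral phase (64 points).
x_train, d_train = I_shg, phi
```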

The schematic of the training process is detailed in Fig. 3.1. The input to the back-propagation artificial neural network is the data of the second harmonic generation pulse, comprising a spectral profile array and a spectral phase array.

Fig. 3.1. Schematic showing the training process of a back-propagation artificial neural network. The input data to the ANN are prepared from the SHG spectrum generated by a coherent pulse with a Gaussian amplitude.

