
4.6 Higher-Order Feature Multi-Layer Neural Net with Conjugate Gradient

In this section, we train neural networks with the conjugate gradient method instead of the gradient descent method, and we also expand the input features using higher-order functions from the second order up to the fifth order. We design five higher-order feature neural nets using these functions: HOCG-1, HOCG-2, HOCG-3, HOCG-4, and HOCG-5, which are listed in Table 4-11.

These experiments are run with the MATLAB® v7.1 Neural Network Toolbox, using the function newff(…) to create a neural network and assigning the golden section search (srchgol) as the line-search routine.
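As a rough illustration of the feature expansion described above, the following is a minimal sketch, assuming a 10-element raw input window; the function name and array handling are illustrative assumptions, not the code used in this thesis.

```python
import numpy as np

def higher_order_features(x, order):
    """Element-wise powers [x, x^2, ..., x^order] of a raw input window.

    With a 10-element x, order = 3 yields the 30 input values of HOCG-3;
    the constant bias term is handled by the network itself.
    """
    x = np.asarray(x, dtype=float)
    return np.concatenate([x ** p for p in range(1, order + 1)])
```

For example, calling higher_order_features(window, 5) on a 10-element window produces the 50 inputs of HOCG-5.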

A. Evaluation of HOCG Networks

Table 4-12 shows the testing results of each network type. The stopping error is set to 10⁻⁵ for all experiments. Each network's performance is evaluated by the average of the mean absolute errors of 31 trials that achieve the training goal. From Tables 4-10 and 4-12, the experimental results show that the neural networks trained by the conjugate gradient method outperform those trained by the gradient descent method: the average of mean absolute error of each HOCG network is smaller, and its average training time is shorter, than that of the corresponding HOML network. Table 4-13 compares the average of mean absolute error of HOML and HOCG. Among the five network types, HOCG-3 yields the largest improvement (16.1%).
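For reference, a minimal sketch of the evaluation metric is given below; the function and variable names are assumptions for illustration, not the code used in this thesis.

```python
import numpy as np

def mean_absolute_error(real_output, desired_output):
    """MAE over all output nodes and depth points of one test dataset."""
    real_output = np.asarray(real_output, dtype=float)
    desired_output = np.asarray(desired_output, dtype=float)
    return float(np.mean(np.abs(real_output - desired_output)))

def average_mae(trial_maes):
    """Average of the per-trial MAEs over the trials that reach the training goal."""
    return sum(trial_maes) / len(trial_maes)
```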

Table 4-11. Five different HOCG types.

Network type   Features (include bias)           Network size
HOCG-1         1 + x                             10-12-10
HOCG-2         1 + x + x²                        20-24-10
HOCG-3         1 + x + x² + x³                   30-36-10
HOCG-4         1 + x + x² + x³ + x⁴              40-48-10
HOCG-5         1 + x + x² + x³ + x⁴ + x⁵         50-60-10

Table 4-12. Comparison of each type of HOCG.

Network type   Network size   Avg. of MAE   Smallest MAE   Avg. training time (sec.)
HOCG-1         10-12-10       0.002861      0.002326       558
HOCG-2         20-24-10       0.002475      0.002018       529
HOCG-3         30-36-10       0.002095      0.001824       616
HOCG-4         40-48-10       0.002369      0.001792       395
HOCG-5         50-60-10       0.002154      0.001765       408

Table 4-13. Comparison of average of mean absolute error between HOML and HOCG.

Network   Avg. of MAE   Network   Avg. of MAE   Improvement
HOML-1    0.003008      HOCG-1    0.002861      4.9%
HOML-2    0.002514      HOCG-2    0.002475      1.5%
HOML-3    0.002496      HOCG-3    0.002095      16.1%
HOML-4    0.002501      HOCG-4    0.002369      5.3%
HOML-5    0.002407      HOCG-5    0.002154      10.5%

B. Discussion of Divergence

The training process can be observed by monitoring the MSE curve. Here we summarize two of the reasons that may cause the learning to fail.

(1) The initial weight values.

Gradient-based methods start the learning procedure from initial connection weights and then update them iteratively with the selected search direction and step size.

Since the error surface of a neural network has numerous local minima, improper initial connection weights may cause the learning to fail. Figure 4-14 shows one of the test cases of HOCG-5 on dataset number 10; the error cannot decrease further even after a large number of iterations. In contrast, Figure 4-15 shows the result of another test of HOCG-5 on the same testing dataset, which achieves the training goal at iteration 1,556.

Fig. 4-14. Testing case of HOCG-5 on dataset number 10 that does not achieve the training goal.

Fig. 4-15. Testing case of HOCG-5 on dataset number 10 that achieves the training goal.

(2) The maximal number of iterations.

The learning procedure may stop prematurely if the maximal number of iterations is not set large enough. Figure 4-16 shows one of the test cases of HOCG-2 on dataset number 12. The training converges at iteration 3,315, but no effective training is observed before about iteration 2,600. Obviously, the training would fail to achieve the training goal if the maximal number of iterations were set to less than 2,600 in this trial.

Fig. 4-16. Testing case of HOCG-2 on dataset number 12.

4.7 Experiments on Reversing the Input and Output of Well Logging Data

A. Determine the Suitable Number of Input Nodes of Network

In order to generate the synthetic training data from the desired true formation conductivity, we reverse the input and the output of the well logging data; that is, the original input becomes the desired output, and the original desired output becomes the input. In some datasets, such as numbers 2 and 5 shown in Figure 4-17, certain depth intervals (for example, 490 to 500 feet in dataset number 2 and 540 to 560 feet in dataset number 5) have the same input features (the dotted line), so a network with only 10 or 20 input nodes may not be suitable for training with these datasets. To find a suitable number of input nodes, we compare the network performance with 10, 20, 40, and 50 input nodes; 10 experiments are done to obtain the average performance. The network model used is HOML-1, trained by the gradient descent method. The stopping error is set to 0.0005, the learning rate is 0.6, the momentum parameter is 0.4, and the testing datasets are numbers 1, 5, 10, 12, 15, 17, 20, 23, 25, and 30. The number of training datasets is 30 for each test. As shown in Table 4-14, HOML-1 with 10 or 20 input nodes cannot converge and thus fails to achieve the training goal. However, it converges with 40 and 50 input nodes, and the one with 50 input nodes performs better, so we use HOML networks with 50 input nodes in the experiments on reversing the input and output of well logging data.

Table 4-14. Average of mean absolute error with 10, 20, 40, and 50 input nodes in HOML-1.

Number of input nodes Network size Avg. of MAE

10 10-12-10 Diverged

20 20-24-20 Diverged

40 40-48-40 0.02398

50 50-60-50 0.02143

[Plots of the datasets over the depth range 490 to 590 feet.]
(a) Dataset number 2. (b) Dataset number 5.

Fig. 4-17. Two well logging datasets.

B. Experimental Results of HOML and HOCG

From Table 4-13, HOML-3, -4, -5 and HOCG-3, -4, -5 have the smaller averages of mean absolute error, so we use these three feature orders, with 50 input and output nodes, in the experiments on reversing the input and output of well logging data. The number of hidden nodes is 1.2 times the number of input nodes. The learning rate for HOML is 0.6 and the momentum parameter is 0.4. The stopping error is set to 0.0005. Tables 4-15 and 4-16 show the comparisons of the average of mean absolute error and of the training time of HOML and HOCG, respectively.

The tests of HOML-3 on dataset number 10, HOML-4 on dataset number 20, and HOML-5 on dataset number 30 are shown in Figure 4-18. In the test of HOML-4 and HOCG-4 in Figure 4-18(b), HOML, trained by the gradient descent method, performs better than HOCG, trained by the conjugate gradient method. However, as shown in Table 4-15, over all three types used, the averages of mean absolute error of the HOCG networks are smaller than those of the HOML networks. From Table 4-16, the larger networks (HOML-5 and HOCG-5) do not differ much in average training time, although the difference in their averages of mean absolute error is significant. On the other hand, the smaller networks (HOML-3 and HOCG-3) differ much more in average training time. For small networks, the conjugate gradient method gives a more significant improvement in time to convergence.

Table 4-15. Average of mean absolute error of HOML and HOCG.

                HOML-3 vs. HOCG-3   HOML-4 vs. HOCG-4   HOML-5 vs. HOCG-5
Network size    150-180-50          200-240-50          250-300-50
HOML            0.01684             0.01965             0.01947
HOCG            0.01507             0.01758             0.01361

Table 4-16. Training time of HOML and HOCG in seconds.

                HOML-3 vs. HOCG-3   HOML-4 vs. HOCG-4   HOML-5 vs. HOCG-5
Network size    150-180-50          200-240-50          250-300-50
HOML            3,957               3,864               5,284
HOCG            1,768               2,239               4,729

[Plots over the depth range 490 to 590 feet.] HOML-3: on dataset number 10, MAE is 0.01817. HOCG-3: on dataset number 10, MAE is 0.01481.

(a) HOML-3 vs. HOCG-3.

[Plots over the depth range 490 to 590 feet.] HOML-4: on dataset number 20, MAE is 0.01356. HOCG-4: on dataset number 20, MAE is 0.01569.

(b) HOML-4 vs. HOCG-4.

[Plots over the depth range 490 to 590 feet.] HOML-5: on dataset number 30, MAE is 0.02326. HOCG-5: on dataset number 30, MAE is 0.00992.

(c) HOML-5 vs. HOCG-5.

Fig. 4-18. Comparison of network performance of HOML and HOCG.

4.8 Summary

From Tables 4-10, 4-12, and 4-15, we summarize the testing performance of the networks in Tables 4-17 and 4-18. In Table 4-17, each HOCG network has a smaller average of mean absolute error and a shorter average training time than the corresponding HOML network, which shows that the conjugate gradient method improves the training efficiency. Also, the networks with higher-order features perform better than the general MLP network under both the gradient descent method and the conjugate gradient method, which shows that the proposed higher-order feature neural nets are more suitable for well logging data inversion. The same result can be seen in Table 4-18, which compares the experimental results of reversing the input and output of well logging data. Moreover, the average training time of the experiments on reversing the input and output of well logging data is much longer than that of the previous, normal inverse procedure. One of the reasons is that the network size is larger, because we use 50 input nodes for each higher-order feature neural net. Another possible reason is that the input features are distributed in the shape of straight lines, as shown in Figure 4-17.

Table 4-17. Comparison of HOML and HOCG in average of mean absolute error and average training time.

Network   Network size   Avg. of MAE   Avg. training time (sec.)

HOML-1 10-12-10 0.003008 2,105

HOCG-1 10-12-10 0.002861 558

HOML-2 20-24-10 0.002514 1,804

HOCG-2 20-24-10 0.002475 529

HOML-3 30-36-10 0.002496 1,483

HOCG-3 30-36-10 0.002095 616

HOML-4 40-48-10 0.002501 1,633

HOCG-4 40-48-10 0.002369 395

HOML-5 50-60-10 0.002407 1,663

HOCG-5 50-60-10 0.002154 408

Table 4-18. Comparison of experimental results of reversing the input and output of well logging data.

Network   Network size   Avg. of MAE   Avg. training time (sec.)

HOML-3 150-180-50 0.01684 3,957

HOCG-3 150-180-50 0.01507 1,768

HOML-4 200-240-50 0.01965 3,864

HOCG-4 200-240-50 0.01758 2,239

HOML-5 250-300-50 0.01947 5,284

HOCG-5 250-300-50 0.01361 4,729

5. Genetic Algorithm (GA) on Well Logging Inversion

5.1 Introduction

The genetic algorithm (GA) was introduced by John Holland et al. in 1975 [17] and is based on biological evolution. GA learning is a competition among all chromosomes in a population, and these chromosomes are treated as potential solutions to the target problem. A population consists of a set of chromosomes, and a chromosome consists of a number of genes. The number of individuals in the population is called the population size. A chromosome is also called an individual in GA, and an iteration of the GA is called a generation. In each generation, the individuals are operated on by the genetic algorithm procedure to form the next generation.

The fitness value of each individual is obtained through the fitness function and is used to evaluate that individual; it can be regarded as a measure of how good the individual is. The fitness function is designed specifically for each problem. Through competition, an individual with a better fitness value has a higher probability of being reproduced and generating offspring, so that the solutions to the target problem improve over generations. GA is a global search algorithm that provides a chance to avoid getting trapped in local minima, and it is therefore often treated as a powerful tool for optimization problems. Moreover, GA needs only a fitness function and does not use gradient information. Because it is easy to implement, GA is widely used in many applications [18], [19].

The GA process can be summarized in the following steps:

(1) Generate the initial population with many individuals.

The GA process starts with an initial population of random chromosomes. A chromosome consists of one or more genes. The basic way of coding is the binary format. Figure 5-1 shows a chromosome consisting of 3 genes, where each gene consists of 3 bits.

Fig. 5-1. Binary format of an individual with 3 genes.

(2) Evaluate the fitness value of each individual.

Each chromosome will then be evaluated by computing its fitness value.

Individuals with higher fitness values have a better chance of generating offspring during evolution because, through competition, they are more likely to be selected in the reproduction phase.

(3) Termination condition.

Rank the fitness values and find the best individual. The algorithm is terminated when the best individual meets the desired requirement or the evolution reaches the maximal number of generations.

(4) Reproduction.

Reproduction is the process for selecting individuals to be reproduced for the next generation. The methods of roulette wheel selection and tournament selection can be used to select individuals.

1. Roulette wheel selection.

In the roulette wheel selection method, the population is viewed as being mapped onto a roulette wheel. Each individual is represented by a slice whose size is proportional to its fitness value; that is, an individual with a higher fitness value occupies more space on the roulette wheel and therefore has more chance to be reproduced. For example, if the number of individuals in the population is N and fi is the fitness value of individual i, then the expected number of copies of individual i to be reproduced is N × fi / (f1 + f2 + … + fN).
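A minimal sketch of roulette wheel selection under this rule is given below; the names are illustrative assumptions, not the code used in this thesis.

```python
import random

def roulette_wheel_select(population, fitness):
    """Pick one individual with probability proportional to its fitness value."""
    total = sum(fitness)
    r = random.uniform(0.0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if running >= r:
            return individual
    return population[-1]  # guard against floating-point round-off
```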

2. Tournament selection.

There are three stages in this reproduction method (a minimal sketch follows the list):

(a) Select two or more individuals.

(b) Reproduce the individual that has the highest fitness value among them.

(c) Repeat step (a) and step (b) until the number of reproduced individuals is equal to the current population size.
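The sketch below follows these three stages; it assumes a tournament of two individuals and is illustrative only, not the code used in this thesis.

```python
import random

def tournament_select(population, fitness, tournament_size=2):
    """Return a new population of the same size via tournament selection."""
    new_population = []
    indices = range(len(population))
    while len(new_population) < len(population):               # stage (c)
        contenders = random.sample(indices, tournament_size)   # stage (a)
        best = max(contenders, key=lambda i: fitness[i])       # stage (b)
        new_population.append(population[best])
    return new_population
```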

(5) Crossover.

Crossover is the process of exchanging gene information between two randomly selected individuals with crossover probability Pr. One basic crossover process is single point crossover, as illustrated in Figure 5-2.

Fig. 5-2. Single point crossover of two individuals; each element of an individual is a gene.
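A minimal sketch of single point crossover as in Figure 5-2 is given below; representing an individual as a list of genes is an assumption for illustration.

```python
import random

def single_point_crossover(parent_a, parent_b, crossover_prob=0.8):
    """Exchange the gene tails of two parents after a random cut point."""
    if random.random() > crossover_prob:
        return list(parent_a), list(parent_b)  # no crossover for this pair
    point = random.randint(1, len(parent_a) - 1)
    child_a = list(parent_a[:point]) + list(parent_b[point:])
    child_b = list(parent_b[:point]) + list(parent_a[point:])
    return child_a, child_b
```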

(6) Mutation.

Mutation is the process of flipping a bit in an individual. The mutation probability Pm decides how many bits will be mutated in the population; for example, a mutation probability of 1% means that about 1% of all bits in the whole population will be flipped.
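A minimal sketch of bit-flip mutation over a binary-coded population is given below; it assumes each individual is a list of 0/1 bits, which is an illustrative assumption.

```python
import random

def mutate(population, mutation_prob=0.01):
    """Flip each bit in the whole population with probability mutation_prob."""
    for individual in population:          # individual: list of 0/1 bits
        for i in range(len(individual)):
            if random.random() < mutation_prob:
                individual[i] = 1 - individual[i]
    return population
```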

Reproduction, crossover, and mutation are the three major GA operators used to produce the new population. A new generation starts by going back to step (2). The flowchart of the genetic algorithm is shown in Figure 5-3, and a compact sketch of the generation loop is given after the flowchart.

Fig. 5-3. Flowchart of genetic algorithm.
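Putting the operators together, the following is a minimal sketch of the generation loop in Figure 5-3; it reuses the selection, crossover, and mutation sketches above, assumes an even population size, and fitness_fn and the stopping test are assumptions rather than the procedure used in this thesis.

```python
def run_ga(init_population, fitness_fn, max_generations, target_fitness,
           crossover_prob=0.8, mutation_prob=0.01):
    """Plain GA loop: evaluate, check termination, reproduce, cross over, mutate."""
    population = init_population
    best_individual, best_fitness = None, float("-inf")
    for _ in range(max_generations):
        fitness = [fitness_fn(ind) for ind in population]        # step (2)
        for ind, f in zip(population, fitness):
            if f > best_fitness:
                best_individual, best_fitness = ind, f
        if best_fitness >= target_fitness:                       # step (3)
            break
        population = tournament_select(population, fitness)      # step (4)
        offspring = []
        for a, b in zip(population[0::2], population[1::2]):     # step (5)
            offspring.extend(single_point_crossover(a, b, crossover_prob))
        population = mutate(offspring, mutation_prob)            # step (6)
    return best_individual
```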

5.2 Real-Coded Genetic Algorithm (RCGA)

The difference between RCGA and GA is the coding format of the individuals [20]. In RCGA, the individuals are represented directly with real numbers instead of binary strings. The binary format has a disadvantage in computational cost because of its encoding and decoding phases. In addition, RCGA deals with a continuous search space without sacrificing numerical precision as the binary format does.

RCGA applies crossover and mutation operators different from those of GA. In order to perform crossover and mutation on real-coded individuals, Michalewicz [21] introduced the arithmetic crossover and the arithmetic mutation.

(1) Arithmetic crossover.

Consider two selected individuals x = {x1, …, xn} and y = {y1, …, yn} to produce offspring p and q as shown in Figure 5-4:

pi = hi × xi + (1 − hi) × yi
qi = (1 − hi) × xi + hi × yi

where hi is a uniform random crossover parameter.

(2) Arithmetic mutation.

Given an individual x = (x1, x2, …, xi, …, xn) that is selected to be mutated and an interval [ai, bi], where ai and bi are the lower and upper bounds for gene xi, the mutation uniformly and randomly selects a new gene xi’ from [ai, bi] to replace the original gene xi, and a new individual x’ = (x1, x2, …, xi’, …, xn) is formed.

Fig. 5-4. Arithmetic crossover.
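A minimal sketch of the two RCGA operators is given below; representing individuals as NumPy arrays and the function names are assumptions for illustration, not the code used in this thesis.

```python
import numpy as np

def arithmetic_crossover(x, y, rng=None):
    """Blend parents x and y gene by gene with uniform random weights h_i."""
    rng = rng or np.random.default_rng()
    h = rng.uniform(0.0, 1.0, size=len(x))
    p = h * x + (1.0 - h) * y
    q = (1.0 - h) * x + h * y
    return p, q

def arithmetic_mutation(x, i, lower, upper, rng=None):
    """Replace gene x_i with a value drawn uniformly from [lower, upper]."""
    rng = rng or np.random.default_rng()
    mutated = np.array(x, dtype=float)
    mutated[i] = rng.uniform(lower, upper)
    return mutated
```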

5.3 Experiments on Well Logging Data Inversion

We use a hybrid method that combines the neural network and GA to perform the well logging data inversion. In Section 5.3.1, we describe how an individual can represent a neural network. In Section 5.3.2, we give the reason for, and the flowchart of, the combination of the neural network and GA. In Section 5.3.3, we carry out experiments to evaluate the performance of the hybrid method. In Section 5.3.4, we carry out experiments on reversing the input and output of well logging data, and Section 5.3.5 summarizes the experiments.

5.3.1. Representation of Weights of Neural Network

When applying GA to find the weight values of a neural network, each gene of an individual represents one connection weight of the network, and each individual thus forms a neural network. The number of genes in an individual is equal to the total number of connection weights of the network. For example, if there are I input nodes, J hidden nodes, and K output nodes in a neural network, an individual that represents the network has L genes, where L = (I+1)J + (J+1)K. Figure 5-5 shows how an individual can represent a network. The network has 2 input nodes, 3 hidden nodes, and 2 output nodes, so the total number of connection weights is (2+1)(3) + (3+1)(2) = 17. Since each gene represents one connection weight, we can use an individual that contains 17 genes to represent such an MLP network.

Fig. 5-5. A MLP network represented by an individual.
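A minimal sketch of mapping a flat individual back onto the two weight matrices of an I-J-K network is given below, following the count L = (I+1)J + (J+1)K; the layout of the flat gene vector is an assumption for illustration.

```python
import numpy as np

def decode_individual(genes, n_in, n_hidden, n_out):
    """Split a flat gene vector into the bias-augmented weight matrices."""
    genes = np.asarray(genes, dtype=float)
    n_w1 = (n_in + 1) * n_hidden
    w1 = genes[:n_w1].reshape(n_hidden, n_in + 1)     # hidden-layer weights + bias
    w2 = genes[n_w1:].reshape(n_out, n_hidden + 1)    # output-layer weights + bias
    return w1, w2

# The example from the text: 2 inputs, 3 hidden nodes, 2 outputs -> 17 genes.
w1, w2 = decode_individual(np.zeros(17), 2, 3, 2)
```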

5.3.2. Combination of Neural Network and GA

GA offers a way to search for the connection weights of MLP networks. However, GA is relatively slow in local search, although it is good at global search. The learning efficiency of GA can be improved by integrating a local search procedure into the evolution; the local search algorithm can be the BP learning algorithm [22].

In our experiment, GA is the main procedure used to find the connection weights of the MLP network, and in order to improve the learning efficiency of GA, BP training using gradient descent is run for Nb iterations on the best individual every Ng GA generations. The fitness function is defined as the inverse of the mean squared error. The algorithm is described as follows, the flowchart is shown in Figure 5-6, and a minimal sketch of the hybrid loop is given after the flowchart.

Algorithm 5-1: Hybrid method for well logging data inversion.

Input: Well logging datasets and the corresponding desired output values. 30 datasets are used in training and 1 dataset is used in testing.

Output: Mean absolute error using 1 testing dataset.

Step 1: Initialization.

1. Set the stopping error.

2. Set parameters of GA and BP training.

3. Generate the initial population with many individuals, and each individual forms a network.

Step 2: Input the 30 well logging training datasets to each individual, and run the genetic algorithm for one generation.

Step 3: Compute the fitness values of all individuals.

Step 4: Take the best individual and perform BP training using the gradient descent method for Nb iterations.

Step 5: Compute the fitness value of the individual obtained in Step 4. If the training error of the best individual is less than the stopping error or the computation reaches the maximal number of generations, the algorithm stops; otherwise, repeat by going to Step 2.

Step 6: Input 1 testing well logging dataset, and compute the mean absolute error.

Fig. 5-6. Flowchart of the hybrid method on well logging inversion.
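A minimal sketch of the hybrid loop of Algorithm 5-1 is given below; mse and bp_train stand in for the thesis' MLP error and BP training routines, the fitness is the inverse of the mean squared error as stated above, and all names, signatures, and parameter defaults are assumptions for illustration.

```python
import numpy as np

def hybrid_ga_bp(population, mse, bp_train, stopping_error, max_generations,
                 nb_iterations=10, crossover_prob=0.8, mutation_prob=0.001,
                 rng=None):
    """GA search over flat MLP weight vectors, refining the best one with BP."""
    rng = rng or np.random.default_rng()
    best = 0
    for _ in range(max_generations):
        # Steps 2-3: fitness = 1 / MSE, then tournament selection of size two.
        fitness = [1.0 / mse(ind) for ind in population]
        selected = []
        while len(selected) < len(population):
            a, b = rng.integers(0, len(population), size=2)
            selected.append(population[a if fitness[a] > fitness[b] else b].copy())
        population = selected
        # Arithmetic crossover of adjacent pairs and uniform mutation in [-1, 1].
        for i in range(0, len(population) - 1, 2):
            if rng.random() < crossover_prob:
                h = rng.uniform(0.0, 1.0, size=len(population[i]))
                x, y = population[i], population[i + 1]
                population[i] = h * x + (1.0 - h) * y
                population[i + 1] = (1.0 - h) * x + h * y
        for ind in population:
            mask = rng.random(len(ind)) < mutation_prob
            ind[mask] = rng.uniform(-1.0, 1.0, size=int(mask.sum()))
        # Steps 4-5: BP refinement of the best individual, then the stopping test.
        fitness = [1.0 / mse(ind) for ind in population]
        best = int(np.argmax(fitness))
        population[best] = bp_train(population[best], iterations=nb_iterations)
        if mse(population[best]) < stopping_error:
            break
    return population[best]
```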

5.3.3 Evaluation of Hybrid Neural Network and GA

A global search procedure such as GA is usually computationally expensive, so it is better to employ a local search procedure as well; therefore, we perform BP training on the best individual after each GA generation. The local search procedure here is used to refine the network of the best individual, not to find the final optimal weights of that individual, so the number of iterations does not need to be large, and we set Nb = 10. The number of individuals in the population is 30, and the maximal number of generations is 10,000. The crossover rate and the mutation rate are set to 0.8 and 0.001, respectively. The crossover parameter is randomly selected from the range [0, 1], and the mutation range is set to [-1.0, 1.0]. The reproduction method used is tournament selection. We use the hybrid methods from HOML-1 with GA to HOML-5 with GA; the learning rate in the HOML networks is 0.6 and the momentum parameter is 0.4.

The stopping error is set to 10⁻⁵. Ten experiments are done for each hybrid method to obtain the average performance. The testing datasets used are numbers 1, 5, 10, 12, 15, 17, 20, 23, 25, and 30, and the number of training datasets is 30. The testing results are shown in Table 5-1.

Table 5-1 shows the testing results of the different architectures of the hybrid method. In our experiments, adding more higher-order terms to the input features does not provide further improvement in reducing the mean absolute error. Among these 5 hybrid methods, HOML-3 with GA yields the smallest average of mean absolute error. Comparing Table 5-1 with the testing results of HOML shown in Table 4-10, no significant difference in the average of mean absolute error can be found. However, the average training time of the hybrid method is much longer than that of HOML.

Figure 5-7 shows two of the test cases. Figure 5-7(a) shows the test case of HOML-2 with GA on dataset number 10, which has the smallest mean absolute error among the 10 trials. Figure 5-7(b) shows the test case of HOML-3 with GA on dataset number 20, which also has the smallest mean absolute error among the 10 trials.

Table 5-1. Testing results of HOML with GA.

Network           Network size   …     Avg. of MAE   Smallest MAE   Avg. training time (sec.)
HOML-1 with GA    10-12-10       600   0.002927      0.002768       9,568
HOML-2 with GA    20-24-10       600   0.002492      0.002207       9,218
HOML-3 with GA    30-36-10       600   0.002349      0.002008       13,521
HOML-4 with GA    40-48-10       600   0.002729      0.002461       14,254
HOML-5 with GA    50-60-10       600   0.002658      0.002125       14,168

[Plot over the depth range 490 to 590 feet; output range 0 to 1; curves: desired output and real output.]
(a) Testing on dataset number 10 with mean absolute error 0.002507 using HOML-2 with GA.

[Plot over the depth range 490 to 590 feet; output range 0 to 1; curves: desired output and real output.]
(b) Testing on dataset number 20 with mean absolute error 0.002208 using HOML-3 with GA.

Fig. 5-7. Testing results in HOML-2 with GA and HOML-3 with GA.

5.3.4 Experiments on Reversing the Input and Output of Well Logging Data

We use HOML-3 with GA to HOML-5 with GA to do the experiments on reversing the input and output of well logging data.
