4.4 Determination of the MLP Network Parameters

A. Learning Rate Determination

One of the important parameters affecting MLP network performance is the learning rate, which determines the speed of training. In general, a larger learning rate produces larger weight changes when the connection weights are adjusted, but may cause divergence due to oscillation. A smaller learning rate, on the other hand, produces smaller weight adjustments and results in slower training. We use a network with 10 input nodes, 20 hidden nodes, and 10 output nodes to test different learning rates. The stopping error is set to 10^-5, the maximum number of iterations is 10,000, and the momentum parameter is set to 1 - η, where η denotes the learning rate. Thirty-one trials are run to obtain the average performance for each learning rate. The results are listed in Table 4-3 and plotted in Figure 4-5. The MLP network with learning rate 0.6 yields the smallest average mean absolute error in our experiments.

Table 4-3. MLP network performance with different learning rates.


Fig. 4-5. Mean absolute error of MLP networks with different learning rates.
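As a concrete illustration of this sweep, the short Python sketch below trains a 10-20-10 network for learning rates 0.1 to 0.9 with the momentum set to 1 - η, and averages the mean absolute error over several trials. It uses scikit-learn's MLPRegressor and randomly generated placeholder data; the library, the logistic activation, the reduced number of trials, and the dummy data are all assumptions standing in for the thesis's own implementation and well-log datasets.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((600, 10))   # placeholder inputs: 600 patterns, 10 features
Y = rng.random((600, 10))   # placeholder targets: 10 outputs

for lr in [round(0.1 * k, 1) for k in range(1, 10)]:      # learning rates 0.1 .. 0.9
    maes = []
    for trial in range(5):                                # 31 trials in the thesis
        net = MLPRegressor(hidden_layer_sizes=(20,),      # 10-20-10 network
                           activation='logistic',
                           solver='sgd',
                           learning_rate_init=lr,
                           momentum=1.0 - lr,             # momentum = 1 - eta
                           max_iter=10000, tol=1e-5,
                           random_state=trial)
        net.fit(X, Y)
        maes.append(np.mean(np.abs(net.predict(X) - Y)))  # mean absolute error
    # Very large rates may diverge, as noted above, which shows up as a large MAE.
    print(f"learning rate {lr}: average MAE {np.mean(maes):.6f}")
```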

B. Number of Hidden Nodes Determination

For neural network architecture design, the MLP network size usually depends on the problem. In our experiments, the numbers of nodes in the input and output layers are determined in advance; however, the number of nodes in the hidden layer is not. A number of techniques have been developed to help select an appropriate number of hidden nodes. One of them is the pruning method [15], which starts with a trained network that has a large number of hidden nodes and then tries to remove the redundant ones. This involves computing a saliency measure for each hidden node, which quantifies the importance of that node to the network.

A low saliency means that the hidden node has little influence on the network performance and can therefore be removed. In this study, we adopt the pruning method called the Unit Selection Algorithm (USA), proposed by Messer et al. [16], to search for an appropriate number of hidden nodes.
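To make the saliency idea concrete, the sketch below scores each hidden node by the sum of the absolute values of its outgoing weights to the next layer. This particular formula is only an assumed, simple stand-in for the measure of Messer et al. [16] that the thesis adopts.

```python
import numpy as np

def hidden_node_saliency(W_next):
    """Saliency of each hidden node in layer l.

    W_next[i, j] is the weight connecting hidden node i in layer l to node j
    in layer (l+1).  Here the saliency is taken as the sum of the absolute
    outgoing weights (an assumed stand-in measure)."""
    return np.abs(W_next).sum(axis=1)

# Example: 20 hidden nodes feeding 10 output nodes.
rng = np.random.default_rng(1)
W_next = rng.normal(size=(20, 10))
S = hidden_node_saliency(W_next)
print("least salient hidden node:", int(np.argmin(S)))   # candidate for removal
```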

The algorithm needs two datasets to test the saliency: a training dataset D_tr and a verification dataset D_ver. We must also specify a mean absolute error level E_el that we are willing to accept on the verification dataset.

The unit selection algorithm then proceeds as follows; its flowchart is shown in Figure 4-7.

Algorithm 4-1: Unit selection algorithm to prune the MLP network.

Input: Training dataset D_tr and verification dataset D_ver.

Step 1: Set the error level E_el.

Step 2: Construct a large MLP network with h hidden nodes and train it until the mean absolute error on the training dataset is less than an error threshold.

Step 3: Calculate the saliency S_i of each hidden node, i = 1, …, h, from the weights connecting node i in hidden layer l to the nodes in the next layer (l+1), where n(l+1) is the number of nodes in layer (l+1). Figure 4-6 shows layers l and l+1 of the MLP network.

Step 4: Find the node with the smallest saliency, remove it, and set the current number of hidden nodes h = h - 1.

Step 5: Retrain the network for N_r iterations and compute the mean absolute error E_ver on the verification dataset.

Step 6: If E_ver > E_el, go to Step 7; otherwise, go to Step 3.

Step 7: Restore the node that was removed in Step 4 by setting h = h + 1; the algorithm stops and outputs the network with h hidden nodes.

In Step 6, we compare E_ver with the error level E_el. If E_ver is larger than E_el, the algorithm proceeds to Step 7 and stops; otherwise, it repeats from Step 3. In Step 7, the node that was just removed in Step 4 must be restored, so we set h = h + 1. The reason is that E_ver increases as more nodes are removed, so the optimal number of hidden nodes is the one reached just before E_ver exceeds the error level we are willing to accept.

Fig. 4-6. The MLP network.
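The sketch below puts Steps 1-7 together as runnable Python. It uses a minimal NumPy implementation of a 10-h-10 network trained by plain batch gradient descent (the momentum term is omitted for brevity), random placeholder data in place of the well-log datasets, and the same assumed sum-of-absolute-outgoing-weights saliency as above; the error level, retraining length, and network sizes follow the values used in this section, so all implementation details beyond the listed steps are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for the 16 training and 15 verification
# well-log datasets (assumption): 10 inputs and 10 outputs per pattern.
X_tr,  Y_tr  = rng.random((600, 10)), rng.random((600, 10))
X_ver, Y_ver = rng.random((560, 10)), rng.random((560, 10))

def init_net(n_in, n_hid, n_out):
    return {"W1": rng.normal(0, 0.5, (n_in, n_hid)),  "b1": np.zeros(n_hid),
            "W2": rng.normal(0, 0.5, (n_hid, n_out)), "b2": np.zeros(n_out)}

def forward(net, X):
    H = np.tanh(X @ net["W1"] + net["b1"])       # hidden layer (tanh units)
    return H, H @ net["W2"] + net["b2"]          # linear output layer

def mae(net, X, Y):
    return float(np.mean(np.abs(forward(net, X)[1] - Y)))

def train(net, X, Y, iters, lr=0.6):
    """Plain batch gradient descent on the squared error (momentum omitted)."""
    for _ in range(iters):
        H, O = forward(net, X)
        dO = (O - Y) / len(X)
        dH = (dO @ net["W2"].T) * (1.0 - H ** 2)
        net["W2"] -= lr * (H.T @ dO); net["b2"] -= lr * dO.sum(axis=0)
        net["W1"] -= lr * (X.T @ dH); net["b1"] -= lr * dH.sum(axis=0)

def prune_usa(net, E_el=0.005, N_r=10):
    """Steps 3-7: remove the least-salient hidden node, retrain for N_r
    iterations, and stop (restoring the last node) once the verification
    MAE exceeds E_el."""
    while net["W1"].shape[1] > 1:
        S = np.abs(net["W2"]).sum(axis=1)            # Step 3: assumed saliency
        i = int(np.argmin(S))
        backup = {k: v.copy() for k, v in net.items()}
        net["W1"] = np.delete(net["W1"], i, axis=1)  # Step 4: remove node i
        net["b1"] = np.delete(net["b1"], i)
        net["W2"] = np.delete(net["W2"], i, axis=0)
        train(net, X_tr, Y_tr, N_r)                  # Step 5: retrain N_r iterations
        if mae(net, X_ver, Y_ver) > E_el:            # Step 6
            return backup                            # Step 7: restore the node
    return net

net = init_net(10, 20, 10)            # Step 2: start from a large 10-20-10 net;
train(net, X_tr, Y_tr, 2000)          # the thesis trains until training MAE < 0.003
E_el = 1.1 * mae(net, X_ver, Y_ver)   # thesis uses E_el = 0.005; scaled here
                                      # because the placeholder data is random
pruned = prune_usa(net, E_el=E_el)
print("hidden nodes after pruning:", pruned["W1"].shape[1])
```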

In our experiments, we construct an MLP network with 10 input nodes and 10 output nodes. Since the algorithm starts from a large network, we use 20 hidden nodes. The learning rule is the gradient descent method, the learning rate is 0.6, and the momentum parameter is 0.4. The mean absolute error level E_el that we are willing to accept is set to 0.005 for all experiments. Initially, we train the MLP network until the mean absolute error on the training dataset is less than 0.003; after each hidden node removal, the network is retrained for 10 iterations. Sixteen of the 31 datasets are used for training, and the remaining 15 are used for verification. We repeat the experiment 20 times to obtain the average number of hidden nodes. Table 4-4 shows the results of the 20 experiments; the average number of hidden nodes is 11.8.

We then compare the performance of the original MLP network (20 hidden nodes) with that of the network pruned by the algorithm (12 hidden nodes).

Thirty-one trials are run for each network. The number of input nodes is 10, the stopping error is set to 10^-5, the learning rate is 0.6, and the momentum parameter is 0.4. The results are shown in Table 4-5. The MLP network pruned by the USA algorithm is smaller but maintains the same performance level, while its training time is shorter. A smaller network not only saves memory but also saves training time. For these experiments, the suggested number of hidden nodes is 1.2 times the number of input nodes.

Fig. 4-7. Flowchart of the USA algorithm.

Table 4-4. Number of hidden nodes selected by the USA algorithm.

Hidden nodes in 20 experiments: 12, 14, 11, 9, 10, 16, 14, 13, 10, 11, 13, 16, 13, 12, 8, 12, 9, 10, 12, 11
Average: 11.8

Table 4-5. Comparison of MLP network performance between the full network and the network pruned by the USA algorithm.

Network        Network size   Number of training patterns   Avg. of MAE   Smallest MAE   Avg. training time (Sec.)
Full network   10-20-10       600                           0.002972      0.001904       2,454
USA            10-12-10       600                           0.003008      0.002065       2,105

C. Number of Hidden Layers Determination

We add a second hidden layer and test MLP networks with different numbers of nodes in the second hidden layer. Thirty-one trials are run for each MLP network, with 30 training datasets per trial. The network parameters are the same as in the previous experiments: the learning rate is 0.6, the momentum parameter is 0.4, and the stopping error is 10^-5. Table 4-6 shows the experimental results for networks with one and two hidden layers. No significant difference is observed in either the average mean absolute error or the smallest error. Adding one more hidden layer does not provide an apparent improvement, so we use one hidden layer in our experiments.

Table 4-6. Comparison of MLP network performance with one and two hidden layers.

Hidden layers       Network size   Number of training patterns   Avg. of MAE   Smallest MAE   Avg. training time (Sec.)
One hidden layer    10-12-10       600                           0.003008      0.002065       2,105
Two hidden layers   10-12-3-10     600                           0.003169      0.002258       2,602
                    10-12-4-10     600                           0.003304      0.002323       1,957
                    10-12-5-10     600                           0.003312      0.002257       2,751
                    10-12-6-10     600                           0.003128      0.002487       2,446
                    10-12-7-10     600                           0.003058      0.002037       1,954
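A comparison along these lines can be scripted as sketched below. The sketch uses scikit-learn's MLPRegressor with this section's learning rate, momentum, and stopping tolerance, and random placeholder data; the library, the logistic activation, the reduced trial count, and the dummy data are assumptions standing in for the thesis's own implementation and well-log datasets.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X, Y = rng.random((600, 10)), rng.random((600, 10))      # placeholder data

# 10-12-10 (one hidden layer) versus 10-12-k-10 (two hidden layers), k = 3..7.
configs = [(12,)] + [(12, k) for k in range(3, 8)]
for hidden in configs:
    maes = []
    for trial in range(5):                               # 31 trials in the thesis
        net = MLPRegressor(hidden_layer_sizes=hidden,
                           activation='logistic', solver='sgd',
                           learning_rate_init=0.6, momentum=0.4,
                           max_iter=10000, tol=1e-5, random_state=trial)
        net.fit(X, Y)
        maes.append(np.mean(np.abs(net.predict(X) - Y)))
    print(hidden, "average MAE:", round(float(np.mean(maes)), 6))
```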

4.5 Higher-Order Feature Multi-Layer Neural Net
