
Proposed Niching SOMS Weight Updating Rule

Figure 4.3 shows the structure and operation of the SOM in the NSOMS. The SOM performs three operations: evaluation, deterministic competition, and search. As shown in Figure 4.3, we initially divide the whole SOM network into H subnetworks (niches); each niche comprises N neurons, and each neuron j in the hth niche (j ∈ Λh) contains a vector of a possible solution set whj (the weight vector), so the total number of neurons in the whole network equals H × N. The initial center location of the hth niche is set to whe, the average of all whj. Take the missile interception application as an example and let the number of incoming missiles be M. Each time new measured data [v1, v2, · · · , vM] are sent into the scheme, the SOM is triggered to operate. All of the possible solution sets in the neurons are then sent to the dynamic model to derive their corresponding data phj. The SOM evaluates the product of the terms ‖vm − phj‖ for m = 1, · · · , M. Among all the neurons of the hth niche, it chooses the neuron j corresponding to the smallest value as the winner. The learning process then continues, and each niche eventually converges to the nearest optimal solution.

The search strategy of a population-based optimization algorithm is to find the best individual and move the other individuals toward the optimal solution. However, one drawback of SOM-based optimization algorithms is that the network size grows exponentially with the dimension r of the search space: the network needs at least 2^r neurons to ensure that every dimension can be considered during the search. To overcome this difficulty, an additional random term, such as random noise or a random search method, is added to raise the optimization efficiency; the algorithm can then use fewer neurons and randomly explore each coordinate direction during the search. In [40], a small amount of random noise and a narrowing-down method are included in the weight updating rules

to improve its performance. In [29], Michele et al. also derive an alternative optimization algorithm based on neural gas networks (NG-ES) to overcome the bad scaling behavior of the KSOM-ES by introducing a mechanism for generating trial points randomly. Wu and Chow proposed a self-organizing and self-evolving agents (SOSENs) neural network that combines multiple simulated annealing algorithms (SAs) with the SOM algorithm [44]. Each neuron of the SOSENs has its own updating rule (self-evolving) based on an SA, and after some time learns from the other neurons by the SOM algorithm (self-organizing). However, when the distance between the current best solution and the real optimal solution is very large, the search processes of these methods do not achieve good performance with only small random changes.

From the discussions above, the SOMS weight updating rule proposed previously may not be suitable for optimization in a multimodal domain. Thus, we made several modifications to reduce the number of neurons and raise the optimization efficiency. We similarly define a Gaussian distribution function G(whj,i(k)) as the distribution function for each element whj,i(k), the ith element of whj(k) in the kth stage of learning:

G(whj,i(k)) = exp(−(whj,i(k) − whe,i(k))² / σhi(k)²) (4.1)

where whe,i(k) stands for the ith element of whe(k), the average of all whj(k), and σhi(k) is the standard deviation of the distribution for whj,i(k). Following the same concept for speeding up the learning described in Chapter 3, the strategy is to vary the mean and standard deviation of G(whj,i(k)) by moving its center toward whj,i(k) and enlarging (reducing) the standard deviation σhi(k) according to twice the distance between whj,i(k) and whe,i(k). The new distribution function G̃(w̃hj,i(k)) is then formulated as

G̃(w̃hj,i(k)) = exp(−(w̃hj,i(k) − w̃he,i(k))² / σ̃hi(k)²) (4.2)

where w̃hj,i(k) stands for the new whj,i(k), w̃he,i(k) the new whe,i(k), and σ̃hi(k) the new σhi(k) after the adjustment. Based on the same strategy as in the SOMS, during each iteration of learning G(whj,i(k)) is dynamically centered at the location of the winning neuron j, with a larger (smaller) width when whe,i(k) is much (little) different from whj,i(k). It thus covers a more fitting neighborhood region and leads to a higher learning efficiency.

In order to greatly reduce the network size for the optimization of a multimodal domain, we adjust the weight updating rule of the SOMS.

From the mapping property of the SOM, we understand that the SOM cannot obtain a good feature map with a small network size. Forcing the weight vectors to form a uniform distribution, like the pre-ordered lattice in the neuron space, therefore becomes less meaningful. A more fitting search range is already well defined by the determined mean and standard deviation of the Gaussian distribution function. Thus, we propose a deterministic neighborhood to design the NSOMS weight updating rules. Based on this concept, the new w̃he,i(k) and σ̃hi(k) are formulated as

w̃he,i(k) = whe,i(k) + ηw(k) · (whj,i(k) − whe,i(k)) (4.3)

σ̃hi(k) = σhi(k) + ηw(k) · (2|whj,i(k) − whe,i(k)| − σhi(k)) + ε (4.4)

where ηw(k), (0 < ηw(k) ≤ 1), stands for the learning rate in the kth stage of learning, and ε is a small value added to prevent σhi(k) from rapidly converging to zero. We can set a large value of ηw(k) to speed up convergence; however, if ηw(k) is set to 1, the learning may converge to a local optimum. Premature convergence can thus be avoided through the introduction of the additional adaptation term ε. From Eqs.(4.3) and (4.4), we can regenerate the new weight w̃hj,i(k) from a Gaussian distribution with mean w̃he,i(k) and standard deviation σ̃hi(k). Under this learning process, the network gradually converges to a very small region as σhi(k) continues to decrease.
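The adjustment above can be sketched as follows for a single weight element; the function names and the per-element scalar form are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def adjust_distribution(w_win, w_center, sigma, eta=0.4, eps=0.1):
    """One per-element NSOMS adjustment (Eqs. 4.3-4.4): move the niche
    center toward the winning weight and track twice their distance with
    the standard deviation; eps keeps sigma from collapsing to zero."""
    w_center_new = w_center + eta * (w_win - w_center)                     # Eq. (4.3)
    sigma_new = sigma + eta * (2.0 * abs(w_win - w_center) - sigma) + eps  # Eq. (4.4)
    return w_center_new, sigma_new

def regenerate_weight(rng, w_center_new, sigma_new):
    """Draw the new weight element from the adjusted Gaussian."""
    return rng.normal(w_center_new, sigma_new)
```

With eta = 1 the center jumps directly onto the winner, which is why the text warns that ηw(k) = 1 risks premature convergence to a local optimum.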

Sometimes two or more niches eventually converge to the same optimal solution, or several optimal solutions have not yet been found. To overcome this difficulty, a technique for automatically determining the number of niches is introduced into the NSOMS to find as many solutions as possible. First, we use Eqs.(4.3) and (4.4) to detect a searched optimal solution when the standard deviation σ̃hi(k) of every element is less than a preset value, and to determine an effective optimal solution, with duplicate optimal solutions excluded, when the mean values w̃ie(k) and w̃je(k) of two niches are very close. If two or more niches are similar, only one is reserved and the others are eliminated from the competition. If the number of niches is equal to the number of effective optimal solutions, a new niche is generated randomly. In other words, we let the niche set size H(k) vary with the effective optimal solution set size Es(k), and we define a specific relation between them:

H(k) = n1 · Es(k) + n0 (4.6)

where n0 and n1 are positive integers, which can be either constants or variables decaying with time. Of course, other types of functions can also be used. During the searching process, to prevent regenerated niches from repeatedly converging on the locations of already searched optimal solutions, the initial center location of a regenerated niche should be as far away as possible from the initial center locations of all previous niches. Hence, we define an evaluation criterion as

DWi∗ = min_i ‖w̃Rh − wCi‖ ≥ λ(k) (4.7)

where w̃Rh stands for the initial center location of the regenerated niche, wCi the ith location in the set ℜC of the initial center locations of all previous niches, and λ(k) the distance evaluation parameter. If the minimum distance DWi∗ is less than λ(k), the new niche is regenerated randomly again. In the initial stage of learning, λ(k) can be set larger to prevent similar niches from repeatedly converging to the same optimal solution. Later, λ(k) may be decreased gradually so that optimal solutions lying very close to one another can also be found. A function for λ(k) that satisfies this demand can be formulated as

λ(k) = (λ(0)/2) · e^(−k/τ) (4.8)

where the initial value λ(0) is the average of all ‖wi(0) − we(0)‖ and τ is the time constant. Of course, other types of functions can also be used. Under this design, the search of the NSOMS becomes faster and more efficient.
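The niche regeneration of Eqs.(4.6)-(4.8) can be sketched as below; the helper names, the uniform resampling range, and the retry cap are our own assumptions:

```python
import numpy as np

def lam_schedule(k, lam0, tau):
    """Distance threshold of Eq. (4.8), decaying with the learning stage k."""
    return (lam0 / 2.0) * np.exp(-k / tau)

def far_enough(candidate, previous_centers, lam):
    """Criterion of Eq. (4.7): the candidate center must be at least lam
    away from every previously used initial center."""
    return all(np.linalg.norm(candidate - c) >= lam for c in previous_centers)

def regenerate_niche_center(rng, low, high, previous_centers, k, lam0, tau,
                            max_tries=100):
    """Sample uniform candidates until one satisfies the criterion."""
    lam = lam_schedule(k, lam0, tau)
    for _ in range(max_tries):
        cand = rng.uniform(low, high)
        if far_enough(cand, previous_centers, lam):
            return cand
    return cand  # accept the last candidate after max_tries attempts
```

Early in learning the large λ(k) pushes new niches away from solved regions; as λ(k) decays, closely spaced optima become reachable.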

4.3 Visualization of the Distribution of Optimal Solutions

Different optimal solutions can now be found by the NSOMS. The next significant task is to select the most useful solutions from the set of optimal solutions. Visualization and clustering of high-dimensional data are well-known strengths of the SOM. We employ the basic principle of the double self-organizing map (DSOM), which updates the weight vectors together with the two-dimensional position vector of each neuron, to visualize the distribution of the optimal solutions. In other words, the positions of the optimal solutions in the parameter space are mapped onto a two-dimensional (2-D) space. This map allows us to classify the optimal solutions into clusters easily, yielding useful information for solution selection.

The NSOMS differs from the DSOM in that the position updating rules of the neurons in the DSOM cannot be applied directly to optimization, because the system parameters operate in quite different ranges. We thus propose a new adaptive mapping model to visualize the distribution of the optimal solutions. First, since the neuron space and the weight vector space have different dimensions, we have to transform them to a common scale. We define two Gaussian-type functions as the neighborhood functions in the neuron space and the weight vector space, Di and F(wi(k)), in the kth stage of learning as

Di = exp(−‖ri − rhj‖² / σd) (4.9)

F(wi(k)) = exp(−‖wi(k) − whj(k)‖² / σc) (4.10)

where ri and rhj stand for the position vectors of neuron i and the winning neuron j of the hth niche, respectively, wi(k) the weight vector of neuron i in the entire network, σd the average of all ‖ri − rhj‖², and σc the average of all ‖wi(k) − whj(k)‖². The Gaussian-type function is frequently used as the neighborhood function, and it is differentiable and continuous.

With the neighborhood functions, the magnitudes of their distances in the neuron space and weight vector space, respectively, can be normalized to be between 0 and 1. The new learning model in the SOM is designed to let nearby neurons of a feature map correspond to nearby weights. From this mapping, a cost function Ei(k) is then defined as

Ei(k) = (1/2)(Di − F(wi(k)))². (4.11)

Based on the gradient-descent approach, the position updating rule of the neurons is derived as

ri(k + 1) = ri(k) − ηp(k) ∂Ei(k)/∂ri(k)
          = ri(k) + ηp(k)(Di · (Di − F(wi(k))) · (ri(k) − rhj(k))) (4.12)

where ηp(k) stands for the learning rate in the kth stage of learning.

It is evident that only two learning parameters, ηw(k) and ηp(k), need to be determined, so it is straightforward to set the learning parameters for diverse optimization problems. By applying the weight and position updating rules of the neurons synchronously, the NSOMS can be applied to optimization, in particular the identification and visualization of multiple optimal solutions in the 2D neuron space.
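The mapping step of Eqs.(4.9)-(4.12) can be sketched as follows; the function names and the convention that σd and σc are passed in precomputed are our assumptions:

```python
import numpy as np

def neighborhood_values(r_i, r_win, w_i, w_win, sigma_d, sigma_c):
    """Gaussian neighborhood measures in the neuron space and the weight
    space (Eqs. 4.9-4.10), both normalized into (0, 1]."""
    D_i = np.exp(-np.sum((r_i - r_win) ** 2) / sigma_d)
    F_i = np.exp(-np.sum((w_i - w_win) ** 2) / sigma_c)
    return D_i, F_i

def update_position(r_i, r_win, D_i, F_i, eta_p=0.4):
    """Position update of Eq. (4.12), from gradient descent on
    E_i = (D_i - F_i)^2 / 2: neurons whose weights resemble the winner's
    weights are pulled together on the 2D map."""
    return r_i + eta_p * D_i * (D_i - F_i) * (r_i - r_win)
```

When D_i = F_i the cost is zero and the position is unchanged, so the map is at rest exactly when neuron-space proximity matches weight-space proximity.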

4.4 Applications

To demonstrate its capability, the NSOMS is applied to both function optimization and dynamic trajectory prediction. A PC with a 3-GHz CPU running MATLAB was used for all the simulations. Based on the NSOMS, we first develop learning schemes corresponding to each application. Simulations are then executed for performance evaluation.

The results are compared in particular with the RCS-PSM [18] because of their similar searching abilities. To compare their performances, two criteria are selected [18]: the maximum peak ratio (MPR), which denotes the sum of the fitness values of the searched optima divided by the sum of the fitness values of the actual optima, and the effective number of maintained peaks (NMP). A searched peak is counted when its best fitness value is at least 80% of the real peak value.
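The two criteria can be computed as in this sketch; the helper name and the per-peak input format are our own assumptions, not the interface of [18]:

```python
def peak_metrics(searched, real, threshold=0.8):
    """Compute NMP and MPR for one run (assumed input format).

    searched: best fitness found near each real peak, aligned with `real`;
    real: the true peak fitness values. A peak counts as maintained when
    the searched value reaches at least `threshold` of the real value."""
    nmp = sum(1 for s, r in zip(searched, real) if s >= threshold * r)
    mpr = sum(searched) / sum(real)  # searched-to-real total fitness ratio
    return nmp, mpr
```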

4.4.1 Function Optimization of a Multimodal Domain

For a function optimization problem, the goal may be to maximize (minimize) an objective function O(·). Let O(whj(k)) be the function value of the weight vector whj(k), which represents a possible solution. During the learning process, Pr is a reference value for the performance evaluation of the optimization. The value of Pr is chosen empirically according to the resolution demanded in learning; we chose it very close to zero. The learning algorithm for function optimization is organized as follows.

Algorithm for function optimization based on the NSOMS: Maximize (minimize) an objective function using the NSOMS.

Step 1: Set the stage of learning k = 0. Choose H number of niches, N number of neurons within each niche, and reference value Pr. Estimate the ranges of the possible parameter space and randomly store the possible parameters whj(0) into the neurons, where j = 1, . . . , N , h = 1, . . . , H.

Step 2: Compute O(whj(k)) for all whj(k).

Step 3: Among the neurons of every niche, find the one with the largest (smallest) value as the winning neuron j for the maximization (minimization) problem.

Step 4: Update the weight vectors of the winning neuron j and its neighbors within every niche, and update the positions of the neurons of the entire network according to the weight and position updating rules described in Eqs.(4.3)-(4.5) and Eq.(4.12).

Step 5: For every niche, if Σ_{i=1}^{q} σ̃hi(k) < Pr, whj(k) is determined to be an effective optimal solution, with duplicate optimal solutions excluded. From Eqs.(4.6) and (4.7), the new whj(k) is randomly regenerated and then added to the set ℜC.

Step 6: Check whether the number of iterations has reached a pre-specified maximum number of iterations. If it has not, let k = k + 1 and go to Step 2; otherwise, the learning process is completed and all optimal values are output. The final network mapping is ready for visualizing the distribution and structure of the optimal solutions.

Four multimodal functions are used to demonstrate the proposed algorithm. These functions are defined as follows:

F1(x) = sin^6(5πx), where 0 ≤ x ≤ 1 (4.13)
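F1 is simple enough to verify directly; a small sketch (the function name is ours) confirms its five equal-height peaks:

```python
import numpy as np

def f1(x):
    """Test function F1 of Eq. (4.13) on [0, 1]."""
    return np.sin(5 * np.pi * x) ** 6

# Maxima occur where sin(5*pi*x) = +/-1, i.e. at x = 0.1, 0.3, 0.5, 0.7, 0.9.
peaks = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
```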

These four test functions have also been used in [15, 18]. The optimization here is to maximize these four functions. The first function, F1, has five peaks of the same height, at x = 0.1, 0.3, 0.5, 0.7, and 0.9. The F2 function has five unequally spaced peaks with different heights. In F3, a(i) = 16[(i mod 5) − 2] and b(i) = 16([i/5] − 2); F3 is a two-dimensional function with 25 peaks of different heights in the interval [476.191, 499.002], and its highest peak is located at (32, 32). The 2D test functions F1-F3 are shown in Figure 4.4. F4 is a four-dimensional function with 4 peaks of the same height, where yi is a constant vector obtained by permuting the components of the vector (8, 2, 2, 2).

The initial number of neurons was set to 1 × 10 (H = 1, N = 10), and the parameters n1, n0, ηw(k), and ηp(k) were set to the constants 2, 1, 0.4, and 0.4, respectively, for all the function optimizations. For comparison, we also applied the RCS-PSM, with an initial population size of 10 to match that of the SOM, and crossover and mutation probabilities of 0.6 and 0.0333 for the GA operation, respectively. To automatically determine the number of niches, we applied the same condition described in Eq.(4.6) to the RCS-PSM in the simulations.

For each test function, each algorithm is run 30 times, with 30 iterations per run. The averaged comparison results are listed in Table 4.1. As shown in Table 4.1, the NSOMS and the RCS-PSM both perform quite well in terms of the number and quality of peaks obtained, but the NSOMS outperforms the RCS-PSM in computational time.

Figure 4.4 The 2D test functions F1(x), F2(x), and F3(x, y).

Figure 4.5 shows the variations of the NMP and MPR over time for the F3 test function. We observed that the NSOMS converged faster than the RCS-PSM.

Figure 4.6 shows the variation of the network structure of the SOM using the NSOMS for the F4 test function. In Figure 4.6(a), we show only the variation of the best four niches. It is clear that the positions of the neurons on the SOM reveal the distribution and structure of the optimal solutions. Figure 4.6(b) shows the neighboring relationship of the neurons of the best four niches; we can observe that the neighborhood function values of the neurons and weights, described in Eqs.(4.9) and (4.10), varied during the NSOMS learning process and eventually became very close to each other. From these results, the NSOMS demonstrates the identification and visualization of high-dimensional optimal solutions.

Table 4.1: Comparison results for NSOMS and RCS-PSM on the 4 test functions.

Function  Method    NMP  MPR  Time (s)
F1        NSOMS      5    1   0.1213
          RCS-PSM    5    1   0.2154
F2        NSOMS      5    1   0.1436
          RCS-PSM    5    1   0.2417
F3        NSOMS     25    1   2.0301
          RCS-PSM   25    1   5.5171
F4        NSOMS      4    1   0.1337
          RCS-PSM    4    1   0.7265

Figure 4.5 Convergence comparisons for the F3 function: (a) the variation of the number of maintained peaks and (b) the variation of the maximum peak ratio during the NSOMS learning process.

Figure 4.6 The results obtained by the NSOMS for the F4 function: (a) projection result in the 2D neuron space, and (b) final neighborhood function values Di and F(wi(k)).

4.4.2 Multiple Dynamic Trajectories Prediction

For a dynamic trajectory prediction problem, the goal is to estimate the initial position and velocity of a moving object using the measured data. For the trajectory prediction of multiple targets, we assume here that target detection has been carried out in advance, and we focus on estimating the initial states of multiple moving objects. Through the learning process, the NSOMS determines the most probable initial state of each target by repeatedly comparing the measured data with the predicted trajectories derived from the possible initial states stored in the neurons of the SOM.

In this application, the nonlinear dynamic equation describing the trajectory of the moving object and the measurement equation are as those described in Chapter 3. The learning algorithm for multiple trajectories prediction is organized as follows.

Algorithm for multiple trajectory prediction based on the NSOMS: Predict an optimal initial state for the trajectory of every moving target using the measured position data.

Step 1: Set the stage of learning k = 0. Choose H number of niches, N number of neurons within each niche, and reference value Pr. Estimate the ranges of the possible position and velocity of the moving object, and randomly store the possible initial states whj(0) into the neurons, where j = 1, . . . , N , h = 1, . . . , H.

Step 2: Send whj(k) into the dynamic model, described in Eqs.(3.25) and (3.28), to compute phj(k).

Step 3: For each neuron j of every niche, compute its output Ohj(k):

Ohj(k) = Π_{m=1}^{M} ‖vm(k) − phj,m(k)‖ (4.17)

where M is the number of objects detected. Find the winning neuron j with the minimum Ohj(k):

Ohj∗(k) = min_j Ohj(k) (4.18)
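The winner selection of Eqs.(4.17)-(4.18) can be sketched as below; the function name and the array layout (one predicted position per detected object, per neuron) are our own assumptions:

```python
import numpy as np

def winning_neuron(v_measured, p_predicted):
    """Evaluate Eq. (4.17) for every neuron of a niche and pick the
    winner of Eq. (4.18).

    v_measured: (M, d) measured positions of the M detected objects;
    p_predicted: (N, M, d) positions predicted from the N neurons' states.
    Returns the index of the neuron whose product of prediction-error
    norms over the M objects is smallest, together with all outputs."""
    outputs = [np.prod([np.linalg.norm(v_m - p_jm)
                        for v_m, p_jm in zip(v_measured, p_j)])
               for p_j in p_predicted]
    return int(np.argmin(outputs)), outputs
```

Because the output is a product, a neuron must predict all M targets reasonably well to win; one large per-target error inflates the whole product.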

Step 4: Update the weight vectors of the winning neuron j and its neighbors within every niche, and update the positions of the neurons of the entire network.

Step 5: For every niche, if Σ_{i=1}^{q} σ̃hi(k) < Pr, whj(k) is determined to be an effective optimal solution, with duplicate optimal solutions excluded. The prediction process outputs the predicted optimal initial states to the dynamic model to derive the object trajectories. From Eqs.(4.6) and (4.7), the new whj(k) is randomly regenerated and then added to the set ℜC.

Step 6: Check whether the number of iterations has reached a pre-specified maximum number of iterations. If it has not, let k = k + 1 and go to Step 2; otherwise, the prediction process is completed and the optimal states of all objects are output. The final network mapping provides the visualization of the distribution of the optimal states. In addition, during each stage of learning, we perform a number of learning iterations to increase the SOM learning speed. This number is set to a large value in the initial stage of the learning process, so that the NSOMS may converge faster at the price of more oscillation, and is decreased gradually to achieve smooth learning in the later stages.

To demonstrate the effectiveness of the proposed NSOMS and its weight updating rule, we performed a series of simulations for dynamic trajectory prediction using the proposed NSOMS and two NSOMS variants without the proposed dynamic weight updating rule (named SOMSO-1 and SOMSO-2, which use the SOMO's and the SOSENs's weight updating rules, respectively, in place of that of the NSOMS). The trajectory to predict in the simulations was designed to emulate that of a missile. Its governing equations of motion in the 3D Cartesian coordinate system are as described in Chapter 3. The ranges of the possible initial states wj(0) were estimated to be

1.14 × 10^6 m ≤ x1(0) ≤ 2.14 × 10^6 m (4.19)

Within the ranges described in Eq.(4.19), the possible initial positions and velocities of the missile were selected and stored into the 1125 (5 × 225) neurons of the 2D SOM. Three targets are considered in the following simulations. The parameters of the NSOMS were set to n1 = 2 and n0 = 1, and the additional adaptation term ε, described in Eq.(4.4), was set to 0.1. For comparison, we set the same learning rates as those of the NSOMS described in Sect. 4.4.1, and several parameters of the SOMSO-1 and SOMSO-2 were adjusted via a trial-and-error process to yield salient performance.

The number of learning iterations was set to 20 during each stage of learning.

We first applied the NSOMS, SOMSO-1, and SOMSO-2 to trajectory prediction with a good estimate of the initial state. Three ideal initial states of the missiles were assumed

