4.6 Experiments on organizing property and data clustering
4.6.1 Experiments on organizing property
Data set description: We conducted experiments on two types of data: a synthetic data set and a real-world data set. The synthetic data set consisted of 500 points uniformly distributed in a unit square. For the real-world data set, we used the training set of class ‘0’ in the “Pen-Based Recognition of Handwritten Digits” database (denoted as PenRecDigits C0) in the UCI Machine Learning Database Repository [96]. The data set consists of 802 vectors, each with 16 dimensions. To demonstrate the map-learning process, we used the first two dimensions of the feature vectors as data for simulations. As a pre-processing step, we scaled down each element of the vectors in PenRecDigits C0 to 1/100 of its original value to avoid numerical traps.
Experiment setup: In the experiments, an 8 × 8 equally spaced square lattice in a unit square was used as the structure of the SOM network. For the neighborhood function, we used the Gaussian kernel $h_{kl}$ in Eq. (2.4).
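As a rough illustration of this setup, the following Python sketch builds an 8 × 8 lattice and evaluates a Gaussian neighborhood kernel between two neurons. Eq. (2.4) is not reproduced in this section, so the kernel form exp(-||r_k - r_l||^2 / (2σ^2)), the placement of lattice nodes on the boundary of the square (spacing 1/7), and the function names are all assumptions made for illustration.

```python
import math


def square_lattice(side=8):
    """Return side*side neuron coordinates equally spaced in the unit square.

    Assumption: nodes lie on the boundary too, so the spacing is 1/(side - 1).
    """
    step = 1.0 / (side - 1)
    return [(i * step, j * step) for i in range(side) for j in range(side)]


def gaussian_kernel(r_k, r_l, sigma):
    """Assumed Gaussian neighborhood: exp(-||r_k - r_l||^2 / (2 * sigma^2))."""
    d2 = (r_k[0] - r_l[0]) ** 2 + (r_k[1] - r_l[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


lattice = square_lattice(8)
corner, neighbor, far = lattice[0], lattice[1], lattice[-1]
for sigma in (0.15, 0.6):
    # Coupling of a corner neuron with its nearest neighbor and with the
    # opposite corner, for the two sigma values used in the experiments.
    print(sigma,
          gaussian_kernel(corner, neighbor, sigma),
          gaussian_kernel(corner, far, sigma))
```

Under this assumed kernel form, σ = 0.6 still couples neurons at opposite corners of the lattice noticeably, whereas at σ = 0.15 that coupling is negligible.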
We evaluated SOCEM, SOEM, SODAEM, and KohonenGaussian (Kohonen’s batch algorithm that uses Gaussian reference models) in 20 independent random initialization trials and two setups for σ in $h_{kl}$. For each trial, data samples were randomly selected from the data set as the initial mean vectors, $\mu_1^{(0)}, \mu_2^{(0)}, \cdots, \mu_G^{(0)}$, of the reference models, which were multivariate Gaussians with full covariance matrices. The initial covariance matrix $\Sigma_l^{(0)}$ was set as $\rho_l I$, where $\rho_l = \min_{k \neq l} \{\|\mu_l^{(0)} - \mu_k^{(0)}\|\}$, for $l = 1, 2, \cdots, G$. To avoid the singularity problem, we applied the variance limiting step to the covariance matrices during the learning process. If the value of any element of the covariance matrix was less than 0.001, it was set to 0.001.
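The initialization and the variance limiting step can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, the means are drawn without replacement (an assumption), and the floor is applied only to the diagonal (variance) entries, which is one possible reading of the rule stated above.

```python
import math
import random


def init_reference_models(data, G, rng):
    """Pick G samples as initial means and set Sigma_l = rho_l * I.

    Assumption: samples are drawn without replacement, so means are distinct.
    """
    means = rng.sample(data, G)
    dim = len(means[0])
    covs = []
    for l, mu_l in enumerate(means):
        # rho_l: Euclidean distance from mean l to its nearest other mean.
        rho = min(math.dist(mu_l, mu_k)
                  for k, mu_k in enumerate(means) if k != l)
        covs.append([[rho if i == j else 0.0 for j in range(dim)]
                     for i in range(dim)])
    return means, covs


def limit_variances(cov, floor=0.001):
    """Raise diagonal entries below `floor` to `floor`.

    Assumption: only variances are floored; the rule in the text could also
    be read as applying to every element of the matrix.
    """
    return [[max(v, floor) if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(cov)]


rng = random.Random(0)
# Stand-in for the synthetic data: 500 points uniform in the unit square.
data = [(rng.random(), rng.random()) for _ in range(500)]
means, covs = init_reference_models(data, G=64, rng=rng)
covs = [limit_variances(c) for c in covs]
```

Here G = 64 corresponds to the 8 × 8 lattice; in a full run, `limit_variances` would be applied to the re-estimated covariance matrices during learning.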
4.6.1.1 Results on the synthetic data
We first demonstrate the map-learning processes of SOCEM, SOEM, and SODAEM using one of the 20 random initializations by showing the configurations of the Gaussian means on the maps, and then summarize the overall results of all the initializations.
Simulations using SOCEM: Figure 4.5 shows two simulations using the SOCEM algorithm. In the first simulation, SOCEM is run with the random initialization in Figure 4.5 (a) and a fixed σ of 0.15 in $h_{kl}$. As shown in Figure 4.5 (b), the algorithm’s learning converges to an unordered map. In the second simulation, SOCEM starts with the same random initialization as that in Figure 4.5 (a), but with a larger σ of 0.6. When it converges at the current σ value, σ is reduced by 0.15. Then, the algorithm is applied again with the new σ value and the reference models obtained in the previous phase. This process continues until SOCEM converges at σ = 0.15. Figures 4.5 (c), (d), (e), and (f) depict the maps obtained when σ = 0.6, 0.45, 0.3, and 0.15, respectively. We can explain the second simulation in terms of annealing (cf. Section 4.2.1): When using SOCEM, we start with a larger σ value (a higher temperature) so that the objective function is simple enough to be optimized. Then, we obtain the target map configuration by gradually reducing the value of σ (the temperature). Though the reduction in σ produces a more complex objective function for optimization, SOCEM can still learn well because the reference models obtained at the larger σ value provide a sound initialization for the next learning phase at the smaller σ value.
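The phase-wise reduction of σ described above can be sketched as a simple warm-started loop. In the following Python sketch, `fit` is a hypothetical callable standing in for one complete run of SOCEM (or any of the other algorithms) to convergence at a fixed σ; it is not part of the original work, and the helper names are likewise assumptions.

```python
def sigma_schedule(start, target, step):
    """Return the sigma values from start down to target in equal decrements."""
    n_steps = round((start - target) / step)
    # Rounding suppresses floating-point drift such as 0.15000000000000002.
    return [round(start - i * step, 10) for i in range(n_steps + 1)]


def anneal_sigma(fit, models, sigmas):
    """Run `fit` at each sigma, warm-starting from the previous phase.

    `fit(models, sigma)` is a hypothetical stand-in for one run to
    convergence; it must return the converged reference models.
    """
    history = []
    for sigma in sigmas:
        models = fit(models, sigma)
        history.append((sigma, models))
    return history


sigmas = sigma_schedule(0.6, 0.15, 0.15)
print(sigmas)  # [0.6, 0.45, 0.3, 0.15]
```

The essential point is that each phase receives the models returned by the previous one, so only the first phase (at the largest σ) depends directly on the random initialization.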
Simulations using SOEM: We conducted two similar simulations using the SOEM algorithm. In the first simulation, SOEM was run with the random initialization in Figure 4.6 (a) (the same as that in Figure 4.5 (a)) and a fixed σ of 0.15. As shown in Figure 4.6 (b), the learning of SOEM converged to an unordered map. In the second simulation, SOEM started with the random initialization in Figure 4.6 (a) and a larger σ of 0.6. Then, the value of σ was gradually reduced to 0.15 in 0.15 decrements. Figures 4.6 (c), (d), (e), and (f) depict the maps obtained when SOEM converges at σ = 0.6, 0.45, 0.3, and 0.15, respectively. Similar to SOCEM, we can interpret the reduction of σ in SOEM as an annealing process (cf. Section 4.3.1), which overcomes the initialization issue. Comparing Figures 4.6 (c)-(d) to Figures 4.5 (c)-(d), we observe that the map obtained by SOEM is more concentrated than that obtained by SOCEM for the same σ value. This may be because SOEM learns the map in a more global manner than SOCEM, as noted in Section 4.3. In other words, each data sample contributes to all the neurons in a more global manner in SOEM than in SOCEM.
Simulations using SODAEM: Figure 4.7 depicts the simulations using the SODAEM algorithm with the same random initialization as that in Figure 4.5 (a) and Figure 4.6 (a). The value of σ is also fixed at 0.15, and the initial value of β is set to 0.16. When SODAEM converges at a β value, it is applied again with $\beta_{new} = \beta \times 1.6$ and the reference models obtained in the previous phase. We stop the learning process at β = 17.592. In our experience, it is appropriate to set the maximum value of β within the range 10 to 20 for practical applications. When β = 0.16, the temperature is high enough to ensure a smooth objective function. Therefore, according to the parameter update rules of SODAEM, the reference models form a compact ordered map via lateral interactions near the center of the data samples, even though the neighborhood size is small (σ = 0.15 in this case). When β = 1.04 and 17.592, SODAEM is almost equivalent to SOEM and SOCEM, respectively. In these two cases, SODAEM converges to the ordered maps in Figure 4.7 (f) and Figure 4.7 (i), respectively. However, as shown in Figures 4.5 (a)-(b) and Figures 4.6 (a)-(b), SOCEM and SOEM do not converge to an ordered map when σ = 0.15, which demonstrates that the annealing process of SODAEM overcomes the initialization problem of SOCEM and SOEM when σ = 0.15. Note that SODAEM may not be able to obtain any ordered map during the annealing process if the value of σ is too small to form an ordered map at a small β value.
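The β schedule follows directly from the numbers above: starting at 0.16 and multiplying by 1.6 gives 0.16 × 1.6^4 ≈ 1.049 (reported as 1.04) at the fifth phase and 0.16 × 1.6^10 ≈ 17.592 at the eleventh. The Python sketch below reproduces this schedule and, as a separate illustration, shows how a generic deterministic-annealing posterior behaves at low, unit, and high β. The function `tempered_posterior` is an assumption for illustration only: it uses equal mixing weights and omits the neighborhood function, so it is not the SODAEM update rule itself.

```python
import math


def beta_schedule(beta0=0.16, factor=1.6, n_phases=11):
    """Geometric schedule beta_i = beta0 * factor**i, i = 0..n_phases-1."""
    return [beta0 * factor ** i for i in range(n_phases)]


def tempered_posterior(log_likelihoods, beta):
    """Posterior proportional to exp(beta * log p_k), normalized over k.

    Assumption: a generic annealed posterior with equal mixing weights and
    no neighborhood term, used only to illustrate the effect of beta.
    """
    scaled = [beta * ll for ll in log_likelihoods]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


betas = beta_schedule()
print([round(b, 3) for b in betas])

# Hypothetical log-likelihoods of one sample under three reference models.
log_liks = [-1.0, -2.0, -4.0]
for beta in (betas[0], 1.0, betas[-1]):
    print(round(beta, 3), [round(p, 3) for p in tempered_posterior(log_liks, beta)])
```

At β = 0.16 the posterior is nearly flat, at β = 1 it is the ordinary mixture posterior, and at β ≈ 17.6 it is almost a hard (winner-take-all) assignment, which mirrors the near-equivalence with SOEM and SOCEM noted above.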
Discussion: The experiment results obtained by the three proposed algorithms and KohonenGaussian for the 20 random initializations are summarized in Table 4.2. Several conclusions can be drawn from the results. First, SOEM often converges to an ordered map even at a small, fixed σ value (σ = 0.15 in the experiments); but KohonenGaussian and SOCEM seldom do so. This may be because SOEM learns the map in a more global way, as noted in Section 4.3; hence, it is less sensitive to the initialization of the parameters when σ is small. The results for KohonenGaussian and SOCEM are similar. This may be because they only differ in the winner selection strategy. Second, the initialization issue of KohonenGaussian, SOCEM and SOEM can be overcome by using a larger σ value (0.6 in the experiments) initially, and then gradually reducing the value to the target σ value (0.15 in the experiments). The reduction of σ can be interpreted as an annealing process (cf. Section 4.2.1, Section 4.2.2, and Section 4.3.1). Third, the experiment results show that SODAEM overcomes the initialization issue of SOCEM and SOEM at a small σ value (0.15 in the experiments) using the annealing process, which is controlled by the temperature parameter β.

Table 4.2: Results of simulations using KohonenGaussian, SOCEM, SOEM, and SODAEM in 20 independent random initialization trials on the synthetic data. The algorithms were run with two setups for σ in $h_{kl}$. When σ = 0.15, KohonenGaussian succeeded in converging to an ordered map in one random initialization case (S:1), but failed in the remaining cases (F:19).

Setup for σ        σ = 0.15        σ = 0.6 initially, reduced to 0.15 in 0.15 decrements
KohonenGaussian    S:1,  F:19      S:20, F:0
SOCEM              S:1,  F:19      S:20, F:0
SOEM               S:15, F:5       S:20, F:0
SODAEM             S:20, F:0       -
4.6.1.2 Results on PenRecDigits C0
We also conducted experiments on real-world data using the setups for the neighborhood function described in Section 4.6.1.1. Table 4.3 summarizes the results obtained by the four PbSOM learning algorithms. From the results, we can draw the same conclusions as those made for the experiment results on the synthetic data. Figures 4.8, 4.9, and 4.10 demonstrate, respectively, the map-learning processes of SOCEM, SOEM, and SODAEM using one of the 20 random initializations. Comparing Figures 4.8, 4.9, and 4.10, we observe that these three algorithms obtain rather different results. SOCEM and SOEM usually obtain different maps because they learn the maps based on different clustering criteria (classification-likelihood vs. mixture-likelihood). SODAEM and SOEM (or SOCEM) usually obtain different results because SODAEM’s annealing is achieved by increasing the β value, while SOEM’s (or SOCEM’s) annealing is achieved by decreasing the σ value. Comparing Figures 4.9 (f) and 4.10 (f), we see that, although SODAEM becomes equivalent to SOEM when the value of β is increased to 1.04, their search paths on the objective function surface are different because they have rather different seed models (Figure 4.10 (e) vs. Figure 4.9 (e)). Therefore, they converge to different local maxima of the objective function and obtain different maps. Likewise, although SODAEM becomes equivalent to SOCEM when the value of β is increased to 17.592, they converge to different local maxima of the objective function and obtain different maps (Figure 4.10 (i) vs. Figure 4.8 (f)).

Table 4.3: Results of simulations using KohonenGaussian, SOCEM, SOEM, and SODAEM in 20 independent random initialization trials on PenRecDigits C0. The algorithms were run with two setups for σ in $h_{kl}$. When σ = 0.15, KohonenGaussian succeeded in converging to an ordered map in one random initialization case (S:1), but failed in the remaining cases (F:19).

Setup for σ        σ = 0.15        σ = 0.6 initially, reduced to 0.15 in 0.15 decrements
KohonenGaussian    S:1,  F:19      S:20, F:0
SOCEM              S:2,  F:18      S:20, F:0
SOEM               S:14, F:6       S:20, F:0
SODAEM             S:20, F:0       -