Chapter 4 Image Alignment Applications
4.2.2 Execution Phase
$$TotDist = \sum_{i=1}^{N} f(x_i, y_i, z_i), \qquad (4.11)$$
where $N$ is the number of points in the cube model. Thus, when the resolution of the cube model is sufficiently high, any arbitrary point cloud inside the cube can be sent into a trained TNFN to estimate the distance between the input point cloud and the reference model. As in the coarse alignment learning, RGLS-HCCA is also utilized to train the TNFN that models the reference surface.
4.2.2 Execution Phase
In the execution phase, the input point clouds are aligned with the reference model by means of MVFH extraction, TNFN-based coarse alignment, and TNFN-based fine alignment.
MVFH extraction has been discussed in Section 4.2.1 (Part (a)), whereas the TNFN-based coarse and fine alignments are described below.
(a) TNFN-based coarse alignment
Assuming the MVFH descriptor has been calculated, the descriptor is forwarded to the trained TNFN to obtain the rotation angles (φ, ϕ, θ) and translation parameters (x, y, z). Then, the six parameters are used to compute the rotation matrix R and translation vector T defined in Eq. (4.6). Based on R and T, we obtain the estimated pose that coarsely aligns the input point clouds with the reference model.
(b) TNFN-based fine alignment
The procedure of the TNFN-based fine alignment consists of the TNFN mapping and the downhill simplex optimization [96]. In the TNFN mapping, the TNFN maps each 3D point (x, y, z) of the point cloud into a 1D distance function f(x, y, z) (defined in Eq. (4.16)). The total distance function ∑ f(x, y, z) is computed by summing the distance mappings of all 3D points, and it is used as the cost function of the subsequent downhill simplex optimization. In the downhill simplex optimization, the rigid transformation between the input point clouds and the reference model is calculated iteratively to minimize the cost function. Each iteration uses the downhill simplex method to compute the rotation matrix R and translation vector T that perform the fine alignment. Once the downhill simplex optimization is completed, the final R and T are used to calculate the estimated pose that aligns the input point clouds with the reference surface.
The detailed steps of the downhill simplex optimization [97] for fine alignment of 3D surfaces are described as follows:
Step 0: Under the 3D rigid body transformation, we choose the six degrees of freedom (three rotation angles (φ, ϕ, θ) and three translation parameters (x, y, z)) as a vertex of the simplex. Then we randomly generate 6+1 initial vertices of the simplex within a fixed range, where 6 is the dimension of the vertex vector. In this study, the 7 initial vertices are denoted as $X_0, X_1, \ldots, X_6$.
Step 1: Two procedures are performed in this step.
(1) Evaluation: Based on each vertex of the simplex, we compute the corresponding rigid transformation matrix defined in Eq. (4.6). According to the transformation matrix, the input point clouds are transformed and the cost function $C(X_i)$ is evaluated.
(2) Sorting: We sort the $C(X_i)$ and set the order as follows:
$$C(X_0) < C(X_1) < \cdots < C(X_6). \qquad (4.12)$$
Step 2: In this step, the reflection point $X_{6R}$ is calculated. The downhill simplex optimization utilizes the reflection point as the first candidate point to replace the worst point $X_6$. The reflection point is calculated as follows:
(a) First, find the centroid of the remaining points ($X_0 \sim X_5$):
$$M = \frac{1}{6} \sum_{i=0}^{5} X_i. \qquad (4.13)$$
(b) Then seek the reflection point:
$$X_{6R} = M + \alpha (M - X_6), \qquad (4.14)$$
where α > 0 and the default value is α = 1.
(c) Finally, $C(X_{6R})$ can be calculated by means of the evaluation method described in Step 1.
Step 3: Three cases are discussed in this step.
Case 1: If $C(X_{6R}) \ge C(X_0)$ and $C(X_{6R}) < C(X_5)$, choose $X_{6R}$ to replace $X_6$. Then we re-sort the simplex and forward to Step 4.
Case 2: If $C(X_{6R}) < C(X_0)$, compute the expansion point $X_{6E}$ and use it as the candidate to replace $X_6$. After that, we re-sort the simplex and forward to Step 4.
Case 3: If $C(X_{6R}) \ge C(X_0)$ and $C(X_{6R}) \ge C(X_5)$, compute the contraction point $X_{6C}$ as follows:
$$X_{6C} = M + \beta (X_6 - M), \qquad (4.16)$$
where 0 < β < 1 and the default value is β = 0.5. If $C(X_{6R}) < C(X_6)$, then $X_6 = X_{6R}$. Otherwise, if $C(X_{6R}) \ge C(X_6)$, then $X_6$ is left unchanged. Subsequently, check the case of $C(X_{6C})$ as follows:
(i) If $C(X_{6C}) < C(X_6)$, choose $X_{6C}$ to replace $X_6$. Then, we re-sort the simplex and forward to Step 4.
(ii) If $C(X_{6C}) \ge C(X_6)$, shrink the whole simplex toward $X_0$. After shrinking, the new simplex is expressed as:
$$[X_0, (\eta X_0 + (1-\eta) X_1), \ldots, (\eta X_0 + (1-\eta) X_5), (\eta X_0 + (1-\eta) X_6)], \qquad (4.17)$$
where 0 < η < 1 and the default value is η = 0.5.
Step 4: If the least cost function meets one of the following conditions, the downhill simplex method is terminated and outputs the final results.
(a) The number of loops reaches a predefined maximal iteration value.
(b) The value of the cost function is less than a minimal threshold.
Otherwise, if the least cost function does not meet the above conditions, we return to Step 2 to continue the optimization procedure.
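Steps 0 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the rotation convention in `rigid_transform` (R = Rz Ry Rx) is assumed because Eq. (4.6) is not reproduced here, the expansion rule with `gamma=2.0` is the textbook Nelder-Mead form and is likewise an assumption, and `total_distance` is a toy stand-in for the TNFN-based cost that simply sums point-to-point distances to a known reference. All function and variable names are hypothetical.

```python
import math
import random

def rigid_transform(params, points):
    """Rotate by three angles (radians) and translate a list of (x, y, z) points.

    Assumption: R = Rz(c) Ry(b) Rx(a); Eq. (4.6) may use a different convention.
    """
    a, b, c, tx, ty, tz = params
    ca, sa, cb, sb = math.cos(a), math.sin(a), math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    rot = [[cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa],
           [sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa],
           [-sb, cb * sa, cb * ca]]
    shift = (tx, ty, tz)
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i]
                  for i in range(3)) for p in points]

def along(m, x, w):
    """Return m + w * (x - m), the point reached from m in the direction of x."""
    return [mi + w * (xi - mi) for mi, xi in zip(m, x)]

def downhill_simplex(cost, simplex, alpha=1.0, gamma=2.0, beta=0.5, eta=0.5,
                     max_iter=3000, tol=1e-9):
    """Minimize cost over a list of n + 1 vertices; returns (cost, vertex)."""
    verts = [(cost(v), list(v)) for v in simplex]
    for _ in range(max_iter):                        # Step 4(a): iteration limit
        verts.sort(key=lambda cv: cv[0])             # Step 1: sort by cost
        if verts[0][0] < tol:                        # Step 4(b): cost threshold
            break
        worst_cost, worst = verts[-1]
        rest = [v for _, v in verts[:-1]]
        m = [sum(col) / len(rest) for col in zip(*rest)]  # centroid, worst excluded
        xr = along(m, worst, -alpha)                 # Eq. (4.14): M + alpha (M - X6)
        cr = cost(xr)
        if verts[0][0] <= cr < verts[-2][0]:         # Case 1: accept reflection
            verts[-1] = (cr, xr)
        elif cr < verts[0][0]:                       # Case 2: assumed expansion rule
            xe = along(m, xr, gamma)
            ce = cost(xe)
            verts[-1] = (ce, xe) if ce < cr else (cr, xr)
        else:                                        # Case 3: contraction
            if cr < worst_cost:
                worst_cost, worst = cr, xr
                verts[-1] = (cr, xr)
            xc = along(m, worst, beta)               # Eq. (4.16)
            cost_c = cost(xc)
            if cost_c < worst_cost:
                verts[-1] = (cost_c, xc)
            else:                                    # Eq. (4.17): shrink toward X0
                x0 = verts[0][1]
                shrunk = [along(v, x0, eta) for _, v in verts[1:]]
                verts[1:] = [(cost(v), v) for v in shrunk]
    return min(verts, key=lambda cv: cv[0])

# Toy data: the "scan" is the reference moved by a known pose.
rng = random.Random(0)
reference = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(30)]
scan = rigid_transform((0.2, -0.1, 0.3, 0.1, -0.2, 0.05), reference)

def total_distance(params):
    """Toy stand-in for Eq. (4.11): sum of point-to-point distances."""
    moved = rigid_transform(params, scan)
    return sum(math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))
               for p, q in zip(moved, reference))

start = [[rng.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(7)]  # Step 0
initial_best = min(total_distance(v) for v in start)
best_cost, best_pose = downhill_simplex(total_distance, start)
```

Because only the worst vertex is ever replaced and the shrink keeps $X_0$, the best cost in the simplex never increases from one loop to the next, which is why the best vertex can be reported directly as the final result.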
To sum up, the downhill simplex method outputs the best vertex of the simplex as its final result. Then we decode it into the six degrees of freedom (φ, ϕ, θ, x, y, z).
Chapter 5
Experimental Results
In this chapter, the performance of RGLS-HCCA is demonstrated on three problems. The first is the prediction of the Mackey-Glass time series, a common benchmark for examining learning algorithms. Applying RGLS-HCCA to this benchmark shows how fast the algorithm converges and how much lower its estimation error is compared with other learning algorithms. Subsequently, two real-world problems, 2D and 3D image alignment, are used to verify the applications of RGLS-HCCA, taking the proposed method from simulation to a real system. The experiments evaluate the proposed method of aligning 2D and 3D images in comparison with other typical alignment systems.
This chapter is divided into three sections. In Section 5.1, the prediction of the Mackey-Glass time series is used to examine the learning performance of RGLS-HCCA. In Sections 5.2 and 5.3, RGLS-HCCA is applied to 2D and 3D image alignment problems, respectively.
All experiments in this chapter are performed on an Intel Core i7 860 CPU at 2.8 GHz with 3 GB of memory, using the Matlab 7.5 simulation software.
5.1 Prediction of Mackey-Glass Time Series
To verify the proposed RGLS-HCCA, the Mackey-Glass time series is utilized to compare RGLS-HCCA with other methods. The initial parameters of the proposed RGLS-HCCA are determined by parameter exploration methods ([98] and [99]). As shown in [98], a small population size is good for initial performance, a large population size is good for long-term performance, a low mutation rate is good for on-line performance, and a high mutation rate is good for off-line performance. Moreover, in [99], parameters for genetic algorithms are adjusted by exploring a predefined range in increments of a small value. For instance, the population size ranges from 10 to 100 in increments of 10. Thus, this study adjusts the parameters of RGLS-HCCA according to the criteria of these parameter exploration methods. The resulting parameters are listed in Table 5.1, where “none” in SLE indicates “not used” in the learning phase.
Moreover, since $A^T A$ (of size 50 × 50 for 10 fuzzy rules and four inputs in a TNFN) in Eq. (3.5) is singular (the rank of $A^T A$ is about 47) for the Mackey-Glass time series prediction example, this dissertation incorporates RGLS to make $(A^T A + \lambda I)$ non-singular. To set the RGLS parameter (λ), this dissertation adopts the cross-validation method [100]. The notion of the cross-validation method is to divide the training data set into training data and validation data and to increase λ in small increments to balance the error on the training data and on the validation data. The final adjusted λ for this example is listed in Table 5.1.
Table 5.1: Initial parameters of RGLS-HCCA before training.
Parameters         Value (PLE)         Value (SLE)
Psize              30                  20
Nc                 20                  none
Selection_Times    40                  none
NormalTimes        10                  none
ExploreTimes       15                  none
Crossover Rate     0.6                 0.6
Mutation Rate      0.2                 0.3
[Mmin, Mmax]       [6, 15]             [6, 15]
[mmin, mmax]       [-5, 5]             [-5, 5]
[σmin, σmax]       [3, 20]             [3, 20]
Minimum_Support    TransactionNum/2    none
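The role of λ in RGLS can be illustrated with a small sketch. It assumes the usual regularized least-squares form $w = (A^T A + \lambda I)^{-1} A^T y$, which may differ in detail from Eq. (3.5), and it uses a toy 3-column matrix with a duplicated column in place of the 50 × 50 case. The hold-out search over λ in increments of 0.01 is a simplified reading of the cross-validation procedure; the names `solve`, `rgls_fit`, and `mse` are hypothetical.

```python
import random

def solve(mat, rhs):
    """Solve mat x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def rgls_fit(a, y, lam):
    """Return w = (A^T A + lam I)^{-1} A^T y (assumed form of the RGLS solution)."""
    cols = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) + (lam if i == j else 0.0)
            for j in range(cols)] for i in range(cols)]
    aty = [sum(row[i] * t for row, t in zip(a, y)) for i in range(cols)]
    return solve(ata, aty)

def mse(a, y, w):
    """Mean squared error of the linear model w on rows a with targets y."""
    return sum((sum(ai * wi for ai, wi in zip(row, w)) - t) ** 2
               for row, t in zip(a, y)) / len(y)

rng = random.Random(1)

def make_rows(n):
    rows, targets = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append([x1, x2, x1])        # duplicated column makes A^T A singular
        targets.append(2.0 * x1 - x2 + rng.gauss(0.0, 0.05))
    return rows, targets

train_a, train_y = make_rows(40)
val_a, val_y = make_rows(20)

try:                                     # without regularization the solve fails
    rgls_fit(train_a, train_y, 0.0)
    singular_without_reg = False
except ValueError:
    singular_without_reg = True

# Increase lambda in small increments and keep the best validation error.
candidates = [0.01 * k for k in range(1, 51)]
fits = [(lam, rgls_fit(train_a, train_y, lam)) for lam in candidates]
best_lam, best_w = min(fits, key=lambda lw: mse(val_a, val_y, lw[1]))
```

With λ = 0 the elimination hits a zero pivot, while any λ > 0 shifts every eigenvalue of $A^T A$ upward by λ, so the system becomes solvable.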
researches [102] followed Lapedes and Farber’s work to be a benchmark to examine algorithms. Thus, we utilize such Mackey-Glass time series to perform an analysis on our proposed algorithm and other evolutionary algorithms.
The Mackey-Glass time series is generated from the following delay differential equation:
$$\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t). \qquad (5.1)$$
For this time series prediction problem, Jang [103] extracted 1000 input-output data pairs {x, y_d} from t = 118 to t = 1117, each consisting of four past values of x(t) and one future value, that is,
$$[x(t-18), x(t-12), x(t-6), x(t); x(t+6)], \qquad (5.2)$$
where τ = 17, x(0) = 1.2, and x(t) = 0 for t < 0. Four past values are chosen following Jang’s work [103], which allows comparison with the algorithms of other researchers (Lapedes and Farber [101], Moody [104], Crowder [102]). Thus, there are four inputs to RGLS-HCCA, corresponding to these values of x(t), and one output representing the value x(t+Δt), where Δt is the prediction time into the future. The first 500 pairs [from x(118) to x(617)] are the training data set, and the remaining 500 pairs [from x(618) to x(1117)] are the testing data set used for validating the proposed method. The values are floating-point numbers assigned using the RGLS-HCCA initially. The fitness function in this case is defined in Eqs. (3.26) and (3.27) to train the neural fuzzy network. The evolution learning runs for 500 generations and is repeated 50 times. For comparative analysis, the present study adopts the root mean square error (RMSE), which is defined as follows:
$$RMSE = \left( \frac{1}{N} \sum_{t} \big( x(t+6) - \hat{x}(t+6) \big)^2 \right)^{1/2},$$
where $N$ is the number of data pairs, $x(t+6)$ is the desired value, and $\hat{x}(t+6)$ is the value predicted by the model with four inputs and one output.
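The data preparation and the RMSE measure described above can be sketched as follows. The sketch integrates the Mackey-Glass equation with a simple forward Euler scheme and a step of 0.1, which is an assumption made for brevity (the original data may have been generated with a different integrator), so the values will not match published series exactly. The helper names `mackey_glass`, `make_pairs`, and `rmse` are hypothetical, and the naive predictor x(t+6) ≈ x(t) is included only to show how the RMSE would be applied.

```python
import math

def mackey_glass(t_max, tau=17, x0=1.2, dt=0.1):
    """Return [x(0), x(1), ..., x(t_max)] using forward Euler with step dt."""
    per_unit = round(1 / dt)            # integration steps per time unit
    delay = tau * per_unit              # delay expressed in integration steps
    fine = [x0]
    for k in range(t_max * per_unit):
        lagged = fine[k - delay] if k >= delay else 0.0   # x(t) = 0 for t < 0
        dx = 0.2 * lagged / (1.0 + lagged ** 10) - 0.1 * fine[k]
        fine.append(fine[k] + dt * dx)
    return fine[::per_unit]             # keep the samples at integer times

def make_pairs(x, t_start=118, t_end=1117):
    """Build ([x(t-18), x(t-12), x(t-6), x(t)], x(t+6)) for each t in the range."""
    return [([x[t - 18], x[t - 12], x[t - 6], x[t]], x[t + 6])
            for t in range(t_start, t_end + 1)]

def rmse(desired, predicted):
    """Root mean square error between two equally long sequences."""
    total = sum((d - p) ** 2 for d, p in zip(desired, predicted))
    return math.sqrt(total / len(desired))

series = mackey_glass(1123)             # t + 6 reaches 1123 for t = 1117
pairs = make_pairs(series)
train, test = pairs[:500], pairs[500:]  # x(118)..x(617) and x(618)..x(1117)

# Naive baseline: predict x(t+6) by the most recent input x(t).
naive_rmse = rmse([y for _, y in test], [inputs[-1] for inputs, _ in test])
```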
In this example, the performance of RGLS-HCCA is compared with that of HESP [23], ESP [14], and SANE [13]. For these models, the learning parameters, which are determined according to the parameter exploration methods [98] and [99], are shown in Table 5.2. To perform training, the evolution learning runs for 500 generations. Figures 5.1(a)-(d) show the prediction results of the four models. The symbol “o” represents the desired output of the time series, and the symbol “*” represents the output of the four models. Figures 5.2(a)-(d) illustrate the errors between the desired outputs and the four models’ outputs. As shown in Figs. 5.1 and 5.2, the performance of RGLS-HCCA is better than that of the others. Fig. 5.3 shows the learning curves of the four models. As shown in this figure, the proposed RGLS-HCCA model converges faster than the other three models.
Figure 5-2: Prediction errors of the (a) proposed RGLS-HCCA, (b) HESP, (c) ESP, and (d) SANE.
Figure 5-3: Learning curves of the proposed RGLS-HCCA, HESP, ESP, and SANE.
In addition to HESP, ESP, and SANE, to further show the effectiveness and efficiency of the proposed RGLS-HCCA model, we also apply MGCSE [15] and the traditional genetic algorithm (TGA) [16] to the same problem. To compare with these algorithms, according to the parameter exploration methods [98] and [99], 14, 13, 12, 14, and 12 fuzzy rules are set for HESP, MGCSE, ESP, SANE, and TGA, respectively. In addition, the population size is explored from 10 to 250 in increments of 10, the crossover rate from 0.1 to 1 in increments of 0.1, and the mutation rate from 0 to 0.4 in increments of 0.01. The parameters used for HESP, MGCSE, SANE, and TGA are listed in Table 5.2. As with RGLS-HCCA, the evolution learning of each method runs for 500 generations and is repeated 50 times. Table 5.3 lists the generalization capabilities of the proposed RGLS-HCCA, HESP, MGCSE, ESP, SANE, and TGA. As shown in Table 5.3, RGLS-HCCA obtains a lower RMSE than the other methods. Regarding TGA, according to [13], cooperative coevolutionary algorithms can find solutions faster and solve harder problems than TGA. Thus, RGLS-HCCA and the other methods (HESP, MGCSE, ESP, and SANE) exhibit lower RMSE than TGA. SANE adopts symbiotic evolution. Since symbiotic evolution uses only one population to evaluate every partial solution, the evaluation tends to make the partial solutions too similar. Instead, the proposed RGLS-HCCA provides several groups to evaluate each partial solution, so the proposed model has a better chance of obtaining the optimal solution. This explains why the proposed method performs better than SANE. As for the group-based evolutionary algorithms (HESP, MGCSE, and ESP), when they face complex problems, the dimension of the chromosomes is still high, so a low convergence rate occurs. Thus, this dissertation incorporates RGLS to reduce the dimension of the chromosomes and proposes HCCA to self-adjust the parameters and structure of the TNFN.
Table 5.2: Initial parameters of four learning models.
Table 5.3: Performance comparison of various existing models.
Method       RMSE (Best)   RMSE (Mean)   RMSE (Worst)   RMSE (STD)
RGLS-HCCA    0.0017        0.0023        0.0026         0.0005
HESP         0.0118        0.0149        0.0193         0.0017
MGCSE        0.0100        0.0158        0.0190         0.0019
ESP          0.0110        0.0172        0.0219         0.0026
SANE         0.0145        0.0219        0.0313         0.0039
TGA          0.0192        0.0271        0.0747         0.0079
Furthermore, this example also compares the running time of RGLS-HCCA with that of the other methods. The running time here is the time taken until the fitness of the algorithm exceeds the predefined value (0.85). The results of the algorithms over 50 runs are reported in Table 5.4. As shown in this table, the proposed RGLS-HCCA is faster than HESP, MGCSE, ESP, SANE, and TGA.
Table 5.4: Comparison of the running time of various algorithms.
Method       Best (seconds)   Worst (seconds)   Mean (seconds)
RGLS-HCCA    6.07             43.02             23.28
5.2 Results of 2D Image Alignment
In the 2D image alignment experiment, visual inspection images of 640 × 480 pixels are used to examine the utility of the proposed CNFN-based image alignment method. Figure 5.4 depicts an example of such images, where the left side is a reference image and the right side is an image transformed by scaling, rotation, and translation. In this figure, the dashed window represents the template window (of size 200×200, within which the feature vectors are extracted), and the cross sign denotes the reference location of the template.
Figure 5-4: (a) Reference image. (b) Testing image with scale=0.9, rotation=-10°, vertical translation=5, horizontal translation=10.
In the following 2D image alignment experiments, two kinds of neural networks are examined. The first is a one-stage CNFN (OS-CNFN), which is applied to a medium range of affine parameters and used to examine different learning methods. The second is a multi-stage CNFN (MS-CNFN), which applies the trained networks to adapt to a large range of affine parameters.
5.2.1 Alignment Results of One-stage Neural Fuzzy Network
In Table 5.5, four types of experimental images are prepared for simulation. The first three
Table 5.5: Experimental images preparation.
Image Type            Image Preparation
Synthesized Images    800 images are generated with randomly selected affine parameters within the predefined range.
Training Images       50% of the synthesized images.
Testing Images        The other 50% of the synthesized images.
Real Images           Images are acquired from a CCD camera with poses different from the reference image.
Table 5.6: Range of affine transformation parameters used in experiments.
Affine transformation parameter The range of affine transformation parameter
Scale [0.7 1.3]
rotation(degrees) [-30 30]
vertical translation(pixels) [-20 20]
horizontal translation(pixels) [-20 20]
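The preparation in Tables 5.5 and 5.6 amounts to drawing random affine parameters within the listed ranges and splitting the 800 resulting samples in half. A minimal sketch follows; it only samples the parameters and builds the corresponding 2 × 3 matrix, leaving the actual image warping aside. Uniform sampling, rotation and scaling about the image center (320, 240), and all names used here are assumptions made for illustration.

```python
import math
import random

# Ranges from Table 5.6.
RANGES = {"scale": (0.7, 1.3), "rotation_deg": (-30.0, 30.0),
          "dy": (-20.0, 20.0), "dx": (-20.0, 20.0)}

def sample_pose(rng):
    """Draw one set of affine parameters uniformly within the ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

def affine_matrix(pose, center=(320.0, 240.0)):
    """Return a 2x3 matrix: scale and rotate about center, then translate."""
    s = pose["scale"]
    theta = math.radians(pose["rotation_deg"])
    a, b = s * math.cos(theta), -s * math.sin(theta)
    cx, cy = center
    tx = cx - (a * cx + b * cy) + pose["dx"]     # keeps the center fixed, then shifts
    ty = cy - (-b * cx + a * cy) + pose["dy"]
    return [[a, b, tx], [-b, a, ty]]

def apply_affine(matrix, point):
    """Map an (x, y) point through the 2x3 affine matrix."""
    x, y = point
    return tuple(row[0] * x + row[1] * y + row[2] for row in matrix)

rng = random.Random(42)
poses = [sample_pose(rng) for _ in range(800)]       # 800 synthesized samples
rng.shuffle(poses)
train_poses, test_poses = poses[:400], poses[400:]   # 50% / 50% split

example = affine_matrix(train_poses[0])
moved_center = apply_affine(example, (320.0, 240.0))
```

With this construction the image center moves by exactly (dx, dy), so the translation parameters keep a direct geometric meaning regardless of the sampled scale and rotation.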
The following parts will discuss the comparison with existing learning methods and with existing image alignment systems.
Part 1: Comparison with existing learning methods
Three typical evolutionary learning methods, HESP [23], ESP [14], and SANE [13], are carefully implemented (the learning parameters are found using the methods given in [98] and [99]) for comparison with the proposed RGLS-HCCA. Moreover, the number of fuzzy rules for HESP, ESP, and SANE is tuned over the range 20-100 in increments of 5. The results show that 85, 80, and 80 rules are suitable for SANE, ESP, and HESP, respectively.
In this experiment, 800 synthesized images are generated randomly as described in Table 5.5, with 50% of the images used as the training set and the other 50% as the testing set. Then 33-element feature vectors are obtained by applying Gabor-WGOH with PCA dimensionality reduction to the generated images. Before training, the initial parameters of RGLS-HCCA are set as given in Table 5.7. They are tuned by the parameter exploration method described in Section 5.1, with the RGLS parameter (λ) adjusted by the cross-validation method.
Regarding SRM in RGLS-HCCA, Figure 5.5 shows the best results of the probability vectors for 15 runs with different training and testing images. As shown in Fig. 5.5, the highest probability indicates the most suitable number of fuzzy rules of the TNFN model in the best run.
Therefore, the suitable number of fuzzy rules is 24. This means that in most cases a 24-rule TNFN has a higher probability of obtaining better performance than other rule counts within [Mmin, Mmax] = [18, 25].
Figure 5.6 depicts the learning curves of the four models. In this figure, RGLS-HCCA converges faster than HESP, ESP, and SANE. Moreover, to examine the learning accuracy, the testing data are sent into the trained TNFNs to get the estimated pose, including scale, rotation, vertical translation, and horizontal translation.
Then, by comparison with the desired pose, four alignment errors (i.e., ErrScale, ErrAngle, ErrDx, and ErrDy) are generated. Table 5.8 presents the learning accuracy of the four evolutionary models. In this table, the proposed RGLS-HCCA exhibits the lowest errors among the four models. Hence, the proposed model not only improves the learning speed but also sustains high learning accuracy.
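The four alignment errors can be computed along the following lines. The sketch assumes that each error is the mean absolute difference between the desired and the estimated parameter over the testing set, and that ErrDx and ErrDy refer to the horizontal and vertical translations; both points are assumptions, as are the function name and the dictionary keys.

```python
def mean_alignment_errors(desired, estimated):
    """Mean absolute error per affine parameter over paired pose dictionaries."""
    keys = ("scale", "rotation_deg", "dx", "dy")
    names = ("ErrScale", "ErrAngle", "ErrDx", "ErrDy")
    n = len(desired)
    return {name: sum(abs(d[k] - e[k]) for d, e in zip(desired, estimated)) / n
            for k, name in zip(keys, names)}

# Two made-up poses, only to show the call.
desired = [{"scale": 1.0, "rotation_deg": 10.0, "dx": 10.0, "dy": 5.0},
           {"scale": 0.9, "rotation_deg": -10.0, "dx": -4.0, "dy": 0.0}]
estimated = [{"scale": 1.02, "rotation_deg": 9.5, "dx": 10.5, "dy": 5.5},
             {"scale": 0.88, "rotation_deg": -9.0, "dx": -4.5, "dy": -0.5}]
errors = mean_alignment_errors(desired, estimated)
```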
Table 5.7: Initial parameters before training.
Parameters         Value (PLE)    Value (SLE)
Psize              40             20
Nc                 20             none
Selection_Times    50             none
NormalTimes        10             none
ExploreTimes       15             none
Crossover Rate     0.6            0.7
Mutation Rate      0.2            0.4
[Mmin, Mmax]       [18, 25]       [18, 25]
Figure 5-5: Best results of the probability vectors for 15 runs in SRM.
Figure 5-6: Learning curves of the RGLS-HCCA, HESP, ESP, and SANE methods.
Table 5.8: Learning accuracy of the RGLS-HCCA, HESP, ESP, and SANE methods.
             Mean Errors
Method       ErrScale   ErrAngle (degrees)   ErrDx (pixels)   ErrDy (pixels)
RGLS-HCCA    0.0066     0.3252               0.4953           0.5058
HESP         0.0223     1.4431               1.1309           1.1600
ESP          0.0229     2.0470               1.1051           1.6137
SANE         0.0247     2.0311               1.4620           1.8132
Part 2: Comparison with existing image alignment systems
To evaluate OS-CNFN (i.e., the proposed system) in comparison with other existing systems ([42], [44], [87], [88], and [91]), these existing systems are carefully implemented following their original papers. The comparison in this section covers the alignment accuracy, alignment speed, robustness, and a real image alignment case, which are discussed in the following parts.
A. Alignment accuracy
To compare the alignment accuracy of different systems, the training images, which are used to train neural networks, and the testing images, which are used to check the alignment accuracy, are generated by the way described in Table 5.5.
Figure 5.7 depicts an alignment example for a testing image on six different systems. The cross sign in this figure denotes the estimated results. From this figure, OS-CNFN can estimate more accurate position and orientation of the cross sign than other systems.
In addition, 15 runs using different training and testing images are performed to further examine the alignment accuracy of the proposed system. The simulation results are shown in Table 5.9, which presents the average and standard deviation of the errors of the six image alignment systems. In this table, OS-CNFN exhibits the lowest alignment error among the systems. Moreover, the simulated data indicate that the alignment reaches a high accuracy level; thus, OS-CNFN can provide a useful way to align images very accurately.
Table 5.9: Alignment errors in different image alignment systems.
Figure 5-7: Alignment results for different systems: (a) Ground Truth, (b) OS-CNFN, (c) DCT, (d) FFT, (e) KICA, (f) ISOMAP, and (g) SIFT.
B. Alignment speed
To demonstrate the alignment speed, the execution time required to perform one image alignment task is discussed. In this dissertation, one image alignment task consists of capturing the template window from the input image, computing the features within the window, and feeding the calculated features into the trained network to get the affine parameters.
In this experiment, we utilize 400 testing images to perform image alignment tasks. The average execution times of OS-CNFN, DCT, FFT, KICA, ISOMAP, and SIFT are about 30 ms, 26 ms, 28 ms, 65 ms, 330 ms, and 57 ms, respectively. This result shows that OS-CNFN is almost as fast as the FFT and DCT systems and is more efficient than the other three systems.
C. Alignment Robustness
Next, the robustness of OS-CNFN under different levels of random additive Gaussian noise is discussed. In this experiment, 400 testing images are randomly generated with the addition of various strengths of Gaussian noise to examine the robust performance of different image alignment systems. Figure 5.8 illustrates an example of aligning a testing image with the reference image under 10 dB signal-to-noise ratio (SNR) condition. As shown in this figure, OS-CNFN estimates the rotation and translation of the cross sign more accurately than other methods.
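Adding Gaussian noise at a prescribed SNR, such as the 10 dB condition mentioned above, can be sketched as follows. The sketch assumes the definition SNR = 10 log10(signal power / noise power) with the signal power taken as the mean squared pixel intensity; other definitions (for example, based on the image variance) would give different noise levels. Clipping to the valid intensity range is omitted, the image is represented as a flat list of intensities, and the function names are hypothetical.

```python
import math
import random

def add_gaussian_noise(pixels, snr_db, rng):
    """Add zero-mean Gaussian noise so that the expected SNR equals snr_db."""
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)               # noise standard deviation
    return [p + rng.gauss(0.0, sigma) for p in pixels]

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(p * p for p in clean) / len(clean)
    noise_power = sum((n - p) ** 2 for p, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(7)
clean = [rng.uniform(0.0, 255.0) for _ in range(10000)]   # stand-in for an image
noisy = add_gaussian_noise(clean, 10.0, rng)
snr = measured_snr_db(clean, noisy)
```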
The simulation results of the absolute estimating errors of affine parameters under eight
Figure 5-8: Alignment results for different systems under 10 dB SNR condition: (a) Ground Truth, (b) OS-CNFN, (c) DCT, (d) FFT, (e) KICA, (f) ISOMAP, and (g) SIFT.
Figure 5-9: Average affine transformation errors comparison using OS-CNFN, DCT, FFT, KICA, ISOMAP, and SIFT under various SNR. Error with respect to (a) scale, (b) rotation, (c) translation on X-axis, and (d) translation on Y-axis.
D. Real Image Alignment Case
In this part, real images are utilized to verify the effectiveness of the proposed system.
Figure 5.10 (a)-(d) presents the results of aligning the same real image using OS-CNFN, DCT, FFT, KICA, ISOMAP, and SIFT respectively