
Chapter 3 Regularized Least Squares Based Hierarchical Cooperative

3.2 Structure Level Evolution

In this subsection, the structure level evolution (SLE) is discussed. The main process of SLE involves six operations: receive good networks, reproduction, variable antecedent-part crossover, variable antecedent-part mutation, evaluation, and insert good neurons. The details of these operations are described as follows:

a. Receive good networks: Before the structure evolution starts, N well-performing networks are received from the parameter level evolution to serve as chromosomes. The coding structure of chromosomes in the structure level evolution is shown in Fig. 3.7. In this figure, each block of a chromosome describes the antecedent part of a fuzzy rule of the form in Eq. (2.5), where $m_{ij}$ and $\sigma_{ij}$ represent the mean and deviation, respectively, of the Gaussian membership function for the ith dimension and jth rule node. The consequent part of a fuzzy rule is not encoded into the chromosomes because regularized least squares is used to estimate it. After that, the chromosomes are sorted in preparation for reproduction.

Figure 3-7: Coding of the antecedent part of fuzzy rules into a chromosome in the structure level evolution.

c. Variable antecedent-part crossover: Variable antecedent-part crossover (VAC) is proposed to perform crossover. In VAC, two parents are selected using the roulette-wheel selection method [74]. Because the selected parents may have different lengths, misalignment of individuals must be avoided in the crossover operation, and VAC is designed to address this problem. The term "antecedent part" means that the crossover operation is performed only on the antecedents of the fuzzy rules. Two-point crossover [76] is adopted, so new individuals are generated by exchanging the values between the selected sites of the two parents. To avoid misalignment, the crossover points are selected so that they do not exceed the length of the shorter of the two parent chromosomes. Two individuals with different lengths undergoing the VAC operation are shown in Fig. 3.8, where ARj represents the parameters of the antecedent part of the jth rule in the TNFN, and Rk indicates that there are k fuzzy rules in a TNFN. After the VAC is performed, the new offspring can replace the individuals with poor performance.

Figure 3-8: Variable antecedent-part crossover operation in the structure level evolution.

d. Variable antecedent-part mutation: The mutation operator randomly alters the allele of a gene, providing new information to the population at a site of an individual. In the structure level evolution, the variable antecedent-part mutation (VAM) is adopted to perform the mutation operation. The benefit of VAM is that it can be applied to chromosomes of different lengths. The VAM operation on an individual is shown in Fig. 3.9, where AR indicates the antecedent part of a fuzzy rule. In VAM, uniform mutation [78] is adopted, and the mutated gene is drawn randomly from the domain of the corresponding variable.

Figure 3-9: Variable antecedent-part mutation operation in the structure level evolution.

e. Evaluation: The evaluation step computes the fitness of each chromosome in the population that has not already been evaluated. A higher fitness value indicates better performance. Since each chromosome includes only the antecedent part of the fuzzy rules, the consequent part is not defined. Thus, as in the fitness assignment in PLE, the RGLS method is used to estimate the consequent part of the fuzzy rules. After the antecedent and consequent parts are determined, the TNFN is constructed, and every TNFN is then evaluated to obtain a fitness value. In this paper, the fitness value is designed according to Eqs. (3.26) and (3.27).

f. Insert good neurons: After the evaluation operation, if a network has a higher fitness value than the best network in the parameter level, its neurons are inserted into the corresponding groups of subpopulations in the parameter level evolution.
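The crossover and mutation operators in items c and d can be illustrated with a minimal sketch. It assumes that a chromosome is a list of rule blocks AR_j, that each block is a list of (mean, deviation) pairs with one pair per input dimension, and that crossover points fall on block boundaries. The function names, the `rate` parameter, and the `bounds` layout are hypothetical and are not taken from the original method.

```python
import random


def variable_antecedent_crossover(parent_a, parent_b, rng=random):
    """Two-point crossover on the rule blocks of two variable-length parents.

    Assumes each parent contains at least one rule block.
    """
    shortest = min(len(parent_a), len(parent_b))
    # Both crossover points stay inside the shorter parent, so the exchanged
    # segment exists in both chromosomes and no misalignment can occur.
    p1, p2 = sorted(rng.sample(range(shortest + 1), 2))
    child_a = parent_a[:p1] + parent_b[p1:p2] + parent_a[p2:]
    child_b = parent_b[:p1] + parent_a[p1:p2] + parent_b[p2:]
    return child_a, child_b


def variable_antecedent_mutation(chromosome, bounds, rate=0.1, rng=random):
    """Uniform mutation applied gene by gene to a chromosome of any length.

    bounds holds one ((mean_lo, mean_hi), (dev_lo, dev_hi)) entry per input
    dimension; a mutated gene is redrawn uniformly from its own domain.
    """
    mutated = []
    for block in chromosome:
        new_block = []
        for (mean, dev), ((m_lo, m_hi), (d_lo, d_hi)) in zip(block, bounds):
            if rng.random() < rate:
                mean = rng.uniform(m_lo, m_hi)
            if rng.random() < rate:
                dev = rng.uniform(d_lo, d_hi)
            new_block.append((mean, dev))
        mutated.append(new_block)
    return mutated
```

Each child keeps the length of the parent it is derived from, so the number of fuzzy rules in a network is unchanged by either operator in this sketch.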

Thus, the whole learning process of SLE is summarized in Figure 3.10.

Figure 3-10: Whole learning process of SLE.

In short, the purpose of SLE is to preserve the good combinations of fuzzy rules produced by PLE and to evolve the structure of the resulting neural fuzzy networks. Thus, SLE serves to fine-tune the results evolved by PLE. PLE is therefore the major evolution process for evolving TNFNs, and it affects the effectiveness of the proposed RGLS-HCCA model.

Chapter 4

Image Alignment Applications

To demonstrate the applicability of RGLS-HCCA to real-world problems, two image alignment tasks are considered: 2D image alignment and 3D image alignment.

2D image alignment is of great importance in numerous industrial applications, including automatic visual inspection, electronic component assembly automation, circuit board inspection, and robotic machine vision. Among them, automatic visual inspection [80-82] is one of the most important fields that require an accurate geometric transformation to align images. Neural network-based methods have been widely applied to this problem; such methods typically extract global features from images and feed them into a trained neural network to estimate the geometric transformation parameters. In this dissertation, RGLS-HCCA is used to develop a neural fuzzy network-based image alignment system and to demonstrate its high performance.

3D image alignment is a critical step in object recognition [83], surface reconstruction [84], and image-guided surgery [85]. The two major concerns for the alignment task are execution time and alignment accuracy. Recently, neural network-based methods have become very popular due to their high efficiency. Thus, a TNFN-based coarse-to-fine 3D surface alignment scheme is proposed in this dissertation.

In this chapter, two subsections are used to introduce the proposed alignment systems.

training images are created by applying affine transformations with randomly selected parameters to the reference image, and the Gabor-weighted gradient orientation histograms (Gabor-WGOH) descriptor is then used to represent these training images as feature vectors.

Finally, the feature vectors and desired targets are employed to train a CNFN using RGLS-HCCA. During the executing phase, the sensed image is sent to the Gabor-WGOH descriptor to extract a feature vector, which is then fed into the RGLS-HCCA-trained CNFN to estimate the affine transformation parameters. The estimated parameters are then used to align the sensed image with the reference image. The following subsections introduce the process of the proposed 2D image alignment scheme.

Figure 4-1: Flow chart of the proposed image alignment algorithm.

4.1.1 Off-line Procedure

The objective of the off-line procedure is to train the CNFN. The four main parts of the procedure are creating synthesized training images, generating the Gabor-WGOH descriptor, yielding self-organized training data, and training the CNFN. These parts are described as follows.

(a) Creating Synthesized Training Images

The synthesized training images are generated by applying various combinations of translation, rotation, and scaling transformations within a predefined range. The transformation model is an affine transformation, which can be described by the following matrix

The WGOH descriptor has been compared with several global descriptors [40-42, 86-88] using a nearest-neighbor search of the feature vector proposed by [89] and [90]. WGOH, which was inspired by the Scale Invariant Feature Transform (SIFT) descriptor [91] and presented by Bradley et al. as a high-speed descriptor [92], was thus shown to be a good descriptor [86, 90]. The main idea of WGOH is to calculate the orientation histograms within a region and to weight each histogram by the gradient magnitude at each pixel and a 2D Gaussian function [86]. Representing an image with the WGOH descriptor therefore takes four steps:

1 For each image, we capture the template window, located at the center of the image, as the region for extracting features. We divide the length and width of the window into 4 equal parts to form a 4×4 grid. Each grid cell is considered a sub-image, so the template window is split into 4×4 sub-images.

2 On each pixel of the sub-image $I(x, y)$, the gradient magnitude $m(x, y)$ and orientation

4 Concatenate 8-bin histograms of 16 sub-images into a 128-element feature vector, and normalize it to a unit length. To reduce strong gradient magnitudes, the elements of the feature vector are limited to 0.2, and this vector is normalized again.
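The descriptor steps above can be sketched in plain Python. Several details are assumptions rather than the published definition: gradients are taken as central pixel differences, orientations span 360 degrees split into 8 bins, and the 2D Gaussian weight is centered on the window with a hypothetical width `sigma` equal to half the window size. The Gabor filtering stage is not included.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def wgoh_descriptor(window, grid=4, bins=8, clip=0.2):
    """Build a grid*grid*bins feature vector from a 2D list of gray values."""
    rows, cols = len(window), len(window[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    sigma = 0.5 * max(rows, cols)  # assumed Gaussian width
    cell_h, cell_w = rows // grid, cols // grid
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = window[y][x + 1] - window[y][x - 1]
            gy = window[y + 1][x] - window[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            orientation = math.atan2(gy, gx) % (2 * math.pi)
            b = min(int(orientation / (2 * math.pi) * bins), bins - 1)
            # Weight each vote by gradient magnitude and a 2D Gaussian.
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            cell = min(y // cell_h, grid - 1) * grid + min(x // cell_w, grid - 1)
            hist[cell * bins + b] += magnitude * weight
    # Normalize, limit strong elements to the clip value, and normalize again.
    clipped = [min(v, clip) for v in _normalize(hist)]
    return _normalize(clipped)
```

With the default 4×4 grid and 8 bins, the returned vector has 128 elements, matching the dimension stated in step 4.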

Consequently, each image can be represented by a 128-element feature vector. Fig. 4.2 illustrates an example of the WGOH computation steps. However, using pixel differences to compute the gradient is sensitive to noise. To avoid such sensitivity, Moreno et al. combined a Gabor filter with the WGOH descriptor to suppress noise [93]. For this reason, we adopt the Gabor-WGOH descriptor for representing an image.

Because the dimension of the 128-element feature vector is still too high for training a TSK-type neuro-fuzzy network, a dimensionality reduction method is required. We therefore employ principal component analysis (PCA) to reduce the 128-element feature vector to a 33-element one, so that each image is represented by a 33-element feature vector.

Figure 4-2: Steps for creating a WGOH feature vector.

(c) Yielding self-organized training data

After describing the Gabor-WGOH descriptor, this paper proposes a self-organized training data-creating method to provide an appropriate training data set for training neural fuzzy networks. The major advantages of the proposed method are that it prevents the generation of redundant data and supplies a self-organized training data set with which a neural fuzzy network can be trained efficiently. The steps for yielding the self-organized training data are as follows:

Step 1: First, generate a small training data set $\{S_{train}\}$.

Step 2: Then, utilize the training data set to train a neural fuzzy network.

Step 3: Input a fixed number of testing data set $\{S_{test}\}$ into the neural network to create the

Step 4: If ErAcc < $t_{er}$, then increment LoopNum = LoopNum + 1. Otherwise, set LoopNum = 0. The symbol $t_{er}$ indicates the threshold of the error accumulator, and LoopNum is the accumulated number of loops.

Step 5: If LoopNum > the loop threshold $t_{loop}$, terminate the training and output the training set $\{S_{train}\}$. Otherwise, go to Step 2 to run recursive training.
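The loop formed by Steps 1 to 5 can be sketched as follows. Because Step 3 is only partly legible here, the sketch makes two assumptions: ErAcc is taken to be the sum of the per-sample errors on the test batch, and a test sample is inserted into the training set when its error exceeds a hypothetical `insert_threshold`. The callables `train`, `sample_error`, and `draw_test_batch`, and the `max_rounds` guard, are placeholders for illustration only.

```python
def build_self_organized_training_set(initial_set, draw_test_batch, train,
                                      sample_error, t_er, t_loop,
                                      insert_threshold, max_rounds=1000):
    """Grow a training set until the accumulated test error stays small."""
    train_set = list(initial_set)                      # Step 1
    loop_num = 0
    for _ in range(max_rounds):
        model = train(train_set)                       # Step 2
        batch = draw_test_batch()                      # Step 3
        errors = [sample_error(model, s) for s in batch]
        er_acc = sum(errors)                           # assumed definition of ErAcc
        # Insert only the samples the network handles poorly (assumed rule).
        train_set.extend(s for s, e in zip(batch, errors) if e > insert_threshold)
        loop_num = loop_num + 1 if er_acc < t_er else 0   # Step 4
        if loop_num > t_loop:                          # Step 5
            break
    return train_set
```

Resetting `loop_num` to zero whenever the accumulated error is too large means the loop ends only after several consecutive well-handled test batches, which is how Steps 4 and 5 read together.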

In Step 3, the inserted testing data are the data on which the neural fuzzy network does not perform well. Therefore, inserting such data can enhance the learning ability of the neural network and prevent the selection of redundant training data. Moreover, from Step 5,

cooperate in adapting to a large range of affine transformation. The aim of this operation is to address the problem that applying a large range of affine transformations to a traditional one-stage neural network requires a large amount of training data, which makes such a network difficult to train. The cooperative networks can be seen as aligning the captured image with the reference image in a coarse-to-fine manner.

Figure 4.3 presents the process of the cooperative neural fuzzy network. As shown in this figure, each stage deals with a certain range of affine parameters, and the stages cooperate to cover a large range of affine parameters. Given an input image with an unknown pose, the cooperative neural fuzzy network gradually reduces the pose difference between the input image and the reference image. Thus the final pose with respect to the reference image can be written as the following equation:

$$P_{final} = P_1 + P_2 + \cdots + P_N, \qquad (4.5)$$

where $P_1$, $P_2$, and $P_N$ indicate the estimated poses from the 1st, 2nd, and Nth stages of the neural network.

Figure 4-3: Process of cooperative neural fuzzy networks.
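The stage-by-stage accumulation in Eq. (4.5) can be sketched as a loop over stage estimators. The callables `extract_features` and `warp`, and the representation of a pose as a tuple of numbers, are assumptions made for illustration; each entry of `stages` stands in for one trained neural fuzzy network.

```python
def cooperative_pose(image, stages, extract_features, warp):
    """Accumulate stage poses as in Eq. (4.5) while refining the image."""
    total = None
    current = image
    for estimate in stages:
        pose = estimate(extract_features(current))
        # P_final = P_1 + P_2 + ... + P_N, summed element by element.
        total = pose if total is None else tuple(a + b for a, b in zip(total, pose))
        # Apply the estimate so the next stage sees a smaller pose difference.
        current = warp(current, pose)
    return total, current
```

Because each stage only has to cover the residual left by the previous one, later stages can be trained on a narrower parameter range.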

RGLS-HCCA is used to train the CNFN on the provided training data. In the CNFN, once the dynamic image alignment range of each stage has been determined, each network can be trained independently. Thus, the training process of each stage is similar, and the only difference lies in the training parameters. RGLS-HCCA is therefore used to train each stage of the CNFN to estimate the pose with respect to the input image.

4.1.2 On-line Procedure

In the on-line phase, the sensed image (input image) is sent to the Gabor-WGOH descriptor to extract a feature vector, which is then fed into the RGLS-HCCA-trained CNFN to estimate the transformation parameters used for aligning images: the scaling factor $s$, the rotation angle $\theta$, and the translation $(\Delta x, \Delta y)$. More specifically, the proposed CNFN performs N stages of neural fuzzy networks (as shown in Fig. 4.3) to gradually align the sensed image with the reference image. Thus, the image alignment error is reduced stage by stage until the best aligning pose with respect to the reference image is obtained.
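As an illustration of how the estimated parameters might be applied, the sketch below builds a homogeneous matrix from $s$, $\theta$, and $(\Delta x, \Delta y)$ and maps a pixel coordinate through it. The scale-rotation-translation form of the matrix, the use of radians, and both function names are assumptions. Whether the estimated pose maps the sensed image to the reference image or the reverse is also not fixed here.

```python
import math


def similarity_matrix(s, theta, dx, dy):
    """3x3 homogeneous matrix for scale s, rotation theta (radians), shift (dx, dy)."""
    c, sn = math.cos(theta), math.sin(theta)
    return [[s * c, -s * sn, dx],
            [s * sn, s * c, dy],
            [0.0, 0.0, 1.0]]


def apply_to_point(matrix, x, y):
    """Map pixel coordinates (x, y) through the homogeneous transform."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

For example, `apply_to_point(similarity_matrix(2.0, math.pi / 2, 3.0, 4.0), 1.0, 0.0)` scales the point by 2, rotates it by 90 degrees, and then shifts it by (3, 4).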

4.2 3D Image Alignment System

According to Chapter 2, each pixel in a 3D image can be considered a 3D point with respect to the laser scanner. Thus, a 3D image is viewed as a collection of 3D point clouds, and these point clouds can represent an arbitrary 3D surface. On this basis, aligning two 3D images amounts to aligning two 3D surfaces, and other studies also refer to 3D image alignment as 3D surface alignment (or registration) [45]. In this dissertation, the objective of 3D image alignment is to align a captured 3D image (i.e., 3D surface) of an object in an arbitrary view with the 3D surface of the reference model.

Figure 4.4 presents the flow diagram of the proposed 3D image alignment system. In the learning phase, two data flows are performed for training TNFNs to adapt two levels of image

Figure 4-4: Flow diagram of the proposed 3D image alignment system.

4.2.1 Learning Phase

The objective of the learning procedure is to train two TNFNs for coarse-to-fine 3D image alignment. The two major parts of the procedure are the coarse alignment learning and the TNFN-based surface modeling, which are described below.

(a) Coarse alignment learning

The goal of coarse alignment is to determine an approximate rigid transformation that coarsely aligns the reference model with the input point clouds. The coarse alignment must be fast and must provide a good initial transformation for the fine alignment task. Thus, a TNFN is utilized to learn rigid transformations within the predefined range. Once the training of the TNFN is completed, inputting an arbitrary view of point clouds yields the estimated pose with respect to the reference model. Therefore, the executing phase of the TNFN is simple and efficient.

The procedure of the proposed coarse alignment learning involves generating synthesized training point cloud data, yielding the modified viewpoint feature histogram (MVFH), and training the TNFN. These operations are introduced as follows:

(i) Generating synthesized training point cloud data

Figure 4.5 depicts the point cloud data of the reference model. The reference model is an integrated model constructed by collecting multiple views of point cloud data. To generate the synthesized training point cloud data, various combinations of translation and rotation transformations within a predefined range are applied to the reference model. The transformation can be considered a rigid transformation, which can be written as follows:

$$m = Rs + T, \qquad (4.6)$$

where $R$ is a rotation matrix, $T$ is a translation vector, $s$ is the original set of point cloud data, and $m$ is the transformed set of point cloud data. Furthermore, to simulate the real case in a 3D scene, point cloud data that cannot be seen from the viewpoint direction are eliminated. Figure 4.6 presents an example of the simulated training data. As shown in this figure, the point cloud data cover only part of the reference model, and the unseen points have been eliminated. After the training point data have been generated, the next operation is to extract the features of the point cloud data.


Figure 4-6: Example of the simulated training data: (a) Front view and (b) Top view.
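A minimal sketch of generating one synthesized training view with a rotation matrix R and a translation vector T is given below. The Z-Y-X Euler angle convention, the function names, and the sampling ranges are assumptions. The removal of points that are hidden from the viewpoint is not shown.

```python
import math


def rotation_matrix(rx, ry, rz):
    """R = Rz(rz) * Ry(ry) * Rx(rx), angles in radians (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rigid_transform(points, rotation, translation):
    """Apply m = R s + T to every 3D point s in the cloud."""
    return [
        tuple(sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
              for i in range(3))
        for p in points
    ]


def random_pose(rng, max_angle, max_shift):
    """Draw six pose parameters independently within symmetric bounds."""
    angles = [rng.uniform(-max_angle, max_angle) for _ in range(3)]
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return angles, shift
```

A training sample would then pair the transformed cloud with the six sampled parameters as its desired pose.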

(ii) Modified Viewpoint Feature Histogram

Modified Viewpoint Feature Histogram (MVFH) is a modification of the viewpoint feature histogram (VFH), a computationally efficient 3D feature presented by Rusu et al. [94]. To introduce VFH first: this descriptor is computed by accumulating a histogram of the angles between the central viewpoint direction and each normal of the point cloud. Figure 4.7 illustrates the idea of VFH.

Figure 4-7: Creation of viewpoint feature histogram.

Suppose the central point is $V_c$ and the viewpoint is $V_p$. Then the central viewpoint direction is $\overrightarrow{V_c V_p}$. Thus the angle $\theta$ between the central viewpoint direction ($\overrightarrow{V_c V_p}$) and each normal $n_i$ of point cloud $V_i$ can be computed by the following equation:

Thereafter, the N-bin orientation histogram (each bin covers 180/N degrees) is calculated by accumulating the angles described in Eq. (4.7). The value in each bin is normalized by dividing by the total number of points, so the histogram indicates the percentage of points falling in each bin. However, in 3D surface alignment tasks, this viewpoint direction angle alone might not be appropriate for representing the 3D surface, because VFH can yield similar features at very different view angles, especially for symmetrical objects viewed with a 180-degree view angle difference. Figure 4.8 illustrates an example of similar VFHs at very different view angles. As shown in this figure, the object is seen from two very different viewpoints, but the two views have similar viewpoint feature histograms.
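The histogram construction just described can be sketched as follows. The sketch assumes that the central point is the centroid of the points, that the central viewpoint direction runs from the central point to the viewpoint, and that the angle is obtained from the arccosine of the normalized dot product, which is the standard angle between two vectors. The function name and argument layout are hypothetical.

```python
import math


def viewpoint_feature_histogram(points, normals, viewpoint, n_bins):
    """Normalized N-bin histogram of angles between view direction and normals."""
    count = len(points)
    centroid = [sum(p[k] for p in points) / count for k in range(3)]  # assumed V_c
    direction = [viewpoint[k] - centroid[k] for k in range(3)]
    d_norm = math.sqrt(sum(d * d for d in direction))
    hist = [0.0] * n_bins
    for n in normals:
        n_norm = math.sqrt(sum(c * c for c in n))
        cosine = sum(d * c for d, c in zip(direction, n)) / (d_norm * n_norm)
        # Clamp before acos to guard against rounding slightly outside [-1, 1].
        theta = math.degrees(math.acos(max(-1.0, min(1.0, cosine))))
        b = min(int(theta / (180.0 / n_bins)), n_bins - 1)
        hist[b] += 1.0 / count  # each bin holds the fraction of points
    return hist
```

Because each point contributes 1/count to exactly one bin, the bins sum to one and can be read directly as percentages.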

that the 3D feature is utilized to identify the view angle; if the 3D feature is view independent, the captured feature would be similar in each view, making it impossible to differentiate the exact view angles of an object. For this reason, we modify the original viewpoint feature histogram by calculating another viewpoint-direction-related angle. We name the resulting descriptor the modified viewpoint feature histogram (MVFH). Figure 4.9 presents a diagram of the two viewpoint-direction-related angles, where $\theta$ is the original angle used by VFH, $\varphi$ is the newly added angle used by MVFH, the central point is $V_c$, the viewpoint is $V_p$, and $V_i$ is a certain 3D point.

Figure 4-9: Diagram describing the two viewpoint-direction-related angles $\theta$ and $\varphi$.

The newly added angle $\varphi$ can be computed by the following equation:

Then the N-bin orientation histogram (each bin covers 180/N degrees) is computed by accumulating the angle $\varphi$, and the MVFH is completed by dividing the value in each bin by the total number of points to normalize the histogram. To demonstrate the improvement offered by the MVFH, we reuse the previous example in Fig. 4.8, which has similar VFHs at very different views, and recompute it with MVFH. Figure 4.10 depicts the computed MVFHs. As shown in this figure, the first and second histograms have different shapes. This example shows that MVFH corrects the problem of very different views yielding similar VFHs.

Figure 4-10: Example of modified viewpoint feature histograms at very different views.

(iii) TNFN Training

After extracting the MVFH from a 3D object, the MVFH is taken as the input neurons of the TNFN and the desired pose as its output neurons. The desired pose comprises six degrees of freedom: three rotation angles ($\varphi$, $\phi$, $\theta$) and three translation parameters ($x$, $y$, $z$).

Thus, the use of TNFN is to model the relationship between the MVFH and the desired pose.

Once an MVFH is received from a captured view of point clouds, the TNFN would

applying the transformation defined in Eq. (4.6). To reduce the correlations between training point clouds, the six parameters are selected randomly and independently within the predefined boundaries. After the training set has been generated, the MVFH method is used to represent the training point clouds as input features of a TNFN. Subsequently, the proposed RGLS-HCCA is adopted to train the TNFN, and the training procedure stops when the stopping condition is satisfied. Although the training phase is lengthy, the executing phase of the proposed coarse alignment method merely consists of computing the MVFH descriptor and feeding it into the TNFN to estimate the corresponding pose.

(b) TNFN-based surface modeling

The purpose of the TNFN-based surface modeling is to provide an evaluation method for performing the fine alignment of the 3D surface. The evaluation measures how close the input point clouds are to the reference surface. Thus, the major part of the TNFN-based surface modeling is to use a TNFN to model the 3D surface by mapping the 3D Euclidean input space (an input 3D point (x, y, z)) into a 1D Euclidean output space (the shortest distance to the reference surface). Such a mapping can be considered a cost function that
