
Chapter 4 Intelligent Object Tracking Controller Design

4.1 Object Detection

Recently, image processing has been widely applied in many research areas; by transforming images into other forms, called feature images, it provides useful information for problems in the military, medical science, industry, and so on.

For simplicity, the object to be traced in this thesis is a computer-generated image containing a white ball moving on a pure black background, projected on a screen 240 cm away from the Eye-robot. Figure 4.2 shows an image retrieved by the left-eye camera mounted on the Eye-robot. Clearly, in addition to the white ball and the black background on the screen, the image also includes undesirable objects outside the screen. To properly detect the white ball moving on the screen, this section introduces some basic concepts and methods of image processing.

The pixels of the image retrieved by the Eye-robot are represented by points (R, G, B) in the color cube shown in Figure 4.1, where the three axes correspond to red, green and blue. Let a pixel be located at (x, y) with color (R, G, B); then its gray level g(x, y) can be expressed by the conversion formula (4.1-1).
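As an illustrative sketch only (the exact weights used in (4.1-1) may differ), a common gray level conversion averages the three color components:

\[
g(x, y) \;=\; \frac{R(x, y) + G(x, y) + B(x, y)}{3},
\]

where g(x, y) denotes the gray level of the pixel at (x, y); a weighted sum such as 0.299R + 0.587G + 0.114B is another common choice.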

The quantities H and W denote the image resolution; for example, the image used in this thesis is of size 320×240, and thus H = 320 and W = 240. Since the RGB space contains 256 levels, numbered from 0 to 255, along each axis of the color cube, there are 256^3 different colors in total and 256 different gray levels.

To detect the object on the screen, the RGB image from the left-eye camera in Figure 4.2 is first transformed by (4.1-1) into the gray level image shown in Figure 4.3, whose gray level intensity distribution is given in Figure 4.4. Clearly, the gray level of the object, i.e., the white ball, is much higher than that of the other pixels, and therefore a threshold can be easily selected for the object. Taking the threshold as 180 results in the binary image shown in Figure 4.5, where the pixel located at (x, y) is set to gray level 255 if its gray level exceeds the threshold and to 0 otherwise.
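Written out, this thresholding rule (reconstructed here from the description and the annotated gray levels in Figure 4.6, so the exact notation is an assumption) is

\[
b(x, y) \;=\;
\begin{cases}
255, & g(x, y) > 180,\\
0, & \text{otherwise},
\end{cases}
\]

where b(x, y) denotes the gray level of the binary image.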

Figure 4.1 RGB color space and the color cube.


Figure 4.6 shows the gray level distribution of the resulting binary image.

Based on this binary image, the location of the object in the image can be detected; this location is then used by the tracking controller obtained via offline training of a neural network, which will be introduced in the next section.
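As a minimal sketch of this detection step (an illustrative implementation only; the function name and the channel-averaging gray level conversion are assumptions, not taken from the thesis), the object center can be estimated as the centroid of the thresholded pixels:

```python
import numpy as np

def detect_object_center(rgb_image, threshold=180):
    """Locate the bright object in an RGB image (H x W x 3, uint8).

    Returns the (x, y) pixel coordinates of the object centroid,
    or None if no pixel exceeds the threshold.
    """
    # Convert to gray level (simple channel average; the thesis's (4.1-1)
    # may use different weights).
    gray = rgb_image.astype(np.float32).mean(axis=2)

    # Binarize: object pixels become 255, background pixels 0.
    binary = np.where(gray > threshold, 255, 0)

    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return None
    # The centroid of the white region gives the object center.
    return float(xs.mean()), float(ys.mean())
```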


Figure 4.2 Image in RGB space (object center at (x, y) = (159,117) with (R, G, B) = (252,255,255); background point at (x, y) = (140,55) with (R, G, B) = (88,110,116))

Figure 4.3 Image transformed to gray level (object center at (x, y) = (159,117) with gray level 254; background point at (x, y) = (140,55) with gray level 105)


Figure 4.4 Intensity of gray level distribution

Figure 4.5 Binary image of taking the threshold as 180



4.2 Tracking Controller Design

This section describes the design of the tracking controller, a neural-network-based controller obtained via offline training. The input of the neural network is the object center p and the output is the tracking index V, which is related to the voltage needed to drive the Eye-robot. Figure 4.7 shows the feedback configuration of the tracking control, where the neural network controller receives the error e between the current horizontal position xc and the desired position xd of the target and correspondingly provides a tracking index V to drive the Eye-robot. In this section, the design of the neural network controller will emphasize the training patterns used to drive the Eye-robot to trace the object.

Figure 4.6 Gray level image distribution (object center at (x, y) = (159,117), gray level 255; background point at (x, y) = (140,55), gray level 0)

Figure 4.7 Feedback configuration of the tracking control

Figure 4.8 Eye training patterns

Figure 4.9 Neck training patterns

Now, let's focus on the offline training of the neural network controller. There are two kinds of training patterns, V1 and V2, used in this thesis, shown in Figure 4.8 and Figure 4.9, respectively. The pattern V1 specifies the eye tracking index as a function of the normalized object position p1, while V2 specifies the neck tracking index as a function of p2. The training patterns are designed to give the Eye-robot the ability to find the object in searching mode; once the object is detected, the eyes and the neck work together to trace the object.

Based on the training patterns, there are three kinds of training modes for the Eye-robot. The first mode adopts three neural networks, including NNneck and two NNeye, for the neck, right eye and left eye, respectively. The neural network NNneck is trained by V2, while the neural networks NNeye are trained by V1. In the second and third modes, described below, some of the neural networks are combined so that the neck and the eyes operate together, while the left eye and right eye play the role of concentrating on the object. Based on the training patterns, when the object positions p1 and p2 lie between 0.49 and 0.51, the center area, both V1 and V2 vanish. That means the Eye-robot has already detected the object and will focus on it; in such a situation the Eye-robot keeps still and does not need to make any motion. If the object positions p1 and p2 lie outside the center area, then, according to the patterns V1 and V2, the positions of the two eyes are kept unchanged and the neck tracks the object faster the farther the object is from the center area. In addition, if the object position satisfies p2 < 0.1 or p2 > 0.9, the neck tracks the object at the highest speed.
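For illustration, one piecewise form consistent with this description of the neck pattern V2 (a dead zone around the center, a speed that grows with distance from the center, and saturation near the image borders) is sketched below; it is an assumed example, not the exact pattern of Figure 4.9:

\[
V_2(p_2) \;=\;
\begin{cases}
0, & 0.49 \le p_2 \le 0.51,\\
1, & p_2 < 0.1 \ \text{or} \ p_2 > 0.9,\\
\dfrac{|p_2 - 0.5| - 0.01}{0.39}, & \text{otherwise}.
\end{cases}
\]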

There are four neural networks used in this thesis, namely NNneck, NNeye, NNeye1-neck, and NNeye2-neck. Their structures are all three-layered, containing an input layer, a hidden layer and an output layer, as depicted in Figure 4.10 to Figure 4.13. The structure of NNneck adopts one input signal and one output signal; using 19 hidden neurons, NNneck is trained by the pattern (p2, V2), as shown in Figure 4.10. The neural network of an individual eye, NNeye, adopts one input signal, one output signal and 81 hidden neurons, and is trained by the pattern (p1, V1), as shown in Figure 4.11. The combined neural network NNeye1-neck uses one input signal, two output signals and 131 hidden neurons, and is trained by the pattern (p1, V1) for the left eye and the pattern (p2, V2) for the neck, as shown in Figure 4.12. The integrated neural network NNeye2-neck uses two input signals, three output signals and 151 hidden neurons, and is trained by the pattern (p1, V1) for the two eyes and the pattern (p2, V2) for the neck, as shown in Figure 4.13.
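In every case the network computes a static map from the normalized object position to the tracking indices. Assuming a sigmoid hidden layer (the activation function is not stated here, so this is only a sketch), the forward computation of such a three-layer network can be written as

\[
V \;=\; W_2\,\sigma\!\left(W_1 p + b_1\right) + b_2,
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}},
\]

where W1, b1 and W2, b2 are the weights and biases of the hidden and output layers, and the dimensions of p, V and the hidden layer match the sizes listed above (for example, 1, 1 and 19 for NNneck).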

With these four neural networks, there are three types of tracking control, named C3NN, C2NN and C1NN. The first type, C3NN, is composed of three neural networks, NNneck and two NNeye, for the neck, left eye and right eye. The second type, C2NN, contains two neural networks, NNeye and NNeye1-neck, where NNeye is used for the right eye and NNeye1-neck operates the left eye and the neck together. As for the third type, C1NN, it employs only one neural network, NNeye2-neck, which operates the neck and the two eyes simultaneously. Comparing these three types of tracking control, the third one appears to be simpler than the other two.

Figure 4.10 Structure of neck neural network (input pneck, output Vneck)

Figure 4.11 Structure of eye neural network (input peye, output Veye)

Figure 4.12 Structure of left eye combined with neck neural network (input pleft-neck, outputs Vneck and Vleft-eye)

Based on the back-propagation algorithm in Chapter 2, the neural networks designed for the tracking control are trained off-line according to the patterns (p1, V1) and (p2, V2). With the well-trained neural networks, the Eye-robot can keep the object around the visual center.
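A compact sketch of this off-line training for the neck network (1 input, 19 hidden neurons, 1 output) is given below. The learning rate 0.01 and error tolerance 10^-3 follow Tables 5.1 to 5.4, while the sigmoid activation, weight initialization and stopping criterion are assumptions made only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_three_layer(p, V, hidden=19, lr=0.01, tol=1e-3, max_epochs=200000):
    """Off-line back-propagation training on a pattern (p, V).

    p, V : 1-D arrays of normalized positions and tracking indices.
    Returns the trained weights (W1, b1, W2, b2) and the epoch count.
    """
    rng = np.random.default_rng(0)
    P = p.reshape(-1, 1)                      # N x 1 inputs
    T = V.reshape(-1, 1)                      # N x 1 targets
    W1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)

    for epoch in range(1, max_epochs + 1):
        H = sigmoid(P @ W1 + b1)              # hidden layer output
        Y = H @ W2 + b2                       # network output (linear)
        E = Y - T
        if float(np.mean(E ** 2)) < tol:      # stop once the tolerance is met
            return (W1, b1, W2, b2), epoch

        # Back-propagate the error and update the weights.
        dY = 2.0 * E / len(P)
        dW2 = H.T @ dY
        db2 = dY.sum(axis=0)
        dH = (dY @ W2.T) * H * (1.0 - H)
        dW1 = P.T @ dH
        db1 = dH.sum(axis=0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1
    return (W1, b1, W2, b2), max_epochs
```

Calling train_three_layer on samples of the pattern (p2, V2) would yield the kind of epoch counts reported in Table 5.1, up to the differences caused by the assumed details.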

After the offline training of the neural networks, the training results are applied to the Eye-robot tracking. The kinematics of the Eye-robot allows pan and tilt motion, so it can trace any object moving left-and-right or up-and-down. The motion of the Eye-robot, with five degrees of freedom, is designed to keep the object at the center of the image retrieved by the Eye-robot. During the object tracking, each motor of the Eye-robot receives an appropriate tracking index, depending on the object location, from the off-line trained neural network. However, the tracking indices are normalized between 0 and 1 and are therefore not directly usable for driving the Eye-robot. Hence, the desired voltage input of each motor is determined by

Figure 4.13 Structure of left eye combined with neck and right eye neural network (inputs pleft-neck and pright, outputs Vneck, Vleft-eye and Vright-eye)

vinput = V · vmax    (4.2-3)

which multiplies the tracking index V by the maximum voltage vmax. Obviously, the larger the voltage input vinput, the faster the tracking speed of the Eye-robot.

In this thesis, the object tracking is fulfilled by the neck and the two cameras simultaneously. When the object is found, all the motors drive the Eye-robot together to track the object and then keep it at the visual center. Figure 4.14 shows the block diagram of the Eye-robot visual tracking control design architecture.
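A minimal sketch of one cycle of this architecture is shown below, assuming the detect_object_center helper sketched in Section 4.1 and a trained network object with a predict method; these names, the use of the 320-pixel image width for normalization, and the direct scaling of the neck index are illustrative assumptions rather than the thesis's actual software interface:

```python
V_MAX_NECK = 1500.0      # maximal neck velocity used in Chapter 5 (rpm/min)

def tracking_step(rgb_image, nn, image_width=320):
    """One cycle: capture -> detect -> neural network -> velocity command."""
    center = detect_object_center(rgb_image, threshold=180)
    if center is None:
        return None                  # object not found: stay in searching mode
    x, _ = center
    p = x / image_width              # normalized horizontal object position
    V = nn.predict(p)                # tracking index in [0, 1], cf. Figure 4.7
    return V * V_MAX_NECK            # velocity command for the neck, cf. (4.2-3)
```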


Figure 4.14 Block diagram of the Eye-robot visual tracking control design architecture (blocks: Initialize system, Image Capture, RGB2Gray Level, Object Detection with threshold > 180, Calculate Object Position p, Neural Network Controller, Velocity Command V, Eye-robot)

Chapter 5

Experimental Results

This chapter shows the experimental results, including the neural network off-line training and the intelligent tracking control, to demonstrate the success of the multiaxial control designed for the Eye-robot.

5.1 Neural Network Off-line Training

This section focuses on the off-line training of the four neural networks, NNneck, NNeye, NNeye1-neck, and NNeye2-neck, used in the tracking controllers C3NN, C2NN and C1NN introduced in Chapter 4. Different types of tracking control require different neural networks. All the neural networks are designed to have three layers: the input layer, the hidden layer and the output layer. The number of neurons of the input layer is chosen to be the same as the number of input data, and likewise the number of neurons of the output layer corresponds to the output data. However, the number of neurons needed for the hidden layer should be determined by experiments, i.e., by the neural network off-line training in this thesis.

First, let's find the suitable number of hidden neurons for NNneck and NNeye, which will be applied to the controller C3NN. The off-line training of NNneck is executed in different cases, named NNneck-k, where k is the number of neurons of its hidden layer and is chosen from 10 to 30. Based on the off-line training, it can be found that the learning time decreases sharply when k changes from 18 to 19, as shown in Table 5.1. Obviously, NNneck-19 is the best structure compared with the others.

Thus, NNneck-19 will be used in the controller C3NN to control the neck. Similarly, Table 5.2 shows the results of the off-line training of NNeye. Evidently, NNeye-81 is the best structure, with minimal learning time and epochs. Hence, two copies of NNeye-81 will be used in the controller C3NN to control the right eye and the left eye.
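The selection procedure behind Tables 5.1 to 5.4 can be sketched as a simple sweep over candidate hidden-layer sizes. Here train_network is a hypothetical helper standing in for the back-propagation training of Chapter 2, and the timing logic is only illustrative:

```python
import time

def sweep_hidden_sizes(train_network, candidates, patterns,
                       learning_rate=0.01, tolerance=1e-3):
    """Train one network per candidate hidden size and pick the fastest."""
    results = []
    for k in candidates:
        start = time.time()
        epochs = train_network(hidden=k, patterns=patterns,
                               learning_rate=learning_rate,
                               tolerance=tolerance)
        results.append((k, time.time() - start, epochs))
    # Select the hidden size with the shortest learning time.
    return min(results, key=lambda r: r[1])
```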

Table 5.1 Neck neural network off-line training parameters

Experiments NNneck-10 NNneck-18 NNneck-19 NNneck-20 NNneck-30

Table 5.2 Eye neural network off-line training parameters

Experiments NNeye-60 NNeye-80 NNeye-81 NNeye-82 NNeye-100

Next, let's find the suitable number of hidden neurons of NNeye1-neck, which is applied to the controller C2NN. The off-line training of NNeye1-neck is also executed in different cases, named NNeye1-neck-k, where k is the number of neurons of its hidden layer and is chosen from 110 to 150. Based on the off-line training, it can be found that the learning time decreases sharply when k changes from 130 to 131 and then increases sharply when k changes from 131 to 132, as shown in Table 5.3.

Obviously, NNeye1-neck-131 is the best structure compared with the others. Thus, NNeye1-neck-131 will be used in the controller C2NN to control both the left eye and the neck simultaneously. Besides, NNeye-81 is also included in C2NN to control the right eye.


Table 5.3 Left eye combined with neck neural network off-line training parameters

Experiments      NNeye1-neck-110  NNeye1-neck-130  NNeye1-neck-131  NNeye1-neck-132  NNeye1-neck-150
Learning Time    4752             4188             2408             4726             5001
Epochs           112662           80353            42112            85029            88619
Hidden neurons   110              130              131              132              150
Tolerance        10^-3            10^-3            10^-3            10^-3            10^-3
Learning rate    0.01             0.01             0.01             0.01             0.01
Input neurons    1                1                1                1                1
Output neurons   2                2                2                2                2

Finally, let's find the suitable number of hidden neurons of NNeye2-neck, which is applied to the controller C1NN. Similarly, NNeye2-neck-k denotes the case of the off-line training of NNeye2-neck with k neurons in the hidden layer, where k is chosen from 130 to 170. From Table 5.4, NNeye2-neck-151 is the best structure compared with the others. Interestingly, an abrupt change in learning time again occurs around k = 151, just as for the other three neural networks mentioned previously. According to Table 5.4, NNeye2-neck-151 will be used in the controller C1NN to control the neck and both eyes at the same time.

Table 5.4 Left eye combined with neck and right eye neural network off-line training parameters

Experiments      NNeye2-neck-130  NNeye2-neck-150  NNeye2-neck-151  NNeye2-neck-152  NNeye2-neck-170
Learning Time    6802             4036             3512             5547             6748
Epochs           169082           89044            77113            118173           137133
Hidden neurons   130              150              151              152              170
Tolerance        10^-3            10^-3            10^-3            10^-3            10^-3
Learning rate    0.01             0.01             0.01             0.01             0.01
Input neurons    2                2                2                2                2
Output neurons   3                3                3                3                3


5.2 Set Point Control

This section employs the neural networks well-trained by the back-propagation learning algorithm to drive the Eye-robot such that the object is traced, i.e., kept at the image center. In the experiments, an image of a circle with radius 5 cm projected on a screen is used as the object to be traced, and the distance between the screen and the Eye-robot is 280 cm. During the object tracking, the maximal velocity of the neck is set to vmax = 1500 rpm/min. For an individual eye not working together with the neck, the maximal velocity is set to 70 rpm/min. However, when an eye works together with the neck, its maximal velocity need not be 70 rpm/min and is lowered to 30 rpm/min.

Figure 5.1 to Figure 5.3 show the results of the C3NN set-point control of a fixed object whose image is initially located to the left of the visual center. The controller C3NN contains three neural networks, NNneck and two NNeye. Figure 5.1 shows the tracking index produced by NNneck and the position error of the object with respect to the visual center. Obviously, with this tracking index the motor driving the neck has successfully reduced the position error. From Figure 5.2 and Figure 5.3, it is clear that the motors employed to steer the two cameras have also achieved the control goal of locating the object around the visual center with an error near zero.


Figure 5.2 Set-point control results of NNeye for the right eye

Figure 5.1 Set-point control results of NNeye for the left eye


Figure 5.4 and Figure 5.5 show the results of the C2NN set-point control under the same conditions. Two neural networks, NNeye1-neck and NNeye, implement the controller C2NN. Figure 5.4 shows the tracking indices produced by NNeye1-neck for the left eye and the neck; with these tracking indices the position error has been reduced to zero as expected. In Figure 5.5, the motor employed to steer the right eye has also successfully located the object around the visual center with an error near zero.

Figure 5.3 Set-point control results of NNneck for the neck


Figure 5.5 Set-point control results of NNeye for the right eye

Figure 5.4 Set-point control results of NNeye1-neck for the left eye combined with the neck


Finally, Figure 5.6 shows the results of the C1NN set-point control, which uses only one neural network, NNeye2-neck, to control the neck and the two cameras simultaneously. In this figure, both position errors related to the two eyes have been reduced to zero by the three tracking indices generated by NNeye2-neck.

Once the object is detected, the two eyes and the neck work together to trace it. Wherever the object is located, the Eye-robot detects the object, brings it to the image center within 2 seconds, and then maintains it at the center. These experimental results demonstrate the effectiveness of the proposed scheme.

To demonstrate that the controllers C3NN, C2NN and C1NN are also suitable for the set-point control of a fixed object initially located to the right of the visual center, Figure 5.7 to Figure 5.9 show the experimental results of the image position errors related to the two eyes. In these figures, it is clear that the controllers C3NN, C2NN and C1NN indeed successfully drive the object image to the visual center.

Figure 5.6 Set-point control results of NNeye2-neck for the left eye combined with the neck and right eye


Figure 5.7 Set-point control error results of C3NN

Figure 5.8 Set-point control error results of C2NN


5.3 Horizontal Object Tracking Control

This section further focuses on the tracking control of horizontal trajectories. Similar to the set-point control, the maximal velocities are set to vmax = 2000 rpm/min for the neck, 70 rpm/min for an individual eye, and 30 rpm/min for an eye working together with the neck. The horizontal trajectory is sinusoidal and expressed as

x(t) = A·cos(2πt/T)    (5.3-1)

where A is the magnitude and T is the period. In the experiments, A = 70 and T = 6, 9, 12, 15 seconds. For the case of T = 6 seconds, the experimental results of the image position errors are shown in Figure 5.10 to Figure 5.12. From these results, it is clear that all the controllers C3NN, C2NN and C1NN are indeed able to control the Eye-robot to trace the moving object with image position errors less than 0.09, i.e., within 30 pixels.
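For reference, the experimental trajectory for T = 6 seconds can be generated as follows; the sampling rate and the two-period duration are assumptions made only for illustration:

```python
import numpy as np

A, T = 70.0, 6.0
t = np.arange(0.0, 2 * T, 0.05)       # two periods, sampled at 20 Hz
x = A * np.cos(2 * np.pi * t / T)     # horizontal object position, cf. (5.3-1)
```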

Such image position errors are acceptable since the radius of the object is 6 pixels, with tracking errors around 0.02, and the result is a quite smooth tracking motion.

Figure 5.9 Set-point control error results of C1NN

Figure 5.10 Tracking control error results of C3NN for T = 6 seconds


Figure 5.11 Tracking control error results of C2NN for T = 6 seconds

Figure 5.12 Tracking control error results of C1NN for T = 6 seconds


Intuitively, a slower moving object will lead to a more precise tracking motion.

To demonstrate such tracking behavior, the controllers C3NN, C2NN and C1NN are also applied to the cases of T = 9, 12, 15 seconds. Figure 5.13 to Figure 5.15 show the experimental results for T = 9 seconds, Figure 5.16 to Figure 5.18 for T = 12 seconds, and Figure 5.19 to Figure 5.21 for T = 15 seconds. As expected, the image position errors are reduced to 20 pixels for T = 9 seconds and to 15 pixels for T = 12 seconds. However, limited by the physical features of the cameras, the image position errors are no longer improved by further decreasing the speed of the moving object.
