
Chapter 1 Introduction

1.3 Thesis Organization

This thesis deals with object tracking, and the rest of the thesis is organized as follows. The intelligent learning algorithm of back-propagation is introduced in Chapter 2. Chapter 3 presents the problem statement for object tracking and describes the system hardware and software.

Chapter 4 deals with object detection and the intelligent object tracking controller design implemented for object tracking, and provides a detailed description of Eye-robot tracking. Experimental results and discussion are provided in Chapter 5. Finally, conclusions and future work are given in Chapter 6.


Chapter 2

Intelligent Learning Algorithm

This chapter discusses a popular learning method capable of handling large learning problems: the back-propagation learning algorithm. This algorithm can be efficiently implemented in computing systems in which only local information is transported through the network. The detailed explanation of the intelligent learning algorithm is given in the following sections. Section 2.1 introduces the artificial neural network (ANN) and the artificial neuron that mimics the human nervous system. Section 2.2 describes a neural network learning algorithm, namely the back-propagation network learning algorithm.

2.1 Introduction to ANN

The human nervous system consists of a large number of neurons, shown in Figure 2.1, including somas, axons, dendrites and synapses. Each neuron is capable of receiving, processing, and passing electrochemical signals from one to another. In order to mimic the characteristics of the human nervous system, investigators have developed an intelligent algorithm, called the artificial neural network (ANN), to construct intelligent machines capable of parallel computation. This thesis applies the ANN to learn the object tracking of the Eye-robot.


Figure 2.1 Schematic view of a neuron

Figure 2.2 Basic element of artificial neural network (inputs x1, …, xn with weights w1, …, wn, bias b, and activation f(·) producing output y)


As shown in Figure 2.2, the basic element of ANN has N inputs and each input is multiplied by a corresponding weight, analogous to synaptic strengths. The weighted inputs and the bias are summed to determine the activation level of the neuron, whose input-output relation is expressed as

y = f\left( \sum_{i=1}^{N} w_i x_i + b \right)

In this thesis, the sigmoid function f(v) = 1/(1 + e^{-v}) in (2.1-5) is used as the activation function since it closely resembles the input-output relation of biological neurons. The most common multilayer feed-forward network, constructed from this basic element, is shown in Figure 2.3; it contains an input layer, an output layer, and one or more hidden layers. A multi-hidden-layer network can solve more complicated problems than a single-hidden-layer network; however, its training process is more difficult. The numbers of hidden layers and their neurons are decided by the complexity of the problem to be solved.
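As a concrete illustration of the basic element and the sigmoid activation just described, the following MATLAB sketch (MATLAB being the implementation language of this thesis; the numeric values are made up for illustration) evaluates the neuron of Figure 2.2:

% Basic element of the ANN (Figure 2.2): weighted sum of the inputs
% plus the bias, passed through the sigmoid activation. Illustrative values.
x = [0.5; -1.2; 0.3];          % inputs x1..xN
w = [0.8; 0.4; -0.6];          % synaptic weights w1..wN
b = 0.1;                       % bias
f = @(v) 1 ./ (1 + exp(-v));   % sigmoid activation f(v) = 1/(1+e^{-v})
y = f(w' * x + b)              % neuron output per the relation above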

In addition to the architecture, the method of setting the weights is an important matter for a neural network. For convenience, learning in a neural network is mainly classified into two types, supervised and unsupervised. In supervised learning, training maps a given set of inputs to a specified set of target outputs, and the weights are adjusted according to various learning algorithms. In unsupervised learning, a sequence of input vectors is provided, but no target vectors are specified; the network modifies the weights so that the most similar input vectors are assigned to the same output unit.


Figure 2.3 Multilayer feed-forward network


In this thesis, the neural network learns the behavior from input-output pairs, a form of supervised learning.

2.2 Back-Propagation Network

In supervised learning, the back-propagation network learning algorithm is the most widely used in applications. The back-propagation algorithm, proposed in 1986 by Rumelhart, Hinton and Williams, is based on the gradient descent method for updating the weights to minimize the total square error of the output. Training by the back-propagation algorithm is mainly applied to multilayer feed-forward networks in three steps: input the training patterns, calculate the errors via back-propagation, and adjust the weights accordingly. Figure 2.4 shows the back-propagation network, including an input layer with I neurons, one hidden layer with J neurons and an output layer with K neurons. Let x_i be the input to the i-th neuron of the input layer, i = 1, 2, …, I. For the output layer, the output of the k-th neuron, k = 1, 2, …, K, is expressed as

y_k = f\left( \sum_{j=1}^{J} w_{kj} z_j + b_k \right),

where w_{kj} denotes the weight from the j-th neuron in the hidden layer to the k-th neuron in the output layer. As for the hidden layer, the output of its j-th neuron, j = 1, 2, …, J, is obtained as

z_j = f\left( \sum_{i=1}^{I} w_{ji} x_i + b_j \right),

where w_{ji} denotes the weight from the i-th neuron in the input layer to the j-th neuron in the hidden layer.
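As an illustrative MATLAB sketch of these two equations (the layer sizes and random weights below are assumptions, not the thesis's trained values), the forward pass through the network of Figure 2.4 can be written as:

% Forward pass of the back-propagation network (Figure 2.4).
% I, J, K are the numbers of input, hidden and output neurons.
I = 2; J = 4; K = 1;
f  = @(v) 1 ./ (1 + exp(-v));                 % sigmoid activation
x  = rand(I,1);                               % input vector
Wh = rand(J,I) - 0.5;  bh = rand(J,1) - 0.5;  % hidden-layer weights/biases
Wo = rand(K,J) - 0.5;  bo = rand(K,1) - 0.5;  % output-layer weights/biases
z  = f(Wh*x + bh);                            % hidden-layer outputs z_j
y  = f(Wo*z + bo);                            % output-layer outputs y_k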


In a back-propagation neural network, the learning algorithm executes in two phases. First, a training input pattern is presented to the input layer and propagated through the hidden layer to the output layer, which generates an output pattern. If this pattern differs from the desired output, an error is calculated and then propagated backward from the output layer to the input layer, and the weights are modified as the error is propagated. Back-propagation training is based on an iterative gradient algorithm designed to minimize the mean square error between the actual output and the desired output. A learning cycle starts by applying an input vector to the network, which is propagated forward to produce an output vector. Next, the network evaluates the errors between the desired output vector and the actual output vector, and uses these errors to adjust the connection weights and biases according to a learning rule that tends to minimize the error.

Figure 2.4 Back-propagation network


This process is generally referred to as "error back-propagation", or back-propagation for short. The adjusted weights and biases are then used to start a new cycle. A back-propagation cycle, also known as an epoch, is illustrated in Figure 2.5. Over a finite number of epochs, the weights and biases are adjusted until the deviations of the outputs are minimized.

The back-propagation learning algorithm is elaborated below (a code sketch follows the steps):

Step 1: Input the training data.

The training data contain pairs of input x = [x_1 x_2 … x_I]^T and desired output d = [d_1 d_2 … d_K]^T. Set an appropriate maximum tolerable error Emax and a learning rate η between 0.1 and 1.0, according to the computing time and precision required.

Step 2: Randomly set the initial weights and bias values of the network.

Step 3: Calculate the outputs of the hidden layer and the output layer.

Figure 2.5 Back-propagation cycle: an input vector is fed forward to produce an output vector, the error against the target output is evaluated, and the learning rule updates the weights and biases via back-propagation.


Step 4: Calculate the error function expressed as

E = \frac{1}{2} \sum_{k=1}^{K} (d_k - y_k)^2.

Step 5: According to the gradient descent method, calculate the correction of weights as

\Delta w = -\eta \, \frac{\partial E}{\partial w}.

Step 6: Propagate the correction backward to update the weights as below:

w(t+1) = w(t) + \Delta w.


Step 7: Check whether the whole training data set has been learned.

When the network has learned the whole training data set, it is said to have gone through one learning cycle. If the network has not gone through a learning cycle, return to Step 1; otherwise, go to Step 8.

Step 8: Check whether the network converges or not.

If E < Emax, terminate the training process; otherwise, begin another learning cycle by returning to Step 1.
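The code sketch promised above illustrates Steps 1 to 8 in MATLAB. It is a minimal sketch under stated assumptions: the one-hidden-layer architecture of Figure 2.4, a sigmoid activation, and illustrative training data (the XOR problem), not the thesis's actual data or network sizes.

% Off-line training by back-propagation (Steps 1-8), a minimal sketch.
f  = @(v) 1 ./ (1 + exp(-v));                % sigmoid activation
df = @(y) y .* (1 - y);                      % its derivative, f' = f(1-f)
X = [0 0 1 1; 0 1 0 1];  D = [0 1 1 0];      % Step 1: training pairs (XOR)
[I,P] = size(X);  J = 4;  K = size(D,1);
eta = 0.5;  Emax = 1e-3;                     % learning rate, tolerable error
Wh = rand(J,I)-0.5;  bh = rand(J,1)-0.5;     % Step 2: random initial weights
Wo = rand(K,J)-0.5;  bo = rand(K,1)-0.5;
for epoch = 1:20000                          % learning cycles (epochs)
    E = 0;
    for p = 1:P
        x = X(:,p);  d = D(:,p);
        z = f(Wh*x + bh);  y = f(Wo*z + bo); % Step 3: forward pass
        E = E + 0.5*sum((d - y).^2);         % Step 4: error function
        dk = (d - y) .* df(y);               % Step 5: output-layer delta
        dj = (Wo' * dk) .* df(z);            %         hidden-layer delta
        Wo = Wo + eta*dk*z';  bo = bo + eta*dk;  % Step 6: update weights
        Wh = Wh + eta*dj*x';  bh = bh + eta*dj;
    end                                      % Step 7: whole set learned
    if E < Emax, break; end                  % Step 8: convergence check
end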

The back-propagation network learning algorithm can be used to model various complicated nonlinear functions, and it has been successfully applied in many domains, such as pattern recognition, adaptive control, clustering problems, etc. In this thesis, the back-propagation network algorithm is used to learn the relationship between the input (position) and the output (velocity) so that the Eye-robot can pursue an object. Figure 2.6 shows the flow chart of the design process of the back-propagation algorithm.


Figure 2.6 Flow chart of the design process of the back-propagation


Chapter 3

System Description

The Eye-robot designed for object tracking can be viewed as an integration of three subsystems: a humanoid visual system (HVS), an image processing system (IPS) and a tracking control system (TCS). The HVS is constructed with two cameras in parallel to emulate the human eyes and detect the object. The IPS is used to extract the features of the object from the images retrieved by the HVS. After the object features are recognized, the TCS is adopted to drive the HVS to track the object. This chapter will focus on the design of the TCS based on the back-propagation neural network.

3.1 Problem Statement

This section describes the humanoid visual system (HVS), the image processing system (IPS) and the tracking control system (TCS). The HVS is built with two cameras and five motors to emulate human eyeballs. The IPS first transforms the image retrieved from the Eye-robot into a gray-level image, after which a threshold can easily be selected to segment the object. The TCS, the main subject of this thesis, is the controller used for Eye-robot tracking control, obtained via off-line training of neural networks. Four neural networks are used in this thesis: NNneck, NNeye, NNeye1-neck, and NNeye2-neck. With these four neural networks, three types of tracking control are formed: C3NN, C2NN and C1NN. The off-line training results of these three types of tracking control will then be compared. Once the object location in the two camera images is obtained and determined, the purpose of object tracking can be achieved by the Eye-robot. A brief description of the object detection method and the tracking control of the Eye-robot is given here; how to obtain the space coordinates of the Eye-robot will be introduced in Chapter 4.

Eye-robot tracking is a challenging and important problem in computer vision. The Eye-robot is expected to detect the position of an object in a scene and maintain that position at the center of the image plane. How to design and maintain a high-performance control system is an important topic of study.

In this thesis, an object is tracked by the Eye-robot. The object is projected by an overhead projector onto a screen and moves at a speed of 28 cm/sec. The distance between the Eye-robot and the screen is designed to be around 240 cm, and the size of the object as seen by the Eye-robot is designed as 10 cm × 10 cm.

Figure 3.1 shows the whole system framework of Eye-robot tracking. For the most part it consists of the cameras, graphics card, desktop computer, RS-232 interface and MCDC controller, which together form the closed loop of the Eye-robot tracking system.

Figure 3.1 Whole system framework of the Eye-robot tracking (object, camera, graphics card, ASCII control signal)


3.2 Hardware

The Eye-robot tracking system is built with two cameras and five motors to emulate human eyeballs, as shown in Figure 3.2. This active vision head was designed to mimic the human visual system. The hardware used to implement the overall Eye-robot tracking system includes a motor controller, one desktop computer, a power supply, two cameras, several I/O ports and motor drive cards.

The Eye-robot adopts five FAULHABER DC servomotors to steer it in the tracking system. Each motor works at a voltage of 15 V, a frequency of 31.25 kHz, a maximum output current of 0.35 A and a maximum speed of 15000 rpm. The DC servomotors are commanded through the MCDC 3006S motion control card over an RS-232 interface, with a positioning resolution of 0.18°. With these 5 degrees of freedom, the object tracking system can track a target whose position is determined from the image processing of the two cameras, QuickCam™ Communicate Deluxe, with the following specs:

1.3-megapixel sensor with RightLight™2 Technology

Built-in microphone with RightSound™2 Technology

Video capture: Up to 1280 x 1024 pixels (HD quality) (HD Video 960 x 720 pixels)

Frame rate: Up to 30 frames per second

Still image capture: 5 megapixels (with software enhancement)

USB 2.0 certified

Optics: Manual focus

The Eye-robot has two pan-direction video cameras, a conjugated tilt motor and a pan-tilt neck. The neck pan direction is the first axis, and the neck tilt direction is the second axis. The binocular tilt direction is set as the third axis, and the pan directions of the two cameras are set as the fourth and fifth axes. Figure 3.3 defines the axes of the five-axis Eye-robot. The Eye-robot head can rotate around the neck, but the neck was fixed in this research and only three degrees of freedom were used. The range of pan is approximately 120 degrees, and that of tilt is approximately 60 degrees. The size of the head is about 25 cm in width, and the eye part is about 10 cm in height.

As the purpose of the active vision system is to mimic the human visual system, the requirements on movement are also defined by human eye motion speed. The motor system is designed to achieve three 120-degree pan saccades per second and three 60-degree tilt saccades per second.

All motors are controlled by the motion controller, which is attached to the host PC via the RS-232 interface card. The control and image processing are both implemented on a desktop computer with a 1.60 GHz CPU.
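The following is a rough MATLAB sketch of sending ASCII control signals over RS-232; the port name, baud rate, and the command mnemonics EN and V are assumptions modeled on the FAULHABER ASCII command style, and the actual commands should be taken from the MCDC 3006S documentation.

% Hypothetical sketch: sending ASCII commands to the motor controller
% over RS-232 using the MATLAB serial interface (as in R2006a).
s = serial('COM1', 'BaudRate', 9600, 'Terminator', 'CR');
fopen(s);                    % open the serial port
fprintf(s, 'EN');            % enable the drive (assumed mnemonic)
fprintf(s, 'V100');          % command a target velocity (assumed mnemonic)
pause(1);                    % let the axis move for one second
fprintf(s, 'V0');            % stop the axis
fclose(s); delete(s);        % release the port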

Figure 3.2 Eye-robot system


3.3 Software

The software for controlling the vision head of the Eye-robot, including capturing video, processing video image frames, and sending control signals for motor control, is written in MATLAB.

MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. Using the MATLAB product, we can solve technical computing problems faster than with traditional programming languages, such as C, C++, and Fortran.

MATLAB is used in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions, available separately) extend the MATLAB environment to solve particular classes of problems in these application areas.

Figure 3.3 Definition of axes of the Eye-robot (1st to 5th axes)



MATLAB provides a number of features for documenting and sharing the work.

We can also integrate MATLAB code with other languages and applications, and distribute MATLAB algorithms and applications. Key MATLAB features include:

• High-level language for technical computing

• Development environment for managing code, files, and data

• Interactive tools for iterative exploration, design, and problem solving

• Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration

• 2-D and 3-D graphics functions for visualizing data

• Tools for building custom graphical user interfaces

In this thesis, the software for controlling the vision head of the Eye-robot, including capturing images, processing image frames and sending control signals to the motor of each axis, is written with the M-file editor of MATLAB R2006a. All source code is written on the host desktop PC and run there. The communication between the desktop PC and the Eye-robot uses RS-232 to transmit signals from the PC to the Eye-robot controller, which executes the tracking commands and sends the tracking status and the coordinates of the object in the plane back to the PC.


Chapter 4

Intelligent Object Tracking Controller Design

In this chapter, the intelligent object tracking controller design applied to the Eye-robot will be introduced, including object detection and tracking controller design.

4.1 Object Detection

Recently, image processing has been widely applied in many fields of research by transforming images into other forms of images, called feature images, to provide useful information for problems in the military, medical science, industry, etc.

For simplicity, the object to be tracked in this thesis is a computer-generated image containing a white ball moving on a pure black background, projected on a screen 240 cm away from the Eye-robot. Figure 4.2 shows an image retrieved by the left-eye camera mounted on the Eye-robot. Clearly, in addition to the white ball and black background on the screen, the image also includes undesirable objects outside the screen. To properly detect the white ball moving on the screen, this section introduces some basic concepts and methods of image processing.

The pixels of the image retrieved by the Eye-robot are represented by a point (R,G,B) in the color cube as shown in Figure 4.1, where the three axes are related to red, green and blue. Let the pixel be located at (x,y) with color (R,G,B); then its gray level can be expressed as

g(x,y) = \frac{R + G + B}{3},    (4.1-1)

where 1 ≤ x ≤ H and 1 ≤ y ≤ W, with H × W the image resolution; for example, the image used in this thesis is of the size 320×240, and thus H = 320 and W = 240. Since the RGB space contains 256 levels, numbered from 0 to 255, along each axis of the color cube, there are 256^3 different colors in total and 256 different gray levels.
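As a quick check of (4.1-1): the background point with (R,G,B) = (88,110,116) gives g = (88 + 110 + 116)/3 ≈ 105, and the object center with (R,G,B) = (252,255,255) gives g = 254, in agreement with the gray levels annotated in Figures 4.2 and 4.3.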

To detect the object on the screen, the image in RGB space from the left-eye camera in Figure 4.2 is first transformed by (4.1-1) into a gray-level image, as shown in Figure 4.3, with the gray-level intensity distribution given in Figure 4.4. Clearly, the gray level of the object, i.e., the white ball, is much higher than that of the other pixels, and therefore a threshold can easily be selected for the object. Taking the threshold as 180 results in the binary image shown in Figure 4.5, where the pixel located at (x,y) is set to

b(x,y) = 255 if g(x,y) ≥ 180, and b(x,y) = 0 otherwise,

and Figure 4.6 shows the resulting gray-level distribution of the binary image.

Figure 4.1 RGB color space and the color cube.



Based on this binary image, the location of the object in the image can be detected; the object is then tracked via off-line training with the use of a neural network, which will be introduced in the next section.
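The detection procedure of this section can be sketched in MATLAB as follows; the frame file name is a placeholder for an image captured from the camera, and the centroid computation is one straightforward way (assumed here, not prescribed by the text above) to obtain the object center from the binary image.

% Object detection sketch: gray-level conversion per (4.1-1),
% thresholding at 180, and centroid of the white ball.
rgb = imread('frame.png');           % assumed sample frame (H-by-W-by-3)
g   = sum(double(rgb), 3) / 3;       % gray level, Eq. (4.1-1)
bw  = g >= 180;                      % binary image, threshold 180
[r, c] = find(bw);                   % pixels belonging to the object
center = [mean(c), mean(r)]          % object center (x, y)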


Figure 4.2 Image in RGB space (object center at (x,y) = (159,117) with (R,G,B) = (252,255,255); background point at (x,y) = (140,55) with (R,G,B) = (88,110,116))

Figure 4.3 Image transformed to gray level (object center at (x,y) = (159,117) with gray level 254; background point at (x,y) = (140,55) with gray level 105)


Figure 4.4 Intensity of gray-level distribution

Figure 4.5 Binary image obtained by taking the threshold as 180

4.2 Tracking Controller Design

This section describes the design of the tracking controller, a neural-network-based controller obtained via off-line training. The input of the neural network is the object center p and the output is the tracking index V, related to the voltage needed to drive the Eye-robot. Figure 4.7 shows the feedback configuration of the tracking control, where the neural network controller receives the error e between the current horizontal position xc and the desired position xd of the target and correspondingly provides a tracking index V to drive the Eye-robot. This section emphasizes the training patterns used to train the controller to drive the Eye-robot to trace the object.
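A minimal MATLAB sketch of the feedback loop of Figure 4.7 is given below; capture_frame, detect_center, nn_controller and send_velocity are hypothetical function names standing in for the camera capture, the object detection of Section 4.1, the trained network, and the RS-232 motor command of Section 3.2.

% One pass around the tracking loop of Figure 4.7 (hypothetical names).
xd = 160;                      % desired horizontal position (center of a
                               % 320-pixel-wide frame)
for k = 1:1000                 % tracking cycles
    img = capture_frame();     % grab a frame from the camera
    xc  = detect_center(img);  % current horizontal object position
    e   = xd - xc;             % position error
    V   = nn_controller(e);    % tracking index from the trained network
    send_velocity(V);          % drive the Eye-robot motors via RS-232
end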

Figure 4.6 Gray-level distribution of the binary image (object center at (x,y) = (159,117) with gray level 255; background point at (x,y) = (140,55) with gray level 0)


Figure 4.7 Feedback configuration of the tracking control

Figure 4.8 Eye training patterns

Figure 4.9 Neck training patterns


Now, let us focus on the off-line training of the neural network controller. Two kinds of training patterns, V1 and V2, are used in this thesis, as shown in Figure 4.8 and Figure 4.9. The pattern V1 (Figure 4.8) is used for the eyes and the pattern V2 (Figure 4.9) for the neck; each is designed to give the Eye-robot the ability to find the object in searching mode. Once the object is detected, the eyes and the neck work together to trace the object.
