Hardware Platform - System Overview - Lock-On Tracking of a Specific Object against a Dynamic Background Using a PTZ Camera

Chapter 2 System Overview

2.2 Hardware Platform

In this system, we use an active camera, the Lilin PIH-7600 High Speed PTZ Dome shown in Figure 2-8, whose pan-tilt-zoom function lets us acquire real-time image sequences. The frames are captured and processed by a personal computer (PC) with an Intel Pentium 4 processor at 2.4 GHz and 512 MB of RAM, running Windows XP. As shown in Figure 2-9, the active camera has two interfaces: an RS-232 interface and a video capture interface. The PC sends commands over the RS-232 interface to control the camera movement, including the pan, tilt, and zoom operations. The video capture interface carries the analog video signal to a video capture card in the PC, a Vguard 7146, from which the image data are read. We use Borland C++ Builder 6.0 as the development platform for processing the images and sending commands to the PTZ camera.

Figure 2-8 : Lilin PIH-7600 High Speed PTZ Dome

Figure 2-9 : Hardware platform diagram

Chapter 3 Deformable Dynamic Tracking System

In this chapter, we describe how to determine the location and area of a moving object and how to track it through an image sequence with the mean-shift algorithm. The tracking system is divided into two parts: moving object detection and mean-shift tracking. We use the frame difference method for moving object detection and a size-adaptive mean-shift method for tracking.

3.1 Moving Object Detection

Among moving object detection methods, the two most commonly used are the frame difference method and the background subtraction method. We use the frame difference method because the scene viewed by an active camera changes constantly, so we cannot construct a stable background model for background subtraction. The moving object detection part consists of three elements: temporal difference, a specific filter, and image projection.

Temporal difference calculates the image difference between the previous frame and the current frame, shown in Figure 3-1(a) and Figure 3-1(b). After the temporal difference step, the difference map still contains noise (as shown in Figure 3-2). If we used this map directly to select the tracking region for moving object tracking, the performance would likely be poor.
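The temporal difference step can be sketched in a few lines of Python. This is a minimal illustration that assumes grayscale frames stored as nested lists; the function name and the threshold value of 25 are assumptions chosen for the example and are not taken from our implementation.

```python
def frame_difference(prev, curr, threshold=25):
    """Return a binary motion map: 1 where |curr - prev| > threshold, else 0.

    `prev` and `curr` are grayscale frames of equal size (lists of rows).
    The threshold is an illustrative assumption.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]


prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 200, 10], [12, 10, 90]]
motion = frame_difference(prev_frame, curr_frame)
```

Small intensity changes, such as the 10 to 12 change in the second row, fall below the threshold and are not marked as motion.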

Figure 3-1 : (a) previous frame (left) (b) current frame (right)

Figure 3-2 : The result of frame difference in Figure 3-1

After we obtain the difference map, we apply a specific digital filter composed of image dilation and erosion to suppress the noise in the difference map and to enhance the region of the actually moving object. The result of applying this filter is shown in Figure 3-3, in which every pixel is classified as either a moving pixel or a non-moving pixel. Comparing Figure 3-2 with Figure 3-3, we can see that the moving object area is much easier to recognize in the latter.
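As an illustration of this kind of filter, the sketch below applies a 3x3 erosion followed by a 3x3 dilation to a binary map, the order given in the caption of Figure 3-3. The 3x3 structuring element and the treatment of out-of-image pixels as 0 are assumptions made for the example.

```python
def _neighbourhood(img, y, x):
    """Yield the 3x3 neighbourhood of (y, x); out-of-image pixels count as 0."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            yield img[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0


def erode(img):
    """Keep a pixel only if its whole 3x3 neighbourhood is set."""
    return [[1 if all(_neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def dilate(img):
    """Set a pixel if any pixel of its 3x3 neighbourhood is set."""
    return [[1 if any(_neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],   # the lone 1 at column 5 is noise
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = dilate(erode(noisy))
```

Erosion removes the isolated noise pixel and shrinks the 3x3 block to its center; the following dilation restores the block to its original extent.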

Figure 3-3 : Difference map after erosion and dilation

After this filtering we have a more precise difference map, from which we select the target object for the tracking task. The method we adopt is image projection, which is commonly used in image processing. We compute a horizontal projection and a vertical projection, and from these two profiles we mark the moving object region for the later tracking task. The result of the image projection method is shown in Figure 3-4.

To extract the moving object region, we first take the horizontal projection to find the largest region along the X coordinate, and then perform the vertical projection to find the largest region along the Y coordinate. The result is a rectangular bounding box defined by its top-left and bottom-right coordinates, as shown in Figure 3-4.

After finding the largest region of the binary motion map, we take it as the target object for the mean-shift tracking task if its area is larger than a minimum size and its width/height ratio is within an appropriate range. Otherwise, if the region is too small or its width/height ratio is not reasonable, we repeat the whole detection step to find another suitable candidate moving object.
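The projection step and the acceptance test can be sketched as follows. This is a hedged illustration: "largest region" is interpreted here as the longest run of non-empty columns or rows, and the minimum area and the width/height ratio limits are placeholder values, not the ones used in our system.

```python
def largest_run(profile):
    """Return (start, end) of the longest run of non-zero entries (end exclusive)."""
    best, start = None, None
    for i, v in enumerate(list(profile) + [0]):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def bounding_box(motion, min_area=4, ratio_range=(0.2, 5.0)):
    """Return (left, top, right, bottom) of the accepted region, or None."""
    columns = [sum(col) for col in zip(*motion)]      # projection onto the X axis
    x_run = largest_run(columns)
    if x_run is None:
        return None
    x0, x1 = x_run
    rows = [sum(row[x0:x1]) for row in motion]        # projection onto the Y axis
    y_run = largest_run(rows)
    if y_run is None:
        return None
    y0, y1 = y_run
    width, height = x1 - x0, y1 - y0
    lo, hi = ratio_range
    if width * height < min_area or not lo <= width / height <= hi:
        return None                                   # reject: too small or odd shape
    return (x0, y0, x1 - 1, y1 - 1)


motion_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
]
box = bounding_box(motion_map)
```

In this example the isolated pixel in the last column forms a shorter run than the 2x3 block, so the block is selected as the candidate region.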

Figure 3-4 : Image projection

Before we perform the tracking task, another step is necessary to reduce the effect of background noise. We adopt an elliptic region rather than a rectangular region because we found that the rectangular region contains more background information than the elliptic region. The difference between the two regions can be seen in Figure 3-5. In Chapter 5, the experimental results demonstrate the advantage of using the elliptic region.
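One simple way to realize the elliptic region is to build a mask for the ellipse inscribed in the bounding box and to use only the pixels under the mask. The sketch below is an assumption about how such a mask could be generated; the function name is hypothetical.

```python
def ellipse_mask(width, height):
    """Return a height x width mask that is 1 inside the inscribed ellipse."""
    cx, cy = (width - 1) / 2, (height - 1) / 2   # center of the box in pixel units
    rx, ry = width / 2, height / 2               # semi-axes of the ellipse
    return [
        [1 if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1 else 0
         for x in range(width)]
        for y in range(height)
    ]


mask = ellipse_mask(5, 5)
```

The corner pixels of the box, which are the most likely to belong to the background, are excluded, while the center and the midpoints of the edges are kept.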

Figure 3-5 : The rectangle region and the elliptic region

3.2 HSV Color Transformation

After we find the target object in the moving object detection part, we track this target. Before tracking, however, we need to convert the color space of the target object from RGB (Figure 3-6) to HSV (Figure 3-7). The main purpose of this color space transformation is to deal with the lightness problem of our system. Because the diaphragm of the active camera adjusts itself automatically to the environment illumination, we have to avoid the effect of this varying factor.

We solve this problem by using the HSV color space, which separates the lightness information from the RGB color values and thereby reduces the sensitivity of the tracker to illumination changes.

The HSV model, also known as HSB model, was created in 1978 by Alvy Ray Smith. It is a nonlinear transformation of the RGB color space. It defines a color space in terms of three components: hue, saturation, and value. The definition is described below: [27]

1. Hue: It is the color type and ranges from 0 to 360 degrees. Each value corresponds to one color. For example, 0 is red, 45 is orange and 55 is yellow. Because hue is an angle, 360 degrees is equal to 0 degrees.

2. Saturation: It is the intensity of the color and ranges from 0% to 100%. 0 means no color, that is, only a gray value between black and white. 100 means the most intense, fully saturated color.

3. Value: It is the brightness of the color, and also ranges from 0%~100%. 0 is always black. Depending on the saturation, 100 may be white or a more or less saturated color.

Figure 3-6 : RGB color model [28]

Figure 3-7 : HSV color model [29]

In our system, we convert the target object from the RGB color space to the HSV color space in order to avoid the brightness problem caused by the varying diaphragm of the camera, because the HSV color model separates the brightness information from the color information. The transformation from the RGB color model to the HSV color model is listed step by step below:

1. We rescale the R, G, and B values from the range 0~255 to the range 0.0~1.0.

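2. We select the maximum value MAX and the minimum value MIN of (R, G, B), and then apply the following formulas to convert the RGB model to the HSV model.

3. Hue:

$$H = \begin{cases} 0 & \text{if } MAX = MIN \\ \left(60^\circ \times \dfrac{G - B}{MAX - MIN}\right) \bmod 360^\circ & \text{if } MAX = R \\ 60^\circ \times \dfrac{B - R}{MAX - MIN} + 120^\circ & \text{if } MAX = G \\ 60^\circ \times \dfrac{R - G}{MAX - MIN} + 240^\circ & \text{if } MAX = B \end{cases}$$

4. Saturation:

$$S = \begin{cases} 0 & \text{if } MAX = 0 \\ 1 - \dfrac{MIN}{MAX} & \text{otherwise} \end{cases}$$

5. Value:

$$V = MAX$$

6. In step 3, if MAX = MIN the value of H is normally considered "undefined." To make our calculation more convenient, we define H to be 0 in that case.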
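The conversion can be illustrated with a short Python function. This is a minimal sketch of the standard RGB-to-HSV conversion with H set to 0 when MAX = MIN, as described in step 6; it returns H in degrees and S and V in the range 0.0 to 1.0, and the function name is chosen for the example only.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit R, G, B values to (H in degrees, S in 0..1, V in 0..1)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0     # step 1: rescale to 0.0~1.0
    mx, mn = max(r, g, b), min(r, g, b)           # step 2: MAX and MIN
    delta = mx - mn
    if delta == 0:
        h = 0.0                                   # undefined hue is set to 0
    elif mx == r:
        h = (60.0 * (g - b) / delta) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / delta + 120.0
    else:
        h = 60.0 * (r - g) / delta + 240.0
    s = 0.0 if mx == 0 else delta / mx            # equivalent to 1 - MIN/MAX
    v = mx
    return h, s, v


red = rgb_to_hsv(255, 0, 0)
gray = rgb_to_hsv(128, 128, 128)
```

A pure gray pixel has MAX = MIN, so its hue is reported as 0 and its saturation is 0; only V carries information in that case.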

After executing these 6 steps to convert the RGB color of the target object into HSV color, the next step of our system is to calculate the color histogram of the target object pixel by pixel. The most important question in this step is how many quantization levels to use for each component of the HSV color model. This question is answered and explained in Chapter 5.
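The histogram computation can be sketched as follows. The number of levels per component is the open question mentioned above, so the (16, 4, 4) quantization used here is purely a placeholder assumption, as is the function name.

```python
def hsv_histogram(pixels, bins=(16, 4, 4)):
    """Count HSV pixels into a flat histogram with bins[0]*bins[1]*bins[2] cells.

    `pixels` is an iterable of (H in degrees, S in 0..1, V in 0..1) tuples.
    The default quantization is an illustrative assumption.
    """
    hb, sb, vb = bins
    hist = [0] * (hb * sb * vb)
    for h, s, v in pixels:
        hi = min(int(h / 360.0 * hb), hb - 1)   # clamp so that the top edge stays in range
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1
    return hist


target_pixels = [(0.0, 1.0, 1.0), (0.0, 1.0, 1.0), (120.0, 0.5, 0.5)]
hist = hsv_histogram(target_pixels)
```

Using fewer levels for V than for H is one way to make the histogram less sensitive to brightness changes, which is the motivation for the HSV conversion.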

3.3 Mean-Shift Tracking

The mean-shift method was first employed in the joint spatial-range domain of gray-level and color images for discontinuity-preserving filtering and image segmentation, owing to its low cost and computational simplicity. It also provides an approach to the real-time tracking of non-rigid objects whose statistical distributions characterize the object of interest [16]. This tracking approach was proposed in 2000, and it has been proved to converge.

3.3.1 Mean-Shift Tracking Theorem

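By definition, given a set {x_i}, i = 1…n, of n points in the d-dimensional space R^d, the multivariate kernel density estimate with kernel K(x) and window radius h, computed at the point x_0, is given by

$$\hat{f}(x_0) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x_i - x_0}{h}\right)$$

where a commonly used kernel is the multivariate normal:

$$K(x) = (2\pi)^{-d/2} \exp\left(-\frac{1}{2}\|x\|^2\right)$$

The kernel value of the candidate object at position y can be described as:

$$p_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta[b(x_i) - u]$$

where k is the kernel profile, b(x_i) is the histogram bin of the color at pixel x_i, δ is the Kronecker delta, n_h is the number of pixels in the candidate region, and C_h is a normalization constant [16].

Hence, tracking can be seen as searching for the location y that minimizes the distance between the target model and the target candidate, which is equivalent to maximizing the estimate of the Bhattacharyya coefficient given by:

$$\rho(y) = \rho[p(y), q] = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}$$

After some derivations (a first-order Taylor expansion around the current location y_0), we see that:

$$\rho[p(y), q] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right), \qquad w_i = \sum_{u=1}^{m} \delta[b(x_i) - u] \sqrt{\frac{q_u}{p_u(y_0)}}$$

Because the first term does not depend on y, the distance is minimized by maximizing the second term, which is exactly what the mean-shift algorithm does.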

3.3.2 Mean-Shift Tracking Algorithm

In our system, we use the mean-shift algorithm to track the object of interest. After the color model transformation, we use the histogram of the HSV color space to execute the mean-shift tracking. In the context of tracking, given the color distributions q and p generated from the model and candidate object regions, a sample corresponds to the color I(x) observed at a pixel x and has an associated sample weight w(x), which is defined by:

$$w(x) = \sqrt{\frac{q(I(x))}{p(I(x))}}$$

The key point of the mean-shift algorithm is the computation of the offset ΔX from the initial object location x to a new location x' = x + ΔX:

$$\Delta X = \frac{\sum_{a} K(x_a - x)\, w(x_a)\, (x_a - x)}{\sum_{a} K(x_a - x)\, w(x_a)}$$

where the sums run over the pixels x_a in the tracking window and K(.) is usually a radially symmetric kernel function, such as a Gaussian kernel. We choose another popular kernel function, the flat kernel, that is, K(x) = 1 for ||x|| < 1 and K(x) = 0 otherwise.

We repeat this computation until it converges; convergence has been proved and is usually reached within a few iterations [16]. After it converges, we also need to check whether the tracking result matches our tracking target. We estimate the discrete density q = {q_u}, u = 1…m, from the m-bin histogram of the target model, while p(y) = {p_u(y)}, u = 1…m, is estimated at a given location y from the m-bin histogram of the target candidate.
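A minimal Python sketch of the mean-shift iteration with a flat kernel is given below. It assumes that every pixel has already been quantized to a histogram bin index, uses a rectangular window instead of the elliptic region for brevity, and rounds the new center to integer pixel coordinates; all function names are hypothetical.

```python
import math


def window_pixels(bins, cx, cy, hw, hh):
    """Yield (x, y, bin) for pixels of the window, clipped to the image."""
    h, w = len(bins), len(bins[0])
    for y in range(max(0, cy - hh), min(h, cy + hh + 1)):
        for x in range(max(0, cx - hw), min(w, cx + hw + 1)):
            yield x, y, bins[y][x]


def window_histogram(bins, cx, cy, hw, hh, m):
    """Normalized m-bin histogram p(y) of the candidate window."""
    hist = [0.0] * m
    for _, _, b in window_pixels(bins, cx, cy, hw, hh):
        hist[b] += 1.0
    total = sum(hist)
    return [v / total for v in hist]


def mean_shift(bins, q, cx, cy, hw, hh, max_iter=20):
    """Move the window center to the weighted mean until it stops changing."""
    for _ in range(max_iter):
        p = window_histogram(bins, cx, cy, hw, hh, len(q))
        sw = sx = sy = 0.0
        for x, y, b in window_pixels(bins, cx, cy, hw, hh):
            w = math.sqrt(q[b] / p[b])      # p[b] > 0: bin b occurs in the window
            sw += w
            sx += w * x
            sy += w * y
        if sw == 0:
            break                           # no target-coloured pixel in the window
        nx, ny = int(round(sx / sw)), int(round(sy / sw))
        if (nx, ny) == (cx, cy):
            break                           # converged
        cx, cy = nx, ny
    return cx, cy


# 9x9 image: background is bin 0, a 3x3 target of bin 1 is centered at (5, 5).
image_bins = [[0] * 9 for _ in range(9)]
for ty in range(4, 7):
    for tx in range(4, 7):
        image_bins[ty][tx] = 1
model_q = [0.0, 1.0]                        # the model contains only target colour
center = mean_shift(image_bins, model_q, 3, 3, 2, 2)
```

Starting from (3, 3), the window overlaps only one corner of the target; the weighted mean pulls the center toward that corner, and the next iteration settles on the target center.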

The whole mean-shift algorithm can be briefly summarized as follows:

I. Convert the HSV color model of the target model into a distribution histogram.

II. For each incoming frame, compute the HSV color histogram of the target candidate.

III. Calculate the mean-shift vector ΔX using formula (3.12).

IV. After it converges, calculate the Bhattacharyya coefficient using equation (3.7) to see whether the target candidate and the target model match.

V. If the target candidate matches, perform the adaptive size selecting method described next. Otherwise, we consider the tracking lost and search for another moving object.
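The matching test of steps IV and V can be illustrated as follows. The Bhattacharyya coefficient of two normalized histograms lies between 0 and 1; the acceptance threshold of 0.8 used here is an assumption for the example and is not the value used in our system.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms of equal length."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def is_match(p, q, threshold=0.8):
    """Accept the candidate if it is similar enough to the model (threshold assumed)."""
    return bhattacharyya(p, q) >= threshold


same = bhattacharyya([0.5, 0.5], [0.5, 0.5])
disjoint = bhattacharyya([1.0, 0.0], [0.0, 1.0])
```

Identical distributions give a coefficient of 1, and distributions with no common bin give 0, in which case the tracker would declare the target lost.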

3.4 Adaptive Size Selecting

After the mean-shift tracking step, the next step of our tracking system is the adaptive size selecting step, which is our contribution. In the previous mean-shift tracking step, we can only track the target object with a fixed-size matching region.

Because we want to track the target object as precisely as possible, the tracking region has to follow the varying size of the target. We solve this problem by combining mean-shift tracking with the adaptive size selecting method.

Figure 3-8 : Three different size candidates

The adaptive size selecting method is also based on the mean-shift tracking algorithm. In general, the mean-shift tracking algorithm uses a fixed-size target candidate to calculate the mean-shift vector ΔX. Here we use target candidates of different sizes and calculate the Bhattacharyya coefficient for each size.

As shown in Figure 3-8, after we calculate the new position of the tracked object, we choose several candidates of different sizes.

There are three candidates of different sizes in Figure 3-8: the smaller candidate, the normal candidate, and the larger candidate. We compare their Bhattacharyya coefficients to choose the one with the largest similarity. Before that, it is essential to normalize the color distribution of each candidate size, because the Bhattacharyya coefficient that we use to judge the similarity of each candidate requires the same number of samples in both distributions. Therefore each bin of the color distribution of the target candidate, p(y) = {p_u(y)}, u = 1…m, is multiplied by a normalization coefficient f.

f = (number of target model pixels) / (number of target candidate pixels)

Therefore the equation for calculating the Bhattacharyya coefficient is modified as:

$$\rho(y) = \sum_{u=1}^{m} \sqrt{f\, p_u(y)\, q_u}$$

We compare the Bhattacharyya coefficients calculated by equation (3.10) of three different sizes of target candidate, and then we select the one which has the largest Bhattacharyya coefficient as our final tracking result.
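The size selection can be sketched as below. The sketch assumes that the histograms hold raw pixel counts, so that the factor f rescales each candidate to the pixel count of the model before the coefficient is computed; the candidate labels and function names are hypothetical.

```python
import math


def scaled_bhattacharyya(p_counts, q_counts):
    """Bhattacharyya coefficient after scaling the candidate counts by f.

    f = (number of model pixels) / (number of candidate pixels).
    Both histograms are assumed to be non-empty raw counts.
    """
    f = sum(q_counts) / sum(p_counts)
    return sum(math.sqrt(f * pu * qu) for pu, qu in zip(p_counts, q_counts))


def select_size(candidates, q_counts):
    """Return the label of the candidate with the largest coefficient."""
    return max(candidates,
               key=lambda name: scaled_bhattacharyya(candidates[name], q_counts))


model_counts = [8, 2]            # 8 target-coloured pixels, 2 background pixels
candidates = {
    "smaller": [4, 0],           # misses part of the target
    "normal": [8, 2],            # same content as the model
    "larger": [8, 10],           # includes much more background
}
best = select_size(candidates, model_counts)
```

In this toy example the coefficients are roughly 8.94, 10.0, and 9.30 for the smaller, normal, and larger candidates, so the normal size is kept.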

In Figure 3-9 and Figure 3-10, we demonstrate the result of size selecting on a single picture. Figure 3-9 was retrieved from the Internet, and Figure 3-10 was enlarged from Figure 3-9 with Photoshop. In Figure 3-10, the tracking region is chosen by hand and the right one is the target object. We use both the mean-shift tracking and the size selecting method to calculate the correct position and size. In Figure 3-10, after 4 iterations we can see that the tracking result is good and the selected size also matches.

Figure 3-9 : Tracking region is chosen by hand

Figure 3-10 : The tracking result after 4 times iteration.

This selecting method can also run into problems. In some video sequences the size selection keeps diverging, that is, the size selecting step always chooses the larger region or always chooses the smaller region. Both situations may cause the size selection to diverge and the tracking to fail. We cannot reproduce this situation with a single picture because it may only appear many frames later, but we can handle it with the following two methods:

1. We can change the camera zoom to let the camera reset its focus and refine the image resolution of the target object. This method is useful when the distance between the target object and the camera varies often, but it may still fail.

2. We can restrict the size selection in our program, for example to at most 3 times larger or 3 times smaller than the target object. This restriction is crude, but its purpose is to prevent the tracking from diverging and failing.

We adopt both methods in our tracking system, and the results match our expectations. The experimental results are presented and discussed further in Chapter 5.

Chapter 4 PTZ Control System

In the proposed tracking system, we try to keep the tracked target near the center of the captured scene. In this chapter, we introduce the control mechanism of the active pan-tilt-zoom camera module, which covers the pan/tilt action and the zoom in/out action. The pan and tilt control is activated under two conditions: when change detection finds a moving object whose position is not in the central part of the captured scene, and when the mean-shift tracker finds that the target has moved away from the center. The zoom in/out control is activated by the result of the size selecting module.

Besides the action control of the camera, the speed of these actions must also be controlled. The pan/tilt speed is switched dynamically in proportion to the distance between the target position and the image center: a larger pan/tilt speed is required when the target is farther from the center. Likewise, the zoom in/out speed is changed dynamically in proportion to the size-changing rate of the target object. After the proper pan/tilt speed has been calculated, the camera control module executes control routines to drive the active camera continuously.
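The proportional speed rule can be sketched as a mapping from the target's offset from the image center to a discrete speed level. The eight levels (0 to 7) are an assumption suggested by the 3-bit speed fields of Figure 4-2, and the function name and the 320-pixel image width in the example are hypothetical.

```python
def speed_level(offset, half_extent, max_level=7):
    """Map a pixel offset from the image center to a speed level 0..max_level.

    `offset` is the signed distance (in pixels) along one axis and
    `half_extent` is half of the image size along that axis.
    """
    ratio = min(abs(offset) / half_extent, 1.0)   # 0.0 at the center, 1.0 at the border
    return int(ratio * max_level + 0.5)           # round to the nearest level


# Example for a 320-pixel-wide image (half extent 160 pixels).
centered = speed_level(0, 160)
near_edge = speed_level(-150, 160)
```

The sign of the offset would select the direction of the movement, while the level computed here only sets how fast the camera moves.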

4.1 Basic Mechanism

The camera we use is the Lilin PIH-7600 High Speed PTZ Dome, and its control signal is transmitted through a standard RS-485 interface. To connect this camera to our PC, we use the RS-232 serial port and a converter between RS-232 and RS-485. The UART format is: no parity bit, 8 data bits, 1 stop bit, at a baud rate of 9600 bps.

Every command that we send from the PC to the camera contains three bytes, and a command is usually followed by a "STOP" command after it has been sent.

The details of each byte of a control command are listed in Figure 4-1, and the speed control bits for pan and tilt are shown in Figure 4-2.

Figure 4-1 : Command Description

Tilt Speed: Bit 5, Bit 4, Bit 3

Pan Speed: Bit 2, Bit 1, Bit 0

Figure 4-2 : Pan/Tilt Speed Control

The most challenging part of the active camera control is choosing the pan/tilt speed and its active time. If the speed is set too fast and the active period too long, the target might be lost because the camera overshoots. On the other hand, if the speed is too slow to follow the target object, we lose it as well. Based on our experiments, we select a pan speed of 2°/sec and a tilt speed of 8°/sec as the best compromise.

In our tracking system, we use the mean-shift tracking method to find the target object in every frame. We divide the monitor screen into 9 different regions, as shown in Figure 4-3. If the target object moves away from the center region, we drive the active camera to follow the object. After the position tracking, we also need to judge the size change of the target object. If the size becomes more than twice the original size, we zoom out the camera. Conversely, if the size becomes smaller than half of the original size, we zoom in.
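The region test and the zoom rule can be sketched as follows. The sketch assumes a 3x3 grid of equal cells, image coordinates with the origin at the top-left corner, and that the camera is panned or tilted toward the side on which the target appears; the action names are placeholders rather than actual camera commands.

```python
def region(x, y, width, height):
    """Return (row, col) of the 3x3 grid cell containing (x, y); (1, 1) is the center."""
    col = min(3 * x // width, 2)
    row = min(3 * y // height, 2)
    return row, col


def pan_tilt_action(x, y, width, height):
    """Return (pan, tilt) actions; None means no movement along that axis."""
    row, col = region(x, y, width, height)
    pan = ("LEFT", None, "RIGHT")[col]
    tilt = ("UP", None, "DOWN")[row]
    return pan, tilt


def zoom_action(size, original_size):
    """Zoom out if the target more than doubled, zoom in if it more than halved."""
    if size > 2 * original_size:
        return "ZOOM_OUT"
    if size < original_size / 2:
        return "ZOOM_IN"
    return None


# Example for a 320x240 image.
at_center = pan_tilt_action(160, 120, 320, 240)
top_right = pan_tilt_action(300, 20, 320, 240)
```

When both returned actions are None the target is in the center region, which is the situation in which a "STOP" command would be sent.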

Figure 4-3 : Dividing the monitor screen into 9 different regions

4.2 Pan and Tilt Control

Pan and tilt control is driven by the position of the tracked object, and the command format that we send has been described in the previous section. Figure 4-4 illustrates the flow chart of pan and tilt control, and Figure 4-5 lists the command format and content of the panning right, panning left, tilting up, tilting down, and stop commands.

Figure 4-4 : Pan and tilt control (position control)

There are three control issues to which we should pay particular attention when we send commands to control the active camera.

I. The pan and tilt speeds are set to 2°/sec and 8°/sec, respectively. When the speed is set higher, the image is blurred by the camera motion during the exposure time. When the speed is set lower, the target is easily lost because the camera is not fast enough to follow the moving object.

II. When we find that the moving object is already being tracked in the center region of the monitor, we have to send a "STOP" command regardless of the current action of the camera.

III. When we find that the moving object has moved to another region outside the center region, we should immediately change the command that we send to control the action of the active camera.

Panning-Right: 40H 01H 81H (Pan Speed: 2°/sec)

Tilting-Up: 40H 04H 81H (Tilt Speed: 8°/sec)

Stop Command: 40H 00H FFH
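The three-byte commands listed above can be represented in code as follows. This is a hedged sketch: it only covers the three commands whose bytes appear in this section, the dictionary and function names are hypothetical, and actually writing the bytes to the serial port (9600 bps, 8 data bits, no parity, 1 stop bit) is left out.

```python
# Byte values copied from the command listing in this section.
COMMANDS = {
    "PAN_RIGHT": bytes([0x40, 0x01, 0x81]),
    "TILT_UP": bytes([0x40, 0x04, 0x81]),
    "STOP": bytes([0x40, 0x00, 0xFF]),
}


def frames_for(action):
    """Return the frames to send for an action: the command, then STOP.

    The two frames are meant to be written to the serial port one after
    the other, with the camera moving during the time between them.
    """
    if action == "STOP":
        return [COMMANDS["STOP"]]
    return [COMMANDS[action], COMMANDS["STOP"]]


pan_right_frames = frames_for("PAN_RIGHT")
```

The delay between the movement frame and the stop frame determines the active time of the movement, which is one of the tuning parameters discussed in Section 4.1.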
