Chapter 4 Human tracking
4.3 Mean-shift algorithm
Fig. 4-5 (a) Gaussian kernel (b) flat kernel (c) Epanechnikov kernel
Fig. 4-6 (a) Target object (b) Kernel function (c) Target object and Kernel function
To characterize the target, a feature space is first chosen. The reference target model is represented by a normalized histogram $\hat{q}$ in the feature space and can be considered as centered at the spatial location 0. In the subsequent frame, the candidate model is defined at location y and is expressed as $\hat{p}(y)$. We use Eq. 4.8 and Eq. 4.9 as our target and candidate models, respectively.
The similarity value between $\hat{p}(y)$ and $\hat{q}$ is denoted $\rho[\hat{p}(y), \hat{q}]$.
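Let $\{x_i^*\}_{i=1\ldots n}$ be the normalized pixel locations in the region defined as the target model. The region is centered at 0. Here we use the Epanechnikov kernel with profile $k(x)$, which assigns smaller weights to pixels farther from the center; using these weights increases the robustness of the density estimation, since the peripheral pixels are the least reliable.
The function $b: R^2 \rightarrow \{1 \ldots m\}$ associates to the pixel at location $x_i^*$ the index $b(x_i^*)$ of its bin in the quantized feature space. The probability of the feature u = 1…m in the target model is then computed as
$$\hat{q}_u = C \sum_{i=1}^{n} k\left(\|x_i^*\|^2\right) \delta\left[b(x_i^*) - u\right]$$
where δ is the Kronecker delta function. The normalization constant C is derived by imposing the condition $\sum_{u=1}^{m} \hat{q}_u = 1$, from where
$$C = \frac{1}{\sum_{i=1}^{n} k\left(\|x_i^*\|^2\right)}$$
since the summation of delta functions for u = 1…m is equal to one.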
Candidate model
Let $\{x_i\}_{i=1\ldots n_h}$ be the normalized pixel locations of the candidate model, centered at y in the current frame. The normalization is inherited from the frame containing the target model. Here we use the same Epanechnikov kernel as for the target model, but with bandwidth h; the probability of the feature u = 1…m in the candidate model is given by
$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta\left[b(x_i) - u\right]$$
where
$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)}$$
is the normalization constant. Note that $C_h$ does not depend on y, because the pixel locations $x_i$ are organized in a regular lattice and y is one of the lattice nodes. The bandwidth h defines the scale of the target candidate.
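As an illustration of the target and candidate models, the following minimal Python sketch builds a kernel-weighted histogram. It is not the implementation used in this work: the function names, the representation of a pixel as a `((x, y), bin_index)` pair, and the assumption that the bin index has already been computed by the quantization function b are all choices made for this example.

```python
def epanechnikov_profile(r2):
    """Profile k(r^2) of the Epanechnikov kernel, up to a constant factor."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def weighted_histogram(pixels, center, h, m):
    """Kernel-weighted, normalized histogram with m bins.

    pixels: iterable of ((x, y), u), where u in 0..m-1 is the bin index of
            the pixel in the quantized feature space (assumed precomputed).
    center: kernel center (0, 0) for the target model, y for a candidate.
    h:      bandwidth (1.0 for the target model in normalized coordinates).
    """
    hist = [0.0] * m
    for (x, y), u in pixels:
        r2 = ((x - center[0]) / h) ** 2 + ((y - center[1]) / h) ** 2
        hist[u] += epanechnikov_profile(r2)
    total = sum(hist)  # equals 1/C (or 1/C_h), since each pixel hits one bin
    if total == 0.0:
        return hist
    return [value / total for value in hist]
```

Calling `weighted_histogram(pixels, (0.0, 0.0), 1.0, m)` gives a target model, and calling it with a center y and bandwidth h gives a candidate model; pixels near the center dominate the histogram, while pixels at or beyond the bandwidth contribute nothing.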
Similarity measure
The similarity function defines a distance between the target model and the candidate model. The Bhattacharyya coefficient, which evaluates the similarity of the target model and the candidate model, is defined as
$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\, \hat{q}_u} \qquad (4.16)$$
To find the location corresponding to the target in the current frame, the Bhattacharyya coefficient in Eq. (4.16) should be maximized as a function of y, which can be done by running the mean-shift iterations.
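The Bhattacharyya coefficient can be illustrated with a few lines of Python. This is a sketch only; the function names are chosen for the example, and the distance $d = \sqrt{1 - \rho}$ is the definition commonly paired with this coefficient, assumed here rather than taken from the text.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_u * q_u) for two normalized histograms."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Assumed distance d = sqrt(1 - rho); clamped against rounding error."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))
```

Identical histograms give a coefficient of 1 (distance 0), and histograms with no common non-empty bin give a coefficient of 0 (distance 1).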
Object localization
Color and independent component (IC) information were chosen as the features; however, the same framework can be used for texture and edges, or any combination of them. In the sequel, it is assumed that the following information is available:
a. detection and localization, in the initial frame, of the objects to track;
b. periodic analysis of every object, to account for possible updates of the target models due to significant changes in color.
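Minimizing the distance $d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]}$ is equivalent to maximizing the Bhattacharyya coefficient $\hat{\rho}(y)$. The search for the new target location in the current frame starts at the location $\hat{y}_0$ of the target in the previous frame, so the probabilities $\{\hat{p}_u(\hat{y}_0)\}_{u=1\ldots m}$ of the candidate at location $\hat{y}_0$ in the current frame have to be computed first. Using a Taylor expansion around the values $\hat{p}_u(\hat{y}_0)$, the Bhattacharyya coefficient is approximated as
$$\rho[\hat{p}(y), \hat{q}] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \qquad (4.17)$$
where
$$w_i = \sum_{u=1}^{m} \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}\; \delta\left[b(x_i) - u\right] \qquad (4.18)$$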
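In order to minimize the distance d(y), the second term in Eq. 4.17 has to be maximized, the first term being independent of y. The second term represents the density estimate computed with the kernel function at y in the current frame, with the data being weighted by Eq. 4.18. In this process, the kernel is recursively moved from the current location $\hat{y}_0$ to the new location $\hat{y}_1$ according to the relation
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)} \qquad (4.19)$$
where $g(x) = -k'(x)$. For the Epanechnikov kernel, g is constant inside the kernel support, so $\hat{y}_1$ reduces to the weighted average of the pixel locations.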
Fig. 4-7 Mean-shift algorithm flow chart
Given the target model $\{\hat{q}_u\}_{u=1\ldots m}$ and its location $\hat{y}_0$ in the previous frame, set the initial previous similarity value to 0. The mean-shift algorithm then proceeds as follows:
1. Initialize the location of the target in the current frame with $\hat{y}_0$.
2. Derive the weights $\{w_i\}$ according to Eq. 4.18.
3. Find the next location $\hat{y}_1$ of the candidate according to Eq. 4.19.
4. Compute $\{\hat{p}_u(\hat{y}_1)\}_{u=1\ldots m}$ and evaluate $\rho[\hat{p}(\hat{y}_1), \hat{q}]$. Compare it with the previous similarity value as shown in Fig. 4-7: either go to step 2, or set $\hat{y}_0 \leftarrow \hat{y}_1$ and break.
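The iteration can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of this work: pixels are `((x, y), bin_index)` pairs, the kernel is the Epanechnikov kernel (so the new location is the weighted mean of the pixel locations inside the bandwidth), and the loop stops when the shift falls below a hypothetical threshold `eps`, which may differ from the similarity-based test of the flow chart.

```python
import math


def _r2(pos, y, h):
    """Squared normalized distance between a pixel and the kernel center."""
    return ((pos[0] - y[0]) / h) ** 2 + ((pos[1] - y[1]) / h) ** 2


def candidate_histogram(pixels, y, h, m):
    """Epanechnikov-weighted normalized histogram of the window at y."""
    hist = [0.0] * m
    for pos, u in pixels:
        r2 = _r2(pos, y, h)
        if r2 < 1.0:
            hist[u] += 1.0 - r2
    total = sum(hist)
    return [v / total for v in hist] if total > 0.0 else hist


def mean_shift(pixels, q, y0, h, eps=0.5, max_iter=20):
    """Move the window from y0 toward the mode of the weighted density."""
    y = y0
    for _ in range(max_iter):
        p = candidate_histogram(pixels, y, h, len(q))
        sx = sy = sw = 0.0
        for pos, u in pixels:
            if _r2(pos, y, h) < 1.0 and p[u] > 0.0:
                w = math.sqrt(q[u] / p[u])  # weight of this pixel's bin
                sx += w * pos[0]
                sy += w * pos[1]
                sw += w
        if sw == 0.0:  # no pixel in the window supports the target model
            break
        y_new = (sx / sw, sy / sw)  # weighted mean of pixel locations
        shift = math.hypot(y_new[0] - y[0], y_new[1] - y[1])
        y = y_new
        if shift < eps:  # assumed stopping rule for this sketch
            break
    return y
```

On a synthetic image in which only a small blob falls into the target's bin, starting the window a few pixels away from the blob moves the estimate onto the blob center within a few iterations.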
4.4 ROI resizing
In real situations, a person may walk toward or away from the camera, so a fixed ROI scale is not suitable: the ROI will either contain background pixels or cover only part of the person. This degrades the tracking result, as shown in Fig. 4-8, and produces a similarity value smaller than the threshold value. The ability to resize the ROI is therefore important. In our system the ROI scale is adjusted every 100 frames.
Fig. 4-8 ROI scale larger than object
Fig. 4-9 ROI resize flow chart
In order to adjust the ROI scale adaptively, we first use temporal differencing to find the foreground image and its position; the temporal difference is computed only inside the ROI region. A dilation process is then applied to gradually enlarge the boundaries of the foreground pixels and to link the broken boundary parts produced by temporal differencing.
The projection of the dilated image onto the x-axis and y-axis produces the current width ($width_{current}$) and height ($height_{current}$), as shown in Fig. 4-10. Finally, the new ROI size is determined with Eq. 4.20. Sometimes the dilation process is unable to link the broken boundaries, and the projection onto the x-axis and y-axis then produces an ROI smaller than the actual human, so we set a minimum size for the ROI. After the ROI scale is determined, the target model is updated as well. The update uses the histogram as shown in Eq. 4.21, where α is set to 0.6.
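The projection and update steps can be sketched as below. This is a hedged illustration only: the binary mask is assumed to be the dilated difference image restricted to the ROI, the new size is taken directly from the projected extent and clamped to a minimum (Eq. 4.20 may combine it with the previous size), and the blend in `update_histogram` assumes that α weights the old model, which is a guess at the form of Eq. 4.21.

```python
def projection_extent(mask):
    """Width and height of the foreground from x- and y-axis projections.

    mask: list of rows, each a list of 0/1 values (dilated difference image).
    """
    col_sums = [sum(col) for col in zip(*mask)]  # projection onto the x-axis
    row_sums = [sum(row) for row in mask]        # projection onto the y-axis
    cols = [i for i, s in enumerate(col_sums) if s > 0]
    rows = [j for j, s in enumerate(row_sums) if s > 0]
    if not cols or not rows:
        return (0, 0)
    return (cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1)


def resize_roi(mask, min_size):
    """New ROI size, never smaller than the (hypothetical) minimum size."""
    width, height = projection_extent(mask)
    return (max(width, min_size[0]), max(height, min_size[1]))


def update_histogram(q_old, q_current, alpha=0.6):
    """Blend the old and current models; alpha on the old one is assumed."""
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(q_old, q_current)]
    total = sum(mixed)
    return [v / total for v in mixed] if total > 0.0 else mixed
```

Clamping to a minimum size covers the case described above, where broken boundaries make the projected extent smaller than the person.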
Fig. 4-10 X-axis and Y-axis image projection
Fig. 4-11 and Fig. 4-12 show the ROI resizing with a fixed camera and an active camera, respectively.
Fig. 4-11 Fixed camera condition. (a), (b) Left: difference image; right: image after ROI resizing.
Fig. 4-12 Active camera condition. (a) Original image. (c-e) Left: difference images; right: images after ROI resizing.
4.5 Kalman filter
Fig. 4-13 Kalman filter flow chart
Fig. 4-13 shows the Kalman filter algorithm, which is used to predict the target location when the target is occluded by another object or when the Bhattacharyya similarity value is smaller than a threshold value (here 0.65). If the Count is smaller than 5, the object is considered occluded and the predicted position is used; after 5 such frames the target model histogram is updated by Eq. 4.21.
In our system, the Kalman filter is integrated into the mean-shift object tracking method. First, the Kalman filter is initialized with the mean-shift target position. Second, the search result of mean-shift is fed back as the measurement of the Kalman filter and used to estimate its parameters.
We assume that the transition noise $W_k$ and the measurement noise $V_k$ are Gaussian random variables with zero mean, so their probability density functions are N[0, Q(t-1)] and N[0, R(t)], where the covariance matrices Q(t-1) and R(t) are referred to as the transition noise covariance matrix and the measurement noise covariance matrix. Here $X_k = [x, y, V_x, V_y]^T$ is the state of the system at moment k, and $Z_k = [x, y]^T$ is the measurement of the system state at moment k. x and $V_x$ are the horizontal position and velocity, respectively, and y and $V_y$ are the vertical position and velocity. The values of the state transition matrix A, the measurement matrix H, the process noise covariance matrix Q and the measurement noise covariance matrix R are listed in Eq. 4.23.
The detailed procedure of Kalman position prediction is listed below:
1. Predict the position $\hat{x}'_k$ of the target at moment k with the Kalman filter, and compute the prior error estimate covariance.
2. Centered on the predicted position $\hat{x}'_k$, acquire the observation value $Z_k$ according to Eq. 4.22(b).
3. Correct the prediction with the measurement using the Kalman filter: compute the revision (gain) matrix and update the posterior state estimate as well as the posterior error estimate covariance.
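The predict and correct steps above can be sketched in plain Python for the state $[x, y, V_x, V_y]^T$. The sketch assumes a constant-velocity transition matrix with a time step of one frame and a measurement matrix that selects the position; these matrices, the helper names, and any noise covariances passed in are illustrative assumptions and not the values of Eq. 4.23.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b, sign=1.0):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Assumed constant-velocity model with a time step of one frame.
A = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
H = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]


def predict(x, P, Q):
    """Step 1: prior state (4x1 column) and prior error covariance."""
    x_prior = matmul(A, x)
    P_prior = add(matmul(matmul(A, P), transpose(A)), Q)
    return x_prior, P_prior


def correct(x_prior, P_prior, z, R):
    """Step 3: gain, posterior state and posterior error covariance."""
    Ht = transpose(H)
    S = add(matmul(matmul(H, P_prior), Ht), R)
    K = matmul(matmul(P_prior, Ht), inverse_2x2(S))
    innovation = add(z, matmul(H, x_prior), sign=-1.0)
    x_post = add(x_prior, matmul(K, innovation))
    P_post = matmul(add(identity(4), matmul(K, H), sign=-1.0), P_prior)
    return x_post, P_post
```

In use, `predict` supplies the position at which the mean-shift search is centered (step 2), and the mean-shift result is passed to `correct` as the measurement `z`, a 2x1 column.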
Here $V_x'$ and $V_y'$ are the x and y motions, respectively. In most applications, only the motion between the current and the previous frame is considered. If the object keeps moving in the same direction, predicting the new position from two frames gives a small error with respect to the actual position. If the object changes direction, the motion between two frames does not represent its movement well, and the predicted position deviates from the actual one.
We therefore consider the motion over more frames and use the average motion over 5 frames, which represents the moving direction better:
$$V_x' = \frac{1}{f} \sum_{i=k-f+1}^{k} (x_i - x_{i-1}) \qquad (4.25)$$
$$V_y' = \frac{1}{f} \sum_{i=k-f+1}^{k} (y_i - y_{i-1}) \qquad (4.26)$$
In both formulas, $x_i$ and $y_i$ are the horizontal and vertical coordinates of the target center in frame i, respectively, and f is the number of consecutive frames. The distance between the Kalman and mean-shift positions is calculated as the Euclidean distance.
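A short Python sketch of the averaged motion is given below. It assumes that the motion is the mean of the frame-to-frame displacements of the target center over the last f frames; the function names and the list-of-centers input are chosen for the example.

```python
def average_motion(centers, f=5):
    """Mean per-frame displacement over the last f frames.

    centers: list of (x, y) target centers, oldest first, newest last.
    """
    n = min(f, len(centers) - 1)  # use fewer frames if history is short
    if n <= 0:
        return (0.0, 0.0)
    recent = centers[-(n + 1):]
    vx = sum(b[0] - a[0] for a, b in zip(recent, recent[1:])) / n
    vy = sum(b[1] - a[1] for a, b in zip(recent, recent[1:])) / n
    return (vx, vy)


def predict_next(centers, f=5):
    """Next center obtained by adding the averaged motion to the last one."""
    vx, vy = average_motion(centers, f)
    x, y = centers[-1]
    return (x + vx, y + vy)
```

With a steadily moving target, f = 2 and f = 5 give the same motion; when the target zigzags, the two estimates differ, which is the situation compared in Table 4-1 and Table 4-2.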
In this section we compare the position predicted by the Kalman filter with the mean-shift position. This comparison helps us assess the accuracy of the predicted position. We use Eq. 4.27 to calculate the error between mean-shift and the Kalman filter.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right| \qquad (4.27)$$
where $x_i$ is the mean-shift position and $\hat{x}_i$ is the position predicted by the Kalman filter.
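For reference, Eq. 4.27 corresponds to the following small Python function, applied separately to the x and y coordinates. The function name and the error handling are choices made for this sketch.

```python
def mean_absolute_error(tracked, predicted):
    """MAE between mean-shift positions and Kalman-predicted positions."""
    if not tracked or len(tracked) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(tracked, predicted)) / len(tracked)
```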
We conducted experiments in both indoor and outdoor environments. In the indoor environment, the person does not walk along a fixed direction or path, whereas in the outdoor environment the person walks in a constant direction.
Fig. 4-14 Indoor environment (a) frames 643 and 662 (b) frames 679 and 710 (c) frames 718 and 728 (d) frames 761 and 797
The MAE is calculated separately in the x-direction and the y-direction. Fig. 4-15 and Fig. 4-17 show the curves of the x and y positions, where the green line and the blue line indicate the mean-shift position and the Kalman filter prediction, respectively. The MAE analysis uses the motion of the previous 2 frames and of the previous 5 frames. Table 4-1 shows that the MAE with 5-frame motion is smaller than with 2-frame motion, whereas Table 4-2 shows that the two do not differ greatly. The reason is that the MAE values in Table 4-1 come from a person walking in several directions, so using more frames to compute the Kalman prediction gives a better position than using 2 frames. In the case of Table 4-2, the person walks in a constant direction, so 2-frame motion is sufficient for the Kalman prediction.
Fig. 4-15 (a) Previous 2-frame motion: x position (left) and y position (right) (b) Previous 5-frame motion: x position (left) and y position (right)
Table 4-1 MAE in different frame motion
| Motion used | X position error (pixel) | Y position error (pixel) | Distance |
|---|---|---|---|
| Previous 2 frame motion | 18.25 | 3.74 | 18.9 |
| Previous 5 frame motion | 5.02 | 2.15 | 6.02 |
Fig. 4-16 Outdoor environment (a) frames 708 and 738 (b) frames 762 and 783 (c) frames 812 and 837 (d) frames 847 and 857
Fig. 4-17 (a) Previous 2-frame motion: x position (left) and y position (right) (b) Previous 5-frame motion: x position (left) and y position (right)
Table 4-2 MAE in different frame motion
| Motion used | X position error (pixel) | Y position error (pixel) | Distance |
|---|---|---|---|
| Previous 2 frame motion | 3.98 | 3.6 | 5.89 |
| Previous 5 frame motion | 4.07 | 3.619 | 5.49 |
Chapter 5
Experimental results
In this chapter, we present the human detection and tracking system on an active camera. Our algorithm was implemented on a PC with an Intel Core2 Quad 2.4 GHz CPU and 2 GB RAM, and was developed in Borland C++ Builder 6.0 on Windows XP. Because the system runs in real-time video surveillance with an active pan-tilt-zoom camera, we carried out experiments to test its performance and stability in several kinds of environments.
Section 5.1 introduces the experimental environment. In Section 5.2, we test the Kalman position prediction, compare it with the mean-shift position, and calculate the error by MAE (mean absolute error); the Kalman filter is used in real-time operation to solve the object occlusion problem. Section 5.3 covers experiments with a single object in the scene, and Section 5.4 covers experiments with multiple objects in the scene.
5.1 Environment setup
The experiments take place in our laboratory. The environment is complex enough to verify our system for detecting and tracking a moving person. Fig. 5-1 shows several images of the laboratory environment without zoom in/out operation, and Fig. 5-2 shows several images under zoom in/out conditions.
Fig. 5-1 Experimental environment
Fig. 5-2 Experiment zoom condition
5.2 Kalman filter
The following two figures (Fig. 5-3 and Fig. 5-4) compare tracking under occlusion without and with the Kalman filter. In Fig. 5-3, when the person passes behind the red screen, the Bhattacharyya coefficient becomes smaller than the threshold, so at frame 227 the mean-shift tracker loses its lock on the object because of the occlusion. In Fig. 5-4, the Kalman filter is embedded in the mean-shift algorithm and the occlusion at frame 227 is handled, as shown in Table 5-1. We can also observe that the person remains locked even when occluded for a long time.
Because the target object histogram is updated, the Kalman filter is not always used during occlusion. Updating the histogram during occlusion increases tracking accuracy and precision.
There is one occlusion test case in the indoor environment: a man walks and is then occluded by a tall, long screen. In this situation the similarity measure drops sharply, so the Kalman filter is used to predict the position.
Table 5-1 Predicted position
| Frame number | Similarity | Object center |
|---|---|---|
| 225 | 0.6874 | (167,153) |
| 226 | 0.6366 | (165,153) |
| 227 | 0.5933 | (176,143) |
| 259 | 0.6937 | (112,118) |
Fig. 5-3 Human tracking using mean-shift. (a) frames 194 and 206 (b) frames 225 and 234 (c) frames 243 and 246 (d) frames 255 and 277
Fig. 5-4 Human tracking using mean-shift and the Kalman filter. (a) frames 186 and 210 (b) frames 224 and 227 (c) frames 229 and 233 (d) frames 255 and 263 (e) frames 269 and 278
5.3 Active camera with single object experiment
In this section we experiment with a single object moving in the scene while the active camera smoothly follows it. There are 5 topics that discuss how one object is tracked in various situations with different performance.
5.3.1 9 regions and 25 regions experiments
There are many methods to control the direction of an active camera, for example motion-based or position-based methods. In this thesis we use a position-based method to control the camera pan/tilt directions. The image size is 320x240, and we divide it into 9 or 25 regions. Every region implies a direction and a speed, so when the object lies in one of these regions the algorithm sends a command telling the active camera to pan/tilt accordingly.
With 9 regions, every region has a fixed speed and the motion is ended by a stop command, which limits the pan/tilt angle. Visually, the camera repeatedly stops and goes, as shown in Fig. 5-5, and this stop-and-go phenomenon is uncomfortable for the observer. The directions of the 9 regions are shown in Fig. 5-6. In real situations the object will be missed.
In order to solve the stop-and-go phenomenon, the image is divided into 25 regions, as shown in Fig. 2-5 of Chapter 2. Each region has a different speed and the stop command is no longer used, so the pan/tilt angle is not limited. The stop-and-go problem is thereby solved, as shown in Fig. 5-7.
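The position-based control can be sketched as a mapping from the object center to a grid cell. The sketch below is an assumption-laden illustration: the function name is hypothetical, the returned pair is a signed pan/tilt level relative to the central cell, and using the distance from the center as a speed level is a guess at how the 25 regions are assigned different speeds; the actual commands sent to the camera are not shown.

```python
def region_command(cx, cy, grid=5, width=320, height=240):
    """Map an object center to a (pan, tilt) level on a grid x grid layout.

    pan  < 0: turn left,  pan  > 0: turn right
    tilt < 0: turn up,    tilt > 0: turn down
    The magnitude is used as a hypothetical speed level; (0, 0) means the
    object is in the central region and the camera does not need to move.
    """
    col = min(max(int(cx * grid / width), 0), grid - 1)
    row = min(max(int(cy * grid / height), 0), grid - 1)
    mid = grid // 2
    return (col - mid, row - mid)
```

With `grid=3` the levels are limited to -1, 0 and 1, which corresponds to a single fixed speed per direction; with `grid=5` the outer ring of regions yields a larger level than the inner ring.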
Fig. 5-5 9 regions pan/tilt control
Fig. 5-6 Position-based control using 9 regions (a) frames 180, 244 and 272 (b) frames 345, 383 and 445 (c) frames 510, 560 and 564 (d) frames 567, 595 and 601
Fig. 5-7 Position-based control using 25 regions (a) frames 127, 189 and 213 (b) frames 241, 280 and 232 (c) frames 366, 385 and 418 (d) frames 473, 565 and 612 (e) frames 693, 775 and 843
5.3.2 Color spaces experiments
In Section 5.3.1, the two samples use HSV as the color feature. We can observe that this color space keeps the ROI on the object, as shown in Fig. 5-7, so HSV is the main color space used with the active camera in our experiments. As the object walks through the scene, changes in lighting alter its appearance; when the illumination becomes brighter, the object looks different from the original target. The Y'UV and RGB color spaces are compared with HSV. Fig. 5-8 shows our algorithm using the RGB color space. In these figures the object is tracked smoothly through the scene, but the ROI is not always on the object's body; sometimes it falls on the floor, and the similarity value drops sharply. This shows that the RGB color space is more sensitive to lighting. In Fig. 5-9, the Y'UV color space performs better than RGB.
Fig. 5-8 Using the RGB color space (a) frames 169, 215 and 283 (b) frames 328, 391 and 446 (c) frames 477, 506 and 537 (d) frames 623, 633 and 675
Fig. 5-9 Using the Y'UV color space (a) frames 134, 175 and 215 (b) frames 272, 306 and 325 (c) frames 401, 449 and 477 (d) frames 497, 528 and 582
5.3.3 Using color and ICA features in human tracking experiment
In this section, we test the ICA features embedded in our mean-shift algorithm. In Chapter 4, we described how to combine color and ICA features in the mean-shift algorithm. The purpose of the ICA features is to solve the tracking-miss problem. When the target object is tracked by the mean-shift algorithm and a background or non-human object has the same color as the target, the tracking system may lose the target. The ICA features are used to solve this tracking-miss problem when the target object and a non-human object have the same color.
There are 4 cases in which the moving object and the background have the same color. In the Chair case, a man moves through the scene dragging a chair that has the same color as him. In the Chair2 and Chair3 cases, a chair is already in the scene, then the person appears and walks near the chair. In the Same color case, the man's clothes have the same color as the floor. All four samples share the property that the person has the same color as background objects, so we use color alone and ICA embedded in color as features and observe the tracking performance.
In Fig. 5-10 to Fig. 5-13, the plain line, the line with crosses and the line with circles are the ground truth, the color feature tracking position and the ICA-embedded color feature tracking position, respectively. All of them are presented as x and y positions in 2-D coordinates.
In Fig. 5-10, the line with crosses has a large error relative to the ground truth line: the person and the chair have the same color, so they yield the same similarity and the tracking system follows the chair. Table 5-2 shows that the MAE (mean absolute error) with the color feature is 33.573. The line with circles has a small error relative to the ground truth line, because the human and the non-human object have different characteristics even when their color is the same; Table 5-2 shows that the MAE with the ICA + color feature is 14.59. In Fig. 5-11, the ICA feature embedded in the color feature still performs well. Fig. 5-12 and Fig. 5-13 are the Chair2 and Same color samples, respectively.
Fig. 5-10 Sample of Chair. (a) x position (b) y position
Fig. 5-11 Sample of Chair3. (a) x position (b) y position
Fig. 5-12 Sample of Chair2. (a) x position (b) y position
Fig. 5-13 Sample of Same color. (a) x position (b) y position
Table 5-2 Color and ICA feature in different cases
| Sample | Color feature MAE | ICA + color feature MAE |
|---|---|---|
| Chair | 33.573 | 14.59 |
| Chair3 | 68.97 | 21.4 |
| Chair2 | 16.1 | 16.3 |
| Same color | 21.5 | 19.2 |
Fig. 5-14 shows that with color as the only feature, tracking fails when the person and the chair have the same color. In Fig. 5-15, both color and ICA features are used, and the tracking system does not lose the target when the person and the chair have the same color. ICA resolves this situation because the ICA features capture human characteristics, so the ICA features of the person and of the chair are not similar.
Fig. 5-14 Using the color feature. (a) frames 619 and 634 (b) frames 645 and 677 (c) frames 746 and 761 (d) frames 792 and 813
Fig. 5-15 Using color and ICA features. (a) frames 369 and 422 (b) frames 453 and 518 (c) frames 593 and 638 (d) frames 694 and 723
5.3.4 Human tracking by zoom in/out control experiment
Human tracking by active camera control is successfully implemented in our system: it keeps the target object in the field of view and at the center of the monitor screen. However, the target object may move far away from the camera, so that it becomes too small and its resolution too low. The zoom in/out operation is used to address this problem. In Chapter 4, we introduced the ROI resizing method, which is used successfully in human tracking while the camera is moving; ROI resizing can also drive the zoom in/out operation. When the ROI scale is adjusted, if the new ROI is smaller than the old one, zoom out is performed; otherwise, if the new ROI is larger than the old one, zoom in is performed, as shown in Fig. 5-16.
Fig. 5-16 Zoom in/out tracking (a) original image at frame 494 (b) zoom in at frame 529 (c) zoom out at frame 744 (d) zoom in at frame 798
5.4 Human tracking in multiple objects experiment
In the previous section, we tested single-object tracking with our algorithm on the active camera. It achieves good performance: pan/tilt control, the zoom in/out operation, the ICA feature and ROI resizing all improve tracking robustness in the single-object situation. In general, however, there is not always only a single object in the scene, so in this section we experiment with multiple objects in the scene. Other objects may occlude the target object or cross paths with it, so occlusion and crossing are the main topics of this section.
Fig. 5-17 shows the target object locked by our tracking algorithm; after a period of time, a person in red clothes walks into the scene. The person will across