Chapter 4 Human tracking
4.5 Kalman filter
Fig. 4-13 Kalman filter flow chart
Fig. 4-13 shows the Kalman filter algorithm, which is used to predict the target location when the target is occluded by another object or when the Bhattacharyya similarity value is smaller than a threshold (in this case we use a threshold of 0.65).
If the Count is smaller than 5, the object is considered occluded and the predicted
frames the target model histogram will be updated by Eq. 4.21.
In our system, the Kalman filter is integrated into the mean-shift object tracking method. First, the Kalman filter is initialized with the mean-shift target position. Second, the search result of mean-shift is fed back as the measurement of the Kalman filter and used to estimate its parameters.
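The hand-off between the two trackers can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function names `bhattacharyya` and `select_position` are hypothetical, the histograms are assumed to be normalized, and the threshold is the 0.65 value quoted above.

```python
import math

SIMILARITY_THRESHOLD = 0.65  # threshold value quoted in the text


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def select_position(target_hist, candidate_hist, meanshift_pos, kalman_pos):
    """Return (position, source) for the current frame.

    Hypothetical helper: trust mean-shift while the candidate still
    resembles the target model, otherwise use the Kalman prediction.
    """
    rho = bhattacharyya(target_hist, candidate_hist)
    if rho < SIMILARITY_THRESHOLD:
        return kalman_pos, "kalman"
    return meanshift_pos, "mean-shift"
```

When the two histograms are identical the coefficient is 1 and the mean-shift position is kept; when they do not overlap at all the coefficient is 0 and the Kalman prediction is returned.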
We assume that $W_k$ and $V_k$ are zero-mean Gaussian random variables, so their probability density functions are $N[0, Q(t-1)]$ and $N[0, R(t)]$, where the covariance matrices $Q(t-1)$ and $R(t)$ are referred to as the transition noise covariance matrix and the measurement noise covariance matrix. Here $X_k = [x, y, V_x, V_y]^T$ is the state of the system at moment k, and $Z_k = [x, y]^T$ is the measured value of the system state at moment k. $x$ and $V_x$ are the horizontal position and velocity, respectively. The values of the state transition matrix A, measurement matrix H, process noise covariance matrix Q and measurement noise covariance matrix R are listed in Eq. 4.23.
The detailed procedure of Kalman position prediction is listed below:
1. Predict the position of the target at moment k with the Kalman filter, and compute the prior error estimate covariance.
2. Centered on the predicted position $\hat{x}'_k$, acquire the observation value $Z_k$ according to Eq. 4.22(b).
3. Correct the measurement with the Kalman filter: compute the revision matrix and update the posterior state estimate as well as the posterior error estimate covariance.
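The predict and correct steps can be sketched with NumPy as follows. The matrices here are assumptions, since Eq. 4.23 is not reproduced in this text: A is a constant-velocity model with a time step of one frame, H measures position only, and the values of Q, R and the initial covariance are illustrative. The names `predict` and `correct` are hypothetical.

```python
import numpy as np

# Assumed constant-velocity model, state [x, y, Vx, Vy], one frame per step.
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # state transition matrix
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # measurement matrix (position only)
Q = np.eye(4) * 1e-2                         # assumed process noise covariance
R = np.eye(2) * 1.0                          # assumed measurement noise covariance


def predict(x, P):
    """Step 1: prior state estimate and prior error covariance."""
    x_prior = A @ x
    P_prior = A @ P @ A.T + Q
    return x_prior, P_prior


def correct(x_prior, P_prior, z):
    """Step 3: gain matrix, posterior state and posterior error covariance."""
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)
    x_post = x_prior + K @ (z - H @ x_prior)
    P_post = (np.eye(4) - K @ H) @ P_prior
    return x_post, P_post


# Usage: a target moving 2 pixels per frame in x at constant y.
x = np.zeros(4)
P = np.eye(4) * 100.0                        # large initial uncertainty
for k in range(1, 21):
    x, P = predict(x, P)
    z = np.array([2.0 * k, 5.0])             # step 2: would come from mean-shift
    x, P = correct(x, P, z)
```

After a few frames the estimated velocity approaches the true motion, so during occlusion the prediction step alone can carry the position forward.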
Here $V_x$ and $V_y$ are the motions in the x and y directions, respectively. Most applications consider only the motion between the current and the previous frame. If the moving object keeps moving in the same direction, using two-frame motion to predict the new position gives a small error with respect to the actual position. If the moving object changes direction, two-frame motion is not enough to represent the movement, and the predicted position becomes inaccurate.
Here we consider the motion of more frames to improve accuracy: we average the motion over 5 frames to obtain a more representative moving direction.
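$$V_x = \frac{1}{f}\sum_{i=k-f+1}^{k} (x_i - x_{i-1}) \qquad (4.25)$$
$$V_y = \frac{1}{f}\sum_{i=k-f+1}^{k} (y_i - y_{i-1}) \qquad (4.26)$$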
In both formulas, $x_i$ and $y_i$ are the horizontal and vertical coordinates of the target center, respectively, and f is the number of continuous frames. The distance between the Kalman and mean-shift positions is calculated as the Euclidean distance.
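One way to read the averaging described above is as the mean of the frame-to-frame displacements over the last f frame intervals. The sketch below follows that reading; the function name `average_motion` and the sample track are hypothetical.

```python
def average_motion(centers, f=5):
    """Average per-frame motion (Vx, Vy) over the last f frame intervals.

    centers: list of (x, y) target centers, oldest first.
    At least f + 1 centers are needed to form f intervals.
    """
    if len(centers) < f + 1:
        raise ValueError("need at least f + 1 target centers")
    recent = centers[-(f + 1):]
    pairs = list(zip(recent, recent[1:]))
    vx = sum(b[0] - a[0] for a, b in pairs) / f
    vy = sum(b[1] - a[1] for a, b in pairs) / f
    return vx, vy


# A target that moves right and then turns back (made-up centers).
track = [(100, 50), (104, 50), (108, 51), (112, 51), (110, 52), (106, 52)]
two_frame = average_motion(track, f=2)   # follows only the latest turn
five_frame = average_motion(track, f=5)  # smoothed over the whole window
```

With f = 2 the estimate reacts only to the most recent displacements, while f = 5 averages over the change of direction, which is the trade-off discussed in the text.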
In this section we compare the position predicted by the Kalman filter with the mean-shift position. This comparison helps us assess the accuracy of the predicted position. We use Eq. 4.27 to calculate the error between mean-shift and the Kalman filter.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \hat{x}_i \right| \qquad (4.27)$$
where $x_i$ is the mean-shift position and $\hat{x}_i$ is the position predicted by the Kalman filter.
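The mean absolute error is simple to compute per axis. The following sketch uses a hypothetical function name and made-up coordinates purely for illustration; they are not measurements from the experiments.

```python
def mean_absolute_error(meanshift, kalman):
    """MAE between mean-shift positions and Kalman predictions on one axis."""
    if not meanshift or len(meanshift) != len(kalman):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(meanshift, kalman)) / len(meanshift)


# Made-up x coordinates (pixels) for four frames.
x_meanshift = [120, 124, 129, 133]
x_kalman = [118, 125, 131, 130]
mae_x = mean_absolute_error(x_meanshift, x_kalman)
```

The same function is applied separately to the y coordinates.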
We performed experiments in both indoor and outdoor environments. In the indoor environment, the human does not walk in a fixed direction or along a fixed path, whereas in the outdoor environment the human walks in a constant direction.
Fig. 4-14 Indoor environment. (a) frames 643 and 662 (b) frames 679 and 710 (c) frames 718 and 728 (d) frames 761 and 797
The MAE is calculated separately in the x-direction and the y-direction. Fig. 4-15 and Fig. 4-17 show the curves of the x and y positions, where the green and blue lines indicate the mean-shift position and the Kalman filter prediction, respectively. The MAE analysis uses the motion of the previous 2 frames and of the previous 5 frames. Table 4-1 shows that the MAE with 5-frame motion is smaller than with 2-frame motion, whereas Table 4-2 shows that the MAE with 2-frame and 5-frame motion do not differ greatly. The reason is that the MAE values in Table 4-1 come from a human walking in several directions, so averaging the motion over more frames gives a better Kalman prediction than 2-frame motion. In the case of Table 4-2, the human walks in a constant direction, so 2-frame motion is enough for the Kalman prediction.
Fig. 4-15 x position (left) and y position (right). (a) previous 2-frame motion (b) previous 5-frame motion
Table 4-1 MAE with different frame motion
Motion used               X position error (pixel)   Y position error (pixel)   Distance (pixel)
Previous 2 frame motion   18.25                      3.74                       18.9
Previous 5 frame motion   5.02                       2.15                       6.02
Fig. 4-16 Outdoor environment. (a) frames 708 and 738 (b) frames 762 and 783 (c) frames 812 and 837 (d) frames 847 and 857
Fig. 4-17 x position (left) and y position (right). (a) previous 2-frame motion (b) previous 5-frame motion
Table 4-2 MAE with different frame motion
Motion used               X position error (pixel)   Y position error (pixel)   Distance (pixel)
Previous 2 frame motion   3.98                       3.6                        5.89
Previous 5 frame motion   4.07                       3.619                      5.49
Chapter 5
Experimental results
In this chapter, we present the human detection and tracking system on an active camera. Our algorithm was implemented on a PC with an Intel Core2 Quad 2.4 GHz CPU and 2 GB RAM, and was developed in Borland C++ Builder 6.0 on Windows XP. Because our human detection and tracking system runs in real-time video surveillance with an active pan-tilt-zoom camera, we carried out experiments to test its performance and stability in several kinds of environments.
Section 5.1 introduces the experimental environment. In Section 5.2, we test our Kalman position prediction, compare it with the mean-shift position, and calculate the error by MAE (mean absolute error). The Kalman filter is used in real-time situations to solve the object occlusion problem. Section 5.3 presents experiments with a single object in the scene, and Section 5.4 presents experiments with multiple objects in the scene.
5.1 Environment setup
The experiments take place in our laboratory. The environment is complex enough to verify our system while detecting and tracking a moving human. Fig. 5-1 shows several images of our laboratory environment without zoom in/out operation. Fig. 5-2 shows several images under zoom in/out conditions.
Fig. 5-1 Experimental environment
Fig. 5-2 Experiment zoom condition
5.2 Kalman filter
The following two figures (Fig. 5-3 and Fig. 5-4) compare tracking under occlusion with and without the Kalman filter. In Fig. 5-3, when the human passes the red screen, the Bhattacharyya coefficient falls below the threshold, so in frame 227 mean-shift tracking loses its lock on the object because of the occlusion. In Fig. 5-4, the Kalman filter is embedded in the mean-shift algorithm and the occlusion problem is solved in frame 227, as shown in Table 5-1. We can also observe that the human remains locked even when occluded for a long time.
Because the target object histogram is updated, the Kalman filter is not always needed in occlusion situations. Updating the histogram during occlusion increases tracking accuracy and precision.
There is one occlusion test case in the indoor environment: a man walks and is then occluded by a tall, long screen. In this situation the similarity measure drops drastically, so the Kalman filter is used to predict the position.
Table 5-1 Predicted position
Frame number   Similarity   Object center
225            0.6874       (167,153)
226            0.6366       (165,153)
227            0.5933       (176,143)
259            0.6937       (112,118)
Fig. 5-3 Human tracking using mean-shift. (a) frames 194 and 206 (b) frames 225 and 234 (c) frames 243 and 246 (d) frames 255 and 277
Fig. 5-4 Human tracking using mean-shift and the Kalman filter. (a) frames 186 and 210 (b) frames 224 and 227 (c) frames 229 and 233 (d) frames 255 and 263 (e) frames 269 and 278
5.3 Active camera with single object experiment
In this section we test a single object moving in the scene while the active camera smoothly tracks it. We discuss 5 topics that cover a single object in various situations with different performance.
5.3.1 9 regions and 25 regions experiments
There are many methods to control the direction of an active camera, for example motion-based or position-based methods. In this thesis we use a position-based method to control the camera pan/tilt directions. The image size is 320x240, and we divide it into 9 or 25 regions. Every region implies a direction and a speed, so when the object is in one of these regions the algorithm sends a command telling the active camera to pan/tilt.
With 9 regions, every region has a fixed speed and the movement is ended by a stop command, which means the pan/tilt angle is limited by the stop command. Visually, the camera repeatedly stops and goes, as shown in Fig. 5-5. This stop-and-go phenomenon is uncomfortable for the observer. The directions of the 9 regions are shown in Fig. 5-6. In real situations the object will be missed.
To solve the stop-and-go phenomenon, the image is divided into 25 regions, as shown in Fig. 2-5 of Chapter 2. Each region has a different speed and the stop command is no longer used, so the pan/tilt angle is not limited. The stop-and-go problem is thus solved, as shown in Fig. 5-7.
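Position-based control with 25 regions can be sketched as a lookup from the target center to a grid cell. The layout below is an assumption (a uniform 5 x 5 grid whose speed level grows with distance from the central cell), since the actual region layout and speeds of Fig. 2-5 are not reproduced here; `region_command` is a hypothetical name.

```python
WIDTH, HEIGHT = 320, 240   # image size given in the text
GRID = 5                   # assumed 5 x 5 layout = 25 regions


def region_command(cx, cy):
    """Map a target center to (pan, tilt) speed levels in -2..2.

    Assumed convention: 0 means the axis is already centered, the sign
    gives the direction (pan > 0 is right, tilt > 0 is down) and the
    magnitude is a speed level that grows away from the central region.
    """
    if not (0 <= cx < WIDTH and 0 <= cy < HEIGHT):
        raise ValueError("target center outside the image")
    col = int(cx * GRID // WIDTH)    # 0..4, left to right
    row = int(cy * GRID // HEIGHT)   # 0..4, top to bottom
    return col - GRID // 2, row - GRID // 2


centered = region_command(160, 120)     # target in the central region
bottom_left = region_command(10, 230)   # target near the bottom-left corner
```

Because the central region maps to a zero speed level, no separate stop command is needed in this sketch: the camera simply slows down as the target approaches the center.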
Fig. 5-5 9 regions pan/tilt control
Fig. 5-6 Position-based control using 9 regions. (a) frames 180, 244 and 272 (b) frames 345, 383 and 445 (c) frames 510, 560 and 564 (d) frames 567, 595 and 601
Fig. 5-7 Position-based control using 25 regions. (a) frames 127, 189 and 213 (b) frames 241, 280 and 232 (c) frames 366, 385 and 418 (d) frames 473, 565 and 612 (e) frames 693, 775 and 843
5.3.2 Color spaces experiments
In Section 5.3.1, both samples use HSV as the color feature. We can observe that this color space keeps the focus on the object, as shown in Fig. 5-7, so HSV is the main color space used with the active camera in our experiments. As the object walks through the scene, changes in lighting alter the object's appearance; when the object becomes brighter, its appearance differs from the original. The Y’UV and RGB color spaces are compared with HSV. Fig. 5-8 shows the RGB color space used in our algorithm. In these figures the object is tracked smoothly through the scene, but the ROI does not always stay on the object's body. Sometimes it falls on the floor, and the similarity value drops drastically. This shows that the RGB color space is more sensitive to lightness. In Fig. 5-9, the Y’UV color space performs better than RGB.
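The sensitivity of RGB to lightness can be illustrated with Python's standard `colorsys` module. The sketch models a lighting change as a uniform scaling of the RGB values, which is an idealized assumption (real illumination changes are more complex), and the sample color is made up.

```python
import colorsys


def scale_brightness(rgb, factor):
    """Idealized lighting change: scale every RGB channel by the same factor."""
    return tuple(channel * factor for channel in rgb)


clothing = (0.8, 0.4, 0.2)                  # made-up color, channels in [0, 1]
shaded = scale_brightness(clothing, 0.5)    # same surface under half the light

h1, s1, v1 = colorsys.rgb_to_hsv(*clothing)
h2, s2, v2 = colorsys.rgb_to_hsv(*shaded)

# Largest per-channel change in RGB versus the change in hue.
rgb_change = max(abs(a - b) for a, b in zip(clothing, shaded))
hue_change = abs(h1 - h2)
```

Under this model all three RGB channels change, while hue and saturation stay the same and only the value channel changes, which is consistent with the observation that an HSV histogram is less disturbed by lightness than an RGB histogram.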
Fig. 5-8 Using the RGB color space. (a) frames 169, 215 and 283 (b) frames 328, 391 and 446 (c) frames 477, 506 and 537 (d) frames 623, 633 and 675
Fig. 5-9 Using the Y’UV color space. (a) frames 134, 175 and 215 (b) frames 272, 306 and 325 (c) frames 401, 449 and 477 (d) frames 497, 528 and 582
5.3.3 Using color and ICA features in human tracking experiment
In this section, we test the ICA features embedded in our mean-shift algorithm. Chapter 4 described how to combine color and ICA features in the mean-shift algorithm. When the target object is tracked by the mean-shift algorithm and a background or nonhuman object has the same color as the target object, the tracking system may lose the target. The purpose of the ICA features is to solve this tracking-miss problem when the target object and a nonhuman object have the same color.
There are 4 test cases in which the moving object and the background have the same color. In case chair, a man moves through the scene dragging a chair that has the same color as him. In cases chair2 and chair3, a chair is in the scene, then the human appears and walks near the chair. In case same color, the man's clothes have the same color as the floor. All four samples share the characteristic that the human has the same color as background objects. We therefore use the color feature and the ICA-embedded color feature to observe the tracking performance.
In Fig. 5-10 to Fig. 5-13, the plain line, the line with crosses and the line with circles are the ground truth, the color-feature tracking position and the ICA-embedded color-feature tracking position, respectively. All of them are plotted for the x and y positions in 2-D coordinates.
In Fig. 5-10, the color-feature curve deviates strongly from the ground truth: when the human and the chair have the same color, they yield the same similarity, so the tracking system tracks the chair. Table 5-2 shows that the MAE (mean absolute error) with the color feature is 33.573. The curve of the ICA-embedded color feature stays close to the ground truth, because human and nonhuman objects differ in essence even when they have the same color. Table 5-2 shows that its MAE is 14.59. In Fig. 5-11, the ICA-embedded color feature still performs well. Fig. 5-12 and Fig. 5-13 are the samples chair2 and same color, respectively.
Fig. 5-10 Sample of chair. (a) x position (b) y position
Fig. 5-11 Sample of chair3. (a) x position (b) y position
Fig. 5-12 Sample of chair2. (a) x position (b) y position
Fig. 5-13 Sample of same color. (a) x position (b) y position
Table 5-2 MAE of color and ICA features in different cases
Sample       Color feature MAE   ICA + color feature MAE
Chair        33.573              14.59
Chair3       68.97               21.4
Chair2       16.1                16.3
Same color   21.5                19.2
Fig. 5-14 shows that when only color is used as the feature, tracking fails when the human and the chair have the same color. In Fig. 5-15, color and ICA features are both used, and the tracking system does not lose the target when the human and the chair have the same color. The ICA features capture human essence, so the ICA features of the human and of the chair are not similar, which resolves the same-color situation.
Fig. 5-14 Using the color feature. (a) frames 619 and 634 (b) frames 645 and 677 (c) frames 746 and 761 (d) frames 792 and 813
Fig. 5-15 Using color and ICA features. (a) frames 369 and 422 (b) frames 453 and 518 (c) frames 593 and 638 (d) frames 694 and 723
5.3.4 Human tracking by zoom in/out control experiment
Human tracking by active camera control is successfully implemented in our system. It keeps the target object in the field of view and at the center of the monitor screen. However, the target object may sometimes be far away from the camera, so that the object is too small and its resolution is unclear. The zoom in/out operation is used to alleviate this problem. Chapter 4 introduced the ROI scale resize method, which is used successfully in human tracking when the camera is moving. ROI scale resizing can also be used for the zoom in/out operation. When the ROI scale is adjusted, if the new ROI is smaller than the old one, zoom out is performed; otherwise, if the new ROI is larger than the old one, zoom in is performed, as shown in Fig. 5-16.
Fig. 5-16 Zoom in/out tracking. (a) original image at frame 494 (b) zoom in at frame 529 (c) zoom out at frame 744 (d) zoom in at frame 798
5.4 Human tracking in multiple objects experiment
In the previous section, we tested a single object tracked by our algorithm on the active camera. Pan/tilt control, zoom in/out operation, ICA features and ROI scale resizing improve tracking robustness and give good performance in the single-object situation. In general, however, there is not always only a single object in the scene, so in this section we test multiple objects in the scene. Other objects may occlude our target object or cross paths with it, so occlusion and crossing are the main topics of this section.
Fig. 5-17 shows the target object locked by our tracking algorithm; after a period of time, a person in red clothes walks into the scene. The person crosses the target object, and our tracking algorithm and the active camera continue to move smoothly and stably. Fig. 5-18 shows the target object occluded by the person in red clothes. The person does not affect our system, which means that our system has a robust lock ability.
Fig. 5-17 Crossing case. (a) frames 2 and 32 (b) frames 88 and 174 (c) frames 209 and 243
Fig. 5-18 Occlusion case. (a) frames 211 and 252 (b) frames 390 and 403 (c) frames 431 and 572 (d) frames 601 and 627 (e) frames 635 and 667
Chapter 6
Conclusions and Future work
6.1 Conclusions
The experimental results show that the system is capable of tracking moving humans by the mean-shift algorithm with an active camera. The system can track humans not only in single-object scenes but also in multiple-object scenes, as shown in Chapter 5. In low-resolution situations, the system adaptively zooms in/out to adjust the resolution.
There are several contributions in this research:
1. Our system can accurately distinguish humans from nonhuman objects.
2. The extracted ICA features capture human essence, so human tracking can stay locked on the target object even when the target and the background have the same color.
3. In the past, temporal difference was used to determine the position of the moving object, which required the active camera to stay fixed. In our system the active camera can keep moving the whole time, because the temporal difference is neglected.
4. The ROI is automatically resized to match the real size of the target.
6.2 Future work
In our system, the moving human can be detected and tracked smoothly and continuously, but in a complex environment tracking may be lost. For example, very bright background light changes the appearance of the moving human.
We use temporal difference along the x-axis and y-axis to redefine the ROI size of the target object. The experimental results are shown in Chapter 5. This method has a problem: when the active camera moves, temporal difference generally produces blurred and unclear images, which affects our ROI resizing. There are methods to solve this problem. For example, [37] uses different kernel function scales to adjust height and width, which avoids the use of temporal difference.
The active camera is driven by the Pelco P protocol and uses position-based control for pan and tilt. Although the camera direction is controlled successfully, the speed of pan or tilt is fixed. The motion of the moving human could be taken into account and used to drive the speed and direction of the active camera, which would make the camera movement more reliable.
References
[1] Wei Guo, Dong-Liang Bi, Lu Liu, "Human motion tracking based on shape analysis," in Proc. of the 2007 International Conference on Wavelet Analysis and Pattern Recognition, Beijing, China, 2-4 Nov. 2007.
[2] W. J. Gillner, “Motion based vehicle detection on motorways,” in Proc. of the Intelligent Vehicles '95 Symposium, pp.483-487, Sept. 1995.
[3] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, New York, 1992.
[4] D. Marr, E. Hildreth, Theory of edge detection, Proc. R. Soc. London 207 (1980) 197–217.
[5] T. Law, H. Itoh, H. Seki, Image filtering, edge detection, and edge tracing using fuzzy reasoning, IEEE Trans. Pattern Analysis Mach. Intell. 18 (1996) 481–491.
[6] P. H. Batavia, D. E. Pomerleau, and C. E. Thorpe, “Overtaking vehicle detection using implicit optical flow,” IEEE Conference on Intelligent Transportation System, Nov.1997, pp. 729-734.
[7] Dalal N, Triggs B, “Histograms of oriented gradients for human detection”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 886-893, 2005.
[8] L. Zhao and C. E. Thorpe, “Stereo- and neural network-based pedestrian detection,” IEEE Transactions on Intelligent Transportation Systems, Vol. 1, No.3, pp. 148-154, Sept. 2000.
[9] K. Fukunaga, L.D. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition,” IEEE Trans. Inform. Theo. 21 (1) (1975) 32–40.
[10] Montabone S, Soto A, “Human detection using a mobile platform and novel features derived from a visual saliency mechanism”, Image and Vision Computing Volume. 28, Issue.3, pp. 391-402, 2010
[11] M.S. Bartlett, J.R. Movellan and T.J. Sejnowski, “Face recognition by independent component analysis.“ IEEE Transaction on Neural Networks, Vol.13, No. 6 , pp. 1450–1464, 2002.
[12] Y. Ou, X. Wu, H. Qian and Y. Xu, “A Real Time Race Classification System,” IEEE International Conference on Information Acquisition, pp. 378-383, 2005.
[13] P. Viola, M. Jones, and D. Snow, “Detecting Pedestrians Using Patterns of Motion and Appearance,” Int’l J. Computer Vision, vol. 63, no. 2, pp. 153-161, 2005.
[14] Dimitrijevic M, Lepetit V, Fua P, "Human body pose detection using Bayesian spatio-temporal templates", Computer Vision and Image Understanding, Vol.104, issue.2-3, pp.127-139,2006 .
[15] Plamen P, Ognian B and Krasimir M, “Face Detection and Tracking with an Active Camera”, International IEEE Conference Intelligent Systems, 2008 4th.
[16] Smith, A. R., ”Color Gamut Transform Pairs,” Computer Graphics, Vol. 12(3), pp. 12-19, 1978
[17] http://en.wikipedia.org/wiki/HSV_color_space#Conversion_from_RGB_to_HSL_or_HSV
[18] http://www.mathworks.com/access/helpdesk/help/toolbox/images/f8-20792.html
[19] Comaniciu, D., Ramesh, V., and Meer, P. “Kernel-based object tracking.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 5 (2003), 564–577.
[20] D. Freedman and P. Kisilev, " Fast mean shift by compact density representation." IEEE Conference on Computer Vision and Pattern Recognition Pages: 1818-1825 Published: 2009.
[21] Wang F.L., Yu S.Y. and Yang J., "Robust and efficient fragments-based tracking using mean shift." Aeu-International Journal of Electronics and Communications Volume: 64 Issue: 7 Pages: 614-623 Published: 2010
[22] F. Porikli, O. Tuzel, “Multi-kernel object tracking,” IEEE Int. Conf. Multimedia Expo (2005) 1234–1237.
[23] C. Yang, R. Duraiswami, L. Davis, “Efficient mean-shift tracking via a new Similarity measure.” in: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 176–183.
[24] P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: Proceedings of European Conference on Computer Vision, Copenhagen, Denmark, 2002.
[25] K. Nummiaro, E. Koller-Meier, L. Van Gool, An adaptive color based particle filter, Image Vis. Comput. 21 (1) (2003) 99–110.
[26] O. Williams, A. Blake, R. Cipolla, Sparse bayesian learning for efficient visual