Chapter 3 Human tracking
3.6 Occlusion handler
Normally, a particle filter can handle some occlusion, but this depends on how widely the samples are spread in the image plane. If some samples cover the location where the human reappears after the occlusion, the human can still be tracked continuously. On the other hand, if the spread of the samples is too small to cover the region where the human reappears, resampling during the occlusion will lead to loss of the track.
The occlusion handler in our work is based on the color similarity between the target model and the candidate model. The details are described below.
1. Create the candidate model 𝑐 = {𝑐(𝑢)}, 𝑢 = 1…𝑚, from the ROI in the current frame.
2. Compute the similarity value between the target model 𝑞′ = {𝑞′(𝑢)}, 𝑢 = 1…𝑚, and the candidate model 𝑐 = {𝑐(𝑢)}, 𝑢 = 1…𝑚.
3. If 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 < 𝑡ℎ𝑠𝑠𝑖𝑚, skip the resampling step and assume that the candidate model is occluded by another object.
4. Increment the counter: 𝐶𝑜𝑢𝑛𝑡 = 𝐶𝑜𝑢𝑛𝑡 + 1.
5. During tracking, steps 1 to 4 are repeated until the tracked human reappears (the similarity value is larger than 𝑡ℎ𝑠𝑠𝑖𝑚) or 𝐶𝑜𝑢𝑛𝑡 ≥ 10; the latter limit prevents the samples from spreading out of the image.
6. The resampling step is then resumed.
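The steps above can be sketched as a small gate that decides, frame by frame, whether the particle filter should resample. This is only an illustrative sketch: the similarity measure is assumed here to be the Bhattacharyya coefficient between normalized histograms (this section does not state which measure is used), the threshold 0.7 is the value that appears in the flow chart of Fig. 3-8, and the names `OcclusionHandler` and `should_resample` are hypothetical.

```python
import math

SIM_THRESHOLD = 0.7        # threshold value read from the flow chart (Fig. 3-8)
MAX_OCCLUDED_FRAMES = 10   # Count limit from step 5


def bhattacharyya(q, c):
    """Similarity of two normalized histograms (assumed measure, not from the text)."""
    return sum(math.sqrt(qu * cu) for qu, cu in zip(q, c))


class OcclusionHandler:
    """Hypothetical helper that gates the resampling step."""

    def __init__(self):
        self.count = 0

    def should_resample(self, target_hist, candidate_hist):
        similarity = bhattacharyya(target_hist, candidate_hist)
        if similarity < SIM_THRESHOLD and self.count < MAX_OCCLUDED_FRAMES:
            # Candidate looks occluded: skip resampling and count the frame.
            self.count += 1
            return False
        # Target reappeared, or the limit was reached: resample and reset.
        self.count = 0
        return True
```

With this gate, a fully occluded target suppresses resampling for at most ten consecutive frames, after which resampling resumes regardless of the similarity value.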
Fig. 3-7 Occlusion handler (a) frame 681 (b) frame 685 (c) frame 690 (d) frame 695
[Flow chart: PF similarity → similarity < 0.7? If yes and Count < 10: Count++ (resampling skipped). Otherwise: weighted resampling, Count = 0 → End]
Fig. 3-8 Occlusion handler flow chart
Fig. 4-1 Active camera control through RS485
The active camera is controlled with the Pelco P protocol [34] through an RS-232 to RS-485 converter. The pan angle (horizontal direction), tilt angle (vertical direction), and zoom step must all be controlled to achieve tracking.
A Pelco P message consists of 8 bytes, with the format shown in Fig. 4-2.
Byte 1 and byte 7 are the start and stop bytes and are always set to 0xA0 and 0xAF, respectively.
Byte 2 is the receiver (camera) address. In this thesis only one camera is used, so byte 2 is always set to 0x00. Bytes 3 to 6 are used to control pan, tilt, and zoom (PTZ), as shown in Fig. 4-3. The last byte is an XOR checksum byte.
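Byte 1    Byte 2     Byte 3        Byte 4        Byte 5        Byte 6        Byte 7    Byte 8
Start     Address    Data byte 1   Data byte 2   Data byte 3   Data byte 4   Stop      Checksum
(0xA0)                                                                       (0xAF)    (XOR)
Fig. 4-2 Message format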
Bit    Data byte 1      Data byte 2
7      Fixed to 0       Fixed to 0
6      Camera On        Zoom Wide
5      Auto Scan On     Zoom Tele
4      Camera On/Off    Tilt Down
3      Iris Close       Tilt Up
2      Iris Open        Pan Left
1      Focus Near       Pan Right
0      Focus Far        0 (for pan/tilt)

Data byte 3: Pan speed, 00 (stop) to 3F (high speed), and 40 for Turbo
Data byte 4: Tilt speed, 00 (stop) to 3F (high speed)
Fig. 4-3 Data byte 1 to 4 format
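As an illustration of the message format, the following sketch assembles one 8-byte Pelco P message. It assumes that the checksum is the XOR of the first seven bytes, which is how the protocol is commonly described but is not spelled out in the text above. The function name `build_pelco_p` and the constant names are hypothetical; the bit positions follow the data byte 2 layout in Fig. 4-3.

```python
STX = 0xA0  # start byte (byte 1)
ETX = 0xAF  # stop byte (byte 7)

# Direction bits of data byte 2, following the layout in Fig. 4-3
PAN_RIGHT = 0x02
PAN_LEFT = 0x04
TILT_UP = 0x08
TILT_DOWN = 0x10
ZOOM_TELE = 0x20
ZOOM_WIDE = 0x40


def build_pelco_p(address, data1, data2, pan_speed, tilt_speed):
    """Return the 8-byte message; the checksum is assumed to be XOR of bytes 1 to 7."""
    body = [STX, address, data1, data2, pan_speed, tilt_speed, ETX]
    if any(not 0 <= b <= 0xFF for b in body):
        raise ValueError("every field must fit in one byte")
    checksum = 0
    for b in body:
        checksum ^= b
    return bytes(body + [checksum])


# Pan left at speed 0x20 on the camera at address 0x00
message = build_pelco_p(0x00, 0x00, PAN_LEFT, 0x20, 0x00)
```

Sending the resulting bytes over the serial link (for example through an RS-232 to RS-485 converter) is outside the scope of this sketch.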
In this thesis, the image is divided into 9 regions, each associated with a pan-tilt direction, so that the moving object is kept in the center of the FOV. The direction assigned to each region is shown in Fig. 4-4. If the target is located in the stop region, the camera is commanded to stop. In the other regions, the camera speed is determined by the PID controller. Zoom-in or zoom-out is activated when the target's size becomes smaller or larger, respectively, than the user-defined size bounds. The details of the camera control are shown in Fig. 4-5.
[Figure: nine image regions with their control directions; the center region is labeled STOP]
Fig. 4-4 Control direction for each region
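A minimal sketch of the region lookup is given below. It assumes that the nine regions form a 3 × 3 grid of equal thirds, that the image y axis grows downward, and that the camera moves toward the side on which the target lies; none of these details are specified in the text, and the function name `region_direction` is hypothetical.

```python
def region_direction(x, y, width, height):
    """Return (pan, tilt) for a target at pixel (x, y).

    pan:  -1 = pan left, 0 = no pan,  +1 = pan right
    tilt: +1 = tilt up,  0 = no tilt, -1 = tilt down
    (0, 0) corresponds to the stop region in the image center.
    """
    col = min(int(3 * x / width), 2)    # 0, 1 or 2 from left to right
    row = min(int(3 * y / height), 2)   # 0, 1 or 2 from top to bottom
    pan = col - 1
    tilt = 1 - row
    return pan, tilt
```

For a 320 × 240 image, a target at (160, 120) falls in the stop region, while a target near the top-left corner yields a pan-left and tilt-up direction.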
[Flow chart nodes: Tracking target; Target size > upper bound? → Zoom out command; Target size < lower bound? → Zoom in command; Target position in center region? → Stop command, otherwise PID control → Pan / Tilt command; Update upper bound & lower bound & target size; Send PTZ command]
Fig. 4-5 Camera control flow chart
4.1 PID controller
A proportional-integral-derivative (PID) controller is a generic control-loop feedback mechanism widely used in industrial control systems [35].
The block diagram of the PID controller is shown in Fig. 4-6.
Fig. 4-6 The PID controller
In a PID control system, the monitored plant/process should be kept in an ideal state. The measured value of the plant/process, 𝑦(𝑡), is sent to the comparator and compared with the setting value 𝑢(𝑡). If the plant/process is affected by a disturbance, the measured value is no longer equal to the setting value, and the comparator produces an error signal 𝑒(𝑡). The error signal 𝑒(𝑡) is sent to the controller, which produces the output signal 𝐶𝑜𝑢𝑡 to correct the plant/process and return it to the ideal state.
The output signal 𝐶𝑜𝑢𝑡 is defined by the following equation.
$$C_{out} = K_p\, e(t) + K_I \int e(t)\, dt + K_D \frac{d e(t)}{dt} \tag{4.1}$$
where 𝐾𝑝 is the proportional constant, 𝐾𝐼 is the integral constant, and 𝐾𝐷 is the derivative constant.
The controller consists of a proportional controller (P controller), an integral controller (I controller), and a derivative controller (D controller).
1. P controller: the error signal 𝑒(𝑡) multiplied by 𝐾𝑝. This term corrects a disturbed plant/process, but it leaves a small steady-state error that it cannot eliminate.
2. I controller: the integral of the error signal 𝑒(𝑡) over time. In other words, it weights the error by how long it has persisted. By accumulating the error over time, it removes the small steady-state error that the P controller cannot overcome and brings the disturbed plant/process back to the setting value 𝑢(𝑡).
3. D controller: the derivative of the error signal 𝑒(𝑡). This term gives the system a predictive character, so it can anticipate large variations of the plant/process.
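The three terms can be illustrated with a discrete-time sketch of Eq. 4.1. The rectangular integration, the backward-difference derivative, and the default time step of one frame are assumptions of this sketch rather than details given in the text; the class name `PID` is hypothetical. The gains in the usage lines are the values 𝐾𝑝 = 0.9, 𝐾𝐼 = 0.1, and 𝐾𝐷 = 0.15 listed in Section 5.2.

```python
class PID:
    """Discrete approximation of Eq. 4.1 (illustrative sketch)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        # I term: accumulate the error over time (rectangular rule).
        self.integral += error * dt
        # D term: backward difference; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PID(kp=0.9, ki=0.1, kd=0.15)
first = pid.update(10.0)   # P = 9.0, I = 1.0, D = 0.0
second = pid.update(10.0)  # P = 9.0, I = 2.0, D = 0.0
```

With a constant error, the output grows from one call to the next because only the integral term keeps accumulating, which is the mechanism that removes a steady-state error.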
The corresponding variables of the PID controller in our work are defined as follows:
Setting value 𝑢(𝑡): the center position of the image.
Error signal 𝑒(𝑡): the difference between the center position and the target position.
Measured value 𝑦(𝑡): the target position estimated by the tracking system.
Output signal 𝐶𝑜𝑢𝑡: the output that is converted to the pan / tilt speed.
We use two independent PID controllers, one for the horizontal and one for the vertical position difference, to estimate the pan and tilt speeds. 𝐶𝑜𝑢𝑡 is converted to the pan speed and tilt speed by Eq. 4.2 and Eq. 4.3.
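$$\mathit{Speed}_{pan} = C_{out} \times 0.1 + \mathit{offset}_{pan} \tag{4.2}$$
$$\mathit{Speed}_{tilt} = C_{out} \times 0.1 + \mathit{offset}_{tilt} \tag{4.3}$$
$$\mathit{offset}_{pan} = \begin{cases} \mathit{offset}_{pan}, & C_{out} \ge 0 \\ -\mathit{offset}_{pan}, & C_{out} < 0 \end{cases} \tag{4.4}$$
$$\mathit{offset}_{tilt} = \begin{cases} \mathit{offset}_{tilt}, & C_{out} \ge 0 \\ -\mathit{offset}_{tilt}, & C_{out} < 0 \end{cases} \tag{4.5}$$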
where 𝑜𝑓𝑓𝑠𝑒𝑡𝑝𝑎𝑛 and 𝑜𝑓𝑓𝑠𝑒𝑡𝑡𝑖𝑙𝑡 are defined by the user. These values are related to the pan-tilt speed range given in the camera's specifications (0 to 64). Eq. 4.2 and Eq. 4.3 keep the speed value within a suitable range, because if the speed is set too high, the camera may overshoot the object and the track may be lost.
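The conversion in Eq. 4.2 to Eq. 4.5 can be sketched as follows. The function `to_speed` follows the equations directly. The second function, `to_command`, is an assumption of this sketch: it splits the signed speed into a direction and a magnitude and clamps the magnitude to 0x3F, the highest regular speed in Fig. 4-3. Both function names are hypothetical.

```python
def to_speed(c_out, offset):
    """Eq. 4.2 / 4.3, with the sign of the offset chosen as in Eq. 4.4 / 4.5."""
    signed_offset = offset if c_out >= 0 else -offset
    return c_out * 0.1 + signed_offset


def to_command(speed, max_speed=0x3F):
    """Assumed mapping of a signed speed to (direction, speed byte)."""
    magnitude = min(int(round(abs(speed))), max_speed)
    direction = 1 if speed >= 0 else -1
    return direction, magnitude


# Worked example with offset_pan = 12 (the value used in Section 5.2)
speed_right = to_speed(100.0, 12)   # 100 * 0.1 + 12
speed_left = to_speed(-100.0, 12)   # -100 * 0.1 - 12
```

Because the offset carries the same sign as 𝐶𝑜𝑢𝑡, even a small controller output produces a speed whose magnitude is at least the offset.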
The new width and height are defined as follows.
$$\mathit{ratio}_{w/h} = \left( w / h \right)_{initial}$$
Chapter 5 Experimental results
The system was implemented on a PC with an Intel® Core™ i5 650 CPU @ 3.20 GHz and 4 GB of RAM, and was developed in Borland C++ Builder 6.0 on Windows 7. It was tested in several environments to verify its performance and stability. Both video files (uncompressed AVI format) and image sequences from the active camera were tested.
5.1 Track on video file
Three videos were used to verify the tracking system, with the particle filter parameters set as follows.
Number of samples 𝑁 = 30
Number of bins in histogram 𝑚 = 6 ∗ 6 ∗ 6 = 216
State covariance (𝜎𝑥, 𝜎𝑣𝑥, 𝜎𝑦, 𝜎𝑣𝑦, 𝜎𝑤, 𝜎ℎ) = (2,0.5,2,0.5,0.4,0.8)
1. Video 1 is used to verify the occlusion handler in our system, as shown in Fig. 5-1 and Fig. 5-2. Figure 5-1 shows the tracking result without the occlusion handler. Full occlusion occurs in frame 685. If the particle filter resamples during full occlusion, it may resample at incorrect positions, as shown in frame 689, and the track is lost in frames 694 and 698. In contrast, when full occlusion occurs in the particle filter with the occlusion handler, the resampling step is not performed immediately, so the sample set keeps a wide spread and can recover the target after the full occlusion.
Fig. 5-1 Tracking without occlusion handler (frames 558, 673, 685, 689, 694, and 698)
Fig. 5-2 Tracking with occlusion handler (frames 558, 673, 685, 689, 694, and 698)
2. Video 2 is used to verify the tracking feature. Figure 5-3 shows a human wearing a black jacket walking near a black chair. In this case, the target human has a color feature similar to that of the black chair, but the proposed system can still track the target human.
Fig. 5-3 Object has similar color as target human (frames 193, 258, 292, 351, 372, and 398)
3. Video 3 is used to verify the tracking performance in a complex situation. Figure 5-4 shows the target human partially occluded by a chair. The target human sits down and stands up, as shown in Fig. 5-4 (b). Moreover, the target human is partially occluded by another human, as shown in Fig. 5-4 (c).
Fig. 5-4 (a) frames 1436, 1547 and 1605 (b) frames 2152, 2214 and 2277 (c) frames 2757, 2818 and 2838
5.2 Track on active camera
The active camera is set up in our laboratory. The environment is complex enough to verify the system's detection and tracking of a moving human. The parameters of the particle filter and the PTZ control are set as follows:
Number of samples 𝑁 = 30
Number of bins in histogram 𝑚 = 6 ∗ 6 ∗ 6 = 216
State covariance (𝜎𝑥, 𝜎𝑣𝑥, 𝜎𝑦, 𝜎𝑣𝑦, 𝜎𝑤, 𝜎ℎ) = (10, 1, 10, 1, 1, 2)
𝑜𝑓𝑓𝑠𝑒𝑡𝑝𝑎𝑛 = 12
𝑜𝑓𝑓𝑠𝑒𝑡𝑡𝑖𝑙𝑡 = 6
Proportional constant 𝐾𝑝 = 0.9
Integral constant 𝐾𝐼 = 0.1
Derivative constant 𝐾𝐷 = 0.15
𝑟𝑎𝑡𝑒𝑏𝑖𝑔 = 1.1
𝑟𝑎𝑡𝑒𝑠𝑚𝑎𝑙𝑙 = 0.9
1. Tracking by controlling pan and tilt only. The experimental result shows that the target human stays within the camera's FOV most of the time, regardless of how he walks.
Fig. 5-5 Update pan and tilt command to track
2. Tracking to test zoom-in/out. In this case, the target human walks away from or toward the camera.
Fig. 5-6 The effects of zoom in / out
Figure 5-6 (a) shows that the target human has been detected and 𝑍𝑜𝑜𝑚𝑙𝑎𝑦𝑒𝑟 is initialized to 0. Each zoom-in increases 𝑍𝑜𝑜𝑚𝑙𝑎𝑦𝑒𝑟 by 1, and each zoom-out decreases it by 1. The values of 𝑍𝑜𝑜𝑚𝑙𝑎𝑦𝑒𝑟 are listed in Table 5-1.
Table 5-1 Zoom layer values in Fig. 5-6
Panel         (a)  (b)  (c)  (d)  (e)  (f)  (g)  (h)  (i)
𝑍𝑜𝑜𝑚𝑙𝑎𝑦𝑒𝑟    0    0    1    2    1    2    3    2    1
3. Tracking by controlling pan, tilt, and zoom while the target human walks freely in the environment.
Fig. 5-7 Combination of pan, tilt and zoom in / out
Table 5-2 Zoom layer values in Fig. 5-7
Panel         (a)  (b)  (c)  (d)  (e)  (f)  (g)  (h)  (i)  (j)  (k)  (l)
𝑍𝑜𝑜𝑚𝑙𝑎𝑦𝑒𝑟    0    0    1    1    1    0    0    1    2    1    0    0
4. Tracking a target human while more than one person is walking in the same environment.
Fig. 5-8 Human tracking in multiple objects
5. Tracking a target human while more than one person is walking in the same environment.
Fig. 5-9 Human tracking in multiple objects
Chapter 6 Conclusions and future work
6.1 Conclusions
The experimental results show that the proposed system can track a moving human with an active camera using the particle filter algorithm. The system is also able to track the target human when more than one person is walking in the same environment. Moreover, zoom-in/out adjusts the image resolution of the tracked human.
This research makes several contributions:
1. Our system can accurately distinguish humans from non-humans.
2. Weighted resampling helps the particle filter preserve the samples with high weights.
3. The occlusion handler can resolve temporary full occlusion.
4. The system tracks the target human smoothly by using the PID controller to determine the motion of the camera.
6.2 Future work
In our system, a moving human can be detected and tracked smoothly and continuously. However, some situations still result in loss of the track. For example, significant lighting changes in the background alter the appearance of the moving human.
To use the particle filter with an active camera in real time, we reduced the number of color histogram bins and the number of samples, which sometimes affects the tracking accuracy. This can be addressed by optimizing the samples; for example, mean shift can be used to optimize each sample in the particle filter.
The active camera is driven with the Pelco P protocol and uses the PID controller to pan and tilt. The camera is driven successfully, but the control does not use 𝑣𝑥 and 𝑣𝑦 from the estimated target state vector 𝑠𝑡𝑎𝑟𝑔𝑒𝑡. Incorporating 𝑣𝑥 and 𝑣𝑦 into the pan and tilt speeds could increase the accuracy of the camera control.
References
[1] W. J. Gillner, “Motion based vehicle detection on motorways”, Proceedings of the IEEE Intelligent Vehicles '95 Symposium, pp. 483-487, September 1995.
[2] P. H. Batavia, D. A. Pomerleau, and C. E. Thorpe, “Overtaking vehicle detection using implicit optical flow”, Proceedings of the IEEE Transportation Systems Conference, pp. 729-734, November 1997.
[3] L. Zhao and C. E. Thorpe, “Stereo- and neural network-based pedestrian detection”, IEEE Transactions on Intelligent Transportation Systems, vol. 1, pp. 148-154, September 2000.
[4] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.
[5] S. Montabone, A. Soto, “Human detection using a mobile platform and novel features derived from a visual saliency mechanism”, Image and Vision Computing, vol. 28, pp. 391-402, 2010.
[6] P. Viola, M. J. Jones, and D. Snow, “Detecting Pedestrians Using Patterns of Motion and Appearance”, International Journal of Computer Vision, vol. 63, pp. 153-161, 2005.
[7] M. Dimitrijevic, V. Lepetit, and P. Fua, “Human body pose detection using Bayesian spatio-temporal templates”, Computer Vision and Image Understanding, vol. 104, pp. 127-139, 2006.
[8] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Addison-Wesley, New York, 1992.
[9] D. Marr and E. Hildreth, “Theory of edge detection”, Proceedings of the Royal Society, vol. 207, pp. 197–217, London, 1980.
[10] W. Guo, D. L. Bi, L. Liu, “Human motion tracking based on shape analysis”, Proceedings of the International Conference on Wavelet Analysis and Pattern Recognition, pp. 2-4, Beijing, China, November 2007.
[11] T. Law, H. Itoh, and H. Seki, “Image filtering, edge detection, and edge tracing using fuzzy reasoning”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 481-491, May 1996.
[12] O. Williams, A. Blake, and R. Cipolla, “Sparse bayesian learning for efficient visual tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1292–1304, August 2005.
[13] A. Agarwal and B. Triggs, “Recovering 3D human pose from monocular images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 44-58, January 2006.
[14] B. Han and L. Davis, “Object tracking by adaptive feature extraction”, International Conference on Image Processing, pp. 1501-1504, October 2004.
[15] R. T. Collins, Y. Liu and M. Leordeanu, “Online selection of discriminative tracking features”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1631-1643, October 2005.
[16] K. Fukunaga and L. D. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition”, IEEE Transactions on Information Theory, vol. 21, pp. 32-40, January 1975.
[17] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 564-577, May 2003.
[18] D. Freedman and P. Kisilev, “Fast mean shift by compact density representation”, IEEE Conference on Computer Vision and Pattern Recognition, pp. 1818-1825, June 2009.
[19] F. L. Wang, S. Y. Yu, and J. Yang, “Robust and efficient fragments-based tracking using mean shift”, AEU - International Journal of Electronics and Communications, vol. 64, pp. 614-623, July 2010.
[20] F. Porikli and O. Tuzel, “Multi-kernel object tracking”, IEEE International Conference on Multimedia and Expo, pp. 1234–1237, July 2005.
[21] C. Yang, R. Duraiswami, and L. Davis, “Efficient mean-shift tracking via a new similarity measure”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 176–183, June 2005.
[22] R. V. Babu, P. Pérez, and P. Bouthemy, “Robust tracking with motion estimation and local Kernel-based color modeling”, Image and Vision Computing, vol. 25, pp. 1205–1216, August 2007.
[23] S. Feng, Q. Guan, S. Xu and F. Tan, “Human tracking based on mean shift and Kalman Filter”, International Conference on Artificial Intelligence and Computational Intelligence, 2009.
[24] P. Pérez, C. Hue, J. Vermaak, M. Gangnet, “Color-based probabilistic tracking”, Proceedings of European Conference on Computer Vision, pp. 661-675, 2002.
[25] K. Nummiaro, E. Koller-Meier, and L. V. Gool, “An adaptive color based particle filter”, Image and Vision Computing, vol. 21, pp. 99–110, 2003.
[26] D. Murray and A. Basu, “Motion tracking with an active camera”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, May 1994.
[27] C. W. Lin, C. M. Wang, Y. J. Chang, and Y. C. Chen, “Real-time object extraction and tracking with an active camera using image mosaics”, Proceedings of the IEEE Workshop on Multimedia Signal Processing, pp. 149-152, December 2002.
[28] R. T. Collins, O. Amidi, and T. Kanade, “An active camera system for acquiring multi-view video”, Proceedings of the International Conference on Image Processing, September 2002.
[29] L. Fiore, D. Fehr, R. Bodor, A. Drenner, G. Somasundaram and N. Papanikolopoulos, “Multi-camera human activity monitoring”, Journal of Intelligent Robotic Systems, vol. 52, pp. 5-43, May 2008.
[30] A. R. Smith, “Color Gamut Transform Pairs”, SIGGRAPH 78 Conference Proceedings, vol. 12, pp. 12-19, August 1978.
[31] http://en.wikipedia.org/wiki/HSV_color_space#Conversion_from_RGB_to_HSL_or_HSV
[32] http://www.mathworks.com/access/helpdesk/help/toolbox/images/f8-20792.html
[33] Y. Cheng, “Mean shift, mode seeking, and Clustering”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 790-799, Aug. 1995.
[34] http://www.commfront.com/RS232_Examples/CCTV/Pelco_D_Pelco_P_Examples_Tutorial2.HTM#1
[35] http://en.wikipedia.org/wiki/PID_controller