Human Detection and Tracking Based on Modified Independent Component Analysis

First, we define that gray blocks represent non-human objects and blocks of any other color represent human objects.

Figure 4-3 shows the detection results for humans in multiple poses. The pose of a person varies with the walking direction. We can see that our system detects humans successfully even when their poses vary.

(a) Human walks straight to 0° (b) Human walks straight to 45°

(e) Human walks to 180° (f) Human walks to 225°

(g) Human walks to 270° (h) Human walks to 315°

Fig. 4-3 : Detection results for humans walking in multiple directions

The following figure shows that the detection system is able to classify non-human objects, such as vehicles, animals, and leaves.

(a) Automatic valve (b) Cars

(a) Lateral view of human detection (b) Frontal view of human detection

Fig. 4-5 : Results in a normal indoor environment

Fig. 4-5 shows the results for frontal and lateral views of a human in an indoor environment.

People wearing a backpack or carrying something by hand are shown in Fig. 4-6. People in different poses and people running along the path are shown in Fig. 4-7.

(a) Human with a bag (b) Human with an umbrella

Fig. 4-6 : Results of humans carrying something.

(a) Different pose (b) Running human

Fig. 4-7 : Results of humans performing other actions.

Fig. 4-8 shows results under different environments and lighting conditions, with multiple moving objects, both human and non-human, in the same frame.

The moving objects being monitored include humans, animals, and vehicles.

Figure 4-9 simulates a person whose body is partially occluded by other objects.

(a) A person and cars (b) Three people and moving leaves

(e) A person and leaves (f) Two dogs and a person

Fig. 4-8 : Multiple objects in one frame.

(a) Occluded by a fence (b) Occluded by a car

If we confirm that an object is a person after detecting it over a period of time, then we only need to track it instead of tracking and detecting it at the same time. We can also reduce false alarms by using statistics of the detection results. Moreover, tracking helps us analyze the trajectory of the object and supports other behavior analysis. In Fig. 4-10 (b) and (c), the position of the green block is predicted by the Kalman filter. The original foreground merges object 0 and object 2, so at first only object 0 is recognized; with the Kalman filter, object 2 is also caught in these frames. Figure 4-11 shows another tracking result.
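The prediction step described above can be illustrated with a small sketch. It is not the implementation used in this work: it assumes a constant-velocity motion model over the state [x, y, vx, vy], position-only measurements, and noise levels chosen only for illustration. The class name `ConstantVelocityKalman` and all parameter values are hypothetical.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position-only measurements."""

    def __init__(self, x, y, dt=1.0, process_var=1.0, meas_var=4.0):
        # All noise levels are illustrative assumptions, not values from the thesis.
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 100.0                 # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # constant-velocity motion
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is observed
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def update(self, zx, zy):
        """Correct the prediction with a measured object position."""
        z = np.array([zx, zy], dtype=float)
        innovation = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2].copy()


# An object moves 2 px right and 1 px down per frame for 20 frames.
start = np.array([10.0, 50.0])
velocity = np.array([2.0, 1.0])
kf = ConstantVelocityKalman(*start)
for frame in range(1, 21):
    kf.predict()
    kf.update(*(start + velocity * frame))

# Frame 21: the blob is merged with another object, so no measurement is
# available and the predicted position stands in for the detection.
predicted = kf.predict()
expected = start + velocity * 21
print(predicted, expected)
```

When two foreground regions merge, as with object 0 and object 2 above, calling `predict()` without a following `update()` keeps an estimate of where the unseen object should be until it can be measured again.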

(a) Frame # 200 (b) Frame # 205

(a) Frame # 1488 (b) Frame # 1499

Finally, we report the accuracy of our system. The training database is built from 16 different videos and the testing database from 18 other videos. There are 1843 positive and 2066 negative training images, and 3178 positive and 2847 negative testing images. Each image is normalized to 40 by 40 pixels.

We compare our proposed method with several other methods. The codebook matching (CBM) algorithm [21] uses human shape as the feature and matches the moving object against the code vectors of the codebook. The other two methods, ICA-Cosine [9] and ICA-SVM [18], were developed for face recognition and also extract features by ICA.

In [9], features are selected by calculating, for each coefficient, the ratio r of between-class to within-class variability; the larger r is, the better the distinguishing ability. Cosine similarity is used for classification. In [18], components are selected according to their binary classification capability, evaluated with a perceptron or neural network, and classification is done by SVM. A further baseline extracts features by PCA and classifies with a back-propagation (BP) neural network.
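As a rough illustration of the ICA + Cosine baseline, the sketch below computes a ratio r of between-class to within-class variability for each coefficient and classifies a query by its most similar training vector under cosine similarity. The exact normalization used in [9] may differ; the function names `fisher_ratio` and `cosine_classify` and the toy data are assumptions made for this example.

```python
import numpy as np


def fisher_ratio(coeffs, labels):
    """Ratio r of between-class to within-class variability, per coefficient.

    coeffs: array of shape (n_samples, n_coeffs); labels: array of class ids.
    """
    overall_mean = coeffs.mean(axis=0)
    between = np.zeros(coeffs.shape[1])
    within = np.zeros(coeffs.shape[1])
    for c in np.unique(labels):
        members = coeffs[labels == c]
        class_mean = members.mean(axis=0)
        between += (class_mean - overall_mean) ** 2
        within += ((members - class_mean) ** 2).sum(axis=0)
    # Guard against a zero within-class term (assumption: tiny floor value).
    return between / np.maximum(within, 1e-12)


def cosine_classify(query, train_feats, train_labels):
    """Return the label of the training vector with the highest cosine similarity."""
    q = query / np.linalg.norm(query)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    return train_labels[np.argmax(t @ q)]


# Toy data: coefficient 0 separates the classes, coefficient 1 does not.
coeffs = np.array([[1.0, 0.3], [1.2, -0.4], [-1.0, 0.5], [-1.1, -0.2]])
labels = np.array([1, 1, 0, 0])

r = fisher_ratio(coeffs, labels)
ranking = np.argsort(r)[::-1]          # coefficients ordered by decreasing r
predicted = cosine_classify(np.array([0.9, 0.1]), coeffs, labels)
print(ranking, predicted)
```

Keeping only the top-ranked coefficients before the cosine comparison corresponds to the feature selection step described for [9].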

Table 3 : Accuracy (%) of the proposed method and the other methods

                        Training data          Testing data
Features + Classifier   Human     Non-human    Human     Non-human
No. of images           1843      2066         3178      2847
MICA + SVM              97.72     95.84        94.15     93.57
ICA + Cosine [9]        90.87     85.73        90.34     85.49
ICA + SVM [18]          97.55     93.9         93.17     91.13
CBM [21]                87.95     92.83        90.88     93.68
PCA + BP                99.18     99.46        89.65     94.09

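We select 30 bases for ICA and PCA, and 40 shape features to build the 256 code vectors in the codebook. Table 3 shows that the proposed method, Modified ICA (MICA) with SVM, gives the highest accuracy on the human testing data (94.15%). On the non-human testing data (93.57%) it is slightly below CBM (93.68%) and PCA + BP (94.09%), both of which are markedly less accurate on the human testing data.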
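To compare the methods with a single number, the per-class testing accuracies of Table 3 can be combined into an overall accuracy. The sketch below assumes that the overall figure is the sample-weighted mean of the human and non-human accuracies; the helper name `overall_accuracy` is hypothetical.

```python
# Per-class testing accuracies (%) and test-set sizes copied from Table 3.
N_HUMAN, N_NONHUMAN = 3178, 2847

TEST_ACCURACY = {
    "MICA + SVM": (94.15, 93.57),
    "ICA + Cosine [9]": (90.34, 85.49),
    "ICA + SVM [18]": (93.17, 91.13),
    "CBM [21]": (90.88, 93.68),
    "PCA + BP": (89.65, 94.09),
}


def overall_accuracy(acc_human, acc_nonhuman,
                     n_human=N_HUMAN, n_nonhuman=N_NONHUMAN):
    """Sample-weighted mean of the two per-class accuracies, in percent."""
    total = n_human + n_nonhuman
    return (acc_human * n_human + acc_nonhuman * n_nonhuman) / total


overall = {name: overall_accuracy(*acc) for name, acc in TEST_ACCURACY.items()}
best = max(overall, key=overall.get)

for name, value in sorted(overall.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {value:6.2f}")
```

Under this assumption MICA + SVM comes to about 93.88% on the testing data, the highest of the five methods, and the value agrees with the accuracy listed for 30 entropy-selected features in Table 4.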

Fig. 4-12 illustrates the results of our proposed method and the other methods. The horizontal axis is the number of selected ICA features; the vertical axis of the upper plot is the accuracy on the testing data, and that of the lower plot is the corresponding number of support vectors (SVs). The point-dotted line represents the result before feature selection, the point-dashed line the result of Fisher’s criterion [9], the plus-dashdot line the result of NN [18], and the star-solid line the result of our proposed method. ICA produces 76 features in total, from which we select n.

We compare each method using 10 to 60 ICA features. Fig. 4-12 shows that if we do not select a better set of features but simply rely on the order in which the bases were generated, the detection result is very poor. Selecting subsets of coefficients by class discriminability improves the performance of the ICA representation.

(a) Number of ICA features - Accuracy

(b) Number of ICA features - Number of SVs

Fig. 4-12 : Analysis of different feature selection methods

Table 4 : Analysis of computing time

Selection       No. features   No. SVs   Accuracy (%)   ms/object
Entropy         30             825       93.88          1.13
Fisher's [9]    30             1157      93.21          1.33
Entropy         40             958       94.51          1.41
Fisher's [9]    40             1194      94.4           1.65
NN [18]         40             1028      94.58          1.51

In Fig. 4-12 (a), our proposed method performs better than the others, and in Fig. 4-12 (b), its number of SVs is clearly much smaller than that of the others.

The computation times of the detection process are listed in Table 4, in milliseconds per object. Here we use 5 videos with 14056 frames in total, in which detection is performed more than 3000 times. The proposed method also takes less time than the others at the same number of features. Thus a human detection process that extracts features by ICA, selects features by conditional entropy, and classifies with SVM achieves accuracy comparable to or better than the compared selection methods while using fewer SVs and less computing time.
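The link between the number of SVs and the time per object can be seen from the SVM decision function, which evaluates one kernel term per support vector. The following sketch assumes an RBF kernel, which is not stated in the text; the names `rbf_svm_decision` and `multiply_adds` and the toy model values are hypothetical.

```python
import numpy as np


def rbf_svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i a_i * exp(-gamma * ||s_i - x||^2) + b.

    One kernel evaluation is needed per support vector, so the cost of
    classifying one object grows as n_sv * n_features.
    """
    sq_dist = np.sum((support_vectors - x) ** 2, axis=1)   # shape (n_sv,)
    kernel = np.exp(-gamma * sq_dist)
    return float(kernel @ dual_coefs + bias)


def multiply_adds(n_sv, n_features):
    """Rough operation count for one decision; the distance terms dominate."""
    return n_sv * n_features


# Toy model with two support vectors in a 2-D feature space.
svs = np.array([[0.0, 0.0], [1.0, 0.0]])
coefs = np.array([1.0, -1.0])
score = rbf_svm_decision(np.array([0.0, 0.0]), svs, coefs, bias=0.0, gamma=1.0)

# SV counts for 30 features taken from Table 4 (Fisher's vs. Entropy).
relative_cost = multiply_adds(1157, 30) / multiply_adds(825, 30)   # about 1.40
print(score, relative_cost)
```

This count covers only the classifier itself; the measured ms/object in Table 4 presumably also includes steps whose cost does not depend on the number of SVs.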

4.3 Discussion

First, Table 3 shows that the accuracy of our system exceeds ninety percent, which we consider sufficient for a warning system. Note that the training and testing data include people with both the full body and half of the body visible.

Of course, some situations can still cause the system to fail. When the color of a person's clothing is too close to the background, background subtraction may fail and cut the object in half, as depicted in Fig. 4-13.

Fig. 4-13 : Example of system failure #1

Our shadow elimination algorithm is based on the texture and color relations between the shadow region and the background, and the threshold for detecting shadow is set under the assumption that the shadow region is much smaller than the moving object region. Therefore, if the shadow area is too large, or the background texture is not distinct enough, the shadow will affect the detection result, as shown in Fig. 4-14.

Fig. 4-14 : Example of system failure #2

Another main problem is grouping. Although we have a multiple-human separation algorithm, it only splits people who are walking shoulder to shoulder with their heads visible. The Kalman filter is likewise useful only when the target models have been constructed first; if a group of people stays together the whole time they are in the secured area, we have no chance to build a model for each person. This situation is shown in Fig. 4-15.

Fig. 4-15 : Example of system failure #3

Because the processing pipeline of our system has not been simplified, its computing load is too heavy for real-time operation; this is another main disadvantage.

Chapter 5

Conclusion and Future Work

In this thesis, we present a system for object-based human detection and tracking.

A simple process based on the HSV color space is proposed to eliminate shadows for human detection. The experimental results show that the proposed process improves the precision of human detection. For small groups of people who partially occlude one another, we use an ellipse-fitting function that relies on a pyramid method, and a simple trajectory tracker based on the Kalman filter resolves some of the other occlusion problems. The tracking sub-system also decreases the false-alarm rate.

ICA has been used in many applications, such as separating sound or EEG signals, noise reduction, and face recognition, but to our knowledge it has not been used for human detection. We not only use ICA for feature extraction, but also present a feature selection method that addresses the drawback of unstable training components. From the distributions of the ICA coefficients, we observed that the class discriminability of an independent component does not depend on its binary classification capability, so conditional entropy is proposed to solve the problem. The conditional entropy is defined as the entropy of the desired output Y conditional on the coefficient value X. Moreover, the accuracy and computing time of our detection process are better than those of the other methods, with an accuracy of more than 93%.
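The conditional-entropy criterion can be sketched as follows. This is only an illustration under stated assumptions: the continuous coefficient X is discretized into equal-width bins (the binning scheme and the number of bins are not taken from the thesis), and components with lower H(Y|X) are treated as more discriminative. The function names are hypothetical.

```python
import numpy as np


def conditional_entropy(x, y, n_bins=10):
    """H(Y | X) in bits, with the coefficient x discretized into equal-width bins."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.digitize(x, edges[1:-1])            # bin index in 0 .. n_bins - 1
    h = 0.0
    for b in np.unique(bins):
        in_bin = bins == b
        p_bin = in_bin.mean()                     # P(X falls in bin b)
        _, counts = np.unique(y[in_bin], return_counts=True)
        p_y = counts / counts.sum()               # P(Y | X in bin b)
        h += p_bin * -(p_y * np.log2(p_y)).sum()
    return h


def select_by_conditional_entropy(coeffs, labels, n_select, n_bins=10):
    """Return the indices of the n_select coefficients with the lowest H(Y | X)."""
    scores = np.array([conditional_entropy(coeffs[:, j], labels, n_bins)
                       for j in range(coeffs.shape[1])])
    return np.argsort(scores)[:n_select], scores


# Toy data: coefficient 0 determines the label, coefficient 1 barely helps.
labels = np.array([0, 0, 0, 1, 1, 1])
coeffs = np.column_stack([
    [0.0, 0.1, 0.2, 0.8, 0.9, 1.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
])
selected, scores = select_by_conditional_entropy(coeffs, labels, n_select=1)
print(selected, scores)
```

In this toy case the first coefficient leaves no uncertainty about the label, so its conditional entropy is zero and it is selected first.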

To improve the performance and the robustness of our system, some enhancements can be done in the future:

(a) A more robust shadow elimination algorithm is needed. Our system cannot handle large shadows. It may be possible to employ an edge

(b) In most cases, we cannot recognize an individual person within a group. Because our training data do not include the background region, it is not easy to recognize a person who is not cleanly extracted from the rest of the image. We may need to train on images that include the background, so that the features remain useful even when the camera is not fixed.

(c) The computing load of our system is heavy; we need to simplify the process or find faster algorithms to reduce the computation time.

References

[1] A. Elgammal, R. Duraiswami, D. Harwood, and L.S. Davis, “Background and Foreground Modeling Using Nonparametric Kernel Density Estimation for Visual Surveillance,” Proc. of the IEEE, Vol. 90, No. 7, July 2002.

[2] Fung G S, Yung N H, Grantham K H, et al., “Effective moving cast shadow detection for monocular color traffic image sequences,” Optical Engineering, Vol. 41, No. 6, pp. 1425-1440, 2002.

[3] I. Haritaoglu, D. Harwood, and L.S. Davis. “Hydra: Multiple people detection and tracking using silhouettes,” In IEEE International Workshop on Visual Surveillance, pp. 6-13, June 1999.

[4] T. Zhao and R. Nevatia, “Tracking multiple humans in complex situations,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26, No. 9, pp. 1208-1221, Sept. 2004.

[5] S. J. McKenna, S. Jabri, Z. Duric, and A. Rosenfeld, “Tracking groups of people,” Comput. Vision Image Understanding, No. 80, pp. 42–56, 2000.

[6] R. Venkatesh Babu, P. Pérez, and P. Bouthemy, “Robust tracking with motion estimation and local kernel-based color modeling,” Image Vis. Comput., in press, 2007.

[7] W. Hu, M. Hu, X. Zhou, T. Tan, J. Lou, and S. Maybank, “Principal axis-based correspondence between multiple cameras for people tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 4, pp. 663–671, April 2006.

[8] T.-W. Lee and M.S. Lewicki. “Unsupervised image classification, segmentation, and enhancement using ICA mixture models.” IEEE Trans. on Image Processing, Vol. 11, No. 3, pp. 270–279, 2002.

[9] M.S. Bartlett, J.R. Movellan, and T.J. Sejnowski, “Face recognition by independent component analysis,” IEEE Transactions on Neural Networks, Vol. 13, No. 6, pp. 1450–1464, 2002.

[10] C. Liu and H. Wechsler, “Independent component analysis of Gabor features for face recognition,” IEEE Trans. Neural Networks, Vol. 14 No. 4, pp. 919–928, 2003

[11] W. J. Gillner, “Motion based vehicle detection on motorways,” in Proc. of the Intelligent Vehicles '95 Symposium, pp.483-487, Sept. 1995.

[12] L. Zhao and C. E. Thorpe, “Stereo- and neural network-based pedestrian detection,” IEEE Transactions on Intelligent Transportation Systems, Vol. 1, No. 3, pp. 148-154, Sept. 2000.

[13] C. E. Smith, C. A. Richards, S. A. Brandt, and N. P. Papanikolopoulos, “Visual tracking for intelligent vehicle-highway systems,” IEEE Transactions on Vehicular Technology, Vol. 45, No. 4, pp. 744-759, Nov. 1996.

[14] Y. L. Tian and A. Hampapur, “Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance,” Proceedings of the IEEE Workshop on Motion and Video Computing (WACV/MOTION’05), August 2005.

[15] C. Curio, J. Edelbrunner, T. Kalinke, C. Tzomakas, and W. von Seelen, “Walking pedestrian recognition,” IEEE Transactions on Intelligent Transportation Systems, Vol. 1, No. 3, pp. 155-163, Sept. 2000.

[16] S. M. Yoon and H. Kim, “Real-time multiple people detection using skin color, motion and appearance information,” Proceedings of the 2004 IEEE International Workshop on Robot and Human Interactive Communication Kurashiki, Okayama Japan, pp. 20-22, Sept. 2004.

[17] K. Lo, M. Yang, and R. Lin, “Shadow Removal for Foreground Segmentation,” PSIVT, LNCS 4319, pp. 342-352, 2006.

[18] Y. Ou, X. Wu, H. Qian, and Y. Xu, “A Real Time Race Classification System,” IEEE International Conference on Information Acquisition, pp. 378-383, 2005.

[19] B.A. Draper, K. Baek, M.S. Bartlett, and J.R. Beveridge, “Recognizing Faces with PCA and ICA,” Computer Vision and Image Understanding: special issue on face recognition, pp. 115-137, 2003.

[20] C. Stauffer and W.E.L Grimson, “Adaptive Background Mixture Models for Real-Time tracking,” In IEEE Conference on Computer Vision and Pattern Recognition, pp. 246-252, June 1999.

[21] J. Zhou and J. Hoang, “Real Time Robust Human Detection and Tracking System,” Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) 2005.

[22] P. H. Batavia, D. E. Pomerleau, and C. E. Thorpe, “Overtaking vehicle detection using implicit optical flow,” IEEE Conference on Intelligent Transportation System, Nov.1997, pp. 729-734.

[23] C. Huang, T. Chen, S. Li, E. Chang, and J.L. Zhou, “Analysis of speaker variability,” Proc. European Conference on Speech Communication and Technology. Denmark, Vol. 2, pp. 1377–1380. 2001

[24] C. Orrite-Uruñuela, J. Martínez del Rincón, J. Elías Herrero-Jaraba, and G. Rogez, “2D Silhouette and 3D Skeletal Models for Human Detection and Tracking,” Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04), 2004.

[25] Z. L. Jiang, S. F. Li, and D. F. Gao, “A Time Saving Method for Human Detection in Wide Angle Camera Images,” Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, pp. 13-16, August 2006.

[26] R. Polana and R. Nelson, “Detecting activities,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2-7, 1993.

[27] M. Bertozzi, A. Brogi, M. Del Rose, M. Felisa, A. Rakotomamonjy, and F. Suard, “A Pedestrian Detector Using Histograms of Oriented Gradients and a Support Vector Machine Classifier,” IEEE Intelligent Transportation Systems Conference, pp. 143-148, 2007.

[28] C.C. Chang and C.J. Lin, “LIBSVM: a Library for Support Vector Machines,” June 14, 2007. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[29] A. Cichocki and S. Amari, “Adaptive Blind Signal and Image Processing,” Wiley, 2002.

[30] A. Hyvarinen, J. Karhunen, E. Oja, “Independent Component Analysis,” Wiley New York, 2001.

[31] S. Abe, “Support Vector Machines for Pattern Classification,” London: Springer-Verlag London Limited, 2005.

[32] L. Wang, ed., “Support Vector Machines: Theory and Applications,” New York: Springer, Berlin Heidelberg, 2005.

[33] S. R. Gunn, “Support Vector Machines for Classification and Regression,” Technical Report, May 1998.
