
Chapter 3 Intelligent Human Detection

3.4 Motionless Human Checking

In Section 3.1.3, moving objects segmentation is implemented to narrow down the number of ROIs and thereby reduce the computational cost. However, an ROI may have been judged to contain a human in the previous frame and yet show no motion in the current frame. In other words, such an ROI should still contain the human, but the system would filter it out in the current frame. Therefore, the system has to check whether an ROI is in this situation and still contains a human.

Fig-3.16 The process of storing the ROI parameters

In this section, the system stores the parameters of each ROI that contained a human in the previous frame and judges whether that ROI has changed in the current frame. Fig-3.16 shows the process of storing the ROI parameters, where i denotes the number of humans in the previous frame. If an ROI has changed, it is treated as a new ROI in the current frame and is still delivered to the next step. On the other hand, if an ROI has not changed, the object or human inside it must be motionless, because it cannot vanish between the previous frame and the current frame. The system can therefore check whether any motionless human exists and treat each motionless human as a new ROI in the current frame. After this check, the new ROIs are delivered to the next step as well. With this step, the system increases the accuracy rate while maintaining a high execution speed.
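As an illustration, the check above can be sketched as follows. This is a hypothetical sketch, not the thesis code: the function names and the (x, y, w, h) ROI tuple format are assumptions, and the `has_changed` predicate stands in for the frame-difference comparison described in the text.

```python
# Sketch of the motionless-human check: ROIs that contained a human in the
# previous frame are stored; any stored ROI that shows no change in the
# current frame is assumed to still hold a (motionless) human and is kept
# as a new ROI alongside the motion-selected ones.

def check_motionless(prev_human_rois, current_rois, has_changed):
    """Return the ROI list for the current frame.

    prev_human_rois -- ROIs judged to contain a human in the previous frame
    current_rois    -- ROIs selected by motion in the current frame
    has_changed     -- predicate: True if the ROI region changed between frames
    """
    new_rois = list(current_rois)
    for roi in prev_human_rois:
        if not has_changed(roi):
            # No motion inside a previously confirmed human ROI: the human
            # cannot have vanished between frames, so keep the ROI.
            new_rois.append(roi)
    return new_rois

# Toy usage: ROI (10, 20, 64, 128) held a human before and shows no change now,
# so it survives into the current frame's ROI list.
rois = check_motionless([(10, 20, 64, 128)], [(200, 40, 64, 128)],
                        has_changed=lambda roi: False)
```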

Chapter 4

Experimental Results

In the previous chapter, the four main steps of the proposed human detection system were introduced. In this chapter, the experimental results of each step are shown in detail. The proposed algorithms are implemented with Microsoft Visual Studio 2010 and OpenCV 2.2.

4.1 ROI Selection

In this section, an example of ROI selection is shown in Fig-4.1, and its process and result are introduced step by step. First, the depth image is shown in Fig-4.1(a), and the result of histogram projection, the top-view image, is shown in Fig-4.1(b). Then the system finds the contours of the top-view image in Fig-4.1(c) and selects the rectangular regions through connected-component labeling (CCL) in Fig-4.1(d). Afterwards, each rectangular region is mapped into the depth image based on the corresponding region of the top-view image. Finally, moving objects segmentation is implemented to check whether the rectangular regions in the depth image contain any moving pixels, based on the difference between frames. Thus four ROIs are selected in Fig-4.1(e) and three ROIs in Fig-4.1(f). This example shows that moving objects segmentation is useful for reducing the computational cost.
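The histogram projection and region grouping above can be sketched with numpy. This is a simplified, hypothetical illustration: it projects depth onto a (column, depth-bin) top view and groups occupied columns in one dimension as a stand-in for the full 2-D CCL, and the bin count and threshold are assumptions.

```python
import numpy as np

def top_view_projection(depth, depth_bins=64):
    """Project a depth image onto a top view: for each image column x and
    quantized depth value d, count how many pixels fall in (x, d)."""
    h, w = depth.shape
    dq = np.clip((depth.astype(int) * depth_bins) // (depth.max() + 1),
                 0, depth_bins - 1)
    top = np.zeros((depth_bins, w), dtype=int)
    for x in range(w):
        top[:, x] = np.bincount(dq[:, x], minlength=depth_bins)
    return top

def select_roi_columns(top, height_thresh):
    """Columns whose peak count exceeds the height threshold are candidate
    object locations; consecutive occupied columns are grouped into ROIs
    (a 1-D stand-in for connected-component labeling)."""
    occupied = top.max(axis=0) > height_thresh
    rois, start = [], None
    for x, occ in enumerate(occupied):
        if occ and start is None:
            start = x
        elif not occ and start is not None:
            rois.append((start, x - 1))
            start = None
    if start is not None:
        rois.append((start, len(occupied) - 1))
    return rois

# Synthetic depth image: varied background depths, plus a tall object at a
# constant depth in columns 5..8, which dominates one top-view bin.
depth = np.tile(np.arange(100).reshape(100, 1), (1, 20))
depth[:, 5:9] = 10
rois = select_roi_columns(top_view_projection(depth), height_thresh=50)
```

The object columns accumulate all their pixels in a single depth bin, so their top-view peak clears the threshold while the spread-out background does not.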

Fig-4.1 The process and result of ROI selection

Besides, this thesis compares the advantages of the color image with those of the depth image. Specifically, the system adopts depth information to implement histogram projection and CCL, and color information to implement moving objects segmentation. However, the difference between the previous frame and the current frame in the depth images could also be used to segment moving objects, and ROI selection could likewise be implemented with moving objects segmentation and CCL alone. The following two examples explain why this thesis does not adopt these alternatives.

 Example 1: Moving objects segmentation based on difference in depth images.

In Fig-4.2, there are three consecutive frames, each shown in four windows: the gray image, the gray-level difference, the depth image, and the depth difference. In Fig-4.2(a), some pixels appear in the difference image based on depth information, but none in the difference image based on color information. The same situation occurs in Fig-4.2(b) and Fig-4.2(c) due to the properties of the depth camera. The depth camera uses a pair of infrared components, an infrared transmitter and an infrared receiver, and since this technique relies on infrared reflection, it produces the situations shown in Fig-4.2. Hence, the system does not adopt depth information to segment moving objects, because of such infrared noise.

Fig-4.2 Comparing the frame differences of depth and gray images
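The frame-difference test underlying Example 1 can be sketched as follows; this is a minimal illustration, and the threshold value is an assumption, not a parameter taken from the thesis.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, thresh=15):
    """Binary motion mask: pixels whose intensity changes by more than
    `thresh` between two frames are marked as moving (value 1)."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return (diff > thresh).astype(np.uint8)

# A static gray scene yields an empty mask, while the same static scene in
# depth, with one simulated infrared dropout pixel, yields a spurious
# "motion" pixel, mirroring the behaviour described for Fig-4.2.
gray_prev = np.full((4, 4), 80, dtype=np.uint8)
gray_curr = gray_prev.copy()
depth_prev = np.full((4, 4), 120, dtype=np.uint8)
depth_curr = depth_prev.copy()
depth_curr[1, 1] = 0  # simulated infrared reflection failure
```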

 Example 2: ROI selection only by moving objects segmentation and CCL

In Fig-4.3, there are four examples, each shown in three windows: the top-view image, the difference image, and the gray image. In these examples, even when a larger mask is used to dilate the pixels in the difference image, the system still often selects fragmented ROIs.

Therefore, this thesis adopts not only depth information but also color information to select ROIs correctly.

Fig-4.3 ROI selection by moving objects segmentation and CCL only
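For reference, the dilation mentioned in Example 2 can be sketched with a naive numpy implementation (in practice OpenCV's dilation would be used; this hand-rolled version is only for illustration). It shows why dilation merges nearby fragments but cannot by itself guarantee one coherent ROI per person.

```python
import numpy as np

def dilate(mask, k=3):
    """Naive binary dilation with a k x k square structuring element:
    a pixel is set if any pixel in its k x k neighbourhood is set."""
    h, w = mask.shape
    r = k // 2
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = mask[y0:y1, x0:x1].max()
    return out

# Two fragmented difference pixels become one connected run after dilation,
# but only because they happen to lie within the mask's reach.
mask = np.zeros((1, 5), dtype=np.uint8)
mask[0, 1] = mask[0, 3] = 1
```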

4.2 Feature Extraction

In this section, all ROIs, including the original data and the test data, are resized to 64×128. Some original data from the Leeds Sports Pose (LSP) dataset are shown in Fig-4.4(a), and the normalized LSP data are shown in Fig-4.4(b).

Fig-4.4 Some original data and the normalized original data

The test data are derived from the experiment video; some examples are shown in Fig-4.5. These data include eight subjects, with hundreds of images per subject.

Fig-4.5 Some normalized test data of the experimental subjects

Then the original data and test data are processed by the histogram of oriented gradients (HOG) descriptor and saved as feature vectors. All feature vectors are combined into a feature matrix, which is delivered to the next step to recognize human shapes.
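A simplified HOG computation is sketched below for reference. It uses unsigned gradients and 9 orientation bins per 8×8 cell, but, as an assumption made for brevity, it omits the overlapping block normalization of the full Dalal-Triggs descriptor, so a 64×128 window yields 8×16×9 = 1152 dimensions rather than the descriptor's usual 3780.

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Simplified HOG: magnitude-weighted orientation histograms over
    non-overlapping cells (block normalization omitted)."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # centered horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # centered vertical gradient
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation
    h, w = img.shape
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            m = mag[cy:cy + cell, cx:cx + cell].ravel()
            a = ang[cy:cy + cell, cx:cx + cell].ravel()
            idx = np.minimum((a * bins / 180).astype(int), bins - 1)
            feats.append(np.bincount(idx, weights=m, minlength=bins))
    v = np.concatenate(feats)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# A normalized 64x128 ROI gives 8x16 cells, i.e. a 1152-dimensional vector.
vec = hog_features(np.random.default_rng(0).random((128, 64)))
```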

4.3 Human Shape Recognition

In this section, the human recognition system is implemented with two learning algorithms, the support vector machine (SVM) and the artificial neural network (ANN), to examine its performance and reliability. In human detection, there are four possible events, given in Table 4.1: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). These four events are determined by the actual condition and the test result, and are listed below:

1. True Positive, TP, means a real human is detected as human.

2. True Negative, TN, means a non-human is detected as non-human.

3. False Positive, FP, means a non-human is detected as human.

4. False Negative, FN, means a real human is detected as non-human.

With these four events, the true positive rate TPR and the false positive rate FPR can be respectively defined as:

TPR = TP / (TP + FN) × 100% (4.1)

FPR = FP / (FP + TN) × 100% (4.2)

A true positive rate of 100% means that all humans are detected correctly, while a false positive rate of 0% means that no non-human is detected as a human. To compare the performance of the system, the accuracy rate AR is defined as:

AR = (TP + TN) / (TP + TN + FP + FN) × 100% (4.3)

and a higher AR implies a better detection performance.
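The three rates can be computed directly from the event counts. The counts in the example below are illustrative only; they are chosen to match the dataset sizes mentioned in the next paragraph, not taken from the experimental tables.

```python
def rates(tp, tn, fp, fn):
    """TPR (4.1), FPR (4.2), and AR (4.3), all in percent."""
    tpr = 100.0 * tp / (tp + fn)
    fpr = 100.0 * fp / (fp + tn)
    ar = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return tpr, fpr, ar

# Illustrative counts: 1450 of 1500 humans and 1300 of 1328 non-humans correct.
tpr, fpr, ar = rates(tp=1450, tn=1300, fp=28, fn=50)
```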

Table 4.1 TP, FP, FN, TN table

                     Detected as human    Detected as non-human
Actual human                TP                     FN
Actual non-human            FP                     TN

This thesis adopts original data comprising 1500 positive samples and 1328 negative samples to test the performance of human shape recognition, and the same amounts of positive and negative test data for the performance test. The performance and average execution time of the two approaches, including TP, TN, TPR, FPR, and AR, are shown in Table 4.2, and the comparison of ANN neuron selection is shown in Table 4.3. In Table 4.3, the performance with two hidden layers is better than with one, since with two hidden layers the original data can be classified perfectly and the test data achieve a higher AR; hence the system uses two hidden layers to establish the ANN classifier. In Table 4.2, the SVM and ANN classifiers have similar AR but clearly different execution times: during system execution the ANN classifier runs faster than the SVM classifier, whereas training the ANN classifier takes much longer than training the SVM classifier. Therefore, each classifier has its own advantages and disadvantages.
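For reference, the decision rules of the two trained classifiers can be sketched as below. The weights are toy values chosen only so the example runs, and the training procedures themselves (quadratic-programming optimization for the SVM, back-propagation for the ANN) are omitted.

```python
import numpy as np

def svm_predict(w, b, x):
    """Decision rule of a trained linear SVM: the sign of w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

def ann_predict(layers, x):
    """Forward pass of a small feed-forward network with tanh activations on
    the hidden layers (the text's ANN classifier uses two hidden layers)."""
    a = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(layers):
        a = W @ a + b
        if i < len(layers) - 1:   # no activation on the output layer
            a = np.tanh(a)
    return 1 if a[0] >= 0 else -1

# Toy weights: both classifiers label x = [2, 1] as "human" (+1).
x = [2.0, 1.0]
layers = [(np.eye(2), np.zeros(2)),                 # hidden layer 1
          (np.eye(2), np.zeros(2)),                 # hidden layer 2
          (np.array([[1.0, -1.0]]), np.zeros(1))]   # output layer
```

Once trained, both rules reduce to a few matrix-vector products, which is why per-frame classification is cheap compared with training.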


Table 4.2 The performance and average execution time of the two approaches

Table 4.3 The comparison of ANN neuron selection

4.4 Motionless Human Checking

In this section, Fig-4.6 is taken as an example of three consecutive frames: in the first frame the human is still in motion, while in the second and third frames the human is motionless. In Fig-4.6(a), the moving human is recognized successfully, and the motionless human is also recognized in Fig-4.6(b) and Fig-4.6(c). As a result, a motionless human can be treated as a new ROI in the next frame and successfully recognized through feature extraction and human shape recognition as well.

Fig-4.6 Three consecutive frames of motionless human checking

Chapter 5

Conclusions and Future Works

This thesis proposes a real-time human detection system based on the RGB-D images generated by Kinect to find humans in a sequence of images. The system is separated into four parts: region-of-interest (ROI) selection, feature extraction, human shape recognition, and motionless human checking. First, histogram projection, connected-component labeling (CCL), and moving objects segmentation are applied to select the ROIs, based on the property that a human walks or stands with motion. Histogram projection yields the rough vertical distribution of the 3-D space, so any object or human taller than a certain threshold is selected as an ROI. Afterwards, the ROI is marked by CCL, and moving objects segmentation checks whether the ROI contains any change between the previous frame and the current frame. Then the ROIs are normalized with the bilinear interpolation approach, and the human shape features are extracted by the histogram of oriented gradients (HOG). After feature extraction, a support vector machine or an artificial neural network is trained as a classifier on the Leeds Sports Pose dataset, and human shape recognition is implemented by this classifier. Finally, the system checks whether the image contains any motionless human and then recognizes it. From the experimental results, the conclusions are as follows:

 The system combines the advantages of the color image and the depth image. The depth image is robust to illumination changes, so the histogram projection of the depth image is implemented to select the ROIs. However, if the system only used the frame-by-frame difference of the gray image, it would obtain fragmented ROIs even after additional dilation of the difference. On the other hand, the depth image is ill-suited to segmenting moving objects because of hardware limitations: even when there is no moving object in the depth image, the frame-by-frame difference contains many noise pixels. Therefore, the system adopts the depth image to select the ROIs and the gray image to segment the moving objects.

 Because the gray image is rich in texture and has a high angular resolution, HOG descriptors are implemented to extract features. Although the HOG descriptors have high dimensionality, the system can still execute in real time because only the ROIs, rather than the whole image, are processed. Furthermore, the two approaches to human shape recognition have similar accuracy rates. Nonetheless, the support vector machine (SVM) is faster than the artificial neural network (ANN) in pre-processing (training), while the ANN is faster than the SVM during detection.

To improve the interaction between robots and humans, three systems are often required for a robot: human detection, human tracking, and pose recognition. With these systems, a robot can detect humans, track specific humans, and interact with them through their poses, making human-robot interaction more natural. The system proposed in this thesis succeeds in human detection, and it will be further applied to the implementation of the other systems in the future.
