

Chapter 4 Experimental Results

4.2.2 Edge Detection and Distance Transformation

The results of edge detection and distance transformation are shown from Fig-4.7 to Fig-4.10. Taking Fig-4.7 as an example, Fig-4.7(a) shows the outcome of ROI selection, and the selected ROIs are then separated as shown in Fig-4.7(b).

Next, the ROIs are normalized and resized based on the distance between the object and the camera, as presented in Fig-4.7(c). Finally, edge detection and distance transformation are applied, and the results are shown in Fig-4.7(d) and Fig-4.7(e), respectively. Fig-4.8, Fig-4.9, and Fig-4.10 are presented in the same way.
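The distance-transformation step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a binary edge map is already available (here a single toy edge pixel stands in for the output of an edge detector such as Sobel or Canny) and applies the classic two-pass 3-4 chamfer distance transform, in which every pixel ends up with roughly three times its distance to the nearest edge pixel.

```python
import numpy as np

def chamfer_distance_transform(edges):
    """Two-pass 3-4 chamfer distance transform: each pixel receives
    (about 3x) the distance to its nearest edge pixel."""
    INF = 10**9
    h, w = edges.shape
    d = np.where(edges, 0, INF).astype(np.int64)
    for y in range(h):                      # forward pass: top-left to bottom-right
        for x in range(w):
            if x > 0:               d[y, x] = min(d[y, x], d[y, x - 1] + 3)
            if y > 0:               d[y, x] = min(d[y, x], d[y - 1, x] + 3)
            if y > 0 and x > 0:     d[y, x] = min(d[y, x], d[y - 1, x - 1] + 4)
            if y > 0 and x < w - 1: d[y, x] = min(d[y, x], d[y - 1, x + 1] + 4)
    for y in range(h - 1, -1, -1):          # backward pass: bottom-right to top-left
        for x in range(w - 1, -1, -1):
            if x < w - 1:               d[y, x] = min(d[y, x], d[y, x + 1] + 3)
            if y < h - 1:               d[y, x] = min(d[y, x], d[y + 1, x] + 3)
            if y < h - 1 and x < w - 1: d[y, x] = min(d[y, x], d[y + 1, x + 1] + 4)
            if y < h - 1 and x > 0:     d[y, x] = min(d[y, x], d[y + 1, x - 1] + 4)
    return d

# Toy edge map: one edge pixel in the center of a 5x5 image.
edges = np.zeros((5, 5), dtype=bool)
edges[2, 2] = True
dist = chamfer_distance_transform(edges)   # 0 at the edge pixel, growing outward
```

In the resulting distance image, small values mark pixels close to an edge, which is exactly the property that chamfer matching exploits later in the pipeline.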

Fig-4.6 Comparison of the results of normalization. The human is originally standing at 1.6 m, 2.0 m, 2.4 m, 2.8 m, 3.2 m, and 3.6 m, from left to right.


Fig-4.7 Result of edge detection and distance transformation in the condition of walking pose

Fig-4.8 Result of edge detection and distance transformation in the condition of more than one human


Fig-4.9 Result of edge detection and distance transformation in complex background

4.3 Human Recognition

Fig-4.10 Result of edge detection and distance transformation in the condition of one human and one chair.

In this section, the human recognition system is tested in different situations to examine its performance and reliability. Before presenting the results, the method for evaluating them must be introduced. In general, the major objective of a detection system is to detect humans in an image or a sequence of images. In human detection, there are four possible events, given in Table 4.1: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). These four events are determined by the actual condition and the test result, and they are listed as below:

1. True Positive (TP): a real human is detected as a human.

2. True Negative (TN): a non-human is detected as a non-human.

3. False Positive (FP): a non-human is detected as a human.

4. False Negative (FN): a real human is detected as a non-human.

With these four events, the true positive rate TPR and the false positive rate FPR can be respectively defined as below:

TPR = TP / (TP + FN) × 100%

FPR = FP / (FP + TN) × 100%

A true positive rate of 100% means all humans are detected correctly, while a false positive rate of 0% means no non-human is detected as a human. To compare the performance of the system, the accuracy rate AR is defined as below:

AR = (TP + TN) / (TP + TN + FP + FN) × 100% (4.3)

and a higher AR implies a better detection performance.
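As a quick sanity check of these definitions, the three rates can be computed directly from the four event counts. The counts below are made-up numbers for illustration, not results from the thesis.

```python
def detection_rates(tp, tn, fp, fn):
    """Return (TPR, FPR, AR) in percent from the four event counts."""
    tpr = tp / (tp + fn) * 100.0                   # humans correctly detected
    fpr = fp / (fp + tn) * 100.0                   # non-humans wrongly detected as human
    ar = (tp + tn) / (tp + tn + fp + fn) * 100.0   # overall accuracy, Eq. (4.3)
    return tpr, fpr, ar

# Hypothetical counts: 100 real humans and 100 non-humans in the test set.
tpr, fpr, ar = detection_rates(tp=90, tn=85, fp=15, fn=10)
# tpr = 90.0, fpr = 15.0, ar = 87.5
```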

Table 4.1 TP, FP, FN, TN table

                 | Detected as human | Detected as non-human
Actual human     | TP                | FN
Actual non-human | FP                | TN

In order to examine the robustness of the human recognition system, many test images in different situations are collected. The possible situations can be roughly separated into three cases: different poses (DP), occlusion by other objects or humans (OC), and complex background (CB). In this thesis, the overall test image set, which contains 2714 test images, is separated into three groups: 980 images in the DP group, 1114 images in the OC group, and 620 images in the CB group, as shown from Fig-4.11 to Fig-4.13. Separating them makes it simple to observe and compare the reliability of the system in each situation.

Fig-4.11 Examples of test images in DP group

Fig-4.12 Examples of test images in OC group

After ROI selection and feature extraction, the selected ROIs and extracted features are sent into the human recognition system. Note that there are 9173 selected ROIs in the test image set in total: 1972 in the DP group, 3842 in the OC group, and 3359 in the CB group. In this thesis, there are two template sets, Set-I and Set-II, and two recognition approaches, the voting-based approach and the neural-network-based approach. Hence, there are four different methods: Set-I-Voting, Set-I-NN, Set-II-Voting, and Set-II-NN. The performances of these methods in the different test groups are shown in Table 4.2, including TPR, FPR, and AR.

Moreover, Table 4.3 shows the TPR, FPR, and AR over all test images, together with the average executing time. From these two tables, several conclusions can be drawn:

• The accuracy rate of Set-II is clearly higher than that of Set-I, especially in the OC group. Under slight occlusion, Set-I and Set-II both perform well; under serious occlusion, however, the accuracy rate of Set-I drops noticeably. On the other hand, the computational cost of Set-I is lower than that of Set-II, and its average executing time is below 0.1 s.

Fig-4.13 Examples of test images in CB group

• The neural-network-based approach performs better than the voting-based approach. The concept of the voting-based approach is straightforward and easy to implement, but it cannot handle all poses and situations because the relations between different body parts are only coarsely defined. The neural network, in contrast, can adjust its weights to difficult situations through learning, but the training data has to be prepared and selected in advance.
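To make the contrast concrete, a voting-based decision can be as simple as the following sketch. The part names, weights, and threshold here are hypothetical placeholders, not the rules actually used in the thesis; the point is that such hand-written relations stay coarse, which is why unusual poses slip through.

```python
def vote_human(parts_found, weights=None, threshold=2):
    """Classify an ROI as human when the weighted votes of the
    detected body parts reach a threshold (hypothetical rule)."""
    if weights is None:
        weights = {"head": 1, "torso": 1, "legs": 1}   # assumed part set
    score = sum(weights.get(part, 0) for part in parts_found)
    return score >= threshold

# Two parts found by chamfer matching -> accepted as human.
print(vote_human({"head", "torso"}))   # True
# A single part is not enough evidence under this rule.
print(vote_human({"legs"}))            # False
```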

Table 4.2 Comparison of performances in DP-, OC- and CB-group (unit: %)

              |        DP          |        OC          |        CB
              | TPR   FPR   AR     | TPR   FPR   AR     | TPR   FPR   AR
Set-I-Voting  | 89.21  0.90  94.22 | 81.04  4.12  88.55 | 85.43  8.77  90.12
Set-II-Voting | 92.81  0.60  96.15 | 89.73  3.09  93.36 | 89.61  7.41  92.02
Set-I-NN      | 91.06  0.80  95.18 | 84.73  2.83  91.02 | 87.60  4.79  93.75
Set-II-NN     | 94.86  0.40  97.26 | 92.05  2.06  95.03 | 92.25  3.32  95.83

DP = Different Poses. OC = Occlusion. CB = Complex Background.

Table 4.3 Performances and average executing time

              | TPR     | FPR    | AR      | Executing Time
Set-I-Voting  | 84.11%  | 5.78%  | 90.34%  | 0.089 s
Set-II-Voting | 90.56%  | 4.72%  | 93.74%  | 0.122 s
Set-I-NN      | 87.01%  | 3.41%  | 92.91%  | 0.092 s
Set-II-NN     | 92.86%  | 2.37%  | 95.80%  | 0.131 s

Chapter 5

Conclusions and Future Works

This thesis proposes an intelligent human detection system based on depth information generated by Kinect to find humans in a sequence of images and resolve occlusion problems. The system is divided into three parts: ROI selection, feature extraction, and human recognition. First, histogram projection and connected component labeling (CCL) are applied to select the ROIs, exploiting the property that humans generally appear vertical. Through histogram projection, the system generates a rough vertical distribution in 3-D space.

Therefore, if the height of an object exceeds a certain threshold, the object is selected as an ROI and marked by CCL. Each ROI is then normalized based on its distance to the camera, and the human shape feature is extracted by edge detection and distance transformation to obtain the distance image. Finally, chamfer matching is used to search for possible parts of the human body under the component-based concept, and shape recognition is implemented by a neural network according to the combination of body parts. From the experimental results, some conclusions are listed as below:

• The proposed system detects humans with an accuracy rate higher than 90% and an average executing time of about 0.1 s/frame. Besides, with the help of the depth image and the component-based concept, the system can detect humans correctly even under serious occlusion.

• The use of depth images for human detection has some distinct advantages over conventional techniques. First, it is robust to illumination change and the influence of distance. Second, it can deal with occlusion problems efficiently. Third, it is suitable for a moving camera because no background modeling is required.

• The use of chamfer matching to extract significant human features can greatly reduce the dimension and size of the neural network. Conventional pattern recognition often applies a patch of image, or all pixels of an ROI, directly to the neural network; consequently, the network requires hundreds or thousands of neurons in its input layer and a huge amount of training data. With pre-processing via chamfer matching, the number of neurons in the input layer can be reduced to fewer than 50.
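The size reduction can be seen by sketching such a compact classifier. The sizes here are assumptions for illustration only (48 chamfer-derived inputs versus, say, a 24×48-pixel ROI fed in raw, and 20 hidden neurons): the first weight matrix shrinks from 1152×20 entries to 48×20.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 48, 20                  # assumed sizes: <50 chamfer inputs, small hidden layer
W1 = 0.1 * rng.standard_normal((n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((1, n_hidden))
b2 = np.zeros(1)

def classify(features):
    """One-hidden-layer forward pass returning a human-likeness score in (0, 1)."""
    h = np.tanh(W1 @ features + b1)
    z = W2 @ h + b2
    return 1.0 / (1.0 + np.exp(-z[0]))

score = classify(rng.standard_normal(n_in))
raw_pixels = 24 * 48                     # an assumed ROI size if pixels were fed in directly
# W1 would need raw_pixels * n_hidden = 23040 weights instead of 48 * 20 = 960.
```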

To improve human-robot interaction, three functions are often required of a robotic system: human detection, human tracking, and pose detection. With these three functions, a robot can detect humans in the image, track specific humans, and interact with them based on their poses, making the interaction between human and robot more accurate and natural. In this thesis, the proposed system has been demonstrated to be successful in human detection. In the future, the schemes developed in this thesis will be further applied to the implementation of the other two functions.
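As a recap of the ROI-selection idea summarized above, the following sketch projects the pixels of one depth slice onto the horizontal axis and keeps column runs whose projected height reaches a threshold, a one-dimensional stand-in for the histogram projection plus CCL used in the thesis. The array sizes, distances, and threshold are made-up illustration values.

```python
import numpy as np

def select_roi_columns(depth, near, far, min_height):
    """Keep runs of image columns whose vertical projection (pixel count
    within the [near, far] depth slice) reaches min_height."""
    mask = (depth > near) & (depth < far)   # occupancy within this depth slice
    proj = mask.sum(axis=0)                 # vertical histogram projection
    keep = proj >= min_height
    runs, start = [], None                  # group consecutive kept columns (1-D CCL)
    for x, k in enumerate(keep):
        if k and start is None:
            start = x
        elif not k and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(keep) - 1))
    return runs

depth = np.full((10, 10), 5.0)   # background wall at 5 m (made-up scene)
depth[1:9, 3:5] = 2.0            # an upright object at 2 m occupying columns 3-4
rois = select_roi_columns(depth, near=1.0, far=3.0, min_height=5)
# rois -> [(3, 4)]
```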

