Chapter 1 Introduction
1.3 Related Work
In recent years, there has been widespread interest in the problem of human detection, and many research results have been reported. The methods proposed so far can be classified into motion-based and appearance-based types, according to the way the targets are represented.
The motion-based approach observes human motion through video sequences. The most direct way is to learn the typical motion patterns of human movement. For example, Viola et al. [1] compute differences between successive images in different directions as human motion patterns, which are then learned by the AdaBoost algorithm to obtain a set of decision rules for detecting people. Little and Boyd [2] take the optical flow of two successive images as feature points of human motion and analyze the motion periodicity to confirm the presence of a moving human.
In [3], Heisele and Woehler extract the corresponding regions of two successive images through a region tracking technique and use a Time-Delay Neural Network to detect the changing frequency of the width of the observed region to achieve human detection.
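As a rough illustration of the frame-difference motion patterns mentioned above, the following minimal sketch computes absolute differences between one frame and the next frame shifted by one pixel in several directions. It is only an assumed reading of the idea, not the implementation of [1]; the function names (`shift`, `abs_diff`, `motion_patterns`, `energy`), the one-pixel shift, and the border replication are all choices made for this example.

```python
def shift(frame, dy, dx):
    """Return a copy of frame shifted by (dy, dx) pixels, replicating border pixels."""
    h, w = len(frame), len(frame[0])
    return [[frame[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)]
            for y in range(h)]


def abs_diff(a, b):
    """Pixel-wise absolute difference of two equally sized images."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def motion_patterns(prev, curr):
    """Difference images between prev and curr, with curr shifted in each direction.

    The set of directions and the one-pixel offset are assumptions of this sketch.
    """
    offsets = {"none": (0, 0), "up": (-1, 0), "down": (1, 0),
               "left": (0, -1), "right": (0, 1)}
    return {name: abs_diff(prev, shift(curr, dy, dx))
            for name, (dy, dx) in offsets.items()}


def energy(image):
    """Sum of all pixel values, used here as a crude summary of a difference image."""
    return sum(sum(row) for row in image)


# Toy example: a bright vertical bar moves one pixel to the right.
prev_frame = [[0, 9, 0, 0, 0] for _ in range(3)]
curr_frame = [[0, 0, 9, 0, 0] for _ in range(3)]
patterns = motion_patterns(prev_frame, curr_frame)
```

In this toy case the difference vanishes when the second frame is shifted back to the left, which hints at rightward motion. A learning algorithm such as AdaBoost would be trained on features derived from such difference images rather than on the raw energies used here.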
Observation of image sequences shows that the gait of a walking human is a distinctive feature, as suggested by Wang et al. [4]. Therefore, some studies make use of this periodic motion feature, gait. Cunado et al. [5] give detailed descriptions and definitions of human gaits and propose a pendulum model to describe the process of human walking. Curio et al. [6] combine texture and contour information extracted from video sequences with the motion patterns of human gaits. Niyogi and Adelson [7] compute the differences of the human silhouette in the XYT (spatio-temporal) frame to obtain the gait pattern for gait detection. Based on the results in [8], Yang Ran et al. [9, 10] proposed a Twin-Pendulum Model to represent the gait of walking people. They use image processing techniques to find the maximal and minimal angles between two limbs as the periodic motion features.
On the other hand, the appearance-based approach uses a set of appearance features of a static human image to detect the presence of the target. This category uses low-level features to represent the possible appearances of a human and applies a standard pattern recognition process to find the corresponding appearance for detecting humans. Broggi et al. [11] use the vertical symmetry properties of the human shape to detect the position and size of a human. Hayfron et al. [12] detect humans by analyzing symmetry information in the spatio-temporal domain. Wu and Yu [13] proposed a two-layer statistical field model that combines the Boltzmann model and the Markov model to describe the features of the non-rigid human shape. Their approach still works even when some parts of the human body are self-occluded. In addition, the two-layer statistical field model flexibly describes the observations from the image. Another appearance-based solution is template matching, which constructs human templates from different viewing angles and poses and detects humans by comparing the appearance features with the constructed templates. To represent human appearance, Gavrila et al. [14, 15] and Liu et al. [16] characterize the human shape by a silhouette or edge image and then convert it into a distance-transformed image.

However, the above approaches fail to detect partially occluded humans because they consider only global features, that is, the entire human shape. Thus, many studies detect humans by detecting each part of the human body and analyzing the relations among the parts to reconstruct the human shape. One example is Mohan et al. [17], who use an Adaptive Combination of Classifiers (ACC) to detect the individual body parts and integrate all part classifiers to classify humans. As further examples, Ramanan et al. [18] propose a pose model based on human body parts and use a set of human poses to lock onto human candidates, which are then tracked by detecting the model generated from each image. Leibe et al. [19] propose an Implicit Shape Model (ISM) to model the relations between the body parts and the body centroid, and then apply a voting process to determine the human's position.

To cope with variations in translation, scale, and orientation, many low-level features have been proposed as well. For example, Oren et al. [20] propose vertical and horizontal Haar wavelets to compute the intensity variations of the target's appearance. Wu and Nevatia [21] and Sabzmeydani and Mori [22] use edgelets and shapelets, respectively, as local features to describe the human shape. An edgelet feature is computed by measuring the similarity between the image and predefined edgelet templates, which differ in the number of edges, in orientation, and in whether they are single or paired. A shapelet feature is similar: it is a set of edgelet features; in other words, a shapelet is a piece of shape. In addition, the work by Dalal et al. [23] was the first to use Histograms of Oriented Gradients (HOG) to represent the features of people, and it has become the performance benchmark in the field of human detection. Based on [23], Zhu et al. [24] use variable feature types to describe humans more flexibly and also reduce the processing time by replacing the single complex classifier with a cascaded set of simple classifiers. Wang et al. [25] rotate each HOG feature according to its orientation to achieve invariance to geometric translation and rotation.
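To make the idea behind a histogram of oriented gradients concrete, the following minimal sketch builds a magnitude-weighted orientation histogram for a single image cell. It is a simplified illustration under stated assumptions, not the pipeline of [23]: the nine unsigned-orientation bins, the centered differences with replicated borders, the hard (non-interpolated) bin assignment, and the function names `cell_histogram` and `l2_normalize` are all choices made for this example.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    cell is a list of rows of gray values; orientations are folded into [0, 180).
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(h):
        for x in range(w):
            # Centered differences; border pixels are replicated.
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard assignment to one bin (no interpolation between bins).
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(vec, eps=1e-6):
    """Scale a feature vector to approximately unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


# Toy cells: one with a vertical edge, one with a horizontal edge.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
hist_vertical = cell_histogram(vertical_edge)
hist_horizontal = cell_histogram(horizontal_edge)
descriptor = l2_normalize(hist_vertical)
```

In this sketch the vertical edge puts all of its gradient energy into the first bin (orientations near 0 degrees), while the horizontal edge falls into the bin containing 90 degrees. A full descriptor would concatenate the normalized histograms of many cells over a detection window before passing them to a classifier.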
Recently, many automobile manufacturers, including Toyota, DaimlerChrysler, BMW, Volvo, and Honda, have invested considerable effort in transferring human detection technology into pedestrian warning systems for intelligent vehicles. For example, Mobileye [26] has developed mature products that are installed on advanced vehicles, but the complete system is still expensive because it includes various types of sensors, such as radar, laser, and camera. Related information and materials on their work can be found on their respective web sites.