Human Tracking - Design of a Computer-Vision-Based Multi-Part Human Tracking System

1. Introduction

1.2 Human Tracking

To track an object in a sequence of frames, we can model the appearance of the object and then use the model to predict its position in the sequence. However, in a complex environment, detecting a target object in video sequences with an appearance model is not easy, because the object's appearance varies with occlusion, illumination changes, and orientation changes. In general, the movement of an object across consecutive frames is assumed to be smooth.

Therefore, if we can locate the target object in several frames, the appearance model and movement model of the target object obtained from these frames can be used to track the object in the following frames.
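The smooth-motion assumption can be illustrated with a simple constant-velocity predictor. This is a minimal sketch, not the motion model used in this study: the function name `predict_next` and the choice of averaging the displacement over all observed frames are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def predict_next(track: List[Point]) -> Point:
    """Predict the next (x, y) position under a constant-velocity assumption.

    The velocity is the average per-frame displacement over the observed
    frames, which smooths out jitter in individual detections.
    """
    if not track:
        raise ValueError("need at least one observed position")
    if len(track) == 1:
        return track[0]  # no motion information yet: assume the target stays put
    steps = len(track) - 1
    vx = (track[-1][0] - track[0][0]) / steps
    vy = (track[-1][1] - track[0][1]) / steps
    x, y = track[-1]
    return (x + vx, y + vy)


print(predict_next([(10.0, 50.0), (12.0, 51.0), (14.0, 52.0)]))  # (16.0, 53.0)
```

A real tracker would typically feed such a prediction into the particle filter as the mean of its proposal distribution rather than trust it directly.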

In this study, we aim to construct the trajectory of a person and predict his positions for safeguarding, that is, to detect an intruder approaching a building or a designated place. Since the human body is not a rigid object, its appearance can be strongly affected by motion. We therefore decomposed the human body into three parts: head, torso, and hip-leg, because these parts usually have different appearances and can be distinguished, as shown in Fig. 1.2. The images show that the head part contains mostly skin and hair colors, which usually differ from the colors of the other two parts. The colors of the torso and hip-leg parts consist mainly of clothing colors, which may be similar, as in the fourth and fifth images of Fig. 1.2. To separate the three parts, we therefore have to use other features, such as height ratios.

Fig. 1.2 Example of the three body parts used in tracking a person. The three body parts are shrunk and the limbs are excluded to reduce the effect of human motion.
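The three-part decomposition can be sketched as cutting a person's bounding box into horizontal bands by height ratio and shrinking each band horizontally to exclude the limbs. The ratios in `PART_BANDS` and the shrink factors in `PART_WIDTH` are hypothetical values chosen for illustration, not the ratios used in this study, and `split_body` is a hypothetical helper.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); y grows downward

# Hypothetical height ratios (fractions of the person's height), not values
# from this study: head, torso, and hip-leg occupy consecutive bands.
PART_BANDS = {"head": (0.0, 0.15), "torso": (0.15, 0.50), "hip_leg": (0.50, 1.0)}

# Hypothetical horizontal shrink factors that drop the arms and the outer
# contour so that limb motion disturbs the part appearance models less.
PART_WIDTH = {"head": 0.5, "torso": 0.6, "hip_leg": 0.5}


def split_body(person: Box) -> Dict[str, Box]:
    """Split a person bounding box into shrunken head/torso/hip-leg boxes."""
    x, y, w, h = person
    parts: Dict[str, Box] = {}
    for name, (top, bottom) in PART_BANDS.items():
        part_w = round(w * PART_WIDTH[name])
        part_x = x + (w - part_w) // 2          # keep each part horizontally centered
        part_y = y + round(h * top)
        part_h = round(h * bottom) - round(h * top)
        parts[name] = (part_x, part_y, part_w, part_h)
    return parts


parts = split_body((100, 40, 60, 200))
print(parts["torso"])
```

Because the bands are contiguous, each part's bottom edge coincides with the next part's top edge; only the widths are shrunk.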

With respect to the features, we adopted the color histograms proposed by Perez et al. [18] and Nummiaro et al. [19] to model the appearances of the three body parts. In the initialization phase, we used background subtraction based on a Gaussian background model to extract a person and then extracted the histograms of the body parts from the person's region. We then tracked each person by using his appearance as the feature and tracked the three parts with particle filters; checking the consistency of the states among the three parts reduces possible failures caused by appearance changes. Since the appearance model of each person in recent frames is usually unique and temporally context-dependent, the model can be used to distinguish different persons and track them independently.

However, when the color histogram is modeled over the whole color space, histogram matching is time-consuming because of the high-dimensional features. The method proposed by Nummiaro et al. [19] quantized the color histogram into an 8×8×8 or 8×8×4 three-dimensional histogram. The method proposed by Perez et al. [18] modeled colors in the HSV color space with two histograms: a one-dimensional histogram of the intensity (V) channel and a two-dimensional histogram of the other two channels (H and S). The histograms were quantized into a small number of bins to improve speed and reduce the effect of noise. However, with these models, two objects whose color distributions differ only slightly are not easily distinguished. In our research, we propose a specific histogram mapping for histogram feature extraction to improve the ability to discriminate objects with similar color distributions.

Since the camera in our system is fixed, the background scene can be assumed to change little between consecutive frames. To improve the discriminability between the target object and background objects, we combined the adaptive background model with the adaptive color histogram model of the target object.
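A quantized color histogram of the kind described above can be sketched in a few lines. This minimal sketch assumes 8-bit RGB pixels and the 8×8×8 quantization attributed to Nummiaro et al.; it is not the specific histogram mapping proposed in this study. The Bhattacharyya coefficient is used here as a common choice for comparing normalized histograms, and the linear blending in `update_model`, with its hypothetical learning rate `alpha`, is one assumed way to make the histogram model adaptive.

```python
import math
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]
BINS = 8  # 8 x 8 x 8 quantization, 512 bins in total


def bin_index(pixel: RGB) -> int:
    """Map an 8-bit RGB pixel to one of BINS**3 histogram bins."""
    r, g, b = (c * BINS // 256 for c in pixel)
    return (r * BINS + g) * BINS + b


def color_histogram(pixels: Iterable[RGB]) -> List[float]:
    """Normalized quantized color histogram of a region's pixels."""
    hist = [0.0] * BINS ** 3
    count = 0
    for p in pixels:
        hist[bin_index(p)] += 1.0
        count += 1
    return [v / count for v in hist] if count else hist


def bhattacharyya(p: List[float], q: List[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def update_model(model: List[float], observed: List[float],
                 alpha: float = 0.1) -> List[float]:
    """Blend a new observation into the model with hypothetical rate alpha."""
    return [(1.0 - alpha) * m + alpha * o for m, o in zip(model, observed)]


red = color_histogram([(250, 10, 10)] * 10)
blue = color_histogram([(10, 10, 250)] * 10)
print(bhattacharyya(red, red), bhattacharyya(red, blue))
```

A variant closer to Perez et al. would convert pixels to HSV and keep a two-dimensional H-S histogram alongside a separate one-dimensional V histogram.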

When color histograms are used as the features of a particle filter, a histogram must be extracted for every particle, which is generally very inefficient for a large number of particles. In this research, we create a cumulative histogram map (CHM) for each image to improve the efficiency of feature extraction. The CHM is similar to the integral image that is widely used for extracting Haar-like features [20], modified to accumulate histogram counts so that the histogram feature of each sample state can be obtained in constant time with respect to the region size.
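One way to read the cumulative histogram map is as an integral histogram: one integral image per histogram bin, so the histogram of any axis-aligned rectangle is recovered from four lookups per bin, independent of the rectangle's size. The sketch below assumes the image has already been quantized into bin indices; `build_chm` and `region_histogram` are hypothetical names, and the exact CHM layout used in this study may differ.

```python
from typing import List

CHM = List[List[List[int]]]


def build_chm(bin_img: List[List[int]], num_bins: int) -> CHM:
    """chm[y][x][b] counts pixels of bin b in rows < y and columns < x."""
    h, w = len(bin_img), len(bin_img[0])
    chm = [[[0] * num_bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            cell = chm[y + 1][x + 1]
            above, left, diag = chm[y][x + 1], chm[y + 1][x], chm[y][x]
            # Standard integral-image recurrence, applied to every bin.
            for b in range(num_bins):
                cell[b] = above[b] + left[b] - diag[b]
            cell[bin_img[y][x]] += 1
    return chm


def region_histogram(chm: CHM, x: int, y: int, w: int, h: int) -> List[int]:
    """Histogram of the rectangle with top-left (x, y), width w, height h.

    Costs O(num_bins) regardless of the rectangle size.
    """
    tl, tr = chm[y][x], chm[y][x + w]
    bl, br = chm[y + h][x], chm[y + h][x + w]
    return [br[k] - tr[k] - bl[k] + tl[k] for k in range(len(br))]


img = [[0, 1, 1],
       [2, 1, 0]]
chm = build_chm(img, num_bins=3)
print(region_histogram(chm, 0, 0, 3, 2))
```

The trade-off is memory: the map stores (H+1)·(W+1)·B counts, so it pays off when many particles query overlapping regions of the same frame.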

For failure detection and adjustment, we use a support vector machine (SVM) [21,22] to distinguish abnormally tracked body parts from normally tracked ones. The position of an abnormal body part is adjusted according to its positions relative to the other body parts. If a single part was abnormal, we adjusted its position and used the system dynamic model to track it. If two or three parts were abnormal, we re-initialized the tracking of all three parts around their predicted positions. Next, we determine whether the failure was caused by occlusion or by similar appearances. In the latter case, we re-estimate the appearance model from the adjusted rectangles of the body parts; otherwise, the appearance model is kept unchanged. The flow diagram of our tracking system is depicted in Fig. 1.3. It includes four major modules: initialization, particle-filter-based tracking, abnormal body part detection, and state correction.

Fig. 1.3 The system flow diagram of the three-part human tracker.
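The correction rule described above can be sketched as follows. The SVM classification is not shown: the per-part `abnormal` flags are taken as given inputs. The expected inter-part offsets, the averaging of the positions implied by the normal parts, and the names `correct_parts` and `offsets` are assumptions for illustration, not details taken from this study.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
PARTS = ("head", "torso", "hip_leg")


def correct_parts(
    centers: Dict[str, Point],
    abnormal: Dict[str, bool],
    offsets: Dict[Tuple[str, str], Point],
) -> Optional[Dict[str, Point]]:
    """Correct a single abnormal part from the normal ones, or request re-initialization.

    offsets[(a, b)] is the assumed expected displacement from part b's center
    to part a's center, e.g. measured in recent frames where all parts were normal.
    Returns None when two or three parts are abnormal, meaning the caller
    should re-initialize all three trackers around their predicted positions.
    """
    bad = [p for p in PARTS if abnormal[p]]
    if not bad:
        return dict(centers)
    if len(bad) >= 2:
        return None
    target = bad[0]
    good = [p for p in PARTS if p != target]
    # Each normal part votes for where the abnormal part should be; average the votes.
    xs = [centers[g][0] + offsets[(target, g)][0] for g in good]
    ys = [centers[g][1] + offsets[(target, g)][1] for g in good]
    corrected = dict(centers)
    corrected[target] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return corrected


centers = {"head": (300.0, 10.0), "torso": (100.0, 80.0), "hip_leg": (100.0, 160.0)}
flags = {"head": True, "torso": False, "hip_leg": False}
offs = {("head", "torso"): (0.0, -50.0), ("head", "hip_leg"): (0.0, -130.0)}
print(correct_parts(centers, flags, offs)["head"])
```

Averaging the votes of both normal parts makes the correction less sensitive to a small error in either one of them.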
