
4.  A Particle Filter with Discriminability Improved Histogram Model

4.2  Target Object Similarity

To calculate the weight of each particle, the probability that the particle belongs to the tracking target should be calculated. For video captured by a fixed camera, the tracking targets are usually the moving objects in the scene. If only one target object moves in the scene, we can model the background image and extract the target object by subtracting the background image from the current frame. However,

Fig. 4.2 A sample image and the histogram of the Cr channel in the two rectangles (head and background).


Fig. 4.2 The two histograms of the Cr channel in Fig. 4.2 quantized into eight bins.

(a) Uniform mapping. (b) Equalized mapping.


when multiple target objects are tracked, background subtraction alone is not enough to distinguish them. To distinguish different targets, Pérez et al. [18] introduced a target appearance model based on color histograms, since color histograms are robust against non-rigid deformation. When the target is moving, its appearance may change due to variations in pose or illumination. If the target appearance changes, the appearance model may fail to detect the target, but the background model can still be used. Therefore, in our study, we integrate the similarities of a particle from both the background model and the target appearance model to form a robust tracking system.

4.2.1 Specific Histogram Mapping

In our application, we aim to track a human in consecutive color images. The color histogram model [19] is robust against partial occlusion, non-rigid deformation, and rotation. However, in our application, the region of a tracking target may be small. When the target occupies a small region, the histogram may be too sparse to represent the color distribution of the region. For instance, if the number of bins is set to 8 × 8 × 8 and the region in the image is 32 × 32 pixels, the expected number of pixels in each bin is only two, which is insufficient to represent the color distribution. To address this, we model the histogram in each color channel independently. Here, we select YCbCr as the color space, since its three channels are assumed to be independent. We divide the values in each channel into eight bins. The expected number of pixels in each bin then becomes 128, which represents the color distribution more reliably. Another benefit of this modification is computational efficiency when comparing the histograms of a particle and the target object, because the total number of bins is reduced to 24.
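The per-channel binning described above can be sketched as follows. This is an illustrative sketch, not the thesis implementation; the function name and the use of NumPy are our own assumptions.

```python
import numpy as np

def per_channel_histogram(region_ycbcr, bins=8):
    """Concatenate one 8-bin histogram per YCbCr channel into a
    24-dimensional feature vector (3 channels x 8 bins)."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(region_ycbcr[..., c], bins=bins, range=(0, 256))
        feats.append(hist / max(hist.sum(), 1))  # normalize each channel
    return np.concatenate(feats)
```

For a 32 × 32 region, each channel contributes 1024 pixels spread over 8 bins, i.e., 128 expected pixels per bin, matching the figure quoted in the text.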

To represent the color histogram in several bins, another important task is how to map a range of color values to each bin. If the range is equally quantized for each bin and the histogram is compact, most pixels may fall into a small number of bins. In such cases, two different histograms cannot easily be distinguished. Fig. 4.2 shows two histograms of the Cr channel, one from the face region of a person and one from a background region; the two histograms are very different. When the ranges are equally divided into eight bins as shown in Fig. 4.2, however, the quantized color distributions of the two regions become very similar. To cope with this problem, we first choose one histogram H as the reference for histogram equalization. The equalization can be denoted as

z = f(H),

where f(·) is a function that equalizes the reference histogram H into an equalized histogram z, represented as a vector. The same function f(·) is then applied to another histogram H' to form a feature vector z' = f(H'). Based on this mapping, we can prevent the pixels of two slightly different color distributions from falling into the same bins. Fig. 4.2 shows the quantized bins of the face region and the background region, with the face region selected as the reference. In the figure, we can easily see that the two quantized histograms are different, especially in the third bin.
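One way to realize such an equalized mapping is to place the bin edges at quantiles of the reference distribution, so the reference data spreads evenly over all bins; the same edges are then reused for any other histogram. The sketch below is an assumed realization (function names are hypothetical, and it assumes the reference has enough distinct values to yield increasing edges).

```python
import numpy as np

def equalized_bin_edges(reference_values, bins=8):
    """Choose bin edges at quantiles of the reference data so the
    reference histogram spreads evenly over all bins."""
    edges = np.percentile(reference_values, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = 0.0, 256.0  # cover the full channel range
    return edges

def mapped_histogram(values, edges):
    """Quantize values with the reference-derived edges and normalize."""
    hist, _ = np.histogram(values, bins=edges)
    return hist / max(hist.sum(), 1)
```

Applied to the reference itself, this yields a nearly flat histogram; applied to a different region, pixels that previously crowded into a few uniform bins are spread apart, making the two distributions distinguishable.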

4.2.2 Target Appearance Model

We model the histogram in each of the three channels of the YCbCr color space, since the three channels are assumed to be independent, and divide the values in each channel into eight bins. In the initialization phase, the color histogram is extracted from the image of the target object. Since the target object is moving, its appearance may change gradually. To adapt to these changes, the histogram model is updated as follows:

H_{t+1} = (1 − α) H_t + α Ĥ_t,  (4.10)

where H_{t+1} and H_t are the histogram models at time t+1 and t, Ĥ_t is the histogram directly extracted from the estimated state at time t, and α is a constant used to control the updating speed. In a frame, the region of a particle state whose color histogram is similar to that of the target object should have a higher probability of belonging to the target object.
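The update rule above is a simple exponential blend, which can be sketched as follows (an illustrative sketch; the function name is our own):

```python
import numpy as np

def update_histogram_model(h_model, h_observed, alpha=0.1):
    """Blend the stored model with the newly observed histogram,
    in the spirit of Eq. (4.10): H_{t+1} = (1 - alpha) H_t + alpha H_obs."""
    return (1.0 - alpha) * np.asarray(h_model) + alpha * np.asarray(h_observed)
```

A small α keeps the model stable under momentary occlusions, while a larger α tracks faster appearance changes.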

4.2.3 Background Appearance Model

To check whether a particle state is located at the position of a moving object, we also examine the differences between the current frame and the background scene. Here, we adopt a Gaussian background model [3] to extract the background image. In general, the background appearance may change due to moving background objects or illumination changes. To adapt to such changes, the background model is updated as

B_{t+1}(x, y) = (1 − β) B_t(x, y) + β I_t(x, y),  (4.11)

where B_t(x, y) and I_t(x, y) are the color vectors at pixel (x, y) of the background image and the frame image at time t, respectively, and β controls the adaptation speed. To detect the foreground object, the background image can be subtracted from the currently processed frame. However, pixel-wise background subtraction is sensitive to background variations such as illumination changes or the vibration of leaves. Since such background variations do not greatly affect the color distribution in a region, we instead extract the color histograms at the positions of the particle states as the background features. In a frame, the region of a particle state whose color histogram is similar to that of the background image should have a lower probability of belonging to the target object.
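The running-average update and the naive pixel-wise subtraction it feeds can be sketched as follows. This is an assumed illustration (function names, the Euclidean color difference, and the threshold value are our own choices, not the thesis implementation):

```python
import numpy as np

def update_background(background, frame, beta=0.05):
    """Running-average background update, in the spirit of Eq. (4.11):
    B_{t+1} = (1 - beta) * B_t + beta * I_t, applied per pixel."""
    return (1.0 - beta) * background + beta * frame

def foreground_mask(frame, background, threshold=30.0):
    """Naive pixel-wise subtraction. As noted in the text, this is
    sensitive to illumination changes and vibrating leaves, which is
    why region-level color histograms are preferred as features."""
    diff = np.linalg.norm(frame - background, axis=-1)
    return diff > threshold
```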

4.2.4 Similarity Measurement

The appearance of a particle belonging to the target object should be similar to the target appearance model but different from the background appearance. As described above, we can model the target color histogram and map it into N bins according to the mapping function defined in Sec. 4.2.1, labeled as h_tar. We can also extract the background color histogram from the adaptive background model in the region of a particle and map it into N bins, labeled as h_bg. To measure the probability of a particle belonging to the target object, we extract the color histogram of the particle from the currently processed image and map it into N bins, labeled as h_par. A particle state with a higher probability of belonging to the target object has the property that the distance between h_par and h_tar is small, while the distance between h_par and h_bg is large. Therefore, the probability used in Eq. (4.6) can be formulated as


Fig. 4.3 An example of the similarity measurement by using the color histograms of target person and background image. (a) Image of a tracking target person, (b) Image of a tracking frame, (c) Background image, (d) Target appearance similarity map of the tracking frame, (e) Background appearance similarity map of the tracking frame, (f) The similarity map by combining background and target appearance models.


p(y_t | x_t) = (1 / (2π |Σ|^{1/2})) exp(−(1/2) dᵀ Σ⁻¹ d),  (4.12)

d = [ D(h_par, h_tar), 1 − D(h_par, h_bg) ]ᵀ,  (4.13)

where D(·, ·) is the distance between two mapped histograms and Σ is the covariance matrix weighting the two terms.
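The combined likelihood can be sketched as below. Note the assumptions: the histogram distance is taken to be the Bhattacharyya distance (a common choice for color-histogram tracking, not stated explicitly here), and the covariance is assumed diagonal with standard deviations sigma_tar and sigma_bg.

```python
import numpy as np

def bhattacharyya_distance(h1, h2):
    """Distance in [0, 1] between two normalized histograms."""
    bc = np.sum(np.sqrt(np.asarray(h1) * np.asarray(h2)))
    return np.sqrt(max(1.0 - bc, 0.0))

def particle_likelihood(h_par, h_tar, h_bg, sigma_tar=0.2, sigma_bg=0.2):
    """Gaussian likelihood on d = [D(h_par, h_tar), 1 - D(h_par, h_bg)],
    assuming a diagonal covariance. High when the particle matches the
    target model AND differs from the background."""
    d_tar = bhattacharyya_distance(h_par, h_tar)       # want small
    d_bg = 1.0 - bhattacharyya_distance(h_par, h_bg)   # want small
    expo = -0.5 * ((d_tar / sigma_tar) ** 2 + (d_bg / sigma_bg) ** 2)
    return np.exp(expo) / (2.0 * np.pi * sigma_tar * sigma_bg)
```

A particle whose histogram matches the target and differs from the background scores higher than one matching the background.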

Fig. 4.3(d) shows the similarity map of Fig. 4.3(b) compared with the target appearance model extracted from the person region of Fig. 4.3(a). Fig. 4.3(e) shows the similarity map of Fig. 4.3(b) compared with the background appearance model extracted from Fig. 4.3(c). Fig. 4.3(f) shows the combination of the two similarity maps according to Eqs. (4.12) and (4.13). In the similarity maps, the gray scale of a pixel represents the similarity of the region, of the same size as the person region in Fig. 4.3(a), centered at that pixel. In Fig. 4.3(f), we can observe that the region of the target person in Fig. 4.3(b) has the largest similarity.

Note that, in a particle filter, the measurement affects the selection of particles in the resampling step. The motion of the selected particles in the next frame is determined by the system dynamic model defined in Eq. (4.1). The noise W in the system dynamic model affects the distribution of the particles. Since we assume that the noise W is Gaussian, the distribution of particles forms a mixture of Gaussians in the state space. The Gaussian-distributed particles cover each important part of the state space and are then resampled according to the weights calculated from the measurements. This selection and generation of particles is characteristic of the particle filter.
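The propagate-then-resample loop described above can be sketched as follows. This is a minimal sketch: the random-walk dynamics stand in for the actual model of Eq. (4.1), and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(states, noise_std=2.0):
    """Apply the system dynamic model: a random-walk placeholder with
    Gaussian noise W standing in for the model of Eq. (4.1)."""
    return states + rng.normal(0.0, noise_std, size=states.shape)

def resample(states, weights):
    """Draw particles with replacement in proportion to their measurement
    weights, so high-likelihood regions keep more particles."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()
    idx = rng.choice(len(states), size=len(states), p=probs)
    return states[idx]
```

Each iteration resamples by the weights from Sec. 4.2.4 and then diffuses the survivors with W, forming the mixture of Gaussians described above.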
