

4.2 Vehicle Detection

4.2.1 Mixture of Gaussians

Different background objects may appear at the same location in a frame over time.

A representative example is a traffic surveillance scene with trees and vehicles partially covering a roadway: the same pixel location then shows values from tree leaves, vehicles, and the roadway itself. The background is therefore not unimodal in this case, so Stauffer and Grimson [14] proposed a multi-valued background model to cope with multiple background objects.


They consider the values of a particular pixel location over time as a "pixel process", {X_1, …, X_t}, which is modeled by a mixture of K Gaussians. The probability of the current pixel value X_t is

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \cdot \eta(X_t, \mu_{i,t}, \Sigma_{i,t})$$

where K is the number of Gaussian distributions and is determined by the various scenes in different applications, ω_{i,t} is the weight of the ith Gaussian at time t, μ_{i,t} is the mean of the ith Gaussian at time t, Σ_{i,t} is the covariance matrix of the ith Gaussian at time t, and η is the Gaussian probability density function:

$$\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(X_t - \mu)^T \Sigma^{-1} (X_t - \mu)\right)$$
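As a concrete illustration, the following minimal NumPy sketch evaluates this mixture probability for a single RGB pixel under the diagonal-covariance assumption of Eq. (12) below; the component values are illustrative and not taken from [14]:

```python
import numpy as np

def mog_pixel_probability(x, weights, means, variances):
    """P(X_t) for one RGB pixel under a K-component Gaussian mixture.

    Assumes the diagonal covariance of Eq. (12): Sigma_k = sigma_k^2 * I,
    so |Sigma_k|^(1/2) = (sigma_k^2)^(n/2) and no matrix inversion is needed.
    x: (3,) current pixel, means: (K, 3), weights/variances: (K,).
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]                                  # n = 3 color channels
    diff2 = ((means - x) ** 2).sum(axis=1)          # squared distances, (K,)
    norm = (2.0 * np.pi * variances) ** (d / 2.0)   # (2*pi)^(n/2) |Sigma|^(1/2)
    densities = np.exp(-0.5 * diff2 / variances) / norm
    return float((weights * densities).sum())

# Example: K = 3 components for one pixel location (illustrative values).
w = np.array([0.6, 0.3, 0.1])
mu = np.array([[90.0, 90.0, 95.0], [200.0, 60.0, 60.0], [30.0, 120.0, 40.0]])
var = np.array([20.0, 36.0, 50.0])
print(mog_pixel_probability([92, 88, 97], w, mu, var))
```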

In addition, for computational efficiency, the covariance matrix is assumed to be of the form

$$\Sigma_{k,t} = \sigma_k^2 I \qquad (12)$$

where k is an integer ranging from 1 to K. This means that the red, green, and blue pixel values are independent and have the same variance. Although this is certainly not the case, the assumption allows us to avoid a costly matrix inversion.

At each frame time t, a criterion is needed to discriminate between the foreground and background distributions. Therefore, each current pixel X_t in the frame is checked against the existing K Gaussian distributions until a match is found. If the pixel matches one of the existing K Gaussian distributions, the pixel belongs to the background; otherwise, it belongs to the foreground. A match is defined as a pixel value within 2.5 standard deviations of a distribution [14]:

$$\| X_t - \mu_{k,t-1} \| < 2.5\,\sigma_{k,t-1}$$
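A sketch of this matching test, assuming the distance between the pixel and the mean is compared against 2.5 standard deviations under the shared-variance model of Eq. (12):

```python
import numpy as np

def match_index(x, means, variances, lam=2.5):
    """Return the index of the first Gaussian matching pixel x, or -1.

    Following [14], a match is a pixel within lam = 2.5 standard
    deviations of a distribution; the Euclidean distance over the three
    channels is used here under the shared-variance assumption.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(variances)
    for k in range(means.shape[0]):
        if np.linalg.norm(x - means[k]) < lam * sigma[k]:
            return k
    return -1
```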


To cope with the changes in scene geometry and illumination that occur in most video sequences, it is necessary to track the changes of the K Gaussian distributions. In other words, the mixture-of-Gaussians background model has to be updated as new frames arrive.

The authors implement an on-line K-means approximation, instead of a costly expectation-maximization (EM) algorithm on a window of the recent data, to estimate the updated model parameters. The weights of the kth Gaussian at time t, ω_{k,t}, are

$$\omega_{k,t} = (1 - \alpha)\,\omega_{k,t-1} + \alpha\,M_{k,t}$$

where α is the learning rate, which determines the speed at which the distributions' parameters are updated, and M_{k,t} is 1 for the matched distribution and 0 for the remaining ones. After this approximation, the weights are renormalized. In addition, the μ and σ parameters of unmatched distributions remain the same. The two parameters of the distribution that matches the current pixel X_t are updated as follows:

$$\mu_t = (1 - \rho)\,\mu_{t-1} + \rho\,X_t$$

$$\sigma_t^2 = (1 - \rho)\,\sigma_{t-1}^2 + \rho\,(X_t - \mu_t)^T (X_t - \mu_t)$$

where the second learning rate, ρ, is

$$\rho = \alpha\,\eta(X_t \mid \mu_k, \sigma_k)$$


If none of the K distributions matches the current pixel X_t, the least probable distribution is replaced by a new distribution that has the current pixel as its mean μ, an initially high variance σ², and an initially low weight ω.
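The full online update step can be sketched as follows; alpha, init_var, and init_weight are illustrative values rather than those used in [14]:

```python
import numpy as np

def eta(x, mu, var):
    """Gaussian density with the diagonal covariance of Eq. (12)."""
    d = x.shape[0]
    return float(np.exp(-0.5 * ((x - mu) ** 2).sum() / var)
                 / (2.0 * np.pi * var) ** (d / 2.0))

def update_mixture(x, weights, means, variances, alpha=0.01,
                   init_var=900.0, init_weight=0.05, lam=2.5):
    """One online K-means-style update of a pixel's mixture, after [14].

    alpha, init_var, and init_weight are illustrative values only.
    Arrays are modified in place.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(variances)
    # find the first matching distribution (within lam standard deviations)
    k = next((i for i in range(len(weights))
              if np.linalg.norm(x - means[i]) < lam * sigma[i]), -1)
    matched = np.zeros_like(weights)
    if k >= 0:
        matched[k] = 1.0
        rho = alpha * eta(x, means[k], variances[k])   # second learning rate
        means[k] = (1.0 - rho) * means[k] + rho * x
        variances[k] = ((1.0 - rho) * variances[k]
                        + rho * float((x - means[k]) @ (x - means[k])))
    else:
        # no match: replace the least probable (lowest w/sigma) distribution
        j = int(np.argmin(weights / sigma))
        means[j], variances[j], weights[j] = x, init_var, init_weight
    # w_{k,t} = (1 - alpha) w_{k,t-1} + alpha M_{k,t}, then renormalize
    weights[:] = (1.0 - alpha) * weights + alpha * matched
    weights /= weights.sum()
```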

While the parameters of each pixel's mixture model change, we would like to determine which Gaussian distributions are most likely produced by the multi-modal background. To model this, a criterion is required for deciding which parts of the mixture model best represent the background. First, all the distributions are ranked based on the ratio between their weight, ω_{k,t}, and standard deviation, σ_{k,t}. This ranking assumes that distributions with higher weight and lower variance, i.e., with more supporting evidence and a more compact spread, are more likely to belong to the background. Then the first B distributions in ranking order that satisfy

$$B = \arg\min_b \left( \sum_{k=1}^{b} \omega_k > T \right)$$

where T is a measure of the minimum portion of the data that should be accounted for by the background, are accepted as background.
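A sketch of this background selection, with T = 0.7 as an illustrative threshold:

```python
import numpy as np

def background_components(weights, variances, T=0.7):
    """Pick the first B components (ranked by w/sigma) whose cumulative
    weight exceeds T; T = 0.7 is an illustrative value."""
    order = np.argsort(weights / np.sqrt(variances))[::-1]  # descending rank
    cum = np.cumsum(weights[order])
    B = int(np.searchsorted(cum, T, side="right") + 1)  # smallest b: sum > T
    return order[:B]                     # indices of background Gaussians
```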

4.2.2 Shadow Elimination

Shadow elimination is a critical issue for robust vision-based systems: moving objects must be distinguished from their moving shadows. Shadows can cause various undesirable effects such as object shape distortion and object merging. To solve these problems, we combine previous shadow removal work based on color reflectance [22] and on the gradient feature [20] to eliminate cast shadows in our framework.

First, we introduce a shadow elimination technique based on color reflectance.

The principle of color reflectance models a pixel value as the multiplication of the light energy and the reflectance of the object, expressed by the following equation:

$$val_C = ener_C \times refl_C \qquad (20)$$

where C stands for a color channel (red, green, or blue), val_C is the value of color C, ener_C is the light energy of color C, and refl_C is the reflectance of color C. A shadowed pixel covers the same surface as the corresponding background pixel, so the reflectance is unchanged and only the light energy is reduced. From this relationship between shadow and background, the following relationship is obtained [43]:

$$\frac{val_C}{bg\_val_C} = \frac{ener_C}{bg\_ener_C} < 1$$

where bg_val_C is the value of color C in the background and bg_ener_C is the light energy of color C in the background. Thus, we realize that if a pixel belongs to shadow, its values must satisfy the following equation:

$$th_S \le \frac{val_C}{bg\_val_C} < 1, \quad C \in \{R, G, B\}$$

where th_S is the threshold for identifying the shadow. Afterward, most shadows are removed from the extracted objects by this method. However, some parts of the vehicles may be classified as shadow at the same time, which results in broken vehicles. In general, a morphological operation is a common approach to recover the broken vehicles, but it cannot recover vehicles that are seriously damaged.
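A minimal sketch of this ratio test, assuming the criterion th_S ≤ val_C / bg_val_C < 1 is applied per channel; th_s = 0.5 is an illustrative threshold:

```python
import numpy as np

def shadow_mask(frame, background, th_s=0.5):
    """Per-pixel shadow test: th_S <= frame/background < 1 in every channel.

    frame, background: HxWx3 uint8 images; th_s = 0.5 is illustrative.
    Returns a boolean HxW mask that is True where the pixel looks like a
    cast shadow (same reflectance, reduced light energy).
    """
    f = frame.astype(np.float32)
    b = background.astype(np.float32) + 1e-6       # avoid division by zero
    ratio = f / b
    return ((ratio >= th_s) & (ratio < 1.0)).all(axis=2)
```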

Thus, before the morphological operation, we use an approach that recovers the broken vehicles based on the gradient feature of the moving vehicles.


The approach for obtaining the gradient feature of the moving vehicles is proposed in [20]. First, gradient images of the moving foreground and its relevant background are calculated. The gradient of the moving foreground contains the gradients of both the moving vehicles and their shadows, whereas the gradient of the relevant background contains only the background gradient. Example gradient images of the moving foreground and the relevant background are shown in Figure 12. From observation, the gradient of the moving vehicles differs from that of the relevant background, while the gradient of the moving shadow is similar to that of the relevant background. Thus, the difference of the two gradient images preserves most of the gradient information in the moving-vehicle area, which represents the skeleton of the vehicles, while the shadow gradient in the shadow region is removed, as shown in Figure 13.
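A sketch of this gradient-difference step using OpenCV's Sobel operator; the threshold th_g is an illustrative assumption, not a value from [20]:

```python
import cv2
import numpy as np

def vehicle_gradient(foreground_gray, background_gray, th_g=30):
    """Keep foreground gradients that are absent from the background.

    foreground_gray, background_gray: single-channel uint8 images.
    Where the two gradient magnitudes differ strongly the edge belongs to
    the vehicle, while shadow edges (similar in both images) are suppressed.
    th_g = 30 is an illustrative threshold.
    """
    def grad_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)  # horizontal changes
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)  # vertical changes
        return cv2.magnitude(gx, gy)

    diff = grad_mag(foreground_gray) - grad_mag(background_gray)
    return (diff > th_g).astype(np.uint8) * 255         # vehicle skeleton
```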

Figure 12. Gradient images of foreground and its relevant background. (a) The moving foreground. (b) The gradient image of the moving foreground. (c) The relevant background. (d) The gradient image of the relevant background. [20]


Figure 13. Result of shadow elimination using the gradient feature. [20]

Finally, we integrate the detected gradient of the moving vehicles with the moving-vehicle regions obtained from the color-reflectance shadow removal to construct more complete moving vehicles. In this way, even when a vehicle's body is seriously damaged, the skeleton of the moving vehicle compensates for the information loss, and the morphological operation can then be applied to recover the vehicle according to its gradient data.
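A sketch of this integration step, assuming both inputs are binary 0/255 masks; the kernel size is illustrative:

```python
import cv2

def recover_vehicle_mask(shadow_removed_mask, skeleton_mask, kernel_size=5):
    """Union the color-reflectance result with the gradient skeleton, then
    close the remaining gaps morphologically."""
    merged = cv2.bitwise_or(shadow_removed_mask, skeleton_mask)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    return cv2.morphologyEx(merged, cv2.MORPH_CLOSE, kernel)
```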

In addition, in order to obtain the gradient information of the moving foreground and the relevant background, a Sobel filter is used to detect the gradients with the horizontal and vertical operators shown in Figure 14.

Figure 14. Operators of the Sobel filter. (a) Operator for horizontal changes. (b) Operator for vertical changes.
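For reference, a sketch that applies the two 3×3 operators of Figure 14 explicitly with NumPy and OpenCV:

```python
import cv2
import numpy as np

# The standard 3x3 Sobel operators (Figure 14).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)   # horizontal changes
SOBEL_Y = SOBEL_X.T                                  # vertical changes

def sobel_gradients(gray):
    """Return the horizontal and vertical gradient images of a
    single-channel image."""
    gx = cv2.filter2D(gray.astype(np.float32), -1, SOBEL_X)
    gy = cv2.filter2D(gray.astype(np.float32), -1, SOBEL_Y)
    return gx, gy
```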

