2.1 Image Analysis Techniques

2.1.1 Pixel-level Methods

In most video surveillance systems, cameras are fixed. This static camera setting simplifies foreground object detection. If we collect the color/intensity feature of a pixel over a period of time, we find, in most cases, that the statistical properties of the foreground color/intensity differ from those of the background color/intensity. Moreover, for most of that period, the color/intensity feature at a pixel belongs to the background. These two observations are the fundamental assumptions of many pixel-based background subtraction methods. Because these methods are simple and effective, background modeling has become one of the most popular tools in video surveillance applications.

In these methods, a background model is learned from the temporal statistical property of every pixel. Based on the learned model, a pixel is classified as either a “background pixel” or a “foreground pixel” according to the current color/intensity observation at that pixel. The current observation is then fed back to update the background model. By learning the statistical property of the background color/intensity online, this background modeling method can efficiently extract foreground regions from the background. Several approaches have been proposed for modeling a time-varying background. Some simpler methods use first-order and second-order statistics (the mean and the variance) to model the temporal property of a pixel [104]. In these simple approaches, a pixel whose color/intensity feature is far from the mean value is classified as a foreground pixel.
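The simplest variant described above, a per-pixel mean and variance with feedback, can be sketched as follows. This is a minimal illustration and not the method of [104]: the class name `RunningGaussianPixel`, the learning rate `alpha`, the threshold of `k` standard deviations, and the initial variance are all assumptions chosen for the example.

```python
import math


class RunningGaussianPixel:
    """Per-pixel background model using a running mean and variance.

    Hypothetical illustration: alpha (learning rate), k (threshold in
    standard deviations) and init_var are assumed values.
    """

    def __init__(self, first_value, alpha=0.05, k=2.5, init_var=25.0):
        self.mean = float(first_value)
        self.var = float(init_var)
        self.alpha = alpha
        self.k = k

    def is_foreground(self, x):
        # A value far from the mean, measured in standard deviations,
        # is classified as foreground.
        return abs(x - self.mean) > self.k * math.sqrt(self.var)

    def update(self, x):
        # Feed the observation back into the model with exponential forgetting.
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1.0 - self.alpha) * self.var + self.alpha * diff * diff

    def observe(self, x):
        # Classify first, then update the model with the same observation.
        foreground = self.is_foreground(x)
        self.update(x)
        return foreground


# Intensity of one pixel over six frames; the fifth frame is an outlier.
model = RunningGaussianPixel(first_value=100)
labels = [model.observe(v) for v in [101, 99, 100, 102, 180, 100]]
```

In this sketch every observation updates the model, including foreground ones, so a single outlier inflates the variance for a while. Whether to update on foreground pixels is a design choice that the text leaves open.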

On the other hand, some methods use more complicated parametric forms to model the dynamic statistics of the color/intensity feature at a pixel. Among those methods, the Gaussian mixture model (GMM) has been widely studied and has proven to be a useful form for background modeling [2]. In principle, the distribution of a pixel value $x$ over time $t$ is formulated as

$$p(x(t)) = \sum_{i=1}^{K} w_i(t) \times g_{au}(x(t), \mu_i, \sigma_i), \qquad (1)$$

where $p(x(t))$ is the probability of observing the current pixel value $x(t)$, $w_i(t)$ is an estimate of the weight of the $i$th Gaussian function $g_{au}(\cdot)$ in the mixture model at time $t$, and $\mu_i$ and $\sigma_i$ are the mean and the standard deviation of the $i$th Gaussian in the mixture at time $t$. An example of the probability distribution of a pixel with a Gaussian mixture model is shown in Fig. 2.

Fig. 2. The probability distribution of a pixel with a Gaussian mixture model [2].
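Evaluating the mixture density of Equation (1) for one pixel can be sketched as follows. The function names and the three-component example mixture are hypothetical, and the pixel value is treated as a scalar intensity for simplicity.

```python
import math


def gaussian_pdf(x, mean, std):
    """One-dimensional Gaussian density g_au(x, mean, std)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Equation (1): weighted sum of K Gaussian densities at pixel value x."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, means, stds))


# Assumed example mixture for one pixel: a dominant dark mode, a brighter
# mode, and a weak, wide mode. The weights sum to 1.
weights = [0.6, 0.3, 0.1]
means = [50.0, 120.0, 200.0]
stds = [5.0, 10.0, 30.0]

p_near_mode = mixture_pdf(52.0, weights, means, stds)   # close to the main mode
p_between = mixture_pdf(85.0, weights, means, stds)     # between two modes
```

A value near the dominant mode receives a much higher density than a value lying between modes, which is the property the classification step relies on.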

To classify a pixel as either a background pixel or a foreground pixel, Stauffer and Grimson [1] suggested first separating the $K$ Gaussian distributions into background Gaussians and foreground Gaussians. Pixels that belong to background Gaussians are classified as background pixels, and pixels that belong to foreground Gaussians are classified as foreground pixels. To separate the $K$ Gaussian distributions, the ratio $w_i/\sigma_i$ of each Gaussian distribution is calculated and used to rank the $K$ Gaussian distributions from large to small, so that Gaussians with a high weight and a low variance, which are the most likely to describe the background, come first.

The first $B$ Gaussian distributions, the smallest set whose probability weights sum to more than a threshold $T$, are treated as background Gaussians. This is formulated as

$$B = \arg\min_{b} \left( \sum_{i=1}^{b} w_i > T \right). \qquad (2)$$
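The ranking and selection step of Equation (2) can be sketched as follows. The function name `select_background` and the example values are hypothetical. The sketch ranks by $w_i/\sigma_i$ in decreasing order so that heavy, narrow Gaussians come first; this ordering is how the example reads the ranking step.

```python
def select_background(weights, stds, threshold):
    """Return the indices of the Gaussians treated as background.

    Gaussians are ranked by w/sigma in decreasing order (assumed here), then
    the smallest prefix whose weights sum to more than `threshold` is kept,
    as in Equation (2).
    """
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] / stds[i],
                   reverse=True)
    cumulative = 0.0
    background = []
    for i in order:
        background.append(i)
        cumulative += weights[i]
        if cumulative > threshold:
            break
    return background


# Assumed mixture for one pixel: w/sigma is 0.03, 0.12 and about 0.003.
weights = [0.3, 0.6, 0.1]
stds = [10.0, 5.0, 30.0]
background = select_background(weights, stds, threshold=0.7)
```

With these values the second Gaussian ranks first, but its weight alone does not exceed the threshold, so the first Gaussian is added as well. A lower threshold keeps only the dominant mode, while a higher one admits multi-modal backgrounds.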

Meanwhile, the parameter sets $\{\mu_i, \sigma_i, w_i\}$ are updated dynamically over time to adapt to environmental variation. Using a recursive filter to approximate the online Expectation-Maximization (EM) algorithm [1], the parameters are updated according to the following formulation:

$$\beta(t) = (1 - \lambda(t))\,\beta(t-1) + \lambda(t)\,Q(x(t), \beta(t-1)). \qquad (3)$$

Here, $\beta(t)$ can be any model parameter of $\{\mu_i, \sigma_i, w_i\}$ at time $t$, $\lambda(t)$ is the parameter learning rate, and $x(t)$ is the new observation at time $t$. The function $Q(\cdot)$ depends on the new observation $x(t)$ and the previous parameter $\beta(t-1)$. Fig. 3 shows detection results based on the Gaussian mixture method.

Fig. 3. Background subtraction results based on the Gaussian mixture model: panels (a) and (b).
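The recursive update of Equation (3) can be sketched as follows. The text does not spell out $Q(\cdot)$, so the two choices below are assumptions that follow a common reading of the Stauffer-Grimson update: for the mean, $Q$ returns the new observation; for the weight, $Q$ returns 1 if the Gaussian matches the observation and 0 otherwise. All function names are hypothetical.

```python
def recursive_update(beta_prev, x, learning_rate, q):
    """Equation (3): blend the previous parameter with Q(x, beta_prev)."""
    return (1.0 - learning_rate) * beta_prev + learning_rate * q(x, beta_prev)


def q_mean(x, mean_prev):
    # Assumed Q for the mean: the target is the new observation itself.
    return x


def make_q_weight(matched):
    # Assumed Q for the weight: 1 if this Gaussian matched x, else 0.
    return lambda x, weight_prev: 1.0 if matched else 0.0


# One Gaussian that keeps matching a pixel whose value has shifted to 104.
mean = 100.0
weight = 0.5
for x in [104.0, 104.0, 104.0]:
    mean = recursive_update(mean, x, learning_rate=0.5, q=q_mean)
    weight = recursive_update(weight, x, learning_rate=0.5,
                              q=make_q_weight(matched=True))
```

The learning rate of 0.5 is deliberately large so that the convergence is visible in three steps; a practical system would typically use a much smaller value so that the model adapts slowly.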

Instead of using a parametric form to model the statistical property of a background pixel, Elgammal et al. [10] proposed a background model based on non-parametric kernel density estimation. In their method, the pixel-wise statistical property along the temporal direction is modeled by a kernel density function. Given $N$ successive intensity values $B_x = \{x_1, x_2, \ldots, x_N\}$ over a temporal period at a pixel, they estimate the probability density function (pdf) to be

$$p(x_t \mid B_x) = \frac{1}{N} \sum_{i=1}^{N} K_{BW}(x_t - x_i). \qquad (4)$$

Here, $x_t$ represents an intensity value, and $K_{BW}$ is the kernel function with bandwidth $BW$.

Assuming that most of the intensity values inside the observed time period belong to the background, a pixel with a smaller probability value $p(x_t \mid B_x)$ is more likely to be a foreground pixel. To adapt to environmental variation over time, this algorithm simply shifts the time window to update the samples used for the estimation of the pdf.
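The kernel density estimate of Equation (4) and the sliding time window can be sketched as follows. The Gaussian kernel shape, the bandwidth, the window length, and the function names are assumptions of this example; the text does not fix the kernel.

```python
import math
from collections import deque


def gaussian_kernel(d, bandwidth):
    """Kernel K_BW(d); the Gaussian shape is an assumption of this sketch."""
    z = d / bandwidth
    return math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))


def kde_probability(x_t, samples, bandwidth):
    """Equation (4): average kernel response of x_t against the N samples."""
    return sum(gaussian_kernel(x_t - x_i, bandwidth)
               for x_i in samples) / len(samples)


# Sliding window holding the N = 5 most recent intensities at one pixel.
window = deque([100, 102, 98, 101, 99], maxlen=5)

p_background_like = kde_probability(100, window, bandwidth=3.0)
p_foreground_like = kde_probability(160, window, bandwidth=3.0)

# Shifting the time window: appending drops the oldest sample automatically.
window.append(103)
```

A pixel would then be labeled foreground when its probability falls below a chosen threshold. That threshold is not given in the text and would have to be tuned.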

To overcome the appearance variations caused by surrounding lighting, a few researchers try to record all possible forms of the background image and then dynamically select the most suitable background image from the stored background image database. Obviously, it would be inefficient to store all possible background images directly in a large database. Hence, Funck et al. [11] assumed that the background images form a Euclidean subspace within the space formed by all image pixels. By applying Principal Component Analysis (PCA) to calculate the major principal components, any background image can be represented as a linear combination of the derived eigen-backgrounds. With this eigen-background representation, an input image is first projected onto the background subspace to find the best-matching background image. By subtracting the matched background image from the input image, foreground objects are identified.
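The eigen-background idea can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the procedure of [11]: the function names, the synthetic 16-pixel scene, the single retained component, and the residual threshold are all hypothetical.

```python
import numpy as np


def fit_eigenbackground(backgrounds, num_components):
    """Learn a mean image and principal components from background images.

    backgrounds: array of shape (num_images, num_pixels), one image per row.
    """
    mean = backgrounds.mean(axis=0)
    # Rows of vt are orthonormal principal directions (eigen-backgrounds),
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(backgrounds - mean, full_matrices=False)
    return mean, vt[:num_components]


def foreground_mask(image, mean, components, threshold):
    """Project onto the background subspace and threshold the residual."""
    coeffs = components @ (image - mean)
    matched_background = mean + components.T @ coeffs
    return np.abs(image - matched_background) > threshold


# Synthetic scene whose overall brightness varies with lighting (assumed data).
rng = np.random.default_rng(0)
base = np.linspace(50.0, 120.0, 16)
lighting = rng.uniform(0.8, 1.2, size=20)
backgrounds = lighting[:, None] * base[None, :]

mean, components = fit_eigenbackground(backgrounds, num_components=1)

frame = 1.1 * base      # the background under a new lighting level
frame[5] += 80.0        # one pixel covered by a foreground object
mask = foreground_mask(frame, mean, components, threshold=30.0)
```

Because the lighting change lies inside the learned subspace, it is absorbed by the matched background image, and only the pixel altered by the object leaves a large residual.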

Although the detection of foreground objects based on pixel-level background modeling works well for a scene with a stationary background, this approach has difficulty handling the occasional appearance ambiguity between a foreground object and its surrounding background [12]. When a foreground object happens to have an appearance similar to that of the surrounding background, the background model alone may not be enough for foreground/background discrimination.

Hence, instead of focusing on the background model, some other researchers proposed learning a model of the foreground target. For instance, Tsai et al. [13] developed a probabilistic method to build a pixel-level car model in the chromatic domain. In their method, the RGB color features of many “car” pixels are collected and converted to a new color domain based on the following transformation:

( ) / 3

To combat the luminance variation problem, only the chromatic information $(u, p)$ is used. The $(u, p)$ values of the “car” pixels cluster compactly in the u-p color space. This cluster can be approximated by a Gaussian function:

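$$p(x_c \mid \text{car}) = \frac{1}{2\pi |\Sigma_c|^{1/2}} \exp\left( -\frac{1}{2} (x_c - \mu_c)^{T} \Sigma_c^{-1} (x_c - \mu_c) \right), \qquad (6)$$

where $\mu_c$ is the estimated chromatic mean based on the training set of “car” pixels, and $\Sigma_c$ is the estimated chromatic covariance matrix. Based on the car probability model in (6), the probability of being a “car” pixel at a pixel with the chromatic feature $x_c$ can be evaluated.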
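Fitting and evaluating a Gaussian car-color model of this kind can be sketched as follows. The sketch starts from chromatic $(u, p)$ pairs directly, since the color transformation itself is not reproduced here. The function names and the training samples are made up for illustration, and the use of the unbiased covariance estimate is an assumption.

```python
import math


def fit_gaussian_2d(samples):
    """Estimate the mean vector and 2x2 covariance of (u, p) samples."""
    n = len(samples)
    mu_u = sum(u for u, _ in samples) / n
    mu_p = sum(p for _, p in samples) / n
    # Unbiased (n - 1) estimates; this normalization is an assumption.
    s_uu = sum((u - mu_u) ** 2 for u, _ in samples) / (n - 1)
    s_pp = sum((p - mu_p) ** 2 for _, p in samples) / (n - 1)
    s_up = sum((u - mu_u) * (p - mu_p) for u, p in samples) / (n - 1)
    return (mu_u, mu_p), ((s_uu, s_up), (s_up, s_pp))


def gaussian_density_2d(x, mean, cov):
    """Bivariate Gaussian density at x for the given mean and covariance."""
    du, dp = x[0] - mean[0], x[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (d * du * du - 2.0 * b * du * dp + a * dp * dp) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


# Made-up chromatic samples standing in for training "car" pixels.
car_samples = [(0.10, 0.20), (0.12, 0.22), (0.09, 0.18),
               (0.11, 0.23), (0.13, 0.19)]
mean, cov = fit_gaussian_2d(car_samples)

p_inside = gaussian_density_2d((0.11, 0.20), mean, cov)    # near the cluster
p_outside = gaussian_density_2d((0.50, 0.60), mean, cov)   # far from it
```

A pixel whose chromatic feature lies near the training cluster receives a high density, while one far from the cluster receives a density close to zero.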

Source document: 貝氏階層式結構於視訊監控之研究與應用 (A Study and Application of Bayesian Hierarchical Structures in Video Surveillance), pp. 29-34