Background Model Maintenance via Density Estimation
3.1.2 Model Accuracy, Robustness and Sensitivity
To estimate a density distribution from a sequence of intensities $I_{0,x}, \ldots, I_{t,x}$ (see footnote 2) for a pixel at position $x$ via the GMM, three issues need to be addressed: model accuracy, robustness, and sensitivity. Specifically, a mixture model consisting of $N$ Gaussian distributions at time instance $t$ can be denoted by
$$P(I_{t,x}) = \sum_{n=1}^{N} w_{t-1,x,n}\, \mathcal{N}\left(I_{t,x};\, \mu_{t-1,x,n},\, \sigma^2_{t-1,x,n}\right),$$
where $\mathcal{N}$ symbolizes a Gaussian probability density function $\mathcal{N}(I; \mu, \sigma^2)$.
Footnote 2: Here $I_{t,x} \in \mathbb{R}$ denotes the 1-D pixel intensity only. However, all our formulations can be easily extended to multi-dimensional color image processing, e.g., $I_{t,x} \in \mathbb{R}^3$.
$\mu_{t-1,x,n}$ and $\sigma^2_{t-1,x,n}$ are the Gaussian parameters of the $n$th model, and $w_{t-1,x,n}$ is the respective mixture weight. To maintain this mixture model, the parameters $\mu_{t-1}$, $\sigma^2_{t-1}$ and $w_{t-1}$ need to be updated based on a new observation $I_{t,x}$. In the GMM, the update rule for $\mu$, for the case that $I_{t,x}$ matches the $n$th Gaussian model, is
$$\mu_{t,x,n} = (1 - \rho)\, \mu_{t-1,x,n} + \rho\, I_{t,x},$$
where $\rho \in [0, 1]$ is a learning rate (see footnote 3) that controls how fast the estimate $\mu$ converges to new observations. Similar update rules can be applied to $\sigma^2$ and $w$, given corresponding learning rates.
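As a minimal sketch of the mixture density defined at the start of this section, the following evaluates $P(I_{t,x})$ for a single pixel. The function names and the three-component parameter values are hypothetical and chosen only for illustration; they do not come from the source.

```python
import math


def gaussian_pdf(i, mu, var):
    """Univariate Gaussian density N(i; mu, var), where var is the variance."""
    return math.exp(-0.5 * (i - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(i, weights, means, variances):
    """P(i) = sum_n w_n * N(i; mu_n, var_n) for one pixel."""
    return sum(w * gaussian_pdf(i, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical three-component model for one pixel (illustrative values only).
weights = [0.6, 0.3, 0.1]
means = [100.0, 150.0, 30.0]
variances = [25.0, 100.0, 400.0]

# An intensity near a heavily weighted mean is far more likely under the
# model than an intensity far away from every component.
p_near = mixture_density(102.0, weights, means, variances)
p_far = mixture_density(220.0, weights, means, variances)
```

Under these assumed parameters, `p_near` exceeds `p_far` by many orders of magnitude, which is the property that later matching and labeling steps rely on.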
In updating the Gaussian parameters $\mu$ and $\sigma^2$, their values should reflect the up-to-date statistics of a scene as accurately as possible. It is thus preferable to set their learning rates to large values so that the Gaussian distributions quickly fit new observations. Also, as noted in [36], setting higher learning rates for $\mu$ and $\sigma^2$ improves model convergence and accuracy, with few side effects on model stability.
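To illustrate how the learning rate governs convergence speed, the sketch below applies the mean update rule repeatedly after a hypothetical step change in intensity from 100 to 150. The helper names, the ten-frame horizon, and the two rates (0.5 and 0.01) are assumptions made for this example only.

```python
def update_mean(mu_prev, intensity, rho):
    """One step of mu_t = (1 - rho) * mu_{t-1} + rho * I_t."""
    return (1.0 - rho) * mu_prev + rho * intensity


def track(mu0, observations, rho):
    """Apply the mean update once per observation and return the final mean."""
    mu = mu0
    for intensity in observations:
        mu = update_mean(mu, intensity, rho)
    return mu


# Hypothetical scenario: the scene intensity jumps from 100 to 150 and
# stays there for ten frames.
observations = [150.0] * 10
mu_fast = track(100.0, observations, rho=0.5)   # large learning rate
mu_slow = track(100.0, observations, rho=0.01)  # small learning rate
```

After ten frames the residual error is $50\,(1-\rho)^{10}$, so the large rate has essentially reached the new intensity while the small rate has covered less than a tenth of the gap.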
While the model estimation accuracy depends on the learning rates for $\mu$ and $\sigma^2$, the robustness-sensitivity (R-S) trade-off is affected by the learning rate for the mixture weight $w$. In the original GMM for background model estimation, Gaussian models are classified into foreground and background by thresholding their mixture weights. The Gaussian models that appear more often receive larger weights in the model updating process and are therefore likely to be labeled as background [54]. However, the frequency of model occurrence should not be the only factor that guides the changes of mixture weights. For example, one may prefer to give large weights to the Gaussian models of tree shadows (for background adaptation) while keeping small weights for those of parked cars (for foreground detection), despite the similar frequencies of occurrence of these two objects. By incorporating high-level information about pixel types, e.g., shadow or car, into the weight updating process, flexible background modeling can be carried out. As more pixel types are designated by a surveillance system, more appropriate controls on weight changes can be applied accordingly, which helps resolve the R-S trade-off in background modeling. Based on this observation, we propose a new bivariate learning rate control scheme for the GMM based on feedback of pixel types.

Footnote 3: The definition of learning rate is inherited from [54].
3.2 Bivariate Learning Rate Control via High-Level Feedback
Our presentation of the proposed bivariate learning rate control via high-level feedback is divided into three parts. Firstly, an algorithm for background model maintenance using the GMM is proposed, wherein two types of learning rates are formally defined. We highlight the importance of learning rate control for mixture weights and elaborate on its relationship to foreground pixel labeling.
Secondly, a feedback scheme that controls the learning rates for mixture weights is detailed. Under this feedback control, different learning rates can be applied to different image locations and scene types, which makes dynamic background adaptation possible. Thirdly, a heuristic based on frame difference is introduced to assist the learning rate control in adapting to very rapid lighting changes. False alarms caused by, for example, sudden changes of sunshine in the background can hence be suppressed by this heuristic, while significant object motions can still be captured.
3.2.1 Background Model Maintenance
Given a new observation of pixel intensity $I_{t,x}$, the task of background model maintenance is to match this new observation to the existing Gaussian distributions, if possible, and to renew all the parameters of the Gaussian mixture model for this pixel. The detailed steps of the proposed background model maintenance using the GMM are shown in Algorithm 2.
For the model matching in Algorithm 2, $l(t, x)$ indexes the best-matched Gaussian model of $I_{t,x}$, if one exists. Otherwise, $l(t, x) = 0$ is set to indicate that $I_{t,x}$ is a brand-new observation and should be modeled by a new Gaussian distribution. The matching results of $I_{t,x}$ can be recorded by model matching indicators, i.e.,
and will be used in the later model update. Unlike [54], which adopts a more complex formulation in model matching, i.e.,
$$l(t, x) = \arg\min_{n=1,\ldots,N} \frac{|I_{t,x} - \mu_{t-1,x,n}|}{\sigma_{t-1,x,n}}, \tag{3.1}$$
a simple rule that selects the model of higher weight as the best match is used in Algorithm 2. The proposed weight-based matching rule prefers matching a pixel observation to the Gaussian model of background (with higher weight) rather than to those of foreground, if this observation falls within the scopes of multiple models. Using this rule not only saves computational cost but also fits the proposed rate control scheme better, as will be discussed in more detail later.
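The difference between the two matching rules can be sketched as follows. The text above does not state when an observation "falls within the scope" of a model, so the gate $|I - \mu| \le T_\sigma \sigma$ with $T_\sigma = 2.5$ is an assumption of this example, as are the function names and parameter values. Indices are 1-based so that 0 can signal "no match", as with $l(t, x)$.

```python
import math

T_SIGMA = 2.5  # assumed gate width in standard deviations (not from the source)


def in_scope(i, mu, var, t_sigma=T_SIGMA):
    """Assumed scope test: observation lies within t_sigma std. devs. of mu."""
    return abs(i - mu) <= t_sigma * math.sqrt(var)


def match_by_weight(i, weights, means, variances):
    """Weight-based rule: among in-scope models, pick the largest weight.

    Returns a 1-based model index, or 0 if no model is in scope.
    """
    candidates = [n for n in range(len(weights))
                  if in_scope(i, means[n], variances[n])]
    if not candidates:
        return 0
    return max(candidates, key=lambda n: weights[n]) + 1


def match_by_distance(i, means, variances):
    """Rule (3.1): pick the model minimizing |I - mu| / sigma (1-based index)."""
    best = min(range(len(means)),
               key=lambda n: abs(i - means[n]) / math.sqrt(variances[n]))
    return best + 1


# Hypothetical pixel model: a dominant wide model and a minor narrow one
# whose scopes overlap around intensity 108.
weights = [0.7, 0.2, 0.1]
means = [100.0, 110.0, 200.0]
variances = [100.0, 16.0, 25.0]

l_weight = match_by_weight(108.0, weights, means, variances)
l_distance = match_by_distance(108.0, means, variances)
l_none = match_by_weight(160.0, weights, means, variances)
```

With these illustrative values, the observation 108 lies in the scope of both the first and second models; the weight-based rule selects the heavier first model, whereas (3.1) selects the second, whose normalized distance is smaller. The observation 160 lies outside every scope and is reported as unmatched.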
After model matching, we check whether $M_{t,x,l(t,x)}$ is equal to 0, which implies that no model is matched. If so, a model replacement is performed to incorporate $I_{t,x}$ into the GMM; otherwise, a model update is executed. In the replacement phase, the least weighted Gaussian model is replaced by the current intensity observation. In the update phase, the following three rules,
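$$\mu_{t,x,l(t,x)} = \left(1 - \rho_{t,x,l(t,x)}(\alpha)\right) \mu_{t-1,x,l(t,x)} + \rho_{t,x,l(t,x)}(\alpha)\, I_{t,x}, \tag{3.2}$$
$$\sigma^2_{t,x,l(t,x)} = \left(1 - \rho_{t,x,l(t,x)}(\alpha)\right) \sigma^2_{t-1,x,l(t,x)} + \rho_{t,x,l(t,x)}(\alpha) \left(I_{t,x} - \mu_{t,x,l(t,x)}\right)^2, \tag{3.3}$$
$$w_{t,x,n} = \left(1 - \eta_{t,x}(\beta)\right) w_{t-1,x,n} + \eta_{t,x}(\beta)\, M_{t,x,n}, \tag{3.4}$$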
are applied, where $\rho_{t,x,l(t,x)}(\alpha) \in \mathbb{R}$ denotes the learning rate for the Gaussian parameters $\mu$ and $\sigma^2$, and $\eta_{t,x}(\beta) \in \mathbb{R}$ is a new learning rate introduced in this research for controlling the updating speed of the mixture weight $w$. Here, the two scalars $\alpha$ and $\beta$ can be viewed as hyper-parameters over $\rho$ and $\eta$ for tuning their values. In [54], the learning rate $\rho$ is defined as
$$\rho_{t,x,l(t,x)}(\alpha) \doteq \alpha\, \mathcal{N}\left(I_{t,x};\, \mu_{t-1,x,l(t,x)},\, \sigma^2_{t-1,x,l(t,x)}\right), \tag{3.5}$$
while in [36] a different definition, (3.6), is used (see footnote 4). Although (3.6) results in quicker convergence in Gaussian parameter learning [36], we still choose (3.5) in our implementation for experimental comparisons and put our emphasis on the control of the learning rate $\eta$ for the mixture weight. In later experiments we will show that better performance can be achieved by controlling the learning rate $\eta$ than by tuning the rate $\rho$. Also, as noted in [36], typical values of $\alpha$ are in $[0.001, 0.1]$ for both (3.5) and (3.6), yielding a wide range of convergence rates in Gaussian parameter estimation. Here we set $\alpha = 0.025$ as a default value for quick model learning.
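The update phase, i.e., rules (3.2) to (3.4) with the learning rate of (3.5), can be sketched for a single pixel as below. This is an illustration under stated assumptions, not the authors' implementation: the matching indicator is taken to be 1 for the matched model and 0 for all others, the value of $\eta$ is passed in directly rather than derived from $\beta$, the replacement phase is omitted, and all names and numeric values other than $\alpha = 0.025$ are hypothetical.

```python
import math


def gaussian_pdf(i, mu, var):
    """Univariate Gaussian density N(i; mu, var), where var is the variance."""
    return math.exp(-0.5 * (i - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_matched(i, l, weights, means, variances, alpha, eta):
    """Update phase for one pixel, given a matched 1-based model index l >= 1.

    Assumes the matching indicator M_n is 1 for n == l and 0 otherwise.
    Returns new (weights, means, variances) without mutating the inputs.
    """
    k = l - 1  # convert to 0-based list index
    # (3.5): rho = alpha * N(I; mu_{t-1}, sigma^2_{t-1})
    rho = alpha * gaussian_pdf(i, means[k], variances[k])

    new_means = list(means)
    new_vars = list(variances)
    # (3.2): mean update for the matched model only
    new_means[k] = (1.0 - rho) * means[k] + rho * i
    # (3.3): variance update, using the freshly updated mean as in the text
    new_vars[k] = (1.0 - rho) * variances[k] + rho * (i - new_means[k]) ** 2
    # (3.4): weight update for every model n, with a shared rate eta
    new_weights = [(1.0 - eta) * w + eta * (1.0 if n == k else 0.0)
                   for n, w in enumerate(weights)]
    return new_weights, new_means, new_vars


# Hypothetical pixel state; the observation 108 is matched to model 1.
weights = [0.7, 0.2, 0.1]
means = [100.0, 110.0, 200.0]
variances = [100.0, 16.0, 25.0]
new_w, new_mu, new_var = update_matched(
    108.0, 1, weights, means, variances, alpha=0.025, eta=0.05)
```

Because exactly one indicator equals 1 under this assumption, the weights still sum to one after the update. Note also that the density in (3.5) is typically much smaller than 1, so the effective $\rho$ is far smaller than $\alpha$ itself.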
In previous background modeling research, e.g., [25], [36], [54], a naive setting for the mixture weight update, i.e.,
$$w_{t,x,n} = (1 - \alpha)\, w_{t-1,x,n} + \alpha\, M_{t,x,n}, \tag{3.7}$$
is adopted. The rule (3.7) can be viewed as a special case of the proposed weight update (3.4) with $\eta_{t,x} = \alpha$. In (3.7), all image pixels are confined to an identical rate setting in mixture weight learning, so that scene changes cannot be properly handled with respect to space and time. Instead, with our generalization, which assigns individual learning rates for mixture weights to image pixels and adapts them over time, higher flexibility in regularizing background adaptation can be obtained. Note that the index $n$ is not attached to $\eta_{t,x}$ because the changing rates for the weights $w_{t,x,n}$, $\forall n$, are designed to be consistent among the $N$ Gaussian models of the same image pixel. Regarding the computation of $\eta_{t,x}$, we link it to the high-level feedback of pixel types and describe the feedback control in Sec. 3.2.2.

Footnote 4: Interested readers can find the details in [36].

Algorithm 2: Background model maintenance
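The practical effect of letting $\eta_{t,x}$ vary per pixel in (3.4), instead of fixing it to $\alpha$ as in (3.7), can be sketched as follows. Two hypothetical pixels start from the same weights and repeatedly match their second model; only their rates differ. The pixel labels, the rate values, the 50-frame horizon, and the assumption that the matching indicator is 1 for the matched model and 0 otherwise are illustrative choices, not taken from the source.

```python
def update_weights(weights, matched, eta):
    """Rule (3.4) for one pixel: one shared rate eta for all N weights.

    `matched` is the 0-based index of the matched model; the matching
    indicator is assumed to be 1 for that model and 0 for the others.
    Setting eta equal to alpha recovers the naive rule (3.7).
    """
    return [(1.0 - eta) * w + eta * (1.0 if n == matched else 0.0)
            for n, w in enumerate(weights)]


# Hypothetical per-pixel rates: one pixel adapts quickly, the other slowly.
eta_map = {"pixel_a": 0.025, "pixel_b": 0.001}
state = {name: [0.9, 0.1] for name in eta_map}

# Both pixels observe their second model (index 1) for 50 consecutive frames.
for _ in range(50):
    for name, eta in eta_map.items():
        state[name] = update_weights(state[name], matched=1, eta=eta)

w_fast = state["pixel_a"][1]
w_slow = state["pixel_b"][1]
```

Under these assumed rates, the second model's weight at the fast pixel climbs from 0.1 to roughly 0.75 within 50 frames, while at the slow pixel it stays below 0.15. A single global $\alpha$, as in (3.7), cannot produce both behaviors in the same frame.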
In the GMM, all scene changes, whether foreground or background, are modeled by Gaussian distributions. To further distinguish these two classes, a foreground indicator $F_{t,x,n}$ for each Gaussian model is defined using the