Background Model Estimation - Two-Level Video Analysis: From Static Background Models to Dynamic Foreground Segmentation

Below we derive the MAP formulation (2.2) of Sec. 2.2.1.

For ease of explanation, the derivation is divided into six parts: classifier training, iterative formulation, posterior probability decomposition, likelihood probability decomposition, background block classification, and the final MAP formulation.

Classifier Training

To begin with, a MAP classifier derived from the training data $D$ is defined by $f^* = \arg\max_f P(f \mid D) = \arg\max_f P(f \mid X, Y)$. It can be interpreted as a supervised learning process that trains an optimal classifier $f^*$ from the training data $D = \{X, Y\}$. With $f^*$ defined, we can derive the following equations to estimate a background model.

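$$
\begin{aligned}
P(\tilde{B}_t \mid I_t, D) &= \int P(\tilde{B}_t \mid I_t, D, f)\, p(f \mid I_t, D)\, df \\
&= \int P(\tilde{B}_t \mid I_t, f)\, p(f \mid D)\, df \\
&\approx P(\tilde{B}_t \mid I_t, f^*),
\end{aligned}
$$

where $p(f \mid D)$ is assumed to peak at the optimal classifier $f^*$ (e.g., see [49], pp. 474–476). The second equality holds because, once the classifier $f$ is given, the training data $D$ carry no further information about $\tilde{B}_t$, and the classifier is trained from $D$ alone, independently of the test frame $I_t$.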
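The approximation can be checked numerically in a toy setting, assumed here purely for illustration: the classifier space is replaced by three hypothetical candidates $f_0, f_1, f_2$, so the integral over $f$ becomes a weighted sum with weights $p(f \mid D)$. The conditional probabilities and posterior weights below are made-up numbers, not values from this work.

```python
# Toy check of P(B_t | I_t, D) ≈ P(B_t | I_t, f*) on a finite classifier space.
# All numbers are hypothetical and chosen only to illustrate the approximation.

def model_average(cond_probs, posterior):
    """Weighted sum  sum_f P(B_t | I_t, f) p(f | D), with p(f | D) renormalized."""
    total = sum(posterior)
    return sum(p * w / total for p, w in zip(cond_probs, posterior))


def map_plugin(cond_probs, posterior):
    """Plug-in value P(B_t | I_t, f*), where f* maximizes p(f | D)."""
    best = max(range(len(posterior)), key=lambda k: posterior[k])
    return cond_probs[best]


# P(B_t | I_t, f_k) for three hypothetical candidate classifiers f_0, f_1, f_2.
cond_probs = [0.90, 0.60, 0.20]
peaked = [0.98, 0.01, 0.01]   # p(f | D) concentrated on f_0
flat = [1 / 3, 1 / 3, 1 / 3]  # p(f | D) spread evenly: the approximation breaks down

for name, posterior in [("peaked", peaked), ("flat", flat)]:
    print(name, round(model_average(cond_probs, posterior), 4),
          map_plugin(cond_probs, posterior))
```

With the peaked posterior the averaged and plug-in values are about 0.89 and 0.90; with the flat posterior they are about 0.57 and 0.90, which is why the step relies on $p(f \mid D)$ being sharply peaked.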

Iterative Formulation

To develop an iterative form for estimating a background model, we first define

$$\tilde{B}^*_t = \arg\max_{\tilde{B}_t} P(\tilde{B}_t \mid I_t, f^*)$$

and

$$\tilde{B}^*_{t-1} = \arg\max_{\tilde{B}_{t-1}} P(\tilde{B}_{t-1} \mid I_{t-1}, f^*).$$

Then we have

(The image frames $I_{t,\ell}$ are used later to compute feature vectors for classification.)
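The effect of conditioning each estimate on the previous one can be sketched with a frame-by-frame loop. This is a minimal sketch under loudly stated assumptions: each "background" is a single gray level chosen from a small candidate set, and the hypothetical function `make_score` stands in for the log-posterior at time $t$ by combining a data term with a penalty `lam` for departing from $\tilde{B}^*_{t-1}$. Neither the candidate set nor the score comes from this work, and the feature window $I_{t,\ell}$ is omitted.

```python
def estimate_backgrounds(frames, candidates, score, initial=None):
    """Return [B*_1, ..., B*_T]; each B*_t maximizes score(b, frame_t, B*_{t-1})."""
    estimates = []
    prev = initial
    for frame in frames:
        best = max(candidates, key=lambda b: score(b, frame, prev))
        estimates.append(best)
        prev = best  # B*_t becomes the conditioning estimate for the next frame
    return estimates


def make_score(lam):
    """Hypothetical log-score: fit to the frame, minus lam if the estimate changes."""
    def score(b, frame, prev):
        data = -abs(b - frame)
        change = 0 if prev is None or b == prev else lam
        return data - change
    return score


# A static background of gray level 10, briefly occluded by a foreground object (50).
frames = [10, 10, 50, 10]
print(estimate_backgrounds(frames, [10, 50], make_score(lam=100)))  # [10, 10, 10, 10]
print(estimate_backgrounds(frames, [10, 50], make_score(lam=0)))    # [10, 10, 50, 10]
```

The loop never revisits earlier frames: everything it keeps about the past is carried by `prev`, which is the role $\tilde{B}^*_{t-1}$ plays in the iterative form. With a large penalty the one-frame occlusion does not replace the background estimate; with no penalty it does.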

Because the classifier $f^*$ performs block-wise (local) classification, and $I_{t-1,\ell-1}$ are the frames used to compute feature values, both are dropped from the prior probability, which measures global consistency over image blocks. That is, we simplify the prior term from $P(\tilde{B}_t \mid I_{t-1,\ell-1}, f^*, \tilde{B}^*_{t-1})$ to $P(\tilde{B}_t \mid \tilde{B}^*_{t-1})$.

Likelihood Probability Decomposition

Applying the assumption of block-wise independence, the likelihood term can be further decomposed as follows.

…the $i$th block of frame $I_t$ from an arbitrary online image stream, and it should be independent of our choice of classifier $f^*$ and of the $i$th block of the background model at time $t-1$, i.e., $\tilde{b}^{*i}_{t-1}$.

Background Block Classification

To utilize background block classification in estimating a background model, we have

Then we derive the decomposition for the image likelihood,

$$P(I_t \mid \tilde{B}_t, I_{t-1,\ell-1}, \tilde{B}^*_{t-1}, f^*) \propto \prod_{i} \cdots$$
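Because the image likelihood factorizes into a product over blocks, an implementation would normally evaluate it as a sum of per-block log-likelihoods, since multiplying thousands of small probabilities directly underflows in double precision. The sketch below uses hypothetical per-block values; the actual per-block factors of the derivation are not reproduced here.

```python
import math


def log_likelihood(block_probs):
    """log prod_i p_i, evaluated as sum_i log p_i to avoid underflow."""
    return sum(math.log(p) for p in block_probs)


# Hypothetical per-block likelihoods for a frame split into 2000 blocks.
block_probs = [1e-3] * 2000

direct = math.prod(block_probs)          # underflows to 0.0 (Python 3.8+ for math.prod)
in_log_space = log_likelihood(block_probs)

print(direct)                            # 0.0
print(in_log_space)                      # about -13815.51, i.e. 2000 * log(1e-3)
```

Since the logarithm is monotonic, maximizing the summed log-likelihood selects the same $\tilde{B}_t$ as maximizing the product.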

With all these derivations, we arrive at the following MAP optimization:

$$\tilde{B}^*_t = \arg\max_{\tilde{B}_t} \cdots$$
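For intuition about the final optimization, the sketch below maximizes a log-posterior of the form "sum of per-block log-likelihoods plus log-prior" by exhaustive search over a single row of blocks. Everything specific is an illustrative assumption: discrete labels per block, the hypothetical table `block_log_lik`, and a hypothetical log-prior for $P(\tilde{B}_t \mid \tilde{B}^*_{t-1})$ that subtracts `lam_t` for every block differing from the previous estimate and `lam_s` for every pair of neighboring blocks with different labels. Exhaustive search costs $K^n$ for $K$ labels and $n$ blocks, so it only serves as a reference on toy inputs; realistic frame sizes would call for an approximate energy minimizer such as graph cuts [5].

```python
import itertools
import math


def map_labels(block_log_lik, prev_labels, lam_t, lam_s):
    """Exhaustive MAP over per-block labels for a 1-D row of blocks.

    block_log_lik[i][k]: hypothetical log-likelihood of label k at block i.
    prev_labels[i]:      label of block i in the previous estimate B*_{t-1}.
    """
    n, num_labels = len(block_log_lik), len(block_log_lik[0])
    best, best_score = None, -math.inf
    for labels in itertools.product(range(num_labels), repeat=n):
        data = sum(block_log_lik[i][k] for i, k in enumerate(labels))
        # Hypothetical log-prior: temporal changes and spatial label boundaries.
        temporal = sum(k != p for k, p in zip(labels, prev_labels))
        spatial = sum(a != b for a, b in zip(labels, labels[1:]))
        score = data - lam_t * temporal - lam_s * spatial
        if score > best_score:
            best, best_score = list(labels), score
    return best, best_score


# Three blocks, two labels; on its own, the middle block weakly prefers label 1.
block_log_lik = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.4), math.log(0.6)],
    [math.log(0.8), math.log(0.2)],
]
prev = [0, 0, 0]

print(map_labels(block_log_lik, prev, lam_t=0.0, lam_s=0.0)[0])  # [0, 1, 0]
print(map_labels(block_log_lik, prev, lam_t=0.0, lam_s=1.0)[0])  # [0, 0, 0]
```

In this toy setting, raising either penalty pulls the weakly supported middle block back to label 0, which is the kind of consistency across blocks that a prior of this type is meant to encourage.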

References

[1] P. M. Q. Aguiar and J. M. F. Moura, “Figure-ground segmentation from occlusion,” IEEE Trans. Image Process., vol. 14, no. 8, pp. 1109–1124, 2005.

[2] ——, “Joint segmentation of moving object and estimation of background in low-light video using relaxation,” in Proc. IEEE Int’l Conf. Image Processing, vol. 5, 2007, pp. 53–56.

[3] S. Ayer and H. S. Sawhney, “Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding,” in Proc. IEEE Int’l Conf. Computer Vision, 1995, pp. 777–784.

[4] T. E. Boult, R. Micheals, X. Gao, P. Lewis, C. Power, W. Yin, and A. Erkan, “Frame-rate omnidirectional surveillance and tracking of camouflaged and occluded targets,” in Proc. Second IEEE Workshop on Visual Surveillance, 1999, pp. 48–55.

[5] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, 2001.

[6] H.-T. Chen, H.-H. Lin, and T.-L. Liu, “Multi-object tracking using dynamical graph matching,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 2001, pp. 210–217.

[7] S. Cohen, “Background estimation as a labeling problem,” in Proc. IEEE Int’l Conf. Computer Vision, vol. 2, 2005, pp. 1034–1041.

[8] A. Criminisi, G. Cross, A. Blake, and V. Kolmogorov, “Bilayer segmentation of live video,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2006, pp. 53–60.

[9] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, “Detecting moving objects, ghosts, and shadows in video streams,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 10, pp. 1337–1342, Oct. 2003.

[10] T. J. Darrell and A. P. Pentland, “Cooperative robust estimation using layers of support,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 5, pp. 474–487, 1995.

[11] F. De la Torre and M. J. Black, “Robust principal component analysis for computer vision,” in Proc. IEEE Int’l Conf. Computer Vision, vol. 1, 2001, pp. 362–369.

[12] A. Demiriz, K. P. Bennett, and J. Shawe-Taylor, “Linear programming boosting via column generation,” Machine Learning, vol. 46, no. 1–3, pp. 225–254, 2002.

[13] A. Elgammal, D. Harwood, and L. S. Davis, “Non-parametric background model for background subtraction,” in Proc. European Conf. Computer Vision, vol. 2, 2000, pp. 751–767.

[14] T. Ellis and M. Xu, “Object detection and tracking in an open and dynamic world,” in Proc. IEEE Int’l Workshop Performance Evaluation of Tracking and Surveillance, 2001.

[15] M. Fradet, P. Pérez, and P. Robert, “Time-sequential extraction of motion layers,” in Proc. IEEE Int’l Conf. Image Processing, 2008, pp. 3224–3227.

[16] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proc. Int’l Conf. Machine Learning, 1996, pp. 148–156.

[17] B. J. Frey, N. Jojic, and A. Kannan, “Learning appearance and transparency manifolds of occluded objects in layers,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2003, pp. 45–52.

[18] J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: a statistical view of boosting,” The Annals of Statistics, vol. 28, no. 2, pp. 337–374, April 2000.

[19] N. Friedman and S. Russell, “Image segmentation in video sequences: A probabilistic approach,” in Proc. Conf. Uncertainty in Artificial Intelligence, 1997, pp. 175–181.

[20] X. Gao, T. E. Boult, F. Coetzee, and V. Ramesh, “Error analysis of background adaption,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition.

[21] W. Grimson, C. Stauffer, R. Romano, and L. Lee, “Using adaptive tracking to classify and monitor activities in a site,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998, pp. 22–29.

[22] D. Gutchess, M. Trajkovics, E. Cohen-Solal, D. Lyons, and A. K. Jain, “A background model initialization algorithm for video surveillance,” in Proc. IEEE Int’l Conf. Computer Vision, vol. 1, 2001, pp. 733–740.

[23] I. Haritaoglu, D. Harwood, and L. S. Davis, “A fast background scene modeling and maintenance for outdoor surveillance,” in Proc. Int’l Conf. Pattern Recognition, vol. 4, 2000, pp. 179–183.

[24] ——, “W4: Real-time surveillance of people and their activities,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 809–830, 2000.

[25] M. Harville, “A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models,” in Proc. European Conf. Computer Vision, vol. 3, 2002, pp. 543–560.

[26] E. Hayman and J.-O. Eklundh, “Statistical background subtraction for a mobile observer,” in Proc. IEEE Int’l Conf. Computer Vision, 2003, pp. 67–74.

[27] M. Heikkilä and M. Pietikäinen, “A texture-based method for modeling the background and detecting moving objects,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 657–662, 2006.

[28] M. Irani and S. Peleg, “Motion analysis for image enhancement: Resolution, occlusion, and transparency,” J. Visual Communication and Image Representation, vol. 4, pp. 324–335, 1993.

[29] N. Jojic and B. J. Frey, “Learning flexible sprites in video layers,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2001, pp. 199–206.

[30] N. Jojic, J. Winn, and L. Zitnick, “Escaping local minima through hierarchical model selection: Automatic object discovery, segmentation, and tracking in video,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2006, pp. 117–124.

[31] Q. Ke and T. Kanade, “A subspace approach to layer extraction,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2001, pp. 255–262.

[32] D.-W. Kim and K.-S. Hong, “Practical background estimation for mosaic blending with patch-based Markov random fields,” Pattern Recognition, vol. 41, no. 7, pp. 2145–2155, 2008.

[33] T. Ko, S. Soatto, and D. Estrin, “Background subtraction on distributions,” in Proc. European Conf. Computer Vision, vol. 3, 2008, pp. 276–289.

[34] V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, and C. Rother, “Bi-layer segmentation of binocular stereo video,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 407–414.

[35] M. P. Kumar, P. H. S. Torr, and A. Zisserman, “Learning layered motion segmentations of video,” Int’l J. Computer Vision, vol. 76, pp. 301–319, 2008.

[36] D.-S. Lee, “Effective Gaussian mixture learning for video background subtraction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 827–832, 2005.

[37] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Trans. Image Process., vol. 13, no. 11, pp. 1459–1472, 2004.

[38] Y. Li, J. Sun, and H.-Y. Shum, “Video object cut and paste,” ACM Trans. Graphics, vol. 24, no. 3, pp. 595–600, 2005.

[39] B. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. DARPA IU Workshop, 1981, pp. 121–130.

[40] V. Mahadevan and N. Vasconcelos, “Background subtraction in highly dynamic scenes,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008, pp. 1–6.

[41] S. J. McKenna, S. Jabri, Z. Duric, A. Rosenfeld, and H. Wechsler, “Tracking groups of people,” Computer Vision and Image Understanding, vol. 80, no. 1, pp. 42–56, October 2000.

[42] A. Mittal and D. Huttenlocher, “Site modeling for wide area surveillance and image synthesis,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 2000, pp. 160–167.

[43] A. Monnet, A. Mittal, N. Paragios, and V. Ramesh, “Background modeling and subtraction of dynamic scenes,” in Proc. IEEE Int’l Conf. Computer Vision, 2003, pp. 1305–1312.

[44] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems 14, 2002.

[45] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, Eds. MIT Press, 2000.

[46] G. Rätsch, T. Onoda, and K.-R. Müller, “Soft margins for AdaBoost,” Machine Learning, vol. 42, pp. 287–320, 2001.

[47] C. Ridder, O. Munkelt, and H. Kirchner, “Adaptive background estimation and foreground detection using Kalman-filtering,” in Proc. Int’l Conf. Recent Advances in Mechatronics, 1995, pp. 193–199.

[48] J. Rittscher, J. Kato, S. Joga, and A. Blake, “A probabilistic background model for tracking,” in Proc. European Conf. Computer Vision, vol. 2, 2000, pp. 336–350.

[49] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, Massachusetts: MIT Press, 2002, pp. 469–516.

[50] M. Seki, T. Wada, H. Fujiwara, and K. Sumi, “Background subtraction based on cooccurrence of image variations,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 2003, pp. 65–72.

[51] Y. Sheikh and M. Shah, “Bayesian modeling of dynamic scenes for object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 11, pp. 1778–1792, 2005.

[52] J. Shi and J. Malik, “Motion segmentation and tracking using normalized cuts,” in Proc. IEEE Int’l Conf. Computer Vision, 1998, pp. 1154–1160.

[53] A. J. Smola and R. Kondor, “Kernels and regularization on graphs,” in Proc. Ann. Conf. Computational Learning Theory / Kernel Workshop, 2003, pp. 144–158.

[54] C. Stauffer and W. Grimson, “Adaptive background mixture models for real-time tracking,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 1999, pp. 246–252.

[55] Y.-L. Tian, M. Lu, and A. Hampapur, “Robust and efficient foreground analysis for real-time video surveillance,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 1182–1187.

[56] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Machine Learning Research, vol. 1, pp. 211–244, Jun. 2001.

[57] P. H. S. Torr, R. Szeliski, and P. Anandan, “An integrated Bayesian approach to layer extraction from image sequence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 3, pp. 297–303, 2001.

[58] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower: Principles and practice of background maintenance,” in Proc. IEEE Int’l Conf. Computer Vision, vol. 1, 1999, pp. 255–261.

[59] D. Wang, T. Feng, H.-Y. Shum, and S. Ma, “A novel probability model for background maintenance and subtraction,” in Proc. Int’l Conf. Vision Interface, 2002, pp. 109–116.

[60] J. Y. A. Wang and E. H. Adelson, “Representing moving images with layers,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 625–638, 1994.

[61] Y. Weiss, “Smoothness in layers: Motion segmentation using nonparametric mixture estimation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 520–527.

[62] Y. Weiss and E. H. Adelson, “A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1996, pp. 321–326.

[63] J. Wills, S. Agarwal, and S. Belongie, “What went where,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 2003, pp. 37–44.

[64] C. R. Wren, A. Azarbayejani, T. J. Darrell, and A. P. Pentland, “Pfinder: Real-time tracking of the human body,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 780–785, July 1997.

[65] J. Xiao and M. Shah, “Motion layer extraction in the presence of occlusion using graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1644–1659, 2005.

[66] H. Yang, Y. Tan, J. Tian, and J. Liu, “Accurate dynamic scene model for moving object detection,” in Proc. IEEE Int’l Conf. Image Processing, vol. 6, 2007, pp. 157–160.

[67] P. Yin, A. Criminisi, J. Winn, and I. Essa, “Tree-based classifiers for bi-layer video segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[68] J. Zhong and S. Sclaroff, “Segmenting foreground objects from a dynamic textured background via a robust Kalman filter,” in Proc. IEEE Int’l Conf. Computer Vision, vol. 1, 2003, pp. 44–50.

[69] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems 16, 2004.

[70] Q. Zhu, S. Avidan, and K.-T. Cheng, “Learning a sparse, corner-based representation for time-varying background modeling,” in Proc. IEEE Int’l Conf. Computer Vision, vol. 1, 2005, pp. 678–685.

[71] Z. Zivkovic, “Improved adaptive Gaussian mixture model for background subtraction,” in Proc. Int’l Conf. Pattern Recognition, vol. 2, 2004, pp. 28–31.

In the document Two-Level Video Analysis: From Static Background Models to Dynamic Foreground Segmentation (pp. 133–148)