Conclusions and Future Work - Virtual Director Subsystem of an Automated Lecture Recording System

6.1 Conclusions

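This study built an automated shot selection system with learning capability, called the "virtual director," which is one subsystem of an automated lecture recording system. The virtual director analyzes and evaluates the images sent from multiple cameras and selects the most suitable one for broadcast. The work consists of three parts: (1) image feature extraction, (2) image quality evaluation, and (3) decision making across multiple views.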

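When assessing image quality, the virtual director uses the common optical and aesthetic analyses, and adds a plot-action analysis to account for the views that viewers are likely to find interesting. In addition, unlike other studies, it takes the smoothness of shot transitions into account, so continuity analysis is also included among the evaluation criteria.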
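To illustrate why continuity matters when switching views, the following is a minimal sketch and not the method used in this study. It assumes each camera already has one combined quality score per frame (for example from the optical, aesthetic, and plot-action analyses). The function name `select_shots` and the parameters `min_shot_frames` and `switch_margin` are hypothetical.

```python
def select_shots(scores_per_frame, min_shot_frames=3, switch_margin=0.1):
    """Pick one camera index per frame while discouraging rapid cuts.

    scores_per_frame: list of per-frame lists, one quality score per camera.
    min_shot_frames:  hypothetical minimum number of frames a shot is held.
    switch_margin:    hypothetical score gain required before cutting away.
    """
    chosen = []
    current = None  # index of the camera currently on air
    held = 0        # number of frames the current shot has been held
    for scores in scores_per_frame:
        best = max(range(len(scores)), key=scores.__getitem__)
        if current is None:
            # First frame: start with the best-scoring camera.
            current, held = best, 1
        elif (best != current
              and held >= min_shot_frames
              and scores[best] - scores[current] > switch_margin):
            # Cut only when the shot is old enough and the gain is clear.
            current, held = best, 1
        else:
            held += 1
        chosen.append(current)
    return chosen
```

Without the two continuity conditions, the selection would follow the per-frame maximum and could cut back and forth on small score fluctuations.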

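To apply these evaluation criteria, the system must first extract the required features from the images. When detecting the subject, it considers both the dynamic cues and the static cues of the subject in the frame and produces a saliency map, rather than merely estimating the subject's position. One advantage of the saliency map is that it can be compared directly with the ROF image to determine whether the subject is in sharp focus. The saliency map also provides the size of the subject, which helps avoid subjects that are too large or too small and therefore visually uncomfortable for viewers.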
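The two uses of the saliency map described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of this study: the saliency map is assumed to be a 2D grid of values in [0, 1], the ROF image is assumed to be available as a grid of 0/1 flags marking in-focus pixels, and the threshold and the acceptable size range are hypothetical values.

```python
def subject_mask(saliency, threshold=0.5):
    """Mark pixels whose saliency reaches the (hypothetical) threshold."""
    return [[value >= threshold for value in row] for row in saliency]


def subject_area_ratio(saliency, threshold=0.5):
    """Fraction of the frame occupied by the salient subject."""
    mask = subject_mask(saliency, threshold)
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("saliency map is empty")
    return sum(flag for row in mask for flag in row) / total


def subject_focus_ratio(saliency, in_focus, threshold=0.5):
    """Fraction of subject pixels that are also flagged as in focus.

    in_focus is an assumed 0/1 grid of the same shape as the saliency map.
    """
    mask = subject_mask(saliency, threshold)
    subject = 0
    sharp = 0
    for mask_row, focus_row in zip(mask, in_focus):
        for is_subject, is_sharp in zip(mask_row, focus_row):
            if is_subject:
                subject += 1
                sharp += bool(is_sharp)
    return sharp / subject if subject else 0.0


def size_is_comfortable(saliency, min_ratio=0.05, max_ratio=0.6, threshold=0.5):
    """Check the subject is neither too small nor too large (assumed bounds)."""
    return min_ratio <= subject_area_ratio(saliency, threshold) <= max_ratio
```

Because both checks read the same mask, a single saliency map yields the subject's size and its sharpness, which a position estimate alone could not provide.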

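For the multi-view decision, real directors do not follow a fixed standard when selecting shots. We therefore propose a learning mechanism that acquires a director's shot selection skills, so that the system operates more like a professional director.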
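The idea of learning shot selection from a director's choices can be illustrated with a small example. This section does not specify the learning model, so the sketch below substitutes a simple multiclass perceptron purely for illustration. The feature layout (one quality score per camera) and the names `train_perceptron` and `predict` are assumptions, and the training data is made up.

```python
def predict(weights, features):
    """Return the index of the camera with the highest linear score."""
    scores = [sum(w * v for w, v in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train_perceptron(samples, labels, n_cameras, epochs=20):
    """Learn one weight vector per camera from a director's labeled choices.

    samples: feature vectors (here, one quality score per camera).
    labels:  index of the camera the director selected for each sample.
    """
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_cameras)]
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            guess = predict(weights, features)
            if guess != target:
                # Move the correct camera's weights toward the sample
                # and the wrongly chosen camera's weights away from it.
                for i, value in enumerate(features):
                    weights[target][i] += value
                    weights[guess][i] -= value
    return weights


# Made-up training data: per-camera scores and the director's selection.
samples = [
    [0.9, 0.2, 0.1],
    [0.1, 0.8, 0.3],
    [0.2, 0.1, 0.7],
    [0.6, 0.3, 0.2],
    [0.2, 0.5, 0.9],
]
labels = [0, 1, 2, 0, 2]
weights = train_perceptron(samples, labels, n_cameras=3)
```

The point of the sketch is only that the decision rule comes from labeled examples rather than from hand-written rules, so different labeled data would produce a different selection behavior.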

6.2 Future Work

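For convenience of recording during the experiments, the virtual director currently runs on a single laptop computer. The system has to receive the data from three cameras while performing a large amount of computation at the same time, so the output video occasionally freezes. To address this, subject detection could be computed on the camera operator side and the results then sent to the virtual director, which would reduce the computational load and make the broadcast smoother.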

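Because the system uses supervised learning, the expected outputs must be provided for the training data. The expected outputs are currently provided by students from communication-related departments who have directing experience. If they could instead be provided by professional directors, the system's results would come closer to a professional standard.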

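Because the system learns how to select shots, it need not be limited to lecture recording in the future. With different kinds of training data, it could be applied to broadcasting or recording ball games, concerts, product launches, and other events. Compared with shot selection systems based on general rules, it can adopt a different decision strategy for each kind of venue, which makes it more flexible.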

