
5. Results and Discussion

5.3. Conclusions and Recommendations

This project deals primarily with single photographs. A single 2D image carries very little direct depth information, so relative depth values must be estimated from whatever depth cues can be found within that one image. The work done in this project is only a first step toward estimating the depth of a single image, and many aspects remain to be improved.

For example, the depth map estimated in this project assigns the same depth value throughout each object. In reality an object has contours, and when the background is photographed at an angle one side should be nearer and the other farther, yet this project assigns a single uniform depth value.

In addition, environmental factors can distort the judgment, and errors caused by object shadows also remain to be addressed. Most depth estimation research works with multiple photographs or with known camera parameters; the focus of this project is therefore on how to estimate depth from a single image.

The estimated depth map is therefore only a preliminary result. Still, in the experimental phase a fellow student in our laboratory used the original image together with the depth map to generate binocular image pairs, and the stereo images shown on a Sharp autostereoscopic display looked very good. Many shortcomings of course remain, and the main areas for improvement can be summarized in five points:

1. The accuracy of region segmentation should be raised. Our experiments all used simple backgrounds; with complex backgrounds, estimation errors increase sharply because regions are misclassified, so more accurate segmentation directly improves depth accuracy.

2. Shadows must be removed. An object under illumination always casts a shadow, and the shadowed area is mistaken for a separate region, so shadow removal is one concrete improvement.

3. This project assigns a single depth value to an entire object or region. The method should instead distinguish whether an object's surface lies in one plane or varies in depth, and whether the background is flat or slanted, and assign different depth values accordingly; sharpness and clarity measures can help here (see the sketch following this list).

4. Environmental factors, most commonly illumination and brightness, have the largest influence of all and will therefore be the main focus of future work.

5. The sharp and blurred parts of a photograph must be told apart. The sharp areas are usually the main subject and the background is usually blurred, which makes judgments in the blurred areas much harder, while the sharp areas still need their own depth ordering.
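Points 3 and 5 both lean on sharpness as a depth cue. The following Python sketch illustrates the general idea only, not the method implemented in this project: blocks are scored by the variance of the Laplacian, a common focus measure, and sharper blocks are mapped to nearer depth values. The block size and the linear mapping are arbitrary illustrative choices.

import numpy as np
from scipy import ndimage

def blockwise_sharpness_depth(gray, block=32):
    """Assign a coarse relative depth per block from a focus measure.

    Sharpness is measured by the variance of the Laplacian; blocks with
    a higher response are treated as nearer (depth closer to 255). This
    is only a heuristic illustration of a sharpness-based depth cue.
    """
    h, w = gray.shape
    lap = ndimage.laplace(gray.astype(np.float64))
    depth = np.zeros((h, w))
    scores, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            scores.append(lap[y:y + block, x:x + block].var())
            coords.append((y, x))
    scores = np.array(scores)
    # Normalize focus scores to [0, 255]: sharpest block -> nearest depth.
    norm = 255.0 * (scores - scores.min()) / (np.ptp(scores) + 1e-9)
    for (y, x), d in zip(coords, norm):
        depth[y:y + block, x:x + block] = d
    return depth.astype(np.uint8)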

These five points broadly summarize the areas this project must improve in the future. Stereoscopic imaging is the trend of the future, so estimating depth with the single-lens cameras that most of the public already owns is bound to become one of the main research topics. Stereoscopic images will let people not only hear one another but also interact through 3D imagery, turning science-fiction film scenes into everyday life.

References

[1] 李獻仁, "立體照片原理與製作 (Lenticular 立體照片)".

[2] Christophe Simon, Frederique Bicking, and Thierry Simon, "Estimation of Depth on Thick Edges from Sharp and Blurred Images", IEEE Instrumentation and Measurement Technology Conference, Vol. 1, pp. 323-328, 2002.

[3] Cassandra Swain, Alan Peters, and Kazuhiko Kawamura, "Depth Estimation from Image Defocus using Fuzzy Logic", Proceedings of the 3rd IEEE International Conference on Fuzzy Systems, Vol. 1, pp. 94-99, Orlando, USA, 1994.

[4] F. Deschênes, D. Ziou, and P. Fuchs, "Homotopy-Based Estimation of Depth Cues in Spatial Domain", IEEE International Conference on Pattern Recognition, Vol. 3, pp. 627-630, 2002.

[5] Satoko Ohtsuka and Shinya Saida, "Depth Perception from Motion Parallax in the Peripheral Vision", IEEE International Workshop on Robot and Human Communication, pp. 72-77, 1994.

[6] F. Deschênes and D. Ziou, "Homotopy-Based Computation of Defocus Blur and Affine Transform", in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), Vol. 1, pp. I-398-404, 2003.

[7] S. Battiato, A. Capra, S. Curti, and M. La Cascia, "3D Stereoscopic Image Pairs by Depth-Map Generation", in Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'04), pp. 124-131, 2004.

[8] H. Yamanoue, "The Differences Between Toed-in Camera Configurations and Parallel Camera Configurations in Shooting Stereoscopic Images", 2006 IEEE International Conference on Multimedia and Expo, pp. 1701-1704, July 2006.

[9] Hirokazu Yamanoue, Masaru Nagayama, Mineo Bitou, and Jun Tanada, "Subjective Study on the Orthostereoscopic Conditions for 3D-HDTV", ITE Technical Report, 21(63), pp. 7-12, 1997.

[10] Hoonjong Kang, Namho Hur, Seunghyun Lee, and Hiroshi Yoshikawa, "Horizontal Parallax Distortion in Toed-in Camera with Wide-Angle Lens for Mobile Device", Optics Communications, Vol. 281, Issue 6, pp. 1430-1437, March 2008.

[11] William and Craig, "Seeing 3D from 2D Images", pp. 115-164.

[12] Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut (HHI), "Depth-Image-Based Rendering (DIBR), Compression and Transmission for a New Approach on 3D-TV", Stereoscopic Displays and Virtual Reality Systems XI, Proceedings of the SPIE, Vol. 5291, pp. 93-104, 2004.

[13] C. Fehn, "A 3D-TV Approach Using Depth-Image-Based Rendering (DIBR)", in Proceedings of the 3rd IASTED Conference on Visualization, Imaging, and Image Processing, pp. 482-487, Benalmádena, Spain, Sep. 2003.

[14] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester, "Image Inpainting", in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2000), pp. 417-424, 2000.

[15] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, "Navier-Stokes, Fluid Dynamics, and Image and Video Inpainting", in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, pp. I-355-I-362, 2001.

[16] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, "Simultaneous Structure and Texture Image Inpainting", IEEE Transactions on Image Processing, Vol. 12, Issue 8, pp. 882-889, Aug. 2003.

[17] Antonio Criminisi, Patrick Pérez, and Kentaro Toyama, "Region Filling and Object Removal by Exemplar-Based Image Inpainting", IEEE Transactions on Image Processing, Vol. 13, Issue 9, pp. 1200-1212, Sept. 2004.

[18] BianRu Li, Yue Qi, and XuKun Shen, "An Image Inpainting Method", Ninth International Conference on Computer Aided Design and Computer Graphics, 6 pp., Dec. 2005.

[19] Shantanu D. Rane, Guillermo Sapiro, and Marcelo Bertalmio, "Structure and Texture Filling-In of Missing Image Blocks in Wireless Transmission and Compression Applications", IEEE Transactions on Image Processing, Vol. 12, Issue 3, pp. 296-303, March 2003.

[20] Z. Tauber, Ze-Nian Li, and M. S. Drew, "Review and Preview: Disocclusion by Inpainting for Image-Based Rendering", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 37, Issue 4, pp. 527-540, July 2007.

[21] Hong-Ming Wang and Jhing-Fa Wang, "Object Removal Algorithm and Hardware in Image Processing", Master's Thesis, Department of Electrical Engineering, National Cheng Kung University, 2004.

[22] Wan-Yu Chen, Yu-Lin Chang, Shyh-Feng Lin, Li-Fu Ding, and Liang-Gee Chen, "Efficient Depth Image Based Rendering with Edge Dependent Depth Filter and Interpolation", 2005 IEEE International Conference on Multimedia and Expo (ICME 2005), July 2005.

[23] "Real-Time Depth Image Based Rendering Hardware Accelerator for Advanced Three Dimensional Television System", 2006 IEEE International Conference on Multimedia and Expo (ICME 2006), July 2006.

Data Sheet of R&D Results Available for Promotion

■ Patentable  ■ Transferable  Date: August 25, 2009

NSC-funded project:
Project title: Development of 3D Stereoscopic Depth Estimation Technology and Its Chip Design
Principal investigator: Fang-Hsuan Cheng
Project number: NSC 96-2221-E-216-039-MY2
Field: Information Engineering

Technology/creation name:
1. Depth estimation from 2D images
2. Image inpainting technology and its hardware design

Inventor/creator: Fang-Hsuan Cheng

Technical description:

1. In order to make stereo images more popular, this research estimates depth information from a single image by finding vanishing lines and vanishing points, then uses the original image and the depth map to compute the left/right binocular image pair and displays the stereo image on a 3D LCD.

2. An image inpainting technique for filling the holes produced in generating the left/right binocular images, together with its hardware architecture and chip design, developed to achieve real-time application.
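As a toy Python sketch of the vanishing-point idea in item 1, under the simplifying assumption that a single already-detected vanishing point drives the whole depth map (the linear gradient is an illustrative choice, not the project's actual estimation procedure):

import numpy as np

def depth_from_vanishing_point(h, w, vp):
    """Grade a coarse depth map from a single vanishing point.

    The vanishing point (vp_y, vp_x) is treated as the farthest scene
    point (depth 0); the pixel farthest from it is treated as nearest
    (depth 255). Any monotone mapping of distance would do.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - vp[0], xs - vp[1])
    return (255.0 * dist / dist.max()).astype(np.uint8)

# Example: a road scene whose vanishing point sits at the horizon's center.
depth = depth_from_vanishing_point(480, 640, vp=(160, 320))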

Applicable industries and potential products:

Industries: optoelectronics, displays, television.

Products: stereoscopic digital photo frames, 3D stereoscopic displays, 3D stereoscopic televisions.

Technical features:

1. Use the generated depth map together with the original image to compute a binocular image pair and display stereo images on a 3D stereoscopic display.

2. Use a dedicated hardware design to fill the holes in the left/right binocular images in real time, completing true 3D stereoscopic display. (A rough software sketch of both steps follows.)
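The two features chain together: the depth map warps the original image into a left/right pair, and hole filling repairs the disocclusions. The Python sketch below shows a naive software version of both steps; the disparity scale, the z-buffer handling of overlaps, and the row-wise fill are illustrative stand-ins, not the project's hardware design.

import numpy as np

def render_stereo_pair(image, depth, max_disp=16):
    """Warp one view into a left/right pair using a per-pixel depth map.

    Each pixel shifts horizontally in proportion to its depth value
    (nearer pixels shift more). A disparity z-buffer keeps foreground
    pixels in front, and disocclusion holes are filled by copying the
    previous pixel in the row -- a crude stand-in for real inpainting.
    """
    h, w, _ = image.shape
    disp = (depth.astype(np.float64) / 255.0 * max_disp).astype(int)
    views = []
    for sign in (+1, -1):                      # +1: left view, -1: right view
        view = np.zeros_like(image)
        zbuf = np.full((h, w), -1, dtype=int)  # winning disparity per pixel
        for y in range(h):
            for x in range(w):
                xs = x + sign * (disp[y, x] // 2)
                if 0 <= xs < w and disp[y, x] > zbuf[y, xs]:
                    view[y, xs] = image[y, x]
                    zbuf[y, xs] = disp[y, x]
        for y in range(h):                     # fill holes along each row
            for x in range(1, w):
                if zbuf[y, x] < 0:
                    view[y, x] = view[y, x - 1]
        views.append(view)
    return views[0], views[1]                  # (left, right)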

Value for promotion and application:

※ 1. Please complete two copies of this form for each R&D result: submit one to the NSC with the project report and send one to your institution's unit for promoting R&D results (e.g., the technology transfer center).

※ 2. If a patent has not yet been applied for on this result, do not disclose the patentable core content.

3. If this form is insufficient, please photocopy additional copies.

May 27, 2009

Reporter: Fang-Hsuan Cheng, Professor, Department of Computer Science and Information Engineering, Chung Hua University

Time and venue: May 19-23, 2009, Keio University, Yokohama, Japan

NSC grant number: NSC 96-2221-E-216-039-MY2

Conference: 2009 IAPR Conference on Machine Vision Applications

Paper presented: "Inter Mode Decision Algorithm For Advanced Video Coding"

1. Conference Proceedings

The conference opened immediately after brief welcoming remarks from the organizers and the program chairs. Because this is a focused, specialist conference, all sessions were held in a single venue so that attendees would not miss any presentation. Attendance was lower than at the previous edition because of the H1N1 flu. Of 144 submitted papers, 122 survived rigorous review; 39 were presented orally and 83 as posters. The three-day program was divided into fifteen sessions and included three invited talks: (1) Large Scale Image Search, (2) Focal Stack Photography: High-Performance Photography with a Conventional Camera, and (3) Integration of Earth Observation Data: Challenge of GEOSS (Global Earth Observation System of Systems), delivered respectively by Dr. Cordelia Schmid (INRIA, France), Prof. Kyros Kutulakos (University of Toronto, Canada), and Prof. Ryosuke Shibasaki (The University of Tokyo, Japan).

My paper, "Inter Mode Decision Algorithm For Advanced Video Coding", was presented as a poster in Session 3 on the afternoon of the first day, as shown in the figure below. Other papers from Taiwan were also presented at this conference, likewise mostly as posters. After three full days of discussion, every participant came away with a rich harvest.


2. Reflections

The conference was held at Keio University's Yokohama campus. Keio University, with its six campuses, is still one of the best universities in Japan, so attending the conference gave a real sense of a first-rate institution, with much worth learning from. The conference positions itself as a focused, high-quality meeting, unlike large catch-all conferences; its purpose is genuine academic exchange among the attending scholars rather than superficial browsing. The three-day program was packed, running from 9:00 a.m. to 6:00 p.m. every day. A banquet was held on the second evening, where the organizers reported conference statistics, such as the 198 registrants, and presented the five best papers of the past ten editions. Four of the five were Japanese papers, which made me curious how the best papers were selected: Taiwan's research standard is also respectable, and Taiwanese participation has consistently been second only to Japan's, yet not one Taiwanese paper was selected, which I find a regrettable omission.

3. Site Visits (omitted if none)

The conference is positioned as a focused, specialist meeting, and the tightly packed three-day program ran from 9:00 a.m. to 6:00 p.m. every day, so there was no time for site visits.

4. Suggestions

At conferences I often run into many scholars and professors from Taiwan. If we made contact before leaving the country and attended together, we could not only save on expenses but also combine our strength at the conference to speak for Taiwan's academic community and let the world fully appreciate Taiwan's standing in the field. Researchers from 29 countries attended this conference, including a dozen or so professors and students from Taiwan; besides Chung Hua University there were National Tsing Hua University, National Chung Cheng University, National Taipei University of Technology, National Formosa University, Chung Yuan Christian University, Da-Yeh University, and Asia University. Perhaps the NSC could open a space on its existing website for exchanging information about attendance at international conferences, so that domestic researchers can keep each other informed, pool their strength, and gauge Taiwan's level of activity in the international academic community.

5. Materials Brought Back

I brought back one printed copy of the conference proceedings, titled Proceedings of the IAPR Conference on Machine Vision Applications, together with a CD-ROM edition of the same proceedings.

6. Other

Inter Mode Decision Algorithm For Advanced Video Coding

Fang-Hsuan Cheng

Department of Computer Science & Information Engineering, Chung Hua University

Hsinchu, Taiwan 300
fhcheng@chu.edu.tw

Yea-Shuan Huang

Department of Computer Science & Information Engineering, Chung Hua University

Hsinchu, Taiwan 300

Abstract

Variable block size for inter coding is one of the key technologies in H.264/AVC. When different objects contained in the same macroblock have different motions, smaller block sizes can achieve better predictions. However, this feature results in extremely high computational complexity when all block sizes must be evaluated to decide the best one.

This paper proposes a new inter mode decision algorithm that reduces the number of inter modes that have to be checked, and thereby the encoding time. We use the co-located macroblock in the previous frame and its neighbors as candidates, and check whether an edge of a moving object crosses the middle of these candidates by means of scores assigned to the modes. Experimental results show that the proposed algorithm reduces total encoding time by 31%-41% and motion estimation time by about 41%-54%, with a negligible average PSNR loss of 0.05 dB and an average bit-rate increase of 2%.

Keywords: Variable Block Size; Motion Estimation; Mode Decision

1. Introduction

Video compression plays an important role in digital video communication, transmission, and storage. H.264/AVC [1-4] is the latest video coding standard, developed by the JVT (Joint Video Team) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG): H.264 descends from VCEG's H.26L line, while AVC (Advanced Video Coding) is MPEG-4 Part 10.

The standard was designed to provide higher coding efficiency and better network adaptation. It comprises a Video Coding Layer (VCL), which represents the video content, and a Network Abstraction Layer (NAL), which provides a network-friendly interface.

Compared with previous video coding standards, H.264/AVC achieves a significant improvement in coding efficiency. This is because the standard adopts a number of new techniques, such as variable block size (VBS) motion estimation, multiple reference frames, quarter-pixel motion estimation, directional prediction of intra-coded blocks, an in-loop deblocking filter, an integer DCT transform, and context-based adaptive binary arithmetic coding (CABAC). As a result, H.264 can save more than half the bit rate of MPEG-2 at the same quality.

Motion estimation (ME) is the main method many video coding standards use to remove redundant information between frames. Like other video encoders, H.264 adopts block-based motion estimation to find the best block match within a predefined search area, and performs variable block size motion estimation to follow the motion of individual objects within a macroblock. Figure 1 shows the seven block sizes and the corresponding mode numbers/symbols in H.264.

These seven block sizes divide into two levels: the macroblock level and the sub-macroblock level. At the macroblock level there are four inter modes plus a skip mode (mode 0), which uses the same size as mode 1. When a macroblock is processed at the sub-macroblock level, it can be further partitioned into 8x8, 8x4, 4x8, and 4x4 block sizes. The same procedure is applied to each of the four sub-macroblocks, processed from left to right and top to bottom, just as the partitions are in modes 2 and 3.

Figure 1. Variable block sizes and corresponding mode numbers.
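To make the cost of exhaustive mode decision concrete, the following toy Python sketch evaluates the macroblock-level modes 1-4 of Figure 1 by full-search SAD matching. Skip mode, sub-macroblock partitioning, motion-vector rate costs, and sub-pixel refinement are all omitted, so this illustrates the search burden rather than a real encoder.

import numpy as np

# Macroblock-level inter modes and their partition sizes (height, width):
# mode 1 = one 16x16, mode 2 = two 16x8, mode 3 = two 8x16, mode 4 = four 8x8.
MODES = {1: (16, 16), 2: (16, 8), 3: (8, 16), 4: (8, 8)}

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def search_partition(cur, ref, y0, x0, bh, bw, rng=8):
    """Full-search motion estimation for one partition; returns best SAD."""
    h, w = ref.shape
    block = cur[y0:y0 + bh, x0:x0 + bw]
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y and y + bh <= h and 0 <= x and x + bw <= w:
                cost = sad(block, ref[y:y + bh, x:x + bw])
                if best is None or cost < best:
                    best = cost
    return best

def best_inter_mode(cur, ref, mb_y, mb_x):
    """Exhaustively evaluate modes 1-4 for the macroblock at (mb_y, mb_x)."""
    costs = {}
    for mode, (bh, bw) in MODES.items():
        total = 0
        for py in range(mb_y, mb_y + 16, bh):   # every partition of this mode
            for px in range(mb_x, mb_x + 16, bw):
                total += search_partition(cur, ref, py, px, bh, bw)
        costs[mode] = total
    return min(costs, key=costs.get), costs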

Thanks to the new techniques described above, H.264/AVC achieves higher coding efficiency than prior video coding standards. However, the huge amount of computation greatly increases encoding time, making the standard difficult to use in practical applications, especially in real-time environments, and inter mode decision still accounts for the largest share of that computation. For this reason, we propose a new inter mode decision algorithm that reduces encoding time with a negligible loss of coding efficiency.
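Since the decision rule is only outlined in this introduction, the following Python fragment is a speculative sketch of the general idea rather than the paper's exact algorithm: the co-located macroblock in the previous frame and its four neighbors cast weighted votes for the modes they used, and only the top-scoring candidates go on to the expensive search. The weights and the number of retained modes are hypothetical parameters.

# Hypothetical weights: the co-located macroblock counts more than neighbors.
CO_LOCATED_WEIGHT = 2
NEIGHBOR_WEIGHT = 1

def candidate_modes(prev_modes, mb_y, mb_x, keep=2):
    """Pick a reduced set of inter modes to check for one macroblock.

    prev_modes maps (mb_y, mb_x) -> the mode chosen in the previous
    frame. The co-located macroblock and its four neighbors vote for
    their modes; only the `keep` highest-scoring modes are handed to
    the full search (best_inter_mode above), skipping the rest.
    """
    votes = {}
    neighborhood = [((mb_y, mb_x), CO_LOCATED_WEIGHT),
                    ((mb_y - 1, mb_x), NEIGHBOR_WEIGHT),
                    ((mb_y + 1, mb_x), NEIGHBOR_WEIGHT),
                    ((mb_y, mb_x - 1), NEIGHBOR_WEIGHT),
                    ((mb_y, mb_x + 1), NEIGHBOR_WEIGHT)]
    for pos, weight in neighborhood:
        mode = prev_modes.get(pos)
        if mode is not None:
            votes[mode] = votes.get(mode, 0) + weight
    if not votes:                       # no history yet: check all modes
        return [1, 2, 3, 4]
    ranked = sorted(votes, key=votes.get, reverse=True)
    return ranked[:keep]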

The rest of this paper is organized as follows. Section 2 reviews related work on inter mode decision in H.264. Section 3 describes the proposed inter mode decision algorithm. Section 4 presents the experimental results, and Section 5 concludes the paper.
