Self-assessment

We proposed an object-based 2D-to-3D conversion algorithm that runs faster than other object-based approaches. With an object-based algorithm, a whole object can be assigned the same depth, so every part of the object appears to be at the same distance from the viewer. This is very important for a comfortable 3D experience.

This algorithm was also published as an oral paper at CVGIP 2011. It should be a good reference for anyone planning further research in this area.

However, this algorithm does not yet reach our final goal: real-time application.

Although we believe that parallel computation can speed up the algorithm considerably, it may still not reach 30 fps on a full HD sequence. For a realistic application, we should try to simplify and speed up the algorithm further.
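To make the real-time target concrete, the following sketch works out the per-frame time budget and pixel throughput implied by 30 fps on full HD. It assumes "full HD" means 1920×1080 pixels; the function and constant names are illustrative and not part of the project.

```python
# Assumption: "full HD" is taken to mean 1920x1080 pixels.
FULL_HD_WIDTH = 1920
FULL_HD_HEIGHT = 1080
TARGET_FPS = 30


def realtime_budget(width, height, fps):
    """Return (milliseconds available per frame, pixels to process per second)."""
    ms_per_frame = 1000.0 / fps
    pixels_per_second = width * height * fps
    return ms_per_frame, pixels_per_second


ms_per_frame, pixels_per_second = realtime_budget(
    FULL_HD_WIDTH, FULL_HD_HEIGHT, TARGET_FPS
)
# Prints: 33.3 ms per frame, 62,208,000 pixels per second
print(f"{ms_per_frame:.1f} ms per frame, {pixels_per_second:,} pixels per second")
```

Under these assumptions, the whole conversion of one frame has to finish in roughly 33 ms.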

Another open issue is that there is no standard for evaluating the quality of 2D-to-3D conversion algorithms, since no ground truth exists for a real 2D sequence.

Furthermore, comfortable 3D content is a complex issue because it depends on human perception. The way of judging it should differ from that used for traditional 2D sequences. We believe that establishing a fair, computable method to judge which algorithm is better will be an important topic in related research.

Published paper

Yi-Chun Chen et al., “Efficient 2D to 3D conversion with Object-Based Segmentation,” Computer Vision, Graphics, and Image Processing (CVGIP), 2011.

Report on Attending an International Academic Conference under a National Science Council (NSC) Funded Research Project

Date: August 2, 2011

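1. Conference Participation

Arrived in Corfu, Greece on July 6. Attended SESSION W1C Image Sequence and Stereoscopic Processing, in which two papers from our laboratory were presented. In the afternoon, attended two further sessions: SESSION W3A Human 3D Perception and 3D Video Assessments, and W2B Architecture and Implementations I. Attended the Welcome Reception in the evening.

On the morning of July 7, attended SESSION T1C Image and Video Coding I and presented our paper. Then attended SESSION T2B Signal Processing for Communications II and listened to several paper presentations. After lunch, attended the afternoon Plenary 3: Binaural Signal Processing, followed by two more sessions: SESSION T3A Signal and Image Restoration, and SESSION T4B Image Processing Applications.

Because our domestic flight was on the afternoon of July 8, we attended only one session that morning, SESSION F1C Image and Video Coding II, and then returned to the hotel to pack and catch the flight.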

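Project number: NSC 99-2221-E-009-185

Project title: Research on Multi-Core Micro Communication Systems for 3D Video Multimedia, Subproject 5: Core Technologies for High-Definition Multi-View Stereoscopic Video (I)

Traveler

Name: Gwo-Long Li (李國龍)

Affiliation and position: Department of Electronics Engineering, National Chiao Tung University

Conference dates: July 6 to July 8, 2011

Conference location: Corfu, Greece

Conference name

(Chinese) 第 17 屆國際電子電機學會數位訊號處理國際研討會

(English) 17th IEEE International Conference on Digital Signal Processing

Title of presented paper

(Chinese) 一個適用於 H.264/AVC 可調式視訊編碼分數點移動估測之有效率模式預先選擇演算法

(English) An Efficient Mode Pre-Selection Algorithm for H.264/AVC Scalable Video Extension Fractional Motion Estimation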

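2. Impressions of the Conference

This is a major conference in the signal processing field, covering communications, speech coding, video coding, image processing, and other topics related to digital signals. I attended several presentation sessions and heard a good number of studies by researchers from abroad. Because the conference was held on a Greek island, I had originally expected the number of papers to be small and their quality possibly low. After attending these sessions, however, I found that the accepted papers appear to be of a solid standard. Attending the conference therefore gave me a better understanding of many research topics in digital signal processing.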

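3. Visits and Tours (omit if none): Omitted

4. Suggestions

Although the conference was small, it was complete in every respect. However, our attendance coincided with a wave of strikes in Greece, and because of a strike no taxis on Corfu were operating on the day we arrived.

Worse, the organizers were not aware of this, so in the end we had to walk to the conference hotel dragging our luggage.

We therefore suggest that, if a strike occurs in the host country in the future, the organizers arrange buses for participants.

5. Materials Brought Back

Conference handbook (containing the conference schedule and the detailed titles of the papers to be presented in each session); conference disc (containing the complete conference information and the full text of all papers)

6. Other

None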

DSP2011 notification for paper 115

2 messages

Gwo-Long Li <[email protected]>

DSP2011 <[email protected]> March 26, 2011, 6:01 AM

To: Gwo-Long Li <[email protected]>

Dear Author(s),

Thank you for submitting your manuscript to the 17th International Conference on Digital Signal Processing (DSP2011). The review process has now been concluded and it is our pleasure to inform you that your proposed paper 115 entitled

AN EFFICIENT FRACTIONAL MOTION ESTIMATION MODE SELECTION ALGORITHM FOR H.264/AVC SCALABLE VIDEO EXTENSION

has been ACCEPTED for publication.

The reviews for your paper are attached below. They served as the basis for the Technical Program Committee's decision. Please make sure that you'll incorporate all the reviewers' suggestions and comments during the preparation of your camera-ready paper. In the next days you will receive additional information regarding your camera ready manuscript submission procedure and deadline.

Please note that for each accepted paper, it is required that at least one author registers (at full rate) for the conference before May 1st, 2011. Papers that are not registered by this deadline will not be included in the proceedings and in the final program. Registration instructions will be sent to you soon.

Thank you for submitting your paper to DSP2011.

Looking forward to welcoming you to Corfu in July!

Best regards,

The DSP2011 Technical Program Chairs

Andreas Floros, Ionian University, Greece

Giovanni Poggi, University of Naples, Italy

--- REVIEW 1 ---

PAPER: 115

TITLE: AN EFFICIENT FRACTIONAL MOTION ESTIMATION MODE SELECTION ALGORITHM FOR H.264/AVC SCALABLE VIDEO EXTENSION

AUTHORS: Gwo-Long Li and Tian-Sheuan Chang

OVERALL RATING: 1 (weak accept)

NOVELTY AND ORIGINALITY: 4 (good)

TECHNICAL CONTENT AND CORRECTNESS: 4 (good)

CLARITY OF PRESENTATION: 4 (good)

RELEVANCE TO THE CONFERENCE: 3 (fair)

For what motion estimation technique are the memory bandwidth requirements reported in sec. 2.1?

The bit rate differences in Fig 4 and 5 should be given in relative numbers (bit rate increase in percent) instead of absolute numbers (bit rate increase in kbps).


It should be further discussed how the effectiveness of the proposed technique depends on the employed motion estimation strategy.

I would further suggest to add rate distortion curves (instead of the PSNR and bit rate difference curves) in order to illustrate the impact on the rate distortion efficiency.

--- REVIEW 2 ---

PAPER: 115

TITLE: AN EFFICIENT FRACTIONAL MOTION ESTIMATION MODE SELECTION ALGORITHM FOR H.264/AVC SCALABLE VIDEO EXTENSION

AUTHORS: Gwo-Long Li and Tian-Sheuan Chang

OVERALL RATING: 1 (weak accept)

NOVELTY AND ORIGINALITY: 3 (fair)

TECHNICAL CONTENT AND CORRECTNESS: 3 (fair)

CLARITY OF PRESENTATION: 4 (good)

RELEVANCE TO THE CONFERENCE: 5 (excellent)

Fast mode decision algorithms have been in use for H.264. What is new in this paper for SVC is a way to avoid the fractional pel motion estimation for the new SVC modes - motion prediction, residual prediction.

The proposed method looks at integer pel cost to avoid some of the fractional pel cost computations. While the basic concept is not new, it may be new when applied in the SVC concept.

Hence the rating for "innovation" and "technical content" are given as average.

Some of the experimental results are surely useful for the SVC community - hence the relevance is given as "excellent" - and seeing from the CFP of the conference, the contents seem to fit in.

Points for potential improvement:

1. Some grammatical mistakes could have been avoided.

2. The introduction directly jumps to an existing fast ME / mode decision algorithm without even a paragraph of background to SVC, and what is currently available for SVC in public implementations.

Gwo-Long Li <[email protected]> March 28, 2011, 11:13 AM

To: [email protected]


AN EFFICIENT MODE PRE-SELECTION ALGORITHM FOR H.264/AVC SCALABLE VIDEO EXTENSION FRACTIONAL MOTION ESTIMATION

Gwo-Long Li and Tian-Sheuan Chang

Department of Electronics Engineering & Institute of Electronics, National Chiao Tung University

ABSTRACT

The H.264/AVC scalable video extension (SVC) adopts various advanced prediction modes to exploit the data redundancies between layers for better coding efficiency, but at the cost of significantly increased computational complexity, especially for hardware realization of fractional motion estimation. This paper proposes an efficient and hardware-friendly mode pre-selection algorithm that passes only the likely prediction modes on to fractional motion estimation, using the pre-selection rules proposed in this paper. Simulation results demonstrate that the proposed algorithm reduces the number of prediction modes by 72.92% on average, with only a 1.24% bitrate increase and 0.02 dB PSNR degradation.

Index Terms— Fractional motion estimation, Mode pre-selection, Inter-layer prediction, H.264/AVC Scalable Extension

1. INTRODUCTION

Fractional motion estimation (FME) is one of the commonly adopted techniques in video coding systems to further improve rate-distortion performance [1]. FME mainly consists of two stages, a half-pixel stage and a quarter-pixel stage, and each stage performs search and interpolation to find the best prediction result. Although FME checks only a few positions around the best motion vectors produced by integer motion estimation (IME), the computational complexities of IME and FME are almost equal, especially in hardware, because of the complicated interpolation process and the large number of prediction modes that FME has to check [2]. As a result, IME and FME are usually placed in two different pipeline stages in hardware designs [3] to achieve higher coding performance.

In the H.264 video coding standard [1], seven block sizes are supported in inter prediction, and each partition size has to be checked by IME and FME one by one to select the best prediction result, as shown in Fig.1. Thus, 41 blocks have to go through IME and FME. In addition to the prediction modes inherent in H.264, the inter-layer prediction mechanism adopted in SVC [4], including inter-layer motion prediction (ILM) and inter-layer motion residual prediction (ILM+R), significantly increases the computational complexity of FME, as shown in Fig.2. As a result, 41×4=164 blocks have to be examined by FME in SVC.

To simplify hardware design, the small blocks ranging from 8×8 to 4×4 are decided early in the IME stage to derive a Sub-mode, so only the partition sizes 16×16, 16×8, 8×16, and Sub-mode (a minimum of 9 and a maximum of 21 blocks) have to be examined by FME, as shown in Fig.3(a). The same Sub-mode early decision can also be applied to SVC to ease the hardware implementation overhead. As a result, only 36 to 84 blocks have to be examined by FME for SVC, as shown in Fig.4. Although the early Sub-mode decision efficiently reduces the FME overhead, the computational complexity of FME is still high. Several works [5-8] have been proposed to increase the coding speed of FME in hardware. Instead of checking all prediction modes, [9,10] proposed a mode pre-selection method, shown in Fig.3(b), that pre-selects the potentially skippable prediction modes before the FME prediction process in H.264. However, none of the above literature addresses SVC. This paper therefore proposes an efficient mode pre-selection algorithm, based on statistical observations, to lighten the computational complexity of FME for SVC.

The rest of this paper is organized as follows. Section 2 introduces observations on the rate distortion cost relationship between different prediction modes and then proposes the mode pre-selection algorithm based on them. Section 3 presents simulation results that demonstrate the efficiency of the proposed algorithm. Section 4 concludes the paper.
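The block counts quoted above (41, 164, 9 to 21, and 36 to 84) can be reproduced with a short calculation. The sketch below assumes a 16×16 macroblock, and assumes the Sub-mode contributes between four 8×8 blocks and sixteen 4×4 blocks, which is consistent with the stated minimum of 9 and maximum of 21. The factor of 4 is taken directly from the text's 41×4=164; all names are illustrative.

```python
MACROBLOCK_SIZE = 16  # assumption: 16x16 macroblock


def blocks_per_macroblock(width, height):
    """Number of width x height blocks needed to tile one macroblock."""
    return (MACROBLOCK_SIZE // width) * (MACROBLOCK_SIZE // height)


# The seven inter partition sizes of H.264.
ALL_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
LARGE_PARTITIONS = [(16, 16), (16, 8), (8, 16)]
SVC_MODE_FACTOR = 4  # taken from the text's 41x4=164

h264_blocks = sum(blocks_per_macroblock(w, h) for w, h in ALL_PARTITIONS)
svc_blocks = h264_blocks * SVC_MODE_FACTOR

# With Sub-mode early decision, only one sub-partitioning reaches FME.
large_blocks = sum(blocks_per_macroblock(w, h) for w, h in LARGE_PARTITIONS)
early_min = large_blocks + blocks_per_macroblock(8, 8)  # Sub-mode = four 8x8
early_max = large_blocks + blocks_per_macroblock(4, 4)  # Sub-mode = sixteen 4x4

print(h264_blocks, svc_blocks)                                   # 41 164
print(early_min, early_max)                                      # 9 21
print(early_min * SVC_MODE_FACTOR, early_max * SVC_MODE_FACTOR)  # 36 84
```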

Fig.1 Illustration of mode selection process of H.264 (partition sizes shown: 16x16, 16x8, 8x16, 8x8)

Fig.2 Illustration of mode selection process of SVC (partition sizes shown: 16x16, 16x8, 8x16, 8x8)

Fig.4 Illustration of mode selection process for SVC (partition sizes shown: 16x16, 16x8, 8x16, Sub-mode)

2. PROPOSED MODE PRE-SELECTION ALGORITHM

In this section, we conduct several analyses to observe the relationship between the rate distortion costs (RDCosts) of IME and FME for different prediction modes.

2.1. Analysis for Inter-layer and Inter prediction modes

Fig.5 shows the relationship between the RDCosts of IME and FME for different prediction modes. In this figure, the vertical axis is the RDCost and the horizontal axis is the macroblock index. InterI and ILMI denote the IME RDCosts of the Inter and ILM modes, and InterF and ILMF denote the FME RDCosts of the Inter and ILM modes, respectively. The figure shows that the RDCosts of IME and FME are very close to each other for the same prediction mode: the IME RDCost is very close to the FME RDCost for the Inter mode, and the same holds for the ILM mode. Therefore, if the IME RDCost of the Inter mode is sufficiently larger than the IME RDCost of the ILM mode, the FME RDCost of the Inter mode will also be larger than the FME RDCost of the ILM mode, and vice versa. We conducted several simulations to confirm this property, and the statistical results are shown in Table 1. In this table, the conditional probability P(A|E) is defined as follows.

achieve up to 85.74% on average. In summary, we conclude that if the IME RDCost of the Inter (ILM) mode is sufficiently larger than the IME RDCost of the ILM (Inter) mode, the FME of the Inter (ILM) mode can be skipped.
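The conclusion of this subsection can be sketched as a simple comparison. The exact rule and its margin are given by equations that did not survive in this copy, so the `threshold` parameter and the function name below are hypothetical stand-ins for "sufficiently larger", not the authors' definition.

```python
def preselect_inter_vs_ilm(ime_cost_inter, ime_cost_ilm, threshold):
    """Return the set of modes that still need FME.

    `threshold` is a hypothetical margin standing in for "sufficiently
    larger"; the source equations defining the real rule are not available.
    """
    keep = {"Inter", "ILM"}
    if ime_cost_inter - ime_cost_ilm > threshold:
        keep.discard("Inter")  # Inter is clearly worse after IME: skip its FME
    elif ime_cost_ilm - ime_cost_inter > threshold:
        keep.discard("ILM")    # ILM is clearly worse after IME: skip its FME
    return keep


print(preselect_inter_vs_ilm(1500, 1000, threshold=200))  # {'ILM'}
print(preselect_inter_vs_ilm(1000, 1500, threshold=200))  # {'Inter'}
```

When the two IME costs differ by no more than the margin, both modes are kept and FME runs on each, which is the conservative choice.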

Fig.5 Relationship between RDCosts of IME and FME of different prediction modes: (a) Football, (b) Foreman sequence

2.2. Analysis for partition size

We further analyze the relationship between the RDCosts of IME and FME for different partition sizes. The four cases listed in Eq.(5) to Eq.(8) are used to produce the results shown in Table 2. According to this table, if the IME RDCost of 16×16 is less than the IME RDCost of 16×8 or 8×16, the FME RDCost of 16×16 is also less than the FME RDCost of 16×8 or 8×16 with a probability of 95.83% on average. Therefore, we can

Fig.6 illustrates the proposed FME mode pre-selection concept, and Fig.7 shows the flowchart of the proposed mode pre-selection algorithm, in which the candidate set of prediction modes Φ is defined as follows.

obtained from the observation of the relationship between different partition sizes. After IME, a macroblock goes through all of the determination steps to filter out the potentially skippable prediction modes.
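The partition-size observation of Section 2.2 suggests a filter of the following kind. The sentence stating the resulting rule is truncated in this copy, so this is only one plausible reading: a 16×8 or 8×16 partition is dropped from FME when its IME RDCost exceeds that of 16×16. The function name and the dictionary layout are assumptions for illustration.

```python
def preselect_partitions(ime_costs):
    """Return the partitions that still need FME.

    `ime_costs` maps a partition name to its IME RDCost. One plausible
    reading of Section 2.2 (an assumption, the source rule is truncated):
    skip 16x8 or 8x16 when 16x16 already has the lower IME RDCost.
    """
    cost_16x16 = ime_costs["16x16"]
    kept = []
    for partition, cost in ime_costs.items():
        if partition in ("16x8", "8x16") and cost_16x16 < cost:
            continue  # 16x16 is cheaper after IME, so FME is skipped here
        kept.append(partition)
    return kept


costs = {"16x16": 900, "16x8": 1000, "8x16": 850, "Sub-mode": 950}
print(preselect_partitions(costs))  # ['16x16', '8x16', 'Sub-mode']
```

The Sub-mode is left untouched here because the surviving text does not say how it is compared.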

Fig.6 Illustration of FME mode pre-selection for SVC

3. SIMULATION RESULTS

In this section, simulation results are presented to demonstrate the performance of the proposed FME mode pre-selection algorithm. The simulation settings are summarized in Table 3, and 12 test sequences with various motion activities are used to produce the results. Table 4 compares the bit rate of the proposed algorithm with that of JSVM9.17 [11]: the bit rate increase of the proposed algorithm is only 1.24% on average. In the PSNR comparison shown in Table 5, the proposed algorithm incurs only 0.02 dB PSNR degradation on average compared with JSVM. The percentage of mode reduction is listed in Table 6: the proposed algorithm achieves a 72.92% mode reduction on average compared with JSVM. For high-motion sequences such as Stefan, Soccer and Football, we can observe that the mode reductions are much higher than for slow- and medium-motion sequences. This is because the RDCost difference between two modes in high-motion sequences is much larger than in slow-motion sequences, so the skippable modes are easy to identify with the proposed mode pre-selection rules listed in Eq.(1) to Eq.(4). Similarly, since the RDCost difference between two modes in slow-motion sequences is marginal, fewer prediction modes can be skipped by the proposed mode pre-selection algorithm.
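As a quick consistency check, the 72.92% average can be recomputed from the per-sequence values listed in Table 6. The snippet below simply copies those values; the variable names are illustrative.

```python
# Per-sequence mode reduction in percent, copied from Table 6.
MODE_REDUCTION_PERCENT = {
    "Akiyo": 62.50, "Dancer": 68.75, "Coastguard": 75.00,
    "Table": 75.00, "Tempete": 75.00, "MD": 68.75,
    "Football": 81.25, "Mobile": 75.00, "Foreman": 75.00,
    "News": 68.75, "Stefan": 75.00, "Soccer": 75.00,
}

average = sum(MODE_REDUCTION_PERCENT.values()) / len(MODE_REDUCTION_PERCENT)
best = max(MODE_REDUCTION_PERCENT, key=MODE_REDUCTION_PERCENT.get)

print(f"average reduction: {average:.2f}%")                # average reduction: 72.92%
print(f"largest: {best} {MODE_REDUCTION_PERCENT[best]}%")  # largest: Football 81.25%
```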

Table 1. Statistical results of Eq.(1)

Sequences     P(A|E)×100%    Sequences    P(A|E)×100%
Akiyo         97.81          Tempete      83.52
Dancer        96.71          Football     83.62
Coastguard    78.53          Foreman      81.59
Table         85.51          M&D          97.26
Mobile        70.61          Soccer       80.11
News          97.37          Stefan       76.25

Table 2. RDCost relationship between partition sizes

Sequences     P(A|E)×100%    Sequences    P(A|E)×100%
Akiyo         99.31%         Tempete      91.77%
Dancer        98.80%         Football     92.74%
Coastguard    97.03%         Foreman      97.61%
Table         97.62%         M&D          99.53%
Mobile        88.12%         Soccer       98.12%
News          98.79%         Stefan       90.54%
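The averages quoted in Section 2 (85.74% for the Inter/ILM analysis and 95.83% for the partition-size analysis) match the means of the per-sequence values in Tables 1 and 2. The sketch below recomputes them from the table entries; the names are illustrative.

```python
# P(A|E) x 100% per sequence, copied from Table 1 and Table 2.
TABLE_1 = {
    "Akiyo": 97.81, "Dancer": 96.71, "Coastguard": 78.53, "Table": 85.51,
    "Mobile": 70.61, "News": 97.37, "Tempete": 83.52, "Football": 83.62,
    "Foreman": 81.59, "M&D": 97.26, "Soccer": 80.11, "Stefan": 76.25,
}
TABLE_2 = {
    "Akiyo": 99.31, "Dancer": 98.80, "Coastguard": 97.03, "Table": 97.62,
    "Mobile": 88.12, "News": 98.79, "Tempete": 91.77, "Football": 92.74,
    "Foreman": 97.61, "M&D": 99.53, "Soccer": 98.12, "Stefan": 90.54,
}


def mean(values):
    """Arithmetic mean of a non-empty collection of numbers."""
    return sum(values) / len(values)


table_1_mean = mean(TABLE_1.values())
table_2_mean = mean(TABLE_2.values())
print(f"Table 1 average: {table_1_mean:.2f}%")  # Table 1 average: 85.74%
print(f"Table 2 average: {table_2_mean:.2f}%")  # Table 2 average: 95.83%
```

Both figures are averages over the 12 sequences; individual sequences such as Mobile fall below them.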

Table 3. Simulation settings

Reference software                  JSVM9.17 [11]
QP for spatial base layer           38
QP for spatial enh. layer           32
Frame size in spatial base layer    QCIF
Frame size in spatial enh. layer    CIF
Frames to be encoded                150
GOP                                 8

Table 4. Bitrate comparison of proposed algorithm

Sequence    JSVM     Proposed    Increasing (%)
Akiyo       29.23    29.72       1.69

Table 5. PSNR comparison of proposed algorithm

Sequence    JSVM     Proposed    Difference (dB)
Akiyo       39.94    39.92       -0.02

Table 6. Averaged mode reduction of our proposal (Unit: %)

Akiyo       Dancer    Coastguard    Table    Tempete    MD
62.50       68.75     75.00         75.00    75.00      68.75

Football    Mobile    Foreman       News     Stefan     Soccer
81.25       75.00     75.00         68.75    75.00      75.00

4. CONCLUSION

In this paper, an efficient mode pre-selection algorithm for fractional motion estimation is proposed to reduce the computational complexity of fractional motion estimation in SVC. By observing the relationship between the IME RDCosts and FME RDCosts of different prediction modes, several mode pre-selection rules are proposed to reject potentially skippable modes before the FME operation. In addition, since the proposed mode pre-selection algorithm consists only of a few simple additions, subtractions, and comparisons, it can be easily realized in hardware.

Simulation results demonstrate that the proposed algorithm reduces the number of prediction modes entering the FME operation by 72.92% on average. In addition, the bitrate increase and PSNR degradation of the proposed algorithm are only 1.24% and 0.02 dB on average, respectively.


[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), March 2003.

[2] T.-C. Wang, Y.-W. Huang, H.-C. Fang, and L.-G. Chen, “Performance analysis of hardware oriented algorithm modifications in H.264,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 493-496, April 2003.

[3] Y.-K. Lin, D.-W. Li, C.-C. Lin, T.-Y. Kuo, S.-J. Wu, W.-C. Tai, W.-C. Chang, and T.-S. Chang, “A 242mW, 10mm2 1080p H.264/AVC High Profile Encoder Chip,” in Proceedings of International Solid-State Circuits Conference (ISSCC), pp. 314-315, Feb. 2008.

[4] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103-1120, September 2007.

[5] H. Nisar and T.-S. Choi, “Fast and efficient fractional pixel motion estimation for H.264/AVC video coding,” in Proceedings of IEEE International Conference on Image Processing, pp. 1561-1564, 2008.

[6] C.-Y. Kao, C.-L. Wu, and Y.-L. Lin, “A High-Performance Three-Engine Architecture for H.264/AVC Fractional Motion Estimation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 4, pp. 662-666, April 2010.

[7] Y.-J. Wang, C.-C. Cheng, and T.-S. Chang, “A fast algorithm and its VLSI architecture for fractional motion estimation for H.264/MPEG-4/AVC video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 5, pp. 578-583, May 2007.

[8] G. Kim, J. Kim, and C.-M. Kyung, “A Low cost single-pass fractional motion estimation architecture using bit clipping for H.264 video codec,” in Proceedings of IEEE International Conference on Multimedia and Expo, pp. 661-662, 2010.

[9] C.-C. Yang, K.-J. Tan, Y.-C. Yang, and J.-I. Guo, “Low complexity fractional motion estimation with adaptive mode selection for H.264/AVC,” in Proceedings of IEEE International Conference on Multimedia and Expo, pp. 673-678, 2010.

[10] C.-C. Lin, Y.-K. Lin, and T.-S. Chang, “A fast algorithm and its architecture for motion estimation in MPEG-4 AVC/H.264,” in Proceedings of Asia Pacific Conference on Circuits and Systems, pp. 1250-1253, December 2006.

[11] ITU-T and ISO/IEC JTC 1, JSVM Software, version JSVM 9.17, 2008.

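NSC-Funded Project: Data Sheet for Promotion of Derived R&D Results

Date: 2011/10/15

NSC-funded project

Project title: Subproject 5: Core Technologies for High-Definition Multi-View Stereoscopic Video (I)

Principal investigator: Tian-Sheuan Chang (張添烜)

Project number: 99-2221-E-009-185-

Field: Integrated Circuits and System Design

No R&D results promotion data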

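Summary Table of Research Results for the Year 99 (2010) Research Project

Other results

(Results that cannot be expressed quantitatively, such as organizing academic activities, receiving awards, important international collaborations, international impact of the research results, and other concrete contributions to industrial technology development; please describe in text.)

None

Result item / Quantity / Name or brief description of content

Assessment tools (qualitative and quantitative)

Courses/modules

Computer and network systems or tools

Teaching materials

Activities/competitions held

Seminars/workshops

E-newsletters, websites

Additional items for Department of Science Education projects

Number of participants (audience) reached by promotion of project results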

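NSC-Funded Research Project Results Report: Self-Evaluation Form

Please give an overall evaluation covering how well the research matches the original plan, the extent to which the expected goals were achieved, the academic or application value of the results (briefly describe their significance, value, impact, or potential for further development), whether they are suitable for publication in academic journals or for patent application, and the main findings or other relevant value.

1. Overall evaluation of how well the research matches the original plan and the extent to which the expected goals were achieved

■ Goals achieved

□ Goals not achieved (please explain, within 100 words)

□ Experiment failed

□ Experiment interrupted for some reason

□ Other reasons, explanation:

2. Publication of research results in academic journals, patent applications, etc.:

Papers: ■ Published □ Unpublished manuscript □ In preparation □ None

Patents: □ Granted □ Pending ■ None

Technology transfer: □ Transferred □ Under negotiation ■ None

Other: (within 100 words)

3. Please evaluate the academic or application value of the research results in terms of academic achievement, technical innovation, social impact, etc. (briefly describe their significance, value, impact, or potential for further development) (within 500 words)

1. Academic achievement: completed an analysis of the performance of existing depth estimation engine algorithms for single-view video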
