Chung Hua University
Master's Thesis

Title: A Fast Mode Decision Method for H.264/AVC Coding Using the Spatial-Temporal Prediction Scheme

Department: Master Program, Department of Computer Science and Information Engineering
Student: M09202011 Chung-Ping Yu
Advisor: Dr. Cheng-Chang Lien

July 2005
Abstract

In the H.264/AVC coding standard, large flexibilities in the motion estimation modes, multiple reference frames, intra prediction modes for I-frames, motion estimation refinement, entropy coding, etc. are provided to obtain the optimum R-D cost. In particular, for each macroblock, H.264/AVC provides seven motion estimation block sizes (from 4×4 to 16×16) to find the minimum compensation error. However, the full search algorithm used in the reference software JM-9.3 incurs a huge computation cost and makes encoding inefficient. Therefore, some methods apply the SAD (sum of absolute differences), homogeneous region analysis, or edge detection to determine a suitable motion estimation mode; however, many such fast search methods require extra image processing computation. In this thesis, we exploit the spatial-temporal correlation between frames to reduce the motion estimation time. Furthermore, the concept of drift compensation is applied to avoid the prediction drift phenomenon. The experimental results show that the proposed method saves about 60% of the motion estimation computation while the average PSNR drops by only about 0.05 dB.
Acknowledgments

I am very fortunate to have had the opportunity to study at Chung Hua University and to complete my Master's degree; the two memorable years of research life have been truly rewarding. First, I would like to thank my advisor, Professor Cheng-Chang Lien, who over these two years not only taught me professional knowledge with great care, but also taught me much about attitude toward life, dealing with people, and the spirit of learning. It is thanks to his patient cultivation that I was able to complete my graduate studies.

Next, I would like to thank the many members of the Intelligent Multimedia Laboratory with whom I spent my days, especially my seniors 勝政, 秋龍, and 振宇 for their great guidance and help with my thesis research; my classmates 禮潭, 嘉宏, 衍毅, 堯弘, 士棻, 志豪, and 學偉 for their mutual support and care; and 志強, 揚凱, 鬱婷, and the many other junior lab members for their help and companionship, which made these two years colorful. Thank you for the sweet memories.

Finally, I thank my parents, whose constant care and concern allowed me to finish this thesis on schedule.
Contents

Abstract
Acknowledgments
Contents
Chapter 1  Introduction
Chapter 2  H.264/AVC and the JM-9.3 Reference Software
Chapter 3  The Proposed Fast Mode Decision Method
Chapter 4  Experimental Results
Chapter 5  Conclusion
English Appendix
Chapter 1  Introduction

In the latest-generation coding standard, H.264/AVC, large flexibilities in the motion estimation modes, multiple reference frames, intra prediction modes for I-frames, motion estimation refinement, entropy coding, etc. are provided to obtain the optimum R-D cost. In particular, for each macroblock, H.264/AVC provides seven motion estimation block sizes (from 4×4 to 16×16) to find the minimum compensation error. Compared with previous coding standards such as H.263 CHC (Conversational High Compression), H.263 Baseline, and MPEG-4 SP (Simple Profile), the compression rate is improved by about 27%, 40%, and 29% respectively. However, the full search algorithm used in the reference software JM-9.3 incurs a huge computation cost and makes encoding inefficient. In this thesis, we exploit the spatial-temporal correlation between frames to reduce the motion estimation time. Furthermore, the concept of drift compensation is applied to avoid the prediction drift phenomenon. Chapter 2 reviews the architecture of H.264/AVC and the JM-9.3 reference software; Chapter 3 presents the proposed fast search algorithm; Chapter 4 gives the experimental results; and Chapter 5 concludes.
Chapter 2  H.264/AVC and the JM-9.3 Reference Software

The H.264/AVC coding standard was jointly defined by the MPEG and ITU-T organizations; its main goals are to improve compression efficiency, network friendliness, and error robustness. The overall coding architecture of H.264/AVC does not differ greatly from previous standards, so this chapter reviews the H.264/AVC architecture and some important coding processes in the reference software JM-9.3.
Chapter 3  The Proposed Fast Mode Decision Method

We first analyze the correlation of mode decision results between frames. From this observation we derive our fast search mechanism: the modes decided in the past are used to estimate which modes are likely to appear again, so that unnecessary modes can be skipped to save search time. To achieve this, we must first know approximately where the current macroblock was located in the previous frame; this is done with motion vector prediction. We then compute the mode distribution of the 3×3 macroblocks around the predicted macroblock, pick the top two modes as our candidate modes, and, together with drift compensation, obtain the set of modes that may need to be encoded. Finally, the mode with the minimum cost is chosen to encode the macroblock.
Chapter 4  Experimental Results

Our experiments were conducted with the following settings:
1. Each sequence is encoded for 200 frames.
2. Five reference frames are used.
3. The motion estimation search range is set to 16 pixels.
4. The Hadamard transform for the DC coefficients is enabled.
5. Rate-distortion optimization is enabled.
6. The GOP length is 13 frames.
7. Nine YUV QCIF test sequences are used.

The experiments consist of three stages. In the first stage, QP is fixed at 28, 32, 36, and 40 to compare the efficiency. In the second stage, the bit-rate is varied to observe the coding efficiency at different rates. Finally, the proposed method is compared with two previously proposed fast mode decision algorithms.
Chapter 5  Conclusion

We proposed a method that exploits the spatial-temporal correlation to reduce the motion estimation time. The experimental results show that the proposed method saves about 60% of the motion estimation computation on average, with an average PSNR drop of only about 0.05 dB, and requires no extra image processing. The proposed method can be implemented quickly in H.264/AVC and greatly improves the encoding efficiency.
English Appendix
A Fast Mode Decision Method for H.264/AVC Coding Using the Spatial-Temporal Prediction Scheme
Prepared by Chung-Ping Yu
Directed by Dr. Cheng-Chang Lien
Computer Science and Information Engineering Chung Hua University
Hsin-Chu, Taiwan, R.O.C.
July, 2005
ABSTRACT
In the H.264/AVC coding standard, large flexibilities for the motion estimation modes, multiple reference frames, intra prediction modes for I-frames, motion estimation refinements, entropy coding, etc. are provided to obtain the optimum R-D cost. In particular, seven motion estimation modes from 4×4 to 16×16 are used to find the minimum motion compensation error for each macroblock. However, the high computation cost of the full search method in the reference software JM-9.3 makes the encoding process inefficient. Therefore, methods applying the SAD (sum of absolute differences), homogeneous region analysis, and edge detection have been developed to determine the optimum motion estimation mode. However, the additional computation cost of these image processing steps reduces the efficiency of the motion compensation process. In this thesis, the spatial-temporal correlations between the current frame and the reference frame are analyzed to develop a fast mode decision method in which no extra image processing is used. Furthermore, the concept of drift compensation is adopted to avoid error accumulation during the mode decision process. The experimental results show that the computation cost can be reduced by more than 60% while the average PSNR drops by only about 0.05 dB.
Keywords: H.264, JM-9.3, mode decision, R-D Cost
CONTENTS
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1  INTRODUCTION
CHAPTER 2  THE H.264/AVC AND JM-9.3 REFERENCE SOFTWARE
  2.1 THE H.264/AVC BASELINE PROFILE
  2.2 THE JM-9.3 ENCODER STRUCTURE
  2.3 H.264/AVC PERFORMANCE ANALYSIS
CHAPTER 3  THE FAST MODE DECISION METHOD
  3.1 THE SPATIAL-TEMPORAL MODE CORRELATION
  3.2 THE FAST MODE DECISION METHOD USING SPATIAL-TEMPORAL MODE CORRELATION
CHAPTER 4  EXPERIMENTAL RESULTS
  4.1 THE EFFICIENCY AND PSNR ANALYSES FOR FIXED QP PARAMETERS
  4.2 THE RATE-DISTORTION ANALYSES
CHAPTER 5  CONCLUSION
REFERENCE
LIST OF TABLES
TABLE 2.1 THE RELATED PARAMETERS IN THE H.264/AVC FOR THE TESTING PROCESS
TABLE 2.2 THE EFFICIENCY ANALYSIS USING THE AKIYO TEST SEQUENCE
TABLE 2.3 THE EFFICIENCY ANALYSIS USING THE CONTAINER TEST SEQUENCE
TABLE 2.4 THE EFFICIENCY ANALYSIS USING THE HALL_MONITOR TEST SEQUENCE
TABLE 2.5 THE EFFICIENCY ANALYSIS USING THE MOTHER_AND_DAUGHTER TEST SEQUENCE
TABLE 2.6 THE EFFICIENCY ANALYSIS USING THE NEWS TEST SEQUENCE
TABLE 2.7 THE EFFICIENCY ANALYSIS USING THE SALESMAN TEST SEQUENCE
TABLE 2.8 THE EFFICIENCY ANALYSIS USING THE CARPHONE TEST SEQUENCE
TABLE 2.9 THE EFFICIENCY ANALYSIS USING THE COASTGRD TEST SEQUENCE
TABLE 2.10 THE EFFICIENCY ANALYSIS USING THE FOREMAN TEST SEQUENCE
TABLE 2.11 THE COLOR DEFINITION
TABLE 3.1 THE PROBABILITIES THAT THE MODE OF A MACROBLOCK ON P FRAME #2 IS SIMILAR TO THE MODES OF THE NEIGHBORING MACROBLOCKS CENTERED AT THE SAME POSITION ON P FRAME #1
TABLE 3.2 THE PROBABILITY THAT THE ESTIMATION MODE OF A MACROBLOCK IS SIMILAR TO THE TOP ONE
TABLE 3.3 THE PROBABILITY THAT THE ESTIMATION MODE OF A MACROBLOCK IS SIMILAR TO THE TOP TWO
TABLE 3.4 THE NUMBER OF THE AVAILABLE MOTION VECTORS IN FIG. 3.2
TABLE 3.5 FIVE CATEGORIES FOR BLOCK MODES
TABLE 4.1 THE COMPUTATION TIMES (SECS) OF THE MOTION ESTIMATION PROCESS FOR NINE VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE
TABLE 4.2 THE COMPUTATION TIMES (SECS) OF THE MOTION ESTIMATION PROCESS FOR NINE VIDEO SEQUENCES USING OUR PROPOSED METHOD
TABLE 4.3 THE SPEED-UP ANALYSIS
TABLE 4.4 THE PSNR (DB) OF THE MOTION ESTIMATION PROCESS FOR NINE VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE
TABLE 4.5 THE PSNR (DB) OF THE MOTION ESTIMATION PROCESS FOR NINE VIDEO SEQUENCES USING OUR PROPOSED METHOD
TABLE 4.6 THE PSNR (DB) ANALYSIS
TABLE 4.7 THE BIT-RATE (KBITS) OF THE MOTION ESTIMATION PROCESS FOR NINE VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE
TABLE 4.8 THE BIT-RATE (KBITS) OF THE MOTION ESTIMATION PROCESS FOR NINE VIDEO SEQUENCES USING OUR PROPOSED METHOD
TABLE 4.9 THE BIT-RATE ANALYSIS
TABLE 4.10 THE EFFICIENCY AND PSNR COMPARISONS FOR THE PROPOSED METHOD AND WU'S METHOD
TABLE 4.11 THE EFFICIENCY AND PSNR COMPARISONS FOR THE PROPOSED METHOD AND JING'S METHOD
LIST OF FIGURES
FIG. 1.1 MACROBLOCK AND SUB-MACROBLOCK PARTITIONS IN H.264
FIG. 2.1 THE STRUCTURE OF H.264 ENCODER
FIG. 2.2 MACROBLOCK AND SUB-MACROBLOCK PARTITIONS IN H.264 [1]
FIG. 2.3 THE FLOWCHART OF THE MAIN PART OF JM-9.3 ENCODER
FIG. 2.4 THE FLOWCHART OF FUNCTION "ENCODE_ONE_MACROBLOCK"
FIG. 2.5 THE VIDEO SEQUENCES OF BIRD FLYING (A) AND MOVING BUS (B) ARE USED TO ILLUSTRATE THAT THE BEST REFERENCE FRAME IS NOT THE PREVIOUS ONE
FIG. 2.6 THE ILLUSTRATION OF MODE DECISION WITH THE ENCODING PARAMETERS DEFINED IN TABLE 2.1 FOR AKIYO SEQUENCE
FIG. 2.7 THE ILLUSTRATION OF MODE DECISION WITH THE ENCODING PARAMETERS DEFINED IN TABLE 2.1 FOR COASTGRD SEQUENCE
FIG. 2.8 THE ILLUSTRATION OF MODE DECISION WITH THE ENCODING PARAMETERS DEFINED IN TABLE 2.1 FOR FOREMAN SEQUENCE
FIG. 3.1 THE MODE CORRELATION FOR A MACROBLOCK BETWEEN ADJACENT FRAMES
FIG. 3.2 DISTRIBUTION OF MOTION VECTORS FOR A MACROBLOCK ON P-FRAME #2
FIG. 3.3 THE PREDICTION MOTION VECTOR CORRESPONDING TO A MACROBLOCK
FIG. 3.4 THE BLOCK DIAGRAM OF THE FAST MODE DECISION METHOD
FIG. 4.1 RATE-DISTORTION CURVES FOR AKIYO VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.2 RATE-DISTORTION CURVES FOR CARPHONE VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.3 RATE-DISTORTION CURVES FOR COASTGRD VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.4 RATE-DISTORTION CURVES FOR CONTAINER VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.5 RATE-DISTORTION CURVES FOR FOREMAN VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.6 RATE-DISTORTION CURVES FOR HALL_MONITOR VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.7 RATE-DISTORTION CURVES FOR MOTHER_AND_DAUGHTER VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.8 RATE-DISTORTION CURVES FOR NEWS VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.9 RATE-DISTORTION CURVES FOR SALESMAN VIDEO SEQUENCE OBTAINED BY JM-9.3, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.10 THE COMPUTATION TIME OF THE MOTION ESTIMATION PROCESS FOR AKIYO (LEFT) AND CARPHONE (RIGHT) VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.11 THE COMPUTATION TIME OF THE MOTION ESTIMATION PROCESS FOR COASTGRD (LEFT) AND CONTAINER (RIGHT) VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.12 THE COMPUTATION TIME OF THE MOTION ESTIMATION PROCESS FOR FOREMAN (LEFT) AND HALL_MONITOR (RIGHT) VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.13 THE COMPUTATION TIME OF THE MOTION ESTIMATION PROCESS FOR MOTHER_AND_DAUGHTER (LEFT) AND NEWS (RIGHT) VIDEO SEQUENCES USING THE JM-9.3 REFERENCE SOFTWARE, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
FIG. 4.14 THE COMPUTATION TIME OF THE MOTION ESTIMATION PROCESS FOR SALESMAN VIDEO SEQUENCE USING THE JM-9.3 REFERENCE SOFTWARE, OUR PROPOSED METHOD, AND JM-9.3 WITH ONLY MACROBLOCK
CHAPTER 1
Introduction
Recently, the new video coding standard H.264/AVC [1] was proposed by the Joint Video Team (JVT) [1] to develop a new low bit-rate video compression technology. Compared with conventional coding standards, e.g., H.263 CHC (Conversational High Compression) [3], H.263 Baseline [3], and MPEG-4 SP (Simple Profile) [4], the compression rate is improved by about 27%, 40%, and 29% respectively [8]. In the H.264/AVC coding standard [1], large flexibilities for the motion estimation modes, entropy coding, multiple reference frames, intra prediction modes for I-frames, motion estimation refinements, etc. are provided to obtain the optimum R-D cost. In particular, seven motion estimation modes from 4×4 to 16×16 are used to find the minimum motion compensated error for each macroblock. In the current JVT reference software [9], seven modes (the various block sizes) are applied to perform the motion compensation process such that the R-D cost defined in Eq. (1) is optimized.
J_Mode = D + λ_Mode × R    (1)

where D denotes the motion compensated error of a macroblock, λ_Mode is the Lagrange multiplier, and R represents the demanded bit-rate. In order to obtain the optimum motion estimation mode, we must calculate the R-D cost for each motion
estimation mode, in which some time-consuming processes, e.g., motion estimation, DCT transformation, and quantization, are involved. In addition, seven motion estimation modes from 4×4 to 16×16 (see Fig. 1.1) are used to determine the optimum motion estimation mode for each macroblock. Hence, the high computation cost of the full search method used in the reference software JM-9.3 [9] makes the encoding process inefficient.
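As an illustration of Eq. (1), the mode with the minimum R-D cost can be selected as in the following sketch; the distortion, rate, and Lagrange multiplier values are hypothetical and not taken from JM-9.3:

```python
def rd_cost(distortion, rate, lagrange_multiplier):
    """R-D cost of Eq. (1): J_Mode = D + lambda_Mode * R."""
    return distortion + lagrange_multiplier * rate

# Hypothetical per-mode measurements: (D in SSD units, R in bits).
candidates = {
    "16x16": (1500.0, 96),
    "8x8":   (900.0, 210),
    "4x4":   (700.0, 340),
}
lam = 4.0  # illustrative Lagrange multiplier

# Pick the mode minimizing J_Mode over the candidates.
best_mode = min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
```

With these illustrative numbers the 8×8 mode wins: its larger rate term is outweighed by its smaller distortion at this λ.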
Fig. 1.1 Macroblock and Sub-macroblock partitions in H.264
Recently, many fast mode decision methods [10-13] [18-19] have been proposed. In [10], the SAD (sum of absolute differences) between the current frame and the previous frame for each macroblock is used to evaluate which block size is appropriate for the motion estimation process. The decision rules in [10] are briefly described as follows. First, the SAD is calculated and compared with a predefined threshold. If the SAD is smaller than the threshold, then the motion estimation mode will be selected among the 16×16, 16×8 and 8×16 block
types. Otherwise, the motion estimation mode is determined by evaluating all seven block types. In [11], edge detection [12] is applied to classify homogeneous and non-homogeneous regions. If a macroblock belongs to a homogeneous region, then the motion estimation modes 4, 5, 6 and 7 (8×8, 8×4, 4×8 and 4×4) are removed. Otherwise, the macroblock is divided into sub-macroblocks of size 8×8 and each sub-macroblock is analyzed to decide whether it belongs to a homogeneous region or not. If an 8×8 sub-macroblock is homogeneous, then the motion estimation modes 5, 6 and 7 (8×4, 4×8 and 4×4) are removed. This process continues until all sub-macroblocks are evaluated. Finally, rate-distortion optimization is used to determine which motion estimation mode is best. In [12], Pan et al. applied the edge direction histogram to reduce the computation cost of the intra-prediction mode decision. In [13], Zhu et al. applied a low-resolution image and edge detection to determine the motion estimation modes (8×4, 4×8 and 4×4). However, the efficiency of the motion compensation process will be reduced by the extra image processing required to determine the appropriate motion estimation mode.
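The SAD-based pre-selection rule of [10] described above can be sketched as follows; the threshold value and the flat-list block representation are illustrative assumptions, not values from the cited work:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def candidate_modes(current_mb, previous_mb, threshold=512):
    """Rule sketched from [10]: a nearly unchanged macroblock is searched
    only with the large partitions; otherwise all seven block types are
    evaluated."""
    if sad(current_mb, previous_mb) < threshold:
        return ["16x16", "16x8", "8x16"]
    return ["16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]
```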
In this thesis, the spatial-temporal correlations among the reference frames and
neighboring macroblocks are analyzed to develop a fast mode decision method in
which no extra image processes are used. Furthermore, the concept of drift
compensation [17] is adopted to avoid the error accumulation during the mode
decision process. Compared with the JM-9.3 reference software, the experimental results show that the computation cost can be reduced by more than 60% while the average PSNR drops by only about 0.05 dB.
The rest of the thesis is organized as follows. The H.264/AVC structure and the JM-9.3 encoder are reviewed in Chapter 2. The proposed fast mode decision method is described in Chapter 3. In Chapter 4, the simulation results for nine video sequences are illustrated. Finally, Chapter 5 concludes the thesis.
CHAPTER 2
The H.264/AVC and JM-9.3 Reference Software
Recently, the new video coding standard entitled "Advanced Video Coding (AVC)" was proposed jointly as Part 10 of MPEG-4 and the ITU-T Recommendation H.264. The main objectives of the H.264/AVC coding standard are to improve compression efficiency, network friendliness, error robustness, and video representation for interactive and non-interactive applications. In the H.264/AVC coding standard, large flexibilities for the motion estimation modes, multiple reference frames, intra prediction modes for I-frames, motion estimation refinements, entropy coding, etc. are provided to obtain the optimum R-D cost. The H.264/AVC coding structure [2] is shown in Fig. 2.1.
Fig. 2.1 The structure of H.264 Encoder
2.1 The H.264/AVC Baseline Profile
Generally, the H.264/AVC defines four encoding profiles: baseline, main,
extended, and fidelity range extensions (FREXT) [5]. In this thesis, only the baseline
profile is considered to develop the fast mode decision method. In the baseline profile,
just I- and P- slices are supported. I-slices consist of the intra-coded macroblocks;
while the P-slices consist of the intra-coded, inter-coded, or skipped macroblocks. The
inter-coded macroblocks in the P-slices are encoded by the motion compensated
process from a number of reference frames (previous coded pictures). Then the
residual data are transformed and quantized with the 4×4 integer transformation [2].
The transform coefficients are coded by using the context-adaptive variable length
coding scheme (CAVLC) [6] [7].
In conventional coding standards, the reference frame for a P frame is fixed as the previous frame. In H.264/AVC, however, multiple reference frames may be used. In addition, the block size for the motion compensation prediction process is no longer limited to 16×16. In H.264, the block size may vary from 16×16 to 4×4 (Fig. 2.2). In total, seven block sizes are used for the motion compensation prediction process.
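The seven partition sizes described above can be enumerated as in this small sketch; the helper function and constant names are our own illustration, not part of the standard:

```python
# The seven inter-prediction block sizes (width, height) of Fig. 2.2.
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MACROBLOCK_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]

def blocks_per_macroblock(width, height):
    """How many partitions of the given size tile one 16x16 macroblock."""
    return (16 // width) * (16 // height)
```

The smaller the partition, the more motion vectors a macroblock carries: one for 16×16 but sixteen for 4×4.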
Fig. 2.2 Macroblock and Sub-macroblock partitions in H.264 [1]
Hence, a motion vector is estimated for each partition of a macroblock or sub-macroblock. If the coding system chooses a large partition size (16×16, 16×8 or 8×16), a small number of bits is used to encode the motion vectors. On the contrary, if the coding system chooses a small partition size (8×8, 8×4, 4×8 or 4×4), a larger number of bits is required to encode the motion vectors.

In general, the chroma components (Cr and Cb) of each 16×16 macroblock have half the horizontal and vertical resolution of the luminance component. Each chroma component is partitioned in the same way as the luminance component: a 16×16 partition in the luminance block corresponds to an 8×8 partition in the chroma block, and a 16×8 luminance partition corresponds to an 8×4 chroma partition.
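The luma-to-chroma partition correspondence described above amounts to halving both dimensions for 4:2:0 video; a minimal sketch (the function name is our own illustration):

```python
def chroma_partition(luma_width, luma_height):
    """In 4:2:0 video each chroma plane has half the luma resolution in both
    directions, so a luma partition maps to one of half its size."""
    return luma_width // 2, luma_height // 2
```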
2.2 The JM-9.3 Encoder Structure
In this thesis, the H.264/AVC reference software JM-9.3 [9] is utilized to develop
the fast mode decision method. Figure 2.3 shows the flowchart of the main part of
JM-9.3 encoder. In the JM-9.3 reference software, the configuration
(encoder_baseline.cfg) should be initialized at the head of “main” function. Then, the
frame_picture function is used to define the interlaced or progressive video. The
field_picture function is used for the interlaced video only. In the following, the
encode_one_macroblock function is used to encode each macroblock. In
encode_one_macroblock function, the PartitionMotionSearch function is utilized to
partition frame into the macroblocks for the BlockMotionSearch function. Finally, the
BlockMotionSearch function selects the motion estimation methods for the motion
compensation process. The algorithm of the function BlockMotionSearch for the
integer pixel and sub pixel searching are described as follows.
    /* Integer pixel motion search selection */
    if (FastMotionEstimation enabled)
        FastIntegerPelBlockMotionSearch();
    else
    #ifndef _FAST_FULL_ME
        FullPelBlockMotionSearch();
    #else
        FastFullPelBlockMotionSearch();
    #endif

    /* Sub pixel motion search selection */
    if (FastMotionEstimation enabled)
        FastSubPelBlockMotionSearch();
    else
        SubPelBlockMotionSearch();
Fig. 2.3 The flowchart of the main part of JM-9.3 encoder.
The flowchart of the motion estimation and rate-distortion optimization for mode decision in the reference software JM-9.3 is shown in Fig. 2.4.
Fig. 2.4 The flowchart of function “encode_one_macroblock”.
2.3 H.264/AVC Performance analysis
In this section, the performance of H.264/AVC coding system is illustrated. All
the tests/simulations/experiments are performed on the PC with P4-2.8G. The test
sequences consist of Akiyo, Container, Hall_Monitor, Mother_and_daughter, News,
Salesman, Carphone, Coastgrd, and Foreman. The parameters in the H.264/AVC for
the testing process are shown in Table 2.1.
Table 2.1 The related parameters in the H.264/AVC for the testing process.

Type  QP  Video     Ref. frames  Frames  Inter-mode block types
A     28  qcif yuv  5            200     All block types
B     28  qcif yuv  5            200     16×16
C     28  qcif yuv  5            200     16×16, 16×8, 8×16
D     28  qcif yuv  2            200     All block types
The testing process focuses on analyzing the impact of the number of motion estimation modes and the number of reference frames on encoding efficiency. Tables 2.2 to 2.10 illustrate the efficiency analysis.
Table 2.2 The efficiency analysis using the Akiyo test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 471 38.77 63.61
B 190 38.55 66.11
C 304 38.70 63.72
D 257 38.75 63.76
Table 2.3 The efficiency analysis using the Container test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 423 36.59 84.79
B 205 36.43 88.59
C 322 36.56 85.34
D 323 36.52 86.92
Table 2.4 The efficiency analysis using the Hall_Monitor test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 400 37.81 99.23
B 201 37.54 107.92
C 315 37.72 103.09
D 310 37.72 99.08
Table 2.5 The efficiency analysis using the Mother_and_daughter test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 428 37.78 81.40
B 196 37.49 86.40
C 318 37.70 82.10
D 327 37.71 81.41
Table 2.6 The efficiency analysis using the News test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 429 37.25 125.21
B 209 36.95 136.36
C 312 37.12 128.49
D 293 37.24 125.62
Table 2.7 The efficiency analysis using the Salesman test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 465 36.07 110.98
B 209 35.81 119.99
C 331 35.98 113.60
D 307 36.07 110.89
Table 2.8 The efficiency analysis using the Carphone test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 516 37.53 140.69
B 199 37.16 151.76
C 336 37.39 142.61
D 285 37.35 144.29
Table 2.9 The efficiency analysis using the Coastgrd test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 578 34.32 271.33
B 220 34.13 294.38
C 375 34.24 276.93
D 331 34.27 272.61
Table 2.10 The efficiency analysis using the Foreman test sequence.
Type categories Time (sec) PSNR (dB) Bit rate (kbps)
A 536 36.80 179.53
B 200 36.38 205.45
C 389 36.64 185.47
D 306 36.70 180.15
From Tables 2.2 to 2.10, it can be observed that using a large number of block modes and reference frames makes the encoding process inefficient.
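Using the Akiyo figures from Table 2.2, the trade-off between type A (all block types) and type B (16×16 only) can be quantified as follows:

```python
# Akiyo figures from Table 2.2: type A (all block types) vs. type B (16x16 only).
time_a, psnr_a, rate_a = 471, 38.77, 63.61
time_b, psnr_b, rate_b = 190, 38.55, 66.11

speed_up = time_a / time_b                         # restricting the modes is ~2.5x faster
psnr_drop = psnr_a - psnr_b                        # at a cost of ~0.22 dB
rate_increase = (rate_b - rate_a) / rate_a * 100   # and ~3.9% more bits
```

This is exactly the gap a fast mode decision method tries to close: most of type A's quality at something closer to type B's cost.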
Many methods for improving the efficiency of multiple reference frames have been proposed. In [14], the authors use the SATD (Sum of Absolute Transformed Differences) and motion vector compactness to decide how many reference frames are needed. For example, the video sequences of a flying bird (Fig. 2.5a) and a moving bus (Fig. 2.5b) are used to illustrate that the best reference frame is not always the previous one.
(a)
(b)
Fig. 2.5 The video sequence of bird flying (a) and moving bus (b) are used to illustrate that the best reference frame is not the previous one.
Finally, the mode selection results for some typical video sequences in H.264/AVC are shown in Figs. 2.6 to 2.8. The blocks encoded with Inter Mode and Intra Mode are labeled with the different colors defined in Table 2.11.
Table 2.11 The color definition.
Block type Color
MB_COPY ■
Inter Mode ■
Intra Mode ■
Fig. 2.6 The illustration of mode decision with the encoding parameters defined in Table 2.1 for Akiyo sequence.
Fig. 2.7 The illustration of mode decision with the encoding parameters defined in Table 2.1 for Coastgrd sequence.
Fig. 2.8 The illustration of mode decision with the encoding parameters defined in Table 2.1 for Foreman sequence.
CHAPTER 3
The Fast Mode Decision Method
The full search over all estimation modes (16×16, 16×8, …, 4×4) for determining the best mode in the reference software JM-9.3 makes the encoding process inefficient. In this chapter, we describe the proposed fast mode decision method. First, we analyze the spatial-temporal mode correlations among spatially and temporally neighboring macroblocks. Based on this correlation, the fast mode decision method is constructed.
3.1 The Spatial-Temporal Mode Correlation
By careful observation of the mode decision process in the JM-9.3 reference software, the motion estimation mode of a macroblock is found to be highly correlated with the modes of the macroblocks around the same position on the previous reference frames (multiple reference frames). The mode correlation is described in Fig. 3.1.
Fig. 3.1 The mode correlation for a macroblock between adjacent frames.
The probability that the mode of a macroblock on P frame #2 is similar to the modes of the neighboring macroblocks centered at the same position on P frame #1 (blocks numbered 1 to 9 of P frame #1 in Fig. 3.1) is high. Based on the JM reference software, the mode correlation is analyzed and listed in Table 3.1. Here, the estimation modes 16×16, 16×8, 8×16, 8×8, and skip are used to analyze the mode correlation.
Table 3.1 The probabilities that the mode of a macroblock on P frame #2 is similar to the modes of the neighboring macroblocks centered at the same position on P frame #1.
In Table 3.1, nine video sequences, each 200 frames long, are used to analyze the mode correlation, with the quantization parameter set to QP = 28, 32, 36 and 40 respectively. For the Foreman sequence with QP = 28, the probability that the estimation mode of a macroblock is close to the estimation modes of the neighboring macroblocks centered at the same position on the previous frame is 87.28%. From the analysis results in Table 3.1, it is observed that the mode correlation for a macroblock between adjacent frames is high. Furthermore, two additional experiments for the mode correlation analysis are given in Tables 3.2 and 3.3, in which the skip and 16×16 modes are regarded as the same class. Tables 3.2 and 3.3 illustrate the probability that the estimation mode of a macroblock is similar to the top-one and top-two estimation modes, respectively, among the neighboring macroblocks centered at the same position on the previous frame.
Sequences QP28 QP32 QP36 QP40
Akiyo 95.11% 96.02% 97.22% 98.26%
Container 93.35% 94.88% 96.40% 97.55%
Hall_Monitor 95.28% 96.62% 97.16% 97.51%
Moth&Daug 91.77% 93.59% 95.11% 96.42%
News 92.90% 93.63% 94.52% 95.73%
Salesman 94.63% 95.10% 96.00% 97.31%
Carphone 88.54% 90.16% 92.32% 94.40%
Coastgrd 86.05% 87.31% 89.17% 92.70%
Foreman 87.28% 88.29% 89.22% 90.90%
Table 3.2 The probability that the estimation mode of a macroblock is similar to the top one.

Sequences      QP28    QP32    QP36    QP40
Akiyo          90.10%  93.78%  96.76%  98.57%
Container      89.18%  93.38%  96.46%  98.49%
Hall_Monitor   91.58%  92.65%  93.74%  95.60%
Moth&Daug      78.31%  87.71%  93.86%  97.30%
News           81.11%  85.39%  90.04%  93.76%
Salesman       85.52%  87.99%  92.64%  96.69%
Carphone       64.70%  76.14%  85.99%  92.74%
Coastgrd       53.23%  65.31%  78.79%  89.42%
Foreman        54.29%  63.22%  74.78%  84.00%

Table 3.3 The probability that the estimation mode of a macroblock is similar to the top two.

Sequences      QP28    QP32    QP36    QP40
Akiyo          93.48%  95.74%  97.76%  98.94%
Container      93.08%  95.37%  97.30%  98.70%
Hall_Monitor   96.01%  96.48%  96.41%  97.22%
Moth&Daug      86.42%  92.06%  95.95%  97.85%
News           87.86%  90.41%  93.16%  95.51%
Salesman       91.85%  92.35%  95.19%  97.80%
Carphone       78.22%  84.90%  90.51%  94.81%
Coastgrd       73.61%  79.11%  86.36%  93.32%
Foreman        72.11%  77.30%  84.31%  89.66%

From the observations in Tables 3.1, 3.2, and 3.3, we found that the motion estimation mode of a macroblock is highly correlated with the motion estimation modes of the macroblocks at and around the same position on the previous reference frame. According to this mode correlation analysis, we propose a new fast mode decision method to reduce the computation cost of the mode decision process in the H.264/AVC encoding system.
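The neighborhood mode-match statistic reported in Table 3.1 can be sketched as follows; the function and the 2-D mode-label representation are our own simplified illustration of the measurement, not JM code:

```python
def mode_match_rate(curr_modes, prev_modes):
    """Fraction of macroblocks whose mode occurs in the 3x3 neighborhood
    centered at the co-located macroblock of the previous frame (cf. Table
    3.1). Arguments are 2-D lists of mode labels, one label per macroblock."""
    rows, cols = len(curr_modes), len(curr_modes[0])
    hits = 0
    for r in range(rows):
        for c in range(cols):
            # Collect the modes of the (clipped) 3x3 neighborhood on the
            # previous frame, then check whether the current mode recurs.
            neighborhood = {
                prev_modes[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
            }
            hits += curr_modes[r][c] in neighborhood
    return hits / (rows * cols)
```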
3.2 The Fast Mode Decision Method using Spatial-Temporal Mode Correlation
For each GOP, the mode decision for the first P-frame is determined using the full search method in the JM reference software, and the estimation modes determined for each macroblock are used to predict the modes for the next P-frame. The algorithm of the fast mode decision method is described as follows.
Step 1. Tracking of the macroblock

To find the mode histogram for each macroblock, the corresponding position on the previous frame for each macroblock must be tracked. The position of each macroblock on the current P-frame is tracked with the weighted motion vectors of the macroblocks around the same position on the previous reference frame, as shown in Fig. 3.2. The weighted motion vector is calculated as:
MV(x, y) = [ Σ_{i=x-1}^{x+1} Σ_{j=y-1}^{y+1} w(i, j)·m(i, j) ] / T    (2)

where MV is the predicted motion vector, x and y denote the block coordinates on the current frame, m(i, j) is the motion vector of block (i, j) on the previous frame (blocks numbered 1 to 9 in Fig. 3.2), and w(i, j) is the weighting factor of block (i, j). The precise prediction of the motion vector may be calculated by weighting each block according to the area proportion of the block to
4×4 blocks. For example, Table 3.4 lists the weighting factor of each block, measured in 4×4-block units, for the macroblock shown in Fig. 3.2. The value of T in Eq. (2) is the summation of all the weighting factors.
Fig. 3.2 Distribution of motion vectors for a macroblock on P-frame #2.
Table 3.4 The number of the available motion vectors (weighting factors) in Fig. 3.2.

        No.1  No.2  No.3  No.4  No.5  No.6  No.7  No.8  No.9    T
Weight    16     0     4     8    16     0     0     0     0   44
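Equation (2) with the weights of Table 3.4 can be sketched as follows; the motion vector values used here are hypothetical and only for illustration:

```python
def predict_mv(motion_vectors, weights):
    """Weighted motion vector prediction of Eq. (2): each neighboring
    block's vector is weighted by its overlap area in 4x4-block units."""
    total = sum(weights)  # this is T in Eq. (2)
    mx = sum(w * mv[0] for w, mv in zip(weights, motion_vectors)) / total
    my = sum(w * mv[1] for w, mv in zip(weights, motion_vectors)) / total
    return mx, my

# Weights of the nine blocks from Table 3.4 (T = 44); the motion vectors
# below are hypothetical values for illustration.
weights = [16, 0, 4, 8, 16, 0, 0, 0, 0]
mvs = [(2, 0), (0, 0), (4, 2), (2, 2), (1, 1), (0, 0), (0, 0), (0, 0), (0, 0)]
```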
Step 2. Calculation of the mode histogram

Once each macroblock on the current frame is tracked, we can find the corresponding macroblock as shown in Fig. 3.3. With the tracked position of each macroblock, the mode histogram is obtained by counting the estimation modes of the neighboring macroblocks centered at the tracked position.
Fig. 3.3 The predicted motion vector corresponding to the macroblock.
The mode histogram is calculated by the following steps. All the modes in Fig.
1.1 are classified into the five categories listed in Table 3.5 according to their block
sizes. Based on the mode categories in Table 3.5, we count the number of partitioned
blocks for each category and then choose the top two categories as the candidate
motion estimation modes.
Table 3.5 Five categories for block modes.
Mode Category
1 SKIP / 16×16
2 16×8 / 8×16
3 8×8
4 8×4 / 4×8
5 4×4
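The candidate selection of Steps 2 and 3 can be sketched as below, following the five categories of Table 3.5. The mode-name strings and the sample neighborhood are illustrative, not identifiers from the JM software.

```python
from collections import Counter

# Map each H.264/AVC inter mode to its category of Table 3.5.
MODE_CATEGORY = {
    "SKIP": 1, "16x16": 1,
    "16x8": 2, "8x16": 2,
    "8x8": 3,
    "8x4": 4, "4x8": 4,
    "4x4": 5,
}

def candidate_categories(neighbor_modes, top=2):
    """Build the mode histogram over the neighboring macroblocks' modes and
    return the `top` most frequent categories as the candidate categories."""
    hist = Counter(MODE_CATEGORY[m] for m in neighbor_modes)
    return [cat for cat, _ in hist.most_common(top)]

# Hypothetical modes of the macroblocks around the tracked position:
# category 2 occurs four times, category 1 twice, category 3 once.
neighbors = ["16x8", "8x16", "16x8", "8x16", "16x16", "SKIP", "8x8"]
cands = candidate_categories(neighbors)
```

Restricting motion estimation to the two winning categories is what saves the exhaustive per-mode search of the JM reference software.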
Step 3. Select the motion estimation mode in the candidate categories.
Step 4. In order to prevent the drift phenomenon, the candidate categories need to be
refined when the R-D cost for a macroblock is larger than a predefined threshold.
Firstly, we record the R-D cost of each macroblock in the first P-frame obtained from
the JM mode decision process (full search). Then, the R-D cost of each macroblock
on the successive P-frames is compared to that of the corresponding macroblock on
the first P-frame. If the R-D cost is greater than a predefined value, the mode
decision process is refined with the following rule:
If 8×8 mode is not chosen, then the mode 8×8 is considered as the candidate
mode.
Else if 16×16 mode is not chosen, then the mode 16×16 is considered as the
candidate mode.
Else the mode 4×4 is considered as the candidate mode.
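The refinement rule of Step 4 can be expressed as a small sketch. The threshold ratio and the cost variables are placeholders, since the thesis only states that the R-D cost is compared against a predefined value.

```python
def refine_candidates(candidate_modes, rd_cost, reference_rd_cost,
                      threshold=1.5):
    """Drift compensation of Step 4: when the R-D cost of a macroblock grows
    beyond a predefined threshold relative to the cost recorded for the
    corresponding macroblock on the first P-frame, widen the candidate set.
    `threshold` is a hypothetical ratio, not a value given in the thesis."""
    refined = list(candidate_modes)
    if rd_cost > threshold * reference_rd_cost:
        if "8x8" not in refined:          # first preference: add mode 8x8
            refined.append("8x8")
        elif "16x16" not in refined:      # then mode 16x16
            refined.append("16x16")
        elif "4x4" not in refined:        # finally mode 4x4 (skip duplicates)
            refined.append("4x4")
    return refined
```

When the cost stays below the threshold, the candidate set from Step 3 is left untouched, so the extra refinement work is only spent on macroblocks where prediction drift is actually building up.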
Step 5. Once the motion estimation modes of each macroblock are determined, the
partition scheme and the corresponding motion vectors are recorded.
The block diagram of the fast mode decision method using the spatial-temporal
correlation is illustrated in Fig. 3.4.
Fig. 3.4 The block diagram of fast mode decision method.
CHAPTER 4 Experimental Results
Here, all the efficiency and rate-distortion analyses are conducted on the basis of
the JM-9.3 reference software, and the simulation environment is defined as follows:
1. The length of each video sequence for the simulation is 200 frames;
2. The number of reference frames is 5;
3. The search range for the motion estimation is 16 pixels;
4. The Hadamard transform for encoding DC components is used;
5. The rate-distortion optimization is applied;
6. The length of a GOP is 13;
7. Nine video sequences of QCIF format are used as the testing videos.
Based on this environment setting, the efficiency and rate-distortion analyses for the
proposed system are illustrated.
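The simulation environment above can be summarized as a configuration record. The key names below are illustrative; JM-9.3 uses its own encoder.cfg parameter names, which are not reproduced here.

```python
# Simulation environment of Chapter 4 as a plain configuration record.
# Key names are hypothetical, chosen for readability; they are not the
# actual JM-9.3 encoder.cfg parameters.
SIMULATION_CONFIG = {
    "frames_to_encode": 200,     # length of each test sequence (frames)
    "reference_frames": 5,       # number of reference frames
    "search_range": 16,          # motion-estimation search range (pixels)
    "hadamard_transform": True,  # Hadamard transform for DC components
    "rd_optimization": True,     # rate-distortion optimization enabled
    "gop_length": 13,            # length of each GOP
    "format": "QCIF",            # nine QCIF test sequences
}
```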
4.1 The Efficiency and PSNR Analyses for Fixed QP Parameters
In this section, the efficiency and rate-distortion analyses for fixed QP
parameters are illustrated. Here, the QP parameters in H.264/AVC are fixed at 28, 32,
36, and 40, respectively, for the analyses. The computation efficiency is analyzed
with Eq. (3) and the transmission rate with Eq. (4). Firstly, the efficiency analyses
are illustrated in Tables 4.1 ~ 4.3. The computation times of the motion estimation
process for the nine video sequences using the JM-9.3 reference software are listed in
Table 4.1, and the computation times using our proposed method are listed in Table
4.2. The speed-up analysis is given in Table 4.3. It is obvious that the fast mode
decision using the spatial-temporal correlation reduces the computation time by about
62.94% on average. In particular, the computation time is reduced by about 67.33%
on average for the low-motion video sequences Akiyo, Container, Hall_Monitor, and
Moth&Daug. The PSNR analyses are illustrated in Tables 4.4 ~ 4.6. The PSNR values
of the motion estimation process for the nine video sequences using the JM-9.3
reference software are listed in Table 4.4, and those using our proposed method in
Table 4.5. The PSNR difference analysis is given in Table 4.6: with the fast mode
decision using the spatial-temporal correlation, the PSNR decreases by only about
0.05 dB. The bit-rate analyses are illustrated in Tables 4.7 ~ 4.9. The bit-rates of the
motion estimation process for the nine video sequences using the JM-9.3 reference
software are listed in Table 4.7, and those using our proposed method in Table 4.8.
The bit-rate analysis is given in Table 4.9: the bit-rate increases by only about 1.32%.
In general, the degree of efficiency improvement for the video sequences with high
motion (Carphone, Coastgrd, Foreman) is less than that for the video sequences with
low motion (Akiyo, Container, Hall_Monitor, Moth&Daug).
∆T = ( Time[JM] − Time[proposed] ) / Time[JM] × 100 [%]                  (3)

∆R = ( BitRate[proposed] − BitRate[JM] ) / BitRate[proposed] × 100 [%]   (4)
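Eqs. (3) and (4) can be checked numerically against the tables; the sketch below reproduces the Akiyo entries at QP 28 from Tables 4.1, 4.2, 4.7, and 4.8.

```python
def speed_up(time_jm, time_proposed):
    """Eq. (3): percentage of motion-estimation time saved w.r.t. JM-9.3."""
    return (time_jm - time_proposed) / time_jm * 100.0

def bitrate_increase(bitrate_jm, bitrate_proposed):
    """Eq. (4): percentage bit-rate increase, normalized by the proposed
    method's bit-rate."""
    return (bitrate_proposed - bitrate_jm) / bitrate_proposed * 100.0

# Akiyo at QP 28: Tables 4.1/4.2 give the ME times (secs),
# Tables 4.7/4.8 the bit-rates (kbit/s).
dT = speed_up(287.314, 111.302)       # Table 4.3 reports 61.26%
dR = bitrate_increase(63.61, 64.49)   # Table 4.9 reports 1.36%
```

Rounding both results to two decimals reproduces the corresponding entries of Tables 4.3 and 4.9.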
Table 4.1 The computation times (secs) of the motion estimation process for nine
video sequences using the JM-9.3 reference software.
Group  Sequence      QP 28    QP 32    QP 36    QP 40
A      Akiyo         287.314  312.212  289.464  282.450
A      Container     269.474  266.356  264.932  259.672
A      Hall_Monitor  249.655  250.843  254.847  257.433
A      Moth&Daug     280.982  282.236  275.035  265.734
B      News          267.245  269.825  271.863  270.774
B      Salesman      271.056  282.485  288.397  287.349
C      Carphone      321.541  315.659  310.295  301.799
C      Coastgrd      371.431  372.472  359.324  343.851
C      Foreman       318.449  316.305  315.779  310.682
Table 4.2 The computation times (secs) of the motion estimation process for nine
video sequences using our proposed method.
Group  Sequence      QP 28    QP 32    QP 36    QP 40
A      Akiyo         111.302  103.608  101.118  89.628
A      Container     90.484   81.677   73.227   67.697
A      Hall_Monitor  80.144   78.406   79.468   76.155
A      Moth&Daug     121.263  109.707  86.865   75.448
B      News          103.086  101.961  87.118   80.992
B      Salesman      101.996  101.530  96.495   81.251
C      Carphone      188.059  154.296  116.575  92.569
C      Coastgrd      189.512  166.412  140.789  116.430
C      Foreman       179.554  165.550  143.719  121.728
Table 4.3 The speed-up analysis.
Group  Sequence      QP 28   QP 32   QP 36   QP 40
A      Akiyo         61.26%  66.81%  65.07%  68.27%
A      Container     66.42%  69.34%  72.36%  73.93%
A      Hall_Monitor  67.90%  68.74%  68.82%  70.42%
A      Moth&Daug     56.84%  61.13%  68.42%  71.61%
B      News          61.43%  62.21%  67.96%  70.09%
B      Salesman      62.37%  64.06%  66.54%  71.72%
C      Carphone      41.51%  51.12%  62.43%  69.33%
C      Coastgrd      48.98%  55.32%  60.82%  66.14%
C      Foreman       43.62%  47.66%  54.49%  60.82%
Table 4.4 The PSNR (dB) of the motion estimation process for nine video sequences
using the JM-9.3 reference software.
Group  Sequence      QP 28  QP 32  QP 36  QP 40
A      Akiyo         38.77  35.91  33.32  30.85
A      Container     36.59  34.05  31.48  29.01
A      Hall_Monitor  37.81  35.03  32.09  29.29
A      Moth&Daug     37.78  35.07  32.72  30.59
B      News          37.25  34.25  31.46  28.75
B      Salesman      36.07  33.15  30.43  28.04
C      Carphone      37.53  34.81  32.36  29.83
C      Coastgrd      34.32  31.41  29.02  26.97
C      Foreman       36.80  34.09  31.48  28.84
Table 4.5 The PSNR (dB) of the motion estimation process for nine video sequences
using our proposed method.
Group  Sequence      QP 28  QP 32  QP 36  QP 40
A      Akiyo         38.73  35.83  33.27  30.79
A      Container     36.56  34.00  31.42  28.95
A      Hall_Monitor  37.78  35.04  32.03  29.24
A      Moth&Daug     37.74  35.01  32.65  30.54
B      News          37.20  34.16  31.36  28.68
B      Salesman      36.05  33.11  30.40  28.01
C      Carphone      37.48  34.73  32.25  29.73
C      Coastgrd      34.30  31.38  28.97  26.91
C      Foreman       36.76  34.01  31.37  28.77
Table 4.6 The PSNR (dB) analysis.
Group  Sequence      QP 28  QP 32  QP 36  QP 40
A      Akiyo         -0.04  -0.08  -0.05  -0.06
A      Container     -0.03  -0.05  -0.06  -0.06
A      Hall_Monitor  -0.03   0.01  -0.06  -0.05
A      Moth&Daug     -0.04  -0.06  -0.07  -0.05
B      News          -0.05  -0.09  -0.10  -0.07
B      Salesman      -0.02  -0.04  -0.03  -0.03
C      Carphone      -0.05  -0.08  -0.11  -0.10
C      Coastgrd      -0.02  -0.03  -0.05  -0.06
C      Foreman       -0.04  -0.08  -0.11  -0.07
Table 4.7 The bit-rate (kbit/s) of the motion estimation process for nine video
sequences using the JM-9.3 reference software.
Group  Sequence      QP 28   QP 32   QP 36   QP 40
A      Akiyo         63.61   42.86   29.47   20.66
A      Container     84.79   53.74   35.28   24.13
A      Hall_Monitor  99.23   66.08   43.84   29.28
A      Moth&Daug     81.40   48.74   30.31   19.17
B      News          125.21  82.76   54.76   35.96
B      Salesman      110.98  68.51   41.59   25.30
C      Carphone      140.69  85.41   54.07   34.88
C      Coastgrd      271.33  134.35  69.78   39.51
C      Foreman       179.53  113.69  72.89   47.48
Table 4.8 The bit-rate (kbit/s) of the motion estimation process for nine video
sequences using our proposed method.
Group  Sequence      QP 28   QP 32   QP 36   QP 40
A      Akiyo         64.49   42.81   29.35   20.74
A      Container     85.56   53.76   35.13   24.00
A      Hall_Monitor  102.93  68.15   44.52   29.66
A      Moth&Daug     83.17   48.71   30.26   19.15
B      News          130.20  84.51   55.52   36.31
B      Salesman      114.17  69.90   41.77   25.31
C      Carphone      145.14  86.38   53.88   34.95
C      Coastgrd      280.52  138.28  70.75   39.89
C      Foreman       187.65  117.04  73.78   47.91
Table 4.9 The bit-rate analysis.
Group  Sequence      QP 28   QP 32   QP 36   QP 40
A      Akiyo         1.36%   -0.12%  -0.41%  0.39%
A      Container     0.90%   0.04%   -0.43%  -0.54%
A      Hall_Monitor  3.59%   3.04%   1.53%   1.28%
A      Moth&Daug     2.13%   -0.06%  -0.17%  -0.10%
B      News          3.83%   2.07%   1.37%   0.96%
B      Salesman      2.79%   1.99%   0.43%   0.04%
C      Carphone      3.07%   1.12%   -0.35%  0.20%
C      Coastgrd      3.28%   2.84%   1.37%   0.95%
C      Foreman       4.33%   2.86%   1.21%   0.90%
4.2 The Rate-Distortion Analyses
In this section, the rate-distortion analyses for bit-rates from 100K to 3000K
bits/sec are illustrated. The rate-distortion analyses are performed with the following
two environment settings: (1) all the motion estimation modes in JM-9.3 are used;
(2) only the motion estimation modes 16×16, 16×8, and 8×16 are used. From Fig. 4.1
to Fig. 4.9, the simulation results show that the PSNR value of the proposed method
is close to the optimal value obtained from the JM-9.3 reference software.
Fig. 4.1 Rate-distortion curves for Akiyo video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.2 Rate-distortion curves for Carphone video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.3 Rate-distortion curves for Coastgrd video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.4 Rate-distortion curves for Container video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.5 Rate-distortion curves for Foreman video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.6 Rate-distortion curves for Hall_monitor video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.7 Rate-distortion curves for Mother_and_daughter video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.8 Rate-distortion curves for News video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.9 Rate-distortion curves for Salesman video sequence obtained by JM-9.3, our proposed method, and JM-9.3 with only Macroblock.
Fig. 4.10 The computation time of the motion estimation process for Akiyo (left) and Carphone (right) video sequences using the JM-9.3 reference software, our proposed
method, and JM-9.3 with only Macroblock.