ORGANIZATION - Efficient Realization of Video Forward, Reverse, and Fast Playback and Random Access in AVC Using a Symmetric Tree Prediction Structure

CHAPTER 1 INTRODUCTION

1.3 ORGANIZATION

The proposed symmetric tree prediction structure encoder and the proposed decoder supporting VCR functionality are described in detail in the following chapters. The organization and a summary of each chapter are as follows:

Chapter 2 first introduces previous work that addresses VCR functionality for compressed video. An overview of Advanced Video Coding (AVC) is then given, and some of its highlighted features are briefly described. The reference picture management and utilization method in AVC, which is closely related to the proposed structure, is described in detail.

Chapter 3 first describes the basic concept of the proposed symmetric tree prediction structure, followed by the details of its implementation in AVC. We then describe the concept of the proposed decoder that supports VCR functionality and walk through the decoder flow step by step. Finally, we discuss the cost of VCR functionality for the normal GOP structure as well as for the proposed structure. The trade-offs among various symmetric tree prediction structures are discussed in detail.

Chapter 4 presents the experimental results. The coding efficiency of various symmetric tree prediction structures is compared, and the coding efficiency of the proposed structure is compared with that of normal GOP structures.

Conclusions are given in Chapter 5, where we highlight the properties of the proposed prediction structure and of the decoder that supports VCR functionality.

Chapter 2 Background

2.1 Previous Work

Normal video encoding uses sequential temporal prediction, so the reference dependency follows the picture coding order within the GOP. If the GOP size is large, the long dependency chains of the later frames make VCR functionality difficult to achieve.

Several techniques for implementing VCR functionality have been proposed in previous work.

In [1], a macroblock-based scheme is proposed for reverse playback. It divides all macroblocks into forward macroblocks (FMB) and backward macroblocks (BMB). $MB^{n}(k,l)$ denotes the macroblock in the $n$-th frame at the $k$-th row and $l$-th column. $MB^{n-1}(k,l)$ is defined as a BMB if $MB^{n}(k,l)$ is predicted from the same spatial position, that is, if $MB^{n}(k,l)$ is coded without motion compensation. Otherwise, it is defined as an FMB. In backward display, an FMB is reconstructed by the formula:

$$MB^{n}(k,l) = \mathrm{MCMB}^{n-1}(mv^{n}(k,l)) + e^{n}(k,l)$$

where $\mathrm{MCMB}^{n-1}$ is the motion-compensated macroblock of $MB^{n}(k,l)$ and $e^{n}$ is the prediction error.

A BMB is reconstructed by the following formula (because $mv^{n}(k,l) = 0$):

$$MB^{n}(k,l) = MB^{n-1}(k,l) + e^{n}(k,l) \;\Rightarrow\; MB^{n-1}(k,l) = MB^{n}(k,l) - e^{n}(k,l)$$

In this algorithm, if we want to play frame n-1 after playing frame n, a BMB can be displayed by parsing only $e^{n}(k,l)$, but an FMB is a different case: all macroblocks in frame n-2 that serve as motion-compensated references for the FMBs in frame n-1 need to be sent. The BMBs are where the algorithm saves work. However, the percentage of BMBs varies from sequence to sequence, and the scheme applies only to step-by-step reverse playback. For fast forward or fast reverse playback, the percentage of BMBs will be small, so the improvement for full VCR functionality is limited.
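The BMB relation above can be illustrated with a minimal sketch. It assumes integer sample values stored as nested lists and ignores clipping and quantization effects; the function names are hypothetical and are not taken from [1].

```python
def forward_reconstruct(prev_mb, residual):
    """MB^n = MB^(n-1) + e^n for a block coded with a zero motion vector."""
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prev_mb, residual)]


def reverse_reconstruct_bmb(curr_mb, residual):
    """MB^(n-1) = MB^n - e^n: recover the previous block during reverse play."""
    return [[c - e for c, e in zip(c_row, e_row)]
            for c_row, e_row in zip(curr_mb, residual)]


# Toy 2x2 "macroblock" (assumed values, for illustration only).
mb_prev = [[100, 102], [98, 101]]
e_n = [[1, -2], [0, 3]]

mb_n = forward_reconstruct(mb_prev, e_n)          # normal forward decoding
recovered = reverse_reconstruct_bmb(mb_n, e_n)    # reverse playback of a BMB
```

Only the residual of frame n is needed to step back from frame n to frame n-1 for such a block, which is why BMBs are the inexpensive case.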

In [2], video transcoding is used for fast forward and fast backward playback. A different GOP structure must be defined for each playback speed. If a required frame is the first frame of a GOP, it is set as an intra frame; otherwise it is set as an inter frame. For example, suppose the original sequence consists of frames 0 to 17, where frames 0 and 9 are intra frames and the others are inter frames. At 4-times speed-up, frames 0, 4, 8, 12, and 16 are played in forward display. Frame 0 remains an intra frame, and frames 4 and 8 can be motion-compensated from frame 0 by summing the motion vectors of the skipped frames. Frame 12, however, cannot be obtained by simply parsing and summing motion vectors: it needs the intra frame 9, followed by motion compensation through frames 9 to 11. Therefore the algorithm defines frame 12 as an intra frame for 4-times speed-up. The following rule is used to select the intra frames:

If $(K \bmod L) < r$, where $L$ is the GOP size and $r$ is the playback speed-up factor, then the $K$-th frame is set as an intra frame.
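A minimal sketch of this selection rule, applied to the 18-frame example above (GOP size 9, 4-times speed-up), is given below. The function name and the dictionary output format are hypothetical choices made for illustration.

```python
def frame_types_for_speed(num_frames, gop_size, speed):
    """Map each displayed frame K to 'I' or 'P' using (K mod L) < r."""
    types = {}
    for k in range(0, num_frames, speed):  # frames shown at this speed
        types[k] = "I" if (k % gop_size) < speed else "P"
    return types


plan = frame_types_for_speed(num_frames=18, gop_size=9, speed=4)
# plan == {0: 'I', 4: 'P', 8: 'P', 12: 'I', 16: 'P'}
```

Frame 12 becomes an intra frame because 12 mod 9 = 3 is less than 4, which agrees with the example in the text.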

The motion vectors of the inter frames are re-estimated with four methods suited to four situations: in place, area-weighted average, maximum overlap, and median. These methods combine the original motion vectors into a new motion vector, but the combination accumulates error. When the speed-up factor is high the error becomes large, so a threshold is used to switch between intra coding and re-estimated inter coding. Even so, the PSNR degradation remains severe.

In [3], a dual-bitstream structure is used for each sequence: one bitstream is encoded in forward playback order and the other in reverse playback order. For example, a short sequence (frames 0 to 9) is encoded as shown in Table 2-1.

With the two bitstreams, the client requests a frame from the server, and the server finds the shortest path to that frame. For example, if the client requests frame 6, the server parses the forward bitstream, because only one frame (the I frame at position 6) has to be parsed before the client obtains the target frame. To obtain frame 5, the I frame at position 6 of the forward bitstream is parsed first, followed by the motion vectors and residue of frame 5 in the reverse bitstream.

This algorithm must store two bitstreams, and following the shortest path to the target frame sometimes requires switching between them, so the reconstructed frame may not match the encoder's reconstruction exactly.

Frame no.   0  1  2  3  4  5  6  7  8  9
Forward     I  P  P  P  P  P  I  P  P  P
Reverse     P  P  P  I  P  P  P  P  P  I

Table 2-1 Dual bitstream structure
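The shortest-path selection described for the dual-bitstream scheme can be sketched as follows, using the frame types of Table 2-1. The sketch assumes that the decoder may start at an I frame of either bitstream and switch bitstreams freely, and it ignores any mismatch caused by switching. The function names are hypothetical and not taken from [3].

```python
FORWARD = "IPPPPPIPPP"   # frame types of the forward bitstream, frames 0..9
REVERSE = "PPPIPPPPPI"   # frame types of the reverse bitstream, frames 0..9


def frames_to_parse(target, forward_types, reverse_types):
    """Fewest frames to parse to reach `target` when switching is allowed.

    Start at the nearest I frame of either bitstream, then step toward the
    target: forward-bitstream P frames when moving up in frame number,
    reverse-bitstream P frames when moving down.
    """
    intra = {i for i, t in enumerate(forward_types) if t == "I"}
    intra |= {i for i, t in enumerate(reverse_types) if t == "I"}
    return min(abs(target - p) + 1 for p in intra)


def forward_only_cost(target, forward_types):
    """Frames to parse when only the forward bitstream is available."""
    start = max(i for i in range(target + 1) if forward_types[i] == "I")
    return target - start + 1


cost_frame6 = frames_to_parse(6, FORWARD, REVERSE)   # 1: the I frame itself
cost_frame5 = frames_to_parse(5, FORWARD, REVERSE)   # 2: I frame 6, then frame 5
single_stream = forward_only_cost(5, FORWARD)        # 6: frames 0 through 5
```

For frame 5 the dual-bitstream path parses 2 frames, whereas a single forward bitstream would need 6.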

To implement VCR functionality with lower complexity and lower buffer cost, [4] constructs a new GOP structure. It decomposes the sequential structure into a hierarchical structure that greatly reduces the reference dependency. First, a binary tree structure is presented, shown in Figure 2-1 for N = 15; the level of a frame equals its reference dependency. In Figure 2-1 the maximum dependency is 3, which is much smaller than that of the normal IPPP structure.

The maximum dependency is reduced from N to log₂(N).

With the dependency reduced, random access is easy to implement.

Figure 2-1 also shows that 2ⁿ-times speed-up playback (forward or reverse) decodes no redundant frames, because the frames at the higher levels can simply be skipped. However, this structure does not support speed-up factors other than 2ⁿ without decoding redundant frames.

To reduce the number of redundant frames for non-2ⁿ speed-up, [4] proposes another GOP structure, shown in Figure 2-2. The center frame is the first encoded frame. The frames of level n connect to the edges of level n-1, and each level has the same number of frames. Comparing Figure 2-1 and Figure 2-2 for triple-speed playback that displays frames 1, 4, 7, 10, and 14 (shown in green in the figures), the binary tree decodes 5 redundant frames (3, 5, 9, 11, and 13), whereas the structure of Figure 2-2 decodes only 2 redundant frames (3 and 11).
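The dependency counts quoted for the binary tree of Figure 2-1 can be checked with a small sketch. It assumes that frames are numbered 0 to 14 and that each frame is predicted from the midpoint of its enclosing range; both are assumptions, since the figure is not reproduced here, and the helper names are hypothetical. The structure of Figure 2-2 is not modeled.

```python
def build_parents(lo, hi, parent=None, parents=None):
    """Give every frame in [lo, hi] a reference frame (None for the root)."""
    if parents is None:
        parents = {}
    if lo > hi:
        return parents
    mid = (lo + hi) // 2
    parents[mid] = parent                      # center frame of this range
    build_parents(lo, mid - 1, mid, parents)   # left half refers to the center
    build_parents(mid + 1, hi, mid, parents)   # right half refers to the center
    return parents


def level(frame, parents):
    """Number of reference hops from `frame` up to the root (intra) frame."""
    depth = 0
    while parents[frame] is not None:
        frame = parents[frame]
        depth += 1
    return depth


def frames_to_decode(display, parents):
    """All frames that must be decoded in order to show the `display` frames."""
    needed = set()
    for f in display:
        while f is not None and f not in needed:
            needed.add(f)
            f = parents[f]
    return needed


parents = build_parents(0, 14)
max_dependency = max(level(f, parents) for f in parents)

triple = [1, 4, 7, 10, 14]                     # display set quoted in the text
redundant_triple = sorted(frames_to_decode(triple, parents) - set(triple))

double = [1, 3, 5, 7, 9, 11, 13]               # 2x speed-up: levels 0 to 2 only
redundant_double = sorted(frames_to_decode(double, parents) - set(double))
```

Under these assumptions the sketch gives a maximum dependency of 3, the redundant frames 3, 5, 9, 11, and 13 for the triple-speed example, and no redundant frames for double speed.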

The structures of Figure 2-1 and Figure 2-2 provide much faster random access and fast playback with fewer redundant frames. However, [4] does not implement the structures, so their coding efficiency is unknown. Furthermore, [4] considers only uni-directional prediction; bi-directional prediction is not discussed.

Figure 2-1 binary tree structure with N = 15 [4]

Figure 2-2 proposed GOP structure of previous work [4]

2.2 Advanced Video Coding

Advanced Video Coding (AVC), unlike previous video coding standards such as MPEG-2, provides a flexible reference picture management and utilization scheme. We use this scheme to realize our tree-like prediction structure. Section 2.2.1 gives an overview of AVC. Section 2.2.2 describes the concept and syntax of the reference picture management and utilization scheme in AVC.

2.2.1 Overview

AVC is the newest video coding standard, developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG. It provides better coding efficiency than MPEG-4 and H.263. The detailed syntax and decoding method are described in [7]. In the following, we first briefly describe the AVC encoding process and then some of the highlighted features of AVC that enable its improved coding efficiency.

In the AVC encoding process, a video sequence is divided into pictures, and each picture is processed macroblock by macroblock.

Figure 2-3 shows the AVC encoder block diagram. In inter prediction mode, block-based motion estimation and motion compensation are used to generate the prediction image. In intra prediction mode, the previously coded macroblocks of the same picture are used to generate the prediction image. The best prediction mode is selected by the mode decision scheme. The prediction of the best mode is subtracted from the original image to form the prediction residue (Dn). DCT and quantization are then applied to the residue, and the results are entropy coded to generate the bitstream. The reconstructed pictures are generated by reversing these encoding steps and are stored in the reference picture buffer, where they are used for inter prediction of the following pictures.

Figure 2-3 AVC encoder [7]

Compared with previous video coding standards such as MPEG-2, AVC has the following new features that improve coding efficiency [8]:

• Variable block-size motion compensation with small block sizes: AVC supports seven different block sizes, ranging from 4x4 to 16x16.

• Quarter-sample-accurate motion compensation: most prior standards allow at most half-sample motion vector accuracy.

• Motion vectors over picture boundaries: the picture boundary extrapolation technique, first used in H.263, is included in AVC.

• Weighted prediction: this can dramatically improve coding efficiency for illumination changes within a scene.

• Small block-size transform: prior standards use an 8x8 transform block, whereas AVC uses a 4x4 transform, which lets the encoder represent signals in a more locally adaptive way.

• Exact-match inverse transform: in previous standards, the DCT and the inverse DCT do not give perfect reconstruction. In AVC, the transform and the inverse transform match exactly.

• Arithmetic entropy coding: a powerful arithmetic coding method known as CABAC is adopted in AVC. Either CAVLC or CABAC can be chosen for entropy coding.

• Context-adaptive entropy coding: both entropy coding methods adopted in AVC, CAVLC and CABAC, use context-based adaptivity to improve performance.

Besides the above features, AVC also provides a flexible reference picture management and utilization scheme, which is closely related to our work. It is described in detail in the following section.

2.2.2 Reference picture management process

The features of the reference picture management scheme in AVC can be classified as follows [8]:

• Multiple reference picture motion compensation: prior standards use only 1 reference picture for forward prediction and 2 reference pictures for bi-directional prediction. In AVC, up to 16 reference pictures can be used.

• Decoupling of referencing order from display order: prior standards impose strict limits on the encoding order. AVC removes this restriction, so the encoder can choose the order of pictures with a high degree of flexibility.

• Decoupling of picture representation methods from picture referencing capability: in prior standards, a bi-directionally predicted picture cannot be used as a reference for prediction, whereas AVC removes this restriction.

To support these features, AVC uses the following methods for reference picture management [7]. All of the related syntax is stored in the slice header of each slice, so the decoder can determine the temporal dependency of each picture by decoding only the slice header rather than the whole picture.

2.2.2.1 Reference picture list initialization process

Previous video codecs such as MPEG-4 use only one forward reference picture for P-pictures, or one forward and one backward reference picture for B-pictures, so there is no need to control the reference picture order. In AVC, up to 16 reference pictures can be used. AVC uses the “reference picture list” to list the reference pictures available to the picture being processed. Each index of the reference picture list maps to a reference picture in the decoded picture buffer (DPB). During motion estimation (ME) and motion compensation (MC) of an inter-predicted block, the block simply signals an index into the reference picture list to indicate which reference picture it uses. With this structure, the AVC standard must provide a mechanism to “order” the reference pictures in the reference picture list, that is, to decide which reference picture is placed at which index. The “reference picture list initialization process” provides the default ordering method in AVC.

In AVC, reference pictures are divided into two types, short-term reference pictures and long-term reference pictures, each with its own management method. Each short-term reference picture is assigned a variable PicNum, and each long-term reference picture is assigned a variable LongTermPicNum. During the memory management process in AVC, a short-term or long-term reference picture is identified by its PicNum or LongTermPicNum, respectively. For a short-term reference frame, PicNum is generally set to a value related to the display order of that frame. For a long-term reference frame, LongTermPicNum is set by the “memory management control operation” (MMCO) in AVC, which is described in detail in Section 2.2.2.3.

In the following, we describe the reference picture list initialization process for P slices and B slices.

P slices reference list initialization

For a P slice, the reference picture list RefPicList0 is ordered so that the short-term reference frames have lower indices than the long-term reference frames. The short-term reference frames are ordered from the largest PicNum to the smallest PicNum. The long-term reference frames are ordered from the smallest LongTermPicNum to the largest LongTermPicNum.

An example of P slice list initialization:

Assume there are 5 reference frames: 3 short-term reference frames with PicNum = 303, 302, 300 and 2 long-term reference frames with LongTermPicNum = 0 and 3. After the initialization,

RefPicList0 [0] is the short-term reference picture with PicNum = 303
RefPicList0 [1] is the short-term reference picture with PicNum = 302
RefPicList0 [2] is the short-term reference picture with PicNum = 300
RefPicList0 [3] is the long-term reference picture with LongTermPicNum = 0
RefPicList0 [4] is the long-term reference picture with LongTermPicNum = 3
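The P slice ordering rule can be sketched in a few lines. The function name and the (kind, number) tuple representation are hypothetical; a real decoder would order entries of the decoded picture buffer rather than plain integers.

```python
def init_p_list(short_term_pic_nums, long_term_pic_nums):
    """Default RefPicList0 for a P slice.

    Short-term frames come first in descending PicNum order, followed by
    long-term frames in ascending LongTermPicNum order.
    """
    short = [("short", n) for n in sorted(short_term_pic_nums, reverse=True)]
    long_ = [("long", n) for n in sorted(long_term_pic_nums)]
    return short + long_


# Input order is arbitrary; the result matches the example above.
ref_pic_list0 = init_p_list([300, 303, 302], [3, 0])
```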

B slices reference list initialization

For a B slice, the reference picture lists are also ordered so that the short-term reference frames have lower indices than the long-term reference frames. The short-term reference frames are further divided into two parts. The first part contains all reference frames whose PicNum is smaller than the current PicNum, and the second part contains all reference frames whose PicNum is larger than the current PicNum. The first part is ordered from the largest PicNum to the smallest PicNum. The second part is ordered from the smallest PicNum to the largest PicNum.

The reference picture list RefPicList0 starts with all ordered short-term reference frames of the first part, followed by all ordered short-term reference frames of the second part, and ends with the long-term reference frames in the same order as used for P slices. The reference picture list RefPicList1 starts with all ordered short-term reference frames of the second part, followed by all ordered short-term reference frames of the first part, and again ends with the long-term reference frames in the same order as used for P slices. Note that if, after this ordering, RefPicList1 is identical to RefPicList0, the first two entries of RefPicList1 are swapped.
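A minimal sketch of the B slice ordering rules described above follows. It uses PicNum as the ordering key, as in the text, and a hypothetical (kind, number) tuple per list entry. The guard that the swap applies only when the list has more than one entry is an added assumption, as are the input values.

```python
def init_b_lists(current_pic_num, short_term_pic_nums, long_term_pic_nums):
    """Default RefPicList0 and RefPicList1 for a B slice."""
    # First part: PicNum smaller than the current one, descending order.
    before = sorted((n for n in short_term_pic_nums if n < current_pic_num),
                    reverse=True)
    # Second part: PicNum larger than the current one, ascending order.
    after = sorted(n for n in short_term_pic_nums if n > current_pic_num)
    longs = [("long", n) for n in sorted(long_term_pic_nums)]

    list0 = [("short", n) for n in before + after] + longs
    list1 = [("short", n) for n in after + before] + longs

    # If both lists are identical, swap the first two entries of RefPicList1.
    if len(list1) > 1 and list1 == list0:
        list1[0], list1[1] = list1[1], list1[0]
    return list0, list1


list0, list1 = init_b_lists(301, [303, 302, 300, 299], [0, 3])

# With references on one side only, both lists start out identical,
# so the swap rule applies to RefPicList1.
same0, same1 = init_b_lists(301, [300, 299], [])
```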

An example for a B slice:

Assume there are 6 reference frames: 4 short-term reference frames with PicNum = 303, 302, 300, 299 and 2 long-term reference frames with LongTermPicNum = 0 and 3. The current frame has PicNum = 301. After the initialization,

RefPicList0 [0] is the short-term reference picture with PicNum = 300
RefPicList0 [1] is the short-term reference picture with PicNum = 299
RefPicList0 [2] is the short-term reference picture with PicNum = 302
RefPicList0 [3] is the short-term reference picture with PicNum = 303
RefPicList0 [4] is the long-term reference picture with LongTermPicNum = 0
RefPicList0 [5] is the long-term reference picture with LongTermPicNum = 3

RefPicList1 [0] is the short-term reference picture with PicNum = 302
RefPicList1 [1] is the short-term reference picture with PicNum = 303
RefPicList1 [2] is the short-term reference picture with PicNum = 300
RefPicList1 [3] is the short-term reference picture with PicNum = 299
RefPicList1 [4] is the long-term reference picture with LongTermPicNum = 0
RefPicList1 [5] is the long-term reference picture with LongTermPicNum = 3

2.2.2.2 Reference picture list reordering process

Section 2.2.2.1 introduced the initialization of the reference picture list. The initialization process orders the short-term reference pictures so that temporally closer reference frames have lower indices in the list. The reason is that temporally closer reference frames usually provide a better prediction image, and the reference index is encoded for every inter-predicted block. In the AVC entropy coding methods, such as CABAC, a lower reference picture index is coded with fewer bits, so placing temporally closer frames at lower indices gives better coding efficiency. On the other hand, the initialization process orders the long-term reference frames from the smallest LongTermPicNum to the largest LongTermPicNum, which does not reflect how frequently they are used. Moreover, the temporally closest reference frame is sometimes not the best reference frame, in which case we want to reorder the reference picture list. Another reason for reordering is that for some prediction structures, such as the proposed tree prediction structure, some reference pictures must be moved outside the scope of the temporal prediction process of the picture being processed in order to reduce the temporal dependency. To address these issues, AVC provides the reference picture list reordering process, which gives the user full control over the order of the reference picture list.

AVC has two reference picture lists. If only forward prediction is used, RefPicList0 is used. If forward and backward prediction are used at the same time, both RefPicList0 and RefPicList1 are used. The ref_pic_list_reordering_flag_lX syntax element indicates which list is being controlled: if ref_pic_list_reordering_flag_l0 = 1, RefPicList0 is reordered, and if ref_pic_list_reordering_flag_l1 = 1, RefPicList1 is reordered.

The second syntax element is reordering_of_pic_nums_idc. If reordering_of_pic_nums_idc = 0 or 1, the reordering operation applies to a short-term reference frame. If reordering_of_pic_nums_idc = 2, the reordering operation applies to a long-term reference frame. If reordering_of_pic_nums_idc = 3, the reordering process ends.

If reordering_of_pic_nums_idc = 0, we parse abs_diff_pic_num_minus1[i] = k that we move the reference of pic_num = ( current pic_num -
