PREVIOUS WORK - Efficient Forward, Reverse, Fast Playback and Random Access of Video in AVC Using a Symmetric Tree-Structured Prediction Architecture

CHAPTER 2 BACKGROUND

2.1 PREVIOUS WORK

Conventional video encoding uses sequential temporal prediction, so the reference dependency follows the picture coding order within the GOP. If the GOP size is large, the long dependency chains of the later frames make VCR functionality difficult to achieve.

Several techniques for implementing VCR functionality have been proposed in previous work.

In [1], a macroblock-based scheme is proposed for reverse play. It classifies all macroblocks as either forward macroblocks (FMB) or backward macroblocks (BMB). Let $MB^{n}(k,l)$ denote the macroblock in the $k$-th row and $l$-th column of the $n$-th frame. $MB^{n-1}(k,l)$ is defined as a BMB if the macroblock at the same spatial position in the next frame, $MB^{n}(k,l)$, is coded without motion compensation; otherwise, it is defined as an FMB. In the FMB case, the reconstruction formula is:

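$$MB^{n}(k,l) = MCMB^{n-1}\big(mv^{n}(k,l)\big) + e^{n}(k,l)$$

where $MCMB^{n-1}\big(mv^{n}(k,l)\big)$ is the motion-compensated macroblock of $MB^{n}(k,l)$, $mv^{n}(k,l)$ is its motion vector, and $e^{n}(k,l)$ is the prediction error.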

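For a BMB, no motion compensation is used, so the motion-compensated macroblock is simply the co-located macroblock, $MCMB^{n-1}\big(mv^{n}(k,l)\big) = MB^{n-1}(k,l)$, and the formula becomes:

$$MB^{n}(k,l) = MB^{n-1}(k,l) + e^{n}(k,l) \;\Rightarrow\; MB^{n-1}(k,l) = MB^{n}(k,l) - e^{n}(k,l)$$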
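As a minimal sketch of the BMB case, the following Python snippet recovers the co-located macroblock of frame n-1 from the decoded macroblock of frame n and its prediction error. The function names and the 2x2 sample values are hypothetical, and clipping and quantization effects are ignored.

```python
def reconstruct_forward(mb_prev, err):
    """Normal decoding without motion compensation: MB^n = MB^(n-1) + e^n."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(mb_prev, err)]


def reconstruct_bmb_reverse(mb_cur, err):
    """Reverse play of a BMB: MB^(n-1) = MB^n - e^n."""
    return [[c - e for c, e in zip(crow, erow)]
            for crow, erow in zip(mb_cur, err)]


# Hypothetical 2x2 "macroblock" and prediction error (illustration only).
mb_prev = [[100, 102], [98, 97]]
err = [[1, -2], [0, 3]]

mb_cur = reconstruct_forward(mb_prev, err)        # frame n, as decoded in forward play
recovered = reconstruct_bmb_reverse(mb_cur, err)  # frame n-1, recovered in reverse play
print(recovered == mb_prev)
```

Only the prediction error of frame n is needed in this case, which is why BMBs are the cheap part of reverse playback.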

With this algorithm, to display frame $n-1$ after frame $n$, a BMB can be reconstructed by parsing only $e^{n}(k,l)$. An FMB is different: all macroblocks in frame $n-2$ that serve as motion-compensation references for the FMBs in frame $n-1$ must be sent. The savings of the algorithm therefore come from the BMBs. However, the percentage of BMBs varies from sequence to sequence, and the scheme applies only to step-by-step reverse playback. For fast forward/reverse playback, the percentage of BMBs is small, so the improvement for full VCR functionality is limited.

In [2], video transcoding is used for fast forward/backward playback. A different GOP structure must be defined for each playback speed. If a required frame is the first displayed frame of a GOP, it is coded as an intra frame; otherwise, it is coded as an inter frame. For example, suppose the original sequence consists of frames 0 to 17, where frames 0 and 9 are intra frames and the others are inter frames. At 4x speed, forward playback displays frames 0, 4, 8, 12, and 16. Frame 0 is already an intra frame, and frames 4 and 8 can be motion compensated from frame 0 by summing the intermediate motion vectors. Frame 12, however, cannot be obtained by simply accumulating motion vectors: it requires intra frame 9 first, followed by motion compensation through frames 10 and 11. Therefore the algorithm codes frame 12 as an intra frame for 4x playback. The rule for selecting intra frames is:

If $(K \bmod L) < r$, where $K$ is the index of a displayed frame, $L$ is the GOP size, and $r$ is the playback speed, then the $K$-th frame is coded as an intra frame.
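The rule can be checked against the example above with a short Python sketch. The function name and parameters are hypothetical, and the sketch assumes that playback starts at frame 0 and displays every r-th frame.

```python
def intra_frames_for_speed(num_frames, gop_size, speed):
    """Return (displayed frames, frames coded as intra) for a given speed-up.

    Applies the rule (K mod L) < r to each displayed frame K,
    with L = gop_size and r = speed.
    """
    displayed = list(range(0, num_frames, speed))
    intra = [k for k in displayed if k % gop_size < speed]
    return displayed, intra


# Example from the text: frames 0..17, GOP size 9, 4x speed.
displayed, intra = intra_frames_for_speed(num_frames=18, gop_size=9, speed=4)
print(displayed)  # [0, 4, 8, 12, 16]
print(intra)      # [0, 12]
```

Frames 0 and 12 are the first displayed frames of their GOPs, which agrees with the choice of frame 12 as an intra frame in the 4x example.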

The motion vectors of the inter frames are re-estimated with one of four methods, chosen according to the situation: in place, area-weighted average, maximum overlap, and median. Each method combines the original motion vectors into a new motion vector, but the combination accumulates error. At high speed-up rates the error becomes large, so a threshold is used to switch between intra coding and re-estimated inter coding. Even so, the PSNR degradation remains severe.

In [3], a dual-bitstream structure is used for each sequence: one bitstream is encoded in forward playback order and the other in reverse playback order. For example, a short sequence (frames 0 to 9) is encoded as shown in Table 2-1.

With two bitstreams available, the client requests a frame from the server, and the server finds the shortest decoding path to that frame. For example, if the client requests frame 6, the server parses the forward bitstream, because frame 6 is an I frame there and only one frame needs to be parsed. If frame 5 is requested, the server first parses I frame 6 from the forward bitstream and then parses the motion vectors and residue of frame 5 from the reverse bitstream to obtain frame 5.

This algorithm must store two bitstreams, and it sometimes has to switch between them to follow the minimum path to the target frame. After a switch, the reference frame differs from the one used at encoding time, so the reconstruction may not match perfectly.

Frame no.   0   1   2   3   4   5   6   7   8   9
Forward     I   P   P   P   P   P   I   P   P   P
Reverse     P   P   P   I   P   P   P   P   P   I

Table 2-1 Dual bitstream structure
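The shortest-path selection can be sketched in Python for the structure of Table 2-1. This is an illustration under stated assumptions, not the algorithm of [3]: every parsed frame is assumed to cost one unit, switching is assumed to be allowed at any frame, and the function name `decoding_path` is hypothetical.

```python
def decoding_path(forward, reverse, target):
    """Return the (stream, frame) steps of a shortest path to `target`.

    `forward` and `reverse` give the frame type ('I' or 'P') of every frame
    in the two bitstreams. A forward P frame is predicted from the previous
    frame, a reverse P frame from the next frame.
    """
    # Candidate starting points: frames coded as I in either bitstream.
    anchors = [i for i, (f, r) in enumerate(zip(forward, reverse))
               if "I" in (f, r)]
    start = min(anchors, key=lambda i: abs(i - target))
    path = [("forward" if forward[start] == "I" else "reverse", start)]
    if start < target:
        # Walk up to the target with forward P frames.
        path += [("forward", k) for k in range(start + 1, target + 1)]
    else:
        # Walk down to the target with reverse P frames (empty if start == target).
        path += [("reverse", k) for k in range(start - 1, target - 1, -1)]
    return path


forward = "IPPPPPIPPP"  # Table 2-1, forward bitstream
reverse = "PPPIPPPPPI"  # Table 2-1, reverse bitstream
print(decoding_path(forward, reverse, 6))  # [('forward', 6)]
print(decoding_path(forward, reverse, 5))  # [('forward', 6), ('reverse', 5)]
```

For frame 5 the path needs two frames, compared with six frames using only the forward bitstream and five using only the reverse bitstream.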

To implement VCR functionality with lower complexity and lower buffer cost, [4] constructs a new GOP structure. It decomposes the sequential structure into a hierarchical structure that greatly reduces reference dependency. First, a binary tree structure is presented, as shown in Figure 2-1 for N = 15. The level of a frame equals its reference dependency. In Figure 2-1 the maximum dependency is 3, which is much smaller than in the conventional IPPP structure.

The maximum dependency is reduced from the order of $N$ to the order of $\log_2 N$.

With the dependency reduction, random access functionality is easy to implement.
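The reduction in dependency can be illustrated with a small Python sketch. It assumes that the binary tree is built by repeatedly taking the middle frame of a range as the frame coded first and predicting the middle frames of the two halves from it. This layout is an assumption that is consistent with the N = 15 example, not a reproduction of Figure 2-1, and the function name is hypothetical.

```python
def tree_max_dependency(lo, hi):
    """Longest chain of reference frames above any frame in the range lo..hi.

    The middle frame of the range is coded first (dependency 0 within the
    range); each half is then handled recursively, one level deeper.
    """
    if lo > hi:
        return -1  # empty range contributes no level
    mid = (lo + hi) // 2
    return 1 + max(tree_max_dependency(lo, mid - 1),
                   tree_max_dependency(mid + 1, hi))


n = 15
sequential = n - 1                    # IPPP: the last frame needs all n-1 earlier frames
tree = tree_max_dependency(0, n - 1)  # binary tree over frames 0..14
print(sequential, tree)  # 14 3
```

Random access to any frame therefore needs at most 3 reference frames in the tree, against up to 14 in the sequential structure.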

From Figure 2-1, we can also see that $2^n$-times speed-up playback (forward or reverse) requires no redundant frame decoding, because the frames at the deepest levels can simply be skipped. However, this structure does not support speed-up factors other than $2^n$ without decoding redundant frames.

To reduce the redundant frames for non-$2^n$ speed-up, [4] proposes another GOP structure, shown in Figure 2-2. It sets the center frame as the first encoded frame. Frames at level $n$ are connected to the edge frames of level $n-1$, and each level contains the same number of frames. Comparing Figure 2-1 and Figure 2-2 for triple-speed playback, which displays frames 1, 4, 7, 10, and 14 (shown in green in the figures), the binary tree decodes 5 redundant frames (3, 5, 9, 11, and 13), whereas the structure of Figure 2-2 decodes only 2 redundant frames (3 and 11).
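The redundant-frame count for the binary tree can be reproduced with a short Python sketch. It assumes frames numbered 0 to 14 and a tree in which the middle frame of each range is the parent of the middle frames of its two halves. This layout is inferred from the example in the text, not taken from Figure 2-1, and the function names are hypothetical. The structure of Figure 2-2 is not modeled because the text does not specify it fully.

```python
def build_parents(lo, hi, parent=None, parents=None):
    """Map each frame in lo..hi to its reference frame (None for the root)."""
    if parents is None:
        parents = {}
    if lo > hi:
        return parents
    mid = (lo + hi) // 2
    parents[mid] = parent
    build_parents(lo, mid - 1, mid, parents)
    build_parents(mid + 1, hi, mid, parents)
    return parents


def redundant_frames(parents, displayed):
    """Frames that must be decoded as references but are never displayed."""
    decoded = set()
    for frame in displayed:
        # Follow the reference chain up to the root or an already decoded frame.
        while frame is not None and frame not in decoded:
            decoded.add(frame)
            frame = parents[frame]
    return sorted(decoded - set(displayed))


parents = build_parents(0, 14)
print(redundant_frames(parents, [1, 4, 7, 10, 14]))        # [3, 5, 9, 11, 13]
print(redundant_frames(parents, [1, 3, 5, 7, 9, 11, 13]))  # []
```

The first call gives the same five redundant frames as the triple-speed example, and the second shows that double-speed playback over the upper levels decodes no redundant frame.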

The structures of Figure 2-1 and Figure 2-2 provide much faster random access and fast playback with fewer redundant frames. However, [4] does not implement the structures, so their coding efficiency is unknown. Furthermore, only uni-directional prediction is considered; bi-directional prediction is not discussed.

Figure 2-1 binary tree structure with N = 15 [4]

Figure 2-2 proposed GOP structure of previous work [4]
