
CHAPTER 2 BACKGROUND

2.1 PREVIOUS WORK

Conventional video encoding uses sequential temporal prediction, so the reference dependency follows the picture coding order within the GOP. If the GOP size is large, the long dependency chains of later frames make VCR functionality difficult to achieve.

Several techniques for implementing VCR functionality have been proposed in previous work.

In [1], a macroblock-based scheme is proposed for reverse playback. It classifies every macroblock as either a forward macroblock (FMB) or a backward macroblock (BMB). Let $MB_n(k,l)$ denote the macroblock in the $k$th row and $l$th column of frame $n$. $MB_{n-1}(k,l)$ is defined as a BMB if the macroblock at the same spatial position in the next frame, $MB_n(k,l)$, is coded without motion compensation. Otherwise, it is defined as an FMB. For an FMB, the reconstruction formula is:

$$MB_n(k,l) = MCMB_{n-1}(mv_n(k,l)) + e_n(k,l)$$

where $MCMB_{n-1}(mv_n(k,l))$ is the motion-compensated macroblock of $MB_n(k,l)$, $mv_n(k,l)$ is its motion vector, and $e_n(k,l)$ is the prediction error.

For a BMB, the motion vector $mv_n(k,l)$ is zero, so the motion-compensated macroblock is $MB_{n-1}(k,l)$ itself and the formula becomes:

$$MB_n(k,l) = MB_{n-1}(k,l) + e_n(k,l)$$

$$\Rightarrow MB_{n-1}(k,l) = MB_n(k,l) - e_n(k,l)$$

With this algorithm, when frame $n-1$ is to be displayed after frame $n$, a BMB can be reconstructed by parsing only $e_n(k,l)$. An FMB is different: all macroblocks in frame $n-2$ that serve as motion-compensation references for the FMBs in frame $n-1$ must be sent. The BMBs are therefore where the algorithm saves data. However, the percentage of BMBs varies from sequence to sequence, and the scheme applies only to step-by-step reverse playback. For fast forward or fast reverse playback, the percentage of BMBs becomes small, so the improvement for full VCR functionality is limited.
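The BMB relation can be illustrated with a small sketch. It assumes a toy 2x2 block of integer sample values and ignores clipping and quantization effects; the function names `decode_forward` and `decode_backward` are hypothetical and do not come from [1].

```python
def decode_forward(prev_mb, residual):
    """Forward decoding of a zero-motion macroblock: MB_n = MB_{n-1} + e_n."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(prev_mb, residual)]


def decode_backward(cur_mb, residual):
    """Reverse decoding of a BMB: MB_{n-1} = MB_n - e_n."""
    return [[c - e for c, e in zip(crow, erow)]
            for crow, erow in zip(cur_mb, residual)]


# Toy data (assumed values, for illustration only).
mb_prev = [[100, 102], [98, 97]]   # macroblock in frame n-1
e_n = [[1, -2], [0, 3]]            # prediction error of frame n

mb_cur = decode_forward(mb_prev, e_n)      # macroblock in frame n
recovered = decode_backward(mb_cur, e_n)   # frame n-1 recovered from frame n
```

Only the residual of frame n is needed to step back from frame n to frame n-1, which is why a BMB costs no extra data in reverse playback.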

In [2], video transcoding is used for fast forward and fast backward playback. A different GOP structure must be defined for each playback speed. If a requested frame is the first frame of a GOP, it is coded as an intra frame; otherwise it is coded as an inter frame. For example, suppose the original sequence consists of frames 0 to 17, where frames 0 and 9 are intra frames and the rest are inter frames. At 4x forward playback, frames 0, 4, 8, 12, and 16 are displayed. Frame 0 remains an intra frame, and frames 4 and 8 can be motion compensated from frame 0 by summing the intermediate motion vectors. Frame 12, however, cannot be obtained simply by summing motion vectors: decoding it requires intra frame 9, followed by motion compensation through frames 10 and 11. The algorithm therefore codes frame 12 as an intra frame at 4x speed. The intra frames are selected by the following rule:

If $(K \bmod L) < r$, where $L$ is the GOP size and $r$ is the playback speed, then the $K$th frame is coded as an intra frame.
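The rule can be checked against the 4x example with a short sketch. It assumes that playback starts at frame 0 and that every `speed`-th frame is displayed; the function name `frame_types` is hypothetical and is not taken from [2].

```python
def frame_types(gop_size, speed, num_frames):
    """Label each displayed frame K as intra ("I") if K mod L < r, else inter ("P").

    gop_size is L, speed is r. Displayed frames are assumed to be
    0, r, 2r, ... below num_frames.
    """
    displayed = range(0, num_frames, speed)
    return [(k, "I" if k % gop_size < speed else "P") for k in displayed]


# Example from the text: frames 0 to 17, GOP size 9, 4x speed.
types = frame_types(gop_size=9, speed=4, num_frames=18)
```

For this example the rule marks frames 0 and 12 as intra frames and frames 4, 8, and 16 as inter frames, which agrees with the description above.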

The motion vectors of the inter frames are re-estimated using four methods for four situations: in place, area-weighted average, maximum overlap, and median. These methods combine the original motion vectors into a new motion vector, but the combination causes errors to accumulate. When the speed-up rate is high, the error becomes large, so a threshold is used to switch between intra coding and re-estimated inter coding. Even so, the PSNR degradation remains severe.

In [3], a dual-bitstream structure is used for each sequence: one bitstream is encoded in forward playback order and the other in reverse playback order. For example, a sequence of 10 frames (numbered 0 to 9) is encoded as shown in Table 2-1.

With the two bitstreams available, the client requests a frame from the server, and the server finds the shortest decoding path to that frame. For example, if the client requests frame 6, the server parses the forward bitstream, because frame 6 is an I frame there and only one frame needs to be parsed. To obtain frame 5, the server first parses I frame 6 of the forward bitstream and then parses the motion vectors and residue of frame 5 from the reverse bitstream.

This algorithm must store two bitstreams, and it sometimes has to switch between them to follow the minimum path to the target frame, so the reconstructed frame may not match perfectly.

Frame no.   0  1  2  3  4  5  6  7  8  9
Forward     I  P  P  P  P  P  I  P  P  P
Reverse     P  P  P  I  P  P  P  P  P  I

Table 2-1 Dual bitstream structure
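The shortest-path search over the two bitstreams of Table 2-1 can be sketched as a breadth-first search. This is an illustrative model, not the procedure specified in [3]. It assumes that the cost is the number of frames parsed, that a P frame in the forward bitstream is predicted from the previous frame, that a P frame in the reverse bitstream is predicted from the next frame, and that switching bitstreams is free. The name `min_frames_to_parse` is hypothetical.

```python
from collections import deque

FORWARD = "IPPPPPIPPP"   # forward bitstream frame types, Table 2-1
REVERSE = "PPPIPPPPPI"   # reverse bitstream frame types, Table 2-1


def min_frames_to_parse(target, forward=FORWARD, reverse=REVERSE):
    """Return the minimum number of frames parsed to reconstruct `target`."""
    n = len(forward)
    cost = {}
    queue = deque()
    # Any I frame in either bitstream can be decoded on its own (cost 1).
    for f in range(n):
        if forward[f] == "I" or reverse[f] == "I":
            cost[f] = 1
            queue.append(f)
    while queue:
        f = queue.popleft()
        # Forward bitstream: P frame f+1 is predicted from frame f.
        if f + 1 < n and forward[f + 1] == "P" and f + 1 not in cost:
            cost[f + 1] = cost[f] + 1
            queue.append(f + 1)
        # Reverse bitstream: P frame f-1 is predicted from frame f.
        if f - 1 >= 0 and reverse[f - 1] == "P" and f - 1 not in cost:
            cost[f - 1] = cost[f] + 1
            queue.append(f - 1)
    return cost[target]


costs = [min_frames_to_parse(t) for t in range(len(FORWARD))]
```

Under these assumptions frame 6 costs one parsed frame and frame 5 costs two, matching the examples in the text. With the forward bitstream alone, frame 5 would require parsing frames 0 through 5.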

To implement VCR functionality with lower complexity and lower buffer cost, [4] constructs a new GOP structure. It replaces the sequential structure with a hierarchical one, which greatly reduces the reference dependency. First, a binary tree structure is presented, as shown in Figure 2-1 for N = 15. The level of a frame equals its reference dependency. In Figure 2-1 the maximum dependency is 3, which is much smaller than that of the normal IPPP structure.

The maximum dependency is reduced from $N$ to $\log_2 N$.

With the dependency reduction, random access functionality is easy to implement.
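The dependency reduction can be illustrated with a small sketch that compares a sequential chain with a complete binary tree. Nodes are indexed in heap order (node 0 is the root, and node i has parent (i - 1) // 2). This indexing is an assumption made for illustration and is not the frame numbering of Figure 2-1. Dependency is counted as the number of frames that must be decoded before a given frame.

```python
def chain_depth(i):
    """IPPP chain: frame i depends on all i frames before it."""
    return i


def tree_depth(i):
    """Complete binary tree in heap order: node i has floor(log2(i + 1)) ancestors."""
    return (i + 1).bit_length() - 1


def ancestors(i):
    """Nodes that must be decoded before node i, listed from the root down."""
    path = []
    while i > 0:
        i = (i - 1) // 2
        path.append(i)
    return path[::-1]


N = 15
max_chain = max(chain_depth(i) for i in range(N))   # deepest frame in the chain
max_tree = max(tree_depth(i) for i in range(N))     # deepest node in the tree
refs_for_last = ancestors(N - 1)                    # references of the last node
```

For N = 15 the deepest tree node needs only 3 reference frames, compared with 14 for the last frame of the chain, so random access touches far fewer frames.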

Figure 2-1 also shows that $2^n$-times speed-up playback (forward or reverse) requires no redundant frame decoding, because the frames at the deeper levels can simply be skipped. However, this structure cannot provide non-$2^n$ speed-up playback without redundant frame decoding.

To reduce the redundant frames for non-$2^n$ speed-up, [4] proposes another GOP structure, shown in Figure 2-2. It sets the center frame as the first encoded frame.

Each frame at level n connects to the edges of level n-1, and every level contains the same number of frames. Comparing Figure 2-1 and Figure 2-2 for triple-speed playback, which displays frames 1, 4, 7, 10, and 14 (shown in green in the figures), the binary tree requires 5 redundant decoded frames (3, 5, 9, 11, and 13), whereas the structure in Figure 2-2 requires only 2 (frames 3 and 11).

The structures in Figure 2-1 and Figure 2-2 provide much faster random access and fast playback with fewer redundant frames. However, [4] does not implement the structures, so their coding efficiency is unknown. Furthermore, [4] considers only unidirectional prediction; bidirectional prediction is not discussed.

Figure 2-1 Binary tree structure with N = 15 [4]

Figure 2-2 Proposed GOP structure of previous work [4]
