
Overview of the MPEG-4 Video Standard

2.2 MPEG-4 Video Texture Coding

2.2.3 Motion Coder

Motion coding is essential for P-VOPs and B-VOPs to reduce temporal redundancy. The motion coder consists of a motion estimator, a motion compensator, a previous/next VOP store, and a motion vector (MV) predictor and coder. Furthermore, to perform motion prediction for VOPs of arbitrary shape, the reference VOP must be padded with a special padding technique before motion estimation.

Padding Process

Fig. 2.8 shows a simplified diagram of the padding process. The padding process defines the values of the luminance and chrominance samples outside the VOP.

A decoded MB d[y][x] is padded by referring to the corresponding decoded shape block s[y][x]. An MB that lies on the VOP boundary (a boundary MB) is padded by replicating the boundary samples of the VOP towards the exterior. This boundary padding is performed in two steps: horizontal repetitive padding followed by vertical repetitive padding. The remaining MBs, which lie completely outside the VOP (exterior MBs), are filled by extended padding.

• Horizontal repetitive padding: Each sample at the boundary of a VOP is replicated horizontally to the left and/or right to fill the transparent region outside the VOP in a boundary block. If a transparent sample lies between two boundary samples, the two sample values are averaged.

• Vertical repetitive padding: The transparent samples left unfilled by the horizontal step are padded by a similar process in the vertical direction.

• Extended padding: Exterior MBs immediately next to boundary MBs are filled by replicating the samples at the border of the boundary MBs. If an exterior MB is next to more than one boundary MB, one of them is chosen according to the priority shown in Fig. 2.9. The remaining exterior MBs (not next to any boundary MB) are filled with the value 128.

Figure 2.8: Simplified padding process (from [5]).

Figure 2.9: Priority of boundary MBs surrounding an exterior MB (from [5]).
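The two repetitive-padding steps can be sketched in Python as follows. This is a minimal illustration, not the reference decoder. The function name `repetitive_pad` is hypothetical, and blocks are plain lists of rows. The rounded average `(a + b + 1) // 2`, used when a transparent sample has boundary samples on both sides, is an assumption about the rounding rule. Extended padding of exterior MBs is not shown.

```python
def repetitive_pad(d, s):
    """Horizontal then vertical repetitive padding of one boundary block.

    d: 2-D list of decoded samples (modified in place and returned).
    s: 2-D list of decoded shape values; nonzero means inside the VOP.
    A block with no opaque sample is returned unchanged, since it is an
    exterior block that extended padding handles instead.
    """
    h, w = len(d), len(d[0])

    def avg(a, b):
        # Rounded average of two boundary samples (rounding rule assumed).
        return (a + b + 1) // 2

    # Horizontal pass: in every row that contains an opaque sample, fill each
    # transparent sample from the nearest opaque sample(s) to its left/right.
    row_filled = [any(s[y][x] for x in range(w)) for y in range(h)]
    for y in range(h):
        if not row_filled[y]:
            continue
        opaque = [x for x in range(w) if s[y][x]]
        for x in range(w):
            if s[y][x]:
                continue
            left = [o for o in opaque if o < x]
            right = [o for o in opaque if o > x]
            if left and right:
                d[y][x] = avg(d[y][left[-1]], d[y][right[0]])
            elif left:
                d[y][x] = d[y][left[-1]]
            else:
                d[y][x] = d[y][right[0]]

    # Vertical pass: rows with no opaque sample are filled column by column
    # from the nearest horizontally padded row(s) above and/or below.
    filled = [y for y in range(h) if row_filled[y]]
    if not filled:
        return d
    for y in range(h):
        if row_filled[y]:
            continue
        above = [f for f in filled if f < y]
        below = [f for f in filled if f > y]
        for x in range(w):
            if above and below:
                d[y][x] = avg(d[above[-1]][x], d[below[0]][x])
            elif above:
                d[y][x] = d[above[-1]][x]
            else:
                d[y][x] = d[below[0]][x]
    return d


# A 4 x 4 block is used for brevity; a luminance MB would be 16 x 16.
d = [[0, 0, 0, 0], [0, 80, 0, 40], [0, 0, 0, 0], [20, 0, 0, 0]]
s = [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 0, 0], [1, 0, 0, 0]]
print(repetitive_pad(d, s))
# [[80, 80, 60, 40], [80, 80, 60, 40], [50, 50, 40, 30], [20, 20, 20, 20]]
```

Row 1 is padded horizontally (60 is the average of 80 and 40). Rows 0 and 2, which contain no opaque sample, are then filled vertically from the padded rows.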

Motion Estimation

Motion estimation (ME) is a method of prediction between adjacent frames (pictures). In general, the ME techniques used in MPEG-4 are block based. They can be seen as an extension of the block-matching techniques of MPEG-1/2 and H.263, with a modified block (polygon) matching to handle arbitrarily shaped VOPs.

For an arbitrarily shaped VOP, the bounding rectangle of the VOP is first extended on the right and bottom sides to a multiple of the MB size. The alpha values of the extended pixels are set to zero. The sum of absolute differences (SAD) is used as the error measure and is computed only over pixels with nonzero alpha values.
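A minimal Python sketch of the two steps in this paragraph follows. It extends the alpha plane with zeros to a multiple of the MB size, and it computes an SAD that skips zero-alpha pixels. The names `extend_alpha` and `masked_sad` are hypothetical, and frames are plain lists of rows.

```python
def extend_alpha(alpha, mb=16):
    """Pad the alpha plane on the right and bottom with zeros so that both
    dimensions become multiples of the MB size."""
    h, w = len(alpha), len(alpha[0])
    new_h = -(-h // mb) * mb  # round up to a multiple of mb
    new_w = -(-w // mb) * mb
    out = [list(row) + [0] * (new_w - w) for row in alpha]
    out += [[0] * new_w for _ in range(new_h - h)]
    return out


def masked_sad(cur, ref, alpha, ry, rx):
    """SAD between block `cur` and the equally sized block of `ref` whose
    top-left corner is (ry, rx), counting only pixels with nonzero alpha.
    `alpha` is the alpha block co-located with `cur`."""
    total = 0
    for y, row in enumerate(cur):
        for x, c in enumerate(row):
            if alpha[y][x]:
                total += abs(c - ref[ry + y][rx + x])
    return total


cur = [[10, 20], [30, 40]]
ref = [[12, 25], [30, 0]]
alpha = [[1, 1], [1, 0]]  # the bottom-right pixel is outside the VOP
print(masked_sad(cur, ref, alpha, 0, 0))  # 7
```

The large difference at the transparent bottom-right pixel (40 versus 0) is ignored, so only the three opaque pixels contribute to the error.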

The basic motion estimation may be performed on 16 × 16 luminance MBs. The motion vector is specified to half-pixel accuracy. Many software encoders first perform a full search for an integer-pixel-accuracy vector. Using it as the initial estimate, they then perform a half-pixel search around it. Because the motion vector may be non-integer, the reference MB must be interpolated. Fig. 2.10 illustrates the bilinear interpolation method.
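The two-stage search described above can be sketched in Python as follows. It is an illustrative sketch, not an encoder implementation, and several simplifications are assumed:

* The names are hypothetical.
* Vectors are returned in half-pel units.
* A plain (unmasked) SAD is used.
* `rounding_type` stands in for the MPEG-4 rounding-control bit (vop_rounding_type) and is assumed to be 0 unless set.

The interpolation follows the bilinear scheme of Fig. 2.10, in which a half-pel sample is the rounded average of its two or four integer-pel neighbors.

```python
def half_pel_sample(ref, y2, x2, rounding_type=0):
    """Bilinearly interpolated sample of `ref` at (y2 / 2, x2 / 2), where
    y2 and x2 are non-negative coordinates in half-pel units."""
    y, x = y2 >> 1, x2 >> 1
    fy, fx = y2 & 1, x2 & 1
    a = ref[y][x]
    if not fy and not fx:  # integer position
        return a
    if not fy:  # horizontal half position
        return (a + ref[y][x + 1] + 1 - rounding_type) >> 1
    if not fx:  # vertical half position
        return (a + ref[y + 1][x] + 1 - rounding_type) >> 1
    # Diagonal half position: average of the four neighbors.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1]
            + 2 - rounding_type) >> 2


def fits(p2, size, limit):
    """True if `size` samples starting at half-pel coordinate p2 only need
    integer samples with indices in [0, limit)."""
    return p2 >= 0 and (p2 + 2 * (size - 1) + 1) // 2 <= limit - 1


def sad_half(cur, ref, y2, x2):
    """SAD between `cur` and the (possibly interpolated) block at (y2, x2)."""
    return sum(abs(cur[i][j] - half_pel_sample(ref, y2 + 2 * i, x2 + 2 * j))
               for i in range(len(cur)) for j in range(len(cur[0])))


def two_stage_search(cur, ref, by, bx, search_range):
    """Full integer search within +/-search_range of block position (by, bx),
    then a half-pel search around the best integer vector.
    Returns (mv_y, mv_x, sad) with the vector in half-pel units."""
    n, m = len(cur), len(cur[0])
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y2, x2 = 2 * (by + dy), 2 * (bx + dx)
            if fits(y2, n, h) and fits(x2, m, w):
                cost = sad_half(cur, ref, y2, x2)
                if best is None or cost < best[2]:
                    best = (2 * dy, 2 * dx, cost)
    int_y, int_x, _ = best
    # Half-pel refinement over the positions surrounding the integer vector.
    for hy in (-1, 0, 1):
        for hx in (-1, 0, 1):
            y2, x2 = 2 * by + int_y + hy, 2 * bx + int_x + hx
            if fits(y2, n, h) and fits(x2, m, w):
                cost = sad_half(cur, ref, y2, x2)
                if cost < best[2]:
                    best = (int_y + hy, int_x + hx, cost)
    return best


ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [row[3:5] for row in ref[2:4]]  # the 2 x 2 block at (2, 3)
print(two_stage_search(cur, ref, 1, 1, 2))  # (2, 4, 0)
```

The integer stage does most of the work. The refinement evaluates only the surrounding half-pel positions, which is why it is cheap compared with a full half-pel search.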

In addition to one motion vector per 16 × 16 MB, the MPEG-4 standard allows a motion vector to be sent for each individual 8 × 8 block, which further reduces the prediction error.

Figure 2.10: Interpolation scheme for half sample search; the legend distinguishes integer pixel positions from half pixel positions (from [5]).

Motion Vector Encoder

Motion vectors must be coded for MBs coded in INTER mode. The horizontal and vertical motion vector components are coded differentially, using a spatial neighborhood of three previously coded motion vectors (see Fig. 2.11). These three motion vectors are the candidate predictors for differential coding. The differential coding of motion vectors is performed with reference to the reconstructed shape. In the special cases at the borders of the current VOP, the following decision rules are applied:

1. If the MB of one and only one candidate predictor is outside the VOP, it is set to zero.

2. If the MBs of two and only two candidate predictors are outside the VOP, they are set to the third candidate predictor.

3. If the MBs of all three candidate predictors are outside the VOP, they are set to zero.
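The three border rules can be expressed as a small Python function. This is a sketch. The function name and the representation (three (x, y) tuples plus flags that mark candidates whose MB lies outside the VOP) are assumptions for illustration.

```python
def apply_border_rules(candidates, outside):
    """Replace candidate predictors whose MBs lie outside the VOP.

    candidates: [MV1, MV2, MV3] as (x, y) tuples.
    outside:    three booleans, True if that candidate's MB is outside the VOP.
    """
    n_out = sum(outside)
    if n_out == 1:
        # Rule 1: the single outside candidate is set to zero.
        return [(0, 0) if out else mv for mv, out in zip(candidates, outside)]
    if n_out == 2:
        # Rule 2: both outside candidates take the value of the third one.
        inside = candidates[outside.index(False)]
        return [inside, inside, inside]
    if n_out == 3:
        # Rule 3: all three candidates are set to zero.
        return [(0, 0), (0, 0), (0, 0)]
    return list(candidates)  # no candidate lies outside the VOP


print(apply_border_rules([(1, 2), (3, 4), (5, 6)], [True, True, False]))
# [(5, 6), (5, 6), (5, 6)]
```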

For the horizontal and vertical components, the median of the three candidates for that component is used as the predictor, denoted $P_x$ and $P_y$, respectively:

$$P_x = \mathrm{Median}(MV1_x, MV2_x, MV3_x),$$

$$P_y = \mathrm{Median}(MV1_y, MV2_y, MV3_y).$$

The vector differences $MVD_x = MV_x - P_x$ and $MVD_y = MV_y - P_y$ are then coded by variable-length coding (VLC).
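The median prediction, and the differences that are passed to the VLC stage, can be sketched in Python as follows. Several assumptions apply:

* The function names are hypothetical.
* Vectors are (x, y) tuples in half-pel units.
* The candidates are assumed to have already been adjusted by the border rules.
* The VLC tables themselves are not modeled.

```python
def median3(a, b, c):
    """Median of three values."""
    return max(min(a, b), min(max(a, b), c))


def mv_difference(mv, mv1, mv2, mv3):
    """Return ((MVDx, MVDy), (Px, Py)) for the current vector `mv` and the
    candidate predictors MV1, MV2, MV3."""
    px = median3(mv1[0], mv2[0], mv3[0])
    py = median3(mv1[1], mv2[1], mv3[1])
    return (mv[0] - px, mv[1] - py), (px, py)


mvd, pred = mv_difference((5, -2), (2, 0), (4, -3), (7, 1))
print(pred, mvd)  # (4, 0) (1, -2)
```

The median is taken separately per component, so the predictor $(P_x, P_y)$ need not equal any single candidate vector.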

Figure 2.11: Motion vector prediction; MV: current motion vector, MV1: previous motion vector, MV2: above motion vector, MV3: above-right motion vector, with the VOP border marked (from [8]).

Motion Compensation

The motion compensator uses the motion vectors to compute the motion-compensated prediction block, pred[i][j], from the reference VOP. In addition to basic motion compensation, three alternatives are supported: unrestricted motion compensation, four-MV motion compensation, and overlapped motion compensation.

For unrestricted motion compensation, motion vectors are allowed to point outside the decoded area of a reference VOP. The reference sample for pred[i][j] is taken at the clamped coordinates

$$x_{ref} = \min(\max(x_{curr} + dx,\ vhmcsr),\ xdim + vhmcsr - 1),$$

$$y_{ref} = \min(\max(y_{curr} + dy,\ vvmcsr),\ ydim + vvmcsr - 1),$$

where vhmcsr = vop_horizontal_mc_spatial_ref, vvmcsr = vop_vertical_mc_spatial_ref, $(y_{curr}, x_{curr})$ is the coordinate of a sample in the current VOP, $(y_{ref}, x_{ref})$ is the coordinate of a sample in the reference VOP, $(dy, dx)$ is the motion vector, and $(ydim, xdim)$ is the dimension of the bounding rectangle of the reference VOP. A vector that points outside the bounding rectangle therefore reads the nearest edge sample.
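The clamping can be sketched in Python for integer motion vectors. The function name is hypothetical, and half-pel interpolation is omitted. The reference VOP is assumed to be stored as a list of rows covering exactly its bounding rectangle, whose top-left corner is at (vvmcsr, vhmcsr) in absolute coordinates.

```python
def unrestricted_mc(ref, vhmcsr, vvmcsr, y0, x0, dy, dx, size):
    """Prediction block for the size x size block whose top-left sample is
    at absolute coordinate (y0, x0), using the integer motion vector (dy, dx).
    Coordinates outside the reference bounding rectangle are clamped to its
    nearest edge sample."""
    ydim, xdim = len(ref), len(ref[0])
    pred = []
    for i in range(size):
        row = []
        for j in range(size):
            yref = min(max(y0 + i + dy, vvmcsr), ydim + vvmcsr - 1)
            xref = min(max(x0 + j + dx, vhmcsr), xdim + vhmcsr - 1)
            # Convert absolute coordinates to indices into `ref`.
            row.append(ref[yref - vvmcsr][xref - vhmcsr])
        pred.append(row)
    return pred


ref = [[1, 2], [3, 4]]  # 2 x 2 reference VOP with its corner at (y, x) = (20, 10)
print(unrestricted_mc(ref, 10, 20, 20, 10, 0, 3, 2))  # [[2, 2], [4, 4]]
```

Here the vector points three samples to the right, entirely past the reference rectangle. Every column is therefore clamped to the rightmost reference column.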

The choice among one, two, or four vectors is indicated by the MCBPC codeword and the field prediction flag of each MB. If one motion vector is transmitted for an MB, it is treated as four vectors with the same value. When two field motion vectors are transmitted, each of the four block motion vectors is set to the average of the two field motion vectors, rounded so that any fractional pixel offset becomes a half-pixel offset. If four vectors are used, each motion vector is used for all pixels in one of the four luminance blocks of the MB.

Overlapped motion compensation is performed when the flag obmc_disable = 0. Each pixel P(i, j) of an 8 × 8 luminance prediction block is a weighted sum of three prediction values, divided by 8 with rounding, as given by the following equation:

$$P(i,j) = \frac{p(i+MV_x^0,\, j+MV_y^0)\,H_0(i,j) + p(i+MV_x^1,\, j+MV_y^1)\,H_1(i,j) + p(i+MV_x^2,\, j+MV_y^2)\,H_2(i,j) + 4}{8},$$

where $(MV_x^0, MV_y^0)$ denotes the motion vector of the current block, $(MV_x^1, MV_y^1)$ the motion vector of the block above or below, $(MV_x^2, MV_y^2)$ the motion vector of the block to the left or right, and $H_0(i,j)$, $H_1(i,j)$, and $H_2(i,j)$ are the per-pixel weighting matrices for the current block and the neighboring blocks, respectively.
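The blending step can be sketched in Python, with several assumptions:

* The weighting matrices below are taken to be those of ITU-T H.263 Annex F. At every position, the three weights sum to 8. They are not given in this text and should be checked against the standard before use.
* The neighbor split is also an assumption, chosen to be consistent with "above or below" and "left or right": rows 0-3 use the block above, rows 4-7 the block below, columns 0-3 the block to the left, and columns 4-7 the block to the right.
* The function name is hypothetical.
* Each argument is an 8 × 8 prediction already fetched with the corresponding motion vector and indexed [row][column].

```python
H0 = [[4, 5, 5, 5, 5, 5, 5, 4],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [4, 5, 5, 5, 5, 5, 5, 4]]
H1 = [[2, 2, 2, 2, 2, 2, 2, 2],
      [1, 1, 2, 2, 2, 2, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 2, 2, 2, 2, 1, 1],
      [2, 2, 2, 2, 2, 2, 2, 2]]
H2 = [[2, 1, 1, 1, 1, 1, 1, 2]] + \
     [[2, 2, 1, 1, 1, 1, 2, 2] for _ in range(6)] + \
     [[2, 1, 1, 1, 1, 1, 1, 2]]


def obmc_block(p_cur, p_above, p_below, p_left, p_right):
    """Blend five 8 x 8 predictions into one OBMC luminance block."""
    out = []
    for y in range(8):
        row = []
        for x in range(8):
            p1 = p_above[y][x] if y < 4 else p_below[y][x]  # vertical neighbor
            p2 = p_left[y][x] if x < 4 else p_right[y][x]   # horizontal neighbor
            row.append((p_cur[y][x] * H0[y][x] + p1 * H1[y][x]
                        + p2 * H2[y][x] + 4) // 8)
        out.append(row)
    return out


flat = [[80] * 8 for _ in range(8)]
zero = [[0] * 8 for _ in range(8)]
out = obmc_block(flat, zero, zero, zero, zero)
print(out[0][0], out[3][3])  # 40 60
```

Because the weights at each position sum to 8, a region where all five predictions agree is reproduced unchanged. The current block's own prediction dominates toward the block center.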

Since a VOP may be coded in P or B mode, there are three types of motion prediction: forward, backward, and bi-directional. Each mode forms the prediction $\bar{P}(i, j)$ differently, as follows.

1. Forward mode: Only the forward vector $(MVF_x, MVF_y)$ is applied. The prediction blocks $\bar{P}_y(i, j)$, $\bar{P}_u(i, j)$, $\bar{P}_v(i, j)$ are generated from the forward reference VOP.

2. Backward mode: Only the backward vector $(MVB_x, MVB_y)$ is applied. The prediction blocks $\bar{P}_y(i, j)$, $\bar{P}_u(i, j)$, $\bar{P}_v(i, j)$ are generated from the backward reference VOP.

3. Bi-directional mode: Both the forward vector $(MVF_x, MVF_y)$ and the backward vector $(MVB_x, MVB_y)$ are applied. The prediction blocks $\bar{P}_y(i, j)$, $\bar{P}_u(i, j)$, $\bar{P}_v(i, j)$ are generated from the forward and backward reference VOPs. The forward and backward predictions are performed first, and the two predictions are then averaged pixel by pixel.
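The three modes can be summarized with a short Python sketch that operates on one component (for example, luminance) at a time. The mode strings and the function name are hypothetical. The forward and backward prediction blocks are assumed to be already motion compensated. The rounded average `(f + b + 1) // 2` used for the bi-directional mode is an assumption about the rounding rule.

```python
def b_vop_prediction(mode, fwd, bwd):
    """Prediction block for one component of a B-VOP MB.

    fwd: block predicted from the forward reference VOP with (MVFx, MVFy).
    bwd: block predicted from the backward reference VOP with (MVBx, MVBy).
    """
    if mode == "forward":
        return [list(row) for row in fwd]
    if mode == "backward":
        return [list(row) for row in bwd]
    if mode == "bidirectional":
        # Average the two predictions pixel by pixel (rounding assumed).
        return [[(f + b + 1) // 2 for f, b in zip(frow, brow)]
                for frow, brow in zip(fwd, bwd)]
    raise ValueError(f"unknown prediction mode: {mode}")


print(b_vop_prediction("bidirectional", [[10, 11]], [[20, 20]]))  # [[15, 16]]
```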

Source document: Software Implementation of an MPEG-4 Object Video Encoder on the PACDSP Platform (pp. 29-35)