2.3 MPEG-4 Video Texture Coding (from [5], [6] and [7])
2.3.3 Motion Coder
There are four types of VOPs (see Figure 2.4 and associated discussion) that use different coding methods. Motion coding is needed only for P-VOPs and B-VOPs, where it reduces temporal redundancy. The motion coder consists of a motion estimator, a motion compensator, previous/next VOP stores, and a motion vector (MV) predictor and coder. To perform motion prediction on a per-VOP basis, motion estimation for the blocks on the VOP borders has to be modified from block matching to polygon matching. Furthermore, a special padding technique is required for the reference VOP.
Padding Process
The padding process defines the values of luminance and chrominance samples outside the VOP for the prediction of arbitrarily shaped objects. Figure 2.11 shows a simplified diagram of this process.
A decoded MB d[y][x] is padded by referring to the corresponding decoded shape block s[y][x]. An MB that lies on the VOP boundary is padded by replicating the boundary samples of the VOP towards the exterior. This is done in two passes: horizontal repetitive padding followed by vertical repetitive padding. The remaining MBs, which lie completely outside the VOP, are filled by extended padding.
• Horizontal repetitive padding: Each sample at the boundary of a VOP is replicated horizontally to the left and/or right in order to fill the transparent region outside the VOP of a boundary macroblock. If two boundary samples are available for filling a sample outside the VOP, their values are averaged.
• Vertical repetitive padding: The transparent samples left unfilled by the above procedure are padded by a similar process in the vertical direction. Samples already filled by horizontal repetitive padding are treated as if they were inside the VOP for the purpose of this vertical pass.
• Extended padding: Exterior MBs immediately next to boundary macroblocks are filled by replicating the samples at the border of the boundary macroblocks. Note that the boundary macroblocks have already been completely padded by horizontal and vertical repetitive padding. If an exterior macroblock is next to more than one boundary macroblock, one of them is chosen according to the priority shown in Figure 2.12. The exterior macroblock is then padded by replicating upwards, downwards, leftwards, or rightwards the row of samples from the horizontal or vertical border of the boundary macroblock having the largest priority number. The remaining exterior macroblocks (not located next to any boundary macroblock) are filled with 128.
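The two repetitive passes can be sketched for a single boundary block as follows. This is a minimal illustration on plain Python lists, not the normative procedure: it assumes s[y][x] is a binary mask (nonzero inside the VOP), and it assumes that averaging two boundary samples rounds upward via (a + b + 1) // 2. The helper names pad_line and repetitive_pad are hypothetical. Extended padding and the fill value 128 are not shown.

```python
from typing import List, Tuple

Block = List[List[int]]


def pad_line(values: List[int], known: List[bool]) -> Tuple[List[int], List[bool]]:
    """Fill the unknown entries of one row or column from the nearest known
    samples; where known samples lie on both sides, average the two.

    Returns the padded line and its new 'known' flags. A line with no known
    samples is returned unchanged (it is left for the other pass).
    """
    idx = [i for i, k in enumerate(known) if k]
    if not idx:
        return list(values), list(known)
    out = list(values)
    for i, k in enumerate(known):
        if k:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is not None and right is not None:
            # Two boundary samples available: average them (rounding assumed).
            out[i] = (values[left] + values[right] + 1) // 2
        elif left is not None:
            out[i] = values[left]   # replicate the boundary sample rightwards
        else:
            out[i] = values[right]  # replicate the boundary sample leftwards
    return out, [True] * len(values)


def repetitive_pad(d: Block, s: Block) -> Block:
    """Horizontal then vertical repetitive padding of one boundary block.

    d: decoded samples d[y][x]; s: shape mask s[y][x] (nonzero = inside VOP).
    """
    h, w = len(d), len(d[0])
    # Horizontal pass: produces hor_pad and the updated mask s' (filled = inside).
    hor_pad, s_prime = [], []
    for y in range(h):
        row, flags = pad_line(d[y], [bool(v) for v in s[y]])
        hor_pad.append(row)
        s_prime.append(flags)
    # Vertical pass over columns; horizontally filled samples count as inside.
    hv_pad = [row[:] for row in hor_pad]
    for x in range(w):
        col, _ = pad_line([hor_pad[y][x] for y in range(h)],
                          [s_prime[y][x] for y in range(h)])
        for y in range(h):
            hv_pad[y][x] = col[y]
    return hv_pad
```

Rows that contain no VOP samples are skipped by the horizontal pass and filled entirely by the vertical pass, which is why s' is carried from one pass to the next.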
Motion Estimation
Motion estimation (ME) is a method of prediction between adjacent frames/pictures. ME techniques fall into two categories: pixel-based algorithms and block-matching algorithms (BMA). The motion estimation method used in the MPEG-4 encoder is block-based.
In general, the ME techniques used in MPEG-4 can be seen as an extension of the standard MPEG-1/2 and H.263 block-matching techniques, with block matching modified to polygon matching for blocks on the VOP boundary.
Figure 2.13 illustrates an example of polygon matching. The bounding rectangle of the VOP is first extended on the right and bottom sides to a multiple of the macroblock size. The extended pixels are zero-stuffed, and their alpha values are set to zero. The MBs are then formed by dividing the extended bounding rectangle into 16 × 16 blocks. The sum of absolute differences (SAD) is used as the error measure. The original alpha plane of the VOP is used to exclude the pixels of the MB that are outside the VOP: SAD is computed only for the pixels with a nonzero alpha value. This forms a polygon for each MB that includes part of the VOP boundary.
Figure 2.11: Padding process (from [6]). [Block diagram with stages Σ, Saturation, Horizontal Repetitive Padding, Vertical Repetitive Padding, Extended Padding, Framestores and Predictions; signals f[y][x], d[y][x], s[y][x], s'[y][x], hor_pad[y][x], hv_pad[y][x] and d'[y][x].]
Figure 2.12: Priority of boundary MBs surrounding an exterior MB (from [6]). [Diagram: an exterior macroblock with neighbouring boundary macroblocks numbered 0 to 3.]
Figure 2.13: Polygon matching for an arbitrary shape VOP (from [6]). [Diagram labels: VOP, macroblock, transparent pixels, pixels for polygon matching.]
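The bounding-rectangle extension and the alpha-masked SAD used in polygon matching can be sketched as below. This is a minimal illustration, assuming the texture and alpha planes are plain Python lists of rows and that the reference VOP has already been padded so every displaced block stays in bounds. The function names extend_to_mb_multiple and polygon_sad are hypothetical.

```python
from typing import List

Plane = List[List[int]]


def extend_to_mb_multiple(plane: Plane, mb: int = 16) -> Plane:
    """Extend a bounding-rectangle plane on the right and bottom to a
    multiple of the MB size, stuffing the new samples with zero.
    Applied to the alpha plane, this also sets the new alpha values to 0."""
    h, w = len(plane), len(plane[0])
    new_h = -(-h // mb) * mb  # round up to a multiple of mb
    new_w = -(-w // mb) * mb
    out = [list(row) + [0] * (new_w - w) for row in plane]
    out += [[0] * new_w for _ in range(new_h - h)]
    return out


def polygon_sad(cur: Plane, ref: Plane, alpha: Plane,
                bx: int, by: int, dx: int, dy: int, n: int = 16) -> int:
    """SAD between the n x n MB at (bx, by) in cur and the block displaced
    by (dx, dy) in ref, counting only pixels whose alpha value is nonzero."""
    total = 0
    for i in range(n):
        for j in range(n):
            if alpha[by + i][bx + j] == 0:
                continue  # transparent pixel: outside the polygon
            total += abs(cur[by + i][bx + j] - ref[by + i + dy][bx + j + dx])
    return total
```

For an MB lying entirely inside the VOP, polygon_sad reduces to the ordinary block SAD; only boundary MBs see a reduced (polygonal) support.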
The reference VOP is padded based on its own shape information. For example, when the reference VOP is smaller than the current VOP, the reference is not padded up to the size of the current VOP.
The basic motion estimation is performed on 16 × 16 luminance MBs, and the motion vector is specified to half-pixel accuracy. Many software implementations first find an integer-pixel motion vector by full search and then, using it as the initial estimate, perform a half-pixel search around it.
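The two-stage search described above can be sketched as follows, using plain Python lists. The function names, the default search range rng = 7, and rounding half-pel averages upward (which we assume corresponds to the standard's rounding_control = 0 case) are illustrative assumptions. The sketch skips candidates that would read outside the reference plane instead of relying on padding, and it returns vectors in half-pel units.

```python
from typing import List, Optional, Tuple

Plane = List[List[int]]


def half_pel_sample(ref: Plane, x2: int, y2: int) -> int:
    """Sample ref at (x2/2, y2/2) by bilinear averaging (rounding up)."""
    x0, fx = divmod(x2, 2)
    y0, fy = divmod(y2, 2)
    a = ref[y0][x0]
    if fx and fy:  # centre position: average of four samples
        return (a + ref[y0][x0 + 1] + ref[y0 + 1][x0] + ref[y0 + 1][x0 + 1] + 2) // 4
    if fx:         # horizontal half position
        return (a + ref[y0][x0 + 1] + 1) // 2
    if fy:         # vertical half position
        return (a + ref[y0 + 1][x0] + 1) // 2
    return a       # integer position


def sad_at(cur: Plane, ref: Plane, bx: int, by: int,
           mvx2: int, mvy2: int, n: int) -> Optional[int]:
    """SAD of the n x n block at (bx, by) for the vector (mvx2/2, mvy2/2).
    Returns None if any interpolated sample would fall outside ref."""
    h, w = len(ref), len(ref[0])
    x_lo, y_lo = bx + mvx2 // 2, by + mvy2 // 2      # floor(mv / 2)
    x_hi = bx + n - 1 - ((-mvx2) // 2)               # + ceil(mv / 2)
    y_hi = by + n - 1 - ((-mvy2) // 2)
    if x_lo < 0 or y_lo < 0 or x_hi >= w or y_hi >= h:
        return None
    return sum(
        abs(cur[by + i][bx + j]
            - half_pel_sample(ref, 2 * (bx + j) + mvx2, 2 * (by + i) + mvy2))
        for i in range(n) for j in range(n))


def motion_search(cur: Plane, ref: Plane, bx: int, by: int,
                  n: int = 16, rng: int = 7) -> Tuple[int, int, int]:
    """Integer full search in [-rng, rng], then a half-pel search over the
    eight half-pel neighbours of the best integer vector.
    Returns (mvx2, mvy2, sad) with the vector in half-pel units."""
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            s = sad_at(cur, ref, bx, by, 2 * dx, 2 * dy, n)
            if s is not None and (best is None or s < best[2]):
                best = (2 * dx, 2 * dy, s)
    if best is None:
        raise ValueError("no integer candidate lies inside the reference")
    cx, cy = best[0], best[1]
    for hy in (-1, 0, 1):
        for hx in (-1, 0, 1):
            s = sad_at(cur, ref, bx, by, cx + hx, cy + hy, n)
            if s is not None and s < best[2]:
                best = (cx + hx, cy + hy, s)
    return best
```

Because the half-pel stage only examines the neighbourhood of the integer winner, it costs at most eight extra SAD evaluations rather than a full search on a doubled grid.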
In the MPEG-4 standard, besides one motion vector per 16 × 16 MB, a motion vector can be sent for each individual 8 × 8 block to further reduce the prediction error. Both 8 × 8 block motion compensation and overlapped motion-compensated prediction are referred to as advanced prediction in H.263, and both are adapted in MPEG-4 to work with arbitrarily shaped VOPs.
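An encoder has to decide when four 8 × 8 vectors are worth their extra bits. The sketch below compares the best 16 × 16 SAD with the sum of the four best 8 × 8 SADs, using integer full search only. The decision rule, the bias value that favours a single vector, and the names full_search and choose_mv_mode are hypothetical: the standard defines how the vectors are coded, not how an encoder chooses between them.

```python
from typing import List, Optional, Tuple

Plane = List[List[int]]
MV = Tuple[int, int, int]  # (dx, dy, sad)


def full_search(cur: Plane, ref: Plane, bx: int, by: int,
                n: int, rng: int) -> Optional[MV]:
    """Integer full search for the n x n block at (bx, by) in cur."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue  # displaced block would leave the reference
            s = sum(abs(cur[by + i][bx + j] - ref[by + i + dy][bx + j + dx])
                    for i in range(n) for j in range(n))
            if best is None or s < best[2]:
                best = (dx, dy, s)
    return best


def choose_mv_mode(cur: Plane, ref: Plane, bx: int, by: int,
                   rng: int = 4, bias: int = 100):
    """Return ("1MV", [mv]) or ("4MV", [mv0, mv1, mv2, mv3]) for the MB at
    (bx, by). 'bias' (hypothetical value) favours the cheaper 1MV mode."""
    mv16 = full_search(cur, ref, bx, by, 16, rng)
    # Four 8x8 blocks in raster order: top-left, top-right, bottom-left, bottom-right.
    mv8 = [full_search(cur, ref, bx + ox, by + oy, 8, rng)
           for oy in (0, 8) for ox in (0, 8)]
    sad8 = sum(m[2] for m in mv8)
    if sad8 + bias < mv16[2]:
        return "4MV", [m[:2] for m in mv8]
    return "1MV", [mv16[:2]]
```

When the whole MB moves uniformly the four 8 × 8 SADs cannot beat the 16 × 16 SAD by more than the bias, so the single vector is kept; when the quadrants move differently, the four-vector mode wins.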
Because a motion vector may have non-integer (half-pixel) components, sample interpolation is necessary. Interpolation of half-sample values is carried out only in half-sample mode, where the half-sample values are calculated by bilinear interpolation as depicted in Figure 2.14. These interpolated samples form the prediction for a half-pixel motion vector.
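The bilinear half-sample interpolation can be written out explicitly. In the formulas below, A is an integer-position sample, B its right neighbour, C the sample below A and D the diagonal neighbour; a, b, c and d are the values at the integer, horizontal half, vertical half and centre positions. This labelling, the rounding_control term (0 or 1) and the truncating integer division "/" follow our reading of the MPEG-4 Visual specification and should be checked against Figure 2.14 and [6].

```latex
\begin{aligned}
a &= A \\
b &= (A + B + 1 - \mathrm{rounding\_control}) / 2 \\
c &= (A + C + 1 - \mathrm{rounding\_control}) / 2 \\
d &= (A + B + C + D + 2 - \mathrm{rounding\_control}) / 4
\end{aligned}
```

For example, with rounding_control = 0, A = 10 and B = 13 give b = (10 + 13 + 1) / 2 = 12; with rounding_control = 1 the same samples give b = 23 / 2 = 11.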
Motion Vector Encoder
When INTER mode coding is used, the motion vector must be coded. The horizontal and vertical motion vector components are coded differentially by using a spatial neighborhood of three motion vectors that have already been coded (see Figure 2.15). These three motion vectors are candidate