Chapter 1 Introduction
1.2 Thesis Organization
The thesis is organized as follows. Chapter 2 introduces and analyzes the algorithms of the H.264/AVC main profile. Chapter 3 first describes the proposed bandwidth-efficient motion compensation architecture for an H.264/AVC video decoder and then illustrates the motion compensation engine that supports the H.264/AVC main profile specification. We also propose a novel data-reuse technique to reduce the required bandwidth, particularly in H.264/AVC fractional motion compensation.
Chapter 4 presents the frame and motion vector memory organization, including the memory access controller for external SDRAM. We apply a memory scheduling technique to reduce the access latency on the external bus and provide a flexible data arrangement method to improve the data hit rate. The chip implementation is given in Chapter 5. Finally, conclusions are drawn in Chapter 6.
Chapter 2
Motion Compensation Algorithm of H.264/AVC’s Main Profile
As in previous video standards, motion compensation is an important part of a video decoder system. Its key feature is that the current picture is predicted from previously decoded pictures without requiring extra bitstream data. Thus, the transmission bandwidth can be reduced efficiently without degrading visual quality. Owing to this better coding efficiency, H.264/AVC is used in a wide range of applications.
In this chapter, we introduce the basic structure and concept of the H.264/AVC coding standard in Section 2.2. In H.264/AVC, the main profile is almost a superset of the baseline profile. Specifically, the additional tools that the main profile provides for motion compensation are bi-directional prediction, direct mode coding, multiple reference frames, and weighted prediction. The detailed algorithms of the features related to motion compensation are described in the following sections. Finally, we list the differences among video coding standards such as MPEG-2 and MPEG-4 in Section 2.6.
2.1 Profiling
Figure 2.1 shows the profiling of the H.264/AVC main profile decoder on an ARM-7 processor. The reference software we adopt is JM 9.2 [3]. Specifically, the inter prediction related modules, which include motion compensation, reconstruction, and reference frame copy, occupy 51% of the entire video decoder profile. Improving this part efficiently therefore increases the total performance of the decoder system as well. The cost of this dominant part can be greatly reduced by parallel processing, a data-reuse scheme, or pipelined processing in an ASIC design.
Figure 2.1 H.264 software (JM 9.2) profiling on ARM 7 processor
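The effect of accelerating the inter prediction modules on the whole decoder can be estimated with Amdahl's law. The sketch below is only an illustration: it assumes that the 51% share reported in the profile is a share of execution time, and the function name `overall_speedup` and the example speedup factors are hypothetical.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the run time
    is accelerated by a factor of `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Share of the inter prediction modules taken from the profile
# (assumed here to be a share of execution time).
INTER_PREDICTION_SHARE = 0.51

if __name__ == "__main__":
    for factor in (2, 4, 8):  # hypothetical acceleration factors
        gain = overall_speedup(INTER_PREDICTION_SHARE, factor)
        print(f"inter prediction x{factor}: decoder x{gain:.2f}")
```

Under this assumption, even an arbitrarily fast inter prediction engine cannot speed up the decoder by more than 1/0.49, roughly a factor of two, unless the remaining modules are accelerated as well.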
2.2 Motion Compensation Process Flow
The scope of the motion compensation process flow is shown in Figure 2.2. Data related to inter prediction are received from the syntax parser and are processed into pixels through several functional units consisting of MV prediction, interpolation, and weighted prediction.
Figure 2.2 The general scope of motion compensation for H.264/AVC’s main profile
Figure 2.3 shows the basic block diagram of the H.264/AVC encoder, and the block diagram of the decoder is shown in Figure 2.4. With the exception of the de-blocking filter, most of the basic functional components (prediction, transform, quantization, entropy coding, etc.) exist in previous standards such as MPEG-1, MPEG-2, MPEG-4, and H.263, but the important changes in H.264 occur in the details of each functional block. Because the decoder is our research topic, we focus on the decoder process flow. The decoder receives a compressed bitstream from the channel receiver side and entropy decodes the data elements to produce a set of quantized coefficients X. These coefficients are scaled and inverse transformed to give D'n. The motion compensation (MC) block reconstructs the prediction PRED from previously decoded data. PRED is added to D'n to produce uF'n prior to the
Figure 2.3 General structure of H.264 encoder
Figure 2.4 General structure of H.264 decoder
2.3 Inter Prediction Algorithm for H.264/AVC’s Main Profile
The inter prediction of the H.264/AVC main profile uses tree-structured hierarchical macroblock partitions and a more flexible block size selection, called variable block size (VBS), than previous standards [1][2][4]. In motion compensated prediction, macroblocks are predicted from the image signal of already transmitted reference images. For this purpose, each macroblock can be divided into partitions of 16x16, 16x8, 8x16, or 8x8 luma samples. Each 8x8 sub-macroblock can be further divided into partitions with block sizes of 8x4, 4x8, or 4x4. For each macroblock partition, a reference picture index, a prediction type (list-0, list-1, bi-pred), and a motion vector may be independently selected and coded. For each sub-macroblock partition, a motion vector may be independently selected and coded, but the reference picture index and prediction type of the sub-macroblock are used for all of its sub-macroblock partitions. Chroma components use the same partitioning as the luma component. The smallest block size is therefore 4x4 for the luma component and 2x2 for the chroma components. Figure 2.5 illustrates all types of partitions.
Figure 2.5 Macroblock partitions and sub-macroblock partitions
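The tree-structured partitioning can be made concrete by enumerating the blocks that each carry their own motion vector. The following is a minimal sketch, assuming 4:2:0 sampling for the chroma sizes; the table and function names (`MB_PARTITIONS`, `SUB_MB_PARTITIONS`, `luma_blocks`, `chroma_blocks`) are hypothetical and are not taken from the reference software.

```python
# (x, y, width, height) of each partition inside a 16x16 macroblock.
MB_PARTITIONS = {
    "16x16": [(0, 0, 16, 16)],
    "16x8": [(0, 0, 16, 8), (0, 8, 16, 8)],
    "8x16": [(0, 0, 8, 16), (8, 0, 8, 16)],
    "8x8": [(0, 0, 8, 8), (8, 0, 8, 8), (0, 8, 8, 8), (8, 8, 8, 8)],
}

# (x, y, width, height) of each partition inside an 8x8 sub-macroblock.
SUB_MB_PARTITIONS = {
    "8x8": [(0, 0, 8, 8)],
    "8x4": [(0, 0, 8, 4), (0, 4, 8, 4)],
    "4x8": [(0, 0, 4, 8), (4, 0, 4, 8)],
    "4x4": [(0, 0, 4, 4), (4, 0, 4, 4), (0, 4, 4, 4), (4, 4, 4, 4)],
}


def luma_blocks(mb_type, sub_types=None):
    """Return the luma blocks of one macroblock, one entry per motion vector."""
    if mb_type != "8x8":
        return list(MB_PARTITIONS[mb_type])
    if sub_types is None or len(sub_types) != 4:
        raise ValueError("an 8x8 macroblock needs four sub-macroblock types")
    blocks = []
    for (ox, oy, _, _), sub in zip(MB_PARTITIONS["8x8"], sub_types):
        for (x, y, w, h) in SUB_MB_PARTITIONS[sub]:
            blocks.append((ox + x, oy + y, w, h))
    return blocks


def chroma_blocks(blocks):
    """Assuming 4:2:0, chroma blocks are half the luma size in each dimension."""
    return [(x // 2, y // 2, w // 2, h // 2) for (x, y, w, h) in blocks]


if __name__ == "__main__":
    finest = luma_blocks("8x8", ["4x4"] * 4)
    print(len(finest), "motion vectors;", "chroma block:", chroma_blocks(finest)[0][2:])
```

In the finest case every 8x8 sub-macroblock is split into 4x4 blocks, so one macroblock carries 16 motion vectors per prediction list and the corresponding chroma blocks shrink to 2x2.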
2.3.1 Bi-directional Prediction
Bi-directional prediction is a main feature provided by the H.264/AVC main profile.
Bi-prediction uses two lists of previously decoded reference pictures, list-0 and list-1. For B slices, the reference pictures may be previous or future decoded pictures. Each macroblock of a B slice may be predicted from a previous reference picture (list-0) and a future reference picture (list-1). In P slices, only single directional prediction is used, and the allowable reference pictures are those in list-0. In B slices, both list-0 and list-1 reference pictures are considered: either single directional prediction using list 0 or list 1, or bi-prediction using both list 0 and list 1, is allowed. Figure 2.6 gives three examples to illustrate bi-prediction: (a) one previous and one future reference (similar to B-picture prediction in previous MPEG video standards), (b) two past references, and (c) two future references.
Figure 2.6 Example using Bi-prediction: (a) one previous and one future picture, (b) two previous pictures, (c) two future pictures
In bi-prediction, a reference block is created from list-0 and list-1 reference pictures. Two motion compensated reference areas are obtained from a list-0 and a list-1 picture respectively, so two separate motion vectors are required. Each sample of the prediction block is calculated as the average of the list-0 and list-1 prediction samples.
Except when weighted prediction is used, the following equation applies:
\[ Pred(i,j) = \big( Pred_0(i,j) + Pred_1(i,j) + 1 \big) \gg 1 \qquad (2.2) \]
where $Pred_0(i,j)$ and $Pred_1(i,j)$ are prediction samples derived from the list-0 and list-1 reference pictures and $Pred(i,j)$ is the bi-predictive sample. After each prediction sample is calculated, the reconstructed samples are obtained by summing the residual data, decoded by entropy decoding, and the predicted data, produced by intra/inter prediction. The list-0 and list-1 motion vectors in bi-predictive macroblocks or blocks are predicted from neighboring motion vectors that have the same temporal direction. For instance, a motion vector for the current macroblock pointing to a previous picture is predicted from other neighboring motion vectors that also point to previous pictures, as illustrated in Figure 2.7. Motion vector prediction is introduced in Section 2.4.
Figure 2.7 The current block is predicted by the MVL0 and MVL1 motion vectors using Bi-prediction
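The default (non-weighted) bi-predictive averaging of this section, that is, the rounded mean of the list-0 and list-1 prediction samples, can be sketched as follows. The function name `bipred_average` and the representation of a block as a list of rows are assumptions made only for illustration.

```python
def bipred_average(pred0, pred1):
    """Rounded average of two equally sized prediction blocks.

    Each block is a list of rows of integer samples; every output sample
    is (p0 + p1 + 1) >> 1, i.e. the mean rounded half up.
    """
    if len(pred0) != len(pred1) or any(
        len(r0) != len(r1) for r0, r1 in zip(pred0, pred1)
    ):
        raise ValueError("prediction blocks must have the same shape")
    return [
        [(p0 + p1 + 1) >> 1 for p0, p1 in zip(r0, r1)]
        for r0, r1 in zip(pred0, pred1)
    ]


if __name__ == "__main__":
    list0_block = [[10, 11], [0, 255]]   # hypothetical list-0 samples
    list1_block = [[20, 12], [1, 255]]   # hypothetical list-1 samples
    print(bipred_average(list0_block, list1_block))
```

Because the inputs are 8-bit samples, the rounded average never leaves the 8-bit range, so no clipping is needed in this default mode.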
2.3.2 Multiple Reference Frames
In H.264/AVC, multiple reference frames may be used for inter prediction [4], with a reference frame index coded to indicate which of the multiple reference frames is used. When bi-prediction is used without weighted prediction, the list 0 and the list 1 predictors are averaged together to form the final predictor. For each sub-macroblock partition, a motion vector may be independently selected and coded, but the reference frame index and prediction type of the sub-macroblock are used for all of its sub-macroblock partitions. Figure 2.8 shows bi-prediction with multiple reference frames. The index is a reference frame parameter, so an additional picture reference parameter has to be transmitted together with the motion vector in the bitstream. H.264 uses the picture order count (POC) to indicate relative distances between coded pictures and reference pictures. POC is used for scaling motion vectors in direct modes and for deriving weighting factors in the implicit WP mode, both of which are introduced in the following sections. Adopting multiple reference frames increases the access frequency according to a linear model: a 25% complexity increase for each added frame. A negligible gain (less than 2%) in bit rate is observed for low and medium bit rates, but more significant savings (up to 14%) can be achieved for high bit rate sequences [4].
Figure 2.8 Bi-prediction with multiple reference pictures
Up to five different reference frames can be used for inter-picture coding, resulting in better subjective video quality and more efficient coding. Providing multiple reference frames can also help make the H.264 bitstream more error resilient. The error resilience tools are supported by the extended profile of H.264/AVC and are not discussed in this thesis. Note that this feature increases the memory requirement of both the encoder and the decoder, since multiple previously decoded and reconstructed reference frames must be maintained in memory. Storing the pixels of several reconstructed reference frames requires a large memory such as SDRAM. Therefore, we propose an efficient memory allocation method and an SDRAM controller architecture so that the retained decoded pictures can be stored efficiently in a single external memory. The related concepts are introduced in Chapter 4.
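To give a feeling for the memory requirement of multiple reference frames, the following sketch computes the raw storage needed for a set of reference frames. It assumes 4:2:0 sampling with 8 bits per sample and ignores any padding or alignment required by the SDRAM organization; the frame sizes in the example and the function names are hypothetical illustrations, not figures from this work.

```python
def frame_bytes_420(width, height, bits_per_sample=8):
    """Raw size of one 4:2:0 frame: one luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample // 8


def reference_buffer_bytes(width, height, num_ref_frames):
    """Raw storage for `num_ref_frames` reconstructed reference frames."""
    return num_ref_frames * frame_bytes_420(width, height)


if __name__ == "__main__":
    # Hypothetical frame sizes; five frames is the limit mentioned in the text.
    for name, (w, h) in {"CIF": (352, 288), "1920x1088": (1920, 1088)}.items():
        total = reference_buffer_bytes(w, h, 5)
        print(f"{name}: {total} bytes for 5 reference frames")
```

Under these assumptions, five CIF frames already occupy 760,320 bytes, and five 1920x1088 frames about 15.7 million bytes, which is why the reference frames are placed in external memory.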
2.4 Motion Vector Prediction
The prediction for the decoded macroblock is determined by the set of motion vectors (MV) that are associated with that macroblock. The motion vectors indicate the position within the set of previously decoded frames from which each block of pixels will be predicted.
A motion vector is generated by motion vector prediction. In the baseline profile, a motion vector is generated only by traditional MV prediction, which includes median and directional prediction.
In addition to traditional MV prediction, motion vector prediction in the H.264/AVC main profile supports a new prediction method in bi-predictive slices: direct mode. We introduce these motion vector generation methods in the following sub-sections.
2.4.1 Traditional MV Prediction
The motion vector is generated from the motion vector difference (MVD) and the motion vector prediction (MVP). The associated equations are expressed by (2.1).
\[ MV_x = MVD_x + MVP_x, \qquad MV_y = MVD_y + MVP_y \qquad (2.1) \]
MVD is decoded by the universal variable length decoder (UVLD), and MVP is predicted from neighboring motion vectors. The MVP algorithm, whose concept is similar to that of MPEG-4, contains directional prediction for the 16x8 and 8x16 block sizes and median prediction for the other block sizes. The details of the MVP decision are shown in Figure 2.9. The equation of median prediction is expressed by (2.2). The locations of the neighboring motion vectors MVA, MVB, MVC, and MVD relative to the current block depend on the block size. For example, MVA belongs to the left neighboring block and MVC to the upper-right neighboring block when the block size is 8x16, as Figure 2.9 (a) shows. The neighboring motion vectors for the different block sizes are defined in Figure 2.9. In addition, some boundary conditions and exceptions have to be handled carefully. For instance, when MVC is not available, its value is replaced by MVD (here the motion vector of neighboring block D, not the motion vector difference). We do not go into the details of those boundary conditions here.
\[ MVP = \mathrm{median}(MVA,\ MVB,\ MVC) \qquad (2.2) \]
Figure 2.9 (a) directional prediction for 8 x 16 block size, (b) directional prediction for 16 x 8 block size, (c) median prediction
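Median prediction and the reconstruction of a motion vector from MVP and MVD can be sketched as follows. This is a simplified illustration: it covers only the median case and the replacement of an unavailable MVC by the motion vector of neighbor D, and it omits directional prediction and the remaining boundary conditions. The function names and the representation of a motion vector as an `(x, y)` tuple are assumptions.

```python
def median3(a, b, c):
    """Median of three integers."""
    return sorted((a, b, c))[1]


def median_mvp(mv_a, mv_b, mv_c, mv_d=None):
    """Component-wise median of the neighboring motion vectors A, B and C.

    If C is unavailable (None), the motion vector of neighbor D is used
    in its place, as described in the text.
    """
    if mv_c is None:
        if mv_d is None:
            raise ValueError("neither neighbor C nor neighbor D is available")
        mv_c = mv_d
    return (
        median3(mv_a[0], mv_b[0], mv_c[0]),
        median3(mv_a[1], mv_b[1], mv_c[1]),
    )


def reconstruct_mv(mvp, mvd):
    """MV = MVP + MVD, applied to the x and y components separately."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


if __name__ == "__main__":
    mvp = median_mvp((4, -2), (1, 6), (3, 0))   # hypothetical neighbors
    print(mvp, reconstruct_mv(mvp, (-1, 2)))    # hypothetical decoded MVD
```

Note that the median is taken per component, so the predicted vector need not equal any single neighboring vector.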
In addition to the motion-compensated block sizes described in Figure 2.5, a P macroblock can also be coded in P_SKIP mode. For this coding mode, neither a residual signal nor motion information is transmitted; that is, the motion vector is decided only by MVP. The reconstructed data are obtained in a way similar to that of macroblock type P_16x16.
Macroblocks coded in P_SKIP mode are often located in large areas with no scene change or slow motion. Besides the above techniques, H.264/AVC also supports multiple reference frames, weighted prediction, and direct mode for B slices. These tools greatly improve coding efficiency. Applying a de-blocking filter is a well-known method to improve image quality by alleviating blocking artifacts. The de-blocking filter in H.264/AVC is placed within the motion-compensated prediction loop, which makes the improvement in quality more conspicuous.
2.4.2 Direct Mode Coding
Figure 2.10 Direct mode prediction for B slices
Direct mode is another method for motion vector prediction. A direct-mode macroblock does not require motion side information; instead, it derives the reference frame, block size, and motion vector data from the subsequent inter picture. Figure 2.10 illustrates the process of direct mode coding. This mode superimposes two prediction signals: one is derived from the future inter picture and the other comes from a previous picture. The direct mode uses bidirectional prediction and allows residual coding of the prediction error. The forward and backward motion vectors of this mode, MVL0 and MVL1, are derived from the motion vector MVC used in the co-located macroblock of the future picture Ref. list-1. Note that the direct-mode macroblock uses the same partition as the co-located macroblock. The prediction signal is calculated as a linear combination of the two blocks that are determined by the forward and backward motion vectors pointing to the two reference pictures Ref. list-0 and Ref. list-1. When multiple reference picture prediction is used, the reference picture Ref. list-1 for the direct mode is chosen to be the future inter picture that contains the co-located macroblock. The forward and backward motion vectors for direct-mode blocks are calculated by the following equation:
\[ \vec{MV}_{L0} = \frac{TD_B}{TD_D}\,\vec{MV}_C, \qquad \vec{MV}_{L1} = \frac{TD_B - TD_D}{TD_D}\,\vec{MV}_C \]
where MVL0 is the forward motion vector, MVL1 is the backward motion vector, and MVC represents the motion vector of the co-located block in the future inter picture. For B pictures, TDD is the temporal distance between the previous and the future inter picture, and TDB is the distance between the current B picture and the previous inter picture. In that case, the actual reference picture Ref. list-0 (which is also a reference picture for the co-located macroblock of the following picture) is used for the calculation of the temporal distances TDD and TDB. When both the current macroblock and its co-located macroblock are in frame mode, TDB is the temporal distance between the current B frame and the reference frame Ref. list-0, and TDD is the temporal distance between the reference frame Ref. list-0 and the future reference frame Ref. list-1.
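The temporal scaling of the co-located motion vector can be illustrated with a short sketch. It assumes the commonly cited idealized form of the temporal direct derivation, MVL0 = (TDB / TDD) · MVC and MVL1 = ((TDB − TDD) / TDD) · MVC, evaluated with exact fractions and rounded to integers; the fixed-point arithmetic that a real decoder uses is not reproduced here, and the function name and example values are hypothetical.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_c, td_b, td_d):
    """Derive (MVL0, MVL1) from the co-located motion vector MVC.

    mv_c : (x, y) motion vector of the co-located block
    td_b : temporal distance from the current B picture to Ref. list-0
    td_d : temporal distance from Ref. list-0 to Ref. list-1
    Idealized ratio form; an actual decoder uses fixed-point scaling.
    """
    if td_d == 0:
        raise ValueError("td_d must be non-zero")
    scale = Fraction(td_b, td_d)
    mv_l0 = tuple(round(scale * c) for c in mv_c)        # forward vector
    mv_l1 = tuple(round((scale - 1) * c) for c in mv_c)  # backward vector
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Hypothetical case: the B picture lies one third of the way
    # between its list-0 and list-1 reference pictures.
    print(temporal_direct_mvs((12, -6), td_b=1, td_d=3))
```

In this idealized form the two vectors always differ by exactly MVC, which reflects that both predictions follow the same motion trajectory through the current block.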
2.5 Fractional Interpolation
The H.264/AVC main profile also supports high motion vector resolution, reaching quarter-sample accuracy for the luma component and eighth-sample accuracy for the chroma components. This feature first appeared in an advanced profile of the MPEG-4 Visual standard; however, H.264/AVC reduces the complexity of the interpolation processing compared with MPEG-4. Luma half-sample positions are interpolated from integer-position samples using a 6-tap symmetric finite impulse response (FIR) filter with weights (1, -5, 20, 20, -5, 1). Once all the half-sample values are available, the quarter-sample values are produced by linear interpolation with a bilinear filter. Luma sample interpolation is shown in Figure 2.11 (a)-(c). Assuming the 4:2:0 chrominance format, quarter-sample resolution motion vectors in the luma component require eighth-sample resolution vectors in the chroma components. Interpolated samples at eighth-sample intervals in the chroma components are generated using the bilinear interpolator illustrated in Figure 2.11 (d), so the displacement can reach one-eighth accuracy. Each sub-sample position is a linear combination of the neighboring integer sample positions A, B, C, and D.
Figure 2.11 (a) luma half sample with 6-tap FIR, (b) luma horizontal/vertical quarter sample with bilinear filter, (c) luma diagonal quarter sample with bilinear filter, (d) chroma sample with bilinear filter. Upper-case letters indicate the full samples and lower-case letters indicate the interpolated fractional samples
Mathematically, both luma and chroma interpolation are 2-D operations. In a hardware implementation, however, these equations can be separated into two 1-D passes to reduce hardware cost, namely a horizontal filter first and then a vertical one, or vice versa. In Chapter 3, we propose a novel interpolation architecture that combines luma and chroma interpolation so that cost and complexity can be reduced in the ASIC design.
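The 1-D building blocks of the interpolation described in this section can be sketched as follows. The rounding offsets, shifts, and 8-bit clipping follow the form commonly given for H.264/AVC and are stated here as assumptions; the centre half-sample position, which is computed from unrounded intermediate values, is omitted. All function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap luma filter weights (sum = 32)


def clip255(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def luma_half_sample(samples):
    """Half sample between samples[2] and samples[3] of six consecutive
    integer-position samples taken along one row or one column."""
    if len(samples) != 6:
        raise ValueError("the 6-tap filter needs exactly six samples")
    acc = sum(t * s for t, s in zip(TAPS, samples))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def luma_quarter_sample(a, b):
    """Quarter sample: rounded average of its two nearest integer or half samples."""
    return (a + b + 1) >> 1


def chroma_sample(A, B, C, D, dx, dy):
    """Bilinear chroma sample at eighth-sample offset (dx, dy), 0 <= dx, dy <= 7.
    A, B, C, D are the top-left, top-right, bottom-left and bottom-right
    integer samples."""
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
            + (8 - dx) * dy * C + dx * dy * D + 32) >> 6


if __name__ == "__main__":
    row = [100, 102, 110, 120, 122, 121]  # hypothetical luma samples
    half = luma_half_sample(row)
    print(half, luma_quarter_sample(row[2], half), chroma_sample(10, 20, 30, 40, 4, 4))
```

Because each function consumes samples along a single direction, the same half-sample filter can serve both the horizontal and the vertical pass, which is the property exploited by the separable hardware implementation.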
2.6 Weighted Prediction
The weighted prediction (WP) tool has been adopted in the H.264/AVC Main and Extended profiles to improve coding efficiency by applying a multiplicative weighting factor and an additive offset to the motion compensated prediction [5] [6]. While the concept of applying a weighting factor to a reference picture prediction is not new, the inclusion of the WP tool in the H.264 standard marks the first time such a feature has been incorporated into an international video compression standard. Weighted prediction also compensates for brightness differences so that the reference frame is more strongly correlated with the current frame. WP allows arbitrary multiplicative weighting factors and additive offsets to be applied to reference picture predictions in both P and B pictures. The WP tool is particularly effective for coding fading sequences. When applied to a single prediction, as in P pictures, WP is similar to leaky prediction, which has been previously proposed for error resiliency.
Leaky prediction becomes a special case of WP, with the scaling factor a limited to the range 0 ≤ a ≤ 1. WP also allows negative scaling factors and scaling factors greater than one. A key difference of H.264’s WP tool from previous proposals involving weighted prediction for compression efficiency is the association of the reference picture index with the weighting factor parameters, which allows for efficient signaling of these parameters.
Use of weighted prediction is indicated in the picture parameter set, for P slices using the weighted_pred_flag field and for B slices using the weighted_bipred_idc field. There are two WP modes: explicit mode, which is supported in P and B slices, and implicit mode, which is supported in B slices only. A single weighting factor and offset are associated with each reference picture index for each color component in each slice. In explicit mode, these WP parameters may be coded in the slice header. In implicit mode, these parameters are derived from the relative distances between the current picture and its reference pictures. For each macroblock or macroblock partition, the weighting parameters are based on the reference picture index (or indices in the case of bi-prediction) of the current macroblock or macroblock partition. The reference picture indices are either coded in the bitstream or may be derived, e.g., for skipped or direct mode macroblocks. Using the reference picture index to signal which weighting parameters to apply is bit-rate efficient compared with requiring a weighting parameter index in the bitstream, because the reference index is already available from other required bitstream fields.
2.6.1 Explicit Mode
Use of explicit mode WP is indicated in the picture parameter set by weighted_pred_flag equal to 1 for P slices, or by weighted_bipred_idc equal to 1 for B slices. In explicit mode, the WP parameters are coded in the slice header for each coded slice. A multiplicative weighting factor and an additive offset for each color component may be coded for each of the allowable reference pictures in list 0 for P slices and B slices. The number of allowable reference pictures in list 0 is indicated in the picture parameter set by num_ref_idx_l0_active_minus1, and for list 1 for B slices it is indicated by num_ref_idx_l1_active_minus1. The weighting factors and offsets used in a particular slice are included in the slice header when explicit mode WP is used. The allowable range of parameter values is constrained to 16-bit arithmetic operations in the inter prediction process.
The dynamic range and precision of the weighting factors can be adjusted using the luma_log_weight_denom and chroma_log_weight_denom fields, which are the binary logarithm of the denominator of the luma and chroma weighting factors, respectively. Higher values of the log weight denominator allow more fine-grained weighting factors but require additional bits for coding the weighting factors and limit the range of the effective scaling.
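The role of the weighting factor, the offset, and the log weight denominator can be illustrated with the following sketch of explicit weighted prediction for 8-bit samples. The rounding and clipping follow the form commonly given for H.264/AVC explicit mode and are stated here as assumptions; `log_wd` stands for the value of luma_log_weight_denom or chroma_log_weight_denom, and the function names are hypothetical.

```python
def clip255(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def weighted_uni(sample, weight, offset, log_wd):
    """Single-list weighted prediction of one sample.

    The weight is expressed relative to the denominator 2**log_wd,
    so weight == 2**log_wd corresponds to a scaling factor of 1.
    """
    if log_wd >= 1:
        value = ((sample * weight + (1 << (log_wd - 1))) >> log_wd) + offset
    else:
        value = sample * weight + offset
    return clip255(value)


def weighted_bi(s0, s1, w0, w1, o0, o1, log_wd):
    """Bi-predictive weighted prediction of one sample from list 0 and list 1."""
    value = ((s0 * w0 + s1 * w1 + (1 << log_wd)) >> (log_wd + 1)) \
        + ((o0 + o1 + 1) >> 1)
    return clip255(value)


if __name__ == "__main__":
    log_wd = 5  # hypothetical denominator 2**5 = 32
    print(weighted_uni(100, 32, 0, log_wd))    # unit weight keeps the sample
    print(weighted_uni(100, 16, 10, log_wd))   # half weight plus offset (fade)
    print(weighted_bi(100, 200, 32, 32, 0, 0, log_wd))
```

With unit weights and zero offsets, the bi-predictive form reduces to the default rounded average of the two predictions, which shows that the default mode is a special case of weighted prediction.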
For each allowable reference picture index in list 0, and for B slices also in list 1, flags are coded which indicate whether or not weighting parameters are present in the slice header for