Chapter 3 Modified Rate-Distortion Model for H.264/AVC
3.4 Frame Level Header Bits Prediction
Header bits prediction is needed in the R-D model: the quadratic R-D model in Equation ( 2-3 ) requires the predicted number of header bits. JM estimates the number of header bits in a simple way. It takes the number of header bits of the previous macroblock as the predicted number of header bits $m_h(i, j)$. Hence, in JM, we have
In this section, we change the predicted number of header bits based on the observation discussed in Section 3.3. The header bits of a frame are actually a combination of the header bits from several macroblocks that can have different modes. Hence, we can divide the header bits in a frame into several parts, as shown in Figure 3-25. After dividing, we predict the number of macroblocks of each mode in the current frame and compute the average header bits per macroblock of each mode. Equation ( 3-12 ) formulates our header bits prediction.
Figure 3-25 Proposed frame-level header bits prediction
\[
\hat{h}_t(j) = \hat{N}_t(j) \times \frac{h_t(j-1)}{N_t(j-1)}
\]
\[
\hat{h}_{frame}(j) = \sum_{t} \hat{h}_t(j), \qquad t = 16\times16,\ 16\times8,\ 8\times16,\ 8\times8,\ \text{intra},\ \text{chroma} \tag{3-12}
\]
where $\hat{h}$ and $h$ are the predicted and actual numbers of coded header bits, and $j$ denotes the j-th picture. The symbol $t$ denotes the type of macroblock mode. $N$ and $\hat{N}$ are the actual and predicted numbers of macroblocks. Hence, we use a linear model to predict the number of macroblocks of each mode and then compute the total predicted number of frame-level header bits. In the following experiments, we compare the precision of the header bits prediction. The error is the mean absolute difference (MAD) between the predicted number of header bits and the actual number of coded header bits, with the GOP format being IPPPP….
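Table 3-9 Experiments of the header bits prediction
| Sequence | Bit Rate | Error (MAD), JM | Error (MAD), Our | Improvement (JM-Our)/JM | PSNR(Y), JM | PSNR(Y), Our |
|---|---|---|---|---|---|---|
| Bus | 32K | 102.28 | 118.88 | -0.16 | 22.81 | 22.80 |
| Bus | 64K | 230.50 | 195.20 | 0.15 | 25.85 | 25.84 |
| Bus | 128K | 448.91 | 307.23 | 0.32 | 28.9 | 28.89 |
| Bus | 256K | 455.74 | 341.97 | 0.25 | 32.35 | 32.35 |
| Bus | 512K | 618.10 | 435.25 | 0.30 | 36.50 | 36.50 |
| Flower | 32K | 149.36 | 117.82 | 0.21 | 23.03 | 23.00 |
| Flower | 64K | 346.00 | 220.79 | 0.36 | 26.25 | 26.23 |
| Flower | 128K | 466.33 | 303.87 | 0.34 | 29.65 | 29.63 |
| Flower | 256K | 487.02 | 370.80 | 0.24 | 33.36 | 33.36 |
| Flower | 512K | 437.03 | 327.30 | 0.25 | 37.54 | 37.54 |
| Highway | 32K | 246.28 | 217.30 | 0.12 | 34.98 | 35.04 |
| Highway | 64K | 406.40 | 308.47 | 0.24 | 37.14 | 37.15 |
| Highway | 128K | 521.44 | 354.06 | 0.32 | 39.22 | 39.22 |
| Highway | 256K | 661.20 | 423.24 | 0.36 | 40.86 | 40.86 |
| Highway | 512K | 332.83 | 324.61 | 0.02 | 42.13 | 42.12 |
| Stefan | 32K | 213.48 | 263.36 | -0.23 | 21.55 | 21.54 |
| Stefan | 64K | 370.37 | 279.81 | 0.24 | 24.62 | 24.63 |
| Stefan | 128K | 520.11 | 390.53 | 0.25 | 27.95 | 27.95 |
| Stefan | 256K | 498.20 | 456.32 | 0.08 | 31.58 | 31.56 |
| Stefan | 512K | 609.49 | 562.51 | 0.08 | 35.73 | 35.74 |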
In Table 3-9, the proposed prediction reduces the prediction error in 18 of the 20 test cases; the exceptions are Bus and Stefan at 32K. However, the PSNR is almost unchanged, and in some cases it is even slightly lower. This experiment shows that a more complicated prediction does not necessarily provide a better result for rate control. Hence, we maintain the original header bits prediction in our rate control scheme and provide the proposed prediction as an option if an accurate prediction is desired.
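As an illustration of the optional per-mode prediction in Equation ( 3-12 ), the following minimal Python sketch multiplies the predicted macroblock count of each mode by the average header bits per macroblock of that mode in the previous frame, and sums the results. The function name, the dictionary layout, the sample numbers, and the handling of modes that did not occur in the previous frame are assumptions made for this example; they are not taken from JM.

```python
from typing import Dict


def predict_frame_header_bits(
    pred_counts: Dict[str, float],
    prev_counts: Dict[str, int],
    prev_bits: Dict[str, int],
) -> float:
    """Sum over modes: predicted macroblock count x previous average header bits."""
    total = 0.0
    for mode, n_pred in pred_counts.items():
        n_prev = prev_counts.get(mode, 0)
        if n_prev == 0:
            # Assumption: a mode absent from the previous frame has no
            # bits-per-macroblock estimate, so it contributes nothing.
            continue
        avg_bits = prev_bits.get(mode, 0) / n_prev
        total += n_pred * avg_bits
    return total


# Hypothetical statistics of the previous frame and predicted counts.
prev_counts = {"16x16": 60, "16x8": 10, "8x16": 10, "8x8": 15, "intra": 4}
prev_bits = {"16x16": 600, "16x8": 200, "8x16": 220, "8x8": 750, "intra": 120}
pred_counts = {"16x16": 55, "16x8": 12, "8x16": 10, "8x8": 18, "intra": 4}

predicted = predict_frame_header_bits(pred_counts, prev_counts, prev_bits)
print(predicted)
```

How the predicted counts themselves are obtained (the linear model mentioned above) is left outside this sketch.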
Chapter 4 Bit Allocation of H.264/AVC
In Chapter 3, we discussed the R-D model for the rate control of H.264/AVC. Before applying the R-D model to estimate the QP value, we need to determine the target bits. The target bits are the number of bits we expect to spend when coding a frame or a macroblock. The allocation of target bits affects both the buffer fullness and the video quality. In this chapter, we focus on the allocation of target bits at the frame level and at the macroblock level. At the frame level, we determine the target bits for each frame (picture). In the following, we first discuss the target bits at the frame level.
Figure 4-1 The diagram of the bit allocation for picture level and basic-unit level
4.1 Frame Level
In the JM model of H.264/AVC, the target bits are determined by two parts, as expressed in Equations ( 2-13 ) and ( 2-14 ). Equation ( 2-13 ) uses the buffer fullness, the target buffer level, and the average picture bit rate to determine the target bits. The strategy is that when the buffer fullness is higher than the target buffer level, we use fewer bits to encode the data and thus reduce the buffer fullness. Equation ( 2-14 ) determines the target bits according to the remaining bits of this GOP and distributes the remaining bits according to the picture complexity.
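The two-part idea described above can be sketched as follows. Equations ( 2-13 ) and ( 2-14 ) are not reproduced in this section, so the expressions below are assumptions modeled on the commonly described JM scheme for a P-only GOP: a buffer-based estimate, a remaining-bits estimate with equal picture complexity, and a weighted combination of the two. The function name, the parameter names, the default weights of 0.5, and the clamp to a non-negative target are all hypothetical.

```python
def frame_target_bits(
    bit_rate: float,
    frame_rate: float,
    target_buffer_level: float,
    buffer_fullness: float,
    remaining_gop_bits: float,
    remaining_p_frames: int,
    gamma: float = 0.5,  # assumed weight of the buffer correction
    beta: float = 0.5,   # assumed weight between the two estimates
) -> float:
    """Combine a buffer-based and a remaining-bits-based target (sketch)."""
    # Part 1: average bits per picture, corrected by how far the buffer
    # fullness is from the target buffer level.
    t_buffer = bit_rate / frame_rate + gamma * (target_buffer_level - buffer_fullness)
    # Part 2: remaining GOP bits shared by the remaining P-frames
    # (assumption: all remaining pictures have equal complexity).
    t_remaining = remaining_gop_bits / max(remaining_p_frames, 1)
    target = beta * t_remaining + (1.0 - beta) * t_buffer
    # Assumption: never return a negative target.
    return max(target, 0.0)


# Buffer fuller than the target level -> fewer bits for this frame.
over = frame_target_bits(60000, 30, 10000, 12000, 30000, 10)
# Buffer emptier than the target level -> more bits for this frame.
under = frame_target_bits(60000, 30, 10000, 8000, 30000, 10)
print(over, under)
```

With these hypothetical numbers, the frame coded while the buffer is above its target level receives the smaller target, which is the behavior the paragraph describes.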
Apart from the above two strategies, other methods have been proposed for bit allocation. For example, Jiang and Lin use a PSNR-based frame complexity estimation to determine the target bits [16]. In this section, we propose some ideas that can be used in the allocation of frame target bits. In Section 4.1.1, we provide a parameter that can represent the frame complexity. In Section 4.1.2, we raise the importance of the beginning pictures of each GOP and take this importance into account in the bit allocation. Finally, in Section 4.1.3, we point out that coding more bits in the current frame can decrease the residual of the next frame. However, if too many bits are consumed in the coding of the current frame, there may be inadequate remaining target bits for the coding of the following frames. Hence, we discuss this trade-off in that section.
4.1.1 Frame Complexity
When encoding video with a constant quantization value, the number of coded bits differs from frame to frame. This is because the complexity of each frame is different.
Traditionally, the MAD (mean absolute difference) value is used to represent the frame complexity. In theory, a larger MAD corresponds to more coded bits. Here, we simulate the relation between the MAD value and the number of coded bits with a constant quantization value. Figure 4-2 describes the relation between the number of coded bits and the MAD value for a sequence of frames when QP = 6, 16, 26, 36 and 46. We can see that the number of coded bits is directly proportional to the MAD value when the QP is small, such as QP = 6, 16 and 26. However, this proportional property is not found when QP = 36 and 46.
When the QP value is large, the error propagation causes the MAD value to rise dramatically. Hence, using the MAD to represent the frame complexity may not be suitable when the QP value is large. Here, we try to find another factor that can represent the frame complexity.
Figure 4-2 The relations between the number of coded bits and MAD
Figure 4-3 The relations between the number of coded bits and the difference between the number of 8×8 macroblocks and the number of 16×16 macroblocks.
In Figure 4-3, we can observe that the difference between the number of 8×8 macroblocks and the number of 16×16 macroblocks is proportional to the number of coded bits. This is because when the QP value is large, the number of 16×16 macroblocks increases. Hence, using the number of 8×8 or 16×16 macroblocks could be more helpful than using the MAD value, especially when the bit rate is low. Therefore, after the target bits are determined by Equation ( 2-15 ), we add another equation, Equation ( 4-1 ), to re-estimate the target bits.
( )
8×8 mode macroblocks. The coefficient c is generally a small value, such as 0.1 or 0.2.
In the following, we compare the coded bits with rate control against the coded bits with a constant quantization value. If X is the array of the coded bits of each frame with rate control and Y is the array of the coded bits with a constant QP, the correlation coefficient between X and Y indicates how closely the rate-controlled bit allocation follows the constant-QP bit distribution. The correlation coefficient and the average PSNR are listed in Table 4-1.
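The comparison between the two bit arrays can be sketched as below. The text does not state which correlation coefficient is used, so the Pearson coefficient is an assumption here, and the function name and the sample arrays are hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long arrays."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant array")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-frame coded bits: X with rate control, Y with constant QP.
bits_rate_control = [2100, 1900, 2300, 2000, 2200]
bits_constant_qp = [1500, 1200, 2600, 1800, 2400]
print(pearson_correlation(bits_rate_control, bits_constant_qp))
```

A value close to 1 would mean that the rate-controlled encoder spends more bits on the same frames that need more bits at a constant QP.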
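Table 4-1 The correlation coefficients table
| Sequence | Bit Rate | Coefficient c | Correlation Coefficient | PSNR(Y) |
|---|---|---|---|---|
| Bus | 64K | Original | 0.143 | 25.85 |
| Bus | 64K | 0.1 | 0.180 | 25.83 |
| Bus | 64K | 0.2 | 0.097 | 25.89 |
| Bus | 128K | Original | 0.024 | 28.90 |
| Bus | 128K | 0.1 | 0.002 | 28.91 |
| Bus | 128K | 0.2 | 0.056 | 28.91 |
| Highway | 64K | Original | 0.272 | 37.14 |
| Highway | 64K | 0.1 | 0.265 | 37.18 |
| Highway | 64K | 0.2 | 0.246 | 37.18 |
| Highway | 128K | Original | 0.315 | 39.22 |
| Highway | 128K | 0.1 | 0.312 | 39.23 |
| Highway | 128K | 0.2 | 0.182 | 39.20 |
| Stefan | 64K | Original | 0.037 | 24.62 |
| Stefan | 64K | 0.1 | 0.032 | 24.58 |
| Stefan | 64K | 0.2 | 0.012 | 24.63 |
| Stefan | 128K | Original | 0.061 | 27.95 |
| Stefan | 128K | 0.1 | 0.041 | 27.98 |
| Stefan | 128K | 0.2 | 0.053 | 27.9 |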
In Table 4-1, we can see that the improvement is not obvious. Sometimes, the correlation coefficient is even reduced. This is because different QP values change the number of macroblocks of each mode. Hence, the relation between the number of macroblocks of each mode and the frame complexity is not obvious.
4.1.2 Frame Importance
In the Joint Model, each P-frame of the GOP has the same importance. Hence, the plot of the target buffer level looks like Figure 4-4 (a). After coding the first I-frame, the target buffer level rises up to 50%, and the normal target buffer level then decreases uniformly. Instead, our idea is that the first few pictures in a GOP are more important than the other pictures. This is because the quantization errors occurring in the first few pictures propagate longer. To reduce the quantization errors in the first few frames, we raise the importance of the beginning frames and allow the buffer level to decrease slowly there. In Figure 4-4 (b), we test four different curves for the target buffer level. When the value p is small, the decreases are small in the beginning pictures. However, this arrangement may cause the decrease of the buffer level to be too large when we code the remaining pictures.
When the bit rate is constant and the profile is the baseline profile, Equation ( 2-11 ) can be rewritten as follows.
\[
S_i(2) = V_i(2)
\]
\[
S_i(j+1) = S_i(j) - \frac{S_i(2)}{N_p(i) - 1} \tag{4-2}
\]
Figure 4-4 The target buffer level control: (a) the normal target buffer level, (b) the non-linear target buffer level.
Then, we change the target buffer level to follow a nonlinear curve, as written in Equation ( 4-3 ).
\[
S_i(2) = V_i(2)
\]
\[
S_i(j+1) = S_i(j) - S_i(2) \times \frac{1 - p}{1 - p^{(N_p(i) - 1)}} \times p^{N_{p,r}(i)} \tag{4-3}
\]
where $N_p$ is the number of P-frames in a GOP and $N_{p,r}$ is the remaining number of P-frames in this GOP. Generally, we set the value of p to 0.95 or 0.9 to avoid a dramatic decrease in the buffer level. This is because a large decrease in the buffer level may seriously degrade the video quality. After applying Equation ( 4-3 ), we simulate how the video PSNR changes with the target buffer level. In Figure 4-5, we can find that the PSNR values of the beginning pictures in each GOP are higher than those in the original case. However, this change may cause the PSNR of the remaining pictures in the same GOP to drop. Hence, this may cause a violent change of the video quality. Moreover, it also becomes easier to have a buffer overflow.
Figure 4-5 The experiment of changing the frame importance: (a) flower sequence, the size of GOP is 250; (b) mobile sequence, the size of GOP is 60.
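The difference between the uniform decrease of Equation ( 4-2 ) and a nonlinear decrease controlled by p can be sketched numerically. This is a sketch under stated assumptions: the per-step decrement is taken to be proportional to p raised to the number of remaining P-frames and normalized so that all decrements sum to the initial level, and the remaining count is assumed to run from N_p - 2 down to 0. The function name and the sample numbers are hypothetical.

```python
from typing import List, Optional


def target_buffer_levels(
    initial_level: float, num_p_frames: int, p: Optional[float] = None
) -> List[float]:
    """Target buffer level after each P-frame, starting from S(2) = initial_level."""
    steps = num_p_frames - 1
    if steps < 1:
        raise ValueError("at least two P-frames are needed")
    if p is None or p == 1.0:
        # Uniform decrease: every step removes the same share.
        weights = [1.0 / steps] * steps
    else:
        # Assumed nonlinear decrease: weight of a step is p ** remaining,
        # normalized by the geometric sum so that the weights add up to 1.
        norm = (1.0 - p) / (1.0 - p ** steps)
        weights = [norm * p ** remaining for remaining in range(steps - 1, -1, -1)]
    levels = [initial_level]
    for w in weights:
        levels.append(levels[-1] - initial_level * w)
    return levels


linear = target_buffer_levels(1000.0, 5)
nonlinear = target_buffer_levels(1000.0, 5, p=0.9)
print(linear)
print(nonlinear)
```

In this sketch, both curves start at the same level and end at zero, but with p < 1 the level stays higher during the first pictures and falls faster near the end of the GOP, which matches the trade-off described above.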
4.1.3 Effect of the QP Selection on the MAD of the Next Frame
Inter-prediction coding compresses the difference between two pictures, and it is the most important part of the rate control scheme. In inter-prediction coding, if a larger compression distortion is generated in the current picture, the residual in the next picture will increase. In other words, coding more bits in the current picture may reduce the amount of data in the following pictures. Figure 4-6 shows an experiment on this property.
Figure 4-6 The data of the following pictures when coding the current picture with different QP’s.
In Figure 4-6, all pictures, except the 3rd picture, are coded with QP = 28. We can see that if we encode the 3rd picture with more bits, the residuals in the following pictures are reduced. In rate control, we determine the QP value according to the buffer fullness and the frame complexity. Now, our strategy is that after determining the QP value q, we predict the buffer fullness when coding with q-1, q and q+1, respectively. Then, we can determine the QP value of the next picture according to our predicted buffer fullness and the predicted MAD. The flow chart is plotted as follows.
Figure 4-7 The flow chart of our strategy in QP decision
In the above flow chart, the predicted coded bits are computed by the R-D model. After that, we can compute the predicted target bits of the next picture. In the following, we will discuss how to compute the MAD of the next picture. First, we simulate the change of the MAD values of the next picture when encoding the current picture with different QP values.
Figure 4-8 shows the relation between the MAD of the next picture and the QP of the current picture. It plots the MAD values when the QP values are 27, 28 and 29, respectively, with the “foreman” sequence. The differences of MAD from QP 29 to 28 and from QP 28 to 27 are very similar. Hence, we define the symbol m as the MAD difference between two adjacent QP values. The relation of these MAD values when the QP values are 27, 28 and 29 is as follows.
\[
MAD_{QP=27} = MAD_{QP=28} - m
\]
\[
MAD_{QP=29} = MAD_{QP=28} + m \tag{4-4}
\]
Figure 4-8 The relation between the value of MAD and the value of QP
After defining the symbol m, we observe that the value of m and the MAD are related. In Figure 4-9, we can express this relation as a linear model when QP is 14, 22, 30, 38 or other values. Hence, the MAD value of the next picture can be predicted according to this relation. Our strategy is that after having determined the QP value of the current picture, we can predict the QP value of the next picture. The selection criterion can be either the minimum distortion or a stable visual quality. In the minimum distortion criterion, we calculate the QP values of all cases and choose the one that has the smallest distortion. On the other hand, in the stable visual quality criterion, we choose the one for which the QP values are the most similar.
Figure 4-9 The relation between the value of m and MAD (panels: QP=14, QP=22, QP=30, QP=38)
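The MAD prediction for the three candidate QP values can be sketched as follows. The sketch assumes that m is obtained from a linear model of the MAD (slope and intercept fitted offline for the current QP range) and that the MAD of the next picture for the unchanged QP q is already available from the regular MAD prediction. The function name, the coefficients, and the sample MAD are hypothetical.

```python
from typing import Dict


def predict_next_mad_candidates(
    mad_at_q: float, slope: float, intercept: float
) -> Dict[int, float]:
    """Predicted MAD of the next picture when the current picture uses q-1, q, q+1."""
    # Assumed linear model between m and the MAD.
    m = slope * mad_at_q + intercept
    return {
        -1: mad_at_q - m,  # current picture coded with q-1
        0: mad_at_q,       # current picture coded with q
        +1: mad_at_q + m,  # current picture coded with q+1
    }


candidates = predict_next_mad_candidates(mad_at_q=4.0, slope=0.05, intercept=0.1)
for delta_qp, mad in sorted(candidates.items()):
    print(delta_qp, round(mad, 3))
```

Each candidate MAD can then be passed to the R-D model to estimate the bits and the QP of the next picture before one of the two criteria is applied.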
In the following experiment, we compare the performance in two aspects. One is the PSNR of each picture, while the other is the stability of the PSNR.
Figure 4-10 The PSNR of each picture. The sequence is “salesman” and the bit rate is 64K.
In Figure 4-10, the PSNR of each picture is improved. The average improvement of PSNR is about 0.5dB. The other improvements are listed in Table 4-2.
Table 4-2 The PSNR of this experiment
| Sequence | Bit Rate | PSNR (JM) | PSNR (Our) | Improvement |
|---|---|---|---|---|
| Akiyo | 64K | 42.75 | 42.85 | 0.1 |
| Grandma | 64K | 39.38 | 39.50 | 0.12 |
| Hall | 64K | 38.31 | 38.37 | 0.06 |
In the PSNR stability experiment, the performance is defined as the absolute difference between the original PSNR of each picture and the PSNR after passing through a median filter. A smaller value means the PSNR is more stable. We can find an improvement of the PSNR stability in Table 4-3. In Figure 4-11, we can see that there is a small, but not obvious, improvement.
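The stability measure can be sketched as below. The window length of the median filter, the handling of the sequence borders, and the averaging over all pictures are not specified in the text, so the choices here (a window of 3, truncated windows at the borders, and a plain mean) are assumptions, as are the function name and the sample values.

```python
from statistics import median
from typing import Sequence


def psnr_stability(psnr: Sequence[float], window: int = 3) -> float:
    """Mean absolute difference between a PSNR series and its median-filtered version."""
    if not psnr:
        raise ValueError("psnr must not be empty")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    total = 0.0
    for i, value in enumerate(psnr):
        # Assumption: the window is truncated at the sequence borders.
        lo = max(0, i - half)
        hi = min(len(psnr), i + half + 1)
        total += abs(value - median(psnr[lo:hi]))
    return total / len(psnr)


# Hypothetical series: a single 6 dB spike in an otherwise flat sequence.
print(psnr_stability([30.0, 30.0, 36.0, 30.0, 30.0]))
```

A perfectly flat PSNR curve gives 0, and isolated jumps raise the value, so a smaller result corresponds to a more stable quality.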
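Table 4-3 The PSNR stability of the experiment
| Sequence | Bit Rate | PSNR stability (JM) | PSNR stability (Our) | Improvement |
|---|---|---|---|---|
| Hall | 32K | 0.1555 | 0.1443 | 0.0112 |
| Highway | 128K | 0.3133 | 0.2672 | 0.0461 |
| Mother | 128K | 0.3352 | 0.2709 | 0.0643 |
| Coastguard | 512K | 0.3134 | 0.2914 | 0.0220 |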
Figure 4-11 The PSNR of each picture
4.2 Macroblock Level
In the JM of H.264/AVC, if the number of basic units in one picture is more than one, we need to determine the bit allocation for each basic unit. If the basic unit is a macroblock, we determine the bit allocation for each macroblock. In JM, the original method is defined as
\[
T_{mb}(j,k) = T(j) \times \frac{pMAD^2(j,k)}{\sum_{k} pMAD^2(j,k)} \tag{4-5}
\]
where $T_{mb}$ is the target bits of the macroblock, $T$ is the target bits of the picture, $j$ is the index of the picture, and $k$ is the index of the encoded macroblock. The symbol $pMAD$ denotes the predicted MAD of a macroblock.
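The allocation rule of Equation ( 4-5 ), in which each macroblock receives a share of the picture target proportional to its squared predicted MAD, can be sketched as below. The function name, the equal split used when all predicted MAD values are zero, and the sample numbers are assumptions for this example.

```python
from typing import List, Sequence


def macroblock_target_bits(
    frame_target_bits: float, predicted_mads: Sequence[float]
) -> List[float]:
    """Split the picture target among macroblocks in proportion to pMAD squared."""
    if not predicted_mads:
        raise ValueError("at least one macroblock is required")
    squares = [m * m for m in predicted_mads]
    total = sum(squares)
    if total == 0:
        # Assumption: with no complexity information, split the bits equally.
        return [frame_target_bits / len(predicted_mads)] * len(predicted_mads)
    return [frame_target_bits * s / total for s in squares]


# Hypothetical picture target of 1000 bits and four macroblocks.
targets = macroblock_target_bits(1000.0, [1.0, 2.0, 2.0, 1.0])
print(targets)
```

Because the weights are squared, a macroblock whose predicted MAD is twice as large receives four times as many target bits.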
Figure 4-12 The relation between the coded bits and MAD in macroblock level: (a) original picture, (b) the MAD value when QP is 28, (c) the coded bits when QP is 28, (d) the coded bits with rate control.
The concept of this equation is that a macroblock with a larger MAD needs more bits to be coded. In Figure 4-12, a darker macroblock means that the MAD of this macroblock is higher and that it has more coded bits than a brighter macroblock. Ideally, the distribution in Figure 4-12 (d) should be similar to the distribution in Figure 4-12 (c). In this section, we modify the bit allocation at the macroblock level to improve the coding performance.
4.2.1 MAD prediction from forward frame (add motion vector)
Our idea is that the residual data of each macroblock moves when a motion vector exists. This means that the MAD of a macroblock moves with its motion vector. Figure 4-13 illustrates how the motion causes the original MAD of a macroblock to move to other macroblocks.
Figure 4-13 The diagram of macroblock motion
Hence, we modify the MAD prediction at the macroblock level. We use the motion vector of each macroblock to move the MAD of each macroblock to its new location. After moving all the macroblocks, we calculate the weighted average MAD as the predicted MAD value. In the case of Figure 4-14, the MAD of macroblock M in the next picture can be computed as:
\[
pMAD_{t+1}(M) = \frac{w_N \times MAD_t(N) + w_L \times MAD_t(L) + w_M \times MAD_t(M)}{w_N + w_L + w_M} \tag{4-6}
\]
where $pMAD$ is the predicted MAD and $w$ is the weight of each MAD that contributes to the macroblock. In theory, employing this method can improve the video quality. This is because a more accurate MAD prediction generates a more accurate bit allocation and avoids a mismatch in bit allocation. This can also improve the buffer fullness.
Figure 4-14 Our method to predict the new MAD value
Figure 4-15 The PSNR of each picture with the “silent” sequence when the bit rate is 16K
Figure 4-16 The buffer fullness with the “silent” sequence when the bit rate is 16K.
In Figure 4-15, we can find a small improvement in PSNR. Figure 4-16 also shows that the coded bits of each picture follow the target more accurately. However, the improvement of our strategy is not very obvious.
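The weighted average of Equation ( 4-6 ) can be sketched as follows. The text does not define how the weights are obtained, so using the overlap area (in pixels) between each moved macroblock and the target macroblock is an assumption of this example, as are the function name and the sample values.

```python
from typing import Sequence, Tuple


def predict_mad_from_moved_blocks(
    contributions: Sequence[Tuple[float, float]]
) -> float:
    """Weighted average of the MADs that land on one macroblock.

    Each contribution is a (weight, mad) pair.
    """
    total_weight = sum(weight for weight, _ in contributions)
    if total_weight == 0:
        raise ValueError("at least one contribution with non-zero weight is required")
    return sum(weight * mad for weight, mad in contributions) / total_weight


# Hypothetical case: macroblocks N, L and M overlap the position of M in the
# next picture by 64, 64 and 128 pixels (assumed weights).
pmad_m = predict_mad_from_moved_blocks([(64, 2.0), (64, 4.0), (128, 3.0)])
print(pmad_m)
```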
4.2.2 A solution of coding too many bits in picture level
In Section 3.2, we discussed the accuracy of the R-D model and provided a more accurate model to improve it. In this section, we use another approach. In a good rate control method, the coded bits of each picture must match the target bits. If the number of macroblocks coded without the R-D model is large, it means that the target bits have been used up long before we finish encoding all the macroblocks. Hence, we may change the target bits of each macroblock to improve this situation. The strategy is that we predict the location of the macroblocks that have no target bits left. Then, we reduce the target bits of the macroblocks that come before the predicted location, so that the actual location moves backward. In Figure 4-18, we describe our strategy, and we formulate it as follows
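\[
T'_{mb}(m) = \gamma \times T_{mb}(m), \quad \text{if } m < \hat{m}, \qquad 0 \le m < \text{Total Macroblock Number} \tag{4-7}
\]
where $T_{mb}$ is the target bits for each macroblock, $T'_{mb}$ is the modified target bits, $\hat{m}$ is the predicted location of the first macroblock that has no target bits left, and $\gamma$ is a coefficient between 0 and 1. Generally, a smaller $\gamma$ improves the situation more obviously. In the following, we perform some experiments and compare the results between the original approach and the proposed modification.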
Figure 4-17 Comparison of the buffer fullness and the number of macroblocks coded without using the R-D model (panels: buffer fullness; the number of macroblocks coded without using the R-D model)
Figure 4-18 The illustration of our strategy in changing bit allocation
Figure 4-19 The buffer fullness
Figure 4-20 The PSNR of each picture
In Figure 4-19, the buffer fullness after the change of the bit allocation is very close to the target buffer level. This is because we can suppress the number of the macroblocks that have no remaining target bits. However, there is still a side effect. In Figure 4-20, the PSNR of each picture after the change of the bit allocation is lowered. This is because reducing the target bits of the previous macroblocks may cause the mismatch between the target bits and the complexity of each macroblock. Hence, although this modification can control the buffer fullness well, it is still not a good solution.
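The modification discussed in this section, which scales down the target bits of the macroblocks located before the predicted exhaustion point, can be sketched as below. The function name, the parameter names, and the sample numbers are hypothetical, and how the exhaustion location is predicted is not covered by this sketch.

```python
from typing import List, Sequence


def scale_early_macroblock_targets(
    targets: Sequence[float], predicted_exhaustion_index: int, gamma: float
) -> List[float]:
    """Multiply by gamma the targets of macroblocks before the predicted location."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma is expected to lie between 0 and 1")
    return [
        t * gamma if m < predicted_exhaustion_index else t
        for m, t in enumerate(targets)
    ]


# Hypothetical picture with four macroblocks; the target bits are predicted to
# run out at macroblock index 2, so the first two targets are halved.
new_targets = scale_early_macroblock_targets([100.0, 100.0, 100.0, 100.0], 2, 0.5)
print(new_targets)
```

The sketch also makes the side effect visible: the early macroblocks lose bits regardless of their own complexity, which is consistent with the PSNR drop reported above.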
In Chapters 3 and 4, we proposed several approaches to some issues of rate control for H.264/AVC. Finally, we use a flow chart to summarize our algorithm. Figure 4-21 shows our final algorithm for the rate control of H.264/AVC.
Figure 4-21 The flow chart of our algorithm
Chapter 5 Conclusion
In this thesis, we study some rate control issues of H.264/AVC. Here, we summarize our accomplishments as follows:
1. We propose a new rate-distortion model for the baseline profile of H.264/AVC. Using this R-D model can improve the accuracy of rate control. The improvement also leads to good visual quality, especially at low bit rates.
2. The relation between the header bit and the MAD value has been found. We use this