An Improved Detection Method for Zero Quantized Blocks on H.264/AVC
Bo-Jhih Chen and Shen-Chuan Tai
Institute of Computer and Communication Engineering,
Department of Electrical Engineering, National Cheng Kung University, Tainan City 701, Taiwan (R.O.C.)
Abstract—An improved method for detecting zero quantized blocks (ZQBs) is proposed. Because 4 × 4 ZQBs are detected before the 4 × 4 forward DCT and quantization in the H.264/AVC video encoder, the computational cost of these processes is reduced. We report a new criterion based on statistical analysis and the energy conservation theorem. Experiments are carried out to validate the method. The results indicate that, compared with prevalent methods, the proposed method achieves a better detection rate with negligible PSNR degradation and a reasonable rate of missed and false detections. Computational savings are obtained as well.
Keywords: zero quantized block; DCT; energy conservation; H.264/AVC
I. INTRODUCTION
H.264/MPEG-4 Part 10 Advanced Video Coding (H.264/AVC) introduces a number of advances in video coding techniques.
H.264 achieves better coding efficiency and visual quality than previous standards such as MPEG-1/2/4 and H.261/263 [1]–[3]. In low bit-rate coding, i.e., encoding with a higher quantization parameter (QP), a large number of 4 × 4 residual blocks become zero-quantized blocks (ZQBs), whose sixteen coefficients are all quantized to zero after transform (DCT) and quantization (Q). Consequently, if ZQBs can be predicted in advance, DCT/Q can be skipped for them, which reduces computational complexity.
Several efforts have focused on detecting ZQBs [4]–[6]. Sousa [5] proposed early detection algorithms for 8 × 8 DCT-based video encoders such as H.263. In [4], an improved detection algorithm based on theoretical analysis achieved a higher detection rate and greater computational savings than [5]. Wang et al. [6] analyzed the dynamic range of DCT coefficients and derived a condition for detecting ZQBs.
Other approaches use statistical models of the input residual data of the DCT to predict ZQBs. Pao et al. [7] showed that the residual pixel values after motion-compensated prediction can be approximately modeled by a Laplacian distribution. Similarly, Wang et al. [8] extended Pao's results into a hybrid method applicable to H.264.
In [9], Xie et al. presented a ZQB detection method based on the energy conservation theorem and the rate-distortion theory of a Gaussian source. Xie's method outperformed existing methods in terms of detection accuracy and the complexity of motion estimation.
In this paper, we derive a ZQB detection method from an analysis of DCT coefficients. Experimental results show that the average detection rate of our method is superior to those of other detection algorithms, with negligible visual degradation. The remainder of this paper is organized as follows. Section II reviews the integer discrete cosine transform of H.264/AVC and existing early ZQB detection methods. Section III presents the preliminary analysis of DCT coefficients and the proposed method. Section IV compares the proposed method with existing methods experimentally, and Section V concludes the paper.
II. BACKGROUND AND OVERVIEW
A. Integer-DCT Transform and Quantization
H.264 adopts a 4 × 4 integer DCT to avoid the mismatch problem of the inverse transform [10], [11]. Given a 4 × 4 residual block $X = \{f(m,n) \mid 0 \le m, n \le 3\}$, the transform coefficients $F = \{F(u,v) \mid 0 \le u, v \le 3\}$ are calculated as
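$$F = A X A^{T} = \left(C X C^{T}\right) \otimes P_F = W \otimes P_F \tag{1}$$
where
$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad P_F = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix},$$
and
$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad c = \frac{1}{2}\sqrt{\frac{2}{5}}.$$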
$A$ is the floating-point transform matrix and $C$ is the core transform matrix. $W = CXC^T$ is the output of the core transform, and $P_F$ is the post-scaling matrix. The symbol $\otimes$ denotes element-wise multiplication: each element of $W$ is multiplied by the element of $P_F$ in the same position. Given a quantization parameter QP, which ranges from 0 to 51, the quantization of H.264 is defined as
$$|F_Q(u,v)| = \left(|W(u,v)| \cdot MF(QP\,\%\,6,\ r) + k\right) \gg qbits \tag{2}$$
where $r = 2 - u\,\%\,2 - v\,\%\,2$, $qbits = 15 + \lfloor QP/6 \rfloor$, and $\operatorname{sign}(F_Q(u,v)) = \operatorname{sign}(W(u,v))$. The constant $k$ is $2^{qbits}/3$ for intra-coded blocks and $2^{qbits}/6$ for inter-coded blocks, and $\gg$ denotes the right-shift operator. The quantization matrix $MF$ is defined as
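$$MF(QP\,\%\,6,\ r) = \begin{bmatrix} 5243 & 8066 & 13107 \\ 4660 & 7490 & 11916 \\ 4194 & 6554 & 10082 \\ 3647 & 5825 & 9362 \\ 3355 & 5243 & 8192 \\ 2893 & 4559 & 7283 \end{bmatrix} \tag{3}$$
where the rows are indexed by $QP\,\%\,6$ and the columns by $r = 0, 1, 2$.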
From (2), the quantized coefficient $F_Q(u,v)$ is zero whenever the following criterion holds:
$$|W(u,v)| < T(r) \tag{4}$$
where
$$T(r) = \frac{2^{qbits} - k}{MF(QP\,\%\,6,\ r)} \tag{5}$$
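The transform, quantization rule (2), and threshold (5) can be checked with a short Python sketch. It is an illustration under stated assumptions, not the JM11.0 code: the function names are hypothetical, the rounding offset $k$ is computed with integer division of $2^{qbits}$, and blocks are 4 × 4 lists of integers. The loop at the end confirms that (4) holds exactly when (2) returns a zero level.

```python
# Minimal sketch of the 4x4 core transform (1), quantization (2), and the
# zero-quantization threshold (5). Function names are illustrative only.

C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

# MF(QP % 6, r) from (3); columns correspond to r = 0, 1, 2.
MF = [[5243, 8066, 13107],
      [4660, 7490, 11916],
      [4194, 6554, 10082],
      [3647, 5825, 9362],
      [3355, 5243, 8192],
      [2893, 4559, 7283]]


def core_transform(x):
    """W = C X C^T for a 4x4 integer residual block x."""
    cx = [[sum(C[i][k] * x[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]
    return [[sum(cx[i][k] * C[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def qbits(qp):
    return 15 + qp // 6


def offset(qp, intra):
    # k = 2^qbits / 3 (intra) or 2^qbits / 6 (inter); integer division assumed.
    return (1 << qbits(qp)) // (3 if intra else 6)


def r_index(u, v):
    return 2 - u % 2 - v % 2


def quantize(w, u, v, qp, intra=False):
    """Quantize one core-transform coefficient W(u, v) as in (2)."""
    mf = MF[qp % 6][r_index(u, v)]
    level = (abs(w) * mf + offset(qp, intra)) >> qbits(qp)
    return level if w >= 0 else -level


def threshold(r, qp, intra=False):
    """T(r) from (5)."""
    return ((1 << qbits(qp)) - offset(qp, intra)) / MF[qp % 6][r]


if __name__ == "__main__":
    qp = 28
    # (4) should hold exactly when (2) yields a zero level.
    for w in range(-200, 201):
        for u in range(4):
            for v in range(4):
                zero = quantize(w, u, v, qp) == 0
                assert zero == (abs(w) < threshold(r_index(u, v), qp))
    print([round(threshold(r, qp), 2) for r in range(3)])
```

Because $T(r)$ grows as $MF$ shrinks, the positions with $r = 0$ (both indices odd) tolerate the largest core-transform magnitudes before a nonzero level appears.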
B. Related Work
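Sousa [5] provided a sufficient condition for finding 8 × 8 zero-quantized blocks. Wang [8] rewrote it for 4 × 4 blocks as
$$SAD < TH_{Sousa} = \frac{T(0)}{4} \tag{6}$$
where $SAD = \sum_m \sum_n |f(m,n)|$ is the sum of absolute differences of the 4 × 4 residual block.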
Based on a theoretical analysis, Moon [4] derived a threshold that detects more ZQBs than Sousa's method:
$$SAD < TH_{Moon} \tag{7}$$
where
$$TH_{Moon} = \min\left\{ \frac{T(0)}{4} + \frac{\min\{hs(0,3),\ hs(1,2)\}}{2},\ \frac{T(1)}{2} \right\}$$
and
$$hs(i,j) = \sum_{y=0}^{3} \left(|f(i,y)| + |f(j,y)|\right)$$
In [6], Wang proposed an early detection algorithm based on the basis functions of the DCT:
$$SAD < TH_{Wang} \tag{8}$$
where the threshold $TH_{Wang}$ is
$$TH_{Wang} = \min\left\{ \frac{T(0) + m_0}{2},\ \frac{T(1) + m_1}{2},\ T(2) \right\} \tag{9}$$
with
$$m_0 = \min(S_3 - 2S_1,\ S_1 - 2S_3,\ S_4 - 2S_2,\ S_2 - 2S_4), \qquad m_1 = \min(S_1 + S_2,\ S_1 + S_4,\ S_2 + S_3,\ S_3 + S_4)$$
and $S_i$ calculated by
$$\begin{aligned} S_1 &= |f(0,0)| + |f(0,3)| + |f(3,0)| + |f(3,3)| \\ S_2 &= |f(0,1)| + |f(0,2)| + |f(3,1)| + |f(3,2)| \\ S_3 &= |f(1,1)| + |f(1,2)| + |f(2,1)| + |f(2,2)| \\ S_4 &= |f(1,0)| + |f(1,3)| + |f(2,0)| + |f(2,3)| \end{aligned}$$
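For comparison, the three SAD-based tests (6)–(9) can be sketched as follows, transcribing the formulas above directly. The helper names are hypothetical, $T(r)$ is recomputed as in (5) with an integer-division rounding offset, and the residual block is a 4 × 4 list of integers; this is not the original authors' code.

```python
# Sketch of the SAD-based ZQB tests (6)-(9) as transcribed above.
# The residual block f is a 4x4 list of ints; names are illustrative.

MF = [[5243, 8066, 13107], [4660, 7490, 11916], [4194, 6554, 10082],
      [3647, 5825, 9362], [3355, 5243, 8192], [2893, 4559, 7283]]


def threshold(r, qp, intra=False):
    """T(r) from (5), with k = 2^qbits/3 (intra) or 2^qbits/6 (inter)."""
    q = 15 + qp // 6
    k = (1 << q) // (3 if intra else 6)
    return ((1 << q) - k) / MF[qp % 6][r]


def sad(f):
    return sum(abs(x) for row in f for x in row)


def th_sousa(qp, intra=False):                        # (6)
    return threshold(0, qp, intra) / 4


def hs(f, i, j):
    return sum(abs(f[i][y]) + abs(f[j][y]) for y in range(4))


def th_moon(f, qp, intra=False):                      # (7)
    t0, t1 = threshold(0, qp, intra), threshold(1, qp, intra)
    return min(t0 / 4 + min(hs(f, 0, 3), hs(f, 1, 2)) / 2, t1 / 2)


def th_wang(f, qp, intra=False):                      # (8)-(9)
    def s(points):
        return sum(abs(f[m][n]) for m, n in points)
    s1 = s([(0, 0), (0, 3), (3, 0), (3, 3)])
    s2 = s([(0, 1), (0, 2), (3, 1), (3, 2)])
    s3 = s([(1, 1), (1, 2), (2, 1), (2, 2)])
    s4 = s([(1, 0), (1, 3), (2, 0), (2, 3)])
    m0 = min(s3 - 2 * s1, s1 - 2 * s3, s4 - 2 * s2, s2 - 2 * s4)
    m1 = min(s1 + s2, s1 + s4, s2 + s3, s3 + s4)
    t0, t1, t2 = (threshold(r, qp, intra) for r in range(3))
    return min((t0 + m0) / 2, (t1 + m1) / 2, t2)


def is_zqb(f, qp, method="sousa", intra=False):
    """Declare a ZQB when SAD is below the chosen method's threshold."""
    if method == "sousa":
        th = th_sousa(qp, intra)
    elif method == "moon":
        th = th_moon(f, qp, intra)
    else:
        th = th_wang(f, qp, intra)
    return sad(f) < th


if __name__ == "__main__":
    block = [[1, 0, -1, 0], [0, 2, 0, -1], [1, 0, 0, 0], [0, -1, 0, 1]]
    for method in ("sousa", "moon", "wang"):
        print(method, is_zqb(block, 28, method))
```

All three tests need only the residual block, so they run before DCT/Q; Sousa's threshold depends only on QP, whereas Moon's and Wang's adapt to the block content.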
III. PROPOSED METHOD FOR ZQB DETECTION
A. Analysis of the Input Residual Block
Pao [7] and Wang [8] assumed that the input data f of the DCT can be approximated by Laplacian and Gaussian distributions, respectively. Hence, the variance of the DCT coefficients, $\sigma_F^2(u,v)$, can be expressed as
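$$\sigma_F^2(u,v) = \sigma_f^2 \left[A R A^T\right]_{u,u} \left[A R A^T\right]_{v,v} \tag{10}$$
where $A$ is the transform matrix in (1), $[\cdot]_{u,u}$ denotes the $(u,u)$th element of the matrix $ARA^T$, and $R$ is the normalized covariance (correlation) matrix given as
$$R = \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix} \tag{11}$$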
where ρ is the correlation coefficient between two adjacent pixels of a residual block in the horizontal or vertical direction. As in [7] and [8], ρ is empirically set to 0.6. From (10) and (11), we have
$$\sigma_F^2(u,v) = \sigma_f^2 \begin{bmatrix} 5.61 & 2.13 & 1.06 & 0.68 \\ 2.13 & 0.81 & 0.40 & 0.26 \\ 1.06 & 0.40 & 0.20 & 0.13 \\ 0.68 & 0.26 & 0.13 & 0.08 \end{bmatrix} \tag{12}$$
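The entries of (12) follow from the diagonal of $ARA^T$. The sketch below assumes that $A$ is the exact orthonormal 4-point DCT-II (built from cosines rather than the approximation $b = \sqrt{2/5}$ of (1)) and that $R_{ij} = \rho^{|i-j|}$ as in (11); with $\rho = 0.6$, rounding the result to two decimals reproduces the values in (12). The function names are hypothetical.

```python
import math

RHO = 0.6  # correlation coefficient, as chosen in the text


def dct_matrix(n=4):
    """Orthonormal n-point DCT-II matrix (assumed here to stand in for A)."""
    a = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        a.append([scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                  for m in range(n)])
    return a


def correlation_matrix(rho, n=4):
    """R in (11): R[i][j] = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def diag_ara_t(a, r):
    """Diagonal of A R A^T, i.e. [A R A^T]_{u,u} for each u."""
    n = len(a)
    return [sum(a[u][i] * r[i][j] * a[u][j] for i in range(n) for j in range(n))
            for u in range(n)]


def variance_ratio(rho=RHO):
    """sigma_F^2(u, v) / sigma_f^2 from (10)."""
    d = diag_ara_t(dct_matrix(), correlation_matrix(rho))
    return [[d[u] * d[v] for v in range(4)] for u in range(4)]


if __name__ == "__main__":
    for row in variance_ratio():
        print(" ".join(f"{x:.2f}" for x in row))
```

Since $A$ is orthonormal, the diagonal of $ARA^T$ sums to the trace of $R$ (here 4), so the transform only redistributes the residual variance, concentrating most of it in the DC term.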
As (12) shows, the variance of each DCT coefficient can be estimated from the variance of the input residual data f. It also shows that the variance of the (0,0)th coefficient (i.e., the DC coefficient) is larger than those of the AC coefficients.
In [9], the condition under which a DCT coefficient is quantized to zero (i.e., $|F_Q(u,v)| < 1$) was derived as
$$|F(u,v)| < \frac{5}{6} Q_{step} \tag{13}$$
where $Q_{step} = 0.625 \times 2^{QP/6}$.
For normally distributed data, the probability that a DCT coefficient lies within $[-3\sigma_F, 3\sigma_F]$ is about 99.73%. Combining this with (13), $F(u,v)$ is quantized to zero with probability over 99% if the following condition holds:
$$3\sigma_F(u,v) < \frac{5}{6} Q_{step} \tag{14}$$
From (10) and (14), we have
$$\sigma_f^2 < \frac{\left(\frac{5}{6} Q_{step}\right)^2}{3^2 \left[A R A^T\right]_{u,u} \left[A R A^T\right]_{v,v}} \tag{15}$$
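A small sketch of (13)–(15) follows, assuming $Q_{step} = 0.625 \times 2^{QP/6}$ as stated above and using the rounded factors of (12) for $[ARA^T]_{u,u}[ARA^T]_{v,v}$. The function names are hypothetical. Because the DC factor 5.61 is the largest, the variance bound is tightest at (0,0) and loosest at (3,3).

```python
# Hypothetical helpers for (13)-(15); the variance factors are the rounded
# values printed in (12), i.e. [A R A^T]_{u,u} [A R A^T]_{v,v} with rho = 0.6.

VAR_RATIO = [[5.61, 2.13, 1.06, 0.68],
             [2.13, 0.81, 0.40, 0.26],
             [1.06, 0.40, 0.20, 0.13],
             [0.68, 0.26, 0.13, 0.08]]


def qstep(qp):
    """Quantization step size Q_step = 0.625 * 2^(QP/6)."""
    return 0.625 * 2 ** (qp / 6)


def coeff_zero_bound(qp):
    """Right-hand side of (13): |F(u, v)| below this is quantized to zero."""
    return 5 / 6 * qstep(qp)


def residual_variance_bound(u, v, qp):
    """Right-hand side of (15): largest sigma_f^2 with 3*sigma_F(u, v) under (13)."""
    return coeff_zero_bound(qp) ** 2 / (3 ** 2 * VAR_RATIO[u][v])


if __name__ == "__main__":
    qp = 28
    print(f"Q_step = {qstep(qp):.3f}")
    for u in range(4):
        print([round(residual_variance_bound(u, v, qp), 1) for v in range(4)])
```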
Table I
LOCATIONS, Loc(i), DEFINED BY THE DCT COEFFICIENTS' POSITIONS
Loc(i) | Positions (u, v)
Loc(0) (DC) | (0,0)
Loc(1) | (0,1), (1,1), (1,0)
Loc(2) | (0,2), (1,2), (2,2), (2,0), (2,1)
Loc(3) | (0,3), (1,3), (2,3), (3,0), (3,1), (3,2), (3,3)
B. Proposed Method for ZQB Detection
Locating the largest-magnitude DCT coefficient, $F_{MAX}$, of a 4 × 4 block is important for early ZQB detection. To investigate the distribution of DCT coefficients, we first analyzed six CIF (352 × 288) video sequences ('Akiyo', 'Coastguard', 'Foreman', 'News', 'Silent', and 'Table Tennis') with different QPs (18, 24, 30, 36, 40, and 46). As shown in Table I, a 4 × 4 DCT block is divided into four locations Loc(i), with the DC coefficient in Loc(0). We define the probability that the largest DCT coefficient occurs in Loc(i) as
$$P(i) = \frac{N_{Loc(i)}}{N_{All}} \tag{16}$$
where $N_{All}$ is the total number of 4 × 4 blocks and $N_{Loc(i)}$ is the number of blocks whose $F_{MAX}$ occurs in Loc(i). Figure 1 shows the occurrence probability of each Loc(i) at various QPs. The probability of Loc(0) varies considerably with QP, whereas Loc(1) is the dominant location of the largest DCT coefficient, holding over 50% of them, especially at lower QPs (higher bit rates). On average, over 70% and 80% of the largest DCT coefficients fall within the top-left 2 × 2 and 3 × 3 regions, respectively.
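The location statistic in (16) can be gathered as sketched below. Table I groups position $(u,v)$ into $Loc(\max(u,v))$, so the location of $F_{MAX}$ follows directly from its index. The helper names are hypothetical, coefficients are given as 4 × 4 lists, and ties between equal magnitudes are broken in raster order, which the text does not specify.

```python
from collections import Counter


def loc_index(u, v):
    """Loc(i) of Table I: position (u, v) belongs to Loc(max(u, v))."""
    return max(u, v)


def fmax_location(coeffs):
    """Loc(i) of the largest-magnitude coefficient in a 4x4 block.

    Ties keep the first position in raster order (an assumption).
    """
    best_u, best_v = 0, 0
    for u in range(4):
        for v in range(4):
            if abs(coeffs[u][v]) > abs(coeffs[best_u][best_v]):
                best_u, best_v = u, v
    return loc_index(best_u, best_v)


def location_probabilities(blocks):
    """P(i) = N_Loc(i) / N_All from (16), for i = 0..3."""
    counts = Counter(fmax_location(b) for b in blocks)
    n_all = len(blocks)
    return [counts[i] / n_all for i in range(4)]
```

Summing $P(0) + P(1)$ gives the share of maxima in the top-left 2 × 2 region, and adding $P(2)$ gives the 3 × 3 share quoted above.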
In addition, by the energy conservation property, the total energy of the residual block equals that of its DCT coefficients:
$$\sum_m \sum_n |f(m,n)|^2 = \sum_u \sum_v |F(u,v)|^2 \tag{17}$$
where $f(m,n)$ and $F(u,v)$ are the residual pixel values and the DCT coefficients, respectively. The energy of the AC coefficients of a 4 × 4 DCT block, $E_{ACs}$, can then be expressed in terms of the variance of the residual data, with $N = 4$ the block size:
$$\begin{aligned} E_{ACs} &= \sum_u \sum_v |F(u,v)|^2 - |F(0,0)|^2 \\ &= \sum_m \sum_n |f(m,n)|^2 - \left(\frac{1}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m,n)\right)^2 \\ &= N^2 \sigma_f^2 \end{aligned} \tag{18}$$
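Identities (17) and (18) can be verified numerically. The sketch below assumes an orthonormal 4 × 4 DCT-II for $A$ (so that the DC coefficient is $\frac{1}{4}\sum f$) and uses random integer residuals; the function names are illustrative. It shows that $E_{ACs}$ can be obtained from the pixels alone, before any transform is computed.

```python
import math
import random
import statistics


def dct_matrix(n=4):
    """Orthonormal n-point DCT-II matrix."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos((2 * m + 1) * k * math.pi / (2 * n)) for m in range(n)]
            for k in range(n)]


def dct2(x):
    """F = A X A^T with the orthonormal DCT-II."""
    n = len(x)
    a = dct_matrix(n)
    ax = [[sum(a[i][k] * x[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(ax[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ac_energy_from_pixels(x):
    """E_ACs via (18): pixel energy minus the squared DC term ((1/N) * sum)^2."""
    n = len(x)
    total = sum(v for row in x for v in row)
    return sum(v * v for row in x for v in row) - (total / n) ** 2


if __name__ == "__main__":
    random.seed(0)
    x = [[random.randint(-20, 20) for _ in range(4)] for _ in range(4)]
    f = dct2(x)
    energy_pixels = sum(v * v for row in x for v in row)
    energy_coeffs = sum(c * c for row in f for c in row)
    print("(17):", energy_pixels, round(energy_coeffs, 6))
    print("(18):", round(energy_coeffs - f[0][0] ** 2, 6),
          ac_energy_from_pixels(x),
          16 * statistics.pvariance([v for row in x for v in row]))
```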
Equation (18) shows that the AC energy $E_{ACs}$ equals $N^2$ times the variance $\sigma_f^2$ of the input residual data. From (15) and (18), we obtain the following threshold for the AC energy:
$$E_{ACs} < TH_{ACs}(u,v) \tag{19}$$
where
$$TH_{ACs}(u,v) = \frac{N^2 \left(\frac{5}{6} Q_{step}\right)^2}{3^2 \left[A R A^T\right]_{u,u} \left[A R A^T\right]_{v,v}}$$
Therefore, we first use (13) to check the DC coefficient $F(0,0)$ and then use (19) to check the AC energy. Building on [8] and [9], the improved method declares a block to be a ZQB if
$$|F(0,0)| < \frac{5}{6} Q_{step} \quad \text{and} \quad E_{ACs} < TH_{ACs}(0,2) \tag{20}$$
where the threshold $TH_{ACs}(u,v)$ is symmetric, i.e., $TH_{ACs}(u,v) = TH_{ACs}(v,u)$. The statistical analysis above shows that the largest DCT coefficient falls within the top-left 3 × 3 region with probability over 80%. Therefore, only $TH_{ACs}(0,2)$ is used as the threshold for the AC energy, which further reduces the number of comparisons compared with [8]. Moreover, the second threshold, on the AC energy, is higher in the proposed method than in [9]. Hence, the proposed method can detect more ZQBs early.
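A minimal sketch of the decision rule (20) follows. It assumes that $F(0,0)$ is evaluated as $\frac{1}{4}\sum f$ (the DC term of an orthonormal 4 × 4 DCT), that $E_{ACs}$ is computed from the pixels via (18), and that the factor $[ARA^T]_{0,0}[ARA^T]_{2,2}$ equals the rounded entry 1.06 of (12). The function names are hypothetical, and this is not the JM11.0 integration used in the experiments.

```python
# Sketch of the two-stage ZQB test (20), applied before any DCT/Q.
# Assumption: [A R A^T]_{0,0} [A R A^T]_{2,2} = 1.06, entry (0, 2) of (12).
VAR_RATIO_02 = 1.06
N = 4  # block size


def qstep(qp):
    """Q_step = 0.625 * 2^(QP/6)."""
    return 0.625 * 2 ** (qp / 6)


def th_acs_02(qp):
    """TH_ACs(0, 2) from (19)."""
    return N ** 2 * (5 / 6 * qstep(qp)) ** 2 / (3 ** 2 * VAR_RATIO_02)


def is_zqb_proposed(f, qp):
    """Return True if the 4x4 residual block f is predicted to be a ZQB."""
    total = sum(v for row in f for v in row)
    dc = total / N                       # F(0, 0) of an orthonormal DCT
    if abs(dc) >= 5 / 6 * qstep(qp):     # first condition of (20), from (13)
        return False
    e_acs = sum(v * v for row in f for v in row) - dc ** 2   # (18)
    return e_acs < th_acs_02(qp)         # second condition of (20)
```

For example, at QP = 28 a flat block of 2s passes both tests (its AC energy is zero), whereas a block containing a single residual of 20 fails the AC-energy test.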
IV. EXPERIMENTAL RESULTS
We use the reference software JM11.0 [12] to implement the proposed method. Six video sequences ('Akiyo', 'Coastguard', 'Foreman', 'News', 'Silent', and 'Table Tennis') with different motion activities are used. They are in CIF format (352 × 288), and 150 frames of each are encoded with an IPPP coding structure. Fast motion estimation with a search range of 32 is enabled, and the number of reference frames is set to 1. Rate-distortion optimization is enabled for mode selection, and six QP values are used to examine the performance at different bit rates. Note that if a residual block is determined to be a ZQB, DCT/Q/DQ/IDCT are skipped and the residual block is set to zero.
The objective performance in terms of video quality degradation ($\Delta P$) is given as
$$\Delta P = P_{Org.} - P \tag{21}$$
where $P_{Org.}$ and $P$ are the peak signal-to-noise ratios (PSNR, dB) obtained by the original encoder and by the encoder with each method, respectively. In addition, the detection rate (DR, %) is defined as
$$DR(\%) = \frac{N}{N_z} \times 100\% \tag{22}$$
where $N$ is the number of ZQBs determined by the early detection algorithm prior to DCT/Q and $N_z$ is the total number of actual ZQBs. Tables II–VII show the PSNR degradation and detection rate compared with the original encoder. Note that a negative $\Delta P$ indicates a PSNR gain. On average, our method achieves a better detection rate with insignificant PSNR degradation.
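The two evaluation metrics (21) and (22) are straightforward to compute; the sketch below uses hypothetical per-block flags. It assumes that $N$ in (22) counts every block flagged by the detector, which is how the definition reads literally.

```python
def psnr_degradation(psnr_original, psnr_method):
    """Delta P from (21), in dB; a negative value means a PSNR gain."""
    return psnr_original - psnr_method


def detection_rate(actual, predicted):
    """DR from (22), in percent.

    actual[i] is True if block i is a true ZQB; predicted[i] is True if the
    early detector flagged it. N counts every flagged block (an assumption).
    """
    n = sum(1 for p in predicted if p)
    n_z = sum(1 for a in actual if a)
    return 100.0 * n / n_z
```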
Figure 1. The probability of the largest DCT coefficient occurring in Loc(i). (a) Akiyo, (b) Coastguard, (c) Foreman, (d) News, (e) Silent, and (f) Table Tennis
Table II
COMPARISONS OF PSNR DEGRADATION (dB) AND DETECTION RATE (%), Akiyo
QP | Xie [9] ∆P | Xie [9] DR | Wang [8] ∆P | Wang [8] DR | Proposed ∆P | Proposed DR
18 | 0.035 | 62.4 | 0.024 | 60.1 | 0.038 | 75.5
24 | 0.039 | 87.4 | 0.042 | 87.4 | 0.070 | 90.7
30 | 0.028 | 92.2 | 0.049 | 93.3 | 0.073 | 94.2
36 | 0.018 | 94.7 | 0.054 | 96.4 | 0.033 | 95.9
40 | 0.002 | 96.4 | 0.046 | 97.9 | 0.025 | 97.2
46 | 0.037 | 98.2 | 0.032 | 99.1 | 0.036 | 98.5
As Tables II–VII show, the detection rate of the proposed algorithm is higher than those of the other algorithms in most cases, particularly at low QPs, and its detection quality is about 90% on average. However, the PSNR of the proposed algorithm is slightly lower than those of the other algorithms in most cases. This is because the threshold of our method is higher than the others (i.e., more NZQBs are wrongly predicted as ZQBs).
To evaluate the efficiency of the detection algorithms, the detection quality (DQ) is used to compare their ability to detect ZQBs. DQ is defined as
$$DQ(\%) = \frac{(N_z + N_n) - (N_m + N_f)}{N_z + N_n} \times 100\% \tag{23}$$
where $N_z$ and $N_n$ are the total numbers of ZQBs and non-ZQBs (NZQBs), respectively, $N_m$ is the number of ZQBs wrongly predicted as NZQBs, and $N_f$ is the number of NZQBs falsely determined to be ZQBs. Therefore, a higher DQ means that a detection algorithm detects ZQBs more efficiently. As shown in Figure 2, the DQ curves obtained by our method are better than those of the comparative algorithms. Our method achieves an average DQ of 90% with negligible video quality degradation at different QPs.

Figure 2. Comparisons of Detection Quality. (a) Akiyo, (b) Coastguard, (c) Foreman, (d) News, (e) Silent, and (f) Table Tennis

Table III
COMPARISONS OF PSNR DEGRADATION (dB) AND DETECTION RATE (%), Coastguard
QP | Xie [9] ∆P | Xie [9] DR | Wang [8] ∆P | Wang [8] DR | Proposed ∆P | Proposed DR
18 | 0.038 | 16.9 | 0.035 | 14.7 | 0.058 | 23.4
24 | 0.076 | 27.0 | 0.069 | 22.6 | 0.111 | 40.9
30 | 0.107 | 50.5 | 0.094 | 45.3 | 0.154 | 65.2
36 | 0.095 | 75.9 | 0.087 | 74.1 | 0.119 | 85.5
40 | 0.055 | 88.4 | 0.064 | 86.9 | 0.095 | 93.9
46 | -0.024 | 96.3 | 0.018 | 96.6 | 0.003 | 97.9

Table IV
COMPARISONS OF PSNR DEGRADATION (dB) AND DETECTION RATE (%), Foreman
QP | Xie [9] ∆P | Xie [9] DR | Wang [8] ∆P | Wang [8] DR | Proposed ∆P | Proposed DR
18 | 0.045 | 31.8 | 0.036 | 30.9 | 0.081 | 41.5
24 | 0.145 | 60.2 | 0.125 | 58.1 | 0.185 | 70.6
30 | 0.127 | 81.6 | 0.131 | 81.9 | 0.162 | 87.0
36 | 0.086 | 91.5 | 0.112 | 92.7 | 0.110 | 94.0
40 | 0.072 | 94.7 | 0.067 | 95.8 | 0.090 | 96.2
46 | -0.016 | 96.6 | -0.024 | 97.8 | 0.022 | 97.2

Table V
COMPARISONS OF PSNR DEGRADATION (dB) AND DETECTION RATE (%), News
QP | Xie [9] ∆P | Xie [9] DR | Wang [8] ∆P | Wang [8] DR | Proposed ∆P | Proposed DR
18 | 0.055 | 43.8 | 0.046 | 42.0 | 0.076 | 58.5
24 | 0.072 | 78.6 | 0.064 | 77.5 | 0.097 | 84.2
30 | 0.074 | 87.2 | 0.068 | 87.6 | 0.096 | 90.7
36 | 0.067 | 91.7 | 0.058 | 92.7 | 0.080 | 94.0
40 | 0.025 | 94.7 | 0.047 | 95.2 | 0.053 | 96.5
46 | -0.002 | 96.8 | -0.029 | 97.8 | -0.011 | 97.8

Table VI
COMPARISONS OF PSNR DEGRADATION (dB) AND DETECTION RATE (%), Silent
QP | Xie [9] ∆P | Xie [9] DR | Wang [8] ∆P | Wang [8] DR | Proposed ∆P | Proposed DR
18 | 0.015 | 26.1 | 0.015 | 25.4 | 0.024 | 31.9
24 | 0.039 | 36.8 | 0.036 | 36.3 | 0.053 | 47.3
30 | 0.038 | 66.0 | 0.046 | 65.0 | 0.066 | 75.7
36 | 0.018 | 85.8 | 0.021 | 86.8 | 0.033 | 90.3
40 | 0.028 | 93.2 | 0.021 | 94.1 | 0.052 | 95.1
46 | 0.032 | 97.2 | 0.051 | 98.9 | 0.039 | 97.6
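Given per-block ground truth and detector decisions, DQ in (23) can be computed as below (a sketch; the names are hypothetical). Since $N_z + N_n$ is the total number of blocks, DQ is the percentage of blocks classified correctly.

```python
def detection_quality(actual, predicted):
    """DQ from (23), in percent.

    actual[i] is True if block i is a true ZQB; predicted[i] is True if the
    early detector flagged it as a ZQB.
    """
    n_z = sum(1 for a in actual if a)                                # ZQBs
    n_n = len(actual) - n_z                                          # NZQBs
    n_m = sum(1 for a, p in zip(actual, predicted) if a and not p)   # missed ZQBs
    n_f = sum(1 for a, p in zip(actual, predicted) if not a and p)   # false ZQBs
    return 100.0 * ((n_z + n_n) - (n_m + n_f)) / (n_z + n_n)
```

Unlike DR, DQ penalizes false detections, which is why a higher threshold can raise DR while leaving DQ roughly unchanged.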
Table VII
COMPARISONS OF PSNR DEGRADATION (dB) AND DETECTION RATE (%), Table Tennis
QP | Xie [9] ∆P | Xie [9] DR | Wang [8] ∆P | Wang [8] DR | Proposed ∆P | Proposed DR
18 | 0.036 | 22.1 | 0.035 | 20.2 | 0.060 | 33.9
24 | 0.101 | 47.2 | 0.088 | 44.8 | 0.149 | 58.3
30 | 0.133 | 68.9 | 0.120 | 68.9 | 0.168 | 77.5
36 | 0.098 | 85.6 | 0.093 | 87.5 | 0.133 | 90.0
40 | 0.115 | 93.6 | 0.087 | 94.3 | 0.119 | 95.8
46 | 0.068 | 97.1 | 0.088 | 97.2 | 0.122 | 98.1

Table VIII lists the operations required for transform and quantization and the number of additional operations needed to determine whether a 4 × 4 block is a zero quantized block. To compare the reduction in computational complexity, the computation saving rate (CSR) is calculated by
$$CSR(\%) = \left(1 - \frac{OP}{OP_{Org.}}\right) \times 100\% \tag{24}$$
where $OP_{Org.}$ is the total number of operations required by DCT/Q/DQ/IDCT in the original encoder and $OP$ is the number of operations required with each early detection approach. Figure 3 shows the average CSR of each approach at different QPs. As can be seen, our method saves up to 49% of the computation in DCT/Q/DQ/IDCT.
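The sketch below evaluates (24) with the per-block operation counts of Table VIII. The counting model is an assumption not stated in the text: every operator type has unit cost, the detector runs on every block, and DCT/Q/DQ/IDCT are performed only on blocks not flagged as ZQBs. The names are hypothetical.

```python
# Per-block operation counts from Table VIII (ADD + MUL + SFT + CMP),
# weighting every operator type equally (an assumption).
ORIGINAL_OPS = (80 + 16 + 32 + 0) + (64 + 16 + 16 + 0)   # DCT/Q + DQ/IDCT = 224
DETECTION_OPS = {"xie": 16 + 17 + 0 + 2,                 # 35
                 "wang": 10 + 0 + 6 + 9,                 # 25
                 "proposed": 16 + 17 + 0 + 2}            # 35


def csr(method, n_blocks, n_skipped):
    """CSR from (24), in percent.

    The detector runs on all n_blocks; DCT/Q/DQ/IDCT run only on the
    n_blocks - n_skipped blocks not flagged as ZQBs.
    """
    op_org = n_blocks * ORIGINAL_OPS
    op = n_blocks * DETECTION_OPS[method] + (n_blocks - n_skipped) * ORIGINAL_OPS
    return 100.0 * (1 - op / op_org)
```

Under this model, the proposed detector (35 operations per block against 224 for DCT/Q/DQ/IDCT) yields a positive CSR once more than 35/224 ≈ 15.6% of the blocks are skipped.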
V. CONCLUSION
This paper proposes an improved method for predicting ZQBs that accounts for the experimentally observed distribution of the largest DCT coefficients. The results show an insignificant PSNR degradation of 0.08 dB and a better detection quality of 90%. The results also show that the computation of DCT/Q/DQ/IDCT is reduced by up to 54.6%.
Table VIII
THE NUMBER OF REQUIRED OPERATIONS PER 4 × 4 BLOCK
Operator | Original DCT/Q | Original DQ/IDCT | Additional [9] | Additional [8] | Additional (Proposed)
ADD | 80 | 64 | 16 | 10 | 16
MUL | 16 | 16 | 17 | 0 | 17
SFT | 32 | 16 | 0 | 6 | 0
CMP | 0 | 0 | 2 | 9 | 2
Figure 3. Comparisons of computational saving rate (CSR, %).
REFERENCES
[1] D. Marpe, T. Wiegand, and G. J. Sullivan, “The H.264/MPEG4 advanced video coding standard and its applications,” IEEE Commun. Mag., vol. 44, no. 8, pp. 134–143, 2006.
[2] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: tools, performance, and complexity,” IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7–28, 2004.
[3] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[4] Y. H. Moon, G. Y. Kim, and J. H. Kim, “An improved early detection algorithm for all-zero blocks in H.264 video encoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1053–1057, Aug. 2005.
[5] L. A. Sousa, “General method for eliminating redundant computations in video coding,” Electronics Letters, vol. 36, no. 4, pp. 306–307, Feb. 17, 2000.
[6] H. Wang, S. Kwong, and C.-W. Kok, “Efficient prediction algorithm of integer DCT coefficients for H.264/AVC optimization,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 547–552, Apr. 2006.
[7] I.-M. Pao and M.-T. Sun, “Modeling DCT coefficients for fast video encoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 4, pp. 608–616, 1999.
[8] H. Wang and S. Kwong, “Hybrid model to detect zero quantized DCT coefficients in H.264,” IEEE Trans. Multimedia, vol. 9, no. 4, pp. 728–735, 2007.
[9] Z. Xie, Y. Liu, J. Liu, and T. Yang, “A general method for detecting all-zero blocks prior to DCT and quantization,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 237–241, 2007.
[10] ITU-T Recommendation H.264: Advanced video coding for generic audiovisual services, International Telecommunication Union, Nov. 2007. [Online]. Available: http://www.itu.int/rec/T-REC-H.264-200711-I/en
[11] I. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Wiley, 2003.
[12] H.264/AVC Reference Software JM11.0. [Online]. Available: http://www.iphome.hhi.de/suehring/tml/