An Early Detection Algorithm of All-Zero Blocks in H.264/AVC
through the Vector Operations
Bo-Jhih Chen and Shen-Chuan Tai
Institute of Computer and Communication Engineering, Department of Electrical Engineering,
National Cheng Kung University, Tainan, Taiwan
E-mail: [email protected]
Abstract
Motion estimation (ME) and the discrete cosine transform (DCT) are the two major parts of H.264/AVC, and together they account for most of the computation time. Many studies focus on fast motion estimation (FME) algorithms that reduce ME computations and thereby accelerate encoding. With FME already well optimized, the remaining parts of H.264/AVC, namely the integer DCT (I-DCT) and quantization, can no longer be neglected. We therefore propose an efficient method, based on vector operations, for the early detection of all-zero blocks (AZBs) before transform and quantization in H.264. Experimental results show that the proposed method is superior to other algorithms in terms of hit detection rate and computational complexity reduction, at the expense of an insignificant degradation of video quality.
Keywords: H.264/AVC, DCT, I-DCT, AZB
1. Introduction
Standard video coding technologies, such as the MPEG series, the H.26x series [1]-[4], and H.264/AVC [5], are widely applied in multimedia communications. Motion estimation, motion compensation, and discrete cosine transform / quantization are the major processing functions of these codec standards. H.264/AVC is the latest video coding standard released by the Joint Video Team (JVT).
H.264/AVC outperforms earlier video coding standards in coding performance [6]; however, an H.264/AVC encoder requires a considerable amount of computation. Much research has been dedicated to optimizing H.264/AVC, including fast motion estimation and low-complexity transform and quantization [7]-[12].
Several studies have concentrated on optimizing motion estimation, while others have optimized the remaining stages to further speed up H.264 encoding as a whole. One way to accelerate the encoder is to detect an all-zero block (AZB) before the transform process. Once a block is detected as an AZB, transform and quantization can be skipped. An AZB is a 4×4 block whose sixteen transformed coefficients are all zero after quantization. Xuan et al. [13] provided a simple method to check, for each residual block, whether the transform process can be omitted. In [14], Sousa theoretically derived a sufficient condition for AZB detection based on the sum of absolute differences (SAD) and the quantization parameter (QP). Moon et al. [15] proposed an approach derived from a theoretical observation of the distinct positions of transform coefficients in H.264.
In this paper, the proposed AZB detection method builds on the relationship between a quantized block and the multiplication factors (MF). First, we use a novel model that expresses the transform signals in vector form through the matrix direct product; three criteria for AZB detection are then derived from vector resultants. As the experimental results will show, the detection rate and the computational saving compare favorably with [14] and [15], so the proposed method can enhance detection ability while reducing computation.
This paper is organized as follows. The integer DCT and quantization of H.264, together with previous work on detecting all-zero blocks, are briefly introduced in Section 2. Our proposed method based on vector operations is presented in Section 3. Section 4 presents experimental results that verify the efficiency of the proposed method in accelerating H.264 encoding. Conclusions are given in Section 5.
2. Analysis of All Zero Block in encoder
2.1. Integer Discrete Cosine Transform and
Quantization in H.264
Each transformed coefficient, $y_{uv}$, $0 \le u,v \le 3$, is obtained by the integer DCT (I-DCT). The I-DCT operation in H.264 can be defined as [5]

$$\mathbf{Y} = (\mathbf{C}\mathbf{X}\mathbf{C}^{T}) \mathbin{.*} \mathbf{E} = \mathbf{W} \mathbin{.*} \mathbf{E} \tag{1}$$

where $\mathbf{X}$ is a 4×4 residual block with elements $x_{ij}$, $0 \le i,j \le 3$, and $\mathbf{C}$ and $\mathbf{C}^{T}$ are the integer transform matrices defined in H.264/AVC. The superscript “T” denotes the matrix transpose, and the operator “.*” denotes element-by-element multiplication at corresponding indices. The rows of $\mathbf{C}$ are mutually orthogonal but not normalized; the normalization is carried by $\mathbf{E}$, the post-scaling factor matrix defined in [5]. Both the transformed block and the residual block of H.264 are 4×4.
Each component of the transformed block ($\mathbf{W}$) is quantized according to a quantization parameter (QP). The quantization process of H.264/AVC is as follows:

$$q_{uv} = \operatorname{sign}(w_{uv}) \cdot \Big( \big( |w_{uv}| \cdot MF_{QP\_rem,idx} + c \big) \gg QBits \Big), \quad \operatorname{sign}(q_{uv}) = \operatorname{sign}(w_{uv}), \quad 0 \le u,v \le 3 \tag{2}$$

where $QBits = 15 + \lfloor QP/6 \rfloor$, $QP\_rem = QP \bmod 6$, the constant $c = 2^{QBits}/3$ for intra blocks or $c = 2^{QBits}/6$ for inter blocks, and the symbol “>>” denotes a binary right shift. The multiplication factors, $MF_{QP\_rem,idx}$, are predefined, depend on the coefficient position and on QP, and are listed in Table 1.
Table 1. Multiplication factor, MF.

(u,v)th position   (0,0), (0,2), (2,0), (2,2)   (1,1), (1,3), (3,1), (3,3)   Other positions
QP_rem             idx = 2                      idx = 0                      idx = 1
0                  13107                        5243                         8066
1                  11916                        4660                         7490
2                  10082                        4194                         6554
3                  9362                         3647                         5825
4                  8192                         3355                         5243
5                  7282                         2893                         4559
where idx = 2 – (u mod 2) – (v mod 2). After quantization, the quantized signal block Q is obtained by applying eq. (2) at all 16 positions.
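The quantization step of eq. (2) can be illustrated with a minimal Python sketch. The function names (`position_index`, `quantize`) are hypothetical, and the constant c is assumed to be rounded down to an integer, which the text above does not specify.

```python
# Multiplication factors from Table 1, stored as MF[QP_rem][idx].
MF = [
    [5243, 8066, 13107],
    [4660, 7490, 11916],
    [4194, 6554, 10082],
    [3647, 5825, 9362],
    [3355, 5243, 8192],
    [2893, 4559, 7282],
]


def position_index(u, v):
    """idx = 2 - (u mod 2) - (v mod 2), as defined after Table 1."""
    return 2 - (u % 2) - (v % 2)


def quantize(w, qp, intra=False):
    """Quantize a 4x4 block of transform coefficients following eq. (2)."""
    qbits = 15 + qp // 6
    # Assumption: c = 2^QBits / 3 (intra) or 2^QBits / 6 (inter), rounded down.
    c = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    q = [[0] * 4 for _ in range(4)]
    for u in range(4):
        for v in range(4):
            mf = MF[qp % 6][position_index(u, v)]
            level = (abs(w[u][v]) * mf + c) >> qbits
            q[u][v] = -level if w[u][v] < 0 else level
    return q
```

For example, with QP = 28 (QBits = 19, QP_rem = 4) an inter coefficient of 100 at position (0,0) is quantized to 1, while the same value at position (1,1) is quantized to 0 because its multiplication factor is smaller.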
2.2. Conventional AZB Detection Methods
The previous approaches for the early detection of all-zero blocks are listed below. In eq. (3), a threshold for detecting an AZB is derived theoretically from QP [14]:

$$SAD < TH_{Sousa}, \quad \text{where } TH_{Sousa} = \frac{TH(QP, 0)}{4} \tag{3}$$
Moon et al. [15] proposed an approach based on a theoretical observation of the transform coefficients, given in eq. (4):

$$SAD < TH_{Moon}, \quad TH_{Moon} = \min\{Th_1, Th_2\} \tag{4}$$

where

$$Th_1 = TH_{Sousa} + \frac{S_{min}}{4}, \quad Th_2 = \frac{TH(QP, 1)}{2}, \quad S_{min} = \min\{S_1 + S_2,\; S_3 + S_4\},$$

$$S_k = \sum_{i}\sum_{j} |x(i,j)|, \quad \forall (i,j) \in A_k, \quad 1 \le k \le 4,$$

and

$$A_1 = \{(i,j) \mid i = 0,3,\; j = 0,3\}, \quad A_2 = \{(i,j) \mid i = 0,3,\; j = 1,2\},$$
$$A_3 = \{(i,j) \mid i = 1,2,\; j = 1,2\}, \quad A_4 = \{(i,j) \mid i = 1,2,\; j = 0,3\}.$$

Both of the methods above detect AZBs from the SAD value.
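As an illustration of the SAD test in eq. (3), the following Python sketch checks a residual block against the Sousa threshold. It rests on an assumption that the text does not state: TH(QP_rem, idx) is taken to be (2^QBits − c) / MF, the largest coefficient magnitude that eq. (2) still quantizes to zero. The function names are hypothetical.

```python
# Multiplication factors from Table 1, stored as MF[QP_rem][idx].
MF = [
    [5243, 8066, 13107],
    [4660, 7490, 11916],
    [4194, 6554, 10082],
    [3647, 5825, 9362],
    [3355, 5243, 8192],
    [2893, 4559, 7282],
]


def threshold(qp, idx, intra=False):
    """Assumed form of TH(QP_rem, idx): (2^QBits - c) / MF."""
    qbits = 15 + qp // 6
    c = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    return ((1 << qbits) - c) / MF[qp % 6][idx]


def sad(residual):
    """Sum of absolute differences of a 4x4 residual block."""
    return sum(abs(x) for row in residual for x in row)


def is_azb_sousa(residual, qp, intra=False):
    """Eq. (3): SAD < TH_Sousa, with TH_Sousa = TH(QP, 0) / 4."""
    return sad(residual) < threshold(qp, 0, intra) / 4
```

Under this assumption and with QP = 28, TH_Sousa is about 32.6, so an inter block whose sixteen residuals are all 1 (SAD = 16) is classified as an AZB, whereas a block of all 3s (SAD = 48) is not.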
3. Proposed Method Based on the Vector
Operation
3.1. The Representation of DCT signal
The integer DCT of H.264 was given in eq. (1). It consists of a row transform followed by a column transform, which can be depicted as the butterfly signal flow diagram in Fig. 1.
The input signal is a 4×4 residual block, $x_{ij}$; $w'_{uv}$ is the intermediate result of the row transform; and the output signal is the transformed block, $w_{uv}$, $0 \le u,v \le 3$, obtained through the row and column transforms. M0-M4 with the subscripts “row” and “col” are the intermediate signals of the row and column transforms, respectively.
Each transformed signal is a weighted combination of the residual signals. The weights can be arranged as a 4×4 matrix, so the transformed signal at the (u,v)th position can be formed as follows:
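$$w_{uv} = \mathbf{v}_{uv} \mathbin{.*} \mathbf{X} =
\begin{bmatrix}
v_{00} & v_{01} & v_{02} & v_{03} \\
v_{10} & v_{11} & v_{12} & v_{13} \\
v_{20} & v_{21} & v_{22} & v_{23} \\
v_{30} & v_{31} & v_{32} & v_{33}
\end{bmatrix}
\mathbin{.*}
\begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} \\
x_{10} & x_{11} & x_{12} & x_{13} \\
x_{20} & x_{21} & x_{22} & x_{23} \\
x_{30} & x_{31} & x_{32} & x_{33}
\end{bmatrix} \tag{5}$$

Each residual signal ($x_{ij}$) is multiplied by the weight value ($v_{ij}$) at the same position in the weight matrix $\mathbf{v}_{uv}$, and the sixteen products are summed to give $w_{uv}$.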
To simplify the later implementation, we use the “matrix direct product” to represent the transform process:

$$vec(\mathbf{W}) = \mathbf{H} \cdot vec(\mathbf{X}) \tag{6}$$

where vec(.) denotes a vector of size 16×1; that is, vec(X) is the one-dimensional vector of the 4×4 residual block, and the transform block is represented as vec(W):

$$vec(\mathbf{X}) = ( x_{00} \; x_{01} \; x_{02} \; x_{03} \; x_{10} \; \cdots \; x_{33} )^{T}, \quad
vec(\mathbf{W}) = ( w_{00} \; w_{01} \; w_{02} \; w_{03} \; w_{10} \; \cdots \; w_{33} )^{T} \tag{7}$$

The matrix H is a 16×16 matrix defined as
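$$\mathbf{H} = \mathbf{C} \otimes \mathbf{C} \tag{8}$$

where the operator “⊗” denotes the “matrix direct product”, whose elements are defined by

$$h_{mn} = c_{ij}\, c_{kl} \tag{9}$$

where $m = 4i + k$ and $n = 4j + l$. C is the 4×4 matrix defined in H.264. H can also be written as a stack of row vectors $\mathbf{H}_p$: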
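$$\mathbf{H} =
\begin{bmatrix} \mathbf{H}_0 \\ \mathbf{H}_1 \\ \mathbf{H}_2 \\ \vdots \\ \mathbf{H}_p \end{bmatrix}
=
\begin{bmatrix}
h_{0,0} & h_{0,1} & h_{0,2} & \cdots & h_{0,15} \\
h_{1,0} & h_{1,1} & h_{1,2} & \cdots & h_{1,15} \\
h_{2,0} & h_{2,1} & h_{2,2} & \cdots & h_{2,15} \\
\vdots & & & & \vdots \\
h_{p,0} & h_{p,1} & h_{p,2} & \cdots & h_{p,15}
\end{bmatrix} \tag{10}$$

and $p = 4u + v$. For instance, the transform signal at position (0,0) is given by $w_{00} = \mathbf{H}_0 \cdot vec(\mathbf{X})$, the transform signal at position (0,1) is $w_{01} = \mathbf{H}_1 \cdot vec(\mathbf{X})$, and so on. Therefore, each element of a transformed block can be written as

$$w_p = \mathbf{H}_p \cdot vec(\mathbf{X}) \tag{11}$$

where $\mathbf{H}_p$ is the 1×16 row vector of H with index p.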
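The equivalence between eq. (1) and the vector form of eqs. (6)-(11) can be checked numerically. The sketch below is illustrative only: the entries of C are the forward core transform of the H.264 standard and are not listed in the text above, and the helper names are hypothetical.

```python
# Forward core transform matrix of H.264 (taken from the standard, not from the text above).
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Eqs. (8)-(9): h[m][n] = c[i][j] * c[k][l] with m = 4i + k and n = 4j + l.
H = [[C[m // 4][n // 4] * C[m % 4][n % 4] for n in range(16)] for m in range(16)]


def vec(block):
    """Row-major 16-element vector of a 4x4 block, as in eq. (7)."""
    return [block[i][j] for i in range(4) for j in range(4)]


def transform_matrix_form(x):
    """W = C X C^T, computed entry by entry."""
    return [[sum(C[u][i] * x[i][j] * C[v][j] for i in range(4) for j in range(4))
             for v in range(4)] for u in range(4)]


def transform_vector_form(x):
    """vec(W) = H . vec(X), eq. (6); entry p = 4u + v is w_uv, eq. (11)."""
    vx = vec(x)
    return [sum(H[p][n] * vx[n] for n in range(16)) for p in range(16)]


# Arbitrary residual block used only to compare the two forms.
X = [[5, -3, 0, 2], [1, 4, -2, 0], [0, -1, 3, 1], [2, 0, -4, 6]]
assert transform_vector_form(X) == vec(transform_matrix_form(X))
```

The row-major ordering of vec(.) is what makes C ⊗ C, rather than some other arrangement of the products, reproduce the matrix form.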
3.2. Proposed Method of AZB Detection by
Vector Resultants
Determining directly whether every transform coefficient is zero after quantization requires a large amount of computation, because each coefficient must be compared against its threshold TH(QP_rem,idx).
Given a quantization parameter (QP), the multiplication factor (MF) listed in Table 1 is readily obtained and yields TH(QP_rem,idx). The transform block (W) and the thresholds (TH) are regarded as two 4×4 matrices indexed by the (u,v)th position. These two matrices are represented in Fig. 2.

(a)
(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

(b)
(*,2) (*,1) (*,2) (*,1)
(*,1) (*,0) (*,1) (*,0)
(*,2) (*,1) (*,2) (*,1)
(*,1) (*,0) (*,1) (*,0)

Figure 2. (a) A 4×4 transform signal block of w_uv; (b) a 4×4 matrix of thresholds, TH(*,idx).
When QP_rem is fixed, the threshold TH(*,idx) is constant for each idx in {0, 1, 2}. The 16 transform elements can therefore be classified into three sets according to their (u,v)th position. S_idx is the set of transform signals that share the same value of idx.
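The threshold TH(QP_rem,idx) is not written out explicitly above. One form consistent with eq. (2), offered here as an assumption rather than as the definition used in the experiments, follows from requiring the quantized level to be zero:

```latex
\begin{aligned}
q_{uv} = 0
  &\iff \big(|w_{uv}| \cdot MF_{QP\_rem,idx} + c\big) \gg QBits = 0 \\
  &\iff |w_{uv}| \cdot MF_{QP\_rem,idx} + c < 2^{QBits} \\
  &\iff |w_{uv}| < \frac{2^{QBits} - c}{MF_{QP\_rem,idx}} \;=:\; TH(QP\_rem, idx).
\end{aligned}
```

For example, an inter block at QP = 28 has QBits = 19, QP_rem = 4 and c ≈ 87381, which gives TH(4,2) ≈ 53.3, TH(4,1) ≈ 83.3 and TH(4,0) ≈ 130.2 with the multiplication factors of Table 1.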
If all transform signals of the three sets in a transform block are simultaneously less than the threshold TH(QP_rem,idx), the block is an AZB. According to eq. (11), the transform signals can be represented as follows:

$$S_{idx} =
\begin{cases}
S_2 = \{ \mathbf{H}_p \cdot vec(\mathbf{X}) \mid u \bmod 2 + v \bmod 2 = 0 \} \\
S_1 = \{ \mathbf{H}_p \cdot vec(\mathbf{X}) \mid u \bmod 2 + v \bmod 2 = 1 \} \\
S_0 = \{ \mathbf{H}_p \cdot vec(\mathbf{X}) \mid u \bmod 2 + v \bmod 2 = 2 \}
\end{cases} \tag{12}$$

where $0 \le p \le 15$ and $p = 4u + v$.
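To reduce the computation of AZB detection, we find vector resultants that represent these three sets. Using the “vector addition” operation, an identical vector (i.e., a vector resultant) is obtained from the row vectors $\mathbf{H}_p$ that belong to $S_{idx}$. The three identical vectors of $S_{idx}$ are defined as

$$\begin{aligned}
\mathbf{H}_{id(2)} &= \mathbf{H}_0 \oplus \mathbf{H}_2 \oplus \mathbf{H}_8 \oplus \mathbf{H}_{10} \\
\mathbf{H}_{id(1)} &= \mathbf{H}_1 \oplus \mathbf{H}_3 \oplus \mathbf{H}_4 \oplus \mathbf{H}_6 \oplus \mathbf{H}_9 \oplus \mathbf{H}_{11} \oplus \mathbf{H}_{12} \oplus \mathbf{H}_{14} \\
\mathbf{H}_{id(0)} &= \mathbf{H}_5 \oplus \mathbf{H}_7 \oplus \mathbf{H}_{13} \oplus \mathbf{H}_{15}
\end{aligned} \tag{13}$$

where $\mathbf{H}_{id(idx)}$ is the identical vector of $S_{idx}$, and the operator “⊕” denotes the “vector addition” operation. Eq. (12) can then be rewritten as

$$S_{idx} =
\begin{cases}
S_2 = \mathbf{H}_{id(2)} \cdot vec(\mathbf{X}) \\
S_1 = \mathbf{H}_{id(1)} \cdot vec(\mathbf{X}) \\
S_0 = \mathbf{H}_{id(0)} \cdot vec(\mathbf{X})
\end{cases} \tag{14}$$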
According to the above procedure, the condition for AZB detection is defined as

$$S_{idx} < TH(QP\_rem, idx), \quad \text{for } idx = 0, 1, \text{ and } 2 \tag{15}$$

Consequently, the transformed signals converge into three classes, as shown in Fig. 3.
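The procedure of eqs. (13)-(15) can be sketched in Python as follows. Several points are assumptions and not taken from the text: the entries of C come from the H.264 standard, “vector addition” is read as an element-wise sum of row vectors, the threshold is taken as (2^QBits − c) / MF, and the magnitude of each S_idx is compared against it. All function names are hypothetical.

```python
# Forward core transform matrix of H.264 (from the standard, not from the text above).
C = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# Multiplication factors from Table 1, stored as MF[QP_rem][idx].
MF = [
    [5243, 8066, 13107], [4660, 7490, 11916], [4194, 6554, 10082],
    [3647, 5825, 9362], [3355, 5243, 8192], [2893, 4559, 7282],
]

# Eqs. (8)-(9): H = C (x) C, with h[m][n] = c[i][j] * c[k][l], m = 4i + k, n = 4j + l.
H = [[C[m // 4][n // 4] * C[m % 4][n % 4] for n in range(16)] for m in range(16)]


def identical_vector(idx):
    """Eq. (13): element-wise sum of the rows H_p whose position (u, v) has this idx."""
    rows = [H[4 * u + v] for u in range(4) for v in range(4)
            if 2 - (u % 2) - (v % 2) == idx]
    return [sum(column) for column in zip(*rows)]


# The three identical vectors depend only on C, so they are computed once.
H_ID = [identical_vector(idx) for idx in range(3)]


def threshold(qp, idx, intra=False):
    """Assumed form of TH(QP_rem, idx): (2^QBits - c) / MF."""
    qbits = 15 + qp // 6
    c = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    return ((1 << qbits) - c) / MF[qp % 6][idx]


def is_azb_proposed(residual, qp, intra=False):
    """Eqs. (14)-(15): three inner products, each compared with its threshold."""
    x = [e for row in residual for e in row]
    for idx in range(3):
        s = sum(h * e for h, e in zip(H_ID[idx], x))
        # Assumption: the magnitude of S_idx is what is compared in eq. (15).
        if abs(s) >= threshold(qp, idx, intra):
            return False
    return True
```

With this reading, H_id(2) is nonzero only at the four corner positions of the block, so S_2 depends on x_00, x_03, x_30 and x_33 alone. A block whose only nonzero residual is x_11 = 100 passes all three tests at QP = 28 although its coefficient w_00 = 100 quantizes to 1, which is consistent with the nonzero false detection rates reported for the proposed method in Table 3.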
4. Experimental Results
In this section, the H.264 reference software JM9.8 [16] is used for the experiments. Six test video sequences (“Akiyo”, “Coastguard”, “Foreman”, “News”, “Silent”, and “Table Tennis”), each in CIF format (352×288), are used to evaluate the performance of the proposed method. The simulation is configured with one reference frame, an IPPP frame structure (100 frames), rate-distortion optimization off (RDO = 0), and quantization parameters (QP) of 24, 28, 32, 36, and 40. The proposed method is compared with the algorithms discussed in [14] and [15] as follows.
The proposed method is compared with the JM reference software and the other algorithms in terms of peak signal-to-noise ratio (PSNR). As Table 2 shows, the PSNR degradation ranges from 0.07 dB to 0.25 dB over the six video sequences and five QPs. On average, the PSNR drop is about 0.15 dB, which is small enough to have no noticeable effect on perceived visual quality.
To evaluate the overall performance and to compare the AZB detection capability of the algorithms, the hit detection rate (HDR) and the false detection rate (FDR) are used as the key measures. They are defined as

$$HDR = \frac{N_{detected\_AZB}}{N_{actual\_AZB}} \times 100\%, \quad FDR = \frac{N_{false\text{-}detected\_NAZB}}{N_{actual\_NAZB}} \times 100\% \tag{16}$$
where $N_{detected\_AZB}$ is the number of AZBs detected by each of the three early detection algorithms, $N_{miss\text{-}detected\_AZB}$ is the number of AZBs that are not detected, and $N_{false\text{-}detected\_NAZB}$ is the number of NAZBs (non-all-zero blocks, whose transformed coefficients are not all zero) that are detected as AZBs. A higher HDR means that the algorithm correctly detects more of the actual AZBs. A smaller FDR yields less PSNR degradation and better preserves visual quality.

Table 2. Comparisons of PSNR (dB)

QP    JM     [14]   [15]   Proposed    JM     [14]   [15]   Proposed
      Akiyo                            News
24    42.07  42.06  42.07  41.94       40.66  40.64  40.66  40.53
28    39.75  39.75  39.75  39.61       38.12  38.12  38.12  38.00
32    37.15  37.15  37.15  37.02       35.33  35.33  35.33  35.21
36    34.79  34.78  34.79  34.66       32.65  32.63  32.65  32.54
40    32.35  32.35  32.35  32.25       29.96  29.96  29.96  29.90
      Coastguard                       Silent
24    37.36  37.36  37.36  37.17       38.51  38.51  38.51  38.43
28    34.36  34.36  34.36  34.11       35.81  35.81  35.81  35.72
32    31.27  31.27  31.27  31.03       33.21  33.21  33.21  33.13
36    28.66  28.64  28.66  28.44       31.00  31.01  31.00  30.93
40    26.50  26.50  26.50  26.33       28.94  28.94  28.94  28.86
      Foreman                          Table Tennis
24    39.06  39.05  39.06  38.85       38.31  38.30  38.31  38.14
28    36.67  36.67  36.67  36.46       35.54  35.54  35.54  35.35
32    34.23  34.23  34.23  34.03       32.77  32.77  32.77  32.62
36    32.02  32.01  32.02  31.83       30.53  30.52  30.53  30.39
40    29.77  29.77  29.77  29.60       28.75  28.75  28.75  28.63

Figure 3. Signal flow after processing
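The two rates of eq. (16) can be computed from per-block labels as in the sketch below. The function name and the boolean-list interface are hypothetical, and the numerator of the HDR is read as the number of actual AZBs that the detector flags, which is an interpretation of the definition rather than a statement from the text.

```python
def detection_rates(actual_azb, detected_azb):
    """Return (HDR, FDR) in percent, following eq. (16).

    actual_azb[i]   -- True if block i is all-zero after transform and quantization
    detected_azb[i] -- True if the early detector flags block i as an AZB
    """
    pairs = list(zip(actual_azb, detected_azb))
    n_actual_azb = sum(1 for actual, _ in pairs if actual)
    n_actual_nazb = len(pairs) - n_actual_azb
    # Hits: actual AZBs that the detector flags (assumed meaning of N_detected_AZB).
    n_hit = sum(1 for actual, detected in pairs if actual and detected)
    # False detections: NAZBs that the detector flags as AZBs.
    n_false = sum(1 for actual, detected in pairs if detected and not actual)
    hdr = 100.0 * n_hit / n_actual_azb if n_actual_azb else 0.0
    fdr = 100.0 * n_false / n_actual_nazb if n_actual_nazb else 0.0
    return hdr, fdr
```

For four actual AZBs of which three are flagged, and two NAZBs of which one is flagged, the function returns an HDR of 75% and an FDR of 50%.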
Table 3. Comparisons of the false detection rate (FDR, %) of three methods

QP    [14]   [15]   Proposed    [14]   [15]   Proposed
      Akiyo                     News
24    0      0      7.39        0      0      7.56
28    0      0      9.64        0      0      8.82
32    0      0      9.32        0      0      9.39
36    0      0      11.95       0      0      11.04
40    0      0      13.23       0      0      12.95
      Coastguard                Silent
24    0      0      3.58        0      0      4.78
28    0      0      6.30        0      0      6.82
32    0      0      9.45        0      0      9.78
36    0      0      12.91       0      0      12.74
40    0      0      16.40       0      0      15.09
      Foreman                   Table Tennis
24    0      0      7.65        0      0      5.41
28    0      0      10.42       0      0      8.19
32    0      0      11.66       0      0      10.58
36    0      0      12.14       0      0      12.59
40    0      0      12.81       0      0      12.73
The comparative results for HDR are illustrated in Fig. 4, where the horizontal axis is QP and the vertical axis is the HDR. The proposed method is superior to the other two algorithms because its detection condition is less restrictive, so most of the all-zero blocks in the video sequences are detected. The blocks detected by the proposed method account for approximately 60% to 94% of all AZBs.
Table 3 shows the FDR of the proposed method and of the other two algorithms. The FDR of Sousa [14] and Moon [15] is zero in both cases because they are lossless algorithms: neither of them classifies an NAZB as an AZB. In our simulations, the FDR of the proposed method ranges from about 3% to 16% and is highest at the larger quantization parameters. This explains why the video quality in Table 2 degrades when the proposed method is applied in the video coding procedure.
To further evaluate the proposed method, we consider the overall computation required for AZB detection within the whole encoding process. The computations required by the AZB detection algorithms are shown in Table 4.
The computation saving rate (CSR) is defined as

$$CSR = \left( 1 - \frac{C_{method}}{C_{JM}} \right) \times 100\% \tag{17}$$

where $C_{method}$ is the number of DCT/Q encoding operations for a given AZB detection method, and $C_{JM}$ is the total number of DCT/Q encoding operations in the original encoder (JM).
Table 4. Required computational operations of DCT/Q per block

              Transform / Quantization     (Inverse-)Transform / Quant.
Algorithms    ADD / MUL    SHIFT / CMP     ADD / MUL    SHIFT / CMP
4×4 DCT       64 / 0       16 / 0          64 / 32      16 / 0
Additional operations (prediction of AZB)
[14]          15 / 0       0 / 1           0            0
[15]          18 / 2       0 / 3           0            0
Proposed      28 / 6       0 / 3           0            0
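To show how eq. (17) combines with the per-block counts of Table 4, the sketch below estimates the CSR from the fraction of blocks a method flags as AZBs. The cost model is an assumption made for illustration only: every operation type is weighted equally, the detection test runs on every block, and a flagged block skips both the forward and the inverse path. The names and the `detected_fraction` input are hypothetical.

```python
# Per-block operation counts read from Table 4 (ADD + MUL + SHIFT + CMP).
# Assumption: all operation types cost the same.
DCTQ_OPS = (64 + 0 + 16 + 0) + (64 + 32 + 16 + 0)  # forward plus inverse path
DETECT_OPS = {
    "[14]": 15 + 0 + 0 + 1,
    "[15]": 18 + 2 + 0 + 3,
    "proposed": 28 + 6 + 0 + 3,
}


def computation_saving_rate(method, detected_fraction):
    """Eq. (17) under the assumed cost model, returned in percent.

    detected_fraction -- share of all blocks flagged as AZB (between 0 and 1)
    """
    c_jm = DCTQ_OPS
    # Every block pays for the test; only unflagged blocks pay for DCT/Q.
    c_method = DETECT_OPS[method] + (1.0 - detected_fraction) * DCTQ_OPS
    return (1.0 - c_method / c_jm) * 100.0
```

Under this model a method only saves computation once the fraction of flagged blocks exceeds the ratio of its detection cost to the DCT/Q cost, which is 37/192, or roughly 19%, for the proposed method.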
Fig. 5 compares the CSR of the methods for detecting AZBs, that is, how much computation each method removes from the encoding process. On average, the proposed method clearly achieves a larger reduction in computation than the other algorithms.

Figure 4. HDR curves of the six video sequences: (a) Akiyo, (b) Coastguard, (c) Foreman, (d) News, (e) Silent, (f) Table Tennis.

Figure 5. Computation saving rate (CSR) of the proposed method compared to [14] and [15] over six video sequences. Horizontal axis: QP (24, 28, 32, 36, 40); vertical axis: computation saving, 0% to 60%.
5. Conclusions
In this paper, we propose an efficient method based on vector operations to detect AZBs prior to transformation and quantization. Simulation results show that a significant improvement in the detection rate and in computation saving can be achieved with negligible video-quality degradation. The proposed method can therefore be used to enhance detection ability while reducing computation.
Acknowledgments
This work was supported by the National Science Council, Taiwan, R.O.C. under the Grant No. NSC 96-2221-E-006-014.
References
[1] G. K. Wallace, “The JPEG still picture compression standard,” Communications of the ACM, vol. 34, pp. 30-44, Apr. 1991.
[2] “The MPEG-2 international standard,” ISO/IEC, Reference number ISO/IEC 13818-2, 1996.
[3] “MPEG-4 Video Verification Model Version 18.0,” ISO/IEC JTC1/SC29/WG11 N3908, Pisa, Italy, Jan. 2001.
[4] “Video Coding for Low Bit Rate Communication,” ITU-T Recommendation. H.263, Feb. 1998.
[5] ISO/IEC MPEG 14496-10:2003, “Coding of Audiovisual Object – Part 10: Advanced Video Coding,” 2003, also ITU-T Recommendation H.264 “Advanced video coding for generic audiovisual services.”
[6] T. Wiegand, H. Schwarz, A. Joch, and F. Kossentini, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. on Circuits and Syst. for Video Technology, vol. 13, pp. 688-703, July 2003.
[7] C. Zhu, W. S. Qi, and W. Ser, “Predictive fine granularity successive elimination for fast optimal block-matching motion estimation,” IEEE Trans. Image Processing, vol. 14, no. 2, pp. 213–221, Feb. 2005.
[8] S.-C. Tai, Y.-R. Chen, and S.-J. Li, “Low complexity variable-size block-matching motion estimation for adaptive motion compensation block size in H.264,” in Proc. IEEE Asia-Pacific Conference on Circuits and Syst., pp. 613-616, 2004.
[9] A. Ahmad, N. Khan, S. Masud, and M. A. Maud, “Efficient block size selection in H.264 video coding standard,” Electronics Letters, vol. 40, no. 1, pp. 19–21, Jan. 2004.
[10] A. Hallapuro, M. Karczewicz, and H. Malvar, “Low complexity transform and quantization,” in Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Docs. JVT-B038 and JVT-B039, Jan. 2002.
[11] A. Hallapuro and M. Karczewicz, “Low complexity (I) DCT,” in Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Docs. VCEG-N43, Sept. 2001.
[12] H. S. Malvar, “Low-Complexity length-4 transform and quantization with 16-bit arithmetic,” in ITU-T SG16, Doc. VCEG-N44, Sept. 2001.
[13] Z. Xuan, Z. Yu and S. Yu, “Method for detecting all-zero DCT coefficients ahead of discrete cosine transformation and quantization,” Electronics Letters, vol. 34, No. 19, pp. 1839-1840, Sep. 1998.
[14] L. A. Sousa, “General method for eliminating redundant computations in video coding,” Electronics Letters, vol. 36, no. 4, pp. 306-307, Feb. 2000.
[15] Y. H. Moon, G. Y. Kim, and J. H. Kim, “An improved early detection algorithm for all-zero blocks in H.264 video encoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1053-1057, Aug. 2005.
[16] H.264 Reference Software JM9.8, [Online] Available: http://iphome.hhi.de/suehring/tml/