A REFINED FAST 2-D DISCRETE COSINE TRANSFORM ALGORITHM WITH REGULAR BUTTERFLY STRUCTURE
Yuh-Ming Huang¹,², Ja-Ling Wu¹,², and Chiou-Ting Hsu¹
¹Department of Computer Science and Information Engineering
National Taiwan University, Taipei, 107
²Department of Information Engineering, National Chi-Nan University, Puli, 545
Taiwan, Republic of China
Abstract
In this paper, a fast algorithm for computing the two-dimensional discrete cosine transform (2-D DCT) is derived based on index permutation. As a result, the computation of an (N×N)-point 2-D DCT requires only N N-point 1-D DCTs and some post-additions. Furthermore, compared with [7], the derivation of the refined algorithm is more succinct, and the associated post-addition stage possesses a more regular butterfly structure. This regular structure makes the proposed algorithm more suitable for VLSI and parallel implementations.
I. Introduction
Since the DCT approaches the statistically optimal Karhunen-Loeve transform (KLT) for highly correlated signals, it has found wide application in speech and image processing as well as telecommunication signal processing for the purposes of data compression, feature extraction, image reconstruction, and filtering. Thus, many algorithms and VLSI architectures for the fast computation of the DCT have been proposed [1]-[6].
Among those algorithms, [5] and [6] are believed to be the most efficient 2-D DCT algorithms in the sense of minimizing any measure of computational complexity. However, the main drawbacks of these algorithms are the complex computation required by [5] and the complicated matrix decomposition required by [6]. Moreover, the non-modularized structure of these algorithms may complicate the design and control of a concurrent VLSI implementation.
Recently, Cho and Lee [7] proposed a fast modularized DCT algorithm in which an (N×N)-point 2-D DCT is obtained by computing N N-point 1-D DCTs followed by a post-addition stage. In a later work [9], they also provided regular expressions for the input-output relations of the post-addition stage. However, the number of required additions increases as the price of the improved structural regularity.
Based on the idea of [7], this paper proposes an index-permutation-based algorithm for computing the 2-D DCT. Although the resultant computational complexity is the same as that of [7], the derivation of the refined algorithm is more succinct, and its post-addition stage has a more regular butterfly structure.
II. The Refined Fast Algorithm for Computing the 2-D DCT
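For a given input data sequence $f_{i,j}$, $0 \le i \le N-1$, $0 \le j \le N-1$, the 2-D DCT and its inverse are given by [1]

$$F_{m,n} = \frac{2}{N}\, c_m c_n \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f_{i,j} \cos\frac{(2i+1)m\pi}{2N} \cos\frac{(2j+1)n\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{1-1}$$

and

$$f_{i,j} = \frac{2}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} c_m c_n F_{m,n} \cos\frac{(2i+1)m\pi}{2N} \cos\frac{(2j+1)n\pi}{2N}, \quad 0 \le i \le N-1,\ 0 \le j \le N-1, \tag{1-2}$$

where

$$c_k = \begin{cases} 1/\sqrt{2}, & \text{if } k = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{1-3}$$

For convenience, the normalization factors $c_m$ and $c_n$ are not included in the following derivations. Thus, the denormalized 2-D DCT can be expressed as

$$Y_{m,n} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f_{i,j} \cos\frac{(2i+1)m\pi}{2N} \cos\frac{(2j+1)n\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{2}$$

and $F_{m,n} = (2/N)\, c_m c_n Y_{m,n}$. After some permutation of the input data sequence [2], eqn. (2) can be written as

$$Y_{m,n} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j} \cos\frac{(4i+1)m\pi}{2N} \cos\frac{(4j+1)n\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{3}$$

where

$$x_{i,j} = \begin{cases} f_{2i,\,2j}, & 0 \le i, j \le N/2-1, \\ f_{2i,\,2N-2j-1}, & 0 \le i \le N/2-1,\ N/2 \le j \le N-1, \\ f_{2N-2i-1,\,2j}, & N/2 \le i \le N-1,\ 0 \le j \le N/2-1, \\ f_{2N-2i-1,\,2N-2j-1}, & N/2 \le i, j \le N-1. \end{cases} \tag{4}$$

Based on the idea of [7], let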
$$A_{m,n} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j} \cos\frac{[(4i+1)m + (4j+1)n]\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{5-1}$$

and

$$B_{m,n} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j} \cos\frac{[(4i+1)m - (4j+1)n]\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{5-2}$$

then

$$Y_{m,n} = \tfrac{1}{2}\,(A_{m,n} + B_{m,n}). \tag{6}$$

Since $4i+1$ and $N$ are coprime to each other, i.e. $(4i+1, N) = 1$, the permutation $(4i+1)j+i$ modulo $N$ maps onto all values of $j$. Let $q_{i,j}$ be the quotient of $(4i+1)j+i$ divided by $N$. Hence, the kernels of the 2-D transforms in eqn. (5-1) and eqn. (5-2) can be rewritten as those of 1-D DCTs by replacing $j$ with $(4i+1)j+i-Nq_{i,j}$. That is,

$$A_{m,n} = \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} x_{i,\langle (4i+1)j+i \rangle_N} \cos\frac{(4i+1)(m + (4j+1)n)\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{7-1}$$

and

$$B_{m,n} = \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} x_{i,\langle (4i+1)j+i \rangle_N} \cos\frac{[(4i+1)(m - (4j+1)n) + 4Nq_{i,j}n]\pi}{2N}, \quad 0 \le m \le N-1,\ 0 \le n \le N-1, \tag{7-2}$$

where $\langle x \rangle_N$ denotes $x$ modulo $N$. For simplicity of notation,
$x_{i,\langle (4i+1)j+i \rangle_N}$ is denoted as $\tilde{x}_{j,i}$. Then, it can be seen that the 2-D input data sequence is grouped into $N$ distinct data sets of size $N$, that is, $\{\tilde{x}_{j,i},\ 0 \le i \le N-1\}$, $0 \le j \le N-1$. The expressions

$$\sum_{i=0}^{N-1} \tilde{x}_{j,i} \cos\frac{(4i+1)(m + (4j+1)n)\pi}{2N} \tag{8-1}$$

and

$$\sum_{i=0}^{N-1} \tilde{x}_{j,i} \cos\frac{(4i+1)(m - (4j+1)n)\pi}{2N} \tag{8-2}$$

each correspond to one of the 1-D DCT coefficients of the data sequence $\{\tilde{x}_{j,i}\}$, or are equal to zero, depending on $m$ and $n$. That is, by defining

$$h_{j,l} = \sum_{i=0}^{N-1} \tilde{x}_{j,i} \cos\frac{(4i+1)l\pi}{2N}, \tag{9}$$

we can see that eqn. (8-1) and eqn. (8-2) each correspond to one of $+h_{j,l}$ and $-h_{j,l}$ for some $l = 0, 1, \ldots, N-1$, or are equal to zero.
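The two facts used above can be checked numerically: for a power-of-two N the map j → (4i+1)j+i modulo N is a permutation of {0, ..., N-1}, and any integer multiplier L in the kernel cos((4i+1)Lπ/2N) can be folded to an index l in {0, ..., N-1} with a sign, or to zero. The following is a minimal sketch in Python; the helper names `permuted_column` and `fold_index` are illustrative and do not appear in the paper.

```python
import math


def permuted_column(i, j, N):
    """Index <(4i+1)j + i>_N used to regroup the input data."""
    return ((4 * i + 1) * j + i) % N


def fold_index(L, N):
    """Return (sign, l) with 0 <= l <= N-1 such that, for every integer i,
    cos((4i+1)*L*pi/(2N)) == sign * cos((4i+1)*l*pi/(2N)).
    sign is +1, -1, or 0 (the kernel vanishes for every i)."""
    L = L % (4 * N)        # the kernel has period 4N in L
    if L > 2 * N:          # cosine is even: L -> 4N - L leaves it unchanged
        L = 4 * N - L
    if L == N:             # cos((4i+1)*pi/2) = 0
        return 0, 0
    if L < N:
        return 1, L
    return -1, 2 * N - L   # N < L <= 2N: L -> 2N - L flips the sign


if __name__ == "__main__":
    N = 8
    # Each row index i yields a full permutation of the column indices.
    for i in range(N):
        cols = sorted(permuted_column(i, j, N) for j in range(N))
        print(i, cols == list(range(N)))
    # Example of folding the multiplier m + (4j+1)n for m=3, j=2, n=5.
    print(fold_index(3 + (4 * 2 + 1) * 5, N))
```

The folding rule is what allows each term of eqn. (8-1) and eqn. (8-2) to be read off a table of 1-D transform outputs instead of being recomputed.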
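Besides, through an index permutation, eqn. (9) can be implemented by a 1-D DCT as follows:

$$h_{j,l} = \sum_{i=0}^{N-1} \bar{x}_{j,i} \cos\frac{(2i+1)l\pi}{2N}, \tag{10-1}$$

where

$$\bar{x}_{j,2i} = \tilde{x}_{j,i},\ \ 0 \le i \le N/2-1, \qquad \bar{x}_{j,2N-2i-1} = \tilde{x}_{j,i},\ \ N/2 \le i \le N-1. \tag{10-2}$$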
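As an illustration of this step, the sketch below checks that a sum with the kernel cos((4i+1)lπ/2N) equals an ordinary unnormalized N-point DCT of a re-ordered sequence. It assumes N is even and that the re-ordering is the inverse of the 1-D permutation of [2] (even output positions filled from the first half, odd output positions filled in reverse from the second half). The function names are hypothetical.

```python
import math


def h_direct(x_tilde, l):
    """Sum over i of x_tilde[i] * cos((4i+1) * l * pi / (2N))."""
    N = len(x_tilde)
    return sum(x_tilde[i] * math.cos((4 * i + 1) * l * math.pi / (2 * N))
               for i in range(N))


def unpermute(x_tilde):
    """Re-order x_tilde so that the (4i+1) kernel becomes the (2i+1) kernel.
    Assumes len(x_tilde) is even."""
    N = len(x_tilde)
    y = [0.0] * N
    for i in range(N):
        if i < N // 2:
            y[2 * i] = x_tilde[i]              # even positions
        else:
            y[2 * N - 2 * i - 1] = x_tilde[i]  # odd positions, reversed
    return y


def dct_unnormalized(y, l):
    """Unnormalized DCT coefficient: sum of y[i] * cos((2i+1) * l * pi / (2N))."""
    N = len(y)
    return sum(y[i] * math.cos((2 * i + 1) * l * math.pi / (2 * N))
               for i in range(N))


if __name__ == "__main__":
    x_tilde = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    y = unpermute(x_tilde)
    for l in range(len(x_tilde)):
        print(l, abs(h_direct(x_tilde, l) - dct_unnormalized(y, l)) < 1e-9)
```

In practice `dct_unnormalized` would be replaced by any fast 1-D DCT routine; the direct sum is used here only to keep the check self-contained.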
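Hence, for the computation of an (N×N)-point 2-D DCT, only the computation of N N-point 1-D DCTs and some post-additions are required. Next, we will show that the post-addition stage can be implemented by a butterfly-like structure. Since

$$\cos\frac{(4i+1)(m + (4(j+N/2)+1)n)\pi}{2N} = \pm\cos\frac{(4i+1)(m + (4j+1)n)\pi}{2N} \tag{11-1}$$

and

$$\cos\frac{(4i+1)(m - (4(j+N/2)+1)n)\pi}{2N} = \pm\cos\frac{(4i+1)(m - (4j+1)n)\pi}{2N}, \tag{11-2}$$

with the plus sign for even $n$ and the minus sign for odd $n$, the sums over $j$ in eqn. (7-1) and eqn. (7-2) fold in half:

$$A_{m,n} = \sum_{j=0}^{N/2-1} \sum_{i=0}^{N-1} \left(\tilde{x}_{j,i} \pm \tilde{x}_{j+N/2,i}\right) \cos\frac{(4i+1)(m + (4j+1)n)\pi}{2N},$$

$$B_{m,n} = \sum_{j=0}^{N/2-1} \sum_{i=0}^{N-1} \left(\tilde{x}_{j,i} \pm \tilde{x}_{j+N/2,i}\right) \cos\frac{(4i+1)(m - (4j+1)n)\pi}{2N},$$

where $\tilde{x}_{j,i} + \tilde{x}_{j+N/2,i}$ is used for even $n$ and $\tilde{x}_{j,i} - \tilde{x}_{j+N/2,i}$ for odd $n$.

$$\ldots \qquad \text{for } n = 2(2k+1), \tag{14}$$

where $k = 0, 1, \ldots, N/4-1$. For example, if $N = 4$, by eqn. (14) we have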
$$\ldots \tag{15}$$

For odd $n$, let

$$G_{j,l} = \ldots \tag{16-1}$$

and

$$H_{j,l} = \sum_{i=0}^{N-1} s_{j,i} \cos\frac{(4i+1)l\pi}{2N}. \tag{16-2}$$

Since

$$\cos\frac{(4i+1)(m + (4(j+N/4)+1)n)\pi}{2N} = \pm\sin\frac{(4i+1)(m + (4j+1)n)\pi}{2N} \tag{17-1}$$

and

$$\cos\frac{(4i+1)(m - (4(j+N/4)+1)n)\pi}{2N} = \pm\sin\frac{(4i+1)(m - (4j+1)n)\pi}{2N}. \tag{17-2}$$

But

$$\sin\frac{(4i+1)l\pi}{2N} = \cos\frac{(4i+1)(N-l)\pi}{2N}, \tag{18}$$

i.e. the 1-D discrete sine transform can be directly computed from the 1-D discrete cosine transform. Therefore, for some $r$ and $s$, $0 \le r, s \le N$, $Y_{m,n}$ can be written as

$$\ldots \tag{19}$$

where $k = 0, 1, \ldots, N/4-1$, and $\phi_r$ and $\psi_s$ are respectively equal to $\pm G_{j,r}$ and $\pm H_{j,s}$.
For example, if $N = 4$, $r = 3$ and $s = 1$, by eqn. (19) we have
As a result, the computation of an (N×N)-point 2-D DCT can be achieved by recursively applying the above decompositions (eqn. (14) and eqn. (19)). The signal flow graphs for the 4×4 and 8×8 DCTs are shown in Figure 1 and Figure 2, respectively.
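The overall structure of Section II (input permutation, regrouping into N sequences, N one-dimensional transforms, then post-additions) can be sketched end to end and compared against the direct double sum of eqn. (2). This is only an illustrative sketch under stated assumptions: the 1-D transforms are evaluated directly from their definition rather than by a fast DCT, and the post-addition stage is a plain accumulation of signed table entries, not the butterfly network of Figures 1 and 2. All function names are hypothetical.

```python
import math


def fold_index(L, N):
    """(sign, l): cos((4i+1)L*pi/2N) == sign*cos((4i+1)l*pi/2N), 0 <= l < N."""
    L = L % (4 * N)
    if L > 2 * N:
        L = 4 * N - L
    if L == N:
        return 0, 0
    if L < N:
        return 1, L
    return -1, 2 * N - L


def dct2_direct(f):
    """Denormalized 2-D DCT by the direct double sum (eqn. (2))."""
    N = len(f)

    def c(a, k):
        return math.cos((2 * a + 1) * k * math.pi / (2 * N))

    return [[sum(f[i][j] * c(i, m) * c(j, n)
                 for i in range(N) for j in range(N))
             for n in range(N)] for m in range(N)]


def dct2_via_1d(f):
    """Same result from N one-dimensional transforms plus post-additions."""
    N = len(f)

    def src(k):  # input permutation of eqn. (4), applied to rows and columns
        return 2 * k if k < N // 2 else 2 * N - 2 * k - 1

    x = [[f[src(i)][src(j)] for j in range(N)] for i in range(N)]
    # Regrouping: xt[j][i] = x[i][<(4i+1)j + i>_N]
    xt = [[x[i][((4 * i + 1) * j + i) % N] for i in range(N)]
          for j in range(N)]
    # N one-dimensional transforms (eqn. (9)), one per group j
    h = [[sum(xt[j][i] * math.cos((4 * i + 1) * l * math.pi / (2 * N))
              for i in range(N)) for l in range(N)] for j in range(N)]
    # Post-additions only: Y = (A + B) / 2 with A, B built from +/- h[j][l]
    Y = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            acc = 0.0
            for j in range(N):
                for L in (m + (4 * j + 1) * n, m - (4 * j + 1) * n):
                    sign, l = fold_index(L, N)
                    acc += sign * h[j][l]
            Y[m][n] = acc / 2
    return Y


if __name__ == "__main__":
    f = [[float((3 * i + 5 * j + i * j) % 7 - 3) for j in range(4)]
         for i in range(4)]
    Yd, Yf = dct2_direct(f), dct2_via_1d(f)
    print(max(abs(Yd[m][n] - Yf[m][n]) for m in range(4) for n in range(4)))
```

After the table `h` is built, no further multiplications by cosines are needed; the remaining work is additions and the final halving, which is the property the butterfly stage exploits.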
III. Complexity analysis of the post-addition stage
For an (N×N)-point 2-D DCT, let $A(N)$ and $B(N)$ respectively denote the number of additions required in the first $\log_2 N$ butterfly stages and in the last butterfly stage, and let $C(N)$ denote the number of nodes that do not require butterfly computations in the first $\log_2 N$ butterfly stages. From eqns. (15) and (18), we have $C(4) = 2$ and $C(N) = C(N/2) + N/2$ for $N \ge 8$, and $B(N) = N^2 - 2N$. Therefore, $A(N) = N^2 \log_2 N - C(N) + B(N) = N^2(1 + \log_2 N) - 3N + 2$.
IV. Conclusion
A new index-permutation-based 2-D DCT algorithm has been presented in this paper. The succinct derivation of the proposed algorithm makes it easier to describe how one 2-D DCT is mapped into a number of 1-D DCTs. Moreover, the structure of the post-addition stage of the proposed algorithm is more regular than that of [7], and a systematic approach for constructing the post-addition stage has also been described.
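The addition counts of Section III can be checked numerically. The sketch below evaluates the recurrence for C(N) and the expression N² log₂N - C(N) + B(N), and compares the result with the closed form N²(1 + log₂N) - 3N + 2; it assumes N is a power of two with N ≥ 4. The function names simply mirror the symbols of Section III.

```python
def C(N):
    """Nodes needing no butterfly: C(4) = 2, C(N) = C(N/2) + N/2 for N >= 8."""
    return 2 if N == 4 else C(N // 2) + N // 2


def B(N):
    """B(N) = N^2 - 2N, as given in Section III."""
    return N * N - 2 * N


def A(N):
    """N^2 * log2(N) - C(N) + B(N), with N a power of two."""
    log2_n = N.bit_length() - 1
    return N * N * log2_n - C(N) + B(N)


def A_closed_form(N):
    """N^2 * (1 + log2(N)) - 3N + 2."""
    log2_n = N.bit_length() - 1
    return N * N * (1 + log2_n) - 3 * N + 2


if __name__ == "__main__":
    for N in (4, 8, 16, 32, 64):
        print(N, C(N), A(N), A(N) == A_closed_form(N))
```

Unrolling the recurrence gives C(N) = N - 2, which is where the -3N + 2 term of the closed form comes from.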
References
[1] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. New York: Academic, 1990.
[2] M. J. Narasimha and A. M. Peterson, “On the Computation of the Discrete Cosine Transform,” IEEE Trans. Communications, vol. COM-26, no. 6, pp. 934-936, June 1978.
[3] H. S. Hou, “A Fast Recursive Algorithm for Computing the Discrete Cosine Transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1455-1461, Oct. 1987.
[4] PeiZong Lee and Fang-Yu Huang, “Restructured Recursive DCT and DST Algorithms,” IEEE Trans. Signal Processing, vol. 42, no. 7.
[5] P. Duhamel and C. Guillemot, “Polynomial Transform Computation of 2-D DCT,” in Proc. ICASSP ’90, pp. 1515-1518, Apr. 1990.
[6] E. Feig and S. Winograd, “Fast Algorithms for the Discrete Cosine Transform,” IEEE Trans. Signal Processing, vol. 40, no. 9, pp. 2174-2193, Sept. 1992.
[7] N. I. Cho and S. U. Lee, “Fast Algorithm and Implementation of 2-D DCT,” IEEE Trans. Circuits Syst., vol. 38, no. 3, pp. 297-305, Mar. 1991.
[8] N. I. Cho and S. U. Lee, “A Fast 4x4 DCT Algorithm for the Recursive 2-D DCT,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-40, no. 9, pp. 2166-2173, Sep. 1992.
[9] N. I. Cho, I. D. Yun, and S. U. Lee, “On the Regular Structure for the Fast 2-D DCT Algorithm,” IEEE Trans. Circuits Syst., vol. 40, no. 4, pp. 259-266, Apr. 1993.
Figure 1. The signal flow graph for the 4×4 DCT.
Figure 2. The signal flow graph for the 8×8 DCT: (a) the 1st butterfly stage of the post-addition stage; (b) the 2nd, 3rd, and 4th butterfly stages of the post-addition stage (for even n); (c) the 2nd, 3rd, and 4th butterfly stages of the post-addition stage (for odd n). (The output order of the broken butterfly is the reverse of that of the solid butterfly.)
> Yo0(c>