Two-stage convolver-based DCT and IDCT
algorithms
J.-L. WU S.-H. HSU W.-J. Duh
Indexing terms ’ Transform, Algorithms, Image processing
Abstract: Discrete cosine transform is the most popular solution for image coding owing to its near optimal performance, yet it is not signal dependent. A two-stage convolver-based DCT and IDCT algorithm with power-of-two length is pre- sented. The transform matrix of IDCT is decom- posed into the product of pre- and post processing matrices by using Mobius inversion formula. The pre-processing stage consists of just additions and subtractions; the post-processing stage performs circular convolution operations. A nearly recur- sive computational method for preprocessing cal- culation is given which considerably reduces the operation count and is valuable in practical usage. The post-processing is a circular convolution and can be computed using number theoretic trans- form, (semi-) systolic arrays or transversal filters.
1 Introduction
Discrete orthogonal transforms are increasingly impor- tant in many practical applications, such as data com- pression and spectrum analysis [
11.
Among these various discrete orthogonal transforms, the discrete cosine trans- form (DCT) is generally recognised as the best solution to encode digital speech and image owing to its near optimal performance, yet it is not signal dependent [ 2 ] . Along with the growing applications of DCT, many fast algorithms had been proposed [3-71. Ersoy [8] proposed a two-stage algorithm for DFT which is based on the vector transformation of sines and cosines into new basis functions using the Mobius inversion of number theory A new IDCT algorithm, with power-of-two length, derived following the construction method developed in Reference 8, is proposed. The kernel matrix of the trans- form is factorised into pre-processing and post-processing matrices. The pre-processing stage consists of just addi- tions and subtractions. The post-processing stage is a circular-convolver and can be realised using contempo- rary finite impulse response (FIR) filter implementation techniques. Although the development of the proposed algorithm is similar to that of Reference 8, the resultant r91.Paper 83341 (E4). received 3 1 s May 1991
The authors are with the Department of Computer Science and lnfor- mation Engineering, National Taiwan University, Taipei, Taiwan, Republic of China
I E E P R O C E E D I N G S - I , Vol. 138, N o . 6, D E C E M B E R 1991
two-stage convolver-based IDCT algorithm is thought to be new.
2 Preliminaries for two-stage IDCT algorithm Since the IDCT can be realised simply by transposing the signal flowgraph of the DCT and vice versa, it is suffi- cient to consider the IDCT-like equation
where
j(n) = y ( n ) n = I , 2, . . . , N - 1 and N = 2“, where m is an integer.
The sequence x(k) in eqn. 1 is first partitioned into odd and even sequences which then forms a new sequence z(k)
c41
asz ( k ) = x(2k)
2 k
+
- = x ( N - (2k+
1))(
where k = 0, 1,
. . .
, N/2 - 1Thus from eqns. 1 and 3
( 3 )
(4)
where k = 0, 1, _ . ., N ~ 1. Eqn. 1 can now be presented
in another form as
N - 1
i ( k ) = “ = O
c
j ( n ) cos(x)
( 5 )where Z(4k
+
1 ) = z(k).For convenience, an index group I(4N) is defined as
I(4N) = 4k
+
1 mod 4N where k is an integer.From the relationship between i(k) and z(k), it is clear that the evaluation of I(k) on the index group I(4N)
results in z(k).
According to the derivation in Reference 8, the cosine function can be represented in another set of basis, say 61 I
b(m,, N ) , by using the Mobius inversion formula [9] as M cos
(g)
=1
b(m,, N ) u m 1 = 1 1 0 < x mod 1 < 0.5 u ( x ) =i
- 1 0.5 < x mod 1 < 1 ( 6 ) 0 x mod 1 = 0 or 0.5where M = N - 3 and m , E I ( N ) as N is a power-of-two. Substituting eqn. 6 into eqn. 5
Y (n)
I
' 16 616 H16 H16 R16 H:6Hk
premultiplication of c(n)1
bit reverml permutation1
I
fast preprocessing stageI
I
permutation of H16I
circular convolution of postprocessing stageI
permutation of H:6I
inverse reordering of S16c
Fig. 1 Sysrem block diagram of proposed I D C T N = 16
where M = 4N ~ 3 and m , , k E I(4N). Interchanging the
order of the summation, eqn. 7 can be decomposed into two stages
M
i(k) =
1
b(m,, 4N)h(m,k mod 4N) (9)m, = 1
In eqn. 8, the sequence j(n) is transformed through the pre-processing stage to an intermediate sequence h(0; in eqn. 9, the post-processing stage, h(0 is transformed to the sequence Z(k).
Since k and m , of eqn. 9 are both in 1(4N), it is easy to
find a primitive root [9] of order N to generate a new
sequence such that it is in a circular convolution form. Let g be a primitive root of order N of the index group
1(4N), then the following index mappings are defined : k = gk' mod 4N
m , = gSm' mod 4N (10)
612
where g can be 4N - 3 and s =
+
1 or - 1. Substituting eqn. 10 into eqn. 9, the equation becomes a circular con- volution asN - I ,
m , = O
i ( g k ' ) = b(g""', 4N)h(gk'+""; mod 4N) (1 1) where k' = 0, 1,
. .
., N - 1. As shown in eqn. 11, whens = - 1 the post-processing becomes a circular convolu- tion; and when s =
+
1 it is a circular correlation.From eqns. 6 and 8, it is clear that the pre-processing stage involves additions and substractions only; and also from eqns. 6, 9 and 11, the post-processing can be con- verted to a circular convolution form which can be com- puted by using circular-convolver [lo] or NTT (number
theoretic transform) [l
11.
Therefore, both pre- and post- processing stages are suitable for realistic implementa- tion.3 Fast computations of pre-processing stage Following the definition of eqn. 6, a two-stage convolver- based IDCT is expressed in eqns. 8 and 9. Although the pre-processing stage consists of additions/subtractions only, direct computation of the proposed algorithm takes about nz multiplications and additions. Thus for real
applications, it is necessary to reduce the operation counts and modularise the computational structure. Matrix algebra is used to clarify the derivations. Rep- resenting the IDCT in matrix form as
CN = SNRNQNPN (12) where C , denotes the IDCT transform matrix; PN
denotes the diagonal matrix for multiplication needed in
eqn. 2; QN is the pre-processing matrix in eqn. 8; RN is
the post-processing matrix in eqn. 9; SN is the inverse reordering described in eqn. 3; and the subscript N
denotes the transform length.
Decomposition of
QN
is the main theme. Since 1 E 1(4N), the symmetry of QN is obtained as follows:( N i 2 ) - 1 " = O
+
')
(2n+
1x41'+
1) 4 + j ( 2 n + I)u(
(2n+
4 N 1x41'+
1)+-+i)
1 2 4 (14) where I' = 0, 1, ..., ( N / 2 ) - 1. Since it folows thatwhere
0,
= Q N B N and B, is the bit reversal permu-tation matrix.
From eqn. 8, the function u[(nl/4N)
+
$1
is, in fact, the sign function of cos (n1/4N). To implement eqn. 16, a fast computational method devised to construct the recurrentrelation between DNi2 and
ON,,
is presented as follows:D N / 2 = M N / 2 0 N / 2
+
A N / 2 (17)and eqn.
16
becomes‘ N / 2 ] (
[
‘N/Z ‘ , N / Z ) ] [ : : I ‘ N P ]ON
=[t::
-‘N/Z ‘(NO) M ( N / Z ) O N / ,where MN12, a diagonal matrix with elements 1 or - 1, stands for the multiplication matrix and A N / , , a sparse matrix with nonzero elements 2 or -2, denotes the com- pensation matrix.
From eqn. 18, it is obvious that
8,
is nearly recurs- ively decomposable. The computational diagram of eqn. 18 is illustrated in Fig. 2. The complexity of the pro-Fig. 2 Fast computation of pre-processing stage
cedure depends on the computation of matrix A N / , and increases as N gets larger. Though the derived recurrence
is not perfect, it reduces the number of operations of the pre-processing stage. The corresponding architecture is regular and is more efficient than direct implementation of adder arrays. Table 1 shows the operation counts of some practically used lengths.
Table 1 : Operation counts for the pre-processing Stage Length Adder array Proposed method N log, N
Additions Additions
Additions Shifts Inversions
4 16 8 0 1 8
8 64 26 2 4 24
16 256 76 8 12 48
32 1024 236 30 32 160
64 4096 7 48 86 80 384
4 Architecture for post-processing stage
From eqn. 11, the post-processing stage becomes a circu- lar correlator or circular convolver depending on if s = 1 or s = - 1. It is well known that the circular convolution
can be implemented using systolic, semi-systolic arrays and transversal filters [IO]. It can also be computed by
using the circular convolution property (CCP) of the NTT [I I]. The post-processing stage can be converted to
a circular convolution form after a simple data permu- tation. From the derivations in eqns. 10 and 11, the per- mutation orders can be easily generated.
I E E PROCEEDINGS-I, Vol. 138, No. 6 , D E C E M B E R 1991
5 Discussion and conclusions
It is clear that the post-processing stage is of circular-
convolution form from the derivation of the proposed two-stage convolver-based IDCT. Since the IDCT is unitary, the DCT transform matrix is the transpose of the IDCT transform matrix. The two-stage DCT algorithm can thus be obtained by transposing the matrix C N .
It is obviously that the pre-processing stage of the DCT is a circular convolver and can be treated as a FIR filter. The frequency response of the DCT pre-processing
filter is illustrated in Fig. 3. In most of the image coding
9 61
-1 0 0 1 0
frequency
Fig. 3 Frequency response of DCT pre-processingfiller
systems, a pre-filter is always used to remove high fre- quency noises and a post-filter, designed based on human visual properties [12], is also included to improve the picture quality, as shown in Fig. 4. Furthermore, the pre-
0 data In U
E
C Lo ‘? data oute
5
Fig. 4 filteringBlock diagram of image coding system with pre- and post-
processing matrix, Q N , can be directly applied to the field
of image coding and pattern recognition [13, 141.
6 References
1 AHMED. N., and RAO, K.R.: ‘Orthogonal transforms for digital signal processing’ (Springer-Verlag. Berlin, 1975)
2 AHMED, N., NATARAJAN, T., and RAO, K.R.: ‘The discrete cosine transform’, I E E E Trans., 1974, C-25, pp. 9&93
3 CHEN, W.H., SMITH, C.H., and FRALICK, S.C.: ‘A fast computa- tional algorithm for the discrete cosine transform’, I E E E Trans., 1977, COM-25, (9), pp. lWG1009
4 NARASIMHA, M.J., and PETERSON, A.M.: ‘On the computation of the discrete cosine transfom’, I E E E Trans., 1978, COM-26, (6), pp. 934-936
5 MAKHOUL, J: ‘A fast cosine transform in one and two dimen- sions’, I E E E Trans., 1980, A S P - 2 8 , pp. 27-34
6 HOU, H.S.: ‘A fast recursive algorithm for computing the discrete cosine transform’, I E E E Trans., 1987, A S P - 3 5 , pp. 1455-1461 7 LEE, B.G.: ‘Input and output index mappings for a prime factored
decomposed computation of discrete cosine transform’, I E E E
Trans., 1989, A S P - 3 7 , (2), pp. 237-244
8 ERSOY, 0.: ’A two-stage representation of D F T and its apphca-
9 APOSTOL, T.M.: ‘Introduction to analytic number theory’ 1989, A S P - 3 7 , ( I l ) , pp 1743-1749
12 NGAN, K.N., LEONG, K.S., and SINGH, H.: ‘Adaptive cosine lions’, I E E E Trans., 1987, ASP-35, pp. 825 831
(Spnnger-Verlag, New York, 1976)
10 ERSOY, 0.: ‘Semisystolic array implementation of circular, skew circular, and linear convolutions’, IEEE Trans., 1985, C-34, (2). pp. 1W194
11 AGARWAL, R.C., and BURRUS, C.S.: ‘Fast convolution using Fermat number transforms with applications to digital filtering’, IEEE Trans., 1974, A S P - 2 2 , pp. 87-91
transform coding of images in perceptual domain’, I E E E T m n a ,
13 ERSOY, O.K., and KIM, D.Y.: ‘Image recognition wlth the dlScrCIe rectangular-wave transform’, J. Opt. Soc. Am. A, 1988, 5, (11), pp. 5-18
14 KIM, D.Y., and ERSOY, O.K.: ‘Image recognition with the discrete rectangular-wave transform’, 1989,6, (6), pp. 835-843
614 IEE P R O C E E D I N G S - I , Vol. 138, N o . 6 , D E C E M B E R 1991