Two-stage convolver-based DCT and IDCT algorithms

(1)

Two-stage convolver-based DCT and IDCT

algorithms

J.-L. WU S.-H. HSU W.-J. Duh

Indexing terms ’ Transform, Algorithms, Image processing

Abstract: Discrete cosine transform is the most popular solution for image coding owing to its near optimal performance, yet it is not signal dependent. A two-stage convolver-based DCT and IDCT algorithm with power-of-two length is presented. The transform matrix of IDCT is decomposed into the product of pre- and post processing matrices by using Mobius inversion formula. The pre-processing stage consists of just additions and subtractions; the post-processing stage performs circular convolution operations. A nearly recursive computational method for preprocessing cal- culation is given which considerably reduces the operation count and is valuable in practical usage. The post-processing is a circular convolution and can be computed using number theoretic transform, (semi-) systolic arrays or transversal filters.

1 Introduction

Discrete orthogonal transforms are increasingly impor- tant in many practical applications, such as data com- pression and spectrum analysis [

11.

Among these various discrete orthogonal transforms, the discrete cosine transform (DCT) is generally recognised as the best solution to encode digital speech and image owing to its near optimal performance, yet it is not signal dependent [ 2 ] . Along with the growing applications of DCT, many fast algorithms had been proposed [3-71. Ersoy [8] proposed a two-stage algorithm for DFT which is based on the vector transformation of sines and cosines into new basis functions using the Mobius inversion of number theory A new IDCT algorithm, with power-of-two length, derived following the construction method developed in Reference 8, is proposed. The kernel matrix of the transform is factorised into pre-processing and post-processing matrices. The pre-processing stage consists of just additions and subtractions. The post-processing stage is a circular-convolver and can be realised using contempo- rary finite impulse response (FIR) filter implementation techniques. Although the development of the proposed algorithm is similar to that of Reference 8, the resultant r91.

Paper 83341 (E4). received 3 1 s May 1991

The authors are with the Department of Computer Science and lnfor- mation Engineering, National Taiwan University, Taipei, Taiwan, Republic of China

I E E P R O C E E D I N G S - I , Vol. 138, N o . 6, D E C E M B E R 1991

two-stage convolver-based IDCT algorithm is thought to be new.

2 Preliminaries for two-stage IDCT algorithm Since the IDCT can be realised simply by transposing the signal flowgraph of the DCT and vice versa, it is suffi- cient to consider the IDCT-like equation

where

j(n) = y ( n ) n = I , 2, . . . , N - 1 and N = 2“, where m is an integer.

The sequence x(k) in eqn. 1 is first partitioned into odd and even sequences which then forms a new sequence z(k)

c41

as

z ( k ) = x(2k)

2 k

+

- = x ( N - (2k

+

1))

(

where k = 0, 1,

. . .

, N/2 - 1

Thus from eqns. 1 and 3

( 3 )

(4)

where k = 0, 1, _ . ., N ~ 1. Eqn. 1 can now be presented

in another form as

N - 1

i ( k ) = “ = O

c

j ( n ) cos

(x)

( 5 )

where Z(4k

+

1 ) = z(k).

For convenience, an index group I(4N) is defined as

I(4N) = 4k

+

1 mod 4N where k is an integer.

From the relationship between i(k) and z(k), it is clear that the evaluation of I(k) on the index group I(4N)

results in z(k).

According to the derivation in Reference 8, the cosine function can be represented in another set of basis, say 61 I

(2)

b(m,, N ) , by using the Mobius inversion formula [9] as M cos

(g)

=

1

b(m,, N ) u m 1 = 1 1 0 < x mod 1 < 0.5 u ( x ) =

i

- 1 0.5 < x mod 1 < 1 ( 6 ) 0 x mod 1 = 0 or 0.5

where M = N - 3 and m , E I ( N ) as N is a power-of-two. Substituting eqn. 6 into eqn. 5

Y (n)

I

' 16 616 H16 H16 R16 H:6

Hk

premultiplication of c(n)

1

bit reverml permutation

1 I

fast preprocessing stage

I

permutation of H16

I

circular convolution of postprocessing stage

I

permutation of H:6

I

inverse reordering of S16

c

Fig. 1 Sysrem block diagram of proposed I D C T N = 16

where M = 4N ~ 3 and m , , k E I(4N). Interchanging the

order of the summation, eqn. 7 can be decomposed into two stages

M

i(k) =

1

b(m,, 4N)h(m,k mod 4N) (9)

m, = 1

In eqn. 8, the sequence j(n) is transformed through the pre-processing stage to an intermediate sequence h(0; in eqn. 9, the post-processing stage, h(0 is transformed to the sequence Z(k).

Since k and m , of eqn. 9 are both in 1(4N), it is easy to

find a primitive root [9] of order N to generate a new

sequence such that it is in a circular convolution form. Let g be a primitive root of order N of the index group

1(4N), then the following index mappings are defined : k = gk' mod 4N

m , = gSm' mod 4N (10)

612

where g can be 4N - 3 and s =

+

1 or - 1. Substituting eqn. 10 into eqn. 9, the equation becomes a circular con- volution as

N - I ,

m , = O

i ( g k ' ) = b(g""', 4N)h(gk'+""; mod 4N) (1 1) where k' = 0, 1,

. .

., N - 1. As shown in eqn. 11, when

s = - 1 the post-processing becomes a circular convolu- tion; and when s =

+

1 it is a circular correlation.

From eqns. 6 and 8, it is clear that the pre-processing stage involves additions and substractions only; and also from eqns. 6, 9 and 11, the post-processing can be con- verted to a circular convolution form which can be computed by using circular-convolver [lo] or NTT (number

theoretic transform) [l

11.

Therefore, both pre- and post- processing stages are suitable for realistic implementation.

3 Fast computations of pre-processing stage Following the definition of eqn. 6, a two-stage convolver- based IDCT is expressed in eqns. 8 and 9. Although the pre-processing stage consists of additions/subtractions only, direct computation of the proposed algorithm takes about nz multiplications and additions. Thus for real

applications, it is necessary to reduce the operation counts and modularise the computational structure. Matrix algebra is used to clarify the derivations. Rep- resenting the IDCT in matrix form as

CN = SNRNQNPN (12) where C , denotes the IDCT transform matrix; PN

denotes the diagonal matrix for multiplication needed in

eqn. 2; QN is the pre-processing matrix in eqn. 8; RN is

the post-processing matrix in eqn. 9; SN is the inverse reordering described in eqn. 3; and the subscript N

denotes the transform length.

Decomposition of

QN

is the main theme. Since 1 E 1(4N), the symmetry of QN is obtained as follows:

( N i 2 ) - 1 " = O

+

')

(2n

+

1x41'

+

1) 4 + j ( 2 n + I)u

(

(2n

+

4 N 1x41'

+

1)

+-+i)

1 2 4 (14) where I' = 0, 1, ..., ( N / 2 ) - 1. Since it folows that

where

0,

= Q N B N and B, is the bit reversal permu-

tation matrix.

From eqn. 8, the function u[(nl/4N)

+

$1

is, in fact, the sign function of cos (n1/4N). To implement eqn. 16, a fast computational method devised to construct the recurrent

(3)

relation between DNi2 and

ON,,

is presented as follows:

D N / 2 = M N / 2 0 N / 2

+

A N / 2 (17)

and eqn.

16

becomes

‘ N / 2 ] (

[

‘N/Z ‘ , N / Z ) ] [ : : I ‘ N P ]

ON

=

[t::

-‘N/Z ‘(NO) M ( N / Z ) O N / ,

where MN12, a diagonal matrix with elements 1 or - 1, stands for the multiplication matrix and A N / , , a sparse matrix with nonzero elements 2 or -2, denotes the com- pensation matrix.

From eqn. 18, it is obvious that

8,

is nearly recurs- ively decomposable. The computational diagram of eqn. 18 is illustrated in Fig. 2. The complexity of the pro-

Fig. 2 Fast computation of pre-processing stage

cedure depends on the computation of matrix A N / , and increases as N gets larger. Though the derived recurrence

is not perfect, it reduces the number of operations of the pre-processing stage. The corresponding architecture is regular and is more efficient than direct implementation of adder arrays. Table 1 shows the operation counts of some practically used lengths.

Table 1 : Operation counts for the pre-processing Stage Length Adder array Proposed method N log, N

Additions Additions

Additions Shifts Inversions

4 16 8 0 1 8

8 64 26 2 4 24

16 256 76 8 12 48

32 1024 236 30 32 160

64 4096 7 48 86 80 384

4 Architecture for post-processing stage

From eqn. 11, the post-processing stage becomes a circular correlator or circular convolver depending on if s = 1 or s = - 1. It is well known that the circular convolution

can be implemented using systolic, semi-systolic arrays and transversal filters [IO]. It can also be computed by

using the circular convolution property (CCP) of the NTT [I I]. The post-processing stage can be converted to

a circular convolution form after a simple data permutation. From the derivations in eqns. 10 and 11, the per- mutation orders can be easily generated.

I E E PROCEEDINGS-I, Vol. 138, No. 6 , D E C E M B E R 1991

5 Discussion and conclusions

It is clear that the post-processing stage is of circular-

convolution form from the derivation of the proposed two-stage convolver-based IDCT. Since the IDCT is unitary, the DCT transform matrix is the transpose of the IDCT transform matrix. The two-stage DCT algorithm can thus be obtained by transposing the matrix C N .

It is obviously that the pre-processing stage of the DCT is a circular convolver and can be treated as a FIR filter. The frequency response of the DCT pre-processing

filter is illustrated in Fig. 3. In most of the image coding

9 61

-1 0 0 1 0

frequency

Fig. 3 Frequency response of DCT pre-processingfiller

systems, a pre-filter is always used to remove high frequency noises and a post-filter, designed based on human visual properties [12], is also included to improve the picture quality, as shown in Fig. 4. Furthermore, the pre-

0 data In U

E

C Lo ‘? data out

e

5

Fig. 4 filtering

Block diagram of image coding system with pre- and post-

processing matrix, Q N , can be directly applied to the field

of image coding and pattern recognition [13, 141.

6 References

1 AHMED. N., and RAO, K.R.: ‘Orthogonal transforms for digital signal processing’ (Springer-Verlag. Berlin, 1975)

2 AHMED, N., NATARAJAN, T., and RAO, K.R.: ‘The discrete cosine transform’, I E E E Trans., 1974, C-25, pp. 9&93

3 CHEN, W.H., SMITH, C.H., and FRALICK, S.C.: ‘A fast computa- tional algorithm for the discrete cosine transform’, I E E E Trans., 1977, COM-25, (9), pp. lWG1009

4 NARASIMHA, M.J., and PETERSON, A.M.: ‘On the computation of the discrete cosine transfom’, I E E E Trans., 1978, COM-26, (6), pp. 934-936

5 MAKHOUL, J: ‘A fast cosine transform in one and two dimen- sions’, I E E E Trans., 1980, A S P - 2 8 , pp. 27-34

6 HOU, H.S.: ‘A fast recursive algorithm for computing the discrete cosine transform’, I E E E Trans., 1987, A S P - 3 5 , pp. 1455-1461 7 LEE, B.G.: ‘Input and output index mappings for a prime factored

decomposed computation of discrete cosine transform’, I E E E

Trans., 1989, A S P - 3 7 , (2), pp. 237-244

(4)

8 ERSOY, 0.: ’A two-stage representation of D F T and its apphca-

9 APOSTOL, T.M.: ‘Introduction to analytic number theory’ 1989, A S P - 3 7 , ( I l ) , pp 1743-1749

12 NGAN, K.N., LEONG, K.S., and SINGH, H.: ‘Adaptive cosine lions’, I E E E Trans., 1987, ASP-35, pp. 825 831

(Spnnger-Verlag, New York, 1976)

10 ERSOY, 0.: ‘Semisystolic array implementation of circular, skew circular, and linear convolutions’, IEEE Trans., 1985, C-34, (2). pp. 1W194

11 AGARWAL, R.C., and BURRUS, C.S.: ‘Fast convolution using Fermat number transforms with applications to digital filtering’, IEEE Trans., 1974, A S P - 2 2 , pp. 87-91

transform coding of images in perceptual domain’, I E E E T m n a ,

13 ERSOY, O.K., and KIM, D.Y.: ‘Image recognition wlth the dlScrCIe rectangular-wave transform’, J. Opt. Soc. Am. A, 1988, 5, (11), pp. 5-18

14 KIM, D.Y., and ERSOY, O.K.: ‘Image recognition with the discrete rectangular-wave transform’, 1989,6, (6), pp. 835-843

614 IEE P R O C E E D I N G S - I , Vol. 138, N o . 6 , D E C E M B E R 1991