
A NEW ARRAY ARCHITECTURE FOR PRIME-LENGTH DISCRETE COSINE TRANSFORM


Academic year: 2021


436 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 41, NO. 1, JANUARY 1993

Jiun-In Guo, Chi-Min Liu, and Chein-Wei Jen

Abstract: A new approach to derive a systolic algorithm for the prime-length discrete cosine transform (DCT) is proposed. It makes use of input/output (I/O) data permutations and the symmetry property of the cosine kernels so that the proposed array possesses outstanding performance in hardware cost of the processing elements (PE's), average computation time, and I/O cost.


I. INTRODUCTION

The discrete cosine transform (DCT) has been widely used in image coding for its near-optimal performance [1]. Since the DCT is computation intensive, the development of high-speed hardware is necessary in many real-time applications. Systolic arrays are an appropriate architecture to meet the requirements of both high processing speeds and VLSI implementation. However, the computing algorithms encapsulated within systolic arrays need to be developed specifically.

Recently, several systolic array architectures [2]-[6] have been proposed to realize the one-dimensional DCT. These architectures can be categorized into linear array architectures [2]-[4] and two-dimensional array architectures [5], [6]. Although the two-dimensional arrays can attain higher speeds than linear arrays, the hardware complexity of their PE's and their control complexity are generally higher than those of linear arrays. Furthermore, the two-dimensional arrays need high I/O bandwidth and a large number of I/O channels to attain the higher speeds, unless most operands are preloaded into the arrays instead of being supplied from the input ports. But additional overheads are needed if the operands are preloaded into the arrays, as in the two-dimensional array in [5]. Considering for example the array in [6], the average computation time for an N-point DCT is (√N + 2) cycles, while the number of multipliers in the array is (4N + 4√N), if the clock cycle is assumed to be the computation time of one multiplier. In addition, undesirable features such as complex control, high I/O bandwidth, and a large number of I/O channels still accompany the array in [6].

The attractive feature of linear arrays is that the I/O bandwidth and the number of I/O channels can be kept independent of the DCT length if the I/O channels exist only at the two extreme ends of a linear array. As discussed in [8], the high I/O bandwidth required for most systolic arrays would limit computing speeds. Hence, linear arrays should be one feasible architecture for a system application. However, keeping the I/O channels at the extreme ends of linear arrays while pursuing high computing power is a challenging design issue when deriving systolic algorithms for linear arrays. The approach in [2] is to directly represent the DCT as a matrix-vector multiplication first. Then, the systolic array realization for the matrix-vector multiplication can be directly modified to compute the DCT. Since the array designed in [2] cannot retain the I/O channels at its two extreme ends, a large number of I/O channels and high I/O bandwidth are needed. Another approach [3] modifies the DCT into a form similar to the discrete Fourier transform (DFT) and realizes the DCT by using the array that has been developed for the DFT. Since the twiddle factor exp(j2π/N) in the DFT is a complex number while the factor cos(2π/4N) in the DCT is a real number, arrays designed with this approach incur much hardware cost. In addition, the approach in [4] also represents the DCT as a matrix-vector multiplication like [2], but it generates the transform kernels recursively in the array instead of prestoring them in memory. The array in [4] uses this method to reduce the I/O cost, such as the number of I/O channels and the I/O bandwidth, but additional hardware cost is paid for the recursive generation of the cosine kernels.

To simultaneously consider the hardware cost, the I/O bandwidth, and the number of I/O channels, a systolic algorithm for the prime-length DCT is derived in this correspondence. The design approach utilizes the input and output data permutations together with the symmetry property of the cosine kernels, so that the proposed array can retain most I/O channels at the two extreme ends and simultaneously attain good performance in average computation time, hardware cost of the PE's, and the number of PE's. The performance of the proposed array and that of the linear arrays in [2]-[4] are discussed in Section III. From Section III, we can see that the proposed array possesses better performance than the arrays [2], [3] in the hardware cost of the PE's, the average computation time, the number of I/O channels, and the I/O bandwidth. Moreover, it also possesses better performance than the array [4] in the hardware cost of the PE's. The overheads of the proposed array include some additional shift registers, latches, multiplexers, a demultiplexer, and a switching element for solving control problems. Basically, these overheads are minor compared with the savings in the hardware cost of the PE's in the array. This correspondence is arranged as follows. Section II describes the derivation of the computing algorithm encapsulated in the array. Section III considers the array realization of the proposed systolic algorithm. A brief conclusion is given in Section IV.

Manuscript received January 18, 1991; revised November 4, 1991. Part of this correspondence was presented at the IEEE Workshop on Visual Signal Processing and Communications, June 6-7, 1991. This work was supported by the National Science Council under Grant NSC80-0404-E009-39.

J.-I. Guo is with the Department of Electronics Engineering, Institute of Electronics, National Chiao Tung University, Hsinchu, 30039, Taiwan, Republic of China.

C.-M. Liu is with the Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, 30039, Taiwan, Republic of China.

C.-W. Jen is with the Department of Electronics Engineering, Institute of Electronics, National Chiao Tung University, Hsinchu, 30039, Taiwan, Republic of China.

IEEE Log Number 9203370. 1053-587X/93$03.00 © 1993 IEEE

II. THE ALGORITHM DERIVATION

The DCT is defined as

$$
Y(k) = \sum_{i=0}^{N-1} y(i) \cos\left[\frac{(2i+1)k\pi}{2N}\right], \quad \text{for } k = 0, 1, \ldots, N-1 \qquad (1)
$$

where {y(i) | i = 0, 1, ..., N - 1} is the input sequence and {Y(k) | k = 0, 1, ..., N - 1} is the output sequence. We represent (1) as a matrix-vector multiplication as follows:

$$
\begin{bmatrix} Y(0) \\ Y(1) \\ Y(2) \\ Y(3) \\ Y(4) \\ Y(5) \\ Y(6) \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\cos(a) & \cos(3a) & \cos(5a) & \cos(7a) & \cos(9a) & \cos(11a) & \cos(13a) \\
\cos(2a) & \cos(6a) & \cos(10a) & \cos(14a) & \cos(10a) & \cos(6a) & \cos(2a) \\
\cos(3a) & \cos(9a) & \cos(13a) & \cos(7a) & \cos(a) & \cos(5a) & \cos(11a) \\
\cos(4a) & \cos(12a) & \cos(8a) & 1 & \cos(8a) & \cos(12a) & \cos(4a) \\
\cos(5a) & \cos(13a) & \cos(3a) & \cos(7a) & \cos(11a) & \cos(a) & \cos(9a) \\
\cos(6a) & \cos(10a) & \cos(2a) & \cos(14a) & \cos(2a) & \cos(10a) & \cos(6a)
\end{bmatrix}
\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \\ y(4) \\ y(5) \\ y(6) \end{bmatrix}
\qquad (2)
$$

where "a" denotes π/14 and N is assumed to be 7. If (2) is directly realized by linear array architectures, as was done in [2], one input port would be needed in every PE to transmit the cosine kernels for proper operation, which would induce a large number of I/O channels and high I/O bandwidth. It can be shown that the DCT defined in (1) can be formulated as

$$
Y(k) = \{2T(k) + x(0)\} \cos\left[\frac{k\pi}{2N}\right], \quad \text{for } k = 0, 1, \ldots, N-1 \qquad (3)
$$

where

$$
T(k) = \sum_{i=1}^{N-1} x(i) \cos\left[\frac{ik\pi}{N}\right] \qquad (4)
$$

and x(i) is another sequence defined as

$$
x(N-1) = y(N-1), \qquad x(i) = y(i) - x(i+1) \quad \text{for } i = 0, 1, \ldots, N-2. \qquad (5)
$$

If N is a prime number, there exists some number "g," not necessarily unique, such that there is a one-to-one mapping from the integers {i | i = 1, 2, ..., N - 1} to the integers {j | j = 1, 2, ..., N - 1}, given by

$$
j = |g^i|_N \qquad (6)
$$

where |A|_N denotes the result of the "A modulo N" operation. Then (4) can be reformulated with i and k as powers of the primitive element "g." Because i and k take on the value zero, and zero is not a power of "g," the zero frequency component must be treated specially, i.e.,

$$
Y(0) = \sum_{i=0}^{N-1} y(i) \qquad (7)
$$

$$
Y(k) = \{2T(k) + x(0)\} \cos\left[\frac{k\pi}{2N}\right], \quad \text{for } k = 1, \ldots, N-1 \qquad (8a)
$$

where

$$
T(k) = \sum_{i=1}^{N-1} x(i) \cos\left[\frac{ik\pi}{N}\right], \quad \text{for } k = 1, \ldots, N-1. \qquad (8b)
$$
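The reformulation in (3)-(5) can be checked numerically against the direct definition (1). The following is a minimal sketch in Python, not the authors' implementation; the function names `dct_direct` and `dct_via_x` are hypothetical, and the transform is left unscaled as in (1).

```python
import math

def dct_direct(y):
    """Equation (1): Y(k) = sum_i y(i) * cos((2i+1) k pi / (2N)), unscaled."""
    n = len(y)
    return [
        sum(y[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def dct_via_x(y):
    """Equations (3)-(5): build x(i) by backward recursion, then use T(k)."""
    n = len(y)
    x = [0.0] * n
    x[n - 1] = y[n - 1]                      # x(N-1) = y(N-1)
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - x[i + 1]               # x(i) = y(i) - x(i+1)
    out = []
    for k in range(n):
        t = sum(x[i] * math.cos(i * k * math.pi / n) for i in range(1, n))  # (4)
        out.append((2 * t + x[0]) * math.cos(k * math.pi / (2 * n)))        # (3)
    return out

# Arbitrary 7-point test vector (illustrative data only).
y = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0]
direct = dct_direct(y)
reform = dct_via_x(y)
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, reform))
```

The agreement follows from y(i) = x(i) + x(i+1) together with the identity cos((2i+1)θ) + cos((2i-1)θ) = 2 cos(2iθ) cos(θ), with θ = kπ/(2N).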

Applying (6) to (8b), it follows that

$$
T(|g^k|_N) = \sum_{i=1}^{N-1} x(|g^i|_N) \cos\left[\frac{|g^i|_N \times |g^k|_N \, \pi}{N}\right], \quad k = 1, 2, \ldots, N-1. \qquad (9a)
$$

The term "|g^i|_N × |g^k|_N" can be expressed as

$$
|g^i|_N \times |g^k|_N = |g^{i+k}|_N + m \times N, \quad i, k = 1, 2, \ldots, N-1 \qquad (9b)
$$

where "m" is an integer. Then, (9a) can be written as

$$
T'(k) = T(|g^k|_N) = \sum_{i=1}^{N-1} x'(i) \times C_i^k, \quad k = 1, 2, \ldots, N-1 \qquad (9c)
$$

where

$$
C_i^k = \cos\left[\frac{(|g^{i+k}|_N + m \times N)\pi}{N}\right] = (-1)^m \cos\left[\frac{|g^{i+k}|_N \, \pi}{N}\right]
$$

and

$$
x'(i) = x(|g^i|_N).
$$

Now (7), (8a), and (9c) constitute the computational equations for the DCT. To see the difference between these computational equations and (1), (9c) is written as

$$
\begin{bmatrix} T'(1) \\ T'(2) \\ T'(3) \\ T'(4) \\ T'(5) \\ T'(6) \end{bmatrix}
=
\begin{bmatrix}
-\cos(2a) & \cos(6a) & \cos(4a) & -\cos(5a) & \cos(a) & \cos(3a) \\
\cos(6a) & \cos(4a) & -\cos(5a) & -\cos(a) & -\cos(3a) & \cos(2a) \\
\cos(4a) & -\cos(5a) & -\cos(a) & -\cos(3a) & \cos(2a) & \cos(6a) \\
-\cos(5a) & -\cos(a) & -\cos(3a) & \cos(2a) & \cos(6a) & \cos(4a) \\
\cos(a) & -\cos(3a) & \cos(2a) & \cos(6a) & -\cos(4a) & \cos(5a) \\
\cos(3a) & \cos(2a) & \cos(6a) & \cos(4a) & \cos(5a) & \cos(a)
\end{bmatrix}
\begin{bmatrix} x'(1) \\ x'(2) \\ x'(3) \\ x'(4) \\ x'(5) \\ x'(6) \end{bmatrix}
\qquad (10)
$$

where "a" denotes π/7, and N and "g" are assumed to be 7 and 3, respectively. It can be seen that the absolute values of the cosine kernels along the same antidiagonal positions in the matrix of (10) are the same, while those in the matrix of (2) do not have any specific order like (10). This means that the vector of T'(k) is the circular convolution of the inputs x'(i) and the cosine kernels. The same phenomenon also exists in the DFT, where it was first found by Rader [10], and it has also been used to design efficient systolic arrays for the prime-length DFT [9]. Now we apply it to derive the systolic algorithm for the DCT. From the viewpoint of array realization, the constant value along the same antidiagonal positions means that this variable can be sent to every PE along a link from one input port at the extreme end of a linear array. The (2N - 3) antidiagonal lines in the matrix of (10) mean that only (2N - 3) values, instead of the N^2 values in the matrix of (2), need to be sent to the array. This property can be exploited to design a systolic array with a low number of I/O channels and low I/O bandwidth.

From (10), since cos(kπ/N) = -cos((N - k)π/N), it is observed that the absolute values of the cosine kernels located in the left three columns are the same as those located in the right three columns. This symmetry property allows a further reduction of the computational complexity of the algorithm. As shown in the Appendix, the symmetry property of the cosine kernels can be expressed as the following equation:

$$
\cos\left[\frac{|g^{i+k}|_N \, \pi}{N}\right] = -\cos\left[\frac{|g^{i+k+(N-1)/2}|_N \, \pi}{N}\right], \quad i = 1, \ldots, \frac{N-1}{2}, \text{ and } k = 1, \ldots, N-1. \qquad (11)
$$

Applying (11) to (9c), (9c) can be written as

$$
T'(k) = \sum_{i=1}^{(N-1)/2} x''(i) \times C_i^k, \quad k = 1, 2, \ldots, N-1 \qquad (12a)
$$

where

$$
x''(i) =
\begin{cases}
x'(i) + x'\left(i + \frac{N-1}{2}\right) & \text{if } m_1 \text{ and } m_2 \text{ are one even number and one odd number} \\
x'(i) - x'\left(i + \frac{N-1}{2}\right) & \text{if } m_1 \text{ and } m_2 \text{ are both even numbers or both odd numbers.}
\end{cases}
$$

The integers m_1 and m_2 are determined by the following equations:

$$
|g^i|_N \times |g^k|_N = |g^{i+k}|_N + m_1 \times N
$$

and

$$
|g^{i+(N-1)/2}|_N \times |g^k|_N = |g^{i+k+(N-1)/2}|_N + m_2 \times N, \quad i = 1, \ldots, \frac{N-1}{2}, \ k = 1, 2, \ldots, N-1,
$$

where

$$
|g^{i+k}|_N + |g^{i+k+(N-1)/2}|_N = N.
$$

Now (7), (8a), and (12) constitute the computational equations of the DCT in the proposed algorithm. Considering the computational complexity, the number of multiplications has been reduced from (N - 1)^2 in (9c) to (N - 1)^2/2 in (12a). In addition, the vector of T'(k) in (12a) is still in a circular convolution form. It will be shown in the next section that such a form is beneficial to the reduction of I/O cost.
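The index permutation and the halved sum of (12a) can be illustrated with a short numerical check. This is a sketch under stated assumptions rather than the authors' code: the helper names `primitive_root`, `t_direct`, and `t_permuted_half` are hypothetical, the kernel sign is taken as (-1)^m1, and m1, m2 are obtained by integer division of the products in the defining equations.

```python
import math

def primitive_root(n):
    """Smallest g whose powers g^1..g^(n-1) mod n cover 1..n-1 (n an odd prime)."""
    for g in range(2, n):
        if len({pow(g, e, n) for e in range(1, n)}) == n - 1:
            return g
    raise ValueError("no primitive root found; is n an odd prime?")

def t_direct(x, k):
    """Equation (8b): T(k) = sum_{i=1}^{N-1} x(i) cos(i k pi / N)."""
    n = len(x)
    return sum(x[i] * math.cos(i * k * math.pi / n) for i in range(1, n))

def t_permuted_half(x, g):
    """Equation (12a): returns [T'(1), ..., T'(N-1)] with (N-1)/2 products each."""
    n = len(x)
    h = (n - 1) // 2
    xp = {i: x[pow(g, i, n)] for i in range(1, n)}      # x'(i) = x(|g^i|_N)
    out = []
    for k in range(1, n):
        gk = pow(g, k, n)
        acc = 0.0
        for i in range(1, h + 1):
            m1 = (pow(g, i, n) * gk) // n               # from |g^i||g^k| = |g^(i+k)| + m1*N
            m2 = (pow(g, i + h, n) * gk) // n           # same with i + (N-1)/2
            if (m1 - m2) % 2 == 0:                      # both even or both odd
                xpp = xp[i] - xp[i + h]
            else:                                       # one even, one odd
                xpp = xp[i] + xp[i + h]
            kernel = (-1) ** m1 * math.cos(pow(g, i + k, n) * math.pi / n)
            acc += xpp * kernel
        out.append(acc)
    return out

# Illustrative data for N = 7; only x(1)..x(6) enter T(k).
x = [0.5, -2.0, 3.0, 1.5, -4.0, 2.5, 7.0]
g = primitive_root(7)
tp = t_permuted_half(x, g)
for k in range(1, 7):
    assert abs(tp[k - 1] - t_direct(x, pow(g, k, 7))) < 1e-9
```

Each T'(k) here uses (N - 1)/2 multiplications, which matches the count of (N - 1)^2/2 stated for (12a).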

Fig. 1(a). The dependence graph (DG) of the proposed algorithm for the 7-point DCT, where "a" denotes π/7.

III. THE ARRAY REALIZATION

This section considers the array realization of the proposed systolic algorithm. Fig. 1 shows the dependence graph (DG) [12] of the proposed algorithm for a seven-point DCT. The DG clearly shows the data operations, data dependency, and control signals involved in the proposed algorithm. Linear arrays can be constructed from the DG according to the design procedure in [12], and the tag control scheme [13] can be utilized for the I/O control and data control. Based on these two design approaches, Fig. 2 shows the constructed array for the seven-point DCT with projection vector [0 1]. To show the activity of the array clearly, we rewrite (7), (8a), and (12a) in recursive forms as

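$$
z^0 = x(0), \qquad z^i = z^{i-1} + 2 \times [x'(i) + x'(i+3)], \quad i = 1, 2, 3, \qquad Y(0) = z^3 \qquad (13a)
$$

$$
Y'(k) = \{2T'(k) + x(0)\} \times \cos\left(\frac{\pi}{14} |3^k|_7\right), \quad k = 1, \ldots, 6 \qquad (13b)
$$

$$
y_k^0 = 0, \qquad y_k^i = y_k^{i-1} + x''(i) \times C_i^k, \quad i = 1, 2, 3, \ k = 1, \ldots, 6, \qquad T'(k) = y_k^3 \qquad (13c)
$$

where Y'(k) = Y(|3^k|_7), and "y_k^i" and "z^i" are the intermediate results.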
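The recursions (13a)-(13c) can be traced in software for N = 7 and g = 3. The sketch below is illustrative only and does not model the PE timing or the tag control; the function name `dct7_recursive` is hypothetical. It selects the sign in x''(i) from the parity of |g^k|_N, which is equivalent to comparing m1 and m2 because m1 + m2 = |g^k|_N - 1.

```python
import math

N, G, H = 7, 3, 3   # transform length, primitive element, (N-1)/2

def dct7_recursive(y):
    """Seven-point DCT via the recursive forms (13a)-(13c)."""
    # (5): intermediate sequence x(i)
    x = [0.0] * N
    x[N - 1] = y[N - 1]
    for i in range(N - 2, -1, -1):
        x[i] = y[i] - x[i + 1]
    xp = {i: x[pow(G, i, N)] for i in range(1, N)}   # x'(i) = x(|g^i|_N)

    # (13a): z accumulates x(0) + 2 * sum of all x'(i), giving Y(0)
    z = x[0]
    for i in range(1, H + 1):
        z = z + 2 * (xp[i] + xp[i + H])
    Y = [0.0] * N
    Y[0] = z

    for k in range(1, N):
        gk = pow(G, k, N)
        acc = 0.0                                    # y_k^0 = 0
        for i in range(1, H + 1):
            m1 = (pow(G, i, N) * gk) // N
            sign = -1.0 if m1 % 2 else 1.0           # kernel sign (-1)^m1
            # m1 and m2 share parity exactly when |g^k|_N is odd
            xpp = xp[i] - xp[i + H] if gk % 2 else xp[i] + xp[i + H]
            acc = acc + xpp * sign * math.cos(pow(G, i + k, N) * math.pi / N)  # (13c)
        # (13b): Y'(k) = Y(|g^k|_N)
        Y[gk] = (2 * acc + x[0]) * math.cos(gk * math.pi / (2 * N))
    return Y

# Compare with the direct definition (1) on illustrative data.
y = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0]
ref = [sum(y[i] * math.cos((2 * i + 1) * k * math.pi / 14) for i in range(7))
       for k in range(7)]
assert all(abs(a - b) < 1e-9 for a, b in zip(dct7_recursive(y), ref))
```

For N = 7 this reproduces the rule that the differences x'(i) - x'(i+3) are used for k = 1, 5, 6, where |3^k|_7 is 3, 5, and 1.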

Fig. 1(b). The functions of nodes.

Fig. 2. (a) The array architecture for the 7-point DCT, where "c( )" denotes cos( ) and "a" denotes π/7. (b) The functions of the PE's in the array. (c) The preprocessing stage in the array, where SR denotes shift register and SE denotes switching element. (d) The postprocessing stage in the array, where SR denotes shift register and L denotes latch.

From Fig. 2(a), the operations specified in (13a) and (13b) are computed within the left-most PE, while those in (13c) are computed in the other PE's. Multiplication and addition constitute the main functions of the PE's, which are shown in Fig. 2(b). Three control signals, denoted "Tag1," "Tag2," and "sign," are used to select the right operands in the operations. Fig. 2(c) shows the preprocessing stage needed in the array. The intermediate sequence x(i) can be generated from the input sequence y(i) by a subtractor. Multiplexers and a switching element then permute the sequence x(i), with the required control signals generated by circular shift registers. Finally, the required data patterns are obtained by adding and subtracting the permuted data. Fig. 2(d) shows the postprocessing stage in the array, which uses a demultiplexer to perform the output data permutation. Similarly, the control signals needed in the demultiplexer can be generated by a circular shift register. The use of shift registers and latches in Fig. 2(c) and Fig. 2(d) allows the array to be pipelined. That is, the intermediate signals x(i) and the output results Y(k) of the current block are shifted into the shift registers serially. After all of these (N - 1) values have been shifted into the registers, they are shifted in parallel into the latches for the I/O data permutations, so that the data of the next block can be continuously shifted into the registers without any time delay. Therefore, the proposed array, including the preprocessing and postprocessing stages, can be fully pipelined, and a high throughput rate can be attained.
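The data movement performed by the preprocessing and postprocessing stages can be sketched functionally as follows. This is only a behavioural illustration for N = 7 and g = 3 and does not model the shift registers, latches, or control signals; the function names `preprocess` and `postprocess` are hypothetical.

```python
N, G, H = 7, 3, 3   # transform length, primitive element, (N-1)/2

def preprocess(y):
    """Return x(0) and the sum and difference patterns fed to the array."""
    x = [0.0] * N
    x[N - 1] = y[N - 1]
    for i in range(N - 2, -1, -1):
        x[i] = y[i] - x[i + 1]                     # subtractor: x(i) = y(i) - x(i+1)
    xp = [x[pow(G, i, N)] for i in range(1, N)]    # permuted x'(1), ..., x'(N-1)
    sums = [xp[i] + xp[i + H] for i in range(H)]   # x'(i) + x'(i+3)
    diffs = [xp[i] - xp[i + H] for i in range(H)]  # x'(i) - x'(i+3)
    return x[0], sums, diffs

def postprocess(y0, y_perm):
    """Undo the output permutation: y_perm[k-1] holds Y'(k) = Y(|g^k|_N)."""
    out = [0.0] * N
    out[0] = y0
    for k in range(1, N):
        out[pow(G, k, N)] = y_perm[k - 1]
    return out

x0, sums, diffs = preprocess([0, 1, 2, 3, 4, 5, 6])
ordered = postprocess(0, [1, 2, 3, 4, 5, 6])
```

With g = 3 the permuted input order is x(3), x(2), x(6), x(4), x(5), x(1), and the outputs leave the array in the order Y(3), Y(2), Y(6), Y(4), Y(5), Y(1) before the demultiplexer restores natural order.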

In order to see the features of the proposed array more clearly, (12a) is expressed as

$$
\begin{bmatrix} T'(1) \\ T'(2) \\ T'(3) \\ T'(4) \\ T'(5) \\ T'(6) \end{bmatrix}
=
\begin{bmatrix}
-\cos(2a) & \cos(6a) & \cos(4a) \\
\cos(6a) & \cos(4a) & -\cos(5a) \\
\cos(4a) & -\cos(5a) & -\cos(a) \\
-\cos(5a) & -\cos(a) & -\cos(3a) \\
\cos(a) & -\cos(3a) & \cos(2a) \\
\cos(3a) & \cos(2a) & \cos(6a)
\end{bmatrix}
\begin{bmatrix} x'(1) \pm x'(4) \\ x'(2) \pm x'(5) \\ x'(3) \pm x'(6) \end{bmatrix}
\qquad (14)
$$

where "a" denotes π/7, and N and "g" are assumed to be 7 and 3, respectively. If "k" is equal to 1, 5, or 6, the minus signs in the input vector are valid. Otherwise, the plus signs are valid. As shown in (14), there are (3N - 5)/2 values needed to be sent to the array for computing the N-point DCT. And C = {cos(2a), cos(6a), cos(4a), cos(5a), cos(a), cos(3a), cos(2a), cos(6a)} is the sequence of these eight values for the seven-point DCT. It is observed that the last two cosine kernels are identical to the first two cosine kernels in C. These common cosine kernels can be shared for computing two neighboring blocks successively. As many image blocks are processed continuously, it is only necessary to send six values instead of eight to the array for computing each seven-point DCT. It can be seen from the array in Fig. 2(a) that only (N - 1) cosine kernels are needed to compute an N-point DCT, and the average computation time for computing the N-point DCT is (N - 1) cycles. This phenomenon is induced from the cyclic property of the modulo operation in (6), i.e.,

$$
|g^i|_N = |g^{N-1+i}|_N.
$$

Exploiting the specific order of the cosine kernels in the matrix of (14), these kernels are imported from the right-most PE instead of being imported from every PE as in the approach of [2]. Therefore, the proposed array requires a low number of I/O channels and low I/O bandwidth. Considering the I/O cost, the I/O costs of the designs [2]-[4] are proportional to (N + 2)L [2], (N + 3)L [3], and 8L [4], where L is the wordlength. The I/O cost of the proposed array is only proportional to 7L + N + 2. Also, the proposed array needs much lower hardware cost than the designs [2]-[4]. The required numbers of multipliers are N [2], 4N + 4 [3], and 2N - 2 [4], which are much larger than the (N + 1)/2 of the proposed array. Moreover, regarding the average computation time, the proposed array needs (N - 1) cycles for computing an N-point DCT, which is better than the N cycles in [2] and also better than the (N + 1) cycles in [3]. The hardware overheads of the proposed array include some shift registers, latches, multiplexers, a demultiplexer, and a switching element for solving the control problems and the I/O data permutations. The cycle time of the array includes the multiplication and addition time as well as the time for multiplexing. However, these overheads are minor compared with the savings of hardware cost in the proposed array. As a whole, the proposed array outperforms the arrays [2], [3] in average computation time, hardware cost of PE's, the number of I/O channels, and the I/O bandwidth. It also outperforms the array [4] in hardware cost of the PE's.

IV. CONCLUSIONS

In this correspondence, a new approach to derive the systolic algorithm for the prime-length DCT is presented. This approach gives the array good performance in hardware cost of PE's, average computation time, the number of I/O channels, and the I/O bandwidth. Also, this design approach can be similarly applied to derive the systolic algorithms for the discrete sine transform (DST) and the discrete Fourier transform (DFT) [9]. Although the proposed systolic algorithm and array are derived under the restriction that N is a prime number, they can be applied to the nonprime-length DCT by extending the input data from a nonprime length to a prime length, at the expense of some overheads in hardware cost and average computation time. With these overheads, the hardware cost of the proposed array is still lower than that of the arrays [2]-[4]. However, it is not always a drawback that N is a prime number. It is known that the blocking effect will occur in the DCT as applied to image coding with low bit rate, and the overlapping method is one of the remedies for this problem [11]. Applying the proposed algorithm to the nonprime-length DCT by using the overlapping method can also reduce the undesirable blocking effect.

APPENDIX

In the Appendix, the proof of (11) is given. At first, (11) is written here as

$$
\cos\left[\frac{|g^i|_N \, \pi}{N}\right] = -\cos\left[\frac{|g^{i+(N-1)/2}|_N \, \pi}{N}\right] = \cos\left[\frac{(N - |g^{i+(N-1)/2}|_N)\pi}{N}\right]. \qquad (A1)
$$

The necessary and sufficient condition that (A1) holds is

$$
|g^i|_N = N - |g^{i+(N-1)/2}|_N.
$$

That is,

$$
|g^i|_N + |g^{i+(N-1)/2}|_N = N \qquad (A2)
$$

where "g" is a primitive element. According to number theory [7], we have

$$
|g^{(N-1)/2}|_N = N - 1; \qquad |A \times B|_N = \big|\, |A|_N \times |B|_N \,\big|_N
$$

then

$$
|g^{i+(N-1)/2}|_N = \big|\, |g^i|_N \times (N - 1) \,\big|_N = \big|\, -|g^i|_N \,\big|_N = \big|\, N - |g^i|_N \,\big|_N.
$$

As 0 < |g^i|_N ≤ N - 1 for 1 ≤ i ≤ N - 1, we have

$$
\big|\, N - |g^i|_N \,\big|_N = N - |g^i|_N.
$$

It means that

$$
|g^{i+(N-1)/2}|_N = \big|\, N - |g^i|_N \,\big|_N = N - |g^i|_N
$$

so

$$
|g^i|_N + |g^{i+(N-1)/2}|_N = N.
$$

Therefore, (11) is proved.

ACKNOWLEDGMENT

The authors are very grateful to the reviewers for their constructive comments.

REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90-93, Jan. 1974.

[2] U. Totzek and F. Matthiesen, "Two-dimensional discrete cosine transform with linear systolic arrays," in Proc. Int. Conf. Systolic Arrays, Ireland, 1989, pp. 388-397.

[3] N. I. Cho and S. U. Lee, "DCT algorithms for VLSI parallel implementations," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, no. 1, pp. 121-127, Jan. 1990.

[4] L. W. Chang and M. C. Wu, "A unified systolic array for discrete cosine and sine transforms," IEEE Trans. Signal Processing, vol. 39, no. 1, pp. 192-194, Jan. 1991.

[5] C. Chakrabarti and J. Ja'Ja', "Systolic architectures for the computation of the discrete Hartley and the discrete cosine transforms based on prime factor decomposition," IEEE Trans. Comput., vol. 39, no. 11, pp. 1359-1368, Nov. 1990.

[6] M. H. Lee, "On computing 2-D systolic algorithm for discrete cosine transform," IEEE Trans. Circuits Syst., vol. 37, no. 10, pp. 1321-1323, Oct. 1990.

[7] Shu Lin and Daniel J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983, Chapter 2, Section 2.2, pp. 19-24.

[8] A. L. Fisher and H. T. Kung, "Special-purpose VLSI architectures: general discussions and a case study," in VLSI and Modern Signal Processing, S. Y. Kung et al., Eds. Englewood Cliffs, NJ: Prentice-Hall, 1985, Chapter 8, pp. 154-169.

[9] C. M. Liu and C. W. Jen, "A new systolic array algorithm for discrete Fourier transform," IEEE Trans. Comput., 1990, submitted for publication, also in Proc. Int. Symp. on Circuits and Systems, Singapore, 1991.

[10] C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," Proc. IEEE, vol. 56, 1968, pp. 1107-1108.

[11] H. C. Reeve, III, and J. R. Lim, "Reduction of blocking effect in image coding," in Proc. ICASSP 83, Boston, MA, 1983, pp. 1212-1215.

[12] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, NJ: Prentice-Hall, 1988, Chapters 3 and 4, pp. 110-282.

[13] C. W. Jen and H. Y. Hsu, "The design of a systolic array with tags input," in Proc. ISCAS, Finland, 1988, pp. 2263-2266.
