Introduction to 128-Point FFT - Self-Compensated Fixed-Width Multiplier and Its Applications


4.1 Introduction to 128-Point FFT

In this section, the 128-point FFT algorithm proposed by Y-W Lin [13] will be introduced.

Given a sequence x(n), the N-point DFT is defined as

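$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1 \tag{4.1}$$

where $W_N = e^{-j 2\pi / N}$.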

Evaluating equation (4.1) directly has a computational complexity of $O(N^2)$. The complexity can be reduced to $O(N \log_r N)$ by using the radix-r FFT algorithm. In general, a higher-radix FFT algorithm requires fewer complex multiplications than the radix-2 FFT algorithm. Hence, the radix-8 FFT algorithm is employed in the 128-point FFT. However, because 128 is not a power of 8, a mixed-radix FFT algorithm that combines the radix-2 and radix-8 FFT algorithms is chosen. The mixed-radix 128-point FFT algorithm is derived below.
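As a rough illustration of the $O(N^2)$ cost, the following Python sketch evaluates the DFT sum term by term and counts the complex multiplications. The function name `direct_dft` and the test input are hypothetical and chosen only for this example.

```python
import cmath

def direct_dft(x):
    """Evaluate the DFT sum term by term and count complex multiplications."""
    n_points = len(x)
    mult_count = 0
    spectrum = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            twiddle = cmath.exp(-2j * cmath.pi * n * k / n_points)  # W_N^(n*k)
            acc += x[n] * twiddle
            mult_count += 1
        spectrum.append(acc)
    return spectrum, mult_count

# Hypothetical test input; any 128 complex samples would do.
x = [complex(n % 5, -(n % 3)) for n in range(128)]
spectrum, mult_count = direct_dft(x)
print(mult_count)  # 16384, i.e. 128 ** 2
```

The count includes multiplications by trivial factors such as 1, so it is an upper bound on the work a careful implementation would do.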

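First, let the constants in equation (4.1) be defined as

$$N = 128 \tag{4.2}$$

$$n = 64 n_1 + n_2, \quad k = k_1 + 2 k_2, \quad n_1, k_1 = 0, 1, \quad n_2, k_2 = 0, 1, \ldots, 63 \tag{4.3}$$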

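Then, equation (4.1) can be rewritten as

$$X(2 k_2 + k_1) = \sum_{n_2=0}^{63} \sum_{n_1=0}^{1} x(64 n_1 + n_2)\, W_{128}^{(64 n_1 + n_2)(2 k_2 + k_1)} = \sum_{n_2=0}^{63} \left\{ \left[ \sum_{n_1=0}^{1} x(64 n_1 + n_2)\, W_2^{n_1 k_1} \right] W_{128}^{n_2 k_1} \right\} W_{64}^{n_2 k_2} \tag{4.4}$$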
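The two-dimensional view of the 128-point DFT (a 2-point DFT, a twiddle multiplication, and a 64-point DFT) can be checked numerically. The sketch below is only an illustration under the assumption that the time index is split as 64·n1 + n2 and the frequency index as 2·k2 + k1; the function names are hypothetical, and the 64-point DFTs are evaluated directly rather than by radix-8 stages.

```python
import cmath

def dft(x):
    """Reference DFT evaluated directly from the definition."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size))
            for k in range(size)]

def fft128_as_2_by_64(x):
    assert len(x) == 128
    # Radix-2 stage: 64 butterflies on x(n2) and x(64 + n2).
    sums = [x[n2] + x[64 + n2] for n2 in range(64)]
    diffs = [x[n2] - x[64 + n2] for n2 in range(64)]
    # Twiddle W_128^(n2*k1): unity for k1 = 0, W_128^n2 for k1 = 1.
    diffs = [diffs[n2] * cmath.exp(-2j * cmath.pi * n2 / 128) for n2 in range(64)]
    # Two 64-point DFTs produce the even and odd output bins.
    even = dft(sums)
    odd = dft(diffs)
    spectrum = [0j] * 128
    for k2 in range(64):
        spectrum[2 * k2] = even[k2]
        spectrum[2 * k2 + 1] = odd[k2]
    return spectrum

# Hypothetical test input.
x = [complex((3 * n) % 7, (5 * n) % 11) for n in range(128)]
max_error = max(abs(a - b) for a, b in zip(fft128_as_2_by_64(x), dft(x)))
print(max_error < 1e-8)  # True
```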

In equation (4.4), the 128-point DFT can be regarded as a two-dimensional DFT consisting of a 2-point DFT and a 64-point DFT. The inputs of the 128-point DFT are first processed by the radix-2 FFT algorithm. The radix-2 results are then multiplied by twiddle factors. Finally, the products are processed by the 64-point DFT, which can be decomposed recursively, in two steps, into 8-point DFTs. In order to derive the 64-point FFT algorithm by using the radix-2³ FFT algorithm, the constants $n_2$ and $k_2$ in equation (4.3) can be defined as

Using equation (4.5), equation (4.4) can be rewritten as


where the twiddle factor can be decomposed as

In equation (4.9), the 8-point DFT is divided into three steps by using the radix-2 index map.

Fig 4-1 shows the signal flow graph of the radix-8 FFT algorithm, in which the radix-8 algorithm is decomposed into three steps. Each step has four butterfly operations. After the butterfly operations in each step, the corresponding twiddle-factor multiplications are performed.
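The three-step structure can be sketched as an 8-point decimation-in-frequency computation built from two-input butterflies. This is a minimal illustration and not a transcription of Fig 4-1: the function name is hypothetical, and the placement of the twiddle factors follows the textbook radix-2 ordering, which may differ from the figure even though the same factors (1, -j, W_8^1, W_8^3) are used.

```python
import cmath

W8 = [cmath.exp(-2j * cmath.pi * m / 8) for m in range(8)]  # W_8^m

def radix8_three_steps(x):
    """8-point DFT as three steps of four two-input butterflies each."""
    assert len(x) == 8
    data = list(x)
    butterflies_per_step = []
    span = 4
    while span >= 1:
        nxt = list(data)
        count = 0
        for start in range(0, 8, 2 * span):
            for i in range(span):
                a, b = data[start + i], data[start + i + span]
                nxt[start + i] = a + b
                # Twiddle after the butterfly: W_(2*span)^i = W_8^(i * 4 / span).
                nxt[start + i + span] = (a - b) * W8[i * (4 // span)]
                count += 1
        butterflies_per_step.append(count)
        data = nxt
        span //= 2
    # The results appear in bit-reversed order; restore natural order.
    spectrum = [0j] * 8
    for position in range(8):
        k = int(format(position, "03b")[::-1], 2)
        spectrum[k] = data[position]
    return spectrum, butterflies_per_step

# Hypothetical test input, compared against the direct DFT definition.
x = [complex(n, 2 - n) for n in range(8)]
spectrum, butterflies_per_step = radix8_three_steps(x)
reference = [sum(x[n] * W8[(n * k) % 8] for n in range(8)) for k in range(8)]
max_error = max(abs(a - b) for a, b in zip(spectrum, reference))
print(butterflies_per_step)  # [4, 4, 4]
print(max_error < 1e-9)      # True
```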

There are only three twiddle factors, -j, $W_8^1$, and $W_8^3$, in the radix-8 algorithm. Multiplication by -j only requires exchanging the real part with the imaginary part and negating the new imaginary part, so it does not need any multiplier. The multiplications by $W_8^1$ and $W_8^3$ can be replaced by a few additions, because these twiddle factors can be written as $\frac{\sqrt{2}}{2}(1 - j)$ and $\frac{\sqrt{2}}{2}(-1 - j)$, respectively. The value of $\frac{\sqrt{2}}{2}$ is 0.70710678, which can be approximated by $2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8}$ (0.70703125) and can therefore be implemented with only five shifters and four adders.
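The multiplier-free twiddle multiplications can be sketched on integer (fixed-point) data as follows. The function names and the 16-bit scale are hypothetical, and Python's `>>` rounds negative values toward minus infinity, which may differ from the rounding used in a particular hardware design.

```python
def mul_sqrt_half(v):
    """Approximate v * sqrt(2)/2 for an integer v: five shifts, four adds."""
    return (v >> 1) + (v >> 3) + (v >> 4) + (v >> 6) + (v >> 8)

def mul_neg_j(re, im):
    # (re + j*im) * (-j) = im - j*re: swap the parts, negate the new imaginary part.
    return im, -re

def mul_w8_1(re, im):
    # W_8^1 = (sqrt(2)/2)(1 - j): (re + j*im)(1 - j) = (re + im) + j(im - re)
    return mul_sqrt_half(re + im), mul_sqrt_half(im - re)

def mul_w8_3(re, im):
    # W_8^3 = (sqrt(2)/2)(-1 - j): (re + j*im)(-1 - j) = (im - re) - j(re + im)
    return mul_sqrt_half(im - re), -mul_sqrt_half(re + im)

one = 1 << 16  # hypothetical fixed-point scale: 1.0 is represented as 65536
print(mul_sqrt_half(one) / one)  # 0.70703125
print(mul_neg_j(3, 4))           # (4, -3)
print(mul_w8_1(one, 0))          # (46336, -46336)
```

Each complex multiplication by $W_8^1$ or $W_8^3$ in this sketch costs two additions to form `re + im` and `im - re`, plus two shift-and-add constant multiplications.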

Fig 4-1: The signal flow graph of radix-8 FFT algorithm

The signal flow graph of the 128-point mixed-radix FFT algorithm is shown in Fig 4-2. The 128-point FFT is composed of three stages. The first stage is performed by the radix-2 FFT algorithm, and the radix-8 FFT algorithm shown in Fig 4-1 is employed in the second and third stages. A black point in a stage means that a twiddle factor is multiplied at that point. In the first stage, there are sixty-four butterfly units, and the two inputs of the ith butterfly unit are the ith and (64+i)th input data, where i = 0 ~ 63. The results of the first stage are then processed by the second and third stages. There are sixteen radix-8 FFT units in each of the second and third stages. The order of the radix-8 FFT inputs is different in each stage. In the second stage, the inputs of each radix-8 FFT unit are as shown below.

(64m + i)th, (64m + 8 + i)th, (64m + 16 + i)th, (64m + 24 + i)th, …

Fig 4-2: The signal flow graph of 128-point mixed-radix FFT algorithm
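The second and third stages can be illustrated by computing one 64-point DFT with two layers of 8-point DFTs. The sketch below assumes the common index split n2 = 8·a1 + a2 and k2 = b1 + 8·b2; the symbols a1, a2, b1, b2, `block`, and `unit`, and all function names, are hypothetical and are not taken from the source. Each 64-point block uses eight 8-point units per stage, so the two blocks together use sixteen units per stage.

```python
import cmath

def dft(x):
    """Reference DFT evaluated directly from the definition."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size))
            for k in range(size)]

def second_stage_inputs(block, unit):
    """Indices feeding one second-stage radix-8 unit (assumed naming)."""
    return [64 * block + 8 * a1 + unit for a1 in range(8)]

def dft64_by_two_radix8_stages(y):
    assert len(y) == 64
    # Second stage: unit a2 works on y(a2), y(8 + a2), ..., y(56 + a2).
    stage2 = []
    for a2 in range(8):
        outputs = dft([y[8 * a1 + a2] for a1 in range(8)])
        # Inter-stage twiddle W_64^(a2*b1).
        stage2.append([outputs[b1] * cmath.exp(-2j * cmath.pi * a2 * b1 / 64)
                       for b1 in range(8)])
    # Third stage: unit b1 works on the b1-th output of every second-stage unit.
    spectrum = [0j] * 64
    for b1 in range(8):
        outputs = dft([stage2[a2][b1] for a2 in range(8)])
        for b2 in range(8):
            spectrum[b1 + 8 * b2] = outputs[b2]
    return spectrum

# Hypothetical test input.
y = [complex((7 * n) % 13, (3 * n) % 5) for n in range(64)]
max_error = max(abs(a - b) for a, b in zip(dft64_by_two_radix8_stages(y), dft(y)))
print(max_error < 1e-8)           # True
print(second_stage_inputs(1, 3))  # [67, 75, 83, 91, 99, 107, 115, 123]
```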

4.2 128-Point FFT Architecture

In order to reduce the area of the 128-point FFT, the proposed multiplier is employed in the 128-point FFT architecture. The 128-point FFT architecture proposed by Y-W Lin [13] is introduced in this section.

Fig 4-3 shows the 128-point FFT architecture, which is divided into three modules. The first module is implemented with the radix-2 FFT algorithm, and the radix-8 FFT algorithm is used in the second and third modules. In this architecture, a high throughput rate is achieved by using four parallel data paths. As seen in Fig 4-3, the order of the output sequence is the bit reversal of the order of the input sequence.
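For reference, the bit-reversed ordering of a 128-point sequence uses 7-bit indices. The helper below is a minimal sketch with a hypothetical name; it ignores the four parallel data paths and only lists which output index appears at each position.

```python
def bit_reverse(index, bits=7):
    """Reverse the lowest `bits` bits of index (7 bits cover 0..127)."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index

order = [bit_reverse(position) for position in range(128)]
print(order[:8])  # [0, 64, 32, 96, 16, 80, 48, 112]
```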

Fig 4-3: Block diagram of 128-point FFT

4.2.1 Module 1

Fig 4-4 shows the architecture of Module 1, which consists of 128 registers that can store 64 complex data, four two-input butterfly units (BU), two complex multipliers, and two ROMs. The ROMs store the twiddle factors. The 128 registers store the input data and the outputs of the BUs. Each BU performs a complex addition and a complex subtraction on its two inputs. The two inputs of each BU are in(i) and in(64+i), where i ranges from 0 to 63, which corresponds to the first stage of Fig 4-2. The four parallel input sequences of Module 1 arrive in the order in(4m), in(4m+1), in(4m+2), and in(4m+3), where m ranges from 0 to 31. Thus, the 64 input data received during the first 16 cycles must be stored in the register file.

During the next 16 cycles, the four BUs receive their eight inputs from the register file and from the incoming data, respectively, and generate eight outputs. The four outputs of the complex additions are sent to Module 2 directly, and the four outputs of the complex subtractions are stored in the register file. Before these four outputs are stored, two of them are multiplied by twiddle factors. After 32 cycles, the other two outputs are multiplied by twiddle factors. Then, the four outputs are fed into Module 2. This multiplication approach not only reduces the number of complex multipliers from four to two but also achieves 100% utilization of the complex multipliers.
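The Module 1 schedule can be approximated with a cycle-level sketch for a single 128-sample frame. Several details are assumptions made only for illustration and are not stated in the source: lanes 0 and 1 are the ones multiplied before storage, lanes 2 and 3 are multiplied on read-out, the stored differences are read out during cycles 32 to 47, and the twiddle factor for index i is W_128^i. All names are hypothetical.

```python
import cmath

def twiddle(i):
    return cmath.exp(-2j * cmath.pi * i / 128)  # W_128^i (assumed first-stage twiddle)

def simulate_module1(inputs):
    """Single-frame sketch: 16 store cycles, 16 butterfly cycles, 16 read-out cycles."""
    assert len(inputs) == 128
    regfile = [0j] * 64
    sums, diffs = [], []
    mults_per_cycle = {}
    for cycle in range(48):
        used = 0
        for lane in range(4):
            i = (4 * cycle + lane) % 64
            if cycle < 16:
                regfile[i] = inputs[i]             # store in(0)..in(63)
            elif cycle < 32:
                a, b = regfile[i], inputs[64 + i]  # BU inputs in(i) and in(64+i)
                sums.append(a + b)                 # sent to Module 2 directly
                d = a - b
                if lane < 2:                       # assumption: multiplied before storage
                    d *= twiddle(i)
                    used += 1
                regfile[i] = d
            else:
                d = regfile[i]
                if lane >= 2:                      # assumption: multiplied on read-out
                    d *= twiddle(i)
                    used += 1
                diffs.append(d)
        if cycle >= 16:
            mults_per_cycle[cycle] = used
    return sums, diffs, mults_per_cycle

# Hypothetical test frame.
x = [complex(n % 9, (2 * n) % 7) for n in range(128)]
sums, diffs, mults_per_cycle = simulate_module1(x)
print(set(mults_per_cycle.values()))  # {2}
```

In this sketch exactly two complex multiplications occur in every cycle from 16 to 47, which is consistent with sharing two multipliers among four data paths. A streaming design would overlap the read-out of one frame with the storage phase of the next; that overlap is not modelled here.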

Fig 4-4: Block diagram of the Module 1

4.2.2 Module 2

The block diagram of Module 2 is illustrated in Fig 4-5. It consists of four BU_8 structures and four complex multipliers. The architecture of BU_8 is mapped directly from the 3-step radix-8 FFT algorithm shown in Fig 4-1. The numbers of registers in the three steps are eight, four, and two, respectively. These registers store one input of a two-input BU until the other input is available. The outputs of the two-input BUs in the first and second steps are multiplied by the twiddle factors 1, -j, $W_8^1$, and $W_8^3$. As mentioned in Section 4.1, the multiplications by these twiddle factors can be implemented without any multipliers. However, the four outputs of BU_8 must be multiplied by nontrivial twiddle factors, which requires four complex multipliers.

Fig 4-5: Block diagram of the Module 2

4.2.3 Module 3

Module 3 is also realized by the radix-8 FFT algorithm. Fig 4-6 shows the block diagram of Module 3. The structure of Module 3 differs from that of Module 2 because the orders of the input data of Module 2 and Module 3 are different. The structure is adapted to the different output order, as shown in Fig 4-6. The output data of the first and second steps only need to be multiplied by the twiddle factors 1, -j, $W_8^1$, and $W_8^3$. Thus, no multiplier is used in Module 3.

Fig 4-6: Block diagram of the Module 3
