
Chapter 4 Parallel-In-Parallel-Out FFT/IFFT Processor Architecture Design

4.3 FFT Sub_Module Design

4.3.5 Commutator Design

The commutator must make the read and write operations on the different memory banks conflict-free. This matters because a single-port memory, which can perform only one read or one write per cycle, occupies roughly half the area of a dual-port memory. For example, a 128-word × 38-bit dual-port memory occupies 0.054 mm², whereas a 128-word × 38-bit single-port memory occupies 0.023 mm² (both generated by a memory compiler in a 90 nm process technology); the single-port memory is thus 42.6% of the size of the dual-port memory. With the commutator, the 4 dual-port memories can be replaced by 8 single-port memories. The read or write address of the 4 PEs in each stage is listed in Table 4-4. A counter, whose binary value is b7b6b5b4b3b2b1b0, generates the read and write addresses for the 4 radix-8 PEs, and p1p0 is the PE ID: {00, 01, 10, 11} denotes {PE0, PE1, PE2, PE3}. The following describes how the single-port memories are designed so that read and write operations are conflict-free [21][22].

Table 4-4 Read or write address for the processing elements in each stage

Stage     Read or Write Address     Address in Memory Bank
Stage1    b2b1b0b7b6b5b4b3p1p0      {b2b1b0}+{b3p1p0}
Stage2    b7b6b5b2b1b0b4b3p1p0      {b7b6b5}+{b3p1p0}
Stage3    b7b6b5b4b3p1b2b1b0p0      {b7b6b5}+{b1b0p0}
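The bit orders in Table 4-4 can be checked with a short Python sketch. It builds the 10-bit data address from the butterfly counter c (b7...b0) and the PE ID p (p1p0). In every stage, the two fields in the last column are the top three bits and the bottom three bits of that address; the sketch assumes, as an interpretation that the text does not state, that "+" means modulo-8 addition and that the result selects one of the eight single-port banks. The names ADDRESS_BITS, data_address, and bank are hypothetical.

```python
# Bit order of the 10-bit data address, MSB first, copied from Table 4-4.
# "b<i>" is bit i of the butterfly counter c, "p<i>" is bit i of the PE ID p.
ADDRESS_BITS = {
    1: ["b2", "b1", "b0", "b7", "b6", "b5", "b4", "b3", "p1", "p0"],
    2: ["b7", "b6", "b5", "b2", "b1", "b0", "b4", "b3", "p1", "p0"],
    3: ["b7", "b6", "b5", "b4", "b3", "p1", "b2", "b1", "b0", "p0"],
}


def data_address(stage: int, c: int, p: int) -> int:
    """Data address accessed by PE p (0..3) for butterfly counter c (0..255)."""
    addr = 0
    for name in ADDRESS_BITS[stage]:
        source = c if name[0] == "b" else p
        addr = (addr << 1) | ((source >> int(name[1])) & 1)
    return addr


def bank(addr: int) -> int:
    """Hypothetical bank index: (top 3 bits + bottom 3 bits) mod 8."""
    return ((addr >> 7) + addr) & 7


if __name__ == "__main__":
    # Stage 3, first four butterflies: PE0/PE2 hit even banks, PE1/PE3 odd banks.
    for p in range(4):
        print(f"PE{p}:", [bank(data_address(3, c, p)) for c in range(4)])
```

Under this interpretation the stage 3 output (PE0 and PE2 on banks 0, 2, 4, 6; PE1 and PE3 on banks 1, 3, 5, 7) agrees with the bank numbers annotated in Fig. 4.18. Because the bank depends only on the data address, a word written in place in one stage would be found in the same bank by the next stage.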

The read and write operations of stage 1 are shown in Fig. 4.16. According to Fig. 4.16, each PE in stage 1 needs at least 8 pipeline stages for conflict-free access (in general, 8 + 16n stages); however, 8 pipeline stages cannot meet the timing constraint of the processing elements. We therefore choose 24 pipeline stages in stage 1, which keeps the memory reads and writes conflict-free and also lets the processing elements meet the system timing requirement.
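The annotation in Fig. 4.16 (conflict-free pipeline cycles = 8 + 16n) can be checked with a small Python simulation. The sketch assumes, as an interpretation of Table 4-4 rather than something the text states, that a data address is stored in bank (top 3 bits + bottom 3 bits) mod 8; it further assumes that all 4 PEs read butterfly c in cycle c and write the results back in place `latency` cycles later. A depth is accepted only if no single-port bank is touched twice in any cycle. The function names are hypothetical.

```python
def stage1_address(c: int, p: int) -> int:
    """Stage 1 data address b2b1b0 b7b6b5 b4b3 p1p0 (Table 4-4)."""
    return ((c & 7) << 7) | ((c >> 5) << 4) | (((c >> 3) & 3) << 2) | p


def bank(addr: int) -> int:
    """ASSUMED bank mapping: (top 3 bits + bottom 3 bits) mod 8."""
    return ((addr >> 7) + addr) & 7


def conflict_free(latency: int, n_butterflies: int = 256, n_pe: int = 4) -> bool:
    """True if no bank is read or written twice in the same cycle."""
    usage = {}  # cycle -> banks touched by all reads and writes in that cycle
    for c in range(n_butterflies):
        for p in range(n_pe):
            b = bank(stage1_address(c, p))
            usage.setdefault(c, []).append(b)            # read at cycle c
            usage.setdefault(c + latency, []).append(b)  # in-place write
    return all(len(banks) == len(set(banks)) for banks in usage.values())


print([d for d in range(1, 64) if conflict_free(d)])  # [8, 24, 40, 56]
```

In this model only depths of the form 8 + 16n survive, so 24 (n = 1) is the next conflict-free depth after the 8-stage minimum.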

Fig. 4.16 Memory read/write operations for different PEs in stage 1 (figure annotation: conflict-free pipeline cycles = 8 + 16n; 24 cycles are chosen for the pipeline stages)

Fig. 4.17 Memory read/write operations for different PEs in stage 2: (a) butterflies 0~7, (b) butterflies 32~39

The read and write operations of stage 2 are shown in Fig. 4.17. Unlike stage 1, stage 2 changes the order of its read and write operations every 32 butterflies. Each group of 32 butterflies that share the same read/write pattern is called an inner stage, to distinguish it from a stage defined by the FFT algorithm, which is called an outer stage.

For this reason, the pipeline must stall every 32 butterflies until the data of the current inner stage have been written to the memories; only then does reading start for the next 32 butterflies.

As in stage 1, we choose 24 pipeline stages in stage 2 to meet the system timing constraint.
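The need for the inner-stage stall can be illustrated with the same kind of simulation for stage 2. The bank of a data address is assumed to be (top 3 bits + bottom 3 bits) mod 8, an interpretation of Table 4-4 that the text does not state, and the hypothetical parameter `stall` inserts idle cycles after every 32 butterflies before the next inner stage is read. All function names are hypothetical.

```python
def stage2_address(c: int, p: int) -> int:
    """Stage 2 data address b7b6b5 b2b1b0 b4b3 p1p0 (Table 4-4)."""
    return ((c >> 5) << 7) | ((c & 7) << 4) | (((c >> 3) & 3) << 2) | p


def bank(addr: int) -> int:
    """ASSUMED bank mapping: (top 3 bits + bottom 3 bits) mod 8."""
    return ((addr >> 7) + addr) & 7


def conflict_free(latency: int, stall: int, n_butterflies: int = 256,
                  inner: int = 32, n_pe: int = 4) -> bool:
    """True if no bank is accessed twice in a cycle.

    Butterfly c is read at cycle c + (c // inner) * stall, i.e. `stall` idle
    cycles follow every inner stage, and is written `latency` cycles later.
    """
    usage = {}  # cycle -> banks touched in that cycle
    for c in range(n_butterflies):
        t_read = c + (c // inner) * stall
        for p in range(n_pe):
            b = bank(stage2_address(c, p))
            usage.setdefault(t_read, []).append(b)
            usage.setdefault(t_read + latency, []).append(b)
    return all(len(banks) == len(set(banks)) for banks in usage.values())


print(conflict_free(24, stall=0, n_butterflies=32))  # True: one inner stage
print(conflict_free(24, stall=0))                     # False: no stall
print(conflict_free(24, stall=24))                    # True: full drain
```

Within one inner stage a depth of 24 is conflict-free, but once the next inner stage starts reading while the previous one is still writing, two PEs collide on a bank. A stall equal to the pipeline depth, so that each inner stage finishes writing before the next one reads, removes the conflict; the sketch only checks this full-drain case described in the text.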

In stage 3, PE0 and PE1 access the same memory banks as PE2 and PE3, respectively, so the reads and writes of PE2 and PE3 are delayed by one cycle. These operations are shown in Fig. 4.18. Like stage 2, stage 3 must also stall every 32 butterflies. Fig. 4.18 indicates that the conflict-free pipeline depths in stage 3 are 2 + 4n cycles; we choose 22 pipeline stages in stage 3 to meet the system timing constraint.
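For stage 3, a similar sketch shows why PE2 and PE3 are offset by one cycle and how, under its assumptions, the 2 + 4n rule annotated in Fig. 4.18 arises. It checks a single inner stage of 32 butterflies, assumes the bank of a data address is (top 3 bits + bottom 3 bits) mod 8 (an interpretation of Table 4-4), and uses a hypothetical parameter `pe_offset` for the read delay of PE2 and PE3.

```python
def stage3_address(c: int, p: int) -> int:
    """Stage 3 data address b7b6b5 b4b3p1 b2b1b0 p0 (Table 4-4)."""
    return (((c >> 5) << 7) | (((c >> 3) & 3) << 5) | ((p >> 1) << 4)
            | ((c & 7) << 1) | (p & 1))


def bank(addr: int) -> int:
    """ASSUMED bank mapping: (top 3 bits + bottom 3 bits) mod 8."""
    return ((addr >> 7) + addr) & 7


def conflict_free(latency: int, pe_offset: int, n_butterflies: int = 32,
                  n_pe: int = 4) -> bool:
    """Check one inner stage (32 butterflies) for bank conflicts.

    PE0/PE1 read butterfly c at cycle c; PE2/PE3 read it `pe_offset`
    cycles later. Every PE writes its result `latency` cycles after reading.
    """
    usage = {}  # cycle -> banks touched in that cycle
    for c in range(n_butterflies):
        for p in range(n_pe):
            t_read = c + (pe_offset if p >= 2 else 0)
            b = bank(stage3_address(c, p))
            usage.setdefault(t_read, []).append(b)
            usage.setdefault(t_read + latency, []).append(b)
    return all(len(banks) == len(set(banks)) for banks in usage.values())


print([d for d in range(1, 25) if conflict_free(d, pe_offset=0)])  # []
print([d for d in range(1, 25) if conflict_free(d, pe_offset=1)])  # [2, 6, 10, 14, 18, 22]
```

Without the offset, PE0 and PE2 collide on the same bank in the very first cycle, whatever the depth. With the one-cycle offset, only depths of the form 2 + 4n remain conflict-free in this model, and 22 (n = 5) is among them.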

Fig. 4.18 Memory read/write operations for different PEs in stage 3 (figure annotations: memory bank numbers read and written by each PE, with PE0 and PE2 accessing banks 0, 2, 4, 6 and PE1 and PE3 accessing banks 1, 3, 5, 7; conflict-free pipelined cycles = 2 + 4n; 22 cycles are chosen for the pipeline stages)

Based on this analysis of the commutator operations, the state diagram of the proposed FFT/IFFT processor is shown in Fig. 4.19. The state machine starts in the IDLE state and waits for the fft_start signal to begin the FFT/IFFT computation. Each stage has 5 states: Rd, Tw, Wr, Wait_Tw, and Wait_Wr. In the Rd state, the PEs read input data from the memories, and the memory read-address counter is started.

In the Tw state, the PE data are multiplied by twiddle factors, and the twiddle-factor ROM read-address counter is started. In the Wr state, the PEs write their output data to the memories, and the memory write-address counter is started. The last 2 states, Wait_Tw and Wait_Wr, wait until the data already read into the PEs have been multiplied by the twiddle factors and written to the memories, respectively.

Computation begins in the stage 1 Rd state. After the appropriate number of pipeline cycles, the state advances in order through stage 1 Tw, stage 1 Wr, stage 1 Wait_Tw, and stage 1 Wait_Wr. When stage 1 Wait_Wr completes, the state moves on to the 5 states of the next stage.

As the analysis of the read and write operations in Fig. 4.16, Fig. 4.17, and Fig. 4.18 shows, stages 2 and 3 have more stall cycles than stage 1 because their operation order changes every 32 butterflies. Therefore, two signals control the transitions out of the last 3 states in stages 2 and 3. The outer-stage signal, outer_stage_inc, is triggered every 255 butterflies; the inner-stage signal, inner_stage_inc, is triggered every 32 butterflies. Outer and inner stages are as defined above. Consequently, stages 2 and 3 contain a loop whose transitions depend on whether the current state ends an inner stage or an outer stage.
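To make the loop concrete, the following Python sketch lists the states visited in one outer stage and their durations. It is a simplified model, not the processor's control logic. Its assumptions: one butterfly is read per cycle; the 8-bit counter b7...b0 covers 256 butterflies per outer stage; tw_delay (a hypothetical value, since the text does not give it) is the number of cycles before the twiddle-factor ROM counter starts; wr_delay is the pipeline depth (24 cycles for stages 1 and 2); and every group of 32 butterflies in stages 2 and 3 drains completely before the next group is read. Stage 1 is modeled as a single group of 256 butterflies. The function names are hypothetical.

```python
from typing import List, Tuple


def stage_trace(n_butterflies: int, inner: int, tw_delay: int,
                wr_delay: int) -> List[Tuple[str, int]]:
    """Hypothetical (state, cycles) sequence for one outer stage."""
    assert 0 < tw_delay < wr_delay <= inner and n_butterflies % inner == 0
    groups = n_butterflies // inner
    trace = []
    for g in range(groups):
        trace += [
            ("Rd", tw_delay),                  # reading only
            ("Tw", wr_delay - tw_delay),       # reading + twiddle multiply
            ("Wr", inner - wr_delay),          # reading + twiddle + writing
            ("Wait_Tw", tw_delay),             # reads done, twiddles draining
            ("Wait_Wr", wr_delay - tw_delay),  # writes draining
        ]
        # Zero-length marker for the signal that leaves Wait_Wr:
        # loop back to Rd (inner) or move to the next stage (outer).
        trace.append(("inner_stage_inc" if g < groups - 1 else "outer_stage_inc", 0))
    return trace


def total_cycles(trace: List[Tuple[str, int]]) -> int:
    return sum(cycles for _, cycles in trace)


# tw_delay = 8 is a made-up value; 24 is the stage 1/2 pipeline depth.
print(total_cycles(stage_trace(256, 256, 8, 24)))  # stage 1: 280
print(total_cycles(stage_trace(256, 32, 8, 24)))   # stage 2: 448
```

Under these assumptions each inner stage costs 32 + wr_delay cycles, so the inner-stage stalls make stage 2 longer than stage 1 (448 versus 280 cycles here). The one-cycle PE2/PE3 offset of stage 3 is ignored.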

In summary, the commutator's state machine keeps the read and write operations on the different memory banks conflict-free by stalling between inner stages and between outer stages.

Fig. 4.19 State diagram of FFT/IFFT processor (diagram labels: IDLE, fft_start, Stage1, stage1/stage2/stage3 tw start, stage1/stage2/stage3 wr start, inner stage, outer stage)
