Fast Fourier Transform v3.1 IP

Chapter 3 Embedded System Implementation of the FBP Algorithm

3.3 Overview of the IP Cores Used

3.3.2 Fast Fourier Transform v3.1 IP

The FBP component must also perform FFT and IFFT operations. Xilinx provides an IP core for both, so we do not need to design the FFT and IFFT circuits from scratch.

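The main features of the Xilinx Fast Fourier Transform v3.1 IP are as follows:

• The IP can be switched between FFT and IFFT operation at run time.

• The supported transform sizes for FFT and IFFT are N = 2^m, with m = 3 to 16.

• The precision b_x of each input sample can be 8, 12, 16, 20, or 24 bits.

• The precision b_w of the phase factors used in the FFT computation can be 8, 12, 16, 20, or 24 bits.

• Three arithmetic modes are available:
  - Unscaled (full-precision) fixed point
  - Scaled fixed point
  - Block floating point

• Values produced during the butterfly computations can be reduced to the final bit width by either rounding or truncation.

• Three hardware architectures are available:
  - Pipelined, Streaming I/O
  - Radix-4, Burst I/O
  - Radix-2, Minimum Resource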
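The feature list above can be read as a set of constraints on a core configuration. The following is a minimal sketch of such a check, not a Xilinx tool or API: the class name `FftCoreConfig`, its field names, and the option strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Allowed values taken from the feature list above; the string names are hypothetical.
VALID_WIDTHS = (8, 12, 16, 20, 24)
VALID_SCALING = ("unscaled", "scaled", "block_floating_point")
VALID_ARCH = ("pipelined_streaming", "radix4_burst", "radix2_minimum")


@dataclass(frozen=True)
class FftCoreConfig:
    """Hypothetical container for the options described in the feature list."""
    point_size: int
    data_width: int
    phase_width: int
    scaling: str
    architecture: str

    def problems(self):
        """Return a list of reasons why this configuration is not allowed."""
        issues = []
        m = self.point_size.bit_length() - 1
        # Point size must be N = 2**m with m between 3 and 16.
        if self.point_size <= 0 or self.point_size != 1 << m or not 3 <= m <= 16:
            issues.append("point size must be 2**m with 3 <= m <= 16")
        if self.data_width not in VALID_WIDTHS:
            issues.append("data width must be 8, 12, 16, 20 or 24 bits")
        if self.phase_width not in VALID_WIDTHS:
            issues.append("phase factor width must be 8, 12, 16, 20 or 24 bits")
        if self.scaling not in VALID_SCALING:
            issues.append("unknown arithmetic mode")
        if self.architecture not in VALID_ARCH:
            issues.append("unknown architecture")
        return issues
```

For example, `FftCoreConfig(256, 24, 24, "unscaled", "radix4_burst").problems()` returns an empty list, which matches the settings chosen later in this section.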

The schematic symbol of the Fast Fourier Transform v3.1 IP is shown below. Signals drawn in black are used in our design; signals drawn in gray are not.

Figure 3-10 Core Schematic Symbol

| Port Name | Port Width | Direction | Description |
|---|---|---|---|
| XN_RE | bxn | Input | Input data bus: Real component (bxn = 8, 12, 16, 20, 24). |
| XN_IM | bxn | Input | Input data bus: Imaginary component (bxn = 8, 12, 16, 20, 24). |
| START | 1 | Input | FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the Burst I/O architecture). For continuous data processing, START will begin data loading, which proceeds directly to transform calculation and then data unloading. |
| UNLOAD | 1 | Input | Result unloading (Active High): For the Burst I/O architecture, UNLOAD will start the unloading of the results in normal order. The UNLOAD port is not necessary for the Pipelined, Streaming I/O architecture. |
| NFFT | 5 | Input | Point size of the transform: NFFT can be the size of the transform or any smaller point size. For example, a 1024-point FFT can compute point size 1024, 512, 256, and so on. The value of NFFT is log₂(point size). This port is optional for the Pipelined, Streaming I/O architecture. |
| NFFT_WE | 1 | Input | Write enable for NFFT (Active High): Asserting NFFT_WE will automatically cause the FFT core to stop all processes and to initialize the state of the core. This port is optional for the Pipelined, Streaming I/O architecture. |
| FWD_INV | 1 | Input | Control signal that indicates whether a forward FFT or an inverse FFT is performed. When FWD_INV = 1, a forward transform is computed. When FWD_INV = 0, an inverse transform is computed. |
| FWD_INV_WE | 1 | Input | Write enable for FWD_INV (Active High). |
| SCLR | 1 | Input | Master reset (Active High): Optional port. |
| CLK | 1 | Input | Clock. |
| XK_RE[(B-1):0] | bxk | Output | Output data bus: Real component. |
| XK_IM[(B-1):0] | bxk | Output | Output data bus: Imaginary component. |
| XN_INDEX | log₂(point size) | Output | Index of input data. |
| XK_INDEX | log₂(point size) | Output | Index of output data. |
| RFD | 1 | Output | Ready for data (Active High): RFD is High during the load operation. |
| BUSY | 1 | Output | Core activity indicator (Active High): This signal will go High while the core is computing the transform. |
| DV | 1 | Output | Data valid (Active High): This signal is High when valid data is presented at the output. |
| EDONE | 1 | Output | Early done strobe (Active High): EDONE goes High one clock cycle immediately prior to DONE going active. |
| DONE | 1 | Output | FFT complete strobe (Active High): DONE will transition High for one clock cycle at the end of the transform calculation. |

Table 3-3 Fast Fourier Transform v3.1 IP Core Pinout
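The pinout states that NFFT carries log₂(point size) on a 5-bit port and that FWD_INV = 1 selects the forward transform. The sketch below shows how those two control values could be derived in software before being driven onto the ports. The function names and the `max_point_size` parameter are hypothetical and are not part of the Xilinx core.

```python
def nfft_value(point_size, max_point_size=256):
    """Return log2(point_size), the value placed on the 5-bit NFFT port.

    max_point_size is a hypothetical parameter standing for the transform
    size the core was generated for; NFFT may select that size or a smaller one.
    """
    m = point_size.bit_length() - 1
    if point_size < 8 or point_size != 1 << m:
        raise ValueError("point size must be a power of two, at least 8")
    if point_size > max_point_size:
        raise ValueError("point size exceeds the size the core was generated for")
    return m


def fwd_inv_bit(inverse):
    """Return the FWD_INV level: 1 for a forward transform, 0 for an inverse."""
    return 0 if inverse else 1
```

For a 256-point transform, `format(nfft_value(256), "05b")` gives the bit pattern `01000`.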

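Figure 3-10 highlights the signals that we use. The Xilinx IP can be configured to perform either an FFT or an IFFT. The FFT is an efficient algorithm for computing the Discrete Fourier Transform (DFT). Note, however, that the function the FFT implements is the DFT, not the IDFT (inverse Discrete Fourier Transform) that corresponds to Equation (3.1).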

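The DFT and its inverse are defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1$$

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}, \qquad n = 0, 1, \ldots, N-1$$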
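To make the distinction between the forward DFT and the IDFT concrete, here is a minimal pure-Python sketch that evaluates both sums directly. It uses the common textbook convention (negative exponent for the forward transform, positive exponent and a 1/N factor for the inverse); whether the IP applies the 1/N factor is not stated in this section, so that part is an assumption. The function name `dft` is illustrative.

```python
import cmath


def dft(x, inverse=False):
    """Evaluate the DFT (or IDFT) of sequence x by direct O(N^2) summation."""
    n = len(x)
    sign = 1 if inverse else -1  # forward uses e^{-j...}, inverse uses e^{+j...}
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n)
                  for t in range(n))
        # Textbook convention (assumed here): the inverse is divided by N.
        out.append(acc / n if inverse else acc)
    return out
```

Applying `dft` and then `dft(..., inverse=True)` recovers the original sequence up to floating-point rounding, which is a quick way to check the sign and scaling conventions.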

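The three hardware architectures offered by the IP differ as follows:

• Pipelined, Streaming I/O: this architecture processes data continuously, so data input, transform computation, and data output can all proceed at the same time. It takes the least time to perform an FFT or IFFT, but it also requires the most hardware resources when the IP is synthesized. Internally it uses radix-2 butterflies.

• Radix-4, Burst I/O: in this architecture an FFT or IFFT runs in two phases, one for data input and output and one for the transform computation. It requires fewer hardware resources than the Pipelined, Streaming I/O architecture, but the computation takes longer. Internally it uses radix-4 butterflies.

• Radix-2, Minimum Resource: like Radix-4, Burst I/O, this architecture separates data input and output from the transform computation. Because it uses radix-2 butterflies internally, it has the longest computation time but also requires the fewest hardware resources.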
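The radix-2 butterfly mentioned above can be illustrated with a short recursive decimation-in-time FFT. This is a textbook sketch in floating point, not a description of how the Xilinx core is built internally; the function name `fft_radix2` is illustrative.

```python
import cmath


def fft_radix2(x):
    """Forward FFT of a power-of-two-length sequence using radix-2 butterflies."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # phase (twiddle) factor
        t = w * odd[k]
        out[k] = even[k] + t           # butterfly: sum output
        out[k + n // 2] = even[k] - t  # butterfly: difference output
    return out
```

Each butterfly adds and subtracts two values, so magnitudes can grow by up to one bit per radix-2 stage, which is the reason the IP offers several arithmetic modes for handling bit growth.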

In addition, values may grow during the radix-4 or radix-2 butterfly stages of an FFT or IFFT, so overflow has to be considered. The IP therefore offers three arithmetic modes:

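• Unscaled (full-precision) fixed point: no scaling is applied at any point in the computation. In this mode overflow can occur if the data bit width is insufficient.

• Scaled fixed point: an FFT or IFFT consists of several radix-4 or radix-2 butterfly stages, and the data are scaled down at every stage by a factor of 1, 2, 4, or 8 […]

[…] the Fast Fourier Transform v3.1 IP performs the FFT, so the Pipelined, Streaming I/O architecture cannot be matched to our design. We therefore choose the Radix-4, Burst I/O architecture, which separates data input and output from the Fourier transform computation into two phases.

The data we process are pixel gray levels in the range 0 to 255, which need only 8 bits. However, as mentioned earlier, we use fixed-point arithmetic to handle floating-point data, so values are multiplied by a power of two to preserve the fractional bits produced during computation. Allowing also for possible overflow during computation, we set both the input data width and the phase factor width to 24 bits. The IP then automatically sets the output data width to 33 bits. To preserve the full information carried by the computed values, we choose the Unscaled (full-precision) fixed point mode, which avoids losing information through scaling.

As for the transform size, we noted earlier that the original 128 pixel gray levels are zero-padded before the FFT, so we set the number of points to 256. This completes the parameter settings for the Fast Fourier Transform v3.1 IP.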
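The parameter choices above can be checked with a small sketch. It assumes that in unscaled mode the output width equals the input width plus log₂(point size) plus 1, which reproduces the 33-bit output stated in the text for 24-bit inputs and 256 points. The function names and the choice of 15 fractional bits in the usage note are hypothetical and are not taken from the source.

```python
def zero_pad(samples, length):
    """Append zeros so that the sequence has exactly `length` entries."""
    samples = list(samples)
    if len(samples) > length:
        raise ValueError("sequence is longer than the padded length")
    return samples + [0] * (length - len(samples))


def to_fixed(value, frac_bits):
    """Scale a real value by 2**frac_bits and round it to an integer."""
    return int(round(value * (1 << frac_bits)))


def unscaled_output_width(input_width, point_size):
    """Assumed bit growth in unscaled mode: input width + log2(N) + 1."""
    return input_width + (point_size.bit_length() - 1) + 1
```

With these helpers, `unscaled_output_width(24, 256)` evaluates to 33, and `zero_pad(range(128), 256)` yields the 256 samples fed to the core. As an illustration only, a gray level of 255 scaled with 15 fractional bits becomes 8355840, which still fits in a signed 24-bit word.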

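The timing diagrams we referred to are shown below:

Figure 3-11 Waveform at the start of data input to the IP

Figure 3-12 Waveform of data output from the IP

Note that after the start signal is driven to 1, data can be input to the Fast Fourier Transform v3.1 IP only from the fourth clock cycle, counting from the clock cycle that follows. When the dv signal is 1, the IP begins to output the Fourier-transformed data.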
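The input timing rule can be expressed as a simple cycle schedule. The sketch below assumes that start is sampled at clock cycle `start_cycle`, that the first sample is presented four cycles later, and that one sample is loaded per clock after that. The function name and the one-sample-per-clock assumption are illustrative and should be checked against Figure 3-11.

```python
def load_schedule(start_cycle, num_samples, latency=4):
    """Map each input sample index to the clock cycle on which it is presented.

    latency is the assumed number of clocks between start going high and
    the first sample being accepted (four, per the text above).
    """
    first = start_cycle + latency
    return {i: first + i for i in range(num_samples)}
```

For a 256-point load with start asserted at cycle 0, sample 0 is presented at cycle 4 and sample 255 at cycle 259 under these assumptions.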

Source document: 區塊式濾波反投影演算法設計與FPGA實作(II) (Block-Based Filtered Back-Projection Algorithm Design and FPGA Implementation (II)), pp. 32-39