Scaling Method - A Precision Optimization Technique for Fast Fourier Transform Processor Design Based on Static Probability Model Analysis

Chapter 1 Introduction

2.3 Scaling Method

Because of the addition and subtraction operations in FFT computations, the value range increases from stage to stage. One solution to avoid possible overflows in a fixed-point FFT design is to increase the wordlength [11]. However, an increased wordlength has several drawbacks in FFT implementations. First, larger storage is required to hold the data, which increases both chip area and power consumption. Second, a longer wordlength worsens the critical-path timing of the arithmetic units, which is undesirable in high-throughput FFT designs. Most importantly, the wordlength is fixed in a memory-based FFT architecture, which cannot accommodate different wordlengths from stage to stage. Consequently, many scaling methods have been proposed for FFT processors to scale the data for wordlength reduction. The scaling schemes for fixed-point FFT processors can be roughly divided into two categories: 1) static scaling methods and 2) dynamic scaling methods.

Oppenheim et al. [12, 20] suggest a static scaling procedure that is easy to understand, simple to implement, and the most often used. Since the maximum magnitude increases by no more than a factor of 2 from stage to stage, we can prevent overflow by incorporating an attenuation of 1/2 at the input to each butterfly, that is, by adding 1 bit to the integer part and removing 1 bit from the fraction part, as shown in Fig. 8. In this case, the SQNR may not be as good as that of the dynamic scaling method, but the hardware is very simple to implement.

Fig. 8 Butterfly showing scaling by 1/2 at the input

We can further improve the method with a slight modification. Compared to scaling at the input, scaling at the output of each butterfly gives a better SQNR. Since the original method induces the noise at the input of each butterfly, accuracy is lost from the beginning. We modify the butterfly of Fig. 8 to that of Fig. 9, where the output is noiseless before scaling. Fig. 10 shows the simulation results of the two methods. We can see that scaling at the output is always the better choice.

Fig. 9 Butterfly showing scaling by 1/2 at the output

Fig. 10 Comparison of different scaling positions, wordlength = 12 bits

In [13], Ramakrishnan et al. consider a special case of FFT design for OFDM receivers.

The authors exploit the Gaussian nature of OFDM signals to predict the growth of the value range of signals at each stage and decide the scaling behavior accordingly. They suggest adding 1 integer bit every two stages instead of every stage. However, the model of Gaussian-distributed inputs is not suitable for the general case, such as the uniformly distributed input that is most commonly assumed in FFT analysis [11, 12]. Furthermore, the method gives good results only in a small range of σ of the Gaussian distribution. Fig. 11 shows the SQNR for different σ with the two methods.

Fig. 11 The SQNR for different σ with a 12-bit, 8192-point FFT

Another way to perform static scaling optimization is to run time-consuming, pattern-dependent simulations to find feasible number formats for each stage. Since exhaustive simulation is impractical in many cases, designers usually pick a few configurations intuitively, evaluate them, and choose the best one among them.

The dynamic scaling method uses a shared-exponent concept, which not only reduces the wordlength in FFT processors but also achieves a good SQNR. The block floating point (BFP) algorithm [1, 14, 20], which is one of the dynamic scaling approaches, employs intermediate buffers to store the output data and detects the maximum value to decide the exponent for each buffer. Unfortunately, the intermediate buffers and the exponent storage cause a large area overhead. In addition, the intermediate buffer accesses and data detections introduce extra processing latency and power consumption. Because of the increased complexity of the dynamic scaling method, the static scaling approach is preferred for many FFT designs in practice.

We propose a static scaling optimization method to maximize the precision in terms of SQNR. Its hardware complexity is the same as that of the traditional static scaling method, while its precision comes close to that of the dynamic scaling method. Our method


truncation and saturation operations. It can suggest the number format for each stage in a short time, and also can handle different FFT sizes, FFT algorithms, wordlengths, and distributions of input signals.

Chapter 3 The Proposed Approach

This chapter has four sections. The first describes the motivation for static scaling optimization, and the second formulates the problem. In the third section, we illustrate the probability model and the distributions derived through the FFT computation. Last, we present the proposed static scaling optimization method. The purpose of this thesis is to fix the scaling behavior at each stage so that the SQNR is optimized, and to do so in a short time.

3.1 Motivation

With the approach of [12], the integer-part bit would be increased by 1 for each butterfly.

Fig. 12 shows the general form of the scaling behavior for each stage with the radix-2 FFT algorithm. Note that a stage is one radix-2 butterfly computation, and the format <m, n> denotes a 2's complement binary number with m bits for the integer part and n bits for the fraction part, where m + n = WL (the total wordlength). The number of representable values is 2^WL. When m is increased by 1, the range of representable values doubles; n is then decreased by 1, so the precision decreases. Take the radix-2 64-point FFT with a <1, 11> input format as an example: it has log2 64 = 6 stages, so the output format would be <7, 5>.

Stage s: <s, WL-s> → <s+1, WL-s-1>

Fig. 12 The scaling behavior for each stage with the approach of [12]
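As an illustration of this format bookkeeping, the following minimal Python sketch lists the <m, n> format at the input and after each radix-2 stage when one integer bit is added per stage. The function name `traditional_formats` and its parameters are hypothetical names chosen for this example; they do not come from [12].

```python
def traditional_formats(n_points, wordlength, input_int_bits=1):
    """List (m, n) at the input and after every radix-2 stage,
    assuming one integer bit is added per stage.

    n_points is assumed to be a power of 2.
    """
    stages = n_points.bit_length() - 1  # log2 of a power of 2
    return [
        (input_int_bits + s, wordlength - input_int_bits - s)
        for s in range(stages + 1)
    ]


formats = traditional_formats(64, 12)
print(formats[0], formats[-1])  # (1, 11) (7, 5)
```

The same call with 256 points and a 12-bit wordlength ends at <9, 3> after 8 stages.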

However, in a simulation of 21.6M sets of 64-point FFTs, which takes 12 hours to run, the probability that 7 bits are needed for the integer part is about 0%. That is, we may use fewer bits for the integer part to acquire a better SQNR, since more bits are reserved for the fraction part. To handle the possible overflows, we apply saturation arithmetic, a common technique in DSP computation, to reduce the noise.

Saturation arithmetic limits a number to a fixed range between a minimum and a maximum value. For example, if the valid range is from -8 to 7 (4 bits for the integer part), overflow occurs when we compute 4'b0100 (4) + 4'b0101 (5) = 4'b1001 (-7). The error would be 9 - (-7) = 16, since the correct answer is 9. If we apply saturation arithmetic, the result is saturated to 7 whenever the correct answer is larger than 7, so the error is 9 - 7 = 2, which is much smaller than 16.
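The 4-bit example can be reproduced with a short Python sketch that compares wrap-around addition with saturating addition. The helper names `wrap_add` and `saturating_add` are hypothetical and serve only to illustrate the arithmetic; they are not taken from any hardware description in this thesis.

```python
def wrap_add(a, b, bits=4):
    """Two's complement addition that wraps around on overflow."""
    total = (a + b) & ((1 << bits) - 1)      # keep the low `bits` bits
    if total >= 1 << (bits - 1):             # sign bit set: negative value
        total -= 1 << bits
    return total


def saturating_add(a, b, bits=4):
    """Addition clamped to the representable range [-2^(bits-1), 2^(bits-1) - 1]."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


print(wrap_add(4, 5))        # -7, error 9 - (-7) = 16
print(saturating_add(4, 5))  # 7, error 9 - 7 = 2
```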

We now illustrate static scaling optimization with an example. Fig. 13 shows the traditional scaling behavior for a 256-point FFT, whose SQNR is 35.39 dB. If we modify the output format at stage 8 from <9, 3> to <8, 4>, the SQNR increases to 37.03 dB. If we further modify the output format at stage 7 from <8, 4> to <7, 5>, the SQNR increases to 38.47 dB. However, if we then also modify the output format at stage 2 from <3, 9> to <2, 10>, the SQNR drops to 17.82 dB. The example shows that the format of each stage has to be chosen appropriately to get the best SQNR.

(Figure content: Stage 1 through Stage 5, with the formats <2, 10>, <3, 9>, <4, 8>, <5, 7>.)

Fig. 13 The scaling behavior for 256-point FFT with the approach of [12]

Traditional static scaling optimization methods rely on time-consuming simulations to fix the scaling behavior at each stage [13]. However, it takes about 80 hours to simulate only 10k sets of 8192-point FFTs for one configuration. For the radix-2 FFT algorithm, an 8192-point FFT has log2 8192 = 13 stages, and each stage must decide whether or not to add 1 integer bit, so 2^13 = 8192 possible configurations exist. Simulating all of them would take about 8192 × 80 hours, roughly 75 years, which is impractical.

As a result, in this thesis we propose a static scaling optimization approach that can fix the scaling behavior in less than 2 minutes.

3.2 Problem Formulation

We formulate the precision optimization problem as follows: Given the FFT size, a radix-r FFT algorithm where r is a power of 2, the wordlength for both I/O and storage, and the input probability distribution, the static scaling optimization problem is to fix the number format at each stage so as to maximize the SQNR of the whole fixed-point FFT computation.

3.3 Probability Model

3.3.1 Probability Mass Function

In order to do the analysis, we model the input as a discrete random variable (RV) that is real and takes a finite number of values. A discrete random variable X has an associated probability mass function (PMF) [27], denoted p_X, which gives the probability of each numerical value that the random variable can take. In particular, if x is any possible value of X, the probability mass of x, denoted p_X(x), is the probability of the event {X = x} (P({X = x}) for short) consisting of all outcomes that give rise to a value equal to x:

$$p_X(x) = P(\{X = x\}) \tag{3.1}$$

Note that

$$\sum_{x} p_X(x) = 1 \tag{3.2}$$

where in the summation above, x ranges over all the possible numerical values of X.

Since the input of an FFT processor is a fixed-point complex number, the wordlength (WL) of both the real and imaginary parts of the signals is fixed. That is, the number of representable values is finite: it is restricted to 2^WL for a 2's complement binary number.

For example, a 2-bit number in <1, 1> format has 2^2 = 4 representable values, which are {-1, -0.5, 0, 0.5}. As a result, the PMF can fully describe the behavior of signals in a fixed-point design in terms of the probability of each representable value.
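The set of representable values for a given format can be enumerated directly. The sketch below is a minimal illustration; the function name `representable_values` is a hypothetical name, and it assumes that the m integer bits of <m, n> include the sign bit, which is consistent with the <1, 1> example.

```python
def representable_values(m, n):
    """All values of an <m, n> two's complement format.

    Assumption: m integer bits (sign bit included) and n fraction bits,
    so the range is [-2^(m-1), 2^(m-1)) with a step of 2^-n.
    """
    step = 2.0 ** -n
    lowest = -(2.0 ** (m - 1))
    count = 2 ** (m + n)
    return [lowest + i * step for i in range(count)]


print(representable_values(1, 1))  # [-1.0, -0.5, 0.0, 0.5]
```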

Besides, computing either the real part or the imaginary part of the input signals suffices to represent the total result in terms of SQNR, as proved below.

Theorem 1

In FFT computation, computing either real part or imaginary part of input signals can represent the total result in terms of SQNR if the real part and imaginary part of the input signals have the same probability distribution.

Proof of Theorem 1:

For an N-point FFT, x[n] is the input sequence with N elements and y[k] is the FFT of x[n], where

$$y[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{nk} \tag{3.3}$$

We first prove that

$$\mathrm{Power}_{signal} = 2 \cdot \mathrm{Power}_{signal\_real} \tag{3.4}$$

For each element in y, denoted y[k], we have

$$\mathrm{Power}_y = |y[k]|^2 = \left|\sum_{n=0}^{N-1} x[n]\,W_N^{nk}\right|^2 = \left(\sum_{n=0}^{N-1} |x[n]|\,\left|W_N^{nk}\right|\right)^2 \tag{3.5}$$

Since W_N^{nk} is on the unit circle, its magnitude is unity, that is

$$\left|W_N^{nk}\right| = 1 \tag{3.6}$$

Equation (3.5) can be rewritten as

$$|y[k]|^2 = \left(\sum_{n=0}^{N-1} |x[n]|\right)^2 = \left(\sum_{n=0}^{N-1} \left|x_I[n] + j \cdot x_Q[n]\right|\right)^2 \tag{3.7}$$

where x_I[n] is the real part and x_Q[n] is the imaginary part.

Since x_I[n] and x_Q[n] have the same distribution, on average, Eq. (3.7) becomes

$$|y[k]|^2 = \left(\sum_{n=0}^{N-1} \sqrt{2\,x_I[n]^2}\right)^2 = 2 \cdot \left(\sum_{n=0}^{N-1} |x_I[n]|\right)^2 \tag{3.8}$$

And from the computation with the real part of the input sequence, y_real, we have

(3.4).

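It can likewise be shown for the noise power that

$$\mathrm{Power}_{noise} = 2 \cdot \mathrm{Power}_{noise\_real} \tag{3.10}$$

From Eq. (3.4) and (3.10), we can derive that

$$\mathrm{SQNR}_{total} = 10 \log_{10}\left(\frac{\mathrm{Power}_{signal}}{\mathrm{Power}_{noise}}\right) \text{dB} = 10 \log_{10}\left(\frac{2 \cdot \mathrm{Power}_{signal\_real}}{2 \cdot \mathrm{Power}_{noise\_real}}\right) \text{dB} = \mathrm{SQNR}_{real} \tag{3.11}$$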

End of proof

By Theorem 1, we need only analyze the real part of the input signals to represent the complex signals when computing the SQNR.

In the following discussion, we assume that the real part (or the imaginary part) of each input signal of the fixed-point FFT computation is a discrete random variable, that these random variables are independent of each other, and that each is uniformly distributed in [-1, 1). Fig. 14 shows the PMF of an input random variable with the uniform distribution and a 6-bit wordlength.

Fig. 14 The PMF of an input random variable with wordlength = 6 bits

3.3.2 Derived Distribution for the FFT Computation

In this section, we consider functions Y = g(X) of a discrete random variable X. Given the PMF of X, we discuss techniques to calculate the PMF of Y (also called a derived distribution) [27]. To discuss the derived distribution of the FFT computation, we focus on the special case where the function g is the FFT computation, denoted fft. A flow graph representing the radix-2 butterfly computation, which is the basic computation of the FFT, is shown in Fig. 15. We can observe that the butterfly consists of two operations: the addition/subtraction operation and the twiddle factor multiplication operation. If we can handle the derived distributions of these two operations, we can derive the distribution of the output of the FFT computation at each stage.


Fig. 15 The butterfly computation

For the addition operation, we now consider an example of a function of two random variables, namely, the case where Z = A + B, for independent A and B with PMFs pA and pB, respectively. Then for any integer z, we have

$$\begin{aligned}
p_Z(z) &= P(A + B = z) \\
&= \sum_{\{(a,b)\,|\,a+b=z\}} P(A = a, B = b) \\
&= \sum_{a} P(A = a, B = z - a) \\
&= \sum_{a} p_A(a)\,p_B(z - a)
\end{aligned} \tag{3.12}$$

The resulting PMF p_Z is called the convolution of the PMFs of A and B. See Fig. 16 for an illustration. The subtraction operation can be shown to be handled in the same way as the addition operation.

Fig. 16 The calculation of the addition of two independent uniform random variables
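The convolution in Eq. (3.12) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this thesis: the function name `add_pmf` and the dictionary representation of a PMF are assumptions made for the example, and exact fractions are used so that the probabilities are easy to read.

```python
from fractions import Fraction


def add_pmf(pmf_a, pmf_b):
    """PMF of Z = A + B for independent A and B.

    Each PMF is a dict {value: probability}; the result follows Eq. (3.12).
    """
    pmf_z = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            pmf_z[a + b] = pmf_z.get(a + b, 0) + pa * pb
    return pmf_z


# Uniform <1, 1> input: values {-1, -0.5, 0, 0.5}, each with probability 1/4.
uniform = {v: Fraction(1, 4) for v in (-1.0, -0.5, 0.0, 0.5)}
pmf_sum = add_pmf(uniform, uniform)
for value in sorted(pmf_sum):
    print(value, pmf_sum[value])
```

The sum takes seven values from -2 to 1 in steps of 0.5, with a triangular PMF that peaks at -0.5 with probability 1/4.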

Since the twiddle factor W has unit magnitude, that is, |W| = 1, this scalar has no effect when computing the derived distribution of $Y_{real}[k] = \sum_{n=0}^{N-1} x_I W_N^{nk}$ or $Y_{imag}[k] = \sum_{n=0}^{N-1} x_Q W_N^{nk}$. According to Theorem 1, either Y_real or Y_imag can represent the total FFT computation in


propagation of the addition operation. Fig. 17 shows the derived distribution in the 8-point FFT computation. The x-axis is the representable value and the y-axis is the probability of each representable value.

Fig. 17 The derived distribution in the signal flow of the FFT computation

3.3.3 Saturation Analysis

To model the behavior of saturation, we limit the representable value of output between a maximum and a minimum value which are decided by the given output format. For example, if the integer part of the given format is 4 bits, the maximum value is near 8 (smaller than 8 by a fractional unit), and the minimum value is exactly -8. The probability of the overflowed values which are beyond the limited value would be added to the maximum or the minimum value, such that the PMF of output with the behavior of saturation is modeled. The PMF with the behavior of saturation is illustrated in Fig. 18.

Fig. 18 (a) The PMF before saturation with 5-bit integer part (b) The PMF after saturation to 4-bit integer part
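The saturation model described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `saturate_pmf`, the dictionary representation of a PMF, and the example PMF (a triangular distribution over <2, 1> values) are all chosen for the example and are not taken from the thesis.

```python
def saturate_pmf(pmf, int_bits, frac_bits):
    """Fold the probability of out-of-range values onto the nearest limit.

    The limits are those of an <int_bits, frac_bits> format:
    minimum -2^(int_bits-1), maximum 2^(int_bits-1) - 2^-frac_bits.
    Returns the saturated PMF and the overflow probability.
    """
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - 2.0 ** -frac_bits
    out = {}
    overflow = 0.0
    for value, prob in pmf.items():
        clipped = min(hi, max(lo, value))
        out[clipped] = out.get(clipped, 0.0) + prob
        if clipped != value:
            overflow += prob
    return out, overflow


# Assumed example PMF over <2, 1> values, saturated to the <1, 1> range.
pmf = {-2.0: 1/16, -1.5: 2/16, -1.0: 3/16, -0.5: 4/16,
       0.0: 3/16, 0.5: 2/16, 1.0: 1/16}
saturated, overflow = saturate_pmf(pmf, 1, 1)
print(saturated)  # mass of -2 and -1.5 moves to -1; mass of 1 moves to 0.5
print(overflow)   # 0.25
```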


3.3.4 SQNR Calculation

In order to calculate the SQNR of the FFT output, we need to know the power of the fixed-point output and that of the noise-free output. With the method in Section 3.3.2, we can get the output derived distribution, whose PMF is denoted p, under the assumption that the computation is noise-free. That is, the noise from multiplication and addition/subtraction is ignored. However, the output wordlength is fixed to the input wordlength, so the output cannot represent as many values as appear in the output derived distribution. Let x' denote a value in the output derived distribution. A signal x' is truncated to x, a value that is representable at the output. Each such value then induces a truncation error of x' - x. Note that the saturation operation also induces error, and its power is calculated in the same way as that of the truncation error: we take the overflowed value before saturation as x', and x is the value to which x' is saturated.

For example, if the output derived distribution has the format <2, 1>, the representable values are [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5]. But if the I/O wordlength is 2 bits and the output format is <2, 0>, the values x': [-1.5, -0.5, 0.5, 1.5] cannot be represented. These values are truncated to x: [-2, -1, 0, 1], respectively, each inducing a truncation error of 0.5.

The formula for SQNR calculation is

$$\mathrm{SQNR} = 10 \log_{10}\left(\frac{\mathrm{Power}_{signal}}{\mathrm{Power}_{noise}}\right) \text{dB} \tag{3.13}$$

We can calculate the signal power from the output derived distribution as

$$\mathrm{Power}_{signal} = \sum_{x} x^2\,p(x) \tag{3.14}$$

The noise power can likewise be calculated as

$$\mathrm{Power}_{noise} = \sum_{x'} (x' - x)^2\,p(x') \tag{3.15}$$

With Eq. (3.14) and Eq. (3.15), we can evaluate the SQNR by Eq. (3.13).
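Equations (3.13) to (3.15) can be evaluated directly from a PMF. The sketch below is a minimal illustration: the function name `sqnr_db` and the example PMF are assumptions made for the example, the signal power is taken over the noise-free values x', and truncation from <2, 1> to <2, 0> is modeled with `math.floor`, which maps -1.5 to -2 and 0.5 to 0.

```python
import math


def sqnr_db(pmf, quantize):
    """SQNR in dB following Eq. (3.13)-(3.15).

    pmf maps each noise-free value x' to p(x');
    quantize maps x' to the stored value x.
    """
    p_signal = sum(x * x * p for x, p in pmf.items())
    p_noise = sum((x - quantize(x)) ** 2 * p for x, p in pmf.items())
    if p_noise == 0:
        return math.inf  # no quantization error at all
    return 10 * math.log10(p_signal / p_noise)


# Assumed example PMF over <2, 1> values, truncated to the <2, 0> format.
pmf = {-2.0: 1/16, -1.5: 2/16, -1.0: 3/16, -0.5: 4/16,
       0.0: 3/16, 0.5: 2/16, 1.0: 1/16}
print(sqnr_db(pmf, math.floor))
```

For this PMF the signal power is 0.875 and the noise power is 0.125, so the result is 10 log10(7), about 8.45 dB.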

Take a 64-point FFT with 12-bit I/O as an example: we apply four different output formats and calculate their SQNR. The integer part decides the output range, and the fraction part decides the unit. The calculation results are shown in Table II. Note that Unit in the third column means the smallest increment that the format can represent. We find that <6, 6> has the best SQNR among these output formats, since its overflow probability is near zero and its unit is half that of <7, 5>.

Table II 64-Point FFT Output Format Analysis

Output Format   Output Range   Unit     SQNR       Overflow Probability
<7, 5>          [-64, 64)      0.0313   48.17 dB   0 %
<6, 6>          [-32, 32)      0.0156   54.19 dB   5.53*10^-11 %
<5, 7>          [-16, 16)      0.0078   42.49 dB   0.049 %
<4, 8>          [-8, 8)        0.0039   16.14 dB   8.33 %

The proposed SQNR model was verified by simulation. The comparison is shown in Table III. The simulation results are from 21.6M random sets of 64-point FFT inputs with the uniform distribution, and each format took about 12 hours to simulate in Matlab. The SQNR difference is at most 0.15 dB, and the overflow probabilities evaluated by the two methods are extremely close.

Table III The comparison between analysis method and simulation method

                Analysis Method              Simulation Method            Difference
Output Format   SQNR       Overflow Prob.    SQNR       Overflow Prob.    SQNR      Overflow Prob.
<7, 5>          48.17 dB   0 %               48.17 dB   0 %               0.00 dB   0 %
<6, 6>          54.19 dB   5.53*10^-11 %     54.20 dB   0 %               0.01 dB   ~0 %
<5, 7>          42.49 dB   0.049 %           42.34 dB   0.050 %           0.15 dB   0.001 %
<4, 8>          16.14 dB   8.33 %            16.13 dB   8.34 %            0.01 dB   0.01 %

3.4 Scaling Optimization

In this section, we further consider the scaling behavior from stage to stage and illustrate the proposed flow of the scaling optimization. Since the output wordlength is fixed, the number of representable values is the same at each stage, so a scaling decision has to be made at each stage.

Based on the probability model and the derived distribution concepts, we propose a greedy algorithm to suggest the scaling behavior at each stage with optimized precision. The modified model is better suited to fixed-point hardware implementation, and its computational complexity is O(2^WL · s), where s is the number of stages and WL is the wordlength.

3.4.1 Truncation Operation

As mentioned, the wordlength grows through the FFT computation in the butterfly. However, the data has to be quantized before being written into storage, whose wordlength is fixed and the same as that of the butterfly input. The truncation operation quantizes a number by discarding a few of its least significant bits (LSBs) to limit the number of bits. For example, consider the 5-bit binary number 0.1011 (0.6875): if we truncate it to 4 bits, the result is 0.101 (0.625), obtained by discarding the 1 LSB.

For each radix-2 butterfly, the number of representable values at the output is about double that at the input, as shown in Fig. 19. When the truncation operation is then applied, half of the values are no longer representable. To model this behavior, the probability of each truncated value is added to the probability of the representable value it is truncated to. After the truncation, the PMF in Fig. 19 becomes the PMF shown in Fig. 20. The noise induced by the truncation operation can be computed with the method described in Section 3.3.4.
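The truncation model can be sketched in the same style. This is a minimal illustration under stated assumptions: the function name `truncate_pmf` and the dictionary representation of a PMF are chosen for the example, and the input PMF is the sum of two independent uniform <1, 1> variables, which is assumed here to correspond to the <2, 1> butterfly output.

```python
import math


def truncate_pmf(pmf, frac_bits):
    """Keep only frac_bits fraction bits by discarding LSBs.

    Each value is moved down to the nearest multiple of 2^-frac_bits and its
    probability is added there. Returns the new PMF and the truncation noise
    power, computed as in Eq. (3.15).
    """
    step = 2.0 ** -frac_bits
    out = {}
    noise = 0.0
    for value, prob in pmf.items():
        kept = math.floor(value / step) * step
        out[kept] = out.get(kept, 0.0) + prob
        noise += (value - kept) ** 2 * prob
    return out, noise


# Assumed <2, 1> butterfly output PMF, truncated to the <2, 0> format.
pmf = {-2.0: 1/16, -1.5: 2/16, -1.0: 3/16, -0.5: 4/16,
       0.0: 3/16, 0.5: 2/16, 1.0: 1/16}
truncated, noise = truncate_pmf(pmf, 0)
print(truncated)  # {-2.0: 0.1875, -1.0: 0.4375, 0.0: 0.3125, 1.0: 0.0625}
print(noise)      # 0.125
```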

Fig. 19 The PMF behavior of the butterfly computation, where the input and output formats are <1, 1> and <2, 1>, respectively


Fig. 20 The PMF after truncation operation

3.4.2 Saturation Operation

In addition to the truncation operation, the saturation behavior can also be modeled in terms of PMF and noise. Given the number range, that is, the wordlength of the integer part, the maximum and minimum values can be determined. A number beyond these limits is saturated to the maximum or minimum value, and its probability is added to the probability of that maximum or minimum value. The operation is similar to the analysis method described in Section 3.3.3. The major difference is that the operation is applied at each stage, and the noise it induces is evaluated for the scaling decision.

3.4.3 Scaling Decision

With the two operations and their noise analysis, the noise at each stage can be evaluated for a given number format. Taking the radix-2 FFT algorithm as an example, there are two scaling choices: to add 1 integer bit, or to maintain the number format of the input
