Fixed-Point Implementation - OFDM TDD Uplink Synchronization


5.4 Fixed-Point Implementation

In algorithm development, it is often convenient to use floating-point computation. For reasons of power consumption, speed, and hardware cost, however, practical transceiver implementations normally use fixed-point computation. The DSP employed in this work, TI's TMS320C6416, is a fixed-point device, which performs fixed-point computations more efficiently than floating-point ones. We therefore consider a fixed-point DSP implementation, which entails carefully converting the program used in algorithm development from floating point to fixed point. We also try to make the resulting program run fast by making efficient use of the DSP's features.

The C6416 CPU contains eight parallel 32-bit functional units: two are multipliers, and the remaining six can perform a number of arithmetic, logic, and memory-access operations.

Data can also be arranged so that each functional unit performs two 16-bit or four 8-bit operations at once. Running at 600 MHz with eight units, the C6416 has a peak performance of 4800 MIPS. For many transmission systems, 32-bit computation is overkill, while 8-bit computation does not provide the necessary accuracy. This appears to be the case for the present system as well.

Hence we mostly use the 16-bit data type, with the dynamic range of the data chosen carefully at each point in the processing chain. Simulation results confirm that this is an appropriate choice. In fact, a TI document also recommends using the short (16-bit) data type for fixed-point multiplication inputs whenever possible [18]. The chosen data formats are shown in Figures 5.15 and 5.16 for the transmitter and the receiver, respectively, where Qx.y denotes x bits before the binary point and y bits after it (Q.15 is shorthand for Q0.15). In every case x + y = 15, because the sign takes one bit. We discuss the details of each block in the following subsections.
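To make the Qx.y notation concrete, here is a minimal Python sketch (the helper names are ours, not from TI or this work) that quantizes a real value to a 16-bit Qx.y word, rounding to nearest and saturating at the int16 limits, and converts it back. Python integers stand in for the DSP's 16-bit words.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_q(value: float, frac_bits: int) -> int:
    """Quantize a real value to a 16-bit Qx.y word with y = frac_bits.

    Rounds to nearest (Python's round, ties to even) and saturates at the
    int16 limits instead of wrapping around.
    """
    q = round(value * (1 << frac_bits))
    return max(INT16_MIN, min(INT16_MAX, q))


def from_q(word: int, frac_bits: int) -> float:
    """Interpret a 16-bit word as a Qx.y number."""
    return word / (1 << frac_bits)


def q_range(frac_bits: int):
    """Smallest and largest values representable in a 16-bit Qx.y format."""
    return from_q(INT16_MIN, frac_bits), from_q(INT16_MAX, frac_bits)


if __name__ == "__main__":
    for name, y in (("Q.15", 15), ("Q1.14", 14)):
        lo, hi = q_range(y)
        print(f"{name}: [{lo}, {hi:.6f}], step {2.0 ** -y:.2e}")
```

Q.15 thus covers [−1, 1) and Q1.14 covers [−2, 2), trading one bit of resolution for one bit of headroom.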

Figure 5.12: Fractional CFO synchronization error distribution under different speeds (curves for |error| > 0.5%, 1%, and 2% of the subcarrier spacing).

Figure 5.13: RMSE of fractional CFO synchronization after averaging (RMSE in units of the subcarrier spacing; 6-path channel at 0, 30, and 60 km/h).

5.4.1 Modulation and Subcarrier Allocation

The types of modulation supported in the IEEE 802.16e standard are BPSK, QPSK, 16-QAM and, optionally, 64-QAM. The modulator outputs have normalized symbol energy, and the resulting range of signal values for each modulation type is shown in Table 5.4. The widest range occurs for 64-QAM: [−7/√42, 7/√42] ≈ [−1.080, 1.080]. Since this exceeds 1 in magnitude, at least one bit is needed for the integer part of the signal value. With one more bit for the sign, 14 fractional bits remain. Hence Q1.14 is the chosen data format; its range, [−2, 2), also covers the ranges of the pilot and preamble modulations.
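The range argument can be checked numerically. The sketch below assumes the usual 64-QAM per-dimension amplitudes ±1, ±3, ±5, ±7 scaled by 1/√42, the normalization that gives unit average symbol energy, and shows that the peak value 7/√42 overflows a Q.15 word but fits in Q1.14.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def quantize(value: float, frac_bits: int) -> int:
    """Round to nearest and return the raw integer, before any saturation."""
    return round(value * (1 << frac_bits))


def fits_int16(q: int) -> bool:
    return INT16_MIN <= q <= INT16_MAX


# Per-dimension 64-QAM amplitudes with unit average symbol energy:
# E = 2 * mean(a^2 for a in {1, 3, 5, 7}) = 2 * 21 = 42, hence the 1/sqrt(42) factor.
levels = [a / math.sqrt(42) for a in (-7, -5, -3, -1, 1, 3, 5, 7)]

peak = max(abs(v) for v in levels)
q15 = quantize(peak, 15)    # Q.15: no integer bit
q1_14 = quantize(peak, 14)  # Q1.14: one integer bit

print(f"peak = {peak:.4f}")
print(f"Q.15 word  = {q15}, fits: {fits_int16(q15)}")
print(f"Q1.14 word = {q1_14}, fits: {fits_int16(q1_14)}")
```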

The subcarrier allocation block simply assigns the modulated data samples, null samples, and pilot samples to their respective subcarriers. No change of data format is needed in this block.
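Because allocation only moves Q1.14 words into IFFT bins, it involves no arithmetic at all. The sketch below assumes the IEEE 802.16 WirelessMAN-OFDM 256-subcarrier layout (200 used subcarriers: 192 data and 8 pilots at logical indices ±13, ±38, ±63, ±88; the DC subcarrier and the 28 lower and 27 upper guard subcarriers are null). The uplink allocation actually used in this work may differ, and `allocate_subcarriers` is a hypothetical name.

```python
N_FFT = 256
PILOT_IDX = (-88, -63, -38, -13, 13, 38, 63, 88)     # assumed pilot positions
USED = [k for k in range(-100, 101) if k != 0]        # logical indices of used subcarriers
DATA_IDX = [k for k in USED if k not in PILOT_IDX]    # the 192 data subcarriers


def allocate_subcarriers(data, pilots):
    """Place complex Q1.14 samples, given as (re, im) int pairs, into IFFT bin order.

    Logical index k (-128..127) maps to bin k mod N_FFT. Bins that are not
    assigned (DC and guard bands) stay (0, 0); no format conversion is needed.
    """
    if len(data) != len(DATA_IDX) or len(pilots) != len(PILOT_IDX):
        raise ValueError("expected 192 data and 8 pilot samples")
    bins = [(0, 0)] * N_FFT
    for k, sample in zip(DATA_IDX, data):
        bins[k % N_FFT] = sample
    for k, sample in zip(PILOT_IDX, pilots):
        bins[k % N_FFT] = sample
    return bins


if __name__ == "__main__":
    one = 1 << 14  # 1.0 in Q1.14
    frame = allocate_subcarriers([(one, 0)] * 192, [(-one, 0)] * 8)
    print(sum(s != (0, 0) for s in frame))  # 200 used bins
```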

Table 5.4: Ranges of Modulated Signal Values

5.4.2 The IFFT and FFT

Since the signal values after the IFFT lie in the range [−1, 1], we choose Q.15 as the data format after the IFFT and before the FFT. For efficiency, we implement the IFFT and the FFT with functions from TI's DSPLIB for the C64x.

The DSPLIB contains FFT functions employing 32×32-bit and 16×16-bit multiplications. The former has higher computational complexity, so we use the latter.

Figure 5.14: Fractional CFO synchronization error distribution under different speeds after averaging (curves for |error| > 0.5%, 1%, and 2% of the subcarrier spacing).

Figure 5.15: Fixed-point data formats used at different points in the transmitter: FEC encoder (binary) → modulation (Q1.14) → subcarrier allocation (Q1.14) → IFFT (Q.15) → 4X upsample and SRRC filter (Q.15).

Figure 5.16: Fixed-point data formats used at different points in the receiver: input (Q.15) → SRRC filter and 4X downsample (Q.15) → timing and CFO synchronization (Q.15) → FFT (Q1.14).

The function DSP_fft16x16r() computes a complex forward mixed-radix FFT with scaling, rounding, and digit reversal. The input data x[] and the coefficients w[] are arrays of complex numbers, stored as interleaved 16-bit real and imaginary parts.

The output data are returned in a separate array y[] in normal order, also as complex numbers with interleaved 16-bit real and imaginary parts. The code uses a special ordering of the FFT coefficients (also called twiddle factors), which are generated with the function tw_fft16x16() provided by TI.
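The interleaved layout and the rounded 16×16-bit arithmetic can be illustrated with a short sketch. It packs complex values into a flat int16 list in the (re0, im0, re1, im1, ...) layout described above and multiplies a sample by a Q.15 twiddle factor W_N^k = e^(−j2πk/N) with rounding, as a butterfly would. The helper names are ours, and the twiddles are generated in natural order; the special ordering produced by TI's tw_fft16x16() is not reproduced here.

```python
import cmath

INT16_MIN, INT16_MAX = -32768, 32767


def sat16(v: int) -> int:
    return max(INT16_MIN, min(INT16_MAX, v))


def interleave_q15(values):
    """Pack complex values in [-1, 1) as interleaved Q.15 words: re0, im0, re1, im1, ..."""
    out = []
    for z in values:
        out += [sat16(round(z.real * 32768)), sat16(round(z.imag * 32768))]
    return out


def twiddles_q15(n: int):
    """W_n^k = exp(-j*2*pi*k/n), k = 0..n-1, in natural order as interleaved Q.15.

    1.0 is not representable in Q.15, so it saturates to 32767.
    """
    return interleave_q15([cmath.exp(-2j * cmath.pi * k / n) for k in range(n)])


def cmul_q15(ar: int, ai: int, br: int, bi: int):
    """(ar + j*ai) * (br + j*bi) in Q.15.

    The 16x16-bit products are summed at full precision, rounded by adding
    2^14 (half an output LSB), and shifted right by 15 bits.
    """
    re = (ar * br - ai * bi + (1 << 14)) >> 15
    im = (ar * bi + ai * br + (1 << 14)) >> 15
    return sat16(re), sat16(im)


if __name__ == "__main__":
    w = twiddles_q15(4)
    print(cmul_q15(16384, 0, w[2], w[3]))  # 0.5 * W_4^1 = 0.5 * (-j)
```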

The DSPLIB does not contain a 16×16-bit IFFT routine. Hence we modify the DSP_fft16x16r() routine to compute the IFFT. The modification is based on the identity

$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} y[k]\,W_N^{-nk} = \frac{1}{N}\left(\sum_{k=0}^{N-1} y^{*}[k]\,W_N^{nk}\right)^{*}, \qquad W_N = e^{-j2\pi/N},$$

where y[] is the input, x[] is the output, and W_N is the twiddle factor. Therefore, we first conjugate the input, then perform the FFT, and then conjugate the output to obtain the desired IFFT.
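A quick numerical check of this identity is sketched below, with a naive O(N²) DFT standing in for DSP_fft16x16r(). It works in floating point so that the conjugation trick is isolated from fixed-point effects; the function names are ours.

```python
import cmath


def dft(x):
    """Forward DFT X[k] = sum_n x[n] W_N^{nk}, W_N = exp(-j*2*pi/N); stand-in for a library FFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def ifft_via_fft(y):
    """IDFT from a forward transform: conjugate the input, transform, conjugate the output.

    The 1/N factor is applied explicitly here, in floating point.
    """
    n_pts = len(y)
    return [v.conjugate() / n_pts for v in dft([v.conjugate() for v in y])]


def idft_direct(y):
    """Reference IDFT x[n] = (1/N) sum_k y[k] W_N^{-nk}."""
    n_pts = len(y)
    return [sum(y[k] * cmath.exp(2j * cmath.pi * n * k / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


if __name__ == "__main__":
    y = [1 + 2j, -0.5j, 3, 0.25 - 1j, 0, 1j, -2, 0.5]
    err = max(abs(a - b) for a, b in zip(ifft_via_fft(y), idft_direct(y)))
    print(f"max difference: {err:.1e}")
```

Only the sign of the imaginary parts changes before and after the transform, so the same forward routine (and its twiddle table) serves both directions.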

In DSP_fft16x16r(), scaling by 2 (i.e., a right shift by 1 bit) takes place at each radix-4 stage except the last one. A radix-4 stage can produce a bit growth of up to 2 bits, which would require scaling by 4. To prevent overflow, the input data should therefore in general be scaled down by 2^(BT − BS), where BT = log₂N is the total bit growth and BS = ⌈log₄N − 1⌉ is the number of 1-bit scalings performed inside the routine, with N being the length of the FFT. All shifts are rounded, which reduces the truncation noise power by 3 dB.
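The scaling rule can be evaluated for any power-of-two FFT length with a few lines of Python (a sketch; the function names are ours). `rounded_shift()` shows what a rounded right shift means: half an LSB of the result is added before shifting, instead of simply truncating.

```python
import math


def prescale_bits(n: int):
    """Return (BT, BS, BT - BS) for an n-point FFT following the rule above.

    BT = log2(n): total worst-case bit growth.
    BS = ceil(log4(n) - 1): 1-bit scalings done inside the routine
         (one per radix-4 stage except the last).
    BT - BS: right shift to apply to the input to rule out overflow.
    """
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two >= 4")
    bt = n.bit_length() - 1       # log2(n), exact for powers of two
    bs = math.ceil(bt / 2 - 1)    # ceil(log4(n) - 1), since log4(n) = log2(n) / 2
    return bt, bs, bt - bs


def rounded_shift(x: int, s: int) -> int:
    """Arithmetic right shift by s >= 1 bits, rounding half up instead of truncating."""
    return (x + (1 << (s - 1))) >> s


if __name__ == "__main__":
    print(prescale_bits(256))               # (8, 3, 5) for N = 256
    print(rounded_shift(29, 4), 29 >> 4)    # 29/16 = 1.8125: rounded 2, truncated 1
```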

Recall that the IFFT/FFT length in our system is 256. Hence BT = log₂256 = 8 and BS = ⌈log₄256 − 1⌉ = 3, so in theory the input should be shifted right by 5 bits. In our case, however, we find that shifting the IFFT input right by 4 bits is enough to prevent output overflow, and this reduces the noise due to scaling by 3 dB. Thus, in principle, the IFFT output is scaled down by a total of BS + 4 = 7 bits. For fixed-point binary numbers, however, such scaling merely relocates the binary point, which can be placed anywhere (equivalent to applying an arbitrary integer power-of-2 scaling) for the convenience of fixed-point computation. We therefore interpret the IFFT output as Q.15. For the FFT, we find that a right shift of the input by 1 bit is enough to prevent output overflow.

5.4.3 SRRC Filter with Oversampling and Downsampling

To make it possible to simulate path delays at non-integer sample times, an interpolator is introduced in the transmitter to produce a 4-times oversampled transmitter output.

In our system, we adopt a 57-tap square-root raised-cosine (SRRC) filter with roll-off factor α = 0.155.
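For reference, a 57-tap SRRC filter with α = 0.155 at four samples per symbol can be generated from the standard closed-form SRRC impulse response and rounded to Q.15, as sketched below. The unit-energy normalization is our assumption; the text does not state how its coefficients were scaled.

```python
import math

ALPHA = 0.155   # roll-off factor
SPS = 4         # samples per symbol (4x oversampling)
NTAPS = 57      # filter length, spanning 14 symbol periods


def srrc(t: float, alpha: float) -> float:
    """Square-root raised-cosine impulse response, t in symbol periods (T = 1)."""
    if abs(t) < 1e-12:
        return 1.0 - alpha + 4.0 * alpha / math.pi
    if abs(abs(t) - 1.0 / (4.0 * alpha)) < 1e-12:      # removable singularity
        a = math.pi / (4.0 * alpha)
        return (alpha / math.sqrt(2.0)) * ((1 + 2 / math.pi) * math.sin(a)
                                           + (1 - 2 / math.pi) * math.cos(a))
    num = (math.sin(math.pi * t * (1 - alpha))
           + 4 * alpha * t * math.cos(math.pi * t * (1 + alpha)))
    den = math.pi * t * (1 - (4 * alpha * t) ** 2)
    return num / den


def srrc_taps_q15(ntaps=NTAPS, sps=SPS, alpha=ALPHA):
    """Symmetric taps h[n], n = -(ntaps-1)/2 .. (ntaps-1)/2, unit energy, rounded to Q.15."""
    half = (ntaps - 1) // 2
    h = [srrc(n / sps, alpha) for n in range(-half, half + 1)]
    norm = math.sqrt(sum(v * v for v in h))   # assumed unit-energy normalization
    return [round(v / norm * 32768) for v in h]


if __name__ == "__main__":
    taps = srrc_taps_q15()
    print(len(taps), taps[28], taps[:4])
```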

Figure 5.17: Implementation of interpolation filter with polyphase decomposition [5].

Figure 5.18: Convolution kernel at the boundary of a finite-length sequence [7].

We implement the interpolator as a polyphase system, shown in Figure 5.17. This implementation applies the filter coefficients only to input values that are nonzero, i.e., it skips the zeros inserted by upsampling. In our work, L = 4. When computing an output value at the boundary of a sequence, part of the convolution or correlation kernel usually falls off the edge of the sequence, as illustrated in Figure 5.18. We assume the values outside the data sequence to be 0, that is, we use zero padding. This avoids the many if-else statements otherwise needed to handle the boundary values in the convolution.
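The polyphase interpolator and the zero-padding convention can be sketched with Q.15 integer arithmetic. Output sample nL + p is formed only from the p-th polyphase branch h[mL + p] and the original, non-zero-stuffed input x[n − m]; padding the input with zeros in front removes the boundary checks. The function names are ours, any Q.15 taps (such as SRRC coefficients) can be passed in, and the test compares the result with the direct "insert zeros, then convolve" form.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat16(v: int) -> int:
    return max(INT16_MIN, min(INT16_MAX, v))


def interp_polyphase_q15(x, h, L=4):
    """L-fold interpolation y[nL + p] = sum_m h[mL + p] * x[n - m], Q.15 in and out."""
    K = -(-len(h) // L)                  # taps in the longest polyphase branch (ceil)
    xp = [0] * (K - 1) + list(x)         # zero padding: x[i] = 0 for i < 0
    y = []
    for n in range(len(x)):
        for p in range(L):
            acc = 0                      # full-precision accumulator
            for m in range((len(h) - p + L - 1) // L):   # taps h[p], h[p+L], ...
                acc += h[m * L + p] * xp[n - m + K - 1]
            y.append(sat16((acc + (1 << 14)) >> 15))     # round, back to Q.15
    return y


def interp_direct_q15(x, h, L=4):
    """Reference: insert L-1 zeros after each sample, then convolve (same output length)."""
    u = []
    for v in x:
        u += [v] + [0] * (L - 1)
    y = []
    for k in range(len(u)):
        acc = sum(h[i] * u[k - i] for i in range(len(h)) if k - i >= 0)
        y.append(sat16((acc + (1 << 14)) >> 15))
    return y
```

The polyphase form performs len(h) multiplications per input sample instead of len(h) per output sample, a fourfold saving for L = 4.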

The output of Figure 5.17 is equivalent to upsampling the input by a factor of 4 (inserting three zeros after each sample) and passing it through the SRRC filter. In the receiver, we simply convolve the input signal with the SRRC filter, in the same way as in the transmitter, and downsample the output by a factor of 4. The input and the output use the same data format.
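On the receiver side only every fourth filter output is kept, so only those outputs need to be computed. The sketch below uses the same zero-padding convention; the `phase` parameter (which of the four samples is kept) is an illustrative assumption, since in practice it would follow from timing synchronization. Function names are ours.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat16(v: int) -> int:
    return max(INT16_MIN, min(INT16_MAX, v))


def fir_q15(x, h):
    """Full-rate causal FIR in Q.15: y[k] = sum_i h[i] x[k - i], with x[j] = 0 for j < 0."""
    return [sat16((sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0) + (1 << 14)) >> 15)
            for k in range(len(x))]


def filter_downsample_q15(x, h, M=4, phase=0):
    """Return y[nM + phase] only, computing just those outputs (M times less work)."""
    pad = [0] * (len(h) - 1) + list(x)        # zero padding before the first sample
    out = []
    for k in range(phase, len(x), M):
        acc = sum(h[i] * pad[k - i + len(h) - 1] for i in range(len(h)))
        out.append(sat16((acc + (1 << 14)) >> 15))
    return out
```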

5.4.4 Synchronization

The synchronization method has been presented in detail in previous sections. Apart from converting the float data type to the short data type, we make only two points about the fixed-point implementation:

• In fractional CFO estimation, we use a lookup table to implement the arctan() function. The table contains 2048 entries covering the range [tan 0, tan 0.4π] uniformly, corresponding to ε̂ in [0, 0.2] times the subcarrier spacing. The table also applies to negative values of ε̂ because the tangent function is odd, i.e., tan(−θ) = −tan θ.

• In frequency offset compensation, we construct two tables for the sin() and the cos() functions, each containing 2048 entries covering the range [0, π] uniformly.
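Both kinds of table can be sketched as follows. Assumptions, which are ours rather than the text's: the fractional CFO is obtained from a correlation value re + j·im with re > 0 as ε̂ = arctan(im/re)/(2π), so that ε̂ ∈ [−0.2, 0.2] maps to angles within ±0.4π; arctan entries are stored as angles in Q1.14 and sin/cos entries in Q.15; and the ratio and table index are computed in floating point for clarity only. Function names are hypothetical.

```python
import math

SIZE = 2048
TAN_MAX = math.tan(0.4 * math.pi)      # table spans [tan 0, tan 0.4*pi] uniformly


def sat16(v: int) -> int:
    return max(-32768, min(32767, v))


# arctan table: entry i = atan(i * TAN_MAX / (SIZE - 1)), an angle in Q1.14 radians
ATAN_LUT = [round(math.atan(i * TAN_MAX / (SIZE - 1)) * 16384) for i in range(SIZE)]

# sin/cos tables over [0, pi]: entry i = sin/cos(i * pi / (SIZE - 1)) in Q.15 (1.0 saturates)
SIN_LUT = [sat16(round(math.sin(i * math.pi / (SIZE - 1)) * 32768)) for i in range(SIZE)]
COS_LUT = [sat16(round(math.cos(i * math.pi / (SIZE - 1)) * 32768)) for i in range(SIZE)]


def frac_cfo_from_corr(re: int, im: int) -> float:
    """epsilon_hat = atan(im / re) / (2*pi) via the table, assuming re > 0.

    Negative ratios reuse the table because atan is odd: atan(-r) = -atan(r).
    Ratios beyond TAN_MAX are clamped to the last entry (0.2 subcarrier spacing).
    """
    r = abs(im) / re
    idx = min(SIZE - 1, round(r / TAN_MAX * (SIZE - 1)))
    angle = ATAN_LUT[idx] / 16384
    return (-angle if im < 0 else angle) / (2 * math.pi)


def derotate_q15(xr: int, xi: int, theta: float):
    """Multiply a Q.15 sample by exp(-j*theta), theta in [-pi, pi], using the [0, pi] tables."""
    idx = round(abs(theta) / math.pi * (SIZE - 1))
    c = COS_LUT[idx]                                   # cos(-t) = cos(t)
    s = SIN_LUT[idx] if theta >= 0 else -SIN_LUT[idx]  # sin(-t) = -sin(t)
    # (xr + j*xi)(c - j*s) = (xr*c + xi*s) + j*(xi*c - xr*s), rounded back to Q.15
    yr = sat16((xr * c + xi * s + (1 << 14)) >> 15)
    yi = sat16((xi * c - xr * s + (1 << 14)) >> 15)
    return yr, yi
```

With 2048 uniformly spaced entries, the table step is about 0.0015 in the tangent argument and π/2047 ≈ 0.0015 rad in phase, so the lookup error stays well below 10⁻³ of a subcarrier spacing in ε̂.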

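From the thesis IEEE 802.16e OFDM上行及OFDMA下行同步技術與數位訊號處理器實現之研究 (Research on IEEE 802.16e OFDM Uplink and OFDMA Downlink Synchronization Techniques and Their DSP Implementation), pp. 87-94.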