
Chapter 4 A Robust Channel Estimator for STBC-OFDM Systems

4.6 Simulation Results

The performance of the proposed channel estimator is demonstrated through simulation of an STBC-OFDM system with two transmit antennas and one receive antenna. The multipath channels adopt the ITU Veh-A [48] channel model with relative path power profiles of 0, -1, -9, -10, -15, and -20 dB, and the path excess delays are uniformly distributed from 0 to 50 sampling periods. The Jakes model is also used to generate a Rayleigh fading environment [49].

Fig. 4.6 and Fig. 4.7 show the BER performance of the proposed scheme with different numbers of tracking iterations for QPSK and 16-QAM modulation at a vehicle speed υe of 120 km/hr. The maximum Doppler frequency fD is 277.8 Hz, which is about 0.025 when normalized to the subcarrier spacing Δf. The result of perfect channel estimation, denoted as perfect CSI, is included for benchmarking. As shown in Fig. 4.6, the curve of the proposed scheme with one tracking iteration shows a performance gap relative to the perfect CSI curve, because the first tracking iteration uses only the pilot subcarriers to track channel variations and provide a global search direction. However, as the number of tracking iterations increases, the performance of the proposed scheme approaches the perfect CSI case. In QPSK modulation, the curves of the proposed scheme with two, three, and four tracking iterations have gaps of about 1.4, 1.1, and 0.2 dB in Eb/N0 relative to the perfect CSI case at BER = 10^-3. A BER of about 10^-4 can be achieved at an Eb/N0 of 30 dB. Moreover, the performance of the proposed scheme and that of the original two-stage method with four tracking iterations are almost the same. As shown in Fig. 4.7, in 16-QAM modulation, the curves of the proposed scheme with three, four, five, and eight tracking iterations have gaps of about 2.2, 1.2, 0.5, and 0.2 dB in Eb/N0 relative to the perfect CSI case at BER = 10^-2. A BER of about 10^-3 can be achieved at an Eb/N0 of 30 dB. Similarly, the performance of the proposed scheme and that of the original two-stage method with eight tracking iterations are very close.
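The quoted Doppler figures can be cross-checked with a short calculation. The sketch below is illustrative only: the 2.5 GHz carrier frequency and the 1024-point FFT are assumptions that do not appear in this section (they are chosen because they reproduce the quoted numbers), while the 11.2 MHz sampling clock is the value given in Section 6.1.

```python
# Sketch: cross-check the quoted maximum Doppler frequencies.
# ASSUMPTIONS (not stated in this section): 2.5 GHz carrier, 1024-point FFT.
C_LIGHT = 3.0e8      # speed of light in m/s (rounded)
F_CARRIER = 2.5e9    # assumed carrier frequency in Hz
F_SAMPLE = 11.2e6    # sampling clock in Hz (Section 6.1)
N_FFT = 1024         # assumed FFT size


def max_doppler_hz(speed_kmh):
    """Maximum Doppler shift f_D = v * f_c / c."""
    speed_ms = speed_kmh / 3.6
    return speed_ms * F_CARRIER / C_LIGHT


def normalized_doppler(speed_kmh):
    """f_D expressed as a fraction of the subcarrier spacing."""
    subcarrier_spacing = F_SAMPLE / N_FFT
    return max_doppler_hz(speed_kmh) / subcarrier_spacing


for v in (120, 240):
    print(v, round(max_doppler_hz(v), 1), round(normalized_doppler(v), 3))
```

Under these assumptions the values round to 277.8 Hz (0.025Δf) at 120 km/hr and 555.6 Hz (0.051Δf) at 240 km/hr.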

Fig. 4.8 and Fig. 4.9 show the BER performance of the proposed scheme with different numbers of tracking iterations for QPSK and 16-QAM modulation at υe of 240 km/hr. The maximum Doppler frequency fD is 555.6 Hz, which is about 0.051Δf. As shown in Fig. 4.8, in QPSK modulation, the curves of the proposed scheme with two, three, and four tracking iterations have gaps of about 2.1, 1.6, and 0.6 dB in Eb/N0 relative to the perfect CSI case at BER = 10^-2. A BER of about 10^-3 can be achieved at an Eb/N0 of 30 dB. As shown in Fig. 4.9, in 16-QAM modulation, the performance

Fig. 4.10 shows the BER performance under different υe at an Eb/N0 of 16 dB. At υe of 120 km/hr (fD=0.025Δf), the BERs of the perfect CSI case and of the proposed scheme with four tracking iterations for QPSK/16-QAM/64-QAM modulation are about 8.9×10^-4/4.6×10^-3/3.9×10^-2 and 9.6×10^-4/7.6×10^-3/4.3×10^-2, respectively, without channel coding. Even at υe of 240 km/hr (fD=0.051Δf), the BER of the proposed scheme with four tracking iterations for QPSK/16-QAM/64-QAM is about 3.1×10^-3/2.9×10^-2/9.7×10^-2.

Three kinds of interpolation-based channel estimation methods, the 1st-order predictive algorithm, the 2nd-order predictive algorithm, and the two-dimensional (2-D) interpolation method [53], [54], are simulated for performance comparison.

Considering the IEEE 802.16e OFDMA downlink specification, these methods are executed based on the cluster structures (Fig. 4.5) where a cluster consists of 14 consecutive subcarriers with alternating structures in two successive time slots. These interpolation-based methods are applied as follows: 1) for each time slot, do LS

interpolation of the corresponding channel frequency response for each specific transceiver antenna pair; 3) perform linear frequency-domain interpolation by using pilot subcarriers and the interpolated subcarriers obtained from time-domain interpolation.
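The LS and frequency-domain interpolation steps can be sketched as follows for a single antenna pair and a single time slot. This is a simplified illustration rather than the simulated implementation: the time-domain interpolation step is omitted, and the pilot positions, pilot symbols, and function names are hypothetical.

```python
# Sketch: LS estimation at pilots followed by linear frequency-domain
# interpolation. Pilot layout and names are hypothetical; the time-domain
# interpolation step of the compared methods is not modeled here.

def ls_pilot_estimates(received, pilots):
    """LS estimate H = Y / X at each pilot subcarrier.

    `pilots` maps subcarrier index -> known transmitted pilot symbol.
    `received` maps subcarrier index -> received symbol.
    """
    return {k: received[k] / x for k, x in pilots.items()}


def interpolate_frequency(pilot_h, n_subcarriers):
    """Linearly interpolate complex channel gains between adjacent pilots.

    Subcarriers outside the pilot span reuse the nearest pilot estimate.
    """
    idx = sorted(pilot_h)
    estimate = []
    for k in range(n_subcarriers):
        if k <= idx[0]:
            estimate.append(pilot_h[idx[0]])
        elif k >= idx[-1]:
            estimate.append(pilot_h[idx[-1]])
        else:
            right = next(p for p in idx if p >= k)   # first pilot at or above k
            left = idx[idx.index(right) - 1]         # pilot just below it
            w = (k - left) / (right - left)
            estimate.append((1 - w) * pilot_h[left] + w * pilot_h[right])
    return estimate


# Noise-free demo with a channel that varies linearly across 13 subcarriers.
h_true = [complex(1 + 0.1 * k, 0.5 - 0.05 * k) for k in range(13)]
pilots = {0: 1 + 0j, 4: -1 + 0j, 8: 1 + 0j, 12: -1 + 0j}
received = {k: h_true[k] * x for k, x in pilots.items()}
h_est = interpolate_frequency(ls_pilot_estimates(received, pilots), 13)
```

A channel that is exactly linear in frequency is recovered without error in this demo. A frequency-selective channel that curves between pilots is not, which is the model error discussed in the text that follows.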

Fig. 4.11 shows the normalized mean square errors (MSE) of channel estimation for QPSK modulation under different methods at υe of 120 km/hr. As shown in the figure, the curves of the interpolation-based methods exhibit an error floor. Generally, three factors contribute to the channel estimation error of the interpolation-based methods: AWGN and the model errors of the time-domain and frequency-domain interpolations. At low Eb/N0, the estimation error is dominated by AWGN. The error floor at high Eb/N0, however, is due to the model errors. The longest interval between the pilot subcarriers transmitted from one antenna is 12 subcarrier spacings, and even that between the pilot and interpolated subcarriers is four subcarrier spacings.

Because of both the frequency selective fading caused by larger multipath delay spreads and the time selective fading caused by higher Doppler effect, the interpolation-based methods cannot recover the channel frequency response well with the limited pilots in the cluster structures. At an Eb/N0 of 30 dB, the normalized MSEs of the proposed scheme and the 2-D interpolation method are about -24.3 dB and -13.9 dB, respectively. As discussed above, the 2-D interpolation method performs linear interpolation along the time and frequency axes. We further perform the 2-D interpolation method with second-order [65] and cubic [66] interpolation along the frequency axis, and the resulting MSEs of channel estimation for QPSK modulation at υe of 120 km/hr are shown in Fig. 4.12. Even with second-order and cubic interpolation along the frequency axis, the 2-D interpolation methods remain inferior to our proposed method. The MSEs of the 2-D interpolation method with second-order and cubic interpolation along the frequency axis are about -15.0 dB and -15.8 dB at an Eb/N0 of 30 dB.

Fig. 4.10 BER performances versus the vehicle speed.

Although the interpolation-based methods have lower implementation complexity, our proposed scheme achieves a lower MSE of channel estimation than the predictive algorithms and the 2-D interpolation-based methods at υe of 120 km/hr.

4.7 Summary

In this chapter, a two-stage channel estimation method is proposed for STBC-OFDM systems in wireless mobile channels. The initialization stage uses an MPIC-based decorrelation method to identify the significant paths of the CIR at the beginning of each frame. The tracking stage is then used to track the path gains with known CIR positions. When operating at υe of 120 and 240 km/hr with Eb/N0 of 16 dB for QPSK modulation, the proposed channel estimation method can achieve BER of about 10^-4 and 10^-3 without using channel coding. Compared with the interpolation-based channel estimation methods that are commonly adopted in pilot-aided channel estimator designs [53], [54], [65], and [66], the proposed channel estimation method improves the normalized MSE by 8.5~10.4 dB at 120 km/hr and an Eb/N0 of 30 dB.

Chapter 5

Novel Programmable FIR Filter Based on Higher Radix Recoding

5.1 Introduction

In baseband communication systems, the finite impulse response (FIR) filter is one of the most widely used blocks. In wired and wireless baseband communications, high-performance programmable FIR filters are frequently used to perform adaptive pulse shaping and signal equalization on the received data. Complexity reduction of FIR filter implementations has been of particular interest because lower computational complexity leads to high-performance as well as low-power designs.

A novel programmable FIR filter based on higher radix recoding is proposed for high-performance and low-power applications [67]. We employ higher radix recoding [68]-[70] to reduce the number of partial products and to reduce power consumption by sharing the pre-computation in each partial product generation. However, the propagation delay of pre-computing the multiples increases with the radix. We can extend the recoding methodology with secondary radix recoding schemes [70] to reduce the number and propagation delay of the odd multiples required for very high radix recoding. Although the proposed receiver does not implement the programmable FIR filters, the computation sharing concept will be used in the proposed receiver for low-complexity implementation.

The remainder of this chapter is organized as follows. Section 5.2 presents the algorithm reformulations with higher radix recoding and secondary radix recoding. Section 5.3 describes the proposed programmable FIR filters based on higher radix recoding and secondary radix recoding. Section 5.4 shows the design results and comparisons of programmable FIR filters. Finally, Section 5.5 draws the conclusions of this chapter.

consumption by pre-computation sharing in each partial product generation. However, increasing radices to reduce the number of partial products will increase the propagation delay of pre-computing multiples of the multiplicand. We propose to extend the recoding methodology by secondary radix recoding schemes to reduce the number of odd multiples required for very high radix recoding.

5.2.1 Higher Radix Recoding

A w×w-bit multiplication X·C in its simplest form is implemented by the generation of w partial products. We denote X and C in binary number representation.

1

The product computation is based on the sum:

\[ X \cdot C = \sum_{i=0}^{w-1} P_i \]

with the partial product P_i = X·c[i]·2^i. The factor 2^i is realized by a shift of the multiplicand X, and the factor c[i] ∈ {1, 0} is employed by a partial product generation to select either X·2^i or zero as the i-th partial product in the w-term sum. The main

as primitive digits set to be conditionally selected, complemented by (-1)^s, and shifted by 2^(e+ir) to form the partial products to be accumulated.
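The w-term sum of partial products can be illustrated with a minimal sketch. It assumes an unsigned w-bit coefficient C so that c[i] ∈ {0, 1} applies to every bit; the function names are illustrative only.

```python
# Sketch: multiplication as a sum of w shifted partial products.
# ASSUMPTION: C is treated as an unsigned w-bit number.

def partial_products(x, c, w):
    """Return [P_0, ..., P_(w-1)] with P_i = x * c[i] * 2**i."""
    return [(x << i) if (c >> i) & 1 else 0 for i in range(w)]


def multiply(x, c, w):
    """Accumulate the w partial products."""
    return sum(partial_products(x, c, w))


print(partial_products(13, 0b1011, 4))   # one entry per coefficient bit
print(multiply(13, 0b1011, 4))
```

Each coefficient bit costs one partial product here, which is the count that higher radix recoding reduces.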

5.2.2 Secondary Radix Recoding

Radices greater than eight increase the number of required odd multiples and therefore increase the complexity and delay of the carry propagate adder that pre-computes the odd multiples. We extend higher radix (radix β=2^r, r≥5) recoding by the secondary radix recoding to express the sum of partial products and reduce the number of required odd multiples. As shown in (5.4), each Booth digit value -2^(r-1) ≤ d_i ≤ 2^(r-1) is recoded to an n-digit value (2≤n≤4) in the secondary radix recoding scheme.

For example, with n=2 and the secondary radix (modulus) λ, we recode the digit d_i to be

\[ d_i = d_{1,i} \cdot \lambda + d_{0,i} \tag{5.7} \]

where digits d_{1,i} and d_{0,i} are chosen from a balanced complete residue system modulo λ forming the secondary radix digit set Dλ. The residue digit sets of the form {0, ±1, ±2, …, ±(λ-1)/2}

increase the range of digit values, all nonzero digits of Dλ should be of the form ±δ·2^i, where δ is a member of {0, 1}, {0, 1, 3} or {0, 1, 3, 5}. For n=2, (5.4) can be

multiplication by λ to be a shift and add/subtract. The n-digit secondary radix values must include all integers in the high radix Booth digit range [-2^(r-1), 2^(r-1)].

5.2.3 Algorithm Reformulation for FIR Filter

Considering an N-tap FIR filter with input sequence X_n, output sequence Y_n, and coefficients C_i, without loss of generality, we can express the FIR reformulation of high radix β with secondary radix recoding as

1

where d_{1,i,j}, d_{0,i,j} ∈ Dλ. Each partial product can also be expanded with factorization

\[ X_{n-i} \cdot d_{\eta,i,j} \cdot (2^r)^j = (-1)^{s_{\eta,i,j}} \cdot \{\delta_{\eta,i,j} \cdot X_{n-i}\} \cdot 2^{jr + e_{\eta,i,j}} \tag{5.13} \]

where δ_{η,i,j} ∈ {1}, {1, 3} or {1, 3, 5} and η=1 or 0.
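The factorization in (5.13) splits each nonzero digit into a sign, an odd factor, and a power of two. A minimal sketch of that split is given below; the function names are illustrative, and the digit is handled as a plain integer rather than as encoder control signals.

```python
# Sketch: factor a digit d as (-1)**s * delta * 2**e with delta odd,
# then form the partial product of (5.13).

def decompose(d):
    """Return (s, delta, e) such that d == (-1)**s * delta * 2**e."""
    if d == 0:
        raise ValueError("zero digit has no sign/odd/shift factorization")
    s = 1 if d < 0 else 0
    delta, e = abs(d), 0
    while delta % 2 == 0:
        delta //= 2
        e += 1
    return s, delta, e


def partial_product(x, d, j, r):
    """x * d * (2**r)**j built from the odd multiple delta*x and a shift."""
    if d == 0:
        return 0
    s, delta, e = decompose(d)
    value = (delta * x) << (j * r + e)
    return -value if s else value


print(decompose(-6))
```

Only the odd multiples δ·X require an adder; the sign and the shift are applied afterwards at little cost.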

5.3 Programmable FIR Filter Based on Higher Radix Booth Recoding

5.3.1 Higher Radix Booth Multiplier (HRBM)

Fig. 5.1 shows the parallel architecture of the 19×19-bit radix-16 (β=2^4) higher radix Booth multiplier (HRBM), as shown in (5.5) and (5.6). The HRBM is composed of the odd multiples pre-computation, basic units, and adders. In the implementation of HRBM, the input X, coefficient C, and output X·C are represented in two’s complement format.

The pre-computed odd multiples {1×, 3×, 5×, 7×} of the radix-16 HRBM are combined with complement and shift operations to generate the full set of partial products {-8×, -7×, …, 7×, 8×} for Booth radix-16 digits. Fig. 5.2 shows the structure of the odd multiples pre-computation and shows the 5× multiple implemented by a carry propagate adder (CPA). The basic unit of HRBM is composed of a higher radix encoder (HREnc), a 4:1 MUX, a Shifter, and AND gates. Since HREnc is directly connected to the coefficients, it is not on the critical path. Table 5.1 presents the radix-16 recoding scheme of HREnc. The 4:1 MUX selects the odd multiple with the control signal sel. The Shifter is composed of one 4:1 MUX that selects a data shift of 0 to 3 bits with the control signal sf. The AND gates and the signal zero handle a zero partial product output. The signal sig determines whether the partial product is added (sig=0) or subtracted (sig=1). Finally, we employ a carry save adder (CSA) tree to sum the outputs of the five basic units.

Fig. 5.1 Parallel architecture of 19×19-bit radix-16 (β=2^4) HRBM.

Fig. 5.2 Odd multiples pre-computation structure and pre-compute 5× architecture.

TABLE 5.1
RADIX-16 RECODING SCHEME OF HRENC

Coefficient        4:1 MUX   Shifter   AND    Sign   Partial
Ci[4j+3:4j-1]      sel       sf        zero   sig    Product
00000              00        00        0      0      0
11111              00        00        0      1      -0
00001, 00010       00        00        1      0      1
11110, 11101       00        00        1      1      -1
00011, 00100       00        01        1      0      2
11100, 11011       00        01        1      1      -2
00101, 00110       01        00        1      0      3
11010, 11001       01        00        1      1      -3
00111, 01000       00        10        1      0      4
11000, 10111       00        10        1      1      -4
01001, 01010       10        00        1      0      5
10110, 10101       10        00        1      1      -5
01011, 01100       01        10        1      0      6
10100, 10011       01        10        1      1      -6
01101, 01110       11        00        1      0      7
10010, 10001       11        00        1      1      -7
01111              00        11        1      0      8
10000              00        11        1      1      -8
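The recoding in Table 5.1 maps each overlapping five-bit window Ci[4j+3:4j-1] to a digit in [-8, 8]. The sketch below is a behavioral model of that mapping and of a multiplication built from the shared odd multiples. It is an assumption-laden illustration, not the gate-level design: digits are computed arithmetically instead of through the sel/sf/zero/sig signals, and the function names are hypothetical.

```python
# Sketch: behavioral model of radix-16 Booth recoding and an HRBM-style
# multiplication. Not the gate-level design; names are illustrative.

def booth_radix16_digits(c, w=19):
    """Recode a w-bit two's-complement integer into radix-16 Booth digits.

    Digit j is read from the window c[4j+3 : 4j-1], with c[-1] = 0.
    """
    n_digits = -(-w // 4)                      # ceil(w / 4); 5 for w = 19
    bits = c & ((1 << (4 * n_digits)) - 1)     # sign-extended bit pattern

    def bit(i):
        return (bits >> i) & 1 if i >= 0 else 0

    return [
        -8 * bit(4 * j + 3) + 4 * bit(4 * j + 2) + 2 * bit(4 * j + 1)
        + bit(4 * j) + bit(4 * j - 1)
        for j in range(n_digits)
    ]


def hrbm_multiply(x, c, w=19):
    """x * c using only the odd multiples {1x, 3x, 5x, 7x}, shifts and signs."""
    odd = {1: x, 3: (x << 1) + x, 5: (x << 2) + x, 7: (x << 3) - x}
    acc = 0
    for j, d in enumerate(booth_radix16_digits(c, w)):
        if d == 0:
            continue                           # zero partial product
        m, shift = abs(d), 0
        while m % 2 == 0:                      # split d into odd part and shift
            m //= 2
            shift += 1
        pp = odd[m] << (shift + 4 * j)
        acc += -pp if d < 0 else pp            # add or subtract
    return acc


print(booth_radix16_digits(262143))
```

A 19-bit coefficient yields five digits, one for each of the five basic units whose outputs the CSA tree sums.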

5.3.2 HRBM2 Based on Secondary Radix Recoding

Fig. 5.3 shows the parallel architecture of the 19×19-bit radix-32-modulo-7 (β=2^5; λ=7) HRBM2 based on secondary radix recoding, as shown in (5.10). The HRBM2 is composed of the basic units and adders. In the implementation of HRBM2, the input X, coefficient C, and output X·C are represented in two’s complement format.

The 2-digit values of D7={0, ±1, ±2, ±4} cover all integers in the Booth radix-32 digit range [-16, 16]. The digits of D7 can be realized by a conditional complement (-1)^s and a shift 2^e, e=0, 1, or 2. For a 19×19-bit multiplier X·C that uses secondary radix recoding to recode the coefficient C, (5.10) can be expressed as

\[ X \cdot C = 7 \cdot \sum_{i=0}^{3} X \cdot d_{1,i} \cdot (2^5)^i + \sum_{i=0}^{3} X \cdot d_{0,i} \cdot (2^5)^i \tag{5.14} \]

where d_{1,i}, d_{0,i} ∈ D7. With radix-32-modulo-7 recoding, the 19×19-bit multiplier X·C has four partial products in each digit expression. Each digit expression can be extended as

Fig. 5.3 Parallel architecture of 19×19-bit radix-32-modulo-7 (β=2^5; λ=7) HRBM2.

TABLE 5.2
RADIX-32-MODULO-7 RECODING SCHEME OF HRSENC

Coefficient Shift AND Sign Partial Product Ci[5j+4:5j-1] sf1/sf

Shifter1, Shifter0, and AND gates. Since HRSEnc is directly connected to the coefficients, it is not on the critical path. Table 5.2 presents the radix-32-modulo-7 recoding scheme of HRSEnc. Shifter0 is composed of one 4:1 MUX that selects a data shift of 0 to 2 bits with the control signal sf0. Shifter1 is composed of one 2:1 MUX that selects a data shift of 0 to 1 bit with the control signal sf1. The AND gates and the signal z[1:0] handle a zero partial product of δη,i. The signal s[1:0] determines whether the partial product is added (s[1] or s[0]=0) or subtracted (s[1] or s[0]=1). Finally, we employ the CSA trees to sum the output of each digit expression.
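A behavioral sketch of (5.14) is given below. It is a hedged illustration rather than the HRBM2 datapath: the split of each radix-32 digit into (d1, d0) is found by search rather than taken from Table 5.2, the control signals are not modeled, and all function names are hypothetical. The final multiplication by 7 is done as a shift and a subtraction.

```python
# Sketch: behavioral model of radix-32-modulo-7 multiplication per (5.14).
# The (d1, d0) split is found by search; Table 5.2 itself is not modeled.
D7 = (0, 1, -1, 2, -2, 4, -4)


def booth_radix32_digits(c, w=19):
    """Radix-32 Booth digits in [-16, 16] from windows c[5j+4 : 5j-1]."""
    n_digits = -(-w // 5)                      # ceil(w / 5); 4 for w = 19
    bits = c & ((1 << (5 * n_digits)) - 1)     # sign-extended bit pattern

    def bit(i):
        return (bits >> i) & 1 if i >= 0 else 0

    return [
        -16 * bit(5 * j + 4) + 8 * bit(5 * j + 3) + 4 * bit(5 * j + 2)
        + 2 * bit(5 * j + 1) + bit(5 * j) + bit(5 * j - 1)
        for j in range(n_digits)
    ]


def split_mod7(d):
    """Return (d1, d0), both in D7, with d == 7*d1 + d0."""
    for d1 in D7:
        for d0 in D7:
            if 7 * d1 + d0 == d:
                return d1, d0
    raise ValueError("digit outside the representable range")


def times_digit(x, d):
    """x * d for d in D7 using only a shift and a conditional negation."""
    if d == 0:
        return 0
    shifted = x << (abs(d).bit_length() - 1)   # shift by 0, 1 or 2 bits
    return -shifted if d < 0 else shifted


def hrbm2_multiply(x, c, w=19):
    """x * c = 7 * sum(x*d1_i*32**i) + sum(x*d0_i*32**i)."""
    sum1 = sum0 = 0
    for i, d in enumerate(booth_radix32_digits(c, w)):
        d1, d0 = split_mod7(d)
        sum1 += times_digit(x, d1) << (5 * i)
        sum0 += times_digit(x, d0) << (5 * i)
    return (sum1 << 3) - sum1 + sum0           # 7*sum1 as shift-and-subtract
```

Because D7 is a complete residue system modulo 7, d0 is fixed by d mod 7, so the split found by the search is unique.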

5.3.3 Programmable FIR Filter Architectures

We implement two 10-tap programmable FIR filters, one based on HRBM and one based on HRBM2. For high performance, the transposed direct form architecture with CSA summation is used to implement the FIR filter. Fig. 5.4 shows that the FIR filter using radix-16 HRBM consists of one odd multiples pre-computation and ten basic units and adders (BUAs) of HRBM. The HRBM scheme efficiently removes redundant computations by sharing the odd multiples pre-computation across the FIR filter operation, which leads to a low-power design. To improve performance further, Fig. 5.5 shows that the FIR filter using radix-32-modulo-7 HRBM2 consists only of ten BUAs of HRBM2. Since the only odd multiple of radix-32-modulo-7 HRBM2 is {1×}, it efficiently eliminates the propagation delay of pre-computing multiples.

Fig. 5.4 Programmable FIR filter using HRBM.
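The computation sharing in the HRBM-based filter can be sketched behaviorally: in the transposed direct form every tap multiplies the same input sample, so the odd multiples are computed once per sample and reused by all taps. The code below is a minimal software model under that reading. It uses plain integer addition in place of the CSA summation, and its function names are illustrative.

```python
# Sketch: transposed direct-form FIR filter with one shared odd-multiples
# pre-computation per input sample. Behavioral model only.

def booth_radix16_digits(c, w=19):
    """Radix-16 Booth digits in [-8, 8] of a w-bit two's-complement integer."""
    n_digits = -(-w // 4)
    bits = c & ((1 << (4 * n_digits)) - 1)

    def bit(i):
        return (bits >> i) & 1 if i >= 0 else 0

    return [
        -8 * bit(4 * j + 3) + 4 * bit(4 * j + 2) + 2 * bit(4 * j + 1)
        + bit(4 * j) + bit(4 * j - 1)
        for j in range(n_digits)
    ]


def tap_product(odd, digits):
    """One tap: select an odd multiple per digit, shift, add or subtract."""
    acc = 0
    for j, d in enumerate(digits):
        if d == 0:
            continue
        m, shift = abs(d), 0
        while m % 2 == 0:
            m //= 2
            shift += 1
        pp = odd[m] << (shift + 4 * j)
        acc += -pp if d < 0 else pp
    return acc


def fir_transposed(samples, coeffs, w=19):
    """y[n] = sum_i coeffs[i] * x[n-i], computed in transposed direct form."""
    digit_table = [booth_radix16_digits(c, w) for c in coeffs]  # recode once
    regs = [0] * (len(coeffs) - 1)             # delay registers between taps
    out = []
    for x in samples:
        # Shared pre-computation: done once, used by every tap.
        odd = {1: x, 3: (x << 1) + x, 5: (x << 2) + x, 7: (x << 3) - x}
        prods = [tap_product(odd, digits) for digits in digit_table]
        out.append(prods[0] + (regs[0] if regs else 0))
        regs = [
            prods[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0)
            for k in range(len(regs))
        ]
    return out
```

The coefficients are recoded once, outside the sample loop, which mirrors the observation that the encoder is connected to the coefficients and stays off the critical path.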

5.4 Results and Comparisons

Two 10-tap programmable FIR filters using the proposed 19×19-bit HRBM and HRBM2 are synthesized in TSMC 0.18 μm CMOS technology. For comparison, we also implement 10-tap FIR filters using a 19×19-bit carry save array multiplier (CSAM), Wallace tree multiplier (WTM), and computation sharing multiplier (CSHM) [71]. WTM and CSAM are two widely used multipliers. Because of its tree-structured partial product summation, WTM has better performance than CSAM.

CSHM also applies the concept of computation sharing to low-complexity FIR filter design. We design the 10-tap FIR filter using CSHM as proposed in [71]. Since the filter architecture in [71] has one pipeline stage, we also implement the filters using the other multipliers with one pipeline stage. Table 5.3 shows the clock cycle, area, and power of the filters using different multipliers under the low-power and high-speed design constraints. The power results shown in Table 5.3 are measured at a clock rate of 172 MHz (clock cycle=5.8 ns) under both design constraints.

Since WTM- and CSAM-based architectures do not use computation sharing and perform redundant computations for all taps, the FIR filter using HRBM shows better results in terms of speed and power. The CSHM-based architecture requires eight odd multiples pre-computations, {1×, 3×,..., 15×}, which is twice as many as the HRBM-based architecture requires. The basic unit of CSHM is also more complicated than the BUA of HRBM. Besides, the HRBM-based architecture uses CSA summation to efficiently reduce the critical path delay. Therefore, the HRBM-based architecture has better speed and power results than the CSHM-based architecture. Fig. 5.6 shows the normalized power, area, and clock cycle of the different multiplier-based architectures. The FIR filter using HRBM has 45.8%, 40.9%, and 55.2% performance improvement over the FIR filters using CSHM, WTM, and CSAM, respectively. The HRBM-based architecture has 45.7%, 37.0%, and 50.2% power reduction over the FIR filters based on CSHM, WTM, and CSAM, respectively. Furthermore, the HRBM2-based architecture efficiently eliminates the propagation delay of pre-computing multiples. In terms of performance and power consumption, the HRBM2 scheme has 54.5~65.5% and 28.3~43.3% improvement, respectively, with respect to the FIR filters based on CSHM, WTM, and CSAM.

Fig. 5.5 Programmable FIR filter using HRBM2.
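The quoted percentages can be recomputed from the Table 5.3 entries. The sketch below assumes that the performance figures compare the high-speed clock cycles and that the power figures compare the low-power design results; this pairing is inferred from the numbers and is not stated explicitly in the text.

```python
# Sketch: recompute the improvement percentages from Table 5.3.
# ASSUMPTION: performance uses the high-speed clock cycles and power uses
# the low-power design results.
HIGH_SPEED_CLOCK_NS = {
    "HRBM": 2.6, "HRBM2": 2.0, "CSHM": 4.8, "WTM": 4.4, "CSAM": 5.8,
}
LOW_POWER_MW = {
    "HRBM": 68.32, "HRBM2": 77.71, "CSHM": 126.00, "WTM": 108.38, "CSAM": 137.10,
}
PROPOSED = ("HRBM", "HRBM2")


def reduction_percent(proposed, baseline):
    """Relative reduction of `proposed` with respect to `baseline`, in %."""
    return 100.0 * (baseline - proposed) / baseline


def compare(table, design):
    """Reduction of `design` against each non-proposed design, 1 decimal."""
    return {
        name: round(reduction_percent(table[design], value), 1)
        for name, value in table.items()
        if name not in PROPOSED
    }


for design in PROPOSED:
    print(design, compare(HIGH_SPEED_CLOCK_NS, design), compare(LOW_POWER_MW, design))
```

Under this pairing the recomputed values agree with the quoted ones to within 0.1 percentage point.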

5.5 Summary

In this chapter, we present novel programmable digital FIR filters for low-power and high-performance applications. The proposed programmable FIR filters can be used to perform adaptive pulse shaping and signal equalization in wired or wireless baseband communications. For larger signal word lengths, for example 19 bits, the HRBM-based architecture uses higher radix recoding to decrease the number of partial products and shares the pre-computation across partial products; it improves performance by about 40.9~55.2% and power consumption by about 37.0~50.2% over the other designs. Extending the higher radix recoding scheme with secondary radix recoding further improves performance by reducing the propagation delay of pre-computing odd multiples. Thus, the HRBM2-based architecture improves performance by about 54.5~65.5% and power consumption by about 28.3~43.3% over the other designs.

TABLE 5.3
DESIGN RESULTS OF 10-TAP PROGRAMMABLE FIR FILTER

For Low Power Design
                      HRBM     HRBM2    CSHM     WTM      CSAM
Clock cycle (ns)      5.8      5.8      5.8      5.8      5.8
Area (gates)          36455    45582    47261    39501    60298
Power@172 MHz (mW)    68.32    77.71    126.00   108.38   137.10

For High Speed Design
                      HRBM     HRBM2    CSHM     WTM      CSAM
Clock cycle (ns)      2.6      2.0      4.8      4.4      5.8
Area (gates)          37634    47193    59101    51070    60298
Power@172 MHz (mW)    69.68    78.55    160.31   142.78   137.10

Fig. 5.6 Design results comparison of different multiplier-based architectures.

Chapter 6

Downlink Baseband Receiver Implementation

6.1 Introduction

In this chapter, the hardware implementation of the proposed downlink baseband receiver for IEEE 802.16e in mobile mode is presented. The proposed receiver, applied in an STBC-OFDM system with two transmit antennas and one receive antenna, aims to provide high performance in wireless mobile environments. The proposed baseband receiver includes the following features:

● provision of the simple and robust schemes and architectures for the symbol boundary detector and carrier frequency offset estimator;

● adoption of a high-performance two-stage channel estimation method for providing precise CSI in wireless mobile channels;

● provision of an efficient channel estimator architecture for low-complexity hardware implementation while keeping the high performance;

● implementation of a baseband receiver applied in an STBC-OFDM system with two transmit antennas and one receive antenna;

● for the IEEE 802.16e specification, the STBC-OFDM baseband receiver supporting up to 27.32 Mbps uncoded data transmission in 16-QAM modulation;

● operation at an 11.2 MHz sampling clock and a 78.4 MHz operation clock while drawing 68.48 mW from a 1 V supply voltage in a 90 nm CMOS process.

Fig. 6.1 Architectures of (a) the proposed downlink baseband receiver and (b) the proposed two-stage channel estimator with the STBC decoder and demapper.

Fig. 6.1 (a) shows the architecture of the proposed downlink baseband receiver. The baseband receiver includes a symbol boundary detector, an ICFO/FCFO estimator, an FFT, a two-stage channel estimator, an STBC decoder, and a demapper. The architecture of the proposed two-stage channel estimator with the STBC decoder and demapper is shown in Fig. 6.1 (b). Fig. 6.2 shows the implementation flow. The proposed STBC-OFDM system is modeled in C/C++ for the whole-system simulation environment. Floating-point simulation is first carried out to develop the robust architectures under the proposed synchronization and channel estimation methods and to evaluate the system requirements and the target performances as described in Table 3.1. After the system architecture is defined, depending on a tradeoff between the system performance and implementation complexity, the fixed-point simulation of the whole baseband receiver is performed to