
Chapter 2 Overview of IEEE 802.15.3c Standard

2.4 System Requirements and Design Considerations

The IEEE 802.15.3c standard specifies the system requirements described in Section 2.1.1. One of them is the 1728 MHz sampling rate, which equals the throughput the system must sustain. Achievable throughput has increased with improvements in CMOS processes and architectures, but a throughput above 1 GHz is still a challenge for hardware design. Moreover, high throughput also means high power consumption in the digital circuit, and the choice of architecture determines that power consumption. Hence, there are two design considerations for the high sampling rate: architecture and power consumption.

Pipelined and parallel structures are commonly used in high-throughput designs. A pipelined structure raises the clock rate by inserting registers into the combinational circuit. However, the dynamic power also increases with the clock rate. A parallel structure, on the other hand, increases the throughput by duplicating the datapath without raising the clock rate, but the area and static power grow with the number of copies. Hence, our considerations on architecture and power consumption mainly concern how many copies to use and how fast the clock should run.
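The trade-off can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `required_clock_mhz` and the candidate copy counts are assumptions, not values taken from the design. It shows how the clock rate each copy must meet falls as the number of parallel copies grows, while the aggregate throughput stays at 1728 Msample/s.

```python
SAMPLING_RATE_MHZ = 1728  # system requirement quoted in the text


def required_clock_mhz(copies: int) -> float:
    """Clock rate needed when `copies` samples are processed per clock cycle."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return SAMPLING_RATE_MHZ / copies


if __name__ == "__main__":
    for copies in (1, 2, 4, 8, 16):  # hypothetical degrees of parallelism
        print(f"{copies:2d} copies -> {required_clock_mhz(copies):7.1f} MHz clock")
```

More copies relax the timing of each copy but multiply the area and leakage, which is the balance the paragraph above describes.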

Another system requirement is the bit error rate (BER). Given the FER and the frame size, the required BER is $1.54 \times 10^{-4}$. The design consideration here is therefore the trade-off between performance and computational complexity. First, the algorithm must be realizable in hardware; since the throughput is very high, its complexity should be kept as low as possible. Second, the channel model must be taken into account. The channel model in Section 2.2 includes the Doppler effect, so it is time-variant, and the equalizer algorithm must therefore be able to update its coefficients over time. Third, the length of the training sequence is fixed by the standard, as mentioned in Section 2.1.3, so the algorithm must be ready by the end of the training stage. Hence, we have to choose an algorithm of reasonable computational complexity that satisfies the BER requirement.

Chapter 3 Fast Convergent Adaptive Frequency Domain Equalizer

3.1 Review of Frequency Domain Equalization

In Section 2.1.2, we derive the formula of circular convolution, which can be transformed into a simple multiplication in the frequency domain:

$$R = H \cdot D \tag{3.1}$$

where H is a diagonal matrix. To recover the transmitted data, we multiply both sides of the equation by the inverse of H:

$$H^{-1} \cdot R = H^{-1} \cdot H \cdot D = D \tag{3.2}$$

where the inverse of H is also a diagonal matrix. After the IFFT and CP removal, we can fully recover the transmitted signal $d_n$.
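As a sanity check of the ideal case, the following sketch runs one short block through a circular convolution and undoes it by a per-subchannel division in the frequency domain. The block length, the data pattern, and the 3-tap channel are made-up values, and a naive DFT is used in place of an FFT purely to keep the example self-contained.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative block."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(d, h):
    """Time-domain circular convolution, the effect of the channel once the CP is removed."""
    n = len(d)
    return [sum(h[m] * d[(i - m) % n] for m in range(len(h))) for i in range(n)]


def equalize_ideal(r, h, n):
    """Divide by the channel response on each subchannel, then transform back."""
    h_padded = list(h) + [0] * (n - len(h))
    r_freq, h_freq = dft(r), dft(h_padded)
    return dft([r_freq[k] / h_freq[k] for k in range(n)], inverse=True)


d = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical BPSK block
h = [1.0, 0.5, 0.2]                # hypothetical 3-tap channel
r = circular_convolve(d, h)
d_hat = equalize_ideal(r, h, len(d))
```

Because the channel matrix is diagonal in the frequency domain, its inverse is a plain element-wise reciprocal, which is why the equalizer needs only one complex operation per subchannel.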

The above equations describe the ideal case: no AWGN and a time-invariant channel. In reality, white noise always exists due to thermal noise, and the channel varies with time due to many effects, such as relative movement, air flow, or moving objects. Thus, the equation becomes:

$$R_k = J_k(t) \cdot H_k \cdot D_k + N_k \tag{3.3}$$

where $J_k(t)$ is the time-variant component, $N_k$ is the noise, and $k$ is the index of the subchannels. If we simply multiply by the inverse of $H_k$ at all times, the time-variant effect will corrupt the data. Furthermore, obtaining an accurate inverse of $H_k$ is difficult under AWGN. To resolve this, the first step is to suppress the AWGN and estimate the inverse of $H_k$ as accurately as possible. Then, an adaptive algorithm tracks the changes in the time-variant channel, so that the time-variant component $J_k(t)$ no longer disturbs the equalization. Based on this idea, the block diagram of the proposed adaptive FDE with channel estimation is shown in Fig. 3-1. The LS channel estimation computes the initial values of the coefficients using the CMS and the preamble as the training sequence. Then, the data payload is transmitted and equalized by the FDE, while the LMS adaptive algorithm updates the coefficients to follow the time-variant channel.

Fig. 3-1 Block diagram of the proposed FDE

3.2 Channel Estimation

At the beginning of the transmission, the transmitter sends the training sequence $U_{512}$, located in the CES field of the CMS, to assist the equalization, as shown in Fig. 2-4. With the training sequence, we can easily estimate the channel matrix $H_k$, which is the inverse of the coefficients $W_k$.

This solution is known as the zero-forcing (ZF) method. Its benefit is the simple implementation, but it suffers from one problem: noise enhancement. With AWGN, Eqn. (3.4) is revised as Eqn. (3.5).

Noise enhancement occurs when the channel gain $H_k$ is so small that the noise $N_k$ is the dominant part of the received signal. In that case, especially with large $N_k$, the estimate is far from the perfect estimate, as illustrated in Fig. 3-2.

Fig. 3-2 Noise enhancement
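The effect can be illustrated numerically on a single subchannel. The sketch below assumes the ZF coefficient is obtained by dividing the known training symbol by the received sample; the channel gains and the noise sample are made-up values, and the same noise sample is applied in both cases so that only the channel gain differs.

```python
def zf_relative_error(h, u, noise):
    """Relative error of the ZF coefficient W = u / r against the ideal 1 / h."""
    w_ideal = 1 / h
    w_zf = u / (h * u + noise)  # received sample r = h * u + noise
    return abs(w_zf - w_ideal) / abs(w_ideal)


u = 1 + 0j            # hypothetical training symbol on one subchannel
noise = 0.05 + 0.05j  # hypothetical noise sample, identical in both cases

err_strong = zf_relative_error(1.0 + 0j, u, noise)  # strong subchannel
err_weak = zf_relative_error(0.1 + 0j, u, noise)    # deeply faded subchannel
```

With the same noise, the coefficient of the faded subchannel is much further from its ideal value, which is the behavior the figure illustrates.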

Since there are six $U_{512}$ sequences in the CMS, the least-squares (LS) method is a better choice than ZF. The main idea of LS is to minimize the sum of the squared errors.

First of all, the equalization can be described as:

$$U_{512,k} = R_k \cdot W_k \tag{3.6}$$

Second, we include the error caused by AWGN, where $i$ denotes the $i$-th $U_{512}$ in the CMS.

Then, we need to minimize the sum of the squares, so we set the partial derivative with respect to $W_k$ to zero.

Finally, the solution of Wk indicates the minimum of S.

Since U512 is constant all the time, it can be rewritten as:

Substituting Rk with U512, the channel estimation result is:


With the summation of $N_k$, the noise enhancement is reduced, since the mean of AWGN is zero. The benefit is clearly visible in Fig. 3-2, and the numerical analysis in Table 3-1 also supports the result.
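The averaging argument can be checked with a small Monte Carlo sketch. It is not the simulation behind Table 3-1: it estimates the channel gain rather than the coefficient (the two are reciprocals), and the channel statistics, noise level, and seed are assumptions. For a training symbol that is the same on every repetition, minimizing the sum of squared errors over the channel gain reduces to averaging the received samples.

```python
import random


def estimate_channel(observations, u):
    """LS estimate of H from repeated training symbols: minimizes sum |R_i - H*u|^2."""
    return sum(observations) / (len(observations) * u)


def mean_square_error(num_training, num_subchannels=512, sigma=0.3, seed=1):
    """Average squared estimation error over many independent subchannels."""
    rng = random.Random(seed)
    u = 1 + 0j  # hypothetical training symbol, identical on every repetition
    total = 0.0
    for _ in range(num_subchannels):
        h = complex(rng.gauss(0, 1), rng.gauss(0, 1))  # made-up channel gain
        obs = [h * u + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
               for _ in range(num_training)]
        total += abs(estimate_channel(obs, u) - h) ** 2
    return total / num_subchannels


mse_single = mean_square_error(num_training=1)  # one observation, as in ZF
mse_six = mean_square_error(num_training=6)     # six repetitions averaged
```

Averaging six independent noise samples divides the noise variance by six, so `mse_six` comes out well below `mse_single`.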

Table 3-1 Numerical analysis between LS and ZF

Method    Mean of error         Variance of error
LS        0.0176 + 0.0197i      1.4251
ZF        0.6073 + 1.6084i      5.7079

3.3 Adaptive Equalization

In an OFDM system, pilot subcarriers are needed to track the changes of the time-variant channel. However, we cannot insert any known message in the frequency domain, since the whole system is SCBT. Thus, our FDE requires an adaptive algorithm to cope with the time-variant channel.

3.3.1 Adaptive Algorithm

Many adaptive algorithms have been developed in the literature. The main issues with these algorithms are their computational complexity and convergence speed. The widely used algorithms are minimum mean square error (MMSE), recursive least squares (RLS), and least mean squares (LMS) [16], [17], and many improvements on them have been proposed. With a 1728 MHz sampling rate, an algorithm of high computational complexity is not suitable. Based on these considerations, we will show that LMS is a good choice for the FDE.

Consider the block diagram of the adaptive FDE shown in Fig. 3-3. R is the input from the FFT; the adaptive FDE performs the equalization and updates the filter coefficients W. The FDE output is transformed back to the time domain, where the demapper makes the decision. The error E is the difference between the FDE output and the training sequence (or the sliced output when data is being transmitted).

Fig. 3-3 Illustration of adaptive FDE

The idea of the LMS algorithm is to use the method of steepest descent to find a set of W that minimizes the cost function. In our design, the FDE equalizes one subblock at a time, so the cost function should involve a block of errors; this is the so-called block LMS (BLMS) [18]. However, since the equalization of each subchannel is independent, we can consider the cost function $C_k$ of each subchannel independently instead of that of the whole subblock.

$$C_k = \mathrm{Ex}\{|E_k|^2\} \tag{3.12}$$

The notation Ex{·} rather than E{·} is used for the expected value to avoid confusion with the error E. Applying the steepest descent then means taking the partial derivative with respect to the filter coefficients W.

$$\nabla C = \nabla\,\mathrm{Ex}\{E E^*\} = 2\,\mathrm{Ex}\{\nabla(E)\,E^*\} \tag{3.13}$$

Since the equalization of each subchannel is independent, Eqn. (3.13) equals zero when the error E and the coefficient W belong to different subchannels. Then, substituting E with the received signal R, we can rewrite Eqn. (3.13) as Eqn. (3.14), where $k$ is the subchannel index. These derivatives point towards the steepest ascent of the cost function. To find the minimum of the cost function, we take a step of size $\mu$ in the opposite direction of the derivatives, which gives Eqn. (3.15), where $n$ indicates the subblock index.

For simplification, the expected value can be reduced, and the whole LMS algorithm can be simplified as:

$$\text{LMS:}\quad W_{k,n+1} = W_{k,n} + \mu\,R_k E_k^* \tag{3.16}$$
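A per-subchannel LMS update can be sketched in a few lines. The sketch assumes the equalizer output is formed as `y = w * r`; under that convention the gradient step multiplies the error by the conjugate of the received sample. Which factor carries the conjugate depends on how the output is defined, so the placement here may differ from the notation of Eqn. (3.16). The channel gain, symbols, and step size are made-up values.

```python
def lms_update(w, r, d, mu):
    """One LMS step for a single subchannel with output y = w * r."""
    e = d - w * r                      # error against the known or sliced symbol
    return w + mu * r.conjugate() * e  # step against the gradient of |e|^2


h = 0.8 + 0.3j                        # hypothetical static channel gain
symbols = [1 + 0j, 1j, -1 + 0j, -1j]  # hypothetical unit-modulus training symbols
w = 1 + 0j                            # all-pass initial coefficient
for n in range(100):
    d = symbols[n % 4]
    w = lms_update(w, h * d, d, mu=0.5)
```

Each update costs one complex multiplication by the error plus a scaling by the step size, which is the low complexity the text attributes to LMS. On this noise-free channel the coefficient settles at the reciprocal of the channel gain.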

The derivations of MMSE and RLS can be found in [16], [17]:

MMSE:

RLS:

where $\sigma_n^2$ and $\sigma_s^2$ are the variances of the noise and the signal, respectively, $Y$ is the equalized signal, $U$ is the intermediate vector, and $g_n$ is the gain vector.

Compared with MMSE [19]-[21] and RLS [22], [23], the LMS algorithm has lower computational complexity, since updating one subchannel requires only one multiplication. In hardware, more operations in the update path lengthen the feedback latency. This latency degrades the performance because the equalizer cannot update its coefficients immediately. The effect is more pronounced in a high-sampling-rate, deeply pipelined system, where the latency is much longer.

Furthermore, low computational complexity leads to low power consumption, which is increasingly important in modern SoC design; LMS therefore also has the advantage of low power consumption. MMSE, on the other hand, has lower computational complexity than RLS, but it requires knowledge of the SNR, which is hard to estimate because the received signal is affected by Doppler and channel effects. Although some algorithms attempt to estimate the SNR, their results are still not reliable in a practical system. Based on these considerations, LMS is suitable for an FDE in a high-sampling-rate design, and it can also achieve the required BER with the LS channel estimation mentioned in Section 3.3.2.

3.3.2 Convergence Speed Acceleration

LMS has to do training to achieve convergence before any data is ready to be

increasing the step size [24]. However, compared with other algorithms, LMS still suffers from slow convergence [25]. Hence, LMS training takes longer than that of the other algorithms and requires a longer training sequence.

According to the standard, the training sequence is available in the CES field of the CMS preamble and the PHY preamble. However, there are only six $U_{512}$ sequences for training before the CMS payload, so the training result of LMS is not as good as that of LS channel estimation.

From the analysis results shown in Fig. 3-4 and Table 3-2, LS channel estimation performs better in terms of the mean and variance of the error after the training stage. Moreover, this channel model is almost the best case for LMS, since the perfect estimate is very close to the initial value of the LMS training procedure, which is an all-pass filter with uniform gain. The learning curves are shown in Fig. 3-5. The simulation uses the channel model and 10 dB AWGN. The LMS-only algorithm takes about 35 subblocks to reach the same performance as the combined LS-LMS algorithm. The MSE of the first six subblocks is zero because LS is averaging over these six points. The result confirms that the combined algorithm indeed converges faster than LMS alone.
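Why a good initial value shortens the training can be seen in a noise-free toy model. The sketch below does not reproduce the learning curves of Fig. 3-5: the channel gain, step size, tolerance, and the 5 % offset assumed for the LS starting point are all made-up values. It only counts how many LMS updates are needed from two different starting points.

```python
def subblocks_to_converge(w0, h, mu=0.1, tol=1e-2, max_blocks=1000):
    """Count LMS updates until |w*h - 1| drops below tol on a noise-free channel."""
    w = w0
    for n in range(max_blocks):
        if abs(w * h - 1) < tol:
            return n
        d = 1 + 0j                       # hypothetical unit-modulus symbol
        r = h * d
        w = w + mu * r.conjugate() * (d - w * r)
    return max_blocks


h = 0.6 - 0.5j                           # made-up channel gain
w_allpass = 1 + 0j                       # LMS-only start: all-pass filter
w_ls = (1 / h) * (1 + 0.05j)             # LS start, assumed 5 % off the ideal value
lms_only = subblocks_to_converge(w_allpass, h)
ls_lms = subblocks_to_converge(w_ls, h)
```

The residual error shrinks by the same factor on every update, so the number of updates depends only on how far the starting point is from the ideal coefficient; starting from the LS estimate therefore needs fewer updates.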

Compared with adaptive TDE, the adaptive algorithm has an initial value of the coefficients with the aid of the channel estimation in the frequency domain. Hence, we can choose a low computational complexity algorithm with slower convergence speed. By doing so, we can balance the tradeoff of performance and hardware complexity.

Table 3-2 Numerical analysis of training result between LS and LMS

Method    Mean of error          Variance of error
LS         0.0213 + 0.0124i      0.0605
LMS       -0.2533 + 0.1042i      0.0896

Fig. 3-4 Comparison of training result between LS and LMS

Fig. 3-5 Learning curves

3.4 Demapper

In the transmitter of a digital communication system, digital modulation transforms the digital bit stream into an analog passband signal. The block diagram is shown in Fig. 3-6. First, the mapper converts every n bits into one complex symbol according to the constellation map. After the digital-to-analog converter (DAC), the discrete data stream becomes a continuous square wave. The pulse shaper transforms the square wave into a band-limited waveform, reducing the occupied bandwidth. Finally, the mixer translates the baseband signal to a passband signal using the carrier.

Fig. 3-6 Digital modulation

In the IEEE 802.15.3c standard, the constellation maps are π/2 BPSK, π/2 QPSK, π/2 8-PSK, and π/2 16-QAM. The π/2 prefix means that the symbols undergo a counterclockwise π/2 phase shift after M-PSK mapping, as shown in Fig. 3-7. The purpose of this phase shift is to generate a continuous-phase waveform, which results in a constant-modulus signal when BPSK is used. A constant-modulus signal mitigates the problems caused by non-linear distortion, since the transitions between symbols never pass through the origin, as illustrated in Fig. 3-8.

Fig. 3-7 Block diagram of π/2 M-PSK mapper

Fig. 3-8 The transition at: (a) even sampling time, (b) odd sampling time

After the equalization in the receiver, the demapper converts the complex symbols back to the digital bit stream. In the π/2 M-PSK demapper, the first step is a clockwise π/2 phase shift. Then, the slicer makes the decision on the noisy complex symbol. Finally, the M-PSK demapper converts the complex symbol back to the digital bit stream, as shown in Fig. 3-9.

Fig. 3-9 Block diagram of π/2 M-PSK demapper and feedback loop (blocks: clockwise π/2 phase shifter, M-PSK constellation demapper, π/2 M-PSK mapper; signals: from IFFT output, to EQ output, feedback to FFT)
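The mapper and demapper pair can be sketched for the BPSK case. The sketch assumes that the π/2 rotation accumulates from symbol to symbol, so that symbol n is multiplied by j to the power n, and that bit 0 maps to -1 and bit 1 to +1; both conventions are assumptions, and the function names are illustrative.

```python
ROTATION = (1 + 0j, 1j, -1 + 0j, -1j)  # j**n for n mod 4: one counterclockwise pi/2 step per symbol


def map_pi2_bpsk(bits):
    """BPSK mapping (assumed 0 -> -1, 1 -> +1) followed by the j**n rotation."""
    return [(2 * b - 1) * ROTATION[n % 4] for n, b in enumerate(bits)]


def demap_pi2_bpsk(symbols):
    """Clockwise rotation by (-j)**n, then a sign slicer on the real axis."""
    out = []
    for n, s in enumerate(symbols):
        derotated = s * ROTATION[(-n) % 4]
        out.append(1 if derotated.real >= 0 else 0)
    return out


bits = [1, 0, 0, 1, 1, 1, 0, 1]
tx = map_pi2_bpsk(bits)
rx_bits = demap_pi2_bpsk(tx)
```

Consecutive symbols alternate between the real and the imaginary axis, so the straight line between two neighboring symbols never crosses the origin, and every symbol has unit modulus.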

From Section 3.3, we know that the adaptive FDE needs the decision result to run the algorithm. The block diagram in Fig. 3-9 can therefore be revised as shown in Fig. 3-10. Since the π/2 phase shift does not change the decision boundaries of the π/2 M-PSK constellation map, except for π/2 BPSK, the slicer can be placed before the π/2 phase shifter. For π/2 BPSK, we simply make the decision on the real axis at even sampling times and on the imaginary axis at odd sampling times, since the data is modulated in that order. Therefore, the counterclockwise π/2 phase shift after slicing is no longer needed, which saves hardware resources.

Fig. 3-10 Revised demapper
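For π/2 BPSK, the revised slicer can be compared against the original rotate, slice, rotate-back path. The sketch below is an illustration under the assumption that the rotation at sampling time n is j to the power n; the function names and test points are hypothetical.

```python
def slice_pi2_bpsk(y, n):
    """Nearest pi/2 BPSK point at sampling time n, without derotating first.

    Even n: the constellation is {+1, -1}, so decide on the real axis.
    Odd n:  the constellation is {+j, -j}, so decide on the imaginary axis.
    """
    if n % 2 == 0:
        return complex(1 if y.real >= 0 else -1, 0)
    return complex(0, 1 if y.imag >= 0 else -1)


def slice_with_rotation(y, n):
    """Reference path: rotate clockwise, slice on the real axis, rotate back."""
    forward = (1, 1j, -1, -1j)[n % 4]
    backward = (1, -1j, -1, 1j)[n % 4]
    decided = 1 if (y * backward).real >= 0 else -1
    return decided * forward


samples = [0.9 + 0.2j, -0.7 - 0.4j, 0.3 - 1.1j, -0.2 + 0.8j]  # made-up noisy symbols
agree = all(slice_pi2_bpsk(y, n) == slice_with_rotation(y, n)
            for y in samples for n in range(8))
```

Both paths return the same constellation point for every test sample, so the two rotations in the feedback loop can be dropped for the purpose of computing the error.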

3.5 System Architecture and Performance

The proposed FDE operates according to the equations in Sections 3.2, 3.3, and 3.4, and the detailed block diagram is shown in Fig. 3-11. In the simulation, the signals are corrupted by the channel model and AWGN and are assumed to be perfectly synchronized. The system flow is as follows:

1. At the beginning, the channel estimation computes the filter coefficients from the training sequences by the LS method.

2. When the training sequences are done, the cyclic-prefixed data stream is transmitted, and the adaptive FDE equalizes the received signal.

3. After equalization, the signal is sent to the decision circuit, which functions as a slicer.

4. Using the error between the equalized and the sliced signal, the adaptive FDE updates the filter coefficients by the LMS algorithm.
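The four steps above can be sketched for a single subchannel. This is a noise-free toy model, not the C simulation environment of the thesis: the slowly rotating channel, the QPSK-like constellation, the data pattern, and the step size are all assumptions chosen to make the tracking behavior visible.

```python
import cmath

QPSK = (1 + 0j, 1j, -1 + 0j, -1j)  # hypothetical unit-modulus constellation


def slicer(y):
    """Decision circuit: nearest constellation point."""
    return min(QPSK, key=lambda p: abs(y - p))


def run(num_blocks, mu, drift=0.01, h0=0.8 + 0.3j):
    """Return the number of symbol errors on one subchannel.

    Step 1: the initial coefficient comes from (here noise-free) training.
    Steps 2-4: equalize, slice, and update with decision-directed LMS.
    """
    w = 1 / h0                                    # step 1: estimate from training
    errors = 0
    for n in range(num_blocks):
        h = h0 * cmath.exp(1j * drift * n)        # slowly rotating channel (assumed)
        d = QPSK[(7 * n + n // 3) % 4]            # arbitrary data pattern
        r = h * d
        y = w * r                                 # step 2: equalize
        decision = slicer(y)                      # step 3: slice
        errors += decision != d
        w += mu * r.conjugate() * (decision - y)  # step 4: LMS update
    return errors


errors_tracking = run(num_blocks=300, mu=0.5)
errors_frozen = run(num_blocks=300, mu=0.0)
```

With the update enabled, the coefficient follows the channel rotation and the decisions stay correct; with the coefficient frozen at its trained value, the accumulated phase drift eventually pushes the equalized symbols across the decision boundaries.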

To evaluate the performance, we use the channel model of the IEEE 802.15.3c standard group with Jakes' model, as mentioned in Section 2.2. The whole transmitted sequence is composed of the CMS, preamble, PW, data, and PCES. The CMS and preamble are used for training, and the PW works as the cyclic prefix. The simulation results are shown in Fig. 3-12. The whole test environment is written in C. For each test point, the length of the transmitted sequence is 448000 samples. Based on the standard, the error rate criterion is set to $1.54 \times 10^{-4}$ after any error correcting method. From the figure, our adaptive FDE requires an Eb/N0 of about 10 dB to meet this criterion. Compared with the optimal receiver, the loss is only 1.5 dB for both π/2 BPSK and π/2 QPSK.

The fixed-point simulation model is determined by the following procedure. First, we quantize the input to the minimum word length that causes no significant performance loss. Then, we quantize the next data path. Proceeding step by step, we finally find the word length of each data path while keeping the performance loss within a reasonable range.
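The search for one data path can be sketched as follows. The sketch uses the worst-case quantization error as a stand-in for the performance loss, whereas the procedure in the text judges the loss by simulation; the sample data, the tolerance, and the function names are assumptions.

```python
import math


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def worst_error(samples, frac_bits):
    """Largest absolute quantization error over the samples."""
    return max(abs(x - quantize(x, frac_bits)) for x in samples)


def min_fraction_bits(samples, tolerance, max_bits=24):
    """Smallest number of fractional bits whose worst-case error stays within tolerance."""
    for bits in range(max_bits + 1):
        if worst_error(samples, bits) <= tolerance:
            return bits
    raise ValueError("tolerance not reachable within max_bits")


samples = [math.sin(0.37 * n) for n in range(64)]  # stand-in for one data path
bits = min_fraction_bits(samples, tolerance=0.01)
```

Repeating the search for each data path in turn, with the earlier paths already quantized, mirrors the step-by-step procedure described above.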

Fig. 3-11 Detailed block diagram of the proposed adaptive FDE

Fig. 3-12 Eb/N0 vs. Bit Error Rate (x-axis: Eb/N0 (dB), 0 to 12; y-axis: BER, 10^-5 to 10^0; curves: AWGN, π/2 BPSK, π/2 QPSK, π/2 BPSK (fixed-point), π/2 QPSK (fixed-point))

Chapter 4 Architecture Design and Hardware Reduction

4.1 Design Specifications and Architecture

The IEEE 802.15.3c standard targets wireless communication at data rates above 1 Gbps. To achieve this target, the standard has two key features. The first is the use of the 60 GHz RF band. This unlicensed band is wide enough to support a large signal bandwidth, and since the transmission rate is proportional to the bandwidth, using it is essential. The second is the ultra-high sampling rate. Although there are many ways to reach a high data rate, such as higher-order modulation or multi-input multi-output (MIMO) systems [26], raising the sampling rate is the most direct, since the data rate is proportional to the sampling rate. With a moderate modulation scheme, the data rate can be two or three times the sampling rate, so a data rate above 1 Gbps is easily reached. Based on these features, we proposed the combined LS-LMS FDE in Chapter 3. The block diagram shown in Fig. 3-11 is redrawn in Fig. 4-1 to reflect hardware design considerations. The following sections discuss our hardware design. FFT and IFFT, however, are not design targets of this thesis.
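The relation between sampling rate and data rate can be written out directly. The sketch assumes one symbol per sample and ignores coding, preambles, and cyclic prefix overhead, so the figures are raw upper bounds rather than rates quoted from the standard.

```python
SAMPLING_RATE_MSPS = 1728  # sampling rate from the text, taken as the symbol rate

BITS_PER_SYMBOL = {"pi/2 BPSK": 1, "pi/2 QPSK": 2, "pi/2 8-PSK": 3, "pi/2 16-QAM": 4}


def raw_data_rate_gbps(modulation: str) -> float:
    """Uncoded rate: symbol rate times bits per symbol, with all overhead ignored."""
    return SAMPLING_RATE_MSPS * BITS_PER_SYMBOL[modulation] / 1000


if __name__ == "__main__":
    for name in BITS_PER_SYMBOL:
        print(f"{name:12s} {raw_data_rate_gbps(name):.3f} Gbps")
```

Even the lowest-order modulation already exceeds 1 Gbps under these assumptions, which is why the sampling rate is described as the most direct lever.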

Fig. 4-1 Revised block diagram of the proposed FDE

In modern CMOS processes, power consumption is becoming more and more important. There are many ways to reduce the power consumption when designing hardware, such as using an algorithm of low computational complexity, replacing a high-complexity arithmetic unit with a simpler one, or sharing hardware resources. These methods reduce both the chip area and the switching power. The leakage power is also reduced as the chip area shrinks.

4.2 Divider Free LS Method

In Section 3.2, Eqn. (3.10) indicates that the LS method needs a complex division. There are two ways to avoid the division. One is to use the phase operation, as shown in Eqn. (4.1); the other is to multiply both the numerator and the denominator by the conjugate of the divisor, as shown in Eqn. (4.2).

The phase operation replaces the complex division with one square root function, two square functions, one scalar division, and one subtraction. However, the conversion between phasors and complex numbers requires trigonometric functions, as shown in Eqn. (4.3). Although some practical designs exist, the hardware cost is still too high.

Eqn. (4.2) transforms one complex division to one complex multiplication, one square function and one scalar division. This method is generally used when we calculate the complex division. However, there is one scalar division, which is much more complex than a multiplier [27].
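The conjugate method can be sketched in a few lines to show where each operation goes. The function name and the test operands are illustrative; the arithmetic identity itself is standard.

```python
def divide_by_conjugate(a: complex, b: complex) -> complex:
    """a / b computed as a * conj(b) / |b|^2, so only a real division remains."""
    numerator = a * b.conjugate()                      # one complex multiplication
    magnitude_sq = b.real * b.real + b.imag * b.imag   # the square function
    inverse = 1.0 / magnitude_sq                       # the single scalar division
    return complex(numerator.real * inverse, numerator.imag * inverse)


a = 3 - 2j      # made-up dividend
b = 0.5 + 1.5j  # made-up divisor
quotient = divide_by_conjugate(a, b)
```

The only division left is the real-valued reciprocal of the squared magnitude, which is the operation a lookup table can then replace.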

Since division is the inverse of multiplication, a common approach is to multiply by the inverse of the scalar. To obtain the inverse, we can use a table holding all possible inverses of the scalar, which is easily implemented with a ROM, as illustrated in Fig. 4-2. The bit width is determined by the accuracy of the inverse, and the word width is determined by the word length of the scalar. According to the fixed-point C simulation, the bit width should be 13 bits and the word width 14 bits to maintain the performance. Therefore, the size of the ROM is $2^{14} \times 13$ bits, which is about 213k bits. The cost is reduced, but the ROM still occupies a large area.
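The ROM size follows directly from the two widths. The sketch below also builds such a table in software; the scaling of the stored codes and the handling of address 0 are assumptions made for illustration and are not taken from the design.

```python
WORD_WIDTH = 14  # address bits: word length of the scalar (from the text)
BIT_WIDTH = 13   # bits stored per entry (from the text)

rom_size_bits = (1 << WORD_WIDTH) * BIT_WIDTH


def build_inverse_rom(word_width=WORD_WIDTH, bit_width=BIT_WIDTH):
    """Table of 1/x for every address x, scaled so that 1/1 maps to the largest code.

    The scaling and the saturation at address 0 are assumptions for illustration.
    """
    full_scale = (1 << bit_width) - 1
    rom = [full_scale]  # address 0 has no inverse; saturate
    rom += [round(full_scale / x) for x in range(1, 1 << word_width)]
    return rom


rom = build_inverse_rom()
```

One entry per possible scalar value is what makes the table large: the address space doubles with every extra bit of word width, while the bit width only scales the size linearly.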

To reduce the size of the ROM, we can try to reduce the bit width and the word width. Since the accuracy is already fixed by the bit width, we focus on reducing the word width. Observing the inverses, we find that they are almost the same for nearby words. An example is shown in Eqn. (4.4): the difference between 1/128 and 1/129 is so small that it cannot be represented in 13 bits. Therefore, nearby
