
CHAPTER 2 System Platform

2.3 Simulated Channel Model

2.3.4 Sampling Clock Offset Model

As shown in FIG 2.7, sampling clock offset (SCO) is caused by the difference in sampling frequency between the digital-to-analog converter (DAC) in the transmitter and the analog-to-digital converter (ADC) in the receiver. In the time domain, SCO produces a time shift between the actual and the ideal sampling points. If the SCO is not compensated, this timing error accumulates, so the ADC samples the received signal at the wrong instants and the receiver eventually fails. SCO also introduces a linear phase error in the frequency domain, as shown in FIG 2.12. We therefore use the pilot sub-carriers to estimate the linear phase error caused by SCO and recover the transmitted data.

FIG. 2.12 SCO effect in Time domain and frequency domain
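The link between a time shift and a linear phase error can be illustrated with a minimal sketch. It is not the estimator used in this work: it assumes an integer, circular shift of one 64-point symbol (so the DFT shift theorem applies exactly), uses the 802.11a pilot positions -21, -7, 7, and 21, and fits a straight line through the pilot phase errors. The function name `estimate_shift_from_pilots` and the test signal are hypothetical.

```python
import numpy as np

N_FFT = 64                              # FFT size (802.11a)
PILOTS = np.array([-21, -7, 7, 21])     # 802.11a pilot sub-carrier indices

def estimate_shift_from_pilots(ref_freq, rx_freq, pilots=PILOTS, n_fft=N_FFT):
    """Least-squares line fit through the pilot phase errors.

    Returns the estimated time shift in samples (positive = delayed).
    """
    # Negative indices address the negative-frequency bins of the FFT output.
    phase_err = np.angle(rx_freq[pilots] * np.conj(ref_freq[pilots]))
    slope = np.sum(pilots * phase_err) / np.sum(pilots ** 2)   # rad per sub-carrier
    return -slope * n_fft / (2 * np.pi)

rng = np.random.default_rng(0)
x = rng.standard_normal(N_FFT) + 1j * rng.standard_normal(N_FFT)  # stand-in symbol
shift = 1                                # hypothetical accumulated shift, in samples
X = np.fft.fft(x)
Y = np.fft.fft(np.roll(x, shift))        # delayed symbol (circular shift for simplicity)
est_shift = estimate_shift_from_pilots(X, Y)
```

Under these assumptions the phase error on sub-carrier k is -2πk·shift/64, so the fitted slope recovers the one-sample shift.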

CHAPTER 3 A Low Complexity Frame Synchronizer for OFDM Application

In this chapter, a low-complexity frame synchronizer for OFDM systems is proposed. It reduces the correlation complexity of the frame synchronizer by keeping only the most-significant taps of the matched filter used for FFT window detection. To explain the study clearly, the IEEE 802.11a PHY introduced in chapter 2 is selected as the system platform. The detailed algorithm, analysis, and simulation results are presented in the following sections.

3.1 Frame Synchronizer Data Flow

[Figure: the frame synchronizer lies on the path from the ADC to the FFT and contains the blocks Packet Detection, Coarse AFC, FFT Window Detection, Fine AFC, and Long Preamble Detection.]

FIG. 3.1 Frame synchronizer data flow

The data flow of the proposed frame synchronizer is shown in FIG 3.1. First, packet detection detects a valid packet by applying the normalized auto-correlation algorithm to the short preamble. The normalized auto-correlation value is compared with a decision threshold, and a valid packet is asserted when the value exceeds the threshold. Then, coarse frequency compensation uses the remaining short training symbols to compensate CFO ≦ ±4 ppm (±20 kHz). At the same time, frame synchronization detects the end of the short preamble with another decision threshold. Next, FFT window detection finds the start boundary of the FFT window by cross-correlating the received data with one long training sequence. After the FFT window boundary is decided, fine frequency compensation compensates the remaining CFO ≦ 0.8 ppm (4 kHz), and the channel equalizer estimates the channel response from another long training sequence.

3.1.1 Packet Detection

In the 802.11a PHY, a valid packet can be detected from the periodic structure of the PLCP preamble. As mentioned in 2.1.2, the short preamble consists of ten repeated short symbols, each with period ‘Ts’ (0.8 µs). We therefore compare the received signals R(t) and R(t+Ts) with the normalized auto-correlation scheme [24-25], depicted as follows:

In the above equation, Ck is the auto-correlation value and Pk is the corresponding symbol power. The parameter ‘N’ is the number of sample points in one short period ‘Ts’ and equals 16.
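A minimal sketch of a normalized auto-correlation over one short-symbol period is given here for illustration. The exact form of the equation is an assumption: a common delay-and-correlate normalization, |Ck|² / Pk², is used, and the power is taken over the delayed window. The random 16-sample short symbol is a stand-in, not the 802.11a short training symbol, and `normalized_autocorr` is a hypothetical helper name.

```python
import numpy as np

def normalized_autocorr(r, k, n=16):
    """Return (C_k, P_k, lambda_k) for the window starting at sample k."""
    a = r[k:k + n]              # current short symbol
    b = r[k + n:k + 2 * n]      # the same window one period Ts later
    c_k = np.sum(a * np.conj(b))            # auto-correlation value
    p_k = np.sum(np.abs(b) ** 2)            # symbol power
    lam = np.abs(c_k) ** 2 / p_k ** 2       # normalized decision value (assumed form)
    return c_k, p_k, lam

rng = np.random.default_rng(1)
short_sym = rng.standard_normal(16) + 1j * rng.standard_normal(16)  # stand-in symbol
preamble = np.tile(short_sym, 10)                                   # ten repeats
noise = rng.standard_normal(160) + 1j * rng.standard_normal(160)

_, _, lam_packet = normalized_autocorr(preamble, 0)
_, _, lam_packet_scaled = normalized_autocorr(0.01 * preamble, 0)   # 40 dB weaker input
_, _, lam_noise = normalized_autocorr(noise, 0)
```

For the periodic preamble the decision value equals 1 regardless of the input scaling, which is the gain independence the normalization is meant to provide; for uncorrelated noise it should stay well below 1.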

Normalizing the auto-correlation value Ck by the symbol power Pk gives a new decision value λk. Because λk is independent of the received power level, it can detect a valid packet before the AGC has settled to the correct RF receiver gain. In the IEEE 802.11a PHY, AGC, packet detection, diversity selection, and coarse CFO estimation must all be completed within the short preamble, so packet detection should use as few short symbols as possible. In our design, AGC and packet detection work simultaneously; they can therefore share short symbols and each obtain a longer estimation time, which improves performance. The proposed decision value Λk uses three short-symbol pairs for the normalized auto-correlation algorithm and is defined by the following equation:

FIG. 3.2 Example of Packet Detection in Proposed Design

FIG 3.2 shows an example of packet detection. A 5 µs noise segment is added before the valid packet. The test channel condition is SNR = 0 dB, CFO = 200 kHz (40 ppm), and multipath delay spread = 150 ns. The vertical axis is the proposed normalized auto-correlation value Λk. To detect a valid packet, Λk is compared with a pre-defined threshold; once Λk exceeds the threshold, packet detection is asserted. At low SNR, however, the normalized auto-correlation value of the noise fluctuates strongly. To reduce the false-alarm rate, a decision window is defined to confirm the assertion. When Λk exceeds the pre-defined threshold, the decision window starts checking the following correlation values. Packet detection is announced only when all correlation values in the decision window also exceed the threshold. Otherwise, the assertion is canceled and packet detection returns to its initial state, as shown in FIG 3.2.
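The decision-window rule can be sketched as a small state-free scan. This is only an illustration of the logic described above; the function name `detect_packet`, the window length, and the sample trace are hypothetical, and the real design operates on a stream rather than on a stored list.

```python
def detect_packet(values, threshold, window):
    """Return the index where a packet is announced, or None.

    values    : sequence of normalized auto-correlation values
    threshold : pre-defined decision threshold
    window    : number of following values that must also exceed the threshold
    """
    k = 0
    while k + window < len(values):
        if values[k] > threshold:
            following = values[k + 1:k + 1 + window]
            if all(v > threshold for v in following):
                return k                  # assertion confirmed by the decision window
            # assertion canceled: fall back to the initial state and keep scanning
        k += 1
    return None

# A noise spike at index 2 is rejected; the sustained run starting at index 5 is accepted.
trace = [0.1, 0.2, 0.9, 0.1, 0.3, 0.8, 0.9, 0.85, 0.9, 0.95]
announced = detect_packet(trace, threshold=0.6, window=3)
```

With this trace the isolated spike at index 2 is canceled and the packet is announced at index 5.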

3.1.2 FFT Window Detection

In the proposed design, FFT window detection finds the correct FFT window boundary from the known-data property [26]. It compares the received data with the ideal long training symbol within a pre-defined searching window. The comparison is based on the cross-correlation algorithm shown as follows:

$$\Delta(k) = \left| \sum_{m=0}^{L_n-1} R[k+m] \times C^{*}[m] \right|^{2} \qquad \text{(Eq 3.3)}$$

In the above equation, ‘R’ is the received data from the ADC, ‘C’ is the corresponding element of the long training symbol, and ‘Ln’ is the total number of elements in one long training symbol. In the 802.11a standard, Ln equals the FFT size, 64. Δ(k) is the correlation value at the kth index of the pre-defined searching window. The index with the maximum cross-correlation value is therefore the position where the received data is most similar to the ideal long training symbol, and it is declared as the FFT window boundary.
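As an illustration of the search described above, the following sketch correlates the received samples with a known 64-element sequence at every index of a searching window and takes the index of the maximum. A squared-magnitude metric is assumed, the random sequence is a stand-in for the long training symbol, and the names `fft_window_metric` and `true_start` are hypothetical.

```python
import numpy as np

def fft_window_metric(r, c, search_len):
    """Delta(k) = |sum_m r[k+m] * conj(c[m])|^2 for each k in the searching window."""
    ln = len(c)
    return np.array([np.abs(np.sum(r[k:k + ln] * np.conj(c))) ** 2
                     for k in range(search_len)])

rng = np.random.default_rng(2)
c = rng.standard_normal(64) + 1j * rng.standard_normal(64)   # stand-in training symbol
true_start = 20                                              # hypothetical boundary
r = 0.1 * (rng.standard_normal(160) + 1j * rng.standard_normal(160))  # background noise
r[true_start:true_start + 64] += c                           # embed the known symbol

delta = fft_window_metric(r, c, search_len=64)
boundary = int(np.argmax(delta))
```

In this noise-only (AWGN-like) setting the maximum should fall on the embedded start index.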

FIG. 3.3 FFT window detection in AWGN and multi-path channel

An example of FFT window detection in an AWGN channel and in a multi-path channel with 150 ns RMS delay spread is shown in FIG 3.3. In the AWGN channel, the index of the maximum cross-correlation value is clearly the start of the FFT window, as expected. In the multi-path channel, however, the delay spread of the later-arriving paths moves the maximum cross-correlation value to samples later than the ideal FFT window boundary, and the correct boundary becomes the 2nd or 3rd cross-correlation peak in the searching window. A common solution is to choose an index N points earlier than the index of the maximum cross-correlation value as the preferred FFT window boundary (N is an integer set by the designer). However, this early catching reduces the effective GI and degrades system performance in severe multi-path channels [27]. To solve this problem, the top-‘M’ pre-cursor searching scheme of [3] is adopted. It defines the indices of the ‘M’ largest cross-correlation values as boundary candidates.

The ‘N’ samples before the peak cross-correlation value form the pre-cursor window. If more than one boundary candidate is located in the pre-cursor window, the earliest index is chosen as the preferred FFT window boundary. Otherwise, the index of the peak cross-correlation value is chosen. FIG 3.4 compares the FFT window boundary distribution of the pre-cursor searching scheme (M = 5 and N = 5 in our design) with that of the conventional design (without pre-cursor searching) in a multi-path channel with RMS delay spread = 150 ns. For perfect boundary cutting (index = 0 in FIG 3.4), the pre-cursor searching scheme has twice the probability of a correct boundary compared with the conventional design. The boundary distribution of the pre-cursor searching scheme is also more concentrated, so fewer early-catching points are needed to retain the effective GI.
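The pre-cursor search can be sketched as follows. This is an interpretation rather than the implementation of [3]: the function `precursor_search` picks the earliest of the top-M candidates whenever at least one of them falls in the N samples before the peak, which is a slightly looser reading than the wording above. The two correlation profiles are hand-made examples, not simulation data.

```python
import numpy as np

def precursor_search(delta, m=5, n=5):
    """Pick the FFT window boundary with a top-M pre-cursor search."""
    delta = np.asarray(delta)
    peak = int(np.argmax(delta))
    candidates = np.argsort(delta)[::-1][:m]           # indices of the M largest values
    in_window = [int(i) for i in candidates if peak - n <= i < peak]
    return min(in_window) if in_window else peak

# Multi-path-like profile: the strongest value (index 12) follows the first path (index 10).
multipath = np.zeros(32)
multipath[[10, 11, 12, 20, 25]] = [0.7, 0.5, 1.0, 0.3, 0.2]

# AWGN-like profile: one dominant value at index 10, small values far away from it.
awgn = np.zeros(32)
awgn[[10, 1, 3, 20, 25]] = [1.0, 0.1, 0.1, 0.1, 0.1]

boundary_multipath = precursor_search(multipath)
boundary_awgn = precursor_search(awgn)
```

In the multi-path profile the candidates at indices 10 and 11 lie inside the pre-cursor window of the peak at 12, so index 10 is selected; in the AWGN-like profile no candidate precedes the peak closely enough, so the peak index itself is kept.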

Comparing the simulation curves at SNR = 0 dB and SNR = 10 dB, the boundary distributions of the conventional design, which chooses the maximum correlation value, are almost the same in both SNR regions, because increasing the SNR cannot reduce multi-path interference. In the pre-cursor searching scheme, however, a higher SNR reduces the probability of erroneous boundary candidates caused by AWGN. A higher SNR therefore concentrates the boundary distribution of the pre-cursor searching scheme more strongly at index = 0 and produces less early catching (index from -4 to -1).

FIG. 3.4 FFT window detection in AWGN and multi-path channel

3.2 Proposed Algorithm

3.2.1 Most-Significant Taps Scheme

In the 802.11a PHY, FFT window detection accounts for most of the hardware cost of the frame synchronizer. To implement the cross-correlation scheme (Eq 3.3), a matched filter with 64 taps is used to calculate the timing metric Δ(k), which requires 64 complex multipliers (each complex multiplier consists of four multipliers and two adders). The most efficient way to save hardware is therefore to reduce the number of taps compared in FFT window detection. However, the matched filter is based on ML estimation, and its accuracy increases with the power of the input data, so decreasing the number of taps may degrade performance. To reduce the number of matched-filter taps with the least performance loss, the most-significant taps scheme is proposed.

$$\Delta(k) = \left| \sum_{m=1}^{N} \left( R[k+S[m]] \times C^{*}[S[m]] \right) \right|^{2} \qquad \text{(Eq 3.4)}$$

In Eq 3.4, C denotes the matched-filter coefficients C0 to C63, corresponding to the 64 taps. S is the index-sorting matrix that orders the elements of C from the largest to the smallest. For example, S[1] is the index of the largest element of C and S[2] is the index of the second largest. The parameter N is the number of taps used, which the user sets on demand. FIG 3.5 shows the power distribution of the matched-filter coefficients in the time domain and the same coefficients reordered by power ratio.
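The selection of taps by power can be sketched as below. The coefficients are random stand-ins rather than the time-domain long training symbol, the indices are 0-based (unlike S, which is listed 1-based), a squared-magnitude metric is assumed, and the helper names are hypothetical. The sketch shows two things: the N strongest taps always keep at least as much power as the first N taps, and the reduced correlation still peaks at the correct offset in a clean test signal.

```python
import numpy as np

def most_significant_taps(c, n_taps):
    """Return the 0-based indices of the n_taps coefficients with the largest power."""
    order = np.argsort(np.abs(c) ** 2)[::-1]      # plays the role of S (0-based here)
    return order[:n_taps]

def power_ratio(c, taps):
    """Fraction of the total coefficient power kept by the selected taps."""
    return np.sum(np.abs(c[taps]) ** 2) / np.sum(np.abs(c) ** 2)

def reduced_metric(r, c, taps, k):
    """Timing metric at offset k using only the selected taps."""
    return np.abs(np.sum(r[k + taps] * np.conj(c[taps]))) ** 2

rng = np.random.default_rng(3)
c = rng.standard_normal(64) + 1j * rng.standard_normal(64)   # stand-in coefficients
r = np.concatenate([np.zeros(10, dtype=complex), c, np.zeros(10, dtype=complex)])

taps32 = most_significant_taps(c, 32)
ratio_sorted = power_ratio(c, taps32)            # 32 strongest taps
ratio_first = power_ratio(c, np.arange(32))      # first 32 taps
metric = [reduced_metric(r, c, taps32, k) for k in range(20)]
best_k = int(np.argmax(metric))
```

Because the stand-in coefficients are random, the printed ratios would not match the 72.4% figure of the real training symbol; only the ordering of the two ratios carries over.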

FIG. 3.5 Power distribution of C0~C63 and S[1]~S[64]

The contents of the index-sorting matrix S are listed as follows:

S = { 15, 51, 1, 33, 25, 41, 30, 36, 46, 20, 54, 12, 35, 31, 39, 27; (1st~16th)
59, 7, 62, 4, 45, 21, 26, 40, 2, 64, 16, 50, 3, 63, 55, 11; (17th~32nd)
8, 58, 60, 6, 28, 38, 48, 18, 43, 23, 57, 9, 34, 32, 49, 17; (33rd~48th)
44, 22, 19, 47, 53, 13, 14, 52, 42, 24, 37, 29, 10, 5, 61 } (49th~64th)

FIG. 3.6 Analysis of most significant tap number versus power ratio

In the 802.11a standard, the matched-filter coefficients are generated from the long OFDM training symbol transformed into the time domain, which results in a large variation in power ratio among the coefficients. In the most-significant taps scheme, the coefficients with the lowest power ratio are treated as redundant taps and removed from the matched filter. The scheme can therefore reduce correlation complexity with little performance degradation. FIG 3.6 plots the number of taps used in the most-significant taps scheme versus the power ratio they contain. The matched filter in [28] uses the first 32 matched-filter coefficients for a low-power synchronizer design and retains 50% of the power of the conventional design (64 taps in total). With the 32 most-significant taps, the proposed scheme saves the same 50% of the correlation complexity of the conventional 64 taps as [28], but still contains 72.4% of the power of the conventional design, so it can achieve better performance than [28]. Alternatively, the most-significant taps scheme needs only 20 taps to reach a 50% power ratio, which saves 37.5% of the complexity of [28].
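The two complexity figures follow directly from the tap counts, if correlation complexity is taken to be proportional to the number of taps (an assumption made here only for the arithmetic):

$$\frac{64-32}{64} = 50\%, \qquad \frac{32-20}{32} = 37.5\%$$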

3.2.2 Quantization Approach

Another effective approach to reducing the complexity of cross-correlation was proposed in [29]. That correlation scheme quantizes the matched-filter coefficients into values from the set {0, ±2^0, ±2^-1, ±2^-2, ..., ±2^-q}. With coefficients quantized to these power-of-two levels, the multiplication in the cross-correlation scheme can be replaced by a shift of up to q bits, so the multipliers used for correlation are simplified into shifters. In the IEEE 802.11a standard, the time-domain long training symbol can be quantized into {0, ±2^-3, ±2^-4, ±2^-5, 2^-6}. The drawback of this approach is a serious quantization error, as shown in FIG 3.7.

FIG. 3.7 Tap power analysis of quantized approach

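We use the signal to quantization error ratio (SQNR) to measure the quantization error (Eq 3.5):

$$\mathrm{SQNR} = 10\log_{10}\left( \frac{\sum_{m=0}^{63} \left| C_m \right|^{2}}{\sum_{m=0}^{63} \left| C_m - Q_m \right|^{2}} \right) \qquad \text{(Eq 3.5)}$$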

Parameter C is the original matched-filter coefficient and Q is the coefficient after quantization. The SQNR of the quantization approach is 14.86 dB. Although this SQNR is rather poor, the FER simulation in FIG 3.8, for a multipath channel with 150 ns RMS delay spread and CFO = 100 kHz under perfect packet detection, shows that the SNR loss between the original 64 taps and the quantized 64 taps is only 0.5 dB at 1% FER.
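A minimal sketch of power-of-two quantization and of the resulting SQNR is shown here. Several details are assumptions: each real and imaginary part is mapped to the nearest level on a linear scale, all four exponents are used with both signs, the SQNR is the ratio of coefficient power to error power, and the coefficients are random stand-ins, so the printed value will not equal the 14.86 dB reported for the real training symbol.

```python
import numpy as np

# Assumed level set: 0 and +/-2^-3 ... +/-2^-6.
LEVELS = np.array([0.0] + [s * 2.0 ** -e for e in (3, 4, 5, 6) for s in (1, -1)])

def quantize_pow2(x, levels=LEVELS):
    """Map each real value to the nearest level (0 or a signed power of two)."""
    x = np.asarray(x, dtype=float)
    idx = np.argmin(np.abs(x[..., None] - levels), axis=-1)
    return levels[idx]

def quantize_complex(c):
    """Quantize the real and imaginary parts independently."""
    return quantize_pow2(c.real) + 1j * quantize_pow2(c.imag)

def sqnr_db(c, q):
    """Coefficient power over quantization-error power, in dB."""
    return 10 * np.log10(np.sum(np.abs(c) ** 2) / np.sum(np.abs(c - q) ** 2))

rng = np.random.default_rng(4)
c = 0.08 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))  # stand-in taps
q = quantize_complex(c)
print(f"SQNR = {sqnr_db(c, q):.2f} dB")
```

Since every quantized part is 0 or ±2^-e, multiplying a received sample by it reduces to an e-bit right shift with an optional sign change, which is what allows the multipliers to be replaced by shifters.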

FIG. 3.8 FER between conventional and quantization approach

Finally, we propose a low-complexity cross-correlation design for FFT window detection that combines the most-significant taps scheme with the quantization approach. The algorithm is shown as follows:

As in Eq 3.4, ‘R’ is the received signal and ‘N’ is the number of taps used. The value of ‘N’ that reduces complexity while still maintaining performance depends on the channel condition and the user's requirements. In chapter 5, we show simulation results relating channel model, complexity, and performance on our 802.11a system platform.

CHAPTER 4 A Low Complexity and High Throughput Frame Synchronizer for OFDM-Based UWB System

In this chapter, a novel frame synchronizer is proposed for OFDM-based UWB systems. By integrating the tap-reduction scheme, the register-sharing algorithm, and a dynamic threshold, the proposed design saves over 50% of the area cost and power consumption of the conventional design with an acceptable performance loss. Moreover, the proposed design achieves 528 MS/s throughput for a UWB system with 120~480 Mb/s data rates in a 0.18 µm CMOS process.

4.1 Motivation

For an OFDM-based UWB system, the frame synchronizer requires a throughput of more than several hundred mega-samples per second. A conventional frame synchronizer with a single matched filter cannot achieve this throughput efficiently because of the long critical path of the complex multipliers in the matched filter. On the other hand, parallel approaches with multiple matched filters [9-10] achieve the throughput at the price of high area cost and high power consumption. Reducing matched-filter complexity is therefore the main concern in implementing our design. In a matched filter, the tap number and the required throughput dominate the design complexity. We thus propose a tap-reduction scheme that lowers the tap number to reduce complexity.

Furthermore, a register-sharing algorithm cooperates with the tap-reduction scheme to reduce the size of the register files required by the parallel architecture. Finally, a dynamic threshold design is adopted to improve the frame error rate over the conventional fixed-threshold design. The platform of our OFDM-based UWB system was introduced in section 2.2. In the following, we first introduce the proposed algorithm for the LDPC-COFDM system, which reaches 528 MS/s throughput and includes the tap-reduction scheme, the register-sharing algorithm, and the dynamic threshold design. We then apply the proposed algorithm to the MB-OFDM system and add a dynamic searching window algorithm to detect the RF switching among the three time-interleaved sub-bands. The performance analysis and simulation results of the proposed design are shown in chapter 5.

4.2 LDPC-COFDM Design

In the LDPC-COFDM system, the transmitter sends valid data in one fixed sub-band with 528 MHz bandwidth. Because the OFDM symbols are not time-interleaved, the TFC of the RF remains constant, so the frame synchronizer need not consider the correct switching time between sub-bands.

4.2.1 Frame Synchronizer Flow

FIG 4.1 shows the data flow of the proposed frame synchronizer for the LDPC-COFDM UWB system. First, packet detection detects a valid packet in the received signal with the auto-correlation scheme. After the packet is announced, FFT window detection finds the correct FFT window boundary with matched filters. Preamble timing detection then distinguishes the three kinds of sync symbols (PS, FS, CES) in the preamble. Finally, using the control signals from these three main blocks, the FFT symbol gate passes the OFDM data symbols to the FFT for transformation into the frequency domain.

[Figure: data path from the ADC to the FFT, with the control signals Packet Announce, Boundary Cut, and Preamble Cut.]

FIG. 4.1 Frame synchronizer flow

4.2.1.1 Packet Detection

Noise and valid packets are distinguished by means of the periodic packet sync symbols. The normalized auto-correlation scheme of packet detection is shown as follows:

(Eq 4.1)

In Eq 4.1, the parameter ‘r’ is the received signal from the ADC. Before a valid packet is announced, the received signal is divided into received symbols of 312.5 ns duration (equal to one OFDM symbol duration). The parameter ‘X’ is the index of the received symbol, and ‘N’ is the number of samples in one received symbol. The result ‘AX’ is the auto-correlation value of the Xth received symbol, ‘PX’ is the power estimate of the Xth received symbol, and λ_X is the normalized auto-correlation value of the Xth received symbol. In [30], the AFC estimates the CFO from the phase of the auto-correlation value of an OFDM symbol pair separated by three symbol durations. To share the auto-correlation value with the AFC, packet detection calculates the auto-correlation between the Xth and (X+3)th received symbols, as shown in FIG 4.2. Moreover, to prevent false announcements, packet detection asserts a valid packet at index X only when both λ_X and λ_{X-1} are higher than the pre-defined threshold.

[Figure: received symbols X = 1 to X = 9, each 312.5 ns long; the normalized auto-correlation values λ₂ to λ₆ are each compared with the threshold.]

FIG. 4.2 Packet detection flow
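The symbol-wise detection rule can be sketched as follows. The form of the normalization is an assumption (|A_X|² / P_X², with the power taken over the later symbol), symbol indices are 0-based, and the random ±1 sync symbol is a stand-in for the real packet sync symbol. The symbol length of 165 samples follows from 312.5 ns at 528 MS/s.

```python
import numpy as np

N = 165   # samples per received symbol: 312.5 ns at 528 MS/s

def lambda_x(r, x, n=N, gap=3):
    """Normalized auto-correlation between received symbols x and x+gap (0-based x)."""
    a = r[x * n:(x + 1) * n]
    b = r[(x + gap) * n:(x + gap + 1) * n]
    a_x = np.sum(a * np.conj(b))          # auto-correlation value
    p_x = np.sum(np.abs(b) ** 2)          # power estimate
    return np.abs(a_x) ** 2 / p_x ** 2

def detect(r, threshold, n=N, gap=3):
    """Return the first symbol index X with lambda_X and lambda_(X-1) above threshold."""
    n_sym = len(r) // n - gap
    lam = [lambda_x(r, x, n, gap) for x in range(n_sym)]
    for x in range(1, n_sym):
        if lam[x] > threshold and lam[x - 1] > threshold:
            return x
    return None

rng = np.random.default_rng(5)
sync = rng.choice([-1.0, 1.0], size=N) + 0j           # stand-in periodic sync symbol
noise = rng.standard_normal(3 * N) + 1j * rng.standard_normal(3 * N)
rx = np.concatenate([noise, np.tile(sync, 8)])        # 3 noise symbols, then 8 sync symbols
first_assert = detect(rx, threshold=0.5)
```

Here the first symbol pair made of two sync symbols is X = 3, and the assertion is raised one symbol later, at X = 4, because two consecutive values must exceed the threshold.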

4.2.1.2 FFT Window Detection

After packet detection, FFT window detection finds the FFT window boundary by comparing the sync sequences in the packet sync symbol. It is also based on the cross-correlation algorithm and a matched filter. As stated in section 2.2.1, the sync sequence has 128 points, so the matched filter has 128 taps. The cross-correlation algorithm is shown as follows:

In Eq 4.2, ‘r’ is the received data from the ADC, ‘s’ is the corresponding sync sequence used as the matched-filter coefficients, ‘Ls’ = 128 is the total tap number, and ‘m’ is the index within the pre-defined searching window of 312.5 ns duration (equal to one OFDM symbol duration).

4.2.1.3 Preamble Timing Detection

In the proposed frame synchronizer, FFT window detection only finds the FFT window boundary. Preamble timing detection is still needed to separate the preamble from the received data. The decision scheme of preamble timing detection is shown as follows:

(Eq 4.3)

In the above equation, ‘DY’ is the auto-correlation value of the sync sequences in the Yth packet sync symbol, ‘PY’ is the corresponding symbol power, and ‘Ls’ = 128 is the number of points in a sync sequence. Preamble timing detection is also based on the auto-correlation scheme, and ‘Γ’ is the comparison threshold. According to the proposal [19], the frame sync symbol equals the packet sync symbol multiplied by –1. The auto-correlation value between the last packet sync symbol and the first frame sync symbol therefore has the opposite sign to the auto-correlation values of the other sync symbol pairs. Preamble timing detection thus decides the first sync symbol by Eq 4.3, as shown in FIG 4.3. Because the FFT window boundary has already been detected before preamble timing detection, we can remove the cyclic prefix, which is corrupted by ISI, from the sync symbols and use only the sync sequences for correlation estimation.

[Figure: sync symbols up to the 21st (last) packet sync symbol; annotation: eliminate and approach zero.]

FIG. 4.3 Preamble timing detect flow
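The sign-flip test can be illustrated with a short sketch. The exact decision rule is an assumption: the real part of the correlation between consecutive sync sequences is compared with -Γ times the symbol power. The function name `find_first_frame_sync`, the threshold value, and the symbol counts (5 packet sync and 3 frame sync symbols) are hypothetical and chosen only to keep the example small.

```python
import numpy as np

LS = 128   # points in one sync sequence

def find_first_frame_sync(symbols, gamma=0.5):
    """Return index Y+1 of the first symbol whose correlation with symbol Y is negative.

    symbols : 2-D array, one sync sequence (cyclic prefix removed) per row
    gamma   : comparison threshold on the normalized correlation (assumed form)
    """
    for y in range(len(symbols) - 1):
        d_y = np.sum(symbols[y] * np.conj(symbols[y + 1]))   # auto-correlation value
        p_y = np.sum(np.abs(symbols[y]) ** 2)                # symbol power
        if d_y.real < -gamma * p_y:
            return y + 1
    return None

rng = np.random.default_rng(6)
ps = rng.choice([-1.0, 1.0], size=LS) + 0j        # stand-in packet sync sequence
preamble = np.array([ps] * 5 + [-ps] * 3)         # 5 PS symbols, then 3 FS symbols
preamble = preamble + 0.1 * (rng.standard_normal(preamble.shape)
                             + 1j * rng.standard_normal(preamble.shape))
first_fs = find_first_frame_sync(preamble)
```

Only the pair that straddles the transition from packet sync to frame sync yields a negative correlation, so the sketch reports row 5 (0-based) as the first frame sync symbol.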

4.2.2 Proposed Algorithm

4.2.2.1 Tap-Reduction Scheme

As mentioned earlier, parallel approaches that achieve 528 MS/s throughput lead to high hardware cost and power consumption. To lower the complexity, reducing the tap number of the matched filter was proposed in [10]. The trade-off is performance degradation of the frame synchronizer.

According to the UWB system proposal [19], the power of the sync sequence is constant at every sample point, so the most-significant taps scheme introduced in section 3.2.1 cannot be applied to reduce the tap number of the matched filter. Instead, we propose a tap-reduction scheme that exploits this uniform power distribution and reduces the correlation complexity by down-sampling the received signals. The proposed tap-reduction scheme can also be applied to the auto-correlation scheme. In the following, we show the modified versions of Eq 4.1~Eq 4.3:

Packet Detection:

(Eq 4.4)

FFT Window Detection:

[Equation not recovered; surviving fragments: upper summation limit ⌊(Ls−1)/ω⌋ and a squared magnitude.]

(Eq 4.5)

Preamble Timing Detection:

(Eq 4.6)

In Eq 4.4~Eq 4.6, the parameter ‘ω’ is a reduction factor that controls the correlation complexity and the tap number of each function block.

Unlike a conventional down-sampling scheme, whose throughput is only 1/‘ω’ of the input data rate, the tap-reduction scheme keeps the same throughput as the input data (528 MS/s) to preserve the timing resolution of FFT window detection. The sync sequence used as the matched-filter taps is divided into ‘ω’ groups (S_{n×ω+j}, j ∈ {0, 1, 2, ..., ω−1}). Because the power of the sync sequence is evenly distributed, any one of the ‘ω’ groups performs equally well as matched-filter taps. The detailed performance simulation of the tap-reduction scheme is shown in section 5.2.1. Based on the simulation results, we propose ‘ω’ = 4 for our frame synchronizer. The data flows of the conventional design and of the design using the tap-reduction scheme (with ‘ω’ = 4, ‘j’ = 3) are shown in FIG 4.4 and FIG 4.5.
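A minimal sketch of the tap-reduction idea: only the taps of one group (indices n×ω+j) are multiplied, yet the metric is still evaluated at every input sample, so the timing resolution is unchanged. The squared-magnitude metric, the random ±1 sync sequence, the noise level, and the boundary position are assumptions made for the illustration.

```python
import numpy as np

LS = 128   # taps in the conventional matched filter

def tap_reduced_metric(r, s, m, omega=4, j=3):
    """Squared correlation at offset m using only taps s[n*omega + j]."""
    idx = np.arange(j, len(s), omega)             # tap group j: j, j+omega, j+2*omega, ...
    return np.abs(np.sum(r[m + idx] * np.conj(s[idx]))) ** 2

rng = np.random.default_rng(7)
s = rng.choice([-1.0, 1.0], size=LS) + 0j         # stand-in constant-power sync sequence
true_m = 37                                       # hypothetical boundary, not a multiple of 4
r = 0.1 * (rng.standard_normal(330) + 1j * rng.standard_normal(330))
r[true_m:true_m + LS] += s

# The metric is evaluated at every input sample (full timing resolution),
# but each evaluation multiplies only 128 / 4 = 32 taps.
metric = np.array([tap_reduced_metric(r, s, m) for m in range(165)])
found_m = int(np.argmax(metric))
taps_used = len(np.arange(3, LS, 4))
```

The boundary at sample 37 should still be found even though it is not a multiple of ω, which is the difference from plain down-sampling of the input by ω.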

[Figure: register file with 128 words; the received samples 0, 1, 2, ..., 133 are stored in the register files as they arrive.]

FIG. 4.4 Data flow of conventional design with 128 taps (ω=1)

[Figure: register file with 32 words; the received samples 0, 1, 2, ..., 133 are stored in the register files as they arrive.]

FIG 4.4 shows the conventional design with 128 taps (‘ω’ = 1), in which the register files storing the received samples for cross-correlation hold 128 words. FIG 4.5 shows the tap-reduction scheme with 32 taps (‘ω’ = 4, ‘j’ = 3). Comparing FIG 4.4 and FIG 4.5, the tap-reduction scheme reduces the correlation complexity and the register-file length of the conventional design by 75%, from 128 taps to 32 taps. However,
