**CHAPTER 3 A Low Complexity Frame Synchronizer for OFDM**

**3.2 Proposed Algorithm**


the proposed design still retains a 72.4% power ratio relative to the conventional design. Therefore, it can achieve better performance than [28]. Alternatively, the most-significant taps scheme requires only 20 taps to reach a 50% power ratio, saving 37.5% of the complexity of [28].

**3.2.2 Quantization Approach**

Another effective approach to reducing the complexity of cross-correlation was proposed in [29]. That correlation scheme quantizes the matched-filter coefficients to values in $\{0, \pm 2^{0}, \pm 2^{-1}, \pm 2^{-2}, \ldots, \pm 2^{-q}\}$. With these power-of-two coefficients, each multiplication in the cross-correlation can be replaced by a shift of at most q bits, so the multipliers used for correlation simplify into q-bit shifters. For the IEEE 802.11a standard, the time-domain long training symbol can be quantized into $\{0, \pm 2^{-3}, \pm 2^{-4}, \pm 2^{-5}, \pm 2^{-6}\}$. The drawback of this approach is its severe quantization error, as shown in FIG. 3.7.
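The following is a rough illustration of the idea in [29]. The sketch rounds each coefficient to the nearest level in $\{0, \pm 2^{0}, \ldots, \pm 2^{-q}\}$. It then multiplies an integer sample by the quantized coefficient using a single arithmetic right shift. The function names, the nearest-level rounding rule and the integer sample format are assumptions for illustration; they are not details taken from [29].

```python
def quantize_pow2(c: float, q: int) -> float:
    """Round c to the nearest level in {0, ±2^0, ±2^-1, ..., ±2^-q} (assumed rounding rule)."""
    levels = [0.0] + [2.0 ** -k for k in range(q + 1)]
    best = min(levels, key=lambda v: abs(abs(c) - v))
    return best if c >= 0 else -best


def shift_amount(level: float) -> int:
    """Return k such that |level| == 2^-k for a nonzero power-of-two level."""
    k = 0
    mag = abs(level)
    while mag < 1.0:
        mag *= 2.0
        k += 1
    return k


def shift_multiply(x: int, level: float) -> int:
    """Multiply integer sample x by a quantized coefficient using only a right shift.

    Python's >> is an arithmetic (floor) shift, like a hardware shifter on
    two's-complement data, so the result is floor(x * 2^-k) before the sign flip.
    """
    if level == 0.0:
        return 0
    y = x >> shift_amount(level)
    return y if level > 0 else -y


# Example: a coefficient of 0.3 with q = 3 becomes 2^-2, so 100 * 0.25 is one 2-bit shift.
c_q = quantize_pow2(0.3, 3)
print(c_q, shift_multiply(100, c_q))  # 0.25 25
```

In hardware, the shift amount for each tap would be fixed at design time. Each "multiplier" then reduces to wiring plus a sign select.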

FIG. 3.7 Tap power analysis of quantized approach


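We use the signal-to-quantization-noise ratio (SQNR) to estimate the quantization error (Eq 3.5):

$$\mathrm{SQNR} = 10\log_{10}\frac{\sum_{n}\left|C_n\right|^{2}}{\sum_{n}\left|C_n - Q_n\right|^{2}}\ \text{(dB)} \qquad \text{(Eq 3.5)}$$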

Here C is the original matched-filter coefficient and Q is the coefficient after quantization. The SQNR of the quantization approach is 14.86 dB. Although this SQNR is relatively poor, FIG. 3.8 shows the FER simulation in a multipath channel with 150 ns RMS delay spread and CFO = 100 kHz under perfect packet detection. The SNR loss between the original 64 taps and the quantized 64 taps is only 0.5 dB at 1% FER.
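The SQNR of Eq 3.5 can be computed directly once the original and quantized tap vectors are available. The sketch below assumes the form "total coefficient power over total error power, in dB". The example vectors are made up for illustration and are not the 802.11a long training symbol.

```python
import math


def sqnr_db(original, quantized):
    """SQNR in dB: sum |C_n|^2 over sum |C_n - Q_n|^2 (assumed form of Eq 3.5).

    Works for real or complex taps; returns +inf when quantization is exact.
    """
    if len(original) != len(quantized):
        raise ValueError("tap vectors must have the same length")
    signal = sum(abs(c) ** 2 for c in original)
    error = sum(abs(c - q) ** 2 for c, q in zip(original, quantized))
    if error == 0:
        return math.inf
    return 10.0 * math.log10(signal / error)


# Two made-up taps, each off by 0.1: signal power 2, error power 0.02, so about 20 dB.
print(round(sqnr_db([1.0, 1.0], [0.9, 1.1]), 6))  # 20.0
```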

FIG. 3.8 FER between conventional and quantization approach


Finally, we propose a low-complexity cross-correlation design for FFT-window detection that combines the most-significant taps scheme with the quantization approach. The algorithm is shown as follows:

Similar to Eq 3.4, parameter ‘R’ is the received signal and ‘N’ is the number of used taps.

The value of ‘N’ that reduces complexity while maintaining performance depends on the channel condition and the user’s requirements. In Chapter 5, we show simulation results relating channel model, complexity, and performance on our 802.11a system platform.
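The combined equation is not reproduced above, so the following is only a sketch of how the two ideas could be combined. It rests on these assumptions:

- The N most-significant taps are chosen by coefficient power.
- Each kept tap is quantized to a power-of-two level, with the real and imaginary parts handled separately.
- The metric is the squared magnitude of the reduced correlation at window position m.

All function names are hypothetical.

```python
def quantize_pow2(x: float, q: int) -> float:
    """Nearest level in {0, ±2^0, ..., ±2^-q} (assumed rounding rule)."""
    levels = [0.0] + [2.0 ** -k for k in range(q + 1)]
    best = min(levels, key=lambda v: abs(abs(x) - v))
    return best if x >= 0 else -best


def quantize_complex(c: complex, q: int) -> complex:
    """Quantize real and imaginary parts independently."""
    c = complex(c)
    return complex(quantize_pow2(c.real, q), quantize_pow2(c.imag, q))


def select_msb_taps(coeffs, n_taps):
    """Indices (ascending) of the n_taps coefficients with the largest power |c|^2."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]) ** 2, reverse=True)
    return sorted(order[:n_taps])


def power_ratio(coeffs, idx):
    """Fraction of total tap power kept by the selected taps."""
    total = sum(abs(c) ** 2 for c in coeffs)
    return sum(abs(coeffs[i]) ** 2 for i in idx) / total


def reduced_correlation(received, coeffs, idx, q, m):
    """|sum over kept taps i of R[m+i] * conj(Q(c_i))|^2 at window position m."""
    acc = 0j
    for i in idx:
        acc += complex(received[m + i]) * quantize_complex(coeffs[i], q).conjugate()
    return abs(acc) ** 2


coeffs = [1.0, 0.1, -0.8, 0.05]
idx = select_msb_taps(coeffs, 2)               # keeps taps 0 and 2
received = [0, 0, 1, 0, -1, 0, 0]
print(idx, reduced_correlation(received, coeffs, idx, 3, 2))  # [0, 2] 4.0
```

`power_ratio` gives the "power ratio" figure used to compare tap subsets. Sweeping `n_taps` against it is one way to choose ‘N’.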


**CHAPTER 4**

**A Low Complexity and High Throughput Frame Synchronizer for OFDM-Based UWB System**

In this chapter, a novel frame synchronizer is proposed for OFDM-based UWB systems.

The proposed design integrates the tap-reduction scheme, the register-sharing algorithm and a dynamic threshold. It saves over 50% of the area cost and power consumption of the conventional design with an acceptable performance loss. Moreover, the proposed design achieves 528 MS/s throughput for a UWB system with 120 to 480 Mb/s data rates in a 0.18 µm CMOS process.

**4.1 Motivation**

For OFDM-based UWB systems, the frame synchronizer requires a throughput of hundreds of megasamples per second. A conventional frame synchronizer using a single matched filter cannot efficiently reach such throughput because of the long critical path of the complex multipliers in the matched filter. On the other hand, parallel approaches with multiple matched filters [9-10] can reach such high throughput, but they lead to high area cost and power consumption. To solve this problem, reducing matched-filter complexity becomes the main concern of our design. In a matched filter, the tap number and the required throughput dominate design complexity. We therefore propose a tap-reduction scheme that lowers complexity by reducing the tap number.

Furthermore, a register-sharing algorithm cooperates with the tap-reduction scheme to reduce the register-file size required by the parallel architecture. Finally, a dynamic threshold design is adopted to improve frame error rate performance over the conventional fixed-threshold design. The platform of our OFDM-based UWB system was introduced in Section 2.2.

In the following, we first introduce the proposed algorithm for the LDPC-COFDM system, which reaches a high throughput of 528 MS/s. It includes the tap-reduction scheme, the register-sharing algorithm and the dynamic threshold design. We then apply the proposed algorithm to the MB-OFDM system and add a dynamic searching window algorithm to detect the RF switching among the three time-interleaved sub-bands. The performance analysis and simulation results of the proposed design are shown in Chapter 5.

**4.2 LDPC-COFDM Design**

In the LDPC-COFDM system, the transmitter sends valid data in one fixed sub-band with 528 MHz bandwidth. Because the OFDM symbols are not time-interleaved, the TFC (time-frequency code) of the RF front end remains constant. The frame synchronizer therefore does not need to determine the switching time between sub-bands.

**4.2.1 Frame Synchronizer Flow**

FIG 4.1 shows the data flow of the proposed frame synchronizer for the LDPC-COFDM UWB system.

The flow proceeds in four stages:

1. Packet detection detects a valid packet in the received signal using an auto-correlation scheme.
2. After packet announcement, FFT window detection finds the correct FFT window boundary using matched filters.
3. Preamble timing detection distinguishes the three kinds of sync symbols (PS, FS, CES) in the preamble.
4. Using the control signals from these three main blocks, the FFT symbol gate cuts the OFDM data symbols and passes them to the FFT for frequency-domain transformation.


FIG. 4.1 Frame synchronizer flow (input from ADC, output to FFT; control signals Packet Announce, Boundary Cut, and Preamble Cut)

*4.2.1.1 Packet Detection*

Noise signals and valid packets are distinguished by using the periodic packet sync symbols. The normalized auto-correlation scheme of packet detection is shown as follows:

(Eq 4.1)

In Eq 4.1, the parameter ‘r’ is the received signal from the ADC. Before valid packet announcement, the received signal is divided into several received symbols of 312.5 ns duration, equal to one OFDM symbol duration. The parameter ‘X’ is the index of the received symbol, and ‘N’ is the total number of samples in one received symbol. The calculated results are defined as follows:

- ‘$A_X$’ is the auto-correlation value of the Xth received symbol.
- ‘$P_X$’ is the power estimate of the Xth received symbol.
- ‘$\lambda_X$’ is the normalized auto-correlation value of the Xth received symbol.

In [30], AFC estimates the CFO from the phase of the auto-correlation value of an OFDM symbol pair separated by three symbol durations. To share the auto-correlation value with AFC, packet detection calculates the auto-correlation value between the Xth and (X+3)th received symbols, as shown in FIG 4.2. Moreover, to prevent false announcement, packet detection asserts a valid packet at index X only when both $\lambda_X$ and $\lambda_{X-1}$ are higher than the pre-defined threshold.

FIG. 4.2 Packet detection flow (received symbols X = 1 to 9, each 312.5 ns long; each of $\lambda_1$ to $\lambda_6$ correlates symbol X with symbol X+3 and feeds a threshold compare)
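A behavioral sketch of the packet detection described above is given below. It makes three assumptions:

- $A_X = \sum_n r_X[n]\, r^{*}_{X+3}[n]$.
- $P_X$ is the power of the Xth received symbol.
- The normalization is $\lambda_X = |A_X| / P_X$.

The exact normalization of Eq 4.1 is not reproduced here, so treat these forms, the function name and the toy signal as assumptions. At 528 MS/s, one 312.5 ns symbol is N = 165 samples.

```python
def packet_detect(r, n, threshold, lag=3):
    """Return the 1-based symbol index X at which lambda_X and lambda_(X-1) both
    exceed threshold, or None if no packet is found.

    r: received samples; n: samples per received symbol (165 at 528 MS/s);
    lag: symbol distance of the correlated pair (3, so the value can be shared with AFC).
    """
    num_symbols = len(r) // n
    prev_above = False
    for x in range(num_symbols - lag):           # x is the 0-based symbol index
        cur = [complex(v) for v in r[x * n:(x + 1) * n]]
        later = [complex(v) for v in r[(x + lag) * n:(x + lag + 1) * n]]
        a = sum(u * v.conjugate() for u, v in zip(cur, later))    # A_X
        p = sum(abs(u) ** 2 for u in cur)                         # P_X
        lam = abs(a) / p if p > 0 else 0.0                        # lambda_X (assumed form)
        above = lam > threshold
        if above and prev_above:                 # two consecutive hits prevent false alarms
            return x + 1
        prev_above = above
    return None


# Toy input: two silent symbols, then a repeated 4-sample sync symbol.
sync = [1, -1, 1, 1]
r = [0] * 8 + sync * 7
print(packet_detect(r, 4, 0.5))  # 4
```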

*4.2.1.2 FFT Window Detection*

After packet detection, FFT window detection finds the FFT window boundary by matching the sync sequences in the packet sync symbols. It is also based on the cross-correlation algorithm and a matched filter. As described in Section 2.2.1, the sync sequences have 128 points, so the matched filter has 128 taps. The cross-correlation algorithm is shown as follows:

In Eq 4.2, parameter ‘r’ is the received data from the ADC, ‘s’ is the corresponding sync sequence used as the matched-filter coefficients, and ‘Ls’ = 128 is the total tap number. ‘m’ is the index within the pre-defined searching window, which lasts 312.5 ns (one OFDM symbol duration).
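The matched-filter search of Eq 4.2 can be sketched as below. The sketch assumes the detected boundary is the window position m that maximizes the squared correlation magnitude $\left|\sum_{n} r_{m+n} s^{*}_{n}\right|^{2}$. In the actual design Ls = 128 and the search window spans one OFDM symbol; the short sequence here is a made-up example, and the function name is hypothetical.

```python
def fft_window_detect(r, s, window_start, window_len):
    """Return (m_hat, metrics) where m_hat maximizes |sum_n r[m+n] * conj(s[n])|^2
    over m = window_start .. window_start + window_len - 1."""
    ls = len(s)
    metrics = []
    for m in range(window_start, window_start + window_len):
        acc = sum(complex(r[m + n]) * complex(s[n]).conjugate() for n in range(ls))
        metrics.append(abs(acc) ** 2)
    best = max(range(window_len), key=lambda i: metrics[i])
    return window_start + best, metrics


sync = [1, -1, 1, 1, -1]            # made-up 5-tap sequence (128 taps in the real design)
r = [0, 0, 0] + sync + [0, 0, 0, 0]
m_hat, metrics = fft_window_detect(r, sync, 0, 6)
print(m_hat, metrics)  # 3 [4.0, 1.0, 4.0, 25.0, 4.0, 1.0]
```

Each metric costs Ls complex multiply-accumulates, and one metric is needed per input sample. This cost is what the tap-reduction scheme later targets.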

*4.2.1.3 Preamble Timing Detection*

In the proposed frame synchronizer, FFT window detection only finds the FFT window boundary. Preamble timing detection is still needed to separate the preamble from the received data. The decision scheme of preamble timing detection is shown as follows:

(Eq 4.3)

In the above equation, ‘$D_Y$’ is the auto-correlation value of the sync sequence in the Yth packet sync symbol, ‘$P_Y$’ is the corresponding symbol power, and ‘$L_s$’ = 128 is the total number of points in the sync sequence. Preamble timing detection is also based on the auto-correlation scheme, and ‘Γ’ is the threshold parameter.

According to the proposal [19], the frame sync symbol equals the packet sync symbol multiplied by −1. The auto-correlation value between the last packet sync symbol and the first frame sync symbol is therefore negative, opposite in sign to the auto-correlation values of the other sync symbol pairs. Preamble timing detection uses Eq 4.3 to identify the first frame sync symbol, as shown in FIG 4.3. The FFT window boundary has already been detected before preamble timing detection. We can therefore remove the ISI-corrupted cyclic prefix from the sync symbols and use only the sync sequences for correlation estimation.

FIG. 4.3 Preamble timing detection flow (the 21st, i.e. last, packet sync symbol is marked)
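A minimal sketch of the sign-flip test follows. Eq 4.3 is not reproduced here, so the sketch assumes $D_Y = \sum_n s_Y[n]\, s^{*}_{Y-1}[n]$ over the CP-free sync sequences. It declares the first frame sync symbol at the first Y whose normalized value $\mathrm{Re}(D_Y)/P_Y$ falls below −Γ. The function name and toy sequence are illustrative.

```python
def preamble_timing_detect(symbols, gamma):
    """Return the index Y of the first symbol whose correlation with symbol Y-1 is
    strongly negative (assumed test: Re(D_Y) / P_Y < -gamma), or None.

    symbols: list of sync sequences with the prefix already removed.
    """
    for y in range(1, len(symbols)):
        cur = [complex(v) for v in symbols[y]]
        prev = [complex(v) for v in symbols[y - 1]]
        d = sum(a * b.conjugate() for a, b in zip(cur, prev))   # D_Y
        p = sum(abs(a) ** 2 for a in cur)                        # P_Y
        if p > 0 and d.real / p < -gamma:
            return y
    return None


ps = [1, -1, 1, 1]                 # toy packet sync sequence
fs = [-v for v in ps]              # frame sync = packet sync multiplied by -1
print(preamble_timing_detect([ps, ps, ps, fs, fs], 0.5))  # 3
```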

**4.2.2 Proposed Algorithm**

*4.2.2.1 Tap-Reduction Scheme*

As mentioned earlier, parallel approaches that achieve 528 MS/s throughput lead to high hardware cost and power consumption. To lower complexity, [10] proposed reducing the tap number of the matched filter. The trade-off is a performance degradation of the frame synchronizer.

According to the UWB system proposal [19], the power of the sync sequences is constant at every sample point. The most-significant taps scheme introduced in Section 3.2.1 therefore cannot be applied to reduce the tap number of the matched filter. Instead, exploiting this uniform power distribution of the sync sequences, we propose a tap-reduction scheme that reduces correlation complexity by down-sampling the received signal. The proposed tap-reduction scheme can also be applied to the auto-correlation scheme. The modified versions of Eq 4.1 to Eq 4.3 are as follows:

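Packet Detection:

(Eq 4.4)

FFT Window Detection:

$$\hat{m} = \arg\max_{m}\left|\sum_{n=0}^{\lfloor (L_s-1)/\omega \rfloor} r_{m+n\omega+j}\, s^{*}_{n\omega+j}\right|^{2} \qquad \text{(Eq 4.5)}$$

Preamble Timing Detection:

(Eq 4.6)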

In Eq 4.4 to Eq 4.6, the parameter ‘ω’ is a reduction factor that controls the correlation complexity and tap number of each function block.

A conventional down-sampling scheme produces output at only 1/‘ω’ of the input data rate. The tap-reduction scheme instead keeps the same throughput as the input data (528 MS/s), which preserves the timing resolution of FFT window detection. The sync sequences used as matched-filter taps are likewise divided into ‘ω’ groups, $S_{n\omega+j}$ with $j \in \{0, 1, 2, \ldots, \omega-1\}$. Because the sync sequences have a uniform power distribution, any one of the ‘ω’ groups gives equal performance when chosen as the matched-filter taps. The detailed performance simulation of the tap-reduction scheme is shown in Section 5.2.1; based on those results, we choose ‘ω’ = 4 for our frame synchronizer. The data flows of the conventional design and of the design using the tap-reduction scheme (with ‘ω’ = 4, ‘j’ = 3) are shown below:

FIG. 4.4 Data flow of conventional design with 128 taps (ω = 1; received samples are stored in a 128-word register file)


FIG. 4.5 Data flow of tap-reduction scheme with 32 taps (ω = 4, j = 3; received samples are stored in a 32-word register file)

FIG 4.4 shows the conventional design with 128 taps (‘ω’ = 1); the register file that stores received samples for cross-correlation holds 128 words. FIG 4.5 shows the tap-reduction scheme with 32 taps (‘ω’ = 4, ‘j’ = 3). Comparing FIG 4.4 and FIG 4.5, the tap-reduction scheme reduces the correlation complexity and register-file length of the conventional design by 75%, from 128 taps to 32 taps.

However, when a parallel architecture is used for a high-throughput matched-filter design, the register files must be parallelized as well. To limit the growth in register-file size that parallelism causes, we propose a register-sharing algorithm. It cooperates with the tap-reduction scheme so that the parallel matched filters share received samples, which reduces the required register-file size.
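The tap-reduction scheme can be sketched as a full-rate correlator that keeps only tap group j, $s_{n\omega+j}$, and still produces one metric per input sample. The sketch assumes the metric $\left|\sum_n r_{m+n\omega+j}\, s^{*}_{n\omega+j}\right|^{2}$, with the detected boundary at its maximum. The function names and the short constant-power sequence are illustrative.

```python
def tap_reduced_metric(r, s, omega, j, m):
    """|sum_n r[m + n*omega + j] * conj(s[n*omega + j])|^2 using tap group j only."""
    taps = range(j, len(s), omega)          # about len(s)/omega taps: 32 for Ls=128, omega=4
    acc = sum(complex(r[m + t]) * complex(s[t]).conjugate() for t in taps)
    return abs(acc) ** 2


def tap_reduced_search(r, s, omega, j, window_start, window_len):
    """Evaluate the metric at every sample position (no output decimation), so the
    timing resolution equals that of the full-length correlator."""
    metrics = [tap_reduced_metric(r, s, omega, j, m)
               for m in range(window_start, window_start + window_len)]
    best = max(range(window_len), key=lambda i: metrics[i])
    return window_start + best, metrics


s = [1, -1, 1, 1, -1, 1, -1, -1]   # made-up constant-power sequence, Ls = 8
r = [0, 0] + s + [0, 0, 0, 0]
m_hat, metrics = tap_reduced_search(r, s, omega=2, j=1, window_start=0, window_len=6)
print(len(range(1, len(s), 2)), m_hat, metrics)  # 4 2 [1.0, 0.0, 16.0, 9.0, 1.0, 0.0]
```

With Ls = 128 and ω = 4, each metric uses 32 multiply-accumulates instead of 128. The search still steps one sample at a time.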


*4.2.2.2 Register-Sharing Algorithm*

FIG. 4.6 Example of tap-reduction scheme with parallelism (assume an 8-tap matched filter, parallelism = 2, ω = 2. Conventional parallelism uses Register File 1 (samples 1, 3, 5, 7) and Register File 2 (samples 2, 4, 6, 8) for Matched Filters 1 and 2. With the proposed register-sharing algorithm, both matched filters share Register File 1.)

FIG 4.6 shows an example with an 8-tap matched filter. With ω = 2, the register file that stores received data is reduced to 4 words. However, a 2-way parallel architecture needs two sets of register files for the two matched filters, which increases the required register-file size. We therefore propose a register-sharing algorithm to solve this problem. It reschedules the received data and the compared taps so that the two matched filters share the same received data from a single register file, reducing the hardware cost of the register files.

The register-sharing algorithm is shown in Eq 4.7. The left side is the tap-reduction scheme of Eq 4.5. The right side is the proposed register-sharing algorithm, obtained by rescheduling the indices of the received data and the compared taps.

The detailed derivation of the register-sharing algorithm is shown in Eq 4.8:

(Eq 4.8)

To explain the proposed algorithm, we use an 8-tap matched filter with partition factor ‘ω’ = 2 as an example, as shown in FIG 4.7.

FIG. 4.7 Data flow example of the proposed design (without and with register-sharing)


As shown in FIG 4.7, the conventional design compares received data 1 to 8 with compared taps 1 to 8 at K = 0, and received data 2 to 9 with compared taps 1 to 8 at K = 1. For ‘ω’ = 2, the tap-reduction scheme divides the information of the conventional design into two data-partition groups.

Without the proposed algorithm, the matched filter uses only one data-partition group of compared taps (1, 3, 5, 7) to compute the matched-filter power, so the register file must be refreshed at every sample cycle. With the proposed algorithm, the matched filter uses all data-partition groups of compared taps in turn over successive sample cycles. The register file in FIG 4.7 can therefore share received data between K = 0 and K = 1. When the register-sharing algorithm is applied to a parallel architecture, the required register-file size is reduced as shown in FIG 4.6.

Because it compares different tap groups with the shared received data, the register-sharing algorithm must be used together with the tap-reduction scheme. Furthermore, it suits only matched-filter coefficients with a constant power distribution: only then do all tap groups have the same power ratio and yield equivalent correlation results. The data flow of the register-sharing algorithm with the proposed reduction factor ‘ω’ = 4 is shown in FIG 4.8.

Eq 4.9 computes the access ratio between the tap-reduction-only design (FIG 4.5) and the design with the register-sharing algorithm (FIG 4.8). The tap-reduction-only design accesses 32 words in the first cycle and re-accesses 32 words in every subsequent cycle. The proposed design accesses 32 words for the first 4 cycles and then re-accesses only 1 word every 4 cycles. The parameter ‘N’ is the searching window length of FFT window detection, equal to the number of samples in one OFDM symbol.

(Eq 4.9: access ratio = 1.37%)


FIG. 4.8 Data flow of register-sharing algorithm with 32 taps (received samples 0 to 127+N−1 are stored in the register files)
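The sketch below shows one way to realize the rescheduling described above. It uses an assumed schedule that is consistent with the K = 0 / K = 1 example in the text; it is not a transcription of Eq 4.7 or Eq 4.8. For output index m with phase i = m mod ω, it uses tap group ω−1−i. All ω outputs in a block therefore read the same received samples, and when the block advances only one new word enters the register file. Indices are 0-based, and Ls is assumed to be a multiple of ω (128 and 4 in the design). All names are hypothetical.

```python
def shared_schedule(m, omega, ls):
    """Return (tap_group, tap_indices, data_indices) for output index m.

    The tap group rotates with the phase m % omega so that every output in the
    same block m // omega reads identical received-sample indices.
    """
    phase = m % omega
    group = omega - 1 - phase
    taps = list(range(group, ls, omega))     # s[n*omega + group]
    data = [m + t for t in taps]             # r[m + n*omega + group]
    return group, taps, data


def shared_metric(r, s, omega, m):
    """Squared correlation magnitude computed with the shared schedule."""
    _, taps, data = shared_schedule(m, omega, len(s))
    acc = sum(complex(r[d]) * complex(s[t]).conjugate() for d, t in zip(data, taps))
    return abs(acc) ** 2


def words_loaded(omega, ls, num_outputs):
    """New register-file words needed by each output (the first output loads all)."""
    loads, previous = [], set()
    for m in range(num_outputs):
        current = set(shared_schedule(m, omega, ls)[2])
        loads.append(len(current - previous))
        previous = current
    return loads


# FIG 4.7-style case (8 taps, omega = 2): K = 0 and K = 1 read the same data indices.
print(shared_schedule(0, 2, 8))   # (1, [1, 3, 5, 7], [1, 3, 5, 7])
print(shared_schedule(1, 2, 8))   # (0, [0, 2, 4, 6], [1, 3, 5, 7])
# Design values (Ls = 128, omega = 4): 32 words once, then 1 new word every 4 outputs.
print(words_loaded(4, 128, 8))    # [32, 0, 0, 0, 1, 0, 0, 0]
```

By contrast, a fixed tap group reads a disjoint set of 32 samples at every output. This is the per-cycle re-access that the register-sharing schedule avoids.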

*4.2.2.3 Dynamic Threshold Design*

In general, detection using an auto-correlation scheme compares the estimation result with a pre-defined threshold. In low-SNR regions, however, AWGN severely distorts the received data and shifts the optimal threshold value of the auto-correlation scheme. A dynamic threshold is therefore proposed that generates the compared threshold automatically according to the channel condition. We apply the dynamic threshold design to preamble timing detection in our frame synchronizer. The decision function of preamble timing detection (Eq 4.6) has a pre-defined threshold ‘Γ’, which we calculate with the dynamic threshold design as in (Eq 4.10):


In (Eq 4.10), the parameters ‘D’ and ‘P’ are defined as in (Eq 4.6), and ‘ε’ is a constant factor chosen by the user according to simulation results. In our design, the first threshold ‘Γ’ is calculated from the normalized auto-correlation value at the valid packet announcement. For each subsequent sync symbol, ‘Γ’ is the normalized auto-correlation value of the previous sync symbol multiplied by the constant factor ‘ε’.
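The threshold update can be sketched as follows. Eq 4.10 is not reproduced here, so the sketch makes three assumptions:

- The normalized value of symbol Y is $\mathrm{Re}(D_Y)/P_Y$.
- The first threshold is ε times the normalized value at packet announcement.
- A frame sync symbol is declared when the normalized value drops below −Γ.

The function name and toy sequences are illustrative as well.

```python
def dynamic_threshold_detect(symbols, lambda_announce, eps):
    """Preamble timing detection with a threshold that tracks the channel.

    Gamma for symbol Y is eps times the normalized correlation of symbol Y-1
    (eps times lambda_announce for the first comparison). Returns the index of
    the first frame sync symbol, or None.
    """
    prev_norm = lambda_announce
    for y in range(1, len(symbols)):
        cur = [complex(v) for v in symbols[y]]
        prev = [complex(v) for v in symbols[y - 1]]
        d = sum(a * b.conjugate() for a, b in zip(cur, prev))   # D_Y
        p = sum(abs(a) ** 2 for a in cur)                        # P_Y
        norm = d.real / p if p > 0 else 0.0
        gamma = eps * prev_norm
        if norm < -gamma:
            return y
        prev_norm = norm
    return None


# Noisy toy preamble: neighbouring packet sync symbols only correlate to 0.5,
# and the sign flip at the frame sync symbol reaches only -0.5.
s0, s1, s2 = [1, -1, 1, 1], [1, -1, 1, -1], [1, -1, 1, 1]
s3 = [-1, 1, -1, 1]
print(dynamic_threshold_detect([s0, s1, s2, s3], lambda_announce=0.5, eps=0.6))  # 3
```

With a fixed Γ = 0.8, the test −0.5 < −0.8 would fail on this input. The tracked threshold Γ = 0.3 still detects the sign flip.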

**4.3 Multi-Band OFDM Design**

Unlike the LDPC-COFDM UWB system, the MB-OFDM system uses three sub-bands to transfer data. The baseband frame synchronizer of the MB-OFDM system must therefore control the RF receiver to select the correct sub-band and switch it at the correct time. FIG 4.9 and FIG 4.10 show the received data of the LDPC-COFDM system and the MB-OFDM system, respectively. Until the frame synchronizer detects the sub-bands successfully, the RF receiver of the MB-OFDM system stays fixed at one sub-band. Only data in the selected sub-band is received, and data in the other two sub-bands is filtered out, as shown in FIG 4.10. As a result, the effective preamble length of the MB-OFDM frame synchronizer is only 1/3 of that of the LDPC-COFDM frame synchronizer. The frame synchronizer must therefore use as few packet sync symbols as possible for band detection.

To meet this demand, we increased the correlation complexity of the shared auto-correlator. This approach improves the accuracy of packet detection and reduces the number of packet sync symbols used. The trade-off is that the area cost and power consumption of the shared auto-correlator double. To maintain low power, we propose a low-cost dynamic searching window for band detection. It provides estimated power information for AGC and reduces the turn-on probability of the auto-correlator and the matched filter, which saves power. Packet sync symbols are also saved during AGC: a training packet is added, and the estimated power from the dynamic searching window is used to tune the correct RF gain.

FIG. 4.9 Baseband received data of LDPC-COFDM system (power versus time [ns]; noise followed by packet start)


FIG. 4.10 Baseband Received Data of MB-OFDM system

**4.3.1 Frame Synchronizer Flow (MB-OFDM)**

FIG. 4.11 Frame synchronizer flow of MB-OFDM UWB system (band detection provides the band boundary for RF receiver band select; FFT window detection provides the FFT boundary)


FIG 4.11 shows the frame synchronizer flow of the MB-OFDM UWB system. The flow proceeds as follows:

1. The control FSM fixes the RF receiver at sub-band 1, and AGC tunes the RF gain on the noise signal.
2. After AGC has settled the RF gain, packet detection uses the auto-correlation scheme to detect the valid packet. At the same time, band detection determines the correct switching time of the time-interleaved OFDM symbols transmitted on the three sub-bands.
3. Once packet valid is asserted, the control FSM switches the sub-band of the RF receiver at the corresponding time, using the band boundary information from band detection.
4. FFT window detection finds the FFT window boundary within ±16 sample cycles of the band boundary.
5. Preamble timing detection distinguishes the three kinds of sync symbols (PS, FS, CES) in the preamble and controls the FSM to cut the OFDM data symbols for the FFT.

**4.3.2 Proposed Algorithm**

*4.3.2.1 Training AGC*

In our system platform, we assume that the variable gain amplifier (VGA) of the RF receiver can tune its gain from 0 to 70 dB, and we implement the AGC block with a signal power measurement algorithm. For low cost, we use the estimated power information from band detection and build one AGC lookup table with an effective range of −10 to 10 dB to tune the VGA gain. The drawback of the AGC lookup table is its long searching time under high-SNR conditions. In the LDPC-COFDM system, there are enough packet sync symbols for AGC to tune the VGA gain. The MB-OFDM system, however, greatly reduces the sync symbols available for AGC. An overly long AGC time in the high-SNR region will then cause the frame synchronizer to fail for lack of sync symbols.

To solve this problem, we propose a training AGC that tunes the VGA gain by binary search:

1. Before valid data is transferred, the transmitter sends a training packet to the receiver.
2. AGC tunes the correct gain within at most 4 effective packet sync symbols (12 OFDM symbol durations).
3. The settled gains for the noise signal and the data signal in the training packet are stored as training gains.
4. When valid data is transferred, AGC starts from the training gain and fine-tunes the VGA gain with the AGC lookup table.

As a result, AGC consumes only one effective packet sync symbol (3 OFDM symbol durations). The AGC algorithm is shown as follows:

(Eq 4.11)

The quantities in (Eq 4.11) are:

- $\mathrm{GAIN}_{est}$ is the estimated gain from the AGC lookup table, with an effective range of −10 to 10 dB.
- $\mathrm{GAIN}_{max}$ and $\mathrm{GAIN}_{min}$ are the maximum and minimum possible VGA gains. In our design, $\mathrm{GAIN}_{max}$ = 70 dB and $\mathrm{GAIN}_{min}$ = 0 dB.
- $\mathrm{GAIN}_{now}$ is the current VGA gain.
- $\mathrm{GAIN}_{next}$ is the computed gain for the next step.

The detailed data flow of the training AGC is shown in FIG 4.12.


FIG. 4.12 Detail data flow of training AGC (AGC training for the noise signal and the data signal)
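The binary-search training and lookup-table fine tuning can be sketched as below. Eq 4.11 is not reproduced here, so the sketch rests on these assumptions:

- A simple plant in which measured power in dB equals input power plus VGA gain.
- A hypothetical lookup table that returns the needed correction, saturated to ±10 dB.
- A bisection on $[\mathrm{GAIN}_{min}, \mathrm{GAIN}_{max}]$ = [0, 70] dB that stops once the error falls inside the table range.

Function and parameter names are illustrative.

```python
def lookup_estimate(error_db, limit=10.0):
    """Hypothetical AGC lookup table: gain correction that cancels error_db,
    saturated to the table's effective range of +/-limit dB."""
    return max(-limit, min(limit, -error_db))


def clamp(g, lo, hi):
    return max(lo, min(hi, g))


def training_agc(input_db, target_db, gain_min=0.0, gain_max=70.0, limit=10.0, max_steps=8):
    """Bisect the VGA gain on the training packet; return (training_gain, steps used)."""
    lo, hi = gain_min, gain_max
    gain = (lo + hi) / 2.0
    for step in range(1, max_steps + 1):
        error = input_db + gain - target_db           # measured power above target (dB)
        if abs(error) <= limit:                       # inside lookup range: one fine step
            return clamp(gain + lookup_estimate(error, limit), gain_min, gain_max), step
        if error < 0:
            lo = gain                                 # signal too weak: search upper half
        else:
            hi = gain                                 # signal too strong: search lower half
        gain = (lo + hi) / 2.0
    return gain, max_steps


def data_packet_agc(training_gain, input_db, target_db, gain_min=0.0, gain_max=70.0, limit=10.0):
    """Start from the stored training gain and apply one lookup-table correction."""
    error = input_db + training_gain - target_db
    return clamp(training_gain + lookup_estimate(error, limit), gain_min, gain_max)


gain, steps = training_agc(input_db=-60.0, target_db=-10.0)
print(gain, steps)                                 # 50.0 2
print(data_packet_agc(gain, -58.0, -10.0))         # 48.0
```

Bisection halves the 70 dB search range on each step. A lookup table limited to ±10 dB corrections would need several steps for a large initial error, which is the high-SNR delay the training AGC is meant to avoid.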


*4.3.2.2 Band Detection*

In the MB-OFDM system, band detection must determine the correct sub-band switching time to receive the time-interleaved OFDM symbols, as in FIG 2.4. FIG 4.10 shows that, before band detection, only 242.4 ns (128 samples) of every 937.5 ns period (3 OFDM symbols) contains data. If we accumulate the power of the received signal over 128 consecutive samples, the accumulated value reaches a local maximum once in every 937.5 ns period. This happens when the 128-sample window coincides with the 128-sample data burst. FIG 4.13 shows the accumulated power in the time domain; the end of sub-band 1 is located at the index of the local maximum.

FIG. 4.13 Accumulated power of continuous 128 samples (accumulated power versus time [ns]; noise, packet start, and band boundary are marked)
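The accumulated-power search can be sketched with a running 128-sample energy sum. The sketch assumes the band boundary is the last sample of the highest-energy window within one 937.5 ns period (495 samples at 528 MS/s). The dynamic searching window itself is not modelled, and the names and toy burst are illustrative.

```python
def sliding_power(r, w):
    """acc[i] = sum of |r[k]|^2 over the w samples ending at index i + w - 1."""
    acc, total = [], 0.0
    for k, x in enumerate(r):
        total += abs(x) ** 2
        if k >= w:
            total -= abs(r[k - w]) ** 2           # drop the sample leaving the window
        if k >= w - 1:
            acc.append(total)
    return acc


def band_boundary(r, w=128, period=495):
    """Index of the last sample of the highest-energy w-sample window within the
    first period; for an MB-OFDM burst this marks the end of the sub-band-1 data."""
    acc = sliding_power(r, w)
    span = min(period, len(acc))
    best = max(range(span), key=lambda i: acc[i])
    return best + w - 1


# Toy example: 3-sample bursts repeating every 8 samples (128 and 495 in the design).
r = [0, 0, 1, 1, 1, 0, 0, 0] * 2
print(band_boundary(r, w=3, period=8))  # 4
```

The running sum needs only one add and one subtract per sample, which keeps the band detector much cheaper than the correlators.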


To detect the end of sub-band 1, we use a dynamic searching window to find the
