Summary - Symbol Rate Frame Synchronization with FD-ADC Architecture

Part I PHY Layer: Three Key modules for Multi-mode FD receiver

Chapter 2 Symbol Rate Frame Synchronization with FD-ADC Architecture

2.6 Summary

This chapter builds a low-complexity symbol-rate sequential searcher for OFDM symbol synchronization based on FD-ADC techniques. By searching with simple matched-filter detection over multiple preambles, the proposed method tolerates CFO well and works properly at low SNR, which makes initial synchronization easier and more flexible. Because all processes run at the symbol rate, hardware sharing yields a low-complexity implementation. The main cost of the proposed searcher is that it requires one additional preamble. Hence, this work is attractive for very high throughput (VHT) wireless LAN and multi-Gbps WPAN specifications using FD-ADC techniques.

Chapter 3 FD Channel Estimation and Equalization with Single-FFT Architecture for SCBT System

Without a CP, most SC transmissions cannot adopt a frequency-domain equalizer (FDE) directly. This chapter utilizes a frequency-domain channel estimator (FD-CE) and a decision-feedback aliasing canceller (DF-AC) to produce a single-FFT SC-FDE. In this way, SCBT can be decoded using the sphere decoder of MIMO-OFDM modems to support multi-mode operation and backward compatibility under an acceptable complexity in IEEE 802.11 VHT. An N-point FFT is sufficient to measure channel frequency responses (CFR) from L-sample preambles (L ≤ N/2). M-bit block codes (M ≤ L) are then decodable over the frequency domain with the DF-AC's help. Simulations and measurements indicate that this work can ensure adequate performance against the distortions of multipath propagation, even when no CP exists.

In most wireless broadband applications, such as WiMAX, WiFi, UWB and WRAN, compensation for multipath fading is essential to make systems work properly. With the help of a CP, SISO-OFDM and MIMO-OFDM systems can be easily integrated, and channels can be estimated and packets equalized over the frequency domain directly. Although SCBT without any CP gains throughput, it causes FFT aliasing in the FDE, so the FDE cannot assure sufficient performance directly. Thus, one of the major challenges for multi-mode integration is to make equalizers as compact as possible, i.e., to consolidate non-CP SCBT, SISO OFDM and MIMO OFDM. Reconfigurable and scalable architectures [63]-[64] with heterogeneous units are good solutions to support such operations. However, multiple equalizers are built into these designs. A single-carrier frequency-domain equalizer (SC-FDE), being an attractive solution, has been developed to eliminate FFT aliasing without a circular property in some approaches [34]-[35], e.g., overlap-and-save and overlap-and-add methods. Yet, additional DFT units were included. User-defined formats [65]-[67], which insert a CP into single-carrier data against multipath propagation, were created to improve performance. SC-FDEs with a pair of FFT and IFFT for IEEE 802.16 [64] and IEEE 802.15.3a with an impulse option [69] were developed to demodulate SCBT with CP over the time domain. In the case of single-carrier transmissions with pseudo-noise (PN) spreading, a block-based SC-FDE with both DFT and IDFT for HIPERLAN-2 [70] was utilized to yield an additional 3-dB gain, and it was also demodulated in the time domain.

The objective of this chapter is to derive single-FFT processes that support multi-mode operation and backward compatibility under an acceptable complexity in MIMO-OFDM modems, such as IEEE 802.11 VHT, with a non-CP SCBT. Figure 3-1 displays the block diagram of the proposed SC-FDE, which uses the sphere decoder (SD) [71]-[73] that is widely adopted in MIMO-OFDM modems. All equalization and decoding are performed over the frequency domain. An N-point FFT is sufficient to process L-sample preambles and M-bit block codes (M ≤ L ≤ N/2).

The remainder of this chapter is organized as follows. The system assumptions with problem statements are addressed in Section 3.2. The proposed single-FFT processes are described in Section 3.3. Performance evaluations are presented in Section 3.4. Implementations and complexity are discussed in Section 3.5. Finally, Section 3.6 presents our conclusions.

Figure 3-1: The block diagram of FD-ADC based OFDM receivers

3.1 System Assumptions

3.1.1 System Descriptions

Indoor frequency-selective fading, e.g., the IEEE [55] and JTC [74] channel models, is assumed to span roughly two symbols of L-sample preambles and M-bit block codes. In the proposed packet format of non-CP SCBT, data without FEC are encoded by a block code (Kb code sets contain M bits), modulated by QPSK and synchronized via L-sample preambles (L ≥ M). The j-th received signal can be expressed as

$$ r_j = H_L s_j + H_U s_{j-1} + n_0 \tag{3.1} $$

where $r_j$ is the j-th $2L \times 1$ received vector, $n_0$ is a $2L \times 1$ complex additive white Gaussian noise (AWGN) vector with variance $\sigma_n^2$, and $s_j$ is the j-th $L \times 1$ transmitted symbol, e.g., preambles, SC block codes, or multi-carrier data with N-L zeros. $H_{2L \times 2L}$ is the circulant Toeplitz matrix whose first column is $h$, where $h = [h_0, h_1, \ldots, h_{2L-1}]^T$ denotes the channel impulse response (CIR). $H_L$ and $H_U$ are the lower triangular (including diagonal entries) and upper triangular parts of $H_{2L \times 2L}$, respectively.

Linear convolution of two finite sequences is commonly computed by multiplying the fast Fourier transforms of the two zero-padded input sequences. In SISO-OFDM and MIMO-OFDM systems, the CFR can be obtained from the FFT of the received symbol with CP divided by the FFT of the transmitted sequence [75]:

$$ \hat{H}_k = R_k / S_k, \quad k \in \{0, 1, \ldots, N-1\} $$

where $R_k$ and $S_k$ are the k-th FFT outputs of the received symbol and the transmitted sequence, and $\hat{H}_k$ is the k-th element of the estimated CFR. In frequency-selective fading, the CFR is not an identity matrix.

One problem associated with single-FFT processes for non-CP SCBT is that the FFT size would have to be extended until it covers the entire packet. For a finite FFT size, it is impossible to obtain the linear convolution of non-CP symbols and the channel directly.
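The aliasing problem can be illustrated with a small numerical sketch. The block `s` and the channel taps `h` below are hypothetical values chosen only for illustration, and a naive DFT stands in for the FFT. With enough zero padding, the product of the transforms reproduces the linear convolution; with a transform as short as the block itself, the tail of the convolution wraps around onto the first samples.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT (or inverse DFT) of a short sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def linear_conv(a, b):
    """Direct time-domain linear convolution."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def fft_conv(a, b, n):
    """Multiply the n-point DFTs of the zero-padded inputs and transform back."""
    fa = dft(list(a) + [0] * (n - len(a)))
    fb = dft(list(b) + [0] * (n - len(b)))
    return dft([x * y for x, y in zip(fa, fb)], inverse=True)

s = [1, -1, 1, 1]       # hypothetical 4-sample block without CP
h = [1.0, 0.5, 0.25]    # hypothetical 3-tap CIR

exact = linear_conv(s, h)     # 6 samples
padded = fft_conv(s, h, 8)    # n >= 6: equals the linear convolution
aliased = fft_conv(s, h, 4)   # n = 4: samples 4 and 5 fold onto samples 0 and 1
```

In the aliased result only the samples that receive no wrapped tail (here samples 2 and 3) still agree with the linear convolution.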

Another consideration is that most zero-forcing equalizers are adversely affected by noise enhancement due to CFR with zeros or very small values (deep fading) [74].

3.2 The Proposed Single-FFT Processes

3.2.1 Frequency-Domain Channel Estimator

Based on the received vector in Equation (3.1), the linear convolution of the j-th symbol and the CIR can be decomposed into two parts: one is the body, denoted by $H_L s_j$, and the other is $H_U s_{j-1}$, caused by interblock interference (IBI). In receivers, the body part of the current symbol convolved with the CIR is mixed with the IBI term of the previous symbol in each FFT window. These two components can be separated in the time domain to rebuild the linear convolution, a step called preamble reconstruction. Taking all possibilities of QPSK modulation into account, the preambles (training symbols) are multiplied by a complex coefficient. If "1" is the basis, the other three cases, $i$, $-1$ and $-i$, can be written as $c_2 \cdot 1$, $c_3 \cdot 1$ and $c_4 \cdot 1$, where $c_2$ is $i$, $c_3$ is $-1$ and $c_4$ is $-i$. Two useful combinations of convolved vectors are defined in Equation (3.5), and the received signals can be rewritten accordingly as Equation (3.6), in which each second term is also AWGN. To reduce AWGN effects, $H_L b$ and $H_U b$, the body and IBI terms of the preamble $b$, are obtained by averaging over the sets of received preambles that collect the two specific combinations of current and previous symbols defined in Equation (3.5). After averaging $r^{(1)}$ and $r^{(2)}$ and solving for $H_L b$ and $H_U b$, the CFR can be measured via one-tap division, using the functions of preamble reconstruction.
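A minimal noiseless sketch of preamble reconstruction follows. It assumes the two useful combinations are (current, previous) = (+b, +b) and (+b, -b), which is one reading of the coefficient pairs quoted in the performance section, and it uses an 11-chip Barker sequence, a hypothetical 4-tap CIR and a naive 32-point DFT. The window model and all function names are illustrative and are not taken from the thesis.

```python
import cmath

BARKER11 = [1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1]

def dft(x, n):
    """Naive n-point DFT of x after zero extension to n samples."""
    x = list(x) + [0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def conv(a, b):
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def rx_window(cur, prev, h):
    """One L-sample window: body of the current symbol plus the IBI tail of the previous one."""
    L = len(cur)
    body = conv(cur, h)[:L]
    tail = conv(prev, h)[L:]
    tail = tail + [0] * (L - len(tail))
    return [x + y for x, y in zip(body, tail)]

def reconstruct(r_same, r_flip):
    """Separate body and IBI terms, then rebuild the linear convolution."""
    body = [(a + b) / 2 for a, b in zip(r_same, r_flip)]   # estimate of H_L b
    ibi = [(a - b) / 2 for a, b in zip(r_same, r_flip)]    # estimate of H_U b
    return body + ibi                                      # 2L samples

h = [1.0, 0.5j, -0.25, 0.1]                  # hypothetical CIR
neg = [-c for c in BARKER11]
r_same = rx_window(BARKER11, BARKER11, h)    # previous preamble has the same sign
r_flip = rx_window(BARKER11, neg, h)         # previous preamble has the opposite sign

N = 32
lin = reconstruct(r_same, r_flip)
cfr_est = [r / b for r, b in zip(dft(lin, N), dft(BARKER11, N))]   # one-tap division
cfr_true = dft(h, N)
```

With noise present, each of the two window types would first be averaged over the received preambles of that type, which is the role of the averaging step described above.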

3.2.2 Decision-Feedback Aliasing Canceller

Based on decision feedback, the j-th received data symbol (in the frequency domain) can be rewritten as

$$ F_N(r_j)_{ze} = F_N(H_L s_j)_{ze} + F_N(H_U \hat{s}_{j-1})_{ze} + F_N(n_0)_{ze} $$

For detection, if a maximum-likelihood (ML) decision [76] is applied to decode data over the frequency domain, the ML estimate equals

$$ \hat{s}_j = \arg\min_{d_j} \left\| F_N(r_j)_{ze} - F_N(H_U \hat{s}_{j-1})_{ze} - F_N(H_L d_j)_{ze} \right\|^2 $$

To reduce the implementation complexity and to share hardware with a MIMO-OFDM modem, the ML decision is replaced by SD. Although the CFR $\hat{H}_N$ is estimated using linear convolution with preamble reconstruction, the body part and the aliasing term of each ideal code set must be separated to form decoding references. Instead of re-transforming to the time domain, the separation is performed in the frequency domain, where $D_H$ is a diagonal matrix containing the CFR. After the channel estimation, $F_N(d_j)_{ze}$ is multiplied by $D_H$ and passed through the separation operator $G$, so that a new coefficient $F_N(H_L d_j)_{ze} = G D_H F_N(d_j)_{ze}$ is produced for sphere decoding. In the time domain, the body part is easily derived by taking the first L samples of the linear convolution. With the previously decoded symbol $\hat{s}_{j-1}$, $s_j$ is decodable over the frequency domain. In the case of decoding the first data symbol, $F_N(H_U \hat{s}_{j-1})_{ze}$ is acquired in advance from the packet header.
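The decision-feedback loop can be sketched as follows. This is a simplified illustration rather than the thesis implementation: it uses a hypothetical four-word codebook, a known hypothetical CIR, an exhaustive minimum-distance search in place of the sphere decoder, and it forms the body and aliasing references in the time domain instead of through the separation operator G.

```python
import cmath

def dft(x, n):
    x = list(x) + [0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def conv(a, b):
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def body(d, h):
    """First L samples of the linear convolution (the H_L d term)."""
    return conv(d, h)[:len(d)]

def alias(d, h):
    """Tail that leaks into the next window (the H_U d term)."""
    L = len(d)
    tail = conv(d, h)[L:]
    return tail + [0] * (L - len(tail))

def decode(blocks, h, codebook, header_index, n):
    """Cancel the aliasing of the previous decision, then pick the closest body reference."""
    refs = [(dft(body(d, h), n), dft(alias(d, h), n)) for d in codebook]
    prev, decisions = header_index, []
    for r in blocks:
        y = [a - b for a, b in zip(dft(r, n), refs[prev][1])]
        dist = [sum(abs(yi - bi) ** 2 for yi, bi in zip(y, ref[0])) for ref in refs]
        prev = dist.index(min(dist))      # decision is fed back for the next block
        decisions.append(prev)
    return decisions

CODEBOOK = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]  # hypothetical
h = [1.0, 0.4 + 0.2j, 0.2]                                                 # hypothetical CIR
tx = [2, 0, 3, 1, 1]
header = 0                     # symbol preceding the first data block, known from the header
prev_words = [header] + tx[:-1]
blocks = [[x + y for x, y in zip(body(CODEBOOK[c], h), alias(CODEBOOK[p], h))]
          for c, p in zip(tx, prev_words)]
decided = decode(blocks, h, CODEBOOK, header, 8)
```

Because each decision selects the aliasing term removed from the next block, a wrong decision can propagate, which is why the first block relies on a symbol known from the packet header.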

3.3 Performance Evaluations

WiFi systems and a linear block code are used to evaluate the proposed SC-FDE (without FEC), where the FFT size is 64 and the number of preambles is 56. Each preamble, modulated by BPSK, is spread with the 11-chip Barker code (L=11) [2]; the two useful combinations of preambles are {$c_1$=1, $c_2$=1} and {$c_3$=1, $c_4$=-1}. Only one antenna is utilized to receive the non-CP SCBT in the MIMO-OFDM modem. The performance criterion is the packet error rate (PER), which is required to be 8% in frequency-selective fading. Figure 3-2 displays the power delay profiles of the JTC and IEEE 802.11 (Naftali model: Rayleigh fading with uniformly distributed phase) channel models, which are used to measure the system performance in multipath environments. The packet length is 1024 bytes, encoded by the linear (8, 4) code [72] with BPSK modulation and by complementary code keying (CCK) [2] with QPSK modulation, respectively. The linear (8, 4) code is defined by its generator matrix $G_{(8,4)}$, and its minimum Hamming distance is 4. In WiFi systems, each CCK codeword (8-bit orthogonal block code) is composed of four phases $\varphi_1$, $\varphi_2$, $\varphi_3$ and $\varphi_4$ taken from $\{0, \pi/2, \pi, 3\pi/2\}$ (M=8).
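As an illustration of how four phases form an 8-chip codeword, the sketch below assumes the standard IEEE 802.11b CCK chip mapping; the thesis only cites [2] and does not spell the mapping out, so the formula is an assumption here. Enumerating all phase combinations gives 256 codewords, of which 64 have a zero first phase, and the remaining codewords are rotations of those 64.

```python
import cmath
import itertools

PHASES = [0.0, cmath.pi / 2, cmath.pi, 3 * cmath.pi / 2]

def cck_codeword(p1, p2, p3, p4):
    """8-chip CCK codeword (assumed IEEE 802.11b mapping); chips 4 and 7 are negated."""
    e = lambda a: cmath.exp(1j * a)
    return [e(p1 + p2 + p3 + p4), e(p1 + p3 + p4), e(p1 + p2 + p4), -e(p1 + p4),
            e(p1 + p2 + p3), e(p1 + p3), -e(p1 + p2), e(p1)]

all_codes = [cck_codeword(*p) for p in itertools.product(PHASES, repeat=4)]
zero_phase_codes = [cck_codeword(0.0, *p) for p in itertools.product(PHASES, repeat=3)]

# Rotating the first phase by pi/2 multiplies every chip by i.
base = cck_codeword(0.0, PHASES[1], PHASES[2], PHASES[3])
rotated = cck_codeword(PHASES[1], PHASES[1], PHASES[2], PHASES[3])
```

This rotation property is what allows a receiver to store references only for the zero-phase subset and derive the other three phases from them.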

Figure 3-2: Power delay profiles for IEEE and JTC channel models. (a) and (b) two random cases of IEEE model (RMS delay spread is 100 ns). (c) JTC indoor office B model. (d) JTC indoor residential C model.

Figure 3-3 plots the PER of the linear (8, 4) code in both JTC and IEEE 802.11 fading channels. Compared with the AWGN case, the SNR losses are around 3.6 dB to 5.6 dB, depending on the fading environment. In Fig. 3-4, the proposed scheme shows improved performance compared with some time-domain approaches [78]-[79]. The simulation results for the JTC indoor office A channel model indicate that the proposed scheme performs better than the previous study [78], because the nonlinear equalization employs sphere decoding, which searches for the minimum Euclidean distance over the transmitted symbols. In the case of JTC indoor residential B, a CFO of 50 ppm with 1-ppm and 2-ppm residual errors caused by automatic frequency control (AFC) is introduced into the system. In contrast with the ICI equalizer [79], this proposal also yields an improvement of 20 dB, indicating that residual CFOs do not cause significant performance degradation. For real-time measurements before VLSI implementation, XilinxDSP Development Kits with on-board 14-bit A/Ds, 14-bit D/As and an FPGA (2 million gates, 50 MHz) are connected to an in-house 2.4-GHz 2x2 RF module with 20-MHz bandwidth to transmit and record real wireless packets, and the recorded signals are transferred to MATLAB via USB. The resulting software-defined radio (SDR) platform is shown in Fig. 3-5. The received EVM of QPSK is about -21 dB.

Figure 3-3: Simulation of the linear (8, 4) code in IEEE and JTC fading

Figure 3-4: Simulation of CCK in JTC fading: office A, residential C and office B with residual CFOs.

Figure 3-5: SDR platform for SCBT measurements.

3.4 Implementation and Complexity

Figures 3-6 and 3-7 show the block diagrams of the proposed FD-CE and the single-FFT SC-FDE with DF-AC, respectively. The key modules of the FD-CE are: (1) a preamble reconstruction with pattern recognition for pre-processing the linear convolution before the FFT; (2) a look-up table (LUT) for storing the ideal frequency responses of the preamble; (3) complex multipliers with conjugate output for calculating the linear convolution; and (4) two averagers for reducing AWGN effects. The single-FFT SC-FDE with DF-AC (Fig. 3-7) contains the following major building blocks: (1) a DF-AC to eliminate the FFT aliasing of non-CP symbols; (2) a sphere decoder to decode data over the frequency domain without noise enhancement; (3) an LUT to save the ideal frequency responses of M-bit block codes as decoding references; and (4) complex multipliers (shared with the FD-CE) to generate new references for sphere decoding.

Figure 3-6: Block diagram of the proposed FD-CE.

Figure 3-7: Block diagram of the single-FFT SC-FDE with DF-AC

3.4.1 Sphere Decoder with SCBT Decoding

The decoding problems in both modes (SCBT and MIMO-OFDM) can be formulated as a maximum-likelihood (ML) search, which can be solved efficiently by the SD algorithm. In terms of tree traversal behavior, the classical SD algorithm is a depth-first branch-and-bound (B&B) algorithm that approaches the ML solution in two stages: preprocessing by QR decomposition, and tree search. In contrast with the classical SD, three parts of our SD differ when SCBT decoding is performed: the preprocessing, the metric computation and the tree structure. The main overhead of such an SD is the control unit that handles all data paths in the implementation. The detailed procedures of the SD for MIMO detection and SCBT decoding are as follows:

Symbol definitions: $x = (x_1, x_2, \ldots, x_{M_T})$ is the $M_T$-dimensional transmitted signal vector.

a) Depth-first SD with MIMO detection

1) Pre-compute the QR decomposition of H by QR operations (H = QR);

2) For each data sub-carrier from each antenna:

i. Compute $\tilde{y}$ ($\tilde{y} = Q^H y$) by complex multiplier banks;

ii. Perform the depth-first tree search by SD engines [47], which search for the solution of $\hat{x} = \arg\min_x \| \tilde{y} - B \|^2$ with $B = Rx$. The initial radius of the sphere decoder is infinite, and the metric computation at level i is $|\tilde{y}_i - B_i|^2$.

b) Depth-first SD with SCBT decoding

1) Pre-compute the body term ($F_N(H_L d_j)_{ze}$) and the aliasing term ($F_N(H_U d_j)_{ze}$) by complex multiplier banks;

2) Compute $y$ ($y = F_N(r_j)_{ze} - F_N(H_U \hat{s}_{j-1})_{ze}$) by point-to-point subtraction;

3) Perform the depth-first tree search by SD engines [47], which search for the solution of $\hat{s}_j = \arg\min_{d_j} \| y - B \|^2$ ($B = F_N(H_L d_j)_{ze}$ is pre-calculated in the pre-processing step). The initial radius of the sphere decoder is infinite, and the metric computation at level i is

$$ |y_i - B_i|^2 \tag{3.17} $$

where $B_i$ is the i-th element of the signal vector B.
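Procedure (a) can be sketched for a small system as follows. This is a plain depth-first search with radius update and no child ordering, not the SD engine of [47]; the upper-triangular matrix `R`, the perturbation added to `y` and the QPSK alphabet are hypothetical, and the QR step is assumed to have been done already. A brute-force search is included as a cross-check.

```python
import itertools

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def sphere_decode(R, y, alphabet):
    """Depth-first search for x minimizing ||y - R x||^2, with R upper triangular."""
    n = len(y)
    best = {"radius": float("inf"), "x": None}
    x = [0] * n

    def search(level, dist):
        for s in alphabet:
            x[level] = s
            b = sum(R[level][k] * x[k] for k in range(level, n))   # B_i for this branch
            d = dist + abs(y[level] - b) ** 2                      # partial Euclidean distance
            if d >= best["radius"]:
                continue                     # dead end: prune this branch
            if level == 0:
                best["radius"], best["x"] = d, list(x)   # leaf reached: shrink the radius
            else:
                search(level - 1, d)

    search(n - 1, 0.0)       # start from the last row of R, initial radius infinite
    return best["x"], best["radius"]

def brute_force(R, y, alphabet):
    n = len(y)
    def cost(x):
        return sum(abs(y[i] - sum(R[i][k] * x[k] for k in range(i, n))) ** 2
                   for i in range(n))
    return min(itertools.product(alphabet, repeat=n), key=cost)

R = [[2.0, 0.5 + 0.5j, 0.3],
     [0.0, 1.5, -0.4j],
     [0.0, 0.0, 1.2]]                       # hypothetical R from a QR decomposition
x_true = [1 - 1j, -1 + 1j, 1 + 1j]
noise = [0.1, -0.05j, 0.08]
y = [sum(R[i][k] * x_true[k] for k in range(i, 3)) + noise[i] for i in range(3)]

x_hat, radius = sphere_decode(R, y, QPSK)
```

The pruning test never discards the true minimum, because partial distances only grow along a path, so the result matches the exhaustive search.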

The preprocessing in SCBT mode calculates the body term ($F_N(H_L d_j)_{ze}$) and the aliasing term ($F_N(H_U d_j)_{ze}$), while the preprocessing in MIMO mode is the QR decomposition. Both can be performed by the shared multiplier banks. B represents the multiplication result of two matrices in step a.2.ii, whereas it is a frequency-domain signal vector in step b.3, and the metric computation of Equation (3.17) only needs complex subtractions. Hence, a classical SD ALU [51] for the metric computation in MIMO-OFDM mode can be applied to SCBT decoding directly.

The tree structure, e.g., a parent node branching to child nodes, is also different in the two modes, as plotted in Figs. 3-8(a) and 3-8(b). For the typical SD search tree in Nq-QAM MIMO detection, Fig. 3-8(a) shows a tree structure in which each parent node may extend to Q possible child nodes. Figure 3-8(b) displays the tree structure of SCBT decoding, where each parent node can extend to only one child node, except for the root node.

When a leaf node is reached, the upper bound (radius) is updated. If the partial Euclidean distance of the current node is larger than the upper bound, a dead end is declared, and the next start node of the backward recursion is always one of the child nodes of the root. The SD search engine does not require special extensions because the tree pruning of SCBT decoding still follows the depth-first B&B paradigm. Hence, the main overhead of such an SD is the design of a control unit to handle all data flows.
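The SCBT-mode traversal can be sketched as below: each child of the root corresponds to one codeword, every deeper node has a single child, and a path is abandoned as soon as its partial Euclidean distance reaches the current radius. The reference vectors and the received vector are hypothetical real-valued stand-ins for the frequency-domain body terms.

```python
def scbt_tree_search(y, references):
    """Depth-first search over single-child paths, one path per candidate codeword."""
    radius, best, visited = float("inf"), None, 0
    for index, ref in enumerate(references):     # children of the root
        ped = 0.0
        for y_i, b_i in zip(y, ref):             # one child per level below the root
            ped += abs(y_i - b_i) ** 2           # metric |y_i - B_i|^2 at this level
            visited += 1
            if ped >= radius:
                break                            # dead end: return to the root
        else:
            radius, best = ped, index            # leaf reached: update the radius
    return best, radius, visited

references = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
y = [0.9, 1.1, -1.0, -0.8]
best, radius, visited = scbt_tree_search(y, references)
```

Once a good candidate has been found, later paths are typically cut after a few levels, so fewer nodes are visited than in a full evaluation of every codeword.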

Figure 3-8: Tree structures of SD search in MIMO-OFDM and DSSS-CCK modes. (a) DSSS-CCK search tree (depth = 32). (b) MIMO-OFDM search tree for 4x4 16-QAM (depth = 4).

3.4.2 Detail VLSI Architecture

Figure 3-9 displays the detailed architecture of a 4x4 MIMO-OFDM modem with the proposed SC-FDE, supporting two kinds of packet formats in WiFi systems: (1) DSSS-CCK and (2) MIMO-OFDM. The MIMO-OFDM modem was implemented in a hardware description language (HDL) and mapped onto the FPGA of a XilinxDSP Development Kit. In Fig. 3-9, one of the four FFTs, based on a variable-length FFT architecture [80], supports 32 and 64 points and transfers the reconstructed data to the frequency domain. The preamble reconstruction extracts both the body and ISI terms of the Barker code to reconstruct the linear convolution before the FFT. By multiplying the frequency response of the reconstructed data by the reciprocal of the frequency response of an ideal preamble (Barker code; pre-stored in ROM1), the channel frequency response (CFR) is extracted. Once the CFR is obtained, the ALU uses the four complex multiplier banks to calculate $D_H F_N(d_j)_{ze}$ and $F_N(H_L d_j)_{ze} = G D_H F_N(d_j)_{ze}$ for all $d_j$, where $d_j$ is the frequency response of an ideal CCK code and $G$ is the matrix operator (both saved in ROM2). For all CCK codes, the outputs $D_H F_N(d_j)_{ze}$ are stored in SRAM1, and the body terms ($F_N(H_L d_j)_{ze}$) and aliasing terms ($F_N(H_U d_j)_{ze}$) are stored in SRAM2. If the received signals are data symbols, the CE flag is set to 0. Then, the aliasing term selected by the feedback code index in SRAM2 is subtracted from the received signals to obtain their body term. The sphere decoder with four SD engines [51] decodes the body term of the received signals and feeds back the decision for the next decoding.

In the MIMO-OFDM mode, two long preambles are received to estimate the CFR matrices by the four (parallel) channel estimators, and the matrices are then decomposed via the QR operations (H = QR). The CE flag is set to 0 if the received signals are MIMO-OFDM symbols Y. The pre-processing of Y ($Q^H Y$) is performed using the complex multiplier banks and is then recorded in SRAM1. Finally, the decoding is done via the same SD engines with $Q^H Y$ and R.

The symbol durations of each Barker code and CCK code are 1000 ns and 720 ns, respectively. In Fig. 3-9, the bottlenecks of our data path are the computations of $D_H F_N(d_j)_{ze}$ and $F_N(H_L d_j)_{ze}$. Basically, 32*256 and 32*32*256 complex multiplications are needed to compute $D_H F_N(d_j)_{ze}$ and $F_N(H_L d_j)_{ze}$, respectively. To reduce the decoding complexity, only the code set with zero phase is calculated, so that only 32*64 and 32*32*64 complex multiplications are required. The other three phases are handled by a complex multiplier with an additional conjugate output (Fig. 3-10), which is created for this work. Taking a matrix multiplication $GV$ (V is an N-dimensional vector), the element $g_{i,j}$ at row $i$ and the element $g_{(N-i)_N, j}$ at row $(N-i)_N$ multiply the same element in V, and the computation rule is depicted in Fig. 3-11. As a result, the computing complexity of $F_N(H_L d_j)_{ze}$ is reduced to 9*32 complex multiplications, and the number of required complex multiplications is 9*32*64 for all CCK codes. In our implementation, the matrix multiplications of $D_H F_N(d_j)_{ze}$ and $F_N(H_L d_j)_{ze}$ are performed via four complex multiplier banks (8 complex multipliers per bank). If the four complex multiplier banks process 32 complex multiplications in parallel at 50 MHz, the processing time of $D_H F_N(d_j)_{ze}$ and $F_N(H_L d_j)_{ze}$ is 12.8 us, i.e., 640 cycles or 13 Barker codes. Because the AGC and synchronization consume 16 Barker codes in our design (AGC: 2 Barker codes, timing recovery: 10 Barker codes and phase recovery: 3~4 Barker codes), the number of preambles available for preamble reconstruction is 27 (=56-16-13).

All storage units (Fig. 3-9) represent elements with a word length of 10 bits. ROM1 is 80 bytes and stores the reciprocal of the frequency response of an ideal Barker code. ROM2 is 7680 bytes, of which 5120 and 2560 bytes record the frequency responses of the 64 ideal CCK codes with zero phase and the separation operator G, respectively. ROM3 is 80 bytes and stores the reciprocal of the long preamble. After $D_H F_N(d_j)_{ze}$ and the body/aliasing terms of all CCK codes are calculated, the outputs are saved in SRAM1 (5120 bytes) and SRAM2 (10240 bytes), respectively.

Due to the decoding latency of MIMO detection, the clock rates of the SD or K-best SD are up to 200 MHz to support 4x4 64-QAM MIMO-OFDM transmission. For WiFi systems with 4x4 MIMO-OFDM and 64-QAM, four SD engines are required to decode data in parallel at 150 MHz, whereas one SD engine with a 150-MHz clock rate can meet the decoding latency of <720 ns in DSSS-CCK mode (because the CCK symbol duration is 720 ns). CCK codes can be partitioned into four sub-sets and decoded using four parallel SD engines, as shown in Fig. 3-12. The parallel decoding procedure decreases the decoding latency to 40 cycles (or allows the clock rate to be slowed down to 100 MHz). Hence, the clock rate and the number of SD engines are dominated by MIMO detection. The detailed descriptions of the SD engines are given in the following part.
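The timing and storage figures quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that each stored element is a complex value with a 10-bit real part and a 10-bit imaginary part and that all vectors have 32 points; these assumptions are not stated explicitly in the text, but they reproduce the quoted byte counts.

```python
import math

POINTS = 32                  # frequency points per code
ZERO_PHASE_CODES = 64        # CCK codes with zero first phase
MULTS_PER_CYCLE = 4 * 8      # four banks, 8 complex multipliers each
CLOCK_MHZ = 50
BARKER_US = 1.0              # Barker symbol duration (1000 ns)
WORD_BITS = 10

mults_dh = POINTS * ZERO_PHASE_CODES           # D_H F_N(d_j)_ze for all codes
mults_body = 9 * POINTS * ZERO_PHASE_CODES     # F_N(H_L d_j)_ze after the G simplification
cycles = (mults_dh + mults_body) // MULTS_PER_CYCLE
time_us = cycles / CLOCK_MHZ
barker_codes = math.ceil(time_us / BARKER_US)
available_preambles = 56 - 16 - barker_codes

def storage_bytes(complex_words):
    """Bytes needed for complex words with WORD_BITS per real/imaginary part (assumed)."""
    return complex_words * 2 * WORD_BITS // 8

rom1 = storage_bytes(POINTS)                         # reciprocal Barker response
rom2_codes = storage_bytes(ZERO_PHASE_CODES * POINTS)
rom2_g = storage_bytes(POINTS * POINTS)              # separation operator G
sram1 = storage_bytes(ZERO_PHASE_CODES * POINTS)
sram2 = 2 * storage_bytes(ZERO_PHASE_CODES * POINTS) # body and aliasing terms
```

Under these assumptions the script yields 640 cycles, 12.8 us, 13 Barker codes, 27 remaining preambles, and the ROM and SRAM sizes listed in the text.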

Figure 3-9: Detail architecture and complexity of a 4x4 MIMO-OFDM modem with the proposed SC-FDE (Mode 0: SCBT data; Mode 1: MIMO-OFDM data).

Figure 3-10: Complex multiplier with an additional conjugation output, where $Y = Y_r + jY_i = (A_r + jA_i)(B_r + jB_i)$ and $X = X_r + jX_i = (A_r + jA_i)(B_r + jB_i)^*$.
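The benefit of the extra conjugate output can be seen from the real arithmetic involved: the product and the conjugate product share the same four real multiplications and differ only in the signs used when the partial products are combined. The function below is an illustrative software model, not the hardware description.

```python
def multiply_with_conjugate(a, b):
    """Return (a*b, a*conj(b)) from four shared real multiplications."""
    ar, ai, br, bi = a.real, a.imag, b.real, b.imag
    p_rr, p_ii, p_ri, p_ir = ar * br, ai * bi, ar * bi, ai * br
    y = complex(p_rr - p_ii, p_ri + p_ir)    # Y = A * B
    x = complex(p_rr + p_ii, p_ir - p_ri)    # X = A * conj(B)
    return y, x

a, b = 0.5 - 1.25j, 2.0 + 0.75j              # arbitrary example operands
y_out, x_out = multiply_with_conjugate(a, b)
```

Only the final adders differ between the two outputs, so the second result costs no additional multipliers.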

Figure 3-11: The rule of signal multiplications of G (dark squares for the elements being currently computed, and light squares for the elements already calculated).

Figure 3-12: CCK mapping on four sub-sets for parallel SD searching
