
Digital Signal Processor Software Implementation of LMMSE Channel Estimation Method for IEEE

2. INTRODUCTION TO IEEE 802.16m OFDMA

1. Basic OFDMA Symbol Structure in IEEE 802.16m [10]

The Advanced Air Interface uses OFDMA as the multiple access scheme in the downlink. The material in this section is taken from [10].

TABLE I

PRU STRUCTURE FOR DIFFERENT TYPES OF SUBFRAMES

Subframe Type    Number of Subcarriers    Number of Symbols
Type-1           18                       6
Type-2           18                       7
Type-3           18                       5

1) OFDMA Basic Terms: We introduce some basic terms that appear in the OFDMA physical layer (PHY) of IEEE 802.16m. These definitions help us understand the concepts of subcarrier allocation and transmission in IEEE 802.16m OFDMA.

• Physical and logical resource unit: A physical resource unit (PRU) is the basic physical unit for resource allocation. It comprises Psc consecutive subcarriers by Nsym consecutive OFDMA symbols. Psc is 18 subcarriers, and Nsym is 6 OFDMA symbols for type-1 subframes, 7 OFDMA symbols for type-2 subframes, and 5 OFDMA symbols for type-3 subframes. A logical resource unit (LRU) is the basic logical unit for distributed and localized resource allocations. An LRU is Psc · Nsym subcarriers for type-1, type-2, and type-3 subframes. The LRU includes the pilots that are used in a PRU. The effective number of subcarriers in an LRU depends on the number of allocated pilots.

• Distributed resource unit: A distributed resource unit (DRU) contains a group of subcarriers which are spread across the distributed resource allocations within a frequency partition. The size of DRU equals the size of PRU, i.e., Psc subcarriers by Nsym OFDMA symbols.

• Contiguous resource unit: The localized resource unit, also known as contiguous resource unit (CRU), contains a group of subcarriers which are contiguous across the localized resource allocations. The size of CRU equals the size of PRU, i.e., Psc subcarriers by Nsym OFDMA symbols.

2) Frame Structure: The advanced air interface basic frame structure is illustrated in Fig. 1. Each 20-ms superframe is divided into four 5-ms radio frames. When using the OFDMA parameters in Tables II and III with a channel bandwidth of 5, 10, or 20 MHz, each 5-ms radio frame consists of eight subframes for a CP ratio of G = 1/8 or 1/16. With a channel bandwidth of 8.75 or 7 MHz, each 5-ms radio frame consists of seven or six subframes, respectively, for G = 1/8 and 1/16. In the case of G = 1/4, the number of subframes per frame is one less than that for the other CP lengths in each bandwidth case. A subframe shall be assigned for either downlink (DL) or uplink (UL) transmission. There are four types of subframes:

• Type-1 subframe consists of six OFDMA symbols.

• Type-2 subframe consists of seven OFDMA symbols.

• Type-3 subframe consists of five OFDMA symbols.

• Type-4 subframe consists of nine OFDMA symbols. This type shall be applied only to UL subframes for the 8.75 MHz channel bandwidth when supporting the WirelessMAN-OFDMA frames.
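The subframe-count rules above can be summarized in a short lookup. The following Python sketch is only an illustration of the rules as stated in this section; the function name `subframes_per_frame` and the string encoding of the CP ratio G are assumptions made here, not part of [10].

```python
# Subframes per 5-ms radio frame for G = 1/8 or 1/16, keyed by bandwidth in MHz,
# as stated in the text above.
BASE_SUBFRAMES = {5.0: 8, 10.0: 8, 20.0: 8, 8.75: 7, 7.0: 6}


def subframes_per_frame(bandwidth_mhz, cp_ratio):
    """Return the number of subframes in one 5-ms radio frame.

    bandwidth_mhz: channel bandwidth in MHz (5, 7, 8.75, 10, or 20).
    cp_ratio: CP ratio G given as the string "1/4", "1/8", or "1/16"
              (hypothetical encoding chosen for this sketch).
    """
    if bandwidth_mhz not in BASE_SUBFRAMES:
        raise ValueError("unsupported channel bandwidth")
    if cp_ratio not in ("1/4", "1/8", "1/16"):
        raise ValueError("unsupported CP ratio")
    base = BASE_SUBFRAMES[bandwidth_mhz]
    # For G = 1/4 the frame holds one subframe fewer than for the other CP lengths.
    return base - 1 if cp_ratio == "1/4" else base
```

For example, `subframes_per_frame(10, "1/8")` evaluates the 10 MHz case with G = 1/8, and `subframes_per_frame(8.75, "1/4")` the 8.75 MHz case with the longest CP.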

Fig. 1. Frame structure for 5, 10 and 20 MHz modes (Fig. 484 in [10]).

TABLE II

OFDMA PARAMETERS (TABLE 794 IN [10])

TABLE III

ADDITIONAL OFDMA PARAMETERS (TABLE 795 IN [10])

The basic frame structure is applied to FDD and TDD duplexing schemes, including H-FDD MS operation. The number of switching points in each radio frame in TDD systems shall be two, where a switching point is defined as a change of directionality, i.e., from DL to UL or from UL to DL.

2. Downlink Transmission in IEEE 802.16m OFDMA [10]

Again this section is mainly taken from [10]. Each DL subframe is divided into 4 or fewer frequency partitions; each partition consists of a set of physical resource units across the total number of OFDMA symbols available in the subframe.

Each frequency partition can include contiguous (localized) and/or non-contiguous (distributed) physical resource units.

Each frequency partition can be used for different purposes such as fractional frequency reuse (FFR) or multicast and broadcast services (MBS).

1) Subband Partitioning: The PRUs are first subdivided into subbands and minibands, where a subband comprises N1 adjacent PRUs and a miniband comprises N2 adjacent PRUs, with N1 = 4 and N2 = 1. Subbands are suitable for frequency-selective allocations as they provide a contiguous allocation of PRUs in frequency. Minibands are suitable for frequency-diverse allocation and are permuted in frequency.

2) Miniband Partitioning: The miniband permutation maps the PRU_MBs to permuted PRU_MBs (PPRU_MBs) to ensure that frequency-diverse PRUs are allocated to each frequency partition.

3) Frequency Partitioning: The PRU_SBs and PPRU_MBs are allocated to one or more frequency partitions.

3. Cell-Specific Resource Mapping [10]

The content of this section is mainly taken from [10].

The PRU_FPi's are mapped to LRUs. All further PRU and subcarrier permutations are constrained to the PRUs of a frequency partition.

1) CRU/DRU Allocation: The partition between CRUs and DRUs is done on a sector-specific basis. A 4-bit or 3-bit downlink subband-based CRU allocation size (DCAS_SBi) field is sent in the SFH for each allocated frequency partition. DCAS_SBi indicates the number of allocated CRUs for partition FPi in units of the subband size.

2) Subcarrier Permutation: The subcarrier permutation defined for the DL distributed resource allocations within a frequency partition spreads the subcarriers of the DRU across the whole distributed resource allocations. The granularity of the subcarrier permutation is equal to a pair of subcarriers.

After mapping all pilots, the remaining used subcarriers are used to define the distributed LRUs. To allocate the LRUs, the remaining subcarriers are paired into contiguous tone-pairs. Each LRU consists of a group of tone-pairs.

4. Pilot Structure

1) Pilot patterns: Pilot patterns are specified within a PRU. Base pilot patterns used for DL data transmission with one data stream in dedicated and common pilot scenarios are shown in Figure 3, with the subcarrier index increasing from top to bottom and the OFDM symbol index increasing from left to right. Figs. 3(a) and 3(b) show the pilot locations for stream sets 0 and 1, respectively. The base pilot patterns used for two DL data streams in dedicated and common pilot scenarios are shown in Figure 4, where the subcarriers are indexed as in Fig. 3. Figs. 4(a) and 4(b) show the pilot locations for pilot streams 1 and 2 in a PRU, respectively. The number on a pilot subcarrier indicates the pilot stream to which that pilot subcarrier corresponds. The subcarriers marked as "X" are null subcarriers, on which no pilot or data is transmitted. In this thesis, we use the pilot structure for two DL data streams.

Fig. 2. Frequency partition for BW = 10 MHz, K_SB = 7, FPCT = 4, FPS_0 = FPS_i = 12, DFPSC = 2, DCAS_SB,0 = 1, DCAS_MB,0 = 1, DCAS_i = 2, and IDcell = 0 (Fig. 503 in [10]).

Fig. 3. Pilot patterns used for one DL data stream (Fig. 505 in [10]).

Fig. 4. Pilot patterns used for two DL data streams (Fig. 506 in [10]).

Fig. 5. PRBS generator for pilot modulation (Figure 584 in [10]).

2) Pilot Modulation: Pilot subcarriers are inserted into each data burst in order to constitute the symbol. The PRBS (pseudo-random binary sequence) generator depicted in Fig. 5 is used to produce a sequence w_k. Each pilot is transmitted with a boosting of 2.5 dB over the average non-boosted power of each data tone. The pilot subcarriers are modulated according to

$$\Re\{c_k\} = \frac{8}{3}\left(\frac{1}{2} - w_k\right), \qquad \Im\{c_k\} = 0. \tag{4}$$
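As a quick check of (4), the sketch below maps a PRBS bit to its pilot value and verifies that the resulting amplitude of 4/3 corresponds to a boost of about 2.5 dB. It assumes that data tones have unit average power; the function names and the example bit pattern are illustrative only and do not come from the PRBS generator of Fig. 5.

```python
import math


def pilot_value(w_k):
    """Map one PRBS bit w_k (0 or 1) to the pilot value c_k of Eq. (4)."""
    if w_k not in (0, 1):
        raise ValueError("w_k must be 0 or 1")
    # Real part is (8/3)(1/2 - w_k); imaginary part is zero.
    return complex((8.0 / 3.0) * (0.5 - w_k), 0.0)


def boost_db(amplitude):
    """Power of a tone with the given amplitude in dB, relative to unit power."""
    return 20.0 * math.log10(abs(amplitude))


# Example bits chosen by hand; they are not real PRBS output.
pilots = [pilot_value(w) for w in (0, 1, 1, 0)]
```

With w_k = 0 the pilot is +4/3 and with w_k = 1 it is -4/3, so the pilot is a BPSK symbol whose power is (4/3)^2 times that of a unit-power data tone.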

5. Downlink MIMO Architecture and Data Processing

The architecture of downlink MIMO at the transmitter side is shown in Figure 6. The MIMO encoder block maps L MIMO layers (L ≥ 1) onto M_t MIMO streams (M_t ≥ L), which are fed to the precoder block. A MIMO layer is an information path fed to the MIMO encoder as an input. A MIMO layer represents one channel coding block. For the spatial multiplexing modes in single-user MIMO (SU-MIMO), "rank" is defined as the number of MIMO streams to be used for the user allocated to the Resource Unit (RU). For SU-MIMO, only one user is scheduled in one RU, and only one channel coding block exists at the input of the MIMO encoder (vertical MIMO encoding at the transmit side). For MU-MIMO, multiple users can be scheduled in one RU, and multiple channel coding blocks exist at the input of the MIMO encoder. The existence of multiple channel coding blocks at the input of the MIMO encoder can be caused by using horizontal encoding, by using vertical encoding in several MIMO layers, or by using a combination of vertical and horizontal encoding in several MIMO layers at the transmit side. Using multiple MIMO layers is called multi-layer encoding.

Fig. 6. DL MIMO (Figure 547 in [10]).

6. MIMO Layer to MIMO Stream Mapping

MIMO layer to MIMO stream mapping is performed by the MIMO encoder. The MIMO encoder is a batch processor that operates on M input symbols at a time. The input to the MIMO encoder is represented by an M × 1 vector as

$$\mathbf{s} = \begin{bmatrix} s_1 \\ \vdots \\ s_M \end{bmatrix} \tag{5}$$

where s_i is the ith input symbol within a batch. In the case of MU-MIMO transmissions, the M symbols belong to different MSs. Two consecutive symbols may belong to a single MIMO layer. One MS shall have at most one MIMO layer.

MIMO layer to MIMO stream mapping of the input symbols is done in the space dimension first. The output of the MIMO encoder is an M_t × N_F MIMO STC matrix as

X = S(s) (6)

where

• M_t is the number of MIMO streams,

• N_F is the number of subcarriers occupied by one MIMO block,

• X is the output of the MIMO encoder,

• s is the input MIMO layer vector,

• S() is a function that maps an input MIMO layer vector to an STC matrix, and

• S(s) is an STC matrix.

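The STC matrix X can be expressed as

$$\mathbf{X} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,N_F} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,N_F} \\ \vdots & \vdots & \ddots & \vdots \\ x_{M_t,1} & x_{M_t,2} & \cdots & x_{M_t,N_F} \end{bmatrix}. \tag{7}$$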

The four MIMO encoder formats (MEF) are space frequency block code (SFBC), vertical encoding (VE), multi-layer encoding (ME), and conjugate data repetition (CDR). In this paper, we use only the SFBC MIMO encoder for implementation, so we discuss only the relevant SFBC technique. For SU-MIMO transmissions, the STC rate is defined as

$$R = \frac{M}{N_F}. \tag{8}$$

For MU-MIMO transmissions, the STC rate per user (R) is equal to 1 or 2.

1) SFBC Encoding: The input to the MIMO encoder is represented by a 2 × 1 vector

$$\mathbf{s} = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}. \tag{9}$$

The MIMO encoder generates the SFBC matrix

$$\mathbf{X} = \begin{bmatrix} s_1 & -s_2^{*} \\ s_2 & s_1^{*} \end{bmatrix} \tag{10}$$

where X is a 2 × 2 matrix. The SFBC matrix X occupies two consecutive subcarriers.

2) Signal Reception for SFBC Encoding: In this paper, we use the SFBC to MIMO stream mapping. Figure 7 shows the 2 × 1 SFBC structure in our system, where s_1 and s_2 are the transmitted signals, P_1 and P_2 are pilots, and the channel response is indexed by antenna i and subcarrier j. We encode and decode over a pair of subcarriers and use the least-squares method to decode the received signal. For simplicity we use the linear model

$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n},$$

where r_i is the received signal corresponding to vector r, H is the channel response matrix, s_i is the transmitted signal corresponding to vector s, and ŝ_i is the estimate of s_i.
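To make the 2 × 1 SFBC encode and decode steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in this work: the indexing `h[i][j]` (antenna i, subcarrier j), the function names, and the choice to conjugate the second received sample so that both equations become linear in s_1 and s_2 are all assumptions made here. Because the resulting 2 × 2 system is square, its least-squares solution coincides with the direct solution, which the sketch computes by Cramer's rule.

```python
def sfbc_encode(s1, s2):
    """Return the 2x2 SFBC matrix of Eq. (10): rows are streams, columns are subcarriers."""
    return [[s1, -s2.conjugate()],
            [s2, s1.conjugate()]]


def sfbc_receive(X, h, noise=(0j, 0j)):
    """Single-antenna reception; h[i][j] is the response of antenna i at subcarrier j."""
    r1 = h[0][0] * X[0][0] + h[1][0] * X[1][0] + noise[0]
    r2 = h[0][1] * X[0][1] + h[1][1] * X[1][1] + noise[1]
    return r1, r2


def sfbc_decode(r1, r2, h):
    """Solve [r1, conj(r2)]^T = H [s1, s2]^T for (s1, s2) in the 2x2 case."""
    # conj(r2) = conj(h[1][1]) * s1 - conj(h[0][1]) * s2, which is linear in s1, s2.
    a, b = h[0][0], h[1][0]
    c, d = h[1][1].conjugate(), -h[0][1].conjugate()
    y1, y2 = r1, r2.conjugate()
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("channel matrix is singular")
    return (y1 * d - b * y2) / det, (a * y2 - c * y1) / det
```

In a noiseless test, decoding the output of `sfbc_receive(sfbc_encode(s1, s2), h)` with the same `h` returns the transmitted pair up to rounding error.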

3. BASIC CHANNEL ESTIMATION METHODS

1. Least-Squares (LS) Estimator

Based on the a priori known data, we can roughly estimate the channel responses at the pilot carriers by the least-squares (LS) technique. An LS estimator minimizes the squared error [11]

$$\|\mathbf{y} - \mathbf{H}_{LS}\mathbf{x}\|^2 \tag{14}$$

where y is the received signal and x is the a priori known pilots, both in the frequency domain and both N × 1 vectors, where N is the FFT size. H_LS is an N × N matrix whose

Fig. 7. SFBC structure.

Therefore, (14) can be rewritten as

$$[y(m) - x(m)h_{LS}(m)]^2, \quad \text{for all } m = m_i. \tag{16}$$

Then the channel estimate at the pilot subcarriers, based on only one observed OFDMA symbol, is given by

$$h_{LS}(m) = \frac{y(m)}{x(m)} = \frac{x(m)h(m) + n(m)}{x(m)} = h(m) + \frac{n(m)}{x(m)} \tag{17}$$

where n(m) is the complex white Gaussian noise on subcarrier m. We may collect h_LS(m) into h_p,LS, an N_p × 1 vector as

where h_p,LS(i) is the LS channel estimate at the ith pilot, y_p(i) is the received signal at the ith pilot, and x_p(i) is the transmitted signal at the ith pilot. The LS estimator is the simplest channel estimator one can think of.
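The per-pilot division in (17) can be sketched in a few lines of Python. The function name `ls_estimate` and the list-based interface are assumptions made for illustration.

```python
def ls_estimate(y_p, x_p):
    """Per-pilot LS channel estimate: h_LS(i) = y_p(i) / x_p(i), as in Eq. (17)."""
    if len(y_p) != len(x_p):
        raise ValueError("y_p and x_p must have the same length")
    return [y / x for y, x in zip(y_p, x_p)]


# Noiseless example: the received pilots are the channel times the known pilots.
h_true = [0.9 + 0.2j, 0.7 - 0.1j, 0.5 + 0.4j]
x_p = [4.0 / 3.0, -4.0 / 3.0, 4.0 / 3.0]
y_p = [h * x for h, x in zip(h_true, x_p)]
h_ls = ls_estimate(y_p, x_p)
```

Without noise the estimate reproduces the true channel; with noise, each estimate carries the additional term n(m)/x(m) from (17).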

2. Linear Interpolation

After obtaining the channel response estimates at the pilot subcarriers, we use linear interpolation to obtain the responses at some other subcarriers. Exactly where we apply it will be discussed later. Linear interpolation is a commonly considered scheme due to its low complexity. It interpolates between two known data points. We use the channel estimates at two pilot subcarriers obtained by the LS estimator to estimate the channel frequency response at the pilot subcarriers between them.

Linear interpolation may be done in frequency or in time.

We only perform it in time. The channel estimate at the other pilot subcarrier for time index k, mL < k < (m + 1)L, using linear interpolation is given by [12]

$$h_e(k) = h_e(mL + l) = [h_p(m+1) - h_p(m)]\frac{l}{L} + h_p(m) \tag{19}$$

where h_p(k), k = 0, 1, ..., N_p, are the channel frequency responses at pilot subcarriers, L is the pilot subcarrier spacing, and 0 < l < L.

Fig. 8. Linear interpolation.
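A minimal Python sketch of (19) follows. It fills in the positions between consecutive pilots on a uniform grid; the function name and the convention of returning the pilots themselves at l = 0 are choices made for this illustration.

```python
def interpolate_between_pilots(h_p, L):
    """Linearly interpolate between consecutive pilot estimates using Eq. (19).

    h_p: channel estimates at the pilots; L: pilot spacing (positive integer).
    Returns estimates on the grid 0, 1, ..., (len(h_p) - 1) * L.
    """
    if not h_p or L < 1:
        raise ValueError("need at least one pilot and a positive spacing")
    out = []
    for m in range(len(h_p) - 1):
        for l in range(L):
            # l = 0 reproduces the pilot itself; 0 < l < L are the interpolated points.
            out.append((h_p[m + 1] - h_p[m]) * l / L + h_p[m])
    out.append(h_p[-1])
    return out
```

For instance, two pilots with values 0 and 4+4j at spacing L = 4 yield the three intermediate values 1+1j, 2+2j, and 3+3j.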

3. LMMSE Channel Estimation

We have given a brief introduction to the method in chapter 1. Now we describe it in more detail. The material in this section is mainly taken from [1], [2].

1) Channel Modeling for Channel Estimation: Consider a discrete-time equivalent lowpass channel impulse response

$$h(n) = \sum_{l=0}^{L-1} \alpha_l \delta(n - l) \tag{20}$$

where n and l are integers in units of the sampling period T_s and α_l is the complex gain of path l. The mean delay and the RMS delay spread are given by, respectively,

$$\tau_\mu = \frac{\sum_{l=0}^{L-1} E(|\alpha_l|^2)\, l}{\sum_{l=0}^{L-1} E(|\alpha_l|^2)}, \tag{21}$$

$$\tau_{rms} = \sqrt{\frac{\sum_{l=0}^{L-1} E(|\alpha_l|^2)(l - \tau_\mu)^2}{\sum_{l=0}^{L-1} E(|\alpha_l|^2)}}. \tag{22}$$

One question here is how the expectation E(|α_l|²) should be defined. As our purpose is channel estimation, suppose one channel estimation is performed for K OFDM symbols. Then the expectation should be an average taken over these symbols; note that we may let K = 1. In addition, we assume that the channel estimator input contains no carrier frequency error, but the power-delay profile (PDP) can have a nonzero initial delay τ_0, although conventional definitions of the PDP usually zero out the initial path delay.

Fourier transforming the PDP gives the corresponding frequency autocorrelation function. For an exponential PDP with initial delay τ_0, we have

$$\frac{R_f(k)}{R_f(0)} = \frac{e^{-j2\pi\tau_0 k/N}}{1 + j2\pi\tau_{rms} k/N} \tag{23}$$

where τ_0 = τ_µ − τ_rms and N is the DFT size used in the multicarrier system. For a uniform PDP of width T with an initial delay τ_0,

2) Estimation of Channel Delay Parameters: The frequency response of the channel in (20) is given by

$$H(f) = \sum_{l=0}^{L-1} \alpha_l e^{-j2\pi l f/N} \tag{25}$$

where the division by N in the exponent normalizes the period of H(f) in f to N. If we advance the channel response by τ (arbitrary) time units, then the frequency response becomes

$$H_a(f) = e^{j2\pi\tau f/N} H(f) = \sum_{l=0}^{L-1} \alpha_l e^{-j2\pi(l-\tau)f/N}. \tag{26}$$

Differentiating H_a(f) with respect to f, we get

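$$\frac{dH_a(f)}{df} = \frac{-j2\pi}{N}\sum_{l=0}^{L-1} \alpha_l (l-\tau) e^{-j2\pi(l-\tau)f/N}. \tag{27}$$

Applying Parseval's theorem, we get

$$J(\tau) \triangleq \left\langle \left|\frac{dH_a(f)}{df}\right|^2 \right\rangle = \frac{4\pi^2}{N^2}\sum_{l=0}^{L-1} |\alpha_l|^2 (l-\tau)^2 \tag{28}$$

where ⟨·⟩ denotes frequency averaging. Hence

$$\bar{J}(\tau) \triangleq E\left(\left\langle \left|\frac{dH_a(f)}{df}\right|^2 \right\rangle\right) = \frac{4\pi^2}{N^2}\sum_{l=0}^{L-1} E(|\alpha_l|^2)(l-\tau)^2. \tag{29}$$

The above equations show that J̄(τ) is minimized when τ = τ_µ. In addition,

$$\tau_{rms}^2 = \frac{N^2 \min_\tau \bar{J}(\tau)}{4\pi^2 \sum_{l=0}^{L-1} E(|\alpha_l|^2)}. \tag{30}$$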

l=0 E(|αl|²). (30) For pilot-transmitting OFDM systems, the below given a way to find τµ and τrms from the frequency domain channel estimate.

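Consider a system where one out of every F_s subcarriers is a pilot. We can approximate dH_a(f)/df by the first-order difference [H_a(f + F_s) − H_a(f)]/F_s and substitute it into (29). Then, we obtain

$$\bar{J}(\tau) \approx \frac{1}{F_s^2} E\left\langle \left| e^{j\phi} H(f + F_s) - H(f) \right|^2 \right\rangle_p \tag{31}$$

where ϕ = 2πτF_s/N, f ranges over the pilot frequencies, and ⟨·⟩_p denotes averaging over pilot subcarriers.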

Then we modify the approximation by taking circular differencing over f rather than linear differencing. Therefore, we approximate J̄(τ) by

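$$\bar{J}(\tau) \approx \frac{1}{F_s^2} E\left\langle \left| e^{j\phi} H((f + F_s)\,\%\,N) - H(f) \right|^2 \right\rangle_p \tag{32}$$

where % denotes the modulo operation, and we have assumed that (f + F_s)%N is a pilot subcarrier. Now let R_i be the (instantaneous) frequency-domain autocorrelation of the channel response:

$$R_i = \left\langle H((f + iF_s)\,\%\,N)\, H^{*}(f) \right\rangle_p. \tag{33}$$

Then from (32) we have

$$\bar{J}(\tau) \approx \frac{2}{F_s^2}\left[E(R_0) - \Re\{e^{j\phi}E(R_1)\}\right]. \tag{34}$$

Thus (34) gives an approximation of J̄(τ) defined in (29).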

According to (34), τµ and τrms can be estimated in the following way:

1) estimate the channel responses at the pilot subcarriers,

2) estimate R_i (i = 0, 1),

3) estimate J̄(τ),

4) find the value of τ that minimizes J̄(τ), and

5) substitute the result into (30) to estimate τ_rms².

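Step 1 can be achieved using the LS method. For step 2, R_0 and R_1 can be estimated via

$$\hat{R}_0 = \left\langle |\hat{H}(f)|^2 \right\rangle_p - \hat{\sigma}_n^2, \qquad \hat{R}_1 = \left\langle \hat{H}((f + F_s)\,\%\,N)\, \hat{H}^{*}(f) \right\rangle_p$$

where Ĥ(f) is the LS channel estimate at pilot subcarrier f, and we assume that σ_n² has been estimated in some way, such as from the received power in the null subcarriers. Thus, for step 3, J̄(τ) can be estimated using

$$\hat{J}(\tau) = \frac{2}{F_s^2}\left[\mathrm{Av}(\hat{R}_0) - \Re\{e^{j\phi}\mathrm{Av}(\hat{R}_1)\}\right]$$

where Av denotes time averaging, i.e., averaging over OFDM symbols, say K of them. For step 4, we may estimate the mean delay as

$$\hat{\tau}_\mu = -\frac{N \angle \mathrm{Av}(\hat{R}_1)}{2\pi F_s}.$$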

3) LMMSE Filtering: To complete LMMSE channel estimation, the above estimates of the delay parameters, namely τ_µ and τ_rms, can be substituted into the proper places in (23) or (24), depending on the choice of PDP model. Then the resulting autocorrelation function of the channel frequency response can be used in LMMSE channel estimation as outlined in chapter 1.

4. Application to IEEE 802.16m

As mentioned previously, we employ the technique of [1], [2]. The LMMSE channel estimation for IEEE 802.16m proceeds as follows.

1) Do LS channel estimation at pilot subcarriers of all used PRUs.

2) Linearly interpolate in time to acquire two additional channel estimates per PRU per symbol, as shown by the arrows in Fig. 9.

3) Estimate R0 and R1 as explained below.

4) Estimate τµ and τrms as discussed below.

5) Find the autocorrelation function associated with the exponential PDP, as given in (23).

6) Based on the above autocorrelation function, do LMMSE filtering to estimate the data subcarrier responses as

$$\mathbf{w}_d = (\mathbf{R}_{pp} + \sigma_n^2 \mathbf{I})^{-1}\mathbf{r}_{dp}, \tag{40}$$

$$h_d = \mathbf{w}_d^{H}\mathbf{h}_p. \tag{41}$$
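Steps 5 and 6 can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions: the autocorrelation is the normalized exponential-PDP form of (23), so `noise_var` is taken relative to the channel power R_f(0); R_pp is built as R_f(p_i − p_j) and r_dp as R_f(p_i − d), which is one consistent reading of (40) and (41); and the small Gauss-Jordan solver stands in for whatever matrix inversion the actual implementation uses. All function names are hypothetical.

```python
import cmath
import math


def exp_pdp_corr(k, tau_mu, tau_rms, N):
    """Normalized frequency autocorrelation R_f(k)/R_f(0) of Eq. (23)."""
    tau0 = tau_mu - tau_rms
    return cmath.exp(-2j * math.pi * tau0 * k / N) / (1 + 2j * math.pi * tau_rms * k / N)


def solve(A, b):
    """Solve A x = b for a small complex system by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lmmse_estimate(h_p, pilot_idx, data_idx, tau_mu, tau_rms, N, noise_var):
    """Estimate the response at data_idx from pilot estimates h_p, per Eqs. (40)-(41)."""
    n = len(pilot_idx)
    R = [[exp_pdp_corr(pilot_idx[i] - pilot_idx[j], tau_mu, tau_rms, N)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    r_dp = [exp_pdp_corr(p - data_idx, tau_mu, tau_rms, N) for p in pilot_idx]
    w = solve(R, r_dp)                      # w_d = (R_pp + sigma_n^2 I)^(-1) r_dp
    return sum(wi.conjugate() * hi for wi, hi in zip(w, h_p))   # h_d = w_d^H h_p
```

A useful sanity check is that, with zero noise variance, the estimate at a pilot position reproduces the pilot estimate itself, since r_dp then equals the corresponding column of R_pp.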

Fig. 9. Linear interpolation in time domain for 16m downlink signal. (Legend: data subcarrier, pilot subcarrier, pilot subcarrier for another antenna, and preamble; the subframes shown are DL SF0 through DL SF4.)

Let H(f) denote the resulting channel estimate at pilot subcarrier f from step 1. Let σ_n² be the variance of the additive white Gaussian noise (AWGN) at the pilot locations and at the locations obtained by linear interpolation from two pilots in step 2. For step 3, R_0 and R_1 can be estimated via

$$R_0 = \left\langle |H(f)|^2 \right\rangle_p - \sigma_n^2, \qquad R_1 = \left\langle H((f + F_s)\,\%\,N)\, H^{*}(f) \right\rangle_p \tag{42}$$

where ⟨|H(f)|²⟩_p is the average squared magnitude of the estimated pilot subcarrier channel responses, σ_n² is the estimated noise variance at pilot subcarriers, and F_s = 8 (there is one pilot subcarrier every 8 subcarriers in IEEE 802.16m). For step 4, we may estimate the mean delay as

$$\tau_\mu = -\frac{N \angle \mathrm{Av}(R_1)}{2\pi F_s}. \tag{43}$$

And we may estimate the RMS delay spread as

$$\tau_{rms} = \frac{N}{2\pi F_s}\sqrt{2\left(1 - \frac{|\mathrm{Av}(R_1)|}{\mathrm{Av}(R_0)}\right)}. \tag{44}$$

In our present work, we do delay estimation based on only one OFDM symbol for simplicity and for better performance in a time-varying channel. Hence we estimate R_0 and R_1 for each OFDM symbol separately.
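The delay-parameter estimation for a single OFDM symbol can be sketched as below. The sketch rests on assumptions that should be read as such: R_0 is taken as the average squared magnitude of the pilot estimates minus the noise variance, R_1 as the circular lag-one correlation between neighboring pilots, the mean delay follows (43), and the RMS delay spread is taken as N/(2πF_s)·sqrt(2(1 − |R_1|/R_0)), which is what results from combining (30) and (34) at the minimizing τ. The circular indexing presumes that the pilots cover the whole band (number of pilots times F_s equals N), which is a simplification of the per-PRU processing described above.

```python
import cmath
import math


def estimate_delay_params(H_p, Fs, N, noise_var=0.0):
    """Estimate (tau_mu, tau_rms) from pilot-subcarrier channel estimates.

    H_p: estimates at equally spaced pilots (spacing Fs subcarriers), treated
         circularly; Fs: pilot spacing; N: DFT size; noise_var: noise variance.
    """
    n = len(H_p)
    R0 = sum(abs(h) ** 2 for h in H_p) / n - noise_var
    if R0 <= 0:
        raise ValueError("noise variance exceeds measured pilot power")
    # Lag-one circular correlation between neighboring pilots.
    R1 = sum(H_p[(i + 1) % n] * H_p[i].conjugate() for i in range(n)) / n
    tau_mu = -N * cmath.phase(R1) / (2 * math.pi * Fs)
    # Clamp the ratio so rounding cannot make the square-root argument negative.
    ratio = min(abs(R1) / R0, 1.0)
    tau_rms = N / (2 * math.pi * Fs) * math.sqrt(2.0 * (1.0 - ratio))
    return tau_mu, tau_rms
```

For a single-path channel with a delay of 3 samples, N = 64, and F_s = 8, the estimated mean delay is 3 and the estimated RMS delay spread is essentially zero, as expected for a channel with no dispersion.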

4. FIXED-POINT IMPLEMENTATION AND OPTIMIZATION METHODS

In this chapter, we discuss some technical issues concerning our fixed-point implementation and some optimization methods to reduce the run time. Before that, we first introduce the basic concepts of fixed-point and floating-point arithmetic. What is the difference between them? What are the advantages and disadvantages of fixed-point calculation? After that, we propose some techniques regarding C code implementation and optimization.

Fig. 10. Mixed method.

1. Nonlinear Function Implementation

In our implementation, we need to compute some functions that are implemented in the C math library as subroutines. These subroutines may take many CPU cycles to execute. We can use some techniques to avoid calling these subroutines and save cycles. We discuss the techniques below. Specifically, the functions that we are interested in are sqrt(), sin(), cos(), and arctan().

1) Table Look-Up Method: The simplest method to implement a mathematical function is to use a look-up table. First, we build a table that contains the desired number of output values for the function. Then, we only need to enter the proper input index to get the answer from the table. But the table size determines the needed memory space. If we want more output precision, then a larger table is needed.

Therefore, we have to know what range of input and output values we need. When we know these ranges, we can avoid wasting table space. Table look-up offers a simple way to replace calling the library, but its precision depends on the table size. If the memory size is very limited, then there may not be enough space to store a table with the desired precision.
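As an illustration of the table look-up idea, the Python sketch below approximates sqrt() on the input range [0, 1) with a 256-entry table. The table size, the input range, and the names are assumptions chosen for the example; a fixed-point C implementation would store scaled integers instead of floating-point values.

```python
import math

TABLE_BITS = 8                       # assumed table size: 2^8 = 256 entries
TABLE_SIZE = 1 << TABLE_BITS
# Precomputed sqrt(x) for x in [0, 1), sampled at the left edge of each bin.
SQRT_TABLE = [math.sqrt(i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def sqrt_lookup(x):
    """Approximate sqrt(x) for 0 <= x < 1 with a single table read."""
    if not 0.0 <= x < 1.0:
        raise ValueError("input outside the tabulated range [0, 1)")
    return SQRT_TABLE[int(x * TABLE_SIZE)]
```

Inputs that fall exactly on a table point are returned exactly, while other inputs are rounded down to the nearest table point, so the error shrinks only as the table grows.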

2) Mixed Method [14]: The mixed method is a trade-off between table look-up and computation. It is based on table look-up but refines the output value by linear interpolation, as shown in Figure 10. First, we build two tables of reduced size compared to the full-precision table: one records the function outputs and the other records the slopes between pairs of input values. If we want to know the output corresponding to X, we can get the answer from the distance between X and X1 and the slope between X1 and X2. Adopting this technique, we trade a little computation for more precision.
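The mixed method can be sketched in Python as follows, again for sqrt() on [0, 1). The segment count of 16 and all names are assumptions made for illustration; the two tables correspond to the output table and the slope table described above.

```python
import math

SEGMENTS = 16                        # assumed, deliberately small table
STEP = 1.0 / SEGMENTS
# Function outputs at the left end X1 of each segment.
VALUES = [math.sqrt(i * STEP) for i in range(SEGMENTS)]
# Slope between X1 and X2 for each segment.
SLOPES = [(math.sqrt((i + 1) * STEP) - math.sqrt(i * STEP)) / STEP
          for i in range(SEGMENTS)]


def sqrt_mixed(x):
    """Approximate sqrt(x) for 0 <= x < 1 by table look-up plus linear interpolation."""
    if not 0.0 <= x < 1.0:
        raise ValueError("input outside the tabulated range [0, 1)")
    i = int(x / STEP)                # index of the segment containing x
    return VALUES[i] + SLOPES[i] * (x - i * STEP)
```

With only 16 segments, the interpolated result at x = 0.3 is already within about 0.001 of the true square root, at the cost of one multiplication and one addition per evaluation.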

2. Q-Scaling Flow Chart

Figure 11 shows the flow chart of the channel estimation
