National Chiao Tung University
Department of Electronics Engineering & Institute of Electronics (Master's Program)
Master's Thesis
Study in Bandwidth Detection and Information Element Handling for IEEE 802.16e OFDMA
Student: Wan-Ching Tsai
Advisor: Dr. David W. Lin
A Thesis
Submitted to Department of Electronics Engineering & Institute of Electronics
College of Electrical and Computer Engineering
National Chiao Tung University
in Partial Fulfillment of the Requirements
for the Degree of Master of Science
in
Electronics Engineering
July 2008
Hsinchu, Taiwan, Republic of China
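Study in Bandwidth Detection and Information Element Handling for IEEE 802.16e OFDMA
Student: Wan-Ching Tsai Advisor: Dr. David W. Lin
National Chiao Tung University
Department of Electronics Engineering & Institute of Electronics (Master's Program)
Abstract (Chinese)
This thesis presents the problems and algorithms of bandwidth detection and cyclic prefix (CP) length detection in IEEE 802.16e orthogonal frequency-division multiple access (OFDMA), together with implementation issues of information element handling. When a mobile handset first enters the network, it may need to detect the bandwidth in use. The received signal may contain several systems, and the bandwidth and location of each system have to be detected. In this thesis, we consider receiving a signal of 20 MHz bandwidth and finding the center frequencies and bandwidths used by all systems present in that signal. Because IEEE 802.16e specifies constraints on the bandwidth sizes and on the placement of center frequencies, we can list, according to the standard, all bandwidth distributions (the bandwidth and center frequency of each system) that may exist within the 20 MHz. Given that the power spectra of all possible bandwidth distributions are known, we can find the power spectrum closest to that of the received signal, and the bandwidth distribution to which it belongs is our detection result. In this thesis, we propose two dimensionalities in which to compute the distance between power spectra, and we analyze and compare the detection performance and the computational complexity of the two.
After bandwidth detection, the handset must detect the CP length of the system it is about to enter. There are four CP lengths. In view of the computational complexity, we use the correlation of the CP: the length with the highest correlation among the four is our detection result. After the handset completes the network entry procedure and enters successfully, it must read the information elements in every downlink frame. We therefore propose an information element handling method of low computational complexity, and implement and optimize it on a DSP.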
Study in Bandwidth Detection and Information
Element Handling for IEEE 802.16e OFDMA
Student: Wan-Ching Tsai Advisor: Dr. David W. Lin
Department of Electronics Engineering
Institute of Electronics
National Chiao Tung University
Abstract
This thesis addresses the problems, algorithms, and implementation issues of bandwidth detection, cyclic prefix (CP) detection, and information element (IE) handling in the IEEE 802.16e OFDMA system.
In the DL, the mobile station (MS) receiver may need to detect the bandwidth upon its initial entry to the network. The received signal may contain several systems with different bandwidths. In this thesis, we consider detecting the bandwidths and the center frequencies of all S-OFDMA systems in the 20 MHz band that we receive. Because IEEE 802.16e [2] specifies rules on the bandwidth and on the placement of the center frequency, we can list all possible bandwidth distributions (bandwidth and center frequency of each system) that fall within 20 MHz. Since the power spectra of all bandwidth distributions are known, we take as the decision the distribution whose power spectrum has the least distance to that of the received signal. We propose two dimensionalities in which to calculate the distance, and analyze the performance and the computational complexity of each.
After detecting the bandwidth, the MS has to detect the CP used in the system. There are four possible CP lengths. Considering the computational complexity, we use the CP complex-conjugate correlation method to test which CP length results in the largest correlation, and that length is the detected CP length.
After the MS completes the network entry procedure and successfully enters the network, it must read the IEs in every downlink frame. We therefore propose an IE handling method that has low computational complexity, and implement and optimize it on a DSP.
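Acknowledgements
That this thesis could be completed smoothly is owed first of all to my advisor, Professor David W. Lin. During my two years of graduate study, his careful guidance and his breadth and depth of professional knowledge taught me a great deal about the spirit and methods of research. His kind and optimistic attitude also let me keep a relaxed frame of mind while facing and trying to solve all kinds of errors, and enjoy the pleasure of research.
I also thank all the members of the Communication Electronics and Signal Processing Laboratory, including the teachers, classmates, and senior and junior students. Special thanks go to my seniors 洪崑健 and 吳俊榮, who always gave me guidance and advice on my research actively, warmly, and without reservation, and to 依翎 of the class of 94, who took great care of me both in research and in daily life. I also thank my classmates 昀澤, 光中, 佳楓, 威年, 尚諭, and 紹唐: our discussions, the experiences we shared, and our mutual support along the way filled these two years of graduate study with joy and memories.
Finally, I thank my family, who have always supported me and constantly encouraged me throughout my studies. They are my greatest source of strength.
I dedicate this thesis to all the teachers, classmates, friends, and family members who have helped me and accompanied me through this period.
Wan-Ching Tsai
July 2008, Hsinchu
Contents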
1 Introduction
2 Overview of the IEEE 802.16e OFDMA Standard
2.1 Introduction to OFDM [5]
2.2 Introduction to OFDMA [7]
2.3 Introduction to IEEE 802.16e [2]
2.3.1 Concept of OFDMA and Definition of Basic Terms
2.3.2 Scalable OFDMA [10]
2.4 OFDMA Frame Structure [2]
2.5 OFDMA Subcarrier Allocation [2]
2.5.1 Downlink
2.5.2 Uplink
2.6 Modulation [2]
2.7 Transmit Spectral Mask [2]
3 The DSP Implementation Platform
3.1 The DSP Card [14]
3.2 The DSP Chip [15]
3.2.1 Central Processing Unit [15]
3.2.2 Memory Architecture and Peripherals [15]
3.3 TI’s Code Development Environment [17]
3.3.1 Code Composer Studio
3.4 Code Optimization on TI DSP Platform
3.4.1 Compiler Optimization Options [19]
3.4.2 Software Pipelining [20]
3.4.3 Intrinsics [19]
4 Channel Bandwidth and Cyclic Prefix Detection for IEEE 802.16e OFDMA
4.1 System Parameters
4.2 Bandwidth Detection
4.2.1 Interference Analysis
4.2.2 Bandwidth Decision Method
4.3 CP Detection
4.3.1 CP Decision Method
4.4 Simulation Results
4.4.1 Bandwidth Detection
4.4.2 CP Detection
5 Information Element Handling and DSP Implementation
5.1 Information Element Handling
5.2 The Optimization Result
6 Conclusion and Future Work
6.1 Conclusion
List of Figures
2.1 Bandwidth efficiency comparison of traditional FDM and OFDM systems (from [5]).
2.2 The use of cyclic prefix (from [8]).
2.3 Comparison of subcarrier allocations in OFDM and OFDMA (from [9]).
2.4 OFDMA frequency description (3-channel schematic example, from [1]).
2.5 OFDMA time description (from [1], Figure 213).
2.6 Example of the data region which defines the OFDMA allocation (from [1]).
2.7 Example of an OFDMA frame (with only mandatory zone) in TDD mode (from [2]).
2.8 OFDMA DL Frame Prefix format for all FFT sizes except 128 (from [2]).
2.9 Illustration of OFDMA with multiple zones (from [2]).
2.10 FCH subchannel allocation for all 3 segments (from [1]).
2.11 Example of DL renumbering the allocated subchannels for segment 1 in PUSC (from [1]).
2.12 DL cluster structure (from [13]).
2.13 Description of an UL tile (from [13]).
2.14 PRBS generator for pilot modulation (from [2]).
2.15 Transmit spectral mask for license-exempt operation (from [1]).
3.1 Sundance’s SMT395 module (from [14])
3.2 Functional block and CPU (DSP core) diagram [16].
3.4 Code development flow for C6000 (from [19]).
3.5 Software-pipelined loop (from [15]).
4.1 Possible channel bandwidth distributions with PUSC permutation. All 49 possibilities are listed on the right.
4.2 The section of samples r (i.e., the 2048 points we get) may straddle over two adjacent symbols.
4.3 Power spectrum of 2048 points of one 20 MHz bandwidth signal with QPSK modulation in perfect channel (λ = 500).
4.4 Power spectrum of 2048 points of three 5 MHz bandwidth systems with QPSK modulation in perfect channel. All symbols are without CP and received with no offset.
4.5 The same bandwidth distributions shown in Figure 4.1 after power normalization.
4.6 Each segment in the 23D structure calculates the mean of all spectrum points belonging to it (PUSC permutation).
4.7 The theoretical performance of the detection of type a, b, and c in AWGN channel.
4.8 An example of CP overlap between different CP lengths.
4.9 Index number of testing CP symbols with CP overlap to real CP and the correlation value that results.
4.10 Flow diagram of bandwidth detection simulation.
4.11 Detection error rates of one 20 MHz system in SinglePath and AWGN channels.
4.12 Detection error rates of one 20 MHz system in SUI-3 channel.
4.13 Detection error rates of one 20 MHz system in SUI-4 channel.
4.14 The histogram of the squared of distance between “a” system and mean power of “a” in 2048D.
4.15 The histogram of the square of distance between “a” system and mean power of “a” in 23D.
4.16 The spaced-frequency correlation function of the SUI-3 channel.
4.17 The spaced-frequency correlation function of the SUI-4 channel.
4.18 Distribution of bandwidth detection errors at SNR 0 dB, SUI-3 channel.
4.19 One 10 MHz center frequency 3.5 GHz system vs. one 10 MHz center frequency 3.5 GHz − 5 MHz.
4.20 The CP detection error rate in AWGN channel.
4.21 The types of error detect CP when original 1/8 CP, L = 5, and SNR 4 dB.
5.1 The procedure of read the information from 32bits aligned IE.
5.2 A part of C code for decode repetition coding.
5.3 A part of assembly code for decode repetition coding. The adjacent parallel lines in the ellipse means which instructions are parallel.
5.4 A part of C code for byte align transfer to 32 bits align.
5.5 A part of assembly code for byte align transfer to 32 bits align. The
List of Tables
2.1 OFDM Advantages and Disadvantages [7]
2.2 S-OFDMA Parameters Proposed by WiMAX Forum
2.3 1024-FFT OFDMA DL Carrier Allocation for PUSC (from [2])
2.4 1024-FFT OFDMA UL Carrier Allocation for PUSC (from [2])
2.5 Transmit Spectral Mask for License-Exempt Bands
3.1 Functional Units and Operations Performed [15]
Chapter 1
Introduction
IEEE 802.16e, a subset of which is commonly known as Mobile WiMAX (Worldwide Interoperability for Microwave Access), is a broadband wireless access (BWA) system that has drawn much attention in recent years. IEEE 802.16e was originally proposed as an enhanced version of IEEE Std 802.16-2004 to provide the mobile station (MS) with mobility at vehicular speed. It therefore specifies BWA systems for both fixed and mobile stations [1], [2].
One of the most promising modes in the IEEE 802.16e standard is the orthogonal frequency division multiple access (OFDMA) mode, which is generally accepted as an efficient multiple access scheme. The Mobile WiMAX system also exploits bandwidth scalability, where the FFT size typically increases with the bandwidth. In this thesis, we consider the IEEE 802.16e WirelessMAN OFDMA system in time-division duplex (TDD) mode, where downlink (DL) and uplink (UL) transmissions are time multiplexed in each TDD frame.
Our study can be divided into two parts. The first part concerns bandwidth and cyclic prefix (CP) detection techniques for IEEE 802.16e OFDMA. In a general OFDMA system, when performing initial network entry, the MS may not know the FFT size and the channel bandwidth employed by the BS, so it may need to detect them. After detecting the bandwidth and FFT size, the MS should search all possible values of CP until it finds the CP being used by the BS. Most of the literature on bandwidth detection uses the transmission time to determine a measured bandwidth (e.g., [3], [4]), which is not suitable for initial network entry. In the present study we instead consider recognizing the bandwidth structure, which is appropriate for cognitive radio. We assume that the FFT size is proportional to the channel bandwidth, so we only need to pay attention to bandwidth and CP detection.
The second part concerns information element (IE) handling in the digital signal processor (DSP) implementation of an overall system. The principal IEs we need to handle are the FCH and the burst profiles in the DL-MAP, which contain the information we have to interpret in order to read the received data. We implement an IE interpreter on a Texas Instruments (TI) DSP. Moreover, we employ various optimization techniques to accelerate the execution speed of the implemented DSP programs.
This thesis is organized as follows.
1. Chapter 2 introduces the IEEE 802.16e WirelessMAN OFDMA standard.
2. Chapter 3 introduces the DSP implementation platform.
3. Chapter 4 analyzes the bandwidth and CP detection problems and presents some solutions.
4. Chapter 5 expounds the IE handling problem, discusses the DSP optimization methods, and presents the optimization results.
5. Chapter 6 gives the conclusion and points out some potential future work.
Chapter 2
Overview of the IEEE 802.16e OFDMA
Standard
In this chapter, we first introduce some basic concepts regarding orthogonal frequency-division multiplexing (OFDM) and OFDMA. Then we give an overview of the IEEE 802.16e OFDMA standard. For the sake of simplicity, we only introduce the specifications that are useful in our study. Other specifications like channel coding, channel estimation, transmit diversity, etc., are not our concern and are ignored in this introduction. For more details we refer the reader to [1] and [2], from which we take much of the material in this chapter.
2.1 Introduction to OFDM [5]
OFDM is a special case of the multicarrier transmission technique, where a single data stream is transmitted over a number of lower rate subcarriers. It divides the available spectrum into narrower subcarrier bands, and each subcarrier only transmits a portion of the total information.
Figure 2.1: Bandwidth efficiency comparison of traditional FDM and OFDM systems (from [5]).
The orthogonality of OFDM constitutes one major difference from the classical parallel data system, making its use of the available spectrum more efficient. Figure 2.1 shows the difference. As we can see, the subcarriers in an OFDM symbol can be arranged so that the sidebands of the subcarriers overlap, yet the symbols can still be received without adjacent-carrier interference. This can be accomplished by using the discrete Fourier transform (DFT) as proposed by Weinstein and Ebert in 1971 [6]. The computational cost of the DFT, however, was too high in the early days. Fortunately, modern advances in very-large-scale integration (VLSI) make it possible to use the fast Fourier transform (FFT) for a more efficient implementation of the DFT. The complexity is reduced from $N^2$ in the DFT to $N \log_2 N$ in the FFT.
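As a rough illustration of the complexity comparison above, the following sketch evaluates the DFT directly from its definition and with a textbook recursive radix-2 FFT, and checks that both give the same result. The function names and the test signal are hypothetical, and the operation counts are the order-of-magnitude figures $N^2$ and $N \log_2 N$ rather than exact multiplication counts for any particular implementation.

```python
import cmath

def naive_dft(x):
    """Direct evaluation of the DFT definition: on the order of N*N multiplications."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def radix2_fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def complexity_orders(n):
    """Return (N^2, N*log2(N)) for a power-of-two N."""
    return n * n, n * (n.bit_length() - 1)

if __name__ == "__main__":
    samples = [complex(i % 5, -(i % 3)) for i in range(64)]   # arbitrary test signal
    err = max(abs(a - b) for a, b in zip(naive_dft(samples), radix2_fft(samples)))
    print(f"max difference between DFT and FFT outputs: {err:.2e}")
    for size in (128, 512, 1024, 2048):                       # FFT sizes used in OFDMA
        direct, fast = complexity_orders(size)
        print(f"N={size}: N^2={direct}, N*log2(N)={fast}")
```

For the largest FFT size considered in this thesis (2048), the two orders differ by a factor of roughly 186.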
One of the main reasons to use OFDM is to increase the robustness against frequency-selective fading or narrowband interference. An OFDM system may encode data using forward error correction (FEC) coding and distribute them across several subcarriers. If frequency-selective fading causes errors in the reception of a few subcarriers, the data bits in those subcarriers can be recovered through FEC.
Another reason for choosing OFDM is its ability to handle multipath interference with a small amount of overhead. For a given overall data rate, increasing the number of carriers reduces the data rate that each individual carrier must convey, and hence lengthens the symbol period. This means that the intersymbol interference (ISI) affects a smaller percentage of each symbol. In order to eliminate the ISI, a guard time (or guard interval) is inserted. The guard time is chosen larger than the expected delay spread, such that multipath components from one symbol cannot interfere with the next symbol. However, if we insert zeros within the guard interval, the orthogonality among subcarriers will no longer exist, which causes serious intercarrier interference (ICI). To preserve the orthogonality among the subcarriers and eliminate ICI, the OFDM symbol should be cyclically extended in the guard time rather than just extended with zeros. Figure 2.2 shows how to add a cyclic prefix in front of an OFDM symbol. Hence if the maximum multipath delay is smaller than the guard time, there will be neither ISI nor ICI.
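The effect of the cyclic extension on orthogonality can be checked numerically. The sketch below is a toy example under stated assumptions (8 subcarriers, a hypothetical 3-tap channel, a guard interval of 2 samples, no noise, a single symbol): with a cyclic prefix, each received subcarrier equals the transmitted one multiplied by the channel frequency response, whereas a zero-filled guard interval leaves a residual error on the subcarriers. All names and values are illustrative and are not taken from the standard.

```python
import cmath

def dft(x, inverse=False):
    """Plain DFT (or inverse DFT with 1/N scaling) of a list of complex numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def convolve(x, h):
    """Linear convolution, modelling a multipath channel with taps h."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def max_subcarrier_error(freq_symbols, channel, guard_len, cyclic):
    """Largest |Y[k] - H[k] X[k]| after discarding the guard and taking the DFT."""
    n = len(freq_symbols)
    time_symbol = dft(freq_symbols, inverse=True)
    # Guard interval: copy of the symbol tail (cyclic prefix) or zeros.
    guard = time_symbol[-guard_len:] if cyclic else [0j] * guard_len
    received = convolve(guard + time_symbol, channel)
    window = received[guard_len:guard_len + n]        # receiver drops the guard
    y = dft(window)
    h_freq = dft(list(channel) + [0j] * (n - len(channel)))
    return max(abs(y[k] - h_freq[k] * freq_symbols[k]) for k in range(n))

if __name__ == "__main__":
    qpsk = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
    taps = [1.0, 0.5, 0.25]                           # delay spread of 2 samples
    print("cyclic prefix :", max_subcarrier_error(qpsk, taps, 2, cyclic=True))
    print("zero guard    :", max_subcarrier_error(qpsk, taps, 2, cyclic=False))
```

With the cyclic prefix the error is at the level of floating-point rounding, because the linear convolution seen inside the FFT window is equivalent to a circular one.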
Finally, the advantages and disadvantages are summarized in Table 2.1. The advantages are already discussed above. The disadvantages can be addressed in various ways, but are not the focus of the present study.
2.2 Introduction to OFDMA [7]
OFDMA is a multiple access method based on OFDM signaling that allows simultaneous transmissions to and from multiple users along with the other advantages of OFDM. In OFDM, a channel is divided into carriers, all of which are used by a single user at any given time. In OFDMA, the carriers are divided into subchannels. Each subchannel has multiple carriers that form one unit in frequency allocation. In this way, the bandwidth can be allocated dynamically to the users according to their needs.
An additional advantage of OFDMA is the following. Due to the large variance in a mobile system’s path loss, inter-cell interference is a common issue in mobile wireless systems. An OFDMA system can be designed such that subchannels can be composed from several distinct permutations of subcarriers. This enables significant reduction in inter-cell interference when the system is not fully loaded, because even on occasions where the same subchannel is used at the same time in two different cells, there is only a partial collision on the active subcarriers. A simple comparison of the subcarrier allocation of OFDM and OFDMA is shown in Fig. 2.3.
Table 2.1: OFDM Advantages and Disadvantages [7]
Advantages | Disadvantages
Bandwidth efficiency | Sensitive to frequency offset
Immunity to multipath effect | Sensitive to timing offset
Robust against narrowband interference | Sensitive to phase noise
 | Large peak-to-average power ratio
In order to support multiple users, the control mechanism becomes more complex. Besides, the OFDMA system has some implementation issues which are more compli-cated to handle. For example, power control is needed for the uplink to make signals from different users have equal power at the receiver, and all users have to adjust their transmitting time to be aligned.
2.3 Introduction to IEEE 802.16e [2]
The IEEE 802.16 family of standards is officially called WirelessMAN. Part of it is known as WiMAX (Worldwide Interoperability for Microwave Access), a name given by an industry group called the WiMAX Forum. The objectives of the Forum are to promote and certify compatibility and interoperability of broadband wireless products.
The first 802.16 standard, approved in December 2001, is a standard for point-to-multipoint broadband wireless transmission in the 10–66 GHz band, with only a line-of-sight (LOS) capability. It uses a single carrier (SC) physical layer (PHY) technique.
To overcome the disadvantage of the LOS requirement between transmitters and receivers in the 802.16 standard, the 802.16a standard was approved in 2003 to support non-line-of-sight (NLOS) links, operational in both licensed and unlicensed frequency bands from 2 to 11 GHz, and subsequently revised to create the 802.16d standard (now named 802.16-2004). With such enhancements, the 802.16-2004 standard has been viewed as a promising alternative for providing the last-mile connectivity by radio link. However, the 802.16-2004 specifications were devised primarily for fixed wireless users. The 802.16e task group was subsequently formed with the goal of extending the 802.16-2004 standard to support mobile terminals.
Figure 2.3: Comparison of subcarrier allocations in OFDM and OFDMA (from [9]).
IEEE 802.16e was published in February 2006. It specifies four air interfaces: WirelessMAN-SC PHY, WirelessMAN-SCa PHY, WirelessMAN-OFDM PHY, and WirelessMAN-OFDMA PHY. This study is concerned with WirelessMAN-OFDMA PHY in a mobile communication environment.
Some terminology used in the following is as follows. The direction of transmission from the base station (BS) to the subscriber station (SS) is called downlink (DL), and the opposite direction is uplink (UL). The SS is considered synonymous with the mobile station (MS). It is sometimes termed the user. The BS is a generalized equipment set providing connectivity, management, and control of the SS.
2.3.1 Concept of OFDMA and Definition of Basic Terms
We present some basic concepts and terminology of OFDMA signaling in this subsection. The contents of this subsection have been taken to a large extent from [1] and [2].
In the OFDMA PHY mode, at least one of the fast Fourier transform (FFT) sizes 2048 (backward compatible with IEEE Std 802.16-2004), 1024, 512, and 128 shall be supported. This facilitates support of the various channel bandwidths.
The MS may implement a scanning and search mechanism to detect the DL signal when performing initial network entry, and this may include dynamic detection of the FFT size and the channel bandwidth employed by the BS.
Frequency Domain and Time Domain Descriptions
An OFDMA symbol is made up of subcarriers, the number of which determines the FFT size used. There are several subcarrier types:
• Data subcarriers: For data transmission.
• Pilot subcarriers: For various estimation purposes.
• Null subcarriers: No transmission at all, for guard bands and including the DC subcarrier.
The active subcarriers are divided into subsets of subcarriers, where each subset is termed a subchannel. The subcarriers forming one subchannel may, but need not be, adjacent. The concept is shown in Fig. 2.4.
Three basic types of subchannel organization are defined: partial usage of subchannels (PUSC), full usage of subchannels (FUSC), and adaptive modulation and coding (AMC); among which the PUSC is mandatory and the other two are optional. In PUSC DL, the entire channel bandwidth is divided into three segments to be used separately. The FUSC is employed only in the DL and it uses the full set of available subcarriers so as to maximize the throughput.
Figure 2.4: OFDMA frequency description (3-channel schematic example, from [1]).
An inverse Fourier transform creates the OFDMA waveform; its time duration is referred to as the useful symbol time $T_b$. A copy of the last $T_g$ of the useful symbol period, termed the cyclic prefix (CP), is used to collect multipath while maintaining the orthogonality of the tones. Figure 2.5 illustrates the structure.
The transmitter energy increases with the length of the guard time while the receiver energy remains the same (the cyclic extension is discarded), so there is a $10 \log_{10}\bigl(1 - T_g/(T_b + T_g)\bigr)$ dB loss in $E_b/N_0$. Using a cyclic extension, the samples required for performing the FFT at the receiver can be taken anywhere over the length of the extended symbol. This provides multipath immunity as well as a tolerance for symbol time synchronization errors.
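To give a feeling for the size of this loss, the short sketch below evaluates the expression for a few guard ratios $T_g/T_b$. The candidate ratios 1/4, 1/8, 1/16, and 1/32 are assumed here as the four CP lengths; the function name is hypothetical. The returned values are negative, i.e., they are losses.

```python
import math

def cp_eb_n0_change_db(guard_ratio):
    """Evaluate 10*log10(1 - Tg/(Tb + Tg)) in dB for guard_ratio = Tg/Tb."""
    tb = 1.0                    # useful symbol time, normalized to 1
    tg = guard_ratio * tb       # guard (CP) time
    return 10.0 * math.log10(1.0 - tg / (tb + tg))

if __name__ == "__main__":
    for denominator in (4, 8, 16, 32):      # assumed candidate CP ratios 1/4 ... 1/32
        change = cp_eb_n0_change_db(1.0 / denominator)
        print(f"Tg/Tb = 1/{denominator}: {change:.3f} dB")
```

A longer CP tolerates a larger delay spread but costs more transmit energy: about 1 dB for a ratio of 1/4 versus about 0.5 dB for 1/8.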
On initialization, an SS should search all possible values of CP until it finds the CP being used by the BS. The SS shall use the same CP on the UL. Once a specific CP duration has been selected by the BS for operation on the DL, it should not be changed. Changing the CP would force all the SSs to resynchronize to the BS.
Important Parameters
Four primitive parameters characterize the OFDMA symbols:
• $BW$: The nominal channel bandwidth.
• $N_{used}$: Number of used subcarriers (which includes the DC subcarrier).
• $n$: Sampling factor. Its value is set as follows: For channel bandwidths that are a multiple of 1.75 MHz, $n = 8/7$; else for channel bandwidths that are a multiple of any of 1.25, 1.5, 2 or 2.75 MHz, $n = 28/25$; else for channel bandwidths not otherwise specified, $n = 8/7$.
• $G$: This is the ratio of CP time to “useful” time, i.e., $T_g/T_b$.
The following parameters are defined in terms of the primitive parameters.
• $N_{FFT}$: Smallest power of two greater than $N_{used}$.
• Sampling frequency: $F_s = \lfloor n \cdot BW / 8000 \rfloor \times 8000$.
• Subcarrier spacing: $\Delta f = F_s / N_{FFT}$.
• Useful symbol time: $T_b = 1/\Delta f$.
• CP time: $T_g = G \times T_b$.
• OFDMA symbol time: $T_s = T_b + T_g$.
• Sampling time: $T_b / N_{FFT}$.
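The definitions above can be turned into a small calculation. The sketch below is a minimal example, assuming a 10 MHz channel, $G = 1/8$, and an illustrative $N_{used} = 841$ (chosen only so that $N_{FFT} = 1024$); the function and key names are hypothetical. Exact rational arithmetic is used so that the floor operation in $F_s$ is not affected by rounding.

```python
from fractions import Fraction

def sampling_factor(bw_hz):
    """Sampling factor n selected by the bandwidth rule listed above."""
    if bw_hz % 1_750_000 == 0:                         # multiple of 1.75 MHz
        return Fraction(8, 7)
    steps_hz = (1_250_000, 1_500_000, 2_000_000, 2_750_000)
    if any(bw_hz % step == 0 for step in steps_hz):
        return Fraction(28, 25)
    return Fraction(8, 7)                              # not otherwise specified

def derived_parameters(bw_hz, n_used, g):
    """Derive N_FFT, Fs, delta f, Tb, Tg, Ts from the primitive parameters."""
    n = sampling_factor(bw_hz)
    n_fft = 1 << n_used.bit_length()                   # smallest power of two > N_used
    fs_hz = (n * bw_hz // 8000) * 8000                 # floor(n*BW/8000) * 8000
    delta_f = Fraction(fs_hz, n_fft)                   # subcarrier spacing in Hz
    tb = 1 / delta_f                                   # useful symbol time in s
    tg = g * tb                                        # CP time in s
    return {
        "n_fft": n_fft,
        "fs_hz": fs_hz,
        "delta_f_hz": float(delta_f),
        "tb_us": float(tb) * 1e6,
        "tg_us": float(tg) * 1e6,
        "ts_us": float(tb + tg) * 1e6,
    }

if __name__ == "__main__":
    params = derived_parameters(10_000_000, 841, Fraction(1, 8))
    for name, value in params.items():
        print(f"{name}: {value}")
```

For these inputs the sketch gives a sampling frequency of 11.2 MHz, a subcarrier spacing of about 10.94 kHz, and an OFDMA symbol time of about 102.9 µs.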
Slot and Data Region
The definition of an OFDMA slot depends on the OFDMA symbol structure, which varies for UL and DL, for FUSC and PUSC, and for the distributed subcarrier permutations and the adjacent subcarrier permutation.
• For downlink PUSC using the distributed subcarrier permutation, one slot is one subchannel by two OFDMA symbols.
• For uplink PUSC using either of the distributed subcarrier permutations, one slot is one subchannel by three OFDMA symbols.
• For downlink FUSC and downlink optional FUSC using the distributed subcarrier permutation, one slot is one subchannel by one OFDMA symbol.
In OFDMA, a data region is a two-dimensional allocation of a group of contiguous subchannels, in a group of contiguous OFDMA symbols. All the allocations refer to logical subchannels. This two-dimensional allocation may be visualized as a rectangle, such as the 4×3 rectangle shown in Fig. 2.6.
Segment
A segment is a subdivision of the set of available OFDMA subchannels (that may include all available subchannels). One segment is used for deploying a single instance of the MAC.
Permutation Zone
A permutation zone is a number of contiguous OFDMA symbols, in the DL or the UL, that use the same permutation formula. The DL subframe or the UL subframe may contain more than one permutation zone.
Table 2.2: S-OFDMA Parameters Proposed by WiMAX Forum
Parameters | Values
System Channel Bandwidth (MHz) | 1.25 | 5 | 10 | 20
Sampling Frequency (MHz) | 1.4 | 5.6 | 11.2 | 22.4
FFT Size | 128 | 512 | 1024 | 2048
Subcarrier Spacing (∆f) | 10.94 kHz
Useful Symbol Time (Tb = 1/∆f) | 91.4 µsec
Guard Time (Tg = Tb/8) | 11.4 µsec
OFDMA Symbol Duration (Ts = Tb + Tg) | 102.9 µsec
2.3.2 Scalable OFDMA [10]
One feature of the IEEE 802.16e OFDMA is the selectable FFT size, from 128 to 2048 in powers of 2, excluding 256, which is used with WirelessMAN-OFDM. This has been termed scalable OFDMA (S-OFDMA). One use of S-OFDMA is that if the channel bandwidths are allocated as an integer power of 2 times a base bandwidth, then one may consider making the FFT size proportional to the allocated bandwidth so that all systems are based on the same subcarrier spacing and the same OFDMA symbol duration, which may simplify system design. For example, Table 2.2 lists some S-OFDMA parameters proposed by the WiMAX Forum [11]. S-OFDMA supports a wide range of bandwidths to flexibly address the need for various spectrum allocation and usage model requirements.
When designing OFDMA wireless systems, the optimal choice of the number of subcarriers per channel bandwidth is a tradeoff between protection against multipath, Doppler shift, and design cost/complexity. Increasing the number of subcarriers leads to better immunity to the ISI caused by multipath; on the other hand it increases the cost and complexity of the system (it leads to higher requirements for signal processing power and power amplifiers with the capability of handling higher peak-to-average power ratios). Having more subcarriers also results in narrower subcarrier spacing and therefore the system becomes more sensitive to Doppler shift and phase noise. Calculations show that the optimum tradeoff for mobile systems is achieved when the subcarrier spacing is about 11 kHz [12].
2.4 OFDMA Frame Structure [2]
Duplexing Modes
In licensed bands, the duplexing method shall be either frequency-division duplex (FDD) or time-division duplex (TDD). FDD SSs may be half-duplex FDD (H-FDD). In license-exempt bands, the duplexing method shall be TDD.
Point-to-Multipoint (PMP) Frame Structure
When implementing a TDD system, the frame is composed of BS and SS transmissions. Figure 2.7 shows an example. Each frame in the downlink transmission begins with a preamble followed by a DL transmission period and a UL transmission period. In each frame, time gaps, denoted the transmit/receive transition gap (TTG) and the receive/transmit transition gap (RTG), are inserted between the downlink and uplink subframes and at the end of each frame, respectively, to allow the BS to turn around.
Subchannel allocation in the downlink may be performed in several ways: PUSC, where some of the subchannels are allocated to the transmitter, and FUSC, where all subchannels are allocated to the transmitter. The downlink frame shall start in PUSC mode with no transmit diversity. The FCH shall be transmitted using QPSK rate 1/2 with four repetitions using the mandatory coding scheme (i.e., the FCH information will be sent on four subchannels with successive logical subchannel numbers) in a PUSC zone. The FCH contains the DL Frame Prefix (see Figure 2.8), which specifies the length of the DL-MAP message that immediately follows the DL Frame Prefix and the repetition coding used for the DL-MAP message.
The transitions between modulations and coding take place on slot boundaries in the time domain (except in the AAS zone, where AAS stands for adaptive antenna system) and on subchannels within an OFDMA symbol in the frequency domain. The OFDMA frame may include multiple zones (such as PUSC, FUSC, PUSC with all subchannels, optional FUSC, AMC, TUSC1, and TUSC2, where AMC stands for adaptive modulation and coding and TUSC stands for tile usage of subchannels); the transition between zones is indicated in the DL-MAP. Figure 2.9 depicts an OFDMA frame with multiple zones. The PHY parameters (such as channel state and interference levels) may change from one zone to the next.
Figure 2.7: Example of an OFDMA frame (with only mandatory zone) in TDD mode (from [2]).
The maximum number of downlink zones is 8 in one downlink subframe. For each SS, the maximum number of bursts to decode in one downlink subframe is 64. This includes all bursts without connection identifier (CID) or with CIDs matching the SS’s CIDs.
FCH, DL-MAP, and Logical Subchannel Numbering
Figure 2.8: OFDMA DL Frame Prefix format for all FFT sizes except 128 (from [2]).
In PUSC, any segment used shall be allocated at least the same number of subchannels as in subchannel group #0. For FFT sizes other than 128, the first 4 slots in the downlink part of the segment contain the FCH as defined before. These slots contain 48 bits modulated by QPSK with coding rate 1/2 and repetition coding of 4. For FFT-128, the first slot in the downlink part of the segment is dedicated to FCH and repetition is not applied. The basic allocated subchannel sets for segments 0, 1, and 2 are subchannel groups #0, #2, and #4, respectively. Figure 2.10 depicts this structure.
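The figure of four FCH slots can be checked with simple arithmetic. The sketch below assumes that one downlink PUSC slot carries 48 data subcarriers (one subchannel of 24 data subcarriers over two OFDMA symbols); this value and the function name are assumptions made for illustration.

```python
import math

def slots_needed(info_bits, code_rate, bits_per_symbol, repetition,
                 data_subcarriers_per_slot=48):
    """Number of slots occupied by a coded, modulated, and repeated block."""
    coded_bits = info_bits / code_rate                  # after FEC
    symbols = coded_bits / bits_per_symbol              # after modulation
    slots_per_copy = math.ceil(symbols / data_subcarriers_per_slot)
    return slots_per_copy * repetition                  # after repetition coding

if __name__ == "__main__":
    # 48 bits, QPSK (2 bits per symbol), rate 1/2, repetition 4.
    print("FCH slots with repetition 4:", slots_needed(48, 0.5, 2, 4))
    # Same block without repetition, as in the FFT-128 case described above.
    print("FCH slots with no repetition:", slots_needed(48, 0.5, 2, 1))
```

Under these assumptions, 48 bits become 96 coded bits and 48 QPSK symbols, which fill exactly one slot; four repetitions give the four slots stated in the text.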
After decoding the DL Frame Prefix message within the FCH, the SS has the knowledge of how many and which subchannels are allocated to the PUSC segment. In order to observe the allocation of the subchannels in the downlink as a contiguous allocation block, the subchannels shall be renumbered. The renumbering, for the first PUSC zone, shall start from the FCH subchannels (renumbered to values 0–11), then continue numbering the subchannels in a cyclic manner to the last allocated subchannel and from the first allocated subchannel to the FCH subchannels. Figure 2.11 gives an example of such renumbering for segment 1.
Figure 2.9: Illustration of OFDMA with multiple zones (from [2]).
For uplink, in order to observe the allocation of the subchannels as a contiguous allocation block, the subchannels shall be renumbered, and the renumbering shall start from the lowest numbered allocated subchannel (renumbered to value 0), up to the highest numbered allocated subchannel, skipping nonallocated subchannels.
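The two renumbering rules can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the allocated subchannel numbers in the usage example are made-up values, not an allocation taken from the standard. The downlink rule is modelled as a cyclic rotation of the sorted allocated subchannels so that the first FCH subchannel becomes logical subchannel 0.

```python
def renumber_downlink(allocated, first_fch):
    """Map physical to logical subchannel numbers, starting at the FCH subchannels
    and continuing cyclically through the remaining allocated subchannels."""
    ordered = sorted(allocated)
    start = ordered.index(first_fch)
    rotated = ordered[start:] + ordered[:start]
    return {physical: logical for logical, physical in enumerate(rotated)}

def renumber_uplink(allocated):
    """Map physical to logical subchannel numbers, lowest allocated subchannel first,
    skipping nonallocated subchannels."""
    return {physical: logical for logical, physical in enumerate(sorted(allocated))}

if __name__ == "__main__":
    # Hypothetical downlink allocation: subchannels 12..39, FCH starting at 20.
    dl_map = renumber_downlink(range(12, 40), first_fch=20)
    print("DL:", dl_map[20], dl_map[31], dl_map[39], dl_map[12], dl_map[19])
    # Hypothetical uplink allocation with gaps.
    print("UL:", renumber_uplink([9, 3, 7, 4]))
```

In the downlink example, physical subchannels 20 to 31 become logical 0 to 11, numbering continues up to the last allocated subchannel, and then wraps around to the first allocated subchannel.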
The DL-MAP of each segment shall be mapped to the slots allocated to the segment in a frequency first order, starting from the slot after the FCH (subchannel 4 in the first symbol, after renumbering), and continuing to the next symbols if necessary. The FCH of segments that have no subchannels allocated (unused segments) will not be transmitted, and the respective slots may be used for transmission of MAP and data of other segments.
2.5 OFDMA Subcarrier Allocation [2]
As mentioned, the OFDMA PHY defines four scalable FFT sizes: 2048, 1024, 512, and 128. For convenience, here we only take the 1024-FFT case as an example. The subcarriers are divided into three types: null (guard band and DC), pilot, and data. Subtracting the guard tones from $N_{FFT}$, one obtains the set of “used” subcarriers $N_{used}$. For both uplink and downlink, these used subcarriers are allocated to pilot subcarriers and data subcarriers. However, the allocation differs among the possible zones. For FUSC and PUSC, in the downlink, the pilot tones are allocated first; what remains are data subcarriers, which are divided into subchannels that are used exclusively for data. For PUSC in the uplink, the set of used subcarriers is first partitioned into subchannels, and then the pilot subcarriers are allocated from within each subchannel. Thus, in FUSC, there is one set of common pilot subcarriers, and in PUSC downlink, there is one set of common pilot subcarriers in each major group, but in PUSC uplink, each subchannel contains its own pilot subcarriers.
Figure 2.10: FCH subchannel allocation for all 3 segments (from [1]).
2.5.1 Downlink
Preamble
The preamble is the first symbol of the downlink transmission. There are three types of preamble carrier-sets, which are defined by allocating different subcarriers to each of them. The subcarriers are modulated using boosted BPSK modulation with a specific pseudo-noise (PN) code. The preamble carrier-sets are defined using
\[
PreambleCarrierSet_n = n + 3 \cdot k \tag{2.1}
\]
where:
$PreambleCarrierSet_n$ specifies all subcarriers allocated to the specific preamble,
$n$ is the number of the preamble carrier-set indexed 0–2,
$k$ is a running index 0–283.
Figure 2.11: Example of DL renumbering the allocated subchannels for segment 1 in PUSC (from [1]).
For the preamble symbol there are 86 guard-band subcarriers on each of the left and right sides of the spectrum. Each segment uses a preamble composed of one of the three available carrier-sets, such that segment i uses preamble carrier-set i, where i = 0, 1, 2. Therefore, each segment modulates every third subcarrier. In the case of segment 0, the DC carrier is zeroed and the corresponding PN number is discarded.
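As an illustration of the carrier-set structure described above, the following Python sketch lists the subcarriers of each preamble carrier-set for the 1024-FFT mode. It assumes the carrier-set rule has the form n + 3k (consistent with each segment modulating every third subcarrier) and that indices are counted from the first subcarrier after the left guard band; the function and constant names are hypothetical.

```python
# Sketch of the preamble carrier-sets for the 1024-FFT mode.
# Assumptions (not taken verbatim from the thesis): the rule n + 3k and
# indexing relative to the first non-guard subcarrier.

N_FFT = 1024
GUARD_EACH_SIDE = 86   # guard-band subcarriers on each side (from the text)
K_MAX = 283            # running index k = 0..283 (from the text)


def preamble_carrier_set(n):
    """Return the subcarrier indices n + 3k of carrier-set n (n = 0, 1, 2)."""
    if n not in (0, 1, 2):
        raise ValueError("carrier-set index must be 0, 1, or 2")
    return [n + 3 * k for k in range(K_MAX + 1)]


sets = [preamble_carrier_set(n) for n in range(3)]

# DC subcarrier (FFT index 512) expressed in the same relative indexing.
dc_relative = N_FFT // 2 - GUARD_EACH_SIDE
```

Under these assumptions the three sets together cover all 1024 − 2·86 = 852 non-guard subcarriers, and the DC subcarrier falls in carrier-set 0, which agrees with the remark that the DC carrier is zeroed for segment 0.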
The 114 different PN series modulating the preamble carrier-set are defined in Table 309 of [1] for the 1024-FFT mode. The series modulated depends on the segment used and the IDcell parameter.
Symbol Structure for PUSC
The symbol is first divided into basic clusters and zero carriers are allocated. Pilots and data carriers are allocated within each cluster. Table 310 of [2] summarizes the parameters of the symbol structure of different FFT sizes for PUSC mode. Here we only take the 1024-FFT OFDMA downlink carrier allocation for example, which is summarized in Table 2.3. Fig. 2.12 depicts the DL cluster structure.
Downlink Subchannels Subcarrier Allocation in PUSC
The subcarrier allocation to subchannels is performed using the following procedure:
1. Dividing the subcarriers into the number of clusters ($N_{clusters}$), where the physical clusters contain 14 adjacent subcarriers each (starting from carrier 0). The number of clusters varies with the FFT size.
2. Renumbering the physical clusters into logical clusters using the following formula:
\[
LogicalCluster =
\begin{cases}
RenumberingSequence(PhysicalCluster), & \text{first DL zone, or Use All SC indicator} = 0 \text{ in STC\_DL\_Zone\_IE}, \\
RenumberingSequence\big((PhysicalCluster + 13 \cdot DL\_PermBase) \bmod N_{clusters}\big), & \text{otherwise.}
\end{cases}
\tag{2.2}
\]
Table 2.3: 1024-FFT OFDMA DL Carrier Allocation for PUSC (from [2])
Parameter | Value | Comments
Number of DC subcarriers | 1 | Index 512
Number of guard subcarriers, left | 92 |
Number of guard subcarriers, right | 91 |
Number of used subcarriers, Nused | 841 |
Renumbering sequence | 6, 48, 37, 21, 31, 40, 42, 56, 32, 47, 30, 33, 54, 18, 10, 15, 50, 51, 58, 46, 23, 45, 16, 57, 39, 35, 7, 55, 25, 59, 53, 11, 22, 38, 28, 19, 17, 3, 27, 12, 29, 26, 5, 41, 49, 44, 9, 8, 1, 13, 36, 14, 43, 2, 20, 24, 52, 4, 34, 0 | Used to renumber clusters before allocation to subchannels.
Number of subcarriers per cluster | 14 |
Number of clusters | 60 |
Number of data subcarriers in each symbol per subchannel | 24 |
Number of subchannels | 30 |
Basic permutation sequence 6 (for 6 subchannels) | 3, 2, 0, 4, 5, 1 |
Basic permutation sequence 4 (for 4 subchannels) | 3, 0, 2, 1 |
In the first PUSC zone of the downlink (first downlink zone) and in a PUSC zone defined by STC DL Zone IE() with "Use All SC indicator = 0", the default renumbering sequence is used for logical cluster definition. For all other cases, the DL PermBase parameter in the STC DL Zone IE() or AAS DL IE() shall be used.
3. Allocating logical clusters to groups. The allocation algorithm varies with FFT size. For FFT size = 1024, the clusters are divided into six major groups. Group 0 includes clusters 0–11, group 1 clusters 12–19, group 2 clusters 20–31, group 3 clusters 32–39, group 4 clusters 40–51, and group 5 clusters 52–59. These groups may be allocated to segments; if a segment is used, then at least one group shall be allocated to it. By default group 0 is allocated to sector 0, group 2 to sector 1, and group 4 to sector 2.
4. Allocating subcarriers to subchannels in each major group is performed separately for each OFDMA symbol by first allocating the pilot carriers within each cluster, and then partitioning all remaining data carriers into groups of contiguous subcarriers. Each subchannel consists of one subcarrier from each of these groups. The number of groups is therefore equal to the number of subcarriers per subchannel, and it is denoted $N_{subcarriers}$. The number of the subcarriers in a group is equal to the number of subchannels, and it is denoted $N_{subchannels}$. The number of data subcarriers is thus equal to $N_{subcarriers} \cdot N_{subchannels}$. The parameters vary with FFT sizes. For FFT size = 1024, use the parameters from Table 2.3, with basic permutation sequence 6 for even numbered major groups and basic permutation sequence 4 for odd numbered major groups, to partition the subcarriers into subchannels containing 24 data subcarriers in each symbol. The exact partitioning into subchannels is according to
\[
subcarrier(k, s) = N_{subchannels} \cdot n_k + \{ p_s[n_k \bmod N_{subchannels}] + DL\_PermBase \} \bmod N_{subchannels} \tag{2.3}
\]
where:
$subcarrier(k, s)$ is the subcarrier index of subcarrier k in subchannel s,
$s$ is the index number of a subchannel, from the set $\{0, \ldots, N_{subchannels} - 1\}$,
$n_k = (k + 13 \cdot s) \bmod N_{subcarriers}$, where k is the subcarrier-in-subchannel index from the set $\{0, \ldots, N_{subcarriers} - 1\}$,
$N_{subchannels}$ is the number of subchannels (for PUSC use number of subchannels in the currently partitioned major group),
$p_s[j]$ is the series obtained by rotating basic permutation sequence cyclically to the left s times,
$DL\_PermBase$ is an integer ranging from 0 to 31, which is set to the preamble IDcell in the first zone and determined by the DL-MAP for other zones.
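The cluster renumbering of (2.2) and the subcarrier permutation of (2.3) can be sketched in Python as follows. This is a minimal illustration for the 1024-FFT parameters of Table 2.3, not the implementation used in this work; the function names and argument conventions are hypothetical, and subcarrier indices are taken to be relative to the data subcarriers of one major group.

```python
# Sketch of DL PUSC cluster renumbering (2.2) and subcarrier permutation (2.3)
# for the 1024-FFT case. Function names are illustrative assumptions.

RENUMBERING_SEQUENCE = [
    6, 48, 37, 21, 31, 40, 42, 56, 32, 47, 30, 33, 54, 18, 10,
    15, 50, 51, 58, 46, 23, 45, 16, 57, 39, 35, 7, 55, 25, 59,
    53, 11, 22, 38, 28, 19, 17, 3, 27, 12, 29, 26, 5, 41, 49,
    44, 9, 8, 1, 13, 36, 14, 43, 2, 20, 24, 52, 4, 34, 0,
]
BASIC_PERM_6 = [3, 2, 0, 4, 5, 1]   # even-numbered major groups
BASIC_PERM_4 = [3, 0, 2, 1]         # odd-numbered major groups
N_CLUSTERS = 60
N_SUBCARRIERS = 24                  # data subcarriers per subchannel per symbol


def logical_cluster(physical, dl_perm_base=0, default_renumbering=True):
    """Equation (2.2): map a physical cluster index to a logical one."""
    if default_renumbering:
        return RENUMBERING_SEQUENCE[physical]
    return RENUMBERING_SEQUENCE[(physical + 13 * dl_perm_base) % N_CLUSTERS]


def subcarrier(k, s, basic_perm, dl_perm_base):
    """Equation (2.3): index of subcarrier k of subchannel s in its group."""
    n_sub = len(basic_perm)                      # N_subchannels in this group
    shift = s % n_sub
    p_s = basic_perm[shift:] + basic_perm[:shift]  # rotate left s times
    n_k = (k + 13 * s) % N_SUBCARRIERS
    return n_sub * n_k + (p_s[n_k % n_sub] + dl_perm_base) % n_sub


# Example: all data subcarriers of a 6-subchannel major group, DL_PermBase = 5.
group_indices = [subcarrier(k, s, BASIC_PERM_6, 5)
                 for s in range(6) for k in range(N_SUBCARRIERS)]
```

In this sketch every data subcarrier of a major group is assigned to exactly one subchannel, so `group_indices` is a permutation of 0..143 for a six-subchannel group.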
On initialization, an SS must search for the downlink preamble. After finding the preamble, the user shall know the IDcell used for the data subchannels.
2.5.2 Uplink
The UL follows the DL model, therefore it also supports up to three segments. The UL supports 35 subchannels where each transmission uses 48 data subcarriers as the minimal block of processing. Each new transmission for the uplink commences with the parameters as given in Table 2.4.
Symbol Structure for Subchannel (PUSC)
A slot in the uplink is composed of three OFDMA symbols and one subchannel. Within each slot, there are 48 data subcarriers and 24 fixed-location pilots. The subchannel is constructed from six uplink tiles; each tile has four successive active subcarriers, and its configuration is illustrated in Fig. 2.13.
Partitioning of Subcarriers into Subchannels in the Uplink
The usable subcarriers in the allocated frequency band shall be divided into $N_{tiles}$ physical tiles as defined in Fig. 2.13 with parameters from Table 2.4. The allocation of physical tiles to logical tiles in subchannels is performed as
\[
Tiles(s, n) = N_{subchannels} \cdot n + \{ P_t[(s + n) \bmod N_{subchannels}] + UL\_PermBase \} \bmod N_{subchannels} \tag{2.4}
\]
Table 2.4: 1024-FFT OFDMA UL Carrier Allocation for PUSC (from [2])
Parameter | Value | Comments
Number of DC subcarriers | 1 | Index 512
Nused | 841 |
Guard subcarriers: left, right | 92, 91 |
TilePermutation | 11, 19, 12, 32, 33, 9, 30, 7, 4, 2, 13, 8, 17, 23, 27, 5, 15, 34, 22, 14, 21, 1, 0, 24, 3, 26, 29, 31, 20, 25, 16, 10, 6, 28, 18 | Used to allocate tiles to subchannels
Nsubchannels | 35 |
Ntiles | 210 |
Figure 2.13: Description of an UL tile (from [13]).
where:
$Tiles(s, n)$ is the physical tile index in the FFT with tiles being ordered consecutively from the most negative to the most positive used subcarrier (0 is the starting tile index),
$n$ is the tile index 0,...,5 in a subchannel,
$P_t$ is the tile permutation,
$N_{subchannels}$ is the number of subchannels,
$s$ is the subchannel number in the range $\{0, \ldots, N_{subchannels} - 1\}$,
$UL\_PermBase$ is an integer value in the range 0...69, which is assigned by a management entity.
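To make the tile allocation of (2.4) concrete, the following Python sketch computes the six physical tiles of an uplink subchannel using the TilePermutation of Table 2.4. It is only an illustration under the 1024-FFT parameters; the function names are hypothetical.

```python
# Sketch of the UL PUSC tile allocation (2.4) for the 1024-FFT case.
# Function names are illustrative assumptions.

TILE_PERMUTATION = [
    11, 19, 12, 32, 33, 9, 30, 7, 4, 2, 13, 8, 17, 23, 27, 5, 15, 34,
    22, 14, 21, 1, 0, 24, 3, 26, 29, 31, 20, 25, 16, 10, 6, 28, 18,
]
N_SUBCHANNELS = 35
TILES_PER_SUBCHANNEL = 6


def physical_tile(s, n, ul_perm_base):
    """Equation (2.4): physical tile index of logical tile n in subchannel s."""
    offset = (TILE_PERMUTATION[(s + n) % N_SUBCHANNELS] + ul_perm_base) % N_SUBCHANNELS
    return N_SUBCHANNELS * n + offset


def subchannel_tiles(s, ul_perm_base):
    """Return the six physical tiles that make up subchannel s."""
    return [physical_tile(s, n, ul_perm_base) for n in range(TILES_PER_SUBCHANNEL)]


# Example: tiles of subchannel 0 with UL_PermBase = 0.
example = subchannel_tiles(0, 0)
```

Each subchannel takes one tile from each block of 35 consecutive physical tiles, so the 35 subchannels together use all 210 tiles exactly once.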
After mapping the physical tiles in the FFT to logical tiles for each subchannel, the data subcarriers per slot are enumerated by the following process:
1. After allocating the pilot carriers within each tile, indexing of the data subcarriers within each slot is performed starting from the first symbol at the lowest indexed subcarrier of the lowest indexed tile and continuing in an ascending manner through the subcarriers in the same symbol, then going to the next symbol at the lowest indexed data subcarrier, and so on. Data subcarriers are indexed from 0 to 47.
2. The mapping of data onto the subcarriers will follow (2.5), which calculates the subcarrier index (as assigned in item 1) to which the data constellation point is to be mapped:
\[
Subcarriers(n, s) = (n + 13 \cdot s) \bmod N_{subcarriers} \tag{2.5}
\]
where:
$Subcarriers(n, s)$ is the permutated subcarrier index corresponding to data subcarrier n in subchannel s,
$n$ is the running index 0,...,47, indicating the data constellation point,
$s$ is the subchannel number,
$N_{subcarriers}$ is the number of subcarriers per slot.
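The data mapping of (2.5) amounts to a cyclic rotation of the 48 data subcarrier indices by 13·s positions. A brief Python sketch (with a hypothetical function name) illustrates this:

```python
# Sketch of the UL data-subcarrier mapping (2.5); names are illustrative.

N_SUBCARRIERS_PER_SLOT = 48


def mapped_subcarrier(n, s):
    """Equation (2.5): subcarrier index for constellation point n in subchannel s."""
    return (n + 13 * s) % N_SUBCARRIERS_PER_SLOT


# Example: mapping of the first four constellation points in subchannel 1.
first_points = [mapped_subcarrier(n, 1) for n in range(4)]
```

For subchannel 1 the first constellation points land on data subcarriers 13, 14, 15, and 16, and the indices wrap around after 47.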
2.6 Modulation [2]
Subcarrier Randomization
The pseudo random binary sequence (PRBS) generator, as shown in Fig. 2.14, shall be used to produce a sequence $w_k$. The polynomial for the PRBS generator is $X^{11} + X^{9} + 1$. The value of the pilot modulation on subcarrier k shall be derived from $w_k$.
The initialization vector of the PRBS generator for both uplink and downlink, designated b10..b0, is defined as follows:
b0..b4 = five least significant bits of IDcell as indicated by the frame preamble in the first downlink zone and in the downlink AAS zone with Diversity Map support, DL PermBase following STC DL Zone IE() and 5 LSB of DL PermBase following AAS DL IE without Diversity Map support in the downlink. Five least significant bits of IDcell (as determined by the preamble) in the uplink. For downlink and uplink, b0 is MSB and b4 is LSB, respectively.
b5..b6 = set to the segment number + 1 as indicated by the frame preamble in the first downlink zone and in the downlink AAS zone with Diversity Map support, PRBS ID as indicated by the STC DL Zone IE or AAS DL IE without Diversity Map support in other downlink zone. 0b11 in the uplink. For downlink and uplink, b5 is MSB and b6 is LSB, respectively.
b7..b10 = 0b1111 (all ones) in the downlink and four LSB of the Frame Number in the uplink. For downlink and uplink, b7 is MSB and b10 is LSB, respectively.
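A rough Python sketch of the initialization vector for the first downlink zone and of an 11-stage shift-register generator with feedback taps at stages 9 and 11 is given below. Because Fig. 2.14 is not reproduced here, the assignment of b0..b10 to register stages and the point at which $w_k$ is tapped are assumptions made only for illustration; the function names are hypothetical.

```python
# Sketch of the pilot PRBS for the first downlink zone.
# Assumptions: bit b_i is loaded into stage i + 1, and w_k is read from
# stage 11. The standard's figure should be consulted for the exact wiring.


def init_vector_dl(id_cell, segment):
    """Return [b0, ..., b10] for the first downlink zone."""
    bits = [(id_cell >> i) & 1 for i in (4, 3, 2, 1, 0)]   # b0 (MSB) .. b4 (LSB)
    seg = segment + 1
    bits += [(seg >> 1) & 1, seg & 1]                      # b5 (MSB), b6 (LSB)
    bits += [1, 1, 1, 1]                                   # b7..b10, downlink
    return bits


def prbs(state, count):
    """Generate `count` bits from an LFSR with taps at stages 9 and 11."""
    reg = list(state)                 # reg[0] is stage 1, reg[10] is stage 11
    out = []
    for _ in range(count):
        out.append(reg[10])           # assumed output tap
        feedback = reg[8] ^ reg[10]   # XOR of stages 9 and 11
        reg = [feedback] + reg[:10]   # shift toward stage 11
    return out


w = prbs(init_vector_dl(id_cell=1, segment=0), 2 * 2047)
```

Since the downlink vector always contains ones in b7..b10, the register never starts in the all-zero state, and the generated sequence repeats with period 2^11 − 1 = 2047.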
Data Modulation
After the repetition block, the data bits are entered serially to the constellation mapper. Gray-mapped QPSK and 16-QAM shall be supported, whereas the support of 64-QAM is optional.
Pilot Modulation
In all permutations except uplink PUSC and downlink TUSC1, each pilot shall be transmitted with a boosting of 2.5 dB over the average non-boosted power of each data tone. The pilot subcarriers shall be modulated according to
\[
\Re\{c_k\} = \frac{8}{3}\left(\frac{1}{2} - w_k\right) \cdot p_k, \qquad \Im\{c_k\} = 0, \tag{2.6}
\]
where $p_k$ is the pilot's polarity for SDMA (spatial division multiple access) allocations in AMC AAS zone, and $p_k = 1$ otherwise.
Preamble Pilot Modulation
The pilots in the downlink preamble shall be modulated according to
\[
\Re\{PreamblePilotModulation\} = 4 \cdot \sqrt{2} \cdot \left(\frac{1}{2} - w_k\right). \tag{2.7}
\]
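As a quick numerical check of (2.6) and (2.7), the Python sketch below evaluates the pilot amplitudes and the corresponding power boost in dB. It assumes that the data tones have unit average power; the function names are hypothetical.

```python
# Numerical check of the pilot amplitudes in (2.6) and (2.7).
# Assumption: data tones are normalized to unit average power.
import math


def pilot_value(w_k, p_k=1):
    """Real part of c_k from (2.6); the imaginary part is zero."""
    return (8.0 / 3.0) * (0.5 - w_k) * p_k


def preamble_pilot_value(w_k):
    """Real part of the preamble pilot from (2.7)."""
    return 4.0 * math.sqrt(2.0) * (0.5 - w_k)


def boost_db(amplitude):
    """Power of a tone with the given amplitude relative to unit power, in dB."""
    return 20.0 * math.log10(abs(amplitude))


pilot_boost = boost_db(pilot_value(0))              # about 2.5 dB
preamble_boost = boost_db(preamble_pilot_value(0))  # about 9.03 dB
```

The pilot amplitude of ±4/3 corresponds to about 2.5 dB, which matches the boosting stated above, while the preamble amplitude of ±2√2 corresponds to roughly 9 dB under the same normalization.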
Figure 2.15: Transmit spectral mask for license-exempt operation (from [1]).
Table 2.5: Transmit Spectral Mask for License-Exempt Bands
Bandwidth (MHz) | A | B | C | D
10 | 4.75 | 5.45 | 9.75 | 14.75
20 | 9.5 | 10.9 | 19.5 | 29.5
2.7 Transmit Spectral Mask [2]
IEEE 802.16e does not specify the power mask for the licensed bands. Because transmission must be bandwidth-limited, the spectral density of the transmitted signal shall fall within the spectral mask shown in Fig. 2.15 and Table 2.5 in license-exempt bands. The measurements shall be made using a 100 kHz resolution bandwidth and a 30 kHz video bandwidth. The 0 dBr level is the maximum power allowed by the relevant regulatory body.
Chapter 3
The DSP Implementation Platform
In this chapter, we introduce the DSP platform used in our implementation. We employ the SMT395 DSP module made by Sundance, housed on a Sundance PCI plug-in board. The DSP chip on the module is the TMS320C6416T made by Texas Instruments (TI). The module also has a Xilinx Virtex II Pro FPGA. We will introduce the DSP card and the DSP chip. In addition, we will introduce the software development tool, the Code Composer Studio (CCS), and the code development technique.
3.1 The DSP Card [14]
The DSP card used in our implementation is Sundance’s SMT395 shown in Fig. 3.1. It houses a 1 GHz 64-bit TMS320C6416T DSP of TI, manufactured on 90 nm technology. The SMT395 is supported by TI’s Code Composer Studio and the 3L Diamond real-time operating system (RTOS) to enable multi-DSP systems with minimum programmer effort. It provides a flexible platform for various applications.
Some important features of the SMT395 module are as follows.
• 1 GHz TMS320C6416T fixed-point DSP with 8000 MIPS peak DSP performance.
• Xilinx Virtex II Pro FPGA XC2VP30-6 in FF896 package.
• 256 Mbytes of SDRAM at 133 MHz.
• Two Sundance High-speed Bus (50 MHz, 100 MHz or 200 MHz) ports at 32 bits.
• Eight 2 Gbit/sec Rocket Serial Links (RSL) for inter-module communications.
• Six common ports up to 20 MB per second for inter-DSP communication.
• 8 Mbytes flash ROM for configuration and booting.
• JTAG diagnostics port.
Figure 3.1: Sundance's SMT395 module (from [14]).
3.2 The DSP Chip [15]
The TMS320C6416T DSP is a fixed-point DSP in the TMS320C64x series of the TMS320C6000 DSP platform family. It is based on the advanced VelociTI very-long-instruction-word (VLIW) architecture developed by TI. The functional block and DSP core diagram of the TMS320C64x series is shown in Fig. 3.2.
The C6000 core CPU consists of 64 general-purpose 32-bit registers and eight functional units. Features of the C6000 devices include the following.
• VLIW CPU with eight functional units, including two multipliers and six arithmetic units:
– Executes up to eight instructions per cycle.
– Allows designers to develop highly effective RISC-like code for fast development time.
• Instruction packing:
– Gives code size equivalence for eight instructions executed serially or in parallel.
– Reduces code size, program fetches, and power consumption.
• Conditional execution of all instructions:
– Reduces costly branching.
– Increases parallelism for higher sustained performance.
• Efficient code execution on independent functional units:
– Efficient C compiler on DSP benchmark suite.
– Assembly optimizer for fast development and improved parallelization.
• 8/16/32-bit data support, providing efficient memory support for a variety of applications.
• 40-bit arithmetic options add extra precision for vocoders.
• 32x32-bit integer multiply with 32- or 64-bit result.
In the following subsections, three major parts of the TMS320C64x DSP are introduced in turn: the central processing unit, memory, and peripherals.
Figure 3.2: Functional block and CPU (DSP core) diagram [16].
3.2.1 Central Processing Unit [15]
The C64x DSP core contains 64 32-bit general-purpose registers, a program fetch unit, an instruction decode unit, two data paths each with four functional units, a control register, control logic, advanced instruction packing, test unit, emulation logic, and interrupt logic. The program fetch, instruction dispatch, and instruction decode units can deliver up to eight 32-bit instructions to the eight functional units every CPU clock cycle. The processing of instructions occurs in each of the two data paths (A and B) shown in Fig. 3.2, each of which contains four functional units and one register file. The four functional units are: one unit for multiplier operations (.M), one for arithmetic and logic operations (.L), one for branch, byte shifts, and arithmetic operations (.S), and one for linear and circular address calculation to load and store with external memory operations (.D). The details of the functional units are described in Table 3.1.
Table 3.1: Functional Units and Operations Performed [15]
Functional Unit | Operations
.L unit (.L1, .L2) | 32/40-bit arithmetic and compare operations
 | 32-bit logical operations
 | Leftmost 1 or 0 counting for 32 bits
 | Normalization count for 32 and 40 bits
 | Byte shifts
 | Data packing/unpacking
 | 5-bit constant generation
 | Dual 16-bit and quad 8-bit arithmetic operations
 | Dual 16-bit and quad 8-bit min/max operations
.S unit (.S1, .S2) | 32-bit arithmetic operations
 | 32/40-bit shifts and 32-bit bit-field operations
 | 32-bit logical operations
 | Branches
 | Constant generation
 | Register transfers to/from control register file (.S2 only)
 | Byte shifts
 | Data packing/unpacking
 | Dual 16-bit and quad 8-bit compare operations
 | Dual 16-bit and quad 8-bit saturated arithmetic operations
.M unit (.M1, .M2) | 16 x 16 multiply operations
 | 16 x 32 multiply operations
 | Dual 16 x 16 and quad 8 x 8 multiply operations
 | Dual 16 x 16 multiply with add/subtract operations
 | Quad 8 x 8 multiply with add operations
 | Bit expansion
 | Bit interleaving/de-interleaving
 | Variable shift operations
 | Rotation
 | Galois field multiply
.D unit (.D1, .D2) | 32-bit add, subtract, linear and circular address calculation
 | Loads and stores with 5-bit constant offset
 | Loads and stores with 15-bit constant offset (.D2 only)
 | Loads and stores of double words with 5-bit constant
 | Loads and stores of non-aligned words and double words
 | 5-bit constant generation
Each register file consists of 32 32-bit registers. Each functional unit in the two sets of four functional units reads and writes directly within its own data path. That is, functional units .L1, .S1, .M1, and .D1 can only write to register file A. The same holds for register file B. However, two cross-paths (1X and 2X) allow functional units from one data path to access a 32-bit operand from the register file on the opposite side. The cross path 1X allows the functional units of data path A to read their source from register file B. The cross path 2X allows the functional units of data path B to read their source from register file A. In the C64x, the CPU pipelines data-cross-path accesses over multiple clock cycles. This allows the same register to be used as a data-cross-path operand by multiple functional units in the same execute packet.
3.2.2 Memory Architecture and Peripherals [15]
The C64x is a two-level cache-based architecture. The level 1 cache is separated into program and data spaces. The level 1 program cache (L1P) is a 128-Kbit direct mapped cache and the level 1 data cache (L1D) is a 128-Kbit 2-way set-associative mapped cache. The level 2 (L2) memory consists of an 8-Mbit (1-Mbyte) memory space that is shared between cache (up to 256 Kbytes) and unified mapped memory.
The external memory interface (EMIF) provides interfaces between the DSP core and external memory, such as synchronous-burst SRAM (SBSRAM), synchronous DRAM (SDRAM), FIFO, and asynchronous memories (SRAM and EPROM). The EMIF also provides 64-bit-wide (EMIFA) and 16-bit-wide (EMIFB) memory read capability.
The C64x contains some peripherals such as enhanced direct-memory-access (EDMA), host-port interface (HPI), PCI, three multichannel buffered serial ports (McBSPs), three 32-bit general-purpose timers, and sixteen general-purpose I/O pins. The EDMA controller handles all data transfers between the level-two (L2) cache/memory and the device peripherals. The EDMA of the C64x has 64 independent channels. The HPI is a 32-/16-bit wide parallel port through which a host processor can directly access the CPU's memory space. The PCI port supports connection of the DSP to a PCI host via the integrated PCI master/slave bus interface.
3.3 TI’s Code Development Environment [17]
The Code Composer Studio (CCS) is a key element of the DSP software and development tools from Texas Instruments. The tutorial [18] introduces the key features of CCS and the programmer’s guide [19] gives a reference for programming TMS320C6000 DSP devices. A programmer needs to be familiar with coding development flow and CCS for building a new project on the DSP platform efficiently.
3.3.1 Code Composer Studio
The CCS combines the basic code generation tools with a set of debugging and real-time analysis capabilities which supports all phases of the development cycle shown in Fig. 3.3. Some main features of the CCS are
• Real-time analysis.
• Source code debugger common interface for both simulator and emulator targets.
– C/C++/assembly language support.
– Simple breakpoints.
– Advanced watch window.
– Symbol browser.
• DSP/BIOS support.
– Pre-emptive multi-threading.
– Interthread communication.
– Interrupt handling.
• Chip Support Libraries (CSL) to simplify device configuration. CSL provides C-program functions to configure and control on-chip peripherals.
• DSP libraries for optimum DSP functionality. The DSP library includes many C-callable, assembly-optimized, general-purpose signal-processing and image/video processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. The TMS320C64x digital signal processor library (DSPLIB) provides routines for:
– Adaptive filtering.
– Correlation.
– FFT.
– Filtering and convolution.
– Math.
– Matrix functions.
– Miscellaneous.
3.3.2 Code Development Flow [19]
The recommended code development flow involves utilizing the C6000 code generation tools to aid in optimization rather than forcing the programmer to code by hand in assembly. Hence the programmer may let the compiler do all the laborious work of instruction selection, parallelizing, pipelining, and register allocation. This simplifies the maintenance of the code, as everything resides in a C framework that is simple to maintain, support, and upgrade. Fig. 3.4 illustrates the three phases in the code development flow. Because phase 3 is usually too detailed and time consuming, most of the time we do not go into phase 3 to write linear assembly code unless the software pipelining efficiency is too poor or the resource allocation is too unbalanced.
3.4 Code Optimization on TI DSP Platform
In this section, we describe several methods that can accelerate our code and reduce the execution time on the C64x DSP. First, we introduce two techniques that can be used to analyze the performance of specific code regions:
• Use the clock( ) and printf( ) functions in C/C++ to time and display the performance of specific code regions. Use the stand-alone simulator (load6x) to run the code for this purpose.
• Use the profile mode of the stand-alone simulator. This can be done by compiling the code with the -mg option and executing load6x with the -g option. Then enable the clock and use profile points and the RUN command in the Code Composer debugger to track the number of CPU clock cycles consumed by a particular section of code. Use "View Statistics" to view the number of cycles consumed.
Usually, we use the second technique above to analyze the C code performance. The feedback of the optimization result can be obtained with the -mw option. It shows some important results of the assembly optimizer for each code section. We take these results into consideration in improving the computational speed of certain loops in our program.
3.4.1 Compiler Optimization Options [19]
In this subsection, we introduce the compiler options that control the operation of the compiler. The CCS compiler offers high-level language support by transforming C/C++ code into more efficient assembly language source code. The compiler options can be used to optimize the code size or the executing performance.
The major compiler options we use are -o3, -k, -pm -op2, -mh<n>, -mw, and -mi.
• -on: The "n" denotes the level of optimization (0, 1, 2, or 3), which controls the type and degree of optimization.
– -o3: highest level of optimization, whose main features are:
∗ Performs loop optimizations and loop unrolling.
∗ Removes all functions that are never called.
∗ Reorders function declarations so that the attributes of called functions are known when the caller is optimized.
∗ Propagates arguments into function bodies when all calls pass the same value in the same argument position.
∗ Identifies file-level variable characteristics.
• -k: Keep the assembly file to analyze the compiler feedback.
• -pm -op2: In the CCS compiler option, -pm and -op2 are combined into one option.
– -pm: Gives the compiler global access to the whole program or module and allows it to be more aggressive in ruling out dependencies.
– -op2: Specifies that the module contains no functions or variables that are called or modified from outside the source code provided to the compiler. This improves variable analysis and allowed assumptions.
• -mh<n>: Allows speculative execution. The appropriate amount of padding, n, must be available in data memory to ensure correct execution. This is normally not a problem but must be adhered to.
• -mw: Produces additional compiler feedback. This option has no performance or code size impact.
• -mi: Describes the interrupt threshold to the compiler. If the compiler knows that no interrupts will occur in the code, it can avoid enabling and disabling interrupts before and after software-pipelined loops for improvement in code size and performance. In addition, there is potential for performance improvement where interrupt registers may be utilized in high register pressure loops.
3.4.2 Software Pipelining [20]
Software pipelining is a technique used to schedule instructions from a loop so that multiple iterations of the loop execute in parallel. This is the most important feature we rely on to speed up our system. The compiler always attempts to software-pipeline. Fig. 3.5 illustrates a software pipelined loop. The stages of the loop are represented by A, B, C, D, and E. In this figure, a maximum of five iterations of the loop can execute at one time. The shaded area represents the loop kernel. In the loop kernel, all five stages execute in parallel. The area above the kernel is known as the pipelined loop prolog, and the area below the kernel the pipelined loop epilog.
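The prolog, kernel, and epilog structure described above can be visualized with a small Python sketch that lists which loop stages are active in each cycle. This is only a conceptual model of the schedule, not compiler output, and it assumes one new iteration starts every cycle; the function name is hypothetical.

```python
# Conceptual model of a software-pipelined loop with stages A..E.
# Assumption: a new iteration is started in every cycle.


def pipeline_schedule(stages, iterations):
    """Return, for each cycle, the (iteration, stage) pairs that execute."""
    depth = len(stages)
    schedule = []
    for cycle in range(iterations + depth - 1):
        active = []
        for it in range(iterations):
            stage_index = cycle - it
            if 0 <= stage_index < depth:
                active.append((it, stages[stage_index]))
        schedule.append(active)
    return schedule


schedule = pipeline_schedule("ABCDE", iterations=7)
# Kernel cycles are those in which all five stages execute in parallel.
kernel_cycles = [c for c, active in enumerate(schedule) if len(active) == 5]
```

With seven iterations of a five-stage loop, this model takes 11 cycles: four prolog cycles while the pipeline fills, three kernel cycles with all five stages busy, and four epilog cycles while it drains.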
But under the conditions listed below, the compiler will not do software pipelining [19]:
• If a register value lives too long, the code is not software-pipelined.
• If a loop has complex condition code within the body that requires more than five condition registers, the loop is not software-pipelined.
• A software-pipelined loop cannot contain function calls, including code that calls the run-time support routines.
• In a sequence of nested loops, the innermost loop is the only one that can be software-pipelined.
• If a loop contains a conditional break, it is not software-pipelined.
Usually, we should maximize the number of loops that satisfy the requirements of software pipelining. Software pipelining is a very important technique for optimization; its importance cannot be overemphasized.
3.4.3 Intrinsics [19]
We do not use any intrinsics in our code, but we introduce the concept of this technique here. The C6000 compiler provides intrinsics, which are special functions that map directly to C64x instructions, to optimize C/C++ code quickly. All assembly instructions that are not easily expressed in C/C++ code are supported as intrinsics. A table of TMS320C6000 C/C++ compiler intrinsics can be found in [19].
Chapter 4
Channel Bandwidth and Cyclic Prefix Detection for IEEE 802.16e OFDMA
The IEEE 802.16e OFDMA standard is very flexible in the choice of bandwidth, FFT size, and CP length [2, p. 781]. The FFT size can be 2048, 1024, 512, or 128. The ratio of CP time to useful time can be 1/32, 1/16, 1/8, or 1/4. When performing initial network entry, the MS may implement a scanning and search mechanism to detect the DL signal. This includes dynamic detection of the FFT size and the channel bandwidth employed by the BS. After detecting the bandwidth and the FFT size, the MS should search all possible values of CP until it finds the CP being used by the BS. In this chapter, we discuss these detection issues. Note that, in bandwidth detection, we consider recognizing the bandwidth structure. This is appropriate for cognitive radio, and is not necessarily needed by a WiMAX MS.
4.1 System Parameters
The system profile we select is WirelessMAN-OFDMA PHY profile, TDD, and single-input single-output (SISO) operation. The center frequency is 3.5 GHz, and the FFT sizes are 512, 1024, and 2048. The CP lengths are 1/32, 1/16, 1/8, and 1/4 of the FFT size. We consider the PUSC permutation in CP detection simulation and use segment 0 to allocate data subcarriers, but in bandwidth detection simulation and analysis, we consider use of all subchannels with no pilot subcarriers and no preambles for convenience.
The modulation could be QPSK, 16-QAM, or 64-QAM with randomly generated data. Other parameter values are as specified in Table 4.1.
Table 4.1: System Parameters Used in Our Study
Parameters | Values
System Channel Bandwidth (MHz) | 5 | 10 | 20
Sampling Frequency (MHz) | 5.6 | 11.2 | 22.4
FFT Size | 512 | 1024 | 2048
Subcarrier Spacing (kHz) | 10.94 (all bandwidths)
Useful Symbol Time (µsec) | 91.4 (all bandwidths)
We consider the S-OFDMA system, that is, the FFT size is proportional to the allocated bandwidth. Therefore, once the FFT size is detected, the bandwidth employed by the BS is known, and vice versa. We thus only look at bandwidth and CP detection techniques in this research.
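The derived quantities in Table 4.1 follow directly from the sampling frequency and FFT size. The Python sketch below recomputes them, together with the CP lengths in samples for the four CP ratios; the names used are hypothetical and serve only as an illustration.

```python
# Recompute the derived S-OFDMA parameters of Table 4.1.
# Names are illustrative; values are taken from the table and the text.

# bandwidth (MHz) -> (sampling frequency in Hz, FFT size)
PROFILES = {5: (5_600_000, 512), 10: (11_200_000, 1024), 20: (22_400_000, 2048)}
CP_DENOMINATORS = (32, 16, 8, 4)   # CP ratios 1/32, 1/16, 1/8, 1/4


def derived_parameters(bandwidth_mhz):
    """Return (subcarrier spacing in Hz, useful time in s, CP lengths in samples)."""
    fs, nfft = PROFILES[bandwidth_mhz]
    spacing = fs / nfft
    useful_time = 1.0 / spacing
    cp_samples = [nfft // d for d in CP_DENOMINATORS]
    return spacing, useful_time, cp_samples


spacing_20, useful_20, cp_20 = derived_parameters(20)
```

For all three bandwidths the subcarrier spacing is 10937.5 Hz and the useful symbol time is about 91.4 µs, which is the scalable property that lets the FFT size be inferred from the bandwidth.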
4.2 Bandwidth Detection
As mentioned above, the MS should detect the channel bandwidth employed by the BS at initial network entry. It may be 5, 10, or 20 MHz. Because of the use of S-OFDMA, the subcarrier spacing is fixed no matter which bandwidth is used. We consider detecting the bandwidth structure in a specific 20 MHz band where multiple systems that use different bandwidths may coexist and the CFO is assumed to be zero. The system profile in IEEE 802.16e [2] specifies some rules for the center frequency of a system of a certain bandwidth. In our research, we assume the center frequencies of systems whose bandwidth is 5 MHz are multiples of 2.5 MHz away from 3.5 GHz, and the center frequencies of systems whose bandwidth is 10 MHz are multiples of 5 MHz away from 3.5 GHz. Figure 4.1 shows all possible bandwidth distributions in the 20 MHz band, and detecting which bandwidth distribution is present in the 20 MHz band is our major work. The following are the steps of the bandwidth detection method that we use:
1. Receive signals with high sampling frequency (22.4 MHz).
2. After each 2048 points, do FFT to convert the signal to the frequency domain.
3. Calculate the power spectrum.
4. Divide the power spectrum by the mean power. This normalizes the final mean power to one.
5. Decide bandwidth distribution.
Figure 4.1: Possible channel bandwidth distributions with PUSC permutation. All 49 possibilities are listed on the right.
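The following Python sketch illustrates two parts of this procedure under stated assumptions. First, it enumerates candidate bandwidth distributions on a 2.5 MHz grid, assuming that systems do not overlap and that at least one system is present; under these assumptions the count is 49, which agrees with the number quoted for Figure 4.1, although whether the thesis enumerates them in exactly this way is an assumption. Second, it carries out the FFT, power spectrum, and normalization steps for one block of samples using a plain radix-2 FFT. The decision step is omitted because the distance measures are introduced later. All names are hypothetical.

```python
# Sketch: candidate enumeration and normalized power spectrum.
# Assumptions: non-overlapping systems on a 2.5 MHz grid, at least one system.
import cmath

SLOT_MHZ = 2.5
N_SLOTS = 8   # 20 MHz / 2.5 MHz


def enumerate_distributions(pos=0):
    """Return all placements of systems from slot `pos` on (may be empty)."""
    if pos >= N_SLOTS:
        return [()]
    results = list(enumerate_distributions(pos + 1))   # leave slot unoccupied
    # (bandwidth in MHz, width in slots, allowed start-slot step)
    for bw, width, step in ((5, 2, 1), (10, 4, 2), (20, 8, 8)):
        if pos % step == 0 and pos + width <= N_SLOTS:
            center = (pos + width / 2) * SLOT_MHZ - 10.0   # offset from 3.5 GHz
            for rest in enumerate_distributions(pos + width):
                results.append(((bw, center),) + rest)
    return results


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def normalized_power_spectrum(block):
    """Steps 2 to 4: FFT, power spectrum, division by the mean power."""
    power = [abs(v) ** 2 for v in fft(block)]
    mean_power = sum(power) / len(power)
    return [p / mean_power for p in power]


candidates = [d for d in enumerate_distributions() if d]   # drop the empty band
```

In use, `normalized_power_spectrum` would be applied to each block of 2048 received samples, and the result compared with the expected spectrum of each entry in `candidates`.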
4.2.1 Interference Analysis
Ideally, we would like the result of step 3 to resemble the spectra depicted in Figure 4.1. However, even if the channel is perfect, we cannot obtain the ideal power spectrum shown in Figure 4.1. Interference comes from two sources. The first is that the starting point of each block of 2048 points may not fall in the CP or at the first point of the useful time; that is, the 2048 points may come from two adjacent OFDMA symbols. The proportions of the two adjacent symbols depend on the CP length. The shorter the CP, the higher the probability that the 2048 points include two symbols. Figure 4.2 shows one example and Figure 4.3 shows the resulting power spectrum. The second source has to do with the relation between the center frequency of each system and the sampling rate. The center frequency of each system is a multiple of 2.5 MHz away from 3.5 GHz, but 2.5 MHz is not an integer multiple of the subcarrier spacing (10937.5 Hz). Therefore, when we transform 2048 samples of signals having, say, a 3.5 GHz center frequency into the frequency domain, the fractional subcarrier offset of signals whose center frequencies are not at 3.5 GHz will cause interference. Figure 4.4 illustrates this effect.
Figure 4.2: The section of samples r (i.e., the 2048 points we get) may straddle over two adjacent symbols.
Figure 4.3: Power spectrum of 2048 points of one 20 MHz bandwidth signal with QPSK modulation in perfect channel (λ = 500). [Plot: power versus frequency index; panel title: "lambda = 500, T1 and T2 are 16-QAM in 20 MHz, perfect channel".]
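The fractional subcarrier offset mentioned above can be checked with exact rational arithmetic. The short Python sketch below expresses center-frequency offsets of 2.5, 5, and 7.5 MHz in units of the subcarrier spacing; the names are hypothetical.

```python
# Express center-frequency offsets in units of the subcarrier spacing.
from fractions import Fraction

SUBCARRIER_SPACING_HZ = Fraction(22_400_000, 2048)   # 10937.5 Hz


def offset_in_subcarriers(offset_hz):
    """Return the offset as an exact number of subcarrier spacings."""
    return Fraction(offset_hz) / SUBCARRIER_SPACING_HZ


def fractional_part(x):
    """Fractional part of a non-negative Fraction."""
    return x - (x.numerator // x.denominator)


offsets = {m: offset_in_subcarriers(m * 2_500_000) for m in (1, 2, 3)}
fractions_only = {m: fractional_part(v) for m, v in offsets.items()}
```

An offset of 2.5 MHz equals 1600/7, or about 228.57, subcarrier spacings, so the subcarriers of such a system fall between the FFT bins of a receiver centered at 3.5 GHz, with fractional offsets of 4/7, 1/7, and 5/7 of a bin for 2.5, 5, and 7.5 MHz, respectively.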
Consider the case of a single 20 MHz system with center frequency at 3.5 GHz and with 1/8 CP. Let the FFT start point be at the λth point of one OFDMA symbol. That is, the 2048 points we get include parts of two adjacent symbols, where the first (2048 + CP length) − λ points are from the first symbol and the last λ − CP length points are from the second symbol. (See Figure 4.2.) Let r(n) denote the signal samples and let the two adjacent symbols (without CP) be denoted t1(n) and t2(n), where n ∈ ℕ and 0 ≤ n ≤ 2047. Let their frequency spectra be denoted R(k), T1(k), and T2(k), respectively, where k ∈ ℕ and 0 ≤ k ≤ 2047. Because the first (2048 + CP length) − λ points of r are the last (2048 + CP length) − λ points of t1, the first part of r is t1 left circular shifted by λ − CP length points or, equivalently, right circular shifted by