Research in and DSP Implementation of Channel Estimation Techniques for IEEE 802.16e OFDMA Uplink and Downlink


National Chiao Tung University, Department of Electronics Engineering and Institute of Electronics. Master's Thesis. Research in and DSP Implementation of Channel Estimation Techniques for IEEE 802.16e OFDMA Uplink and Downlink. Student: Yi-Ling Wang. Advisor: Dr. David W. Lin. June 2007.

Research in and DSP Implementation of Channel Estimation Techniques for IEEE 802.16e OFDMA Uplink and Downlink. Student: Yi-Ling Wang. Advisor: Dr. David W. Lin. A Thesis Submitted to Department of Electronics Engineering & Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Electronics Engineering. June 2007. Hsinchu, Taiwan, Republic of China.

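Research in and DSP Implementation of Channel Estimation Techniques for IEEE 802.16e OFDMA Uplink and Downlink. Student: Yi-Ling Wang. Advisor: Dr. David W. Lin. Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University.

Abstract (translated from the Chinese)

Orthogonal frequency division multiple access (OFDMA) has recently drawn wide attention in mobile environments and has been applied in many digital communication applications. One of the main reasons for adopting OFDMA is its robustness against frequency-selective fading and narrowband interference. We focus on channel estimation for IEEE 802.16e OFDMA uplink and downlink transmission. We also implement the channel estimation schemes on a Texas Instruments TMS320C6416 digital signal processor housed on a Sundance board. Channel estimation can be roughly divided into three stages. First, we use a least-squares estimator to estimate the channel frequency response at the pilots, for the sake of computational convenience in hardware. Second, we use linear interpolation in the frequency domain to obtain the channel response at the data subcarriers. Finally, we use a time-averaging technique in the time domain to improve performance. We first verify our simulation model on the AWGN channel and then simulate on the multipath SUI-2 and SUI-3 channels. For uplink transmission, we propose tile linear interpolation; for downlink transmission, we propose 2-point, 4-point, and advanced 4-point cluster linear interpolation. To improve execution efficiency on the DSP, we first convert the original floating-point C program to a fixed-point version and then further modify the program in light of the DSP's features.

In this thesis, we first briefly introduce the IEEE 802.16e OFDMA uplink and downlink standard and the DSP implementation environment. We then present, for each transmission case, the channel estimation methods used and discuss their estimation performance and the experimental results of the DSP implementation.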

Research in and DSP Implementation of Channel Estimation Techniques for IEEE 802.16e OFDMA Uplink and Downlink. Student: Yi-Ling Wang. Advisor: Dr. David W. Lin. Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University.

Abstract

The orthogonal frequency division multiple access (OFDMA) technique has drawn much interest recently in the mobile transmission environment and has been successfully applied to a wide variety of digital communications applications over the past several years. One of the main reasons to use OFDMA is its robustness against frequency-selective fading and narrowband interference. We focus on OFDMA uplink and downlink channel estimation based on IEEE 802.16e. We also implement these channel estimation schemes on Texas Instruments' TMS320C6416 digital signal processor (DSP) housed on a Sundance board. The channel estimation schemes can be separated into three steps. First, we use a least-squares (LS) estimator on the pilot subcarriers because of its low computational complexity. Second, we estimate the channel response on the data subcarriers using linear interpolation in the frequency domain. Finally, we apply a time-averaging technique in the time domain to improve the performance. We verify our simulation model on the AWGN channel and then simulate on the SUI-2 and SUI-3 multipath channels. For uplink transmission, we propose tile linear interpolation; for downlink transmission, we use 2-point, 4-point, and advanced 4-point cluster linear interpolation. To increase the efficiency on the DSP, we rewrite the floating-point C program as a fixed-point version and further refine our code by considering the features of the DSP chip.

In this thesis, we first introduce the IEEE 802.16e OFDMA uplink and downlink standard and the DSP implementation environment. Then we describe the channel estimation methods we use and discuss the performance and the DSP implementation results in each transmission condition.

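Acknowledgments

Many people deserve thanks for the successful completion of this thesis. First and foremost, I thank my advisor, Professor David W. Lin (林大衛), for his guidance and patience over the past two years. Whenever I ran into difficulties, he carefully pointed me in a suitable direction for solving the problem. Being his student is truly a blessing earned in a past life and the greatest honor of my life. I also sincerely thank all members of the Communication Electronics and Signal Processing Laboratory, including the faculty, my classmates, and my senior and junior colleagues. Special thanks go to my seniors 洪崑健 and 吳俊榮 for their generous guidance and advice on my studies and research, and to my classmates 耀鈞, 柏昇, 政達, 介遠, 順成, and others for their care and help over these two years. Finally, I thank my parents. My family's support and encouragement have been my greatest comfort throughout my studies, and my gratitude to them is beyond words. Lastly, I sincerely thank everyone who has helped and cared for me.

Yi-Ling Wang, Hsinchu, July 2007.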

Contents

1 Introduction
2 Introduction to IEEE 802.16e OFDMA
  2.1 Overview of OFDMA [3], [4]
    2.1.1 Cyclic Prefix
    2.1.2 Discrete Time Baseband Equivalent System Model
  2.2 Basic OFDMA Symbol Structure in IEEE 802.16e
    2.2.1 OFDMA Basic Terms
    2.2.2 Frequency Domain Description
    2.2.3 Primitive Parameters
    2.2.4 Derived Parameters
    2.2.5 Frame Structure
  2.3 Uplink Transmission in IEEE 802.16e OFDMA
    2.3.1 Data Mapping Rules
    2.3.2 Carrier Allocations
    2.3.3 Pilot Modulation
    2.3.4 Data Modulation
  2.4 Downlink Transmission in IEEE 802.16e OFDMA
    2.4.1 Data Mapping Rules
    2.4.2 Preamble Structure and Modulation
    2.4.3 Subcarrier Allocations
    2.4.4 Pilot Modulation
    2.4.5 Data Modulation
3 The DSP Hardware and Associated Software Development Environment
  3.1 The TMS320C6416 DSP [7]
    3.1.1 TMS320C64x Features
    3.1.2 Central Processing Unit
    3.1.3 Memory Architecture
  3.2 The Code Composer Studio Development Tools [9], [10]
  3.3 Code Optimization Methods [12]
    3.3.1 Compiler Optimization Options [9], [10]
    3.3.2 Using Intrinsics
4 Uplink Channel Estimation and DSP Implementation
  4.1 Channel Estimation Techniques
    4.1.1 The Least-Squares (LS) Estimator
    4.1.2 Linear Interpolation
    4.1.3 Time Averaging
    4.1.4 Application to IEEE 802.16e OFDMA Uplink
  4.2 Simulation Parameters and Channel Model
    4.2.1 OFDMA Uplink System Parameters
    4.2.2 Simulation Channel Model
  4.3 Simulation Results
    4.3.1 Simulation Flow
    4.3.2 Validation of Simulation Model
    4.3.3 Floating-Point Simulation
  4.4 DSP Implementation
    4.4.1 Fixed-Point Data Formats
    4.4.2 Fixed-Point Simulation
    4.4.3 DSP Computational Load
  4.5 Appendix
5 Downlink Channel Estimation and DSP Implementation
  5.1 System Parameters and Channel Model
  5.2 Channel Estimation Methods
    5.2.1 Two-Point Cluster Linear Interpolation
    5.2.2 Four-Point Cluster Linear Interpolation
    5.2.3 Advanced Four-Point Cluster Linear Interpolation
  5.3 Simulation Results
    5.3.1 Simulation Flow
    5.3.2 Validation with AWGN Channel
    5.3.3 Floating-Point Simulation
    5.3.4 Cluster Analysis
  5.4 DSP Implementation
    5.4.1 Fixed-Point Data Formats
    5.4.2 Fixed-Point Simulation
    5.4.3 DSP Simulation Loading
  5.5 Appendix
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Potential Future Work
Bibliography

List of Figures

2.1 Discrete-time model of the baseband OFDMA system (from [3]).
2.2 OFDMA symbol time structure (from [5]).
2.3 Discrete-time baseband equivalent of an OFDMA system with M users (from [4]).
2.4 Example of the data region which defines the OFDMA allocation (from [5]).
2.5 OFDMA frequency description (from [5]).
2.6 Example of an OFDMA frame (with only mandatory zone) in TDD mode (from [6]).
2.7 Example of mapping OFDMA slots to subchannels and symbols in the uplink (from [6]).
2.8 Description of an uplink tile (from [5]).
2.9 PRBS generator for pilot modulation (from [5] and [6]).
2.10 QPSK, 16-QAM, and 64-QAM constellations (from [5]).
2.11 Example of mapping OFDMA slots to subchannels and symbols in the downlink in PUSC mode (from [6]).
2.12 Downlink transmission basic structure (from [5]).
2.13 Cluster structure (from [6]).
3.1 The DSP on the Sundance board.
3.2 Block diagram of the TMS320C6416 DSP [7].
3.3 Pipeline phases of TMS320C6416 DSP [7].
3.4 TMS320C64x CPU data paths [7].
3.5 Code development flow for TI C6000 DSP [12].
4.1 Tile structure.
4.2 Block diagram of the simulated system.
4.3 The SER curve for uncoded QPSK resulting from simulation matches the theoretical one.
4.4 Tile linear interpolation with different exponential weighting in AWGN with QPSK. (a) MSE. (b) SER.
4.5 Tile linear interpolation with different exponential weighting in SUI-2 with velocity v=60 km/hr with QPSK. (a) MSE. (b) SER.
4.6 Tile linear interpolation of exponential weighting 0.9 with different velocities in SUI-2 with QPSK. (a) MSE. (b) SER.
4.7 Tile linear interpolation with different modulations in AWGN. (a) MSE. (b) SER.
4.8 Tile linear interpolation compared with theory adding data MSE in AWGN. (a),(b) QPSK. (c),(d) 16QAM. (e),(f) 64QAM.
4.9 Tile linear interpolation with different velocity and different modulations in SUI-2. (a),(b) QPSK. (c),(d) 16QAM. (e),(f) 64QAM.
4.10 Tile linear interpolation with different velocities in SUI-3 with QPSK. (a) MSE. (b) SER.
4.11 Tile linear interpolation with different used subchannels. (a),(b) AWGN. (c),(d) SUI-2.
4.12 Fixed-point data format in our design.
4.13 Fixed-point formats in channel estimation of our design.
4.14 Performance of fixed-point computation of tile linear interpolation (10 used subchannels) compared to floating-point computation. (a),(b) AWGN. (c),(d) SUI-2.
4.15 FIXED.H.
4.16 Function channel estimation FIXED.
4.17 Function pilot extraction FIXED.
4.18 Function interpolation FIXED.
4.19 Assembly code of function channel estimation FIXED.
4.20 Assembly code of function interpolation FIXED.
4.21 Software pipelining information of function channel estimation FIXED.
5.1 Structure of cluster organization in time.
5.2 (a) 2-point cluster linear interpolation illustration; the bold line is our estimation by linear interpolation. (b) Pilot positions are different in even and odd symbols.
5.3 (a) Pilots in previous symbol taken as reference. (b) Four pilot points in cluster. (c) Four-point cluster linear interpolation illustration. Bold line is our estimation by linear interpolation.
5.4 Advanced four-point cluster linear interpolation. (a) First data symbol. (b) Second to (n − 1)th data symbols. (c) Last (nth) data symbol.
5.5 Downlink transmission simulation flow. (a) Preamble. (b) Data symbols.
5.6 The SER curve for uncoded QPSK resulting from simulation matches the theoretical one.
5.7 Two-point cluster linear interpolation with different exponential weighting with QPSK. (a),(b) In AWGN. (c),(d) In SUI-2 with velocity v=60 km/hr.
5.8 Three methods of cluster linear interpolation with different velocity in SUI-2 of QPSK. (a),(b) Two-point with exponential weighting w=0.9. (c),(d) Two-point. (e),(f) Four-point.
5.9 Comparison of all methods we use, including two-point, two-point with exponential weighting w=0.9, four-point, and advanced four-point cluster linear interpolation in AWGN. (a) MSE. (b) SER.
5.10 Comparison of all methods we use, including two-point, two-point with exponential weighting w=0.9, four-point, and advanced four-point cluster linear interpolation in SUI-2 with velocity v=60 km/hr. (a) MSE. (b) SER.
5.11 Advanced four-point linear interpolation with different modulation in AWGN. (a) MSE. (b) SER.
5.12 Advanced four-point cluster linear interpolation compared with theory adding data MSE in AWGN. (a),(b) QPSK. (c),(d) 16QAM. (e),(f) 64QAM.
5.13 Advanced four-point cluster linear interpolation with different velocities and different modulations in SUI-2. (a),(b) QPSK. (c),(d) 16QAM. (e),(f) 64QAM.
5.14 Advanced four-point cluster linear interpolation with different velocities in SUI-3 with QPSK. (a) MSE. (b) SER.
5.15 Advanced four-point cluster linear interpolation considering preamble effect. (a),(b) AWGN. (c),(d) SUI-2.
5.16 Advanced four-point cluster linear interpolation with no preamble in AWGN. (a) MSE. (b) SER.
5.17 (a) MSE and (b) SER over used subcarriers in AWGN at 10 dB SNR.
5.18 Average cluster performance in AWGN at 10 dB SNR. (a) MSE. (b) SER.
5.19 Average cluster performance in AWGN at 10 dB SNR. (a) MSE. (b) SER.
5.20 Average cluster performance in AWGN at 10 dB SNR. (a),(b) Even symbols. (c),(d) Odd symbols.
5.21 Fixed-point preamble transmission formats in our design.
5.22 Fixed-point data transmission formats in our design.
5.23 Fixed-point data formats in preamble estimation of our design.
5.24 Fixed-point data formats in channel estimation of our design.
5.25 Fixed-point computation of advanced four-point cluster linear interpolation in AWGN. (a) MSE. (b) SER.
5.26 Fixed-point computation of advanced four-point cluster linear interpolation in SUI-2. (a) MSE. (b) SER.
5.27 FIXED.H.
5.28 Function preamble estimation FIXED.
5.29 Function channel estimation FIXED.
5.30 Function pilot extraction FIXED.
5.31 Function interpolation FIXED.
5.32 Function interpolation FIXED (cont.).
5.33 Function interpolation FIXED (cont.).
5.34 Assembly code of function preamble estimation FIXED.
5.35 Assembly code of function channel estimation FIXED.
5.36 Assembly code of function pilot extraction FIXED.
5.37 Assembly code of function interpolation FIXED.
5.38 Software pipelining information of function preamble estimation FIXED.
5.39 Software pipelining information of function channel estimation FIXED.

List of Tables

2.1 OFDMA Uplink Subcarrier Allocations [5], [6]
2.2 OFDMA Downlink Subcarrier Allocation under PUSC [5], [6]
3.1 Execution Stage Length Description for Each Instruction Type [7]
3.2 Functional Units and Operations Performed [7]
3.3 Functional Units and Operations Performed (Continued) [7]
4.1 OFDMA Uplink Parameters
4.2 Channel Profiles of SUI-2 and SUI-3 [16]
4.3 OFDMA Uplink DSP Loading
4.4 OFDMA Uplink Efficiency Performance Comparison
5.1 OFDMA Downlink Parameters
5.2 OFDMA DL DSP Loading for Channel Estimation in 2048-FFT, BW: 20 MHz

Chapter 1 Introduction

Orthogonal frequency division multiple access (OFDMA) has emerged as one of the prime multiple access schemes for broadband wireless networks (e.g., IEEE 802.16 Mobile WiMAX and DVB-RCA). As a special case of multicarrier multiple access schemes, OFDMA exclusively assigns each subchannel to only one user, eliminating the intra-cell interference (ICI). For fixed or portable applications where the frequency-selective channels are slowly varying, an intrinsic advantage of OFDMA is its capability to exploit the so-called multiuser diversity embedded in multipath channels. Furthermore, OFDMA has the merit of easy decoding at the receiver side due to the absence of ICI. Other advantages of OFDMA include finer granularity and better link budget [1]. An OFDMA signal can be easily generated using an inverse fast Fourier transform (IFFT) and received using a fast Fourier transform (FFT).

The IEEE 802.16 standard committee has developed a group of standards for wireless metropolitan area networks (MANs). OFDMA is used in the 2 to 11 GHz Fixed Wireless Access (FWA) systems. The committee has developed IEEE Standard 802.16-2004 for broadband wireless access systems, which provides a variety of services to fixed outdoor as well as nomadic indoor users. IEEE 802.16e is designed to support terminal mobility, and it currently aims to serve terminals moving at speeds of up to 120 km/hr [2].

This thesis focuses on channel estimation for WirelessMAN-OFDMA in both uplink and downlink transmission. It is organized as follows. Chapter 2 introduces some OFDMA basics and the IEEE 802.16e OFDMA uplink and downlink standard. Chapter 3 describes the implementation platform, which consists of Texas Instruments' TMS320C6416 digital signal processor (DSP) on a Sundance carrier board. Chapter 4 introduces the channel estimation techniques and discusses their performance in uplink transmission, along with some DSP implementation issues. Chapter 5 proposes several methods for the downlink, compares their performance, and also discusses DSP implementation issues. Finally, Chapter 6 gives the conclusion and some potential future work.

Chapter 2 Introduction to IEEE 802.16e OFDMA

We first give the basic concepts of the OFDMA technique for multicarrier modulation. The downlink and uplink specifications of IEEE 802.16e are introduced afterward.

2.1 Overview of OFDMA [3], [4]

Orthogonal frequency-division multiple access (OFDMA) is being considered as the multiple access scheme for future wireless systems, e.g., WiMAX or fourth-generation (4G) broadband wireless networks. In an OFDMA system, several users simultaneously transmit their data by modulating exclusive sets of orthogonal subcarriers, so each user's signal can be separated easily in the frequency domain. One typical structure is subband OFDMA, which divides all available subcarriers into a number of subbands. Each user is allowed to use one available subband for data transmission. Pilot symbols are employed for the estimation of channel state information (CSI) within the subband. Robustness to narrowband interference and dynamic channel assignment are two other advantages of OFDMA systems. Figure 2.1 shows an OFDMA network in which active users simultaneously communicate with the base station (BS).

Figure 2.1: Discrete-time model of the baseband OFDMA system (from [3]).

2.1.1 Cyclic Prefix

The cyclic prefix (CP) is used to overcome the intersymbol and interchannel interference problems. The multiuser channel is assumed to be substantially invariant within one block (or symbol) duration, and the symbol timing mismatch is assumed to be smaller than the CP duration. In this scenario, users do not interfere with each other in the frequency domain. The CP is a copy of the last $T_g$ of the useful symbol period (see Fig. 2.2). It is used to collect multipath while maintaining the orthogonality of the tones. However, the transmitter energy increases with the length of the guard time while the receiver energy remains the same (the cyclic extension is discarded), so there is a loss of $10\log_{10}\bigl(1 - T_g/(T_b + T_g)\bigr)$ dB in $E_b/N_0$.
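As a rough numerical check of the loss expression above, the following Python sketch evaluates it for the four CP ratios G = Tg/Tb listed in Section 2.2.3. The function name `cp_loss_db` is hypothetical and introduced only for illustration. Since Tg/(Tb + Tg) = G/(1 + G), the expression depends on G alone, and the value it returns is negative because it describes a loss.

```python
import math

def cp_loss_db(g: float) -> float:
    """Eb/N0 change in dB due to a cyclic prefix with ratio g = Tg/Tb.

    Uses 10*log10(1 - Tg/(Tb + Tg)) with Tg/(Tb + Tg) = g/(1 + g).
    The result is negative (a loss) for any g > 0.
    """
    return 10.0 * math.log10(1.0 - g / (1.0 + g))

# CP ratios that the standard requires, as listed in Section 2.2.3.
for g in (1 / 32, 1 / 16, 1 / 8, 1 / 4):
    print(f"G = {g:.5f}: {cp_loss_db(g):.3f} dB")
```

A longer CP tolerates a longer channel impulse response but costs more energy, which is the trade-off the paragraph describes.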

Figure 2.2: OFDMA symbol time structure (from [5]).

2.1.2 Discrete Time Baseband Equivalent System Model

The material in this subsection is mainly taken from [4]. Consider an OFDMA system with $M$ active users sharing a bandwidth of $B = 1/T$ Hz ($T$ is the sampling period), as shown in Fig. 2.3. The system consists of $K$ subcarriers, of which $K_u$ are useful subcarriers (excluding guard bands and the DC subcarrier). The users are allocated non-overlapping subcarriers in the spectrum depending on their needs. The discrete-time baseband channel consists of $L$ multipath components and has the form

\[ h(l) = \sum_{m=0}^{L-1} h_m \, \delta(l - l_m) \tag{2.1} \]

where $h_m$ is a zero-mean complex Gaussian random variable with $E[h_i h_j^*] = 0$ for $i \neq j$. In the frequency domain,

\[ \mathbf{H} = \mathbf{F}\mathbf{h} \tag{2.2} \]

where $\mathbf{H} = [H_0, H_1, \ldots, H_{K-1}]^T$, $\mathbf{h} = [h_0, \ldots, h_{L-1}, 0, \ldots, 0]^T$, and $\mathbf{F}$ is the $K$-point DFT matrix. The impulse response length $l_{L-1}$ is upper bounded by the length of the CP ($L_{cp}$). The received signal in the frequency domain is given by

\[ \mathbf{Y}_n = \sum_{i=1}^{M} \mathbf{X}_{i,n} \mathbf{H}_{i,n} + \mathbf{V}_n \tag{2.3} \]

Figure 2.3: Discrete-time baseband equivalent of an OFDMA system with M users (from [4]).

where $\mathbf{X}_{i,n} = \mathrm{diag}(X_{i,n,0}, \ldots, X_{i,n,K-1})$ is the $K \times K$ diagonal data matrix and $\mathbf{H}_{i,n}$ is the $K \times 1$ channel vector (2.2) corresponding to the $i$th user in the $n$th symbol. The noise vector $\mathbf{V}_n$ is distributed as $\mathcal{CN}(0, \sigma^2 \mathbf{I}_K)$.

2.2 Basic OFDMA Symbol Structure in IEEE 802.16e

The WirelessMAN-OFDMA PHY, based on OFDM modulation, is designed for non-line-of-sight (NLOS) operation in frequency bands below 11 GHz. For licensed bands, the channel bandwidths allowed shall be limited to the regulatory provisioned bandwidth divided by any power of 2, and shall be no less than 1.0 MHz. The material is mainly taken from [5] and [6].
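Before turning to the symbol structure in detail, the frequency-domain model (2.1) to (2.3) of Section 2.1.2 can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: the toy sizes (K = 8, M = 2), the tap values and delays, the unnormalized DFT convention, and the helper names `dft`, `channel_response`, and `received_symbol`. Because each diagonal matrix X_{i,n} is stored as a plain list, the matrix product reduces to an element-wise product per subcarrier.

```python
import cmath
import random

def dft(x):
    """Unnormalized K-point DFT, i.e. the product H = F h in (2.2)."""
    K = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / K) for l in range(K))
            for k in range(K)]

def channel_response(taps, delays, K):
    """Place the taps h_m at delays l_m as in (2.1), zero-pad to K, and transform."""
    h = [0j] * K
    for h_m, l_m in zip(taps, delays):
        h[l_m] += h_m
    return dft(h)

def received_symbol(X, H, noise_std, rng):
    """Y_n = sum_i X_{i,n} H_{i,n} + V_n as in (2.3), one subcarrier at a time."""
    K = len(H[0])
    s = noise_std / 2 ** 0.5  # per-dimension std so that E|V|^2 = noise_std**2
    Y = []
    for k in range(K):
        v = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        Y.append(sum(X[i][k] * H[i][k] for i in range(len(X))) + v)
    return Y

# Hypothetical toy setup: 8 subcarriers, 2 users on non-overlapping subcarriers.
K = 8
H = [channel_response([1.0, 0.5j], [0, 2], K),
     channel_response([0.8, -0.3], [0, 1], K)]
X = [[1, -1, 1, -1, 0, 0, 0, 0],   # user 1 occupies subcarriers 0..3
     [0, 0, 0, 0, 1, 1, -1, -1]]   # user 2 occupies subcarriers 4..7
Y = received_symbol(X, H, noise_std=0.0, rng=random.Random(0))
```

With non-overlapping allocations, each received subcarrier carries exactly one user's symbol scaled by that user's channel, which is why the users can be separated in the frequency domain.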

2.2.1 OFDMA Basic Terms

We introduce some basic terms that appear in the OFDMA PHY. These definitions help in understanding the concepts of subcarrier allocation and transmission in IEEE 802.16e OFDMA.

• Slot: A slot in the OFDMA PHY is a two-dimensional entity spanning both a time and a subchannel dimension. It is the minimum possible data allocation unit. For downlink (DL) PUSC (Partial Usage of SubChannels), one slot is one subchannel by two OFDMA symbols. For uplink (UL), one slot is one subchannel by three OFDMA symbols.
• Data Region: In OFDMA, a data region is a two-dimensional allocation of a group of contiguous subchannels in a group of contiguous OFDMA symbols. All the allocations refer to logical subchannels. A two-dimensional allocation may be visualized as a rectangle, such as the 4 × 3 rectangle shown in Fig. 2.4.
• Segment: A segment is a subdivision of the set of available OFDMA subchannels (that may include all available subchannels). One segment is used for deploying a single instance of the MAC.

2.2.2 Frequency Domain Description

An OFDMA symbol (see Fig. 2.5) is made up of subcarriers, the number of which determines the FFT size used. There are several subcarrier types:

• Data subcarriers: For data transmission.
• Pilot subcarriers: For various estimation purposes.

Figure 2.4: Example of the data region which defines the OFDMA allocation (from [5]).

Figure 2.5: OFDMA frequency description (from [5]).

• Null subcarriers: No transmission at all, for guard bands and the DC subcarrier.

2.2.3 Primitive Parameters

Four primitive parameters characterize the OFDMA symbols:

• $BW$: The nominal channel bandwidth.
• $N_{used}$: Number of used subcarriers (which includes the DC subcarrier).
• $n$: Sampling factor. This parameter, in conjunction with $BW$ and $N_{used}$, determines the subcarrier spacing and the useful symbol time. Its value is set as follows: for channel bandwidths that are a multiple of 1.75 MHz, $n = 8/7$; else for channel bandwidths that are a multiple of any of 1.25, 1.5, 2, or 2.75 MHz, $n = 28/25$; else for channel bandwidths not otherwise specified, $n = 8/7$.
• $G$: The ratio of CP time to "useful" time, i.e., $T_g/T_b$. The following values shall be supported: 1/32, 1/16, 1/8, and 1/4.

2.2.4 Derived Parameters

The following parameters are defined in terms of the primitive parameters.

• $N_{FFT}$: Smallest power of two greater than $N_{used}$.
• Sampling frequency: $F_s = \lfloor n \cdot BW / 8000 \rfloor \times 8000$.
• Subcarrier spacing: $\Delta f = F_s / N_{FFT}$.
• Useful symbol time: $T_b = 1/\Delta f$.
• CP time: $T_g = G \times T_b$.
• OFDMA symbol time: $T_s = T_b + T_g$.
• Sampling time: $T_b / N_{FFT}$.

2.2.5 Frame Structure

When implementing a time-division duplex (TDD) system, the frame structure is built from base station (BS) and subscriber station (SS) transmissions. Each frame begins with a preamble, followed by a DL transmission period and a UL transmission period. In each frame, the TTG (transmit/receive transition gap) and RTG (receive/transmit transition gap) shall be inserted between the downlink and uplink and at the end of the frame, respectively, to allow the BS to turn around. Fig. 2.6 shows an example of an OFDMA frame with only the mandatory zone in TDD mode.
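The derived parameters of Section 2.2.4 can be checked numerically with a short Python sketch. The inputs below are assumptions chosen for illustration: BW = 20 MHz and a 2048-point FFT as named in the title of Table 5.2, Nused = 1681 from Table 2.1, n = 28/25 because 20 MHz is a multiple of 1.25 MHz, and G = 1/8 as one of the four permitted CP ratios. The function name `derived_parameters` is hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that the floor operation is not affected by floating-point rounding.

```python
from fractions import Fraction
import math

def derived_parameters(bw_hz, n_used, n, g):
    """Derived OFDMA parameters from the primitive ones (BW, Nused, n, G)."""
    n_fft = 1 << n_used.bit_length()           # smallest power of two greater than Nused
    fs = math.floor(n * bw_hz / 8000) * 8000   # sampling frequency Fs in Hz
    delta_f = Fraction(fs, n_fft)              # subcarrier spacing in Hz
    t_b = 1 / delta_f                          # useful symbol time Tb in seconds
    t_g = g * t_b                              # CP time Tg in seconds
    return {
        "n_fft": n_fft,
        "fs": fs,
        "delta_f": delta_f,
        "t_b": t_b,
        "t_g": t_g,
        "t_s": t_b + t_g,                      # OFDMA symbol time Ts
        "t_sample": t_b / n_fft,               # sampling time
    }

# Assumed example inputs: 20 MHz bandwidth, Nused = 1681, n = 28/25, G = 1/8.
p = derived_parameters(20_000_000, 1681, Fraction(28, 25), Fraction(1, 8))
for name, value in p.items():
    print(name, float(value))
```

Under these assumptions the sketch gives a 22.4 MHz sampling frequency and a subcarrier spacing of 10937.5 Hz, and the sampling time equals 1/Fs as expected.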

Figure 2.6: Example of an OFDMA frame (with only mandatory zone) in TDD mode (from [6]).

2.3 Uplink Transmission in IEEE 802.16e OFDMA

In this section we briefly introduce the specification of IEEE 802.16e OFDMA uplink transmission. The material is mainly taken from [5] and [6].

2.3.1 Data Mapping Rules

The UL mapping consists of two steps. In the first step, the OFDMA slots allocated to each burst are selected. In the second step, the allocated slots are mapped.

Step 1: Allocate OFDMA slots to bursts.
1) Segment the data into blocks sized to fit into one OFDMA slot.
2) Each slot shall span one or more subchannels in the subchannel axis and one or more OFDMA symbols in the time axis (see Fig. 2.7 for an example). Map the slots such that the lowest numbered slot occupies the lowest numbered subchannel in the lowest numbered OFDMA symbol.
3) Continue the mapping such that the OFDMA symbol index is increased. When the edge of the UL zone is reached, continue the mapping from the lowest numbered OFDMA symbol in the next available subchannel.
4) A UL allocation is created by selecting an integer number of contiguous slots, according to the ordering of steps 1 to 3. This results in the general burst structure shown by the gray area in Fig. 2.7.

Step 2: Map OFDMA slots within the UL allocation.
1) Map the slots such that the lowest numbered slot occupies the lowest numbered subchannel in the lowest numbered OFDMA symbol.
2) Continue the mapping such that the subchannel index is increased. When the last subchannel is reached, continue the mapping from the lowest numbered subchannel in the next OFDMA symbol that belongs to the UL allocation. The resulting order of mapping OFDMA slots to subchannels and OFDMA symbols is shown by the arrows in Fig. 2.7.

2.3.2 Carrier Allocations

The uplink supports 70 subchannels for the 2048-FFT PUSC permutation. Each transmission uses 48 data carriers as the minimal block of processing. Each new transmission for the uplink commences with the parameters given in Table 2.1.

(30) Figure 2.7: Example of mapping OFDMA slots to subchannels and symbols in the uplink (from [6]).. Figure 2.8: Description of an uplink tile (from [5]).. A slot in the uplink is composed of three OFDMA symbols and one subchannel. Within each slot, there are 48 data subcarriers and 24 pilot subcarriers. The subchannel is constructed from six uplink tiles, each having four successive active subcarriers with the configuration as illustrated in Fig. 2.8. The usable subcarriers in the allocated frequency band shall be divided into Ntiles physical. 12.

tiles with parameters from Table 2.1.

Table 2.1: OFDMA Uplink Subcarrier Allocations [5], [6]

Parameter | Value | Notes
Number of DC subcarriers | 1 | Index 1024 (counting from 0)
Nused | 1681 | Number of all subcarriers used within a symbol
Guard subcarriers: left, right | 184, 183 |
TilePermutation | 6, 48, 58, 57, 50, 1, 13, 26, 46, 44, 30, 3, 27, 53, 22, 18, 61, 7, 55, 36, 45, 37, 52, 15, 40, 2, 20, 4, 34, 31, 10, 5, 41, 9, 69, 63, 21, 11, 12, 19, 68, 56, 43, 23, 25, 39, 66, 42, 16, 47, 51, 8, 62, 14, 33, 24, 32, 17, 54, 29, 67, 49, 65, 35, 38, 59, 64, 28, 60, 0 | Used to allocate tiles to subchannels
Nsubchannels | 70 |
Nsubcarriers | 48 |
Ntiles | 420 |
Number of subcarriers per tile | 4 | Number of all subcarriers within a tile
Tiles per subchannel | 6 |

The allocation of physical tiles to logical tiles in subchannels is performed according to

\[ \mathrm{Tiles}(s,n) = N_{\mathrm{subchannels}} \cdot n + \bigl( Pt[(s+n) \bmod N_{\mathrm{subchannels}}] + \mathrm{UL\_PermBase} \bigr) \bmod N_{\mathrm{subchannels}} \]

where:
• Tiles(s, n) is the physical tile index in the FFT with tiles being ordered consecutively from the most negative to the most positive used subcarrier (0 is the starting tile index),
• n is the tile index 0..5 in a subchannel,
• Pt is the tile permutation,

• s is the subchannel number in the range 0...Nsubchannels − 1,
• UL_PermBase is an integer value in the range 0..69, which is assigned by a management entity, and
• Nsubchannels is the number of subchannels for the FFT size given in Table 2.1.

After mapping the physical tiles to logical tiles for each subchannel, the data subcarriers per slot are enumerated by the following process:

1) After allocating the pilot carriers within each tile, indexing of the data subcarriers within each slot is performed starting from the first symbol at the lowest indexed subcarrier of the lowest indexed tile and continuing in an ascending manner through the subcarriers in the same symbol, then going to the next symbol at the lowest indexed data subcarrier, and so on. Data subcarriers shall be indexed from 0 to 47.

2) The mapping of data onto the subcarriers will follow the equation below. This equation calculates the subcarrier index (as assigned in item 1) to which the data constellation point is to be mapped:

\[ \mathrm{Subcarrier}(n,s) = (n + 13 \cdot s) \bmod N_{\mathrm{subcarriers}} \]

where:
• Subcarrier(n, s) is the permuted subcarrier index corresponding to data subcarrier n in subchannel s,
• n is a running index 0..47, indicating the data constellation point,
• s is the subchannel number, and
• Nsubcarriers is the number of subcarriers per slot.
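As an illustration of the two uplink permutation formulas above, the following minimal Python sketch evaluates Tiles(s, n) and Subcarrier(n, s) using the 2048-FFT values of Table 2.1. The function and constant names are hypothetical and chosen only for this example; the sketch is not taken from the standard or from the DSP implementation described later.

```python
# Tile permutation sequence Pt copied from Table 2.1 (2048-FFT uplink PUSC).
TILE_PERMUTATION = [
    6, 48, 58, 57, 50, 1, 13, 26, 46, 44, 30, 3, 27, 53, 22, 18, 61, 7,
    55, 36, 45, 37, 52, 15, 40, 2, 20, 4, 34, 31, 10, 5, 41, 9, 69, 63,
    21, 11, 12, 19, 68, 56, 43, 23, 25, 39, 66, 42, 16, 47, 51, 8, 62,
    14, 33, 24, 32, 17, 54, 29, 67, 49, 65, 35, 38, 59, 64, 28, 60, 0,
]
N_SUBCHANNELS = 70        # Nsubchannels in Table 2.1
TILES_PER_SUBCHANNEL = 6  # tile index n runs over 0..5
N_SUBCARRIERS = 48        # data subcarriers per slot (Nsubcarriers)


def physical_tile(s, n, ul_perm_base):
    """Physical tile index Tiles(s, n) of logical tile n in subchannel s."""
    pt = TILE_PERMUTATION[(s + n) % N_SUBCHANNELS]
    return N_SUBCHANNELS * n + (pt + ul_perm_base) % N_SUBCHANNELS


def data_subcarrier(n, s):
    """Permuted data subcarrier index Subcarrier(n, s) within a slot."""
    return (n + 13 * s) % N_SUBCARRIERS


def all_tiles(ul_perm_base):
    """Physical tiles used by all subchannels (hypothetical helper)."""
    return [physical_tile(s, n, ul_perm_base)
            for s in range(N_SUBCHANNELS)
            for n in range(TILES_PER_SUBCHANNEL)]
```

For any UL_PermBase, the 70 × 6 logical tiles map onto the 420 physical tiles without collision, because Pt is a permutation of 0..69 and the term 70·n places the n-th tile of every subchannel in its own block of 70 tiles.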

Figure 2.9: PRBS generator for pilot modulation (from [5] and [6]).

2.3.3. Pilot Modulation

The PRBS (pseudo-random binary sequence) generator depicted in Fig. 2.9 is used to produce a sequence, w_k. The value of the pilot modulation on subcarrier k shall be derived from w_k. For the mandatory tile structure in the uplink, pilot subcarriers shall be inserted into each data burst in order to constitute the symbol and they shall be modulated according to their subcarrier location within the OFDMA symbol. The pilot subcarriers shall be modulated according to

\[ \Re\{c_k\} = 2\left(\tfrac{1}{2} - w_k\right), \quad \Im\{c_k\} = 0. \tag{2.4} \]

2.3.4. Data Modulation

As shown in Fig. 2.10, the data bits are entered serially to the constellation mapper. Gray-mapped QPSK and Gray-mapped 16QAM shall be supported, whereas the support of 64QAM (also Gray-mapped) is optional.

(34) Figure 2.10: QPSK, 16-QAM, and 64-QAM constellations (from [5]).. 2.4. Downlink Transmission in IEEE 802.16e OFDMA. This section briefly introduces the specifications of IEEE 802.16e OFDMA PUSC downlink transmission. The material is mainly taken from [5] and [6].. 2.4.1. Data Mapping Rules. The downlink data mapping rules are as follows: 1. Segment the data after the modulation block into blocks sized to fit into one OFDMA slot. 2. Each slot shall span one subchannel in the subchannel axis and one or more OFDMA symbols in the time axis, as per the slot definition mentioned before. Map the slots such that the lowest numbered slot occupies the lowest numbered subchannel in the lowest numbered OFDMA symbol. 16.

Figure 2.11: Example of mapping OFDMA slots to subchannels and symbols in the downlink in PUSC mode (from [6]).

3. Continue the mapping such that the OFDMA subchannel index is increased. When the edge of the Data Region is reached, continue the mapping from the lowest numbered OFDMA subchannel in the next available symbol.

Figure 2.11 illustrates the order of OFDMA slots mapping to subchannels and OFDMA symbols.

2.4.2. Preamble Structure and Modulation

The first symbol of the downlink transmission is the preamble. Fig. 2.12 shows a downlink transmission period. There are three types of preamble carrier-sets, which are defined by the allocation of different subcarriers to each of them. The subcarriers are modulated using a boosted BPSK modulation with a specific pseudo-noise (PN) code. The PN series modulating the pilots in the preamble can be found in [5, pp. 553–562].

Figure 2.12: Downlink transmission basic structure (from [5]).

The preamble carrier-sets are defined as

\[ \mathrm{PreambleCarrierSet}_n = n + 3 \cdot k, \tag{2.5} \]

where:
• PreambleCarrierSet_n specifies all subcarriers allocated to the specific preamble,
• n is the number of the preamble carrier-set indexed 0, 1, 2, and
• k is a running index 0, ..., 567.

Each segment uses one type of preamble out of the three sets in the following manner: For the preamble symbol, there will be 172 guard band subcarriers on the left side and the right side of the spectrum. Segment i uses preamble carrier-set i, where i = 0, 1, 2. The DC subcarrier will not be modulated at all and the appropriate PN will be discarded. Therefore, the DC subcarrier shall always be zeroed. The pilots in the downlink preamble shall be modulated as

\[ \Re\{\mathrm{PreamblePilotsModulated}\} = 4 \cdot \sqrt{2} \cdot \left(\tfrac{1}{2} - w_k\right), \quad \Im\{\mathrm{PreamblePilotsModulated}\} = 0. \tag{2.6} \]
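The carrier-set definition (2.5) and the preamble modulation (2.6) can be illustrated with a short Python sketch. The names below are hypothetical, and the sketch assumes that the index n + 3k counts subcarriers within the non-guard part of the spectrum; handling of the DC subcarrier and the actual PN series (see [5]) are left out.

```python
import math

N_PREAMBLE_CARRIERS = 568  # running index k = 0, ..., 567 in (2.5)
N_GUARD_EACH_SIDE = 172    # guard band subcarriers on each side
N_FFT = 2048


def preamble_carrier_set(n):
    """Subcarrier indices n + 3k of preamble carrier-set n (n = 0, 1, 2)."""
    if n not in (0, 1, 2):
        raise ValueError("carrier-set index must be 0, 1, or 2")
    return [n + 3 * k for k in range(N_PREAMBLE_CARRIERS)]


def preamble_pilot_value(w):
    """Real-valued boosted BPSK pilot of (2.6) for a PN bit w in {0, 1}."""
    return 4.0 * math.sqrt(2.0) * (0.5 - w)
```

The three carrier-sets are disjoint and together cover 3 × 568 = 1704 positions; adding the 2 × 172 guard subcarriers gives 2048, which matches the FFT size.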

Figure 2.13: Cluster structure (from [6]).

2.4.3. Subcarrier Allocations

The OFDMA symbol structure is constructed using pilots, data and zero subcarriers. The symbol is first divided into basic clusters and zero carriers are allocated. The pilot tones are allocated first; what remains are data subcarriers, which are divided into subchannels that are used exclusively for data. Pilots and data carriers are allocated within each cluster. Figure 2.13 shows the cluster structure with subcarriers from left to right in order of increasing subcarrier index. For the purpose of determining PUSC pilot location, odd and even symbols are counted from the beginning of the current zone. The first symbol in the zone is even. The preamble shall not be counted as part of the first zone. Table 2.2 summarizes the parameters of the OFDMA PUSC symbol structure.

The allocation of subcarriers to subchannels is performed using the following procedure:

1) Divide the subcarriers into a number (Nclusters) of physical clusters containing 14 adjacent subcarriers each (starting from carrier 0).

2) Renumber the physical clusters into logical clusters using the following formula:

\[ \mathrm{LogicalCluster} = \begin{cases} \mathrm{RenumberingSequence}(\mathrm{PhysicalCluster}), & \text{first DL zone}, \\ \mathrm{RenumberingSequence}\bigl((\mathrm{PhysicalCluster} + 13 \cdot \mathrm{DL\_PermBase}) \bmod N_{\mathrm{clusters}}\bigr), & \text{otherwise}. \end{cases} \]

Table 2.2: OFDMA Downlink Subcarrier Allocation under PUSC [5], [6]

Parameter | Value | Comments
Number of DC subcarriers | 1 | Index 1024 (counting from 0)
Number of guard subcarriers, left | 184 |
Number of guard subcarriers, right | 183 |
Number of used subcarriers (Nused) | 1681 | Number of all subcarriers used within a symbol, including all possible allocated pilots and the DC carrier
Number of subcarriers per cluster | 14 |
Number of clusters | 120 |
Renumbering sequence | 1 | Used to renumber clusters before allocation to subchannels: 6, 108, 37, 81, 31, 100, 42, 116, 32, 107, 30, 93, 54, 78, 10, 75, 50, 111, 58, 106, 23, 105, 16, 117, 39, 95, 7, 115, 25, 119, 53, 71, 22, 98, 28, 79, 17, 63, 27, 72, 29, 86, 5, 101, 49, 104, 9, 68, 1, 73, 36, 74, 43, 62, 20, 84, 52, 64, 34, 60, 66, 48, 97, 21, 91, 40, 102, 56, 92, 47, 90, 33, 114, 18, 70, 15, 110, 51, 118, 46, 83, 45, 76, 57, 99, 35, 67, 55, 85, 59, 113, 11, 82, 38, 88, 19, 77, 3, 87, 12, 89, 26, 65, 41, 109, 44, 69, 8, 61, 13, 96, 14, 103, 2, 80, 24, 112, 4, 94, 0
Number of data subcarriers in each symbol per subchannel | 24 |
Number of subchannels | 60 |
Basic permutation sequence 12 (for 12 subchannels) | 12 | 6, 9, 4, 8, 10, 11, 5, 2, 7, 3, 1, 0
Basic permutation sequence 8 (for 8 subchannels) | 8 | 7, 4, 0, 2, 1, 5, 3, 6

3) Divide the clusters into six major groups. Group 0 includes clusters 0–23, group 1 clusters 24–39, group 2 clusters 40–63, group 3 clusters 64–79, group 4 clusters 80–103 and group 5 clusters 104–119. These groups may be allocated to segments. If a segment is being used, then at least one group shall be allocated to it. (By default group 0 is allocated to segment 0, group 2 to segment 1, and group 4 to segment 2.)

4) Allocate subcarriers to subchannels in each major group separately for each OFDMA symbol by first allocating the pilot subcarriers within each cluster and then taking all remaining data subcarriers within the symbol. The exact partitioning into subchannels is according to the equation below, called a permutation formula:

\[ \mathrm{subcarrier}(k,s) = N_{\mathrm{subchannels}} \cdot n_k + \bigl\{ p_s[n_k \bmod N_{\mathrm{subchannels}}] + \mathrm{DL\_PermBase} \bigr\} \bmod N_{\mathrm{subchannels}} \]

where:
• subcarrier(k, s) is the subcarrier index of subcarrier k in subchannel s,
• s is the index number of a subchannel, from the set [0...Nsubchannels − 1],
• n_k = (k + 13 · s) mod Nsubcarriers, where k is the subcarrier-in-subchannel index from the set [0...Nsubcarriers − 1],
• Nsubchannels is the number of subchannels (for PUSC use the number of subchannels in the currently partitioned group),
• p_s[j] is the series obtained by rotating the basic permutation sequence cyclically to the left s times,
• Nsubcarriers is the number of data subcarriers allocated to a subchannel in each OFDMA symbol, and
• DL_PermBase is an integer from 0 to 31.
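The permutation formula of step 4 can be sketched in Python as follows. This is only an illustrative evaluation of the formula for one major group, using the basic permutation sequences of Table 2.2; the function name and argument layout are assumptions made for this example.

```python
# Basic permutation sequences copied from Table 2.2.
BASIC_PERM_12 = [6, 9, 4, 8, 10, 11, 5, 2, 7, 3, 1, 0]  # groups with 12 subchannels
BASIC_PERM_8 = [7, 4, 0, 2, 1, 5, 3, 6]                 # groups with 8 subchannels
N_SUBCARRIERS = 24  # data subcarriers per subchannel in each OFDMA symbol


def dl_subcarrier(k, s, dl_perm_base, basic_perm):
    """Data subcarrier index subcarrier(k, s) inside one major group.

    The number of subchannels in the group is taken from the length of
    the basic permutation sequence (12 or 8).
    """
    n_subchannels = len(basic_perm)
    n_k = (k + 13 * s) % N_SUBCARRIERS
    shift = s % n_subchannels
    p_s = basic_perm[shift:] + basic_perm[:shift]  # rotate left s times
    return (n_subchannels * n_k
            + (p_s[n_k % n_subchannels] + dl_perm_base) % n_subchannels)
```

Within a group of 12 subchannels, the 12 × 24 pairs (s, k) map one-to-one onto the 288 data subcarriers of the group, so no data subcarrier is shared between subchannels.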

2.4.4. Pilot Modulation

Pilot subcarriers shall be inserted into each data burst in order to constitute the symbol. The PRBS (pseudo-random binary sequence) generator depicted in Fig. 2.9 shall be used to produce a sequence, w_k. Each pilot shall be transmitted with a boosting of 2.5 dB over the average non-boosted power of each data tone. The pilot subcarriers shall be modulated according to

\[ \Re\{c_k\} = \frac{8}{3}\left(\tfrac{1}{2} - w_k\right), \quad \Im\{c_k\} = 0. \tag{2.7} \]

2.4.5. Data Modulation

As shown in Fig. 2.10, for downlink transmission, Gray-mapped QPSK and Gray-mapped 16QAM shall be supported, whereas the support of 64QAM (also Gray-mapped) is optional.
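To make the pilot modulation rules (2.4) and (2.7) concrete, the sketch below maps a PRBS bit w_k to the pilot value in the uplink and in the downlink, and checks that the downlink amplitude of 4/3 corresponds to the stated 2.5 dB boost. It assumes that the average power of a data tone is normalized to one; the PRBS generator of Fig. 2.9 itself is not reproduced, and the function names are hypothetical.

```python
import math


def uplink_pilot(w):
    """Uplink pilot value of (2.4): +1 for w = 0 and -1 for w = 1."""
    return 2.0 * (0.5 - w)


def downlink_pilot(w):
    """Downlink pilot value of (2.7): +4/3 for w = 0 and -4/3 for w = 1."""
    return (8.0 / 3.0) * (0.5 - w)


def boost_db(pilot_value):
    """Pilot power relative to a unit-power data tone, in dB."""
    return 20.0 * math.log10(abs(pilot_value))
```

With these definitions, `boost_db(downlink_pilot(0))` evaluates to about 2.5 dB, while the uplink pilots carry no boost (0 dB).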

Chapter 3

The DSP Hardware and Associated Software Development Environment

DSP implementation is the final goal of our work. The DSP on the Sundance board is the TMS320C6416 made by Texas Instruments (see Fig. 3.1). In this chapter, we introduce the architecture of the DSP chip.

3.1. The TMS320C6416 DSP [7]

3.1.1. TMS320C64x Features

The TMS320C64x DSPs are the highest-performance fixed-point DSP generation of the TMS320C6000 DSP devices, with a performance of up to 600 million instructions per second (MIPS) and an efficient C compiler. The TMS320C64x device is based on the second-generation high-performance, very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI). The C6416 device has two high-performance embedded coprocessors, the Viterbi Decoder Coprocessor (VCP) and the Turbo Decoder Coprocessor (TCP), that significantly speed up channel-decoding operations on-chip. However, they are not used in the work reported in this thesis. The C64x core CPU consists of 64 general-purpose 32-bit registers and 8 functional units;

these 8 functional units contain 2 multipliers and 6 arithmetic units.

Figure 3.1: The DSP on the Sundance board.

C6000 features:
• Advanced VLIW executes up to eight instructions per cycle and allows designers to develop highly effective RISC-like code for fast development time.
• Instruction packing gives code size equivalence for eight instructions executed serially or in parallel and reduces code size, program fetches, and power consumption.
• Conditional execution of all instructions reduces costly branching and increases parallelism for higher sustained performance.
• Efficient code execution on independent functional units, supported by an efficient C compiler (as measured on DSP benchmark suites) and an assembly optimizer for fast development and improved parallelization.
• 8/16/32-bit data support, providing efficient memory support for a variety of applications.
• 40-bit arithmetic options add extra precision for applications requiring it.

(43) • Saturation and normalization provide support for key arithmetic operations. • Field manipulation and instruction extract, set, clear, and bit counting support common operation found in control and data manipulation applications. The additional features of C64x include: • Each multiplier can perform two 16×16 bits or four 8×8 bits multiplies every clock cycle. • Quad 8-bit and dual 16-bit instruction set extensions with data flow support. • Support for non-aligned 32-bit (word) and 64-bit (double word) memory accesses. • Special communication-specific instructions have been added to address common operations in error-correcting codes. • Bit count and rotate hardware extends support for bit-level algorithms.. 3.1.2. Central Processing Unit. The block diagram of the C6416 DSP is shown in the Fig. 3.2. The C64x CPU, shaded in figure, contains: • Program fetch unit. • Instruction dispatch unit. • Instruction decode unit. • Two data paths, each with four functional units. • 64 32-bit registers. 25.

(44) Figure 3.2: Block diagram of the TMS320C6416 DSP [7].. • Control registers. • Control logic. • Test, emulation, and interrupt logic. The program fetch, instruction dispatch, and instruction decode units can deliver up to eight 32-bit instructions to the functional units every CPU clock cycle. The processing of instructions occurs in each of the two data paths (A and B), each of which contains four functional units (.L, .S, .M, and .D) and 32 32-bit general-purpose registers for the C6416. 3.1.2.1. Pipeline Structure. The TMS320C64x DSP pipeline provides flexibility to simplify programming and improve performance. The pipeline can dispatch eight parallel instructions every cycle. The pipeline phases are divided into three stages as shown in Fig. 3.3. 26.

Figure 3.3: Pipeline phases of TMS320C6416 DSP [7].

• Fetch has four phases:
– PG (program address generate): The address of the fetch packet is determined.
– PS (program address send): The address of the fetch packet is sent to memory.
– PW (program access ready wait): A program memory access is performed.
– PR (program fetch packet receive): The fetch packet is at the CPU boundary.
• Decode has two phases:
– DP (instruction dispatch): The next execute packet in the fetch packet is determined and sent to the appropriate functional units to be decoded.
– DC (instruction decode): Instructions are decoded in functional units.
• Execute has five phases:
– E1: Execute 1.
– E2: Execute 2.
– E3: Execute 3.
– E4: Execute 4.
– E5: Execute 5.

The pipeline operation of the C62x/C64x instructions can be categorized into seven instruction types. Six of these are shown in Table 3.1, which gives a mapping of operations

(46) Table 3.1: Execution Stage Length Description for Each Instruction Type [7]. occurring in each execution phase for the different instruction types. The delay slots associated with each instruction type are listed in the bottom row. The execution of instructions can be defined in terms of delay slots. A delay slot is a CPU cycle that occurs after the first execution phase (E1) of an instruction. Results from instructions with delay slots are not available until the end of the last delay slot. For example, a multiply instruction has one delay slot, which means that one CPU cycle elapses before the results of the multiply are available for use by a subsequent instruction. However, results are available from other instructions finishing execution during the same CPU cycle in which the multiply is in a delay slot. 3.1.2.2. Functional Units. The eight functional units in the C6000 data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the. 28.

other data path. The functional units are described in Table 3.2 and Table 3.3. Besides being able to perform 32-bit operations, the C64x also contains many 8-bit and 16-bit extensions to the instruction set. For example, the MPYU4 instruction performs four 8×8 unsigned multiplies with a single instruction on an .M unit. The ADD4 instruction performs four 8-bit additions with a single instruction on an .L unit.

The data lines in the CPU support 32-bit operands, long (40-bit) operands, and double-word (64-bit) operands. Each functional unit has its own 32-bit write port into a general-purpose register file (shown in Fig. 3.4). All units ending in 1 (for example, .L1) write to register file A, and all units ending in 2 write to register file B. Each functional unit has two 32-bit read ports for source operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads. Because each unit has its own 32-bit write port, when performing 32-bit operations all eight units can be used in parallel every cycle.

3.1.3. Memory Architecture

The C64x has a 32-bit, byte-addressable address space. Internal (on-chip) memory is organized in separate data and program spaces. When off-chip memory is used, these spaces are unified on most devices to a single memory space via the external memory interface (EMIF). The C64x has two 64-bit internal ports to access internal data memory and a single internal port to access internal program memory, with an instruction-fetch width of 256 bits.

A variety of memory options are available for the C6000 platform. In our system, the memory types we can use are:
• On-chip RAM, up to 875 KB.
• Program cache.

Table 3.2: Functional Units and Operations Performed [7].

Table 3.3: Functional Units and Operations Performed (Continued) [7].

Figure 3.4: TMS320C64x CPU data paths [7].

(51) • 32-bit external memory interface supports SDRAM, SBSRAM, SRAM, and other asynchronous memories. • Two-level caches [8]. Level 1 cache is split into program (L1P) and data (L1D) cache. Each L1 cache is 16 KB. Level 2 memory is configurable and can be split into L2 SRAM (addressable on-chip memory) and L2 cache for caching external memory locations. The size of L2 is 1 MB. External memory can be several MB large. The access time depends on the memory technology used but is typically around 100 to 133 MHz. In our system, the external memory usable by the DSP is a 32 MB SDRAM.. 3.2. The Code Composer Studio Development Tools [9], [10]. We now introduce the software environment used in our work. TI supports a useful GUI development tool set to DSP users for developing and debugging their projects: the Code Composer Studio (CCS). The CCS development tools are a key element of the DSP software and development tools from TI. The fully integrated development environment includes realtime analysis capabilities, easy to use debugger, C/C++ compiler, assembler, linker, editor, visual project manager, simulators, XDS560 and XDS510 emulation drivers and DSP/BIOS support. Some of CCS’s fully integrated host tools include: • Simulators for full devices, CPU only and CPU plus memory for optimal performance. • Integrated visual project manager with source control interface, multi-project support and the ability to handle thousands of project files. • Source code debugger common interface for both simulator and emulator targets: 33.

– C/C++/assembly language support.
– Simple breakpoints.
– Advanced watch window.
– Symbol browser.
• DSP/BIOS host tooling support (configure, real-time analysis and debug).
• Data transfer for real-time data exchange between host and target.
• Profiler to analyze code performance.

CCS also delivers “foundation software” consisting of:
• DSP/BIOS kernel for the TMS320C6000 DSPs.
– Pre-emptive multi-threading.
– Interthread communication.
– Interrupt handling.
• TMS320 DSP Algorithm Standard to enable software reuse.
• Chip Support Libraries (CSL) to simplify device configuration. CSL provides C-program functions to configure and control on-chip peripherals.

TI also provides a set of optimized DSP functions for the TMS320C64x devices: the TMS320C64x digital signal processor library (DSPLIB). This source code library includes C-callable functions (ANSI-C language compatible) for general signal processing mathematical and vector functions [11]. The routines included in the DSP library are organized as follows:

(53) • Adaptive filtering. • Correlation. • FFT. • Filtering and convolution. • Math. • Matrix functions. • Miscellaneous.. 3.3. Code Optimization Methods [12]. The recommended code development flow involves utilizing the C6000 code generation tools to aid in optimization rather than forcing the programmer to code by hand in assembly. This makes the compiler do all the laborious work of instruction selection, parallelizing, pipelining, and register allocation, which simplifies the maintenance of the code, as everything resides in a C framework that is simple to maintain, support, and upgrade. The recommended code development flow for the C6000 involves the phases described in Fig. 3.5. The tutorial section of the Programmer’s Guide [12] focuses on phases 1 and phase 2, and the Guide also instructs the programmer about the tuning stage of phase 3. What is learned is the importance of giving the compiler enough information to fully maximize its potential. An added advantage is that this compiler provides direct feedback on the entire program’s high MIPS areas (loops). Based on this feedback, there are some simple steps the programmer can take to pass complete and better information to the compiler to maximize the compiler performance.. 35.

Figure 3.5: Code development flow for TI C6000 DSP [12].

The following items list the goal for each phase in the software development flow shown in Fig. 3.5.
• Phase 1: Develop the C code without any knowledge of the C6000. Use the C6000 profiling tools to identify any inefficient areas that we might have in the C code. To improve the performance of the code, proceed to phase 2.
• Phase 2: Use the techniques described in [12] to improve the C code. Use the C6000 profiling tools to check its performance. If the code is still not as efficient as we would like it to be, proceed to phase 3.
• Phase 3: Extract the time-critical areas from the C code and rewrite the code in linear assembly. We can use the assembly optimizer to optimize this code.

TI provides high-performance C program optimization tools and does not recommend that the programmer code by hand in assembly. In this thesis, the development flow is stopped at phase 2. We do not optimize the code by writing linear assembly. Coding the program in a high-level language keeps the flexibility of porting to other platforms.

3.3.1. Compiler Optimization Options [9], [10]

The compiler supports several options to optimize the code. The compiler options can be used to optimize code size or execution performance. Our primary concern in this work is the execution performance. Hence we do not care very much about the code size. The easiest way to invoke optimization is to use the cl6x shell program, specifying the -on option on the cl6x command line, where n denotes the level of optimization (0, 1, 2, 3), which controls the type and degree of optimization:
• -o0.

(56) – Performs control-flow-graph simplification. – Allocates variables to registers. – Performs loop rotation. – Eliminates unused code. – Simplifies expressions and statements. – Expands calls to functions declared inline. • -o1. Performs all -o0 optimization, and: – Performs local copy/constant propagation. – Removes unused assignments. – Eliminates local common expressions. • -o2. Performs all -o1 optimizations, and: – Performs software pipelining. – Performs loop optimizations. – Eliminates global common subexpressions. – Eliminates global unused assignments. – Converts array references in loops to incremented pointer form. – Performs loop unrolling. • -o3. Performs all -o2 optimizations, and: – Removes all functions that are never called. – Simplifies functions with return values that are never used. – Inlines calls to small functions. 38.

– Reorders function declarations so that the attributes of called functions are known when the caller is optimized.
– Propagates arguments into function bodies when all calls pass the same value in the same argument position.
– Identifies file-level variable characteristics.

The -o2 level is the default if -o is set without an optimization level.

Program-level optimization can be specified by using the -pm option with the -o3 option. With program-level optimization, all of the source files are compiled into one intermediate file called a module. The module moves through the optimization and code generation passes of the compiler. Because the compiler can see the entire program, it performs several optimizations that are rarely applied during file-level optimization:
• If a particular argument in a function always has the same value, the compiler replaces the argument with the value and passes the value instead of the argument.
• If a return value of a function is never used, the compiler deletes the return code in the function.
• If a function is not called directly or indirectly, the compiler removes the function.

When program-level optimization is selected in Code Composer Studio, options that have been selected to be file-specific are ignored. Program-level optimization is the highest-level optimization option. We use this option to optimize our code.

3.3.2. Using Intrinsics

The C6000 compiler provides intrinsics, which are special functions that map directly to C64x instructions, to optimize the C code performance. All instructions that are not easily

expressed in C code are supported as intrinsics. Intrinsics are specified with a leading underscore (_) and are accessed by calling them as we call a function. A table of TMS320C6000 C/C++ compiler intrinsics can be found in [12].

Chapter 4

Uplink Channel Estimation and DSP Implementation

The aim of our work is the algorithm design and DSP implementation of an IEEE 802.16e OFDMA transmission system. For ease of implementation, we use simple channel estimation techniques such as linear interpolation in the frequency domain and simple improvement methods in the time domain. We evaluate the performance of each channel estimation method mainly by observing the symbol error rate (SER) and the mean square error (MSE).

4.1. Channel Estimation Techniques

Channel estimators in OFDMA systems usually need pilot information as reference. A fading channel requires constant tracking, so pilot information has to be transmitted continuously. In general, the fading channel can be viewed as a two-dimensional (2-D) signal (time and frequency), whose values are sampled at pilot positions. We consider three topics in this section: channel estimation at pilot subcarriers, interpolation schemes, and time-domain improvement methods. More specifically, we use the least-squares (LS) technique to estimate the channel response at pilots, use linear interpolation to estimate the frequency response at nonpilot subcarriers in the frequency

domain, and consider two ways of time-domain improvement, namely simple averaging and exponential averaging. These are discussed separately in the following subsections.

4.1.1. The Least-Squares (LS) Estimator

Based on a priori known data, we can roughly estimate the channel information on pilot carriers by the least-squares (LS) estimator. An LS estimator minimizes the squared error [13]

\[ \| Y - \hat{H}_{LS} X \|^2 \tag{4.1} \]

where Y is the received signal and X is the a priori known pilots, both in the frequency domain and both being N × 1 vectors, where N is the FFT size. \hat{H}_{LS} is an N × N matrix whose values are 0 except at pilot locations m_i, where i = 0, ..., N_p − 1:

\[ \hat{H}_{LS} = \begin{bmatrix} H_{m_0,m_0} & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \cdots & H_{m_1,m_1} & \cdots & 0 & \cdots & 0 \\ 0 & \cdots & 0 & \cdots & H_{m_2,m_2} & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & H_{m_{N_p-1},m_{N_p-1}} \end{bmatrix}. \tag{4.2} \]

Therefore, (4.1) can be rewritten as

\[ \bigl| Y(m) - \hat{H}_{LS}(m) X(m) \bigr|^2, \quad \text{for all } m = m_i. \tag{4.3} \]

Then the estimate at the pilot subcarriers, based on only one observed OFDMA symbol, is given by

\[ \hat{H}_{LS}(m) = \frac{Y(m)}{X(m)} = \frac{X(m)H(m) + N(m)}{X(m)} = H(m) + \frac{N(m)}{X(m)} \tag{4.4} \]

where N(m) is the complex white Gaussian noise on subcarrier m. We collect \hat{H}_{LS}(m) into \hat{H}_{p,LS}, an N_p × 1 vector where N_p is the total number of pilots, as

\[ \hat{H}_{p,LS} = [H_{p,LS}(0) \; H_{p,LS}(1) \; \cdots \; H_{p,LS}(N_p-1)]^T = \left[ \frac{Y_p(0)}{X_p(0)}, \frac{Y_p(1)}{X_p(1)}, \ldots, \frac{Y_p(N_p-1)}{X_p(N_p-1)} \right]^T. \tag{4.5} \]

The LS estimator is the simplest channel estimator one can think of.
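Since (4.4) and (4.5) reduce to one complex division per pilot, the LS estimator is easy to sketch. The following minimal Python example is floating-point and illustrative only (it is not the fixed-point DSP code discussed later); the function name and the list-based data layout are assumptions made for this example.

```python
def ls_pilot_estimates(received, pilots, pilot_indices):
    """LS channel estimates at the pilot subcarriers, as in (4.4)-(4.5).

    received      -- received frequency-domain samples Y(m), indexed by subcarrier
    pilots        -- known transmitted values X(m), indexed by subcarrier
    pilot_indices -- pilot locations m_0, ..., m_{Np-1}
    Returns the list [Y(m_i) / X(m_i)] in the order of pilot_indices.
    """
    return [received[m] / pilots[m] for m in pilot_indices]
```

For BPSK-like pilots with values ±1, the division can be replaced by a sign change, which is one reason the LS estimator is attractive for hardware implementation.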

4.1.2. Linear Interpolation

After obtaining the channel response estimate at the pilot subcarriers, we use interpolation to obtain the response at the rest of the subcarriers. Linear interpolation is a commonly considered scheme due to its low complexity. It interpolates between two known data points. That is, we use the channel information at two pilot subcarriers obtained by the LS estimator to estimate the channel frequency response at the data subcarriers between them. We also use linear extrapolation to estimate the response at the data subcarriers beyond the outermost pilot subcarriers. The channel estimate at data subcarrier k, mL < k < (m + 1)L, using linear interpolation is given by [14]

\[ H_e(k) = H_e(mL + l) = \bigl( H_p(m+1) - H_p(m) \bigr) \frac{l}{L} + H_p(m) \tag{4.6} \]

where H_p(k), k = 0, 1, ..., N_p, are the channel frequency responses at pilot subcarriers, L is the pilot subcarrier spacing, and 0 ≤ l < L.

4.1.3. Time Averaging

We also consider processing the channel information along the time axis to get a better estimate. Averaging several channel responses over a period of time should mitigate the influence of noise.

Coherence time is a statistical measure of the time duration over which the channel impulse response is essentially invariant. It quantifies the similarity of the channel response at different times. The channel can be considered slowly varying if the coherence time is greater than the OFDMA symbol period. The channel may even be assumed to be static over one or several reciprocal Doppler spread intervals.

For example, assume the SS moves at a speed of 60 km/h. The maximum Doppler shift

with a center frequency of 3.5 GHz can be calculated as

\[ f_m = \frac{v}{\lambda} = 194.44 \text{ Hz}. \tag{4.7} \]

The corresponding coherence time is approximately [15]

\[ T_c \approx \frac{9}{16 \pi f_m} = 920.83\ \mu\text{s}. \tag{4.8} \]

Consider an OFDMA system of bandwidth 20 MHz, using a 2048-point FFT and a 256-point cyclic prefix. The symbol period is

\[ \frac{2048 + 256}{\left\lfloor \frac{28}{25} \cdot \frac{20\,\text{M}}{8000} \right\rfloor \times 8000} = 102.86\ \mu\text{s}. \tag{4.9} \]

Hence, the channel response over \lfloor 920.83 / 102.86 \rfloor = 8 symbols can be regarded as static. Thus we may use simple averaging over 3 symbols to reduce the noise effect as

\[ H_{avg}(k) = \frac{H_0^{interp}(k) + H_{-1}^{interp}(k) + H_{-2}^{interp}(k)}{3} \tag{4.10} \]

where H_{-n}^{interp}(k) is the interpolated channel response at the previous nth symbol time.

If the channel remains static over a longer time period, we may use more symbols in the averaging to reduce the noise effect more effectively. But then the storage requirement and the computational complexity both increase. A simple way to take more (or fewer) symbols into the average effectively, yet without the storage and complexity penalty, is exponential averaging:

\[ \tilde{h}_n^{exp}(f) = \begin{cases} w \cdot \tilde{h}_{n-1}^{exp}(f) + (1 - w) \cdot \tilde{h}_n^{interp}(f), & n > 1, \\ \tilde{h}_n^{interp}(f), & n = 1, \end{cases} \tag{4.11} \]

where \tilde{h}_n^{exp}(f) is the estimated channel after exponential averaging at the nth symbol time, \tilde{h}_n^{interp}(f) is the channel response obtained by using only the interpolation discussed before at the nth symbol time, and w is the exponential factor.

Exponential averaging may yield better performance than simple moving average when the channel is very static, but its performance may degrade more significantly than that of

moving average in fading channels. We will compare the performance at different values of $w$ and in different conditions later.

4.1.4 Application to IEEE 802.16e OFDMA Uplink

As described in Chapter 2, uplink transmission uses a tile structure to carry pilot and data information. Fig. 4.1 shows an example of tile transmission. Within a tile, we first estimate the channel response at each pilot position. Second, we interpolate the frequency response at the data subcarriers in symbols 1 and 3 from the estimated pilot responses. Last, we obtain the frequency response of symbol 2 by time averaging the channel response estimates of symbols 1 and 3.

Figure 4.1: Tile structure.

The detailed steps for channel estimation are as follows:
• Estimate the channel response at each pilot location by using the LS technique.
• Use the linear interpolation scheme to get the data subcarrier response in symbols 1 and 3 from the estimated values at the pilot locations.
• Estimate the channel response at the middle symbol, which contains no pilots in a tile, by

averaging the first and third symbols in the time domain as

$$H_{2}^{\text{est}}(f) = \frac{H_{1}^{\text{interp}}(f) + H_{3}^{\text{interp}}(f)}{2}. \tag{4.12}$$

Exponential averaging is an alternative.

4.2 Simulation Parameters and Channel Model

This section gives the parameters and introduces the channel model used in our simulation work.

4.2.1 OFDMA Uplink System Parameters

In Chapter 2, we introduced the primitive and the derived parameters of the system. The system parameters used in our simulation are listed in Table 4.1.

Table 4.1: OFDMA Uplink Parameters

Parameter               Value
Bandwidth               20 MHz
Central frequency       3.5 GHz
N_used                  1681
Sampling factor n       28/25
G                       1/8
N_FFT                   2048
Sampling frequency      22.4 MHz
Subcarrier spacing      10.94 kHz
Useful symbol time      91.43 µs
CP time                 11.43 µs
OFDMA symbol time       102.86 µs
Sampling time           44.64 ns
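The derived entries of Table 4.1 and the coherence-time arithmetic of (4.7) to (4.9) can be checked with a short script. The following is a minimal sketch rather than part of the thesis software: the function names are hypothetical, the speed of light is taken as 3 × 10^8 m/s, and the sampling-frequency rule is the floor expression that appears in (4.9).

```python
import math


def derived_parameters(bandwidth_hz=20_000_000, n_num=28, n_den=25,
                       n_fft=2048, cp_den=8):
    """Derive timing values from the primitive parameters (hypothetical helper)."""
    # Sampling frequency: floor(n * BW / 8000) * 8000, done in integer arithmetic.
    fs = (n_num * bandwidth_hz) // (n_den * 8000) * 8000
    t_useful = n_fft / fs            # useful symbol time in seconds
    t_cp = t_useful / cp_den         # cyclic prefix time for G = 1/cp_den
    return {
        "fs": fs,
        "spacing": fs / n_fft,       # subcarrier spacing in Hz
        "t_useful": t_useful,
        "t_cp": t_cp,
        "t_symbol": t_useful + t_cp,
        "t_sample": 1.0 / fs,
    }


def static_symbols(speed_kmh, carrier_hz, t_symbol, c=3.0e8):
    """Return (max Doppler shift, coherence time, whole symbols per coherence time)."""
    v = speed_kmh / 3.6                   # km/h to m/s
    f_m = v * carrier_hz / c              # f_m = v / lambda, as in (4.7)
    t_c = 9.0 / (16.0 * math.pi * f_m)    # coherence time, as in (4.8)
    return f_m, t_c, math.floor(t_c / t_symbol)


if __name__ == "__main__":
    p = derived_parameters()
    f_m, t_c, n_static = static_symbols(60.0, 3.5e9, p["t_symbol"])
    print(f"fs = {p['fs'] / 1e6:.1f} MHz, symbol time = {p['t_symbol'] * 1e6:.2f} us")
    print(f"f_m = {f_m:.2f} Hz, T_c = {t_c * 1e6:.2f} us, static symbols = {n_static}")
```

Integer arithmetic is used for the sampling frequency so that the floor operation is not affected by floating-point rounding of 28/25.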

4.2.2 Simulation Channel Model

Erceg et al. [16] published a total of six radio channel models for type G2 (i.e., LOS and NLOS) MMDS BWA systems in three terrain categories. The three terrain types in suburban areas are
• A: hilly terrain, heavy tree density,
• C: flat terrain, light tree density, and
• B: between A and C.

The correspondence with the so-called SUI channels is:
• C: SUI-1, SUI-2,
• B: SUI-3, SUI-4, and
• A: SUI-5, SUI-6.

In the above, SUI-1 and SUI-2 are Ricean multipath channels, whereas the other four are from Hari and are Rayleigh multipath channels. The Rayleigh channels are more hostile and exhibit a greater rms delay spread. SUI-2 represents a worst-case link for terrain type C. We employ the SUI-2 and SUI-3 models in our simulation, but we use Rayleigh fading to model all the paths in these channels. The channel characteristics are shown in Table 4.2.

4.3 Simulation Results

4.3.1 Simulation Flow

Figure 4.2 illustrates our simulated system. We assume perfect synchronization and omit it in our simulation. After channel estimation, we calculate the MSE between the real channel

Table 4.2: Channel Profiles of SUI-2 and SUI-3 [16].

and the estimated one, where the average is taken over the subcarriers. The symbol error rate (SER) can also be obtained after demapping.

Figure 4.2: Block diagram of the simulated system.

4.3.2 Validation of Simulation Model

Before considering multipath channels, we simulate an AWGN channel to validate the simulation model. We validate the model by comparing theoretical SER curves with the SER curves resulting from simulation. For an even number of bits per symbol, the SER of rectangular QAM is well approximated by

$$P_s = 4\left(1 - \frac{1}{\sqrt{M}}\right) Q\!\left(\sqrt{\frac{3}{M-1}\,\frac{E_s}{N_0}}\right) \tag{4.13}$$

where
• $M$ = number of symbols in the modulation constellation; for example, $M = 4$ for QPSK, $M = 16$ for 16QAM and $M = 64$ for 64QAM,
• $E_s$ = average symbol energy,
• $N_0$ = noise power spectral density (W/Hz), and
• $Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^{2}/2}\, dt$, $x \ge 0$.
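The theoretical curve of (4.13) is straightforward to evaluate numerically. The sketch below is illustrative only and is not taken from the thesis code: the function names are hypothetical, and the Q function is expressed through the complementary error function via the identity Q(x) = erfc(x/√2)/2.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qam_ser(m, es_n0_db):
    """SER of rectangular M-QAM from (4.13); es_n0_db is Es/N0 in dB."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)              # dB to linear ratio
    arg = math.sqrt(3.0 * es_n0 / (m - 1))
    return 4.0 * (1.0 - 1.0 / math.sqrt(m)) * q_function(arg)


if __name__ == "__main__":
    for m in (4, 16, 64):
        row = ", ".join(f"{qam_ser(m, snr_db):.3e}" for snr_db in (0, 5, 10, 14))
        print(f"M = {m:2d}: {row}")
```

For QPSK ($M = 4$) the expression reduces to $2Q(\sqrt{E_s/N_0})$, which gives a quick sanity check when comparing against a simulated curve.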

[Figure 4.3 plot: Uplink, QPSK, AWGN. SER versus Es/No. Curves: simulated (no estimation error); theory.]

Figure 4.3: The SER curve for uncoded QPSK resulting from simulation matches the theoretical one.

In Figure 4.3, the theoretical symbol error rate (SER) curve versus $E_s/N_0$ for uncoded QPSK is plotted together with the SER curve resulting from the simulation. Here the simulation assumes no channel estimation error. The match validates the simulation model, which is written in C/C++ and run under TI's Code Composer Studio.

4.3.3 Floating-point Simulation

In our simulation, we assume that 10 subchannels are used for transmission. Figure 4.4 shows the performance of tile linear interpolation with different exponential weightings in AWGN. The method with weighting $w = 0.9$ has the best SER and MSE. In SUI-2 at a velocity of 60 km/h, however, it becomes the worst in both SER and MSE (see Fig. 4.5). This is because in a multipath channel such as SUI-2 the channel varies much more violently than in AWGN. This is corroborated by an analysis that, for a given velocity, calculates

the MSE by using the variance of the Bessel function. Therefore, exponential weighting with the previous tiles does not improve the estimate in this case. Figure 4.6 shows the performance of tile linear interpolation with exponential weighting $w = 0.9$ in SUI-2 at different velocities for QPSK. We do not use tile exponential averaging in the following work.

Figure 4.7 illustrates tile linear interpolation with different modulations (uncoded QPSK, 16QAM and 64QAM) in AWGN. We compare the SER of our simulation results with the theoretical curves and the curves for no estimation error (Fig. 4.7(b)). Figure 4.8 compares tile linear interpolation in AWGN with another theoretical curve that takes the MSE at the data subcarriers into consideration. Figure 4.7(a) shows the MSE curves of the three modulation types. The three curves coincide and form a straight line with slope $m = -1$. The MSE is unrelated to the modulation type because the pilots are BPSK modulated in every case and the channel response is interpolated using only the pilot information [17].

The simulation of tile linear interpolation with different velocities and different modulations in SUI-2 is given in Fig. 4.9, and Fig. 4.10 gives the results for QPSK only in SUI-3. Figure 4.11 illustrates tile linear interpolation with different numbers of used subchannels in AWGN and SUI-2. The transmission uses 60 tiles when occupying 10 subchannels and 120 tiles when occupying 20 subchannels.
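The tile linear interpolation estimator and the MSE measure discussed above can be illustrated in miniature. The following sketch is not the thesis implementation. It assumes a tile of 4 subcarriers by 3 symbols with pilots at the two end subcarriers of symbols 1 and 3, unit-power BPSK pilots without boosting, and a flat unit-gain channel for the AWGN case; all function names are hypothetical.

```python
import random


def ls_estimate(rx, pilot):
    """LS estimate at a pilot: received value divided by the known pilot."""
    return rx / pilot


def interpolate_row(h_first, h_last, width=4):
    """Linear interpolation (4.6) between the pilots at the two ends of a tile row."""
    span = width - 1
    return [h_first + (h_last - h_first) * l / span for l in range(width)]


def estimate_tile(rx_pilots, pilots):
    """Estimate a 3-symbol tile; inputs are ((sym1_left, sym1_right), (sym3_left, sym3_right))."""
    rows = [
        interpolate_row(ls_estimate(rx_l, p_l), ls_estimate(rx_r, p_r))
        for (rx_l, rx_r), (p_l, p_r) in zip(rx_pilots, pilots)
    ]
    sym1, sym3 = rows
    sym2 = [(a + b) / 2 for a, b in zip(sym1, sym3)]   # time averaging (4.12)
    return [sym1, sym2, sym3]


def mse(estimate, truth):
    """Mean squared error averaged over all subcarriers and symbols of a tile."""
    errs = [abs(e - t) ** 2
            for est_row, true_row in zip(estimate, truth)
            for e, t in zip(est_row, true_row)]
    return sum(errs) / len(errs)


def simulate_awgn_mse(es_n0_db, n_tiles=2000, seed=1):
    """Average tile MSE on a flat unit-gain channel with complex Gaussian noise (Es = 1)."""
    rng = random.Random(seed)
    sigma = (10.0 ** (-es_n0_db / 10.0) / 2.0) ** 0.5   # noise std per real dimension
    truth = [[1 + 0j] * 4 for _ in range(3)]
    total = 0.0
    for _ in range(n_tiles):
        pilots = tuple(tuple(rng.choice((1, -1)) for _ in range(2)) for _ in range(2))
        rx = tuple(
            tuple(p + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for p in row)
            for row in pilots
        )
        total += mse(estimate_tile(rx, pilots), truth)
    return total / n_tiles


if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        print(snr_db, simulate_awgn_mse(snr_db))
```

Because the LS error at each pilot is the noise sample divided by a unit-magnitude pilot, the estimation error in this sketch scales directly with the noise level. The MSE therefore falls by a factor of 10 for every 10 dB increase in $E_s/N_0$ regardless of the data modulation, which is consistent with the slope of −1 noted above.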

[Figure 4.4 plots: Uplink, QPSK, AWGN. (a) MSE versus Es/No; (b) SER versus Es/No. Curves: tile linear interpolation; + exp weighting w=0.2; + exp weighting w=0.5; + exp weighting w=0.9; no estimation error; theory (SER panel only).]

Figure 4.4: Tile linear interpolation with different exponential weighting in AWGN with QPSK. (a) MSE. (b) SER.

[Figure 4.5 plots: Uplink, QPSK, SUI-2 (v=60). (a) MSE versus Es/No; (b) SER versus Es/No. Curves: tile linear interpolation; + exp weighting w=0.2; + exp weighting w=0.5; + exp weighting w=0.9; no estimation error.]

Figure 4.5: Tile linear interpolation with different exponential weighting in SUI-2 with velocity v=60 km/hr with QPSK. (a) MSE. (b) SER.
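The opposite behavior of the exponential weighting in Figs. 4.4 and 4.5 can be made concrete with the recursion of (4.11). The sketch below is an illustration under simplifying assumptions, not the thesis code: it treats one subcarrier at a time, the function names are hypothetical, and the noise-gain expression $(1-w)/(1+w)$ is the standard steady-state result for white noise on a static channel.

```python
def exponential_average(estimates, w):
    """Apply the recursion (4.11) to a sequence of per-symbol interpolated estimates."""
    out = []
    prev = None
    for h in estimates:
        # The first symbol is passed through; later symbols blend with the running average.
        prev = h if prev is None else w * prev + (1.0 - w) * h
        out.append(prev)
    return out


def steady_state_noise_gain(w):
    """Output-to-input noise variance ratio for white noise on a static channel."""
    return (1.0 - w) / (1.0 + w)


if __name__ == "__main__":
    for w in (0.2, 0.5, 0.9):
        # Response to a channel that jumps from 0 to 1 after the first symbol.
        tracked = exponential_average([0.0, 1.0, 1.0, 1.0], w)
        print(f"w = {w}: noise gain = {steady_state_noise_gain(w):.3f}, "
              f"value 3 symbols after the jump = {tracked[-1]:.3f}")
```

A larger $w$ suppresses more noise, which helps on a static channel such as AWGN, but the average then follows a changing channel more slowly, which hurts in a fading channel.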