Chapter 4 Architecture Design and Performance Analysis

4.4 Comparison of Adaptive LS-LMS FDE and MPIC Golay-MPIC TDE

Section 4.1 and Section 4.3 described the architecture design and simulation results of the equalizers for the IEEE 802.15.3c standard. This section compares the advantages of the LS-LMS FDE and the Golay-MPIC TDE and explains which equalizer should be used in which environment.

Table 4-8 Synthesis result of LS-LMS FDE and Golay-MPIC TDE

                      LS-LMS FDE                           Golay-MPIC TDE
FFT                   Without 2 FFT     Including 2 FFT
Process               65nm CMOS Process
Clock rate (MHz)      400
Area (Gate count)     415K              1723K              405K
Power (mW)            81@1.08V          211@1.08V          88@1.08V

Table 4-8 shows that, once the two FFTs are included, the area and power consumption of the LS-LMS FDE are much larger than those of the Golay-MPIC TDE (1723K versus 405K gates, and 211 mW versus 88 mW). The dominant factor is the FFT: the FDE has a feedback loop that needs FFTs to transform data between the time and frequency domains. The data path of the TDE, by contrast, is straightforward, and its only FFT serves the OFDM mode, so it is not an overhead for the system. The computation complexity of the LS-LMS FDE and the Golay-MPIC TDE is listed in Table 4-9. The Golay-MPIC TDE uses more complex multipliers in the MPIC stage and more complex adders in the OGC stage, but the LS-LMS FDE uses more memory resources for the FFT feedback loop to store received

Table 4-9 Computation complexity of LS-LMS FDE and Golay-MPIC TDE

                                        LS-LMS FDE                Golay-MPIC TDE
Complex multiplication                  1 (20 bits x 14 bits)     3 (21 bits x 15 bits)
Complex conjugate multiplication        1 (18 bits x 15 bits)     2 (18 bits x 14 bits)
Modified divider (scalar multiplier)    1 (13 bits x 11 bits)     1 (13 bits x 11 bits)
Complex adders                          3 (16 bits + 15 bits)     18 (16 bits + 15 bits)
Single-port memory                      12 (64 bits x 64 bits)    2 (144 bits x 16 bits)
Dual-port memory                        8 (64 bits x 64 bits)     0
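The memory rows of Table 4-9 can be totalled to make the difference in storage concrete. The following is a minimal sketch; it assumes that an entry such as "64 bits x 64 bits" means 64 words of 64 bits each, which the table does not state explicitly, and the dictionary names are illustrative only.

```python
# Memory entries copied from Table 4-9 as (instances, words, bits per word).
# Assumption: "64 bits x 64 bits" is read as 64 words of 64 bits each.
FDE_MEMORIES = {"single_port": (12, 64, 64), "dual_port": (8, 64, 64)}
TDE_MEMORIES = {"single_port": (2, 144, 16), "dual_port": (0, 0, 0)}


def total_bits(memories):
    """Sum instances * words * width over all memory types."""
    return sum(count * words * width for count, words, width in memories.values())


fde_bits = total_bits(FDE_MEMORIES)
tde_bits = total_bits(TDE_MEMORIES)

print("LS-LMS FDE memory bits    :", fde_bits)   # 81920
print("Golay-MPIC TDE memory bits:", tde_bits)   # 4608
print("ratio FDE / TDE           :", round(fde_bits / tde_bits, 1))
```

Under this reading the FDE stores roughly eighteen times as many bits as the TDE, which agrees with the observation that the FFT feedback loop dominates its memory usage.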

These two equalizers have their own advantages, which Table 4-10 summarizes. With hardware complexity in mind, we propose the Golay-MPIC TDE as a new architecture for good channel conditions. Because the TDE architecture is simpler, it can be pipelined much more deeply. If the channel condition allows, the low-cost TDE is the better choice. However, the LS-LMS FDE is superior to the Golay-MPIC TDE when the channel is not LOS or when the channel changes rapidly. Choosing between the LS-LMS FDE and the Golay-MPIC TDE is therefore a tradeoff.

Table 4-10 Comparison of LS-LMS FDE and Golay-MPIC TDE

                      LS-LMS FDE    Golay-MPIC TDE
Area and Power        ×             ○
High sampling rate    ×             ○
Variant Channel       ○             ×

Chapter 5 Baseband Design and Chip Implementation

This chapter discusses the baseband receiver chip implementations of the proposed adaptive LS-LMS FDE and the LOS MPIC TDE in Sections 5.1 and 5.2, respectively.

5.1 Adaptive LS-LMS FDE in Baseband Receiver [34]

5.1.1 Chip Integration and Implementation Result

Fig. 5-1 redraws the overall block diagram, excluding the channel decoder, of the 8x parallel baseband receiver for the IEEE 802.15.3c SC/HSI modes; the dashed lines represent control signals. All function units work in the digital domain. The simulation models of the overall baseband are built with MATLAB and Verilog HDL.

Fig. 5-1 Proposed block diagram of baseband receiver design

The data flow is controlled sequentially, block by block. In other words, a block that has finished executing outputs control signals to trigger the next block and then enters sleep mode to avoid redundant power consumption.

Considering the memory usage requirements, the baseband receiver design is synthesized and implemented with a standard-cell design flow in the TSMC 65 nm 1P9M low power (LP) process with a 1 V supply at 667 MHz. The total area of the baseband receiver circuit excluding the FFT is about 701k gates. The FFT blocks, which include two FFTs and one IFFT, occupy 1,932k gates. Fig. 5-2 illustrates the area proportion of each block excluding the FFT. The boundary detection (BD) block is active only in the preamble period, so it can share memory with the FDE in the data payload period. The shared memories account for 32.68% of the system excluding the FFT.

Fig. 5-2 Area proportion of each block circuit excluding FFT

In automatic placement and routing (APR), the FFT uses a total of 144 32-word by 32-bit dual-port memories (32×32 dual-port memories), which occupy most of the chip area. For better IR-drop and EM management, the memory datasheet suggests that the block rings around each memory block be at least 5 μm wide.

Fig. 5-3 Size of 32×32 dual port memory with power rings

The area of one 32-word by 32-bit dual-port memory is 247.12 μm × 37.215 μm. As shown in Fig. 5-3, once the memory is surrounded by block rings with the default spacing, the occupied area increases to (273.12 × 63.215) ÷ (247.12 × 37.215), or about 1.8 times the original area. Compared with the register area listed in Table 5-1, this 1.8-times memory area is larger than the area of the registers. For area efficiency, the 32-word by 32-bit dual-port memories are therefore replaced with registers, which saves 28.48% of the memory area for the FFT.
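The area argument above can be checked with a few lines of arithmetic. This is a sketch using only the dimensions quoted in the text and the register area from Table 5-1; the variable and function names are illustrative. It assumes the 28.48% saving is computed against a memory inflated by the rounded factor of 1.8.

```python
# Dimensions quoted in the text (micrometres).
MEM_W_UM, MEM_H_UM = 247.12, 37.215      # bare 32x32 dual-port memory
RING_W_UM, RING_H_UM = 273.12, 63.215    # same memory including block rings
REGISTER_AREA_UM2 = 11838.60             # register implementation, Table 5-1

bare_area = MEM_W_UM * MEM_H_UM
ringed_area = RING_W_UM * RING_H_UM
exact_ratio = ringed_area / bare_area


def saving_percent(area_ratio):
    """Area saved by registers relative to a memory inflated by area_ratio."""
    return 100.0 * (1.0 - REGISTER_AREA_UM2 / (area_ratio * bare_area))


print(f"bare memory area      : {bare_area:.2f} um^2")
print(f"area with block rings : {ringed_area:.2f} um^2")
print(f"unrounded area ratio  : {exact_ratio:.3f}")
print(f"saving with ratio 1.8 : {saving_percent(1.8):.2f} %")
```

With the factor 1.8 the sketch reproduces the 28.48% figure. The unrounded ratio is slightly larger than 1.8, so the saving computed from it would be somewhat higher.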

Table 5-1 Area and power comparison of 32×32 dual port memory and register

                          Area (μm²)    Power (mW)
Register                  11838.60      3.23
32×32 dual port memory

Since the memory elements are replaced with registers, we can use the TSMC 65 nm 1P9M general purpose (GP) process with a 1 V supply for a higher clock rate and more timing margin for APR. The chip layout view without pads is shown in Fig. 5-4.

The core size is 2556μm × 3056μm with 65.91% utilization density. The post-layout simulation shows that the proposed baseband receiver design can achieve 333 MHz.

Table 5-2 summarizes the chip. To our knowledge, no previous work on such a baseband design has been reported.

Fig. 5-4 1st chip layout view of the proposed baseband receiver (labeled blocks: FFT, IFFT, FFT, FEQ, SCO interpolator, FCFO estimation, shared memory, PD & BD & CFO detection)

Table 5-2 1st chip summary (using LS-LMS FDE)

Process                       TSMC 65 nm 1P9M GP process
Sampling Frequency (MHz)      2640
Clock Frequency (MHz)         330
Total Gate Count              3,463 K (100 %)
Gate Count (excluding FFT)    1,249 K (36.07 %)
Gate Count (FFT block)        2,214 K (63.93 %)
Core Area                     7.81 mm² (2.56 mm × 3.05 mm)
Utilization                   65.91 %
Mode                          SC            HSI
BER (uncoded) @ 12 dB         8.92×10^-4    1.43×10^-2
Data Rate (Gbps)              3.52          5.28
Power (mW)                    793.98
Leakage Power (mW)            78.31
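Several entries of Table 5-2 are related by simple arithmetic, and a short consistency check makes those relations explicit. This is only a sketch: the constant names are illustrative, and reading the sampling frequency as clock frequency times the 8x parallelism is an interpretation of the figures rather than a statement from the table.

```python
# Values copied from Table 5-2 and from the core size quoted in the text.
CLOCK_MHZ = 330
PARALLELISM = 8                 # "8x parallel" baseband receiver
SAMPLING_MHZ = 2640
GATES_TOTAL_K = 3463
GATES_NON_FFT_K = 1249
GATES_FFT_K = 2214
CORE_W_MM, CORE_H_MM = 2.556, 3.056   # 2556 um x 3056 um

sampling_from_clock = CLOCK_MHZ * PARALLELISM
non_fft_share = 100.0 * GATES_NON_FFT_K / GATES_TOTAL_K
fft_share = 100.0 * GATES_FFT_K / GATES_TOTAL_K
core_area_mm2 = CORE_W_MM * CORE_H_MM

print("sampling rate from clock x parallelism:", sampling_from_clock, "MHz")
print(f"non-FFT share of gates: {non_fft_share:.2f} %")
print(f"FFT share of gates    : {fft_share:.2f} %")
print(f"core area             : {core_area_mm2:.2f} mm^2")
```

The check shows that eight parallel paths at 330 MHz match the 2640 MHz sampling frequency, and that the FFT blocks account for almost two thirds of the gates.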

5.1.2 Measurement Consideration

Due to the 8x parallelism, the chip has many input and output bits. To reduce the number of pads and to verify functional correctness easily, we use the testing plan shown in Fig. 5-5. The input data can be supplied from outside the chip or from a pattern stored inside the chip. The comparator checks the correctness of the computation results for the stored pattern.

Fig. 5-5 Testing diagram for measurement

The measurement results show that the chip functions as expected.

Because the shuttle program from the foundry limits the available area, the power rings and power stripes of this chip are very restricted. Fewer power rings and power stripes can cause IR drop and reduce the chip operating speed. Although the clock frequency does not reach our target frequency, the power consumption trend is correct, as shown in Table 5-3. If the clock frequency is 333 MHz, as in Table 5-2, the power consumption estimated from this trend is about 741.75 mW.

Fig. 5-6 is the die photo of LS-LMS FDE chip.

Table 5-3 Power measurement of 1st chip

Clock Frequency (MHz)    1       10      20      30      40
Power (mW)               33.8    51.9    65.5    78.2    89.1
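The text does not spell out how the 741.75 mW estimate at 333 MHz is derived from Table 5-3. One reading that matches the quoted figure to within rounding is a proportional scaling of the 40 MHz measurement. The sketch below assumes that reading; the function name and the choice of reference point are illustrative, not taken from the thesis.

```python
# Measured power (mW) versus clock frequency (MHz), copied from Table 5-3.
MEASURED_MW = {1: 33.8, 10: 51.9, 20: 65.5, 30: 78.2, 40: 89.1}


def scale_power(target_mhz, reference_mhz=40):
    """Scale the power measured at reference_mhz proportionally with frequency.

    Assumption: power is treated as proportional to clock frequency, which
    ignores the frequency-independent (leakage) part of the consumption.
    """
    return MEASURED_MW[reference_mhz] * target_mhz / reference_mhz


estimate_mw = scale_power(333)
print(f"estimated power at 333 MHz: {estimate_mw:.2f} mW")
```

Because leakage does not scale with frequency, such a proportional estimate is best read as a rough figure.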

Fig. 5-6 Die photo of 1st chip baseband receiver

Fig. 5-7 shows the functional test of this baseband receiver. If all the results are correct, O_CHECK_RESULT is high (1) and O_ERR_COUNT is low (0). At the beginning, O_CHECK_RESULT is low (0) because the system outputs are not ready and are still in their initial state (0). Once the computed outputs arrive and are compared with the stored results, O_CHECK_RESULT becomes active and is set high (1).
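The behaviour of the two check signals can be illustrated with a small software model. This is a behavioural sketch, not the on-chip comparator: only the signal names O_CHECK_RESULT and O_ERR_COUNT come from the text, while the cycle-by-cycle behaviour, the use of None for outputs that are not yet ready, and the function name are assumptions.

```python
def self_check(outputs, expected):
    """Return one (O_CHECK_RESULT, O_ERR_COUNT) pair per cycle.

    outputs  : computed results per cycle, None while not yet ready
    expected : stored reference results, in the order they should appear
    """
    trace = []
    err_count = 0
    index = 0
    for out in outputs:
        if out is None:
            # Outputs are still in their initial state: check signal stays low.
            trace.append((0, err_count))
            continue
        if out != expected[index]:
            err_count += 1
        index += 1
        # The check signal is high only while no mismatch has been seen.
        trace.append((1 if err_count == 0 else 0, err_count))
    return trace


good_run = self_check([None, None, 5, 7, 9], [5, 7, 9])
bad_run = self_check([None, 5, 8, 9], [5, 7, 9])
print(good_run)
print(bad_run)
```

In the error-free run the check signal rises as soon as valid outputs arrive and the error count stays at zero, which corresponds to the waveform described for Fig. 5-7.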

Fig. 5-7 Functional test of 1st chip

Fig. 5-8, Fig. 5-9, and Fig. 5-10 show the testing and setup platform. We use an Agilent 93000 SOC test system with a CQFP 160-pin socket to test the chip. The equipment is provided by the National Chip Implementation Center (CIC).

Fig. 5-8 Agilent 93000 SOC test system

Fig. 5-9 CQFP 160 pins socket

Fig. 5-10 Wire connection of CQFP 160 pins socket

5.2 LOS MPIC Golay-MPIC TDE in Baseband Receiver

5.2.1 Chip Integration and Implementation Result

Fig. 5-11 redraws the block diagram, excluding the channel decoder, of the 8x parallel baseband receiver for the IEEE 802.15.3c SC/HSI modes; the dashed lines represent control signals. The simulation models of the overall baseband are built with MATLAB and Verilog HDL. The data flow is controlled sequentially, block by block. In other words, a block that has finished executing outputs control signals to trigger the next block and then enters sleep mode to avoid redundant power consumption.

Fig. 5-11 Proposed block diagram of baseband receiver design

The library of the TSMC 65 nm 1P9M general purpose process with a 1 V supply has been updated. The new library is faster but consumes more power than the old library used for the first chip. We use the new library to implement this chip for the TSMC tape-out flow, and we synthesize the chip at 667 MHz with both libraries to compare them. Table 5-4 shows that the new library consumes 1.42 times the power of the old library, while the areas are almost the same.

Table 5-4 Comparison of old and new TSMC 65nm GP process libraries

                      Old Library    New Library
Area (Gate-count)     2416K          2407K
Power (mW)            1169           1660
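The ratios quoted for the two libraries follow directly from Table 5-4. A minimal check, with illustrative variable names:

```python
# Synthesis results at 667 MHz, copied from Table 5-4.
OLD_LIBRARY = {"area_kgates": 2416, "power_mw": 1169}
NEW_LIBRARY = {"area_kgates": 2407, "power_mw": 1660}

power_ratio = NEW_LIBRARY["power_mw"] / OLD_LIBRARY["power_mw"]
area_change_percent = 100.0 * (
    NEW_LIBRARY["area_kgates"] - OLD_LIBRARY["area_kgates"]
) / OLD_LIBRARY["area_kgates"]

print(f"power ratio new / old: {power_ratio:.2f}")
print(f"area change          : {area_change_percent:.2f} %")
```

The power ratio rounds to 1.42, and the area differs by well under one percent, which is why the areas are described as almost the same.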

The total area of the baseband receiver circuit is about 2479K gates. Fig. 5-12 illustrates the area proportion of each block. The shared memory is shared by the BD, TDE and PNC blocks: BD uses the memory in the preamble period, TDE uses it in the data payload, and PNC uses it in the PCES field.

Fig. 5-12 Area proportion of each block circuit

Since the memory elements are replaced with registers, as explained in Section 5.1.1, we can use the TSMC 65 nm 1P9M general purpose (GP) process with a 1 V supply for a higher clock rate and more timing margin for APR. The chip layout view is shown in Fig. 5-13. The core size is 2820 μm × 2820 μm with 88.93% utilization density. The post-layout simulation shows that the proposed baseband receiver design can achieve 336.7 MHz. Table 5-5 is the chip summary.

Fig. 5-13 2nd chip layout view of the proposed baseband receiver

Table 5-5 2nd chip summary (using Golay-MPIC TDE)

Process                       TSMC 65 nm 1P9M GP process
Sampling Frequency (MHz)      2640
Clock Frequency (MHz)         330
Total Gate Count              2915 K
Core Area                     7.95 mm² (2.82 mm × 2.82 mm)
Utilization                   88.93 %
Mode                          SC            HSI
BER (uncoded) @ 12 dB         7.36×10^-5    9.30×10^-6
Data Rate (Gbps)              3.52          5.28
Power (mW)                    1116.7
Leakage Power (mW)            87.42

5.2.2 Measurement Consideration

As a result of the 8x parallelism, the chip has a very large number of input and output bits. However, we verify this chip with a different method from the previous version. To test functional correctness, we use a high-SNR data pattern, so the bit width of the input data can be reduced to only 5 bits. In Fig. 5-14, the Pseudo Random Binary Sequence (PRBS) block generates noise that is attached to the end of the input data. In this way, we save the area of a stored pattern and import the data from outside the chip.

Fig. 5-14 Testing diagram for measurement
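The idea of attaching PRBS-generated noise to a narrow input word can be sketched in software. The text does not give the PRBS polynomial, the number of noise bits or the exact packing, so everything below is an assumption made for illustration: a PRBS-7 generator (x^7 + x^6 + 1), three noise bits per sample, and noise placed below the least significant bit of a 5-bit sample.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield bits of a PRBS-7 sequence (x^7 + x^6 + 1, an assumed polynomial)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def append_noise(sample, noise_bits):
    """Attach noise bits below the LSB of a sample (hypothetical packing)."""
    word = sample
    for bit in noise_bits:
        word = (word << 1) | bit
    return word


NOISE_WIDTH = 3                          # hypothetical number of noise bits
samples = [0b10110, 0b00111, 0b11000]    # made-up 5-bit high-SNR inputs

generator = prbs7()
extended = [
    append_noise(s, list(islice(generator, NOISE_WIDTH))) for s in samples
]
for s, e in zip(samples, extended):
    print(f"{s:05b} -> {e:0{5 + NOISE_WIDTH}b}")
```

Only the 5-bit samples need to be supplied from outside; the lower bits are regenerated on chip from the seed, which is the reason a stored pattern is no longer required.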

Chapter 6 Conclusion and Future Work

6.1 Architecture Design Summary

This thesis proposes an adaptive LS-LMS FDE and a LOS Golay-MPIC TDE that satisfy the dual-mode (SC and HSI) specifications of IEEE 802.15.3c. The hardware of both methods can be shared by the SC and HSI modes to reduce hardware complexity. The BER and sampling rate meet the requirements of IEEE 802.15.3c.

The LS-LMS FDE combines the LMS adaptive algorithm with LS channel estimation. The LMS algorithm has the advantage of low computational complexity, and with the aid of LS channel estimation it converges sufficiently fast. The simulation results show that the LS-LMS FDE achieves a BER of 6.01*10^-4 in SC mode and 9.68*10^-3 in HSI mode (both uncoded) at an SNR of 12 dB. The total area, excluding the 2 FFTs, is about 415K gates, of which 69% is shared between the SC and HSI modes. The power consumption excluding the FFTs is only 81.27 mW at 400 MHz.
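The combination of LS initialization and LMS tracking can be illustrated with a generic one-tap-per-bin frequency-domain equalizer. This is a textbook-style sketch, not the architecture proposed in this thesis: the function names, the step size, the pilot values and the toy channels are all assumptions, and the reference symbols are taken as known.

```python
def ls_channel_estimate(rx_pilot, tx_pilot):
    """Least-squares estimate per frequency bin: H = Y / X."""
    return [y / x for y, x in zip(rx_pilot, tx_pilot)]


def initial_taps(h_ls):
    """Start the equalizer from the inverse of the LS channel estimate."""
    return [1 / h for h in h_ls]


def lms_step(taps, rx, ref, mu):
    """One LMS update per bin: W <- W + mu * e * conj(Y), with e = D - W * Y."""
    out = [w * y for w, y in zip(taps, rx)]
    err = [d - o for d, o in zip(ref, out)]
    new_taps = [w + mu * e * y.conjugate() for w, e, y in zip(taps, err, rx)]
    return new_taps, out


# Toy demo: the channel drifts after the LS estimate was taken.
tx_pilot = [1 + 0j, -1 + 0j, 1j, -1j]
h_start = [1 + 0.5j, 0.8 - 0.2j, 0.6 + 0.6j, 1.2 + 0j]
h_later = [1.1 + 0.4j, 0.7 - 0.3j, 0.5 + 0.7j, 1.1 - 0.1j]

rx_pilot = [h * x for h, x in zip(h_start, tx_pilot)]
taps = initial_taps(ls_channel_estimate(rx_pilot, tx_pilot))
start_error = max(abs(w * h - 1) for w, h in zip(taps, h_later))

for _ in range(200):
    rx = [h * x for h, x in zip(h_later, tx_pilot)]
    taps, _ = lms_step(taps, rx, tx_pilot, mu=0.3)

final_error = max(abs(w * h - 1) for w, h in zip(taps, h_later))
print(f"mismatch before tracking: {start_error:.3f}")
print(f"mismatch after tracking : {final_error:.2e}")
```

Starting from the LS estimate leaves only the channel drift for the LMS loop to remove, which is the sense in which LS estimation helps the convergence speed.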

On the other hand, the Golay-MPIC TDE uses MPIC equalization with Golay sequence-aided channel estimation. Compared with a traditional time-domain equalizer, the MPIC algorithm reduces the hardware complexity, and the Golay sequence-aided channel estimation suppresses the AWGN. The Golay-MPIC TDE achieves a BER of 2.53*10^-4 in SC mode and 4.22*10^-5 in HSI mode (both uncoded) at an SNR of 12 dB. The total area is about 405K gates, of which 99% is shared by the SC and HSI modes. The power consumption is only 88 mW at 400 MHz.
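Golay sequence-aided channel estimation relies on the complementary property of a Golay pair: the two aperiodic autocorrelations sum to zero at every non-zero lag. The sketch below shows this on a noiseless toy channel. It is a generic illustration under stated assumptions, not the estimator implemented in this thesis: the pair is built by the standard concatenation rule, the sequence length of 32 and the channel taps are made up, and each sequence is assumed to be received without overlap from neighbouring symbols.

```python
def golay_pair(doublings):
    """Build a complementary pair of length 2**doublings by concatenation."""
    a, b = [1], [1]
    for _ in range(doublings):
        a, b = a + b, a + [-x for x in b]
    return a, b


def convolve(h, x):
    """Full linear convolution of channel taps h with sequence x."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            out[i + j] += hi * xj
    return out


def correlate_at(r, s, lag):
    """Cross-correlation sum_n r[n + lag] * s[n] for one non-negative lag."""
    return sum(r[n + lag] * s[n] for n in range(len(s)) if n + lag < len(r))


def estimate_channel(rx_a, rx_b, a, b, taps):
    """Sum both correlator outputs; the sidelobes cancel, leaving 2N * h[k]."""
    n = len(a)
    return [
        (correlate_at(rx_a, a, k) + correlate_at(rx_b, b, k)) / (2 * n)
        for k in range(taps)
    ]


a, b = golay_pair(5)                     # length-32 pair (illustrative)
channel = [1.0, 0.5, -0.25]              # made-up channel impulse response
estimate = estimate_channel(convolve(channel, a), convolve(channel, b),
                            a, b, len(channel))
print(estimate)
```

With noise present, the division by 2N averages the noise over both sequences, which is the averaging effect behind the noise suppression mentioned above.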

6.2 Chip Implementation Summary

The two proposed architectures, one in each domain, are integrated into two indoor wireless communication baseband receiver systems. For high speed and area efficiency, the overall system designs are implemented in a 65 nm 1P9M CMOS GP process with a 1.0 V supply.

The LS-LMS FDE chip occupies a core area of 7.81 mm² with 65.91% utilization, and the clock rate is 333 MHz. The data rates of the SC and HSI modes reach 3.52 Gbps and 5.28 Gbps, respectively, and the power consumption is 793.98 mW. The shared memory, which is shared by the BD and FDE blocks, accounts for 32.68% of the baseband system.

The core area of the Golay-MPIC TDE chip is 7.95 mm² with 88.93% utilization, and the clock rate is 336.7 MHz. The data rates of the SC and HSI modes reach 3.52 Gbps and 5.28 Gbps, respectively, and the power consumption is 1.12 W. The BD, TDE and PNC blocks use the same shared memory, which accounts for 37% of the baseband system.

6.3 Future Work

In the future, we will consider modifications to the Golay-MPIC TDE algorithm to deal with time-varying and NLOS channels. As regards the chip implementation, we will reduce the core area and power consumption. A data rate of 10 Gbps is also a future design target; higher-order QAM modulation, deeper pipelining, and a more parallel architecture can help to reach it.

References

[1] S.K. Yong and C.C. Chong, “An overview of multigigabit wireless through millimeter wave technology: Potentials and technical challenges,” EURASIP Journal on Wireless Communications and Networking, vol. 2007, 2007.

[2] L. Caetano and S. Li, “Benefits of 60 GHz,” Sibeam Corp., Nov. 2005.

[3] T.C. Wei, “Synchronization Design for DVB-T/H and Indoor Wireless Receiver,” Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, May 2011.

[4] IEEE 802.15.3c-2009, IEEE P802.15 Working Group for Wireless Personal Area Networks, Oct. 2009.

[5] IEEE Std. P802.11 TGad D0.1, “PHY/MAC Complete Proposal Specification,” IEEE, May 2010.

[6] D. Falconer, S. L. Ariyavisitakul, A. Benyamin-Seeyar, and B. Eidson, “Frequency Domain Equalization for Single-Carrier Broadband Wireless Systems,” IEEE Communications Magazine, vol. 40, no. 4, 2002, pp. 58-66.

[7] R. Fisher, “60 GHz WPAN Standardization within IEEE 802.15.3c,” Proc. Signals, Syst. and Electron. Symp., pp. 103-105, Aug. 2007.

[8] C. Koh, “The Benefits of 60 GHz Unlicensed Wireless Communications,” Deployment White Papers, YDI Wireless.

[9] S. Yong, “TG3c channel modeling sub-committee final report,” IEEE P802.15 Working Group for Wireless Personal Area Networks, IEEE802.15-07-0584-00-003c, Jan. 2007.

[10] channel-model-matlab-code-release [Online], IEEE 802.15 WPAN Millimeter Wave Alternative PHY Task Group 3c (TG3c), Available: http://www.ieee802.org/15/pub/TG3c_contributions.html.

[11] T.Y. Liu, “Design of Fast Convergent Adaptive Frequency-Domain Equalizer for Single Carrier Indoor Wireless Receiver,” Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Oct. 2009.

[12] M. G. Bellanger, “Adaptive Digital Filters,” 2nd ed., New York: Marcel Dekker, 2001.

[13] P. S. R. Diniz, “Adaptive Filtering: algorithms and practical implementation,” 2nd ed., Boston: Kluwer Academic Publishers, 2002.

[14] Y. Yang, Y. H. Chew, and T. T. Tjhung, “Adaptive frequency-domain equalization for space-time block-coded DS-CDMA downlink,” IEEE International Conference on Communications, vol. 4, May 2005, pp. 2343-2347.

[15] R. Kimura, R. Funada, Y. Nishiguchi, M. Lei, T. Baykas, C. S. Sum, J. Wang, A. Rahman, Y. Shoji, H. Harada, and S. Kato, “Golay Sequence Aided Channel Estimation for Millimeter-Wave WPAN Systems,” Personal, Indoor and Mobile Radio Communications, Sept. 2008.

[16] K. Ishihara, K. Takeda, and F. Adachi, “Iterative Channel Estimation for Frequency-Domain Equalization of DSSS Signals,” IEICE Trans. Commun., vol. E90-B, no. 5, May 2007.

[17] K. Amis and D.L. Roux, “Predictive decision feedback equalization for space time block codes with orthogonality in frequency domain,” IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 2, 11-14 Sept. 2005, pp. 1140-1144.

[18] R. Kumar and M. Khan, “Mitigation of multipath effects in broadband wireless systems using quantized state adaptive equalization methods,” IEEE Aerospace Conference, 2006, pp. 9.

[19] F. H. Hsiao and Terng-Yin Hsu, “A Frequency Domain Equalizer for WLAN 802.11g Single-Carrier Transmission Mode,” IEEE International Symposium on Circuits and Systems, vol. 5, May 2005, pp. 4606-4609.

[20] S. U. H. Qureshi, “Adaptive Equalization,” Proceedings of the IEEE, vol. 73, no. 9, Sept. 1985.

[21] H.Y. Chen, “Design of Baseband Receiver for High-Mobility Wireless Metropolitan Area Network,” Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Sep. 2009.

[22] Z. Gao, Q. Wu, and J. Wang, “A Novel Combination Algorithm Based on Chip Equalizer and Multi-path Interference Cancellation,” Signal Processing, 2004. Proceedings, ICSP ’04.

[23] M. Golay, “Complementary Series,” IRE Transactions on Information Theory, Apr. 1961.

[24] S. Budisin, “Efficient Pulse Compressor for Golay Complementary Sequences,” Electronics Letters, vol. 27, no. 3, pp. 219-220, Jan. 1991.

[25] B. Popovic, “Efficient Golay Correlator,” Electronics Letters, vol. 35, no. 17, pp. 1427-1428, Aug. 1999.

[26] S. Haykin, “Adaptive Filter Theory,” 4th ed., Upper Saddle River, N.J.: Prentice Hall, 2002.

[27] A. Burg, S. Haene, W. Fichtner, and M. Rupp, “Regularized Frequency Domain Equalization Algorithm and its VLSI Implementation,” IEEE International Symposium on Circuits and Systems, May 2007, pp. 3530-3533.

[28] J. Coon, S. Armour, M. Beach, and J. McGeehan, “Adaptive frequency-domain equalization for single-carrier MIMO systems,” IEEE International Conference

[29] S.J. Huang and Sau-Gee Chen, “A High-Throughput Radix-16 FFT Processor with Normal Input/Output Ordering for IEEE 802.15.3c,” VLSI Design/CAD Symposium, Aug. 2011.

[30] P.G. Donato, M.A. Funes, M.N. Hadad, and D.O. Carrica, “Optimised Golay Correlator,” Electronics Letters, vol. 45, no. 7, Mar. 2009.

[31] Y. Zeng and T. S. Ng, “Pilot Cyclic Prefixed Single Carrier Communication: Channel Estimation and Equalization,” IEEE Signal Processing Letters, vol. 12, issue 1, Jan. 2005, pp. 56-59.

[32] J. H. Yu, K. J. Hou, and T. D. Chiueh, “Multi-way Baseband Receiver Design for IEEE 802.15.3c HSI-OFDM Mode,” VLSI Design/CAD Symposium, Session 3-3, 2009.

[33] H. Sari, G. Karam, and I. Jeanclaude, “Transmission Technique for Digital Terrestrial TV Broadcasting,” IEEE Communications Magazine, Feb. 1995.

[34] Y.S. Huang, “Design and Implementation of Synchronization Detection for IEEE 802.15.3c,” Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Oct. 2010.
