
Chapter 5 Hardware Architecture

5.3 Multistage K-Best Detection Architecture

5.3.1 Sorting Network

5.3.1.4 Linear-Sort

The area of a sorting network scales with N because both the number of compare-and-select units and the number of registers grow in proportion to N. The results discussed above and shown in Figure 5-9 indicate that bubble-sort and bitonic-sort require a larger number of comparators than M. Afghahi's sort. Therefore, bubble-sort and bitonic-sort are not well suited to our purpose.

To keep the area small for larger K in the multistage K-Best detector, we develop the circuit shown in Figure 5-10, called linear-sort. The linear-sort is based on small modifications to the design proposed by M. Afghahi [37].

It consists of N/2 compare-and-select units and requires N clock cycles to sort a sequence of N elements. Initially, all the registers are set to one. At each compare-and-select unit, the smaller of the two inputs is passed to the lower output while the larger one is passed to the upper output. Therefore, after N clock cycles, the smallest K values are stored in the bottom registers. The advantage of the circuit is its low silicon area. A drawback is that N cycles are needed to sort the N elements, i.e., the sorting latency is linear in N.

[Figure 5-10] Block diagram of linear-sorting network
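The behavior described above can be illustrated with a small software model. The following Python sketch is a behavioral assumption only and does not reproduce the circuit of Figure 5-10: it models a chain of K registers, each with its own compare-and-select step, rather than the N/2 units of the actual design, and it uses infinity as a stand-in for the initial register contents. The function name `linear_sort_k_best` and its arguments are hypothetical.

```python
def linear_sort_k_best(metrics, k):
    """Behavioral model of a linear sorter that keeps the k smallest metrics.

    One element of `metrics` enters per modeled clock cycle, so sorting
    N elements takes N cycles (linear latency).
    """
    # Assumption: registers start at the largest possible value.
    regs = [float("inf")] * k
    cycles = 0
    for m in metrics:
        carry = m
        for i in range(k):
            # Compare-and-select: the smaller value stays in the register,
            # the larger value is passed on to the next register.
            if carry < regs[i]:
                regs[i], carry = carry, regs[i]
        # Whatever remains in `carry` is larger than all kept values
        # and is discarded.
        cycles += 1
    return regs, cycles


if __name__ == "__main__":
    kept, n_cycles = linear_sort_k_best([5, 3, 9, 1, 7, 2], k=3)
    print(kept, n_cycles)
```

In this model the registers hold the K smallest values in ascending order once all N inputs have been shifted in, which mirrors the N-cycle latency stated in the text.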

5.3.2 Metric Calculation Unit

The architecture of the metric calculation unit at the first stage is shown in Figure 5-11. Figure 5-12 shows the architecture of the metric calculation unit at the ith stage (2 ≤ i ≤ q), which executes the calculations corresponding to equation (3.9) of the multistage K-Best algorithm. It consists of four n-bit adders, two n-bit × n-bit squarers and one carry save adder circuit, where n is the bit-width of the data path. Equations (3.10) and (3.11) show that the amount of computation in the carry save adder circuit differs from layer to layer.

Resource sharing is applied, so the N candidates at the input of each stage are processed one after the other. As shown in Figure 5-11 and Figure 5-12, each candidate requires two clock cycles to update the accumulated partial Euclidean distance in the metric calculation unit. The updated accumulated partial Euclidean distance is then delivered to the sorting network. The linear sorting network requires N clock cycles to sort a sequence of N candidates. Therefore, the total number of clock cycles for each stage is equal to N + 2.

[Figure 5-11] Architecture of metric calculation unit at the first stage

[Figure 5-12] Architecture of metric calculation unit at the ith stage (2≤ i ≤q)
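As a rough illustration of the metric update and the per-stage cycle count, the following Python sketch may help. It uses the generic K-Best form of the accumulated partial Euclidean distance (previous distance plus the squared magnitude of a complex residual) and is not a transcription of equation (3.9), which is not reproduced in this section. All names (`update_ped`, `stage_cycles`, `y_i`, `interference`, `r_ii`, `s_i`) are assumptions, and the value N = 16 in the usage line is purely an example.

```python
def update_ped(ped_prev, y_i, interference, r_ii, s_i):
    """Accumulate the partial Euclidean distance for one candidate symbol.

    Assumed generic form: ped_prev + |y_i - interference - r_ii * s_i|^2.
    """
    e = y_i - interference - r_ii * s_i  # complex residual
    # One squarer for the real part and one for the imaginary part.
    return ped_prev + e.real ** 2 + e.imag ** 2


def stage_cycles(n_candidates, metric_latency=2):
    """Clock cycles per stage: N cycles of sorting plus the two-cycle
    metric update, as stated in the text (N + 2)."""
    return n_candidates + metric_latency


if __name__ == "__main__":
    print(update_ped(1.0, 3 + 4j, 0j, 1.0, 1 + 1j))
    print(stage_cycles(16))  # N = 16 is a hypothetical example value
```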

5.4 Experiment Reports for Hardware Implementation

To evaluate the hardware implementation of the proposed multistage K-Best detector, we designed a hard-output multistage K-Best detector that supports 2 × 2 MIMO transmission with 64-QAM modulation.

The experimental results are reported in the following subsection.

5.4.1 The Area and Power Estimation

The design is written in Verilog HDL and synthesized to a gate-level implementation using Synopsys tools with a 0.18 µm process technology. The whole architecture consists of about 305K equivalent gates. The QR-decomposition unit, the matrix and vector multiplication unit, and the multistage K-Best detection unit use about 86K, 23K and 196K equivalent gates, respectively. Detailed reports can be found in Table 5-1.

The maximum clock frequency of the detector is 120 MHz, and the system achieves a throughput of up to 80 Mbps. Power is estimated with PrimePower: at a clock frequency of 120 MHz, the detector consumes about 366 mW. We use SOC Encounter as the place and route (P&R) tool; the layout is shown in Figure 5-13. The core of the detector occupies about 2.080 mm × 2.080 mm = 4.3264 mm², and the die size is about 2.728 mm × 2.728 mm = 7.442 mm².

[Table 5-1] Area report for each unit and component

[Figure 5-13] Chip layout by SOC Encounter
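The reported throughput and clock frequency can be related by a short calculation. The sketch below assumes that the 80 Mbps figure counts uncoded detected bits and that each transmitted vector carries one 64-QAM symbol per antenna (2 × 6 = 12 bits); both are assumptions of this example rather than statements from the reports above, and the function names are hypothetical.

```python
import math


def bits_per_vector(n_tx, qam_order):
    """Bits carried by one transmit vector: one symbol per antenna."""
    return n_tx * int(math.log2(qam_order))


def cycles_per_vector(f_clk_hz, throughput_bps, bits):
    """Average clock cycles available per detected vector."""
    return bits * f_clk_hz / throughput_bps


if __name__ == "__main__":
    b = bits_per_vector(n_tx=2, qam_order=64)
    print(b)
    print(cycles_per_vector(120e6, 80e6, b))
```

Under these assumptions, the reported figures correspond to one detected vector roughly every 18 clock cycles.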

Chapter 6 Conclusion

6.1 Conclusion

This thesis has proposed a new signal detection algorithm that replaces the original higher order constellation with several lower order constellations and orders the lower order constellations. Applying the proposed algorithm significantly reduces the computational complexity compared to the conventional algorithm while achieving almost identical detection accuracy.

Thus, the main contributions of this thesis are summarized as follows:

1. We compared the performance of the two tree search algorithms, multistage K-Best and conventional K-Best, for detection in the MIMO-OFDM system.

The simulation results summarized above show that the two algorithms produce almost the same PER performance at similar K values. However, the multistage K-Best algorithm achieves this performance with much lower computational complexity than the conventional K-Best algorithm.

2. The multistage K-Best architecture is presented. The architecture operates in a pipelined fashion and supports 2 × 2 MIMO transmission with 64-QAM modulation. The hardware implementation is synthesized using 0.18 µm CMOS technology.

6.2 Comparison

To achieve high throughput with almost the same silicon area, we considered two designs: Design A contains one multistage K-Best detection unit with K = 8; Design B contains three identical conventional K-Best detection units with K = 8. Both designs support a 2 × 2 antenna configuration and the 64-QAM modulation scheme. Table 6-1 shows the implementation results of the two designs, including the silicon area, detection throughput and power consumption. Because the number of required candidates is greatly reduced, the detection throughput of Design A is about 1.22 times that of Design B at the same clock frequency. At the same time, the power consumption of Design A is lower than that of Design B.

Reference                            Design A    Design B

K                                    8           8

Antennas                             2 × 2       2 × 2

Modulation                           64-QAM      64-QAM

Gate count [K equivalent gates]      196         199.5

Throughput @ 120 MHz [Mbps]          80          65.4

Power consumption @ 120 MHz [mW]     180.9       239.1

[Table 6-1] Comparison of implementation results
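The ratios quoted in the comparison can be checked directly from the values in Table 6-1. The short Python sketch below does so and also derives the energy per detected bit (power divided by throughput), a figure that is not reported in the table and is added here only as an illustration. The dictionary layout and function name are hypothetical.

```python
# Values copied from Table 6-1 (both designs at 120 MHz).
designs = {
    "A": {"throughput_mbps": 80.0, "power_mw": 180.9, "gates_k": 196.0},
    "B": {"throughput_mbps": 65.4, "power_mw": 239.1, "gates_k": 199.5},
}


def energy_per_bit_nj(design):
    """Energy per detected bit in nJ (mW divided by Mbps gives nJ/bit)."""
    return design["power_mw"] / design["throughput_mbps"]


throughput_ratio = designs["A"]["throughput_mbps"] / designs["B"]["throughput_mbps"]
power_ratio = designs["A"]["power_mw"] / designs["B"]["power_mw"]

if __name__ == "__main__":
    print(f"throughput A/B: {throughput_ratio:.2f}")
    print(f"power A/B:      {power_ratio:.2f}")
    for name, d in designs.items():
        print(f"energy per bit, Design {name}: {energy_per_bit_nj(d):.2f} nJ")
```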

6.3 Future Work

Performance may benefit from a more sophisticated receiver, such as one that performs iterative MIMO detection and decoding. This receiver consists of a MIMO detector and a channel decoder, as shown in Figure 6-1 [6]. Iterations are performed between the two units so that the reliability of the decisions increases. The multistage K-Best algorithm reduces overall complexity by decomposing a higher order constellation. As future work, this idea can therefore be applied to the iterative receiver, and its effect on computational complexity and performance can be studied by simulation.

[Figure 6-1] Block diagram of an iterative receiver

REFERENCES

[1] N. J. A. Sloane and A. D. Wyner, Eds., “Claude Elwood Shannon: Collected Papers,” New York: IEEE Press, 1993.

[2] G. J. Foschini, “Layered space-time architecture for wireless communication in fading environment when using multiple antennas,” Bell Labs. Tech. Journal, vol. 1, no. 2, autumn, 1996.

[3] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, 1998.

[4] R. Van Nee and R. Prasad, “OFDM for Wireless Multimedia Communications,” Boston: Artech House, 2000.

[5] Y. G. Li, J. H. Winters and N. R. Sollenberger, “MIMO-OFDM for wireless communications: signal detection with enhanced channel estimation,” IEEE Transactions on Communications, vol. 50, Sept. 2002, pp. 1471-1477.

[6] Mohinder Jankiraman, “Space-time codes and MIMO systems,” Boston: Artech House, 2004.

[7] Hamid Jafarkhani, “Space-time coding: theory and practice,” Cambridge: Cambridge University Press, 2005.

[8] Kwan Wai Wong, Chi-Ying Tsui, Roger S Cheng, Wai Ho Mow, “Reduced-complexity Maximum Likelihood lattice decoder for MIMO channels,” in Proc. of the 7th Asia-Pacific Conference on Communications, pp. 213-216, 2001.

[9] T. Fujita, T. Onizawa, W. Jiang, D. Uchida, T. Sugiyama, and A. Ohta, “A new signal detection scheme combining ZF and K-best algorithms for OFDM/SDM,” PIMRC 2004, vol. 4, pp. 2387-2391, Sept. 2004.

[10] Damen M.O., El Gamal H. and Caire G., “On maximum-likelihood detection and the search for the closest lattice point,” IEEE Transactions on Information Theory, vol. 49, Oct. 2003, pp. 2389-2402.

[11] B. Hassibi and H. Vikalo, “On the expected complexity of sphere decoding,” in Proc. Asilomar Conf. Signals, Syst., Comput., 2001, pp. 1051-1055.

[12] Tao Cui, Tellambura, C., “Approximate ML detection for MIMO systems using multistage sphere decoding,” IEEE Signal Processing Letters, vol. 12, March 2005, pp. 222-225.

[13] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Trans. Inf. Theory, vol. 48, no. 8, pp. 2201-2214, Aug. 2002.

[14] D. L. Ruyet, T. Bertozzi, and B. Özbek, “Breadth first algorithms for APP detectors over MIMO channels,” in Proc. IEEE Int. Conf. Commun., Jun. 2004, pp. 926-930.

[15] Rupp M., Gritsch G., Weinrichter H., “Approximate ML detection for MIMO systems with very low complexity,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04), vol. 4, 17-21 May 2004, pp. iv-809-12.

[16] M. O. Damen, A. Chkeif, and J.-C. Belfiore, “Lattice code decoder for space-time codes,” IEEE Commun. Lett., vol. 4, no. 5, pp. 161-163, May 2000.

[17] E. Viterbo and E. Biglieri, “A universal lattice code decoder for fading channels,” IEEE Trans. Inf. Theory, vol. 45, no. 7, pp. 1639-1642, Jul. 1999.

[18] H. Artes, D. Seethaler, and F. Hlawatsch, “Efficient detection algorithms for MIMO channels: A geometrical approach to approximate ML detection,” IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2808-2820, Nov. 2003.

[19] K. W. Wong, C. Y. Tsui, R. S. Cheng, and W. H. Mow, “A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels,” in IEEE International Symposium on Circuits and Systems, May 2002, pp. III-273-III-276.

[20] Z. Guo and P. Nilsson, “Algorithm and Implementation of the K-Best Sphere Decoding for MIMO Detection,” IEEE Journal on Selected Areas in Communications, vol. 24, Issue 3, March 2006, pp. 491-503.

[21] A. Burg et al., “VLSI implementation of MIMO detection using the sphere decoding algorithm,” IEEE Journal of Solid-State Circuits, vol. 40, pp. 1566-1577, July 2005.

[22] Jennings A., “Matrix computation for engineers and scientists,” Chichester, Wiley, 1977.

[23] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electron. Computers, EC-8, Sept. 1959, pp. 330-334.

[24] J. S. Walther, “A unified algorithm for elementary functions,” Proc. AFIPS SJCC 1971, vol. 38, pp. 379-385.

[25] P. W. Baker, “Suggestion for a fast binary sine/cosine generator,” IEEE Trans. Computers, 1976, pp. 1134-1136.

[26] M. D. Ercegovac and T. Lang, “Redundant and on-line CORDIC: application to matrix triangularization and SVD,” IEEE Trans. Computers, vol. 39, June 1990, no. 6, pp. 725-740.

[27] H. X. Lin and H. J. Sips, “On-Line CORDIC Algorithms,” IEEE Trans. Computers, vol. 39, Aug. 1990, no. 8, pp. 1038-1052.

[28] N. Takagi, T. Asada, and S. Yajima, “Redundant CORDIC Methods with a Constant Scale Factor for Sine and Cosine Computation,” IEEE Trans. Computers, vol. 40, Sept. 1991, no. 9, pp. 989-995.

[29] J. A. Lee and T. Lang, “Constant-factor redundant CORDIC for angle calculation and rotation,” IEEE Trans. Computers, vol. 41, Aug. 1992, pp. 1016-1025.

[30] J. B. Anderson and S. Mohan, “Sequential coding algorithms: A survey and cost analysis,” IEEE Trans. Communications, vol. 32, Issue 2, Feb 1984, pp. 169-176.

[31] F. Jelinek and J. B. Anderson, “Instrumentable tree encoding of information sources,” IEEE Trans. on Information Theory, vol. 17, pp. 118, Jan. 1971.

[32] S. J. Simmons, “Breadth-first trellis decoding with adaptive effort,” IEEE Trans. Information Theory, vol. IT-38, pp. 3-12, Jan. 1990.

[33] B. Tarokh and H. Sadjadpour, “Construction of OFDM M-QAM sequences with low peak-to-average power ratio,” IEEE Trans. Communication, vol. 51, no. 1, pp. 25-28, Jan. 2003.

[34] W. M. Gentleman and H. T. Kung, “Matrix triangularization by systolic arrays,” in Proc. SPIE Real-Time Signal Processing IV, vol. 298, 1982, pp. 19-26.

[35] A. El-Amawy and K. R. Dharmaranjan, “Parallel VLSI algorithm for stable inversion of dense matrixes,” IEE Proceedings, Computers and Digital Techniques, vol. 136, no. 6, pp. 575-580, 1989.

[36] K. E. Batcher, “Sorting networks and their applications,” Proc. Spring Joint Computer Conf. AFIPS, vol. 32, 1968, pp. 307-314.

[37] M. Afghahi, “A 512 16-b Bit-Serial Sorter Chip,” IEEE Journal of Solid-State Circuits, vol. 26, no. 10, October 1991.

[38] R. Van Nee, A. Van Zelst, and G. Awater, “Maximum likelihood decoding in a space division multiplexing system,” IEEE VTC'00, Tokyo, Japan, May 2000.

[39] R. F. H. Fischer, C. Windpassinger, “Real versus complex-valued equalisation in V-BLAST systems,” Electronics Letters, vol. 39, no. 5, pp. 470-471, Mar. 2003.

[40] D. De Caro, A. G. M. Strollo, “Parallel squarer using Booth-folding technique,”
