Highly Efficient Sphere Decoding Algorithm and Its Applications

National Chiao Tung University

Institute of Communications Engineering

Doctoral Dissertation

Student: Chung-Jung Huang

Advisor: Dr. Ta-Sung Lee
A Dissertation

Submitted to Institute of Communication Engineering

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

in

Communication Engineering

September 2013

Hsinchu, Taiwan, R.O.C.

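Highly Efficient Sphere Decoding Algorithm and Its Applications

Student: Chung-Jung Huang

Advisor: Dr. Ta-Sung Lee

Chinese Abstract

Institute of Communications Engineering, National Chiao Tung University

Abstract

The design of high-efficiency, low-power receivers is one of the key issues in multiple-input multiple-output systems. In this dissertation, we first propose a high-performance sphere decoder, based on the K-Best decoder, that is suited to VLSI architectures. The regular structure of the constellation in the complex plane is exploited to simplify path-length computation and sorting, which removes a large amount of data sorting and path-weight computation and yields a high-efficiency decoder with fixed throughput. Furthermore, to address the performance deficiency of the conventional K-Best decoder, a new search strategy is proposed that provides decoding performance close to that of the maximum-likelihood search. Next, we propose a decoder with low decoding complexity for generalized underdetermined multi-antenna systems. The decoder comprises two steps: 1. the proposed efficient slab candidate search finds all required candidate points; 2. intersections are taken over these candidate sets, together with a dynamic radius adjustment mechanism, to quickly find the solution. A column-ordering strategy for the channel matrix that can be combined with the proposed decoder is also proposed, providing low computational requirements and near-maximum-likelihood decoding performance. We also present a systematic mathematical analysis of the computation reduction ratio associated with the ordering strategy. Finally, for the codebook search problem in uplink coordinated multipoint transmission systems, we propose an algorithm with very low computational complexity. First, based on matrix theory, a block QR decomposition procedure is proposed that converts the original search problem into one of finding the longest path. Then, using the proposed modified K-Best decoder, the codebook search can be completed with very little computation while retaining excellent system performance. Computer simulations verify that the algorithms and architectures proposed in this dissertation provide superior decoding performance with lower computational requirements, making them well suited to next-generation broadband wireless communication systems.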

Highly Efficient Sphere Decoding Algorithm and Its Applications

Student: Chung-Jung Huang

Advisor: Dr. Ta-Sung Lee

English Abstract

Institute of Communications Engineering

National Chiao Tung University

Abstract

In this dissertation, a low-complexity near-ML K-Best sphere decoder is proposed as the first part. The development of the proposed K-Best sphere decoding algorithm (SDA) involves two stages. First, a new candidate sequence generator (CSG) is proposed. The CSG operates directly in the complex plane and efficiently generates sorted candidate sequences with precise path weights. Using the CSG and an associated parallel comparator, the proposed K-Best SDA avoids the computational cost of a large number of path weight evaluations and sorting operations. Then a new search strategy based on a derived cumulative distribution function (cdf) and an associated efficient procedure is proposed. With the above features, the proposed SDA can provide near-ML performance with lower complexity than conventional K-Best SDAs. Afterwards, a novel decoder with low decoding complexity is proposed for underdetermined MIMO systems. The proposed decoder consists of two stages. First, an improved slab decoding algorithm is adopted to efficiently obtain valid candidate points within a given slab. Next, a multi-slab based decoding algorithm finds the optimal solution by conducting intersections on the obtained candidate set with dynamic radius adaptation. Furthermore, an optimal preprocessing technique is proposed from the geometrical perspective, and a comprehensive analysis of the complexity reduction is also provided. The proposed decoder incorporating the preprocessing scheme offers low (non-exponential) computational complexity and near-ML decoding performance for underdetermined MIMO systems, particularly with a large number of antennas and/or high-order constellations. Finally, a tree-based codebook search algorithm for uplink (UL) coordinated multipoint (CoMP) systems is proposed. The codebook search problem can be reformulated as a tree search, and the solution can be obtained efficiently using a modified K-Best enumeration strategy. The proposed approach offers low computational complexity and nearly the same performance as the exhaustive search algorithm, especially when the CoMP size is significant. Simulation results show that the proposed algorithms can significantly reduce the computational complexity while maintaining system performance, and thus provide a promising solution for future wireless communication systems.


Acknowledgement

I wish to express my deepest gratitude to my advisor, Dr. Ta-Sung Lee, for his guidance, support, and constructive criticism. His elegant way of approaching problems has considerably influenced my research and working style. I especially want to thank Dr. Wei-Ho Chung from Academia Sinica for many helpful suggestions, valuable comments, and help with the writing.

I want to thank all the members of the Communication System Design and Signal Processing (CSDSP) Lab for sharing their knowledge and for inspiring discussions.

I also thank my mother, parents-in-law, brothers and friends for their encouragement and continual support. Finally, I thank my adorable children CoCo and Jim and my lovely wife Amy, who always stood by me during frustrating and hard times, and without whom this work would not have been completed.

Table of Contents

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Acronym Glossary
Notations

Chapter 1 Introduction
  1.1 Basics of Multi-Antenna Systems
  1.2 Basics of MIMO Decoder
  1.3 Related Literature Review
  1.4 Main Contributions
  1.5 Organization of Dissertation

Chapter 2 Efficient Search Algorithm for Over-determined MIMO systems
  2.1 Overview
  2.2 Signal Model
  2.3 Proposed Sorting Algorithm and Hardware Architecture
    2.3.1 Candidate Sequence Generator in Complex Plane
    2.3.3 Complexity Advantages
  2.4 Proposed Search Strategy for Near-ML Performance
    2.4.1 Preprocessing with Column Permutation
    2.4.2 Proposed Search Strategy
    2.4.3 Joint 2-Layer ML Search Algorithm
  2.5 Computer Simulation and Discussions
  2.6 Summary

Chapter 3 Geometry Based SDA for Under-determined MIMO systems
  3.1 Overview
  3.2 Signal Model for Underdetermined SDA
  3.3 Proposed Decoding Algorithms for Underdetermined Systems
    3.3.1 An Efficient Slab Search (ESS) Algorithm
    3.3.2 A Multi-slab Sphere Decoding (MSSD) Algorithm
  3.4 Proposed Preprocessing Technique for Complexity Reduction
    3.4.1 A Preprocessing with Column Permutation
    3.4.2 Complexity Analysis
  3.5 Computer Simulation and Discussions
  3.6 Summary

Chapter 4 Efficient Search Algorithm for Codebook Search in Uplink CoMP Systems
  4.1 Overview
  4.2 Signal Model
  4.4 Simulation Results
  4.5 Summary

Chapter 5 Conclusions and Future Works
  5.1 Summary of Dissertation
  5.2 Future Works

APPENDIX

List of Figures

Figure 2-1: Illustration of the multi-index operation.

Figure 2-2: Modulo operation of the search center.

Figure 2-3: Partition of the search segments.

Figure 2-4: (a) Modulo unit of $\mathrm{Re}(y''_i)$. (b) Transformation unit of $y''_{i,M}$.

Figure 2-5: Hardware architecture of the candidate generator.

Figure 2-6: Illustration of the HPCC operations.

Figure 2-7: HPCC architecture.

Figure 2-8: Search constraints of the Nth layer with $d' = 1.1$.

Figure 2-9: Cdf curves of $r_{o,i,i}^{2}$. (a) 4x4 MIMO channel. (b) 8x8 MIMO channel.

Figure 2-10: Geometrical relationship illustrating the adopted property.

Figure 2-11: Performance of complex K-Best SDA for 4x4 MIMO systems. (a) 16-QAM modulation. K = 4 and 8 for complex K-Best SDA incorporating proposed CML strategy; K = 4, 8, and 12 for regular complex K-Best SDAs. (b) 64-QAM modulation. K = 4 and 12 for complex K-Best SDA incorporating proposed CML strategy; K = 4, 12, and 24 for regular complex K-Best SDAs.

Figure 2-12: Performance and complexity of SDA for 4x4 MIMO systems with 16-QAM modulation. (a) SER. (b) Complexity. K = 8 for K-Best SDAs.

Figure 2-13: Performance and complexity of SDA for 4x4 MIMO systems with 64-QAM modulation. (a) SER. (b) Complexity. K = 8 for K-Best SDAs.

Figure 2-14: Performance and complexity of SDA for 8x8 MIMO systems with 16-QAM modulation. (a) SER. (b) Complexity. K = 14 for K-Best SDAs.

Figure 2-15: Performance and complexity of SDA for 8x8 MIMO systems with 16-QAM modulation. (a) SER. (b) Complexity. K = 32 for proposed K-Best SDA; K = 32 and 52 for conventional K-Best SDAs.

Figure 3-1: Geometrical diagram of slabs with different y.

Figure 3-2: SER performance comparisons of SSD, ML and proposed decoder with 16-QAM modulation for various MIMO configurations.

Figure 3-3: SER performance and complexity comparisons of SSD, ML and proposed decoder with 64-QAM modulation for various MIMO configurations. (a) SER; (b) Complexity.

Figure 3-4: Probability density function of $y'_M$ with various ordering rules with 16-QAM (4,2) MIMO configuration at SNR = 15 dB.

Figure 3-5: The comparison of the averaged complexity reduction ratio for various ordering rules.

Figure 3-6: Performance and complexity comparisons of the proposed decoder with and without the greedy reordering scheme with 64-QAM modulation.

Figure 4-1: Illustration of centralized UL CoMP system model.

Figure 4-2: Illustration of centralized UL CoMP system model.

Figure 4-3: Sum-rate performance and complexity of the proposed and exhaustive search methods with Nt=4, Nr=4, M=3, P=1, d1=2, d2=3, and d3=4.

List of Tables

Table 2-1: LIST OF CANDIDATE SEQUENCES

Table 2-2: COMPUTATIONAL COMPLEXITY OF PROPOSED K-BEST SDA (EXCLUDING INTERFERENCE CANCELLATION)

Table 2-3: COMPUTATIONAL COMPLEXITY OF CONVENTIONAL K-BEST SDA IN REAL DOMAIN (EXCLUDING INTERFERENCE CANCELLATION)

Table 4-1: THE PROPOSED BLOCK QR DECOMPOSITION

Table 4-2: A PROPOSED CODEBOOK SEARCH ALGORITHM USING K-BEST ENUMERATION

Acronym Glossary

3GPP third generation partnership project

AWGN additive white Gaussian noise

BS base station

BQRD block QR decomposition

CA carrier aggregation

CAS compare-and-select

cdf cumulative distribution function

CML conditional-ML

CoMP coordinated multipoint

CP central processor

CSG candidate sequence generator

CSI channel state information

D-BLAST diagonal Bell laboratories layered space-time

DLSD double layer sphere decoding

ESS efficient slab search

FP Fincke and Pohst

GS Gram-Schmidt

GSD generalized sphere decoder

HPCC highly-parallel comparison circuit

ICI inter-cell interference

LTE long term evolution

MIMO multiple-input multiple-output

MISO multiple-input single-output

ML maximum-likelihood

MMSE minimum mean-square error

MSSD multi slab sphere decoding

MU multi-user

PDA plane decoding algorithm

pdf probability density function

PED partial Euclidean distance

PSASR partial sum of achievable sum-rate

RVD real-value decomposition

RRH remote radio head

SDA sphere decoding algorithm

SE Schnorr and Euchner

SER symbol error rate

SIC successive interference cancellation

SINR signal to interference and noise ratio

SLA slab decoding algorithm

SNR signal to noise ratio

SSD slab sphere decoder

SVD singular value decomposition

UE user equipment

UL uplink

V-BLAST vertical Bell laboratories layered space-time

WiMAX Worldwide Interoperability for Microwave Access

ZF zero-forcing

Notations

$\lceil \cdot \rceil$ ceiling operator

$(\cdot)^{*}$ conjugate operator

$E\{\cdot\}$ expectation operator

$\lfloor \cdot \rfloor$ floor operator

$(\cdot)^{\dagger}$ Moore-Penrose pseudo-inverse operator

$[\cdot]$ rounding operator

$(\cdot)^{-1}$ inverse operator

$Q(\cdot)$ quantization operator

$(\cdot)^{T}$ transpose operator

$C$ search radius

$\mathbf{H}$ channel matrix

$N_r$ number of receive antennas

$M_c$ constellation size

$N_t$ number of transmit antennas

$P$ transmit power

$\mathbf{n}$ noise vector

$\gamma$ average SNR at the receiver

$\mathbf{x}$ transmit signal vector

$\mathbf{y}$ received signal vector

$\mathbf{H}^{m}_{qk}$ channel between the qth receiver and the kth UE in cell m

$\mathbf{x}_{(m-1)P+k}$ transmitted signal vector for the kth UE in cell m

$d_T$ number of total transmit data streams

$R_{\mathrm{sum}}$ achievable sum-rate

$\mathbf{X}^{(i)}$ the ith column of matrix X

$\mathbf{X}^{\{j\}}_{i}$ matrix consisting of the ith through jth columns of matrix X

$\mathrm{diag}(\cdot)$ block diagonal matrix stacking operator

$\mathrm{tr}(\cdot)$ trace operator

Chapter 1 Introduction

In this introductory chapter, background materials about multi-antenna systems and well-known decoding algorithms are presented. The following sections include the literature survey, dissertation contribution and overview of this dissertation.

1.1 Basics of Multi-Antenna Systems

Next-generation wireless communication systems are expected to provide users with higher data rate services for video, audio, data, and voice signals. Many innovative techniques have recently been proposed to improve the spectral efficiency and reliability of wireless communication links. Examples include coded multicarrier modulation, smart antennas, multiple-input multiple-output (MIMO) technology, and adaptive modulation.

Among these technologies, MIMO technology has attracted substantial research and industrial interest. MIMO technology uses multiple antennas at both the transmitter and the receiver to provide a more reliable communication link and/or higher spectral efficiency [1-3]. Theoretical analyses show that the MIMO system capacity increases linearly with the minimum of the numbers of transmit and receive antennas [2]. As a result, MIMO technology has been recognized as one of the most promising solutions for future wireless communication systems. There are two underlying techniques in MIMO systems: spatial multiplexing [6] and diversity techniques [3]. Spatial multiplexing offers a linear increase in data rate by transmitting multiple independent data streams at the same time. Spatial diversity provides diversity gain to mitigate fading effects by using multiple (ideally independent) copies of the transmitted signal. The two techniques usually trade off against each other, and together they provide an effective and promising means of achieving high data rates and reliable transmission.

MIMO technology has been widely adopted in next-generation wireless communication standards such as IEEE 802.16 and the Third Generation Partnership Project (3GPP) Long Term Evolution (LTE) and LTE-Advanced (LTE-A) systems.

1.2 Basics of MIMO Decoder

Multiple-antenna systems employing spatial multiplexing increase the spectral efficiency. However, this improvement comes at the cost of increased receiver complexity. Finding a well-balanced trade-off between communication performance and implementation complexity in MIMO detection is one of the key challenges in receiver design.

The optimal detector for MIMO systems is the maximum likelihood (ML) detector, which searches all possible combinations of transmitted symbols. The number of possible combinations equals the constellation size raised to the power of the number of transmit antennas, so it grows exponentially with the number of antennas and rapidly with the constellation size. Therefore, the ML detector is difficult to implement at the receiver in practice. As a remedy, many suboptimal detection algorithms have been developed that offer a desired trade-off between performance and complexity. They can be divided into the following classes.

• Linear MIMO Detection

Zero-forcing (ZF) and minimum mean square error (MMSE) filters apply linear operations to the received signal in order to restore the transmitted signal. These linear filters can be implemented at low complexity; however, their communication performance is poor. The MMSE filter takes the noise power into account when suppressing interference and therefore performs slightly better.

• Successive Interference Cancellation

The successive interference cancellation (SIC) technique was initially adopted by the vertical Bell Laboratories layered space-time (V-BLAST) system [4] - [7-8]. In contrast to the basic ZF and MMSE filters, SIC detects the transmitted streams sequentially and then removes the interference of each detected stream from the received data before continuing the detection process. The performance of the SIC algorithm is generally better than ZF and MMSE filters.

• Breadth-First Tree Search Algorithms

For further improvement of the communication performance, the MIMO detection problem can be mapped onto a tree search. Tree search algorithms can be divided into breadth-first and depth-first search algorithms. Breadth-first algorithms can potentially provide a constant throughput with slight performance loss compared to optimal detection. Among these techniques, the K-Best enumeration strategy is the most popular approach. While traversing the tree, the K-Best detector always retains the K best nodes in each search level. This additional sorting operation causes extra computational complexity. In general, the decoding performance of breadth-first algorithms depends on the number of survival nodes chosen in each search layer.

• Depth-First Tree Search Algorithms

The main idea of depth-first search is to apply pruning criteria that remove parts of the tree from the search and thereby reduce the computational complexity. Typical sphere detectors can achieve the same decoding performance as the ML detector with a significant complexity reduction. Due to the nature of the depth-first search, their throughput is usually variable. The sequential tree search order makes it difficult to parallelize the detection. Many sub-optimal variants exist that modify the enumeration technique, the pruning criterion, or the metric calculation to obtain a desired trade-off between performance and complexity.
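The two linear detectors at the start of this list can be sketched in a few lines. This is only an illustrative sketch under assumptions that are not taken from the dissertation: QPSK symbols with values ±1±j, a known channel matrix, and hypothetical function names (`qpsk_slice`, `zf_detect`, `mmse_detect`).

```python
import numpy as np

def qpsk_slice(z):
    """Map each complex entry to the nearest QPSK point (+-1 +-1j)."""
    return np.sign(z.real) + 1j * np.sign(z.imag)

def zf_detect(H, y):
    """Zero-forcing: invert the channel with the pseudo-inverse, then slice."""
    return qpsk_slice(np.linalg.pinv(H) @ y)

def mmse_detect(H, y, noise_var, symbol_energy=2.0):
    """MMSE: regularise the inversion by the noise-to-signal ratio, then slice.

    symbol_energy = 2.0 is the energy of the assumed +-1 +-1j alphabet.
    """
    n_t = H.shape[1]
    G = H.conj().T @ H + (noise_var / symbol_energy) * np.eye(n_t)
    return qpsk_slice(np.linalg.solve(G, H.conj().T @ y))

# Small demonstration on a random 4x4 channel with mild noise.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
x = qpsk_slice(rng.standard_normal(4) + 1j * rng.standard_normal(4))
noise_var = 0.01
n = np.sqrt(noise_var / 2) * (rng.standard_normal(4) + 1j * rng.standard_normal(4))
y = H @ x + n
print("ZF  :", zf_detect(H, y))
print("MMSE:", mmse_detect(H, y, noise_var))
```

The only difference between the two functions is the regularisation term added before the inversion, which is how the MMSE filter accounts for the noise power.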

1.3 Related Literature Review

As mentioned above, linear detection and the SIC scheme are simple to implement, but their detection performance is far from optimal. ML detection is the optimal detection scheme, but its complexity grows exponentially with the number of transmit antennas and rapidly with the size of the transmitted symbol alphabet. To reduce the complexity of ML detection, the sphere decoding algorithm (SDA) has been introduced to achieve the same performance as ML detection with reduced complexity [9]-[12]. The SDA has received considerable attention as an effective detection scheme for MIMO systems.
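To make the cost of exhaustive ML detection concrete, the following sketch enumerates every candidate vector and keeps the one with the smallest Euclidean distance to the received signal. It is a brute-force illustration only; the QPSK alphabet and the function names `ml_detect` and `num_candidates` are assumptions made for this example.

```python
import itertools
import numpy as np

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]  # assumed alphabet for illustration

def ml_detect(H, y, alphabet):
    """Exhaustive ML search over all len(alphabet)**n_t candidate vectors."""
    n_t = H.shape[1]
    best_x, best_metric = None, np.inf
    for cand in itertools.product(alphabet, repeat=n_t):
        x = np.array(cand)
        metric = np.linalg.norm(y - H @ x) ** 2  # squared Euclidean distance
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x, best_metric

def num_candidates(alphabet_size, n_t):
    """Number of vectors the exhaustive search has to visit."""
    return alphabet_size ** n_t

# Growth of the search space with the number of transmit antennas (16-QAM).
for n_t in (2, 4, 8):
    print(n_t, "antennas:", num_candidates(16, n_t), "candidates")
```

The loop count is the constellation size raised to the number of transmit antennas, which is why the search is only practical for very small systems.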

The basic idea of the SDA is to locate the lattice point nearest to the received signal vector within a given sphere radius. In doing so, the SDA transforms the original problem into a tree search problem. Several candidate enumeration strategies have been proposed [9]-[12]. In the Fincke and Pohst SDA (FP-SDA) [9], [10], the radius is set as a scaled variance of the noise. If no lattice points satisfy the radius constraint, the algorithm increases the search radius and restarts the search. The Schnorr and Euchner SDA (SE-SDA) [12] is a variant of the FP-SDA. It shows that enumerating candidate symbols in ascending order of their distance from the Babai estimate [13] (the nulling-canceling solution) speeds up the tree search. This approach is likely to find the optimal solution faster than the FP-SDA and hence can reduce the computational complexity. Despite these efforts, the conventional SDA is still too complex in the low SNR regime, and its decoding throughput is not stable in general. Hence, it is not desirable for real-time detection and hardware implementation. Previous works [14]-[16] proposed architectures that exploit the parallelism of VLSI to improve the decoding throughput. These designs exhibit excellent performance in the higher SNR regime.
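The depth-first search with radius shrinking and Schnorr-Euchner ordering described above can be sketched as follows. This is a simplified illustration, not the FP-SDA or SE-SDA as published: it assumes a real-valued model with at least as many receive as transmit dimensions, a finite PAM alphabet, and a hypothetical function name `sphere_decode`.

```python
import numpy as np

def sphere_decode(H, y, alphabet, radius=np.inf):
    """Depth-first sphere decoder with Schnorr-Euchner ordering (real model)."""
    Q, R = np.linalg.qr(H)            # H = Q R with R upper triangular
    z = Q.T @ y
    n = H.shape[1]
    alphabet = np.asarray(alphabet, dtype=float)
    best = {"x": None, "d2": radius ** 2}
    x = np.zeros(n)

    def search(level, partial_d2):
        # Centre of this layer after cancelling already-decided symbols.
        centre = (z[level] - R[level, level + 1:] @ x[level + 1:]) / R[level, level]
        # Schnorr-Euchner: try the symbols closest to the centre first.
        for s in alphabet[np.argsort(np.abs(alphabet - centre))]:
            d2 = partial_d2 + (R[level, level] * (s - centre)) ** 2
            if d2 >= best["d2"]:
                break                 # remaining symbols are farther: prune
            x[level] = s
            if level == 0:
                best["x"], best["d2"] = x.copy(), d2   # leaf: shrink the radius
            else:
                search(level - 1, d2)

    search(n - 1, 0.0)                # start from the last layer
    return best["x"], best["d2"]

# Example on a random 4x4 real channel with 4-PAM symbols.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))
x_true = rng.choice([-3, -1, 1, 3], size=4)
y = H @ x_true + 0.1 * rng.standard_normal(4)
print(sphere_decode(H, y, [-3, -1, 1, 3]))
```

With the default infinite radius the first leaf reached is the Babai (nulling-canceling) point, and every later leaf must beat the current best distance. If a finite initial radius is passed and no point lies inside it, the function returns `None` for the solution, which corresponds to the case where the FP-SDA would enlarge the radius and restart.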

To overcome the drawbacks of the conventional SDA, the K-Best SDA has been introduced in [17]-[19]. The K-Best SDA uses a breadth-first search and keeps the K best candidates of each layer for the search of the next layer. Briefly, the main idea of the K-Best SDA is to keep only the K candidates with the smallest path weights as the most promising solutions. Hence, the decoding throughput of the K-Best SDA is stable. Unfortunately, applying a sorting algorithm to find the K best candidates in each layer requires many computational operations and a long decoding latency. Moreover, the value of K must be large enough to achieve near-ML performance, which increases the computational complexity, decoding latency, and implementation cost.
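The breadth-first procedure described above can be sketched as follows. This is a generic textbook-style illustration, not the decoder proposed in this dissertation nor the designs in [17]-[19]; it assumes a real-valued model, performs a full sort in every layer, and uses the hypothetical name `k_best_detect`.

```python
import numpy as np

def k_best_detect(H, y, alphabet, K):
    """Breadth-first K-Best detection on the real-valued model y = Hx + n."""
    Q, R = np.linalg.qr(H)
    z = Q.T @ y
    n = H.shape[1]
    # Each survivor is (path weight, symbols for layers level+1 .. n-1).
    survivors = [(0.0, [])]
    for level in range(n - 1, -1, -1):
        children = []
        for weight, tail in survivors:
            # Interference from symbols already fixed in deeper layers.
            interference = sum(R[level, level + 1 + j] * s
                               for j, s in enumerate(tail))
            for s in alphabet:
                inc = (z[level] - interference - R[level, level] * s) ** 2
                children.append((weight + inc, [s] + tail))
        # The sorting step: keep only the K smallest path weights.
        children.sort(key=lambda c: c[0])
        survivors = children[:K]
    weight, x_hat = survivors[0]
    return np.array(x_hat, dtype=float), weight

# Example: 3x3 real channel, 4-PAM, K = 4 survivors per layer.
H = np.array([[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [0.1, 0.4, 1.8]])
x_true = np.array([1.0, -3.0, 3.0])
print(k_best_detect(H, H @ x_true, [-3, -1, 1, 3], K=4))
```

Every layer expands exactly K survivors into K times the alphabet size children, so the work per layer is fixed. The `sort` call is the step whose cost the sorting architectures discussed next try to reduce.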

Sorting is a critical factor in reducing the complexity of a K-Best SDA. In [17] the bubble sort algorithm is applied to conduct sorting. More efficient sorting algorithms [18], [19] have also been adopted to reduce computational complexity. Recently, a high-efficiency sorting architecture has been proposed, which can sort K values of partial Euclidean distances in K/2 clock cycles [20]. It is found that the quick sort algorithm [18] is not always more suitable than the bubble sort algorithm for a small value of K.

Some efficient early-pruning schemes have been proposed in [18], [21] which eliminate the survival candidates that are unlikely to become ML solutions in the early search layers. The approach in [22] reduces the number of candidate nodes by adopting dynamic K values according to the index of search layers. The above approaches can reduce complexity, but their performance may degrade because the ML solution can possibly be dropped.

To solve the above performance problem, the method presented in [23] always conducts the ML search in several preceding search layers, where ML search refers to an exhaustive search in a certain layer. The operation in the remaining layers is the same as in the conventional K-Best SDA. This approach is a special case of the dynamic-K method, and it increases complexity and power consumption significantly. In general, it is not necessary to perform the ML search, especially when the channel condition is good. The method proposed in [24] chooses the optimal K dynamically according to the channel condition. An approximate algorithm [25] has been proposed to estimate channel conditions efficiently. Nevertheless, these methods require complicated procedures and extra circuits. To the best of our knowledge, there is no efficient mechanism for deciding the number of layers in which the ML search is conducted, or whether to perform the ML search at all, for different K values and antenna numbers.

Most of the SDAs developed so far work in the real domain using the real-valued decomposition (RVD) [17], [26]-[27]. Although the real domain approaches lead to better performance and lower complexity, they require more search layers than the complex domain approaches [28], [29]. To reduce the number of search layers, some novel search methods that operate in the complex plane have been proposed [30], [31]. These methods introduce errors when evaluating path weights; they reduce complexity but sacrifice performance significantly. On the other hand, some communication systems rotate the constellation by a pre-defined angle before transmitting symbols to achieve a higher diversity gain. In this case, conventional real domain SDAs cannot be adopted directly, and extra, complicated techniques are needed. To tackle these issues, a new SDA that operates directly in the complex domain is desired.
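The real-valued decomposition mentioned above stacks real and imaginary parts so that a complex system with $N_t$ transmit antennas becomes a real system with $2N_t$ unknowns, which is why real domain decoders need twice as many search layers. A minimal sketch, with the function name `real_value_decomposition` chosen here for illustration:

```python
import numpy as np

def real_value_decomposition(H, y):
    """Convert the complex model y = Hx + n into an equivalent real model."""
    H_r = np.block([[H.real, -H.imag],
                    [H.imag,  H.real]])
    y_r = np.concatenate([y.real, y.imag])
    return H_r, y_r

# A 2x2 complex system becomes a 4x4 real system (4 search layers instead of 2).
H = np.array([[1 + 2j, 0.5 - 1j], [0.3j, 2.0]])
x = np.array([1 - 1j, -1 + 1j])
H_r, y_r = real_value_decomposition(H, H @ x)
x_r = np.concatenate([x.real, x.imag])
print(H_r.shape, np.allclose(H_r @ x_r, y_r))
```

The block structure works for square QAM because the real and imaginary parts of each symbol can then be detected as independent PAM symbols. A rotated constellation couples the two parts, which is the difficulty noted in the paragraph above.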


Afterwards, we consider an underdetermined MIMO system, which commonly arises in the multi-user (MU) uplink transmission of 3GPP LTE/LTE-A or Mobile WiMAX, where the number of users exceeds the number of receive antennas at the base station and decoupling the spatial signals of these users becomes difficult because there are too few receive antennas. In this circumstance, conventional SDAs are unable to identify a unique solution. To overcome this drawback of the SDA, several novel decoders have been proposed. First, the generalized sphere decoder (GSD) [42] performs an exhaustive search on specified dimensions to find the ML solution. Its decoding complexity increases with the constellation size and with the difference between the numbers of transmit and receive antennas. Based on the GSD, other efficient decoders have been proposed, such as the regularized sphere decoder [43]-[45], the tree-search approach [46], [47], the double-layer sphere decoder (DLSD) [48], [49], and the slab sphere decoder (SSD) [50], [51]. In [43], the authors convert the original problem into an overdetermined form by a regularization technique under a constant modulus constellation constraint, and then apply the conventional SDA to obtain a near-ML solution. Later, the work in [44] reformulates this approach to remove the constant modulus constraint for generalized M-QAM systems. In [46], the authors propose an efficient tree-search decoding algorithm for binary constellation systems, and they extend this algorithm to M-PSK systems in [47]. For M-QAM systems, this modified algorithm needs to decompose the constellation into a weighted sum of QPSK constellations. As a result, the decoding complexity increases rapidly with the difference between the numbers of transmit and receive antennas and/or the constellation size. The DLSD utilizes an outer sphere decoder to find a valid candidate set and an inner sphere decoder to find the ML solution. The SSD adopts a geometrical approach for finding the valid candidate set to reduce the search complexity of the DLSD. Both the DLSD and the SSD need to perform the conventional SDA sequentially, so their complexity remains an issue. Besides, algorithms for coded MIMO systems [52], a hybrid approach [53], and a heuristic search method [54] tackle the MIMO system from different perspectives. Comparing the results in [42]-[51], it is evident that the SSD exhibits the lowest complexity for large constellations, and it is thus chosen as a benchmark. Unfortunately, as shown in [51], its decoding complexity still increases rapidly with the difference between the numbers of transmit and receive antennas and/or the constellation size; therefore, developing efficient decoding algorithms is still an active research field for practical applications.
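The regularization idea attributed to [43] above can be made concrete with a short derivation. This is a sketch of the standard argument rather than a reproduction of [43]; the symbols $\alpha$, $\mathbf{G}$, $\mathbf{D}$, and $\boldsymbol{\rho}$ are notation introduced here for illustration.

```latex
% Assumption: constant-modulus constellation, so x^H x is the same for every candidate x.
\begin{align}
\hat{\mathbf{x}}
  &= \arg\min_{\mathbf{x}} \|\mathbf{y}-\mathbf{H}\mathbf{x}\|^{2}
   = \arg\min_{\mathbf{x}} \left( \|\mathbf{y}-\mathbf{H}\mathbf{x}\|^{2}
     + \alpha\,\mathbf{x}^{H}\mathbf{x} \right), \qquad \alpha > 0, \\
\mathbf{G} &= \mathbf{H}^{H}\mathbf{H} + \alpha\mathbf{I} = \mathbf{D}^{H}\mathbf{D}
  \quad \text{(Cholesky factor, full rank even when } N_t > N_r\text{)}, \\
\boldsymbol{\rho} &= \mathbf{D}^{-H}\mathbf{H}^{H}\mathbf{y}, \\
\|\mathbf{y}-\mathbf{H}\mathbf{x}\|^{2} + \alpha\,\mathbf{x}^{H}\mathbf{x}
  &= \|\boldsymbol{\rho}-\mathbf{D}\mathbf{x}\|^{2}
     + \mathbf{y}^{H}\mathbf{y} - \boldsymbol{\rho}^{H}\boldsymbol{\rho}.
\end{align}
```

Because the last two terms do not depend on $\mathbf{x}$ and $\mathbf{D}$ is a square upper triangular matrix of full rank, minimizing $\|\boldsymbol{\rho}-\mathbf{D}\mathbf{x}\|^{2}$ is a problem that a conventional SDA can search directly.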

Finally, we consider a codebook search problem in uplink coordinated multipoint (CoMP) systems. CoMP has been adopted in LTE-A to improve the cell average and cell edge throughputs [61]. It uses the cooperation between points in several cooperation groups to coordinate the transmission for inter-cell interference (ICI) alleviation and link quality enhancement. An attractive CoMP scheme, referred to as centralized CoMP, is a full cooperation approach that involves full channel state information (CSI) and full data information to provide improved performance. The full cooperation scheme between base stations (BS) and remote radio heads (RRH) is applicable in LTE-A because of the dedicated fiber links. In uplink (UL) centralized CoMP systems, cooperating BSs forward received signals and CSI to a central processor (CP), which computes the corresponding precoder matrix for each user equipment (UE). The CP would then need to feed back the exact precoder matrix to each UE, which is inefficient and impractical. As a remedy, a codebook-based scheme that feeds back only the precoder matrix index (rather than the matrix itself) is adopted in real applications. Centralized CoMP with MIMO has attracted significant attention, and numerous studies focus on the optimal precoder design [62-63]. However, to the best of our knowledge, efficient codebook search algorithms for this scenario remain scarce in the literature.

1.4 Main Contributions

The contributions of this dissertation are summarized as follows:

1. A simple and efficient complex domain candidate sequence generator (CSG) is proposed. By combining the proposed CSG with an efficient sorting architecture, the proposed decoder can significantly reduce path weight calculations and comparison operations without sacrificing detection performance. Moreover, to address the performance issue, a new search strategy is proposed that incorporates the ML search in the preceding layers under poor channel conditions and improves the performance of the proposed K-Best SDA even when the value of K is small. A judicious criterion is proposed that helps limit the number of ML search layers. Furthermore, an efficient search procedure is also proposed that fully utilizes existing hardware elements. Combining the above features, the proposed K-Best SDA exhibits lower complexity and excellent performance, and is well suited to real-time applications.

2. We further develop an efficient decoder from the geometrical perspective for the underdetermined MIMO systems. The proposed decoder consists of two stages. First, an improved slab decoding algorithm is adopted to efficiently obtain valid candidate points within a given slab. Next, a multi-slab based decoding algorithm finds the optimal solution by conducting intersections on the obtained candidate set with dynamic radius adaptation. Furthermore, an optimal preprocessing technique is proposed from the geometrical perspective, and a comprehensive analysis of the complexity reduction is also provided. The developed procedure can be applied to any static ordering rule, even a non-linear ordering rule, for QR-based MIMO decoders.

3. A tree-based codebook search algorithm for UL CoMP systems is proposed. To break the interdependency among user equipments (UEs), a generalized blockwise QR decomposition procedure is proposed. With the proposed generalized block GS decomposition, the original codebook search problem is reformulated as a problem of finding the longest path, which can be solved efficiently by a tree search. To obtain this solution efficiently, a modified K-Best algorithm is also proposed. Compared with the exhaustive approach, the proposed algorithm improves computational efficiency by about one order of magnitude while providing near-ML performance.

1.5 Organization of Dissertation

The remainder of this dissertation is organized as follows.

In Chapter 2, we propose a low-complexity near-ML K-Best sphere decoder. The proposed K-Best sphere decoding algorithm involves two stages. First, a new candidate sequence generator (CSG), which operates in the complex plane and efficiently generates sorted candidate sequences with precise path weights, is proposed. Using the CSG and an associated parallel comparator, the proposed K-Best SDA avoids a large number of operations. Next, a new search strategy based on a derived cumulative distribution function (cdf) and an associated efficient procedure is proposed. By incorporating detection ordering, the proposed SDA can provide near-ML decoding performance with a lower complexity requirement than conventional K-Best SDAs.

In Chapter 3, we further consider an underdetermined MIMO system and propose an efficient decoder from the geometrical perspective. Underdetermined MIMO systems can be found in the multi-user (MU) uplink transmission of 3GPP LTE/LTE-Advanced or Mobile WiMAX, where the number of users exceeds the number of receive antennas at the base station and decoupling the spatial signals of these users becomes difficult because there are too few receive antennas. To tackle the problem, we propose a geometry-based efficient decoder for underdetermined MIMO systems. The proposed decoder involves two stages. First, an improved slab search algorithm efficiently obtains valid candidate points within a given slab. Next, a multi-slab based decoding algorithm finds the optimal solution by taking intersections of the obtained candidate set with dynamic radius adaptation. By doing so, there is no need to perform the SDA sequentially. The proposed decoder can thus provide near-ML performance with much lower (non-exponential) complexity than the state-of-the-art methods. Furthermore, we propose an optimal preprocessing technique from the geometrical perspective and conduct a comprehensive analysis of the complexity reduction. With the proposed preprocessing scheme, the decoder can significantly reduce the decoding complexity in the low SNR regime without sacrificing performance. This advantage makes it suitable for practical MU-MIMO operation.

In Chapter 4, we apply tree search techniques to solve the codebook search problem in uplink (UL) CoMP systems. CoMP techniques have been adopted in LTE-A to improve the cell-average and cell-edge throughputs. They use the cooperation between points in several cooperation groups to coordinate the transmission for inter-cell interference (ICI) alleviation and link quality enhancement. We propose an efficient codebook search algorithm to locate the optimal codebook set in centralized UL CoMP systems. The codebook search problem can be reformulated as a tree search, and the solution can be obtained efficiently using a modified K-Best enumeration strategy. The proposed algorithm can effectively perform precoder selection while maintaining a significantly lower complexity than the exhaustive search method.

Finally, Chapter 5 concludes this dissertation and discusses future extensions of this research.


Chapter 2 Efficient Search Algorithm for Over-determined MIMO systems

2.1 Overview

As mentioned in Chapter 1, the K-Best SDA uses a breadth-first search and keeps the K best candidates of each layer for the search of the next layer. Briefly, the main idea of the K-Best SDA is to keep only the K candidates that have the smallest path weights as the most promising solutions. Hence, the decoding throughput of the K-Best SDA is stable. Unfortunately, applying a sorting algorithm to find the K best candidates in each layer requires many computational operations and a long decoding latency. Moreover, the value of K must be large enough to achieve near-ML performance, which increases the computational complexity, decoding latency, and implementation cost. To reduce the decoding complexity while obtaining reasonable performance for practical applications, the selected value of K is usually the same as the constellation size. Therefore, how to trade off between performance and complexity is still an active research issue.

In this chapter, we first propose a simple and efficient complex-domain candidate sequence generator (CSG). The CSG is developed based on the fact that neighboring points share the same candidate sequence in the complex plane, rendering the relevant rule invariant to constellation rotation. With a minor modification, the proposed decoder can be easily applied to wireless communication systems with constellation pre-rotation to obtain a larger diversity gain. By combining the proposed CSG with an efficient sorting architecture, the proposed decoder can significantly reduce path weight calculations and comparison operations without sacrificing detection performance. Moreover, to address the performance issue, a new search strategy that incorporates the ML search in the preceding layers under poor channel conditions (i.e., when the channel matrix is ill-conditioned) improves the performance of the proposed K-Best SDA even when the value of K is small. A judicious criterion is proposed that helps determine fewer ML search layers than previous works [23], [27]. An efficient search procedure is also proposed that fully utilizes existing hardware elements. The procedure increases hardware utilization and significantly reduces implementation cost. Combining the above features, the proposed K-Best SDA exhibits lower complexity and excellent performance, and is well suited to real-time applications.

2.2 Signal Model

Consider a MIMO system with $N$ transmit antennas and $M$ receive antennas. The received signal vector is denoted as $\mathbf{y} = [y_1\ y_2\ \cdots\ y_M]^T \in \mathbb{C}^{M \times 1}$, where $y_m$ is the received signal at the mth receive antenna. Similarly, the transmitted signal vector is denoted as $\mathbf{x} = [x_1\ x_2\ \cdots\ x_N]^T \in \mathbb{Z}[j]^{N}$, where $\mathbb{Z}[j] := \{a + jb \mid a, b \in \mathbb{Z}\}$ is the set of Gaussian integers and $x_n$ is the transmitted signal at the nth transmit antenna. The transmitted signal constellation is assumed to be either 16-QAM or 64-QAM. Assume $M \ge N$ and that the channel responses are frequency-flat fading and remain constant during a frame transmission. The channel matrix can be expressed as

$$\mathbf{H} = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,N} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ h_{M,1} & h_{M,2} & \cdots & h_{M,N} \end{bmatrix}. \tag{2.1}$$

Assuming that there is sufficient antenna separation at the transmit and receive sites, the entries of the channel matrix $\mathbf{H}$ can be regarded as i.i.d. complex Gaussian random variables with zero mean and unit variance. The relationship between the received signal vector and the transmitted signal vector can be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{2.2}$$

where $\mathbf{n} = [n_1\ n_2\ \cdots\ n_M]^T \in \mathbb{C}^{M \times 1}$ is the i.i.d. complex additive white Gaussian noise (AWGN) vector with zero mean and covariance matrix $\sigma^2 \mathbf{I}_M$.

The optimal detector for MIMO systems is the ML detector, which searches all possible combinations of transmitted symbols via the following criterion [10]

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in S} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2, \tag{2.3}$$

where $S = O^N$ denotes the set of all possible transmitted symbol vectors and $O$ is the modulation symbol alphabet set with a size of $M_c$. The computational complexity of ML detection grows exponentially with $N$. Therefore, it is difficult to implement at the receiver in practice.

The basic idea of the SDA is to restrict the search region of the optimal solution to a smaller subset. Typically, the search region is constrained to the interior of a hyper-sphere of radius $d$ centered around the received signal $\mathbf{y}$ as described by [10]

$$d^2 \ge \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2. \tag{2.4}$$

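First, performing the complex QR decomposition of the channel matrix produces

$$\mathbf{H} = [\mathbf{Q}_1\ \ \mathbf{Q}_2] \begin{bmatrix} \mathbf{R} \\ \mathbf{0} \end{bmatrix}, \tag{2.5}$$

where $\mathbf{Q}_1 \in \mathbb{C}^{M \times N}$ and $\mathbf{Q}_2 \in \mathbb{C}^{M \times (M-N)}$ have orthonormal columns, so that $[\mathbf{Q}_1\ \ \mathbf{Q}_2]$ is unitary, and $\mathbf{R}$ is an $N \times N$ upper triangular matrix. Substituting (2.5) into (2.4), we have

$$(d')^2 \ge \|\mathbf{y}' - \mathbf{R}\mathbf{x}\|^2, \tag{2.6}$$

where $\mathbf{y}' = \mathbf{Q}_1^H \mathbf{y}$ and $(d')^2 = d^2 - \|\mathbf{Q}_2^H \mathbf{y}\|^2$.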

The right-hand side of (2.6) can be expanded as

$$\begin{aligned} (d')^2 &\ge \|\mathbf{y}' - \mathbf{R}\mathbf{x}\|^2 = \sum_{i=1}^{N} \Big| y'_i - \sum_{j=i}^{N} r_{i,j} x_j \Big|^2 \\ &= |y'_N - r_{N,N} x_N|^2 + |y'_{N-1} - r_{N-1,N} x_N - r_{N-1,N-1} x_{N-1}|^2 + \cdots \\ &= r_{N,N}^2 |y''_N - x_N|^2 + r_{N-1,N-1}^2 |y''_{N-1} - x_{N-1}|^2 + \cdots, \end{aligned} \tag{2.7}$$

where

$$y''_i = \frac{1}{r_{i,i}} \Big( y'_i - \sum_{j=i+1}^{N} r_{i,j} x_j \Big).$$

Define the path weight $P_k$ and branch weight $B_k$ of the kth layer as

$$P_k = \begin{cases} 0, & \text{for } k = N+1, \\ P_{k+1} + B_k, & \text{for } 1 \le k \le N, \end{cases} \tag{2.8}$$

$$B_k = r_{k,k}^2 |y''_k - x_k|^2. \tag{2.9}$$

The path weight $P_k$ is the partial Euclidean distance (PED), which is non-negative and non-decreasing as the search descends from layer $N$ to layer 1. The iterative search for the candidates $x_N, x_{N-1}, \ldots, x_2, x_1$ can be easily transformed into a tree search problem [10]. The decoding process of the K-Best SDA can then be regarded as descending a tree in which each parent node has $M_c$ branches.

The main idea of the K-Best SDA is to keep only the K candidates with the smallest path weights as the most promising solutions. The procedure of the complex K-Best SDA is summarized as follows:

Step 1:

(a) Set $k = N$. For each symbol in the complex-plane constellation, calculate $P_N = B_N$.

(b) Choose the K symbols having the smallest path weights.

Step 2:

(a) $k \leftarrow k - 1$.

(b) Path evaluation: For each partial symbol vector that survives the previous layer and for each symbol in the complex-plane constellation, calculate $P_k = P_{k+1} + B_k$.

(c) Sorting and candidate selection: Sort the $KM_c$ PEDs, and select the K partial nodes having the smallest PEDs among the entire candidate set.

Step 3:

If $k = 1$, output the vector with the smallest path weight as the estimated solution. Otherwise, go back to Step 2.
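The three steps above can be sketched in software as follows. This is a minimal illustration rather than the hardware-oriented design developed later in this chapter. It assumes that the QR decomposition has already been applied, so the inputs are the upper-triangular $\mathbf{R}$ and the rotated vector $\mathbf{y}'$; the function name `k_best_detect` and its arguments are hypothetical.

```python
def k_best_detect(R, y, constellation, K):
    """Breadth-first K-Best search on the triangular model y' = R x + noise.

    R: N x N upper-triangular matrix (list of lists of complex numbers).
    y: rotated received vector y' = Q1^H y.
    K: number of survivors kept per layer.
    Indices are 0-based, so layer k = N in the text is row N - 1 here.
    """
    N = len(R)
    # Each survivor is (path weight, symbols already chosen for the upper layers).
    survivors = [(0.0, ())]
    for k in range(N - 1, -1, -1):  # Step 2(a): move one layer down the tree
        expanded = []
        for weight, tail in survivors:
            # Interference from the symbols already decided in upper layers.
            interference = sum(R[k][k + 1 + j] * s for j, s in enumerate(tail))
            for symbol in constellation:
                # Branch weight |y'_k - sum_j r_kj x_j|^2, equal to (2.9).
                branch = abs(y[k] - interference - R[k][k] * symbol) ** 2
                expanded.append((weight + branch, (symbol,) + tail))
        # Step 2(c): sort the expanded PEDs and keep the K smallest.
        expanded.sort(key=lambda node: node[0])
        survivors = expanded[:K]
    best_weight, best_vector = survivors[0]  # Step 3
    return list(best_vector), best_weight


# Usage: noise-free 2 x 2 example with 16-QAM.
qam16 = [complex(a, b) for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)]
R = [[2, 1 + 1j], [0, 1.5]]
x = [3 - 1j, -1 + 3j]
y = [sum(R[i][j] * x[j] for j in range(2)) for i in range(2)]
x_hat, weight = k_best_detect(R, y, qam16, K=16)
```

Note that this reference version evaluates and sorts all $KM_c$ PEDs in every layer, which is exactly the cost that the remainder of the chapter sets out to avoid.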

In (2.8)-(2.9), path weights are defined for a given candidate symbol vector $\mathbf{x}$. When performing the decoding procedure of Step 2, multiple candidate symbols need to be evaluated concurrently to find the optimal solution. Therefore, a multi-index notation is needed, and Step 2 can be further elaborated as follows:

Let $P_i^1, P_i^2, \ldots, P_i^K$ denote the K smallest PEDs in the ith layer, where $P_i^1 \le P_i^2 \le \cdots \le P_i^K$. To perform the search in the $(i-1)$th layer, first conduct full path expansion from the K parent nodes to obtain the $KM_c$ branch weights $B_{i-1}^{1,1}, B_{i-1}^{1,2}, \ldots, B_{i-1}^{1,M_c}, \ldots, B_{i-1}^{K,1}, B_{i-1}^{K,2}, \ldots, B_{i-1}^{K,M_c}$ and PEDs $P_{i-1}^{1,1}, P_{i-1}^{1,2}, \ldots, P_{i-1}^{1,M_c}, \ldots, P_{i-1}^{K,1}, P_{i-1}^{K,2}, \ldots, P_{i-1}^{K,M_c}$, respectively, where $B_{i-1}^{m,n}$ and $P_{i-1}^{m,n}$ denote the branch weight and the PED of the nth path expanded from the mth parent node. The associated PED of each designated node can be evaluated according to $P_{i-1}^{m,n} = P_i^m + B_{i-1}^{m,n}$. Next, sort the $KM_c$ PEDs, and select the K partial nodes having the smallest PEDs among the whole candidate set. The above operations are illustrated in Figure 2-1.

Figure 2-1: Illustration of the multi-index operation.

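A popular alternative to the complex K-Best SDA works in the real domain by performing real-valued decomposition (RVD) on the complex signal model:

$$\tilde{\mathbf{y}} = \begin{bmatrix} \mathrm{Re}(\mathbf{y}) \\ \mathrm{Im}(\mathbf{y}) \end{bmatrix}, \quad \tilde{\mathbf{x}} = \begin{bmatrix} \mathrm{Re}(\mathbf{x}) \\ \mathrm{Im}(\mathbf{x}) \end{bmatrix}, \quad \tilde{\mathbf{n}} = \begin{bmatrix} \mathrm{Re}(\mathbf{n}) \\ \mathrm{Im}(\mathbf{n}) \end{bmatrix}, \quad \tilde{\mathbf{H}} = \begin{bmatrix} \mathrm{Re}(\mathbf{H}) & -\mathrm{Im}(\mathbf{H}) \\ \mathrm{Im}(\mathbf{H}) & \mathrm{Re}(\mathbf{H}) \end{bmatrix}, \tag{2.10}$$

which yields

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \tilde{\mathbf{n}}, \tag{2.11}$$

where $\tilde{\mathbf{H}} \in \mathbb{R}^{2M \times 2N}$, $\tilde{\mathbf{y}} \in \mathbb{R}^{2M \times 1}$, $\tilde{\mathbf{n}} \in \mathbb{R}^{2M \times 1}$, and $\tilde{\mathbf{x}} \in L^{2N \times 1} \subset \mathbb{Z}^{2N \times 1}$. Note that $L = \{-3, -1, 1, 3\}$ for 16-QAM and $L = \{-7, -5, -3, -1, 1, 3, 5, 7\}$ for 64-QAM.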
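As a small illustration of (2.10)-(2.11), the sketch below builds the real-valued channel matrix and received vector from their complex counterparts using plain Python lists. The function name `real_valued_decomposition` is hypothetical and not taken from the text.

```python
def real_valued_decomposition(H, y):
    """Return (H_r, y_r) following (2.10).

    H_r = [[Re H, -Im H], [Im H, Re H]] has size 2M x 2N and
    y_r = [Re y; Im y] has length 2M.
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_r = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_r


# Usage: a 1 x 1 noise-free system with x = 3 - j.
H = [[1 + 2j]]
x = [3 - 1j]
y = [H[0][0] * x[0]]
H_r, y_r = real_valued_decomposition(H, y)
x_r = [x[0].real, x[0].imag]
# The real model reproduces the complex one: H_r x_r equals y_r.
reconstructed = [sum(H_r[i][j] * x_r[j] for j in range(2)) for i in range(2)]
```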

After RVD, each component $\tilde{x}_i$ of $\tilde{\mathbf{x}}$ is chosen from a set $L$ of integer numbers with $\sqrt{M_c}$ elements, and the detection problem can be solved in the real domain using the same K-Best algorithm. This is denoted as the conventional K-Best SDA. In [28]-[29], it is shown that the conventional K-Best SDA slightly outperforms the complex K-Best SDA and requires lower complexity. However, the conventional K-Best SDA may not always be applicable in communication systems with special diversity features. Modified K-Best SDAs aim to reduce decoding complexity, but usually introduce performance degradation, which is more significant in the complex domain [30]-[31]. These observations motivate the development of a low-complexity and high-performance K-Best SDA operating directly in the complex domain.

2.3 Proposed Sorting Algorithm and Hardware Architecture

This section proposes a complex K-Best sphere decoder that achieves the same performance as the conventional K-Best SDA with lower complexity. As described in the previous procedural summary, the K-Best SDA involves three major operations: path evaluation, sorting, and candidate selection. In the following, new algorithms for sorting and candidate selection are developed to reduce the number of computations. The path evaluation part remains unchanged so that the decoding performance of the K-Best SDA can be maintained.

2.3.1 Candidate Sequence Generator in Complex Plane

To search the symbols efficiently in the complex plane, it is useful to construct a table of candidate symbol sequences within a given region [14]. First, a primitive block is defined to be a square block bounded by $\{1+j, 1-j, -1+j, -1-j\}$. The complex plane can be regarded as consisting of many primitive blocks placed at equal distances. In Figure 2-2, a received symbol is located at $y''_i$, surrounded by the four nearest candidate symbols 41, 42, 49 and 50 in the constellation diagram. A candidate symbol sequence, 49-50-41-42, can then be formed according to their distance from $y''_i$ in ascending order. Consider then the square area centered at the origin and surrounded by the candidate symbols 27, 28, 35 and 36. Shifting the symbols 41, 42, 49 and 50 to the symbols 27, 28, 35 and 36, respectively, a location $y''_{i,M}$ corresponding to $y''_i$ can be identified. A new candidate symbol sequence, 35-36-27-28, can be identified likewise according to their distance from $y''_{i,M}$ in ascending order. Apparently, the relation in terms of the distance from $y''_i$ to nearby candidate symbols remains unchanged after the coordinate transformation. On the other hand, since the symbols are placed symmetrically in the complex plane, once the relation between a received symbol and the associated candidate symbol sequence in one of the four quadrants is obtained, those in the other three quadrants can be readily derived. Next, Figure 2-3 shows quadrant I of the solid-line square area in Figure 2-2. The area is divided into 30 segments (the partitioning of the specified square area is explained later). It can be verified that all symbols inside any given segment share the same candidate symbol sequence of k symbols.

[Figure 2-2 plots the QAM constellation on In-Phase and Quadrature axes with ticks at -1, 1, 3 and 5; the visible candidate symbols are numbered 25-28, 33-36, 41-44 and 49-52.]

Figure 2-2: Modulo operation of the search center.

Figure 2-3: Partition of the search segments.

For example, consider two symbols "c" and "d" inside segment 01, and evaluate the distances between all valid candidate symbols and the two points. It is straightforward to verify that the resulting two candidate sequences are identical, i.e., $\{1+j, -1+j, 1-j, -1-j, 1+j3, -1+j3, 3+j, -3+j, 3-j, -3-j, 1-j3\}$. For other segments, the same result applies.

Using the above properties, we can construct a table of candidate symbol sequences of the k nearest constellation symbols for all symbols bounded by $\{1+j, 1-j, -1+j, -1-j\}$ instead of generating approximated path weights [30]-[31]. Due to the symmetry of 16-QAM and 64-QAM, a simple transformation allows symbols in the region bounded by $\{1+j, 1-j, -1+j, -1-j\}$ in quadrants II, III, and IV to use the same table as quadrant I. Any symbol located within the bounded region is first mapped to quadrant I by a simple transformation. The transformed result acts as the search center for finding the k nearest candidate symbols by looking up the table of the symbol sequences, where k is a specified number. When the candidate symbol sequence $\{x_i\}$ is found, it can easily be transformed back to the original quadrant. Figure 2-3 shows the partition of the search segments in quadrant I, and the corresponding symbol sequences are listed in Table 2-1, where k = 11 is chosen as an example. This table can be constructed in advance by the following off-line procedure:

First, the square area bounded by $\{1+j, 1, j, 0\}$ is divided into $u^2$ grids by $(u-1)$ equally spaced horizontal and $(u-1)$ equally spaced vertical lines, where u is chosen according to the required resolution. The distances between all valid candidate symbols and the center of each grid, which represents all possible received symbols within it, are then evaluated. Next, by using a sorting procedure, the associated candidate sequence of any possible received symbol can be determined. Finally, all these possible symbols are grouped into search segments such that all symbols in a segment share the same candidate sequence. With this approach, any pre-defined constellation rotation can be handled easily during run-time processing. The following describes the run-time operation in detail:

Table 2-1: List of candidate sequences

Segment ID    Candidate Sequence
01            {1+j, -1+j, 1-j, -1-j, 1+j3, -1+j3, 3+j, -3+j, 3-j, -3-j, 1-j3}
02            {1+j, -1+j, 1-j, -1-j, 1+j3, -1+j3, 3+j, -3+j, 3-j, -3-j, 3+j3}
:             :
29            {1+j, 1-j, -1+j, 3+j, -1-j, 3-j, 1+j3, -1+j3, 3+j3, 1-j3, -1-j3}
30            {1+j, 1-j, -1+j, 3+j, -1-j, 3-j, 1+j3, 1-j3, -1+j3, 3+j3, -1-j3}
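The off-line construction of such a table can be sketched as follows. This is only an illustration under stated assumptions: the candidate set is taken to be the 16 points with coordinates in {-3, -1, 1, 3}, grid cells are identified by their (row, column) index, ties between equidistant candidates are broken by list order, and the function name `build_candidate_table` is hypothetical. The number of segments it finds depends on u and on these assumptions, so it is not claimed to reproduce the 30 segments of Figure 2-3.

```python
from collections import defaultdict


def build_candidate_table(u=32, k=11, levels=(-3, -1, 1, 3)):
    """Group the u*u grid cells of the unit square by their k-nearest sequence.

    Returns a dict mapping each candidate sequence (a tuple of k complex
    symbols in ascending distance) to the list of grid cells sharing it.
    """
    candidates = [complex(a, b) for a in levels for b in levels]
    segments = defaultdict(list)
    for row in range(u):
        for col in range(u):
            # The cell center represents every received symbol in the cell.
            center = complex((col + 0.5) / u, (row + 0.5) / u)
            ordered = sorted(candidates, key=lambda s: abs(center - s))
            segments[tuple(ordered[:k])].append((row, col))
    return segments


table = build_candidate_table(u=8, k=11)
```

Each key of the returned dictionary plays the role of one row of Table 2-1, and the cells listed under it form the corresponding search segment.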

For any given search center $y''_i$ in the complex plane, the CSG first maps it to the relative position $y''_{i,M}$, which lies inside the region bounded by $\{1+j, 1-j, -1-j, -1+j\}$. This modulo operation is depicted in Figure 2-2, and the associated relationship is described as follows:

For $\mathrm{Re}(y''_i)$:

$$\begin{cases} \text{X\_offset} = \lfloor \mathrm{Re}(y''_i) \rfloor + \mathrm{mod}\big(\lfloor \mathrm{Re}(y''_i) \rfloor, 2\big), \\ \mathrm{Re}(y''_{i,M}) = \mathrm{Re}(y''_i) - \text{X\_offset}. \end{cases} \tag{2.12}$$

For $\mathrm{Im}(y''_i)$:

$$\begin{cases} \text{Y\_offset} = \lfloor \mathrm{Im}(y''_i) \rfloor + \mathrm{mod}\big(\lfloor \mathrm{Im}(y''_i) \rfloor, 2\big), \\ \mathrm{Im}(y''_{i,M}) = \mathrm{Im}(y''_i) - \text{Y\_offset}. \end{cases} \tag{2.13}$$
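The modulo operation of (2.12)-(2.13) can be checked numerically with a few lines of code. The sketch below is a floating-point illustration, not the fixed-point circuit of Figure 2-4, and the function name `fold_to_primitive_block` is hypothetical.

```python
import math


def fold_to_primitive_block(y):
    """Map a search center into the primitive block around the origin.

    Returns the folded point and the (X_offset, Y_offset) pair, which is
    always a pair of even integers.
    """
    fx = math.floor(y.real)
    fy = math.floor(y.imag)
    x_offset = fx + fx % 2  # (2.12): floor plus (floor mod 2)
    y_offset = fy + fy % 2  # (2.13)
    folded = complex(y.real - x_offset, y.imag - y_offset)
    return folded, (x_offset, y_offset)


folded, offsets = fold_to_primitive_block(3.3 - 1.5j)
```

For the search center 3.3 - 1.5j, the offsets are (4, -2) and the folded point is approximately -0.7 + 0.5j, which lies inside the primitive block. Adding the offsets back to a candidate found around the origin restores its position in the original constellation.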

Figure 2-4(a) shows the modulo unit of $\mathrm{Re}(y''_i)$ based on the 2's complement property, which is efficiently implemented by a single adder and a few bit manipulations. $S$ is the sign bit (i.e., the MSB) of $\mathrm{Re}(y''_i)$ and $b_0$ is the LSB of the integer part of $\mathrm{Re}(y''_i)$. Since the modulo operation of $\mathrm{Im}(y''_i)$ is the same as that of $\mathrm{Re}(y''_i)$, the modulo circuits of $\mathrm{Im}(y''_i)$ and $\mathrm{Re}(y''_i)$ are identical.

In the next step, if $y''_{i,M}$ lies in quadrant II, III or IV, the CSG unit maps $y''_{i,M}$ into quadrant I by rotating it by $\pi/2$, $\pi$ and $3\pi/2$, respectively. Figure 2-4(b) shows this transformation circuit. The multiplexers choose the appropriate data path based on the sign bits of $\mathrm{Re}(y''_{i,M})$ and $\mathrm{Im}(y''_{i,M})$. The coordinates dx_t and dy_t denote the transformed values of $\mathrm{Re}(y''_{i,M})$ and $\mathrm{Im}(y''_{i,M})$, respectively.

Figure 2-4: (a) Modulo unit of $\mathrm{Re}(y''_i)$. (b) Transformation unit of $y''_{i,M}$.
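The quadrant mapping performed by the transformation unit can be illustrated in software. The sketch below assumes that the rotations are clockwise, so that they correspond to multiplying by -j, -1 and j for quadrants II, III and IV; the text does not state the direction, and the function name `to_quadrant_one` is hypothetical. The second return value is the inverse rotation used to bring a found candidate back to its original quadrant.

```python
def to_quadrant_one(y):
    """Rotate a point of the primitive block into quadrant I.

    Returns (rotated point, inverse rotation factor). Multiplying a symbol
    found in quadrant I by the inverse factor restores the original quadrant.
    """
    if y.real >= 0 and y.imag >= 0:
        rotation = 1 + 0j   # quadrant I: no rotation
    elif y.real < 0 and y.imag >= 0:
        rotation = -1j      # quadrant II: clockwise by pi/2
    elif y.real < 0 and y.imag < 0:
        rotation = -1 + 0j  # quadrant III: rotation by pi
    else:
        rotation = 1j       # quadrant IV: clockwise by 3*pi/2
    # A unit-modulus rotation is inverted by its complex conjugate.
    return y * rotation, rotation.conjugate()


dx_dy, inverse = to_quadrant_one(-0.7 + 0.5j)
```

Here `dx_dy` equals 0.5 + 0.7j, whose real and imaginary parts play the roles of dx_t and dy_t.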

The pair (dx_t, dy_t) is sent to the candidate generator unit to generate the desired candidate sequence. The segment identification (ID) and its corresponding candidate sequence are stored in ROM 1 and ROM 2, respectively, as shown in Figure 2-5, where the hardware architecture of the candidate generator is depicted. The found candidate symbol is first rotated into its original quadrant, and then the offset pair (X_offset, Y_offset) is added to the coordinates of the found candidate symbol. After the constellation restoration, the constellation boundary checker checks whether or not the found symbol lies inside the constellation boundary. If the restored symbol is a legal one, the distance calculator calculates the value of $|y''_i - x_i|^2$. Multiplying $|y''_i - x_i|^2$ by $r_{i,i}^2$ and adding the parent weight $P_{i+1}$ to the product, we obtain the path weight $P_i$ of the found symbol. For each given received symbol, the CSG can thus efficiently generate the coordinate pairs of valid candidates and the associated path weights in ascending order of path weight. From Figure 2-5, the major hardware cost of the CSG involves 3 multipliers, 12 adders, and 2 ROMs. The ROM sizes (number of logic gates) are 2116 (ROM 1, with u = 32) and 731 (ROM 2), respectively, according to the Synopsys® synthesis tools.

For any given symbol and its neighbors, which share the same candidate sequence, the candidate sequence is generated from the k nearest constellation symbols by sorting their relative distances to the search center, though these distance values differ from one search center to another. The proposed CSG utilizes this property to generate a candidate sequence in ascending order, and calculates the associated path weights so as to avoid a heavy load of path weight evaluations and sorting. Based on this concept, we can choose an appropriate k to fit the system requirement. The ROM size expands quickly when a large value of k is chosen. To remedy this, we can divide k into a set $\{k_i\}_{i=1}^{p}$, where $k = \sum_{i=1}^{p} k_i$.

Figure 2-5: Hardware architecture of the candidate generator.

2.3.2 Architecture of Highly-Parallel Comparison Circuit (HPCC)

The sorting operations in the K-Best decoder dominate the complexity at each search layer. Hence, sorting is a critical factor in reducing the complexity of the K-Best SDA. The CSG module proposed above can be applied to the K-Best SDA by exploiting the inherent partial orders that come with the CSG output. This can be efficiently accommodated by applying the K-merge algorithm [30], [33]. For a more practical implementation, an efficient architecture that can effectively reduce the sorting complexity is needed.

Recall the definitions of branch weights and PEDs in Section 2.2. Let $P_i^1, P_i^2, \ldots, P_i^K$ denote the K smallest PEDs in the ith layer. After full path expansion from layer $i$, we have $KM_c$ PEDs $P_{i-1}^{1,1}, P_{i-1}^{1,2}, \ldots, P_{i-1}^{1,M_c}, \ldots, P_{i-1}^{K,1}, P_{i-1}^{K,2}, \ldots, P_{i-1}^{K,M_c}$, where $P_{i-1}^{m,n}$ stands for the PED of the nth path expanded from the mth parent node at layer $i$. Moreover, based on the sorted results from the ith layer and the generated sequence from the proposed CSG module, we have $P_i^1 < P_i^2 < \cdots < P_i^K$ and $P_{i-1}^{j,1} < P_{i-1}^{j,2} < \cdots < P_{i-1}^{j,k}$ for each $1 \le j \le K$. Selecting the node with the smallest PED from the set $\{P_{i-1}^{1,1}, P_{i-1}^{2,1}, \ldots, P_{i-1}^{K,1}\}$ is therefore equivalent to finding the smallest PED in the full path expansion set containing $KM_c$ nodes. These operations are illustrated in Figure 2-6. Exploiting these properties instead of using traditional sorting algorithms, we can realize an efficient comparison architecture for the K-Best sorting at each stage that avoids full path evaluation and significantly reduces the sorting complexity. Figure 2-7 depicts this hardware architecture, and the following describes its operation.

Figure 2-7: HPCC architecture.

The output sequence of the CSG module naturally forms a set in ascending order according to the evaluated PEDs while performing the Nth layer search. We therefore only need to conduct a single coordinate transformation and K path weight calculations. The generated results serve as the parent nodes of the next layer. To search in the $(i-1)$th layer, we first calculate $\{P_{i-1}^{1,1}, P_{i-1}^{2,1}, \ldots, P_{i-1}^{K,1}\}$ and feed them into the HPCC. The candidate node with the smallest PED among these candidates is obtained immediately after $(K-1)$ compare-and-select (CAS) operations. If the chosen node comes from the pth parent node, then the PED $P_{i-1}^{p,2} = P_i^p + B_{i-1}^{p,2}$ is calculated, overwriting the previously chosen node. The node with the 2nd smallest PED is obtained after $\log_2 K$ CAS operations (only $\log_2 K$ results need to be re-computed). Repeating this procedure, we can successfully select the K candidate nodes with the smallest PEDs from the entire valid candidate set. The survival set acts as the parent nodes of the $(i-2)$th layer. In searching the nodes in each layer, we use K coordinate transformations, $(2K-1)$ PED evaluations, and $(K-1)(1+\log_2 K)$ CAS operations. Note that the computational complexity of this approach is nearly fixed and independent of the constellation size $M_c$ of the transmitted symbols. Furthermore, the nodes in the survival set still exhibit an ascending order according to their PEDs. In the final search layer, i.e., the 1st layer, we only need to choose the node with the smallest PED as the detection result. Hence, it takes only K coordinate transformations, K PED evaluations, and $(K-1)$ CAS operations.

Compared with the winner path expansion method [34]-[35], the proposed architecture, which is also frequently found in Viterbi decoders for choosing the minimal path metric, can avoid performing unnecessary operations thanks to the property of parallel computation. Moreover, it requires a smaller number of CAS operations, $(K-1)$, than the conventional bubble sort method (K).
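The selection rule described above (compare only the current best child of each parent, then replace the winner by that parent's next child) can be mimicked in software with a binary heap. The sketch below is a software analogue, not the HPCC circuit itself: a heap stands in for the compare-and-select tree, and the function name `select_k_best` and its list-based inputs are assumptions made for illustration.

```python
import heapq


def select_k_best(parent_peds, child_branch_weights, K):
    """Pick the K smallest PEDs without expanding every child.

    parent_peds[m] is the PED of parent m. child_branch_weights[m] lists the
    branch weights of parent m's children in ascending order, as a CSG-like
    generator would deliver them. Returns (ped, parent, child_rank) tuples
    in ascending order of PED.
    """
    # Start with the best child of every parent: K PED evaluations.
    heap = [(parent_peds[m] + child_branch_weights[m][0], m, 0)
            for m in range(len(parent_peds))]
    heapq.heapify(heap)
    survivors = []
    while heap and len(survivors) < K:
        ped, m, n = heapq.heappop(heap)
        survivors.append((ped, m, n))
        # Evaluate the winner's next child only if more survivors are needed.
        if len(survivors) < K and n + 1 < len(child_branch_weights[m]):
            next_ped = parent_peds[m] + child_branch_weights[m][n + 1]
            heapq.heappush(heap, (next_ped, m, n + 1))
    return survivors


parents = [1.0, 2.0]
branches = [[0.5, 3.0, 4.0], [0.125, 0.25, 5.0]]
best = select_k_best(parents, branches, K=3)
```

With K parents, this evaluates at most 2K - 1 PEDs per layer, in line with the count given in the text, because a new PED is computed only when its parent has just won a comparison.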

2.3.3 Complexity Advantages

Through the combination of the two proposed modules, we only need K coordinate transformations, $(2K-1)$ PED evaluations, and $(K-1)(1+\log_2 K)$ CAS operations in each layer to obtain the K nodes with the smallest PEDs, regardless of the constellation size. These PEDs only need to be calculated when they are fed into the HPCC. Hence, the proposed architecture avoids the exhaustive path weight evaluations required in the conventional bubble sort architecture.
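To give a feel for these counts, the short sketch below evaluates them for a chosen K and compares the PED evaluations with the $KM_c$ evaluations of a full path expansion. The function names are hypothetical, and K is assumed to be a power of two so that $\log_2 K$ is an integer.

```python
import math


def proposed_ops_per_layer(K):
    """Per-layer operation counts quoted in the text for the CSG plus HPCC."""
    log2_k = int(math.log2(K))  # assumes K is a power of two
    return {
        "coordinate_transformations": K,
        "ped_evaluations": 2 * K - 1,
        "cas_operations": (K - 1) * (1 + log2_k),
    }


def full_expansion_ped_evaluations(K, Mc):
    """PED evaluations when every parent expands all Mc children."""
    return K * Mc


counts = proposed_ops_per_layer(16)
full = full_expansion_ped_evaluations(16, 64)
```

For K = 16 and 64-QAM, the proposed counts are 16 coordinate transformations, 31 PED evaluations and 75 CAS operations per layer, against 1024 PED evaluations for a full expansion.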

Previous methods attempt to reduce computational complexity by reducing the number of visited nodes based on the probability or statistical properties of the additive noise. These methods provide an approximate solution, and trade decoding performance for complexity reduction. As an alternative, this chapter presents another way to reduce complexity while preserving high-quality decoding results. The proposed approach utilizes operation decomposition, reconstruction, and an associated efficient hardware architecture to select and evaluate only the most promising candidate symbols. The proposed method also significantly reduces computational complexity and provides an efficient solution with a nearly fixed throughput. These advantages are further enhanced when a larger constellation size is adopted. Although the proposed method incurs the extra cost of coordinate transformation and restoration, it eliminates many path calculations and sorting operations, and provides the same performance as the conventional K-Best SDA.

2.4 Proposed Search Strategy for Near-ML Performance

One way to reduce the complexity of the conventional K-Best SDA is to choose a smaller number of survival nodes in each layer. However, this can cause performance degradation in terms of error rate. Instead of choosing a sufficiently large K to achieve near-ML performance, a new search strategy is proposed. The proposed search strategy preserves all candidate symbols and performs the ML search in the preceding layers when dealing with poor channel conditions. Only K candidates are kept for the remaining lower layers. The following sections show how to determine the number of layers performing the ML search.

2.4.1 Preprocessing with Column Permutation

The channel matrix can be preprocessed with various techniques to reduce the complexity of the candidate search and/or improve the performance of the K-Best SDA. Many preprocessing techniques can be used for this purpose, including column permutation [13], scaling [36] and lattice reduction [37]. In this chapter, column permutation is adopted, in which the permutation order is determined according to the column norms of the channel matrix in ascending order. Given the QR decomposition of the ordered channel matrix $\mathbf{H}_o = \mathbf{Q}_o \mathbf{R}_o$, we characterize below the cumulative distribution function (cdf) of the square of the diagonal entries of $\mathbf{R}_o$, denoted by $r_{o,i,i}^2$ (see Appendix A):

for $i = 1$

$$F_{r_{o,i,i}^2}(r) = \frac{N!}{(N-1)!\,(M-1)!} \int_0^r \left[ e^{-x} \sum_{k=0}^{M-1} \frac{x^k}{k!} \right]^{N-1} \cdot x^{M-1} e^{-x}\, dx, \tag{2.14}$$

for $2 \le i \le N$

$$F_{r_{o,i,i}^2}(r) = C_{ii} \int_0^1 \int_0^{r/s} \left[ 1 - e^{-x} \sum_{k=0}^{M-1} \frac{x^k}{k!} \right]^{i-1} \left[ e^{-x} \sum_{k=0}^{M-1} \frac{x^k}{k!} \right]^{N-i} \cdot x^{M-1} e^{-x}\, s^{M-i} (1-s)^{i-2}\, dx\, ds, \tag{2.15}$$

where

$$C_{ii} = \frac{N!}{(N-i)!\,(M-i)!\,(i-1)!\,(i-2)!}. \tag{2.16}$$
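As a quick consistency check on the $i = 1$ case, two special cases can be worked out by hand. The check assumes, as in Section 2.2, that the channel entries are i.i.d. zero-mean, unit-variance complex Gaussian, so that each squared column norm has density $x^{M-1} e^{-x} / (M-1)!$, and that $r_{o,1,1}^2$ is the smallest of the $N$ squared column norms under ascending ordering.

```latex
% Case N = 1 (no ordering): the bracketed term has exponent N - 1 = 0, so
\[
F(r) = \frac{1}{(M-1)!} \int_0^r x^{M-1} e^{-x}\, dx
     = 1 - e^{-r} \sum_{k=0}^{M-1} \frac{r^k}{k!},
\]
% which is the cdf of a single squared column norm.

% Case M = 1 (scalar columns): the sum reduces to 1, so
\[
F(r) = N \int_0^r \left[ e^{-x} \right]^{N-1} e^{-x}\, dx
     = 1 - e^{-N r},
\]
% which is the cdf of the minimum of N i.i.d. unit-mean exponential variables.
```

Both limits agree with the standard order-statistics result for the minimum of $N$ i.i.d. variables, $F_{\min}(r) = 1 - [1 - F(r)]^N$.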

Comparing (2.14)-(2.16) with the results of [13], the ordering mechanism increases $E[r_{i,i}^2]$ in the preceding layers, producing two main benefits. First, for a fixed K in the K-Best SDA, increasing $E[r_{i,i}^2]$ in the preceding layers reduces the effective search range of the candidates. This in turn reduces the probability of the ML solution being dropped in the preceding layers. Another benefit is that it constrains the growth of the tree and hence reduces search complexity.

