Low-Complexity Soft-Output Sphere Decoding with Modified Repeated Tree Search Strategy

Student: Rong-Dong Chiu

Advisor: Prof. Po-Ning Chen

Institute of Communications Engineering

National Chiao Tung University

Master's Thesis

A Thesis

Submitted to Institute of Computer and Information Science, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Communication Engineering

June 2013

Hsinchu, Taiwan, Republic of China

ABSTRACT

Various detectors for multiple-input multiple-output (MIMO) technologies have been proposed, yet achieving a good complexity-performance tradeoff remains a challenging problem. In this thesis, we propose a low-complexity soft-output sphere decoding algorithm, called modified repeated tree search (RTS), that achieves a good complexity-performance tradeoff and is suitable for hardware implementation. We further apply the modified RTS to interference cancellation with unknown interference modulation. Simulation results show that, compared with the traditional generalized likelihood ratio test (GLRT), the proposed method achieves an equally good modulation classification rate and block error rate (BLER) with lower complexity and fewer resources.

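Acknowledgments

First, I sincerely thank my advisor, Dr. Po-Ning Chen, who took me in when I ran into difficulties I could not resolve during my graduate studies. His frequent discussions and guidance toward the right direction benefited me greatly during this time. My teachers' rigor in scholarship is a model for my generation to follow.

The completion of this thesis also owes much to the great assistance and guidance of Dr. 謝欣霖 of the Graduate Institute of Communication Engineering, National Taipei University. Under his guidance, as both a teacher and a friend, I benefited greatly in the mathematical derivations of communication engineering as well as in program simulations. He also taught me to face the setbacks and obstacles of academic research with a positive attitude.

I thank all the senior students, classmates, and junior students of Lab 823. Your help and humor are deeply appreciated, and this thesis could not have been completed smoothly without you. The quiet support of my girlfriend has been the driving force that kept me moving forward, and she showed the greatest understanding and tolerance throughout my bumpy graduate career.

Finally, I dedicate this thesis to my beloved parents.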

List of Figures

2.1 Illustration of a heterogeneous network
2.2 System block diagram
2.3 Structure of a symbol-SIC receiver
2.4 Structure of a codeword-SIC receiver
3.1 Illustration of the second stage with $N_T = 3$ and $\mathbf{b} = [b_1\ b_2\ b_3] = [4\ 3\ 2]$ for the proposed modified RTS. The thick solid line corresponds to the near-ML path obtained from the first stage.
3.2 Impact of $L_{\max}$ on the STS with turbo coding under fast (left subfigure) and slow (right subfigure) Rayleigh fading channels
3.3 Impact of $L_{\max}$ on the modified RTS with turbo coding under fast Rayleigh fading channels, where $\mathbf{b} = [4444]$ and $[2222]$ are employed.
3.4 Impact of $L_{\max}$ on the modified RTS with turbo coding under slow Rayleigh fading channels, where $\mathbf{b} = [4444]$ and $[2222]$ are employed.
3.5 Impact of $L_{\max}$ on the SOCA with turbo coding under fast Rayleigh fading channels, where $b_1 = 16$ and 8 are employed.
3.6 Impact of $L_{\max}$ on the SOCA with turbo coding under slow Rayleigh fading channels, where $b_1 = 16$ and 8 are employed.
3.7 Impact of $T_1$ on BLERs and complexities under fast Rayleigh fading channels.
3.8 Impact of $T_1$ on BLERs and complexities under slow Rayleigh fading channels.
3.9 Performance versus complexity for the STS, the SOCA, and the modified RTS in fast Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$.
3.10 Performance versus complexity for the STS, the SOCA, and the modified RTS with turbo code in slow Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$.
3.11 Performance versus complexity for the STS, the SOCA, and the modified RTS with convolutional code in fast Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$.
3.12 Performance versus complexity for the STS, the SOCA, and the modified RTS with convolutional code in slow Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$.
3.13 Performance versus different percentile-complexity for the STS and the modified RTS with turbo code in fast Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The number next to each modified RTS mark is $T_2$.
3.14 Performance versus different percentile-complexity for the STS and the modified RTS with turbo code in slow Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The number next to each modified RTS mark is $T_2$.
3.15 Complexity distribution of the STS with $L_{\max} = 0.2$. The maximum range of the complexity is 200, where the probability of complexity exceeding 200 is 0.0035.
3.16 Complexity distribution of the modified RTS with $L_{\max} = 0.25$ and $\mathbf{b} = [4444]$.
4.1 Illustration of interference from a macro cell to a user at the cell boundary of a pico cell
4.2 Structure of the proposed soft-output sphere decoding for joint modulation classification and detection
4.3 Illustration of the extended search tree for the 1st stage of the modified RTS for unknown interference modulation
4.4 Illustration of the 2nd stage of the modified RTS for unknown interference modulation
4.5 Comparison of BLERs between correct and erroneous declarations of modulation scheme for interferences
4.6 Modulation classification errors. The modulation schemes of $i_1$ and $i_2$ are set as {16QAM, 16QAM}, {QPSK, 16QAM}, {QPSK, QPSK} from left to right in the three subfigures, respectively.
4.7 Modulation classification error. Modulation schemes of $i_1$ and $i_2$ are randomly chosen from QPSK and 16QAM.
4.8 Classification errors for different $V_{\mathrm{bias}}$ subject to $N = 8$
4.9 Classification errors for different $V_{\mathrm{bias}}$ subject to $N = 16$
4.10 Modulation classification error rate for unfair voting and the modified GLRT
4.11 Average complexity and the numbers of paths stored during the modulation classification stage
4.12 Block error rate of the modified GLRT and the proposed unfair voting


Chapter 1 Introduction

With the increasing demand for system throughput, multiple-input multiple-output (MIMO) techniques have become a trend in current and future communication technologies. Modern communication standards such as IEEE 802.11n, Worldwide Interoperability for Microwave Access (WiMAX), and Long Term Evolution (LTE) all incorporate MIMO techniques.

Along this trend, several linear detection methods, such as the zero-forcing (ZF) and minimum mean square error (MMSE) detectors, have been proposed for MIMO systems. Owing to their linearity, these detectors have low and fixed complexity and hence are easy to implement in hardware. Their performance, however, sacrifices a certain degree of MIMO antenna gain. On the other hand, a brute-force maximum-likelihood (ML) detector finds the optimal codeword at the price of a huge and thus impractical complexity. As a compromise, the so-called sphere decoding (SD) algorithm [2, 3] maintains a good tradeoff between performance and complexity by searching for the best (near-ML) codeword among those within a sphere.

The SD algorithms can be categorized into two classes: hard-output SD algorithms and soft-output SD algorithms. As their names reveal, a hard-output SD algorithm outputs estimates of the information bits, while a soft-output one generates soft likelihood information for each bit, which can be passed to a soft-input outer decoder such as a turbo decoder. The combination of a soft-output SD algorithm and a soft-input outer decoder can then provide a significant performance gain over the hard-output SD algorithm [4, 5, 6].

Generating soft likelihood information for each bit makes a soft-output SD algorithm more complex than a hard-output one. Some methods have thus been proposed to reduce the complexity of the soft-output SD algorithm (with possibly a slight degradation in performance), such as log-likelihood ratio (LLR) clipping and channel matrix regularization [5]. A drawback of these methods, when considering their hardware implementation, is that their complexity varies. The authors in [6] resolved this problem by proposing a fixed-complexity soft-output SD algorithm named smart ordering and candidate adding (SOCA). Notably, the fixed complexity of the SOCA is actually higher than the average complexity of the soft-output SD algorithm in [5]. Nonetheless, its fixed complexity makes the SOCA easier to implement in hardware and hence a better choice in practical applications.

In this thesis, we propose in Chapter 3 a new soft-output SD algorithm, which will be referred to as the modified repeated tree search (RTS) SD algorithm [1]. The proposed modified RTS SD algorithm guarantees a practically low complexity upper limit and hence is suitable for hardware implementation. From simulations, our modified RTS SD algorithm can provide a better performance-complexity trade-off in comparison with the SOCA, and its complexity upper limit is clearly smaller than the 99.9-percentile complexity of the single tree search (STS) SD algorithm in [5], where the 99.9-percentile complexity should be the designed maximum complexity allowable for the hardware implementation of a varying complexity soft-output SD algorithm like the STS.

We next investigate in Chapter 4 interference cancellation (IC) over MIMO systems. Instead of treating the interference as part of the background noise, as the conventional detection problem does, we now attempt to cancel the interference from other users. Usually, the modulation scheme of the interference from other users is unknown to the receiver, so estimating the modulation scheme of the interference becomes essential. In [14], Shim et al. proposed a modified generalized likelihood ratio test (GLRT) to perform IC with an unknown interference modulation scheme, in which a bias term is added to the GLRT quantity to balance the impact of different modulation schemes. We found that the bias term in [14] has not been optimized: an additional 1 dB gain can be obtained at a block error rate of 0.01 if this bias term is optimized. We further found that the proposed modified RTS can be adapted to solve the IC problem with an unknown interference modulation scheme. Details are given in Chapter 4.

In the end, we conclude our thesis in Chapter 5. Possible extension of the proposed modified RTS SD algorithm is also suggested.


Chapter 2 System Model and Background

[Figure 2.1 labels: base station of macro cell (BSM); base station of pico cell (BSP); coverage of macro cell; coverage of pico cell; interference signals; users]

Figure 2.1: Illustration of a heterogeneous network

In a heterogeneous network such as the one in Fig. 2.1, different wireless access technologies may be deployed for macro and pico cells so as to fit their respective characteristics. A macro cell often provides large radio coverage with a high-power base station, while a pico cell is deployed either for areas with a high concentration of users or as a coverage extension for indoor communications. In particular, when pico cells are deployed within macro cells, more radio resources can be allocated, thereby improving the spectrum efficiency and system throughput. These merits, however, unavoidably introduce more inter-cell interference to a single user. Techniques that effectively cancel the inter-cell interference thus become essential in a heterogeneous network. Since users in the pico cell (respectively, macro cell) may not know the modulation schemes of other users in the macro cell (respectively, pico cell), MIMO detection and interference cancellation become even more challenging.

In the remaining sections of this chapter, we formulate the problem on which this thesis focuses and then briefly review the existing techniques that have been proposed to solve it.

2.1 System Model

The MIMO system that we consider in this thesis can be modeled as

$$\mathbf{y} = \mathbf{H}_s\mathbf{P}_s\mathbf{x}_s + \mathbf{H}_i\mathbf{P}_i\mathbf{x}_i + \mathbf{n}, \tag{2.1}$$

where $\mathbf{H}_s$ and $\mathbf{H}_i$ are the channel matrices for the signal and the interference, respectively, and $\mathbf{P}_s$ and $\mathbf{P}_i$ are the corresponding precoder matrices. Here, $\mathbf{x}_s$ is the signal desired by the receiver, and $\mathbf{x}_i$ denotes the interference signal. The lengths of $\mathbf{x}_s$ and $\mathbf{x}_i$ are $N_s$ and $N_i$, respectively. The last term $\mathbf{n}$ represents the usual additive white Gaussian noise (AWGN). Note that in a heterogeneous network such as LTE, in which small cells are densely distributed, the interference power of $\mathbf{H}_i\mathbf{P}_i\mathbf{x}_i$ may be larger than the signal power of $\mathbf{H}_s\mathbf{P}_s\mathbf{x}_s$, which makes cancellation techniques essential.

2.2 Prior Detection Algorithms

2.2.1 Linear Receivers

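Traditional so-called type-I linear receivers for solving MIMO detection problems can be written as

$$\mathbf{w}_{\mathrm{MMSE},1} = \tilde{\mathbf{H}}_s^H\left(\tilde{\mathbf{H}}_s\tilde{\mathbf{H}}_s^H + \mathrm{diag}(\sigma_{IN,i}^2)\right)^{-1}, \tag{2.2}$$

where $\tilde{\mathbf{H}}_s = \mathbf{H}_s\mathbf{P}_s$, and $\sigma_{IN,i}^2$ is the equivalent noise variance that incorporates the interference power. Another MMSE receiver [17], generally classified as type II, considers the impact of interference, and its resulting estimate can be written as

$$\mathbf{w}_{\mathrm{MMSE},2} = \tilde{\mathbf{H}}_s^H\left(\tilde{\mathbf{H}}_s\tilde{\mathbf{H}}_s^H + \tilde{\mathbf{H}}_i\tilde{\mathbf{H}}_i^H + \mathrm{diag}(\sigma_{IN}^2)\right)^{-1}, \tag{2.3}$$

where $\tilde{\mathbf{H}}_i = \mathbf{H}_i\mathbf{P}_i$, and $\sigma_{IN}^2$ is the noise variance. Although optimal from the statistical aspect, the MMSE receivers are known to be sensitive to correlations among antennas [18].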

2.2.2 Sequential Interference Cancellation

When the interference power is larger than the signal power, an alternative nonlinear scheme, sequential interference cancellation (SIC), has been proposed in the literature [15][19]. This method detects and cancels the interferences in sequence, starting from the currently dominant one, and afterwards recovers the desired signals. It can be further categorized into symbol-SIC (S-SIC) and codeword-SIC (C-SIC) [19].

The S-SIC applies interference cancellation independently to each subcarrier, as shown in Fig. 2.3. The S-SIC is known to suffer from serious error propagation when the symbol error rate is high. In certain cases, the performance of the S-SIC may be even worse than that of an MMSE receiver.

To resolve this error-propagation problem, the C-SIC, as illustrated in Fig. 2.4, performs error correction on the interference codeword before interference cancellation, and thus can significantly improve the performance in comparison with the S-SIC. A restriction of the C-SIC scheme is that the code rate and modulation of the interference codeword must be known at the receiver; this, however, is not guaranteed in the current LTE standard.

2.2.3 Sphere Decoding

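A technique that has recently gained much attention in the practice of interference cancellation is sphere decoding (SD). The SD technique can maintain maximum-likelihood performance at a practically acceptable complexity, and hence has recently been used by many researchers for signal detection in MIMO systems. Again, the system model can be expressed as

$$\begin{aligned}
\mathbf{y} &= \mathbf{H}_s\mathbf{P}_s\mathbf{x}_s + \mathbf{H}_i\mathbf{P}_i\mathbf{x}_i + \mathbf{n} \\
&= [\mathbf{H}_s\mathbf{P}_s \;\; \mathbf{H}_i\mathbf{P}_i]\,[\mathbf{x}_s \;\; \mathbf{x}_i]^T + \mathbf{n} \\
&= \mathbf{H}\mathbf{x} + \mathbf{n},
\end{aligned} \tag{2.4}$$

where $\mathbf{H} \triangleq [\mathbf{H}_s\mathbf{P}_s \;\; \mathbf{H}_i\mathbf{P}_i]$ and $\mathbf{x} \triangleq [\mathbf{x}_s \;\; \mathbf{x}_i]^T$.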

Before continuing, we point out the main difference between the system model considered in Chapter 3 and that in Chapter 4. In Chapter 3, we consider only an interference-free scenario, in which there is no interference signal (i.e., $N_i = 0$), while in Chapter 4, the interference $\mathbf{x}_i$ interferes with the demodulation of $\mathbf{x}_s$ and hence $N_i > 0$.

Specifically, in Chapter 3, the symbol vector $\mathbf{x}$ contains only $\mathbf{x}_s$. As such, a MIMO system with $N_T$ transmit layers and $N_R$ receive layers is considered, where $N_T \le N_R$. The symbol transmitted from each antenna represents $Q$ coded information bits; namely, the information bits are mapped to $2^Q$ complex constellation points. For example, $Q = 2$ for QPSK, and $Q = 4$ for 16-QAM. We further assume that the covariance matrix of $\mathbf{x}$ satisfies $\mathrm{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}_{N_T}$, and that each entry in $\mathbf{H}$ is complex Gaussian distributed with mean zero and variance $1/N_T$. The noise $\mathbf{n}$ has independent Gaussian components with mean zero and variance $N_0$. As a result of the above setting, the signal-to-noise ratio (SNR) per receive antenna is exactly $1/N_0$.
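The SNR value can be checked with a short power calculation. The sketch below assumes, in addition to what the text states, that the channel entries $h_{k,j}$ are statistically independent of the symbol vector; $[\cdot]_k$ denotes the $k$th entry, i.e., the $k$th receive antenna.

```latex
\mathrm{E}\bigl\{\,|[\mathbf{H}\mathbf{x}]_k|^2\,\bigr\}
  = \sum_{j=1}^{N_T}\sum_{l=1}^{N_T}
      \mathrm{E}\{h_{k,j}h_{k,l}^{*}\}\,\mathrm{E}\{x_j x_l^{*}\}
  = \sum_{j=1}^{N_T} \mathrm{E}\{|h_{k,j}|^2\}
  = N_T \cdot \frac{1}{N_T} = 1,
\qquad\text{so}\qquad
\mathrm{SNR} = \frac{1}{N_0}.
```

The cross terms vanish because $\mathrm{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}_{N_T}$ gives $\mathrm{E}\{x_j x_l^{*}\} = 0$ for $j \ne l$.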

By contrast, in Chapter 4, we assume that the receiver sees a total of $N_i$ layers of interference and $N_s$ layers of desired signals, where $N_i + N_s = N_T$. Again, the symbol transmitted from each antenna consists of $Q$ coded information bits, and the information bits are mapped to $2^Q$ constellation points. The modulation for $\mathbf{x}_s$ is known to the receiver; however, the modulation for $\mathbf{x}_i$ is only known to belong to a set of candidate schemes. The covariance matrix of $\mathbf{x}$ still satisfies $\mathrm{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}_{N_T}$, which means that signals and interferences have equal power. As in Chapter 3, each entry of the $N_R$-by-$N_T$ matrix $\mathbf{H}$ is assumed to be complex Gaussian distributed with mean zero and variance $1/N_T$, and the additive noise $\mathbf{n}$ has independent complex Gaussian components with mean zero and variance $N_0$. The SNR per receive antenna thus remains $1/N_0$.

Now we return to the introduction of the SD algorithm. The SD algorithm is well suited to MIMO signal detection since it guarantees finding the maximum-likelihood (ML) symbol vector with a considerable reduction of demodulation/decoding complexity in comparison with the brute-force ML detector. The idea behind the SD algorithm is to set a sphere, centered at the received symbol vector, with a properly chosen radius. Only the candidate vectors that lie inside the sphere need to be checked, thereby reducing the complexity.

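The SD algorithm has two steps: 1) a pre-processing step and 2) a tree-search step. The pre-processing step QR-decomposes the channel matrix $\mathbf{H}$:

$$\mathbf{H} = \mathbf{Q}\begin{bmatrix}\mathbf{R} \\ \mathbf{0}_{(N_R-N_T)\times N_T}\end{bmatrix}, \tag{2.5}$$

where $\mathbf{Q}$ is a unitary $N_R \times N_R$ matrix, and $\mathbf{R}$ is an $N_T \times N_T$ upper triangular matrix, to yield a modified input-output relation:

$$\tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y} = \mathbf{Q}^H\mathbf{H}\mathbf{x} + \mathbf{Q}^H\mathbf{n} = \mathbf{R}\mathbf{x} + \tilde{\mathbf{n}},$$

where $\tilde{\mathbf{n}}$ remains independent Gaussian distributed with mean zero and variance $N_0$. In matrix form, this relation can be written as

$$\begin{bmatrix}\tilde{y}_1 \\ \vdots \\ \tilde{y}_{N_T}\end{bmatrix}
= \begin{bmatrix}
r_{1,1} & r_{1,2} & \cdots & r_{1,N_T} \\
0 & r_{2,2} & \cdots & r_{2,N_T} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & r_{N_T,N_T}
\end{bmatrix}
\begin{bmatrix}x_1 \\ \vdots \\ x_{N_T}\end{bmatrix}
+ \begin{bmatrix}\tilde{n}_1 \\ \vdots \\ \tilde{n}_{N_T}\end{bmatrix}.$$

With this modified relation, the tree-search step can be executed according to (2.6) below:

$$\hat{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x}\in\mathcal{O}^{N_T}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{x}\|^2
= \arg\min_{\mathbf{x}\in\mathcal{O}^{N_T}} \sum_{i=1}^{N_T}\left|\tilde{y}_i - \sum_{j=i}^{N_T} r_{i,j}x_j\right|^2. \tag{2.6}$$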

In the literature, three major tree-search algorithms have been proposed, namely the depth-first search [5], breadth-first search [7], and best-first search [8] algorithms. These algorithms basically produce the hard output $\hat{\mathbf{x}}_{\mathrm{ML}}$. In this thesis, we instead investigate the soft-output SD algorithm, which is briefly reviewed in the next section.
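As a minimal sketch of the metric in (2.6), the following Python snippet evaluates the level-by-level distances for an upper triangular matrix and finds the hard-output ML vector by exhaustive enumeration. The function names, the unnormalized QPSK alphabet, and the numerical values are illustrative assumptions and are not taken from the thesis; an actual sphere decoder would prune this enumeration instead of visiting every candidate vector.

```python
from itertools import product


def path_metric(R, y, x):
    """Return sum_i |y_i - sum_{j>=i} R[i][j] * x[j]|^2 for upper triangular R."""
    n = len(x)
    total = 0.0
    for i in range(n):
        residual = y[i] - sum(R[i][j] * x[j] for j in range(i, n))
        total += abs(residual) ** 2
    return total


def ml_detect(R, y, alphabet):
    """Exhaustive ML search over alphabet^n (reference only; cost is |alphabet|^n)."""
    n = len(y)
    return min(product(alphabet, repeat=n), key=lambda x: path_metric(R, y, x))


# Hypothetical toy example: 2 layers, unnormalized QPSK alphabet.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[1.0, 0.5 - 0.2j],
     [0.0, 0.8]]
x_true = (1 - 1j, -1 + 1j)
# Noise-free observation y = R x, so the metric of x_true is zero.
y = [sum(R[i][j] * x_true[j] for j in range(2)) for i in range(2)]
x_hat = ml_detect(R, y, QPSK)
```

Because the sum over $j$ starts at $i$, the last level depends on a single symbol only, which is what allows a tree search to proceed one level at a time.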

2.2.4 Soft-Output Sphere Decoding and Methods for Complexity Reduction

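Denote by $x_{j,b}$ the $b$th bit in the constellation point corresponding to the $j$th entry of vector $\mathbf{x}$. In order to reduce the decoding complexity, the true LLR for bit $x_{j,b}$ is replaced by its Max-Log approximation [5, 9]:

$$L(x_{j,b}) = \min_{\mathbf{x}\in\mathcal{X}_{j,b}^{(0)}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \min_{\mathbf{x}\in\mathcal{X}_{j,b}^{(1)}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2, \tag{2.7}$$

where $\mathcal{X}_{j,b}^{(0)}$ and $\mathcal{X}_{j,b}^{(1)}$ are the sets of vectors that have the $b$th bit in the $j$th entry equal to 0 and 1, respectively. Applying this idea to the QR-decomposition-refined relation $\tilde{\mathbf{y}} = \mathbf{R}\mathbf{x} + \tilde{\mathbf{n}}$, we obtain an equivalent version of Eq. (2.7):

$$L(x_{j,b}) = \min_{\mathbf{x}\in\mathcal{X}_{j,b}^{(0)}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{x}\|^2 - \min_{\mathbf{x}\in\mathcal{X}_{j,b}^{(1)}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{x}\|^2. \tag{2.8}$$

We then solve the above equation by using the SD-based tree search.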
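To make (2.8) concrete, the sketch below computes the Max-Log LLRs by exhaustive enumeration for a toy system. It is a reference computation under stated assumptions, not the tree search itself: the QPSK bit labelling in `LABELS`, the matrix `R`, and all function names are hypothetical. With the sign convention of (2.8), a negative value favors bit 0 and a positive value favors bit 1.

```python
from itertools import product

# Hypothetical QPSK labelling (Q = 2 bits per symbol); any labelling would do.
LABELS = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}


def metric(R, y, x):
    """Squared distance ||y - R x||^2 for upper triangular R."""
    n = len(x)
    return sum(abs(y[i] - sum(R[i][j] * x[j] for j in range(i, n))) ** 2
               for i in range(n))


def max_log_llrs(R, y):
    """Return {(j, b): L(x_{j,b})} following (2.8), by exhaustive enumeration."""
    n = len(y)
    bits_per_symbol = len(next(iter(LABELS)))
    best = {}  # (j, b, bit value) -> smallest metric seen so far
    for labels in product(LABELS, repeat=n):
        d = metric(R, y, [LABELS[lab] for lab in labels])
        for j in range(n):
            for b in range(bits_per_symbol):
                key = (j, b, labels[j][b])
                best[key] = min(best.get(key, float("inf")), d)
    return {(j, b): best[(j, b, 0)] - best[(j, b, 1)]
            for j in range(n) for b in range(bits_per_symbol)}


R = [[1.0, 0.5 - 0.2j], [0.0, 0.8]]
sent = [(0, 1), (1, 0)]  # transmitted bit labels (illustrative)
x = [LABELS[lab] for lab in sent]
y = [sum(R[i][j] * x[j] for j in range(2)) for i in range(2)]  # noise-free
llrs = max_log_llrs(R, y)
```

A tree search obtains the same two minima per bit without enumerating all vectors; when a search skips every vector in one of the two sets, the corresponding minimum is undefined, which is why LLR clipping matters later.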

Several tree traversal strategies have been proposed in the literature. They are respectively described below:

1. Repeated Tree Search (RTS)

The main idea of the RTS [4] is to compute the soft LLR value based on the ML solution found by the hard-output SD algorithm. This strategy may re-do some branch computations, resulting in a significant complexity waste.

2. Single Tree Search (STS)

Compared with the RTS, the STS is much more efficient since every branch in the tree is visited at most once. The STS finds the ML solution via the hard-output SD algorithm and simultaneously identifies the counter-hypothesis paths corresponding to the ML solution. Thus, branch computations are not repeated, which saves significant complexity in comparison with the RTS. Its largely varying complexity, however, may make its hardware implementation a challenging task.

3. Smart Ordering and Candidate Adding (SOCA) Algorithm

Aiming to solve the complexity variation problem of SD algorithms such as the STS, the authors in [6] proposed another algorithm, named the SOCA. By performing the QR decomposition with a smart ordering criterion and searching a predefined number of candidates per layer, the SOCA has a fixed complexity and can hence achieve a good performance-complexity tradeoff.

Complexity-reduction methods such as LLR clipping, LLR sorting, and regularization can also be combined with the above algorithms to further reduce the complexity [4, 5, 6].


[Figure 2.2 block labels: source information, channel encoder, interleaver, modulation, and codeword-to-layer mapping for the $N_s$ desired layers and for the $N_i$ layers of the interference source; $(N_i + N_s) \times N_R$ MIMO channels; sphere decoding, de-interleaver, channel decoder, estimated information]

Figure 2.2: System block diagram


Chapter 3 Soft-Output Sphere Decoding with Modified Repeated Tree Search

In this chapter, we will first investigate the traditional soft-output sphere decoding algorithm for signal detection, followed by our proposal of the modified-RTS soft-output SD algorithm with complexity upper limit.

3.1 Modified RTS Traversal Strategy

Similar to the RTS tree traversal algorithm, our modified RTS algorithm has two stages. The first stage finds the hard-decision ML path, while the second stage examines counter-hypothesis paths to generate the required soft output.

A suitable candidate algorithm for the first stage is the Schnorr-Euchner sphere decoder (SESD) with radius reduction [12], an efficient depth-first tree-search algorithm for finding the ML hard output. Nevertheless, the SESD still has a varying decoding complexity. We therefore propose to set an upper limit $T_1$ such that the first stage ends either when the ML hard output is found or when the complexity upper limit $T_1$ is reached, at which time the current best path is output instead.
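The capped first stage can be sketched as follows. This is a minimal illustration rather than the thesis implementation: it is a generic depth-first search with Schnorr-Euchner ordering and radius reduction in which the hypothetical argument `max_nodes` plays the role of $T_1$. For simplicity it sorts the whole alphabet at each node instead of enumerating candidates in zig-zag order, and the alphabet and numerical values are assumptions.

```python
import math


def capped_se_search(R, y, alphabet, max_nodes=math.inf):
    """Depth-first Schnorr-Euchner search for min ||y - R x||^2 with a node budget.

    Returns (best_x, best_metric, visited_nodes). If the budget runs out, the
    best complete path found so far is returned, which may be only near-ML.
    """
    n = len(y)
    if max_nodes < n:
        raise ValueError("budget must allow at least one full descent")
    state = {"best_x": None, "radius": math.inf, "visited": 0}
    x = [None] * n

    def descend(level, partial):
        # Observation at this level after cancelling already-decided symbols.
        target = y[level] - sum(R[level][j] * x[j] for j in range(level + 1, n))
        # Schnorr-Euchner ordering: try the closest symbols first.
        ordered = sorted(alphabet, key=lambda s: abs(target - R[level][level] * s))
        for s in ordered:
            if state["visited"] >= max_nodes:
                return  # complexity cap reached
            metric = partial + abs(target - R[level][level] * s) ** 2
            if metric >= state["radius"]:
                break  # remaining siblings are at least as far away
            state["visited"] += 1
            x[level] = s
            if level == 0:
                # Complete path found: store it and shrink the radius.
                state["best_x"], state["radius"] = list(x), metric
            else:
                descend(level - 1, metric)

    descend(n - 1, 0.0)
    return state["best_x"], state["radius"], state["visited"]


QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[1.0, 0.5 - 0.2j], [0.0, 0.8]]
y = [0.3 + 0.9j, -0.7 + 0.6j]  # arbitrary noisy observation (illustrative)
x_ml, d_ml, nodes_ml = capped_se_search(R, y, QPSK)
x_cap, d_cap, nodes_cap = capped_se_search(R, y, QPSK, max_nodes=2)
```

Because the first descent always reaches the bottom level, any budget of at least $N_T$ nodes yields a complete path, so the capped search always has a hard decision to output.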

A consequence of this setting is that, after imposing a complexity upper limit, we may find a near-ML path instead of the ML path. Our simulation results, however, show that with a small $T_1$, the ML path can be located with high probability.

After the determination of the ML path (or a near-ML path), we perform the tree traversal in the second stage. Later, we will show that the second stage also has an upper complexity limit $T_2$, and hence the overall complexity of the proposed modified RTS is limited by $(T_1 + T_2)$.

Figure 3.1: Illustration of the second stage with $N_T = 3$ and $\mathbf{b} = [b_1\ b_2\ b_3] = [4\ 3\ 2]$ for the proposed modified RTS. The thick solid line corresponds to the near-ML path obtained from the first stage.

Specifically, the second stage finds counter-hypothesis paths based on the near-ML path obtained from the first stage. There can be as many as $QN_T$ counter-hypothesis paths. To save complexity, we specify a vector $\mathbf{b} = [b_1, \cdots, b_{N_T}]$ to restrict the number of counter-hypothesis paths to be extended at each level, where $1 \le b_i \le Q$. Therefore, at level $i$, only those $b_i$ paths with the smallest partial path metrics (i.e., distances to the received vector) are extended. As an example, suppose the modulation scheme adopted is 16-QAM, which immediately gives $Q = \log_2(16) = 4$. Assume that the symbol at the $i$th level of the ML path located in the first stage is $\hat{x}_i = 1000$. This gives four counter-hypothesis branches, specified by $\{0000, 1100, 1010, 1001\}$. Among these four counter-hypothesis paths, only $b_i$ of them are selected for further extension according to their Euclidean distances to the complex received scalar $\tilde{y}_i$. For a better understanding, a simple illustration of the second stage is given in Fig. 3.1.

On the other hand, it is known that clipping the LLR value to lie within $\pm L_{\max}$ has an essential effect on the performance and complexity of the soft-output SD algorithm. Since $b_i \le Q$, the counter-hypothesis paths traversed by our modified RTS strategy may not include all the paths required by (2.8). In such a case, the Max-Log approximated LLR value may become infinite. This makes the selection of the clipping limit $L_{\max}$ very important in our modified RTS strategy. In fact, we observe that extending a path whose $i$th entry has a distance to $\tilde{y}_i$ larger than $L_{\max}$ often yields no improvement in performance and only induces more complexity. Hence, we set a rule that a path is abandoned once the distance of its $i$th entry to $\tilde{y}_i$ is found to exceed $L_{\max}$, which further reduces the complexity.
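The branch selection and clipping rules of the second stage can be illustrated with a few small helpers. This is a sketch under assumptions: the function names are hypothetical, bit labels are represented as strings, and the distance values in the example are made up; the text does not specify how distances are scaled relative to $L_{\max}$, so the comparison below is only schematic.

```python
def counter_hypotheses(label):
    """Return the labels that differ from `label` in exactly one bit position."""
    flipped = []
    for pos, bit in enumerate(label):
        other = "1" if bit == "0" else "0"
        flipped.append(label[:pos] + other + label[pos + 1:])
    return flipped


def select_branches(distances, b_i, l_max):
    """Keep at most b_i labels with the smallest distances, dropping any above l_max."""
    kept = sorted((d, lab) for lab, d in distances.items() if d <= l_max)
    return [lab for d, lab in kept[:b_i]]


def clip_llr(value, l_max):
    """Clip an LLR to [-l_max, +l_max]; this also bounds infinite values."""
    return max(-l_max, min(l_max, value))


# The 16-QAM example from the text: ML symbol label 1000 at level i.
branches = counter_hypotheses("1000")
# Hypothetical distances of the four branches to the received scalar.
example = {"0000": 0.10, "1100": 0.30, "1010": 0.05, "1001": 0.20}
extended = select_branches(example, b_i=3, l_max=0.25)
```

In this example one of the four branches is dropped by the $L_{\max}$ rule, so the LLR of the corresponding bit has no counter-hypothesis metric and is set by clipping alone.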

The upper complexity limit of the second stage can be computed as follows:

$$T_2 = \sum_{i=1}^{N_T} b_i\,(N_T + 1 - i). \tag{3.1}$$

The idea behind equation (3.1) is that the near-ML path expands $b_i$ paths at level $i$, and each of these paths visits $(N_T + 1 - i)$ nodes until it reaches the bottom level. Again, as described above, by expanding only those nodes with branch metrics within $L_{\max}$, the complexity of the second stage can be further reduced.
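Equation (3.1) is easy to evaluate directly. The short sketch below (the function name is a hypothetical choice) computes the limit for the five restriction vectors used in the simulations of Section 3.2 and reproduces the values quoted there.

```python
def second_stage_limit(b):
    """Upper limit T2 = sum_{i=1}^{N_T} b_i * (N_T + 1 - i) from (3.1)."""
    n_t = len(b)
    return sum(b_i * (n_t + 1 - i) for i, b_i in enumerate(b, start=1))


vectors = ([4, 4, 4, 4], [4, 4, 4, 2], [4, 4, 2, 2], [4, 2, 2, 2], [2, 2, 2, 2])
limits = [second_stage_limit(b) for b in vectors]
# With the first-stage limit T1 = 30 used in Section 3.2, the overall bound is T1 + T2.
overall = [30 + t2 for t2 in limits]
```

Reducing an entry near the end of the vector lowers the limit only slightly, since paths that branch off at a late level have few levels left to visit.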

We close this section by summarizing the main ideas of the proposed modified RTS:

• We adopt the SESD with an upper complexity limit $T_1$ in the first stage. It finds a near-ML path with a complexity guaranteed to be no greater than $T_1$. With a proper choice of $T_1$, a good performance-complexity tradeoff is obtained.

• In the second stage, only the $b_i$ counter-hypothesis paths with the smallest metrics, and with metrics smaller than $L_{\max}$, are extended. This guarantees a complexity no greater than $T_2$ and prevents wasted complexity in the second stage.

3.2 Simulation Results


Consider a MIMO system transmitting over Rayleigh fading channels with possibly spatial or temporal correlation. The fast fading and slow fading scenarios specified in [6] are both considered: in the fast fading scenario, the channel realization changes per MIMO transmission, while in the slow fading scenario, the channel realization remains the same throughout an entire (turbo) transmission block but varies across (turbo) transmission blocks. We assume that all channel matrix realizations can be perfectly estimated at the receiver. Four transmit antennas and four receive antennas (i.e., $N_T = N_R = 4$) and the 16-QAM constellation are adopted.

Two kinds of channel coding schemes are tested. The first one is a 3GPP-specified punctured turbo code of code rate $R = 1/2$ with a codeword length of 2000 bits [10]. After the codeword passes through a $40 \times 50$ block interleaver, 500 16-QAM symbols are formed and transmitted. At the receiver, an 8-iteration Max-Log-MAP decoder is used for turbo decoding.

The second channel coding scheme used in our simulation is a 3GPP-specified (2, 1, 8) convolutional code of code rate R = 1/2. The codeword length is 720 bits. After convolutional encoding, 180 16-QAM symbols are fed into a 15 × 48 block interleaver before they are sent. At the receiver, the Viterbi decoder is used for the decoding of this convolutional code.

The upper complexity limit $T_1$ for the first stage of our modified RTS is set to 30, and the restriction vectors $\mathbf{b}$ examined in our simulation are $\mathbf{b} = [4444]$, $[4442]$, $[4422]$, $[4222]$, and $[2222]$, which respectively result in $T_2 = 40$, 38, 34, 28, and 20. For the SOCA, the tested numbers of paths extended in the first level, i.e., $b_1$, are 16, 14, 12, 10, 8, 6, and 4, which result in complexities of 88, 80, 72, 64, 56, 48, and 40, respectively. Note that for the SOCA, $b_i = 1$ for every $i > 1$.

It should be mentioned that the channel regularization algorithm [5] is used for all three algorithms, i.e., the STS, the SOCA, and our proposed modified RTS, when performing the QR-decomposition step. In addition, the SQRD [13] is employed as the sorting algorithm in the QR decomposition for both the STS and our proposed modified RTS, while the SOQR in [6] is implemented for the SOCA.

The performance index that we adopt in this thesis is the minimum SNR required to achieve a block error rate of $10^{-2}$ after channel decoding. The complexity measure is the number of visited nodes during the tree search. This complexity measure is widely adopted for one-node-per-cycle hardware implementation architectures [11]. We are now ready to present the simulation results.

We first examine what value should be selected for $L_{\max}$. As previously mentioned, the value of $L_{\max}$ affects the performance and complexity of the STS, the SOCA, and our modified RTS. For the STS (as well as the other algorithms), a larger $L_{\max}$ implies a better performance but a larger complexity. The performance-complexity tradeoff for the STS is shown in Fig. 3.2, according to which an $L_{\max}$ value will later be chosen.

For the SOCA and our modified RTS, the relationship between $L_{\max}$ and the performance-complexity tradeoff is less straightforward. Test results for different $L_{\max}$ values are summarized in Figs. 3.3, 3.4, 3.5, and 3.6.

Specifically, in Fig. 3.3, the tested $L_{\max}$ values range from 0.15 to 0.55 for the modified RTS under fast fading channels. We can clearly see from this figure that $L_{\max} = 0.25$ has the best complexity-performance tradeoff. The implication of Fig. 3.4 for the slow fading scenario, however, is a little different. By testing $L_{\max}$ from 0.15 to 0.55, we observe from Fig. 3.4 that under a slow fading environment, the smaller the $L_{\max}$, the better the performance-complexity tradeoff. To strike a balance that fits both the fast fading and slow fading scenarios, we set $L_{\max} = 0.25$ for our modified RTS.

[Figure 3.2 plot, two panels: average complexity of the STS versus minimum required SNR [dB] for BLER = 0.01]

Figure 3.2: Impact of $L_{\max}$ on the STS with turbo coding under fast (left subfigure) and slow (right subfigure) Rayleigh fading channels

Similar tests, for which $L_{\max}$ takes values from 0.15 to 0.55, are performed for the SOCA. From Fig. 3.5, where the fast fading scenario is assumed, we observe that the performance-complexity tradeoff improves as $L_{\max}$ increases. Notably, the simulation results for $L_{\max}$ ranging from 0.35 to 0.55 are almost indistinguishable. Under the slow fading scenario, however, a different trend can be observed in Fig. 3.6: when $L_{\max}$ is larger than 0.2, the performance-complexity tradeoff begins to degrade. Again, to compromise between the two scenarios, we choose $L_{\max} = 0.3$ for the SOCA.

Figs. 3.7 and 3.8 illustrate how different values of $T_1$ affect the BLER and the complexity. In short, we can see from Fig. 3.7 that under fast fading, the curve corresponding to $T_1 = 20$ has already approached the curve of $T_1 = \infty$. Likewise, Fig. 3.8 shows that under slow fading there is no visible gap between the curves of $T_1 = 20$ and $T_1 = \infty$. Nonetheless, we set $T_1 = 30$ to secure the (near-)ML performance.

Figure 3.3: Impact of $L_{\max}$ on the modified RTS with turbo coding under fast Rayleigh fading channels, where b = [4444] and [2222] are employed. [Plot: average complexity for the RTS with $L_{\max}$ = 0.15, 0.2, ..., 0.55.]

Having settled the parameters, we are now ready to compare the STS and the SOCA with our modified RTS algorithm. We first remark on the simulation results for turbo coding under the fast fading scenario. As observed in Fig. 3.9, the proposed modified RTS achieves the best performance-complexity tradeoff when compared with the STS and the SOCA. In order to examine the variation in complexity, we also record the 99.9th-percentile complexities of the STS and our modified RTS in this figure. Since the SOCA has a fixed decoding complexity, its average complexity is exactly the same as its 99.9th-percentile complexity. Fig. 3.9 shows that the 99.9th-percentile complexity of the STS is much higher than its average complexity; the STS may therefore suffer from a high variation in complexity, which poses a challenge for hardware implementation. This high variation also makes the computational delay of the STS vary.

On the other hand, the gap between the average complexity and the 99.9th-percentile complexity of the modified RTS is considerably smaller than that of the STS. The 99.9th-percentile complexity of the modified RTS is only slightly higher than the (fixed) complexity of the SOCA. This indicates that the complexity upper limit ($T_1 + T_2$) we set for the modified RTS does decrease the variation of the decoding complexity, and therefore makes the soft-output SD algorithm easier to implement in hardware.
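The effect of such a cap can be illustrated with a minimal sketch. It assumes, for illustration only, that complexity is counted in visited tree nodes, that the first stage may visit at most $T_1$ nodes and the second stage at most $T_2$ nodes; the class `NodeBudget` and the function `run_search` are hypothetical names and are not taken from the thesis.

```python
class NodeBudget:
    """Counts visited tree nodes and enforces a per-stage cap (hypothetical helper)."""

    def __init__(self, t1, t2):
        self.limits = {1: t1, 2: t2}
        self.used = {1: 0, 2: 0}

    def try_visit(self, stage):
        """Count one node visit; return False if the stage budget is exhausted."""
        if self.used[stage] >= self.limits[stage]:
            return False
        self.used[stage] += 1
        return True

    @property
    def total(self):
        return self.used[1] + self.used[2]


def run_search(wanted_stage1, wanted_stage2, t1=30, t2=40):
    """Simulate a search that would like to visit the given numbers of nodes.

    The defaults T1 = 30 and T1 + T2 = 70 follow the values quoted in the text.
    """
    budget = NodeBudget(t1, t2)
    for stage, wanted in ((1, wanted_stage1), (2, wanted_stage2)):
        for _ in range(wanted):
            if not budget.try_visit(stage):
                break  # this stage stops early; the next stage still runs
    return budget.total
```

Whatever the channel realization demands, the returned count never exceeds `t1 + t2`, which is the property that keeps the worst-case complexity close to the average.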

Figure 3.4: Impact of $L_{\max}$ on the modified RTS with turbo coding under slow Rayleigh fading channels, where b = [4444] and [2222] are employed. [Plot: average complexity for the RTS with $L_{\max}$ = 0.15, 0.2, ..., 0.55.]

For the slow fading scenario, we observe from Fig. 3.10 that the STS achieves the best complexity-performance tradeoff in the sense of average complexity. Nonetheless, the high complexity variation of the STS remains, which again challenges its hardware implementation. In particular, the 99.9th-percentile complexity of the STS is six times larger than its average complexity. As in the fast fading scenario, we conclude that the SOCA and the modified RTS are more appropriate for hardware implementation because their complexity is bounded. When comparing the modified RTS with the SOCA, the former requires a higher 99.9th-percentile complexity but a somewhat lower average complexity.

In order to examine the impact of the channel coding scheme, we repeat the previous simulations with the turbo code replaced by a convolutional code. The simulation results are summarized in Figs. 3.11 and 3.12. The results are similar to those obtained using the turbo code. As a result, the SOCA and the proposed modified RTS remain the more attractive solutions for hardware implementation, regardless of the channel coding scheme.

Figure 3.5: Impact of $L_{\max}$ on the SOCA with turbo coding under fast Rayleigh fading channels, where $b_1 = 16$ and 8 are employed. [Plot: average complexity for the SOCA with $L_{\max}$ = 0.15, 0.2, ..., 0.55.]

To gain more detailed insight into the complexities of the STS and our modified RTS algorithm, we show the 50th-, 90th-, 99th-, and 99.9th-percentile complexities in Figs. 3.13 and 3.14 for the fast and slow fading scenarios, respectively, under the turbo coding scheme. Evidently, the gaps among the 50th-, 90th-, 99th-, and 99.9th-percentile complexities of the modified RTS are much smaller than those of the STS.
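The percentile complexities discussed above can be computed from recorded per-detection complexities as in the following sketch. The nearest-rank definition of a percentile and the function names `percentile` and `complexity_summary` are assumptions made for illustration; the thesis does not state which percentile definition was used.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of the data at or below it."""
    ordered = sorted(samples)
    # The small epsilon guards against floating-point noise in q * n / 100.
    rank = max(1, math.ceil(q * len(ordered) / 100.0 - 1e-9))
    return ordered[rank - 1]


def complexity_summary(samples, levels=(50, 90, 99, 99.9)):
    """Average complexity and the requested percentile complexities."""
    average = sum(samples) / len(samples)
    return average, {q: percentile(samples, q) for q in levels}
```

A decoder whose 99.9th-percentile value stays close to its average is the easier one to provision in hardware, since the worst-case latency budget is then close to the typical one.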

We further investigate the complexity distribution of the STS with $L_{\max} = 0.2$ and that of the modified RTS with $L_{\max} = 0.25$ and b = [4444] in Figs. 3.15 and 3.16, respectively. Note that to achieve a BLER of approximately $10^{-2}$, the minimum SNR required for the STS is 13.41 dB, while that required for the modified RTS is 13.40 dB, so the two operate at approximately the same SNR. From the two figures, we can clearly see that a drawback of the STS is its high complexity variation. Although its average complexity is only 44.57, its largest decoding complexity is as large as 900 over 2,500,000 simulation samples. Such a high complexity variation may become a challenge for hardware implementation.

Figure 3.6: Impact of $L_{\max}$ on the SOCA with turbo coding under slow Rayleigh fading channels, where $b_1 = 16$ and 8 are employed. [Plot: average complexity for the SOCA with $L_{\max}$ = 0.15, 0.2, ..., 0.55.]

In contrast, the average decoding complexity of our modified RTS is only 25.62, and its actual complexity is upper-bounded by $T_1 + T_2 = 70$, as shown in Fig. 3.16. Unlike the complexity distribution of the STS, that of the modified RTS does not have a long tail.

Figure 3.7: Impact of $T_1$ on BLERs and complexities under fast Rayleigh fading channels. [Two panels: BLER versus SNR and complexity versus SNR; curves for $T_1$ = 10, 20, 30, 40, 50, and $\infty$.]

Figure 3.8: Impact of $T_1$ on BLERs and complexities under slow Rayleigh fading channels. [Two panels: BLER versus SNR and complexity versus SNR; curves for $T_1$ = 10, 20, 30, 40, 50, and $\infty$.]

Figure 3.9: Performance versus complexity for the STS, the SOCA, and the modified RTS in fast Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$. [Plot: average and 99.9th-percentile complexity; curves for STS average, SOCA, modified RTS average, STS 99.9%, and modified RTS 99.9%.]

Figure 3.10: Performance versus complexity for the STS, the SOCA, and the modified RTS with turbo code in slow Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$.

Figure 3.11: Performance versus complexity for the STS, the SOCA, and the modified RTS with convolutional code in fast Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. The number next to each modified RTS mark is $T_2$. [Plot: average and 99.9th-percentile numbers of visited nodes.]

Figure 3.12: Performance versus complexity for the STS, the SOCA, and the modified RTS with convolutional code in slow Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The numbers next to the SOCA curve correspond to $b_1$. [Curves: STS average, SOCA, modified RTS average, STS 99.9%, and modified RTS 99.9%.]

Figure 3.13: Performance versus different percentile complexities for the STS and the modified RTS with turbo code in fast Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The number next to each modified RTS mark is $T_2$. [Plot: 50th-, 90th-, 99th-, and 99.9th-percentile complexity for the STS and the modified RTS.]

Figure 3.14: Performance versus different percentile complexities for the STS and the modified RTS with turbo code in slow Rayleigh fading channels. The numbers beside the STS marks are the $L_{\max}$ values used. The number next to each modified RTS mark is $T_2$. [Plot: 50th-, 90th-, 99th-, and 99.9th-percentile complexity for the STS and the modified RTS.]

Figure 3.15: Complexity distribution of the STS with $L_{\max} = 0.2$. The plotted complexity range ends at 200; the probability of the complexity exceeding 200 is 0.0035. [Histogram: probability versus complexity.]

Figure 3.16: Complexity distribution of the modified RTS with $L_{\max} = 0.25$ and b = [4444]. [Histogram: probability versus complexity, for complexities from 0 to 70.]

Chapter 4 Interference Cancellation under Unknown Interference Modulation

In this chapter, we turn to a new interference cancellation problem, where the modulation scheme of the interference is unknown to the transceiving system.

As shown in the heterogeneous network in Fig. 4.1, a user camping on a pico cell (BS$_P$) may receive strong inter-cell interference from a macro cell (BS$_M$), especially when cell range expansion [15] is employed. The interference power from BS$_M$ may be similar to or even larger than the signal power from BS$_P$, so that successful communication is not possible without a proper inter-cell interference cancellation technique.

As specified in Eq. (2.1), the general system model for the interference cancellation problem can be formulated as

$$\mathbf{y} = \mathbf{H}_s \mathbf{P}_s \mathbf{x}_s + \mathbf{H}_i \mathbf{P}_i \mathbf{x}_i + \mathbf{n},$$

where $\mathbf{H}_i \mathbf{P}_i \mathbf{x}_i$ denotes the interference source, whose modulation scheme is now unknown to the receiver. This brings us to the problem of performing inter-cell interference cancellation with an unknown interference modulation scheme.
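To make the model concrete, the following sketch evaluates the received vector term by term and also in the equivalent stacked form $[\mathbf{H}_s\mathbf{P}_s \ \ \mathbf{H}_i\mathbf{P}_i][\mathbf{x}_s; \mathbf{x}_i] + \mathbf{n}$, which is the form a tree search over all transmitted symbols would work on. It assumes that $\mathbf{P}_s$ and $\mathbf{P}_i$ are square matrices applied to the symbol vectors and that channel entries are i.i.d. complex Gaussian (Rayleigh fading); all function names are hypothetical.

```python
import random


def cgauss(rng):
    """One CN(0, 1) sample: real and imaginary parts each have variance 1/2."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def matvec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_rc * x_c for a_rc, x_c in zip(row, x)) for row in a]


def matmul(a, b):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def random_channel(rng, n_r, n_t):
    """n_r x n_t channel matrix with i.i.d. CN(0, 1) entries (assumed fading model)."""
    return [[cgauss(rng) for _ in range(n_t)] for _ in range(n_r)]


def received_signal(h_s, p_s, x_s, h_i, p_i, x_i, n):
    """y = H_s P_s x_s + H_i P_i x_i + n, evaluated term by term."""
    desired = matvec(matmul(h_s, p_s), x_s)
    interference = matvec(matmul(h_i, p_i), x_i)
    return [d + u + w for d, u, w in zip(desired, interference, n)]


def stacked_signal(h_s, p_s, x_s, h_i, p_i, x_i, n):
    """Same y, written as [H_s P_s, H_i P_i] [x_s; x_i] + n."""
    left, right = matmul(h_s, p_s), matmul(h_i, p_i)
    h_eff = [row_l + row_r for row_l, row_r in zip(left, right)]
    return [v + w for v, w in zip(matvec(h_eff, list(x_s) + list(x_i)), n)]
```

Both functions return the same vector; the stacked form simply treats the interference symbols as additional layers of one effective channel.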

Figure 4.1: Illustration of interference from a macro cell to a user at the cell boundary of a pico cell. [Diagram labels: User; Base station of Macro Cell, BS$_M$; Coverage of Macro Cell; Coverage of Pico Cell; Interference; Signals; Base station of Pico Cell, BS$_P$.]

In [14], joint modulation classification and detection using sphere decoding was proposed for the high-speed downlink packet access (HSDPA) system, where exactly one interference source exists and only QPSK and 16QAM are possible modulation schemes of the interference. The authors proposed a modified generalized likelihood ratio test (GLRT) to deal with this modulation classification problem. Basically, the modified GLRT performs a modified hypothesis test in which the likelihoods of all modulation hypotheses are calculated and compared in order to detect the modulation scheme of the interference source. The simulation results in [14] confirm that the modified GLRT gives a promising modulation classification outcome for the HSDPA system. One drawback of the modified GLRT is that the number of hypotheses grows exponentially with the number of interference sources and also grows with the number of possible modulation schemes. This may result in an impractical modulation classification complexity for communication standards such as LTE and LTE-Advanced.

In this chapter, we revisit the modified RTS of Chapter 3 and adapt it to the new demand of performing inter-cell interference cancellation with an unknown interference modulation scheme. With the inherited merit of the SD algorithm, the classification complexity can be considerably reduced in comparison with the modified GLRT. The main idea behind our proposal is to perform the SD algorithm over all possible modulation schemes of the interference on the first few received symbols, and then declare the modulation scheme by voting. Details will be given later.

4.1 Modified RTS for Joint Modulation Classification and Detection

In Chapter 3, we have shown that the modified RTS provides a good performance-complexity tradeoff and suits hardware implementation, under the assumption that the receiver knows the modulation schemes of, and wishes to recover, all symbols. Under the new assumption that the modulation schemes of some unwanted received symbols (namely, the interferences) are unknown to the receiver, the modified RTS must be adjusted.

The new modified RTS for unknown interference modulations also retains the two-stage structure as shown in Fig. 4.2.

Figure 4.2: Structure of the proposed soft-output sphere decoding for joint modulation classification and detection. [Block diagram: 1st stage, hard-output sphere decoder and voting for modulation; 2nd stage, soft-LLR generation.]

The first stage performs the hard-decision SD algorithm for the first N received symbols or received elements (REs) to generate N votes for possible modulation schemes. Then classification of the interference modulation scheme can be carried out by the so-called unfair voting. The second stage uses the modulation scheme determined in the first stage and performs detection of the wanted signals. Details will be given in the sequel.

Figure 4.3: Illustration of the extended search tree for the 1st stage of the modified RTS for unknown interference modulation. [Diagram labels: "Symbols transmitted from BM, modulation could be 16-QAM or QPSK" (shown for two tree levels); "Modulation of every symbol is known to be 16-QAM".]

4.1.1 The First Stage of the Modified RTS

In the first stage, the modified RTS again performs the hard-output SD algorithm to find the ML path, but now over an extended search tree that contains all candidate modulation schemes for the interference. Hence, the main difference between the first stage of the modified RTS in this chapter and that in the previous chapter is the enlarged set of candidate symbols for the unknown interference. For example, if the modulation scheme of the interference may be either QPSK or 16QAM, each node at an interference level of the search tree has 4 + 16 = 20 child nodes, corresponding to the total number of QPSK and 16QAM symbols. Such an extended search tree is illustrated in Fig. 4.3.
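The extended child set can be sketched as follows. The unit-average-energy normalization of both constellations and the names `qam_points`, `extended_children`, and `best_child` are assumptions for illustration; the sketch only shows how the 20 labeled children arise and how the best child of a single node also yields a modulation label.

```python
def qam_points(bits_per_axis):
    """Square QAM constellation normalized to unit average energy (assumed scaling)."""
    m = 2 ** bits_per_axis
    levels = [2 * k - (m - 1) for k in range(m)]  # e.g. -3, -1, 1, 3 for 16QAM
    pts = [complex(i, q) for i in levels for q in levels]
    scale = (sum(abs(p) ** 2 for p in pts) / len(pts)) ** 0.5
    return [p / scale for p in pts]


def extended_children():
    """Children of one interference-level node: every QPSK and every 16QAM symbol."""
    children = [("QPSK", s) for s in qam_points(1)]
    children += [("16QAM", s) for s in qam_points(2)]
    return children


def best_child(residual, gain):
    """Child minimizing |residual - gain * symbol|^2, returned with its modulation label."""
    return min(extended_children(),
               key=lambda c: abs(residual - gain * c[1]) ** 2)
```

Because every child carries its modulation label, the ML path found over this tree directly provides one vote per RE for the interference modulation.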

After executing the first stage for N received elements (REs), we have the ML path for each RE, which suggests a modulation scheme for the interference. We can then decide, or classify, the modulation scheme of the interference via voting. It can be verified that the different numbers of constellation points of the different modulation schemes lead to unbalanced modulation classification error rates. For example, Pr(16QAM|QPSK) is significantly larger than Pr(QPSK|16QAM), where Pr(modulation I|modulation II) denotes the probability that modulation I is detected by the first stage of the modified RTS while modulation II is the one actually used by the interference source.

Owing to this unbalanced classification error rate, we propose an “unfair voting” method that introduces a bias $V_b$ such that the decision rule is

$$V_{\mathrm{I}} + V_b \underset{\mathrm{II}}{\overset{\mathrm{I}}{\gtrless}} V_{\mathrm{II}}, \tag{4.1}$$

where $V_{\mathrm{I}}$ and $V_{\mathrm{II}}$ denote the numbers of votes for modulation I and modulation II, respectively. The parameters N and $V_b$ will be determined later via simulations.
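Rule (4.1) can be written as a short function. Two points are assumptions of this sketch rather than statements of the thesis: modulation I (the one receiving the bias) is taken to be QPSK, since 16QAM is the scheme that is over-detected, and a tie is resolved in favor of modulation I, which makes a bias of 0 and a bias of 1 give identical decisions when N is even.

```python
def unfair_vote(votes, v_bias, mod_i="QPSK", mod_ii="16QAM"):
    """Biased majority decision: choose mod_i if V_I + v_bias >= V_II, else mod_ii.

    votes  : per-RE modulation labels suggested by the first-stage ML paths
    v_bias : bias added to the vote count of mod_i (assumed to be the favored scheme)
    """
    v_i = sum(1 for v in votes if v == mod_i)
    v_ii = sum(1 for v in votes if v == mod_ii)
    return mod_i if v_i + v_bias >= v_ii else mod_ii
```

For example, with three QPSK votes and five 16QAM votes, an unbiased vote declares 16QAM, whereas a bias of 2 declares QPSK.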

After performing the hard-output SD algorithm N times and determining the modulation scheme of the interference source via (4.1), the hard-output SD algorithm must be re-run, with respect to the decided interference modulation scheme, for those REs whose ML paths suggest a modulation scheme other than the decided one; otherwise, the performance of the inter-cell interference cancellation may degrade significantly.
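The bookkeeping of this re-detection step can be sketched as follows. The callback `redetect`, which stands for a hard-output SD run restricted to the decided modulation, is hypothetical, as are the function names.

```python
def res_to_redo(votes, decided):
    """Indices of REs whose first-stage ML path used a modulation other than the decided one."""
    return [idx for idx, v in enumerate(votes) if v != decided]


def finalize_hard_decisions(paths, votes, decided, redetect):
    """Keep ML paths that agree with the decision; re-run detection for the others.

    redetect(idx, decided) is a hypothetical callback returning the new ML path
    of RE `idx` when the interference is restricted to the decided modulation.
    """
    return [path if vote == decided else redetect(idx, decided)
            for idx, (path, vote) in enumerate(zip(paths, votes))]
```

Only the disagreeing REs incur a second search, so the extra cost shrinks as the per-RE votes become more reliable.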

We end this subsection by emphasizing that although the first stage described above is exemplified with two candidate interference modulation schemes, QPSK and 16QAM, it can be straightforwardly extended to three or more candidate schemes, e.g., QPSK, 16QAM, and 64QAM. Hence, it should be applicable to the current LTE and LTE-Advanced standards.

4.1.2 The Second Stage of the Modified RTS

In the second stage of the modified RTS, we again generate the soft-output LLR values with the help of the counter-hypothesis paths.

Since the receiver has no interest in recovering the information contained in the interference, only the soft-output LLRs of the desired signals are generated. For this reason, we modify the sorted QR decomposition (SQRD) algorithm [13] such that the symbols corresponding to the desired signals are placed in the first $N_s$ levels in order to generate the soft-output LLR values. During this process, only the paths corresponding to the best candidate symbol are retained. The process is repeated until the bottom level of the search tree is reached. An example of the second stage is illustrated in Fig. 4.4.

Figure 4.4: Illustration of the 2nd stage of the modified RTS for unknown interference modulation. [Diagram labels: Q counter hypotheses; single best path only; ML path; layers of symbols transmitted from BS$_P$.]
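For one bit of a desired symbol, the way an LLR can be formed from the ML path and a counter-hypothesis path is sketched below. This is a generic max-log formulation and not necessarily the exact expression used in the thesis: the sign convention (positive LLR for bit value 1), the omission of any noise-variance scaling, and the use of $L_{\max}$ as a clipping level on the LLR magnitude are all assumptions of the sketch.

```python
def max_log_llr(ml_metric, counter_metric, ml_bit, l_max):
    """Max-log LLR of one bit from two path metrics, clipped to magnitude l_max.

    ml_metric      : distance metric of the ML path
    counter_metric : smallest metric among paths whose bit differs from the ML bit
    ml_bit         : value (0 or 1) of this bit on the ML path
    l_max          : clipping level (assumed role of L_max)
    """
    magnitude = min(counter_metric - ml_metric, l_max)
    return magnitude if ml_bit == 1 else -magnitude
```

With clipping, a counter hypothesis whose metric exceeds the ML metric by more than `l_max` cannot change the output, which is why such paths need not be explored further.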

4.2 Simulation Results

In this section, simulation results for $N_s = N_i = 2$ and $N_R = 4$ are presented, which correspond to the system model

$$\mathbf{y} = \mathbf{H}_s \mathbf{P}_s \mathbf{x}_s + \mathbf{H}_i \mathbf{P}_i \mathbf{x}_i + \mathbf{n} = \begin{bmatrix} \mathbf{H}_s \mathbf{P}_s & \mathbf{H}_i \mathbf{P}_i \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ i_1 \\ i_2 \end{bmatrix} + \mathbf{n},$$

where $\mathbf{n}$ denotes the additive white Gaussian noise. In the simulation, the modulations of the desired signals $x_1$ and $x_2$ are 16QAM, whereas the modulation of each of the interferences $i_1$ and $i_2$ is either QPSK or 16QAM. The 3GPP-specified punctured turbo code of code rate R = 1/2 and codeword length 1920 bits is adopted [10]. With a 15 × 128 block interleaver, 480 16QAM symbols are received at the receiver, where an 8-iteration Max-Log-MAP decoder is used for turbo decoding. It is assumed that the channel coefficients can be perfectly estimated. Only the slow fading scenario is considered; hence, the parameter $L_{\max}$ is set to 0.2.

Algorithm 1 Modified SQRD Algorithm
    R = 0, Q = H, P = I_{N_T}
    for i = 1, ..., N_s do
        k_i = arg min_{j = i, ..., N_s} |q_j|^2
        Exchange columns i and k_i in Q, R, and P
        r_{i,i} = |q_i|
        q_i = q_i / r_{i,i}
        for j = i + 1, ..., N_T do
            r_{i,j} = q_i^H q_j
            q_j = q_j - r_{i,j} q_i
        end for
    end for
    for i = N_s + 1, ..., N_T do
        k_i = arg min_{j = i, ..., N_T} |q_j|^2
        Exchange columns i and k_i in Q, R, and P
        r_{i,i} = |q_i|
        q_i = q_i / r_{i,i}
        for j = i + 1, ..., N_T do
            r_{i,j} = q_i^H q_j
            q_j = q_j - r_{i,j} q_i
        end for
    end for
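The modified SQRD of Algorithm 1 can be sketched in plain Python as follows. The sketch restricts each minimum-norm search to the columns of the current group that have not yet been processed, as in the SQRD of [13]; this index range is an interpretation, and the function name `modified_sqrd` and the list-of-columns storage are choices made for illustration only.

```python
def modified_sqrd(h, n_s):
    """Sorted QR decomposition performed separately on two groups of columns.

    h   : channel matrix as a list of rows (n_r x n_t), complex entries
    n_s : number of columns in the first group (the desired symbols)
    Returns (q, r, perm): q is a list of orthonormal columns, r is upper
    triangular, and column perm[j] of h equals sum over l of q[l] * r[l][j].
    """
    n_r, n_t = len(h), len(h[0])
    q = [[h[row][col] for row in range(n_r)] for col in range(n_t)]
    r = [[0j] * n_t for _ in range(n_t)]
    perm = list(range(n_t))

    def norm2(col):
        return sum(abs(v) ** 2 for v in col)

    for start, stop in ((0, n_s), (n_s, n_t)):
        for i in range(start, stop):
            # Minimum-norm column among the unprocessed columns of this group
            # (assumption: the search starts at i, as in standard SQRD).
            k = min(range(i, stop), key=lambda j: norm2(q[j]))
            q[i], q[k] = q[k], q[i]
            perm[i], perm[k] = perm[k], perm[i]
            for row in r:
                row[i], row[k] = row[k], row[i]
            r[i][i] = norm2(q[i]) ** 0.5
            q[i] = [v / r[i][i] for v in q[i]]
            for j in range(i + 1, n_t):
                # Projection coefficient q_i^H q_j, then remove that component.
                r[i][j] = sum(a.conjugate() * b for a, b in zip(q[i], q[j]))
                q[j] = [b - r[i][j] * a for a, b in zip(q[i], q[j])]
    return q, r, perm
```

Because sorting never moves a column across the group boundary, the first `n_s` entries of `perm` always refer to the desired symbols and the remaining entries to the interference.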

We first examine in Fig. 4.5 how the correctness of the modulation classification affects the BLER. The actual modulation schemes of the interference sources $i_1$ and $i_2$ are 16QAM and QPSK, respectively. Two situations of incorrect modulation judgement are examined: the modulation scheme of $i_1$ is wrongly declared to be QPSK, and that of $i_2$ is wrongly declared to be 16QAM. We observe from Fig. 4.5 that either incorrect judgement on the interference modulation scheme can seriously jeopardize the system performance: the resultant BLERs decrease very slowly as the SNR grows.

Figure 4.5: Comparison of BLERs between correct and erroneous declarations of modulation scheme for interferences. [Plot: BLER versus SNR; curves for the declared pairs ($i_1$ QPSK, $i_2$ QPSK), ($i_1$ 16QAM, $i_2$ 16QAM), and ($i_1$ 16QAM, $i_2$ QPSK).]

Next, we investigate the modulation classification errors when only one RE is used. The modulation schemes of $i_1$ and $i_2$ are set to {16QAM, 16QAM}, {QPSK, 16QAM}, and {QPSK, QPSK} in the left, middle, and right subfigures of Fig. 4.6, respectively.¹ Similarly, Fig. 4.7 shows the modulation classification error for one RE when the modulation schemes of $i_1$ and $i_2$ are randomly chosen with equal probability from QPSK and 16QAM. Fig. 4.7 again confirms that higher-order modulations are favored in the decision, particularly in the low-SNR region.

¹ It can be verified that the simulation result for the modulation schemes of $i_1$ and $i_2$ being {16QAM, QPSK} is identical to that of $i_1$ and $i_2$ being {QPSK, 16QAM}. Therefore, we omit this case in Fig. 4.6.

Figure 4.6: Modulation classification errors. The modulation schemes of $i_1$ and $i_2$ are set as {16QAM, 16QAM}, {QPSK, 16QAM}, and {QPSK, QPSK} from left to right in the three subfigures, respectively. [Each subfigure plots the probabilities of the three erroneous classification outcomes versus SNR (dB).]

Next, we examine the thresholds for unfair voting when taking N = 8 and N = 16 REs in Figs. 4.8 and 4.9, respectively. The classification scheme used for comparison is the modified GLRT of [14]. Note that since the total number of votes is even, we also restrict the threshold $V_{\text{bias}}$ (the bias $V_b$ in (4.1)) to even values; for example, setting $V_{\text{bias}} = 0$ is equivalent to setting $V_{\text{bias}} = 1$, setting $V_{\text{bias}} = 2$ is equivalent to setting $V_{\text{bias}} = 3$, etc. In order to reduce the modulation classification error, we determine the threshold according to

$$\min_{V_{\text{bias}}} \left\{ \max \left\{ \Pr(\text{QPSK} \mid \text{16QAM}),\ \Pr(\text{16QAM} \mid \text{QPSK}) \right\} \right\}.$$

We then found that $V_{\text{bias}} = 2$ and $V_{\text{bias}} = 6$ achieve the above minimum for N = 8 and N = 16, respectively.
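The min-max selection of the bias can be illustrated with a simplified analytical model. The thesis determines the bias by simulation; the sketch below instead assumes a single interferer whose N per-RE votes are independent, with given per-RE error probabilities, and assumes the biased scheme is QPSK with ties resolved in its favor. The function names and any probabilities passed to them are hypothetical and are not thesis data.

```python
from math import comb


def binom_tail(n, p, k_min):
    """Pr(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(max(k_min, 0), n + 1))


def voting_error_rates(n, v_bias, p_16_given_qpsk, p_qpsk_given_16):
    """Error rates of the rule 'decide QPSK if V_QPSK + v_bias >= V_16QAM'.

    Returns (Pr(16QAM declared | QPSK sent), Pr(QPSK declared | 16QAM sent)).
    """
    # QPSK sent: wrong when V_16QAM > (n + v_bias) / 2.
    k1 = (n + v_bias) // 2 + 1
    # 16QAM sent: wrong when V_QPSK >= (n - v_bias) / 2, i.e. ceil((n - v_bias) / 2).
    k2 = -((v_bias - n) // 2)
    return (binom_tail(n, p_16_given_qpsk, k1),
            binom_tail(n, p_qpsk_given_16, k2))


def minimax_bias(n, p_16_given_qpsk, p_qpsk_given_16):
    """Even bias in [0, n] minimizing the larger of the two voting error rates."""
    return min(range(0, n + 1, 2),
               key=lambda vb: max(voting_error_rates(n, vb, p_16_given_qpsk,
                                                     p_qpsk_given_16)))
```

When the two per-RE error probabilities are equal, the selected bias is 0; when 16QAM is over-detected, a positive bias is selected, which is the qualitative behavior the unfair voting is designed around.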

Figure 4.7: Modulation classification error. Modulation schemes of $i_1$ and $i_2$ are randomly chosen from QPSK and 16QAM. [Plot: modulation classification error rate versus SNR; curves for Pr(16QAM|QPSK) and Pr(QPSK|16QAM).]

After identifying $V_{\text{bias}}$, we next compare the performance of unfair voting with that of the modified GLRT [14]. In Fig. 4.10, the modulation classification error rates of unfair voting and of the modified GLRT are illustrated for both N = 8 and N = 16. In Fig. 4.11, the average complexity per receive antenna and the number of paths stored during the modulation classification stage are presented. Notably, the number of paths stored can be regarded as an index of the memory required by the algorithm. We can see from the two figures that the modified GLRT outperforms the proposed unfair voting in classification error rate; however, this superiority is obtained at the cost of a higher complexity and a higher storage requirement. The high complexity of the modified GLRT arises because it checks the metrics of all possible hypotheses. In the scenario we simulated, the modified GLRT needs to examine four cases: i) $i_1 \in$ QPSK and $i_2 \in$ QPSK, ii) $i_1 \in$ 16QAM and $i_2 \in$ QPSK, iii) $i_1 \in$ QPSK and $i_2 \in$ 16QAM, and iv) $i_1 \in$ 16QAM and $i_2 \in$ 16QAM. In addition, to determine the ML path as well as the ML path’s metric for each hypothesis, the SESD algorithm has to be executed four times. During this process, all information for each candidate path needs to be stored. For the above reasons, the modified GLRT requires a much higher complexity and storage than our proposed unfair voting.

We would like to add at the end of this discussion that the complexity and the number of paths to be stored for the modified GLRT are proportional to the number of hypotheses, which grows exponentially as $N_i$ (i.e., the number of interferences) increases, while those of the proposed unfair voting are only proportional to $N_i$.
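The growth in the number of hypotheses can be made explicit by enumerating them. The following sketch lists every joint assignment of a candidate modulation to each interferer; the function name `glrt_hypotheses` is hypothetical.

```python
from itertools import product


def glrt_hypotheses(modulations, n_interferers):
    """All joint modulation hypotheses, one modulation per interferer."""
    return list(product(modulations, repeat=n_interferers))
```

With two candidate schemes and $N_i = 2$, this yields the four cases examined in the simulated scenario; adding 64QAM as a third candidate raises the count to nine, and in general the count is the number of candidate schemes raised to the power $N_i$.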

Figure 4.8: Classification errors for different $V_{\text{bias}}$ subject to N = 8. [Two panels: Pr(QPSK|16QAM) and Pr(16QAM|QPSK) versus SNR; curves for $V_{\text{bias}}$ = 0, 2, 4.]

In the simulations above, only QPSK and 16QAM are considered as candidate modulation schemes for the interferences. If 64QAM is additionally considered, the complexity of the GLRT may grow dramatically and become impractical.

On the other hand, although the performance gap in Fig. 4.10 regarding Pr(16QAM|QPSK) may look large, Fig. 4.12 indicates that the simulated BLERs of the modified GLRT and the proposed unfair voting do not differ much. We may accordingly conclude that the proposed unfair voting achieves performance similar to that of the modified GLRT with a much lower complexity, and that its simplicity makes it a suitable candidate for hardware implementation.

Figure 4.9: [Two panels: Pr(QPSK|16QAM) and Pr(16QAM|QPSK) versus SNR; curves for $V_{\text{bias}}$ = 0, 2, 4, 6, 8.]

Figure 4.10: Modulation classification error rate for unfair voting and the modified GLRT. [Two panels: Pr(QPSK|16QAM) and Pr(16QAM|QPSK) versus SNR.]

Figure 4.11: Average complexity and the numbers of paths stored during the modulation classification stage. [Two panels versus SNR; curves for unfair voting (N = 8, $V_{\text{bias}}$ = 2, and N = 16, $V_{\text{bias}}$ = 6) and the modified GLRT (N = 8 and N = 16).]

Figure 4.12: [BLER versus SNR; curves for unfair voting (N = 8, $V_{\text{bias}}$ = 2, and N = 16, $V_{\text{bias}}$ = 6) and the modified GLRT (N = 8 and N = 16).]

Chapter 5 Conclusion and Future Work

In this thesis, we proposed a soft-output SD algorithm, named the modified RTS, that provides a good performance-complexity tradeoff and is appropriate for hardware implementation. Based on the modified RTS, we further proposed a simple method called unfair voting to perform joint modulation classification and signal detection. Considering its simplicity and its low storage requirement, the proposed unfair voting is a good candidate for hardware implementation.

At this stage, $V_{\text{bias}}$ is determined based on simulations. Finding a theoretical footing for the selected $V_{\text{bias}}$ could be an interesting future work. Examining the possibility of soft voting rather than hard voting could be another interesting subject to explore. With these modifications, the modulation classification and the signal detection may become more reliable.


Bibliography

[1] S.-L. Shieh, R.-D. Chiu, S.-L. Feng and P.-N. Chen, “Low-complexity soft-output sphere decoding with modified repeated tree search strategy,” IEEE Comm. Letter, vol. 17, no. 1, pp. 51-54, January 2013.

[2] C. P. Schnorr and M. Euchner, “Lattice basis reduction: Improved practical algorithms and solving subset sum problems,” Math. Programming, vol. 66, no. 2, pp. 181-191, September 1994.

[3] U. Fincke and M. Pohst, “Improved methods for calculating vectors of short length in a lattice, including a complexity analysis.” Mathematics of Computation, vol. 44, pp. 463-471, April 1985.

[4] R. Wang and G. Giannakis, “Approaching MIMO channel capacity with reduced-complexity soft sphere decoding,” in Proc. of IEEE Wireless Communications and Networking Conf. (WCNC), vol. 3, March 2004, pp. 1620-1625.

[5] C. Studer, A. Burg, and H. Bölcskei, “Soft-output sphere decoding: algorithms and VLSI implementation,” IEEE J. Select. Areas Commun., vol. 26, no. 2, pp. 290-300, February 2008.

[6] D. L. Milliner, E. Zimmermann, J. R. Barry and G. Fettweis, “A fixed-complexity smart candidate adding algorithm for soft-output MIMO detection,” IEEE Trans. Signal Process., vol. 3, no. 6, pp. 1016-1025, December 2009.

[7] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491-503, March 2006.

[8] M. Myllyla, M. Juntti, and J. R. Cavallaro, “Architecture design and implementation of the increasing radius XList sphere detector algorithm,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Taipei, Taiwan, April 2009, pp. 553-556.

[9] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389-399, March 2003.

[10] 3rd Generation Partnership Project, “Multiplexing and channel coding (FDD),” 3GPP Tech. Spec., TS 25.212 V11.1.0, March 2012.

[11] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner and H. Boelcskei, “VLSI Implementation of MIMO transmission using the sphere decoding algorithm,” IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566-1577, July 2005.

[12] E. Agrell, T. Eriksson, A. Vardy and K. Zeger, “Closest point search in lattices,” IEEE Trans. on Inform. Theory, vol. 48, no. 8, pp. 2201-2214, August 2002.

[13] D. Wubben, R. Bohnke, J. Rinas, V. Kuhn and K.D. Kammeyer, “Efficient algorithm for decoding layered space-time codes,” Electronics Letters, vol. 37, no. 22, pp. 1348-1350, October 2001.

[14] B. Shim, and I. Kang, “Joint modulation classification and detection using sphere decoding,” IEEE Signal Processing Letters, vol. 16, no. 9, pp. 778-781, September 2009.

[15] A. Damnjanovic, J. Montojo, Y. Wei, T. Ji, T. Luo, M. Vajapeyam, T. Yoo, O. Song and D. Malladi, “A survey on 3GPP heterogeneous networks,” IEEE Trans. Wireless Commun., vol. 18, no. 3, pp. 10-21, June 2011.

[16] Y. Ohwatari, N. Miki, T. Asai, T. Abe and H. Taoka, “Performance of advanced receiver employing interference rejection combining to suppress inter-cell interference in LTE-Advanced downlink,” in Proc. IEEE 74th Vehicular Technology Conf. (VTC Fall 2011), San Francisco, CA, USA, September 2011, pp. 1-7.

[17] “Proposal for UE receiver assumption in CoMP simulations,” 3GPP TSG RANWG1 #63bis contribution R1-110576, January 2011.

[18] H. Artes, D. Seethaler, and F. Hlawatsch, “Efficient detection algorithms for MIMO channels: A geometrical approach to approximate ML detection,” IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2808-2820, November 2003.

[19] C. N. Manchon, L. Deneire, P. Mogensen, and T. B. Sorensen, “On the design of a MIMO-SIC receiver for LTE downlink,” in Proc. IEEE 68th Vehicular Technology Conf. (VTC Fall 2008), Calgary, Canada, September 2008, pp. 1-5.

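Acknowledgments

First, I sincerely thank my advisor, Dr. Po-Ning Chen, who took me in when my graduate studies ran into difficulties I could not resolve. Our frequent discussions and his guidance toward the right direction benefited me greatly during this time. The rigor my teachers bring to scholarship is a model for all of us.

The completion of this thesis also owes much to the generous help and guidance of Dr. Shin-Lin Shieh (謝欣霖) of the Graduate Institute of Communication Engineering, National Taipei University. Under his guidance, as both a mentor and a friend, I learned a great deal in the mathematical derivations and the program simulations of communication engineering. He also taught me to face the setbacks and obstacles of academic research with a positive attitude.

I thank all the senior members, classmates, and junior members of Lab 823. I will always remember your help and your humor, and without you this thesis could not have been completed smoothly. My girlfriend's quiet support has been the force driving me forward, and she gave me the greatest understanding and patience throughout my bumpy graduate career.

Finally, I dedicate this thesis to my beloved parents.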
