A Simple Scheme for Realizing the Promised Gains of Downlink Nonorthogonal Multiple Access


Shin-Lin Shieh, Member, IEEE, and Yu-Chih Huang, Member, IEEE

Abstract—In this paper, the downlink nonorthogonal multiple access (NOMA) system is studied, and purely discrete input distributions are found that achieve the capacity region to within a constant gap without successive interference cancellation (SIC). The approach is a two-step approach where the corresponding linear deterministic model is first studied and the results are then systematically translated into purely discrete input distributions for the original model. A simple yet powerful coding scheme, which adopts off-the-shelf turbo codes with pulse amplitude modulation (PAM), is then used to simulate the proposed input distributions. Simulation results show that the proposed simple scheme under turbo decoding, both with and without SIC, can operate close to the information-theoretic bounds of the proposed input distributions, which lie outside the achievable rate region of any orthogonal multiple access (OMA)-type scheme.

Index Terms—Non-orthogonal multiple access (NOMA), capacity region, optimal input distributions, linear deterministic model.

I. INTRODUCTION

Driven by new types of services such as the Internet of Things (IoT) and by unprecedentedly increasing demands for transmission rates, 5G has been extensively discussed for standardization, and 3GPP plans to submit its proposal of novel communication technologies to the IMT-2020 process triggered by ITU-R [1]. Currently, some important requirements of future communication systems such as 5G have been recognized, with two of the most crucial ones being 1) providing higher system capacity while taking into account the fairness among users [2]; 2) providing massive connectivity to deal with a huge number of new devices [3]. In [1]–[4], nonorthogonal multiple access (NOMA) is proposed as a candidate for future radio access to partially fulfill these two requirements, and the possibility of downlink NOMA for 5G is currently being studied by 3GPP [5].

The currently prevailing approach for multiple access lies in the category of orthogonal multiple access (OMA), which includes frequency division multiple access (FDMA) and time division multiple access (TDMA) adopted in 2G systems, and orthogonal frequency division multiple access (OFDMA) used in 4G systems. This approach first partitions the resource into orthogonal resource blocks and then assigns each resource block exclusively to one user. After this, the problem is reduced to a point-to-point communication problem and then well-developed single-user encoders/decoders can be applied. On the one hand, the complexity of this approach is merely the complexity of single-user encoders/decoders. On the other hand, assigning resource blocks exclusively can be very inefficient (in terms of achievable rate regions) and may pose a serious problem about fairness among users.

Manuscript received November 4, 2015; revised January 14, 2016; accepted February 15, 2016. Date of publication February 24, 2016; date of current version April 13, 2016. The work of S.-L. Shieh was supported by Ministry of Science and Technology, Taiwan, under grant MOST 104-2221-E-305-004. The work of Y.-C. Huang was supported by Ministry of Science and Technology, Taiwan, under grant MOST 104-2218-E-305-001-MY2. This work was also supported by the Industrial Technology Research Institute (ITRI), Taiwan. The associate editor coordinating the review of this paper and approving it for publication was Z. Ding.

The authors are with the Department of Communication Engineering, National Taipei University, New Taipei City 23741, Taiwan (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCOMM.2016.2533489

In contrast to OMA, NOMA allows users to use the same resource blocks for transmission simultaneously and hence is potentially more efficient. Indeed, when evaluated under the LTE system, NOMA demonstrates significant gains over OMA systems [6], [7]. In theory, uplink and downlink NOMA can be modelled as a multiple access channel (MAC) and a broadcast channel (BC), respectively, both of which have been intensively studied for decades. For the Gaussian case, it is well known that the capacity regions of the Gaussian MAC [8], [9] and of the Gaussian BC [10]–[12] can be achieved by schemes involving successive interference cancellation (SIC). However, due to hardware limitations, these results have been largely limited to the information-theoretic realm. Recent advances in hardware have reignited the interest in using SIC for multiple access [5], [13] and have suggested the possibility of implementing NOMA.

Apart from [1]–[4], there have been many works discussing potential gains of NOMA over OMA-type systems in various aspects. In [14], it is shown that H-ARQ is better suited for NOMA than OMA in terms of outage probability. In [15], downlink NOMA is considered in a coordinated two-point system. When the two base stations are allowed to cooperate, a scheme is proposed to provide a reasonable transmission rate without degrading the near users’ performance. The works in [16], [17] discuss the potential gains of downlink NOMA when the users are allowed to cooperate. The gains mainly come from exploiting the fact that in downlink NOMA, users with stronger links can decode the signals intended for the weaker users and then use the decoded signals to help the weaker users. Another interesting and crucial problem in NOMA which does not appear in OMA is that one now has to carefully split the users into groups and let the users in each group share the same orthogonal resource block. A version of this problem where each group has two users is called user pairing and has recently been studied in [18], [19].

Most of these works (and others in the literature) consider Gaussian input distributions which are more or less prohibited in practice. To the best of our knowledge, it is yet unclear how to systematically build practical encoders/decoders that can achieve the gains of NOMA promised by the theoretic works.

In this paper, we look at downlink NOMA and study practical coding and modulation design for realizing the promised gains of NOMA in terms of achievable rates. Specifically, we attempt to systematically build simple yet powerful schemes which can reliably operate at rate tuples close to the capacity region. Our approach is a two-step approach where we first consider the linear deterministic model [20] corresponding to the original Gaussian problem and then use the results obtained from the deterministic model to guide our design for the original model. This method has been phenomenally successful in tackling many difficult problems in network information theory. However, most of these works translate the results of the deterministic model into coding schemes in the original model. In contrast, this paper follows the steps of [21], [22] where, instead of working scheme-wise, we interpret everything in terms of input distributions and strive to find approximately optimal input distributions for which SIC may not be required.

Information-theoretic analysis is performed to show that the proposed input distributions together with the joint typical-set decoder [23] can indeed achieve the capacity region to within a constant gap even without SIC. We then simulate the proposed input distributions by off-the-shelf turbo codes and traditional PAM constellations to verify the theoretical analysis.

This paper contributes to the field of NOMA by substantially bridging the gap between theory and practice. The main contributions of this paper are listed as follows.

1) We propose purely discrete input distributions (or uniform distributions over PAM constellations to be specific) for downlink NOMA by systematically translating the optimal input distributions of the corresponding deterministic model. We provide lower bounds on the achievable rates for our proposed input distributions and show that these achievable rate regions are able to approach the capacity region to within a constant gap (regardless of SNR). Our analysis uses the single-user joint typical-set decoder rather than the SIC decoder. This indicates that information-theoretically, single-user encoders/decoders (without SIC) suffice to be within a constant gap of the capacity limit for downlink NOMA. We would like to emphasize that this may not be an easy task for the capacity-achieving Gaussian input distributions because at the strong user, without SIC, the signal intended for the weak user becomes additional Gaussian noise and the gap would then depend on SNR.

2) Monte Carlo simulations are performed to show that there exist achievable rate tuples of the proposed input distributions with joint typical-set decoder lying outside the capacity region of any OMA-type scheme. This implies that theoretically, the proposed purely discrete input distributions can achieve rate tuples that are not achievable by any OMA-type scheme.

3) We build practical schemes with off-the-shelf turbo codes in conjunction with PAM constellations to simulate the proposed approximately optimal input distributions. Simulation results show that this simple scheme can operate close to the capacity region under standard turbo decoding [24], both with and without SIC.

We believe that this paper substantially bridges the gap between theory and practice as 1) the purely discrete input distributions are much easier to implement than the Gaussian input distributions which are usually used in the literature; 2) the removal of SIC significantly relieves the burden of decoding for the downlink case as the others’ codebooks are no longer required and decreases the latency.

A. Organization

The rest of the paper is organized as follows. In Section II, we provide the system model for downlink NOMA. We then investigate downlink NOMA in Section III where we study the corresponding deterministic model, translate the input distributions to the original Gaussian case, and show that the proposed distributions achieve the capacity region to within a constant gap. Practical encoders/decoders are then proposed in Section IV to approach the derived information-theoretic results. Section V provides simulation results, followed by the conclusion in Section VI.

B. Notations

Throughout the paper, $\mathbb{N}$ represents the set of natural numbers and $\mathbb{F}_2$ is the binary field. Random variables are written in uppercase Sans Serif font. All the logarithms are to the base 2. For a real number $x$, $(x)^+ \triangleq \max\{0, x\}$, $\log(x)^+ \triangleq \max\{0, \log(x)\}$, and $\lceil x \rceil$ rounds $x$ to the smallest integer that is greater than or equal to $x$.

II. SYSTEM MODEL OF DOWNLINK NON-ORTHOGONAL MULTIPLE ACCESS FOR OFDM

In an OFDM system [25], the spectrum is divided into orthogonal sub-channels where each sub-channel occupies a narrow-band (relative to the channel coherence bandwidth) resource block. Under this condition, the signal sent through each sub-channel can be viewed as experiencing a flat fading channel in the frequency domain. Traditional OFDMA adopted in 4G communication assigns orthogonal resource blocks to users exclusively. On the other hand, NOMA allows users to share sub-channels. In order to maintain the flat fading assumption, it is widely accepted that we should implement NOMA on top of OFDM; i.e., one still partitions the spectrum into resource blocks but then assigns each resource block to several users, which are allowed to send their signals simultaneously.

In the following, we describe the system model of downlink NOMA for OFDM, which is essentially the Gaussian BC channel [10]–[12].

The problem of downlink NOMA can be modelled as the Gaussian BC channel where a base station wishes to broadcast messages $W_1, \ldots, W_K$ to users $1, \ldots, K$, respectively, as shown in Fig. 1. The base station first encodes the messages into the transmitted signal $X$ subject to a power constraint $E(X^2) \le 1$. The signal received at user $k$ is given by

$$Y_k = \sqrt{\mathrm{SNR}_k}\, X + Z_k, \quad k \in \{1, \ldots, K\}, \tag{1}$$

where $Z_k \sim \mathcal{N}(0, 1)$ and $\mathrm{SNR}_k$ represents the overall channel effect involving the signal power, noise variance, and channel gain¹ of user $k$. The base station is assumed to have the knowledge of $\lceil \frac{1}{2}\log(\mathrm{SNR}_k)^+ \rceil$, a quantized version of $\mathrm{SNR}_k$, for all $k \in \{1, \ldots, K\}$, and user $k$ is assumed to have the exact value of $\mathrm{SNR}_k$. Since all the ingredients and insights of the proposed method appear in the two-user case, from now on we shall focus on the two-user case for the sake of clarity and postpone the results of the general case to Appendix A.

Fig. 1. The BC model.

Consider $K = 2$ and let $R_1$ and $R_2$ be the rates of the messages intended for users 1 and 2, respectively. Without loss of generality, we assume $\mathrm{SNR}_1 \ge \mathrm{SNR}_2$. Users 1 and 2 then individually make estimates $\hat{W}_1$ and $\hat{W}_2$ of their desired messages $W_1$ and $W_2$, respectively. The capacity region of this channel is the collection of all the rate pairs $(R_1, R_2)$ satisfying [23]

$$R_1 \le \frac{1}{2}\log\left(1 + \alpha\,\mathrm{SNR}_1\right), \tag{2}$$

$$R_2 \le \frac{1}{2}\log\left(1 + \frac{(1 - \alpha)\,\mathrm{SNR}_2}{1 + \alpha\,\mathrm{SNR}_2}\right), \tag{3}$$

for some $\alpha \in [0, 1]$, the parameter controlling the power allocation between the signals intended for users 1 and 2. Information-theoretically, the capacity region can be easily achieved by employing Gaussian input distributions and performing SIC as follows. The strong user (user 1) first decodes $X_2$, subtracts it out, and then decodes $X_1$. The weak user (user 2) directly decodes $X_2$ by treating $X_1$ as noise. However, in practice, Gaussian input distributions are rarely considered due to implementation issues.
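To make the comparison with OMA concrete, the following minimal Python sketch evaluates the boundary described by (2) and (3) and compares it with simple time-sharing between the two single-user corner points. It assumes the standard degraded Gaussian BC expression in which user 2 sees the interference from user 1's signal scaled by its own $\mathrm{SNR}_2$; the function names and the sample SNR values (20 dB and 10 dB) are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def bc_boundary(snr1, snr2, alphas):
    """Rate pairs on the Gaussian BC boundary for power splits alpha (linear SNRs)."""
    r1 = 0.5 * np.log2(1 + alphas * snr1)
    # User 2 treats user 1's signal (power fraction alpha) as noise.
    r2 = 0.5 * np.log2(1 + (1 - alphas) * snr2 / (1 + alphas * snr2))
    return r1, r2

def time_sharing_r2(snr1, snr2, r1):
    """Largest R2 on the straight line joining the two single-user corner points."""
    c1 = 0.5 * np.log2(1 + snr1)
    c2 = 0.5 * np.log2(1 + snr2)
    return c2 * (1 - r1 / c1)

snr1, snr2 = 10 ** (20 / 10), 10 ** (10 / 10)   # 20 dB and 10 dB
alphas = np.linspace(0.01, 0.99, 99)
r1, r2 = bc_boundary(snr1, snr2, alphas)
gain = r2 - time_sharing_r2(snr1, snr2, r1)     # extra R2 over time-sharing at equal R1
```

Under these assumptions every entry of `gain` is positive, which is the rate-region advantage of superposition over orthogonal resource splitting that the rest of the paper aims to realize with discrete inputs.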

¹ Here, we only consider the case where the channel gains are real. For the complex case, we can let the base station send the signal through real and imaginary parts independently and let user k rotate its phase back. By doing so, we obtain the same signal model. Hence, this model corresponds to the one with coherent detection.

Fig. 2. The deterministic BC channel.

In the following sections, we aim at finding approximately optimal input distributions that are deemed practical. Before proceeding, we summarize a lemma which will be frequently used when deriving achievable rates for the Gaussian model.

Lemma 1 (Proposition 1 of [26]): Let $F$ be a discrete random variable with minimum distance $d_{\min} > 0$. Let $Z$ be a zero-mean unit-variance random variable independent of $F$ (not necessarily a Gaussian random variable). Then

$$I(F; F + Z) \ge H(F) - \frac{1}{2}\log\left(\frac{2\pi e}{12}\right) - \frac{1}{2}\log\left(1 + \frac{12}{d_{\min}^2}\right). \tag{4}$$
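As a quick numerical reading of Lemma 1, the sketch below evaluates the right-hand side of (4) in bits. The function name is a hypothetical helper introduced only for illustration; the value $d_{\min} = \sqrt{3}$ used in the example is the worst-case minimum distance that appears later in the analysis.

```python
import math

def lemma1_lower_bound(entropy_bits, d_min):
    """Right-hand side of (4): H(F) minus the two penalty terms, in bits."""
    shaping_term = 0.5 * math.log2(2 * math.pi * math.e / 12)
    distance_term = 0.5 * math.log2(1 + 12 / d_min ** 2)
    return entropy_bits - shaping_term - distance_term

# Total penalty when the minimum distance equals sqrt(3).
penalty = -lemma1_lower_bound(0.0, math.sqrt(3))
```

With $d_{\min} = \sqrt{3}$ the penalty evaluates to roughly 1.4156 bits, and it shrinks as the constellation points move further apart relative to the unit-variance noise.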

III. DOWNLINK NOMA WITH PURELY DISCRETE INPUT DISTRIBUTIONS

In this section, we study downlink NOMA and propose purely discrete input distributions that can operate within a constant gap to the capacity region even without SIC. This is done by first considering the linear deterministic model [20] corresponding to downlink NOMA and then systematically translating these input distributions into distributions for the original Gaussian model.

A. The Deterministic Model

Now, we present the deterministic model corresponding to downlink NOMA. This model is obtained by [20]: 1) approximating the channel coefficients by powers of 2, 2) expressing variables as binary expansions, 3) truncating signals below the noise level, and 4) ignoring the carry-overs at each level.

For any $n_1, n_2 \in \mathbb{N}$, let $q \triangleq \max(n_1, n_2)$ and let $S$ be the $q \times q$ shift matrix. A two-user deterministic BC channel with channel parameters $(n_1, n_2)$ is given by

$$Y_1 = S^{q - n_1} X, \tag{5}$$

$$Y_2 = S^{q - n_2} X, \tag{6}$$

where $X \in \mathbb{F}_2^q$ and the operations are over $\mathbb{F}_2$. Without loss of generality, we assume $n_1 \ge n_2$; thus, $q = n_1$ in this case. An illustration of this model can be found in Fig. 2, where each link represents a bit pipe and one can think of the lowest $n_1 - n_2$ bits as being shifted out at user 2, modelling the scenario where a part of the signal is below the noise level and cannot be losslessly recovered at the weaker user. The capacity region of the deterministic BC channel can be easily obtained; it is the closure of the convex hull of integer pairs $(m_1, m_2) \in \mathbb{N}^2$ satisfying [20]

$$m_2 \le n_2, \tag{7}$$

$$m_1 + m_2 \le n_1. \tag{8}$$

This capacity region can be easily achieved by a simple scheme that uses the first $m_2$ bits to transmit the message intended for user 2 and the next $m_1$ bits to transmit the message intended for user 1 [20]. In what follows, we provide another view of this simple strategy, which regards the scheme as input distributions obtained by linear transformations of uniform distributions. This view will be used to guide the design for the Gaussian model.

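Consider a pair of integers $(m_1, m_2)$² satisfying the capacity bounds in (7) and (8). Let $U_1$ and $U_2$ be two i.i.d. random vectors of length $n_1$ with entries drawn independently and uniformly from $\mathbb{F}_2$. Define $r_{11} \triangleq \min\{m_1, n_2 - m_2\}$ and $r_{12} \triangleq (m_1 + m_2 - n_2)^+$. Let

$$E_1 = \begin{bmatrix} \mathbf{0}_{m_2, n_1} \\ F_{11} \\ F_{12} \\ \mathbf{0}_{n_1 - m_1 - m_2, n_1} \end{bmatrix}, \tag{9}$$

and

$$E_2 = \begin{bmatrix} F_2 \\ \mathbf{0}_{n_1 - m_2, n_1} \end{bmatrix}, \tag{10}$$

where $F_{11}$, $F_{12}$, and $F_2$ are binary matrices with size $r_{11} \times n_1$, $r_{12} \times n_1$, and $m_2 \times n_1$, respectively, such that $[F_{11}^T\ F_{12}^T\ F_2^T]^T$ is full rank. We then linearly transform the i.i.d. uniformly distributed $U_1$ and $U_2$ by $E_1$ and $E_2$ to form $X_1 = E_1 U_1$ and $X_2 = E_2 U_2$, respectively. The input distribution becomes

$$X = X_1 + X_2 = E_1 U_1 + E_2 U_2 = \begin{bmatrix} F_2 U_2 \\ F_{11} U_1 \\ F_{12} U_1 \\ \mathbf{0}_{n_1 - m_1 - m_2, n_1} \end{bmatrix}. \tag{11}$$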

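Note that with this choice, as long as (8) is satisfied, the positions where $X_1$ and $X_2$ send their information are always disjoint. For the input distribution thus constructed, we have at the weaker user

$$\begin{aligned}
I(X_2; Y_2) &= H(Y_2) - H(Y_2 | X_2) \\
&= H(S^{q - n_2} X) - H(S^{q - n_2} X_1) \\
&= \mathrm{rank}(S^{q - n_2} X) - \mathrm{rank}(S^{q - n_2} X_1) \\
&= \mathrm{rank}\left(\begin{bmatrix} \mathbf{0}_{n_1 - n_2, n_1} \\ F_2 U_2 \\ F_{11} U_1 \end{bmatrix}\right) - \mathrm{rank}\left(\begin{bmatrix} \mathbf{0}_{n_1 - n_2 + m_2, n_1} \\ F_{11} U_1 \end{bmatrix}\right) \\
&= m_2 + r_{11} - r_{11} \\
&= m_2.
\end{aligned} \tag{12}$$

² Here, we only show the achievability of the integral rate demands. One can get the entire region by time-sharing.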

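Moreover, at user 1, we have

$$\begin{aligned}
I(X_1; Y_1) &= H(Y_1) - H(Y_1 | X_1) \\
&= H(S^{q - n_1} X) - H(S^{q - n_1} X_2) \\
&= \mathrm{rank}(S^{q - n_1} X) - \mathrm{rank}(S^{q - n_1} X_2) \\
&= m_1 + m_2 - m_2 \\
&= m_1.
\end{aligned} \tag{13}$$

It is worth mentioning that one can also follow the traditional method where SIC is adopted to achieve the capacity as

$$\begin{aligned}
I(X_1; Y_1 | X_2) &= H(Y_1 | X_2) - H(Y_1 | X_1, X_2) \\
&= H(S^{q - n_1} X_1) \\
&= m_1.
\end{aligned} \tag{14}$$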

However, SIC requires the codebook knowledge of the other user and increases the decoding burden at the user 1. On the other hand, in (13), we only perform single-user decoding and assume that the stronger user is completely oblivious to the codebook knowledge of the other user. This suggests that no SIC is required and treating interference from the user 2 as noise is optimal. This intuition will be leveraged to design schemes for the Gaussian model.
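The bit-level picture behind (12) and (13) can be sketched in a few lines of Python. The sketch assumes that $S$ is the down-shift matrix with the most significant bit stored first, and it simplifies the construction by letting the products $F_2 U_2$, $F_{11} U_1$, and $F_{12} U_1$ be the raw message bits; the helper names and the parameters $(n_1, n_2, m_1, m_2) = (5, 3, 3, 2)$ are illustrative only.

```python
import numpy as np

def shift_matrix(q):
    """q x q down-shift matrix: (S x)[i] = x[i-1], with x[0] the top (strongest) level."""
    return np.eye(q, k=-1, dtype=int)

def deterministic_bc(x, n1, n2):
    """Outputs (Y1, Y2) of the deterministic BC in (5)-(6), computed over F_2."""
    q = max(n1, n2)
    s = shift_matrix(q)
    y1 = np.linalg.matrix_power(s, q - n1) @ x % 2
    y2 = np.linalg.matrix_power(s, q - n2) @ x % 2
    return y1, y2

def encode(w1, w2, n1):
    """Stack user 2's bits on top, then user 1's bits, then zero padding, as in (11)."""
    pad = np.zeros(n1 - len(w1) - len(w2), dtype=int)
    return np.concatenate([w2, w1, pad])

n1, n2 = 5, 3
w1 = np.array([1, 0, 1])   # m1 = 3 bits for user 1
w2 = np.array([1, 1])      # m2 = 2 bits for user 2
x = encode(w1, w2, n1)
y1, y2 = deterministic_bc(x, n1, n2)

# Each user reads only its own levels; no knowledge of the other message is needed.
w1_hat = y1[len(w2):len(w2) + len(w1)]
w2_hat = y2[n1 - n2:n1 - n2 + len(w2)]
```

In this toy instance both users recover their bits by reading fixed positions of their own observation, which mirrors the single-user decoding argument above: the levels used by the two messages never overlap.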

B. The Gaussian Channel

Guided by the corresponding deterministic model, we now propose purely discrete input distributions for the Gaussian BC and analyze the achievable rate regions of using such input distributions. As in [22], we translate each full rank matrix in the deterministic model into a traditional PAM constellation with cardinality and power level guided by the deterministic model. We let $n_1 \triangleq \lceil \frac{1}{2}\log(\mathrm{SNR}_1)^+ \rceil$ and $n_2 \triangleq \lceil \frac{1}{2}\log(\mathrm{SNR}_2)^+ \rceil$. Also, let $(m_1, m_2)$ be a pair of real numbers satisfying (7) and (8). In the following, we describe the proposed scheme and show that it can achieve every point inside the capacity region to within 2.4156 bits.

Before proceeding, we make the following two assumptions: 1) neither $m_1 = 0$ nor $m_2 = 0$ and 2) both $\mathrm{SNR}_1 \ge 2$ and $\mathrm{SNR}_2 \ge 2$; therefore, the $+$ sign in the definition of $n_1$ and $n_2$ can be dropped. The former assumption is because if either one is 0, one can easily show that the PAM constellation can achieve the single-user capacity to within 0.2541 bit (corresponding to the 1.53 dB loss in shaping gain [27]). The latter assumption is because for a user with $\mathrm{SNR}_k < 2$, the capacity is at most 0.79 bit and is already smaller than the target.

Let us define

$$\gamma \triangleq \sqrt{\frac{12}{2^{2n_1 - 2m_1 - 2m_2}\left(2^{2m_1 + 2m_2} - 1\right)}} = 2^{-n_1}\sqrt{\frac{12}{1 - 2^{-2m_1 - 2m_2}}}, \tag{15}$$

and define the following discrete random variables

$$X_{11} = 2^{n_1 - m_2 - r_{11}} F_{11}, \tag{16}$$

$$X_{12} = 2^{n_1 - m_2 - r_{11} - r_{12}} F_{12}, \tag{17}$$

(recall that $r_{11} = \min\{m_1, n_2 - m_2\}$, $r_{12} = (m_1 + m_2 - n_2)^+$, and $r_{11} + r_{12} = m_1$) and

$$X_2 = 2^{n_1 - m_2} F_2, \tag{18}$$

where $F_{11}$, $F_{12}$, and $F_2$ are uniformly distributed over unit distance PAM constellations $A_{11}$, $A_{12}$, and $A_2$ with cardinality $2^{r_{11}}$, $2^{r_{12}}$, and $2^{m_2}$, respectively. The transmitted signal is then given by

$$X = \gamma (X_{11} + X_{12} + X_2) = \gamma (X_1 + X_2), \tag{19}$$

where $X_1 \triangleq X_{11} + X_{12}$ and $X_2$ are the signals intended for users 1 and 2, respectively.

Suppose $r_{12} = 0$; then $X_{12} = \emptyset$ and $X_1$ is uniformly distributed with cardinality $2^{m_1}$. Now consider $r_{12} \ne 0$. Note that the maximum distance of $X_{12}$ is $2^{n_1 - m_2 - r_{11}}(1 - 2^{-r_{12}})$ and the minimum distance of $X_{11}$ is $2^{n_1 - m_2 - r_{11}}$. Thus, the supports of $X_{11}$ and $X_{12}$ are disjoint and $X_1$ again is uniformly distributed with cardinality $2^{m_1}$. Moreover, note that the maximum distance of $X_1$ is $2^{n_1 - m_2}(1 - 2^{-m_1})$, which is smaller than $2^{n_1 - m_2}$, the minimum distance of $X_2$. Therefore, the supports of $X_1$ and $X_2$ are disjoint. This results in a uniformly distributed $X$ with cardinality $2^{m_1 + m_2}$. One can then verify that with this choice, $E[X^2] \le 1$; hence the power constraint is satisfied.
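The construction in (15)-(19) can be checked numerically. The sketch below builds the superposed constellation for integer parameters and verifies that it is an equally spaced PAM with $2^{m_1 + m_2}$ points and unit average power. It assumes each unit-distance PAM is centred at zero (the text does not spell out the centring), and the helper names and example parameters are hypothetical.

```python
import numpy as np
from itertools import product

def pam(bits):
    """Zero-mean, unit-distance PAM with 2**bits points (centring is an assumption)."""
    size = 2 ** bits
    return np.arange(size) - (size - 1) / 2

def superposed_constellation(n1, n2, m1, m2):
    """All values of X = gamma * (X11 + X12 + X2) following (15)-(19), sorted."""
    r11 = min(m1, n2 - m2)
    r12 = max(m1 + m2 - n2, 0)
    gamma = 2.0 ** (-n1) * np.sqrt(12 / (1 - 2.0 ** (-2 * (m1 + m2))))
    x11 = 2.0 ** (n1 - m2 - r11) * pam(r11)
    x12 = 2.0 ** (n1 - m2 - r11 - r12) * pam(r12)
    x2 = 2.0 ** (n1 - m2) * pam(m2)
    points = [gamma * (a + b + c) for a, b, c in product(x11, x12, x2)]
    return np.sort(np.array(points))

points = superposed_constellation(n1=4, n2=2, m1=2, m2=1)
spacing = np.diff(points)
average_power = np.mean(points ** 2)
```

For this example the eight points are equally spaced and the average power equals one, so $\gamma$ acts purely as the normalization that meets the power constraint with equality.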

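With this input distribution, the received signals at users 1 and 2 become

$$Y_1 = \sqrt{\mathrm{SNR}_1}\, X + Z_1 = \sqrt{\mathrm{SNR}_1}\,\gamma \left(2^{n_1 - n_2} F_{11} + 2^{n_1 - m_1 - m_2} F_{12} + 2^{n_1 - m_2} F_2\right) + Z_1, \tag{20}$$

and

$$Y_2 = \sqrt{\mathrm{SNR}_2}\, X + Z_2 = \sqrt{\mathrm{SNR}_2}\,\gamma \left(2^{n_1 - n_2} F_{11} + 2^{n_1 - m_1 - m_2} F_{12} + 2^{n_1 - m_2} F_2\right) + Z_2, \tag{21}$$

respectively.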

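We now directly bound the mutual information as follows. At receiver 1, the desired signal is $X_1$ and we thus bound $I(X_1; Y_1)$ as

$$\begin{aligned}
I(X_1; Y_1) &= h(Y_1) - h(Y_1 | X_1) \\
&= [h(Y_1) - h(Z_1)] - [h(\sqrt{\mathrm{SNR}_1}\,\gamma X_2 + Z_1) - h(Z_1)] \\
&= I(X; Y_1) - I(\sqrt{\mathrm{SNR}_1}\,\gamma X_2; \sqrt{\mathrm{SNR}_1}\,\gamma X_2 + Z_1) \\
&\ge I(X; Y_1) - H(\sqrt{\mathrm{SNR}_1}\,\gamma X_2) \\
&\overset{(a)}{\ge} m_1 - \frac{1}{2}\log\left(\frac{2\pi e}{12}\right) - \frac{1}{2}\log\left(1 + \frac{12}{d_{\min}^2}\right) \\
&\overset{(b)}{=} m_1 - 1.4156,
\end{aligned} \tag{22}$$

where (a) is from applying Lemma 1 and (b) is by plugging in the minimum distance of $\sqrt{\mathrm{SNR}_1}\, X$,

$$d_{\min} = \sqrt{\mathrm{SNR}_1}\,\gamma \cdot 2^{n_1 - m_1 - m_2} \overset{(c)}{\ge} 2^{n_1 - 1}\, 2^{-n_1} \sqrt{\frac{12}{1 - 2^{-2m_1 - 2m_2}}} \cdot 2^{n_1 - m_1 - m_2} \ge \sqrt{3}, \tag{23}$$

where in (c), we have used $\mathrm{SNR}_1 > 2^{2(n_1 - 1)}$. At receiver 2, the desired signal is $X_2$ and hence we lower bound $I(X_2; Y_2)$ as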

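$$\begin{aligned}
I(X_2; Y_2) &= h(Y_2) - h(Y_2 | X_2) \\
&= [h(\sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_{12} + X_2) + Z_2) - h(\sqrt{\mathrm{SNR}_2}\,\gamma X_{12} + Z_2)] \\
&\quad - [h(\sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_{12}) + Z_2) - h(\sqrt{\mathrm{SNR}_2}\,\gamma X_{12} + Z_2)] \\
&= I(\sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_2); Y_2) - I(\sqrt{\mathrm{SNR}_2}\,\gamma X_{11}; \sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_{12}) + Z_2) \\
&\ge I(\sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_2); Y_2) - H(\sqrt{\mathrm{SNR}_2}\,\gamma X_{11}) \\
&\overset{(d)}{\ge} m_2 - \frac{1}{2}\log\left(\frac{2\pi e}{12}\right) - \frac{1}{2}\log\left(1 + \frac{12}{d_{\min}^2}\right) \\
&\overset{(e)}{=} m_2 - 1.4156,
\end{aligned} \tag{24}$$

where (d) follows from Lemma 1 with $\sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_2)$ as the discrete random variable and (e) is by plugging in the minimum distance of this random variable as

$$d_{\min} = \sqrt{\mathrm{SNR}_2}\,\gamma \cdot 2^{n_1 - n_2} \ge 2^{n_2 - 1}\, 2^{-n_1} \sqrt{\frac{12}{1 - 2^{-2m_1 - 2m_2}}} \cdot 2^{n_1 - n_2} \ge \sqrt{3}. \tag{25}$$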

Also, note that in the above analysis, we intentionally regard $\sqrt{\mathrm{SNR}_2}\,\gamma X_{12}$ as noise in order to keep the minimum distance of the combined constellation $\sqrt{\mathrm{SNR}_2}\,\gamma (X_{11} + X_2)$ bounded away from 0. One can see this by observing that if we include any part of $\sqrt{\mathrm{SNR}_2}\,\gamma X_{12}$ in the combined constellation, the minimum distance can be arbitrarily small as $\mathrm{SNR}_2$ grows. In other words, the signal $\sqrt{\mathrm{SNR}_2}\,\gamma X_{12}$ is below the noise level, as predicted by the deterministic model.

Remark 2: It is known that for the BC channel, the capacity region of the Gaussian model differs from that of the deterministic model by at most 1 bit [20, Sec. II.B] (the gap is often much smaller). Hence, the results in this section indicate that the proposed input distributions can achieve every point on the capacity region to within a gap of 2.4156 bits. Note that the bounds are derived so that they are true regardless of $\mathrm{SNR}_1$ and $\mathrm{SNR}_2$. As we will see in Section V, the actual gap is usually much smaller. Moreover, the gap can be further shrunk by considering higher-dimensional constellations which have higher shaping gains.

Remark 3: We would like to emphasize that when deriving the achievable rate of the stronger user in (22), no SIC is required and single-user typical-set decoding is performed. Hence, the proposed approach does not require the codebook knowledge of the weaker user at the stronger user. This significantly alleviates the burden of downlink NOMA, as it is sometimes unrealistic to assume that the receivers have others’ codebooks in the downlink scenario. It is worth mentioning that for the capacity-achieving Gaussian input distributions, if one abandons SIC at user 1 and treats the signal intended for user 2 as noise, the performance will be largely degraded and the gap to the capacity will depend on SNR, as the noise would now have a Gaussian distribution with variance $1 + (1 - \alpha)\mathrm{SNR}_1$. One explanation of the above phenomenon is that the proposed input distribution is carefully designed such that when viewed as noise, it only has finitely many possibilities which are well-separated. A similar observation that discrete input distributions sometimes outperform Gaussian ones has also been made in [28], where mixed discrete and Gaussian distributions are used for the symmetric Gaussian interference channel.

Fig. 3. An example of downlink transmission with gray mapping. Note that the combined constellation becomes non-gray.

Remark 4: As one may have already noticed, the power allocation parameter $\alpha$ in (2) and (3) does not explicitly appear in our proposed method. In fact, the proposed method naturally induces a power allocation strategy between users which can be derived from $\gamma$ and the parameters in front of the input distributions (16)–(18). Therefore, one can also think of the proposed method as a power and rate allocation strategy for a scheme using a uniform input distribution over a PAM constellation.

IV. PRACTICAL ENCODERS/DECODERS AND SIMULATION RESULTS

In this section, we build practical encoders/decoders to simulate the input distributions proposed in (16), (17), and (18). We use off-the-shelf binary linear codes in conjunction with PAM constellations $A_{11}$, $A_{12}$, and $A_2$. This scheme will be tested in Section V where we provide simulation results of the practical encoders/decoders for justifying the theoretical results in Section III.

Specifically, we adopt multilevel coding [29] where the messages are split into bit streams and each stream is encoded by a widely known turbo code in the 3GPP technical specification [30]. The coded bits are then mapped to the constellation via gray mapping as shown in Fig. 3, where an example with $A_{11}$ being 2-PAM, $A_{12} = \emptyset$, and $A_2$ being 4-PAM is plotted. One can see from this figure that the labeling of the combined constellation $A_{\mathrm{sum}}$ at the receiver no longer follows the rule of gray mapping.
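The loss of the gray property in the combined constellation is easy to reproduce. The sketch below labels each component constellation with a binary-reflected Gray code, concatenates the labels (user 2's bits first, which is an assumed ordering since the figure is not reproduced here), and counts the bit differences between neighbouring points. The function names are hypothetical.

```python
def gray_labels(bits):
    """Binary-reflected Gray labels for 2**bits points, in increasing amplitude order."""
    if bits == 0:
        return [""]
    return [format(i ^ (i >> 1), f"0{bits}b") for i in range(2 ** bits)]

def combined_labels(m1, m2):
    """Labels of the superposed constellation, sorted by amplitude.

    User 2 selects the coarse position (2**m2 points) and user 1 the fine
    position (2**m1 points); each user labels its own component with a Gray code.
    """
    g1, g2 = gray_labels(m1), gray_labels(m2)
    points = [(i2 * 2 ** m1 + i1, g2[i2] + g1[i1])
              for i2 in range(2 ** m2) for i1 in range(2 ** m1)]
    return [label for _, label in sorted(points)]

def adjacent_hamming(labels):
    """Number of differing bits between each pair of neighbouring labels."""
    return [sum(a != b for a, b in zip(u, v)) for u, v in zip(labels, labels[1:])]

# Example matching the text: user 1 uses 2-PAM (m1 = 1), user 2 uses 4-PAM (m2 = 2).
labels = combined_labels(m1=1, m2=2)
distances = adjacent_hamming(labels)
```

In this example some neighbouring points differ in two bits, whereas a true Gray labelling of 8-PAM would differ in exactly one bit everywhere.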

We discuss the following two cases where 1) user 1 adopts SIC and 2) user 1 does not adopt SIC. Note that to perform SIC, full knowledge about the codebook of the co-scheduled user is required at user 1. In this case, user 1 first decodes the other user’s message and then reconstructs the codeword. It then subtracts this codeword to get a cleaner received signal and decodes its own message from this signal. The non-SIC receiver does not require the co-scheduled user’s codebook information and tries to directly decode its own codeword from the received signal. Since user 2 has the smaller SNR, it is very unlikely that it can decode the other’s message successfully; therefore, only the case without SIC is considered for user 2.

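Let $A^{(b)}_{\mathrm{sum}, j}$ be the set containing all the points in $A_{\mathrm{sum}}$ whose $j$th bit is $b$. User $k \in \{1, 2\}$ first calculates the log-likelihood ratio (LLR) of the coded bits $b_{j,l}$ for each bit $j$ and each received symbol $l$, given by³

$$\begin{aligned}
\mathrm{LLR}(b_{j,l}) &= \log \frac{\max_{\xi \in \sqrt{\mathrm{SNR}_k} \cdot A^{(1)}_{\mathrm{sum}, j}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}|y_{k,l} - \xi|^2\right)}{\max_{\xi \in \sqrt{\mathrm{SNR}_k} \cdot A^{(0)}_{\mathrm{sum}, j}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}|y_{k,l} - \xi|^2\right)} \\
&= \frac{1}{2}\left(\min_{\xi \in \sqrt{\mathrm{SNR}_k} \cdot A^{(0)}_{\mathrm{sum}, j}} |y_{k,l} - \xi|^2 - \min_{\xi \in \sqrt{\mathrm{SNR}_k} \cdot A^{(1)}_{\mathrm{sum}, j}} |y_{k,l} - \xi|^2\right),
\end{aligned} \tag{26}$$

where $y_{k,l}$ represents the $l$th received symbol at user $k$.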
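A direct implementation of the max-log rule in (26) is short. The sketch below is a minimal version in which `points` holds the constellation already scaled by $\sqrt{\mathrm{SNR}_k}$ and `labels` is a hypothetical 0/1 array giving the bits attached to each point; neither name comes from the paper.

```python
import numpy as np

def max_log_llr(y, points, labels):
    """Max-log LLRs per (26): 0.5 * (min dist^2 to bit-0 points - min dist^2 to bit-1 points).

    y      : received symbols, shape (L,) or scalar
    points : scaled constellation points, shape (M,)
    labels : bit labels of each point, shape (M, J), entries in {0, 1}
    returns: LLR array of shape (L, J)
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels, dtype=int)
    dist2 = (y[:, None] - points[None, :]) ** 2          # (L, M) squared distances
    llr = np.empty((y.size, labels.shape[1]))
    for j in range(labels.shape[1]):
        d0 = dist2[:, labels[:, j] == 0].min(axis=1)     # closest point with bit j = 0
        d1 = dist2[:, labels[:, j] == 1].min(axis=1)     # closest point with bit j = 1
        llr[:, j] = 0.5 * (d0 - d1)
    return llr

# 2-PAM toy example: point -1 carries bit 0, point +1 carries bit 1.
llr = max_log_llr(0.5, points=[-1.0, 1.0], labels=[[0], [1]])
```

A positive LLR favours bit 1 under this sign convention; in the toy example the received value 0.5 is closer to +1, giving an LLR of 1.0.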

These LLRs are then fed into a channel decoder. Let us first consider the case where SIC is used. User 1 first uses turbo decoding with 32 iterations to decode the message intended for user 2. Note that since we adopt multilevel coding, multistage decoding [29] is used here and in the following to decode the message bit by bit. If this decoding succeeds⁴, the corresponding codeword is regenerated and the decoder uses this knowledge to improve the LLRs, with the corresponding constellation reduced to $\sqrt{\mathrm{SNR}_1} A_1$. It then uses another turbo decoder with 32 iterations to decode its own message. If the decoding of the first codeword fails, the decoder falls back to the decoder without SIC, which is discussed in the following.

For decoding at user 1 without SIC and decoding at user 2, the decoder simply uses the computed LLR (26) to decode the desired message. We would like to emphasize that the current decoder is by no means optimal and there are several ways to improve it. For example, one option would be performing an iterative decoding which iterates not only within turbo decoding but also between demodulation and decoding. Another option would be drawing a super-graph corresponding to the two turbo codes connected by the channel observations and running iterative decoding on this super-graph. We do not pursue these possibilities here as our intention is to use well-known designs that are deemed as practical.

We conclude this section by noting that although the mapping from coded bits onto constellation points does not affect the overall mutual information as long as it is bijective [23], it does matter in the finite blocklength regime and does affect the design of decoding. The interface between coding and modulation is quite important and interesting; however, it is not the focus of the present work and we intentionally choose gray mapping, which is simple and has been adopted almost everywhere in practice. Design of algorithms for bit-labelling is left as potential future work.

³ Here, we use a widely accepted approximation which replaces summation by max [24].

⁴ In practice, this can be easily checked by applying CRC on top of channel coding.

Fig. 4. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 20 dB and SNR2 = 10 dB.

V. SIMULATION RESULTS

In this section, we perform simulations to verify the effectiveness of the proposed method for both the two-user and three-user cases. For each case, we first consider the proposed input distributions ((16), (17), and (18) for the two-user case; see Appendix A for the three-user case) and perform Monte Carlo simulation (averaged over $10^6$ samples) to verify the theoretical results in Section III. We show that although the bounds have indicated a constant gap of at most 2.4156 bits per user, the rate loss can in fact be much smaller. We then perform simulations to verify that the practical scheme constructed in the previous section based on these distributions can indeed operate at points close to the capacity region.
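A Monte Carlo estimate of the single-user (no SIC) mutual information can be sketched as follows. This is only an illustration of the kind of computation described, not the authors' simulation code: it assumes zero-centred unit-distance PAM components, uses far fewer samples than the $10^6$ mentioned above, and the function names and operating point ($\mathrm{SNR}_1 = 200$, so $n_1 = 4$) are hypothetical.

```python
import numpy as np

def pam(bits):
    """Zero-mean, unit-distance PAM with 2**bits points (centring is an assumption)."""
    size = 2 ** bits
    return np.arange(size) - (size - 1) / 2

def mi_single_user(x_own, x_other, snr, gamma, n_samples=20000, seed=0):
    """Monte Carlo estimate of I(X_own; Y) in bits, with X_other left as interference."""
    rng = np.random.default_rng(seed)
    scale = np.sqrt(snr) * gamma
    i = rng.integers(len(x_own), size=n_samples)
    j = rng.integers(len(x_other), size=n_samples)
    y = scale * (x_own[i] + x_other[j]) + rng.standard_normal(n_samples)
    # Unnormalised Gaussian likelihood of y for every (own, other) pair.
    grid = scale * (x_own[:, None] + x_other[None, :])
    lik = np.exp(-0.5 * (y[:, None, None] - grid[None, :, :]) ** 2)
    p_y_given_own = lik[np.arange(n_samples), i, :].mean(axis=1)
    p_y = lik.mean(axis=(1, 2))
    return float(np.mean(np.log2(p_y_given_own / p_y)))

# Illustrative parameters (not taken from the paper's tables).
n1, n2, m1, m2 = 4, 2, 2, 1
r11, r12 = min(m1, n2 - m2), max(m1 + m2 - n2, 0)
gamma = 2.0 ** (-n1) * np.sqrt(12 / (1 - 2.0 ** (-2 * (m1 + m2))))
x11 = 2.0 ** (n1 - m2 - r11) * pam(r11)
x12 = 2.0 ** (n1 - m2 - r11 - r12) * pam(r12)
x1 = (x11[:, None] + x12[None, :]).ravel()      # user 1's signal X1 = X11 + X12
x2 = 2.0 ** (n1 - m2) * pam(m2)                 # user 2's signal
rate_user1 = mi_single_user(x1, x2, snr=200.0, gamma=gamma)
```

At this operating point the estimate for user 1 is close to $m_1 = 2$ bits, well above the guaranteed $m_1 - 1.4156$, which is consistent with the observation that the actual rate loss is much smaller than the analytical gap.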

A. Two-User Case

In Fig. 4, we considerSNR1= 20 dB and SNR2= 10 dB and show the capacity region (obtained by Gaussian input distributions), the Gaussian time-sharing region (obtained by time-sharing between the two corner points with Gaussian input distributions), the PAM time-sharing region (obtained by time- sharing between two corner points with PAM inputs), the rate pairs achieved by the proposed scheme with SIC, and that without SIC. One observes that the proposed method with SIC can operate at rate pairs very close to the capacity region and there are many achievable rate pairs (even without SIC) lying outside the Gaussian time-sharing region. This indicates an exciting fact that the proposed simple method can achieve rate pairs that is not achievable by any OMA-type downlink scheme whose capacity region is associated with the Gaussian time-sharing region. Another observation here is that the gap in R1between the proposed scheme with SIC and that without SIC becomes

Fig. 5. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 30 dB and SNR2 = 10 dB.

Fig. 6. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 15 dB and SNR2 = 3 dB.

larger as R2 gets larger. This is because a large R2 introduces higher uncertainty at the stronger user (user 1), and SIC resolves this uncertainty before user 1's message is decoded. However, for some applications, it may be unrealistic to assume that the strong user always knows the codebook of the weak one; hence, we might have to live with the rate loss in such applications.

We also provide simulation results for the case of SNR1 = 30 dB and SNR2 = 10 dB in Fig. 5. Similar observations can be made for this set of parameters. Comparing Fig. 4 and Fig. 5, one can also see the well-known phenomenon that the larger the SNR difference, the higher the gain a NOMA-type scheme can have over time-sharing. Moreover, we also simulate the proposed scheme at (relatively) low SNR in Fig. 6, where SNR1 = 15 dB and SNR2 = 3 dB, and show that a similar trend can be observed.
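The dependence of the NOMA gain on the SNR difference can be illustrated numerically. The following is a minimal sketch, not the simulation code used for the figures: it assumes the standard two-user degraded Gaussian broadcast channel rate expressions with a power split `alpha` (a name introduced here for illustration) and measures the largest gap in R1, at equal R2, between the capacity boundary and the Gaussian time-sharing line.

```python
import math


def db_to_linear(snr_db):
    """Convert an SNR in dB to linear scale."""
    return 10.0 ** (snr_db / 10.0)


def capacity(snr):
    """Single-user Gaussian capacity in bits per (real) channel use."""
    return 0.5 * math.log2(1.0 + snr)


def max_r1_gain_over_time_sharing(snr1_db, snr2_db, grid=2000):
    """Largest gap in R1 (at equal R2) between the Gaussian BC boundary and
    the Gaussian time-sharing line, searched over a grid of power splits."""
    snr1, snr2 = db_to_linear(snr1_db), db_to_linear(snr2_db)
    c1, c2 = capacity(snr1), capacity(snr2)
    best = 0.0
    for i in range(grid + 1):
        alpha = i / grid  # assumed notation: power fraction given to user 1
        # Strong user decodes and removes user 2's signal first (SIC).
        r1 = capacity(alpha * snr1)
        # Weak user treats user 1's signal as additional noise.
        r2 = 0.5 * math.log2(1.0 + (1.0 - alpha) * snr2 / (alpha * snr2 + 1.0))
        # R1 offered by time-sharing between the corner points at the same R2.
        r1_time_sharing = c1 * (1.0 - r2 / c2)
        best = max(best, r1 - r1_time_sharing)
    return best


gain_20_10 = max_r1_gain_over_time_sharing(20.0, 10.0)
gain_30_10 = max_r1_gain_over_time_sharing(30.0, 10.0)
print(f"max R1 gain over time-sharing, 20/10 dB: {gain_20_10:.3f} bits")
print(f"max R1 gain over time-sharing, 30/10 dB: {gain_30_10:.3f} bits")
```

Under these assumptions the gap for the 30 dB/10 dB pair exceeds that for the 20 dB/10 dB pair, in line with the trend noted above.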

Next, we simulate the proposed practical encoder/decoder and plot the achievable rate pairs⁵. In all the simulations, we

⁵ We say a rate is achievable if it is ε-feasible [4], i.e., this encoder/decoder provides a word error rate smaller than ε. We adopt in this paper a widely


TABLE I

PARAMETERS FOR THE PRACTICAL ENCODER/DECODER IN FIG. 4

TABLE II

TABLE III

fix the codeword length of each turbo code to be 5000 and vary the information length according to the selected code rate. The results are also shown in Fig. 4 (parameters given in Table I), Fig. 5 (parameters given in Table II), and Fig. 6 (parameters given in Table III). Each achievable rate pair is obtained by a sequence of simulations as follows. We start with the target mutual information previously obtained, say (r1*, r2*), and keep decreasing the rate of user 2 (with step size 0.005) until the simulation result shows that it is achievable (i.e., ε-feasible). Call this rate r2. We then start with (r1*, r2) and keep decreasing the rate of user 1 (with step size 0.005) until it is achievable (i.e., ε-feasible). We call this rate r1 and say that (r1, r2) is achievable.
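The back-off procedure just described can be sketched as follows. This is only an illustration under stated assumptions: `is_feasible(user, r1, r2)` is a hypothetical oracle standing in for a full turbo-coded simulation that reports whether the given user's word error rate is below the threshold, and `toy_oracle` is a made-up stand-in with fixed rate limits, not simulation data.

```python
def back_off_rates(r1_star, r2_star, is_feasible, step=0.005):
    """Lower user 2's rate, then user 1's rate, in fixed steps until the
    hypothetical oracle `is_feasible(user, r1, r2)` accepts each of them."""

    def search(user, start, make_pair):
        k = 0
        while True:
            # Index the steps by an integer to avoid accumulating float error.
            rate = round(start - k * step, 6)
            if rate <= 0.0:
                raise ValueError(f"no feasible rate found for user {user}")
            if is_feasible(user, *make_pair(rate)):
                return rate
            k += 1

    # Step 1: fix user 1 at its target and back off user 2.
    r2 = search(2, r2_star, lambda r: (r1_star, r))
    # Step 2: fix user 2 at the rate just found and back off user 1.
    r1 = search(1, r1_star, lambda r: (r, r2))
    return r1, r2


def toy_oracle(user, r1, r2):
    """Made-up feasibility rule used only to exercise the search."""
    limit = 2.5 if user == 1 else 1.2
    rate = r1 if user == 1 else r2
    return rate <= limit + 1e-9


r1, r2 = back_off_rates(2.52, 1.23, toy_oracle)
print(r1, r2)
```

In an actual experiment the oracle would run the encoder/decoder at the requested code rates and compare the measured word error rate with the chosen threshold.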

In the last column of these tables, we also show the (R1, R2) gains over the theoretical limit of the OMA-type scheme that employs time-sharing between two single-user PAM inputs. For each entry, two gains in rate are shown: the first is the gain in R1 with R2 held fixed, and the second is the gain in R2 with R1 held fixed. It should be noted that we are comparing the achievable rate pairs of the proposed practical scheme, which has a finite block length, with the theoretical limits of the OMA-type scheme, where an infinite block length is essentially allowed. This comparison is not entirely fair to the proposed scheme; however, we still observe large rate improvements of the proposed scheme over the OMA-type scheme. One can expect a larger gain if the achievable rate tuples are compared with an OMA scheme of practical block length. The simulation results in these figures and tables corroborate our theoretical analysis and show that in practice, although Gaussian input distributions are more or less forbidden and the assumption of SIC may be

accepted value ε = 10⁻¹; however, we have repeated our simulations for ε = 10⁻² and observed negligible differences.

Fig. 7. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 30 dB, SNR2 = 20 dB, and SNR3 = 10 dB, and we set R3 = 0.76 bits per channel use.

unrealistic, the promised gain of NOMA is still realizable by simple and practical designs. Also, there are many ways to further bridge the gap; for example, by employing more powerful channel codes and/or by choosing a longer block length.

B. Three-User Case

Although we have focused on the two-user case throughout the paper and only discussed the K-user case in Appendix A, we provide some simulation results for the three-user case to demonstrate that the proposed scheme can again provide similar gains. We first consider in Fig. 7 the (relatively) high-SNR case where SNR1 = 30 dB, SNR2 = 20 dB, and SNR3 = 10 dB. For ease of exposition, we fix m3 = 1 and enforce m1 + m2 + m3 = n1. For this case, the Monte Carlo simulation provides R3 = 0.76 bits per channel use irrespective of the choice of m1 and m2, because the sum is fixed to n1 and user 3 is the weakest one. For the capacity region shown in Fig. 7, we fix β, the fraction of power allocated to user 3, to be

$$\beta \triangleq \frac{(2^{2R_3} - 1)(\mathsf{SNR}_3 + 1)}{2^{2R_3}\,\mathsf{SNR}_3}, \tag{27}$$

such that the capacity bound on user 3, $\frac{1}{2}\log\left(1 + \frac{\beta\,\mathsf{SNR}_3}{(1-\beta)\,\mathsf{SNR}_3 + 1}\right)$, is exactly equal to R3. The capacity region shown in Fig. 7 is then generated by sharing the remaining power (1 − β) between users 1 and 2, which corresponds to the slice of the entire capacity region at R3 = 0.76 bits per channel use. Similarly, we plot the capacity region of the OMA-type schemes by fixing the fraction of time allocated to user 3 to be

$$\lambda \triangleq \frac{R_3}{\frac{1}{2}\log(1 + \mathsf{SNR}_3)}, \tag{28}$$

and sharing the remaining time between users 1 and 2. This again corresponds to the slice of the entire time-sharing region at R3 = 0.76 bits per channel use. Recall that by OMA, we mean a scheme that exclusively allocates its resource to one user at a time and then uses time-sharing among users. Therefore, for the


Fig. 8. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 30 dB, SNR2 = 15 dB, and SNR3 = 3 dB, and we set R3 = 0.485 bits per channel use.

TABLE IV

three-user case shown in Fig. 7, the OMA-type capacity region is no longer the time-sharing between the corner points of the capacity region shown in Fig. 7. A curve created by time-sharing between those corner points would correspond to power-sharing between user 3 and users 1 and 2, followed by time-sharing between users 1 and 2, which is clearly not allowed in OMA.
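The two slice parameters can be checked numerically. The sketch below is an illustration under an explicit assumption: the power fraction is taken as the solution of the weakest user's rate equation, 0.5·log2(1 + β·SNR3 / ((1 − β)·SNR3 + 1)) = R3, with users 1 and 2 treated as noise, and the function names are introduced here only for this example.

```python
import math


def power_fraction_for_weakest(r3, snr3):
    """Power fraction beta for user 3, obtained by solving
    0.5*log2(1 + beta*snr3 / ((1 - beta)*snr3 + 1)) = r3 for beta."""
    g = 2.0 ** (2.0 * r3)
    return (g - 1.0) * (snr3 + 1.0) / (g * snr3)


def weakest_user_bound(beta, snr3):
    """Rate of user 3 when users 1 and 2 act as additional noise."""
    return 0.5 * math.log2(1.0 + beta * snr3 / ((1.0 - beta) * snr3 + 1.0))


def time_fraction_for_weakest(r3, snr3):
    """Time fraction lambda for user 3 in an OMA (time-sharing) scheme."""
    return r3 / (0.5 * math.log2(1.0 + snr3))


snr3 = 10.0 ** (10.0 / 10.0)  # SNR3 = 10 dB, as in Fig. 7
r3 = 0.76                     # bits per channel use, as in Fig. 7
beta = power_fraction_for_weakest(r3, snr3)
lam = time_fraction_for_weakest(r3, snr3)
print(f"beta = {beta:.4f}, lambda = {lam:.4f}")
print(f"user-3 bound at beta: {weakest_user_bound(beta, snr3):.4f} bits")
```

Substituting the computed power fraction back into the rate expression recovers R3, which is the property the slice construction relies on.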

We now plot the achievable (R1, R2) pairs generated by Monte Carlo simulations. For the case where we do not perform SIC at the receivers, neither user 1 nor user 2 attempts to decode the other users' codewords, even when they are decodable. Thus, one observes that, without SIC, both users suffer from a rate loss. Akin to the two-user case, one observes that for this particular R3 there are rate pairs (even without SIC) lying outside the Gaussian time-sharing region. This again shows that the proposed simple method can achieve rate tuples lying outside the capacity region of any OMA-type scheme.

In Fig. 8, we perform the same simulation with (relatively) low SNRs, where SNR1 = 30 dB, SNR2 = 15 dB, and SNR3 = 3 dB. We again generate the capacity region and time-sharing region according to the above methods and plot the rate pairs (R1, R2) achievable by the proposed method with R3 = 0.485 bits per channel use fixed. Similar observations can be made, which again shows that the proposed simple method can realize the promised gains of NOMA in both the high- and low-SNR regimes.
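A Monte Carlo estimate of the weakest user's rate without SIC can be sketched as below. This is not the simulation code behind the figures; it assumes the superposed PAM construction of Appendix A with zero-mean unit-distance PAM components, unit-variance Gaussian noise, a received signal of the form sqrt(SNR)·X + Z, and n1 = 5 for SNR1 = 30 dB (an assumption corresponding to rounding up half the base-2 logarithm of the SNR). The number of samples is kept far below 10⁶ so that the sketch runs quickly.

```python
import math
import random


def centered_pam(bits):
    """Unit-distance, zero-mean PAM with 2**bits points (assumed convention)."""
    size = 2 ** bits
    return [i - (size - 1) / 2.0 for i in range(size)]


def weakest_user_rate(n1, m, snr_db, num_samples=20000, seed=1):
    """Monte Carlo estimate (bits) of I(X_K; Y_K) for the weakest user K,
    treating the other users' superposed signals as unknown interference."""
    total = sum(m)
    gamma = 2.0 ** (-n1) * math.sqrt(12.0 / (1.0 - 2.0 ** (-2 * total)))
    gain = math.sqrt(10.0 ** (snr_db / 10.0)) * gamma
    # Received-domain points of the weakest user's own signal X_K.
    top = [gain * 2.0 ** (n1 - m[-1]) * f for f in centered_pam(m[-1])]
    # The sum of the other users' signals is itself a uniform PAM.
    rest = [gain * 2.0 ** (n1 - total) * f for f in centered_pam(total - m[-1])]
    rng = random.Random(seed)

    def likelihood(y, t):
        # p(y | X_K = t) up to a constant factor that cancels in the ratio.
        return sum(math.exp(-0.5 * (y - t - u) ** 2) for u in rest) / len(rest)

    acc = 0.0
    for _ in range(num_samples):
        t = rng.choice(top)
        y = t + rng.choice(rest) + rng.gauss(0.0, 1.0)
        cond = likelihood(y, t)
        marg = sum(likelihood(y, s) for s in top) / len(top)
        acc += math.log2(cond / marg)
    return acc / num_samples


rate_high = weakest_user_rate(5, [2, 2, 1], 10.0)  # SNR3 = 10 dB setting
rate_low = weakest_user_rate(5, [2, 2, 1], 3.0)    # SNR3 = 3 dB setting
print(f"estimated R3 at 10 dB: {rate_high:.3f} bits per channel use")
print(f"estimated R3 at 3 dB:  {rate_low:.3f} bits per channel use")
```

Because the estimate depends only on the total number of levels and on m3, any split of the remaining levels between users 1 and 2 gives the same value, which matches the remark that R3 does not depend on the choice of m1 and m2. The estimates can be compared with the R3 values quoted for Fig. 7 and Fig. 8.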

We also verify the theoretical results by simulating the proposed practical scheme similar to what we have done for the two-user case. The parameters can be found in Table IV and Table V. Note that since we adopt PAM constellations, each

TABLE V

user will suffer from a rate loss relative to the capacity region. In this regard, for all the simulations of the practical schemes shown in Fig. 7, user 3 backs off to operate at R3 = 0.69 bits per channel use, which corresponds to a rate loss of 0.07 bits. For the results in Fig. 8, user 3 backs off to operate at R3 = 0.385 bits per channel use, which corresponds to a rate loss of 0.1 bits.

VI. CONCLUSIONS AND FUTURE WORK

The downlink NOMA system has been studied with the focus on systematically designing practical schemes that can operate close to the fundamental limits. We have first investigated the corresponding linear deterministic model and proposed capacity-achieving input distributions. We have then leveraged the results in the deterministic model to systematically design schemes for the Gaussian case. We have lower bounded the achievable rate regions and have shown that our designs can achieve every point inside the capacity region to within a constant gap even without SIC. Practical encoders/decoders based on the proposed input distributions have been studied, which can realize the promised gains of NOMA over OMA-type schemes.

Some interesting directions for future research are listed in the following. It is interesting to use higher-dimensional constellations, which have better packing and shaping gains, to further shrink the gap to the capacity region. Another direction is to study NOMA with multiple antennas, i.e., multiple-input multiple-output (MIMO), as MIMO has been a “must-have” in almost every modern communication system and will almost certainly be included in future standards. This possibility has recently been studied in [31]–[34]. Also, designing practical coding schemes for uplink NOMA is an interesting problem. One expects the uplink problem to be more challenging than the downlink problem discussed in this paper because, in the uplink, different users suffer from different phases, which may not be easy to compensate simultaneously.

APPENDIX A

PROOF OF THE K-USER CASE

So far, we have always considered the two-user case. In this appendix, we briefly discuss the K-user case and show that the results in Section III are generalizable. We again first consider the deterministic model and then investigate the original Gaussian model.

A. The Deterministic Model

For the deterministic model, let $n_1, n_2, \ldots, n_K \in \mathbb{N}$ be the numbers of bit pipes available from the transmitter to users $1, 2, \ldots, K$, respectively. We assume without loss of generality that $n_1 \ge n_2 \ge \ldots \ge n_K$; in this case, $q \triangleq \max\{n_1, \ldots, n_K\} = n_1$. Thus, the received signal at user $k$ is given by $Y_k = S^{q-n_k} X$, where $X$ is the transmitted signal. The capacity region of this channel is the closure of the convex hull of $(m_1, \ldots, m_K) \in \mathbb{N}^K$ satisfying

\begin{align}
m_K &\le n_K, \tag{29} \\
m_{K-1} + m_K &\le n_{K-1}, \tag{30} \\
&\;\;\vdots \notag \\
m_1 + \ldots + m_K &\le n_1. \tag{31}
\end{align}

Consider any particular tuple $(m_1, \ldots, m_K) \in \mathbb{N}^K$ satisfying (29)–(31). We proceed with the proof by first combining users $2, 3, \ldots, K$ into a super-user with $n_2$ bit pipes available. Let $m_{s,2} = \sum_{k=2}^{K} m_k$ be the total number of bits this super-user demands. The problem then reduces to the two-user broadcast channel with channel parameters $(n_1, n_2)$ and the rate pair $(m_1, m_{s,2})$. Observe that this rate pair lies inside the capacity region of the equivalent two-user broadcast channel; therefore, we can use the results in Section III to show the achievability.

Let $E_1$ and $E_{s,2}$ be the corresponding schemes for generating the capacity-achieving input distributions. Note that since the super-user is the weaker user in this reduced channel, $E_{s,2}$ will always occupy the bit pipes at the top levels (akin to (10)).

We now break this super-user into two sub-users, namely user 2 and the new super-user consisting of users $3, \ldots, K$, and let $m_{s,3} = \sum_{k=3}^{K} m_k$ be the total number of bits this new super-user demands. Now the problem reduces to the two-user broadcast channel with channel parameters $(m_{s,2}, \min\{n_3, m_{s,2}\})$ and the rate pair $(m_2, m_{s,3})$. One can verify that the rate pair falls inside the capacity region of this new broadcast channel, and hence one can use the results in Section III to obtain capacity-achieving $E_2$ and $E_{s,3}$. Note that these two schemes will only use the top $m_2 + m_{s,3} = m_{s,2}$ levels, which are exactly the levels $E_{s,2}$ has used. Therefore, one can think of this step as breaking $E_{s,2}$ into $E_2$ and $E_{s,3}$ without affecting $E_1$ (their non-zero rows are disjoint). Again, the super-user is the weaker user here, and hence $E_{s,3}$ will always occupy the bit pipes at the top levels.

From this point, one can set up a recursion as follows. At stage $l$, what we have obtained from the previous stages are $E_1, \ldots, E_{l-1}$ and $E_{s,l}$, where $E_{s,l}$ only occupies the top $m_{s,l} \triangleq \sum_{k=l}^{K} m_k$ levels. Now, we look into the equivalent two-user broadcast channel with channel parameters $(m_{s,l}, \min\{n_{l+1}, m_{s,l}\})$ and the rate pair $(m_l, m_{s,l+1})$, where $m_{s,l+1} \triangleq \sum_{k=l+1}^{K} m_k$. This rate pair again falls inside the capacity region, and the results in Section III can be applied to create $E_l$ and $E_{s,l+1}$ out of $E_{s,l}$ without affecting $E_1, \ldots, E_{l-1}$. The resulting $E_{s,l+1}$ will correspond to the bit pipes at the top $m_{s,l+1}$ levels. After completing stage $K-1$, we obtain $E_1, \ldots, E_{K-1}, E_K (= E_{s,K})$ that can achieve this particular set of rates $(m_1, \ldots, m_K) \in \mathbb{N}^K$. Note that this argument works for any such tuple satisfying (29)–(31); hence, we have completed the proof for the deterministic model.
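The outcome of the recursion, in terms of which bit-pipe levels each user ends up occupying, can be sketched as follows. This is a simplified illustration rather than the construction of the matrices themselves: it only tracks level occupancy, indexes levels from 0 at the top (most significant) level, and uses a function name and example numbers introduced here for illustration.

```python
def assign_levels(n, m):
    """Return {user: [levels]} with level 0 the top bit pipe.

    n[k-1] is the number of bit pipes seen by user k (non-increasing in k),
    and m[k-1] is the number of bits demanded by user k.
    """
    num_users = len(n)
    if any(n[k] < n[k + 1] for k in range(num_users - 1)):
        raise ValueError("users must be ordered so that n is non-increasing")
    # Feasibility conditions: the users from k onward fit in n_k levels.
    for k in range(num_users):
        if sum(m[k:]) > n[k]:
            raise ValueError(f"rate constraint violated at user {k + 1}")
    levels = {}
    top = 0
    # The weakest user takes the topmost levels, then the next weakest, etc.
    for k in reversed(range(num_users)):
        levels[k + 1] = list(range(top, top + m[k]))
        top += m[k]
    return levels


n = [5, 3, 2]  # illustrative numbers of bit pipes for users 1, 2, 3
m = [2, 1, 2]  # illustrative demands satisfying the constraints
levels = assign_levels(n, m)
for user in sorted(levels):
    print(f"user {user}: levels {levels[user]}")

# Every level used by user k lies within the top n_k levels that user k sees.
visible = all(max(lv) < n[user - 1] for user, lv in levels.items() if lv)
print("all assigned levels visible to their users:", visible)
```

The disjoint level sets mirror the statement that the schemes of different users have disjoint non-zero rows.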

B. The Gaussian Channel

For the Gaussian model, we first note that the proof for the two-user case hinges only on the fact that the part of the signal above the noise level (i.e., not shifted out in the deterministic model) at each receiver contains the desired signal and is uniformly distributed over a discrete constellation with a non-vanishing minimum distance. This is also true for the K-user case, as Lemma 1 can be applied whenever the above condition holds.

Similar to the two-user case in Section III, we again directly translate the results in the deterministic model to the Gaussian case by translating each full-rank matrix in the deterministic model into a uniform distribution over a PAM constellation with cardinality and power level guided by the deterministic model. Let $n_k \triangleq \lceil \frac{1}{2}\log(\mathsf{SNR}_k) \rceil^+$ and let $(m_1, \ldots, m_K) \in \mathbb{N}^K$ be a tuple satisfying (29)–(31). We consider only the case that $m_k > 0$ and $\mathsf{SNR}_k \ge 2$ for all $k \in \{1, \ldots, K\}$; therefore, the $+$ sign in the definition of $n_k$ can be dropped.

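Let

$$\gamma \triangleq \sqrt{\frac{12}{2^{2\left(n_1 - \sum_{k=1}^{K} m_k\right)}\left(2^{2\sum_{k=1}^{K} m_k} - 1\right)}} = 2^{-n_1}\sqrt{\frac{12}{1 - 2^{-2\sum_{k=1}^{K} m_k}}}, \tag{32}$$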

and define the following discrete random variables

$$X_k = 2^{n_1 - \sum_{l=k}^{K} m_l} F_k, \quad k \in \{1, \ldots, K\}, \tag{33}$$

where $F_k$ is uniformly distributed over a unit-distance PAM constellation $\mathcal{A}_k$ with cardinality $2^{m_k}$. The transmitted signal is then given by

$$X = \gamma (X_1 + X_2 + \ldots + X_K),$$

where $X_k$ is the signal intended for user $k$. One can verify that $X$ is uniformly distributed over a PAM constellation with cardinality $2^{\sum_{k=1}^{K} m_k}$ and minimum distance $\gamma \cdot 2^{n_1 - \sum_{k=1}^{K} m_k}$. Also, one can verify that with this choice $\mathbb{E}[X^2] \le 1$; hence, the power constraint is satisfied.
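These properties can be checked by enumerating the superposed constellation for a small example. The sketch below assumes zero-mean unit-distance PAM components and uses illustrative values n1 = 5 and (m1, m2, m3) = (2, 2, 1); the helper names are introduced here only for this check.

```python
import itertools
import math


def centered_pam(bits):
    """Unit-distance, zero-mean PAM with 2**bits points (assumed convention)."""
    size = 2 ** bits
    return [i - (size - 1) / 2.0 for i in range(size)]


def superposed_constellation(n1, m):
    """Enumerate X = gamma * (X_1 + ... + X_K), where
    X_k = 2**(n1 - sum(m[k-1:])) * F_k and F_k is a centered PAM point."""
    total = sum(m)
    gamma = 2.0 ** (-n1) * math.sqrt(12.0 / (1.0 - 2.0 ** (-2 * total)))
    scaled = [
        [2.0 ** (n1 - sum(m[k:])) * f for f in centered_pam(m[k])]
        for k in range(len(m))
    ]
    points = sorted(gamma * sum(combo) for combo in itertools.product(*scaled))
    return gamma, points


n1, m = 5, [2, 2, 1]
gamma, points = superposed_constellation(n1, m)
gaps = [b - a for a, b in zip(points, points[1:])]
min_distance = gamma * 2.0 ** (n1 - sum(m))
power = sum(p * p for p in points) / len(points)

print("number of points:", len(points))
print("all gaps equal to the stated minimum distance:",
      all(abs(g - min_distance) < 1e-12 for g in gaps))
print(f"average power: {power:.6f}")
```

Since every combination of component points maps to a distinct, equally spaced value, a uniform choice of each component yields a uniform distribution over the resulting PAM constellation.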

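With this input distribution, the received signal at the $k$th user is given by

\begin{align}
Y_k &= \sqrt{\mathsf{SNR}_k}\, X + Z_k \notag \\
&= \sqrt{\mathsf{SNR}_k}\, \gamma (X_1 + X_2 + \ldots + X_K) + Z_k \notag \\
&= \sqrt{\mathsf{SNR}_k}\, \gamma (X_{a,k} + X_{b,k}) + Z_k, \tag{34}
\end{align}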

where we have used $X_{a,k}$ and $X_{b,k}$ to denote the signals corresponding to the parts above and below the noise level (depending on whether the part gets shifted out or not in the deterministic model) at the $k$th user, respectively. Note that $X_k \subseteq X_{a,k}$ by construction.