A Simple Scheme for Realizing the Promised Gains of Downlink Nonorthogonal Multiple Access
Shin-Lin Shieh, Member, IEEE, and Yu-Chih Huang, Member, IEEE
Abstract—In this paper, the downlink nonorthogonal multiple access (NOMA) system is studied where purely discrete input distributions are found that achieve the capacity region to within a constant gap without successive interference cancellation (SIC).
The approach has two steps: the corresponding linear deterministic model is first studied, and the results are then systematically translated into purely discrete input distributions for the original model. A simple yet powerful coding scheme, which adopts off-the-shelf turbo codes with pulse amplitude modulation (PAM), is then used to simulate the proposed input distributions. Simulation results show that the proposed simple scheme under turbo decoding, both with and without SIC, can operate close to the information-theoretic bounds of the proposed input distributions, which lie outside the achievable rate region of any orthogonal multiple access (OMA)-type scheme.
Index Terms—Non-orthogonal multiple access (NOMA), capacity region, optimal input distributions, linear deterministic model.
I. INTRODUCTION
DRIVEN by new types of services such as the internet of things (IoT) and by unprecedentedly increasing demands for transmission rates, 5G has been extensively discussed for standardization, and 3GPP plans to submit its proposal of novel communication technologies into the IMT 2020 process triggered by ITU-R [1]. Currently, some important requirements of future communication systems such as 5G have been recognized, with two of the most crucial ones being 1) providing higher system capacity while taking into account the fairness among users [2]; 2) providing massive connectivity to deal with a huge number of new devices [3]. In [1]–[4], non-orthogonal multiple access (NOMA) is proposed as a candidate for future radio access to partially fulfill these two requirements, and the possibility of downlink NOMA for 5G is currently being studied by 3GPP [5].
The currently prevailing approach for multiple access lies in the category of orthogonal multiple access (OMA) which includes frequency division multiple access (FDMA) and time
Manuscript received November 4, 2015; revised January 14, 2016; accepted February 15, 2016. Date of publication February 24, 2016; date of current version April 13, 2016. The work of S.-L. Shieh was supported by Ministry of Science and Technology, Taiwan, under grant MOST 104-2221-E-305-004. The work of Y.-C. Huang was supported by Ministry of Science and Technology, Taiwan, under grant MOST 104-2218-E-305-001-MY2. This work was also supported by the Industrial Technology Research Institute (ITRI), Taiwan. The associate editor coordinating the review of this paper and approving it for publication was Z. Ding.
The authors are with the Department of Communication Engineering, National Taipei University, New Taipei City 23741, Taiwan (e-mail:
[email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCOMM.2016.2533489
division multiple access (TDMA) adopted in 2G systems, and orthogonal frequency division multiple access (OFDMA) used in 4G systems. This approach first partitions the resource into orthogonal resource blocks and then assigns each resource block exclusively to one user. After this, the problem is reduced to a point-to-point communication problem, and well-developed single-user encoders/decoders can be applied. On the one hand, the complexity of this approach is merely the complexity of single-user encoders/decoders. On the other hand, assigning resource blocks exclusively can be very inefficient (in terms of achievable rate regions) and may pose a serious fairness problem among users.
In contrast to OMA, NOMA allows users to use the same resource blocks for transmission simultaneously and hence is potentially more efficient. Indeed, when evaluated under the LTE system, NOMA demonstrates significant gains over OMA systems [6], [7]. In theory, uplink and downlink NOMA can be modelled as the multiple access channel (MAC) and the broadcast channel (BC), respectively, which have been intensively studied for decades. For the Gaussian case, it is well known that the capacity regions of the Gaussian MAC [8], [9] and of the Gaussian BC [10]–[12] can be achieved by schemes involving successive interference cancellation (SIC). However, due to hardware limitations, these results have been largely limited to the information-theoretic realm. Recent advances in hardware have reignited interest in using SIC for multiple access [5], [13] and have suggested the possibility of implementing NOMA.
Apart from [1]–[4], there have been many works discussing potential gains of NOMA over OMA-type systems in various aspects. In [14], it is shown that H-ARQ is better suited for NOMA than OMA in terms of outage probability. In [15], downlink NOMA is considered in a coordinated two-point system. When the two base stations are allowed to cooperate, a scheme is proposed to provide a reasonable transmission rate without degrading the near users’ performance. The works [16], [17] discuss the potential gains of downlink NOMA when the users are allowed to cooperate. The gains mainly come from exploiting the fact that in downlink NOMA, users with stronger links can decode the signals intended for the weaker users and then use the decoded signals to help the weaker users. Another interesting and crucial problem in NOMA which does not appear in OMA is that one now has to carefully split the users into groups and let the users in each group share the same orthogonal resource block. A version of this problem where each group has two users is called user pairing and has recently been studied in [18], [19].
Most of these works (and others in the literature) consider Gaussian input distributions which are more or less prohibited
0090-6778 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
in practice. To the best of our knowledge, it is yet unclear how to systematically build practical encoders/decoders that can achieve the gains of NOMA promised by the theoretic works.
In this paper, we look at downlink NOMA and study practical coding and modulation design for realizing the promised gains of NOMA in terms of achievable rates. Specifically, we attempt to systematically build simple yet powerful schemes which can reliably operate at rate tuples close to the capacity region. Our approach has two steps: we first consider the linear deterministic model [20] corresponding to the original Gaussian problem and then use the results obtained from the deterministic model to guide our design for the original model. This method has been phenomenally successful in tackling many difficult problems in network information theory. However, most of these works interpret the results of the deterministic model as coding schemes in the original model. In contrast, this paper follows the steps of [21], [22] where, instead of working scheme-wise, we interpret everything in terms of input distributions and strive to find approximately optimal input distributions for which SIC may not be required.
Information-theoretic analysis is performed to show that the proposed input distributions together with the joint typical-set decoder [23] can indeed achieve the capacity region to within a constant gap even without SIC. We then simulate the proposed input distributions by off-the-shelf turbo codes and traditional PAM constellations to verify the theoretical analysis.
This paper contributes to the field of NOMA by substantially bridging the gap between theory and practice. The main contributions of this paper are listed as follows.
1) We propose purely discrete input distributions (or uniform distributions over PAM constellations to be specific) for downlink NOMA by systematically translating the optimal input distributions of the corresponding deterministic model. We provide lower bounds on the achievable rates for our proposed input distributions and show that these achievable rate regions approach the capacity region to within a constant gap (regardless of SNR). Our analysis uses the single-user joint typical-set decoder rather than the SIC decoder. This indicates that, information-theoretically, single-user encoders/decoders (without SIC) suffice to come within a constant gap of the capacity limit for downlink NOMA. We would like to emphasize that this may not be an easy task for the capacity-achieving Gaussian input distributions because at the strong user, without SIC, the signal intended for the weak user acts as additional Gaussian noise and the gap would then depend on SNR.
2) Monte Carlo simulations are performed to show that there exist achievable rate tuples of the proposed input distributions with the joint typical-set decoder lying outside the capacity region of any OMA-type scheme. This implies that, theoretically, the proposed purely discrete input distributions can achieve rate tuples that are not achievable by any OMA-type scheme.
3) We build practical schemes with off-the-shelf turbo codes in conjunction with PAM constellations to simulate the proposed approximately optimal input distributions. Simulation results show that this simple scheme can operate close to the capacity region under standard turbo decoding [24], both with and without SIC.
We believe that this paper substantially bridges the gap between theory and practice as 1) the purely discrete input distributions are much easier to implement than the Gaussian input distributions which are usually used in the literature; 2) the removal of SIC significantly relieves the decoding burden in the downlink case, as the other users’ codebooks are no longer required, and decreases the latency.
A. Organization
The rest of the paper is organized as follows. In Section II, we provide the system model for downlink NOMA. We then investigate downlink NOMA in Section III where we study the corresponding deterministic model, translate the input distributions to the original Gaussian case, and show that the proposed distributions achieve the capacity region to within a constant gap. Practical encoders/decoders are then proposed in Section IV to approach the derived information-theoretic results. Section V provides simulation results, followed by the conclusion in Section VI.
B. Notations
Throughout the paper, $\mathbb{N}$ represents the set of natural numbers and $\mathbb{F}_2$ is the binary field. Random variables are written in uppercase sans-serif font. All logarithms are to base 2. For a real number $x$, $(x)^+ \triangleq \max\{0, x\}$, $\log(x)^+ \triangleq \max\{0, \log(x)\}$, and $\lceil x \rceil$ rounds $x$ to the smallest integer that is greater than or equal to $x$.
II. SYSTEM MODEL OF DOWNLINK NON-ORTHOGONAL MULTIPLE ACCESS FOR OFDM
In an OFDM system [25], the spectrum is divided into orthogonal sub-channels where each sub-channel occupies a narrowband (relative to the channel coherence bandwidth) resource block. Under this condition, the signal sent through each sub-channel can be viewed as experiencing a flat fading channel in the frequency domain. Traditional OFDMA adopted in 4G communication assigns orthogonal resource blocks to users exclusively. On the other hand, NOMA allows users to share sub-channels. In order to maintain the flat fading assumption, it is widely accepted that we should implement NOMA on top of OFDM, i.e., one still partitions the spectrum into resource blocks but then assigns each resource block to several users which are allowed to send their signals simultaneously.
In the following, we describe the system model of downlink NOMA for OFDM, which is essentially the Gaussian BC [10]–[12].
The problem of downlink NOMA can be modelled as the Gaussian BC where a base station wishes to broadcast messages $W_1, \ldots, W_K$ to users $1, \ldots, K$, respectively, as shown in Fig. 1. The base station first encodes the messages
Fig. 1. The BC model.
into the transmitted signal $\mathsf{X}$ subject to the power constraint $\mathbb{E}[\mathsf{X}^2] \le 1$. The signal received at user $k$ is given by
$$\mathsf{Y}_k = \sqrt{\mathsf{SNR}_k}\,\mathsf{X} + \mathsf{Z}_k, \quad k \in \{1, \ldots, K\}, \tag{1}$$
where $\mathsf{Z}_k \sim \mathcal{N}(0, 1)$ and $\mathsf{SNR}_k$ represents the overall channel effect of user $k$, involving the signal power, noise variance, and channel gain.$^{1}$ The base station is assumed to know $\lceil \frac{1}{2}\log(\mathsf{SNR}_k)^+ \rceil$, a quantized version of $\mathsf{SNR}_k$, for all $k \in \{1, \ldots, K\}$, and user $k$ is assumed to know the exact value of $\mathsf{SNR}_k$. Since all the ingredients and insights of the proposed method appear in the two-user case, from now on we shall focus on the two-user case for the sake of clarity and postpone the results of the general case to Appendix A.
Consider $K = 2$ and let $R_1$ and $R_2$ be the rates of the messages intended for users 1 and 2, respectively. Without loss of generality, we assume $\mathsf{SNR}_1 \ge \mathsf{SNR}_2$. Users 1 and 2 then individually make estimates $\hat{W}_1$ and $\hat{W}_2$ of their desired messages $W_1$ and $W_2$, respectively. The capacity region of this channel is the collection of all the rate pairs $(R_1, R_2)$ satisfying [23]
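$$R_1 \le \frac{1}{2}\log\left(1 + \alpha\,\mathsf{SNR}_1\right), \tag{2}$$
$$R_2 \le \frac{1}{2}\log\left(1 + \frac{(1 - \alpha)\,\mathsf{SNR}_2}{1 + \alpha\,\mathsf{SNR}_2}\right), \tag{3}$$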
for any $\alpha \in [0, 1]$, where $\alpha$ is the parameter controlling the power allocation between the signals intended for users 1 and 2. Information-theoretically, the capacity region can be easily achieved by employing Gaussian input distributions and performing SIC as follows. The strong user (user 1) first decodes $\mathsf{X}_2$, subtracts it out, and then decodes $\mathsf{X}_1$. The weak user (user 2) directly decodes $\mathsf{X}_2$ by treating $\mathsf{X}_1$ as noise. However, in practice, Gaussian input distributions are rarely considered due to implementation issues.
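To make the comparison between the BC capacity region and OMA-style time sharing concrete, the following is a minimal sketch (not taken from the paper's simulation code) that traces both boundaries for given SNRs. The function names and the grid size are illustrative assumptions; the BC expressions are the standard degraded Gaussian BC bounds, in which user 2 sees user 1's signal as noise scaled by its own SNR.

```python
import numpy as np

def db_to_linear(snr_db):
    """Convert an SNR in dB to linear scale."""
    return 10.0 ** (snr_db / 10.0)

def bc_capacity_boundary(snr1, snr2, num_points=101):
    """Boundary rate pairs (R1, R2) of the two-user Gaussian BC, snr1 >= snr2.

    alpha is the fraction of the unit transmit power given to user 1.
    """
    alpha = np.linspace(0.0, 1.0, num_points)
    r1 = 0.5 * np.log2(1.0 + alpha * snr1)
    # User 2 treats user 1's signal as noise, scaled by user 2's own SNR.
    r2 = 0.5 * np.log2(1.0 + (1.0 - alpha) * snr2 / (1.0 + alpha * snr2))
    return r1, r2

def time_sharing_boundary(snr1, snr2, num_points=101):
    """Rate pairs obtained by time sharing between the two single-user corners."""
    t = np.linspace(0.0, 1.0, num_points)   # fraction of time given to user 1
    c1 = 0.5 * np.log2(1.0 + snr1)
    c2 = 0.5 * np.log2(1.0 + snr2)
    return t * c1, (1.0 - t) * c2

if __name__ == "__main__":
    snr1, snr2 = db_to_linear(20.0), db_to_linear(10.0)
    r1, r2 = bc_capacity_boundary(snr1, snr2)
    for a, x, y in zip(np.linspace(0, 1, 101)[::25], r1[::25], r2[::25]):
        print(f"alpha={a:.2f}  R1={x:.3f}  R2={y:.3f}")
```

Every point on the BC boundary lies on or above the time-sharing line, which is the gap that NOMA-type schemes try to exploit.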
In the following sections, we aim at finding approximately optimal input distributions that are deemed practical. Before
$^{1}$Here, we only consider the case where the channel gains are real. For the complex case, we can let the base station send the signal through the real and imaginary parts independently and let user $k$ rotate its phase back. By doing so, we obtain the same signal model. Hence, this model corresponds to the one with coherent detection.
Fig. 2. The deterministic BC channel.
proceeding, we summarize a lemma which will be frequently used when deriving achievable rates for the Gaussian model.
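Lemma 1 (Proposition 1 of [26]): Let $\mathsf{F}$ be a discrete random variable with minimum distance $d_{\min} > 0$. Let $\mathsf{Z}$ be a zero-mean unit-variance random variable independent of $\mathsf{F}$ (not necessarily Gaussian). Then
$$I(\mathsf{F}; \mathsf{F} + \mathsf{Z}) \ge H(\mathsf{F}) - \frac{1}{2}\log\left(\frac{2\pi \exp(1)}{12}\right) - \frac{1}{2}\log\left(1 + \frac{12}{d_{\min}^2}\right). \tag{4}$$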
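As a quick numerical illustration of Lemma 1, the sketch below evaluates the two penalty terms in (4) for a given minimum distance. The helper names are illustrative and not part of the paper; the value at $d_{\min} = \sqrt{3}$ is the constant that appears in the Gaussian analysis later in the paper.

```python
import math

def lemma1_gap(d_min):
    """Bits subtracted from H(F) in the lower bound (4), logs to base 2."""
    shaping_term = 0.5 * math.log2(2.0 * math.pi * math.e / 12.0)
    distance_term = 0.5 * math.log2(1.0 + 12.0 / d_min ** 2)
    return shaping_term + distance_term

def lemma1_lower_bound(entropy_bits, d_min):
    """Lower bound on I(F; F + Z) for a discrete F with entropy H(F)."""
    return entropy_bits - lemma1_gap(d_min)

if __name__ == "__main__":
    print(round(lemma1_gap(math.sqrt(3.0)), 4))  # 1.4156
```

The first term depends only on the noise variance, while the second shrinks toward zero as the constellation points move further apart.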
III. DOWNLINK NOMA WITH PURELY DISCRETE INPUT DISTRIBUTIONS
In this section, we study downlink NOMA and propose purely discrete input distributions that can operate within a constant gap to the capacity region even without SIC. This is done by first considering the linear deterministic model [20]
corresponding to downlink NOMA and then systematically translating these input distributions into distributions for the original Gaussian model.
A. The Deterministic Model
Now, we present the deterministic model corresponding to downlink NOMA. This model is obtained by [20]: 1) approximating the channel coefficients by powers of 2, 2) expressing variables as binary expansions, 3) truncating signals below the noise level, and 4) ignoring the carry-overs at each level.
For any $n_1, n_2 \in \mathbb{N}$, let $q \triangleq \max(n_1, n_2)$ and let $\mathbf{S}$ be the $q \times q$ shift matrix. A two-user deterministic BC with channel parameters $(n_1, n_2)$ is given by
$$\mathsf{Y}_1 = \mathbf{S}^{q-n_1}\mathsf{X}, \tag{5}$$
$$\mathsf{Y}_2 = \mathbf{S}^{q-n_2}\mathsf{X}, \tag{6}$$
where $\mathsf{X} \in \mathbb{F}_2^q$ and the operations are over $\mathbb{F}_2$. Without loss of generality, we assume $n_1 \ge n_2$; thus, $q = n_1$ in this case. An illustration of this model can be found in Fig. 2, where each link represents a bit pipe. One can view the lowest $n_1 - n_2$ bits as being shifted out at user 2, which models the scenario where part of the signal is below the noise level and cannot be losslessly recovered at the weaker user. The capacity region of the deterministic BC is easily obtained: it is the closure of the convex hull of integer pairs $(m_1, m_2) \in \mathbb{N}^2$ satisfying [20]
$$m_2 \le n_2, \tag{7}$$
$$m_1 + m_2 \le n_1. \tag{8}$$
This capacity region can be easily achieved by a simple scheme that uses the first $m_2$ bits to transmit the message intended for user 2 and the next $m_1$ bits to transmit the message intended for user 1 [20]. In what follows, we provide another view of this simple strategy which regards the scheme as input distributions obtained by linear transformations of uniform distributions. This view will be used to guide the design for the Gaussian model.
Consider a pair of integers$^{2}$ $(m_1, m_2)$ satisfying the capacity bounds in (7) and (8). Let $\mathsf{U}_1$ and $\mathsf{U}_2$ be two i.i.d. random vectors of length $n_1$ with entries drawn independently and uniformly from $\mathbb{F}_2$. Define $r_{11} \triangleq \min\{m_1, n_2 - m_2\}$ and $r_{12} \triangleq (m_1 + m_2 - n_2)^+$. Let
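$$\mathbf{E}_1 = \begin{bmatrix} \mathbf{0}_{m_2, n_1} \\ \mathbf{F}_{11} \\ \mathbf{F}_{12} \\ \mathbf{0}_{n_1 - m_1 - m_2, n_1} \end{bmatrix}, \tag{9}$$
and
$$\mathbf{E}_2 = \begin{bmatrix} \mathbf{F}_2 \\ \mathbf{0}_{n_1 - m_2, n_1} \end{bmatrix}, \tag{10}$$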
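where $\mathbf{F}_{11}$, $\mathbf{F}_{12}$, and $\mathbf{F}_2$ are binary matrices of size $r_{11} \times n_1$, $r_{12} \times n_1$, and $m_2 \times n_1$, respectively, such that $[\mathbf{F}_{11}^T\ \mathbf{F}_{12}^T\ \mathbf{F}_2^T]^T$ is full rank. We then linearly transform the i.i.d. uniformly distributed $\mathsf{U}_1$ and $\mathsf{U}_2$ by $\mathbf{E}_1$ and $\mathbf{E}_2$ to form $\mathsf{X}_1 = \mathbf{E}_1\mathsf{U}_1$ and $\mathsf{X}_2 = \mathbf{E}_2\mathsf{U}_2$, respectively. The input distribution becomes
$$\mathsf{X} = \mathsf{X}_1 + \mathsf{X}_2 = \mathbf{E}_1\mathsf{U}_1 + \mathbf{E}_2\mathsf{U}_2 = \begin{bmatrix} \mathbf{F}_2\mathsf{U}_2 \\ \mathbf{F}_{11}\mathsf{U}_1 \\ \mathbf{F}_{12}\mathsf{U}_1 \\ \mathbf{0}_{n_1 - m_1 - m_2, n_1} \end{bmatrix}. \tag{11}$$
Note that with this choice, as long as (8) is satisfied, the positions where $\mathsf{X}_1$ and $\mathsf{X}_2$ send their information are always disjoint. For the input distribution thus constructed, we have at the weaker user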
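$$\begin{aligned}
I(\mathsf{X}_2; \mathsf{Y}_2) &= H(\mathsf{Y}_2) - H(\mathsf{Y}_2 | \mathsf{X}_2) \\
&= H(\mathbf{S}^{q-n_2}\mathsf{X}) - H(\mathbf{S}^{q-n_2}\mathsf{X}_1) \\
&= \operatorname{rank}(\mathbf{S}^{q-n_2}\mathsf{X}) - \operatorname{rank}(\mathbf{S}^{q-n_2}\mathsf{X}_1) \\
&= \operatorname{rank}\left(\begin{bmatrix} \mathbf{0}_{n_1 - n_2, n_1} \\ \mathbf{F}_2\mathsf{U}_2 \\ \mathbf{F}_{11}\mathsf{U}_1 \end{bmatrix}\right) - \operatorname{rank}\left(\begin{bmatrix} \mathbf{0}_{n_1 - n_2 + m_2, n_1} \\ \mathbf{F}_{11}\mathsf{U}_1 \end{bmatrix}\right) \\
&= m_2 + r_{11} - r_{11} \\
&= m_2.
\end{aligned} \tag{12}$$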
$^{2}$Here, we only show the achievability of integral rate demands. One can get the entire region by time-sharing.
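Moreover, at user 1, we have
$$\begin{aligned}
I(\mathsf{X}_1; \mathsf{Y}_1) &= H(\mathsf{Y}_1) - H(\mathsf{Y}_1 | \mathsf{X}_1) \\
&= H(\mathbf{S}^{q-n_1}\mathsf{X}) - H(\mathbf{S}^{q-n_1}\mathsf{X}_2) \\
&= \operatorname{rank}(\mathbf{S}^{q-n_1}\mathsf{X}) - \operatorname{rank}(\mathbf{S}^{q-n_1}\mathsf{X}_2) \\
&= m_1 + m_2 - m_2 \\
&= m_1.
\end{aligned} \tag{13}$$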
It is worth mentioning that one can also follow the traditional method where SIC is adopted to achieve the capacity as
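$$\begin{aligned}
I(\mathsf{X}_1; \mathsf{Y}_1 | \mathsf{X}_2) &= H(\mathsf{Y}_1 | \mathsf{X}_2) - H(\mathsf{Y}_1 | \mathsf{X}_1, \mathsf{X}_2) \\
&= H(\mathbf{S}^{q-n_1}\mathsf{X}_1) \\
&= m_1.
\end{aligned} \tag{14}$$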
However, SIC requires the codebook knowledge of the other user and increases the decoding burden at the user 1. On the other hand, in (13), we only perform single-user decoding and assume that the stronger user is completely oblivious to the codebook knowledge of the other user. This suggests that no SIC is required and treating interference from the user 2 as noise is optimal. This intuition will be leveraged to design schemes for the Gaussian model.
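The rank computations in (12) and (13) can be checked numerically. The following is a minimal sketch under two assumptions that are not stated in the paper: the shift matrix is taken to be the down-shift matrix (so that the lowest bits fall off at the weaker user), and the full-rank matrices are chosen as rows of the identity. The entropies are computed as ranks over $\mathbb{F}_2$ of the linear maps from the uniform vectors to the channel outputs.

```python
import numpy as np

def gf2_rank(matrix):
    """Rank of a binary matrix over F2 via Gaussian elimination."""
    a = (np.array(matrix, dtype=np.int64) % 2).astype(np.uint8)
    rows, cols = a.shape
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        pivots = np.nonzero(a[rank:, col])[0]
        if pivots.size == 0:
            continue
        pivot = rank + pivots[0]
        a[[rank, pivot]] = a[[pivot, rank]]      # move the pivot row up
        for r in range(rows):
            if r != rank and a[r, col]:
                a[r] ^= a[rank]                  # eliminate with XOR (addition in F2)
        rank += 1
    return rank

def deterministic_rates(n1, n2, m1, m2):
    """Return (I(X1;Y1), I(X2;Y2)) for the input built from E1 and E2."""
    assert n1 >= n2 and 0 <= m2 <= n2 and m1 >= 0 and m1 + m2 <= n1
    q = n1
    r11 = min(m1, n2 - m2)
    eye = np.eye(n1, dtype=np.int64)
    # Assumed choice: identity rows, so [F11; F12; F2] is full rank.
    f2, f11, f12 = eye[:m2], eye[m2:m2 + r11], eye[m2 + r11:m2 + m1]
    e1 = np.vstack([np.zeros((m2, n1), dtype=np.int64), f11, f12,
                    np.zeros((n1 - m1 - m2, n1), dtype=np.int64)])
    e2 = np.vstack([f2, np.zeros((n1 - m2, n1), dtype=np.int64)])
    rates = []
    for n_k, own, other in ((n1, e1, e2), (n2, e2, e1)):
        shift = np.eye(q, k=-(q - n_k), dtype=np.int64)   # S^(q - n_k), down-shift
        h_y = gf2_rank(np.hstack([shift @ own, shift @ other]))
        h_y_given_own = gf2_rank(shift @ other)
        rates.append(h_y - h_y_given_own)
    return tuple(rates)

if __name__ == "__main__":
    print(deterministic_rates(5, 3, 3, 2))
```

With these choices, each user recovers its full rate by single-user decoding, matching the observation that no SIC is needed in the deterministic model.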
B. The Gaussian Channel
Guided by the corresponding deterministic model, we now propose purely discrete input distributions for the Gaussian BC and analyze the achievable rate regions of such input distributions. As in [22], we translate each full-rank matrix in the deterministic model into a traditional PAM constellation with cardinality and power level guided by the deterministic model. We let $n_1 \triangleq \lceil \frac{1}{2}\log(\mathsf{SNR}_1)^+ \rceil$ and $n_2 \triangleq \lceil \frac{1}{2}\log(\mathsf{SNR}_2)^+ \rceil$. Also, let $(m_1, m_2)$ be a pair of real numbers satisfying (7) and (8). In the following, we describe the proposed scheme and show that it can achieve every point inside the capacity region to within 2.4156 bits.
Before proceeding, we make the following two assumptions: 1) neither $m_1 = 0$ nor $m_2 = 0$ and 2) both $\mathsf{SNR}_1 \ge 2$ and $\mathsf{SNR}_2 \ge 2$; therefore, the $+$ sign in the definitions of $n_1$ and $n_2$ can be dropped. The former assumption is made because if either one is 0, one can easily show that the PAM constellation can achieve the single-user capacity to within 0.2541 bit (corresponding to the 1.53 dB loss in shaping gain [27]). The latter assumption is made because for a user with $\mathsf{SNR}_k < 2$, the capacity is at most 0.79 bit and is already smaller than the target.
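Let us define
$$\gamma \triangleq \sqrt{\frac{12}{2^{2n_1 - 2m_1 - 2m_2}\left(2^{2m_1 + 2m_2} - 1\right)}} = 2^{-n_1}\sqrt{\frac{12}{1 - 2^{-2m_1 - 2m_2}}}, \tag{15}$$
and define the following discrete random variables
$$\mathsf{X}_{11} = 2^{n_1 - m_2 - r_{11}}\,\mathsf{F}_{11}, \tag{16}$$
$$\mathsf{X}_{12} = 2^{n_1 - m_2 - r_{11} - r_{12}}\,\mathsf{F}_{12}, \tag{17}$$
(recall that $r_{11} = \min\{m_1, n_2 - m_2\}$, $r_{12} = (m_1 + m_2 - n_2)^+$, and $r_{11} + r_{12} = m_1$) and
$$\mathsf{X}_2 = 2^{n_1 - m_2}\,\mathsf{F}_2, \tag{18}$$
where $\mathsf{F}_{11}$, $\mathsf{F}_{12}$, and $\mathsf{F}_2$ are uniformly distributed over unit-distance PAM constellations $\mathcal{A}_{11}$, $\mathcal{A}_{12}$, and $\mathcal{A}_2$ with cardinality $2^{r_{11}}$, $2^{r_{12}}$, and $2^{m_2}$, respectively. The transmitted signal is then given by
$$\mathsf{X} = \gamma(\mathsf{X}_{11} + \mathsf{X}_{12} + \mathsf{X}_2) = \gamma(\mathsf{X}_1 + \mathsf{X}_2), \tag{19}$$
where $\mathsf{X}_1 \triangleq \mathsf{X}_{11} + \mathsf{X}_{12}$ and $\mathsf{X}_2$ are the signals intended for users 1 and 2, respectively.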
Suppose $r_{12} = 0$; then $\mathsf{X}_{12} = \emptyset$ and $\mathsf{X}_1$ is uniformly distributed with cardinality $2^{m_1}$. Now consider $r_{12} \ne 0$. Note that the maximum distance of $\mathsf{X}_{12}$ is $2^{n_1 - m_2 - r_{11}}(1 - 2^{-r_{12}})$ and the minimum distance of $\mathsf{X}_{11}$ is $2^{n_1 - m_2 - r_{11}}$. Thus, the supports of $\mathsf{X}_{11}$ and $\mathsf{X}_{12}$ are disjoint and $\mathsf{X}_1$ is again uniformly distributed with cardinality $2^{m_1}$. Moreover, note that the maximum distance of $\mathsf{X}_1$ is $2^{n_1 - m_2}(1 - 2^{-m_1})$, which is smaller than $2^{n_1 - m_2}$, the minimum distance of $\mathsf{X}_2$. Therefore, the supports of $\mathsf{X}_1$ and $\mathsf{X}_2$ are disjoint. This results in a uniformly distributed $\mathsf{X}$ with cardinality $2^{m_1 + m_2}$. One can then verify that with this choice, $\mathbb{E}[\mathsf{X}^2] \le 1$; hence the power constraint is satisfied.
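The construction in (15)–(19) can be checked numerically. The sketch below builds the three scaled PAM constellations and their superposition, assuming integer $m_1, m_2$ and zero-mean PAM alphabets (the centering is an assumption made here so that the power equals the variance); the function names are illustrative.

```python
import math
import numpy as np

def pam(cardinality):
    """Zero-mean, unit-distance PAM alphabet with the given number of points."""
    return np.arange(cardinality) - (cardinality - 1) / 2.0

def proposed_constellation(n1, n2, m1, m2):
    """Return (gamma, support of X1, support of X2, support of X) per (15)-(19)."""
    assert n1 >= n2 and 0 <= m2 <= n2 and m1 >= 0 and 0 < m1 + m2 <= n1
    r11 = min(m1, n2 - m2)
    r12 = max(m1 + m2 - n2, 0)
    gamma = math.sqrt(
        12.0 / (2.0 ** (2 * n1 - 2 * m1 - 2 * m2) * (2.0 ** (2 * m1 + 2 * m2) - 1.0))
    )
    x11 = 2.0 ** (n1 - m2 - r11) * pam(2 ** r11)
    x12 = 2.0 ** (n1 - m2 - r11 - r12) * pam(2 ** r12)   # single point 0 when r12 = 0
    x2 = 2.0 ** (n1 - m2) * pam(2 ** m2)
    x1 = (x11[:, None] + x12[None, :]).ravel()
    x = gamma * (x1[:, None] + x2[None, :]).ravel()
    return gamma, x1, x2, x

if __name__ == "__main__":
    gamma, x1, x2, x = proposed_constellation(n1=5, n2=3, m1=3, m2=2)
    print("cardinality:", np.unique(np.round(x, 9)).size)
    print("average power:", float(np.mean(x ** 2)))
    print("minimum distance / gamma:", float(np.min(np.diff(np.sort(x))) / gamma))
```

For these parameters the superposition is a regular PAM with $2^{m_1 + m_2}$ equally likely points, which is why the power normalization by $\gamma$ works out exactly.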
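With this input distribution, the received signals at users 1 and 2 become
$$\mathsf{Y}_1 = \sqrt{\mathsf{SNR}_1}\,\mathsf{X} + \mathsf{Z}_1 = \sqrt{\mathsf{SNR}_1}\,\gamma\left(2^{n_1 - n_2}\mathsf{F}_{11} + 2^{n_1 - m_1 - m_2}\mathsf{F}_{12} + 2^{n_1 - m_2}\mathsf{F}_2\right) + \mathsf{Z}_1, \tag{20}$$
and
$$\mathsf{Y}_2 = \sqrt{\mathsf{SNR}_2}\,\mathsf{X} + \mathsf{Z}_2 = \sqrt{\mathsf{SNR}_2}\,\gamma\left(2^{n_1 - n_2}\mathsf{F}_{11} + 2^{n_1 - m_1 - m_2}\mathsf{F}_{12} + 2^{n_1 - m_2}\mathsf{F}_2\right) + \mathsf{Z}_2, \tag{21}$$
respectively.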
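We now directly bound the mutual information as follows. At receiver 1, the desired signal is $\mathsf{X}_1$ and we thus bound $I(\mathsf{X}_1; \mathsf{Y}_1)$ as
$$\begin{aligned}
I(\mathsf{X}_1; \mathsf{Y}_1) &= h(\mathsf{Y}_1) - h(\mathsf{Y}_1 | \mathsf{X}_1) \\
&= [h(\mathsf{Y}_1) - h(\mathsf{Z}_1)] - [h(\sqrt{\mathsf{SNR}_1}\,\gamma\mathsf{X}_2 + \mathsf{Z}_1) - h(\mathsf{Z}_1)] \\
&= I(\mathsf{X}; \mathsf{Y}_1) - I(\sqrt{\mathsf{SNR}_1}\,\gamma\mathsf{X}_2; \sqrt{\mathsf{SNR}_1}\,\gamma\mathsf{X}_2 + \mathsf{Z}_1) \\
&\ge I(\mathsf{X}; \mathsf{Y}_1) - H(\sqrt{\mathsf{SNR}_1}\,\gamma\mathsf{X}_2) \\
&\overset{(a)}{\ge} m_1 - \frac{1}{2}\log\left(\frac{2\pi \exp(1)}{12}\right) - \frac{1}{2}\log\left(1 + \frac{12}{d_{\min}^2}\right) \\
&\overset{(b)}{\ge} m_1 - 1.4156,
\end{aligned} \tag{22}$$
where (a) follows from applying Lemma 1 and (b) from plugging in the minimum distance of $\sqrt{\mathsf{SNR}_1}\,\mathsf{X}$,
$$\begin{aligned}
d_{\min} &= \sqrt{\mathsf{SNR}_1}\,\gamma \cdot 2^{n_1 - m_1 - m_2} \\
&\overset{(c)}{\ge} 2^{n_1 - 1}\, 2^{-n_1}\sqrt{\frac{12}{1 - 2^{-2m_1 - 2m_2}}} \cdot 2^{n_1 - m_1 - m_2} \\
&\ge \sqrt{3},
\end{aligned} \tag{23}$$
where in (c), we have used $\mathsf{SNR}_1 > 2^{2(n_1 - 1)}$. At receiver 2, the desired signal is $\mathsf{X}_2$ and hence we lower bound $I(\mathsf{X}_2; \mathsf{Y}_2)$ as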
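$$\begin{aligned}
I(\mathsf{X}_2; \mathsf{Y}_2) &= h(\mathsf{Y}_2) - h(\mathsf{Y}_2 | \mathsf{X}_2) \\
&= [h(\sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_{12} + \mathsf{X}_2) + \mathsf{Z}_2) - h(\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{12} + \mathsf{Z}_2)] \\
&\quad - [h(\sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_{12}) + \mathsf{Z}_2) - h(\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{12} + \mathsf{Z}_2)] \\
&= I(\sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_2); \mathsf{Y}_2) - I(\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{11}; \sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_{12}) + \mathsf{Z}_2) \\
&\ge I(\sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_2); \mathsf{Y}_2) - H(\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{11}) \\
&\overset{(d)}{\ge} m_2 - \frac{1}{2}\log\left(\frac{2\pi \exp(1)}{12}\right) - \frac{1}{2}\log\left(1 + \frac{12}{d_{\min}^2}\right) \\
&\overset{(e)}{\ge} m_2 - 1.4156,
\end{aligned} \tag{24}$$
where (d) follows from Lemma 1 with $\sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_2)$ as the discrete random variable and (e) from plugging in the minimum distance of this random variable,
$$\begin{aligned}
d_{\min} &= \sqrt{\mathsf{SNR}_2}\,\gamma \cdot 2^{n_1 - n_2} \\
&\ge 2^{n_2 - 1}\, 2^{-n_1}\sqrt{\frac{12}{1 - 2^{-2m_1 - 2m_2}}} \cdot 2^{n_1 - n_2} \\
&\ge \sqrt{3}.
\end{aligned} \tag{25}$$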
Also, note that in the above analysis, we intentionally regard $\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{12}$ as noise in order to keep the minimum distance of the combined constellation $\sqrt{\mathsf{SNR}_2}\,\gamma(\mathsf{X}_{11} + \mathsf{X}_2)$ bounded away from 0. One can see this by observing that if we include any part of $\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{12}$ in the combined constellation, the minimum distance can be arbitrarily small as $\mathsf{SNR}_2$ grows. In other words, the signal $\sqrt{\mathsf{SNR}_2}\,\gamma\mathsf{X}_{12}$ is below the noise level, as predicted by the deterministic model.
Remark 2: It is known that for the BC, the capacity region of the Gaussian model differs from that of the deterministic model by at most 1 bit [20, Sec. II.B] (the gap is often much smaller). Hence, the results in this section indicate that the proposed input distributions can achieve every point on the capacity region to within a gap of 2.4156 bits. Note that the bounds are derived so that they hold regardless of $\mathsf{SNR}_1$ and $\mathsf{SNR}_2$. As we will see in Section V, the actual gap is usually much smaller. Moreover, the gap can be further shrunk by considering higher-dimensional constellations which have higher shaping gains.
Remark 3: We would like to emphasize that when deriving the achievable rate of the stronger user in (22), no SIC is required and single-user typical-set decoding is performed.
Fig. 3. An example of downlink transmission with gray mapping. Note that the combined constellation becomes non-gray.
Hence, the proposed approach does not require the codebook knowledge of the weaker user at the stronger user. This significantly alleviates the burden of downlink NOMA, as it is sometimes unrealistic to assume that the receivers have others’ codebooks in the downlink scenario. It is worth mentioning that for the capacity-achieving Gaussian input distributions, if one abandons SIC at user 1 and treats the signal intended for user 2 as noise, the performance will be largely degraded and the gap to the capacity will depend on SNR, as the noise would then have a Gaussian distribution with variance $1 + (1 - \alpha)\mathsf{SNR}_1$. One explanation of the above phenomenon is that the proposed input distribution is carefully designed such that, when viewed as noise, it only has finitely many possibilities which are well separated. A similar observation that discrete input distributions sometimes outperform Gaussian ones has also been made in [28], where mixed discrete and Gaussian distributions are used for the symmetric Gaussian interference channel.
Remark 4: As one may have already noticed, the power allocation parameter $\alpha$ in (2) and (3) does not explicitly appear in our proposed method. In fact, the proposed method naturally induces a power allocation strategy between users which can be derived from $\gamma$ and the scaling factors in front of the input distributions (16)–(18). Therefore, one can also think of the proposed method as a power and rate allocation strategy for a scheme using a uniform input distribution over a PAM constellation.
IV. PRACTICAL ENCODERS/DECODERS AND SIMULATION RESULTS
In this section, we build practical encoders/decoders to simulate the input distributions proposed in (16), (17), and (18). We use off-the-shelf binary linear codes in conjunction with PAM constellations $\mathcal{A}_{11}$, $\mathcal{A}_{12}$, and $\mathcal{A}_2$. This scheme will be tested in Section V where we provide simulation results of the practical encoders/decoders for justifying the theoretical results in Section III.
Specifically, we adopt multilevel coding [29] where the messages are split into bit streams and each stream is encoded by the widely known turbo code in the 3GPP technical specification [30].
The coded bits are then mapped to the constellation via Gray mapping as shown in Fig. 3, where an example with $\mathcal{A}_{11}$ being 2-PAM, $\mathcal{A}_{12} = \emptyset$, and $\mathcal{A}_2$ being 4-PAM is plotted. One can see from this figure that the labeling of the combined constellation $\mathcal{A}_{\mathrm{sum}}$ at the receiver no longer follows the rule of Gray mapping.
We discuss the following two cases where 1) user 1 adopts SIC and 2) user 1 does not adopt SIC. Note that to perform SIC, full knowledge about the codebook of the co-scheduled user is required at user 1. In this case, user 1 first decodes the other user’s message and then reconstructs the codeword. It then subtracts this codeword to get a cleaner received signal and decodes its own message from this signal. The non-SIC receiver does not require the co-scheduled user’s codebook information and tries to directly decode its own codeword from the received signal. Since user 2 has the smaller SNR, it is very unlikely that it can decode the other user’s message successfully; therefore, only the case without SIC is considered for user 2.
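Let $\mathcal{A}^{(b)}_{\mathrm{sum},j}$ be the set containing all the points in $\mathcal{A}_{\mathrm{sum}}$ whose $j$th bit is $b$. User $k \in \{1, 2\}$ first calculates the log-likelihood ratio (LLR) of the coded bit $b_{j,l}$ for each bit $j$ and each received symbol $l$, given by$^{3}$
$$\begin{aligned}
\mathrm{LLR}(b_{j,l}) &= \log\frac{\max_{\xi \in \sqrt{\mathsf{SNR}_k}\cdot\mathcal{A}^{(1)}_{\mathrm{sum},j}} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}|y_{k,l} - \xi|^2\right)}{\max_{\xi \in \sqrt{\mathsf{SNR}_k}\cdot\mathcal{A}^{(0)}_{\mathrm{sum},j}} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}|y_{k,l} - \xi|^2\right)} \\
&= \frac{1}{2}\left(\min_{\xi \in \sqrt{\mathsf{SNR}_k}\cdot\mathcal{A}^{(0)}_{\mathrm{sum},j}} |y_{k,l} - \xi|^2 - \min_{\xi \in \sqrt{\mathsf{SNR}_k}\cdot\mathcal{A}^{(1)}_{\mathrm{sum},j}} |y_{k,l} - \xi|^2\right),
\end{aligned} \tag{26}$$
where $y_{k,l}$ represents the $l$th received symbol at user $k$.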
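The max-log LLR in (26) reduces to a difference of two minimum squared distances per bit, which the following sketch computes for a block of received samples. The function name, argument layout, and the bit-label array are illustrative assumptions; any labeling of the combined constellation can be passed in.

```python
import numpy as np

def max_log_llr(y, points, labels, snr):
    """Max-log LLRs as in (26) for real received samples y.

    points : constellation points of A_sum before scaling by sqrt(snr)
    labels : (num_points, num_bits) array with the 0/1 label of each point
    Returns an array of shape (num_samples, num_bits).
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    scaled = np.sqrt(snr) * np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist2 = (y[:, None] - scaled[None, :]) ** 2          # squared distances
    llr = np.empty((y.size, labels.shape[1]))
    for j in range(labels.shape[1]):
        d0 = dist2[:, labels[:, j] == 0].min(axis=1)     # closest point with bit j = 0
        d1 = dist2[:, labels[:, j] == 1].min(axis=1)     # closest point with bit j = 1
        llr[:, j] = 0.5 * (d0 - d1)                      # positive values favor bit 1
    return llr

if __name__ == "__main__":
    points = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-power 4-PAM
    gray = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
    print(max_log_llr([3.0, -0.2], points, gray, snr=5.0))
```

Each bit level must contain at least one point labeled 0 and one labeled 1; otherwise the corresponding minimum is undefined.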
These LLRs are then fed into a channel decoder. Let us first consider the case where SIC is used. User 1 first uses turbo decoding with 32 iterations to decode the message intended for user 2. Note that since we adopt multilevel coding, multistage decoding [29] is used here and in the following to decode the message bit by bit. If the decoding succeeds,$^{4}$ the corresponding codeword is regenerated and the decoder uses this knowledge to improve the LLRs, with the corresponding constellation reduced to $\sqrt{\mathsf{SNR}_1}\,\mathcal{A}_1$.
It then uses another turbo decoder with 32 iterations for decoding the message intended for user 1. If the decoding of the first codeword fails, the decoder falls back to the decoder without SIC, which is discussed in the following.
For decoding at user 1 without SIC and decoding at user 2, the decoder simply uses the computed LLR (26) to decode the desired message. We would like to emphasize that the current decoder is by no means optimal and there are several ways to improve it. For example, one option would be to perform iterative decoding which iterates not only within turbo decoding but also between demodulation and decoding. Another option would be to draw a super-graph corresponding to the two turbo codes connected by the channel observations and run iterative decoding on this super-graph. We do not pursue these possibilities here as our intention is to use well-known designs that are deemed practical.
We conclude this section by noting that although the mapping from coded bits onto constellation points does not affect the overall mutual information as long as it is bijective [23], it does matter in the finite blocklength regime and does affect the
$^{3}$Here, we use a widely accepted approximation which replaces summation by max [24].
$^{4}$In practice, this can be easily checked by applying a CRC on top of channel coding.
Fig. 4. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 20 dB and SNR2 = 10 dB.
design of decoding. The interface between coding and modulation is quite important and interesting; however, it is not the focus of the present work and we intentionally choose Gray mapping, which is simple and has been adopted almost everywhere in practice. Design of algorithms for bit-labelling is left as potential future work.
V. SIMULATION RESULTS
In this section, we perform simulations to verify the effectiveness of the proposed method for both the two-user and three-user cases. For each case, we first consider the proposed input distributions ((16), (17), and (18) for the two-user case; see Appendix A for the three-user case) and perform Monte Carlo simulation (averaged over $10^6$ samples) to verify the theoretical results in Section III. We show that although the bounds indicate a constant gap of at most 2.4156 bits per user, the rate loss can in fact be much smaller. We then perform simulations to verify that the practical scheme constructed in the previous section based on these distributions can indeed operate at points close to the capacity region.
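One way such a Monte Carlo evaluation can be set up is sketched below: it estimates the mutual information between a desired discrete signal and the channel output when the other user's discrete signal is treated as noise (no SIC). This is an illustrative estimator written for this purpose, not the authors' simulation code; the function name, sample count, and seed are assumptions, and both supports are assumed to be already scaled by $\gamma$.

```python
import numpy as np

def mc_mutual_information(desired, interference, snr, num_samples=20000, seed=0):
    """Monte Carlo estimate (in bits) of I(X_d; Y), Y = sqrt(snr) (X_d + X_i) + Z.

    desired, interference : supports of the uniform discrete signals X_d and X_i
    Z is standard Gaussian noise; X_i is treated as part of the noise.
    """
    rng = np.random.default_rng(seed)
    d = np.sqrt(snr) * np.asarray(desired, dtype=float)
    i = np.sqrt(snr) * np.asarray(interference, dtype=float)
    xd = rng.choice(d, size=num_samples)
    xi = rng.choice(i, size=num_samples)
    y = xd + xi + rng.standard_normal(num_samples)

    def log_mixture(residual):
        # log of the average of exp(-r^2 / 2) along axis 1, computed stably;
        # the common factor 1/sqrt(2*pi) cancels in the log-ratio below.
        e = -0.5 * residual ** 2
        m = e.max(axis=1, keepdims=True)
        return m[:, 0] + np.log(np.mean(np.exp(e - m), axis=1))

    log_cond = log_mixture(y[:, None] - xd[:, None] - i[None, :])   # log p(y | x_d)
    all_points = (d[:, None] + i[None, :]).ravel()
    log_marg = log_mixture(y[:, None] - all_points[None, :])        # log p(y)
    return float(np.mean(log_cond - log_marg) / np.log(2.0))

if __name__ == "__main__":
    # Binary signaling without interference at high SNR carries close to 1 bit.
    print(mc_mutual_information([-1.0, 1.0], [0.0], snr=100.0))
```

Swapping the roles of `desired` and `interference` gives the corresponding estimate for the other user's signal at the same receiver.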
A. Two-User Case
In Fig. 4, we consider $\mathsf{SNR}_1 = 20$ dB and $\mathsf{SNR}_2 = 10$ dB and show the capacity region (obtained by Gaussian input distributions), the Gaussian time-sharing region (obtained by time-sharing between the two corner points with Gaussian input distributions), the PAM time-sharing region (obtained by time-sharing between two corner points with PAM inputs), the rate pairs achieved by the proposed scheme with SIC, and those achieved without SIC. One observes that the proposed method with SIC can operate at rate pairs very close to the capacity region and that there are many achievable rate pairs (even without SIC) lying outside the Gaussian time-sharing region. This indicates an exciting fact: the proposed simple method can achieve rate pairs that are not achievable by any OMA-type downlink scheme whose capacity region is associated with the Gaussian time-sharing region. Another observation here is that the gap in $R_1$ between the proposed scheme with SIC and that without SIC becomes
Fig. 5. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 30 dB and SNR2 = 10 dB.
Fig. 6. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are SNR1 = 15 dB and SNR2 = 3 dB.
larger as $R_2$ gets larger. This is because a large $R_2$ introduces higher uncertainty at the stronger user (user 1), and SIC removes this uncertainty before user 1's message is decoded. However, for some applications, it may be unrealistic to assume that the strong user always knows the codebook of the weak one; hence, we might have to live with the rate loss in such applications.
We also provide simulation results for the case of $\mathrm{SNR}_1 = 30$ dB and $\mathrm{SNR}_2 = 10$ dB in Fig. 5. Similar observations can be made for this set of parameters. Comparing Fig. 4 and Fig. 5, one can also see a well-known phenomenon: the larger the SNR difference, the higher the gain a NOMA-type scheme can have over time-sharing. Moreover, we simulate the proposed scheme at (relatively) low SNR in Fig. 6, where $\mathrm{SNR}_1 = 15$ dB and $\mathrm{SNR}_2 = 3$ dB, and show that a similar trend can be observed.
Next, we simulate the proposed practical encoder/decoder and plot the achievable rate pairs.$^5$ In all the simulations, we
$^5$We say a rate is achievable if it is $\epsilon$-feasible [4], i.e., this encoder/decoder provides a word error rate smaller than $\epsilon$. We adopt in this paper a widely
TABLE I
PARAMETERS FOR THE PRACTICAL ENCODER/DECODER IN FIG. 4
TABLE II
PARAMETERS FOR THE PRACTICAL ENCODER/DECODER IN FIG. 5
TABLE III
PARAMETERS FOR THE PRACTICAL ENCODER/DECODER IN FIG. 6
fix the codeword length of each turbo code to be 5000 and vary the information length according to the selected code rate. The results are also shown in Fig. 4 (parameters given in Table I), Fig. 5 (parameters given in Table II), and Fig. 6 (parameters given in Table III). Each achievable rate pair is obtained by a sequence of simulations as follows. We start with the target mutual information previously obtained, say $(r_1^*, r_2^*)$, and keep decreasing (with step size 0.005) the rate of user 2 until the simulation result shows that it is achievable (i.e., $\epsilon$-feasible). Call this rate $r_2$. We then start with $(r_1^*, r_2)$ and keep decreasing (with step size 0.005) the rate of user 1 until it is achievable (i.e., $\epsilon$-feasible). We call this rate $r_1$ and say that $(r_1, r_2)$ is achievable.
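The two-stage back-off described above can be sketched as follows. This is only an illustrative outline: `user1_ok` and `user2_ok` are hypothetical callables standing in for the turbo-code simulations that decide whether a user's rate is $\epsilon$-feasible, and the function name `back_off` is likewise an assumption made for this sketch.

```python
from typing import Callable, Tuple

# Hypothetical oracle type: (r1, r2) -> True if the user's word error rate
# is below epsilon at that rate pair (in practice, a turbo-code simulation).
Oracle = Callable[[float, float], bool]


def back_off(r1_star: float, r2_star: float,
             user1_ok: Oracle, user2_ok: Oracle,
             step: float = 0.005) -> Tuple[float, float]:
    """Two-stage rate back-off starting from the target pair (r1*, r2*)."""
    # Stage 1: lower user 2's rate, keeping user 1 at its target rate.
    k2 = 0
    while r2_star - k2 * step > 0 and not user2_ok(r1_star, r2_star - k2 * step):
        k2 += 1
    r2 = max(r2_star - k2 * step, 0.0)

    # Stage 2: with r2 fixed, lower user 1's rate until it is feasible.
    k1 = 0
    while r1_star - k1 * step > 0 and not user1_ok(r1_star - k1 * step, r2):
        k1 += 1
    r1 = max(r1_star - k1 * step, 0.0)
    return r1, r2
```

The rates are computed as the target minus an integer multiple of the step, rather than by repeated subtraction, so that floating-point drift does not accumulate over many steps.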
In the last column of these tables, we also show the $(R_1, R_2)$ gains over the theoretical limit of the OMA-type scheme that employs time-sharing between two single-user PAM inputs. For each entry, two pairs of gains in rate are shown: the first corresponds to the gain in $R_1$ with $R_2$ held fixed, while the second corresponds to the gain in $R_2$ with $R_1$ held fixed. It should be noted that we are comparing the achievable rate pairs of the proposed practical scheme, which has a finite block length, with the theoretical limits of the OMA-type scheme, where an infinite block length is essentially allowed. This comparison is not entirely fair to the proposed scheme; however, we still observe large rate improvements of the proposed scheme over the OMA-type scheme. One can expect a larger gain if we compare the achievable rate tuples to the OMA scheme with a practical block length. The simulation results in these figures and tables corroborate our theoretical analysis and show that in practice, although Gaussian input distributions are more or less forbidden and the assumption of SIC may be
accepted value $\epsilon = 10^{-1}$; however, we have repeated our simulations for $\epsilon = 10^{-2}$ and observed negligible differences.
Fig. 7. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are $\mathrm{SNR}_1 = 30$ dB, $\mathrm{SNR}_2 = 20$ dB, and $\mathrm{SNR}_3 = 10$ dB and we set $R_3 = 0.76$ bits per channel use.
unrealistic, the promised gain of NOMA is still realizable by simple and practical designs. Also, there are many ways to further bridge the gap; for example, by employing more powerful channel codes and/or by choosing a longer block length.
B. Three-User Case
Although we have focused on the two-user case throughout the paper and only discussed the $K$-user case in Appendix A, we provide some simulation results for the three-user case to demonstrate that the proposed scheme can again provide similar gains. We first consider in Fig. 7 the (relatively) high-SNR case where $\mathrm{SNR}_1 = 30$ dB, $\mathrm{SNR}_2 = 20$ dB, and $\mathrm{SNR}_3 = 10$ dB. For ease of exposition, we fix $m_3 = 1$ and enforce $m_1 + m_2 + m_3 = n_1$. For this case, the Monte Carlo simulation provides $R_3 = 0.76$ bits per channel use irrespective of the choice of $m_1$ and $m_2$, because the sum is fixed to $n_1$ and user 3 is the weakest one. For the capacity region shown in Fig. 7, we fix $\beta$, the power allocated to user 3, to be
$$\beta \triangleq \frac{(2^{2R_3} - 1)(\mathrm{SNR}_3 + 1)}{(2^{2R_3} + 1)\,\mathrm{SNR}_3}, \tag{27}$$
such that the capacity bound would be equal to $R_3$ exactly. The capacity region shown in Fig. 7 is then generated by sharing the remaining power $(1 - \beta)$ between users 1 and 2, which corresponds to the slice of the entire capacity region at $R_3 = 0.76$ bits per channel use. Similarly, we plot the capacity region of the OMA-type schemes by fixing the time allocated to user 3 to be
$$\lambda \triangleq \frac{R_3}{\tfrac{1}{2}\log(1 + \mathrm{SNR}_3)}, \tag{28}$$
and sharing the remaining time between users 1 and 2. This again corresponds to the slice of the entire time-sharing region at $R_3 = 0.76$ bits per channel use. Recall that by OMA, we mean the scheme exclusively allocating its resource to a user and then using time-sharing among users. Therefore, for the
Fig. 8. The BC capacity region, the time-sharing region, the region obtained by time-sharing between two single-user PAM schemes, the rate pairs achieved by the proposed scheme with SIC, and the rate pairs achieved by the proposed scheme without SIC. The channel gains are $\mathrm{SNR}_1 = 30$ dB, $\mathrm{SNR}_2 = 15$ dB, and $\mathrm{SNR}_3 = 3$ dB and we set $R_3 = 0.485$ bits per channel use.
TABLE IV
PARAMETERS FOR THE PRACTICAL ENCODER/DECODER IN FIG. 7
three-user case shown in Fig. 7, the OMA-type capacity region is no longer the time-sharing between the corner points of the capacity region shown in Fig. 7. The curve thus created would correspond to power-sharing between user 3 and users 1 and 2, and then time-sharing between users 1 and 2, which is clearly not allowed in OMA.
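As an illustration of how the OMA slice at a fixed $R_3$ can be traced from (28), the following sketch computes $\lambda$ and then splits the remaining time $(1 - \lambda)$ between users 1 and 2. The function names and the choice of base-2 logarithms (so that rates are in bits per channel use) are assumptions made for this sketch; it is not the code used to generate the figures.

```python
import math
from typing import List, Tuple


def awgn_capacity(snr_linear: float) -> float:
    """Single-user real AWGN capacity, 0.5 * log2(1 + SNR), in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr_linear)


def oma_slice(snr_db: Tuple[float, float, float], r3: float,
              num_points: int = 5) -> Tuple[float, List[Tuple[float, float]]]:
    """Return lambda from (28) and (R1, R2) points on the OMA boundary at fixed R3."""
    c1, c2, c3 = (awgn_capacity(10.0 ** (s / 10.0)) for s in snr_db)
    lam = r3 / c3  # fraction of time given exclusively to user 3
    if not 0.0 <= lam <= 1.0:
        raise ValueError("R3 exceeds the single-user capacity of user 3")
    pairs = []
    for i in range(num_points):
        t = i / (num_points - 1)  # share of the remaining time given to user 1
        pairs.append((t * (1.0 - lam) * c1, (1.0 - t) * (1.0 - lam) * c2))
    return lam, pairs
```

For the setting of Fig. 7, one would call `oma_slice((30, 20, 10), 0.76)`; the returned pairs lie on a straight line between the two single-user end points of the slice.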
We now plot the achievable $(R_1, R_2)$ pairs generated by Monte Carlo simulations. For the case where we do not perform SIC at the receivers, neither user 1 nor user 2 attempts to decode the other users' codewords, even though they may be decodable. Thus, one observes that for the case without SIC, both users suffer from a rate loss. Akin to the two-user case, one observes that for this particular $R_3$ there are rate pairs (even without SIC) lying outside the Gaussian time-sharing region. This again shows that the proposed simple method can achieve rate tuples lying outside the capacity region of any OMA-type scheme.
In Fig. 8, we perform the same simulation with (relatively) low SNRs, where $\mathrm{SNR}_1 = 30$ dB, $\mathrm{SNR}_2 = 15$ dB, and $\mathrm{SNR}_3 = 3$ dB. We again generate the capacity region and time-sharing region according to the above methods and plot the rate pairs $(R_1, R_2)$ achievable by the proposed method with $R_3 = 0.485$ bits per channel use fixed. Similar observations can be made, which again shows that the proposed simple method can realize the promised gains of NOMA in both the high- and low-SNR regimes.
We also verify the theoretical results by simulating the proposed practical scheme, similar to what we have done for the two-user case. The parameters can be found in Table IV and Table V. Note that since we adopt PAM constellations, each
TABLE V
PARAMETERS FOR THE PRACTICAL ENCODER/DECODER IN FIG. 8
user will suffer a rate loss relative to the capacity region. In this regard, for all the simulations of the practical schemes shown in Fig. 7, user 3 backs off to operate at $R_3 = 0.69$ bits per channel use, which corresponds to a rate loss of 0.07 bits. For the results in Fig. 8, user 3 backs off to operate at $R_3 = 0.385$ bits per channel use, which corresponds to a rate loss of 0.1 bits.
VI. CONCLUSIONS AND FUTURE WORK
The downlink NOMA system has been studied with a focus on systematically designing practical schemes that can operate close to the fundamental limits. We have first investigated the corresponding linear deterministic model and proposed capacity-achieving input distributions. We have then leveraged the results in the deterministic model to systematically design schemes for the Gaussian case. We have lower bounded the achievable rate regions and have shown that our designs can achieve every point inside the capacity region to within a constant gap even without SIC. Practical encoders/decoders based on the proposed input distributions have been studied, which can realize the promised gains of NOMA over OMA-type schemes.
Some interesting directions for future research are listed in the following. It is interesting to use higher-dimensional constellations, which have better packing and shaping gains, to further shrink the gap to the capacity region. Another direction is to study NOMA with multiple antennas, i.e., multiple-input multiple-output (MIMO), as MIMO has been a "must-have" in almost every modern communication system and will almost certainly be included in future standards. This possibility has recently been studied in [31]–[34]. Also, designing practical coding schemes for uplink NOMA is an interesting problem. One expects the uplink problem to be more challenging than the downlink problem discussed in this paper because, in the uplink, different users suffer from different phases, which may not be easy to compensate simultaneously.
APPENDIX A
PROOF OF THE K-USER CASE
So far, we have always considered the two-user case. In this appendix, we briefly discuss the $K$-user case and show that the results in Section III are generalizable. We again first consider the deterministic model and then investigate the original Gaussian model.
A. The Deterministic Model
For the deterministic model, let $n_1, n_2, \dots, n_K \in \mathbb{N}$ be the numbers of bit pipes available from the transmitter to users $1, 2, \dots, K$, respectively. We assume without loss of generality that $n_1 \ge n_2 \ge \dots \ge n_K$; in this case, $q \triangleq \max\{n_1, \dots, n_K\} = n_1$. Thus, the received signal at user $k$ is given by $Y_k = S^{q - n_k} X$, where $X$ is the transmitted signal. The capacity region of this channel is the closure of the convex hull of $(m_1, \dots, m_K) \in \mathbb{N}^K$ satisfying
\begin{align}
m_K &\le n_K, \tag{29}\\
m_{K-1} + m_K &\le n_{K-1}, \tag{30}\\
&\;\;\vdots \notag\\
m_1 + \dots + m_K &\le n_1. \tag{31}
\end{align}
Consider any particular tuple $(m_1, \dots, m_K) \in \mathbb{N}^K$ satisfying (29)–(31). We proceed with the proof by first combining users $2, 3, \dots, K$ into a super-user with $n_2$ bit pipes available. Let $m_{s,2} = \sum_{k=2}^{K} m_k$ be the total number of bits this super-user demands. The problem then reduces to the two-user broadcast channel with channel parameters $(n_1, n_2)$ and the rate pair $(m_1, m_{s,2})$. Observe that this rate pair lies inside the capacity region of the equivalent two-user broadcast channel; therefore, we can use the results in Section III to show the achievability.
Let $E_1$ and $E_{s,2}$ be the corresponding schemes for generating the capacity-achieving input distributions. Note that since the super-user is the weaker user in this reduced channel, $E_{s,2}$ will always occupy the bit pipes at the top levels (akin to (10)).
We now break this super-user into two sub-users, namely user 2 and the new super-user consisting of users $3, \dots, K$, and let $m_{s,3} = \sum_{k=3}^{K} m_k$ be the total number of bits this new super-user demands. Now the problem reduces to the two-user broadcast channel with channel parameters $(m_{s,2}, \min\{n_3, m_{s,2}\})$ and the rate pair $(m_2, m_{s,3})$. One can verify that the rate pair falls inside the capacity region of this new broadcast channel, and hence one can use the results in Section III to obtain capacity-achieving $E_2$ and $E_{s,3}$. Note that these two schemes will only use the top $m_2 + m_{s,3} = m_{s,2}$ levels, which are exactly the levels $E_{s,2}$ has used. Therefore, one can think of this step as breaking $E_{s,2}$ into $E_2$ and $E_{s,3}$ without affecting $E_1$ (their non-zero rows are disjoint). Again, the super-user is the weaker user here, and hence $E_{s,3}$ will always occupy the bit pipes at the top levels.
From this point, one can set up a recursion as follows. At stage $l$, what we have obtained from the previous stages are $E_1, \dots, E_{l-1}$ and $E_{s,l}$, where $E_{s,l}$ only occupies the top $m_{s,l} \triangleq \sum_{k=l}^{K} m_k$ levels. Now, we look into the equivalent two-user broadcast channel with channel parameters $(m_{s,l}, \min\{n_{l+1}, m_{s,l}\})$ and the rate pair $(m_l, m_{s,l+1})$, where $m_{s,l+1} \triangleq \sum_{k=l+1}^{K} m_k$. This again falls inside the capacity region, and the results in Section III can be applied to create $E_l$ and $E_{s,l+1}$ out of $E_{s,l}$ without affecting $E_1, \dots, E_{l-1}$. The resulting $E_{s,l+1}$ will correspond to the bit pipes at the top $m_{s,l+1}$ levels. After completing stage $K - 1$, we obtain $E_1, \dots, E_{K-1}, E_K (= E_{s,K})$ that can achieve this particular set of rates $(m_1, \dots, m_K) \in \mathbb{N}^K$. Note that this argument works for any such tuple satisfying (29)–(31); hence, we have completed the proof for the deterministic model.
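The outcome of the recursion can be illustrated with a small sketch. It checks the constraints (29)–(31) and then assigns bit levels under the simplifying assumption that each $E_k$ selects a contiguous block of levels, with the weakest user on top, which is consistent with the scaling used in (33); the actual matrices of Section III may differ. The function names are hypothetical.

```python
from typing import List, Sequence


def in_capacity_region(n: Sequence[int], m: Sequence[int]) -> bool:
    """Check (29)-(31): m_k + ... + m_K <= n_k for every user k."""
    return all(sum(m[k:]) <= n[k] for k in range(len(n)))


def stacked_levels(n: Sequence[int], m: Sequence[int]) -> List[range]:
    """Assign to each user a contiguous block of levels (level 0 is the top one).

    Assumption for this sketch: user K takes the top m_K levels, user K-1
    the next m_{K-1} levels, and so on down to user 1.
    """
    if list(n) != sorted(n, reverse=True):
        raise ValueError("expected n_1 >= n_2 >= ... >= n_K")
    if not in_capacity_region(n, m):
        raise ValueError("rate tuple violates (29)-(31)")
    levels: List[range] = [range(0)] * len(n)
    top = 0
    for k in reversed(range(len(n))):  # weakest user first
        levels[k] = range(top, top + m[k])
        top += m[k]
    return levels
```

Under this assignment, the block of user $k$ ends at level $\sum_{l=k}^{K} m_l \le n_k$, so it lies entirely within the top $n_k$ levels that receiver $k$ observes.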
B. The Gaussian Channel
For the Gaussian model, we first note that the proof for the two-user case hinges only on the fact that the part of the signal above the noise level (i.e., not shifted out in the deterministic model) at each receiver contains the desired signal and is uniformly distributed over a discrete constellation with a non-vanishing minimum distance. This is also true for the $K$-user case, as Lemma 1 can be applied if the above condition holds.
Similar to the two-user case in Section III, we again directly translate the results in the deterministic model to the Gaussian case by translating each full-rank matrix in the deterministic model into a uniform distribution over a PAM constellation with cardinality and power level guided by the deterministic model. Let $n_k \triangleq \left[\tfrac{1}{2}\log(\mathrm{SNR}_k)\right]^{+}$ and let $(m_1, \dots, m_K) \in \mathbb{N}^K$ be a tuple satisfying (29)–(31). We consider only the case where $m_k > 0$ and $\mathrm{SNR}_k \ge 2$ for all $k \in \{1, \dots, K\}$; therefore, the $+$ sign in the definition of $n_k$ can be dropped.
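Let
$$\gamma \triangleq \sqrt{\frac{12}{2^{2\left(n_1 - \sum_{k=1}^{K} m_k\right)}\left(2^{2\sum_{k=1}^{K} m_k} - 1\right)}} = 2^{-n_1}\sqrt{\frac{12}{1 - 2^{-2\sum_{k=1}^{K} m_k}}}, \tag{32}$$
and define the following discrete random variables
$$X_k = 2^{\,n_1 - \sum_{l=k}^{K} m_l}\, F_k, \quad k \in \{1, \dots, K\}, \tag{33}$$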
where $F_k$ is uniformly distributed over a unit-distance PAM constellation $\mathcal{A}_k$ with cardinality $2^{m_k}$. The transmitted signal is then given by
$$X = \gamma\,(X_1 + X_2 + \dots + X_K),$$
where $X_k$ is the signal intended for user $k$. One can verify that $X$ is uniformly distributed over a PAM constellation with cardinality $2^{\sum_{k=1}^{K} m_k}$ and minimum distance $\gamma \cdot 2^{\,n_1 - \sum_{k=1}^{K} m_k}$. Also, one can then verify that with this choice $\mathbb{E}[X^2] \le 1$; hence the power constraint is satisfied.
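These properties can be checked numerically for small parameters. The sketch below builds the superposed constellation from (32) and (33), assuming that each unit-distance PAM constellation is centered at zero (an assumption made here so that the average power matches the closed form); the function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def pam(cardinality: int) -> List[float]:
    """Unit-distance PAM points, assumed centered at zero."""
    return [i - (cardinality - 1) / 2.0 for i in range(cardinality)]


def superposed_constellation(n1: int, m: Sequence[int]) -> Tuple[float, List[float]]:
    """Return gamma from (32) and the sorted support of X = gamma * (X_1 + ... + X_K)."""
    total = sum(m)
    gamma = 2.0 ** (-n1) * math.sqrt(12.0 / (1.0 - 2.0 ** (-2 * total)))
    # Scale of user k's PAM per (33): 2^(n1 - sum_{l >= k} m_l).
    scales = [2.0 ** (n1 - sum(m[k:])) for k in range(len(m))]
    points = sorted(
        gamma * sum(s * f for s, f in zip(scales, combo))
        for combo in itertools.product(*(pam(2 ** mk) for mk in m))
    )
    return gamma, points


def average_power(points: Sequence[float]) -> float:
    """Second moment of a uniform distribution over the given points."""
    return sum(p * p for p in points) / len(points)
```

For example, with `n1 = 4` and `m = [2, 1, 1]`, the support has 16 equally spaced points and unit average power, in agreement with the statements above.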
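With this input distribution, the received signal at the $k$th user is given by
\begin{align}
Y_k &= \sqrt{\mathrm{SNR}_k}\, X + Z_k \notag\\
&= \sqrt{\mathrm{SNR}_k}\, \gamma\,(X_1 + X_2 + \dots + X_K) + Z_k \notag\\
&= \sqrt{\mathrm{SNR}_k}\, \gamma\,(X_{a,k} + X_{b,k}) + Z_k, \tag{34}
\end{align}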
where we have used $X_{a,k}$ and $X_{b,k}$ to denote the signals corresponding to the parts above and below the noise level (depending on whether or not they get shifted out in the deterministic model) at the $k$th user, respectively. Note that $X_k \subseteq X_{a,k}$ by construction.