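Graduate Institute of Communication Engineering, College of Electrical Engineering and Computer Science, National Taiwan University, Master Thesis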
Department of Electrical Engineering
College of Electrical Engineering and Computer Science National Taiwan University
Master Thesis
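Using Reinforcement Learning to Select Modulation and Coding Schemes in Multi-Antenna Non-Orthogonal Multiple Access Systems with Limited Feedback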
On Using Reinforcement Learning to Select Modulation/Coding Schemes for Non-Orthogonal Multiple Access in Multi-User Multiple-Input Multiple-Output Systems with Limited Feedback
楊雅涵 Ya-Han Yang
Advisor: Hung-Yun Hsieh (謝宏昀), Ph.D.
February 2019 (Republic of China year 108)
doi:10.6342/NTU201900453
Acknowledgements
My master's years passed in a flash, and what I gained far exceeded anything I could have imagined before I began.
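First, I would like to thank my advisor, Professor Hung-Yun Hsieh (謝宏昀), who always pointed out my blind spots at the right moment. "Be bold in hypothesis, careful in verification," a saying I have known since childhood, took on a deeper meaning during my master's studies. His careful guidance gradually cultivated more rigorous and comprehensive thinking in me and taught me how to explore a new field systematically, which I will benefit from for the rest of my life. I also thank him for giving me the opportunity to go on an exchange to Germany, where I met students from many countries and experienced different cultures.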
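Next, I thank my lab mates for all their warmth. Senior lab mates 昀樸 and 群雄 provided a great deal of help with network administration. Senior lab mates 冠全 and 閎琛 not only often brought joy to the lab but were also always generous in sharing their experience; even after graduating they kept caring about everyone, like the warm heads of the lab family. Senior lab mate Eason gave me much guidance on communications when I first joined the lab, which helped me get up to speed quickly. 昀庭 was a great companion all along the way, helping and discussing research with me, chatting with and looking after me in daily life, and adding a lot of fun to my graduate life. 王鈞 prepared for the exchange program together with me, and it was always interesting to hear him share his experiences in Japan. I admire 俊伸's implementation skills; he was my partner when I first started learning reinforcement learning. I thank the junior members of the lab for all their help during my oral defense, including remote operation; they deserve much of the credit for its going smoothly. I will miss our times singing KTV; everyone is so good at rap that I feel I can barely keep up with the times.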
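I thank my friends for their encouragement and support. I am lucky to have a group of good friends who were always willing to listen to my complaints and, amid the banter, never forgot to help me find solutions. You were like timely rain, always ready to lend a hand, and you enriched my life. I believe we will keep supporting each other in the days to come.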
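Finally, I am most grateful to my family for their unconditional support, which freed me from worry throughout my studies. They were always by my side when I faced difficulties, and every time I came home and had a meal with them, it felt like a brand-new day. In the years ahead, it is my turn to be the family's support.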
Last of all, thank you to those who have passed away, for the love that gave me the strength to face challenges.
Abstract (Chinese version)
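Many studies focus on combining multi-user multiple-input multiple-output (MU-MIMO) and non-orthogonal multiple access (NOMA) to increase throughput, but most of them do not consider that channel information feedback is limited in practical LTE/LTE-A environments. According to our analysis, the quantization error caused by limited channel feedback leads to improper resource allocation; therefore, when combining these two techniques, a method for obtaining an accurate signal-to-interference-plus-noise ratio (SINR) must be considered. To avoid changing the current LTE-A channel feedback specification, this thesis adopts outer loop link adaption (OLLA), which uses hybrid automatic repeat request (HARQ) feedback to dynamically adjust the estimated SINR. Because LTE-A connections are short, the convergence speed of OLLA is an important issue. Existing OLLA schemes mostly consider only the channel quality indicator (CQI), but in multi-antenna systems the precoding matrix indicator (PMI) and the interference from different scheduling combinations must also be taken into account. In this thesis, we use reinforcement learning to improve OLLA. Reinforcement learning interacts with the environment automatically and can discover strategies that are not limited by existing knowledge. However, when reinforcement learning is applied to a new field, understanding the problem is key to effective training; we therefore analyze which factors affect the OLLA strategy. Based on this analysis, we consider the information of scheduled users, the channel feedback, the preferred modulation and coding scheme, and the co-scheduled users to design suitable feature extraction, reward shaping, and exploration and exploitation mechanisms, and we experiment with various training parameters to provide a more efficient training architecture. Compared with the OLLA baseline, the proposed method achieves a 7% gain in NOMA+MU-MIMO and a 14% gain in MU-MIMO. In addition, the convergence speed of OLLA increases by 38%. In summary, we propose an architecture that automatically improves OLLA, effectively handles interference under different types of feedback, and improves throughput and convergence speed.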
ABSTRACT
Much work has been done to improve the overall throughput by jointly considering MU-MIMO and NOMA, but little has considered combining these techniques in practical LTE/LTE-A environments, where CSI feedback is limited. Based on our analysis, a method capable of obtaining an accurate SINR is important to reduce the improper resource allocation caused by such limited CSI feedback. In this work, to avoid changing the current feedback architecture of LTE-A, we adopt outer loop link adaption (OLLA) to dynamically modify the MCS according to HARQ feedback. Convergence plays a crucial role when applying OLLA in LTE-A because connections are short. Regarding convergence, most existing OLLA schemes consider only CQI, whereas PMI and user pairing should also be taken into account in MU-MIMO. In this work, we adopt reinforcement learning to enhance OLLA. Reinforcement learning is a technique that can explore unknown strategies by interacting with the environment.
When applying reinforcement learning to a new field, domain knowledge is important for effective training. Therefore, the factors affecting the OLLA strategy, including the previously assigned MCS of the scheduled users, the feedback, the desired MCS, and the paired user, are analyzed for state design, reward shaping, and exploration and exploitation. Our proposed method improves the throughput by 7% in NOMA+MU-MIMO and by 14% in MU-MIMO. Moreover, the convergence speed is increased by 38%. In conclusion, we propose an architecture that can enhance OLLA automatically, handle interference, improve throughput, and accelerate convergence under different types of feedback.
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND AND RELATED WORK
2.1 Background
2.1.1 MU-MIMO Overview
2.1.2 NOMA Overview
2.1.3 Reinforcement Learning
2.2 Related Work
2.2.1 NOMA+MU-MIMO
2.2.2 Outer Loop Link Adaption
2.2.3 Machine Learning based Link Adaption
CHAPTER 3 SCENARIO AND PROBLEM FORMULATIONS
3.1 Network Model of NOMA+MU-MIMO
3.2 Communication System in LTE/LTE-A
3.3 Scenario in LTE/LTE-A
3.4 Problems Formulation
3.4.1 Observation of NOMA+MUMIMO and MUMIMO
3.4.2 The Impact of CSI on SINR
3.4.3 Convergence Formulation
3.5 Analysis of the Convergence Formulation
3.5.1 Analysis of the SINR in MU-MIMO
3.5.2 Observation of SINR in different CQI and PMI
CHAPTER 4 PROPOSED REINFORCEMENT LEARNING BASED LINK ADAPTION
4.1 Motivation of Reinforcement Learning
4.2 Train an OLLA Agent based on Reinforcement Learning
4.3 Proposed Mechanism in Communication System
4.3.1 The Design of State, Reward, and Neural Network
4.4 Reinforcement Learning Algorithm
4.4.1 Asynchronous Advantage Actor-Critic Agents (A3C)
4.4.2 Implementation and Modification of A3C
4.5 Proposed Feedback and Scheduler
CHAPTER 5 PERFORMANCE EVALUATION
5.1 Scenario Setting
5.2 Simulation Results
5.2.1 Verify the Design of Reinforcement Learning
5.2.2 Performance in VIENNA
CHAPTER 6 CONCLUSION AND FUTURE WORK
REFERENCES
LIST OF TABLES
1 Notation Table
2 Objective Function
3 Notation Table for Reinforcement Learning
4 Design of s, r, and a
5 PMI Orthogonal Table
6 CQI Parameters
7 Simulation Setting
8 Training Setting
9 List of different Design of State in Fig. 32
10 List of Convergence Steps for different Number of Neurons
LIST OF FIGURES
1 Channel Response for MIMO
2 Demonstration of NOMA
3 Decoding Process of SIC
4 Architecture of Reinforcement Learning
5 Overall System in NOMA+MU-MIMO
6 Operate NOMA+MU-MIMO in N_t × N_r MIMO. Inter-interference indicates the interference caused by the other beams. Intra-interference indicates the interference caused by the other user in the same beam.
7 Operation Diagram in Downlink
8 Deployment of NOMA+MU-MIMO
9 Comparison between MU-MIMO and NOMA+MU-MIMO with correct Estimation of SINR or not
10 Comparison between estimated SINR and real SINR in terms of Throughput. Estimated SINR is the SINR estimated by limited CSI. Real SINR is SINR that UE actually suffers.
11 Relationship of f_i and h_i under perfect CSI
12 Relationship of f_i and h_i under limited CSI
13 Condition that the receivers of far and near user fail or success to decode the signal
14 Real CQI and estimated CQI. Real CQI is calculated by h, the estimated CQI is the CQI returned by UE.
15 Real SINR in different Quantization Error (cosθ) and Interference for different CQI. The dots in the same quantization error are represented as different interferences.
16 Real CQI and estimated CQI. Real CQI is calculated with channel vector h, the estimated CQI is the CQI return by UE. The estimated CQI is calculated following Eq. (3.12), which is a lower-bound in MU-MIMO cases.
17 Training Procedure for the proposed Algorithm
18 Diagram of proposed Mechanism in Communication System
19 Block of 'Modify the estimated CQI with Trained Agent' in Fig. 18 in Detail
20 Comparison of Convergence Steps between different Methods in different Types of Feedbacks
21 Difference between traditional Mapping and proposed Method with good initial Value
22 Architecture of the Neural Network
23 Fully-connected Network
24 Architecture of Neural Network of s, π(s) and vπ(s). The block 'Neural Network' in this thesis is the same as Fig. 22.
25 Implementation of A3C in training Neural Network
26 Process of updating a Neural Network. The gradient descent optimization algorithms in 'Optimizer' in this thesis is RMSprop. The backpropagation needs to compute the derivative of each activate function and the error generated in each layers.
27 Diagram in VIENNA
28 Modified Diagram in VIENNA
29 Real CQI and estimated CQI. Real CQI is calculated with channel h, the estimated CQI is the CQI return by UE. The estimated CQI is calculated following Eq. (4.15), which consider merely the SU-MIMO.
30 Converge-first Scheduler
31 With orthogonal Constraint or not
32 Comparison of States
33 Comparison for each PMI with S1 and S3
34 Training Speed for one fully-connected Network and multiple fully-connected Network for each PMI
35 Training speed of different Neurons and Layers
36 Comparison of Exploration Rules
37 Training speed with different N and R
38 Convergence Steps with different Parameters
39 Training Speed with different N and R with Rule 2
40 Training Speed with different N and R without ε-greedy Algorithm
41 Relationship between Step Size, Convergence Steps, and Ratio of Nack
42 Performance in Traditional Method
43 Comparison of different Parameters of the Baseline
44 Demonstrate how the chosen MCS changes while A_Initial = 1, gamma = 1, and A_offset = 2
45 Comparison of Convergence Steps between different Methods in different types of Feedbacks
46 Performance in different Types of Feedbacks
47 Throughput in different Method
48 Performance in different Methods
49 Demonstration for each OLLA Method
50 Relationship between R_Nack, Convergence Steps, and Ratio of Nack
51 Demonstration of R_Nack = 0 and R_Nack = 6
52 Impact of R_Nack on Performance
53 Impact of R_Nack on Performance in converge-first Scheduler
54 The trend of each metrics varies with the number of retransmissions.
55 Comparison between original Method and proposed Method with Constraint of Retransmission=0 in NOMA+MU-MIMO
CHAPTER 1
INTRODUCTION
With the growth of wireless mobile devices, the increasing demand for wireless mobile connections has become an important issue. Thus, multiple access technologies have received a great deal of interest over the past years. These techniques allow multiple users to share the same wireless medium, which increases spectral efficiency.
Following this trend, the 3rd Generation Partnership Project (3GPP) standardized the radio interface specifications of LTE/LTE-A to enhance the performance of Multiuser Superposition Transmission (MUST). Considerable attention has been paid to Multiple Input Multiple Output (MIMO) and Non-Orthogonal Multiple Access (NOMA). MIMO, in which both the transmitter and the receiver are equipped with multiple antennas, exploits spatial multiplexing, transmit diversity, and beamforming to achieve a higher peak rate.
Multiuser MIMO (MU-MIMO) is a MIMO technique that allows multiple users to share the same resource block through beamforming. In this thesis, the term NOMA refers to power-domain non-orthogonal multiple access (PD-NOMA), another promising multiple access technique. It transmits multiple messages on the same resource by assigning them different power levels, and the superimposed signals are decoded by a receiver that performs successive interference cancellation (SIC). These characteristics allow NOMA to improve both the peak rate and fairness. Since the two techniques have each shown their own advantages and exploit different domains, it is natural to expect better performance when they are combined. Therefore, this thesis investigates the problems and challenges of combining MU-MIMO with NOMA in a practical LTE/LTE-A environment.
Although MU-MIMO can increase the data rate through proper precoding and scheduling, further improvement is limited in practice by the limited channel state information (CSI). The quantization error caused by limited feedback leads to inaccurate estimation [1], which is one of the major sources of performance loss. [2] discussed the impact of limited feedback on MU-MIMO. [3] proposed a lower bound on the expected CQI under limited feedback to avoid overestimating the MCS; this method ensures reliability but sacrifices the chance to fully utilize the channel capacity. [4] pointed out that NOMA can increase both the cell average throughput and fairness. Nevertheless, NOMA with SIC also raises new problems not seen in conventional communication systems, such as user pairing and power allocation [5, 6].
Only recently have studies on combining NOMA and MU-MIMO under perfect CSI been published [7, 8]. Combining techniques that utilize the power and spatial domains in the same resource block can further improve spectral efficiency. However, to the best of our knowledge, little investigation has been done on combining NOMA and MU-MIMO in practical communication environments. Many studies assume that the base station has full knowledge of the channel, an assumption that does not hold in the real world; we therefore study the combination of NOMA and MU-MIMO in a practical communication environment.
According to our investigation, the quantization error induced by limited feedback is the major reason why the throughput of NOMA+MU-MIMO falls below expectations. The quantization error results in inaccurate estimation of the signal to interference plus noise ratio (SINR), so the MCS selection and power allocation cannot be decided correctly. Approaches that address the scheduling performance loss caused by CSI impairments have been widely studied. Some papers proposed dynamically adjusting the link adaption based on acknowledgment (ACK) and negative acknowledgment (NACK) feedback, known as outer loop link adaption (OLLA) [9, 10]. OLLA aims to deal with CQI reporting inaccuracy and compensates for the performance loss to some extent [1]. [11] pointed out that convergence is crucial for the performance of OLLA because connections in LTE are short; thus, several studies on OLLA focused on increasing the convergence speed. [12, 13] improved convergence through an analysis of the SINR-to-BLER model. [14] changed the step size based on sequential hypothesis testing and proposed a BLER estimator. [15] proposed a method in which the step size depends on the elapsed time. Nevertheless, none of these methods can be readily applied to different scenarios, because their performance depends highly on the characteristics of the channel model: when the channel model changes, the mathematical model has to be chosen and analyzed again, which is laborious. With the growing complexity of communication environments, it is difficult to find the relationships among a huge number of parameters and to tune the coefficients for various environments. Thus, a more flexible way to find the relationships among complicated parameters is needed, and attention to machine learning based link adaption is rising. Machine learning is known for its capability to capture complicated relationships between parameters. [16, 17] have shown that machine learning techniques can capture the complicated effects of the environment to improve performance. In addition, previous research on OLLA convergence does not take the convergence speed as the objective function; the proposed methods are based on observation. Reinforcement learning, one of the machine learning techniques, is known for exploring unknown strategies through interaction with the environment when the target is clear. [18–20] used reinforcement learning to select the MCS.
Although several existing approaches for dynamic link adaption have shown positive results under inaccurate estimation, none of them faced reporting inaccuracy as severe as that in NOMA+MU-MIMO, where there are multiple sources of interference. There are two types of interference in this scenario. Inter-beam interference is caused by MU-MIMO: the projections from the precoding vectors of the other beams can deteriorate the transmission quality. Intra-beam interference is caused by NOMA: users sharing the same precoding vector but with different power allocations can cause decoding failures if the user pairing or the power allocation is inappropriate. Thus, the degradation caused by inaccurate SINR estimation becomes more significant than ever when NOMA and MU-MIMO are combined.
In short, to explore the potential of combining NOMA and MU-MIMO while respecting the practical LTE/LTE-A environment and without modifying the LTE standard, it is necessary to improve the accuracy of the estimated SINR without additional feedback and to handle the complexity of the NOMA+MU-MIMO scenario. In this thesis, we propose a reinforcement learning based dynamic link adaption, in which the selected MCS is modified based on ACK and NACK feedback. Finding the optimal strategy for modifying the MCS as quickly as possible is the central problem of this thesis. The optimal strategy depends on the channel response and on the past MCS assignments of the scheduled users; however, existing papers neither discuss the optimal strategy for MCS selection nor exploit this past knowledge. We therefore take all these factors into consideration to achieve further performance gains. We use the Vienna simulator [21] to simulate practical LTE-A environments, and we design a suitable reinforcement learning model to shorten the training time and to speed up convergence toward an appropriate SINR estimate. Furthermore, based on our investigation, we suggest using SU-MIMO feedback with OLLA. The impact of constraining the number of retransmissions in the scheduler is also presented in this thesis.
The remainder of this thesis is organized as follows. Chapter 2 introduces the background of NOMA, MU-MIMO, and reinforcement learning, together with related work on dynamic link adaption. Chapter 3 describes the system model, formulates the problem, and analyzes it. Chapter 4 explains the motivation for using reinforcement learning and proposes the reinforcement learning based link adaption. Chapter 5 presents the results, and Chapter 6 concludes the thesis.
CHAPTER 2
BACKGROUND AND RELATED WORK
2.1 Background
2.1.1 MU-MIMO Overview
MU-MIMO is one of the applications of MIMO. It exploits spatial multiplexing to transmit multiple data streams to multiple users.
As illustrated in Fig. 1, with multiple antennas, each pair of transmit and receive antennas has its own channel response. When the transmitted signal $y$ passes through the channel response matrix $H$, the received signal is

$$x = H y.$$
Figure 1: Channel Response for MIMO (channel coefficients $h_{1,1}, \ldots, h_{m,n}$ between the transmit and receive antennas)
It can be seen that the received signal depends on the channel response. Thus, many techniques have been proposed that utilize the characteristics of the channel to increase the data rate or the robustness of data transmission.
Beamforming is a signal processing technique that exploits the characteristics of MIMO. It enhances the desired signals and suppresses interference under suitable conditions. With beamforming, the transmitter precodes the signal before transmission. Let $S = [s_1, \ldots, s_K]$ be the messages that the transmitter intends to transmit. The transmitted signal is then $y = F S$, where $F$ is the precoding matrix.
Precoding matrix selection has been widely studied [22–25]. In general, the choice of precoding vectors is closely tied to the constraints of the communication standard, such as the capacity of the control signaling.
In this thesis, we adopt zero-forcing (ZF) beamforming. Let the row vector $h_k$ be the channel response of UE$_k$, and let $H = [h_1^T, \ldots, h_K^T]^T$ stack these rows. With ZF precoding, the precoding matrix is

$$F = H^H (H H^H)^{-1}. \tag{2.1}$$

Let $F = [f_1, f_2, \ldots, f_K]$, where the column $f_k$ is the precoding vector of UE$_k$. Since $HF = I$, (2.1) implies

$$h_j f_i = 0, \quad \text{if } i \neq j. \tag{2.2}$$

That is, when (2.2) holds, UE$_k$ receives only the signal encoded with $f_k$, and the other signals are suppressed.
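To make (2.1) and (2.2) concrete, here is a minimal sketch in plain Python, assuming each $h_k$ is a 1 × M complex row vector stacked into the K × M matrix $H$. The helper names (`zf_precoder`, `mat_inv`, and so on) and the channel values are hypothetical and only illustrate the algebra; the script checks that $h_j f_i$ vanishes for $i \neq j$.

```python
from typing import List

Matrix = List[List[complex]]


def conj_transpose(A: Matrix) -> Matrix:
    """Hermitian transpose A^H."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_inv(A: Matrix) -> Matrix:
    """Inverse of a square complex matrix via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [1.0 + 0j if i == j else 0j for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def zf_precoder(H: Matrix) -> Matrix:
    """Zero-forcing precoder F = H^H (H H^H)^{-1}; column k of F is f_k."""
    Hh = conj_transpose(H)
    return matmul(Hh, mat_inv(matmul(H, Hh)))


# Hypothetical example: K = 2 users, M = 3 transmit antennas.
H = [[1 + 0.5j, 0.2 - 1j, 0.3 + 0j],
     [0.4 + 0j, 1 + 1j, -0.7 + 0.2j]]
F = zf_precoder(H)
HF = matmul(H, F)  # entry (j, i) equals h_j f_i
for j in range(2):
    print([round(abs(v), 6) for v in HF[j]])
```

The product $HF$ is the identity up to rounding, so each user sees only its own stream. The sketch does not normalize $F$ to a power constraint, which a real precoder would also need.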
According to the 3GPP E-UTRA long-term evolution (LTE) standard, receivers can obtain the precoding information from the demodulation reference signal (DM-RS), so they do not need explicit precoding information from the transmitter. This improves the performance of MU-MIMO because the base station has more freedom in choosing the precoding matrix. In theory, as long as the feedback is perfect and the precoding matrix can be chosen arbitrarily, zero-forcing can eliminate the interference. In practice, however, the feedback is limited. In LTE/LTE-A, the UE returns a precoding matrix indicator (PMI) that indicates the channel direction by selecting an entry from a codebook $\mathcal{C}$,
$$\mathcal{C} = \{c_1, \ldots, c_{|\mathcal{C}|}\},$$

where each $c_j$ is a unit-norm vector. The UE chooses the codeword closest to the direction of its channel response as its PMI, denoted by $\hat{h}$:

$$\hat{h} = \arg\max_{c_j \in \mathcal{C}} |h\, c_j^H|. \tag{2.3}$$

Under limited feedback, the ZF precoding matrix is computed from the quantized channels,

$$F = \hat{H}^H (\hat{H} \hat{H}^H)^{-1}, \tag{2.4}$$

where $\hat{H} = [\hat{h}_1^T, \ldots, \hat{h}_K^T]^T$ and $F = [f_1, \ldots, f_K]$, so that $\hat{h}_j f_i = 0$ for $i \neq j$. However, (2.2) no longer holds for the true channels. Let $\hat{h}_k = \tilde{h}_k + e_k$, where $\tilde{h}_k = h_k / \|h_k\|$ is the normalized channel and $e_k$ is the quantization error. Then, for $i \neq j$,

$$|\tilde{h}_j f_i| = |(\hat{h}_j - e_j) f_i| = |e_j f_i|. \tag{2.5}$$

Hence the message $s_i$, encoded with $f_i$, is not perfectly suppressed after passing through $h_j$, and UE$_j$ receives part of the messages intended for other UEs as unexpected interference. Furthermore, this makes it difficult for both UEs and base stations to estimate the SINR accurately.
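The codebook search in (2.3) and the resulting quantization angle can be sketched as follows, assuming a small hypothetical codebook of unit-norm vectors (not the LTE codebook) and the row-vector convention used above. The function returns the selected index together with $\cos^2\theta = |\tilde{h}\,\hat{h}^H|^2$; the remaining $\sin^2\theta$ is the share of the channel that a ZF precoder designed on $\hat{h}$ cannot null, which is the source of the leakage in (2.5).

```python
import math
from typing import List, Sequence, Tuple


def inner(a: Sequence[complex], b: Sequence[complex]) -> complex:
    """a b^H for row vectors a and b."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def norm(a: Sequence[complex]) -> float:
    return math.sqrt(sum(abs(x) ** 2 for x in a))


def select_pmi(h: Sequence[complex], codebook: List[Sequence[complex]]) -> Tuple[int, float]:
    """Return (index of chosen codeword, cos^2(theta)) following (2.3).

    Codewords are assumed unit-norm, so |h c_j^H| / ||h|| is |cos| of the angle to c_j.
    """
    best = max(range(len(codebook)), key=lambda j: abs(inner(h, codebook[j])))
    cos2 = (abs(inner(h, codebook[best])) / norm(h)) ** 2
    return best, cos2


# Hypothetical 2-antenna codebook with four unit-norm codewords.
s = 1 / math.sqrt(2)
codebook = [[s, s], [s, -s], [s, 1j * s], [s, -1j * s]]
h = [1.0 + 0j, 0.8 + 0.1j]
idx, cos2 = select_pmi(h, codebook)
print(idx, round(cos2, 4), "sin^2(theta) =", round(1 - cos2, 4))
```

Only the index `idx` would be fed back; the base station never sees $\cos^2\theta$, which is why the SINR it infers from the PMI alone is uncertain.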
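In this thesis, the method in [3] is adopted: the UEs report a lower bound of the SINR derived under the following assumptions. According to [3], $\mathbb{E}[|\tilde{e}_k \tilde{f}_i|^2] = 1/(M-1)$ for $i \neq k$, based on a beta distribution, where $\tilde{e}_k$ and $\tilde{f}_i$ denote the normalized quantization error and precoding vector and $M$ is the number of transmit antennas. Moreover, $|e_k \tilde{f}_k|$ is approximated as 0 under the assumption that the scheduled UEs are nearly orthogonal. Following these assumptions, with $S$ the set of scheduled UEs and $\theta_k$ the angle between $h_k$ and $\hat{h}_k$, the expected SINR of UE$_k$ satisfies

$$
\begin{aligned}
\mathbb{E}[\mathrm{SINR}_k] &\geq \frac{\frac{P}{|S|}\|h_k\|^2 \left|(\tilde{h}_k \hat{h}_k^H)(\hat{h}_k \tilde{f}_k) + e_k \tilde{f}_k\right|^2}{1 + \frac{P}{|S|}\|h_k\|^2 \sin^2\theta_k \, \mathbb{E}\left[\sum_{i \in S \setminus k} |\tilde{e}_k \tilde{f}_i|^2\right]} \\
&= \frac{\frac{P}{|S|}\|h_k\|^2 \left|(\tilde{h}_k \hat{h}_k^H)(\hat{h}_k \tilde{f}_k) + e_k \tilde{f}_k\right|^2}{1 + \frac{P}{|S|}\frac{|S|-1}{M-1}\|h_k\|^2 \sin^2\theta_k} \\
&\approx \frac{\frac{P}{|S|}\|h_k\|^2 \cos^2\theta_k}{1 + \frac{P}{|S|}\frac{|S|-1}{M-1}\|h_k\|^2 \sin^2\theta_k}.
\end{aligned}
\tag{2.6}
$$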
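The factor $1/(M-1)$ in (2.6) can be checked numerically. The following Monte Carlo sketch assumes that $\tilde{e}_k$ and $\tilde{f}_i$ behave like independent isotropic unit vectors in the $(M-1)$-dimensional subspace orthogonal to $\hat{h}_k$, modeled directly in $\mathbb{C}^{M-1}$; the function names, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random


def random_unit_vector(n: int, rng: random.Random) -> list:
    """Isotropic unit vector in C^n (normalized i.i.d. complex Gaussian entries)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    scale = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / scale for x in v]


def mean_leakage(M: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[|e f^H|^2] for independent isotropic unit vectors in C^(M-1)."""
    rng = random.Random(seed)
    n = M - 1
    total = 0.0
    for _ in range(trials):
        e = random_unit_vector(n, rng)
        f = random_unit_vector(n, rng)
        total += abs(sum(a * b.conjugate() for a, b in zip(e, f))) ** 2
    return total / trials


for M in (3, 4, 8):
    print(M, round(mean_leakage(M), 3), "vs 1/(M-1) =", round(1 / (M - 1), 3))
```

Summing $|S| - 1$ such terms gives the factor $\frac{|S|-1}{M-1}$ that replaces the expectation in the denominator of (2.6).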
The base station uses this lower bound for scheduling to prevent overestimation, which may cause failed transmissions. In practice, the UE returns

$$g(h_k) = \frac{\frac{P}{M}\|h_k\|^2 \cos^2\theta_k}{1 + \frac{P}{M}\|h_k\|^2 \sin^2\theta_k}, \tag{2.7}$$

which is the last line of (2.6) evaluated with $|S| = M$. Based on the returned $g(h_k)$, the base station estimates the total lower bound as

$$G(h_k) = \frac{M}{|S|}\|f_k\|^2 g(h_k). \tag{2.8}$$
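The chain (2.6) to (2.8) can be written as three small functions. This is a sketch under the assumptions stated in the text (equal power $P/|S|$ per scheduled UE, $M \geq 2$ transmit antennas); the function names and numeric values are hypothetical and are not taken from the Vienna simulator's code.

```python
import math


def sinr_lower_bound(P: float, num_scheduled: int, M: int, h_norm2: float, cos2_theta: float) -> float:
    """Last line of (2.6); assumes M >= 2 and equal power P/|S| per scheduled UE."""
    sin2_theta = 1.0 - cos2_theta
    p = P / num_scheduled
    leakage = (num_scheduled - 1) / (M - 1)
    return p * h_norm2 * cos2_theta / (1.0 + p * leakage * h_norm2 * sin2_theta)


def ue_report(P: float, M: int, h_norm2: float, cos2_theta: float) -> float:
    """g(h_k) in (2.7), i.e. (2.6) evaluated with |S| = M."""
    sin2_theta = 1.0 - cos2_theta
    return (P / M) * h_norm2 * cos2_theta / (1.0 + (P / M) * h_norm2 * sin2_theta)


def bs_estimate(g: float, M: int, num_scheduled: int, f_norm2: float) -> float:
    """G(h_k) in (2.8): rescale the report by M/|S| and the precoder power ||f_k||^2."""
    return (M / num_scheduled) * f_norm2 * g


g = ue_report(P=4.0, M=4, h_norm2=2.0, cos2_theta=0.9)
G = bs_estimate(g, M=4, num_scheduled=2, f_norm2=1.0)
print(round(10 * math.log10(g), 2), "dB", round(10 * math.log10(G), 2), "dB")
```

Because the UE evaluates the bound with $|S| = M$, its report is the most pessimistic case; the rescaling in (2.8) lets the base station account for the actual number of scheduled UEs.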
2.1.2 NOMA Overview
NOMA is one of the promising techniques for the next generation of communication systems. It meets the increasing demand for wireless mobile connections and improves both overall system throughput and fairness at the same time. Cell-edge users in particular benefit from NOMA, which improves cell-edge throughput significantly [6]. NOMA allocates multiple data streams to the same resource block (RB), as illustrated in Fig. 2.
Figure 2: Demonstration of NOMA (axes: RB (frequency/time) and power)
The base station uses superposition coding to transmit multiple data streams. The transmitted signal with superposition is

$$y = a_1 s_1 + \cdots + a_K s_K, \tag{2.9}$$

where $a_k$ is the power factor of UE$_k$, $p_k$ is the power allocated to UE$_k$ with total power $P = \sum_{k \in K} p_k$, and $s_k$ is the signal intended for UE$_k$. The received signal of UE$_k$ is

$$x = h_k (a_1 s_1 + \cdots + a_K s_K). \tag{2.10}$$

Superimposed signals can be separated by dirty paper coding (DPC) at the transmitter or by SIC at the receiver; this thesis focuses on NOMA with SIC receivers. The decoding process for two users is shown in Fig. 3. The SIC receiver iteratively decodes the stronger but undesired signal and subtracts it from the received signal, and it repeats this until it can decode its own message.

Assuming $|h_1| > \cdots > |h_K|$, the base station allocates power following the criterion [26]

$$a_1 < \cdots < a_K,$$

so that users with weaker channels receive more power. The throughput of UE$_k$ is

$$R_k = \sum W \log_2\left(1 + \frac{h_k p_k}{h_k \sum_{k' < k} p_{k'} + W N_k}\right). \tag{2.11}$$
Figure 3: Decoding Process of SIC (the receiver demodulates and decodes the interfering signal, regenerates $s_2$ by modulation and coding, subtracts its contribution, and demodulates the remaining $h_1 y_1 + n$ to obtain $s_1$)
In this thesis, we consider NOMA with $K = 2$. The user with the smaller channel gain is called the far user, and the user with the stronger signal is called the near user. SIC is performed only at the near user; the far user does not perform SIC and treats the weak signal of the near user as interference. Since successful NOMA decoding depends heavily on the difference in signal strength between the users, it is no surprise that the gain difference between paired users is an important issue when operating NOMA.
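The SIC order described above can be illustrated with a toy, noiseless sketch. It assumes BPSK symbols, real-valued channel gains, and amplitude factors $a_k = \sqrt{p_k}$; these choices are for illustration only, since the thesis does not fix a modulation at this point. The near user first detects the far user's higher-power symbol, subtracts it, and then detects its own symbol, while the far user simply detects its own symbol and treats the near user's signal as interference.

```python
import math


def bpsk_detect(x: float) -> int:
    return 1 if x >= 0 else -1


def near_user_sic(received: float, h: float, p_far: float, p_near: float) -> int:
    """Decode the far user's (stronger) symbol, cancel it, then decode the own symbol."""
    s_far_hat = bpsk_detect(received)  # the far user's symbol dominates the superposition
    residual = received - h * math.sqrt(p_far) * s_far_hat
    return bpsk_detect(residual / (h * math.sqrt(p_near)))


def far_user_decode(received: float) -> int:
    """Far user: no SIC; the near user's weaker signal is treated as interference."""
    return bpsk_detect(received)


# Hypothetical power split: the far user gets the larger share.
p_near, p_far = 0.2, 0.8
h_near, h_far = 1.5, 0.5
results = []
for s_near in (1, -1):
    for s_far in (1, -1):
        y = math.sqrt(p_near) * s_near + math.sqrt(p_far) * s_far  # superposition as in (2.9)
        x_near, x_far = h_near * y, h_far * y                       # noiseless reception as in (2.10)
        results.append((near_user_sic(x_near, h_near, p_far, p_near) == s_near,
                        far_user_decode(x_far) == s_far))
print(all(a and b for a, b in results))
```

With noise, the first detection step at the near user can fail, and the error then propagates to its own symbol; this is one reason why the power split and the gain difference between the paired users matter.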
The impact of the gain difference between users on overall throughput has been shown in [4]. Assume that the total power is $P$, the power of the near user is $\alpha P$, and the power of the far user is $(1 - \alpha)P$. Then

$$R_{\mathrm{near}}(\alpha) = \sum W \log_2\left(1 + \frac{h_{\mathrm{near}}\, \alpha P}{W N_k}\right),$$

$$R_{\mathrm{far}}(\alpha) = \sum W \log_2\left(1 + \frac{h_{\mathrm{far}} (1 - \alpha) P}{\alpha P h_{\mathrm{far}} + W N_k}\right),$$

and the total rate is $R_{\mathrm{sum}}(\alpha) = R_{\mathrm{near}}(\alpha) + R_{\mathrm{far}}(\alpha)$. Power allocation is therefore an important issue for maximizing the throughput [4, 6].
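A minimal sketch of the two-user rate expressions for a single RB follows, with hypothetical normalized values ($W = N = 1$, so rates are in bit/s/Hz) and $\alpha$ denoting the near user's share of $P$ as in the rate expressions. It shows the trade-off behind the power allocation problem: increasing $\alpha$ raises $R_{\mathrm{near}}$ and lowers $R_{\mathrm{far}}$.

```python
import math
from typing import Tuple


def noma_two_user_rates(alpha: float, P: float, h_near: float, h_far: float,
                        W: float, N: float) -> Tuple[float, float]:
    """Rates of the near and far user on one RB.

    alpha: fraction of the total power P given to the near user.
    h_near, h_far: channel power gains; W: bandwidth; N: noise power spectral density.
    """
    r_near = W * math.log2(1 + h_near * alpha * P / (W * N))  # far user's signal removed by SIC
    r_far = W * math.log2(1 + h_far * (1 - alpha) * P / (alpha * P * h_far + W * N))
    return r_near, r_far


# Hypothetical, normalized values.
alphas = [0.1, 0.2, 0.3, 0.4]
rates = [noma_two_user_rates(a, P=10.0, h_near=4.0, h_far=0.25, W=1.0, N=1.0) for a in alphas]
for a, (rn, rf) in zip(alphas, rates):
    print(f"alpha={a:.1f}  R_near={rn:.3f}  R_far={rf:.3f}  R_sum={rn + rf:.3f}")
```

Maximizing $R_{\mathrm{sum}}$ alone would push $\alpha$ toward the near user, so in practice the choice of $\alpha$ is constrained by fairness or by a minimum rate for the far user.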
2.1.3 Reinforcement Learning
Reinforcement learning is a technique for learning a sequence of actions that achieves better performance. The agent learns how to act through interaction with the environment, as illustrated in Fig. 4. The environment sends observable information, called the state, to the agent, and the agent takes actions in response to the most recent state and reward it received.
Figure 4: Architecture of Reinforcement Learning (the agent sends actions to the environment; the environment returns the state and the reward)
In general, the agent can change the state through its interaction with the environment.
Unlike supervised learning, which needs labeled examples to learn the correct behaviors, reinforcement learning trains the agent with an implicit reward: the agent learns how good the action it took was, rather than whether the action was correct [27]. The aim of reinforcement learning is to maximize the cumulative reward in the long run, $G_t = R_{t+1} + R_{t+2} + \cdots + R_T$. Moreover, even when the correct behavior is unknown, the agent can still learn appropriate actions through reinforcement learning. This characteristic allows agents to learn without full knowledge of the problem, which is very useful in many fields, since an agent that learns only from existing knowledge might never discover better behaviors beyond it. Take the game of Go as an example [28]: machine learning recently made significant progress in this game, and a computer defeated the best human players, with state-of-the-art reinforcement learning playing an important role.
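The return $G_t$ can be computed for every time step with the backward recursion $G_t = R_{t+1} + \gamma G_{t+1}$. The sketch below includes a discount factor $\gamma$, which anticipates (2.14); setting $\gamma = 1$ recovers the undiscounted sum above. The function name and reward values are hypothetical.

```python
from typing import List


def returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """G_t for every t, where rewards[t] holds R_{t+1}; gamma = 1 gives the plain sum."""
    G = 0.0
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out


print(returns([1.0, 0.0, 2.0]))             # [3.0, 2.0, 2.0]
print(returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Computing the returns backward needs only one pass over the episode, which is why the same recursion appears in most policy-gradient and actor-critic implementations.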
The Markov decision process (MDP) is a crucial element of the theory and algorithms of reinforcement learning. If a problem can be formulated as an MDP, a method that solves it can be regarded as a reinforcement learning method. An MDP satisfies the Markov property: if an environment has the Markov property, the agent can make decisions based on the current state alone. Mathematically, the property is expressed as
$$
\begin{aligned}
p(s', r \mid s, a) &= \Pr(S_{t+1} = s', R_{t+1} = r \mid S_0, A_0, R_1, \ldots, R_t, S_t, A_t) \\
&= \Pr(S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a).
\end{aligned}
\tag{2.12}
$$
There are four basic elements in an MDP:
1. $\mathcal{S} = \{s_1, s_2, \ldots, s_n\}$ denotes the set of $n$ possible states.
2. $\mathcal{A}$ denotes the set of possible actions.
3. $P$, defined on $\mathcal{S} \times \mathcal{A} \times \mathcal{S}$, denotes the transition probability $p(s, a, s')$ of moving from $s$ to $s'$ when taking action $a$.
4. $R$, defined on $\mathcal{S} \times \mathcal{A} \times \mathcal{S}$, is the reward function, where $r(s, a, s')$ is the expected reward for the transition from $s$ to $s'$. For state-action-next-state triples it is expressed as

$$r(s, a, s') = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] = \frac{\sum_{r \in \mathbb{R}} r\, p(s', r \mid s, a)}{p(s' \mid s, a)}. \tag{2.13}$$
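Equation (2.13) says that the expected reward of a transition is the reward-weighted joint probability divided by the marginal probability of reaching $s'$. The sketch below computes it for a single fixed pair $(s, a)$, assuming the joint distribution $p(s', r \mid s, a)$ is given as a dictionary keyed by `(next_state, reward)`; the state names and probabilities are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def expected_reward(p: Dict[Tuple[str, float], float]) -> Dict[str, float]:
    """r(s, a, s') of (2.13) for one fixed (s, a), given p[(s', r)] = p(s', r | s, a)."""
    weighted = defaultdict(float)  # numerator: sum_r r * p(s', r | s, a)
    marginal = defaultdict(float)  # denominator: p(s' | s, a) = sum_r p(s', r | s, a)
    for (s_next, r), prob in p.items():
        weighted[s_next] += r * prob
        marginal[s_next] += prob
    return {s_next: weighted[s_next] / m for s_next, m in marginal.items() if m > 0}


# Hypothetical joint distribution of (next state, reward) for a single (s, a) pair.
p = {("s1", 1.0): 0.3, ("s1", 0.0): 0.1, ("s2", 5.0): 0.6}
print(expected_reward(p))
```

Here reaching `s1` yields reward 1 with probability 0.3 and reward 0 with probability 0.1, so its expected reward is 0.3 / 0.4 = 0.75.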
The value function $v_\pi(s)$ is the expected return when starting in state $s$ and following policy $\pi$, where $\pi$ is a mapping from states $s \in \mathcal{S}$ to actions $a \in \mathcal{A}$.
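Solving a reinforcement learning task means finding a policy that achieves the maximal reward in the long run. Let $v_*(s) = \max_\pi v_\pi(s)$. With discount factor $\gamma \in [0, 1]$, $v_*(s)$ can be written as

$$
\begin{aligned}
v_*(s) &= \max_\pi \mathbb{E}_\pi[G_t \mid S_t = s] \\
&= \max_\pi \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right] \\
&= \max_\pi \mathbb{E}_\pi\left[R_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^k R_{t+k+2} \,\middle|\, S_t = s\right] \\
&= \max_a \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_*(s')\right].
\end{aligned}
\tag{2.14}
$$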
The last line of (2.14) is the Bellman optimality equation for $v_*$. Researchers have developed many algorithms and methods to solve reinforcement learning tasks; carefully choosing a proper algorithm and paying attention to its design issues are important for training an agent well.
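When the model $p(s', r \mid s, a)$ is known, (2.14) can be solved by value iteration, which repeatedly applies the right-hand side of the Bellman optimality equation as an update. The sketch below does this on a hypothetical two-state MDP and then reads off the greedy policy. It only illustrates the equation: the algorithm used later in this thesis, A3C (Section 4.4), learns from interaction instead of from a known model.

```python
from typing import Dict, List, Tuple

# model[s][a] lists (probability, next_state, reward) triples, i.e. the distribution p(s', r | s, a).
Model = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def backup(model: Model, v: Dict[int, float], s: int, a: str, gamma: float) -> float:
    """sum_{s', r} p(s', r | s, a) [r + gamma * v(s')]."""
    return sum(p * (r + gamma * v[s_next]) for p, s_next, r in model[s][a])


def value_iteration(model: Model, gamma: float = 0.9, tol: float = 1e-10) -> Dict[int, float]:
    """Apply the Bellman optimality update (2.14) in place until the largest change is below tol."""
    v = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s in model:
            best = max(backup(model, v, s, a, gamma) for a in model[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def greedy_policy(model: Model, v: Dict[int, float], gamma: float = 0.9) -> Dict[int, str]:
    """Pick, in every state, the action that attains the maximum in (2.14)."""
    return {s: max(model[s], key=lambda a: backup(model, v, s, a, gamma)) for s in model}


# Hypothetical MDP: staying in state 1 pays 1 per step; every other transition pays 0.
model: Model = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "move": [(1.0, 0, 0.0)]},
}
v = value_iteration(model)
print({s: round(x, 4) for s, x in v.items()}, greedy_policy(model, v))
```

With $\gamma = 0.9$, staying in state 1 forever is worth $1/(1 - 0.9) = 10$, and moving there from state 0 is worth $0.9 \times 10 = 9$, which is what the iteration converges to.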
2.2 Related Work
2.2.1 NOMA+MU-MIMO
NOMA and MU-MIMO can improve spectral efficiency. Based on previous work, beamforming, user pairing, and power allocation are crucial for reaching the potential of these techniques. [2] considered a practical feedback system, studied the causes of interference in MU-MIMO in depth, and discussed the impact of limited feedback on MU-MIMO, suggesting that the CQI should take interference into account. [3] proposed a lower bound on the expected CQI under limited feedback to avoid overestimating the MCS. This approach guarantees reliability but sacrifices the chance to fully utilize the channel capacity. [3] also proposed a scheduler for MU-MIMO, and its method is adopted in VIENNA. [4–6] pointed out that NOMA can increase both the cell average throughput and fairness, and that scheduling constraints and the fairness metric affect the performance.
[8] suggested that two users can share one precoding matrix and investigated how the threshold on the correlation between users affects performance. [7] proposed two precoding techniques to eliminate inter-beam interference. Notably, both papers assume perfect CSIT. To the best of our knowledge, little investigation has been done on combining NOMA and MU-MIMO in practical communication environments; we therefore study this combination in a practical communication environment.
2.2.2 Outer Loop Link Adaption
Outer loop link adaption (OLLA) is a well-known technique for compensating for inaccurate mapping, CQI imperfection, and channel variation. OLLA addresses these problems by modifying the mapping from SINR to MCS dynamically, in contrast to traditional static mapping. A static SINR-to-MCS mapping becomes inaccurate because of CQI reporting inaccuracy and inconsistent channel conditions; the latter arise because the propagation conditions may differ from those at the time the mapping was constructed. CQI imperfection includes estimation error, which may be caused by differently calibrated user equipment or hardware inaccuracies, and quantization error. Channel variation includes the delay of channel reporting (e.g., transmission and decoding time), different numbers of resolvable multipaths and mobile speeds, and propagation conditions that vary across users.
The concept of OLLA was first proposed in [9], where a simple model was developed and analyzed. The method in [9] increases the estimated SNR by a fixed step size upon receiving an ACK and decreases it upon receiving a NACK. Note that the step sizes for increasing and decreasing the SNR must be related to each other to guarantee the target BLER. The analytical results indicate that the step size directly affects performance, and that further work is needed to investigate the trade-off
between convergence and power excess.
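The fixed-step rule described in [9] can be sketched as follows. This is a minimal sketch under an assumed common convention: the offset is applied in dB to the estimated SINR, and the up and down steps satisfy $(1-\mathrm{BLER}_{\mathrm{target}})\,\Delta_{\mathrm{up}} = \mathrm{BLER}_{\mathrm{target}}\,\Delta_{\mathrm{down}}$. Under this relation the expected drift of the offset is zero exactly at the target BLER. The class name and default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FixedStepOlla:
    """Fixed-step OLLA offset in dB (illustrative sketch, not the code of [9])."""
    step_up_db: float = 0.1    # hypothetical up-step applied on each ACK
    bler_target: float = 0.1   # hypothetical target BLER
    offset_db: float = 0.0     # correction added to the estimated SINR

    @property
    def step_down_db(self) -> float:
        # Zero expected drift at the target: (1 - BLER) * up = BLER * down
        return self.step_up_db * (1.0 - self.bler_target) / self.bler_target

    def update(self, ack: bool) -> float:
        """Raise the SINR estimate on ACK, lower it on NACK; return the new offset."""
        if ack:
            self.offset_db += self.step_up_db
        else:
            self.offset_db -= self.step_down_db
        return self.offset_db


olla = FixedStepOlla()
for _ in range(9):        # nine ACKs ...
    olla.update(ack=True)
olla.update(ack=False)    # ... and one NACK, i.e., exactly the 10% target BLER
```

With these defaults the down step is nine times the up step, so nine ACKs followed by one NACK bring the offset back to zero. A larger step reacts faster but makes the offset fluctuate more around its operating point, which is the trade-off between convergence and power excess noted above.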
[29] implemented OLLA in LTE. It observed that the performance loss grows as the CQI inaccuracy increases, and that OLLA can compensate for this loss. [11] showed that convergence is a crucial issue in LTE because connections are typically short. Several researchers have therefore tried to improve the OLLA mechanism to address convergence. The approach proposed in [30] improves convergence by adjusting the initial offset, and it showed that a proper initial value accelerates convergence. Its algorithm consists of three stages: filtering, aggregation, and statistical computation. The obvious drawback of this method is that it must first collect a large amount of connection data to find a suitable initial value. This may cause performance loss at the beginning.
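A simple warm start can illustrate the idea in [30] that a good initial offset shortens convergence. The sketch below is a hypothetical stand-in for a filtering, aggregation, and statistics pipeline and is not the algorithm of [30]. It assumes that each past connection is summarized by its length and its final OLLA offset. Connections too short to have converged are discarded (filtering), the remaining ones are pooled (aggregation), and their median is taken (statistics). The function name and the `min_len` threshold are hypothetical.

```python
import statistics


def initial_offset_db(past_connections, min_len=50, default_db=0.0):
    """Hypothetical warm start for the OLLA offset.

    past_connections: iterable of (connection_length_in_slots, final_offset_db).
    """
    # Filtering: keep only connections long enough for OLLA to have converged
    converged = [final for length, final in past_connections if length >= min_len]
    # Aggregation + statistics: the median is robust to outlying connections
    if not converged:
        return default_db   # no usable history yet: start from zero offset
    return statistics.median(converged)


history = [(100, -3.0), (10, 5.0), (80, -2.0), (60, -4.0)]
start_db = initial_offset_db(history)
```

This sketch shares the drawback discussed above: until enough long connections have been recorded, it falls back to the default offset.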
[12, 13] not only proposed methods to solve the convergence problem but also presented a comprehensive analysis and a BLER model. The detailed analysis of the BLER shows how to derive a proper mathematical model.
Because the model is more detailed than the one in [9], the resulting OLLA mechanism is also more complicated. It considers both the average BLER and the instantaneous BLER, which gives more freedom to adjust the step size in response to ACK/NACK. The results show that performance can be improved. However, the mathematical model is specific to a certain channel condition. As the complexity of the communication system increases, analyzing the model requires exhaustive work.
[14] aims at faster convergence to the target BLER region. It defines three operating modes that determine the step size. Basically, the closer the estimated BLER is to the target BLER region, the larger the step size. The operating mode is decided by a BLER estimator together with sequential hypothesis tests (SHT), which determine which hypothesis (H) is true with a minimum number of observations. Changing the step size dynamically according to the operating mode is attractive. However, the proposed BLER estimator might fail to choose the proper operating mode when the selected MCS changes too rapidly. [15] proposed a mechanism for fast recovery from the idle to the active state, in which the magnitude of compensation decreases over time.
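Wald's sequential probability ratio test (SPRT), a standard sequential hypothesis test, can illustrate the kind of test mentioned for [14]. The sketch below is our own illustration and not the test defined in [14]. It assumes that each transmission yields a NACK indicator and tests $H_0$: BLER $= p_0$ (near the target) against $H_1$: BLER $= p_1$ (well above the target), with error probabilities $\alpha$ and $\beta$. The function name and all numeric values are hypothetical.

```python
import math


def sprt_bler(nack_indicators, p0=0.1, p1=0.3, alpha=0.05, beta=0.05):
    """Wald SPRT on a stream of NACK indicators.

    Returns (decision, samples_used), where decision is "H0", "H1", or "undecided".
    """
    upper = math.log((1 - beta) / alpha)      # accept H1 at or above this LLR
    lower = math.log(beta / (1 - alpha))      # accept H0 at or below this LLR
    llr_nack = math.log(p1 / p0)              # LLR increment for a NACK
    llr_ack = math.log((1 - p1) / (1 - p0))   # LLR increment for an ACK (negative)
    llr = 0.0
    for n, nack in enumerate(nack_indicators, start=1):
        llr += llr_nack if nack else llr_ack
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(nack_indicators)


print(sprt_bler([True] * 10))   # a burst of NACKs
print(sprt_bler([False] * 20))  # a run of ACKs
```

With these values a burst of NACKs accepts $H_1$ after three observations, while a run of ACKs needs twelve observations to accept $H_0$. The asymmetry comes from the increments $\log(p_1/p_0)$ and $\log\big((1-p_1)/(1-p_0)\big)$. A mode-switching OLLA would use the decision to select its operating mode.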
In short, the convergence issue has received much attention, but how to adapt the convergence strategy to different channels remains an open question.
2.2.3 Machine Learning based Link Adaption
Recently, there has been growing interest in applications of machine learning. In the communication field, the ability of machine learning to handle complicated parameters has attracted many researchers.
[16] implemented online AMC with support vector machines to capture channel effects in real time and find a proper mapping from SNR to MCS.
Unfortunately, this method is unsuitable when the mapping is not one-to-one.
Moreover, its training set is still too large for fast convergence. [17] proposed a low-dimensional feature set to increase AMC accuracy in MIMO operation. This work simply adopted k-NN and showed good performance. However, it may suffer from excessive training memory and processing time.
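The sketch below illustrates k-NN-based MCS selection in the spirit of [17]. It classifies a feature vector by majority vote among its $k$ nearest training samples. The two-dimensional features and the MCS labels are hypothetical placeholders, not the feature set of [17].

```python
import math
from collections import Counter


def knn_select_mcs(train, query, k=3):
    """Pick an MCS by majority vote among the k nearest training samples.

    train: list of (feature_tuple, mcs_label); query: feature tuple of the same length.
    """
    # Brute-force search: every query is compared with every stored sample
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (feature vector, best MCS observed for it)
train = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
         ((10, 10), 9), ((10, 11), 9), ((11, 10), 9)]
mcs = knn_select_mcs(train, (0.5, 0.5))
```

Because every decision scans the stored samples, memory and per-decision computation grow with the training set. This is the drawback noted above.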
Reinforcement learning is a machine learning technique suited to goal-oriented tasks: it trains an agent to act so as to achieve a higher cumulative reward. Several studies have adopted it because, compared with supervised learning, it can train online and requires less memory. Reinforcement learning can also collect data efficiently, since the trade-off between exploration and exploitation is widely studied in this field [27]. [18, 19] adopted Q-learning and showed better performance than a supervised-learning-based method. These studies show positive results and potential, but the reinforcement learning techniques they applied are not efficient enough. Moreover, these methods did not focus on optimizing the convergence strategy, which is an important issue in practical environments.
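The following minimal tabular Q-learning sketch for MCS selection makes the update rule concrete. It is not the method of [18, 19]. It assumes a hypothetical toy model:

- the state is a discretized channel-quality index;
- the action is an MCS index;
- the reward is a hypothetical spectral efficiency on ACK and zero on NACK;
- the next state is drawn independently of the action.

The update is $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. With the default discount factor $\gamma = 0$, each slot is a one-step episode, so the update tracks the immediate reward. $\epsilon$-greedy selection handles exploration and exploitation.

```python
import random

EFFICIENCY = [0.5, 1.0, 1.5, 2.0]   # hypothetical bits/symbol of MCS 0..3
MAX_DECODABLE = [0, 1, 2]           # hypothetical highest decodable MCS per state


def reward(state, mcs):
    """ACK yields the MCS efficiency; NACK yields nothing."""
    return EFFICIENCY[mcs] if mcs <= MAX_DECODABLE[state] else 0.0


def train_q(steps=2000, alpha=0.5, gamma=0.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(MAX_DECODABLE), len(EFFICIENCY)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = rng.randrange(n_states)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                           # explore
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])   # exploit
        r = reward(state, action)
        next_state = rng.randrange(n_states)   # channel evolves independently here
        target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q = train_q()
policy = [max(range(len(row)), key=lambda a: row[a]) for row in q]
```

For each state, the learned greedy policy selects the highest MCS that is still decodable. The table needs only one entry per state-action pair and is updated online, which illustrates the memory advantage mentioned above.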
CHAPTER 3
SCENARIO AND PROBLEM FORMULATIONS
In this chapter, we first introduce the system model in LTE/LTE-A. We then present the problems encountered when implementing NOMA+MU-MIMO in current LTE/LTE-A. Finally, we introduce the problem formulation.
Notations: We use upper-case boldface letters for matrices and lower-case boldface letters for vectors. The operations $(\cdot)^{-1}$, $(\cdot)^{T}$, and $(\cdot)^{H}$ denote the inverse, transpose, and conjugate transpose of a matrix, respectively. $\mathbb{E}(\cdot)$ denotes the expectation operator, $\mathbb{C}$ denotes the set of complex numbers, and $|\mathcal{S}|$ denotes the size of set $\mathcal{S}$.
3.1 Network Model of NOMA+MU-MIMO
When the base station transmits with NOMA, it allocates different power levels to different users to exploit the power domain, and the receivers adopt SIC to eliminate the interference from the other users. With MU-MIMO, the base station precodes the transmitted signal, exploiting the spatial domain to superpose multiple users' messages in the same resource block. When NOMA and MU-MIMO are combined, the base station transmits the signals with different power allocations and precoders for different receivers. In this thesis, different precoders correspond to different beams. The transmitted signal, denoted by $\mathbf{y}$, can be written as
$$\mathbf{y} = \sum_{b=1}^{N_b}\mathbf{f}_b\sum_{u\in\mathcal{K}_b}a_{b,u}s_{b,u}, \tag{3.1}$$
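where $N_b$ is the maximum number of beams, $\mathbf{f}_b$ is the precoding vector of beam $b$, $a_{b,u}$ is the power allocated to $\mathrm{UE}_u$ in beam $b$, and $s_{b,u}$ is the data that $\mathrm{UE}_u$ in beam $b$ desires to receive. Note that NOMA can be applied only among users in the same beam.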
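The superposition in Eq. (3.1) and the SIC receiver can be illustrated for two users sharing one beam. The sketch makes the following assumptions:

- $a_{b,u}$ are power fractions ($a_{\text{near}} + a_{\text{far}} = 1$);
- $g$ denotes the effective channel power gain after precoding;
- inter-beam interference is ignored;
- the near user has the stronger channel.

The function name and the numbers are hypothetical. The far user treats the near user's signal as noise, while the near user first decodes and cancels the far user's signal.

```python
def noma_sinrs(g_near, g_far, a_near, a_far, noise):
    """Two-user NOMA in one beam with SIC (sketch; all quantities in linear scale).

    Returns (near_sinr_after_sic, far_sinr, near_sinr_at_sic_stage).
    """
    # Far (weak) user decodes its own signal, treating the near user's signal as noise
    far = a_far * g_far / (a_near * g_far + noise)
    # Near (strong) user first decodes the far user's signal (SIC stage) ...
    sic_stage = a_far * g_near / (a_near * g_near + noise)
    # ... cancels it, and then decodes its own signal free of intra-beam interference
    near = a_near * g_near / noise
    return near, far, sic_stage


near, far, sic_stage = noma_sinrs(g_near=1.0, g_far=0.1, a_near=0.2, a_far=0.8, noise=0.01)
```

Here more power goes to the far user so that the near user can decode and cancel the far user's signal first. If the SINR at the SIC stage were too low for the MCS chosen for the far user, SIC would fail. This is the failure mode discussed later in this chapter.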
To retrieve the desired signal, the receiver has to decode the received signal successfully. Because different users experience different channel responses, each user receives a different signal. The
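Table 1: Notation Table

| Type | Symbol | Definition |
|---|---|---|
| Parameters | $N_B$ | The maximal number of beams |
| | $M$ | The number of transmit antennas |
| | $N_r$ | The number of receive antennas |
| | $N_c$ | The number of vectors in the codebook |
| | $P$ | Power constraint on the transmitted signals |
| Sets | $\mathcal{C}$ | Set of codebook vectors (the codebook) |
| | $\mathcal{U}$ | Set of UEs |
| | $\mathcal{S}$ | Set of scheduled UEs |
| | $\mathcal{K}_b$ | Set of scheduled UEs served in beam $b$ |
| | $\mathcal{T}$ | Set of RBs |
| | $\mathcal{M}$ | Set of MCSs |
| Variables | $\mathbf{x}_u$ | The signal received by $\mathrm{UE}_u$ |
| | $\mathbf{x}_{k,b}$ | The signal received by $\mathrm{UE}_k$ in beam $b$ |
| | $\mathbf{y}$ | The transmitted signal |
| | $s_{b,u}$ | The desired signal of $\mathrm{UE}_u$ in beam $b$ |
| | $\mathbf{H}_{k,b}$ | The channel matrix of $\mathrm{UE}_k$ in beam $b$ |
| | $\mathbf{h}_{k,b}$ | The channel vector of $\mathrm{UE}_k$ in beam $b$ |
| | $\hat{\mathbf{h}}_{k,b}$ | The quantized channel vector returned by $\mathrm{UE}_k$ in beam $b$ |
| | $\tilde{\mathbf{h}}_{k,b}$ | The normalized channel response of $\mathrm{UE}_k$ in beam $b$ |
| | $\mathbf{c}_j$ | The $j$-th vector in the codebook |
| | $\mathbf{f}_b$ | The precoding vector of beam $b$ |
| | $a_{b,k}$ | The power of $\mathrm{UE}_k$ in beam $b$ |
| | $\mathbf{e}_k$ | The error vector of $\mathrm{UE}_k$ |
| | $\tilde{\mathbf{e}}_k$ | The normalized error vector of $\mathrm{UE}_k$ |
| | $\mathrm{UE}_{b,k}$ | $\mathrm{UE}_k$ in beam $b$ |
| | $\gamma_{\mathrm{eff}}$ | The effective SINR estimated by the BS based on the mapping |
| | $\gamma'_{\mathrm{eff}}$ | The effective SINR modified by the BS |
| | $\theta$ | The angle between $\tilde{\mathbf{h}}_{k,b}$ and $\hat{\mathbf{h}}_{k,b}$ |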
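[Figure: at the transmitter (Tx), a scheduler, power allocation, and beamforming form per-beam superposed signals such as $y_1 = a_{1,1}s_{1,1} + a_{1,2}s_{1,2}$. These are combined as $\mathbf{y} = \sum_{i=1}^{B}\mathbf{w}_i y_i$ and sent to $\mathrm{UE}_{1,1}$, $\mathrm{UE}_{1,2}$, $\mathrm{UE}_{B,1}$, and $\mathrm{UE}_{B,2}$, each with $N_r$ antennas. In each beam, one user applies MMSE, SIC, and then decoding; the other applies MMSE and then decoding.]
Figure 5: Overall System in NOMA+MU-MIMO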
received signal of $\mathrm{UE}_u$ in beam $b$, denoted by $\mathbf{x}_{u,b}$, is represented as
$$\mathbf{x}_{u,b} = \mathbf{H}_{u,b}\sum_{j=1}^{N_b}\mathbf{f}_j\sum_{i\in\mathcal{K}_j}a_{j,i}s_{j,i} + \mathbf{n}. \tag{3.2}$$
Each user also applies a spatial filter to enhance the desired signal and suppress the interference, including the undesired signals.
Thus, the received signal can be rewritten as
$$x = \mathbf{v}_{u,b}\mathbf{H}_{u,b}\sum_{j=1}^{N_b}\mathbf{f}_j\sum_{i\in\mathcal{K}_j}a_{j,i}s_{j,i} + n, \tag{3.3}$$
where $\mathbf{v}_{u,b}$ denotes the receive equalizer.
The effective channel response, denoted by $\mathbf{g}_{u,b}$, is
$$\mathbf{g}_{u,b} = \mathbf{v}_{u,b}\mathbf{H}_{u,b}. \tag{3.4}$$
Zero-forcing beamforming is an effective technique for mitigating interference from the other beams, so we adopt it as the precoding mechanism. Each $\mathbf{f}_b$ is chosen orthogonal to the channels of the users in the other beams. Theoretically, if the channel information is fully available, the inter-beam terms vanish, and the received signal with zero-forcing precoding becomes
$$x = \mathbf{v}_{u,b}\mathbf{H}_{u,b}\mathbf{f}_b\sum_{i\in\mathcal{K}_b}a_{b,i}s_{b,i} + n. \tag{3.5}$$
The overall system in NOMA+MU-MIMO is illustrated in Fig. 5.
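The zero-forcing precoder described above can be sketched with NumPy as follows. The sketch makes these assumptions:

- there is one representative single-antenna channel row per beam (users sharing a beam through NOMA share its precoder);
- the precoders are the unit-norm columns of the pseudo-inverse of the stacked channel matrix;
- the channel is a random $2\times 4$ Rayleigh channel, used for illustration.

The function name is hypothetical.

```python
import numpy as np


def zf_precoders(h_rows: np.ndarray) -> np.ndarray:
    """Zero-forcing precoders: unit-norm columns of the pseudo-inverse of the stacked channels."""
    f = np.linalg.pinv(h_rows)              # shape (M, number of beams)
    return f / np.linalg.norm(f, axis=0)    # normalize each beam's precoder


rng = np.random.default_rng(0)
# Hypothetical 2-beam, 4-antenna Rayleigh channel: row b is the channel of the user in beam b
H = (rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))) / np.sqrt(2)
F = zf_precoders(H)
G = H @ F   # effective channel; off-diagonal entries are inter-beam leakage
```

The stacked channel has full row rank, so $\mathbf{H}\mathbf{H}^{+} = \mathbf{I}$ and the effective channel matrix is diagonal. In other words, each beam's precoder is orthogonal to the other beam's channel, as assumed in Eq. (3.5).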
3.2 Communication System in LTE/ LTE-A
In a practical communication system, the channel information returned by the UE is limited to the CQI, PMI, and RI. The CQI reflects the magnitude of the channel, the PMI indicates the direction of the channel, and the RI indicates the number of layers the UE prefers. In LTE/LTE-A, the PMI is chosen from the LTE codebook.
Assuming a single layer, the LTE codebook composed of quantized vectors is given by
$$\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_{N_c}\}, \tag{3.6}$$
where $\mathcal{C}$ denotes the codebook and $N_c$ denotes the number of vectors in the codebook.
The chosen PMI is the codebook vector that best represents the channel. The chosen PMI, denoted by $\hat{\mathbf{h}}_u$, is
$$\hat{\mathbf{h}}_u = \arg\max_{\mathbf{c}_j\in\mathcal{C}}\left|\mathbf{H}_u\mathbf{c}_j^{*}\right|. \tag{3.7}$$
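The codeword search in Eq. (3.7) is an exhaustive search over the codebook. The sketch below assumes a single receive antenna and, for simplicity, uses a DFT codebook as a hypothetical stand-in. The actual LTE four-antenna codebook is specified in 3GPP TS 36.211 and is not reproduced here. The sketch also computes the quantization angle $\theta$ between the channel and the selected codeword (Table 1). Since $|\mathbf{h}\mathbf{c}_j^{*}| = |\mathbf{h}^{H}\mathbf{c}_j|$ for a vector channel, maximizing Eq. (3.7) is equivalent to minimizing $\theta$. The function names are hypothetical.

```python
import numpy as np


def dft_codebook(n_tx: int, n_codewords: int) -> np.ndarray:
    """Hypothetical DFT codebook; column j is the unit-norm codeword c_j."""
    n = np.arange(n_tx)[:, None]
    k = np.arange(n_codewords)[None, :]
    return np.exp(2j * np.pi * n * k / n_codewords) / np.sqrt(n_tx)


def select_pmi(h: np.ndarray, codebook: np.ndarray) -> int:
    """Eq. (3.7) for one receive antenna: index maximizing |h c_j^*|."""
    gains = np.abs(h @ codebook.conj())
    return int(np.argmax(gains))


def quantization_angle(h: np.ndarray, c: np.ndarray) -> float:
    """Angle theta between the channel direction and the reported codeword."""
    cos_theta = np.abs(np.vdot(h, c)) / (np.linalg.norm(h) * np.linalg.norm(c))
    return float(np.arccos(np.clip(cos_theta, 0.0, 1.0)))  # clip guards rounding above 1


rng = np.random.default_rng(1)
cb = dft_codebook(n_tx=4, n_codewords=16)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # one Rayleigh channel row
pmi = select_pmi(h, cb)
theta = quantization_angle(h, cb[:, pmi])
```

A channel direction that does not coincide with a codeword leaves $\theta > 0$. This residual angle causes the interference analyzed in Section 3.4.2.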
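With zero-forcing beamforming based on the reported PMIs, the precoders satisfy
$$\mathbf{f}_i\hat{\mathbf{h}}_j^{*} = 0, \quad \text{if user } i \text{ is not allocated in the same beam as user } j, \tag{3.8}$$
where $\mathbf{f}_i$ denotes the precoder of the beam serving user $i$.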
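The received signal of $\mathrm{UE}_u$ in beam $b$ is
$$x_u = \mathbf{v}_{u,b}\mathbf{H}_{u,b}\mathbf{f}_b\sum_{i\in\mathcal{K}_b}a_{b,i}s_{b,i} + \mathbf{v}_{u,b}\mathbf{H}_{u,b}\sum_{j=1,\,j\neq b}^{N_b}\mathbf{f}_j\sum_{i\in\mathcal{K}_j}a_{j,i}s_{j,i} + n. \tag{3.9}$$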
The term $F = \mathbf{v}_{u,b}\mathbf{H}_{u,b}\sum_{j=1,\,j\neq b}^{N_b}\mathbf{f}_j\sum_{i\in\mathcal{K}_j}a_{j,i}s_{j,i}$ is regarded as the undesired signal for $\mathrm{UE}_u$. With perfect CSI, $F$ would be 0. Otherwise, $F$ is generally nonzero.
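[Figure: a transmitter (Tx) with $N_t$ antennas serves beam 1 to beam $B$. $\mathrm{UE}_{1,1}$, $\mathrm{UE}_{1,2}$, $\mathrm{UE}_{B,1}$, and $\mathrm{UE}_{B,2}$ each have $N_r$ antennas. Labels mark intra-interference, inter-interference, and low SINR.]
Figure 6: Operating NOMA+MU-MIMO in $N_t \times N_r$ MIMO. Inter-interference indicates the interference caused by the other beams. Intra-interference indicates the interference caused by the other user in the same beam.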
[Figure: UE and eNB blocks (CQI preprocessing, ILLA, and OLLA) exchange CQI, MCS, and ACK/NACK; OLLA converts $\gamma_{\mathrm{eff}}$ into $\gamma'_{\mathrm{eff}}$.]
Figure 7: Operation Diagram in Downlink
As shown in Fig. 7, the link adaption scheme consists of two parts: inner loop link adaption (ILLA) and outer loop link adaption (OLLA).
ILLA assigns the most suitable MCS based on the estimated link quality. OLLA corrects reporting inaccuracies by modifying the estimated link quality based on ACK/NACK feedback instead of relying solely on the reported CQI.
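The interplay of ILLA and OLLA in Fig. 7 can be sketched as a threshold lookup on the corrected SINR $\gamma'_{\mathrm{eff}} = \gamma_{\mathrm{eff}} + \Delta_{\mathrm{OLLA}}$ in dB. We assume this common additive-offset convention. The switching thresholds are hypothetical; in practice they are derived from link-level BLER curves for the target BLER. The fallback to the most robust MCS below the lowest threshold is also an assumption of this sketch.

```python
import bisect

# Hypothetical SINR switching thresholds (dB) for MCS 0..9
THRESHOLDS_DB = [-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]


def illa_select_mcs(gamma_eff_db: float, olla_offset_db: float = 0.0) -> int:
    """ILLA lookup on the OLLA-corrected SINR gamma'_eff = gamma_eff + offset."""
    gamma_corr_db = gamma_eff_db + olla_offset_db
    # Highest MCS whose threshold does not exceed the corrected SINR
    idx = bisect.bisect_right(THRESHOLDS_DB, gamma_corr_db) - 1
    return max(idx, 0)   # below the lowest threshold: fall back to the most robust MCS


mcs_plain = illa_select_mcs(5.0)                       # no OLLA correction
mcs_corrected = illa_select_mcs(5.0, olla_offset_db=-3.0)
```

A negative OLLA offset, accumulated after NACKs, makes ILLA choose a more robust MCS for the same reported CQI.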
3.3 Scenario in LTE/LTE-A
Figure 8: Deployment of NOMA+MU-MIMO
Fig. 8 illustrates the deployment of NOMA+MU-MIMO. The transmitter has $N_t = 4$ antennas, and each receiver has $N_r = 1$ antenna. The assumptions are as follows:
1. The number of transmit beams is fixed over all RBs.
2. The power allocated among beams is equal.
3. The number of multiplexed users within a beam is at most 2.
4. The number of data streams each user receives does not exceed one.
5. The PMI is chosen from the LTE codebook.
These assumptions allow us to optimize the system performance without loss of generality.
3.4 Problems Formulation
3.4.1 Observation of NOMA+MUMIMO and MUMIMO
We first ran simulations with different SINR estimation settings to analyze the difference between perfect CSI and limited CSI. With full feedback, the base station can estimate the SINR correctly. With limited feedback, it can only approximate the SINR. The limited feedback follows the LTE/LTE-A standard. The results are shown in Fig. 9, where NOMA+MU-MIMO provides almost no throughput gain over MU-MIMO. Moreover, the characteristics of NOMA suggest that NOMA+MU-MIMO should outperform MU-MIMO in cell-edge throughput, but this gain cannot be seen in Fig. 9 either. That is, Fig. 9 indicates that performance cannot be improved further as long as the SINR estimation is not accurate enough.
Figure 9: Comparison between MU-MIMO and NOMA+MU-MIMO with and without correct SINR estimation: (a) throughput; (b) cell-edge throughput.
These observations imply that CSI integrity is crucial for the improvement. The feedback adopted in VIENNA is the lower bound of the expected SINR, but [3] did not analyze the discrepancy between the estimated SINR and the real SINR thoroughly. Since the feedback affects the SINR estimate, we analyze how the real SINR varies for a given SINR estimate based on the limited feedback of [3].
Fig. 10 shows that the real SINR varies greatly for a given reported CQI. This implies that the CQI reported by the UE is highly inaccurate.
This loss of SINR accuracy might be acceptable in MU-MIMO.
When NOMA is taken into account, however, the inaccuracy becomes an important issue. Many studies [5, 6] have shown that power allocation and pairing are key problems in NOMA. Pairing mainly depends on the difference in channel gain between the UEs that receive
[Plot with axes Estimated SINR and Real SINR.]
Figure 10: Comparison between estimated SINR and real SINR in terms of throughput. Estimated SINR is the SINR estimated from limited CSI. Real SINR is the SINR that the UE actually experiences.
the data encoded with NOMA. A wrong estimate of the channel gain difference leads to wrong power allocation, which causes performance loss.
Furthermore, wrong pairing and power allocation cause the UE to fail to decode the signal, because the signal degrades more than expected.
In other words, the SIC receiver may fail to decode the signal because of the unexpected interference.
3.4.2 The Impact of CSI on SINR
This subsection discusses the impact of inaccurate CSI on the SINR in more detail.
Assume two beams, 4×1 MIMO, zero-forcing beamforming, and a perfect SIC receiver, so that the near user can successfully eliminate the intra-interference. The intra-interference can then be neglected, and the following discussion focuses on the inter-interference. These assumptions allow us to analyze the problem without loss of generality.
If perfect CSI is available, the relationship between the precoders $\mathbf{f}_i$ and the channels $\mathbf{h}_i$ is as shown in Fig. 11. After passing through the channel, the signal that $\mathrm{UE}_1$
Figure 11: Relationship of $\mathbf{f}_i$ and $\mathbf{h}_i$ under perfect CSI
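[Figure: the reported directions $\hat{\mathbf{h}}_1 = \mathbf{c}_1$ and $\hat{\mathbf{h}}_2 = \mathbf{c}_2$ deviate from the true channels $\mathbf{h}_1$ and $\mathbf{h}_2$ by the angles $\theta_1$ and $\theta_2$, with error vectors $\mathbf{e}_1$ and $\mathbf{e}_2$. The precoders $\mathbf{f}_1$ and $\mathbf{f}_2$ are also shown.]
Figure 12: Relationship of $\mathbf{f}_i$ and $\mathbf{h}_i$ under limited CSI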
received is
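$$
\begin{aligned}
Z_1 &= \mathbf{h}_1\mathbf{y} + n \\
&= \mathbf{h}_1(\mathbf{f}_1x_1 + \mathbf{f}_2x_2) + n \\
&= \mathbf{h}_1\mathbf{f}_1x_1 + \mathbf{h}_1\mathbf{f}_2x_2 + n \\
&= \mathbf{h}_1\mathbf{f}_1x_1 + n \qquad (\because \text{zero-forcing} \Rightarrow \mathbf{h}_1\mathbf{f}_2 = 0).
\end{aligned}
\tag{3.10}
$$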
Eq. (3.10) shows that the precoding perfectly eliminates the interference from the other beam. Under limited feedback, however, the situation changes.
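Fig. 12 shows the relationship between the precoders $\mathbf{f}_i$ and the channels $\mathbf{h}_i$ under limited CSI. The channel vector can be decomposed as $\mathbf{h}_k = \|\mathbf{h}_k\|\left(|\tilde{\mathbf{h}}_k^{H}\hat{\mathbf{h}}_k|\,\hat{\mathbf{h}}_k + \mathbf{e}_k\right)$, where $\mathbf{e}_k$ is the error vector, and $\hat{\mathbf{h}}_k$ and $\tilde{\mathbf{h}}_k$ are the quantized and normalized channel vectors, respectively. Since $\theta_k$ is the angle between $\tilde{\mathbf{h}}_k$ and $\hat{\mathbf{h}}_k$, $|\tilde{\mathbf{h}}_k^{H}\hat{\mathbf{h}}_k| = \cos\theta_k$.
The received signal of the UE is expressed as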
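$$
\begin{aligned}
Z_1 &= \mathbf{h}_1\mathbf{y} + n \\
&= \|\mathbf{h}_1\|\left(|\tilde{\mathbf{h}}_1^{H}\hat{\mathbf{h}}_1|\,\hat{\mathbf{h}}_1 + \mathbf{e}_1\right)(\mathbf{f}_1x_1 + \mathbf{f}_2x_2) + n \\
&= \|\mathbf{h}_1\|\left[|\tilde{\mathbf{h}}_1^{H}\hat{\mathbf{h}}_1|\,\hat{\mathbf{h}}_1(\mathbf{f}_1x_1 + \mathbf{f}_2x_2) + \mathbf{e}_1(\mathbf{f}_1x_1 + \mathbf{f}_2x_2)\right] + n \\
&= \|\mathbf{h}_1\|\left[|\tilde{\mathbf{h}}_1^{H}\hat{\mathbf{h}}_1|\,\hat{\mathbf{h}}_1\mathbf{f}_1x_1 + \mathbf{e}_1(\mathbf{f}_1x_1 + \mathbf{f}_2x_2)\right] + n \qquad (\because \text{zero-forcing} \Rightarrow \hat{\mathbf{h}}_1\mathbf{f}_2 = 0) \\
&= \|\mathbf{h}_1\|\left[\left(|\tilde{\mathbf{h}}_1^{H}\hat{\mathbf{h}}_1|\,\hat{\mathbf{h}}_1 + \mathbf{e}_1\right)\mathbf{f}_1x_1 + \mathbf{e}_1\mathbf{f}_2x_2\right] + n.
\end{aligned}
\tag{3.11}
$$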
The interference $F = \|\mathbf{h}_1\|\mathbf{e}_1\mathbf{f}_2x_2$ cannot be eliminated. Because the real channel vector $\mathbf{h}_1$ is unknown, the limited feedback causes the precoder $\mathbf{f}_2$ to be chosen incorrectly. The base station can only choose a precoder orthogonal to $\hat{\mathbf{h}}_1$ instead of $\mathbf{h}_1$. As a result, the messages precoded for the other beams are not naturally eliminated after passing through $\mathbf{h}_1$.
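The residual interference $|\mathbf{h}_1\mathbf{f}_2|^2$ can be reproduced numerically. The sketch makes the following assumptions:

- there are two single-antenna users, one per beam, and four transmit antennas;
- under a hypothetical quantization model, each reported direction $\hat{\mathbf{h}}_k$ lies at a prescribed angle $\theta$ from the true normalized channel;
- zero-forcing is computed from the reported directions, as a base station with limited feedback must do.

The helper names are hypothetical.

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)


def perturb(h_unit: np.ndarray, theta: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical reported direction at angle theta from the true unit channel."""
    g = rng.standard_normal(h_unit.size) + 1j * rng.standard_normal(h_unit.size)
    w = unit(g - h_unit * np.vdot(h_unit, g))        # w^H h_unit = 0
    return np.cos(theta) * h_unit + np.sin(theta) * w


def leakage(theta: float, seed: int = 0, n_tx: int = 4) -> float:
    """Interference power that UE 1 receives from beam 2 under ZF on reported directions."""
    rng = np.random.default_rng(seed)
    h = [unit(rng.standard_normal(n_tx) + 1j * rng.standard_normal(n_tx)) for _ in range(2)]
    h_hat = np.array([perturb(hk, theta, rng) for hk in h])
    f = np.linalg.pinv(h_hat)                  # ZF on the quantized channels
    f = f / np.linalg.norm(f, axis=0)          # unit-norm precoder per beam
    return float(abs(h[0] @ f[:, 1]) ** 2)     # |h_1 f_2|^2 with the true channel
```

With $\theta = 0$ the leakage is zero up to rounding error, which matches Eq. (3.10). With $\theta > 0$ it is generally strictly positive, which matches Eq. (3.11).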
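Thus, the SINR of $\mathrm{UE}_k$ under limited feedback is
$$\mathrm{SINR}_{k,\mathrm{real}} = \frac{\frac{P}{|\mathcal{S}|}\|\mathbf{h}_k\|^2\left|\hat{\mathbf{h}}_k\tilde{\mathbf{f}}_k\cos\theta_k + \mathbf{e}_k\tilde{\mathbf{f}}_k\right|^2}{1 + \frac{P}{|\mathcal{S}|}\|\mathbf{h}_k\|^2\sin^2\theta_k\sum_{i\in\mathcal{S}\setminus\{k\}}\left|\tilde{\mathbf{e}}_k\tilde{\mathbf{f}}_i\right|^2}. \tag{3.12}$$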
The interference term $F = \sum_{i\in\mathcal{S}\setminus\{k\}}\left|\tilde{\mathbf{e}}_k\tilde{\mathbf{f}}_i\right|^2$ is unknown to the UE, because $\mathbf{f}_i$ is determined by the base station and the co-scheduled UEs. Different co-scheduled UEs lead to different values of $F$, which can range from 0 to 1. Meanwhile, $\mathbf{e}_k$ is unknown at the base station because the PMI is quantized. As a result, neither the base station nor the UE can estimate the SINR accurately if the information
Table 2: Objective Function

$$\max_{\pi}\ \mathbb{E}_{\pi}\left[\sum_{k=0}^{T} r_{t+k} \,\middle|\, s_t\right] \tag{3.13}$$

where $\pi$ is the strategy of selecting the MCS, $s_t$ is the information observable at the base station, and

$$r_t = \begin{cases} 0, & \text{if the base station knows that it has assigned the suitable MCS},\\ -1, & \text{otherwise}. \end{cases}$$
between the base station and the UE cannot be fully exchanged. Consequently, the base station may fail to estimate the SINR needed for proper scheduling.
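Eq. (3.12) can be evaluated directly once the inner products are known. The helper below is hypothetical and makes these assumptions:

- $\hat{\mathbf{h}}_k\tilde{\mathbf{f}}_k$, $\mathbf{e}_k\tilde{\mathbf{f}}_k$, and $\tilde{\mathbf{e}}_k\tilde{\mathbf{f}}_i$ are given as scalar inputs;
- the noise power is normalized to 1;
- $P$ is split equally among the $|\mathcal{S}|$ scheduled UEs.

The example compares a value computed as if there were no quantization error ($\theta_k = 0$) with the same link at $\theta_k = 0.5$ rad. The numbers are illustrative only, and $\mathbf{e}_k\tilde{\mathbf{f}}_k = 0$ is chosen for simplicity.

```python
import math


def sinr_real_db(p_total, n_sched, h_norm_sq, hf_k, ef_k, theta_k, ef_others):
    """Eq. (3.12) in dB with unit noise power; inner products are passed as scalars."""
    p = p_total / n_sched                                   # equal power per scheduled UE
    signal = p * h_norm_sq * abs(hf_k * math.cos(theta_k) + ef_k) ** 2
    residual = sum(abs(x) ** 2 for x in ef_others)          # sum of |e~_k f~_i|^2 over i != k
    interference = p * h_norm_sq * math.sin(theta_k) ** 2 * residual
    return 10 * math.log10(signal / (1 + interference))


# As if the reported direction were exact: theta = 0, so no residual interference
naive_db = sinr_real_db(10.0, 2, 1.0, hf_k=1.0, ef_k=0.0, theta_k=0.0, ef_others=[0.5])
# Same link with a 0.5 rad quantization angle and one co-scheduled UE
real_db = sinr_real_db(10.0, 2, 1.0, hf_k=1.0, ef_k=0.0, theta_k=0.5, ef_others=[0.5])
```

In this example the SINR drops from roughly 7.0 dB to about 4.8 dB. The base station cannot detect this gap from the CQI and PMI alone, because the residual interference depends on $\tilde{\mathbf{e}}_k$.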
3.4.3 Convergence Formulation
In LTE, the SINR estimate depends heavily on the CQI and PMI reported by the UE. The reported CSI may not represent the real SINR accurately enough because of quantization error and limited feedback. In that case, the MCS must be adjusted based on HARQ information. Changing the MCS dynamically based on HARQ feedback is the so-called OLLA mechanism. Because short connections are prevalent in LTE networks [11], conventional OLLA must take convergence into consideration. Convergence speed is therefore the major goal of this thesis. The objective function is given mathematically in Table 2.
Eq. (3.13) formulates the convergence problem mathematically.
The larger the value of Eq. (3.13), the more quickly the base station assigns a suitable MCS within a period $T$. In other words, optimizing the objective function fulfills the requirement of short connections in LTE. The base station can respond more quickly to inaccurately reported SINR and achieve better scheduling performance with a corrected, suitable SINR estimate.
Thus, our aim is to design a strategy for modifying the MCS so that the base station obtains a suitable SINR estimate as quickly as possible.
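The objective in Eq. (3.13) can be made concrete by computing the return $\sum_{k=0}^{T} r_{t+k}$ for a fixed-step OLLA in a toy model. All modelling choices below are hypothetical:

- the reported SINR overestimates the real SINR by 4 dB;
- a transmission is acknowledged exactly when the corrected estimate does not exceed the real SINR;
- the MCS counts as suitable ($r = 0$) when the corrected estimate is within 1 dB of the real SINR, and $r = -1$ otherwise;
- the up step follows from the down step and a 10% target BLER.

The sketch compares two down-step sizes.

```python
def episode_return(step_down_db, bler_target=0.1, mismatch_db=-4.0, tol_db=1.0, horizon=50):
    """Sum of r_{t+k} for k = 0..T in a toy deterministic OLLA episode (hypothetical model)."""
    step_up_db = step_down_db * bler_target / (1.0 - bler_target)
    offset_db, total = 0.0, 0
    for _ in range(horizon + 1):                 # k = 0, ..., T
        # r = 0 if the corrected estimate is close enough to the real SINR, else -1
        total += 0 if abs(offset_db - mismatch_db) <= tol_db else -1
        ack = offset_db <= mismatch_db           # decoding succeeds iff the MCS is not too aggressive
        offset_db += step_up_db if ack else -step_down_db
    return total


slow = episode_return(step_down_db=0.5)
fast = episode_return(step_down_db=1.0)
```

In this toy model the larger step has three unsuitable slots instead of six, so it attains the higher value of Eq. (3.13). In a real channel, a larger step also fluctuates more around the operating point, so the best strategy depends on the channel.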
3.5 Analysis of the Convergence Formulation
To solve the optimization problem, we first analyze the parameters associated with $r_t$, which indicates whether the base station has found a proper SINR estimate. A proper SINR estimate indicates that the base station