
中華大學 (Chung Hua University)

碩士論文 (Master's Thesis)

4G LTE 基地台混合能源利用效率研究
Hybrid Energy Utilization for Harvesting Base Station in 4G LTE

Department: Master Program, Department of Electrical Engineering (電機工程學系碩士班)
Student ID and Name: M10201041 泰森
Advisor: Dr. 余誌民 (Chih-Min Yu)
August 2015 (中華民國 104 年 8 月)


摘要 (Abstract in Chinese)

In this thesis, a 4G LTE base station (BS) supplied by hybrid energy is considered. The hybrid energy consists of a constant energy source and a solar energy harvester. To minimize the power drawn from the constant energy source at the base station, a Markov decision process (MDP) model is used. In this model, a buffer transmits a given number of data packets over a finite number of transmission time intervals. To find the optimal transmit power for every time interval, the MDP model is solved iteratively for two practical scenarios. In the first scenario, the harvested solar energy is not used if it is insufficient to transmit the packets in the buffer. In the second scenario, the harvested solar energy is used immediately. In both scenarios, the numerical results show that the proposed MDP model can reduce AC power usage by up to 50%.

Keywords: base station, 4G-LTE, hybrid energy, Markov decision process


ABSTRACT

In this thesis, a 4G LTE base station (BS) supplied by hybrid energy is considered. The hybrid energy consists of a constant energy source and a solar energy harvester. To minimize the power consumption of the constant energy source in the base station, a Markov decision process (MDP) model is used. In this model, one buffer transmits a given number of data packets over a finite number of transmission time intervals. To find the optimal transmit power for every time interval, the MDP model is solved iteratively for two practical scenarios. In the first scenario, the harvested solar energy is not used if it is not enough to transmit the packets in the buffer. In the second scenario, the harvested solar energy is used immediately. In both scenarios, the numerical results indicate that the proposed MDP model can reduce AC power usage by up to 50%.

Keywords: Base station, 4G-LTE, Hybrid power, Markov decision process


Acknowledgment

This work has been supported by many people, to whom I wish to express my gratitude. I wish to thank my main adviser, Prof. Chih Min Yu. His positive attitude and encouragement inspired many ideas and undoubtedly gave dynamism to my research studies.

I also want to thank my parents and my brother, as they continue to inspire, encourage and help me through everything in my life.

I am also grateful to my school, especially the international student affairs office, for providing help whenever I needed it.


List of Abbreviations


Contents

摘要 (Abstract in Chinese)
Abstract
Acknowledgment
List of Abbreviations
Contents
List of Figures
1. Introduction
2. Hybrid Energy
   2.1 Harvested Energy
   2.2 Green Energy Model
   2.3 The GEO (Green Energy Optimization)
3. Sleep Mode
   3.1 Sleep Mode in LTE
   3.2 Sleep Mode Operation in IEEE 802.16e
   3.3 IEEE 802.16m Sleep Mode Operation
4. Markov Decision Process (MDP)
   4.1 Definition
   4.2 Algorithms
   4.3 Exact Methods: Dynamic Programming
      4.3.1 Value Iteration
      4.3.2 Policy Iteration
5. System Model
   5.1 Hybrid Model
   5.2 MDP Model
   5.3 Practical Scenarios
6. Numerical Results
7. Conclusions
Future Work
Reference


List of Figures

Fig. 1.1 The timing diagram for UMTS MS receiver activities
Fig. 1.2 An example of MDP model
Fig. 1.3 Energy minimization for base station
Fig. 2.1 Toy examples of solar irradiance
Fig. 2.2 Gaussian mixture hidden Markov chain
Fig. 2.3 The solar energy generation per hour
Fig. 2.4 The rationale of the decomposition
Fig. 3.1 A timing diagram of IEEE 802.16e sleep mode operation
Fig. 3.2 The sleep mode operation
Fig. 4.1 MDP structure
Fig. 4.2 Model of a reinforcement learning system
Fig. 5.1 Hybrid architecture
Fig. 5.2 Hybrid model
Fig. 5.3 Practical scenarios
Fig. 5.4 States (2 by 2) for first scenario
Fig. 5.5 States (2 by 2) for second scenario
Fig. 5.6 States (3 by 3) for first scenario
Fig. 5.7 States (3 by 3) for second scenario
Fig. 6.1 Flow chart to calculate minimum reward value
Fig. 6.2 The power consumption for different sizes of the buffer
Fig. 6.3 Percentage of dropped packets for different sizes of the buffer
Fig. 6.4 Percentage of wasted solar energy for different sizes of the battery
Fig. 6.5 Performance of the first scenario in power consumption and percentage of dropped packets


Chapter 1 Introduction

The often-cited Gartner report of 2007 [1] states that the information and communication technology (ICT) sector is responsible for 2% of the human CO2 footprint. More specifically, mobile communication networks alone consume around 0.5% of the global energy supply [2]. These numbers are expected to increase with the rapid growth of capacity demand unless significant improvements are made to network efficiency. Hence, reducing and optimizing energy consumption is a strategic objective for the cellular communication industry [3].

This is not only because it makes business sense, but also because mobile operators are increasingly committed to corporate social responsibility programs with a particular focus on renewable energy and environmental issues.

In recent years there has been growing interest around the world in supplying cellular networks with renewable energy, e.g., solar, wind, and hydro, to decrease carbon footprints [4]. In addition, there are several benefits of using renewable energy sources to power cellular BSs. One of the intuitive and common justifications for renewable energy in general is the desire to make our society more environmentally friendly, e.g., by decreasing global greenhouse gas (GHG) emissions. However, beyond decreasing GHG emissions, renewable resources can serve as a solution to many other problems [5].

The evolving fourth-generation (4G) wireless technologies, for example Long Term Evolution (LTE) of UMTS and Mobile WiMAX, offer high bandwidth for data transfer to support many applications including voice, data, and multimedia. High data rates over the access part of the network are accomplished through the use of higher-order modulation, advanced coding, antenna techniques, and so on [6]-[8]. The computationally complex circuitry in mobile terminals drains their battery energy quickly and thus limits the potential use of advanced 4G services. Numerous methods and mechanisms, including discontinuous reception (DRX) in LTE [9], [10] and idle/sleep modes in WiMAX [11], [12], have been introduced to increase battery lifetime. All recent 3GPP releases specify DRX to save the energy of the user equipment (UE) [13]-[14].

Fig.1.1 The timing diagram for UMTS MS receiver activities

There are numerous publications on modeling DRX. Examples include [15]-[18] for UMTS, as shown in Fig. 1.1, and [19]-[21] for LTE. The investigations and experiments found in the literature show that significant UE power savings can be achieved by enabling DRX. However, the DRX parameters should be optimized in order to achieve maximum power saving without incurring unacceptable packet delay or network re-entry. The stringent delay requirements of real-time services, such as VoIP or video [22], limit to a certain extent the power-saving potential of the DRX mechanism.

On the other hand, solar energy generation depends on many factors such as temperature, sunlight intensity, the geo-location of the solar panel, and so on. Moreover, the daily solar energy generation in a given area exhibits temporal dynamics: it peaks around noon and bottoms out during the night. The BSs' energy consumption depends on the mobile traffic, which shows both temporal and spatial diversity. The temporal traffic diversity indicates that the traffic demands at individual BSs differ greatly over time, while the spatial traffic diversity means that closely located BSs may experience different traffic intensities in the same time period of a day, and therefore they experience different energy consumption [23].

In the literature, the main goal is to provide a balance between energy consumption and performance [24], [25]. Energy harvesting (EH) has attracted considerable interest as an environmentally friendlier supply of energy for communication nodes compared to traditional sources of energy. EH nodes harvest energy from their surroundings using solar, thermoelectric, and motion effects, or by exploiting some other physical phenomenon.

Therefore, the harvested energy is practically free of cost and can ensure an enduring supply of energy.


From a practical point of view, to achieve both reliable and green communication, it is attractive to have a hybrid source of energy due to the intermittent nature of the harvested energy. A hybrid energy source is a combination of a constant energy source, e.g., power grid, diesel generator, etc., and an EH source which harvests energy from solar, wind, thermal, or electromechanical effects. The concept of hybrid energy sources has also drawn interest from industry [26].

In this thesis, we use a Markov decision process (MDP) to find the optimal solution. The MDP is the theoretical model of reinforcement learning for a single agent. The environment of the learning agent is modeled by the MDP. The MDP model is a tuple $\langle S, A, P_{ij}^{a}, RW_{ij}^{a} \rangle$, where $S$ is the discrete state set, $A$ is the discrete action set, $P$ is the matrix of state-action transition probabilities, and $P_{ij}^{a}$ is the probability of reaching state $j$ if action $a$ is taken in state $i$. $RW$ is the reward matrix, and $RW_{ij}^{a}$ is the expected immediate reward of reaching state $j$ if action $a$ is taken in state $i$. Fig. 1.2 shows an example of an MDP model. In Fig. 1.2, $s_0$, $s_1$ are the states, $a_0$, $a_1$ are the actions, and a pair of state and action is called a state-action rule (a rule). $RW_{ij}^{a}$ is the reward from rule $(i, a)$ to state $j$. The goal of the agent is to find a policy that maximizes the return, i.e., the expected sum of discounted rewards or the average reward [27].
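To make the notation concrete, the following is a minimal Python sketch of a two-state, two-action MDP in the spirit of Fig. 1.2; the transition probabilities and rewards are hypothetical values chosen only for illustration.

```python
# Minimal sketch of the MDP tuple <S, A, P, RW> in the spirit of Fig. 1.2.
# The numeric values are hypothetical and chosen only for illustration.
S = ["s0", "s1"]          # discrete state set
A = ["a0", "a1"]          # discrete action set

# P[i][a][j]: probability of reaching state j when action a is taken in state i
P = {
    "s0": {"a0": {"s0": 0.7, "s1": 0.3}, "a1": {"s0": 0.4, "s1": 0.6}},
    "s1": {"a0": {"s0": 0.2, "s1": 0.8}, "a1": {"s0": 0.5, "s1": 0.5}},
}

# RW[i][a][j]: immediate reward obtained when rule (i, a) leads to state j
RW = {
    "s0": {"a0": {"s0": 1.0, "s1": 0.0}, "a1": {"s0": 0.0, "s1": 2.0}},
    "s1": {"a0": {"s0": 0.5, "s1": 1.5}, "a1": {"s0": 2.0, "s1": 0.0}},
}

def expected_reward(i, a):
    """Expected immediate reward of taking action a in state i."""
    return sum(P[i][a][j] * RW[i][a][j] for j in S)

print(expected_reward("s0", "a1"))  # 0.4*0.0 + 0.6*2.0 = 1.2
```

Storing P and RW as nested dictionaries keeps the sketch close to the tuple notation; a real implementation would typically use matrices indexed by state and action.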


Fig.1.2 An example of MDP model

In our work, we aim to minimize the AC power used in the base station by using a hybrid supply of energy, maximizing the harvested power and minimizing the wasted energy, as depicted in Fig. 1.3:

Fig.1.3 Energy minimization for base station

(a) Minimize AC power usage in the packet buffer.

(b) Minimize wasted energy in the battery state.


Chapter 2 Hybrid Energy

2.1 Harvested Energy

The model describing the harvested energy depends on many parameters, such as weather conditions (e.g., sunny, cloudy, rainy), sunshine duration (e.g., day and night), and the behaviour of the rechargeable battery (e.g., storage capacity). Furthermore, the solar energy usually evolves in a smooth fashion over a short time period. We focus on modelling the solar power from measurements by using a hidden Markov chain, and construct a framework to extract the underlying parameters that can characterize the availability of solar power. We start with a toy example to motivate the proposed energy harvesting model. Consider a real data record of irradiance (i.e., the intensity of the solar radiation in units of μW/cm²) for the month of June from 2008 to 2010, measured at a solar site at Elizabeth City State University, with the measurements taken at five-minute intervals [28].

In Fig. 2.1(a), the time series of the irradiance is sketched over twenty-four hours for June 15th, 2010, along with the average results for the month of June in 2008 and 2010. We can make the following observations from this figure.

First, the daily solar radiation fluctuates slowly within a short time interval, but could suddenly change from the current level to adjacent levels with higher or lower mean values. Second, the average irradiance value is sufficiently high only from the early morning (seven o’clock) to the late afternoon (seventeen o’clock).


We refer to this time duration as the sunlight active region. Third, the evolution of the diurnal irradiance follows a very similar time-symmetric mask, whereas the short-term profiles of different days can be very different and unpredictable.

Consider the irradiance from seven o'clock to seventeen o'clock for June in 2008, 2009, and 2010. Fig. 2.1(b) shows the corresponding histogram plotted against the irradiance on the x-axis, which represents the percentage of occurrences of data samples in each bin of width 10³ μW/cm². It can be seen that the irradiance behaves like a mixture random variable generated by a number of normal distributions.

Fig.2.1 Toy examples of solar irradiance measured by a solar site in Elizabeth City State University. (a) Time series of the daily irradiance in June. (b) Histogram of the irradiance during a time period of seven o’clock to seventeen o’clock for the month of June from 2008 to 2010.

In fact, the solar radiation incident on an energy harvesting device is affected by its surrounding obstacles (e.g., clouds and terrain), which cause absorption, reflection, and scattering phenomena, and it can be approximately represented as a Gaussian random variable by the law of large numbers. These observations motivate us to model the evolution of the irradiance via a hidden Markov chain with a finite number of possible states, each of which is determined by a normal distribution with unknown mean and variance.

An $N_H$-state solar power harvesting hidden Markov model is illustrated in Fig. 2.2, where the underlying normal distribution for the $j$-th state is specified by the parameters of the mean $\mu_j$ and the variance $\rho_j$. The solar irradiance can be classified into several states $S_H$ to represent energy harvesting conditions such as 'Excellent', 'Good', 'Fair', 'Poor', etc. Without loss of generality, the solar states are numbered in ascending order of the mean values of the underlying parameters $\mu_j$. Let $S_H(t)$ be the solar state at time instant $t$. We further assume that the hidden Markov model is time homogeneous and governed by the state transition probability

$$a_{ij} = P\big(S_H(t) = j \mid S_H(t-1) = i\big).$$


Fig. 2.2 Gaussian mixture hidden Markov chain of the solar power harvesting model with the underlying parameters (μj, ρj) (NH = 4).
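As a rough illustration of this model, the sketch below simulates an N_H = 4 state Gaussian-mixture hidden Markov chain of irradiance; the transition matrix, means, and spreads are made-up placeholders, not parameters fitted to the cited measurements.

```python
import random

# Hypothetical parameters for an N_H = 4 state model ('Poor' .. 'Excellent').
# These numbers are placeholders, not fitted values from the cited data set.
means = [200.0, 2000.0, 5000.0, 9000.0]      # mean irradiance per state (uW/cm^2)
stds  = [100.0, 500.0, 1000.0, 1500.0]       # spread per state
trans = [                                    # a_ij = P(S_H(t)=j | S_H(t-1)=i)
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0],
    [0.0, 0.2, 0.7, 0.1],
    [0.0, 0.0, 0.3, 0.7],
]

def simulate_irradiance(steps, start_state=0):
    """Sample a sequence of irradiance values from the hidden Markov chain."""
    state, samples = start_state, []
    for _ in range(steps):
        # draw the observed irradiance from the current state's normal distribution
        samples.append(max(0.0, random.gauss(means[state], stds[state])))
        # move to the next hidden state according to the transition row
        state = random.choices(range(len(trans)), weights=trans[state])[0]
    return samples

print(simulate_irradiance(5))
```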

2.2 Green Energy Model

Solar panels generate electrical power by transforming solar radiation into direct-current electricity using semiconductors that exhibit the photovoltaic effect. Solar energy generation depends on various factors, such as the temperature, the solar intensity, and the geolocation of the solar panels. Moreover, the hourly solar energy generation can be estimated by using typical yearly meteorological weather data for a given geolocation. The System Advisor Model (SAM) [29] and the PVWatts model [30] are adopted to estimate the hourly solar energy generation. Fig. 2.3 shows the estimate of the hourly solar energy generation for four different months in New York City. In this evaluation, the nameplate capacity, φ, and the DC-to-AC derate factor, η, are set to 1 kWdc and 0.77, respectively. From the estimation, the solar panels start to generate energy from around 6:00 AM. The solar energy generation keeps increasing, peaks at around 1:00 PM, and ends at about 7:00 PM. The time period is divided into time slots, and the energy generation rate at the i-th time slot is derived using the SAM and PVWatts models.

The solar energy generation is assumed to change only at a time scale of several minutes, rather than considering the instantaneous solar energy fluctuations. This is because our algorithm optimizes the cell size of BSs at an interval of several minutes, and the instantaneous energy generation changes may not significantly affect the performance of our algorithm. Since the solar energy generation exhibits temporal dynamics, the available solar energy cannot always guarantee a sufficient energy supply to the BSs. BSs located in the same geographical area are assumed to experience almost the same weather environment, including solar intensity and temperature. Thus, the solar panels of all the BSs are assumed to have the same green energy generation rate.


Fig. 2.3 The solar energy generation per hour.
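The order of magnitude of such an estimate can be sketched with the highly simplified relation AC output ≈ φ · η · (irradiance / 1000 W/m²); this is only an illustration of the idea, not the actual SAM or PVWatts calculation, and the hourly irradiance profile used below is a made-up placeholder.

```python
# Simplified hourly generation estimate: P_ac ~= phi * eta * (irradiance / 1000 W/m^2).
# This sketches the idea only, not the SAM/PVWatts implementation, and the
# irradiance profile below is a made-up placeholder for a clear day.
PHI = 1.0    # nameplate capacity, kW-dc
ETA = 0.77   # DC-to-AC derate factor

# hypothetical plane-of-array irradiance per hour (W/m^2), 24 values
irradiance = [0]*6 + [100, 250, 450, 650, 800, 900, 950, 900, 800, 600, 400, 200, 50] + [0]*5

hourly_kw = [PHI * ETA * g / 1000.0 for g in irradiance]
print(f"peak output: {max(hourly_kw):.2f} kW at hour {hourly_kw.index(max(hourly_kw))}")
print(f"daily energy: {sum(hourly_kw):.2f} kWh")
```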


2.3 The GEO (Green Energy Optimization)

The GEO problem is to find the optimal pilot signal power of the BSs, which sets the coverage area of the BSs. Given the user distribution and the coverage areas of the BSs, the green energy allocation at each time slot should be optimized for individual BSs; for the network, the energy consumption among BSs should be balanced: the BSs with a larger amount of green energy should increase their coverage areas to absorb traffic from the BSs with less green energy. Accordingly, it is difficult to optimize the green energy utilization because the optimization involves two dimensions: the time dimension and the space dimension. Therefore, the GEO problem is decomposed into two sub-problems.

The first sub-problem is the MEA problem, whose objective is to optimize the green energy usage at different time slots to adapt to the temporal dynamics of the green energy generation and the mobile traffic. The second sub-problem is the MEB problem, which exploits the spatial dynamics of the mobile traffic and seeks to maximize the utilization of green energy by balancing the green energy consumption among BSs.

Solving the MEA problem has two benefits for minimizing the on-grid energy consumption. The first is that solving the MEA problem may diminish the number of energy gaps at individual BSs. The second is that solving the MEA problem limits the energy gaps at individual time slots. With a limited energy gap, the probability of filling the gap by solving the MEB problem increases.

Solving the MEB problem depends on several factors. Owing to the spatial diversity of the mobile traffic, the energy consumption of closely located BSs may exhibit great differences. Such uneven energy utilization may result in an underutilization of green energy. To maximize the utilization of green energy, the green energy consumption is balanced among BSs by adapting their cell sizes. The optimal cell sizes are chosen according to the amount of green energy and the mobile traffic demands. In general, BSs with a larger amount of green energy are given larger cell sizes. BSs adapt their cell sizes by changing the power of their pilot signals. Mobile users select BSs based on the strength of the pilot signals received from the BSs. Therefore, if a BS increases its cell size by increasing its pilot signal strength, the number of mobile users associated with the BS may increase, and thus the energy consumption of the BS may increase. In this way, the energy consumption among BSs is balanced. The solution to the MEB problem is to optimize the BSs' cell sizes at individual time slots, and thus to balance energy consumption among the BSs.

The GEO problem involves optimization in two dimensions: the time dimension and the space dimension. The optimization in the time dimension, the MEA problem, is to optimize the green energy utilization at each time slot for individual BSs, while the optimization in the space dimension, the MEB problem, is to balance the energy consumption among BSs. For illustrative purposes, consider the network scenario shown in Fig. 2.4. BS 1 and BS 2 are neighboring BSs but experience different traffic demands, and thus they consume different amounts of energy. The green energy generation is the same at both BSs: 5 units in the first time slot and 3 units in the second time slot. In the first time slot, there are three users in the network: user 1, user 2, and user 3. User 1 and user 2 are associated with BS 1, and consume 2 units and 3 units of energy from BS 1, respectively. User 3 is associated with BS 2, and consumes 1 unit of energy.

In the second time slot, there are four users in the network, and the new user, user 4, is associated with BS 1, and consumes 3 units of energy.

Three network operation strategies are compared:

1) With no optimization.

2) With only the optimization in space dimension.

3) With the optimization in both time and space dimensions.

For the first network operation strategy, BS 1 consumes zero units of on-grid energy in the first time slot. In the second time slot, three users are associated with BS 1 and the total energy consumption is 8 units. Since BS 1 only has 3 units of green energy, which is less than the energy consumption, BS 1 is powered by on-grid energy and consumes 8 units of on-grid energy. BS 2 consumes zero units of on-grid energy in both time slots. For the second operation strategy, BS 1 consumes zero units of on-grid energy in the first time slot. In the second time slot, since BS 1 does not have sufficient green energy, it reduces its coverage area and offloads user 2 to BS 2. After the offloading, the energy consumption of BS 1 is 5 units, which is larger than the amount of green energy in BS 1.

As a result, BS 1 is powered by on-grid energy and consumes 5 units of on-grid energy in the second time slot. Since BS 2 only consumes 1 unit of green energy in the first time slot, the available green energy at BS 2 in the second time slot is 7 units. Therefore, BS 2 has sufficient green energy and consumes no on-grid energy in both time slots. For the third operation strategy, BS 1 optimizes the green energy utilization in the time dimension. As a result, BS 1 allocates 3 units of green energy to the first time slot and 5 units of green energy to the second time slot. Then, in the first time slot, BS 1 reduces its coverage area and offloads user 2 to BS 2. The energy consumption of BS 1 becomes 2 units, which is less than the green energy allocation. Thus, BS 1 is powered by green energy, and consumes zero units of on-grid energy.

In the second time slot, user 2 is still associated with BS 2. The available green energy and the energy consumption at BS 1 are 6 units and 5 units, respectively. Therefore, BS 1 can be powered by green energy. Thus, by optimizing green energy utilization in both the time dimension and the space dimension, BS 1 consumes zero units of on-grid energy in both time slots. Although user 2 is offloaded to BS 2, BS 2 has adequate green energy to serve both user 2 and user 3. Therefore, the network consumes zero units of on-grid energy.

Hence, optimizing green energy utilization in both the time dimension and the space dimension reduces the on-grid energy consumption. Therefore, the GEO problem is decomposed into the MEA problem and the MEB problem [31].


Fig. 2.4 The rationale of the decomposition.
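The arithmetic of the toy example can be checked with a few lines of code. The sketch below recomputes the on-grid energy of the three operation strategies using the unit values from the text, under the assumed rule that a BS draws its whole demand from the grid in a slot whenever its available green energy (allocation plus carry-over) cannot cover that demand.

```python
# Recompute the on-grid energy of the toy example in Section 2.3.
# Assumed rule: in each slot a BS runs entirely on green energy if its available
# green energy (allocation plus carry-over) covers its demand, otherwise the
# whole demand in that slot is drawn from the grid.
def on_grid(demand, green):
    """demand[bs][slot], green[bs][slot] -> total on-grid energy units."""
    total = 0
    for bs in range(len(demand)):
        carry = 0
        for slot in range(len(demand[bs])):
            available = green[bs][slot] + carry
            if available >= demand[bs][slot]:
                carry = available - demand[bs][slot]
            else:
                total += demand[bs][slot]   # slot served by on-grid energy
                carry = available           # green energy left unused this slot
    return total

print(on_grid(demand=[[5, 8], [1, 1]], green=[[5, 3], [5, 3]]))  # 1) no optimization      -> 8
print(on_grid(demand=[[5, 5], [1, 4]], green=[[5, 3], [5, 3]]))  # 2) space dimension only -> 5
print(on_grid(demand=[[2, 5], [4, 4]], green=[[3, 5], [5, 3]]))  # 3) time + space         -> 0
```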


Chapter 3 Sleep Mode

3.1 Sleep Mode in LTE

During the past decades, mobile hand-held devices including cellular phones have become very popular. Moreover, to support both voice and high-bandwidth data services, new systems are being developed. IEEE 802.16e is one of the candidates for next-generation mobile networking. Originally, IEEE 802.16 was designed for fixed subscriber stations (SSs) [32]. On the other hand, the emerging IEEE 802.16e, which is presently under standardization, is an extension targeting service provision to Mobile Subscriber Stations (MSSs) [33]. As part of the mobility extension, 802.16e defines a handoff procedure and a sleep mode operation. In particular, the sleep mode operation for power saving is one of the most important features for battery-powered MSSs to extend their operational lifetime.

Under the sleep mode operation, an MSS initially sleeps for a fixed amount of time, called the sleep window, and then wakes up to check whether the base station (BS) has any buffered downlink traffic destined to it. If there is no such traffic, it basically doubles the sleep window size, up to the maximum sleep window size, and then checks with the BS when it wakes up again. The related operational parameters, including the initial and maximum sleep window sizes, can be negotiated between the MSS and the BS. On the other hand, if an MSS has packets to send on the uplink, it can wake up prematurely to prepare for the uplink transmission, i.e., a bandwidth request, and then the MSS can transmit its pending packets upon bandwidth allocation by its BS. There have been many studies that evaluate the power saving in various systems.


In [34], the authors evaluate the energy consumption of various access protocols for wireless infrastructure networks. To support short-lived traffic such as HTTP efficiently, a bounded-slowdown method similar to that of 802.16e has also been proposed for the 802.11 WLAN. The power saving mechanism for the 3G UMTS system is also evaluated in [35]. The sleep mode operation of 802.16e is modeled as a semi-Markov chain in order to analyze its performance quantitatively. With this model, the steady-state probability distribution is obtained in order to derive the packet delay and power consumption performance. The optimal values of the operational parameters related to the sleep mode operation, i.e., the initial and final sleep window sizes for a given packet arrival rate, are determined.

3.2 Sleep Mode Operation in IEEE 802.16e

An 802.16e MSS registered with a specific BS can be in one of two operational modes, namely awake mode and sleep mode. MSSs in the awake mode can send or receive data according to the Base Station (BS)'s scheduling.

On the other hand, MSSs in the sleep mode can be absent from the serving BS during pre-negotiated intervals. Before switching from the awake mode to the sleep mode, the MSS shall inform the BS using a sleep request message (MOB-SLP-REQ) and obtain its approval through a sleep response message (MOB-SLP-RSP) from the BS. After receiving a MOB-SLP-RSP message from the BS, the MSS can enter the sleep mode. The sleep mode involves two operational windows (i.e., time intervals), namely the sleep window and the listening window, and an MSS in the sleep mode basically switches between these two windows. During a sleep window, an MSS turns off most of its circuits in order to minimize energy consumption, and hence cannot receive or transmit any message.

If any packets intended for the MSS in the sleep mode arrive at the BS during the sleep window of the MSS, these packets are buffered so that they can be delivered to the MSS when it is awake in the future. During a listening window, an MSS synchronizes with its serving BS's downlink (i.e., BS-to-MSS) and listens for a traffic indication message (MOB-TRF-IND), which indicates whether there are any buffered packets destined to the MSS, to decide whether to stay awake to receive the pending packets or go back to sleep. The sleep window size basically increases binary-exponentially. That is, when an MSS enters the sleep mode from the awake mode, it first sleeps for the initial sleep window. Then, at the beginning of the listening window, the MSS wakes up to receive a MOB-TRF-IND, and if there is no packet buffered and destined to it, it doubles the sleep window size and sleeps until the next listening window.

Once the MSS has reached the final sleep window size, it shall continue its sleep mode without increasing the sleep window size any further. The values of both the initial and the final sleep window sizes (along with the listening window size) are determined during the MOB-SLP-REQ/MOB-SLP-RSP exchange. These sleeping-and-listening events repeat with updated sleep window sizes until the MSS is notified of buffered packets destined to it, at which instant the MSS enters the awake mode after completing the current sleep interval in order to receive the buffered packets. Fig. 3.1 illustrates the relationship among the different time intervals when an MSS is served by the serving BS. A and I represent packet-serving and idle MAC frames, respectively, and SWi and L represent the i-th sleep window and the listening window, respectively.

Fig. 3.1 A timing diagram of IEEE 802.16e sleep mode operation

T, TI, and TL represent the durations of a MAC frame, the initial sleep window, and the listening window, respectively. TF represents the final sleep window, although it is not shown in this figure. Finally, TTH is the idle frame threshold; the MSS enters the sleep mode from the awake mode when there is no traffic destined to it for the time interval of the idle frame threshold [36].
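A rough sketch of the binary-exponential sleep window behaviour described above is given below; the window sizes and the traffic-indication pattern are hypothetical examples, not values from the standard.

```python
# Sketch of the IEEE 802.16e binary-exponential sleep window growth described above.
# Window sizes (in MAC frames) and the traffic pattern are hypothetical examples.
INITIAL_WINDOW = 2      # negotiated initial sleep window
FINAL_WINDOW   = 16     # negotiated final (maximum) sleep window

def sleep_windows(traffic_indications):
    """traffic_indications[k] is True if the MOB-TRF-IND at the k-th listening window
    reports buffered downlink packets; returns the sleep window used before each check."""
    windows, current = [], INITIAL_WINDOW
    for has_traffic in traffic_indications:
        windows.append(current)
        if has_traffic:
            break                                  # MSS returns to awake mode
        current = min(2 * current, FINAL_WINDOW)   # double, capped at the final size
    return windows

print(sleep_windows([False, False, False, False, True]))  # [2, 4, 8, 16, 16]
```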

3GPP LTE power saving mechanisms have also been the subject of a number of research papers. In [37], it is argued that the DRX mode may reduce the mobile client energy consumption by 40-45% for video traffic and by 60% for VoIP traffic.

However, the case of HTTP traffic is concluded to be the most promising, where the respective energy consumption may be decreased by up to 95%.


This parameter also depends on the traffic arrival rate [38]. The effect of the DRX parameters on client energy consumption and the mean packet delay is studied in [39]. However, existing research works do not address a direct comparison between the sleep mode and the DRX mode.

3.3 IEEE 802.16m Sleep Mode Operation

The main principle of the IEEE 802.16m sleep mode algorithm is to consider mobile subscriber (MS) operation as a sequence of sleep cycles. A sleep cycle consists of two periods: an active period and a sleep period.

During its active period, an MS listens to the channel activity, whereas it turns off its radio part during a sleep period in order to reduce its power consumption. Fig. 3.2 shows an example of the sleep mode operation. In the figure, each sleep cycle starts with a listening interval during which the base station (BS) notifies the MS about downlink data to be transmitted. In case of no data, the MS initiates a sleep period, with the respective sleep cycle duration given by the following equation:

$$C_i = \min\big(2\,C_{i-1},\; C_{\max}\big)$$

where $C_i$ is the duration of the current sleep cycle, $C_{i-1}$ is the duration of the previous sleep cycle, and $C_{\max}$ is the maximum allowed duration of a sleep cycle.


When the BS has pending downlink data packets, their transmission starts immediately once the recipient MS enters the active state. The MS also resets its sleep cycle duration to the initial value. Consequently, the listening period increases, whereas the sleep period decreases. The listening period may grow until the end of the current sleep cycle. In the extreme case, this sleep cycle may have no sleep period at all.

After each frame in which it received some data, the MS continues listening to the channel activity. As such, the MS learns whether the BS has more pending data packets to transmit. If there are no more downlink packet transmissions during the inactivity timer countdown, the MS enters the sleep state until the end of the current sleep cycle. Furthermore, the BS may explicitly notify the recipient MS about its empty downlink buffer when it transmits the last data packet.


Fig. 3.2 The sleep mode operation


Chapter 4 Markov decision process (MDP)

4.1 Definition

Markov Decision Processes provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. MDPs are used in a variety of areas including robotics, automated control, planning, economics, and manufacturing. Various approaches exist for solving MDPs.

Value Iteration is an algorithm which falls under the class of dynamic programming methods and can be used to solve MDPs.

In recent years, graphics processing units have been evolving rapidly from being very limited processing devices with the sole purpose of accelerating certain parts of the graphics pipeline into fully programmable and powerful parallel processing units.

There are various algorithms to compute the optimal policy in an MDP. The three most commonly used are linear programming, value iteration, and policy iteration. A key component in all three algorithms is the computation of value functions, i.e., a value for each state x in the state space. With an explicit representation of value functions as a vector of values for the different states, the solution algorithms can all be implemented as a series of simple algebraic steps.


Markov Decision Processes provide a mathematical framework for decision making. They are widely used for solving real-world problems of both planning and control, as they are surprisingly capable of capturing the essence of purposeful activity in a variety of situations. For those reasons they have formed the basis on which many important studies in the fields of learning, planning, and optimization have been built. As a result, several different techniques have been developed for their solution.

Two algorithms which fall under the category of Dynamic Programming and have been successfully applied to solving MDPs are Policy Iteration and Value Iteration. These Dynamic Programming methods perform sweeps through the state space and do a full backup operation on each state. Each backup updates the value of a single state based on the values of all possible successor states and the probabilities of ending up in those states. Because these dynamic programming methods require complete sweeps of the whole state space, they are often considered impractical for very large problems. However, comparing them to other methods for solving MDPs, we notice that they are actually quite efficient, and that they are guaranteed to find an optimal policy in polynomial time. They are also better suited for handling large state spaces than competing methods such as direct search and linear programming (Sutton & Barto, 1998). Another attractive quality of the Dynamic Programming algorithms is that they do not require the states to be backed up in any particular order, or equally often, in order to converge. Thereby, they leave open the opportunity of using different approaches to sweeping the state space.


Markov Decision Processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. They originated in the study of stochastic optimal control in the 1950s and have remained of key importance in that area ever since. Their theory has continued to develop over the last decades to fit a broader spectrum of problems and has led to a wealth of common algorithmic ideas and theoretical analysis. Today MDPs are used in a variety of areas, including robotics, automated control, planning, economics, and manufacturing.

An MDP consists of an agent and an environment that the agent interacts with. These interactions happen over a sequence of discrete time steps t; at each time step t the agent perceives the state of the environment st and selects an action at to perform. The environment reacts to the action by making a transition to a new state st+1 and returns a scalar reward. The agent's goal is to maximize the total amount of reward it receives from its interactions with the environment. The dynamics of the environment are stationary, and the state signal must contain all relevant information but is otherwise unconstrained [40].

Uncertainty is a pervasive element of many models in a variety of fields, from computer science to engineering, from operational research to economics, and many more. It is often necessary to solve problems or make decisions without comprehensive knowledge of all the relevant factors and their possible future behavior. In many situations, outcomes depend partly on randomness and partly on an agent's decisions, with some sort of time dependence involved. It is then useful to build a framework to model how to make decisions in a stochastic environment, focusing in particular on Markov processes. The latter are characterized by the fact that the way they evolve in the future depends only on their present state, such that each process is independent of any events from the past. A variety of important random systems can be modeled as Markov processes, including, but not limited to, biological systems and epidemiology, queuing systems, and financial and physical systems.

Due to the pervasive presence of Markov processes, the framework to analyze and treat such models is particularly important and has given rise to a rich mathematical theory. This report aims to introduce the reader to Markov Decision Processes (MDPs), which specifically model the decision-making aspect of problems of a Markovian nature, the methods used in the past, and the different perspectives they arise from.

As previously mentioned, the goal is to equip the reader with the necessary notions to understand the general framework of MDPs and some of the work that has been developed in the field. Note that Puterman's book on Markov Decision Processes [41], as well as the relevant chapter in his previous book [42], are standard references for researchers in the field. For readers new to the topic, Introduction to Operational Research by Hillier and Lieberman [43] is a well-known starting textbook in O.R. and may provide a more concise overview of the topic. In particular, they also offer a very good introduction to Markov processes in general, with some specific applications and relevant methodology. A more advanced audience may wish to explore the original work done on the matter. Bellman's [44] work on Dynamic Programming and recurrence set the initial framework for the field, while Howard [45] had a fundamental role in developing the mathematical theory. References on specific aspects are provided later in the relevant sections.

Finally, due to space restrictions, and to preserve the flow and cohesion of the report, applications will not be considered in detail. A renowned overview of applications can be found in White's paper, which provides a valuable survey of papers on the application of Markov decision processes, "classified according to the use of real life data, structural results and special computational schemes" [46]. Although the paper dates back to 1993 and much research has been developed since then, most of the ideas and applications are still valid, and the reasoning and classifications presented support a general understanding of the field. Puterman's more recent book [47] also provides various examples and points to relevant research areas and publications.

Although there are different formulations for MDPs, they all show the same key aspects. In this section, we follow the notation used by Puterman, which provides a fairly concise but rigorous overview of MDPs.

To start with, we have "a system under control of a decision maker [which] is observable as it evolves through time". A few characteristics distinguish the system at hand, as shown in Fig. 4.1:


Fig. 4.1 MDP structure

• A set T of decision epochs or stages t at which the agent observes the state of the system and may make decisions. Different characteristics of the set T will lead to different classifications of processes (e.g. finite/infinite, discrete/continuous structure).

• The state space S, where St refers to the states at a specific time t.

• The action set A, where in particular As,t is the set of possible actions that can be taken after observing state s at time t.

• The transition probabilities, determining how the system will move to the next state. Indeed, MDPs owe their name to the transition probability function, as this exhibits the Markov property. In particular, $p_t(j \mid s, a)$ gives the probability of a transition to state $j$ at time $t+1$, and depends only on the state $s$ and the chosen action $a$ at time $t$.

• The reward function, which determines the immediate consequence of the agent's choice of action a while in state s. In some cases, the value of the reward depends on the next state of the system, effectively becoming an "expected reward". Following simple probability rules, this can be expressed as:

$$r_t(s, a) = \sum_{j \in S} p_t(j \mid s, a)\, r_t(s, a, j)$$

where $r_t(s, a, j)$ is the relevant reward in case the system will next be in state $j$.

A Markov Decision Process can then be defined by the quintuple $(T, S_t, A_{s,t}, p_t(j \mid s, a), r_t(s, a))$, with distinctions between types of MDPs relying on different assumptions. Note that the above serves as a general definition for the reader to grasp the key aspects of any MDP, and to understand the reasoning behind the main approaches used, as presented in Section 4.2.

4.2 Algorithms

The objective of MDPs is to provide the decision maker with an optimal policy $\pi: S \to A$. Policies are essentially functions that specify, for each state, which action to perform. An optimal policy will optimize (either maximize or minimize) a predefined objective function, which will aim to achieve different targets for different formulations. Then, the optimization technique to use depends on the characteristics of the process and on the "optimality criterion" of choice, that is, the preferred formulation for the objective function. MDPs with a specified optimality criterion (hence forming a sextuple) can be called Markov decision problems. Although some literature uses the terms process and problem interchangeably, in this report we follow the distinction above, which is consistent with the work of Puterman referenced earlier. For simplicity, we present the algorithms assuming the state and action spaces, S and A, are finite.

Note that most concepts are applicable, given relevant adaptations, to general cases too; more details can be found in the given references.

In most situations, a policy is needed that maximizes some cumulative function of the rewards. A common formulation is the expected discounted sum over the given time horizon, which may be finite or infinite. The following formulation can be used:

$$\mathbb{E}\left[\sum_{t=0}^{h} \gamma^{t} r_t\right]$$

where $0 \le \gamma < 1$ is the discount rate. Note that h will in fact be infinity in the infinite-horizon case. The formula can also be adapted to situations where the reward depends not only on the time, but also on the current or future state of the system, the action chosen, the policy, or all of the above.

An important hypothesis, although still unproven, unifies all goals and purposes to the form given above, stating they may all be formulated as a maximization of a cumulative sum of rewards [48].

A variety of methods have been developed over the years. Among these, exact methods work within the linear programming and the dynamic programming frameworks. We focus on the latter, and present in the following section the two most influential and common exact methods available, namely value iteration and policy iteration. At the end of the chapter we then consider an approximate approach for cases where the model is not fully known. In both cases, we only provide a brief conceptual overview of the approaches considered.

4.3 Exact Methods: Dynamic Programming

As mentioned before, MDPs first developed from Bellman's work on dynamic programming, so it's not surprising that they can be solved using techniques from this field.

First, a few assumptions need to be made. Let the state transition function P and the reward function R be known, so that the aim is to obtain a policy $\pi$ that maximizes the expected discounted reward. Then let us define the value $V^{\pi}(s)$ for policy $\pi$ starting from state $s$ as:

$$V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{h} \gamma^{t} r_t \,\middle|\, s_0 = s\right]$$

which gives the overall expected value of the chosen policy from the current to the final state (note that in this case we assume a finite horizon of length h).

The standard algorithms proceed iteratively (although versions using systems of linear equations exist) to construct the following two vectors, defined as follows:

• the optimal actions,
$$\pi(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\left(R(s, a, s') + \gamma V(s')\right)$$

• the discounted sum of the rewards,
$$V(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left(R(s, a, s') + \gamma V(s')\right)$$

Note that V(s) is the iterative version of the so-called Bellman equation, which determines a necessary condition for optimality. The main DP algorithms to solve MDPs differ in the order in which they repeat such steps, as we briefly see in Sections 4.3.1 and 4.3.2. In both cases, what matters is that, given certain conditions on the environment, the algorithms are guaranteed to converge to optimality.

4.3.1 Value iteration

First proposed by Bellman in 1957, the value iteration approach, also called backward induction, does not compute the policy function separately. Instead, the value of $\pi(s)$ is calculated within $V(s)$ whenever it is needed.

This iterative algorithm calculates the expected value of each state using the values of the adjacent states until convergence (that is, until the improvement in value between two consecutive iterations is smaller than a given tolerance). As is usual in iterative methods, smaller tolerance values ensure higher precision in the results.

Algorithm 1 follows the logic shown below and terminates when the optimal value is obtained.
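A minimal value-iteration sketch over a generic finite MDP <S, A, P, RW> is shown below; the discount factor and stopping tolerance are arbitrary illustrative choices, and the dictionaries follow the format of the sketch in Chapter 1.

```python
# Minimal value-iteration sketch for a finite MDP <S, A, P, RW>.
# P[s][a][s2] is a transition probability, RW[s][a][s2] an immediate reward.
def value_iteration(S, A, P, RW, gamma=0.9, tol=1e-6):
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # Bellman backup: best expected one-step reward plus discounted future value
            best = max(sum(P[s][a][s2] * (RW[s][a][s2] + gamma * V[s2]) for s2 in S)
                       for a in A)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:            # stop when the sweep changed no value by more than tol
            break
    # greedy policy extracted from the converged value function
    policy = {s: max(A, key=lambda a: sum(P[s][a][s2] * (RW[s][a][s2] + gamma * V[s2])
                                          for s2 in S)) for s in S}
    return V, policy

# Usage: V, pi = value_iteration(S, A, P, RW) with the dictionaries from the Chapter 1 sketch.
```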

4.3.2 Policy iteration

The body of research developed by Howard first arose from the observation that a policy often becomes exactly optimal long before the value estimates have converged to their correct values. The policy iteration algorithm focuses more on the policy in order to reduce the number of computations needed whenever possible. First, an initial policy is chosen, often by simply maximizing the overall policy value using the rewards of states as their values. The resulting iterative procedure is expressed in Algorithm 2:


Two steps follow:


1. Policy evaluation, in which we calculate the value of each state given the current policy, until convergence;

2. Policy improvement, in which we update the policy using eq. (3) as long as an improvement is possible (a sketch of this procedure follows below).
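A simplified sketch of this two-step procedure, in the same notation as the value-iteration sketch, is given below; it is an illustration of the idea rather than a reproduction of Algorithm 2.

```python
# Simplified policy-iteration sketch in the same <S, A, P, RW> notation as above.
def policy_iteration(S, A, P, RW, gamma=0.9, tol=1e-6):
    policy = {s: A[0] for s in S}            # arbitrary initial policy
    V = {s: 0.0 for s in S}
    while True:
        # 1) policy evaluation: iterate the value of the current policy to convergence
        while True:
            delta = 0.0
            for s in S:
                a = policy[s]
                v = sum(P[s][a][s2] * (RW[s][a][s2] + gamma * V[s2]) for s2 in S)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # 2) policy improvement: act greedily with respect to the evaluated values
        stable = True
        for s in S:
            best = max(A, key=lambda a: sum(P[s][a][s2] * (RW[s][a][s2] + gamma * V[s2])
                                            for s2 in S))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:                           # terminate when the policy no longer changes
            return policy, V
```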

The algorithm terminates when the policy stabilizes. A special class of MDPs, called partially observable Markov decision processes (POMDPs), deals with cases where the current state is not always known. Although these are outside the scope of this project, we refer the reader to a noteworthy online tutorial [49], which provides both a simplified overview of the subject and references to the key publications in the area.

Another kind of uncertainty arises when the probabilities or rewards are unknown. In these situations, the ideas presented can be used to develop approximate methods. A popular field concerned with such a framework, especially in artificial intelligence, is that of reinforcement learning. This section is based on the publication of Sutton and Barto, whose work is an essential piece of research in the field and introduces RL algorithms. For a more extensive treatment of RL algorithms, the work of Bertsekas and Tsitsiklis [50] is a standard reference.

Reinforcement learning methods often rely on representing the policy by a state-action value function

$$Q(s, a) = \sum_{s'} P(s' \mid s, a)\left(R(s, a, s') + \gamma V(s')\right)$$

The policy from before is then just

$$\pi(s) = \arg\max_{a} Q(s, a)$$

Function Q essentially describes the scenario where we choose action a, and then either continue optimally or according to the current policy. Although this function is also unknown, the key to reinforcement learning techniques is their ability to learn from experience. In practice, that corresponds to exploiting the information from the past and upcoming states of the system, that is, from the triplets (s, a, s'). Algorithms similar to value iteration can then be performed.

Define a Q-learning update to be

$$Q(s, a) \leftarrow Q(s, a) + \alpha\left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)$$

Fig. 4.2 shows the model of a reinforcement learning system. First, the decision process observes the current state and reward; then the decision process performs an action that affects the environment. Finally, the environment returns the new state and the obtained reward [51].


Figure 4.2 Model of a reinforcement learning system
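A rough tabular Q-learning sketch of the update defined above is shown below; the toy environment, learning rate, exploration rate, and episode structure are hypothetical placeholders.

```python
import random

# Tabular Q-learning sketch of the update defined above. The toy environment,
# learning rate, exploration rate, and episode structure are hypothetical placeholders.
def q_learning(S, A, step, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = {(s, a): 0.0 for s in S for a in A}
    for _ in range(episodes):
        s = random.choice(S)
        for _ in range(20):                       # bounded episode length
            # epsilon-greedy choice between exploration and the current greedy action
            a = random.choice(A) if random.random() < epsilon else max(A, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)                    # environment returns (next state, reward)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a2)] for a2 in A) - Q[(s, a)])
            s = s2
    return Q

# Toy two-state environment: action "a1" tends to reach the rewarding state "s1".
def toy_step(s, a):
    s2 = "s1" if a == "a1" and random.random() < 0.8 else "s0"
    return s2, (1.0 if s2 == "s1" else 0.0)

print(q_learning(["s0", "s1"], ["a0", "a1"], toy_step))
```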


Chapter 5 System Model

In this work, the base station, which connects a wireless sensor network (WSN), adopts hybrid energy. The hybrid energy consists of a constant energy source and a solar energy harvester. To minimize the power consumption of the constant energy source in the base station, a Markov decision process (MDP) model is used. In this model, one buffer transmits a given number of data packets over finite transmission time intervals. To find the optimal transmit power for every time interval, the MDP model is solved iteratively. In this scenario, the harvested solar energy is not used if it is not enough to transmit the packets in the buffer. This power consumption policy allows the harvested energy to be used efficiently, but its drawback is a higher probability of dropping packets, since the buffer does not use the harvested energy immediately. The numerical results indicate that the proposed scenario can reduce the average AC power usage by up to 50%.

5.1 Hybrid Model

The hybrid energy consists of two sources. The first source is a constant energy source, e.g., AC power or a diesel power generator. The second source is the solar energy harvester [2]. The architecture of our model is shown in Fig. 5.1, where the base station (BS) is connected to both kinds of power.


Figure 5.1 Hybrid architecture


In Fig. 5.2, two types of power are used in this model. In addition, every packet needs one battery unit or one AC power unit to be transmitted. To minimize AC power usage with minimum packet drops, the decision of whether to use AC and solar power immediately depends on the MDP model. The MDP model finds the optimal solution using the reward function for the proposed scenario.

Figure 5.2 Hybrid model


5.2 MDP Model

In this model, the MDP <S_SA, A, P, R> provides the decision maker with an optimal policy [3], where S_SA is the state of the queue buffer (using solar harvesting and AC power), A is the action for transmitting packets, and P is the transition probability, which depends on the state and the action. To achieve the target of this model, the reward function R finds the minimum cost value (AC power usage).

Our MDP model is defined as follows:

<S_SA, A, P, R>

S_SA: state of the queue (using solar and AC power)
A: action
P: transition probability (the probability for the battery buffer differs from that of the packet buffer Q)
R: reward (minimize AC power by maximizing utilization of the battery)

1) First, we define the battery capacity.
2) The battery states of the base station are defined in N_B levels: B = {0, 1, ..., N_B - 1}.
3) Second, we define the queue capacity.
4) The queue (Q) states of the base station are defined in N_Q levels: Q = {0, 1, ..., N_Q - 1}.

Set of actions: A = {0, 1, ..., N}


A_A: action for using AC power
A_S: action for using solar power
A = A_S × A_A = {(T_S, T_A)}

• When A_A = 0, no AC power is used, only solar power.
• When A_S = 0, no solar power is used, only AC power.
• When A_S ≠ 0 and A_A ≠ 0, solar power and AC power are used together (hybrid).

Example: when A = {0, 1, 2} is the same for solar and AC power, then:

A = {(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)}

Here (0,0) means no transmission, pairs of the form (0, T_A) use AC power only, pairs of the form (T_S, 0) use solar power only, and pairs with both components nonzero use hybrid power, as enumerated in the sketch below.
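The following is a small sketch of this enumeration; it lists the composite action set A = A_S × A_A for the example A_S = A_A = {0, 1, 2} and classifies each pair according to the description above.

```python
from itertools import product

# Enumerate the composite action set A = A_S x A_A for the example A_S = A_A = {0, 1, 2}
# and classify each pair (T_S, T_A) following the description above.
def classify(ts, ta):
    if ts == 0 and ta == 0:
        return "no transmit"
    if ts == 0:
        return "AC only"
    if ta == 0:
        return "solar only"
    return "hybrid"

actions = list(product(range(3), range(3)))   # (T_S, T_A) pairs
for ts, ta in actions:
    print((ts, ta), classify(ts, ta))
```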

Each action has four possible transitions with different transition probabilities, corresponding to the combinations of whether a solar energy unit is harvested and whether a new packet arrives.


5.3 Practical Scenarios

In this thesis, we use the MDP optimal solutions in two practical scenarios, as follows:

• The first scenario: packets are received before the action of transmission begins. Moreover, the use of the harvested energy may be delayed if the received solar energy is not enough to transmit the packets in the buffer.

• The second scenario: packets are also received before the action of transmission begins, but the harvested solar energy is used immediately to transmit packets.

Fig. 5.3 shows when each scenario uses the solar energy:

Figure 5.3 Practical scenarios

a) Scenario 1, with K transmission intervals, the probability of incoming harvested energy, and the harvested energies H_K.

b) Scenario 2 is the same as Scenario 1, but the harvested energy is used immediately to transmit packets.

The table below explains the difference between the two scenarios.

We assume that the harvested energy units and packet arrivals occur after the packet transmission action.

The transition probability of the buffer for the simple case equals the probability of harvesting solar power multiplied by the probability of receiving a new packet, as illustrated in the sketch below. The states (2 by 2) of the packet arrival model and energy harvesting unit, with actions for the first and second scenarios, are expressed in Figures 5.4 and 5.5:
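For this simple case the product rule can be written out directly; the harvesting and arrival probabilities below are hypothetical placeholder values.

```python
# Joint buffer transition probabilities for the simple case: the probability of
# harvesting a solar unit times the probability of a new packet arrival.
# p_harvest and p_arrival are hypothetical placeholder values.
p_harvest = 0.6   # probability that one solar energy unit is harvested in an interval
p_arrival = 0.4   # probability that one new packet arrives in an interval

transitions = {
    ("harvest", "arrival"):       p_harvest * p_arrival,
    ("harvest", "no arrival"):    p_harvest * (1 - p_arrival),
    ("no harvest", "arrival"):    (1 - p_harvest) * p_arrival,
    ("no harvest", "no arrival"): (1 - p_harvest) * (1 - p_arrival),
}
assert abs(sum(transitions.values()) - 1.0) < 1e-9  # the four cases cover all outcomes
print(transitions)
```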


Figure 5.4 States (2 by 2) for first scenario


Figure 5.5 States (2 by 2) for second scenario


The actions and transition probabilities are explained in the following table:

