A Game-Theoretic Analysis of Random Access Control in Energy-Harvesting Wireless Sensor Networks
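Chinese Abstract (English translation)

Traditional wireless sensor networks are powered by batteries. Once the batteries are exhausted, the network becomes unusable until they are replaced. In some applications, however, such as structural monitoring of buildings, replacing batteries is nearly infeasible. To address this problem, attention has turned to energy-harvesting wireless sensor networks, which draw energy from the environment. In this thesis, we build a theoretical model of random access control for energy-harvesting wireless sensor networks and study the problems that arise when many sensor devices compete for limited transmission resources. Because devices selfishly maximize their own utility, every device chooses to transmit regardless of overall system performance, which drives the system to its worst case. We propose two incentive mechanisms, a pricing mechanism and an intervention mechanism, to keep the system out of the worst case. Both mechanisms induce the devices to choose the strategy that is best for the system, so that the system reaches a social optimal or proportionally fair allocation. In the final part of the thesis, we study an extended model in which devices may store energy. We find that a device decides whether to save energy for the future according to the energy-harvesting probability of each period. Once the system reaches equilibrium, devices choose a higher transmission probability in periods with a higher energy-harvesting probability.

Keywords: game theory, Nash equilibrium, energy-harvesting wireless sensor networks, incentive mechanism, social optimal, proportional fairness.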
Abstract

Traditional wireless sensor networks (WSNs) are powered by batteries. Once the batteries run out, the devices are useless until the batteries are replaced. For some applications, such as building structure monitoring, replacing the batteries is nearly impossible. To overcome this problem, attention has turned to energy-harvesting (EH) WSNs, which harvest energy from the environment. In this work, we construct theoretical models in which devices compete for limited transmission resources. Since the devices are selfish, they all choose to transmit regardless of the others' strategies, which leads to severe network congestion. We propose two incentive mechanisms, a pricing scheme and an intervention scheme, that keep the system out of this worst case. The incentive schemes can induce the desired optimal outcomes, which maximize either the social welfare or the proportional fairness. In the last part, we build an extended model in which energy can be stored for the future. We show that a device is more likely to save energy for a period in which the energy-harvesting probability is comparatively low, and that devices choose a higher transmission probability in a period in which the energy-harvesting probability is comparatively high.

Keywords: game theory, Nash equilibrium, energy-harvesting WSNs, incentive mechanism, social optimal, proportional fairness.
Contents

Master Thesis Certification by Oral Defense Committee
Chinese Abstract
Abstract
Chapter 1 Introduction
Chapter 2 Related Works
  2.1 Energy Issue in WSNs
  2.2 Medium Access Control Game
  2.3 Contribution
Chapter 3 One-Shot Models
  3.1 The Random Access Control (RAC) Model
  3.2 The RAC Game: Definition and Solution Concept
  3.3 Nash Equilibrium: A Potential Game Approach
  3.4 The Energy-Harvesting RAC Game
  3.5 Nash Equilibrium in Energy Harvesting RAC Game
Chapter 4 Incentive Mechanisms
  4.1 Pricing Scheme
  4.2 Intervention Function
Chapter 5 The Desired Outcome
  5.1 The Proportionally Fair Outcome
  5.2 The Social Optimal Outcome
  5.3 Adopting the Incentive Schemes to Achieve the Optimal Outcomes
Chapter 6 Extension: Multi-Period Model
  6.1 Model Setting
  6.2 The Probability of Harvesting Energy
  6.3 Nash equilibrium: Intersection of Best Response
  6.4 Case Study
Chapter 7 Numerical Results
  7.1 System Performance With Different Parameters
  7.2 Simulation of The 2-Period 2-Device EH-RAC Game
  7.3 Extended Simulations of Multi-period Multi-device EH-RAC Game
Chapter 8 Conclusion
Bibliography
List of Figures

1.1 The comparison between battery-powered WSNs and WSN-HEAP.
3.1 The Nash equilibrium is the point that maximizes the potential function.
4.1 The equilibrium under the pricing scheme is nearly identical to the target one.
4.2 The equilibrium under the intervention scheme is exactly identical to the desired one.
5.1 An example of social optimal outcome: device 2 has to choose s2 = 1 and device 1 has to choose s1 = 0.
5.2 The social optimal outcome leads to the highest social welfare, compared with other outcomes that are randomly generated.
7.1 The social optimal outcome and the proportionally fair outcome under different choices of parameters.
7.2 The value of social welfare changes with the number of transmission resource and the number of resource.
7.3 The value of fairness index changes with the number of transmission resource and the number of resource.
7.4 The utility function u1 is a concave function increasing along the strategy s1.
7.5 The Nash equilibrium in the multi-period model.
7.6 The energy relation of two-period model.
7.7 The difference of equilibrium strategy and utility of device i with respect to the harvesting probabilities.
7.8 The difference of equilibrium strategy and utility of device j with respect to the harvesting probabilities.
7.9 The difference of equilibrium strategy and utility of device j with respect to the number of device.
7.10 If the EH probability ratio is increasing, the devices with higher EH probability at period 1 will choose a higher transmission probability.
7.11 If the EH probability ratio is decreasing, the devices with lower EH probability at period 1 will choose a higher transmission probability.
Chapter 1 Introduction

Owing to advances in the miniaturization of electronic devices, wireless sensor networks (WSNs) have drawn increasing attention in recent years. WSNs fall into two types according to how the sensors are powered. The first type is the battery-powered WSN, where the battery energy can only deplete over time. The second type is the energy-harvesting WSN, where each sensor uses renewable energy to maintain its operation and the stored energy usually remains at a low level to avoid leakage [1]. In both types, proper energy management is essential to maximize each sensor's utility, such as its transmission success rate. Because the energy level of a harvesting sensor typically fluctuates at a low level (Fig. 1.1), modeling energy-harvesting sensors requires knowledge of the instantaneous harvesting probability and of the transmission policy.

[Figure 1.1: The comparison between battery-powered WSNs and WSN-HEAP. Snapshot of stored energy versus time ($t_0$ to $t_6$) for an energy-harvesting device (discharge and recharge phases) and a battery-powered device.]

In a model for energy-harvesting WSNs, the transmission policy of an energy-harvesting device should take the harvesting probability into account. Take Fig. 1.1 as an example. At time $t_0$, the device successfully harvests a unit of energy and transmits information packets in the same time slot. (We assume that one unit of energy suffices to send the packets that carry one unit of information.) At time $t_1$, the device harvests a unit of energy but transmits in the next time slot, $t_1 + 1$. In general, the device should choose the transmission policy that maximizes its probability of successful transmission, given the probability of harvesting energy.

In this work, we use game theory to build models for energy-harvesting WSNs. Transmission resources are limited, so the devices have to compete with each other. In the first model, we consider a battery-based WSN in which devices do not harvest energy. Each device chooses the transmission policy that maximizes its own probability of successful transmission. In the second model, our main model, devices harvest energy from the environment, and each device maximizes its own probability of successful transmission based on its knowledge of the energy-harvesting probabilities. In the third model, we extend the energy-harvesting model to multiple stages. The devices can store energy for the future and take the future benefit of transmission into account.

Energy harvesting can be implemented on most electronic devices. We focus on WSNs in this work because of their well-known problem: sensor batteries are easily depleted and difficult to replace, which is exactly the problem that energy harvesting aims to solve. In addition, sensors are typically small and equipped with a small energy capacity, which matches the setting of our models. Therefore, to stay focused on the energy-harvesting issue, we discuss only energy-harvesting WSNs throughout this work.
Chapter 2 Related Works

2.1 Energy Issue in WSNs

Energy management plays an important role in wireless sensor networks (WSNs). Pantazis et al. survey power control in WSNs [2]. They classify power conservation mechanisms into two categories, passive and active. Passive mechanisms reduce the energy consumption of a sensor node by turning off its transceiver when there is no transmission, whereas active mechanisms improve the node's operation instead of switching the radio module into a power-saving mode.

Compared with traditional WSNs, energy-harvesting WSNs have attracted growing attention in recent years. Energy-harvesting devices differ from traditional ones in two fundamental ways [1]. First, the energy source is different: traditional devices aim to use the battery energy efficiently, while energy-harvesting devices aim to use the harvested energy smartly. Second, the energy capacity is different: traditional devices typically have a large-capacity battery to reduce the replenishment cost, while energy-harvesting devices are equipped with small-capacity storage to minimize the size of the device.

Kansal et al. propose an analytic model that characterizes the power management of energy-harvesting WSNs [3][4]. The authors propose energy-neutral operation, in which the energy used never exceeds the energy harvested. Given that the energy-neutral condition is met, they also seek the optimal network performance. Seyedi and Sikdar propose a Markov model for energy-harvesting nodes and derive closed-form expressions for the loss probability and the average time until the energy runs out [5]. Such analytic results give engineers a useful guideline for designing protocols for energy-harvesting WSNs. Niyato et al. develop a multidimensional discrete-time Markov chain to model the channel, the solar radiation, and the packet arrival [6]. The authors then use the Nash bargaining solution to obtain the optimal sleep and wake-up strategy. Lei et al. propose a generic Markov-chain model for energy harvesting [7]. They derive the optimal transmission policy by which a node decides whether to transmit: the node transmits if the value of the transmission exceeds a threshold that depends on its current energy state. Susu et al. present a stochastic framework for energy-harvesting WSN nodes [8], which enables designers to assess statistical system performance such as operation time or lifetime. Ho et al. propose a generalized Markovian (GM) model that introduces an additional parameter to capture the non-stationary properties of the energy-harvesting environment [9]. In their empirical experiments, the GM model outperforms the stationary Markovian model. Sharma et al. study the optimal policy for energy-harvesting nodes. The generated data bits and the replenished energy are each modeled as independent and identically distributed random variables. After deriving the necessary condition for stability, the authors construct a throughput-optimal policy and a delay-optimal policy.

However, conserving energy is not always beneficial, since energy storage units have limited capacity and are prone to leakage. The more energy is stored in the storage unit, the more energy leaks away. Zhu et al. formulate the leakage problem and implement leakage-aware feedback control techniques to make effective use of energy that would otherwise leak away [10].
2.2 Medium Access Control Game

Akkarajitsakul et al. provide a broad survey of game-theoretic models for multiple access [11]. Yang et al. construct a non-cooperative game for CSMA/CA networks and design an adaptive price setting to achieve the desired outcome [12]. Cui et al. consider multiple contention measure signals in the random access game model [13]. Their work also studies the dynamics of the random access game, including best response, gradient play, and Jacobi play. Chen et al. propose a dynamic game model for contention control [14], in which each node chooses its own transmission probability to maximize its own utility. Chowdhury et al. present a game-theoretic model for contention control in IEEE 802.16/WiMax networks [15]. The game considers a saturated network in which nodes always have packets to transmit. To ensure that a unique Nash equilibrium exists, the authors design a special form of utility function.

Since the equilibrium is not necessarily the best outcome, incentive schemes are used to achieve the best outcome. Park et al. have designed a series of new incentive schemes based on intervention to induce the target outcome [16]. Pricing schemes, repeated interaction, and intervention schemes are the well-known ways of providing incentives to users. However, charging a price requires a secure and reliable process between the manager and the user, which burdens both sides. Repeated interaction, in turn, is hard to implement because users change frequently in mobile networks and fixed interactions are hard to sustain. Intervention schemes impose the intervention directly on the users, which avoids the problems of pricing schemes and repeated interaction. In addition to the general intervention schemes in [16], the authors give an example that applies intervention schemes to the medium access control game in [17].
2.3 Contribution

In this work, we construct game-theoretic models for energy-harvesting WSNs in which several devices compete for limited transmission resources (e.g., wireless channels). The models generalize previous models that consider only a single transmission resource, and they apply to many promising applications. For example, in machine-type communication, wireless machines contend on random access channels (RACH) for a dedicated transmission channel [18]. Typically, the number of RACH resources is smaller than the number of machines. Only the machines that receive grants from the base station (BS) can start transmission. However, if more than one machine is granted the same resource, the machines collide and their transmissions fail. Besides collisions, a machine also has to consider the energy-harvesting probability when determining its transmission policy. Similar problems arise in many other wireless applications.

We generalize the medium access control (MAC) game model proposed in [17] and the one in [12]. Compared with those models, ours additionally consider energy harvesting and more than one transmission resource that the devices can request. We derive the proportionally fair outcome and the social optimal outcome, and adopt two incentive schemes to achieve them. Finally, we extend the model to a multi-period model in which energy can be stored for future use.
Chapter 3 One-Shot Models

In this chapter, we discuss one-shot models in which devices make decisions simultaneously. The devices have a chance to harvest energy for transmission, and a transmission may collide with another one if two devices choose the same transmission channel. To emphasize the importance of the energy-harvesting issue, we first build a traditional WSN model in which sensors cannot harvest energy. We then construct our main model, the energy-harvesting WSN model.

The first part of the chapter considers a typical model in which the devices have an unlimited energy source and need not take energy into consideration. This is a generalized form of the traditional medium access control (MAC) model discussed in [14] [16], which considers only a single unit of transmission resource. The second part considers the main model, in which the devices harvest energy from the environment for transmission. In this model, a device harvests a unit of energy with a fixed probability. A transmission cannot succeed if the device fails to harvest the energy. That is, the transmission success rate is bounded by the harvesting probability.

3.1 The Random Access Control (RAC) Model

We consider a WSN in which $N$ devices ask the base station (BS) for transmission resources. There are $M$ transmission resources (channels) in total to allocate. Device $i \in \mathcal{N}$, where $\mathcal{N}$ denotes the set of devices, chooses a probability $s_i \in [0, 1]$ with which it requests a transmission channel among these $M$ channels. The set of possible strategies is denoted by $S_i$ for device $i$ and by $S = S_1 \times S_2 \times \dots \times S_N$ for all devices.

A device that requests a channel sends the request directly through that channel. If two devices send a request on the same channel at the same time, both requests collide and fail. Although collisions are undesirable, a device has no way to avoid them before sending the request, since there is no global information about channel occupation. A commonly used method is for the device to choose the channel at random, with a uniform distribution. The model is therefore called the random access control (RAC) model.

Given that every device chooses the channel to request uniformly at random, the probability that a requesting device picks the same channel as another requesting device is $1/M$, and the probability that it does not is $(M-1)/M$. The value of device $i$, defined as its transmission success rate, is then

$$ v_i(s) = s_i \prod_{j \neq i} \left( (1 - s_j) + s_j \frac{M-1}{M} \right) = s_i \prod_{j \neq i} \left( 1 - \frac{s_j}{M} \right). $$

The first term in the product is the probability that device $j$ does not request a channel, and the second term is the probability that device $j$ requests a channel different from the one that device $i$ picks. The sum of these two terms is the probability that device $j$ does not collide with device $i$. The value $v_i$ is therefore the probability that device $i$ chooses to request and no other device collides with it.

For example, consider a three-device system in which the devices choose $s_1 = 0.3$, $s_2 = 0.5$, $s_3 = 0.7$, and the number of transmission channels is $M = 1$. The value of the first device is $v_1 = 0.3 \times (1 - 0.5) \times (1 - 0.7) = 0.045$, and similarly $v_2 = 0.105$ and $v_3 = 0.245$. Because collisions are unavoidable, the transmission success rate of each device is smaller than the transmission probability it chooses. In addition, the value of the third device is the highest, since it chooses the highest transmission probability.
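The value function and the three-device example above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later; the helper name `rac_values` is ours for illustration and does not come from the thesis.

```python
from math import prod


def rac_values(s, M):
    """Success rate v_i(s) = s_i * prod_{j != i} (1 - s_j / M) for every device."""
    return [
        s_i * prod(1 - s_j / M for j, s_j in enumerate(s) if j != i)
        for i, s_i in enumerate(s)
    ]


# The example from the text: three devices, one channel.
values = rac_values([0.3, 0.5, 0.7], M=1)
print([round(v, 3) for v in values])  # [0.045, 0.105, 0.245]
```

Calling `rac_values` with a larger `M` shows how additional channels raise every device's success rate for the same request probabilities.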
Generally speaking, the device that chooses the highest transmission probability obtains the highest value. In the special case $M = 1$, we recover previous models such as [12] and [17], which consider only a single transmission channel, with the utility function

$$ v_i(s) = s_i \prod_{j \neq i} (1 - s_j). $$

Since our model covers a more general case, the results derived in the following sections also apply to these previous models.

3.2 The RAC Game: Definition and Solution Concept

Having defined the necessary notation, we now introduce the mathematical tools used to analyze the system. Because wireless devices are selfish and care only about their own transmission, we adopt several concepts from game theory to analyze the model, such as the potential game and the Nash equilibrium. Our game model is formulated as the RAC game

$$ \Gamma_{RAC} = \langle \mathcal{N}, (S_i)_{i \in \mathcal{N}}, (v_i)_{i \in \mathcal{N}} \rangle. $$

We first introduce the most basic equilibrium concept, the Nash equilibrium.

Definition 3.1. (Nash Equilibrium) A strategy profile $s^* = (s_1^*, s_2^*, \dots, s_N^*)$ is a Nash equilibrium if no unilateral deviation in strategy is profitable for any single device, that is,

$$ u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*), \quad \forall s_i \in S_i, \ \forall i, $$

where $s_{-i} = (s_1, s_2, \dots, s_{i-1}, s_{i+1}, \dots, s_N)$ is the strategy profile of all devices except device $i$.

Deriving a Nash equilibrium directly requires solving one optimization problem per device, with coupled objective functions, for which there is no standard method. However, we find that the RAC game $\Gamma_{RAC}$ is an (exact) potential game, whose equilibrium can be derived more easily. We first introduce the potential game.

Definition 3.2. (Potential Game) A game is an (exact) potential game if and only if there exists a function $P : S \to \mathbb{R}$ such that

$$ P(s_i, s_{-i}) - P(s_i', s_{-i}) = u_i(s_i, s_{-i}) - u_i(s_i', s_{-i}), \quad \forall i, s_i, s_i'. $$

If $u_i$ is differentiable with respect to $s_i$, the condition is equivalent to

$$ \frac{\partial P(s)}{\partial s_i} = \frac{\partial u_i(s)}{\partial s_i}, \quad \forall i. \tag{3.1} $$

Potential games form a useful class of games. In a potential game, all players can be thought of as optimizing a joint objective function, the potential function $P$. If a deviation in strategy increases some player's utility, it also increases the value of the potential function. Therefore, a strategy profile that maximizes the joint objective function $P$ is a Nash equilibrium, and we can derive the Nash equilibrium by solving a single maximization problem.

3.3 Nash Equilibrium: A Potential Game Approach

In this section, we use the potential function to derive the Nash equilibrium. To prove that the RAC game is a potential game, we need the following lemma.

Lemma 3.1. Suppose the $u_i$ are twice continuously differentiable. The game is a potential game if and only if

$$ \frac{\partial^2 u_i}{\partial s_i \partial s_j} = \frac{\partial^2 u_j}{\partial s_i \partial s_j}, \quad \forall i, j. $$

For a detailed explanation, see the work of Monderer and Shapley [19].

Theorem 3.2. The RAC game $\Gamma_{RAC}$ is a potential game with the potential function

$$ P(s) = -M \prod_{i} \left( 1 - \frac{s_i}{M} \right). \tag{3.2} $$
Proof. To prove that the RAC game is a potential game, we show that the condition in Lemma 3.1 is satisfied:

$$ \frac{\partial^2 u_i}{\partial s_i \partial s_j} = -\frac{1}{M} \prod_{k \neq i,j} \left( 1 - \frac{s_k}{M} \right) = \frac{\partial^2 u_j}{\partial s_i \partial s_j}, \quad \forall i \neq j. $$

In addition, the potential function satisfies condition (3.1) of Definition 3.2:

$$ \frac{\partial P}{\partial s_i} = \prod_{j \neq i} \left( 1 - \frac{s_j}{M} \right) = \frac{\partial u_i}{\partial s_i}. $$

Theorem 3.3. The unique Nash equilibrium strategy in $\Gamma_{RAC}$ is $s_i^* = 1$ for all $i$.

Proof. Since the RAC game is a potential game, the strategy profile that maximizes $P$ is the equilibrium strategy $s^*$:

$$ \max_{s} \ P = -M \prod_{i} \left( 1 - \frac{s_i}{M} \right) \tag{3.3a} $$

subject to

$$ s_i \prod_{j \neq i} \left( 1 - \frac{s_j}{M} \right) \geq 0 \quad \text{(Individual Rationality)} \tag{3.3b} $$

$$ s_i \in [0, 1] \quad \text{(Strategy Space)} \tag{3.3c} $$

The objective function (3.3a) increases linearly with $s_i$, since the first-order derivative

$$ \frac{\partial P}{\partial s_i} = \prod_{j \neq i} \left( 1 - \frac{s_j}{M} \right) > 0 $$

is positive and does not depend on $s_i$ (the inequality is strict whenever $M > 1$). Therefore, the unique solution of this maximization problem is $s_i^* = 1$ for all $i$.

The equilibrium $s_i^* = 1$ for all $i$ is an unwanted result in which every device requests the transmission resource with probability one. The system becomes full of aggressive devices requesting limited transmission resources. To avoid this result, we propose several incentive schemes later to induce better outcomes.

[Figure 3.1: The Nash equilibrium is the point that maximizes the potential function. (a) The Nash equilibrium: utility of device 1 versus the strategies of device 1 and device 2; device 1 always chooses $s_1 = 1$. (b) The maximum of the potential function: potential value versus the strategies of device 1 and device 2.]

We show the Nash equilibrium visually in Fig. 3.1 for a two-device system. Fig. 3.1(a) shows why a device always chooses the strategy $s_i^* = 1$. Whatever strategy device 2 chooses, device 1 always chooses $s_1^* = 1$, and likewise device 2 chooses $s_2^* = 1$. We also see that the utility of a device increases linearly with its strategy. Fig. 3.1(b) shows the potential value over all possible strategy profiles $(s_1, s_2)$. Recall that the strategy profile that maximizes the potential function is the Nash equilibrium. The maximizer $(s_1^*, s_2^*) = (1, 1)$ coincides with the result in Fig. 3.1(a). Notice that the potential value is never positive, consistent with (3.2).

3.4 The Energy-Harvesting RAC Game

In the previous section, we constructed the RAC game and derived its Nash equilibrium using the potential game concept. In this section, we discuss the energy-harvesting device, which has to harvest energy in order to transmit. Energy-harvesting devices must take energy into consideration, since a transmission may fail when no energy has been harvested.

The harvesting probability is assumed to be a constant [4] [5] [7]. In [4], Kansal et al. use a function $P(t)$ to describe the probability of harvesting energy at time $t$. In [5], Seyedi et al. set the probability to a constant $\rho_a$. In [7], Lei et al. assume that the battery is replenished with probability $\alpha$ and recharged with probability $\beta$. In our work, we denote by $p_i$ the probability that device $i$ harvests one unit of energy. A unit of energy is enough to send a request and start a transmission.

Taking energy harvesting into account, we build our main model as follows. (With a slight abuse of notation, we use the notation of the energy-harvesting RAC game rather than that of the RAC game in the remainder of this thesis.)

$$ v_i(s) = s_i p_i \prod_{j \neq i} \left( (1 - s_j) + s_j (1 - p_j) + s_j p_j \frac{M-1}{M} \right) = s_i p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) \tag{3.4} $$

The first term in the product is the probability that device $j$ does not request a channel, the second term is the probability that device $j$ sends a request but has not harvested energy, and the third term is the probability that device $j$ has harvested energy but requests a channel different from the one that device $i$ picks. The sum of these three terms is the probability that device $j$ does not collide with device $i$. The value $v_i$ is then the probability that device $i$ chooses to request, has harvested energy, and no other device collides with it. The energy-harvesting RAC game is formulated as

$$ \Gamma_{EH} = \langle \mathcal{N}, (S_i)_{i \in \mathcal{N}}, (v_i)_{i \in \mathcal{N}} \rangle. $$

3.5 Nash Equilibrium in Energy Harvesting RAC Game

Similarly, we prove that the energy-harvesting RAC game is also a potential game and then derive its Nash equilibrium.

Theorem 3.4. The energy-harvesting RAC game $\Gamma_{EH}$ is a potential game with the potential function

$$ P(s) = -M \prod_{i} \left( 1 - \frac{s_i p_i}{M} \right). \tag{3.5} $$

Proof. To prove that $\Gamma_{EH}$ is a potential game, we show that the condition of Lemma 3.1 is satisfied:

$$ \frac{\partial^2 u_i}{\partial s_i \partial s_j} = -\frac{p_i p_j}{M} \prod_{k \neq i,j} \left( 1 - \frac{s_k p_k}{M} \right) = \frac{\partial^2 u_j}{\partial s_i \partial s_j}, \quad \forall i \neq j. $$

The potential function also satisfies condition (3.1):

$$ \frac{\partial P}{\partial s_i} = p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) = \frac{\partial u_i}{\partial s_i}. $$

Theorem 3.5. The unique Nash equilibrium strategy in the energy-harvesting RAC game $\Gamma_{EH}$ is $s_i^* = 1$ for all $i$.

Proof. To derive the equilibrium strategy, we solve the following maximization problem:

$$ \max_{s} \ P = -M \prod_{i} \left( 1 - \frac{s_i p_i}{M} \right) \tag{3.6a} $$

subject to

$$ s_i p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) \geq 0 \quad \text{(Individual Rationality)} \tag{3.6b} $$

$$ s_i \in [0, 1] \quad \text{(Strategy Space)} \tag{3.6c} $$

The objective function (3.6a) increases linearly with $s_i$, since

$$ \frac{\partial P}{\partial s_i} = p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) > 0 $$

is positive and does not depend on $s_i$. Therefore, the unique solution of this maximization problem is $s_i^* = 1$ for all $i$.
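The two facts used in Theorems 3.4 and 3.5 can be checked numerically: a unilateral deviation changes the potential (3.5) by exactly the change in the deviating device's value (3.4), and each device's best response is $s_i = 1$. The sketch below is illustrative only. The parameter values and the helper names (`eh_value`, `potential`, `deviate`) are assumptions of ours, not taken from the thesis.

```python
from math import prod


def eh_value(i, s, p, M):
    """v_i(s) = s_i p_i prod_{j != i} (1 - s_j p_j / M), as in Eq. (3.4)."""
    others = prod(1 - s[j] * p[j] / M for j in range(len(s)) if j != i)
    return s[i] * p[i] * others


def potential(s, p, M):
    """P(s) = -M prod_i (1 - s_i p_i / M), as in Eq. (3.5)."""
    return -M * prod(1 - s_i * p_i / M for s_i, p_i in zip(s, p))


def deviate(s, i, new_si):
    """Copy of profile s in which only device i's strategy is replaced."""
    t = list(s)
    t[i] = new_si
    return t


# Hypothetical instance: three devices, two channels.
p, M = [0.9, 0.6, 0.4], 2
s = [0.2, 0.5, 0.8]

for i in range(len(s)):
    for new_si in (0.0, 0.35, 1.0):
        t = deviate(s, i, new_si)
        dP = potential(t, p, M) - potential(s, p, M)
        dv = eh_value(i, t, p, M) - eh_value(i, s, p, M)
        assert abs(dP - dv) < 1e-12  # exact-potential identity

    # v_i is linear and increasing in s_i, so the best response on a grid is 1.
    grid = [k / 10 for k in range(11)]
    best = max(grid, key=lambda x: eh_value(i, deviate(s, i, x), p, M))
    assert best == 1.0
```

Setting every `p[i]` to 1 reduces the check to the plain RAC game of Theorems 3.2 and 3.3.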
As in the RAC game, the EH-RAC game has only one Nash equilibrium, $s_i^* = 1$ for every device $i$. This equilibrium, in which every device requests a channel, is an unwanted outcome. In the next chapter, we introduce two incentive schemes that can induce a target outcome. By designing incentive schemes that give the devices appropriate incentives, we can induce them to choose the target strategy.
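To see why the all-transmit equilibrium is unwanted, one can compare the total success rate at $s_i = 1$ with that of a more restrained symmetric profile. This is a rough sketch under assumed parameters (ten identical devices, two channels, harvesting probability 0.8, and a comparison profile $s_i = 0.3$). None of these numbers come from the thesis, and `eh_values` is a hypothetical helper.

```python
from math import prod


def eh_values(s, p, M):
    """Per-device success rate v_i(s) from Eq. (3.4)."""
    n = len(s)
    return [
        s[i] * p[i] * prod(1 - s[j] * p[j] / M for j in range(n) if j != i)
        for i in range(n)
    ]


# Hypothetical instance: 10 identical devices competing for 2 channels.
N, M = 10, 2
p = [0.8] * N

at_equilibrium = sum(eh_values([1.0] * N, p, M))  # every device transmits
restrained = sum(eh_values([0.3] * N, p, M))      # symmetric, lower probability

print(f"all transmit: {at_equilibrium:.3f}, s_i = 0.3: {restrained:.3f}")
```

Under these assumptions the restrained profile yields a total success rate several times higher than the equilibrium, which is the gap the incentive schemes of the next chapter aim to close.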
Chapter 4 Incentive Mechanisms

In the previous chapter, we proposed the energy-harvesting RAC game and derived its Nash equilibrium. However, that equilibrium, in which every device transmits with probability one, leads to severe collisions. To address this problem, this chapter proposes two incentive mechanisms that induce a desired outcome. Under either mechanism, we prove that the outcome in which every device transmits is no longer a Nash equilibrium and that the target outcome becomes one. The target outcome can be any outcome the system manager wants to implement. With the help of the incentive schemes, we keep the system out of the worst case and steer it to the desired outcome.

4.1 Pricing Scheme

The pricing scheme is the most direct mechanism for giving devices an incentive to reduce their transmission probability. The system manager charges each device according to the strategy the device chooses and its probability of harvesting energy. Rather than proposing a specific pricing function, we define a general price function $c_i(s_i, p_i)$ for device $i$. Any pricing function that satisfies the conditions below can be used to induce the target outcome. Under the pricing scheme, the utility of device $i$ becomes

$$ u_i^P = v_i(s) - c_i(s_i, p_i) = s_i p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) - c_i(s_i, p_i). \tag{4.1} $$

The sequence of events is as follows.

1. The manager chooses a pricing rule $c_i(s_i, p_i)$ that charges the devices according to the requesting probability and the harvesting probability.
2. Knowing the pricing rule, each device chooses a strategy $s_i^*$ that maximizes its own utility $u_i^P$.
3. The system reaches the Nash equilibrium $s^*$.

The EH-RAC game under the pricing scheme can be formulated as

$$ \Gamma_{EH-P} = \langle \mathcal{N}, (S_i)_{i \in \mathcal{N}}, (u_i^P)_{i \in \mathcal{N}} \rangle, $$

which is also a potential game.

The first condition is that the pricing function $c_i(s_i, p_i)$ must be twice differentiable with an increasing marginal price with respect to $s_i$, that is,

$$ \frac{\partial^2 c_i(s_i, p_i)}{\partial s_i^2} > 0. $$

In other words, the fee has to grow faster than any linear function of $s_i$. With this condition, we can derive the Nash equilibrium in the following theorem.

Theorem 4.1. Given the target strategy $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_N)$, the pricing functions that solve the following simultaneous equations can induce the target Nash equilibrium in $\Gamma_{EH-P}$:

$$ \left. \frac{\partial c_i(s_i, p_i)}{\partial s_i} \right|_{s_i = \tilde{s}_i} = p_i \left( 1 - \frac{\tilde{s}_i p_i}{M} \right)^{N-1}, \quad \forall i. \tag{4.2} $$
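Proof. First, we prove that the game is a potential game:

$$ \frac{\partial^2 u_i}{\partial s_i \partial s_j} = -\frac{p_i p_j}{M} \prod_{k \neq i,j} \left( 1 - \frac{s_k p_k}{M} \right) = \frac{\partial^2 u_j}{\partial s_i \partial s_j}, \quad \forall i \neq j. $$

Second, we prove that

$$ P = -M \prod_{i} \left( 1 - \frac{s_i p_i}{M} \right) - \sum_{i} c_i(s_i, p_i) \tag{4.3} $$

is a potential function by showing that

$$ \frac{\partial P}{\partial s_i} = p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) - \frac{\partial c_i(s_i, p_i)}{\partial s_i} = \frac{\partial u_i}{\partial s_i}. $$

Lastly, we derive the maximum of the potential by applying the KKT conditions. We have to solve the simultaneous equations $\frac{\partial P}{\partial s_i} = 0$ for all $i$, that is,

$$ \frac{\partial P}{\partial s_i} = p_i \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) - \frac{\partial c_i(s_i, p_i)}{\partial s_i} = 0. $$

Rewriting the equation, we get

$$ \frac{\partial c_i(s_i, p_i)}{\partial s_i} \frac{1}{p_i} = \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right). $$

Multiplying both sides over $i = 1$ to $i = N$, we get

$$ \prod_{i} \frac{\partial c_i(s_i, p_i)}{\partial s_i} \frac{1}{p_i} = \prod_{i} \prod_{j \neq i} \left( 1 - \frac{s_j p_j}{M} \right) = \prod_{i} \left( 1 - \frac{s_i p_i}{M} \right)^{N-1}. $$

One solution is

$$ \frac{\partial c_i(s_i, p_i)}{\partial s_i} \frac{1}{p_i} = \left( 1 - \frac{s_i p_i}{M} \right)^{N-1}. $$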
[Figure 4.1: The equilibrium under the pricing scheme is nearly identical to the target one. Potential value versus the strategies of device 1 and device 2; the maximum is marked at (0.29, 0.73).]

To prove that the solution is a maximum point, we show that

$$ \frac{\partial^2 P}{\partial s_i^2} = -\frac{\partial^2 c_i(s_i, p_i)}{\partial s_i^2} < 0 $$

by the convexity condition imposed on $c_i$ above. The potential function is therefore concave in each $s_i$, which completes the proof.

The system manager can achieve the target $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_N)$ by designing an appropriate pricing rule $c_i(s_i, p_i)$. Since the outcome in which every device transmits with probability one is undesirable, the system manager can adopt the pricing scheme to induce whatever target outcome it wishes. In Fig. 4.1, we use the pricing scheme to induce the target outcome (0.3, 0.7). The maximum of the potential function, marked at (0.29, 0.73) in the figure, is the Nash equilibrium and lies close to the target.
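As an illustration of the pricing scheme, consider one concrete price that meets both conditions: a quadratic fee $c_i(s_i, p_i) = a_i s_i^2 / 2$ with the coefficient $a_i$ chosen so that the slope condition (4.2) holds at the target. The thesis leaves the price function general, so the quadratic form, the parameter values ($M = 10$, $p_i = 1$), and the function name `pricing_equilibrium` are all assumptions made for this sketch. The equilibrium is found by best-response iteration, which is our choice of solver rather than the thesis's method.

```python
from math import prod


def pricing_equilibrium(target, p, M, iters=200):
    """Best-response iteration under the assumed price c_i = a_i * s_i**2 / 2."""
    n = len(target)
    # Slope condition (4.2): a_i * target_i = p_i * (1 - target_i p_i / M)**(n - 1)
    a = [p[i] * (1 - target[i] * p[i] / M) ** (n - 1) / target[i] for i in range(n)]
    s = [1.0] * n  # start from the all-transmit profile
    for _ in range(iters):
        for i in range(n):
            others = prod(1 - s[j] * p[j] / M for j in range(n) if j != i)
            # Maximizer of s_i p_i others - a_i s_i^2 / 2, clipped to [0, 1]
            s[i] = min(max(p[i] * others / a[i], 0.0), 1.0)
    return s


# Assumed parameters: two devices, M = 10 channels, harvesting probability 1.
eq = pricing_equilibrium(target=[0.3, 0.7], p=[1.0, 1.0], M=10)
print([round(x, 2) for x in eq])  # [0.29, 0.73]
```

Under these assumptions the iteration settles close to, but not exactly at, the target (0.3, 0.7), in line with the "nearly identical" wording of the Fig. 4.1 caption. The all-transmit profile is no longer an equilibrium, because the quadratic fee makes a high request probability costly.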
4.2 Intervention Function

Besides the pricing scheme, we adopt another incentive scheme that can induce the target outcome. Jaeok Park and Mihaela van der Schaar have designed an intervention function that can achieve the desired equilibrium [16][17]. In the intervention scheme, the system manager imposes a certain level of intervention on the devices' transmissions according to the strategy profile $s$ of the devices. Formally, the intervention function is a mapping $g : S \to [0, 1]$. Unlike the pricing scheme, the intervention level is not tailored to individual devices but is identical for every device. The sequence of events is as follows.
1. The manager chooses an intervention rule $g$.
2. Each device $i$ chooses a strategy $s_i^*$ that maximizes its own utility.
3. The system manager imposes an intervention on all the devices according to their chosen strategies.
4. The system reaches a Stackelberg equilibrium.

Because of the participation of the system manager, the EH-RAC game under the intervention scheme is neither a typical non-cooperative game like the previous one nor a potential game. In game theory, this kind of game is called a Stackelberg game, in which a leader decides the rules of the game before the other players make their decisions [17]. After the system manager imposes the intervention, the utility of device $i$ becomes

$$u_i^I(g, s) = s_i p_i \left(1 - g(s)\right) \prod_{j \neq i}\left(1 - \frac{s_j p_j}{M}\right).$$

The utility is discounted by the factor $(1 - g(s))$, which decreases as the level of intervention imposed by the system manager increases. Since the system manager aims to induce the target outcome without imposing much intervention, the utility of the system manager is
$$u_0(g, s) = \begin{cases} 1 - g(s), & \text{if } s = \tilde{s} \\ 0, & \text{otherwise} \end{cases}$$

where $\tilde{s}$ is the target outcome that the system manager aims to induce. The manager gets zero utility if the target equilibrium is not achieved. On the other hand, if the target equilibrium is achieved, the manager aims to reduce the level of intervention. The EH-RAC game under the intervention function is formulated as

$$\Gamma_{EH-I}^{g} = \langle \mathcal{N}_0, (S_i)_{i \in \mathcal{N}_0}, (u_i^I(g, \cdot))_{i \in \mathcal{N}_0} \rangle$$

where $\mathcal{N}_0 = \mathcal{N} \cup \{0\}$ is the player set including the system manager. According to [17], we construct the intervention function as

$$g^*(s) = \left[\sum_{i=1}^{N} \frac{s_i - \tilde{s}_i}{\tilde{s}_i}\right]_0^1 \tag{4.4}$$

where the operator $[x]_a^b = \min\{\max\{x, a\}, b\}$ trims the value to the interval between 0 and 1. The level of intervention $g$ increases if any device deviates from the target outcome. Moreover, when the level of intervention reaches the upper bound (i.e., $g = 1$), the utility of all devices is zero. We then prove that the intervention function $g^*$ and the target outcome $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_N)$ constitute a Stackelberg equilibrium.

Definition 4.1. (Stackelberg Equilibrium) An intervention rule $g^*$ and a strategy $s^*$ constitute a Stackelberg equilibrium if $s^*$ is a Nash equilibrium of the game under the intervention scheme and

$$g^* \in \arg\max_{g \in G} u_0(g, s^*)$$

where $G$ is the set of all possible intervention schemes.

Theorem 4.2. The intervention function $g^*$ defined in Eq. (4.4) and the target outcome $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_N)$ constitute a Stackelberg equilibrium in the energy-harvesting RAC
game under the intervention scheme $\Gamma_{EH-I}^{g}$.

Proof. We prove that the target outcome constitutes a Stackelberg equilibrium by first showing that it constitutes a Nash equilibrium for every device, and then showing that the utility of the system manager is also maximized. Let all devices except $i$ choose the strategy profile $\tilde{s}_{-i}$. We aim to prove that the strategy leading to the highest utility for device $i$ is $s_i = \tilde{s}_i$. It then follows that no device will deviate and $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_N)$ constitutes a Nash equilibrium. The utility of device $i$ under the intervention $g^*$ is

$$u_i(g^*, s) = s_i p_i \left(1 - g^*(s)\right) \prod_{j \neq i}\left(1 - \frac{\tilde{s}_j p_j}{M}\right) = \begin{cases} 0, & \text{if } s_i > 2\tilde{s}_i \\ s_i p_i \left(2 - \dfrac{s_i}{\tilde{s}_i}\right) \prod_{j \neq i}\left(1 - \dfrac{\tilde{s}_j p_j}{M}\right), & \text{if } \tilde{s}_i \le s_i \le 2\tilde{s}_i \\ s_i p_i \prod_{j \neq i}\left(1 - \dfrac{\tilde{s}_j p_j}{M}\right), & \text{if } s_i < \tilde{s}_i \end{cases}$$

In the first case, $s_i > 2\tilde{s}_i$, the level of intervention is $g^*(s_i, \tilde{s}_{-i}) = 1$ from Eq. (4.4) and every device obtains zero utility. In the second case, $\tilde{s}_i \le s_i \le 2\tilde{s}_i$, the level of intervention is $g^*(s_i, \tilde{s}_{-i}) = (s_i - \tilde{s}_i)/\tilde{s}_i$. In the third case, $s_i < \tilde{s}_i$, the level of intervention is $g^*(s_i, \tilde{s}_{-i}) = 0$. The utility of device $i$ increases on $s_i < \tilde{s}_i$, reaches its maximum at $s_i = \tilde{s}_i$, then decreases on $\tilde{s}_i \le s_i \le 2\tilde{s}_i$, and stays at 0 on $s_i > 2\tilde{s}_i$. Therefore, $s_i = \tilde{s}_i$ is the only strategy that maximizes the utility.

Finally, when every device chooses the target outcome, the intervention function becomes zero and the system manager obtains the highest utility $u_0(g^*, \tilde{s}) = 1 - g^*(\tilde{s}) = 1$. Therefore, the intervention function $g^*$ and the strategy profile $\tilde{s}$ constitute a Stackelberg equilibrium.

In Fig. 4.2, we show the Nash equilibrium under the intervention scheme. The target outcome (0.3, 0.7) is exactly the Nash equilibrium. Note that the energy-harvesting RAC game under the intervention scheme is not a potential game since the condition (3.1) is
not satisfied. Therefore, we cannot use the potential function. Instead, we derive the Nash equilibrium as the intersection of the best response functions of the two devices.

[Figure 4.2: The equilibrium under the intervention scheme is exactly identical to the desired one. Best response of device 1 and best response of device 2 plotted against the strategy of device 1 (horizontal axis) and the strategy of device 2 (vertical axis); their intersection is marked as the equilibrium point.]

So far we have proposed two methods for achieving the target outcome. In the following chapter, we discuss two target outcomes that we aim to achieve: the proportionally fair outcome and the social optimal outcome.
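The best-response argument in the proof of Theorem 4.2 can be checked on a grid. The sketch below is only an illustration under assumed values: the harvesting probabilities `p`, the channel count `M`, and the helper names are hypothetical, while the target (0.3, 0.7) is the one used in Fig. 4.2. With the other device held at its target, a grid search over $s_i$ should return $\tilde{s}_i$.

```python
from math import prod

def intervention(s, target):
    """g*(s) from Eq. (4.4): summed relative deviation from the target, clipped to [0, 1]."""
    total = sum((si - ti) / ti for si, ti in zip(s, target))
    return min(max(total, 0.0), 1.0)

def utility(i, s, p, M, target):
    """Device i's success probability, discounted by the factor 1 - g*(s)."""
    others = prod(1 - s[j] * p[j] / M for j in range(len(s)) if j != i)
    return s[i] * p[i] * (1 - intervention(s, target)) * others

def best_response_on_grid(i, p, M, target, steps=100):
    """Search s_i over {0, 1/steps, ..., 1} while every other device plays its target."""
    best_s, best_u = 0.0, float("-inf")
    for k in range(steps + 1):
        s = list(target)
        s[i] = k / steps
        u = utility(i, s, p, M, target)
        if u > best_u:
            best_s, best_u = s[i], u
    return best_s

# Hypothetical harvesting probabilities and channel count; target as in Fig. 4.2.
p, M = [0.6, 0.8], 2
target = [0.3, 0.7]
responses = [best_response_on_grid(i, p, M, target) for i in range(len(p))]
print(responses)
```

A grid search is used instead of a closed form because the clipped intervention makes the utility piecewise, exactly as in the three cases of the proof.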
Chapter 5 The Desired Outcome

In the previous chapter, we proposed two incentive schemes that steer the system to a target outcome. With either the pricing scheme or the intervention scheme, the system manager can achieve any feasible outcome it desires. In this chapter, we investigate the outcomes that the system manager desires. We always want to fulfill the demand of users in wireless systems. However, this rarely happens in reality because system resources are scarce. In a system where limited resources cannot satisfy the users' demand, we have to make an optimal allocation. The definition of optimal allocation varies with the purpose of the system application. We adopt the two most common kinds of optimal allocation: the proportionally fair outcome and the social optimal outcome.

5.1 The Proportionally Fair Outcome

First, we introduce proportional fairness. The proportionally fair outcome maximizes the overall utility while at the same time allowing all devices at least a minimal level of service.
Definition 5.1. A strategy profile $s^*$ is proportionally fair if $s_i^* \ge 0$ for all $i$ and

$$\sum_i \frac{u_i(s) - u_i(s^*)}{u_i(s^*)} \le 0 \tag{5.1}$$

for any other $s \in S$ [20], or equivalently, $s^*$ is the solution of

$$\max_s \sum_i \log(u_i(s)). \tag{5.2}$$

Condition (5.1) indicates that a deviation from the proportionally fair outcome $s^*$ increases some utilities and decreases others, but the sum of the relative differences is non-positive. Moreover, a player whose utility is originally small carries a larger weight in the summation, which guarantees the fairness of the system. The two conditions in Definition 5.1 are equivalent since $s^*$ is the solution of (5.2) if and only if $\sum_i \log(u_i)$ does not increase when moving away from the point $s^*$, that is,

$$\sum_i \left(u_i(s) - u_i(s^*)\right) \left.\frac{\partial \log(u_i(s))}{\partial u_i}\right|_{s = s^*} = \sum_i \frac{u_i(s) - u_i(s^*)}{u_i(s^*)} \le 0,$$

which coincides with condition (5.1).

Theorem 5.1. The proportionally fair outcome (the solution to (5.3a)) is $s_i^* = \dfrac{M}{N p_i}$.

Proof. To find the proportionally fair outcome, we have to solve the following maximization problem.

$$\max_s \sum_i \log\left(s_i p_i \prod_{j \neq i}\left(1 - \frac{s_j p_j}{M}\right)\right) \tag{5.3a}$$

$$\text{subject to } s_i \in [0, 1]. \quad \text{(Strategy Space)} \tag{5.3b}$$
First, we rephrase the objective function (5.3a):

$$\begin{aligned} \max_s \sum_i \log\left(s_i p_i \prod_{j \neq i}\left(1 - \frac{s_j p_j}{M}\right)\right) &= \max_s \sum_i \log(s_i p_i) + \sum_i \sum_{j \neq i} \log\left(1 - \frac{s_j p_j}{M}\right) \\ &= \max_s \sum_i \log(s_i p_i) + (N - 1)\sum_i \log\left(1 - \frac{s_i p_i}{M}\right) \\ &= \max_s \sum_i \log\left(s_i p_i \left(1 - \frac{s_i p_i}{M}\right)^{N-1}\right) \\ &= \max_{s_i} \; s_i p_i \left(1 - \frac{s_i p_i}{M}\right)^{N-1}, \end{aligned}$$

which becomes a maximization problem in a single variable for each device. We use the first-order condition

$$\frac{\partial}{\partial s_i}\left[s_i p_i \left(1 - \frac{s_i p_i}{M}\right)^{N-1}\right] = 0$$

to derive the maximum point $s_i^* = \dfrac{M}{N p_i}$. To verify that this is a maximum point, we use the second-order derivative:

$$\left.\frac{\partial^2}{\partial s_i^2}\left[s_i p_i \left(1 - \frac{s_i p_i}{M}\right)^{N-1}\right]\right|_{s_i = s_i^*} = -\frac{p_i^2 N}{M}\left(1 - \frac{1}{N}\right)^{N-2} < 0.$$

Therefore, $s_i^* = \dfrac{M}{N p_i}$ is a solution of (5.3a) and $s^*$ is the proportionally fair outcome.

From this theorem, the proportionally fair strategy profile lets a device with low $p_i$ choose a higher transmission probability and, conversely, makes a device with high $p_i$ choose a lower transmission probability. The strategy is also proportional to the number of transmission resources and inversely proportional to the number of devices. The devices have to choose lower transmission probabilities when there are too many devices, while they can choose higher transmission probabilities when the number
of transmission resources is abundant. Denote the fairness index by the value of

$$\sum_i \log(u_i(s)).$$

We can further prove that the value of the fairness index is independent of the distribution of the harvesting probabilities $p = (p_1, p_2, \ldots, p_N)$.

Theorem 5.2. At the proportionally fair outcome, the value of the fairness index is a function of the number of devices $N$ and the number of transmission resources $M$ only, that is,

$$\sum_i \log(u_i(s)) = N\log(M) + N(N - 1)\log(N - 1) - N^2\log(N),$$

for given $N$, $M$.

Proof. By definition, substituting $s_i p_i = M/N$ for every $i$, we have

$$\begin{aligned} \sum_i \log(u_i(s)) &= \sum_i \log\left(\frac{M}{N}\prod_{j \neq i}\left(1 - \frac{1}{N}\right)\right) \\ &= \sum_i \log\left(\frac{M}{N}\left(\frac{N - 1}{N}\right)^{N-1}\right) \\ &= \sum_i \log\left(\frac{M(N - 1)^{N-1}}{N^N}\right) \\ &= N\log\left(\frac{M(N - 1)^{N-1}}{N^N}\right) \\ &= N\log(M) + N(N - 1)\log(N - 1) - N^2\log(N), \end{aligned}$$

which completes the proof.

This theorem shows that the fairness index is independent of the distribution of the energy harvesting probabilities of the devices in the system. Even if the devices have a higher (or lower) chance to harvest energy, the fairness remains the same.
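Theorems 5.1 and 5.2 lend themselves to a quick numerical check. The sketch below is an illustration with hypothetical harvesting probabilities (`p_a`, `p_b`) and hypothetical function names. It evaluates the fairness index at $s_i^* = M/(N p_i)$ for two different probability vectors and compares both with the expression $N\log\left(M(N-1)^{N-1}/N^N\right)$ from the proof. It assumes $M \le N p_i$ for every device, so that $s_i^*$ is a valid probability; the derivation does not treat the clipped case.

```python
from math import log, prod

def utilities(s, p, M):
    """u_i(s) = s_i p_i prod_{j != i} (1 - s_j p_j / M) for every device i."""
    n = len(s)
    return [s[i] * p[i] * prod(1 - s[j] * p[j] / M for j in range(n) if j != i)
            for i in range(n)]

def fairness_index(s, p, M):
    """Sum of log-utilities, the objective of problem (5.3a)."""
    return sum(log(u) for u in utilities(s, p, M))

def proportionally_fair(p, M):
    """s_i* = M / (N p_i); rejects inputs for which this exceeds 1."""
    n = len(p)
    s = [M / (n * pi) for pi in p]
    if any(si > 1 for si in s):
        raise ValueError("M / (N p_i) exceeds 1; the closed form does not apply")
    return s

def index_from_proof(n, M):
    """N log(M (N-1)^(N-1) / N^N), the penultimate line of the proof of Theorem 5.2."""
    return n * log(M * (n - 1) ** (n - 1) / n ** n)

# Two hypothetical harvesting-probability vectors with the same N and M.
M = 2
p_a = [0.5, 0.7, 0.9, 0.6]
p_b = [0.8, 0.8, 0.6, 0.9]
index_a = fairness_index(proportionally_fair(p_a, M), p_a, M)
index_b = fairness_index(proportionally_fair(p_b, M), p_b, M)
print(index_a, index_b, index_from_proof(4, M))
```

Both indices agree because every device ends up with $s_i^* p_i = M/N$, which removes $p_i$ from the utilities.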
5.2 The Social Optimal Outcome

In this section, we derive another optimal outcome, one that maximizes the overall utility of the system. The outcome is called social optimal since the overall utility represents the social welfare. To derive the social optimal outcome, we have to solve the following maximization problem.

$$\max_s \sum_i u_i(s) \quad \text{subject to } s_i \in [0, 1] \quad \text{(Strategy Space)} \tag{5.4a}$$

Theorem 5.3. Without loss of generality, let $p_1 > p_2 > \ldots > p_N$. Then the strategy profile $s_1 = s_2 = \ldots = s_k = 1$ and $s_{k+1} = \ldots = s_N = 0$, with the maximal positive integer $k$ such that

$$\prod_{i \le k}\left(1 - \frac{p_i}{M}\right) - \frac{1}{M}\sum_{i \le k} p_i \prod_{\substack{j \neq i \\ j \le k}}\left(1 - \frac{p_j}{M}\right) > 0, \tag{5.5}$$

is the social optimal strategy set.

Proof. First, we prove that in the social optimal strategy set, each device's strategy is either $s_i = 1$ or $s_i = 0$. Secondly, we prove that the social optimal strategy consists of consecutive devices from 1 to $k$. Thirdly, we prove that the value $k$ is the maximal positive integer such that Eq. (5.5) holds.

First, to prove that each device's strategy is either $s_i = 1$ or $s_i = 0$, we check the first-order derivative of the objective function (5.4a):

$$\frac{\partial}{\partial s_i}\sum_i s_i p_i \prod_{j \neq i}\left(1 - \frac{s_j p_j}{M}\right) = p_i \prod_{j \neq i}\left(1 - \frac{s_j p_j}{M}\right) - \frac{p_i}{M}\sum_{j \neq i} s_j p_j \prod_{k \neq i,j}\left(1 - \frac{s_k p_k}{M}\right),$$

which is independent of $s_i$. Since the first-order derivative is a constant, the optimal
point is located at the boundary. If the first-order derivative is positive, the optimal point is $s_i^* = 1$. On the contrary, if the first-order derivative is negative, the optimal point is $s_i^* = 0$. For notational convenience, we denote the set containing all devices with $s_i = 1$ by $\Omega$. The complementary set $\mathcal{N} \setminus \Omega$ consists of all devices with $s_i = 0$. Therefore, every such strategy set $s = \{s_1, s_2, \ldots, s_N\}$ corresponds one-to-one to a set $\Omega$. The strategy profile in the statement of the theorem can then be rephrased as follows: the device set $\Omega = \{1, 2, \ldots, k\}$ is the social optimal solution. We define $\Delta(\Omega)$ as the social welfare given the set $\Omega$:

$$\Delta(\Omega) \equiv \sum_{i \in \Omega} p_i \prod_{\substack{j \in \Omega \\ j \neq i}}\left(1 - \frac{p_j}{M}\right).$$

Note that only devices in the set $\Omega$ choose $s_i = 1$. If a new device $i$ is added, the new social welfare becomes

$$\Delta(\Omega \cup \{i\}) = \Delta(\Omega)\left(1 - \frac{p_i}{M}\right) + p_i \prod_{j \in \Omega}\left(1 - \frac{p_j}{M}\right). \tag{5.6}$$

The first term on the right-hand side is the reduced social welfare of the original devices (all except device $i$), since the probability of collision increases when device $i$ joins the set $\Omega$. The second term is the social welfare added by device $i$, whose utility also contributes to the social welfare.

Secondly, we prove by contradiction that the social optimal strategy consists of consecutively numbered devices from 1 to $k$. Let $k$ be the largest number in the set $\Omega$. If $\Omega$ does not contain consecutively numbered devices, there exists a value $h < k$ such that $h$
$\notin \Omega$. Construct two new sets $\Omega'' = \Omega \setminus \{k\}$ and $\Omega' = \Omega'' \cup \{h\}$. We show that the social welfare of the new set, $\Delta(\Omega')$, is higher than $\Delta(\Omega)$ by comparing

$$\Delta(\Omega) = \Delta(\Omega'' \cup \{k\}) = \Delta(\Omega'')\left(1 - \frac{p_k}{M}\right) + p_k \prod_{j \in \Omega''}\left(1 - \frac{p_j}{M}\right)$$

and

$$\Delta(\Omega') = \Delta(\Omega'' \cup \{h\}) = \Delta(\Omega'')\left(1 - \frac{p_h}{M}\right) + p_h \prod_{j \in \Omega''}\left(1 - \frac{p_j}{M}\right).$$

We find that

$$\Delta(\Omega') = \Delta(\Omega'') + p_h\left(\prod_{j \in \Omega''}\left(1 - \frac{p_j}{M}\right) - \frac{\Delta(\Omega'')}{M}\right) > \Delta(\Omega'') + p_k\left(\prod_{j \in \Omega''}\left(1 - \frac{p_j}{M}\right) - \frac{\Delta(\Omega'')}{M}\right) = \Delta(\Omega)$$

since $p_h > p_k$ (the bracketed term is positive; otherwise removing device $k$ alone would already not decrease the social welfare). In other words, if $\Omega$ does not contain consecutively numbered devices, we can remove the device with the largest number and add one with a smaller number to increase the social welfare. Therefore, the set $\Omega$ cannot be the social optimal solution. For any set $\Omega$ that does not contain consecutively numbered devices, there exists a new set $\Omega'$ of consecutively numbered devices that brings higher social welfare. A social optimal strategy set must therefore contain consecutively numbered devices.

Lastly, from Eq. (5.6), we derive the difference between the social welfare before and after device $i$ is added:

$$\Delta(\Omega \cup \{i\}) - \Delta(\Omega) = p_i\left(\prod_{j \in \Omega}\left(1 - \frac{p_j}{M}\right) - \frac{\Delta(\Omega)}{M}\right) = p_i\left(\prod_i\left(1 - \frac{s_i p_i}{M}\right) - \frac{\Delta(\Omega)}{M}\right).$$

Device $i$ is included in the social optimal outcome if and only if the difference remains positive, or equivalently, the left-hand side of Eq. (5.5) is positive. This completes the proof.

Different from the proportionally fair strategy profile, the social optimal strategy profile lets only the devices with higher harvesting probability choose the highest transmission probability, while prohibiting the others from transmitting. From the viewpoint of maximizing system performance, allocating the limited resource to the better devices is the best choice. The theorem also determines the number of devices in the social optimal strategy profile. To express the result in a practical form, we design an algorithm that implements the social optimal outcome as follows. The algorithm is used in the simulation.
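The selection rule of Theorem 5.3 can be sketched in code. This is a minimal illustration rather than a transcription of the thesis algorithm: it adds devices in descending order of $p_i$ while the marginal gain $\Delta(\Omega \cup \{i\}) - \Delta(\Omega)$ obtained from Eq. (5.6) stays positive, and compares the result with a brute-force search over all 0/1 profiles. The parameter values and function names are hypothetical.

```python
from itertools import product as all_profiles
from math import prod

def social_welfare(s, p, M):
    """Sum of utilities: sum_i s_i p_i prod_{j != i} (1 - s_j p_j / M)."""
    n = len(s)
    return sum(s[i] * p[i] * prod(1 - s[j] * p[j] / M for j in range(n) if j != i)
               for i in range(n))

def greedy_social_optimum(p, M):
    """Add devices in descending p_i while the marginal gain from Eq. (5.6) is positive."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    welfare = 0.0   # Delta(Omega) for the current set
    idle = 1.0      # prod_{j in Omega} (1 - p_j / M)
    chosen = set()
    for i in order:
        gain = p[i] * (idle - welfare / M)
        if gain <= 0:
            break
        welfare = welfare * (1 - p[i] / M) + p[i] * idle   # Eq. (5.6)
        idle *= 1 - p[i] / M
        chosen.add(i)
    s = [1.0 if i in chosen else 0.0 for i in range(len(p))]
    return s, welfare

def brute_force_optimum(p, M):
    """Best welfare over all 2^N profiles with s_i in {0, 1} (small N only)."""
    return max(social_welfare(s, p, M) for s in all_profiles([0.0, 1.0], repeat=len(p)))

# Hypothetical harvesting probabilities and channel count.
p, M = [0.9, 0.6, 0.4, 0.2], 2
s_opt, w_opt = greedy_social_optimum(p, M)
print(s_opt, w_opt, brute_force_optimum(p, M))
```

Tracking the running product `idle` keeps each step at constant cost, so the whole procedure is dominated by the initial sort.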
[Figure 5.1: An example of social optimal outcome: device 2 has to choose $s_2 = 1$ and device 3 has to choose $s_3 = 0$. Surface plot of the sum of utility against the strategy of device 3 and the strategy of device 2.]

[Figure 5.2: The social optimal outcome leads to the highest social welfare, compared with other outcomes that are randomly generated. Social welfare of the social optimal outcome and of random outcomes over 1000 random trials.]
Algorithm I: SOCIAL OPTIMAL ALGORITHM(p, N, M)
    p ← DESCENTSORT(p)
    for k ← 2 to N
        do if $\prod_{i \le k}\left(1 - \frac{p_i}{M}\right) - \frac{1}{M}\sum_{i \le k} p_i \prod_{j \neq i,\, j \le k}\left(1 - \frac{p_j}{M}\right) < 0$
            then break
    return (k)

The algorithm first sorts the devices by their energy harvesting probability in descending order. Then it adds the device with the largest $p_i$ in the list, then the second largest one, and so on. The process terminates when the condition can no longer be satisfied.

In Fig. 5.1, we use a simple system with four devices to show the social optimal outcome. We calculate that the social optimal outcome is $(s_1, s_2, s_3, s_4) = (1, 1, 0, 0)$. Because a figure can show only two strategy dimensions, Fig. 5.1 illustrates the effect of the strategies of device 2 and device 3 on the social welfare. To maximize the social welfare, device 2 has to choose $s_2 = 1$ and device 3 has to choose $s_3 = 0$. Fig. 5.2 shows that the outcome derived by Algorithm I indeed maximizes the social welfare.

5.3 Adopting the Incentive Schemes to Achieve the Optimal Outcomes

Having proposed two incentive schemes (the pricing scheme and the intervention scheme) and derived two optimal outcomes (the proportionally fair outcome and the social optimal outcome), we now apply the incentive schemes to achieve the optimal outcomes. There are four combinations.

Theorem 5.4. To achieve the proportionally fair outcome by using the pricing scheme,
we have to set a price function $c_i(s_i, p_i)$ that satisfies

$$\left.\frac{\partial c_i(s_i, p_i)}{\partial s_i}\right|_{s_i = \frac{M}{N p_i}} = p_i\left(\frac{N - 1}{N}\right)^{N-1}, \quad \forall i.$$

Proof. Adopting $s_i^* = \dfrac{M}{N p_i}$ in Theorem 4.1, we complete the proof.

Theorem 5.5. To achieve the social optimal outcome by using the pricing scheme, we have to set a price function $c_i(s_i, p_i)$ that satisfies

$$\left.\frac{\partial c_i(s_i, p_i)}{\partial s_i}\right|_{s_i = 1} = p_i\left(1 - \frac{p_i}{M}\right)^{N-1}, \quad \forall i \le k$$

and

$$\left.\frac{\partial c_i(s_i, p_i)}{\partial s_i}\right|_{s_i = 0} = p_i, \quad \forall i > k$$

where the value $k$ is determined by Theorem 5.3.

Proof. Adopting $s_i^* = 1$ for all $i \le k$ and $s_i^* = 0$ for all $i > k$ in Theorem 4.1, we complete the proof.

Theorem 5.6. To achieve the proportionally fair outcome by using the intervention scheme, we have to set the intervention function

$$g^*(s) = \left[\sum_{i=1}^{N} \frac{s_i - \frac{M}{N p_i}}{\frac{M}{N p_i}}\right]_0^1.$$

Proof. Adopting $s_i^* = \dfrac{M}{N p_i}$ in Theorem 4.2, we complete the proof.

Here we have presented how to implement the proportionally fair outcome and the social optimal outcome by using the pricing scheme and the intervention scheme. Note that, theoretically, the social optimal outcome cannot be implemented by the intervention scheme, since the intervention function is not well-defined if an element $\tilde{s}_i$ of the target outcome is zero. However, one can still approximate the social optimal outcome by choosing an infinitesimal strategy that is extremely close to zero. That is, with an arbitrarily chosen $\delta \to 0$, we can
construct a sub-optimal strategy $s_i = 1$ for $i \le k$ and $s_i = \delta$ for $i > k$. From Theorem 4.2, we know that the sub-optimal strategy constitutes a NE and leads to social welfare equal to

$$SW^{\delta} = \sum_{i=1}^{k} p_i \prod_{\substack{j \neq i \\ j \le k}}\left(1 - \frac{p_j}{M}\right)\prod_{\substack{j \neq i \\ j > k}}\left(1 - \frac{\delta p_j}{M}\right) + \sum_{i=k+1}^{N} \delta p_i \prod_{\substack{j \neq i \\ j \le k}}\left(1 - \frac{p_j}{M}\right)\prod_{\substack{j \neq i \\ j > k}}\left(1 - \frac{\delta p_j}{M}\right).$$

On the other hand, the optimal strategy leads to social welfare equal to

$$SW^{0} = \sum_{i=1}^{k} p_i \prod_{\substack{j \neq i \\ j \le k}}\left(1 - \frac{p_j}{M}\right).$$

Since $\delta \to 0$, the ratio $SW^{\delta}/SW^{0}$ can be approximated as

$$\frac{SW^{\delta}}{SW^{0}} \approx \frac{\sum_{i=1}^{k} p_i \prod_{j \neq i,\, j \le k}\left(1 - \frac{p_j}{M}\right)\prod_{j \neq i,\, j > k}\left(1 - \frac{\delta p_j}{M}\right)}{\sum_{i=1}^{k} p_i \prod_{j \neq i,\, j \le k}\left(1 - \frac{p_j}{M}\right)},$$

which is bounded by

$$\left(1 - \frac{\delta}{M}\right)^{N-k} \le \frac{SW^{\delta}}{SW^{0}} \le 1.$$

Even in the worst case, the social welfare of the sub-optimal strategy is a fraction $\left(1 - \frac{\delta}{M}\right)^{N-k}$ of that of the optimal strategy. We conclude that, when an infinitesimal strategy is used to approximate the social optimum under the intervention scheme, the social welfare shrinks at most to a fraction $\left(1 - \frac{\delta}{M}\right)^{N-k}$ of the optimum. The larger the $\delta$ we choose, the more social welfare we lose.

In implementing these two incentives, we have to bear in mind that the two schemes have a fundamental difference. The pricing scheme charges different prices to different devices, so the BS has to design proper pricing functions to achieve the target outcome. On the contrary, the intervention scheme imposes the same level of intervention on all devices, which can be implemented by dropping a certain percentage of the packets that the BS receives. Therefore, to implement the pricing scheme, we have to install the pricing program on the device side. On the other hand, to implement the intervention scheme,
we have to install the intervention program on the BS side. We recommend the intervention scheme, since changes on the BS side are usually easier to make.
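The bound on the welfare loss of the $\delta$-approximation discussed in this section can be checked numerically. The sketch below uses hypothetical values (`p`, `M`, and `k`, with `p` sorted in descending order and `k = 2` being the social optimal number of transmitting devices for these values) and computes the exact ratio $SW^{\delta}/SW^{0}$ together with the lower bound $(1 - \delta/M)^{N-k}$ for a few choices of $\delta$.

```python
from math import prod

def social_welfare(s, p, M):
    """Sum of utilities: sum_i s_i p_i prod_{j != i} (1 - s_j p_j / M)."""
    n = len(s)
    return sum(s[i] * p[i] * prod(1 - s[j] * p[j] / M for j in range(n) if j != i)
               for i in range(n))

# Hypothetical setting: p sorted in descending order, the first k devices transmit.
p, M, k = [0.9, 0.6, 0.4, 0.2], 2, 2
N = len(p)

sw_optimal = social_welfare([1.0] * k + [0.0] * (N - k), p, M)

results = []
for delta in (0.1, 0.01, 0.001):
    # Devices k+1..N transmit with a small probability delta instead of 0.
    sw_delta = social_welfare([1.0] * k + [delta] * (N - k), p, M)
    ratio = sw_delta / sw_optimal
    lower_bound = (1 - delta / M) ** (N - k)
    results.append((delta, ratio, lower_bound))
    print(delta, ratio, lower_bound)
```

For these values the ratio approaches 1 as $\delta$ shrinks, in line with the remark that a larger $\delta$ costs more social welfare.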
Chapter 6 Extension: Multi-Period Model

In the previous chapters, we constructed a one-shot game model and proposed two incentive schemes to achieve the target equilibrium. In this chapter, we construct an extension model that considers a multi-period energy-harvesting WSN. Since energy can be stored for future use, the devices face a new trade-off in the multi-period energy-harvesting WSN.

6.1 Model Setting

Adopting the notation of the previous model, there are $N$ devices competing for $M$ transmission channels. A collision occurs if more than one device requests the same channel. We consider a slotted-time model. There are $T$ periods, in each of which energy arrives with a different probability. Since the value of information increases with the number of successful transmissions, the devices aim to maximize the number of successful transmissions during these $T$ periods. At period 0, device $i$ chooses a transmission probability $s_i$, which cannot be changed during the $T$ periods. The probability of harvesting energy varies with time and is denoted by $p_i^t$ at period $t$. The energy capacity is another important issue. However, to keep the result
tractable, we consider a simple scenario where each device can store at most one unit of energy.

6.2 The Probability of Harvesting Energy

Since energy can be stored for future use, the probability of having energy at period $t$ depends on the strategy in previous periods. To clarify, we denote by $p_i^t$ the harvesting probability, i.e., the probability that the device harvests a unit of energy at period $t$, and by $q_i^t$ the powered-up probability, i.e., the probability that the device has enough energy to transmit at period $t$. Notice that $q_i^t \ge p_i^t$: if the device has enough energy to transmit (i.e., the device is powered up), the energy comes either from the energy harvested at period $t$ or from the energy stored in previous periods. For example, at period 1

$$q_i^1 = p_i^1$$

and at period 2

$$q_i^2 = 1 - (1 - p_i^2)\left(1 - q_i^1(1 - s_i)\right).$$

The device has one unit of energy at period 2 if it harvests a unit of energy at period 2 or if it harvested energy in the past and did not use it. Therefore, the general form of $q_i^t$ at period $t$ is

$$q_i^t = 1 - (1 - p_i^t)\left(1 - q_i^{t-1}(1 - s_i)\right). \tag{6.1}$$

We find that $q_i^t$ is a recursive function of $q_i^{t-1}$. By rearranging the recursion (Eq. (6.1)), we can express $q_i^t$ in terms of $(p_i^1, p_i^2, \ldots, p_i^t)$ and $s_i$.

Lemma 6.1. The probability that the device has enough energy to transmit at period $t$ can be expressed as

$$q_i^t = \sum_{n=1}^{t} p_i^n \prod_{m=n+1}^{t} (1 - p_i^m)(1 - s_i). \tag{6.2}$$
Proof. First, we rephrase Eq. (6.1) as

$$q_i^t = p_i^t + (1 - p_i^t)(1 - s_i)\, q_i^{t-1}.$$

We define the summation factor

$$b^t \equiv (1 - p_i^t)(1 - s_i)$$

and divide both sides by $\prod_{m=2}^{t} b^m$:

$$\frac{q_i^t}{\prod_{m=2}^{t} b^m} = \frac{q_i^{t-1}}{\prod_{m=2}^{t-1} b^m} + \frac{p_i^t}{\prod_{m=2}^{t} b^m}.$$

After summing up both sides of the equation over the periods, we have

$$\frac{q_i^t}{\prod_{m=2}^{t} b^m} = q_i^1 + \sum_{n=2}^{t} \frac{p_i^n}{\prod_{m=2}^{n} b^m}$$

or equivalently,

$$q_i^t = q_i^1 \prod_{m=2}^{t} b^m + \sum_{n=2}^{t} p_i^n \prod_{m=n+1}^{t} b^m = \sum_{n=1}^{t} p_i^n \prod_{m=n+1}^{t} b^m.$$

Replacing $b^t$ with $(1 - p_i^t)(1 - s_i)$, we derive the solution

$$q_i^t = \sum_{n=1}^{t} p_i^n \prod_{m=n+1}^{t} (1 - p_i^m)(1 - s_i).$$

From this lemma, we can determine $q_i^t$ from the energy harvesting probabilities $(p_i^1, p_i^2, \ldots, p_i^t)$ and the strategy $s_i$.
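Lemma 6.1 can be cross-checked by evaluating the recursion (6.1) and the closed form (6.2) side by side. The sketch below is an illustration with hypothetical inputs; the list `p` holds $p_i^1, \ldots, p_i^T$ for a single device, and the function names are not from the thesis.

```python
from math import prod

def powered_up_recursive(p, s):
    """q^t from Eq. (6.1); p[t-1] is the harvesting probability in period t."""
    q = []
    previous = 0.0  # nothing is stored before period 1, which yields q^1 = p^1
    for p_t in p:
        previous = 1 - (1 - p_t) * (1 - previous * (1 - s))
        q.append(previous)
    return q

def powered_up_closed_form(p, s):
    """q^t from Eq. (6.2), with 0-based indices n and m standing for periods n+1 and m+1."""
    return [sum(p[n] * (1 - s) ** (t - n) * prod(1 - p[m] for m in range(n + 1, t + 1))
                for n in range(t + 1))
            for t in range(len(p))]

# Hypothetical harvesting probabilities for three periods and a transmission probability.
p, s = [0.2, 0.5, 0.7], 0.4
q_recursive = powered_up_recursive(p, s)
q_closed = powered_up_closed_form(p, s)
print(q_recursive)
print(q_closed)
```

The closed form collects the factor $(1 - s_i)$ once per period between harvesting and use, which is why it appears as the power $(1 - s_i)^{t-n}$ in the code.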
6.3 Nash Equilibrium: Intersection of Best Responses

In this section, we provide a mathematical solution for the Nash equilibrium of the multi-period model. We first introduce the best response, that is, the strategy that produces the most favorable outcome for a device, taking the other devices' strategies as given. If all devices select a best response to the other players' strategies, the resulting profile is a Nash equilibrium. From Lemma 6.1, we have

$$q_i^t = \sum_{n=1}^{t} p_i^n (1 - s_i)^{t-n} \prod_{m=n+1}^{t} (1 - p_i^m)$$

and the multi-period game can be formulated as

$$\Gamma_{MP} = \langle \mathcal{N}, (S_i)_{i \in \mathcal{N}}, (u_i^M)_{i \in \mathcal{N}} \rangle$$

where the utility is the sum of the probabilities of successful transmission:

$$u_i^M = \sum_{t=1}^{T} s_i q_i^t \prod_{j \neq i}\left(1 - \frac{s_j q_j^t}{M}\right).$$

First, we prove that a Nash equilibrium exists in the game $\Gamma_{MP}$.

Lemma 6.2. A pure-strategy Nash equilibrium exists if the strategy sets $S_i$ are nonempty compact convex subsets of a Euclidean space and the utility functions $u_i$ are continuous in $s$ and quasi-concave in $s_i$. [21]

Theorem 6.3. In the multi-period game $\Gamma_{MP}$, a pure-strategy Nash equilibrium exists.

Proof. First, the strategy sets $S_i$ in the multi-period game $\Gamma_{MP}$ are nonempty compact convex subsets of a Euclidean space, and the utility functions $u_i$ are continuous in $s_i$. Second, we prove that the utility function is concave with respect to the strategy $s_i$. Replacing $q_i^t$ with Eq. (6.2) and setting $s_i^t = s_i$ for all periods $t$, we find that the utility

$$u_i^M = \sum_{t=1}^{T} s_i \sum_{n=1}^{t} p_i^n (1 - s_i)^{t-n} \prod_{m=n+1}^{t} (1 - p_i^m) \prod_{j \neq i}\left(1 - \frac{s_j q_j^t}{M}\right)$$
is a polynomial function of the variable $s_i$, which can be rephrased as

$$u_i^M = s_i \sum_{m=0}^{T-1} C_{m,i} (1 - s_i)^m$$

where the constant

$$C_{m,i} = \prod_{j \neq i}\left(1 - \frac{s_j q_j^{m+1}}{M}\right) \sum_{t=1}^{T-m} p_i^t \prod_{n=1}^{m} (1 - p_i^{t+n})$$

is independent of the strategy $s_i$. Since $s_i \in [0, 1]$, we examine the dual function

$$\bar{u}_i^M = (1 - s_i) \sum_{m=0}^{T-1} C_{m,i}\, s_i^m,$$

which has the same concavity (or convexity) as $u_i^M$, and prove that $\bar{u}_i^M$ is concave by using the second-order derivative:

$$\begin{aligned} \frac{\partial^2 \bar{u}_i^M}{\partial s_i^2} &= -2\sum_{m=1}^{T-1} m\, C_{m,i}\, s_i^{m-1} + (1 - s_i)\sum_{m=2}^{T-1} C_{m,i}\, m(m - 1)\, s_i^{m-2} \\ &< \sum_{m=2}^{T-1} \left(-2 s_i + (m - 1)(1 - s_i)\right) C_{m,i}\, m\, s_i^{m-2} \\ &= \sum_{m=2}^{T-1} \left((m - 1) - (1 + m) s_i\right) C_{m,i}\, m\, s_i^{m-2} \\ &= 2(1 - 3 s_i) C_2 - T(T - 1) C_{T-1}\, s_i^{T-2} + \sum_{m=3}^{T-1} \left((m - 1) - (1 + m - 1)\right) C_{m,i}\, m\, s_i^{m-2} \\ &< \sum_{m=3}^{T-1} \left((m - 1) - (1 + m - 1)\right) C_{m,i}\, m\, s_i^{m-2} \\ &= -\sum_{m=3}^{T-1} C_{m,i}\, m\, s_i^{m-2} \\ &< 0. \end{aligned} \tag{6.3}$$

That is, both the dual function $\bar{u}_i$ and the utility function $u_i$ are concave in $s_i$. Therefore, the multi-period game $\Gamma_{MP}$ has a pure-strategy Nash equilibrium.
Theorem 6.4. The strategy profile $s^* = (s_1^*, s_2^*, \ldots, s_N^*)$ such that $s_i^* = [\bar{s}_i^*]_0^1$ constitutes a Nash equilibrium in $\Gamma_{MP}$, where $(\bar{s}_1^*, \bar{s}_2^*, \ldots, \bar{s}_N^*)$ solves the following simultaneous equations:

$$\sum_{m=0}^{T-1} (1 - \bar{s}_i^*)^m C_{m,i}^* - \sum_{m=0}^{T-2} \bar{s}_i^* (m + 1)(1 - \bar{s}_i^*)^m C_{m+1,i}^* = 0, \quad \forall i \tag{6.4}$$

where

$$C_{m,i}^* = \prod_{j \neq i}\left(1 - \frac{\bar{s}_j^* q_j^{m+1}}{M}\right) \sum_{t=1}^{T-m} p_i^t \prod_{n=1}^{m} (1 - p_i^{t+n}). \tag{6.5}$$

Proof. Since the utility function

$$u_i^M = s_i \sum_{m=0}^{T-1} C_{m,i} (1 - s_i)^m$$

is concave in $s_i$, we can use the first-order derivative to derive the best response:

$$\frac{\partial u_i^M}{\partial s_i} = \sum_{m=0}^{T-1} C_{m,i} (1 - s_i)^m - s_i \sum_{m=1}^{T-1} m\, C_{m,i} (1 - s_i)^{m-1}.$$

The first-order condition (the best response function) is therefore

$$\sum_{m=0}^{T-1} C_{m,i}^* (1 - s_i^*)^m - s_i^* \sum_{m=1}^{T-1} m\, C_{m,i}^* (1 - s_i^*)^{m-1} = 0, \quad \forall i.$$

Denote the solution of the first-order conditions by $(\bar{s}_1^*, \bar{s}_2^*, \ldots, \bar{s}_N^*)$, which may violate the constraint of the strategy space $[0, 1]$. However, because the utility function is concave, when the unconstrained solution falls outside the strategy space the maximal value is located at the boundary (i.e., $s_i = 0$ or $s_i = 1$). Therefore, the best response is the truncated value of the solution above, that is, $s_i^* = [\bar{s}_i^*]_0^1$.

6.4 Case Study

In this section, we study a special case where $T = 2$ and $N = 2$. Denote the devices by 1 and 2. Then we can rephrase Eq. (6.4) as
$$C_{0,i} + (1 - s_i) C_{1,i} = s_i C_{1,i}$$

or equivalently,

$$(2 s_i - 1) C_{1,i} = C_{0,i}$$

for $i = 1, 2$. Replacing $C_{m,i}$ with Eq. (6.5), we have

$$(2 s_i - 1)\, p_i^1 (1 - p_i^2)\left(1 - \frac{s_j q_j^2}{M}\right) = (p_i^1 + p_i^2)\left(1 - \frac{s_j q_j^1}{M}\right).$$

Replacing $q_j^1 = p_j^1$ and

$$q_j^2 = p_j^1 (1 - p_j^2)(1 - s_j) + p_j^2,$$

we have the following two simultaneous equations to solve:

$$\begin{cases} (2 s_1 - 1)\, p_1^1 (1 - p_1^2)\left(1 - \dfrac{s_2}{M}\left(p_2^1 (1 - p_2^2)(1 - s_2) + p_2^2\right)\right) = (p_1^1 + p_1^2)\left(1 - \dfrac{s_2 p_2^1}{M}\right) \\[2ex] (2 s_2 - 1)\, p_2^1 (1 - p_2^2)\left(1 - \dfrac{s_1}{M}\left(p_1^1 (1 - p_1^2)(1 - s_1) + p_1^2\right)\right) = (p_2^1 + p_2^2)\left(1 - \dfrac{s_1 p_1^1}{M}\right) \end{cases}$$

which can be rephrased as

$$\begin{cases} (2 s_1 - 1)\, a_1 (1 + a_2 s_2^2 + a_3 s_2) = a_4 s_2 + a_5 \\ (2 s_2 - 1)\, b_1 (1 + b_2 s_1^2 + b_3 s_1) = b_4 s_1 + b_5 \end{cases}$$
where

$$a_1 = p_1^1 (1 - p_1^2), \quad a_2 = \frac{p_2^1 (1 - p_2^2)}{M}, \quad a_3 = -\frac{p_2^1 (1 - p_2^2) + p_2^2}{M}, \quad a_4 = -\frac{(p_1^1 + p_1^2)\, p_2^1}{M}, \quad a_5 = p_1^1 + p_1^2$$

and

$$b_1 = p_2^1 (1 - p_2^2), \quad b_2 = \frac{p_1^1 (1 - p_1^2)}{M}, \quad b_3 = -\frac{p_1^1 (1 - p_1^2) + p_1^2}{M}, \quad b_4 = -\frac{(p_2^1 + p_2^2)\, p_1^1}{M}, \quad b_5 = p_2^1 + p_2^2.$$

By assuming that the two devices are located in the same environment, we have $p_1^t = p_2^t$ for $t = 1, 2$ and therefore $a_i = b_i$ for $i = 1, 2, 3, 4, 5$. By symmetry, the solution satisfies $s_1 = s_2 = s$. Then we can simplify the problem to

$$(2s - 1)\, b_1 (1 + b_2 s^2 + b_3 s) = b_4 s + b_5$$

or equivalently,

$$2 b_1 b_2 s^3 + b_1 (2 b_3 - b_2) s^2 + \left(b_1 (2 - b_3) - b_4\right) s - b_1 - b_5 = 0,$$

which is a cubic equation that can be solved by the general cubic formula.
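As a worked instance of the symmetric cubic, the sketch below computes the coefficients $b_1, \ldots, b_5$ and solves the equation by bisection instead of the general formula. The parameter values $(p^1, p^2) = (0.2, 0.5)$ and $M = 1$ are those of the two-period simulation in Chapter 7; the bisection bracket $[0, 10]$ and the function names are assumptions made for this illustration.

```python
def symmetric_coefficients(p1, p2, M):
    """b_1..b_5 of the case study when both devices share (p^1, p^2)."""
    b1 = p1 * (1 - p2)
    b2 = p1 * (1 - p2) / M
    b3 = -(p1 * (1 - p2) + p2) / M
    b4 = -(p1 + p2) * p1 / M
    b5 = p1 + p2
    return b1, b2, b3, b4, b5

def cubic(s, b):
    """Left-hand side of 2 b1 b2 s^3 + b1 (2 b3 - b2) s^2 + (b1 (2 - b3) - b4) s - b1 - b5."""
    b1, b2, b3, b4, b5 = b
    return (2 * b1 * b2 * s ** 3 + b1 * (2 * b3 - b2) * s ** 2
            + (b1 * (2 - b3) - b4) * s - b1 - b5)

def bisect_root(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

b = symmetric_coefficients(0.2, 0.5, 1)
root = bisect_root(lambda s: cubic(s, b), 0.0, 10.0)  # bracket is an assumption
clipped = min(max(root, 0.0), 1.0)                    # truncation to the strategy space
print(b, root, clipped)
```

The unconstrained root lies outside $[0, 1]$ for these values, so the truncation step of Theorem 6.4 determines the strategy.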
Chapter 7 Numerical Results

In this chapter, we build EH-RAC systems and examine the effect of parameter changes on the system performance (i.e., social welfare and fairness index). We also numerically verify the case study of the multi-period EH-RAC game. The simulation gives more insight into the case study.

7.1 System Performance With Different Parameters

We discuss how the device parameters affect the system performance. In Fig. 7.1(a) and Fig. 7.1(b), we build a system with $N = 100$ devices and $M = 5$ transmission channels. There are 9 groups of devices whose average harvesting probabilities are 0.1, 0.2, ..., 0.9 respectively. The harvesting probabilities within each group follow the normal distributions $N(0.1, 0.1), N(0.2, 0.1), \ldots, N(0.9, 0.1)$ respectively. In Fig. 7.1(a), the social welfare of the social optimal outcome is superior to that of a random outcome. The social welfare of a random outcome decreases as the average energy harvesting probability increases, since the mutual interference grows. In contrast, the social welfare of the social optimal outcome increases with the average energy harvesting probability of the group, since only some specific devices are allowed to transmit in the social optimal algorithm. When the energy harvesting probabilities of those devices increase, their transmission success rate increases, which improves social welfare. On the other
hand, in Fig. 7.1(b), the fairness index of the proportionally fair outcome is superior to that of a random outcome, and it also remains constant across the groups. As we prove in Theorem 5.2, the value of the fairness index is a function of $N$ and $M$. Since each group has the same parameters $N$ and $M$, the fairness indexes are identical.

In Fig. 7.2(a), the value of social welfare increases with the number of transmission resources. When the number of transmission resources is below the number of devices, social welfare increases sharply, mainly because the number of transmitting devices is increasing. Later, when the number of transmission resources is above the number of devices, the number of transmitting devices remains the same, so social welfare increases only because the average energy harvesting probability increases.

[Figure 7.1: The social optimal outcome and the proportionally fair outcome under different choices of parameters. (a) The Social Optimal Outcome: social welfare of the social optimal and random outcomes against the group average EH probability. (b) The Proportionally Fair Outcome: fairness index of the proportionally fair and random outcomes against the group average EH probability.]

[Figure 7.2: The value of social welfare changes with the number of transmission resources and the number of devices. (a) Change in Number of Transmission Resource. (b) Change in Number of Device.]
[Figure 7.3: The value of fairness index changes with the number of transmission resources and the number of devices. (a) Change in Number of Transmission Resource. (b) Change in Number of Device.]

In Fig. 7.2(b), the value of social welfare increases with the number of devices but saturates when the number of devices is too large. In the increasing region, the system still has capacity to accommodate more devices and the social welfare still has room to increase. However, in the saturation region, the transmission resource is fully utilized and the system cannot accommodate any more devices. Therefore, the social welfare stops increasing.

In Fig. 7.3(a), the value of the fairness index increases with the number of transmission resources. If the system has more transmission resources, the contention among devices is reduced and the outcome becomes fairer, for a reason similar to the one given for the social welfare. On the other hand, in Fig. 7.3(b), the value of the fairness index decreases with the number of devices. If there are more devices in the system, the shortage of transmission resources becomes more severe and the fairness becomes lower.

7.2 Simulation of The 2-Period 2-Device EH-RAC Game

We use computer simulation to observe the case discussed in the case study of the multi-period game. We choose $T = 2$, $N = 2$, and $M = 1$. The harvesting probabilities are $(p_1^1, p_1^2) = (p_2^1, p_2^2) = (0.2, 0.5)$. The simulation algorithm is shown in Algorithm I.
In Fig. 7.4(a), the utility function of device 1 increases along the strategy of device 1 but decreases along the strategy of device 2. The utility function is concave and attains its maximum at $s_1^* = 1$ for any $s_2$. Since the concavity of the utility function is not obvious in Fig. 7.4(a), we compute its first-order derivative (Fig. 7.4(b)) and find that its value strictly decreases along the strategy of device 1. Recall that a function whose first-order derivative is strictly decreasing is concave. Moreover, from the previous analysis, we have $b_1 = b_2 = 0.1$, $b_3 = -0.6$, $b_4 = -0.14$, $b_5 = 0.7$. Then the equation $0.02s^3 - 0.13s^2 + 0.4s - 0.8 = 0$ has the unique real solution $s = 4$. Since the utility function $u_1$ is concave in $s_1$, $u_1$ is monotonically increasing for $s \in [0, 4]$. Because strategies are constrained to the strategy space $[0, 1]$, each device chooses $s_i = 1$ for $i = 1, 2$; that is, the strategy profile $(s_1, s_2) = (1, 1)$ is the Nash equilibrium, as shown in Fig. 7.5.

Algorithm I: TWO-PERIOD TWO-PLAYER NASH EQUILIBRIUM(p, M, T)
for i ← 0 to 100 do
    $s_i \leftarrow i/100$
    for j ← 0 to 100 do
        $s_j \leftarrow j/100$
        for t ← 1 to T do
            $q_i^t = \sum_{n=1}^{t} p_i^n (1 - s_i)^{t-n} \prod_{m=n+1}^{t} (1 - p_i^m)$
            $q_j^t = \sum_{n=1}^{t} p_j^n (1 - s_j)^{t-n} \prod_{m=n+1}^{t} (1 - p_j^m)$
        $u_i^M(s_i, s_j) = \sum_{t=1}^{T} s_i q_i^t \left(1 - \frac{s_j q_j^t}{M}\right)$
        $u_j^M(s_i, s_j) = \sum_{t=1}^{T} s_j q_j^t \left(1 - \frac{s_i q_i^t}{M}\right)$
$BR_i(s_j) = \arg\max_{s_i} u_i^M(s_i, s_j)$
$BR_j(s_i) = \arg\max_{s_j} u_j^M(s_i, s_j)$
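The grid search of Algorithm I can be sketched in Python as follows. This is a minimal sketch rather than the code used for the figures: the function names (`energy_prob`, `utility`, `best_response`, `nash_equilibria`) are hypothetical, and locating the equilibrium as the intersection of the two best-response maps is our assumption, since Algorithm I only lists the best responses themselves.

```python
from math import prod


def energy_prob(p, s, t):
    """q^t from Algorithm I for one device.

    p[n-1] is the harvesting probability in period n, s is the strategy.
    """
    return sum(
        p[n - 1]
        * (1 - s) ** (t - n)
        * prod(1 - p[m - 1] for m in range(n + 1, t + 1))
        for n in range(1, t + 1)
    )


def utility(p_own, p_other, s_own, s_other, M, T):
    """u^M from Algorithm I: sum over periods of s*q*(1 - s'*q'/M)."""
    return sum(
        s_own
        * energy_prob(p_own, s_own, t)
        * (1 - s_other * energy_prob(p_other, s_other, t) / M)
        for t in range(1, T + 1)
    )


def best_response(p_own, p_other, s_other, M, T, grid):
    """Grid point that maximizes the device's own utility against s_other."""
    return max(grid, key=lambda s: utility(p_own, p_other, s, s_other, M, T))


def nash_equilibria(p1, p2, M, T, steps=100):
    """Grid profiles (s1, s2) that are mutual best responses (our assumption)."""
    grid = [k / steps for k in range(steps + 1)]
    br1 = {s2: best_response(p1, p2, s2, M, T, grid) for s2 in grid}
    br2 = {s1: best_response(p2, p1, s1, M, T, grid) for s1 in grid}
    return [(s1, s2) for s2, s1 in br1.items() if br2[s1] == s2]


# Parameters of Section 7.2: T = 2, M = 1, both devices harvest with (0.2, 0.5).
p = (0.2, 0.5)
equilibria = nash_equilibria(p, p, M=1, T=2)
print(equilibria)  # [(1.0, 1.0)] for these parameters
```

With the parameters of this section, the sketch returns the single profile (1, 1), in line with the Nash equilibrium reported in Fig. 7.5. The step count of 100 mirrors the loop bounds of Algorithm I; a finer grid only changes the resolution of the best-response curves.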
[Figure 7.4: The utility function $u_1$ is a concave function increasing along the strategy $s_1$. Panels: (a) The utility function (Utility of Device 1 versus Strategy of Device 1 and Strategy of Device 2); (b) The first-order differentiation (First-Order Differentiation of $u_1$ versus Strategy of Device 1 and Strategy of Device 2).]

[Figure 7.5: The Nash equilibrium in the multi-period model. Axes: Strategy of Device 1, Strategy of Device 2. Legend: Nash Equilibrium, Best Response of Device 1, Best Response of Device 2.]

[Figure 7.6: The energy relation of two-period model. Labels: $p_i^1$, $p_i^2$, $p_i^1 (1 - s_i)$; Period 1, Period 2.]

7.3 Extended Simulations of Multi-period Multi-device EH-RAC Game

To gain insight into the Nash equilibrium of the multi-period energy-harvesting RAC game, we use computer simulation to show how the energy harvesting probabilities affect