$$\sum_{q=t}^{\infty}\beta^{q-t}\,\mathbb{E}\left[v_g(a_g^q, \Sigma_g, es_g^q, \tau_g^q)\,\middle|\,\Sigma_g, es_g^t, \tau_g^t\right], \tag{4.9}$$
where β, 0 ≤ β < 1, is the discount factor.
Lastly, we define the (long-term) system data fidelity:
Definition 4.3. [Long-term system data fidelity] The system data fidelity seen in time slot t is
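$$\sum_{g\in\mathcal{G}}\sum_{q=t}^{\infty}\beta^{q-t}\,\mathbb{E}\left[v_g(a_g^q, \Sigma_g, es_g^q, \tau_g^q)\,\middle|\,\Sigma, es^t, \tau^t\right]. \tag{4.10}$$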
Maximizing the system data fidelity will be the main design objective, as shown in Section 4.4.
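The sketch below illustrates (4.10). It evaluates a truncated version of the system data fidelity along a single trajectory and approximates the expectation by averaging over sampled trajectories. The function names, the list-of-lists layout `per_server_values[g][q]` for realized one-shot fidelities, and the finite horizon are all assumptions made for illustration. How the trajectories are sampled is not shown.

```python
from typing import Sequence


def discounted_fidelity(per_server_values: Sequence[Sequence[float]],
                        beta: float, t: int = 0) -> float:
    """Truncated, single-trajectory version of (4.10).

    per_server_values[g][q] is the realized one-shot fidelity v_g in slot q
    (hypothetical layout); slots before t are ignored.
    """
    total = 0.0
    for values in per_server_values:          # sum over servers g
        for q in range(t, len(values)):       # sum over slots q >= t (truncated)
            total += beta ** (q - t) * values[q]
    return total


def monte_carlo_fidelity(sample_paths, beta: float, t: int = 0) -> float:
    """Average of discounted_fidelity over sampled paths, approximating the expectation."""
    return sum(discounted_fidelity(p, beta, t) for p in sample_paths) / len(sample_paths)


# Two servers over three slots with beta = 0.5 (numbers invented for illustration).
print(discounted_fidelity([[1, 1, 1], [2, 0, 2]], beta=0.5))  # 4.25
```

With β < 1 and bounded one-shot fidelities, the tail that is dropped beyond the horizon shrinks geometrically in the horizon length.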
4.4 Node Activation Mechanism
In the proposed framework, the BS makes the node activation decision to maximize the system data fidelity. The data statistics are known only to the servers, and the energy states are known only to the nodes, so the BS requires both to feed back this information. The servers and the nodes (which are owned by the servers) are selfish. They may therefore feed back false information to manipulate the BS's node activation decision whenever doing so improves their own performance, and the overall system performance may then become undesirable. We therefore design a node activation mechanism that induces truthful feedback.

[Figure 4.2: block diagram with blocks labelled Nodes, BS, and Servers; Proposed Node Activation Mechanism, Energy State Transition Dynamics, Value Iteration, Node Activation Policy, Server Pricing Policy, and Update of Energy States in Each Time Slot; other labels: Measurement Sensing, Energy States, Data Statistics Feedback, Feedback, Node Activation, Prices, Activation Interval.]
Figure 4.2: The block diagram of the proposed node activation mechanism. The servers and the nodes are required to feed back the data statistics and the energy states, respectively. The BS makes the node activation decision and also charges prices to the servers.
The proposed node activation mechanism is shown in Fig. 4.2. It consists of a node activation policy and a pricing policy. The node activation policy decides which nodes are activated in each time slot and is designed to maximize the system data fidelity.
The pricing policy decides how much the BS charges each server in each time slot. It is designed to ensure truthful feedback and to achieve several desirable properties. In addition, a value iteration algorithm is proposed to derive the optimal node activation policy and the pricing policy. The details are given below.
4.4.1 Feedback Strategies of the Servers
Each server g is required to feed back its data statistics Σ_g in each time slot. The data statistics do not vary with time, but we still require feedback in every time slot. The reason is that untruthful feedback in one time slot does not prevent truthful feedback in later slots. The proposed node activation policy and pricing policy both depend only on the current feedback, so untruthful feedback in the current slot does not affect node activation and pricing in future slots. Each node (i, g) is likewise required to feed back its energy state $es_{i,g}^t$ in each time slot. We assume that each server chooses a feedback strategy that determines how the server itself and its nodes feed back the data statistics and the energy states. Specifically, when the mechanism starts, each server g chooses a feedback strategy $s_g = (\hat{\Sigma}_g^{\langle t\rangle}, \widehat{es}_g^{\langle t\rangle})$. Under this strategy, in each time slot t, server g feeds back the data statistics $\hat{\Sigma}_g^t$ and each node (i, g) feeds back the energy state $\widehat{es}_{i,g}^t$. The truthful feedback strategy is $s_g^* = (\Sigma_g^{\langle t\rangle}, es_g^{\langle t\rangle})$, where $\Sigma_g^t = \Sigma_g$ for all t. The activation intervals $\tau^{\langle q\le t-1\rangle}$ require no feedback because they are fully controlled by the BS. The collection of the feedback and these activation intervals up to time slot t − 1 is called the history in time slot t and is denoted by $h^t = (\hat{\Sigma}^{\langle q\le t-1\rangle}, \widehat{es}^{\langle q\le t-1\rangle}, \tau^{\langle q\le t-1\rangle})$. In general, the feedback data statistics $\hat{\Sigma}_g^t$ and energy states $\widehat{es}_{i,g}^t$ are functions of the history $h^t$, the true data statistics $\Sigma_g$, and the true energy states $es_{i,g}^t$. For brevity, we omit these arguments and simply write $\hat{\Sigma}_g^t$ and $\widehat{es}_{i,g}^t$.
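To make the notation concrete, the sketch below treats a feedback strategy as a function of the history $h^t$, the true data statistics, and the true energy states, as described above. All names, the tuple-based history, the integer energy levels, and the "over-report by one level" rule are hypothetical choices for illustration. None of them is taken from the framework itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class History:
    """h^t: feedback and activation intervals up to slot t-1 (hypothetical container)."""
    sigma_hat: Tuple = ()   # reported data statistics, one entry per slot
    es_hat: Tuple = ()      # reported energy states, one entry per slot
    tau: Tuple = ()         # activation intervals, set by the BS (no feedback needed)

    def extend(self, sigma_hat_t, es_hat_t, tau_t) -> "History":
        """Return h^{t+1} = (h^t, sigma_hat^t, es_hat^t, tau^t); h^t itself is unchanged."""
        return History(self.sigma_hat + (sigma_hat_t,),
                       self.es_hat + (es_hat_t,),
                       self.tau + (tau_t,))


def truthful_strategy(history: History, sigma_g, es_g):
    """s_g^*: report the true data statistics and node energy states in every slot."""
    return sigma_g, tuple(es_g)


def over_report_strategy(history: History, sigma_g, es_g):
    """Hypothetical untruthful strategy: each node reports one energy level above the truth."""
    return sigma_g, tuple(e + 1 for e in es_g)


def invert_over_report(es_hat_g):
    """Server-side inverse of over_report_strategy, recovering the true energy states."""
    return tuple(e - 1 for e in es_hat_g)
```

The invertible misreport mirrors the assumption, made later for the energy state update, that each server can map its own feedback back to the true energy states of its nodes.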
4.4.2 Node Activation Policy and Value Iteration
The system is Markovian because the energy state and activation interval transitions are Markovian.
The BS's node activation decision can therefore be modelled as a Markov decision process. The data statistics also affect the node activation decision. We therefore propose a node activation policy under which the node activation in each time slot depends on the current feedback of the data statistics and the energy states. Specifically, the proposed node activation policy is denoted by $\pi^* = (\pi_g^*)_{g\in\mathcal{G}} = (\pi_{i,g}^*)_{i\in\mathcal{I}_g, g\in\mathcal{G}}$. The node activation in time slot t is $a^t = \pi^*(\hat\Sigma^t, \widehat{es}^t, \tau^t)$, given the current feedback data statistics $\hat\Sigma^t$, energy state $\widehat{es}^t$, and activation interval $\tau^t$. The proposed node activation policy π* is designed
to maximize the system data fidelity.³
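Design 4.1. [Node activation policy] The node activation policy π* is designed to maximize the system data fidelity for any initial data statistics $\tilde\Sigma$, energy state $\widetilde{es}^0$, and activation interval $\tilde\tau^0$:
$$\pi^* = \arg\max_{a^{\langle t\rangle}} \sum_{g\in\mathcal{G}}\sum_{t=0}^{\infty}\beta^{t}\,\mathbb{E}\left[v_g(a_g^t, \tilde\Sigma_g, \widetilde{es}_g^t, \tilde\tau_g^t)\,\middle|\,\tilde\Sigma, \widetilde{es}^0, \tilde\tau^0\right], \tag{4.11}$$
where the energy state $\widetilde{es}^t$ and the activation interval $\tilde\tau^t$ evolve according to the transition functions E and T, respectively, with $a^t = \pi^*(\tilde\Sigma, \widetilde{es}^t, \tilde\tau^t)$.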
Given the feedback data statistics $\hat\Sigma^t$, energy state $\widehat{es}^t$, and activation interval $\tau^t$ in each time slot t, the node activation is $a^t = \pi^*(\hat\Sigma^t, \widehat{es}^t, \tau^t)$.
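From (4.11), the maximum system data fidelity, denoted by the value function V*, can also be written recursively as follows:
$$V^*(\Sigma, es, \tau) = \max_{a}\Big[\sum_{g\in\mathcal{G}} v_g(a_g, \Sigma_g, es_g, \tau_g) + \beta\sum_{es',\tau'} E(es'\mid es, a)\,T(\tau'\mid\tau, a)\,V^*(\Sigma, es', \tau')\Big]. \tag{4.12}$$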
Equation (4.12) is usually referred to as the Bellman equation. Several algorithms exist for solving it. One of the most widely used is value iteration, because of advantages such as quick convergence and easy implementation, especially when the state space is very large [65]. The value iteration algorithm for deriving the node activation policy π* and the value function V* is given in Algorithm 4.1. It initializes the value function V* arbitrarily (line 1) and then iteratively updates it according to the Bellman equation in (4.12) (lines 2 to 8). The update repeats until the convergence criterion is met, i.e., until the update error δ is less than the specified error tolerance ε (line 9). Lastly, the algorithm outputs the optimal value function V* and node activation policy π* (line 10).

³ Because the system is stationary and Markovian, the optimal policy is stationary and depends only on the current data statistics, energy state, and activation interval.
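Algorithm 4.1: Value iteration algorithm for node activation
1: Given Σ, initialize $V^*(\Sigma, es, \tau)$ arbitrarily for each es and τ.
2: repeat
3:   δ ← 0.
4:   for each es and τ:
5:     v ← $V^*(\Sigma, es, \tau)$.
6:     $V^*(\Sigma, es, \tau)$ ←
7:       $\max_{a}\Big[\sum_{g\in\mathcal{G}} v_g(a_g, \Sigma_g, es_g, \tau_g) + \beta\sum_{es',\tau'} E(es'\mid es, a)\,T(\tau'\mid\tau, a)\,V^*(\Sigma, es', \tau')\Big]$.
8:     δ ← max(δ, |v − $V^*(\Sigma, es, \tau)$|).
9: until δ < ε (a small positive tolerance).
10: Output the value function V* and the policy π*, where $\pi^*(\Sigma, es, \tau)$ is the action a that attains the maximum in line 7.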
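The following Python sketch mirrors Algorithm 4.1 for a small, fully enumerated model, including the in-place update of lines 5 to 8. It rests on several hypothetical encoding choices:
- The joint energy states, activation intervals, and joint actions are enumerated as integers.
- The one-shot system data fidelity $\sum_g v_g$ is tabulated in `reward[es, tau, a]`.
- The transition functions are given as arrays `E[a, es, es_next]` and `T[a, tau, tau_next]`.
- Σ is fixed and is therefore left implicit.

The toy model at the end is invented purely for illustration.

```python
import numpy as np


def value_iteration(reward, E, T, beta=0.9, eps=1e-8, max_sweeps=100_000):
    """Sketch of Algorithm 4.1 (Sigma fixed and implicit; hypothetical integer encoding).

    reward[es, tau, a]  : one-shot system data fidelity sum_g v_g(a_g, Sigma_g, es_g, tau_g)
    E[a, es, es_next]   : E(es' | es, a)
    T[a, tau, tau_next] : T(tau' | tau, a)
    Returns the value function V[es, tau] and a greedy policy[es, tau] (action index).
    """
    n_es, n_tau, n_a = reward.shape

    def q_values(V, es, tau):
        # Bracketed term of line 7 for every action a; E[a, es] @ V @ T[a, tau]
        # equals sum_{es', tau'} E(es'|es,a) T(tau'|tau,a) V(es', tau').
        return np.array([reward[es, tau, a] + beta * (E[a, es] @ V @ T[a, tau])
                         for a in range(n_a)])

    V = np.zeros((n_es, n_tau))                        # line 1: arbitrary initialization
    for _ in range(max_sweeps):                        # line 2: repeat
        delta = 0.0                                    # line 3
        for es in range(n_es):                         # line 4: for each es and tau
            for tau in range(n_tau):
                v = V[es, tau]                         # line 5
                V[es, tau] = q_values(V, es, tau).max()    # lines 6-7 (in place)
                delta = max(delta, abs(v - V[es, tau]))    # line 8
        if delta < eps:                                # line 9: until delta < eps
            break
    # Line 10: the policy picks the action attaining the max in line 7 under the final V.
    policy = np.array([[int(np.argmax(q_values(V, es, tau))) for tau in range(n_tau)]
                       for es in range(n_es)])
    return V, policy


# Hypothetical toy: one node with energy state 0 (empty) or 1 (charged), a single
# activation-interval value, and actions 0 = sleep (recharge) and 1 = activate.
reward = np.zeros((2, 1, 2))
reward[1, 0, 1] = 1.0          # activating a charged node yields fidelity 1
E = np.zeros((2, 2, 2))
E[0, :, 1] = 1.0               # sleeping always recharges
E[1, :, 0] = 1.0               # activating always drains the battery
T = np.ones((2, 1, 1))         # trivial activation-interval dynamics
V, policy = value_iteration(reward, E, T, beta=0.9)
print(policy.ravel())          # sleep when empty, activate when charged
```

The sweep updates V in place, as lines 5 to 8 of Algorithm 4.1 do, so later states in the same sweep already use the freshly updated values. A copy-then-update (Jacobi-style) variant converges to the same fixed point when β < 1.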
Lastly, we denote by the value function $V_{-g}^*$ the system data fidelity of all servers other than server g (and its nodes). For use in the pricing policy in Section 4.4.4, we also design a node activation policy $\pi_{-g}$ that maximizes the system data fidelity with server g excluded from the system. The system data fidelity with server g excluded is denoted by $V_{-g}$.
4.4.3 Update of the Energy State Information
We now discuss how the BS and the servers update the probability mass function (pmf) $\Delta(es^t)$ of the energy state after the feedback in each time slot t.⁴ The time instant after the feedback is called the ex-post stage of the time slot. Two other stages can also be defined: the ex-ante stage and the interim stage [57]. Since any property that holds ex post also holds in the interim and ex-ante stages, we focus on the ex-post stage. Unless otherwise stated, the system data fidelity in Definition 4.3 and everything defined or designed later are understood in the ex-post sense. There is no need to update the information on the data statistics and the activation intervals: the former are time-invariant, and the latter are fully controlled by the BS.
After each node feeds back its energy state, the BS updates the pmf $\Delta(es^t\mid h^{t+1})$ by Bayes' rule as follows:
$$\Delta(es^t\mid h^{t+1}) = \frac{\Delta(es^t\mid h^t)\,\Delta(\widehat{es}^t\mid es^t, h^t)}{\sum_{es^t}\Delta(es^t\mid h^t)\,\Delta(\widehat{es}^t\mid es^t, h^t)}, \tag{4.13}$$
where $\Delta(es^t\mid h^t)$ is derived via the energy state transition function E as follows:
$$\Delta(es^t\mid h^t) = \sum_{es^{t-1}}\Delta(es^{t-1}\mid h^t)\,E\big(es^t\mid es^{t-1}, \pi^*(\hat\Sigma^{t-1}, \widehat{es}^{t-1}, \tau^{t-1})\big). \tag{4.14}$$

⁴ Since each node's feedback follows its server's feedback strategy, we keep the nodes' functions as simple as possible and assume that the nodes do not need to update the system energy state information.
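When the joint energy state is enumerated as integers, (4.14) becomes a vector-matrix product and (4.13) becomes an element-wise product followed by normalization. The sketch below assumes a hypothetical report-likelihood vector that encodes the BS's model of $\Delta(\widehat{es}^t\mid es^t, h^t)$. Under truthful feedback this vector is one-hot at the reported state, and the posterior collapses onto that state. The function names and numbers are illustrative only.

```python
import numpy as np


def predict(posterior_prev: np.ndarray, E_a: np.ndarray) -> np.ndarray:
    """(4.14): Delta(es^t|h^t) = sum_{es^{t-1}} Delta(es^{t-1}|h^t) E(es^t|es^{t-1}, a^{t-1}).

    posterior_prev[i] = Delta(es^{t-1} = i | h^t)
    E_a[i, j]         = E(es^t = j | es^{t-1} = i, a^{t-1}); rows sum to 1
    """
    return posterior_prev @ E_a


def bayes_update(prior: np.ndarray, report_likelihood: np.ndarray) -> np.ndarray:
    """(4.13): posterior proportional to prior times Delta(es_hat^t | es^t, h^t).

    report_likelihood[j] is the probability of the observed report when the true
    state is j, taken from a hypothetical model of the feedback strategy.
    """
    unnormalized = prior * report_likelihood
    total = unnormalized.sum()
    if total <= 0:
        raise ValueError("the observed report has zero probability under the prior")
    return unnormalized / total


# Toy two-state example (numbers invented for illustration).
prior = predict(np.array([1.0, 0.0]), np.array([[0.7, 0.3], [0.2, 0.8]]))  # [0.7, 0.3]
posterior = bayes_update(prior, np.array([0.1, 0.9]))  # report "state 1", noisy model
```

The servers' update in (4.15) and (4.16) has the same predict-then-normalize structure. It applies to the other servers' energy states, with server g's own energy states held fixed.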
The feedback may be untruthful. However, we assume that each server g can map its feedback back to the real energy states of its nodes, i.e., it can apply the inverse of its own feedback strategy. Each server g therefore knows the energy state $es_g^t$ and updates the pmf $\Delta(es^t\mid es_g^t, h^{t+1})$ as follows:
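$$\Delta(es^t\mid es_g^t, h^{t+1}) = \frac{\Delta(es^t\mid es_g^t, h^t)\,\Delta(\widehat{es}^t\mid es^t, h^t)}{\sum_{es_{-g}^t}\Delta(es^t\mid es_g^t, h^t)\,\Delta(\widehat{es}^t\mid es^t, h^t)}, \tag{4.15}$$
where $\Delta(es^t\mid es_g^t, h^t)$ is derived via the energy state transition function $E_{-g}$ as follows:
$$\Delta(es^t\mid es_g^t, h^t) = \sum_{es_{-g}^{t-1}}\Delta(es^{t-1}\mid es_g^{t-1}, h^t)\,E_{-g}\big(es_{-g}^t\mid es_{-g}^{t-1}, \pi_{-g}^*(\hat\Sigma^{t-1}, \widehat{es}^{t-1}, \tau^{t-1})\big). \tag{4.16}$$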
4.4.4 Server Pricing Policy
We propose a pricing policy under which the BS charges each server both for using the network service and for ensuring truthful feedback. The proposed pricing policy also achieves several desirable properties. Its basic idea comes from the Vickrey-Clarke-Groves (VCG) mechanism [58, 59, 60]. The proposed node activation mechanism operates over multiple time slots, so we extend the VCG design idea from a single time slot to multiple time slots. Specifically, after the feedback in each time slot t, the BS pays each server g the amount $\sum_{k\in\mathcal{G}\setminus g} v_k(\pi_k^*(\hat\Sigma^t, \widehat{es}^t, \tau^t), \hat\Sigma_k^t, \widehat{es}_k^t, \tau_k^t)$, i.e., the one-shot system data fidelity of all servers other than server g. To balance this payment, the BS also charges each server g the
one-shot price
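$$\kappa_g^t(h^t, \hat\Sigma^t, \widehat{es}^t) = V_{-g}(\hat\Sigma_{-g}^t, \widehat{es}_{-g}^t, \tau_{-g}^t) - \beta\,\mathbb{E}\left[V_{-g}(\hat\Sigma_{-g}^t, es_{-g}^{t+1}, \tau_{-g}^{t+1})\,\middle|\,h^{t+1}\right], \tag{4.17}$$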
where $h^{t+1} = (h^t, \hat\Sigma^t, \widehat{es}^t, \tau^t)$. As in the VCG mechanism, we want the time-discounted sum of $\kappa_g^t$ to equal $V_{-g}$, i.e., the system data fidelity with server g excluded from the system. Directly applying the VCG mechanism would mean charging $\sum_{k\in\mathcal{G}\setminus g} v_k(\pi_k^{-g}, \hat\Sigma_{-g}^t, \widehat{es}_{-g}^t, \tau_{-g}^t)$, i.e., the one-shot system data fidelity with server g excluded from the system.
However, this charge does not make the time-discounted sum equal to $V_{-g}$. We therefore decompose $V_{-g}$ over time into the terms $\kappa_g^t$ in (4.17) instead. The proposed pricing policy, denoted by $\rho^{\langle t\rangle} = (\rho_g^{\langle t\rangle})_{g\in\mathcal{G}}$, is then designed as follows:
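Design 4.2. [Pricing policy] The one-shot price the BS charges each server g in time slot t is
$$\rho_g^t(h^t, \hat\Sigma^t, \widehat{es}^t) = -\sum_{k\in\mathcal{G}\setminus g} v_k\big(\pi_k^*(\hat\Sigma^t, \widehat{es}^t, \tau^t), \hat\Sigma_k^t, \widehat{es}_k^t, \tau_k^t\big) + \kappa_g^t(h^t, \hat\Sigma^t, \widehat{es}^t), \tag{4.18}$$
where $\kappa_g^t$ is given in (4.17).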
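The arithmetic of (4.17) and (4.18), and the reason κ is built this way, can be checked numerically. The check uses a deterministic path, which is a simplifying assumption: each conditional expectation in (4.17) then equals the realized next value of $V_{-g}$. Along such a path the discounted sum of κ telescopes:
$$\sum_{q=0}^{N-1}\beta^q (V_q - \beta V_{q+1}) = V_0 - \beta^N V_N.$$
This approaches $V_0$ as N grows when β < 1 and $V_{-g}$ is bounded. The function names and scalar inputs are hypothetical.

```python
def kappa(v_minus_g_now: float, expected_v_minus_g_next: float, beta: float) -> float:
    """Balancing term of (4.17): V_{-g} now minus beta times the expected next V_{-g}."""
    return v_minus_g_now - beta * expected_v_minus_g_next


def one_shot_price(others_fidelity_now: float, v_minus_g_now: float,
                   expected_v_minus_g_next: float, beta: float) -> float:
    """Price of (4.18): minus the other servers' one-shot fidelity, plus kappa."""
    return -others_fidelity_now + kappa(v_minus_g_now, expected_v_minus_g_next, beta)


def discounted_kappa_sum(v_minus_g_path, beta: float) -> float:
    """sum_{q=0}^{N-1} beta^q * kappa^q along a deterministic path V_0, ..., V_N."""
    total = 0.0
    for q in range(len(v_minus_g_path) - 1):
        total += beta ** q * kappa(v_minus_g_path[q], v_minus_g_path[q + 1], beta)
    return total


path = [5.0, 3.0, 4.0, 6.0]                  # hypothetical V_{-g} values along a path
print(discounted_kappa_sum(path, 0.5))       # 4.25 = 5.0 - 0.5**3 * 6.0
print(one_shot_price(2.0, 5.0, 3.0, 0.5))    # 1.5
```

The same cancellation should hold in expectation for random paths by the law of iterated expectations. This assumes truthful feedback and assumes that the conditional expectations in (4.17) are taken under the true energy state dynamics.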
4.4.5 Utility of the Servers
With the proposed node activation policy π* and pricing policy $\rho^{\langle t\rangle}$, we define the ex-post utility of each server g, denoted by $u_g(s\mid\Sigma_g, es_g^t, h^{t+1})$, as its data fidelity minus the price:
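Definition 4.4. [Ex-post utility] The utility of server g seen in time slot t is
$$u_g(s\mid\Sigma_g, es_g^t, h^{t+1}) = \sum_{q=t}^{\infty}\beta^{q-t}\,\mathbb{E}\left[v_g\big(\pi_g^*(\hat\Sigma^q, \widehat{es}^q, \tau^q), \Sigma_g, es_g^q, \tau_g^q\big) - \rho_g^q(h^q, \hat\Sigma^q, \widehat{es}^q)\,\middle|\,\Sigma_g, es_g^t, h^{t+1}\right]. \tag{4.19}$$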
Each server g maximizes its own utility by choosing its feedback strategy $s_g$. As will be proved in Section 4.6, the proposed node activation mechanism induces the truthful
feedback strategy $s_g^*$ from each server g.