Node Activation Mechanism - Mechanism Design for IoT Information Collection and Group Multicast

$$\sum_{q=t}^{\infty} \beta^{q-t}\, \mathbb{E}\big[ v_g(a_g^q, \Sigma_g, es_g^q, \tau_g^q) \,\big|\, \Sigma_g, es_g^t, \tau_g^t \big], \tag{4.9}$$

where $\beta$, $0 \le \beta \le 1$, is the discount factor.

Lastly, we define the (long-term) system data fidelity:

Definition 4.3. [Long-term system data fidelity] The system data fidelity seen in time slot $t$ is

$$\sum_{g \in G} \sum_{q=t}^{\infty} \beta^{q-t}\, \mathbb{E}\big[ v_g(a_g^q, \Sigma_g, es_g^q, \tau_g^q) \,\big|\, \Sigma, es^t, \tau^t \big]. \tag{4.10}$$
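As a minimal numerical sketch of the discounting in (4.9) and (4.10), the snippet below sums hypothetical realized per-slot fidelities instead of taking the expectation, and truncates the infinite horizon at the length of the given sequence. The function names, the server labels, and the numbers are illustrative assumptions, not part of the proposed framework.

```python
from typing import Dict, Sequence


def discounted_fidelity(per_slot: Sequence[float], beta: float) -> float:
    """Return the sum over offsets k of beta**k * per_slot[k].

    per_slot[0] is the fidelity in the current slot t, per_slot[1] the one
    in slot t + 1, and so on. The infinite sum is truncated at len(per_slot).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return sum((beta ** k) * v for k, v in enumerate(per_slot))


def system_fidelity(per_server: Dict[str, Sequence[float]], beta: float) -> float:
    """Sum the discounted fidelity over all servers g in G."""
    return sum(discounted_fidelity(seq, beta) for seq in per_server.values())


# Hypothetical realized fidelities for two servers (illustrative numbers only).
example = {"g1": [1.0, 1.0, 1.0], "g2": [2.0, 0.0]}
print(system_fidelity(example, beta=0.5))  # 1.75 + 2.0 = 3.75
```

With $\beta = 0.5$, later slots contribute geometrically less, which is why server g1 contributes 1.75 rather than 3.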

Maximizing the system data fidelity will be the main design objective as will be shown in Chapter 4.4.

4.4 Node Activation Mechanism

In the proposed framework, the BS makes the node activation decision to maximize the system data fidelity. Because the data statistics and the energy states are known only to the servers and the nodes, respectively, the BS requires them to feed back this information. Since the servers and the nodes (owned by the servers) are selfish in nature, they may report false values to manipulate the BS's node activation decision if doing so improves their own performance, and the overall system performance may then become undesirable. Therefore, we design a node activation mechanism that induces truthful feedback.

[Figure 4.2 block labels: Nodes, BS, Servers; Energy States (feedback); Data Statistics (feedback); Energy State Transition Dynamics; Value Iteration; Node Activation Policy; Server Pricing Policy; Measurement Sensing; Node Activation; Prices; Activation Interval; Update of Energy States in Each Time Slot; Proposed Node Activation Mechanism]

Figure 4.2: The block diagram of the proposed node activation mechanism. The servers and the nodes are required to feed back the data statistics and the energy states, respectively. The BS makes the node activation decision and also charges the servers prices.

The proposed node activation mechanism is shown in Fig. 4.2. It consists of a node activation policy and a pricing policy. The node activation policy decides which nodes are activated in each time slot and is designed to maximize the system data fidelity.

The pricing policy decides how the BS charges each server in each time slot. It is designed to ensure truthful feedback and to achieve several desirable properties. In addition, a value iteration algorithm is proposed to derive the optimal node activation policy and the pricing policy. The details are specified below.

4.4.1 Feedback Strategies of the Servers

Each server $g$ is required to feed back the data statistics $\Sigma_g$ in each time slot. Although the data statistics do not vary with time, we require the feedback in every time slot for the following reason: if the feedback is untruthful in one time slot, the feedback in future time slots can still be truthful. Because the proposed node activation policy and pricing policy are both based on the current feedback, untruthful current feedback does not affect future node activation and pricing. In addition, each node $(i, g)$ is required to feed back its energy state $es_{i,g}^t$ in each time slot.

We assume that each server adopts a feedback strategy that decides how the server itself and its nodes feed back the data statistics and the energy states. Specifically, when the mechanism starts, each server $g$ decides a feedback strategy $s_g = (\hat{\Sigma}_g^{\langle t \rangle}, \widehat{es}_g^{\langle t \rangle})$, under which server $g$ feeds back the data statistics $\hat{\Sigma}_g^t$ and each node $(i, g)$ feeds back the energy state $\widehat{es}_{i,g}^t$ in each time slot $t$. A truthful feedback strategy is $s_g^* = (\Sigma_g^{\langle t \rangle}, es_g^{\langle t \rangle})$, where $\Sigma_g^t = \Sigma_g$ for all $t$. The collection of the feedback and the activation intervals $\tau^{\langle q \le t-1 \rangle}$ (which require no feedback because they are fully controlled by the BS) up to time slot $t - 1$ is called the history in time slot $t$ and is denoted by $h^t = (\hat{\Sigma}^{\langle q \le t-1 \rangle}, \widehat{es}^{\langle q \le t-1 \rangle}, \tau^{\langle q \le t-1 \rangle})$. The feedback data statistics $\hat{\Sigma}_g^t$ and energy state $\widehat{es}_{i,g}^t$ are in general functions of the history $h^t$, the true data statistics $\Sigma_g$, and the true energy state $es_{i,g}^t$. To keep the notation light, we simply write $\hat{\Sigma}_g^t$ and $\widehat{es}_{i,g}^t$.

4.4.2 Node Activation Policy and Value Iteration

The system is Markov because the energy state and activation interval transitions are Markov. The BS's node activation decision can therefore be modelled as a Markov decision process. Note that the data statistics also affect the node activation decision. Therefore, we propose a node activation policy under which the node activation in each time slot depends on the current feedback of the data statistics and energy states. Specifically, the proposed node activation policy is denoted by $\pi^* = (\pi_g^*)_{g \in G} = (\pi_{i,g}^*)_{i \in I_g, g \in G}$. The node activation in time slot $t$ is $a^t = \pi^*(\hat{\Sigma}^t, \widehat{es}^t, \tau^t)$, given the current feedback data statistics $\hat{\Sigma}^t$, energy state $\widehat{es}^t$, and activation interval $\tau^t$. The proposed node activation policy $\pi^*$ is designed to maximize the system data fidelity.³

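Design 4.1. [Node activation policy] The node activation policy $\pi^*$ is designed to maximize the system data fidelity for any initial data statistics $\tilde{\Sigma}$, energy state $\widetilde{es}^0$, and activation interval $\tilde{\tau}^0$:

$$\pi^* = \arg\max_{a^{\langle t \rangle}} \sum_{g \in G} \sum_{t=0}^{\infty} \beta^{t}\, \mathbb{E}\big[ v_g(a_g^t, \tilde{\Sigma}_g, \widetilde{es}_g^t, \tilde{\tau}_g^t) \,\big|\, \tilde{\Sigma}, \widetilde{es}^0, \tilde{\tau}^0 \big], \tag{4.11}$$

where the energy state $\widetilde{es}^t$ and the activation interval $\tilde{\tau}^t$ satisfy the transition functions $E$ and $T$, respectively, with $a^t = \pi^*(\tilde{\Sigma}, \widetilde{es}^t, \tilde{\tau}^t)$.

Given the feedback data statistics $\hat{\Sigma}^t$, energy state $\widehat{es}^t$, and activation interval $\tau^t$ in each time slot $t$, the node activation is $a^t = \pi^*(\hat{\Sigma}^t, \widehat{es}^t, \tau^t)$.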

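From (4.11), the maximum system data fidelity, denoted by the value function $V^*$, can be written in recursive form as follows:

$$V^*(\Sigma, es, \tau) = \max_{a} \Big[ \sum_{g \in G} v_g(a_g, \Sigma_g, es_g, \tau_g) + \beta \sum_{es', \tau'} E(es' \,|\, es, a)\, T(\tau' \,|\, \tau, a)\, V^*(\Sigma, es', \tau') \Big]. \tag{4.12}$$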

Equation (4.12) is usually referred to as the Bellman equation. There are several algorithms for solving the Bellman equation. One of the most widely used is value iteration, owing to advantages such as quick convergence and easy implementation, especially when the state space is very large [65]. The value iteration algorithm for deriving the node activation policy $\pi^*$ and the value function $V^*$ is given in Algorithm 4.1. It involves the arbitrary initialization of the value function $V^*$ (line 1) and the iterative update of the value function according to the Bellman equation in (4.12) (lines 2 to 8). The update process repeats until the convergence criterion is met, i.e., the update error $\delta$ is less than the specified error tolerance $\epsilon$ (line 9). Lastly, the value iteration algorithm outputs the optimal value function $V^*$ and node activation policy $\pi^*$ (line 10).

³ Because the system is stationary and Markov, the optimal policy is itself stationary and a function of the current data statistics, energy state, and activation interval.

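Algorithm 4.1: Value iteration algorithm for node activation

1: Given $\Sigma$, initialize $V^*(\Sigma, es, \tau)$ for each $es$ and $\tau$.
2: Repeat
3:   $\delta \leftarrow 0$.
4:   For each $es$ and $\tau$:
5:     $v \leftarrow V^*(\Sigma, es, \tau)$.
6:     $V^*(\Sigma, es, \tau) \leftarrow \max_{a} \big[ \sum_{g \in G} v_g(a_g, \Sigma_g, es_g, \tau_g)$
7:       $+\, \beta \sum_{es', \tau'} E(es' \,|\, es, a)\, T(\tau' \,|\, \tau, a)\, V^*(\Sigma, es', \tau') \big]$.
8:     $\delta \leftarrow \max(\delta, |v - V^*(\Sigma, es, \tau)|)$.
9: Until $\delta < \epsilon$ (a small positive number).
10: Output the value function $V^*$ and the policy $\pi^*$.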
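The steps of Algorithm 4.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the data statistics $\Sigma$ are treated as fixed and folded into a single `reward` callable that stands for $\sum_{g \in G} v_g$, the discount factor is assumed to satisfy $\beta < 1$ so that the iteration converges, and the single-node toy model (its states, probabilities, and all function names) is hypothetical.

```python
from itertools import product


def value_iteration(energy_states, intervals, actions,
                    reward, energy_tr, interval_tr, beta, eps=1e-8):
    """In-place value iteration over states (es, tau).

    reward(a, es, tau)          -> one-shot system data fidelity
    energy_tr(es2, es, a)       -> E(es2 | es, a)
    interval_tr(tau2, tau, a)   -> T(tau2 | tau, a)
    """
    states = list(product(energy_states, intervals))
    V = {s: 0.0 for s in states}          # line 1: arbitrary initialization
    policy = {}

    def q_value(es, tau, a):
        # One-shot fidelity plus discounted expected future value.
        future = sum(energy_tr(es2, es, a) * interval_tr(tau2, tau, a) * V[(es2, tau2)]
                     for es2, tau2 in states)
        return reward(a, es, tau) + beta * future

    while True:                            # line 2: repeat
        delta = 0.0                        # line 3
        for es, tau in states:             # line 4
            v_old = V[(es, tau)]           # line 5
            best = max(actions, key=lambda a: q_value(es, tau, a))
            V[(es, tau)] = q_value(es, tau, best)          # lines 6 and 7
            policy[(es, tau)] = best
            delta = max(delta, abs(v_old - V[(es, tau)]))  # line 8
        if delta < eps:                    # line 9
            return V, policy               # line 10


# Hypothetical single-node toy model (all numbers are illustrative assumptions).
ENERGY = [0, 1]          # 0: depleted, 1: charged
INTERVALS = [0, 1, 2]    # slots since the last activation, capped at 2
ACTIONS = [0, 1]         # 0: stay idle, 1: activate


def toy_reward(a, es, tau):
    if a == 1:
        return 1.0 if es == 1 else 0.0   # a depleted node delivers nothing
    return 0.5 ** (tau + 1)              # stale data lose fidelity


def toy_energy_tr(es_next, es, a):
    if a == 1 and es == 1:               # activation consumes the stored energy
        return 1.0 if es_next == 0 else 0.0
    if es == 1:                          # idle and charged: stays charged
        return 1.0 if es_next == 1 else 0.0
    return 0.5                           # depleted: recharges with probability 0.5


def toy_interval_tr(tau_next, tau, a):
    target = 0 if a == 1 else min(tau + 1, 2)
    return 1.0 if tau_next == target else 0.0


V, policy = value_iteration(ENERGY, INTERVALS, ACTIONS,
                            toy_reward, toy_energy_tr, toy_interval_tr, beta=0.9)
```

The sketch updates $V^*$ in place, as Algorithm 4.1 does, and records the maximizing action of the last sweep as the policy. The state space grows as the product of the energy state and activation interval spaces, so the per-sweep cost is quadratic in the number of states.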

Lastly, it is useful to denote the system data fidelity of all servers other than server $g$ (and its nodes) by the value function $V_{-g}^*$. For use in the pricing policy in Section 4.4.4, we also design a node activation policy $\pi^{-g}$ that maximizes the system data fidelity with server $g$ excluded from the system. The system data fidelity with server $g$ excluded is denoted by $V^{-g}$.

4.4.3 Update of the Energy State Information

We discuss how the BS and the servers update the probability mass function (pmf) $\Delta(es^t)$ of the energy state after the feedback in each time slot $t$.⁴ The time instant after the feedback is called the ex-post stage of the time slot. Two other stages can also be defined: the ex-ante stage and the interim stage [57]. However, since any property that holds ex post must also hold in the interim and ex-ante stages, we focus on the ex-post stage. Unless otherwise stated, the system data fidelity in Definition 4.3 and everything defined or designed later are in the ex-post sense. Note also that there is no need to update the information about the data statistics and the activation intervals, since the former are not time-varying and the latter are fully controlled by the BS.

⁴ Since each node's feedback is based on the server's feedback strategy, we keep the functions of the node as simple as possible and assume that each node need not update the system energy state information.

After each node feeds back the energy state, the BS updates the pmf $\Delta(es^t \,|\, h^{t+1})$ by Bayes' rule as follows:

$$\Delta(es^t \,|\, h^{t+1}) = \frac{\Delta(es^t \,|\, h^t)\, \Delta(\widehat{es}^t \,|\, es^t, h^t)}{\sum_{es^t} \Delta(es^t \,|\, h^t)\, \Delta(\widehat{es}^t \,|\, es^t, h^t)}, \tag{4.13}$$

where $\Delta(es^t \,|\, h^t)$ is derived via the energy state transition function $E$ as follows:

$$\Delta(es^t \,|\, h^t) = \sum_{es^{t-1}} \Delta(es^{t-1} \,|\, h^t)\, E\big(es^t \,\big|\, es^{t-1}, \pi^*(\hat{\Sigma}^{t-1}, \widehat{es}^{t-1}, \tau^{t-1})\big). \tag{4.14}$$
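The two-step update in (4.13) and (4.14) has the form of a discrete Bayesian filter: a prediction through the transition function followed by a correction with the reported state. The sketch below illustrates this under simplifying assumptions: a single node with two energy states, an explicit reporting model `report_model(reported, true_state)` standing for $\Delta(\widehat{es}^t \,|\, es^t, h^t)$, and an action passed in directly instead of being computed by $\pi^*$. All names and probabilities are hypothetical.

```python
from typing import Callable, Dict, Hashable

Belief = Dict[Hashable, float]


def predict(prev_belief: Belief,
            transition: Callable[[Hashable, Hashable, int], float],
            action: int) -> Belief:
    """Prediction step, cf. (4.14): push the belief through E(es' | es, a)."""
    states = list(prev_belief)
    return {
        s_next: sum(prev_belief[s] * transition(s_next, s, action) for s in states)
        for s_next in states
    }


def bayes_update(predicted: Belief,
                 report_model: Callable[[Hashable, Hashable], float],
                 reported: Hashable) -> Belief:
    """Correction step, cf. (4.13): weight by the report likelihood, then normalize."""
    unnormalized = {s: p * report_model(reported, s) for s, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("the reported state has zero probability under the model")
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical two-state example: 0 = depleted, 1 = charged.
def toy_transition(s_next, s, action):
    if action == 1 and s == 1:       # activation drains a charged node
        return 1.0 if s_next == 0 else 0.0
    if s == 1:                       # idle and charged: stays charged
        return 1.0 if s_next == 1 else 0.0
    return 0.5                       # depleted: recharges with probability 0.5


def toy_report_model(reported, true_state):
    return 0.8 if reported == true_state else 0.2   # assumed reporting noise


predicted = predict({0: 1.0, 1: 0.0}, toy_transition, action=0)
posterior = bayes_update(predicted, toy_report_model, reported=1)
```

In this example the prediction spreads the belief evenly over the two states, and a report of "charged" then shifts most of the probability mass to state 1. Under a fully truthful reporting model the likelihood is an indicator function and the posterior collapses onto the reported state.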

On the other hand, although the feedback may not be truthful, we assume that each server $g$ can map untruthful feedback back to the real energy states of its nodes (an inverse transform of its feedback strategy). Thus, each server $g$ knows the energy state $es_g^t$ and updates the pmf $\Delta(es^t \,|\, es_g^t, h^{t+1})$ as follows:

$$\Delta(es^t \,|\, es_g^t, h^{t+1}) = \frac{\Delta(es^t \,|\, es_g^t, h^t)\, \Delta(\widehat{es}^t \,|\, es^t, h^t)}{\sum_{es_{-g}^t} \Delta(es^t \,|\, es_g^t, h^t)\, \Delta(\widehat{es}^t \,|\, es^t, h^t)}, \tag{4.15}$$

where $\Delta(es^t \,|\, es_g^t, h^t)$ is derived via the energy state transition function $E_{-g}$ as follows:

$$\Delta(es^t \,|\, es_g^t, h^t) = \sum_{es_{-g}^{t-1}} \Delta(es^{t-1} \,|\, es_g^{t-1}, h^t)\, E_{-g}\big(es_{-g}^t \,\big|\, es_{-g}^{t-1}, \pi_{-g}^*(\hat{\Sigma}^{t-1}, \widehat{es}^{t-1}, \tau^{t-1})\big). \tag{4.16}$$

4.4.4 Server Pricing Policy

We propose a pricing policy by which the BS charges each server not only for using the network service but also to ensure truthful feedback. Several desirable properties can also be achieved with the proposed pricing policy. The basic idea comes from the Vickrey-Clarke-Groves (VCG) mechanism [58, 59, 60]. Since the proposed node activation mechanism spans multiple time slots, we extend the design idea from one shot to multiple time slots. Specifically, after the feedback in each time slot $t$, the BS pays each server $g$ the amount $\sum_{k \in G \setminus \{g\}} v_k(\pi_k^*, \hat{\Sigma}_k^t, \widehat{es}_k^t, \tau_k^t)$, i.e., the one-shot system data fidelity of the servers other than server $g$. To balance the payment, the BS also charges each server $g$ the one-shot price

$$\kappa_g^t(h^t, \hat{\Sigma}^t, \widehat{es}^t) = V^{-g}(\hat{\Sigma}_{-g}^t, \widehat{es}_{-g}^t, \tau_{-g}^t) - \beta\, \mathbb{E}\big[ V^{-g}(\hat{\Sigma}_{-g}^t, es_{-g}^{t+1}, \tau_{-g}^{t+1}) \,\big|\, h^{t+1} \big], \tag{4.17}$$

where $h^{t+1} = (h^t, \hat{\Sigma}^t, \widehat{es}^t, \tau^t)$. Just as in the VCG mechanism, we want the time-discounted sum of $\kappa_g^t$ to be $V^{-g}$, i.e., the system data fidelity with server $g$ excluded from the system. However, directly using the VCG mechanism to charge $\sum_{k \in G \setminus \{g\}} v_k(\pi_k^{-g}, \hat{\Sigma}_{-g}^t, \widehat{es}_{-g}^t, \tau_{-g}^t)$, i.e., the one-shot system data fidelity with server $g$ excluded from the system, cannot make the time-discounted sum equal to $V^{-g}$. Therefore, we find another way to decompose $V^{-g}$ in time into the terms $\kappa_g^t$ in (4.17). The proposed pricing policy, denoted by $\rho^{\langle t \rangle} = (\rho_g^{\langle t \rangle})_{g \in G}$, is then designed as follows:

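Design 4.2. [Pricing policy] The one-shot price the BS charges each server $g$ in time slot $t$ is

$$\rho_g^t(h^t, \hat{\Sigma}^t, \widehat{es}^t) = - \sum_{k \in G \setminus \{g\}} v_k\big(\pi_k^*(\hat{\Sigma}^t, \widehat{es}^t, \tau^t), \hat{\Sigma}_k^t, \widehat{es}_k^t, \tau_k^t\big) + \kappa_g^t(h^t, \hat{\Sigma}^t, \widehat{es}^t), \tag{4.18}$$

where $\kappa_g^t$ is given in (4.17).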
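To make the arithmetic of (4.17) and (4.18) concrete, the following sketch computes the one-shot price from already-known quantities. It assumes that $V^{-g}$ has been tabulated over a small hypothetical state set (here the labels "low" and "high" stand in for the tuple $(\hat{\Sigma}_{-g}, es_{-g}, \tau_{-g})$), and that the conditional expectation in (4.17) is taken with respect to a given belief over the next state. The function names and all numbers are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable


def expected_value(belief: Dict[Hashable, float],
                   value: Dict[Hashable, float]) -> float:
    """Expectation of a tabulated value function under a belief over states."""
    return sum(p * value[s] for s, p in belief.items())


def kappa(current_state: Hashable,
          next_belief: Dict[Hashable, float],
          v_minus_g: Dict[Hashable, float],
          beta: float) -> float:
    """Charge term of (4.17): V^{-g} now minus discounted expected V^{-g} next."""
    return v_minus_g[current_state] - beta * expected_value(next_belief, v_minus_g)


def one_shot_price(others_fidelity: Iterable[float],
                   current_state: Hashable,
                   next_belief: Dict[Hashable, float],
                   v_minus_g: Dict[Hashable, float],
                   beta: float) -> float:
    """One-shot price of (4.18): minus the others' fidelity plus the charge term."""
    return -sum(others_fidelity) + kappa(current_state, next_belief, v_minus_g, beta)


# Hypothetical numbers (assumptions for illustration only).
v_minus_g = {"low": 4.0, "high": 6.0}        # tabulated V^{-g}
next_belief = {"low": 0.5, "high": 0.5}      # belief over the next state given h^{t+1}
price = one_shot_price(others_fidelity=[1.0, 2.0],
                       current_state="high",
                       next_belief=next_belief,
                       v_minus_g=v_minus_g,
                       beta=0.5)
print(price)  # -(1.0 + 2.0) + (6.0 - 0.5 * 5.0) = 0.5
```

A negative price would mean that the BS pays the server on net in that slot; the sign depends on how the fidelity of the other servers under $\pi^*$ compares with the charge term $\kappa_g^t$.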

4.4.5 Utility of the Servers

With the proposed node activation policy $\pi^*$ and pricing policy $\rho^{\langle t \rangle}$, we define the ex-post utility of each server $g$, denoted by $u_g(s \,|\, \Sigma_g, es_g^t, h^{t+1})$, as the data fidelity minus the price:

Definition 4.4. [Ex-post utility] The utility of server $g$ seen in time slot $t$ is

$$u_g(s \,|\, \Sigma_g, es_g^t, h^{t+1}) = \sum_{q=t}^{\infty} \beta^{q-t}\, \mathbb{E}\big[ v_g\big(\pi_g^*(\hat{\Sigma}^q, \widehat{es}^q, \tau^q), \Sigma_g, es_g^q, \tau_g^q\big) - \rho_g^q(h^q, \hat{\Sigma}^q, \widehat{es}^q) \,\big|\, \Sigma_g, es_g^t, h^{t+1} \big]. \tag{4.19}$$

Each server $g$ maximizes its own utility by choosing the feedback strategy $s_g$. As will be proved in Chapter 4.6, the proposed node activation mechanism can induce the truthful feedback strategy $s_g^*$ from each server $g$.
