Algorithm design - Proposed model - An aggregation policy with fair rewards and low energy consumption

Chapter 3 Proposed model

3.2 Algorithm design

The main purpose of the data aggregation policy algorithm is to maximize the number of un-expired measurements carried in a single transmission. To achieve this, we adopt the Markov Decision Process (MDP) [13] model to formulate the node's behavior and derive the solution. An MDP consists of five elements: states, actions, transition probabilities, rewards, and a discount factor. The algorithm is triggered every time measurements are pushed into the buffer. The five elements of the MDP are described below, followed by a table of the symbols used in the algorithm and their meanings.

• States Sn: a state is the list of measurements in the buffer. For example, let Sn = (X1, X2, X3); if the node then receives a measurement and pushes it into the buffer, the state becomes Sn+1 = (X1, X2, X3, X4). The lowercase sn denotes the number of measurements in the buffer, and the subscript n is the index of the state in the sequence.

• Actions An: the model provides two actions, transmit and continue. If the action is "transmit", the buffer is flushed and the measurements are aggregated and transmitted; the next state starts empty and is rebuilt from the newly received measurements. If the action is "continue", the buffer is left untouched. Some measurements may expire while waiting for transmission, and the algorithm can decide whether to drop them.

• Transition probability: P(Sn+1|Sn,An) = 1. In general, this parameter expresses that a given state and action may lead to more than one next state. In our case each action has exactly one outcome, so no randomness is involved. Whenever new measurements are added to the buffer, the node has to choose an action. The transmission criterion depends on the reward of the current state and the reward of the next state: if the reward of the next state is higher than the current one, the node chooses to continue; if the next state has a lower reward, the node transmits.

• Rewards R and discount factor: the reward is defined as the number of un-expired measurements. The discount factor is 1 because a measurement is either expired or un-expired, so its contribution to the reward is 0 or 1. By maximizing the reward, each packet carries as many un-expired measurements as possible, which is how the policy saves energy.
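The four bullets above can be sketched in code as follows. This is a minimal illustration, not the implementation used in this work: the `Measurement` class, the `reward` and `step` function names, and the convention that a measurement is un-expired while its deadline is strictly later than the current time are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Measurement:
    """Hypothetical buffered measurement."""
    value: float
    deadline: float  # T(i): absolute time after which the measurement is expired


# S_n: the measurements currently in the buffer; len(state) plays the role of s_n.
State = Tuple[Measurement, ...]


def reward(state: State, now: float) -> int:
    """Reward R: number of un-expired measurements (each contributes 0 or 1)."""
    # Assumption: a measurement is un-expired while deadline > now.
    return sum(1 for m in state if m.deadline > now)


def step(state: State, action: str, new: Iterable[Measurement]) -> State:
    """Deterministic transition, i.e. P(S_{n+1} | S_n, A_n) = 1."""
    if action == "transmit":
        # Buffer is flushed; the next state restarts from the new measurements.
        return tuple(new)
    if action == "continue":
        # Buffer is left untouched and the new measurements are appended.
        return state + tuple(new)
    raise ValueError(f"unknown action: {action!r}")
```

Because the transition is deterministic, `step` returns a single next state instead of a distribution over states.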

Symbol          Meaning
Sn              The list of measurements in the buffer
sn              The number of measurements in the buffer
Delay           Estimated time cost of the transmission
T(i)            The deadline of the i-th measurement in the buffer
ExpiredValue    The number of measurements that will expire by the next moment
ExpectedValue   The number of measurements expected to arrive by the next moment

To estimate the reward of the next state, we formulate the expected reward based on the following observation. From state Sn to Sn+1, some measurements are pushed into the buffer while others expire. The node transmits when the present reward is higher than the reward at the next transmission moment:

$$\mathrm{Reward}(S_n) > \mathrm{Reward}(S_{n+1})$$

The reward at the next transmission moment equals the present reward plus the ExpectedValue minus the ExpiredValue:

$$\mathrm{Reward}(S_n) > \mathrm{Reward}(S_n) + \mathrm{ExpectedValue} - \mathrm{ExpiredValue}$$

Subtracting Reward(Sn) from both sides gives

$$\mathrm{ExpiredValue} > \mathrm{ExpectedValue}$$

The ExpiredValue is the number of measurements that are going to expire in the next moment. It is estimated by fixing the time constraint for which we are willing to wait and counting the measurements that expire before that time constraint. The ExpectedValue is the number of incoming measurements and is evaluated from the node's own sensor generation and from the measurements forwarded by other nodes.
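The resulting criterion, ExpiredValue > ExpectedValue, can be sketched as below. The function names, the `horizon` parameter (the time constraint for which the node is willing to wait), and the choice to count only measurements that are still valid now are assumptions made for illustration.

```python
from typing import Sequence


def expired_value(deadlines: Sequence[float], now: float, horizon: float) -> int:
    """Count measurements that are valid now but expire within `horizon`.

    `deadlines` holds T(i) for every measurement in the buffer.
    """
    return sum(1 for t in deadlines if now < t <= now + horizon)


def choose_action(deadlines: Sequence[float], now: float,
                  horizon: float, expected_value: int) -> str:
    """Transmit only if waiting would lose more measurements than it gains."""
    if expired_value(deadlines, now, horizon) > expected_value:
        return "transmit"
    # Ties fall through to "continue" because the criterion is a strict inequality.
    return "continue"
```

For example, with deadlines `[1.0, 2.0, 10.0]`, `now = 0.0`, and `horizon = 3.0`, two measurements would expire while waiting, so the node transmits when it expects one new measurement and continues when it expects two or more.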

Assume that the measurement generation rate, denoted λ, is known, and that each node's role is either cluster head or cluster member. Cluster members do not receive measurements from other nodes. We analyze the algorithm for the cluster head and for the cluster member in turn, considering the expected incoming measurements and the expired measurements of each.

A. Cluster head

Expected measurements: The cluster head has an additional source of measurements, namely the packets forwarded by its cluster members. The cluster head itself generates λ measurements per second, that is, one measurement every 1/λ seconds. Each time a measurement is generated, the data aggregation policy is triggered, so the expected value is the number of measurements that will have arrived 1/λ seconds later. Although it is hard to model the probability of when a cluster member will forward its measurements, every member generates one measurement per 1/λ seconds. We can therefore estimate the number of expected measurements as the head's own measurement plus one measurement per cluster member, that is, the cluster size.

$$\mathrm{ExpectedValue} = 1 + \text{cluster size} \tag{1}$$

Expired measurements: The cluster head is only one hop away from its destination, the base station. After the cluster head decides to transmit, the delay before the packet arrives at the base station is the propagation delay of the transmission, which is the packet size divided by the wireless traffic rate. With this constraint alone, however, the expiration rate would be extremely high, defeating the purpose of deploying the sensors. We therefore add the time the cluster head spends waiting for transmission, taken to be the buffer size divided by the measurement generation rate λ. This is the time needed to generate a buffer's worth of measurements, that is, the difference between the generation times of the first and the last measurement.

$$\mathrm{Delay} = \text{Prop. delay} + \frac{\text{buffer size}}{\lambda} \tag{2}$$
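As a numerical illustration of Eqs. (1) and (2), consider a cluster head under purely hypothetical settings that are not taken from this work's experiments: a cluster size of 4, a packet size of 1000 bits, a wireless traffic rate of 250 kbit/s, a buffer size of 10 measurements, and λ = 2 measurements per second.

```latex
\begin{align*}
\mathrm{ExpectedValue} &= 1 + 4 = 5 \\
\text{Prop. delay} &= \frac{1000\ \text{bits}}{250\,000\ \text{bits/s}} = 0.004\ \text{s} \\
\mathrm{Delay} &= 0.004\ \text{s} + \frac{10}{2\ \text{s}^{-1}} = 5.004\ \text{s}
\end{align*}
```

Under these assumed numbers the waiting term is roughly three orders of magnitude larger than the propagation term, so the buffer size and λ largely determine the estimated delay.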

B. Cluster member

Expected measurements: The cluster member has exactly one new measurement each time the data aggregation policy is triggered, regardless of the measurement generation rate λ. Since a cluster member receives nothing from other nodes, its cluster size is zero. Thus, to simplify the algorithm, the expected value can be computed with the same formula as for the cluster head, which reduces to

$$\mathrm{ExpectedValue} = 1 \tag{3}$$

Expired measurements: The cluster member is two hops away from the base station, so the time required to reach the base station is one propagation delay plus the time needed by the cluster head. The cluster member needs no further constraint, because a late transmission to the cluster head does not by itself expire the measurements. The looser time constraint allows more measurements to be aggregated.

$$\mathrm{Delay} = 2 \cdot \text{Prop. delay} + \frac{\text{buffer size}}{\lambda} \tag{4}$$

From the above analysis, we conclude that the expected value depends on the node's role in the topology, while the expired value depends on the estimation of delay. These observations can provide useful insight for further applications.
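Eqs. (1) to (4) differ only in the cluster size and in the number of hops to the base station, so both roles can be handled by one pair of functions. The sketch below is one possible way to write this; the `NodeRole` class, its field names, and the reading of "cluster size" as the number of members forwarding to the node are assumptions, not part of the original algorithm description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRole:
    """Hypothetical description of a node's position in the topology."""
    cluster_size: int  # members forwarding to this node; 0 for a cluster member
    hops_to_base: int  # 1 for a cluster head, 2 for a cluster member


def cluster_head(cluster_size: int) -> NodeRole:
    return NodeRole(cluster_size=cluster_size, hops_to_base=1)


CLUSTER_MEMBER = NodeRole(cluster_size=0, hops_to_base=2)


def expected_value(role: NodeRole) -> int:
    """Eq. (1) for a cluster head; reduces to Eq. (3) when cluster_size is 0."""
    return 1 + role.cluster_size


def delay(role: NodeRole, prop_delay: float, buffer_size: int, rate: float) -> float:
    """Eq. (2) for one hop and Eq. (4) for two hops; `rate` is lambda."""
    if rate <= 0:
        raise ValueError("generation rate must be positive")
    return role.hops_to_base * prop_delay + buffer_size / rate
```

Expressing the role as data keeps the policy code identical on every node; only the two role parameters change between a cluster head and a cluster member.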
