We use entropy to measure the uncertainty that remains in a system after invalid system states have been eliminated by a data cleansing method.
Generally, applying an efficient data cleansing method leads to systems with smaller entropy. In this section, we first show the advantage of the 3-state model over the 2-state model and then show that the 3-state model yields the best system performance, even compared with detection models that have more than 3 states.
A snippet of the RFID raw data is shown in Table 3, and the actual location of an object i is denoted as a random variable L.
4.1 Entropy versus Read Rate
Entropy of the 2-State model: Suppose that y denotes the read rate in the 2-state detection model. According to the right side of Equation 4, the probability mass function of L in the 2-state model can be represented as:
$$
p(L = l) =
\begin{cases}
\alpha (1 - y)\, y\, (1 - y)\, \beta & \text{if } l = j \\
\alpha (1 - y)(1 - y)\, y\, \beta & \text{if } l \in \{j - 1, j + 1\} \\
0 & \text{otherwise}
\end{cases}
\tag{11}
$$
where α is the normalizing constant and β represents the prior probability in Equation 4 (we assume a uniform prior distribution). Thus, we can calculate the entropy of the distribution of L as:
$$
\begin{aligned}
H(L) = {} & -\alpha (1 - y)(1 - y) y \beta \cdot \ln\bigl(\alpha (1 - y)(1 - y) y \beta\bigr) \\
& -\alpha (1 - y) y (1 - y) \beta \cdot \ln\bigl(\alpha (1 - y) y (1 - y) \beta\bigr) \\
& -\alpha (1 - y)(1 - y) y \beta \cdot \ln\bigl(\alpha (1 - y)(1 - y) y \beta\bigr)
\end{aligned}
\tag{12}
$$
Because the probabilities on all the locations sum to 1, we can derive Equation 13. By applying Equation 13 to Equation 12 (α and β cancel out), we obtain Equation 14.
$$
\alpha\beta = \frac{1}{3 (1 - y)^2 y} \tag{13}
$$
$$
H(L) = -3 \cdot \frac{1}{3} \cdot \ln\frac{1}{3} \approx 1.098 \tag{14}
$$
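As a quick numerical check of Equations 11 to 14, the following minimal sketch normalizes the three weights of the 2-state model and evaluates the entropy in nats. The function names `two_state_pmf` and `entropy` are illustrative and not part of the original design; the product αβ is absorbed by the normalization, so it never has to be computed explicitly.

```python
import math

def two_state_pmf(y, j=0):
    """Normalized PMF over zones j-1, j, j+1 built from the weights in Equation 11."""
    # Unnormalized weights; alpha * beta is absorbed by the normalization below.
    weights = {
        j - 1: (1 - y) * (1 - y) * y,
        j: (1 - y) * y * (1 - y),
        j + 1: (1 - y) * (1 - y) * y,
    }
    total = sum(weights.values())
    return {zone: w / total for zone, w in weights.items()}

def entropy(pmf):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

# The three weights are equal, so the entropy is ln 3 for any read rate 0 < y < 1.
for y in (0.5, 0.8, 0.95):
    print(y, entropy(two_state_pmf(y)))
```

Because the three weights coincide, the result does not depend on y, which is why Equation 14 contains no read rate.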
Figure 6: The detection-region overlap interpreted by the 3-state detection model.
Table 3: A snippet of the RFID raw data.
Entropy of the 3-State model: Figure 6 corresponds to the 3-state model scenario. Suppose x is the read rate in the major detection region. Then the read rate in the minor detection region can be denoted as x/2. Thus, according to the right side of Equation 4, the probability mass function of L can be represented as follows:
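$$
p(L = l) =
\begin{cases}
\alpha \left(1 - \frac{x}{2}\right) x \left(1 - \frac{x}{2}\right) \beta & \text{if } l = j \\
\alpha \left(1 - \frac{x}{2}\right) (1 - x)\, \frac{x}{2}\, \beta & \text{if } l \in \{j - 1, j + 1\} \\
0 & \text{otherwise}
\end{cases}
$$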
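Similarly, α is the normalizing constant and β represents the prior probability in Equation 4. Therefore, we can calculate the entropy of the distribution of L as:
$$
\begin{aligned}
H(L) = {} & -\alpha \left(1 - \frac{x}{2}\right)(1 - x)\frac{x}{2}\beta \cdot \ln\left(\alpha \left(1 - \frac{x}{2}\right)(1 - x)\frac{x}{2}\beta\right) \\
& -\alpha \left(1 - \frac{x}{2}\right) x \left(1 - \frac{x}{2}\right)\beta \cdot \ln\left(\alpha \left(1 - \frac{x}{2}\right) x \left(1 - \frac{x}{2}\right)\beta\right) \\
& -\alpha \left(1 - \frac{x}{2}\right)(1 - x)\frac{x}{2}\beta \cdot \ln\left(\alpha \left(1 - \frac{x}{2}\right)(1 - x)\frac{x}{2}\beta\right)
\end{aligned}
\tag{15}
$$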
Since the probabilities on all locations sum to 1, we can obtain Equation 16.
$$
\alpha\beta = \frac{1}{x \left(1 - \frac{x}{2}\right)\left(2 - \frac{3x}{2}\right)} \tag{16}
$$
Combining Equation 16 and Equation 15, we have:
$$
H(L) = -2 \cdot \frac{1 - x}{4 - 3x} \cdot \ln\frac{1 - x}{4 - 3x} - \frac{2 - x}{4 - 3x} \cdot \ln\frac{2 - x}{4 - 3x}
$$
In Figure 7, we plot the relationship between the reconstruction entropy and the read rate under the 2-state and 3-state models, respectively. As Figure 7 illustrates, the entropy decreases as the read rate increases, which indicates that the system has less uncertainty with more reliable readers. Moreover, Figure 7 shows that the entropy in the 3-state model is always smaller than that in the 2-state model. For example, if x = 0.95, the entropy in the 3-state model is 0.395 while the entropy in the 2-state model is 1.098. This observation reveals that the 3-state model can be more informative in object localization than the 2-state model.
Figure 7: Relationship between entropy and read rate under the 2-state and 3-state models.
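The comparison between the two models can be reproduced with a short script. The sketch below is only an illustration under stated assumptions: it evaluates the closed-form 3-state entropy given above, cross-checks it against a direct normalization of the 3-state weights, and compares both with the constant 2-state entropy of Equation 14. The function names and the chosen read rates are hypothetical.

```python
import math

def three_state_entropy_closed_form(x):
    """Entropy (nats) from the closed-form expression that follows Equation 16."""
    side = (1 - x) / (4 - 3 * x)    # probability of zone j-1 (and of zone j+1)
    center = (2 - x) / (4 - 3 * x)  # probability of zone j
    return -2 * side * math.log(side) - center * math.log(center)

def three_state_entropy_numeric(x):
    """Entropy (nats) obtained by normalizing the unnormalized 3-state weights."""
    w_side = (1 - x / 2) * (1 - x) * (x / 2)
    w_center = (1 - x / 2) * x * (1 - x / 2)
    total = 2 * w_side + w_center
    probs = [w_side / total, w_center / total, w_side / total]
    return -sum(p * math.log(p) for p in probs)

TWO_STATE_ENTROPY = math.log(3)  # Equation 14, independent of the read rate

# Read rates must lie strictly between 0 and 1 for the logarithms to be defined.
for x in (0.5, 0.7, 0.9):
    h3 = three_state_entropy_closed_form(x)
    print(f"x={x}: 3-state {h3:.3f} vs 2-state {TWO_STATE_ENTROPY:.3f}")
```

The 3-state distribution can never be uniform over the three zones (that would require 1 − x = 2 − x), so its entropy stays strictly below ln 3 for every admissible read rate.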
4.2 Entropy versus Number of States
Here we investigate the relationship between system entropy and the number of states in a detection model. Suppose an n-state model is adopted with the highest read rate of x. Thus, the read rate in the i-th state (counted in order of increasing detection distance) can be represented as (n − i)·x/(n − 1). Combined with Equation 4 and Table 3, we can obtain the probability mass function of L, represented as Equation 17. Equation 17 shows that in an n-state model, a successful reading "1" of a certain reader about an object in fact implies that this object may exist in any of the 2(n − 2) + 1 = 2n − 3 correlated zones (including the zone with which the reader is associated), each with a non-zero probability.
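The per-state read rates and the number of correlated zones follow directly from the two expressions in the paragraph above. The following sketch is illustrative only; the helper names `state_read_rates` and `correlated_zone_count` are assumptions and do not appear in the original design.

```python
def state_read_rates(n, x):
    """Read rate of the i-th state for i = 1..n, computed as (n - i) * x / (n - 1)."""
    if n < 2:
        raise ValueError("an n-state model needs n >= 2")
    return [(n - i) * x / (n - 1) for i in range(1, n + 1)]

def correlated_zone_count(n):
    """Number of zones with non-zero probability after one successful reading."""
    return 2 * (n - 2) + 1

# For n = 3 this gives the major region (x), the minor region (x / 2),
# and the out-of-range state (0).
print(state_read_rates(3, 0.95))
print(correlated_zone_count(3))
```

For n = 3 the rates reduce to x, x/2, and 0, which matches the major and minor detection regions of the 3-state model in Section 4.1.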
Based on Equation 17, we plot the relationship between entropy and the number of states in a detection model in Figure 8, where we assume x equals 0.95 (the most common case). According to Figure 8, the 3-state detection model minimizes the system entropy and leads to the maximum system performance. In other words, having more than 3 states in a detection model can even deteriorate the system performance. Therefore, our experiments mainly focus on the 3-state model.
Figure 8: Relationship between entropy and the number of states in a detection model (x-axis: number of states in the reader detection model, from 2 to 8).
5. SAMPLING
By using Bayesian inference, we derive the posterior, as shown in Equation 4. Since Equation 4 is easy to evaluate but hard to sample from directly, we need an efficient method to draw samples from the posterior distribution. In this section, we first discuss the Metropolis-Hastings (MH) and Gibbs samplers. Next, we explain why MCMC is chosen in our solution. Finally, we propose a Metropolis-Hastings sampler with Constraints (MH-C).
5.1 Metropolis-Hastings and Gibbs Sampling
The Metropolis-Hastings (MH) sampler and the Gibbs sampler are the two most common MCMC samplers. MH conducts a sequence of random walks using a proposal distribution and decides whether to accept or reject each proposed move with an acceptance test.
In applications of Bayesian inference, the normalizing constant is usually extremely difficult to compute. MH avoids computing this constant: it approximates the posterior using only a ratio of posterior values, in which the constant cancels out.
Recall that the random vector representing the locations of objects is denoted as $\hat{H}$ and the posterior distribution is $post(\hat{H} \mid Z)$.
Suppose $\hat{H}_{t-1}$ is the state immediately preceding the state $\hat{H}_t$ in the Markov chain. According to the MH algorithm, a proposal sample $\hat{H}_q$ is first drawn from a proposal distribution $q(\hat{H}_q \mid \hat{H}_{t-1})$, i.e., $\hat{H}_q$ is a random deviation from $\hat{H}_{t-1}$. In our research, we use a uniform proposal distribution whose support is determined by the step length. The proposal sample $\hat{H}'$ can be denoted as $\hat{H}_{t-1} + \hat{H}_q$. Then MH accepts $\hat{H}'$ as the next state $\hat{H}_t$ with probability $\min\left(1, \frac{post(\hat{H}' \mid Z)}{post(\hat{H}_{t-1} \mid Z)}\right)$; otherwise, the chain stays at $\hat{H}_{t-1}$.
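The acceptance rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `mh_step` and `toy_post` are hypothetical names, the state is assumed to be a list of integer zone indices, and the toy posterior is an arbitrary unnormalized function chosen only to make the example runnable.

```python
import random

def mh_step(current, unnormalized_post, step_length, rng=random):
    """One Metropolis-Hastings move with a symmetric uniform random-walk proposal."""
    # Draw a random deviation per dimension within [-step_length, step_length].
    deviation = [rng.randint(-step_length, step_length) for _ in current]
    proposal = [c + d for c, d in zip(current, deviation)]

    p_current = unnormalized_post(current)
    p_proposal = unnormalized_post(proposal)
    # Only the ratio matters, so the normalizing constant cancels out.
    # If the current state has zero weight, accept to let the chain escape.
    if p_current == 0 or rng.random() < min(1.0, p_proposal / p_current):
        return proposal
    return current

# Toy target (an assumption for illustration): every object prefers zone 5.
def toy_post(state):
    weight = 1.0
    for zone in state:
        weight *= 0.5 ** abs(zone - 5)
    return weight

rng = random.Random(0)
state = [0, 9, 3]
for _ in range(2000):
    state = mh_step(state, toy_post, step_length=1, rng=rng)
print(state)
```

Because the uniform proposal is symmetric, the proposal densities cancel and only the posterior ratio appears in the acceptance test.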
Here we briefly compare the MH sampler with the Gibbs sampler.
The Gibbs sampler requires that the conditional distribution of each variable is known and easy to sample from. MH relies only on the ratio of posterior values and a simple proposal distribution, so it does not require sampling from the posterior or its conditionals. Because we have already derived the closed form of the posterior as Equation 4 and are able to calculate likelihoods easily according to the proposed n-state model, it is much more straightforward to use the MH sampler rather than the Gibbs sampler in our design.
5.2 Sample Correlation
Figure 9: Taking advantage of the correlation between samples to improve sampling efficiency.
In our design, we choose MCMC instead of other sampling techniques because MCMC maintains the correlation among samples.
In MCMC, the next sample depends on the current sample. Before we elaborate on how we can take advantage of sample correlation to improve the efficiency of sampling in our scenario, we define two terms as follows.
Definition. We call any sample generated by the sampler a candidate sample. A qualified sample is a candidate sample that satisfies all constraints.
The existence of constraints is what makes our problem distinctive. Samples must satisfy all the constraints to become qualified.
Note that in most sampling problems, we prefer independent samples, that is, the current draw of a sample is independent of the previous draw. In our scenario, however, sampling techniques that generate independent samples (e.g., importance sampling) may suffer from low sampling efficiency due to the loss of correlation between adjacent samples. Figure 9 illustrates how the correlation between samples can be utilized to improve the sampling efficiency. The qualified sampling space is a subset of the complete sampling space. Suppose sample point A is the current sample in the qualified sampling space. For an independent sampler, the next sample could be any point in the complete sampling space. However, the next sample is useful only if it happens to fall into the qualified sampling space. In contrast, if an MCMC-based sampler is employed, the next sample is chosen according to the proposal distribution at point A, i.e., the next sample lies in the area denoted by the dotted circle centered at point A. Therefore, compared to independent samplers, the probability that the next sample generated by MCMC falls into the qualified sampling space is considerably increased.
Note that although MCMC improves sampling efficiency, a sample generated by MCMC is not necessarily a qualified sample.
As in Figure 9, point B is a sample generated by MCMC after point A. However, point B is outside the qualified sampling space. Consequently, we have to sample again to acquire point C, which is a qualified sample, and then add it to the Markov chain as the next state. Afterward, the Markov chain moves from point A to point C.
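The effect illustrated by Figure 9 can be made concrete with a toy enumeration. Everything in the sketch below is an assumption chosen for illustration: the 100 x 100 grid stands in for the complete sampling space, the qualified space is a small diamond around the grid center, and the local proposal is uniform over a square neighborhood of the current point A. The script computes, exactly and without random draws, the probability that the next draw is qualified under each scheme.

```python
from itertools import product

GRID = range(100)   # complete sampling space: a 100 x 100 grid (toy assumption)
A = (50, 50)        # current qualified sample
STEP = 3            # half-width of the uniform proposal around A

def is_qualified(point):
    """Toy constraint: qualified points lie within L1 distance 5 of (50, 50)."""
    return abs(point[0] - 50) + abs(point[1] - 50) <= 5

# Independent sampler: the next draw is uniform over the complete space.
all_points = list(product(GRID, GRID))
p_independent = sum(map(is_qualified, all_points)) / len(all_points)

# MCMC-style proposal: the next draw is uniform over the neighborhood of A.
offsets = range(-STEP, STEP + 1)
neighbors = [(A[0] + dx, A[1] + dy) for dx, dy in product(offsets, offsets)]
p_local = sum(map(is_qualified, neighbors)) / len(neighbors)

print(f"independent: {p_independent:.4f}, local proposal: {p_local:.4f}")
```

In this toy setting, only 61 of the 10,000 grid points are qualified, whereas 45 of the 49 neighbors of A are, so a local proposal is far more likely to yield a qualified sample. Some neighbors still fall outside the qualified space, which corresponds to point B in Figure 9.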
5.3 Metropolis-Hastings Sampler with Constraints
Although the naive MH algorithm can evaluate the posterior by forming a Markov chain in the sampling space, it does not take constraints into account. If we impose constraints on samples, many of them must be rejected because they are inapplicable, i.e., they are not qualified samples.

Symbol                  Meaning
Z                       The raw data matrix from RFID readers
S                       The sample set
$\vec{C}$               The current sample in the Markov chain
$\vec{P}$               The proposal sample in the Markov chain
$C_j$                   The j-th dimension of $\vec{C}$
$P_j$                   The j-th dimension of $\vec{P}$
E                       The number of effective samples
B                       The number of samples in the burn-in phase
S                       The step length for the uniform proposal distribution
$D_{object}$            The total number of monitored objects
$D_{zone}$              The total number of zones
Jitter                  A random number between 0 and 1
Rand(a, b)              Generate a random integer between a and b based on the uniform distribution
$Post(\hat{H} \mid Z)$  The posterior probability of the sample $\hat{H}$ given raw data Z

Table 4: Symbolic notations used in Algorithm 1.
To incorporate constraints in sampling, we propose a Metropolis-Hastings sampler with Constraints (MH-C). With MH-C, each zone is associated with multiple variables called resource descriptors.
The current value of a resource descriptor represents how much of the associated resource is still available. Suppose we have a variable, denoted as $\mathit{Descriptor}_{\mathit{zone}_i}$, to keep track of the currently available vacancy in zone i. The initial value of $\mathit{Descriptor}_{\mathit{zone}_i}$ is set to the maximal capacity of zone i. Thus, whether an object j with the volume $\mathit{Volume}_{\mathit{object}_j}$ can be stored in zone i can be examined by:
$$
\mathit{Descriptor}_{\mathit{zone}_i} = \mathit{Descriptor}_{\mathit{zone}_i} - \mathit{Volume}_{\mathit{object}_j} \tag{18}
$$
The proposed resource allocation is feasible only if $\mathit{Descriptor}_{\mathit{zone}_i}$ is no less than zero. Otherwise, we have to re-sample until a new allocation meets all the constraints. Consequently, the problem of whether an allocation is feasible (or compatible) reduces to the problem of monitoring the value of each descriptor.
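The descriptor update of Equation 18 and the non-negativity test can be sketched as follows. This is a minimal illustration under stated assumptions: a single capacity descriptor per zone, stored in a dictionary, with hypothetical zone names and capacities. The helper `try_allocate` is not part of the original design.

```python
def try_allocate(descriptors, zone, volume):
    """Apply Equation 18 to a copy; return the updated descriptors or None if infeasible."""
    remaining = descriptors[zone] - volume
    if remaining < 0:
        return None  # capacity exceeded: the caller has to re-sample
    updated = dict(descriptors)
    updated[zone] = remaining
    return updated

capacity = {"zone_1": 10, "zone_2": 4}   # hypothetical maximal capacities
after_first = try_allocate(capacity, "zone_2", 3)
print(after_first)                             # zone_2 has 1 unit left
print(try_allocate(after_first, "zone_2", 2))  # infeasible allocation
```

Working on a copy keeps the descriptors of the current state intact when an allocation is rejected, so the caller can re-sample without having to undo the update.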
With MH-C, because each sample is a $D_{object}$-dimensional vector, a proposal sample is generated iteratively, dimension by dimension. If any descriptor for the current allocation is less than zero, the current partial sample has no chance of becoming a qualified sample. Therefore, we can discard the current value and choose another value for that dimension by re-sampling. For the proposal distribution, we construct a random walk chain by choosing a uniform proposal distribution within the step length. A detailed description of the MH-C algorithm is given in Algorithm 1, and the related notations are summarized in Table 4.
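The dimension-by-dimension construction can be sketched as follows. This is not Algorithm 1 itself but a simplified illustration under several assumptions: zones are integer indices, each zone has one capacity descriptor, out-of-range moves are clipped to the nearest valid zone, and `max_tries` bounds the number of re-sampling attempts per object. All names are hypothetical.

```python
import random

def propose_with_constraints(current, volumes, capacities, step, rng, max_tries=100):
    """Build a proposal object by object, re-sampling any zone choice that overflows.

    Returns a list of zone indices, or None if some object cannot be placed.
    """
    remaining = list(capacities)   # one descriptor per zone: free capacity
    n_zones = len(capacities)
    proposal = []
    for j, zone_now in enumerate(current):
        for _ in range(max_tries):
            # Uniform random-walk move within the step length, clipped to valid zones.
            zone = min(max(zone_now + rng.randint(-step, step), 0), n_zones - 1)
            if remaining[zone] - volumes[j] >= 0:   # descriptor stays non-negative
                remaining[zone] -= volumes[j]
                proposal.append(zone)
                break
        else:
            return None  # no feasible zone found for object j: give up on this sample
    return proposal

rng = random.Random(1)
print(propose_with_constraints([0, 1, 2], [2, 2, 2], [4, 2, 4], step=1, rng=rng))
```

Rejecting an infeasible value as soon as it is drawn avoids completing a sample that can no longer become qualified. Note that clipping and re-sampling alter the effective proposal distribution near the boundaries, which this sketch does not correct for.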
In Algorithm 1, line 1 initializes the sample set and takes the RFID raw data. Line 2 loads the n-state detection model of readers for computing the likelihood. Line 3 initializes all the resource descriptors, and line 4 randomly chooses a qualified sample as the first state of the Markov chain. Lines 6 to 17 generate a random sample as the proposal sample dimension by dimension (object by object). Lines 8 to 13 correspond to the
sam-
Algorithm 1 Metropolis-Hastings Sampler with Constraints