Multiple Attributes-based Data Recovery in Wireless Sensor Networks

Academic year: 2022

Guangshuo Chen, Xiao-Yang Liu, Linghe Kong†∗, Jia-Liang Lu, Yu Gu, Wei Shu∗‡, Min-You Wu

Shanghai Jiao Tong University, China

Singapore University of Technology and Design, Singapore

University of New Mexico, USA

{chengs, yanglet, linghe.kong, jlu, shu, mwu}@sjtu.edu.cn,{linghe kong, jasongu}@sutd.edu.sg, [email protected]

Abstract: In wireless sensor networks (WSNs), many scientific studies rely heavily on complete sensory data, so data recovery is an indispensable operation against data loss. Several works have studied the missing value problem. However, existing solutions cannot achieve satisfactory accuracy because of the special loss patterns and high loss rates in WSNs. In this work, we propose a multiple attributes-based recovery algorithm that provides high accuracy by exploiting the correlations among the attributes sensed by the same network. Firstly, based on two real datasets, the Intel Indoor project and the GreenOrbs project, we reveal that such correlations are strong, e.g., changes in temperature and light illumination are usually strongly correlated. Secondly, motivated by this observation, we develop a Multi-Attribute-assistant Compressive-Sensing-based (MACS) algorithm to optimize the recovery accuracy. Finally, real trace-driven simulations are performed. The results show that MACS outperforms the existing solutions. Typically, MACS can recover all data with less than 5% error when the loss rate is less than 60%. Even when 85% of the data are lost, all missing data can be estimated by MACS with less than 10% error.

I. INTRODUCTION

Wireless sensor networks (WSNs) [4][6] are widely used to gather multiple attributes from the physical world and to reconstruct environmental data in the cyber world [17]. Such data help scientists understand the physical world. For instance, scientists study plant evolution based on wind speed, air humidity and air temperature data [10], and predict eruptions from the temperature and shaking data of a volcano [12][9][18]. However, massive data loss is common in WSNs, e.g., 64% and 35% of the data are missing in the Ocean Sense project [21] and the GreenOrbs project [20], respectively. Recovering these lost data with high accuracy is therefore challenging.

The high loss rates veil the temporal and spatial correlations. Therefore, classical interpolation methods, such as K-Nearest Neighbors (KNN) [16], cannot provide a satisfactory result due to the lack of one-hop neighbors. A recently proposed compressive sensing approach, the Environmental Space Time Improved Compressive Sensing (ESTI-CS) [8], achieves better accuracy. However, the low-rank and sparse features are also affected in the massive data loss scenario, where ESTI-CS experiences increased estimation error.

We are aware of the following two facts. (1) Usually, WSNs gather multiple attributes simultaneously, e.g., a TelosB node [20] senses three attributes: temperature, light illumination and humidity. (2) Intuitively, one can expect those attributes to be correlated. For instance, when the sun is rising, the outdoor temperature and light illumination increase simultaneously, and the salinity of sea water is tied to the depth. The empirical study [11] reveals that temperature, dewpoint temperature and relative humidity are linearly correlated. The correlations among attributes can supplement the internal correlations of each attribute and improve the accuracy of the estimation. Hence, our technical route is to mine and exploit such correlations for the problem of missing data recovery.

To address this problem, firstly, we study the characteristics of real sensory data from the Intel Indoor project [7] and the GreenOrbs project [20]. The low-rank feature of the attributes is revealed, and we propose a joint sparse decomposition method in order to find the cross features among multiple attributes. A common part carrying a substantial share of the energy is found in two correlated attributes. Secondly, we design an algorithm, named MACS, which recovers multi-attribute datasets jointly by using their correlation. Thirdly, we simulate the proposed approach on real data and compare MACS with classical and state-of-the-art methods such as KNN and ESTI-CS.

Our contributions are summarized as follows:

• To the best of our knowledge, this is the first work to study joint data recovery in WSNs.

• We design a novel algorithm, MACS, which is based on compressive sensing theory.

• Real trace-driven simulations are performed extensively. The evaluation shows that MACS outperforms the other compared solutions.

The rest of this paper is organized as follows. In Section II, we present the related work. Section III gives the problem formulation. Section IV mines the internal and external features of attributes in WSNs. Section V proposes our approach, MACS. The performance is evaluated in Section VI. Section VII presents the conclusion and future work.

II. RELATED WORK

Many works have contributed to missing data interpolation. The most classic interpolation method is K-Nearest Neighbors (KNN) [16], which uses the average value of neighbors to estimate the missing data. This interpolation method performs well when the number of missing values is moderate. As the loss rate grows, the estimation error increases quickly due to the lack of one-hop neighbors.

Compressive Sensing (CS) [2][3] is currently an advanced and powerful technique for estimating massive missing data. A series of CS-based solutions are used in different fields, e.g., Distributed Compressive Sensing (DCS) [1][5] and Multi-Task Compressive Sensing (MTCS) [15] are utilized in signal processing and image processing. The state-of-the-art CS-based interpolation method in the field of WSNs is ESTI-CS [8]. ESTI-CS exploits the low-rank feature and the spatial-temporal feature of the sensory data against the special loss patterns of WSNs. However, the low-rank and sparse features are also affected in the massive data loss scenario, where ESTI-CS experiences increased estimation error.

All of the above methods estimate missing values based on a single attribute. However, many physical attributes in nature are strongly correlated, such as humidity and temperature [11]. This work further improves the recovery accuracy by exploiting such correlations. To the best of our knowledge, this is the first missing data recovery work using multiple attributes in WSNs.

III. PROBLEM FORMULATION

A. Environment Data Recovery Problem

Suppose n nodes are deployed in an area, each of which is equipped with k sensors to measure attributes. The monitoring period includes t time slots. The format of a data packet is as follows:

Sensor ID | Time Stamp | Attribute 1 | Attribute 2 | ...

Hereafter, let the k attributes be denoted by $M_i$, $i = 1, 2, \cdots, k$. Each $M_i$ is an $n \times t$ matrix, and it is usually incomplete because of the data loss in WSNs. The available information about $M_i$ is a sampled set of entries $(M_i)_{pq}$, $(p, q) \in \Omega_i$, where $\Omega_i$ is a subset of the complete set of entries in $M_i$. This process is represented by a sampling operator $P_\Omega(\cdot)$, which is defined as:

$$
[P_\Omega(X)]_{ij} =
\begin{cases}
X_{ij}, & (i, j) \in \Omega; \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
$$

Therefore, the matrices we obtain are $P_{\Omega_i}(M_i)$, $i = 1, \cdots, k$. Our problem is to recover the matrices $M_1, \cdots, M_k$ (complete environmental data) from their sampled matrices $P_{\Omega_1}(M_1), \cdots, P_{\Omega_k}(M_k)$ (incomplete data gathered by the WSN) as precisely as possible. We call this the Environment Data Recovery (EDR) problem.
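As a minimal sketch of the sampling operator in Eqn. (1), the set Ω can be held as a boolean mask of the same shape as the attribute matrix. The function name `sample`, the toy readings and the mask below are illustrative assumptions, not values from the datasets.

```python
import numpy as np

def sample(X, mask):
    """Apply P_Omega: keep entries where mask is True, set the rest to 0."""
    return np.where(mask, X, 0.0)

# Toy attribute matrix: 3 nodes x 4 time slots (made-up readings).
M = np.array([[20.1, 20.3, 20.6, 20.8],
              [19.8, 20.0, 20.2, 20.5],
              [21.0, 21.1, 21.4, 21.6]])

# Boolean encoding of Omega: True = packet received, False = packet lost.
mask = np.array([[True,  False, True,  True],
                 [False, True,  True,  False],
                 [True,  True,  False, True]])

S = sample(M, mask)              # the incomplete matrix P_Omega(M)
loss_rate = 1.0 - mask.mean()    # fraction of lost entries
```

Storing Ω as a mask rather than a list of index pairs keeps the operator a single vectorized call; this is a design convenience of the sketch only.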

B. Problem Statement

Since we focus on exploiting the correlation among multiple attributes, multiple matrices are estimated jointly. For simplicity, in most parts of this paper we discuss the EDR problem for two attributes as an example. Our analysis and approach can be easily extended to the case of more attributes.

Formally, when $k = 2$, the problem is defined as follows: given subsets of $M_1$ and $M_2$ as $P_{\Omega_1}(M_1)$ and $P_{\Omega_2}(M_2)$, find an optimal solution $\hat{M}_1$ and $\hat{M}_2$, i.e.,

$$
\begin{aligned}
\text{minimize} \quad & \|\hat{M}_1 - M_1\|_F + \mu \|\hat{M}_2 - M_2\|_F, \\
\text{subject to} \quad & P_{\Omega_1}(\hat{M}_1) = P_{\Omega_1}(M_1), \\
& P_{\Omega_2}(\hat{M}_2) = P_{\Omega_2}(M_2),
\end{aligned}
\tag{2}
$$

where $\|\cdot\|_F$ represents the Frobenius norm, which is used in [19][8]. For instance, for a $p \times q$ matrix $X = (x(i, j))$, $\|X\|_F = \sqrt{\sum_{i,j} x(i, j)^2}$. Because the magnitudes of the attributes are not equal, one matrix may overshadow the other, so $\mu$ is used as a tradeoff coefficient.

Fig. 1. Filter the original dataset by selecting the red parts to construct a small but complete dataset as the ground truth [8].

TABLE I
SELECTED DATASETS AS THE GROUND TRUTH

Data Name | Matrix Size | Time Interval
Intel Indoor | 49 nodes × 149 intervals | 1.5 minutes
GreenOrbs Temperature | 281 nodes × 170 intervals | 10 minutes

IV. DATASETS IN SENSOR NETWORKS

In this section, we analyze the real datasets of WSNs and discover several features of them, which are the foundations for our data recovery approach.

A. Ground Truth

The original datasets are gathered from two projects, GreenOrbs [20] and Intel Indoor [7]. Our investigation of the raw data shows that the loss rates of these two datasets are 35% and 23%, respectively. Hence, in order to obtain the ground truth, two small but complete datasets are selected, as shown in TABLE I. The selection method is shown in Fig. 1; it maximizes the completeness of the data in both time and space.

Each dataset contains subsets of two attributes, temperature and light illumination, which share the same selected entries.

B. Low-rank Structure

Considering that the readings of nearby sensors are correlated and that the readings within short time periods are close, we mine the inherent structure, or redundancy, of environment datasets.

The singular value decomposition (SVD) is adopted. The SVD of an $n \times t$ matrix $X$ is:

$$
X = U \Sigma V^T = \sum_{i=1}^{\min(n,t)} \sigma_i u_i v_i^T,
\tag{3}
$$

where $\sigma_i \geq \sigma_{i+1}$, $i = 1, \cdots, \min(n, t)$, $(\cdot)^T$ is the transpose operator, $U$ is an $n \times n$ orthogonal matrix, $V$ is a $t \times t$ orthogonal matrix and $\Sigma$ is an $n \times t$ diagonal matrix containing all singular values $\sigma_i$ of $X$. Suppose $r = \mathrm{rank}(X)$; then $\Sigma = \mathrm{diag}(\sigma_1, \cdots, \sigma_r, 0, \cdots, 0)$.

The sum of all singular values represents the total energy of $X$. According to [19], if a matrix $X$ is low-rank, the sum of its first $r$ singular values occupies the total or near-total energy, i.e.,

$$
\sum_{i=1}^{r} \sigma_i \approx \sum_{i=1}^{\min(n,t)} \sigma_i.
$$

Fig. 2 is a CDF that shows the distribution of the energy over the singular values. The top 5% of the singular values contain all the energy in Indoor temperature and Indoor light. The top 5% and 20% of the $\sigma_i$ include 90% of the energy in GreenOrbs temperature and GreenOrbs light, respectively. These results show that rank minimization is suitable for our data recovery problem.

[Figure 2: energy fraction (%) versus the first i singular values (%), with one curve each for GreenOrbs Light, GreenOrbs Temperature, Indoor Light and Indoor Temperature.]

Fig. 2. The first 5% of the singular values contribute over 90% of the total energy in GreenOrbs temperature and Indoor light/temperature. The number is 20% in GreenOrbs light.

[Figure 3: energy fraction (%) of the Common Part U, Individual Part Δ1 and Individual Part Δ2 for three pairs: two same matrices, forest light/temp, and forest light/indoor light.]

Fig. 3. Correlation analysis by joint sparse decomposition.
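The energy analysis behind Fig. 2 can be sketched as follows. This is an illustration on synthetic matrices, not on the Intel Indoor or GreenOrbs traces; the function name `singular_fraction_for_energy` and the random test matrices are assumptions made for the example. Energy is taken as the sum of singular values, as in the text.

```python
import numpy as np

def singular_fraction_for_energy(X, target=0.9):
    """Fraction of the leading singular values whose sum reaches
    `target` of the total energy (sum of all singular values)."""
    s = np.linalg.svd(X, compute_uv=False)      # sorted, largest first
    cumulative = np.cumsum(s) / np.sum(s)
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(s)) / len(s)

rng = np.random.default_rng(0)
# A synthetic rank-2 matrix: nearly all energy sits in 2 of 40 singular values.
low_rank = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
# An unstructured Gaussian matrix for contrast.
noise = rng.standard_normal((50, 40))

frac_low = singular_fraction_for_energy(low_rank)
frac_noise = singular_fraction_for_energy(noise)
```

A small value of the returned fraction indicates a low-rank structure in the sense used above, while an unstructured matrix needs a much larger share of its singular values to reach the same energy.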

C. Inter-Correlation between Attributes

Relationships usually exist among natural attributes. For instance, the empirical study [11] reveals that temperature, dewpoint temperature and relative humidity are linearly correlated under some special cases. However, in most cases, the correlations cannot be directly expressed as a simple function.

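In order to exploit the relationship of attributes in a WSN, we propose the joint sparse decomposition (JSD) to divide two matrices into a common part and two individual parts. Suppose $M_1 = (m_1^{(1)}, \cdots, m_1^{(t)})$ and $M_2 = (m_2^{(1)}, \cdots, m_2^{(t)})$. For each pair of column vectors $m_1^{(k)}$ and $m_2^{(k)}$, the goal is to split them, i.e.,

$$
\begin{aligned}
m_1^{(k)} &= u^{(k)} + \delta_1^{(k)}, \\
m_2^{(k)} &= u^{(k)} + \delta_2^{(k)}, \\
u^{(k)} &= \Psi v^{(k)},
\end{aligned}
\tag{4}
$$

where $u^{(k)}$ is the common part of $m_1^{(k)}$ and $m_2^{(k)}$, which is the product of a certain basis $\Psi$ (e.g., a wavelet basis) and a sparse vector $v^{(k)}$. The individual parts are represented by $\delta_1^{(k)}$ and $\delta_2^{(k)}$, respectively. Furthermore, Eqn. (4) is rewritten in matrix form, i.e.,

$$
\begin{bmatrix} m_1^{(k)} \\ m_2^{(k)} \end{bmatrix}
=
\begin{bmatrix} \Psi & I & 0 \\ \Psi & 0 & I \end{bmatrix}
\begin{bmatrix} v^{(k)} \\ \delta_1^{(k)} \\ \delta_2^{(k)} \end{bmatrix}.
\tag{5}
$$

According to compressive sensing theory [2][3], $(v^{(k)T}, \delta_1^{(k)T}, \delta_2^{(k)T})^T$ can be obtained by solving an $l_1$-norm minimization problem as follows:

$$
\hat{\vartheta} = \arg\min_{\vartheta} \|\vartheta\|_1 \quad \text{s.t.} \quad m = A\vartheta,
\tag{6}
$$

where $\|\cdot\|_1$ is the $l_1$-norm, $\vartheta = (v^{(k)T}, \delta_1^{(k)T}, \delta_2^{(k)T})^T$, $m = (m_1^{(k)T}, m_2^{(k)T})^T$ and $A = (\Psi, I, 0; \Psi, 0, I)$. Then $u^{(k)}$, $\delta_1^{(k)}$ and $\delta_2^{(k)}$ are calculated from $\hat{\vartheta}$.

TABLE II
LOW-RANK FEATURES AFTER JOINT SPARSE DECOMPOSITION

Data Name | Matrix Name | Fraction of singular values containing 90% of the energy
Intel Indoor light/temperature | U | 14%
Intel Indoor light/temperature | Δ1 | 8%
Intel Indoor light/temperature | Δ2 | 2%
GreenOrbs light/temperature | U | 27%
GreenOrbs light/temperature | Δ1 | 28%
GreenOrbs light/temperature | Δ2 | 21%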

Applying JSD to every column vector, $M_1$ and $M_2$ are decomposed, i.e.,

$$
M_1 = U + \Delta_1, \qquad M_2 = U + \Delta_2.
\tag{7}
$$
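To make the decomposition concrete, the sketch below handles only a special case chosen for illustration: the basis Ψ is assumed to be the identity, which is not the wavelet-style basis suggested in the text. Under that assumption the l1 problem of Eqn. (6) separates per entry into minimizing |v| + |m1 − v| + |m2 − v|, and the minimizer of a sum of absolute deviations from the three points 0, m1 and m2 is their median. The function name `jsd_identity_basis` is hypothetical.

```python
import numpy as np

def jsd_identity_basis(M1, M2):
    """Joint sparse decomposition for the special case Psi = I.

    Returns (U, D1, D2) with M1 = U + D1 and M2 = U + D2, where each
    entry of U is median(0, m1, m2), the per-entry l1 minimizer.
    Works on vectors or on whole matrices (column by column is the
    same as entry by entry when Psi = I).
    """
    M1 = np.asarray(M1, dtype=float)
    M2 = np.asarray(M2, dtype=float)
    U = np.median(np.stack([np.zeros_like(M1), M1, M2]), axis=0)
    return U, M1 - U, M2 - U

# Identical inputs: everything goes to the common part.
m = np.array([1.0, -2.0, 3.0])
U_same, D1_same, D2_same = jsd_identity_basis(m, m)

# Partly shared inputs: the common part takes the overlap.
U_mix, D1_mix, D2_mix = jsd_identity_basis([2.0, 5.0, 1.0], [3.0, 1.0, -1.0])
```

With a non-trivial Ψ the entries no longer decouple and a general l1 solver (for example a linear program) would be needed; that case is outside this sketch.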

Fig. 3 shows that the common part occupies a larger ratio when two matrices are more strongly correlated. For instance, when JSD is applied to two identical matrices, which have the highest possible correlation, the common part contains 100% of the energy and the individual parts are Δ1 = Δ2 = 0. Intuitively, in an outdoor WSN, the sensed light and temperature are correlated. Fig. 3 shows that the common part of light/temperature in the forest contains 29% of the total energy, while the individual parts contain 35% and 36%. We also verify JSD on two unrelated matrices. The common part of forest light and indoor light, between which no relationship exists, contains only 7%.

In addition, the low-rank feature of the matrices after JSD is also revealed. As shown in TABLE II, over 90% of the energy of the attributes is contained in the first 30% of the singular values. This means that the matrices derived by JSD still exhibit the low-rank feature.

V. OUR APPROACH

To address the EDR problem, we propose a novel relative data estimation approach named Multi-Attribute-assistant Compressive Sensing (MACS), which is designed to jointly recover the attributes in a WSN.

A. Approach Design

Normalization: In Eqn. (2), the choice of $\mu$ has a significant effect on the accuracy of estimation. Since the relationship between $M_1$ and $M_2$ is unknown, it is difficult to find the best $\mu$. To overcome this difficulty, a simple method is to normalize each matrix and then set $\mu = 1$. The true maximum value may itself be lost, hence we adopt the maximum value in the gathered dataset instead, i.e., for each sensory matrix $P_{\Omega_i}(M_i)$, $\max(P_{\Omega_i}(M_i))$ is used for the normalization. This operation is based on the observation that natural attributes change gradually. In other words, the gap between the maximum values of the observed matrix and the original matrix is small in terms of magnitude, i.e., for $i = 1, 2$,

$$
\max(M_i) - \max(P_{\Omega_i}(M_i)) \ll \max(M_i).
\tag{8}
$$

Low-Rank Matrix Approximation: Eqn. (2) contains the unknown matrices $M_1$ and $M_2$, so the problem cannot be solved directly. However, since the low-rank features are revealed in Sec. IV, the problem can be converted to a rank minimization problem. Through the inverse process of SVD, using the $k$ largest singular values of $X$, an optimal rank-$k$ approximation [8] of $X$ under the Frobenius norm $\|\cdot\|_F$ of errors can be obtained as $\hat{X} = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. In our problem, this method is infeasible since we know neither the complete $M_i$ nor its proper rank. However, it is reasonable to assume that the estimated $\hat{M}_i$ is low-rank due to the low-rank feature of the original $M_i$. Thus the optimal $\hat{M}_i$ is obtained from the problem:

$$
\text{minimize} \ \mathrm{rank}(\hat{M}_i), \quad \text{s.t.} \ P_{\Omega_i}(M_i) = P_{\Omega_i}(\hat{M}_i).
$$

Two difficulties remain: (1) the rank operator $\mathrm{rank}(\cdot)$ is not convex; (2) there is no connection between $M_1$ and $M_2$.

To bypass difficulty (1), we utilize an SVD-like factorization [8], $\hat{X} = LR^T$, where $L$ is an $n \times k$ matrix, $R$ is a $t \times k$ matrix, and $k$ is an approximation of the proper rank. According to the matrix compressive-sensing literature [14][19], rank minimization is exactly equivalent to nuclear norm minimization when a certain technical condition holds on $P_\Omega(\cdot)$ (the restricted isometry property [14]). Further, if the rank of $X$ is less than the rank of $LR^T$, $\min(\mathrm{rank}(\hat{X}))$ is equivalent to $\min(\|L\|_F^2 + \|R^T\|_F^2)$.
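The two ingredients above, the optimal rank-k approximation by truncated SVD and the SVD-like factorization into L and R, can be checked numerically on a small example. The sketch below is an illustration on a random matrix; the function name `rank_k_factors` and the balanced split of the singular values between L and R are choices made for the example. For this balanced choice, the sum of squared Frobenius norms of L and R equals twice the sum of the retained singular values, which hints at why penalizing the factors keeps the nuclear norm small.

```python
import numpy as np

def rank_k_factors(X, k):
    """Return (L, R) with L @ R.T equal to the best rank-k approximation
    of X in Frobenius norm, splitting each singular value evenly."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:k])
    L = U[:, :k] * root          # n x k
    R = Vt[:k, :].T * root       # t x k
    return L, R

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))
k = 2
L, R = rank_k_factors(X, k)
X_hat = L @ R.T

s = np.linalg.svd(X, compute_uv=False)
approx_error = np.linalg.norm(X - X_hat, "fro")
# Error of the truncated SVD: energy of the discarded singular values.
expected_error = np.sqrt(np.sum(s[k:] ** 2))
factor_penalty = np.linalg.norm(L, "fro") ** 2 + np.linalg.norm(R, "fro") ** 2
```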

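Compressive Sensing-based Joint Matrix Decomposition: To overcome difficulty (2), we need to find the correlation between $M_1$ and $M_2$, and then exploit it in finding an optimal solution.

Through the joint sparse decomposition proposed in Sec. IV, the correlation between two matrices can be revealed. Hence, we separate the approximations $\hat{M}_1$ and $\hat{M}_2$ by JSD as:

$$
\hat{M}_1 = \hat{U} + \hat{\Delta}_1, \qquad \hat{M}_2 = \hat{U} + \hat{\Delta}_2.
\tag{9}
$$

Assume that $\hat{U}$, $\hat{\Delta}_1$ and $\hat{\Delta}_2$ are low-rank, based on the low-rank structure analysis in Sec. IV. The problem is reformulated as:

$$
\begin{aligned}
\text{minimize} \quad & \|\hat{U}\|_* + \|\hat{\Delta}_1\|_* + \|\hat{\Delta}_2\|_* \\
\text{subject to} \quad & P_{\Omega_1}(\hat{U} + \hat{\Delta}_1) = P_{\Omega_1}(M_1), \\
& P_{\Omega_2}(\hat{U} + \hat{\Delta}_2) = P_{\Omega_2}(M_2),
\end{aligned}
\tag{10}
$$

where $\|\cdot\|_*$ is the nuclear norm, which is defined as the sum of singular values, e.g., $\|X\|_* = \sum_{i=1}^{r} \sigma_i(X)$.

Furthermore, by applying the SVD-like factorization to $\hat{U}$, $\hat{\Delta}_1$ and $\hat{\Delta}_2$, the objective of Eqn. (10) is rewritten as:

$$
\|L_U\|_F^2 + \|R_U^T\|_F^2 + \|L_1\|_F^2 + \|R_1^T\|_F^2 + \|L_2\|_F^2 + \|R_2^T\|_F^2,
\tag{11}
$$

where $L_U$, $L_1$, $L_2$ are $n \times r$ matrices and $R_U$, $R_1$, $R_2$ are $t \times r$ matrices. Moreover, $\hat{U} = L_U R_U^T$, $\hat{\Delta}_1 = L_1 R_1^T$ and $\hat{\Delta}_2 = L_2 R_2^T$.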

To avoid overfitting, we convert the problem to a non-stationary optimization problem by using the Lagrange multiplier method, i.e.,

$$
\begin{aligned}
\text{minimize} \quad & \|P_{\Omega_1}(L_U R_U^T + L_1 R_1^T) - S_1\|_F^2 + \|P_{\Omega_2}(L_U R_U^T + L_2 R_2^T) - S_2\|_F^2 \\
& + \lambda \Big( \sum_{j \in \{U, 1, 2\}} \|L_j\|_F^2 + \sum_{j \in \{U, 1, 2\}} \|R_j\|_F^2 \Big),
\end{aligned}
\tag{12}
$$

where $S_1 = P_{\Omega_1}(M_1)$ and $S_2 = P_{\Omega_2}(M_2)$. The Lagrange multiplier $\lambda$ allows a tunable tradeoff between the rank minimization and the fitting accuracy.

Eqn. (12) is solvable because (1) $\Omega_1$, $\Omega_2$, $S_1$ and $S_2$ are known, (2) each $\|\cdot\|_F^2$ is non-negative, and (3) the optimal value can be reached by minimizing all non-negative parts toward zero. Hence, $\hat{M}_1$ and $\hat{M}_2$ can be estimated by combining Eqn. (12) with Eqn. (9).

Extension: Our approach is also suitable for the case of more attributes. For instance, if three attributes, represented as $M_1$, $M_2$ and $M_3$, are measured in one WSN, the objective of Eqn. (10) is rewritten as:

$$
\|\hat{U}\|_* + \|\hat{\Delta}_1\|_* + \|\hat{\Delta}_2\|_* + \|\hat{\Delta}_3\|_*,
\tag{13}
$$

where $\hat{M}_i = \hat{U} + \hat{\Delta}_i$ for $i = 1, 2, 3$. This problem can be solved by a method similar to Alg. 1. For more attributes, Eqn. (10) can be extended in the same way.
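One plausible way to write the corresponding regularized problem for $k$ attributes, obtained by applying the same factorization and Lagrange step used for Eqn. (12) to each term of the extended objective, is sketched below. This generalization is an assumption of the sketch and is not stated in this form in the text; it presumes a single common part shared by all attributes.

```latex
\begin{aligned}
\text{minimize} \quad
& \sum_{i=1}^{k} \left\| P_{\Omega_i}\!\left(L_U R_U^T + L_i R_i^T\right) - S_i \right\|_F^2 \\
& + \lambda \left( \|L_U\|_F^2 + \|R_U\|_F^2
  + \sum_{i=1}^{k} \left( \|L_i\|_F^2 + \|R_i\|_F^2 \right) \right),
\qquad S_i = P_{\Omega_i}(M_i).
\end{aligned}
```

Setting $k = 2$ recovers Eqn. (12) term by term.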

B. Algorithm

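To solve the optimization problem defined by Eqn. (12), we propose an efficient algorithm. The details of this algorithm are shown in Alg. 1.

The algorithm solves the optimization in an iterative manner. First, all $L$ and $R$ matrices are initialized randomly except $R_U$. Fixing $L_U$, $R_U$ can be calculated from the other $L$ and $R$ matrices by solving the equation:

$$
\begin{bmatrix}
P_{\Omega_1}(L_U R_U^T) \\
P_{\Omega_2}(L_U R_U^T) \\
\sqrt{\lambda}\, R_U^T
\end{bmatrix}
=
\begin{bmatrix}
S_1 - P_{\Omega_1}(L_1 R_1^T) \\
S_2 - P_{\Omega_2}(L_2 R_2^T) \\
0
\end{bmatrix}.
\tag{14}
$$

This equation can be solved column by column. Rewrite it as:

$$
\begin{bmatrix}
P_{\Omega_1}(L_U)^{(i)} R_U^{T(i)} \\
P_{\Omega_2}(L_U)^{(i)} R_U^{T(i)} \\
\sqrt{\lambda}\, R_U^{T(i)}
\end{bmatrix}
=
\begin{bmatrix}
(S_1 - P_{\Omega_1}(L_1 R_1^T))^{(i)} \\
(S_2 - P_{\Omega_2}(L_2 R_2^T))^{(i)} \\
0
\end{bmatrix}
\tag{15}
$$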

Algorithm 1 MACS Algorithm

Input:
    Ω1 and Ω2: sensory entry sets
    S1 and S2: incomplete sensory data
    r: rank estimation of M1 and M2
    λ: tradeoff coefficient
    k: iteration times
Output:
    M̂1 and M̂2: estimated environment matrices

Main Procedure:
    Normalization:
    α1 ← max(S1); α2 ← max(S2);
    S1 ← S1./α1; S2 ← S2./α2;
    Approximation:
    LU ← rand(n, r); L1 ← rand(n, r); L2 ← rand(n, r);
    R1 ← rand(r, t); R2 ← rand(r, t);
    for 1 to k do
        A1 = S1 − PΩ1(L1 R1^T); A2 = S2 − PΩ2(L2 R2^T);
        RU ← crossInverse(Ω1, Ω2, LU, λ, r, A1, A2)
        LU ← crossInverse(Ω1^T, Ω2^T, RU, λ, r, A1^T, A2^T)
        B1 = S1 − PΩ1(LU RU^T); B2 = S2 − PΩ2(LU RU^T);
        R1 ← singleInverse(Ω1, L1, λ, r, B1)
        L1 ← singleInverse(Ω1^T, R1, λ, r, B1^T)
        R2 ← singleInverse(Ω2, L2, λ, r, B2)
        L2 ← singleInverse(Ω2^T, R2, λ, r, B2^T)
        v ← Eqn.(12)
        if v < v̂ then
            L̂U ← LU; R̂U ← RU; L̂1 ← L1; R̂1 ← R1; L̂2 ← L2; R̂2 ← R2; v̂ ← v;
        end if
    end for
    M̂1 ← α1 (L̂U R̂U^T + L̂1 R̂1^T)
    M̂2 ← α2 (L̂U R̂U^T + L̂2 R̂2^T)

Procedure Y = singleInverse(Ω, L, λ, r, S):
    for i = 1 to t do
        Pi ← [PΩ(L)(:, i); √λ ∗ Ir]
        Qi ← [S(:, i); 0r]
        Y(:, i) = (Pi^T ∗ Pi) \ (Pi^T ∗ Qi)
    end for

Procedure Y = crossInverse(Ω1, Ω2, L, λ, r, S1, S2):
    for i = 1 to t do
        Pi ← [PΩ1(L)(:, i); PΩ2(L)(:, i); √λ ∗ Ir]
        Qi ← [S1(:, i); S2(:, i); 0r]
        Y(:, i) = (Pi^T ∗ Pi) \ (Pi^T ∗ Qi)
    end for

In Eqn. (15), i ranges from 1 to t, and the equation for each i can be treated as a linear least squares problem. RU is obtained by the inverse procedure given in Alg. 1 as crossInverse, and LU is computed by the same procedure with RU fixed. Li and Ri are obtained by a similar procedure, which is given in Alg. 1 as singleInverse.

Moreover, in Alg. 1, the rank approximation r and the Lagrange tradeoff coefficient λ have a significant influence on the accuracy of estimation. Hence, λ is tuned by the method in [13], and our evaluation uses r = 20% · min(n, t), since 20% of the singular values contribute over 90% of the energy in the datasets.

The complexity of the algorithm is O(rntk), because the key operation in Alg. 1 is the inverse computation, whose complexity is O(nrt) [8], and the algorithm iterates k times.
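The alternating scheme of Alg. 1 can be sketched in NumPy as follows. This is a minimal reading of the pseudocode under stated assumptions, not the authors' implementation: Ω is encoded as a boolean mask, R matrices are stored as t × r so that every estimate is L @ R.T, each column-wise solve uses `np.linalg.lstsq` on only the observed rows (instead of zeroing unobserved rows), and the synthetic data and the parameter values r, λ and the iteration count are chosen only for the demonstration. The names `ridge_columns` and `macs` are hypothetical.

```python
import numpy as np

def ridge_columns(masks, F, targets, lam):
    """Column-wise regularized least squares (crossInverse / singleInverse
    in spirit): row i of the result solves, over y,
    sum_j ||F[mask_j[:, i]] y - target_j[mask_j[:, i], i]||^2 + lam ||y||^2."""
    r = F.shape[1]
    n_cols = targets[0].shape[1]
    reg = np.sqrt(lam) * np.eye(r)
    Y = np.zeros((n_cols, r))
    for i in range(n_cols):
        P = np.vstack([F[m[:, i]] for m in masks] + [reg])
        q = np.concatenate([T[m[:, i], i] for m, T in zip(masks, targets)]
                           + [np.zeros(r)])
        Y[i] = np.linalg.lstsq(P, q, rcond=None)[0]
    return Y

def macs(S1, S2, mask1, mask2, r, lam, iters, seed=0):
    rng = np.random.default_rng(seed)
    n, t = S1.shape
    a1, a2 = S1[mask1].max(), S2[mask2].max()   # normalization constants
    S1, S2 = S1 / a1, S2 / a2
    LU, L1, L2 = (rng.random((n, r)) for _ in range(3))
    R1, R2 = (rng.random((t, r)) for _ in range(2))
    keep = lambda mask, X: np.where(mask, X, 0.0)   # P_Omega
    best_v, best = np.inf, None
    for _ in range(iters):
        A1 = S1 - keep(mask1, L1 @ R1.T)
        A2 = S2 - keep(mask2, L2 @ R2.T)
        RU = ridge_columns([mask1, mask2], LU, [A1, A2], lam)
        LU = ridge_columns([mask1.T, mask2.T], RU, [A1.T, A2.T], lam)
        B1 = S1 - keep(mask1, LU @ RU.T)
        B2 = S2 - keep(mask2, LU @ RU.T)
        R1 = ridge_columns([mask1], L1, [B1], lam)
        L1 = ridge_columns([mask1.T], R1, [B1.T], lam)
        R2 = ridge_columns([mask2], L2, [B2], lam)
        L2 = ridge_columns([mask2.T], R2, [B2.T], lam)
        X1 = LU @ RU.T + L1 @ R1.T
        X2 = LU @ RU.T + L2 @ R2.T
        # Objective of Eqn. (12); keep the best iterate seen so far.
        v = (np.sum((keep(mask1, X1) - S1) ** 2)
             + np.sum((keep(mask2, X2) - S2) ** 2)
             + lam * sum(np.sum(F ** 2) for F in (LU, RU, L1, R1, L2, R2)))
        if v < best_v:
            best_v, best = v, (a1 * X1, a2 * X2)
    return best

# Synthetic, correlated toy attributes (not real sensor traces).
rng = np.random.default_rng(1)
n, t = 20, 30
pattern = np.outer(1 + rng.random(n), 1 + np.sin(np.linspace(0, 3, t)))
M1 = 20 + 5 * pattern       # "temperature"-like
M2 = 100 + 50 * pattern     # "light"-like
mask1 = rng.random((n, t)) > 0.4     # roughly 40% loss
mask2 = rng.random((n, t)) > 0.4
S1, S2 = np.where(mask1, M1, 0.0), np.where(mask2, M2, 0.0)

M1_hat, M2_hat = macs(S1, S2, mask1, mask2, r=2, lam=0.01, iters=20)
err1 = np.sum((M1 - M1_hat)[~mask1] ** 2) / np.sum(M1[~mask1] ** 2)
err2 = np.sum((M2 - M2_hat)[~mask2] ** 2) / np.sum(M2[~mask2] ** 2)
```

Each block update is an exact minimizer of Eqn. (12) with the other factors fixed, so the objective does not increase within an iteration; how close the result gets to the true matrices depends on the data, r and λ.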

VI. PERFORMANCE EVALUATION

A. Methodology

The performance evaluation is based on real trace-driven simulation.

Ground Truth: The real trace includes the temperature and light illumination attributes from GreenOrbs and Intel Indoor projects. In Sec. IV-A, we have presented the method to obtain the ground truth from raw data in detail.

Compared Methods: To verify the effectiveness of our approach, two methods for missing data recovery in WSNs are chosen for comparison. They are the classical interpolation method, K-Nearest Neighbors (KNN) [16], and the state-of-the-art method, Environmental Space Time Improved Compressive Sensing (ESTI-CS) [8].

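Metric: To compare results evaluated from different matrices, the error rate of approximation under the Frobenius norm, $\mathrm{err}(\hat{M}, M, \Omega)$, is applied [13], which is defined as:

$$
\mathrm{err}(\hat{M}, M, \Omega) = \frac{\|P_{\bar{\Omega}}(M) - P_{\bar{\Omega}}(\hat{M})\|_F^2}{\|P_{\bar{\Omega}}(M)\|_F^2},
\tag{16}
$$

where $\bar{\Omega}$ is the complementary set of $\Omega$, i.e., the set of missing entries.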
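Eqn. (16) translates directly into a few lines of code. In this sketch the set Ω is assumed to be a boolean mask (True = observed), so the complementary set is simply the negated mask; the function name `error_rate` and the 2 × 2 example are illustrative.

```python
import numpy as np

def error_rate(M_hat, M, mask):
    """Eqn. (16): squared Frobenius error over the missing entries
    (mask == False), relative to the energy of the true missing entries."""
    missing = ~mask
    numerator = np.sum((M[missing] - M_hat[missing]) ** 2)
    denominator = np.sum(M[missing] ** 2)
    return numerator / denominator

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
mask = np.array([[True, False],
                 [False, True]])          # entries (0,1) and (1,0) are lost
M_hat = np.array([[1.0, 2.2],
                  [3.3, 4.0]])            # a made-up estimate

err = error_rate(M_hat, M, mask)          # (0.2^2 + 0.3^2) / (2^2 + 3^2)
err_perfect = error_rate(M, M, mask)
```

Only the missing entries enter the metric, so an estimator gains nothing from reproducing the observed entries.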

Procedure: The procedure of the simulation is as follows:

1) Randomly drop data from the ground truth to simulate the data gathered in WSNs. Generate the subset Ω from the random loss pattern and then set Ω1 = Ω2 = Ω. The fraction of lost data ranges from 20% to 90%.

2) Using the datasets of two physical attributes, compute PΩ(M1) and PΩ(M2).

3) PΩ(M1) and PΩ(M2) serve as the inputs of the estimation algorithms, including KNN, ESTI-CS and MACS, separately or together. We then obtain the approximations of M1 and M2 as M̂1 and M̂2.

4) Compare the performance of the algorithms on the error rate defined by Eqn. (16).
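Step 1 of the procedure above can be sketched as follows. The function name `random_loss_mask` is hypothetical, the mask is drawn uniformly at random without regard to any structured loss pattern, and the matrix shape is borrowed from the Intel Indoor row of TABLE I only to size the example.

```python
import numpy as np

def random_loss_mask(shape, loss_rate, rng):
    """Boolean mask (True = kept) with round(loss_rate * size) entries
    chosen uniformly at random and marked as lost."""
    size = int(np.prod(shape))
    n_lost = int(round(loss_rate * size))
    flat = np.ones(size, dtype=bool)
    flat[rng.choice(size, size=n_lost, replace=False)] = False
    return flat.reshape(shape)

rng = np.random.default_rng(0)
shape = (49, 149)                       # nodes x intervals
# One shared mask per loss rate, used for both attributes (Omega1 = Omega2).
masks = {rate: random_loss_mask(shape, rate, rng)
         for rate in (0.2, 0.4, 0.6, 0.8, 0.9)}
```

Each mask can then be applied to both attribute matrices before they are passed to the recovery algorithms under comparison.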

B. Simulation Results

In Fig. 4, we plot the comparison of the three algorithms in the case of two attributes. According to the simulation, MACS obtains an error rate of less than 5% when the loss rate is less than 60%, whereas ESTI-CS achieves 10% and KNN performs worse. Even at a high loss rate (80%), the error rate of MACS is still less than 10%. The main reason is that MACS uses the correlation between the two attributes; hence, the accuracy of estimating missing values increases if such correlation exists. Even when there is no relation between the two attributes, the performance of MACS equals that of ESTI-CS.

The recovery accuracy of temperature is higher than that of light illumination. The main reason is that the temperature in outdoor WSNs changes slowly and has a small amplitude, which leads to strong temporal and spatial stability that benefits estimation methods. The accuracy for light illumination in GreenOrbs is somewhat lower because light illumination varies considerably in nature.

As shown in Fig. 4, the estimation performance of KNN is barely satisfactory and degrades quickly as the data loss rate increases. A possible reason is that the massive data loss in WSNs veils the temporal and spatial correlations between attributes. Hence the interpolation methods cannot benefit much from these features.

[Figure 4: error rate (%) versus data loss probability (%) for MACS, ESTI-CS and KNN, in four panels: (a) Indoor Light, (b) Indoor Temp, (c) GreenOrbs Light, (d) GreenOrbs Temp.]

Fig. 4. The accuracy of missing value estimation methods.

Overall, MACS outperforms ESTI-CS and KNN under the random loss pattern, whether or not correlation exists between the attributes.

VII. CONCLUSION

In this paper, we studied the Environment Data Recovery problem in WSNs. We proposed the joint sparse decomposition to reveal the correlation among multiple attributes. The low-rank feature was exhibited by both the original data and the JSD-derived data. Driven by these observations, we designed the MACS algorithm to approximate the missing data. The algorithm combines the benefits of compressive sensing and the correlation of attributes. Data-driven simulations illustrated that MACS outperforms existing interpolation methods.

Our future work is as follows. First, we will consider using a Bayesian model in the prediction and the data reconstruction. Second, we will study the relationship between the computation time and the accuracy. Third, we will generalize multiple-attribute data reconstruction to more fields.

ACKNOWLEDGMENT

This research was supported by NSF of China under grant No. 61073158, No. 61100210, STCSM Project No. 12dz1507400, No. 13511507800, Doctoral Program Foundation of Institutions of Higher Education under grant No. 20110073120021, Singapore-MIT International Design Center IDG31000101, iTrust Cyber Physical System Protection project, and the Singapore NRF under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme Office, Media Development Authority.

REFERENCES

[1] D. Baron, M.B. Wakin, M.F. Duarte, S. Sarvotham, and R.G. Baraniuk. “Distributed compressed sensing”, Preprint, 2005.

[2] D.L. Donoho. “Compressed sensing”, IEEE Transactions on Information Theory, Vol. 52, No. 4, pp. 1289-1306, 2006.

[3] E.J. Candès, J. Romberg, and T. Tao. “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information”, IEEE Transactions on Information Theory, Vol. 52, No. 2, pp. 489-509, 2006.

[4] F.L. Lewis. “Wireless sensor networks”, Smart Environments: Technologies, Protocols, and Applications, pp. 11-46, 2004.

[5] G. Chen, X.-Y. Liu, L. Kong, J.-L. Lu, W. Shu, and M.-Y. Wu. “JSSDR: Joint-Sparse Sensory Data Recovery in Wireless Sensor Networks”, IEEE WiMob, 2013.

[6] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. “Wireless sensor networks: a survey”, Elsevier Computer Networks, Vol. 38, No. 4, pp. 393-422, 2002.

[7] Intel Lab data. http://www.select.cs.cmu.edu/data/labapp3/index.html.

[8] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu, and X. Liu. “Data loss and reconstruction in sensor networks”, IEEE INFOCOM, pp. 1701-1710, 2013.

[9] L. Kong, M. Zhao, X.-Y. Liu, J.-L. Lu, Y. Liu, M.-Y. Wu, and W. Shu. “Surface coverage in sensor networks”, IEEE Transactions on Parallel and Distributed Systems, 2013.

[10] M. Heil and R. Karban. “Explaining evolution of plant communication by airborne signals”, Trends in Ecology & Evolution, Vol. 25, No. 3, pp. 137-144, 2010.

[11] M.G. Lawrence. “The relationship between relative humidity and the dewpoint temperature in moist air: A simple conversion and applications”, Bulletin of the American Meteorological Society, Vol. 86, No. 2, pp. 225-233, 2005.

[12] M.L. Rudolph, L. Karlstrom, and M. Manga. “A prediction of the longevity of the Lusi mud eruption, Indonesia”, Earth and Planetary Science Letters, Vol. 308, No. 1, pp. 124-130, 2011.

[13] S. Rallapalli, L. Qiu, Y. Zhang, and Y.-C. Chen. “Exploiting temporal stability and low-rank structure for localization in mobile networks”, ACM MobiCom, pp. 161-172, 2010.

[14] B. Recht, M. Fazel, and P.A. Parrilo. “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization”, SIAM Review, Vol. 52, No. 3, pp. 471-501, 2010.

[15] S. Ji, D. Dunson, and L. Carin. “Multitask compressive sensing”, IEEE Transactions on Signal Processing, Vol. 57, No. 1, pp. 92-106, 2009.

[16] T. Cover and P. Hart. “Nearest neighbor pattern classification”, IEEE Transactions on Information Theory, Vol. 13, No. 1, pp. 21-27, 1967.

[17] T. He, S. Krishnamurthy, J.A. Stankovic, T. Abdelzaher, L. Luo, R. Stoleru, T. Yan, L. Gu, J. Hui, and B. Krogh. “Energy-efficient surveillance system using wireless sensor networks”, ACM Proceedings of the 2nd International Conference on Mobile Systems, Applications, and Services, pp. 270-283, 2004.

[18] X.-Y. Liu, K. Wu, Y. Zhu, L. Kong, and M.-Y. Wu. “Mobility increases the surface coverage of distributed sensor networks”, Elsevier Computer Networks, 2013.

[19] Y. Zhang, M. Roughan, W. Willinger, and L. Qiu. “Spatio-temporal compressive sensing and internet traffic matrices”, ACM SIGCOMM, Vol. 39, No. 4, pp. 267-278, 2009.

[20] Y. Liu, Y. He, M. Li, J. Wang, K. Liu, L. Mo, W. Dong, Z. Yang, M. Xi, and J. Zhao. “Does wireless sensor network scale? A measurement study on GreenOrbs”, IEEE INFOCOM, pp. 873-881, 2011.

[21] Z. Yang, M. Li, and Y. Liu. “Sea depth measurement with restricted floating sensors”, IEEE RTSS, pp. 469-478, 2007.
