Multiple Attributes-based Data Recovery in Wireless Sensor Networks


Guangshuo Chen^∗, Xiao-Yang Liu^∗, Linghe Kong^†∗, Jia-Liang Lu^∗, Yu Gu^†, Wei Shu^∗‡, Min-You Wu^∗

∗Shanghai Jiao Tong University, China

†Singapore University of Technology and Design, Singapore

‡University of New Mexico, USA

∗{chengs, yanglet, linghe.kong, jlu, shu, mwu}@sjtu.edu.cn,^†{linghe kong, jasongu}@sutd.edu.sg, ^‡[email protected]

Abstract—In wireless sensor networks (WSNs), many basic scientific studies rely heavily on complete sensory data, so data recovery is an indispensable operation against data loss. Several works have studied the missing value problem.

However, existing solutions cannot achieve satisfactory accuracy due to special loss patterns and high loss rates in WSNs.

In this work, we propose a multiple attributes-based recovery algorithm that exploits the correlations among the sensed attributes to provide high accuracy. Firstly, based on two real datasets, from the Intel Indoor project and the GreenOrbs project, we reveal that such correlations are strong, e.g., changes in temperature and light illumination are usually strongly correlated. Secondly, motivated by this observation, we develop a Multi-Attribute-assistant Compressive-Sensing-based (MACS) algorithm to optimize the recovery accuracy. Finally, real-trace-driven simulation is performed. The results show that MACS outperforms the existing solutions. Typically, MACS recovers all data with less than 5% error when the loss rate is below 60%. Even when 85% of the data are lost, MACS estimates all missing data with less than 10% error.

I. INTRODUCTION

Wireless sensor networks (WSNs) [4][6] are widely used to gather multiple attributes from the physical world and reconstruct environmental data in the cyber world [17]. Such data is important for scientists studying the physical world. For instance, scientists explain the evolution of plant communication using wind speed, air humidity and temperature data [10], and predict eruptions from the temperature and vibration data of volcanoes [12][9][18]. However, massive data loss is common in WSNs, e.g., 64% and 35% of the data are missing in the Ocean Sense project [21] and the GreenOrbs project [20], respectively. Hence, recovering the lost data with high accuracy is both necessary and challenging.

The high loss rates veil the temporal and spatial correlations in the data.

Therefore, classical interpolation methods, such as K-Nearest Neighbors (KNN) [16], cannot provide satisfactory results due to the lack of one-hop neighbors. A recently proposed compressive sensing approach, Environmental Space Time Improved Compressive Sensing (ESTI-CS) [8], achieves better accuracy. However, massive data loss also affects the low-rank and sparse features, so the estimation error of ESTI-CS increases in this scenario.

We are aware of the following two facts. (1) WSNs usually gather multiple attributes simultaneously, e.g., the TelosB node [20] senses three attributes: temperature, light illumination and humidity. (2) Intuitively, these attributes can be expected to be correlated. For instance, when the sun rises, outdoor temperature and light illumination increase simultaneously, and the salinity of sea water is tied to depth.

The empirical study [11] reveals that temperature, dewpoint temperature and relative humidity are approximately linearly correlated under certain conditions. Correlations among attributes can supplement the internal (temporal and spatial) correlations of each attribute and improve estimation accuracy. Hence, our technical goal is to mine and exploit such correlations for missing data recovery.

To address this problem, firstly, we study the characteristics of real sensory data from the Intel Indoor project [7] and the GreenOrbs project [20], which reveals the low-rank feature of the attributes. We then propose a joint sparse decomposition method to find the features shared among multiple attributes, and find that two correlated attributes share a common part carrying substantial energy. Secondly, we design an algorithm, named MACS, that jointly recovers multi-attribute datasets using their correlation. Thirdly, we simulate the proposed approach on real data and compare MACS with the classical and state-of-the-art methods KNN and ESTI-CS.

Our contributions are summarized as follows:

• To the best of our knowledge, this is the first work to study the joint recovery of multiple attributes in WSNs.

• We design a novel algorithm, MACS, which is based on compressive sensing theory.

• Extensive real-trace-driven simulations are performed, and the evaluation shows that MACS outperforms the compared solutions.

The rest of this paper is organized as follows. Section II presents related work. Section III formulates the problem. Section IV mines the internal and external features of attributes in WSNs. Section V proposes our approach, MACS. Section VI evaluates its performance, and Section VII concludes the paper and discusses future work.

II. RELATED WORK

Many works have addressed missing data interpolation.

The most classic interpolation method is K-Nearest Neighbors (KNN) [16], which estimates missing data by the average value of neighbors. This method performs well when a moderate number of values are missing; as the loss rate grows, its estimation error increases quickly due to the lack of one-hop neighbors.

Compressive Sensing (CS) [2][3] is an advanced and powerful technique for estimating massive missing data.

A series of CS-based solutions are used in different fields, e.g., Distributed Compressive Sensing (DCS) [1][5] and Multi-Task Compressive Sensing (MTCS) [15] are used in signal processing and image processing. The state-of-the-art CS-based interpolation method in WSNs is ESTI-CS [8], which exploits the low-rank and spatio-temporal features of sensory data to cope with the special loss patterns of WSNs. However, massive data loss also affects the low-rank and sparse features, so the estimation error of ESTI-CS increases in this scenario.

All the above methods estimate missing values based on a single attribute. However, many physical attributes in nature, such as humidity and temperature, are strongly correlated [11]. This work further improves the recovery accuracy by exploiting such correlations. To the best of our knowledge, this is the first missing data recovery work using multiple attributes in WSNs.

III. PROBLEM FORMULATION

A. Environment Data Recovery Problem

Suppose $n$ nodes are deployed in an area, each equipped with $k$ sensors to measure $k$ attributes. The monitoring period includes $t$ time slots. The format of the data packet is as follows:

Sensor ID | Time Stamp | Attribute 1 | Attribute 2 | ...

Hereafter, the $k$ attributes are denoted by $M_i$, $i = 1, 2, \dots, k$. Each $M_i$ is an $n \times t$ matrix, which is usually incomplete due to data loss in WSNs. The available information about $M_i$ is the sampled set of entries $(M_i)_{pq}$, $(p, q) \in \Omega_i$, where $\Omega_i$ is a subset of the complete set of entries of $M_i$. This process is represented by the sampling operator $P_\Omega(\cdot)$, defined as

$$[P_\Omega(X)]_{ij} = \begin{cases} X_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$

Therefore, the matrices we obtain are $P_{\Omega_i}(M_i)$, $i = 1, \dots, k$.

Our problem is to recover a series of matrices $M_1, \dots, M_k$ (the complete environmental data) from their sampled matrices $P_{\Omega_1}(M_1), \dots, P_{\Omega_k}(M_k)$ (the incomplete data gathered by the WSN) as precisely as possible; we call this the Environment Data Recovery (EDR) problem.
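The data model above can be made concrete with a small NumPy sketch that turns received packets into the sampled matrices $P_{\Omega_i}(M_i)$ with their masks, and applies the sampling operator of Eqn. (1). It assumes that sensor IDs and time stamps have already been mapped to row and slot indices and that a lost reading is marked `None`; the helper names `build_sampled_matrices` and `sample` are hypothetical.

```python
import numpy as np


def build_sampled_matrices(packets, n, t, k):
    """Assemble P_{Omega_i}(M_i) and boolean masks Omega_i, i = 1..k.

    Assumes each packet is (row, slot, values): row in 0..n-1 indexes the
    sensor, slot in 0..t-1 the time stamp, and values holds k readings,
    with None marking a reading that was lost.
    """
    sampled = [np.zeros((n, t)) for _ in range(k)]
    masks = [np.zeros((n, t), dtype=bool) for _ in range(k)]
    for row, slot, values in packets:
        for i, value in enumerate(values):
            if value is not None:
                sampled[i][row, slot] = value
                masks[i][row, slot] = True
    return sampled, masks


def sample(X, mask):
    """Sampling operator P_Omega of Eqn. (1): keep X on Omega, zero elsewhere."""
    return np.where(mask, X, 0.0)


# Two sensors, two slots, temperature and light; packet (1, 1) never arrived.
packets = [(0, 0, (21.5, 300.0)), (1, 0, (22.0, None)), (0, 1, (21.7, 310.0))]
(S1, S2), (O1, O2) = build_sampled_matrices(packets, n=2, t=2, k=2)
```

Keeping the mask explicit, rather than inferring it from zeros in the data, avoids confusing a genuine zero reading with a lost one.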

B. Problem Statement

Since we focus on exploiting the correlation among multiple attributes, the matrices are estimated jointly. For simplicity, most of this paper discusses the EDR problem with two attributes as an example.

Our analysis and approach extend easily to more attributes.

Formally, when $k = 2$, the problem is defined as follows. Given the sampled matrices $P_{\Omega_1}(M_1)$ and $P_{\Omega_2}(M_2)$, find estimates $\hat{M}_1$ and $\hat{M}_2$ that solve

$$\begin{aligned} \text{minimize} \quad & \|\hat{M}_1 - M_1\|_F + \mu \|\hat{M}_2 - M_2\|_F \\ \text{subject to} \quad & P_{\Omega_1}(\hat{M}_1) = P_{\Omega_1}(M_1), \\ & P_{\Omega_2}(\hat{M}_2) = P_{\Omega_2}(M_2), \end{aligned} \qquad (2)$$

where $\|\cdot\|_F$ denotes the Frobenius norm, as used in [19][8]; for a matrix $X = (x(i, j))_{p \times q}$, $\|X\|_F = \sqrt{\sum_{i,j} x(i, j)^2}$. Because the attributes have different magnitudes, one matrix may overshadow the other; $\mu$ is therefore used as a tradeoff coefficient.

Fig. 1. The original dataset is filtered by selecting the red parts to construct a small but complete dataset as the ground truth [8].

TABLE I
SELECTED DATASETS AS THE GROUND TRUTH

Data Name | Matrix Size | Time Interval
Intel Indoor | 49 nodes × 149 intervals | 1.5 minutes
GreenOrbs Temperature | 281 nodes × 170 intervals | 10 minutes

IV. DATASETS IN SENSOR NETWORKS

In this section, we analyze real WSN datasets and reveal several of their features, which form the foundation of our data recovery approach.

A. Ground Truth

The original datasets are gathered from two projects, GreenOrbs [20] and Intel Indoor [7]. In the raw data, the loss rates of these two datasets are 35% and 23%, respectively. Hence, to obtain the ground truth, two small but complete datasets are selected, as shown in TABLE I. The selection method, shown in Fig. 1, maximizes completeness in both time and space.

Each dataset contains two attributes, temperature and light illumination, which share the same selected entries.

B. Low-rank Structure

Considering that the readings of nearby sensors are correlated and that readings within short time periods are close, we mine the inherent structure, or redundancy, of environmental datasets.

The singular value decomposition (SVD) is adopted. The SVD of an $n \times t$ matrix $X$ is

$$X = U \Sigma V^T = \sum_{i=1}^{\min(n,t)} \sigma_i u_i v_i^T, \qquad (3)$$

where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{\min(n,t)} \ge 0$, $(\cdot)^T$ is the transpose operator, $U$ is an $n \times n$ orthogonal matrix, $V$ is a $t \times t$ orthogonal matrix, and $\Sigma$ is an $n \times t$ diagonal matrix containing the singular values $\sigma_i$ of $X$. Letting $r = \operatorname{rank}(X)$, we have $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0)$.

Fig. 2. Energy fraction (%) versus the first i singular values (%) for GreenOrbs light, GreenOrbs temperature, Indoor light and Indoor temperature. The first 5% of singular values contribute over 90% of the total energy in GreenOrbs temperature and Indoor light/temperature; for GreenOrbs light, the corresponding figure is 20%.

Fig. 3. Correlation analysis by joint sparse decomposition: energy fraction (%) of the common part U and the individual parts Δ1 and Δ2 for two identical matrices, forest light/temperature, and forest light/indoor light.

The sum of all singular values represents the total energy of $X$. According to [19], if $X$ is (approximately) low-rank, its first few singular values carry all or nearly all of the total energy, i.e., $\sum_{i=1}^{r} \sigma_i \approx \sum_{i=1}^{\min(n,t)} \sigma_i$ for some $r \ll \min(n, t)$. Fig. 2 plots the cumulative distribution of the singular values' energy. The top 5% of singular values contain nearly all of the energy in Indoor temperature and Indoor light, while the top 5% and 20% contain 90% of the energy in GreenOrbs temperature and GreenOrbs light, respectively. These results show that rank minimization is suitable for our data recovery problem.
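The energy analysis behind Fig. 2 can be reproduced along the following lines. This is a sketch assuming, as above, that energy is measured as the sum of singular values; the helper name `top_singular_energy` and the synthetic rank-2 matrix are hypothetical stand-ins, not the GreenOrbs or Intel data.

```python
import numpy as np


def top_singular_energy(X, fraction):
    """Share of total energy (sum of singular values, as in the text)
    carried by the largest ceil(fraction * min(n, t)) singular values."""
    sigma = np.linalg.svd(X, compute_uv=False)  # sorted in descending order
    count = max(1, int(np.ceil(fraction * sigma.size)))
    return sigma[:count].sum() / sigma.sum()


rng = np.random.default_rng(0)
# Synthetic 50 x 150 "sensor" matrix: rank 2 plus small noise (hypothetical).
n, t = 50, 150
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, t)) + 0.01 * rng.normal(size=(n, t))
share = top_singular_energy(X, 0.05)  # energy in the top 5% of singular values
```

Sweeping `fraction` from 0 to 1 and plotting `share` gives a curve of the same kind as Fig. 2.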

C. Inter-Correlation between Attributes

Relationships commonly exist among natural attributes. For instance, the empirical study [11] reveals that temperature, dewpoint temperature and relative humidity are linearly correlated in some special cases. In most cases, however, the correlations cannot be expressed as a simple function.

TABLE II
LOW-RANK FEATURES AFTER JOINT SPARSE DECOMPOSITION

Data Name | Matrix Name | First X% of σ_i containing 90% of energy
Intel Indoor light/temperature | U | 14%
Intel Indoor light/temperature | Δ1 | 8%
Intel Indoor light/temperature | Δ2 | 2%
GreenOrbs light/temperature | U | 27%
GreenOrbs light/temperature | Δ1 | 28%
GreenOrbs light/temperature | Δ2 | 21%

To exploit the relationship between attributes in a WSN, we propose joint sparse decomposition (JSD), which divides two matrices into a common part and two individual parts. Suppose $M_1 = (m_1^{(1)}, \dots, m_1^{(t)})$ and $M_2 = (m_2^{(1)}, \dots, m_2^{(t)})$. For each pair of column vectors $m_1^{(k)}$ and $m_2^{(k)}$, the goal is to split them as

$$m_1^{(k)} = u^{(k)} + \delta_1^{(k)}, \qquad m_2^{(k)} = u^{(k)} + \delta_2^{(k)}, \qquad u^{(k)} = \Psi v^{(k)}, \qquad (4)$$

where $u^{(k)}$ is the common part of $m_1^{(k)}$ and $m_2^{(k)}$, given by the product of a basis $\Psi$ (e.g., a wavelet basis) and a sparse vector $v^{(k)}$, and $\delta_1^{(k)}$ and $\delta_2^{(k)}$ are the individual parts. Eqn. (4) can be rewritten in matrix form as

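$$\begin{bmatrix} m_1^{(k)} \\ m_2^{(k)} \end{bmatrix} = \begin{bmatrix} \Psi & I & 0 \\ \Psi & 0 & I \end{bmatrix} \begin{bmatrix} v^{(k)} \\ \delta_1^{(k)} \\ \delta_2^{(k)} \end{bmatrix} \qquad (5)$$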

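According to compressive sensing theory [2][3], $(v^{(k)T}, \delta_1^{(k)T}, \delta_2^{(k)T})^T$ can be obtained by solving the $\ell_1$-norm minimization problem

$$\hat{\vartheta} = \arg\min_{\vartheta} \|\vartheta\|_1 \quad \text{s.t.} \quad m = A\vartheta, \qquad (6)$$

where $\|\cdot\|_1$ is the $\ell_1$-norm, $\vartheta = (v^{(k)T}, \delta_1^{(k)T}, \delta_2^{(k)T})^T$, $m = (m_1^{(k)T}, m_2^{(k)T})^T$ and $A = \begin{bmatrix} \Psi & I & 0 \\ \Psi & 0 & I \end{bmatrix}$. Then $u^{(k)} = \Psi v^{(k)}$, $\delta_1^{(k)}$ and $\delta_2^{(k)}$ are obtained from $\hat{\vartheta}$.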
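Eqn. (6) is a basis-pursuit problem, which can be cast as a linear program by splitting $\vartheta = \vartheta^+ - \vartheta^-$ with $\vartheta^+, \vartheta^- \ge 0$. The sketch below does this with SciPy's `linprog` (HiGHS solver). The text leaves $\Psi$ open and suggests a wavelet basis; as an assumption, this sketch swaps in an orthonormal DCT basis built with `scipy.fft.idct`. The helper names `dct_basis` and `jsd_column` and the toy column pair are hypothetical.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog


def dct_basis(n):
    """Orthonormal DCT synthesis matrix; column j is the j-th DCT atom.

    Assumption: stands in for the unspecified basis Psi of Eqn. (4).
    """
    return idct(np.eye(n), norm="ortho", axis=0)


def jsd_column(m1, m2, psi):
    """Joint sparse decomposition of one column pair via Eqn. (6).

    Solves min ||theta||_1 s.t. [m1; m2] = A theta with
    A = [[Psi, I, 0], [Psi, 0, I]] and theta = (v, delta1, delta2).
    Returns the common part u = Psi v, delta1, delta2 and theta.
    """
    n = m1.size
    eye, zero = np.eye(n), np.zeros((n, n))
    A = np.block([[psi, eye, zero], [psi, zero, eye]])
    m = np.concatenate([m1, m2])
    p = A.shape[1]
    # theta = x_pos - x_neg with x_pos, x_neg >= 0, so ||theta||_1 = sum(x_pos + x_neg)
    res = linprog(np.ones(2 * p), A_eq=np.hstack([A, -A]), b_eq=m,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    theta = res.x[:p] - res.x[p:]
    v, delta1, delta2 = theta[:n], theta[n:2 * n], theta[2 * n:]
    return psi @ v, delta1, delta2, theta


# Toy column pair (hypothetical data): shared smooth trend plus one spike each.
n = 16
psi = dct_basis(n)
v_true = np.zeros(n)
v_true[2] = 3.0
m1 = psi @ v_true
m1[5] += 1.0
m2 = psi @ v_true
m2[11] -= 0.5
u, delta1, delta2, theta = jsd_column(m1, m2, psi)
```

Applying `jsd_column` to every column pair yields the matrix decomposition $M_1 = U + \Delta_1$, $M_2 = U + \Delta_2$.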

Applying JSD to every pair of column vectors decomposes $M_1$ and $M_2$ as

$$M_1 = U + \Delta_1, \qquad M_2 = U + \Delta_2. \qquad (7)$$

Fig. 3 shows that the common part takes a larger share of the energy when two matrices are more strongly correlated. For instance, when JSD is applied to two identical matrices, which are perfectly correlated, the common part contains 100% of the energy and the individual parts are $\Delta_1 = \Delta_2 = 0$. Intuitively, light and temperature sensed in an outdoor WSN are correlated: Fig. 3 shows that the common part of forest light/temperature contains 29% of the total energy, while the individual parts contain 35% and 36%. We also apply JSD to two unrelated matrices.

The common part of forest light and indoor light, between which no relationship exists, contains only 7% of the energy.

In addition, we examine the low-rank feature of the matrices after JSD. As shown in TABLE II, over 90% of the energy of each derived matrix is contained in the first 30% of its singular values. This means that the matrices derived by JSD still exhibit the low-rank feature.

V. OUR APPROACH

To address the EDR problem, we propose a novel data estimation approach named Multi-Attribute-assistant Compressive Sensing (MACS), which is designed to jointly recover the attributes in a WSN.

A. Approach Design

Normalization: In Eqn. (2), the choice of $\mu$ has a significant effect on the estimation accuracy. Since the relationship between $M_1$ and $M_2$ is unknown, it is difficult to find the best $\mu$. A simple way around this difficulty is to normalize each matrix and then set $\mu = 1$. Because the true maximum value may itself be lost, we use the maximum value of the gathered data instead, i.e., each sensory matrix $P_{\Omega_i}(M_i)$ is normalized by $\max(P_{\Omega_i}(M_i))$. This choice rests on the observation that natural attributes change gradually, so the gap between the maximum values of the observed matrix and the original matrix is small relative to their magnitude, i.e., for $i = 1, 2$,

$$\frac{\max(M_i) - \max(P_{\Omega_i}(M_i))}{\max(M_i)} \approx 0. \qquad (8)$$

Low-Rank Matrix Approximation: Eqn. (2) contains the unknown matrices $M_1$ and $M_2$, so the problem cannot be solved directly. However, since Sec. IV revealed the low-rank features, the problem can be converted into a rank minimization problem. Through the inverse process of SVD, using the $k$ largest singular values of $X$, the optimal rank-$k$ approximation [8] of $X$ under the Frobenius norm of the error is $\hat{X} = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. In our problem this method is infeasible, since neither the complete $M_i$ nor its proper rank is known. However, it is reasonable to assume that the estimate $\hat{M}_i$ is low-rank, given the low-rank feature of the original $M_i$. Thus the optimal $\hat{M}_i$ is obtained from the problem

$$\min \operatorname{rank}(\hat{M}_i) \quad \text{s.t.} \quad P_{\Omega_i}(\hat{M}_i) = P_{\Omega_i}(M_i).$$
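For reference, the optimal rank-$k$ approximation mentioned above (the Eckart-Young result) takes only a few lines of NumPy. The sketch also shows why it cannot be applied directly to $P_{\Omega_i}(M_i)$: it needs every entry of $X$. The function name `best_rank_k` is hypothetical.

```python
import numpy as np


def best_rank_k(X, k):
    """Optimal rank-k approximation of a COMPLETE matrix X in Frobenius norm:
    keep the k largest singular triplets, sum_{i<=k} sigma_i u_i v_i^T."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))  # exactly rank 2
A1 = best_rank_k(A, 1)  # best rank-1 approximation of A
```

The Frobenius error of the rank-1 approximation equals the square root of the sum of the discarded squared singular values.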

Two problems remain: (1) the rank operator $\operatorname{rank}(\cdot)$ is non-convex; (2) the formulation establishes no connection between $M_1$ and $M_2$.

To bypass difficulty (1), we use the SVD-like factorization [8] $\hat{X} = LR^T$, where $L$ is an $n \times k$ matrix, $R$ is a $t \times k$ matrix, and $k$ approximates the proper rank. According to the matrix compressive-sensing literature [14][19], rank minimization is exactly equivalent to nuclear norm minimization when a certain technical condition, the restricted isometry property [14], holds for $P_\Omega(\cdot)$. Further, when $k$ is no smaller than the rank of $\hat{X}$, minimizing the nuclear norm of $\hat{X}$ is equivalent to minimizing $\frac{1}{2}(\|L\|_F^2 + \|R\|_F^2)$ over all factorizations $\hat{X} = LR^T$ [14].
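The factorization identity used in this step can be checked numerically. For the balanced factorization $L = U_k \Sigma_k^{1/2}$, $R = V_k \Sigma_k^{1/2}$ built from the SVD, $\frac{1}{2}(\|L\|_F^2 + \|R\|_F^2)$ equals $\|X\|_*$, while an unbalanced rescaling of the same product gives a larger value. This is only a sanity check on random data, not part of MACS; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 6))  # rank-3 matrix

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
k = 4  # inner dimension k >= rank(X)
# Balanced factorization X = L R^T with L = U_k sqrt(S_k), R = V_k sqrt(S_k).
L = U[:, :k] * np.sqrt(sigma[:k])
R = Vt[:k, :].T * np.sqrt(sigma[:k])

nuclear = np.linalg.norm(X, "nuc")  # sum of singular values
half_frob = 0.5 * (np.linalg.norm(L, "fro") ** 2 + np.linalg.norm(R, "fro") ** 2)
# (2L)(R/2)^T is the same product, but its factor norms are larger.
unbalanced = 0.5 * (np.linalg.norm(2 * L, "fro") ** 2 + np.linalg.norm(R / 2, "fro") ** 2)
```

Because the minimum over factorizations is attained by the balanced one, penalizing $\|L\|_F^2 + \|R\|_F^2$ acts as a convex-surrogate rank penalty without ever computing an SVD.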

Compressive Sensing-based Joint Matrix Decomposition:

To overcome difficulty (2), we need to find the correlation between $M_1$ and $M_2$ and then exploit it when searching for an optimal solution.

Through the joint sparse decomposition proposed in Sec. IV, the correlation between two matrices can be revealed. Hence, we separate the approximations $\hat{M}_1$ and $\hat{M}_2$ by JSD as

$$\hat{M}_1 = \hat{U} + \hat{\Delta}_1, \qquad \hat{M}_2 = \hat{U} + \hat{\Delta}_2. \qquad (9)$$

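Based on the low-rank structure analysis in Sec. IV, we assume that $\hat{U}$, $\hat{\Delta}_1$ and $\hat{\Delta}_2$ are low-rank. The problem is then reformulated as

$$\begin{aligned} \text{minimize} \quad & \|\hat{U}\|_* + \|\hat{\Delta}_1\|_* + \|\hat{\Delta}_2\|_* \\ \text{subject to} \quad & P_{\Omega_1}(\hat{U} + \hat{\Delta}_1) = P_{\Omega_1}(M_1), \\ & P_{\Omega_2}(\hat{U} + \hat{\Delta}_2) = P_{\Omega_2}(M_2), \end{aligned} \qquad (10)$$

where $\|\cdot\|_*$ is the nuclear norm, defined as the sum of singular values, e.g., $\|X\|_* = \sum_{i=1}^{r} \sigma_i(X)$.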

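Furthermore, applying the SVD-like factorization to $\hat{U}$, $\hat{\Delta}_1$ and $\hat{\Delta}_2$, Eqn. (10) is rewritten as

$$\text{minimize} \quad \|L_U\|_F^2 + \|R_U^T\|_F^2 + \|L_1\|_F^2 + \|R_1^T\|_F^2 + \|L_2\|_F^2 + \|R_2^T\|_F^2 \qquad (11)$$

subject to the same constraints as Eqn. (10), where $L_U$, $L_1$, $L_2$ are $n \times r$ matrices, $R_U$, $R_1$, $R_2$ are $t \times r$ matrices, and $\hat{U} = L_U R_U^T$, $\hat{\Delta}_1 = L_1 R_1^T$ and $\hat{\Delta}_2 = L_2 R_2^T$.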

To avoid overfitting, we convert the problem into an unconstrained optimization problem using the Lagrange multiplier method, i.e.,

$$\text{minimize} \quad \|P_{\Omega_1}(L_U R_U^T + L_1 R_1^T) - S_1\|_F^2 + \|P_{\Omega_2}(L_U R_U^T + L_2 R_2^T) - S_2\|_F^2 + \lambda \sum_{j \in \{U, 1, 2\}} \left( \|L_j\|_F^2 + \|R_j\|_F^2 \right), \qquad (12)$$

where $S_1 = P_{\Omega_1}(M_1)$ and $S_2 = P_{\Omega_2}(M_2)$. The Lagrange multiplier $\lambda$ provides a tunable tradeoff between rank minimization and fitting accuracy.

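Eqn. (12) is solvable because (1) $\Omega_1$, $\Omega_2$, $S_1$ and $S_2$ are known, and (2) every term is a non-negative squared Frobenius norm, so the objective is bounded below by zero; moreover, with all factors but one fixed, the objective is a regularized linear least-squares problem in the remaining factor, which is the structure Alg. 1 exploits.

Hence, $\hat{M}_1$ and $\hat{M}_2$ can be estimated by combining the solution of Eqn. (12) with Eqn. (9).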
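For reference, the objective of Eqn. (12) can be evaluated with NumPy as follows. The sketch assumes $\Omega_1$ and $\Omega_2$ are stored as boolean masks and $S_1$, $S_2$ as zero-filled matrices; `macs_objective` is a hypothetical name. A value like this is what Alg. 1 uses to keep the best iterate.

```python
import numpy as np


def macs_objective(LU, RU, L1, R1, L2, R2, O1, O2, S1, S2, lam):
    """Value of Eqn. (12): masked fitting error for both attributes plus
    lam times the squared Frobenius norms of all six factors."""
    U = LU @ RU.T  # shared part U_hat = L_U R_U^T
    fit1 = np.where(O1, U + L1 @ R1.T, 0.0) - S1  # P_Omega1(U + L1 R1^T) - S1
    fit2 = np.where(O2, U + L2 @ R2.T, 0.0) - S2  # P_Omega2(U + L2 R2^T) - S2
    reg = sum(np.sum(F ** 2) for F in (LU, RU, L1, R1, L2, R2))
    return np.sum(fit1 ** 2) + np.sum(fit2 ** 2) + lam * reg


rng = np.random.default_rng(3)
n, t, r = 5, 4, 2
LU, RU, L1, R1, L2, R2 = (rng.random((d, r)) for d in (n, t, n, t, n, t))
O1, O2 = rng.random((n, t)) > 0.3, rng.random((n, t)) > 0.3
S1 = np.where(O1, LU @ RU.T + L1 @ R1.T, 0.0)  # noiseless observations
S2 = np.where(O2, LU @ RU.T + L2 @ R2.T, 0.0)
value = macs_objective(LU, RU, L1, R1, L2, R2, O1, O2, S1, S2, lam=0.1)
```

With noiseless observations generated from the same factors, the fitting terms vanish and `value` reduces to the regularization term alone.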

Extension: Our approach also applies to more attributes. For instance, if three attributes $M_1$, $M_2$ and $M_3$ are measured in one WSN, Eqn. (10) is rewritten as

$$\begin{aligned} \text{minimize} \quad & \|\hat{U}\|_* + \|\hat{\Delta}_1\|_* + \|\hat{\Delta}_2\|_* + \|\hat{\Delta}_3\|_* \\ \text{subject to} \quad & P_{\Omega_i}(\hat{U} + \hat{\Delta}_i) = P_{\Omega_i}(M_i), \quad i = 1, 2, 3, \end{aligned} \qquad (13)$$

where $\hat{M}_i = \hat{U} + \hat{\Delta}_i$. This problem can be solved by a method similar to Alg. 1, and Eqn. (10) extends in the same way to any number of attributes.

B. Algorithm

To solve the optimization problem defined by Eqn. (12), we propose an efficient algorithm, detailed in Alg. 1.

The algorithm solves the optimization iteratively.

First, all $L$ and $R$ matrices except $R_U$ are initialized randomly. With $L_U$ fixed, $R_U$ can be calculated from the other $L$ and $R$ matrices by solving the equation

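$$\begin{bmatrix} P_{\Omega_1}(L_U R_U^T) \\ P_{\Omega_2}(L_U R_U^T) \\ \sqrt{\lambda}\, R_U^T \end{bmatrix} = \begin{bmatrix} S_1 - P_{\Omega_1}(L_1 R_1^T) \\ S_2 - P_{\Omega_2}(L_2 R_2^T) \\ 0 \end{bmatrix} \qquad (14)$$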

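This equation is solved column by column in the least-squares sense; for the $i$-th column it reads

$$\begin{bmatrix} P_{\Omega_1}(L_U)_{(i)} R_{U(i)}^T \\ P_{\Omega_2}(L_U)_{(i)} R_{U(i)}^T \\ \sqrt{\lambda}\, R_{U(i)}^T \end{bmatrix} = \begin{bmatrix} (S_1 - P_{\Omega_1}(L_1 R_1^T))_{(i)} \\ (S_2 - P_{\Omega_2}(L_2 R_2^T))_{(i)} \\ 0 \end{bmatrix} \qquad (15)$$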

Algorithm 1 MACS Algorithm

Input:
  Ω1 and Ω2: sensory entry sets, given as n × t 0/1 indicator matrices
  S1 and S2: incomplete sensory data
  r: rank estimate of M1 and M2
  λ: tradeoff coefficient
  k: number of iterations
Output:
  M̂1 and M̂2: estimated environment matrices

Main Procedure:
1: Normalization
2: α1 ← max(S1); α2 ← max(S2); S1 ← S1 ./ α1; S2 ← S2 ./ α2;
3: Approximation
4: L_U ← rand(n, r); L1 ← rand(n, r); L2 ← rand(n, r);
5: R1 ← rand(t, r); R2 ← rand(t, r); v̂ ← ∞;
6: for 1 to k do
7:   A1 = S1 − PΩ1(L1 R1^T); A2 = S2 − PΩ2(L2 R2^T);
8:   R_U ← crossInverse(Ω1, Ω2, L_U, λ, r, A1, A2)
9:   L_U ← crossInverse(Ω1^T, Ω2^T, R_U, λ, r, A1^T, A2^T)
10:  B1 = S1 − PΩ1(L_U R_U^T); B2 = S2 − PΩ2(L_U R_U^T);
11:  R1 ← singleInverse(Ω1, L1, λ, r, B1)
12:  L1 ← singleInverse(Ω1^T, R1, λ, r, B1^T)
13:  R2 ← singleInverse(Ω2, L2, λ, r, B2)
14:  L2 ← singleInverse(Ω2^T, R2, λ, r, B2^T)
15:  v ← value of Eqn. (12)
16:  if v < v̂ then
17:    L̂_U ← L_U; R̂_U ← R_U; L̂1 ← L1; R̂1 ← R1; L̂2 ← L2; R̂2 ← R2; v̂ ← v;
18:  end if
19: end for
20: M̂1 ← α1 (L̂_U R̂_U^T + L̂1 R̂1^T)
21: M̂2 ← α2 (L̂_U R̂_U^T + L̂2 R̂2^T)

Procedure Y = singleInverse(Ω, L, λ, r, S):
1: for i = 1 to (number of columns of S) do
2:   P_i ← [diag(Ω(:, i)) L; √λ I_r]
3:   Q_i ← [S(:, i); 0_r]
4:   Y(i, :) ← ((P_i^T P_i) \ (P_i^T Q_i))^T
5: end for

Procedure Y = crossInverse(Ω1, Ω2, L, λ, r, S1, S2):
1: for i = 1 to (number of columns of S1) do
2:   P_i ← [diag(Ω1(:, i)) L; diag(Ω2(:, i)) L; √λ I_r]
3:   Q_i ← [S1(:, i); S2(:, i); 0_r]
4:   Y(i, :) ← ((P_i^T P_i) \ (P_i^T Q_i))^T
5: end for

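In Eqn. (15), $i$ ranges from 1 to $t$, $R_{U(i)}^T$ denotes the $i$-th column of $R_U^T$, and $P_{\Omega}(L_U)_{(i)}$ denotes $L_U$ with the rows that are not observed in column $i$ set to zero. Each such system is a linear least-squares problem, so $R_U$ is obtained by the procedure crossInverse given in Alg. 1, and $L_U$ is computed by the same procedure with $R_U$ fixed.

$L_j$ and $R_j$ ($j = 1, 2$) are obtained by a similar procedure, singleInverse, also given as pseudocode in Alg. 1.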
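The alternating procedure can be sketched in NumPy as follows, with crossInverse and singleInverse solved through their normal equations $(P_i^T P_i)\,y = P_i^T Q_i$. This is an illustrative reimplementation under stated assumptions (boolean masks for $\Omega_1$, $\Omega_2$; zero-filled $S_1$, $S_2$; positive readings so that $\max(S_i) > 0$; uniform random initialization), not the authors' code, and the function names are hypothetical.

```python
import numpy as np


def single_inverse(mask, L, lam, S):
    """singleInverse: for each column j of S, solve the ridge problem
    min_y ||diag(mask[:, j]) (L y - S[:, j])||^2 + lam ||y||^2; row j of Y is y."""
    r = L.shape[1]
    Y = np.zeros((S.shape[1], r))
    for j in range(S.shape[1]):
        Lj = L * mask[:, j:j + 1]  # zero the rows unobserved in column j
        Y[j] = np.linalg.solve(Lj.T @ Lj + lam * np.eye(r), Lj.T @ S[:, j])
    return Y


def cross_inverse(mask1, mask2, L, lam, S1, S2):
    """crossInverse: like single_inverse, but stacks both attributes."""
    r = L.shape[1]
    Y = np.zeros((S1.shape[1], r))
    for j in range(S1.shape[1]):
        L1j, L2j = L * mask1[:, j:j + 1], L * mask2[:, j:j + 1]
        A = L1j.T @ L1j + L2j.T @ L2j + lam * np.eye(r)
        Y[j] = np.linalg.solve(A, L1j.T @ S1[:, j] + L2j.T @ S2[:, j])
    return Y


def macs(S1, S2, O1, O2, r, lam, iters=50, seed=0):
    """Alternating minimization of Eqn. (12), following the steps of Alg. 1."""
    rng = np.random.default_rng(seed)
    n, t = S1.shape
    a1, a2 = S1[O1].max(), S2[O2].max()  # steps 1-2: normalize (assumes > 0)
    S1, S2 = np.where(O1, S1 / a1, 0.0), np.where(O2, S2 / a2, 0.0)
    LU, L1, L2 = rng.random((n, r)), rng.random((n, r)), rng.random((n, r))
    R1, R2 = rng.random((t, r)), rng.random((t, r))  # R_U is solved first
    best, best_val = None, np.inf
    for _ in range(iters):
        A1 = S1 - np.where(O1, L1 @ R1.T, 0.0)
        A2 = S2 - np.where(O2, L2 @ R2.T, 0.0)
        RU = cross_inverse(O1, O2, LU, lam, A1, A2)
        LU = cross_inverse(O1.T, O2.T, RU, lam, A1.T, A2.T)
        B1 = S1 - np.where(O1, LU @ RU.T, 0.0)
        B2 = S2 - np.where(O2, LU @ RU.T, 0.0)
        R1 = single_inverse(O1, L1, lam, B1)
        L1 = single_inverse(O1.T, R1, lam, B1.T)
        R2 = single_inverse(O2, L2, lam, B2)
        L2 = single_inverse(O2.T, R2, lam, B2.T)
        U = LU @ RU.T  # step 15: objective of Eqn. (12); keep the best iterate
        val = (np.sum((np.where(O1, U + L1 @ R1.T, 0.0) - S1) ** 2)
               + np.sum((np.where(O2, U + L2 @ R2.T, 0.0) - S2) ** 2)
               + lam * sum(np.sum(F ** 2) for F in (LU, RU, L1, R1, L2, R2)))
        if val < best_val:
            best_val, best = val, (LU, RU, L1, R1, L2, R2)
    LU, RU, L1, R1, L2, R2 = best
    return a1 * (LU @ RU.T + L1 @ R1.T), a2 * (LU @ RU.T + L2 @ R2.T)


# Toy example (hypothetical data): shared rank-2 part plus rank-1 individual parts.
rng = np.random.default_rng(7)
n, t = 20, 30
U_true = rng.random((n, 2)) @ rng.random((2, t))
M1 = U_true + np.outer(rng.random(n), rng.random(t))
M2 = U_true + 0.5 * np.outer(rng.random(n), rng.random(t))
omega = rng.random((n, t)) > 0.3  # about 30% random loss, Omega_1 = Omega_2
M1_hat, M2_hat = macs(np.where(omega, M1, 0.0), np.where(omega, M2, 0.0),
                      omega, omega, r=2, lam=1e-3, iters=100)
```

Each block update exactly minimizes Eqn. (12) in that block, so the objective never increases between iterations; tracking the best iterate, as Alg. 1 does, is then a cheap safeguard.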

In Alg. 1, the rank estimate $r$ and the Lagrange tradeoff coefficient $\lambda$ significantly influence the estimation accuracy. We tune $\lambda$ by the method in [13], and our evaluation uses $r = 20\% \cdot \min(n, t)$, since the top 20% of singular values contribute over 90% of the energy in the datasets.

The complexity of the algorithm is $O(rntk)$: the key operation in Alg. 1 is the inverse computation, whose complexity is $O(nrt)$ [8], and the algorithm iterates $k$ times.

VI. PERFORMANCE EVALUATION

A. Methodology

Performance evaluation is based on real-trace-driven simulation.

Ground Truth: The real trace includes the temperature and light illumination attributes from GreenOrbs and Intel Indoor projects. In Sec. IV-A, we have presented the method to obtain the ground truth from raw data in detail.

Compared Methods: To verify the effectiveness of our approach, we compare it with two methods for missing data recovery in WSNs: the classical interpolation method, K-Nearest Neighbors (KNN) [16], and the state-of-the-art method, Environmental Space Time Improved Compressive Sensing (ESTI-CS) [8].

Metric: To compare results across different matrices, we apply the error rate of the approximation under the Frobenius norm, $\mathrm{err}(\hat{M}, M, \Omega)$ [13], defined as

$$\mathrm{err}(\hat{M}, M, \Omega) = \frac{\|P_{\bar{\Omega}}(M) - P_{\bar{\Omega}}(\hat{M})\|_F^2}{\|P_{\bar{\Omega}}(M)\|_F^2}, \qquad (16)$$

where $\bar{\Omega}$ is the complement of $\Omega$.
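The metric of Eqn. (16) can be computed directly on a boolean mask, as in the sketch below; `error_rate` is a hypothetical name. Only the lost entries, those in $\bar{\Omega}$, contribute, so an estimator is not rewarded for reproducing the readings it was given.

```python
import numpy as np


def error_rate(M_hat, M, mask):
    """err(M_hat, M, Omega) of Eqn. (16): squared Frobenius error on the
    complement of Omega (the lost entries), relative to their energy."""
    lost = ~mask
    return np.sum((M[lost] - M_hat[lost]) ** 2) / np.sum(M[lost] ** 2)


M = np.array([[1.0, 2.0], [3.0, 4.0]])
omega = np.array([[True, False], [False, True]])
# Only the lost entries (0, 1) and (1, 0) count: (2 - 0)^2 / (2^2 + 3^2) = 4/13.
err = error_rate(np.array([[9.0, 0.0], [3.0, 9.0]]), M, omega)
```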

Procedure: The simulation proceeds as follows:

1) Randomly drop data from the ground truth to simulate the data gathered in WSNs. Generate the subset $\Omega$ from the random loss pattern and set $\Omega_1 = \Omega_2 = \Omega$. The data loss rate ranges from 20% to 90%.

2) Using the datasets of two physical attributes, compute $P_\Omega(M_1)$ and $P_\Omega(M_2)$.

3) $P_\Omega(M_1)$ and $P_\Omega(M_2)$ serve as the inputs of the estimation algorithms, KNN, ESTI-CS and MACS, separately or jointly. We then obtain the approximations $\hat{M}_1$ and $\hat{M}_2$ of $M_1$ and $M_2$.

4) Compare the performance of the algorithms using the error rate defined by Eqn. (16).
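Steps 1 and 2 of the procedure can be sketched as follows. The sketch assumes that exactly round(loss rate × nt) entries are dropped uniformly at random and that $\Omega$ is stored as a boolean mask; the helper name `random_loss_mask` and the random stand-in matrices (sized like the Intel Indoor selection) are hypothetical, and the recovery methods themselves are left out.

```python
import numpy as np


def random_loss_mask(shape, loss_rate, rng):
    """Boolean mask Omega (True = received): round(loss_rate * size)
    entries, chosen uniformly at random without replacement, are lost."""
    size = int(np.prod(shape))
    lost = rng.choice(size, size=int(round(loss_rate * size)), replace=False)
    mask = np.ones(size, dtype=bool)
    mask[lost] = False
    return mask.reshape(shape)


rng = np.random.default_rng(42)
# Stand-ins for two ground-truth attribute matrices (hypothetical data).
M1, M2 = rng.random((49, 149)), rng.random((49, 149))
lost_fraction = {}
for loss in (0.2, 0.4, 0.6, 0.8, 0.9):
    omega = random_loss_mask(M1.shape, loss, rng)  # Omega_1 = Omega_2 = Omega
    S1, S2 = np.where(omega, M1, 0.0), np.where(omega, M2, 0.0)  # step 2
    lost_fraction[loss] = 1.0 - omega.mean()
    # Step 3 would pass (S1, S2, omega) to KNN, ESTI-CS and MACS;
    # step 4 would score the estimates with Eqn. (16).
```

Using one shared mask for both attributes matches the setting $\Omega_1 = \Omega_2 = \Omega$ above, where a lost packet removes all of its attributes at once.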

B. Simulation Results

In Fig. 4, we plot the comparison of the three algorithms in the two-attribute case. In the simulation, MACS achieves an error rate below 5% when the loss rate is below 60%, whereas ESTI-CS yields around 10% and KNN performs worse. Even at a high loss rate (80%), the error rate of MACS remains below 10%. The main reason is that MACS exploits the correlation between the two attributes, so the accuracy of estimating missing values increases when such a correlation exists.

Even when there is no relation between the two attributes, MACS performs as well as ESTI-CS.

The recovery accuracy for temperature is higher than for light illumination. The main reason is that outdoor temperature changes slowly and with small amplitude, which gives it strong temporal and spatial stability that benefits the estimation methods. In contrast, the accuracy for light illumination in GreenOrbs is somewhat lower, because light illumination varies considerably in nature.

As shown in Fig. 4, the estimation performance of KNN is barely satisfactory and degrades quickly as the data loss rate increases. A possible reason is that massive data loss in WSNs veils the temporal and spatial correlations in the data, so interpolation methods cannot benefit much from these features.

Fig. 4. The accuracy of missing value estimation methods: error rate (%) versus data loss probability (%) for MACS, ESTI-CS and KNN on (a) Indoor Light, (b) Indoor Temp, (c) GreenOrbs Light and (d) GreenOrbs Temp.

Overall, under the random loss pattern, MACS outperforms KNN in all cases and outperforms ESTI-CS whenever the attributes are correlated; when they are not, it performs as well as ESTI-CS.

VII. CONCLUSION

In this paper, we studied the Environment Data Recovery (EDR) problem in WSNs. We proposed joint sparse decomposition (JSD) to reveal the correlation among multiple attributes, and showed that both the original and the JSD-derived data exhibit the low-rank feature. Driven by these observations, we designed the MACS algorithm to approximate the missing data; it combines the benefits of compressive sensing and the correlation of attributes. Data-driven simulations illustrated that MACS outperforms existing interpolation methods.

Our future work is as follows. First, we plan to incorporate Bayesian models into prediction and data reconstruction. Second, we will study the relationship between computation time and accuracy. Third, we will generalize multi-attribute data reconstruction to more fields.

ACKNOWLEDGMENT

This research was supported by NSF of China under grants No. 61073158 and No. 61100210, STCSM Projects No. 12dz1507400 and No. 13511507800, the Doctoral Program Foundation of Institutions of Higher Education under grant No. 20110073120021, Singapore-MIT International Design Center IDG31000101, the iTrust Cyber Physical System Protection project, and the Singapore NRF under its IDM Futures Funding Initiative, administered by the Interactive and Digital Media Programme Office, Media Development Authority.

REFERENCES

[1] D. Baron, M.B. Wakin, M.F. Duarte, S. Sarvotham, and R.G. Baraniuk. “Distributed compressed sensing”, Preprint, 2005.

[2] D.L. Donoho. “Compressed sensing”, IEEE Transactions on Information Theory, Vol. 52, No. 4, pp. 1289-1306, 2006.

[3] E.J. Candès, J. Romberg, and T. Tao. “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information”, IEEE Transactions on Information Theory, Vol. 52, No. 2, pp. 489-509, 2006.

[4] F.L. Lewis. “Wireless sensor networks”, Smart Environments: Technologies, Protocols, and Applications, pp. 11-46, 2004.

[5] G. Chen, X.-Y. Liu, L. Kong, J.-L. Lu, W. Shu, and M.-Y. Wu. “JSSDR: Joint-Sparse Sensory Data Recovery in Wireless Sensor Networks”, IEEE WiMob, 2013.

[6] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. “Wireless sensor networks: a survey”, Elsevier Computer Networks, Vol. 38, No. 4, pp. 393-422, 2002.

[7] Intel Lab data. http://www.select.cs.cmu.edu/data/labapp3/index.html.

[8] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu, and X. Liu. “Data loss and reconstruction in sensor networks”, IEEE INFOCOM, pp. 1701-1710, 2013.

[9] L. Kong, M. Zhao, X.-Y. Liu, J.-L. Lu, Y. Liu, M.-Y. Wu, and W. Shu. “Surface coverage in sensor networks”, IEEE Transactions on Parallel and Distributed Systems, 2013.

[10] M. Heil and R. Karban. “Explaining evolution of plant communication by airborne signals”, Trends in Ecology & Evolution, Vol. 25, No. 3, pp. 137-144, 2010.

[11] M.G. Lawrence. “The relationship between relative humidity and the dewpoint temperature in moist air: A simple conversion and applications”, Bulletin of the American Meteorological Society, Vol. 86, No. 2, pp. 225-233, 2005.

[12] M.L. Rudolph, L. Karlstrom, and M. Manga. “A prediction of the longevity of the Lusi mud eruption, Indonesia”, Earth and Planetary Science Letters, Vol. 308, No. 1, pp. 124-130, 2011.

[13] S. Rallapalli, L. Qiu, Y. Zhang, and Y.-C. Chen. “Exploiting temporal stability and low-rank structure for localization in mobile networks”, ACM MobiCom, pp. 161-172, 2010.

[14] B. Recht, M. Fazel, and P.A. Parrilo. “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization”, SIAM Review, Vol. 52, No. 3, pp. 471-501, 2010.

[15] S. Ji, D. Dunson, and L. Carin. “Multitask compressive sensing”, IEEE Transactions on Signal Processing, Vol. 57, No. 1, pp. 92-106, 2009.

[16] T. Cover and P. Hart. “Nearest neighbor pattern classification”, IEEE Transactions on Information Theory, Vol. 13, No. 1, pp. 21-27, 1967.

[17] T. He, S. Krishnamurthy, J.A. Stankovic, T. Abdelzaher, L. Luo, R. Stoleru, T. Yan, L. Gu, J. Hui, and B. Krogh. “Energy-efficient surveillance system using wireless sensor networks”, ACM Proceedings of the 2nd international conference on Mobile systems, applications, and services, pp. 270-283, 2004.

[18] X.-Y. Liu, K. Wu, Y. Zhu, L. Kong, and M.-Y. Wu. “Mobility increases the surface coverage of distributed sensor networks”, Elsevier Computer Networks, 2013.

[19] Y. Zhang, M. Roughan, W. Willinger, and L. Qiu. “Spatio-temporal compressive sensing and internet traffic matrices”, ACM SIGCOMM, Vol. 39, No. 4, pp. 267-278, 2009.

[20] Y. Liu, Y. He, M. Li, J. Wang, K. Liu, L. Mo, W. Dong, Z. Yang, M. Xi, and J. Zhao. “Does wireless sensor network scale? A measurement study on GreenOrbs”, IEEE INFOCOM, pp. 873-881, 2011.

[21] Z. Yang, M. Li, and Y. Liu. “Sea depth measurement with restricted floating sensors”, IEEE RTSS, pp. 469-478, 2007.