Fault-tolerance analysis of a wireless sensor network with distributed classification codes



Po-Ning Chen∗, Tsang-Yi Wang†, Yunghsiang S. Han‡, Pramod K. Varshney§, Chien Yao¶ and Shin-Lin Shieh∗

∗ Dept. of Commun. Eng., Nat’l Chiao Tung Univ., Taiwan 300, ROC. Email: [email protected]
† Graduate Institute of Commun. Eng., Nat’l Sun Yat-sen Univ., Taiwan, ROC. Email: [email protected]
‡ Graduate Institute of Commun. Eng., Nat’l Taipei Univ., Taiwan, ROC. Email: [email protected]
§ Dept. of Electrical Eng. and Computer Science, Syracuse Univ., USA. Email: [email protected]
¶ Dept. of Electronics Eng., Nat’l Chiao-Tung Univ., Taiwan 300, ROC. Email: [email protected]
Sunplus Technology Co., Ltd, Taiwan 300, ROC. Email: [email protected]

Abstract— In this work, we analyze the performance of a wireless sensor network with distributed classification codes, where independence across sensors, including local observations, local classifications and sensor-fusion link noises, is assumed. Using large deviations techniques, we establish the necessary and sufficient condition under which the minimum Hamming distance fusion error vanishes as the number of sensors tends to infinity. With this condition and the upper bounds on the error probability, we characterize the relation between the fault-tolerance capability of a distributed classification code and its pair-wise Hamming distances.

I. INTRODUCTION

Consider a wireless sensor network (WSN) that consists of $N$ sensors, $N$ wireless and hence noisy one-way communication links, and a fusion center, as shown in Fig. 1. The WSN is tasked with solving an $M$-ary hypothesis testing or classification problem. Each sensor is assumed to compress its local observation before sending information to the fusion center. In this work, we are specifically concerned with the case where the sensor nodes send only binary decisions to the fusion center, where they are fused to produce the final $M$-ary decision.

One issue in such WSNs is that the wireless binary-output sensors, which are meant to be manufactured with simple and low-cost technology, may suffer from hardware as well as software malfunctions after deployment in a harsh environment [1]. Fault tolerance against unexpected sensor failures is therefore of great importance for maintaining an acceptable level of performance in a WSN.

To achieve the desired robustness against sensor faults, a distributed classification code has been proposed for use in the wireless sensor network to provide good fault-tolerance capability under feasible system complexity [3]. It was shown in [3] that, with adequately high probability, the decision made by the minimum Hamming distance fusion rule can fall into the correct acceptance region even if several sensor faults that are unknown to the fusion center are present.

In [2], we characterized the asymptotic performance of the minimum Hamming distance fusion rule under some restrictive assumptions. In this work, we extend the analysis in [2] by relaxing the assumptions of a common distribution for all local observations and an identical local classification rule for all sensors. Also, only independence across sensors is assumed for the additive noises over the wireless links. In contrast to the requirement of a sufficiently large number of sensors in [2], the probability bounds obtained in this work are valid for any finite number of sensors. In particular, the necessary and sufficient condition under which the minimum Hamming distance fusion error vanishes as the number of sensors tends to infinity is established. With the necessary and sufficient condition and the upper bounds on the error probability, the relation between the fault-tolerance capability of a distributed classification code and its pair-wise Hamming distances is characterized.

II. SYSTEM MODEL

As depicted in Fig. 1, the distributed $M$-ary classification system assumes that the local observations $\{y_j\}_{j=1}^{N}$ are conditionally independent given each hypothesis, and that each local sensor classifies its own observation, independently of all others, to one of the $M$ hypotheses using its own decision rule. Denote by $h^{(j)}_{\ell|i}$ the probability of classifying $H_\ell$ at sensor $j$ given that $H_i$ is the true hypothesis. Also assume that the hypotheses are equally likely a priori, and that the link-error event $[u_j \neq u_j^*]$ is not only independent across sensors but also independent of the local observation and of the true hypothesis $H_i$.

Based on the assumed statistics, an $M \times N$ code matrix $C$ is designed in advance, whose element $c_{\ell,j}$ lies in $\{0, 1\}$ for $\ell = 0, \ldots, M-1$ and $j = 1, \ldots, N$. In the code matrix, each hypothesis is associated with a row, and each column lists the local binary outputs corresponding to the classified hypotheses at the respective sensor. Thus, sensor $j$ transmits $c_{\ell,j}$ if $H_\ell$ is declared locally. For notational convenience, $c_\ell \triangleq (c_{\ell,1}, c_{\ell,2}, \ldots, c_{\ell,N})$ denotes the row of $C$ corresponding to hypothesis $H_\ell$.

After the observation is locally processed, the local output code bit $u_j^*$ is transmitted to the fusion center. The fusion center receives the word $u = (u_1, u_2, \ldots, u_N)$, where $u_j^*$ and $u_j$ are the input and output of a binary symmetric channel (BSC) with crossover probability $\epsilon_j$. The minimum Hamming distance fusion rule, or specifically,

$$\omega = \arg\min_{0 \le \ell \le M-1} d(u, c_\ell),$$

is then employed to obtain the multiclass decision $\omega$, where $d(\cdot, \cdot)$ is the Hamming distance.

[Figure 1: block diagram. A multiclass phenomenon produces observations $y_1, \ldots, y_N$ at local sensors $1, \ldots, N$; sensor $j$ outputs $u_j^* = c_{\ell,j}$ if $H_\ell$ is declared true upon the reception of $y_j$; each $u_j^*$ passes through a BSC with crossover probability $\epsilon_j$ to give $u_j$; the fusion center applies minimum Hamming distance fusion, $\arg\min_{0 \le \ell \le M-1} d(u, c_\ell)$. The distributed classification code $C$ is shown as an $M \times N$ array whose row for $H_\ell$ is $(c_{\ell,1}, \ldots, c_{\ell,N})$.]

Fig. 1. System model for a WSN with distributed classification code.
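The fusion rule of this section can be sketched in a few lines of Python. This is only an illustration: the code matrix `CODE`, the received word, and the tie-breaking choice (smallest index wins) are assumptions of the sketch and are not taken from the paper or from the code designs of [3].

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance d(a, b) between two equal-length binary words."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def fuse(u: Sequence[int], code: List[List[int]]) -> int:
    """Minimum Hamming distance fusion: index l minimizing d(u, c_l).

    Tie-breaking is an assumption of this sketch: the smallest index wins.
    """
    return min(range(len(code)), key=lambda l: hamming(u, code[l]))


# Hypothetical code matrix: M = 4 hypotheses (rows), N = 6 sensors (columns).
CODE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

# Row 2 with its third bit flipped, e.g. by a link error or a faulty sensor.
received = [1, 1, 0, 0, 0, 0]
decision = fuse(received, CODE)
print("fusion decision: H_%d" % decision)
```

Here the received word is at distance 1 from row 2 and at distance at least 2 from every other row, so the single flipped bit does not change the decision.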

III. PERFORMANCE ANALYSIS

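Lemma 1: Let $\{Z_j\}_{j=1}^{\infty}$ be independent binary random variables with $\Pr[Z_j = 1] = q_j$ and $\Pr[Z_j = -1] = 1 - q_j$. Then, if $\lambda_m \triangleq E[Z_1 + \cdots + Z_m]/m < 0$,

$$\Pr\{Z_1 + \cdots + Z_m \ge 0\} \le e^{-m \cdot I_m(0)}, \qquad (1)$$

where $I_m(x) \triangleq \sup_{\theta \ge 0}[\theta x - \varphi_m(\theta)]$ and $\varphi_m(\theta) \triangleq \frac{1}{m} \log E\left[e^{\theta(Z_1 + \cdots + Z_m)}\right]$.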

Proof: The lemma can be proved by following the fundamental large deviations argument. It is omitted due to page limitations.

The probability bound in (1) does not exhibit any apparent relation with $\lambda_m$, namely the average of the means of $\{Z_j\}_{j=1}^{m}$. This can be amended by the next lemma.

Lemma 2: If $\lambda_m \triangleq E[Z_1 + \cdots + Z_m]/m < 0$, then

$$\Pr\{Z_1 + \cdots + Z_m \ge 0\} \le (1 - \lambda_m^2)^{m/2}.$$
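Before turning to the proof, the bound of Lemma 2 can be checked numerically on a small instance. The sketch below computes the exact probability $\Pr\{Z_1 + \cdots + Z_m \ge 0\}$ by dynamic programming over the number of $+1$ outcomes and compares it with $(1 - \lambda_m^2)^{m/2}$. The probabilities in `q` are arbitrary illustrative values, not data from the paper.

```python
from typing import List


def tail_prob_nonneg(q: List[float]) -> float:
    """Exact Pr{Z_1+...+Z_m >= 0} for independent Z_j in {+1, -1}
    with Pr[Z_j = +1] = q[j]."""
    # dist[k] = probability that exactly k of the variables seen so far equal +1
    dist = [1.0]
    for qj in q:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1.0 - qj)   # Z_j = -1
            new[k + 1] += p * qj       # Z_j = +1
        dist = new
    m = len(q)
    # The sum equals k - (m - k) = 2k - m, which is >= 0 iff 2k >= m.
    return sum(p for k, p in enumerate(dist) if 2 * k >= m)


def lemma2_bound(q: List[float]) -> float:
    """Upper bound (1 - lambda_m^2)^(m/2), valid when lambda_m < 0."""
    m = len(q)
    lam = sum(2.0 * qj - 1.0 for qj in q) / m
    if lam >= 0:
        raise ValueError("Lemma 2 requires a negative average mean")
    return (1.0 - lam * lam) ** (m / 2.0)


# Illustrative, non-identical probabilities (assumed values); their mean is 0.2.
q = [0.1, 0.2, 0.3, 0.25, 0.15, 0.2, 0.1, 0.3]
exact = tail_prob_nonneg(q)
bound = lemma2_bound(q)
print("exact = %.6f, bound = %.6f" % (exact, bound))
```

For these values $\lambda_m = -0.6$, so the bound equals $0.64^4$, and the exact tail probability lies below it as the lemma requires.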

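Proof: Let $\bar{q}_m \triangleq (1/m)\sum_{j=1}^{m} q_j$, and note that $\lambda_m = 2\bar{q}_m - 1$. So, the assumption of the lemma is equivalent to $\bar{q}_m < 1/2$.

The validity of the lemma for $0 < \bar{q}_m < 1/2$ can be proved by applying Jensen's inequality to the upper bound in (1) as follows.

$$\begin{aligned}
e^{-m \cdot I_m(0)} &= \inf_{\theta \ge 0} \exp\left\{ \sum_{j=1}^{m} \log\left( q_j e^{\theta} + (1 - q_j) e^{-\theta} \right) \right\} \\
&= \inf_{\theta \ge 0} \exp\left\{ m \left( \sum_{j=1}^{m} \frac{1}{m} \log\left( q_j e^{\theta} + (1 - q_j) e^{-\theta} \right) \right) \right\} \\
&\le \inf_{\theta \ge 0} \exp\left\{ m \cdot \log\left( \sum_{j=1}^{m} \frac{1}{m} \left( q_j e^{\theta} + (1 - q_j) e^{-\theta} \right) \right) \right\} \\
&= \inf_{\theta \ge 0} \exp\left\{ m \cdot \log\left( \bar{q}_m e^{\theta} + (1 - \bar{q}_m) e^{-\theta} \right) \right\} \\
&= \left( 4\bar{q}_m (1 - \bar{q}_m) \right)^{m/2} = (1 - \lambda_m^2)^{m/2},
\end{aligned}$$

where the infimum is attained at the optimizer $\theta^* = \frac{1}{2}\log\left((1 - \bar{q}_m)/\bar{q}_m\right) > 0$ for $0 < \bar{q}_m < 1/2$, and the final equality uses $\lambda_m = 2\bar{q}_m - 1$.

In case $\bar{q}_m = 0$, we have $(4\bar{q}_m(1 - \bar{q}_m))^{m/2} = 0$, and

$$\inf_{\theta \ge 0} \exp\left\{ \sum_{j=1}^{m} \log\left( q_j e^{\theta} + (1 - q_j) e^{-\theta} \right) \right\} \le \inf_{\theta \ge 0} \exp\left\{ m \log\left( \bar{q}_m e^{\theta} + (1 - \bar{q}_m) e^{-\theta} \right) \right\} = \inf_{\theta \ge 0} \exp\{-m\theta\} = 0.$$

Based on the probability bounds obtained in Lemmas 1 and 2, we can upper-bound the minimum Hamming distance fusion error for a WSN with distributed classification codes by the following theorem.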

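Theorem 1: If

$$\lambda_{\max} < 0, \qquad (2)$$

then the minimum Hamming distance fusion error satisfies

$$P_e \le (M - 1)(1 - \lambda_{\max}^2)^{d_{\min}/2}, \qquad (3)$$

where

$$P_e \triangleq \frac{1}{M} \sum_{i=0}^{M-1} \Pr(\text{fusion decision} \neq H_i \mid H_i),$$

$$d_{\min} \triangleq \min_{0 \le \ell, i \le M-1,\ \ell \neq i} d(c_\ell, c_i),$$

$$q_{i,j} \triangleq \epsilon_j + (1 - 2\epsilon_j) \sum_{k=0}^{M-1} (c_{i,j} \oplus c_{k,j})\, h^{(j)}_{k|i}, \qquad (4)$$

and

$$\lambda_{\max} \triangleq \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{1}{d(c_\ell, c_i)} \sum_{j=1}^{N} (c_{\ell,j} \oplus c_{i,j})(2q_{i,j} - 1). \qquad (5)$$

Proof:

$$\begin{aligned}
\Pr(\text{fusion decision} \neq H_i \mid H_i) &\le \Pr\left( d(u, c_i) \ge \min_{0 \le \ell \le M-1,\ \ell \neq i} d(u, c_\ell) \,\Big|\, H_i \right) \\
&\le \sum_{0 \le \ell \le M-1,\ \ell \neq i} \Pr\left( d(u, c_i) \ge d(u, c_\ell) \mid H_i \right) \\
&= \sum_{0 \le \ell \le M-1,\ \ell \neq i} \Pr\left( \sum_{\{j \in [1, \ldots, N] :\ c_{\ell,j} \neq c_{i,j}\}} (z_{i,j} - \bar{z}_{i,j}) \ge 0 \,\Big|\, H_i \right),
\end{aligned}$$

where $z_{i,j} \triangleq u_j \oplus c_{i,j}$ and $\bar{z}$ represents the complement of the binary 0-1 variable $z$. Observe that

$$\begin{aligned}
\Pr(z_{i,j} = 1 \mid H_i) &= \Pr(u_j \oplus c_{i,j} = 1 \mid H_i) \\
&= \Pr(u_j = u_j^* \text{ and } u_j \oplus c_{i,j} = 1 \mid H_i) + \Pr(u_j \neq u_j^* \text{ and } u_j \oplus c_{i,j} = 1 \mid H_i) \\
&= \Pr(u_j = u_j^* \text{ and } u_j^* \oplus c_{i,j} = 1 \mid H_i) + \Pr(u_j \neq u_j^* \text{ and } u_j^* \oplus c_{i,j} = 0 \mid H_i) \\
&= \Pr(u_j = u_j^*) \Pr(u_j^* \oplus c_{i,j} = 1 \mid H_i) + \Pr(u_j \neq u_j^*) \Pr(u_j^* \oplus c_{i,j} = 0 \mid H_i) \\
&= \epsilon_j + (1 - 2\epsilon_j) \Pr(u_j^* \oplus c_{i,j} = 1 \mid H_i) \\
&= \epsilon_j + (1 - 2\epsilon_j) \sum_{k=0}^{M-1} (c_{i,j} \oplus c_{k,j})\, h^{(j)}_{k|i} = q_{i,j},
\end{aligned}$$

and $\{z_{i,j}\}_{j=1}^{N}$ is independent across sensors given that $H_i$ is true. Therefore, (3) can be obtained by applying the upper bound in Lemma 2.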
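To make the quantities in Theorem 1 concrete, the following Python sketch evaluates $q_{i,j}$ from (4), $\lambda_{\max}$ from (5), $d_{\min}$, and the bound (3) for a toy system. Everything numeric here is an assumption chosen for illustration: a 4-hypothesis code obtained by repeating a small base matrix, a local classifier that is correct with probability 0.85, and a crossover probability of 0.05 on every link. The container layout `h[j][k][i]` for $h^{(j)}_{k|i}$ is likewise a convention of this sketch.

```python
from itertools import permutations


def q_matrix(code, h, eps):
    """q[i][j] as in (4); h[j][k][i] = Pr(sensor j declares H_k | H_i)."""
    M, N = len(code), len(code[0])
    return [
        [
            eps[j] + (1 - 2 * eps[j])
            * sum((code[i][j] ^ code[k][j]) * h[j][k][i] for k in range(M))
            for j in range(N)
        ]
        for i in range(M)
    ]


def lambda_max(code, q):
    """lambda_max as in (5): maximum over ordered pairs (l, i) with l != i."""
    N = len(code[0])
    values = []
    for l, i in permutations(range(len(code)), 2):
        diff = [code[l][j] ^ code[i][j] for j in range(N)]
        total = sum(diff[j] * (2 * q[i][j] - 1) for j in range(N))
        values.append(total / sum(diff))  # sum(diff) = d(c_l, c_i) > 0
    return max(values)


def d_min(code):
    """Minimum pair-wise Hamming distance between distinct rows."""
    return min(
        sum(a ^ b for a, b in zip(code[l], code[i]))
        for l, i in permutations(range(len(code)), 2)
    )


def theorem1_bound(code, h, eps):
    """Right-hand side of (3); requires condition (2)."""
    lam = lambda_max(code, q_matrix(code, h, eps))
    if lam >= 0:
        raise ValueError("condition (2) fails: lambda_max must be negative")
    return (len(code) - 1) * (1 - lam ** 2) ** (d_min(code) / 2)


# Hypothetical toy system (all values are assumptions of this sketch).
BASE = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
CODE = [[b for b in row for _ in range(4)] for row in BASE]  # N = 24 sensors
M, N = len(CODE), len(CODE[0])
H_ONE = [[0.85 if k == i else 0.05 for i in range(M)] for k in range(M)]
h = [H_ONE] * N      # identical local classifiers
eps = [0.05] * N     # identical BSC crossover probabilities

lam = lambda_max(CODE, q_matrix(CODE, h, eps))
bound = theorem1_bound(CODE, h, eps)
print("lambda_max = %.4f, d_min = %d, bound = %.5f" % (lam, d_min(CODE), bound))
```

In this toy setting every $q_{i,j}$ equals 0.14, so $\lambda_{\max} = -0.72$ and $d_{\min} = 12$. Note that for short codes the bound (3) can exceed one and is then uninformative; it becomes useful as $d_{\min}$ grows.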

From the above theorem, we see that if, for some $\delta > 0$, $\lambda_{\max} < -\delta$ for all sufficiently large $N$, the decoding error vanishes exponentially fast as $d_{\min}$ approaches infinity. Since, under a fixed number of hypotheses, $d_{\min}$ can be made to grow linearly with the number of sensors $N$, we conclude that the average error probability for a WSN with distributed classification code and minimum Hamming distance fusion can be made zero asymptotically as $N$ goes to infinity, and the error exponent is bounded below by

$$\liminf_{N \to \infty} -\frac{1}{N} \log P_e \ge \liminf_{N \to \infty} -\frac{d_{\min}}{2N} \log(1 - \lambda_{\max}^2)$$

as long as $\limsup_{N \to \infty} \lambda_{\max} < 0$. Next, we show that the assumption $\limsup_{N \to \infty} \lambda_{\max} > 0$ leads to a non-vanishing $P_e$, and hence establish the necessary and sufficient condition under which $P_e$ vanishes.

Theorem 2: $P_e$ is bounded away from zero infinitely often if $\limsup_{N \to \infty} \lambda_{\max} > 0$.

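Proof: The assumption that $\limsup_{N \to \infty} \lambda_{\max} > 0$ implies the existence of $\delta > 0$ such that $\lambda_{\max} > \delta$ for infinitely many $N$. Hence, for any $N$ validating $\lambda_{\max} > \delta$, there exist $\ell = \ell(N)$ and $i = i(N)$ such that

$$\sum_{j=1}^{N} (c_{\ell,j} \oplus c_{i,j})(2q_{i,j} - 1) > \delta \cdot d(c_\ell, c_i). \qquad (6)$$

By defining $z_{i,j}$ and $\bar{z}_{i,j}$ the same as in the proof of Theorem 1, we obtain:

$$\mu_{\ell,i} \triangleq E\left[ \sum_{\{j \in [1, \ldots, N] :\ c_{\ell,j} \neq c_{i,j}\}} (z_{i,j} - \bar{z}_{i,j}) \right] = \sum_{j=1}^{N} (c_{\ell,j} \oplus c_{i,j})(2q_{i,j} - 1) > \delta \cdot d(c_\ell, c_i).$$

As a result,

$$\begin{aligned}
\Pr(\text{fusion decision} \neq H_i \mid H_i) &\ge \Pr\left( d(u, c_i) > \min_{0 \le \ell' \le M-1,\ \ell' \neq i} d(u, c_{\ell'}) \,\Big|\, H_i \right) \\
&\ge \Pr\left( d(u, c_i) > d(u, c_\ell) \mid H_i \right) \\
&= \Pr\left( \sum_{\{j \in [1, \ldots, N] :\ c_{\ell,j} \neq c_{i,j}\}} (z_{i,j} - \bar{z}_{i,j}) > 0 \,\Big|\, H_i \right) \\
&\ge \Pr\left( \sum_{\{j \in [1, \ldots, N] :\ c_{\ell,j} \neq c_{i,j}\}} (z_{i,j} - \bar{z}_{i,j}) - \mu_{\ell,i} > 0 \,\Big|\, H_i \right) \\
&\to \frac{1}{2}, \quad \text{if } d(c_\ell, c_i) \text{ approaches infinity},
\end{aligned}$$

where the last step follows from the central limit theorem for the sum of independent bounded variables. Thus, the claim of the theorem holds for the case that $d(c_\ell, c_i)$ tends to infinity.

In situations where $d(c_\ell, c_i)$ remains bounded as $N$ approaches infinity, which corresponds to a bad code design, the theorem is trivially valid.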

IV. ANALYSIS OF PESSIMISTIC FAULT-TOLERANCE CAPABILITY

As mentioned earlier, the wireless sensor network considered in this paper is likely to contain faulty sensors. Faults may include all misbehaviors, ranging from stuck-at faults to sensors that behave arbitrarily. Observe that when sensor faults (SF) occur, $q_{i,j}$ is no longer given by (4), but becomes a function of the new statistics of $u_j^*$ owing to sensor faults. For example, when a stuck-at-one fault occurs at sensor $j$, $\Pr\{u_j^* = 1 \mid H_i\} = 1$ for $0 \le i \le M-1$. Hence,

$$q^{(SF)}_{i,j} = \epsilon_j + (1 - 2\epsilon_j) \Pr(u_j^* \oplus c_{i,j} = 1 \mid H_i) = \epsilon_j c_{i,j} + (1 - \epsilon_j)(1 - c_{i,j}).$$

Similarly, for a stuck-at-zero fault,

$$q^{(SF)}_{i,j} = \epsilon_j + (1 - 2\epsilon_j) \Pr(u_j^* \oplus c_{i,j} = 1 \mid H_i) = \epsilon_j (1 - c_{i,j}) + (1 - \epsilon_j) c_{i,j}.$$

[Figure 2: block diagram. An $M$-ary code encoder emits $c = (c_1, c_2, \ldots, c_N) \in C = \{c_0, c_1, \ldots, c_{M-1}\}$; $N$ uncooperative bit-by-bit postulate encoders map each $c_j$ to $u_j^*$ through a postulated code-dependent binary channel; a memoryless BSC adds noise $n_j$ to give $u_j$; a minimum Hamming distance decoder computes $\arg\min_{0 \le \ell \le M-1} d(u, c_\ell)$. Annotation: the number of $u_j^*$ that are independent of $c_j$ being less than $|\beta_{\max}|(d_{\min}/2)$ implies vanishing decoding error.]

Fig. 2. Equivalent serial-connected binary channel model specifically for wireless sensor networks.

In case a random fault occurs, in which $\Pr\{u_j^* = 0 \mid H_i\} = \Pr\{u_j^* = 1 \mid H_i\}$,

$$q^{(SF)}_{i,j} = \epsilon_j + (1 - 2\epsilon_j) \Pr(u_j^* \oplus c_{i,j} = 1 \mid H_i) = \frac{1}{2}.$$
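The three fault models above (stuck-at-one, stuck-at-zero and random) are special cases of a faulty sensor whose output bit equals 1 with some probability that does not depend on the hypothesis. The short Python sketch below evaluates $q^{(SF)}_{i,j}$ for such a sensor and checks it against the closed forms given in the text. The parameter name `p_one` and the value of `EPS` are assumptions introduced only for this illustration.

```python
def q_fault(eps: float, c: int, p_one: float) -> float:
    """q^(SF) for a faulty sensor whose output bit u* equals 1 with
    probability p_one regardless of the true hypothesis.

    eps: BSC crossover probability of the link
    c:   code bit c_{i,j} that a healthy sensor would be compared against
    """
    # Pr(u* xor c = 1): u* = 1 and c = 0, or u* = 0 and c = 1
    p_mismatch = p_one * (1 - c) + (1 - p_one) * c
    return eps + (1 - 2 * eps) * p_mismatch


EPS = 0.1  # assumed crossover probability
for c in (0, 1):
    stuck_one = q_fault(EPS, c, 1.0)
    stuck_zero = q_fault(EPS, c, 0.0)
    random_fault = q_fault(EPS, c, 0.5)
    # Closed forms stated in the text
    assert abs(stuck_one - (EPS * c + (1 - EPS) * (1 - c))) < 1e-12
    assert abs(stuck_zero - (EPS * (1 - c) + (1 - EPS) * c)) < 1e-12
    assert abs(random_fault - 0.5) < 1e-12
    print("c = %d: stuck-at-one %.2f, stuck-at-zero %.2f, random %.2f"
          % (c, stuck_one, stuck_zero, random_fault))
```

Because `q_fault` is affine in `p_one`, its values over `p_one` in $[0, 1]$ sweep exactly the interval between $\epsilon_j$ and $1 - \epsilon_j$.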

In fact, $q^{(SF)}_{i,j}$ ranges from $\min\{\epsilon_j, 1 - \epsilon_j\}$ to $\max\{\epsilon_j, 1 - \epsilon_j\}$. As no prior information on the sensor fault type or on the number of faulty sensors is assumed known at the fusion center, it is safer to assess the fault-tolerance capability of the system under the worst-case scenario. The next corollary, which is a straightforward extension of Theorem 1, can then be used to characterize the fault-tolerance capability of a distributed classification coding system.

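Corollary 1: Suppose that the fusion center knows the set of faulty sensor indices, $F$, and also knows the respective $q^{(SF)}_{i,j}$ for those $j \in F$. Then, if $\lambda_{\max}(F) < 0$, we have:

$$P_e \le (M - 1)\left(1 - \lambda_{\max}^2(F)\right)^{d_{\min}/2},$$

where

$$\lambda_{\max}(F) \triangleq \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{1}{d(c_\ell, c_i)} \left[ \sum_{j \in F^c} (c_{\ell,j} \oplus c_{i,j})(2q_{i,j} - 1) + \sum_{j \in F} (c_{\ell,j} \oplus c_{i,j})(2q^{(SF)}_{i,j} - 1) \right]$$

and the superscript “c” denotes the set complement operation.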

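By $\min\{\epsilon_j, 1 - \epsilon_j\} \le q^{(SF)}_{i,j} \le \max\{\epsilon_j, 1 - \epsilon_j\}$, we can verify based on the above corollary that:

$$\begin{aligned}
\lambda_{\max}(F) - \lambda_{\max} &= \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{1}{d(c_\ell, c_i)} \left( \sum_{j=1}^{N} (c_{\ell,j} \oplus c_{i,j})(2q_{i,j} - 1) + 2 \sum_{j \in F} (c_{\ell,j} \oplus c_{i,j})(q^{(SF)}_{i,j} - q_{i,j}) \right) - \lambda_{\max} \\
&\le 2 \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{1}{d(c_\ell, c_i)} \sum_{j \in F} (c_{\ell,j} \oplus c_{i,j})(q^{(SF)}_{i,j} - q_{i,j}) \\
&\le \begin{cases}
2 \max\limits_{0 \le \ell, i \le M-1,\ \ell \neq i} \dfrac{1}{d(c_\ell, c_i)} \sum\limits_{j \in F} (c_{\ell,j} \oplus c_{i,j})(1 - 2\epsilon_j) \sum\limits_{k=0}^{M-1} [1 - (c_{i,j} \oplus c_{k,j})]\, h^{(j)}_{k|i}, & \text{if } \epsilon_j \le \frac{1}{2} \\
2 \max\limits_{0 \le \ell, i \le M-1,\ \ell \neq i} \dfrac{1}{d(c_\ell, c_i)} \sum\limits_{j \in F} (c_{\ell,j} \oplus c_{i,j})(2\epsilon_j - 1) \sum\limits_{k=0}^{M-1} (c_{i,j} \oplus c_{k,j})\, h^{(j)}_{k|i}, & \text{if } \epsilon_j > \frac{1}{2}
\end{cases} \\
&\le 2 \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{1}{d(c_\ell, c_i)} \sum_{j \in F} |1 - 2\epsilon_j| \sum_{k=0}^{M-1} h^{(j)}_{k|i} \\
&= 2 \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{1}{d(c_\ell, c_i)} \sum_{j \in F} |1 - 2\epsilon_j| \\
&= \frac{2}{d_{\min}} \sum_{j \in F} |1 - 2\epsilon_j|. \qquad (7)
\end{aligned}$$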
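As a numerical sanity check of (7), the sketch below computes $\lambda_{\max}$ and the worst-case $\lambda_{\max}(F)$, taking $q^{(SF)}_{i,j} = \max\{\epsilon_j, 1 - \epsilon_j\}$ for every faulty sensor, and compares their difference with the right-hand side of (7). The code matrix, the classifier probabilities, the crossover probability and the faulty set `F` (zero-based sensor indices) are all assumed toy values, not taken from the paper.

```python
from itertools import permutations


def lambda_max(code, q, faulty=frozenset(), q_sf=None):
    """lambda_max(F) of Corollary 1: q_sf[i][j] replaces q[i][j] for every
    sensor j in `faulty`. With an empty `faulty` set this is (5)."""
    N = len(code[0])
    values = []
    for l, i in permutations(range(len(code)), 2):
        diff = [code[l][j] ^ code[i][j] for j in range(N)]
        total = sum(
            diff[j] * (2 * (q_sf[i][j] if j in faulty else q[i][j]) - 1)
            for j in range(N)
        )
        values.append(total / sum(diff))
    return max(values)


def h(k, i):
    """Assumed identical local classifier: Pr(declare H_k | H_i)."""
    return 0.85 if k == i else 0.05


# Hypothetical toy system (all values are assumptions of this sketch).
BASE = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
CODE = [[b for b in row for _ in range(4)] for row in BASE]  # N = 24 sensors
M, N = len(CODE), len(CODE[0])
EPS = [0.05] * N

# q[i][j] from (4) for healthy sensors
q = [[EPS[j] + (1 - 2 * EPS[j])
      * sum((CODE[i][j] ^ CODE[k][j]) * h(k, i) for k in range(M))
      for j in range(N)] for i in range(M)]

F = frozenset({0, 5, 13})  # assumed faulty sensors (zero-based indices)
# Worst case for the bound: the largest value q^(SF) can take
q_sf = [[max(EPS[j], 1 - EPS[j]) for j in range(N)] for _ in range(M)]

lam = lambda_max(CODE, q)
lam_F = lambda_max(CODE, q, F, q_sf)
dmin = min(sum(a ^ b for a, b in zip(CODE[l], CODE[i]))
           for l, i in permutations(range(M), 2))
slack = (2 / dmin) * sum(abs(1 - 2 * EPS[j]) for j in F)  # right side of (7)

print("lambda_max = %.4f, lambda_max(F) = %.4f" % (lam, lam_F))
print("difference = %.4f <= %.4f" % (lam_F - lam, slack))
```

In this example the faults raise $\lambda_{\max}$ from $-0.72$ to $-0.45$, an increase of 0.27, which is within the 0.45 allowed by (7).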

In order to guarantee a vanishing $P_e$ with the maximal allowable number $|F|$ of faulty sensors, it suffices to have

$$\lambda_{\max}(F) \le \lambda_{\max} + \frac{2}{d_{\min}} \sum_{j \in F} |1 - 2\epsilon_j| < 0. \qquad (8)$$

For an identical sensor system where $\epsilon_j = \epsilon$ and $h^{(j)}_{k|i} = h_{k|i}$ for every sensor $j$, condition (8) reduces to

$$d_{\min} > \frac{-2|1 - 2\epsilon|\,|F|}{\lambda_{\max}} = \frac{2|F|}{|\beta_{\max}|}, \qquad (9)$$

where

$$\beta_{\max} \triangleq \max_{0 \le \ell, i \le M-1,\ \ell \neq i} \frac{\sum_{k=0}^{M-1} h_{k|i}\,[d(c_i, c_k) - d(c_\ell, c_k)]}{d(c_\ell, c_i)}.$$

Since

$$\lambda_{\max} \ge \min_{0 \le i \le M-1,\ 1 \le j \le N} (2q_{i,j} - 1) = \min_{0 \le i \le M-1,\ 1 \le j \le N} \left\{ -(1 - 2\epsilon)\left( 1 - 2 \sum_{k=0}^{M-1} (c_{i,j} \oplus c_{k,j})\, h^{(j)}_{k|i} \right) \right\} \ge -|1 - 2\epsilon|$$

for an identical sensor system, we have:

$$d_{\min} > \frac{-2|1 - 2\epsilon|\,|F|}{\lambda_{\max}} = \frac{2|F|}{|\beta_{\max}|} \ge 2|F|. \qquad (10)$$

Note that the condition $d_{\min} > 2|F|$, formerly used as a heuristic code search requirement in [3], resembles the interpretation for conventional coding techniques, which states that a code with minimum pair-wise Hamming distance $d_{\min}$ can tolerate around $d_{\min}/2$ errors. However, inequality (10) hints that a $d_{\min}$ larger than $2|F|/|\beta_{\max}|$, instead of $2|F|$, may be necessary for an identical fault-tolerant sensor network system. By examining those codes that minimize (3) for $M = 8$ and $N \in \{50, 100, 150, \ldots, 600\}$, we found that $\beta_{\max}$ is around $-0.66$. In other words, in the worst case where the fusion center has no information on either the sensor fault types or the faulty sensor indices, the number of faulty sensors allowable for these codes is only about two-thirds of $d_{\min}/2$. Inequality (10) also interestingly indicates that, under an identical sensor system, the worst-case fault-tolerance requirement has nothing to do with the link noise, as we have anticipated. Inequality (10) reduces to the heuristic constraint $d_{\min} > 2|F|$ when all the misclassification probabilities become zero (in which case $h_{i|i} = 1$ for $0 \le i \le M-1$, and hence $\beta_{\max} = -1$ regardless of the codes adopted).
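The requirement in (9) and (10) can be turned into a small calculation: given a code and a common local classifier, compute $\beta_{\max}$ and the largest $|F|$ for which $d_{\min} > 2|F|/|\beta_{\max}|$ holds. The Python sketch below does this for an assumed toy code and assumed classification probabilities; it presumes $\epsilon < 1/2$ and $\beta_{\max} < 0$, and the helper names are hypothetical.

```python
import math
from itertools import permutations


def hamming(a, b):
    """Hamming distance between two equal-length binary rows."""
    return sum(x ^ y for x, y in zip(a, b))


def beta_max(code, h):
    """beta_max for identical sensors; h[k][i] = Pr(declare H_k | H_i)."""
    M = len(code)
    return max(
        sum(h[k][i] * (hamming(code[i], code[k]) - hamming(code[l], code[k]))
            for k in range(M)) / hamming(code[l], code[i])
        for l, i in permutations(range(M), 2)
    )


def max_faults(code, h):
    """Largest integer |F| satisfying d_min > 2|F| / |beta_max|, as in (9)."""
    beta = beta_max(code, h)
    if beta >= 0:
        raise ValueError("beta_max must be negative for the guarantee to apply")
    dmin = min(hamming(code[l], code[i])
               for l, i in permutations(range(len(code)), 2))
    # strict inequality |F| < |beta_max| * d_min / 2
    return math.ceil(abs(beta) * dmin / 2) - 1


# Hypothetical toy system (all values are assumptions of this sketch).
BASE = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
CODE = [[b for b in row for _ in range(4)] for row in BASE]  # d_min = 12
M = len(CODE)
H = [[0.85 if k == i else 0.05 for i in range(M)] for k in range(M)]
H_PERFECT = [[1.0 if k == i else 0.0 for i in range(M)] for k in range(M)]

beta = beta_max(CODE, H)
print("beta_max = %.3f" % beta)
print("tolerable faults with misclassification:", max_faults(CODE, H))
print("tolerable faults with perfect classifiers:", max_faults(CODE, H_PERFECT))
```

With these assumed values $\beta_{\max} = -0.8$, so four faulty sensors can be tolerated, against five under the heuristic $d_{\min} > 2|F|$, which is recovered when the local classifiers are perfect and $\beta_{\max} = -1$.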

V. CONCLUDING REMARKS

The coding problem considered in this paper can actually be transformed into one for the memoryless binary symmetric channel (BSC) with unreliable bit-by-bit postulate encoders, as shown in Fig. 2, when the link noises have a common marginal distribution. We can further consider the memoryless BSC with unreliable bitwise postulate encoders as a serial connection of two binary channels, in which the first channel suffers code-dependent noises that give

$$\Pr(u_j^* \mid c_j) = \frac{\sum_{i=0}^{M-1} \left\{ \overline{(c_j \oplus c_{i,j})} \sum_{k=0}^{M-1} \left[ 1 - (u_j^* \oplus c_{k,j}) \right] h^{(j)}_{k|i} \right\}}{\sum_{i=0}^{M-1} \left[ 1 - (c_j \oplus c_{i,j}) \right]},$$

where an over bar represents a complement operation, and the second channel is the memoryless BSC. Under the equivalent channel model, a sensor fault corresponds to $u_j^*$ becoming independent of $c_j$ (and hence, code-independent) without notifying the fusion center. Our results then indicate that the constraint that the number of code-independent bits in $u^*$ (i.e., the number of faulty sensors) is less than $|\beta_{\max}| \times (d_{\min}/2)$ is sufficient to guarantee a vanishing decoding error for such a serially connected binary channel. This bound is derived based on the pessimistic view that both faulty sensor indices and sensor fault types are unknown to the fusion center, or equivalently, that the decoder is aware of neither the index of every faulty bit $u_j^*$ nor its resultant code-independent distribution. In the extreme case that $u^*$ and $c$ are completely dependent, which occurs when $h^{(j)}_{k|i} = 1$ for every $0 \le k = i \le M-1$, the constraint reduces to the conventional $|F| < d_{\min}/2$ for the coding technique since $\beta_{\max} = -1$, and the serially connected binary channel reduces to a memoryless BSC. This observation hints that, in a channel suffering from code-dependent noises, a code that makes $u^*$ (channel output) and $c$ (channel input) more “dependent” (so that the channel output carries more information about the input) is still expected to be a better and more robust code, which is exactly the concept underlying what Shannon named “channel capacity”. It would be interesting to conduct research along this line, and determine the capacity of the postulate code-dependent channels.
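The transition probability of the postulated code-dependent channel can be read as an average, over the equally likely hypotheses whose code bit at sensor $j$ equals $c_j$, of the probability that the locally declared hypothesis has code bit $u_j^*$. The Python sketch below follows that reading; the function name, the argument layout `h[k][i]` for sensor $j$'s classifier, and the toy numbers are assumptions of this illustration rather than the authors' implementation.

```python
def postulated_channel(code, h, j, u_star, c):
    """Pr(u*_j = u_star | c_j = c) under equally likely hypotheses.

    code: M x N binary code matrix (list of rows)
    h:    h[k][i] = Pr(sensor j declares H_k | H_i is true)
    j:    zero-based sensor index
    """
    M = len(code)
    # Hypotheses whose code bit at sensor j equals the encoder output c
    rows = [i for i in range(M) if code[i][j] == c]
    if not rows:
        raise ValueError("no code row has bit %d at sensor %d" % (c, j))
    # For each such hypothesis, sum the probabilities of declaring any H_k
    # whose code bit at sensor j equals u_star
    num = sum(
        sum(h[k][i] for k in range(M) if code[k][j] == u_star)
        for i in rows
    )
    return num / len(rows)


# Hypothetical toy system (all values are assumptions of this sketch).
CODE = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
M = len(CODE)
H = [[0.85 if k == i else 0.05 for i in range(M)] for k in range(M)]

for c in (0, 1):
    row = [postulated_channel(CODE, H, 0, u, c) for u in (0, 1)]
    print("c_j = %d: Pr(u*=0|c) = %.2f, Pr(u*=1|c) = %.2f" % (c, row[0], row[1]))
```

With these values the first channel behaves like a binary channel that flips the code bit with probability 0.1; a perfect local classifier would make it noiseless, which corresponds to the extreme case discussed above.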

REFERENCES

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, “A survey on sensor networks,” IEEE Communications Magazine, pp. 102–114, August 2002.

[2] P.-N. Chen, T.-Y. Wang, Y. S. Han, P. K. Varshney and C. Yao, “Asymptotic performance analysis for minimum-Hamming-distance fusion,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, May 2005, pp. 865–868.

[3] T.-Y. Wang, Y. S. Han, P. K. Varshney and P.-N. Chen, “Distributed fault-tolerant classification in wireless sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 4, pp. 724–734, April 2005.