Secure Clustering in WSNs - Related Work - A Statistical-Analysis-Based Voting Attack Detection System for Cluster-Based Wireless Sensor Networks

2. Related Work

2.2 Secure Clustering in WSNs

Besides prolonging the lifetime of WSNs, security is an important concern. Cluster-based wireless sensor networks often reduce communication overhead by having cluster-heads or sinks aggregate messages, but message aggregation makes security harder to achieve. Many researchers address security and energy efficiency separately. For energy efficiency, methods that support data aggregation and clustering algorithms have been proposed [2-8]. For security, methods have been proposed [10-14] that resist node compromise, apply encryption techniques, and manage secret keys in a way that is applicable to sensor networks. Some work [10] designs a secure routing protocol for cluster-based wireless sensor networks by combining a conventional routing protocol with a security protocol. Because resources in WSNs are restricted, conventional security solutions do not fit sensor network systems, so that work uses the SPINS protocol.

SPINS (Security Protocols for Sensor Networks) [11] provides not only data encryption but also message authentication and user identification services by using symmetric keys.

SNEP (Sensor Network Encryption Protocol) is one of the SPINS protocols. It provides data confidentiality, two-party data authentication, and data freshness with low overhead. To achieve two-party authentication and data integrity, it uses a message authentication code (MAC). For example, two communicating parties A and B share a master key $Y_{AB}$ and derive independent keys from it using the pseudorandom function $F$. The independent keys are derived as follows:

The encryption keys are $K_{AB} = F_Y(n)$ (where $n$ is a random number known by A and B) and $K_{BA} = F_Y(n+2)$, one for each direction of communication, and the MAC keys are $K'_{AB} = F_Y(n+2)$ and $K'_{BA} = F_Y(n+4)$, one for each direction of communication. The format of the encrypted data is $E = \{D\}_{(K,C)}$, where $D$ is the data, $K$ is the encryption key, and $C$ is the counter value. The complete message that A sends to B is therefore

$A \rightarrow B:\ \{D\}_{(K_{AB},C_A)},\ \mathrm{MAC}(K'_{AB},\ C_A \,\|\, \{D\}_{(K_{AB},C_A)})$

SNEP offers semantic security, data authentication, replay protection, weak freshness, and low communication overhead.
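The SNEP message format described above can be illustrated with a minimal sketch. It is not the SPINS implementation: HMAC-SHA256 stands in for both the pseudorandom function $F$ and the MAC, an HMAC-based keystream stands in for the block cipher, and the function names (`prf`, `keystream`, `snep_send`, `snep_receive`) and the derivation indices are assumptions made for illustration only.

```python
import hashlib
import hmac
from typing import Tuple


def prf(master_key: bytes, index: int) -> bytes:
    """Stand-in for F keyed with the master key (assumption: HMAC-SHA256)."""
    return hmac.new(master_key, index.to_bytes(8, "big"), hashlib.sha256).digest()


def keystream(key: bytes, counter: int, length: int) -> bytes:
    """Counter-dependent keystream; a stand-in for a cipher in counter mode."""
    out = b""
    block = 0
    while len(out) < length:
        msg = counter.to_bytes(8, "big") + block.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        block += 1
    return out[:length]


def snep_send(data: bytes, enc_key: bytes, mac_key: bytes, counter: int) -> Tuple[bytes, bytes]:
    """Build the pair ({D}_(K,C), MAC(K', C || {D}_(K,C)))."""
    stream = keystream(enc_key, counter, len(data))
    cipher = bytes(d ^ k for d, k in zip(data, stream))
    tag = hmac.new(mac_key, counter.to_bytes(8, "big") + cipher, hashlib.sha256).digest()
    return cipher, tag


def snep_receive(cipher: bytes, tag: bytes, enc_key: bytes, mac_key: bytes, counter: int) -> bytes:
    """Check the MAC under the receiver's counter, then decrypt."""
    expected = hmac.new(mac_key, counter.to_bytes(8, "big") + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")
    stream = keystream(enc_key, counter, len(cipher))
    return bytes(c ^ k for c, k in zip(cipher, stream))


# Illustrative indices only: any two distinct inputs give independent keys.
master = b"shared-master-key"
k_ab, k_ab_mac = prf(master, 1), prf(master, 2)
cipher, tag = snep_send(b"temp=21", k_ab, k_ab_mac, counter=5)
plain = snep_receive(cipher, tag, k_ab, k_ab_mac, counter=5)
```

Because the counter enters both the keystream and the MAC, the same plaintext encrypts differently on every message, and a message replayed under an old counter fails the MAC check at a receiver whose counter has advanced.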

Another well-known SPINS protocol is µTESLA. It provides broadcast authentication and also overcomes many inadequacies of the standard TESLA. µTESLA uses only symmetric mechanisms instead of authenticating the initial packet with a digital signature, which is too expensive for sensor nodes. µTESLA also saves energy by disclosing the key once per epoch. Finally, µTESLA restricts the number of authenticated senders.

µTESLA requires that the base station and the nodes be loosely time synchronized, and that each node know an upper bound on the maximum synchronization error. To transmit an authenticated packet, the base station computes a MAC on the packet with a key that is secret at that point in time. When a node receives the packet, it can verify that the base station has not yet disclosed the corresponding MAC key, based on its loosely synchronized clock, its maximum synchronization error, and the time schedule at which keys are disclosed. The node then stores the packet in a buffer. At the time of key disclosure, the base station broadcasts the verification key to the nodes. When a receiver gets the key, it can verify the correctness of the key by means of the one-way key chain. After that, the node can use the key to authenticate the packet stored in its buffer. The key is verified as follows:

Each MAC key is a key of a key chain generated by a public one-way hash function $F$. To generate the one-way key chain, the sender randomly chooses the last key $K_n$ of the chain as the initial input of $F$ and repeatedly applies $F$ to compute all other keys of the chain: $K_i = F(K_{i+1})$. When a node receives $K_{i+1}$, it can verify it by the rule $K_i = F(K_{i+1})$, but it cannot derive $K_{i+1}$ when it knows only $K_i$. Figure 2-1 illustrates the concept of the key verification.
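The key chain rule $K_i = F(K_{i+1})$ can be sketched as follows. SHA-256 is used here as a stand-in for the one-way function $F$, and the helper names (`make_key_chain`, `verify_disclosed_key`) and the `max_gap` parameter are illustrative assumptions rather than part of µTESLA itself.

```python
import hashlib
import os
from typing import List, Optional


def F(key: bytes) -> bytes:
    """Public one-way function (assumption: SHA-256)."""
    return hashlib.sha256(key).digest()


def make_key_chain(n: int, last_key: Optional[bytes] = None) -> List[bytes]:
    """Return [K_0, ..., K_n], built backwards from K_n so that K_i = F(K_{i+1})."""
    chain = [last_key if last_key is not None else os.urandom(32)]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()
    return chain


def verify_disclosed_key(known_key: bytes, disclosed_key: bytes, max_gap: int = 1) -> bool:
    """Accept the disclosed key if applying F at most max_gap times yields the known key."""
    k = disclosed_key
    for _ in range(max_gap):
        k = F(k)
        if k == known_key:
            return True
    return False


chain = make_key_chain(5, last_key=b"demo-seed")
ok = verify_disclosed_key(chain[2], chain[3])         # K_2 == F(K_3)
forged = verify_disclosed_key(chain[2], b"x" * 32)    # an arbitrary key is rejected
```

Allowing `max_gap` greater than 1 lets a receiver that missed some disclosures still authenticate a later key against the last key it trusts.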

Other studies [12-14] aim to present an effective key management scheme that improves the security of cluster-based WSNs. The secure distributed cluster formation protocol [9] organizes sensor networks into mutually secure disjoint cliques.

2.3 Security Threats of Compromised Nodes in Clustered WSNs

Wireless sensor networks often reduce communication overhead by means of message aggregation, but aggregation makes security harder to achieve. Any compromised intermediate node can modify, forge, or discard messages, or simply transmit false values to the aggregator.

One inside attack is the fabricated report attack, in which compromised nodes pretend to have detected nearby events, or forward to the aggregator a fabricated report that supposedly originates at another location. Without a security mechanism to protect the network, adversaries could report non-existent events near the aggregator. This kind of attack not only wastes the effort spent on reporting but also gives network managers an untrustworthy view of the network. The other attack is the false votes on real reports attack, in which the attackers inject false MACs for every real report. If this attack succeeds, all of these real event reports are dropped while being forwarded. The work in [18] presents a scheme that protects against these attacks.

Node compromise presents many security threats for WSNs. Recent work, namely the statistical en-route filtering mechanism (SEF) [19], an interleaved per-hop authentication scheme (IHA) [20], and a location-based resilient security solution (LBRS) [21], provides efficient schemes to protect WSNs from the fabricated report attack. SEF is the first work that addresses the problem of detecting false sensing reports in the presence of compromised sensors.

However, these earlier approaches have some problems. They deal with an attack at a location several hops away from the attacker, which results in high resource consumption and lets the damage spread across the network.

This paper aims to find a way to solve the problems completely.

3. Proposed Scheme

Cluster-based WSNs have been shown to perform better than non-clustered ones [1]. However, an important factor behind this higher performance is the degree of correlation between inter-cluster nodes' readings. To ensure the full performance advantage of clustered WSNs over non-clustered WSNs, this degree of correlation must be high. Note that the degree of correlation is not fixed. The way to achieve a high correlation is to bind each cluster within the same isocluster [1].

Besides reducing traffic through packet aggregation in clustered WSNs, the security of aggregation is also important. If some compromised nodes keep reporting fake messages to cluster-heads, the performance of aggregation degrades and the aggregated data becomes invalid. In our scheme, we use statistical analysis to detect these attacks. Before constructing the scheme, we must first bind the range of the isocluster. The requirements of our scheme are as follows:

1. To save clustering effort, clustering must begin only when an event occurs whose range extends more than two hops, and the clusters must be positioned within the same isocluster.

2. Each sensor node has its own look-up table which can queue sensing data for a while.

3. Each sensor node knows the information of its neighbors. Sensors periodically broadcast their information to their neighbors, including their cluster binding range factor $S_c'$ (described below) and the $S_c'$ values and IDs of their neighbors.

4. The nodes in the network are quasi-stationary.

5. Time synchronization in the network must be secure and precise.

6. There should be a training phase in which the sensors learn the normal condition of the environment, so that they can tell whether an event has occurred inside their sensing range.

7. The number of compromised neighbors must be less than half of a node's neighbors. Suppose N nodes are uniformly and independently distributed over an area R = [0,L]² (L is the side length of the network field). If more than half of every node's neighbors were compromised, the total number of compromised nodes would be about N/2 and the network would have almost collapsed. Figure 3-1 shows such a compromised network scenario.

3.1 Dynamic Range Binding of Isoclusters

To increase the degree of correlation between inter-cluster nodes' readings, we first bind the range of the isoclusters. Then we use the HEED protocol [7] to perform clustering within the isoclusters.

Before binding the range of the isoclusters, we compute the cluster binding range factor $S_c'$ of each node to judge whether the node should be included in the range. The standard value $S_c$ of each node is learned during the training phase of the network.

Figure 3-2 shows the data structure of the sensor's look-up table. As time goes by, sensors sense data and store it in the look-up table. They can queue data for a while (in Figure 3-2, from time slot $T_1$ to time slot $T_5$) and compute, in real time, the mean value $M_i$ (see Figure 3-2) of the data $D_i$ in each column of the look-up table. Each entry $D_{ij}$ of the table is the sensed value of data type $D_i$ at time slot $T_j$.

Figure 3-3 shows the data range reference table for each data type (the data ranges are predefined during the training phase of the WSN). It contains the range of each data type under normal conditions and divides the values of each $D_i$ into several levels. $R_{ij}$ defines the range $R_j$ in which a value of data type $D_i$ lies.

A data level $L_i = R_j$ in Figure 3-2 means that the mean value $M_i$ lies in range $R_j$ of Figure 3-3.

The standard cluster binding range factor of a node is

$S_c = \alpha_1 R_{D_1} + \alpha_2 R_{D_2} + \alpha_3 R_{D_3} + \dots + \alpha_i R_{D_i}$ ($i$ is the number of data types)

where each $\alpha_i$ is the weight of the normal range $R_{D_i}$, with $\alpha_1 + \alpha_2 + \dots + \alpha_i = 1$. If the sensors in the WSN sense only one data type, $\alpha = 1$. $S_c'$ is computed in the same way as $S_c$.

We can then compare $S_c'$ with $S_c$. If $S_c' \neq S_c$, some event has occurred within the range of this node. The sensor records the abnormal data type and informs its neighbors of its $S_c'$.
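A minimal sketch of this computation follows. The text does not say how a range $R_j$ enters the weighted sum numerically, so the sketch assumes each range is represented by its integer level index; the level boundaries, weights, sensor values, and function names (`data_level`, `binding_factor`) are all hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def data_level(mean_value: float, boundaries: Sequence[float]) -> int:
    """Index j of the range R_j containing mean_value (boundaries ascending)."""
    return bisect_right(boundaries, mean_value)


def binding_factor(means: Sequence[float],
                   boundaries_per_type: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> float:
    """Weighted sum of per-type data levels; the weights must sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * data_level(m, b)
               for w, m, b in zip(weights, means, boundaries_per_type))


# Hypothetical reference table: temperature and humidity level boundaries.
boundaries = [[20.0, 30.0, 40.0], [40.0, 60.0]]
weights = [0.5, 0.5]

sc_standard = binding_factor([25.0, 50.0], boundaries, weights)   # training phase
sc_observed = binding_factor([45.0, 50.0], boundaries, weights)   # current means
event_detected = not math.isclose(sc_observed, sc_standard)
```

One caveat of a weighted sum is that different level combinations can produce the same value, which is why the sensor also records which data type is abnormal.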

When events occur in the network, the $S_c'$ of the affected sensors differs from $S_c$. From the information that sensors exchange periodically, each sensor knows whether its neighbors and the nodes two hops away are within the range of an isocluster. When sensors learn that their two-hop neighbors are within the range of an isocluster, they broadcast "starting clustering" messages and begin to cluster. Sensors located on the boundary of an isocluster can recognize this themselves (see Figure 3-4), because they know the $S_c'$ of their neighbors: a sensor inside the isocluster that learns that the $S_c'$ of some of its neighbors equals $S_c$ is on the boundary of the isocluster. Figure 3-4 shows a scenario of isocluster range binding. Blue nodes are located within the range of the isocluster, and red nodes are not. The shade of the background color represents the intensity of the event. The circled node (in the middle of the circle) is on the border of the isocluster: when it receives the information (i.e. $S_c'$) of its red neighbors, it knows that it is on the border of the isocluster.
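The boundary test can be sketched in a few lines. The sketch assumes that all nodes share the same standard value $S_c$ and that one neighbor outside the isocluster is enough to place a node on the boundary; both are assumptions, and the function names are hypothetical.

```python
from typing import Iterable


def in_isocluster(sc_observed: float, sc_standard: float, tol: float = 1e-9) -> bool:
    """A node is inside an isocluster when its S_c' differs from the standard S_c."""
    return abs(sc_observed - sc_standard) > tol


def on_boundary(own_sc: float, neighbor_scs: Iterable[float], sc_standard: float) -> bool:
    """Inside the isocluster, with at least one neighbor whose S_c' equals S_c."""
    if not in_isocluster(own_sc, sc_standard):
        return False
    return any(not in_isocluster(s, sc_standard) for s in neighbor_scs)


border_node = on_boundary(2.0, [2.0, 1.0], sc_standard=1.0)     # one neighbor is outside
interior_node = on_boundary(2.0, [2.0, 2.5], sc_standard=1.0)   # all neighbors are inside
```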

After binding the clustering range, we use the HEED protocol [7] to carry out the clustering itself. The clustering process of HEED terminates in O(1) iterations, independent of the network topology and size. HEED has some problems, however. In its finalization phase, nodes with lower cost have a higher probability of being chosen as cluster-heads. Each remaining node joins a cluster according to the messages it has heard from cluster-heads; if it has heard none, it elects itself as a cluster-head. Consequently, every node is guaranteed to be either a cluster-head or a non-cluster-head that belongs to a cluster. In the worst case, however, a node forms a cluster containing only itself, that is, a cluster whose only member is the cluster-head. This problem has been addressed by other work [2] [3].

Figure 3-5 shows the cluster topology for the WSN of Figure 3-4. Once the sensors have bound the isocluster, it is divided into several clusters by the HEED protocol.

3.2 Construction of Detection Scheme

To protect WSNs from attacks in which compromised nodes transmit fabricated message contents to cluster-heads and make the data aggregation incorrect, we propose a statistical voting scheme.

This scheme computes the reasonable data range of each sensor and uses it to judge the sensor over a period of time. With the help of a trustworthiness formula, the cluster-heads can determine which non-cluster-heads have been compromised.

The detection process has three steps:

1. Trusted samples filtration.

2. Reasonable data range analysis.

3. Judging of compromised nodes.

3.2.1 Trusted Samples Filtration

Clusters in the WSN form when events occur nearby, so the data range within one cluster is limited. We therefore use the neighbors of the node under detection (the destination of detection) as voters that vote on whether the destination is compromised. The problem is that each non-cluster-head is not only a destination of detection but also a voter. If some of the neighbors are compromised, the accuracy would be very low, so we have to choose trustworthy samples and use only those nodes as voters. According to requirement 7, fewer than half of a node's neighbors are compromised. Figure 3-6 shows the parameters used in the following steps.

1. After receiving the data from the non-cluster-heads, the cluster-head computes $M_i$ at each time slot $T_i$ (the number of time slots is the same as in the look-up table of each sensor).

2. Find the half of the data set at $T_i$ that is closest to $M_i$. According to requirement 7, the neighbors that transmitted these data are trusted voters.

3. To judge whether the remaining nodes can be trusted, we use the standard deviation $E_i$. If the deviation $e$ of a data value from $M_i$ ($e = |D_{ij} - M_i|$) is at most $E_i$ (i.e. $e \le E_i$), we consider the node that transmitted the data a legal voter. We choose $E_i$ for this purpose because the standard deviation represents the spread of the data: a high standard deviation means the data are widely spread, and vice versa. Furthermore, all normal density curves satisfy the following property, often referred to as the Empirical Rule: 68% of the observations fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.

4. The voters chosen at each $T_i$ may differ. We select the nodes that were chosen most of the time.

5. After these steps, the data of the chosen voters form the trusted samples.

If sensors sense more than one data type, each sensor marks the data type of the event, and the detection process is applied to that data type.
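The filtration steps can be sketched as below. The sketch makes several assumptions that the text leaves open: the population standard deviation is used for $E_i$, "chosen most of the time" is read as "chosen in more than half of the time slots", and the node names and readings are invented for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def voters_at_slot(readings: Dict[str, float]) -> Set[str]:
    """Steps 1-3 for one time slot: closest half, plus others within one std dev."""
    m = mean(readings.values())
    e = pstdev(readings.values())
    ranked = sorted(readings, key=lambda node: abs(readings[node] - m))
    half = len(ranked) // 2
    chosen = set(ranked[:half])
    chosen.update(node for node in ranked[half:] if abs(readings[node] - m) <= e)
    return chosen


def trusted_voters(slots: List[Dict[str, float]]) -> Set[str]:
    """Step 4: keep the nodes chosen in more than half of the time slots."""
    counts: Dict[str, int] = {}
    for readings in slots:
        for node in voters_at_slot(readings):
            counts[node] = counts.get(node, 0) + 1
    return {node for node, c in counts.items() if c > len(slots) / 2}


# Hypothetical neighbor readings; node "f" reports fabricated values.
slots = [
    {"a": 20.1, "b": 20.3, "c": 19.9, "d": 20.0, "e": 20.2, "f": 35.0},
    {"a": 20.4, "b": 20.2, "c": 20.1, "d": 20.3, "e": 20.0, "f": 5.0},
    {"a": 20.1, "b": 20.3, "c": 19.9, "d": 20.0, "e": 20.2, "f": 35.0},
]
voters = trusted_voters(slots)
```

In this example the fabricated readings of node "f" pull the mean away from the honest cluster, yet "f" still lies more than one standard deviation from the mean in every slot and is excluded from the voters.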

S: the destination of detection
N: the number of neighbors of S
$T_i$: the i-th time slot; sensors sense data once in each $T_i$
$M_i$: the mean value of the data sensed by the neighbors of S at the i-th time slot
$E_i$: the standard deviation of the data at the i-th time slot

Figure 3-6: The parameters of sample filtration in detection

3.2.2 Reasonable Data Range Analysis

Once the cluster-heads know the voters for the destination of detection, they can compute the reasonable data range of the destination at each time slot. We use regression analysis for this task.

Regression adjusts the voters' data into a reasonable range. Figure 3-7 shows the data structure of the regression analysis. When the cluster-heads obtain the data $D_{ij}$ of voter $i$ at time slot $T_j$, they compute the mean value $m_i$ of each voter's data and the mean value $M_j$ of the data at each time slot.

The effect $a_i$ of sensor $i$ and the effect $b_j$ of time slot $T_j$ are

$a_i = m_i - M$ ($i$ ranges over the voters)

$b_j = M_j - M$ ($j$ ranges over the time slots in the look-up table of each sensor)

where $M$ is taken to be the overall mean of all $D_{ij}$. Then we compute the fitted value $D'_{ij}$ of each $D_{ij}$:

$D'_{ij} = M + a_i + b_j$

After computing the fitted value $D'_{ij}$ of each $D_{ij}$, the data at each time slot have a uniform form. These values are not yet the final range, however: we must also consider the residual differences between $D_{ij}$ and $D'_{ij}$. The reasonable data of the destination of detection should lie within the range of the data values provided by the voters, so we add the residual differences to $D'_{ij}$:

$D''_{ij} = D'_{ij} + (D'_{ij} - D_{ij}) = 2D'_{ij} - D_{ij}$

Finally, we find the upper bound $U_j$ and lower bound $L_j$ of the values $D''_{ij}$ at each time slot $T_j$; these define the reasonable range (from $L_j$ to $U_j$ at $T_j$).
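The range computation can be sketched as follows, assuming the reference value $M$ is the overall mean of all voter readings (an assumption, since the text does not define it explicitly). The function name `reasonable_ranges` and the sample readings are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def reasonable_ranges(data: List[List[float]]) -> List[Tuple[float, float]]:
    """data[i][j] is the reading of voter i at time slot j; returns (L_j, U_j) per slot."""
    grand = mean(v for row in data for v in row)       # M, assumed to be the overall mean
    a = [mean(row) - grand for row in data]            # sensor effects a_i
    b = [mean(col) - grand for col in zip(*data)]      # time-slot effects b_j
    ranges = []
    for j in range(len(b)):
        # D''_ij = 2 * D'_ij - D_ij, with fitted value D'_ij = M + a_i + b_j
        adjusted = [2 * (grand + a[i] + b[j]) - data[i][j] for i in range(len(data))]
        ranges.append((min(adjusted), max(adjusted)))
    return ranges


# Three hypothetical voters observed over two time slots.
voter_data = [[20.0, 22.0], [21.0, 23.0], [19.0, 24.0]]
ranges = reasonable_ranges(voter_data)
```

For this sample the overall mean is 21.5, and the resulting ranges are 19 to 21 for the first slot and 22 to 24 for the second, so a destination reading of 30 in either slot would fall outside the reasonable range.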

3.2.3 Judging of Compromised Nodes

When the cluster-heads want to determine whether a non-cluster-head transmits spoofed data, they can use the data of its neighbors. After the first and second steps of the scheme, the cluster-heads have the reasonable range of data values with which to judge the destination. However, it would be incorrect to consider the destination compromised merely because a single data value it transmitted was outside the reasonable range. A trustworthiness formula is needed to make the decision.

Let the trustworthiness value of sensor $S_i$ be $W_i$. If the data of the destination at time slot $T_j$ is outside the reasonable range, then

$W_i = W_i - \rho_j \times e^{-}$ ($e^{-}$ is the effect of an untrustworthy value)

where, if $\rho_j < 1$, then $\rho_j = \rho_{j-1} + p$ ($p$ is a predefined relation probability); otherwise $\rho_j = 1$.

If the data of the destination at time slot $T_j$ is inside the reasonable range, then

$W_i = W_i + e^{+}$ ($e^{+}$ is the effect of a trustworthy value)

After $j$ rounds of detection, we examine the trustworthiness value $W_i$ of sensor $S_i$. If $W_i$ is less than the standard threshold, $S_i$ is considered compromised; otherwise it is not.
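One possible reading of the trustworthiness update is sketched below. The text does not fix the initial values or the exact order of the $\rho$ update, so the sketch assumes that $\rho$ starts at 0, grows by $p$ on every out-of-range reading before the penalty is applied, and is capped at 1. All parameter values and the function names (`update_trust`, `judge`) are hypothetical.

```python
from typing import Sequence, Tuple


def update_trust(trust: float, rho: float, in_range: bool,
                 e_plus: float, e_minus: float, p: float) -> Tuple[float, float]:
    """Apply one detection round and return the new (W_i, rho)."""
    if in_range:
        return trust + e_plus, rho
    rho = min(1.0, rho + p)            # assumed: penalty weight grows by p, capped at 1
    return trust - rho * e_minus, rho


def judge(readings: Sequence[float], ranges: Sequence[Tuple[float, float]],
          threshold: float, w0: float = 1.0, rho0: float = 0.0,
          e_plus: float = 0.1, e_minus: float = 0.5, p: float = 0.25) -> Tuple[float, bool]:
    """Return the final W_i and whether the node is judged compromised."""
    trust, rho = w0, rho0
    for value, (low, high) in zip(readings, ranges):
        trust, rho = update_trust(trust, rho, low <= value <= high, e_plus, e_minus, p)
    return trust, trust < threshold


slot_ranges = [(19.0, 21.0)] * 4
bad_trust, bad_flag = judge([30.0, 30.0, 30.0, 30.0], slot_ranges, threshold=0.0)
good_trust, good_flag = judge([20.0, 20.0, 20.0, 20.0], slot_ranges, threshold=0.0)
```

Under these assumptions, repeated violations are punished progressively harder, while a single out-of-range reading costs only a fraction of $e^{-}$, which matches the stated goal of not condemning a node for one bad value.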

4. Evaluation

In this section, we first analyze the environment in which the simulations run. Second, we show the results of our simulations and discuss several factors that may influence them. Finally, we discuss the security of our scheme.

4.1 Environmental Analysis

Our scheme assumes a cluster-organized wireless sensor network architecture. In contrast to traditional clustered WSNs, clustering begins after an event occurs and takes place within the isoclusters. As a consequence, the clusters are located exactly where the events are, so the range of data values is confined to a small scope and the precision of compromised-node detection is reasonable. However, our dynamic clustering environment still raises two questions.

1. What cluster size is suitable for the scheme?

2. What is the relation between the isoclusters and the number of clusters?
