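The graduate programs of British universities have reached an international standard, and English is the native language of their professors, so communication is quite easy; more exchanges with them would be worthwhile.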
Randomized Distributed Algorithm for Peer-to-Peer
Data Replication in Wireless Ad Hoc Networks
Hong-Zu Chou, Szu-Chi Wang, and Sy-Yen Kuo
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
Abstract
In this paper, a randomized distributed algorithm is proposed to enhance data accessibility in wireless ad hoc networks. Furthermore, in order to analyze the behavior of our algorithm, a probabilistic approach is presented to derive the upper bound of convergence by a novel technique called path coupling, which gives more insight into factors determining system performance.
1. Introduction
Recently, peer-to-peer (P2P) systems have been extensively investigated [1]. Under the paradigm of P2P communications, data accessibility is crucial for overall system performance. In order to increase data accessibility for P2P applications, various data replication schemes have been proposed. However, these research results may not be suitable for wireless ad hoc networks due to the vast differences in network characteristics.
Many cooperative caching schemes tailored for applications over mobile ad hoc networks have been presented. To cope with the situation in which a fixed access point is not available, Sailhan and Issarny [2] introduced a cache management strategy which can minimize the energy cost, especially for the Web caching problem. In [3], Yin and Cao proposed a hybrid cache scheme. Unlike traditional schemes that only replicate the contents of objects, the adopted path-caching strategy redirects possible future requests to a nearby node instead of the remote data center. In [4], Hara proposed several replica allocation and data update mechanisms to improve data availability in a partitioned network.
In this paper, we propose a randomized distributed algorithm for data replication. Moreover, we adopt a novel technique called path coupling to derive an upper bound on the convergence time of the algorithm. We believe the concepts and the techniques presented here can provide a pragmatic building block for P2P applications over ad hoc networks.
2. System Model
Assume there are n nodes and m objects in a wireless ad hoc network; for simplicity, m is set as O(n). Each node u has the same transmission radius r and memory capacity Φ(u) = c < m. The number of objects allocated in node u is denoted as m_u. Let d(u, v) denote the hop-distance between node u and node v; the set N_h(u) = {v∈V : d(u, v) ≤ h} is called the h-hop neighborhood of u. In this paper, the system is assumed to follow a relaxed asynchronous model, i.e., the upper bounds on process execution speed, message transmission delays, and clock drift rates are known. Thus we assume that a successful one-to-all local-broadcast operation can be accomplished within a constant period t_lb.
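The definition of the h-hop neighborhood can be illustrated with a short breadth-first search. This is only a sketch: the adjacency-list representation `adj` and the function name `h_hop_neighborhood` are assumptions made for illustration and do not appear in the paper.

```python
from collections import deque

def h_hop_neighborhood(adj, u, h):
    """Return N_h(u) = {v : d(u, v) <= h} by breadth-first search.

    adj maps each node to its 1-hop neighbors (hypothetical
    representation of nodes within transmission radius r).
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == h:
            continue  # do not expand beyond h hops
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

# Example topology: a path 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(h_hop_neighborhood(adj, 0, 2))
```

Note that under this definition u itself belongs to N_h(u), since d(u, u) = 0.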
Although perfect synchronization is impossible, reasonable synchronization can be achieved via extra facilities, such as GPS signals. Therefore, we assume that all nodes are synchronized in rounds, each of which consists of a number of time-slots.
Consider a P2P system in which each node u has its innate objects (denoted as INNATE(u)) and some replicated objects (denoted as REP(u)). If node u is interested in object o, it issues a query q(u, o)∈Q to search for o. A query q(u, o) is called resolved if there exists a query resolution r that indicates the path information to a node v with o∈INNATE(v) ∪ REP(v). If there exists a resolution set R such that d(u, v) ≤ k for every query q(u, o)∈Q, we call Q k-coverable and R a k-covering resolution set for Q.
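As a rough illustration of k-coverability, the sketch below checks whether every query can be resolved by some holder within k hops. The data layout is an assumption: `holdings[v]` stands for INNATE(v) ∪ REP(v), `queries` is a list of (u, o) pairs, and the function names are hypothetical.

```python
from collections import deque

def hop_distances(adj, u):
    """Hop-distance d(u, v) to every node v reachable from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def is_k_coverable(adj, holdings, queries, k):
    """True if each query q(u, o) has a holder v of o with d(u, v) <= k."""
    for u, o in queries:
        dist = hop_distances(adj, u)
        if not any(d <= k and o in holdings[v] for v, d in dist.items()):
            return False
    return True

# Path 0 - 1 - 2 - 3; object "a" is held by node 0 and "b" by node 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
holdings = {0: {"a"}, 1: set(), 2: set(), 3: {"b"}}
queries = [(0, "b"), (3, "a")]
print(is_k_coverable(adj, holdings, queries, 2))  # False: d(0, 3) = 3
print(is_k_coverable(adj, holdings, queries, 3))  # True
```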
3.2. Randomized Distributed Algorithm
The main objective of our algorithm is to ensure that all query sets become k-coverable. Since a wireless ad hoc network can be constructed without any pre-existing infrastructure, it typically provides a great degree of flexibility. On the other hand, data replication strategies organized in a centralized or hierarchical fashion are not desirable as the system size becomes large. Moreover, nodes in the same region are apt to require similar object(s). If a deterministic algorithm retrieves the most preferred object, it may cause unnecessary resource drain on the network because too many duplicates are allocated in the vicinity. Therefore, we propose a randomized distributed algorithm that aims at higher scalability and efficiency in a resource-limited network. The pseudo-code of the proposed algorithm is presented in Figure 1, where each node u executes the same procedure to make object allocation decisions.
Figure 2. Pseudo-code description of our data replication algorithm.
In line 1, the procedure is a data collector which periodically collects data from node u’s 1-hop neighborhood.
Each node u maintains a distance vector cost_u of size m, in which each element cost_u(o_i) records the hop-distance to object o_i. During each period t_lb, each node local-broadcasts its distance vector to its 1-hop neighbors. Once node u has collected the distance information from all nodes v∈N_1(u), then for each object o_i, if cost_v(o_i)+1 ≤ cost_u(o_i), the value of cost_u(o_i) is updated to cost_v(o_i)+1. After this operation has been executed k times, if every cost_u(o_i) is less than or equal to k, the query set of all objects is k-coverable; otherwise, some queries cannot be resolved within N_k(u). In order to avoid unnecessary redundancy when replicating objects in cooperation with its neighbors, node u does nothing with probability α, and chooses a candidate object for replicating/dropping with probability (1−α). In lines 6 – 12, the candidates o_r and o_d are chosen uniformly at random (u.a.r.) from the sets R = {o_i : cost_u(o_i) > k} and D = {o_j : cost_u(o_j) = 0}, respectively. If the local memory is full, object o_d will be dropped; otherwise node u will issue a request for replicating object o_r.
As described above, the steps of information exchange and object replicating/dropping are all repeated in a distributed manner. Note that since the dropping candidate is selected without considering neighbors’ states, a node that has reached its stable state may be invoked to execute the algorithm again if a shared object is dropped by some neighbor. However, it is shown in the following sections that all query sets will eventually become k-coverable and the system enters a stable state with high probability (w.h.p.).
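Since the pseudo-code figure is not reproduced here, the following Python sketch gives one possible reading of the steps described above; it is not the authors' implementation. Several points are assumptions: rounds are simulated synchronously, a node whose set R is empty stays idle, only replicas (not innate objects) are dropped, a replication request succeeds immediately, and all function and variable names are hypothetical.

```python
import random

INF = float("inf")

def update_costs(adj, holdings, objects, k):
    """Exchange distance vectors k times (synchronous rounds).

    Afterwards cost[u][o] is the hop-distance from u to the nearest
    holder of o when that distance is at most k, and INF otherwise.
    """
    cost = {u: {o: (0 if o in holdings[u] else INF) for o in objects}
            for u in adj}
    for _ in range(k):
        new_cost = {u: dict(cost[u]) for u in adj}
        for u in adj:
            for v in adj[u]:              # v ranges over the 1-hop neighbors
                for o in objects:
                    if cost[v][o] + 1 < new_cost[u][o]:
                        new_cost[u][o] = cost[v][o] + 1
        cost = new_cost
    return cost

def replication_step(cost_u, innate_u, rep_u, capacity, k, alpha, rng):
    """One allocation decision of a node; mutates rep_u in place."""
    candidates_r = sorted(o for o, c in cost_u.items() if c > k)  # set R
    if not candidates_r:
        return ("idle", None)             # assumption: a covered node stays idle
    if rng.random() < alpha:
        return ("idle", None)             # do nothing with probability alpha
    if len(innate_u) + len(rep_u) >= capacity:
        candidates_d = sorted(rep_u)      # assumption: only replicas are dropped
        if not candidates_d:
            return ("idle", None)
        o_d = rng.choice(candidates_d)
        rep_u.discard(o_d)
        return ("drop", o_d)
    o_r = rng.choice(candidates_r)
    rep_u.add(o_r)                        # stands in for a replication request
    return ("replicate", o_r)

def simulate(adj, innate, capacity, k, alpha=0.5, max_rounds=1000, seed=0):
    """Return (rounds, rep) once every cost is <= k, or (None, rep)."""
    rng = random.Random(seed)
    objects = sorted(set().union(*innate.values()))
    rep = {u: set() for u in adj}
    for rnd in range(max_rounds):
        holdings = {u: innate[u] | rep[u] for u in adj}
        cost = update_costs(adj, holdings, objects, k)
        if all(c <= k for u in adj for c in cost[u].values()):
            return rnd, rep
        for u in adj:
            replication_step(cost[u], innate[u], rep[u], capacity, k, alpha, rng)
    return None, rep

# Path 0 - 1 - 2 - 3 with one innate object at each end.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
innate = {0: {"a"}, 1: set(), 2: set(), 3: {"b"}}
rounds, rep = simulate(adj, innate, capacity=2, k=1)
print(rounds, rep)
```

The number of rounds printed depends on the random seed; the sketch only illustrates that the replicate/drop decisions are made locally from the collected distance vectors.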
4. Stochastic Analysis
In our data replication algorithm, the decisions made by each node depend only on its current state, so the algorithm satisfies the memoryless property and can be treated as a Markov chain with state space Ω (i.e., the set of all configurations). For a randomized algorithm that operates as a Markov chain, one of the prime objectives is to derive its mixing time, that is, how long the algorithm takes to reach one of the legitimate configurations with high probability. Therefore, we use a powerful technique called path coupling [5] to examine the behavior of the proposed randomized algorithm; more specifically, we show an upper bound on the time before the system enters the set L of legitimate configurations. Based on our problem formulation, L consists of all configurations in which the query sets are k-coverable. It is not hard to verify that the configurations in L are strongly connected. Moreover, L comprises the states with non-zero probability in the stationary distribution of the corresponding Markov chain. For brevity, the upper bound c on memory capacity is assumed to be large enough to replicate object(s) in the k-hop neighborhood. Furthermore, the self-loop probability α is set to 1/2 to ensure the aperiodicity of the Markov chain.
The state of each node i is expressed as a set s_i = {o_1, o_2, …, o_m}, where o_j∈{0, 1}, 1 ≤ i ≤ n, 1 ≤ j ≤ m. The case o_j = 1 indicates that object j is in node i's memory. Likewise, o_j = 0 indicates that node i does not hold a replica of object j. Obviously, the number of 1’s appearing in s_i is no more than the memory size c; thus there are C possible states for a node, where $C = \sum_{j=0}^{c} \binom{m}{j}$. If we encode each s_i as an integer, our algorithm can be described by all combinations of s_i. Each possible configuration in Ω is expressed as (s_1, s_2, …, s_n), where s_i∈{0, 1, …, C}. Note that (i) the size of Ω is C^n; (ii) the replicating/dropping decisions made by the participating nodes may cause some movement from one configuration in Ω to another.
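To give a feel for the size of the state space, the sketch below counts the per-node states and the configurations. It assumes that C counts the binary vectors of length m with at most c ones, i.e. C = Σ_{j=0}^{c} (m choose j), which follows from the memory constraint stated above; the function names are hypothetical.

```python
from math import comb

def states_per_node(m, c):
    """Number of binary vectors of length m with at most c ones."""
    return sum(comb(m, j) for j in range(c + 1))

def state_space_size(n, m, c):
    """Number of configurations (s_1, ..., s_n), i.e. C ** n."""
    return states_per_node(m, c) ** n

print(states_per_node(4, 2))      # 1 + 4 + 6 = 11
print(state_space_size(3, 4, 2))  # 11 ** 3 = 1331
```

Even for these tiny parameters the state space grows exponentially in n, which is why a polynomial mixing-time bound is the quantity of interest.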
A Markov chain is rapid-mixing if the (ε-approximate) mixing time is bounded by a polynomial in ln(ε^{-1}) and the size of each configuration in the state space. Due to the lack of space, we only show that the process of the randomized distributed algorithm is rapid-mixing (i.e., it converges in a time-efficient manner) and skip the details of the mechanical proofs (see footnote 1).
Proposition 1: For the proposed algorithm, there exist a subset S of Ω × Ω, an integer-valued metric δ on Ω × Ω taking values in {0, 2, 4, …, 2cn}, and a coupling defined on S, such that for all (X_t, Y_t)∈S, $E[\delta(X_{t+1}, Y_{t+1})] \le \beta \cdot \delta(X_t, Y_t)$, where $\beta = 1 - (m+c)/(2mc)$.
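To see what the contraction factor implies numerically, recall the standard path coupling lemma [5]: if the expected distance contracts by a factor β < 1 and the metric takes values at most D, the mixing time is at most ln(D·ε^{-1})/(1 − β). The sketch below evaluates this with D = 2cn, the largest value of δ in Proposition 1. The function names and the parameter values in the example are illustrative assumptions only.

```python
import math

def contraction_factor(m, c):
    """beta = 1 - (m + c) / (2 m c), as stated in Proposition 1."""
    return 1.0 - (m + c) / (2.0 * m * c)

def path_coupling_bound(n, m, c, eps):
    """Bound ln(D / eps) / (1 - beta) with D = 2 c n (largest value of delta)."""
    beta = contraction_factor(m, c)
    diameter = 2 * c * n
    return math.log(diameter / eps) / (1.0 - beta)

# Hypothetical parameters: n = 10 nodes, m = 10 objects, capacity c = 5.
print(contraction_factor(10, 5))
print(path_coupling_bound(10, 10, 5, 0.01))
```

Because 1 − β = (m+c)/(2mc), the bound grows only logarithmically in n and in ε^{-1}.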
Proposition 2: The mixing time of our algorithm is upper bounded by $\frac{2mc}{m+c}\,\ln(2cn\,\varepsilon^{-1})$ w.h.p.
6. Conclusions and Future Works
In this paper we concentrate on augmenting data accessibility for peer-to-peer data communications over wireless ad hoc networks. In addition to proposing the randomized algorithm, a path-coupling based method was used to verify the rapid-mixing property of state transition dynamics, together with a (loose) upper bound of the convergence time. In the future, we would like to conduct performance evaluation through much more extensive experiments, including the considerations of data updating, radio signal interference, and node mobility.
Furthermore, since wireless communication may suffer from the traditional layered architecture, we hope to achieve further improvement by incorporating cross-layer adaptation.
References
[1] S. Androutsellis-Theotokis and D. Spinellis, "A survey of peer-to-peer content distribution technologies," ACM Computing Surveys, 36(4):335-371, 2004.
[2] F. Sailhan and V. Issarny, "Cooperative Caching in Ad Hoc Networks," Proc. Int. Conf. on MDM, pp. 13-28, 2003.
[3] L. Yin and G. Cao, "Supporting Cooperative Caching in Ad Hoc Networks," IEEE Trans. Mobile Comput., 5(1):77-89, 2006.
[4] T. Hara and S. K. Madria, "Data Replication for Improving Data Accessibility in Ad Hoc Networks," IEEE Trans. Mobile Comput., 5(11):1515-1532, 2006.
[5] R. Bubley and M. Dyer, "Path-coupling: a technique for proving rapid mixing in Markov chains," Proc. of IEEE FOCS, pp. 223-231, 1997.
[6] V. Guruswami, "Rapidly mixing Markov chains: A comparison of techniques," May 2000. (available at http://cs.washington.edu/homes/venkat/pubs/papers.html)
[7] D. Randall, "Mixing," Proc. of IEEE FOCS, 2003.
1 Similar proof techniques can be found in [6] and [7].