

LOCALITY AND RESOURCE AWARE PEER-TO-PEER OVERLAY NETWORKS

(Invited)

Chun-Hung Wu, Kuo Chiang, Ruo-Jian Yu, and Sheng-De Wang

ABSTRACT

Unstructured peer-to-peer (P2P) overlay architectures are attracting more and more attention. In order to solve the topology mismatch problem, many approaches take locality information into account when designing peer-to-peer overlay networks. In this paper, we not only exploit locality but also take resource types into consideration. Taking advantage of data replication, selective search, clustering, and interest groups, we can improve the search performance of unstructured P2P networks. Simulation results show that our algorithm is better than the mOverlay network in the number of messages per search while it maintains almost the same hit ratio and comes with competitive locality properties.

Key Words: peer-to-peer networks, overlay networks, locality aware, resource aware.

*Corresponding author. (Tel: 33663579; Fax: 886-2-23671909; Email: sdwang@ntu.edu.tw)

The authors are with the Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C.

I. INTRODUCTION

Peer-to-peer overlay networks are virtual networks built on top of underlying networks (Doval and O’Mahony, 2003) and can provide various services, such as video streaming and file sharing. Gnutella and Napster are pioneers in peer-to-peer file sharing systems. The two peer-to-peer systems are unstructured, and the allocation of files is completely unrelated to the topology of the overlay networks. They provide a convenient environment for sharing objects or files among a large number of users. Because of the limits of query flooding and the central index service scheme, respectively, Gnutella and Napster both suffer from a scalability problem. Much research has been devoted to developing new locating and searching technologies for peer-to-peer systems. Many peer-to-peer systems take advantage of structured overlay networks, such as distributed hash table (DHT) techniques, to distribute objects evenly. Among these, CAN (Ratnasamy et al., 2001), Chord (Stoica et al., 2001), Pastry (Rowstron and Druschel, 2001), and Tapestry (Zhao et al., 2001) are well-known infrastructures that deploy DHT as the fundamental part. Two neighbors in virtual overlay networks may actually span two distant countries in underlying networks. Furthermore, according to Lv et al. (2002), when nodes frequently join or leave the overlay network, structured peer-to-peer systems take much time to maintain the structure of the overlay network. On the other hand, DHT-based structured networks are not well suited for keyword search, although the search performance of structured peer-to-peer systems is more efficient than that of unstructured peer-to-peer systems.

In unstructured peer-to-peer networks, two classes of methods are used to reduce the bandwidth requirement and improve the efficiency of the flooding mechanism. The first class of methods is based on selective search, including biased search and blind search. Biased search sends messages only to the nodes that are more likely to have the information, instead of transmitting messages to all neighbors. The methods are used in Yang and Garcia-Molina (2002), Ratnasamy et al. (2002), and Lv et al. (2002). Blind search confines the nodes to transmitting messages to some adjacent nodes instead of all adjacent nodes, without using the information of messages or choosing the best path for transmitting. The methods of random walk (Lv et al., 2002) and dynamic query (Xu et al., 2003) belong to blind search. In the second class of methods, nodes make use of caching or replication mechanisms to keep the known objects to enhance the search efficiency. The most common way is uniform index caching (UIC), where the locations of files are recorded in caches once the files are successfully found in a search, so that the files can be found easily later. The method is used by Patro and Hu (2003), Chiang et al. (2007), Cohen and Shenker (2002), Bestavros and Jin (2003), and Wang and Vanninen (2004). Although the selective search and the caching mechanism can improve the search efficiency of peer-to-peer networks, the topology mismatch problem of the overlay networks still exists.

To further improve the efficiency, locality should be considered by multicast or routing algorithms of the application layer when constructing overlay networks (Zhang et al., 2004), (Zhang et al., 2005), (Kobayashi et al., 2005), (Shamsi et al., 2005), (Sun et al., 2006), (Shin et al., 2002), (Xiao et al., 2005). Two common frameworks are based on the structures of trees and meshes, respectively. For example, the Narada protocol (Chu et al., 2000), which is suitable for small overlay networks, is based on meshes. The NICE application layer multicast protocol (Banerjee et al., 2002) and the host multicast tree protocol (HMTP) (Zhang et al., 2002) are based on the structures of trees. Thus, when nodes are added to the overlay networks, they have to notify the node at the highest level. The highest level node becomes a hotspot; if the hotspot fails, the systems fail. The binning scheme (Ratnasamy et al., 2002) uses a set of nodes called landmarks to consider the locality information and thus improve the overall efficiency of the network. Although the method has improved the efficiency, it needs extra information about landmarks and may incur hotspots when the overlay network grows. To construct an overlay network that takes account of the locality of peers, the mOverlay network (Zhang et al., 2004) was proposed to decrease the communication cost between nodes. The mOverlay network uses a dynamic landmark technique to achieve load balance and avoid hotspots. In this paper, we consider not only the locality but also the resource types in an overlay similar to the mOverlay network.

Locality is also considered in (Rowstron et al., 2001) and (Shamsi et al., 2005). In (Rowstron et al., 2001), the nodes with a high hit ratio are put near the nodes that issue queries, but the concept of clustering is not utilized. In this case, the search would incur a lot of messages transmitted in the network. In (Shamsi et al., 2005), similar files are put in the same semantic overlay, and there are several clusters in a semantic overlay. This mechanism is similar to ours. In the aspect of locality, however, TTL values are assigned in (Shamsi et al., 2005) to decide which cluster to join in a semantic overlay. Since the distribution of real underlying networks is not considered, it cannot solve the problem of topology mismatch.

II. OVERLAY NETWORKS WITH THE AWARENESS OF LOCALITY AND RESOURCE TYPES

The concept of locality can be viewed as the distance between two peers, network latency, round trip time, or the minimum bandwidth between two nodes. To solve the problem of mismatch (Liu et al., 2004), we hope that adjacent nodes in the underlying network are also adjacent in the overlay network. As an example, Fig. 1(a) (Zhang et al., 2004) shows an overlay network with consideration of locality, resulting in fewer cross-group communications, while Fig. 1(b) does not take the locality into account, leading to many cross-group communications.

The concept of interest groups (Chen et al., 2006) can play an important role in designing peer-to-peer systems. Everyone has his own interests, and the communities in the Internet also reflect this fact. If we can place files with the same attribute, or files of interest to the same group of users, in the same cluster, the users can easily find the files by searching for them within the same cluster. The concept is shown in Fig. 2(a) for general resource types and in Fig. 2(b) for types of songs. It is worth noting that the resource types themselves present the properties of fractals in both coarse and fine classifications of resources. The proposed overlay network is a two-level scheme, called LARO networks, with the concept of grouping by factors of data replication, clustering, and interest groups. Because we take both locality and resource types into consideration in our algorithm, there may be some files added to a cluster because of locality, although they do not match the resource type of the cluster.

Fig. 1 (a) An overlay network concerning locality and (b) an overlay network built randomly

Fig. 2 (a) The groups of resource types (Movies, Ebooks, Songs, The unpopular) and (b) the groups of song types (Rocks, Pop, Jazz, Country)

The proposed LARO overlay network is a two-level scheme that makes use of the concepts of locality and resource awareness to construct the peer-to-peer network. The upper level, level 1, is composed of clusters or groups, and the lower level, level 0, is composed of nodes. Nodes in the same cluster share some similarity, either locality or resource type. There is a group leader in each cluster. Group leaders manage and keep all the information of the nodes, including peer lists, distances between peers, and the resource type number of the cluster. Each file in a node belongs to one resource type; for example, the files may be of the types movies, music, or programs. Each node is assigned a resource type number determined by the majority of the resource types of its files. For example, if there are three movie files, one music file, and one program in a node, we assign the resource type “movies” as the node’s resource type number. The resource type number of a cluster is the same as the group leader’s resource type number. In general, a group leader might be the first node to form the cluster. To further promote the search efficiency of the proposed peer-to-peer network, we also make use of the concept of data replication. Assuming there is a node with N files and every file has its resource type number, we do data replication during the locating process that constructs the overlay. Once a node has been added to a group, we randomly select zero to N-1 files and replicate them to the clusters with the same resource type number.
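The majority rule and the replication step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `node_resource_type` and `pick_replicas` are hypothetical, resource types are represented as plain strings, and ties in the majority vote are broken by first occurrence, which the paper does not specify.

```python
import random
from collections import Counter


def node_resource_type(file_types):
    """Return the resource type held by the majority of a node's files.

    Assumption: ties are broken by first occurrence (Counter ordering).
    """
    if not file_types:
        raise ValueError("node has no files")
    return Counter(file_types).most_common(1)[0][0]


def pick_replicas(files, rng=None):
    """Randomly choose between 0 and N-1 of a node's N files to replicate."""
    rng = rng or random.Random()
    if not files:
        return []
    how_many = rng.randint(0, len(files) - 1)  # inclusive on both ends
    return rng.sample(files, how_many)


# Example from the text: three movies, one music file, one program.
files = ["movies", "music", "movies", "program", "movies"]
node_type = node_resource_type(files)
replicas = pick_replicas(files, random.Random(0))
```

Here `node_type` is "movies", matching the example in the text, and `replicas` always contains fewer files than the node holds.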

In this paper, clusters or groups are composed of nodes with either close locations or the same resource type. A locating process is required to help a node find a group to join. We add a node to a group by first considering its resource type and then its locality in the underlying network. This forms a network of groups, with each group mainly comprising the same type of resources. We present the pseudocode of our algorithm as follows.

Group_Locating(host) {
    Boothost = Get_Boot_Host(RP);
    Group = Get_Group(Boothost);
    do {
        if (Meet_Resource_Criterion()) return Group;
        if (Meet_Locality_Criterion(host, Group)) return Group;
        CandidateList = Get_Neighbors(Group);
        DistanceList = Measure_Distance(Group, CandidateList);
        (MinDis, MinGroup) = Min(DistanceList);
        Group = MinGroup;
        if (Meet_Stop_Criterion()) return NewGroup;
    } while (true)
}

As in the mOverlay network (Zhang et al., 2004), the locating process relies on a node called the Rendezvous Point (RP) that has all the information of the overlay network. When a node wants to enter the overlay network, it connects to the RP first, and the RP responds with a randomly chosen bootstrapping host (Cramer et al., 2004). The method is called the dynamic landmark location algorithm (Zhang et al., 2004), which randomly provides the bootstrapping hosts in order to balance the load of nodes in the network. Then, the locating procedure checks whether the resource type number of the node is the same as that of the cluster that the node wants to join. If they have the same resource type number, we add the node to the cluster directly. If the resource type numbers are different, we consider the cluster’s locality. If the distance between the node and the cluster’s group leader is less than a threshold value, the node joins the cluster. If neither of the two conditions is satisfied, the locating process enters a recursive procedure. The group and its neighbors are put in an array, and we then find its nearest neighbor considering the resource and locality conditions. The recursive process terminates after a fixed number of trials. In our simulation, we set ten as the upper bound; namely, the node forms a new cluster and becomes its leader if the locating procedure is repeated more than ten times. The node also forms a new cluster if no neighbor is found in the locating process.
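The locating procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Group` class and `locate_group` function are hypothetical, network distance is replaced by the absolute difference of one-dimensional coordinates, and the next candidate is taken to be the neighbor group whose leader is closest to the joining node. A return value of `None` stands for "form a new group".

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Group:
    leader_pos: float   # hypothetical 1-D coordinate standing in for network distance
    resource_type: str
    neighbors: list = field(default_factory=list)


def locate_group(node_pos, node_type, boot_group, threshold, max_tries=10):
    """Return the group a node should join, or None if it should form a new one."""
    group = boot_group
    for _ in range(max_tries):
        # Resource criterion is checked first.
        if group.resource_type == node_type:
            return group
        # Locality criterion: distance to the group leader below the threshold.
        if abs(group.leader_pos - node_pos) < threshold:
            return group
        # No neighbor to try: the node forms a new group.
        if not group.neighbors:
            return None
        # Move on to the neighbor group nearest to the joining node.
        group = min(group.neighbors, key=lambda g: abs(g.leader_pos - node_pos))
    return None  # stop criterion reached after max_tries trials


# Three mutually connected groups with hypothetical positions and types.
g1, g2, g3 = Group(0.0, "movies"), Group(50.0, "songs"), Group(100.0, "ebooks")
g1.neighbors, g2.neighbors, g3.neighbors = [g2, g3], [g1, g3], [g1, g2]

joined = locate_group(95.0, "programs", g1, threshold=10.0)
```

In this toy setup the node at position 95.0 matches no resource type, so it is passed from group 1 to its nearest neighbor, group 3, and joins there because the locality criterion is met.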

In Fig. 3, we explain the procedure of adding a node to the overlay network. When node A wants to join the overlay network, it first contacts the RP for some nodes to try. In each try, node A first checks whether the condition of resource types is met and then whether the locality condition is satisfied. If yes, node A is added to the group. If no, we find the group’s nearest neighbor and repeat the same procedure. We can see from Fig. 3 that node A traverses from group 1 to group 4. If group 4 meets the resource or locality conditions, node A is added to group 4, and the locating process ends.


In the locating process, we first consider the type of resources; if the condition is not satisfied, we then consider the condition of locality. The question is how to set the priority of the two conditions. If we take the condition of distance into consideration first, many files that do not have the same resource type number may be incorporated into clusters, because nodes are added to a cluster whenever their distances are less than a threshold value. In this case, when users search for files in a cluster, they may find too many files with different resource type numbers, causing a lot of mismatches in searching. If we take the type of resource into consideration first, this problem is solved. However, there is no perfect strategy: if we consider the type of resources first, we get clusters with longer distances between nodes. In our simulation, we find that the average distance does not increase that much. So, we decided to consider the types of resource first and then distance.

In summary, we propose using three conditions that make peers join a certain group, namely resource types, locality, and replication. First, a peer’s resource type number is the same as that of the group. Second, the average distance between a peer and the group is smaller than a certain threshold. Third, when a peer joins the group with the same resource type number, it also randomly selects other groups with the same resource type number to join.

There are three conditions under which a new group is formed. First, the first peer enters the overlay network. Second, a peer executes the locating process more than a specific number of times. Third, a peer moves to an isolated group with a different resource type number and the average distance between them is over a certain threshold. In forming a new group, there needs to be a new resource type number, and M neighbors must be found by running the locating process several times.

III. SEARCH ALGORITHMS FOR THE PROPOSED OVERLAY

1. Search Algorithms

The method of selective search was originally used to reduce the requirement of bandwidth and the number of messages during flooding. In this paper, the proposed search algorithm is based on the random walk algorithm (Kobayashi et al., 2005). Because our overlay architecture is a two-level framework, common searching algorithms cannot satisfy our needs. The original random walk algorithm uses a boot host to send a fixed number of walkers to start the search in parallel. It randomly selects the next peers to send the query to, with each peer having the same probability of being selected. Since our algorithm is based on a two-level architecture as shown in Fig. 4, the proposed random walk algorithm first takes a fixed number of walks in the first level, level 0, and then takes another fixed number of walks in the second level, level 1. The groups in the first level, level 0, have the same resource type number as the file; the groups in the second level, level 1, are those with different resource type numbers. For example, if we want to search for a song, we first search among Song groups, which are in the first level, level 0. If we cannot find the required song among them, we search among the other groups, which are in the second level, level 1.

As an example, we take an F-walker random walk and set F = 10. When a peer wants to search for a file, it releases 10 walkers to groups in the first level. Each walker randomly selects a boot host in a group and randomly takes two steps. The search stops if the required file is found at this stage; if not, the peer releases another 10 walkers to groups in the second level, and each walker again randomly selects a boot host in a group and randomly takes two steps.
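The two-level search can be sketched as follows. This is a simplified illustration rather than the simulator used in the paper, and it rests on several assumptions: a group is modeled as a list of peers, a peer as a set of file names, each hop moves to a uniformly chosen peer of the same group (no explicit neighbor links), and one message is counted per peer contacted. The names `walker` and `two_level_search` are hypothetical.

```python
import random


def walker(groups, wanted, steps, rng):
    """One walker: contact a random boot host in a random group, then hop `steps` times.

    Returns (found, messages_sent).
    """
    group = rng.choice(groups)
    messages = 0
    for _ in range(steps + 1):      # boot host plus `steps` hops
        peer = rng.choice(group)
        messages += 1
        if wanted in peer:
            return True, messages
    return False, messages


def two_level_search(level0_groups, level1_groups, wanted,
                     walkers=10, steps=2, seed=None):
    """Search same-type groups (level 0) first, then the other groups (level 1)."""
    rng = random.Random(seed)
    total_messages = 0
    for groups in (level0_groups, level1_groups):
        if not groups:
            continue
        found_here = False
        for _ in range(walkers):    # walkers of one level run "in parallel"
            found, sent = walker(groups, wanted, steps, rng)
            total_messages += sent
            found_here = found_here or found
        if found_here:
            return True, total_messages
    return False, total_messages


# Toy overlay: one song group at level 0, one movie group at level 1.
song_groups = [[{"song_a"}, {"song_b"}]]
movie_groups = [[{"movie_x"}]]
hit, cost = two_level_search(song_groups, movie_groups, "movie_x", seed=1)
```

A query that is answered at level 0 never pays for the level 1 walkers, which is why a high level 0 hit ratio translates into fewer messages per search.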

We can view an overlay network as a graph. Each node in the graph represents a group or a peer in the overlay network, and two nodes connected by links are neighbors. Assume that an overlay network has N nodes in total and each node has M neighbors on average; in other words, the average out-degree of a node is M. The nearest group can be found in an overlay network by looking at nodes and edges in the graph. The proof of the following formula about the average distance in hops, d, between two nodes can be found in (Zhang et al., 2004):

Fig. 3 Locating process

Fig. 4 (panel labels only: Level 0, same resource type; Level 1, different resource type)

$$d < \log_M N + 3$$

Thus, the average distance in hops between two nodes is $O(\log_M N)$; therefore, a node can find its nearest group in $O(\log N)$ communications.
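To get a feeling for the size of this bound, it can be evaluated numerically. The helper below is a hypothetical convenience function, and the input values in the example are illustrative choices, not results from the paper.

```python
import math


def hop_bound(total_nodes, avg_degree):
    """Upper bound log_M(N) + 3 on the average hop distance between two nodes."""
    if total_nodes < 1 or avg_degree <= 1:
        raise ValueError("need total_nodes >= 1 and avg_degree > 1")
    return math.log(total_nodes, avg_degree) + 3


# Illustrative values only: N = 4096 nodes with M = 8 neighbors on average.
bound = hop_bound(4096, 8)
```

Because the bound grows logarithmically in N, doubling the number of nodes adds only $\log_M 2$ hops to it.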

Connectedness of nodes or connectedness of groups is an important issue in a network (Si and Li, 2005). A query in a peer-to-peer network might fail if there are isolated nodes in overlay networks; in other words, we cannot find a path from a start node that is an isolated node to any other group. To solve this problem, the proposed LARO algorithm finds some groups as neighbor groups when forming a new group. This reduces the probability of forming a disconnected graph.

As compared to mesh-based and tree-based architectures, the proposed two-level scheme is much more efficient to maintain. For an N-node peer-to-peer network, mesh-based overlay networks (Hsiao and Liao, 2006) suffer from an $O(N^2)$ communication load between two peers. For tree-based overlay networks, the tree root or top-level nodes would be overloaded when peers make queries in the overlay network frequently.

2. Average Distance Analysis

We are interested in knowing the average distance for the LARO networks. We assume that there are N groups under the LARO network. Each group, on average, contains n nodes and has M group neighbors. Each node in a group has m neighbors on average. We use $D_o$ to represent the average distance between two groups, and $D_i$ to represent the average distance between two nodes in a group. Then, there are $N \cdot n \cdot m/2$ connections inside groups and $N \cdot M/2$ links between groups. We can get the average distance $\bar{D}$ under LARO networks, the procedure being the same as for mOverlay (Zhang et al., 2004):

$$\bar{D} = \frac{D_i \cdot \frac{N \cdot n \cdot m}{2} + D_o \cdot \frac{N \cdot M}{2}}{\frac{N \cdot n \cdot m}{2} + \frac{N \cdot M}{2}} = \frac{D_i \cdot n \cdot m + D_o \cdot M}{n \cdot m + M}. \tag{1}$$

Assuming that the intra group node degree m and the inter group degree M are related by $m = \lambda M$, we get

$$\bar{D} = \frac{D_i \cdot \lambda \cdot n + D_o}{\lambda \cdot n + 1}. \tag{2}$$

Furthermore, by assuming $\lambda n \gg 1$, we have

$$\bar{D} \cong D_i. \tag{3}$$

From Eq. (3), we can see that in the LARO overlay the average distance is close to the intra group distance $D_i$ under some conditions. This is reasonable because the condition $\lambda n \gg 1$ can easily be met by increasing the intra group degree factor, $\lambda$, and maintaining an appropriate average number of nodes in groups, n.
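A quick numerical check of Eqs. (1) to (3) may help. The sketch below uses hypothetical function names and made-up input values (they are not taken from the simulation); it only confirms that Eq. (1) and Eq. (2) agree when m = λM and that the result approaches the intra group distance when λn is large.

```python
def average_distance(d_intra, d_inter, n, m, M):
    """Eq. (1): link-weighted mean of intra- and inter-group distances."""
    return (d_intra * n * m + d_inter * M) / (n * m + M)


def average_distance_lambda(d_intra, d_inter, n, lam):
    """Eq. (2): the same quantity expressed with m = lam * M."""
    return (d_intra * lam * n + d_inter) / (lam * n + 1)


# Made-up values: D_i = 10, D_o = 100, n = 50 nodes per group, M = 3, m = 6 (lambda = 2).
via_eq1 = average_distance(10.0, 100.0, n=50, m=6, M=3)
via_eq2 = average_distance_lambda(10.0, 100.0, n=50, lam=2.0)
```

With λn = 100 in this example, both expressions evaluate to 1100/101, which is within about 9% of the intra group distance of 10, in line with Eq. (3).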

On the other hand, we would like to compare the average intra group distances of LARO and mOverlay. Assume the average intra- and inter-group distances of mOverlay are $D_{in}$ and $D_{ou}$, respectively. It has been shown in Zhang et al. (2004) that if the number of groups N satisfies $N \gg 1$, the average distance of a randomly connected overlay, $D_{rco}$, can be approximated as:

$$D_{rco} \cong D_{ou}. \tag{4}$$

In a group of the LARO network, we assume that the ratio of the number of peers that have the same resource type number but do not satisfy the locality criterion to the number of peers that meet the locality criterion is r/s. Thus, for the intra group distance $D_i$ in LARO, we have

$$D_i = \frac{r \cdot D_{ou} + s \cdot D_{in}}{r + s}. \tag{5}$$

From Eq. (5), it follows that, for r > 0 and s > 0 (and $D_{in} < D_{ou}$),

$$D_{in} < D_i < D_{ou}. \tag{6}$$

If r = 0, that is, if in any group of LARO there is no peer that does not meet the locality criterion, the LARO network reduces to an mOverlay network. Thus

$$D_i = D_{in} \quad \text{if } r = 0. \tag{7}$$

The result shows that if $r \ll s$, the intra group distance of groups of LARO is close to that of mOverlay.
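Eq. (5) is a weighted average, which a few lines of Python make concrete. The function name and the numeric inputs below are hypothetical and serve only to illustrate Eqs. (5) to (7).

```python
def intra_group_distance(d_in, d_ou, r, s):
    """Eq. (5): mix of mOverlay inter-group (d_ou) and intra-group (d_in) distances."""
    if r < 0 or s < 0 or r + s == 0:
        raise ValueError("r and s must be non-negative and not both zero")
    return (r * d_ou + s * d_in) / (r + s)


# Made-up distances: D_in = 10, D_ou = 100.
mostly_local = intra_group_distance(10.0, 100.0, r=1, s=9)   # r << s
all_local = intra_group_distance(10.0, 100.0, r=0, s=5)      # r = 0, Eq. (7)
```

With one non-local peer for every nine local ones, the intra group distance is 19, still much closer to $D_{in}$ than to $D_{ou}$; with r = 0 it equals $D_{in}$ exactly.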

3. Discussion

In this paper, we propose applying the grouping concept to unstructured P2P networks. It is noted that the grouping concept can also be applied to distributed hash table (DHT) based P2P systems such as CAN and Chord. Although DHT-based networks provide efficient platforms with guaranteed searching performance, the locality is destroyed since DHTs use a uniform hash function to evenly distribute nodes and objects in the virtual space. The hierarchical architectures based on DHTs, such as HIERAS (Xu et al., 2003) and Canon (Ganesan et al., 2004), can significantly improve locality by grouping nodes geographically. HIERAS combines a hierarchical structure with DHT-based routing algorithms like Chord. Besides the biggest ring, it groups topologically adjacent nodes into smaller rings. In a two-layer HIERAS system, for example, P is the Layer-1 ring, which may contain three Layer-2 rings. The nodes within a ring in the lower layer are more adjacent topologically. HIERAS uses the Distributed Binning Scheme (Ratnasamy et al., 2002) to determine which ring a node belongs to. A set of landmark nodes is chosen for dividing a system into disjoint bins. When a node joins HIERAS, it measures the distance between the landmark nodes and itself. Nodes that get the same order are organized into a smaller ring. In the smaller ring, nodes create proper links and table entries according to the system’s DHT structure. Compared to Grapes, HIERAS takes the landmark scheme, which adds only extra tables and links in each hierarchical layer but not independent DHTs. Therefore, the loads of the nodes in HIERAS are all the same because consistent hashing produces good load balancing. When a node requests an object, it looks up the smallest ring’s table first. This method exploits locality because the nodes in the smaller ring are closer than those in the bigger ring. HIERAS can also reduce hops on DHT-based systems significantly.

Canon proposes a generic architecture that is based on DHTs with hierarchical structures. It inherits the load balancing offered by a flat design. Its key idea is recursive routing, and it refers to the internal nodes in the hierarchy as domains. The domains in upper layers merge their children in lower layers, in a way that resembles the DNS system. The Canon principle can be applied to many different DHTs. For example, Canon can transform Chord into its Canonical version, which is called Crescendo. Upper Crescendo rings are obtained by merging lower Chord rings or Crescendo rings. When two rings are merged, nodes keep their original links but add some new links.

IV. SIMULATION

We use four types of file distributions to simulate the file distribution in an overlay network, namely, uniform distributions, dichotomy distributions, Zipf distributions, and Zipf-Mandelbrot distributions. The differences between these four file distributions are as follows. With a uniform distribution, the files or objects to be shared have the same probability of being put into the groups in P2P networks. The dichotomy distribution describes the phenomenon that file distributions follow a power law up to a point and decay exponentially beyond that point. We use it here to model a file distribution that is sharp in some items and decays quite quickly. The Zipf distribution states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus, the most frequent word occurs twice as often as the second most frequent word; the second most frequent word occurs twice as often as the fourth most frequent word; etc. The term is also used more broadly to refer to power-law probability distributions. Zipf’s law states that for a population of N elements, the frequency of the element of rank k, f (k; s, N), is given by:

$$f(k; s, N) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}, \tag{8}$$

where N is the number of files, k is their rank, s is the value of the exponent characterizing the distribution, and we use the classic version of Zipf’s law, which corresponds to s = 1, in the simulation.
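Eq. (8) translates directly into code. The sketch below is an illustration with hypothetical function names, not the generator used in the simulation; it builds the Zipf probabilities for N ranks and draws file ranks from them with the standard library.

```python
import random


def zipf_pmf(num_items, s=1.0):
    """Eq. (8): probability of each rank k = 1..num_items under Zipf's law."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    weights = [1.0 / k ** s for k in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_ranks(pmf, count, seed=None):
    """Draw `count` ranks (1-based) according to the given probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(1, len(pmf) + 1), weights=pmf, k=count)


pmf = zipf_pmf(15000, s=1.0)       # 15000 files, classic Zipf with s = 1
requests = sample_ranks(pmf, 10, seed=42)
```

For s = 1 the ratio `pmf[0] / pmf[1]` is 2, which is the "twice as often" property stated in the text.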

The Zipf-Mandelbrot distribution is also often used to model file distributions. Zipf-Mandelbrot is a generalized version of the Zipf distribution with one more parameter. The following equation defines the probability of receiving a k-th rank object in N objects.

$$P(k) = \frac{1}{H_{N,\alpha,q} \cdot (k + q)^{\alpha}}, \qquad H_{N,\alpha,q} = \sum_{i=1}^{N} \frac{1}{(i + q)^{\alpha}}, \qquad q \ge 0, \tag{9}$$

where α is a skewness factor and q is a plateau factor. In Saleh and Hefeeda (2006), α is chosen in [0.4, 0.7] and q in [5, 60]. The parameter q is called the plateau factor because it determines the plateau shape near the left-most part of the distribution. The higher the value of q, the flatter the head of the distribution. When q = 0, the Zipf-Mandelbrot distribution degenerates to a Zipf distribution with a skewness factor α. Higher values of α mean that the difference in ranking distribution is more pronounced. In our experiments, we set α = 1 and q = 60 to model a distribution in between the uniform distribution and the Zipf distribution.
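The effect of the plateau factor can be seen by evaluating Eq. (9) directly. The function below is a hypothetical helper written for illustration; the comparison uses the parameter values quoted in the text (α = 1, q = 60) against the Zipf case q = 0.

```python
def zipf_mandelbrot_pmf(num_items, alpha, q):
    """Eq. (9): P(k) = 1 / (H * (k + q)**alpha) for ranks k = 1..num_items."""
    if num_items < 1 or q < 0:
        raise ValueError("need num_items >= 1 and q >= 0")
    weights = [1.0 / (k + q) ** alpha for k in range(1, num_items + 1)]
    h = sum(weights)                      # normalizing constant H_{N, alpha, q}
    return [w / h for w in weights]


zipf_like = zipf_mandelbrot_pmf(15000, alpha=1.0, q=0)    # plain Zipf
flattened = zipf_mandelbrot_pmf(15000, alpha=1.0, q=60)   # setting used in the text

# Ratio between the two most popular items: 2 for Zipf, 62/61 with q = 60.
head_ratio_zipf = zipf_like[0] / zipf_like[1]
head_ratio_flat = flattened[0] / flattened[1]
```

The much smaller head ratio for q = 60 shows the flattened head that places this setting between the uniform and the Zipf distributions.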

We add 50000 peers one by one into the network to form an overlay network, set the number of files to 15000, and compare the properties of locality, messages per search, the hit ratio, and the hit ratio in level 0 for the four file distributions. The locality is modeled as an index δ that denotes the improvement ratio over the randomly connected overlay and is defined as follows:

$$\delta = 1 - \frac{d_{LARO}}{d_{RCO}},$$

where $d_{LARO}$ denotes the average distance under the LARO network and $d_{RCO}$ is the average distance under the randomly connected overlay (RCO) network. The simulation results are shown in Table 1. Compared to the RCO network, the LARO network has δ > 80% for the uniform, the Zipf, and the Zipf-Mandelbrot distributions, while for the dichotomy distribution the LARO network gets only about 54% improvement. The reason is that there are more duplicate files under the dichotomy distribution than under the other three file distributions; therefore, a node has more chance to join a group with the same resource type number instead of considering locality, which causes the average distance to increase.

After running 10000 times, the search algorithm using the modified random walk shows a clear trend in the average number of messages per search. The trend coincides with the popularity modeling capability of the file distribution we use in overlay networks. The number of duplicate files increases when there are increasing numbers of popular objects in the network. The popularity modeling capabilities, from high to low, are those of the dichotomy, the Zipf, the Zipf-Mandelbrot, and the uniform distributions, respectively, and the average numbers of messages per search, from low to high, are ranked in the same order. Note that using the dichotomy distribution produces more replicated files for more popular objects, so it incurs fewer messages per search than using the other three file distributions. In this case, we can almost always find the queried file in level 0, as can be seen from the fact that the number of messages per search is far below that of the other three distributions. In the proposed two-level random walk search algorithm as shown in Fig. 4, we make queries for the required file in groups of level 0 with the same resource type number. If the file is not found in level 0, we search in groups of level 1 with different resource type numbers. As a result, the search time and messages per search can be reduced if we can find the needed file directly in level 0 instead of level 1. From Table 1, almost all files can be found in level 0 when using the dichotomy distribution; 60% of files can be found in level 0 when using the Zipf distribution. Although only 10.23% and 13.23% of files can be found in level 0 for the uniform distribution and the Zipf-Mandelbrot distribution, respectively, more than 95% of files can be found in level 1 when using either one of the two distributions.

In addition to simulating the LARO network with 50000 peers, we also simulated LARO with 100000 peers in order to compare the simulation results after increasing the size of the overlay network. Comparing Table 1 and Table 2, we can see that even though the size of the overlay network increases, the same results about locality, the messages per search, the hit ratio, and the hit ratio in level 0 are obtained; however, the more peers in the overlay network, the more messages per search are incurred for a query.

To compare the performance of the LARO network with that of the mOverlay network, we redo the same experiments for the mOverlay network. The results are shown in Table 3 and Table 4. To easily compare the results of Table 1 and Table 3, we present the charts shown in Fig. 5 to Fig. 7 for messages per search, the hit ratio, and the locality improvement ratio, respectively.

In Fig. 5, we observe that LARO uses far fewer messages per search than mOverlay for all file distributions. It is clear that the LARO network for the dichotomy distribution incurs many fewer messages per search than all the others. The results are closely related to the sharp popularity modeled with the dichotomy distribution. In this case, there exist a lot of replications in the LARO network for the popular objects. It is noted that the dichotomy distribution describes an extreme case in which many objects fall into the same popular groups, so that the grouping is dominated by the resource criterion and the locality condition is hardly considered in the join process.

Table 1 Experiment results of the LARO networks with 50000 peers

Distribution      Messages per search   Locality improvement ratio   Hit ratio   Hit in Level 0   Response time (µs)
Uniform           81991                 89.2%                        95.8%       10.3%            304773
Zipf-Mandelbrot   61528                 89.4%                        97.5%       13.3%            93105
Dichotomy         58                    53.5%                        99.9%       99.6%            4592
Zipf              24705                 80.3%                        97.6%       57.4%            84939

Table 2 Experiment results for the LARO networks with 100000 peers

Distribution      Messages per search   Locality improvement ratio   Hit ratio   Hit in Level 0   Response time (µs)
Uniform           171235                89.3%                        95.1%       11.8%            633444
Zipf-Mandelbrot   123671                87.2%                        97.0%       13.6%            385106
Dichotomy         60                    53.4%                        99.9%       99.6%            6902

As for the hit ratio, Fig. 6 shows that all methods can achieve above 95%. However, it is interesting to note that the mOverlay network has a slightly higher hit ratio than the LARO network. The result may be due to the fact that groups of LARO are larger than those of mOverlay, which makes it more difficult to find an object in the LARO network under the same search parameters. In turn, the larger group size of the LARO network may come from the fact that the locating process of the LARO network considers the resource types first and terminates once the resource types are matched.

Table 3 Experiment results for the mOverlay network with 50000 peers

Distribution     Messages per search  Locality improvement ratio  Hit ratio  Hit in Level 0  Response time (µs)
Uniform          94505                90.2%                       95.7%      -               415218
Zipf-Mandelbrot  75583                90.2%                       97.5%      -               160850
Dichotomy        223                  90.2%                       99.9%      -               13109
Zipf             38147                90.3%                       98.5%      -               153617

Table 4 Experiment results for the mOverlay networks with 100000 peers

Distribution     Messages per search  Locality improvement ratio  Hit ratio  Hit in Level 0  Response time (µs)
Uniform          332962               90.2%                       95.9%      -               802241
Zipf-Mandelbrot  206830               90.3%                       97.3%      -               470730
Dichotomy        312                  90.2%                       99.9%      -               34677
Zipf             91218                90.4%                       98.4%      -               223454

Fig. 5 Comparison of messages per search between mOverlay and LARO networks

Fig. 6 Comparison of hit ratios between the LARO and the mOverlay network for different file distributions

Figure 7 shows the locality improvement ratios of the LARO and mOverlay networks with respect to the RCO network. The locality improvement ratios of the LARO networks are above 80% for all file distributions except the dichotomy distribution. Those of the mOverlay network are about 90% for all file distributions, because the mOverlay network considers only locality and the file distribution has no effect on it. It is interesting to note that for the uniform and the Zipf-Mandelbrot distributions the LARO networks achieve locality improvement ratios of almost 90%, nearly the same as the mOverlay networks. This result is explainable, since the locating process of the LARO network considers resource types first and then locality: when the resource types are nearly uniformly distributed, the resulting network is shaped mainly by locality.
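To make the metric concrete, the following sketch computes an improvement ratio from two average distances. It assumes that δ is the relative reduction of the average distance with respect to the randomly connected overlay, that is, δ = (Drco - D̄) / Drco. This form is only a reading of the nomenclature and of the caption of Fig. 7, and the distances used in the example are hypothetical.

```python
def improvement_ratio(d_rco: float, d_overlay: float) -> float:
    """Relative reduction of the average peer-to-peer distance versus the RCO.

    ASSUMPTION: delta = (D_rco - D) / D_rco, where D is the average distance
    in the overlay under test and D_rco that of a randomly connected overlay.
    """
    if d_rco <= 0:
        raise ValueError("d_rco must be positive")
    return (d_rco - d_overlay) / d_rco


if __name__ == "__main__":
    # Hypothetical average distances, in arbitrary units.
    print(f"{improvement_ratio(d_rco=200.0, d_overlay=20.0):.1%}")  # prints 90.0%
```

Under this reading, a ratio near 90% means that the average distance between two peers is about one tenth of that in the randomly connected overlay.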

We also examine the average node degrees of the LARO and mOverlay networks. Fig. 8 and Fig. 9 show the average node degrees and the response time, respectively, for the LARO and mOverlay networks with 50000 peers. The average node degree reflects the number of closely related objects or duplicate files. Different file distributions cause different numbers of replicated files in the LARO network. The results show that the average node degrees are 8.62, 6.82, 6.26, and 6.23 for the dichotomy, Zipf, Zipf-Mandelbrot, and uniform distributions, respectively. This is consistent with what the file distributions imply. The average node degree of the mOverlay network is 4.4 for all distributions because it is not aware of resource types when constructing an overlay network. The response times are also consistent with the observed numbers of messages: the more pronounced the popularity of a distribution is, the faster a query is answered.
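For reference, the average node degree of an overlay can be obtained from its adjacency lists. The sketch below is a minimal illustration on a hypothetical four-peer overlay with undirected links; the function name and the peer labels are assumptions and are not taken from the simulator.

```python
from typing import Dict, Set


def average_degree(adjacency: Dict[str, Set[str]]) -> float:
    """Mean number of neighbors per peer in an undirected overlay graph."""
    if not adjacency:
        return 0.0
    return sum(len(peers) for peers in adjacency.values()) / len(adjacency)


# Hypothetical overlay: every link is listed at both of its endpoints.
overlay = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    print(average_degree(overlay))
```

Because every undirected link is counted at both endpoints, the value equals twice the number of links divided by the number of peers.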

V. CONCLUSION

We have considered both locality and resource awareness in constructing peer-to-peer overlay networks. We define resource awareness as the knowledge of resource types. It is interesting to note that the resource types themselves exhibit fractal properties in both coarse and fine classifications of resources. The proposed overlay network, called the LARO network, is a two-level scheme that groups peers by the ordered factors of locality clustering, interest groups, and data replication. The modified random walk search algorithm achieves a high query performance in the proposed LARO networks. Simulation results show that our algorithm is better than the mOverlay network in the number of messages per search while it maintains almost the same hit ratio and comes with competitive locality properties.

NOMENCLATURE

d average distance in hops

Din average intra-group distances of an mOverlay

Dou average inter-group distances of an mOverlay

Di average intra-group distances of an LARO network

Do average inter-group distances of an LARO network

D̄ the average distance under LARO networks

Drco the average distance of a randomly connected overlay

δ the improvement ratio

N total nodes or groups of a network

M average number of neighbors

ACKNOWLEDGEMENT

The work is partially supported by a research grant from the National Science Council, Taiwan, under contract number NSC 95-2221-E-002-100-MY3.

Fig. 7 Improvement of average distance between two peers for both LARO and mOverlay networks with respect to randomly connected overlay

Fig. 8 Average degree of peers of the LARO and mOverlay networks for different file distributions

Fig. 9 Average response time (µs) per query of the LARO and mOverlay networks for different file distributions

REFERENCES

Banerjee, S., Bhattacharjee, B., and Kommareddy, C., 2002, “Scalable Application Layer Multicast,” Proceedings of ACM SIGCOMM, August 19-23, 2002, Pittsburgh, Pennsylvania, USA, pp. 205-217.

Bestavros, A., and Jin, S., 2003, “OSMOSIS: Scalable Delivery of Real-Time Streaming Media in Ad-Hoc Overlay Networks,” Proceedings of the 23rd International Conference on Distributed Computing Systems Workshops, IEEE Computer Society, Vienna, Austria, pp. 214-219.

Chen, W.-T., Chao, C.-H., and Chiang, J.-L., 2006, “An Interest-Based Architecture for Peer-to-Peer Network Systems,” Proceedings of the 20th International Conference on Advanced Information Networking and Applications (AINA’06), Vienna, Austria, Vol. 1, pp. 707-712.

Chu, Y.-H., Rao, S. G., and Zhang, H., 2000, “A Case for End System Multicast,” Proceedings of ACM SIGMETRICS, pp. 1-12.

Cohen, E., and Shenker, S., 2002, “Replication Strategies in Unstructured Peer-to-Peer Networks,” SIGCOMM Computer Communication Review, Vol. 32, No. 4, pp. 177-190.

Cramer, C., Kutzner, K., and Fuhrmann, T., 2004, “Bootstrapping Locality-Aware P2P Networks,” Proceedings of IEEE International Conference on Networks, Singapore, pp. 357-361.

Doval, D., and O’Mahony, D., 2003, “Overlay Networks: A Scalable Alternative for P2P,” IEEE Internet Computing, Vol. 7, No. 4, pp. 79-82.

Ganesan, P., Gummadi, K., and Garcia-Molina, H., 2004, “Canon in G Major: Designing DHTs with Hierarchical Structure,” Proceedings of International Conference on Distributed Computing Systems, Hachioji, Tokyo, Japan, pp. 263-272.

Hsiao, H.-C., and Liao, H., 2006, “The Peering Problem in Tree-Based Master/Worker Overlays,” Proceedings of International Grid Pervasive Computing Conference, Taichung, Taiwan, pp. 83-92.

Kobayashi, H., Takizawa, H., Inaba, T., and Takizawa, Y., 2005, “A Self-Organizing Overlay Network to Exploit the Locality of Interests for Effective Resource Discovery in P2P Systems,” Proceedings of the 2005 Symposium on Applications and the Internet, Trento, Italy, pp. 246-255.

Liu, Y., Zhuang, Z., Xiao, L., and Ni, L. M., 2004, “A Distributed Approach to Solving Overlay Mismatching Problem,” Proceedings of International Conference on Distributed Computing Systems, Hachioji, Tokyo, Japan, pp. 132-139.

Lv, Q., Cao, P., Cohen, E., Li, K., and Shenker, S., 2002, “Search and Replication in Unstructured Peer-to-Peer Networks,” Proceedings of the 16th International Conference on Supercomputing, ACM, New York, USA, pp. 84-95.

Lv, Q., Ratnasamy, S., and Shenker, S., 2002, “Can Heterogeneity Make Gnutella Scalable?” The First International Workshop on Peer-to-Peer Systems, Springer-Verlag, Cambridge, MA, USA, pp. 94-103.

Patro, S., and Hu, Y. C., 2003, “Transparent Query Caching in Peer-to-Peer Overlay Networks,” Proceedings of the 17th International Symposium on Parallel and Distributed Processing, IEEE Computer Society, Nice, France, pp. 10.

Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S., 2001, “A Scalable Content-Addressable Network,” Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM, San Diego, California, USA, pp. 161-172.

Ratnasamy, S., Handley, M., Karp, R., and Shenker, S., 2002, “Topologically-Aware Overlay Construction and Server Selection,” Proceedings of INFOCOM 2002, Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, IEEE, New York, NY, USA, Vol. 3, pp. 1190-1199.

Rowstron, A., and Druschel, P., 2001, “Pastry: Scalable, Distributed Object Location and Routing for Large-scale Peer-to-Peer Systems,” IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pp. 329-350.

Saleh, O., and Hefeeda, M., 2006, “Modeling and Caching of Peer-to-Peer Traffic,” Proceedings of International Conference on Network Protocols, Santa Barbara, California, USA, pp. 249-258.

Shamsi, J., Brockmeyer, M., and Abebe, L., 2005, “TACON: Tactical Construction of Overlay Networks,” Proceedings of Global Telecommunications Conference, GLOBECOM ’05, IEEE, St. Louis, MO, USA, pp. 926-931.

Shin, K., Lee, S., Lim, G., Yoon, H., and Ma, J. S., 2002, “Grapes: Topology-Based Hierarchical Virtual Network for Peer-to-Peer Lookup Services,” International Conference on Parallel Processing Workshops (ICPPW’02), Vancouver, British Columbia, Canada, pp. 159-164.

Si, W., and Li, M., 2005, “On the Connectedness of Peer-to-Peer Overlay Networks,” Proceedings of 11th International Conference on Parallel and Distributed Systems, Fukuoka, Japan, pp. 474-480.


Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H., 2001, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM, San Diego, California, USA, pp. 149-160.

Sun, Y., Sun, L., Huang, X., and Lin, Y., 2006, “Resource Discovery in Locality-Aware Group-Based Semantic Overlay of Peer-to-Peer Networks,” Proceedings of the 1st International Conference on Scalable Information Systems, ACM, Hong Kong, Article No. 44.

Wang, J. Z., and Vanninen, M. A., 2004, “A Novel Self-Configuration Mechanism for Heterogeneous P2P Networks,” Proceedings of IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE, Beijing, China, pp. 281-287.

Xiao, L., Liu, Y., and Ni, L. M., 2005, “Improving Unstructured Peer-to-Peer Systems by Adaptive Connection Establishment,” IEEE Transactions on Computers, Vol. 54, No. 9, pp. 1091-1103.

Xu, Z., Min, R., and Hu, Y., 2003, “HIERAS: A DHT Based Hierarchical P2P Routing Algorithm,” Proceedings of International Conference on Parallel Processing, Kaohsiung, Taiwan, pp. 187-194.

Xu, Z., Tang, C., and Zhang, Z., 2003, “Building Topology-Aware Overlays Using Global Soft-State,” Proceedings of the 23rd International Conference on Distributed Computing Systems, IEEE, Providence, Rhode Island, USA, pp. 500-508.

Yang, B., and Garcia-Molina, H., 2002, “Efficient Search in Peer-to-Peer Networks,” Proceedings of International Conference on Distributed Computing Systems, Vienna, Austria, pp. 5-15.

Zhang, B., Jamin, S., and Zhang, L., 2002, “Host Multicast: A Framework for Delivering Multicast to End Users,” Proceedings of IEEE INFOCOM, New York, NY, USA, Vol. 3, pp. 1366-1375.

Zhang, J., Liu, L., and Pu, C., 2005, “Constructing a Proximity-Aware Power Law Overlay Network,” Proceedings of Global Telecommunications Conference, GLOBECOM ’05, IEEE, St. Louis, MO, USA, pp. 636-640.

Zhang, X. Y., Song, G., Zhang, Q., Zhu, W., Gao, L., and Zhang, Z., 2004, “Measurement-Based Construction of Locality-Aware Overlay Networks,” IEEE International Conference on Communications, Paris, France, pp. 1401-1405.

Zhang, X. Y., Zhang, Q., Zhang, Z., Song, G., and Zhu, W., 2004, “A Construction of Locality-Aware Overlay Network: mOverlay and Its Performance,” IEEE Journal on Selected Areas in Communications, Vol. 22, No. 1, pp. 18-28.

Zhao, B. Y., Kubiatowicz, J., and Joseph, A. D., 2001, “Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing,” Technical Report, UCB/CSD-01-1141, 28 pages, University of California at Berkeley, CA, USA.

Manuscript Received: Apr. 10, 2008 Revision Received: Aug. 24, 2008 and Accepted: Sep. 24, 2008