RELATED WORK - Efficiency and reliability in cluster based peer-to-peer systems


Fig. 21. Connectedness and HOPS in a dynamic system.

6. RELATED WORK

P2P systems can be classified as either structured systems or unstructured systems.

Chord [11], CAN [6], P-Grid [8], Pastry [10], and Tapestry [9], which use a DHT [4] for object placement and searching, belong to the structured P2P systems. Gnutella [15], on the other hand, searches for objects by message flooding. Kazaa [16], another well-known P2P system, adopts a hierarchical structure: powerful nodes are elected as so-called supernodes, which connect several clients (normal nodes) and search for objects on their behalf. Although DHT-based P2P systems perform excellently in searching for objects, other related problems remain to be solved. Complex search (partial query), for example, is still difficult in a DHT network. Maintenance cost is another issue for any P2P system in which nodes join and leave frequently [17].

For unstructured P2P systems, several methods have been proposed to reduce the number of messages generated by search requests. The research in [18] adopts a random walk, which forwards a request to a few selected neighbors instead of all neighbors. Other research [19] uses a cache so that search results can be reused by later search requests.
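The difference between flooding and the random-walk forwarding of [18] can be illustrated with a minimal Python sketch. The function names, the neighbor list, and the parameter k are hypothetical and chosen only for illustration; they are not taken from [18].

```python
import random

def flood_targets(neighbors, sender):
    """Flooding: forward to every neighbor except the one the request came from."""
    return [n for n in neighbors if n != sender]

def random_walk_targets(neighbors, sender, k, rng=random):
    """Random walk: forward to at most k randomly chosen neighbors (k is an assumed parameter)."""
    candidates = flood_targets(neighbors, sender)
    return rng.sample(candidates, min(k, len(candidates)))

# Hypothetical node with five neighbors that received a request from "B".
neighbors = ["B", "C", "D", "E", "F"]
flood = flood_targets(neighbors, sender="B")
walk = random_walk_targets(neighbors, sender="B", k=2, rng=random.Random(0))
print(len(flood), len(walk))  # 4 2
```

With flooding, the number of forwarded copies grows with the node degree at every hop, whereas the random walk caps it at k per node, at the price of a smaller search scope per hop.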

The research in [21] provides a cluster-based architecture similar to ours. However, that research considers neither the Internet topology nor the redundant messages caused by message flooding. Another problem is that the method in [21] needs a central server to perform the clustering of nodes, and a central server becomes a bottleneck for the system. Another paper [22] also adopts a cluster structure, but its clustering of nodes depends on similar properties of nodes; the Internet topology and redundant message flooding are not considered either.

The DSE system is an Internet topology-matching network that promotes message locality. The research reported in [23-25] provides similar mechanisms for building topology-matching systems. One major difference between those systems and DSE is that they use a measurement-based method to evaluate the distance between nodes. Although they claim that the measurement-based method provides an accurate value of the distance between nodes, the method cannot accurately represent the communication cost between nodes. For example, congestion on the route between two nodes may slow down their communication even though the communication cost between them is very low. Conversely, fast communication may exist between two nodes that are located on different backbones. Also, those researchers try to shape the system into a multicast tree to reduce unnecessary traffic by cutting some connections across different AS’s (according to their distances). Such a method risks splitting the system and, hence, hurting its performance.

In our system, DSE organizes the nodes into a set of connected clusters. The SMR mechanism exploits the clustering to reduce redundant flooding, and the RAL mechanism recovers the system if splits occur because of connections broken by the SMR mechanism or because of node failures.

Two other related studies [26, 27] provide different methods for topology-matching overlays and message routing. The research in [26] uses the measured distances to a set of default landmark hosts to group nodes into bins. Nodes in the same bin are likely to be in the same AS. A DHT-based overlay network can be constructed on such structures to reduce message routing latency. The research in [27], on the other hand, uses the AS-level topology extracted from BGP reports [28] or landmark numbering (similar to [26]) to build an auxiliary expressway network over a normal DHT-based overlay network. The expressway network is a set of powerful nodes that provide better connectivity, forwarding capacity, and availability. Moreover, these expressway nodes are clustered based on the above two methods to match the physical network topology. In summary, the landmark measurement method that [26, 27] use to group nodes depends highly on the availability of landmark hosts. The number of landmark hosts and their distribution over the Internet determine the precision of the method. Scalability is another problem because all nodes contact the landmark hosts for their topology-aware numbering. Our DSE system adopts the HIP address format, which is similar to the BGP method except that DSE considers the ISPID instead of the ASID. The ISPID method provides three benefits. First, each ISP usually provides high-bandwidth pipelines between its AS’s. Second, the IP address ranges of each ISP are usually available to the public. Third, the mappings between IP addresses and ISPIDs in DSE are handled by its own built-in message flooding method; no additional mechanism is needed to publish the mappings.
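The landmark-based grouping of [26] discussed above can be sketched as follows. This is a simplified illustration that assumes the bin of a node is the ordering of the landmarks by measured round-trip time; the landmark names, node names, and RTT values are hypothetical.

```python
from collections import defaultdict

def landmark_bin(rtts_ms):
    """Return the landmark ordering (nearest first), used here as the bin identifier."""
    return tuple(sorted(rtts_ms, key=rtts_ms.get))

def group_into_bins(measurements):
    """Group nodes whose landmark orderings are identical into the same bin."""
    bins = defaultdict(list)
    for node, rtts in measurements.items():
        bins[landmark_bin(rtts)].append(node)
    return dict(bins)

# Hypothetical RTT measurements (milliseconds) from three nodes to three landmarks.
measurements = {
    "n1": {"L1": 12.0, "L2": 80.0, "L3": 150.0},
    "n2": {"L1": 15.0, "L2": 95.0, "L3": 140.0},
    "n3": {"L1": 120.0, "L2": 30.0, "L3": 60.0},
}
bins = group_into_bins(measurements)
for ordering, nodes in bins.items():
    print(ordering, nodes)
```

Nodes n1 and n2 see the landmarks in the same order and fall into one bin, while n3 falls into another. The sketch also makes the dependence on the landmark hosts visible: every node must measure every landmark, and the bins are only as precise as the number and placement of the landmarks allow.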

For redundant message reduction, [29, 30] proposed methods for unstructured P2P networks. LightFlood [29] builds an auxiliary tree-like sub-overlay called FloodNet within an unstructured P2P network such as Gnutella to reduce redundant flooding messages. The research is motivated by the observation that most redundant messages are generated in the last few HOPs of a flood, whereas the flooding coverage expands much faster within the first few HOPs. With the tree-like sub-overlay, each message travels by pure flooding within the low HOPs and switches to the sub-overlay for the remaining HOPs. The simulation shows that such a design reduces redundant messages but retains the same message propagation scope as standard flooding. One major difference between LightFlood and DSE is that DSE clusters the system topology so that most redundant messages on the same logical connections are prevented, whereas LightFlood switches the flooding of each message to FloodNet for the last few HOPs so that the message coverage expands to the same scope with far fewer redundant messages. Notice that LightFlood does not change the original system topology.

On the other hand, [30] proposed a peer-to-peer lookup service called Yappers over an arbitrary topology. Yappers groups nearby nodes into small DHTs and provides a search mechanism to traverse all the small DHTs. Such a hybrid design reduces the number of nodes contacted for a lookup request and, hence, reduces redundant messages. One major difference between Yappers and DSE is that Yappers focuses on key-value lookup, which can be solved by a DHT, while DSE is concerned with reducing redundant messages in flooding.
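The two-stage flooding idea of LightFlood [29] described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the six-node overlay, the hand-built spanning tree standing in for FloodNet, and the parameter names ttl and switch_hop are hypothetical, and the actual construction of FloodNet in [29] is not reproduced.

```python
from collections import deque

def flood(graph, tree, source, ttl, switch_hop):
    """Hop-limited flooding that uses all links for the first `switch_hop`
    hops and only tree links afterwards.

    Returns (reached nodes, messages sent, redundant messages).
    """
    seen = {source}
    sent = redundant = 0
    frontier = deque([(source, None, 0)])  # (node, previous hop, hops so far)
    while frontier:
        node, prev, hops = frontier.popleft()
        if hops >= ttl:
            continue
        links = graph[node] if hops < switch_hop else tree[node]
        for nxt in links:
            if nxt == prev:
                continue  # never send a message back to where it came from
            sent += 1
            if nxt in seen:
                redundant += 1  # duplicate copy, dropped by the receiver
            else:
                seen.add(nxt)
                frontier.append((nxt, node, hops + 1))
    return seen, sent, redundant

# Hypothetical overlay and a spanning tree of it (assumed stand-in for the sub-overlay).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
tree = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}

pure = flood(graph, tree, source=0, ttl=4, switch_hop=4)   # standard flooding
light = flood(graph, tree, source=0, ttl=4, switch_hop=1)  # switch to the tree after one hop
print(len(pure[0]), pure[1], pure[2])     # 6 11 6
print(len(light[0]), light[1], light[2])  # 6 5 0
```

In this toy overlay both variants reach all six nodes, but switching to the tree after the first hop removes the duplicate copies. The overlay topology itself is unchanged, which is the point of contrast with the clustering approach of DSE.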

Compared with the well-known P2P network Kazaa, DSE has different properties. Both Kazaa and DSE adopt a hierarchical structure, but the self-organization mechanism of Kazaa elects so-called supernodes, which maintain many more connections to other normal nodes. Also, the search for files is handled by supernodes. In other words, each supernode is a small server that serves other normal nodes. The election of supernodes is based on the bandwidth and computing capacity of nodes: powerful nodes get a higher priority to become supernodes. In DSE, all nodes are homogeneous. The hierarchical cluster structure serves better message routing and higher message locality. Each search request reaches every node in the DSE system and is matched against the objects on each local node. On the other hand, the election of supernodes in Kazaa requires precise measurements of the bandwidth and computing power of hosts, and frequent joins and leaves of nodes may hurt the availability of supernodes and the functions they provide. These problems do not affect the DSE system because all nodes are homogeneous and no node in DSE becomes a bottleneck of the system.

The supernode structure indeed reduces traffic significantly because only supernodes perform search requests for their clients. However, the SMR mechanism in DSE utilizes the cluster structure of nodes to reduce the redundant traffic in message flooding by up to 87.27%.

7. CONCLUSION

The DSE system is a fully distributed and self-organizing P2P system. Our research addresses the inefficient communication in unstructured P2P systems, such as Gnutella, and proposes the NCC, SMR, and RAL mechanisms to improve the efficiency of communication. The NCC mechanism reconstructs the topology of the system as a clustered structure. The clustering of nodes depends on the logical distance between nodes. Compared to Gnutella, the NCC mechanism significantly reduces the number of messages that travel across AS’s. Redundant messages, in turn, are a serious problem of message flooding in P2P systems. Based on the clustered structure of the system, the SMR mechanism significantly reduces redundant messages in comparison to unstructured P2P systems, in which the forwarding of redundant flooding messages is not fully considered. The last issue that we consider is the connectedness of the system. A serverless P2P system organizes itself in a fully distributed way; hence a split of the system is always possible. We proposed the RAL mechanism, which handles the recovery of split components based on message exchanges between nodes. With the RAL mechanism, flooding messages can be guaranteed to reach every node in the system if necessary.

With the cooperation of the NCC, SMR, and RAL mechanisms, the communication cost in P2P systems can be greatly reduced. Low-cost communication makes P2P systems more scalable and improves their performance.

REFERENCES

1. K. Aberer, M. Punceva, M. Hauswirth, and R. Schmidt, “Improving data access in P2P systems,” IEEE Internet Computing, Vol. 6, 2002, pp. 58-67.

2. A. Oram, Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O’Reilly, U.S.A., 2001.

3. D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, Vol. 393, 1998, pp. 440-442.

4. S. Ratnasamy, S. Shenker, and I. Stoica, “Routing algorithms for DHTs: some open questions,” in Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS), 2002, pp. 45-52.

5. M. Ripeanu, I. Foster, and A. Iamnitchi, “Mapping the Gnutella network: properties of large-scale peer-to-peer systems and implications for system design,” IEEE Internet Computing Journal, Vol. 6, 2002, pp. 50-57.

6. S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, “A scalable content-addressable network,” in Proceedings of SIGCOMM, 2001, pp. 161-172.

7. S. Saroiu, P. Gummadi, and S. Gribble, “A measurement study of peer-to-peer file sharing systems,” in Proceedings of the Multimedia Computing and Networking (MMCN), 2002, pp. 156-170.

8. K. Aberer, “P-Grid: a self-organizing access structure for P2P information systems,” in Proceedings of the 6th International Conference on Cooperative Information Systems (CoopIS), 2001, pp. 179-194.

9. B. Y. Zhao, J. Kubiatowicz, and A. D. Joseph, “Tapestry: an infrastructure for fault-tolerant wide-area location and routing,” Technical Report No. UCB/CSD-01-1141, Computer Science Division, University of California, Berkeley, 2001.

10. A. Rowstron and P. Druschel, “Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems,” in Proceedings of the International Conference on Distributed Systems Platforms, LNCS 2218, 2001, pp. 329-350.

11. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: a scalable peer-to-peer lookup service for internet applications,” Technical Report No. TR-819, MIT, 2001.

12. RFC 1305, Network Time Protocol, http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1305.html.

13. RFC 1772, Autonomous System, http://www.faqs.org/rfcs/rfc1772.html.

14. Gnutella, http://gnutella.wego.com/.

15. Apia: Advanced P2P infrastructure and architecture, http://apia.peerlab.net/.

16. Kazaa, http://www.kazaa.com/.

17. Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker, “Making Gnutella-like P2P systems scalable,” in Proceedings of the ACM SIGCOMM, 2003, pp. 407-418.

18. Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, “Search and replication in unstructured peer-to-peer networks,” in Proceedings of the 16th Annual ACM International Conference on Supercomputing, 2002, pp. 84-95.

19. N. Ambastha, I. Beak, S. Gokhale, and A. Mohr, “A cache-based resource location approach for unstructured P2P network architectures,” Department of Computer Science, Stony Brook University, New York, 2003.

20. TWNIC IP Address Distribution List, http://rms.twnic.net.tw/twnic/User/Member/Search/main7.jsp?Order=ORG.ID.

21. A. Fox, S. D. Gribble, Y. Chawathe, E. A. Brewer, and P. Gauthier, “Cluster-based scalable network services,” in Proceedings of the 16th ACM Symposium on Operating System Principles, 1997, pp. 78-91.

22. C. H. Ng and K. C. Sia, “Peer clustering and firework query model,” in Poster Proceedings of the 11th International World Wide Web Conference, 2002.

23. Y. Liu, Z. Zhuang, L. Xiao, and L. M. Ni, “A distributed approach to solving overlay mismatching problem,” in Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS), 2004, pp. 132-139.

24. Y. Liu, X. Liu, L. Xiao, L. M. Ni, and X. Zhang, “Location-aware topology matching in P2P systems,” in Proceedings of the IEEE INFOCOM, Vol. 4, 2004, pp. 2220-2230.

25. Y. Liu, Z. Zhuang, L. Xiao, and L. M. Ni, “AOTO: adaptive overlay topology optimization in unstructured P2P systems,” in Proceedings of the IEEE GLOBECOM, Vol. 7, 2003, pp. 4186-4190.

26. S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, “Topologically-aware overlay construction and server selection,” in Proceedings of the IEEE INFOCOM, 2002, pp. 1190-1199.

27. Z. Xu, M. Mahalingam, and M. Karlsson, “Turning heterogeneity into an advantage in overlay routing,” in Proceedings of the IEEE INFOCOM, 2003, pp. 1499-1509.

28. BGP Routing Table Reports, http://bgp.potaroo.net/.

29. S. Jiang, L. Guo, and X. Zhang, “LightFlood: an efficient flooding scheme for file search in unstructured peer-to-peer systems,” in Proceedings of the International Conference on Parallel Processing, 2003, pp. 627-635.

30. P. Ganesan, Q. Sun, and H. Garcia-Molina, “YAPPERS: a peer-to-peer lookup service over arbitrary topology,” in Proceedings of the IEEE INFOCOM, Vol. 2, 2003, pp. 1250-1260.

Ching-Wei Huang (黃經緯) received his B.S. degree in Mathematics from National Tsing Hua University, Taiwan, in 1994 and his M.S. degree in Computer Science from National Chiao Tung University, Taiwan, in 1997. He is currently pursuing a Ph.D. degree at National Chiao Tung University. His research interests are distributed computing, peer-to-peer systems, network security, and XML processing.

Wuu Yang (楊武) received his B.S. degree in Computer Science from National Taiwan University in 1982 and his M.S. and Ph.D. degrees in Computer Science from the University of Wisconsin at Madison in 1987 and 1990, respectively. He is currently a Professor at National Chiao Tung University, Taiwan, R.O.C. Dr. Yang’s current research interests include Java and network security, programming languages and compilers, and attribute grammars. He is also very interested in the study of human languages and human intelligence.
