Qualitative comparison of KAD-N and KAD

Chapter 3 Design Approach

3.4 Qualitative comparison of KAD-N and KAD

Table II compares the original KAD with our KAD-N. If a keyword is hashed at most N times, its indexes are published to at most N targets, so the publishing load becomes more balanced and the search hit rate increases. KAD-N does not increase the total number of indexes; it only distributes them more evenly, as shown in Figure 9. In other words, the total publishing load of the network in KAD-N is the same as in KAD, but the number of search messages is N times that of KAD. Because KAD-N spreads the indexes, more peers hold the same indexes, which improves the search hit rate when some peers fail. The price is a slight increase in network traffic caused by the additional search messages. Since the keyword of an object may be hashed up to N times, the computation overhead is O(N); in the original KAD each keyword is hashed only once, so the overhead is O(1).

KAD-N therefore incurs extra computation overhead. We discuss the optimal value of N in the following chapter.
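As an illustration of the trade-off described above, the following Python sketch derives the publishing target and the search targets of a keyword. It is not the code of any KAD client: MD5 is only a stand-in for the client's hash function, "hashing r times" is assumed to mean applying the hash r times in a row, and the names `hash_times`, `publish_target`, and `search_targets` are hypothetical.

```python
import hashlib
import random


def hash_times(keyword: str, times: int) -> bytes:
    """Apply the hash function `times` times in a row (assumed meaning of 'hash r times')."""
    if times < 1:
        raise ValueError("times must be >= 1")
    digest = keyword.encode("utf-8")
    for _ in range(times):
        digest = hashlib.md5(digest).digest()  # MD5 is a stand-in, not necessarily KAD's hash
    return digest


def publish_target(keyword: str, n: int, rng: random.Random) -> bytes:
    """Publisher side: pick a hash count in 1..n at random, so one index goes to one target."""
    return hash_times(keyword, rng.randint(1, n))


def search_targets(keyword: str, n: int) -> list[bytes]:
    """Searcher side: query every possible target, so one search costs n messages."""
    return [hash_times(keyword, r) for r in range(1, n + 1)]


if __name__ == "__main__":
    rng = random.Random(42)
    target = publish_target("music", 7, rng)
    candidates = search_targets("music", 7)
    print("search messages per query:", len(candidates))
    print("published index is reachable:", target in candidates)
```

With N = 1 the sketch reduces to the original KAD behavior: one hash, one target, and one search message.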

TABLE II. Qualitative comparison of KAD and KAD-N

Approach                    KAD          KAD-N (proposed)
Publishing load             Imbalanced   Balanced
Search hit rate             Normal       Better
Computation overhead        O(1)         O(N)
Query messages per search   1            N
Network traffic             Normal       More

Chapter 4 Simulation Results

4.1 Simulation setup

First, we analyze the overhead of publishing messages and search messages in KAD. The authors of [21] monitored 20 different keyspaces of the KAD network for 24 hours and recorded, on average, 4.3 million publishing messages and 350,000 search messages. These measurements show that there are roughly ten times more publishing messages than search messages. Moreover, a publishing message is about ten times larger than a search message because it contains not only a keyword but also metadata describing the published object. In [23], a keyspace of the KAD P2P network was likewise monitored for 12 hours, yielding 561,542 search messages and 5,549,183 publishing messages; the search messages produced 10.8 MB of traffic and the publishing messages 966 MB. On average, therefore, a search message produces 0.019 KB of traffic and a publishing message 0.18 KB. We used these figures to calculate the total network traffic in our simulation environment, which consists of search traffic and publishing traffic.
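The per-message figures can be rechecked from the totals reported in [23]. The short Python calculation below is only a sanity check; it assumes 1 MB = 1024 KB, which the text does not state.

```python
KB_PER_MB = 1024  # assumption: binary megabytes

# Totals reported in [23] for a 12-hour measurement of one keyspace.
search_messages = 561_542
publish_messages = 5_549_183
search_traffic_mb = 10.8
publish_traffic_mb = 966.0

# Average traffic produced by a single message of each type.
search_kb_per_msg = search_traffic_mb * KB_PER_MB / search_messages
publish_kb_per_msg = publish_traffic_mb * KB_PER_MB / publish_messages

print(f"search:  {search_kb_per_msg:.4f} KB per message")
print(f"publish: {publish_kb_per_msg:.4f} KB per message")
print(f"size ratio (publish/search):  {publish_kb_per_msg / search_kb_per_msg:.1f}")
print(f"count ratio (publish/search): {publish_messages / search_messages:.1f}")
```

Under this assumption the results are close to the rounded 0.019 KB and 0.18 KB used in the simulation, and both ratios are near ten.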

We rank keywords by the number of times they appear; rank 1 is the most popular keyword.

The publishing messages collected in [21] contain 26,500 different keywords per keyspace and 315,000 distinct files. The number of appearances of each keyword was also counted.

Based on these data, [21] used Matlab to estimate the number of indexes for the $i$-th most popular keyword, which is proportional to $1/i^{\alpha}$ with $\alpha$ close to 1, and the number of indexes for the most popular keyword is about 10. For example, the most popular keyword has ten times as many indexes as the tenth most popular keyword in the KAD P2P network. That is to say, the peer that handles the indexes of the most popular keyword carries ten times the network load of the peer that handles the indexes of the tenth most popular keyword. We evaluate the performance of our approach based on this analysis.
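The popularity model above can be sketched in a few lines of Python. The exponent of 1 is inferred from the ten-to-one example, and the value of `top_count` is purely hypothetical; neither is taken from [21].

```python
def zipf_index_counts(num_keywords: int, top_count: float, alpha: float = 1.0) -> list[float]:
    """Index count of the keyword at each rank, proportional to 1 / rank**alpha."""
    return [top_count / rank ** alpha for rank in range(1, num_keywords + 1)]


if __name__ == "__main__":
    # 26,500 keywords per keyspace as in [21]; top_count is an arbitrary illustration.
    counts = zipf_index_counts(num_keywords=26_500, top_count=10_000.0)
    print("rank 1 / rank 10 load ratio:", counts[0] / counts[9])
```

With an exponent of 1, the ratio between rank 1 and rank 10 is ten, which matches the load imbalance described in the text.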

We used Java to construct our simulation environment. Based on [20] and the above analysis, we simulated how KAD P2P networks publish objects and distribute indexes, and we recorded the indexes handled by each peer. We then applied our method to this simulation environment, gathered the indexes handled by each peer again, and used them to show the effectiveness of the proposed KAD-N method.

4.2 Simulation results

Figure 13 shows the index distribution over the keyspaces under different hash times. We rank keyspaces by the number of indexes they handle, i.e., by keyspace popularity; the rank 1 keyspace handles the most indexes. The index distribution of the original KAD (KAD-1) is very uneven: a large number of indexes are handled by a few keyspaces. As we hash more times, some indexes move from the top-ranked keyspaces to the others, and the distribution curve becomes smoother. In other words, the more times we hash the key, the more balanced the publishing load.

Figure 13. The index distribution of each keyspace under different hash times.
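The smoothing effect can be reproduced qualitatively with a small Python sketch. It is an illustrative model, not the Java simulator used for Figure 13. It assumes that a key belongs to the keyspace given by its first byte, that keyword popularity is Zipf-like with exponent 1, that MD5 stands in for the real hash function, and that each of the N targets receives an equal share of a keyword's indexes (the expected outcome when publishers choose the hash count uniformly at random).

```python
import hashlib

NUM_KEYSPACES = 256  # number of keyspaces stated in the text


def keyspace_of(key: bytes) -> int:
    """Assumption: the keyspace of a key is its first byte (8-bit prefix)."""
    return key[0]


def keyspace_loads(num_keywords: int, top_count: float, n: int) -> list[float]:
    """Indexes handled per keyspace when every keyword is hashed up to n times."""
    loads = [0.0] * NUM_KEYSPACES
    for rank in range(1, num_keywords + 1):
        count = top_count / rank               # Zipf-like popularity, exponent 1 assumed
        digest = f"keyword-{rank}".encode()    # synthetic keyword names
        for _ in range(n):
            digest = hashlib.md5(digest).digest()     # one more hash application
            loads[keyspace_of(digest)] += count / n   # equal share per target
    return loads


if __name__ == "__main__":
    for n in (1, 3, 7):
        loads = keyspace_loads(num_keywords=5_000, top_count=10_000.0, n=n)
        print(f"N={n}: heaviest keyspace {max(loads):.0f}, total indexes {sum(loads):.0f}")
```

The total number of indexes stays the same for every N, while the load of the heaviest keyspace drops as N grows.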

However, the number of search messages increases after applying the proposed KAD-N method. Figure 14 shows that the total number of network messages grows linearly with the hash times. The total, which includes both search messages and publishing messages, was calculated from the data of [23] introduced in the simulation setup.

Figure 14. The total number of network messages under different hash times.


Figure 15 plots the percentage of extra traffic under different hash times. Like the curve in Figure 14, it grows linearly. As mentioned in the simulation setup, search traffic is the number of search messages multiplied by 0.019 KB, and publishing traffic is the number of publishing messages multiplied by 0.18 KB; their sum is the total network traffic. We calculate the extra traffic percentage $p$ according to the following equation:

$$p = \frac{T_N - T_1}{T_1} \times 100\%$$

where $T_N$ is the total network traffic with $N$ hash times and $T_1$ is the total network traffic of the original KAD (KAD-1).

The extra traffic is very small because search messages are far fewer than publishing messages and each search message produces much less traffic than a publishing message.
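The following Python sketch shows one way to compute the extra traffic percentage. It assumes that KAD-N multiplies the number of search messages by N and leaves the number of publishing messages unchanged, as stated in the qualitative comparison; the function name and the use of the message counts from [23] as input are illustrative choices, not the inputs of our simulation.

```python
def extra_traffic_percent(
    n: int,
    search_msgs: int,
    publish_msgs: int,
    search_kb: float = 0.019,   # average size of a search message (KB)
    publish_kb: float = 0.18,   # average size of a publishing message (KB)
) -> float:
    """Extra total traffic of KAD-N with n hash times, relative to the original KAD."""
    search_traffic = search_msgs * search_kb
    publish_traffic = publish_msgs * publish_kb
    baseline = search_traffic + publish_traffic        # original KAD (n = 1)
    with_kad_n = n * search_traffic + publish_traffic  # n search messages per query
    return (with_kad_n - baseline) / baseline * 100.0


if __name__ == "__main__":
    for n in range(1, 13):
        p = extra_traffic_percent(n, search_msgs=561_542, publish_msgs=5_549_183)
        print(f"N={n:2d}: {p:.2f}% extra traffic")
```

Under these assumptions the percentage is zero for N = 1 and grows linearly with N, because only the small search component of the traffic is multiplied.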

Figure 15. The percentage of extra traffic under different hash times.

We use the standard deviation $\sigma$ to show the divergence under different hash times. The standard deviation measures the dispersion of a data set: a low value indicates that the data points tend to be very close to the mean, while a high value indicates that the data are spread out over a large range of values. We calculate the standard deviation from the number of indexes handled by each keyspace, so the higher the value, the more unbalanced the publishing load across keyspaces. $\sigma$ is computed as follows:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $n$ is the number of keyspaces ($n = 256$ in KAD), $x_i$ is the number of indexes handled in the $i$-th keyspace, and $\bar{x}$ is the average number of indexes handled in each keyspace.

From Figure 16, we observe that $\sigma$ decreases only marginally once the hash times reach 7. That is, hashing more than 7 times leaves the standard deviation almost unchanged and does little more for load balancing.
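As a minimal sketch, the standard deviation of the per-keyspace index counts can be computed as follows. The sketch assumes the population form (division by the number of keyspaces rather than by one less), and the four-keyspace example data are made up for illustration.

```python
import math


def keyspace_std(index_counts: list[float]) -> float:
    """Population standard deviation of the number of indexes handled per keyspace."""
    n = len(index_counts)
    mean = sum(index_counts) / n
    return math.sqrt(sum((x - mean) ** 2 for x in index_counts) / n)


if __name__ == "__main__":
    uneven = [400.0, 0.0, 0.0, 0.0]        # one keyspace handles everything
    even = [100.0, 100.0, 100.0, 100.0]    # same total, perfectly balanced
    print("uneven load:", round(keyspace_std(uneven), 3))
    print("even load:  ", round(keyspace_std(even), 3))
```

Both examples carry the same total number of indexes; only the balanced one has a standard deviation of zero.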

Figure 16. The standard deviation of each keyspace under different hash times.


We also simulated how the hit rate varies under different hash times when some peers fail. Indexes cannot be retrieved from failed peers; we call these indexes missing indexes. Objects referenced by missing indexes are unsearchable. Note that the hit rate is calculated from the number of missing indexes divided by the number of total indexes, that is, it is the fraction of indexes that are not missing. Figure 17 shows that hashing more times increases the hit rate when peers fail. From [22], we know that the number of peers varies according to a diurnal pattern, and the minimal number of peers is about 78% of the maximum. So the percentage of failed peers in a day is about 27%.

Figure 17. The hit rate with respect to failed peers under different hash times.

Figure 17 shows that the hit rate is close to 100% if we hash more than 5 times with 27% of peers failed. The proposed KAD-N thus increases the search resilience when a large number of peers fail. Because more hash times do not always bring more efficiency, we used a cost-effectiveness factor k to determine the maximum hash times.

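$$k = \frac{h}{T \cdot \sigma}$$

where $h$ is the hit rate, $T$ is the total network traffic, and $\sigma$ is the standard deviation of the number of indexes handled by each keyspace.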

To obtain a larger k, one has to increase the hit rate and reduce the total network traffic and the standard deviation. In Figure 18, the values of k for 6, 7, and 8 hash times are very close, and k is highest at 7 hash times. That is, hashing 7 times is the optimal choice for the trace we simulated.

Figure 18. Cost-effectiveness factor under different hash times.

(Figure 18 axes: hash times 1 to 12 on the x-axis; cost-effectiveness factor k from 3.000 to 6.000 on the y-axis. Labeled values near the peak: 5.663, 5.705, 5.696.)

Chapter 5 Implementation Issues

5.1 Applying KAD-N to existing KAD P2P networks

In this chapter, we describe how to implement the proposed KAD-N method. Because our method is an improvement of the original KAD, it can be implemented on top of an existing KAD P2P network, such as eMule [8] or aMule [24]. Both are open source projects, so their source code is readily available, and KAD-N can be implemented by modifying it. In the following, we describe how to adapt a KAD implementation to the proposed KAD-N method.

Figure 19 shows the publishing procedure of the original KAD P2P network and Figure 20 shows its search procedure. We can implement the proposed KAD-N method based on a KAD P2P network by modifying the hash operation for publishing and searching objects. To apply our method to the KAD P2P network, we replace block B1 in Figure 19 with block A1 in Figure 11 and also replace block B2 in Figure 20 with block A2 in Figure 12.

Figure 19. The publishing procedure of KAD.

The flowchart consists of the following steps:
(1) Start.
(2) Obtain keyword A from the object name.
(3) Hash keyword A to get key tmp (Block B1).
(4) Use key tmp as a target and run the lookup procedure.
(5) Use the responses of lookup messages to update the candidate list.
(6) Is the candidate list stable? If yes, go to step (7); otherwise continue the lookup.
(7) Select the 11 closest nodes from the candidate list: nodes 1~11.
(8) Send publishing messages to nodes 1~11.
(9) End.
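The node selection step of the publishing procedure can be sketched in Python as follows. The sketch assumes that closeness is measured with the XOR metric of Kademlia [3]; the function names and the one-byte identifiers in the example are hypothetical and are not taken from the eMule or aMule source code.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """XOR metric: interpret both identifiers as integers and XOR them."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(target: bytes, candidates: list[bytes], count: int = 11) -> list[bytes]:
    """Return the `count` candidate node IDs closest to the target key."""
    return sorted(candidates, key=lambda node_id: xor_distance(node_id, target))[:count]


if __name__ == "__main__":
    # Toy example with 1-byte identifiers; real KAD identifiers are much longer.
    target = bytes([0b1010])
    candidates = [bytes([i]) for i in range(16)]
    nearest = closest_nodes(target, candidates, count=3)
    print([node[0] for node in nearest])
```

In KAD-N only the target key changes, so a selection step of this kind would stay the same whether the key comes from block B1 or from block A1.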

Figure 20. The search procedure of KAD.

The flowchart consists of the following steps, where TOTAL is the maximum number of answers:
(1) Start.
(2) Obtain keyword A from a query.
(3) Hash keyword A to get key tmp (Block B2).
(4) Use key tmp as a target and then send a search message to the network.
(5) Save answers from search responses.
(6) Is the number of answers > TOTAL, or has a timeout occurred? If yes, go to step (7); otherwise keep saving answers.
(7) End.

5.2 Combining with streaming

The KAD P2P network can be extended to support streaming applications. Most existing P2P networks based on KAD are only capable of file sharing. We can enhance KAD P2P networks by adding functions such as P2P streaming. For example, if a video file has been published by several peers in the KAD network, a P2P streaming tool lets us view the file while it is downloading. In the original KAD P2P network, we must wait until the whole file is downloaded, which is inefficient.

To combine a KAD P2P network with streaming, we can modify its download function. A peer can search for a video file to get a list of the peers that have this file in the KAD P2P network, and this peer list can be used to form a P2P streaming network. Peers that have the requested video file become the sources in the P2P streaming network, and other peers that want to watch the video can join it. Because the proposed KAD-N method improves the search hit rate, it may increase the probability of finding more peers that have the requested video file.

By applying our method, the P2P streaming network can remain resilient when some peers fail.

Chapter 6 Conclusions

6.1 Concluding remarks

The proposed KAD-N method balances the load of each keyspace and also improves the search hit rate. It is a simple and effective method. By hashing a keyword a random number of times when publishing it, indexes are distributed more evenly and the publishing load of each peer becomes more balanced. Although KAD-N slightly increases the total number of messages in the KAD network, the extra traffic is very small. Based on the simulation results, the optimal number of hash times is 7, which raises the hit rate to nearly 100% and causes about 7% extra traffic. Our method thus both improves search resilience and balances the publishing load among peers in KAD networks. In addition, the proposed KAD-N method can be extended to support other DHT-based P2P networks.

6.2 Future work

Our KAD-N method is a simple and effective way to achieve load balancing and search resilience. Some issues deserve further study. (1) Adapting our method to other DHT-based P2P networks. In this thesis, our method is based on KAD P2P networks. Because the search and publishing mechanisms of KAD differ from those of other DHT-based P2P networks, our method must be adapted before it can be applied to them. (2) Supporting P2P streaming applications in KAD P2P networks. Most existing networks built on KAD are only capable of file sharing. In Chapter 5, we discussed how to combine a KAD P2P network and our method for P2P streaming. This issue deserves further study as well.

Bibliography

[1] D. Kundur, Z. Liu, M. Merabti, and H. Yu, “Advances in peer-to-peer content search,” in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 404-407, July 2007.

[2] Y. J. Joung, L. W. Yang, and C. T. Fang, “Keyword search in DHT-based peer-to-peer networks,” IEEE Journal on Selected Areas in Communications, vol. 25, pp. 46-61, January 2007.

[3] P. Maymounkov and D. Mazieres, “Kademlia: A peer-to-peer information system based on the XOR metric,” in Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS), pp. 53-65, March 2002.

[4] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for Internet applications,” in Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 149-160, August 2001.

[5] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A scalable content-addressable network,” in Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 161-172, August 2001.

[6] A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems,” in Proceedings of the 2001 IFIP/ACM International Conference on Distributed Systems Platforms, vol. 2218, pp. 329-350, November 2001.

[7] B. Y. Zhao, J. Kubiatowicz, and A. D. Joseph, “Tapestry: An infrastructure for fault-tolerant wide-area location and routing,” University of California, Berkeley, Tech. Rep. UCB/CSD-01-1141, April 2001.

[8] “eMule Project,” [Online]. Available: http://www.emule.com/.

[9] “BitTorrent,” [Online]. Available: http://www.bittorrent.com/.

[10] “Gnutella website,” [Online]. Available: http://www.gnutella.com.

[11] “Fasttrack peer-to-peer technology,” [Online]. Available: http://www.fasttrack.nu/.

[12] “iMesh website,” [Online]. Available: http://www.imesh.com.

[13] “JXTA community projects,” [Online]. Available: https://jxta.dev.java.net.

[14] M. Abdelaziz, B. Traversat, and E. Pouyoul, “Project JXTA: A loosely-consistent DHT rendezvous walker,” March 2003. [Online]. Available: http://www.jxta.org/docs/jxta-dht.pdf.

[15] T. H. Chang, “Keyword search for enhancing JXTA discovery service in peer to peer networks,” Master’s Thesis, National Chiao Tung University, June 2008.

[16] M. Steiner, D. Carra, and E. W. Biersack, “Faster content access in KAD,” in Proceedings of the Eighth International Conference on Peer-to-Peer Computing, pp. 195-204, September 2008.

[17] D. Wu, Y. Tian, and K. W. Ng, “Achieving resilient and efficient load balancing in DHT-based P2P networks,” in Proceedings of the 31st IEEE Conference on Local Computer Networks, pp. 115-122, November 2006.

[18] B. Godfrey, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica, “Load balancing in dynamic structured p2p systems,” in Proceedings of the IEEE INFOCOM’04, pp. 2253-2262, March 2004.

[19] J. Byers, J. Considine, and M. Mitzenmacher, “Simple load balancing for distributed hash tables,” in Proceedings of the IPTPS’03, pp. 80-87, October 2003.

[20] R. Brunner, “A performance evaluation of the KAD-protocol,” Master’s Thesis, University of Mannheim and Institut Eurecom, November 2006.

[21] M. Steiner, W. Effelsberg, T. En-Najjary, and E. W. Biersack, “Load reduction in the KAD peer-to-peer system,” in Proceedings of the 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing, October 2007.

[22] M. Steiner, T. En-Najjary, and E. W. Biersack, “A global view of KAD,” in Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, pp. 117-122, October 2007.

[23] E. W. Biersack, “Everything you want to know on KAD,” June 2008. [Online]. Available: http://www.thlab.net/old/rescom2008/talks/E-Biersack_KAD-tut.pdf.

[24] “aMule Project,” [Online]. Available: http://www.amule.org/.

[25] “WiredReach,” [Online]. Available: http://www.wiredreach.com/.

[26] “Collanos Workplace,” [Online]. Available: http://www.collanos.com/.

[27] X. L. Fu and Y. Xu, “A load balance algorithm for hybrid P2P network model,” in Proceedings of the ISECS International Colloquium on Computing, Communication, Control, and Management, pp. 236-239, August 2008.
