Next, we design an experiment on PlanetLab to evaluate the effects of group locality and network locality on performance. PlanetLab is a global research test bed comprising approximately 1025 machines distributed at 488 sites around the world in July 2009. When the experiment was running, only about 640 nodes were active.
0.0
k = 8, erasure 0.0116854546 0.3017631322 0.7882322328 0.9751780577 0.9989487992 k = 16, erasure 0.0004467193 0.1967625597 0.8437627667 0.9959893851 0.9999878177 full replication 0.3439000000 0.5904000000 0.7599000000 0.8704000000 0.9375000000
Average node availability
0.6 0.7 0.8 0.9 1
k = 8, erasure 0.9999871684 0.9999999735 0.9999999999 0.9999999999 1.0000000000 k = 16, erasure 0.9999999970 0.9999999999 0.9999999999 0.9999999999 1.0000000000 full replication 0.9744000000 0.9919000000 0.9984000000 0.9999000000 1.0000000000
Figure 4-2 Availability comparison when storage cost is four times.
The experiment setup is described below. First, a node was randomly selected as the file source. Latency between the file source node and all other nodes was measured, and group members consisted of the nodes with lower latency. Since Peeraid distributes a file to 32 nodes, we chose 32 file nodes according to three policies of file distribution, RANDOM, GROUP, and MIX, respectively. Nodes which received a request would download 1 KB shares from arbitrary eight of 32 file nodes of each
0.0 0.5 1.0 1.5 2.0 2.5
Number of group nodes
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
32 48 128 320 480
Download time (s)
Figure 4-3 The internal nodes of group.
policy sequentially.
Then, the experiment was repeated with considering network locality. The only difference was that nodes which received a request would download 1 KB shares from eight nodes with the lowest latency of 32 file nodes of each policy. Peeraid can achieve this because latency information is measured by the DHT module.
Figure 4-3 shows how performance changes as the number of group nodes increases for internal nodes of the group. The symbol “+” means network locality was employed when downloading shares. It can be seen that, for internal nodes of the group, distributing file shares with GROUP policy can achieve the best performance; the next is MIX, and RANDOM is the worst. When the number of group nodes is 32, which is exactly sufficient to store 32 file shares, performance of GROUP is about 2.5 times better than MIX, and almost 5 times better than RANDOM without network locality.
0.0 0.5 1.0 1.5 2.0 2.5
Number of group nodes
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
32 48 128 320 480
Download time (s)
Figure 4-4 The external nodes of group.
As the number of group nodes increases, performance of GROUP and MIX descends because the divergence of latency between group members also increases.
With network locality, performance of GROUP and RANDOM increases more than 2 times, and MIX increases almost 4 times when the number of group nodes is 32.
Because in MIX half of file nodes are selected from group members, downloading file shares from eight nodes with the lowest latency has the same effect as GROUP.
Although performance of GROUP and MIX descends as the number of group nodes increases, network locality still raises 2-3 times.
Figure 4-4 shows download results of external nodes of the group. For external nodes, GROUP and MIX are worse than RANDOM. When the number of group nodes is 32, performance of RANDOM is 1.6 times better than GROUP, and 1.2 times better
0.0 0.5 1.0 1.5 2.0 2.5
Number of group nodes
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
32 48 128 320 480
Download time (s)
Figure 4-5 The file source node.
than MIX. Because external nodes of the group spread around the world, high degree of concentration of file shares are unfavorable for them.
Figure 4-5 shows download results of the file source node. It is similar to the internal nodes of group, but performance of GROUP is better. This property is good for the file system because the owner of a certain file is usually the user who accesses that file most often. Network locality also raises performance 2-3 times for external nodes and the file source node.
Next, we again repeat the experiment but increase the number of shares downloaded from one file node. Figure 4-6 and Figure 4-7 show the performance status with different download size from one file node when the number of group nodes is 32 and 48. It can be seen that the rank of three policies does not change when download size increases. For internal nodes of group, GROUP is still the best; the next
is MIX, and RANDOM is the worst. For external nodes of group, RANDOM is better than GROUP and MIX. However, the difference between three policies becomes greater when download size increases. Figure 4-8 shows download results when the number of group nodes is 320. The benefit from group nodes disappears and the curves of three policies are almost overlapped whether internal nodes or external nodes of group. With network locality, performance increases about 1.5-4 times no matter what kind of conditions.
These experiment results can make useful suggestions for file distribution. For those files which only group members have permission to access, users should distribute them with GROUP policy to achieve the best performance. However, for those files which are public to everyone, users should distribute them with MIX policy, because it has an average performance for internal nodes and external nodes of the group.
0 5 10 15 20 25
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 2.180 2.368 5.518 22.854
MIX 1.069 1.579 3.443 12.435
GROUP 0.401 0.699 1.878 5.490
RANDOM+ 0.954 0.992 2.560 16.156
MIX+ 0.276 0.753 1.722 8.894
GROUP+ 0.144 0.372 0.575 2.408
(a) Internal nodes of group.
39
0 5 10 15 20 25 30 35 40
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 1.003 1.382 2.558 14.201
MIX 1.292 1.942 3.038 21.903
GROUP 1.624 2.105 3.204 34.762
RANDOM+ 0.411 1.051 1.642 8.580
MIX+ 0.330 1.651 2.620 13.256
GROUP+ 1.117 1.865 2.691 21.333
(b) External nodes of group.
0 5 10 15 20 25
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 1.899 2.104 5.305 23.593
MIX 0.916 1.237 2.476 10.626
GROUP 0.172 0.287 1.108 3.182
RANDOM+ 0.584 0.830 2.505 15.855
MIX+ 0.233 0.409 1.262 6.351
GROUP+ 0.166 0.197 0.410 2.141
(c) The file source node.
Figure 4-6 Performance comparison when the number of group nodes is 32.
41
0 5 10 15 20 25
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 1.810 2.315 5.806 20.803
MIX 1.164 2.197 3.798 15.026
GROUP 0.464 1.020 2.765 6.937
RANDOM+ 0.944 1.236 2.728 14.604
MIX+ 0.362 0.803 1.948 8.051
GROUP+ 0.302 0.491 0.635 2.507
(a) Internal nodes of group.
0 5 10 15 20 25 30 35 40
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 1.019 1.241 2.945 16.661
MIX 1.211 1.828 3.155 20.328
GROUP 1.359 1.918 3.182 33.386
RANDOM+ 0.418 1.110 2.227 9.081
MIX+ 0.378 1.359 2.770 13.216
GROUP+ 0.874 1.461 2.677 24.343
(b) External nodes of group.
43
0 5 10 15 20 25
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 1.714 2.369 5.800 21.432
MIX 1.204 1.531 2.683 13.712
GROUP 0.200 0.842 2.263 4.831
RANDOM+ 0.509 0.914 2.683 12.584
MIX+ 0.219 0.467 1.569 7.110
GROUP+ 0.158 0.208 0.641 2.301
(c) The file source node.
Figure 4-7 Performance comparison when the number of group nodes is 48.
0 5 10 15 20 25 30
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 2.193 2.338 4.936 23.687
MIX 1.539 2.301 5.337 25.903
GROUP 1.372 2.241 5.423 22.574
RANDOM+ 0.782 1.557 2.430 15.390
MIX+ 0.705 1.337 2.297 14.959
GROUP+ 0.752 1.369 2.451 16.014
(a) Internal nodes of group.
45
0 5 10 15 20 25
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 0.950 1.564 3.015 15.270
MIX 1.160 1.641 2.876 16.117
GROUP 0.939 1.428 2.814 14.780
RANDOM+ 0.441 0.939 2.708 10.675
MIX+ 0.421 1.116 2.442 11.258
GROUP+ 0.411 1.098 2.593 13.572
(b) External nodes of group.
0 5 10 15 20 25 30
0 1K 10K 100K 1M
Share size (Byte)
RANDOM MIX GROUP
RANDOM+ MIX+ GROUP+
Download time (s)
1K 10K 100K 1M
RANDOM 1.714 2.369 5.800 21.432
MIX 1.204 1.531 2.683 13.712
GROUP 0.200 0.842 2.263 4.831
RANDOM+ 0.509 0.914 2.683 12.584
MIX+ 0.219 0.467 1.569 7.110
GROUP+ 0.158 0.208 0.641 2.301
(c) The file source node.
Figure 4-8 Performance comparison when the number of group nodes is 320.
47
5 Conclusion and Future Works
Peeraid is a high scalable, robust, and fault-tolerant peer-to-peer file system based on DHT for open clouds structured as a collection of peer nodes which are supplied by different participants around the world. It provides three basic mechanisms to ensure file security.
Erasure coding is applied for file backup to achieve higher data availability with less storage cost. Assuming the average node availability is 0.8, erasure coding achieves higher availability than replication when storage cost is more than double.
Peeraid supports the concept of group with two mechanisms. First, the information about a group is recorded in a SysInfo-object. Only the group manager has write permission for its SysInfo-object. Second, every group has their own routing table for key lookup. Peeraid combines file distribution with group locality to improve access performance. From our experiment results, distributing file shares with GROUP policy can achieve better access performance than MIX and RANDOM for the internal nodes of the group.
At present, Peeraid does not guarantee file consistence when multiple users write a sharing file. If there are conflicts between write logs, users have to select the ones they trust by themselves. A mechanism for synchronization should be provided in the future.
Besides, although erasure coding achieves higher data availability, we still need an efficient way to maintain the number of file shares in the system for durability.
Eventually, accounting is another important issue in our future research.
Bibliography
[1] A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, “Ivy: a read/write peer-to-peer file system,” in OSDI, 2002.
[2] A. Rowstron and P. Druschel, “Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems,” in IFIP/ACM Middleware, 2001.
[3] A. Rowstron and P. Druschel, “Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility,” in ACM SOSP, 2001.
[4] B. Cohen, “Incentives build robustness in BitTorrent,” May, 2003
[5] B. G. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. F. Kaashoek, J. Kubiatowicz, and R. Morris, “Efficient replica maintenance for distributed storage systems,” in Proceeding of the 3rd Conferenc on 3rd Symposium on Networked Systems Design & Implementation, vol. 3, 2006.
[6] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph, “Tapestry: an infrastructure for fault-resilient wide-area location and routing,“ U.C. Berkeley, Tech. Rep.
No. USB//CSD-01-1141, April 2001.
[7] Clip2, The Gnutella Protocol Specification v0.4, 2000.
[8] D. A. Patterson, G. Gibson, and R. H. Katz, “A case for redundant arrays of inexpensive disks,” ACM SIGMOD Record, vol. 17, pp. 109-116, June 1988.
[9] D. Mazi è res, M. Kaminsky, M. F. Kaashoek, and Emmett Witchel, ”Separating key management from file system security,” ACM SIGOPS Operating Systems Review, vol. 33, pp. 124-139, December 1999.
[10] F. Dabek et al, “Wide-area cooperative storage with CFS,” in Usenix SOSP, 2001.
[11] F. Dabek, J. Li, E. Sit, J. Robertson, M. F. Kaashoek, and R. Morris, “Design a DHT for low latency and high throughput,” in USENIX-NSDI, 2004.
[12] FIPS 180-1, Secure hash standard, U.S. Department of Commerce/NIST, National Technical Information Service, Springfield, VA, April 1995.
[13] H. Weatherspoon and J. D. Kubiatowicz, “Erasure coding vs. replication: a quantitative comparison,” in Electronic Proceedings for the 1st International Workshop on Peer-to-Peer Systems, March 2002.
[14] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord:
a scalable peer-to-peer lookup service for Internet applications,” in ACM SIGCOMM, 2001.
[15] J. Kubiatowicz et al, “OceanStore: an architecture for global-scale persistent storage,” in ACM ASPLOS, 2000.
[16] Kazaa, http://www.kazaa.com.
[17] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A Break in the Clouds: Towards a Cloud Definition,” ACM SIGCOMM Computer Communication Review, vol. 39, pp. 50-55, January 2009.
[18] M. A. Armbrust et al, ”Above the clouds: a Berkeley view of cloud computing,” UC Berkeley Reliable Adaptive Distributed Systems Laboratory, Tech. Rep. No. UCB/EECS-2009-28, 2009.
[19] M. D. Stefano, Distributed data management for grid computing, Hoboken, N.J., Wiley-Interscience, 2005.
[20] Morpheus, http://www.morpheus.com
[21] M. Satyanarayanan et al, ”Coda: a highly available file system for a distributed workstation environment,” IEEE Transactions on Computers, vol.
39, pp. 447-459, April 1990.
[22] Napster Inc., http://www.napster.com.
[23] P. Maymounkov and D. Mazières, “Kademlia: a peer-to-peer information system based on the XOR metric,” in Electronic Proceedings for the 1st International Workshop on Peer-to-Peer Systems, March 2002.
[24] R. Pike, D. Presotto, S. Dorward, B. Flandrena, K. Thompson, H. Trickey, and P. Winterbottom, “Plan 9 from Bell Labs,” Computing Systems, 1995.
[25] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A scalable content-addressable network,” in ACM SIGCOMM, August 2001.
[26] S. Shepler et al, Network File System (NFS) Version 4 Protocol, RFC 3530, April 2003.