
4.3 Current Status and System Requirements

AshFS is currently implemented in C on Linux. The server is a standard OpenSSH server, and the client is a user-level process that forks other processes in the background. Figure 8 shows the basic flow of the current AshFS when a user performs a file system operation.

AshFS uses the OpenSSH client applications to communicate with the server: “ssh” for passing control messages, and “sftp” for retrieving and updating files. At initialization, AshFS creates two processes, one to execute each of these programs, and then returns to the main procedure.
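As a rough illustration (not the actual AshFS source), the following C sketch shows one way such helper processes can be spawned with fork() and execvp(); the host name and the command-line arguments are placeholders only.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Spawn an OpenSSH client program (e.g. "ssh" or "sftp") as a child
     * process and return its PID, so the main procedure can continue. */
    static pid_t spawn_helper(char *const argv[])
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {                 /* child: replace image with helper */
            execvp(argv[0], argv);
            perror("execvp");           /* reached only if exec fails */
            _exit(127);
        }
        return pid;                     /* parent: keep running the client */
    }

    int main(void)
    {
        /* Placeholder arguments; a real client would build these from its
         * configuration (server address, user name, batch files, ...). */
        char *ssh_argv[]  = { "ssh",  "user@example.com", NULL };
        char *sftp_argv[] = { "sftp", "-b", "batch.txt", "user@example.com", NULL };

        pid_t ssh_pid  = spawn_helper(ssh_argv);
        pid_t sftp_pid = spawn_helper(sftp_argv);
        printf("ssh pid=%d, sftp pid=%d\n", (int)ssh_pid, (int)sftp_pid);
        return 0;
    }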

To realize AshFS, we implemented a set of functions such as read(), write(), getattr(), and readdir(). FUSE uses a special structure containing function pointers to this set of functions, and these functions overlay the normal file system functions [12]. They access the cache structure we designed for AshFS and create further processes or threads, such as the upload manager, to start the corresponding mechanisms.
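The sketch below shows the shape of this registration using the FUSE high-level API; the callback bodies and the ash_ names are simplified placeholders rather than the real AshFS implementation, which would consult the AshFS cache and its helper processes.

    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <sys/stat.h>
    #include <string.h>
    #include <errno.h>

    /* Each callback would first consult the AshFS cache structure and only
     * fall back to the server (via the ssh/sftp helpers) on a miss. */
    static int ash_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
            return 0;
        }
        return -ENOENT;   /* placeholder: real code answers from the cache */
    }

    static int ash_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                           off_t offset, struct fuse_file_info *fi)
    {
        (void)path; (void)offset; (void)fi;
        filler(buf, ".", NULL, 0);
        filler(buf, "..", NULL, 0);
        return 0;         /* placeholder: real code lists cached entries */
    }

    /* FUSE takes a structure of function pointers that overlays the
     * ordinary file system entry points. */
    static struct fuse_operations ash_ops = {
        .getattr = ash_getattr,
        .readdir = ash_readdir,
        /* .read, .write, .mkdir, ... are registered the same way */
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &ash_ops, NULL);
    }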

To run AshFS, besides the AshFS client program we wrote, the client-side computer must have the FUSE framework installed (both the FUSE kernel module and the library) as well as the OpenSSH client suite, which contains at least the ssh and sftp commands. Naturally, an SSH server must also be available.

Figure 8: Path of stat request in AshFS

Chapter 5

Experiments and Performance Evaluation

In this chapter, we set up several experiments to test whether AshFS is suitable for our proposed environment. For an objective comparison, we included some of the related systems in the experiments. Since MAFS is implemented on FreeBSD and the RSC file system is designed for the Symbian OS, they were not tested. Below we list the results and analyze the data. At the end of each experiment, we try to interpret the relationship between the results and our design principles.

5.1 Experiment Environment and Tools

To explore how our file system performs, we carried out several experiments. We connected two computers to a Gigabit network; one acted as the client and the other as the server. The client computer had an Athlon XP 2600+ CPU and 2 GB of RAM. The server was a Pentium computer with a 3.0 GHz clock rate and 1.5 GB of RAM. Both client and server were running Linux with kernel 2.6.17 or later. In our experiments, the server shared part of its local file system; the client mounted it and performed file system operations under the mount point. Each experiment was run 10 times and the data presented here are the averages. The following tools were used:

• time-1.7-29-fc7: It times a simple command or gives resource usage [21]. We used it to measure the finishing time of each experiment.

• bonnie++-1.03: Bonnie++ is an industry-standard file system benchmark used to benchmark ideal performance in a uniform and repeatable way [19-20]. We used it to evaluate the write throughput of a single file.

• CBQ.init-v0.7.3: cbq.init is a Linux shell script that uses the CBQ (class-based queuing) mechanism to control network traffic [22]. We used it to constrain the bandwidth between the client and the server.

• iftop-0.17-6.fc7: It listens to network traffic on a named interface and displays current bandwidth usage by pairs of hosts [23]. We used it to measure the client’s bandwidth usage.

5.2 Frequencies of Filesystem Operations

In this section, we used several methods to test the operation speed of our target file systems. These results should be close to what users actually experience when using them.

5.2.1 List frequencies

In these experiments we executed the “list” operation and recorded the execution time on each target file system. Because some file systems have a permanent or temporary cache and the cache state affects performance, we made measurements for both cold and hot caches. In the first experiment, the client listed every sub-directory under the mounted directory and we measured the latencies. To ease comparison, we used frequency (1/second) instead of time (seconds). The results are shown in Figure 9.

It is apparent that AshFS and Coda were slower than the others when listing an uncached directory; we attribute this to the cache overhead. Even under this circumstance, AshFS was 10 times faster than Coda. Moreover, if the client reads the directory again, Coda and AshFS are faster than the others because they answer the request directly from the cache without contacting the server.

5.2.2 Frequencies of a sequence of operations

In this experiment, we measured the latencies of a series of file system operations: create, list (stat), and remove. First the client created a list of files, then it listed these files, and finally it deleted them in sequential order. Each kind of latency was measured, and Figure 10 shows the result. We also performed the same operations on the local file system (ext3); that result is denoted as “Local” in Figure 10.

The operation speed of AshFS was higher than that of all the other network file systems, and Coda also did well in this experiment. This is because, for most modifications, AshFS and Coda write to the local disk first without waiting for responses from the server, and communication often takes time.

Figure 9: Frequencies of List operations

Figure 10: Frequencies of a sequence of operations

5.3 Write Throughput

Next we evaluated the write throughput of each file system. To explore how bandwidth affects these file systems, we ran the experiments under two different network bandwidths. Furthermore, the bandwidth consumed by each operation was measured to compare the protocol overheads of the file systems.

5.3.1 Write throughput under unlimited bandwidth

There are two kinds of write throughput benchmarks: one writes a single large file, and the other writes many separate small files. For the former, we used Bonnie++ to benchmark our target file systems. For the latter, we manually copied a directory of varying total size from the local file system to the target file system and measured the execution time. Each directory contains many small files with sizes ranging from 100 B to 1 MB.

Figures 11 and 12 show the write throughput benchmarks. Obviously, sshfs had lower performance in both benchmarks because of its encryption. It stands to reason that the client’s local file system has the best throughput. Coda and AshFS led the others because they write to the local file system first and communicate little with the server at that time.

When writing separate small files, all the network file systems shown here are evidently slower than the local file system, because much more time is spent allocating space for the files and on the file system itself (many stat and create actions are executed). Note that AshFS performed slightly slower than Coda; the gap was almost constant for the single-file throughput but became smaller for the separate-files throughput. This could be because AshFS always handles its communication in batches (neither individually nor synchronously for every file operation), so AshFS performs well with small files.

Figure 11: Single file write throughput

Figure 12: Separate files write throughput

Figure 13 shows the write overheads. We measured the total transmitted and received traffic, subtracted the original size of the written file from it, and took this value as the overhead. In both benchmarks, Coda’s overhead was much higher than the others’, which may mean that Coda’s protocols are more talkative and therefore need more network resources for write operations. Sshfs had the second-highest overhead because of the SSH2 encryption. Although AshFS also uses SSH2, its overhead was 52-77% less than Coda’s and 32-34% less than sshfs’s. This shows that our design for reducing communication between client and server works.
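For clarity, the overhead can be expressed as the traffic in excess of the file size, as in this small sketch (the byte counts are illustrative, not measured values):

    #include <stdio.h>

    /* Overhead (%) = (bytes sent + bytes received - original file size)
     *                / original file size * 100 */
    static double write_overhead_percent(double sent, double received, double file_size)
    {
        return (sent + received - file_size) / file_size * 100.0;
    }

    int main(void)
    {
        /* Illustrative values: about 272.6 MB of traffic to write a
         * 256 MB file gives an overhead of roughly 6.5%. */
        double mb = 1024.0 * 1024.0;
        printf("overhead = %.2f%%\n",
               write_overhead_percent(270.0 * mb, 2.6 * mb, 256.0 * mb));
        return 0;
    }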

Figure 13: Write overhead

5.3.2 Write throughput under lower bandwidth

To simulate mobile networks with a typical transfer rate, we constrained the experiment bandwidth to 4096 kbps (512 KB/s) for both downloading and uploading.

Figures 14 and 15 show the experimental results. NFS, SMB, and sshfs performed notably worse than the others; their throughputs were all approximately 0.46 MB/s. (We have enlarged their data to make the figures clearer; the original data appear as three overlapping lines lying on the x-axis.) This shows that their performance was severely constrained by the available network bandwidth, because they do not have a persistent cache and usually need a steady, strong connection to the server. AshFS and Coda both adopt an asynchronous write policy, meaning that files are written to the local disk first, so their throughputs were similar to those under unlimited bandwidth.

Figure 14: Single file throughput in 512KBps

Figure 15: Separate files throughput in 512KBps

Figure 16 shows the write overheads under 512 KB/s bandwidth. There are no large differences except for Coda’s overhead. Compared with the overhead under unlimited bandwidth, Coda’s overhead became smaller, but it still used 51-52% more network resources than AshFS.

Figure 16: Write overhead in 512 KBps

The tables below give the numeric benchmark data for reference; all throughput values are in MB/s.

                AshFS     Coda      NFS       SMB       sshfs     Local(ext3)
thruput-256M    37.314    42.007    28.086    30.660    10.446    47.482
thruput-384M    37.701    41.034    24.769    27.105    10.019    46.078
thruput-512M    37.703    40.251    21.453    23.549    10.257    44.299
thruput-640M    37.584    39.987    23.002    21.406     9.931    46.431
thruput-768M    36.753    39.840    22.138    19.738     9.181    43.542
overhead (%)     6.497    28.126     6.449     5.750     9.938     0.000

Table 3: Single file throughput under high BW

                AshFS     Coda      NFS       SMB       sshfs     Local(ext3)
thruput-256M    27.236    34.839    19.460    17.676     8.628    40.962
thruput-384M    28.197    31.457    18.941    17.844     8.615    42.025
thruput-512M    27.065    27.137    18.768    17.740     8.751    41.524
thruput-640M    27.356    28.938    19.468    17.776     8.745    42.875
thruput-768M    28.927    30.373    17.429    17.114     7.997    42.381
overhead (%)     6.483    13.467     6.820     8.275     9.648     0.000

Table 4: Separate files throughput under high BW

                AshFS     Coda      NFS       SMB       sshfs     Local(ext3)
thruput-256M    33.066    40.751    0.4656    0.4679    0.4601    58.084
thruput-384M    35.071    39.795    0.4664    0.4675    0.4610    46.748
thruput-512M    35.664    38.294    0.4659    0.4688    0.4601    46.361
thruput-640M    34.229    39.587    0.4658    0.4676    0.4611    45.767
thruput-768M    35.026    40.243    0.4656    0.4672    0.4613    46.748
thruput-896M    35.256    40.687    0.4654    0.4671    0.4602    45.384
overhead (%)     6.297    12.825    6.406     5.719     9.516      0.000

Table 5: Single file throughput under 512KBps

                AshFS     Coda      NFS       SMB       sshfs     Local(ext3)
thruput-256M    24.267    33.810    0.4648    0.4655    0.4603    53.076
thruput-384M    25.120    33.550    0.4642    0.4651    0.4605    47.206
thruput-512M    24.700    29.922    0.4644    0.4660    0.4608    46.166
thruput-640M    23.917    28.693    0.4649    0.4656    0.4605    43.304
thruput-768M    23.358    24.710    0.4647    0.4653    0.4605    41.776
thruput-896M    23.210    26.963    0.4644    0.4659    0.4600    39.282
overhead (%)     6.375    13.375    6.750     6.219     9.766      0.000

Table 6: Separate files throughput under 512KBps

5.4 Synchronization Performance

5.4.1 Synchronization time

How long does it take to propagate all changed files to the server? We measured this in this experiment. First we disconnected the network connection and then wrote several files to our target file systems. Because the other systems lack support for disconnected operations, only Coda and AshFS remained as targets. After the writing finished, we reconnected the network, and we started timing as soon as the target file system began to synchronize. Figure 17 shows the result; the horizontal axis denotes the total size of all written files. Obviously, AshFS finishes synchronizing faster. The reason is that AshFS has a lower protocol overhead, so it transfers less data when synchronizing. In fact, we also recorded the transfer rates: the average transfer rate of AshFS was about 15 MB/s, while Coda’s was merely about 9 MB/s. This means that even if they transfer the same amount of data, AshFS is more efficient.

Figure 17: Synchronization time

                  AshFS     Coda
time-128M (sec)   13.530    24.797
time-256M (sec)   22.518    39.820
time-384M (sec)   32.945    61.950
time-512M (sec)   42.483    75.86
time-640M (sec)   57.260    93.72

Table 7: Synchronization time

5.4.2 Rewriting speedup

The previous write throughputs measure the writing of new files to the mount point. In practice, users often write to existing files, and we used the “rsync” utility to improve the rewriting performance of AshFS. In this experiment, we evaluated how much faster it is than the original writing method. First we wrote a directory containing many files of different sizes to the mount point. After these files had all been successfully propagated to the server, we modified every file with a small change: we opened each one with a text editor and appended two new lines of text. The target file systems perceive the changes to the cached files and then synchronize them with the server’s copies. (For the file systems with no persistent cache, the changes are replayed directly on the server’s files.) After the synchronization finished, we appended to these files again. Similar actions were repeated three times, and the results shown here are the averages. The bandwidth consumed by each write/rewrite operation was measured and compared with the original file size. Here we borrow the term “Speedup” from rsync and follow its original definition:

• Speedup = original file size / total size of transferred data

We can think of Speedup as a bandwidth-preserving factor. Obviously we want it to be as large as possible, since then only a little bandwidth is needed to finish synchronization. In the normal case, Speedup is usually less than 1 because of the protocol overhead (it costs more than N bytes to transfer an N-byte file).
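As a concrete (made-up) illustration of the definition, the following sketch computes Speedup for a full upload of a new file and for a delta transfer of a rewritten file:

    #include <stdio.h>

    /* Speedup = original file size / total size of transferred data */
    static double speedup(double file_size, double transferred)
    {
        return file_size / transferred;
    }

    int main(void)
    {
        double mb = 1024.0 * 1024.0;

        /* Writing a new 100 MB file: the whole content plus protocol
         * overhead goes over the wire, so Speedup is a little below 1. */
        printf("new file : %.2f\n", speedup(100.0 * mb, 107.0 * mb));

        /* Rewriting the same file after appending two lines: only the
         * differences are sent, so Speedup is far above 1. */
        printf("rewrite  : %.2f\n", speedup(100.0 * mb, 0.5 * mb));
        return 0;
    }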

The results are shown in Figure 18, where AshFS (z on) means AshFS with compression-enabled rsync, i.e., AshFS compresses some data when synchronizing. When our target file systems uploaded new files to the server, they all transferred the complete contents of the files, so all the Speedups are a little less than 1; in other words, there is no improvement.

Traditional cache-enabled network file systems usually upload every changed file to the server in its entirety, but AshFS recognizes that these files already exist on the server and finds the differences between the newer and older versions. When writing the same file again, AshFS propagates only the modified contents. Since only a small part of the file is transferred, Speedup is much greater than 1 for the rewriting operation. Intuitively, with compression enabled, AshFS (z on) achieves an even larger Speedup. Note that the Speedup of NFS is slightly greater than 1 for the rewriting operation, which suggests that NFS may compress the file data when rewriting. Samba and sshfs perform notably worse than when writing new files: their Speedups are almost half of the original ones, and we noticed that the amount of data they received was almost equal to the amount they sent (and almost equal to the total directory size). This looks strange, because for write operations the received traffic is usually much smaller than the sent traffic. We think this may be due to temporary cached files: Samba and sshfs found that the user was writing to an existing file, assumed it might be rewritten several more times, and therefore first downloaded the file into their memory caches. After fetching the file and writing to it, these file systems then uploaded it to the server, which explains why the sent and received traffic were both almost equal to the file size. However, neither supports a persistent cache, so when the user wrote to the same file after a while (the cached copy having been invalidated or replaced), they had to retrieve it from the server again. Therefore Samba and sshfs transferred twice the amount of data in the rewriting operations. Finally, lacking a synchronization algorithm similar to rsync, Coda shows the same Speedup in both the write and rewrite benchmarks, because it transfers similar contents again and again.
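As an illustration of this mechanism, the sketch below shows one plausible way a client could hand a changed cached file to the rsync utility over ssh. The paths, host name, and the use of system() are simplifications for illustration, not the actual AshFS code; the -z option corresponds to the compression used in the “AshFS (z on)” configuration.

    #include <stdio.h>
    #include <stdlib.h>

    /* Push a locally cached file to the server with rsync, which transfers
     * only the blocks that differ from the server's copy.  When 'compress'
     * is set, rsync additionally compresses the transferred data (-z). */
    static int sync_cached_file(const char *local_path, const char *remote_spec,
                                int compress)
    {
        char cmd[1024];
        snprintf(cmd, sizeof(cmd), "rsync %s -e ssh \"%s\" \"%s\"",
                 compress ? "-az" : "-a", local_path, remote_spec);
        return system(cmd);   /* a real client would fork/exec and check status */
    }

    int main(void)
    {
        /* Placeholder paths and host name for illustration only. */
        return sync_cached_file("/var/cache/ashfs/report.txt",
                                "user@example.com:/home/user/report.txt", 1);
    }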

Figure 18: Rewriting speedup

Chapter 6

Conclusion and Future work

In this thesis, we described the design and implementation of AshFS, a personal network file system designed for mobile networks. AshFS supports disconnected operations and automatic synchronization, so it can function correctly under different network environments. We chose SSH2 as the main protocol for AshFS because it is widely deployed and secure. To ease installation and deployment, no server-side changes were made.

AshFS is built on FUSE, a user-space file system framework. Although its functionality is implemented in user space, the experiments show that its performance is acceptable. Since the growth in computing power has outpaced the speed improvements in network equipment, building a network file system in user space is a feasible solution.

AshFS has higher throughput and lower protocol overhead than the majority of the network file systems in our experiments. For the rewriting operation, a frequently executed action in a file system, AshFS preserves much more bandwidth than the others. In addition, its performance is not severely affected by the available bandwidth, and AshFS quickly becomes aware of variations in the network status. Future work is to make this experimental implementation more reliable and easier to manage.

References

[1] I. Voras and M. Zagar, “Network Distributed File System in User Space,” Information Technology Interfaces, 2006. 28th International Conference on, 2006, pp. 669-674.

[2] P. J. Braam and P. A. Nelson, “Removing Bottlenecks in Distributed Filesystems: Coda & InterMezzo as Examples,” Proceedings of Linux Expo 1999, May 1999.

[3] J. Tolvanen, T. Suihko, J. Lipasti, and N. Asokan, “Remote Storage for Mobile Devices,” First International Conference on Communication System Software and Middleware (COMSWARE 2006), Jan 2006, pp. 1-9.

[4] M. Satyanarayanan, “Coda: A Highly Available File System for a Distributed Workstation Environment,” in Proceedings of the Second IEEE Workshop on Workstation Operating Systems, 1989.

[5] B. Atkin and K. P. Birman, “Network-Aware Adaptation Techniques for Mobile File Systems,” Fifth IEEE International Symposium on Network Computing and Applications (NCA 2006), July 2006, pp. 181-188.

[6] P. J. Braam, “File Systems for Clusters from a Protocol Perspective,” Second Extreme Linux Topic Workshop, Monterey CA, 1999.

[7] I-Hsuan Huang, Wei-Jin Tzeng, Szu-Wei Wang, and Cheng-Zen Yang, “Design and Implementation of a Mobile SSH Protocol,” TENCON 2006, IEEE Region 10 Conference, November 2006, pp. 1-4.

[8] OpenSSH, project home http://www.openssh.com/ .

[9] Christopher Smith, “Linux NFS-HOWTO,” online publication, May 2006, http://nfs.sourceforge.net/nfs-howto/ .

[10] A. Josey, chair, “POSIX Threads,” IEEE Standard 1003.1, 2004 Edition.

[11] FUSE: Filesystem in Userspace, project home

[13] Brian Pawlowski, Spencer Shepler, Carl Beame, Brent Callaghan, Michael Eisler, David Noveck, David Robinson, Robert Thurlow, “The NFS Version 4 Protocol,” 2nd International SANE Conference, May 22-25 2000 MECC, Maastricht, The Netherlands.

[14] Jelmer R. Vernooij, John H. Terpstra, and Gerald (Jerry) Carter, “The Official Samba 3.2.x HOWTO and Reference Guide,” online publication, http://us3.samba.org/samba/docs/man/Samba-HOWTO-Collection/ .

[15] John D. Blair, “Samba: Integrating Unix and Windows,” Specialized Systems Consultants Inc, Seattle, 1998.

[16] Matthew E. Hoskins, "SSHFS: Super Easy File Access over SSH," Linux Journal, April 28th 2006.

[17] SSH Filesystem, http://fuse.sourceforge.net/sshfs.html .

[18] J. Postel and J. Reynolds, “File Transfer Protocol (FTP)”, IETF RFC Document #959, 1985.

[19] bonnie++(8) - Linux man page, online publication http://linux.die.net/man/8/bonnie++/ .

[20] Tim Bray and Russell Coker, “bonnie++ readme,” online publication, 1999, http://www.coker.com.au/bonnie++/readme.html .

[21] time(1) - Linux man page, online publication http://linux.die.net/man/1/time .

[22] Tomasz Chmielewski, “Bandwidth Limiting HOWTO,” online publication, Nov 20th 2001.

[23] Paul Warren and Chris Lightfoot, “iftop: Display bandwidth usage on an interface,” online publication, Feb 12th 2006, http://www.ex-parrot.com/~pdw/iftop/ .

[24] John C.S. Lui, Oldfield K.Y. So, and T.S. Tam, “NFS/M: An Open Platform Mobile File System,” The 18th International Conference on Distributed Computing Systems (ICDCS’98), May 1998.

[25] P. J. Braam, “The Coda Distributed File System,” Linux Journal, #50, June 1998.

[26] Athicha Muthitacharoen, Benjie Chen, and David Mazieres, “A Low-bandwidth Network File System,” Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, 2001, pp. 174-187.

[27] Michael Holve, “A Tutorial on Using rsync,” online publication, Nov 12th 1999, http://everythinglinux.org/rsync/ .

[28] Rsync man page, Jun 29th 2008, http://samba.anu.edu.au/ftp/rsync/rsync.html .
