Chapter 4 Journaling analyses
4.5 Observation
In Sections 4.3 and 4.4, we observed the factors that affect a journaling file system. Journal I/O is necessary for consistency recovery but degrades performance. A longer commit interval yields higher performance because of the delayed-write effect, but more data is lost if a crash occurs.
We propose remote journaling in the next chapter. Remote journaling moves journal I/O from the disk to the network and uses a short commit interval, which implies strong consistency semantics.
The network overhead and throughput benefit of remote journaling are estimated in our experiments.
Chapter 5
Remote Journaling
This chapter explains the concept of remote journaling and how it can be used. The experiments are presented in the next chapter.
5.1 Concept of Remote Journaling
In this chapter, we propose a remote journaling architecture, which writes journal data to a remote journal server instead of the local disk. In addition to guaranteeing consistency, a remote journaling file system achieves performance similar to that of a non-journaling file system. This is because journal data can be sent out as soon as it is generated, so journal traffic does not harm file system performance.
Unlike disk I/O, network transmission requires no positioning time. Positioning time, which consists of seek time and rotational delay, is time-consuming and harms performance. The remote journal scheme avoids the extra positioning time that flushing the journal to the local disk would require, and thereby improves performance.
Moreover, remote journaling is an inexpensive solution. Many hosts can share a single journal server at the same time. Since the workload of the journal server is write-dominated, the disk layout of the journal can be designed to optimize write performance.
5.2 Applications
Remote journaling can be applied to any journaling file system. The mechanism is especially useful for metadata-bound workloads, such as online transaction environments, web servers, or news servers. The network bandwidth consumed by remote journaling is acceptable in network applications. File system consistency recovery, which is the same as the traditional process except that the journal is read from the network, reduces downtime, which is important for a commercial service. Moreover, remote journaling can be used in a storage cluster, as in Figure 3, where the storage servers share the cost of the journal server.
Figure 3 The example application
5.3 Implementations
We modify the kjournald daemon for remote journaling. When the user specifies a remote journal mode in the file system mount table, the modified kjournald tries to connect to the remote journal server. Metadata is transferred over TCP/IP, which provides reliable delivery, so the data arrives without loss. However, if a transmission error occurs, for example because of network congestion or a server failure, the modified kjournald switches to local journal mode to preserve file system consistency.
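The fallback behaviour described above can be illustrated with a minimal user-space sketch. It is not the kernel code: the class name JournalWriter, the in-memory local_journal list, and the injectable connect parameter are all assumptions made for illustration, and the real change lives inside kjournald.

```python
import socket


class JournalWriter:
    """Hypothetical stand-in for the remote/local mode switch (not kernel code)."""

    def __init__(self, server_addr, connect=socket.create_connection):
        # local_journal stands in for the on-disk journal (assumption).
        self.local_journal = []
        try:
            # TCP connection to the remote journal server.
            self.conn = connect(server_addr)
        except OSError:
            # Server unreachable at mount time: start in local journal mode.
            self.conn = None

    @property
    def mode(self):
        return "remote" if self.conn is not None else "local"

    def commit(self, block):
        """Send one journal block; fall back to the local journal on error."""
        if self.conn is not None:
            try:
                self.conn.sendall(block)
                return "remote"
            except OSError:
                # Transmission error: drop the connection and stay local.
                self.conn.close()
                self.conn = None
        self.local_journal.append(block)
        return "local"
```

In this sketch the switch is one-way: once a transmission error has occurred, later commits go to the local journal. Whether and when to return to remote mode is a policy choice that the text does not specify.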
The main function for flushing the journal in kjournald is journal_commit_transaction. It first updates the journal superblock, which holds the journal information; we modify the journal_update_superblock function to commit to the network instead of the disk. Next, journal_commit_transaction commits the data buffers in ordered mode. After all data that must be flushed before the metadata has been written, we collect the buffers that hold journal data and commit them from the buffer cache layer to the network layer. When all commits have completed, we insert a checkpoint and release the file system transaction.
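The order of these commit steps can be sketched as follows. This is only an illustration of the sequence, written in Python rather than kernel C. The function commit_transaction, the dictionary layout of the transaction, and the send and flush_data callbacks are hypothetical names, not part of JBD.

```python
def commit_transaction(txn, send, flush_data):
    """Run the commit steps in the order described in the text.

    txn        -- dict with keys "mode", "superblock", "data", "journal_buffers"
    send       -- callable that ships one record to the journal server
    flush_data -- callable that writes ordered-mode data to the local disk
    """
    # 1. The journal superblock is committed to the network, not the disk.
    send(("superblock", txn["superblock"]))

    # 2. In ordered mode, file data must reach the disk before the metadata
    #    that refers to it is journaled.
    if txn["mode"] == "ordered":
        flush_data(txn["data"])

    # 3. Buffers holding journal data go from the buffer cache to the network.
    for buf in txn["journal_buffers"]:
        send(("journal", buf))

    # 4. All commits are done: mark the checkpoint and release the transaction.
    txn["checkpointed"] = True
    return txn
```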
When the recovery process starts, we try to connect to the remote server and read the journal superblock and the other information needed for recovery from it. If any error occurs, we have to scan the whole disk, because no journal information is available for recovery.
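The recovery decision can be summarised in a short sketch. The three callbacks (fetch_remote_journal, replay, full_scan) are hypothetical placeholders for reading the journal from the server, replaying it, and running a whole-disk consistency check.

```python
def recover(fetch_remote_journal, replay, full_scan):
    """Replay the remote journal if it can be read, else scan the whole disk."""
    try:
        # Read the journal superblock and journal records from the server.
        superblock, records = fetch_remote_journal()
    except OSError:
        # No journal information is available: fall back to a full scan.
        full_scan()
        return "full-scan"
    replay(superblock, records)
    return "replayed"
```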
The best choice of file system for the remote journal server is a log-structured file system. The workload on the remote journal server is write-oriented most of the time, so a log-structured file system fits it for performance reasons, and it can also provide strong consistency for the server's file system. Fast recovery of the remote server's file system is important, because any error on the remote server forces its clients to scan their whole disks.
Chapter 6
Experimental results
In this chapter, we estimate the performance gain and the overhead brought by remote journaling. The remote journal scheme moves journal data traffic from the disk to the network, so we measure the performance improvement, the network bandwidth usage, and other factors.
6.1 Performance comparisons
This section compares the throughput of the three journal modes of the Ext3 file system. We add a non-journal series as the upper bound, which helps us assess the overhead of remote journaling. Figure 4 shows that the overhead of remote journaling is about 5% to 7% relative to the upper bound, while remote journaling still outperforms normal (local) journaling by about 10% (in writeback and ordered modes) to 21% (in journal mode).
Generally speaking, Ext3 journal mode, which has to journal both data and metadata, incurs higher overhead, so its performance is only 78% of that of writeback mode. With remote journaling, however, the gap between writeback mode and journal mode narrows: Figure 4 indicates that the performance of journal mode rises to 86% of that of writeback mode. This rise is significant because it makes journal mode more practical to use.
Figure 4 performance of remote journal
6.2 CPU utilization of remote journal
In this section, we record the CPU utilization of the three modes and compare them. Figures 5, 6, and 7 show the curves for the normal, non-journal, and remote-journal series.
Figure 5 CPU utilization of writeback mode
Figure 6 CPU utilization of ordered mode
Figure 7 CPU utilization of journal mode
The CPU utilization of the non-journal and remote-journal series is about 5% to 20% higher than that of normal mode. However, the total time needed to complete the benchmark is shorter.
To understand how much CPU overhead remote journaling brings, we integrate the area under the curves in Figures 5 to 7 and show the results in Tables 3, 4, and 5. Although remote journaling leads to higher CPU utilization, the total CPU time stays close to that of normal mode (the difference is less than 10%).
Figure 8 CPU time comparison (series: Writeback, RJ-Writeback, Non-Writeback, Ordered, RJ-Ordered, Non-Ordered, Journal, RJ-Journal, Non-Journal)
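Integrating a utilization curve to obtain total CPU time can be done with the trapezoidal rule. The sketch below assumes utilization samples taken at a fixed interval. The function name cpu_time and the sample values are invented for illustration and are not measurements from our experiments.

```python
def cpu_time(samples, interval):
    """Approximate the area under a CPU utilization curve.

    samples  -- utilization values as fractions (0.0 to 1.0), evenly spaced
    interval -- seconds between two consecutive samples
    Returns the CPU time in seconds (trapezoidal rule).
    """
    area = 0.0
    for left, right in zip(samples, samples[1:]):
        area += (left + right) / 2.0 * interval
    return area


# Illustrative values only: a run at 20% utilization for 10 s and a run at
# 25% utilization for 8 s consume about the same CPU time.
normal_run = cpu_time([0.20] * 11, 1.0)
remote_run = cpu_time([0.25] * 9, 1.0)
```

Both illustrative runs come to about 2.0 CPU-seconds, which shows how a series with higher utilization but a shorter completion time can end up with a total CPU time close to that of normal mode.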
6.3 Network bandwidth taken by remote journal
Although file system performance benefits from moving journal I/O from the disk to the network, the remote journal scheme may reduce the network bandwidth available to network applications. We therefore estimate the network bandwidth consumed by remote journaling in the three modes.
The results are shown in Figure 9. The y-axis shows the percentage of gigabit Ethernet bandwidth used by remote journaling, and the x-axis shows the three modes of the Ext3 file system.
Because writeback and ordered modes log only metadata whereas journal mode logs both metadata and data, journal mode places a heavier burden on the network than the other two modes. In our workload, writeback and ordered modes each use less than 2% of the gigabit Ethernet bandwidth, and journal mode uses 6.6%. Note that the network burden in journal mode varies with the workload: the network overhead of remote journaling grows as the workload includes larger data.
Figure 9 Network usage comparison (y-axis: network usage per second, 0% to 7%; x-axis: writeback, ordered, journal)
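To relate these percentages to a journal data rate, the share of a link can be computed from the number of bytes sent per second. This is a minimal sketch: the function name link_share is hypothetical, and the calculation ignores TCP/IP and Ethernet framing overhead.

```python
GIGABIT_BPS = 1_000_000_000  # nominal gigabit Ethernet rate in bits per second


def link_share(journal_bytes, seconds, link_bps=GIGABIT_BPS):
    """Fraction of the link used by journal traffic (protocol overhead ignored)."""
    return journal_bytes * 8 / seconds / link_bps


# 6.6% of a gigabit link corresponds to 66 Mbit/s, i.e. 8.25 MB/s of payload.
share = link_share(8_250_000, 1.0)
```

By the same arithmetic, a share below 2% corresponds to less than 2.5 MB of journal data per second.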
Chapter 7
Conclusion and future work
7.1 Conclusion
In this paper, we proposed a scheme named remote journal, which improves the performance of journaling file systems by moving journal I/O from the local disk to a remote journal disk over the network. If an error occurs during remote journaling, we switch from the remote journal to the local journal in order to guarantee fast recovery. The consistency semantics of the original file system are not harmed, because the same journal data is logged. We implemented the remote journal scheme in the Ext3 file system, a popular journaling file system in Linux, by modifying the JBD layer and the kjournald daemon.
The main advantage of remote journaling is a clear performance improvement, which results mainly from removing journal I/O from the local disk. Another advantage is hardware cost: one remote journal server can support many clients, so its cost can be shared.
According to the experiments in this paper, remote journaling improves performance by about 10% (in Ext3 writeback and ordered modes) to 21% (in Ext3 journal mode), and the penalty is light. Although remote journaling needs more CPU time for network transfer, the overhead is less than 10%. We also considered the effect of journal traffic on network bandwidth. In our experimental results, the overhead in writeback and ordered modes is light because only metadata is logged to the journal over the network; in our metadata-bound workload, it is less than 2% of the bandwidth. Journal mode, by contrast, has higher overhead because its journal traffic depends on the workload. With a workload of large files, journal traffic is heavy and the overhead is higher.
To sum up, the remote journal scheme is an easy and inexpensive way to improve the performance of a journaling file system. It can be used readily once the kernel has been patched. In most cases it delivers better performance and keeps the same file system consistency semantics at the cost of only a small overhead.
7.2 Future work
When the remote journal scheme is adopted in mobile storage, it can be used for disk power management. To save the power spent spinning an unused disk, most disk power-saving approaches try to lengthen the disk's sleep time. However, journaling requires frequent flushes for consistency, and these frequent journal flushes shorten the sleep time in mobile storage. As a result, the disk wastes more energy on mode switches, that is, on spinning up and down.
Remote journaling can solve this problem. The remote journal server can be an FTP server or a free mail space. If ordinary (wired) Ethernet is used, remote journaling saves energy by removing journal data I/O from the disk without any extra scheme. If a wireless network is used, the placement of the journal needs more consideration. Because data transfer over a wireless network also consumes considerable energy, a monitor and an arbiter are needed to control the journal data flow. The monitor watches the status of the disk and the wireless network and knows the power profile of each. The arbiter decides where to place journal data according to the monitor's reports. The journal superblock needs to be modified to indicate where the journal data is placed.
When the disk has just entered the sleep state, the arbiter redirects journal flushes to the remote server over the wireless network. If the wireless network enters sleep mode, or if the energy needed for the wireless transfer is greater than that needed to wake up the disk, the arbiter flushes the journal data to disk.
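The arbiter's placement rule can be written as a small decision function. This is a sketch under assumptions: the function name choose_journal_target and its parameters are hypothetical, the energy values are whatever units the monitor reports, and the case of an already-spinning disk (not covered in the text) is assumed to use the disk.

```python
def choose_journal_target(disk_asleep, wireless_asleep,
                          wireless_transfer_energy, disk_wakeup_energy):
    """Decide where the next journal flush goes: "disk" or "remote"."""
    if not disk_asleep:
        # Assumption: a disk that is already spinning takes the flush.
        return "disk"
    if wireless_asleep:
        # The wireless interface is asleep, so flush to the disk.
        return "disk"
    if wireless_transfer_energy > disk_wakeup_energy:
        # Sending over wireless would cost more than waking the disk.
        return "disk"
    # Disk is asleep and wireless is the cheaper path: keep the disk asleep.
    return "remote"
```

For example, with the disk asleep, the wireless interface awake, and a transfer cost below the disk wake-up cost, the function returns "remote", so the disk can stay asleep.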