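To build a high-performance, low-cost embedded network storage system, the whole system must be designed and tested end to end, both to locate its bottlenecks and to evaluate how much hardware the design should use to reach the best cost-performance. For the Samba network storage server, we provide a complete testing and design flow that traces the relationships among functions while Samba processes data.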
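The traced path is complete, including the kernel functions beneath each system call, which helps with hardware/software integration when designing an embedded storage system. We measured that data copying accounts for about 25% of system time at larger request file sizes. Using the traced path, we then moved the data-handling part of Samba into the kernel to achieve zero-copy and reduce context switches. For larger files this yields a performance improvement factor of nearly 1.31 and effectively lightens the CPU load.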
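The zero-copy idea can be illustrated from user space with the Linux `sendfile` system call, which lets the kernel move file data to a socket without passing the payload through a user-space buffer. The sketch below is not the in-kernel Samba modification described above; it only shows the general technique. The function names `send_file_zero_copy` and `demo`, and the use of a local socket pair, are assumptions made for illustration, and the code assumes a Linux host.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send the file at `path` over `sock` with os.sendfile; return bytes sent.

    Hypothetical helper for illustration only (not part of Samba).
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # The kernel transfers data from the file to the socket;
            # the payload is never read into a Python-level buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(),
                               sent_total, size - sent_total)
            if sent == 0:  # file ended early, e.g. it was truncated
                break
            sent_total += sent
    return sent_total


def demo():
    """Round-trip a small temporary file through a local socket pair."""
    payload = b"samba-zero-copy-demo\n" * 64  # small enough to fit socket buffers
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(sender, path)
        sender.close()  # signal end-of-stream to the receiver
        received = b""
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
    return sent, received, payload


if __name__ == "__main__":
    sent, received, payload = demo()
    print(sent == len(payload), received == payload)
```

A conventional `read`/`write` loop would copy each block from the kernel into a user buffer and back again, and would cross the user/kernel boundary twice per block; `sendfile` avoids both costs, which is the same saving the in-kernel data path targets.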
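Of the other 70% of system time, our measurements show that about 40% is spent on network processing. Further evaluation indicates that combining zero-copy with a TCP offload engine (TOE) could reach a performance improvement factor of about 2.62. If network processing is in future handled by a custom offload engine on an FPGA, combined with zero-copy in software, a high-performance, low-cost network storage server becomes achievable.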
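As a rough cross-check of these figures, an Amdahl-style bound gives the speedup if a given fraction of total time were eliminated entirely. This is only a sketch: the text does not say how 1.31 and 2.62 were derived, and treating the 40% network share as a fraction of total time is an assumption. The function name `ideal_speedup` is hypothetical.

```python
def ideal_speedup(eliminated_fraction: float) -> float:
    """Upper-bound speedup if `eliminated_fraction` of total time is removed."""
    if not 0.0 <= eliminated_fraction < 1.0:
        raise ValueError("eliminated_fraction must be in [0, 1)")
    return 1.0 / (1.0 - eliminated_fraction)


if __name__ == "__main__":
    copy_share = 0.25     # data copying, from the measurements above
    network_share = 0.40  # network processing, assumed to be a share of total time
    print(f"zero-copy only:  {ideal_speedup(copy_share):.2f}")                  # 1.33
    print(f"zero-copy + TOE: {ideal_speedup(copy_share + network_share):.2f}")  # 2.86
```

Both bounds sit slightly above the reported 1.31 and 2.62, which is consistent with the copy and network costs being reduced rather than removed completely.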