* Proposed architectures - Efficient String Matching Engines with Multi-Character State Transitions

The remaining rows of the table are for software approaches. The approach of Scarpazza et al. [19] was implemented on an IBM Cell/B.E. processor, which has eight SPEs (synergistic processing elements); each SPE achieves a throughput of 5 Gbps, for 40 Gbps jointly. The approach of Tumeo et al. [24] was implemented on an Nvidia Tesla C1060 GPU, which runs at 1296 MHz (shader clock) and has thirty cores, and achieves a throughput of up to 15.6 Gbps. In their later work [25], Tumeo et al. evaluated several architectures; this table lists only the highest-performing result, obtained on a Cray XMT with 128 processors, where the throughput is 28 Gbps. The approach of Yang and Prasanna [22] was implemented on a 32-core Intel Manycore Testing Lab machine based on Intel Xeon X7560 processors (an 8-core ‘Nehalem’ part running at 2.26 GHz), and the resulting throughput is 34 Gbps.

The top five rows list the results of the approaches proposed in this thesis. The proposed multi-character AC-DFA and the two configurable architectures are implemented in ASIC devices, and the proposed multi-character AC-NFA and hybrid AC-FA approaches are implemented in FPGA devices. In addition, the architecture with the configurable stage scheme has been optimized with a two-stage pipeline circuit. The results of the architecture with the configurable data-width scheme are obtained from an implementation with 4-character units.

Among the proposed multi-character string matching approaches, the AC-NFA approach has the best performance owing to the simplicity of its architecture. However, its circuit needs to be rebuilt whenever the keyword set changes, so it is best suited to programmable devices such as FPGAs. The AC-DFA approach can operate at a higher clock rate; nevertheless, the multi-character AC-DFA approach suffers from an explosion in the number of transitions as the number of characters inspected in parallel increases. Although the proposed configurable data-width architecture operates at a relatively low clock rate, it can be configured to process more characters in parallel and can obtain reasonable performance compared with the AC-DFA approach. In addition, pipelining techniques can be considered to raise the operating clock rate of the configurable string matching architecture.

7.2 Discussions

This comparison reveals the advantages of a hardware string matching accelerator. Modern CPUs are sophisticated products that run at very high clock rates and have wide data widths, but they are designed for general-purpose use. It is worth noting that a simple hardware string matching accelerator running at a much lower clock rate can achieve throughput comparable to that of a software program running on a very powerful CPU. Moreover, a hardware string matching accelerator that inspects multiple characters in parallel multiplies its throughput at the same clock rate. Compared with software approaches that process multiple texts in multiple threads, processing multiple characters in parallel in hardware is a more natural fit for real applications.
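The scaling argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes 8-bit characters and an idealized engine that consumes one block of characters in every clock cycle; the function name and the 200 MHz clock rate are illustrative only and are not figures from the comparison table.

```python
def throughput_gbps(clock_mhz: float, chars_per_cycle: int, bits_per_char: int = 8) -> float:
    """Ideal throughput in Gbps when one block of characters is consumed per cycle."""
    bits_per_second = clock_mhz * 1e6 * chars_per_cycle * bits_per_char
    return bits_per_second / 1e9


if __name__ == "__main__":
    # Hypothetical 200 MHz engine; the clock rate is an example value only.
    for k in (1, 2, 4, 8):
        print(f"{k} char(s)/cycle: {throughput_gbps(200, k):.1f} Gbps")
```

Under these assumptions the throughput grows linearly with the number of characters inspected per cycle, provided the clock rate can be held constant as the data width grows.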

This work aims to propose a systematic approach for deriving multi-character transitions and to develop highly efficient string matching engines capable of inspecting multiple characters in parallel; the implementations were built mainly to verify the effectiveness of the architectures. The results obtained are preliminary and can be improved further. For instance, priority multiplexers are used intensively in the proposed architectures and dominate their performance. The structure of a priority multiplexer is very similar to that of a CAM, which has been studied and improved in many previous works [33, 34]. It is therefore worth considering improving the priority multiplexer by drawing on the structure of CAMs. The performance of the proposed architectures should improve further once the priority multiplexer is improved.
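To illustrate why a priority multiplexer resembles a CAM, the following behavioral sketch splits the selection into a parallel compare step (the match lines of a CAM) and a priority encoder. It is a software model under assumed conventions, not the circuit used in the proposed architectures: the function names, the rule that a lower index has higher priority, and the example entries are all hypothetical.

```python
from typing import Hashable, List, Optional, Sequence


def match_lines(stored: Sequence[Hashable], key: Hashable) -> List[bool]:
    """Compare the key against every stored word (done in parallel in a CAM)."""
    return [word == key for word in stored]


def priority_encode(lines: Sequence[bool]) -> Optional[int]:
    """Return the index of the first asserted line; lower index means higher priority."""
    for index, hit in enumerate(lines):
        if hit:
            return index
    return None


def priority_mux(stored, values, key, default):
    """Select the value paired with the highest-priority matching entry."""
    index = priority_encode(match_lines(stored, key))
    return default if index is None else values[index]


if __name__ == "__main__":
    # Hypothetical entries: (current state, input characters) -> next state.
    stored = [(1, "ab"), (0, "ab"), (0, "ab")]
    values = [5, 3, 9]
    print(priority_mux(stored, values, (0, "ab"), default=0))  # 3: entry 1 outranks entry 2
```

In this model the compare step is independent for every entry, while the encoder forms a chain whose length grows with the number of entries, which is one plausible reason such a structure could limit the clock rate.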

Chapter 8 Conclusions

This thesis first presents three approaches to implementing the AC algorithm: the AC-DFA, AC-NFA, and hybrid AC-FA approaches. The AC-DFA approach can be implemented as a deterministic circuit but is inefficient in space. In contrast, the AC-NFA approach is efficient in space but nondeterministic in implementation. Therefore, this thesis proposes a hybrid AC-FA that combines the advantages of both the AC-DFA and AC-NFA approaches, i.e., space efficiency and a deterministic implementation. The deterministic implementation makes it possible to design a general string matching architecture that can process various keyword sets.
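The space and determinism trade-off can be seen in a small software model. The sketch below assumes the common reading in which the AC-NFA keeps the goto and failure functions of Aho and Corasick [1], while the AC-DFA precomputes one next state for every (state, character) pair; the hybrid AC-FA is not modeled, and the names build_ac, nfa_step, and to_dfa are illustrative.

```python
from collections import deque


def build_ac(keywords):
    """Build the goto, failure, and output functions of the AC machine."""
    goto, out = [{}], [set()]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(word)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the initial state
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out


def nfa_step(goto, fail, s, ch):
    """Consume one character; the number of lookups depends on the failure chain."""
    lookups = 1
    while s and ch not in goto[s]:
        s = fail[s]
        lookups += 1
    return goto[s].get(ch, 0), lookups


def to_dfa(goto, fail, alphabet):
    """Precompute exactly one next state per (state, character) pair."""
    return [{ch: nfa_step(goto, fail, s, ch)[0] for ch in alphabet}
            for s in range(len(goto))]


if __name__ == "__main__":
    goto, fail, out = build_ac(["he", "she", "his", "hers"])
    dfa = to_dfa(goto, fail, "ehirs")
    print("goto edges:", sum(len(g) for g in goto))   # 9
    print("DFA entries:", sum(len(d) for d in dfa))   # 50
    print(nfa_step(goto, fail, 5, "r"))               # (8, 2): two lookups for one character
```

For this keyword set the goto/failure form stores 9 edges plus one failure link per state, whereas the full table stores 50 entries over a five-character alphabet; in exchange, the table always resolves a character in a single lookup.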

Next, an intuitive algorithm is proposed to derive multi-character transitions from an AC-DFA, an AC-NFA, or a hybrid AC-FA, where each transition can match multiple characters at a time. The derivation algorithm also uses assistant transitions and a pseudo state to resolve the alignment problem. Several architectures are proposed to implement the derived multi-character AC-DFA, AC-NFA, and hybrid AC-FA, respectively. Moreover, configurable architectures are proposed to provide flexibility in applications. Each proposed architecture is evaluated to demonstrate its properties.
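The idea of a multi-character transition can be illustrated with a brute-force composition of single-character steps. This sketch is not the derivation algorithm of this thesis: it does not model assistant transitions or the pseudo state, and it handles alignment simply by recording the outputs of every intermediate state together with their offset inside the block. The hand-written DFA for the single keyword "ab", the function names, and the choice to ignore a trailing partial block are all assumptions made for the example.

```python
from itertools import product

# Hand-written single-character AC-DFA for the keyword "ab" over {a, b}.
DELTA = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}
OUTPUT = {2: {"ab"}}


def derive_multichar(delta, output, k):
    """Compose k single-character steps into one entry per (state, k-char block)."""
    alphabet = sorted(next(iter(delta.values())))
    table = {}
    for state, block in product(delta, product(alphabet, repeat=k)):
        s, hits = state, []
        for offset, ch in enumerate(block):
            s = delta[s][ch]
            # Record matches that end in the middle of the block as well.
            hits += [(offset, word) for word in sorted(output.get(s, ()))]
        table[state, "".join(block)] = (s, hits)
    return table


def scan(table, text, k):
    """Scan text k characters per step; a trailing partial block is ignored."""
    state, found = 0, []
    for i in range(0, len(text) - len(text) % k, k):
        state, hits = table[state, text[i:i + k]]
        found += [(i + offset, word) for offset, word in hits]
    return found


if __name__ == "__main__":
    table = derive_multichar(DELTA, OUTPUT, 2)
    print(len(table))                    # 12 entries: 3 states x 2**2 blocks
    print(scan(table, "abbabaab", 2))    # [(1, 'ab'), (4, 'ab'), (7, 'ab')] (end positions)
```

The match ending at position 4 straddles two blocks, which is the kind of alignment case a multi-character engine has to handle. The table size of this naive composition grows as the number of states times the alphabet size raised to the power k, which is consistent with the transition explosion noted for the multi-character AC-DFA.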

In summary, the proposed architectures for multi-character transition string matching engines are simple and intuitive, allowing easy implementation for any required number of characters to be inspected in parallel. As a result, the proposed architectures can achieve high performance by inspecting multiple characters in parallel while maintaining efficiency in the hardware implementation.

Bibliography

[1] Alfred V. Aho and Margaret J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18(6):333–340, June 1975.

[2] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection. In INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, volume 4, pages 2628–2639, 2004.

[3] Xinyan Zha and S. Sahni. Highly compressed Aho-Corasick automata for efficient intrusion detection. In Computers and Communications, 2008. ISCC 2008. IEEE Symposium on, pages 298–303, 2008.

[4] Mansoor Alicherry, Muthusrinivasan Muthuprasanna, and Vijay Kumar. High speed pattern matching for network IDS/IPS. In Network Protocols, 2006. ICNP’06. Proceedings of the 2006 14th IEEE International Conference on, pages 187–196. IEEE, 2006.

[5] Gerald Tripp. A parallel string matching engine for use in high speed network intrusion detection systems. Journal in Computer Virology, 2(1):21–34, 2006.

[6] Derek Pao and Xing Wang. Multi-stride string searching for high-speed content inspection. The Computer Journal, 55(10):1216–1231, 2012.

[7] Vahid Rahmanzadeh and MohammadBagher Ghaznavi-Ghoushchi. A multi-Gb/s parallel string matching engine for intrusion detection systems. In Advances in Computer Science and Engineering, volume 6 of Communications in Computer and Information Science, pages 847–851. Springer Berlin Heidelberg, 2009.

[8] Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.

[9] Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, and John Lockwood. Deep packet inspection using parallel bloom filters. In High Performance Interconnects, 2003. Proceedings. 11th Symposium on, pages 44–51. IEEE, 2003.

[10] Wei Lin and Bin Liu. Pipelined parallel AC-based approach for multi-string matching. In Parallel and Distributed Systems, 2008. ICPADS’08. 14th IEEE International Conference on, pages 665–672. IEEE, 2008.

[11] Derek Pao, Wei Lin, and Bin Liu. A memory-efficient pipelined implementation of the Aho-Corasick string-matching algorithm. ACM Trans. Archit. Code Optim., 7(2):10:1–10:27, October 2010.

[12] Nan Hua, Haoyu Song, and T. V. Lakshman. Variable-stride multi-pattern matching for scalable deep packet inspection. In INFOCOM 2009, IEEE, pages 415–423, 2009.

[13] V. Dimopoulos, I. Papaefstathiou, and D. Pnevmatikatos. A memory-efficient reconfigurable Aho-Corasick FSM implementation for intrusion detection systems. In Embedded Computer Systems: Architectures, Modeling and Simulation, 2007. IC-SAMOS 2007. International Conference on, pages 186–193, 2007.

[14] YE Yang, Viktor K Prasanna, and Chenqian Jiang. Head-body partitioned string matching for deep packet inspection with scalable and attack-resilient performance. In Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pages 1–11. IEEE, 2010.

[15] Michela Becchi and Patrick Crowley. A hybrid finite automaton for practical deep packet inspection. In Proceedings of the 2007 ACM CoNEXT conference, CoNEXT ’07, pages 1:1–1:12. ACM, 2007.

[16] Yutaka Sugawara, Mary Inaba, and Kei Hiraki. Over 10Gbps string matching mechanism for multi-stream packet scanning systems. In Field Programmable Logic and Application, volume 3203 of Lecture Notes in Computer Science, pages 484–493. Springer Berlin Heidelberg, 2004.

[17] N. Yamagaki, R. Sidhu, and S. Kamiya. High-speed regular expression matching engine using multi-character NFA. In Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on, pages 131–136, 2008.

[18] T. Katashita, A. Maeda, K. Toda, and Y. Yamaguchi. A method of generating highly efficient string matching circuit for intrusion detection. In Field Programmable Logic and Applications, 2006. FPL ’06. International Conference on, pages 1–4, 2006.

[19] Daniele Paolo Scarpazza, Oreste Villa, and Fabrizio Petrini. Exact multi-pattern string matching on the Cell/B.E. processor. In Proceedings of the 5th conference on Computing frontiers, CF ’08, pages 33–42. ACM, 2008.

[20] Leena Salmela, Jorma Tarhio, and Jari Kytöjoki. Multipattern string matching with q-grams. Journal of Experimental Algorithmics (JEA), 11:1–1, 2007.

[21] Daniele Paolo Scarpazza, Oreste Villa, and Fabrizio Petrini. High-speed string searching against large dictionaries on the Cell/BE processor. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1–12. IEEE, 2008.

[22] Y-HE Yang and Viktor K Prasanna. Robust and scalable string pattern matching for deep packet inspection on multicore processors. Parallel and Distributed Systems, IEEE Transactions on, 24(11):2283–2292, 2013.

[23] Oreste Villa, Daniel Chavarria-Miranda, and Kristyn Maschhoff. Input-independent, scalable and fast string matching on the Cray XMT. In Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1–12. IEEE, 2009.

[24] Antonino Tumeo, Oreste Villa, and Donatella Sciuto. Efficient pattern matching on GPUs for intrusion detection systems. In Proceedings of the 7th ACM international conference on Computing frontiers, pages 87–88. ACM, 2010.

[25] Antonino Tumeo, Oreste Villa, and Daniel G Chavarría-Miranda. Aho-Corasick string matching on shared and distributed-memory parallel architectures. Parallel and Distributed Systems, IEEE Transactions on, 23(3):436–443, 2012.

[26] D Herath, C Lakmali, and R Ragel. Accelerating string matching for bio-computing applications on multi-core CPUs. In Industrial and Information Systems (ICIIS), 2012 7th IEEE International Conference on, pages 1–6. IEEE, 2012.

[27] Jennifer Stephenson and Paul Metzgen. Logic Optimization Techniques for Multiplexers. Altera Literature, 2004.

[28] Yi-Hua E Yang, Weirong Jiang, and Viktor K Prasanna. Compact architecture for high-throughput regular expression matching on FPGA. In Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, pages 30–39. ACM, 2008.

[29] C.R. Clark and D.E. Schimmel. Scalable pattern matching for high speed networks. In Field-Programmable Custom Computing Machines, 2004. FCCM 2004. 12th Annual IEEE Symposium on, pages 249–257, 2004.

[30] Reetinder Sidhu and Viktor K Prasanna. Fast regular expression matching using FPGAs. In Field-Programmable Custom Computing Machines, 2001. FCCM’01. The 9th Annual IEEE Symposium on, pages 227–238. IEEE, 2001.

[31] Snort. https://www.snort.org.

[32] Benfano Soewito. Packet inspection on programmable hardware. Computer Engineering and Intelligent Systems, 4(2):57–68, 2013.

[33] Kenneth J Schultz. Content-addressable memory core cells: a survey. INTEGRATION, the VLSI journal, 23(2):171–188, 1997.

[34] Kostas Pagiamtzis and Ali Sheikholeslami. Content-addressable memory (CAM) circuits and architectures: A tutorial and survey. Solid-State Circuits, IEEE Journal of, 41(3):712–727, 2006.
