# Optimal Patching of Web Vulnerabilities - NCCU Academic Hub


## Abstract

The security of web applications is a constant concern for users, because vulnerabilities can cause heavy financial losses and privacy breaches. We want to provide an online service, open to the public, where users can upload their code and check it for potential vulnerabilities. If there are vulnerabilities that may cause damage, the service guides users step by step through an easy way to edit their code.

In this paper, we propose an optimal word correction approach for patching string-related vulnerabilities in web applications. In brief, we synthesize patches that sanitize malicious inputs into normal ones with the shortest edit distance. The analysis consists of two phases. First, we use an automata-based static string analysis technique called Stranger to detect vulnerabilities in web applications and to generate sanitization signatures that accept non-malicious inputs, serving as an input filter that ensures the vulnerabilities are not exploited with respect to given attack patterns. Second, we adopt shortest edit-distance algorithms between words and automata to patch malicious inputs at minimum edit-distance cost: a malicious input (one not accepted by the sanitization signature) is replaced with a non-malicious string that changes the fewest characters of the original input. We integrate the presented approach with Stranger and report experimental results on various web applications.

Keywords: Web Security, Patch Synthesis, Word Correction, Word Analysis.

## Contents

- List of Figures
- List of Tables
- 1 Introduction
  - 1.1 Background and Motivation
  - 1.2 Patching Vulnerabilities Online
  - 1.3 Word Correction
  - 1.4 Content Organization
- 2 Related Work
  - 2.1 String Analysis
  - 2.2 Word Correction
  - 2.3 Patch Synthesis
- 3 Overview
  - 3.1 Architecture
    - 3.1.1 Vulnerability Analysis and Sanitization Generation
    - 3.1.2 Sanitization Patching
  - 3.2 A multi-track example
- 4 Algorithm
  - 4.1 Automata composition
    - 4.1.1 Extension to Multi-track
  - 4.2 Character composition
    - 4.2.1 Extension to Multi-track
    - 4.2.2 Pre-Computation of Shortest Distance
- 5 Experiment
  - 5.1 Performance
  - 5.2 Correction Effect
  - 5.3 Attack Pattern
- 6 Conclusions
- References

## List of Figures

- 3.1 The architecture of patcher
- 3.2 Patches for the example in multi-track
- 3.3 The graph of non-vulnerability signature
- 3.4 Patches for the example in multi-track
- 3.5 OptPatch function
- 4.1 The process graph of Automata composition
- 4.2 The process graph of Automata composition based on multi-track
- 4.3 The schematic diagram of multi-track composition
- 4.4 The process graph of Character composition
- 4.5 The process graph of Character composition based on multi-track
- 5.1 The graph of average time curve based on the length of input: (1) Single track automata 1 (Q=12, bdd=29); (2) Single track automata 2 (Q=62, bdd=127); (3) Single track automata 3 (Q=74, bdd=298); (4) The average time of Character composition from (1)(2)(3); (5) The average time of Pre-computation from (1)(2)(3)
- 5.2 The graph of average time curve based on the length of inputs: (1) Multi-track automata 1 (Q=21, bdd=77) with the increasing length of input 1; (2) Multi-track automata 2 (Q=36, bdd=72) with the increasing length of input 1; (3) Multi-track automata 1 (Q=21, bdd=77) with the increasing length of both inputs; (4) The average time of Character composition from (1)(2)(3); (5) The average time of Pre-computation from (1)(2)(3)

## List of Tables

- 5.1 Testing automata information
- 5.2 Attack inputs based on testing automata
- 5.3 Attack patterns edit result
- 5.4 Multiple edit result

## 1 Introduction

### 1.1 Background and Motivation

In recent years, with the rapid development of networks and smartphones, web applications have become a crucial part of commerce, entertainment, and social interaction. In the near future, they are expected to play even more critical roles because of the popularization of cloud services and e-commerce. However, users remain concerned about security problems on the Internet, because these problems can cause heavy financial losses and privacy breaches. A well-known example is Mt.Gox, once the world's biggest Bitcoin (a type of virtual currency) exchange, which announced bankruptcy in 2014 after around 850,000 bitcoins belonging to customers had been stolen. This case shows how serious the problem is: malicious users anywhere in the world can exploit a vulnerability and cause serious damage. Protecting customers' assets and information is a basic responsibility of a company, yet the high cost of information-security experts makes many companies step back. Many information-security tools are on the market, but their output is not user-friendly and is suitable only for engineers who understand and maintain the programs. For non-technical users, e.g., managers of IT departments, application users, or bloggers, such tools make it hard to tell whether a web application is safe to use, or how to fix its vulnerabilities easily if it has problems.

### 1.2 Patching Vulnerabilities Online

Patcher is an online service, open to the public, where users can upload their code and check it for potential vulnerabilities. If there are vulnerabilities that may cause damage, Patcher guides users step by step through an easy way to edit their code. The service gives users a clear view of the vulnerabilities of target applications and a quick fix to reduce their risks. We believe this service will reduce the risks of web applications and make it easy to improve their security.

### 1.3 Word Correction

Word correction is widely used in search, text processing, and speech recognition. However, to our knowledge, there is no prior research on web security based on word correction. One of the most common ways to eliminate a vulnerability is to halt execution and block the input when it matches a vulnerability signature. In this paper, we instead adopt edit-distance algorithms between words and automata to modify malicious inputs at minimum edit-distance cost. The edit distance between two strings is the smallest number of operations required to transform one string into the other. It can serve as a similarity measure: the shorter the distance, the more similar the two strings. The advantage of this approach is that it preserves the original input as closely as possible and lifts restrictions on what users may enter. For example, to prevent XSS attacks, the input fields of some websites limit the length of strings and the use of "<".

### 1.4 Content Organization

In this paper, we present techniques for automatically detecting and patching string-related vulnerabilities in web applications.

The rest of the paper is structured as follows:

- Chapter 2: We review related work.
- Chapter 3: We give an overview of Patcher and the patching process.
- Chapter 4: We introduce our optimal word correction algorithm.
- Chapter 5: We evaluate our algorithm on several automata and report its performance.

## 2 Related Work

### 2.1 String Analysis

Due to its importance in security, string analysis has been widely studied. One influential approach is grammar-based string analysis, which statically computes an over-approximation of the values of string expressions in Java programs (2) and has also been used to check for various types of errors in web applications (7, 12, 19, 20). In (12, 19), multi-track DFAs, also known as transducers, are used to model replacement operations. There are also several recent string analysis tools that use symbolic string analysis based on DFA encodings (6, 16, 24). Some of them are based on symbolic execution and use a DFA representation to model and verify string manipulation operations in Java programs (6, 16). HAMPI (10) is a bounded string constraint solver that searches for a string satisfying a given set of string constraints by bounding the string length. In Fang Yu's early work (22, 24), single-track DFA-based symbolic reachability analysis is used to verify the correctness of string sanitization operations in PHP programs. The work in (21) shows how to generate (non-relational) vulnerability signatures using single-track DFAs, and (25) proposes the foundations of string analysis using multi-track automata. The work in (23) presents a technique for generating relational vulnerability signatures for strings (i.e., vulnerability signatures that involve more than one input) and proposes patches based on a min-cut algorithm on vulnerability signatures. This work addresses sanitization synthesis using the presented techniques on automata.

### 2.2 Word Correction

The problem of computing the edit distance between a string and a finite automaton, or between two strings, arises in a variety of applications in computational biology, text processing, and speech recognition. In all these cases, an optimal alignment is also typically sought. In computational biology, the alignment may be used to infer the function and other properties of a protein sequence from the sequence it is best aligned with. In speech recognition, it determines the best transcription hypothesis contained in a lattice. Compared with previous work, we apply optimal word correction algorithms to web application security, synthesizing optimal patches to secure vulnerable web applications. Below we review some previous work on word correction.

Wagner (18) sketched an algorithm that uses finite-state automata to calculate the minimum edit distance required to correct an erroneous word into one belonging to a regular language. Given an input string, the algorithm reads one character at a time and changes it into a valid one with the minimal number of edit operations. After the whole input string has been read, we obtain a correction of the input string with minimum edit distance.

Kashyap and Oommen (9) introduce an effective algorithm, called Algorithm I, for string correction problems. It adopts dynamic programming principles and calculates the edit distance recursively. The approach does not need to calculate the edit distance for every possible corrected word individually; instead, it reuses the information obtained during the process. Although it needs more memory to keep the results of intermediate edit-distance computations, it reduces the complexity of the calculation (18).
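The dynamic-programming principle mentioned above is easiest to see in the textbook string-to-string case. The following minimal sketch computes the Levenshtein distance (unit-cost insertions, deletions, and substitutions) with the classic Wagner-Fischer recurrence, keeping only two rows of the table so that memory grows with the length of the second string only. It is not Algorithm I of (9) or the automaton-based method of (18), and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # Before iteration i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         prev[j - 1] + cost)  # keep or substitute
        prev = cur
    return prev[len(b)]


print(edit_distance("kitten", "sitting"))  # 3
```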
Oflazer (13) applies error-tolerant finite-state recognition to the analysis of the agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology has been fully captured by a single (and possibly very large) finite-state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate candidate correct forms of a given misspelled string within a certain edit distance. In (4), Cucerzan and Brill investigate spelling correction of search queries, a task that poses many challenges beyond the traditional spelling correction problem. They present an approach that iteratively transforms input query strings into other strings that correspond to more and more likely queries, according to statistics extracted from Internet search query logs.

Allauzen and Mohri (1) propose a linear-space algorithm to compute the edit distance between two finite automata. They further treat a word lattice as a weighted automaton and extend the algorithm to the edit distance between a string and a weighted automaton.

### 2.3 Patch Synthesis

There has been previous work on automatically generating filters that block bad input (3). Although this is similar to our match-and-block strategy, there are several significant differences from our work. First, the earlier work (3) focuses on buffer-overflow vulnerabilities, which differ from the string vulnerabilities we investigate here. Second, in the earlier work (3) filter generation starts from an existing exploit, whereas we start from an attack pattern. Other work on patching string-related vulnerabilities is reviewed below.

Su and Wassermann (17) present the first formal definition of command injection attacks. They also propose an approach, SQLCHECK, that blocks user queries in which the input substrings change the syntactic structure of the rest of the query, preventing command injection by means of context-free grammars and compiler parsing techniques. It identifies command injection attacks precisely and incurs low runtime overhead.
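The criterion above, that user input must not change the syntactic structure of the query, can be approximated at the token level: the spliced input must stay strictly inside one literal token. The sketch below is a deliberately simplified, hypothetical illustration with a toy lexer (the pattern `TOKEN` and the function `input_confined` are assumptions); SQLCHECK itself works with context-free grammars and parsing, which this does not reproduce.

```python
import re

# Toy SQL lexer (assumption): quoted strings with '' escapes, integers,
# words (identifiers/keywords), then any other single non-space character.
TOKEN = re.compile(r"'(?:[^']|'')*'|\d+|\w+|\S")


def input_confined(prefix: str, user_input: str, suffix: str) -> bool:
    """True if user_input, spliced between prefix and suffix, lies inside a
    single literal token and therefore cannot alter the query structure."""
    query = prefix + user_input + suffix
    start, end = len(prefix), len(prefix) + len(user_input)
    for m in TOKEN.finditer(query):
        token = m.group()
        if token.startswith("'"):
            # strictly inside the quotes: the input may not supply a delimiter
            if m.start() < start and end < m.end():
                return True
        elif token.isdigit() and m.start() <= start and end <= m.end():
            return True
    return False


prefix = "SELECT * FROM users WHERE name = '"
print(input_confined(prefix, "alice", "'"))         # True
print(input_confined(prefix, "x' OR '1'='1", "'"))  # False
```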
ScriptGuard (15) is a system for ASP.NET applications that detects and repairs incorrect placement of sanitizers, avoiding two types of errors: context-mismatched sanitization and inconsistent multiple sanitization. It has two components: a training and analysis phase, and runtime auto-sanitization. The training and analysis phase traces the dynamic execution of the application on test inputs. The analysis produces a sanitization cache that records all execution paths with context inconsistency, learning a map between code paths and the correct sequence of sanitizers to apply for the browser context reached by each path. This cache is the basis for runtime auto-correction: by detecting which path is actually executed, ScriptGuard can apply the correct sanitizers at run time. It is orthogonal to our approach, in which we synthesize effective patches for specific kinds of vulnerabilities with respect to attack patterns.

Samuel, Saxena, and Song (14) focus on securing web applications against vulnerabilities via web templating frameworks. Their context-sensitive auto-sanitization (CSAS) engine sanitizes in two steps. The first step is type inference, which converts vanilla templates into an internal representation (IR) that complies with their type rules; the second is compilation, which compiles the well-typed IR into template-language code with sanitization.

Livshits and Chong (11) propose a fully automatic technique for sanitizer placement that analyzes the flow of tainted data in the program. There are two strategies for automatic sanitizer placement based on the dataflow graph. One is a node-based, entirely static approach that chooses where to place a sanitizer based on the must-pass nodes in the graph; it cannot sanitize correctly when there are no must-pass nodes. The other is an edge-based approach that defines several kinds of edges relevant to the run-time discipline for applying correct sanitization to values.
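The notion of a must-pass node can be made concrete with a brute-force check: a node is must-pass if removing it disconnects the source of tainted data from the sink. This is a minimal sketch for illustration only, assuming a small dataflow graph given as an adjacency dictionary; it is not the placement algorithm of (11), and `reachable` and `must_pass_nodes` are hypothetical names. (Dominator-tree algorithms compute the same information more efficiently.)

```python
from collections import deque


def reachable(graph, src, dst, removed=None):
    """BFS over an adjacency dict, treating node `removed` as absent."""
    if src == removed:
        return False
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in graph.get(u, []):
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def must_pass_nodes(graph, source, sink):
    """Nodes other than source and sink that lie on every source-to-sink path."""
    if not reachable(graph, source, sink):
        return set()
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    return {n for n in nodes - {source, sink}
            if not reachable(graph, source, sink, removed=n)}


# Tainted data flows through one shared node "a" ...
shared = {"input": ["a"], "a": ["b", "c"], "b": ["sink"], "c": ["sink"]}
# ... or through two independent branches with no common node.
branches = {"input": ["b", "c"], "b": ["sink"], "c": ["sink"]}
print(must_pass_nodes(shared, "input", "sink"))    # {'a'}
print(must_pass_nodes(branches, "input", "sink"))  # set()
```

The second graph is the failure case mentioned above: tainted data reaches the sink through two branches, so no single node covers every flow.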
deDacota (5) implements a novel approach to securing legacy web applications by automatically and statically rewriting an application so that code and data are separated in its web pages, preventing cross-site scripting attacks against server-side applications. The approach has three steps: (1) statically determine a conservative approximation of the page's HTML output; (2) extract all inline JavaScript from the approximated HTML output; and (3) rewrite the application so that all inline JavaScript is moved to external files, which, under a Content Security Policy, will be the only JavaScript the browser executes.

WEBLOG (8) is a declarative web development language designed to eliminate today's most prevalent security vulnerabilities. For SQL injection attacks, WEBLOG developers write all SQL commands directly, without combining query code with input, which ensures that user inputs are never treated as SQL keywords. For cross-site scripting attacks, WEBLOG performs output sanitization, sanitizing all user-supplied data only when the data is sent to the renderer.

Unlike prior results, in this paper we generate sanitization statements that repair bad inputs using an optimal strategy with sanitization signatures, which, to the best of our knowledge, has not been done before. Our contributions can be summarized as follows. We present novel techniques for automatically generating patches that eliminate string vulnerabilities in web applications. We use two types of analysis: one based on string analysis to generate vulnerability and sanitization signatures, and the other based on word correction to modify malicious inputs in an optimal way. To the best of our knowledge, this is the first paper that applies word correction to web application security with relational signatures for strings (i.e., vulnerability or sanitization signatures that involve more than one input). We present and implement two algorithms for searching for optimal edits of malicious inputs.

## 3 Overview

### 3.1 Architecture

We start with a set of attack patterns (regular expressions) that characterize possible attacks, either taken from an attack pattern specification library or written by the web application developer. Given an attack pattern, our string synthesis approach works in two phases, as shown in Figure 3.1:

Phase 1: Vulnerability Analysis and Sanitization Generation. First, we use automata-based static string analysis techniques to determine whether the web application is vulnerable to attacks characterized by the given attack pattern and, if it is, to generate a characterization of the potential attack strings. We then project these attack strings onto the user inputs and compute an over-approximation of all possible inputs that can be accepted without harm. This characterization of user inputs is called the non-vulnerability signature for the given attack pattern.

Phase 2: Sanitization Patching. Once we have the non-vulnerability signature, we dynamically synthesize patches that eliminate the vulnerability using an optimal patch strategy: edit the smallest number of characters (or none) so that the input matches the non-vulnerability signature. We do this in two ways:

- Automata Composition: Compose the non-vulnerability signature and the input string directly into a graph, and find the shortest path, which serves as the reference for the edits.
- Character Composition: Split the input string into a sequence of characters and compose them with the non-vulnerability signature one at a time, finding the shortest path from the initial state to the current character level by level until the end of the input.

Figure 3.1: The architecture of patcher

#### 3.1.1 Vulnerability Analysis and Sanitization Generation

For vulnerability analysis and non-vulnerability signature generation, we use the analysis tool Stranger (22), which uses deterministic finite automata (DFAs) to represent the values that string expressions can take and generates vulnerability signatures. Given PHP code, the tool identifies tainted sinks, which are sensitive functions that may cause a vulnerability. Intersecting the attack patterns with the results of the forward analysis at the sinks gives the potential attack strings as an automaton, called the vulnerability signature, if the program is vulnerable. Based on these vulnerability signatures, we generate effective patches for inputs that ensure the removal of all reachable attack strings at the sinks.

The analysis can also generate a relational vulnerability signature based on multi-track deterministic finite automata (MDFAs). An MDFA has multiple tracks and reads one symbol from each track in each transition. Relational signatures are needed because it may be a combination of inputs that exploits the vulnerability. For example, if an attack string is generated by concatenating two input strings, it might not be possible to prevent the attack by blocking only one of the inputs, since a string coming from one input can lead to an attack if it is concatenated with a suitably constructed string coming from another input.

In this paper, we take one more step: we compute an over-approximation that turns the vulnerability signature into a non-vulnerability signature. For example, the vulnerability signature $\Sigma^*\texttt{<SCRIPT}\Sigma^*$ becomes its complement $\Sigma^* \setminus \Sigma^*\texttt{<SCRIPT}\Sigma^*$, where $\Sigma$ is the alphabet of all user-input characters. A non-vulnerability signature works like a whitelist: the automaton accepts all strings except those in $\Sigma^*\texttt{<SCRIPT}\Sigma^*$.

#### 3.1.2 Sanitization Patching

The non-vulnerability signature gives an over-approximation of all possible input values that will not exploit the vulnerability. We modify the input in an optimal, minimal way to guarantee that the edited input is safe and cannot lead to any attack strings. Our goal is to find a minimum set of characters such that, if we edit those characters in a given string, the resulting string is accepted by the DFA.
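This goal can be phrased as a shortest-path search over the composition of the input string with the DFA. The following is a minimal sketch of that idea, assuming a single-track DFA, unit costs for substitution, insertion, and deletion, and zero cost for keeping a character; the thesis's own cost model and Stranger's multi-track automata are not reproduced. The names `avoid_dfa` and `optimal_patch` are hypothetical. `avoid_dfa` builds a whitelist automaton for $\Sigma^* \setminus \Sigma^* p \Sigma^*$ for a literal pattern $p$ with a KMP-style construction, and `optimal_patch` runs Dijkstra's algorithm over composed nodes `(i, q)`, where `i` counts consumed input characters and `q` is a DFA state.

```python
import heapq
import string


def avoid_dfa(pattern, alphabet):
    """DFA over `alphabet` accepting exactly the strings that do NOT contain
    the non-empty literal `pattern`.

    State k < len(pattern): the longest suffix read so far that is also a
    prefix of `pattern` has length k. State len(pattern) is a rejecting sink.
    """
    m = len(pattern)
    delta = {}
    for k in range(m + 1):
        for c in alphabet:
            if k == m:
                delta[(k, c)] = m  # pattern already seen: stay rejected
                continue
            s = pattern[:k] + c
            j = len(s)
            # shrink j until the last j characters of s are a prefix of pattern
            while j > 0 and s[len(s) - j:] != pattern[:j]:
                j -= 1
            delta[(k, c)] = j
    return delta, set(range(m))


def optimal_patch(word, pattern, alphabet=string.printable):
    """Return (cost, patched) where `patched` avoids `pattern` and is at
    minimum edit distance from `word` (new characters come from `alphabet`)."""
    delta, accepting = avoid_dfa(pattern, alphabet)
    dead, n, start = len(pattern), len(word), (0, 0)
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        d, (i, q) = heapq.heappop(heap)
        if d > dist[(i, q)]:
            continue  # stale queue entry
        if i == n and q in accepting:
            out, node = [], (i, q)
            while node != start:  # follow predecessors back to the start
                node, ch = prev[node]
                out.append(ch)
            return d, "".join(reversed(out))
        edges = []
        if i < n:
            edges.append(((i + 1, q), 1, ""))  # delete word[i]
            for c in alphabet:  # keep word[i] (cost 0) or substitute c (cost 1)
                edges.append(((i + 1, delta[(q, c)]), int(c != word[i]), c))
        for c in alphabet:  # insert c at position i
            edges.append(((i, delta[(q, c)]), 1, c))
        for node, w, ch in edges:
            if node[1] == dead:
                continue  # the sink can never reach an accepting state
            if d + w < dist.get(node, float("inf")):
                dist[node] = d + w
                prev[node] = ((i, q), ch)
                heapq.heappush(heap, (d + w, node))
    return None  # not reached for non-empty patterns: "" is always accepted


print(optimal_patch("hello", "<script"))  # (0, 'hello')
```

For instance, `optimal_patch("<script>alert(1)</script>", "<script")` reports a cost of 1: a single edit inside the leading `<script` suffices, because `</script` never contains the pattern. Which of the equally cheap edits is returned depends on tie-breaking in the priority queue. The match here is case-sensitive, so a practical pattern would also need to cover spellings such as `<SCRIPT`.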
As we discuss in Chapter 4, finding such a minimum set of edits corresponds to finding a shortest path in a graph built dynamically from the states and transitions of the input string and the DFA: composing the input string and the DFA into one graph, we look for a sequence of edges with minimum total cost from the initial state to a final state. After composition, there is always at least one such path, as long as the signature accepts some string, because any input can be edited into any accepted string. Note that each edge of the composed graph is labeled with two symbols representing an edit action from symbol 1 to symbol 2; if symbol 1 equals symbol 2, there is no edit action and no cost. We use an optimal algorithm to compute the smallest edit distance, i.e., the path with the minimum total cost of edit actions, and then generate a patch that applies the edit actions along this sequence of edges to the original string.

### 3.2 A multi-track example

In this paper, we propose, to our knowledge, the first word correction algorithm between more than one string and an MDFA. We therefore give an overview of our framework using a multi-track example.

```php
<?php
$title = $_GET["title"];              // line 2
$name  = $_GET["name"];               // line 3
$out   = "NAME : " . $title . $name;  // line 4
echo $out;                            // line 5
?>
```

Figure 3.2: Patches for the example in multi-track

Consider the PHP script shown in Figure 3.2. The script first assigns the user inputs provided in the `_GET` array to the variables `title` and `name` (lines 2 and 3). It then concatenates a constant string with them and assigns the result to another variable, `out` (line 4). Finally, it outputs `out` using the `echo` statement (line 5). Combinations of the two inputs can potentially exploit the vulnerability. After this PHP code is uploaded to Patcher, Patcher warns of the danger of an XSS attack and generates the non-vulnerability signature (also called WhiteAuto). The resulting automaton is shown in Figure 3.3. Single circles are non-accepting states and double circles are accepting states. The label on each transition arrow gives the symbols that move the automaton from one state to the other. On each track, a symbol is an 8-bit Boolean vector encoding the ASCII code of an accepted character, where X means 0 or 1; 8 bits give $2^8 = 256$ possible values, enough for all ASCII characters. Each transition reads one character per track. The paths from the initial state to the final states make up the strings accepted by the automaton.
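A track symbol with don't-care bits stands for a set of characters. The following sketch shows how one such label can be read, assuming the eight bits are written most-significant bit first (the bit order used in Figure 3.3 is not stated in the text). `matches` and `expand` are hypothetical helpers, not part of Stranger or Patcher; a symbolic tool would keep such labels compact rather than enumerating the characters they cover.

```python
def matches(symbol: str, ch: str) -> bool:
    """True if character `ch` fits an 8-bit track symbol such as '0011110X'.

    Bits are read most-significant first (an assumption); 'X' accepts 0 or 1.
    """
    code = ord(ch)
    if len(symbol) != 8 or code > 255:
        return False
    bits = format(code, "08b")  # e.g. '<' (60) -> '00111100'
    return all(s == "X" or s == b for s, b in zip(symbol, bits))


def expand(symbol):
    """All characters with codes 0-255 that a track symbol accepts."""
    return [chr(v) for v in range(256) if matches(symbol, chr(v))]


print(expand("0011110X"))         # ['<', '=']
print(len(expand("XXXXXXXX")))    # 256
```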

Figure 3.3: The graph of non-vulnerability signature (diagram not reproduced)

```php
<?php
include("optPatch/optPatch.php");            // added by the patch
$title = optPatch($_GET["title"], 3930, 0);  // patched: was $_GET["title"]
$name  = optPatch($_GET["name"], 3930, 1);   // patched: was $_GET["name"]
$out   = "NAME : " . $title . $name;
echo $out;
?>
```

Figure 3.4: Patches for the example in multi-track
To hide the complexity of computing these automata, we provide an easy-to-use package: users only need to decompress it and put it into the website's directory.

Patcher then tells users how to edit their code at specific line numbers. Figure 3.4 shows the patched PHP code: line 1.1 includes optPatch.php so that our package can be used, and lines 2 and 3 call optPatch, which checks each input string and edits it if it is malicious, i.e., not accepted by the sanitization signature. The parameters of optPatch are the input variable (title, name), the sink ID (3930) that identifies the corresponding automaton, and the input index (0, 1) that gives the order of the inputs when the sink is a multi-track problem. To generate sanitization statements from multi-track sanitization signatures, we apply optPatch with the sanitization signature MDFA. In the 2-track case, we call optPatch twice with the same `$sinkId`, and each call returns the output for its `$inputNo`. The result is the set of minimum-cost edited strings accepted by the patch sanitization signature; e.g., the attack strings ["/<SCRI", "PT PI@"] become ["@/ SCRI", "PT PI@"] at a cost of 1 edit distance after the computation in Figure 3.4.

```php
1  function optPatch($inputVar, $sinkId, $inputNo) {
2      $VARS = getInputVars($inputVar, $sinkId, $inputNo);
3      $whiteAutoPath = $dir.DIRECTORY;
4      $cmdLine = $__PATCHER__JAVA_BIN." -jar ".$optPatchJarPath." ".$whiteAutoPath." ";
5      for ($i = 0; $i < count($VARS); $i++)
6      {
7          $cmdLine .= escapeshellarg($VARS[$i])." "; // quote each input safely for the shell
8      }
9      exec($cmdLine, $__PATCHER__optPatchs["$sinkId"], $exitValue);
10     $result = $__PATCHER__optPatchs["$sinkId"][$inputNo];
11     return $result;
12 }
```

Figure 3.5: OptPatch function.

Figure 3.5 shows the optPatch function, which connects the user's web application to our word correction algorithm based on the non-vulnerability signature. In line 2, if the sink is multi-track and there is more than one `$inputNo` for the same `$sinkId`, getInputVars collects all input strings into the array `$VARS`. In line 3, `$whiteAutoPath` is built from the PHP directory and `$sinkId` so that it points to the matching patch sanitization signature. To patch the vulnerabilities, we call our .jar file, which contains the optimal patching algorithm implemented in Java. First, we construct the shell command that calls the jar in the form "java_path -jar jar_path automaton_path input_strings"; the jar's arguments are the automaton path and the input strings, and the construction is shown in lines 4 to 8. Each input is passed through escapeshellarg so that a malicious input cannot break out of its argument. In line 9, we use PHP's exec to run the external program, i.e., our optimal-patch jar. The output of exec holds the patched strings, and the function returns the one for `$inputNo` under `$sinkId` (lines 9 to 11).

## 4 Algorithm

In this chapter, we describe the algorithms for sanitization patching. Based on (1), there are two approaches, named Automata composition and Character composition. However, both only work for a single track: when the automaton is a multi-track deterministic finite automaton (MDFA), they do not apply. We therefore modify the algorithms so that they can run on an MDFA. The following parts introduce Automata composition for single and multiple tracks, and then Character composition for single and multiple tracks. Both approaches to word correction consist of a composition, which builds a new automaton-based graph, denoted $A \circ X$, from the sanitization signature $A$ and the automaton $X$ of the input string, and a shortest-distance algorithm whose result equals the edit distance between the input string and the automaton. Our objectives are:

- If $w \notin L(A)$, find $w' \in L(A)$ such that $dist(w, w') \le dist(w, w'')$ for all $w'' \in L(A)$.
- If $(w_1, w_2, \ldots, w_n) \notin L(A)$, find $(w'_1, w'_2, \ldots, w'_n) \in L(A)$ such that $dist((w_1, \ldots, w_n), (w'_1, \ldots, w'_n)) \le dist((w_1, \ldots, w_n), (w''_1, \ldots, w''_n))$ for all $(w''_1, \ldots, w''_n) \in L(A)$, where $dist((w_1, \ldots, w_n), (w'_1, \ldots, w'_n)) = \sum_{1 \le i \le n} dist(w_i, w'_i)$.
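The two objectives can be checked on tiny examples with a brute-force reference. The sketch below is ours, not the thesis implementation: it assumes unit costs for insertion, deletion, and substitution, and a hypothetical `accepts` predicate stands in for membership in $L(A)$. Because it enumerates every candidate up to a length bound, it only serves to validate small cases against the algorithms that follow.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance computed with the classic two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def tuple_distance(ws, ws2) -> int:
    """Multi-track objective: the sum of per-track edit distances."""
    return sum(levenshtein(a, b) for a, b in zip(ws, ws2))


def brute_force_correction(w, accepts, alphabet, max_len):
    """Return (distance, w') minimizing dist(w, w') over accepted strings of
    length <= max_len. `accepts` is a hypothetical stand-in for L(A)."""
    best = None
    for n in range(max_len + 1):
        for cand in map("".join, product(alphabet, repeat=n)):
            if accepts(cand):
                d = levenshtein(w, cand)
                if best is None or d < best[0]:
                    best = (d, cand)
    return best
```

For instance, `brute_force_correction("<a", lambda s: "<" not in s, "<ab", 3)` returns `(1, "a")`: deleting '<' is one of the cheapest repairs.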

### 4.1 Automata composition

Composition: Let $A$ be a deterministic finite automaton (DFA) that accepts the desired input strings (the sanitization signature), and let $X$ be the finite automaton that accepts only the input string $x$. The composition of $A$ and $X$, denoted $A \circ X$, is a weighted transducer. Let $A = \langle Q_A, \Sigma_A, \delta_A, q_{A0}, F_A \rangle$ and $X = \langle Q_X, \Sigma_X, \delta_X, q_{X0}, F_X \rangle$; the weighted transducer $A \circ X$ is the five-tuple $\langle Q, \Sigma, I, F, \delta \rangle$. A state $q \in Q \subseteq Q_A \times Q_X$ is a pair $(q_1, q_2)$ where $q_1 \in Q_A$ and $q_2 \in Q_X$. The alphabet $\Sigma$ is $(\Sigma_X \cup \{\epsilon\}) \times (\Sigma_A \cup \{\epsilon\}) \times \{0, 1\}$. $I = \{(q_{A0}, q_{X0})\}$ is the set of initial states, and $F = \{(q_1, q_2) \mid q_1 \in F_A, q_2 \in F_X\}$ is the set of accepting states. The transition relation $\delta \subseteq Q \times \Sigma \times Q$ is defined by $\delta_A$ and $\delta_X$ as follows. For any $\delta_A(i, c_A) = i'$ and $\delta_X(k, c_X) = k'$, with $q = (i, k)$:

- $\delta(q, (c_X, c_A, 1)) = (i', k')$ if $c_A \ne c_X$;
- $\delta(q, (c_X, c_A, 0)) = (i', k')$ if $c_A = c_X$;
- $\delta(q, (c_X, \epsilon, 1)) = (i, k')$;
- $\delta(q, (\epsilon, c_A, 1)) = (i', k)$.

Here $(a, \epsilon)$ is a deletion, $(\epsilon, b)$ is an insertion, and $(a, b)$ with $a \ne b$ is a substitution; the third component is the weight of the transition. The space and time complexity of the composition is therefore $O(|A||X|)$.

Shortest distance: Let $A(x, y)$ be the weight that the transducer associates with the pair $(x, y)$, defined by

$$A(x, y) = \min_{\pi \in P(I, x, y, F)} w[\pi]$$

where $\pi$ is a path and $P(I, x, y, F)$ denotes the set of paths from the initial states $I$ to the final states $F$ labeled with input string $x$ and output string $y$. $X \circ A$ is a weighted transducer, and the shortest distance from state $p$ to state $q$ is

$$d[p, q] = \min_{\pi \in P_{X \circ A}(p, q)} w[\pi]$$

where $p, q \in Q$ in $X \circ A$ and $w[\pi]$ is the total transition weight of $\pi$. With $p = I$ and $q = F$, the shortest distance from the initial state to a final state of the weighted transducer is $d[I, F]$. It can be computed with the generic single-source shortest-distance algorithm of Dijkstra, which finds the path from the initial state to the final states directly:

$$d(x, A) = d[I, F] = \min_{\pi \in P_{X \circ A}(I, F)} w[\pi]$$

Edit Distance: Based on (1), we have:

Lemma 1. Let $A$ be a weighted automaton over the tropical semiring and let $X$ be the finite automaton representing a string $x$. Then the edit distance between $x$ and $A$ is the shortest distance from the initial state to a final state in the weighted transducer $X \circ A$.

It can therefore be computed by Dijkstra's algorithm, which finds the minimum-cost path of edit operations from the initial state to a final state.

Lemma 2 (Automata composition). If $x \notin L(A)$, Automata composition yields $x' \in L(A)$ such that $dist(x, x')$ is minimal.

Proof: By (1), the edit distance in the composition graph of a string $x$ and an automaton $A$, also called the Levenshtein distance, is the minimum cost of making $x$ belong to $A$:

$$d(x, A) = \min_{y \in L(A)} d(x, y)$$

where $L(A)$ denotes the regular language accepted by $A$. Since $x$ is the only string accepted by $X$, let $U = X \circ A$; the shortest distance from the initial state to a final state in $U$ is the edit distance between $x$ and $A$.

The pseudocode of a simplified version of the shortest-path algorithm for the automata is given in Algorithm 1. Every node is assigned a tentative distance: the initial node gets the value given by the input CURDIS, and every other node gets infinity (lines 1 to 5). While the set $S$ is not empty, the head $q$ of $S$ becomes the current node and is removed from $S$ (lines 6 to 8). For the current node, each outgoing edge is examined: the tentative distance to the neighbor is computed, compared with the currently assigned value, and the smaller one is kept (lines 9 to 11). The predecessor is recorded so that the shortest route can be traced back from the destination to the source (line 12). If the neighbor is not in $S$, it is added to $S$ (lines 13 to 14). When $S$ is empty, every node reachable from the source has been processed and the algorithm terminates. If $S$ is maintained as a priority queue ordered by tentative distance, a node removed from $S$ is never updated again, the algorithm coincides with Dijkstra's algorithm, and with a heap its time complexity is $O((|E| + |Q|) \log |Q|)$.

```
Input: Source, Graph, CURDIS
1  for each p ∈ Q in Graph do
2      d[p] ← ∞;
3  end
4  S ← {Source};
5  d[Source] ← CURDIS[Source];
6  while S ≠ ∅ do
7      q ← HEAD(S);
8      DEQUEUE(S);
9      for e ∈ E[q] do
10         if d[q] + w[e] < d[n[e]] then
11             d[n[e]] ← d[q] + w[e];
12             n[e].predecessor ← q;
13             if n[e] ∉ S then
14                 ENQUEUE(S, n[e]);
15             end
16         end
17     end
18 end
```

Algorithm 1: dijkstra process

```
Input: Des
1  while Des.predecessor ≠ null do
2      path ← E(Des, Des.predecessor).char + path;
3      Des ← Des.predecessor;
4  end
5  last ← Des;
6  return path;
```

Algorithm 2: trace process

Let trace be the function, shown in Algorithm 2, that traces the shortest path back from a node Des. While Des still has a predecessor, the character on the edge between the node and its predecessor is prepended to path, and the node is replaced with its predecessor (lines 2 to 3). At the end, last labels the final node reached when tracing back (line 5), and path, which now holds the whole string from source to destination, is returned.

The Automata composition algorithm obtains the edit distance and the edited string as follows:

```
dijkstra(I, A ∘ X, 0);
for f ∈ F do
    if dijkstra.d[f] is minimum then
        edit-distance ← dijkstra.d[f]; w' ← trace(f);
    end
end
```

Figure 4.1: The process graph of Automata composition.

Figure 4.1 shows an example of Automata composition. (a) Finite automaton $X$ representing the string $x$ = "<". (b) Finite automaton $A$, which accepts only strings without the character '<'; an input containing '<' drives $A$ into the sink state, drawn as the unlabeled state. (c) Weighted transducer $A \circ X$. We ignore the sink state because our goal is to make the string accepted by $A$. Dijkstra's algorithm computes the shortest distance from the initial state of $A \circ X$ to a final state; thus the edit distance between $A$ and $x$ is 1, and the edited string is any character except '<'.
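The composition and the Dijkstra search above can be fused so that $A \circ X$ is explored lazily instead of being built in full. The following is a minimal sketch, not the thesis's Java implementation: it assumes the DFA is encoded as a Python dict `delta[(state, char)] -> state`, where a missing entry plays the role of the ignored sink state, and unit edit costs. A product state `(q, i)` pairs a state of $A$ with a position in $x$, i.e., a state of $A \circ X$. Unlike the driver above, it stops at the first accepting state popped from the heap, which is already optimal because Dijkstra pops states in nondecreasing distance order.

```python
import heapq

INF = float("inf")


def correct_by_composition(x, delta, start, finals):
    """Lazy shortest path in A ∘ X (sketch).

    Assumed encoding: delta[(state, char)] -> state; a missing entry means
    the sink state, which is never entered. Returns (edit distance,
    corrected string), or None if no accepting state is reachable.
    """
    alphabet = sorted({a for (_, a) in delta})
    n = len(x)
    dist = {(start, 0): 0}
    pred = {}  # (q, i) -> ((q_prev, i_prev), emitted character or "")
    heap = [(0, start, 0)]
    while heap:
        d, q, i = heapq.heappop(heap)
        if d > dist[(q, i)]:
            continue  # stale heap entry
        if i == n and q in finals:
            # First accepting state popped: its distance is minimal.
            out, node = [], (q, i)
            while node in pred:
                node, label = pred[node]
                out.append(label)
            return d, "".join(reversed(out))
        edges = []
        if i < n:
            edges.append(((q, i + 1), 1, ""))                   # delete x[i]
        for a in alphabet:
            q2 = delta.get((q, a))
            if q2 is None:
                continue
            edges.append(((q2, i), 1, a))                       # insert a
            if i < n:
                edges.append(((q2, i + 1), int(a != x[i]), a))  # match / substitute
        for node, w, label in edges:
            if d + w < dist.get(node, INF):
                dist[node] = d + w
                pred[node] = ((q, i), label)
                heapq.heappush(heap, (d + w, node[0], node[1]))
    return None
```

With `delta = {(0, "a"): 0, (0, "b"): 0}` (strings over {a, b}, so '<' is rejected), `correct_by_composition("<a", delta, 0, {0})` gives `(1, "a")`.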

#### 4.1.1 Extension to Multi-track

Composition: A multi-track deterministic finite automaton (MDFA) reads one symbol for each track in each transition. Let $M_n$ be an n-track MDFA that accepts n input strings. Its alphabet is $\Sigma_{M_n} \subseteq (\Sigma_A \cup \{\lambda\})^n$, where $\lambda \notin \Sigma_A$ is a special padding symbol. A symbol $c_{M_n} \in \Sigma_{M_n}$ is restricted to an n-tuple $(\lambda, \ldots, \lambda, c_A, \lambda, \ldots, \lambda)$ in which only the i-th track has a value $c_A \in \Sigma_A$ and all other tracks are $\lambda$. We write $c_{M_n}.track = i$ for the non-$\lambda$ track of $c_{M_n}$ and $c_{M_n}.char = c_A$ for the value on that track. Let $q_{M_n 0}$ denote the initial state of $M_n$ and $F_{M_n}$ its set of final states. If a transition has no $c_A$, i.e., all of its symbols are $\lambda$, it is a shifting transition, which we eliminate beforehand: for every $q \in Q$ with $\delta(q, \lambda) = q'$ and every $\delta(q', c) = q''$, we add $\delta(q, c) = q''$, and if $q'$ is final we also mark $q$ as final.

We use $X_i$ ($1 \le i \le n$) to denote the single-track DFA that accepts only the i-th input string, i.e., the value on the i-th track of $L(M_n)$. For all $1 \le i \le n$, $X_i$ has alphabet $\Sigma_i \subseteq \Sigma \cup \{\lambda\}$. $(X) = (X_1, X_2, \ldots, X_n)$ denotes the tuple of automata $X_i$, and $Q_{(X)} = Q_{X_1} \times Q_{X_2} \times \cdots \times Q_{X_n}$ is its set of states. A state of $(X)$ is $q_{(X)} = (q_{X_1}, q_{X_2}, \ldots, q_{X_n}) \in Q_{(X)}$, and $q_{(X)}[q'_{X_j}/q_{X_j}] = (q_{X_1}, q_{X_2}, \ldots, q'_{X_j}, \ldots, q_{X_n})$ denotes the state in which the component of $X_j$ is updated from $q_{X_j}$ to $q'_{X_j}$. Let $q_{(X)0} = (q_{X_1 0}, q_{X_2 0}, \ldots, q_{X_n 0})$ denote the initial state and $F_{(X)} = (F_{X_1}, F_{X_2}, \ldots, F_{X_n})$ the final state.

The weighted transducer $M_n \circ (X)$ is the five-tuple $\langle Q, \Sigma, I, F, \delta \rangle$. A state $q \in Q \subseteq Q_{M_n} \times Q_{(X)}$ is a pair $(q_{M_n}, q_{(X)})$ where $q_{M_n} \in Q_{M_n}$ and $q_{(X)} \in Q_{(X)}$. The alphabet $\Sigma$ is $(\Sigma_X \cup \{\epsilon\}) \times (\Sigma_A \cup \{\epsilon\}) \times \{0, 1\}$. $I = \{(q_{M_n 0}, q_{(X)0})\}$ is the set of initial states, and $F = \{(q_1, q_2) \mid q_1 \in F_{M_n}, q_2 \in F_{(X)}\}$ is the set of accepting states. The transition relation $\delta \subseteq Q \times \Sigma \times Q$ is defined by $\delta_{M_n}$ and, for $1 \le i \le n$, $\delta_{X_i}$ as follows. For any $\delta_{M_n}(q_{M_n}, c_{M_n}) = q'_{M_n}$, let $j = c_{M_n}.track$ and $c_A = c_{M_n}.char$, and suppose $\delta_{X_j}(q_{X_j}, c_X) = q'_{X_j}$. Then, with $q = (q_{M_n}, q_{(X)})$:

- $\delta(q, (c_X, c_A, 1)) = (q'_{M_n}, q_{(X)}[q'_{X_j}/q_{X_j}])$ if $c_X \ne c_A$;
- $\delta(q, (c_X, c_A, 0)) = (q'_{M_n}, q_{(X)}[q'_{X_j}/q_{X_j}])$ if $c_X = c_A$;
- $\delta(q, (c_X, \epsilon, 1)) = (q_{M_n}, q_{(X)}[q'_{X_j}/q_{X_j}])$;
- $\delta(q, (\epsilon, c_A, 1)) = (q'_{M_n}, q_{(X)})$.

Shortest distance: $M_n \circ (X)$ uses the same shortest-path algorithm as the standard composition. The multi-track algorithm obtains the edit distance and the edited strings as follows:

```
dijkstra(I, M_n ∘ (X), 0);
for f ∈ F do
    if dijkstra.d[f] is minimum then
        edit-distance ← dijkstra.d[f]; (w'_1, ..., w'_n) ← trace(f);
    end
end
```

Figure 4.2: The process graph of Automata composition based on multi-track.

Figure 4.2 shows an example of multi-track Automata composition. (a) Finite automata $X_1$ and $X_2$ representing the strings $x_1$ = "a" and $x_2$ = "<". (b) The multi-track deterministic finite automaton $M_2$. (c) The weighted transducer $M_2 \circ X_1 \circ X_2$. When $M_2$ has a transition from state 0 to state 1 labeled $(c : \lambda)$, i.e., $c_{M_2}.track = 1$, we compose it with all transitions of $\delta_{X_1}$, which creates the corresponding transitions in that part of the graph. A for loop builds the transitions of every part according to the track of each MDFA transition. Finally, the whole graph is built like a cube, as in Figure 4.2, and Dijkstra's algorithm computes the shortest distance from the initial state to a final state when the two are connected.

Figure 4.3: The schematic diagram of multi-track composition.

Lemma 3 (Automata composition). If $(x_1, x_2) \notin L(M_2)$, Automata composition yields $(x'_1, x'_2) \in L(M_2)$ whose edit distance $dist(x_1, x'_1) + dist(x_2, x'_2)$ is minimal.

Proof: With more than one input, the states of $M_2 \circ (X)$ are built from $X_1$, $X_2$, and $M_2$. Since $F = (F_{M_2}, (F_{X_1}, F_{X_2}))$, a path from the initial state reaches a final state only if all three components reach their final states. Because each edit transition in $U$ edits exactly one character (insertion, deletion, or substitution) of either $X_1$ or $X_2$, the edit distance of the whole automaton equals the minimum total distance of $X_1$ and $X_2$. The same argument applies when $M_n$ is an n-track MDFA.
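The multi-track construction changes only the shape of a product state: the single position $i$ becomes one position per track. The sketch below assumes an MDFA encoding of our own, `delta[(q, (track, char))] -> q'`, which matches the restriction that every symbol is non-$\lambda$ on exactly one track, and it assumes shifting transitions were already eliminated as described above. Tracks are numbered from 0 in the code. It explores $M_n \circ (X)$ lazily with Dijkstra, and the returned cost is the sum of per-track edit distances, as in Lemma 3.

```python
import heapq

INF = float("inf")


def advance(pos, t):
    """Index tuple with track t moved one character forward."""
    return pos[:t] + (pos[t] + 1,) + pos[t + 1:]


def correct_multitrack(xs, delta, start, finals):
    """Lazy shortest path in M_n ∘ (X) (sketch).

    Assumed encoding: delta[(q, (track, char))] -> q', tracks numbered from 0,
    no λ-only transitions. xs is the tuple of n input strings. Returns
    (total edit distance, tuple of corrected strings) or None.
    """
    n = len(xs)
    target = tuple(len(x) for x in xs)
    labels = sorted({lab for (_, lab) in delta})
    init = (start, (0,) * n)
    dist, pred = {init: 0}, {}
    heap = [(0, init)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        q, pos = node
        if q in finals and pos == target:
            outs = [[] for _ in range(n)]
            while node in pred:
                node, (t, a) = pred[node]
                outs[t].append(a)  # a == "" for deletions
            return d, tuple("".join(reversed(o)) for o in outs)
        edges = [((q, advance(pos, t)), 1, (t, ""))          # delete on track t
                 for t in range(n) if pos[t] < target[t]]
        for t, a in labels:
            q2 = delta.get((q, (t, a)))
            if q2 is None:
                continue
            edges.append(((q2, pos), 1, (t, a)))             # insert a on track t
            if pos[t] < target[t]:                           # match / substitute
                edges.append(((q2, advance(pos, t)), int(a != xs[t][pos[t]]), (t, a)))
        for nxt, w, lab in edges:
            if d + w < dist.get(nxt, INF):
                dist[nxt] = d + w
                pred[nxt] = (node, lab)
                heapq.heappush(heap, (d + w, nxt))
    return None
```

As a cross-track usage example, an MDFA that must read 'a' on track 0 and then 'b' on track 1 (`{(0, (0, "a")): 1, (1, (1, "b")): 2}`, final state 2) repairs `("a", "a")` to `("a", "b")` at cost 1.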

### 4.2 Character composition

Composition: Automata composition requires space $O(|x||A|)$. However, following (1), the space complexity can be improved to linear space, $O(|x| + |A|)$.

Lemma 4. Character composition requires at most $O(|Q_A|)$ space to maintain a stack over a set of states of $A$. Hence the edit distance between $x$ and $A$ can be computed in linear space.

Proof: (1) shows that when computing the shortest distance to a state $q = (i, j)$ in $A \circ X$, only the shortest distances to the states of $A \circ X$ at levels $i$ and $i - 1$ need to be stored, which takes $O(|Q_A|)$ space. The shortest distances to states at levels strictly below $i - 1$ can safely be discarded, so only the last two levels stay active in the shortest-distance algorithm. This works because the string automaton is one-way and acyclic: every path of the composed automaton passes through at least one state at each level of $X$. Therefore, the space used to store the active part is $O(|\delta_A| + |Q_A|) = O(|A|)$, and the space required to compute the edit distance between $x$ and $A$ is linear, $O(|x| + |A|)$.

Consider a word $x$ made of the characters $c_1, c_2, \ldots, c_m$. For any consecutive characters $c_{i-1}, c_i$ of $x$, the level of $c_i$ is one greater than the level of $c_{i-1}$. The composition of $A$ and $c_i$, where $c_i$ is the i-th character of $x$, is denoted $A \otimes c_i$ and is a weighted transducer. Let $A = \langle Q_A, \Sigma_A, \delta_A, q_{A0}, F_A \rangle$; the weighted transducer $A \otimes c_i$ is the five-tuple $\langle Q, \Sigma, I, F, \delta \rangle$. A state $q \in Q \subseteq Q_A \times \mathbb{N}$ is a pair $(q_1, i)$ where $q_1 \in Q_A$ and $i$ is the level of the i-th character of $x$. Let $getIndex: Q \to \mathbb{N}$ be a function that returns the index of the automaton state $q_1$ of a state in $Q$. The alphabet $\Sigma$ is $(\{c_i\} \cup \{\epsilon\}) \times (\Sigma_A \cup \{\epsilon\}) \times \{0, 1\}$. $I = \{(q_{A0}, 0)\}$ is the set of initial states, and $F = \{(q_1, |x|) \mid q_1 \in F_A\}$ is the set of accepting states. The transition relation $\delta \subseteq Q \times \Sigma \times Q$ is defined by $\delta_A$ and $c_i$ as follows. For any $\delta_A(k, c_A) = k'$ and $\delta_{c_i}(i - 1, c_i) = i$:

- $\delta((k, i-1), (c_i, c_A, 1)) = (k', i)$ if $c_A \ne c_i$;
- $\delta((k, i-1), (c_i, c_A, 0)) = (k', i)$ if $c_A = c_i$;
- $\delta((k, i-1), (c_i, \epsilon, 1)) = (k, i)$;
- $\delta((k, i-1), (\epsilon, c_A, 1)) = (k', i-1)$.

Because only the shortest distances to the states of $A \otimes c_i$ at levels $i$ and $i - 1$ need to be processed, the full transducer does not have to be stored in memory. An input string of length $m$ is separated into $m$ characters $c_1, c_2, \ldots, c_m$, and $A$ is composed with $c_1$ through $c_m$ incrementally.

Algorithm: The pseudocode is given in Algorithm 3. The inputs are a word $x$ and an automaton $A$. CURDIS occupies $|Q_A|$ space to record the shortest distances of the current states from the initial state $I$ and is initialized to 0; CURCHAR records the current edited string at each step (line 1). The sanitization signature $A$ may contain cycles and cannot be sliced, whereas the input $x$ is a linear automaton, so Separate(x) splits $x$ into $(c)$, the sequence of its characters (line 2). A for loop over $cl = 1$ to $|x|$ tracks the current level and processes the characters of $(c)$ in turn: for each level it builds the small graph $C = A \otimes c_{cl}$ and computes its distances, level by level, until the final states are reached (lines 3 to 4). For the first character, $I$ is the initial state; otherwise, Init collects every state whose level is the current level minus one, and these states serve as the initial states of $C$ (lines 5 to 11). The call dijkstra(Init, C, CURDIS) finds the shortest paths from the states of Init to the other states of $C$, starting from the costs stored in CURDIS (line 12). Lines 13 to 17 are the core of the algorithm. Des contains every state whose level equals $cl$. CURDIS records the minimum distance from the initial state, through Init (the states $q$ with $level(q) = cl - 1$), to the states of Des, and CURCHAR records the string formed by the transitions on the shortest path traced back from Des, appended to the previous CURCHAR. At the last level, the edit distance and the edited string between the input $x$ and the sanitization signature $A$ are read from CURDIS and CURCHAR at the final-state indices (lines 19 to 23). As a result, the space for the composed graph drops from $|X \times A|$ in Automata composition to $|A|$ in Character composition.
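The four transitions of $A \otimes c_i$ can also be read as a recurrence over levels. This restatement is ours, not the thesis's notation: $D_i(q)$ is the shortest distance from the initial state to $(q, i)$, and $[a \ne c_i]$ is 1 when $a \ne c_i$ and 0 otherwise.

$$
D_i(q') = \min\Big( \underbrace{D_{i-1}(q') + 1}_{\text{deletion of } c_i},\ \underbrace{\min_{\delta_A(q, a) = q'} \big( D_{i-1}(q) + [a \ne c_i] \big)}_{\text{match or substitution}},\ \underbrace{\min_{\delta_A(q, a) = q'} \big( D_i(q) + 1 \big)}_{\text{insertion}} \Big)
$$

with $D_0(q_{A0}) = 0$, only the insertion term for the other states at level 0, and $d(x, A) = \min_{f \in F_A} D_{|x|}(f)$. The insertion term refers to level $i$ itself, which is why each level still needs its own Dijkstra pass, while only $D_{i-1}$ and $D_i$ are live at any time, matching the $O(|Q_A|)$ working space of Lemma 4.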

```
Input: x, A
1  CURDIS ← 0, CURCHAR ← null;
2  (c) = Separate(x);
3  for cl := 1 to |x| do
4      C = A ⊗ c_cl;
5      if cl = 1 then
6          Init ← q = (I_A, 0);
7      else
8          for ∀ s ∈ Q_A do
9              Init ← q = (s, cl − 1);
10         end
11     end
12     dijkstra(Init, C, CURDIS);
13     for i := |Q_A| − 1 to 0 do
14         Des ← ∀ q where level(q) = cl;
15         CURDIS[i] = dijkstra.d[Des[i]];
16         CURCHAR[i] = CURCHAR[getIndex(trace(Des[i]).last)] + trace(Des[i]);
17     end
18 end
19 for f := getIndex(F_A) do
20     if dist = CURDIS[f] is minimum then
21         edit-distance = dist, w' = CURCHAR[f];
22     end
23 end
```

Algorithm 3: Character composition
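Algorithm 3 can be sketched in a few dozen lines. The sketch below is an illustration, not the thesis code: it assumes the same kind of dict encoding `delta[(state, char)] -> state` (missing entries are the sink), unit costs, and hypothetical helper names. The dictionaries `curdis` and `curchar` play the roles of CURDIS and CURCHAR: each level first takes deletion, match, and substitution edges from the previous level, then runs Dijkstra over insertion edges inside the new level, and the previous level is discarded. Only the distance table is bounded by $|Q_A|$; the stored edited strings still grow with $|x|$.

```python
import heapq

INF = float("inf")


def insertion_closure(dist, text, delta, alphabet):
    """Dijkstra restricted to one level: only insertions (cost 1) stay on it."""
    heap = [(d, q) for q, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist[q]:
            continue  # stale heap entry
        for a in alphabet:
            q2 = delta.get((q, a))
            if q2 is not None and d + 1 < dist.get(q2, INF):
                dist[q2], text[q2] = d + 1, text[q] + a
                heapq.heappush(heap, (d + 1, q2))


def correct_by_characters(x, delta, start, finals):
    """Level-by-level correction keeping only one level of CURDIS/CURCHAR."""
    alphabet = sorted({a for (_, a) in delta})
    curdis, curchar = {start: 0}, {start: ""}       # level 0
    insertion_closure(curdis, curchar, delta, alphabet)
    for c in x:                                     # level cl-1 -> level cl
        nxt_dis, nxt_char = {}, {}
        for q, d in curdis.items():
            moves = [(q, d + 1, curchar[q])]        # delete c
            for a in alphabet:
                q2 = delta.get((q, a))
                if q2 is not None:                  # match (0) or substitute (1)
                    moves.append((q2, d + (a != c), curchar[q] + a))
            for q2, nd, s in moves:
                if nd < nxt_dis.get(q2, INF):
                    nxt_dis[q2], nxt_char[q2] = nd, s
        insertion_closure(nxt_dis, nxt_char, delta, alphabet)
        curdis, curchar = nxt_dis, nxt_char         # the old level is discarded
    best = [(curdis[f], curchar[f]) for f in finals if f in curdis]
    return min(best) if best else None
```

On the example of Figure 4.4 (strings over {a, b}, so '<' is rejected), `correct_by_characters("<a", {(0, "a"): 0, (0, "b"): 0}, 0, {0})` returns `(1, "a")`, the same distance Automata composition would report.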

Figure 4.4: The process graph of Character composition.

Figure 4.4 shows an example of Character composition. (a) The string $x$ = "<a" is separated into $c_1$ = '<' and $c_2$ = 'a'. (b) Finite automaton $A$, which accepts only words that do not contain the character '<'. (c) Weighted transducers $A \otimes c_1$ (left) and $A \otimes c_2$ (right). Dijkstra's algorithm computes the shortest distance from the initial state of $A \otimes c_1$ to the states at level 1, so the CURDIS values of states (1,1) and (0,1) in the left part are both 1. At the same time, the CURCHAR values of the level-1 states hold the edited character: "b", from substituting any character other than '<', and null, from deletion. In other words, when $c_1$ sets the level $cl$ to 1, we compute only the states of $C$ at levels 0 and 1 and record the shortest distances and edited characters of the level-1 states. We then discard the graph and move on to the next level, where we compute $A \otimes c_2$ in the right part of the figure. States (1,1) and (0,1) appear there as well, which reflects that every path from the initial state to a final state must pass through one of them. The values recorded in CURDIS and CURCHAR can therefore be carried over to the right part: the edit distance of (1,2) is the minimum, over the level-1 states, of the shortest distance in the right part plus the value recorded at that level-1 state. There are two shortest paths in (c), (0,0)(1,1)(1,2) and (0,0)(0,1)(1,2). Although they use different edit operations, both have edit distance 1.

#### 4.2.1 Extension to Multi-track

For multi-track sink problems, we adapt Character composition to an MDFA with multiple inputs. Let $M_n$ be an n-track MDFA that accepts n input strings. As before, every input string is first separated into characters: the inputs $(x_i)$ are split into the characters of every track, $(c_{ni})$ denotes the set of all characters of the n-track input, and $c_{ni}$ is the i-th character on the n-th track. We compose $M_n$ with $(c_{ni})$ incrementally. The weighted transducer $M_n \otimes (c_{ni})$ is the five-tuple $\langle Q, \Sigma, I, F, \delta \rangle$. A state $q \in Q \subseteq Q_{M_n} \times (n_i)$, where $(n_i)$ is the set of index tuples whose component on track n is at most i. Let $getIndex: Q \to \mathbb{N}$ be a function that returns the index of the automaton state $q_{M_n}$ of a state in $Q$. $(m_{j-1})[j/j-1] = (c_1, c_2, \ldots, m_j, \ldots, c_n)$ denotes the index tuple in which the index on track m is updated from $j - 1$ to $j$. The alphabet $\Sigma$ is $((c_{ni}) \cup \{\epsilon\}) \times (\Sigma_A \cup \{\epsilon\}) \times \{0, 1\}$. $I = \{(q_{M_n 0}, (0))\}$, where $(0)$ is the all-zero index tuple, and $F = \{(q_1, (|x_i|)) \mid q_1 \in F_{M_n}\}$, where $(|x_i|)$ is the tuple of the input lengths. The transition relation $\delta \subseteq Q \times \Sigma \times Q$ is defined by $\delta_{M_n}$ and $(c_{ni})$ as follows. For any $\delta_{M_n}(q_{M_n}, c_{M_n}) = q'_{M_n}$, let $m = c_{M_n}.track$ and $c_A = c_{M_n}.char$, and let $c_{mj}$ be the j-th character on track m. Then, with $q = (q_{M_n}, (m_{j-1}))$:

- $\delta(q, (c_{mj}, c_A, 1)) = (q'_{M_n}, (m_{j-1})[j/j-1])$ if $c_{mj} \ne c_A$;
- $\delta(q, (c_{mj}, c_A, 0)) = (q'_{M_n}, (m_{j-1})[j/j-1])$ if $c_{mj} = c_A$;
- $\delta(q, (c_{mj}, \epsilon, 1)) = (q_{M_n}, (m_{j-1})[j/j-1])$;
- $\delta(q, (\epsilon, c_A, 1)) = (q'_{M_n}, (m_{j-1}))$.

Algorithm: In the single-track case, Character composition removes one dimension of the space complexity, from $|x \times A|$ to $|A|$. We therefore expect that applying Character composition in the multi-track case also reduces the space complexity.

```
Input: M_n, (x_i)
1  CURDIS ← 0, CURCHAR ← null;
2  (c_ni) = Separate((x_i));
3  for cl1 := 0 to |x_1| do
4      for cl2 := 0 to |x_2| do
5          ...
6              for cln := 0 to |x_n| do
7                  (cl) = (cl1, cl2, cl3, ..., cln);
8                  C = M_n ⊗ (c_n(cl));
9                  if (cl) = (0) then
10                     Init ← q = (I_Mn, (0));
11                 else
12                     for ∀ s ∈ Q_Mn do
13                         for m := 1 to n do
14                             Init ← ∃ q = (s, (cl)[cl_m − 1 / cl_m]);
15                         end
16                     end
17                 end
18                 dijkstra(Init, C, CURDIS);
19                 for i := |Q_Mn| − 1 to 0 do
20                     Des ← ∀ (q_Mn, (cl));
21                     CURDIS[i][convert[(cl)]] = dijkstra.d[Des[i]];
22                     CURCHAR[i][convert[(cl)]] = CURCHAR[getIndex(trace(Des[i]).last)][convert[(cl)]] + trace(Des[i]);
23                 end
24             end
25         ...
26     end
27 end
28 for f := getIndex(F_Mn) do
29     if dist = CURDIS[f][convert[(cl)]] is minimum then
30         edit-distance = dist, (w'_1, ..., w'_n) = CURCHAR[f][convert[(cl)]];
31     end
32 end
```

Algorithm 4: Multi-track Character composition
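Algorithm 4 can be sketched as a sweep over index tuples in lexicographic order, so that every predecessor cell (one index decremented on one track) is finished before it is needed. The sketch below is ours and assumes the MDFA encoding `delta[(q, (track, char))] -> q'` with tracks numbered from 0, at least one track, and no shifting transitions. Instead of the integer `convert[(cl)]`, it keys cells by the index tuple without $cl_1$, and it keeps only the slab for $cl_1 - 1$ and the slab being built, which mirrors why convert skips $cl_1$.

```python
import heapq
from itertools import product

INF = float("inf")


def correct_multitrack_by_cells(xs, delta, start, finals):
    """Cell-by-cell multi-track Character composition (sketch).

    Assumed encoding: delta[(q, (track, char))] -> q', tracks from 0, n >= 1.
    A cell (index tuple cl) maps each MDFA state to a distance and to the
    tuple of edited strings on its shortest path.
    """
    n = len(xs)
    labels = sorted({lab for (_, lab) in delta})

    def extend(strings, t, a):
        return strings[:t] + (strings[t] + a,) + strings[t + 1:]

    def close(dist, text):
        # Dijkstra inside one cell: insertions consume no input character.
        heap = [(d, q) for q, d in dist.items()]
        heapq.heapify(heap)
        while heap:
            d, q = heapq.heappop(heap)
            if d > dist[q]:
                continue
            for t, a in labels:
                q2 = delta.get((q, (t, a)))
                if q2 is not None and d + 1 < dist.get(q2, INF):
                    dist[q2], text[q2] = d + 1, extend(text[q], t, a)
                    heapq.heappush(heap, (d + 1, q2))

    prev_slab = None
    rest_ranges = [range(len(x) + 1) for x in xs[1:]]
    for i1 in range(len(xs[0]) + 1):
        cur_slab = {}
        for rest in product(*rest_ranges):   # lexicographic: predecessors first
            cl = (i1,) + rest
            dist, text = {}, {}
            if not any(cl):
                dist[start], text[start] = 0, ("",) * n
            for t in range(n):
                if cl[t] == 0:
                    continue
                pred = cl[:t] + (cl[t] - 1,) + cl[t + 1:]
                pdist, ptext = (prev_slab if t == 0 else cur_slab)[pred[1:]]
                c = xs[t][cl[t] - 1]
                for q, d in pdist.items():
                    moves = [(q, d + 1, "")]                      # delete c
                    for tt, a in labels:
                        q2 = delta.get((q, (tt, a)))
                        if tt == t and q2 is not None:
                            moves.append((q2, d + (a != c), a))   # match / substitute
                    for q2, nd, a in moves:
                        if nd < dist.get(q2, INF):
                            dist[q2], text[q2] = nd, extend(ptext[q], t, a)
            close(dist, text)
            cur_slab[rest] = (dist, text)
        prev_slab = cur_slab
    dist, text = prev_slab[tuple(len(x) for x in xs[1:])]
    found = [(dist[f], text[f]) for f in finals if f in dist]
    return min(found) if found else None
```

The design choice is the usual space/time trade of dynamic programming over a grid: each cell runs one small Dijkstra, and at most two slabs of cells are alive at once.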

Building on Character composition, Algorithm 4 adapts the algorithm to multiple input strings and an MDFA. The inputs are an n-track MDFA $M_n$ and a set of inputs $(x_i)$. We allocate $|Q_{M_n}| \times |x_2| \times \cdots \times |x_n|$ space for CURDIS to record the shortest distances (line 1), and Separate slices the input strings into the set of characters $(c_{ni})$ on tracks 1 to n (line 2). n nested for loops compute the shortest distances incrementally (lines 3 to 6). $(cl) = (cl_1, cl_2, cl_3, \ldots, cl_n)$ is the index tuple of the composition currently being computed (line 7), and $(cl)$ selects the corresponding characters for the small composition $C = M_n \otimes (c_{n(cl)})$ (line 8). When $(cl)$ is the first tuple, $(0)$, $I$ is the initial state of $C$; otherwise, Init collects every existing state whose index on some track is one less than in $(cl)$, and these serve as the initial states of $C$ (lines 9 to 17). dijkstra(Init, C, CURDIS) computes the shortest paths from Init in $C$, adding the distance from the initial state to Init recorded in CURDIS (line 18). Des is every existing state whose index tuple equals $(cl)$, and CURDIS records the shortest distance through the previous-level states Init plus the shortest path to the current state Des in $C$. Let $convert: \mathbb{N}^n \to \mathbb{N}$ be the function from an index tuple to an integer defined by

$$convert[(cl)] = cl_2 + cl_3 \times |Q_{X_2}| + \cdots + cl_n \times |Q_{X_{n-1}}| \times |Q_{X_{n-2}}| \times \cdots \times |Q_{X_2}|$$

The function maps the multidimensional coordinates to a single integer. It omits $cl_1$ because every character of $x_1$ is treated as a node that every path must pass through. convert[(cl)] gives the index into the two-dimensional array CURDIS, which is used to update the shortest distances of the current states (lines 19 to 21). CURCHAR records the string formed by the transitions on the shortest path traced back from Des to its starting state in Init, appended to the previous CURCHAR (line 22). Finally, once CURDIS includes the final states, we choose the minimum distance between $M_n$ and the inputs $(x_i)$ in CURDIS and obtain the final edited strings $(w'_1, \ldots, w'_n)$ (lines 28 to 32).

For example, in Figure 4.5, when $M_n$ has 2 tracks, Automata composition builds a cube and runs the shortest-path algorithm on it. Instead, we separate the inputs and apply Character composition, so the composition is only the blue part in (a). The red lines are CURDIS and CURCHAR, which record the shortest distances and edited strings from the initial state to the current states. Figure (b) details the blue cube and shows the general composition $M \otimes (c_{1i}, c_{2j})$; it uses the neighboring CURDIS on every track to compute the shortest distances of the states on the green line. In (c), the green line then replaces the red line of CURDIS from the previous level. We repeat this incrementally until CURDIS includes the final state, and then read the result from it.

Figure 4.5: The process graph of Character composition based on multi-track.

Finally, we state:

Theorem 1. An optimal patch between a set of strings $x_1, \ldots, x_n$ and an n-track automaton $M_n$ can be computed in space $O(|x_1| + |x_2| + \cdots + |x_n| + |M_n| + |Q_{M_n}| |Q_{x_2}| \cdots |Q_{x_n}|)$ and in time $O(n |Q_{x_1}| |Q_{x_2}| \cdots |Q_{x_n}| \, |M_n| \log |Q_{M_n}|)$.

Proof: When computing shortest paths with Character composition, we only process one character of each input track at a time, so the working space of the process is $O(|x_1| + |x_2| + \cdots + |M_n|)$. However, the record of the states that must be passed through grows from $O(|Q_A|)$ in the single-track case to $O(|Q_{M_n}| |Q_{x_2}| \cdots |Q_{x_n}|)$. In Automata composition, the time complexity is $|x||A|$: the shortest-path algorithm takes $(|Q_A|)^2$ and is executed $|Q_X||Q_A|$ times. Since we compute only one part at a time, the time complexity of each small part is $O(n |M_n| \log |Q_{M_n}|)$, and it must be executed $|Q_{x_1}||Q_{x_2}| \cdots |Q_{x_n}||Q_{M_n}|$ times, which is the total number of states. Therefore, the total time complexity is $O(n |Q_{x_1}||Q_{x_2}| \cdots |Q_{x_n}| |Q_{M_n}| \, |M_n| \log |Q_{M_n}|)$.

#### 4.2.2 Pre-Computation of Shortest Distance

```
Input: M_n
for ∀ char ∈ Σ do
    for track := 1 to n do
        Arrays.fill((c_ni), char);
        C = M_n ⊗ (c_ni);
        for ∀ from ∈ Q_Mn do
            Init ← q = (from, (0));
            dijkstra(Init, C, (0));
            for ∀ to ∈ Q_Mn do
                Des ← (to, (0)[track_0 / track_1]);
                DISINFO[(int)char][from][to] = dijkstra.d[Des.getIndex];
                CHARINFO[(int)char][from][to][track] = trace(Des);
            end
        end
    end
end
```

Algorithm 5: Pre-Computation

Character composition repeats many identical steps when inputs arrive at run time. For example, when the application receives the input "<<", it performs the composition and the shortest-path search twice, although both compositions and both paths are the same. For better efficiency, we therefore pre-compute, once for every possible single-character input, the shortest distance between every pair of states on every track, and record the results. The pre-computation process is shown in Algorithm 5. The pre-computed tables DISINFO and CHARINFO occupy $O(|\Sigma| \cdot n|Q_{M_n}|^2)$ space. They can replace the automaton as the sanitization signature, so the number of automaton transitions no longer matters. Pre-computation also makes Character composition faster, because it skips Dijkstra's algorithm, the most time-consuming part of our method at run time. The resulting time complexity is $O(n|Q_{x_1}||Q_{x_2}|\cdots|Q_{x_n}||Q_{M_n}|)$.
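The run-time side can be sketched in the same setting: single track, explicit DFA and unit edit costs, with all names hypothetical. For brevity, this sketch builds each per-character table from breadth-first insertion paths with a closed form, instead of running Dijkstra as Algorithm 5 does. Moving from p to q while rewriting c costs the cheaper of two options:

- one deletion plus a shortest insertion path p → q, or
- an insertion path p → s, then one transition s →a t (free when a = c, otherwise a substitution), then an insertion path t → q.

Under unit costs this should give the same distances as Algorithm 5. The tables are built once for every character in Σ. At run time each input character then only updates CURDIS and CURCHAR with a min-plus step over the stored table, so no shortest-path search runs per character.

```python
from collections import deque

INF = float("inf")


def insertion_paths(states, alphabet, delta):
    """paths[p][q]: a shortest string w with delta*(p, w) = q (BFS, unit cost)."""
    paths = {}
    for p in states:
        best, queue = {p: ""}, deque([p])
        while queue:
            s = queue.popleft()
            for a in alphabet:
                t = delta.get((s, a))
                if t is not None and t not in best:
                    best[t] = best[s] + a
                    queue.append(t)
        paths[p] = best
    return paths


def char_table(c, states, delta, paths):
    """Cost and output for rewriting the single character c while moving p -> q."""
    dis, out = {}, {}
    for p in states:
        for q in states:
            best, text = INF, None
            if q in paths[p]:  # delete c, then insert a shortest path p -> q
                best, text = 1 + len(paths[p][q]), paths[p][q]
            for (s, a), t in delta.items():  # insert p -> s, read a in place of c, insert t -> q
                if s in paths[p] and q in paths[t]:
                    cost = len(paths[p][s]) + (a != c) + len(paths[t][q])
                    if cost < best:
                        best, text = cost, paths[p][s] + a + paths[t][q]
            if text is not None:
                dis[(p, q)], out[(p, q)] = best, text
    return dis, out


def patch(x, start, finals, states, alphabet, delta):
    """Minimum edit distance from x to the DFA's language, plus one optimal patch."""
    paths = insertion_paths(states, alphabet, delta)
    # Pre-computation: one table per character of the alphabet.
    tables = {c: char_table(c, states, delta, paths) for c in alphabet}
    curdis, curchar = {start: 0}, {start: ""}  # CURDIS / CURCHAR
    for c in x:
        if c not in tables:  # character outside the alphabet
            tables[c] = char_table(c, states, delta, paths)
        dis, out = tables[c]
        nxtdis, nxtchar = {}, {}
        for p, d in curdis.items():  # min-plus step, no shortest-path search
            for q in states:
                if (p, q) in dis and d + dis[(p, q)] < nxtdis.get(q, INF):
                    nxtdis[q] = d + dis[(p, q)]
                    nxtchar[q] = curchar[p] + out[(p, q)]
        curdis, curchar = nxtdis, nxtchar
    best, result = INF, None  # trailing insertions into a final state
    for p, d in curdis.items():
        for f in finals:
            if f in paths[p] and d + len(paths[p][f]) < best:
                best, result = d + len(paths[p][f]), curchar[p] + paths[p][f]
    return best, result
```

For example, take the signature that accepts only `ab`. The call `patch("b", 0, {2}, [0, 1, 2], ["a", "b"], {(0, "a"): 1, (1, "b"): 2})` returns `(1, "ab")`.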

5 Experiment

5.1 Performance

Since we implement two algorithms to compute the edit distance, we compare their average running times. For the single-track case, we compose a simple automaton (Q=12, bdd=29), which has 12 states and 29 transitions, with 1000 random input strings of each input length. We then measure the average time each algorithm spends per input length. In Figure 5.1(1), the abscissa is the length of the input string and the ordinate is the time in seconds. When the input is short, Automata composition takes about the same average time as Character composition. Beyond a certain input length, however, Character composition and Pre-computation grow linearly, while the slope of Automata composition keeps increasing. We infer that Character composition and Pre-computation are more efficient in most cases.

Next, we test automaton 2, which has more states and BDD nodes (Q=62, bdd=127). Figure 5.1(2) shows almost the same growth curves as Figure 5.1(1), but the values on the ordinate are about 100 times larger. This shows that a larger number of states makes both algorithms take longer. We also test an automaton with a similar number of states but over twice as many transitions (Q=74, bdd=298). The curves in Figure 5.1(3) are still similar, although the values on the ordinate increase again. Because the time gap between the three methods is too large to compare them within one plot, Figure 5.1(4) and (5) collect the Character composition and Pre-computation curves, respectively, from the first three panels. This compares the same method across different automata. The time gap of Character composition between (2) and (3) is much larger than that of Pre-computation. We infer that the running time of Character composition depends on the length of the input string, the number of states, and the number of transitions of the automaton. Pre-computation, in contrast, depends only on the input length and the number of states.

Figure 5.1: Average time versus input length. (1) Single-track automaton 1 (Q=12, bdd=29); (2) single-track automaton 2 (Q=62, bdd=127); (3) single-track automaton 3 (Q=74, bdd=298); (4) average time of Character composition from (1)(2)(3); (5) average time of Pre-computation from (1)(2)(3).

For the multi-track case, we test whether efficiency depends on the same three factors. We first take a basic MDFA (Q=21, bdd=77) and measure the time as the string length on the first track grows; the result is shown in Figure 5.2(1). The growth curves of the two algorithms resemble those of the single-track case. The running time of Automata composition grows faster, while the time curves of Character composition and Pre-computation remain close to linear, and the gap between the three methods widens further. We also run both algorithms on an MDFA with more states (Q=36, bdd=72). Figure 5.2(2) shows essentially the same curves as Figure 5.2(1), but the values on the ordinate are about two times larger. These results indicate that Character composition takes less time and is more efficient in most cases.

Next, in Figure 5.2(3), we increase the input lengths of both tracks together, so that $|x_1| = |x_2|$ in the two-track automaton. The time curve of Automata composition rises steeply. Character composition and Pre-computation, by comparison, save more time and remain more efficient in this more complex setting.

Finally, Figure 5.2(4) and (5) combine the Character composition and Pre-computation curves from the first three panels. The curve from (3) grows quadratically when both input lengths increase simultaneously. This is consistent with the factor $|Q_{x_1}||Q_{x_2}|$ in Theorem 1.

Figure 5.2: Average time versus input length. (1) Multi-track automaton 1 (Q=21, bdd=77), increasing the length of input 1; (2) multi-track automaton 2 (Q=36, bdd=72), increasing the length of input 1; (3) multi-track automaton 1 (Q=21, bdd=77), increasing the lengths of both inputs; (4) average time of Character composition from (1)(2)(3); (5) average time of Pre-computation from (1)(2)(3).

5.2 Correction Effect

| Attack | Tracks | #automata | Avg. #states | Avg. #bdd |
|---|---|---|---|---|
| Attack 1 (SQLI) | 1 | 427 | 14 | 115 |
| | 2 | 10 | 107.5 | 1814.5 |
| | 3 | 15 | 183.1 | 2068.3 |
| Attack 2 (XSS) | 1 | 60 | 10.0 | 79.1 |
| | 2 | 8 | 78.5 | 1163.5 |
| | 3 | 3 | 164.3 | 1672.6 |

Table 5.1: Testing automata information.

We analyze the PHP code of several real-world web sites and generate white-list automata to test our algorithm. Table 5.1 lists the number of automata by attack type (SQLI, XSS) and number of tracks (1, 2 and 3), together with the average numbers of states and BDD nodes per automaton. We can see that there
