A Hybrid Security Analyzer for Java Web Applications (混合式的Java網頁應用程式分析工具) - NCCU Academic Hub


(1) National Chengchi University, Department of Computer Science (國立政治大學資訊科學系). Master's Thesis. A Hybrid Security Analyzer for Java Web Applications (混合式的 Java 網頁應用程式分析工具). Student: 江尚倫. Advisor: Dr. 陳恭. July 2010.

(2) A Hybrid Security Analyzer for Java Web Applications (混合式 Java 網頁應用程式分析工具). Student: 江尚倫. Advisor: Dr. 陳恭. Master's thesis, Department of Computer Science, National Chengchi University. A thesis submitted to the Department of Computer Science, National Chengchi University, in partial fulfillment of the requirements for the degree of Master in Computer Science. July 2010.

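(3) Acknowledgments

As I write these acknowledgments, my two years as a master's student are coming to an end, and I have mixed feelings. The master's program was at times very frustrating: I felt like an outsider who had stepped straight into a graduate school in a different field. In the end I overcame this and completed a master's thesis of my own, which gives me a great sense of accomplishment. I am most grateful to my advisor, Professor 陳恭, who was willing to lead me into this unfamiliar field, pushed and guided me, and often shared personal experience with us. I benefited greatly and sincerely thank the professor for this teaching in every respect. I also thank my lab mates, who kept me from feeling alone while I struggled with research: 起點, 立王, 政宏, 天線, 瀟灑哥, Kevin 黃, and 立恒, thank you for your company and for the laughter you brought to the lab. Finally, I thank my family and 佩瑩. Without you I would not be who I am today. I will carry the confidence and courage you gave me and leave beautiful, colorful footprints on the unknown road ahead.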

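(4) A Hybrid Security Analyzer for Java Web Applications

Thesis Abstract

In recent years web applications have flourished, and providing services or doing business through web applications has become a trend. Web applications have therefore naturally become targets for attackers, and attack techniques keep evolving. Many methods have been proposed to defend against these attacks and improve web application security, such as firewalls and encrypted connections, but their effect is limited. The most fundamental approach is to go back to the design of the web application itself and find its weaknesses, which is the only way to stop ever-changing attack techniques. Program analysis is one common way to find these weaknesses. It is divided into static analysis and dynamic analysis, and both can find such weaknesses effectively. We surveyed web application analysis techniques from recent years and found that most use static analysis. Our comparison showed, however, that static analysis cannot produce precise results for Java web applications because of features of the Java language such as polymorphic variables and the use of reflection. Static analysis is inherently limited in handling these features: because it does not actually execute the program, it cannot obtain information that exists only at run time.

This research focuses on dynamic program analysis, that is, analysis performed while the program executes, to solve the problems above for Java web applications. To obtain usable analysis information during execution, we use AspectJ's instrumentation technique. Our tool first weaves the information-collecting module into the application's source code and executes the program through unit tests. During execution, analysis information is passed to the analysis module, which uses features of the Java language to track tainted data. Because dynamic detection depends on the executed paths, some potential vulnerabilities may go undetected. We therefore apply the concept of online analysis and design an online taint dataflow analysis module. Our tool combines the results of the two analysis modules to give developers information about web application vulnerabilities.

Keywords: dynamic analysis, online analysis, dataflow analysis, web application, security vulnerabilities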

(5) A Hybrid Security Analyzer for Java Web Applications

Abstract

In recent years web application development has flourished and the number of Internet users has grown, so providing customer service and doing business over the network has become a prevalent trend. Consequently, web applications have become targets of attackers. As information technology progresses, web attack techniques keep changing and spreading. Some approaches, such as firewalls and encrypted connections, have been used to prevent web attacks, but they have limited effect. The fundamental approach is to eliminate the vulnerabilities inside the web application. Program analysis is a common technique for detecting these vulnerabilities. There are two major approaches: static analysis and dynamic analysis. Both can detect vulnerabilities effectively. We reviewed several program analysis tools, most of which are static analysis tools. However, we noticed that analyzing Java programs statically is insufficient because of features of the Java language such as polymorphism and reflection. Static analysis is inherently limited in examining these features, because it does not execute the program and therefore lacks runtime information. In this thesis we focus on dynamic analysis, where the analysis occurs while the program is executing, to solve these problems for Java web applications. To retrieve runtime analysis information, we use the instrumentation mechanism provided by AspectJ. We instrument the program with a purpose-built module that gathers the needed information, and we execute the program through unit tests. Our dynamic analysis module retrieves the information from the instrumented running program and uses features of Java to track tainted data.
Dynamic tracking leaves some vulnerabilities undiscovered when the program is not executed completely. We therefore adopt the concept of online analysis and design an online analysis module to find potential vulnerabilities that cannot be detected by dynamically tracking tainted data. Our tool integrates the two analysis results to give developers a sounder analysis result.

Keywords: Dynamic Analysis, Online Analysis, Dataflow Analysis, Web Application, Security Vulnerabilities

(6) Contents

1 INTRODUCTION
  1.1 BACKGROUND
  1.2 MOTIVATION AND OBJECTIVES
  1.3 THESIS OUTLINE
2 RELATED WORK
  2.1 STATIC ANALYSIS IN JAVA APPLICATION WITH STATIC ANALYSIS
    2.1.1 System Overview
    2.1.2 Points-to analysis
    2.1.3 Specifying Taint Problems in PQL
    2.1.4 Discussion
  2.2 STATIC ANALYSIS TOOL FOR PHP WEB APPLICATION
    2.2.1 System overview
    2.2.2 Taint dataflow analysis
    2.2.3 Discussion
  2.3 DATAFLOW POINTCUT IN ASPECT-ORIENTED PROGRAMMING
    2.3.1 dataflow pointcut
    2.3.2 Excluding condition
    2.3.3 Discussion
  2.4 POSITIVE TAINTING IN WASP SYSTEM
    2.4.1 Positive Tainting
    2.4.2 Syntax-Aware Evaluation
    2.4.4 Discussion
  2.5 FAST ONLINE POINTER ANALYSIS
    2.5.1 Online analysis architecture
    2.5.2 Discussion
  2.6 SUMMARY
3 PRELIMINARIES
  3.1 VULNERABILITIES IN WEB APPLICATIONS
    3.1.2 Injection flaw vulnerabilities
    3.1.3 Cross-Site Scripting vulnerabilities
  3.2 ASPECT-ORIENTED PROGRAMMING
4 SYSTEM ARCHITECTURE
  4.1 TAINT TRACKER ASPECT
  4.2 ONLINE TAINT DATAFLOW ANALYSIS

(7)
    4.2.1 Collect the information from instrumentation
    4.2.2 Online Taint Dataflow Analysis
  4.3 PROGRAM EXECUTOR
    4.3.1 Gather the information of HTML form
    4.3.2 Parsing Web Configuration XML File
5 EVALUATION
  5.1 SECURIBENCH-MICRO BENCHMARK
  5.2 PRECOMPILATION OF JAVA SERVER PAGES
  5.3 A REAL WORLD CASE: PEBBLE BLOG
6 CONCLUSION
  6.1 CONTRIBUTIONS
  6.2 FUTURE WORK
REFERENCE
APPENDIX

(8) List of Figures

Figure 1.1: A motivating example
Figure 2.1: System architecture of Java static analyzer
Figure 2.2: Example of PQL query for finding SQL injections
Figure 2.3: PHP analyzer analysis architecture
Figure 2.4: The rules of PHP taint dataflow analysis
Figure 2.5: Taint dataflow analysis algorithm
Figure 2.6: WASP system architecture
Figure 2.7: Identification of trusted and untrusted data
Figure 2.8: Online analysis architecture
Figure 3.1: DOM-based XSS vulnerability example
Figure 4.1: System architecture
Figure 4.2: Scenario of analysis using our tool
Figure 4.3: String operations
Figure 4.4: Online analysis flow diagram
Figure 4.5: Online taint dataflow analysis rules
Figure 4.6: Scenario of program executor
Figure 5.1: Comparison of the experimental result with other analyzers using the Securibench-Micro benchmark
Figure 5.2: Pebble experiment result

(9) Chapter 1 Introduction

1.1 Background

The number of web applications has grown rapidly in recent years. They have become ubiquitous because of their convenience, flexibility, and availability. More and more organizations and companies have built their own websites for customer service or business purposes, and people have become used to the services that Web applications provide over the Internet.

As users' needs grow, the business logic and program code of Web applications become more complicated and fragile, and users can easily run into security problems when accessing them. According to OWASP [1-2], there are already a variety of ways to access a Web application maliciously, e.g., injection flaws and Cross-Site Scripting attacks. These attacks exploit security vulnerabilities introduced into a web application during development.

Some techniques, such as firewalls and encrypted connections, have been used to prevent these attacks, but they cannot address all of these vulnerabilities and only work in certain scenarios. In this thesis we therefore focus on eliminating these vulnerabilities at the source code level. We believe that removing most of the insecure code before a Web application is deployed makes it more secure.

1.2 Motivation and Objectives

Vulnerabilities in Web applications may lead to serious security issues such as leaking sensitive personal information, corrupting the database, or even crashing the server. Attackers can access insecure Web applications through these vulnerabilities and take over control. Insufficient background knowledge or carelessness on the part of a Web application developer are possible

(10) reasons for these vulnerabilities. We can reduce them during development, although a compiler cannot identify them at compile time and even an experienced developer may still write insecure code. One way to reduce the vulnerabilities in a Web application is to review the program code manually, but this is time-consuming and error-prone as the application grows larger and more complex. For these reasons it is practical to have a program analysis tool that helps developers find the vulnerabilities in a program.

We use program analysis approaches to implement our analysis tool. There are two major approaches: static program analysis and dynamic program analysis. Static analysis analyzes the program without executing it, and dynamic analysis analyzes the program while it is executing. Both have limitations. For example, static analysis may produce more false positive alarms than dynamic analysis because it does not actually execute the program, while dynamic analysis may leave some vulnerabilities undiscovered if the program has been executed incompletely. Online analysis [6] has been proposed to address these problems. Like dynamic analysis, it analyzes the program while the program is running, but it analyzes code incrementally as that code is dynamically loaded into the running program. Online analysis can therefore handle dynamic code.

In this thesis we choose Java as our target language because it is one of the most widely used server-side languages. According to TIOBE's [3] statistics for June 2010, Java was ranked first. Java has its own characteristics, e.g., polymorphism, dynamic loading, and the reflection mechanism.
Because of these features, analyzing a Java program statically is insufficient. A motivating example is shown in the figure below.

(11)
 1  public class Refl1 extends BasicTestCase implements MicroTestCase {
 2      private static final String FIELD_NAME = "name";
 3      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
 4              throws IOException {
 5          String s1 = req.getParameter(FIELD_NAME);   // source: unchecked user input
 6          PrintWriter writer = resp.getWriter();
 7          Method idMethod = null;
 8          try {
 9              Class clazz = Class.forName("securibench.micro.reflection.Refl1");
10              Method methods[] = clazz.getMethods();
11              for (int i = 0; i < methods.length; i++) {
12                  Method method = methods[i];
13                  if (method.getName().equals("id")) {
14                      idMethod = method;
15                      break;
16                  }
17              }
18              Object o = idMethod.invoke(this, new Object[] {s1, writer});
19              String s2 = (String) o;
20              writer.println(s2);                     // sink: tainted value is printed
21          } catch (Exception e) {
22              e.printStackTrace();
23          }
24      }
25      public String id(String string, PrintWriter writer) {
26          return string;
27      }
28  }

Figure 1.1: A motivating example

In the code sample, the unchecked user input s1 (regarded as a tainted string) gets its value from the call to getParameter at line 5. Next, a dynamic loading call loads the Refl1 class itself and acquires the id method of that class at lines 9 and 13. The id method is then invoked reflectively at line 18. Since the id method simply returns its first string argument, object o and string s2 refer to the same string object as s1. Method println is considered an XSS sink because it renders the string value of its input to the screen. Thus, the call to println with argument s2 at line 20 poses a security issue. This example illustrates many of the challenges faced by static taint analysis of real programs. To analyze this code precisely, the analysis must track flow through dynamic loading and reflective calls and must also resolve the polymorphic relationships among the objects involved. Static analysis algorithms that cannot disambiguate the dynamic code will fail to distinguish between the

(12) vulnerable and benign calls to println. We therefore design a hybrid analysis tool that combines dynamic analysis with online analysis to solve the problems that arise with a static analyzer.

Our analysis tool has three major parts. First, we choose the AspectJ language to implement the dynamic analysis part of the tool. By instrumenting the program with the dataflow analysis module, we can dynamically track unchecked user input while the program is executing. Second, we use SOOT [4], a Java optimization framework, to implement the online analysis module. Using information collected by purpose-built instrumentation, this module dynamically loads the program code that has been executed. It then compiles the executed code into JIMPLE, a typed, 3-address, statement-based intermediate representation, and performs online taint dataflow analysis. We design an online taint dataflow analysis algorithm based on JIMPLE to detect vulnerabilities in Java web applications. We believe that combining the results of these two modules yields more accurate vulnerability detection. Finally, we design an executor module that executes the program as completely as it can in order to trigger the entire analysis. Our program analysis tool can help developers reduce the number of vulnerabilities effectively.

1.3 Thesis Outline

The rest of this thesis is organized as follows. Chapter 2 introduces related work on detecting Web application vulnerabilities. Chapter 3 introduces preliminaries for this thesis, including common web vulnerabilities, aspect-oriented programming, and SOOT's JIMPLE statements. Chapter 4 presents the overall system architecture. Chapters 4 and 5 discuss the system implementation in detail and present the experimental results of our evaluation.
Finally, chapter 6 concludes and indicates some possible future work.

(13) Chapter 2 Related Work

Many approaches have already been proposed for detecting web application vulnerabilities. They fall mainly into two categories: static analysis and dynamic analysis. Both are long established and have their own characteristics and limitations. We will introduce a Java static analyzer proposed by Livshits et al., who use the PQL language to query for vulnerabilities in the class files of a program. We will also introduce Chung's PHP static analysis tool [5] and describe the advantages of performing static analysis on an intermediate representation.

Hirzel et al. [6] were perhaps the first to propose the online analysis approach, for Andersen's pointer analysis. Online analysis is a kind of dynamic analysis that has characteristics of both static and dynamic analysis. Like dynamic analysis, it needs to execute the program, and like static analysis, it runs a specific analysis algorithm, producing its result gradually. We will describe the features of online analysis.

This thesis is concerned with dataflow analysis, a technique for gathering information about the possible set of values calculated at arbitrary points in a program. It can be performed either statically or dynamically. Masuhara et al. [7] proposed a new pointcut, named the dflow pointcut, based on the AspectJ language. It performs dynamic dataflow analysis on a program. We use the dflow pointcut semantics as the base technique for implementing a dynamic dataflow analysis module. Halfond et al. [8] proposed positive tainting techniques for preventing SQL injection attacks (SQLIAs), the opposite of the traditional taint analysis concept.
The WASP system dynamically tracks hard-coded strings (regarded as untainted) instead of user input strings (regarded as tainted) and uses Syntax-Aware Evaluation

(14) techniques to protect the web application from SQL injection attacks. We will describe these mechanisms in detail in the following sections.

2.1 Static Analysis in Java Application with Static Analysis

Static analysis is a common and well-established approach to program analysis. Livshits et al. [9] proposed a static analysis technique for detecting vulnerabilities in Java programs based on a scalable and precise points-to analysis. They use the PQL [10] language to specify taint problems.

2.1.1 System Overview

Their tool is based on a static analysis for finding vulnerabilities caused by unchecked input. Users of the tool can describe vulnerability patterns of interest succinctly in PQL, an easy-to-use program query language with a Java-like syntax. The system overview is shown in Figure 2.1. The tool applies user-specified queries to Java bytecode and finds all potential matches statically. The results of the analysis are integrated into Eclipse.

Figure 2.1: System architecture of Java static analyzer

2.1.2 Points-to analysis

Their approach uses a sound static analysis to find all potential violations matching a vulnerability specification given by its source, sink, and derivation descriptors. To find security violations statically, it is necessary to know what objects these descriptors may refer to, which is the problem that points-to analysis addresses. Precise points-to information can significantly reduce the number of false positives. Context sensitivity refers to the ability of an analysis to keep

(15) information from different invocation contexts of a method separate and is known to be an important feature contributing to precision. Their static analysis is based on a context-sensitive Java points-to analysis. The algorithm uses binary decision diagrams to efficiently represent and manipulate points-to results for exponentially many contexts in a program.

2.1.3 Specifying Taint Problems in PQL

Their tool uses PQL, a program query language, to describe taint problems. PQL serves as syntactic sugar, allowing users to express vulnerability patterns in a familiar Java-like syntax. PQL is a general query language capable of expressing a variety of questions about program execution. A PQL query is a pattern describing a sequence of dynamic events that involves variables referring to dynamic object instances. The uses clause declares all object variables the query refers to. The matches clause specifies the sequence of events on object variables that must occur for a match. Finally, the return clause specifies the objects returned by the query whenever a set of object instances participating in the events in the matches clause is found. Figure 2.2 is an example of a PQL query for finding SQL injections.

2.1.4 Discussion

Pros:
They introduced points-to analysis for statically detecting vulnerabilities in Java Web applications. This technique leads to more precise results within a static analysis approach. They used the PQL language to query for vulnerabilities in the program and formalized the source, sink, and derivation descriptors.

Cons:

(16) Static analysis is a conservative analysis approach that does not actually execute the program. Incompleteness therefore becomes the major challenge when implementing a static analysis. Over-approximation often introduces false positives into the result. Another disadvantage is that static analysis cannot evaluate values that are dynamically loaded into the program. When applied to a Java web application, it cannot handle the dynamic features of the language (e.g., dynamic loading and reflection), which leads to false negatives.

Figure 2.2: Example of PQL query for finding SQL injections

2.2 Static analysis tool for PHP web application

Chung et al. [5] applied static taint dataflow analysis to detect vulnerabilities in PHP

(17) web applications. The PHP analysis tool parses the PHP program into an intermediate representation (IR) and performs the taint dataflow analysis on the IR. The intermediate language keeps the semantics of the original PHP program and removes its redundant code. It is also clearer for the analysis algorithm.

2.2.1 System overview

The PHP analyzer operates in two major phases.

- Parse PHP into C intermediate language. The system first parses PHP into an abstract syntax tree. It then transforms the PHP abstract syntax tree into a C abstract syntax tree with a purpose-built compiler, keeping most of the semantics of the PHP program during the transformation. Finally, the system transforms the C abstract syntax tree into C intermediate language (CIL).
- Dataflow analysis. The second phase runs static analysis on the CIL generated by the first phase. The system gathers control flow graph information and starts the dataflow analysis. They designed an algorithm that evaluates the program for taint dataflow analysis.

The analysis architecture is shown in Figure 2.3.

Figure 2.3: PHP analyzer analysis architecture

(18) 2.2.2 Taint dataflow analysis

Four kinds of rules decide whether a variable is tainted while the system runs the dataflow analysis: source rules, pass-through rules, cleanse rules, and sink rules. Figure 2.4 illustrates the relation between the rules and PHP functions. Their dataflow analysis includes two steps. The first step is a forward dataflow analysis that records variable dependency information and taint information. The second step is a backward dataflow analysis that finds tainted variables in sink functions and traces them back to their taint source. Finally, they use a purpose-built algorithm to execute the taint dataflow analysis.

Rules          Function or variable names
Source         $_GET, $_POST, $_REQUEST, $_SERVER, $_COOKIE, $_FILES
Pass-through   var_assign_str(), arr_assign_str(), php_str_split(), ...
Cleanse        php_htmlspecialchars(), php_preg_replace(), ...
Sink           printf(), php_mysql_query(), ...

Figure 2.4: The rules of PHP taint dataflow analysis

Figure 2.5 shows the taint dataflow analysis algorithm. The algorithm first records the variable information for each statement. When it meets a branch statement such as 'if', it computes the variable information for each branch separately and then merges the results at the branch join point. When it meets a loop statement, it computes the variable information repeatedly until it reaches a fixed point. In the second step, the algorithm uses the result computed in step one to trace back tainted variables in sink functions.

2.2.3 Discussion

Pros
Performing the analysis on an intermediate representation translated from the original program is a common technique. Translating the program into an intermediate representation retains the semantics of the original program and presents the original code in a clean and clear

(19) semantic form. The static PHP analysis tool uses CIL as its intermediate representation. The CIL code simplifies the taint dataflow analysis and gives much better analysis results for a PHP web application.

Cons
Like the previously introduced work, static analysis needs over-approximation to evaluate the possible states of the program. The false positive rate is therefore expected to be higher than that of dynamic analysis.

Traverse the CIL AST nodes
foreach node n in CIL AST
    if n = instruction node
        vname := lvalue's name in this instruction
        rv := right-hand side value in the instruction
        t_flag := check whether rv is tainted or untainted
        obtain new variable information vi (vname, rv, t_flag) and line number
        update local variable information map with vi
        update whole variable information map with vi
    else if n = join node of "if statement"
        merge variable information maps from previous paths
    else if n = join node of "loop statement"
        if new variable information = old variable information
            algorithm has reached a fixed point, then go to next node
        else
            merge old variable information with new variable information map
            revisit this loop block

Figure 2.5: Taint dataflow analysis algorithm

2.3 Dataflow Pointcut in Aspect-Oriented programming

Masuhara et al. [7] observed that in web applications the sanitizing task can be a crosscutting concern because its implementation can involve many parts of the program. They therefore proposed a new pointcut, named the dflow pointcut, in the AspectJ language for

(20) dynamically performing dataflow analysis in a Java Web application.

2.3.1 dataflow pointcut

The dflow pointcut must be used as a conjunctive term with other pointcuts. The following syntax defines pointcut p:

p ::= call(s) | args(x1, x2, ...) | p && p | p || p
    | dflow[x, x'](p) | returns(x)

where s represents method signature patterns and x represents variables. The first line defines existing pointcuts in AspectJ. call(s) matches calls to methods whose signatures match s. args(x1, ..., xn) matches any call to an n-argument method and binds the arguments to the respective variables x1, ..., xn. Operators && and || combine pointcuts conjunctively and disjunctively, respectively. The second line defines dataflow pointcuts. dflow[x, x'](p) matches if there is a dataflow from x' to x. Variable x is bound to a value in the current join point, so the dflow pointcut must be used in conjunction with some other pointcut, such as args(x), that provides this binding. Variable x' is bound to a value in a past join point matching p. Using dflow, the pointcut for the sanitizing task can be defined as follows:

pointcut respondClientString(String o) :
    call(* PrintWriter.print*(String)) && args(o) && within(Servlet+)
    && dflow[o, i](call(String Request.getParameter(String)) && returns(i));

The dflow pointcut restricts the join points to those in which the parameter string originates from a return value of getParameter() in a past join point. It thus tracks values that flow to the sensitive sink statements of concern in the program, e.g., print() and sql.query().
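
The matching condition of dflow[o, i] can be approximated in plain Java, without any AspectJ support, by marking object instances. The following is only a minimal sketch under stated assumptions: the class DflowSketch and its methods getParameter, concat, and reachesSink are hypothetical stand-ins for the join points named in the pointcut above, and they are not part of AspectJ, of the dflow prototype, or of the tool described in this thesis.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Toy dynamic taint tracker. All names are hypothetical, not AspectJ or dflow APIs. */
final class DflowSketch {

    // Identity-based set: an object is tainted only if this exact instance was marked.
    private static final Set<Object> TAINTED =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    /** Stands in for the past join point: call(String Request.getParameter(String)) && returns(i). */
    static String getParameter(String rawValue) {
        String value = new String(rawValue); // fresh instance, so shared literals are never marked
        TAINTED.add(value);
        return value;
    }

    /** Propagation step: a string built from a tainted operand is itself tainted. */
    static String concat(String left, String right) {
        String result = new StringBuilder(left).append(right).toString();
        if (TAINTED.contains(left) || TAINTED.contains(right)) {
            TAINTED.add(result);
        }
        return result;
    }

    /** Stands in for the current join point: call(* PrintWriter.print*(String)) && args(o). */
    static boolean reachesSink(String o) {
        return TAINTED.contains(o); // true when a flow from some marked value to o exists
    }

    public static void main(String[] args) {
        String input = getParameter("<script>alert(1)</script>");
        String page = concat("Hello, ", input);
        System.out.println("user data reaches print: " + reachesSink(page));
        System.out.println("constant reaches print: " + reachesSink("Hello, world"));
    }
}
```

Marking by object identity rather than by string equality is a design choice of this sketch: a constant that happens to equal some user input is not reported, while a value that is handed back unchanged by another method keeps its mark.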

2.3.2 Excluding condition

They also defined an extended syntax of dflow for excluding particular dataflows. They call this mechanism bypassing. The bypassing syntax is shown below. Intuitively, a bypassing clause specifies join points that should not appear along a dataflow. Bypassing requires the existence of at least one dataflow that does not go through join points matching the pointcut in the bypassing clause.

    p ::= dflow[x,x'](p) bypassing[x''](p)

The bypassing mechanism can be explained in terms of the sanitizing task of a web application. The complete dflow pointcut for the example program below bypasses the quoted string:

    class MailConfirmation extends Servlet {
        void doPost(Request request, Response response) {
            PrintWriter out = response.getWriter();
            String address = quote(request.getParameter("ADDR"));
            out.print(address);
        }
    }

    pointcut respondClientString(String o) :
        call(* PrintWriter.print*(String)) && args(o) && within(Servlet+)
        && dflow[o,i](call(String Request.getParameter(String)) && returns(i))
           bypassing[q](call(String *.quote(String)) && returns(q));

2.3.3 Discussion

Pros
The dflow pointcut is a very intuitive mechanism for dataflow analysis, especially for tracking vulnerabilities in a web application. It marks the tainted variable and examines whether a marked

variable reaches the sink statement of interest. Moreover, the bypassing mechanism makes it easy to exclude variables that have been sanitized.

Cons
The dflow pointcut is not yet implemented in AspectJ as of version 1.6.9; only an experimental prototype in the Scheme language exists. Intuitively, one might want to weave a field into the Object class as a tag that records whether an object is tainted. However, AspectJ does not allow weaving into the fundamental classes of the Java Standard Library (e.g., java.lang.Object).

2.4 Positive Tainting in the WASP System

Halfond et al. [8] proposed a highly automated protection system, named WASP, for the dynamic detection and prevention of SQL injection attacks (SQLIAs). Adopting a concept similar to the dflow pointcut, they use instrumentation to mark and track variables in a program at runtime. The system architecture is shown in Figure 2.6. The MetaStrings library provides functionality for assigning trust markings to strings and precisely propagating the markings at runtime. The String Initializer and Instrumenter module instruments Web applications to enable the use of the MetaStrings library and adds calls to the String Checker module. The String Checker module performs syntax-aware evaluation of query strings right before the strings are sent to the database.

Figure 2.6: WASP system architecture

2.4.1 Positive Tainting

Positive tainting differs from the traditional tainting approach. It is based on the identification, marking, and tracking of trusted rather than untrusted data. In a web application, trusted data refers to the strings hard-coded in the program, and untrusted data refers to the user input from the request object. Positive tainting has two major advantages. First, trusted data can be identified more easily and accurately than untrusted data. Second, if the analysis is incomplete, some trusted data remains unidentified, which leads to false positives but never results in an SQLIA escaping detection. In contrast, with negative tainting, incomplete analysis leads to trusting data that should not be trusted and, ultimately, to false negatives. Figure 2.7 shows a graphical depiction of this fundamental difference between negative and positive tainting.

Figure 2.7: Identification of trusted and untrusted data

2.4.2 Syntax-Aware Evaluation

In the WASP system, they proposed syntax-aware evaluation. The syntax-aware

evaluation considers the context in which trusted and untrusted data is used, to make sure that all parts of a query other than string or numeric literals, e.g., SQL keywords and operators, consist only of trusted characters. As long as untrusted data is confined to literals, it is guaranteed that no SQLIA will occur. The following figures show an example of syntax-aware evaluation.

The WASP system performs syntax-aware evaluation of a query string immediately before the string is sent to the database to be executed. To evaluate the query string, the technique first uses a SQL parser to break the string into a sequence of tokens that correspond to SQL keywords, operators, and literals. The technique then iterates through the tokens and checks whether tokens (that is, substrings) other than literals contain only trusted data. If all such tokens pass this check, the query is considered safe and is allowed to execute. If an attack is detected, a developer-specified action can be invoked.

2.4.4 Discussion

Pros
Positive tainting is a well-designed technique. It not only makes trusted data easier to identify but also cleverly turns the false negatives caused by incomplete analysis into false positives. The positive tainting approach narrows the set of target strings and identifies them easily in the program. Syntax-aware evaluation is also a sound technique for preventing SQL injection attacks. It uses a SQL parser to break the SQL string into tokens and literals and examines whether each of them contains untrusted data. This simplifies the process of deciding whether a SQL string is malicious.
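As a rough illustration of the syntax-aware check described in Section 2.4.2, the following Java sketch marks each character of a query as trusted or untrusted and accepts the query only if every character outside string and numeric literals is trusted. The class SyntaxAwareCheck, the per-character boolean array, the helper marks and the deliberately simplistic tokenizer are assumptions made for this illustration; WASP itself relies on the MetaStrings library and a real SQL parser. The sketch also assumes, more strictly than the description above, that the quotes delimiting a string literal must themselves be trusted.

```java
class SyntaxAwareCheck {
    /**
     * Returns true if every character outside string or numeric literals is trusted.
     * trusted[i] tells whether query.charAt(i) came from a hard-coded (trusted) string.
     */
    static boolean isSafe(String query, boolean[] trusted) {
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == '\'') {
                // String literal: its content may be untrusted, but in this sketch
                // the opening and closing quotes must be trusted.
                int end = query.indexOf('\'', i + 1);
                if (!trusted[i] || end < 0 || !trusted[end]) return false;
                i = end + 1;
            } else if (Character.isDigit(c)) {
                // Part of a numeric literal: may be untrusted.
                i++;
            } else {
                // Keyword, identifier, operator or whitespace: must be trusted.
                if (!trusted[i]) return false;
                i++;
            }
        }
        return true;
    }

    /** Expands per-part trust flags into per-character trust marks. */
    static boolean[] marks(String[] parts, boolean[] partTrusted) {
        int total = 0;
        for (String p : parts) total += p.length();
        boolean[] result = new boolean[total];
        int k = 0;
        for (int j = 0; j < parts.length; j++) {
            for (int c = 0; c < parts[j].length(); c++) result[k++] = partTrusted[j];
        }
        return result;
    }

    public static void main(String[] args) {
        String prefix = "SELECT * FROM tblUser WHERE UserName='";
        boolean[] trust = {true, false, true};          // hard-coded, user input, hard-coded
        String[] benign = {prefix, "admin", "'"};
        String[] attack = {prefix, "admin'--", "'"};
        System.out.println(isSafe(String.join("", benign), marks(benign, trust))); // true
        System.out.println(isSafe(String.join("", attack), marks(attack, trust))); // false
    }
}
```

In the second query the user input supplies its own closing quote, so an untrusted character takes on a syntactic role and the check rejects the query.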

Cons
So far, the WASP system is implemented only for preventing SQL injection attacks. This differs from our purpose: we focus on detecting the possible vulnerabilities in a Java web application, not on protecting the application from attacks. Hence the positive tainting approach is not suitable for our work.

2.5 Fast Online Pointer Analysis

Hirzel et al. [6] proposed the concept of online analysis and implemented Andersen's pointer analysis algorithm. An online analysis incrementally analyzes new code when it is dynamically loaded into the running program. They turned to online analysis for three major reasons. First, an offline analysis does not know where code will be loaded from, which code will be loaded, and whether a given piece of code will be loaded at runtime. Second, the reflection mechanism can manipulate arbitrary program entities named by strings that are constructed at runtime. An offline analysis does not know the content of these strings, so it does not know which program entities the operations refer to. Third, if the program interoperates with foreign-language code, an offline analysis does not know how to analyze that code. An online analysis, in contrast, can instrument the places where the analyzed language makes calls to, or receives calls from, the foreign language. The instrumented code checks whether the foreign code affects the analyzed code and updates the analysis results accordingly.

2.5.1 Online analysis architecture

The system architecture of the online pointer analysis is shown in Figure 2.8. In the online analysis scenario, method compilation is performed by a just-in-time (JIT) compiler that compiles bytecode to machine code when a method is executed for the first time and generates the inputs to the analysis. Then the constraint finder and constraint propagator

compute pointsTo sets following the constraints that occur in the flow sets until the pointsTo sets reach a least fixed point.

Figure 2.8: Online analysis architecture

2.5.2 Discussion

Pros
The online analysis concept addresses the dynamic characteristics of the Java language. As modern languages have their own dynamic features, static analysis may lead to false alarms when examining dynamic code. Hence it is very practical to perform the analysis online. Online analysis can also reduce the amount of code to be analyzed.

Cons
Online analysis is a kind of dynamic analysis. Because it has to execute the program thoroughly, it greatly increases the runtime overhead of the analysis.

2.6 Summary

So far we have introduced several program analysis techniques. Most of them are dynamic methods. This is because our target language is Java and we are seriously concerned with its dynamic features, which occur frequently in a Java web application. The dataflow pointcut is an effective and clear way of tracking vulnerabilities in a web application and overcomes all the

dynamic features in Java. In order to obtain the dflow tracking result, we have to execute the program. Incompleteness therefore becomes the major limitation of dynamic dataflow analysis: executing the program incompletely leads to an incomplete analysis result. Online analysis inspired our solution to this problem. We adopt the online analysis concept to perform the taint dataflow analysis and combine the online analysis result with the dynamic tracking result. We believe this yields more accurate and more complete analysis results for detecting vulnerabilities in web applications.

Chapter 3 Preliminaries

In this chapter we describe some background knowledge for the thesis. First we focus on a variety of security vulnerabilities in Web applications and present some examples of vulnerabilities that commonly occur in web programs. Then we introduce the major techniques used for developing our analysis tool.

3.1 Vulnerabilities in Web Applications

Most security issues in a web application are caused by unchecked user input. Scott and Sharp [13-14] have defined Web application vulnerabilities as inherent in Web application programs and independent of the technology in which the application in question is implemented, the security of the web server, and the back-end database. Such a vulnerability cannot be found by the language compiler at compile time. Thus it is hard to detect these vulnerabilities while the program is under development. According to the OWASP [1-2] organization, the top ten web application security risks for 2010 are as follows.

1. Injection
2. Cross-Site Scripting (XSS)
3. Broken Authentication and Session Management
4. Insecure Direct Object References
5. Cross-Site Request Forgery (CSRF)
6. Security Misconfiguration
7. Insecure Cryptographic Storage
8. Failure to Restrict URL Access

9. Insufficient Transport Layer Protection
10. Unvalidated Redirects and Forwards

According to the statistical reports of recent years, injection flaws and cross-site scripting are the two major vulnerabilities in web applications. Most web applications suffer from these two kinds of attack. We introduce injection flaws and cross-site scripting vulnerabilities in detail and present some examples.

3.1.2 Injection flaw vulnerabilities

An injection flaw vulnerability allows attackers to use the web application's flaws to inject malicious code into the system. There are many kinds of injection: SQL, LDAP, XSLT, HTML, XML, OS command injection and many more. Injection flaws let attackers create, read, write or delete any data available to the application. Attackers trick the interpreter into executing unexpected commands. The best-known injection flaw is SQL injection. A SQL injection vulnerability results from the application's use of user input in constructing database statements. The attacker invokes the application, passing as input a (partial) SQL statement, which the application executes. This permits the attacker to gain unauthorized access to, or damage, the data stored in a database. The following is a SQL injection example.

    String NAME = request.getParameter("name");
    String PASS = request.getParameter("pass");
    SELECT * FROM tblUser WHERE UserName=NAME AND Password=PASS

The fields NAME and PASS are used to construct a SQL command. Their values are supplied by the HTTP client and are therefore untrusted. Once these two fields are set to NAME = admin'-- and PASS = dont care, the SQL statement becomes:

    SELECT * FROM tblUser WHERE UserName='admin'--' AND Password='dont care'

Because -- starts a comment, the interpreter of the SQL server executes only the following command:

    SELECT * FROM tblUser WHERE UserName='admin'

As a result, the administrator's information is exposed without any authentication. This technique allows arbitrary operations to be performed on the database.

3.1.3 Cross-Site Scripting vulnerabilities

A cross-site scripting (XSS) vulnerability is a kind of injection flaw in which malicious scripts are injected into otherwise benign and trusted web sites. XSS was the number one vulnerability in the 2007 OWASP report. Although it dropped to number two in 2010, it is still a prevalent security issue in web applications. Cross-site scripting attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser-side script, to a different end user. In this way, attackers can hijack user sessions or cookies, deface the web site, insert malicious code, conduct phishing attacks, etc. Most recent XSS attacks are written in JavaScript, but any other scripting language supported by the browser might be used for producing XSS attacks. There are three known types of cross-site scripting: reflected, stored, and DOM injection.

• Reflected XSS is the easiest to exploit: a page reflects user-supplied data directly back to the user. This kind of XSS is regarded as first-order XSS.
• Stored XSS takes hostile data, stores it in a file, a database, or another back-end system, and then at a later stage displays the data to the user, unfiltered. This is extremely dangerous in systems such as CMSs, blogs, or forums, where a large number of users will see input from other individuals. This kind of XSS is also regarded as second-order XSS.
• With DOM-based XSS attacks, the site's JavaScript code and variables are

manipulated rather than HTML elements.

Alternatively, attacks can be a blend or hybrid of all three types. The danger with cross-site scripting is not the type of attack, but that it is possible.

    <IFRAME src="javascript:document.location.href='http://www.malicious.website.com';"></IFRAME>

The statement above shows a simple example of a stored XSS attack. It combines HTML syntax with JavaScript. Attackers may use a similar technique to inject the malicious statement into a blog or forum. Victims who browse a page containing the attack statement will be redirected to the malicious website.

Unlike stored XSS, DOM-based XSS uses Document Object Model (DOM) techniques to mount the attack. Figure 3.1 is a DOM-based XSS vulnerability example.

    <HTML>
    <TITLE>Welcome!</TITLE>
    Hi
    <SCRIPT>
    var pos=document.URL.indexOf("name=")+5;
    document.write(document.URL.substring(pos,document.URL.length));
    </SCRIPT>
    <BR>
    Welcome to our system
    …
    </HTML>

Figure 3.1: DOM-based XSS vulnerability example

Normally, this HTML page would be used for welcoming the user, e.g., http://www.vulnerable.site/welcome.html?name=Joe. However, a request such as http://www.vulnerable.site/welcome.html?name=<script>alert(document.cookie)</script> would result in an XSS condition.
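To make the role of unchecked input in both vulnerability classes concrete, the following Java sketch builds the SQL statement of Section 3.1.2 and a small HTML fragment by plain string concatenation. The class UncheckedInputDemo and its two methods are hypothetical, and no database or servlet container is involved; the sketch only shows how attacker-controlled text ends up inside the command or the page.

```java
class UncheckedInputDemo {
    /** Builds the login query by concatenation, as in the SQL injection example. */
    static String buildQuery(String name, String pass) {
        return "SELECT * FROM tblUser WHERE UserName='" + name
             + "' AND Password='" + pass + "'";
    }

    /** Echoes a request parameter into HTML without any encoding (reflected XSS). */
    static String renderWelcome(String name) {
        return "<HTML><TITLE>Welcome!</TITLE>Hi " + name + "<BR></HTML>";
    }

    public static void main(String[] args) {
        // The injected "--" turns the password check into a SQL comment.
        System.out.println(buildQuery("admin'--", "dont care"));
        // The script element is copied verbatim into the generated page.
        System.out.println(renderWelcome("<script>alert(document.cookie)</script>"));
    }
}
```

In both methods the user-supplied string reaches a sensitive output unchanged, which is exactly the kind of flow from source to sink that a taint analysis is meant to report.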

3.2 Aspect-Oriented Programming

Aspect-oriented programming (AOP) languages have become very popular in recent years. They support the modularization of crosscutting concerns. The concept of AOP was proposed in 1997. It has a number of antecedents: the Visitor design pattern, CLOS MOP, and others. AspectJ is an implementation of AOP for the Java language. The most important mechanism in AOP is called pointcut-and-advice.

• Pointcut: Pointcuts specify certain join points in the program flow; the join points are the program points that we intend to instrument. For example:

    call(String HttpServletRequest.getParameter(String));

  This pointcut picks out each join point that is a call to a method with the signature HttpServletRequest.getParameter(String), that is, HttpServletRequest's getParameter method with String as its return type and a single String parameter. A pointcut can also be a conjunction, negation, or disjunction of other pointcuts.

• Advice: Advice is the piece of code that is executed when a join point is reached. For example:

    before(): break() { System.out.println("play the music "); }

  The before keyword indicates that the advice on a method call join point runs before the actual method starts running. The other two keywords are after and around. The after keyword indicates that the advice on a method call join point runs after the actual method has finished running and around

Chapter 4 System Architecture

We designed a dynamic analysis tool written in Java and AspectJ. Figure 4.1 shows the high-level architecture of our system. Our system consists of three major parts (denoted by shaded boxes in the figure): the TAINT TRACKER aspect, the ONLINE ANALYZER module and the PROGRAM EXECUTOR module.

The TAINT TRACKER aspect is written in AspectJ. It simulates the dflow pointcut mechanism. By using the AspectJ compiler, the TAINT TRACKER aspect instruments the web application to gather the dataflow information. It tracks user input as tainted objects and produces the tracking result. It also provides information about the executed code to the ONLINE ANALYZER module.

The ONLINE ANALYZER module performs the online taint dataflow analysis. It receives the needed code information from the TAINT TRACKER aspect and loads the program code that has been executed. It first builds a call graph based on the loaded methods and compiles the loaded code into the JIMPLE intermediate representation. It then performs the taint dataflow analysis algorithm on the loaded code and produces the online taint dataflow analysis result.

The PROGRAM EXECUTOR module executes the web application to trigger the entire dynamic analysis. The scenario for using our analysis tool is shown in Figure 4.2. The web application is instrumented with our aspect. When the web program is executed by the PROGRAM EXECUTOR, the instrumented code is invoked and triggers the TAINT TRACKER aspect and the ONLINE ANALYZER module to produce the corresponding analysis results. In the following sections, we discuss each module in more detail.

Figure 4.1: System architecture

[Figure 4.2 components: PROGRAM EXECUTOR; Web application (servlets MyServlet1, MyServlet2 and MyServlet3, each reading req.getParameter("name") in doGet and writing it with writer.println(name)); Online Aspect; online analysis information; ONLINE ANALYZER with worklist propagator; potential vulnerabilities result; TAINT TRACKER aspect; true vulnerabilities result.]

Figure 4.2: Scenario of analysis using our tool

4.1 TAINT TRACKER aspect

In the TAINT TRACKER aspect, we adopt the concept of the dflow pointcut and simulate dflow with existing pointcuts in AspectJ. We utilize a characteristic of the Java language, namely that objects are handled through references, to keep track of each unchecked user input object (regarded as a tainted object). In Java, variables and arguments of object type hold references to objects, and each object has its own object ID in the JVM at runtime. Once we record the object IDs of the tainted string objects, we can track

these tainted objects through the program to see whether they reach any sensitive program point that concerns us (i.e., a sink statement). In our implementation, we use the method System.identityHashCode to retrieve an ID for each tainted object in the JVM at runtime. (Strictly speaking, identityHashCode is not guaranteed to be unique, so two distinct objects may occasionally share an ID.) The following shows an example of retrieving the object ID.

    TaintedSet.add(System.identityHashCode(obj));

In order to fully simulate the idea of the dflow pointcut, we categorize our pointcuts into source pointcuts, sink pointcuts, propagation pointcuts and sanitization pointcuts. Source pointcuts specify untrusted methods and store the object ID of their returned object in a data structure named TaintedSet. Sink pointcuts specify the sensitive output methods that concern us, e.g., those that write to the database or to the client-side browser. Propagation pointcuts define the methods that change the object ID of an object already in our tainted set, and replace the old object ID with the new one. Sanitization pointcuts specify the methods that sanitize a tainted object into a clean one, and remove the ID of the sanitized object from the TaintedSet.

Source pointcuts

In a Java web application, the web server container packs the request from the client side into an object that implements the javax.servlet.http.HttpServletRequest interface. The web program accesses the client input through this object, so we have to monitor the methods invoked on it. Thus we designed three pointcuts to collect the needed information for these insecure methods as follows:

    pointcut SOURCEPCD1(): call(String javax.servlet.http.HttpServletRequest.getParameter(String));
    pointcut SOURCEPCD2(): call(java.util.Map javax.servlet.http.HttpServletRequest.getParameterMap());
    pointcut SOURCEPCD3(): call(String[] javax.servlet.http.HttpServletRequest.getParameterValues(String));

Source advice

Source advices capture the tainted objects returned from the methods monitored by the source pointcuts and add them to an aspect-scope data structure named TaintedSet. We design three advices corresponding to the source pointcuts above as follows:

    after() returning(String taint): SOURCEPCD1() {
        this.TaintedSet.add(System.identityHashCode(taint));
    }
    after() returning(Map stringmap): SOURCEPCD2() {
        // The parameter map maps each name to a String[] of values.
        for (Object values : stringmap.values()) {
            for (String taint : (String[]) values) {
                this.TaintedSet.add(System.identityHashCode(taint));
            }
        }
    }
    after() returning(String[] stringarray): SOURCEPCD3() {
        for (String taint : stringarray) {
            this.TaintedSet.add(System.identityHashCode(taint));
        }
    }

In SOURCEPCD1, we simply take the returned string as a tainted object and add it to the TaintedSet, because the getParameter method returns only one string at a time. In SOURCEPCD2 and SOURCEPCD3, the two methods getParameterMap and getParameterValues both return a data structure of tainted strings. Thus we have to break the data structure down into single string objects and add the object ID of each string to the TaintedSet. Obviously, these pointcuts and advices cannot describe all the source points in a program. For example, in a database web application an untrusted string can also be obtained from a SQL query. Thus our tool lets developers define their own source pointcuts and advices to specify the program statements that they regard as source statements in a particular web application.

Sink pointcut

In a web application, there are many sink statements that concern us, such as a writer printing content to the client-side browser, or the execution of a SQL string to perform a transaction on the back-end database. If a tainted string object is involved in these operations, the web application becomes insecure. Obviously, different web applications may have different sink points of concern.
In our sink pointcut design, we only list the most commonly used insecure output methods, as follows.

    pointcut SINKPCD1(): call(* *.println(..));

We also expect developers to provide more sink statements for their programs. Our tool therefore lets developers supply the signatures of other sink statements according to the particular web application.

Sink advice

The sink advice examines all the arguments of the method and reports whether the arguments contain tainted objects. We design the sink advice as follows:

    before(): SINKPCD1() {
        for (Object ob : thisJoinPoint.getArgs()) {
            if (this.isTainted(ob)) {
                // report as a vulnerability
            }
        }
    }

    private boolean isTainted(Object target) {
        for (Integer ob : TaintedSet) {
            if (ob.intValue() == System.identityHashCode(target)) { return true; }
        }
        return false;
    }

We design the isTainted helper function in the TaintTrackingAspect to check whether the ID of the given object is in the TaintedSet. The sink advice uses isTainted to determine whether a tainted object appears as an argument of the sink statement; if so, the statement is regarded as vulnerable, i.e., as a web application vulnerability.

Propagation pointcuts

Propagation pointcuts define the statements that might propagate a tainted object to other variables. In a Java program, we do not have to record the right-hand and left-hand sides of an assignment between two object variables. If an object is assigned to another variable, only the reference held by the left-hand side changes. These two variables

still refer to the same object. In other words, the assignment only makes the two variables point to the same object. That is, if a variable refers to a tainted object and is assigned to another variable, we do not have to change anything in the TaintedSet. Thus in these pointcuts we are only concerned with the operations that might create or modify a tainted object ID. At runtime, a String is an immutable object in the JVM. Thus we treat every string operation listed below as producing a new string. We categorize these string operations by return type into three groups: String, String[] and char[]. (CharSequence is an interface implemented by String, hence we regard the returned object as a String.)

    Kind | Signature | Return object
    Constructor | public String(String original) | new String
    Method | public String concat(String str) | new String
    Method | public String intern() | new String
    Method | public String replace(char oldChar, char newChar) | new String
    Method | public String replace(CharSequence target, CharSequence replacement) | new String
    Method | public String replaceAll(String regex, String replacement) | new String
    Method | public String replaceFirst(String regex, String replacement) | new String
    Method | public CharSequence subSequence(int beginIndex, int endIndex) | new CharSequence
    Method | public String substring(int beginIndex) | new String
    Method | public String substring(int beginIndex, int endIndex) | new String
    Method | public String toLowerCase() | new String
    Method | public String toLowerCase(Locale locale) | new String
    Method | public String toString() | new String
    Method | public String toUpperCase() | new String
    Method | public String toUpperCase(Locale locale) | new String
    Method | public String trim() | new String

    Method | public String[] split(String regex) | new String[]
    Method | public String[] split(String regex, int limit) | new String[]
    Method | public char[] toCharArray() | new char[]

Figure 4.3: String operations

Thus we have the following pointcuts:

    pointcut propagate1(Object target): call(String.new(String)) && args(target);
    pointcut propagate2(Object target): call(String String.*(..)) && target(target);
    pointcut propagate3(Object target): call(String[] String.*(..)) && target(target);
    pointcut propagate4(Object target): call(char[] String.*(..)) && target(target);

The propagate1 pointcut intercepts calls to the String constructor, and the others cover the string operations with the different return types. The Java language also has a string concatenation operator, "+", which the Java compiler compiles into a series of StringBuffer operations. For example:

    String x = "a" + "b" + "c" + "3";

When the operands are not all compile-time constants, such an expression is compiled to the equivalent of:

    x = new StringBuffer("a").append("b").append("c").append("3").toString();

The compiler may use one of two classes for this transformation: StringBuilder and StringBuffer. StringBuilder is unsynchronized and intended for use by a single thread, whereas StringBuffer is synchronized and safe for use by multiple threads. Thus we design another two pointcuts to gather the information of StringBuffer and StringBuilder:

    pointcut propagate5(Object arg): call(StringBuffer.new(String)) && args(arg);
    pointcut propagate6(Object arg): call(StringBuilder.new(String)) && args(arg);

Finally, we must not lose track of tainted data when string-like objects (e.g., char[], String[]) are transformed into new strings by the toString() method. So we add one more pointcut:

    pointcut propagate7(Object target): call(String *.toString()) && target(target);
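The interplay of source handling, propagation and the sink check can be imitated in plain Java without AspectJ. In the sketch below, the class TaintTrackerSketch and its methods source and propagate are hypothetical stand-ins for the advices; only System.identityHashCode, the isTainted check and the idea of a tainted set come from the text. Since identityHashCode is not guaranteed to be unique, the sketch should be read as an approximation.

```java
import java.util.HashSet;
import java.util.Set;

class TaintTrackerSketch {
    // IDs of the objects currently regarded as tainted.
    private final Set<Integer> taintedSet = new HashSet<>();

    /** Source: mark a value that was returned by an untrusted method. */
    String source(String value) {
        taintedSet.add(System.identityHashCode(value));
        return value;
    }

    /** Propagation: a string derived from a tainted string is tainted as well. */
    String propagate(String from, String derived) {
        if (isTainted(from)) {
            taintedSet.add(System.identityHashCode(derived));
        }
        return derived;
    }

    /** Sink check: is this exact object recorded as tainted? */
    boolean isTainted(Object target) {
        return taintedSet.contains(System.identityHashCode(target));
    }

    public static void main(String[] args) {
        TaintTrackerSketch tracker = new TaintTrackerSketch();
        String input = tracker.source(new String("name"));   // simulated getParameter result
        String alias = input;                                 // assignment: same object, same ID
        System.out.println(tracker.isTainted(alias));         // true
        // toUpperCase() creates a new object, so it must be recorded explicitly.
        String upper = tracker.propagate(input, input.toUpperCase());
        System.out.println(tracker.isTainted(upper));         // true
    }
}
```

The alias needs no bookkeeping because it refers to the same object, while the upper-case copy is a different object and is only recognized at a sink after the propagation step has recorded it.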

In summary, all of our propagation pointcuts are listed below:

    pointcut propagate1(Object target): call(String.new(String)) && args(target);
    pointcut propagate2(Object target): call(String String.*(..)) && target(target);
    pointcut propagate3(Object target): call(String[] String.*(..)) && target(target);
    pointcut propagate4(Object target): call(char[] String.*(..)) && target(target);
    pointcut propagate5(Object arg): call(StringBuffer.new(String)) && args(arg);
    pointcut propagate6(Object arg): call(StringBuilder.new(String)) && args(arg);
    pointcut propagate7(Object target): call(String *.toString()) && target(target);

    pointcut propagation1(Object target):
        (propagate1(target) || propagate2(target) || propagate7(target))
        && !within(TaintTrackingAspect+);
    pointcut propagation2(Object target):
        (propagate3(target) || propagate4(target))
        && !within(TaintTrackingAspect+);
    pointcut propagation3(Object arg):
        (propagate5(arg) || propagate6(arg))
        && !within(TaintTrackingAspect+);

Propagation advices

Once a new string is produced by executing a string operation on a tainted string object, we conservatively regard the new string as suspicious and treat it as tainted as well. Thus, our propagation advices obtain the object ID of the new string and add it to the TaintedSet. We have three advices associated with the propagation pointcuts:

    @SuppressAjWarnings({"adviceDidNotMatch"})
    after(Object target) returning(Object ret): propagation1(target) {
        if (this.isTainted(target)) {
            this.TaintedSet.add(System.identityHashCode(ret));
        }
    }

    @SuppressAjWarnings({"adviceDidNotMatch"})
    after(Object arg) returning(Object ret): propagation3(arg) {
        if (this.isTainted(arg)) {
            this.TaintedSet.add(System.identityHashCode(ret));
        }
    }

    @SuppressAjWarnings({"adviceDidNotMatch"})
    after(Object target) returning(Object[] ret): propagation2(target) {
        if (this.isTainted(target)) {
            this.TaintedSet.add(System.identityHashCode(ret));
            for (Object tmp : ret) {
                this.TaintedSet.add(System.identityHashCode(tmp));
            }
        }
    }

Summary:

The TAINT TRACKER aspect is designed to simulate the dflow pointcut. The semantics of the dataflow pointcut is to tag the tainted object and examine the tag at the sink point. Since we cannot use AspectJ to weave a tag field into a fundamental class of the standard library, i.e., the java.lang.Object class, we use the object ID as a tag to track the tainted object. The propagation pointcuts prevent us from losing track of tainted objects because of the operations provided by the String class. They replace the old object ID with a new one in order to continue tracking the tainted object. We have considered all the operations on strings that can possibly produce a new tainted string, because string objects are immutable at runtime. Furthermore, TAINT TRACKER is designed as a template class, because the corresponding pointcuts (source, sink and sanitization) may differ from one web application to another. Hence we also provide a flexible user interface for developers to define the source, sink and sanitization pointcuts for a specific Java web application.

4.2 Online taint dataflow analysis

The ONLINE ANALYZER module is developed with the Soot framework. We chose JIMPLE as our intermediate representation. In our system, the ONLINE ANALYZER module has

(42) two purposes. One is to help PROGRAM EXECUTOR module to collect the key information of the HTML forms hardcoded in the program. This part will be explained in the next section. The other one is to collocate with our designed aspect to perform the online taint dataflow analysis. The ONLINE ANALYZER analyze flow diagram is shown in Figure 4.4 Class and method information from designed pointcut. Build / update Call-graph. 政治大. Loading associated class and method body. Raw information of the associated class and method. 學. ‧ 國. 立. Compile the raw information into JIMPLE intermediate representation. ‧. Associated class and method in JIMPLE form. sit. y. Nat. io. point. n. al. er. Perform taint dataflow analysis to fixed. Ch. engchi. Analysis Result. i Un. v. Figure 4. 4: Online analysis flow diagram. 4.2.1 Collect the information from instrumentation Online analysis gradually examines the code which has been executed. Hence, providing the gradual information to ONLINE ANALYZER is necessary. We designed a pointcut, named OnlineAnalysisPCD, to collect needed information for ONLINE ANALYZER. Before online analyzing, ONLINE ANALYZER needs to build the call graph along the executing program. The following example shows the part of OnlineAnalysisPCD definition:. 34.

(43) pointcut CallGraphPCD():OnlineAnalysisPCD (execution(* doGet(..)))&& if(thisJoinPoint.getKind().equals(\"method-call\"))&& !call(* java..*.*(..))&&!call(* javax..*.*(..))&&!call(*.new(..)). There are two major protocols for client to request the resource in the back-end: GET and POST. Hence doGet method and doPost are two possible entry methods for a Java servlet web application. In the pointcut definition above, we illustrate the pointcut definition of GET protocol. The entry method is bound to be the head node in the call graph of every execution. 政治大. path. In the OnlineAnalysisPCD pointcut, we use cflowbelow pointcut to confine the join. 立. points for call out methods from the entry method in the program. The cflowbelow pointcut. ‧ 國. 學. can capture all join points encountered within the program control flow after the initiating join point selected by a separate pointcut we defined, ie., execution(* doGet(..)) . The. ‧. callgraphPCD pointcut uses another two pointcuts, (!call(* java..*.*(..))) and (!call(*. y. Nat. sit. javax..*.*(..))), to reduce unnecessary weaving to the methods in Java Standard Library. We. n. al. er. io. only focus on the business logic part of the application. The OnlineAnalysisPCD advice is defined as follows:. before():OnlineAnalysisPCD(){. Ch. engchi. i Un. v. ONLINEANALYZER.makeCallEdge( thisEnclosingJoinPointStaticPart.getSignature().getDeclaringType().toString(), thisEnclosingJoinPointStaticPart.getSignature().getName(), thisJoinPoint.getSignature().getDeclaringType().toString(), thisJoinPoint.getSignature().getName()); }. The advice of OnlineAnalysisPCD advice provides the caller and callee information to the ONLINE ANALYZER when it encountered the newly loaded class and method during runtime. Thus the ONLINE ANALYZER can load the associated class and method information by using SOOT class loader and ready for analyzing the loaded code. 35.
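The thesis does not show how the ONLINE ANALYZER stores the edges it receives, so the following is only a minimal sketch of the bookkeeping that a makeCallEdge call with the four string arguments of the advice above might perform. The class name CallGraphSketch, the "Type#method" node naming, the helper methods calleesOf and nodeCount, and the example type names are all assumptions made for illustration; the actual module builds on Soot.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the call-graph bookkeeping of the ONLINE ANALYZER.
class CallGraphSketch {
    // Adjacency list: caller node -> callee nodes, kept in insertion order.
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();

    // The "Type#method" node key is an assumption of this sketch.
    static String node(String declaringType, String methodName) {
        return declaringType + "#" + methodName;
    }

    // Records the edge caller -> callee; returns true only if the edge is new,
    // i.e. when previously unseen code would have to be loaded and analyzed.
    boolean makeCallEdge(String callerType, String callerMethod,
                         String calleeType, String calleeMethod) {
        String caller = node(callerType, callerMethod);
        String callee = node(calleeType, calleeMethod);
        // Make sure the callee exists as a node even if it calls nothing itself.
        edges.computeIfAbsent(callee, k -> new LinkedHashSet<>());
        return edges.computeIfAbsent(caller, k -> new LinkedHashSet<>()).add(callee);
    }

    // Read-only view of the direct callees of a method (empty if unknown).
    Set<String> calleesOf(String declaringType, String methodName) {
        Set<String> out = edges.get(node(declaringType, methodName));
        return out == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(out);
    }

    // Number of distinct methods seen so far, as callers or callees.
    int nodeCount() {
        return edges.size();
    }

    public static void main(String[] args) {
        CallGraphSketch graph = new CallGraphSketch();
        // doGet is the head node; the type names below are made up.
        graph.makeCallEdge("app.LoginServlet", "doGet", "app.UserDao", "findUser");
        graph.makeCallEdge("app.UserDao", "findUser", "app.Db", "query");
        System.out.println(graph.calleesOf("app.LoginServlet", "doGet"));
        System.out.println(graph.nodeCount());
    }
}
```

Returning a boolean from makeCallEdge is a design choice of this sketch: it lets the caller tell a newly discovered edge from one seen on an earlier request, which matches the idea of updating the call graph only when new code is encountered.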

4.2.2 Online Taint Dataflow Analysis

After building the call graph of the program, the ONLINE ANALYZER checks whether the current execution path contains a branch structure. If so, the ONLINE ANALYZER starts the online taint dataflow analysis from the head node of the call graph. In this way, we reduce the amount of code to be analyzed: if there is no branch structure in the current call graph, the code contains no alternative path that the dynamic analysis could have missed, so it cannot hide further potential vulnerabilities. In that case we let the TAINT TRACKER aspect do the entire analysis along the current execution path and save the analysis time of the ONLINE ANALYZER.

The ONLINE ANALYZER performs an interprocedural taint dataflow analysis. Dataflow analysis is a technique for computing the possible values at arbitrary program points, and it operates on the program's control flow graph (CFG). Thus we must have the program's CFG before the dataflow analysis. The CFG is a data structure built on top of the intermediate code representation that abstracts the control flow behavior of a compiled function. It is a directed graph in which nodes are basic blocks and edges represent possible control flow from one basic block to another. Here we compute the CFG with the module provided by the SOOT framework. As with the TAINT TRACKER aspect, we need rules to decide whether a variable is tainted or not. These rules are source rules, sink rules, propagation rules and sanitization rules. Figure 4.5 depicts the relationship between rules and function names.

    Rules           Function Names
    Source          getParameter(String), getParameterMap()…
    Sink            println(String), query(String)…
    Propagation     concat(String), substring(..), split(..)…
    Sanitization    Defined by developers

Figure 4.5: Online taint dataflow analysis rules.

The SOOT framework covers the low-level tasks for us. We can choose among different algorithms (e.g., Iter, alias, worklist) as the propagator for the flow analysis. In our ONLINE ANALYZER, we choose the worklist algorithm as our propagator because it is an efficient algorithm for program analysis. The ONLINE ANALYZER module provides its own analysis results, which may contain false positive alarms. We regard these results as potential vulnerabilities and compare them with the TAINT TRACKER's results.

4.3 PROGRAM EXECUTOR

Our system is a kind of dynamic analysis system. The web application must be executed as completely as possible for the analysis result to be complete, which is not an easy task. We designed the PROGRAM EXECUTOR module to address this incompleteness problem. The PROGRAM EXECUTOR module provides two approaches to executing programs: 1) it executes the web application with mock HTTP request and response objects. This approach does not need the web program to be deployed on a web container and is specially designed for simple Java servlet web applications; 2) it executes the web application with a formatted URL. In this case, the program must be deployed on a web container. Figure 4.6 depicts the scenario of these two approaches.

Figure 4.6: Scenario of program executor.

To execute a web program, we need to know which associated resource in the back-end