The structure of this thesis is shown as follows. Chapter 2 describes the backgrounds of software testing and vulnerability, along with an introduction to related work. Chapter 3 and 4 explain our method and implementation. Chapter 5 shows the experimental results. Finally, Chapter 6 concludes our thesis, with further work.
Chapter 2 Background
The techniques of software testing are very various. Static analysis can handle large-scale programs because it just scans source code without executing the programs, but false positives often happen in static analysis, i.e. less precise. Dynamic analysis executes the programs under test to find the explicit bugs, but it is often slow than static analysis. In this chapter, three popular techniques of software testing are introduced, and compared their differences.
On the other hand, the types of software vulnerabilities are also various, such as buffer over-flow, command injection, and race condition. Control-flow hijacking vulnerabilities are main targets in this thesis, and five common vulnerabilities to be experimented are explained in follow related sections. In the final section, we describe the related work about symbolic execution, which is the main used technique of software testing in this thesis, and exploit generation.
2.1 Software Testing
2.1.1 Fuzz Testing
Fuzz testing is a common technique of software testing, which provides random or unexpected data as input to crash a program or to trigger assertions. Fuzz testing often treats the program under test as a black box, and cooperates with a Fuzzer1, which is a kind of testing tools that will generate data, to repeatedly feed the program with random input. Fuzz testing is fast and precise because programs are concretely executed, but path coverage is probably low because the input are generated randomly. Considering line 12 in Listing 1, the chances of random input
1For example, zzuf (http://caca.zoy.org/wiki/zzuf) is a transparent application input fuzzer aiming to find bugs in applications.
to take true branch at condition statement i f (x == 2011) is 1
232 if x is 32 bits. Fuzz testing is likely to spend much time to wildly explore the paths. Consequently, fuzz testing is inefficient for covering all paths of programs, but is good at getting some input to crash programs.
2.1.2 Symbolic Execution
Symbolic execution[4, 5, 6] is a popular technique of software testing. In contrast with concrete execution that treats the program under test as a black box and find next new path without any information, symbolic execution attempts to explore all paths in the program more systemati-cally by transforming the path feasibility problem into boolean satisfiability problem.
The main idea of symbolic execution is to replace variables controlled by external environ-ments with symbolic values rather than actual data. The value range of those variables repre-sented by symbolic expressions is unlimited, i.e. any value, when the program runs initially.
With program execution, those symbolic variables will taint other non-symbolic variables, and its value will be gradually restricted.
A path condition is a quantifier-free boolean formula, and its satisfiability could be validated by constraint solvers, a kind of solver for Satisfiability Modulo Theories (SMT) problem. The path condition represents the control flow of program execution and its value is true initially.
Whenever program execution encounters branches that associate with symbolic variables, the symbolic execution forks a new execution with different path conditions, i.e. different restric-tions for the symbolic value. On true branch, the branch condition is added to the path condition, otherwise the negation of the branch condition is used. Each of two updated path conditions will be passed to a constraint solver to determine whether the path condition is satisfiable or not.
If the path condition is not satisfiable, the path will be dropped because it is infeasible. When a program execution terminates, the path condition can be solved by a constraint solver to get a test case that will traverse same path.
Consider the example code in Listing 1, variable x is replaced with a symbol X when the program starts running. At line 5, the execution is forked, one’s constraint is X ≥ 0 and another is X < 0. The concrete variable y is tainted by symbolic variable x at line 10, so variable y becomes symbolic and its value is X + 100. At line 12, the execution is forked again because variable y is symbolic. An execution is added a constraint X = 2011 and another is added X6= 2011. The path whose path constraint is (X < 0) ∧ (X = 2011) is dropped because it is
infeasible. Finally, when symbolic execution has explored all three paths, each path condition is passed to a constraint solver to get a solution taking each path respectively. The process of symbolic execution is shown in Figure 1.
Listing 1: An example code for software testing 1 v o i d t e s t ( i n t x )
Figure 1: The symbolic execution tree of Listing 1
2.1.3 Concolic Testing
In practice, symbolic execution is usually infeasible for large programs because of path explo-sion problem. The number of paths is growing exponentially in proportion to the number of
branches, and if a program contains infinite loops, such as GUI applications, the number of paths will approach infinite. Currently, many research efforts focus on these issues, including using path selection heuristics to quicker find desired paths, using static analysis to prune off the useless parts of search space, etc.
Concolic testing is a strategy combining the accuracy of concrete execution and the sys-tematic capacity of symbolic execution. Concolic testing executes the program under test con-cretely and symbolically, and explores only one path at a time. Concolic testing first executes the program under test with concrete random input, and symbolic execution is used to collect the branch conditions. Whenever a path terminates and gets a final path condition, concolic testing negates the end condition of the whole path condition to generate the next test case that will explore a new next path followed by depth-first search.
Consider the code in Listing 1, for example, the random concrete value of variable x is 0 initially, and concolic testing explores the path whose path condition is (X ≥ 0) ∧ (X 6= 2011).
For exploring a new next path, concolic testing inverts the end constraint X 6= 2011 and then passes the changed path condition (X ≥ 0) ∧ ¬(X 6= 2011) to a constraint solver to get a new test case, e.g. x = 2011, and explore other new paths. The process of concolic testing is shown in Figure 2.
Figure 2: The process of concolic testing on Listing 1
In addition to handling path explosion problem, concolic testing addresses what symbolic execution gets stuck in some constraints, e.g. (X = Y ∗Y ∗Y ), because constraint solvers have
trouble with non-linear constraints. Concolic testing can replace variable Y with concrete value, e.g. Y is 2, and the constraint is simplified to X = 8.