CAST:自動化動態軟體驗證工具

(1)

國

立

交

通

大

學

資訊科學與工程研究所

碩

士

論

文

CAST:自動化動態軟體驗證工具

CAST: Automatic and Dynamic Software Verification Tool

研究生：林友祥

指導教授：黃世昆教授

中

中華

華

華民

民

民國

國

國九

九

九十

十八

十

八

八年

年

年六

六

六月

月

(2)

ii

CAST:自動化動態軟體驗證工具

CAST: Automatic and Dynamic Software Verification Tool

研究生：林友祥 Student：You-Siang Lin

指導教授：黃世昆 Advisor：Shih-Kun Huang

國立交通大學

資訊科學與工程研究所

碩士論文

A Thesis

Submitted to Department of Computer and Engineering College of Computer Science

National Chiao Tung University in partial Fulfillment of the Requirements

for the Degree of Master

in

Computer and Information Science

June 2009

Hsinchu, Taiwan, Republic of China

(3)

CAST:

CAST:自動化動態軟體驗證工具

自動化動態軟體驗證工具

學生：林友祥指導教授：黃世昆教授

國立交通大學資訊科學與工程研究所碩士班

摘要

軟體測試是軟體工程用以確保軟體品質重要的一部分。此外，在程式中自動驗證特性是軟體測試的遠程目標。近年來，結合具體與符號執行( concolic 測試)成為一個眾所周知的方法用來路徑分支測試並且許多研究表明該方法可以結合全域檢查，來找出程式錯誤。在本論文中，我們提出 CAST 的規範語言，建立於 concolic 測試結合全域檢查的基礎上，可以描述各種規格檢查 C 語言程式的安全特性(從另一個角度來看，我們可以將此作為一種駭客攻擊，以獲得接近 exploit 的測試資料)。CAST 是一個自動和動態軟體驗證工具，主要包括樣式匹配，全域檢查和資料流分析，所以可以使我們的全域檢查比一般的 concolic 測試的更加靈活和複雜。

(4)

ii

CAST: Automatic and Dynamic Software Verification Tool

Student: You-Siang Lin Advisor: Dr. Shih-Kun Huang

Institute of Computer Science and Engineering

National Chiao Tung University

ABSTRACT

Software testing is an essential part of software engineering for ensuring software quality. Furthermore, automatically verifying properties in programs is a long-time goal in software testing. In recent years, combining concrete and symbolic execution (concolic testing) becomes a well-known approach for branch testing and many researches indicate that the approach can combine with universal checks to find bugs. In this paper, we present the CAST specification language which can describe various kinds of specification for checking security properties of C programs (from another point of view, we can take this as a hack attack to attain test cases close to exploit) based on concolic testing with universal checks. CAST, an automatic and dynamic software verification tool, is mainly composed of pattern matching, universal check and data flow analysis such that we can make universal checks more flexible and complex than that general concolic testing uses.

(5)

誌謝

首先誠摯的感謝指導教授黃世昆老師，老師悉心的教導使我得以一窺軟體品質領域的深奧，不時的討論並指點我正確的方向，使我在這些年中獲益匪淺。老師對學問的嚴謹更是我輩學習的典範。三年裡的日子，實驗室裡共同的生活點滴，學術上的討論、言不及義的閒扯、讓人又愛又怕的宵夜、趕作業的革命情感...，感謝眾位學長、同學、學弟妹的共同砥礪(墮落?)，你/妳們的陪伴讓三年的研究生活變得絢麗多彩。感謝昌憲、彥佑、立文、揚傑學長們不厭其煩的指出我研究中的缺失，且總能在我迷惘時為我解惑，也感謝昆翰、文健、彥廷、世宇同學的幫忙，恭喜我們順利走過這段研究生的日子。感謝秋如姊姊和文智姐夫時常對我的關心與不時的加油打氣，並且在我心情沮喪的時候帶我到處散心。感謝資工系踢球的小武、紅幫斬、修齊、僑生以及其他學弟們，在交大踢球的日子真的非常快樂，一起拿下大資盃、系際盃、風竹盃那段光榮日子，到老都值得回憶再三。真的很感謝大家，你們的好我會銘記在心一輩子。最後，謹以此文獻給我摯愛的雙親。

(6)

iv

List of Figures

Figure 1: Testing to make sure that the implementation includes the features described in the

specification will miss many security problems ... 2

Figure 2: Static data flow analysis is based on abstract syntax tree. ... 7

Figure 3: An example of stack overflow. ... 8

Figure 4: A simple illustration of stack segment in Figure 3. ... 9

Figure 5: An example with command injection vulnerability. ... 10

Figure 6: An exploitable example of buffer overflow ... 12

Figure 7: ALERT and CAST system architecture flow graph... 14

Figure 8: A simple C example with CAST instrumentation. ... 18

Figure 9: How to model a C program with symbolic execution. ... 20

Figure 10: A C program with symbolic name problem. ... 21

Figure 11: Instrumentation of inserting and removing object map. ... 23

Figure 12: A C program of buffer accesses with symbolic index... 24

Figure 13: Pseudo code of generating shellcode constraints. ... 25

Figure 14: The grammar of the CAST specification language ... 28

Figure 15: The method of efficient pattern matching. ... 30

Figure 16: Check integer overflow with universal checks. ... 31

Figure 17: A specification checks integer overflow. ... 32

Figure 18: Wargame 1 contains a buffer overflow... 34

Figure 19: Simple memory allocation of wargame 1. ... 34

Figure 20: Specification of wargame 1. ... 35

Figure 21: Wargame 2 contains a command injection. ... 37

Figure 22: Specification of wargame 2 checks command injection. ... 37

Figure 23: Wargame 3 constains buffer overflow. ... 41

Figure 24: The memory map of wargame 3. ... 42

Figure 25: Specification of wargame 3 checks buffer overflow and exploit. ... 43

Figure 26: Concrete values bound to symbolic values ... 47

Figure 27: The influence of gcc alignment on function parameter... 48

(9)

List of Tables

Table 1: DDFA instruments codes for each instruction pattern in C. ... 3

Table 2: CIL instrumenter instruments three-address code. ... 16

Table 3: The cases of buffer overflow cause different vulnerabilities to a C program. ... 23

Table 4: CIL instrumenter instruments functions code. ... 26

Table 5: Environment of CAST ... 27

Table 6: Comparison between CAST, CUTE, CREST and EXE ... 44

(10)

1

1. Introduction

Software validation is the process of evaluating a software system or component in order to determine whether it satisfies specified requirements and it is a very hard problem. Usually, most validation is associated with traditional execution-based testing, that is, exercising code with manually-generated test cases.

Up to now software testing is the primary way to check the correctness and quality of software. Software testing is an essential part of software engineering and it not only costs United States economy tens of billions of dollars annually but also has high economic impacts because of inadequate infrastructure [1]. As software develops vastly and becomes more complex, we can’t only rely on manpower to find bugs because of time-consuming and inefficiency. Automation is the lowest requirement and many approaches have been proposed since 70’s [2]. Then, we implement a tool, CAST, which can automatically verify a c program with user-defined specifications depending on the power of concolic testing. In addition, combining concolic testing, pattern matching, dynamic data flow analysis and universal checks into CAST provides the ability to describe complex checks for testers who are eager to confirm whether the properties are violated or not.

In this paper, we show that CAST can be easily used to describe properties to automatically exploit wargames from our secure programming course.

1.1. Background

The property checking problem is to check whether a program satisfies a specified safety property. According to the different kinds of property, different techniques are applied and different policies are adopted. Static analysis is complete and fast but imprecise. Dynamic analysis is usually sound and precise but slow. Model checking try to be sound and complete but it can’t be scalable. Concolic

(11)

testing balances the advantage and the disadvantage of these techniques above. We describe these techniques more clearly in Section 2. There are two broad approaches which have been proposed, and we describe them as below.

1.1.1. Universal Checks

Universal checks [3] support several universal symbolic checks which are more powerful than concrete checks since a symbolic check checks all possible values for a given program, whereas a concrete check only checks a specific value. These checks, called “universal”, mean that if the check passes, there is no possible value that the symbolic input could cause the check to fail on this program path.

Figure 1: Testing to make sure that the implementation includes the features described in the

specification will miss many security problems

As implementation fails to meet a particular requirement in testing, a bug is found, but even testing software functionally, it will miss many security problems because security problems are often not violations of the requirements. The intersection of requirements and implementation shown in Figure 1 [4] is usually fully under testing because concolic testing traverses each execution path of branch, but as we are eager to find bugs or security problems which are not violated requirements, we need additional constraints to guide constraint solver through specific execution path. In order for testing to be wide-ranging, one has to add additional universal checks which can assist us in

(12)

3

verifying whether the program fits to our special requirement (specification) or cause some vulnerable errors on each program path.

1.1.2. Dynamic Data Flow Analysis

Dynamic data flow analysis (DDFA) is a useful technique that helps us to decide if a sensitive variable can be tainted from user inputs. DDFA usually maintains a map which records each variable’s address with its taint tag and set its taint tag when tainted variable is assigned to it. However, it encounters some problems with library function call and side effect described in Table 1.

Table 1: DDFA instruments codes for each instruction pattern in C.

Pattern Instruction/Instrumentation Difficulties SSA lhs = rhs1 op rhs2;

taint_set_tag(&lhs, &rhs1, &rhs2);

Easy.

Outer user-defined function call

taint_push_in_arg(&arg1);

taint_push_in_arg(&arg2); ret = foo(arg1, arg2);

taint_set_tag(&ret, taint_pop_out_arg());

Easy.

Inner user-defined function call

int foo(int form1, int form2){

taint_set_tag(&form1, taint_pop_in_tag());

taint_set_tag(&form2, taint_pop_in_tag()); …

taint_push_out_arg(&fooret); return fooret;

}

Easy.

Library function call strncpy(dest, src, n);

taint_copy_buffer(dest, src, n);

Manually instrument codes according to different library calls. Side effect (global variable) lhs = global;

taint_set_tag(&lhs, &global);

Imprecision when global variable was set in library call.

When we attempt to find a block of memory to insert shell code, DDFA direct us to find a series of tainted memory.

(13)

1.1.3. Static Program Analysis

Static program analysis is automatic, complete, and scalable but imprecise because of false positives. It simulates program running on the abstraction (such as data flow, syntax tree, etc) instead of really running the program on concrete values. Static verification, syntax parser, and traditional symbolic execution are static program analysis techniques. Using some approaches, static program analysis techniques can deal with path explosion problem that concolic testing encounters.

PREfix performs path-sensitive, interprocedural analysis and walks the abstract syntax tree (AST) to explore various execution paths [8]. Then the symbolic execution state is tracked in a virtual machine for checking errors. Avoiding a plenty of false alarms, PREfix adopts some heuristic analyses neither sound nor complete. It makes simplifying approximations that analysis can trace for huge codes of program with incomplete information.

PREfast performs simple, intraprocedural checks on code and it is a “fast” version of the PREfix tool [9]. Based on pattern matching in the abstract syntax tree of the C/C++ program, certain PREfast analyses find simple programming mistakes. PREfix and PREfast take complementary approaches for finding errors. PREfix performs a complete, path-by-path analysis of an entire program to track information across function boundaries. In contrast, PREfast is a lightweight tool that looks for errors locally.

Lint [10] , a c program checker like a compiler, does lexical and syntax analysis on the input text, constructs and maintains symbol tables, and builds trees for expressions. It is one of the earliest static C program checker that can detect many program errors.

LCLint [11] , an extension of Lint, supports different levels of specification and takes advantage of abstract type to check errors without affecting the correctness of programs

ESC/Java [12], an application of theorem proving, catches more errors than conventional static checkers such as type checkers and uses an automatic theorem-prover to reason about the semantics of programs, which allows ESC to give static warnings about many errors that are caught at runtime.

(14)

5

The Bandera Tool Set [13], an integrated collection of program analysis, abstraction, and other model-checking tool such as Spin, dSpin, SMV, and JPF, addresses the problem of state explosion, model construction, requirement specification, output interpretation.

Cyclone [14], like a type theory checker, performs a static analysis on source code, and inserts run-time checks into the compiled output at places where the analysis cannot determine that an operation is safe.

1.1.4. Dynamic Program Analysis

It is sound because it runs the program on concrete inputs, and doesn’t have a lot of false alarms like static program analysis. The objective of dynamic program analysis is to determine the software quality of a system through collecting information and analyzing the results by running it.

1.1.5. Fuzz Testing (Fuzzing or Random Testing)

It is a software testing technique that provides random data to the inputs of a program and is often used for software development to perform black box testing because of its simple design, high speed, scalability, and free of preconceptions about system behavior. However, fuzz testing usually wastes much time to produce duplicate test inputs because of using pseudo-random number that depends on the seed value. Even if we use other random number generators to avoid the problem, it is better but still slow. [5]. The severest shortcoming is that there is no assurance of coverage for random testing [15]. For example, the probability that the conditional statement of the branch “if(x == 5){ …}” is ‘true’ to be executed is 1/232 if x is a 32-bit integer input and is randomly generated.

1.1.6. Symbolic Execution

Symbolic execution [16] is useful in the validation of software and can be used to aid in the generation of test input and program proving. Traditionally, it doesn’t execute a program with input values but run a program with symbolic values instead. The most common approach to use symbolic

(15)

execution is to analyze a program by traversing its flow graph from an entry point and collecting a list of assignment statements and branch predicates in terms of symbolic values along a particular path. After gathering these constraints, a test case can be generated by solving the path condition. It fails to scale for large software due to the limitations of underlying theorem proofers and symbolic analyzers [17].

1.1.7. Concolic Testing

Concolic testing combines random testing, concrete and symbolic execution to iteratively generate test inputs to traverse all feasible paths and to partly overcome the limitations of each testing technique. It statically instruments codes according to each instruction and dynamically collects symbolic constraints. Based on dynamic methods for test input generation, Concolic testing doesn’t have false positives. Based on symbolic execution, it generates concrete test inputs so that the coverage of concolic testing is better than random testing. The primary disadvantage of concolic testing is that it doesn’t scale to large programs because of path explosion problem. The primary advantage of concolic testing is that it is better than pure symbolic execution because of the presence of concrete values, which can be used both to reason precisely about complex data structures and to simplify constraints for the sake of aiding the underlying constraint solver. It is unlike the traditional testing techniques only based on symbolic execution or static analysis.

EGT [18], EXE [7], DART [19], Hybrid [20], CUTE [6] and LATEST [21] perform concolic testing, especially EXE addresses byte-level memory model, Hybrid combines with random testing switch, LATEST abstracts function call to reduce the execution path with the same branch coverage in main function.

1.1.8. Static Data Flow Analysis

Static data flow analysis, a mature technique adopted by complier and some application such as cqual[22], CIL[23], helps compiler generate the optimized assembly code from source code and

(16)

7

finds bugs like format string and command injection attack. By statically generating and analyzing AST (Abstract Syntax Tree), efficiency and scalability is its feature, but a large amount of false alarm is its disadvantage.

x = a + b; y = a * b; while(y > a) { a = a + 1; x = a + b; } Program x = a + b while y > a Block a = a + 1 ... ...

Figure 2: Static data flow analysis is based on abstract syntax tree.

1.2. Common Vulnerabilities

When a software contains a vulnerability, it may be exploitable. As long as a software is exploitable, it allows attackers from an authorized state to any unauthorized states to do harm to users. We introduce common vulnerabilities that cause wargames exploitable in section 3.

1.2.1. Buffer Overflow

Buffer overflow, the most notorious problem in software quality, is one of the OWASP Top 10 2004[25] and it causes many well-known attacks and worms shown in CERT/CC Advisories [26]. By overwriting memory fragments of the process, buffer overflow may control values of the instruction pointer, base pointer and other registers to direct the flow of the process and

(17)

execute arbitrary codes. Buffer overflow contains stack overflow and heap overflow.

1.2.1.1. Stack Overflow

Local variables are allocated on the stack, along with parameters and linkage information that where to resume execution after a function returns. When data is written outside of the

boundaries of the stack buffer, it is called stack overflow. Misuse of strcpy(), strcat(), gets(), sprint() or other analogous, self-implemented functions usually generates this kind of bug. Figure 3 shows a simple code with the stack overflow caused by gets function and Figure 4 shows the graph of the stack. When our input length is over than 256, we can write the data of b and further the data of the return address and ebp to control the flow of the process and execute arbitrary code. 1 #include <stdio.h> 2 #include <string.h> 3 void doit(int a) 4 { 5 int b=0; 6 char buf[256]; 7 gets(buf);

8 printf("%s %d %d\n", buf, b, a); 9 } 10 11 int main() 12 { 13 printf("Start doit\n"); 14 doit(10); 15 printf("End doit\n"); 16 17 return 0; 18 }

(18)

9

Figure 4: A simple illustration of stack segment in Figure 3.

1.2.1.2. Heap Overflow

Heap overflow is a kind of buffer overflow that occurs in the heap data area. The most difference with stack overflow is that heap overflow is hard to exploit with regard to the implementation of malloc() in C or new in C++.

1.2.2. Command Injection

The command injection attack is to inject and execute commands specified by the attacker in the vulnerable application. In most cases, it can be injected because of lack of correct input data validation, which can be manipulated by the attacker.

The following simple example accepts a filename as a command line argument, and displays the contents of the file. If the program runs with root privileges, the call to system() also executes with root privileges. If a user specifies a standard filename, the call works as expected. However, if an attacker passes a string of the form ";sh", then the call to system() fails to execute cat due to a lack of arguments and then execute sh with root privileges. Then, the

(19)

attacker can do anything that a root can do. 1 #include <stdio.h>

2 #include <unistd.h> 3

4 int main(char* argc, char** argv) { 5 char cmd[256] = "/usr/bin/cat "; 6 7 strcat(cmd, argv[1]); 8 system(cmd); 9 10 return 0; 11 }

Figure 5: An example with command injection vulnerability.

1.3. Motivation

Software bugs or errors are usually introduced by some inputs and cause program not to behave as designers expect. If we obtain inputs which trigger bugs, we can traverse the program in a systematic way to observe what inputs affect. Finding inputs which trigger bugs is important, but the premise is how to define bugs. One common definition of a software error is a mismatch between the program and its specification. Even though the specification is perhaps incorrect and then it induces the testing to find a non-realistic error [5], our target is to help developers obtain the information of what cause violation of specification as soon as possible. Take divide-by-zero as an example. CUTE [6] uses assert function which should be manually inserted into the check point every time when the division happens. Even thought EXE [7]uses universal checking by means of CIL to automatically insert the predicate into the check point, we can’t avoid writing an Ocaml module containing our analysis and transformation. As a consequence of the inconvenience, we want to design a C-like specification language to describe what we are eager to verify.

Even though we find bugs or errors in a program, there are often other hurdles that we have to overcome, especially whether or not the vulnerability exists in a program. For this reason, we make

(20)

11

our specification language flexible and feasible enough to describe a bug which is more complicated than a bug found by CUTE and EXE and it approaches to the problem causing vulnerability.

1.4. Objective

Our objective is to make a specification language for convenience of defining many properties and verify it by our new concolic testing tool CAST to generate exploits automatically. In order to find a bug which violates some properties and has some specific flow relation in a program, we try to integrate universal checks, static code analysis and dynamic data flow analysis into one.

1.5. A Motivation Example

To illustrate why we need to combine universal checks with dynamic data flow analysis, consider the program in Figure 6.

1 #include <stdio.h> 2

3 char Buffer[256];

4 int RespondHttpRequest(int Pos) 5 { 6 int kk=0; 7 char tmpCad[20] 8 char *p1; 9 int idx, j=0; 10

11 fgets(Buffer, sizeof(Buffer), stdin); 12 p1 = strstr(Buffer, "GET"); 13 if(p1 == NULL) 14 p1 = strstr(Buffer, "Get"); 15 if(p1 == NULL) 16 p1 = strstr(Buffer, "get"); 17 if(p1 != NULL){ 18 while(p1[j] != ' ' && p1[j]){ 19 tmpCad[kk++] = p1[j++]; 20 } 21 } 22

(21)

23 ... 24

25 return 0; 26 }

Figure 6: An exploitable example of buffer overflow

First, we describe the program’s objective shortly. It is common to copy a string to a local buffer after parsing. Then a buffer overflow occurs, because programmer forgets to check if the size of the target buffer is big enough. Therefore, in Figure 6, the variable kk will be set as tainted by dynamic data flow analysis, and universal checks check if kk is greater than tmpCad‘s bound. Traditional concolic testing is usually applied to detect buffer overflow, but it doesn’t know how to exploit it. However, we model a new object map and a byte-level symbolic execution to figure out the problem such that DDFA can check if the return address is tainted and if there are a lot of continuous symbolic values stuffed with shell codes.

(22)

13

2. Testing Sequential Programs with CAST Specification

In this chapter, we present an idea to combine concolic testing with a specification language for describing many kinds of properties and focus on testing sequential programs. Concolic testing [6] [17] can automatically explore all of feasible execution paths of sequential programs, where concolic denotes cooperative CONCrete and symbOLIC execution. And we define a specification language to instrument the program at the feasible place where we check security properties with the help of universal check. Our concolic testing tool, CAST modified from ALERT [27, 28], helps us explore all feasible paths, especially those paths causing vulnerability.

2.1. CAST Architecture

We can view our concolic testing tool, CAST modified from ALERT [27-29], as a program path explorer to traverse all path of the execution tree of a program and then check every property described in a specification on each path. The components and the flow chart of CAST are shown in Figure 7 and described in detail as below.

(23)

source code .C .h preprocessor CIL simplification constraint solver CIL instrumenter next-path input c compiler self-tested program simplified source code .C .h symbolic execution library CVC3 theorem prover library 1. 2.

specification source code

.C .h preprocessor CIL simplification constraint solver CIL instrumenter next-path input exploit input c compiler self-tested program property checking result

CAST parser simplified source code .C .h symbolic execution library CVC3 theorem prover library dynamic tainted data flow library 1.

2.

(a)ALERT (b)CAST

Figure 7: ALERT and CAST system architecture flow graph.

2.1.1. CIL Simplification

CIL (C Intermediate Language) [23] is a tool that permits easy analysis and source-to-source transformation of C programs, then we uses CIL to instrument the C program for symbolic execution,

(24)

15

universal checks and dynamic data flow analysis. In order to instrument the code conveniently, we use CIL modules to simplify the structure of the code and preserve its semantics. We simply describe the simplification steps and functions as below:

1. Simplified Control Flow (cilly --domakeCFG)

We use this module to simplify the control flow structures described as below: A. Loop statement (while, for and do-while)

Each loop statement will be changed into a while(1) statement with if-else statements, goto statements and break statements.

B. Branch statement (if, if-elseif-else and switch)

Each branch statement will be transformed into if-else statements. C. Predicate of if statement (if(A && B), if(A || B); A,B are C expression)

Each predicate of if statement will be divided into many if-else statements that each predicate of them is a binary relation expression (such as if(x==y); x,y are variables). 2. Simple Three-Address (cilly --dosimplify)

The simplify.ml module simplifies the expressions of a program to gives us a form of three-address code. Moreover, all sizeof and alignof forms are turned into constants. Accesses to arrays and structures whose address is taken are turned into "Memory" accesses and all index and field computations are turned into address arithmetic.

3. Simple Memory (cilly --dosimpleMem)

The simplemem.ml module makes CIL lvalues that contain more than one memory access even further simplified via the introduction of well-typed temporary variables. After this simplification, all CIL lvalues involve at most one pointer dereference.

4. One Return (cilly --dooneRet)

The oneret.ml module transforms each function body to have at most one return statement.

(25)

2.1.2. CAST Parser

CAST parser, written in Ocaml, reads specification and translates it into CIL instrumenter to statically analyze source codes for instrumenting specification check (a complex universal check) at appropriate position.

2.1.3. CIL Instrumenter

CIL instrumenter statically analyzes simplified source codes to instrument it with our dynamic analysis functions according to each instruction pattern. We show every kind of instruction with instrumented codes in Table 2.

Table 2: CIL instrumenter instruments three-address code. Pattern Instruction/Instrumentation

One-Address _sqPush( para1, T_INT, (unsigned int )(& para1));

_sqPush( para2, T_INT, (unsigned int )(& para2));

_sqPush( para3, T_INT, (unsigned int )(& para3)); function(para1, para2, para3)

Two-Address _sqPush( para1, T_INT, (unsigned int )(& para1)); _sqPush( para2, T_INT, (unsigned int )(& para2)); _sqPush( para3, T_INT, (unsigned int )(& para3)); LHS = function(para1, para2, para3);

_sqPopReturnValue( T_INT, (unsigned int )(& LHS)); LHS = RHS;

_sqSymExec(T_INT, (unsigned int )(& LHS), OP_NOP,

"RHS", (unsigned long long )RHS, T_INT, (unsigned int )(&RHS), "SQ_constant", 0, T_INT, 0, T_NON);

LHS = (long)RHS;

_sqSymExec(T_INT, (unsigned int )(& LHS), OP_NOP,

"RHS", (unsigned long long )RHS, T_INT, (unsigned int )(&RHS), "SQ_constant", 0, T_INT, 0,

T_LONG); LHS = *RHS;

sqPointerRead( (unsigned int )(&LHS), T_INT, (unsigned int )RHS, (unsigned int )(&RHS));

(26)

17 *LHS = RHS;

sqPointerWrite((unsigned int )LHS, (unsigned int )(& LHS), T_INT, (unsigned long long )RHS, (unsigned int )(& RHS));

Three-Address LHS = Operand1 op Operand2;

_sqSymExec(T_INT, (unsigned int )(& LHS), OP_PLUS,

"Operand1", (unsigned long long )Operand1, T_INT, (unsigned int )(& Operand1), "Operand2", (unsigned long long )Operand2, T_INT, (unsigned int )(& Operand2), T_INT);

op: +, -, *, /, etc

These instrumented functions trace data flow and collect symbolic constraints to dynamically make symbolic execution based on operands’ type, operator, and operands’ name got from object map by operand’s address. In Figure 8, a simple C program is simplified to three-address codes with a lot of CIL temporary variables for the convenience of instrumentation. Because of CIL

simplification, pattern matching of CAST to instrument according to a specification has some difficulties we discuss in section 2.3.1.

(27)

(28)

19 2.1.4. Self-Tested Program

When making a program to test it by itself, we should figure out the problem of how to input a value generated from constraint solver. As long as a variable is symbolic, that means its tainted origin comes from user’s setting (make_symbolic(&i, T_INT)) or stdin function like read(0, buf, size), fgets(buf , size , stdin), etc. If symbolic variable is set by user, we write its value into a file and read the file next time to make program to run next path. Otherwise, we write it into a file and pipe the file to the program next time to make program to run next path. Writing a shell script to

automatically run self-tested program is easy to make all the process automatic.

2.2. Implementation

We show what CAST modifies ALERT to make it possible to test a vulnerable program and find exploits in this section. First, we simply describe symbolic execution applied to constraint solver; then secondly depict how to utilize symbolic execution to model a tested program into byte level precision. Thirdly, how to maintain an object map for recording sensitive data is shown in section 2.2.3. Fourth, the most important issue about symbolic pointer dereference shown in section 2.2.4 discuss how to query constraint solver for extra execution paths.

2.2.1. Symbolic Execution

Automatic theorem prover such as CVC3[30] we use in this paper, a tool to solve Satisfiability

Modulo Theories (SMT) problems, is the kernel of symbolic execution such that symbolic execution is limited by the power of underlying theorem prover. We will explain how symbolic execution works in Figure 9 and turn it into CVC3 file format (.cvc) to experiment for finding why it fails without symbolic execution.

(29)

Figure 9: How to model a C program with symbolic execution.

When not using symbolic execution in Figure 9(a), we collect the constraint formula is (i>100 AND i=i+1) and then find that we can’t make the difference between (i>100 AND i=i+1) that means (if(i>100) i=i+1;) and (i=i+1 AND i>100) that means (i=i+1; if(i>100);). Therefore, to verify this, we turn the formula (i>100 AND i=i+1) to CVC3 query language shown in Figure 9(b) and call theorem prover, then get the error message “symbolic.cvc:7: this is the location of the error”. It, further, means that the solver doesn’t know the formula is the case of Figure 9(c) (i1>100 AND i2=i1+1) or Figure 9(d) (i2>100 AND i2=i1+1), whereas after applying symbolic execution into Figure 9(a), we have the constraint formula (i1>100 AND i2=i1+1) which is the interpretation of the path of the program in Figure 9(a) that input i passes through line 3 and line 4.

2.2.2. Symbolic Name and Symbolic Byte Name

(30)

21

problem of type size in C comes with it. In Figure 10, the variable a is equal to a4@a3@a2@a1 (where “@” denotes bitvector concatenation, and we use little-endian order for multi-byte values) due to buffer overflow that sets a1, a2, a3, a4 to be symbolic inputs. ALERT collect symbolic expression (buf[4]_0 = a1_0 && buf[5]_0 = a2_0 && buf[6]_0 = a3_0 && buf[7]_0 = a4_0) from line 13 to line 16 and try to query (a_0 = 5566) at line 17. Then it fails to generate inputs of the next path, because it doesn’t make the equation of a_0 and buf[7]_0@buf[6]_0@buf[5]_0@buf[4]_0. CUTE and CREST use the same symbolic expression memory model to map &buf[4] to a1 at line 13 and to get a1 from finding &a in the map at line 17 to ask constraint solver if a1 = 5566. Then, they can’t generate next path inputs because the symbolic expression memory map model is not byte-sensitive enough to get the correct symbolic expression a4@a3@a2@a1=5566. After running the case in CUTE and CREST, we make the conclusion as above.

1 #include <crest.h> 2 #include <stdio.h> 3

4 int main(void) 5 {

6 char a1, a2, a3, a4; 7 int a;

8 char buf[4];

9 make_symbolic_char(&a1);//make a1 as symbolic char input 10 make_symbolic_char(&a2); 11 make_symbolic_char(&a3); 12 make_symbolic_char(&a4); 13 buf[4] = a1; 14 buf[5] = a2; 15 buf[6] = a3; 16 buf[7] = a4; 17 if(a==5566) 18 fprintf(stderr,"a==5566\n"); 19 else 20 fprintf(stderr,"a!=5566\n"); 21 }

Figure 10: A C program with symbolic name problem.

(31)

check each byte of variable a by recording each bytes into symbolic expression memory map. Therefore, we can assert the formula (getByteExpr((char *)&a+3) @ getByteExpr((char *)&a+2) @ getByteExpr((char *)&a+1) @ getByteExpr((char *)&a) = 5566) to ask the correct symbolic expression (a4_0@a3_0@a2_0@a1_0=5566). EXE claims that they implement byte-level precision, but we can’t get the program to test the case.

2.2.3. Object Map

CAST implements an object map like CRED[32], which dynamically records the information of all variables, especially ebp and return address of each function about its memory address and type size that help us make extra symbolic assignment constraints by _sqSymExec() and symbolic branch constraints by _sqAddPredicate() to query not only precise but also exploitable inputs. We add/remove the object information of every function into object map when program enter/exit each function. Further, adding sensitive data such as global offset table, return address, etc into object map provides a chance to calculate more exploitable paths. Avoiding getting unknown objects from buffer overflow, we fill every stack frame with stuff objects by sqAddStuff() to make sure that symbolic execution works as usual.

1 unsigned *sqEbp; 2

3 int testme(int a) 4 {

5 asm("movl %%ebp, %0\n": "=g" (sqEbp));

6 sqAdd((unsigned int) sqEbp+4, sizeof(int), "testme[ret]"); 7 sqAdd((unsigned int) sqEbp, sizeof(int), "testme[ebp]"); 8 sqAdd((unsigned int )(&a), sizeof(a), "a");

9 sqAddStuff(); 10

11 a=a+1; 12

13 sqRemove((unsigned int)_sqEbp +4); 14 sqRemove((unsigned int) sqEbp; 15 sqRemove((unsigned int)(&a));

(32)

23 16 sqRemoveStuff();

17

18 return a; 19 }

Figure 11: Instrumentation of inserting and removing object map.

GCC provides a large number of built-in functions for programmers to observe program, such as __builtin_frame_address() and __builtin_return_address(). When using these functions to get ebp and return address, we encounter some problems about its imprecision after calling some standard library. For the sake of getting precise ebp to solve the problem of imprecision, we write assembly code of the line 5 in Figure 11 such that an extra global variable sqEpb is necessary to record current ebp. Therefore, CAST can dynamically check the address from start to end of each frame to ensure that each byte is added into object map for symbolic byte name execution.

2.2.4. Symbolic Pointer Read/Write with Symbolic Value

One of the most notorious forms of attack to C program is buffer overflow which generates extra execution paths in which we are interested. Table 3 shows the cases that cause different kinds of buffer overflow with different exploitable levels of vulnerabilities. Here, we assume that variable a, b[i], *p and i are integer types. In the case (1), the variable a can be any value in the memory such that it has an influence on the branches directed by the variable a. The case (2) means that we can write 4 bytes at any memory location with value a. The most dangerous behavior of a program is the case (3) that users can write any value into any memory location, especially certain sensitive data like return address.

Table 3: The cases of buffer overflow cause different vulnerabilities to a C program.

Case Pattern Concrete Symbolic Exploitable (1)Symbolic Buffer Index/Symbolic Pointer

Reading

a = b[i]; B i possible a = *p; p possible (2) Symbolic Buffer Index/Symbolic Pointer b[i] = a; b, a i possible

(33)

Writing *p = a; a p possible (3)Symbolic Buffer Index/Symbolic Pointer

Writing with Symbolic Assignment

b[i] = a; b i, a achievable *p = a; p, a achievable Since CUTE and CREST attempt to efficiently traverse all feasible execution paths in a program, extra paths are not handled after bugs execution shown in Table 3, in the mean time they also cannot achieve printf(“GOAL”) in Figure 12. ALERT and EXE, however, handle it as a normal access to any element of buffer, so they can achieve all branches except printf(“sp is written by buffer overflow”) at line 17 and line 19.

1 int main() 2 { 3 int i, sp=0, a[4]={1,2,3,4}; 4 make_symbolic_int(&i);//make i symbolic 5 a[i]=10; 6 if(a[0]==10) 7 printf(“GOAL”); 8 if(a[0]==1) 9 printf(“1”); 10 if(a[1]==3) 11 printf(“2”); 12 if(a[2]==5) 13 printf(“3”); 14 if(a[3]==2) 15 printf(“4”); 16 if(a[4]==10)

17 printf(“sp is written by buffer overflow”); 18 if(sp==10)

19 printf(“sp is written by buffer overflow”); 20

21 return a; 22 }

Figure 12: A C program of buffer accesses with symbolic index.

Encountering a[i]=10 at line 5, EXE generates a big disjunction constraint (i == 0 && a[0] == 10) || (i == 1 && a[1] == 10) || (i == 2 && a[2] == 10) || (i == 3 && a[3] == 10). CAST generates a bigger disjunction constraint (i == 0 && a[0] == 10) || (i == 1 && a[1] == 10) || (i == 2 && a[2] ==

(34)

25

10) || (i == 3 && a[3] == 10 || … || (i==n && a[n]==nearest_return_address)) than EXE’s constraint for the purpose that we can explore extra execution path. Thus, printf(“sp is written by buffer overflow”) at line 17 and line 19 can be achieved by CAST.

2.2.5. Use Object Map to Construct Shellcode Constraints

When a symbolic pointer assigned with symbolic value taints a return address, we want to find a continuous memory large enough to allocate shellcode and make return address to point the start of the shellcode.

void vc_mkShellCodeExpr(char *shellcode){ vector<object>::iterator objmap_it;

vector<object>::iterator omtmp_it; /* objmap temp iterator */

for(objmap_it = objmap.begin(); objmap_it != objmap.end(); objmap_it++){ omtmp_it = objmap_it;

for(i=0; i< strlen(shellcode); i++){ // check if there are continuous memory if(omtmp_it != objmap.end())

andExpr &= (omtmp_it->expr = shellcode[i]); omtmp_it++;

}

orExpr |= (andExpr & retAddrExpr == objmap_it); }

vc->assertFormula(orExpr); /* universal check for shellcode */

vc->query(vc->falseExpr())); /* constraint solver check if it is valid */ }

(35)

2.2.6. Dynamic Tainted Data Flow for Function Call

When testing C programs by concolic testing, we must pass symbolic execution and tainted data flow to the parameter of function call. Instrumentation in Table 4 shows how to instrument a function when entering and exiting in order to make the equation of arg1 = para1 and LHS = ret_val.

Table 4: CIL instrumenter instruments functions code. Case Instruction/Instrumentation

(1)One-Address _sqPush( arg1, T_INT, (unsigned int )(& arg1)); _sqPush( arg2, T_INT, (unsigned int )(& arg2)); _sqPush( arg3, T_INT, (unsigned int )(& arg3)); function(arg1, arg2, arg3)

(2)Two-Address _sqPush( arg1, T_INT, (unsigned int )(& arg1)); _sqPush( arg2, T_INT, (unsigned int )(& arg2)); _sqPush( arg3, T_INT, (unsigned int )(& arg3)); LHS = function(arg1, arg2, arg3);

_sqPopReturnValue( T_INT, (unsigned int )(& LHS)); (3)Function int function(int para1, int para2, int para3){

__parameterToSymbolic((unsigned int )(& para1), T_INT); __parameterToSymbolic((unsigned int )(& para2), T_INT); __parameterToSymbolic((unsigned int )(& para3), T_INT); …

_sqPushReturnValue("ret_val", (unsigned int )(& ret_val), (unsigned int )ret_val, T_INT);

return ret_val; }

2.2.7. Standard Library Testing by Using uclibc instead of glibc

A C program usually calls library function, but when it is tested by concolic testing, library function is not instrumented like case 3 in Table 4. So we cannot collect the correct symbolic execution and tainted data flow after calling library function. The approach proposed by ALERT[27] assumes that constraints are generated by our human logical reasoning without instrumentation such that it is fast in concolic testing but time-consuming for us to reason a library function. Instead of

(36)

27

manually reasoning library, we instrument library with source code. We instrument uclib rather than

glibc because uclibc is tiny and easy for debugging. Take function strcmp in string.h header file as an example, we use CIL to modify its name to alert_strcmp for each tested source code such that testing a C program calling standard library is just complied with uclibc object file compiled by gcc and instrumented by CAST.

2.2.8. Post-condition for standard library fgets and read from stdin

In the cause of testing programs with fgets and read from stdin, we apply the approach [27] to make post-condition constraints for fgets and read. Instead of instrumenting each stdin library function, we instrument each fgets and read from stdin with _sqPostFgets() and _sqPostRead for making symbolic byte name to each destination buffer.

2.2.9. Environment

Table 5 shows the environment and tools CAST uses.

Table 5: Environment of CAST Operating System ubuntu-7.04-desktop-i386

Memory 1.5G

CPU Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz Compiler gcc 4.1.2

Instrumenter CIL Theorem Prover CVC3

2.3. Syntax of Specification

Check properties are expressed using observer concolic testing with universal checks. These provide a way to specify vulnerable properties of C programs based on syntactic pattern matching of C code. The definition of an observer concolic testing consists of a set of declarations, each defining an

(37)

observer global variable, or a property. Figure 14 gives the grammar for specifying observer concolic testing.

Observer: DeclSeq

DeclSeq: Declaration | DeclSeq Declaration Declaration: “GLOBAL” CVarDef

| “PROPERTY” ‘{‘ Temporal

“LOCATION” { #line num "filename" } “PATTERN” ‘{‘ CStmt ‘}’

“PRIORITY” ‘{‘ CIng ‘}’ “ASSERT” ‘{‘ CCondExpr ‘}’ “SHELLCODE” ‘{’ CString ‘}’

‘}’

Temporal: “BEFORE” | “AFTER” | “” SVar : ‘$’Num

Num : 1~65535

Figure 14: The grammar of the CAST specification language

Observer global variable is instrumented into codes to analyze and determine some properties. Each observer global variable may have any C primitive type, and is declared following the keyword GLOBAL, where the nonterminal CVarDef stands for any C variable declaration.

Each property observes all program steps, if a pattern is matched, specifies what the property checks. The keyword PROPERTY is followed by up to five parts: a temporal qualifier, a priority, a location, a pattern, an assertion, and a shellcode. If more than one pattern matches, then CAST instrument the program with assertion at each matched point according to Temporal qualifier. Temporal qualifier is either BEFORE or AFTER. It specifies whether the assertion is checked before or after the source code that matches the pattern. The keyword PRIORITY is followed by a

(38)

29

C integer taken as the serious level to fix. The keyword ASSERT is followed by a C condition expression taken as a formula checked by universal checks. At each universal check point in the program execution, the observer checks the current program constraints formula with assertion formula. The keyword PATTERN is followed by a C statement that is matched against the program source code. The pattern is defined by the nonterminal CStmt, not a real C statement; it now only supports instruction list. Sometimes, there is no pattern to match, therefore the keyword LOCATION followed by a line number and a file name specifies where to be instrumented with universal checks. The keyword SHELLCODE is followed by a C string that is generated by Metasploit [33] to be executed after that any return address is overwritten. Finding if there are continuous symbolic memories to allocate this string is easy to query object map and make symbolic equation with each symbolic bytes and the shellcode.

2.3.1. Pattern Matching

During the execution of concolic testing we calculated a number of statistics on the performance of the pattern matching that resulted in the following interesting observations:

Instruction set (a = b + c) dominates all others. In most cases they accounted for more than 85% of all instructions.

Function call and other instruction sets (a = b op c, where op is not ‘+’) are the other remaining. Thus, in order for pattern matching to be efficient, CAST scans the program to match the less frequency instruction set, if any, of the pattern first. Then it sequentially searches the other instruction sets from the position of the matched instruction set backward and forward. In Figure 15, the steps of searching and matching the instruction follow the number in the middle of single arrow in increasing order. The multiplication of the expression like ($1*2) rarely appears in the source code; then we search it first. The dotted arrow line means it is unmatched, otherwise it is matched.

(39)

$3 = foo($1, tmp); $1 = $1 * 2; tmp = $2; $2 = $2 + 1; sum = foo(i, tmp); i = i * 2; tmp2 = j; j = j + 1; ... ... ... ... ... tmp1 = buf; tmp3 = arr; tmp3 = tmp3 + 1; k = k +1; 1 4 7 10 Pattern{ $3 = foo($1*=2, $2++); } 3 2 5 6 8 9 pattern source code CIL simplification

Figure 15: The method of efficient pattern matching.

As CIL simplifies source codes into three-address codes, pattern might as well be simplified such that pattern matching will be precise. As downside, CIL simplification increases times of pattern matching. On the other hand, it makes simple semantic pattern matching.

2.3.2. Universal Check (Specification Check Instrumentation)

Intuitively, at each point in the program execution, the observer checks the current program statement (i.e., AST node) being executed according to the ASSERT of each property. A universal check, used for checking specification assertion, we call it specificationCheck given in Figure 16.

(40)

31

The function specificationCheck instrumented by pattern matching can check complex logical expressions such as integer overflow. We will show more other specifications that check vulnerabilities and generate exploits in the section 5.

1 int main() 2 {

3 int a, b, c; 4

5 c = a + b;

6 specificationCheck("(a>=0 && c<b) || (a<0 && c>b)”, "c a b", "T_INT T_INT T_INT", 3, (long long)c, (long long)a, (long long)b, &c, &a, &b);

7 return 0; 8 }

Figure 16: Check integer overflow with universal checks.

2.4. Semantic of specification

The semantics of a trace property is given by running the concolic testing with the program. CAST accepts a trace property every time a pattern matches, the corresponding universal check is valid, and moreover, if shellcode is filled, then the string of the shellcode stuffs itself with the program memory.

2.4.1. Specification of Integer Overflow

Because our pattern matching is type-sensitive, type of each variable should be declared first. A post-condition of integer overflow in addition shown in guard block will be checked after each integer addition like the pattern.

1 svar int $1; 2 svar int $2; 3 svar int $3; 4

(41)

6 after

7 pattern { $1 = $2 + $3; }

8 guard { ($2>=0 && $1<$3) || ($2<0 && $1>$3) } 9 }

(42)

33

3. Experiment

We will discuss the specification of test cases that come from the course of secure programming in this section. These test cases, called wargame, are used for training students to understand the security problem of vulnerable programs.

3.1. Wargame 1 (Exploiting a Bug without ShellCode)

Wargame 1 is a simple C program that can be exploited by buffer overflow because of the misuse of fgets(). Then it can enter an unauthorized region at line 29 without the correct password.

1 #include <stdio.h> 2 #include <string.h> 3 #include <unistd.h> 4 #include <sys/types.h> 5 #include <fcntl.h> 6 char pass[8]; 7

8 int main(int argc, char **argv){ 9 10 FILE *fp; 11 int i = 0, auth = 0; 12 char buf[8]; 13 14 printf("Input passwd: "); 15 fgets(buf, 20, stdin); 16

17 if((fp = fopen("/home/wargame1/passwd", "r")) == NULL) { 18 printf("fopen error!\n"); 19 return 1; 20 } 21 fgets(pass, sizeof(pass), fp); 22 pass[strlen(pass)-1] = '\0'; 23

24 for( ; i < strlen(buf); ++i) 25 if(buf[i]<'a'|| buf[i]>'z') 26 return 1;

(43)

27 if(!strcmp(buf, pass)) 28 auth = 1;

29 if(auth == 1 && buf[0] == '0'){ 30 char fname[32];

31 uid_t uid = getuid();

32 sprintf(fname, "/home/wargame1/checkin/%u", uid); 33 open(fname, O_CREAT | O_WRONLY, 0000);

34 }

35 return 0; 36 }

Figure 18: Wargame 1 contains a buffer overflow.

After instrumenting fgets(), auth and i are set symbolic variables because of 19 bytes tainted by stdin from the address of buf to i shown in Figure 19.

Variable Buf auth i

Address 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 Figure 19: Simple memory allocation of wargame 1.

Concolic testing, however, is able to get all concrete values of a tested program such that it always generates an input with bounded values at line 24 and correct password at line 27. Avoiding to this, we write a specification to check the value of auth, i and buf[0] at line 23.

1 #include <stdlib.h> 2 #include <stdio.h> 3 4 svar int $1; 5 svar int $2; 6 svar int $3; 7 8 svar int i; 9 svar int auth; 10 svar char buf[8]; 11

12 property { 13 before

14 location { #line 23 "wargame1.c" }

15 pattern { $1 = i > 20 && auth == 1 && buf[0] == 30; } 16 guard { i > 20 && auth == 1 && buf[0] == 30 }

(44)

35 17 }

Figure 20: Specification of wargame 1.

To enter line 29 should be assured that like the guard in Figure 20 and to check this should be at line 23 described by label location. The pattern is nothing important but not empty to make CAST to work well.

3.2. Wargame 2 (Exploiting a Bug with Command Injection)

Wargame 2 can be exploited by command injection at line 46 because it calls system() without correct input data validation.

1 #include <stdio.h> 2 #include <stdlib.h> 3 #include <string.h> 4 #include <errno.h> 5 6 #define BIGSIZ 15 7 char * Hmalloc (size) 8 unsigned int size; 9 {

10 unsigned int s = (size + 4) & 0xfffffffc; /* 4GB?! */ 11 char *p = malloc (s); 12 if (p != NULL) 13 memset (p, 0, s); 14 else 15 printf("Hmalloc %d failed", s); 16 return (p); 17 } /* Hmalloc */ 18

19 unsigned int findline (fbuf, siz) 20 char * fbuf;

21 unsigned int siz; 22 {

23 char * p; 24 int x;

25 if (! fbuf) /* various sanity checks... */ 26 return (0);

(45)

28 return (0); 29 x = siz;

30 for (p = fbuf; x > 0; x--) { 31 if (*p == '\n') {

32 x = (int) (p - fbuf);

33 x++; /* 'sokay if it points just past the end! */ 34 return (x); 35 } 36 p++; 37 } /* for */ 38 return (siz); 39 } /* findline */ 40 41 void testcmd(char *p){ 42 char buf[24]; 43 strcpy(buf, "/usr/bin/cal \'"); 44 strcat(buf, p); 45 strcat(buf, "\'"); 46 system(buf); 47 } 48

49 unsigned int insaved = 0; /* stdin-buffer size for multi-mode */ 50

51 int main(int argc, char **argv) 52 {

53 char **tav;//replace argv for test 54 char *cp;

55 int x; 56

57 cp = argv[0];

58 tav = (char **) Hmalloc (128 * sizeof (char *)); 59 tav[0] = cp; /* leave old prog name intact */ 60 cp = Hmalloc (BIGSIZ);

61 tav[1] = cp; /* head of new arg block */ 62 fprintf (stderr, "Cmd line: ");

63 insaved = read (0, cp, BIGSIZ); cp[BIGSIZ-2] = '\n'; 64 x = findline (cp, insaved); //ok for cp

65 if (x)

66 insaved -= x; /* remaining chunk size to be sent */ 67 cp = strchr (tav[1], '\n');//tav[1] ok

68 if (cp)

69 *cp = '\0'; 70 cp = tav[1];

(46)

37

72 x = 2; /* we know tav 0 and 1 already */ 73 for (; *cp != '\0'; cp++) {

74 if (*cp == ' ') {

75 *cp = '\0'; /* smash all spaces */ 76 continue; 77 } else { 78 if (*(cp-1) == '\0') { 79 tav[x] = cp; 80 x++; 81 } 82 } /* if space */ 83 } /* for cp */ 84 argc = x; 85 switch (tav[1][1]) { 86 case 'x': 87 if(tav[2]!=NULL) 88 testcmd(tav[2]); 89 break; 90 default: 91 errno = 0;

92 printf("\033[1;33mnc -h for help\033[0m"); 93 }

94 return 0; 95 }

Figure 21: Wargame 2 contains a command injection.

Concolic testing dynamically analyzes tainted data flow from read() to the argument of system() call. Thus, if the argument of system() is symbolic, how to make it to execute our desired system commands is easy to add a constraint formula into constraint solver to solve if the string equals to a illegal string (“;”, “&”, “&&”, “|”, “||”) followed with a command.

1 svar char* $1; 2

3 property { 4 before

5 pattern { system($1); }

6 guard { $1[14] == '2' && $1[15] == '\'' && $1[16] == ';' && $1[17] == 's' && $1[18] == 'h' }

7 }

(47)

Therefore, checking for wargame2 if it can be exploited with command injection, we make sure that it can execute `sh` after CAST generating an input fitting to our specification.

3.3. Wargame 3 (Exploiting a Buffer Overlfow Bug with ShellCode Injection)

Wargame3 contains a buffer overflow at line 34 in Figure 23. It copies the content from a global buffer Buffer to a local buffer tmpCad, but Buffer’s size is greater than tmpCad’s size such that it can cause buffer overflow.

1 #include <stdio.h> 2 #include <string.h> 3 #include <stdlib.h> 4 5 char Buffer [140]; 6 int SirveWeb = 0;

7 char *CabeceraHTTP = "HTTP/1.0 200 OK\nContent-Type:

text/html\n\n<html><head><title>test</title></head><body><H1><Ce nter> HELLO</H1><HR></Center>";

8 char *FinalHTTP = "<br><hr></center></body></html>"; 9 int MaxPartidas; 10 int PuertoWeb; 11 int PuertoMus; 12 int PosPrimeraPartida = -1; 13 int NumPartidas; 14 unsigned *w3ebp;

15 void RespondeHTTPPendiente(int Pos) 16 { 17 int j, kk, faltan, tmp; 18 char tmpCad[64], *p1, *p2; 19 FILE *f; 20 21 Buffer[130] = 0; 22 p1 = strstr(Buffer, "GET"); 23 if (p1 == NULL) 24 p1 = strstr(Buffer, "Get"); 25 if (p1 == NULL) 26 p1 = strstr(Buffer, "get");

(48)

39 27 if (p1 != NULL) { 28 /* Bug position */ 29 j = 5; 30 kk = 0; 31 if (j < strlen(p1)) 32 while (p1[j] != ' ' && p1[j]){ 33 tmpCad[kk] = p1[j]; 34 kk++; 35 j++; 36 } 37 tmpCad[kk] = '\0'; 38 }

39 printf("Sirve Web [%s]\n", tmpCad);

40 if (SirveWeb && strcmp(tmpCad, "/") && tmpCad[0]) {

41 if (strstr(tmpCad, ".jar") || strstr(tmpCad, ".class")) 42 p1 = "application/x-java";

43 else if (strstr(tmpCad, ".jpg") || strstr(tmpCad, ".jpeg")) 44 p1 = "image/jpeg";

45 else if (strstr(tmpCad, ".gif")) 46 p1 = "image/gif"; 47 else if (strstr(tmpCad, ".txt")) 48 p1 = "text/plain"; 49 else 50 p1 = "text/html"; 51

52 sprintf(Buffer, "./web/%s", tmpCad); 53 f = fopen(Buffer, "rb");

54 if (f != NULL) {

55 sprintf(Buffer, "HTTP/1.0 200 OK\nContent-type: %s%c%c", p1, 10, 10);

56 write(1, Buffer, strlen(Buffer)); 57 do { 58 j = fread(Buffer, 1, 140, f); 59 if (j) 60 write(1, Buffer, j); 61 } while (j == 140); 62 } else {

63 strcpy(Buffer, "HTTP/1.0 404 FILE NOT

FOUND\nContent-type: text/html\n\n<h1>404 FILE NOT FOUND.</h1>"); 64 write(1, Buffer, strlen(Buffer));

65 }

66 if (f != NULL) 67 fclose(f); 68 } else {

(49)

69 write(1, CabeceraHTTP, strlen(CabeceraHTTP));

70 sprintf(Buffer, "Puerto por el que escucha este servidor de partidas: %d <br> Numero maximo de partidas simultaneas: %

71 strcat(Buffer, "<br>Listado de partidas en

curso:<br><UL>");//problem global constant is not sqAdd in alert... 72 write(1, Buffer, strlen(Buffer));

73 write(1, FinalHTTP, strlen(FinalHTTP)); 74 }

75 }

76 int main(int argc, char **argv) 77 {

78 int j , k, l, p; 79 int PonAyuda, PonAyudb; 80 int NuevoSocket; 81 time_t Ahora; 82 time_t Ahorb; 83 84 85 PonAyuda = 0; 86 MaxPartidas = 5; 87 PuertoWeb = 0; 88 PuertoMus = 0; 89 k = 0; 90 91 srandom(time(NULL)); 92 93 for (j = 0; j < argc; j++) { 94 if (argv[j][0] == '-') { 95 switch (argv[j][1]) { 96 case 'w': 97 k = atoi(argv[j + 1]); 98 if (k > 0) 99 PuertoWeb = k; 100 break; 101 case 'p': 102 k = atoi(argv[j + 1]); 103 if (k > 0) 104 PuertoMus = k; 105 break; 106 case 'm': 107 k = atoi(argv[j + 1]); 108 if (k > 0) 109 MaxPartidas = k; 110 break;

(50)

41 111 case 'd': 112 SirveWeb = 1; 113 break; 114 default: 115 PonAyuda = 1; 116 } 117 } 118 } 119 120 k = read(0, Buffer, 140); 121 if (k > 0) 122 RespondeHTTPPendiente(j); 123 else if (k == 0) {

124 fprintf(stderr, "error: recv!!\n"); 125 }

126

127 return 1; 128 }

CAST:自動化動態軟體驗證工具

國

立

交

通

大

學

資訊科學與工程研究所

碩

碩

碩

碩

士

士

士

士

論

論

論

論

文

文

文

文

CAST:自動化動態軟體驗證工具

CAST: Automatic and Dynamic Software Verification Tool

研 究 生：林友祥

指導教授：黃世昆 教授

中

中

中

中 華

華

華

華 民

民

民

民 國

國

國

國 九

九

九

九 十

十 八

十

十

八

八

八 年

年

年

年 六

六

六

六 月

月

月

月

CAST:自動化動態軟體驗證工具

CAST: Automatic and Dynamic Software Verification Tool

研 究 生：林友祥 Student：You-Siang Lin

指導教授：黃世昆 Advisor：Shih-Kun Huang

國 立 交 通 大 學

資 訊 科 學 與 工 程 研 究 所

碩 士 論 文

CAST:

CAST:

CAST:

CAST:自動化動態軟體驗證工具

自動化動態軟體驗證工具

自動化動態軟體驗證工具

自動化動態軟體驗證工具

學生：林友祥 指導教授：黃世昆 教授

國立交通大學資訊科學與工程研究所碩士班

摘要

摘要

摘要

摘要

CAST: Automatic and Dynamic Software Verification Tool

研究生：林友祥

指導教授：黃世昆教授

中華

華民

民國

國九

九十

十八

八年

年六

六月

研究生：林友祥 Student：You-Siang Lin

國立交通大學

資訊科學與工程研究所

碩士論文

學生：林友祥指導教授：黃世昆教授