Implementation - Testing Sequential Programs with CAST Specification

2. Testing Sequential Programs with CAST Specification

2.2. Implementation

We show what CAST modifies ALERT to make it possible to test a vulnerable program and find exploits in this section. First, we simply describe symbolic execution applied to constraint solver;

then secondly depict how to utilize symbolic execution to model a tested program into byte level precision. Thirdly, how to maintain an object map for recording sensitive data is shown in section 2.2.3. Fourth, the most important issue about symbolic pointer dereference shown in section 2.2.4 discuss how to query constraint solver for extra execution paths.

2.2.1. Symbolic Execution

Automatic theorem prover such as CVC3[30] we use in this paper, a tool to solve Satisfiability Modulo Theories (SMT) problems, is the kernel of symbolic execution such that symbolic execution

is limited by the power of underlying theorem prover. We will explain how symbolic execution works in Figure 9 and turn it into CVC3 file format (.cvc) to experiment for finding why it fails without symbolic execution.

Figure 9: How to model a C program with symbolic execution.

When not using symbolic execution in Figure 9(a), we collect the constraint formula is (i>100 AND i=i+1) and then find that we can’t make the difference between (i>100 AND i=i+1) that means

(if(i>100) i=i+1;) and (i=i+1 AND i>100) that means (i=i+1; if(i>100);). Therefore, to verify this, we turn the formula (i>100 AND i=i+1) to CVC3 query language shown in Figure 9(b) and call theorem prover, then get the error message “symbolic.cvc:7: this is the location of the error”. It, further, means that the solver doesn’t know the formula is the case of Figure 9(c) (i1>100 AND i2=i1+1) or Figure 9(d) (i2>100 AND i2=i1+1), whereas after applying symbolic execution into Figure 9(a), we have the constraint formula (i1>100 AND i2=i1+1) which is the interpretation of the path of the program in Figure 9(a) that input i passes through line 3 and line 4.

2.2.2. Symbolic Name and Symbolic Byte Name

As ALERT, CUTE and CREST [31] adopt symbolic execution to query constraint solver, the

problem of type size in C comes with it. In Figure 10, the variable a is equal to a4@a3@a2@a1 (where “@” denotes bitvector concatenation, and we use little-endian order for multi-byte values) due to buffer overflow that sets a1, a2, a3, a4 to be symbolic inputs. ALERT collect symbolic expression (buf[4]_0 = a1_0 && buf[5]_0 = a2_0 && buf[6]_0 = a3_0 && buf[7]_0 = a4_0) from line 13 to line 16 and try to query (a_0 = 5566) at line 17. Then it fails to generate inputs of the next path, because it doesn’t make the equation of a_0 and buf[7]_0@buf[6]_0@buf[5]_0@buf[4]_0.

CUTE and CREST use the same symbolic expression memory model to map &buf[4] to a1 at line 13 and to get a1 from finding &a in the map at line 17 to ask constraint solver if a1 = 5566. Then, they can’t generate next path inputs because the symbolic expression memory map model is not byte-sensitive enough to get the correct symbolic expression a4@a3@a2@a1=5566. After running the case in CUTE and CREST, we make the conclusion as above.

1 #include <crest.h>

9 make_symbolic_char(&a1);//make a1 as symbolic char input 10 make_symbolic_char(&a2);

Figure 10: A C program with symbolic name problem.

Instead of traditional symbolic name, CAST makes all variable into symbolic byte name and

check each byte of variable a by recording each bytes into symbolic expression memory map.

Therefore, we can assert the formula (getByteExpr((char *)&a+3) @ getByteExpr((char *)&a+2) @ getByteExpr((char *)&a+1) @ getByteExpr((char *)&a) = 5566) to ask the correct symbolic expression (a4_0@a3_0@a2_0@a1_0=5566). EXE claims that they implement byte-level precision, but we can’t get the program to test the case.

2.2.3. Object Map

CAST implements an object map like CRED[32], which dynamically records the information of all variables, especially ebp and return address of each function about its memory address and type size that help us make extra symbolic assignment constraints by _sqSymExec() and symbolic branch constraints by _sqAddPredicate() to query not only precise but also exploitable inputs. We add/remove the object information of every function into object map when program enter/exit each function. Further, adding sensitive data such as global offset table, return address, etc into object map provides a chance to calculate more exploitable paths. Avoiding getting unknown objects from buffer overflow, we fill every stack frame with stuff objects by sqAddStuff() to make sure that symbolic execution works as usual.

1 unsigned *sqEbp;

3 int testme(int a) 4 {

5 asm("movl %%ebp, %0\n": "=g" (sqEbp));

6 sqAdd((unsigned int) sqEbp+4, sizeof(int), "testme[ret]");

7 sqAdd((unsigned int) sqEbp, sizeof(int), "testme[ebp]");

8 sqAdd((unsigned int )(&a), sizeof(a), "a");

9 sqAddStuff();

11 a=a+1;

13 sqRemove((unsigned int)_sqEbp +4);

14 sqRemove((unsigned int) sqEbp;

15 sqRemove((unsigned int)(&a));

16 sqRemoveStuff();

18 return a;

19 }

Figure 11: Instrumentation of inserting and removing object map.

GCC provides a large number of built-in functions for programmers to observe program, such as __builtin_frame_address() and __builtin_return_address(). When using these functions to get ebp and return address, we encounter some problems about its imprecision after calling some standard library. For the sake of getting precise ebp to solve the problem of imprecision, we write assembly code of the line 5 in Figure 11 such that an extra global variable sqEpb is necessary to record current ebp. Therefore, CAST can dynamically check the address from start to end of each frame to ensure that each byte is added into object map for symbolic byte name execution.

2.2.4. Symbolic Pointer Read/Write with Symbolic Value

One of the most notorious forms of attack to C program is buffer overflow which generates extra execution paths in which we are interested. Table 3 shows the cases that cause different kinds of buffer overflow with different exploitable levels of vulnerabilities. Here, we assume that variable a, b[i], *p and i are integer types. In the case (1), the variable a can be any value in the memory such

that it has an influence on the branches directed by the variable a. The case (2) means that we can write 4 bytes at any memory location with value a. The most dangerous behavior of a program is the case (3) that users can write any value into any memory location, especially certain sensitive data like return address.

Table 3: The cases of buffer overflow cause different vulnerabilities to a C program.

Case Pattern Concrete Symbolic Exploitable

(1)Symbolic Buffer Index/Symbolic Pointer Reading

a = b[i]; B i possible

a = *p; p possible

(2) Symbolic Buffer Index/Symbolic Pointer b[i] = a; b, a i possible

Writing *p = a; a p possible program, extra paths are not handled after bugs execution shown in Table 3, in the mean time they also cannot achieve printf(“GOAL”) in Figure 12. ALERT and EXE, however, handle it as a normal access to any element of buffer, so they can achieve all branches except printf(“sp is written by

Figure 12: A C program of buffer accesses with symbolic index.

Encountering a[i]=10 at line 5, EXE generates a big disjunction constraint (i == 0 && a[0] ==

10) || (i == 1 && a[1] == 10) || (i == 2 && a[2] == 10) || (i == 3 && a[3] == 10). CAST generates a bigger disjunction constraint (i == 0 && a[0] == 10) || (i == 1 && a[1] == 10) || (i == 2 && a[2] ==

10) || (i == 3 && a[3] == 10 || … || (i==n && a[n]==nearest_return_address)) than EXE’s constraint for the purpose that we can explore extra execution path. Thus, printf(“sp is written by buffer overflow”) at line 17 and line 19 can be achieved by CAST.

2.2.5. Use Object Map to Construct Shellcode Constraints

When a symbolic pointer assigned with symbolic value taints a return address, we want to find a continuous memory large enough to allocate shellcode and make return address to point the start of the shellcode.

void vc_mkShellCodeExpr(char *shellcode){

vector<object>::iterator objmap_it;

vector<object>::iterator omtmp_it; /* objmap temp iterator */

for(objmap_it = objmap.begin(); objmap_it != objmap.end(); objmap_it++){

omtmp_it = objmap_it;

for(i=0; i< strlen(shellcode); i++){ // check if there are continuous memory if(omtmp_it != objmap.end())

andExpr &= (omtmp_it->expr = shellcode[i]);

omtmp_it++;

}

orExpr |= (andExpr & retAddrExpr == objmap_it);

}

vc->assertFormula(orExpr); /* universal check for shellcode */

vc->query(vc->falseExpr())); /* constraint solver check if it is valid */

}

Figure 13: Pseudo code of generating shellcode constraints.

2.2.6. Dynamic Tainted Data Flow for Function Call

When testing C programs by concolic testing, we must pass symbolic execution and tainted data flow to the parameter of function call. Instrumentation in Table 4 shows how to instrument a function when entering and exiting in order to make the equation of arg1 = para1 and LHS = ret_val.

Table 4: CIL instrumenter instruments functions code.

Case Instruction/Instrumentation

(1)One-Address _sqPush( arg1, T_INT, (unsigned int )(& arg1));

_sqPush( arg2, T_INT, (unsigned int )(& arg2));

_sqPush( arg3, T_INT, (unsigned int )(& arg3));

function(arg1, arg2, arg3)

(2)Two-Address _sqPush( arg1, T_INT, (unsigned int )(& arg1));

_sqPush( arg2, T_INT, (unsigned int )(& arg2));

_sqPush( arg3, T_INT, (unsigned int )(& arg3));

LHS = function(arg1, arg2, arg3);

_sqPopReturnValue( T_INT, (unsigned int )(& LHS));

(3)Function int function(int para1, int para2, int para3){

__parameterToSymbolic((unsigned int )(& para1), T_INT);

__parameterToSymbolic((unsigned int )(& para2), T_INT);

__parameterToSymbolic((unsigned int )(& para3), T_INT);

…

_sqPushReturnValue("ret_val", (unsigned int )(& ret_val), (unsigned int )ret_val, T_INT);

return ret_val;

}

2.2.7. Standard Library Testing by Using uclibc instead of glibc

A C program usually calls library function, but when it is tested by concolic testing, library function is not instrumented like case 3 in Table 4. So we cannot collect the correct symbolic execution and tainted data flow after calling library function. The approach proposed by ALERT[27]

assumes that constraints are generated by our human logical reasoning without instrumentation such that it is fast in concolic testing but time-consuming for us to reason a library function. Instead of

manually reasoning library, we instrument library with source code. We instrument uclib rather than glibc because uclibc is tiny and easy for debugging. Take function strcmp in string.h header file as an

example, we use CIL to modify its name to alert_strcmp for each tested source code such that testing a C program calling standard library is just complied with uclibc object file compiled by gcc and instrumented by CAST.

2.2.8. Post-condition for standard library fgets and read from stdin

In the cause of testing programs with fgets and read from stdin, we apply the approach [27] to make post-condition constraints for fgets and read. Instead of instrumenting each stdin library function, we instrument each fgets and read from stdin with _sqPostFgets() and _sqPostRead for making symbolic byte name to each destination buffer.

2.2.9. Environment

Table 5 shows the environment and tools CAST uses.

Table 5: Environment of CAST Operating System ubuntu-7.04-desktop-i386

Memory 1.5G

CPU Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz

Compiler gcc 4.1.2

Instrumenter CIL

Theorem Prover CVC3

在文檔中 CAST:自動化動態軟體驗證工具 (頁 28-36)