Pointer Corruption Detection - 自動產生攔截控制流程之攻擊程式碼

When accessing a memory address, S²E will check whether the address is symbolic or not.

If it is symbolic, S²E must determine an explicit location before keeping program execution.

S²E uses a binary search to find all locations that the symbolic address can point to, and forks executions to explore each address. Before S²E handles a symbolic address, the address can try to point to sensitive data, such as return address and GOT. If it is feasible, those sensitive data will be tainted and corrupt EIP register later. Otherwise, the symbolic address will be changed to taint other concrete data because it may help exploit generation if other vulnerabilities corrupt EIP register later. For example, we can change the address to taint data segment so that shellcode can inject to there to bypass ASLR protection.

4.4 Concolic-mode Simulation

To simulate concolic testing on S²E, we execute the program under test with input data, such as arguments and environment variables, and the main tasks are building input constraints and collecting branch conditions. As the previous note about memory model shown in Figure 16, the concrete values are stored separately from symbolic data, but the concrete values are ignored since the variables are marked as symbolic. In order to get the input data at run time, we can read the last concrete value of symbolic variables from concreteStore structure by hand and build constraints to limit the each symbolic expression to its concrete value. A vector container is used to save all input constraints because it is very easy to delete some constraints when they are unnecessary, and to combine with every constraint into a complete input constraint when it is needed.

In addition, a branch condition that takes only one path won’t be added to path conditions on symbolic execution because of redundancy. But concolic testing may need to collect those branch conditions because it always explores only one path in branches and the result is caused by adding extra input constraints. Therefore, whenever a branch is encountered and its both paths are feasible on symbolic execution, the SMT solver redetermines the branch conditions associate with input constraints and adds the branch conditions by force. The process is shown in Figure 22.

Input

branch

Path conditions Program

Execute programs with input

Concolic mode

EIP is corrupted !

Input constraints

Add branch conditions Input constraints

Build input constraints

Figure 22: The process of concolic-mode simulation

In addition to branches, symbolic addresses also fork executions on symbolic execution.

When accessing a memory address whose value is symbolic, S²E cannot determine where it should access. So, S²E forks executions to try to access every address where the symbolic address can point to. In concolic mode, input constraints are still used to help SMT solvers to determine a location on symbolic address. But, when pointer corruption happen, the address could be redirected to other addresses as section 4.3 notes. So, when a symbolic address is redirected, the input constraints that associate with symbolic address will be deleted from the vector container in order to avoid conflict.

It is very easy to switch between concolic-mode simulation and original symbolic execution because the memory model of S²E doesn’t be modified. If symbolic execution will be per-formed, we just set the input constraints to be “true” because path conditions won’t be changed when they perform an “AND” logical operation with a “true” expression.

4.5 Code Selection

Because the concrete value of a variable is stored separately from symbolic data, the switch of a variable between concrete and symbolic is very easy. The concreteMask structure is used to record the states of a memory region, and we can modify the boolean value to change a byte value from concrete to symbolic or vice versa.

To concretize a symbolic data, we pass the symbolic expression and path conditions to an SMT solver to find a concrete solution and write the concrete value back to the address of symbolic data. Because the variable has became concrete, all the code accessing the variable will be executed concretely. So, we concretize the arguments of uninterested functions to stop symbolic execution to explore the paths in them.

To restore a concrete variable to symbolic, we invert the boolean value in concreteMask structure to ignore the concrete value and keep the original symbolic expression to perform symbolic execution on this variable.

FILE * ptr = func(filename, mode);

symbolize(filename);

return ptr;

} fopen()

Glibc

LD_PRELOAD Programs under test

void *handle = dlopen(…);

FILE *(*func)(…) = dlsym(…);

Figure 23: An example for code selection

It is very easy that using this method to select a code to run on concrete execution or sym-bolic execution. In Linux, LD PRELOAD environment variable can intercept the library func-tions and jump to the funcfunc-tions we writing. To cooperate with this environment variable, those uninterested library functions can be intercepted automatically, and run them on concrete ex-ecution. Figure 23 shows an example that intercepts fopen function, and performs concrete execution on it. In addition, some functions just print messages to screen without return value, such as perror(), so those functions can be skipped using this method directly to speed the process of exploit generation.

Chapter 5 Experimental Results

In order to evaluate our work, three experiments were completed. The first part generated exploits for five different control-flow hijacking vulnerabilities to show our method can handle all vulnerabilities that corrupt EIP register and some vulnerabilities that corrupt pointers. The second experiment demonstrated that return-to-libc and jump-to-register exploits we generated can bypass some protections in real-world systems.

In the final experiment, we generated exploits for ten real-word programs to evaluate the results and prove our method can apply in real-world applications. In these ten real-world programs, seven programs were chosen from the benchmarks of AEG to demonstrate that our method can handle at least all cases that AEG can address.

5.1 Testing Method and Environment

In this chapter, we demonstrate the results of automated exploit generation for vulnerable sam-ple code and real-world programs. All experiments were performed on a 2.66 GHz Intel Core 2 Quad CPU with 4 GB of RAM, and the host environment is Ubuntu 10.10 64-bit. The guest environment is Debian 5.0 32-bit with default settings of QEMU, which is a 266MHz Pentium II (Klamath) CPU with 128 MB of RAM.

Mostly, the programs under test were compiled by GCC 4.3.2 and ran on Glibc 2.7, which are the default in Debian 5.0. But some programs used GCC 3.4.6 or Glibc 2.3.2 to generate exploits because the default version protects main function against stack buffer overflow or performs heap hardening integrity checks to stop heap overflow.

We used an end-to-end approach to generate exploits on binary executables without modi-fying the source code, i.e. the source code are not necessary. The method we used is forking a new process to execute the program under test and passing the symbolic data to it from out-side. In Debian 5.0, ASLR is enabled by default so that the based address of stack and heap is randomized. Therefore, ASLR was disabled for generating and testing all exploits except jump-to-register exploits.

在文檔中自動產生攔截控制流程之攻擊程式碼 (頁 46-50)