Stack Corrupt Site Identification - Control Flow Anomaly Detection

4 Research Method

4.1 Control Flow Anomaly Detection

4.1.1 Stack Corrupt Site Identification

When a program crashes, inspecting it with a debugger reveals the instruction at which execution stopped. The point where the program stops running abnormally is the crash site. When a stack-based overflow occurs, the stack becomes corrupt: the saved base pointer and the return address belonging to some function are overwritten. The point where the stack first becomes abnormal is the stack corrupt site.

At some later time, the program will either crash or be exploited. The goal of stack corrupt site identification is to locate the corrupt site as precisely as possible, right after the control flow of the program has been changed.

Figure 5 is a sample program that demonstrates the distinction between the crash site and the stack corrupt site. Function main passes a pointer to its local buffer buff to function a, which in turn passes it to function b. In function b, once strcpy() finishes copying the overlong string into main’s local buffer, the stack is corrupt. However, the program does not crash until function main returns.

The debugger reports only the crash site; it cannot tell how far the crash site lies from the stack corrupt site.

#include <stdio.h>
#include <string.h>   /* strcpy */

void b(char *buff){
    strcpy(buff, "AAAAAAAAAAAAAAAAAAAAAA"); /* overlong string */
    /* stack corrupt site */
    ...
}

void a(char *buff){
    b(buff);
}

int main(void){
    char buff[4];   /* far too small for the string copied in b */
    a(buff);
    return 0;
} /* crash site */

Figure 5 The sample program to demonstrate the crash site and the corrupt site
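To make the corruption in Figure 5 concrete, the following minimal sketch models main's frame as a plain struct instead of touching the real machine stack. It assumes a 32-bit x86 layout in which the 4-byte buffer sits directly below the saved EBP and the return address. The names frame_model, overflow_model, and frame_is_corrupt are hypothetical and do not appear in the original program, and real compilers may pad or reorder locals, so this is an illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of main's frame in Figure 5 (assumed 32-bit x86 layout):
 * the local buffer sits directly below the saved EBP and the return address. */
struct frame_model {
    char     buff[4];    /* main's local buffer */
    uint32_t saved_ebp;  /* caller's frame pointer */
    uint32_t ret;        /* return address (RET) */
};

/* Write len bytes of 'A' starting at buff. The length is clamped to the
 * size of the model so the demonstration itself stays memory-safe. */
void overflow_model(struct frame_model *f, size_t len)
{
    unsigned char fill[sizeof *f];

    if (len > sizeof *f)
        len = sizeof *f;
    memset(fill, 'A', sizeof fill);
    memcpy(f, fill, len);  /* bytes past buff[3] land in saved_ebp, then ret */
}

/* Return 1 if either control field differs from its original value. */
int frame_is_corrupt(const struct frame_model *f, uint32_t ebp0, uint32_t ret0)
{
    return f->saved_ebp != ebp0 || f->ret != ret0;
}
```

In this model, a 4-byte copy leaves both control fields intact, while a longer copy starts overwriting saved_ebp and then ret. That moment corresponds to the stack corrupt site, even though nothing fails until the frame's owner returns.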

In the following sub-sections, the mechanism to identify the stack corrupt site is described.

4.1.1.1 Pertinent Registers to a Stack

To understand how the stack operates, some knowledge of x86 assembly language is needed. Normally, three registers are pertinent to stack operations: EIP, EBP, and ESP.

EIP is the extended instruction pointer. It holds the address of the next instruction to be executed. When a function is called, the address of the instruction following the call is pushed onto the stack; we call this saved EIP the return address (RET). When the function exits, control flow returns to RET and execution continues from there. ESP is the extended stack pointer. It points to the current top of the stack. A push or pop instruction that adds or removes data on the stack changes ESP accordingly, and ESP can also be changed by direct stack pointer manipulation. Finally, EBP is the extended base pointer. It is used to access stack data such as local variables at fixed offsets within a function, and it should remain unchanged throughout the lifetime of the function.
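The interplay of EIP and ESP during a call can be sketched with a toy machine. The block below is a simulation under stated assumptions, not real x86 semantics in full: the struct cpu type, the cpu_* function names, and the 64-word stack are all hypothetical, and no bounds checking is done.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine (hypothetical): three registers plus a small stack.
 * Byte address a maps to stack[a / 4]; the stack grows downward. */
struct cpu {
    uint32_t eip, esp, ebp;
    uint32_t stack[STACK_WORDS];
};

void cpu_init(struct cpu *c, uint32_t entry)
{
    c->eip = entry;
    c->esp = STACK_WORDS * 4;  /* empty stack: ESP just past the highest word */
    c->ebp = 0;
}

/* PUSH: decrement ESP, then store. No bounds checking in this sketch. */
void cpu_push(struct cpu *c, uint32_t v)
{
    c->esp -= 4;
    c->stack[c->esp / 4] = v;
}

/* POP: load, then increment ESP. */
uint32_t cpu_pop(struct cpu *c)
{
    uint32_t v = c->stack[c->esp / 4];
    c->esp += 4;
    return v;
}

/* CALL: push the address of the instruction after the call (RET), then jump.
 * call_len is the assumed encoded length of the call instruction. */
void cpu_call(struct cpu *c, uint32_t target, uint32_t call_len)
{
    cpu_push(c, c->eip + call_len);
    c->eip = target;
}

/* RET: pop the saved return address back into EIP. */
void cpu_ret(struct cpu *c)
{
    c->eip = cpu_pop(c);
}
```

After cpu_call, the word at the top of the simulated stack is the return address, and cpu_ret transfers control to whatever that word holds at the time. This is why overwriting it changes the control flow.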

4.1.1.2 Stack Frame Backtracing

Stack frame backtracing exploits the fact that each saved base pointer points to the previous saved base pointer on the stack. Typically, the function prologue allocates space on the stack for local variables. The following short disassembly shows how a compiler typically implements the allocation of stack variables.

// function prologue
PUSH EBP        // save old frame pointer
MOV EBP, ESP    // the current EBP points to the saved EBP
SUB ESP, X      // stack variables allocation with X bytes

The old EBP is pushed onto the stack, and then the EBP register is overwritten with the value of the stack pointer, which points to the top of the stack. That is, the current EBP points to the previous saved EBP. If we repeatedly follow the saved EBP, the trace eventually reaches the saved EBP of the main function. We use stack frame backtracing to verify that the call stack is sound and, furthermore, to identify the stack corrupt site when a stack-based overflow occurs.
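The effect of the prologue on EBP and ESP can be traced in a small simulation. This is a sketch under assumptions: struct machine and the function names are hypothetical, the stack is a 64-word array, and the epilogue shown (MOV ESP, EBP followed by POP EBP) is the standard counterpart of the prologue rather than something listed in the disassembly above.

```c
#include <stdint.h>

#define WORDS 64

/* Hypothetical model: two registers and a downward-growing stack.
 * Byte address a maps to mem[a / 4]. */
struct machine {
    uint32_t esp, ebp;
    uint32_t mem[WORDS];
};

void m_push(struct machine *m, uint32_t v)
{
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

/* PUSH EBP; MOV EBP, ESP; SUB ESP, X */
void prologue(struct machine *m, uint32_t locals_bytes)
{
    m_push(m, m->ebp);       /* save old frame pointer */
    m->ebp = m->esp;         /* EBP now points at the saved EBP */
    m->esp -= locals_bytes;  /* reserve X bytes for locals */
}

/* MOV ESP, EBP; POP EBP (executed before RET) */
void epilogue(struct machine *m)
{
    m->esp = m->ebp;                /* discard locals */
    m->ebp = m->mem[m->esp / 4];    /* restore caller's frame pointer */
    m->esp += 4;
}

/* The word EBP points at: the caller's saved EBP. */
uint32_t saved_ebp(const struct machine *m)
{
    return m->mem[m->ebp / 4];
}

/* The word just above it: the return address pushed by CALL. */
uint32_t saved_ret(const struct machine *m)
{
    return m->mem[m->ebp / 4 + 1];
}
```

If the return address is pushed first (as CALL would do) and prologue is then run, saved_ebp yields the caller's EBP and saved_ret yields the return address. Those are the two words that a backtrace reads at every frame.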

We now define the term “stacktrace”. In Figure 6, function A invokes function B. Therefore, the stack frame of function A lies at the higher address and the stack frame of function B at the lower address. Now assume that the EBP register points to the saved base pointer of function B. Performing stack frame backtracing from there generates a stacktrace, which comprises {(SavedEBP, RET)B, (SavedEBP, RET)A, …, (SavedEBP, RET)Main}. This sequence is easy to understand: the main function calls some other functions, which eventually call function A, and function A then calls function B.

[Figure 6 diagram: stack layout with high addresses at the top. Surviving annotations: (2) FuncB’s saved base pointer points to FuncA’s saved base pointer. (3) The current EBP register points to FuncB’s saved base pointer.]

Figure 6 The operation of stack frame backtracing
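The generation of a stacktrace by following the saved-EBP chain might look like the sketch below. It walks a simulated stack (an array of 32-bit words) rather than the live process stack, so it can run anywhere. The names trace_entry and backtrace_frames are hypothetical, and the convention that a saved EBP of 0 marks the outermost frame is an assumption of this sketch, not something stated in the text.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS 64   /* simulated stack size in 32-bit words */

/* One stacktrace entry: the (SavedEBP, RET) pair of a frame. */
struct trace_entry {
    uint32_t saved_ebp;
    uint32_t ret;
};

/* Walk the saved-EBP chain starting at ebp.
 * mem models the stack; byte address a maps to mem[a / 4].
 * The walk stops when the chain is misaligned, leaves the simulated
 * stack, stops moving toward higher addresses, or max entries are used.
 * Returns the number of entries written to out (innermost frame first). */
size_t backtrace_frames(const uint32_t *mem, uint32_t ebp,
                        struct trace_entry *out, size_t max)
{
    size_t n = 0;

    while (n < max && ebp % 4 == 0 && ebp / 4 + 1 < WORDS) {
        uint32_t next = mem[ebp / 4];     /* saved EBP of the caller */

        out[n].saved_ebp = next;
        out[n].ret = mem[ebp / 4 + 1];    /* return address above it */
        n++;
        if (next <= ebp)   /* callers must live at higher addresses */
            break;
        ebp = next;
    }
    return n;
}
```

For a chain Main, A, B, starting from B's frame pointer yields three entries in the order B, A, Main, matching the set notation used for the stacktrace above.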

We first insert a monitor function into each function’s prologue and epilogue to perform the detection, and we have to ensure that this monitor function leaves the program’s own register state intact. The operations the monitor function performs in the function’s prologue and epilogue are as follows:

(1) In the prologue:
‧Saving all the registers
‧Using the current EBP to perform stack frame backtracing
‧Restoring all the registers

(2) In the epilogue:
‧Saving all the registers
‧Using the current EBP to perform stack frame backtracing
‧Comparing the stacktrace with the prologue’s stacktrace and pointing out the difference
‧Restoring all the registers

To detect stack corruption, we compare the stacktraces generated in a given function’s prologue and epilogue. If the stacktraces differ, some stack buffer must have been written out of bounds while the function was executing, so that the return address or the saved EBP of some function has been overwritten.
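The comparison step can be sketched as a function that reports the first frame whose (SavedEBP, RET) pair changed between the prologue and the epilogue. The names trace_entry and first_corrupt_frame are hypothetical, and treating a length mismatch as corruption at the first missing index is a design assumption of this sketch rather than a rule taken from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* One stacktrace entry: the (SavedEBP, RET) pair of a frame. */
struct trace_entry {
    uint32_t saved_ebp;
    uint32_t ret;
};

/* Compare the stacktrace recorded in the prologue (pro) with the one
 * recorded in the epilogue (epi). Index 0 is the innermost frame.
 * Returns the index of the first entry that differs, the length of the
 * shorter trace if one trace is a strict prefix of the other, or -1 if
 * both traces are identical. */
long first_corrupt_frame(const struct trace_entry *pro, size_t pro_n,
                         const struct trace_entry *epi, size_t epi_n)
{
    size_t n = pro_n < epi_n ? pro_n : epi_n;

    for (size_t i = 0; i < n; i++) {
        if (pro[i].saved_ebp != epi[i].saved_ebp ||
            pro[i].ret != epi[i].ret)
            return (long)i;   /* this frame's control data was overwritten */
    }
    return pro_n == epi_n ? -1 : (long)n;
}
```

Applied to the program in Figure 5, the traces taken in b's prologue and epilogue would agree for b and a but differ at main's entry, which points at main's frame as the one whose saved EBP and RET were overwritten while b was running.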

From the document 程式失控動態分析系統設計與實作 (pages 30-33)