5 Implementation
The second parameter specifies the number of milliseconds to wait for a debug event. If no debug event occurs within this time, the function times out and returns FALSE. If a debug event occurs, the function returns TRUE and stores information about the event, including its type, in the DEBUG_EVENT structure. We then check the event type. If the event corresponds to INT 3, we perform the call target validation described in Section 5. After the validation code has run, we call ContinueDebugEvent() to resume thread execution and wait for the next event.
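The wait, check, and continue cycle described above can be sketched as follows. This is a minimal sketch rather than the tool's actual code: is_int3_event(), validate_call_target(), and debug_loop() are hypothetical names introduced for illustration, and the loop assumes the debugger is already attached to the target. The Win32-specific part is guarded so that the small classification helper also builds on other platforms.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Values as defined in the Windows SDK, repeated so the helper builds elsewhere. */
#define EXCEPTION_DEBUG_EVENT 1
#define EXCEPTION_BREAKPOINT  0x80000003u
#endif

/* Returns 1 if the debug event is the breakpoint exception raised by INT 3. */
static int is_int3_event(uint32_t event_code, uint32_t exception_code)
{
    return event_code == (uint32_t)EXCEPTION_DEBUG_EVENT &&
           exception_code == (uint32_t)EXCEPTION_BREAKPOINT;
}

#ifdef _WIN32
/* Hypothetical hook standing in for the call target validation of Section 5. */
static void validate_call_target(const DEBUG_EVENT *ev)
{
    (void)ev;
}

/* Waits for debug events and resumes the debuggee after each one. */
static void debug_loop(DWORD timeout_ms)
{
    DEBUG_EVENT ev;
    for (;;) {
        /* FALSE is treated as a timeout; this sketch does not
         * distinguish other failure causes. */
        if (!WaitForDebugEvent(&ev, timeout_ms))
            continue;

        DWORD status = DBG_CONTINUE;
        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
            DWORD code = ev.u.Exception.ExceptionRecord.ExceptionCode;
            if (is_int3_event(ev.dwDebugEventCode, code))
                validate_call_target(&ev);
            else
                status = DBG_EXCEPTION_NOT_HANDLED; /* leave it to the debuggee */
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
    }
}
#endif
```

Keeping the event test in a separate function makes the INT 3 check easy to verify without a running debuggee.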
5.4 Experience and Further Discussion
When implementing this instrumentation tool, we encounter some issues that are not straightforward to overcome. We address these issues in this sub-section and describe our solutions and experience.
5.4.1 Stack Region
When performing stack frame backtracing, we need to determine when to stop following the frame pointer. The straightforward idea is that the frame pointer should never point to an address outside the stack region.
At first, we tried to use the VirtualQueryEx() API to retrieve the metadata of the stack region. It provides information about a region of consecutive pages, beginning at a specified address, that share the same attributes. VirtualQueryEx() determines the attributes of the first page in the region and then scans subsequent pages until it has scanned the entire range of pages or until it encounters a page with a non-matching set of attributes. Because we wrongly assumed that the whole stack region shares the same attributes, we made a serious mistake in determining the upper boundary of the stack. As a result, this first implementation did not traverse the whole stack and missed many stack frames that should have been checked.
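The pitfall can be illustrated with a small simulation of the scanning rule. This is only a sketch: the attribute tags and region_pages() are hypothetical stand-ins, not Win32 definitions, and the page layout in the usage note is an assumed example rather than a measured one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page attribute tags for a simulated stack reservation. */
enum page_attr { ATTR_RESERVED, ATTR_GUARD, ATTR_COMMITTED };

/* Counts the pages, starting at index `first`, that share the attributes of
 * page `first`. The scan stops at the first non-matching page or at the end
 * of the range, mirroring the behaviour described for VirtualQueryEx(). */
static size_t region_pages(const enum page_attr *pages, size_t count,
                           size_t first)
{
    assert(pages != NULL && first < count);
    size_t n = 1;
    while (first + n < count && pages[first + n] == pages[first])
        n++;
    return n;
}
```

For an assumed layout of two reserved pages, one guard page, and two committed pages, a query starting at any page reports at most two pages, so no single query covers the whole five-page stack.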
Our solution is to use the Thread Information Block (TIB) to decide when to stop backtracing the frame pointer. The TIB is a key system data structure in Microsoft Windows that holds a large amount of thread-related data, including a pointer to the thread’s structured exception handler list, the location of the thread’s stack, and the location of the thread local storage. Each thread in the system has its own TIB.
In all Intel-based Win32 implementations, the FS register points to the TIB. To obtain the information stored in the TIB, we therefore look at what the FS register points to. For example, FS:[0] points to the structured exception handling chain, while FS:[2Ch] points to the thread’s local storage array. The information we need to determine the stack region is held in the pvStackUserTop and pvStackUserBase fields of the TIB. The 04h DWORD pvStackUserTop field contains the linear address of the topmost address of the thread’s stack. The thread should never have a stack pointer value that is greater than or equal to the value of this field. The 08h DWORD pvStackUserBase field contains the linear address of the lowest committed page in the thread’s user mode stack. As the thread uses successively lower addresses in the stack, those pages are committed, and this field is updated accordingly. The 18h DWORD ptibSelf field holds the linear address of the TIB. We use this field to access pvStackUserTop and pvStackUserBase. The following code demonstrates how to access this system data structure.
PTIB pTIB;
__asm {
    mov EAX, FS:[18h]   // ptibSelf: linear address of the TIB
    mov [pTIB], EAX     // store it in the C pointer
}
We can therefore use pTIB->pvStackUserTop and pTIB->pvStackUserBase as the boundaries when performing stack frame backtracing.
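A bounded frame pointer walk of this kind might look like the sketch below. It runs on a simulated 32-bit stack rather than on a live thread, so it can be tried anywhere. The types sim_stack and frame and the function walk_frames() are hypothetical names, and the extra alignment and monotonicity checks are assumptions added here for robustness, not steps taken from the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A simulated 32-bit stack: words[i] lives at address base + 4*i. */
struct sim_stack {
    uint32_t base;          /* plays the role of pvStackUserBase */
    uint32_t top;           /* plays the role of pvStackUserTop  */
    const uint32_t *words;
};

struct frame { uint32_t saved_ebp; uint32_t ret; };

static uint32_t read_word(const struct sim_stack *s, uint32_t addr)
{
    return s->words[(addr - s->base) / 4];
}

/* Follows the saved-EBP chain starting at `ebp` and records one
 * (saved EBP, return address) pair per frame. The walk stops when the
 * frame pointer leaves [base, top), is misaligned, or does not move to
 * a higher address. Returns the number of frames recorded. */
static size_t walk_frames(const struct sim_stack *s, uint32_t ebp,
                          struct frame *out, size_t max)
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (n < max && ebp % 4 == 0 &&
           ebp >= s->base && ebp < s->top && s->top - ebp >= 8) {
        out[n].saved_ebp = read_word(s, ebp);
        out[n].ret = read_word(s, ebp + 4);
        n++;
        if (out[n - 1].saved_ebp <= ebp)
            break;                      /* chain must move up the stack */
        ebp = out[n - 1].saved_ebp;
    }
    return n;
}
```

On a real thread the two bounds would come from the TIB fields instead of a hand-built structure.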
5.4.2 Stack Evolvement After Instrument
We need to explain further how the stack evolves once our code is instrumented. The instrument library replaces certain functions at runtime. It allocates space for a stub and places in the stub the instructions used to perform stack frame backtracing. The instruction that calls the monitor function adds a stack frame to the stack, and this stack frame is not our concern.
The instrument library inserts a JMP instruction into the prologue and into the epilogue. The JMP instruction inserted into the prologue jumps to the following stub code:
PUSH addr
CALL Monitor_Function
ADD ESP, 4
// Instructions that are recognized by the parser and moved from the original prologue
PUSH EBP
MOV EBP, ESP
……
// Jump back to the next instruction after the prologue recognized by the parser
JMP Next_inst_after_prologue
The PUSH addr instruction passes the parameter addr, which is the address of the prologue, to Monitor_Function, but it also adds 4 bytes to the stack.
Afterward, the CALL Monitor_Function instruction pushes the return address of the monitor function onto the stack. Once the monitor function is entered, its saved base pointer is also pushed onto the stack. Therefore, in the monitor function we access the return address of the wrapped function by adding 12 bytes to EBP, as follows:
unsigned long ret = *(unsigned long *)(EBP+12);
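The 12-byte offset can be checked by replaying the pushes on a simulated descending stack. The sketch below assumes that the monitor function begins with the usual PUSH EBP / MOV EBP, ESP sequence; sim_cpu, push(), load(), and wrapped_return_address() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A tiny descending stack of 32-bit words; esp and ebp are byte offsets. */
struct sim_cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
    uint32_t ebp;
};

static void push(struct sim_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

static uint32_t load(const struct sim_cpu *c, uint32_t offset)
{
    assert(offset % 4 == 0 && offset / 4 < STACK_WORDS);
    return c->mem[offset / 4];
}

/* Replays the pushes made between the call of the wrapped function and the
 * body of the monitor function, then reads the word at [EBP+12]. */
static uint32_t wrapped_return_address(uint32_t wrapped_ret,
                                       uint32_t prologue_addr,
                                       uint32_t stub_ret,
                                       uint32_t caller_ebp)
{
    struct sim_cpu c = { {0}, STACK_WORDS * 4, caller_ebp };
    push(&c, wrapped_ret);    /* CALL of the wrapped function by its caller */
    push(&c, prologue_addr);  /* PUSH addr in the stub                      */
    push(&c, stub_ret);       /* CALL Monitor_Function                      */
    push(&c, c.ebp);          /* PUSH EBP in the monitor's prologue         */
    c.ebp = c.esp;            /* MOV EBP, ESP                               */
    return load(&c, c.ebp + 12);
}
```

In this layout [EBP] holds the saved base pointer, [EBP+4] the return address into the stub, [EBP+8] the addr parameter, and [EBP+12] the return address of the wrapped function.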
Similarly, the JMP instruction inserted into the epilogue jumps to the following stub code:
...
PUSH addr
CALL Monitor_Function
ADD ESP, 4
// Instructions that are recognized by the parser and moved from the original epilogue
...
POP EBP
RETN
The stack evolves in the epilogue much as it does in the prologue. Therefore, the return address and saved base pointer of the wrapped function are accessed in the same way as in the prologue, and this access is likewise not trivial.
5.4.3 Corrupt Site Approximation
Because some functions lack sufficient space for a JMP instruction to be instrumented into their prologue and epilogue, we do not wrap all the typical functions in the target program. Some corrupt site approximation is therefore worth discussing as a way to increase the precision of corrupt site identification.
For a given wrapped function, and assuming that a “normal” stacktrace has been defined, the stacktraces performed in its prologue and epilogue fall into one of the situations below.
(1) If the stacktrace in the prologue is normal but the stacktrace in the epilogue is abnormal, it means that the stack is corrupted in this wrapped function.
(2) If the stacktraces in the prologue and epilogue are normal, it means that the stack is not yet corrupted.
(3) If the stacktrace in the prologue is abnormal, it means that the stack was corrupted in one of the previous functions, regardless of whether the stacktrace in the epilogue is normal.
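The three situations above amount to a small decision rule, sketched below. The enum and the function classify() are hypothetical names, and the two flags are assumed to come from whatever check decides that a stacktrace is “normal”.

```c
#include <stdbool.h>

enum verdict {
    NOT_CORRUPTED,           /* situation (2) */
    CORRUPTED_IN_THIS_FUNC,  /* situation (1) */
    CORRUPTED_EARLIER        /* situation (3) */
};

/* Maps the prologue and epilogue stacktrace results of one wrapped
 * function to the situations listed above. */
static enum verdict classify(bool prologue_normal, bool epilogue_normal)
{
    if (!prologue_normal)
        return CORRUPTED_EARLIER;   /* the epilogue result does not matter */
    return epilogue_normal ? NOT_CORRUPTED : CORRUPTED_IN_THIS_FUNC;
}
```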
Case 3 can be divided into two situations.
(i) If the stacktrace in the current wrapped function’s prologue and the stacktrace in the previous wrapped function’s prologue differ by exactly one saved base pointer / return address pair, as shown below, it means that the corruption occurred in the previous wrapped function.
Stacktrace in previous wrapped function’s prologue:
(EBP1,RET1),(EBP2,RET2),…,(EBPn,RETn)
Stacktrace in current wrapped function’s prologue:
(EBP1,RET1),(EBP2,RET2), …,(EBPn,RETn),(EBPn+1,RETn+1)
(ii) If the stacktrace in the current wrapped function’s prologue and the stacktrace in the previous wrapped function’s prologue differ by more than one saved base pointer / return address pair, as shown below, it means that the corruption occurred in one of the previous unwrapped functions.
Stacktrace in previous wrapped function’s prologue:
(EBP1,RET1), …,(EBPn,RETn)
Stacktrace in current wrapped function’s prologue:
(EBP1,RET1),…,(EBPn,RETn),(EBPn+1,RETn+1),(EBPn+2,RETn+2),(EBPn+3,RETn+3)
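The comparison in (i) and (ii) can be sketched as a prefix check followed by a count of the extra pairs. The names frame_pair and approximate_site() are hypothetical, and the traces are assumed to be stored in the order shown above, with the shared frames first and the newest frames last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame_pair { uint32_t ebp; uint32_t ret; };

enum site {
    SITE_UNKNOWN,            /* traces do not share the expected prefix */
    SITE_PREVIOUS_WRAPPED,   /* situation (i): exactly one extra pair   */
    SITE_UNWRAPPED_BETWEEN   /* situation (ii): more than one extra pair */
};

/* Compares the prologue stacktrace of the previous wrapped function
 * (prev) with that of the current wrapped function (cur). */
static enum site approximate_site(const struct frame_pair *prev, size_t prev_len,
                                  const struct frame_pair *cur, size_t cur_len)
{
    assert(prev != NULL && cur != NULL);
    if (cur_len <= prev_len)
        return SITE_UNKNOWN;
    for (size_t i = 0; i < prev_len; i++) {
        if (prev[i].ebp != cur[i].ebp || prev[i].ret != cur[i].ret)
            return SITE_UNKNOWN;
    }
    return (cur_len - prev_len == 1) ? SITE_PREVIOUS_WRAPPED
                                     : SITE_UNWRAPPED_BETWEEN;
}
```

In situation (ii) the extra pairs, other than the last one, belong to the unwrapped functions that sit between the two wrapped ones.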
If we can retrieve the function entries corresponding to these differing stack frames, we can use another method, such as a software interrupt, to wrap these functions and identify the exact corrupt site. In this way, we can increase the precision of corrupt site identification.