
2 Related Work

2.6 Replay and Debug Parallel Programs

A general-purpose debugger is not suitable for debugging parallel programs. Ronsse et al. explain the relationship between execution replay and debugging [21]. The main problem is that parallel programs are non-deterministic: each run of the program, even with the same input, may produce a different execution. This is the so-called non-determinism property of parallel programs. For example, the program itself does not determine the order in which processes acquire a semaphore; that order depends on how the processes compete at runtime. Therefore, when debugging a parallel program, the details of an execution should be recorded so that the debugger can reproduce that execution later. The purpose is to remove the non-determinism introduced at execution time.

3 Win32 API Hooking Techniques

API call interception is the groundwork of our system. The ability to control API function calls is extremely helpful because it lets developers track the internal actions that happen during an API call; this is why the title of this work uses the word “instrument”. The purpose of API call interception is to take control of some execution code: a so-called “stub” forces the target application to execute the injected code. The injected code can then monitor the program through parameter logging, return value checking, stack dumps, frame pointer tracing, and so on. We investigated several API hooking techniques, which are discussed in detail below.

There are two roles in a Win32 API hooking system. The Hook Server injects the Hook Driver, i.e. the spying DLL code, into the address space of the target application at an appropriate time. The Hook Driver, once injected into the target process's address space, performs the interception work. The Hook Server usually communicates with the Hook Driver and retrieves information from it while the Driver performs the interception. Below, the injection and interception techniques we surveyed are introduced separately, corresponding to the Server's and the Driver's work.

3.1 Injection

Dynamic-link libraries (DLLs) are the structural elements of Microsoft Windows. They are separate files containing functions that programs can call to perform certain jobs, and they can be regarded as extensions of the application programs. Our spying code is written in a DLL. In Win32, each process has its own address space and its own set of loaded DLLs. Before a program can call a function in a DLL, the DLL's file image must be mapped into the address space of the calling thread's process. This raises the problem: how can the target application use the functions in our spying DLL?

Since the target application has no information about our spying DLL, we need some tricks to force the DLL into the target process to perform the interception. We studied three injection techniques.

3.1.1 Registry

A registry value records the names of the DLLs that the operating system loads into the address space of each process at process startup. We can simply add the DLL name to the following registry value:

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs

The DLLs listed in this value are loaded when USER32.DLL initializes: in its DllMain, USER32.DLL uses explicit linking, i.e. LoadLibrary() calls, to map these files into the address space. This is a tedious, manual way to inject the spy DLL into the target process, and it has the following disadvantages:

1. Windows has to be rebooted before the injection takes effect, which adds a lot of overhead to our experiments.

2. The spy DLL is injected into every process that uses USER32.DLL, so we have to add check code to the spy DLL to skip the processes we are not interested in. Furthermore, if the application of interest does not use USER32.DLL, as is the case for most console-based applications, this technique fails to inject the spy DLL.

3.1.2 Windows Hooks

Installing a Windows hook with SetWindowsHookEx() can also force a given DLL into the address space of the target processes. The hook is installed as follows:

HHOOK SetWindowsHookEx(int idHook,
                       HOOKPROC lpfn,
                       HINSTANCE hMod,
                       DWORD dwThreadId);

The first parameter indicates the type of hook procedure to be installed. The second parameter is a pointer to the hook procedure. The third parameter is the handle of the DLL containing the hook procedure. The last parameter specifies the thread to hook. The operating system automatically injects the DLL containing the hook procedure into the address space of every process affected by the hook.

The advantage of this method is that UnhookWindowsHookEx() can be used to unload the DLL when the hook is no longer needed. However, it still has some shortcomings:

1. API calls made by the target process before the hook is installed are missed.

2. The spy DLL is not actually loaded until some action performed by the target process triggers the hook procedure.

3. Windows hooks add the overhead of extra message processing, which degrades the performance of the whole system.

3.1.3 Remote Threads

We adopt this technique in this work. It is more flexible: it forces the target application process to call LoadLibrary() and load the spy DLL.

The difficulty is that we have no access to the target process's threads, so we cannot simply trick one of them into loading the DLL for us. To overcome this, we need a Win32 function that can affect other processes, and CreateRemoteThread() is such a function.

Its prototype is as follows:

HANDLE CreateRemoteThread(HANDLE hProcess,
                          LPSECURITY_ATTRIBUTES lpThreadAttributes,
                          SIZE_T dwStackSize,
                          LPTHREAD_START_ROUTINE lpStartAddress,
                          LPVOID lpParameter,
                          DWORD dwCreationFlags,
                          LPDWORD lpThreadId);

CreateRemoteThread() allows one process to create a thread that runs in the virtual address space of another process. Compared with CreateThread(), this API has just one more parameter, which specifies the process that will contain the newly created thread. The lpStartAddress parameter is the memory address of the thread function, which is executed after the remote thread has been created. We can use GetProcAddress() to retrieve the address of the LoadLibrary() API and use it as the thread function that loads our spy DLL. Because KERNEL32.DLL is mapped at the same address in every process, the address of LoadLibrary() obtained in our own process is also valid in the target process. The lpParameter argument is passed to LoadLibrary() as the DLL path, so it must point to a string inside the target process's address space; the path is therefore first copied into the target process, for example with VirtualAllocEx() and WriteProcessMemory(). In this way, the target process executes LoadLibrary() on our behalf.

Matt Pietrek's work does not use the CreateRemoteThread() API. Instead, it modifies the target process's memory and registers so that the process appears to call LoadLibrary() on its own. The advantage is that this method is portable to platforms that do not support CreateRemoteThread().

3.2 Interception

After the Hook Driver (spy DLL code) has been injected into the target process's address space, the next step is to intercept the API calls; the injected DLL is responsible for all the preparation needed for interception. The three interception techniques we surveyed are introduced below.

3.2.1 Modification of the Import Address Table

This technique is based on the fact that Win32 executable files and DLLs follow the well-structured Portable Executable (PE) file format, an extension of the Common Object File Format (COFF). A PE file consists of several logical chunks called sections, each storing a specific type of data. For example, the .text section contains all general-purpose code produced by the compiler or assembler, and the .edata section lists the functions and data that the PE file exports to other modules.

To implement API interception, we focus on the .idata section, which contains information about the functions that the module imports from other DLLs. An important table in this section, the Import Address Table (IAT), initially contains relative virtual addresses (RVAs) pointing to the names of the imported functions referenced by the executable's code. When the program is loaded into memory, the loader overwrites the entries in the IAT with the real addresses of the imported functions. Redirecting an imported call therefore only requires overwriting the corresponding IAT entry with the address of a wrapper function.

Figure 3 shows the process of calling a function in another module. When a program calls a function in another module (for example, GetMessage in USER32.DLL), the CALL instruction generated by the compiler does not transfer control directly to the function in the DLL. Instead, it transfers control to a JMP DWORD PTR [00040042] instruction in the .text section. This JMP instruction jumps indirectly through a DWORD variable in the .idata section, and that DWORD contains the real address of the API function's entry point.

[Figure 3: in the application program's .text section, CALL 00014408 (call to GetMessage) reaches the JMP DWORD PTR [00040042] instruction; the .idata (import table) DWORD at 0x00040042 holds 0x77879426, the address of the GetMessage code in USER32.DLL. intercept.DLL also appears in the figure.]

Figure 3 The process of calling a function in another module

3.2.2 API patch

This method directly modifies the API function itself. One approach is to replace the first byte of the target API with a breakpoint interrupt instruction (INT 3). Any call to the target API then raises a breakpoint exception, and the operating system notifies the API interceptor, which acts as a debugger of the target process, so that it can handle the exception. The shortcoming of this approach is the overhead of the Windows exception handling mechanism.

Figure 4 The interception process

Another approach to API patching is to overwrite the first few bytes of the target API with a control-transfer (JMP) instruction. Our process-rewriting mechanism for hooking user functions, described in Section 5, also adopts this approach. Detours is an implementation of the API patch mechanism, and Figure 4 explains the interception process using the terms of Detours. When execution reaches the target function, control jumps directly to the user-supplied detour function. After the detour function performs the interception work, it calls the trampoline function, which consists of the initial instructions of the target function followed by a jump to the remainder of the target function.

3.3 The Comparison of API Interception Works

Existing interception works differ in how they inject the user's DLL into the target process and in their interception mechanisms, since they serve different purposes. Table 1 compares the interception works we surveyed. After considering how the added monitor function changes the stack frames and how complete each API interception mechanism is, we chose Detours as the framework of this work.

Table 1 The comparison of API interception techniques

Watchd [13] | Detours [10] | API-SPY [18] | Intel [26]

Ways to intercept API
Modify the IAT to point to the wrapper function, which then jumps back to the real target function.
Modify the IAT to point to the checkpoint wrapper, which jumps back to the real target function after logging.

Are other processes affected?
No (copy-on-write) | No (copy-on-write) | No | No

Ways to inject DLL

Ways to launch
Use their own loader to load app.exe
In the logging routine of the stub
In the wrapper function

Ways to get the return value
Get the return of the target function directly in the
to get the return value.
(Using the “return address stack”)
N/A

Can determine which API to intercept by the configuration file

4 Research Method

Our research uses the following approaches to reveal and analyze the crash process as precisely as possible.

4.1 Control Flow Anomaly Detection

When a program crashes, programmers and attackers alike are eager to find the bug. There are two main causes of a crash. The first is accessing data at an invalid address, for example a null pointer assignment. The second is transferring control to an invalid address, often as the result of a buffer overflow. The latter is the more serious of the two.

In the latter case, control of the program can be transferred by overwriting the following data:

(1) Return address: Corruption of this data is a stack-based control flow anomaly and is detected by our stack corrupt site identification mechanism.

When the current function returns, the program transfers control to the code designated by the return address. By overwriting the return address, an attacker can make the program jump to any location in the process once the function returns, thereby hijacking the control flow. This is the most popular target of buffer overflow exploits.

(2) Saved base pointer: Corruption of this data is also a stack-based control flow anomaly and is detected by our stack corrupt site identification mechanism.

The saved base pointer points to the previous stack frame. If it is overwritten, the caller runs with a fake frame after the current function returns, and when the caller itself returns, control jumps to the fake return address in that frame.

(3) Function pointer: This data may reside in the stack or the heap, and its corruption is detected by our call target validation mechanism. If a function pointer is overwritten, the next call through it jumps to an arbitrary location. Overwriting a virtual function pointer in the heap is also a common exploit against C++ programs.

All the control flow anomalies caused by overwriting the data above are detected by the following two mechanisms. Their limitations are described in Section 6.

4.1.1 Stack Corrupt Site Identification

When a program crashes, inspecting it with a debugger reveals the instruction at which it stopped running. The point where the program stops abnormally is the crash site. When a stack-based overflow occurs, the stack becomes “corrupt” because the saved base pointer and the return address of some function are overwritten. This point, where the stack first becomes abnormal, is the stack corrupt site.

Afterwards, the program will either crash or be exploited. The goal of stack corrupt site identification is to locate the corrupt site as precisely as possible, right after the control flow of the program has been changed.

Figure 5 shows a sample program that demonstrates the distinction between the crash site and the stack corrupt site. Function main passes a pointer to its local buffer buff to function a, and function a passes it on to function b. In function b, once strcpy() has copied the overlong string into main's local buffer, the stack is corrupt. However, the program does not crash until function main returns. A debugger, which only shows where the program stopped, therefore cannot tell how far the stack corrupt site is from the crash site.

#include <stdio.h>
#include <string.h>   /* strcpy() */

void b(char *buff) {
    strcpy(buff, "AAAAAAAAAAAAAAAAAAAAAA");   /* overlong string for the 4-byte buffer */
    /* stack corrupt site */
    /* ... */
}

void a(char *buff) {
    b(buff);
}

int main(void) {
    char buff[4];
    a(buff);
    return 0;
}   /* crash site */

Figure 5 The sample program to demonstrate the crash site and the corrupt site

In the following sub-sections, the mechanism to identify the stack corrupt site is described.

4.1.1.1 Pertinent Registers to a Stack

Understanding stack operations requires some x86 assembly language knowledge. Three registers are pertinent to stack operations: EIP, EBP and ESP.

EIP is the extended instruction pointer. It holds the address of the next instruction to be executed. When a function is called, the address of the instruction following the CALL is pushed on the stack; this saved EIP is called the return address (RET). When the function exits, control flow returns to RET. ESP is the extended stack pointer. It points to the current top of the stack. When push or pop instructions add data to or remove data from the stack, ESP changes accordingly; ESP can also be changed by direct stack pointer manipulation. Finally, EBP is the extended base pointer. It is used to access stack data such as the local variables and parameters of a function at fixed offsets, and it stays the same throughout the execution of the function.

4.1.1.2 Stack Frame Backtracing

Stack frame backtracing relies on the fact that each saved base pointer points to the previous saved base pointer on the stack. Typically, the function prologue allocates space on the stack for local variables. The following short disassembly shows how a compiler typically implements this allocation.

// function prologue
PUSH EBP        // save old frame pointer
MOV  EBP, ESP   // the current EBP points to the saved EBP
SUB  ESP, X     // allocate X bytes for stack variables

The old EBP is pushed on the stack, and then ESP, which now points to the top of the stack where the old EBP was just saved, is copied into EBP. That is, the current EBP points to the previous saved EBP. If we keep following the saved EBPs, the tracing eventually reaches the saved EBP of the main function. We use stack frame backtracing to verify that the call stack is sound and, furthermore, to identify the stack corrupt site when a stack-based overflow occurs.

We now define the term “stacktrace”. In Figure 6, function A invokes function B, so the stack frame of function A is at a higher address and the stack frame of function B is at a lower address. Assume that the EBP register points to the saved base pointer of function B. Stack frame backtracing then generates a stacktrace consisting of {(SavedEBP, RET)_B, (SavedEBP, RET)_A, …, (SavedEBP, RET)_Main}. The order of this sequence mirrors the call chain in reverse: main calls some other functions, those functions eventually call function A, and function A calls function B.

[Figure 6: the stack with higher addresses at the top. (2) FuncB's saved base pointer points to FuncA's saved base pointer. (3) The current EBP register points to FuncB's saved base pointer.]

Figure 6 The operation of stack frame backtracing

To perform detection, we insert a monitor function into both the prologue and the epilogue of each function. The monitor function performs the following steps:

(1) In the prologue:

‧Save all registers

‧Perform stack frame backtracing starting from the current EBP

‧Restore all registers

(2) In the epilogue:

‧Save all registers

‧Perform stack frame backtracing starting from the current EBP

‧Compare the resulting stacktrace with the prologue's stacktrace and report any difference

‧Restore all registers

To detect stack corruption, we compare the stacktraces generated in a function's prologue and epilogue. If they differ, some stack buffer must have been written beyond its bounds during the function's execution, overwriting the return address or the saved EBP of one of the frames.

4.1.2 Call Target Validation

This mechanism is designed for control flow anomalies resulting from overwritten function pointers. We instrument the application process at every CALL instruction. With this instruction-level instrumentation, we ensure that each CALL instruction transfers control to a normal function entry.

We implement this instrumentation with software interrupts. We overwrite the first byte of each CALL instruction with the breakpoint interrupt instruction (INT 3) and install a corresponding exception handler. When an INT 3 instruction is executed, it raises a breakpoint exception, and the handler gains control to perform call target validation. After the validation, we restore the original EIP and CALL instruction.

According to how the CALL target is determined, we divide the CALL

From the document: Design and Implementation of a Dynamic Analysis System for Out-of-Control Programs (page 20-0)