We implemented RELEASE for testing x86 ELF format binary executables.
Figure 3 illustrates RELEASE's system architecture. At the highest level, RELEASE consists of a static analyzer and a dynamic analyzer built on a dynamic binary instrumentation (DBI) platform. The following sections discuss each part of RELEASE in more depth, followed by a detailed description of the algorithm.
4.1 Static analyzer
The static analyzer is the main component for extracting information from the binary program. Moving workload from the dynamic analyzer to the static analyzer effectively reduces analysis overhead at run time. Our static analyzer is composed of a disassembler, a control flow constructor, and a static loop locator. Running them in sequence reveals the information in the binary program step by step.
Disassembler. RELEASE uses a disassembler to translate machine code into assembly code, which it then passes to the control flow constructor for the next step. Other information about the binary program is also important, such as the virtual address base, the sections, and the procedure linkage table (PLT). The virtual address base lets our system relate real addresses to virtual addresses. The section information tells RELEASE how to lay out each section. The PLT allows our dynamic binary instrumentation subsystem to map calls to external library functions correctly. In our implementation, we use two open source Linux tools for this purpose: the disassembler GNU objdump and the executable information tool GNU readelf, both at version 2.18.93.20081009.
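The disassembly step can be sketched as follows. This is a minimal illustration, not RELEASE's actual code: the function names `parse_objdump`, `disassemble`, and `elf_info` are hypothetical, and the parser assumes the usual tab-separated line layout of `objdump -d` output (address, raw bytes, mnemonic and operands).

```python
import re
import subprocess

# One instruction line of `objdump -d` output (assumed layout):
#   "<hex address>:\t<hex bytes, space separated>\t<mnemonic> <operands>"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t((?:[0-9a-f]{2} )+)\s*\t(\S+)\s*(.*)$")

def parse_objdump(text):
    """Turn objdump disassembly text into a list of instruction records."""
    insns = []
    for line in text.splitlines():
        m = INSN_RE.match(line)
        if not m:
            continue  # section headers, symbol labels, blank lines
        addr, raw, mnemonic, operands = m.groups()
        insns.append({
            "addr": int(addr, 16),
            "bytes": bytes.fromhex(raw.replace(" ", "")),
            "mnemonic": mnemonic,
            "operands": operands.strip(),
        })
    return insns

def disassemble(path):
    """Run `objdump -d` on an ELF file and parse the result."""
    out = subprocess.run(["objdump", "-d", path],
                         capture_output=True, text=True, check=True).stdout
    return parse_objdump(out)

def elf_info(path):
    """Return raw readelf text: ELF header (-h), sections (-S), segments (-l)."""
    return subprocess.run(["readelf", "-h", "-S", "-l", path],
                          capture_output=True, text=True, check=True).stdout
```

The parser is kept separate from the subprocess calls so that it can be exercised on captured disassembly text without the tools installed.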
Control flow constructor. After analyzing each instruction, the control flow constructor converts the assembly code into its own data structure and groups the instructions into “basic blocks,” each of which contains a sequence of instructions with a single entry and a single exit. RELEASE first divides the code into basic blocks by using a linear scan to search for control-flow-changing instructions such as call, conditional and unconditional jump, and return. RELEASE then takes a step that other frameworks, such as Valgrind[20], QEMU[21], and PIN[22], do not: each basic block is cut again at the target addresses of direct calls and jumps, which breaks it down into “atom blocks.” The main difference from other approaches is that RELEASE handles direct control-flow-changing instructions during static analysis rather than at run time, which fits our principle of moving workload from the dynamic analyzer to the static analyzer to reduce run-time analysis overhead. To unify the terminology, the phrase “basic block” is used to mean “atom block” in the rest of this paper.
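The two-stage splitting described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not RELEASE's data structure: the `Insn` record and both function names are hypothetical, and any mnemonic that is `call`, `ret`, or starts with `j` is treated as control-flow-changing, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Insn:
    addr: int
    mnemonic: str
    target: Optional[int] = None  # destination of a direct call/jump, if known

def is_control_flow(mnemonic: str) -> bool:
    # Simplifying assumption: call, ret, and every j* mnemonic end a block.
    return mnemonic in ("call", "ret") or mnemonic.startswith("j")

def split_basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Linear scan: close a block after each control-flow-changing instruction."""
    blocks, cur = [], []
    for insn in insns:
        cur.append(insn)
        if is_control_flow(insn.mnemonic):
            blocks.append(cur)
            cur = []
    if cur:
        blocks.append(cur)
    return blocks

def split_atom_blocks(blocks: List[List[Insn]]) -> List[List[Insn]]:
    """Cut each basic block again at every direct call/jump target address."""
    targets = {i.target for b in blocks for i in b if i.target is not None}
    atoms = []
    for block in blocks:
        cur = []
        for insn in block:
            if cur and insn.addr in targets:
                atoms.append(cur)  # a target in mid-block starts a new atom block
                cur = []
            cur.append(insn)
        atoms.append(cur)
    return atoms
```

For example, a block that spans addresses 0 to 6 and ends with a jump back to address 2 is cut into one atom block holding address 0 and another holding addresses 2 to 6, so the jump target always coincides with the start of a block.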
Figure 3: System architecture of RELEASE.
Static loop locator. Using the control flow information, loops formed by direct jumps can be identified easily. This takes advantage of the x86 instruction format: the operand of a direct conditional jump must be a relative offset. A relative offset is generally written as a label in assembly code, but at the machine code level it is encoded as a signed 8-bit or 32-bit immediate value that is added to the instruction pointer, so the jump target can be computed statically.
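As a sketch of how the relative offset can be used, the code below decodes the two direct conditional jump encodings (`Jcc rel8`, opcodes 0x70 to 0x7F, and `Jcc rel32`, opcodes 0x0F 0x80 to 0x0F 0x8F) and computes the target address. Treating a jump whose target is at or before the jump itself as a loop candidate is an assumption made for this illustration; the function names are hypothetical.

```python
import struct

def decode_direct_jcc(code: bytes, addr: int):
    """Return (length, target) for a direct conditional jump at `addr`, else None.

    The offset is relative to the address of the *next* instruction,
    i.e. the instruction pointer after the jump has been fetched.
    """
    if len(code) >= 2 and 0x70 <= code[0] <= 0x7F:           # Jcc rel8
        (rel,) = struct.unpack_from("<b", code, 1)            # signed 8-bit
        return 2, (addr + 2 + rel) & 0xFFFFFFFF
    if len(code) >= 6 and code[0] == 0x0F and 0x80 <= code[1] <= 0x8F:  # Jcc rel32
        (rel,) = struct.unpack_from("<i", code, 2)            # signed 32-bit
        return 6, (addr + 6 + rel) & 0xFFFFFFFF
    return None

def is_loop_candidate(code: bytes, addr: int) -> bool:
    """Assumption: a backward direct conditional jump marks a loop."""
    decoded = decode_direct_jcc(code, addr)
    return decoded is not None and decoded[1] <= addr
```

For instance, the bytes `75 fa` at address 0x8048406 encode `jne` with offset -6, giving the target 0x8048402, which lies before the jump and would be reported as a loop candidate.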
4.2 Dynamic Binary Instrumentation
Our dynamic binary instrumentation (DBI) framework has three components: a dispatcher, an emulator, and a just-in-time (JIT) compiler. Figure 4 shows the relationships among them. The instruction cache is inherited from the control flow constructor described in Section 4.1. The dispatcher selects a target basic block and passes it to the JIT compiler for execution. The JIT compiler fetches each instruction in the selected basic block and emulates its behavior. The emulation unit implements an emulated memory system that stores the emulation results. With this DBI framework, both the instructions of the original program and the user's instrumentation execute correctly.
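The dispatch loop can be pictured with the toy model below. It is only a sketch of the control structure: the three-operation instruction set (`mov`, `add`, `jlt`, plus `halt`), the `run` and `emulate_block` names, and the hook interface are all invented for illustration, and plain interpretation stands in for JIT compilation.

```python
def emulate_block(block, state):
    """Emulate every instruction in one block; return the next block address."""
    regs = state["regs"]
    for op, *args in block:
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] += args[1]
        elif op == "jlt":  # toy conditional jump: taken if reg < limit
            reg, limit, taken, fallthrough = args
            return taken if regs[reg] < limit else fallthrough
        elif op == "halt":
            return None
        else:
            raise ValueError(f"unknown toy opcode: {op}")
    raise ValueError("toy blocks must end with jlt or halt")

def run(blocks, entry, hooks=()):
    """Dispatcher: pick the next block from the cache, instrument, emulate."""
    state = {"regs": {}, "mem": {}}   # emulated registers and memory
    pc = entry
    while pc is not None:
        block = blocks[pc]            # instruction cache lookup
        for hook in hooks:            # user instrumentation runs per block
            hook(pc, state)
        pc = emulate_block(block, state)
    return state

# Usage: a three-iteration loop; the hook records which blocks were dispatched.
blocks = {
    0: [("mov", "ecx", 0), ("mov", "eax", 0), ("jlt", "ecx", 3, 1, 2)],
    1: [("add", "eax", 10), ("add", "ecx", 1), ("jlt", "ecx", 3, 1, 2)],
    2: [("halt",)],
}
trace = []
final = run(blocks, 0, hooks=[lambda pc, st: trace.append(pc)])
```

Because every block passes through the dispatcher, instrumentation attached there observes the complete sequence of executed blocks.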
Figure 4: DBI framework architecture. (Labels in the figure: Operating System, Hardware, Emulation Unit, JIT Compiler, Instruction Cache, Dispatcher, Single Process Virtual Machine.)
4.3 Dynamic analyzer
Building on concolic execution, loop-aware concolic execution performs concrete and symbolic execution simultaneously. As a new approach applied to concolic execution, it uses a series of methods to analyze program behavior: dynamic loop finding (DLF), loop counter recognition (LCR), loop block re-sorting (LBR), and loop behavior modeling (LBM). Their functions are described in Section 3.3. The following paragraphs describe their implementation in RELEASE.
Dynamic loop locator. The dynamic loop locator implements DLF. While RELEASE is running, it marks the executed flag of each basic block that is executed. When a new basic block is fetched, the dynamic loop locator checks whether that block's executed flag is already marked in order to determine whether the control flow forms a loop.
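The executed-flag check can be sketched as follows. The class and method names are hypothetical, and a set of block addresses stands in for the per-block flag. Note that this simple form also flags a block that is re-entered for reasons other than a loop (for example, a function called twice); how RELEASE distinguishes such cases is not described here.

```python
class DynamicLoopLocator:
    """Flags a fetched block as a loop candidate if it has already executed."""

    def __init__(self):
        self.executed = set()   # addresses of blocks whose flag is marked
        self.loop_heads = []    # re-entered blocks, in order of discovery

    def on_fetch(self, block_addr: int) -> bool:
        """Call for every fetched block; True means the flag was already set."""
        if block_addr in self.executed:
            if block_addr not in self.loop_heads:
                self.loop_heads.append(block_addr)
            return True
        self.executed.add(block_addr)  # first execution: mark the flag
        return False
```

Fed the block sequence 0, 1, 1, 1, 2, the locator returns False for the first visit to each block and True for the second and third visits to block 1, which becomes the only recorded loop head.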
Loop-aware concolic execution. By modeling the loop behavior, we apply our analysis methods, LCR, LBR, and LBM, to discover buffer overflow vulnerabilities. For LBM, RELEASE uses a recurrence relation solver called PURRS[23] to solve the modeling problem. This solver can solve polynomial recurrence relations. For instance, a polynomial recurrence relation looks like:
[equation garbled in the source; only the fragment "2" survives]
And the solved equation is:
[equation garbled in the source; only the fragment "0" survives]
RELEASE uses a path constraint solver called STP. The solver's answer is used as the next execution input.
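To make the role of a recurrence relation solver in loop behavior modeling concrete, consider a generic illustration that is not the example from the original text. The symbols $x_n$, $x_0$, and $c$ are assumptions introduced here: $x_n$ is the value of a loop variable after $n$ iterations, and each iteration adds a constant $c$.

```latex
% Recurrence describing one loop iteration (illustrative assumption):
x_n = x_{n-1} + c, \qquad x_0 \text{ given}
% Unrolling the recurrence n times gives the closed form:
x_n = x_0 + c\,n
```

With a closed form of this kind, the value of the variable after any number of iterations is available as an expression in $n$, without executing the loop iteration by iteration.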