Design Issues - Design and Implementation

III. Design and Implementation

3.2. Design Issues

Code discovery problem and code location problem are the main problem we have to solve. The former is due to we don’t know where is the code and where is the data in the source executable, and the latter is due to the source indirect branch target address is not the same as the target indirect branch target address. For more detail, please see 2.2 and 2.3. We show how we solve these problems and describe other problems we encountered in this chapter.

3.2.1. Code Discovery Problem

As described in 2.5, distinguishing data and code in Thumb-2 binary is the major challenge we have to defeat when solving code discovery problem. Data interspersed with code in the binary make this problem hard to solve, so we classify all of data into several sets first and then conquer each set separately. These data can be classified into four kinds:

1) PC-relative data 2) Switch table 3) Padding

4) Unidentified cases

Once our translator finds all of them, it knows where the data are, and this problem is solved. We will describe how these data generated in Thumb-2 code and how our translator finds them in next section.

3.2.1.1. PC-relative Data

This kind of data are easy to find, since they can be found by load-register instructions, including LDR (unsigned word), LDRB (unsigned byte), LDRSB (signed byte), LDRH (unsigned halfword), LDRSH (signed halfword), LDRD (double word), with PC register being the base. PC register is not permitted to be an operand in all Thumb-2 instructions, and these instructions are examples that are permitted. Notice that PC value must be word-aligned, according to the document of ARM architecture [12].

3.2.1.2. Switch Table

Different compilers may use different patterns to generate switch tables. Since GCC [13]

is open source, popular, and widely used in modern world, we can ensure that we can find all switch tables generated by GCC. There are three possible patterns that GCC generates for switch tables, as shown in Table 2. % indicates certain register, and # indicates an immediate value.

Table 2. Thumb-2 switch patterns

– cmp %case, #case_num – bhi #default_target

tbb [PC, %case] tbh [PC, %case, lsl #1] adr %reg, #table_head ldr PC, [%reg, %case, lsl #2]

All Thumb-2 switch tables start with CMP (compare the value) and BHI (branch if greater than). First, the program compares the input value and total number of cases, and then jumps to the default target if the input value is out of range. The third instruction is decided by how many bits are used to indicate one case. TBB (H) uses one (two) bit(s) to indicated one target address, and the remainder uses four bits. The difference of them is what are stored in the table. The offset between target address and current PC is stored when TBB or TBH is chosen, while the target address is stored when the third pattern is used.

The third kind of switch pattern uses ADR instruction before LDR instruction and uses

%reg instead of PC, because Thumb-2 has 16-bit and 32-bit instructions and the word data must be word-aligned. There might be a NOP instruction, regarding as a padding instruction, between LDR and the head of switch table if the address of LDR is not word-aligned. Since 16 bit ADR instruction ensure that the address stored in the register must be word-aligned, GCC generates ADR instruction before LDR instruction for fear that NOP influences the head address of switch table.

The offset between switch table and the branch target must be encoded in 9 bits and 17 bits for TBB case and TBH case respectively, and it must be positive (that is, only the

addresses bigger than the table can be branch targets), so the third case is needed since the target address is stored immediately in the data, and all of the addresses in the binary can be

branch targets.

3.2.1.3. Padding

The purpose of using padding data is to make the instructions align word, half-word, or other length of bytes. NOP and NOP.w are frequently used for padding, and they can pad 16-bit and 32-16-bit respectively. Compilers also use 0x0000 as a 16-16-bit padding, and sometimes the padding size may be the multiple of 16 bits for some purpose. Fortunately, 0x0000 is regarded as MOVS r0, r0 when using Thumb-2 ISA; therefore, three cases of padding

described above are all instructions, and they don’t influence any CPU state and control flow of the program, so our translator doesn’t have to regard these cases as special cases.

If other encoding methods of Thumb-2 instructions are also used for padding, they must be decoded and translated without any control flow reaching them. Even though they might consist of some undefined instruction, they can still be handled by regarding them as

unidentified data, described in 3.2.1.4.

Our concern is the padding data that are not regarded as an instruction. This kind of data occurs when GCC generates switch table with TBB instruction and the total number of case is an odd number. Since Thumb-2 must be half-word aligned, a byte data must be padded.

As Figure 10 shows, there are nine cases, numbered from 0 to 8, in this switch table.

The address 0xdf32 to 0xdf39 describe the target addresses of case 0 to case 7. Since we assume it is little-endian, the information of case 8 is “0x44”. The next address to be decoded is at df3c, so there must be a padding byte at df3b (0x00 colored orange).

df2a: cmp r3, #8 df2c: bhi.n df3c df2e: tbb [pc, r3]

df32: 382c0738--- 4 cases df36: 38311450--- 4 cases df3a: 23000044--- 1 case

Figure 10. An example of padding byte

There might be some strange cases that occurs in hand-writing code for generating padding data due to commercial issues. For example, pad a PC-relative data and some address of word is then marked as a data, so it won’t be translated. In our experiment, PC-relative data are put at the end of functions, because the efficiency of pipeline may be influenced if they are put at the middle of a function; besides, there must be a branch instruction before these data. Therefore, the analyzer can determine whether the data address to load is really a PC-relative data by checking whether the instruction before this address is a branch instruction, or what before this address is also PC-relative data. The remaining case is that it might be the next function entry, and it can be determined by decoding from this data address and checking whether this instruction can be a function entry. The possible characteristic of function entries will be described in 3.3.2.1.

3.2.1.4. Unidentified cases

This kind of data occurs when LLVM disassembler returns fail state. Either these instructions are not supported by LLVM or they use some architectures, like vector float point (VFP), which is not normal architecture of ARM and Thumb-2. This kind of data may also be a kind of padding data. Our translator marks them as a kind of data and don’t translate them for fear that some exceptions occur when executing our translator. The next word-boundary will be the next address for translating, because many Thumb-2 instructions require PC being word-alignment when executing; therefore, less mistranslations occur by handling in this way.

3.2.2. Code Location Problem

Since indirect branch targets can’t be known by our translator in compiling time, a source-PC (SPC) to target-PC (TPC) mapping table is required for translated binaries, and TPC is the head address of an LLVM basic block in our translator. As a result, to make this table usable, our translator must maintain a mapping table from SPC to corresponding basic block.

This table is not difficult to generate, since required information is kept during

translation. The difficulty is how to reduce the size of this table. The larger the table size is, the more time needed for optimizing and compiling; moreover, the execution time is also longer.

3.2.3. Other Problems

Occasionally, the pattern of certain kind of data may have a little difference compared with the normal pattern. For example, an instruction loads a word with the base register that stores the value of PC, and it must be a PC-relative load. Our system must have abilities to find this kind of situations.

Our translator translates all of the source binary into only one LLVM function, named

“main”, so LLVM optimizer and LLVM static compiler may take too much time if the source binary is large since optimization unit is a LLVM function. For example, 445.gobmk in

CINT2006, which has about 160 thousand instructions, takes about five hours for translation in our experiment. The complexity of them are at least quadratic, instead of linear, so our translator must have an ability to partition the main function into several smaller L-functions.

在文檔中一個為Thumb-2可執行檔以LLVM為基準的靜態二元轉譯系統 (頁 27-31)