Relaxing the Restrictions - Design and Implementation

III. Design and Implementation

3.4. Relaxing the Restrictions

The limitations of our system are listed in 3.1, and the system would be more powerful if some of them could be relaxed or removed. Here we discuss only how to relax the restriction that the binary must be generated by GCC, since this is the largest obstacle to claiming that we have solved the code discovery problem for the Thumb-2 instruction set. Other restrictions may also be relaxed; we leave them as future work.

3.4.1. Using a Compiler Other than GCC

GCC is one of the most powerful and popular compilers in the world, and it is open-source, so by tracing the GCC source code we can easily learn how to locate data, including PC-relative data and switch tables, in a GCC-generated binary. Other compilers fall into two categories according to whether they are open-source.

PC-relative data can usually be found by searching for LDR-prefixed instructions whose base register is PC. However, some compilers first copy the PC value into another register and then use that register as the base of the load. Fortunately, the register mapping table described in 3.3.3 makes these cases easy to find, and a routine can easily be added to our analyzer if a pattern is not recognized.

Therefore, PC-relative data can be detected easily regardless of whether the compiler is open-source.
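The address computation behind this detection can be sketched as follows. This is a minimal illustration, not the analyzer's code: it assumes the standard Thumb rule that an instruction reads PC as its own address plus 4, and that a literal load or ADR aligns that value down to a word boundary. The class RegisterMap and all function names are hypothetical stand-ins for the register mapping table of 3.3.3.

```python
from typing import Dict, Optional


def align4(value: int) -> int:
    """Round an address down to a 4-byte boundary."""
    return value & ~0x3


def thumb_pc(insn_addr: int) -> int:
    """Value a Thumb instruction at insn_addr observes when it reads PC."""
    return insn_addr + 4


def literal_address(insn_addr: int, imm: int, add: bool = True) -> int:
    """Address accessed by a Thumb LDR (literal) located at insn_addr."""
    base = align4(thumb_pc(insn_addr))
    return base + imm if add else base - imm


class RegisterMap:
    """Hypothetical table of registers that hold a known PC-derived address."""

    def __init__(self) -> None:
        self.known: Dict[str, int] = {}

    def record_adr(self, rd: str, insn_addr: int, imm: int) -> None:
        # ADR rd, label: rd = Align(PC, 4) + imm
        self.known[rd] = align4(thumb_pc(insn_addr)) + imm

    def clobber(self, rd: str) -> None:
        # Any other write to rd makes its value unknown again.
        self.known.pop(rd, None)

    def resolve_load(self, rn: str, imm: int) -> Optional[int]:
        # Address of a load [rn, #imm], or None if rn is not PC-derived.
        base = self.known.get(rn)
        return None if base is None else base + imm
```

With this sketch, a load through a register that was set by ADR resolves to a concrete data address, while a load through an untracked register yields None and is left to other analyses.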

Finding switch tables is more complex. If the compiler is open-source, the task is easy: a new finite state machine (FSM) can be added to our analyzer to recognize the corresponding patterns. However, popular compilers for ARM, such as ARMCC and the Microsoft ARM compiler, are not open-source, so all we can do is run many test cases to observe what they generate for switch tables and then build an FSM for those patterns. This approach is risky, since we cannot ensure that our analyzer recognizes every pattern the compiler may generate. As a result, we have to analyze the composition of switch tables, which is the subject of the next section.
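To make the FSM idea concrete, the following is a minimal sketch of a pattern recognizer over a stream of mnemonics. The pattern it accepts (a compare, then a bound-checking conditional branch, then a table branch, with no instruction in between) is only an illustrative assumption; the state names, the mnemonic sets, and the function names are hypothetical and do not reproduce the FSMs in our analyzer.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    START = auto()
    SEEN_CMP = auto()
    SEEN_BRANCH = auto()
    MATCHED = auto()


# Assumed mnemonic sets for this sketch (BHS is a synonym of BCS).
BOUND_BRANCHES = {"bhi", "bcs", "bhs"}
TABLE_BRANCHES = {"tbb", "tbh"}


def step(state: State, mnemonic: str) -> State:
    """Advance the FSM by one instruction."""
    m = mnemonic.lower()
    if m == "cmp":
        return State.SEEN_CMP  # a compare always (re)starts the pattern
    if state is State.SEEN_CMP and m in BOUND_BRANCHES:
        return State.SEEN_BRANCH
    if state is State.SEEN_BRANCH and m in TABLE_BRANCHES:
        return State.MATCHED
    return State.START  # anything else breaks the pattern


def find_switch_candidates(mnemonics: List[str]) -> List[int]:
    """Return the indices of table branches that complete the pattern."""
    state = State.START
    hits = []
    for index, mnemonic in enumerate(mnemonics):
        state = step(state, mnemonic)
        if state is State.MATCHED:
            hits.append(index)
            state = State.START
    return hits
```

Supporting a newly observed compiler pattern then amounts to adding states and transitions, which is why an incomplete set of test cases leaves the recognizer incomplete.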

3.4.2. Switch Table Analysis

A switch table is described by several factors: the table base, the default target, the number of cases, and the values that describe the jump offsets. Once our analyzer finds them, it knows where the table is, so we discuss how to find these values in the binary.

The Thumb-2 instruction set has two instructions, TBB and TBH, that implement a table branch. It is reasonable for a compiler to generate switch tables using only these instructions, since a hardware implementation is expected to outperform a software one. We therefore analyze the use of TBB and TBH first. The encodings of TBB and TBH are shown in Figure 29. TBB (TBH) loads a byte (halfword) from the address <Rn> + <Rm> (<Rn> + <Rm> * 2) and branches forward to the address given by the current PC plus twice the loaded value.

Figure 29. Encoding method of TBB and TBH
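The semantics of TBB and TBH can be illustrated by computing the branch targets of a table. The sketch below assumes a little-endian image, that PC reads as the address of the table branch plus 4, and that each entry is an unsigned forward offset counted in halfwords. The function name and its parameters are hypothetical.

```python
import struct
from typing import List


def table_branch_targets(memory: bytes, mem_base: int, insn_addr: int,
                         table_base: int, count: int,
                         halfword: bool) -> List[int]:
    """Targets reachable through a TBB (halfword=False) or TBH (halfword=True).

    memory is the raw image that starts at address mem_base.
    """
    pc = insn_addr + 4  # PC as observed by the 32-bit table branch
    targets = []
    for index in range(count):
        if halfword:
            offset = table_base + 2 * index - mem_base
            (entry,) = struct.unpack_from("<H", memory, offset)
        else:
            entry = memory[table_base + index - mem_base]
        targets.append(pc + 2 * entry)  # entries count halfwords
    return targets
```

For example, a TBB at 0x1000 whose table starts at 0x1004 and holds the bytes 2, 4, 6, 8 yields the targets 0x1008, 0x100C, 0x1010, and 0x1014.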

First, we claim that TBB (TBH) is always decoded correctly. If the table is placed after TBB (TBH), the instructions before TBB (TBH) are decoded correctly, and so is TBB (TBH) itself. If the table is placed before TBB (TBH), there must be a branch instruction before the table that jumps over it, since the table is generated for the instruction that follows it and must not be reached by falling through. As a result, the TBB (TBH) of every switch table is guaranteed to be found.

<Rn> of TBB (TBH) is the table base and <Rm> is the offset, so what our analyzer still has to find is the default target and the number of cases. The offset must lie between zero and the number of cases; otherwise, execution must go somewhere other than through the table, namely to the default target. Therefore, there must be a conditional branch instruction that branches to the default target and a compare instruction that decides whether the branch is taken, and both must appear before TBB (TBH).

Testing whether a value lies between zero and another value normally takes two comparisons, so the compiler may shift all of the case numbers so that the smallest one becomes zero. Then only one compare instruction is needed, and it tests whether the value is below a certain upper bound. Furthermore, this upper bound gives the number of cases. Note that the table size must be a multiple of two bytes because of alignment, so for a table of byte entries the number of entries must be increased by one if it is odd. In general, the compare instruction is CMP, which performs a subtraction and updates the APSR without saving the result, and it is followed by a B instruction whose condition is unsigned greater than or equal; strictly greater than is also permitted. As a result, if our analyzer finds a compare instruction and a conditional branch instruction, in that order, before TBB (TBH), the corresponding switch table can be found.
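The factors collected so far, together with the alignment rule, determine where the table ends and decoding can resume. The following is a minimal sketch under the assumption that padding is needed only when the raw table size in bytes is odd, which can happen only with byte entries. The class SwitchTable and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchTable:
    """Hypothetical record of what the analyzer recovers for one table."""
    table_base: int      # address of the first entry (<Rn>)
    default_target: int  # target of the conditional branch before TBB/TBH
    entries: int         # number of cases derived from the compare
    halfword: bool       # True for TBH, False for TBB

    @property
    def size_in_bytes(self) -> int:
        raw = self.entries * (2 if self.halfword else 1)
        return raw + (raw % 2)  # pad an odd size up to a multiple of two

    @property
    def end_address(self) -> int:
        """First address after the table, where instructions may resume."""
        return self.table_base + self.size_in_bytes
```

A TBB table with five entries at 0x1004 therefore occupies six bytes and ends at 0x100A, whereas a TBH table never needs padding.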

Compilers may generate switch tables using instructions other than TBB and TBH, but the compare instruction and the branch instruction are still necessary. Since only a few instructions can load data and jump, we can add all of them to our analyzer so that it covers all of these cases.

3.4.3. Case study: ARMCC

The ARM compiler (ARMCC) usually uses LDR-prefixed instructions to load PC-relative data. It also uses the ADR instruction (form a PC-relative address) to store a PC-relative address in a register and then loads the data with an LDM (load multiple) instruction.

In some cases, PC-relative data generated by ARMCC is null-terminated. For example, the only argument passed to the “printf” function may be just the start address of the data. This case does not occur in binaries generated by GCC, since GCC puts null-terminated data in another section. Our analyzer does not support this kind of PC-relative data, since we do not have enough information about how many functions use this kind of data.

For switch tables, ARMCC uses CMP and BCS (branch if carry set, i.e., unsigned higher or same) followed by TBB or TBH, whereas GCC uses BHI (branch if unsigned higher). Therefore, for the same switch statement, the bound that ARMCC places in the CMP instruction is one greater than the bound GCC places there, and the analyzer must take this difference into account when deriving the number of cases. Some patterns may not be exercised by our test cases, since ARMCC is not open-source, but we have shown that our analyzer can cover almost all of the cases with a few modifications.
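The effect of the branch condition on the number of cases can be sketched as follows. This is an interpretation based on the ARM condition codes (HI is unsigned higher, CS/HS is unsigned higher or same), not a description of how our analyzer applies the adjustment; the function name is hypothetical.

```python
def entries_from_bound(cmp_imm: int, cond: str) -> int:
    """Number of table entries implied by 'CMP Rm, #cmp_imm; B<cond> default'."""
    cond = cond.lower()
    if cond == "hi":
        # Default is taken when Rm > imm, so indices 0..imm reach the table.
        return cmp_imm + 1
    if cond in ("cs", "hs"):
        # Default is taken when Rm >= imm, so indices 0..imm-1 reach the table.
        return cmp_imm
    raise ValueError(f"unsupported bound-check condition: {cond}")
```

Under this reading, a switch with five cases appears as a compare against 4 followed by BHI, or as a compare against 5 followed by BCS, and both forms yield five entries.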

The system call instructions generated by ARMCC use the ARM semihosting interface instead of the Linux kernel, so our system call emulator does not currently support binaries generated by ARMCC. We translated ARMCC-generated binaries only to confirm that our solution to the code discovery problem can be adapted to other compilers easily.

Source document: An LLVM-Based Static Binary Translation System for Thumb-2 Executables (pages 47-51)