The Second Phase - Detecting Signed/Unsigned Integer Conversion Errors in C Programs

The key idea is that if programmers validate input correctly, an unsafe input value should never trigger the same execution path as its safe counterpart. Otherwise, a user may supply a very large value as input and crash the program. We define a safe range as follows:

When an integer conversion occurs in a program, the new type may be unable to represent the value of the converted variable. If the logical meaning of the value is preserved, the value is in the “safe range”; otherwise, it is in the “unsafe range.”

This idea is illustrated in Figure 8.

01 char i;
02 unsigned char j;
03 i = -1;
04 j = i;

Figure 8: An example of safe and unsafe range

After line 4 in Figure 8 executes, j becomes 255 (0xff), although the original meaning of the value is -1. Therefore -1 is in the unsafe range of the integer conversion on line 4. Note that a safe/unsafe range belongs to an integer conversion operation, not to a variable alone.

When dealing with signedness problems, we can expect safe/unsafe ranges of the form a >= 0 or a < 0. For example, the safe range of the integer conversion on line 4 of Figure 8 is “i >= 0,” while the unsafe range is “i < 0.”
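The conversion in Figure 8 can be reproduced with a minimal sketch. The helper names `to_unsigned_char` and `in_safe_range` are hypothetical and introduced only for illustration. The sketch uses `signed char` explicitly, because the signedness of plain `char` is implementation-defined, and it assumes 8-bit characters.

```c
#include <stdbool.h>

/* Hypothetical helper: the conversion performed on line 04 of Figure 8.
 * Converting to an unsigned type reduces the value modulo UCHAR_MAX + 1. */
static unsigned char to_unsigned_char(signed char i)
{
    return (unsigned char)i;
}

/* Hypothetical helper: the safe range of that conversion is i >= 0. */
static bool in_safe_range(signed char i)
{
    return i >= 0;
}
```

With 8-bit characters, `to_unsigned_char(-1)` yields 255, and `in_safe_range(-1)` is false, which matches the discussion of Figure 8.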

The goal is to find out whether an unsafe input value can trigger the same execution path as a safe input value. There may be many potentially dangerous integer conversions throughout the program. To check them efficiently and soundly, we propose a testing method based on ALERT. We call this method the “refilter algorithm” because, when ALERT performs it, ALERT is essentially doing extra input validation on behalf of the programmers. If an unsafe input value is not filtered out by the program's input validation, ALERT checks (filters) it with the refilter algorithm.

The refilter algorithm consists of two steps:

1. Monitor the occurrence of unsafe value data flows.

2. Check whether a targeted unsafe value data flow is dangerous.

An unsafe value is a value in an unsafe range, and an unsafe range is defined by a specific integer conversion. Therefore, to monitor the occurrence of unsafe values, ALERT must identify potentially dangerous integer conversions, which requires searching for every integer conversion in the program under test. We use CIL for this task. CIL builds an abstract syntax tree (AST) for the program, which we then traverse in search of integer conversions.

Once we find an integer conversion, we insert a checker call into the AST. CIL then transforms the AST back into source code.
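To make the effect of this instrumentation concrete, the following sketch shows what source code with an inserted checker call might look like. It is an illustration only, not actual CIL output: the checker name `alert_check_conversion`, the `conv_record` structure, and the fixed-size log are all assumptions.

```c
#include <stddef.h>

/* Hypothetical record of one monitored conversion. */
struct conv_record {
    int line;          /* source line of the conversion */
    long long value;   /* concrete value before the conversion */
};

#define MAX_RECORDS 64
static struct conv_record records[MAX_RECORDS];
static size_t record_count = 0;

/* Hypothetical checker: logs the conversion and returns the value unchanged. */
static long long alert_check_conversion(int line, long long value)
{
    if (record_count < MAX_RECORDS) {
        records[record_count].line = line;
        records[record_count].value = value;
        record_count++;
    }
    return value;
}

/* Instrumented form of "j = i;" where i is int and j is unsigned int.
 * The checker call is placed immediately before the conversion. */
static unsigned int convert_instrumented(int i)
{
    unsigned int j;
    alert_check_conversion(4, i);   /* 4 stands for the source line number */
    j = (unsigned int)i;
    return j;
}
```

The checker only records information here. Deciding whether the recorded conversion is dangerous is deferred, in line with the later requirement that the whole execution path be known first.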

To check whether a targeted unsafe value data flow is dangerous, ALERT must determine whether an unsafe value that flows through the target integer conversion can follow the same execution path as its safe counterpart. ALERT achieves this with checkers and a checking function that runs at the end of each ALERT iteration. The checkers collect information until the whole execution path is decided. Once the current execution path is complete, we can check the complete unsafe value data flow of each potentially dangerous integer conversion along it. If ALERT checked an unsafe value data flow before the current execution path was complete, it might conclude that an unsafe value follows the same partial path as a safe value, even though that unsafe value may be filtered out by input validation later on the path.

Merely inserting checker calls into the source code does not fulfill our goal, because the checkers also need to be triggered. ALERT systematically searches for all execution paths and triggers every checker along the paths it finds. Therefore every checker on those paths is triggered, which means every integer conversion on them is checked.

When checkers are triggered, they check whether a specific integer conversion is really dangerous. This task can be performed by CIL, but it can also be done at runtime. If the integer conversion checked by ALERT is dangerous, ALERT adds a corresponding constraint to CVCL, and these constraints are later solved together with the path conditions. Which constraint is added depends on the types involved in the integer conversion and on the concrete value of the converted variable when the conversion is performed. The types determine the safe and unsafe ranges of the conversion. The concrete value at runtime determines whether we add the constraint for the safe range or the one for the unsafe range: if the concrete value is in the safe range, we add the constraint corresponding to the unsafe range; otherwise, we add the one corresponding to the safe range.
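The selection rule above can be sketched as follows for a signed-to-unsigned conversion, whose safe range is value >= 0. The enum and function names are hypothetical, and the sketch only decides which range the added constraint describes; it does not talk to CVCL.

```c
/* Which range a constraint describes. */
enum range_kind { RANGE_SAFE, RANGE_UNSAFE };

/* For a signed-to-unsigned conversion, the safe range is value >= 0. */
static enum range_kind classify_signed_to_unsigned(long long value)
{
    return value >= 0 ? RANGE_SAFE : RANGE_UNSAFE;
}

/* Constraint to add: the range opposite to the one the concrete value is in. */
static enum range_kind constraint_to_add(long long concrete_value)
{
    return classify_signed_to_unsigned(concrete_value) == RANGE_SAFE
               ? RANGE_UNSAFE
               : RANGE_SAFE;
}
```

For a concrete value of 7 the sketch selects the unsafe-range constraint (value < 0), and for a concrete value of -3 it selects the safe-range constraint (value >= 0).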

At the end of the current execution path triggered by ALERT, we enter the main part of the algorithm. ALERT uses the information collected along the current execution path to check, one by one, whether each integer conversion on the path is safe or unsafe. ALERT keeps track of each dangerous integer conversion and its safe/unsafe ranges. With this information, we can ask whether the value of each dangerously converted variable could lie in the other range.

If the value is in the safe range when the integer conversion is performed, we want to find out whether it could be in the unsafe range and still trigger the current execution path, and vice versa. An unsafe value should be filtered out by the program's input validation and handled by its exception-handling code.

Figures 9 and 10 illustrate this idea. CVCL solves the constraints and tells ALERT whether this is possible. This approach can produce false positives and false negatives because we do not model all operations that a machine can perform. If CVCL reports that it is possible, ALERT obtains the corresponding input data. Users can then check whether the result is feasible by executing the uninstrumented program with the input data generated by ALERT.

Figure 9: Successful input validation

Figure 10: Unsuccessful input validation

When we solve a set of constraints with a solver, we are asking the solver whether these constraints hold in all cases. If the set of constraints can be made false, the solver returns a counterexample under which the set evaluates to false. Path conditions are the set of constraints that every transition from the initial state to the current state at a specific program counter must satisfy. For example, if the execution path through the program in Figure 11, identified by line numbers, is 1-2-3-6, then variable i must be at least 10; otherwise execution would proceed to line 4. The path condition of 1-2-3-6 is “i >= 10.”

The refilter algorithm also collects the constraint of the integer conversion itself. When the constraints of an integer conversion are solved together with the path conditions, ALERT is simply checking whether a specific variable can be in a specific range when the integer conversion happens while execution follows the current path. For example, if the solver reports invalid for the mixed constraint set of Path 2 in Figure 12, then it is impossible to have i < 0 when executing line 4 while the execution path is 1-2-3-4-5-6.

Source code:

01 int i;
02 int j;
03 if (i < 10)
04 if (i+j > 5)
05 printf("foo");
06 return;

Execution path            Path condition
Path 1: 1-2-3-6           i>=10
Path 2: 1-2-3-4-6         (i<10)&(i+j <= 5)
Path 3: 1-2-3-4-5-6       (i<10)&(i+j > 5)

Figure 11: Example of path conditions
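The correspondence between inputs and the three paths of Figure 11 can be written out as a small function. The name `path_taken` is hypothetical, and the return value is simply the path number used in the figure.

```c
/* Returns which path of Figure 11 the inputs follow (numbering as in the figure). */
static int path_taken(int i, int j)
{
    if (i < 10) {
        if (i + j > 5)
            return 3;   /* 1-2-3-4-5-6: (i<10) & (i+j > 5)  */
        return 2;       /* 1-2-3-4-6:   (i<10) & (i+j <= 5) */
    }
    return 1;           /* 1-2-3-6:     i >= 10             */
}
```

Each return statement is reached exactly when the path condition in its comment holds, which is what makes the path condition a complete description of the inputs that follow that path.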

Source code:

01 int i=input();
02 unsigned int j;
03 if(i < 10){
04 j = i+1;
05 malloc(j); }
06 return;

Execution path            Path condition    Integer conversion constraint
Path 1: 1-2-3-6           (i>=10)
Path 2: 1-2-3-4-5-6       (i<10)            (i@line4 < 0)

Figure 12: Example of path conditions combined with integer conversion constraints

Our algorithm is shown in Figure 13.

while (some path has not been searched) {
    inputData = getNextInput();
    executeAndMarkUnsafe(inputData);
    for (each marked signedness conversion) {
        safeRange = markedConversion.safeRange;
        unsafeRange = markedConversion.unsafeRange;
        if (solve(PathConstraint, safeRange) &&
            solve(PathConstraint, unsafeRange))
            universalChecking();
    }
    generateNextInput();
}

Figure 13: Pseudo code of refilter algorithm
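The test inside the loop of Figure 13 can be sketched with a bounded exhaustive search standing in for the solver. This is an assumption made for illustration: it is not how CVCL works, and `feasible`, `needs_universal_check`, and the example predicates are hypothetical names. The predicate `below_ten` plays the role of the path condition of Path 2 in Figure 12, and `zero_to_nine` models a path whose input validation also rejects negative values.

```c
#include <stdbool.h>

typedef bool (*predicate)(int);

/* Stand-in for solve(): does some v in [lo, hi] satisfy both predicates? */
static bool feasible(predicate path, predicate range, int lo, int hi)
{
    for (int v = lo; v <= hi; v++)
        if (path(v) && range(v))
            return true;
    return false;
}

/* Mirrors the if-condition of Figure 13: both the safe and the unsafe
 * range are reachable along the same path. */
static bool needs_universal_check(predicate path, predicate safe,
                                  predicate unsafe, int lo, int hi)
{
    return feasible(path, safe, lo, hi) && feasible(path, unsafe, lo, hi);
}

/* Example predicates for a signed-to-unsigned conversion. */
static bool is_nonneg(int v)    { return v >= 0; }
static bool is_neg(int v)       { return v < 0; }
static bool below_ten(int v)    { return v < 10; }            /* no lower bound */
static bool zero_to_nine(int v) { return v >= 0 && v < 10; }  /* validated     */
```

Under `below_ten`, both ranges are reachable, so the conversion is flagged for universal checking. Under `zero_to_nine`, the unsafe range is unreachable and the conversion is not flagged.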

We consider the following four kinds of bugs to be most closely related to unfiltered tainted data flow:

1. Memory-related library functions. Functions with a parameter of type size_t, such as malloc(size_t), should be checked. When these functions are called, the corresponding argument should not be negative.

2. String library functions with boundary checking, such as strncpy(char*, const char*, size_t).

3. Array index out of bounds.

4. Large loop index variables.

We can instrument a universal check before these instructions to detect whether these bugs will occur.

When ALERT executes the suspicious execution path, it triggers the checkers. If any checker fails, ALERT generates the input data that makes it fail. We can then use this input data to check whether it really causes a problem.
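As an illustration of the first kind of check, the following sketch guards a call to malloc. The names `size_argument_ok` and `checked_malloc` are hypothetical, and the sketch assumes the requested size is still available as a signed value before it is converted to size_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical universal check: can a signed request be passed as size_t
 * without changing its meaning? */
static bool size_argument_ok(long long requested)
{
    return requested >= 0 && (unsigned long long)requested <= SIZE_MAX;
}

/* Hypothetical guarded wrapper: refuses sizes that fail the check. */
static void *checked_malloc(long long requested)
{
    if (!size_argument_ok(requested))
        return NULL;
    return malloc((size_t)requested);
}
```

A request of -1, which an unchecked conversion would turn into a very large size_t, is rejected here before malloc is called.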

Universal checks cannot find semantic bugs. If all universal checks fail to detect a bug, the programmer should consider whether the suspicious execution path is caused by a semantic bug.
