Path Optimization - Related work: Constraint-based Execution Optimization

3 Related work: Constraint-based Execution Optimization

3.2 Path Optimization

3.2.1 Constraint Caching and Reuse

Concolic execution systematically executes every feasible path. In real programs, however, the number of program paths often grows exponentially, so executing all feasible paths does not scale. At the same time, many paths are traversed repeatedly in the programs under test. One way to alleviate the problem is constraint caching and reuse, that is, reusing constraints that have already been computed. SMART, an extension of DART, exercises the functions in a program and records summaries (pre-conditions and post-conditions) of these functions in terms of their inputs and outputs. SMART reuses these summaries when the program calls a function that has already been visited. According to SMART’s experiments, this can reduce the number of execution paths from exponential to linear. ALERT with the SP module (Resolving Constraints for COTS/Binary Components for Concolic Random Testing, proposed by Yang-Chieh Fan [20]) applies a similar idea of constraint caching and reuse. However, it is currently implemented only for the C Standard Library. When the test program calls a function from the C Standard Library, ALERT does not traverse the function; instead, it automatically adds the corresponding post-condition, which has been constructed in advance by the ALERT developers. This not only alleviates the path explosion problem but also makes concolic execution more precise and lets it explore more program paths.

The SMART algorithm differs from DART only in how it handles function call statements.

When SMART executes a call to a function f, it first checks whether a summary of f is available, that is, whether the current symbolic calling context implies the disjunction of the preconditions currently recorded in the summary of f. If so, SMART adds the summary of f to the path constraints and turns the backtracking flag off; backtracking is resumed when f returns.

If no summary of f is available for the current input, SMART pushes the current input of f onto a stack and starts backtracking within f for that input. When this backtracking finishes, it calls the add_to_summary function to compute the summary of f. After computing the summary, SMART determines where to backtrack next by calling the solve_path_constr function. When backtracking is over, SMART restores the input saved on the stack.
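The call handling described above can be sketched as follows. This is a simplified illustration, not SMART's implementation: symbolic constraints are approximated by closed integer intervals over a single argument, and the names Interval, Summary, implies_disjunction, handle_call and explore_f are hypothetical. The explore callback stands in for backtracking within f followed by add_to_summary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi]; a stand-in for a symbolic constraint."""
    lo: int
    hi: int


@dataclass
class Summary:
    """Summary of one function: a list of (precondition, postcondition) pairs."""
    cases: list = field(default_factory=list)


def implies_disjunction(ctx, preconditions):
    """Return True if every value in ctx lies in the union of the preconditions."""
    need = ctx.lo  # smallest value of ctx not yet known to be covered
    for pre in sorted(preconditions, key=lambda p: p.lo):
        if pre.lo > need:
            break  # gap: `need` is not covered by any precondition
        need = max(need, pre.hi + 1)
        if need > ctx.hi:
            return True
    return False


def handle_call(summaries, fname, ctx, path_constraints, explore):
    """Reuse the summary of fname if it covers ctx; otherwise explore and extend it."""
    summary = summaries.setdefault(fname, Summary())
    if implies_disjunction(ctx, [pre for pre, _ in summary.cases]):
        # Summary available: add it to the path constraints, skip the function body.
        path_constraints.append((fname, tuple(summary.cases)))
        return "reused"
    # No usable summary: explore f for this context and record the new cases.
    summary.cases.extend(explore(ctx))
    return "explored"


def explore_f(ctx):
    """Hypothetical exploration result for an abs-like function f(x)."""
    cases = []
    if ctx.lo < 0:
        cases.append((Interval(ctx.lo, min(ctx.hi, -1)), "ret == -x"))
    if ctx.hi >= 0:
        cases.append((Interval(max(ctx.lo, 0), ctx.hi), "ret == x"))
    return cases


summaries, constraints = {}, []
first = handle_call(summaries, "f", Interval(-10, 10), constraints, explore_f)
second = handle_call(summaries, "f", Interval(-3, 5), constraints, explore_f)
```

In this sketch the first call explores f, and the second call is answered from the summary because its calling context lies inside the union of the recorded preconditions. A real implementation would discharge the implication check with a constraint solver rather than interval arithmetic.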

Unlike SMART, ALERT with the SP module provides an easy way to add post-conditions for different functions and also supports a post-condition library for the C Standard Library.

When the test program calls a function in the C Standard Library, ALERT instruments a stub function after the call to add the corresponding post-conditions to the current constraint system. In this way, the source code of the C Standard Library functions does not have to be added to the test program, and post-conditions can be added without traversing the library function.
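A minimal sketch of such a stub is shown below. It only illustrates the idea of a pre-constructed post-condition library; the table POSTCONDITIONS, the function stub_after_call, and the constraint strings are assumptions made for this example and are not taken from ALERT.

```python
# Hypothetical post-condition library: C function name -> constraint builder.
# Each builder receives the symbolic argument names and the name of the
# symbolic return value, and returns constraints as plain strings.
POSTCONDITIONS = {
    "strlen": lambda args, ret: [f"{ret} >= 0"],
    # Illustrative only; ignores the undefined case abs(INT_MIN).
    "abs": lambda args, ret: [
        f"{ret} >= 0",
        f"({ret} == {args[0]}) or ({ret} == -{args[0]})",
    ],
}


def stub_after_call(fname, args, ret, constraint_system):
    """Add the pre-constructed post-conditions of fname, if any are known."""
    builder = POSTCONDITIONS.get(fname)
    if builder is None:
        # Unknown external function: nothing is added, so the return value
        # stays concretized and its constraints remain missing.
        return False
    constraint_system.extend(builder(args, ret))
    return True


constraint_system = []
stub_after_call("strlen", ["s"], "r1", constraint_system)
stub_after_call("abs", ["x"], "r2", constraint_system)
```

Because the post-conditions come from a table, adding support for another library function in this sketch means adding one entry, which mirrors the "easy way to add post-conditions" described above.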

The SMART researchers ran both SMART and DART on a subset of the oSIP parser code; oSIP is an open-source C library implementing the SIP protocol. According to the SMART paper, the number of runs DART needs increases exponentially with the packet size, whereas for SMART it increases only linearly. In addition, the running time is linear in the number of runs. The paper Resolving Constraints for COTS/Binary Components for Concolic Random Testing shows that the frequency of external function invocations increases with the scale of the test program, so larger test programs have more missing constraints from external functions. ALERT with the SP module can mitigate this problem efficiently by adding post-constraints. This also implies that ALERT with the SP module can alleviate the path explosion problem, because it does not have to traverse those external functions to obtain constraints.

DART suffers from the path explosion problem. SMART addresses it without loss of path coverage and scales better to large programs, because it can reuse the summaries it has computed for functions when it executes those functions again. Thus, paths in summarized functions do not need to be traversed repeatedly.
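The difference in growth can be illustrated with a toy model. Assume a program that makes n sequential calls, each to a function with k feasible paths that are independent of the other calls, and assume that summarizing one call site costs at most k runs. These assumptions and the function names are ours; the numbers are not the oSIP results.

```python
def runs_without_summaries(k, n):
    """Every combination of per-call paths is a distinct whole-program path."""
    return k ** n


def runs_with_summaries(k, n):
    """Upper bound when each of the n call sites is summarized once."""
    return k * n


for n in (1, 5, 10):
    print(n, runs_without_summaries(2, n), runs_with_summaries(2, n))
```

With k = 2 and n = 10, the model gives 1024 whole-program paths without summaries against at most 20 runs with summaries.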

Moreover, this idea applies not only to functions but also to loops, program blocks, and object methods.

The purpose of ALERT with the SP module is to bridge the gap between external functions and test programs that arises when constraints go missing through concretization. In doing so, it makes concolic execution more precise. This method can also be viewed as a kind of constraint caching and reuse, since it produces the required constraints without traversing external functions. Compared with SMART, ALERT with the SP module is a simpler variant: it cannot generate the required constraints during execution, so users must construct them beforehand.

3.2.2 Path Reduction

To address the path explosion problem, SMART uses function summaries to avoid repeated traversal. Alternatively, the problem can be alleviated by not re-traversing paths that cause the same side effects, since many paths are redundant.

RWset [21], which is derived from EXE, discards the current path if it has the same side effects as a previously explored one. Its premise is that traversing such a path is unnecessary. Once RWset recognizes this situation, it stops the traversal and generates an input for the path. The experiments show that RWset significantly improves efficiency.

The main idea of RWset is that many execution paths have the same effects, and that whether they do depends on the program point and the program state. A program point is an MD4 hash of the program counter and call stack, and a program state consists of the current path constraint and the values of all concrete memory locations. If an execution path reaches some program point with the same program state as a previously executed path, RWset concludes that it will produce the same effects and truncates the traversal. This idea can be strengthened by observing that two program states can be treated as identical if they differ only in variables that are not read in the subsequent execution.

RWset records all program states seen so far along the execution paths in a writeset, which is the set of values of all concrete memory locations and path constraints at each program point; it also stores a readset at each program point. Given a program state, the readset collects all locations read after that program point. When execution reaches a program point, RWset determines whether it has seen the program state before. It first intersects the writeset and path constraints with the corresponding readset and then compares the result with the current program state. If the two are equivalent, RWset prunes this execution and generates a test case for the path; otherwise, execution continues.
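The pruning check can be sketched as a cache of projected states keyed by program point. The sketch makes several assumptions that are not from RWset: a state is a plain dictionary from location names to hashable values (concrete values or constraint strings), the readset is supplied by the caller instead of being computed from earlier paths, and SHA-256 from Python's hashlib replaces MD4, which is not available in every hashlib build. The names program_point and StateCache are hypothetical.

```python
import hashlib


def program_point(pc, callstack):
    """Hash of program counter and call stack (SHA-256 used here in place of MD4)."""
    data = repr((pc, tuple(callstack))).encode()
    return hashlib.sha256(data).hexdigest()


class StateCache:
    """Remembers, per program point, the states seen so far, restricted to
    the locations that are still read afterwards."""

    def __init__(self):
        self.seen = {}  # program point -> set of projected states

    def should_prune(self, point, state, readset):
        # Keep only the locations in the readset; the rest cannot affect
        # the remainder of the execution.
        projected = frozenset(
            (loc, val) for loc, val in state.items() if loc in readset
        )
        bucket = self.seen.setdefault(point, set())
        if projected in bucket:
            return True  # equivalent state already explored from this point
        bucket.add(projected)
        return False


cache = StateCache()
point = program_point(0x4005D0, ["main", "parse"])
first = cache.should_prune(point, {"x": 1, "tmp": 7}, readset={"x"})
second = cache.should_prune(point, {"x": 1, "tmp": 9}, readset={"x"})
```

Here the second state differs from the first only in tmp, which is not in the readset, so the second path is pruned at this point while the first one is explored.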

The RWset researchers evaluated it on five medium-sized open-source benchmarks and compared it with EXE. The results show a significant improvement: RWset substantially reduces the number of test cases needed to achieve the same coverage as EXE. In addition, RWset adds little overhead. According to its evaluation, the average runtime overhead of RWset for all the benchmarks is at most 4.38%. This was measured by running a version of EXE that performs all the computations RWset requires but does not prune any paths.

The most important problem that RWset addresses is scalability. As noted above, constraint-based execution does not scale when it checks large programs. RWset proposes a new method to detect and prune redundant paths.

In the document: Large-Scale Security Property Checking via Loop Dependency and Constraint Conflict Analysis (pages 21-26)