Constraint Solving Optimization - Related work: Constraint-based Execution Optimization

3 Related work: Constraint-based Execution Optimization

3.1 Constraint Solving Optimization

3.1.1 Minimizing Number of Calls to Constraint Solver

Constraint solving is a very CPU-expensive operation; therefore we want to minimize the number of calls to the constraint solver. We first present optimizations in EXE and CUTE, discuss their advantages and drawbacks.

Constraint Caching

In EXE, a persistent cache is used to minimize the number of calls to the solver. EXE prints the constraints as a string and computes the MD4 hash of the string. It then queries the cache to see whether it is hit or not. If it gets a hit then the cache will return the answer;

if not, after invoking the solver EXE the (hash, result) pair is stored back to the cache.

If the cache size is not big enough, EXE has to replace a (hash, result) pair. Least recent used strategy may be a good one. It is possible that an error answer occurs due to hash collision. If so, it may get a false path, and cause the false positive or false negative.

Thus, universal hash might be a good choice.

Fast Unsatisfiability Check

In CUTE, if the last constraint is syntactically the negation of any preceding one, then

the solver do not need to solve it, since the last one contradicts one of the preceding constraints, and the whole path constraints is invalid.(In CUTE’s experimental results this method can reduce the number of semantic checks by 60-95%.)Intuitively, linear search can be used to check whether the last constraint is the negation of the preceding constraints.

However, it is not practical in real world, since the number of constraints is usually huge.

Hash function might be a good solution, it can reduce the search time from O(n) to constant.(n is the number of constraints.)

3.1.2 Minimizing Formulas

The longer formula we feed to the solver the more time it needs to solve constraints. Thus if we can minimize formula, we can speed up constraint solving. Following is three optimizations in EXE and CUTE respectively.

Common Sub-constraints Elimination

Before passing path constrains to the lp_solve(CUTE builds their own solver on top of lp_solve.), CUTE first identifies and eliminates common arithmetic sub-constraints. (In CUTE’s experimental results this method can reduce the number of sub-constraints checks by 64-90%.) It is possible to find common sub-constraints by linear search, but the number of constraints is usually huge. Thus linear search may not be a good choice.

An MD4 hash for each sub-constraint is computed and stored in a cache. Before adding a constraint into the path constraints, computing the hash of this constraint and then check whether it is in the cache or not. If so, the constraint will not add into the path constraint formula. Therefore the time complexity of searching reduces from O(n) to constant.(n is the number of constraints in the path constraint formula.)

Constraint Independence Optimization

This is one of EXE’s important optimization, which exploits that path constraints can be divided into multiple independent subsets. EXE defines independent if two constraints

have disjoint sets of operands. This optimization method has two benefits. (1) It can ignore irrelevant constraints, thus decrease the cost of constraint solving. (2) It can increase the cache hit rate in EXE, since the subset may be used by previous paths and smaller subset reduce the number of checks. Thus it increases the cache hit rate.

EXE computes independent subset by constructing a graph G, whose nodes are variables existing in all the constraints. Adding an edge between nodes n_i and n_j if and only if there is a constraint that contains both of the nodes. Once graph G is constructed, EXE apply component connected algorithm on G to compute its connected component. Then for each connected component construct corresponding independent subset by adding all constraints in the same component.

Incremental Solving:

Incremental solving identifies dependency between sub-constraints and exploits it to achieve following goal: (1) solving constraints faster. (2) keeping the input similar.

CUTE defines two constraints p and p’ are dependent if variables in p and p’ have conjunction or there is a constraint p’’ such that p and p’’ are dependent and p’ and p’’ are dependent. It is worth to note that in CUTE, path constraints in two consecutive concolic executions C and C’ only differ in the last constraint. Assume their corresponding solutions are I and I’’, and the last constraint in C is K. CUTE first collects constraints in C that are dependent with the negation of constraint K. Let the set of those constraints be M, and the solution of M is C’’. Solution I’’ is the same as I except for those constraints in M every solution of which is replaced by C’’(l).(l is the constraint in set M.), that is, the solution of constraints that are not dependent with K still remains while the solution of constraints that are dependent with K is changed.

The above optimizations have the same object: reducing the path constraints size by reusing previous answers. However, they achieve the goal by different ways. EXE divides

constraints into multiple subsets, then checks the cache whether there is an answer, and just solves constraints that are not in the cache. CUTE first finds out the constraints in C that is dependent with the negation of the last constraint N, and then solves these constraints to obtain solution C’’. The next solution C’ is the same as the previous one C but for those constraints dependent with N replace their solution with C’’(l).(l is the constraint that is dependent with N.).

Because EXE will fork process, it is better to use cache, and every child process can query the cache. CUTE will negate the last predicate then compute the next input, thus the replace strategy is better for it, since every execution path is relevant to the previous one. In fact constraint independence optimization and incremental solving both use the “cache concept”. They do not solve redundant constraints and only solve new constraints. Both of them have significant improvement on execution optimization.

在文檔中藉由迴圈相依性與限制式牴觸分析進行大範圍安全性質檢查 (頁 18-21)