The Chaining Approach - Detecting Program Overflow Vulnerabilities with a Search-Based Method

2. Background

2.3 The Chaining Approach

Methods that use only control information about a program (e.g., its control flow graph) have trouble guiding the search toward correct solutions. Search failures caused by data dependencies within the program show that control information alone is insufficient to handle the variety of program structures. Take the function in Figure 2 as an example, where the target is the execution of node 5, which is influenced by a data dependence on variable b. To execute the target node, b must be 1, which happens only when the input variable a is 0. This situation occurs frequently in programs, but it cannot be handled by methods that use only control information. The search failure can be avoided in this case if the data dependences related to the test target are also taken into consideration. To address this issue, the Chaining Approach [19], an extension of the goal-oriented approach, uses data flow analysis to improve the chance of finding test data when control flow information fails to guide the search process.

CFG Node
s         int ca_example(int a) {
1             int b = 0;
2             if (a == 0)
3                 b = 1;
4             if (b == 1)
5                 return 1;   // Target
6             return 0;
          }

Figure 2. Example for the Chaining Approach.
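To make the failure concrete, the following sketch computes a branch distance for the two predicates of Figure 2. The distance |x - c| for an equality predicate (x == c) is a conventional choice in search-based testing and is an assumption here, since this section does not state the objective function; the names eq_distance, dist_node2 and dist_node4 are hypothetical.

```c
#include <stdlib.h>

/* Branch distance for the predicate (x == c): 0 when it holds,
   |x - c| otherwise. Assumed, conventional formula. */
static long eq_distance(long x, long c) {
    return labs(x - c);
}

/* Distance to the true branch at node 2, predicate (a == 0). */
long dist_node2(int a) {
    return eq_distance(a, 0);
}

/* Distance to the true branch at node 4, predicate (b == 1),
   measured on the value of b that actually reaches node 4. */
long dist_node4(int a) {
    int b = 0;          /* node 1 */
    if (a == 0)         /* node 2 */
        b = 1;          /* node 3 */
    return eq_distance(b, 1);
}
```

For every input a other than 0, dist_node4 stays at 1, so the predicate at node 4 gives the search no gradient to follow. dist_node2, by contrast, shrinks as a approaches 0, which is the information a data-flow-aware method can exploit.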

The basic idea of the Chaining Approach is to identify the statements that define variables used at a "problem node", and then to construct abstract paths that lead to the execution of the target by traversing those definition nodes before the problem node. A problem node is a node at which the search process has difficulty finding test data that steer execution along the preferred branch in the control flow graph. The abstract paths are called "event sequences". An event is an executed node, and an event sequence is an ordered list of events in which the position of each event reflects its order in an execution path. Formally, an event is a tuple e_i = (n_i, C_i), where n_i is a program node in the control flow graph and C_i is a set of variables, called the "constraint set", none of which may be modified until the next event e_{i+1}; an event sequence E = <e_1, e_2, ..., e_k> is a sequence of events.

Extending the goal-oriented approach, the Chaining Approach serves as a backup strategy that is employed when the original method fails to find an appropriate solution. At the beginning, the initial event sequence E_0 = <(s, ∅), (t, ∅)> contains the start node s and the target node t, each with an empty constraint set. Once the search process encounters a problem node p, at which no test input can be found to steer execution toward the preferred branch, data flow analysis is applied to identify the last definitions of the variables used at node p. Let d_i be one of these last definition nodes. A new event e_d consists of node d_i and an appropriate constraint set that prevents the variables defined at d_i or at earlier nodes from being redefined, so that the path stays definition-free. A new event sequence E_i (for i > 0) is formed by inserting the new events and the problem event (p, ∅). If the input search cannot find test data that execute the path indicated by an event sequence, another problem node is encountered and a new event sequence is generated in the same way. All generated event sequences are organized in a tree. The root of the tree is the initial event sequence together with the first problem node encountered. During the traversal of a tree node, child nodes with new event sequences are created whenever a new problem node is encountered.
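The data flow step can be illustrated on the function of Figure 2. The sketch below runs a standard iterative reaching-definitions analysis for the variable b over a hand-encoded control flow graph. The encoding (node s as index 0, bitmasks of node numbers) and the names edge, defines_b and last_definitions_of_b are assumptions made for this illustration, not part of the source.

```c
#include <stdbool.h>

enum { N_NODES = 7 };   /* index 0 is the start node s; 1..6 as in Figure 2 */

/* CFG edges of Figure 2: edge[u][v] is true when control can flow from u to v. */
static const bool edge[N_NODES][N_NODES] = {
    [0] = { [1] = true },
    [1] = { [2] = true },
    [2] = { [3] = true, [4] = true },   /* true and false branch of node 2 */
    [3] = { [4] = true },
    [4] = { [5] = true, [6] = true },   /* true and false branch of node 4 */
};

/* Nodes that assign to b. */
static const bool defines_b[N_NODES] = { [1] = true, [3] = true };

/* Returns a bitmask (bit n set for node n) of the definitions of b that
   may reach the entry of `node` without being overwritten on the way.
   `node` must be in the range 0..N_NODES-1. */
unsigned last_definitions_of_b(int node) {
    unsigned in[N_NODES] = {0}, out[N_NODES] = {0};
    bool changed = true;
    while (changed) {                   /* iterate until a fixed point is reached */
        changed = false;
        for (int v = 0; v < N_NODES; v++) {
            unsigned new_in = 0;
            for (int u = 0; u < N_NODES; u++)
                if (edge[u][v])
                    new_in |= out[u];   /* merge definitions from predecessors */
            /* A node that defines b kills all earlier definitions of b. */
            unsigned new_out = defines_b[v] ? (1u << v) : new_in;
            if (new_in != in[v] || new_out != out[v]) {
                in[v] = new_in;
                out[v] = new_out;
                changed = true;
            }
        }
    }
    return in[node];
}
```

Calling last_definitions_of_b(4) yields the bits for nodes 1 and 3, the two candidate definition nodes for a problem at node 4.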

2.3.1 Example for the Chaining Approach

In the example of Figure 2, the target is the execution of node 5, and the initial event sequence is E_0 = <(s, ∅), (5, ∅)>. Methods that do not use data flow information have difficulty finding input data that take the true branch at node 4, where the predicate is (b == 1), because the objective function considers only the branch distance at node 4 and is unaware of the data dependence that b is 1 only when a is 0. Node 4 is therefore marked as a problem node and inserted into the event sequence:

E0 = <(s,), (4,), (5,)>

Only variable b is used at node 4, so the last definitions of b that reach node 4 are looked up, which yields two nodes: node 1 and node 3. For each of them, a new event is created that consists of the node itself and a constraint set with the single element b. The constraint set protects the last definition reaching node 4 from being redefined. Two new event sequences are generated by placing the events (1, {b}) and (3, {b}), respectively, before the problem node 4 in the event sequence E_0:

E1 = <(s,), (1, {b}), (4,), (5,)>

E2 = <(s,), (3, {b}), (4,), (5,)>

Searching for input with E_1 is unlikely to bring any improvement, since it gives the search process no additional information. E_2, in contrast, adds the branch distance at node 2 as guidance toward covering the required node 3, which in turn causes the true branch at node 4 to be executed.
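The difference between the two sequences can be checked mechanically. The sketch below tests whether an executed path satisfies an event sequence: every event node must appear in order, and no variable in the active constraint set may be redefined before the next event. The Event struct, the bitmask encoding of b, the encoding of node s as 0, and the function trace_satisfies are hypothetical names introduced for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum { VAR_B = 1 };             /* bitmask encoding of the variable b */

typedef struct {
    int node;                   /* n_i: CFG node that must be executed (s is 0) */
    unsigned constraint;        /* C_i: variables protected until the next event */
} Event;

/* Variables defined at each node of Figure 2 (only b is tracked). */
static const unsigned defs[7] = { [1] = VAR_B, [3] = VAR_B };

/* True when `trace` executes every event of `seq` in order and never
   redefines a constrained variable between two consecutive events.
   Every node in `trace` must be in the range 0..6. */
bool trace_satisfies(const int *trace, size_t trace_len,
                     const Event *seq, size_t seq_len) {
    size_t next = 0;            /* index of the next event still to be matched */
    for (size_t t = 0; t < trace_len && next < seq_len; t++) {
        int n = trace[t];
        if (n == seq[next].node) {
            next++;             /* event reached; its constraint set is now active */
        } else if (next > 0 && (defs[n] & seq[next - 1].constraint)) {
            return false;       /* a protected variable was redefined too early */
        }
    }
    return next == seq_len;
}
```

For the path s, 1, 2, 3, 4, 5 produced by a = 0, the check accepts E_2 and rejects E_1, because node 3 redefines b while the constraint {b} of event (1, {b}) is still active. In this function no path can satisfy E_1: if the definition at node 1 survives, b is still 0 at node 4 and the target is missed.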

2.3.2 Formal Description

The generation of an event sequence in the Chaining Approach is described as follows. Assume that E = <e_1, e_2, ..., e_i, e_{i+1}, ..., e_m> is the event sequence for which the search is trying to find input, and that a problem node p is encountered right after the event e_i is executed. Suppose that only one variable is defined at any program node. Let d be one of the last definitions for problem node p, and let def(d) be the variable defined by d. Two new events are obtained from the problem node and its last definition: e_d = (d, {def(d)}) and e_p = (p, ∅). Inserting e_d and e_p into the event sequence E produces a new event sequence E'. The event e_p is placed right after e_i, and e_d is placed after some earlier event e_k, at a position that determines the subpath from d to e_i. The new event sequence E' is now almost complete:

E' = <e_1, e_2, ..., e_k, e_d, e_{k+1}, ..., e_i, e_p, e_{i+1}, ..., e_m>.

Finally, the constraint sets of some events in E' have to be modified to preserve the effect of the last definitions. Constraint set maintenance applies in three cases:

1. C_p = C_i: the constraint set of e_p is the same as the constraint set of the preceding event e_i.

2. C_d = C_k ∪ {def(d)}: the constraint set of e_d is the union of the preceding event's constraint set and the set containing the one variable defined at d.

3. C_j = C_j ∪ {def(d)} for k+1 ≤ j ≤ i: def(d) is merged into C_j to keep def(d) from being redefined between e_{k+1} and e_i.
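The insertion and the three maintenance rules can be sketched as a single function. This is one possible reading under stated assumptions: constraint sets are bitmasks, k and i are 1-based positions with 1 <= k <= i <= m, and the rules are applied in the listed order, so C_p takes the value of C_i before rule 3 modifies it. The Event struct and the function insert_events are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    int node;               /* program node n */
    unsigned constraint;    /* constraint set C as a bitmask of variables */
} Event;

/* Builds E' from E (length m) into `out`, which must hold m + 2 events.
   e_d = (d, ...) is inserted right after e_k and e_p = (p, ...) right
   after e_i, with 1 <= k <= i <= m (1-based). def_d is the bitmask of
   the single variable defined at d. Returns the length of E'. */
size_t insert_events(const Event *E, size_t m, size_t k, size_t i,
                     int d, unsigned def_d, int p, Event *out) {
    size_t n = 0;
    for (size_t j = 1; j <= m; j++) {
        Event e = E[j - 1];
        if (j > k && j <= i)
            e.constraint |= def_d;      /* rule 3: C_j = C_j union {def(d)} */
        out[n++] = e;
        if (j == k) {
            /* rule 2: C_d = C_k union {def(d)} */
            Event ed = { d, E[k - 1].constraint | def_d };
            out[n++] = ed;
        }
        if (j == i) {
            /* rule 1: C_p = C_i, read here as the original C_i (assumption) */
            Event ep = { p, E[i - 1].constraint };
            out[n++] = ep;
        }
    }
    return n;
}
```

Applied to the initial sequence <(s, ∅), (5, ∅)> of Figure 2 with k = i = 1, d = 3 and p = 4, the function yields <(s, ∅), (3, {b}), (4, ∅), (5, ∅)>, which matches E_2 in the example.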
