以隨機與邏輯推理運算方式進行高涵蓋率之測試

(1)

國

立

交

通

大

學

資訊科學與工程研究所

碩

士

論

文

以隨機與邏輯推理運算方式進行

高

涵

蓋

率

之

測

試

Automatic Logic Evaluation for Random Testing

研究生：吳孟勳

指導教授：黃世昆教授

(2)

以隨機與邏輯推理運算方式進行高涵蓋率之測試

Automatic Logic Evaluation for Random Testing

研究生：吳孟勳 Student：Meng Hsun Wu

指導教授：黃世昆 Advisor：Shih-Kun Huang

國立交通大學

資訊科學與工程研究所

碩士論文

A Thesis

Submitted to Institute of Computer Science and Engineering College of Computer Science

National Chiao Tung University in partial Fulfillment of the Requirements

for the Degree of Master

in

Computer Science

June 2006

Hsinchu, Taiwan, Republic of China

(3)

以隨機與邏輯推理運算方式進行高涵蓋率

之測試

研究生：吳孟勳

指導教授：黃世昆

摘要

軟體測試所需成本高達總成本的 50%，因此成為軟體發展(Software Development) 過程中的主要瓶頸。現今發展軟體系統的趨勢，趨向於藉由輔助工具(Auxiliary Tool)來降低軟體測試(Software Testing)所需的資源，但現有測試系統大多需要使用者自行設定測試環境(Testing Driver)，測試的效能也隨之影響，主要原因是測試分析工具無法有系統地自動產生測試資料，以證明相關輸入資料可到達特定的執行環境(Program State)，這卻是我們用來印證測試效率及品質最直接的方法。本論文中，我們將以程式邏輯推論為出發點，有系統地產生測試資料(Testing Inputs)以執行所有可能路徑(Possible Paths)。我們實作了一個自動測試工具，稱為 ALERT，能自動分析程式執行時的邏輯條件(Logic Constraints)，以表示執行的語義(Execution Semantics), 藉此可推論剩下可能執行路徑的測試輸入。我們利用 gcov 來分析 ALERT 的測試效率，所自動產生的測試資料可達 90% 的涵蓋率。

(4)

Automatic Logic Evaluation for Random

Testing

Student：Meng

Hsun

Wu Advisor：Shih-Kun Huang

Abstract

Testing has become the bottleneck of software development process due to the cost of

testing process even attributing to 50% of total cost. The trend of contemporary software

development process is toward the use of auxiliary tools to reduce the resource usage of

testing process. However, most testing tools require users to manually produce testing drivers

which have much to do with testing quality. It is due to the lack of automatic input generation

that ensures the existence of the execution path from inputs to the specified program state

with rendered testing condition. Nevertheless, it is our intuitive way to prove the testing

efficiency and testing quality. In this paper, we analyze program semantics based on program

logic reasoning and try to generate testing cases to cover all possible paths. We design and

implement an automatic testing tool, called ALERT, which can analyze logic constraints to

represent semantics of program executions, and reason testing inputs of remaining paths. We

perform coverage test by Gcov [1] to evaluate the testing efficiency of ALERT and reach 90%

(5)

誌謝

我首先必須要感謝我的父母，謝謝他們辛苦撫養我長大，並提供我唸研究所的一切需求。接著，謝謝黃世昆老師，經過老師費心的指導和詳細的討論，才成就這篇論文。還有實驗室的學長們，陪我一起度過在實驗室熬夜的每一個日子。特別感謝 benny、luke 學長提供英文指導，softsweet 在我心情低落時安慰我，以及我身邊的死黨 shen、linkp，還有許多幕後功臣：cocomo、ytls、ji、john、jeff、jay、lwhsu…我以你們為榮。吳孟勳謹誌民國九十五年七月十八日

(6)

List of Figures

Figure 1, General testing process ... - 1 -

Figure 2, Unit testing with inputs generation ... - 3 -

Figure 3, Unit test with model extraction ... - 5 -

Figure 4, ALERT simple example ... - 12 -

Figure 5, ALERT constraints example ... - 12 -

Figure 6, ALERT constraints example ... - 13 -

Figure 7, ALERT traces simulation ... - 14 -

Figure 8, Execute and record logs ... - 15 -

Figure 9, Constraint log and dependency log ... - 15 -

Figure 10, Symbolic evaluate ... - 16 -

Figure 11, Logic evaluate ... - 16 -

Figure 12, ALERT pseudo-code ... - 18 -

Figure 13, Flow chart of symbolic evaluation... - 21 -

Figure 14, Flow chart of logic resolution ... - 23 -

Figure 15, ALERT components ... - 25 -

(9)

List of Tables

Table 1, Manual vs. automated testing ... - 2 -

Table 2, Comparison of former data recreation methods ... - 7 -

(10)

1. Introduction

In this chapter, we first show the software testing process of modern times and the

problem that we are going to solve. An automatic method of testing process is introduced in

next section. Finally, we briefly state our motivation and objective.

1.1. Software Testing

Software quality nowadays seems crucially essential. In software engineering, the most

significant process to ensure software quality is testing. The key point that how to carry

testing process out yet depends on the testing strategies. Figure 1 shows a general type of

testing. It consists of three stages: input testing cases, execute programs and compare outputs

with correct results.

Figure 1, General testing process

In the last stage, we usually perform the comparison of results manually. But it is very

expensive to verify the program outputs manually. The Quality Assurance Institute in 1995

proposed a comparison of manual and automatic testing which introduced about 1750 testing

cases and 700 defects. Table 1 shows the result about time consumption for testing processes

(11)

analyses are 55%, 95% and 50%, respectively.

Table 1, Manual vs. automated testing

Test step Manual testing Automatic testing Percent Improvement

Test plan development 32 40 -25%

Test case development 262 117 55%

Test execution 466 23 95%

Test result analyses 117 58 50%

Defect tracking 117 23 80%

Report creation 96 16 83%

Total hours 1090 277 75%

It encouraged us to develop ALERT, an automatic testing tool that automates these test

steps including test case development, test execution and test result analyses. In order to

generate test cases, ALERT must handle the logic information about program source and

static semantics. In addition to generating test cases, the creation of test interface is required

to automate test execution. To demonstrate all system functionalities in a direct way, we have

to test every component one by one, such a method named unit testing. Finally, due to the

automation of the whole test process, we need to generate test case for the remained paths

(12)

Figure 2, Unit testing with inputs generation

1.2. Test Strategies

Traditionally, test strategies method divided into two classes, static and dynamic analyses,

based on the source code and the runtime execution, respectively. Both have been maturely

developed in modern times. Static analyses examine program source code or binary image

without invoking the programs. It is recommended to apply static analyses to program

systems with complex interaction or heavy running overhead. As a consequence, the static

method is extensively used in large scale software for inconsistency checking. Yet, the static

method could cause large number of false alarms. It is because we can’t verify and perform

any feature revealed through static analyses. Static analyses report too many features that the

practicability of those features is unknown.

Dynamic analyses involve the execution of programs. We apply the dynamic analysis on

programs whose environment can be simply reconstructed and the runtime overheads are

tolerable. Obviously, the difficulties of the dynamic method are test input development.

(13)

satisfies the constraints to perform features. The quality of test cases decides the efficiency of

testing tools. It is an extremely difficult work to manually generate testing inputs upon

thousand of testing runs.

1.3. Motivation

If we can find the input cases for a given code location and environmental content, we

are able to eliminate the false positives of static method by examining the feasibility of each

reported feature. For dynamic method, with the support of test cases which enable programs

to perform the remained paths, we can improve the efficiency of whole analysis process, with

better coverage.

With the automatic generation of particular inputs from another running instance, we can

easily apply debug method with only failure running instances. It is an advanced integration

of testing and debugging that we resolve the program run-time logic and produce special test

cases for debugging tools to make decisions more precisely.

1.3.1. Feasible Inputs

We propose a mechanism to automatically generate test cases and ensure feasibility of

given paths. Static methods check source code consistency, but if we want to suppress false

alarms we must execute programs and check the features reachable. On the other hand,

dynamic methods execute programs many times, and require test inputs to go through all

(14)

program logic to transform programs into special models for of possible test case resolution.

1.4. Objective

Our objective is to extract logic models from execution instances and create test cases

that cover as many as possible control branches by checking logic models, as figure 3 shows.

We focus on C language, which is one of the most popular imperative languages and present

the logic models with rule-based language. Clearly, there is some gap between imperative

language and rule-based language, and it is impossible to transform imperative programs into

equivalent rule-based models. However, we discover that it is feasible to make a one-way

transformation from run-time instances of imperative programs into rule-based ones.

Figure 3, Unit test with model extraction

We perform the test process by reasoning on the program behaviors, similar to manually

trace-back of program logic, interleaving with symbolic evaluation and concrete evaluations.

Two goals will be fulfilled: logic model extraction and run-time predicates resolution.

1.4.1. Logic constraints extraction

(15)

represent the dynamic states upon codes. We are going to build an executable logic model that

can be evaluated exactly like the original program running. There is still one difference: an

executable model doesn't surround by real program environment. To fix this, the incomplete

predicates of logic models would be resolved by symbolic evaluation of runtime traces.

1.4.2. Runtime predicates resolution

What we want to resolve during logic model construction are those variables and

date-structures that exist in concrete memory space, including function arguments took by

procedures and addresses that pointers refer to. Because logic rules can not express the state

transactions of concrete memory space, we mark those predicates as unknown symbols and

resolve their value dependencies by symbolic evaluation.

We can evaluate each line of source code, and substitute the symbolic meaning of

variables. Of course, the resolving operation is not invertible. We expect to show how the next

state comes from according to the current state. It is used to illustrate running instance of

program iteration.

1.5. Synopsis

The rest of our thesis is organized as follows. The related works and state of the art are

reviewed in chapter 2. The overview and basic concept of ALERT are discussed in chapter 3.

Chapter 4 presents the detail and implementation of ALERT. The result and evaluation of our

(16)

2. Related Work

The implementation of program logic resolution spans many topics. Recently researches

focus on causes-transaction [2]. This method inspects into a program execution state, e.g.

breakpoints, tries to find some clues between each state changing, and extracts a cause effect

chain from the blamed site to the faulty site. We are not the first to consider logic

representations. Former research like PREFix[3], infers for logic conflicts by treating program

expressions as logic constrains and program functions as logic sub-models. We will also

introduce other implementations like model extraction[4-7], dynamic instrumentation for

tagged memory operations[8], and conditional toggling test method with full path-coverage

guarantee[9].

2.1. Data Reconstruction for Cause Effect Chains

Let us denote the memory structure occupied by a program as MStruct(t), where t stands

for time. We can conclude that: Zeller’s method [2] automatically creates memory graph

MGraph1 and MGraph2 from MStruct(t1) and MStruct(t2) respectively under the debugger.

Demskey[10] try to reconstruct data structure representation from MStruct(t) with

user-specified constrains and debugging information. Purify[8] searches the whole MStruct(t)

for potential dereference. In more general, we propose a logic representation from source code

and resolve logic predicates with MStruct(t). Table1 shows advantages and disadvantages of

our work and others.

(17)

Zeller[2] Brian[10] Purify[8] ALERT Advantages Automatic Memory Graph Extraction and Differentiation Semi-Automatic Data Structure Construction Memory Tag Annotation Automatic Logic Model Extraction and Resolution

Limitations Require Two Similar MStruct(t) Require User Specified Constraints Require Extra Memory Space Perform Unit Testing for Functions

2.2. Model Checking

The way to model the behavior of programs varies from language properties to checking

purposes. Finite state machine checking is easily understood and specified. Most model

checking methods use FSM to model program systems. Monica Lam’s work [4] creates FSA

model representations from Java. They focus on object interactions and method call sequences.

Method calls are treated as state transitions. The FSA object obtains a new state when a

member function is executed. Metal [6, 7] is a programming language for detecting program

contradictions. Users specify a FSA model in Metal to check inconsistence within system

codes. The FSA model contains states and transition rules that will match any AST nodes of

programs. Metal reports these inconsistent states matched by rules. PREfix[3] have pointed

out that programs can be represented in a logic way. They transform each function into a logic

model which contains logic-based constraints, guards and results. The entire program model

can be built by combination of all sub-models. They search contradictions in program models

(18)

of conditional statements to users. We make comparisons of each model checking methods

and depict in Table2.

Table 3, Comparison of model checking methods

Monica work[4] Metal[6, 7] PREfix[3] ALERT

Model Properties Component-based, Object-Oriented Describing State Transaction with Programming language Subroutine-based, Top Model Contains

Sub-models Logic-Rules-based, Dependencies between Symbols Checking Mechanism Model Extraction for manually testing Specifying Model for Automatic Checking Model Construction for Logic Unification

Following Logic Dependencies for Causes Analysis Type Static and

Dynamic Analysis Static Analysis Static Analysis

Static Analysis and Dynamic Resolution

2.3. Techniques of Fault Localizers

There are many fault localization techniques used in debugging tools. Since 1960s these

techniques have changed and improved [11]. Our research involves two major conceptual

techniques: feasible path generation and logic symbol evaluation. Path generation[12] is an

automatic generation of program paths that lead to user-specified outputs by toggling each

control branch instances, and uses ESC[13] to check feasibility of toggled paths. Symbolic

evaluation [9] is an automatic dependency extraction for unit testing that guarantees full paths

coverage. We use path generation and symbolic evaluation to find associates of inputs and

(19)

2.4. Symbolic Evaluation and Concrete Evaluation

Another tool named DART [9] combines symbolic evaluation with concrete values. It

provides an automatic unit testing data and test interface generation for a specified C function,

and it can ensure to expand as more program control branches as the variety of testing inputs

by evaluation the symbolic expressions of programs and modification of testing data. DART

guarantees all the bugs reported have corresponding testing inputs to reveal. It is a very

attractive contribution for automatic testing. It also deploys lpsolve [14] to resolve linear

symbolic constraints.

CUTE [15], a successive implementation of DART, improves structural testing data

generation and pointer arithmetic. CUTE creates logic map for each pointer argument, which

mapping to its concrete value of some memory location. During execution, logic map updates

(20)

3. ALERT overview

For an example as figure 4, ALERT generate a testing interface and trace the specified

function. The result of these trace is to get the suitable input cases that cover all conditional

branches. There are two control branches in line 2 and 6. In the first run, x and y input values

are randomly created, said -2853 and 416. The program execution path would go through 1, 2,

6, 7, and return. ALERT trap the conditional expressions to show the restrictions of this

execution path. In this case, they would be ‘! ( x > 0 && y >1 ) ’ and ‘( x < 10 ) ’ , as the

guard of the if expressions in line 2 and 6. In the next execution turn, the testing case ALERT

generated would fulfill the branch alternative restrictions which could be ‘( x > 0 && y >1 ) ’

or ‘! ( x < 10)’. For ALERT would take the latest restriction, the input data would set x to

satisfy “no less than 10” in order to change the execution path to go through line 9 instead line

7.

1 int foo( int x, int y ) { //simple c function

2 if( x > 0 && y > 1) { //branch condition for x, y

3 /* … */

4 return 0;

5 }

6 if( x <10 ) // branch condition for x

7 return x;

8 else // branch condition (false)

9 return y;

(21)

Figure 4, ALERT simple example

Here is a more detailed example which illustrates the conditional and dependency

constraints extraction and the dynamic logic resolution. In figure 5, there are two control

guards along the execution paths: (a + b == 5) and (a == 3), and a dependency instance (a --).

During function execution, the control guards make control thread to branch two paths, one

ensures the sum of variable a, b equals 5 and another not. ALERT now can make assurance

that if the control thread takes the guard of (a + b == 5) and (a == 3) the value of variable at

that moment must be 2. But there is still one thing to took into consideration, the dependency

instance (a --) modify the value of variable a, that causes the logic engine misunderstand all

following conditional constraints. To modify constraints according to dependencies, we

(22)

As figure 6 shows, ALERT traces every conditional and dependency expressions and

output as constraint symbols. These constraint symbols are used to forward the modification

of dependencies. During runtime interaction, the execution instance outputs a unique set of

constraints according to the conditional and dependency expression it reaches. We analyze

these constraints to trace back to the initial state of the execution parameters of programs.

Figure 6, ALERT constraints example

In the symbolic aspect, we quest for the concrete values of symbols, and in the logic

aspect, we just reason a set of solutions to satisfy logic constraints rules. We can easily

produce input cases that lead program execution to take different paths by modifying some of

those constraints. Theoretically, we can’t claim different constraints lead execution to

different paths, but it is true that different paths have different constraints. As figure 7, we

(23)

base to avoid duplicated constraints.

Figure 7, ALERT traces simulation

The following figures 8, 9, 10, 11 illustrate how ALERT records program execution and

handles program outputs. In brief, ALERT has three work stations: instrumenter, symbolic

evaluator, and logic resolver. Each station produce input for next the station. Logic resolver

generates input cases for next running iteration. The first input case is randomly generated or

all zero. Until ALERT can’t resolve inputs to satisfy the constraints after modification

(24)

Figure 8, Execute and record logs

(25)

Figure 10, Symbolic evaluate

Figure 11, Logic evaluate

The ALERT analysis structure is different from DART and CUTE. We perform a

transformation from symbol domain to logic domain which enables us to solve bounded-value

(26)

which performs symbolic evaluation and concrete evaluation on every program expression,

dynamic replacement of variables with symbols, maintenances of branch constraints for full

path coverage.

The essential structure of DART is as followed: in the program compilation level, every

memory fetch and restore instruction are enclosed by checking functions to ensure every

operation over variables is monitored along program progress. At this moment, a testing

interface is generated and all possible branch paths are illustrated. DART performs an infinite

loop to execute the program until the program abort abnormally or assertion failed. There is a

threshold for counting program iterations. One program iteration stands for one program run.

DART reports no bug found if a given threshold is broke. The main testing loop begins with

randomly generated inputs. In the execution level, each concrete evaluation is coupled with a

symbolic evaluation. DART bases on the results of symbolic evaluation to resolve inputs

leading another execution path.

ALERT performs instrumentation in the source code level, and enables the program to

recode the conditional constraints and variable transactions. Conditional constraints are

generated while program evaluates a conditional expression. The purpose of evaluating a

conditional expression is to branch or join program control flows. Therefore, ALERT inserts

instrumentation code that contains the branched conditions into every spread branch, instead

of evaluating the conditional expression to calculate the constraint records. ALERT keeps the

(27)

Figure 12, ALERT pseudo-code

Figure 12 is the algorithm of resolving inputs from symbols and constraints.

Conceptually, ALERT does the same things as DART and CUTE. That is to automate testing

process according to program execution, but we do trace the logic in the programs, not the

//F is the function to be tested. Alert(F) :

//S is the set of Symbols. //I is the input set. //V is the set of Variables S <= find-symbols(F). I <= random(); V <= empty. do

//R is the rules generated while execution. // Logic-execution throw Exceptions. R <= Logic-execution(F,I). For-each Rule in R do R <= R - Rule. If Rule = Assignment-Rule Symbolic-evaluation(R,S). Trigger-Rules(R,Rule). If Rule is Constraints If Contradiction(Rule, R-Rule) throw Exceptions. P <= find-association(Rule). Q <= Associate-Rules(Rule, P). R <= R∪{Q}. Logic-evaluation(S, V). I <= map-inputs(V).

(28)

actual values in the memory.

3.1. Instrumentation for Runtime Trace

We can't prove the existence of the transformation model from imperative language to

rule-based language. But we can model a program execution with rule-based language,

because in a program running instance, all variable dependencies have been decided. And also

alias naming can be resolved through a running instance interpretation. We never try to

transform an imperative language program to a rule-based program; instead, we try to model a

dynamic behavior of an imperative language program. In an imperative program, variables are

nodes spread entire program, static analysis try to find the links between the nodes during the

program execution, but we never ensure the links between variables would really happen.

There is a missing portal for a specified link. As a result, we can easily recognize that only

tracing in run-time, can we get the exactly linkage between variables, and guarantee

feasibility 'cause it is actually running.

3.2. Generation of Runtime Constraints

The information provided by a running program is totally different from the program

source code. In running program, all values are ‘active’, which affect each other in a

consequence. And such consequences provide us a clear obvious view of how programs run.

If we want to understand the control behavior of programs, an intuitive way is to locate every

control branches and print the concrete values during execution. It could take overhead to

(29)

and values is still unknown.

We propose another tracing mechanism that prints the whole branch constraints instead

of concrete values. Additionally, we modify the constraints to fit where the program controls

reach, that is, we insert the constraints exactly at the end of branch operation, no the

beginning. It makes a great reduce on the inspection overhead.

3.3. Symbolic Evaluation Framework

We use symbols to represent variables. If the value of some variable A transfers to

another variable B, we can modify the symbolic representation of variable B to A, and A is

still A. If after modification B, another operation modifies A to ‘A + 1’; then the

representations of both B and A become ‘A + 1’. Initially, we have symbols which represent

some variables, constraints that restrict the symbols to some fixed domain, and transaction

(30)

Figure 13, Flow chart of symbolic evaluation

We transfer an execution states into rule-based description. By substitutions between

variables and symbols, we found that we can illustrate the whole running path that is

composed by every single step of program executions by the accumulation of those steps. For

every execution step can be represented by symbolic compositions, we can evaluate the

symbols to reduce to the minimal constraints. Figure 13 shows how to replace symbols with

representations.

Briefly speaking, the basic concept of ALERT symbolic evaluation is mapping of

symbols. Initially, ALERT enforces every variable to map to itself. Every variable is unique

before the execution. After any modification expression, ALERT traces the original symbol of

the modified variable and combines old symbol with new obtained one. ALERT maintains

(31)

mapping, respectively. Once a symbol is modified to another symbol expression, ALERT

evaluates each symbol in the expression and looks up symbol maps for dynamic values. For

relative access, like pointer, ALERT uses composite symbols as a key to look up many-by-one

maps. In special cases, when some symbol operations require multiple relative accesses,

ALERT creates a dummy symbol during the evaluation of relative accesses. For example,

ALERT creates dummy symbols for ‘p’, ‘*p’ and ‘**p’ when dealing with ‘***p’ or creates

‘a’, ‘a []’, and ‘a [] []’ when dealing with ‘a[i] [j]’.

Symbolic evaluation terminates if the program reaches a halt state. The main purpose of

symbolic evaluation is to aggregate a multi-step program run to a single step. It makes

ALERT figure out the execution path in a logic way and obtain other execution paths by

inference on logic rules. After all symbols are resolved to their initial references, ALERT

sends them as inputs to next stage, logic evaluation.

3.4. Automatic Logic Evaluation

It may be impossible to model programs made by imperative language with logic rules,

but it seems to be possible to describe runtime behaviors in logic rules. Because the execution

of programs is unpredictable, we can’t reproduce the transaction of volatile program states.

But it is possible to reverse the behavior of an executed program. We try to solve the

following problem with logic: given rules describing a set of symbols or facts, and rules

contain constraints to restrict the binding of variables and symbols, we expect to unify

(32)

Figure 14, Flow chart of logic resolution

Previous section points out that the transformation from imperative language programs

to rule-based programs is possible if the volatile state is revertible. For this reason, we deploy

symbolic evaluation before logic evaluation. Symbolic evaluation performs dependency

forwarding on symbolic modification, which enables us to trace back the volatile state of

imperative language programs.

The main purpose of logic evaluation is to find a feasible input data according to given

symbolic constraints. Like figure 14, the evaluation process begins with empty logic variables

as input data, initially with bounded range constraints to specify valid domain. For each

symbolic constraint, we refine both right-hand-side and left-hand-side to minimal unit,

eliminate duplicated constraints, and annotate possible boundary to every logic variables. A

(33)

variable are triggered, when the boundary of some variable narrows down. It causes other

variables to get more precise boundaries. However, if the unification results are in conflict

with the constraints of some variables, the inference process restores the changed bindings

(34)

4. Implementation

This sector describes the design and architecture of ALERT. ALERT is composed of

three blocks: instrumenter, symbolic evaluator, and logic resolver. As Figure 15 shows, they

exchange messages through a top module, which is the UI and entry point of ALERT. ALERT

takes program source as input and executes the instrumented instance until error occurs or

ALERT is unable to generate inputs for a different path.

Figure 15, ALERT components

4.1. Instrumenter

The first contact with source code is the instrumenter. It takes program source as inputs

and generates testing interface and testing driver for runtime trace. The followings are design

(35)

4.1.1. Branch Oriented

The cut point is the program branch. Statically, program branches consist of two parts;

one is a conditional statement for checking, and another is some alternative paths to take. It

seems that we have two options: put our code before first part or put them after second part.

Both ways influence the same thing in source code. In the option one, the operation of

instrumentation includes not only the controlling log but also the evaluation of the condition

statement. It is called ‘cut point’. But in the option two, the operation is reduced to only

controlling log of the branch choices. It is a tradeoff between time and space. For performance

consideration, we insert many log operations in each branch choice instead of one log and

evaluation operation in the cut point.

4.1.2. Condition Sensitive

We break composite condition into independent ones. Because different combination of

these conditions leads to different program states even they are in the same basic blocks. With

the support of different program states, we can resolve the execution logic and generate

different path.

4.1.3. Function driven

The smallest unit of program is a function. Based on this, we can analyze each path in

the specified function. Each invocation of a function has its unique logic. Parameters of

(36)

invocation in the simplest way. With short tags labeling the parameters and local variables we

can distinguish variables under different scopes. And by aliasing all passed parameter and

return value, we can preserve the consistency of current program state while function

invocation.

4.2. Symbolic Evaluator

We discard the concrete values of program executions. We use the composted symbols to

explain how the program transfers information by variables. The following are design feature

of the symbolic evaluation.

4.2.1. Paired Symbols

ALERT requires two aspects of view of symbols to perform symbolic evaluation:

symbolic appearance and symbolic propagation. An appearance is what the symbol 'looks like'

(37)

Define an appearance A of some symbol x is that intuitively x is declared as A.

Ex:

int a; //a symbol whose symbolic appearance is ‘a’.

Define a propagation P of some symbol x is that the current state of x is

propagated from P. P can be composed by any symbol in any ways. Ex:

pi = 3.14159, r =10, area=r*r*pi;

box[0]=top(table), box[1]=top(box[1]), box[2]=near(box[0]);

Define a symbol domain S = {As, Ps} where As is a set of all symbolic

appearances, and Ps is a set of all symbolic propagations. The following

(38)

Define a replacement of an appearance A over a propagation X, denoted as Ra(A,

X), is that for every symbol C in A, if C equals X as propagation symbols, we substitute C with As[X].

Ex:

a=3, b=4, c=foo(b, c); foo(int x, int y){

return sqrt(x*x + y*y);

}//in function foo, Ra (As [foo.x,foo.y],[b,c]), As [foo.x]=As [a] and As [foo.y]=As [b]

Define a replacement of a propagation P over a appearance X, denoted as Rp(P,

X), is that for every symbol C in P, if C equals X as appearance symbols , we substitute C with Ps[X].

Ex:

pi = 3.14159, r =10, area=r*r*pi;

//after Rp (Ps [area], r),Ps [area]=10*10*Ps [pi]

Define a symbol replacement of an appearance symbol x over symbolic domain S,

denotes as Ra (S, x), said S’ = Ra (S, x) and S’ = { {a | a in Ps and x in Ps } , { Ra ( p, Ps

[x] | p in Ps } }.

Define a symbol replacement of a propagation symbol x over symbolic domain S,

denotes as Rp (S, x), said S’ = Rp (S, x) and S’ = { {a | a in As and x in As } , { Rp ( p, As

(39)

4.2.2. Dependency

The dependency relations between symbols are forwarding and rewriting. It is the

fundamental concept of symbolic evaluation. The forwarding is a replacement of the symbol

itself. The rewriting is a substitution of the symbol with another symbol.

Ex: x+=y; // after forwarding, P[x] is updated to P[x] + P[y]

fins=stdin; // rewriting the P[fins] as P[stdin]

4.3. Logic Resolver

These logic rules are implemented in SWI-prolog [16]. The followings are design feature

of logic resolver.

4.3.1. Variable

Logic variable is an empty container. It could bind to any possible values. The logic

variables describe an input of testing initially. And it will finally be assigned a feasible value

that satisfies the current constraints. The variables can be mapped to original parameters of

testing function. For some structures or array or pointer type, the variables can represent some

part of composed-type parameters, for example, a month field of a date structure, the 8th

element of an array of real numbers, or the 3rd offset of a pointer. ALERT generates the

(40)

4.3.2. Freezing Expression

Freezing expression is a set of expressions that associates some variables. The

expressions will limit value of the associated variable. For example, the freezing expression of

a positive odd value is like “freeze(X, [X > 0, X % 2 = 1])”. When the variable is bound, logic

engine evaluates its freezing expressions with the binding value. If the result is true, the whole

process proceeds and triggers other variable binding, otherwise the engine would trace back to

previous decision site and make alternative decision. The logic resolving routine terminates

successfully if all variables are bound and their freezing expressions are satisfied.

4.3.3. Domain Range

Domains are used to set the boundary of variables. For variables to have storage type and

specified storage bits, we can set up a signed integer X with 2 bytes length like “X

indomain(-32768..32767)”. For more advanced application, we can enumerate customized

types such as “Color indomain(red, blue, green)”, and it can be extended to display higher

order of abstraction of symbolic evaluation.

4.4. Implementation Difficulties

The main problems we encountered during ALERT implementation are to generate

testing interface and eliminate program state-transaction into stateless logic rules. The former

problem appears in the instrument stage. There are usually various user-defined types

(41)

different user-defined type automatically, we maintain a stack for registration of structural

data-types, a vector for pointers and a list for array type.

The later problem comes from symbolic evaluating stage. It is the key idea for logic

evaluating. Because logic engine doesn’t care the order of rules, but program logic does, we

make some modification on symbol variables to fix the transaction. We ensure that through

symbolic evaluating ALERT can preserve the states and reversibility of program logic.

4.4.1. Generation of User-defined Type Interface

To generate testing interface for automatic execution, we have to create fixed functions

that dedicates to input parameters of user specified types. In the beginning, we try to examine

the length of specified type and flip the bits to initialize. It can reflect the resolved values, but

without field information. The field information tells about the number of sub-elements and

their types. It became a problem that we can’t catch the length of sub-elements during bits

flipping. We build a type sub-system to get rid of it, according to [9]. We base on fundamental

types and construct composed type with point, array, or structure. The inputs function for

testing interface and driver can be rendered with recursive type inheritance.

4.4.2. State-sensitive Variable Renaming

In the beginning, we thought that the renaming of variables is simple. It is easy to

imagine that as it comes to a state modification, we just store in data pool and change the

(42)

decide which the same symbol variable is. A symbol variable could have different names

when it is transferred to another scope. The solution is to exchange the appearance of all

symbols dynamically during identical symbol selection to prevent symbol variable confusing.

(43)

5. Experimental Results

We evaluate our work with zlib [17], a well-known compressing library. We apply

ALERT on zlib and perform coverage evaluation by conjunction of gcc and gcov [1]. Gcov is

a coverage analysis tool that provides coverage statistics include how often each line of code

executes, what lines of code are actually executed, and how much computing time each

section of code uses.

5.1. Instrumentation and Generation Overhead

The space overhead of ALERT instrumentation is one line-of-code for a significant

branch. We insert one line of log trace code into the beginning of a branch block. This results

in that our space overhead is in proportion to the conditional statements in programs. The

execution overhead of instrumentation is an additional record operation for each conditional

evaluating, so one running instance may have extra operations in proportion to the condition

statements along the paths.

The testing cases generation overhead is one logic variable for an effective function

parameter, but if parameters contain sub-field, logic engine may require more than one logic

variable to resolve symbols. We generate one logic script for the next running instance, so the

total testing script generation of ALERT application is as many as the found possible paths.

5.2. Expand and Explore

(44)

discovery. The discovery algorithm records constraints of previous running instance in

database and produces constraints that differ from those in database by negating some of the

constraints. In the running instance, the program itself may reach new branch that was not

visited before in previous instance. The trace log may contain new constraint entry for

negation. ALERT expand the program branches and explore the newly expended node at the

same time.

5.3. Similar Path Discovery

The ALERT path discovery algorithm firstly negates the last constraints, resulting in the

path of next successful instance which is similar with current paths. The purpose is for

generating similar inputs for testing quickly.

Figure 16, Coverage analysis

(45)

displays the coverage percentage of running iterations accumulatively. The y axis displays the

count of successful generated inputs against a program. Initially, the inputs for first running

instance are zero, so the coverage percentage is about 10% ~ 15%. It could tell that empty

input path contains 15% of LOC. At the end of testing, all program’s coverage accumulated

about 70% ~ 90%. It is because some conditional ALERT couldn’t resolve. Some of them

may be pointer arithmetic or array element cross reference. We can deal with pointer

arithmetic in symbolic evaluation, by constructing a nameless symbol to represent calculated

pointer. But the logic engine can’t take any variable without a name to reference, so does

array reference. It also a limitation of ALERT, that it can’t work in functions take all

parameter as pointer or array. This would leave the room for our future improvement

(46)

6. Conclusion

Testing is a time consuming process and hard to automate. Its main purpose is to make

sure every functionality works. The simplest suggestion is to “show it wrong or right”. If we

do the testing manually, we could lose in the labyrinthine program execution states. To

automate the testing processes, it is essentially important to direct the testing program. We

divide ALERT into three stages: instrumenter, symbolic evaluator, and logic resolver. Each

stage will generate inputs for next stage. A control top module exchanges messages between

every stage. It is a better architecture for testing duty.

Program states consist of thousands of variables and their combinations are

unpredictable. We know it is impossible to enumerate program states through the values of

variables. So we try an alternative way to enumerate program logic. It is much feasible that a

program behavioral representation can be considered as the combination of its logic

constraints. There are many related work that take advantage of program logic. ALERT is the

first application for dynamic analysis.

We discover that program logic is reversible under some circumstances. The reversible

states enable ALERT to solve constraints and produce feasible inputs that lead program to

perform a different execution path. By recording constraints of each running instance, we can

enforce the program to perform all of its possible paths.

(47)

ALERT is just a beginning. For the different testing strategy, such as integration testing,

we can enhance ALERT in the component interaction aspect, which enable ALERT to resolve

more complicated program logic. In addition to simulating the manual input operation,

ALERT could be extended to generate specified environments for exhibitions of security

issues by modification of the messages transmitting between the logic resolver and symbolic

evaluator layers.

6.1.1. Feasible Input Generation

We desire a tool that tells when and how a defect or fault could occur, especially when

we don’t have enough information to prove the existence of the feature. ALERT is just a

beginning of that. It automatically resolves specified program logic (or called constraint rules)

and produces feasible inputs to render program in the way we want. Not only single function,

but integration constrains can be evaluated as a collection of symbols. We look forward to a

way of understanding the program semantics automatically. This method would become a

new trend in the field of automatic testing to deal with every logic feature in program.

6.1.2. Delta Debugging

In the former software debugging process, we are familiar with getting in the reference

between source codes and runtime interaction. To find out what is going wrong behind

programs, it would be convenient for programmers to automate the process that traces

programs without manual assistance. Delta debugging[2] is a latest debugging method that

(48)

debugging requires the comparison of a failure program run with a successful one. Moreover,

the failing program execution characteristics do not actually differ much from the successful

program execution. Delta debugging eliminates the common characteristics through

comparison of value transaction and cause chain of two runs. Nevertheless, in real cases, we

only have program failing runs. That is a tough work to determine the similarity of failing run

and successful runs with which we can apply Delta debugging method.

References

[1] "gcov: a Test Coverage Program http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html,"

[2] H. Cleve and A. Zeller, "Locating causes of program failures," in ICSE '05: Proceedings

of the 27th International Conference on Software Engineering, 2005, pp. 342-351.

[3] W. R. Bush, J. D. Pincus and D. J. Sielaff, "A static analyzer for finding dynamic

programming errors," SPE, vol. 30, pp. 775-802, 2000.

[4] J. Whaley, M. C. Martin and M. S. Lam, "Automatic extraction of object-oriented

component interfaces," in ISSTA '02: Proceedings of the 2002 ACM SIGSOFT International

Symposium on Software Testing and Analysis, 2002, pp. 218-228.

[5] D. Jackson, "Aspect: detecting bugs with abstract dependences," ACM Trans. Softw. Eng.

Methodol., vol. 4, pp. 109-145, 1995.

[6] D. Engler, D. Y. Chen, S. Hallem, A. Chou and B. Chelf, "Bugs as deviant behavior: A

general approach to inferring errors in systems code," in SOSP '01: Proceedings of the

(49)

[7] S. Hallem, B. Chelf, Y. Xie and D. Engler, "A system and language for building

system-specific, static analyses," in PLDI '02: Proceedings of the ACM SIGPLAN 2002

Conference on Programming Language Design and Implementation, 2002, pp. 69-82.

[8] H. R. and J. B., "Purify: Fast Detection of. Memory Leaks and Access Errors." Winter

USENIX. Conference, 1992.

[9] P. Godefroid, N. Klarlund and K. Sen, "DART: Directed automated random testing," in

PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2005, pp. 213-223.

[10] B. Demsky and M. Rinard, "Data structure repair using goal-directed reasoning," in ICSE

'05: Proceedings of the 27th International Conference on Software Engineering, 2005, pp.

176-185.

[11] H. Agrawal and E. H. Spafford, "Bibliography on debugging and backtracking,"

SIGSOFT Softw. Eng. Notes, vol. 14, pp. 49-56, 1989.

[12] T. Wang and A. Roychoudhury, "Automated path generation for software fault

localization," in ASE '05: Proceedings of the 20th IEEE/ACM International Conference on

Automated Software Engineering, 2005, pp. 347-351.

[13] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe and R. Stata,

"Extended static checking for java," in PLDI '02: Proceedings of the ACM SIGPLAN 2002

Conference on Programming Language Design and Implementation, 2002, pp. 234-245.

[14] lp_solve http://lpsolve.sourceforge.net/

[15] K. Sen, D. Marinov and G. Agha, "CUTE: A concolic unit testing engine for C," in

ESEC/FSE-13: Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, pp. 263-272.

[16] SWI-Prolog -- an LGPL comprehensive portable Prolog compiler. http://www.swi-prolog.org/

(50)

[17] zlib -- A Massively Spiffy Yet Delicately Unobtrusive Compression Library.

以隨機與邏輯推理運算方式進行高涵蓋率之測試

國

立

交

通

大

學

資訊科學與工程研究所

碩

士

論

文

以 隨 機 與 邏 輯 推 理 運 算 方 式 進 行

高

涵

蓋

率

之

測

試

Automatic Logic Evaluation for Random Testing

研 究 生：吳孟勳

指導教授：黃世昆 教授

以隨機與邏輯推理運算方式進行高涵蓋率之測試

Automatic Logic Evaluation for Random Testing

研 究 生：吳孟勳 Student：Meng Hsun Wu

指導教授：黃世昆 Advisor：Shih-Kun Huang

國 立 交 通 大 學

資 訊 科 學 與 工 程 研 究 所

碩 士 論 文

以隨機與邏輯推理運算方式進行高涵蓋率

之測試

研 究 生：吳孟勳

指導教授：黃世昆

摘要

Automatic Logic Evaluation for Random

Testing

Student：Meng

Hsun

Wu Advisor：Shih-Kun Huang

Abstract

誌謝

Table of Contents

List of Figures

List of Tables

1. Introduction

1.1. Software Testing

1.2. Test Strategies

1.3. Motivation

1.3.1. Feasible Inputs

1.4. Objective

1.4.1. Logic constraints extraction

1.4.2. Runtime predicates resolution

1.5. Synopsis

2. Related Work

2.1. Data Reconstruction for Cause Effect Chains

2.2. Model Checking

2.3. Techniques of Fault Localizers

2.4. Symbolic Evaluation and Concrete Evaluation

3. ALERT overview

3.1. Instrumentation for Runtime Trace

3.2. Generation of Runtime Constraints

3.3. Symbolic Evaluation Framework

3.4. Automatic Logic Evaluation

4. Implementation

4.1. Instrumenter

4.1.1. Branch Oriented

4.1.2. Condition Sensitive

4.1.3. Function driven

4.2. Symbolic Evaluator

4.2.1. Paired Symbols

4.2.2. Dependency

4.3. Logic Resolver

4.3.1. Variable

4.3.2. Freezing Expression

4.3.3. Domain Range

4.4. Implementation Difficulties

4.4.1. Generation of User-defined Type Interface

4.4.2. State-sensitive Variable Renaming

5. Experimental Results

以隨機與邏輯推理運算方式進行

研究生：吳孟勳

指導教授：黃世昆教授

研究生：吳孟勳 Student：Meng Hsun Wu

國立交通大學

資訊科學與工程研究所

碩士論文

研究生：吳孟勳