Flow analysis of class relationships for object-oriented programs


Received December 17, 1997; revised April 13, 1998; accepted August 20, 1998. Communicated by Y. S. Kuo.


JIUN-LIANG CHEN AND FENG-JIAN WANG

Department of Computer Science and Information Engineering National Chiao Tung University

Hsinchu, Taiwan 300, R.O.C.

Program analysis techniques have been widely applied in various fields of software engineering, such as debugging, testing, and proof of simple correctness properties. In object-oriented (OO) programs, inheritance, association, and aggregation relationships may introduce complicated dependencies concealed within classes that might obstruct program analysis. This paper proposes a class relationship flow model to support analysis of the inheritance, association, and aggregation relationships between classes. The flow model consists of three flows, inheritance, association, and aggregation flows, corresponding to these relationships. A sequence of class relationships is represented as a flow path from one class to another. Along a flow path, each member within a class is associated with an operation, define or use, to represent whether its status is changed or referenced. Thereby, the concealed dependencies introduced by class relationships can be analyzed according to the flow operations. The analysis might be used as a technique for program understanding, anomaly detection, and program testing.

Keywords: program analysis, class relationship, object-oriented, software engineering, program dependence graph

1. INTRODUCTION

Program analysis predicts some properties of the dynamic behavior of a program statically. It has been extensively used to enable various optimizations and transformations in compilers. Program analysis techniques are also widely applied in software engineering, such as in static debugging, testing, proof of simple correctness properties, and so forth [1]. The object-oriented (OO) paradigm for software development has gained momentum and popularity over the years. The paradigm provides the features of object abstraction, encapsulation, inheritance, and polymorphism for program construction. More and more OO components and class libraries have become available. These libraries encourage program reuse for building (large-scale) software systems, and the analysis of OO components for reuse, testing, debugging, and maintenance is becoming important. Most program analyzers focus on detecting features likely to be bugs for a single class in light of the specific syntax of a language [2]. Nevertheless, an appropriate model for analyzing class libraries is still lacking.

In OO programs, a class encapsulates attributes and methods (both are also called members) as the state and behavior of its instances (objects). The details of a program are often hidden deep inside its classes. The relationships between classes include inheritance, association, and aggregation [3]. An inheritance relationship between one class and another, called the subclass, means that the subclass can possess members of the former class. An association relationship between two classes, where one is associated with the other, means that the latter class’s members can be used in the former. An aggregation relationship between two classes means that an instance of one class is a part of an instance of the other. In each case, the relationship implies that one class can propagate some information (e.g., methods or attributes) to the other via the relationship. This propagation can be transitive via a sequence of relationships over several classes. That is, class relationships may introduce dependencies concealed within classes. The class diagrams used in most OO methodologies [4, 5] are too coarse-grained to describe the concealed information propagated via these class relationships. For example, from Fig. 1, one observes that “class B inherits from class A” and “class E inherits from class B.” However, one may not know which members defined in A and B are inherited by E. As for association, one can see that “class A can be associated with class S by invoking method draw() in S.” If the invocation is made by means of a polymorphic message, the methods implemented in classes T and U can potentially be invoked. That is, A has implicit association relationships with T and U. Based on aggregation, it is obvious that “class Y encapsulates class X’s instance as its attribute” and “X encapsulates class E’s instance as its attribute.” The figure does not reveal “which members in E and X are accessible in Y.” Such information, propagated implicitly via class relationships, might increase the difficulty of understanding, debugging, and testing OO programs [6-8].

Fig. 1. An example of a class diagram.

In [9], an inheritance flow model was presented to reveal members propagated among classes via inheritance relationships. The model does not consider the other two class relationships that might incur implicit dependencies among classes. In order to analyze these implicit dependencies, this paper extends the inheritance flow model into a class relationship flow model, which consists of inheritance, association, and aggregation flows used to represent the implicit propagation introduced by class relationships. The association flow describes the method invocations and attribute accesses along a sequence of association relationships. The aggregation flow describes the members of a class that are accessible along a sequence of aggregation relationships.


In the class relationship flow model, a sequence of class relationships is represented as a flow path class by class. Each member within a class is associated with an operation, define or use, to describe whether its status is changed or referenced along a flow path. Hence, the implicit information propagated among classes can be regarded as flowing along these flow paths of class relationships. This notation is the same as that of traditional data flow [10]. To illustrate the flow model in practice, Java [11] is used in the sample programs throughout this paper. This flow model can be used in program analysis techniques in several applications, e.g., program understanding, anomaly detection, and program testing.

The remainder of this paper is organized as follows. Section 2 reviews related work on flow analysis. Section 3 proposes a class relationship flow model that consists of inheritance, association, and aggregation flows. Section 4 presents the analysis of class relationship flow. Section 5 discusses several applications of flow analysis. In the final section, we draw conclusions and suggest directions for future work.

2. BACKGROUND

2.1 Preliminary Definitions

In object-oriented languages, the definition of a member (i.e., a method or attribute) in a class can be divided into signature and body parts, where the signature denotes the member declaration, and the body denotes the member implementation. A member can be denoted as (t, n, p, b), where t is a member type (a returned value type for a method), n is a member name or identifier, p is a list of formal parameters (none for an attribute), and b is a member body. A member signature consists of t, n, and p. A member is abstract or pure virtual when it has no body, i.e., b is null. Let M1 = (t1, n1, p1, b1) and M2 = (t2, n2, p2, b2) be two members. The signatures of M1 and M2 are identical if t1 = t2, n1 = n2, and p1 = p2. These two signatures are indistinguishable if t1 ≠ t2, n1 = n2, and p1 = p2.

In a class, let a member d be inherited from a superclass, and let a member d′ be specified in the class definition. The signature of d′ overrides that of d when the two signatures are indistinguishable. The body of d′ will override that of d in the class when the two signatures are indistinguishable or identical. For example, in the program shown in Fig. 2, the signature of member id in class Vehicle and that in class Car are indistinguishable because one is of the int type, and the other is of the String type. The signatures of move() in the two classes are identical. In this case, id in Car overrides the signature and body of that inherited from Vehicle whereas move() in Car overrides only the body of that inherited from Vehicle.

class Vehicle {
    public int id;
    public void move() { /* body of Vehicle move */ }
}

class Car extends Vehicle {
    public String id;
    public void move() { /* body of Car move */ }
}
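The two signature relations defined above can be made concrete in a few lines of Java. The following is only a minimal sketch: the class `MemberSig` and its field names are hypothetical and simply mirror the signature part (t, n, p) of a member; the body b is left out because it plays no part in either comparison.

```java
import java.util.List;

// Hypothetical helper (not from the paper) mirroring the signature (t, n, p) of a member.
final class MemberSig {
    final String type;          // t: member type (return type for a method)
    final String name;          // n: member name
    final List<String> params;  // p: formal parameter types (empty for an attribute)

    MemberSig(String type, String name, List<String> params) {
        this.type = type;
        this.name = name;
        this.params = params;
    }

    // Identical: t1 = t2, n1 = n2, and p1 = p2.
    static boolean identical(MemberSig a, MemberSig b) {
        return a.type.equals(b.type) && a.name.equals(b.name) && a.params.equals(b.params);
    }

    // Indistinguishable: t1 differs from t2, while n1 = n2 and p1 = p2.
    static boolean indistinguishable(MemberSig a, MemberSig b) {
        return !a.type.equals(b.type) && a.name.equals(b.name) && a.params.equals(b.params);
    }

    public static void main(String[] args) {
        MemberSig vehicleId = new MemberSig("int", "id", List.of());
        MemberSig carId = new MemberSig("String", "id", List.of());
        MemberSig vehicleMove = new MemberSig("void", "move", List.of());
        MemberSig carMove = new MemberSig("void", "move", List.of());
        System.out.println(indistinguishable(vehicleId, carId)); // prints true
        System.out.println(identical(vehicleMove, carMove));     // prints true
    }
}
```

Under these assumptions, id in Car overrides both signature and body, while move() in Car overrides only the body, which matches the reading of Fig. 2 given in the text.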


To represent the structure of OO programs, we define a graph, called a Program Structure Graph (PSG). A PSG is a multi-digraph, where vertices represent classes, interfaces, methods, and attributes, and multiple edge sets represent class inheritance, interface inheritance, public-/protected-/private-memberships, declaration, and method invocation, respectively. The vertex and multiple edge sets were first discussed in [12]. A program structure graph for an OO program can, thus, be defined as follows.

Definition 2.1: Let P be an OO program. A Program Structure Graph (PSG) of P is defined as $G_{PSG}(P) = (V, E)$, where:

1. $V = V_c \cup V_i \cup V_m \cup V_a$, where

• $V_c$ is a set of vertices representing classes,

• $V_i$ is a set of vertices representing interfaces,

• $V_m$ is a set of vertices representing methods, and

• $V_a$ is a set of vertices representing attributes.

2. $E = (E_{ext}, E_{imp}, E_{pub}, E_{pro}, E_{pri}, E_l, E_m)$, where

• $E_{ext} \subseteq V_c \times V_c$ is a set of edges from a class to its immediate subclass representing class inheritance,

• $E_{imp} \subseteq V_i \times (V_c \cup V_i)$ is a set of edges from an interface to its immediate subclass or subinterface representing interface inheritance,

• $E_{pub} \subseteq (V_m \cup V_a) \times (V_c \cup V_i)$ is a set of edges from a member to its definition class representing public-membership relationships,

• $E_{pro} \subseteq (V_m \cup V_a) \times (V_c \cup V_i)$ is a set of edges from a member to its definition class representing protected-membership relationships,

• $E_{pri} \subseteq (V_m \cup V_a) \times (V_c \cup V_i)$ is a set of edges from a member to its definition class representing private-membership relationships,

• $E_l \subseteq V_c \times V_a$ is a set of edges from a class to an attribute representing declaration dependencies [12], and

• $E_m \subseteq (V_m \cup V_a) \times V_m$ is a set of edges from a member to a method that accesses the member directly.

The subsequent definition for a PSG is for convenience in presenting our model.

Definition 2.2: Let $e_1, e_2, \ldots, e_k$ be a set of edges in a PSG. $(e_1, e_2, \ldots, e_k)$ is a path in the graph if and only if the terminal vertex of $e_i$ is the initial vertex of $e_{i+1}$ for $1 \le i \le k - 1$. Let $v_I$ be the initial vertex of $e_1$, and let $v_T$ be the terminal vertex of $e_k$. The path from $v_I$ to $v_T$ is denoted as $v_I \to v_T$ for short. For a path $v_a \to v_b = (e_1, e_2, \ldots, e_n)$, $v_a \xrightarrow{E_x \cup E_y \cup \cdots \cup E_z} v_b$ means that $\forall j$, $1 \le j \le n$, $e_j \in E_x \cup E_y \cup \cdots \cup E_z$.
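Definitions 2.1 and 2.2 can be read as a small data structure plus a path predicate. The sketch below is one possible encoding, not the authors' implementation: the names `PsgSketch`, `EdgeSet`, `Edge`, `isPath`, and `isPathWithin` are hypothetical, vertices are represented by plain strings, and treating the empty edge sequence as "not a path" is an assumption the definition does not settle.

```java
import java.util.List;
import java.util.Set;

// Hypothetical encoding of PSG edges (Definition 2.1) and paths (Definition 2.2).
final class PsgSketch {
    // One constant per edge set of Definition 2.1: Eext, Eimp, Epub, Epro, Epri, El, Em.
    enum EdgeSet { EXT, IMP, PUB, PRO, PRI, L, M }

    static final class Edge {
        final String from;   // initial vertex
        final String to;     // terminal vertex
        final EdgeSet set;   // edge set the edge belongs to

        Edge(String from, String to, EdgeSet set) {
            this.from = from;
            this.to = to;
            this.set = set;
        }
    }

    // (e1, ..., ek) is a path iff the terminal vertex of e_i is the initial vertex of e_{i+1}.
    // Assumption: an empty sequence is not considered a path.
    static boolean isPath(List<Edge> edges) {
        for (int i = 0; i + 1 < edges.size(); i++) {
            if (!edges.get(i).to.equals(edges.get(i + 1).from)) {
                return false;
            }
        }
        return !edges.isEmpty();
    }

    // A path labelled with a union of edge sets may use only edges drawn from that union.
    static boolean isPathWithin(List<Edge> edges, Set<EdgeSet> allowed) {
        return isPath(edges) && edges.stream().allMatch(e -> allowed.contains(e.set));
    }

    public static void main(String[] args) {
        // Two class-inheritance edges: C0 -> C1 -> C2.
        List<Edge> path = List.of(
                new Edge("C0", "C1", EdgeSet.EXT),
                new Edge("C1", "C2", EdgeSet.EXT));
        System.out.println(isPathWithin(path, Set.of(EdgeSet.EXT, EdgeSet.IMP))); // prints true
        System.out.println(isPathWithin(path, Set.of(EdgeSet.PUB)));              // prints false
    }
}
```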

From Program I (see Fig. 3), we can construct a PSG as shown in Fig. 4. The classes for primitive data types (e.g., char, integer, etc.) are omitted in a PSG.

class Rec {
    public long array[] = new long[200];
}

class C0 {
    public int stateC = 0;
}

interface FunPack {
    abstract public void funC();
}

class C1 extends C0 implements FunPack {
    public Rec nameList = new Rec();
    public void funC() { /* ... */ }
}

class C2 extends C1 {
    private S0 objS = new S0();
    public void setup() { objS.setS(); }
}

class S0 {
    private int stateS;
    public void setS() { stateS = 0; }
}

Fig. 3. Program I.

Fig. 4. The PSG of Program I.

2.2 Related Work

Data flow analysis is a technique used to ascertain and collect information about define, use, and kill operations on variables in a program (e.g., [10, 13]). A variable is defined in a statement if a value is assigned to the variable. A variable is used if the variable is referenced in a statement. A variable is killed if the value of the variable is no longer available. Conventional data flow analysis often represents a program as a control flow graph, where vertices denote statements, and edges denote execution sequences between statements. Along an execution path in the graph, the flow information for a variable can be expressed as a sequence of operations. The flow information is useful not only for optimizing and parallelizing compilers [14], but also for many program analysis techniques (e.g., program dependency graphs [15, 16], program slicing [17], ripple effect analysis [18], and so on). In OO programs, a variable represents an object with its own state and behavior defined in a class, rather than data only. The data flow information alone is insufficient to analyze the information implicitly propagated via class relationships.

Sudholt and Steigner [19] extended an inter-procedural data flow analysis algorithm for OO programs. Their approach first thoroughly decomposes an object into a set of procedures and global variables and then performs an inter-procedural data flow analysis algorithm (e.g., [20]) on these variables with the procedures. It focuses on low-level data flow information for optimization and parallelization of compilation. In [21], Hierarchical Data Flow Analysis (HDFA) was proposed to explore data flow information in OO programs for three hierarchical layers: classes, objects, and attributes. The HDFA consists of class flow, object flow, and attribute flow, in which three kinds of operations, kill, define, and use, are used to describe the states of attributes, objects, and classes for each of the hierarchical layers. As in traditional data flow analysis, an object flow and a class flow are derived from the state change of the variables when a number of messages are executed. The extended inter-procedural data flow analysis and HDFA focus on the state change of program execution, rather than on class relationships.

Kung in [3] proposed an object relation diagram (ORD) to represent the relationships between classes with inheritance, association, and aggregation for change impact analysis during regression testing. An example of ORD is shown in Fig. 1. The drawback of ORD is that it is too coarse-grained for exploring implicit propagation among classes via class relationships.

3. A FLOW MODEL FOR CLASS RELATIONSHIPS

This section presents a class relationship flow model that consists of inheritance, association, and aggregation flows used to express the implicit information propagated among classes via inheritance, association, and aggregation relationships.

3.1 Inheritance Flow

Classes in an OO program can be structured as a hierarchy via inheritance relationships. In a class hierarchy, a class can either use the members inherited from its superclasses without explicit declaration, or it can redefine them. Both the signature and body of an inherited member can be propagated via inheritance, and they are called signature-inheritance and body-inheritance, respectively.

In [9], an inheritance flow model was proposed to describe the members of inheritance in a class hierarchy. In this model, each member is associated with a pair of operations which stand for its signature and body defined or used in a class. For signature/body-inheritance, the operations on a member are defined as follows.

Definition 3.1: In inheritance flow, an operation on the signature of a member for a class or interface is either a signature-inheritance define (Dsi) or signature-inheritance use (Usi).

1. A Dsi on a member means that the signature of the member is declared in the class or interface originally.

2. A Usi on a member means that the body of the member is implemented in the class, and that the member signature is inherited from a superclass.

A member with a Dsi indicates that the member signature is not inherited from a superclass. An operation on a member is a Usi when the class overrides the member body inherited from a superclass.

Definition 3.2: In inheritance flow, an operation on the body of a member for a class is either a body-inheritance define (Dbi), body-inheritance use (Ubi), or null (Nbi).

1. A Dbi on a member means that the body of the member is newly defined or redefined in the class.

2. A Ubi on a member means that the body of the member exists but is not specified in the definition of the class.

3. An Nbi on a member means that neither a Dbi nor a Ubi applies to the body of the member, i.e., the body does not exist.

A Dbi on a member indicates that the member body is implemented or re-implemented, while a Ubi means that the member body is inherited from a superclass, no matter whether or not it is used in a class. An Nbi on a member means that the member is abstract.

To simplify discussion as in [9], we can define a flow graph, called a class relationship graph (CRG), so as to represent inheritance, association, and aggregation relationships for our flow model. The CRG is extended from the PSG with vertex tags. A class relationship flow graph for a program can, thus, be defined as follows.

Definition 3.3: A Class Relationship Graph (CRG) of an OO program P is defined as $G_{CRG}(P) = (V, E, T)$, where:

1. $(V, E)$ is $G_{PSG}(P)$.

2. $T = \{\langle t_{si}, t_{bi} \rangle \mid t_{si} \in \{D_{si}, U_{si}, \varepsilon\} \text{ and } t_{bi} \in \{D_{bi}, U_{bi}, N_{bi}, \varepsilon\}\}$ is a set of vertex tags. $T(X) = \langle X.t_{si}, X.t_{bi} \rangle$ is a pair of vertex tags associated with vertex X, where $X.t_{si}$ and $X.t_{bi}$ represent the operations of signature-inheritance and body-inheritance on X, respectively.

In a class, a member associated with <Usi, Ubi> or <Usi, Nbi> implies that it is inherited from a superclass or superinterface. Since the member is not defined for the class, there is no vertex with vertex tag <Usi, Ubi> or <Usi, Nbi> in CRG. Therefore, T(X) in CRG is either <Dsi, Dbi>, <Dsi, Nbi>, or <Usi, Dbi>. Note that if an OO language provides public, protected, and private inheritances, then the CRG needs different inheritance edges to represent them. Here, our target language, Java, allows public inheritance only; the other two inheritances and related work are not discussed.


An example of a CRG is shown in Fig. 5. The CRG is constructed from Program I. The pairs of vertex tags denoting the flow operations are attached to the bottom of each method and attribute vertex. The vertex tags for the vertices without corresponding inheritance flow operations are not shown in the figure.

For example, method funC() in interface FunPack shown in Fig. 3 is associated with <Dsi, Nbi>, since it is abstract. For attribute stateC in C0, its associated operation is <Dsi, Dbi> because its signature and body have been defined. In class C1, method funC()’s body is defined, but the signature is inherited from FunPack. Therefore, funC() in C1 is associated with <Usi, Dbi>. Class C2 possesses stateC and funC() inherited from its superclasses implicitly. These two members are available in C2; the operations on them are <Usi, Ubi>, and they do not appear in the program context of C2.
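The tag pairs in this example follow mechanically from Definitions 3.1 and 3.2. The sketch below shows one way to derive them; it is an illustration under stated assumptions, not the authors' algorithm. The class `InheritanceTags` and the three boolean inputs (whether the signature is inherited, whether a body is specified in the class, and whether a body is inherited) are hypothetical names introduced here.

```java
// Hypothetical sketch: derive the tag pair <tsi, tbi> of Definitions 3.1 and 3.2.
final class InheritanceTags {
    enum Tsi { Dsi, Usi }
    enum Tbi { Dbi, Ubi, Nbi }

    // Dsi: the signature is declared originally; Usi: it comes from a superclass or superinterface.
    static Tsi signatureTag(boolean signatureInherited) {
        return signatureInherited ? Tsi.Usi : Tsi.Dsi;
    }

    // Dbi: body specified in this class; Ubi: body exists but is inherited; Nbi: no body at all.
    static Tbi bodyTag(boolean bodySpecifiedHere, boolean bodyInherited) {
        if (bodySpecifiedHere) {
            return Tbi.Dbi;
        }
        return bodyInherited ? Tbi.Ubi : Tbi.Nbi;
    }

    static String tags(boolean signatureInherited, boolean bodySpecifiedHere, boolean bodyInherited) {
        return "<" + signatureTag(signatureInherited) + ", "
                + bodyTag(bodySpecifiedHere, bodyInherited) + ">";
    }

    public static void main(String[] args) {
        System.out.println(tags(false, false, false)); // funC() in FunPack: prints <Dsi, Nbi>
        System.out.println(tags(false, true, false));  // stateC in C0:      prints <Dsi, Dbi>
        System.out.println(tags(true, true, false));   // funC() in C1:      prints <Usi, Dbi>
        System.out.println(tags(true, false, true));   // funC() in C2:      prints <Usi, Ubi>
    }
}
```

Only the first three combinations appear as vertex tags in a CRG; the last one describes an inherited member that has no vertex of its own in the inheriting class.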

An inheritance flow path from one class to another in CRG can be specified as in the following definition.

Definition 3.4: Let $q_1$ and $q_2$ be two class or interface vertices in a CRG. A flow path from vertex $q_1$ to vertex $q_2$ is an inheritance flow path, denoted as $q_1 \xrightarrow{IHF} q_2$, iff one of the following holds:

1. $q_1 \to q_2 \in E_{ext} \cup E_{imp}$; or

2. $\exists \alpha$, $\alpha \in V_c \cup V_i$, such that $q_1 \to \alpha \in E_{ext} \cup E_{imp}$ and $\alpha \xrightarrow{IHF} q_2$.

For example, an inheritance flow path from class C0 to class C2 is “$\text{C0} \xrightarrow{E_{ext}} \text{C1} \xrightarrow{E_{ext}} \text{C2}$”, shown as the bold, grey arrow in Fig. 5.
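The recursive definition of an inheritance flow path amounts to reachability over the inheritance edges. The following sketch checks it with an explicit worklist; the class `InheritanceFlowPath`, the map `INHERITANCE_EDGES`, and the method name are hypothetical, and the edge data is a hand-written encoding of the Eext and Eimp edges of Program I.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Definition 3.4 as graph reachability.
final class InheritanceFlowPath {
    // Eext and Eimp edges of Program I: each key points to its immediate subclasses.
    static final Map<String, List<String>> INHERITANCE_EDGES = Map.of(
            "C0", List.of("C1"),
            "FunPack", List.of("C1"),
            "C1", List.of("C2"));

    // True iff a sequence of one or more inheritance edges leads from q1 to q2.
    static boolean hasInheritanceFlowPath(Map<String, List<String>> edges, String q1, String q2) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(q1);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String next : edges.getOrDefault(current, List.of())) {
                if (next.equals(q2)) {
                    return true;             // case 1 (direct edge) or the end of case 2
                }
                if (seen.add(next)) {
                    work.push(next);         // case 2: continue from the intermediate vertex
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasInheritanceFlowPath(INHERITANCE_EDGES, "C0", "C2"));      // prints true
        System.out.println(hasInheritanceFlowPath(INHERITANCE_EDGES, "FunPack", "C2")); // prints true
        System.out.println(hasInheritanceFlowPath(INHERITANCE_EDGES, "C2", "C0"));      // prints false
    }
}
```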

Fig. 5. The CRG of Program I. (Legend: a pair of vertex tags for inheritance flow operations; inheritance flow path; association flow path; aggregation flow path.)


3.2 Association Flow

An association relationship between two classes denotes that the execution of a method in an instance of one class might send a message to an instance of the other class to invoke the corresponding method. For a class, its method body includes the messages sent to its parameters (if they exist), class attributes, and global objects. These messages, whose receivers could be the instances of other classes, thus increase the number of method executions. Again, the executions will incur message passing. That is, a method might be invoked by a number of methods for execution along a sequence of association relationships of classes. The method invocation sequences along with association relationships constitute the association flow. A sequence of association relationships is called an association flow path.

Along an association flow path, a set of members might be invoked (or accessed) by a message. This can be described by the flow operations defined below.

Definition 3.5: In association flow, the operation on a member for a class is an association define (Das) or association use (Uas).

1. A Das on a member means that the class owns the member.

2. A Uas on a member means that the class contains a message that might access or invoke the member.

According to this definition, a Das on a member implies that the class explicitly defines the member or inherits it from the other class. A Uas on a member implies that the member might be invoked by some message within the class.

For example, in Fig. 5, the method setup() and attribute objS in class C2 are both associated with Das since they are defined in the class. Similarly, the method setS() and attribute stateS in class S0 are associated with Das. There is a message in setup() to invoke setS(), and setS() contains a message to access stateS. Therefore, the operation on setS() in C2 and the operation on stateS in S0 are Uas.

Fig. 6. An association flow path.

In CRG, an association flow path from one class to another can be defined as follows. The definition is recursive. The necessary base condition of $q_1 \xrightarrow{ASF} q_2$ means that member δ of class q1 might be accessed by the method χ of class q2. $\delta \xrightarrow{E_m} \chi$ means that δ might be invoked or accessed by some message within χ. The base condition is illustrated in Fig. 6.

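Definition 3.6: Let $q_1$ and $q_2$ be two class vertices in a CRG. A flow path from vertex $q_1$ to vertex $q_2$ is an association flow path, denoted as $q_1 \xrightarrow{ASF} q_2$, iff one of the following holds:

1. $\exists \delta$, $\delta \in V_m \cup V_a$, and $\exists \chi$, $\chi \in V_m$, such that $\delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_1 \wedge \delta \xrightarrow{E_m} \chi \wedge \chi \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_2$; or

2. $\exists \alpha$, $\alpha \in V_c$, $\exists \delta'$, $\delta' \in V_m \cup V_a$, and $\exists \chi'$, $\chi' \in V_m$, such that $\delta' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_1 \wedge \delta' \xrightarrow{E_m} \chi' \wedge \chi' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} \alpha$, and $\alpha \xrightarrow{ASF} q_2$.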
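Definition 3.6 can be sketched as a two-level computation: the base condition lifts member-level access edges to class-level steps, and the recursive case chains those steps. The code below is a minimal illustration for Program I under stated assumptions; the class `AssociationFlowSketch` and the maps `OWNER` (membership edges of any visibility) and `ACCESSED_BY` (Em edges) are hypothetical names, and only the two accesses discussed in the text are encoded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of Definition 3.6 for Program I.
final class AssociationFlowSketch {
    // Membership edges (Epub, Epro, Epri merged): member -> class that owns it.
    static final Map<String, String> OWNER = Map.of(
            "setup()", "C2",
            "objS", "C2",
            "setS()", "S0",
            "stateS", "S0");

    // Em edges: member -> methods that access or invoke it directly.
    static final Map<String, List<String>> ACCESSED_BY = Map.of(
            "setS()", List.of("setup()"),
            "stateS", List.of("setS()"));

    // Base condition: classes owning a method that accesses some member owned by q1.
    static Set<String> directTargets(String q1) {
        Set<String> targets = new TreeSet<>();
        for (Map.Entry<String, String> member : OWNER.entrySet()) {
            if (!member.getValue().equals(q1)) {
                continue;
            }
            for (String chi : ACCESSED_BY.getOrDefault(member.getKey(), List.of())) {
                String owner = OWNER.get(chi);
                if (owner != null) {
                    targets.add(owner);
                }
            }
        }
        return targets;
    }

    // Recursive case, unrolled into a worklist over class-level steps.
    static boolean hasAssociationFlowPath(String q1, String q2) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(q1);
        while (!work.isEmpty()) {
            for (String next : directTargets(work.pop())) {
                if (next.equals(q2)) {
                    return true;
                }
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAssociationFlowPath("S0", "C2")); // prints true
        System.out.println(hasAssociationFlowPath("C2", "S0")); // prints false
    }
}
```

Note that the path runs from the class that owns the accessed member to the class that accesses it, which is why the direction is S0 to C2 rather than the direction of the call.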

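In Fig. 5, an association flow path, $\text{S0} \xrightarrow{ASF} \text{C2}$, is shown by the dashed bold arrow because “$\text{setS()} \xrightarrow{E_{pub}} \text{S0}$”; i.e., class S0 owns method setS(), and “$\text{setS()} \xrightarrow{E_m} \text{setup()} \xrightarrow{E_{pub}} \text{C2}$”.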

3.3 Aggregation Flow

An aggregation relationship between two classes means that an instance of one class is encapsulated as an attribute in the other class. A class that has an aggregation relationship with another class can be encapsulated in the latter class. With a sequence of aggregation relationships, one class’s members concealed within another class under multiple layers of encapsulation might still be accessible in the latter class. The members accessible along aggregation relationships are called an aggregation flow. A sequence of aggregation relationships is called an aggregation flow path.

In an aggregation flow path, a class’s instance can be encapsulated as an attribute within another class along the sequence of aggregation relationships. The operations in aggregation flow are stated in Definition 3.7.

Definition 3.7: In an aggregation flow, the operation on a member in a class is either an aggregation define (Dag) or aggregation use (Uag).

1. A Dag on a member means that the class owns the member.

2. A Uag on a member means that the member can be directly accessed within the class.

According to the definition, in a class, a Dag on a member implies that the class explicitly defines the member or inherits it from another class. A Uag on a member implies that the member can be accessed in the class. For example, class Rec in Program I encapsulates array as a public attribute. The operation on the attribute in Rec is Dag. The object nameList, an instance of Rec, is encapsulated as an attribute in class C1. Hence, Rec’s attribute array is accessible in C1, and the operation on array in C1 is a Uag.

In CRG, an aggregation flow path from one class to another can be described as follows. The necessary condition to form an aggregation flow path between two classes is that one class encapsulates an instance of the other as an attribute directly or indirectly. The first condition of Definition 3.8 is shown in Fig. 7.

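Definition 3.8: Let $q_1$ and $q_2$ be two class vertices in a CRG. A flow path from vertex $q_1$ to vertex $q_2$ is an aggregation flow path, denoted as $q_1 \xrightarrow{AGF} q_2$, iff one of the following holds:

1. $\exists \delta$, $\delta \in V_a$, such that $q_1 \xrightarrow{E_l} \delta \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_2$; or

2. $\exists \alpha$, $\alpha \in V_c$, and $\exists \delta'$, $\delta' \in V_a$, such that $q_1 \xrightarrow{E_l} \delta' \wedge \delta' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} \alpha$, and $\alpha \xrightarrow{AGF} q_2$.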

For example, an aggregation path from class Rec to class C1 in Program I is shown as the dotted bold arrow via “$\text{Rec} \xrightarrow{E_l} \text{nameList} \xrightarrow{E_{pub}} \text{C1}$”, in Fig. 5. Note that an interface in Java can be regarded as a special abstract class because it contains method signatures and constants only. Therefore, an interface is contained in an inheritance flow only, not in association and aggregation flows.

4. FLOW ANALYSIS OF CLASS RELATIONSHIPS

With the class relationship flow model, we can analyze the flow among classes via inheritance, association, and aggregation relationships. The analysis can be performed through the CRG of a program.

4.1 Define-Use Relation

Traditional data flow provides the relations of define and use operations on variables in control flow for various applications. A define-use relation indicates that the state of a variable is changed and referenced along an execution path; this is the essential information for program parallelization, optimization, and testing [22, 23]. The flow operations in our model may form the define-use relation as in traditional data flow. Along a flow path from one class to another, a define operation on a member in the former class and a use operation on one in the latter class can be deemed as a define-use relation, called a define-use pair. A define-use pair formed by two flow operations is shown in Definition 4.1.

Definition 4.1: Let Dx be a define operation, and let Ux be a use operation, where the subscript x is ‘si,’ ‘bi,’ ‘as,’ or ‘ag’, corresponding to signature-inheritance, body-inheritance, association, or aggregation flows. Let Q1 and Q2 be two classes, and let M be a member. Two flow operations on M, one in Q1 and the other in Q2, form a define-use pair if

1. the operation on M in Q1 is a Dx;

2. the operation on M in Q2 is a Ux; and

3. there is a flow path of the corresponding flow from Q1 to Q2.

In an inheritance flow, there are two kinds of define-use pairs, a signature define-use (DUsi) pair and a body define-use (DUbi) pair, according to the operations of signature-inheritance and body-inheritance. A define-use pair formed by the operations of an association flow is called an association define-use (DUas) pair while that formed by the operations of an aggregation flow is called an aggregation define-use (DUag) pair. The corresponding characteristics of these define-use pairs in CRG can be deduced as shown in the following corollaries.

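Corollary 4.1: Let $q_1$ and $q_2$ be two classes, and let $m$ be a member. $(m, q_1, q_2)$ is a DUsi pair if the following are true:

1. $m.t_{si} = D_{si}$ and $m \to q_1 \in E_{pub} \cup E_{pro}$;

2. $\exists m'$, $m' \in V_m \cup V_a$, such that $m' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_2 \wedge m'.t_{si} = U_{si}$, and the signatures of $m'$ and $m$ are identical; and

3. $q_1 \xrightarrow{IHF} q_2$ and, $\forall \alpha$ with $\alpha \in V_c$ and $q_1 \xrightarrow{IHF} \alpha \wedge \alpha \xrightarrow{IHF} q_2$, there is no vertex $m''$, $m'' \in V_m \cup V_a$, such that $m'' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} \alpha \wedge m''.t_{si} = D_{si}$ and the signatures of $m''$ and $m$ are indistinguishable.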

Fig. 8. A signature define-use pair.

Corollary 4.1 indicates that there are three conditions in a CRG for a signature define-use pair of member m in class q1 and class q2. The first condition is that a Dsi exists on the member m in the class q1, and that m can be inherited by q1’s subclasses. The second is that a Usi exists on m′ in q2; since the signatures of m′ and m are identical, the operation on m in q2 is a Usi. The third is that m’s signature in class q1 is not redefined before being inherited by class q2 along an inheritance flow path. If m″ exists, the signature of m in q1 is overridden by m″’s signature. Fig. 8 illustrates the above description.

The conditions for a DUbi pair in a CRG are given in Corollary 4.2. The first condition is that a Dbi exists on member m in class q1, and that m can be inherited by q1’s subclasses. The second is that a Ubi exists on member m in class q2. That is, class q2 may own member m without any explicit declaration. The third is that m’s body in class q1 is not redefined before being inherited by class q2 along an inheritance flow path. If m″ exists, then the body of m inherited by class q2 is not the one specified in class q1. The corollary is depicted by Fig. 9.

Fig. 9. A body define-use pair.

Corollary 4.2: Let $q_1$ and $q_2$ be two classes, and let $m$ be a member. $(m, q_1, q_2)$ is a DUbi pair if the following are true:

1. $m.t_{bi} = D_{bi}$ and $m \to q_1 \in E_{pub} \cup E_{pro}$;

2. there is no vertex $m'$, $m' \in V_m \cup V_a$, such that $m' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_2 \wedge m'.t_{bi} = D_{bi}$, and the signatures of $m'$ and $m$ are identical or indistinguishable; and

3. $q_1 \xrightarrow{IHF} q_2$ and, $\forall \alpha$ with $\alpha \in V_c$ and $q_1 \xrightarrow{IHF} \alpha \wedge \alpha \xrightarrow{IHF} q_2$, there is no vertex $m''$, $m'' \in V_m \cup V_a$, such that $m'' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} \alpha \wedge m''.t_{bi} = D_{bi}$, and the signatures of $m''$ and $m$ are identical or indistinguishable.

In the CRG of Program I (see Fig. 5), (funC(), FunPack, C1) is a DUsi pair, and (stateC, C0, C1) is a DUbi pair. Note that a member associated with Usi or Ubi in a class does not mean that it is invoked (or accessed) in the class. Therefore, the define-use pairs can be used to compute whether a class possesses the signatures and bodies of inherited members from superclasses.
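The three conditions for a body define-use pair can be checked by walking up the superclass chain from q2 until q1 is reached, failing as soon as some class on the way redefines the body. The sketch below illustrates this for Program I under simplifying assumptions that are not part of the paper: members are matched by name only (standing in for "identical or indistinguishable signatures"), only class inheritance is followed, and the class `BodyDefineUseSketch` with its maps `SUPERCLASS`, `BODY_DEFINED`, and `PRIVATE_MEMBERS` is a hypothetical hand-written encoding.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the DUbi conditions (Corollary 4.2) for Program I.
final class BodyDefineUseSketch {
    // Class inheritance, stored child -> parent for an upward walk.
    static final Map<String, String> SUPERCLASS = Map.of("C1", "C0", "C2", "C1");

    // Members whose body is specified in the class itself (a Dbi on the member).
    static final Map<String, Set<String>> BODY_DEFINED = Map.of(
            "C0", Set.of("stateC"),
            "C1", Set.of("nameList", "funC()"),
            "C2", Set.of("objS", "setup()"));

    // Private members cannot be inherited, so they never start a DUbi pair.
    static final Map<String, Set<String>> PRIVATE_MEMBERS = Map.of("C2", Set.of("objS"));

    static boolean defines(String cls, String m) {
        return BODY_DEFINED.getOrDefault(cls, Set.of()).contains(m);
    }

    static boolean isBodyDefineUsePair(String m, String q1, String q2) {
        // Condition 1: q1 holds a Dbi on m, and m is public or protected.
        if (!defines(q1, m) || PRIVATE_MEMBERS.getOrDefault(q1, Set.of()).contains(m)) {
            return false;
        }
        if (q1.equals(q2)) {
            return false;                    // a flow path needs at least one inheritance edge
        }
        // Conditions 2 and 3: neither q2 nor any class between q1 and q2 redefines the body.
        String current = q2;
        while (current != null && !current.equals(q1)) {
            if (defines(current, m)) {
                return false;
            }
            current = SUPERCLASS.get(current);
        }
        return current != null;              // q1 was reached, so q1 -> q2 is an inheritance flow path
    }

    public static void main(String[] args) {
        System.out.println(isBodyDefineUsePair("stateC", "C0", "C1"));  // prints true
        System.out.println(isBodyDefineUsePair("funC()", "C1", "C2"));  // prints true
        System.out.println(isBodyDefineUsePair("setup()", "C2", "C1")); // prints false
    }
}
```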

For an association flow, a define-use pair can be regarded as the invocation relation between a sender and a receiver of a message. The define-use pairs include direct and indirect method invocations or attribute accesses. Since an association flow does not involve interfaces, these define-use pairs exist between classes. For a member, only one occurrence of a Das exists along an association flow path.

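Corollary 4.3: Let $q_1$ and $q_2$ be two classes, and let $m$ be a member. $(m, q_1, q_2)$ is a DUas pair if the following are true:

1. $m.t_{bi} = D_{bi}$ and $m \to q_1 \in E_{pub} \cup E_{pro} \cup E_{pri}$;

2. $\exists \delta$, $\delta \in V_m$, such that $m \xrightarrow{E_m} \delta \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_2$; and

3. $q_1 \xrightarrow{ASF} q_2$.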

Corollary 4.3 shows the conditions for an association define-use pair. The first condition is that the class q1 may own member m, namely, a Das on m in q1. The second is that class q2 has some method δ containing a message to invoke/access member m; i.e., a Uas is on m in q2. The third is that there is an association flow path from class q1 to class q2. The overriding of method m does not have to be considered in an association flow path. Fig. 10 depicts a DUas of member m in class q1 and class q2. In the CRG of Program I, (setS(), S0, C2) is an association define-use pair since setS() defined in S0 might be invoked by method setup() in C2.

For an aggregation flow, a define-use pair is a special whole-part relation between a class and its encapsulated members. An aggregation define-use pair implies that a member of a class encapsulated inside another class is accessible by the latter class. The encapsulation might be across multiple encapsulation layers along an aggregation flow path.

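Corollary 4.4: Let $q_1$ and $q_2$ be two classes, and let $m$ be a member. $(m, q_1, q_2)$ is a DUag pair if the following are true:

1. $m.t_{bi} = D_{bi}$ and $m \to q_1 \in E_{pub}$;

2. $\exists \delta$, $\delta \in V_a$, such that $q_1 \xrightarrow{E_l \cup E_{pub}} \delta \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q_2$; and

3. $q_1 \xrightarrow{AGF} q_2$.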

Corollary 4.4 shows the conditions for an aggregation define-use pair in a CRG. The first condition is that class q1 owns member m, i.e., there is a Dag on m in q1, and member m must be accessible from outside of class q1; i.e., m is a public member of class q1. The second is that an instance of class q1 must be encapsulated within some attribute δ of class q2. The third is that there is an aggregation flow path from class q1 to class q2. A Dag on m occurs only once along an aggregation flow path. It is not necessary to consider whether member m will be overridden along the flow path. The define-use pair is illustrated in Fig. 11. In the CRG of Program I, for example, the attribute array forms a DUag pair between classes Rec and C1.

Fig. 11. An aggregation define-use pair.

4.2 Hybrid Class Relationships

Association flow with inheritance

The members propagated via an inheritance flow may introduce implicit association relationships between classes. In other words, one class can associate with another without any association flow path between them. Such an association can be achieved by means of inherited members or polymorphism. For example, class B in Fig. 12 owns method mA(E) from class A. Method mA(E) contains the message shape.draw() that invokes the method draw(), whose receiver’s class is E. Class E inherits the method draw() from class D. Therefore, class B has an association relationship with class E.


class A {
    public void mA(E shape) { shape.draw(); }
}

class B extends A {
    ...
}

class D {
    public void draw() { ... }
}

class E extends D {
    ...
}

Fig. 12. Program II.

To model an association flow in a class hierarchy, inheritance relationships have to be taken into consideration. An association flow path from one class to another in a CRG is restated by Definition 4.2. The flow path may include inheritance edges, besides the membership and member access edges considered in Definition 3.6.

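Definition 4.2: Let $q_1$ and $q_2$ be two class vertices in a CRG. A flow path from vertex $q_1$ to vertex $q_2$ is an association flow path, denoted as $q_1 \xrightarrow{ASF} q_2$, iff one of the following holds:

1. $\exists\, \delta,\ \delta \in V_m \cup V_a$, and $\exists\, \chi,\ \chi \in V_m$, such that $\delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri} \cup E_{ext}} q_1 \wedge \delta \xrightarrow{E_m} \chi \wedge \chi \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri} \cup E_{ext}} q_2$; or

2. $\exists\, \alpha,\ \alpha \in V_c$, $\exists\, \delta',\ \delta' \in V_m \cup V_a$, and $\exists\, \chi',\ \chi' \in V_m$, such that $\delta' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri} \cup E_{ext}} q_1$, $\delta' \xrightarrow{E_m} \chi' \wedge \chi' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri} \cup E_{ext}} \alpha$, and $\alpha \xrightarrow{ASF} q_2$.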

This definition is extended from Definition 3.6. The base condition involves the association relationships incurred by inherited members, as Fig. 13 shows. Thus, the association relationship of classes B and E mentioned above can be modeled as an association flow path from E to B, $E \xrightarrow{ASF} B$. The flow path shown as the dashed bold arrow in Fig. 14 is formed by “$\mathrm{draw()} \xrightarrow{E_{pub}} D \xrightarrow{E_{ext}} E$” and “$\mathrm{draw()} \xrightarrow{E_m} \mathrm{mA(E)} \xrightarrow{E_{pub}} A \xrightarrow{E_{ext}} B$”.


A Das on a member in a class means that the class owns the member. The member can be defined by the class or inherited from the class’s superclass. A Das in a CRG can be stated as follows.

Corollary 4.5: Let $q$ be a class and $m$ be a member. The association flow operation on $m$ is a $D_{as}$ in $q$ iff one of the following conditions holds:

1. $m.t_{bi} = D_{bi}$ and $(m \to q) \in E_{pub} \cup E_{pro} \cup E_{pri}$; or

2. $\exists\, \alpha,\ \alpha \in V_c$, such that $(m, \alpha, q)$ is a $DU_{bi}$ pair.

In Corollary 4.5, the first condition implies that member m is specified within class q, while the second implies that there exists a superclass α of q that defines m, and m is inherited by class q. For example, the association flow operation on draw() in class E (see Program II) is a Das because (draw(), D, E) is a DUbi pair.

A Uas on a member in a class is caused by a method within the class that sends a message to access the member. The method can be defined within the class or inherited from another class. Such a Uas in a CRG can be found as follows.

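Corollary 4.6: Let $q$ be a class and $m$ be a member. The association flow operation on $m$ is a $U_{as}$ in $q$ iff one of the following conditions holds:

1. $\exists\, \delta,\ \delta \in V_m$, such that $m \xrightarrow{E_m} \delta \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q$; or

2. $\exists\, \alpha,\ \alpha \in V_c$, and $\exists\, \chi,\ \chi \in V_m$, such that $m \xrightarrow{E_m} \chi$, and $(\chi, \alpha, q)$ is a $DU_{bi}$ pair.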

The corollary above shows the conditions for a Uas on a member. The first condition implies that a method δ in class q contains a message to invoke m. The second one implies that there exists a class α, from which class q inherits a method χ containing a message to invoke m. For example, a Uas is on draw() in class B (see Program II), since $\mathrm{draw()} \xrightarrow{E_m} \mathrm{mA(E)}$ and (mA(E), A, B) is a DUbi pair.
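The check described by Corollary 4.6 can be illustrated on Program II with a minimal Java sketch. The class name UasSketch, the three lookup tables, and the method names are hypothetical and are not part of the authors' tool; overriding and access modifiers are deliberately ignored, so this is only an approximation of the corollary under those assumptions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of Program II (not the authors' CRG implementation).
class UasSketch {
    // Superclass of each class; a class without an entry has no superclass.
    static final Map<String, String> SUPER = Map.of("B", "A", "E", "D");

    // Methods specified directly within each class.
    static final Map<String, List<String>> DECLARED = Map.of(
        "A", List.of("mA(E)"),
        "D", List.of("draw()"));

    // Members that each method sends a message to (the E_m edges, read from the caller side).
    static final Map<String, List<String>> CALLS = Map.of("mA(E)", List.of("draw()"));

    // Methods a class owns: those it specifies plus those inherited from its superclasses.
    static Set<String> ownedMethods(String cls) {
        Set<String> owned = new LinkedHashSet<>();
        for (String c = cls; c != null; c = SUPER.get(c)) {
            owned.addAll(DECLARED.getOrDefault(c, List.of()));
        }
        return owned;
    }

    // True if some method owned by cls sends a message to member m, i.e., a U_as on m in cls.
    static boolean hasUas(String cls, String m) {
        for (String method : ownedMethods(cls)) {
            if (CALLS.getOrDefault(method, List.of()).contains(m)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("U_as on draw() in B: " + hasUas("B", "draw()"));
        System.out.println("U_as on draw() in E: " + hasUas("E", "draw()"));
    }
}
```

In this sketch, class B reports a Uas on draw() only because it inherits mA(E) from A, which corresponds to the second condition of the corollary; class E owns draw() but sends no message to it.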

Aggregation flow with inheritance

An inheritance flow may also introduce an implicit aggregation flow such that one class may aggregate another without any aggregation flow path to the latter. For example, the public attribute array in class Rec (see Program I) is accessible in class C1 because Rec’s instance is encapsulated as nameList in C1. Class C2 inherits nameList from C1 and is allowed to access array defined in Rec. C2 is aggregated in class F (see Program III in Fig. 15), so the public members in C2 might also be accessible in F. Therefore, F may aggregate Rec via aggregation and inheritance relationships.

class F {
    public C2 objC2 = new C2();
    // class C2 is defined in Program I
    ...
}

Due to inheritance relationships, the flow path between classes for aggregation flow has to be redefined as follows.

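Definition 4.3: Let $q_1$ and $q_2$ be two class vertices in a CRG. A flow path from vertex $q_1$ to vertex $q_2$ is an aggregation flow path, denoted as $q_1 \xrightarrow{AGF} q_2$, iff one of the following holds:

1. $\exists\, \delta,\ \delta \in V_m \cup V_a$, such that $q_1 \xrightarrow{E_l \cup E_{ext}} \delta \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri} \cup E_{ext}} q_2$; or

2. $\exists\, \alpha,\ \alpha \in V_c$, and $\exists\, \delta',\ \delta' \in V_m \cup V_a$, such that $\alpha \xrightarrow{E_l \cup E_{ext}} \delta' \wedge \delta' \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri} \cup E_{ext}} q_2$, and $\alpha \xrightarrow{AGF} q_2$.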

This definition is recursive, and the base condition involves the classes aggregated via inheritance edges, besides the membership and declaration edges in Definition 3.8. Such a case is shown in Fig. 16. For example, the aggregation of classes F and Rec in Program III can be modeled as an aggregation flow path from Rec to F. The flow path, $\mathrm{Rec} \xrightarrow{AGF} F$, is formed by “$\mathrm{Rec} \xrightarrow{E_l} \mathrm{nameList} \xrightarrow{E_{pub}} C1 \xrightarrow{E_{ext}} C2$” and “$C2 \xrightarrow{E_l} \mathrm{objC2} \xrightarrow{E_{pub}} F$”. The dotted bold arrow in Fig. 17 shows the flow path.

Fig. 16. An aggregation flow path with inheritance.

Fig. 17. An aggregation flow path from Rec to F.

Fig. 15. Program III.


According to Definitions 3.5 and 3.7, the meaning of Das is the same as that of Dag. Therefore, the necessary conditions in Corollary 4.5 can be applied to a Dag on a member in a class. The necessary conditions for a Uag with an inheritance flow in a CRG are shown in Corollary 4.7.

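Corollary 4.7: Let $q$ be a class and $m$ be a member. The aggregation flow operation on $m$ is a $U_{ag}$ in $q$ iff one of the following conditions holds:

1. $\exists\, \delta,\ \delta \in V_a$, such that $m \xrightarrow{E_l \cup E_{pub}} \delta \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q$; or

2. $\exists\, \alpha,\ \alpha \in V_c$, and $\exists\, \chi,\ \chi \in V_a$, such that $m \xrightarrow{E_l \cup E_{pub}} \chi \wedge \chi \xrightarrow{E_{pub} \cup E_{pro} \cup E_{pri}} q$, and $(\chi, \alpha, q)$ is a $DU_{bi}$ pair.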

This corollary shows the necessary conditions for a Uag on a member. The first condition is that there is an attribute δ of class q, whose class encapsulates m as a public member. That is, m is accessible within class q. The second one is that there exists a class α, from which class q inherits an attribute χ whose class encapsulates m as a public member. For example, a Uag is on array in class F (see Program III), because (draw(), D, E) is a DUbi pair.

4.3 Flow Information

For a class, the information propagated via class relationships can be divided into input, generated, and output flows. The input flow of a class includes the members that come from the prior classes in the flow paths. The generated flow of a class refers to the newly defined or redefined members in the class. The output flow of a class subsumes the parts of the input and generated flows that can be propagated to the immediate post classes in the flow paths. In a CRG, we can define the flow information of a class with respect to inheritance, association, and aggregation flows as follows.

The signature-inheritance flow information of a class is stated in Definition 4.4. The input flow includes the member signatures defined in superclasses or superinterfaces that are inherited by the class. The generated flow denotes the newly defined or redefined signatures in the class. The output flow involves the signatures, including those from superclasses, of the class that can be inherited by subclasses.

Definition 4.4: In an inheritance flow, the input, generated, and output signature-inheritance flows of a class $q$ are defined as $SIF_{in}(q)$, $SIF_{gen}(q)$, and $SIF_{out}(q)$, where

$SIF_{in}(q) = \{(m, \alpha) \mid (m, \alpha, q) \text{ is a } DU_{si} \text{ pair}\}$;

$SIF_{gen}(q) = \{(m, q) \mid m \in V_m \cup V_a,\ m.t_{si} = D_{si}, \text{ and } (m \to q) \in E_{pub} \cup E_{pro} \cup E_{pri}\}$; and

$SIF_{out}(q) = SIF_{in}(q) \cup SIF_{gen}(q) - \{(m, q) \mid (m, q) \in SIF_{gen}(q) \text{ and } (m \to q) \in E_{pri}\}$.

For class C2 in Program I, SIFin(C2) = {(funC(), FunPack), (stateC, C0), (nameList, C1)}, and SIFgen(C2) = {(objS, C2), (setup(), C2)}. Then, SIFout(C2) = {(funC(), FunPack), (stateC, C0), (nameList, C1), (setup(), C2)}.
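The set arithmetic behind the output flow can be sketched in a few lines of Java. The class FlowOutSketch and its method names are hypothetical; the pairs are taken from the C2 example above and encoded as plain strings, and the sketch assumes that objS is the only private member specified in C2, which is inferred from its absence in SIFout(C2).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating SIF_out = SIF_in union SIF_gen minus private generated pairs.
class FlowOutSketch {
    // Output flow: union of input and generated flows, without the private generated pairs.
    static Set<String> outputFlow(Set<String> in, Set<String> gen, Set<String> privateGen) {
        Set<String> out = new LinkedHashSet<>(in);
        out.addAll(gen);
        out.removeAll(privateGen);
        return out;
    }

    // Signature-inheritance flows of class C2 as quoted in the text.
    static Set<String> sifOutOfC2() {
        Set<String> sifIn = new LinkedHashSet<>(
            List.of("(funC(), FunPack)", "(stateC, C0)", "(nameList, C1)"));
        Set<String> sifGen = new LinkedHashSet<>(
            List.of("(objS, C2)", "(setup(), C2)"));
        // Assumption: objS is declared private in C2.
        Set<String> privateGen = Set.of("(objS, C2)");
        return outputFlow(sifIn, sifGen, privateGen);
    }

    public static void main(String[] args) {
        System.out.println(sifOutOfC2());
    }
}
```

Under the stated assumption, the computed set contains the same four pairs as SIFout(C2) above. The same helper applies unchanged to the body-inheritance flows, since only the underlying define-use pairs differ.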

The definition of the input, generated, and output flows for body-inheritance in Definition 4.5 is similar to that for signature-inheritance. Only their corresponding flow operations and define-use pairs are different. This definition is shown as follows.


Definition 4.5: In an inheritance flow, the input, generated, and output body-inheritance flows of a class $q$ are defined as $BIF_{in}(q)$, $BIF_{gen}(q)$, and $BIF_{out}(q)$, where

$BIF_{in}(q) = \{(m, \alpha) \mid (m, \alpha, q) \text{ is a } DU_{bi} \text{ pair}\}$;

$BIF_{gen}(q) = \{(m, q) \mid m \in V_m \cup V_a,\ m.t_{bi} = D_{bi}, \text{ and } (m \to q) \in E_{pub} \cup E_{pro} \cup E_{pri}\}$; and

$BIF_{out}(q) = BIF_{in}(q) \cup BIF_{gen}(q) - \{(m, q) \mid (m, q) \in BIF_{gen}(q) \text{ and } (m \to q) \in E_{pri}\}$.

In Program I, the body-inheritance flows of C2 are BIFin(C2) = {(funC(), C1), (stateC, C0), (nameList, C1)}, BIFgen(C2) = {(objS, C2), (setup(), C2)}, and BIFout(C2) = {(funC(), C1), (stateC, C0), (nameList, C1), (setup(), C2)}.

The association flow of a class can be defined as in Definition 4.6. The input flow of class q, ASFin(q), is a set of member-class pairs, (m, α). Each of the pairs denotes that member m in class α will be invoked directly or indirectly by a message within q. The generated flow of class q includes not only the members specified in q, but also those inherited from q’s superclasses. That is, there is a Das on each of these members in q. The output flow involves the member-class pairs (m, α), where member m of class α can be invoked (or accessed) directly or indirectly by some message from the outside (scope) of q. These pairs are a subset of the union of ASFgen(q) and ASFin(q).

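Definition 4.6: In an association flow, the input, generated, and output flows of a class $q$ are defined as $ASF_{in}(q)$, $ASF_{gen}(q)$, and $ASF_{out}(q)$, where

$ASF_{in}(q) = \{(m, \alpha) \mid (m, \alpha, q) \text{ is a } DU_{as} \text{ pair}\}$;

$ASF_{gen}(q) = \{(m, q) \mid m \in V_m \cup V_a$, and either (i) $m.t_{bi} = D_{bi}$ and $(m \to q) \in E_{pub} \cup E_{pro} \cup E_{pri}$, or (ii) $\exists\, \alpha,\ \alpha \in V_c$, such that $(m, \alpha, q)$ is a $DU_{bi}$ pair$\}$; and

$ASF_{out}(q) = \{(m, \alpha) \mid (m, \alpha) \in ASF_{gen}(q) \cup ASF_{in}(q)$, and either (i) $m \xrightarrow{E_{pub} \cup E_{pro} \cup E_{ext}} q$, or (ii) $\exists\, \delta,\ \delta \in V_m \wedge \delta \xrightarrow{E_{pub} \cup E_{pro} \cup E_{ext}} q$, such that $m \xrightarrow{E_m} \delta\}$.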

For example, the association flows of class C2 in Program I are ASFin(C2) = {(setS(), S), (stateS, S)}, ASFgen(C2) = {(objS, C2), (setup(), C2), (stateC, C0), (funC(), C1), (nameList, C1)}, and ASFout(C2) = {(setS(), S), (stateS, S), (objS, C2), (setup(), C2), (stateC, C0), (funC(), C1), (nameList, C1)}.

In an aggregation flow, the flow information of a class can be defined as in Definition 4.7. The input flow of class q includes the members that are encapsulated in other classes and are accessible in q. The generated flow of class q includes the members specified in q and those inherited from q’s superclasses. The aggregation flow operation on each of these members in q is a Dag. The output flow of class q involves the members in the input and generated flows that are accessible outside of q.

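Definition 4.7: In an aggregation flow, the input, generated, and output flows of a class $q$ can be defined as $AGF_{in}(q)$, $AGF_{gen}(q)$, and $AGF_{out}(q)$, where

$AGF_{in}(q) = \{(m, \alpha) \mid (m, \alpha, q) \text{ is a } DU_{ag} \text{ pair}\}$;

$AGF_{gen}(q) = \{(m, q) \mid m \in V_m \cup V_a$, and either (i) $m.t_{bi} = D_{bi}$ and $(m \to q) \in E_{pub} \cup E_{pro} \cup E_{pri}$, or (ii) $\exists\, \alpha,\ \alpha \in V_c$, such that $(m, \alpha, q)$ is a $DU_{bi}$ pair$\}$; and

$AGF_{out}(q) = \{(m, \alpha) \mid (m, \alpha) \in AGF_{gen}(q) \cup AGF_{in}(q)$, and either (i) $m \xrightarrow{E_{pub} \cup E_{ext}} q$, or (ii) $\exists\, \delta,\ \delta \in V_a \wedge \delta \xrightarrow{E_{pub} \cup E_{ext}} q$, such that $m \xrightarrow{E_{pub} \cup E_l \cup E_{ext}} \delta\}$.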


For example, the aggregation flows of class C2 in Program I are AGFin(C2) = {(array, Rec), (stateC, C0), (nameList, C1), (funC(), C1)}, AGFgen(C2) = {(objS, C2), (setup(), C2)}, and AGFout(C2) = {(array, Rec), (stateC, C0), (nameList, C1), (funC(), C1), (setup(), C2)}. Note that access of attribute objS from outside of C2 is not allowed, so (objS, C2) is not included in AGFout(C2).

4.4 Flow Computation

Since an OO program can be represented as a CRG, computing the flow information of the program can be performed on the CRG. For a given class, the steps in computing its flow information can be derived from the definitions given in subsection 4.3.

In an inheritance flow, Algorithm 4.1 shows how to compute the input signature-inheritance flow for a given class c, i.e., SIFin(c). The algorithm performs a breadth-first traversal from c backward along the edges of Eext and Eimp, i.e., ascending to c’s superclasses or superinterfaces. The traversal is controlled by a queue work_list, in which a vertex can be stored and popped in first-in-first-out order using the enqueue and dequeue methods. The traversal finds the member signatures and classes (or interfaces) that form DUsi pairs with class c. Because an inheritance flow path in a CRG never forms a cycle, the algorithm stops after all the superclasses of c have been examined. The output, SIF_IN, is SIFin(c).

Algorithm 4.1 Computing_SIF_IN(c, GCRG)
Input: (c, GCRG), c ∈ Vc ∪ Vi and GCRG is a CRG
Output: SIF_IN /* Input signature-inheritance flow of c */
Begin
    SIF_IN := ∅;
    work_list.enqueue(c); /* Initialize the value of work_list to be ‘c’ */
    while work_list is not empty do
        v := work_list.dequeue(); /* Pop a class or interface vertex v from work_list */
        for each s, s ∈ Vc ∪ Vi ∧ (s → v) ∈ Eext ∪ Eimp, do
            for each m, m ∈ Vm ∪ Va ∧ m.tsi = Dsi ∧ (m → s) ∈ Epub ∪ Epro, do
                if (there is no (x, q), (x, q) ∈ SIF_IN, such that the signatures of x and m are indistinguishable) then
                    SIF_IN := SIF_IN ∪ {(m, s)}; /* Signature DU pair (m, s, c) */
                endif
            endfor
            work_list.enqueue(s); /* Store c’s superclass s in work_list */
        endfor
    endwhile
    output SIF_IN /* SIFin(c) */
End.

In Algorithm 4.1, each vertex is visited at most once, and the visited method and attribute vertices are compared with the elements in SIF_IN. Let |V| denote the number of vertices in a CRG. The time complexity of the algorithm is $O(|V|^2)$ for the worst case.
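The breadth-first traversal of Algorithm 4.1 can be sketched in Java as follows. This is only an illustration under simplifying assumptions: the CRG is reduced to two hypothetical maps (direct supertypes per class, and declared members per class), every listed member is assumed to carry the tag tsi = Dsi, and the hierarchy Leaf, Mid, Base is invented for the example. A map keyed by signature stands in for the comparison of each member against the elements of SIF_IN.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Computing_SIF_IN on a simplified class graph.
class SifInSketch {
    // A member vertex: its signature and whether its membership edge is private.
    static final class Member {
        final String signature;
        final boolean isPrivate;

        Member(String signature, boolean isPrivate) {
            this.signature = signature;
            this.isPrivate = isPrivate;
        }
    }

    // Returns signature -> defining class, i.e., the pairs (m, s) collected in SIF_IN.
    static Map<String, String> computeSifIn(String c,
                                            Map<String, List<String>> supers,
                                            Map<String, List<Member>> members) {
        Map<String, String> sifIn = new LinkedHashMap<>();
        Deque<String> workList = new ArrayDeque<>();
        workList.add(c);
        while (!workList.isEmpty()) {
            String v = workList.remove(); // dequeue in first-in-first-out order
            for (String s : supers.getOrDefault(v, List.of())) {
                for (Member m : members.getOrDefault(s, List.of())) {
                    // Keep public/protected members whose signature is not yet recorded,
                    // so the nearest definition along the hierarchy wins.
                    if (!m.isPrivate) {
                        sifIn.putIfAbsent(m.signature, s);
                    }
                }
                workList.add(s);
            }
        }
        return sifIn;
    }

    // Hypothetical hierarchy: Leaf extends Mid, Mid extends Base.
    static Map<String, String> demo() {
        Map<String, List<String>> supers = Map.of(
            "Leaf", List.of("Mid"),
            "Mid", List.of("Base"));
        Map<String, List<Member>> members = Map.of(
            "Mid", List.of(new Member("draw()", false), new Member("secret", true)),
            "Base", List.of(new Member("draw()", false), new Member("name", false)));
        return computeSifIn("Leaf", supers, members);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the invented hierarchy, draw() is attributed to Mid because Mid is reached before Base, name is attributed to Base, and the private member secret is not propagated.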


The algorithm used to compute the input flow information for the body-inheritance of a class is similar to that for signature-inheritance. The difference is that the bodies of inherited members are from superclasses only. That is, it is not necessary to visit interface vertices. Algorithm 4.2 shows the computation of the input body-inheritance flow of a class. The time complexity of this algorithm for the worst case is also $O(|V|^2)$.

Algorithm 4.2 Computing_BIF_IN(c, GCRG)
Input: (c, GCRG), c ∈ Vc and GCRG is a CRG
Output: BIF_IN /* Input body-inheritance flow of c */
Begin
    BIF_IN := ∅;
    work_list.enqueue(c); /* Initialize the value of work_list to be ‘c’ */
    while work_list is not empty do
        v := work_list.dequeue(); /* Pop a class vertex v from work_list */
        for each s, s ∈ Vc ∧ (s → v) ∈ Eext, do
            for each m, m ∈ Vm ∪ Va ∧ m.tbi = Dbi ∧ (m → s) ∈ Epub ∪ Epro, do
                if (there is no (x, q), (x, q) ∈ BIF_IN, such that the signatures of x and m are indistinguishable) then
                    BIF_IN := BIF_IN ∪ {(m, s)}; /* Body DU pair (m, s, c) */
                endif
            endfor
            work_list.enqueue(s); /* Store c’s superclass s in work_list */
        endfor
    endwhile
    output BIF_IN /* BIFin(c) */
End.

The computations of the generated flows for signature-inheritance and body-inheritance are similar. Algorithm 4.3 shows how the generated body-inheritance flow is computed for a given class c, i.e., BIFgen(c). The computation is done vertex by vertex, where each vertex whose tbi tag is Dbi and which has an Epub, Epro, or Epri edge to c is included in BIF_GEN. The complexity of this algorithm is O(|V|) for the worst case.

Algorithm 4.3 Computing_BIF_GEN(c, GCRG)
Input: (c, GCRG), c ∈ Vc and GCRG is a CRG
Output: BIF_GEN
Begin
    BIF_GEN := ∅;
    for each m, m ∈ Vm ∪ Va ∧ m.tbi = Dbi ∧ (m → c) ∈ Epub ∪ Epro ∪ Epri, do
        BIF_GEN := BIF_GEN ∪ {(m, c)};
    endfor
    output BIF_GEN /* BIFgen(c) */
End.


According to Definitions 4.4 and 4.5, the output flows for signature-inheritance and body-inheritance of a class c, SIFout(c) and BIFout(c), can be obtained from c’s input flow and generated flow. These output flows can be computed by excluding the private members from the union of the input and generated flows corresponding to signature/body-inheritance.

The input flows of association and aggregation for a class c, ASFin(c) and AGFin(c), can be computed by backward traversing the corresponding flow paths from c in a CRG. The traversal finds the members and classes that form DUas/DUag pairs with c. As explained in Subsection 4.2, some association and aggregation flows may be introduced by inherited members of a class. To get such association and aggregation flows, we can apply Algorithm 4.2 (by invoking Computing_BIF_IN()) to find the inherited members of a class. The detailed steps in computing the input association and aggregation flows of a class are shown in Algorithms 4.4 and 4.5, respectively. In these algorithms, the statement Computing_BIF_GEN() is used to invoke Algorithm 4.3 to get the members specified within a class.

Algorithm 4.4 Computing_ASF_IN(c, GCRG)
Input: (c, GCRG), c ∈ Vc and GCRG is a CRG
Output: ASF_IN
Begin
    ASF_IN := ∅;
    BIF_IN := Computing_BIF_IN(c, GCRG); /* call Computing_BIF_IN(c, GCRG) */
    BIF_GEN := Computing_BIF_GEN(c, GCRG); /* call Computing_BIF_GEN(c, GCRG) */
    use_set := {m | ∃ q, q ∈ Vc, such that (m, q) ∈ BIF_IN ∪ BIF_GEN};
    for each v, v →ASF c, do
        BIF_IN := Computing_BIF_IN(v, GCRG); /* call Computing_BIF_IN(v, GCRG) */
        BIF_GEN := Computing_BIF_GEN(v, GCRG); /* call Computing_BIF_GEN(v, GCRG) */
        def_set := BIF_IN ∪ BIF_GEN;
        for each (x, s), (x, s) ∈ def_set do
            if (∃ m, m ∈ use_set, and x →Em m) then
                ASF_IN := ASF_IN ∪ {(x, s)};
            endif
        endfor
    endfor
    output ASF_IN /* ASFin(c) */
End.

In Algorithm 4.4, the worst case is that all incoming Em edges of each vertex have to be traversed once when Computing_BIF_IN() is not considered. Let |E| denote the number of edges in a CRG. The complexity for the worst case is O(|V| × |E|). In Algorithm 4.5, each vertex is visited at most once in computing the input aggregation flow of a class, excluding the invocation of Computing_BIF_IN(). Hence, the complexity of the algorithm for the worst case without considering inheritance is O(|V|).

Algorithm 4.5 Computing_AGF_IN(c, GCRG)
Input: (c, GCRG), c ∈ Vc and GCRG is a CRG
Output: AGF_IN
Begin
    AGF_IN := ∅;
    BIF_IN := Computing_BIF_IN(c, GCRG); /* call Computing_BIF_IN(c, GCRG) */
    BIF_GEN := Computing_BIF_GEN(c, GCRG); /* call Computing_BIF_GEN(c, GCRG) */
    work_set := {q | ∃ (a, z), (a, z) ∈ BIF_IN ∪ BIF_GEN, such that (q → a) ∈ El};
    done_set := ∅; /* a set of checked classes */
    while work_set ≠ ∅ do
        select a vertex v from work_set;
        work_set := work_set - {v};
        done_set := done_set ∪ {v};
        BIF_IN := Computing_BIF_IN(v, GCRG); /* call Computing_BIF_IN(v, GCRG) */
        BIF_GEN := Computing_BIF_GEN(v, GCRG); /* call Computing_BIF_GEN(v, GCRG) */
        def_set := BIF_IN ∪ BIF_GEN;
        for each (x, s) in def_set do
            if ((x → s) ∈ Epub) then
                AGF_IN := AGF_IN ∪ {(x, s)};
            endif
            if (x ∈ Va) then
                work_set := work_set ∪ {r | (r → x) ∈ El and r ∉ done_set};
            endif
        endfor
    endwhile
    output AGF_IN /* AGFin(c) */
End.

The generated association and aggregation flows for a class are similar since their definitions are identical (see Definitions 4.6 and 4.7). Hence, computing the two generated flows is done to get all the members, including those specified in the class and those inherited from superclasses. From the input and generated flows for association and aggregation of a class c, we can obtain the corresponding output flows of c, ASFout(c) and AGFout(c), in accordance with Definitions 4.6 and 4.7.

These algorithms show that the analysis of class relationships can be reduced to the graph reachability problem. They are not optimal but are used to show how our flow information in an OO program can be computed.

5. APPLICATIONS

The flow model can provide the flow analysis of class relationships for program understanding, anomaly detection, and program testing.

5.1 Program Understanding

When reusing a class, a programmer often needs to understand not only the class, but also its relations with other classes [24]. Thus, it is necessary to navigate in a class library to find the relations. Inheritance flow information can help one understand the class hierarchy and which implicit members a class owns [9]. The input flow of a class in an association flow can help one identify the members defined in other classes that might be invoked or accessed by the class. For a complicated composition class, its available members from other classes can be obtained by computing the input flow via aggregation relationships.

Fig. 18. Hierarchical view of classes.

Fig. 19. Flow analyzer of class relationships for Java Programs.

The flow analysis tool of this flow model for program understanding is realized in the Microsoft Windows 95/NT environment. The display and computation of class relationship flows were developed in C++ [25]. To implement the source code scanner, parser, and CRG constructor for Java programs, we employ the tools flex and bison (both are shareware developed by GNU). Fig. 18 shows a hierarchical view of Program I. A user can select a class to perform inheritance, association, and aggregation analysis. The result of analysis is displayed in the window in the middle of Fig. 19. The analyzer helps the user understand class libraries by providing flow analysis of class relationships.


5.2 Anomaly Detection

An anomaly in a program is often an indication of the existence of a programming error or an inappropriate design. The flow operation sequences in this class relationship flow model can be used to detect anomalies in OO programs, such as method interface conflicts and unimplemented methods [26]. These anomalies can be detected as in traditional data flow anomaly detection [27].

A member propagated along a flow path, from class q1, class q2, ... to class qk, can be regarded as a sequence of flow operations, op1, op2, ..., opk. A method interface conflict occurs when a superclass introduces a new method while one of its subclasses has previously introduced a method with the same name. The new method is overridden and, therefore, cannot be inherited by subclasses. In this flow model, we can detect the conflict by finding a sequence of signature-inheritance flow operations that contains ‘DsiDsi’. An unimplemented method occurs when a class inherits an abstract method but does not define the method's body. This can be detected from a sequence of body-inheritance flow operations that contains ‘NbiUbi’. In addition, flow operation sequences of association and aggregation might help to determine whether the reusing approach, by means of inheritance or composition, is appropriate for existing components.
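The two pattern checks can be sketched as a scan over a sequence of flow operations. In the following Java sketch, the enum Op and the class AnomalySketch are hypothetical names, and "contains" is interpreted as two operations that are adjacent in the sequence; this interpretation is an assumption and is not stated explicitly in the text.

```java
import java.util.List;

// Hypothetical detector for the operation patterns 'DsiDsi' and 'NbiUbi'.
class AnomalySketch {
    // Flow operations attached to a member along a flow path.
    enum Op { D, U, N }

    // True if `first` is immediately followed by `second` somewhere in the sequence.
    static boolean containsPair(List<Op> ops, Op first, Op second) {
        for (int i = 0; i + 1 < ops.size(); i++) {
            if (ops.get(i) == first && ops.get(i + 1) == second) {
                return true;
            }
        }
        return false;
    }

    // Method interface conflict: two consecutive defines in a signature-inheritance sequence.
    static boolean hasInterfaceConflict(List<Op> signatureOps) {
        return containsPair(signatureOps, Op.D, Op.D);
    }

    // Unimplemented method: an N operation followed by a use in a body-inheritance sequence.
    static boolean hasUnimplementedMethod(List<Op> bodyOps) {
        return containsPair(bodyOps, Op.N, Op.U);
    }

    public static void main(String[] args) {
        System.out.println(hasInterfaceConflict(List.of(Op.D, Op.D, Op.U)));
        System.out.println(hasUnimplementedMethod(List.of(Op.N, Op.U)));
    }
}
```

Each check is linear in the length of the operation sequence, so the cost of detection is dominated by building the sequences from the flow paths rather than by the scan itself.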

5.3 Program Testing

Before doing regression testing, it is very important to identify the potentially affected parts to be re-tested when a program is modified. The potentially affected parts can be bound by ripple effect analysis with respect to the modification [3, 18]. In order to reduce the cost of testing, the re-tested parts should be as small as possible. The more precise ripple effect analysis is, the smaller the re-tested parts are. The approach proposed in [3] uses class relationships to identify affected classes. This approach can be deemed as finding the classes that can be reached from a modified class via flow paths in a CRG. Our flow model can improve the precision of their approach by using define-use relations.

class S1 {
    public void m1() {}
}

class S2 {
    public void m2() {}
    public void n2(S1 objS1) { objS1.m1(); }
}

class S3 {
    public void m3(S2 objS2) { objS2.m2(); }
}

For example, classes S1, S2, and S3 are defined in Fig. 20, and their CRG is shown in Fig. 21. Assume that class S1 is modified, but that the relationships among the three classes are unchanged. According to the analysis in [3], the classes that need to be re-tested are S1, S2, and S3 because there is an association flow path from S1 to S3. In fact, the modification in S1 does not affect S3 via S2: m3() in S3 invokes only m2(), which does not use any member of S1. This can be detected by our flow analysis since there is no DUas or DUag pair between S1 and S3. Hence, only classes S1 and S2 need to be re-tested.
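The difference between the two selection strategies can be made concrete with a small Java sketch over the classes of Fig. 20. The class RetestSketch, the table USES, and both method names are hypothetical. Only direct define-use pairs are recorded, which is sufficient here because m3() never reaches m1(); a full analysis would also have to follow indirect invocations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical comparison of path-based and define-use-based re-test selection.
class RetestSketch {
    // user class -> (used member -> class defining it), i.e., DU_as pairs (m, definer, user).
    static final Map<String, Map<String, String>> USES = Map.of(
        "S2", Map.of("m1()", "S1"),
        "S3", Map.of("m2()", "S2"));

    // Path-based selection: all classes reachable from the modified class via association paths.
    static Set<String> reachableFrom(String modified) {
        Set<String> affected = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(modified));
        while (!work.isEmpty()) {
            String cls = work.remove();
            if (!affected.add(cls)) {
                continue; // already processed
            }
            for (Map.Entry<String, Map<String, String>> e : USES.entrySet()) {
                if (e.getValue().containsValue(cls)) {
                    work.add(e.getKey());
                }
            }
        }
        return affected;
    }

    // Define-use selection: the modified class plus the classes using one of its members.
    static Set<String> duBased(String modified) {
        Set<String> affected = new TreeSet<>();
        affected.add(modified);
        for (Map.Entry<String, Map<String, String>> e : USES.entrySet()) {
            if (e.getValue().containsValue(modified)) {
                affected.add(e.getKey());
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        System.out.println("path-based: " + reachableFrom("S1"));
        System.out.println("define-use: " + duBased("S1"));
    }
}
```

For a modification of S1, the path-based selection returns S1, S2, and S3, while the define-use selection returns only S1 and S2, matching the discussion above.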


During structural testing (also called white-box testing), a sufficient number of test cases should be fed to a target program in order to satisfy some degree of a coverage criterion [28]. The criterion is defined on a model that represents a program, e.g., all-execution paths or all-branches in the control flow model. Thus, coverage criteria are an important factor affecting test case generation for structural testing. Many OO program testing techniques, e.g., [23, 29-32], focus on a single class and lack proper criteria for inter-class testing. In the class relationship flow model, the define-use pairs can indicate inter-class testing criteria, for example, all-flow-paths, all-define-use-pairs, etc.

6. CONCLUSIONS AND FUTURE WORK

In this paper, the class relationship flow model, consisting of inheritance, association, and aggregation flows, has been proposed to analyze class libraries. With respect to these flows, each member within a class is associated with an operation to represent whether its status is defined or used. The concealed dependencies propagated along class relationships can be represented as a sequence of flow operations along a flow path. By representing a program as a class relationship graph, the flow analysis can be reduced to the graph reachability problem.

The interpretation of OO features can vary with program constructs in different languages, such as inheritance rules and object representation. Although Java programs were used for demonstration purposes in this paper, this flow model can be tailored to fit specific OO languages with minor modification for the interpretation of OO features. In addition, this flow model can give a user the ability to interpret behavior evolution, message passing, and object encapsulation in programs as sequences of flow operations. These operation sequences could be applied in various fields of OO software engineering, such as program understanding, anomaly detection, complexity measurement, and program testing. Currently, we are improving the efficiency of the flow computation algorithms for the whole set of classes in a program. In the future, we plan to develop testing and maintenance tools based on this model, and plan to embed them within an integrated visual-programming environment [33].


ACKNOWLEDGEMENT

The authors would like to thank the referees, whose comments helped to improve the overall presentation. This research was sponsored by the MOEA and supported by the Institute for Information Industry, Taiwan, R.O.C.

REFERENCES

1. D. L. Metayer, “Program analysis for software engineering: new applications, new requirements, new tools,” ACM SIGPLAN Notices, Vol. 32, No. 1, 1997, pp. 86-88.

2. S. Meyers and M. Klaus, “Examining C++ program analyzers,” Dr. Dobb’s Journal, No. 262, 1997, pp. 68-75.

3. D. C. Kung, J. Gao, P. Hsia, Y. Toyoshima, and C. Chen, “On regression testing of object-oriented programs,” Journal of Object-Oriented Programming, Vol. 8, No. 2, 1995, pp. 51-65.

4. G. Booch, Object-oriented Analysis and Design with Applications, Redwood City: Benjamin/Cummings, 1994.

5. J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen, Object-oriented Modeling and Design, Prentice-Hall, Englewood Cliffs, NJ, 1991.

6. J. L. Chen and F. J. Wang, “Encapsulation in object-oriented programs,” ACM SIGPLAN Notices, Vol. 31, No. 7, 1996, pp. 30-32.

7. M. Lejter, S. Meyers, and S. P. Reiss, “Support for maintaining object-oriented programs,” IEEE Transactions on Software Engineering, Vol. 18, No. 12, 1992, pp. 1045-1052.

8. P. K. Linos and V. Courtois, “A tool for understanding object-oriented program dependencies,” in Proceedings of IEEE Third Workshop on Program Comprehension, 1994, pp. 20-27.

9. J. L. Chen and F. J. Wang, “An inheritance flow model for class hierarchy analysis,” Information Processing Letters, Vol. 66, No. 6, 1998, pp. 309-315.

10. M. S. Hecht, Flow Analysis of Computer Programs, Elsevier North-Holland, 1977.

11. J. Gosling, B. Joy, and G. Steele, The Java Language Specification, Addison-Wesley, Reading, Mass, 1996.

12. J. L. Chen, F. J. Wang, and Y. L. Chen, “An object-oriented dependency graph for program slicing,” in Proceedings of the 24th International Conference on Technology of Object-Oriented Languages and Systems (TOOLS 24), 1997, pp. 147-156.

13. F. E. Allen and J. Cocke, “A program data flow analysis procedure,” Communications of the ACM, Vol. 19, No. 3, 1976, pp. 137-147.

14. S. S. Muchnick and N. D. Jones, Program Flow Analysis: Theory and Applications, Prentice-Hall Inc., 1981.

15. J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The program dependence graph and its use in optimization,” ACM Transactions on Programming Languages and Systems, Vol. 9, No. 3, 1987, pp. 319-349.

16. S. Horwitz and T. Reps, “The use of program dependence graphs in software engineering,” in Proceedings of the 14th International Conference on Software Engineering, 1992, pp. 392-411.

17. M. Weiser, “Program slicing,” IEEE Transactions on Software Engineering, Vol. 10, No. 4, 1984, pp. 352-357.