A Comparative Study of Language Support for Generic Programming


Ronald Garcia, Jaakko Järvi, Andrew Lumsdaine, Jeremy Siek, Jeremiah Willcock

Open Systems Lab Indiana University Bloomington

Bloomington, IN USA

{garcia,jajarvi,lums,jsiek,jewillco}@osl.iu.edu

ABSTRACT

Many modern programming languages support basic generic programming, sufficient to implement type-safe polymorphic containers. Some languages have moved beyond this basic support to a broader, more powerful interpretation of generic programming, and their extensions have proven valuable in practice. This paper reports on a comprehensive comparison of generics in six programming languages: C⁺⁺, Standard ML, Haskell, Eiffel, Java (with its proposed generics extension), and Generic C#. By implementing a substantial example in each of these languages, we identify eight language features that support this broader view of generic programming. We find these features are necessary to avoid awkward designs, poor maintainability, unnecessary run-time checks, and painfully verbose code. As languages increasingly support generics, it is important that language designers understand the features necessary to provide powerful generics and that their absence causes serious difficulties for programmers.

Categories and Subject Descriptors

D.2.13 [Software Engineering]: Reusable Software—reusable libraries; D.3.2 [Programming Languages]: Language Classifications—multiparadigm languages; D.3.3 [Programming Languages]: Language Constructs and Features—abstract data types, constraints, polymorphism

General Terms

Languages, Design, Standardization

Keywords

generics, generic programming, polymorphism, C⁺⁺, Standard ML, Haskell, Eiffel, Java, C#

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

OOPSLA’03, October 26–30, 2003, Anaheim, California, USA.

1. INTRODUCTION

Generic programming is an increasingly popular and important paradigm for software development, and many modern programming languages provide basic support for it. For example, the use of type-safe polymorphic containers is routine programming practice today. Some languages have moved beyond elementary generics to a broader, more powerful interpretation, and their extensions have proven valuable in practice. One domain where generic programming has been particularly effective is reusable libraries of software components, an example of which is the Standard Template Library (STL), now part of the C⁺⁺ Standard Library [23, 45].

As the generic programming paradigm gains momentum, it is important to clearly and deeply understand the language issues. In particular, it is important to understand what language features are required to support the broader notion of generic programming.

To aid in this process, we present results of an in-depth study comparing six programming languages that support generics: Standard ML [35], C⁺⁺ [17], Haskell [21], Eiffel [30], Java (with the proposed genericity extension) [6], and Generic C# [24, 33]. The first four currently support generics while the latter two have proposed extensions (and prototype implementations) that do so. These languages were selected because they are widely used and represent the state of the art in programming languages with generics.

Our high-level goals for this study were the following:

• Understand what language features are necessary to support generic programming;

• Understand the extent to which specific languages support generic programming;

• Provide guidance for development of language support for generics; and

• Illuminate for the community some of the power and subtleties of generic programming.

It is decidedly not a goal of this paper to demonstrate that one language is “better” than any other language. This paper is also not a comparison of generic programming to object-oriented programming (or to any other paradigm).

To conduct the study, we designed a model library by extracting a small but significant example of generic programming from a state-of-the-art generic library (the Boost Graph Library [41]). The model library was fully implemented in all six target languages. This example was chosen because it includes a variety of generic programming techniques (some beyond the scope of, say, the STL) and could therefore expose many subtleties of generic programming. We attempted to create a uniform implementation across all of the languages while still using the standard techniques and idioms of each language. For each implementation, we evaluated the language features available to realize different facets of generic programming. In addition, we evaluated each implementation with respect to software quality issues that generic programming enables, such as modularity, safety, and conciseness of expression.

The outcome of this evaluation constitutes the main result of this paper and is summarized in Table 1. The table lists the eight language features that we identified as being important to generic programming and shows the level of support for each feature in each language. We find these features are necessary for the development of high-quality generic libraries. Incomplete support of these features can result in awkward designs, poor maintainability, unnecessary run-time checks, and painfully verbose code. As languages increasingly support generics, it is important that language designers understand the features necessary to provide powerful generics and that their absence causes serious difficulties for programmers.

The rest of this paper describes how we reached the conclusions in the table and why those language properties are important. The paper is organized as follows. Section 2 provides a brief introduction to generic programming and defines the terminology we use in the paper. Section 3 describes the design of the generic graph library that forms the basis for our comparisons. Sections 4 through 9 present the individual implementations of the graph library in the selected languages. Each of these sections also evaluates the level of support for generic programming provided by each language.

In Section 10 we discuss in detail the most important issues we encountered during the course of this study and provide a detailed explanation of Table 1. We present some conclusions in Section 11.

2. GENERIC PROGRAMMING

Definitions of generic programming vary. Typically, generic programming involves type parameters for data types and functions.

While it is true that type parameters are required for generic programming, there is much more to generic programming than just type parameters. Inspired by the STL, we take a broader view of generic programming and use the definition from [18] reproduced in Figure 1.

Associated with this definition, terminology and techniques for carrying out generic programming (and for supporting these key ideas) have emerged.

Terminology

Fundamental to realizing generic algorithms is the notion of abstraction: generic algorithms are specified in terms of abstract properties of types, not in terms of particular types. Following the terminology of Stepanov and Austern, we adopt the term concept to mean the formalization of an abstraction as a set of requirements on a type (or on a set of types) [1]. These requirements may be semantic as well as syntactic. A concept may incorporate the requirements of another concept, in which case the first concept is said to refine the second. Types that meet the requirements of a concept are said to model the concept. Note that a concept does not necessarily specify the requirements of just one type; a concept sometimes involves multiple types and specifies their relationships.

Concepts play an important role in specifying generic algorithms.

Since a concept may be modeled by any concrete type meeting its requirements, algorithms specified in terms of concepts must be able to be used with multiple types. Thus, generic algorithms must be polymorphic. For languages that explicitly support concepts, concepts are used to constrain type parameters.

Generic programming is a sub-discipline of computer science that deals with finding abstract representations of efficient algorithms, data structures, and other software concepts, and with their systematic organization. The goal of generic programming is to express algorithms and data structures in a broadly adaptable, interoperable form that allows their direct use in software construction. Key ideas include:

• Expressing algorithms with minimal assumptions about data abstractions, and vice versa, thus making them as interoperable as possible.

• Lifting of a concrete algorithm to as general a level as possible without losing efficiency; i.e., the most abstract form such that when specialized back to the concrete case the result is just as efficient as the original algorithm.

• When the result of lifting is not general enough to cover all uses of an algorithm, additionally providing a more general form, but ensuring that the most efficient specialized form is automatically chosen when applicable.

• Providing more than one generic algorithm for the same purpose and at the same level of abstraction, when none dominates the others in efficiency for all inputs. This introduces the necessity to provide sufficiently precise characterizations of the domain for which each algorithm is the most efficient.

Figure 1: Definition of Generic Programming

Traditionally, a concept consists of associated types, valid expressions, semantic invariants, and complexity guarantees. The associated types of a concept specify mappings from the modeling type to other collaborating types (see Figure 4 for an example). Valid expressions specify the operations that must be implemented for the modeling type. At this point in the state of the art, type systems typically do not include semantic invariants and complexity guarantees. Therefore, we state that for a type to properly model a concept, the associated types and valid expressions specified by the concept must be defined.

These primary aspects of generic programming, i.e., generic algorithms, concepts, refinement, modeling, and constraints, are realized in different ways in our different target programming languages. The specific language features that are used to support generic programming are summarized in Table 2.

Example

A simple example illustrates these generic programming issues.

The example is initially presented in C⁺⁺; Figure 2 shows versions in all six languages.

In C⁺⁺, type parameterization of functions is accomplished with templates. The following is an example of a generic algorithm, realized as a function template in C⁺⁺:

    template<class T>
    const T& pick(const T& x, const T& y) {
      if (better(x, y)) return x; else return y;
    }

This algorithm applies the better function to its arguments and returns the first argument if better returns true, otherwise it returns the second argument.

                          C⁺⁺   Standard ML   Haskell   Eiffel   Java Generics   Generic C#
Multi-type concepts        -         ●           ●∗        ○           ○             ○
Multiple constraints       -         ◐           ●         ○†          ●             ○‡
Associated type access     ●         ●           ◐         ◐           ◐             ◐
Retroactive modeling       -         ●           ●         ○           ○             ○
Type aliases               ●         ●           ●         ○           ○             ○
Separate compilation       ○         ●           ●         ●           ●             ●
Implicit instantiation     ●         ○           ●         ○           ●             ○‡
Concise syntax             ●         ◐           ●         ○           ◐             ○

∗ Using the multi-parameter type class extension to Haskell 98 [22]. † Planned language additions. ‡ Planned for inclusion in Whidbey release of C#.

Table 1: The level of support for important properties for generic programming in the evaluated languages. “Multi-type concepts” indicates whether multiple types can be simultaneously constrained. “Multiple constraints” indicates whether more than one constraint can be placed on a type parameter. “Associated type access” rates the ease with which types can be mapped to other types within the context of a generic function. “Retroactive modeling” indicates the ability to add new modeling relationships after a type has been defined. “Type aliases” indicates whether a mechanism for creating shorter names for types is provided. “Separate compilation” indicates whether generic functions are type-checked and compiled independently from their use. “Implicit instantiation” indicates that type parameters can be deduced without requiring explicit syntax for instantiation. “Concise syntax” indicates whether the syntax required to compose layers of generic components is independent of the scale of composition. The rating of “-” in the C⁺⁺ column indicates that while C⁺⁺ does not explicitly support the feature, one can still program as if the feature were supported due to the flexibility of C⁺⁺ templates.

Role                C⁺⁺                 ML              Haskell                Eiffel             Java generics    Generic C#
Generic algorithm   function template   functor         polymorphic function   generic class      generic method   generic method
Concept             documentation       signature       type class             deferred class     interface        interface
Refinement          documentation       include         inheritance (=>)       inherit            extends          inherit (:)
Modeling            documentation       implicit        instance               inherit            implements       inherit (:)
Constraint          documentation       param sig (:)   context (=>)           conformance (->)   extends          where

Table 2: The roles of language features used for generic programming.

Not every type can be used with pick. The concept Comparable is defined to represent types that may be used with pick. Unfortunately, C⁺⁺ does not support concepts directly, so naming and documentation conventions have been established to represent them [1].

The Comparable concept is documented this way in C⁺⁺:

    Comparable
      bool better(const T&, const T&)

Any type T is a model of Comparable if there is a better function with the given signature. For int to model Comparable, we simply define a better function for ints:

    bool better(int i, int j) { return j < i; }

In C⁺⁺ it is customary to identify concepts by appropriately naming template parameters. The previous example would normally be written

    template<class Comparable>
    const Comparable&
    pick(const Comparable& x, const Comparable& y) {
      if (better(x, y)) return x; else return y;
    }

We define two types, Apple and Orange:

    struct Apple {
      Apple(int r) : rating(r) {}
      int rating;
    };
    bool better(const Apple& a, const Apple& b)
      { return b.rating < a.rating; }

    struct Orange {
      Orange(const string& s) : name(s) {}
      string name;
    };
    bool better(const Orange& a, const Orange& b)
      { return lexicographical_compare(b.name.begin(), b.name.end(),
                                       a.name.begin(), a.name.end()); }

Apple and Orange model the Comparable concept implicitly via the existence of the better function for those types.

We finish by calling the generic algorithm pick with arguments of type int, Apple, and Orange.

    int main(int, char*[]) {
      int i = 0, j = 2;
      Apple a1(3), a2(5);
      Orange o1("Miller"), o2("Portokalos");
      int k = pick(i, j);
      Apple a3 = pick(a1, a2);
      Orange o3 = pick(o1, o2);
      return EXIT_SUCCESS;
    }
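The fragments above can be assembled into one translation unit. The following is a minimal sketch of such an assembly, not the authors' published source. It adds the standard headers, qualifies the standard library names, and declares the int overload of better before the pick template, because argument-dependent lookup finds nothing for built-in types. The helper run_pick_example is hypothetical and exists only to exercise the three models.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// For built-in types such as int, argument-dependent lookup finds nothing,
// so this overload has to be visible before the template that calls it.
bool better(int i, int j) { return j < i; }

// Generic algorithm: returns x if better(x, y) holds, otherwise y.
template <class Comparable>
const Comparable& pick(const Comparable& x, const Comparable& y) {
  if (better(x, y)) return x; else return y;
}

struct Apple {
  explicit Apple(int r) : rating(r) {}
  int rating;
};
// Declared after pick; found by argument-dependent lookup at instantiation.
bool better(const Apple& a, const Apple& b) { return b.rating < a.rating; }

struct Orange {
  explicit Orange(const std::string& s) : name(s) {}
  std::string name;
};
bool better(const Orange& a, const Orange& b) {
  return std::lexicographical_compare(b.name.begin(), b.name.end(),
                                      a.name.begin(), a.name.end());
}

// Hypothetical helper (not in the paper): calls pick with all three models.
void run_pick_example() {
  int i = 0, j = 2;
  Apple a1(3), a2(5);
  Orange o1("Miller"), o2("Portokalos");
  assert(pick(i, j) == 2);                    // 2 is better than 0
  assert(pick(a1, a2).rating == 5);           // higher rating wins
  assert(pick(o1, o2).name == "Portokalos");  // lexicographically later wins
}
```

With these definitions of better, pick returns the larger int, the higher-rated Apple, and the lexicographically later Orange.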

C⁺⁺

    // concept Comparable:
    //   bool better(const T&, const T&)
    template<class Comparable>
    const Comparable& pick(const Comparable& x, const Comparable& y) {
      if (better(x, y)) return x; else return y;
    }

    struct Apple {
      Apple(int r) : rating(r) {}
      int rating;
    };
    bool better(const Apple& a, const Apple& b)
      { return b.rating < a.rating; }

    int main(int, char*[]) {
      Apple a1(3), a2(5);
      Apple a3 = pick(a1, a2);
    }

ML

    signature Comparable = sig
      type value_t
      val better : value_t * value_t -> bool
    end

    functor MakePick(C : Comparable) = struct
      type value_t = C.value_t
      fun pick x y = if C.better(x,y) then x else y
    end

    structure Apple = struct
      datatype value_t = AppleT of int
      fun create n = AppleT n
      fun better ((AppleT x),(AppleT y)) = y < x
    end

    structure PickApples = MakePick(Apple)
    val a1 = Apple.create 5 and a2 = Apple.create 3
    val a3 = PickApples.pick a1 a2

Haskell

    class Comparable t where
      better :: (t, t) -> Bool

    pick :: Comparable t => (t, t) -> t
    pick (x, y) = if (better (x, y)) then x else y

    data Apple = MkApple Int

    instance Comparable Apple where
      better = (\(MkApple m, MkApple n) -> n < m)

    a1 = MkApple 3; a2 = MkApple 5
    a3 = pick(a1, a2)

Eiffel

    deferred class COMPARABLE[T]
    feature
      better (a: T) : BOOLEAN is deferred end
    end

    class PICK[T -> COMPARABLE[T]]
    feature
      go (a: T; b: T) : T is do
        if a.better(b) then
          Result := a
        else
          Result := b
        end
      end
    end

    class APPLE
    inherit COMPARABLE[APPLE] end
    create make
    feature
      make(r: INTEGER) is do rating := r end
      better (a: APPLE) : BOOLEAN is do
        Result := rating < a.rating;
      end
    feature {APPLE}
      rating : INTEGER
    end

    class ROOT_CLASS
    create make
    feature
      make is
        local
          a1, a2, a3 : APPLE;
          picker: PICK[APPLE];
        do
          create picker;
          create a1.make(3); create a2.make(5);
          a3 := picker.go(a1, a2);
        end
    end

Java Generics

    interface Comparable<T> {
      boolean better(T x);
    }

    class pick {
      static <T extends Comparable<T>>
      T go(T a, T b) {
        if (a.better(b)) return a; else return b;
      }
    }

    class Apple implements Comparable<Apple> {
      Apple(int r) { rating = r; }
      public boolean better(Apple x)
        { return x.rating < rating; }
      int rating;
    }

    public class Main {
      public static void main(String[] args) {
        Apple a1 = new Apple(3),
              a2 = new Apple(5);
        Apple a3 = pick.go(a1, a2);
      }
    }

Generic C#

    interface Comparable<T> {
      bool better(T x);
    }

    class pick {
      public static T go<T>(T a, T b) where T : Comparable<T> {
        if (a.better(b)) return a; else return b;
      }
    }

    class Apple : Comparable<Apple> {
      public Apple(int r) { rating = r; }
      public bool better(Apple x)
        { return x.rating < rating; }
      private int rating;
    }

    public class Main_eg {
      public static int Main(string[] args) {
        Apple a1 = new Apple(3),
              a2 = new Apple(5);
        Apple a3 = pick.go<Apple>(a1, a2);
        return 0;
      }
    }

Figure 2: Comparing Apples to Apples. The Comparable concept, pick function, and Apple data type are implemented in each of our target languages. A simple example using each language is also shown.

3. A GENERIC GRAPH LIBRARY

To evaluate support for generic programming, a library of graph data structures was implemented in each language. The library provides generic algorithms associated with breadth-first search, including Dijkstra’s single-source shortest paths and Prim’s minimum spanning tree algorithms [13, 39]. The design presented here descends from the generic graph library presented in [43], which evolved into the Boost Graph Library (BGL) [41].

Figure 3 depicts the graph algorithms, their relationships, and how they are parameterized. Each large box represents an algorithm and the attached small boxes represent type parameters. An arrow from one algorithm to another specifies that one algorithm is implemented using the other. An arrow from a type parameter to an unboxed name specifies that the type parameter must model that concept. For example, the breadth-first search algorithm has three type parameters: G, C, and Vis. Each of these has requirements: G must model the Vertex List Graph and Incidence Graph concepts, C must model the Read/Write Map concept, and Vis must model the BFS Visitor concept. Finally, breadth-first search is implemented using the graph search algorithm.

The core algorithm of this library is graph search, which traverses a graph and performs user-defined operations at certain points in the search. The order in which vertices are visited is controlled by a type argument, B, that models the Bag concept. This concept abstracts a data structure with insert and remove operations but no requirements on the order in which items are removed. When B is bound to a FIFO queue, the traversal order is breadth-first. When it is bound to a priority queue based on distance to a source vertex, the order is closest-first, as in Dijkstra’s single-source shortest paths algorithm. Graph search is also parameterized on actions to take at event points during the search, such as when a vertex is first discovered. This parameter, Vis, must model the Visitor concept. The graph search algorithm also takes a type parameter C for mapping each vertex to its color, and C is required to model the Read/Write Map concept.
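To make the role of the Bag parameter concrete, the following is a minimal C⁺⁺ sketch under several simplifying assumptions; it is not the library's code. The graph is a plain vector of adjacency lists, the color map is replaced by a vector of flags, the visitor has a single event, and a LIFO bag stands in for the priority queue mentioned above to keep the sketch short. All names (Graph, FifoBag, LifoBag, Recorder, discovery_order) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stack>
#include <vector>

// Hypothetical graph representation: adjacency lists over vertices 0..n-1.
typedef std::vector<std::vector<int> > Graph;

// Two models of a Bag: insert, remove, and an emptiness test.
struct FifoBag {
  std::queue<int> q;
  void insert(int v) { q.push(v); }
  int remove() { int v = q.front(); q.pop(); return v; }
  bool empty() const { return q.empty(); }
};
struct LifoBag {
  std::stack<int> s;
  void insert(int v) { s.push(v); }
  int remove() { int v = s.top(); s.pop(); return v; }
  bool empty() const { return s.empty(); }
};

// Graph search: the Bag decides the visiting order, the visitor observes it.
template <class Bag, class Vis>
void graph_search(const Graph& g, int s, Bag bag, Vis& vis) {
  assert(s >= 0 && static_cast<std::size_t>(s) < g.size());
  std::vector<bool> discovered(g.size(), false);
  discovered[s] = true;
  vis.discover_vertex(s);
  bag.insert(s);
  while (!bag.empty()) {
    int u = bag.remove();
    for (std::size_t k = 0; k < g[u].size(); ++k) {
      int v = g[u][k];
      if (!discovered[v]) {
        discovered[v] = true;
        vis.discover_vertex(v);
        bag.insert(v);
      }
    }
  }
}

// Visitor that records the order in which vertices are discovered.
struct Recorder {
  std::vector<int> order;
  void discover_vertex(int v) { order.push_back(v); }
};

// Builds the graph 0->1, 0->2, 1->3, 2->4 and searches it from vertex 0.
template <class Bag>
std::vector<int> discovery_order() {
  Graph g(5);
  g[0].push_back(1); g[0].push_back(2);
  g[1].push_back(3);
  g[2].push_back(4);
  Recorder rec;
  graph_search(g, 0, Bag(), rec);
  return rec.order;
}
```

On this small graph the FIFO bag discovers the vertices in the order 0 1 2 3 4, while the LIFO bag yields 0 1 2 4 3; only the Bag argument differs between the two calls.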

The Read Map and Read/Write Map concepts represent variants of an important abstraction in the graph library: the property map. In practice, graphs represent domain-specific entities. For example, a graph might depict the layout of a communication network, vertices representing endpoints and edges representing direct links. In addition to the number of vertices and the edges between them, a graph may associate values to its elements. Each vertex of a communication network graph might have a name and each edge a maximum transmission rate. Some algorithms require access to domain information associated with the graph representation. For example, Prim’s minimum spanning tree algorithm requires “weight” information associated with each edge in a graph. Property maps provide a convenient implementation-agnostic means of expressing, to algorithms, relations between graph elements and domain-specific data. Some graph data structures directly contain associated values with each node; others use external associative data structures to express these relationships. Interfaces based on property maps work equally well with both representations.
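The two storage strategies can be put behind the same get and put interface. The sketch below is illustrative only and assumes integer vertex identifiers and a per-vertex weight; the type and function names other than get and put (VertexRecord, InteriorWeightMap, ExteriorWeightMap, total_weight, reset_weights) are hypothetical and do not come from the library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Representation 1: the value is stored inside each vertex record.
struct VertexRecord { std::string name; double weight; };
struct InteriorWeightMap { std::vector<VertexRecord>* vertices; };
double get(const InteriorWeightMap& m, int v) {
  assert(m.vertices != 0);
  return (*m.vertices)[v].weight;
}
void put(InteriorWeightMap& m, int v, double w) {
  (*m.vertices)[v].weight = w;
}

// Representation 2: the value lives in an external associative container.
struct ExteriorWeightMap { std::map<int, double> weights; };
double get(const ExteriorWeightMap& m, int v) {
  std::map<int, double>::const_iterator it = m.weights.find(v);
  return it == m.weights.end() ? 0.0 : it->second;
}
void put(ExteriorWeightMap& m, int v, double w) { m.weights[v] = w; }

// Needs only a Read Map: sums the values of vertices 0..n-1.
template <class ReadMap>
double total_weight(const ReadMap& m, int n) {
  double sum = 0.0;
  for (int v = 0; v < n; ++v) sum += get(m, v);
  return sum;
}

// Needs a Read/Write Map: overwrites the values of vertices 0..n-1.
template <class ReadWriteMap>
void reset_weights(ReadWriteMap& m, int n, double w) {
  for (int v = 0; v < n; ++v) put(m, v, w);
}
```

Neither generic function mentions the storage layout, so the same code runs against values embedded in the graph and values kept in a separate container.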

The graph algorithms are all parameterized on the graph type. Graph search takes one type parameter G, which must model two concepts, Incidence Graph and Vertex List Graph. The Incidence Graph concept defines an interface for accessing out-edges of a vertex. Vertex List Graph specifies an interface for accessing the vertices of a graph in an unspecified order. The Bellman-Ford shortest paths algorithm [4] requires a model of the Edge List Graph concept, which provides access to all the edges of a graph.

That graph capabilities are partitioned among three concepts illustrates generic programming’s emphasis on algorithm requirements. The Bellman-Ford shortest paths algorithm requires of a graph only the operations described by the Edge List Graph concept. Graph search, in contrast, requires the functionality of both its required concepts. By partitioning the functionality of graphs, each algorithm can be used with any data type that meets its minimum requirements. If the three graph concepts were replaced with one, each algorithm would require more from its graph type parameter than necessary, and would thus unnecessarily restrict the set of types with which it could be used.

The graph library design is suitable for evaluating generic programming capabilities of languages because it includes a rich variety of generic programming techniques. Most of the algorithms are implemented using other library algorithms: breadth-first search and Dijkstra’s shortest paths use graph search, Prim’s minimum spanning tree algorithm uses Dijkstra’s algorithm, and Johnson’s all-pairs shortest paths algorithm uses both Dijkstra’s and Bellman-Ford shortest paths. Type parameters for some algorithms, such as the G parameter to breadth-first search, must model multiple concepts. In addition, the algorithms require certain relationships between type parameters. For example, consider the graph search algorithm. The C type argument, as a model of Read/Write Map, is required to have an associated key type. The G type argument is required to have an associated vertex type. Graph search requires that these two types be the same.
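The same-type requirement between the key type of C and the vertex type of G can be stated in code. The following sketch uses C⁺⁺11 facilities (static_assert and std::is_same), which postdate this study, so it illustrates the requirement rather than the technique used in the library. The traits templates follow the naming of the paper; ListGraph, ColorMap, NameKeyedMap, key_is_vertex, and check_search_requirements are hypothetical.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Traits templates in the style used by the graph library.
template <class G> struct graph_traits;
template <class M> struct map_traits;

// Hypothetical graph type whose vertices are ints.
struct ListGraph { std::vector<std::vector<int> > adj; };
template <> struct graph_traits<ListGraph> { typedef int vertex; };

// Hypothetical map keyed by int: compatible with ListGraph.
struct ColorMap { std::vector<int> colors; };
template <> struct map_traits<ColorMap> {
  typedef int key;
  typedef int value;
};

// Hypothetical map keyed by string: not compatible with ListGraph.
struct NameKeyedMap {};
template <> struct map_traits<NameKeyedMap> {
  typedef std::string key;
  typedef int value;
};

// True exactly when the key type of C equals the vertex type of G.
template <class G, class C>
struct key_is_vertex
    : std::is_same<typename map_traits<C>::key,
                   typename graph_traits<G>::vertex> {};

// Stand-in for graph search: only the same-type requirement is checked.
template <class G, class C>
void check_search_requirements(const G&, const C&) {
  static_assert(key_is_vertex<G, C>::value,
                "key type of C must be the vertex type of G");
}
```

Calling check_search_requirements with a ListGraph and a NameKeyedMap would be rejected at compile time, because the two associated types differ.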

The graph library is used throughout the remainder of this paper as a common basis for discussion. Though the entire library was implemented in each language, discussion is limited for brevity.

We focus on the interface of the breadth-first search algorithm and the infrastructure surrounding it, including concept definitions and an example use of the algorithm. The interested reader can find the full implementations for each language, including instructions for compilation, at the following URL:

http://www.osl.iu.edu/research/comparing/

4. GRAPH LIBRARY IN C⁺⁺

C⁺⁺ generics were intentionally designed to exceed what is required to implement containers. The resulting template system provides a platform for experimentation with, and insight into the expressive power of, generic programming. Before templates, C⁺⁺ was primarily considered an object-oriented programming language. Templates were added to C⁺⁺ for the same reason that generics were added to several other languages in our study: to provide a means for developing type-safe containers [46, §15.2]. Greater emphasis was placed on clean and consistent design than on restriction and policy. For example, although function templates are not necessary to develop type-safe polymorphic containers, C⁺⁺ has always supported classes and standalone functions equally; supporting function templates in addition to class templates preserves that design philosophy. Early experiments in developing generic functions suggested that more comprehensive facilities would be beneficial. These experiments also inspired design decisions that differ from the object-oriented generics designs (Java generics, Generic C#, and Eiffel). For example, C⁺⁺ does not contain any explicit mechanism for constraining template parameters. During C⁺⁺ standardization, several mechanisms were proposed for constraining template parameters, including subtype-based constraints. All proposed mechanisms were found to either undermine the expressive power of generics or to inadequately express the variety of constraints used in practice [46, §15.4].

Two C⁺⁺ language features combine to enable generic programming: templates and function overloading. C⁺⁺ includes both function templates and class templates; we use function templates to represent generic algorithms. We discuss the role of function overloading in the next section. In C⁺⁺, templates are not separately type checked. Instead, type checking is performed after instantiation at each call site. Type checking of the bound types can only succeed when the input types have satisfied the type requirements of the function template body. Unfortunately, because of this, if a generic algorithm is invoked with an improper type, byzantine and potentially misleading error messages may result.

4.1 Implementation

[Figure 3 diagram: boxes for the algorithms Breadth-First Search, Dijkstra Shortest Paths, Johnson All-Pairs, Prim Min Span Tree, Graph Search, and Bellman-Ford Shortest Paths with their type parameters, connected by <uses> arrows and linked to the concepts Incidence Graph, Vertex List Graph, Edge List Graph, Read-Map, Read/Write-Map, Visitor, BFS Visitor, and Bag.]

Figure 3: Graph algorithm parameterization and reuse within the graph library. Arrows for redundant models relationships are not shown. For example, the type parameter G of breadth-first search must also model Incidence Graph because breadth-first search uses graph search.

    template<class G, class C, class Vis>
    void breadth_first_search(const G& g,
      typename graph_traits<G>::vertex s, C c, Vis vis);

    constraints:
      G models Vertex List Graph and Incidence Graph
      C models Read/Write Map
      map_traits<C>::key == vertex
      map_traits<C>::value models Color
      Vis models BFS Visitor

Figure 4: Breadth-first search as a function template.

The breadth_first_search function template is shown in Figure 4. C⁺⁺ does not provide direct support for constraining type parameters; standard practice is to express constraints in documentation in conjunction with meaningful template parameter names [1]. Techniques for checking constraints in C⁺⁺ can be implemented as a library [29, 42]. These techniques, however, are distinct from actual language support and involve insertion of what are essentially compile-time assertions into the bodies of generic algorithms.
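As an illustration of a compile-time assertion placed in the body of a generic algorithm, the sketch below checks the Comparable requirement from Section 2. It relies on C⁺⁺11 static_assert and expression SFINAE, which postdate this study, and it is not the mechanism of the libraries cited in [29, 42]. The names models_comparable, checked_pick, Pear, and checked_pick_example are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Apple { int rating; };
bool better(const Apple& a, const Apple& b) { return b.rating < a.rating; }

struct Pear { int weight; };  // no better(): not a model of Comparable

// Detects whether better(const T&, const T&) is a valid expression whose
// result converts to bool.
template <class T>
class models_comparable {
  template <class U>
  static auto test(int)
      -> decltype(static_cast<bool>(better(std::declval<const U&>(),
                                           std::declval<const U&>())),
                  std::true_type());
  template <class U>
  static std::false_type test(...);
public:
  static const bool value = decltype(test<T>(0))::value;
};

// pick with the requirement asserted at the top of the body, so misuse is
// reported with a message that names the concept.
template <class T>
const T& checked_pick(const T& x, const T& y) {
  static_assert(models_comparable<T>::value,
                "T must model Comparable: better(const T&, const T&)");
  return better(x, y) ? x : y;
}

// Hypothetical usage: the higher-rated Apple is returned.
void checked_pick_example() {
  Apple a = {3}, b = {5};
  assert(checked_pick(a, b).rating == 5);
}
```

Calling checked_pick with two Pear objects fails to compile, and the diagnostic includes the static_assert message in addition to the failed call to better.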

The graph_traits class template provides access to the associated types of the graph type. Here we use graph_traits to access the vertex type. Traits classes are an idiom used in C⁺⁺ to map types to other types or functions [37]. A traits class is a class template. For each type in the domain of the map, a specialized version of the class template is created containing nested typedefs and member functions. In Figure 5 we specialize graph_traits for the AdjacencyList class, which models Graph.

Inside the breadth_first_search function, calls to functions associated with the concepts, such as out_edges from Incidence Graph, are resolved by the usual function overloading rules for C⁺⁺. That is, each is resolved to the best overload for the given argument types.

Documentation for the graph concepts is shown in Table 3. In addition to function signatures, the concepts specify access to associated types such as vertex, edge, and iterator types through the graph_traits class.

A sketch of a concrete adjacency list implementation is shown in Figure 5. The AdjacencyList class is a model of the Incidence Graph and Vertex List Graph concepts, but this fact is implicit. There is no mechanism for specifying that AdjacencyList models these concepts. The graph_traits class is specialized for AdjacencyList so the associated types can be accessed from within function templates.

The definitions of the Read/Write Map and Read Map concepts are in Table 4 and the definition of the BFS Visitor concept is in Table 5.

    Graph
      graph_traits<G>::vertex
      graph_traits<G>::edge
      vertex src(edge, const G&);
      vertex tgt(edge, const G&);

    Incidence Graph refines Graph
      graph_traits<G>::out_edge_iter models Iterator
      pair<out_edge_iter> out_edges(vertex, const G&);
      int out_degree(vertex, const G&);

    Vertex List Graph
      graph_traits<G>::vertex_iter models Iterator
      pair<vertex_iter> vertices(const G&);
      int num_vertices(const G&);

Table 3: Documentation for the graph concepts.

    class AdjacencyList {
    public:
      ...
    private:
      vector< list<int> > adj_lists;
    };

    template<> struct graph_traits<AdjacencyList> {
      typedef int vertex;
      typedef pair<int, int> edge;
      typedef list<int>::const_iterator out_edge_iter;
      ...
    };

Figure 5: Sketch of a concrete graph implementation.

    Read Map
      map_traits<M>::key
      map_traits<M>::value
      value get(const M&, key);

    Read/Write Map refines Read Map
      void put(M&, key, value);

Table 4: Documentation for the mapping concepts.

    BFS Visitor
      void V::discover_vertex(vertex, G);
      void V::finish_vertex(vertex, G);
      void V::examine_edge(edge, G);
      void V::tree_edge(edge, G);
      void V::non_tree_edge(edge, G);
      void V::gray_target(edge, G);
      void V::black_target(edge, G);

Table 5: Documentation for the BFS Visitor concept.

In the code below, an example use of the breadth_first_search function is presented. The vertices of a graph are output in breadth-first order by creating the test_vis visitor that overrides the function discover_vertex; empty implementations of the other visitor functions are provided by default_bfs_visitor. A graph is constructed using the AdjacencyList class, and then the call to breadth_first_search is made. The call site is the point where type checking occurs for the body of the breadth_first_search function template; function templates are not separately type checked. This type check ensures that the argument types satisfy the needs of the body of the generic function, but it does not verify that the types model the concepts required by the algorithm (because the needs of the body may be less than the declared constraints for the function).

typedef graph_traits<AdjacencyList>::vertex vertex;

struct test_vis : public default_bfs_visitor {
  void discover_vertex(vertex v, const AdjacencyList& g) { cout << v << " "; }
};

int main(int, char*[]) {
  int n = 7;
  typedef pair<int,int> E;
  E edges[] = { E(0,1), E(1,2), E(1,3), E(3,4), E(0,4), E(4,5), E(3,6) };
  AdjacencyList g(n, edges);
  vertex s = get_vertex(0, g);
  vector_property_map color(n, white);
  breadth_first_search(g, s, color, test_vis());
  return EXIT_SUCCESS;
}
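The gap between what the body needs and what the algorithm declares can be illustrated with a much smaller function template. The following is a minimal sketch, not part of the graph library: count_vertices and PartialGraph are hypothetical names, and the concept requirement appears only as a comment because the language has no way to state it.

```cpp
#include <cassert>

// Documented (but unenforced) requirement: G models Vertex List Graph,
// so both num_vertices(g) and vertices(g) should be available.
template <class G>
int count_vertices(const G& g) {
    // The body only calls num_vertices, so that is the only operation
    // checked when the template is instantiated at a call site.
    return num_vertices(g);
}

// Hypothetical type that provides num_vertices but no vertices function,
// so it does not model the documented concept.
struct PartialGraph {
    int n;
};

int num_vertices(const PartialGraph& g) { return g.n; }
```

Under these assumptions, a call such as count_vertices(g) with a PartialGraph argument compiles and runs, even though PartialGraph lacks the vertices function that the documented concept requires.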

4.2 Evaluation of C⁺⁺ Generics

C⁺⁺ templates succeed in enabling the expression of generic algorithms, even for large and complex generic libraries. It is relatively easy to convert concrete functions to function templates, and function templates are just as convenient for the client to call as normal functions. The traits mechanism provides a way to access associated types, an area where several other languages fail.
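As a small illustration of how a generic function reaches associated types through traits, consider the following sketch. It reuses the graph_traits and AdjacencyList names from Figure 5, but the make_edge helper is hypothetical and the AdjacencyList body is reduced to a single member for brevity.

```cpp
#include <cassert>
#include <list>
#include <utility>
#include <vector>

// The primary template is only declared; each graph type supplies
// its own specialization.
template <class G> struct graph_traits;

// Reduced stand-in for the AdjacencyList class of Figure 5.
struct AdjacencyList {
    std::vector< std::list<int> > adj_lists;
};

template <> struct graph_traits<AdjacencyList> {
    typedef int vertex;
    typedef std::pair<int, int> edge;
    typedef std::list<int>::const_iterator out_edge_iter;
};

// Hypothetical helper: the vertex and edge types are never spelled out,
// they are looked up through the traits class of the graph type G.
template <class G>
typename graph_traits<G>::edge
make_edge(typename graph_traits<G>::vertex u,
          typename graph_traits<G>::vertex v) {
    typedef typename graph_traits<G>::edge edge_type;
    return edge_type(u, v);
}
```

With these definitions, make_edge<AdjacencyList>(0, 1) yields a pair<int, int> whose members are 0 and 1. A different graph type would only need its own graph_traits specialization.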

The C⁺⁺ template mechanism, however, has some drawbacks in the area of modularity. The complete implementations of templates reside in header files (or an equivalent). Thus, users must recompile when template implementations change. In addition, at call sites to function templates, the arguments are not type checked against the interface of the function (the interface is not expressed in the code); instead, the body of the function template is type checked. As a result, when a function template is misused, the resulting error messages point to lines within the function template. The internals of the library are thus needlessly exposed to the user and the real reason for the error becomes harder to find.

Another problem with modularity is introduced by the C⁺⁺ overload resolution rules. During overload resolution, functions within namespaces that contain the definitions of the types of the arguments are considered in the overload set (“argument-dependent lookup”). As a result, any function call inside a function template may resolve to functions in other namespaces. Sometimes this may be the desired result, but other times not. Typically, the operations required by the constraints of the function template are meant to bind to functions in the client’s namespace, whereas other calls are meant to bind to functions in the namespace of the generic library. With argument-dependent lookup, these other calls can be accidentally hijacked by functions with the same name in the client’s namespace.
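The hijacking scenario can be reproduced in a few lines. The sketch below uses hypothetical namespaces lib and client and a hypothetical internal function helper; it is meant only to show the lookup behavior, not any particular library.

```cpp
#include <cassert>

namespace lib {
    // Helper intended for internal use by the library.
    template <class T>
    int helper(const T&) { return 1; }

    template <class T>
    int algorithm(const T& x) {
        // Unqualified call: argument-dependent lookup also searches
        // the namespace in which T is defined.
        return helper(x);
    }

    template <class T>
    int safe_algorithm(const T& x) {
        // Qualified call: always binds to lib::helper.
        return lib::helper(x);
    }
}

namespace client {
    struct Widget {};
    // Unrelated client function that happens to share the helper's name.
    // As a non-template exact match, it wins overload resolution.
    int helper(const Widget&) { return 2; }
}
```

Here lib::algorithm applied to a client::Widget returns 2, because the call inside the template binds to client::helper, while lib::safe_algorithm returns 1. Qualifying internal calls is one common defense, at the cost of disabling argument-dependent lookup for calls where it is wanted.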

Nevertheless, C⁺⁺ templates still provide type safety with genericity; there is no need to use downcasts or similar mechanisms when constructing generic libraries. Of course, C⁺⁺ itself is not fully type safe because of various loopholes that exist in the type system. These loopholes, however, are orthogonal to templates. The template system does not introduce new issues with respect to type safety.

Finally, since templates are purely a means for obtaining static (compile-time) polymorphism, there is no run-time performance penalty due to templates per se. Generic libraries, however, make heavy use of procedural and data abstraction, which can induce run-time overheads, though good optimizing compilers are adept at flattening these layers of abstraction. C⁺⁺ can therefore be an excellent tool for applications where run-time efficiency is critical [44, 47]. Heavy use of templates can sometimes lead to significant increases in executable size, although there are programming idioms that ameliorate this problem.
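The text does not name the idioms it has in mind. One commonly cited candidate, shown here purely as an assumption, is to put the bulk of the code in a non-template core and to keep the template itself a thin, typed wrapper. PtrStackImpl and PtrStack are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-template core: compiled once, stores untyped pointers.
class PtrStackImpl {
public:
    void push(void* p) { items_.push_back(p); }
    void* pop() {
        void* p = items_.back();
        items_.pop_back();
        return p;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Thin typed wrapper: each instantiation adds only small forwarding
// functions and casts, which compilers can usually inline away.
template <class T>
class PtrStack {
public:
    void push(T* p) { impl_.push(p); }
    T* pop() { return static_cast<T*>(impl_.pop()); }
    std::size_t size() const { return impl_.size(); }
private:
    PtrStackImpl impl_;
};
```

Instantiating PtrStack for many element types then shares one copy of the container logic, while clients still see a statically typed interface with no casts of their own.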

5. GRAPH LIBRARY IN ML

Generic programs in Standard ML leverage three language features: structures, signatures, and functors. Structures group program components into named modules. They manage the visibility of identifiers and at the same time package related functions, types, values, and other structures. Signatures constrain the contents of structures. A signature prescribes what type names, values, and nested structures must appear in a structure. A signature also prescribes a type for each value, and a signature for each nested structure. In essence, signatures play the same role for structures as types play for values. Functors are templates for creating new structures and are parameterized on values, types, and structures.

Multiple structures of similar form can be represented using a single functor that emphasizes characteristics the structures hold in common. Differences between these structures are captured by the functor’s parameters. Functors represent ML’s primary mechanism for generics. As illustrated in the following, structures, signatures, and functors together enable generic programming.

5.1 Implementation

Concepts are expressed in ML using signatures. The following code shows ML representations of graph concepts for the breadth-first search algorithm:

signature GraphSig =
sig
  type graph_t
  eqtype vertex_t
end

signature IncidenceGraphSig =
sig
  include GraphSig
  type edge_t
  val out_edges : graph_t -> vertex_t -> edge_t list
  val source : graph_t -> edge_t -> vertex_t
  val target : graph_t -> edge_t -> vertex_t
end


signature VertexListGraphSig =
sig
  include GraphSig
  val vertices : graph_t -> vertex_t list
  val num_vertices : graph_t -> int
end

For signature names, we use the convention of affixing Sig to the end of corresponding concept names. The GraphSig signature represents the Graph concept and requires graph_t and vertex_t types. It also requires vertex_t to be an equality type, meaning vertex_t values can be compared using the = operator.

IncidenceGraphSig and VertexListGraphSig demonstrate concept refinement in ML. The clause include GraphSig in each signature imports the contents of the GraphSig signature. The include directive cannot, however, represent all refinements between concepts. Though a signature may include more than one other signature, all included signatures must declare different identifiers. Consider the following code:

(* ERROR: VertexListGraphSig and IncidenceGraphSig overlap *)
signature VertexListAndIncidenceGraphSig =
sig
  include VertexListGraphSig
  include IncidenceGraphSig
end

This example shows an incorrect attempt to describe a Vertex List And Incidence Graph concept that refines both the Vertex List Graph and Incidence Graph concepts. The ML type system rejects this example because both VertexListGraphSig and IncidenceGraphSig share the vertex_t and graph_t names from the GraphSig signature. To work around this issue, an algorithm that would otherwise require a model of the Vertex List And Incidence Graph concept instead requires two arguments, a model of Vertex List Graph and a model of Incidence Graph, and places additional restrictions on those arguments. The implementation of breadth-first search in ML, shown later, demonstrates this technique.

Program components that model concepts are implemented as structures. The following code shows the adjacency list graph implemented in ML:

structure ALGraph =
struct
  datatype graph_t = Data of int * int list Array.array
  type vertex_t = int
  type edge_t = int * int

  fun create(nv : int) = Data(nv,Array.array(nv,[]))
  fun add_edge (G as Data(n,g),(src:int),(tgt:int)) =
    ( Array.update(g,src,tgt::Array.sub(g,src)); G )
  fun vertices (Data(n,g)) = List.tabulate(n,fn a => a);
  fun num_vertices (Data(n,g)) = n
  fun out_edges (Data(n,g)) v = map (fn n => (v,n)) (Array.sub(g,v))
  fun adjacent_vertices (Data(n,g),v) = Array.sub(g,v)
  fun source (Data(n,g)) (src,tgt) = src
  fun target (Data(n,g)) (src,tgt) = tgt
  fun edges (Data(n,g)) =
    #2(Array.foldl (fn (tgts:int list,(src,sofar:(int*int) list)) =>
        (src+1,(map (fn n => (src,n)) tgts) @ sofar)) (0,[]) g)
end;

The ALGraph structure encapsulates types that represent graph values and functions that operate on them. Because it meets the requirements of the GraphSig, VertexListGraphSig, and IncidenceGraphSig signatures, ALGraph is said to model the Graph, Vertex List Graph, and Incidence Graph concepts. ALGraph defines additional functions that fall outside the requirements of the three signatures. The create function, for example, constructs a value of type graph_t, which represents a graph with nv vertices.

In ML, algorithms are implemented using functors. The following code illustrates the general structure of a generic breadth-first search implementation:

functor MakeBFS(Params : BFSPSig) =
struct
  fun breadth_first_search g v vis map = ...
end;

Generic algorithms are instantiated by way of functor application. When a functor is applied to parameters that satisfy certain requirements, it creates a new structure specialized for the functor parameters. The MakeBFS functor takes one parameter, a structure that fulfills the requirements of the following signature:

signature BFSPSig =
sig
  structure G1 : IncidenceGraphSig
  structure G2 : VertexListGraphSig
  structure C : ColorMapSig
  structure Vis : BFSVisitorSig
  sharing G1 = G2 = Vis
  sharing type C.key_t = G1.vertex_t
end

The signature dictates that Params must contain four nested structures, each corresponding to an algorithm parameter. BFSPSig enforces concept requirements by constraining its nested structures with signatures. The G1 structure, for example, is constrained by the IncidenceGraphSig signature.

The breadth-first search algorithm ideally requires a graph type argument that models both the Incidence Graph and Vertex List Graph concepts. Because the signatures that represent these two concepts cannot be composed, the implementation requires two arguments, constrained by the signatures IncidenceGraphSig and VertexListGraphSig respectively. When the MakeBFS functor is applied, the same structure is bound to both graph parameters, G1 and G2.

In addition to listing required structures, BFSPSig specifies that some type names in the structures must refer to identical types. These are denoted as sharings. Two sharings appear in the BFSPSig signature. The first is a structure sharing among G1, G2, and Vis. It states that if the three structures share any nested element name in common, then the name must refer to the same entity for all three structures. For example, each of the three structures is required by its signature to contain a nested type vertex_t. The sharing requires that G1.vertex_t, G2.vertex_t, and Vis.vertex_t must refer to the same type. The second sharing, a type sharing, declares that C.key_t and G1.vertex_t must be the same type. Sharings emphasize that in addition to the signature requirements placed on each substructure of Params, certain relationships between structures must also hold.
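For comparison with the C⁺⁺ version of Section 4, the effect of the type sharing could be approximated there with a compile-time same-type check. The sketch below is only an analogue and not part of either library: ToyGraph, ToyColorMap, and bfs_params_ok are hypothetical names, and static_assert with std::is_same assumes C++11 or later.

```cpp
#include <cassert>
#include <type_traits>

// Traits classes in the style of Section 4; only declared here.
template <class G> struct graph_traits;
template <class M> struct map_traits;

// Hypothetical graph and color map types used only for illustration.
struct ToyGraph {};
struct ToyColorMap {};

template <> struct graph_traits<ToyGraph> {
    typedef int vertex;
};

template <> struct map_traits<ToyColorMap> {
    typedef int key;
    typedef char value;
};

// Compile-time analogue of "sharing type C.key_t = G1.vertex_t":
// instantiation fails unless the two associated types are identical.
template <class G, class C>
struct bfs_params_ok {
    static_assert(
        std::is_same<typename map_traits<C>::key,
                     typename graph_traits<G>::vertex>::value,
        "color map key type must equal the graph vertex type");
    static const bool value = true;
};
```

Unlike an ML sharing specification, this check is not part of the algorithm's interface. It fires only when bfs_params_ok is instantiated for a particular pair of types, and the body of a generic algorithm is still not checked separately from its uses.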

ML supports multi-parameter functors, but it does not support sharing specifications among the parameters. As a workaround, functors that implement generic algorithms accept a single structure parameter whose signature lists the algorithm’s arguments and specifies the necessary relationships among them. Since the structure argument to the functor can be defined at the point of application, the single parameter solution is reasonable.

The following code shows a call to breadth_first_search:

structure BFS = MakeBFS(struct
  structure G1 = ALGraph
  structure G2 = ALGraph
  structure C = ALGColorMap
  structure Vis = VisitImpl
end)

BFS.breadth_first_search g src (VisitImpl.create()) (ALGColorMap.create(graph));

First, the algorithm is instantiated by applying MakeBFS to a structure, defined in place, that meets BFSPSig’s requirements. The ALGraph structure is used to match both the IncidenceGraphSig and VertexListGraphSig signatures. Although this is awkward, it avoids the explicit declaration of a VertexListAndIncidenceGraphSig signature, which cannot be constructed by composing the two mentioned signatures. The ALGColorMap structure models the Read/Write Map concept. The VisitImpl structure models the BFS Visitor concept and encapsulates user-defined callbacks. The three structures together meet the sharing requirements of BFSPSig. Application of the MakeBFS functor defines the BFS structure, which encapsulates a breadth_first_search function specialized with the above structures. Finally, BFS.breadth_first_search is called with parameters that match the now concrete type requirements.

5.2 Evaluation of ML

ML language mechanisms provide good support for generic programming. Signatures and structures conveniently express concepts and concept models using nested types and functions to implement associated types and valid expressions. The structure representation of concept models enables modularity by managing identifier visibility. Functors can express any generic algorithm of similar complexity to the described graph library algorithms. Signatures effectively constrain generic algorithms with respect to the concepts upon which the algorithms are parameterized. Sharing specifications enable separate type checking of generic algorithms and their call sites. They capture additional requirements on the concept parameters to an algorithm. All necessary sharing relationships between functor parameters must be declared explicitly. If not, ML will issue type checking errors when the functor is analyzed. When a functor is applied, ML verifies that its arguments also meet the sharing and signature requirements imposed on the functor.

Technically, functors are not the only means for implementing generic algorithms. ML programmers often use polymorphic functions and parameterized data types to achieve genericity. An example of this style of programming follows.

(* concept *)
datatype 'a Comparable = Cmp of ('a -> 'a -> bool);

(* models *)
datatype Apples = Apple of int;
fun better_apple (Apple x) (Apple y) = x > y;
datatype Oranges = Orange of int;
fun better_orange (Orange x) (Orange y) = x > y;

(* algorithm *)
fun pick ((Cmp better):'a Comparable) (x:'a) (y:'a) =
  if (better x y) then x else y;

(* examples *)
pick (Cmp better_apple) (Apple 4) (Apple 3);
pick (Cmp better_orange) (Orange 3) (Orange 4);

This example implements the pick algorithm in terms of the Comparable concept. Here a concept is realized using a parameterized data type that holds a table of functions, or dictionary. The concept’s associated types are the data type’s parameters, and its valid expressions are the dictionary functions. In addition to other values, a generic algorithm takes a dictionary for each concept model it requires. The algorithm is then implemented in terms of the functions from the dictionaries.

This style of generic programming in ML, though possible, is not ideal. In larger ML programs, managing dictionaries manually becomes cumbersome and increases the code base significantly. This situation is analogous to implementing virtual tables in C rather than leveraging the object-oriented programming features of C⁺⁺. In fact, some Haskell environments lower programs that use generics (type classes) to equivalent Haskell programs that use this dictionary-passing style. Automating the mechanisms of generic programming is preferable to implementing them manually.
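The contrast between manual and automated dictionaries can also be sketched in C⁺⁺. The code below is a hypothetical transcription of the apples example, not taken from any of the graph libraries: pick receives an explicit dictionary, as in the ML version, whereas pick_auto lets the compiler find the better function for the argument type.

```cpp
#include <cassert>

// Explicit dictionary: a record holding the operations of the concept.
template <class T>
struct Comparable {
    bool (*better)(const T&, const T&);
};

struct Apple {
    int quality;
};

bool better_apple(const Apple& x, const Apple& y) {
    return x.quality > y.quality;
}

// Dictionary-passing version: the caller supplies the dictionary by hand.
template <class T>
const T& pick(const Comparable<T>& dict, const T& x, const T& y) {
    return dict.better(x, y) ? x : y;
}

// Overload that the compiler can locate from the argument type alone.
bool better(const Apple& x, const Apple& y) {
    return better_apple(x, y);
}

// Language-resolved version: no dictionary parameter is needed.
template <class T>
const T& pick_auto(const T& x, const T& y) {
    return better(x, y) ? x : y;
}
```

Both functions select the apple of quality 4 when given apples of quality 4 and 3, but only pick requires every caller to build and thread a Comparable<Apple> value through the call.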

Using ML functors to implement generic algorithms enables the convenient application of algorithms to a variety of user-defined components. Functors in ML only require their arguments to conform structurally to the specified signatures. Since ML structures can implicitly conform to signatures, a structure need not be designed with a signature in mind. Thus, a generic ML algorithm, written in terms of signatures, can operate on any structure that meets its requirements.

In order to promote modularity, a language may allow program components that model concepts to be statically checked against concepts prior to their use with generic algorithms. When a structure is defined in ML, it may be constrained by a signature. In this manner a structure’s conformity to a signature can be confirmed apart from its use in a generic algorithm. Constraining a structure with a signature limits its interface to that described by the signature. This may not be the desired result if the structure defines members that the signature does not declare. For example, if the ALGraph structure were declared:

structure ALGraph : IncidenceGraphSig = ...

then it would no longer meet the VertexListGraphSig requirements because vertices and num_vertices would not be visible.

Rather than constrain the structure directly, the conformity of ALGraph to the necessary signatures can be checked as shown in the following code outline:

structure ALGraph = struct

...

end

structure ALGraphCheck1 : IncidenceGraphSig = ALGraph;

structure ALGraphCheck2 : VertexListGraphSig = ALGraph;

The structures ALGraphCheck1 and ALGraphCheck2 are both assigned ALGraph and constrained by the IncidenceGraphSig and VertexListGraphSig signatures respectively. Each of these structures confirms statically that ALGraph conforms to the corresponding signature without limiting access to its structure members. As a side effect, this technique introduces the unused ALGraphCheck1 and ALGraphCheck2 structures.

As previously described, the include mechanism for signature combination in ML cannot express concept refinements that involve overlapping concepts. Ramsey [40] discusses this shortcoming and suggests language extensions to address it.

6. GRAPH LIBRARY IN HASKELL

The Haskell community uses the term “generic” to describe a form of generative programming with respect to algebraic datatypes [2, 15, 19]. Thus the typical use of the term “generic” with respect to Haskell is somewhat different from our use of the term. However, Haskell does provide support for generic programming as we have defined it here and that is what we present in this section.


The specification of the graph library in Figure 3 translates naturally into polymorphic functions in Haskell. In Haskell, a function is polymorphic if an otherwise undefined type name appears in the type of a function; such a type is treated as a parameter. Constraints on type parameters are given in the context of the function, i.e., the code between :: and =>. The context contains class assertions. In Haskell, concepts are represented with type classes. Although the keyword Haskell uses is class, type classes are not to be confused with object-oriented classes. In traditional object-oriented terminology, one talks of objects being instances of a class, whereas in Haskell, types are instances of type classes. A class assertion declares which concepts the type parameters must model. In Haskell, the term instance corresponds to our term model. So instead of saying that a type models a concept, one would say a type is an instance of a type class.

6.1 Implementation

As with the previous languages, we focus on the interface of breadth-first search. The Haskell version of this function is shown below. The first line gives the name, and the following three are the context of the function. The function is curried; it has five parameters and the return type is a, a user-defined type for the output data accumulated during the search.

breadth_first_search ::
  (VertexListGraph g v, IncidenceGraph g e v,
   ReadWriteMap c v Color,
   BFSVisitor vis a g e v) =>
  g -> v -> c -> vis -> a -> a

The following are the type classes for the Graph, Incidence Graph, and Vertex List Graph concepts:

class Graph g e v | g -> e, g -> v where
  src :: e -> g -> v
  tgt :: e -> g -> v

class Graph g e v => IncidenceGraph g e v where
  out_edges :: v -> g -> [e]
  out_degree :: v -> g -> Int

class VertexListGraph g v | g -> v where
  vertices :: g -> [v]
  num_vertices :: g -> Int

The use of contexts within type class declarations is the Haskell mechanism for concept refinement. Here we have IncidenceGraph refining the Graph concept.

Associated types are handled in Haskell type classes differently from C⁺⁺ or ML. In Haskell, all the associated types of a concept must be made parameters of the type class. Thus, the graph concepts are parameterized not only on the main graph type, but also on the vertex and edge types. If we had used an iterator abstraction instead of plain lists for the out-edges and vertices, the graph type classes would also be parameterized on iterator types.

In Haskell 98, type classes are restricted to a single parameter, but most Haskell implementations support multiple parameters. The g -> e denotes a functional dependency [20, 22]. That is, for a given graph type g there is a unique edge type. Without functional dependencies it would be difficult to construct a legal type in Haskell for breadth_first_search.

data AdjacencyList = AdjList (Array Int [Int]) deriving (Read, Show)
data Vertex = V Int deriving (Eq, Ord, Read, Show)
data Edge = E Int Int deriving (Eq, Ord, Read, Show)

adj_list :: Int -> [(Int,Int)] -> AdjacencyList
adj_list n elist =
  AdjList (accumArray (++) [] (0, n-1) [ (s, [t]) | (s,t) <- elist ])

instance Graph AdjacencyList Edge Vertex where
  src (E s t) g = V s
  tgt (E s t) g = V t

instance IncidenceGraph AdjacencyList Edge Vertex where
  out_edges (V s) (AdjList adj) = [ E s t | t <- (adj!s) ]
  out_degree (V s) (AdjList adj) = length (adj!s)

instance VertexListGraph AdjacencyList Vertex where
  vertices (AdjList adj) = [V v | v <- (iota n) ]
    where (s,n) = bounds adj
  num_vertices (AdjList adj) = n+1
    where (s,n) = bounds adj

Figure 6: Simple adjacency list implementation.

The BFSVisitor type class, shown below, is parameterized on the graph, queue, and output type a. The queue and output type are needed because Haskell is a pure functional language and any state changes must be passed through explicitly, as is done here, or implicitly using monads. The visitor concept is also parameterized on the vertex and edge types because they are associated types of the graph. The BFSVisitor type class has default implementations that do nothing.

class (Graph g e v) => BFSVisitor vis q a g e v where
  discover_vertex :: vis -> v -> g -> q -> a -> (a,q)
  examine_edge :: vis -> e -> g -> q -> a -> (a,q)
  ...
  discover_vertex vis v g q a = (a,q)
  examine_edge vis e g q a = (a,q)
  ...

The implementation of the AdjacencyList type is shown in Figure 6. The AdjacencyList type must be explicitly declared to be an instance of the Incidence Graph and Vertex List Graph concepts.

The following shows an example use of the breadth_first_search function to create a list of vertices in breadth-first order.

n = 7::Int
g = adj_list n [(0,1),(1,2),(1,3),(3,4),(0,4),(4,5),(3,6)]
s = vertex 0

data TestVis = Vis

instance BFSVisitor TestVis q [Int] AdjacencyList Edge Vertex where
  discover_vertex vis v g q a = ((idx v):a,q)

color = init_map (vertices g) White
res = breadth_first_search g s color Vis ([]::[Int])

Here, the idx function converts a vertex from an AdjacencyList to an integer. At the call site of a polymorphic function, the Haskell implementation checks that the context requirements of the function are satisfied by looking for instance declarations that match the types of the arguments. A compilation error occurs if a match cannot be found.

6.2 Evaluation of Haskell Generics

In general, we found Haskell to provide good support for generic programming. The Haskell type class mechanism, with the extensions for multiple parameters in type classes and functional dependencies, provides a flexible system for expressing complex generic libraries. The type classes and polymorphic functions provide succinct mechanisms for abstraction, and invoking a polymorphic func-