Copyright © IEEE. 7th International Conference on Computer Vision, Kerkyra, Greece, 1999.
Approximate Tree Matching and Shape Similarity
Tyng-Luh Liu
Institute of Information Science, Academia Sinica
Nankang, Taipei 115, Taiwan
Davi Geiger
Courant Institute of Mathematical Science, New York University
New York, NY 10012, USA
Abstract
We present a framework for 2D shape contour (silhouette) comparison that can account for stretchings, occlusions and region information. Topological changes due to the original 3D scenarios and articulations are also addressed. To compare the degree of similarity between any two shapes, we represent each shape contour with a free tree structure derived from a shape axis (SA) model, which we have recently proposed. We then use a tree matching scheme to find the best approximate match and the matching cost. To deal with articulations, stretchings and occlusions, three local tree matching operations, merge, cut, and merge-and-cut, are introduced to yield optimal approximate matches, which can accommodate not only one-to-one but also many-to-many mappings. The optimization process efficiently finds a guaranteed globally optimal match. Experimental results on a variety of shape contours are provided.
1 Introduction
Object shapes can change due to changes in viewing conditions, deformations, articulations and occlusions.
In order to compare two object shapes, i.e., to assign a match/correspondence and to give a measure of similarity, all these issues must be modeled and accounted for.
Methods to compare two shape contours based on evaluating global deformations [6] tend to be sensitive to occlusion and fail to account for local deformations (such as articulations).
A class of methods compares objects by deforming one object into another and evaluating the amount of deformation applied in this process, e.g., [7, 19]. Methods with guaranteed optimality typically use dynamic programming (time-warping) to register two contours. These are all string (contour) matching algorithms, e.g., [2, 5]. Problems with these approaches are that they
1. do not account for region information and for symmetries (see Figure-1).
2. are sensitive to topological changes. Imagine comparing two flowers with a stem on different sides (see Figure-4(b)(d)). Most approaches will consider these as occlusions and pay large penalties/costs to match them, ignoring the fact that the “occluded” parts are similar though at different places.
3. have problems of efficiency, since the size of occlusions can make these methods drastically slow.
[Figure 1 panels: contours Γ2, Γ1, Γ3 and the same contours with their shape axes, Γ2/SA, Γ1/SA, Γ3/SA.]

COST    Γ1        Γ2        Γ3
Γ1      0         11.2607   18.874
Γ2      11.2607   0         17.8634
Γ3      18.874    17.8634   0

Figure 1. Γ2 and Γ3 are derived from Γ1 by different deformations (the same amount of stretching but at different places). While most local string deformation methods fail to distinguish between them, Γ2 is considered more similar to Γ1 by our method.
Our goal is to develop a shape representation of objects that yields similarity measures that can account for local deformations, symmetries, and region information. We follow the view of comparing deformable objects by measuring the amount of energy needed to locally deform one shape into the other. Moreover, we attempt to provide a shape comparison method that considers not only local deformations but also global shape symmetries.
We start with the representation of shapes and consider a shape axis (SA) representation [9] (see Figure-2). The SA of a given shape contour is obtained through a self-similarity variational framework; a unique shape axis tree (SA-tree), where every pair of consecutive nodes (edge) corresponds to an object substructure, can be constructed to encode the contour data and its SA. Since the SA framework is variational, we also obtain a measure of how effective the SA-tree representation is for a given shape, namely the value of the minimal cost.
Each shape contour is represented by an SA-tree so that the similarity between shapes can be evaluated via a tree matching scheme. The cost of matching edges is the cost of comparing two object parts (local deformations and region information can be considered). A key issue is to structure the set of possible correspondences and to perform an efficient search for the best correspondence. We also require a cost function to determine how local differences between the shape contours should affect their perceived similarity.
In this paper we describe a tree matching scheme that uses the neighboring topological structure among nodes and is much more complicated than simple string matching algorithms. Two SA-trees are not required to have the same number of nodes; thus we seek the best approximate tree match between the two trees [14]. Pruning and merging vertices can be applied in the process of matching. Such a method has to address occlusions, that is, some subtrees of an SA-tree may not be matched. It has to account for region information and stretching, i.e., the comparison/matching cannot be between tree structures only, but has to consider the region and contour segments associated with a particular edge of an SA-tree. It has to deal with articulations, e.g., the cost for the mismatch of angles (each angle is measured by a pair of consecutive edges) should increase sub-linearly.
As we will show, the tree-matching shape comparison algorithm is very efficient and it can be applied to, e.g., animation and on-line image (shape contour) database retrieval.
1.1 Previous Work
Siddiqi et al. [17] have proposed a shape matching method based on a shock graph grammar where, to match two nodes in the shock trees, an affine transformation is used to align two interpolated geometric curves. The approach is interesting but cannot account for articulations that occur with respect to each node’s geometric structure.
More recently, they have presented a new framework based on finding maximal cliques of the association graphs to match two trees [12]. The matching scheme works well in matching hierarchical structures. However, it is limited to finding only one-to-one correspondences, which may not be suitable for flexible objects with articulations where an object part may correspond to more than one node.
In Zhu and Yuille’s work [20], a FORMS system is proposed to recognize and represent flexible objects from their silhouettes. The silhouettes are derived from skeleton extraction and part segmentation, using a deformable circle method. To compare two objects, say hands, they first compute each object’s skeleton and then match each skeleton to a model of a hand whose skeleton is well-defined. In this way, the skeleton of each object can be refined. A pair of parts in the two objects are matched to each other if they correspond to the same part in the model. This implies that the shape comparison between two objects is not done directly but via referencing an additional model.
Our shape similarity method differs from theirs in allowing many-to-many correspondences, so that an edge in an SA-tree can be matched to a path consisting of more than one edge in the other SA-tree (note that, for simplicity, we only consider paths consisting of two consecutive edges). Thus the mappings between nodes of two SA-trees are not required to be one-to-one, due to the merge, cut and merge-and-cut operations. Also, to compute the cost of an edge-to-edge or edge-to-path matching, we have used a local shape comparison model [2] that can account for articulations. This is important because it allows us to easily extend our approach to segmentation for real images by combining this model with an active contour tracker. Unlike FORMS, our system requires no model when comparing two shapes to derive the correspondences.
2 Shape Representation
We adopt the shape representation framework developed in [9], leading to a unique SA-tree. The advantages over other related representations [3, 10, 1, 4, 11, 13, 15, 16] are that (i) we are not seeking a symmetry axis representation but rather a set of correspondences along the shape contour structured in a tree graph, and (ii) we use a variational approach to establish a measurement of how good the representation of a contour shape is (see Figure-1).
2.1 Shape Axis Tree
Figure 2. Shape axis model: (a)-(c) shape contours, (d)-(f) their shape axes, and (g)-(i) their SA-trees. In an SA-tree, the non-leaf vertices correspond to bifurcations in the shape axes.

Given a (shape) contour (e.g., Figure-2 (a)(b)(c)), we can represent it as a parameterized curve Γ = Γ(s) = {x(s), s ∈ [0, 1)}, where x(s) are the coordinates of the contour points. To find the SA of contour Γ(s), we match Γ(s) to its own mirror version Γ̃ = Γ̃(t) = {x̃(t) = x(1 − t), t ∈ (0, 1]} using the cost functional established in [9]. Given a correspondence t(s) between Γ(s) and Γ̃(t), the SA of contour Γ is defined as the set of middle points between x(s) and x̃(t), i.e.,
$$x_{SA}(s) = \frac{x(s) + \tilde{x}(t(s))}{2} = \frac{x(s) + x(1 - t(s))}{2} .$$
Also following [9], we can construct a unique SA-tree by grouping the discontinuities in the correspondence t(s). In Figure-2 (d)(e)(f), the dashed lines are the optimal correspondences t(s) and the shape axes are formed by connecting the middle points. The corresponding SA-trees are shown in Figure-2 (g)(h)(i).
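As a concrete reading of the midpoint definition, the following sketch samples a closed contour at n uniformly spaced parameter values and evaluates the SA points for a given correspondence. The function name `shape_axis_points`, the uniform sampling and the nearest-sample lookup of x(1 − t(s)) are assumptions made for illustration only; the optimal t(s) comes from the variational framework of [9] and is not computed here.

```python
import math

def shape_axis_points(contour, t_of_s):
    """Midpoints between x(s) and x(1 - t(s)) on a uniformly sampled closed contour.

    contour : list of (x, y) points, with contour[i] = x(i / n)
    t_of_s  : list of n values, with t_of_s[i] = t(i / n)
    """
    n = len(contour)
    axis = []
    for i, t in enumerate(t_of_s):
        # Nearest sample to parameter 1 - t, wrapping around the closed contour.
        j = round((1.0 - t) * n) % n
        (x1, y1), (x2, y2) = contour[i], contour[j]
        axis.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
    return axis

# Toy example (hypothetical data): a unit circle with the trivial
# correspondence t(s) = s, which pairs each point with its mirror image
# about the x-axis.
n = 8
circle = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
          for i in range(n)]
axis = shape_axis_points(circle, [i / n for i in range(n)])
```

For this symmetric toy contour every midpoint falls on the horizontal diameter, which is the expected axis of symmetry.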
An SA-tree is a free tree (a connected, acyclic and undirected graph) and there are two types of vertices in an SA-tree T. The first type of vertices contains all the leaves and the second includes those corresponding to bifurcations (non-leaf vertices). Note that each edge of an SA-tree corresponds to a pair of shape contour segments (see Figure-2).
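The free-tree structure can be illustrated with a plain adjacency map. This is a minimal sketch, not the construction of [9]: the vertex names, the `segments` table and both helper functions are hypothetical, and the tree below is invented for the example.

```python
from collections import deque

def is_free_tree(adj):
    """True if the undirected graph adj (vertex -> set of neighbours)
    is connected and acyclic."""
    if not adj:
        return False
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    if n_edges != len(adj) - 1:
        return False
    # With |E| = |V| - 1, connectivity implies acyclicity.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

def split_vertices(adj):
    """Return (leaves, non-leaf vertices) of the tree, each sorted by name."""
    leaves = sorted(v for v, nbrs in adj.items() if len(nbrs) == 1)
    inner = sorted(v for v, nbrs in adj.items() if len(nbrs) > 1)
    return leaves, inner

# Hypothetical SA-tree with two bifurcation vertices, c and d.
adj = {
    "a": {"c"}, "b": {"c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d"}, "f": {"d"},
}
# Each edge carries a pair of contour segments (placeholder labels here).
segments = {frozenset(("c", "d")): ("segment_1", "segment_2")}

leaves, bifurcations = split_vertices(adj)
```

Keying `segments` by an unordered pair reflects that the tree is undirected: the same contour segments are attached to an edge whichever way it is traversed.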
3 Shape Similarity and Tree Matching
Our approach makes use of both the global and local information of shapes. The global symmetries are captured by the SA-tree representation itself. The degree of deformation of one shape into the other is then modeled by the cost of approximately matching one SA-tree to the other. An approximate matching is necessary since viewing position, occlusion and stretching may yield different SA-trees for the same object shape.
By formulating the shape similarity problem as an approximate tree matching problem, we need to investigate the following issues.
• Unlike regular approximate tree pattern matching [14], we want to find not only the node-to-node but also the edge-to-edge/edge-to-path correspondences. In our case, a node of an SA-tree conveys the topological structure of the shape and an edge encodes the corresponding shape information.
• When modeling occlusions, we need to consider the possible deletions or merges of subtree structures and to estimate the penalties (or costs) for them.
• When modeling articulations and stretchings, we should have a local shape comparison model to evaluate the cost of edge-to-edge/edge-to-path matching so that it can account for articulations. By an edge-to-path matching, we mean a stretching matching over the tree structures. Note that while the comparison is fully structured by the SA-trees, the cost of comparing edges is based on the actual shapes associated with the edges. This cost includes shape bending and stretching and possibly other region deformations.
The optimal approximate matching between two SA-trees can be found efficiently with an A* algorithm. Our focus is to illustrate that with only local tree matching operations (to be defined later), the shape comparison method can account for topological changes, articulations, deformations and occlusions.
We first describe how to find the best approximate matching and then concentrate on how to formulate the local tree matching operations.
3.1 SA-Tree Matching Algorithm
Conceptually, we can construct a solution tree where each node represents an edge-to-edge or edge-to-path local matching of two SA-trees, and every path from the root to a leaf corresponds to a sequence of local matchings and a possible solution/match. Our goal is to find the best match, and this can be achieved by using an A*-like algorithm to locate the optimal path in the solution tree, i.e., the best approximate match. Though there are many different ways to solve approximate tree matching problems in polynomial time [14], the A* approach has the advantage of being easily extended to real image applications.
To initialize the optimization process, we begin by setting up a “virtual root” of a solution tree with cost 0. Then, we generate all the root’s children, i.e., the level-1 nodes, by considering all possible edge-to-edge or edge-to-path local matchings, each of which must contain a leaf. We add all the level-1 nodes into a priority queue Q together with their respective (local) matching costs. To grow the solution tree, the current item in Q with the minimal key (cost) is located; this min-item corresponds to some node in the solution tree. We then extend all of its possible child nodes and again add them into Q, taking into account the existing local matches from the root to this current min-node as well as the properties of local matching. The optimization process stops when we first reach a leaf in the solution tree, and the optimal path can be recovered by tracing back to the root.
Note that although an edge-to-edge or edge-to-path local matching may appear in many different paths in a solution tree, its shape comparison cost is only computed once, that is, we save the local shape comparison costs in a look-up table. This guarantees that our method can efficiently locate the best approximate matching.
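The search loop described above can be sketched as a best-first search over partial matchings with a priority queue and a look-up table of local costs. This is a simplified illustration under stated assumptions: it uses no heuristic term (so it behaves like uniform-cost search, i.e., A* with a zero heuristic), the function names are hypothetical, and the toy problem pairs items of two short lists rather than edges of SA-trees.

```python
import heapq
from itertools import count

def best_first_match(initial, expand, is_complete, local_cost):
    """Return (cost, path) of the cheapest complete sequence of local matches.

    initial     : iterable of level-1 local matches
    expand      : path -> iterable of local matches that may extend the path
    is_complete : path -> True when the path is a full solution (a leaf)
    local_cost  : local match -> non-negative cost
    """
    cache = {}  # look-up table: each local cost is computed only once

    def cached_cost(match):
        if match not in cache:
            cache[match] = local_cost(match)
        return cache[match]

    tie = count()  # tie-breaker so the heap never compares paths directly
    queue = []
    for match in initial:
        heapq.heappush(queue, (cached_cost(match), next(tie), (match,)))
    while queue:
        cost, _, path = heapq.heappop(queue)
        if is_complete(path):
            return cost, path  # first complete path popped is optimal
        for match in expand(path):
            heapq.heappush(
                queue, (cost + cached_cost(match), next(tie), path + (match,)))
    return None

# Toy problem (hypothetical): pair every item of a with a distinct item of b.
a = [1, 5]
b = [6, 1]

def expand(path):
    i = len(path)
    used = {j for _, j in path}
    if i >= len(a):
        return []
    return [(i, j) for j in range(len(b)) if j not in used]

result = best_first_match(
    initial=[(0, j) for j in range(len(b))],
    expand=expand,
    is_complete=lambda path: len(path) == len(a),
    local_cost=lambda m: abs(a[m[0]] - b[m[1]]),
)
```

Because all local costs are non-negative, the first complete path removed from the queue has minimal total cost. An admissible heuristic could be added to the queue key to prune more of the solution tree.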
3.2 Tree Matching Operations
We now explain how shape information is encoded into an SA-tree structure and how local tree matching operations are applied to deal with occlusions and stretchings. To illustrate, we use the shape contours and SA-trees in Figure-3.
We denote the edge connecting vertices u2 and u5 as e(u2, u5) and the corresponding contour segments as CT(u2, u5) (see Figure-3 (b)) where, in this example, CT(u2, u5) = [$\Gamma_1^{BC}$, $\Gamma_1^{CD}$]. Note that the order of contour segments does matter and is always arranged in a counter-clockwise manner. This implies CT(u2, u5) = CT(u5, u2) = [$\Gamma_1^{BC}$, $\Gamma_1^{CD}$].
Next we describe some useful rules/models that we have adopted for matching two contours via tree structures.
1. To compare the similarity between two contour segments, we use the model established in [2] to compute the cost, i.e., given two contour segments, say Γs (parameterized by s) and Γt (parameterized by t), the cost of shape similarity comparison is

$$\mathrm{cost}_S(\Gamma_s, \Gamma_t) = \min_{t(s)} \mathrm{cost}_S(\Gamma_s, \Gamma_t, t(s)) = \min_{t(s)} \int_{\Gamma_s} \left( \frac{|k_t t' - k_s|^2}{|k_t t'| + |k_s|} + \lambda \frac{|t' - 1|^2}{t' + 1} \right) ds , \qquad (1)$$

where t' = dt/ds and ks, kt are the curvatures at Γs(s), Γt(t), respectively. The first term in the integral of (1) is the bending cost and the second is the stretching cost. λ weights the relative contributions of stretching and bending. A correspondence t(s) is considered optimal if it minimizes (1). It is known that the above model can account for articulations.
2. Given two SA-trees, say T1 and T2, we say that e(ui, uj) ∈ T1 is matched to e(vk, vl) ∈ T2 if node ui is mapped to node vk and node uj is mapped to node vl. The cost is denoted as cost(e(ui, uj), e(vk, vl)) and is computed from the cost of comparing CT(ui, uj) with CT(vk, vl), i.e.,

$$\mathrm{cost}(e(u_i, u_j), e(v_k, v_l)) = \mathrm{cost}_S(CT(u_i, u_j), CT(v_k, v_l)) .$$
For example, in Figure-3 (a)(b)(c)(d), the cost of matching e(u5, u2) ∈ T1 to e(v4, v1) ∈ T2 is

$$\mathrm{cost}_S(CT(u_5, u_2), CT(v_4, v_1)) = \mathrm{cost}_S(\Gamma_1^{BC}, \Gamma_2^{AB}) + \mathrm{cost}_S(\Gamma_1^{CD}, \Gamma_2^{BC}) .$$

3. We require that a leaf in T1 can only be matched to a leaf in T2, and vice versa. This gives rise to the topological similarity (this condition can be relaxed by allowing the “stretching” matchings described next).
4. Recall that an SA-tree is a free tree. When comparing two SA-trees, they become rooted trees with respect to each solution path. For instance, in Figure-3, u5 and v4 become the roots of T1 and T2, respectively, along a solution path starting with an initial local match between e(u5, u2) and e(v4, v1).
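Rule 2 above amounts to summing a segment-to-segment cost over the contour segments of the two edges, taken in the same counter-clockwise order. The sketch below shows that bookkeeping only. The segment cost used here, the absolute difference of polyline lengths, is a stand-in chosen to keep the example short; it is not the bending and stretching model of [2]. All coordinates and names are hypothetical.

```python
import math

def polyline_length(points):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def length_difference(seg_a, seg_b):
    """Stand-in segment cost (NOT the model of [2]): difference in length."""
    return abs(polyline_length(seg_a) - polyline_length(seg_b))

def edge_match_cost(ct_u, ct_v, cost_s):
    """Cost of matching two edges: the sum of cost_s over their contour
    segments, paired in the same (counter-clockwise) order."""
    if len(ct_u) != len(ct_v):
        raise ValueError("edges must carry the same number of contour segments")
    return sum(cost_s(gu, gv) for gu, gv in zip(ct_u, ct_v))

# Hypothetical coordinates for the two segments of each edge.
ct_u5_u2 = [[(0, 0), (3, 4)], [(3, 4), (3, 10)]]          # lengths 5 and 6
ct_v4_v1 = [[(0, 0), (0, 4)], [(0, 4), (6, 4), (6, 6)]]   # lengths 4 and 8

total = edge_match_cost(ct_u5_u2, ct_v4_v1, length_difference)
```

Passing the segment cost as a parameter keeps the tree-level bookkeeping separate from the local shape model, so a curvature-based cost can be substituted without touching `edge_match_cost`.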
The above rules are not sufficient for our application. We need to incorporate tree matching operations that can cut or merge the substructures of a tree to derive a good match and account for deformations and occlusions. Thus we introduce local “stretching matchings” allowing an edge to be matched to a path of length 2. The notation p(u, v, w) is used to denote a path of two edges, namely e(u, v) followed by e(v, w). There are three types of stretching matching introduced in this work: merge, cut and merge-and-cut. Altogether, they address the issues of stretchings and occlusions directly.
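The candidate paths p(u, v, w) for a stretching matching are simply the pairs of distinct edges that share a middle vertex. A minimal sketch of that enumeration follows; the function name and the small tree are hypothetical, and each path is listed in both directions.

```python
def two_edge_paths(adj):
    """All paths p(u, v, w) made of e(u, v) followed by e(v, w), u != w,
    in an undirected tree given as vertex -> set of neighbours."""
    paths = []
    for v in sorted(adj):
        nbrs = sorted(adj[v])
        for u in nbrs:
            for w in nbrs:
                if u != w:
                    paths.append((u, v, w))
    return paths

# Hypothetical star-shaped fragment: u2 is adjacent to u1, u3 and u4.
adj = {
    "u1": {"u2"},
    "u2": {"u1", "u3", "u4"},
    "u3": {"u2"},
    "u4": {"u2"},
}
paths = two_edge_paths(adj)
```

A vertex of degree d contributes d(d − 1) directed paths, so restricting stretching matchings to paths of length 2 keeps the number of candidates small.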
Figure 3. (a)-(f) Human shape contours and their SA-trees: (a) Γ1, (b) T1, (c) Γ2, (d) T2, (e) Γ3, (f) T3. (g) An example of a stretching match via a merge operation (M-operation), where p(u3, u2, u1) is matched to e(v2, v1). (h) An example of a stretching match via a cut operation (C-operation), overlapped with its shape contour. (i) An example of a stretching match via a merge-and-cut operation (MC-operation).

Merge operation: The structure of SA-trees from the same class of objects could be different due to movements and stretchings (see Figure-3 (b)(d)). To model this scenario, we design a “merge” operation (M-operation for short) in which an edge, say e(v2, v1), can be matched to, say, p(u3, u2, u1) through a merge between nodes u2 and u1 (Figure-3 (g)). The newly merged node is denoted as [u2u1] to indicate that node u1 is merged with u2 and all child nodes of u1 become children of u2. For each merge, a penalty cost, denoted as costM, needs to be paid. It is proportional to the product of the total length of the contour segments being merged and a positive real-valued function of the difference of the neighboring configurations between the merged node, [u2u1], and its matched node, v1. More specifically, the total cost of this match via an M-operation is

$$\mathrm{cost}(p(u_3, [u_2 u_1]), e(v_2, v_1)) = \mathrm{cost}_S(CT(u_3, u_2), CT(v_2, v_1)) + \mathrm{cost}_M([u_2 u_1], v_1) ,$$

where

$$\mathrm{cost}_M([u_2 u_1], v_1) = \alpha \times |CT(u_2, u_1)| \times \left( 1 + \frac{|\deg([u_2 u_1]) - \deg(v_1)|}{\max(\deg([u_2 u_1]), \deg(v_1))} \right) .$$

In most cases, the parameter α is set to 0.2. The notation deg(u) is the number of adjacent nodes of u and |CT(u2, u1)| is the length of the contour segments of CT(u2, u1). In Figure-3 (g), we have deg([u2u1]) = deg(v1) = 4. The penalty costM is defined so that, after a merge, the less similar the topological configurations are, the more expensive costM is.
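The merge penalty can be written as a short function. This is a sketch of one reading of costM, in which the penalty scales with the merged contour length and with the relative degree mismatch between the merged node and its matched node. The function and argument names are hypothetical and the numbers in the example are made up.

```python
def merge_penalty(merged_length, deg_merged, deg_matched, alpha=0.2):
    """Penalty for a merge: alpha * length * (1 + relative degree mismatch).

    merged_length : total length of the contour segments being merged
    deg_merged    : number of neighbours of the merged node
    deg_matched   : number of neighbours of the node it is matched to
    """
    mismatch = abs(deg_merged - deg_matched) / max(deg_merged, deg_matched)
    return alpha * merged_length * (1.0 + mismatch)

# Equal degrees: only the length term contributes.
same_topology = merge_penalty(52.0, 4, 4)
# Degrees 4 and 3: the penalty grows by a factor of 1 + 1/4.
different_topology = merge_penalty(52.0, 4, 3)
```

With equal degrees the factor in parentheses is 1, so the penalty reduces to α times the merged length; any degree mismatch raises it by at most a factor of 2.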
Cut operation: A “cut” operation (C-operation) is applied to a stretching match to remove extra subtree structures. This is especially useful in dealing with occlusions, where some part structures of an object are missing due to changes of viewing direction.
In Figure-3 (h), p(u3, u2, u1) is matched to e(v2, v1) via a C-operation. We use the notation $u_3 \hat{u}_2 u_1$ to indicate that, except for e(u3, u2) and e(u2, u1), all edges (including their subtrees) connected to u2 are cut from the SA-tree. Similar to the merge case, we need to estimate the penalty cost, costC, for a C-operation:

$$\mathrm{cost}(p(u_3, \hat{u}_2, u_1), e(v_2, v_1)) = \mathrm{cost}_S(CT(u_3, u_2) \cup CT(u_2, u_1), CT(v_2, v_1)) + \mathrm{cost}_C(u_3 \hat{u}_2 u_1) ,$$

where

$$\mathrm{cost}_S(CT(u_3, u_2) \cup CT(u_2, u_1), CT(v_2, v_1)) = \mathrm{cost}_S(\Gamma_1^{AB} \cup \Gamma_1^{BC}, \Gamma_2^{AB}) + \mathrm{cost}_S(\Gamma_1^{CD} \cup \Gamma_1^{FG}, \Gamma_2^{BC})$$

and

$$\mathrm{cost}_C(u_3 \hat{u}_2 u_1) = \beta \times \left( |CT(u_2, u_4)| + \mathrm{Gap}(e(u_3, u_2), e(u_2, u_1)) \right) .$$

Gap(e(u3, u2), e(u2, u1)) is the total length of the gaps between CT(u3, u2) and CT(u2, u1); it is equal to the distance between points D and F in contour Γ1 of Figure-3 (h). Again, β is a parameter to be adjusted and we have used β = 0.2. Note that a C-operation not only deletes some subtrees from a contour but also creates a gap (see Figure-4 (g) and Figure-5 (e)). Thus, both factors should be considered in formulating a reasonable costC.
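The cut penalty combines the length of the removed contour with the gap that the cut leaves behind. The sketch below assumes the gap is the straight-line (Euclidean) distance between the two gap endpoints, which is one reading of “the distance between points D and F”; the function name, argument names and numbers are hypothetical.

```python
import math

def cut_penalty(cut_length, gap_start, gap_end, beta=0.2):
    """Penalty for a cut: beta * (length of removed contour + gap length).

    cut_length : total length of the contour segments that are cut away
    gap_start  : (x, y) of the point where the gap opens (D in the text)
    gap_end    : (x, y) of the point where the gap closes (F in the text)
    """
    gap = math.dist(gap_start, gap_end)  # assumed Euclidean gap length
    return beta * (cut_length + gap)

# Made-up numbers: 100 units of contour removed, gap of length 5.
penalty = cut_penalty(100.0, (0.0, 0.0), (3.0, 4.0))
```

Including the gap term means that removing a thin branch attached over a wide base costs more than removing one of the same length attached at a narrow neck.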
Merge-and-Cut operation: A merge-and-cut (MC-operation) is a combination of an M-operation and a C-operation, as shown in Figure-3 (i). Therefore, the cost of a stretching match with an MC-operation is

$$\mathrm{cost}(p(u_3, [\hat{u}_2 u_1]), e(v_2, v_1)) = \mathrm{cost}_S(CT(u_3, u_2), CT(v_2, v_1)) + \mathrm{cost}_{MC}(u_3 [\hat{u}_2 u_1]) ,$$

where

$$\mathrm{cost}_{MC}(u_3 [\hat{u}_2 u_1]) = \mathrm{cost}_M([u_2 u_1], v_1) + \mathrm{cost}_C(u_3 \hat{u}_2 u_1) .$$
4 Examples and Discussion
Some of the experimental results are shown in Figure-4 and Figure-5. In each example, we show the best approximate match between the two contours and the comparison cost. The quadruple reported with each cost includes the sizes of the contours associated with the two SA-trees and the sizes of the contour segments omitted from the best match due to cuts or merges. It takes less than one minute to complete a shape comparison task on a Pentium-II PC, if the SA structures are given.
We have developed a tree matching framework that combines local and global approaches for shape comparison based on a shape axis model. The issues of occlusions and articulations are handled by formulating the comparison task as an approximate tree matching problem. We use an A* algorithm to find the comparison cost and the best matching between a pair of contours. Our method can be extended to real images because (1) the region information can be used in modeling the tree matching operations, and (2) the A* scheme is easy to combine with a tree structure-wise grouping process.
Acknowledgments
T-L. Liu is supported in part by the Institute of Information Science, Academia Sinica of Taiwan. D. Geiger is supported in part by an NSF career award grant.
References
[1] H. Asada and M. Brady. The Curvature Primal Sketch. IEEE PAMI, Vol. 5, pp. 2–14, 1983.
[2] R. Basri, L. Costa, D. Geiger, and D. Jacobs. Determine Shape Similarity. IEEE workshop in Physics Based Vision, Boston, June 1995.
[3] H. Blum. Biological Shape and Visual Science. J. of Theoretical Biology, 38:205-287, 1973.
[4] C.A. Burbeck and S.M. Pizer. Object Representation by Cores: Identifying and Representing Primitive Spatial Regions. Vision Research, Vol. 35, pp. 1917-1930, 1995.
[5] Y. Gdalyahu and D. Weinshall. Measures for Silhouettes Resemblance and Representative Silhouettes of Curved Objects. 4th ECCV, Cambridge, UK, April 1996.
[6] D. Huttenlocher, G. Klanderman, and W. Rucklidge. Comparing Images Using the Hausdorff Distance. IEEE PAMI, 15(9):850-863, 1993.
[7] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active Contour Models. Int. J. Comput. Vision, 1(4):321-331, 1988.
[8] K. Kupeev and H. Wolfson. On Shape Similarity. Proceedings Int. Conf. on Pattern Recognition, pp. 227-237, 1994.
[9] T-L. Liu, D. Geiger and R. V. Kohn. Representation and Self- Similarity of Shapes. ICCV, pp. 1129-1135, Bombay, India, 1998.
[10] R. Nevatia and T. O. Binford. Description and Recognition of Curved Objects. Artificial Intelligence, Vol. 8, pp. 77–98, 1977.
[11] R. Ogniewicz. Discrete Voronoi Skeletons. Hartung-Gorre, 1993.
[12] M. Pelillo, K. Siddiqi and S. W. Zucker. Matching Hierar- chical Structures Using Association Graphs. ECCV, Freiburg, Germany, 1998.
[13] W. Richards and D. D. Hoffman. Codon Constraints on Closed 2D Shapes. CVGIP, 31(2):156-177, 1985.
[14] D. Shasha, J. Wang and K. Zhang. Exact and Approximate Algorithm for Unordered Tree Matching. IEEE Trans. Systems, Man, and Cybernetics, 24(4), pp. 668-678, 1994.
[15] K. Siddiqi and B. B. Kimia. Parts of Visual Form: Computational Aspects. IEEE PAMI, Vol. 17, No. 3, pp. 239-251, March 1995.
[16] K. Siddiqi and B. B. Kimia. A Shock Grammar for Recognition. CVPR, pp. 507-513, S. Francisco, 1996.
[17] K. Siddiqi, A. Shokoufandeh, S. Dickinson and S. Zucker. Shock Graphs and Shape Matching. ICCV, Bombay, India, 1998.
[18] D. Terzopoulos, A. Witkin, and M. Kass. Symmetry-seeking models and 3D object recovery. Int. J. Comput. Vision, 1, pp. 211-221, 1987.
[19] S. Ullman. Aligning Pictorial Descriptions: An Approach to Object Recognition. Cognition, 32(3):193-254, 1989.
[20] S. C. Zhu and A. L. Yuille. FORMS: a Flexible Object Recognition and Modeling System. ICCV, Boston, 1995.
(a) F1 (412)   (b) F2 (601)   (c) F3 (601)   (d) F4 (613)
(e) Cost: 25.2766 (601, 613, 0, 0) (exact matching)
(f) Cost: 43.3587 (601, 601, 0, 52) (merge)
(g) Cost: 63.561 (422, 601, 0, 277) (cut)
Figure 4. (a), (b), (c) and (d) are examples of flower-shape contours and the numbers in the parentheses are their sizes. The degree of similarity is measured by the cost of the best match, while the quadruple includes the sizes of the contours and the sizes of the contour segments omitted from the best match due to cuts or merges. Example (e) demonstrates that our method is not a sequential one. The optimal match in (f) is derived with an M-operation between the two branches of F2. The size of the contour segments omitted due to the merge is 52. In (g), a C-operation that removes the two branches (total size 277) is required to obtain a good matching.
(a) Cost: 70.6646 (668, 714, 0, 0) (exact matching)
(b) Cost: 75.7528 (738, 711, 0, 0) (exact matching)
(c) Cost: 78.5166 (714, 738, 0, 0) (exact matching)
(d) Cost: 112.913 (501, 738, 0, 211) (cut)
(e) Cost: 109.784 (739, 477, 299, 0) (cut)
Figure 5. Results (a), (b) and (c) are examples showing that the SA-tree shape comparison method can account for articulations and global shape information. Handling occlusions is also straightforward, as shown in (d) and (e) with a C-operation.