Attributed hypergraph matching on a Riemannian manifold

(1)

DOI 10.1007/s00138-014-0598-1 O R I G I NA L PA P E R

Attributed hypergraph matching on a Riemannian manifold

J. M. Wang · S. W. Chen · C. S. Fuh

Received: 19 September 2012 / Revised: 19 July 2013 / Accepted: 23 January 2014 / Published online: 20 March 2014

Abstract If we consider a matching that preserves high- order relationships among points in the same set, we can int- roduce a hypergraph-matching technique to search for correspondence according to high-order feature values. While graph matching has been widely studied, there is limited research available regarding hypergraph matching. In this paper, we formulate hypergraph matching in terms of tensors. Then, we reduce the hypergraph matching to a bipartite matching problem that can be solved in polynomial time. We then extend this hypergraph matching to attributed hypergraph matching using a combination of different attributes with different orders. We perform analyses that demonstrate that this method is robust when handling noisy or missing data and can achieve inexact graph matching. To the best of our knowledge, while attributed graph-matching and hypergraph-matching have been heavily researched, methods for attributed hypergraph matching have not been proposed before.

Keywords Graph matching· Hilbert space · Riemannian manifold· Inexact graph matching · Attributed hypergraph matching

J. M. Wang· C. S. Fuh

Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan, ROC

S. W. Chen (

B

⁾

Department of Computer Science and Information Engineering, National Taiwan Normal University, No. 88, Sec. 4, Tingzhou Rd., Wenshan District, Taipei 116, Taiwan, ROC

e-mail: swchen2012@gmail.com

1 Introduction

Matching two sets of feature points is a key element in many computer vision tasks, such as feature tracking, image align- ment and stitching, and stereo fusion. This problem is closely related to weighted bipartite graphs. One may consider the two sets of feature points as two sets of nodes, and asso- ciate a weight value (called similarity) to an edge between two nodes belonging to different sets. Finding a maximum weighted bipartite matching will give a matching with the maximal sum of the edge values. This problem has been reduced to the maximal flow problem [1] and can be solved in polynomial time. Many additional algorithms have been proposed to solve these problems [2], such as the Bellman–

Ford algorithm, Dijkstra’s algorithm, and the Fibonacci heap.

Bipartite matching is perhaps the simplest matching problem, because it only concerns information of the node (feature point) itself. Graph-matching techniques are developed with the intent to match the feature points in different sets and to preserve binary relationships between points in the same set. Here, we regard the feature points as nodes and the relationships of point pairs as edges, and then construct a graph for a set of feature points. Graph matching is performed to establish the mapping between the nodes of two graphs while preserving their edge, i.e., if an edge is mapped to another edge in the other graph, those nodes will be linked by the corresponding edges.

Graph-matching methods can be roughly divided into exact graph matching and inexact graph matching [3,4]. In exact graph matching, two graphs may have some identical parts. The challenge is then to find an isomorphism such that we can map a node and an edge from one graph to another in terms of their associated values. According to the size of the identical part, we may have a graph isomorphism, a subgraphisomorphism, or a maximum common subgraph

(2)

isomorphism. The graph isomorphism problem is an unsolved problem in relation to whether it is NP or NP-complete, and we believe that it cannot be solved in polynomial time. Tree- based search algorithms [5] are the standard processing methods for isomorphism testing, and most of them require exponential time in the worst case. Some constrained graphs, such as trees [6], permutation graphs, and planar graphs [7], have been invoked to develop polynomial time algorithms [8] for the graph isomorphism problem.

Unlike graph isomorphism, subgraph isomorphism has been reduced to maximum common subgraph isomorphism, which has been proved to be an NP-complete problem. Both are believed to be impossible to solve in polynomial time.

In practice, the maximum common subgraph isomorphism problem is more common as the feature points extracted from an image may be different from those points extracted from another image. Tree-based search algorithms [9] can be applied to solve the subgraph isomorphism problem in exponential time. To solve this problem in polynomial time, some approximation or suboptimal algorithms [10] have been proposed for constrained graphs (e.g., sparse graphs), which have then been solved iteratively in bounded polynomial time.

In computer vision, however, feature values often have spatial or temporal deviations, which cause the matching to be inexact matching rather than an exact graph matching.

Inexact matching algorithms are designed with a tolerance for errors and noise, therefore they can be applied to determine the identical parts of two graphs in a more practical application than is possible with exact graph matching. It is for this reason that inexact graph matching is also called error-tolerant graph matching [11]. It can be regarded as the maximum common graph matching in some form and has been proved to be an NP-complete problem [12].

In recent years, various algorithms designed for inexact graph matching have been proposed. For example, tree-based search algorithms [13] are guaranteed to find the optimal solution in exponential time. To allow a solution to be found in polynomial time, we may choose some constraints for the graphs or accept a suboptimal solution. However, various applications have led to differing algorithms. Artificial neural networks [14], relaxation labeling [15], genetic search [16], the spectral method [17], and the kernel method [18] are key classes of inexact graph matching algorithms. One may refer to the papers [3] and [4] to find more information about inexact graph matching algorithms.

Traditional graph matching only considers about nodes and edges, which are related to unary features and binary relationships, respectively. In computer vision, the unary feature could be color or position value related to the feature point itself, while a binary relationship could be distance or orientation value computed from one pair of feature points. Fea- ture points are extracted from one image frame and matched

to those extracted in the successive frame. The feature values may or may not be deviated spatially or temporally. For example, the position feature is illumination-invariant, but the color feature is not. Moreover, the orientation feature is invariant to scale but the distance feature is not. If we can develop an algorithm based on some kind of invariant feature, our proposed algorithm will be robust to that kind of deformation.

Since we can design features with many kinds of invariance from unary and binary relationships, some applications still require high-order relationships [19,20]. Relating the search for correspondence using high-order relationships to hypergraph matching problem [21] is very straightforward. Searching for correspondence using high-order relationships is equivalent to hypergraph matching [21] or high- order graph matching [22]. While an adjacency matrix repre- sents a graph, a tensor can be used to represent hypergraph.

Extending the formulation from the adjacency matrix to a tensor-based formulation is very straightforward. However, solving the tensor formulation of hypergraph matching is not so simple [22].

Except for increasing the order of relationships, we can extend graph matching to attributed graph matching [23,24], where each node and edge is associated with a set of attribute values in an attributed graph. When exploring the correspondence not only the structure, but also the attribute value should be considered. This approach is very useful in computer vision applications where we want to match feature points in different images [25]. However, considering different kinds of attributes at the same time encourages inconsistency due to inconsistent feature types [26].

In this study, we consider attributed hypergraph matching, where feature points are associated with many kinds of feature values, both high order and low order. The hypergraph is represented using several tensors in different orders, and the attributed hypergraph matching is formulated in terms of least squares of tensors. The correspondence is finally shown in a permutation matrix. In Sect.3, bipartite matching is addressed and formulated. Traditional graph matching is then discussed in Sect.4. The extension from graph matching to hypergraph matching and attributed hypergraph matching are discussed in Sects.5and6, respectively. Finally, the robustness of the proposed method is investigated in the final section of this paper.

The main contributions are summarized as follows. First, to the best of our knowledge, although some attributed graph- matching and hypergraph-matching algorithms have been proposed before, no algorithm focusing on attributed hypergraph matching, as our method does, has been proposed pre- viously. Second, the hypergraph in general case is considered, unlike most of the previous works concerned about special graphs to deal with the inexact graph matching because it is believed to be NP-hard problem. Third, attribute values

(3)

are transformed and represented in the same Hilbert space to prevent the issues relating to inconsistency of feature type.

Fourth, the hypergraph matching was reduced to bipartite graph matching in polynomial time, therefore it required much less processing time than other methods. Finally, under some assumptions, we can obtain optimal solution of hypergraph matching.

2 Related works

Let V = {V1, . . . , Vn} and V= {V₁, . . . , V_n} be two sets of points, where n and nare the number of points in V and V, respectively. We may solve their correspondence by look- ing for an n× nassignment matrix Z such that its (i, j)th element Zi, j= 1 when Vi corresponds to V_jand it satisfies Z 1≤ 1 and Z^T1≤ 1, where 1 is a column vector in the form 1= [1, 1, . . . , 1]^T. Such a problem is also called an assignment problem and it can be solved by bipartite matching in polynomial time. If the correspondences are established by considering their binary relationships, Ei, j = (Vi, Vj) and E_i_{, j} = (V_i, V_j), simultaneously, this problem is known as a quadratic assignment problem and has no known polynomial- time algorithm for solving it. As those nodes and edges form a graph G = (V, E = {Ei, j}) in mathematics, we may say that the quadratic assignment problem is a graph-matching problem. One may refer to the book [27] to find more information about assignment problems.

Approximate methods for the quadratic assignment problem can be categorized as those expressed by an affinity matrix and those expressed by a weighted adjacency matrix.

In the first category, the affinity matrix A∈ Rⁿⁿ^×nn, whose element Ai, jis the pairwise affinity of a potential assignment i = (Vi, V_i) to j = (Vj, V_j), is used for representation.

Solving the problem yields the following binary optimiza- tion problem:

z^∗= arg max

z (z^TAz), Z^∗∈ {0, 1}ⁿⁿ,

where z is a row-wise vectorized replica of Z . This formu- lation was first shown in [28] and can be solved via spectral relaxation:

w^∗= arg max_w w^TAw w^Tw .

Here,w^∗ can be solved by computing the leading eigen- value and corresponding eigenvector of A. Given the con- tinuous solutionw^∗, one may discretizew^∗to derive z^∗. By the Perron–Frobenius theorem, the leading eigenvector of A is known to exist if A is symmetric and non-negative. The power iteration method is a useful technique to compute the leading eigenvector iteratively.

To extend the above works to high-order relationship, the affinity tensor T ∈ Rⁿⁿ^×nn^×nn^×..., whose element is the affinity between potential assignments, is used for formula-

tion of hypergraph matching. A hypergraph G = (V, E) is a generalization of a graph where each edge, called hyperedge, can connect any number of nodes. According to the order of relationship, we can define the order of the affinity tensor and the number of nodes connected to one hyperedge.

Each hyperedge can be associated with the value of relationship among the connected nodes. To solve the hypergraph matching, one may unfold T into a matrix and apply the aforementioned methods proposed in matrix space [19], but some relationships would be lost in such a reduction.

Power iteration has been extended to tensor power itera- tions [22,29]. However, as the order of the tensor increases, the number of entries of the tensor increases exponentially.

If the tensor is not sparse, constructing the tensor and computing the leading eigenvector will be time-expensive oper- ations. The leading eigenvector could be used to infer the assignment matrix as it provides the optimal rank one approximation for the matrix (as in the Eckart–Young Theorem) [19]. However, this property does not hold for tensors.

The affinity matrix T can be though as the weighted adja- cency matrix of an association graph and Cho et al. [30]

proposed a random walk algorithm to solve the matching problem in this association graph. This work is then gener- alized to higher order [31], i.e., the random walk algorithm is applied to an association hypergraph that is represented by the affinity tensor. Power iteration and random walk algorithms use the expression with bulky processing space and solve the problem iteratively, which cause their methods are time and space consuming. Besides, their methods may be trapped in local minimum.

In the second category, graph nodes represent the points in the set and graph edges denote the relationships between two points in the same set. The sets of points, V and V, and their corresponding set of relationships are then represented as two weighted graphs using two weighted adjacency matrices. We denote these two matrices as A ∈ Rⁿ^×nand A ∈ Rⁿ^×n, respectively. Here, the assignment matrix Z is used as the permutation matrix P, and the matching problem may then be expressed as a least square problem as follows:

P^∗= arg min

P

A − P AP^T , (1)

where || · || means some norm (the Frobenius norm in our research). Obviously, this expression uses less processing space than the previous one. Besides, computing affinity matrix requires much more time O(n²) than computing weighted adjacency matrix O(n). As the order of relationship is increased, computing affinity tensor will require unaccept- able processing time O(n^2d).

Spectral methods are another technique for solving Eq. (1).

These can be applied because of the observation that the eigenvectors relating to the adjacency of a graph are invariant with respect to node permutation [4]. The graduated assign-

(4)

ment algorithm [32] may be one of the first algorithms to solve the quadratic assignment problem by relaxing P^∗ to a leading eigenvector of A. This algorithm combines the quadratic assignment problem and sparsity, and has proved to be a very successful algorithm [19].

While a weighted adjacency matrix defined for graph, we can define a tensor, whose elements are the values of hyper- edges, for hypergraph. Matrix decompositions, such as sin- gular value decomposition [33] or eigen decomposition [34], are widely used to solve quadratic assignment problems in matrix space. The corresponding decompositions for a tensor could be CANDECOMP/PARAFAC (CP) decompositions or Tucker decompositions [35]. The CP decomposition decom- poses a tensor as a sum of rank-one tensors, while the Tucker decomposition is a principal component analysis for tensors.

However, as discussed by Kolda and Bader [35], there is no straightforward algorithm to determine the rank of a specific given tensor, or the tensor decomposition.

Our method has the following advantages: At first, our method took a significantly short processing time (less than 1 s) to perform a hypergraph matching while obtaining rea- sonable results. In the second, this speed allows us to employ a higher number of attributes and higher orders of relationships for hypergraph matching. The higher number of attributes helps our method to adapt to noise, while the higher orders of relationships allow our method to be invariant to some image distortion (i.e., scale, rotation, or translation).

Finally, even graph matching is supposed to be NP-hard problem, we can obtain the optimal solution in polynomial time under some assumptions.

3 Bipartite matching

To determine the correspondence between the feature points of V and those of V, we look for an n× n permutation matrix P such that its (i, j)th element Pi, j= 1 when Vi cor- responds to V_j. In general, n is not equal to n. However, dummy feature points can be added to the smaller set to give the same size n for both sets V and. This matching prob- lem can be formulated as the minimization of the following equation [36]:

P^∗= arg min

P

U− PU (2)

where U and Uare two n× 1 vectors constructed from V and V, respectively, and in each vector, the i th element Ui

contains the feature value of Vi. Each element of P can be either 1 or 0, and only one element is 1 in each row and each column. The permutation matrix P with element Pi, j = 1 will permute U_jto subtract Ui. Since U and Uassociate the nodes of the graph without edges, Eq. (2) can express the solution of the graph matching.

To satisfy Eq. (2), the element U_j should be as close as possible to the element Ui when Pi j is set as 1. In a realistic situation, noise exists in the feature values. Therefore, Uiand U_j will not be exactly equal even if they represent the same feature point. To address the issue of noise, a similarity matrix S can be introduced for solving the problem. Each element, Si, j, of S is computed as the degree of the similarity between Ui and U_j. The permutation matrix P can be solved by the following maximization:

P^∗= arg min

P

i, j

Si, jPi, j.

The above problem is a well-known maximum weight bipartite matching problem, and can be solved in polynomial time [1]. Many methods, such as the relaxation method, the Hopfield network method [17], the Hungarian algorithm [37], and the Dijkstra’s algorithm, have been proposed for solving this problem.

In this paper, Si, j, is defined as Si, j = k(d(i, j)), where k(·) is a decreasing function (e.g., k(x) = 1/x) and d(i, j) is the degree of dissimilarity:

d(i, j) =Uⁱ− Uj . (3)

The vector norm||·|| is applied here instead of the absolute value, because we may assign a feature vector to a node in many applications. Finally, P is the solution of the bipartite matching with a maximum sum of weights, where V and Vare the two sets of nodes, and Si, j is the weighted value of the edge between nodes Vi and V_j. Note that we will reduce hypergraph matching to bipartite matching in polynomial time. Since the solution of maximum weight bipartite matching can be solved in polynomial time, we will be able to solve hypergraph matching in polynomial time too.

4 Graph matching

To decide the correspondence between different sets of feature points and to consider the binary relationships of point pairs at the same time, we construct a graph for each set of feature points and apply the method of graph matching to solve the correspondence problem. All the feature points in one set, V = {V1, . . ., Vn}, are considered as the nodes in the graph G, and the binary relationship between Vi and Vj

can be thought of the edge Ei, j between nodes. We assign a feature value Ai, j to edge Ei, j and a feature value Aii to node Vi, and then construct an weighted adjacency matrix A= [Ai, j]. Feature points in Vcan be a graph represented by another weighted adjacency matrix A. In our application,

A and Aare not necessarily symmetric.

Solving the correspondence between the feature points in different images can be formulated as the minimization problem [24] defined in Eq. (1). By assigning a feature value to an

(5)

edge, the graph becomes an attributed graph. Since A and A represent the attributed graphs, Eq. (1) formulates the problem of an attributed graph matching [38]. In previous works, many algorithms have been proposed to solve the attributed graph matching [3,4], but they encounter difficulties, such as an exponential processing time [9], only applying in special cases [10], or falling into local minimum traps.

Let us consider a permutation matrix P. Suppose A^∗ = P AP^T, and the permutation matrix P, with elements Pi, j = 1, will permute A_j_,:and A_{:, j} to A^∗_i_,: and A^∗_:,i, respectively.

If Eq. (1) is satisfied, Ai,: and A_:,i should be very similar to A^∗_i_,:and A^∗_:,i, respectively. Since the elements in A^∗_i_,:and A^∗_:,i, come from A_j_,:and A_{:, j}, we can expect the elements in A_j_,:and A_{:, j} to be close to those in Ai,:and A_:,iexcept for sequence.

For convenience, let A(i) denote the bag containing the elements of Ai,:and A_:,i, and A( j) denote the bag containing the elements of A_i_,: and A_:,i. If A(i) and A( j) are equal, we will set Pi, j to 1 in most cases. In a realistic situation, however, noises exist in the feature values and, thus, A(i) and A( j) cannot be exactly equal even if they represent the same feature point. We use the similarity matrix S introduced above with Si, j computed as Si, j = k(d(i, j)), where the distance function d(i, j) provides the degree of dissimilarity between A(i) and A( j). When we have S, the permutation matrix P can be solved using the methods discussed in the previous section.

Instead of measuring dissimilarity by comparing the elements in the bag directly, we compare their continuous distribution described below to tolerate the noise. Many harmonic analysis methods, such as the Hilbert transform or the Fourier transform, can be applied for describing the continuous distribution. In this study, we approximate of the elements in a bag with the distribution by a Fourier series f(x):

f(x) =

∞ n=−∞

cne^{i n}^μx, (4)

with:

cn= 1 2π

∞

−∞

f(x)e^−inμxdx, (5)

where f(x) is a function that is periodic on the interval [−L, L] and μ = ^π_L. In our application, we normalize the values of the elements in a bag [( A(i) or A( j) in our case]

to be between−π and π, and set:

f(x) =

_n_k

|A(i)| if x = ak

0 otherwise , (6)

where akis an element of set A(i), and nkis the number of elements with value ak. If we treat f(x) as a function that

is periodic on the interval[−π, π], Eq. (5) can be simplified as:

cn= 1 2π

π

−π

f(x)e^−inxdx,

and its discrete form would be:

cn= 1 2π

|A(i)|

k=1

1

|A(i)|e^−ina^k = 1 2π

1

|A(i)|

k=1

e^−ina^k

= C

|A(i)|

k=1

e^−ina^k, (7)

where C = ₂¹_π_|A(i)|¹ . The computation of Eq. (7) requires time complexity O(|A(i)|).

The above function f(x) is associated with a sequence of Fourier series,{FN(x)}, where N = 0, 1, 2, . . ., defined by:

FN(x) =

N n=−N

cne^{i nx}.

For any small positive real number, we can always choose N such that:

 f (x) − FN(x) < ε

for N > N. Finally, any square-integrable function can be expressed as a series:

f(x) =

∞ n=−∞

cne^{i nx}. (8)

Thus, the sequence of Fourier series converges. The functions e^{i nx} form the orthogonal basis of a space, satisfy- ing the conditions for a Hilbert space. Under these basis functions, we can represent the function f(x) by a vector (. . ., c₋₁, c0, c1, . . .) or by components (cn) in a Fourier coef- ficient space.

The Fourier series approximation, FN(x) = f(x) =

N

n=−Ncne^{i nx}, is then used instead of Eq. (4). We defined a Hilbert space H , the f(x) now can be represented as a vector v= (c_−N, . . . , c0, . . . , cN) in H. Figure1shows a diagram of the above transformation. With such an approximation, the distribution of the elements can accommodate data noise and even a loss of data. The total time complexity required for computing the Fourier series approximation is O(N|A(i)|).

Since N is a small constant number (8 in our experiments), the time complexity is assumed to be O(|A(i)|).

Let us use fi(x), f_i(x), and vito denote the discrete function, the Fourier series approximation, and the vector, respec- tively, transformed from A(i). Note that fj(x), f_j(x), and vj

are similarly transformed from A( j). The degree of dissim- ilarity between A(i) and A( j) corresponds to the difference between fi(x) and fj(x), which can be computed by the

(6)

Extracting row and column

Normalization {v₁,v₂, }

Discrete function

Fourier series approximation

Representation A(i)={A_5,1, A_5,2, A_5,3, A_5,4, A_5,6, A_1,5, A_2,5, A_3,5, A_4,5, A_6,5}

v=(c_N, , c₀, , c_N) f (x)

Fig. 1 Transformation from A(i) to f(x), where A(i) is a bag with the elements in the i th row and column of matrix A, f (x) is the discrete distribution of A(i), and f(x) is the Fourier series approximation for f(x). Note that the elements Aiican be zero if we do not consider the feature values of nodes

difference between f_i(x) and f_j(x) to tolerate the noise.

Finally, we compute the distance between vi and vjto reflect the difference between f_i(x) and f_j(x) for convenience.

To compute the true distance between vi and vj, we intro- duce a Riemannian manifold M embedded in H . The metric gvon M is a family of inner products:

gv: TvM× TvM → R, v ∈ M,

where TvM refers to the tangent space consisting of all the tangent vectors at one point v on M. Regarding the space H , the components of the metric tensor G at point v are given by:

gmn(v) = gv

∂ M

∂e^{i mx}

v

,

∂ M

∂e^{i nx}

v

.

With the metric tensor G, the inner product between two tangent vectors v1 ∈ TvM and v2 ∈ TvM can be com- puted by gv(v1, v2) = v₁^TGv2. Let us consider another point

v= v + dv, where dv = (0, . . ., 0, dcm, 0, . . ., 0). The distance,||dv||, between vand v defined on the manifold can be computed by:

dv²= dv²= dv^TGdv= gmm(v) · dcm· dcm. (9)

Suppose f_v(x) and fv(x) are the Fourier series approxi- mations defined by vand v. We calculate the difference,|dv|, between f_v(x) and fv(x) using the following equation:

dv =

f_v(x) − fv(x) dx=

dcme^{i mx}dx

= dcm

e^{i mx}dx. (10)

By combining Eqs. (9) and (10), gmm(v) equals to

e^{i mx}dx2

. Let

e^{i mx}dx2

= (C^m)², where Cm can be obtained from the inverse Fourier transform on the frequency₂^m_π.

Next, let us replace dv with(0, . . ., 0, dcm, 0, . . ., 0, dn, 0, . . ., 0). Equation (9) can be rewritten as:

dv²= dv^TGdv= gmm(v) · dcm· dcm

+2gmn(v) · dcm· dcn+ gnn(v) · dcn· dcn

= C_m² (dcm)²+ 2gmn(v) · dcm· dcn+ C_n²(dcn)², (11) and Eq. (10) will become:

dv =

f_v(x) − f_v(x) dx

= dcm

e^{i mx}dx+ dcn

e^{i nx}dx

and

dv²=

dcm

e^{i mx}dx

2

+2dcmdcn

e^{i mx}dx

e^{i nx}dx +

dcn

e^{i nx}dx

2

= (dc^m)²C_m² + 2dcmdcnCmCn+ (dcⁿ)²C_n². (12)

By combining Eqs. (11) and (12), we can obtain gmn(v) = CmCn. Finally, we have the metric tensor G that can be computed beforehand.

Let r : [a, b] be a continuously differentiable curve in M.

We define dis(r(a), r(b)) as the length of the shortest path (i.e., a geodesic) r between a and b:

(7)

dis(r(a), r(b)) =

b

a

r(s)ds=

b

a

r(s)^TGr(s)ds

=

b

a

r(s)^TCC^Tr(s)ds =

b

a

r(s)^TCds

=

b

a

r(s)^Tds· C = r^TC=

r^TGr

,

(13) where C= (C−N, . . . , C0, . . . , CN) and r = r(b) − r(a).

The elements of the similarity matrix S are defined by Si, j = k(d(i, j)), where d(i, j) is given by Eq. (13), d(i, j)

= dis(vi, vj). Constructing S will require time complexity O(n³) where n = |A(i)|. Using the method relating to Eq. (2) proposed in the previous section, we can obtain P^∗from S.

The graph with fewer nodes is padded with dummy nodes to give an equal number of nodes in each graph. In the matrices, the feature values of the dummy nodes and their corresponding edges are set to zero. To handle the zero values in the matrix, the corresponding values of f(x) are set to be a very small number. After graph matching, each node will match one of the nodes in the other graph. Dummy nodes will match either a dummy node or a redundant node in the other graph.

Based on the definition, A(i) contains the weights of the connected edges of node i . Its corresponding expressions vi

can be thought of as the feature vectors of node i . In other words, we transform the edge values A(i) of the binary rela- tionship to the feature vector, vi, of node i . The same situation occurs in the transformation from A( j) to vj. Solving Eq. (1) will be equivalent to solving Eq. (2) if we set Ui = vi and replace Eq. (3) with Eq. (13). We transform the graph matching to a bipartite matching. Figure2shows the overview of

the transformation. In the next section, the same idea provides a method for transforming a matching that considers d-ary relationships to a bipartite matching.

5 Hypergraph matching

As a weighted adjacency matrix can be used to represent an attributed graph, an attributed tensor can be used to represent an attributed hypergraph with d-ary affinities, each of which is a relationship of d nodes [22]. Let us take a matching of ternary relationships for example. As the matrix is applied to represent the binary relationship of the nodes, a third-order tensor T is applied to represent the ternary relationship of the nodes. A third-order tensor T can be regarded as a three- dimensional array, and its element Ti, j,kcontains the weight among the i th, j th, and kth nodes. The element Ti, j,k will store the weight value of an edge if two of the indexes (i, j, or k) are equal, while Ti,i,i will store the feature value of single node i as the indexes are the same.

With respect to the Cartesian basis {ei = (. . ., 0, 1, 0, . . .)^T}, where all the elements are 0 except that the ith element is 1, a third-order tensor can be represented by a linear combination of the Cartesian components:

T =

i, j,k

Ti, j,kei⊗ ej⊗ ek, (14)

where ei ⊗ ej ⊗ ek is the dyadic product of three vectors ei, ej, and ek. The dimension of{ei} is the same as the number of nodes in our case, and the dyadic product of three vectors is extended from the dyadic product of two vectors. As the dyadic product of two vectors is shown as a matrix, ei⊗ej⊗ek

is a three-dimensional array with 0 in all elements except for the (i, j, k) element (equal to 1). In the following expression, Fig. 2 Overview of the

transformation from graph matching to bipartite matching

(8)

the summation sign of the triple will be suppressed (using the Einstein summation convention), and a tensor expressed by Eq. (14) will be denoted by

T = Ti, j,kei⊗ ej ⊗ ek.

Solving the correspondence between the feature points in different images can be formulated as the following minimization problem with the above triple:

P^∗= arg min

P

T − Ti, j,k(Pei) ⊗ (Pej) ⊗ (Pek) , (15) where P is the permutation matrix and Pei will equal ej if Pi, j = 1. The permutation matrix can permute the slices T_j_,:,:, T_{:, j,:} , and T_{:,:, j} to subtract Ti,:,:, T_:,i:, and T_:,:,i, respectively, in Eq. (15). A slice T_j_,:,: can be considered as a matrix, which can have the inner product applied with the permutation matrix, P T_j_,:,:P^T, as it is expressed in graph matching. To minimize Eq. (15), the per- muted slice P T_j_,:,:P^T, PT_{:, j,:} P^T, PT_{:,:, j} P^Tmust similar to Ti,:,:, T_:,i,:, T_:,:,i, respectively. As discussed in the previous section, since T and T are associated with hypergraphs, Eq. (15) expresses the solution for a hypergraph matching.

Let us consider T(i) as a bag of elements in Ti,:,:, T_:,i:, and T_{:,:, j}, and consider T( j) as a bag of elements in T_j_,:,:, T_{:, j,:} , and T_{:,:, j} . The higher the similarity between T(i) and T( j), the higher the probability that Pi, j will be to 1. Using the process described earlier, T(i) and T( j) can be trans- formed and expressed as the feature vectors vi and vj using Eq. (5). The time complexity for a transformation is O(n²).

After setting r(a) = vi and r(b) = vj, we can measure

the dissimilarity using Eq. (13) and construct S for solving the permutation matrix in Eq. (2). Extending the tensor

order from three to d is straightforward. The difference is the time complexity, which is O(n^d⁻¹) for computing the vector and O(n^d⁺¹) for computing P^∗.

5.1 Proof

Using the above transformation, a hypergraph matching can be reduced to a bipartite matching in polynomial time. As the latter can be solved in polynomial time, so can the proposed hypergraph matching. Since graph matching is generally an NP-hard problem, the solution should be an approximation.

However, we have a proof to show that the processing result is the global minimum of Eq. (15).

It is known that bipartite matching can be solved in polynomial time, which means that we can have global minimum of Eq. (1) using those methods mentioned before. Let vector U consist of elements vi. Then, Eq. (1) can be expanded to the following:

P^∗= arg min

P

U− PU =arg min

P

j

vj − v_{π( j)}²,

(16) where π( j) refers to the permutation of element j. As in Eq. (8), vj and v_{π( j)} refer to the Fourier series f_j(x) and f_{π( j)} (x), respectively. Minimizing the norm,vj− vπ( j)is equivalent to minimizing the expected value E

f_j(x) − f_{π( j)} (x)

. Thus, by substituting this for the norm, Eq. (16) can be expanded as:

arg min

P

j

vj − v_{π( j)}²

= arg min

P

j

E

f_j(x) − f_{π( j)} (x)2

= arg min

P

j

E

⎡

⎣ ^N

n=−N

⎛

⎝C^{|T ( j)|}

k=1

e^−ina^j^,k

⎞

⎠ e^{i nx}−

N n=−N

⎛

⎝C|^T^{(π( j))}|

k=1

e^−ina^{π( j),π(k)}

⎞

⎠ e^{i nx}

⎤

⎦

2

, (17)

where aj,kis the kth element in bag T( j). Equation (17) can be further rearranged to:

(9)

arg min

P

j

E

⎡

⎣ ^N

n=−N

⎛

⎝C^{|T ( j)|}

k=1

e^−ina^j^,k

⎞

⎠ e^{i nx}−

N n=−N

⎛

⎝C|^T(π( j)) |

k=1

e^−ina^{π( j),π(k)}

⎞

⎠ e^{i nx}

⎤

⎦

2

= arg min

P

j

E

⎡

⎣ ^N

n=−N

⎛

⎝^{|T ( j)|}

k=1

e^−ina^j,k − e^−ina^{π( j),π(k)} ⎞

⎠ e^{i nx}

⎤

⎦

2

= arg min

P

j

E

⎡

⎣^{|T ( j)|}

k=1

_N

n=−N

e^−ina^j^,k− e^−ina^{π( j),π(k)} e^{i nx}

⎤

⎦

2

= arg min

P

j

E

k

_N

n=−N

e^−ina^j^,ke^{i nx}−

N n=−N

e^−ina^{π( j),π(k)} e^{i nx}

!2

,

(18)

where we assume that|T ( j)| = |T(π( j)|. As defined in Eq. (7),N

n=−Ne^−ina^j^,ke^{i nx} is the Fourier series approximation of:

f(x) =

1 if x= aj,k

0 otherwise , with _N

n=−Ne^−ina^{π( j),π(k)}e^{i nx} defined similarly. Both

_N

n=−Ne^−ina^j^,ke^{i nx}and_N

n=−Ne^−ina^{π( j),π(k)} e^{i nx}are Gaussi- an functions with the same standard deviation. The subtraction of these two Gaussian functions can be performed as the subtraction of two random variables that are normally distrib- uted with mean values aj,k and a_{π( j),π(k)}, respectively. Let us define two random variables, Xj,k≡N

n=−Ne^−ina^j^,ke^{i nx} and X_{( j),π(k)}(x) ≡ _N

n=−Ne^−ina^{π( j),π(k)}e^{i nx}. These can be substituted into Eq. (18), giving:

arg min

P

j

E

k

_N

n=−N

e^−ina^j,ke^{i nx}−

N n=−N

e^−ina^{π( j),π(k)}e^{i nx}

!2

= arg min

P

j

E

k

X_j,k− Xπ( j),π(k)!2

= arg min

P

j

E

k

D X_j,k

!₂

, (19)

where D Xj,k = Xj,k − Xπ( j),π(k). We assume that each D Xj,k is independent to one other. Then, Eq. (19) can be written as:

arg min

P

j

E

k

D X_j,k

!₂

= arg min

P

j

k

E"

D X_j,k#₂ .

(20) Since the expected value E"

D Xj,k#

shows the difference between aj,kand a_{π( j),π(k)}, Eq. (20) becomes:

arg min

P

j

k

E"

D Xj,k#2

= arg min

P

T − Ti, j,k(Pei) ⊗ (Pej) ⊗ (Pek) . (21) Finally, we have P^∗= arg min

P

T − T_i_{, j,k}(Pei) ⊗ (Pej)

⊗(Pek), proving that our processing result satisfies Eq. (15).

5.2 Error propagation and discernibility

Most of the previous research has focused on the matching of an attributed graph constructed using node values (unary features) and edge values (binary relationships). In hypergraph matching, we try to solve the matching problem involving high-order relationships [26]. Such matching has the advantages of increasing the geometric invariance and more models can explain [22]. However, sometimes high-order feature values have greater deviation than expected. When we construct a d-order tensor for hypergraph matching, the absence of one pair influences the d×(d −1)×n^d⁻²feature values of a bag.

To prove this, let us consider a d-order tensor Ti1,i2,...,id and a specific node i . The bag for computing feature vector vi

contains d elements in T:,...:,i,:,...,:. For each T:,...:,i,:,...,:, one of the remaining (d− 1) indexes indicates the corresponding node whose feature vector will be influenced. This leaves d−2 indexes to set as one of the n index values (n^d⁻²cases).

With respect to the bag size n^d⁻¹, the ratio of the changed elements to the non-changed elements in a bag is:

d× (d − 1) n

under one missing pair. When we are missing k pairs, the ratio would be:

k× d × (d − 1)

n = r × d × (d − 1),

(10)

Fig. 3 a The relationship between r and d when the rate of influence reaches 100 %. b The degree of influence (y-axis) to the ratio (x-axis) of the number of feature values being changed

where r = ^k_n is defined as the rate of missing nodes. As the order d (>1) is increased, the influenced elements of a bag increase exponentially. This influence leads to low tolerance for missing nodes. Figure3a shows the relationship between r and d when the rate of influence reaches 100 %. We can see that while the tensor order is>4, the missing rate cannot be>10 %.

Increasing the order of the tensor leads to another diffi- culty: the decreasing degree of discernibility. As long as the number of elements in a bag is very large, those bags will become harder to distinguish. To demonstrate this, we ran- domly produce k feature values and considered these values as the elements of a bag. This bag is then computed to the fea- ture vector voriusing Eq. (5). Given a missing rate of r, r × k feature values of the bag are randomly selected and replaced using new random values. Then, a new bag is produced and can be computed with a new feature vector vnew. Their dissimilarity is computed as dis(vori, vnew) using Eq. (13), and indicates the degree of discernibility. Figure3b shows the degree of discernibility in relation to r . We see that while the number of elements is>1,000, the two bags cannot be distinguished, even if they are 30 % different. Such issues are easy to encounter in hypergraph matching, for example, if d = 3 and n > 32, then the number (32³) of elements in the bag will be>1,000; if d = 4 and n > 10 , then the number(10³) of elements in the bag will also be >1,000.

Some of the above issues can be addressed. First, if only a few nodes are designed for hypergraph matching, the propagation of errors will be reduced. Second, to avoid the influence of noise, the hypergraph matching can only be applied in a case where there are no missing nodes or data noise. Third, constructing a tensor that is very sparse helps to reduce the effects of error propagation. This can be achieved by applying constraints as proposed by Duchenne et al. [22]. Finally, we can select a suitable tensor order according to the order of the relationship, and use the attributed hypergraph-matching method proposed in the next section to combine different orders of relationships.

6 Attributed hypergraph matching

Attributed graph matching is defined as:

P^∗= arg min

_m P

k=1

wkTk− Tk,i1,··· ,i_dk(Pei₁) ⊗ · · · ⊗ (Pei_dk)

,

(22) where dkrefers to the order of the tensor relating to the dk-ary relationship, andwk is the weight for describing the impor- tance of the kth feature. If dk = 1, thenT − Ti₁(Pe_i1) will equal the form ofU− PUrelating to Eq. (2). Similarly, if dk = 2, then T − Ti₁,i2(Pei1) ⊗ (Pei2) will equal the form ofA− P AP^Trelating to Eq. (1). As we have the solution for each part of the Eq. (22) (hypergraph matching), i.e.,

P^∗= arg min

P

Tk− Tk,i1,...,i_dk(Pei1) ⊗ · · · ⊗ (Pei_dk) , (23) we solve this problem first and then combine all the solu- tions. Recall that we can compute the similarity matrix Sk

for Eq. (23) at first. Then the combination is defined as:

S=

m k=1

wkSk, (24)

and the final solution P^∗is derived from S using Eq. (2).

However, previous research [19] shows that combining multiple solutions may increase the occurrence of problems due to degradation and bias if one of the error solutions dom- inates the result. A reason for this could be that the solutions are solved from different types of equations and instances,