DOI 10.1007/s00138-014-0598-1 (Original Paper)

**Attributed hypergraph matching on a Riemannian manifold**

**J. M. Wang** **· S. W. Chen · C. S. Fuh**

Received: 19 September 2012 / Revised: 19 July 2013 / Accepted: 23 January 2014 / Published online: 20 March 2014

© Springer-Verlag Berlin Heidelberg 2014

**Abstract** If we consider a matching that preserves high-order relationships among points in the same set, we can introduce a hypergraph-matching technique to search for correspondence according to high-order feature values. While graph matching has been widely studied, there is limited research available regarding hypergraph matching. In this paper, we formulate hypergraph matching in terms of tensors. Then, we reduce the hypergraph matching to a bipartite matching problem that can be solved in polynomial time. We then extend this hypergraph matching to attributed hypergraph matching using a combination of different attributes with different orders. We perform analyses that demonstrate that this method is robust when handling noisy or missing data and can achieve inexact graph matching. To the best of our knowledge, while attributed graph matching and hypergraph matching have been heavily researched, methods for attributed hypergraph matching have not been proposed before.

**Keywords** Graph matching · Hilbert space · Riemannian manifold · Inexact graph matching · Attributed hypergraph matching

J. M. Wang· C. S. Fuh

Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan, ROC

S. W. Chen (corresponding author)

Department of Computer Science and Information Engineering, National Taiwan Normal University, No. 88, Sec. 4, Tingzhou Rd., Wenshan District, Taipei 116, Taiwan, ROC

e-mail: swchen2012@gmail.com

**1 Introduction**

Matching two sets of feature points is a key element in many computer vision tasks, such as feature tracking, image alignment and stitching, and stereo fusion. This problem is closely related to weighted bipartite graphs. One may consider the two sets of feature points as two sets of nodes, and associate a weight value (called similarity) to an edge between two nodes belonging to different sets. Finding a maximum weighted bipartite matching will give a matching with the maximal sum of the edge values. This problem has been reduced to the maximal flow problem [1] and can be solved in polynomial time. Many additional algorithms have been proposed to solve these problems [2], such as the Bellman–Ford algorithm, Dijkstra's algorithm, and the Fibonacci heap.

Bipartite matching is perhaps the simplest matching problem, because it only concerns information of the node (feature point) itself. Graph-matching techniques are developed with the intent to match the feature points in different sets and to preserve binary relationships between points in the same set. Here, we regard the feature points as nodes and the relationships of point pairs as edges, and then construct a graph for each set of feature points. Graph matching is performed to establish the mapping between the nodes of two graphs while preserving their edges, i.e., if two nodes are linked by an edge in one graph, their corresponding nodes are linked by an edge in the other graph.

Graph-matching methods can be roughly divided into exact graph matching and inexact graph matching [3,4]. In exact graph matching, two graphs may have some identical parts. The challenge is then to find an isomorphism such that we can map a node and an edge from one graph to another in terms of their associated values. According to the size of the identical part, we may have a graph isomorphism, a subgraph isomorphism, or a maximum common subgraph

isomorphism. For the graph isomorphism problem, it remains open whether it is in P or is NP-complete, and it is believed that it cannot be solved in polynomial time. Tree-based search algorithms [5] are the standard processing methods for isomorphism testing, and most of them require exponential time in the worst case. Some constrained graphs, such as trees [6], permutation graphs, and planar graphs [7], have been invoked to develop polynomial-time algorithms [8] for the graph isomorphism problem.

Unlike graph isomorphism, subgraph isomorphism has been reduced to maximum common subgraph isomorphism, which has been proved to be an NP-complete problem. Both are believed to be impossible to solve in polynomial time.

In practice, the maximum common subgraph isomorphism problem is more common, as the feature points extracted from one image may differ from those extracted from another image. Tree-based search algorithms [9] can be applied to solve the subgraph isomorphism problem in exponential time. To solve this problem in polynomial time, some approximation or suboptimal algorithms [10] have been proposed for constrained graphs (e.g., sparse graphs), which can then be solved iteratively in bounded polynomial time.

In computer vision, however, feature values often have spatial or temporal deviations, which cause the matching to be inexact rather than exact graph matching.

Inexact matching algorithms are designed with a tolerance for errors and noise, therefore they can be applied to determine the identical parts of two graphs in a more practical manner than is possible with exact graph matching. It is for this reason that inexact graph matching is also called error-tolerant graph matching [11]. It can be regarded as the maximum common graph matching in some form and has been proved to be an NP-complete problem [12].

In recent years, various algorithms designed for inexact graph matching have been proposed. For example, tree-based search algorithms [13] are guaranteed to find the optimal solution in exponential time. To allow a solution to be found in polynomial time, we may choose some constraints for the graphs or accept a suboptimal solution. However, various applications have led to differing algorithms. Artificial neural networks [14], relaxation labeling [15], genetic search [16], the spectral method [17], and the kernel method [18] are key classes of inexact graph matching algorithms. One may refer to the papers [3] and [4] for more information about inexact graph matching algorithms.

Traditional graph matching only considers nodes and edges, which are related to unary features and binary relationships, respectively. In computer vision, a unary feature could be a color or position value related to the feature point itself, while a binary relationship could be a distance or orientation value computed from a pair of feature points. Feature points are extracted from one image frame and matched to those extracted in the successive frame. The feature values may or may not deviate spatially or temporally. For example, the position feature is illumination-invariant, but the color feature is not. Moreover, the orientation feature is invariant to scale but the distance feature is not. If we can develop an algorithm based on some kind of invariant feature, that algorithm will be robust to the corresponding kind of deformation.

Although we can design features with many kinds of invariance from unary and binary relationships, some applications still require high-order relationships [19,20]. Searching for correspondence using high-order relationships is equivalent to hypergraph matching [21] or high-order graph matching [22]. While an adjacency matrix represents a graph, a tensor can be used to represent a hypergraph. Extending the formulation from the adjacency matrix to a tensor-based formulation is straightforward; solving the tensor formulation of hypergraph matching, however, is not so simple [22].

Besides increasing the order of relationships, we can extend graph matching to attributed graph matching [23,24], where each node and edge is associated with a set of attribute values in an attributed graph. When exploring the correspondence, not only the structure but also the attribute values should be considered. This approach is very useful in computer vision applications where we want to match feature points in different images [25]. However, considering different kinds of attributes at the same time can introduce inconsistency owing to differing feature types [26].

In this study, we consider attributed hypergraph matching, where feature points are associated with many kinds of feature values, both high order and low order. The hypergraph is represented using several tensors of different orders, and the attributed hypergraph matching is formulated in terms of least squares of tensors. The correspondence is finally shown in a permutation matrix. In Sect. 3, bipartite matching is addressed and formulated. Traditional graph matching is then discussed in Sect. 4. The extensions from graph matching to hypergraph matching and attributed hypergraph matching are discussed in Sects. 5 and 6, respectively. Finally, the robustness of the proposed method is investigated in the final section of this paper.

The main contributions are summarized as follows. First, to the best of our knowledge, although some attributed graph-matching and hypergraph-matching algorithms have been proposed before, no algorithm focusing on attributed hypergraph matching, as our method does, has been proposed previously. Second, hypergraphs are considered in the general case, unlike most previous works, which restrict attention to special graphs when dealing with inexact graph matching because the problem is believed to be NP-hard. Third, attribute values are transformed and represented in the same Hilbert space to prevent the issues relating to inconsistency of feature type. Fourth, the hypergraph matching is reduced to bipartite graph matching in polynomial time, so it requires much less processing time than other methods. Finally, under some assumptions, we can obtain the optimal solution of hypergraph matching.

**2 Related works**

Let V = {V_1, ..., V_n} and V' = {V'_1, ..., V'_{n'}} be two sets of points, where n and n' are the numbers of points in V and V', respectively. We may solve their correspondence by looking for an n × n' assignment matrix Z such that its (i, j)th element Z_{i,j} = 1 when V_i corresponds to V'_j, and it satisfies Z1 ≤ 1 and Z^T 1 ≤ 1, where 1 is a column vector of the form 1 = [1, 1, ..., 1]^T. Such a problem is also called an assignment problem, and it can be solved by bipartite matching in polynomial time. If the correspondences are established by considering their binary relationships, E_{i,j} = (V_i, V_j) and E'_{i,j} = (V'_i, V'_j), simultaneously, this problem is known as a quadratic assignment problem and has no known polynomial-time algorithm for solving it. As those nodes and edges form a graph G = (V, E = {E_{i,j}}) in mathematics, we may say that the quadratic assignment problem is a graph-matching problem. One may refer to the book [27] for more information about assignment problems.

Approximate methods for the quadratic assignment prob- lem can be categorized as those expressed by an affinity matrix and those expressed by a weighted adjacency matrix.

In the first category, the affinity matrix A ∈ R^{nn' × nn'}, whose element A_{i,j} is the pairwise affinity of a potential assignment i = (V_i, V'_i) to j = (V_j, V'_j), is used for representation. Solving the problem yields the following binary optimization problem:

z* = arg max_z (z^T A z),  z ∈ {0, 1}^{nn'},

where z is a row-wise vectorized replica of Z. This formulation was first shown in [28] and can be solved via spectral relaxation:

w* = arg max_w (w^T A w) / (w^T w).

Here, w* can be found by computing the leading eigenvalue and corresponding eigenvector of A. Given the continuous solution w*, one may discretize w* to derive z*. By the Perron–Frobenius theorem, the leading eigenvector of A is known to exist if A is symmetric and non-negative. The power iteration method is a useful technique to compute the leading eigenvector iteratively.
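The power iteration just described can be sketched in a few lines of NumPy. The 4 × 4 affinity matrix below is an illustrative toy (two candidate assignments per point), not data from the paper:

```python
import numpy as np

def leading_eigenvector(A, iters=100, tol=1e-10):
    """Power iteration for the leading eigenvector of a
    symmetric, non-negative affinity matrix A."""
    w = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        w_new = A @ w
        w_new /= np.linalg.norm(w_new)   # renormalize each step
        if np.linalg.norm(w_new - w) < tol:
            w = w_new
            break
        w = w_new
    return w

# Toy affinity matrix: assignments (0, 3) and (1, 2) support each other.
A = np.array([[1.0, 0.1, 0.1, 0.9],
              [0.1, 1.0, 0.8, 0.1],
              [0.1, 0.8, 1.0, 0.1],
              [0.9, 0.1, 0.1, 1.0]])
w = leading_eigenvector(A)
```

Discretizing w (e.g., greedily, subject to the one-to-one constraints) then yields the binary solution z*.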

To extend the above works to high-order relationships, the affinity tensor T ∈ R^{nn' × nn' × nn' × ...}, whose elements are the affinities between potential assignments, is used for the formulation of hypergraph matching. A hypergraph G = (V, E) is a generalization of a graph in which each edge, called a hyperedge, can connect any number of nodes. According to the order of the relationship, we can define the order of the affinity tensor and the number of nodes connected to one hyperedge. Each hyperedge can be associated with the value of the relationship among the connected nodes. To solve the hypergraph matching, one may unfold T into a matrix and apply the aforementioned methods proposed in matrix space [19], but some relationships would be lost in such a reduction.

Power iteration has been extended to tensor power iterations [22,29]. However, as the order of the tensor increases, the number of entries of the tensor increases exponentially. If the tensor is not sparse, constructing the tensor and computing the leading eigenvector will be time-expensive operations. For a matrix, the leading eigenvector can be used to infer the assignment matrix, as it provides the optimal rank-one approximation (by the Eckart–Young theorem) [19]. However, this property does not hold for tensors.

The affinity matrix A can be thought of as the weighted adjacency matrix of an association graph, and Cho et al. [30] proposed a random walk algorithm to solve the matching problem in this association graph. This work was then generalized to higher order [31], i.e., the random walk algorithm is applied to an association hypergraph that is represented by the affinity tensor. Power iteration and random walk algorithms require bulky processing space and solve the problem iteratively, which makes them time- and space-consuming. Besides, these methods may become trapped in local minima.

In the second category, graph nodes represent the points in the set and graph edges denote the relationships between two points in the same set. The sets of points, V and V', and their corresponding sets of relationships are then represented as two weighted graphs using two weighted adjacency matrices. We denote these two matrices as A ∈ R^{n × n} and A' ∈ R^{n' × n'}, respectively. Here, the assignment matrix Z is used as the permutation matrix P, and the matching problem may then be expressed as a least-squares problem as follows:

P* = arg min_P ||A − P A' P^T||,   (1)

where || · || denotes some norm (the Frobenius norm in our research). Obviously, this expression uses less processing space than the previous one. Besides, computing the affinity matrix requires much more time, O(n^2), than computing the weighted adjacency matrix, O(n). As the order of relationship increases, computing the affinity tensor requires unacceptable processing time, O(n^{2d}).
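To make Eq. (1) concrete, here is a tiny brute-force check in NumPy (the matrices are illustrative, not from the paper): A' is A with its nodes relabeled by a known permutation, and minimizing the Frobenius residual over all 3! candidate permutation matrices recovers that relabeling.

```python
import numpy as np
from itertools import permutations

# Toy weighted adjacency matrix and a known node relabeling.
A = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
perm = [2, 0, 1]
P_true = np.eye(3)[perm]            # permutation matrix, P_true[i, perm[i]] = 1
A_prime = P_true.T @ A @ P_true     # the same graph with relabeled nodes

def residual(p):
    # Frobenius residual of Eq. (1) for a candidate permutation p.
    P = np.eye(3)[list(p)]
    return np.linalg.norm(A - P @ A_prime @ P.T)

best = min(permutations(range(3)), key=residual)
```

Since the edge weights of A are all distinct, the residual is zero only at the ground-truth permutation, so `best` recovers `perm`.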

Spectral methods are another technique for solving Eq. (1). These can be applied because of the observation that the eigenvectors related to the adjacency matrix of a graph are invariant with respect to node permutation [4]. The graduated assignment algorithm [32] may be one of the first algorithms to solve the quadratic assignment problem by relaxing P* to a leading eigenvector of A. This algorithm combines the quadratic assignment problem and sparsity, and has proved to be a very successful algorithm [19].

While a weighted adjacency matrix is defined for a graph, we can define a tensor, whose elements are the values of hyperedges, for a hypergraph. Matrix decompositions, such as singular value decomposition [33] and eigendecomposition [34], are widely used to solve quadratic assignment problems in matrix space. The corresponding decompositions for a tensor are the CANDECOMP/PARAFAC (CP) decomposition and the Tucker decomposition [35]. The CP decomposition expresses a tensor as a sum of rank-one tensors, while the Tucker decomposition is a principal component analysis for tensors. However, as discussed by Kolda and Bader [35], there is no straightforward algorithm to determine the rank of a specific given tensor, or its decomposition.

Our method has the following advantages. First, it takes a significantly short processing time (less than 1 s) to perform a hypergraph matching while obtaining reasonable results. Second, this speed allows us to employ a larger number of attributes and higher orders of relationships for hypergraph matching. The larger number of attributes helps our method adapt to noise, while the higher orders of relationships allow our method to be invariant to some image distortions (i.e., scale, rotation, or translation). Finally, even though graph matching is believed to be an NP-hard problem, we can obtain the optimal solution in polynomial time under some assumptions.

**3 Bipartite matching**

To determine the correspondence between the feature points of V and those of V', we look for an n × n' permutation matrix P such that its (i, j)th element P_{i,j} = 1 when V_i corresponds to V'_j. In general, n is not equal to n'. However, dummy feature points can be added to the smaller set to give the same size n for both sets V and V'. This matching problem can be formulated as the minimization of the following equation [36]:

P* = arg min_P ||U − PU'||,   (2)

where U and U' are two n × 1 vectors constructed from V and V', respectively, and in each vector, the ith element U_i contains the feature value of V_i. Each element of P can be either 1 or 0, and only one element is 1 in each row and each column. The permutation matrix P with element P_{i,j} = 1 will permute U'_j to be subtracted from U_i. Since U and U' associate the nodes of the graph without edges, Eq. (2) expresses the solution of the graph matching.

To satisfy Eq. (2), the element U'_j should be as close as possible to the element U_i when P_{i,j} is set to 1. In a realistic situation, noise exists in the feature values. Therefore, U_i and U'_j will not be exactly equal even if they represent the same feature point. To address the issue of noise, a similarity matrix S can be introduced for solving the problem. Each element S_{i,j} of S is computed as the degree of similarity between U_i and U'_j. The permutation matrix P can then be solved by the following maximization:

P* = arg max_P Σ_{i,j} S_{i,j} P_{i,j}.

The above problem is the well-known maximum weight bipartite matching problem, and can be solved in polynomial time [1]. Many methods, such as the relaxation method, the Hopfield network method [17], the Hungarian algorithm [37], and Dijkstra's algorithm, have been proposed for solving this problem.
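A minimal illustration of the maximization above, with made-up similarity values (a Hungarian-algorithm solver would return the same assignment in polynomial time; brute force is used here purely for clarity):

```python
from itertools import permutations

# Toy similarity matrix: S[i][j] = k(d(i, j)) between V_i and V'_j.
S = [[0.9, 0.1, 0.2],
     [0.2, 0.8, 0.1],
     [0.1, 0.3, 0.7]]

# P* maximizes sum_{i,j} S_ij * P_ij over all permutations.
best = max(permutations(range(3)),
           key=lambda p: sum(S[i][p[i]] for i in range(3)))
```

Here the diagonal dominates, so the maximizing assignment matches each V_i to V'_i.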

In this paper, S_{i,j} is defined as S_{i,j} = k(d(i, j)), where k(·) is a decreasing function (e.g., k(x) = 1/x) and d(i, j) is the degree of dissimilarity:

d(i, j) = ||U_i − U'_j||.   (3)

The vector norm || · || is applied here instead of the absolute value, because we may assign a feature vector to a node in many applications. Finally, P is the solution of the bipartite matching with a maximum sum of weights, where V and V' are the two sets of nodes, and S_{i,j} is the weight of the edge between nodes V_i and V'_j. Note that we will reduce hypergraph matching to bipartite matching in polynomial time. Since maximum weight bipartite matching can be solved in polynomial time, we will be able to solve hypergraph matching in polynomial time too.

**4 Graph matching**

To decide the correspondence between different sets of feature points and to consider the binary relationships of point pairs at the same time, we construct a graph for each set of feature points and apply graph matching to solve the correspondence problem. All the feature points in one set, V = {V_1, ..., V_n}, are considered as the nodes of the graph G, and the binary relationship between V_i and V_j can be thought of as the edge E_{i,j} between those nodes. We assign a feature value A_{i,j} to edge E_{i,j} and a feature value A_{ii} to node V_i, and then construct a weighted adjacency matrix A = [A_{i,j}]. The feature points in V' can likewise be represented as a graph by another weighted adjacency matrix A'. In our application, A and A' are not necessarily symmetric.

Solving the correspondence between the feature points in different images can be formulated as the minimization problem [24] defined in Eq. (1). By assigning a feature value to an edge, the graph becomes an attributed graph. Since A and A' represent the attributed graphs, Eq. (1) formulates the problem of attributed graph matching [38]. In previous works, many algorithms have been proposed to solve attributed graph matching [3,4], but they encounter difficulties, such as exponential processing time [9], applicability only to special cases [10], or falling into local minima.

Let us consider a permutation matrix P. Suppose A* = P A' P^T; the permutation matrix P, with elements P_{i,j} = 1, will permute A'_{j,:} and A'_{:,j} to A*_{i,:} and A*_{:,i}, respectively. If Eq. (1) is satisfied, A_{i,:} and A_{:,i} should be very similar to A*_{i,:} and A*_{:,i}, respectively. Since the elements in A*_{i,:} and A*_{:,i} come from A'_{j,:} and A'_{:,j}, we can expect the elements in A'_{j,:} and A'_{:,j} to be close to those in A_{i,:} and A_{:,i}, except for their sequence.

For convenience, let A(i) denote the bag containing the elements of A_{i,:} and A_{:,i}, and let A'(j) denote the bag containing the elements of A'_{j,:} and A'_{:,j}. If A(i) and A'(j) are equal, we will set P_{i,j} to 1 in most cases. In a realistic situation, however, noise exists in the feature values and, thus, A(i) and A'(j) cannot be exactly equal even if they represent the same feature point. We use the similarity matrix S introduced above, with S_{i,j} computed as S_{i,j} = k(d(i, j)), where the distance function d(i, j) provides the degree of dissimilarity between A(i) and A'(j). When we have S, the permutation matrix P can be solved using the methods discussed in the previous section.

Instead of measuring dissimilarity by comparing the elements in the bag directly, we compare their continuous distributions, described below, to tolerate the noise. Many harmonic analysis methods, such as the Hilbert transform or the Fourier transform, can be applied to describe the continuous distribution. In this study, we approximate the distribution of the elements in a bag by a Fourier series f~(x):

f~(x) = Σ_{n=−∞}^{∞} c_n e^{inμx},   (4)

with:

c_n = (1/2π) ∫_{−∞}^{∞} f(x) e^{−inμx} dx,   (5)

where f(x) is a function that is periodic on the interval [−L, L] and μ = π/L. In our application, we normalize the values of the elements in a bag (A(i) or A'(j) in our case) to lie between −π and π, and set:

f(x) = n_k / |A(i)|  if x = a_k,  and 0 otherwise,   (6)

where a_k is an element of bag A(i), and n_k is the number of elements with value a_k. If we treat f(x) as a function that is periodic on the interval [−π, π], Eq. (5) can be simplified as:

c_n = (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx,

and its discrete form would be:

c_n = (1/2π) Σ_{k=1}^{|A(i)|} (1/|A(i)|) e^{−i n a_k} = (1/2π) (1/|A(i)|) Σ_{k=1}^{|A(i)|} e^{−i n a_k} = C Σ_{k=1}^{|A(i)|} e^{−i n a_k},   (7)

where C = 1/(2π|A(i)|). The computation of Eq. (7) requires time complexity O(|A(i)|).
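Equation (7) amounts to one complex exponential sum per coefficient. A sketch in NumPy (the bag values below are arbitrary and assumed already normalized to [−π, π]):

```python
import numpy as np

def fourier_coeffs(bag, N=8):
    """Coefficients c_n, n = -N..N, of the discrete distribution of a
    bag of values normalized to [-pi, pi], following Eq. (7)."""
    a = np.asarray(bag, dtype=float)
    C = 1.0 / (2.0 * np.pi * len(a))        # C = 1 / (2*pi*|A(i)|)
    n = np.arange(-N, N + 1)
    # c_n = C * sum_k exp(-i * n * a_k), vectorized over all n at once
    return C * np.exp(-1j * np.outer(n, a)).sum(axis=1)

coeffs = fourier_coeffs([-1.0, 0.5, 2.0])   # 17 coefficients, n = -8..8
```

Note that c_0 always equals 1/(2π), and for real-valued bags the coefficients satisfy the conjugate symmetry c_{−n} = conj(c_n).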

The above function f(x) is associated with a sequence of Fourier series, {F_N(x)}, where N = 0, 1, 2, ..., defined by:

F_N(x) = Σ_{n=−N}^{N} c_n e^{inx}.

For any small positive real number ε, we can always choose N such that:

||f(x) − F_{N'}(x)|| < ε

for N' > N. Finally, any square-integrable function can be expressed as the series:

f(x) = Σ_{n=−∞}^{∞} c_n e^{inx}.   (8)

Thus, the sequence of Fourier series converges. The functions e^{inx} form an orthogonal basis of a space satisfying the conditions for a Hilbert space. Under these basis functions, we can represent the function f(x) by a vector (..., c_{−1}, c_0, c_1, ...), or by components (c_n), in a Fourier coefficient space.

The Fourier series approximation, F_N(x) = f~(x) = Σ_{n=−N}^{N} c_n e^{inx}, is then used instead of Eq. (4). Having defined a Hilbert space H, f~(x) can now be represented as a vector v = (c_{−N}, ..., c_0, ..., c_N) in H. Figure 1 shows a diagram of the above transformation. With such an approximation, the distribution of the elements can accommodate data noise and even a loss of data. The total time complexity required for computing the Fourier series approximation is O(N|A(i)|). Since N is a small constant number (8 in our experiments), the time complexity is assumed to be O(|A(i)|).

Let us use f_i(x), f~_i(x), and v_i to denote the discrete function, the Fourier series approximation, and the vector, respectively, transformed from A(i). Note that f_j(x), f~_j(x), and v_j are similarly transformed from A'(j). The degree of dissimilarity between A(i) and A'(j) corresponds to the difference between f_i(x) and f_j(x), which is computed via the difference between f~_i(x) and f~_j(x) to tolerate the noise. Finally, we compute the distance between v_i and v_j to reflect the difference between f~_i(x) and f~_j(x) for convenience.

**Fig. 1** Transformation from A(i) to f~(x) (steps: extracting row and column, normalization, discrete function, Fourier series approximation, representation v = (c_{−N}, ..., c_0, ..., c_N)), where A(i) is a bag with the elements in the ith row and column of matrix A, e.g., A(5) = {A_{5,1}, A_{5,2}, A_{5,3}, A_{5,4}, A_{5,6}, A_{1,5}, A_{2,5}, A_{3,5}, A_{4,5}, A_{6,5}} for a six-node graph; f(x) is the discrete distribution of A(i), and f~(x) is the Fourier series approximation for f(x). Note that the element A_{ii} can be zero if we do not consider the feature values of nodes

To compute the true distance between v_i and v_j, we introduce a Riemannian manifold M embedded in H. The metric g_v on M is a family of inner products:

g_v : T_v M × T_v M → R,  v ∈ M,

where T_v M refers to the tangent space consisting of all the tangent vectors at a point v on M. Regarding the space H, the components of the metric tensor G at point v are given by:

g_mn(v) = g_v(∂M/∂e^{imx} |_v, ∂M/∂e^{inx} |_v).
With the metric tensor G, the inner product between two tangent vectors v_1 ∈ T_v M and v_2 ∈ T_v M can be computed by g_v(v_1, v_2) = v_1^T G v_2. Let us consider another point v' = v + dv, where dv = (0, ..., 0, dc_m, 0, ..., 0). The distance ||dv|| between v' and v defined on the manifold can be computed by:

||dv||^2 = dv^T G dv = g_mm(v) · dc_m · dc_m.   (9)

Suppose f~_{v'}(x) and f~_v(x) are the Fourier series approximations defined by v' and v. We calculate the difference |dv| between f~_{v'}(x) and f~_v(x) using the following equation:

dv = ∫ (f~_{v'}(x) − f~_v(x)) dx = ∫ dc_m e^{imx} dx = dc_m ∫ e^{imx} dx.   (10)

By combining Eqs. (9) and (10), g_mm(v) equals (∫ e^{imx} dx)^2. Let (∫ e^{imx} dx)^2 = (C_m)^2, where C_m can be obtained from the inverse Fourier transform at the frequency m/2π.

Next, let us replace dv with (0, ..., 0, dc_m, 0, ..., 0, dc_n, 0, ..., 0). Equation (9) can be rewritten as:

||dv||^2 = dv^T G dv = g_mm(v) · dc_m · dc_m + 2 g_mn(v) · dc_m · dc_n + g_nn(v) · dc_n · dc_n
         = C_m^2 (dc_m)^2 + 2 g_mn(v) · dc_m · dc_n + C_n^2 (dc_n)^2,   (11)
and Eq. (10) will become:

dv = ∫ (f~_{v'}(x) − f~_v(x)) dx = dc_m ∫ e^{imx} dx + dc_n ∫ e^{inx} dx,

and

dv^2 = (dc_m ∫ e^{imx} dx)^2 + 2 dc_m dc_n (∫ e^{imx} dx)(∫ e^{inx} dx) + (dc_n ∫ e^{inx} dx)^2
     = (dc_m)^2 C_m^2 + 2 dc_m dc_n C_m C_n + (dc_n)^2 C_n^2.   (12)

By combining Eqs. (11) and (12), we obtain g_mn(v) = C_m C_n. Finally, we have the metric tensor G, which can be computed beforehand.

Let r : [a, b] → M be a continuously differentiable curve in M. We define dis(r(a), r(b)) as the length of the shortest path (i.e., a geodesic) r between a and b:

dis(r(a), r(b)) = ∫_a^b ||r'(s)|| ds = ∫_a^b sqrt(r'(s)^T G r'(s)) ds
               = ∫_a^b sqrt(r'(s)^T C C^T r'(s)) ds = ∫_a^b |r'(s)^T C| ds
               = |(∫_a^b r'(s) ds)^T C| = |Δr^T C| = sqrt(Δr^T G Δr),   (13)

where C = (C_{−N}, ..., C_0, ..., C_N) and Δr = r(b) − r(a).

The elements of the similarity matrix S are defined by S_{i,j} = k(d(i, j)), where d(i, j) is given by Eq. (13), d(i, j) = dis(v_i, v_j). Constructing S requires time complexity O(n^3), where n = |A(i)|. Using the method relating to Eq. (2) proposed in the previous section, we can obtain P* from S.
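A sketch of the closed form at the end of Eq. (13): with G = C C^T, the geodesic distance between two coefficient vectors collapses to |Δv^T C|. The vectors and the constants C_m below are made-up illustrations, not values derived in the paper.

```python
import numpy as np

# Illustrative C = (C_-N, ..., C_0, ..., C_N) for N = 2.
C = np.array([0.3, 0.5, 1.0, 0.5, 0.3])
G = np.outer(C, C)                 # metric tensor, g_mn = C_m * C_n

v_i = np.array([0.1, 0.2, 0.9, 0.2, 0.1])   # coefficient vector of A(i)
v_j = np.array([0.0, 0.3, 0.8, 0.3, 0.0])   # coefficient vector of A'(j)

d = v_i - v_j
dist = np.sqrt(d @ G @ d)          # Eq. (13): sqrt(dr^T G dr)
```

Because G is rank one, the quadratic form never requires the O(n^2) matrix product: dist equals |d @ C|, a single dot product.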

The graph with fewer nodes is padded with dummy nodes
to give an equal number of nodes in each graph. In the matri-
ces, the feature values of the dummy nodes and their corre-
sponding edges are set to zero. To handle the zero values in
*the matrix, the corresponding values of f(x) are set to be*
a very small number. After graph matching, each node will
match one of the nodes in the other graph. Dummy nodes
will match either a dummy node or a redundant node in the
other graph.

Based on the definition, A(i) contains the weights of the connected edges of node i. Its corresponding expression v_i can be thought of as the feature vector of node i. In other words, we transform the edge values A(i) of the binary relationship to the feature vector v_i of node i. The same transformation is applied from A'(j) to v_j. Solving Eq. (1) is then equivalent to solving Eq. (2) if we set U_i = v_i and replace Eq. (3) with Eq. (13). We thus transform the graph matching to a bipartite matching. Figure 2 shows an overview of the transformation. In the next section, the same idea provides a method for transforming a matching that considers d-ary relationships to a bipartite matching.
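The whole pipeline of this section (bag, normalized discrete distribution, Fourier coefficient vector, distance, assignment) can be sketched end to end. Everything below is a toy reconstruction under the definitions above, with per-bag normalization to [−π, π] and a plain Euclidean distance between coefficient vectors standing in for the metric of Eq. (13):

```python
import numpy as np
from itertools import permutations

def bag(A, i):
    # A(i): the elements of the i-th row and the i-th column of A.
    return np.concatenate([A[i, :], np.delete(A[:, i], i)])

def coeff_vec(values, N=8):
    # Normalize the bag to [-pi, pi], then apply Eq. (7).
    v = np.asarray(values, dtype=float)
    v = (v - v.min()) / (np.ptp(v) + 1e-12) * 2 * np.pi - np.pi
    n = np.arange(-N, N + 1)
    return np.exp(-1j * np.outer(n, v)).sum(axis=1) / (2 * np.pi * len(v))

# A' is A with nodes relabeled by `perm`; matching should recover it.
A = np.array([[0., 1., 5.],
              [1., 0., 2.],
              [5., 2., 0.]])
perm = [1, 2, 0]
P = np.eye(3)[perm]
A_prime = P.T @ A @ P

V  = [coeff_vec(bag(A, i)) for i in range(3)]
Vp = [coeff_vec(bag(A_prime, j)) for j in range(3)]
D = np.array([[np.linalg.norm(V[i] - Vp[j]) for j in range(3)]
              for i in range(3)])
match = min(permutations(range(3)),
            key=lambda p: sum(D[i, p[i]] for i in range(3)))
```

Because a bag is compared as a multiset (the coefficient sum is order-invariant), node i of A and its relabeled counterpart perm[i] of A' produce identical coefficient vectors, so the minimizing assignment recovers the relabeling.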

**5 Hypergraph matching**

As a weighted adjacency matrix can be used to represent an attributed graph, an attributed tensor can be used to represent an attributed hypergraph with d-ary affinities, each of which is a relationship among d nodes [22]. Let us take a matching of ternary relationships as an example. As a matrix is applied to represent the binary relationships of the nodes, a third-order tensor T is applied to represent the ternary relationships of the nodes. A third-order tensor T can be regarded as a three-dimensional array, and its element T_{i,j,k} contains the weight among the ith, jth, and kth nodes. The element T_{i,j,k} will store the weight value of an edge if two of the indexes (i, j, or k) are equal, while T_{i,i,i} will store the feature value of the single node i, as the indexes are all the same.

With respect to the Cartesian basis {e_i = (..., 0, 1, 0, ...)^T}, where all the elements are 0 except that the ith element is 1, a third-order tensor can be represented by a linear combination of the Cartesian components:

T = Σ_{i,j,k} T_{i,j,k} e_i ⊗ e_j ⊗ e_k,    (14)

where e_i ⊗ e_j ⊗ e_k is the dyadic product of the three vectors e_i, e_j, and e_k. The dimension of {e_i} equals the number of nodes in our case, and the dyadic product of three vectors is extended from the dyadic product of two vectors. As the dyadic product of two vectors is shown as a matrix, e_i ⊗ e_j ⊗ e_k is a three-dimensional array with 0 in all elements except for the (i, j, k) element, which equals 1.

Fig. 2 Overview of the transformation from graph matching to bipartite matching

In the following expressions, the summation sign of the triple will be suppressed (using the Einstein summation convention), and a tensor expressed by Eq. (14) will be denoted by

T = T_{i,j,k} e_i ⊗ e_j ⊗ e_k.
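The decomposition of Eq. (14) can be checked numerically; the sketch below builds a small tensor for a 4-node hypergraph (all weight values are illustrative) and reassembles it from dyadic products of basis vectors.

```python
import numpy as np

# A third-order tensor T for a hypergraph with n = 4 nodes:
# T[i, j, k] holds the weight of the ternary relation among nodes
# i, j, k; T[i, i, i] holds the unary feature of node i, and entries
# with exactly two equal indexes hold edge (binary) weights.
n = 4
T = np.zeros((n, n, n))
T[0, 0, 0] = 0.5          # unary feature of node 0
T[0, 1, 1] = 0.3          # binary weight involving edge (0, 1)
T[0, 1, 2] = 0.7          # ternary weight among nodes 0, 1, 2

# Eq. (14): T as a linear combination of dyadic (outer) products
# of Cartesian basis vectors e_i ⊗ e_j ⊗ e_k.
e = np.eye(n)
T14 = sum(T[i, j, k] * np.einsum('a,b,c->abc', e[i], e[j], e[k])
          for i in range(n) for j in range(n) for k in range(n))
print(np.allclose(T, T14))  # True
```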

Solving the correspondence between the feature points in different images can be formulated as the following minimization problem with the above triple:

P* = arg min_P ||T − T'_{i,j,k} (Pe_i) ⊗ (Pe_j) ⊗ (Pe_k)||,    (15)

where P is the permutation matrix and Pe_i equals e_j if P_{i,j} = 1. The permutation matrix permutes the slices T'_{j,:,:}, T'_{:,j,:}, and T'_{:,:,j} to subtract from T_{i,:,:}, T_{:,i,:}, and T_{:,:,i}, respectively, in Eq. (15). A slice T'_{j,:,:} can be considered as a matrix, to which the permutation matrix can be applied as P T'_{j,:,:} P^T, just as in graph matching. To minimize Eq. (15), the permuted slices P T'_{j,:,:} P^T, P T'_{:,j,:} P^T, and P T'_{:,:,j} P^T must be similar to T_{i,:,:}, T_{:,i,:}, and T_{:,:,i}, respectively. As discussed in the previous section, since T and T' are associated with hypergraphs, Eq. (15) expresses the solution of a hypergraph matching.

Let us consider T(i) as a bag of the elements in T_{i,:,:}, T_{:,i,:}, and T_{:,:,i}, and consider T'(j) as a bag of the elements in T'_{j,:,:}, T'_{:,j,:}, and T'_{:,:,j}. The higher the similarity between T(i) and T'(j), the higher the probability that P_{i,j} will be 1. Using the process described earlier, T(i) and T'(j) can be transformed and expressed as the feature vectors v_i and v'_j using Eq. (5). The time complexity for a transformation is O(n^2). After setting r(a) = v_i and r(b) = v'_j, we can measure the dissimilarity using Eq. (13) and construct S for solving the permutation matrix in Eq. (2). Extending the tensor order from three to d is straightforward. The difference is the time complexity, which is O(n^{d−1}) for computing the feature vector and O(n^{d+1}) for computing P*.
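Collecting the bag T(i) from the three slices through node i can be sketched as follows (the function name is ours; tensor values are illustrative random data):

```python
import numpy as np

def bag(T, i):
    """Bag T(i): all elements of the three slices through node i of a
    third-order tensor, i.e. T[i,:,:], T[:,i,:], and T[:,:,i] -- an
    O(n^2) collection per node, as noted in the text."""
    return np.concatenate([T[i, :, :].ravel(),
                           T[:, i, :].ravel(),
                           T[:, :, i].ravel()])

n = 4
rng = np.random.default_rng(0)
T = rng.random((n, n, n))
print(bag(T, 2).shape)  # (48,) = 3 * n^2
```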

5.1 Proof

Using the above transformation, a hypergraph matching can be reduced to a bipartite matching in polynomial time. As the latter can be solved in polynomial time, so can the proposed hypergraph matching. Since graph matching is in general an NP-hard problem, the solution would be expected to be an approximation. However, we can prove that the processing result is the global minimum of Eq. (15).

It is known that bipartite matching can be solved in polynomial time, which means that we can obtain the global minimum of Eq. (1) using the methods mentioned before. Let the vector U consist of the elements v_i. Then, Eq. (1) can be expanded as follows:

P* = arg min_P ||U − PU'|| = arg min_P Σ_j ||v_j − v'_{π(j)}||^2,    (16)

where π(j) refers to the permutation of element j. As in Eq. (8), v_j and v'_{π(j)} refer to the Fourier series f_j(x) and f'_{π(j)}(x), respectively. Minimizing the norm ||v_j − v'_{π(j)}|| is equivalent to minimizing the expected value E[f_j(x) − f'_{π(j)}(x)]. Thus, by substituting this for the norm, Eq. (16) can be expanded as:

arg min_P Σ_j ||v_j − v'_{π(j)}||^2
= arg min_P Σ_j E[ f_j(x) − f'_{π(j)}(x) ]^2
= arg min_P Σ_j E[ Σ_{n=−N}^{N} ( C Σ_{k=1}^{|T(j)|} e^{−in a_{j,k}} ) e^{inx} − Σ_{n=−N}^{N} ( C Σ_{k=1}^{|T'(π(j))|} e^{−in a'_{π(j),π(k)}} ) e^{inx} ]^2,    (17)

where a_{j,k} is the kth element in the bag T(j). Equation (17) can be further rearranged to:

arg min_P Σ_j E[ Σ_{n=−N}^{N} ( C Σ_{k=1}^{|T(j)|} e^{−in a_{j,k}} ) e^{inx} − Σ_{n=−N}^{N} ( C Σ_{k=1}^{|T'(π(j))|} e^{−in a'_{π(j),π(k)}} ) e^{inx} ]^2
= arg min_P Σ_j E[ Σ_{n=−N}^{N} Σ_{k=1}^{|T(j)|} ( e^{−in a_{j,k}} − e^{−in a'_{π(j),π(k)}} ) e^{inx} ]^2
= arg min_P Σ_j E[ Σ_{k=1}^{|T(j)|} Σ_{n=−N}^{N} ( e^{−in a_{j,k}} − e^{−in a'_{π(j),π(k)}} ) e^{inx} ]^2
= arg min_P Σ_j E[ Σ_k ( Σ_{n=−N}^{N} e^{−in a_{j,k}} e^{inx} − Σ_{n=−N}^{N} e^{−in a'_{π(j),π(k)}} e^{inx} ) ]^2,    (18)

where the constant C is dropped after the first line, since a positive constant factor does not affect the minimizer,

and where we assume that |T(j)| = |T'(π(j))|. As defined in Eq. (7), Σ_{n=−N}^{N} e^{−in a_{j,k}} e^{inx} is the Fourier series approximation of

f(x) = 1 if x = a_{j,k}, and 0 otherwise,

with Σ_{n=−N}^{N} e^{−in a'_{π(j),π(k)}} e^{inx} defined similarly. Both approximations are Gaussian functions with the same standard deviation. The subtraction of these two Gaussian functions can be performed as the subtraction of two random variables that are normally distributed with mean values a_{j,k} and a'_{π(j),π(k)}, respectively. Let us define two random variables, X_{j,k} ≡ Σ_{n=−N}^{N} e^{−in a_{j,k}} e^{inx} and X_{π(j),π(k)} ≡ Σ_{n=−N}^{N} e^{−in a'_{π(j),π(k)}} e^{inx}. Substituting these into Eq. (18) gives:

arg min_P Σ_j E[ Σ_k ( Σ_{n=−N}^{N} e^{−in a_{j,k}} e^{inx} − Σ_{n=−N}^{N} e^{−in a'_{π(j),π(k)}} e^{inx} ) ]^2
= arg min_P Σ_j E[ Σ_k ( X_{j,k} − X_{π(j),π(k)} ) ]^2
= arg min_P Σ_j E[ Σ_k DX_{j,k} ]^2,    (19)

where DX_{j,k} = X_{j,k} − X_{π(j),π(k)}. We assume that each DX_{j,k} is independent of the others. Then, Eq. (19) can be written as:

arg min_P Σ_j E[ Σ_k DX_{j,k} ]^2 = arg min_P Σ_j Σ_k E[ DX_{j,k} ]^2.    (20)

Since the expected value E[DX_{j,k}] measures the difference between a_{j,k} and a'_{π(j),π(k)}, Eq. (20) becomes:

arg min_P Σ_j Σ_k E[ DX_{j,k} ]^2 = arg min_P ||T − T'_{i,j,k} (Pe_i) ⊗ (Pe_j) ⊗ (Pe_k)||.    (21)

Finally, we have P* = arg min_P ||T − T'_{i,j,k} (Pe_i) ⊗ (Pe_j) ⊗ (Pe_k)||, proving that our processing result satisfies Eq. (15).

5.2 Error propagation and discernibility

Most of the previous research has focused on the matching of attributed graphs constructed using node values (unary features) and edge values (binary relationships). In hypergraph matching, we try to solve the matching problem involving high-order relationships [26]. Such matching has the advantages of increased geometric invariance and greater expressiveness of the models [22]. However, high-order feature values sometimes deviate more than expected. When we construct a d-order tensor for hypergraph matching, the absence of one pair influences d × (d − 1) × n^{d−2} feature values of a bag.

To prove this, let us consider a d-order tensor T_{i_1,i_2,...,i_d} and a specific node i. The bag for computing the feature vector v_i contains the elements of the d slices T_{:,...,:,i,:,...,:}. For each slice T_{:,...,:,i,:,...,:}, one of the remaining (d − 1) indexes indicates the corresponding node whose feature vector will be influenced. This leaves (d − 2) indexes, each free to take one of the n index values (n^{d−2} cases). With respect to the bag size n^{d−1}, the ratio of the changed elements to the non-changed elements in a bag is

d × (d − 1) / n

under one missing pair. When we are missing k pairs, the ratio becomes

k × d × (d − 1) / n = r × d × (d − 1),

Fig. 3 a The relationship between r and d when the rate of influence reaches 100 %. b The degree of influence (y-axis) versus the ratio (x-axis) of the number of feature values being changed

where r = k/n is defined as the rate of missing nodes. As the order d (>1) increases, the influenced elements of a bag increase exponentially. This influence leads to a low tolerance for missing nodes. Figure 3a shows the relationship between r and d when the rate of influence reaches 100 %. We can see that when the tensor order is >4, the missing rate cannot exceed 10 %.
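The influence ratio above can be evaluated directly (a one-line sketch; the function name is ours):

```python
# Fraction of a bag's feature values influenced for a d-order tensor
# when a fraction r = k/n of the node pairs is missing, following the
# ratio r * d * (d - 1) derived in the text.
def influence_ratio(d, r):
    return r * d * (d - 1)

# With order d = 5, a missing rate of only 5 % already influences the
# entire bag (the ratio reaches 1):
print(influence_ratio(5, 0.05))  # 1.0
```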

Increasing the order of the tensor leads to another difficulty: a decreasing degree of discernibility. When the number of elements in a bag is very large, the bags become harder to distinguish. To demonstrate this, we randomly produce k feature values and consider these values as the elements of a bag. This bag is then computed to the feature vector v_ori using Eq. (5). Given a missing rate of r, r × k feature values of the bag are randomly selected and replaced with new random values. A new bag is thus produced, from which a new feature vector v_new can be computed. Their dissimilarity is computed as dis(v_ori, v_new) using Eq. (13), and indicates the degree of discernibility. Figure 3b shows the degree of discernibility in relation to r. We see that when the number of elements is >1,000, the two bags cannot be distinguished, even if they are 30 % different. Such issues are easy to encounter in hypergraph matching: for example, if d = 3 and n > 32, then the number (32^2) of elements in the bag will be >1,000; if d = 4 and n > 10, then the number (10^3) of elements in the bag will also be >1,000.

Some of the above issues can be addressed. First, if only a few nodes are designed for hypergraph matching, the prop- agation of errors will be reduced. Second, to avoid the influ- ence of noise, the hypergraph matching can only be applied in a case where there are no missing nodes or data noise. Third, constructing a tensor that is very sparse helps to reduce the effects of error propagation. This can be achieved by apply- ing constraints as proposed by Duchenne et al. [22]. Finally, we can select a suitable tensor order according to the order of the relationship, and use the attributed hypergraph-matching method proposed in the next section to combine different orders of relationships.

**6 Attributed hypergraph matching**

Attributed hypergraph matching is defined as:

P* = arg min_P Σ_{k=1}^{m} w_k ||T_k − T'_{k,i_1,...,i_{d_k}} (Pe_{i_1}) ⊗ ··· ⊗ (Pe_{i_{d_k}})||,    (22)
where d_k refers to the order of the tensor relating to the d_k-ary relationship, and w_k is the weight describing the importance of the kth feature. If d_k = 1, then ||T − T'_{i_1} (Pe_{i_1})|| takes the form of ||U − PU'|| relating to Eq. (2). Similarly, if d_k = 2, then ||T − T'_{i_1,i_2} (Pe_{i_1}) ⊗ (Pe_{i_2})|| takes the form of ||A − P A' P^T|| relating to Eq. (1). As we have the solution for each part of Eq. (22) (a hypergraph matching), i.e.,

P* = arg min_P ||T_k − T'_{k,i_1,...,i_{d_k}} (Pe_{i_1}) ⊗ ··· ⊗ (Pe_{i_{d_k}})||,    (23)
we solve each such problem first and then combine all the solutions. Recall that we can first compute the similarity matrix S_k for Eq. (23). Then the combination is defined as:

S = Σ_{k=1}^{m} w_k S_k,    (24)

*and the final solution P*^{∗}*is derived from S using Eq. (2).*
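The weighted combination of Eq. (24) is a simple matrix sum; a sketch with illustrative per-order similarity matrices:

```python
import numpy as np

# Per-order similarity matrices S_k and weights w_k (illustrative):
S1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # e.g. from unary features
S2 = np.array([[0.7, 0.3], [0.4, 0.6]])   # e.g. from ternary relations
w = [0.5, 0.5]

# Eq. (24): combined similarity matrix for the final bipartite solve.
S = w[0] * S1 + w[1] * S2
print(np.allclose(S, [[0.8, 0.2], [0.3, 0.7]]))  # True
```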

However, previous research [19] shows that combining multiple solutions may increase the occurrence of problems due to degradation and bias if one of the error solutions dom- inates the result. A reason for this could be that the solutions are solved from different types of equations and instances,