Bridging the Gap between von Neumann Graph Entropy and Structural Information: Theory and Applications


Xuecheng Liu

Shanghai Jiao Tong University

Luoyi Fu

Xinbing Wang

ABSTRACT

The von Neumann graph entropy is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by networked data. However, it is computationally demanding and hard to interpret using simple structural patterns. Due to the close relation between the Laplacian spectrum and the degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might be a good approximation of the von Neumann graph entropy that is both scalable and interpretable.

In this work, we therefore study the difference between the structural information and the von Neumann graph entropy, which we name the entropy gap. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we prove for the first time that the entropy gap is between 0 and $\log_2 e$ in any undirected unweighted graph. Consequently, we certify that the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. We further study two entropy based applications which can benefit from the bounded entropy gap and structural information: network design and graph similarity measure. We combine a greedy method and a pruning strategy to develop a fast algorithm for the network design, and propose a novel graph similarity measure with a fast incremental algorithm for graph streams. Our experimental results on graphs of various scales and types show that the very small entropy gap readily applies to a wide range of graphs, including weighted graphs. As an approximation of the von Neumann graph entropy, the structural information is the only one that achieves both high efficiency and high accuracy among the prominent methods. It is at least two orders of magnitude faster than SLaQ [40] with comparable accuracy. Our structural information based methods also exhibit superior performance in two entropy based applications.

ACM Reference Format:

Xuecheng Liu, Luoyi Fu, and Xinbing Wang. 2021. Bridging the Gap between von Neumann Graph Entropy and Structural Information: Theory and Applications. In WWW ’21: Proceedings of The Web Conference 2021, April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/1122445.1122456


1 INTRODUCTION

Evidence has rapidly grown in the past few years that graphs are ubiquitous in our daily life; online social networks, metabolic networks, transportation networks, and collaboration networks are just a few examples that could be represented precisely by graphs.

One important issue in graph analysis is to measure the complexity of these graphs [4, 28], which refers to the level of organization of structural features such as the scaling behavior of the degree distribution, community structure, etc. In order to capture the inherent structural complexity of graphs, many entropy based graph measures [5, 13, 21, 28, 36, 37] have been proposed, each of which is a specific form of the Shannon entropy for different types of distributions extracted from the graphs.

As one of the aforementioned entropy based graph complexity measures, the von Neumann graph entropy, defined as the Shannon entropy of the spectrum of the trace rescaled Laplacian matrix of a graph (see Definition 3.1), is of special interest to scholars and practitioners [2, 7, 8, 12, 15, 22, 30, 40]. This spectral based entropy measure distinguishes between different graph structures. For instance, it is maximal for complete graphs, minimal for graphs with only a single edge, and takes on intermediate values for ring graphs. In fact, the entropy measure originates from quantum information theory, where it is used to describe the mixedness of a quantum system. Braunstein et al. were the first to use the von Neumann entropy to measure the complexity of graphs by viewing each pure state of a quantum system as one of the edges of a graph [5].

Built upon the Laplacian spectrum, the von Neumann graph entropy is a natural choice to capture the graph complexity, since the Laplacian spectrum is well known to contain rich information about the multi-scale structure of graphs [17, 20]. As a result, it has recently found applications in downstream tasks of complex network analysis and pattern recognition. For example, the von Neumann graph entropy facilitates the measure of graph similarity via Jensen-Shannon divergence, which could be used to compress multilayer networks [15] and detect anomalies in graph streams [7]. As another example, the von Neumann graph entropy could be used to measure edge centrality [30] and design entropy-driven networks [33].

1.1 Motivations

However, despite the popularity received in applications, the main obstacle encountered in practice is the computational inefficiency of the exact von Neumann graph entropy. Indeed, as a spectral based entropy measure, the von Neumann graph entropy suffers from computational inefficiency since the computational complexity of the graph spectrum is cubic in the number of nodes. Meanwhile, the existing approximation approaches [7, 8, 40] such as quadratic approximation fail to capture the presence of non-trivial structural patterns that seem to interpret the spectral based entropy measure. Therefore, there is a strong desire to find a good approximation that achieves accuracy, scalability, and interpretability simultaneously.

Figure 1: The close relation between Laplacian spectra and degree sequence in two representative real-world graphs: (a) Zachary’s karate club and (b) Dolphins. Both the Laplacian spectra and degree sequence are sorted in non-increasing order. The x-axis represents the index of the sorted sequences, and the y-axis represents the value of Laplacian spectrum and degree.

Instead of starting from scratch, we are inspired by the well-known fact that there is a close relationship between the combinatorial characteristics of a graph and the algebraic properties of its associated matrices [9]. To illustrate, we plot the Laplacian spectrum and degree sequence together in the same figure for two representative real-world graphs. As shown in Fig. 1, the sorted spectrum sequence and the sorted degree sequence almost coincide with each other. A similar phenomenon can also be observed in larger scale free graphs, which indicates that it is possible to reduce the approximation of the von Neumann graph entropy to the time-efficient computation of simple node degree statistics. Therefore, we ask without hesitation the first research question,

RQ1: Does there exist some non-polynomial function $\phi$ such that $\sum_{i=1}^{n} \phi\left(d_i / \sum_{j=1}^{n} d_j\right)$ is close to the von Neumann graph entropy? Here $d_i$ is the degree of node $i$ in a graph of order $n$.

We emphasize the non-polynomial property of the function $\phi$ since most previous works that are based on polynomial approximations fail to fulfill the interpretability requirement. The challenges from scalability and interpretability are translated directly into two requirements on the function $\phi$ to be determined. First, an explicit expression of $\phi$ must exist and be simple, to ensure the interpretability of the sum over degree statistics. Second, the function $\phi$ should be graph-agnostic to meet the scalability requirement, that is, $\phi$ should be independent of the graph to be analyzed. One natural choice for the non-polynomial function $\phi$, yielded by the entropy nature of the graph complexity measure, is $\phi(x) = -x \log_2 x$.

The sum $-\sum_{i=1}^{n} \frac{d_i}{\sum_{j=1}^{n} d_j} \log_2 \left( \frac{d_i}{\sum_{j=1}^{n} d_j} \right)$ has been named one-dimensional structural information by Li et al. [28] in a connected graph, since it has an entropy form and captures the information of a classic random walker in a graph. We extend this notion to arbitrary undirected graphs. Following the question RQ1, we raise the second research question,

RQ2: Is the structural information an accurate proxy of the von Neumann graph entropy?

To address the second question, we conduct, to our knowledge, the first study of the difference between structural information and von Neumann graph entropy, which we name the entropy gap.

1.2 Contributions

To study the entropy gap, we build on a fundamental relationship between the Laplacian spectrum $\lambda$ and the degree sequence $d$ in undirected graphs: $d$ is majorized by $\lambda$. In other words, there is a doubly stochastic matrix $P$ such that $P\lambda = d$. Leveraging the majorization and the classic Jensen’s inequality, we prove that the entropy gap is no less than 0 in arbitrary undirected graphs. By exploiting the Jensen’s gap [29], which is an inverse version of the classic Jensen’s inequality, we further prove that the entropy gap is no more than $\log_2 e$ in arbitrary unweighted undirected graphs. The constant lower and upper bounds on the entropy gap are further sharpened using more advanced knowledge about the Laplacian spectrum and degree sequence, such as the Grone-Merris majorization [1]. We also apply a similar technique to bound the entropy gap in weighted graphs.

In a nutshell, our paper makes the following contributions:

• Theory and interpretability: Inspired by the close relation between Laplacian spectrum and degree sequence, we for the first time bridge the gap between the von Neumann graph entropy and structural information by proving that the entropy gap is between 0 and $\log_2 e$ in any unweighted graph. To the best of our knowledge, the constant bounds on the approximation error in unweighted graphs are sharper than those of any existing approach with provable accuracy, such as FINGER [7]. Therefore, the answers to both RQ1 and RQ2 are YES! Besides, the structural information provides a simple geometric interpretation of the von Neumann graph entropy as a measure of degree heterogeneity. Thus, the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously.

• Applications and efficient algorithms: Using the structural information as a proxy of the von Neumann graph entropy with bounded error (entropy gap), we develop fast algorithms for two entropy based applications: network design and graph similarity measure. For the network design aiming to maximize the von Neumann entropy, we combine a greedy method and a pruning strategy to speed up the searching process. For the graph similarity measure, we propose a new distance measure based on structural information and Jensen-Shannon divergence. We further show that the proposed measure is a pseudometric and devise a fast incremental algorithm to compute the similarity between adjacent graphs in a graph stream.

• Extensive experiments and evaluations: We use 3 random graph models, 9 real-world static graphs, and 2 real-world temporal graphs to evaluate the properties of the entropy gap and the proposed algorithms. The results show that the entropy gap is small in a wide range of graphs, including weighted graphs, and that it is insensitive to changes in graph size. Compared with prominent methods for approximating the von Neumann graph entropy, the structural information is superior in both accuracy and computational speed. It is at least 2 orders of magnitude faster than the accurate SLaQ [40] algorithm with comparable accuracy. Our proposed algorithms based on structural information also exhibit superb performance in two entropy based applications.

Roadmap: The remainder of this paper is organized as follows. We review two related issues in Section 2. In Section 3 we introduce the definitions of the von Neumann graph entropy, structural information, and the notion of entropy gap. Section 4 shows the close relationship between von Neumann graph entropy and structural information by bounding the entropy gap. Section 5 presents efficient algorithms for two graph entropy based applications. Section 6 provides experimental results. Section 7 offers some conclusions and directions for future research.

Table 1: Comparison of methods for approximating the von Neumann graph entropy in terms of fulfilled (✓) and missing (✗) properties.

Property             [7]    [40]   [8]    Structural Information (Ours)
Provable accuracy    ✓      ✗      ✗      ✓
Scalability          ✓      ✓      ✗      ✓
Interpretability     ✗      ✗      ✗      ✓

2 RELATED WORK

We review two main issues arising from the broad applications [2, 6, 11, 15, 26, 30, 31, 33] of the von Neumann graph entropy: computation and interpretation.

Approximate computation of the von Neumann graph entropy: In an effort to overcome the computational inefficiency of the von Neumann graph entropy, past works have resorted to various numerical approximations. Chen et al. [7] first compute a quadratic approximation of the entropy via Taylor expansion, then derive two finer approximations with accuracy guarantees by spectrum-based and degree-based rescaling, respectively. Before Chen’s work, the Taylor expansion was widely adopted to give computationally efficient approximations [45], but without any theoretical guarantee on the approximation accuracy. Following Chen’s work, Choi et al. [8] propose several more complex quadratic approximations based on advanced polynomial approximation methods, whose superiority is verified through experiments.

Besides, there is a trend to approximate spectral sums using stochastic trace estimation based approximations [19], the merit of which is the provable error-bounded estimation of the spectral sums. For example, Kontopoulou et al. [22] propose three randomized algorithms based on Taylor series, Chebyshev polynomials, and random projection matrices to approximate the von Neumann entropy of density matrices. As another example, based on the stochastic Lanczos quadrature technique [41], Tsitsulin et al. [40] propose an efficient and effective approximation technique called SLaQ to estimate the von Neumann entropy and other spectral descriptors for web-scale graphs. However, the approximation error bound of SLaQ for the von Neumann graph entropy is not provided. The disadvantages of such stochastic approximations are also obvious; their computational efficiency depends on the number of random vectors used in stochastic trace estimation, and they are not suitable for applications like anomaly detection in graph streams and entropy-driven network design.

The comparison of methods for approximating the von Neumann graph entropy is presented in Table 1. One of the common drawbacks of the aforementioned methods is the lack of interpretability, that is, none of these methods provide enough evidence to interpret this spectral based entropy measure in terms of structural patterns. By contrast, as a good proxy of the von Neumann graph entropy, the structural information offers us the intuition that the spectral based entropy measure is closely related to the degree heterogeneity of graphs.

Spectral descriptor of graphs and its structural counterpart: Researchers in spectral graph theory have always been interested in establishing a connection between combinatorial characteristics of a graph and the algebraic properties of its associated matrices. For example, the algebraic connectivity (also known as Fiedler eigenvalue), defined as the second smallest eigenvalue of a graph Laplacian matrix, has been used to measure the robustness [20] and synchronizability [46] of graphs. The magnitude of the algebraic connectivity has also been found to reflect how well connected the overall graph is [17]. As another example, the Fiedler vector, defined as the eigenvector corresponding to the Fiedler eigenvalue of a graph Laplacian matrix, has been found to be a good sign of the bi-partition structure of a graph [14]. However, there are some other spectral descriptors that have found applications in graph analytics, but require more structural interpretations, such as the heat kernel trace [39, 44] and von Neumann graph entropy.

Simmons et al. [38] suggest interpreting the von Neumann graph entropy as the centralization of graphs, which is very similar to our interpretation using structural information. They derive both upper and lower bounds on the von Neumann graph entropy in terms of graph centralization under some hard assumptions on the range of the von Neumann graph entropy. Therefore, their results cannot be directly converted to accuracy guaranteed approximations of the von Neumann graph entropy for arbitrary simple graphs. By contrast, our work shows that the structural information is an accurate, scalable, and interpretable proxy of the von Neumann graph entropy for arbitrary simple graphs. Besides, the techniques used in our proof are also quite different from [38].

3 PRELIMINARIES

In this paper, we study the undirected graph $G = (V, E, A)$ with positive edge weights, where $V = \{1, \ldots, n\}$ is the node set, $E$ is the edge set, and $A \in \mathbb{R}_{+}^{n \times n}$ is the symmetric weight matrix with positive entry $A_{ij}$ denoting the weight of an edge $(i, j) \in E$. If the node pair $(i, j) \notin E$, then $A_{ij} = 0$. If graph $G$ is unweighted, the weight matrix $A \in \{0, 1\}^{n \times n}$ is called the adjacency matrix of $G$. The degree of node $i \in V$ in graph $G$ is defined as $d_i = \sum_{j=1}^{n} A_{ij}$. The Laplacian matrix of graph $G$ is defined as $L = D - A$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$ is the degree matrix. Let $\{\lambda_i\}_{i=1}^{n}$ be the sorted eigenvalues of $L$ such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n = 0$, which is called the Laplacian spectrum. We define $\mathrm{vol}(G) = \sum_{i=1}^{n} d_i$ as the volume of graph $G$; then $\mathrm{vol}(G) = \mathrm{tr}(L) = \sum_{i=1}^{n} \lambda_i$, where $\mathrm{tr}(\cdot)$ is the trace operator. For convenience, we define a special function $f(x) \triangleq x \log_2 x$ on the support $[0, \infty)$, where $f(0) \triangleq \lim_{x \downarrow 0} f(x) = 0$ by convention. In the following, we present formal definitions of the von Neumann graph entropy, structural information, and the entropy gap. Slightly different from the one-dimensional structural information proposed by Li et al. [28], our definition of structural information does not require the graph $G$ to be connected.

Definition 3.1 (von Neumann graph entropy). The von Neumann graph entropy of an undirected graph $G = (V, E, A)$ is defined as $H_{\mathrm{vn}}(G) = -\sum_{i=1}^{n} f(\lambda_i / \mathrm{vol}(G))$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n = 0$ are the eigenvalues of the Laplacian matrix $L = D - A$ of the graph $G$, and $\mathrm{vol}(G) = \sum_{i=1}^{n} \lambda_i$ is the volume of $G$.

Definition 3.2 (Structural information). The structural information of an undirected graph $G = (V, E, A)$ is defined as $H^{1}(G) = -\sum_{i=1}^{n} f(d_i / \mathrm{vol}(G))$, where $d_i$ is the degree of node $i$ in $G$ and $\mathrm{vol}(G) = \sum_{i=1}^{n} d_i$ is the volume of $G$.

Definition 3.3 (Entropy gap). The entropy gap of an undirected graph $G = (V, E, A)$ is defined as $\Delta H(G) = H^{1}(G) - H_{\mathrm{vn}}(G)$.

The von Neumann graph entropy and structural information are well-defined for all undirected graphs except for graphs with an empty edge set, in which $\mathrm{vol}(G) = 0$. When $E = \emptyset$, we take it for granted that $H^{1}(G) = H_{\mathrm{vn}}(G) = 0$.
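Definitions 3.1 to 3.3 can be illustrated with a minimal Python sketch for small graphs given as dense adjacency (weight) matrices. The function names and the cyclic Jacobi eigenvalue routine are assumptions introduced here to keep the example free of external dependencies; they are not the authors' implementation, and the exact spectrum computation is only practical for small graphs.

```python
import math


def f(x):
    """f(x) = x * log2(x), with f(0) = 0 by convention."""
    return x * math.log2(x) if x > 0 else 0.0


def laplacian(adj):
    """Laplacian L = D - A of a symmetric weight matrix A (list of lists)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the entry a[p][q].
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def structural_information(adj):
    """H^1(G): Shannon entropy of the normalized degree sequence."""
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    if vol == 0:  # empty edge set: entropy is 0 by convention
        return 0.0
    return -sum(f(d / vol) for d in deg)


def von_neumann_entropy(adj):
    """H_vn(G): Shannon entropy of the trace-rescaled Laplacian spectrum."""
    vol = sum(sum(row) for row in adj)
    if vol == 0:
        return 0.0
    spectrum = jacobi_eigenvalues(laplacian(adj))
    return -sum(f(lam / vol) for lam in spectrum)


def entropy_gap(adj):
    """Delta H(G) = H^1(G) - H_vn(G)."""
    return structural_information(adj) - von_neumann_entropy(adj)
```

For example, calling `entropy_gap` on the adjacency matrix of a small star or path graph gives a value that can be compared against the bounds derived in Section 4.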

4 APPROXIMATION ERROR ANALYSIS

In this section we bound the entropy gap in the undirected graphs of order $n$. Since the nodes with degree 0 have no contribution to structural information and von Neumann graph entropy, without loss of generality we assume that $d_i > 0$ for any node $i \in V$.

4.1 Bounds on the Approximation Error

We first provide the additive approximation errors in Theorem 4.1, Corollary 4.5, and Corollary 4.6, then obtain the multiplicative approximation error in Theorem 4.7.

Theorem 4.1 (Bounds on the absolute approximation error). For any undirected graph $G = (V, E, A)$, the inequality

\[ 0 \le \Delta H(G) \le \frac{\log_2 e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)} \tag{1} \]

holds, where $\delta = \min\{d_i \mid d_i > 0\}$ is the minimum positive degree.

Before proving Theorem 4.1, we introduce two techniques: majorization and Jensen’s gap. The former is a preorder on vectors of reals, while the latter is an inverse version of Jensen’s inequality; their definitions are presented as follows.

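Definition 4.2 (Majorization [32]). For a vector $x \in \mathbb{R}^d$, we denote by $x^{\downarrow} \in \mathbb{R}^d$ the vector with the same components, but sorted in descending order. Given $x, y \in \mathbb{R}^d$, we say that $x$ majorizes $y$ (written as $x \succ y$) if and only if $\sum_{i=1}^{k} x_i^{\downarrow} \ge \sum_{i=1}^{k} y_i^{\downarrow}$ for $k = 1, \ldots, d$ and $x^{\top} \mathbf{1} = y^{\top} \mathbf{1}$.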
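Definition 4.2 translates directly into a prefix-sum check. The sketch below is illustrative only: the function name `majorizes` and the numerical tolerance are assumptions introduced here. The usage example relies on the standard closed-form Laplacian spectrum of the star graph with 5 nodes, (5, 1, 1, 1, 0), whose degree sequence is (4, 1, 1, 1, 1).

```python
def majorizes(x, y, tol=1e-9):
    """Return True if x majorizes y (x ≻ y) in the sense of Definition 4.2."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)  # x sorted in descending order
    ys = sorted(y, reverse=True)  # y sorted in descending order
    if abs(sum(xs) - sum(ys)) > tol:  # the totals must agree
        return False
    prefix_x = prefix_y = 0.0
    for a, b in zip(xs, ys):
        prefix_x += a
        prefix_y += b
        if prefix_x < prefix_y - tol:  # a prefix sum of x falls below that of y
            return False
    return True


# Star graph on 5 nodes: Laplacian spectrum versus degree sequence.
star_spectrum = [5, 1, 1, 1, 0]
star_degrees = [4, 1, 1, 1, 1]
spectrum_majorizes_degrees = majorizes(star_spectrum, star_degrees)
```

Here the spectrum majorizes the degree sequence but not the other way round, since the largest eigenvalue 5 exceeds the largest degree 4.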

Lemma 4.3 (Jensen’s gap [29]). Let $X$ be a one-dimensional random variable with mean $\mu$ and support $\Omega$. Let $\psi(x)$ be a twice differentiable function on $\Omega$ and define the function $h(x) = \frac{\psi(x) - \psi(\mu)}{(x - \mu)^2} - \frac{\psi'(\mu)}{x - \mu}$; then $\mathbb{E}[\psi(X)] - \psi(\mathbb{E}[X]) \le \sup_{x \in \Omega}\{h(x)\} \cdot \mathrm{var}(X)$. Additionally, if $\psi'(x)$ is convex, then $h(x)$ is monotonically increasing in $x$, and if $\psi'(x)$ is concave, then $h(x)$ is monotonically decreasing in $x$.

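Lemma 4.4. The function $f(x) = x \log_2 x$ is convex, and its first order derivative $f'(x) = \log_2 x + \log_2 e$ is concave.

Proof. The second order derivative $f''(x) = (\log_2 e)/x > 0$ for $x > 0$, thus $f(x) = x \log_2 x$ is convex. The third order derivative $f'''(x) = -(\log_2 e)/x^2 < 0$, thus $f'(x)$ is concave. □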

We can see that the majorization characterizes the degree of concentration between two vectors: $x \succ y$ means that the entries of $y$ are more concentrated on its mean $y^{\top}\mathbf{1} / \mathbf{1}^{\top}\mathbf{1}$ than the entries of $x$. An equivalent definition of the majorization [32] using linear algebra says that $x \succ y$ if and only if there exists a doubly stochastic matrix $P$ such that $Px = y$. As a famous example of the majorization, the Schur-Horn theorem [32] says that the diagonal elements of a positive semidefinite Hermitian matrix are majorized by its eigenvalues. Since $x^{\top} L x = \sum_{(i, j) \in E} A_{ij} (x_i - x_j)^2 \ge 0$ for any vector $x \in \mathbb{R}^n$, the Laplacian matrix $L$ is a positive semidefinite symmetric matrix whose diagonal elements form the degree sequence $d$ and whose eigenvalues form the spectrum $\lambda$. Therefore, $\lambda \succ d$, implying that there exists some doubly stochastic matrix $P = (p_{ij}) \in [0, 1]^{n \times n}$ such that $P\lambda = d$.
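One standard way to exhibit such a matrix $P$ explicitly is through the eigendecomposition of $L$. The following worked step is a supplementary remark rather than part of the original argument; the orthogonal matrix $U = (u_{ij})$ is notation introduced here.

```latex
% Eigendecomposition of the symmetric Laplacian with orthogonal U:
L = U \,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, U^{\top},
\qquad U^{\top} U = U U^{\top} = I .

% Reading off the i-th diagonal entry of L gives the degree of node i:
d_i = L_{ii} = \sum_{j=1}^{n} u_{ij}^{2} \, \lambda_j .

% Hence p_{ij} = u_{ij}^2 satisfies P \lambda = d, and P is doubly stochastic
% because the rows and the columns of U are unit vectors:
p_{ij} = u_{ij}^{2} \ge 0, \qquad
\sum_{j=1}^{n} u_{ij}^{2} = 1, \qquad
\sum_{i=1}^{n} u_{ij}^{2} = 1 .
```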

Using the fact thatPλ = d and the convexity of f (x) in Lemma 4.4, we can now proceed to prove Theorem 4.1.

Proof of Theorem 4.1. For each $i \in V$, we define a discrete random variable $X_i$ with probability mass function $\sum_{j=1}^{n} p_{ij} \delta_{\lambda_j}(x)$, where $\delta_a(x)$ is the Kronecker delta function. Then the expectation is $\mathbb{E}[X_i] = \sum_{j=1}^{n} p_{ij} \lambda_j = d_i$ and the variance is $\mathrm{var}(X_i) = \sum_{j=1}^{n} p_{ij} (\lambda_j - d_i)^2 = \sum_{j=1}^{n} p_{ij} \lambda_j^2 - d_i^2$.

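First, we express the entropy gap in terms of the Laplacian spectrum and the degree sequence. Since

\[
\begin{aligned}
H^{1}(G) &= -\sum_{i=1}^{n} \frac{d_i}{\mathrm{vol}(G)} \log_2 \frac{d_i}{\mathrm{vol}(G)} \\
&= -\frac{1}{\mathrm{vol}(G)} \left( \sum_{i=1}^{n} f(d_i) - \sum_{i=1}^{n} d_i \log_2(\mathrm{vol}(G)) \right) \\
&= \log_2(\mathrm{vol}(G)) - \frac{\sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)},
\end{aligned}
\tag{2}
\]

and similarly $H_{\mathrm{vn}}(G) = \log_2(\mathrm{vol}(G)) - \frac{\sum_{i=1}^{n} f(\lambda_i)}{\mathrm{vol}(G)}$, we have

\[ \Delta H(G) = H^{1}(G) - H_{\mathrm{vn}}(G) = \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)}. \tag{3} \]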

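Second, we use Jensen’s inequality to prove $\Delta H(G) \ge 0$. Since $f(x)$ is convex, $f(d_i) = f(\mathbb{E}[X_i]) \le \mathbb{E}[f(X_i)]$ for any $i \in \{1, \ldots, n\}$. By summing over $i$, we have

\[ \sum_{i=1}^{n} f(d_i) \le \sum_{i=1}^{n} \mathbb{E}[f(X_i)] = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij} f(\lambda_j) = \sum_{j=1}^{n} f(\lambda_j), \]

where the last equality holds because each column of the doubly stochastic matrix $P$ sums to 1. Therefore, $\Delta H(G) \ge 0$ for any undirected graph.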

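Finally, we use Jensen’s gap to prove $\Delta H(G) \le \frac{\log_2 e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)}$. Applying the Jensen’s gap to $X_i$ and $f(x)$ gives

\[ \mathbb{E}[f(X_i)] - f(\mathbb{E}[X_i]) \le \sup_{x \in [0, \mathrm{vol}(G)]} \{h_i(x)\} \cdot \mathrm{var}(X_i), \tag{4} \]

where

\[ h_i(x) = \frac{f(x) - f(\mathbb{E}[X_i])}{(x - \mathbb{E}[X_i])^2} - \frac{f'(\mathbb{E}[X_i])}{x - \mathbb{E}[X_i]}. \]

Since $f'(x)$ is concave, $h_i(x)$ is monotonically decreasing in $x$. Therefore, $\sup_{x \in [0, \mathrm{vol}(G)]} \{h_i(x)\} = h_i(0)$. Since

\[ h_i(0) = \frac{f(0) - f(d_i)}{d_i^2} + \frac{f'(d_i)}{d_i} = \frac{\log_2 e}{d_i} \le \frac{\log_2 e}{\delta}, \]

the inequality in (4) can be simplified as

\[ \sum_{j=1}^{n} p_{ij} f(\lambda_j) - f(d_i) \le \frac{\log_2 e}{\delta} \cdot \left( \sum_{j=1}^{n} p_{ij} \lambda_j^2 - d_i^2 \right). \tag{5} \]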

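By summing both sides of the inequality (5) over $i$, we get an upper bound $UB$ on $\sum_{j=1}^{n} f(\lambda_j) - \sum_{i=1}^{n} f(d_i)$ as

\[
\begin{aligned}
UB &= \frac{\log_2 e}{\delta} \cdot \sum_{i=1}^{n} \left( \sum_{j=1}^{n} p_{ij} \lambda_j^2 - d_i^2 \right)
= \frac{\log_2 e}{\delta} \cdot \left( \sum_{j=1}^{n} \lambda_j^2 - \sum_{i=1}^{n} d_i^2 \right) \\
&= \frac{\log_2 e}{\delta} \cdot \left( \mathrm{tr}(L^2) - \mathrm{tr}(D^2) \right)
= \frac{\log_2 e}{\delta} \cdot \left( \mathrm{tr}(A^2) - \mathrm{tr}(AD) - \mathrm{tr}(DA) \right)
= \frac{\log_2 e}{\delta} \cdot \mathrm{tr}(A^2).
\end{aligned}
\]

As a result, $\Delta H(G) = \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)} \le \frac{\log_2 e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)}$. □

To illustrate the tightness of the bounds in Theorem 4.1, we further derive bounds on the entropy gap for unweighted graphs, especially the regular graphs. Via multiplicative error analysis, we show that the structural information converges to the von Neumann graph entropy as graph size grows.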
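As a concrete instance of Theorem 4.1, consider the unweighted complete graph $K_n$ with $n \ge 2$. The worked computation below is an illustration added here; it uses only the standard facts that every node of $K_n$ has degree $n - 1$ and that its Laplacian spectrum is $(n, \ldots, n, 0)$.

```latex
% Complete graph K_n: d_i = n - 1, vol(K_n) = n(n-1), spectrum (n, ..., n, 0).
H^{1}(K_n) = \log_2 n, \qquad
H_{\mathrm{vn}}(K_n) = \log_2\bigl(n(n-1)\bigr) - \frac{(n-1)\, n \log_2 n}{n(n-1)} = \log_2 (n-1),

\Delta H(K_n) = \log_2 \frac{n}{n-1} .

% Right-hand side of (1), with delta = n - 1 and tr(A^2) = vol(K_n) = n(n-1):
\frac{\log_2 e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(K_n)} = \frac{\log_2 e}{n-1} .

% The bound holds because ln(1 + x) <= x for x >= 0:
\log_2 \frac{n}{n-1} = \log_2 e \cdot \ln\Bigl(1 + \frac{1}{n-1}\Bigr) \le \frac{\log_2 e}{n-1} .
```

Since $\ln(1 + x)/x \to 1$ as $x \to 0$, the ratio between the gap and the bound approaches 1 on $K_n$ as $n$ grows.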

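Corollary 4.5 (Constant bounds on the entropy gap). For any unweighted, undirected graph $G$, $0 \le \Delta H(G) \le \log_2 e$ holds.

Proof. In an unweighted graph $G$, $\mathrm{tr}(A^2) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} A_{ji} = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} = \sum_{i=1}^{n} d_i = \mathrm{vol}(G)$ and $\delta \ge 1$, therefore $0 \le \Delta H(G) \le \frac{\log_2 e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)} = \frac{\log_2 e}{\delta} \le \log_2 e$. □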
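The identity $\mathrm{tr}(A^2) = \mathrm{vol}(G)$ used in this proof, and the resulting value of the bound in Theorem 4.1, can be checked numerically. The sketch below is illustrative; the helper names are assumptions introduced here, and graphs are given as dense adjacency matrices.

```python
import math


def trace_of_square(adj):
    """tr(A^2) = sum over i, j of A_ij * A_ji."""
    n = len(adj)
    return sum(adj[i][j] * adj[j][i] for i in range(n) for j in range(n))


def volume(adj):
    """vol(G) = sum of all degrees."""
    return sum(sum(row) for row in adj)


def theorem_4_1_upper_bound(adj):
    """Right-hand side of (1): (log2(e) / delta) * tr(A^2) / vol(G)."""
    degrees = [sum(row) for row in adj]
    delta = min(d for d in degrees if d > 0)  # minimum positive degree
    return math.log2(math.e) / delta * trace_of_square(adj) / volume(adj)


# Unweighted example: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
triangle_with_tail = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
```

For this unweighted example, `trace_of_square` and `volume` agree, and since the minimum positive degree is 1 the bound reduces to $\log_2 e$.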

Corollary 4.6 (Entropy gap of regular graphs). For any unweighted, undirected, regular graph $G_d$ of degree $d$, the inequality $0 \le \Delta H(G_d) \le \frac{\log_2 e}{d}$ holds.

Proof sketch. In any unweighted, regular graph $G_d$, $\delta = d$. □

Theorem 4.7 (Convergence of the multiplicative approximation error). For almost all unweighted graphs $G$ of order $n$, $\frac{H^{1}(G)}{H_{\mathrm{vn}}(G)} - 1 \ge 0$ and decays to 0 at the rate of $o(1/\log_2(n))$.

Proof. Dairyko et al. [10] proved that for almost all unweighted graphs $G$ of order $n$, $H_{\mathrm{vn}}(G) \ge H_{\mathrm{vn}}(K_{1,n-1})$, where $K_{1,n-1}$ stands for the star graph. Since $H_{\mathrm{vn}}(K_{1,n-1}) = \log_2(2n - 2) - \frac{n}{2n - 2} \log_2 n = 1 + \frac{1}{2} \log_2 n + o(1)$, we have $\frac{H^{1}(G)}{H_{\mathrm{vn}}(G)} - 1 = \frac{\Delta H(G)}{H_{\mathrm{vn}}(G)} \le \frac{\log_2 e}{H_{\mathrm{vn}}(K_{1,n-1})} = o\left(\frac{1}{\log_2 n}\right)$. □
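The quantities in this proof can be tabulated for the star graph itself. The sketch below is illustrative and uses the standard closed-form Laplacian spectrum of $K_{1,n-1}$, namely $(n, 1, \ldots, 1, 0)$, so no eigenvalue solver is needed; the function names are assumptions introduced here.

```python
import math


def f(x):
    """f(x) = x * log2(x), with f(0) = 0 by convention."""
    return x * math.log2(x) if x > 0 else 0.0


def star_entropies(n):
    """Return (H^1, H_vn) of the star K_{1,n-1} for n >= 3."""
    vol = 2 * (n - 1)
    # Degrees: one hub of degree n-1 and n-1 leaves of degree 1 (f(1) = 0).
    h1 = math.log2(vol) - f(n - 1) / vol
    # Spectrum: n once, 1 with multiplicity n-2, and 0 once (f(1) = f(0) = 0).
    hvn = math.log2(vol) - f(n) / vol
    return h1, hvn


def multiplicative_error(n):
    """H^1 / H_vn - 1 on the star graph."""
    h1, hvn = star_entropies(n)
    return h1 / hvn - 1.0


def proof_upper_bound(n):
    """log2(e) / H_vn(K_{1,n-1}), the bound used in the proof of Theorem 4.7."""
    return math.log2(math.e) / star_entropies(n)[1]


for n in (10, 100, 1000, 10**6):
    print(n, multiplicative_error(n), proof_upper_bound(n))
```

Both printed columns shrink as $n$ grows, and the bound from the proof shrinks roughly like the reciprocal of $1 + \frac{1}{2} \log_2 n$.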

4.2 Sharpened Bounds on the Entropy Gap

Though the constant bounds on the entropy gap are tight enough for applications, we can still sharpen the bounds on the entropy gap in unweighted graphs using more advanced majorizations.

Theorem 4.8 (Sharpened lower bound on entropy gap). For any unweighted, undirected graph $G$, $\Delta H(G)$ is lower bounded by $\left( f(d_{\max} + 1) - f(d_{\max}) + f(\delta - 1) - f(\delta) \right) / \mathrm{vol}(G)$, where $d_{\max}$ is the maximum degree and $\delta$ is the minimum positive degree.

Proof. The proof is based on the advanced majorization [18]: $\lambda \succ (d_1 + 1, d_2, \ldots, d_{n-1}, d_n - 1)$ holds on any unweighted, undirected graph $G$, where $d_1 \ge d_2 \ge \cdots \ge d_n$ is the sorted degree sequence of $G$. Similar to the proof of Theorem 4.1, we have $\sum_{i=1}^{n} f(\lambda_i) \ge f(d_1 + 1) + f(d_n - 1) + \sum_{i=2}^{n-1} f(d_i)$. Then the sharpened lower bound follows from equation (3) since $d_1 = d_{\max}$ and $d_n = \delta$. □

Theorem 4.9 (Sharpened upper bound on entropy gap). For any unweighted, undirected graph $G = (V, E)$, $\Delta H(G)$ is upper bounded by $\min\{\log_2 e, b_1, b_2\}$, where $b_1 = \frac{\sum_{i=1}^{n} f(d_i^{*}) - \sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)}$ and $b_2 = \log_2\left(1 + \frac{\sum_{i=1}^{n} d_i^2}{\mathrm{vol}(G)}\right) - \frac{\sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)}$. Here $(d_1^{*}, \ldots, d_n^{*})$ is the conjugate degree sequence of $G$, where $d_k^{*} = |\{i \mid d_i \ge k\}|$.

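Proof. We first prove $\Delta H(G) \le b_1$ using the Grone-Merris majorization [1]: $(d_1^{*}, \ldots, d_n^{*}) \succ \lambda$. Similar to the proof of Theorem 4.1, we have $\sum_{i=1}^{n} f(d_i^{*}) \ge \sum_{i=1}^{n} f(\lambda_i)$, thus $b_1 \ge \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)} = \Delta H(G)$. We then prove $\Delta H(G) \le b_2$. Since

\[ \frac{\sum_{i=1}^{n} f(\lambda_i)}{\mathrm{vol}(G)} = \sum_{i=1}^{n} \left( \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j} \right) \log_2 \lambda_i \le \log_2 \left( \frac{\sum_{i=1}^{n} \lambda_i^2}{\sum_{j=1}^{n} \lambda_j} \right) \]

and $\frac{\sum_{i=1}^{n} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i} = \frac{\mathrm{tr}(L^2)}{\mathrm{vol}(G)} = 1 + \frac{\sum_{i=1}^{n} d_i^2}{\mathrm{vol}(G)}$, we have $\Delta H(G) = \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\mathrm{vol}(G)} \le b_2$. □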
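The sharpened bounds of Theorems 4.8 and 4.9 depend only on the degree sequence, so they are cheap to evaluate. The sketch below is illustrative: the function names are assumptions introduced here, and the exact gap is computed from equation (3) with a spectrum supplied by the caller. The usage examples rely on the standard closed-form Laplacian spectra of the star on 5 nodes, (5, 1, 1, 1, 0), and the cycle on 6 nodes, (4, 3, 3, 1, 1, 0).

```python
import math


def f(x):
    """f(x) = x * log2(x), with f(0) = 0 by convention."""
    return x * math.log2(x) if x > 0 else 0.0


def conjugate_degree_sequence(degrees):
    """d*_k = |{i : d_i >= k}| for k = 1, ..., n."""
    n = len(degrees)
    return [sum(1 for d in degrees if d >= k) for k in range(1, n + 1)]


def entropy_gap(spectrum, degrees):
    """Equation (3): (sum f(lambda_i) - sum f(d_i)) / vol(G)."""
    vol = sum(degrees)
    return (sum(f(l) for l in spectrum) - sum(f(d) for d in degrees)) / vol


def sharpened_lower_bound(degrees):
    """Theorem 4.8, for an unweighted graph with at least one edge."""
    positive = [d for d in degrees if d > 0]
    d_max, delta = max(positive), min(positive)
    return (f(d_max + 1) - f(d_max) + f(delta - 1) - f(delta)) / sum(degrees)


def sharpened_upper_bound(degrees):
    """Theorem 4.9: min{log2(e), b1, b2}."""
    vol = sum(degrees)
    sum_f_d = sum(f(d) for d in degrees)
    b1 = (sum(f(d) for d in conjugate_degree_sequence(degrees)) - sum_f_d) / vol
    b2 = math.log2(1 + sum(d * d for d in degrees) / vol) - sum_f_d / vol
    return min(math.log2(math.e), b1, b2)


# Star on 5 nodes and cycle on 6 nodes (closed-form spectra).
examples = {
    "star": ([5, 1, 1, 1, 0], [4, 1, 1, 1, 1]),
    "cycle": ([4, 3, 3, 1, 1, 0], [2, 2, 2, 2, 2, 2]),
}
for name, (spectrum, degrees) in examples.items():
    print(name, sharpened_lower_bound(degrees),
          entropy_gap(spectrum, degrees), sharpened_upper_bound(degrees))
```

On the star, the conjugate degree sequence coincides with the spectrum, so the lower bound, the gap, and the upper bound all agree up to rounding.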

5 APPLICATIONS AND ALGORITHMS

As a measure of the structural complexity of a graph, the von Neumann entropy has been applied in a variety of applications. For example, the von Neumann graph entropy is exploited to measure the importance of an edge [30]. As another example, the von Neumann graph entropy can also be used to measure the distance between graphs for graph classification and anomaly detection [2, 7]. In addition, the von Neumann graph entropy is used in the context of network embedding [11] to learn low-dimensional feature representations of nodes. We observe that, in these applications, the von Neumann graph entropy is used to address the following primitive tasks:

• Entropy-based network design: Change the existing graph to a new graph such that the entropy requirement is attained with minimal perturbations on the existing graph. For example, Minello et al. [33] use the von Neumann entropy to explore the potential network growth model via experiments.

• Graph similarity measure: Compute a real positive number to reveal the similarity between two graphs. For example, Domenico et al. [15] use the von Neumann graph entropy to compute the Jensen-Shannon distance between graphs for the purpose of compressing multilayer networks.

Resolving both tasks exactly requires computing the von Neumann graph entropy. To reduce the computational cost and preserve the interpretability, we can use the accurate proxy, structural information, to approximately solve these tasks.

5.1 Entropy-based Network Design

Network design aims to minimally perturb the network to fulfill some goals. We consider the goal of maximizing the von Neumann entropy of a graph, which helps to understand how different structural patterns influence the entropy value. The entropy-based network design problem is formulated as follows,

Problem 1 (MaxEntropy). Given an unweighted, undirected graph $G = (V, E)$ of order $n$ and an integer budget $k$, find a set $F$ of non-existing edges of $G$ whose addition to $G$ creates the largest increase of the von Neumann graph entropy and $|F| \le k$.

Due to the spectral nature of the von Neumann graph entropy, it is not easy to find an effective strategy to perturb the graph, especially in the scenario where there is an exponential number of combinations for the subset $F$. If we use the structural information as a proxy of the von Neumann entropy, Problem 1 reduces to maximizing $H^{1}(G')$, where $G' = (V, E \cup F)$ such that $|F| \le k$. To further alleviate the computational pressure rooted in the exponential size of the search space for $F$, we adopt the greedy method in which the new edges are added one by one until either the structural information attains its maximum value $\log_2 n$ or $k$ new edges have already been added. We denote the graph with $l$ new edges as $G_l = (V, E_l)$; then $G_0 = G$. Now suppose that we have $G_l$ whose structural information is less than $\log_2 n$; then we want to find a new edge $e_{l+1} = (u, v)$ such that $H^{1}(G_{l+1})$ is maximized, where $G_{l+1} = (V, E_l \cup \{e_{l+1}\})$. Since $H^{1}(G_{l+1})$ can be rewritten as

\[ \log_2(2|E_l| + 2) - \frac{1}{2|E_l| + 2} \left( f(d_u + 1) + f(d_v + 1) + \sum_{i \ne u, v} f(d_i) \right), \]

and the term in parentheses equals $\sum_{i=1}^{n} f(d_i) + f(d_u + 1) - f(d_u) + f(d_v + 1) - f(d_v)$, whose first sum does not depend on the choice of $(u, v)$, the edge $e_{l+1}$ maximizing $H^{1}(G_{l+1})$ should also minimize the edge centrality $EC(u, v) = f(d_u + 1) - f(d_u) + f(d_v + 1) - f(d_v)$, where $d_i$ is the degree of node $i$ in $G_l$.

We present the pseudocode of our fast algorithm EntropyAug in Algorithm 1, which leverages the pruning strategy to accelerate the process of finding a single new edge that creates the largest increase of the von Neumann entropy. EntropyAug starts by initializing an empty set $F$ used to contain the node pairs to be found and an entropy value $H$ used to record the maximum structural information in the graph evolution process (line 1). In each following iteration, it sorts the set of nodes $V$ in non-decreasing degree order (line 3). Note that the edge centrality $EC(u, v)$ has a nice monotonic property: $EC(u_1, v_1) \le EC(u_2, v_2)$ if $\min\{d_{u_1}, d_{v_1}\} \le \min\{d_{u_2}, d_{v_2}\}$ and $\max\{d_{u_1}, d_{v_1}\} \le \max\{d_{u_2}, d_{v_2}\}$. With the sorted list of nodes $V_s$, the monotonicity of $EC(u, v)$ can be translated into $EC(V_s[i_1], V_s[j_1]) \le EC(V_s[i_2], V_s[j_2])$ if the indices satisfy $i_1 < j_1$, $i_2 < j_2$, $i_1 < i_2$, and $j_1 < j_2$. Thus, using the two pointers {head, tail} and a threshold $T$, it can prune the search space and find the desired non-adjacent node pair as fast as possible (lines 4-12). It then adds the non-adjacent node pair minimizing $EC(u, v)$ into $F$ and updates the graph $G$ (line 13). The structural information of the updated graph is computed to determine whether $F$ is the optimal subset up to the current iteration (lines 14-15).
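The greedy procedure with two-pointer pruning described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: nodes are labeled 0 to n-1, the names `entropy_aug` and `edge_centrality` are introduced here, and the early exit on a complete graph and the tolerance-based comparison with $\log_2 n$ are safeguards added for the sketch.

```python
import math


def f(x):
    """f(x) = x * log2(x), with f(0) = 0 by convention."""
    return x * math.log2(x) if x > 0 else 0.0


def structural_information(degrees):
    """H^1 of a graph with the given degree sequence."""
    vol = sum(degrees)
    if vol == 0:
        return 0.0
    return -sum(f(d / vol) for d in degrees)


def edge_centrality(du, dv):
    """EC(u, v) = f(du + 1) - f(du) + f(dv + 1) - f(dv)."""
    return f(du + 1) - f(du) + f(dv + 1) - f(dv)


def entropy_aug(n, edges, k):
    """Greedily add up to k edges to raise the structural information."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    added, best, best_h = [], [], 0.0
    while len(added) < k:
        deg = [len(a) for a in adj]
        order = sorted(range(n), key=lambda i: deg[i])  # non-decreasing degree
        head, tail, threshold, pick = 0, n - 1, math.inf, None
        while head < tail:
            for i in range(head + 1, tail + 1):
                u, v = order[head], order[i]
                ec = edge_centrality(deg[u], deg[v])
                if ec >= threshold:  # no later pair for this head can improve
                    tail = i - 1
                    break
                if v not in adj[u]:  # first non-adjacent pair for this head
                    pick, threshold = (u, v), ec
                    tail = i - 1
                    break
            head += 1
        if pick is None:  # the graph is already complete
            break
        u, v = pick
        adj[u].add(v)
        adj[v].add(u)
        added.append(pick)
        h = structural_information([len(a) for a in adj])
        if h > best_h:  # remember the best prefix of added edges
            best_h, best = h, list(added)
        if math.isclose(best_h, math.log2(n)):  # maximum value log2(n) reached
            break
    return best
```

For instance, on the path 0-1-2-3 with budget 1, the sketch connects the two endpoints, which turns the path into a 4-cycle whose structural information attains $\log_2 4$.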

5.2 Graph Similarity Measure

Entropy based graph similarity measure aims to compare graphs using Jensen-Shannon divergence. The Jensen-Shannon divergence, as a symmetrized and smoothed version of the Kullback-Leibler divergence, is defined formally in the following Definition 5.1.

Algorithm 1: EntropyAug
Input: The graph G = (V, E) of order n, the budget k
Output: A set of node pairs
 1  F ← ∅, H ← 0;
 2  while |F| < k do
 3      V_s: list ← sort V in non-decreasing degree order;
 4      head ← 0, tail ← |V_s| − 1, T ← +∞;
 5      while head < tail do
 6          for i = head + 1, head + 2, ..., tail do
 7              if EC(V_s[head], V_s[i]) ≥ T then
 8                  tail ← i − 1; break;
 9              if (V_s[head], V_s[i]) ∉ E then
10                  u ← V_s[head], v ← V_s[i], T ← EC(u, v);
11                  tail ← i − 1; break;
12          head ← head + 1;
13      E ← E ∪ {(u, v)}, F ← F ∪ {(u, v)};
14      if H_1(G) > H then H ← H_1(G), F* ← F;
15      if H = log_2 n then break;
16  return F*.

Definition 5.1 (Jensen-Shannon divergence). Let $P$ and $Q$ be two probability distributions on the same support set $\Omega_N = \{1, \ldots, N\}$. The Jensen-Shannon divergence between $P$ and $Q$ is defined as

$$D_{JS}(P, Q) = H((P + Q)/2) - H(P)/2 - H(Q)/2,$$

where $H(P) = -\sum_{i=1}^{N} p_i \log p_i$ is the entropy of the distribution $P$.

Endres et al. [16] prove that $\sqrt{D_{JS}(P, Q)}$ is a bounded metric on the space of distributions over $\Omega_N$, with its maximum value $\sqrt{\log 2}$ being attained when $\min\{p_i, q_i\} = 0$ for every $i \in \Omega_N$. Since the von Neumann graph entropy is the entropy of a spectrum-based distribution, Lamberti et al. [25] define a quantum Jensen-Shannon distance between two graphs, which is closely related to the von Neumann graph entropy, as stated in Definition 5.2.
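The Jensen-Shannon distance and its maximum can be illustrated with a few lines of Python. This is a minimal sketch in which the function names and the `base` parameter are illustrative choices; the logarithm base is left open in the definition, so the maximum $\sqrt{\log 2}$ equals 1 when base-2 logarithms are used.

```python
import math

def shannon_entropy(p, base=2.0):
    """H(P) = -sum_i p_i log p_i, skipping zero-probability outcomes."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

def js_divergence(p, q, base=2.0):
    """D_JS(P, Q) = H((P + Q) / 2) - H(P) / 2 - H(Q) / 2."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mix, base) - shannon_entropy(p, base) / 2 - shannon_entropy(q, base) / 2

def js_distance(p, q, base=2.0):
    """Square root of D_JS; tiny negative rounding errors are clipped to 0."""
    return math.sqrt(max(js_divergence(p, q, base), 0.0))

# Two distributions with disjoint supports: min{p_i, q_i} = 0 for every i.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.25, 0.75]
print(js_distance(p, q))                 # base-2 logarithm
print(js_distance(p, q, base=math.e))    # natural logarithm
```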

Definition 5.2 (Quantum Jensen-Shannon distance). The quantum Jensen-Shannon distance between two weighted, undirected graphs $G_1 = (V, E_1, A_1)$ and $G_2 = (V, E_2, A_2)$ is defined as

$$D_{QJS}(G_1, G_2) = \sqrt{H_{vn}(\overline{G}) - (H_{vn}(G_1) + H_{vn}(G_2))/2},$$

where $\overline{G} = (V, E_1 \cup E_2, \overline{A})$ is a weighted graph with $\overline{A} = A_1/(2\,\mathrm{vol}(G_1)) + A_2/(2\,\mathrm{vol}(G_2))$.

Based on the quantum Jensen-Shannon distance, we consider the following problem that can be applied in anomaly detection and multiplex network compression,

Problem 2. Compute the quantum Jensen-Shannon distance between adjacent graphs in a stream of graphs $\{G_k = (V, E_k, t_k)\}_{k=1}^{K}$, where $t_k$ is the timestamp of the graph $G_k$ and $t_k < t_{k+1}$.

As a distance measure between graphs, $D_{QJS}$ is typically required to be a pseudometric [39], that is, it should be symmetric and satisfy the triangle inequality. However, to the best of our knowledge, it is still an open problem whether $D_{QJS}$ fulfills the triangle inequality [25]. Meanwhile, the quantum Jensen-Shannon distance inherits the computational inefficiency of the von Neumann graph entropy. Therefore, to solve Problem 2 efficiently, we propose a new distance measure based on structural information as a surrogate for $D_{QJS}$.


Algorithm 2: IncreSim
Input: G_1 and {∆G_k}_{k=1}^{K−1}
Output: {D_SI(G_k, G_{k+1})}_{k=1}^{K−1}
 1  d ← the degree sequence of the graph G_1;
 2  m ← Σ_{i=1}^{n} d_i / 2;
 3  H_1(G_1) ← log_2(2m) − (1/(2m)) Σ_{i=1}^{n} f(d_i);
 4  for k = 1, ..., K − 1 do
 5      ∆d ← the degree sequence of the signed graph ∆G_k;
 6      ∆m ← Σ_{i ∈ V_k} ∆d_i / 2;
 7      Compute a, b, y, z in Lemma 5.6 via iterating over V_k;
 8      Compute H_1(G_{k+1}) and H_1(Ḡ_k) based on Lemma 5.6;
 9      D_SI(G_k, G_{k+1}) ← sqrt( H_1(Ḡ_k) − (H_1(G_k) + H_1(G_{k+1}))/2 );
10      m ← m + ∆m;
11      foreach i ∈ V_k do d_i ← d_i + ∆d_i;
12  return {D_SI(G_k, G_{k+1})}_{k=1}^{K−1}

Definition 5.3 (Structural information distance). The structural information distance between two weighted, undirected graphs $G_1 = (V, E_1, A_1)$ and $G_2 = (V, E_2, A_2)$ is defined as

$$D_{SI}(G_1, G_2) = \sqrt{H_1(\overline{G}) - (H_1(G_1) + H_1(G_2))/2},$$

where $\overline{G} = (V, E_1 \cup E_2, \overline{A})$ is a weighted graph with $\overline{A} = A_1/(2\,\mathrm{vol}(G_1)) + A_2/(2\,\mathrm{vol}(G_2))$.
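Because the structural information depends only on degrees, the distance in Definition 5.3 can be evaluated directly from two degree sequences over the same node set. The sketch below assumes that $\mathrm{vol}(G)$ denotes the sum of all degrees and that $f(x) = x \log_2 x$ with $f(0) = 0$; the function names are illustrative.

```python
import math

def f(x):
    return x * math.log2(x) if x > 0 else 0.0

def h1(degrees):
    """Structural information: Shannon entropy (bits) of the normalized degrees."""
    vol = sum(degrees)
    return -sum(f(d / vol) for d in degrees)

def si_distance(deg1, deg2):
    """D_SI between two graphs on the same node set, given their degree sequences."""
    vol1, vol2 = sum(deg1), sum(deg2)
    # Degree of node i in the averaged graph: d_i1 / (2 vol1) + d_i2 / (2 vol2).
    avg = [a / (2 * vol1) + b / (2 * vol2) for a, b in zip(deg1, deg2)]
    gap = h1(avg) - (h1(deg1) + h1(deg2)) / 2
    return math.sqrt(max(gap, 0.0))      # clip tiny negative rounding errors

# Graphs touching disjoint node sets are at the maximum distance.
print(si_distance([1, 2, 1, 0, 0], [0, 0, 0, 1, 1]))
# Proportional degree sequences are indistinguishable under D_SI.
print(si_distance([1, 2, 1], [2, 4, 2]))
```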

It is a little surprising to find that $D_{SI}$ is a pseudometric, the details of which are stated in Theorem 5.4.

Theorem 5.4 (Properties of the distance measure $D_{SI}$). The distance measure $D_{SI}(G_1, G_2)$ is a pseudometric on the space of undirected graphs:

• $D_{SI}$ is symmetric, i.e., $D_{SI}(G_1, G_2) = D_{SI}(G_2, G_1)$;

• $D_{SI}$ is non-negative, i.e., $D_{SI}(G_1, G_2) \ge 0$, where the equality holds if and only if $\frac{d_{i,1}}{\sum_{k=1}^{n} d_{k,1}} = \frac{d_{i,2}}{\sum_{k=1}^{n} d_{k,2}}$ for every node $i \in V$, where $d_{i,j}$ is the degree of node $i$ in $G_j$;

• $D_{SI}$ obeys the triangle inequality, i.e., $D_{SI}(G_1, G_2) + D_{SI}(G_2, G_3) \ge D_{SI}(G_1, G_3)$;

• $D_{SI}$ is upper bounded by 1, i.e., $D_{SI}(G_1, G_2) \le 1$, where the equality holds if and only if $\min\{d_{i,1}, d_{i,2}\} = 0$ for every node $i \in V$, where $d_{i,j}$ is the degree of node $i$ in $G_j$.
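One way to see why these properties are plausible is to rewrite $D_{SI}$ in terms of the classical Jensen-Shannon divergence. The following sketch assumes that $\mathrm{vol}(G_j)$ denotes the degree sum $\sum_{k} d_{k,j}$ and that base-2 logarithms are used; it is an informal illustration, not a substitute for the formal proof.

```latex
\begin{align*}
p_i &= \frac{d_{i,1}}{\mathrm{vol}(G_1)}, \qquad
q_i = \frac{d_{i,2}}{\mathrm{vol}(G_2)}, \qquad
\overline{d}_i = \frac{d_{i,1}}{2\,\mathrm{vol}(G_1)} + \frac{d_{i,2}}{2\,\mathrm{vol}(G_2)}
               = \frac{p_i + q_i}{2}, \\
D_{SI}(G_1, G_2)^2
  &= H_1(\overline{G}) - \tfrac{1}{2}\bigl(H_1(G_1) + H_1(G_2)\bigr)
   = H\!\left(\tfrac{P + Q}{2}\right) - \tfrac{1}{2} H(P) - \tfrac{1}{2} H(Q)
   = D_{JS}(P, Q).
\end{align*}
```

Under these assumptions $D_{SI}$ is the Jensen-Shannon distance between the normalized degree distributions $P$ and $Q$, so symmetry, the triangle inequality, and the bound $\sqrt{\log_2 2} = 1$ follow from the result of Endres et al. [16] quoted earlier. The distance is only a pseudometric because distinct graphs can share the same normalized degree sequence.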

To establish a connection between $D_{SI}$ and $D_{QJS}$, we study their extreme values and present the results in Theorem 5.5.

Theorem 5.5 (Connection between $D_{QJS}$ and $D_{SI}$). Both $D_{QJS}(G_1, G_2)$ and $D_{SI}(G_1, G_2)$ attain the same maximum value of 1 under the identical condition that $\min\{d_{i,1}, d_{i,2}\} = 0$ for every node $i \in V$, where $d_{i,j}$ is the degree of node $i$ in $G_j$.

In order to compute the structural information distance between adjacent graphs in the graph stream $\{G_k = (V, E_k, t_k)\}_{k=1}^{K}$, we first compute the structural information $H_1(G_k)$ for each $k \in \{1, \ldots, K\}$, which takes $\Theta(Kn)$ time. Then we compute the structural information of the averaged graph $\overline{G}_k$, whose adjacency matrix is $\overline{A}_k = A_k/(2\,\mathrm{vol}(G_k)) + A_{k+1}/(2\,\mathrm{vol}(G_{k+1}))$, for each $k \in \{1, \ldots, K - 1\}$. Since the degree of node $i$ in $\overline{G}_k$ is $\overline{d}_{i,k} = \frac{d_{i,k}}{2\,\mathrm{vol}(G_k)} + \frac{d_{i,k+1}}{2\,\mathrm{vol}(G_{k+1})}$ and $\sum_{i=1}^{n} \overline{d}_{i,k} = 1$, the structural information of $\overline{G}_k$ is $H_1(\overline{G}_k) = -\sum_{i=1}^{n} f(\overline{d}_{i,k})$, which takes $\Theta(n)$ time for each $k$. Therefore, the total computational cost is $\Theta((2K - 1)n)$.

In practice, the graph stream is fully dynamic, so it is more efficient to represent it as a stream of edge insertions and deletions over time rather than as a sequence of graphs. Suppose that the graph stream is represented as an initial base graph $G_1 = (V, E_1, t_1)$ and a sequence of graph changes $\{\Delta G_k = (V_k, E_{+,k}, E_{-,k}, t_k)\}_{k=1}^{K-1}$, where $t_k$ is the timestamp of the set $E_{+,k}$ of edge insertions and the set $E_{-,k}$ of edge deletions, and $V_k$ is the subset of nodes covered by $E_{+,k} \cup E_{-,k}$. We can view the graph change $\Delta G_k$ as a signed network in which each edge in $E_{+,k}$ has positive weight $+1$ and each edge in $E_{-,k}$ has negative weight $-1$. The degree of node $i \in V_k$ in the graph change $\Delta G_k$ is $\sum_{j \in V_k} \big( I\{(i, j) \in E_{+,k}\} - I\{(i, j) \in E_{-,k}\} \big)$. Using the information about the previous graph $G_k$ and the current graph change $\Delta G_k$, we can compute the entropy statistics of the current graph $G_{k+1}$ incrementally and efficiently via the following lemma, whose proof can be found in the appendix.

Lemma 5.6. Using the degree sequence $d$ of the graph $G_k$, the structural information $H_1(G_k)$, and the degree sequence $\Delta d$ of the signed graph $\Delta G_k$, the structural information of the graph $G_{k+1}$ can be efficiently computed as

$$H_1(G_{k+1}) = \frac{f(2(m + \Delta m)) - a - f(2m) + 2m H_1(G_k)}{2(m + \Delta m)},$$

where $m = \sum_{i=1}^{n} d_i/2$, $\Delta m = \sum_{i \in V_k} \Delta d_i/2$, and $a = \sum_{i \in V_k} \big( f(d_i + \Delta d_i) - f(d_i) \big)$. Moreover, the structural information of the averaged graph $\overline{G}_k$ between $G_k$ and $G_{k+1}$ can be efficiently computed as

$$H_1(\overline{G}_k) = -b - (2m - y) f(c) - c\,\big(f(2m) - 2m H_1(G_k) - z\big),$$

where $y = \sum_{i \in V_k} d_i$, $z = \sum_{i \in V_k} f(d_i)$, $c = \frac{2m + \Delta m}{4m(m + \Delta m)}$, and $b = \sum_{i \in V_k} f\Big( \frac{d_i}{4m} + \frac{d_i + \Delta d_i}{4(m + \Delta m)} \Big)$.
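The two identities can be traced back to the definition of the structural information in a few steps. The following is an informal sketch of that reasoning, assuming $f(x) = x \log_2 x$ and writing $m' = m + \Delta m$ for brevity (a shorthand introduced here, not used elsewhere in the paper).

```latex
\begin{align*}
H_1(G_k) = \log_2(2m) - \frac{1}{2m}\sum_{i=1}^{n} f(d_i)
  \;&\Longrightarrow\; \sum_{i=1}^{n} f(d_i) = f(2m) - 2m\,H_1(G_k), \\
\sum_{i=1}^{n} f(d_i + \Delta d_i) &= \sum_{i=1}^{n} f(d_i) + a
  \qquad (\Delta d_i = 0 \text{ for } i \notin V_k), \\
H_1(G_{k+1}) &= \log_2(2m') - \frac{f(2m) - 2m\,H_1(G_k) + a}{2m'}
  = \frac{f(2m') - a - f(2m) + 2m\,H_1(G_k)}{2m'}, \\
f(c\,d_i) &= d_i\,f(c) + c\,f(d_i)
  \qquad \text{for } i \notin V_k, \text{ where } \overline{d}_{i,k} = c\,d_i .
\end{align*}
```

Summing the last identity over $i \notin V_k$ gives $(2m - y) f(c) + c\,(f(2m) - 2m H_1(G_k) - z)$, and the nodes in $V_k$ contribute $b$, which yields the expression for $H_1(\overline{G}_k)$.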

The pseudocode of our fast algorithm IncreSim for computing the structural information distance in a graph stream is shown in Algorithm 2. It starts by computing the structural information of the base graph $G_1$ (lines 1-3), which takes $\Theta(n)$ time. In each subsequent iteration, it first computes the values of $a, b, c, y, z$ (lines 5-7), then calculates the structural information distance between the two adjacent graphs (lines 8-9), and finally updates the edge count $m$ and the degree sequence $d$ (lines 10-11). The time cost of each iteration is $\Theta(|V_k|)$; consequently, the total time complexity is $\Theta(n + \sum_{k=1}^{K-1} |V_k|)$.
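A compact Python rendering of this procedure is given below. It is a sketch under stated assumptions rather than the authors' code: the base graph is passed as a degree list, each graph change is a hypothetical pair `(insertions, deletions)` of edge lists, the graph is assumed to keep at least one edge at all times, and $f(x) = x \log_2 x$ with $f(0) = 0$.

```python
import math

def f(x):
    return x * math.log2(x) if x > 0 else 0.0

def incre_sim(degrees, changes):
    """Return [D_SI(G_k, G_{k+1})] for a base degree list and a list of edge changes."""
    d = list(degrees)
    m = sum(d) / 2
    h_cur = math.log2(2 * m) - sum(f(x) for x in d) / (2 * m)   # lines 1-3
    distances = []
    for insertions, deletions in changes:
        # Signed degree sequence of the graph change (line 5); keys form V_k.
        delta = {}
        for edge_list, sign in ((insertions, 1), (deletions, -1)):
            for u, v in edge_list:
                delta[u] = delta.get(u, 0) + sign
                delta[v] = delta.get(v, 0) + sign
        dm = sum(delta.values()) / 2                             # line 6
        m_new = m + dm
        # Quantities of Lemma 5.6, each a sum over V_k only (line 7).
        a = sum(f(d[i] + dd) - f(d[i]) for i, dd in delta.items())
        y = sum(d[i] for i in delta)
        z = sum(f(d[i]) for i in delta)
        c = (2 * m + dm) / (4 * m * m_new)
        b = sum(f(d[i] / (4 * m) + (d[i] + dd) / (4 * m_new)) for i, dd in delta.items())
        # Entropies of the next graph and of the averaged graph (line 8).
        h_next = (f(2 * m_new) - a - f(2 * m) + 2 * m * h_cur) / (2 * m_new)
        h_avg = -b - (2 * m - y) * f(c) - c * (f(2 * m) - 2 * m * h_cur - z)
        distances.append(math.sqrt(max(h_avg - (h_cur + h_next) / 2, 0.0)))  # line 9
        m, h_cur = m_new, h_next                                 # line 10
        for i, dd in delta.items():                              # line 11
            d[i] += dd
    return distances

# Base graph: path 0-1-2-3 plus isolated node 4, followed by two hypothetical changes.
print(incre_sim([1, 2, 2, 1, 0], [([(3, 4), (0, 2)], [(1, 2)]), ([(1, 2)], [(0, 1)])]))
```

Each iteration touches only the nodes incident to changed edges, which is where the $\Theta(|V_k|)$ per-step cost comes from.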

6 EXPERIMENTS AND EVALUATIONS

We conduct extensive experiments over both synthetic and real-world datasets to answer the following questions:

Q1. Universality of the entropy gap over arbitrary simple graphs: Is the entropy gap close to 0 for a wide range of graphs? Is the structural information a good proxy of the von Neumann graph entropy for a wide range of graphs?

Q2. Sensitivity of the entropy gap to graph properties: How do graph properties affect the value of entropy gap?

Q3. Accuracy of the approximation: As a proxy of the von Neumann graph entropy, is the structural information more accurate than its prominent competitors?

Q4. Speed of the computation: Is the computation of the structural information faster than its prominent competitors?

Q5. Extensibility of the entropy gap to weighted graphs: Is the entropy gap sensitive to the change of edge weights? Is the entropy gap still close to 0 for weighted graphs?

Q6. Performance analysis (Appendix A): What is the performance of EntropyAug (Algorithm 1) in maximizing the von Neumann graph entropy? What is the performance of IncreSim (Algorithm 2) in analyzing graph streams? Can the structural information distance be further used to detect anomalies in a graph stream?

Table 2: Real-world datasets used in our experiments.

Name               #Nodes      #Edges       Category     Statistics
Static graphs without timestamps                         Avg. degree
Zachary (ZA)       34          78           Friendship   4.59
Dolphins (DO)      62          159          Animal       5.13
Jazz (JA)          198         2,742        Contact      27.70
Skitter (SK)       1,696,415   11,095,298   Internet     13.08
Brightkite (BK)    58,228      214,078      Friendship   7.35
Caida (CA)         26,475      53,381       Internet     4.03
YouTube (YT)       1,134,890   2,987,624    Friendship   5.27
LiveJournal (LJ)   3,997,962   34,681,189   Friendship   17.35
Pokec (PK)         1,632,803   22,301,964   Friendship   27.32
Dynamic graphs with timestamps                           #Snapshots
Wiki-IT (WK)       1,204,009   34,826,283   Hyperlink    100
Facebook (FB)      61,096      788,135      Friendship   29

6.1 Experimental Settings

Datasets: We consider both synthetic graphs and real-world graphs. The synthetic graphs are generated from three well-known random graph models: the Erdős-Rényi (ER) model, the Barabási-Albert (BA) model [3], and the Watts-Strogatz (WS) model [43]. The real-world graphs [24, 27, 42] used in our experiments are listed in Table 2, which contains both static graphs with varying size and average degree, and temporal graphs with varying size and time span. In every static graph, we ignore the direction and weight of all edges and remove both self-loops and multiple edges. We treat every temporal graph as a stream of undirected weighted edges with timestamps. For the convenience of analysis, we partition these edges into several groups, each of which falls within a certain time interval.
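The preprocessing steps described above can be sketched as two small helpers. The record layouts `(u, v, ...)` and `(u, v, t)`, the function names, and the use of equal-width time intervals are assumptions made for illustration; the paper does not specify how the interval boundaries are chosen.

```python
def simplify_edges(raw_edges):
    """Drop direction, extra fields (e.g. weights), self-loops and duplicate edges."""
    simple = set()
    for u, v, *_ in raw_edges:
        if u != v:                                # remove self-loops
            simple.add((min(u, v), max(u, v)))    # undirected, deduplicated
    return simple

def partition_by_time(timed_edges, num_groups):
    """Split (u, v, t) records into num_groups equal-width time intervals (assumed)."""
    times = [t for _, _, t in timed_edges]
    t_min, t_max = min(times), max(times)
    width = (t_max - t_min) / num_groups or 1.0   # guard against a zero-length span
    groups = [[] for _ in range(num_groups)]
    for u, v, t in timed_edges:
        idx = min(int((t - t_min) / width), num_groups - 1)
        groups[idx].append((u, v))
    return groups

print(simplify_edges([(1, 2, 0.5), (2, 1, 1.0), (3, 3, 2.0), (1, 2, 0.1)]))
print(partition_by_time([(0, 1, 0), (1, 2, 5), (2, 3, 10)], 2))
```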

Hardware: The experiments have been performed on a server with an Intel(R) Xeon(R) CPU at 2.40 GHz (32 virtual cores) and 256GB RAM, averaging 10 runs for random algorithms and random inputs unless stated otherwise.

Implementation: All of the proposed algorithms and baselines are implemented in Python.

Reproducibility: The code and datasets used in the paper are available at https://github.com/xuecheng27/WWW21-Structural-Information.

6.2 Q1. Universality (Fig. 2)

To evaluate the universality of the entropy gap, we measure the structural information and the exact von Neumann entropy on a set of synthetic graphs with 2,000 nodes. For the ER and BA models, we generate graphs with average degree in {2, 4, ..., 200}. For the WS model, we generate graphs with edge rewiring probability in {0, 1/20, ..., 1} for each average degree in {6, 10, 20, 50}. We additionally measure the sharpened lower and upper bounds of the entropy gap. The results are shown in Fig. 2.

The observations are threefold. First, the entropy gap is close to 0 for a wide range of graphs. The entropy gap of each synthetic graph is no more than 0.2, whereas the exact von Neumann entropy is greater than 10. Second, the entropy gap is negatively correlated with the average degree. Dense graphs tend to have a very small entropy gap. Third, the structural information is linearly correlated with the von Neumann graph entropy, with only a few exceptions. There is no exception for the ER synthetic graphs. For the BA synthetic graphs, the exceptions are those graphs with extremely small average degree. For the WS synthetic graphs, the exceptions are those graphs with extremely small edge rewiring probability.
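The kind of measurement behind these observations can be reproduced at toy scale. The sketch below computes the von Neumann graph entropy as the base-2 entropy of the Laplacian spectrum normalized by its trace, and subtracts it from the structural information. To stay dependency-free it uses a plain cyclic Jacobi eigenvalue iteration, which is a stand-in chosen for illustration and not the eigensolver used in the paper; for graphs of realistic size a numerical library routine such as `numpy.linalg.eigvalsh` would be the natural choice. All function names and the toy graph are hypothetical.

```python
import math

def f(x):
    return x * math.log2(x) if x > 0 else 0.0

def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-22):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):          # A <- A J (rotate columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # A <- J^T A (rotate rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def laplacian(adj):
    nodes = sorted(adj)
    index = {u: i for i, u in enumerate(nodes)}
    lap = [[0.0] * len(nodes) for _ in nodes]
    for u in nodes:
        lap[index[u]][index[u]] = float(len(adj[u]))
        for v in adj[u]:
            lap[index[u]][index[v]] = -1.0
    return lap

def von_neumann_entropy(adj):
    lap = laplacian(adj)
    vol = sum(lap[i][i] for i in range(len(lap)))       # trace of the Laplacian
    return -sum(f(lam / vol) for lam in jacobi_eigenvalues(lap))

def structural_information(adj):
    vol = sum(len(nb) for nb in adj.values())
    return -sum(f(len(nb) / vol) for nb in adj.values())

def entropy_gap(adj):
    return structural_information(adj) - von_neumann_entropy(adj)

# Hypothetical toy graph: a 6-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
adj = {u: set() for u in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
gap = entropy_gap(adj)
print(gap, math.log2(math.e))   # the gap should lie between 0 and log2(e)
```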

6.3 Q2. Sensitivity (Fig. 2, Fig. 3)

To evaluate the sensitivity of the entropy gap to graph properties such as average degree, graph size, and rewiring probability, we further measure the entropy gap of 10 synthetic graphs with graph size (number of nodes) in {500, 1000, ..., 5000} for each random model. The average degree is chosen from {2, 5, 10, 20, 50, 100} for the ER and BA models, and the edge rewiring probability is chosen from {0, 0.1, 0.2, 0.4, 0.8, 1} for the WS model.

The observations from Fig. 2 and Fig. 3 are threefold. First, the entropy gap decreases as the average degree increases for all three random graph models. Second, the entropy gap decreases as the edge rewiring probability increases for the WS model. Third, the entropy gap is nearly insensitive to the change of graph size.

6.4 Q3. Accuracy (Fig. 4)

To evaluate the accuracy of the structural information as an approximation of the von Neumann graph entropy, we measure the structural information, the exact von Neumann entropy (when the graph is small enough), and three prominent approximations (as competitors) on 9 real-world static graphs. The competitors are 1) FINGER-$\widehat{H}$ [7], defined as $\widehat{H}_F(G) = -Q \log_2(\lambda_{\max}/\mathrm{tr}(L))$ where $Q = 1 - \mathrm{tr}(L^2)/\mathrm{tr}^2(L)$, 2) FINGER-$\widetilde{H}$ [7], defined as $\widetilde{H}_F(G) = -Q \log_2(2 d_{\max}/\mathrm{tr}(L))$, and 3) SLaQ [40]. The results in Fig. 4 show that the structural information is an accurate approximation of the von Neumann graph entropy. The approximation error of the structural information is much smaller than those of $\widehat{H}_F$ and $\widetilde{H}_F$, and it is comparable to the approximation error of SLaQ, with only a few exceptions such as YT and SK, where the structural information is slightly better.
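For a simple unweighted graph, the FINGER-$\widetilde{H}$ formula above depends only on the degree sequence, since $\mathrm{tr}(L) = \sum_i d_i$ and $\mathrm{tr}(L^2) = \sum_i d_i^2 + \sum_i d_i$. The sketch below evaluates it next to the structural information on a triangle; the function names are illustrative, the code follows the formula exactly as stated in this section, and it is not taken from the FINGER implementation.

```python
import math

def f(x):
    return x * math.log2(x) if x > 0 else 0.0

def structural_information(degrees):
    vol = sum(degrees)
    return -sum(f(d / vol) for d in degrees)

def finger_h_tilde(degrees):
    """-Q * log2(2 * d_max / tr(L)) for a simple unweighted graph."""
    tr_l = sum(degrees)                               # tr(L) = sum of degrees
    tr_l2 = sum(d * d for d in degrees) + tr_l        # tr(L^2) = sum d_i^2 + 2m
    q = 1 - tr_l2 / tr_l ** 2
    return -q * math.log2(2 * max(degrees) / tr_l)

triangle = [2, 2, 2]
print(structural_information(triangle))   # log2(3)
print(finger_h_tilde(triangle))           # 0.5 * log2(1.5)
```

On this toy input the exact von Neumann entropy is 1 bit (the normalized Laplacian spectrum is {0, 1/2, 1/2}), so neither value is exact here; the comparison on real graphs is what Fig. 4 reports.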

6.5 Q4. Speed (Fig. 5)

To evaluate the computational speed of the structural information, we measure the running time of the structural information and its three competitors on 9 real-world static graphs. The results in Fig. 5 show that the computation of structural information is fast. It is about 2 orders of magnitude faster than $\widehat{H}_F$, at least 2 orders of magnitude faster than SLaQ, and comparable to $\widetilde{H}_F$. Combining