
(1)

Triangle Counting in Large Sparse Graph

Meng-Tsung Tsai

(2)

Problem Setting

(3)

Problem Setting(1/3)

Goal:

Calculate the cluster coefficient of a given graph G(V, E), where |V| = n and |E| = m.

The cluster coefficient indicates the probability that a friend of one's friend is also one's friend.

The cluster coefficient is one of the important features for examining whether a man-made graph fits a real one.

In terms of graph theory,

$CC(G) = \frac{3 \times \text{number of triangles in } G}{\text{number of triples in } G}$.

(8)

Problem Setting(2/3)

Example:

[Figure: example graph]

number of triangles = 2

number of triples = 8

cluster coefficient = 3 × 2 / 8 = 0.75
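As a rough check of these numbers, the sketch below computes the cluster coefficient of a small graph by brute force. The edge list is an assumption, since the slide's figure is not reproduced here: it is a four-vertex graph with five edges, chosen because it has 2 triangles and 8 triples. The function names are illustrative only.

```python
from itertools import combinations

def build_adjacency(edges):
    """Map each vertex to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def count_triangles(adj):
    # Brute force: test every unordered set of three vertices.
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def count_triples(adj):
    # A triple is a path of two edges; a vertex of degree d
    # is the middle vertex of d * (d - 1) / 2 such paths.
    return sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())

def cluster_coefficient(edges):
    adj = build_adjacency(edges)
    return 3 * count_triangles(adj) / count_triples(adj)

# Assumed example graph: 4 vertices, 5 edges.
EXAMPLE_EDGES = [(1, 2), (2, 3), (1, 4), (2, 4), (3, 4)]
```

The brute-force triangle test takes cubic time in the number of vertices, so it only serves to check small examples.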

(13)

Problem Setting(3/3)

Requirement:

Seek an efficient algorithm to count the number of triangles that takes O(m) space and o(n³) time.

We focus on social network graphs, for which the cluster coefficient is especially important.

In social networks, m = o(n²) usually holds.

(16)

Triangle Counting (Trivial Algorithm)

(17)

Trivial Algorithm

[Figure: a two-edge path from u to v, plus the edge (u, v), forms a triangle]

Let M be a matrix such that M_{i,j} is 1 iff an edge connecting vertices i and j exists.

Let M² be M · M. What does (M²)_{i,j} mean? It is the number of two-edge paths between i and j.

$\triangle = \frac{1}{6} \sum_{i,j} (M^2)_{i,j} \cdot M_{i,j}$

Simple matrix multiplication, the Strassen algorithm, and the Winograd algorithm all require Ω(n²) space to obtain M². Not acceptable!
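A minimal sketch of the matrix-based count, using plain Python lists and simple cubic-time multiplication. The function names and the 0-based vertex numbering are assumptions made for illustration. The dense n × n matrices are what make this approach too costly in space.

```python
def adjacency_matrix(n, edges):
    """Dense 0/1 adjacency matrix of an undirected graph on vertices 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1
    return m

def mat_mul(a, b):
    """Simple cubic-time matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangles_by_matrix(n, edges):
    m = adjacency_matrix(n, edges)
    m2 = mat_mul(m, m)  # m2[i][j] = number of two-edge paths between i and j
    total = sum(m2[i][j] * m[i][j] for i in range(n) for j in range(n))
    # Every triangle is counted once per ordered pair (i, j) of its vertices,
    # i.e. 6 times.
    return total // 6
```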

(26)

Triangle Counting (Forward Algorithm)

(27)

Forward Algorithm(1/2)

[Figure: example graph on four vertices, labeled with indices 1 to 4]

For each vertex v, let N_v be the set of neighbors of v whose index is lower than that of v.

v     1    2    3    4
N_v   {}   {1}  {2}  {1, 2, 3}

For example, {1} ∩ {1, 2, 3} = {1}.

$\triangle = \sum_{(u,v) \in E} |N_u \cap N_v|$

Correctness: all triangles can be found + all found objects are triangles.

[Figure: the same graph with its vertices relabeled 4, 2, 3, 1]

v     4       2    3       1
N_v   {1, 2}  {1}  {1, 2}  {}
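The sum over edges can be sketched as follows. The edge list is an assumption: it is reconstructed from the sets {}, {1}, {2}, {1, 2, 3} listed on the slide, reading each set as the lower-indexed neighbors of vertices 1 to 4. The relabeling map mirrors the second labeling (4, 2, 3, 1). Function names are illustrative.

```python
def lower_neighbor_sets(edges):
    """For each vertex v, collect the neighbors whose index is lower than v."""
    n_sets = {}
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        n_sets.setdefault(lo, set())
        n_sets.setdefault(hi, set()).add(lo)
    return n_sets

def count_triangles_forward(edges):
    n_sets = lower_neighbor_sets(edges)
    # A triangle a < b < c is found exactly once, at edge (b, c),
    # because a is the only vertex lower than both endpoints.
    return sum(len(n_sets[u] & n_sets[v]) for u, v in edges)

# Assumed edge list, consistent with the sets shown for indices 1..4.
FIRST_LABELING = [(1, 2), (2, 3), (1, 4), (2, 4), (3, 4)]

# Second labeling: old index -> new index.
RELABEL = {1: 4, 2: 2, 3: 3, 4: 1}
SECOND_LABELING = [(RELABEL[u], RELABEL[v]) for u, v in FIRST_LABELING]
```

Both labelings give the same count. Only the sizes of the sets N_v change, which is why the choice of indices matters for running time.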

(37)

Forward Algorithm(2/2)

Assign indices to vertices according to their degree: the higher the degree of a vertex, the lower its index.

If the degree of vertex v is at most $\sqrt{2m}$, then $|N_v| \le \sqrt{2m}$.

If the degree of vertex v is at least k, at most 2m/k vertices have higher degree. Thus, $|N_v| \le \sqrt{2m}$ where $\deg(v) \ge \sqrt{2m}$.
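A minimal sketch of the degree-based indexing, assuming a simple undirected graph given as an edge list without self-loops or duplicate edges. The function name and the tie-breaking rule are assumptions. It returns the triangle count and the largest |N_v|, so the √(2m) bound can be checked on small inputs.

```python
import math

def forward_count(edges):
    """Return (number of triangles, largest |N_v|) under degree ordering."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Higher degree -> lower index; ties are broken by vertex name (assumption).
    order = sorted(degree, key=lambda v: (-degree[v], v))
    index = {v: i for i, v in enumerate(order)}
    n_sets = {v: set() for v in degree}
    for u, v in edges:
        lo, hi = (u, v) if index[u] < index[v] else (v, u)
        n_sets[hi].add(lo)  # lo has the lower index, so it belongs to N_hi
    triangles = sum(len(n_sets[u] & n_sets[v]) for u, v in edges)
    largest = max(len(s) for s in n_sets.values())
    return triangles, largest

# A star with 10 leaves: the center has degree 10 but an empty N set.
STAR = [(0, leaf) for leaf in range(1, 11)]
# The complete graph on 5 vertices.
K5 = [(a, b) for a in range(5) for b in range(a + 1, 5)]
```

In the star, the high-degree center receives the lowest index, so every set N_v has at most one element even though one vertex has degree 10.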

(41)

Triangle Counting (Four Russians’ Algorithm)

(42)

Four-Russians’ Algorithm

Each bit vector is cut into sectors, and each sector is encoded as one integer (here, sectors of length 2):

{1, 0, 1, 1, . . .} → {2, 3, . . .}

{0, 1, 0, 0, . . .} → {1, 0, . . .}

The inner product of two sectors is read from a precomputed table:

     0  1  2  3
0    0  0  0  0
1    0  1  0  1
2    0  0  1  1
3    0  1  1  2

The table utilized in Four-Russians’ Algorithm is 2^{log n} by 2^{log n}. Thus, its speedup is O(log n).
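The sector encoding and table lookup can be sketched as follows, using the sector length 2 of the example. The function names are illustrative assumptions. Python's `bin(...).count("1")` is used only once, to fill the table.

```python
SECTOR_BITS = 2  # sector length used in the example above

def build_table(bits):
    """table[a][b] = inner product of the bit patterns of sectors a and b."""
    size = 1 << bits
    return [[bin(a & b).count("1") for b in range(size)] for a in range(size)]

def to_sectors(bit_list, bits):
    """Cut a 0/1 list into sectors and encode each sector as an integer."""
    sectors = []
    for start in range(0, len(bit_list), bits):
        chunk = bit_list[start:start + bits]
        chunk = chunk + [0] * (bits - len(chunk))  # pad the last sector
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        sectors.append(value)
    return sectors

def inner_product(x_sectors, y_sectors, table):
    # One table lookup per sector instead of one multiplication per bit.
    return sum(table[a][b] for a, b in zip(x_sectors, y_sectors))
```

With sectors of length s, an inner product of two length-n vectors needs about n/s lookups, at the price of a table with 2^s × 2^s entries.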

(46)

Triangle Counting (FFR Algorithm)

(47)

FFR Algorithm

The set-intersection part, $|N_u \cap N_v|$, of $\triangle = \sum_{(u,v) \in E} |N_u \cap N_v|$ in Forward Algorithm can be sped up with Four-Russians’ Algorithm.

Let the length of sectors be $\frac{1}{2} \log m$; the additional space for the table is then Θ(m).

The number of non-all-zero sectors in N_v is $O(\sqrt{m / \log m})$ where $\deg(v) \le \sqrt{m / \log m}$.

The number of non-all-zero sectors in N_v is $O(\sqrt{m / \log m})$ where $\deg(v) \ge \sqrt{m / \log m}$.

FFR needs $O(m^{3/2} / \log^{1/2} m)$ time.
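One way to arrive at these bounds is sketched below. It assumes that vertices are indexed by non-increasing degree as in Forward Algorithm, that a sector covers $\frac{1}{2} \log m$ consecutive indices, and that one intersection costs time proportional to the number of non-all-zero sectors. The threshold k and the name index(v) are our notation, not the slide's.

```latex
\begin{align*}
\deg(v) \le k
  &\;\Rightarrow\; |N_v| \le k
  \;\Rightarrow\; \#\text{non-all-zero sectors of } N_v \le k, \\
\deg(v) \ge k
  &\;\Rightarrow\; \operatorname{index}(v) \le \frac{2m}{k}
  \;\Rightarrow\; \#\text{sectors of } N_v \le \frac{2m/k}{\tfrac{1}{2}\log m}
  = \frac{4m}{k \log m}, \\
k = \sqrt{m / \log m}
  &\;\Rightarrow\; \frac{4m}{k \log m} = 4\sqrt{m / \log m}, \\
\text{total time}
  &= \sum_{(u,v) \in E} O\!\left(\sqrt{m / \log m}\right)
  = O\!\left(\frac{m^{3/2}}{\log^{1/2} m}\right).
\end{align*}
```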

(52)

CPU Instruction versus Memory Access

(53)

Instruction versus Memory(1/3)

The inner product in Four-Russians’ Algorithm can be accomplished by two CPU instructions. It is known that executing a CPU instruction is much faster than accessing memory.

"logical and": C = A ˚∧ B, C_i = min(A_i, B_i)

"population count": d = ˚σ A, $d = \sum_{i=1}^{g} A_i$
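A minimal sketch of the two operations on sets stored as bit words. Python integers stand in for g-bit registers, and `bin(...).count("1")` stands in for a hardware population-count instruction. Both substitutions, and the function names, are assumptions made for illustration.

```python
def to_word(indices):
    """Encode a set of small non-negative integers as a bit word."""
    word = 0
    for i in indices:
        word |= 1 << i
    return word

def intersection_size(a_word, b_word):
    c_word = a_word & b_word        # "logical and": C_i = min(A_i, B_i)
    return bin(c_word).count("1")   # "population count": sum of the bits of C
```

For instance, `intersection_size(to_word({1}), to_word({1, 2, 3}))` evaluates the intersection {1} ∩ {1, 2, 3} from the Forward Algorithm example with one AND and one count.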

(56)

Instruction versus Memory(2/3)

[Figure: wall time (seconds per 10,000 runs) of ALGO 5, ALGO 2 with p = 8, and ALGO 2 with p = 16]

(57)

Instruction versus Memory(2/3)

[Figure: wall time (seconds per 10,000 runs) versus bit density (x out of 64 bits are 1) for ALGO 2 with p = 8, p = 16, and p = 22]

(58)

Instruction versus Memory(3/3)

CPU instructions can handle sectors of size g, where g is the length of a CPU register.

Is g a constant in the analysis of algorithms?

Are all instructions O(1)-executable?

(61)

Is g a constant?

Assume a program is executed on M, a random access machine, using Θ(S) memory space.

Θ(S) memory addresses are required.

Hence, the length of the registers in M is Ω(log S).

(66)

Are all instructions O(1)-executable?

AC⁰ instructions are those that can be realized with a polynomial-size, constant-depth circuit.

Multiplication is not an AC⁰ instruction.

To access a multi-dimensional array in constant time, multiplication must be constant-time executable.

We suggest that any instruction that can be implemented faster than multiplication be regarded as constant-time executable.

(72)

Population Count

(73)

Population Count(1/3)

˚σ is not supported by all types of CPU.

Any alternative way?

Previous work shows a bit-twiddling method to realize the population count. The method needs $O(\log^{(2)} g)$ basic instructions. Hence, the speedup is $O(g^{1/2} / \log^{(2)} g) = \Omega(\log^{1/2} m / \log^{(3)} m)$ due to g = Ω(log m).

Any faster solution?
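For illustration, here is the widely known mask-and-add form of bit-twiddling population count for g = 64. It uses log₂ g rounds of basic instructions, so it is not necessarily the method of the previous work cited above, whose instruction count is stated as $O(\log^{(2)} g)$. The function name is an assumption.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def popcount64(x):
    """Count the 1 bits of a 64-bit word with shifts, masks, and additions."""
    x &= MASK64
    # Each round adds neighboring fields pairwise, doubling the field width.
    x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x & 0x0F0F0F0F0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F0F0F0F0F)
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF)
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF)
    x = (x & 0x00000000FFFFFFFF) + ((x >> 32) & 0x00000000FFFFFFFF)
    return x
```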

(79)

Population Count(2/3)

Add the bit vectors position by position, and keep each binary digit of the per-position sums as its own bit vector:

      { 1 1 0 0 }
      { 1 0 1 0 }
  +   { 1 1 0 0 }
  2⁰  { 1 0 1 0 }
  2¹  { 1 1 0 0 }

Use this method to reduce 2^d − 1 ˚σ operations to d ˚σ operations. The speedup is $\Omega(\log^{1/2} m / \log^{(4)} m)$.
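The reduction can be sketched as a bit-sliced counter: up to 2^d − 1 words are folded into d "planes", where plane i holds binary digit i of the count at every bit position, and only the d planes are population-counted. The function names and the word-at-a-time ripple-carry update are assumptions for illustration. The slide does not say how the addition is scheduled.

```python
def compress(words, d):
    """Fold up to 2**d - 1 bit words into d planes of per-position counts."""
    if len(words) > (1 << d) - 1:
        raise ValueError("d planes can only hold counts up to 2**d - 1")
    planes = [0] * d
    for w in words:
        carry = w
        for i in range(d):
            # Half adder at every bit position at once:
            # XOR gives the sum bit, AND gives the carry into the next plane.
            planes[i], carry = planes[i] ^ carry, planes[i] & carry
    return planes

def total_popcount(words, d):
    planes = compress(words, d)
    # Plane i contributes with weight 2**i; only d population counts are needed.
    return sum((1 << i) * bin(p).count("1") for i, p in enumerate(planes))
```

On the three vectors of the example, `compress([0b1100, 0b1010, 0b1100], 2)` yields the planes `0b1010` (weight 2⁰) and `0b1100` (weight 2¹), and the total is 1·2 + 2·2 = 6 set bits.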

(83)

Instruction versus Memory(2/3)

[Figure: elapsed wall time (seconds) versus rewiring probability for ALGO 3, ALGO 7[12 <- ALGO 10], and ALGO 7[12 <- ALGO 12]]

(84)

Instruction versus Memory(2/3)

[Figure: speedup relative to ALGO 3 (%) for ALGO 7[12 <- ALGO 10] and ALGO 7[12 <- ALGO 12]]

(85)

Conclusion

The previous efficient algorithm, Forward Algorithm, needs $O(m^{3/2})$ time and O(m) space.

To develop algorithms on random access machines, we come up with two arguments.

Based on the arguments, our algorithm has $\Omega(\log^{1/2} m / \log^{(4)} m)$ speedup.

Though it may be slightly worse than FFR Algorithm in

(91)

Future Work

Some graph features may be more suitable to analyze than degeneracy when the algorithm for computing the intersection of two given sets changes.

The same arguments on random access machines can be applied to many other algorithms.

(95)