
## Triangle Counting in Large Sparse Graphs

Meng-Tsung Tsai r95065@cise.ntu.edu.tw

## Problem Setting

### Problem Setting (1/3)

Goal:

Calculating the clustering coefficient of a given graph G(V, E), where |V| = n and |E| = m.

The clustering coefficient indicates the probability that a friend of one's friend is also one's friend.

The clustering coefficient is one of the important features for examining whether a man-made graph fits the real one.

In terms of graph theory,

CC(G) = 3 × (number of triangles in G) / (number of triples in G).

### Problem Setting (2/3)

Example:

[figure: a small example graph on which triangles and triples are counted]

number of triangles = 2
number of triples = 8

clustering coefficient = 3 × 2 / 8 = 0.75
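The arithmetic above can be reproduced with a short script. The edge list below is an assumed example (the "diamond" graph, K4 minus one edge), chosen because it has exactly 2 triangles and 8 triples like the slide's figure; any such graph gives the same coefficient.

```python
from itertools import combinations

# Assumed example graph: the "diamond" (2 triangles, 8 triples).
edges = {(1, 2), (2, 3), (1, 4), (2, 4), (3, 4)}
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# A triangle is a set of 3 mutually adjacent vertices.
triangles = sum(1 for a, b, c in combinations(sorted(adj), 3)
                if b in adj[a] and c in adj[a] and c in adj[b])
# A triple is a path of length 2, i.e. a pair of edges sharing a vertex.
triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)

cc = 3 * triangles / triples
print(triangles, triples, cc)  # 2 8 0.75
```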

### Problem Setting (3/3)

Requirement:

We seek an efficient algorithm to count the number of triangles using O(m) space and o(n³) time.

We focus on social network graphs, in which the clustering coefficient is especially important.

In social networks, m = o(n²) usually holds.

## Triangle Counting (Trivial Algorithm)

### Trivial Algorithm

[figure: a path of length 2 from u to v, plus the edge (u, v), forms a triangle]

Let M be a matrix such that M_{i,j} is 1 iff an edge connecting vertices i and j exists.

Let M² be M · M. What does M²_{i,j} mean? It counts the paths of length 2 between i and j.

△ = (1/6) Σ_{i,j} M²_{i,j} · M_{i,j}

Each triangle is counted six times in the sum (three vertices × two directions). Simple matrix multiplication, Strassen's algorithm, and Winograd's algorithm all require O(n²) space to obtain M². Not acceptable!
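A minimal sketch of the trivial algorithm; the 4-vertex edge list is an illustrative stand-in, not the slide's graph.

```python
# Trivial triangle counting via the adjacency-matrix product.
n = 4
edges = [(0, 1), (1, 2), (0, 3), (1, 3), (2, 3)]  # assumed example graph
M = [[0] * n for _ in range(n)]
for u, v in edges:
    M[u][v] = M[v][u] = 1

# M2[i][j] counts the paths of length 2 between i and j.
M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]

# Each triangle is counted 6 times (3 vertices x 2 directions).
triangles = sum(M2[i][j] * M[i][j] for i in range(n) for j in range(n)) // 6
print(triangles)  # 2
```

Note the O(n²) space for M and M², which is exactly what the slide rejects for large sparse graphs.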

## Triangle Counting (Forward Algorithm)



### Forward Algorithm (1/2)

[figure: an example graph; its vertices are indexed 1, 2, 3, 4 and store the neighbor sets {}, {1}, {2}, {1, 2, 3}]

{1} ∩ {1, 2, 3} = {1}

△ = Σ_{edge(u,v)∈E} |N_u ∩ N_v|

All triangles can be found, and all found objects are triangles.

[figure: the same graph re-indexed as 4, 2, 3, 1, yielding the sets {1, 2}, {1}, {1, 2}, {}; the set sizes depend on the indexing]

### Forward Algorithm (2/2)

Assign indices to vertices according to their degree: the higher the degree of a vertex, the lower its index.

If deg(v) ≤ √(2m), then |N_v| ≤ deg(v) ≤ √(2m).

If deg(v) ≥ k, there are at most 2m/k vertices with higher degree, since the degrees sum to 2m. Taking k = √(2m), |N_v| ≤ √(2m) also holds where deg(v) ≥ √(2m).

Hence |N_v| ≤ √(2m) for every v, and the Forward Algorithm runs in O(m^(3/2)) time with O(m) space.
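The two slides above can be combined into a runnable sketch. The function name, the `rank` re-indexing, and the edge list are illustrative, not from the deck.

```python
def forward_triangles(n, edges):
    """Count triangles with the Forward Algorithm (degree-ordered indices)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Re-index: the higher the degree, the lower the index.
    order = sorted(range(n), key=lambda v: -deg[v])
    rank = {v: i for i, v in enumerate(order)}

    N = [set() for _ in range(n)]  # N[v]: already-seen lower-indexed neighbours
    count = 0
    # Process edges in increasing index order so each N[v] is complete
    # up to the current edge when it is intersected.
    reindexed = sorted((min(rank[u], rank[v]), max(rank[u], rank[v]))
                       for u, v in edges)
    for u, v in reindexed:
        count += len(N[u] & N[v])  # common lower-indexed neighbours
        N[v].add(u)
    return count

print(forward_triangles(4, [(0, 1), (1, 2), (0, 3), (1, 3), (2, 3)]))  # 2
```

Each triangle {w, u, v} with w < u < v is counted exactly once, when the edge (u, v) is processed.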

## Triangle Counting (Four-Russians' Algorithm)

### Four-Russians' Algorithm

Partition each bit vector into sectors and encode each sector as an integer:

{1, 0, 1, 1, . . .} → {2, 3, . . .}
{0, 1, 0, 0, . . .} → {1, 0, . . .}

Precompute a table whose entry at (a, b) is the population count of a ∧ b; for sectors of length 2:

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 2 | 0 | 0 | 1 | 1 |
| 3 | 0 | 1 | 1 | 2 |

The table utilized in the Four-Russians' Algorithm is 2^((1/2) log n) by 2^((1/2) log n). Thus, its speedup is O(log n).
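The table above is exactly popcount(a ∧ b) over all pairs of 2-bit sector values. A short sketch; the sector encoding of the two example vectors is an assumption matching the slide.

```python
# T[a][b] = popcount(a AND b) for every pair of 2-bit sector values;
# this reproduces the 4-by-4 table on the slide.
SECTOR_BITS = 2
SIZE = 1 << SECTOR_BITS
T = [[bin(a & b).count("1") for b in range(SIZE)] for a in range(SIZE)]

for row in T:
    print(row)

# Sector-encoded sets (assumed encoding of the slide's example vectors):
A = [2, 3]  # bits 10 11, i.e. the vector {1, 0, 1, 1}
B = [1, 0]  # bits 01 00, i.e. the vector {0, 1, 0, 0}

# |A ∩ B| via one table lookup per sector pair, instead of bit by bit.
print(sum(T[a][b] for a, b in zip(A, B)))  # 0
```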

## Triangle Counting (FFR Algorithm)

### FFR Algorithm

The |N_u ∩ N_v| part of △ = Σ_{edge(u,v)∈E} |N_u ∩ N_v| in the Forward Algorithm can be sped up with the Four-Russians' Algorithm.

Let the length of the sectors be (1/2) log m; the additional space for the table is then Θ(m).

The number of non-all-zero sectors in N_v is O(√(m / log m)) where deg(v) ≤ √(m / log m).

The number of non-all-zero sectors in N_v is O(√(m / log m)) where deg(v) ≥ √(m / log m).

FFR needs O(m^(3/2) / log^(1/2) m) time.

## Instruction versus Memory Access

### Instruction versus Memory (1/3)

The inner product in the Four-Russians' Algorithm can be accomplished by two CPU instructions. It is known that a CPU instruction executes much faster than a memory access.

"logical and": C = A ˚∧ B, where C_i = min(A_i, B_i)

"population count": d = ˚σ A, where d = Σ_{i=1}^{g} A_i
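A sketch of the two-instruction inner product, with Python integers standing in for g-bit registers; the register values are illustrative.

```python
# "logical and" then "population count": |A ∩ B| in two word operations.
A = 0b1011  # the set {0, 1, 3} as a bit vector
B = 0b1100  # the set {2, 3}

C = A & B              # "logical and": C_i = min(A_i, B_i)
d = bin(C).count("1")  # "population count": d = sum of the bits of C
print(d)  # 1  (the single common element is 3)
```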

### Instruction versus Memory (2/3)

[plot: wall time (seconds per 10,000 runs) for ALGO 5 and for ALGO 2 with p = 8 and p = 16]

[plot: wall time (seconds per 10,000 runs) versus bit density (x out of 64 bits are 1), for ALGO 2 with p = 8, 16, 22]

### Instruction versus Memory (3/3)

CPU instructions can handle sectors of size g, where g is the length of a CPU register.

Is g a constant in the analysis of algorithms?

Are all instructions O(1)-executable?

## Is g a constant?

### Is g a constant?

Assume a program is executed on M, a random access machine, using Θ(S) memory space.

Θ(S) memory addresses are required.

Hence the length of the registers in M is Ω(log S).

## Are all instructions O(1)-executable?

### Are all instructions O(1)-executable?

AC0 instructions are those which can be realized by a polynomial-size, constant-depth circuit.

Multiplication is not an AC0 instruction.

To access a multi-dimensional array in constant time, multiplication must be constant-time executable.

We suggest that any instruction which can be implemented faster than multiplication is constant-time executable.

## Population Count

### Population Count (1/3)

˚σ is not supported by all types of CPU.

Any alternative way?

Previous work shows a bit-twiddling method to realize the population count. The method needs O(log^(2) g) basic instructions, where log^(k) is the k-fold iterated logarithm. Hence, the speedup is O(g^(1/2) / log^(2) g) = Ω(log^(1/2) m / log^(3) m), due to g = Ω(log m).

Any faster solution?
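The deck's O(log^(2) g) method is not spelled out on the slide; as a simpler illustration of the bit-twiddling idea, here is the classic O(log g)-instruction parallel count for a 64-bit word.

```python
# Classic bit-twiddling population count for a 64-bit word: pairs of
# bits, then nibbles, then bytes are summed in parallel, and a final
# multiply gathers the byte sums into the top byte.
def popcount64(x: int) -> int:
    x = x - ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    return (x * 0x0101010101010101 >> 56) & 0x7F

print(popcount64(0b10110100))  # 4
```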

### Population Count (2/3)

{ 1 1 0 0 }
{ 1 0 1 0 }
+ { 1 1 0 0 }

The vectors are packed with weights 2^0, 2^1, . . ., so that one combined word carries several of them at once.

Using this method, 2^d − 1 applications of ˚σ are reduced to d applications of ˚σ. The speedup is Ω(log^(1/2) m / log^(4) m).
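For d = 2 the reduction can be sketched with a carry-save addition: the slide's three vectors (2^d − 1 = 3) are compressed into two weighted bit planes, so two ˚σ calls replace three. The plane names are illustrative.

```python
# Carry-save addition: three bit vectors collapse into a weight-2^0
# plane and a weight-2^1 plane, each needing one population count.
def popcount(x: int) -> int:
    return bin(x).count("1")

a, b, c = 0b1100, 0b1010, 0b1100   # the three vectors from the slide

low = a ^ b ^ c                     # weight-2^0 bit plane
high = (a & b) | (b & c) | (a & c)  # weight-2^1 bit plane (carries)

total = popcount(low) + 2 * popcount(high)
print(total)  # 6, the total number of 1-bits across the three vectors
```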

### Instruction versus Memory (2/3)

[plot: elapsed wall time (seconds) versus rewiring probability, for ALGO 3, ALGO 7[12 <- ALGO 10], and ALGO 7[12 <- ALGO 12]]

[plot: speedup relative to ALGO 3 (%), for ALGO 7[12 <- ALGO 10] and ALGO 7[12 <- ALGO 12]]

## Conclusion

### Conclusion

The previous efficient algorithm, the Forward Algorithm, needs O(m^(3/2)) time and O(m) space.

To develop algorithms on random access machines, we come up with two arguments.

Based on the arguments, our algorithm has an Ω(log^(1/2) m / log^(4) m) speedup.

Though it may be slightly worse than the FFR Algorithm in …

## Future Work

### Future Work

Some graph features may be more proper to analyze than degeneracy when the algorithm that calculates the intersection of two given sets is changed.

The same arguments on random access machines can be applied to many other algorithms.

## Any Questions?
