**Triangle Counting in Large Sparse Graphs**

Meng-Tsung Tsai r95065@cise.ntu.edu.tw

**Problem Setting**

**Problem Setting(1/3)**

**Goal:**

**Calculating the clustering coefficient of a given graph**
G(V, E), where |V | = n and |E| = m.

**The clustering coefficient indicates the probability that a**
friend of one's friend is also one's friend.

**The clustering coefficient is one of the important features**
for examining whether a man-made graph fits a real one.

In terms of graph theory,

CC(G) = 3 × (number of triangles in G) / (number of triples in G).

**Problem Setting(2/3)**

**Example:**

[figure: a small example graph containing 2 triangles and 8 triples]

number of triangles = 2, number of triples = 8

clustering coefficient = 3 × 2 / 8 = 0.75
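The numbers above can be checked directly. A minimal sketch in Python; since the slide's figure is lost, the edge list below is an illustrative reconstruction of a graph with 2 triangles and 8 triples:

```python
from itertools import combinations

# Illustrative edge list (the slide's figure is lost); this graph
# has 2 triangles and 8 triples, matching the slide's numbers.
edges = {(1, 2), (2, 3), (1, 4), (2, 4), (3, 4)}
nodes = {u for e in edges for u in e}
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# A triple is a path of length 2: a vertex plus 2 of its neighbors.
triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes)

# Enumerate vertex triples once each to count triangles.
triangles = sum(1 for a, b, c in combinations(sorted(nodes), 3)
                if b in adj[a] and c in adj[a] and c in adj[b])

cc = 3 * triangles / triples
print(triangles, triples, cc)  # 2 8 0.75
```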

**Problem Setting(3/3)**

**Requirement:**

Seeking an efficient algorithm to count the number of
triangles using O(m) space and o(n^{3}) time.

**We focus on social network graphs, in which the clustering**
**coefficient is especially important.**

In social networks, m = o(n^{2}) usually holds.

**Triangle Counting (Trivial Algorithm)**

**Trivial Algorithm**

[figure: a length-2 path from u to v plus the edge (u, v) forms a triangle]

Let M be a matrix such that M_{i,j} is 1 iff an edge
connecting vertices i and j exists.

Let M^{2} be M · M. What does M^{2}_{i,j} mean? It counts the
walks of length 2 from vertex i to vertex j.

△ = (1/6) Σ_{i,j} M^{2}_{i,j} · M_{i,j}

Multiplying by M_{i,j} keeps only the length-2 walks closed
into triangles by an edge; each triangle is counted 6 times
(3 vertex choices × 2 directions).

**Simple matrix multiplication, Strassen's Algorithm,**
**and Winograd's Algorithm all require** O(n^{2}) space to
obtain M^{2}. Not acceptable!
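The matrix-based count can be sketched as follows; the 4-vertex graph is a hypothetical stand-in for the slide's lost example:

```python
# Adjacency matrix M for a hypothetical 4-vertex example graph
# (vertices 0..3) containing 2 triangles.
n = 4
edges = [(0, 1), (1, 2), (0, 3), (1, 3), (2, 3)]
M = [[0] * n for _ in range(n)]
for u, v in edges:
    M[u][v] = M[v][u] = 1

# M2[i][j] counts the walks of length 2 from i to j.
M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]

# Multiplying by M[i][j] keeps only walks closed by an edge;
# each triangle is counted 6 times.
triangles = sum(M2[i][j] * M[i][j] for i in range(n) for j in range(n)) // 6
print(triangles)  # 2
```

Note that storing M and M² explicitly is exactly the O(n²) space the slide rejects for large sparse graphs.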

**Triangle Counting (Forward Algorithm)**

**Forward Algorithm(1/2)**

[figure: an example graph whose vertices, indexed 1, 2, 3, 4, carry
the partial neighbor lists {}, {1}, {2}, {1, 2, 3}]

For an edge (u, v), each common entry of the two lists closes one
triangle, e.g. {1} ∩ {1, 2, 3} = {1}.

△ = Σ_{edge(u,v)∈E} |N_u ∩ N_v|

Every triangle is found, and everything found is a triangle.

[figure: the same graph re-indexed 4, 2, 3, 1, giving the partial
neighbor lists {1, 2}, {1}, {1, 2}, {}]
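A minimal sketch of the Forward Algorithm, assuming vertices are already indexed (the degree-based indexing of the next slide): N_v collects the lower-indexed neighbors of v seen so far, and each edge contributes |N_u ∩ N_v| triangles.

```python
from collections import defaultdict

def forward_triangles(edges):
    """Count triangles with the Forward Algorithm: N[v] holds the
    lower-indexed neighbors of v that have already been scanned."""
    N = defaultdict(set)
    count = 0
    # Scanning edges in lexicographic order guarantees both edges
    # (u, v) and (u, w) are recorded before (v, w) is examined.
    for u, v in sorted((min(e), max(e)) for e in edges):
        count += len(N[u] & N[v])  # each common entry closes a triangle
        N[v].add(u)
    return count

# The slide's example: the lists become {}, {1}, {2}, {1, 2, 3}.
print(forward_triangles([(1, 2), (2, 3), (1, 4), (2, 4), (3, 4)]))  # 2
```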

**Forward Algorithm(2/2)**

Assign indices to vertices according to their degrees: the
higher the degree of a vertex, the lower its index.

If the degree of vertex v is at most √(2m), then |N_v| ≤ √(2m).

If the degree of vertex v is at least k, then at most 2m/k
vertices have higher degree (since the degrees sum to 2m).
Taking k = √(2m), at most √(2m) vertices precede v, so
|N_v| ≤ √(2m) when deg(v) ≥ √(2m).

**Triangle Counting (Four-Russians' Algorithm)**

**Four-Russians' Algorithm**

Split each bit vector into fixed-length sectors and encode each
sector as an integer, e.g. with 2-bit sectors:

{1, 0, 1, 1, . . .} → {2, 3, . . .}
{0, 1, 0, 0, . . .} → {1, 0, . . .}

A precomputed table then gives the intersection size of any two
sector values in one lookup; for 2-bit sectors:

      0  1  2  3
  0   0  0  0  0
  1   0  1  0  1
  2   0  0  1  1
  3   0  1  1  2

The table utilized in the Four-Russians' Algorithm is
2^{log n} × 2^{log n}. Thus, its speedup is O(log n).
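The lookup table above is simply the population count of (a AND b) over all pairs of sector values; a sketch that rebuilds it for 2-bit sectors:

```python
# T[a][b] = |A ∩ B| for two sector values a, b: the population
# count of (a AND b).  SECTOR_BITS = 2 reproduces the 4 x 4 table.
SECTOR_BITS = 2
SIZE = 1 << SECTOR_BITS
T = [[bin(a & b).count("1") for b in range(SIZE)] for a in range(SIZE)]

for row in T:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 0, 1]
# [0, 0, 1, 1]
# [0, 1, 1, 2]
```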

**Triangle Counting (FFR Algorithm)**

**FFR Algorithm**

The intersection |N_u ∩ N_v| in the Forward Algorithm's formula
△ = Σ_{edge(u,v)∈E} |N_u ∩ N_v| can be sped up with the
Four-Russians' Algorithm.

Let the length of sectors be (1/2) log m; the additional space
for the table is then Θ(m).

The number of non-all-zero sectors in N_v is O(√(m/log m))
where deg(v) ≤ √(m/log m).

The number of non-all-zero sectors in N_v is O(√(m/log m))
where deg(v) ≥ √(m/log m).

FFR needs O(m^{3/2} / log^{1/2} m) time.
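A sketch of the resulting inner loop, with hypothetical helper names (`to_sectors`, `intersect_size`): neighbor bit vectors are stored as their non-all-zero sectors only, and each pair of matching sectors costs one table lookup.

```python
# Hypothetical sketch of the FFR inner loop.  Neighbor bit vectors
# are stored as {sector index: sector value}, keeping only the
# non-all-zero sectors; intersection sizes come from table lookups.
SECTOR_BITS = 2
SIZE = 1 << SECTOR_BITS
T = [[bin(a & b).count("1") for b in range(SIZE)] for a in range(SIZE)]

def to_sectors(bits):
    sectors = {}
    for i in range(0, len(bits), SECTOR_BITS):
        val = 0
        for b in bits[i:i + SECTOR_BITS]:
            val = (val << 1) | b
        if val:                        # skip all-zero sectors
            sectors[i // SECTOR_BITS] = val
    return sectors

def intersect_size(su, sv):
    if len(su) > len(sv):              # iterate over the shorter list
        su, sv = sv, su
    return sum(T[val][sv[i]] for i, val in su.items() if i in sv)

Nu = to_sectors([1, 0, 1, 1])
Nv = to_sectors([1, 1, 0, 1])
print(intersect_size(Nu, Nv))  # 2
```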

**CPU Instruction versus Memory Access**

**Instruction versus Memory(1/3)**

**The inner product in the Four-Russians' Algorithm can be**
accomplished by two CPU instructions. It is known that CPU
instructions execute much faster than memory accesses.

*"logical and"*: C = A ∧ B, where C_i = min(A_i, B_i)

*"population count"*: d = σ(A), where d = Σ_{i=1}^{g} A_i
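On a real machine these two steps are a word-wide AND followed by a popcount; sketched with Python integers standing in for g-bit registers:

```python
# Python integers standing in for g-bit registers.
a = 0b1011  # bit vector of N_u
b = 0b1101  # bit vector of N_v

c = a & b                  # "logical and": a single instruction
d = bin(c).count("1")      # "population count": one instruction in hardware
print(d)  # 2, i.e. |N_u ∩ N_v|
```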

**Instruction versus Memory(2/3)**

[figure: wall time (seconds per 10,000 runs) versus bit density
(x out of 64 bits are 1), comparing ALGO 5 and ALGO 2 with
p = 8, 16, and 22]

**Instruction versus Memory(3/3)**

CPU instructions can handle sectors of size g, where g is the
length of a CPU register.

Is g a constant in the analysis of algorithms?

Are all instructions O(1)-executable?

**Is** g **a constant?**

Assume a program is executed on M, a random access machine,
using Θ(S) memory space.

Θ(S) memory addresses are required.

Hence the length of the registers in M is Ω(log S).

**Are all instructions** O(1)**-executable?**

AC^{0} instructions are those which can be realized by a
polynomial-size, constant-depth circuit.

**Multiplication is not an** AC^{0} **instruction.**

Yet to access a multi-dimensional array in constant time,
**multiplication must be constant-time executable.**

**We suggest that instructions which can be implemented faster**
**than multiplication are constant-time executable.**

**Population Count**

**Population Count(1/3)**

σ is not supported by all types of CPU.

Any alternative way?

Previous work shows a bit-twiddling method to realize the
population count. The method needs O(log^{(2)} g) basic
instructions. Hence, the speedup is
O(g^{1/2} / log^{(2)} g) = Ω(log^{1/2} m / log^{(3)} m), due to
g = Ω(log m).

Any faster solution?
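For reference, the widely used bit-twiddling (SWAR) population count for a 64-bit word; this standard variant uses O(log g) word operations, so it may differ from the O(log^{(2)} g) method the slide cites:

```python
# Classic SWAR population count for a 64-bit word: pairwise sums,
# then 4-bit and 8-bit sums, then one multiply to gather the bytes.
def popcount64(x):
    x = x - ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56

print(popcount64(0b101101))  # 4
```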

**Population Count(2/3)**

Add several words column-wise, keeping the column sums as
bit planes:

        { 1 1 0 0 }
        { 1 0 1 0 }
      + { 1 1 0 0 }
      -------------
  2^{0} { 1 0 1 0 }
  2^{1} { 1 1 0 0 }

Using this method reduces 2^{d} − 1 σ operations to d σ
operations.

The speedup is Ω(log^{1/2} m / log^{(4)} m).
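A sketch of the trick for d = 2: a bitwise full adder collapses the slide's three words into two bit planes, so three population counts become two.

```python
def popcount(x):
    return bin(x).count("1")

# The three rows from the slide, packed as 4-bit words.
a, b, c = 0b1100, 0b1010, 0b1100

ones = a ^ b ^ c                    # 2^0 bit plane (column parity)
twos = (a & b) | (b & c) | (a & c)  # 2^1 bit plane (column carries)

# Two population counts now replace three.
total = popcount(ones) + 2 * popcount(twos)
print(bin(ones), bin(twos), total)  # 0b1010 0b1100 6
```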

**Instruction versus Memory(2/3)**

[figure: elapsed wall time (seconds) versus rewiring probability,
and speedup relative to ALGO 3 (%), comparing ALGO 3 with
ALGO 7[12 ← ALGO 10] and ALGO 7[12 ← ALGO 12]]

**Conclusion**

**The previous efficient algorithm, the Forward Algorithm,**
needs O(m^{3/2}) time and O(m) space.

To develop algorithms on random access machines, we come
up with two arguments.

Based on the arguments, our algorithm achieves an
Ω(log^{1/2} m / log^{(4)} m) speedup.

**Though it may be slightly worse than the FFR Algorithm in**

**Future Work**

Some graph features may be more proper to analyze than
degeneracy when the algorithm for calculating the intersection
of two given sets is changed.

The same arguments on random access machines can be applied
to many other algorithms.