### A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees

黃則翰 R96922141 蘇承祖 R96922077 張紘睿 R96922136 許智程 D95922022 戴于晉 R96922171

David R. Karger Philip N. Klein Robert E. Tarjan

### Outline

Introduction

Basic Property & Definition

Algorithm

Analysis

### Introduction

Borůvka [1926]: O(m log n)

Gabow et al. [1984]: O(m log β(m, n))

◦ β(m, n) = min { i | log^{(i)} n ≤ m/n }

Verification algorithm

◦ King [1993]: O(m)

A randomized algorithm that runs in O(m) time with high probability

### Cycle Property

For any cycle C in a graph, the heaviest edge in C does not appear in the minimum spanning forest.

[Figure: a cycle with edge weights 2, 3, 5 and 6]

### Cut Property

For any proper nonempty subset X of the vertices, the lightest edge with exactly one endpoint in X belongs to the minimum spanning tree.

[Figure: a graph with edge weights 2, 3, 5 and 6 and a vertex subset X]

### Definition

Let G be a graph with weighted edges.

◦ w(x, y): the weight of edge {x, y}

If F is a forest in G (a subgraph of G):

◦ F(x, y): the path (if any) connecting x and y in F

◦ w_{F}(x, y): the maximum weight of an edge on F(x, y)

◦ w_{F}(x, y) = ∞ if x and y are not connected in F

### F-heavy & F-light

An edge {x, y} is F-heavy if w(x, y) > w_{F}(x, y), and F-light otherwise.

Edges of F are all F-light.

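[Figure: two example components, one on vertices A, B, C, D and one on vertices E, F, G, H, with edge weights 2, 3, 5 and 6]

◦ w(B, D) = 6, w_{F}(B, D) = max{2, 3, 5} = 5: F-heavy

◦ w(F, H) = 6, w_{F}(F, H) = ∞: F-light

◦ w(C, D) = 5, w_{F}(C, D) = 5: F-light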
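The definitions of w_{F} and F-heavy can be sketched in a few lines of Python. This is only an illustration: the function names `path_max` and `is_f_heavy` are hypothetical, the path search is a plain depth-first search rather than a linear-time verification algorithm, and the placement of weights 2 and 3 on edges A-B and A-C is an assumption, since the figure itself is not recoverable.

```python
from collections import defaultdict
import math

def path_max(forest_edges, x, y):
    """Return w_F(x, y): the heaviest edge weight on the path from x to y in F,
    or math.inf if x and y are not connected in F."""
    adj = defaultdict(list)
    for u, v, w in forest_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Depth-first search from x, carrying the heaviest edge seen so far.
    stack = [(x, None, -math.inf)]
    while stack:
        node, parent, heaviest = stack.pop()
        if node == y:
            return heaviest
        for nxt, w in adj[node]:
            if nxt != parent:  # F is a forest, so skipping the parent avoids revisits
                stack.append((nxt, node, max(heaviest, w)))
    return math.inf

def is_f_heavy(forest_edges, x, y, w):
    """An edge {x, y} of weight w is F-heavy if w > w_F(x, y)."""
    return w > path_max(forest_edges, x, y)

# Assumed layout of the forest in the first example component.
forest = [("A", "B", 2), ("A", "C", 3), ("C", "D", 5)]
print(is_f_heavy(forest, "B", "D", 6))  # edge {B, D} of weight 6
print(is_f_heavy(forest, "F", "H", 6))  # F and H are not connected in this forest
```

Under the assumed layout, the three classifications agree with the examples above: {B, D} is F-heavy, while {F, H} and {C, D} are F-light.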

### Observation

No F-heavy edge can be in the minimum spanning forest of G (cycle property).

So we can discard edges that cannot be in the minimum spanning tree.

Every F-light edge remains a candidate edge for the minimum spanning tree of G.

### Boruvka Algorithm

For each vertex, select the minimum-weight edge incident to the vertex.

Replace by a single vertex each connected component defined by the selected edges.

Delete all resulting isolated vertices, loops, and all but the lowest-weight edge among each set of multiple edges.
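The three operations above can be sketched as a single Python function. This is a minimal sketch under stated assumptions, not the presenters' implementation: vertices are numbered 0..n-1, edge weights are distinct, the input has no loops, and each edge carries a hypothetical identifier `eid` so that contracted edges can be reported in terms of the original graph.

```python
def boruvka_step(n, edges):
    """One Boruvka step. edges: list of (u, v, w, eid) with distinct weights.
    Returns (n_new, new_edges, contracted_eids)."""
    # 1. For each vertex, select the minimum-weight incident edge.
    best = {}
    for e in edges:
        u, v, w, _ = e
        for x in (u, v):
            if x not in best or w < best[x][2]:
                best[x] = e

    # 2. Contract the components defined by the selected edges (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    contracted = set()
    for u, v, w, eid in best.values():
        ru, rv = find(u), find(v)
        if ru != rv:  # the same edge may be selected by both endpoints
            parent[ru] = rv
            contracted.add(eid)

    # 3. Drop loops and keep only the lightest edge of each parallel bundle.
    lightest = {}
    for u, v, w, eid in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        key = (min(ru, rv), max(ru, rv))
        if key not in lightest or w < lightest[key][2]:
            lightest[key] = (ru, rv, w, eid)

    # Relabel surviving components as 0..n_new-1; isolated ones get no label.
    label, new_edges = {}, []
    for ru, rv, w, eid in lightest.values():
        a = label.setdefault(ru, len(label))
        b = label.setdefault(rv, len(label))
        new_edges.append((a, b, w, eid))
    return len(label), new_edges, contracted

example = [(0, 1, 1, "a"), (1, 2, 3, "b"), (2, 3, 2, "c"), (0, 3, 4, "d")]
n_new, new_edges, contracted = boruvka_step(4, example)
```

In the small example, edges `a` and `c` are contracted, and of the two parallel edges left between the resulting components only the lighter one (`b`) is kept.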

### Algorithm Step 1

Apply two successive Borůvka steps to the graph, thereby reducing the number of vertices by at least a factor of four.

### Algorithm Step 2

Choose a subgraph H by selecting each edge independently with probability ½.

Apply the algorithm recursively to H, producing a minimum spanning forest F of H.

Find all the F-heavy edges and delete them from the contracted graph.

### Algorithm Step 3

Apply the algorithm recursively to the remaining graph to compute a minimum spanning forest F’. Return those edges contracted in Step 1 together with the edges of F’.
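The three steps can be put together in a short Python sketch. It makes several assumptions that are not in the slides: vertices are numbered 0..n-1, weights are distinct, each edge carries a hypothetical identifier `eid`, and F-heavy edges are found with a naive path search instead of a linear-time verification algorithm, so this version illustrates correctness only and does not run in linear time. The names `boruvka_step`, `f_light_edges` and `msf` are illustrative.

```python
import random
from collections import defaultdict

def boruvka_step(n, edges):
    """Contract each vertex's lightest incident edge.
    edges: (u, v, w, eid), distinct weights. Returns (n', edges', eids)."""
    best = {}
    for e in edges:
        for x in e[:2]:
            if x not in best or e[2] < best[x][2]:
                best[x] = e
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    chosen = set()
    for u, v, w, eid in best.values():
        if find(u) != find(v):
            parent[find(u)] = find(v)
            chosen.add(eid)
    label, lightest = {}, {}
    for u, v, w, eid in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # drop loops; keep the lightest parallel edge
            key = (min(ru, rv), max(ru, rv))
            if key not in lightest or w < lightest[key][2]:
                lightest[key] = (ru, rv, w, eid)
    out = []
    for ru, rv, w, eid in lightest.values():
        out.append((label.setdefault(ru, len(label)),
                    label.setdefault(rv, len(label)), w, eid))
    return len(label), out, chosen

def f_light_edges(edges, forest):
    """Keep the F-light edges, using a naive path search in F."""
    adj = defaultdict(list)
    for u, v, w, _ in forest:
        adj[u].append((v, w))
        adj[v].append((u, w))
    def path_max(x, y):
        stack = [(x, None, float("-inf"))]
        while stack:
            node, par, heavy = stack.pop()
            if node == y:
                return heavy
            stack.extend((nxt, node, max(heavy, w))
                         for nxt, w in adj[node] if nxt != par)
        return float("inf")  # not connected in F
    return [e for e in edges if e[2] <= path_max(e[0], e[1])]

def msf(n, edges, rng):
    """Return the set of edge ids of the minimum spanning forest."""
    result = set()
    for _ in range(2):                      # Step 1: two Boruvka steps
        n, edges, chosen = boruvka_step(n, edges)
        result |= chosen
    if not edges:
        return result
    h = [e for e in edges if rng.random() < 0.5]   # Step 2: sample H
    f_ids = msf(n, h, rng)                         # F = MSF of H
    forest = [e for e in edges if e[3] in f_ids]
    light = f_light_edges(edges, forest)           # delete F-heavy edges
    return result | msf(n, light, rng)             # Step 3
```

A call such as `msf(n, edges, random.Random(0))` returns the identifiers of the forest edges. With distinct weights the result does not depend on the random choices; only the amount of work does.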

[Figure: one level of the recursion. Original problem G → Borůvka × 2 → G*. Left sub-problem: sample G* with p = 0.5 to get H, and return the minimum forest F of H. Right sub-problem: delete the F-heavy edges from G* to get G’, and return F’.]

### Correctness

By the cut property, every edge contracted during Step 1 is in the MSF.

By the cycle property, the edges deleted in Step 2 do NOT belong to the MSF.

By the induction hypothesis, the MSF of the remaining graph is correctly determined in the recursive call of Step 3.

### Candidate Edge of MST

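Let H be obtained from G by including each edge independently with probability p, and let F be the minimum spanning forest of H. The expected number of F-light edges in G is at most n/p (by a negative binomial argument).

So, for a random sample graph H, the expected number of candidate edges for the MST of G (the F-light edges) is at most n/p.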

### Random-sampling

Random sampling helps discard edges that cannot be in the minimum spanning tree.

Construct the sample graph H and its forest F together:

◦ Process the edges in increasing order of weight

◦ To process an edge e:

◦ 1. Test whether both endpoints of e are in the same component of F (if so, e is F-heavy)

◦ 2. Include the edge in H with probability p

◦ 3. If e is in H and is F-light, add e to the forest F
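The edge-by-edge process above can be simulated directly. The following is a hedged sketch: the function name `sample_and_build_forest`, the vertex numbering 0..n-1, and the use of union-find for the component test are assumptions, and weights are taken to be distinct.

```python
import random

def sample_and_build_forest(n, edges, p, rng):
    """edges: list of (u, v, w). Returns (H, F, light_count), where
    light_count is the number of F-light edges of the whole graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    H, F, light_count = [], [], 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # increasing weight
        ru, rv = find(u), find(v)
        f_light = ru != rv           # 1. endpoints in different components of F
        in_h = rng.random() < p      # 2. include e in H with probability p
        if in_h:
            H.append((u, v, w))
        if f_light:
            light_count += 1
            if in_h:                 # 3. e is in H and F-light: add it to F
                parent[ru] = rv
                F.append((u, v, w))
    return H, F, light_count

# Complete graph on 8 vertices with distinct weights (illustrative input).
n = 8
edges = [(u, v, u * n + v) for u in range(n) for v in range(u + 1, n)]
H, F, light_count = sample_and_build_forest(n, edges, 0.5, random.Random(3))
```

Each F-light edge corresponds to one coin flip, and each head adds an edge to F. Since F can hold at most n - 1 edges, the number of F-light edges is bounded by the number of flips needed to see n heads, which is where the n/p bound comes from.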

### Random-sampling

[Figure: graph G on vertices A to G with edge weights 3, 4, 5, 6, 7, 9, 10, 11, 13 and 14, its sample H, and the forest F of H]

◦ w(E, G) = 14, w_{F}(E, G) = max{5, 6, 9, 13} = 13: F-heavy

◦ w(E, F) = 11, w_{F}(E, F) = max{5, 6, 9} = 9: F-heavy

◦ w(D, F) = 9, w_{F}(D, F) = 9: F-light

◦ w(A, B) = 7, w_{F}(A, B) = ∞: F-light

### Random-sampling

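[Figure: the graph G and the forest F from the previous slide, built in two equivalent ways]

Edge by edge:

1. Take the edges in increasing order of weight.

2. If the edge is F-light, toss a coin; on a head, select the edge.

3. Else, toss a coin but do not select the edge.

In one batch:

1. Randomly select edges into H.

2. Find the forest F of H.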

### Observation

No F-heavy edge can be in the minimum spanning forest of G (cycle property).

Every F-light edge remains a candidate edge for the minimum spanning tree of G.

The forest F produced is the forest that Kruskal's algorithm would produce on H, and the F-light edges include every edge of the MSF of G.

### Observation

The size of F is at most n-1

The expected number of F-light edges in G is at most n/p (negative binomial).

Negative binomial distribution: the number of failures k before the n-th success, with success probability p, has

$$f(k; n, p) = \binom{k + n - 1}{k} p^{n} (1 - p)^{k}$$

Mean of k:

$$\frac{n(1 - p)}{p}$$

Expected number of trials:

$$\frac{n(1 - p)}{p} + n = \frac{n}{p}$$

### Analysis of the Algorithm

The worst case.

The expected running time.

The probability that the running time is linear.

### Running time Analysis

Total running time = sum of the running times of all steps.

Step (1): two steps of Borůvka's algorithm.

Step (2): the Dixon-Rauch-Tarjan verification algorithm.

Both take time linear in the number of edges.

◦ So it suffices to estimate the total number of edges over all subproblems.

### Observe the recursion tree

G = (V, E), |V| = n, |E| = m.

◦ m ≥ n/2, since there are no isolated vertices.

Each problem generates at most 2 subproblems.

◦ At depth d, there are at most 2^{d} nodes.

◦ Each node at depth d has at most n/4^{d} vertices.

The depth d is at most log_{4} n.

◦ There are at most 2n vertices in total over all subproblems:

$$\sum_{d=0}^{\infty} 2^{d} \cdot \frac{n}{4^{d}} = \sum_{d=0}^{\infty} \frac{n}{2^{d}} = 2n$$
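As a quick arithmetic check of the vertex count over the recursion tree, the per-level bounds (at most 2^{d} subproblems, each with at most n/4^{d} vertices) can be summed exactly. The helper name `vertex_bound` is hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def vertex_bound(n, depth):
    """Sum over levels d = 0..depth of 2^d subproblems times n/4^d vertices."""
    return sum(Fraction(2**d * n, 4**d) for d in range(depth + 1))

n = 1024
total = vertex_bound(n, 5)  # log_4(1024) = 5 levels below the root
print(total, total < 2 * n)
```

The partial sums stay strictly below 2n for every depth, which is the geometric series n(1 + 1/2 + 1/4 + ...).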

### The worst case

Theorem 4.1. The worst-case running time of the minimum-spanning-forest algorithm is O(min{n^{2}, m log n}), the same as the bound for Borůvka's algorithm.

Proof: There are two different ways to estimate the total number of edges.

1. A subproblem at depth d contains at most (n/4^{d})^{2}/2 edges.

Total edges in all subproblems:

$$\sum_{d=0}^{\log_4 n} 2^{d} \cdot \frac{(n/4^{d})^{2}}{2} = O(n^{2})$$

### The worst case

2. Consider a subproblem G = (V, E). After Step (1) we have G’ = (V’, E’) with |E’| ≤ |E| - |V|/2 and |V’| ≤ |V|/4.

Edges in left child = |H|

Edges in right child ≤ |E’| - |H| + |F|

Since F is a forest on V’, |F| ≤ |V’| ≤ |V|/4, so the number of edges in the two subproblems is at most:

(|H|) + (|E’| - |H| + |F|) = |E’| + |F| ≤ |E| - |V|/2 + |V|/4 ≤ |E|

The two subproblems together contain at most |E| edges.

### The worst case

[Figure: recursion tree with at most m edges on each level and depth at most log n]

### The worst case

The depth is at most log_{4} n and each level has at most m edges, so there are at most m log_{4} n = O(m log n) edges in total.

Combining the two estimates, the worst-case running time of the minimum-spanning-forest algorithm is O(min{n^{2}, m log n}).

### Analysis – Average Case (1/8)

### Theorem: the expected running time of the minimum spanning forest algorithm is O(m)

◦ Calculate the expected total number of edges for all left-path problems

[Figure: recursion tree. The original problem has a left and a right sub-problem; the left sub-problem in turn has a left and a right sub-sub-problem.]

### Analysis – Average Case (2/8)

Calculating the expected total edge number for one left path started at one problem with m’ edges

Evaluating the total edge number for all right sub-problems

[Figure: a left path starting at a problem with m’ edges; expected total edge number on the path ≤ 2m’]

### Analysis – Average Case (3/8)

Calculating the expected total edge number for one left path started at one problem with m’ edges

[Figure: original problem G → Borůvka × 2 → G*; sampling G* with p = 0.5 gives the left sub-problem H; the right sub-problem is G’]

1. E[edge number of H] = 0.5 × edge number of G*

2. ∵ Borůvka × 2, ∴ edge number of G* ≤ edge number of G

E[edge number of H] ≤ 0.5 × edge number of G

### Analysis – Average Case (4/8)

Calculating the expected total edge number for one left path started at one problem with m’ edges

E[edge number of H] ≤ 0.5 × edge number of G

[Figure: a left path. The first problem has m’ edges, and each following problem on the path has, in expectation, at most 0.5 × the edges of the one before it.]

$$\text{Expected total edge number} \le \sum_{d=0}^{\infty} \frac{m'}{2^{d}} = 2m'$$

### Analysis – Average Case (5/8)

Calculating the expected total edge number for one left path L started at one problem with m’ edges

◦ Expected total edge number on L ≤ 2m’

Evaluating the total edge number of all right sub-problems

◦ E[total edges of all right sub-problems] ≤ n

### Analysis – Average Case (6/8)

Evaluating the total edge number for all right sub-problems

◦ To prove: E[total edges of all right sub-problems] ≤ n

[Figure: original problem G → Borůvka × 2 → G*; sample with p = 0.5 to get H (left sub-problem), which returns the minimum forest F of H; delete F-heavy edges from G* to get G’ (right sub-problem)]

1. ∵ Borůvka × 2, ∴ vertex number of G* ≤ 0.25 × vertex number of G

2. Based on Lemma 2.1: E[edge number of G’] ≤ 2 × vertex number of G*

E[edge number of G’] ≤ 0.5 × vertex number of G

### Analysis – Average Case (7/8)

E[edge number of G’] ≤ 0.5 × vertex number of G

Evaluating the total edge number for all right sub-problems

◦ To prove: E[total edges of all right sub-problems] ≤ n

[Figure: recursion tree annotated level by level with vertex counts and right sub-problem edge counts]

◦ Original problem: # of vertices = n; # of edges of right sub-problems ≤ n/2

◦ Depth 1: # of vertices of sub-problems ≤ 2×n/4; # of edges of right sub-problems ≤ 2×n/8

◦ Depth 2: # of vertices of sub-problems ≤ 4×n/4^{2}; # of edges of right sub-problems ≤ 4×n/(4^{2}×2)

◦ Depth 3: # of vertices of sub-problems ≤ 8×n/4^{3}; # of edges of right sub-problems ≤ 8×n/(4^{3}×2)

◦ Depth 4: # of vertices of sub-problems ≤ 16×n/4^{4}

Total over all levels: n/2 + n/4 + n/8 + ... = n

### Analysis – Average Case (8/8)

Calculating the expected total edge number for one left path started at one problem with m’ edges

◦ Expected total edge number for one left path ≤ 2m’

Evaluating the total edge number for all right sub-problems

◦ E[total edges of all right sub-problems] ≤ n

Every problem lies on a left path that starts either at the original problem (m edges) or at a right sub-problem (at most n edges in total, in expectation), so

E[processed edges in the original problem and all sub-problems] ≤ 2×(m+n)
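The two bounds combine by simple arithmetic, sketched below. The function name `expected_edge_bound` is hypothetical. The per-level terms n/2^{d+1} restate the level-by-level bounds on right sub-problem edges, and the factor 2 is the left-path bound applied to the root and to every right sub-problem.

```python
from fractions import Fraction

def expected_edge_bound(n, m, levels):
    """Return (bound on expected edges in all right sub-problems,
    bound on expected edges processed over the whole recursion)."""
    # Parents at depth d contribute right sub-problems with at most
    # n / 2^(d+1) expected edges in total.
    right = sum(Fraction(n, 2 ** (d + 1)) for d in range(levels))
    # Every left path starts at the root (m edges) or at a right sub-problem.
    return right, 2 * (m + right)

right, total = expected_edge_bound(n=1024, m=5000, levels=10)
print(right, total)
```

For any number of levels, the first value stays below n and the second below 2(m + n).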

### The Probability of Linearity

### Theorem 4.3

◦ The minimum spanning forest algorithm runs in O(m) time with probability 1 – exp(-Ω(m))

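### The Probability of Linearity

Chernoff Bound:

Given x_{i} as i.i.d. random variables for 0 < i ≤ n, and X the sum of all x_{i}, for t > 0 we have

$$\Pr[X \ge A] \le e^{-At} \prod_{i=1}^{n} E\left[e^{t x_{i}}\right]$$

Thus, the probability of fewer than s successes (each with chance p) within k trials is

$$\Pr[X < s] \le e^{st} \prod_{i=1}^{k} E\left[e^{-t x_{i}}\right] = e^{st} \left(1 - p + p e^{-t}\right)^{k} = e^{-\Omega(s)}, \quad \text{for } t = \tfrac{1}{2} \text{ and } p = \tfrac{1}{2}$$

The last step needs k to be at least a constant factor larger than s/p; k ≥ 3s suffices for these values of t and p.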

### The Probability of Linearity

### Right Subproblems

◦ The number of vertices in all right subproblems is at most n/2 (proved in Theorem 4.2)

◦ n/2 is therefore an upper bound on the total number of heads in the nickel flips

### The probability

◦ Fewer than n/2 heads occur in a sequence of 3m nickel tosses

m + n ≤ 3m since n/2 ≤ m

The probability is exp(-Ω(m)) by a Chernoff bound
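To see how fast this probability shrinks, the exact binomial tail can be compared with the standard exponential-moment bound Pr[X < s] ≤ e^{st}(1 - p + p e^{-t})^{k}, which holds for any t > 0. The sketch below is illustrative only: the function names are hypothetical, t = 1/2 is an assumed choice, and s is set to m, the largest value n/2 can take.

```python
from math import comb, exp

def prob_fewer_heads(k, s, p=0.5):
    """Exact Pr[X < s] for X ~ Binomial(k, p)."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(s))

def chernoff_bound(k, s, p=0.5, t=0.5):
    """Upper bound e^{st} (1 - p + p e^{-t})^k on Pr[X < s], for any t > 0."""
    return exp(s * t) * (1 - p + p * exp(-t)) ** k

for m in (20, 40, 80):
    k, s = 3 * m, m  # 3m nickel tosses, fewer than n/2 <= m heads
    print(m, prob_fewer_heads(k, s), chernoff_bound(k, s))
```

Both columns drop geometrically as m doubles, which is the exp(-Ω(m)) behaviour claimed above.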

### The Probability of Linearity

### Left Subproblems

◦ Sequence: every sequence ends with a tail, that is, HH…HHT

◦ The number of tails is at most the number of sequences

◦ Assume that there are at most m’ edges in the root problem and in all right subproblems

### The probability

◦ m’ tails occur in a sequence of more than 3m’ coin tosses

The probability is exp(-Ω(m)) by a Chernoff bound

### The Probability of Linearity

### Combining Right & Left Subproblems

◦ The total number of edges is O(m) with high probability: at least 1 – exp(-Ω(m))