A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees
黃則翰 R96922141 蘇承祖 R96922077 張紘睿 R96922136 許智程 D95922022 戴于晉 R96922171
David R. Karger Philip N. Klein Robert E. Tarjan
Outline
Introduction
Basic Property & Definition
Algorithm
Analysis
Introduction
[Borůvka 1926] O(m log n)
Gabow et al.[1984] O(m log β(m,n) )
◦β(m,n) = min { i | log^(i) n ≤ m/n }, where log^(i) is the logarithm iterated i times
Verification algorithm
◦King[1993] O(m)
A randomized algorithm runs in O(m) time with high probability
Cycle property
For any cycle C in a graph, the heaviest edge in C does not appear in the minimum spanning forest.
[Figure: example graph with edge weights 2, 3, 5, 6.]
Cut Property
For any proper nonempty subset X of the vertices, the lightest edge with exactly one endpoint in X belongs to the minimum spanning tree
[Figure: example graph with edge weights 2, 3, 5, 6 and a vertex subset X.]
Definition
Let G be a graph with weighted edges.
◦w(x,y)
The weight of edge {x,y}
If F is a forest that is a subgraph of G
◦F(x, y)
the path (if any) connecting x and y in F
◦w_F(x, y)
the maximum weight of an edge on F(x, y)
◦w_F(x, y) = ∞
if x and y are not connected in F
F-heavy & F-light
An edge {x,y} is F-heavy if w(x,y) > w_F(x,y)
and F-light otherwise
Edges of F are all F-light
[Figure: example forest F on vertices A to H with edge weights 2, 3, 5 and 6.]
w(B,D) = 6, w_F(B,D) = max{2, 3, 5} = 5: F-heavy
w(F,H) = 6, w_F(F,H) = ∞: F-light
w(C,D) = 5, w_F(C,D) = 5: F-light
Observation
No F-heavy edge can be in the minimum spanning forest of G (cycle property)
◦Discard the edges that cannot be in the minimum spanning tree
Every F-light edge remains a candidate edge for the minimum spanning tree of G
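The F-heavy / F-light test can be sketched in a few lines of Python. This is only an illustration: the function names `max_weight_on_path` and `classify` are made up here, the path search is a plain depth-first search (not a linear-time verification algorithm), and the edge layout of `FOREST` is an assumption chosen to reproduce the values on the example slide.

```python
import math
from collections import defaultdict

def max_weight_on_path(forest_edges, x, y):
    """Return w_F(x, y): the heaviest edge on the forest path from x to y,
    or math.inf when x and y are not connected in F."""
    adj = defaultdict(list)
    for u, v, w in forest_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Depth-first search from x, remembering the heaviest edge seen so far.
    heaviest = {x: -math.inf}
    stack = [x]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in heaviest:
                heaviest[v] = max(heaviest[u], w)
                stack.append(v)
    return heaviest.get(y, math.inf)

def classify(forest_edges, edge):
    """An edge is F-heavy if w(x, y) > w_F(x, y), and F-light otherwise."""
    x, y, w = edge
    return "F-heavy" if w > max_weight_on_path(forest_edges, x, y) else "F-light"

# Assumed layout: B-A-C-D is a path with weights 2, 3, 5; H is isolated in F.
FOREST = [("A", "B", 2), ("A", "C", 3), ("C", "D", 5), ("E", "F", 2), ("E", "G", 3)]

print(classify(FOREST, ("B", "D", 6)))  # F-heavy: 6 > max{2, 3, 5}
print(classify(FOREST, ("F", "H", 6)))  # F-light: w_F(F, H) is infinite
print(classify(FOREST, ("C", "D", 5)))  # F-light: 5 is not greater than 5
```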
Borůvka Algorithm
For each vertex, select the minimum-weight edge incident to the vertex.
Replace each connected component defined by the selected edges with a single vertex.
Delete all resulting isolated vertices, loops, and all but the lowest-weight edge among each set of multiple edges.
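A single Borůvka step as described above might look like the following sketch. It assumes vertices are numbered 0..num_vertices-1 and that edge weights are distinct; the function name `boruvka_step` and the return format are choices made for this example, not part of the original slides.

```python
def boruvka_step(num_vertices, edges):
    """One Boruvka step. edges is a list of (u, v, weight) with distinct weights.
    Returns (selected_edges, contracted_edges)."""
    # 1. For each vertex, select the minimum-weight incident edge.
    cheapest = {}
    for u, v, w in edges:
        if u == v:
            continue  # ignore loops
        for x in (u, v):
            if x not in cheapest or w < cheapest[x][2]:
                cheapest[x] = (u, v, w)
    selected = set(cheapest.values())

    # 2. Contract each component of the selected edges (union-find).
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _ in selected:
        parent[find(u)] = find(v)

    # 3. Drop loops and keep only the lightest edge between two components.
    #    Isolated vertices disappear because no remaining edge mentions them.
    lightest = {}
    for u, v, w in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        key = (min(ru, rv), max(ru, rv))
        if key not in lightest or w < lightest[key]:
            lightest[key] = w
    contracted = [(a, b, w) for (a, b), w in lightest.items()]
    return selected, contracted

EDGES = [(0, 1, 1), (1, 2, 4), (2, 3, 2), (0, 3, 5)]
selected, contracted = boruvka_step(4, EDGES)
print(sorted(selected))  # the two selected edges: (0, 1, 1) and (2, 3, 2)
print(contracted)        # a single edge of weight 4 between the two components
```

Every vertex with an incident edge is merged with at least one neighbour, which is why one step at least halves the number of vertices.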
Algorithm Step1
Apply two successive Borůvka steps to the graph, thereby reducing the number of vertices by at least a factor of four.
Algorithm Step2
Choose a subgraph H by selecting each edge independently with probability ½.
Apply the algorithm recursively to H, producing a minimum spanning forest F of H.
Find all the F-heavy edges and delete them from the contracted graph.
Algorithm Step3
Apply the algorithm recursively to the remaining graph to compute a spanning forest F’. Return those edges contracted in Step1 together with the edges of F’.
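The three steps can be put together in a compact Python sketch. Several things here are assumptions made for illustration: edges are tuples (u, v, weight, edge_id) with integer vertices, distinct weights and no loops; the helper names (`boruvka_step`, `f_light_edges`, `msf`) are invented; and the F-heavy filter uses a naive path search instead of a linear-time verification algorithm, so this sketch shows the structure of the recursion but does not achieve the O(m) bound.

```python
import math
import random
from collections import defaultdict

def boruvka_step(edges):
    """Contract each vertex's lightest edge. Returns (chosen ids, contracted edges)."""
    cheapest = {}
    for e in edges:
        for x in (e[0], e[1]):
            if x not in cheapest or e[2] < cheapest[x][2]:
                cheapest[x] = e
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = set()
    for u, v, w, eid in set(cheapest.values()):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.add(eid)
    lightest = {}  # lightest surviving edge between each pair of components
    for u, v, w, eid in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            key = (min(ru, rv), max(ru, rv))
            if key not in lightest or w < lightest[key][2]:
                lightest[key] = (ru, rv, w, eid)
    return chosen, list(lightest.values())

def f_light_edges(forest, edges):
    """Keep the edges that are not F-heavy (naive search, not linear time)."""
    adj = defaultdict(list)
    for u, v, w, _ in forest:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def w_f(x, y):
        best, stack = {x: -math.inf}, [x]
        while stack:
            a = stack.pop()
            for b, w in adj[a]:
                if b not in best:
                    best[b] = max(best[a], w)
                    stack.append(b)
        return best.get(y, math.inf)

    return [e for e in edges if e[2] <= w_f(e[0], e[1])]

def msf(edges, rng):
    """Return the ids of the edges in the minimum spanning forest."""
    result = set()
    for _ in range(2):                      # Step 1: two Boruvka steps
        chosen, edges = boruvka_step(edges)
        result |= chosen
    if not edges:
        return result
    by_id = {e[3]: e for e in edges}
    sample = [e for e in edges if rng.random() < 0.5]   # Step 2: sample H
    forest = [by_id[i] for i in msf(sample, rng)]       # F = MSF of H
    remaining = f_light_edges(forest, edges)            # drop F-heavy edges
    return result | msf(remaining, rng)                 # Step 3

EDGES = [(0, 1, 1, 0), (1, 2, 2, 1), (2, 3, 3, 2), (3, 0, 4, 3), (0, 2, 5, 4)]
print(sorted(msf(EDGES, random.Random(0))))  # [0, 1, 2]
```

Because the weights are distinct, the minimum spanning forest is unique, so the result does not depend on the random choices; only the amount of work does.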
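[Figure: recursion diagram. The original problem G is reduced by Borůvka × 2 to G*. Left sub-problem: sample H from G* with p=0.5 and return the minimum spanning forest F of H. Right sub-problem: delete the F-heavy edges from G* to obtain G’ and compute F’.]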
Correctness
By the cut property, every edge contracted during Step1 is in the MSF.
By the cycle property, the edges deleted in Step2 do NOT belong to the MSF.
By the induction hypothesis, the MSF of the remaining graph is correctly determined in the recursive call of Step3.
Candidate Edge of MST
The expected number of F-light edges in G is at most n/p (negative binomial)
For every sample graph H, the expected number of candidate edges for the MST of G (the F-light edges) is at most n/p
Random-sampling
To help discard edges that cannot be in the minimum spanning tree
Construct the sample graph H
◦Process the edges in increasing order of weight
◦To process an edge e
◦1. Test whether both endpoints of e are in the same component of F (if so, e is F-heavy)
◦2. Include the edge in H with probability p
◦3. If e is in H and is F-light, add e to the forest F
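The edge-by-edge construction above can be simulated directly. The sketch below assumes vertices 0..n-1, edges given as (u, v, weight) with distinct weights, and a union-find structure for the component test; the function name `sample_and_build_forest` is invented for this example.

```python
import random

def sample_and_build_forest(n, edges, p, rng):
    """Build H and F edge by edge. Returns (edges of H, edges of F, F-light edges)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sampled, forest, f_light = [], [], []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # increasing weight
        ru, rv = find(u), find(v)
        light = ru != rv          # 1. different components: e is F-light
        in_h = rng.random() < p   # 2. coin flip: include e in H?
        if in_h:
            sampled.append((u, v, w))
        if light:
            f_light.append((u, v, w))
            if in_h:              # 3. e is in H and F-light: add it to F
                parent[ru] = rv
                forest.append((u, v, w))
    return sampled, forest, f_light

# Complete graph on 20 vertices with distinct random weights.
rng = random.Random(0)
n, p = 20, 0.5
pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
weights = rng.sample(range(10_000), len(pairs))
edges = [(u, v, w) for (u, v), w in zip(pairs, weights)]
sampled, forest, f_light = sample_and_build_forest(n, edges, p, rng)
print(len(forest), len(f_light))  # |F| is at most n - 1; compare len(f_light) with n / p
```

Processing edges in increasing order is what makes the component test equivalent to the F-light test: any path already present in F consists only of lighter edges.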
Random-sampling
[Figure: example graph G on vertices A to G with edge weights 3, 4, 5, 6, 7, 9, 10, 11, 13, 14; the sampled subgraph H and its forest F.]
w(E,G) = 14, w_F(E,G) = max{5, 6, 9, 13} = 13: F-heavy
w(E,F) = 11, w_F(E,F) = max{5, 6, 9} = 9: F-heavy
w(D,F) = 9, w_F(D,F) = 9: F-light
w(A,B) = 7, w_F(A,B) = ∞: F-light
Random-sampling
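[Figure: example graph G on vertices A to G with edge weights 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, and the forest F.]
Edge-by-edge procedure:
1. Process the edges in increasing order
2. If the edge is F-light, throw a coin: select it or don’t select it
3. Else, throw a coin; the outcome cannot change F
Equivalent to:
1. Randomly select edges into H
2. Find F of H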
Observation
No F-heavy edge can be in the minimum spanning forest of G (cycle property)
Every F-light edge remains a candidate edge for the minimum spanning tree of G
The forest F produced is the forest that Kruskal’s algorithm would produce on H, and the F-light edges include every edge of the MSF of G
Observation
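The size of F is at most n-1
The expected number of F-light edges in G is at most n/p (negative binomial)
◦Each F-light edge is a coin flip that succeeds (joins F) with probability p; let k be the number of failures before the n-th success
◦$f(k; n, p) = \binom{k+n-1}{k} p^n (1-p)^k$
◦Mean of k $= \frac{n(1-p)}{p}$
◦Expected number of flips $= n + \frac{n(1-p)}{p} = \frac{n}{p}$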
Analysis of the Algorithm
The worst case.
The expected running time.
The probability that the running time is linear.
Running time Analysis
Total running time = sum of the running times of all steps.
Step(1): 2 steps of Borůvka’s algorithm
Step(2): Dixon-Rauch-Tarjan verification algorithm.
Each takes time linear in the number of edges.
◦Estimate the total number of edges.
Observe the recursion tree
G=(V,E), |V| = n, |E| = m.
◦m ≥ n/2 since there are no isolated vertices.
Each problem generates at most 2 subproblems.
◦At depth d, there are at most 2^d nodes.
◦Each node at depth d has at most n/4^d vertices.
The depth d is at most log_4 n.
◦There are at most $\sum_{d=0}^{\infty} 2^d \cdot n/4^d = \sum_{d=0}^{\infty} n/2^d = 2n$ vertices in all subproblems
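As a quick numeric check of the vertex bound (at most 2^d nodes at depth d, each with at most n/4^d vertices), the following sketch sums the per-level bounds down to depth log_4 n. The function name `total_vertex_bound` is made up for this example.

```python
def total_vertex_bound(n):
    """Sum of 2^d * n / 4^d over all depths d = 0 .. ceil(log_4 n)."""
    depth = 0
    while 4 ** depth < n:  # smallest depth with 4^depth >= n
        depth += 1
    return sum((2 ** d) * n / (4 ** d) for d in range(depth + 1))

print(total_vertex_bound(1024))  # 2016.0, which is below 2 * 1024 = 2048
```

Each level contributes half as much as the previous one, so the total stays below 2n however deep the recursion goes.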
The worst case
Theorem 4.1 The worst-case running time of the minimum-spanning-forest algorithm is O(min{n^2, m log n}), the same as the bound for Borůvka’s algorithm.
Proof: There are two ways to estimate the total number of edges.
1. A subproblem at depth d contains at most (n/4^d)^2/2 edges.
Total edges in all subproblems:
$\sum_{d=0}^{\log_4 n} 2^d \cdot \frac{(n/4^d)^2}{2} = O(n^2)$
The worst case
2. Consider a subproblem G=(V,E). After Step(1) we have G’=(V’,E’) with |E’| ≤ |E| - |V|/2 and |V’| ≤ |V|/4.
Edges in left child = |H|
Edges in right child ≤ |E’| - |H| + |F|
So the number of edges in the two subproblems is at most:
(|H|) + (|E’| - |H| + |F|)
= |E’| + |F| ≤ |E| - |V|/2 + |V|/4 ≤ |E|, since |F| ≤ |V’| ≤ |V|/4
The two subproblems together contain at most |E| edges.
The worst case
[Figure: recursion tree of depth at most log n; each level contains at most m edges in total.]
The worst case
The depth is at most log_4 n and each level has at most m edges, so there are at most O(m log n) edges in total.
The worst-case running time of the minimum-spanning-forest algorithm is O(min{n^2, m log n}).
Analysis – Average Case (1/8)
Theorem: The expected running time of the minimum spanning forest algorithm is O(m)
◦Calculate the expected total number of edges over all left-path problems
[Figure: recursion tree with the Original Problem at the root, its Left Sub-problem and Right Sub-problem, and their Left and Right Subsub-problems.]
Analysis – Average Case (2/8)
Calculate the expected total number of edges on one left path starting at a problem with m’ edges
Evaluate the total number of edges over all right sub-problems
[Figure: two left paths, each starting at a problem with # of edges = m’ and each with expected total edge number ≤ 2m’.]
Analysis – Average Case (3/8)
Calculate the expected total number of edges on one left path starting at a problem with m’ edges
[Figure: Original Problem G; Borůvka × 2 gives G*; sampling with p=0.5 gives the Left Sub-problem H; the Right Sub-problem is G’.]
1. E[edge number of H] = 0.5 × edge number of G*
2. ∵ Borůvka × 2
∴ edge number of G* ≤ edge number of G
E[edge number of H] ≤ 0.5 × edge number of G
Analysis – Average Case (4/8)
Calculate the expected total number of edges on one left path starting at a problem with m’ edges
[Figure: the same diagram; along a left path starting with # of edges = m’, each left child has an expected # of edges ≤ 0.5 × that of its parent.]
E[edge number of H] ≤ 0.5 × edge number of G
Expected total edge number ≤ $\sum_{i=0}^{\infty} m'/2^i = 2m'$
Analysis – Average Case (5/8)
Calculate the expected total number of edges on one left path L starting at a problem with m’ edges
◦Expected total edge number on L ≤ 2m’ (done)
Evaluate the total number of edges over all right sub-problems
◦E[total edges of all right sub-problems] ≤ n
Analysis – Average Case (6/8)
Evaluate the total number of edges over all right sub-problems
◦To prove: E[total edges of all right sub-problems] ≤ n
[Figure: Original Problem G; Borůvka × 2 gives G*; sample with p=0.5 to get H; return the minimum forest F of H; delete the F-heavy edges from G* to get the Right Sub-problem G’.]
1. ∵ Borůvka × 2
∴ vertex number of G* ≤ 0.25 × vertex number of G
2. Based on Lemma 2.1:
E[edge number of G’] ≤ 2 × vertex number of G*
E[edge number of G’] ≤ 0.5 × vertex number of G
Analysis – Average Case (7/8)
Evaluate the total number of edges over all right sub-problems
◦To prove: E[total edges of all right sub-problems] ≤ n
E[edge number of G’] ≤ 0.5 × vertex number of G
Depth 0: # of vertices of the original problem = n; # of edges of right sub-problems ≤ n/2
Depth 1: # of vertices of sub-problems ≤ 2×n/4; # of edges of right sub-problems ≤ 2×n/8
Depth 2: # of vertices of sub-problems ≤ 4×n/4^2; # of edges of right sub-problems ≤ 4×n/(4^2×2)
Depth 3: # of vertices of sub-problems ≤ 8×n/4^3; # of edges of right sub-problems ≤ 8×n/(4^3×2)
Depth 4: # of vertices of sub-problems ≤ 16×n/4^4
Total: n/2 + 2×n/8 + 4×n/(4^2×2) + 8×n/(4^3×2) + … = n
Analysis – Average Case (8/8)
Calculate the expected total number of edges on one left path starting at a problem with m’ edges
◦Expected total edge number for one left path ≤ 2m’
Evaluate the total number of edges over all right sub-problems
◦E[total edges of all right sub-problems] ≤ n
E[processed edges in the original problem and all sub-problems] ≤ 2×(m+n)
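One way to see how the two bounds combine is the short derivation below. It assumes, as in the preceding slides, that every subproblem lies on exactly one left path and that each left path starts either at the original problem or at a right sub-problem. The symbols $T$ (total number of processed edges) and $m_R$ (number of edges of right sub-problem $R$) are introduced here only for illustration.

```latex
\begin{align*}
E[T] &\le 2m + \sum_{R} 2\,E[m_R]
      && \text{(a left path costs at most twice its starting edge count)} \\
     &= 2\Bigl(m + E\Bigl[\sum_{R} m_R\Bigr]\Bigr) \\
     &\le 2(m + n)
      && \text{(all right sub-problems have at most } n \text{ edges in expectation)}
\end{align*}
```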
The Probability of Linearity
Theorem 4.3
◦The minimum spanning forest algorithm runs in Ο(m) time with probability 1 – exp(-Ω(m))
The Probability of Linearity
Chernoff Bound:
Given x_i as i.i.d. random variables for 0 < i ≤ n, and X the sum of all x_i, for t > 0 we have
$\Pr[X \ge A] \le e^{-tA} \prod_{i=1}^{n} E[e^{t x_i}]$
Thus, the probability of fewer than s successes (each with chance p) within k trials is
$\Pr[X < s] \le e^{st} \prod_{i=1}^{k} E[e^{-t x_i}] = e^{st} (1 - p + p e^{-t})^k = e^{-\Omega(s)}$, for $t = \frac{1}{2}$ and $p = \frac{1}{2}$ (with k ≥ 3s, as in the applications that follow)
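To get a feel for the bound, the sketch below compares the exact probability of fewer than s successes in k fair coin tosses with the Chernoff-style bound e^{ts}(1 - p + p e^{-t})^k. The choices k = 3s and t = 1/2 are assumptions made for this example, and the function names are invented.

```python
import math

def exact_tail(k, p, s):
    """Pr[X < s] for X ~ Binomial(k, p), computed exactly from the definition."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(s))

def chernoff_bound(k, p, s, t):
    """e^{ts} * (1 - p + p e^{-t})^k, an upper bound on Pr[X < s] for t > 0."""
    return math.exp(t * s) * (1 - p + p * math.exp(-t)) ** k

for s in (5, 10, 20, 40):
    k = 3 * s
    print(s, exact_tail(k, 0.5, s), chernoff_bound(k, 0.5, s, 0.5))
```

With these parameters the bound is a constant smaller than 1 raised to the power s, so it decays exponentially in s, which is the e^{-Ω(s)} behaviour used in the slides.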
The Probability of Linearity
Right Subproblems
◦The total number of vertices in all right subproblems is at most n/2 (proved in Theorem 4.2)
◦n/2 is an upper bound on the total number of heads in the nickel-flips
Right Subproblems
The probability
◦The bad event: fewer than n/2 heads occur in a sequence of 3m nickel-tosses
◦m + n ≤ 3m since n/2 ≤ m
◦The probability is exp(-Ω(m)) by a Chernoff bound
The Probability of Linearity
Left Subproblems
◦Sequence: every sequence ends with a tail, that is, HH…HHT
◦The number of tails is at most the number of sequences
◦Assume that there are at most m’ edges in the root problem and in all right subproblems
Left Subproblems
The probability
◦The bad event: only m’ tails occur in a sequence of more than 3m’ coin-tosses
◦The probability is exp(-Ω(m)) by a Chernoff bound
The Probability of Linearity
Combining Right & Left Subproblems
◦The total number of edges is Ο(m) with probability 1 – exp(-Ω(m))