Non-projective Dependency Parsing using Spanning Tree Algorithms
R98922004 Yun-Nung Chen 陳縕儂 (first-year M.S. student, Computer Science and Information Engineering)
1/39
Reference
Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005)
Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic
Introduction
Example of Dependency Tree
Each word depends on exactly one parent
Projective
Words in linear order, satisfying:
▪ Edges do not cross
▪ A word and its descendants form a contiguous substring of the sentence
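The non-crossing condition can be checked mechanically by comparing edge spans; a minimal sketch (the heads-array encoding is my own choice, not from the slides):

```python
def is_projective(heads):
    """Check the non-crossing condition. heads[i] gives the parent of
    word i+1 in the sentence (words are 1-indexed, 0 = artificial root)."""
    # Each dependency (h, d) covers the span [min(h, d), max(h, d)].
    edges = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in edges:
        for a2, b2 in edges:
            if a1 < a2 < b1 < b2:  # spans overlap but neither contains the other
                return False
    return True

# "John hit the ball with the bat": every subtree is a contiguous substring
print(is_projective([2, 0, 4, 2, 2, 7, 5]))   # True
# A crossing attachment (word 1 heads word 3 across word 2's root attachment)
print(is_projective([2, 0, 1]))               # False
```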
Non-projective Examples
English
Mostly projective, some non-projective
Languages with more flexible word order
Mostly non-projective
▪ German, Dutch, Czech
Advantages of Dependency Parsing
Related work
relation extraction
machine translation
Main Idea of the Paper
Dependency parsing can be formalized as
the search for a maximum spanning tree in a directed graph
Dependency Parsing and Spanning Trees
Edge-based Factorization (1/3)
sentence: x = x1 … xn
the dense directed graph Gx = (Vx, Ex) given by
▪ Vx = {x0 = root, x1, …, xn}
▪ Ex = {(i, j) : i ≠ j, 0 ≤ i ≤ n, 1 ≤ j ≤ n} (no edge enters the root)
dependency tree for x: y
the tree Gy = (Vy, Ey)
▪ Vy = Vx
▪ Ey = {(i, j) : there is a dependency from xi to xj}
Edge-based Factorization (2/3)
score of an edge: s(i, j) = w · f(i, j), the dot product of a weight vector w and a feature vector f(i, j) of the edge
score of a dependency tree y for sentence x: s(x, y) = Σ(i,j)∈y s(i, j) = Σ(i,j)∈y w · f(i, j)
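The two definitions can be sketched in a few lines of Python (the feature vectors and weights below are made-up toys; the paper's real features are high-dimensional indicators over words and PoS tags):

```python
def edge_score(w, f):
    # s(i, j) = w · f(i, j): dot product of weight and feature vectors
    return sum(wk * fk for wk, fk in zip(w, f))

def tree_score(w, tree, feats):
    # s(x, y) = sum of s(i, j) over all edges (i, j) in tree y
    return sum(edge_score(w, feats[edge]) for edge in tree)

# Hypothetical 3-dimensional feature vectors for a few candidate edges
feats = {
    ("root", "hit"): [1.0, 0.0, 1.0],
    ("hit", "John"): [0.0, 1.0, 1.0],
    ("hit", "ball"): [1.0, 1.0, 0.0],
}
w = [0.5, 1.0, 2.0]  # a (hypothetical) learned weight vector
y1 = [("root", "hit"), ("hit", "John"), ("hit", "ball")]
print(tree_score(w, y1, feats))  # 2.5 + 3.0 + 1.5 = 7.0
```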
Edge-based Factorization (3/3)
x = John hit the ball with the bat
[Figure: three candidate dependency trees y1, y2, y3 for x; in y1, hit is the child of root and heads John, ball, and with]
Two Focus Points
1) How to learn the weight vector w
2) How to find the tree with the maximum score
Maximum Spanning Trees
dependency trees for x
= spanning trees for Gx
the dependency tree with maximum score for x = the maximum spanning tree of Gx
Maximum Spanning Tree Algorithm
Chu-Liu-Edmonds Algorithm (1/12)
Input: graph G = (V, E)
Output: a maximum spanning tree in G
① greedily select the incoming edge with highest weight for each vertex
▪ Tree: terminate and output
▪ Cycle in G: go to ②
② contract the cycle into a single vertex and recalculate edge weights going into and out of the cycle
Chu-Liu-Edmonds Algorithm (2/12)
x = John saw Mary
[Figure: initial graph Gx with edge scores root→John 9, root→saw 10, root→Mary 9, John→saw 20, John→Mary 3, saw→John 30, saw→Mary 30, Mary→John 11, Mary→saw 0]
Chu-Liu-Edmonds Algorithm (3/12)
For each word, find the highest-scoring incoming edge
[Figure: Gx with the selected edges highlighted: saw→John 30, John→saw 20, saw→Mary 30]
Chu-Liu-Edmonds Algorithm (4/12)
If the result includes a
▪ Tree: terminate and output
▪ Cycle: contract and recalculate
[Figure: the selected edges contain the cycle John ⇄ saw]
Chu-Liu-Edmonds Algorithm (5/12)
Contract and recalculate
▪ Contract the cycle into a single node
▪ Recalculate edge weights going into and out of the cycle
[Figure: Gx with the cycle John ⇄ saw about to be contracted into a single node]
Chu-Liu-Edmonds Algorithm (6/12)
Outgoing edges of the cycle: take the maximum weight over the cycle's nodes
[Figure: contracted node C with outgoing edge C→Mary 30, realized by saw→Mary]
Chu-Liu-Edmonds Algorithm (7/12)
Incoming edges of the cycle: rescore as s(xi, xj) − s(a(xj), xj) + s(C), where a(xj) is xj's currently selected parent inside the cycle and s(C) is the total score of the cycle
[Figure: Gx with the candidate incoming edges into the contracted node]
Chu-Liu-Edmonds Algorithm (8/12)
Incoming edges from x = root
▪ s(root, John) − s(a(John), John) + s(C) = 9 − 30 + 50 = 29
▪ s(root, saw) − s(a(saw), saw) + s(C) = 10 − 20 + 50 = 40
[Figure: edge root→C takes score 40, realized by root→saw]
Chu-Liu-Edmonds Algorithm (9/12)
Incoming edges from x = Mary
▪ s(Mary, John) − s(a(John), John) + s(C) = 11 − 30 + 50 = 31
▪ s(Mary, saw) − s(a(saw), saw) + s(C) = 0 − 20 + 50 = 30
[Figure: edge Mary→C takes score 31, realized by Mary→John]
Chu-Liu-Edmonds Algorithm (10/12)
Keep the highest-scoring structure inside the cycle
Run the algorithm recursively on the contracted graph
[Figure: contracted graph with root→C 40, Mary→C 31, C→Mary 30, root→Mary 9]
Chu-Liu-Edmonds Algorithm (11/12)
Find the incoming edge with the highest score for each node
Tree: terminate and output
[Figure: selected edges root→C 40 and C→Mary 30 form a tree]
Chu-Liu-Edmonds Algorithm (12/12)
Maximum Spanning Tree of Gx
[Figure: expanding the contracted node gives the MST root→saw 10, saw→John 30, saw→Mary 30, total score 70]
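The full walkthrough above (greedy selection, cycle contraction with rescoring, recursion, expansion) can be sketched as a recursive implementation. The edge-dictionary representation and integer node ids are my own choices; the example reproduces the John/saw/Mary graph:

```python
def find_cycle(parent):
    """Return the set of nodes on a cycle in the parent map, or None."""
    for start in parent:
        seen, node = set(), start
        while node in parent:          # the root has no entry, so we stop there
            if node in seen:           # revisited node: it lies on a cycle
                cycle, cur = {node}, parent[node]
                while cur != node:
                    cycle.add(cur)
                    cur = parent[cur]
                return cycle
            seen.add(node)
            node = parent[node]
    return None

def chu_liu_edmonds(scores, root=0):
    """Maximum spanning arborescence. scores[(h, d)] = weight of edge h -> d,
    nodes are integers. Returns {dependent: head} for every non-root node."""
    nodes = {h for h, _ in scores} | {d for _, d in scores}
    nodes.discard(root)
    # 1) greedily pick the highest-scoring incoming edge for every word
    best = {d: max((s, h) for (h, dd), s in scores.items() if dd == d)
            for d in nodes}
    parent = {d: h for d, (_, h) in best.items()}
    cycle = find_cycle(parent)
    if cycle is None:
        return parent                  # already a tree: done
    # 2) contract the cycle into a fresh node c and rescore its edges
    c = max(nodes) + 1
    s_cycle = sum(best[d][0] for d in cycle)
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in cycle and d in cycle:
            continue                   # edges inside the cycle disappear
        if h in cycle:
            key, val = (c, d), s       # outgoing: best realization kept below
        elif d in cycle:
            key, val = (h, c), s - best[d][0] + s_cycle   # incoming, rescored
        else:
            key, val = (h, d), s
        if key not in new_scores or val > new_scores[key]:
            new_scores[key] = val
            origin[key] = (h, d)       # remember the original edge
    # 3) recurse on the contracted graph, then expand the cycle
    tree, broken = {}, None
    for d, h in chu_liu_edmonds(new_scores, root).items():
        oh, od = origin[(h, d)]
        tree[od] = oh
        if d == c:
            broken = od                # the cycle node that changes its head
    for d in cycle:
        if d != broken:
            tree[d] = parent[d]        # keep the rest of the cycle intact
    return tree

# The example from the slides: 0 = root, 1 = John, 2 = saw, 3 = Mary
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9,
          (1, 2): 20, (1, 3): 3,
          (2, 1): 30, (2, 3): 30,
          (3, 1): 11, (3, 2): 0}
print(chu_liu_edmonds(scores))  # heads: John<-saw, saw<-root, Mary<-saw
```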
Complexity of Chu-Liu-Edmonds Algorithm
Each recursive call takes O(n²) to find the highest-scoring incoming edge for each word
At most O(n) recursive calls (contracting at most n times)
Total: O(n³)
Tarjan gives an efficient implementation of the algorithm with O(n²) for dense graphs
Algorithm for Projective Trees
Eisner Algorithm: O(n³)
Using bottom-up dynamic programming
Maintain the nested structural constraint (non-crossing constraint)
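A score-only sketch of the Eisner dynamic program (backpointers for recovering the tree are omitted; the span/direction indexing follows the standard first-order formulation, not code from the paper):

```python
def eisner_score(score):
    """Best projective tree score. score[h][m] = weight of arc h -> m,
    index 0 is the artificial root; impossible arcs are -inf."""
    n = len(score) - 1
    NEG = float("-inf")
    # C[s][t][d]: complete span; I[s][t][d]: incomplete span (contains the arc).
    # d = 1: head is the left endpoint s; d = 0: head is the right endpoint t.
    C = [[[0.0, 0.0] for _ in range(n + 1)] for _ in range(n + 1)]
    I = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    for k in range(1, n + 1):            # span length
        for s in range(0, n - k + 1):
            t = s + k
            # attach: join two complete half-spans and add one new arc
            best = max(C[s][q][1] + C[q + 1][t][0] for q in range(s, t))
            I[s][t][0] = best + score[t][s]
            I[s][t][1] = best + score[s][t]
            # complete: extend an incomplete span with a complete one
            C[s][t][0] = max(C[s][q][0] + I[q][t][0] for q in range(s, t))
            C[s][t][1] = max(I[s][q][1] + C[q][t][1] for q in range(s + 1, t + 1))
    return C[0][n][1]                    # root spans the whole sentence

NEG = float("-inf")
# Same scores as the John saw Mary example (0 = root)
W = [[NEG] * 4 for _ in range(4)]
for (h, m), s in {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (1, 3): 3,
                  (2, 1): 30, (2, 3): 30, (3, 1): 11, (3, 2): 0}.items():
    W[h][m] = s
print(eisner_score(W))  # 70.0: here the best tree happens to be projective
```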
Online Large Margin Learning
Online Large Margin Learning
Supervised learning
Target: learn the weight vector w for the edge features (e.g. PoS tags)
Training data: pairs T = {(xt, yt)} of a sentence and its correct dependency tree
Testing data: x
MIRA Learning Algorithm
Margin Infused Relaxed Algorithm (MIRA)
dt(x): the set of possible dependency tree s for x
min ‖w(i+1) − w(i)‖ s.t. s(xt, yt) − s(xt, y′) ≥ L(yt, y′) for all y′ ∈ dt(xt)
▪ keep the new vector as close as possible to the old
final weight vector: the average of the weight vectors after each iteration
Single-best MIRA
Using only the single margin constraint for the best-scoring tree under the current weights
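With a single margin constraint the quadratic program has a closed-form solution, essentially a hinge step of size τ; a sketch with made-up feature vectors (not the paper's features):

```python
def mira_update(w, f_gold, f_pred, loss):
    """Single-best MIRA step: the smallest change to w (in Euclidean norm)
    making the gold tree outscore the predicted tree by at least `loss`."""
    diff = [g - p for g, p in zip(f_gold, f_pred)]
    violation = loss - sum(wk * dk for wk, dk in zip(w, diff))
    norm_sq = sum(dk * dk for dk in diff)
    if violation <= 0 or norm_sq == 0:
        return list(w)                 # constraint already satisfied
    tau = violation / norm_sq          # closed-form step for one constraint
    return [wk + tau * dk for wk, dk in zip(w, diff)]

w = [0.0, 0.0, 0.0]
f_gold = [1.0, 0.0, 1.0]   # feature vector of the correct tree (hypothetical)
f_pred = [0.0, 1.0, 0.0]   # feature vector of the current best tree
w = mira_update(w, f_gold, f_pred, loss=1.0)
print(w)                   # [0.333..., -0.333..., 0.333...]: margin is now exactly 1
```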
Factored MIRA
Local constraints: the correct incoming edge for each word j must outscore any other incoming edge for j by a margin of 1
▪ s(l(j), j) − s(k, j) ≥ 1, where l(j) is the correct parent of j and k any other candidate
Summed over a tree, the correct spanning tree then outscores an incorrect spanning tree by the number of incorrect edges in it
More restrictive than the original constraints
Experiments
Experimental Setting
Language: Czech
More flexible word order than English
▪ Non-projective dependency
Feature: Czech PoS tag
standard PoS, case, gender, tense
Ratio of non-projective and projective
Less than 2% of total edges are non-projective
▪ Czech-A: the entire PDT (Prague Dependency Treebank)
▪ Czech-B: only the 23% of sentences with at least one non-projective dependency
Compared Systems
COLL1999
The projective lexicalized phrase-structure parser
N&N2005
The pseudo-projective parser
McD2005
The projective parser using Eisner and 5-best MIRA
Single-best MIRA
Factored MIRA
The non-projective parser using Chu-Liu-Edmonds
36 /39
Results of Czech

System                    Czech-A (entire PDT)    Czech-B (non-projective only)
                          Accuracy   Complete     Accuracy   Complete
COLL1999 O(n⁵)            82.8       -            -          -
N&N2005                   80.0       31.8         -          -
McD2005 O(n³)             83.3       31.3         74.8       0.0
Single-best MIRA O(n²)    84.1       32.2         81.0       14.9
Factored MIRA O(n²)       84.4       32.3         81.5       14.3
Results of English

System                    Accuracy   Complete
McD2005 O(n³)             90.9       37.5
Single-best MIRA O(n²)    90.2       33.2
Factored MIRA O(n²)       90.2       32.3
English dependency trees are almost all projective
The Eisner algorithm exploits the a priori knowledge that all trees are projective
Thanks for your attention!