Non-projective Dependency Parsing using Spanning Tree Algorithms
R98922004 Yun-Nung Chen 陳縕儂 (first-year M.S. student, Computer Science and Information Engineering)
1/39
Reference
Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005)
Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic
Introduction
Example of Dependency Tree
Each word depends on exactly one parent
Projective
Words in linear order, satisfying:
▪ Edges do not cross
▪ A word and its descendants form a contiguous substring of the sentence
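The non-crossing condition can be checked mechanically by comparing edge spans; a minimal sketch (the heads-array encoding is my own choice, not from the slides):

```python
def is_projective(heads):
    """Check the non-crossing condition. heads[i] gives the parent of
    word i+1 in the sentence (words are 1-indexed, 0 = artificial root)."""
    # Each dependency (h, d) covers the span [min(h, d), max(h, d)].
    edges = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in edges:
        for a2, b2 in edges:
            if a1 < a2 < b1 < b2:  # spans overlap but neither contains the other
                return False
    return True

# "John hit the ball with the bat": every subtree is a contiguous substring
print(is_projective([2, 0, 4, 2, 2, 7, 5]))   # True
# A crossing attachment (word 1 heads word 3 across word 2's root attachment)
print(is_projective([2, 0, 1]))               # False
```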
Non-projective Examples
English
Mostly projective, some non-projective
Languages with more flexible word order
Mostly non-projective
▪ German, Dutch, Czech
Advantages of Dependency Parsing
Related work
relation extraction
machine translation
Main Idea of the Paper
Dependency parsing can be formalized as
the search for a maximum spanning tree in a directed graph
Dependency Parsing and Spanning Trees
Edge-based Factorization (1/3)
sentence: x = x1 … xn
the dense directed graph Gx = (Vx, Ex) given by
▪ Vx = {x0 = root, x1, …, xn}
▪ Ex = {(i, j) : i ≠ j, 0 ≤ i ≤ n, 1 ≤ j ≤ n} (no edge enters the root)
dependency tree for x: y
the tree Gy = (Vy, Ey)
▪ Vy = Vx
▪ Ey = {(i, j) : there is a dependency from xi to xj}
Edge-based Factorization (2/3)
score of an edge: s(i, j) = w · f(i, j), the dot product of a weight vector w and a feature vector f(i, j) of the edge
score of a dependency tree y for sentence x: s(x, y) = Σ(i,j)∈y s(i, j) = Σ(i,j)∈y w · f(i, j)
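The two definitions can be sketched in a few lines of Python (the feature vectors and weights below are made-up toys; the paper's real features are high-dimensional indicators over words and PoS tags):

```python
def edge_score(w, f):
    # s(i, j) = w · f(i, j): dot product of weight and feature vectors
    return sum(wk * fk for wk, fk in zip(w, f))

def tree_score(w, tree, feats):
    # s(x, y) = sum of s(i, j) over all edges (i, j) in tree y
    return sum(edge_score(w, feats[edge]) for edge in tree)

# Hypothetical 3-dimensional feature vectors for a few candidate edges
feats = {
    ("root", "hit"): [1.0, 0.0, 1.0],
    ("hit", "John"): [0.0, 1.0, 1.0],
    ("hit", "ball"): [1.0, 1.0, 0.0],
}
w = [0.5, 1.0, 2.0]  # a (hypothetical) learned weight vector
y1 = [("root", "hit"), ("hit", "John"), ("hit", "ball")]
print(tree_score(w, y1, feats))  # 2.5 + 3.0 + 1.5 = 7.0
```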
Edge-based Factorization (3/3)
x = John hit the ball with the bat
[Figure: three candidate dependency trees y1, y2, y3 for x; in y1, hit is the child of root and heads John, ball, and with]
Two Focus Points
1) How to learn the weight vector w
2) How to find the tree with the maximum score
Maximum Spanning Trees
dependency trees for x
= spanning trees for Gx
the dependency tree with maximum score for x = the maximum spanning tree of Gx
Maximum Spanning Tree Algorithm
Chu-Liu-Edmonds Algorithm (1/12)
Input: graph G = (V, E)
Output: a maximum spanning tree in G
① greedily select the incoming edge with highest weight for each vertex
▪ Tree: terminate and output
▪ Cycle in G: go to ②
② contract the cycle into a single vertex and recalculate edge weights going into and out of the cycle
Chu-Liu-Edmonds Algorithm (2/12)
x = John saw Mary
[Figure: initial graph Gx with edge scores root→John 9, root→saw 10, root→Mary 9, John→saw 20, John→Mary 3, saw→John 30, saw→Mary 30, Mary→John 11, Mary→saw 0]
Chu-Liu-Edmonds Algorithm (3/12)
For each word, find the highest-scoring incoming edge
[Figure: Gx with the selected edges highlighted: saw→John 30, John→saw 20, saw→Mary 30]
Chu-Liu-Edmonds Algorithm (4/12)
If the result includes a
▪ Tree: terminate and output
▪ Cycle: contract and recalculate
[Figure: the selected edges contain the cycle John ⇄ saw]
Chu-Liu-Edmonds Algorithm (5/12)
Contract and recalculate
▪ Contract the cycle into a single node
▪ Recalculate edge weights going into and out of the cycle
[Figure: Gx with the cycle John ⇄ saw about to be contracted into a single node]
Chu-Liu-Edmonds Algorithm (6/12)
Outgoing edges of the cycle: take the maximum weight over the cycle's nodes
[Figure: contracted node C with outgoing edge C→Mary 30, realized by saw→Mary]
Chu-Liu-Edmonds Algorithm (7/12)
Incoming edges of the cycle: rescore as s(xi, xj) − s(a(xj), xj) + s(C), where a(xj) is xj's currently selected parent inside the cycle and s(C) is the total score of the cycle
[Figure: Gx with the candidate incoming edges into the contracted node]
Chu-Liu-Edmonds Algorithm (8/12)
Incoming edges from x = root
▪ s(root, John) − s(a(John), John) + s(C) = 9 − 30 + 50 = 29
▪ s(root, saw) − s(a(saw), saw) + s(C) = 10 − 20 + 50 = 40
[Figure: edge root→C takes score 40, realized by root→saw]
Chu-Liu-Edmonds Algorithm (9/12)
Incoming edges from x = Mary
▪ s(Mary, John) − s(a(John), John) + s(C) = 11 − 30 + 50 = 31
▪ s(Mary, saw) − s(a(saw), saw) + s(C) = 0 − 20 + 50 = 30
[Figure: edge Mary→C takes score 31, realized by Mary→John]
Chu-Liu-Edmonds Algorithm (10/12)
Keep the highest-scoring structure inside the cycle
Run the algorithm recursively on the contracted graph
[Figure: contracted graph with root→C 40, Mary→C 31, C→Mary 30, root→Mary 9]
Chu-Liu-Edmonds Algorithm (11/12)
Find the incoming edge with the highest score for each node
Tree: terminate and output
[Figure: selected edges root→C 40 and C→Mary 30 form a tree]
Chu-Liu-Edmonds Algorithm (12/12)
Maximum Spanning Tree of Gx
[Figure: expanding the contracted node gives the MST root→saw 10, saw→John 30, saw→Mary 30, total score 70]
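The full walkthrough above (greedy selection, cycle contraction with rescoring, recursion, expansion) can be sketched as a recursive implementation. The edge-dictionary representation and integer node ids are my own choices; the example reproduces the John/saw/Mary graph:

```python
def find_cycle(parent):
    """Return the set of nodes on a cycle in the parent map, or None."""
    for start in parent:
        seen, node = set(), start
        while node in parent:          # the root has no entry, so we stop there
            if node in seen:           # revisited node: it lies on a cycle
                cycle, cur = {node}, parent[node]
                while cur != node:
                    cycle.add(cur)
                    cur = parent[cur]
                return cycle
            seen.add(node)
            node = parent[node]
    return None

def chu_liu_edmonds(scores, root=0):
    """Maximum spanning arborescence. scores[(h, d)] = weight of edge h -> d,
    nodes are integers. Returns {dependent: head} for every non-root node."""
    nodes = {h for h, _ in scores} | {d for _, d in scores}
    nodes.discard(root)
    # 1) greedily pick the highest-scoring incoming edge for every word
    best = {d: max((s, h) for (h, dd), s in scores.items() if dd == d)
            for d in nodes}
    parent = {d: h for d, (_, h) in best.items()}
    cycle = find_cycle(parent)
    if cycle is None:
        return parent                  # already a tree: done
    # 2) contract the cycle into a fresh node c and rescore its edges
    c = max(nodes) + 1
    s_cycle = sum(best[d][0] for d in cycle)
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in cycle and d in cycle:
            continue                   # edges inside the cycle disappear
        if h in cycle:
            key, val = (c, d), s       # outgoing: best realization kept below
        elif d in cycle:
            key, val = (h, c), s - best[d][0] + s_cycle   # incoming, rescored
        else:
            key, val = (h, d), s
        if key not in new_scores or val > new_scores[key]:
            new_scores[key] = val
            origin[key] = (h, d)       # remember the original edge
    # 3) recurse on the contracted graph, then expand the cycle
    tree, broken = {}, None
    for d, h in chu_liu_edmonds(new_scores, root).items():
        oh, od = origin[(h, d)]
        tree[od] = oh
        if d == c:
            broken = od                # the cycle node that changes its head
    for d in cycle:
        if d != broken:
            tree[d] = parent[d]        # keep the rest of the cycle intact
    return tree

# The example from the slides: 0 = root, 1 = John, 2 = saw, 3 = Mary
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9,
          (1, 2): 20, (1, 3): 3,
          (2, 1): 30, (2, 3): 30,
          (3, 1): 11, (3, 2): 0}
print(chu_liu_edmonds(scores))  # heads: John<-saw, saw<-root, Mary<-saw
```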
Complexity of Chu-Liu-Edmonds Algorithm
Each recursive call takes O(n²) to find the highest-scoring incoming edge for each word
At most O(n) recursive calls (contracting at most n times)
Total: O(n³)
Tarjan gives an efficient implementation of the algorithm with O(n²) for dense graphs
Algorithm for Projective Trees
Eisner Algorithm: O(n³)
Using bottom-up dynamic programming
Maintain the nested structural constraint (non-crossing constraint)
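A score-only sketch of the Eisner dynamic program (backpointers for recovering the tree are omitted; the span/direction indexing follows the standard first-order formulation, not code from the paper):

```python
def eisner_score(score):
    """Best projective tree score. score[h][m] = weight of arc h -> m,
    index 0 is the artificial root; impossible arcs are -inf."""
    n = len(score) - 1
    NEG = float("-inf")
    # C[s][t][d]: complete span; I[s][t][d]: incomplete span (contains the arc).
    # d = 1: head is the left endpoint s; d = 0: head is the right endpoint t.
    C = [[[0.0, 0.0] for _ in range(n + 1)] for _ in range(n + 1)]
    I = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    for k in range(1, n + 1):            # span length
        for s in range(0, n - k + 1):
            t = s + k
            # attach: join two complete half-spans and add one new arc
            best = max(C[s][q][1] + C[q + 1][t][0] for q in range(s, t))
            I[s][t][0] = best + score[t][s]
            I[s][t][1] = best + score[s][t]
            # complete: extend an incomplete span with a complete one
            C[s][t][0] = max(C[s][q][0] + I[q][t][0] for q in range(s, t))
            C[s][t][1] = max(I[s][q][1] + C[q][t][1] for q in range(s + 1, t + 1))
    return C[0][n][1]                    # root spans the whole sentence

NEG = float("-inf")
# Same scores as the John saw Mary example (0 = root)
W = [[NEG] * 4 for _ in range(4)]
for (h, m), s in {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (1, 3): 3,
                  (2, 1): 30, (2, 3): 30, (3, 1): 11, (3, 2): 0}.items():
    W[h][m] = s
print(eisner_score(W))  # 70.0: here the best tree happens to be projective
```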
Online Large Margin Learning
Online Large Margin Learning
Supervised learning
Target: learn the weight vector w for the edge features (e.g. PoS tags)
Training data: pairs T = {(xt, yt)} of a sentence and its correct dependency tree
Testing data: x
MIRA Learning Algorithm
Margin Infused Relaxed Algorithm (MIRA)
dt(x): the set of possible dependency tree s for x
min ‖w(i+1) − w(i)‖ s.t. s(xt, yt) − s(xt, y′) ≥ L(yt, y′) for all y′ ∈ dt(xt)
▪ keep the new vector as close as possible to the old
final weight vector: the average of the weight vectors after each iteration
Single-best MIRA
Using only the single margin constraint for the best-scoring tree under the current weights
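With a single margin constraint the quadratic program has a closed-form solution, essentially a hinge step of size τ; a sketch with made-up feature vectors (not the paper's features):

```python
def mira_update(w, f_gold, f_pred, loss):
    """Single-best MIRA step: the smallest change to w (in Euclidean norm)
    making the gold tree outscore the predicted tree by at least `loss`."""
    diff = [g - p for g, p in zip(f_gold, f_pred)]
    violation = loss - sum(wk * dk for wk, dk in zip(w, diff))
    norm_sq = sum(dk * dk for dk in diff)
    if violation <= 0 or norm_sq == 0:
        return list(w)                 # constraint already satisfied
    tau = violation / norm_sq          # closed-form step for one constraint
    return [wk + tau * dk for wk, dk in zip(w, diff)]

w = [0.0, 0.0, 0.0]
f_gold = [1.0, 0.0, 1.0]   # feature vector of the correct tree (hypothetical)
f_pred = [0.0, 1.0, 0.0]   # feature vector of the current best tree
w = mira_update(w, f_gold, f_pred, loss=1.0)
print(w)                   # [0.333..., -0.333..., 0.333...]: margin is now exactly 1
```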
Factored MIRA
Local constraints: the correct incoming edge for each word j must outscore any other incoming edge for j by a margin of 1
▪ s(l(j), j) − s(k, j) ≥ 1, where l(j) is the correct parent of j and k any other candidate
Summed over a tree, the correct spanning tree then outscores an incorrect spanning tree by the number of incorrect edges in it
More restrictive than the original constraints
Experiments
Experimental Setting
Language: Czech
More flexible word order than English
▪ Non-projective dependency
Feature: Czech PoS tag
standard PoS, case, gender, tense
Ratio of non-projective and projective
Less than 2% of total edges are non-projective
▪ Czech-A: the entire PDT (Prague Dependency Treebank)
▪ Czech-B: only the 23% of sentences with at least one non-projective dependency
Compared Systems
COLL1999
The projective lexicalized phrase-structure parser
N&N2005
The pseudo-projective parser
McD2005
The projective parser using Eisner and 5-best MIRA
Single-best MIRA
Factored MIRA
The non-projective parser using Chu-Liu-Edmonds
36 /39
Results of Czech

System                    Czech-A (entire PDT)    Czech-B (non-projective only)
                          Accuracy   Complete     Accuracy   Complete
COLL1999 O(n⁵)            82.8       -            -          -
N&N2005                   80.0       31.8         -          -
McD2005 O(n³)             83.3       31.3         74.8       0.0
Single-best MIRA O(n²)    84.1       32.2         81.0       14.9
Factored MIRA O(n²)       84.4       32.3         81.5       14.3
Results of English

System                    Accuracy   Complete
McD2005 O(n³)             90.9       37.5
Single-best MIRA O(n²)    90.2       33.2
Factored MIRA O(n²)       90.2       32.3
English dependency trees are almost all projective
The Eisner algorithm exploits the a priori knowledge that all trees are projective
Thanks for your attention!