### Non-projective Dependency Parsing using Spanning Tree Algorithms

R98922004 Yun-Nung Chen (first-year M.S. student, Computer Science)


### Reference

Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005)

Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajič


## Introduction


### Example of Dependency Tree

Each word depends on exactly one parent

Projective: words in linear order, satisfying

▪ Edges do not cross

▪ A word and its descendants form a contiguous substring of the sentence


### Non-projective Examples

English

▪ Mostly projective, some non-projective

Languages with more flexible word order

▪ Mostly non-projective

▪ German, Dutch, Czech


### Advantage of Dependency Parsing

Related work

▪ relation extraction

▪ machine translation


### Main Idea of the Paper

Dependency parsing can be formalized as

the search for a maximum spanning tree in a directed graph


### Dependency Parsing and Spanning Trees


### Edge-based Factorization (1/3)

**sentence: x = x_1 … x_n**

**the directed graph G_x = (V_x, E_x) given by**

▪ V_x = {root = x_0, x_1, …, x_n}

▪ E_x = {(i, j) : i ≠ j} (every possible edge)

**dependency tree for x: y, i.e. the tree G_y = (V_y, E_y) with**

▪ V_y = V_x

▪ E_y = {(i, j) : there is a dependency from x_i to x_j}

### Edge-based Factorization (2/3)

scores of edges: s(i, j) = w · f(i, j)

**score of a dependency tree y for sentence x:**

s(x, y) = Σ_{(i,j) ∈ y} s(i, j) = Σ_{(i,j) ∈ y} w · f(i, j)
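The edge-factored scoring above can be sketched in a few lines; the feature function `f` and weight vector `w` below are made-up toy values for illustration, not the paper's features:

```python
import numpy as np

# Hypothetical edge feature function: in the paper these are features of the
# word pair (PoS tags etc.); here just a made-up 3-dimensional toy example.
def f(i, j):
    return np.array([1.0, float(i == 0), float(abs(i - j))])

w = np.array([0.5, 2.0, -0.1])  # weight vector (arbitrary values here)

def edge_score(i, j):
    # s(i, j) = w . f(i, j)
    return float(w @ f(i, j))

def tree_score(tree_edges):
    # s(x, y) = sum of s(i, j) over the edges (i, j) of tree y
    return sum(edge_score(i, j) for i, j in tree_edges)
```

For example, `tree_score([(0, 2), (2, 1), (2, 3)])` sums the three edge scores of a small tree rooted at word 0.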

### Edge-based Factorization (3/3)

**x = John hit the ball with the bat**

[Figure: three candidate dependency trees y_1, y_2, y_3 for x, each rooted at *root*; the parser scores every candidate tree and outputs the highest-scoring one]

### Two Focus Points

1) **How to decide the weight vector w**

2) How to find the tree with the maximum score


### Maximum Spanning Trees

dependency trees for x = spanning trees of G_x

the dependency tree with maximum score for x = the maximum spanning tree of G_x

### Maximum Spanning Tree Algorithm


### Chu-Liu-Edmonds Algorithm (1/12)

**Input: graph G = (V, E)**

**Output: a maximum spanning tree in G**

① greedily select the incoming edge with highest weight for each word

▪ If the result is a tree, output it

▪ If there is a **cycle in G**:

② contract the cycle into a single vertex and recalculate the weights of edges going into and out of the cycle
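Steps ① and ② can be sketched as a small recursive function; this is a minimal illustration, not the paper's implementation. `scores` maps (head, dependent) pairs to weights:

```python
def find_cycle(parent):
    """Return the set of nodes on a cycle in the parent map, or None."""
    for start in parent:
        path, v = [], start
        while v in parent and v not in path:
            path.append(v)
            v = parent[v]
        if v in path:
            return set(path[path.index(v):])
    return None

def chu_liu_edmonds(scores, root=0):
    """Maximum spanning arborescence. scores: {(head, dep): weight}.
    Returns the tree as {dep: head}."""
    # Step 1: greedily pick the highest-weight incoming edge of each word.
    best_in = {}
    for (u, v), s in scores.items():
        if v != root and (v not in best_in or s > scores[(best_in[v], v)]):
            best_in[v] = u
    cycle = find_cycle(best_in)
    if cycle is None:
        return best_in                      # already a tree: done
    # Step 2: contract the cycle into one node c and rescore.
    c = ("C", min(str(v) for v in cycle))   # fresh label for the contracted node
    new_scores, orig = {}, {}
    for (u, v), s in scores.items():
        if u in cycle and v in cycle:
            continue
        if v in cycle:
            # slide formula: s(u, v) - s(a(v), v) + s(C); the constant s(C)
            # does not change which incoming edge wins, so it is dropped here
            key, val = (u, c), s - scores[(best_in[v], v)]
        elif u in cycle:
            key, val = (c, v), s
        else:
            key, val = (u, v), s
        if key not in new_scores or val > new_scores[key]:
            new_scores[key] = val
            orig[key] = (u, v)
    tree = chu_liu_edmonds(new_scores, root)
    # Expand: map contracted edges back; the edge entering c breaks the cycle.
    result = {orig[(u, v)][1]: orig[(u, v)][0] for v, u in tree.items()}
    broken = next(v for v in result if v in cycle)
    for v in cycle:
        if v != broken:
            result[v] = best_in[v]
    return result
```

With the example scores used in the following slides (root = 0, John = 1, saw = 2, Mary = 3), this recovers the tree root→saw, saw→John, saw→Mary.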

### Chu-Liu-Edmonds Algorithm (2/12)

**x = John saw Mary**

[Figure: the dense graph G_x over root, John, saw, Mary with all candidate edges; the scores include root→John 9, root→saw 10, John→saw 20, saw→John 30, saw→Mary 30, Mary→John 11, Mary→saw 0]

### Chu-Liu-Edmonds Algorithm (3/12)

For each word, find the highest-scoring incoming edge

[Figure: G_x with each word's best incoming edge highlighted: saw→John 30, John→saw 20, saw→Mary 30]

### Chu-Liu-Edmonds Algorithm (4/12)

If the result is a

▪ Tree – terminate and output

▪ Cycle – contract and recalculate

[Figure: the selected edges form a cycle between *John* and *saw*]

### Chu-Liu-Edmonds Algorithm (5/12)

Contract and recalculate

▪ Contract the cycle into a single node

▪ Recalculate the weights of edges going into and out of the cycle

[Figure: the John–saw cycle C is contracted into one node; its internal score is s(C) = 20 + 30 = 50]

### Chu-Liu-Edmonds Algorithm (6/12)

Outgoing edges of the cycle

[Figure: an edge leaving the contracted node keeps the weight of the best original edge out of the cycle, e.g. saw→Mary 30]

### Chu-Liu-Edmonds Algorithm (7/12)

Incoming edges of the cycle

▪ s(x, C) = s(x, v) – s(a(v), v) + s(C), where a(v) is v's currently selected parent

[Figure: candidate incoming edges root→C and Mary→C into the contracted node]

### Chu-Liu-Edmonds Algorithm (8/12)

*x = root*

▪ *s(root, John) – s(a(John), John) + s(C) *= 9-30+50=29

▪ *s(root, saw) – s(a(saw), saw) + s(C) *= 10-20+50=40

22 /39

*sa*
*w*
*roo*

*t*

*Joh*
*n*

*Mary*

9

3 0 1

0

9

1 1

0

**G**_{x}^{4}

0 2

9

2

0 3

0

### Chu-Liu-Edmonds Algorithm (9/12)

*x = Mary*

▪ *s(Mary, John) – s(a(John), John) + s(C)* = 11 – 30 + 50 = 31

▪ *s(Mary, saw) – s(a(saw), saw) + s(C)* = 0 – 20 + 50 = 30

[Figure: the best edge from Mary into the cycle is Mary→John, rescored to 31]

### Chu-Liu-Edmonds Algorithm (10/12)

Keep the highest-scoring structure inside the cycle

Run the algorithm recursively on the contracted graph

[Figure: the contracted graph over root, C, and Mary, with incoming edges root→C 40 and Mary→C 31]

### Chu-Liu-Edmonds Algorithm (11/12)

Find the incoming edge with highest score for each word

Tree: terminate and output

[Figure: in the contracted graph the best incoming edges are root→C 40 and C→Mary 30, which form a tree]

### Chu-Liu-Edmonds Algorithm (12/12)

**Maximum Spanning Tree of G_x**

[Figure: expanding the contracted node yields the maximum spanning tree: root→saw 10, saw→John 30, saw→Mary 30]

### Complexity of the Chu-Liu-Edmonds Algorithm

**Each recursive call takes O(n²)** to find the highest-scoring incoming edge for each word

At most O(n) recursive calls (contracting n times)

**Total: O(n³)**

Tarjan gives an efficient implementation of the algorithm in **O(n²)** for dense graphs

### Algorithm for Projective Trees

**Eisner Algorithm: O(n³)**

Uses bottom-up dynamic programming

Maintains the nested structural constraint (non-crossing constraint)
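A compact sketch of the Eisner DP, under the assumption that edge scores are given as a matrix `scores[h][m]` with position 0 as the artificial root; this version returns only the best score (backpointers would recover the tree):

```python
def eisner(scores):
    """Score of the best projective tree. scores[h][m] = weight of h -> m;
    position 0 is the artificial root."""
    n = len(scores)
    NEG = float("-inf")
    # spans [s, t]; direction 1 = head on the left (s), 0 = head on the right (t)
    incomp = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    comp = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        comp[s][s][0] = comp[s][s][1] = 0.0
    for k in range(1, n):                      # span length, bottom-up
        for s in range(n - k):
            t = s + k
            # build an incomplete span by joining two complete ones, then
            # paying for the dependency edge across the span
            best = max(comp[s][r][1] + comp[r + 1][t][0] for r in range(s, t))
            incomp[s][t][0] = best + scores[t][s]   # edge t -> s
            incomp[s][t][1] = best + scores[s][t]   # edge s -> t
            # extend incomplete spans into complete ones
            comp[s][t][0] = max(comp[s][r][0] + incomp[r][t][0]
                                for r in range(s, t))
            comp[s][t][1] = max(incomp[s][r][1] + comp[r][t][1]
                                for r in range(s + 1, t + 1))
    return comp[0][n - 1][1]
```

The two span types (complete/incomplete) are exactly what enforces the non-crossing constraint: every subtree is built from a contiguous span.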

## Online Large Margin Learning


### Online Large Margin Learning

Supervised learning

▪ Target: train the weight vector w over features of word pairs (PoS tags)

Training data: pairs (x_t, y_t) of sentences and their correct dependency trees

**Testing data: x**

### MIRA Learning Algorithm

Margin Infused Relaxed Algorithm (MIRA)

**dt(x): the set of possible dependency trees for x**

▪ keep the new weight vector as close as possible to the old one, subject to the margin constraints

▪ the final weight vector is the average of the weight vectors after each iteration

### Single-best MIRA

Uses only the single margin constraint for the highest-scoring tree under the current weights
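With a single constraint, the update has a closed form; a minimal sketch of the standard single-best MIRA step (the feature vectors and loss are supplied by the caller; this is illustrative, not the paper's code):

```python
import numpy as np

def mira_update(w, feat_gold, feat_pred, loss):
    """One single-best MIRA step: move w as little as possible while making
    the correct tree outscore the current best tree by a margin of `loss`
    (e.g. the number of wrong edges in the predicted tree)."""
    diff = feat_gold - feat_pred          # f(x, y_t) - f(x, y')
    violation = loss - w @ diff           # how much margin is still missing
    if violation <= 0 or not diff.any():
        return w                          # constraint already satisfied
    tau = violation / (diff @ diff)       # closed-form Lagrange multiplier
    return w + tau * diff
```

After the update, the gold features outscore the predicted features by at least `loss`, while `w` moves the minimum distance needed.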

### Factored MIRA

Local constraints

▪ for each word j, the correct incoming edge must outscore every other incoming edge for j by a margin of 1

▪ these constraints imply that the correct spanning tree outscores each incorrect spanning tree by at least the number of incorrect edges

More restrictive than the original constraints

## Experiments


### Experimental Setting

Language: Czech

▪ More flexible word order than English

▪ Non-projective dependencies

Features: Czech PoS tags

▪ standard PoS, case, gender, tense

Ratio of non-projective to projective

▪ Less than 2% of total edges are non-projective

▪ Czech-A: the entire PDT

▪ Czech-B: only the 23% of sentences with a non-projective dependency

### Compared Systems

COLL1999

▪ the projective lexicalized phrase-structure parser

N&N2005

▪ the pseudo-projective parser

McD2005

▪ the projective parser using Eisner and 5-best MIRA

Single-best MIRA / Factored MIRA

▪ the non-projective parsers using Chu-Liu-Edmonds

### Results on Czech

| System | Czech-A (23% of sentences non-projective) Accuracy | Czech-A Complete | Czech-B (non-projective only) Accuracy | Czech-B Complete |
|---|---|---|---|---|
| COLL1999 O(n⁵) | 82.8 | – | – | – |
| N&N2005 | 80.0 | 31.8 | – | – |
| McD2005 O(n³) | 83.3 | 31.3 | 74.8 | 0.0 |
| Single-best MIRA O(n²) | 84.1 | 32.2 | 81.0 | **14.9** |
| Factored MIRA O(n²) | **84.4** | **32.3** | **81.5** | 14.3 |

### Results on English

| System | Accuracy | Complete |
|---|---|---|
| McD2005 O(n³) | **90.9** | **37.5** |
| Single-best MIRA O(n²) | 90.2 | 33.2 |
| Factored MIRA O(n²) | 90.2 | 32.3 |

English dependency trees are projective, and the Eisner algorithm uses the a priori knowledge that all trees are projective

## Thanks for your attention!
