
# Non-projective Dependency Parsing using Spanning Tree Algorithm


(1)

### Non-projective Dependency Parsing using Spanning Tree Algorithm

R98922004 Yun-Nung Chen (陳縕儂), first-year CSIE master's student

1 /39

(2)

### Reference

Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005)

Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajič


(3)

## Introduction


(4)

### Example of Dependency Tree

Each word depends on exactly one parent.

Projective: with the words in linear order, no edges cross, so a word and its descendants form a contiguous substring of the sentence.
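The contiguous-substring property gives a simple projectivity test. A minimal sketch (the `heads` encoding and helper names are my own, not from the paper): `heads[d]` is the parent of word `d`, words are 1-indexed, and 0 denotes the root.

```python
def descendants(heads, i):
    """All words in the subtree rooted at word i (heads[d] = parent of d, 0 = root)."""
    out = {i}
    changed = True
    while changed:
        changed = False
        for d in range(1, len(heads)):
            if heads[d] in out and d not in out:
                out.add(d)
                changed = True
    return out

def is_projective(heads):
    """A tree is projective iff every word's descendants occupy contiguous positions."""
    for i in range(1, len(heads)):
        span = descendants(heads, i)
        if max(span) - min(span) + 1 != len(span):  # gap => crossing edge
            return False
    return True

is_projective([0, 2, 0, 4, 2])   # "John hit the ball": projective
is_projective([0, 0, 4, 1, 1])   # edges 1->3 and 4->2 cross: non-projective
```

The second example encodes a four-word sentence whose edges 1→3 and 4→2 cross, so the descendants of word 4 are {2, 4}, which is not contiguous.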


(5)

### Non-projective Examples

English: mostly projective, some non-projective constructions.

Languages with more flexible word order (German, Dutch, Czech): mostly non-projective.


(6)

### Related Work

Applications of dependency parsing: relation extraction, machine translation.


(7)

### Main Idea of the Paper

Dependency parsing can be formalized as the search for a maximum spanning tree in a directed graph.


(8)


(9)

### Formalization

sentence: x = x1 … xn

the directed graph Gx = (Vx, Ex) given by
Vx = {x0 = root, x1, …, xn}
Ex = {(i, j) : i ≠ j, (i, j) ∈ [0 : n] × [1 : n]} (no edges into the root)

dependency tree for x: y, represented by the tree Gy = (Vy, Ey) with
Vy = Vx
Ey = {(i, j) : there is a dependency from xi to xj}

(10)

### Scoring

score of an edge: s(i, j) = w · f(i, j)

score of a dependency tree y for sentence x: s(x, y) = Σ(i,j)∈y s(i, j) = Σ(i,j)∈y w · f(i, j)
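As a toy illustration of this edge-factored scoring (the weight vector and feature values below are invented for illustration, not from the paper):

```python
import numpy as np

w = np.array([1.0, 2.0, 0.5])                  # hypothetical weight vector
f = {("hit", "John"): np.array([1, 0, 1]),     # hypothetical edge features f(i, j)
     ("hit", "ball"): np.array([0, 1, 0]),
     ("ball", "the"): np.array([1, 1, 0])}

def s(i, j):
    """Edge score s(i, j) = w . f(i, j)."""
    return float(w @ f[(i, j)])

def tree_score(y):
    """s(x, y): sum of edge scores over the edges (i, j) of tree y."""
    return sum(s(i, j) for i, j in y)

y = [("hit", "John"), ("hit", "ball"), ("ball", "the")]
print(tree_score(y))  # 6.5
```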

(11)

### Example

x = John hit the ball with the bat

[Figure: three candidate dependency trees y1, y2, y3 for x, each rooted at root and headed by "hit", differing in how the remaining words attach.]

(12)

### Two Focus Points

1) How to decide weight vector w

2) How to find the tree with the maximum score


(13)

### Maximum Spanning Trees

dependency trees for x = spanning trees for Gx

the dependency tree with maximum score for x = the maximum spanning tree for Gx


(14)


(15)

### Chu-Liu-Edmonds Algorithm (1/12)

Input: graph G = (V, E)

Output: a maximum spanning tree in G

Greedily select the incoming edge with highest weight for each node.

Tree: terminate and output.

Cycle in G: contract the cycle into a single vertex and recalculate the weights of edges going into and out of the cycle.


(16)

### Chu-Liu-Edmonds Algorithm (2/12)

x = John saw Mary


[Figure: the graph Gx over {root, John, saw, Mary} with edge weights root→John 9, root→saw 10, root→Mary 9; John→saw 20, John→Mary 3; saw→John 30, saw→Mary 30; Mary→John 11, Mary→saw 0.]

(17)

### Chu-Liu-Edmonds Algorithm (3/12)

For each word, find the highest-scoring incoming edge.


[Figure: Gx with each word's best incoming edge highlighted: saw→John 30, John→saw 20, saw→Mary 30.]

(18)

### Chu-Liu-Edmonds Algorithm (4/12)

If the result includes

Tree – terminate and output

Cycle – contract and recalculate


[Figure: the selected edges contain a cycle between John and saw.]

(19)

### Chu-Liu-Edmonds Algorithm (5/12)

Contract and recalculate:

Contract the cycle into a single node.

Recalculate the weights of edges going into and out of the cycle.


[Figure: Gx with the John–saw cycle marked for contraction.]

(20)

### Chu-Liu-Edmonds Algorithm (6/12)

Outgoing edges of the cycle


[Figure: for each word outside the cycle, keep only the highest-weight edge leaving the cycle: saw→Mary 30 rather than John→Mary 3.]

(21)

### Chu-Liu-Edmonds Algorithm (7/12)

Incoming edges of the cycle: for a node x outside the cycle and each cycle node j, the recalculated weight is s(x, j) − s(a(j), j) + s(C), where a(j) is j's parent inside the cycle and s(C) is the total score of the cycle.


[Figure: Gx with the candidate edges entering the cycle from root and from Mary.]

(22)

### Chu-Liu-Edmonds Algorithm (8/12)

For x = root:

s(root, John) − s(a(John), John) + s(C) = 9 − 30 + 50 = 29

s(root, saw) − s(a(saw), saw) + s(C) = 10 − 20 + 50 = 40


[Figure: the edge root→C gets weight max(29, 40) = 40, entering the cycle at saw.]

(23)

### Chu-Liu-Edmonds Algorithm (9/12)

For x = Mary:

s(Mary, John) − s(a(John), John) + s(C) = 11 − 30 + 50 = 31

s(Mary, saw) − s(a(saw), saw) + s(C) = 0 − 20 + 50 = 30


[Figure: the edge Mary→C gets weight max(31, 30) = 31, entering the cycle at John.]

(24)

### Chu-Liu-Edmonds Algorithm (10/12)


Keep the highest-scoring subtree inside the cycle.

Run the algorithm recursively on the contracted graph.

[Figure: the contracted graph with node C replacing the John–saw cycle: root→C 40, Mary→C 31, C→Mary 30, root→Mary 9.]

(25)

### Chu-Liu-Edmonds Algorithm (11/12)


Find the incoming edge with highest score for each node.

Tree: terminate and output.

[Figure: best incoming edges in the contracted graph: root→C 40 and C→Mary 30; no cycle, so this is a tree.]

(26)

### Chu-Liu-Edmonds Algorithm (12/12)


Maximum Spanning Tree of Gx

[Figure: after expanding C, the maximum spanning tree of Gx: root→saw 10, saw→John 30, saw→Mary 30.]
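The contract-and-recalculate walkthrough above can be reproduced with a compact recursive sketch (a minimal implementation for illustration; the dictionary layout, node labels, and helper names are my own, and the edge weights are the ones from this example):

```python
def find_cycle(parent):
    """Return a list of nodes forming a cycle in the parent map, or None."""
    for start in parent:
        seen = []
        v = start
        while v in parent and v not in seen:
            seen.append(v)
            v = parent[v]
        if v in seen:
            return seen[seen.index(v):]
    return None

def chu_liu_edmonds(scores, root):
    """scores[h][d] = weight of edge h -> d; returns {child: parent} of the MST."""
    # 1) greedily pick the highest-weight incoming edge for every non-root node
    parent = {d: max((h for h in scores if d in scores[h]),
                     key=lambda h: scores[h][d])
              for d in scores if d != root}
    cycle = find_cycle(parent)
    if cycle is None:
        return parent
    # 2) contract the cycle C into one node and recalculate edge weights
    cyc = set(cycle)
    s_c = sum(scores[parent[v]][v] for v in cycle)   # s(C), score of the cycle
    C = "*C*"
    new = {v: {} for v in scores if v not in cyc}
    new[C] = {}
    enter, leave = {}, {}                            # bookkeeping for expansion
    for h in scores:
        for d, w in scores[h].items():
            if h in cyc and d in cyc:
                continue
            if h not in cyc and d in cyc:            # edge into the cycle:
                w2 = w - scores[parent[d]][d] + s_c  # s(h,d) - s(a(d),d) + s(C)
                if C not in new[h] or w2 > new[h][C]:
                    new[h][C] = w2
                    enter[h] = d
            elif h in cyc:                           # edge out of the cycle
                if d not in new[C] or w > new[C][d]:
                    new[C][d] = w
                    leave[d] = h
            else:
                new[h][d] = w
    # 3) recurse on the contracted graph, then expand C back into the cycle
    tree = chu_liu_edmonds(new, root)
    result = {}
    for d, h in tree.items():
        if d == C:                                   # h -> C breaks the cycle at enter[h]
            for v in cyc:
                result[v] = h if v == enter[h] else parent[v]
        elif h == C:
            result[d] = leave[d]
        else:
            result[d] = h
    return result

Gx = {"root": {"John": 9, "saw": 10, "Mary": 9},
      "John": {"saw": 20, "Mary": 3},
      "saw":  {"John": 30, "Mary": 30},
      "Mary": {"John": 11, "saw": 0}}
mst = chu_liu_edmonds(Gx, "root")
# mst == {"saw": "root", "John": "saw", "Mary": "saw"}
```

The greedy step forms the John–saw cycle with s(C) = 50, the contraction reproduces the recalculated weights 40 and 31 from the slides, and expansion yields the tree rooted at saw.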

(27)

### Complexity of Chu-Liu-Edmonds Algorithm

Each recursive call takes O(n²) to find the highest incoming edge for each word.

At most O(n) recursive calls (contracting at most n times).

Total: O(n³).

Tarjan gives an efficient implementation of the algorithm with O(n²) for dense graphs.


(28)

### Algorithm for Projective Trees

Eisner algorithm: O(n³)

Uses bottom-up dynamic programming.

Maintains the nested structural constraint (non-crossing constraint).
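For comparison, here is a score-only sketch of the Eisner dynamic program over complete and incomplete spans (backpointers are omitted for brevity; the array layout and names are my own). On the "John saw Mary" weights from the earlier example the best tree happens to be projective, so the projective score matches the MST score:

```python
import numpy as np

NEG = -1e9  # stands in for "no such edge"

def eisner_score(S):
    """S[h][d] = score of edge h -> d, with node 0 = root.
    Returns the score of the best projective dependency tree."""
    n = S.shape[0]
    C = np.zeros((n, n, 2))  # complete spans;   direction 0 = left, 1 = right
    I = np.zeros((n, n, 2))  # incomplete spans
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # join two complete half-spans, then add the attaching edge
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][0] = best + S[t][s]   # attach s under head t
            I[s][t][1] = best + S[s][t]   # attach t under head s
            # extend an incomplete span into a complete one
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
    return C[0][n - 1][1]

# 0 = root, 1 = John, 2 = saw, 3 = Mary
S = np.full((4, 4), NEG)
S[0, 1], S[0, 2], S[0, 3] = 9, 10, 9
S[1, 2], S[1, 3] = 20, 3
S[2, 1], S[2, 3] = 30, 30
S[3, 1], S[3, 2] = 11, 0
print(eisner_score(S))  # 70.0
```

Each span pair is combined in O(n) ways over O(n²) spans, giving the O(n³) bound stated above.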


(29)

## Online Large Margin Learning


(30)

### Online Large Margin Learning

Supervised learning

Target: train the weight vector w on features of word pairs (words and PoS tags).

Training data: pairs (xt, yt) of sentences and their correct dependency trees.

Testing data: a sentence x.


(31)

### MIRA Learning Algorithm

Margin Infused Relaxed Algorithm (MIRA)

dt(x): the set of possible dependency trees for x

Keep the new weight vector as close as possible to the old one while satisfying the margin constraints:
min ‖w(i+1) − w(i)‖ s.t. s(xt, yt) − s(xt, y′) ≥ L(yt, y′) for all y′ ∈ dt(xt)

The final weight vector is the average of the weight vectors after each iteration.

(32)

### Single-best MIRA

Uses only the single margin constraint for the highest-scoring tree y′ = argmax s(xt, y′).
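With a single constraint, the MIRA quadratic program has a well-known closed-form solution. A sketch of one update (the function and variable names are my own; the feature vectors below are invented for illustration):

```python
import numpy as np

def single_best_mira_update(w, f_gold, f_pred, loss):
    """Smallest change to w so the gold tree outscores the predicted tree
    by at least loss = L(y_t, y'): min ||w' - w|| s.t. w'.(f_gold - f_pred) >= loss."""
    diff = f_gold - f_pred           # feature difference f(x, y_t) - f(x, y')
    slack = loss - w @ diff          # by how much the constraint is violated
    if slack <= 0 or not diff.any():
        return w                     # constraint already satisfied: keep w
    tau = slack / (diff @ diff)      # closed-form step size for one constraint
    return w + tau * diff

w = np.zeros(2)
w = single_best_mira_update(w, np.array([1.0, 0.0]), np.array([0.0, 1.0]), loss=1.0)
# gold tree now outscores the prediction by exactly the required margin of 1
```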


(33)

### Factored MIRA

Local constraints: the correct incoming edge for word j must outscore every other incoming edge for j by a margin of 1.

Summing over all words, the correct spanning tree then outscores each incorrect spanning tree by the number of incorrect edges.

More restrictive than the original constraints.

(34)

## Experiments


(35)

### Experimental Setting

Language: Czech

More flexible word order than English, so non-projective dependencies occur.

Features: Czech PoS tags (standard PoS, case, gender, tense).

Ratio of non-projective to projective:

Less than 2% of total edges are non-projective.

Czech-A: the entire PDT (Prague Dependency Treebank).

Czech-B: only the 23% of sentences containing a non-projective dependency.


(36)

### Compared Systems

COLL1999: the projective lexicalized phrase-structure parser

N&N2005: the pseudo-projective parser

McD2005: the projective parser using the Eisner algorithm and 5-best MIRA

Single-best MIRA and Factored MIRA: the non-projective parsers using Chu-Liu-Edmonds


(37)

### Results of Czech

| System | Complexity | Czech-A Accuracy | Czech-A Complete | Czech-B Accuracy | Czech-B Complete |
|---|---|---|---|---|---|
| COLL1999 | O(n⁵) | 82.8 | – | – | – |
| N&N2005 | – | 80.0 | 31.8 | – | – |
| McD2005 | O(n³) | 83.3 | 31.3 | 74.8 | 0.0 |
| Single-best MIRA | O(n²) | 84.1 | 32.2 | 81.0 | 14.9 |
| Factored MIRA | O(n²) | 84.4 | 32.3 | 81.5 | 14.3 |

Czech-A: the entire PDT (23% of its sentences are non-projective); Czech-B: the non-projective sentences only.

(38)

### Results of English

| System | Complexity | Accuracy | Complete |
|---|---|---|---|
| McD2005 | O(n³) | 90.9 | 37.5 |
| Single-best MIRA | O(n²) | 90.2 | 33.2 |
| Factored MIRA | O(n²) | 90.2 | 32.3 |

English dependency trees are projective; the Eisner algorithm uses the a priori knowledge that all trees are projective.

