Relation Recognition

Academic year: 2022
(1)

Lecture of Internet-based IE Technology

Relation Recognition

Fang Li

Dept. of Computer Science & Engineering

(2)


Contents

Relation Representation

Relation Identification

1. Knowledge Engineering Approach

2. Machine Learning Approach

-- Supervised learning

-- Semi-supervised learning

-- Distant supervised learning

-- Deep learning

3. Knowledge graph

(3)


Relation Example:

(4)


Relation with/without Time

• Relations may be timeless or bound to time intervals, for example father(x,y) vs. boss(x,y).

• Temporal information is divided into time points and time intervals.

(5)


Relation and Event

• Events: a particular kind of simple or complex relation among entities, involving a change in relation state at the end of a time interval.

• E.g.: Company-founding

Company: IBM

Location: New York

Date: June 16, 1911

Original-Name: Computing-Tabulating-Recording Co.

Founding-year (IBM, 1911)

Founding-location (IBM, New York)

(6)

Relation Examples

• Physical-Located (PER-GPE)

He was in Tennessee.

• Part-Whole-Subsidiary (ORG-ORG)

XYZ, the parent company of ABC.

• Person-Social-Family (PER-PER)

John’s wife Yoko.

• Org-AFF-Founder (PER-ORG)

Steve Jobs, co-founder of Apple.

(7)


Explicit and Implicit Relations

Explicit relations are expressed by certain surface linguistic forms:

Prepositional phrase: The CEO of Microsoft…

Prenominal modification: the American envoy…

Possessive: Microsoft’s chief scientist…

Nominalization: Annan’s visit to Baghdad…

Apposition: Tony Blair, Britain’s prime minister….

(8)


Explicit and Implicit Relations (cont.)

A relation is implicit in the text if the text provides convincing evidence that the relation actually holds.

Example:

Prime Minister Tony Blair attempted to convince the British Parliament of the necessity of intervening in Iraq.

Question: Was Tony Blair a British Prime Minister?

(9)

A conceptual view of entities and relations

The E’s are the entities found in a sentence.

The R’s are the relations between any two entities.

Entities and relations are mutually dependent.

(10)

Example:

Mo Yan, who was born in Shandong province, is the first Chinese writer to win the Nobel Prize in Literature.

(11)

Three Cases of Binary Relation Extraction R(E1, E2)

• Given a fixed pair of entities (E1, E2), find the type of relationship R that exists between the pair.

• Given a relationship R and an entity name E1, extract the entities E2 with which E1 has relationship R.

• Given a fixed relationship type R, find all the entity pairs (E1, E2).
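A minimal sketch of the three cases, assuming the relations have already been extracted and stored as (R, E1, E2) triples. The facts, relation names and function names are hypothetical, chosen only for illustration:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (relation R, entity E1, entity E2)

# Hypothetical fact store, loosely based on examples used in this lecture.
FACTS: Set[Triple] = {
    ("founder_of", "Steve Jobs", "Apple"),
    ("born_in", "Mo Yan", "Shandong"),
    ("buried_in", "Mark Twain", "Elmira"),
    ("employee_of", "Smith", "Acme Inc."),
}


def relation_types(e1: str, e2: str) -> List[str]:
    """Case 1: given the pair (E1, E2), find the relation types R."""
    return sorted(r for r, a, b in FACTS if a == e1 and b == e2)


def related_entities(r: str, e1: str) -> List[str]:
    """Case 2: given R and E1, find every E2 such that R(E1, E2) holds."""
    return sorted(b for rel, a, b in FACTS if rel == r and a == e1)


def entity_pairs(r: str) -> List[Tuple[str, str]]:
    """Case 3: given R, find all pairs (E1, E2)."""
    return sorted((a, b) for rel, a, b in FACTS if rel == r)


print(relation_types("Steve Jobs", "Apple"))
print(related_entities("born_in", "Mo Yan"))
print(entity_pairs("buried_in"))
```

In real relation extraction the triples are not given. Each case corresponds to a different way of querying the text.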

(12)

Relation Extraction

• A harder task than entity extraction.

• Relation extraction requires a skillful combination of local and nonlocal noisy clues from diverse syntactic and semantic structures in a sentence.

(13)

Preprocessing Steps for relation extraction

E.g. Haifa located 53 miles from Tel Aviv will host ICML in 2010. → located

1) Named entity identification:

<LOC>Haifa</LOC> located 53 miles from <LOC>Tel Aviv</LOC> will host ICML in 2010.

2) POS tagging:

Haifa/NNP located/VBN 53/CD miles/NNS from/IN Tel/NNP Aviv/NNP will/MD host/VB ICML/NNP in/IN 2010/CD
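A small sketch of how the output of steps 1) and 2) can be read back into data structures for later feature extraction. It only parses the two notations shown above and does not perform the tagging itself. The function names are hypothetical:

```python
import re

ner_text = ("<LOC>Haifa</LOC> located 53 miles from "
            "<LOC>Tel Aviv</LOC> will host ICML in 2010.")
pos_text = ("Haifa/NNP located/VBN 53/CD miles/NNS from/IN Tel/NNP "
            "Aviv/NNP will/MD host/VB ICML/NNP in/IN 2010/CD")


def read_entities(text):
    """Return (entity_type, surface_string) pairs from inline <TYPE>...</TYPE> markup."""
    return [(m.group(1), m.group(2))
            for m in re.finditer(r"<([A-Z]+)>(.*?)</\1>", text)]


def read_pos(text):
    """Return (token, tag) pairs from token/TAG notation."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


entities = read_entities(ner_text)
tagged = read_pos(pos_text)
print(entities)
print(tagged[:3])
```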

(14)

Preprocessing Steps of Relation Extraction (cont.)

3) Syntactic parse tree

4) Dependency graph

Haifa located 53 miles from Tel Aviv will host ICML in 2010

(15)


Methods of Relation Recognition

1. Pattern-based methods:

• hand-made patterns

• learning based on seed patterns

2. Supervised method

3. Semi-supervised method

4. Distant-supervised method

5. Deep learning method

6. …

(16)

Hearst’s Patterns

Patterns extracted from the sentence, or from the text between the two entities, that express: X is an instance of Y.
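A minimal sketch of one such pattern, the well-known "Y such as X1, X2 and X3" form. The regular expression is a simplification that assumes a single-word Y and capitalized single-word X's. The example sentence is made up for illustration:

```python
import re

# One Hearst-style lexical pattern: "<Y> such as <X1>, <X2> and <X3>".
SUCH_AS = re.compile(r"(\w+) such as ((?:[A-Z]\w*(?:, | and | or )?)+)")


def hearst_such_as(sentence):
    """Return (X, Y) pairs meaning "X is an instance of Y"."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hypernym = m.group(1)
        for hyponym in re.split(r", | and | or ", m.group(2)):
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


pairs = hearst_such_as("He visited cities such as Haifa, Elmira and Madrid last year.")
print(pairs)
```

A real system would match noun phrases from a chunker or parser instead of single words.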

(17)


Learning Patterns

-- based on seeds

<Mark Twain, Elmira> -- seed tuple

“Mark Twain is buried in Elmira, NY.”

X is buried in Y -- pattern 1 induced

“The grave of Mark Twain is in Elmira”

The grave of X is in Y -- pattern 2 induced

“Elmira is Mark Twain’s final resting place”

Y is X’s final resting place -- pattern 3 induced

• Use those patterns to grep for new facts.
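A minimal sketch of pattern induction from a seed tuple, assuming plain string matching. The seed strings are replaced by the slots {X} and {Y}, and the induced pattern is then matched against a new sentence. The function names and the second sentence are hypothetical:

```python
import re


def induce_pattern(sentence, x, y):
    """Replace the seed strings with slots; return None if a seed is missing."""
    if x not in sentence or y not in sentence:
        return None
    return sentence.replace(x, "{X}", 1).replace(y, "{Y}", 1)


def apply_pattern(pattern, sentence):
    """Return the (X, Y) pair if the whole sentence matches the pattern, else None."""
    parts = re.split(r"(\{X\}|\{Y\})", pattern)
    regex = "".join(
        "(?P<x>.+?)" if part == "{X}"
        else "(?P<y>.+?)" if part == "{Y}"
        else re.escape(part)
        for part in parts
    )
    m = re.fullmatch(regex, sentence)
    return (m.group("x"), m.group("y")) if m else None


pattern = induce_pattern("The grave of Mark Twain is in Elmira", "Mark Twain", "Elmira")
new_fact = apply_pattern(pattern, "The grave of Emily Dickinson is in Amherst")
print(pattern)
print(new_fact)
```

Exact string patterns like this are brittle. The sketch does not generalize a pattern such as "X is buried in Y, NY." to "X is buried in Y".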

(18)


Problems with patterns

• Hand-built

Plus: high precision, tailored to specific domains

Minus: low recall, heavy manual labor

• Learning based on seeds

Plus: high recall, less human labor

Minus: noise, low precision

(19)

Supervised machine learning methods (overview)

1. Choose a set of relations to extract.

2. Find and label data:

Label the named entities and the relations between these entities.

Break the data into training, development and test sets.

3. Train a classifier on the training set.

4. Find all pairs of named entities (usually in the same sentence).

5. Use the classifier to identify the relation.

(20)

For example: to identify the employee relation (Org, Per)

Input:

Acme Inc. hired Mr Smith as their new CEO, replacing Mr Bloggs.

Org: Acme Inc.

Per: Smith and Bloggs

Model output:

(Acme Inc., Smith) true

(Acme Inc., Bloggs) false
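A sketch of the candidate-generation step for this example, assuming the entities have already been recognized. Every (Org, Per) pair in the sentence becomes one instance for a binary classifier. The function toy_employee_model is a hand-written stand-in and not a trained model. It is there only so that the example runs:

```python
from itertools import product

sentence = "Acme Inc. hired Mr Smith as their new CEO, replacing Mr Bloggs."
orgs = ["Acme Inc."]
persons = ["Smith", "Bloggs"]


def candidate_pairs(orgs, persons):
    """All (Org, Per) pairs found in the same sentence."""
    return list(product(orgs, persons))


def toy_employee_model(sentence, org, per):
    """Stand-in for a trained classifier: true if '<org> hired Mr <per>' occurs."""
    return f"{org} hired Mr {per}" in sentence


labels = {pair: toy_employee_model(sentence, *pair)
          for pair in candidate_pairs(orgs, persons)}
print(labels)
```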

(21)

Train the Model

• Extract features:

1. Features similar to those used in entity recognition: capitalization, suffixes and so on.

2. Conjunctions of the features from the two entities: for example, spouse_of needs both entities to be of type person.

• Choose a model: many models are available.

(22)

Word Features

Acme Inc. (mention 1) hired Mr Smith (mention 2) as their new CEO, replacing Mr Bloggs.

Headwords of M1 and M2, and their combination:

Inc., Smith, Inc.Smith

Bag of words and bigrams in M1 and M2:

{Acme, Inc, Mr., Smith, Acme Inc, Mr. Smith, …}

Words or bigrams in particular positions left and right of M1 and M2:

M1: +1 hired; M2: -1 hired, +1 as

Bag of words or bigrams between M1 and M2:

{hired}
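A minimal sketch of these word features, assuming the sentence is already tokenized and each mention is given as a token span. Taking the last word of a mention as its headword is a simplifying assumption. The feature names are hypothetical:

```python
def bigrams(words):
    """Adjacent word pairs, joined with a space."""
    return [" ".join(pair) for pair in zip(words, words[1:])]


def word_features(tokens, m1, m2):
    """tokens: list of words; m1, m2: (start, end) spans, end exclusive, m1 before m2."""
    m1_words = tokens[m1[0]:m1[1]]
    m2_words = tokens[m2[0]:m2[1]]
    head1, head2 = m1_words[-1], m2_words[-1]  # assumption: head = last word
    return {
        "head_m1": head1,
        "head_m2": head2,
        "head_pair": head1 + head2,
        "bow_in_mentions": sorted(set(m1_words + m2_words)),
        "bigrams_in_mentions": bigrams(m1_words) + bigrams(m2_words),
        "m1_plus1": tokens[m1[1]] if m1[1] < len(tokens) else None,
        "m2_minus1": tokens[m2[0] - 1] if m2[0] > 0 else None,
        "m2_plus1": tokens[m2[1]] if m2[1] < len(tokens) else None,
        "between": tokens[m1[1]:m2[0]],
    }


tokens = ["Acme", "Inc.", "hired", "Mr", "Smith", "as", "their", "new",
          "CEO", ",", "replacing", "Mr", "Bloggs", "."]
feats = word_features(tokens, m1=(0, 2), m2=(3, 5))
for name, value in feats.items():
    print(name, value)
```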

(23)

Named entity type and mention level features

Acme Inc. (mention 1) hired Mr Smith (mention 2) as their new CEO, replacing Mr Bloggs.

Named-entity types (ORG, PER, etc.):

M1: ORG, M2: PER

Concatenation of the two entity types: ORG-PER

Entity level of M1 and M2 {name, nominal, pronoun}:

M1: name, M2: name

(24)

Parse Features

Acme Inc. (mention 1) hired Mr Smith (mention 2) as their new CEO, replacing Mr Bloggs.

Base syntactic chunk sequence from one to the other:

VP

Constituent path through the tree from one to the other:

NP VP NP

Dependency path:

Acme Inc. <-subj- hired -obj-> Mr Smith

(25)

Other Features: Gazetteers and trigger words

• Personal relative trigger list from WordNet: parent, wife, husband, …

• Country name list

• Wikipedia

(26)

Feature summary

Acme Inc. (mention 1) hired Mr Smith (mention 2) as their new CEO, replacing Mr Bloggs.

Entity-based features

M1 type: ORG

M1 head: Inc

M2 type: PER

M2 head: Smith

Word-based features

Between-entity bag of words: {hired}

Words before M1: none

Words after M2: as

Syntactic features

Constituent path: NP VP NP

Basic syntactic chunk path: VP

Typed-dependency path: Acme Inc. <-subj- hired -obj-> Mr Smith

(27)

From Dan Jurafsky

Question: why are the entity head features of both entities needed?

(28)

Features → feature representation of the model

Acme Inc. (mention 1) hired Mr Smith (mention 2) as their new CEO, replacing Mr Bloggs.

Feature between mention 1 and mention 2: word sequences or number of words between them.

Morphological feature of mention 1:

ACME INC, A.C.M.E Inc, Acme Inc, acme inc

Combination feature (order relation):

(company name, person name), (person name, company name)

(29)

Classifiers for supervised methods

(ref. chapter 5 of textbook)

• Choose a model:

1. MaxEnt (maximum entropy model)

2. NB (Naïve Bayes)

3. SVM (support vector machines)

4. …

Train it on the training set, tune it on the development set, test it on the test set.

(30)


Relationship Extraction using Support Vector Machine (SVM)

The support vector machine (SVM) is recognized as one of the best classification algorithms across various applications and domains.

SVM is a method that finds a function that discriminates between two classes.

(Figure legend: positive examples, negative examples)

(31)

Support Vector Machine (SVM)

When classifying natural language data, it is not always possible to separate the data linearly → map them, via a kernel function, into a feature space where they are linearly separable.

(Figure legend: positive examples, negative examples)

(32)

SVMlight: an open software package

Install an SVM package such as SVMlight (http://svmlight.joachims.org/).

Convert your training data to the format the package expects.

Use the SVMlight training command.

SVM Ref:

http://nlp.stanford.edu/IR-book/html/htmledition/support-vector-machines-the-linearly-separable-case-1.html#svm-sv-classifier
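A sketch of the conversion step, assuming the usual SVMlight example-file layout of one instance per line: a target followed by feature:value pairs with increasing feature numbers starting at 1. The helper name and the feature numbering are hypothetical. Check the package documentation before relying on the details:

```python
def to_svmlight(label, features):
    """label: +1 or -1; features: {feature_number (>= 1): value}.

    Zero-valued features are omitted and the rest are written in
    increasing feature-number order.
    """
    parts = [f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0]
    return " ".join([f"{label:+d}"] + parts)


# Hypothetical numbering: 1 = between-words contains "hired",
# 3 = entity type pair is ORG-PER, 7 = mentions are adjacent.
positive = to_svmlight(+1, {3: 1.0, 1: 0.5, 7: 0})
negative = to_svmlight(-1, {3: 1.0})
print(positive)
print(negative)
```

The resulting file is what the package's training program (svm_learn in the SVMlight distribution) reads. Classification of new data is done with svm_classify.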

(33)

A Guide to SVM

• Transform the data to the format of an SVM package.

• Conduct simple scaling on the data.

• Choose a kernel for the SVM.

• Use cross-validation to find the best parameters.

• Train on the whole training set.

• Test.

(34)

Data Preprocessing

• SVM requires that each data instance be represented as a vector of real numbers.

• Use m numbers to represent an m-category attribute. For example, a three-category attribute such as (red, green, blue) can be represented as (0,0,1), (0,1,0), and (1,0,0).
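A minimal sketch of this encoding. The order of the category list is an arbitrary choice. It is set here so that the vectors match the ones above: red is (0,0,1), green is (0,1,0) and blue is (1,0,0).

```python
def one_hot(value, categories):
    """Represent an m-category attribute value as m numbers (exactly one is 1)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


COLOURS = ["blue", "green", "red"]  # fixed order, chosen to match the text

print(one_hot("red", COLOURS))
print(one_hot("green", COLOURS))
print(one_hot("blue", COLOURS))
```

The same category order must be used for the training and the test data.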

(35)

Scaling

• Some attributes may be numeric values, such as the length of a sentence.

• Scale each attribute to [0,1] or [-1,1] before using SVM, for example from [-10,10] to [-1,1].

• How?

x' = (x - min) / (max - min)

Use the same scaling factors for the training and test sets to obtain better results.
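A minimal sketch of min-max scaling, generalized to an arbitrary target range so that it also covers the [-10,10] to [-1,1] case. The minimum and maximum are computed on the training set only and reused for the test set. The function names and data are hypothetical:

```python
def fit_min_max(values):
    """Scaling factors, computed on the training data only."""
    return min(values), max(values)


def scale(x, lo, hi, new_lo=0.0, new_hi=1.0):
    """Map x from [lo, hi] to [new_lo, new_hi]; constant attributes map to new_lo."""
    if hi == lo:
        return new_lo
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)


train = [-10, 0, 5, 10]
test = [2, 20]

lo, hi = fit_min_max(train)  # do not recompute these on the test set
train_scaled = [scale(x, lo, hi, -1.0, 1.0) for x in train]
test_scaled = [scale(x, lo, hi, -1.0, 1.0) for x in test]
print(train_scaled)
print(test_scaled)
```

A test value outside the training range (20 here) falls outside [-1,1]. That is expected when the training factors are reused.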

(36)

Choose a kernel

• Use a linear kernel when the number of features is very large.

• An RBF kernel can handle nonlinear problems.

(37)

Cross-validation & grid-search

In v-fold cross-validation, first divide the training set into v subsets of equal size.

Sequentially, each subset is tested using the classifier trained on the remaining v-1 subsets.

Each instance of the whole training set is predicted once, so the cross-validation accuracy is the percentage of data that are correctly classified.

Grid-search the parameters using cross-validation.
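A minimal sketch of v-fold cross-validation with a grid search. To keep it self-contained it uses a tiny k-nearest-neighbour classifier on one-dimensional data instead of an SVM, so the parameter searched is k rather than the SVM parameters. The data and function names are hypothetical:

```python
def v_fold_indices(n, v):
    """Split the indices 0..n-1 into v folds of (nearly) equal size."""
    return [list(range(i, n, v)) for i in range(v)]


def knn_predict(train, x, k):
    """train: list of (value, label) with labels +1/-1; majority vote of the k nearest."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes >= 0 else -1


def cv_accuracy(data, v, k):
    """Each instance is predicted once by a model trained on the other v-1 folds."""
    correct = 0
    for fold in v_fold_indices(len(data), v):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        correct += sum(knn_predict(train, data[i][0], k) == data[i][1] for i in fold)
    return correct / len(data)


def grid_search(data, v, grid):
    """Return the parameter value with the best cross-validation accuracy."""
    return max(grid, key=lambda k: cv_accuracy(data, v, k))


data = [(1, -1), (2, -1), (3, -1), (4, -1), (6, 1), (7, 1), (8, 1), (9, 1)]
grid = [1, 3, 5]
best_k = grid_search(data, v=4, grid=grid)
print(best_k, cv_accuracy(data, 4, best_k))
```

After the best parameter is found, the classifier is trained once more on the whole training set.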

(38)

Problems of Supervised methods

• High precision, given enough hand-labeled training data.

• Labeling is expensive.

• Supervised models do not generalize well to different genres.

(39)

Comparison of Classification Models

• Test corpus:

Reuters-21578 Data Set

21578 documents

118 categories

An article can be in more than one category

Learn 118 binary category distinctions

Common categories (#train, #test)

• Earn (2877, 1087)

• Acquisitions (1650, 179)

• Money-fx (538, 179)

• Grain (433, 149)

• Crude (389, 189)

• Trade (369,119)

• Interest (347, 131)

• Ship (197, 89)

• Wheat (212, 71)

• Corn (182, 56)

(40)


(41)


Semi-supervised method

Relation Bootstrapping

Gather a set of seeds.

Iterate:

1. Find sentences that contain these seeds.

2. Look at the context between or around the seeds to define a pattern.

3. Use the pattern to find more examples.
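A minimal sketch of the bootstrapping loop, assuming that a pattern is simply the text between the two seeds and that matching is exact. The corpus sentences are made up for illustration and there is no confidence filtering, so noise would accumulate on real data:

```python
import re


def bootstrap(corpus, seeds, iterations=2):
    """Alternate between learning patterns from known pairs and finding new pairs."""
    tuples = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Steps 1-2: find sentences with a known pair; the context between
        # the two entities becomes a pattern.
        for x, y in list(tuples):
            for sentence in corpus:
                m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sentence)
                if m:
                    patterns.add(m.group(1))
        # Step 3: use the patterns to find more pairs.
        for p in patterns:
            for sentence in corpus:
                m = re.fullmatch(r"(.+?)" + re.escape(p) + r"(.+?)\.?", sentence)
                if m:
                    tuples.add((m.group(1), m.group(2)))
    return tuples, patterns


corpus = [
    "Mark Twain is buried in Elmira.",
    "Emily Dickinson is buried in Amherst.",
    "Emily Dickinson rests in Amherst.",
    "Karl Marx rests in London.",
]
tuples, patterns = bootstrap(corpus, {("Mark Twain", "Elmira")}, iterations=2)
print(sorted(tuples))
print(sorted(patterns))
```

The first iteration learns " is buried in " from the seed and finds the second pair. The second iteration uses that new pair to learn " rests in " and finds the third.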

(42)

Bootstrapping from seed entity pairs to learn relations

(43)

Confidence Value for Bootstrapping

Given a document collection D, a current set of tuples T, and a proposed pattern p, two factors need to be considered:

Hits: the set of tuples in T that p matches while looking in D.

Finds: the total set of tuples that p finds in D.

A corpus D: ABCDEF, BDCDFE, CDEFG, HHECDE

Tuple set T: BCDE, ECDE, AUD

Conf(CD) = 2/4 × log(4) ≈ 30%, where 2 is the number of hits, 4 is the number of finds, and the logarithm is base 10.
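A sketch that reproduces this number. It assumes that the toy strings are read as follows: a "find" is any occurrence of the pattern in the corpus, and a "hit" is a tuple of T that contains the pattern and occurs in the corpus. It also assumes a base-10 logarithm, which is the base that gives 30%:

```python
import math


def confidence(pattern, corpus, tuples):
    """Conf(p) = |hits| / |finds| * log10(|finds|)."""
    finds = sum(doc.count(pattern) for doc in corpus)
    hits = sum(
        1 for t in tuples
        if pattern in t and any(t in doc for doc in corpus)
    )
    if finds == 0:
        return 0.0
    return hits / finds * math.log10(finds)


D = ["ABCDEF", "BDCDFE", "CDEFG", "HHECDE"]
T = ["BCDE", "ECDE", "AUD"]

conf = confidence("CD", D, T)
print(round(conf, 4))
```

The first factor rewards patterns whose matches are mostly known tuples. The logarithm rewards patterns that match often.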

(44)


(45)

Example: extract person name and position title

Search engine keywords: Wang Ning + vice mayor

Patterns:

[person] [was assigned | was selected | was appointed as] [position]

New examples:

(46)

Summary

What is relation recognition? Three cases.

How to identify relations?

Pattern-based methods

Supervised methods

Semi-supervised methods

(47)

References

Textbook, chapter 5: Supervised Classification.

Sunita Sarawagi. Information Extraction. Foundations and Trends in Databases, vol. 1, no. 3, 2007, pp. 261-377.

Jun Zhu, et al. StatSnowball: a statistical approach to extracting entity relationships. In Proceedings of WWW 2009, Madrid.

Homework: download the papers from the course website, choose one of the two to read, and finish before 24:00 on October 28.

Mintz, Bills, Snow, Jurafsky. Distant supervision for relation extraction without labeled data. ACL 2009.

TransE

(48)


Distant supervision method

“Distant supervision for relation extraction without labeled data”

What does “distant supervision” mean?

What are the advantages of the method?

What are the disadvantages of the method?
