Sequence Modeling
Idea: aggregate the meanings of all words into one vector
Compositionality Method:
◦Basic combination: average, sum
◦Neural combination:
Recursive neural network (RvNN)
Recurrent neural network (RNN)
Convolutional neural network (CNN)
How to compute: combine the word vectors of “這 (this) 規格 (specification) 有 (have) 誠意 (sincerity)” into a single N-dim sentence vector
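Before the neural methods, the basic combination baseline is worth making concrete. A minimal NumPy sketch of averaging word vectors into a sentence vector (the toy 3-dim vectors are illustrative, not real embeddings):

```python
import numpy as np

# Toy word vectors (3-dim for illustration); real embeddings are learned.
vectors = {
    "this": np.array([0.2, 0.1, 0.5]),
    "specification": np.array([0.4, 0.3, 0.1]),
    "have": np.array([0.1, 0.6, 0.2]),
    "sincerity": np.array([0.7, 0.2, 0.4]),
}

def average_composition(words):
    """Basic combination: average the word vectors into one sentence vector."""
    return np.mean([vectors[w] for w in words], axis=0)

sentence_vec = average_composition(["this", "specification", "have", "sincerity"])
```

Averaging (or summing) ignores word order and syntax entirely, which is exactly the limitation the recursive models below address.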
Recursive Neural Network
From Words to Phrases
Recursive Neural Network
Idea: leverage the linguistic knowledge (syntax) for combining multiple words into phrases
Assumption: language is described recursively
Related Work for RvNN
Pollack (1990): Recursive auto-associative memories
Goller & Küchler (1996), Costa et al. (2003): earlier recursive neural network work that assumed a fixed tree structure and used one-hot vectors
Hinton (1990) and Bottou (2011): related ideas about recursive models and recursive operators as smooth versions of logic operations
Outline
Property
◦Syntactic Compositionality
◦Recursion Assumption
Network Architecture and Definition
◦Standard Recursive Neural Network
◦ Weight-Tied
◦ Weight-Untied
◦Matrix-Vector Recursive Neural Network
◦Recursive Neural Tensor Network
Applications
◦Parsing
◦Paraphrase Detection
◦Sentiment Analysis
Phrase Mapping
Principle of “Compositionality”
◦The meaning (vector) of a sentence is determined by 1) the meanings of its words and 2) the rules that combine them
Idea: jointly learn parse trees and compositional vector representations
[Figure: parse trees with a vector at each node for “the country of my birth” and “the place where I was born”, which map to nearby points in vector space]
Sentence Syntactic Parsing
Parsing is the process of analyzing a string of symbols. A parse tree conveys:
1) Part-of-speech tags for each word
◦ The cat sat on the mat. → DT NN VB IN DT NN
(NN = noun, VB = verb, DT = determiner, IN = preposition)
2) Phrases
• Noun phrase (NP): “the cat”, “the mat”
• Prepositional phrase (PP): “on the mat”
• Verb phrase (VP): “sat on the mat”
• Sentence (S): “the cat sat on the mat”
3) Relationships
• “the cat” is the subject of “sat”
• “on the mat” is the place modifier of “sat”
Learning Structure & Representation
Vector representations incorporate the meaning of words and their compositional structures
The cat sat on the mat.
[Figure: parse tree of the sentence, with a learned vector at each node (NP, PP, VP, S)]
Outline
Property
◦Syntactic Compositionality
◦Recursion Assumption
Network Architecture and Definition
◦Standard Recursive Neural Network
◦ Weight-Tied
◦ Weight-Untied
◦Matrix-Vector Recursive Neural Network
◦Recursive Neural Tensor Network Applications
◦Parsing
◦Paraphrase Detection
◦Sentiment Analysis
Recursion Assumption
Are languages recursive?
Recursion helps describe natural language
◦Ex. “the church which has nice windows”: a noun phrase containing a relative clause that itself contains a noun phrase
◦NP → NP PP
◦Whether language is truly recursive is debatable
Recursion Assumption
Characteristics of recursion
1. Helpful in disambiguation
2. Helpful for some tasks to refer to specific phrases:
◦ John and Jane went to a big festival. They enjoyed the trip and the music there.
◦ “they”: John and Jane; “the trip”: went to a big festival; “there”: big festival
3. Some tasks work better with grammatical tree structure
Whether language is recursive is still up for debate
Recursive Neural Network Architecture
The network jointly predicts the vectors and the structure
◦Input: the vector representations of two candidate children
◦Output:
1) a vector representation for the merged node
2) a score of how plausible the new node would be
[Figure: a neural network merges “on” with the NP “the mat” into a PP node, outputting its vector and score]
Recursive Neural Network Definition
The network outputs:
1) a vector representation for the merged node
2) a score of how plausible the new node would be
The same W parameters are used at all nodes of the tree (weight-tied)
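The standard composition step above can be sketched in a few lines of NumPy. This is a minimal sketch, not the exact published implementation; the dimension, random weights, and tanh/linear-score choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # vector dimension (illustrative)
W = rng.normal(scale=0.1, size=(d, 2 * d))   # shared (weight-tied) composition matrix
b = np.zeros(d)                              # composition bias
u = rng.normal(scale=0.1, size=d)            # scoring vector

def compose(c1, c2):
    """Merge two child vectors into a parent vector plus a plausibility score."""
    parent = np.tanh(W @ np.concatenate([c1, c2]) + b)  # same W at every node
    score = u @ parent                                  # how plausible this merge is
    return parent, score

p, s = compose(rng.normal(size=d), rng.normal(size=d))
```

Because W is shared, the same function maps any pair of children (words or phrases) into the same d-dimensional space, which is what lets the network recurse.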
Sentence Parsing via RvNN
The network is applied bottom-up: at each step, score every pair of adjacent nodes, merge the highest-scoring pair into a parent node, and repeat on the reduced sequence
◦The sum of the merge scores gives the sentence parsing score
◦The final root vector gives the sentence vector embedding
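The greedy bottom-up procedure above can be sketched as follows. This is an assumed simplification (pure greedy search, no beam, random illustrative weights), not the full parser from the papers:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W = rng.normal(scale=0.1, size=(d, 2 * d))   # shared composition matrix
u = rng.normal(scale=0.1, size=d)            # scoring vector

def compose(c1, c2):
    parent = np.tanh(W @ np.concatenate([c1, c2]))
    return parent, float(u @ parent)

def greedy_parse(leaves):
    """Repeatedly merge the adjacent pair with the highest score
    until one vector (the sentence embedding) remains."""
    nodes = list(leaves)
    total = 0.0
    while len(nodes) > 1:
        cands = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(cands)), key=lambda i: cands[i][1])
        parent, score = cands[best]
        nodes[best:best + 2] = [parent]      # replace the pair with its parent
        total += score                       # accumulate the parsing score
    return nodes[0], total                   # sentence vector, parsing score

vec, score = greedy_parse([rng.normal(size=d) for _ in range(5)])
```

Greedy merging is fast but can commit to locally good, globally bad trees; this motivates the beam search with PCFG pruning discussed later.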
Backpropagation through Structure
Principally the same as general backpropagation (Goller & Küchler, 1996)
Forward pass: a_j^(l) = σ( Σ_i w_ji^(l) a_i^(l−1) ), with the input layer a^(0) = x
Backward pass: δ_i^(l−1) = σ′(z_i^(l−1)) Σ_j w_ji^(l) δ_j^(l)
Three differences
Sum derivatives of W from all nodes
Split derivatives at each node
Add error messages from parent + node itself
1) Sum derivatives of W from all nodes
2) Split derivatives at each node
During forward propagation, the parent node is computed based on two children
During backward propagation, the errors should be computed wrt each of them
3) Add error messages
For each node, the error message is composed of
◦the error propagated from its parent
◦the error from the current node itself
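The three differences can be made concrete in a small recursive forward/backward sketch. This is a structural illustration under assumed tanh activations and random weights; the per-node task error (difference 3) is noted but omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
W = rng.normal(scale=0.1, size=(d, 2 * d))
dW = np.zeros_like(W)                        # 1) gradient of W summed over all nodes

def forward(tree):
    """tree is a leaf vector or a (left, right) tuple; returns (vector, annotated node)."""
    if isinstance(tree, np.ndarray):
        return tree, ("leaf", tree)
    lvec, lnode = forward(tree[0])
    rvec, rnode = forward(tree[1])
    p = np.tanh(W @ np.concatenate([lvec, rvec]))
    return p, ("branch", p, lvec, rvec, lnode, rnode)

def backward(node, delta):
    """delta: error arriving from the parent; 3) a node's own task error
    would be added to delta here (omitted in this structural sketch)."""
    global dW
    if node[0] == "leaf":
        return
    _, p, lvec, rvec, lnode, rnode = node
    grad_z = delta * (1 - p ** 2)            # tanh derivative
    children = np.concatenate([lvec, rvec])
    dW += np.outer(grad_z, children)         # 1) accumulate W's gradient across nodes
    child_delta = W.T @ grad_z
    backward(lnode, child_delta[:d])         # 2) split the error between the
    backward(rnode, child_delta[d:])         #    two children

vec, ann = forward(((rng.normal(size=d), rng.normal(size=d)), rng.normal(size=d)))
backward(ann, np.ones(d))
```

Because every branch reuses the same W, its gradient is the sum over all tree positions, exactly as in backpropagation through time for RNNs.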
Composition Matrix W
Issue: using the same network W for different compositions
Syntactically Untied RvNN
Idea: the composition function is conditioned on the syntactic categories
Benefit
• Composition functions are syntax-dependent
• Allows different composition functions for different category pairs, e.g. Adv + AdjP, VP + NP
Issue: slow, due to the many candidate compositions
Compositional Vector Grammar
Compute scores only for a subset of trees coming from a simpler, faster model (Socher et al., 2013)
◦Prunes very unlikely candidates for speed
◦Provides coarse syntactic categories of the children for each beam candidate
A probabilistic context-free grammar (PCFG) helps decrease the search space
Labels for RvNN
The score can be passed through a softmax function to compute the probability of each category
[Figure: a node vector in the tree over x1 … x4 is fed through a softmax layer to predict its category label, e.g. NP]
Optimization minimizes the cross-entropy (softmax) loss against the gold labels
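The labeling head is just a linear layer plus softmax on top of each node vector. A minimal sketch with illustrative dimensions and random weights (the three classes stand in for categories such as NP, VP, PP):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_classes = 4, 3                     # illustrative: 3 categories, e.g. NP / VP / PP
Ws = rng.normal(scale=0.1, size=(n_classes, d))  # softmax classifier weights

def softmax(z):
    e = np.exp(z - z.max())             # shift for numerical stability
    return e / e.sum()

def label_probs(node_vec):
    """Map a node vector to a probability over categories; training
    minimizes the cross-entropy of these against the gold label."""
    return softmax(Ws @ node_vec)

probs = label_probs(rng.normal(size=d))
```

The same head can be attached to every node, so internal phrases get labels, not just the root.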
Recursive Neural Network
Issue: some words act mostly as an operator, e.g. “very” in “very good”
Matrix-Vector Recursive Neural Network
Idea: each word can additionally serve as an operator
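In the matrix-vector formulation (Socher et al., 2012), each word carries both a vector (its meaning) and a matrix (its effect as an operator on neighbors). A sketch under assumed dimensions and random composition weights:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
W = rng.normal(scale=0.1, size=(d, 2 * d))    # vector composition
WM = rng.normal(scale=0.1, size=(d, 2 * d))   # matrix (operator) composition

def mv_compose(a, A, b, B):
    """Each child's matrix first transforms the sibling's vector,
    so operator-like words (e.g. 'very') can scale their neighbor."""
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # vectors modified by sibling matrices
    P = WM @ np.vstack([A, B])                       # the parent's own operator matrix
    return p, P

a, b = rng.normal(size=d), rng.normal(size=d)
A, B = np.eye(d), np.eye(d)     # identity matrix = the word acts as a no-op operator
p, P = mv_compose(a, A, b, B)
```

An intensifier like “very” can learn a matrix far from identity, amplifying the vector of “good”, which a single shared W cannot express.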
Recursive Neural Tensor Network
Idea: allow more interactions of vectors
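The tensor network adds a bilinear term: each output dimension mixes the concatenated children through its own slice of a tensor V, on top of the standard linear W term. A sketch with assumed illustrative dimensions and random weights:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
W = rng.normal(scale=0.1, size=(d, 2 * d))          # standard linear composition
V = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))   # tensor: one slice per output dim

def rntn_compose(c1, c2):
    """Each output dimension k computes c^T V[k] c, a bilinear interaction
    between the children that the standard RvNN's linear term lacks."""
    c = np.concatenate([c1, c2])
    bilinear = np.array([c @ V[k] @ c for k in range(d)])
    return np.tanh(bilinear + W @ c)

parent = rntn_compose(rng.normal(size=d), rng.normal(size=d))
```

The multiplicative interactions are what let the model capture effects like negation flipping sentiment, rather than merely shifting it.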
Language Compositionality
Image Compositionality
Idea: an image can be composed of visual segments (analogous to parsing in natural language)
Paraphrase for Learning Sentence Vectors
Pairwise comparison of the nodes across two parse trees provides a signal for learning sentence embeddings
Sentiment Analysis
Sentiment analysis for sentences with negation words can benefit from RvNN
Sentiment Analysis
Sentiment Treebank with richer annotations
Phrase-level sentiment labels indeed improve the performance
Sentiment Tree Illustration
Stanford live demo: http://nlp.stanford.edu/sentiment/
Phrase-level annotations learn the specific compositional functions for sentiment
Concluding Remarks
Recursive Neural Network
◦Idea: syntactic compositionality & language recursion
Network Variants
◦Standard Recursive Neural Network
◦ Weight-Tied
◦ Weight-Untied
◦Matrix-Vector Recursive Neural Network
◦Recursive Neural Tensor Network