### Sequence Modeling

Idea: aggregate the meaning from all words into a vector

*Compositionality*
Method:

◦Basic combination: average, sum

◦Neural combination:

Recursive neural network (RvNN)

Recurrent neural network (RNN)

Convolutional neural network (CNN)

How to compute: e.g., "這 規格 有 誠意" ("this specification has sincerity"), where each word vector and the resulting sentence vector are *N*-dim.
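The basic combinations can be sketched with toy embeddings (the 4-dimensional vectors below are invented purely for illustration):

```python
import numpy as np

# Hypothetical 4-dim embeddings for the example sentence
# 這/this 規格/specification 有/have 誠意/sincerity
emb = {
    'this':          np.array([0.2, 0.1, 0.4, 0.3]),
    'specification': np.array([0.5, 0.2, 0.1, 0.7]),
    'have':          np.array([0.1, 0.6, 0.3, 0.2]),
    'sincerity':     np.array([0.4, 0.3, 0.8, 0.1]),
}

words = ['this', 'specification', 'have', 'sincerity']
vectors = np.stack([emb[w] for w in words])

sentence_sum = vectors.sum(axis=0)    # sum combination
sentence_avg = vectors.mean(axis=0)   # average combination
```

Both results are still *N*-dim, but they discard word order, which motivates the neural combinations below.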

## Recursive Neural Network

From Words to Phrases

### Recursive Neural Network

Idea: leverage the linguistic knowledge (syntax) for combining multiple words into phrases

Assumption: language is described recursively

### Related Work for RvNN

Pollack (1990): recursive auto-associative memories

Goller & Küchler (1996) and Costa et al. (2003): earlier recursive neural networks that assumed a fixed tree structure and used one-hot vectors

Hinton (1990) and Bottou (2011): related ideas about recursive models and recursive operators as smooth versions of logic operations

### Outline

Property

◦Syntactic Compositionality

◦Recursion Assumption

Network Architecture and Definition

◦Standard Recursive Neural Network

◦ Weight-Tied

◦ Weight-Untied

◦Matrix-Vector Recursive Neural Network

◦Recursive Neural Tensor Network

Applications

◦Parsing

◦Paraphrase Detection

◦Sentiment Analysis


### Phrase Mapping

Principle of “Compositionality”

◦The meaning (vector) of a sentence is determined by 1) the meanings of its words and 2) the rules that combine them

Idea: jointly learn parse trees and compositional vector representations

[Figure: jointly learned parse trees and compositional vector representations for "the country of my birth" and "the place where I was born"]

### Sentence Syntactic Parsing

**Parsing is a process of analyzing a string of symbols.** A parse tree conveys:

**1) Part-of-speech for each word**

"The cat sat on the mat." → DT NN VB IN DT NN

(NN = noun, VB = verb, DT = determiner, IN = preposition)

**2) Phrases**

• Noun phrase (NP): “the cat”, “the mat”

• Preposition phrase (PP): “on the mat”

• Verb phrase (VP): “sat on the mat”

• Sentence (S): “the cat sat on the mat”

**3) Relationships**

• “the cat” is the subject of “sat”

• “on the mat” is the place modifier of “sat”

Full parse tree: (S (NP (DT The) (NN cat)) (VP (VB sat) (PP (IN on) (NP (DT the) (NN mat)))))
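A parse tree like this can be represented with plain nested tuples; the sketch below (not tied to any parser library) extracts the phrase covered by each internal node:

```python
# A parse tree as nested tuples: (label, child1, child2, ...);
# leaves are (POS, word) pairs.
TREE = ('S',
        ('NP', ('DT', 'The'), ('NN', 'cat')),
        ('VP', ('VB', 'sat'),
               ('PP', ('IN', 'on'),
                      ('NP', ('DT', 'the'), ('NN', 'mat')))))

def words(node):
    """Collect the words under a node, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]          # preterminal: (POS, word)
    return [w for c in children for w in words(c)]

def phrases(node):
    """Yield (label, phrase) for every internal node."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return
    yield (label, ' '.join(words(node)))
    for c in children:
        yield from phrases(c)
```

Iterating `phrases(TREE)` recovers exactly the phrase list above (NP, PP, VP, S with their word spans).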

### Learning Structure & Representation

Vector representations incorporate the meaning of words and their compositional structures

[Figure: parse tree of "The cat sat on the mat." with a vector representation at each node (NP, PP, VP, S)]


### Recursion Assumption

Are languages recursive? This is debatable, but recursion helps describe natural language.

◦Ex. “the church which has nice windows” is a noun phrase containing a relative clause that itself contains a noun phrase

◦i.e., a rule such as NP → NP PP applies recursively

### Recursion Assumption

Characteristics of recursion

1. Helpful in disambiguation

2. Helpful for some tasks to refer to specific phrases:

◦ John and Jane went to a big festival. They enjoyed the trip and the music there.

◦ “they”: John and Jane; “the trip”: went to a big festival; “there”: big festival

3. Using the grammatical tree structure works better for some tasks

Whether language is truly recursive remains under debate.


### Recursive Neural Network Architecture

The network predicts vector representations along with the structure

◦Input: the vector representations of two candidate children

◦Output:

1) the vector representation of the merged node

2) a score of how plausible the new node would be

[Figure: the network merges "on" (IN) and "the mat" (NP) into a scored PP node]

### Recursive Neural Network Definition

Given two child vectors, the network outputs

1) the vector representation of the merged node

2) a score of how plausible the new node would be

The network is *weight-tied*: the same W parameters are used at all nodes of the tree.
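A minimal numerical sketch of this definition, following the common formulation $p=\tanh(W[c_1;c_2]+b)$, $\text{score}=u^\top p$ (all parameter values below are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # embedding dimension

# Weight-tied parameters, shared by every node in the tree
W = rng.standard_normal((d, 2 * d)) * 0.1    # composition matrix
b = np.zeros(d)                              # bias
u = rng.standard_normal(d)                   # scoring vector

def compose(c1, c2):
    """Merge two child vectors into a parent vector plus a
    plausibility score for the merge."""
    parent = np.tanh(W @ np.concatenate([c1, c2]) + b)
    score = float(u @ parent)
    return parent, score

c1, c2 = rng.standard_normal(d), rng.standard_normal(d)
parent, score = compose(c1, c2)
```

Because the parent has the same dimensionality as each child, the same `compose` can be applied recursively up the tree.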

### Sentence Parsing via RvNN

Parsing proceeds greedily, bottom-up: the network scores every pair of adjacent nodes, merges the highest-scoring pair into a new node, and repeats until a single node remains. The local merge scores sum to the sentence parsing score, and the root vector serves as the sentence vector embedding.
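The greedy procedure can be sketched as follows (the composition and scoring parameters are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W = rng.standard_normal((d, 2 * d)) * 0.1
u = rng.standard_normal(d)

def compose(c1, c2):
    parent = np.tanh(W @ np.concatenate([c1, c2]))
    return parent, float(u @ parent)

def greedy_parse(leaves):
    """Repeatedly merge the highest-scoring adjacent pair.
    Returns the root vector and the total parsing score."""
    nodes = list(leaves)
    total = 0.0
    while len(nodes) > 1:
        # Score every adjacent pair of current nodes
        merges = [compose(nodes[i], nodes[i + 1])
                  for i in range(len(nodes) - 1)]
        i = int(np.argmax([s for _, s in merges]))
        parent, score = merges[i]
        nodes[i:i + 2] = [parent]     # replace the pair by its parent
        total += score
    return nodes[0], total

leaves = [rng.standard_normal(d) for _ in range(5)]
root, score = greedy_parse(leaves)
```

Greedy merging is a simplification; beam search over candidate merges is a common refinement.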

### Backpropagation through Structure

Principally the same as general backpropagation (Goller& Küchler, 1996)

**Forward Pass**

$$z^{l}_{j} = \sum_{i} W^{l}_{ji}\, a^{l-1}_{i}, \qquad a^{l}_{j} = f\big(z^{l}_{j}\big)$$

**Backward Pass**

$$\delta^{l}_{j} = f'\big(z^{l}_{j}\big) \sum_{k} W^{l+1}_{kj}\, \delta^{l+1}_{k}$$

Three differences from standard backpropagation:

1) Sum derivatives of W from all nodes

2) Split derivatives at each node

3) Add error messages from the parent and the node itself

### 1) Sum derivatives of W from all nodes

Because the same W is used at every node of the tree, its total derivative is the sum of the derivatives computed at each node.

### 2) Split derivatives at each node

During forward propagation, a parent node is computed from its two children; during backward propagation, the error must therefore be split and propagated back to each of them.

### 3) Add error messages

For each node, the error message is composed of

◦the error propagated from its parent

◦the error from the current node itself
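These differences can be checked on a tiny two-node tree with a toy squared-error loss at the root (all values here are illustrative): the analytic gradient sums the contributions of W from both nodes and splits the parent error back to the children.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
W = rng.standard_normal((d, 2 * d)) * 0.5
a, b, c = (rng.standard_normal(d) for _ in range(3))

def forward(W):
    """Tree ((a b) c): compose a,b into p1, then p1,c into p2."""
    p1 = np.tanh(W @ np.concatenate([a, b]))
    p2 = np.tanh(W @ np.concatenate([p1, c]))
    return p1, p2, 0.5 * float(p2 @ p2)      # toy loss at the root

p1, p2, loss = forward(W)

# Backward pass through the structure:
dz2 = p2 * (1 - p2 ** 2)                     # error at the root node
dW = np.outer(dz2, np.concatenate([p1, c]))  # derivative from the root

err = (W.T @ dz2)[:d]                        # 2) split: part going to p1
dz1 = err * (1 - p1 ** 2)                    # (no extra loss at p1 itself)
dW += np.outer(dz1, np.concatenate([a, b]))  # 1) sum over all nodes
```

A finite-difference check on any entry of `W` confirms that the summed gradient is correct.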


Issue with the weight-tied composition matrix W: the same network W is used for all compositions, regardless of the syntactic categories being combined.

### Syntactically Untied RvNN

Idea: condition the composition function on the syntactic categories of the children

Benefit

• Composition functions are syntax-dependent

• Allows different composition functions for different category pairs, e.g. Adv + AdjP, VP + NP

Issue: slow, because many candidate compositions must be scored

### Compositional Vector Grammar

Compute the score only for a subset of trees coming from a simpler, faster model (Socher et al., 2013)

◦Prunes very unlikely candidates for speed

◦Provides coarse syntactic categories of the children for each beam candidate

A probabilistic context-free grammar (PCFG) helps decrease the search space

### Labels for RvNN

Each node's category scores (a linear transform $W_s\,p$ of the node vector $p$) can be passed through a softmax function to compute the probability of each category (e.g., NP):

$$y = \mathrm{softmax}(W_s\, p)$$

The softmax cross-entropy error is used for optimization.
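A sketch of the category prediction (the classifier matrix `W_s`, its values, and the 3-class inventory are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_classes = 4, 3                  # e.g., classes NP, VP, PP
W_s = rng.standard_normal((n_classes, d))
p = np.tanh(rng.standard_normal(d)) # a node's vector representation

def softmax(z):
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax(W_s @ p)
label = int(np.argmax(probs))
# Training would minimize cross-entropy: -log(probs[true_label])
```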


### Recursive Neural Network


Issue: some words act mostly as an operator, e.g. “very” in “very good”

### Matrix-Vector Recursive Neural Network


Idea: each word can additionally serve as an operator
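In the matrix-vector formulation (as in Socher et al., 2012), each word carries both a vector and a matrix; for children $(a, A)$ and $(b, B)$, the parent vector $p$ and parent matrix $P$ are:

```latex
p = g\!\left(W \begin{bmatrix} B a \\ A b \end{bmatrix}\right),
\qquad
P = W_M \begin{bmatrix} A \\ B \end{bmatrix}
```

Each child's matrix transforms the other child's vector, so an operator word like "very" can scale the meaning of its neighbor.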


### Recursive Neural Tensor Network

Idea: allow more interactions of vectors
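With the two child vectors stacked as $x = [c_1; c_2]$, the tensor layer (as in Socher et al., 2013) adds a bilinear interaction term to the standard composition:

```latex
p = f\!\left( x^{\top} V^{[1:d]} x + W x \right)
```

Each slice $V^{[i]}$ of the tensor produces one dimension of the parent, capturing multiplicative interactions between the children that the purely linear $Wx$ cannot.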


### Language Compositionality

### Image Compositionality

Idea: an image can be composed from its visual segments, analogous to natural language parsing


### Paraphrase for Learning Sentence Vectors

Sentence embeddings can be learned by pairwise comparison of the nodes of two parsed trees


### Sentiment Analysis

Sentiment analysis for sentences with negation words can benefit from RvNN

### Sentiment Analysis

Sentiment Treebank with richer annotations

Phrase-level sentiment labels indeed improve the performance

### Sentiment Tree Illustration

Stanford live demo: http://nlp.stanford.edu/sentiment/

Phrase-level annotations learn the specific compositional functions for sentiment

### Concluding Remarks

Recursive Neural Network

◦ Idea: syntactic compositionality & language recursion

Network Variants

◦Standard Recursive Neural Network

◦ Weight-Tied

◦ Weight-Untied

◦Matrix-Vector Recursive Neural Network

◦Recursive Neural Tensor Network