Word Representations

(1)


Applied Deep Learning

February 21st, 2022 http://adl.miulab.tw

(2)

Meaning Representations

◉ Definition of “Meaning”

o the idea that is represented by a word, phrase, etc.

o the idea that a person wants to express by using words, signs, etc.

o the idea that is expressed in a work of writing, art, etc.


(3)

Meaning Representations in Computers


◉ Knowledge-Based Representation
◉ Corpus-Based Representation

(4)

Meaning Representations in Computers


(5)

Knowledge-Based Representation

◉ Hypernym (is-a) relationships in WordNet

Issues:

▪ newly invented words

▪ subjective

▪ annotation effort

▪ difficult to compute word similarity
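To make the last two issues concrete, here is a minimal sketch over a hypothetical, hand-written is-a taxonomy (not real WordNet data; the words, the `HYPERNYM` table, and the function names are all illustrative assumptions). Similarity has to be derived indirectly from graph structure, here as 1 / (1 + shortest is-a path length), and every new word needs a manually annotated link before it can be compared at all.

```python
# Hypothetical toy is-a taxonomy (child -> parent); NOT real WordNet data.
HYPERNYM = {
    "car": "motor_vehicle",
    "motorcycle": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
    "dog": "animal",
    "animal": "entity",
}


def ancestors(word):
    """Return [word, parent, grandparent, ...] up to the root."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (1 + length of the shortest is-a path between a and b) in a tree."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    depth_b = {node: i for i, node in enumerate(chain_b)}
    # Walking up from a, the first node also on b's chain is the lowest common hypernym.
    for i, node in enumerate(chain_a):
        if node in depth_b:
            return 1.0 / (1 + i + depth_b[node])
    return 0.0  # no shared ancestor


print(path_similarity("car", "motorcycle"))  # 1/3: both are motor vehicles
print(path_similarity("car", "dog"))         # 1/7: they only meet at "entity"
```

The score is coarse: it depends only on how the annotators drew the hierarchy, which is exactly the "subjective" and "annotation effort" concern above.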

(6)

Meaning Representations in Computers


(7)

Corpus-Based Representation

◉ Atomic symbols: one-hot representation

car        = [0 0 0 0 0 0 1 0 0 … 0]
motorcycle = [0 0 1 0 0 0 0 0 0 … 0]
car AND motorcycle = 0

Issues: difficult to compute the similarity (e.g. comparing “car” and “motorcycle”)

Idea: words with similar meanings often have similar neighbors
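A minimal sketch of why one-hot vectors cannot express similarity, assuming a hypothetical 10-word vocabulary chosen so that “car” and “motorcycle” land at the same positions as in the vectors above. Any two distinct one-hot vectors share no active dimension, so their dot product is always 0, no matter how related the words are.

```python
# Hypothetical vocabulary; "motorcycle" is index 2 and "car" is index 6.
VOCAB = ["a", "the", "motorcycle", "road", "drive", "fast", "car", "red", "engine", "wheel"]


def one_hot(word, vocab):
    """Vector of len(vocab) zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(u, v):
    """Sum of elementwise products (the "AND" of two 0/1 vectors)."""
    return sum(x * y for x, y in zip(u, v))


car = one_hot("car", VOCAB)
motorcycle = one_hot("motorcycle", VOCAB)
print(car)                   # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(dot(car, motorcycle))  # 0
```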

(8)

Corpus-Based Representation

◉ Neighbor-based representation

o Co-occurrence matrix constructed via neighbors
o Neighbor definition: full document vs. windows

full document: word-document co-occurrence matrix gives general topics → “Latent Semantic Analysis”

windows: context window for each word → captures syntactic (e.g. POS) and semantic information
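For the “full document” definition of neighbors, a minimal sketch of a word-document count matrix, assuming three hypothetical toy documents and plain whitespace tokenization. Words that tend to appear in the same documents (here “car” and “motorcycle”) get similar rows, which is the topic-level signal that Latent Semantic Analysis builds on.

```python
from collections import Counter

# Hypothetical toy documents.
docs = [
    "car and motorcycle traffic on the road",
    "a car engine and a motorcycle engine",
    "deep learning needs data",
]


def word_document_matrix(docs):
    """Rows = words (sorted), columns = documents, entries = raw counts."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    per_doc = [Counter(toks) for toks in tokenized]
    # Counter returns 0 for words absent from a document.
    matrix = [[counts[w] for counts in per_doc] for w in vocab]
    return vocab, matrix


vocab, M = word_document_matrix(docs)
rows = dict(zip(vocab, M))
print(rows["car"], rows["motorcycle"], rows["learning"])  # [1, 1, 0] [1, 1, 0] [0, 0, 1]
```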

(9)

Window-Based Co-occurrence Matrix

◉ Example

o Window length = 1
o Left or right context
o Corpus:

I love AI.

I love deep learning.

I enjoy learning.

Counts    I  love  enjoy  AI  deep  learning
I         0  2     1      0   0     0
love      2  0     0      1   1     0
enjoy     1  0     0      0   0     1
AI        0  1     0      0   0     0
deep      0  1     0      0   0     1
learning  0  0     1      0   1     0

similarity > 0 (e.g. the rows for “love” and “enjoy” now overlap, since both co-occur with “I”)
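The table above can be reproduced with a short sketch, assuming punctuation is dropped and the vocabulary order is fixed to match the table (the function names are illustrative, not from the slides). With window length 1, each token counts its immediate left and right neighbors; cosine similarity between rows is then nonzero for words that share neighbors.

```python
import math
import re


def cooccurrence_matrix(sentences, vocab, window=1):
    """Symmetric word-word counts within +/- `window` tokens of each word."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in vocab]
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence)  # drops punctuation such as "."
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word]][index[tokens[j]]] += 1
    return counts


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


corpus = ["I love AI.", "I love deep learning.", "I enjoy learning."]
vocab = ["I", "love", "enjoy", "AI", "deep", "learning"]
X = cooccurrence_matrix(corpus, vocab, window=1)
for word, row in zip(vocab, X):
    print(f"{word:>8}", row)
print(cosine(X[1], X[2]))  # "love" vs "enjoy": positive, both neighbor "I"
```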

Issues:

▪ matrix size increases with vocabulary

▪ high dimensional

▪ sparsity → poor robustness

Idea: low-dimensional word vectors

(10)

Low-Dimensional Dense Word Vector

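◉ Method 1: dimension reduction on the matrix
◉ Singular Value Decomposition (SVD) of co-occurrence matrix X

Approximate X by keeping only the k largest singular values:

$$X = U \Sigma V^\top \approx U_k \Sigma_k V_k^\top$$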
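A minimal NumPy sketch of Method 1 applied to the window-based matrix from the example corpus. Taking each word's vector as its row of U_k Σ_k is an assumption (one common choice; some setups use U_k alone), and the helper names are illustrative. Keeping all singular values reconstructs X exactly, while smaller k gives a lower-rank approximation with dense k-dimensional word vectors.

```python
import numpy as np

vocab = ["I", "love", "enjoy", "AI", "deep", "learning"]
# Window-1 co-occurrence counts from the example corpus (rows/cols follow vocab).
X = np.array([
    [0, 2, 1, 0, 0, 0],
    [2, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)


def truncated_svd(X, k):
    """Top-k factors; np.linalg.svd returns singular values in descending order."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def word_vectors(X, k):
    """One dense k-dim vector per word: rows of U_k * Sigma_k (assumed choice)."""
    U_k, s_k, _ = truncated_svd(X, k)
    return U_k * s_k  # broadcasting scales column i by s_k[i]


def rank_k_approx(X, k):
    """U_k Sigma_k V_k^T, the best rank-k approximation of X."""
    U_k, s_k, Vt_k = truncated_svd(X, k)
    return U_k @ np.diag(s_k) @ Vt_k


vecs = word_vectors(X, 2)
for word, v in zip(vocab, vecs):
    print(f"{word:>8}", np.round(v, 3))
```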

(11)

Low-Dimensional Dense Word Vector

◉ Method 1: dimension reduction on the matrix
◉ Singular Value Decomposition (SVD) of co-occurrence matrix X

[Figure: SVD word vectors capturing semantic relations and syntactic relations. Rohde et al., “An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence,” 2005.]

Issues:

▪ computationally expensive: O(mn²) for an n×m matrix with n < m
▪ difficult to add new words

Idea: directly learn low-dimensional word vectors

(12)

Low-Dimensional Dense Word Vector

◉ Method 2: directly learn low-dimensional word vectors

○ Learning representations by back-propagating errors (Rumelhart et al., 1986)
○ A neural probabilistic language model (Bengio et al., 2003)
○ NLP (almost) from Scratch (Collobert & Weston, 2008)
○ Recent and most popular models: word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)

• Also known as “word embeddings”
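To illustrate "directly learn", here is a minimal sketch of a skip-gram objective trained with plain SGD on the example corpus. It is a simplified variant with a full softmax over a 6-word vocabulary, not word2vec's actual training recipe (which adds tricks such as negative sampling) and not GloVe; `skipgram_pairs`, `train_skipgram`, and all hyperparameters are hypothetical. Each row of `W_in` ends up as a dense 3-dimensional word embedding learned from predicting neighbors.

```python
import numpy as np


def skipgram_pairs(sentences, window=1):
    """(center, context) index pairs within +/- window tokens."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    pairs = []
    for s in sentences:
        toks = s.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    pairs.append((index[w], index[toks[j]]))
    return vocab, pairs


def train_skipgram(pairs, vocab_size, dim=3, lr=0.1, epochs=200, seed=0):
    """SGD on -log softmax(W_in[center] @ W_out)[context]; returns embeddings, losses."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # word (center) embeddings
    W_out = rng.normal(scale=0.1, size=(dim, vocab_size))  # context output weights
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = W_in[center]                    # (dim,)
            scores = h @ W_out                  # (vocab_size,)
            scores -= scores.max()              # numerical stability
            e = np.exp(scores)
            probs = e / e.sum()
            total -= np.log(probs[context])
            grad = probs.copy()
            grad[context] -= 1.0                # d loss / d scores
            grad_h = W_out @ grad               # gradient w.r.t. h, using old W_out
            W_out -= lr * np.outer(h, grad)
            W_in[center] -= lr * grad_h
        losses.append(total / len(pairs))       # mean loss over the epoch
    return W_in, losses


vocab, pairs = skipgram_pairs(["I love AI", "I love deep learning", "I enjoy learning"])
W_in, losses = train_skipgram(pairs, len(vocab))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Unlike SVD, new training text only requires more gradient steps rather than refactorizing the whole matrix.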


(13)

Summary

◉ Knowledge-based representation
◉ Corpus-based representation
✓ Atomic symbol
✓ Neighbors
o High-dimensional sparse word vector
o Low-dimensional dense word vector
▪ Method 1 – dimension reduction
▪ Method 2 – direct learning