Word Representations
Applied Deep Learning
February 21st, 2022 http://adl.miulab.tw
Meaning Representations
◉ Definition of “Meaning”
o the idea that is represented by a word, phrase, etc.
o the idea that a person wants to express by using words, signs, etc.
o the idea that is expressed in a work of writing, art, etc.
Meaning Representations in Computers
◉ Knowledge-Based Representation
◉ Corpus-Based Representation
Knowledge-Based Representation
◉ Hypernyms (is-a) relationships of WordNet
Issues:
▪ newly-invented words
▪ subjective
▪ annotation effort
▪ difficult to compute word similarity
Corpus-Based Representation
◉ Atomic symbols: one-hot representation
car = [0 0 0 0 0 0 1 0 0 … 0]
car [0 0 0 0 0 0 1 0 0 … 0] AND motorcycle [0 0 1 0 0 0 0 0 0 … 0] = 0
Issues: difficult to compute the similarity (e.g. comparing “car” and “motorcycle”)
Idea: words with similar meanings often have similar neighbors
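The orthogonality problem of one-hot vectors can be checked directly. The sketch below is only an illustration: the four-word vocabulary and its index order are assumptions, not taken from the slides.

```python
import numpy as np

# Hypothetical toy vocabulary; the index assigned to each word is arbitrary.
vocab = ["car", "motorcycle", "road", "banana"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[index[word]] = 1
    return vec

car = one_hot("car")
motorcycle = one_hot("motorcycle")
banana = one_hot("banana")

# Any two distinct words are orthogonal, so a related pair and an
# unrelated pair get exactly the same similarity score.
print(car @ motorcycle)  # 0
print(car @ banana)      # 0
print(car @ car)         # 1
```

Because every pair of distinct words scores 0, the representation carries no information about which words are related.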
Corpus-Based Representation
◉ Neighbor-based representation
o Co-occurrence matrix constructed via neighbors
o Neighbor definition: full document vs. windows
▪ full document: word-document co-occurrence matrix gives general topics → “Latent Semantic Analysis”
▪ windows: context window for each word → captures syntactic (e.g. POS) and semantic information
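For the full-document definition of a neighbor, a word-document matrix counts how often each word occurs in each document. A minimal sketch, assuming three made-up documents and whitespace tokenization (neither comes from the slides):

```python
import numpy as np

# Hypothetical toy documents, tokenized by whitespace.
docs = [
    "the car drives on the road",
    "the motorcycle drives on the road",
    "bananas are yellow",
]
tokens = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokens for word in doc})
index = {word: i for i, word in enumerate(vocab)}

# Rows are words, columns are documents, entries are occurrence counts.
word_doc = np.zeros((len(vocab), len(docs)), dtype=int)
for d, doc in enumerate(tokens):
    for word in doc:
        word_doc[index[word], d] += 1

for word in ["car", "motorcycle", "bananas"]:
    print(word, word_doc[index[word]])
```

Words that appear in the same documents share nonzero columns, which is the topical signal that methods such as Latent Semantic Analysis build on.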
Window-Based Co-occurrence Matrix
◉ Example
o Window length = 1
o Left or right context
o Corpus:
▪ I love AI.
▪ I love deep learning.
▪ I enjoy learning.

Counts    I  love  enjoy  AI  deep  learning
I         0  2     1      0   0     0
love      2  0     0      1   1     0
enjoy     1  0     0      0   0     1
AI        0  1     0      0   0     0
deep      0  1     0      0   0     1
learning  0  0     1      0   1     0

Rows of words that share neighbors have similarity > 0 (e.g. “love” and “enjoy” both co-occur with “I”).
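The counts for this corpus can be reproduced with a few lines of code. This is a sketch under simple assumptions: whitespace tokenization, the trailing period stripped, and cosine similarity as the comparison measure (the slides do not name a specific measure).

```python
import numpy as np

corpus = ["I love AI.", "I love deep learning.", "I enjoy learning."]
vocab = ["I", "love", "enjoy", "AI", "deep", "learning"]
index = {word: i for i, word in enumerate(vocab)}
window = 1  # number of neighbors counted on each side

counts = np.zeros((len(vocab), len(vocab)), dtype=int)
for sentence in corpus:
    tokens = sentence.rstrip(".").split()
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:  # a word is not its own neighbor
                counts[index[word], index[tokens[ctx]]] += 1

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(counts)
# "love" and "enjoy" both follow "I", so their rows overlap.
print(cosine(counts[index["love"]], counts[index["enjoy"]]))
```

Unlike one-hot vectors, two different words can now have a nonzero similarity because their rows share context columns.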
Issues:
▪ matrix size increases with vocabulary
▪ high dimensional
▪ sparsity → poor robustness
Idea: low dimensional word vector
Low-Dimensional Dense Word Vector
◉ Method 1: dimension reduction on the matrix
◉ Singular Value Decomposition (SVD) of co-occurrence matrix X
[Figure: X and its approximation by the truncated SVD]
[Figures: semantic relations and syntactic relations among the resulting word vectors. Rohde et al., “An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence,” 2005.]
Issues:
▪ computationally expensive: O(mn²) when n<m for an n×m matrix
▪ difficult to add new words
Idea: directly learn low-dimensional word vectors
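As an illustration of Method 1, the sketch below applies NumPy's SVD to the window-based co-occurrence matrix from the earlier example and keeps the top k singular directions. The choice k = 2 and the scaling of the word vectors by the singular values are assumptions made for this sketch, not settings from the slides.

```python
import numpy as np

# Window-based co-occurrence matrix for "I love AI.", "I love deep learning.",
# "I enjoy learning." (rows/columns: I, love, enjoy, AI, deep, learning).
X = np.array([
    [0, 2, 1, 0, 0, 0],
    [2, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

# Full decomposition: X = U @ diag(S) @ Vt, singular values sorted descending.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # hypothetical target dimension
word_vectors = U[:, :k] * S[:k]      # one dense k-dimensional row per word
X_approx = word_vectors @ Vt[:k, :]  # rank-k approximation of X

print(word_vectors.shape)
print(np.linalg.norm(X - X_approx))  # Frobenius error of the approximation
```

Adding a new word changes X, so the decomposition has to be recomputed, which is one reason this approach is hard to update incrementally.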
Low-Dimensional Dense Word Vector
◉ Method 2: directly learn low-dimensional word vectors
○ Learning representations by back-propagation. (Rumelhart et al., 1986)
○ A neural probabilistic language model (Bengio et al., 2003)
○ NLP (almost) from Scratch (Collobert & Weston, 2008)
○ Recent and most popular models: word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)
• Also known as “Word Embeddings”
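To make "directly learn" concrete, here is a minimal sketch of a skip-gram-style model trained with a full softmax and plain SGD on the three-sentence corpus used earlier. It is a simplified illustration, not the word2vec or GloVe training procedure; the embedding size, learning rate, epoch count, and random seed are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
corpus = [["I", "love", "AI"],
          ["I", "love", "deep", "learning"],
          ["I", "enjoy", "learning"]]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# (center, context) index pairs with a window of 1 on each side.
pairs = [(index[s[i]], index[s[j]])
         for s in corpus
         for i in range(len(s))
         for j in (i - 1, i + 1)
         if 0 <= j < len(s)]

dim = 2  # hypothetical embedding size
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # word vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def average_loss():
    """Mean negative log-probability of the observed context words."""
    total = 0.0
    for c, o in pairs:
        scores = W_out @ W_in[c]
        scores = scores - scores.max()  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum())
        total -= log_probs[o]
    return total / len(pairs)

initial_loss = average_loss()
lr = 0.1  # hypothetical learning rate
for epoch in range(300):
    for c, o in pairs:
        scores = W_out @ W_in[c]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        grad_scores = probs.copy()
        grad_scores[o] -= 1.0                      # softmax cross-entropy gradient
        grad_in = W_out.T @ grad_scores            # gradient w.r.t. W_in[c]
        grad_out = np.outer(grad_scores, W_in[c])  # gradient w.r.t. W_out
        W_in[c] -= lr * grad_in
        W_out -= lr * grad_out
final_loss = average_loss()

print(initial_loss, final_loss)
print(W_in.shape)  # one dense vector per vocabulary word
```

The rows of `W_in` play the role of the word embeddings. New text can be handled by continuing training on additional pairs, in contrast to recomputing a matrix decomposition.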
Summary
◉ Knowledge-based representation
◉ Corpus-based representation
✓ Atomic symbol
✓ Neighbors
o High-dimensional sparse word vector
o Low-dimensional dense word vector
▪ Method 1 – dimension reduction
▪ Method 2 – direct learning