Word Representations

Academic year: 2022

Applied Deep Learning

February 21st, 2022 http://adl.miulab.tw

(2)

Meaning Representations

Definition of “Meaning”

o the idea that is represented by a word, phrase, etc.

o the idea that a person wants to express by using words, signs, etc.

o the idea that is expressed in a work of writing, art, etc.

(3)

Meaning Representations in Computers

o Knowledge-Based Representation

o Corpus-Based Representation

(5)

Knowledge-Based Representation

Hypernym (is-a) relationships in WordNet

Issues:

▪ newly-invented words

▪ subjective

▪ annotation effort

▪ difficult to compute word similarity
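The is-a idea and two of the issues above can be illustrated with a minimal sketch. The taxonomy below is a hand-written toy, not actual WordNet data, and the names `HYPERNYM`, `hypernym_path`, and `path_distance` are hypothetical helpers introduced only for this example.

```python
# Hand-written toy taxonomy (child -> parent); NOT actual WordNet data.
HYPERNYM = {
    "car": "motor vehicle",
    "motorcycle": "motor vehicle",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "entity",
}


def hypernym_path(word):
    """Follow is-a links from `word` up to the root of the toy taxonomy."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_distance(a, b):
    """Count is-a edges between two words via their lowest shared ancestor."""
    path_a, path_b = hypernym_path(a), hypernym_path(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            return steps_a + path_b.index(node)
    return None  # the two words share no ancestor


print(hypernym_path("car"))                # ['car', 'motor vehicle', 'vehicle', 'entity']
print(path_distance("car", "motorcycle"))  # 2
print(path_distance("car", "bicycle"))     # 3
print(hypernym_path("selfie"))             # ['selfie']: a new word has no entry
```

Similarity here requires a graph traversal rather than a simple vector operation, and a newly invented word stays disconnected until someone annotates it by hand.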

(7)

Corpus-Based Representation

Atomic symbols: one-hot representation

car: [0 0 0 0 0 0 1 0 0 … 0]

motorcycle: [0 0 1 0 0 0 0 0 0 … 0]

car AND motorcycle = 0

Issues: difficult to compute similarity (e.g., comparing “car” and “motorcycle”)

Idea: words with similar meanings often have similar neighbors
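A minimal sketch of why one-hot vectors carry no similarity information: the inner product of any two distinct words is 0. The four-word vocabulary and the helper names `one_hot` and `dot` are assumptions made for illustration; the vectors on the slide are much longer.

```python
def one_hot(word, vocab):
    """Return a one-hot list with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary, chosen only for this example.
vocab = ["bus", "car", "motorcycle", "train"]
car = one_hot("car", vocab)
motorcycle = one_hot("motorcycle", vocab)

print(car)                   # [0, 1, 0, 0]
print(motorcycle)            # [0, 0, 1, 0]
print(dot(car, motorcycle))  # 0: distinct words are always orthogonal
print(dot(car, car))         # 1
```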

(8)

Corpus-Based Representation

Neighbor-based representation

o Co-occurrence matrix constructed via neighbors

o Neighbor definition: full document vs. windows

▪ full document: the word-document co-occurrence matrix gives general topics → “Latent Semantic Analysis”

▪ windows: a context window for each word → captures syntactic (e.g., POS) and semantic information

(9)

Window-Based Co-occurrence Matrix

Example

o Window length = 1

o Left or right context

o Corpus:

I love AI.

I love deep learning.

I enjoy learning.

Counts     I    love  enjoy  AI   deep  learning
I          0    2     1      0    0     0
love       2    0     0      1    1     0
enjoy      1    0     0      0    0     1
AI         0    1     0      0    0     0
deep       0    1     0      0    0     1
learning   0    0     1      0    1     0

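Rows of words that share neighbors now overlap (e.g., “love” and “enjoy” both co-occur with “I”), so their similarity > 0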
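The counts in this example can be reproduced with a short sketch. The function names `cooccurrence` and `cosine`, the whitespace tokenization, and the choice of cosine as the similarity measure are assumptions made for illustration; the slide does not specify them.

```python
import math

corpus = ["I love AI.", "I love deep learning.", "I enjoy learning."]
vocab = ["I", "love", "enjoy", "AI", "deep", "learning"]


def cooccurrence(sentences, vocab, window=1):
    """Count how often each word pair appears within `window` positions."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in vocab]
    for sentence in sentences:
        tokens = sentence.rstrip(".").split()  # naive tokenization (assumption)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # left or right neighbor, never the word itself
                    counts[index[word]][index[tokens[j]]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


counts = cooccurrence(corpus, vocab)
for word, row in zip(vocab, counts):
    print(f"{word:<9}", row)

love, enjoy = counts[vocab.index("love")], counts[vocab.index("enjoy")]
print(cosine(love, enjoy) > 0)  # True: both words co-occur with "I"
```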

Issues:

▪ matrix size increases with vocabulary

▪ high dimensional

▪ sparsity → poor robustness

Idea: low dimensional word vector

(10)

Low-Dimensional Dense Word Vector

Method 1: dimension reduction on the matrix

Singular Value Decomposition (SVD) of co-occurrence matrix X: X = U Σ V^T

Approximate X by keeping only the k largest singular values: X ≈ U_k Σ_k V_k^T
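As a minimal sketch of this dimension reduction, the snippet below applies NumPy's `np.linalg.svd` to the 6×6 matrix from the window-based co-occurrence example. The target dimensionality `k = 2` and the choice to use the rows of U_k Σ_k as word vectors are assumptions made for illustration; other scalings are also used in practice.

```python
import numpy as np

# Co-occurrence matrix X from the window-based example
# (rows/columns: I, love, enjoy, AI, deep, learning).
X = np.array([
    [0, 2, 1, 0, 0, 0],
    [2, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

# Full decomposition: X = U @ diag(S) @ Vt, singular values sorted descending.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # assumed target dimensionality, chosen only for illustration
word_vectors = U[:, :k] * S[:k]      # one dense k-dimensional row per word
X_approx = word_vectors @ Vt[:k, :]  # rank-k approximation of X

print(word_vectors.shape)  # (6, 2)
```

The approximation error in the Frobenius norm equals the square root of the sum of the squared discarded singular values, so a larger `k` trades compactness for fidelity.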

[Figures: semantic relations and syntactic relations in the SVD-reduced vectors. Source: Rohde et al., “An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence,” 2005.]

Issues:

▪ computationally expensive: O(mn²) when n < m for an n×m matrix

▪ difficult to add new words

Idea: directly learn low-dimensional word vectors

(12)

Low-Dimensional Dense Word Vector

Method 2: directly learn low-dimensional word vectors

Learning representations by back-propagation. (Rumelhart et al., 1986)

A neural probabilistic language model (Bengio et al., 2003)

NLP (almost) from Scratch (Collobert & Weston, 2008)

Recent and most popular models: word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)

Also known as “Word Embeddings”

(13)

Summary

Knowledge-based representation

Corpus-based representation

o Atomic symbol

o Neighbors

▪ High-dimensional sparse word vector

▪ Low-dimensional dense word vector: Method 1 – dimension reduction; Method 2 – direct learning
