(1)

(2)

Review 2

(3)

Task-Oriented Dialogue System (Young, 2000)

[Architecture figure] Speech Signal → Speech Recognition → Hypothesis ("are there any action movies to see this weekend"); typed Text Input ("Are there any action movies to see this weekend?") enters the pipeline the same way. Hypothesis → Language Understanding (LU) → Semantic Frame (request_movie, genre=action, date=this weekend) → Dialogue Management (DM) → System Action/Policy (request_location) → Natural Language Generation (NLG) → Text response ("Where are you located?"). The DM consults a Backend Database / Knowledge Providers.

• LU: Domain Identification, User Intent Detection, Slot Filling
• DM: Dialogue State Tracking (DST), Dialogue Policy

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(4)

Task-Oriented Dialogue System (Young, 2000) — same architecture figure as the previous slide.

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(5)

Conventional LU


(6)

Language Understanding (LU)

Pipelined:

1. Domain Classification
2. Intent Classification
3. Slot Filling

(7)

LU – Domain/Intent Classification

As an utterance classification task:

• Given a collection of utterances ui with labels ci, D = {(u1, c1), …, (un, cn)} where ci ∈ C, train a model to estimate labels for new utterances uk.

Example utterance: "find me a cheap taiwanese restaurant in oakland"

Domains and intents:
• Movies: find_movie, buy_tickets
• Restaurants: find_restaurant, find_price, book_table
• Music: find_lyrics, find_singer
• Sports

(8)

Conventional Approach

• Data: dialogue utterances annotated with domains/intents
• Model: machine learning classification model, e.g. support vector machine (SVM)
• Prediction: domains/intents

(9)

Theory: Support Vector Machine

• SVM is a maximum-margin classifier.
• Input data points are mapped into a high-dimensional feature space where the data are linearly separable.
• Support vectors are the input data points that lie on the margin.

http://www.csie.ntu.edu.tw/~htlin/mooc/
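For reference, the textbook hard-margin objective behind this slide (not in the extracted text itself): maximizing the margin is equivalent to

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\|\mathbf{w}\|^2
\quad\text{subject to}\quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,n
```

The support vectors are exactly the points where the constraint is tight, i.e. those lying on the margin.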

(10)

Theory: Support Vector Machine

Multiclass SVM:
• Extended using the one-versus-rest approach: one binary SVM per class (SVM1, SVM2, …, SVMk) produces a score S1, S2, …, Sk for each class.
• The scores are then transformed into probabilities P1, P2, …, Pk.
• The domain/intent can be decided based on the estimated scores.

http://www.csie.ntu.edu.tw/~htlin/mooc/
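A minimal sketch of this pipeline; the training data is illustrative, and the softmax at the end is a simple stand-in for proper probability calibration (e.g. Platt scaling):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

utterances = [
    "find me a cheap taiwanese restaurant in oakland",
    "book a table for two tonight",
    "are there any action movies to see this weekend",
    "buy two tickets for the late show",
    "play some jazz music",
    "who sings this song",
]
domains = ["Restaurants", "Restaurants", "Movies", "Movies", "Music", "Music"]

vec = TfidfVectorizer()
X = vec.fit_transform(utterances)
clf = LinearSVC().fit(X, domains)                 # one-versus-rest by default

# One score S_k per class, then turn scores into probabilities P_k
scores = clf.decision_function(vec.transform(["find a sushi place nearby"]))
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(dict(zip(clf.classes_, probs[0])))          # decide by the max
```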

(11)

LU – Slot Filling

As a sequence tagging task:

• Given a collection of tagged word sequences, S = {((w1,1, w1,2, …, w1,n1), (t1,1, t1,2, …, t1,n1)), ((w2,1, w2,2, …, w2,n2), (t2,1, t2,2, …, t2,n2)), …} where ti ∈ M, the goal is to estimate tags for a new word sequence.

Example:

Word:       flights  from  Boston  to  New        York       today
Entity Tag: O        O     B-city  O   B-city     I-city     O
Slot Tag:   O        O     B-dept  O   B-arrival  I-arrival  B-date

(12)

Conventional Approach

• Data: dialogue utterances annotated with slots
• Model: machine learning tagging model, e.g. conditional random fields (CRF)
• Prediction: slots and their values

(13)

Theory: Conditional Random Fields

• CRF assumes that the label at time step t depends on the label at the previous time step t−1.
• Training maximizes the log probability $\log p(y \mid x)$ with respect to the parameters $\lambda$, where for a linear-chain CRF over an input word sequence $x$ and output tag sequence $y$:

$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t}\sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t)\Big)$$

• Slots can be tagged based on the $y$ that maximizes $p(y \mid x)$.
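A minimal sketch using sklearn-crfsuite, one common CRF implementation; the handcrafted features are illustrative, and the training data is just the slide's single example:

```python
import sklearn_crfsuite

def word_features(sent, i):
    """Simple per-token features: current, previous, and next word."""
    feats = {"bias": 1.0, "word.lower": sent[i].lower()}
    feats["prev.word"] = sent[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.word"] = sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>"
    return feats

sents = [["flights", "from", "Boston", "to", "New", "York", "today"]]
tags  = [["O", "O", "B-dept", "O", "B-arrival", "I-arrival", "B-date"]]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, tags)              # maximizes log p(y|x) w.r.t. the feature weights
print(crf.predict(X)[0])      # Viterbi decoding: the y that maximizes p(y|x)
```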

(14)

Neural Network Based LU


(15)

A Single Neuron

[Figure] Inputs $x_1, x_2, \dots, x_N$ enter with weights $w_1, w_2, \dots, w_N$ and a bias $b$:

$$z = \sum_{i=1}^{N} w_i x_i + b, \qquad y = \sigma(z) = \frac{1}{1 + e^{-z}}$$

The sigmoid $\sigma(z)$ is the activation function; $w$ and $b$ are the parameters of this neuron.
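A minimal NumPy sketch of the neuron defined above; the numbers are illustrative, not from the slides:

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: weighted sum plus bias, squashed by a sigmoid."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))    # sigma(z) = 1 / (1 + e^-z)

x = np.array([0.5, -1.0, 2.0])         # inputs x1..xN (illustrative)
w = np.array([1.0, 0.2, -0.3])         # weights w1..wN (illustrative)
print(neuron(x, w, b=0.1))             # output y in (0, 1)
```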

(16)

A Single Neuron

A single neuron implements a function $f: \mathbb{R}^N \rightarrow \mathbb{R}$. Thresholding its output gives a binary decision, e.g. for recognizing the digit "2":

• $y \ge 0.5$ → is "2"
• $y < 0.5$ → not "2"

A single neuron can only handle binary classification.

(17)

A Layer of Neurons

Handwriting digit classification: $f: \mathbb{R}^N \rightarrow \mathbb{R}^M$

A layer of neurons can handle multiple possible outputs, and the result depends on the max one: each neuron scores one class ("1" or not, "2" or not, "3" or not, …) — 10 neurons for the 10 digit classes — and the prediction is whichever output $y_1, y_2, y_3, \dots$ is the max.

(18)

Deep Neural Networks (DNN)

Fully connected feedforward network: $f: \mathbb{R}^N \rightarrow \mathbb{R}^M$

[Figure] Input vector x = (x1, x2, …, xN) → Layer 1 → Layer 2 → … → Layer L → output vector y = (y1, y2, …, yM).

Deep NN: multiple hidden layers.
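A minimal sketch of the forward pass of such a network; the weights here are random placeholders (a trained network would learn them), and the layer sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """layers: list of (W, b) pairs, applied in order."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)     # each layer: affine transform + nonlinearity
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]               # N=4 inputs, two hidden layers, M=3 outputs
layers = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes, sizes[1:])]
y = forward(rng.standard_normal(4), layers)
print(y, y.argmax())               # predicted class = the max output
```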

(19)

Recurrent Neural Network (RNN)

[Figure: an RNN cell unrolled over time; the recurrent activation is typically tanh or ReLU.]

An RNN can learn accumulated sequential information (time series).

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
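A minimal sketch in the linked tutorial's notation, s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t); all sizes are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """Vanilla RNN: s_t = tanh(U x_t + W s_{t-1}); o_t = softmax(V s_t)."""
    s = np.zeros(W.shape[0])            # initial hidden state
    outputs = []
    for x_t in xs:                      # one iteration per time step
        s = np.tanh(U @ x_t + W @ s)    # the state accumulates the sequence so far
        outputs.append(softmax(V @ s))
    return outputs

# Three time steps of a 4-dim input, hidden size 5, 3 output classes (illustrative)
rng = np.random.default_rng(0)
U, W, V = (rng.standard_normal((5, 4)),
           rng.standard_normal((5, 5)),
           rng.standard_normal((3, 5)))
print(rnn_forward([rng.standard_normal(4) for _ in range(3)], U, W, V))
```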

(20)

Model Training

All model parameters can be updated by SGD: at each time step, the predicted outputs yt−1, yt, yt+1, … are compared against their targets, and the error is back-propagated.

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

(21)

BPTT (Back-Propagation Through Time)

[Figure] Forward pass: compute the hidden states s1, s2, s3, s4, … from inputs x1, x2, x3, x4 and emit outputs o1, o2, o3, o4, giving per-step costs C(1), C(2), C(3), C(4). Backward pass: back-propagate the error of each C(t) through the unrolled network.

The model is trained by comparing the correct sequence tags and the predicted ones.
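In symbols (standard BPTT, matching the slide's per-step costs C(1)…C(4)): the total cost sums the per-step costs, and since the same recurrent weights W are reused at every step, their gradient accumulates over the unrolled steps:

```latex
C = \sum_{t=1}^{T} C^{(t)}, \qquad
\frac{\partial C}{\partial W} = \sum_{t=1}^{T} \frac{\partial C^{(t)}}{\partial W}
```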

(22)

Deep Learning Approach

• Data: dialogue utterances annotated with semantic frames (user intents & slots)
• Model: deep learning model (classification/tagging), e.g. recurrent neural networks (RNN)
• Prediction: user intents, slots and their values

(23)

Classification Model

As an utterance classification task:

• Given a collection of utterances ui with labels ci, D = {(u1, c1), …, (un, cn)} where ci ∈ C, train a model to estimate labels for new utterances uk.

Input: each utterance ui is represented as a feature vector fi
Output: a domain/intent label ci for each input utterance

Key question: how to represent a sentence using a feature vector?

Sequence Tagging Model

As a sequence tagging task:

• Given a collection of tagged word sequences, S = {((w1,1, w1,2, …, w1,n1), (t1,1, t1,2, …, t1,n1)), ((w2,1, w2,2, …, w2,n2), (t2,1, t2,2, …, t2,n2)), …} where ti ∈ M, the goal is to estimate tags for a new word sequence.

Input: each word wi,j is represented as a feature vector fi,j
Output: a slot label ti for each word in the utterance

Key question: how to represent a word using a feature vector?

(25)

Word Representation

Atomic symbols: one-hot representation

car        = [0 0 0 0 0 0 1 0 0 … 0]
motorcycle = [0 0 1 0 0 0 0 0 0 … 0]
car AND motorcycle = 0

Issue: difficult to compute word similarity (e.g. comparing "car" and "motorcycle") — any two distinct one-hot vectors have zero overlap.
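The issue, concretely (toy vocabulary for illustration):

```python
import numpy as np

vocab = {"car": 0, "motorcycle": 1, "weekend": 2}   # toy vocabulary

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

# Distinct one-hot vectors are orthogonal, so their similarity is always 0:
print(one_hot("car") @ one_hot("motorcycle"))       # 0.0
```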

(26)

Word Representation

Neighbor-based: low-dimensional dense word embeddings

Idea: words with similar meanings often have similar neighbors.
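A minimal sketch with gensim's word2vec (one common embedding toolkit; this assumes the gensim 4.x API, and the two-sentence corpus is far too small for meaningful embeddings — it only shows the mechanics):

```python
from gensim.models import Word2Vec

corpus = [
    ["i", "drove", "my", "car", "to", "work"],
    ["i", "rode", "my", "motorcycle", "to", "work"],
]
# "car" and "motorcycle" share neighbors, so their vectors move together.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=200)
print(model.wv.similarity("car", "motorcycle"))   # nonzero, unlike one-hot vectors
```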

(27)

Chinese Input: Unit of Representation

• Character: feed each character to each time step.
  你知道美女與野獸電影的評價如何嗎? ("Do you know how the movie Beauty and the Beast is reviewed?")
• Word: word segmentation required.
  你/知道/美女與野獸/電影/的/評價/如何/嗎

Can the two types of information be fused together for better performance?
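A sketch of the two input units, using jieba as one widely used segmenter (its output on this traditional-Chinese sentence may differ from the slide's segmentation):

```python
import jieba  # a common Chinese word segmenter

utt = "你知道美女與野獸電影的評價如何嗎"
chars = list(utt)          # character units, fed one per time step
words = jieba.lcut(utt)    # word units: segmentation required first
print(chars)
print(words)
```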

(28)

LU – Domain/Intent Classification (recap) — same task definition and movie/restaurant/music example as slide 7.

(29)

Deep Neural Networks for Domain/Intent Classification – I (Sarikaya et al., 2011)

• Deep belief nets (DBN)
• Unsupervised training of weights
• Fine-tuning by back-propagation
• Compared to MaxEnt, SVM, and boosting

http://ieeexplore.ieee.org/abstract/document/5947649/

(30)

Deep Neural Networks for Domain/Intent Classification – II (Tur et al., 2012; Deng et al., 2012)

• Deep convex networks (DCN): simple classifiers are stacked to learn complex functions
• Feature selection of salient n-grams
• Extension to kernel-DCN

http://ieeexplore.ieee.org/abstract/document/6289054/; http://ieeexplore.ieee.org/abstract/document/6424224/

(31)

Deep Neural Networks for Domain/Intent Classification – III (Ravuri and Stolcke, 2015)

• RNNs and LSTMs for utterance classification
• Word hashing to deal with the large number of singletons: each character n-gram is associated with a bit in the input encoding, e.g. Kat → #Ka, Kat, at#

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
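The word-hashing idea from this slide, sketched for the trigram case:

```python
def char_trigrams(word):
    """Word hashing: boundary-marked character trigrams, e.g. Kat -> #Ka, Kat, at#."""
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(char_trigrams("Kat"))   # ['#Ka', 'Kat', 'at#']
# Each distinct trigram gets one bit/index in the input encoding, so unseen
# singleton words still share features with words observed in training.
```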

(32)

LU – Slot Filling (recap) — same task definition and "flights from Boston to New York today" tagging example as slide 11.

(33)

Recurrent Neural Nets for Slot Tagging – I (Yao et al., 2013; Mesnil et al., 2015)

Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams
c. Bi-directional LSTMs

[Figure: (a) LSTM, (b) LSTM-LA, (c) bLSTM — each reads the words w0 … wn and emits tags y0 … yn; the bLSTM combines forward and backward hidden states.]

http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380

(34)

Recurrent Neural Nets for Slot Tagging – II (Kurata et al., 2016; Simonnet et al., 2015)

• Encoder-decoder networks: leverage sentence-level information.
• Attention-based encoder-decoder: uses attention (as in machine translation) in the encoder-decoder network; the attention at time t is estimated using a feed-forward network with inputs ht and st.

[Figure: the encoder reads the word sequence (wn … w0) into hidden states; the decoder emits the tags y0 … yn, in the attention variant weighting the encoder states h0 … hn via a context vector ci.]

http://www.aclweb.org/anthology/D16-1223

(35)

Joint Semantic Frame Parsing

[Figure: one RNN reads "taiwanese food please"; it outputs the slot tags B-type, O, O word by word, and at the EOS step outputs the intent FIND_REST — slot filling and intent prediction share one model.]

• Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction in the same output sequence.
• Parallel (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches.

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
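A minimal PyTorch sketch of the parallel ("two branches") variant under simplifying assumptions: the class name JointLU and all sizes are illustrative, and mean pooling stands in for the papers' final-state/attention utterance summarizers.

```python
import torch
import torch.nn as nn

class JointLU(nn.Module):
    """Shared BiLSTM encoder; one branch tags slots, one predicts the intent."""
    def __init__(self, vocab, n_slots, n_intents, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * dim, n_slots)      # per-token slot tags
        self.intent_head = nn.Linear(2 * dim, n_intents)  # one label per utterance

    def forward(self, tokens):                 # tokens: (batch, seq_len) word ids
        h, _ = self.lstm(self.emb(tokens))     # (batch, seq_len, 2*dim)
        return self.slot_head(h), self.intent_head(h.mean(dim=1))

model = JointLU(vocab=1000, n_slots=10, n_intents=5)
slot_logits, intent_logits = model(torch.randint(0, 1000, (2, 7)))
print(slot_logits.shape, intent_logits.shape)  # (2, 7, 10), (2, 5)
```

Training would sum a per-token slot loss and a per-utterance intent loss, so the shared encoder learns both tasks jointly.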

(36)

Milestone 1 – Language Understanding

3) Collect and annotate data.
4) Use a machine learning method to train your system:
   • Conventional: SVM for domain/intent classification; CRF for slot filling
   • Deep learning: LSTM for domain/intent classification and slot filling
5) Test your system performance.

(37)

Concluding Remarks

[Same task-oriented dialogue system architecture figure as slide 3: Speech Recognition → Language Understanding (LU: Domain Identification, User Intent Detection, Slot Filling) → Dialogue Management (DM: Dialogue State Tracking, Dialogue Policy) → Natural Language Generation (NLG), with a Backend Database / Knowledge Providers behind the DM.]
