(1)

Slides credit from Shawn

(2)

Review

2

(3)

Task-Oriented Dialogue System (Young, 2000)

(Figure) Pipeline: Speech Signal → Speech Recognition → Language Understanding (LU) → Dialogue Management (DM) → Natural Language Generation (NLG) → Text Response, with the DM querying a backend database / knowledge providers.

Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling

Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy

Example
Text input: "Are there any action movies to see this weekend?"
Speech recognition hypothesis: "are there any action movies to see this weekend"
Semantic frame: request_movie (genre=action, date=this weekend)
System action/policy: request_location
Text response: "Where are you located?"

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(4)

Task-Oriented Dialogue System (Young, 2000)

(Same pipeline diagram as the previous slide; here the DM connects to backend actions / knowledge providers.)

(5)

Language Modeling

5

(6)

6

Language Modeling

Goal: estimate the probability of a word sequence

Example task: determine whether a sequence is grammatical or makes more sense

Example: given the two acoustically similar candidates "recognize speech" and "wreck a nice beach", output "recognize speech" if P(recognize speech) > P(wreck a nice beach).

(7)

7

N-Gram Language Modeling

Goal: estimate the probability of a word sequence

N-gram language model

Probability is conditioned on a window of (n-1) previous words

Estimate the probability based on the training data

P(beach | nice) = C(nice beach) / C(nice)

where C(nice beach) is the count of "nice beach" in the training data and C(nice) is the count of "nice" in the training data.

Issue: some sequences may not appear in the training data
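A count-based sketch of the bigram estimate above; the toy corpus and helper name are illustrative assumptions, not from the slides.

```python
from collections import Counter

# Maximum-likelihood bigram estimate: P(beach | nice) = C(nice beach) / C(nice)
corpus = "we walked along the nice beach and the dog ran home".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(word, prev):
    """Bigram probability estimated from training counts."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_bigram("beach", "nice"))   # C(nice beach) / C(nice) = 1 / 1 = 1.0
print(p_bigram("jumped", "nice"))  # unseen bigram -> 0.0 (the issue noted above)
```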

(8)

8

N-Gram Language Modeling

Training data:

The dog ran ……

The cat jumped ……

P(jumped | dog) = 0, P(ran | cat) = 0

Smoothing: give such unseen n-grams some small probability (e.g. 0.0001) instead of zero.

The estimated probability is still not accurate; this happens because we cannot collect all the possible text in the world as training data.
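The slide assigns unseen n-grams a small constant; one standard concrete choice is add-one (Laplace) smoothing, sketched below on a toy corpus (corpus and names are illustrative assumptions).

```python
from collections import Counter

# Add-one (Laplace) smoothing: every bigram gets a small non-zero probability.
corpus = "the dog ran home and the cat jumped over the fence".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_smoothed(word, prev):
    """P(word | prev) with add-one smoothing: never exactly zero."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_smoothed("jumped", "dog"))   # small but non-zero, unlike the unsmoothed estimate
print(p_smoothed("ran", "dog"))      # a seen bigram keeps a higher probability
```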

(9)

9

Neural Language Modeling

Idea: estimate the probability not from counts but from a neural network's prediction.

(Figure) At each step a neural network takes the vector of the current word and outputs the probability of the next word:
vector of "START" → P(next word is "wreck")
vector of "wreck" → P(next word is "a")
vector of "a" → P(next word is "nice")
vector of "nice" → P(next word is "beach")

P("wreck a nice beach") = P(wreck|START) P(a|wreck) P(nice|a) P(beach|nice)

(10)

10

Neural Language Modeling

Bengio et al., "A Neural Probabilistic Language Model," JMLR, 2003.

Issue: fixed context window for conditioning

(Figure) The context word vectors feed a hidden layer, and the output layer gives a probability distribution over the next word.
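A minimal fixed-window feed-forward LM sketch in PyTorch, in the spirit of the model above; it is not the original implementation, and the vocabulary, dimensions, and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Feed-forward neural LM conditioned on a fixed window of previous words."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, context_size=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)            # word vectors
        self.hidden = nn.Linear(context_size * emb_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, vocab_size)           # distribution over next word

    def forward(self, context_ids):                               # (batch, context_size)
        e = self.embed(context_ids).flatten(start_dim=1)          # concatenate context vectors
        h = torch.tanh(self.hidden(e))
        return torch.log_softmax(self.output(h), dim=-1)          # log P(next word | context)

# toy usage: score the next-word distribution after "wreck a" (untrained model)
vocab = {"<s>": 0, "wreck": 1, "a": 2, "nice": 3, "beach": 4}
model = FixedWindowLM(vocab_size=len(vocab))
context = torch.tensor([[vocab["wreck"], vocab["a"]]])
log_probs = model(context)
print(log_probs[0, vocab["nice"]])   # log P("nice" | "wreck a")
```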

(11)

11

Neural Language Modeling

The input-layer (or hidden-layer) vectors of related words are close to each other.

If P(jump|dog) is large, P(jump|cat) increases accordingly, even if "… cat jump …" never appears in the data.

(Figure) In the hidden space (h1, h2), "dog", "cat", and "rabbit" lie close together.

Smoothing is automatically done.

(12)

12

RNNLM

Idea: condition the neural network on all previous words and tie the weights at each time step

Assumption: temporal information matters

Idea: pass the information from the previous hidden layer (the context vector) forward so that every next-word distribution leverages all previous contexts.

(Figure)
vector of "START" → P(next = "wreck")
vector of "wreck" → P(next = "a")
vector of "a" → P(next = "nice")
vector of "nice" → P(next = "beach")
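A minimal RNN LM sketch in PyTorch: the recurrent weights are shared across time steps by construction, and the sequence probability is the product of the per-step next-word probabilities. Vocabulary, dimensions, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """RNN language model: the same recurrent weights are applied at every time step."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)  # weights tied across time
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):                     # (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))        # hidden state carries all previous context
        return torch.log_softmax(self.output(h), dim=-1)

# sequence log-probability via the chain rule, e.g. P("wreck a nice beach" | START)
vocab = {"<s>": 0, "wreck": 1, "a": 2, "nice": 3, "beach": 4}
ids = torch.tensor([[vocab[w] for w in ["<s>", "wreck", "a", "nice", "beach"]]])
model = RNNLM(len(vocab))
log_probs = model(ids[:, :-1])                        # predict each next word
targets = ids[:, 1:]
seq_logp = log_probs.gather(-1, targets.unsqueeze(-1)).sum()
print(seq_logp)                                       # log P(wreck a nice beach), untrained here
```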

(13)

Natural Language Generation

13

Traditional Approaches

(14)

14

Natural Language Generation (NLG)

Mapping dialogue acts into natural language

inform(name=Seven_Days, foodtype=Chinese)
→ "Seven Days is a nice Chinese restaurant"

(15)

15

Template-Based NLG

Define a set of rules to map frames to NL

Pros: simple, error-free, easy to control
Cons: time-consuming, rigid, poor scalability

Semantic Frame → Natural Language
confirm() → "Please tell me more about the product you are looking for."
confirm(area=$V) → "Do you want somewhere in the $V?"
confirm(food=$V) → "Do you want a $V restaurant?"
confirm(food=$V, area=$W) → "Do you want a $V restaurant in the $W?"
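A minimal sketch of this rule-based mapping, using the confirm() templates above; the TEMPLATES table and realize() helper are illustrative, not from any toolkit.

```python
# Template-based NLG: hand-written rules map semantic frames to surface text
# by substituting slot values into a template chosen by (act, slot set).
TEMPLATES = {
    ("confirm", frozenset()): "Please tell me more about the product you are looking for.",
    ("confirm", frozenset({"area"})): "Do you want somewhere in the {area}?",
    ("confirm", frozenset({"food"})): "Do you want a {food} restaurant?",
    ("confirm", frozenset({"food", "area"})): "Do you want a {food} restaurant in the {area}?",
}

def realize(act, slots):
    """Pick the template matching the dialogue act and its slot set, then fill it."""
    template = TEMPLATES[(act, frozenset(slots))]
    return template.format(**slots)

print(realize("confirm", {"food": "Chinese", "area": "centre"}))
# -> "Do you want a Chinese restaurant in the centre?"
```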

(16)

16

Class-Based LM NLG

(Oh and Rudnicky, 2000)

Class-based language modeling

NLG by decoding

Pros: easy to implement/understand, simple rules
Cons: computationally inefficient

Classes: inform_area, inform_address, request_area, request_postcode

http://dl.acm.org/citation.cfm?id=1117568

(17)

17

Phrase-Based NLG

(Mairesse et al., 2010)

(Figure) DBN-based mapping from semantic stacks to phrases to the surface realization:

Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)
→ "Charlie Chan is a Chinese Restaurant near Cineworld in the centre"

Pros: efficient, good performance
Cons: requires semantic alignments

http://dl.acm.org/citation.cfm?id=1858838

(18)

Natural Language Generation

18

Deep Learning Approaches

(19)

19

RNN-Based LM NLG

(Wen et al., 2015)

(Figure) Delexicalisation: "<BOS> Din Tai Fung serves Taiwanese ." → "<BOS> SLOT_NAME serves SLOT_FOOD ."

Dialogue act Inform(name=Din Tai Fung, food=Taiwanese), encoded as a 1-hot representation (0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0, …), conditions the generator.

Input: <BOS> SLOT_NAME serves SLOT_FOOD .
Output: SLOT_NAME serves SLOT_FOOD . <EOS>

Slot weight tying

http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
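A small sketch of the delexicalisation / re-lexicalisation step around the generator; the helper names and slot format are assumptions, not the paper's code.

```python
def delexicalise(text, slots):
    """Replace slot values with placeholder tokens, e.g. 'Din Tai Fung' -> SLOT_NAME."""
    for slot, value in slots.items():
        text = text.replace(value, f"SLOT_{slot.upper()}")
    return text

def relexicalise(text, slots):
    """Put the slot values back into the generated template."""
    for slot, value in slots.items():
        text = text.replace(f"SLOT_{slot.upper()}", value)
    return text

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(template)                       # "SLOT_NAME serves SLOT_FOOD ."
print(relexicalise(template, slots))  # "Din Tai Fung serves Taiwanese ."
```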

(20)

20

Handling Semantic Repetition

Issue: semantic repetition

Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.

Din Tai Fung is a child friendly restaurant, and also allows kids.

Deficiency in either model or decoding (or both)

Mitigation

Post-processing rules (Oh & Rudnicky, 2000)

Gating mechanism (Wen et al., 2015)

Attention (Mei et al., 2016; Wen et al., 2015)

(21)

21

Visualization


(22)

22

Semantic Conditioned LSTM (Wen et al., 2015)

Idea: use a gating mechanism to control the generated semantics (dialogue act/slots).

(Figure) The original LSTM cell (gates i_t, f_t, o_t, inputs x_t and h_{t-1}, cell state C_t) is augmented with a dialogue act (DA) cell: a reading gate r_t updates the DA vector d_t from d_{t-1}, and the DA cell modifies C_t.

Dialogue act: Inform(name=Seven_Days, food=Chinese), encoded as a 1-hot representation (0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …) that initialises the DA vector d_0.

http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
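A hedged sketch of the DA-cell updates, reconstructed following Wen et al. (2015) in the figure's notation (x_t is the input vector, ĉ_t the usual LSTM candidate state, α a scaling constant):

```latex
\begin{aligned}
r_t &= \sigma(W_{xr} x_t + \alpha\, W_{hr} h_{t-1}) && \text{(reading gate)}\\
d_t &= r_t \odot d_{t-1} && \text{(slots are gradually consumed from the DA vector)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t + \tanh(W_{dc}\, d_t) && \text{(DA cell modifies the cell state)}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```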

(23)

23

Attentive Encoder-Decoder for NLG

Slot & value embedding

Attentive meaning representation

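A tiny sketch of what an attentive meaning representation can look like: dot-product attention weights over slot-value embeddings, computed from the decoder state. All tensors and dimensions here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

slot_value_emb = torch.randn(4, 32)   # embeddings of 4 slot-value pairs (illustrative)
decoder_state = torch.randn(32)       # current decoder hidden state (illustrative)

scores = slot_value_emb @ decoder_state             # relevance of each slot-value pair
weights = F.softmax(scores, dim=0)                  # attention weights (sum to 1)
meaning = weights.unsqueeze(1).mul(slot_value_emb).sum(dim=0)  # attentive meaning representation
print(weights, meaning.shape)
```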

(24)

24

Attention Heat Map

(25)

25

Model Comparison

(Figures) BLEU and ERR plotted against the percentage of training data (1%, 10%, 100%) for the hlstm, sclstm, and encdec models.

(26)

26

Structural NLG

(Dušek and Jurčíček, 2016)

Goal: NLG based on the syntax tree

Encode trees as sequences

Seq2Seq model for generation

https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79

(27)

27

Contextual NLG

(Dušek and Jurčíček, 2016)

Goal: adapting to the user's way of speaking, providing context-aware responses

Context encoder

Seq2Seq model

https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203

(28)

28

Decoder Sampling Strategy

Decoding procedure

Greedy search

Beam search

Random search

Example: given the dialogue act Inform(name=Din Tai Fung, food=Taiwanese) (1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0, …), the decoder generates "SLOT_NAME serves SLOT_FOOD . <EOS>" word by word.

(29)

29

Greedy Search

Select the next word with the highest probability


(30)

30

Beam Search

Select the next k-best words and keep a beam with width=k for following decoding


(31)

31

Random Search

Randomly select the next word

Higher diversity

Can follow a probability distribution

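A self-contained sketch contrasting the three decoding strategies above; next_word_probs() is a toy stand-in for a trained generator's next-word distribution, and all names are illustrative assumptions.

```python
import math, random

VOCAB = ["SLOT_NAME", "serves", "SLOT_FOOD", ".", "<EOS>"]

def next_word_probs(prefix):
    """Toy stand-in for P(next word | prefix): peaked on the 'correct' continuation."""
    target = ["SLOT_NAME", "serves", "SLOT_FOOD", ".", "<EOS>"]
    nxt = target[len(prefix)] if len(prefix) < len(target) else "<EOS>"
    return {w: (0.8 if w == nxt else 0.05) for w in VOCAB}

def greedy_decode(max_len=10):
    """Greedy search: always take the highest-probability next word."""
    seq = []
    while len(seq) < max_len:
        word = max(next_word_probs(seq).items(), key=lambda kv: kv[1])[0]
        seq.append(word)
        if word == "<EOS>":
            break
    return seq

def sample_decode(max_len=10):
    """Random search: sample the next word from the predicted distribution (higher diversity)."""
    seq = []
    while len(seq) < max_len:
        probs = next_word_probs(seq)
        word = random.choices(list(probs), weights=probs.values())[0]
        seq.append(word)
        if word == "<EOS>":
            break
    return seq

def beam_decode(k=3, max_len=10):
    """Beam search: keep the k best partial sequences at each step."""
    beams = [([], 0.0)]                                  # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq and seq[-1] == "<EOS>":
                candidates.append((seq, logp))           # finished beams carry over
                continue
            for w, p in next_word_probs(seq).items():
                candidates.append((seq + [w], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]

print(greedy_decode(), sample_decode(), beam_decode())
```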

(32)

Chit-Chat Generation

32

(33)

33

Chit-Chat Bot

Neural conversational model

Non task-oriented


(34)

34

Many-to-Many

Both input and output are sequences → sequence-to-sequence learning

E.g. machine translation ("machine learning" → 機器學習)

(Figure) The encoder reads the input tokens "machine", "learning" and the decoder generates 機 器 學 習.

[Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]
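A minimal sequence-to-sequence sketch in PyTorch (a GRU encoder-decoder without attention); vocabulary sizes, dimensions, and token ids are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder's final state conditions the decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_embed(src_ids))        # summarize the source sequence
        out, _ = self.decoder(self.tgt_embed(tgt_ids), state)   # condition generation on it
        return self.output(out)                                 # logits over the target vocabulary

# e.g. "machine learning" (2 source tokens) -> 機 器 學 習 (4 target tokens)
model = Seq2Seq(src_vocab=100, tgt_vocab=100)
logits = model(torch.tensor([[1, 2]]), torch.tensor([[5, 6, 7, 8]]))
print(logits.shape)   # (1, 4, 100)
```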


(35)

35

A Neural Conversational Model

Seq2Seq


[Vinyals and Le, 2015]

(36)

36

Chit-Chat Bot

TV series (~40,000 sentences), US presidential election debates

(37)

37

Sci-Fi Short Film - SUNSPRING

https://www.youtube.com/watch?v=LY7x2Ihqj37

(38)

38

Concluding Remarks

The three pillars of deep learning for NLG

Distributed representation – generalization

Recurrent connection – long-term dependency

Conditional RNN – flexibility/creativity

Useful techniques in deep learning for NLG

Learnable gates

Attention mechanism

Generating longer/complex sentences

Phrasing dialogue as a conditional generation problem

Conditioning on the raw input sentence → chit-chat bot

Conditioning on both structured and unstructured sources → task-completing dialogue system
