Slides credit from Shawn

Review

Task-Oriented Dialogue System (Young, 2000)

• Input: Speech Signal, or Text Input ("Are there any action movies to see this weekend?")
• Speech Recognition → Hypothesis: "are there any action movies to see this weekend"
• Language Understanding (LU) → Semantic Frame: request_movie (genre=action, date=this weekend)
  • Domain Identification
  • User Intent Detection
  • Slot Filling
• Dialogue Management (DM) → System Action/Policy: request_location
  • Dialogue State Tracking (DST)
  • Dialogue Policy
• Natural Language Generation (NLG) → Text response: "Where are you located?"
• Backend Database / Knowledge Providers

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

Natural Language Generation

Traditional Approaches

Natural Language Generation (NLG)

Mapping dialogue acts into natural language

inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"
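A dialogue act like the one on this slide can be held as an act type plus slot-value pairs. The following is a minimal Python sketch under that assumption. `parse_dialogue_act` is a hypothetical helper, not part of any toolkit named in these slides, and it does not handle values that contain commas or parentheses.

```python
import re


def parse_dialogue_act(text):
    """Split 'act(slot=value, ...)' into (act, {slot: value})."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a dialogue act: {text!r}")
    act, body = match.group(1), match.group(2)
    slots = {}
    # Skip empty pieces so that "confirm()" yields an empty slot dict.
    for pair in filter(None, (p.strip() for p in body.split(","))):
        slot, _, value = pair.partition("=")
        slots[slot.strip()] = value.strip()
    return act, slots


act, slots = parse_dialogue_act("inform(name=Seven_Days, foodtype=Chinese)")
```

The NLG component then has to turn `act` and `slots` into a sentence such as the one shown on the slide.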

Template-Based NLG

Define a set of rules to map frames to NL

Semantic Frame              Natural Language
confirm()                   “Please tell me more about the product you are looking for.”
confirm(area=$V)            “Do you want somewhere in the $V?”
confirm(food=$V)            “Do you want a $V restaurant?”
confirm(food=$V,area=$W)    “Do you want a $V restaurant in the $W?”

Pros: simple, error-free, easy to control
Cons: time-consuming, rigid, poor scalability
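The rule table on this slide can be sketched as a lookup keyed by the act type and the slot names, followed by placeholder substitution. This is only an illustrative sketch: `TEMPLATES` and `realise` are hypothetical names, and the assumption that `$V` and `$W` are filled in slot order is ours.

```python
# Templates taken from the slide; "$V" and "$W" are value placeholders.
TEMPLATES = {
    ("confirm", ()): "Please tell me more about the product you are looking for.",
    ("confirm", ("area",)): "Do you want somewhere in the $V?",
    ("confirm", ("food",)): "Do you want a $V restaurant?",
    ("confirm", ("food", "area")): "Do you want a $V restaurant in the $W?",
}


def realise(act, slots):
    """Look up the template for this frame and fill in the slot values."""
    key = (act, tuple(slots))  # slot names in insertion order
    if key not in TEMPLATES:
        raise KeyError(f"no template for frame {key}")
    sentence = TEMPLATES[key]
    # Assumption: the first slot fills $V, the second fills $W.
    for placeholder, value in zip(("$V", "$W"), slots.values()):
        sentence = sentence.replace(placeholder, value)
    return sentence


print(realise("confirm", {"food": "Chinese", "area": "centre"}))
```

The sketch also shows the rigidity listed under Cons: a frame with an unseen slot combination, or the same slots in a different order, has no template and raises an error.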

Class-Based LM NLG (Oh and Rudnicky, 2000)

• Class-based language modeling
• NLG by decoding

Classes: inform_area, inform_address, …, request_area, request_postcode

Pros: easy to implement/understand, simple rules
Cons: computationally inefficient

http://dl.acm.org/citation.cfm?id=1117568
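To make "class-based language modeling" and "NLG by decoding" concrete, the sketch below trains one bigram model per utterance class and generates by sampling from the model of the requested class. The training sentences are invented for illustration, and the code is a simplified reading of the idea rather than the system of Oh and Rudnicky (2000).

```python
import random
from collections import defaultdict

# Hypothetical delexicalised training utterances, grouped by utterance class.
CORPUS = {
    "inform_area": ["it is in the AREA", "the restaurant is in the AREA"],
    "request_area": ["which area do you want", "what area are you looking for"],
}


def train_bigrams(sentences):
    """Count bigrams, with <s> and </s> marking sentence boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


MODELS = {cls: train_bigrams(sents) for cls, sents in CORPUS.items()}


def generate(cls, rng, max_len=20):
    """Sample words from the class-specific bigram model until </s>."""
    counts, prev, words = MODELS[cls], "<s>", []
    while len(words) < max_len:
        successors = counts[prev]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == "</s>":
            break
        words.append(nxt)
        prev = nxt
    return " ".join(words)


print(generate("inform_area", random.Random(0)))
```

Because sampled sentences can be ungrammatical or miss a slot, a system of this kind typically generates several candidates and filters them, which relates to the inefficiency noted under Cons.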

Phrase-Based NLG (Mairesse et al., 2010)

Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)
→ Semantic DBN → Phrase DBN →
"Charlie Chan is a Chinese Restaurant near Cineworld in the centre"

[Figure: semantic stacks aligned with realization phrases]

Pros: efficient, good performance
Cons: requires semantic alignments

http://dl.acm.org/citation.cfm?id=1858838

Natural Language Generation

Deep Learning Approaches

RNN-Based LM NLG (Wen et al., 2015)

• Dialogue act: Inform(name=Din Tai Fung, food=Taiwanese)
• Dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…
• Delexicalisation: "<BOS> Din Tai Fung serves Taiwanese ." → "<BOS> SLOT_NAME serves SLOT_FOOD ."
• Input: <BOS> SLOT_NAME serves SLOT_FOOD .
• Output: SLOT_NAME serves SLOT_FOOD . <EOS>
• Generation is conditioned on the dialogue act
• Slot weight tying

http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
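Delexicalisation and the dialogue-act vector can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: values are matched by plain string replacement, and `FEATURES` is a made-up feature inventory, not the one used by Wen et al. (2015).

```python
def delexicalise(utterance, slots):
    """Replace each slot value with a SLOT_<NAME> placeholder."""
    for slot, value in slots.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance


def relexicalise(template, slots):
    """Put the slot values back into a generated template."""
    for slot, value in slots.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template


# Hypothetical feature inventory: act types first, then slot names.
FEATURES = ["act=inform", "act=confirm", "slot=name", "slot=food", "slot=area"]


def encode_dialogue_act(act, slots):
    """Binary vector with a 1 for the act type and for every slot present."""
    active = {f"act={act}"} | {f"slot={slot}" for slot in slots}
    return [1 if feature in active else 0 for feature in FEATURES]


slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
vector = encode_dialogue_act("inform", slots)
```

Training on delexicalised text lets the model share statistics across restaurants and cuisines; the surface values are restored with `relexicalise` after generation.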

Handling Semantic Repetition

• Issue: semantic repetition
  - Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.
  - Din Tai Fung is a child friendly restaurant, and also allows kids.
• Deficiency in either model or decoding (or both)
• Mitigation
  - Post-processing rules (Oh & Rudnicky, 2000)
  - Gating mechanism (Wen et al., 2015)
  - Attention (Mei et al., 2016; Wen et al., 2015)
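One simple post-processing check, given here only as an illustration and not as the rule set of Oh & Rudnicky (2000), is to count slot placeholders in a delexicalised candidate and compare them with the slots required by the dialogue act. The function name and the `SLOT_` prefix convention are assumptions.

```python
from collections import Counter


def slot_errors(tokens, required_slots):
    """Return (missing, redundant) slot placeholders for one candidate."""
    produced = Counter(t for t in tokens if t.startswith("SLOT_"))
    missing = [s for s in required_slots if produced[s] == 0]
    # A slot is redundant if it was not requested or appears more than once.
    redundant = sorted(
        s for s, n in produced.items() if s not in required_slots or n > 1
    )
    return missing, redundant


required = ["SLOT_NAME", "SLOT_FOOD"]
repeated = "SLOT_NAME is a great SLOT_FOOD restaurant that serves SLOT_FOOD .".split()
incomplete = "SLOT_NAME serves food .".split()

print(slot_errors(repeated, required))
print(slot_errors(incomplete, required))
```

A check of this kind catches the first example on the slide, where the food value is realised twice. It would not catch a paraphrased repetition such as "child friendly" followed by "allows kids" unless both phrasings are mapped to the same placeholder.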

Visualization

Semantic Conditioned LSTM (Wen et al., 2015)

Idea: use a gating mechanism to control the generated semantics (dialogue act/slots)

• Original LSTM cell
• Dialogue act (DA) cell
• Modify C_t

[Figure: the LSTM cell (gates i_t, f_t, o_t; cell C_t; output h_t) and the DA cell (gate r_t; d_{t-1} → d_t), with each gate fed by x_t and h_{t-1}]

Inform(name=Seven_Days, food=Chinese) → dialogue act 1-hot representation d_0: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …

http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
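The interaction between the LSTM cell and the DA cell can be sketched as a single time step in plain Python. The update rules below follow our reading of Wen et al. (2015) and should be checked against the paper: the gate r_t scales the DA vector (d_t = r_t ⊙ d_{t-1}), and a term tanh(W_dc d_t) is added when computing C_t. Biases and any scaling constants are left out, the weights are random, and all names are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(W_x, x, W_h, h, squash):
    """squash(W_x @ x + W_h @ h), applied element-wise."""
    return [squash(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]


def sc_lstm_step(x, h_prev, c_prev, d_prev, p):
    i = gate(p["Wxi"], x, p["Whi"], h_prev, sigmoid)        # input gate
    f = gate(p["Wxf"], x, p["Whf"], h_prev, sigmoid)        # forget gate
    o = gate(p["Wxo"], x, p["Who"], h_prev, sigmoid)        # output gate
    c_hat = gate(p["Wxc"], x, p["Whc"], h_prev, math.tanh)  # candidate cell
    r = gate(p["Wxr"], x, p["Whr"], h_prev, sigmoid)        # DA cell gate
    d = [r_k * d_k for r_k, d_k in zip(r, d_prev)]          # DA vector can only shrink
    da_term = [math.tanh(a) for a in matvec(p["Wdc"], d)]   # DA contribution to C_t
    c = [f_k * c_k + i_k * g_k + s_k
         for f_k, c_k, i_k, g_k, s_k in zip(f, c_prev, i, c_hat, da_term)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c, d


def init_params(input_size, hidden_size, da_size, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for name in "ifoc":
        p["Wx" + name] = mat(hidden_size, input_size)
        p["Wh" + name] = mat(hidden_size, hidden_size)
    p["Wxr"] = mat(da_size, input_size)
    p["Whr"] = mat(da_size, hidden_size)
    p["Wdc"] = mat(hidden_size, da_size)
    return p


params = init_params(input_size=4, hidden_size=3, da_size=5, rng=random.Random(0))
d0 = [0.0, 0.0, 1.0, 1.0, 0.0]  # made-up dialogue act vector
h1, c1, d1 = sc_lstm_step([1.0, 0.0, 0.0, 0.0], [0.0] * 3, [0.0] * 3, d0, params)
```

Since every entry of r_t lies between 0 and 1, each entry of the DA vector can only decrease over time, which is how the cell keeps track of slots that still need to be expressed.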

Attentive Encoder-Decoder for NLG

• Slot & value embedding
• Attentive meaning representation
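An attentive meaning representation can be sketched as a weighted sum of slot-value embeddings, with weights that depend on the current decoder state. The example uses dot-product scoring for brevity, which is a substitution of ours and not necessarily the scoring function of the models compared in these slides. The embeddings are made-up two-dimensional vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, embeddings):
    """Return attention weights and the attended (context) vector."""
    scores = [sum(q * k for q, k in zip(decoder_state, emb)) for emb in embeddings]
    weights = softmax(scores)
    context = [
        sum(w * emb[j] for w, emb in zip(weights, embeddings))
        for j in range(len(decoder_state))
    ]
    return weights, context


# Hypothetical embeddings for name=Seven_Days and food=Chinese.
slot_value_embeddings = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([2.0, 0.0], slot_value_embeddings)
```

Here the decoder state points toward the first embedding, so the name slot receives most of the attention weight at this step. Plotting such weights for every output word gives a heat map.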

Attention Heat Map

Model Comparison

[Figure: BLEU and ERR plotted against % of data (1, 10, 100) for hlstm, sclstm, and encdec]

Structural NLG (Dušek and Jurčíček, 2016)

• Goal: NLG based on the syntax tree
  - Encode trees as sequences
  - Seq2Seq model for generation

https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79
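One common way to encode a tree as a sequence is a bracketed depth-first traversal, sketched below. The exact encoding used by Dušek and Jurčíček (2016) may differ, and the tree in the example is made up.

```python
def linearise(tree):
    """Flatten a (label, children) tree into a bracketed token sequence."""
    label, children = tree
    tokens = ["(", label]
    for child in children:
        tokens.extend(linearise(child))
    tokens.append(")")
    return tokens


# Hypothetical tree: a verb node with two slot placeholders as children.
tree = ("serves", [("SLOT_NAME", []), ("SLOT_FOOD", [])])
sequence = linearise(tree)
print(" ".join(sequence))
```

Once the tree is a flat token sequence, a Seq2Seq decoder can emit it one token at a time, and the brackets allow the tree to be rebuilt afterwards.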

Contextual NLG (Dušek and Jurčíček, 2016)

• Goal: adapt to the user's way of speaking and provide context-aware responses
  - Context encoder
  - Seq2Seq model

https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203

Decoder Sampling Strategy

• Decoding procedure
  - Greedy search
  - Beam search
  - Random search

Inform(name=Din Tai Fung, food=Taiwanese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… → SLOT_NAME serves SLOT_FOOD . <EOS>

Greedy Search

• Select the next word with the highest probability
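Greedy search can be sketched with a toy next-word table. `MODEL` is a hypothetical stand-in for the decoder: it conditions only on the previous word, whereas a real NLG decoder conditions on the dialogue act and the full history.

```python
# Hypothetical next-word probabilities, keyed by the previous word.
MODEL = {
    "<BOS>": {"SLOT_NAME": 0.9, "it": 0.1},
    "SLOT_NAME": {"serves": 0.6, "offers": 0.4},
    "it": {"serves": 1.0},
    "serves": {"SLOT_FOOD": 1.0},
    "offers": {"SLOT_FOOD": 1.0},
    "SLOT_FOOD": {".": 1.0},
    ".": {"<EOS>": 1.0},
}


def greedy_decode(model, max_len=10):
    """At every step, emit the single most probable next word."""
    token, output = "<BOS>", []
    while len(output) < max_len:
        distribution = model[token]
        token = max(distribution, key=distribution.get)
        if token == "<EOS>":
            break
        output.append(token)
    return output


print(" ".join(greedy_decode(MODEL)))
```

Greedy search is fast and deterministic, but a locally best word can lead to a sentence whose overall probability is lower than that of an alternative.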

Beam Search

• Select the k best next words and keep a beam of width k for the following decoding steps
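The sketch below keeps the k most probable partial sentences at each step. The toy table `MODEL` is hypothetical and chosen so that the most probable first word does not start the most probable sentence. Details such as length normalisation are omitted.

```python
import math

# Hypothetical next-word probabilities, keyed by the previous word.
MODEL = {
    "<BOS>": {"SLOT_NAME": 0.6, "it": 0.4},
    "SLOT_NAME": {"serves": 0.5, "offers": 0.5},
    "it": {"serves": 1.0},
    "serves": {"SLOT_FOOD": 1.0},
    "offers": {"SLOT_FOOD": 1.0},
    "SLOT_FOOD": {"<EOS>": 1.0},
}


def beam_search(model, beam_width, max_len=10):
    """Return the best sentence found with a beam of the given width."""
    beam = [(0.0, ["<BOS>"])]  # (log-probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = [
            (logp + math.log(prob), words + [word])
            for logp, words in beam
            for word, prob in model[words[-1]].items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for candidate in candidates[:beam_width]:
            # Completed sentences leave the beam and wait in `finished`.
            (finished if candidate[1][-1] == "<EOS>" else beam).append(candidate)
        if not beam:
            break
    best = max(finished or beam, key=lambda c: c[0])
    return [w for w in best[1] if w not in ("<BOS>", "<EOS>")]


print(beam_search(MODEL, beam_width=1))
print(beam_search(MODEL, beam_width=2))
```

With `beam_width=1` the procedure behaves like greedy search and commits to "SLOT_NAME" (sentence probability 0.3 in this toy table). With `beam_width=2` the alternative starting with "it" survives and wins with probability 0.4.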

Random Search

• Randomly select the next word
  - Higher diversity
  - Can follow a probability distribution
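Random search that follows a probability distribution can be sketched by drawing each next word in proportion to its probability. As before, `MODEL` is a hypothetical toy table that conditions only on the previous word.

```python
import random

# Hypothetical next-word probabilities, keyed by the previous word.
MODEL = {
    "<BOS>": {"SLOT_NAME": 0.6, "it": 0.4},
    "SLOT_NAME": {"serves": 0.5, "offers": 0.5},
    "it": {"serves": 1.0},
    "serves": {"SLOT_FOOD": 1.0},
    "offers": {"SLOT_FOOD": 1.0},
    "SLOT_FOOD": {"<EOS>": 1.0},
}


def sample_decode(model, rng, max_len=10):
    """Draw each next word at random, weighted by its probability."""
    token, output = "<BOS>", []
    while len(output) < max_len:
        distribution = model[token]
        token = rng.choices(
            list(distribution), weights=list(distribution.values())
        )[0]
        if token == "<EOS>":
            break
        output.append(token)
    return output


rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample_decode(MODEL, rng)))
```

Repeated calls can return different sentences for the same dialogue act, which is the source of the higher diversity noted on the slide.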

Chit-Chat Generation

Chit-Chat Bot

• Neural conversational model
• Non task-oriented

Many-to-Many

• Both input and output are sequences → sequence-to-sequence learning
• E.g. machine translation ("machine learning" → "機器學習")

[Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]

A Neural Conversational Model [Vinyals and Le, 2015]

• Seq2Seq

Chit-Chat Bot

TV series (~40,000 sentences), US presidential election debates

Sci-Fi Short Film - SUNSPRING

https://www.youtube.com/watch?v=LY7x2Ihqj29

Concluding Remarks

• The three pillars of deep learning for NLG
  - Distributed representation: generalization
  - Recurrent connection: long-term dependency
  - Conditional RNN: flexibility/creativity
• Useful techniques in deep learning for NLG
  - Learnable gates
  - Attention mechanism
• Generating longer/complex sentences
• Phrase dialogue as a conditional generation problem
  - Conditioning on raw input sentence → chit-chat bot
  - Conditioning on both structured and unstructured sources → task-completing dialogue system
