Deep Learning for Dialogue Systems
deepdialogue.miulab.tw
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
Evaluation
Recent Trends on Learning Dialogues
Neural Networks
Reinforcement Learning
Brief History of Dialogue Systems
- Early 1990s: keyword spotting (e.g., AT&T). System: "Please say collect, calling card, person, third..."
- Early 2000s: task-specific argument extraction (e.g., Nuance, SpeechWorks). User: "I want to fly from Boston to New York next week." Intent determination (Nuance's Emily™, AT&T HMIHY). User: "Uh... we want to move... we want to change our phone line from this house to another house"
- Multi-modal systems (e.g., Microsoft MiPad, Pocket PC); TV voice search (e.g., Bing on Xbox)
- 2017: virtual personal assistants, rooted in the DARPA CALO Project: Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)
Language Empowering Intelligent Assistant
Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016), Apple HomePod (2017)
Conversational Agents
Chit-Chat
Task-Oriented
Challenges
Variability in natural language
Robustness
Recall/Precision Trade-off
Meaning Representation
Common Sense, World Knowledge
Ability to learn
Transparency
Task-Oriented Dialogue System (Young, 2000)
- Speech Recognition: speech signal → text hypothesis, e.g., "are there any action movies to see this weekend"
- Language Understanding (LU): domain identification, user intent detection, slot filling → semantic frame, e.g., request_movie(genre=action, date=this weekend)
- Dialogue Management (DM): dialogue state tracking (DST) + dialogue policy, with backend action / knowledge providers → system action/policy, e.g., request_location
- Natural Language Generation (NLG): system action → text response, e.g., "Where are you located?"
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
A Single Neuron
z = w_1 x_1 + w_2 x_2 + ... + w_N x_N + b (b: bias)
y = σ(z), with the sigmoid activation function σ(z) = 1 / (1 + e^(-z))
w, b are the parameters of this neuron
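A minimal NumPy sketch of this neuron (illustrative weights, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    z = np.dot(w, x) + b       # weighted sum plus bias
    return sigmoid(z)          # activation squashes z into (0, 1)

# Example: N = 3 inputs with illustrative weights.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.3])
print(neuron(x, w, b=0.2))     # a value in (0, 1)
```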
A Single Neuron as a Binary Classifier
With y = σ(w · x + b):
- y ≥ 0.5 → the input is "2"
- y < 0.5 → the input is not "2"
A single neuron can only handle binary classification.
A Layer of Neurons
Handwriting digit classification: f : R^N → R^M
A layer of neurons can handle multiple possible outputs (e.g., 10 neurons for 10 classes, each scoring "1" or not, "2" or not, "3" or not, ...); the result is the class whose neuron output is the max.
Deep Neural Networks (DNN)
Fully connected feedforward network, f : R^N → R^M
Input vector x = (x_1, ..., x_N) → Layer 1 → Layer 2 → ... → Layer L → output vector y = (y_1, ..., y_M)
Deep NN: multiple hidden layers
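A NumPy sketch of the forward pass (assumed sizes, random weights); the softmax-plus-argmax output also makes concrete the "pick the max" classification from the previous slide:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dnn_forward(x, layers):
    """Fully connected feedforward pass: layers is a list of (W, b)."""
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)            # hidden layers
    W, b = layers[-1]
    return softmax(W @ h + b)          # output distribution over M classes

# Example: N=4 inputs, one hidden layer of 8 units, M=3 outputs.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(3, 8)), np.zeros(3))]
y = dnn_forward(rng.normal(size=4), layers)
print(y, y.argmax())                   # predicted class = max output
```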
Recurrent Neural Network (RNN)
h_t = σ(W h_{t-1} + U x_t), where the activation σ is typically tanh or ReLU, applied at each time step
RNN can learn accumulated sequential information (time series)
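A sketch of one recurrence step (assumed dimensions), showing how the hidden state accumulates the sequence:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrence: the new state mixes the previous state and the input."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)   # tanh (or ReLU) activation

# Unrolling over a sequence accumulates information across time steps.
rng = np.random.default_rng(1)
W_h, W_x, b = rng.normal(size=(5, 5)), rng.normal(size=(5, 3)), np.zeros(5)
h = np.zeros(5)
for x_t in rng.normal(size=(7, 3)):    # a length-7 sequence of 3-dim inputs
    h = rnn_step(h, x_t, W_h, W_x, b)
print(h)                               # final state summarizes the sequence
```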
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Reinforcement Learning
RL is a general-purpose framework for decision making:
- RL is for an agent with the capacity to act
- Each action influences the agent's future state
- Success is measured by a scalar reward signal
- Goal: select actions to maximize future reward
Agent-environment loop: observation → action → reward
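The loop can be sketched as follows; Env and Agent are toy stand-ins (hypothetical interfaces, not from the slides), only the observation → action → reward cycle matters here:

```python
import random

class Env:
    def reset(self):
        self.t = 0
        return "start"
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "good" else -1.0   # scalar reward signal
        return f"state-{self.t}", reward, self.t >= 5

class Agent:
    def act(self, obs):
        return random.choice(["good", "bad"])        # placeholder policy
    def observe(self, reward):
        pass                                         # where learning would go

env, agent = Env(), Agent()
obs, total, done = env.reset(), 0.0, False
while not done:
    action = agent.act(obs)                 # observation -> action
    obs, reward, done = env.step(action)    # action influences future state
    agent.observe(reward)
    total += reward
print(total)                                # goal: maximize future reward
```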
Supervised vs. Reinforcement
- Supervised: learning from a teacher, with the correct output given for each input (e.g., "Hello" → say "Hi"; "Bye bye" → say "Good bye")
- Reinforcement: learning from critics, with only a delayed scalar judgment of the interaction (e.g., a whole dialogue rated "Bad" at the end)
Deep Reinforcement Learning
A DNN is the function mapping observation (input) to action (output); the reward from the environment is used to pick the best function.
Goal: select actions that maximize the expected total reward
Modular Dialogue System
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Task-Oriented Dialogue System (Young, 2000)
Pipeline recap: speech signal → speech recognition → text hypothesis → LU (domain identification, user intent detection, slot filling) → semantic frame → DM (DST + dialogue policy, with backend action / knowledge providers) → system action/policy → NLG → text response.
Semantic Frame Representation
- Requires a domain ontology: early connection to backend
- Contains core content: intent, a set of slots with fillers
Restaurant domain: "find me a cheap taiwanese restaurant in oakland" → find_restaurant(price="cheap", type="taiwanese", location="oakland"); ontology: restaurant {price, type, location}
Movie domain: "show me action movies directed by james cameron" → find_movie(genre="action", director="james cameron"); ontology: movie {year, genre, director}
Language Understanding (LU)
Pipelined: 1. domain classification → 2. intent classification → 3. slot filling
LU – Domain/Intent Classification
As an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), ..., (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Example: "find me a cheap taiwanese restaurant in oakland"
- Domain: Movies, Restaurants, Music, Sports, ...
- Intent: find_movie, buy_tickets; find_restaurant, find_price, book_table; find_lyrics, find_singer; ...
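As a concrete toy instance of this classification task, a bag-of-words baseline with scikit-learn (illustrative data; the neural models on the following slides replace this):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = ["find me a cheap taiwanese restaurant in oakland",
              "book a table for two tonight",
              "show me action movies directed by james cameron",
              "buy tickets for the 7 pm show"]
intents = ["find_restaurant", "book_table", "find_movie", "buy_tickets"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(utterances, intents)                       # train on labeled utterances
print(clf.predict(["find a thai restaurant"]))     # estimate label for new u_k
```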
Domain/Intent Classification (Sarikaya+, 2011)
Deep belief nets (DBN)
Unsupervised training of weights
Fine-tuning by back-propagation
Compared to MaxEnt, SVM, and boosting
http://ieeexplore.ieee.org/abstract/document/5947649/
Domain/Intent Classification (Tur+, 2012)
Deep convex networks (DCN)
Simple classifiers are stacked to learn complex functions
Feature selection of salient n-grams
Extension to kernel-DCN
http://ieeexplore.ieee.org/abstract/document/6289054/; http://ieeexplore.ieee.org/abstract/document/6424224/
Domain/Intent Classification (Ravuri & Stolcke, 2015)
RNN and LSTMs for utterance classification
Intent decision after reading all words performs better
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
Dialogue Act Classification (Lee & Dernoncourt, 2016)
RNN and CNNs for dialogue act classification
LU – Slot Filling
As a sequence tagging task: given a collection of tagged word sequences S = {((w_{1,1}, w_{1,2}, ..., w_{1,n1}), (t_{1,1}, t_{1,2}, ..., t_{1,n1})), ((w_{2,1}, w_{2,2}, ..., w_{2,n2}), (t_{2,1}, t_{2,2}, ..., t_{2,n2})), ...} where t_{i,j} ∈ M, the goal is to estimate tags for a new word sequence.
Example: "flights from Boston to New York today"
- Entity tags: O O B-city O B-city I-city O
- Slot tags: O O B-dept O B-arrival I-arrival B-date
Slot Tagging (Yao+, 2013; Mesnil+, 2015)
Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams
c. Bi-directional LSTMs (forward states h_t^f and backward states h_t^b jointly predict each tag y_t)
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
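A minimal PyTorch sketch of variation (c), a bi-directional LSTM tagger with hypothetical sizes and untrained weights, purely to show the shapes involved:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)   # forward + backward states

    def forward(self, word_ids):                     # (batch, seq_len)
        h, _ = self.lstm(self.emb(word_ids))         # (batch, seq_len, 2*hidden)
        return self.out(h)                           # per-word tag scores

tagger = BiLSTMTagger(vocab_size=1000, num_tags=10)  # e.g., 10 IOB slot tags
scores = tagger(torch.randint(0, 1000, (1, 7)))      # "flights from boston ..."
print(scores.argmax(-1))                             # one tag per word
```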
Slot Tagging (Kurata+, 2016; Simonnet+, 2015)
- Encoder-decoder networks: encode the whole sentence (w_n ... w_0), then decode the tag sequence y_0 ... y_n, leveraging sentence-level information
- Attention-based encoder-decoder: use attention (as in MT) in the encoder-decoder network; the attention over encoder states is estimated by a feed-forward network with inputs h_t and s_t at time t, yielding a context vector c_i
http://www.aclweb.org/anthology/D16-1223
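A NumPy sketch of such additive attention (assumed shapes; W_h, W_s, v stand in for the feed-forward scorer's weights):

```python
import numpy as np

def attention(H, s, W_h, W_s, v):
    """H: (T, d) encoder states; s: (d,) decoder state at the current step."""
    scores = np.array([v @ np.tanh(W_h @ h_t + W_s @ s) for h_t in H])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over time steps
    return weights @ H                  # context vector c_i = sum_t a_t * h_t

rng = np.random.default_rng(2)
H, s = rng.normal(size=(7, 4)), rng.normal(size=4)
W_h, W_s, v = rng.normal(size=(4, 4)), rng.normal(size=(4, 4)), rng.normal(size=4)
print(attention(H, s, W_h, W_s, v))     # context fed to the decoder
```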
Joint Segmentation & Slot Tagging (Zhai+, 2017)
Encoder that segments
Decoder that tags the segments
https://arxiv.org/pdf/1701.04027.pdf
Multi-Task Learning for Slot Tagging (Jaech+, 2016; Tafforeau+, 2016)
Multi-task learning
Goal: exploit data from domains/tasks with a lot of data to improve ones with less data
Lower layers are shared across domains/tasks
Output layer is specific to task
https://arxiv.org/abs/1604.00117; http://www.sensei-conversation.eu/wp-content/uploads/2016/11/favre_is2016b.pdf
Semi-Supervised Slot Tagging (Lan+, 2018)
Idea: language understanding objective can enhance other tasks
Slot tagging model: the BLM (bidirectional language model) exploits unsupervised knowledge; the shared-private framework and adversarial training make the slot tagging model generalize better.
https://speechlab.sjtu.edu.cn/papers/oyl11-lan-icassp18.pdf
LU Evaluation
Metrics
Sub-sentence-level: intent accuracy, intent F1, slot F1
Sentence-level: whole frame accuracy
Joint Semantic Frame Parsing
- Sequence-based (Hakkani-Tür+, 2016): slot filling and intent prediction in the same output sequence, e.g., "taiwanese food please" → B-type O O, with the intent FIND_REST emitted at EOS
- Parallel (Liu & Lane, 2016): intent prediction and slot filling are performed in two branches
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
Slot-Gated Joint SLU (Goo+, 2018)
A BLSTM reads the word sequence x_1 ... x_4; slot attention produces per-word contexts c_i^S and intent attention produces an intent context c^I (intent prediction y^I). A slot gate fuses them:
- Slot gate: g = Σ v · tanh(c_i^S + W · c^I)
- Slot prediction: y_i^S = softmax(W^S (h_i + g · c_i^S) + b^S)
g will be larger if the slot and the intent are better related.
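A NumPy sketch of the slot gate and slot prediction above (assumed dimensions, random weights):

```python
import numpy as np

def slot_gate(c_slot, c_intent, v, W):
    return np.sum(v * np.tanh(c_slot + W @ c_intent))   # g = sum v . tanh(...)

def slot_prediction(h_i, c_slot, c_intent, v, W, W_S, b_S):
    g = slot_gate(c_slot, c_intent, v, W)
    logits = W_S @ (h_i + g * c_slot) + b_S
    e = np.exp(logits - logits.max())
    return e / e.sum()                                   # softmax over slot tags

rng = np.random.default_rng(3)
d, tags = 8, 5
y = slot_prediction(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
                    rng.normal(size=d), rng.normal(size=(d, d)),
                    rng.normal(size=(tags, d)), np.zeros(tags))
print(y)   # a larger g lets more slot context through when intent agrees
```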
Contextual LU
Domain identification → intent prediction → slot filling
U1: "just sent email to bob about fishing this weekend"
- Slot tags: O O O O B-contact_name O B-subject I-subject I-subject
- Domain: communication; intent: send_email
→ send_email(contact_name="bob", subject="fishing this weekend")
In context, the same frame can be filled across turns:
U: "are we going to fish this weekend" (B-message I-message ...) → send_email(message="are we going to fish this weekend")
U: "send email to bob" (B-contact_name on "bob") → send_email(contact_name="bob")
Contextual LU
User utterances are highly ambiguous in isolation:
U: "Book a table for 10 people tonight." (restaurant booking)
S: "Which restaurant would you like to book a table for and for what time?"
U: "Cascal, for 6." → is "6" the #people or the time?
Contextual LU (Bhargava+, 2013; Hori+, 2015)
Leveraging contexts
Used for individual tasks
Seq2Seq model
Words are input one at a time, tags are output at the end of each utterance
Extension: LSTM with speaker role dependent layers
https://www.merl.com/publications/docs/TR2015-134.pdf
End-to-End Memory Networks (Sukhbaatar+, 2015)
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
Memory network: each history utterance is stored as a memory vector m_0 ... m_{n-1}; the current utterance is encoded as u and matched against the memories.
E2E MemNN for Contextual LU (Chen+, 2016)
Idea: additionally incorporate contextual knowledge during slot tagging → track dialogue states in a latent way
1. Sentence encoding: RNN_mem encodes the history utterances {x_i} into memory representations m_i; a contextual sentence encoder (RNN_in) encodes the current utterance c into u
2. Knowledge attention: the inner product of u with each m_i gives the knowledge attention distribution p_i
3. Knowledge encoding: the weighted sum h = Σ_i p_i m_i is mapped (W_kg) into the knowledge encoding representation o, which conditions the RNN tagger producing the slot tagging sequence y
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
Analysis of Attention
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
(Figure: the knowledge attention distribution concentrates on the most relevant history turns, with weights such as 0.69, 0.16, and 0.13.)
Dialogue Encoder Network (Bapna+, 2017)
Past and current turn encodings are input to a feed-forward network
http://aclweb.org/anthology/W17-5514
Role-Based Time-Decay Attention (Su+, 2018)
Sentence-level time-decay attention weights each history utterance u_i by α_{u_i}; role-level time-decay attention aggregates the two speaker roles (tourist and guide) separately with weights α_{r1} and α_{r2} (e.g., u_2, u_4, u_5 vs. u_1, u_3, u_6, with u_7 the current utterance). The history summary Σ α · u is combined with the current utterance w_t ... w_T through dense layers for spoken language understanding.
Time-decay attention function (α_u and α_r): convex, linear, or concave decay α_d.
http://aclweb.org/anthology/N18-1194
Context-Sensitive Time-Decay (Su+, 2018)
Same architecture, but the fixed convex/linear/concave time-decay functions are replaced by learned attention models, so the decay shape is context-sensitive.
Time-decay attention significantly improves the understanding results.
Structural LU (Chen+, 2016)
Prior knowledge as a teacher: a knowledge encoding module stores knowledge-guided substructures {x_i} of the input sentence (e.g., "show me the flights from seattle to san francisco", rooted at ROOT) as memories; a knowledge attention distribution p_i (inner product with the sentence encoding) yields a weighted sum, the knowledge-guided representation, which guides the RNN tagger producing the slot tagging sequence.
http://arxiv.org/abs/1609.03286
Structural LU (Chen+, 2016)
Sentence structural knowledge stored as memory, for the sentence "show me the flights from seattle to san francisco":
- Syntax (dependency tree): substructures rooted at ROOT, e.g., show → me, the flights, from seattle, to (san) francisco
- Semantics (AMR graph): show(you, flight(city: Seattle, city: San Francisco), I)
http://arxiv.org/abs/1609.03286
Structural LU (Chen+, 2016)
Even with less training data, K-SAN pays similar attention to the salient substructures, i.e., the attention is robust to small training sets.
http://arxiv.org/abs/1609.03286
Semantic Frame Representation (recap)
Requires a domain ontology (early connection to backend); contains core content: intent and a set of slots with fillers (find_restaurant(...), find_movie(...) as above).
LU – Learning Semantic Ontology (Chen+, 2013)
Learning key domain concepts from goal-oriented human-human conversations
- Clustering with mutual information and KL divergence (Chotimongkol & Rudnicky, 2002)
- Spectral-clustering-based slot ranking model (Chen et al., 2013):
◼ Use a state-of-the-art frame-semantic parser trained for FrameNet
◼ Adapt the generic output of the parser to the target semantic space
http://www.cs.cmu.edu/~ananlada/ConceptIdentificationICSLP02.pdf; http://ieeexplore.ieee.org/abstract/document/6707716/
LU – Intent Expansion (Chen+, 2016)
Transfer dialogue acts across domains:
- Dialogue acts are similar across multiple domains
- Learn new intents using information from other domains
A CDSSM embeds the training intents with example utterances (e.g., <change_note> "adjust my note", <change_setting> "volume turn down") and generates embeddings for new intents (K+1, K+2, ..., e.g., <change_calendar>); an utterance U is scored against each act representation A_i by cosine similarity, giving P(A_i | U). The dialogue act representations thus generalize to unseen intents, e.g., "postpone my meeting to five pm" → <change_calendar>.
http://ieeexplore.ieee.org/abstract/document/7472838/
LU – Language Extension (Upadhyay+, 2018)
Source language: English (full annotations)
Target language: Hindi (limited annotations)
RT: round trip, FC: from city, TC: to city, DDN: departure day name
http://shyamupa.com/papers/UFTHH18.pdf
LU – Language Extension (Upadhyay+, 2018)
- Train on Target (Lefevre+, 2010): English training data is machine-translated into Hindi; a Hindi tagger is trained and applied to the Hindi test set
- Test on Source (Jabaian+, 2011): the Hindi test set is machine-translated into English and tagged with an English tagger
- Joint Training: a bilingual tagger is trained on a large English training set plus a small Hindi training set and applied directly to the Hindi test set
With joint training, an MT system is not required and both languages can be processed by a single model.
http://shyamupa.com/papers/UFTHH18.pdf
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Task-Oriented Dialogue System (Young, 2000)
Pipeline recap (speech recognition → LU → DM → NLG, as above), now focusing on dialogue management: dialogue state tracking (DST) and dialogue policy, with backend action / knowledge providers.
Elements of Dialogue Management
(Figure from Gašić)
Dialogue state tracking
Dialogue State Tracking (DST)
Dialogue state: a representation of the system's belief of the user's goal(s) at any time during the dialogue
Inputs
Current user utterance
Preceding system response
Results from previous turns
Used for:
Looking up knowledge or making API call(s)
Generating the next system action/response
Dialogue State Tracking (DST)
Maintain a probabilistic distribution over slot values instead of a 1-best prediction, for better robustness to SLU errors or ambiguous input.
S: How can I help you?
U: Book a table at Sumiko for 5
→ belief: # people = 5 (0.5), time = 5 (0.5) ("for 5" is ambiguous)
S: How many people?
U: 3
→ belief: # people = 3 (0.8), time = 5 (0.8)
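A toy sketch of such a distributional update (the blending rule is illustrative, not a published algorithm):

```python
belief = {"#people": {"5": 0.5}, "time": {"5": 0.5}}   # "for 5" is ambiguous

def update(belief, slot, value, confidence):
    dist = belief.setdefault(slot, {})
    # Blend the new SLU evidence with the old distribution.
    for v in dist:
        dist[v] *= (1.0 - confidence)
    dist[value] = dist.get(value, 0.0) + confidence
    return belief

update(belief, "#people", "3", 0.8)   # user answers "3" to "How many people?"
print(belief)   # "#people": 3 is now most likely; the "time" hypothesis is kept
```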
Multi-Domain Dialogue State Tracking
A full representation of the system's belief of the user's goal at any point during the dialogue; used for making API calls.
U: I wanna buy two tickets for tonight at the Shoreline theater.
S: Which movie are you interested in?
U: Inferno.
S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?
U: We'd like to eat dinner before the movie at Cascal, can you check what time I can get a table?
S: Cascal has a table for 2 at 6pm and 7:30pm.
U: OK, let me get the table at 6 and tickets for the 7:30 showing.
Tracked states (with per-value likelihoods, less likely ↔ more likely):
- Movies: movie = Inferno, theater = Century 16 Shoreline, #people = 2, date = 11/15/17, time ∈ {6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm}
- Restaurants: restaurant = Cascal, #people = 2, date = 11/15/17, time ∈ {6:00 pm, 6:30 pm, 7:00 pm}
RNN-CNN DST (Mrkšić+, 2015)
(Figure from Wen et al, 2016)
https://arxiv.org/abs/1506.07190
Neural Belief Tracker (Mrkšić+, 2016)
Candidate slot-value pairs are scored against the system output, the user utterance, and the previous belief state b_{t-1}.
https://arxiv.org/abs/1606.03777
Global-Locally Self-Attentive DST (Zhong+, 2018)
More advanced encoder
Global modules share parameters for all slots
Local modules learn slot-specific feature representations
http://www.aclweb.org/anthology/P18-1135
Dialog State Tracking Challenge (DSTC)
(Williams+, 2013; Henderson+, 2014; Henderson+, 2014; Kim+, 2016; Kim+, 2016)

Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation
DST Evaluation
Metric
Tracked state accuracy with respect to user goal
Recall/precision/F-measure for individual slots
DST – Language Extension (Shi+, 2016)
Training a multichannel CNN for each slot
Chinese character CNN
Chinese word CNN
English word CNN
https://arxiv.org/abs/1701.06247
DST – Task Lineages (Lee & Stent, 2016)
Slot values shared across tasks
Utterances with complex constraints on user goals
Interleaved multiple task discussions
Example: "Connection to Manhattan and find me a Thai restaurant, not Italian" → separate task frames, each with start_time/end_time and (confidence, dialogue act item) entries; task state: Thai restaurant, not Italian
https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=29
DST – Scalability (Rastogi+, 2017)
Focus only on the relevant slots
Better generalization to ASR lattices, visual context, etc.
S> How about 6 pm?
U> I am busy then, book it for 7 pm instead.
https://arxiv.org/pdf/1712.10224.pdf
DST – Handling Unknown Values (Xu & Hu, 2018)
Issue: DST with a fixed value set cannot handle unknown slot values.
Example: <sys> "would you like some Thai food" / <usr> "I prefer Italian one"; an attention distribution over the utterance plus the special values {none, dontcare, other} lets the <food> slot point to "Italian" even when it is not in a predefined list.
http://aclweb.org/anthology/P18-1134
Joint NLU and DST (Gupta+, 2018)
Per turn t, a system act encoder (e.g., greeting; request(movie), request(date)) and an utterance encoder (e.g., "<SOS> Tickets for Avatar tonight <EOS>") feed a two-level dialogue encoder (state d_t, conditioned on d_{t-1}, d_{t-2}). On top sit:
- a user intent classifier (BUY_MOVIE_TICKETS)
- a dialogue act classifier (INFORM)
- a slot tagger (O O O B-movie B-date O)
- a candidate scorer that updates the dialogue state D_t from D_{t-1} (movie: Avatar, date: tonight)
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Elements of Dialogue Management
Dialogue policy optimization
Dialogue Policy Optimization
Dialogue management in an RL framework: the user is the environment, and the dialogue manager is the agent, with language understanding on the observation (O) side and natural language generation on the action (A) side; the agent receives reward R.
Goal: select the best action that maximizes the future reward
Reward for RL ≅ Evaluation for System
◼ Dialogue is a special RL task: humans participate both in the interaction and in rating (evaluating) the dialogue
◼ Fully human-in-the-loop framework
◼ Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
RL for Dialogue Policy Optimization
79
Language understanding
Language (response) generation
Dialogue Policy 𝑎 = 𝜋(𝑠)
Collect rewards (𝑠, 𝑎, 𝑟, 𝑠’)
Optimize 𝑄(𝑠, 𝑎) User input (o)
Response
𝑠
𝑎
Type of Bots State Action Reward
Social ChatBots
Chat history System Response # of turns maximized;Intrinsically motivated reward
InfoBots (interactive Q/A)
User current question + ContextAnswers to current question
Relevance of answer;
# of turns minimized
Task-Completion Bots
User current input + ContextSystem dialogue act w/
slot value (or API calls)
Task success rate;
# of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
80
Dialogue Reinforcement Learning Signal
Typical reward function:
◼ Large reward at completion if successful
◼ -1 per-turn penalty
Typically requires domain knowledge:
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
A user simulator is usually required to train the dialogue system before deployment.
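The typical reward shape in code (the success bonus of 20 is an illustrative choice, not from the slides):

```python
def dialogue_reward(turns, success, success_reward=20, turn_penalty=-1):
    """-1 per turn encourages efficiency; a terminal bonus rewards task success."""
    return turns * turn_penalty + (success_reward if success else 0)

print(dialogue_reward(turns=8, success=True))    # 12
print(dialogue_reward(turns=8, success=False))   # -8
```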
Neural Dialogue Manager (Li+, 2017)
Deep RL for training DM:
- Input: current semantic frame observation, database returned results
- Output: system action
Example: semantic frame request_movie(genre=action, date=this weekend) + backend DB results → DQN-based DM, trained against simulated/paid/real users → system action request_location
http://www.aclweb.org/anthology/I17-1074
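A compact DQN-style sketch for such a DM (PyTorch; the state featurization and action inventory are hypothetical, not the paper's exact configuration):

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_actions))

    def forward(self, state):
        return self.net(state)          # Q(s, a) for every system action

def select_action(qnet, state, epsilon=0.1):
    """Epsilon-greedy over system actions (e.g., request_location, inform, ...)."""
    if random.random() < epsilon:
        return random.randrange(qnet.net[-1].out_features)   # explore
    with torch.no_grad():
        return int(qnet(state).argmax())                     # exploit

qnet = QNetwork(state_dim=30, num_actions=8)
print(select_action(qnet, torch.zeros(30)))
```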
E2E Task-Completion Bot (TC-Bot) (Li+, 2017)
Idea: SL for each component and RL for end-to-end training.
- User model: a user simulator with a user goal, generating utterances w_0 w_1 w_2 ... EOS via NLG
- Neural dialogue system: LU (tagging each w_i with <slot>/O and predicting <intent> at EOS) → DST over states s_1 ... s_n → dialogue policy learning over actions a_1 ... a_k, backed by a database
Example: "Are there any action movies to see this weekend?" → request_location
http://www.aclweb.org/anthology/I17-1074
SL + RL for Sample Efficiency (Su+, 2017)
Issues with RL for DM:
- Slow learning speed
- Cold start
Solutions
Sample-efficient actor-critic
◼ Off-policy learning with experience replay
◼ Better gradient update
Utilizing supervised data
◼ Pretrain the model with SL and then fine-tune with RL
◼ Mix SL and RL data during RL learning
◼ Combine both
https://arxiv.org/pdf/1707.00130.pdf
http://aclweb.org/anthology/W17-5518
Learning to Negotiate (Lewis+, 2017)
Task: multi-issue bargaining
Each agent has its own value function
https://arxiv.org/pdf/1706.05125.pdf
Learning to Negotiate (Lewis+, 2017)
Dialogue rollouts to simulate a future conversation
SL + RL
SL aims to imitate human users’ actions
RL tries to make agents focus on the goal
https://arxiv.org/pdf/1706.05125.pdf
Online Training (Su+, 2015; Su+, 2016)
Policy learning from real users
Infer reward directly from dialogues (Su+, 2015)
User rating (Su+, 2016)
Reward modeling on user binary success rating
Reward model: an embedding function maps the dialogue representation to a success/fail prediction, providing the reinforcement signal; the user is queried for a rating to supervise the model.
http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=437; https://www.aclweb.org/anthology/P/P16/P16-1230.pdf
Interactive RL for DM (Shah+, 2016)
Use a third agent to provide immediate, interactive feedback to the DM, either explicit or implicit.
https://research.google.com/pubs/pub45734.html
Multi-Domain – Hierarchical RL (Peng+, 2017)
Travel planning: a set of tasks that need to be fulfilled collectively!
- Build a DM for cross-subtask constraints (slot constraints)
- Temporally constructed goals, e.g.:
  • hotel_check_in_time > departure_flight_time
  • # flight_tickets = # people checking in the hotel
  • hotel_check_out_time < return_flight_time
https://arxiv.org/abs/1704.03084
Multi-Domain – Hierarchical RL (Peng+, 2017)
The model makes decisions over two levels: meta-controller and controller; the agent learns both policies simultaneously, which mitigates reward sparsity:
- Meta-controller: the policy π_g(g_t, s_t; θ_1) over the optimal sequence of goals to follow
- Controller: the policy π_{a,g}(a_t, g_t, s_t; θ_2) for each sub-goal g_t
https://arxiv.org/abs/1704.03084
Planning – Deep Dyna-Q (Peng+, 2018)
Issues: RL is sample-inefficient, and there is a discrepancy between the simulator and real users.
Idea: learn with real users, with planning.
- Direct reinforcement learning: the policy model learns from real experience of acting with the human user
- World model learning: real experience also trains a world model, initialized by imitation/supervised learning on human conversational data
- Planning: the policy model additionally learns from experience simulated by the world model
https://arxiv.org/abs/1801.06176
Deep Dyna-Q (Su+, 2018)
Idea: add a discriminator to filter out bad experiences (controlled planning): the full pipeline (NLU → state representation → policy learning → NLG) gathers real experience from users, the world model generates simulated experience, and a discriminator trained on real vs. simulated experience keeps only the high-quality simulated experience for planning.
(to appear) EMNLP 2018
Deep Dyna-Q (Su+, 2018)
The world model (a shared layer with task-specific heads) maps the dialogue state s and system action a to the user response o, reward r, and termination signal t. An LSTM discriminator reads the dialogue contexts (o_1, o_2, ..., o_{t-1}) and scores each tuple (s, a, r, s') as high-quality (1) or low-quality (0), separating simulated experience D_s from real experience D_u.
S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, "Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning," (to appear) in Proc. of EMNLP, 2018.
Deep Dyna-Q (Su+, 2018)
The policy learning is more robust and shows improvements in human evaluation.
Dialogue Management Evaluation
Metrics
Turn-level evaluation: system action accuracy
Dialogue-level evaluation: task success rate, reward
RL-Based DM Challenge
SLT 2018 Microsoft Dialogue Challenge:
End-to-End Task-Completion Dialogue Systems
Domain 1: Movie-ticket booking
Domain 2: Restaurant reservation
Domain 3: Taxi ordering
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Task-Oriented Dialogue System (Young, 2000)
Pipeline recap (speech recognition → LU → DM → NLG, as above), now focusing on natural language generation: mapping the system action/policy (e.g., request_location) to the text response (e.g., "Where are you located?").
Natural Language Generation (NLG)
Mapping dialogue acts into natural language
inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"
Template-Based NLG
Define a set of rules to map frames to NL:

Semantic Frame | Natural Language
confirm() | "Please tell me more about the product you are looking for."
confirm(area=$V) | "Do you want somewhere in the $V?"
confirm(food=$V) | "Do you want a $V restaurant?"
confirm(food=$V, area=$W) | "Do you want a $V restaurant in the $W?"

Pros: simple, error-free, easy to control
Cons: time-consuming, unnatural, poor scalability
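A minimal realizer for the templates above (the frames and strings mirror the table; the lookup scheme itself is only an illustration):

```python
TEMPLATES = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def realize(act, **slots):
    # Build a key from the act and the sorted slot names, then fill the template.
    key = f"{act}({','.join(sorted(slots))})" if slots else f"{act}()"
    return TEMPLATES[key].format(**slots)

print(realize("confirm", food="Thai", area="centre"))
# -> "Do you want a Thai restaurant in the centre?"
```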