Evaluate the generated sentences using BLEU-like measures against the reference utterances collected from humans (with the same goal)
Seq2Seq User Simulation (El Asri et al., 2016)
Seq2Seq trained from dialogue data
Input: c_i encodes contextual features, such as the previous system action and consistency between the user goal and machine-provided values
Output: a dialogue act sequence from the user
Extrinsic evaluation for policy
https://arxiv.org/abs/1607.00070
Material: http://deepdialogue.miulab.tw
Seq2Seq User Simulation (Crook and Marin, 2017)
Seq2Seq trained from dialogue data
No labeled data
Trained only on human-to-machine conversations
User Simulator for Dialogue Evaluation Measures
Understanding Ability
• whether constrained values specified by users can be understood by the system
• agreement percentage of system/user understandings over the entire dialogue (averaged over all turns)
Efficiency
• number of dialogue turns
• ratio between the minimum and actual number of dialogue turns (larger is better)
Action Appropriateness
• an explicit confirmation for an uncertain user utterance is an appropriate system action
• providing information based on misunderstood user requirements is not
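The three measure families above can be computed from a logged dialogue roughly as sketched below; the turn record format (`user_slots`, `sys_slots`, `appropriate`) is a hypothetical illustration, not taken from the cited work.

```python
# Sketch: computing the three evaluation-measure families from a logged
# dialogue. Turn record fields are hypothetical, for illustration only.

def understanding_ability(turns):
    """Per-turn agreement between user-specified and system-understood
    slot values, averaged over the dialogue."""
    scores = []
    for t in turns:
        user, system = t["user_slots"], t["sys_slots"]
        if not user:
            continue
        agreed = sum(1 for k, v in user.items() if system.get(k) == v)
        scores.append(agreed / len(user))
    return sum(scores) / len(scores) if scores else 0.0

def efficiency(turns, min_turns):
    """Ratio of the minimum turns needed for the goal to the turns actually
    used (larger is better, capped at 1.0)."""
    return min(1.0, min_turns / len(turns))

def action_appropriateness(turns):
    """Fraction of system actions judged appropriate (e.g., explicitly
    confirming an uncertain user utterance)."""
    return sum(t["appropriate"] for t in turns) / len(turns)

dialogue = [
    {"user_slots": {"food": "korean"}, "sys_slots": {"food": "korean"}, "appropriate": True},
    {"user_slots": {"area": "north"},  "sys_slots": {"area": "south"},  "appropriate": False},
]
print(understanding_ability(dialogue))        # 0.5
print(efficiency(dialogue, min_turns=2))      # 1.0
print(action_appropriateness(dialogue))       # 0.5
```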
How NOT To Evaluate Your Dialogue System (Liu et al., 2016)
How to evaluate the quality of the generated response?
Specifically investigated for chatbots
Crucial for task-oriented dialogues as well
Metrics:
Word overlap metrics, e.g., BLEU, METEOR, ROUGE, etc.
Embedding-based metrics, e.g., comparing the contextual/meaning representations of target and candidate
https://arxiv.org/pdf/1603.08023.pdf
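To make the word-overlap idea concrete, here is a minimal sentence-level BLEU sketch (modified n-gram precision with a brevity penalty, no smoothing); real evaluations should use a standard implementation such as sacreBLEU or NLTK.

```python
# Minimal sentence-level BLEU sketch: modified n-gram precision up to
# max_n, combined by geometric mean, times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, candidate, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # simplification: no smoothing for zero counts
        precisions.append(overlap / total)
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "yeah the film about turing looks great".split()
cand = "the film about turing is great".split()
print(round(sentence_bleu(ref, cand), 3))  # partial overlap -> mid-range score
```

A perfect match scores 1.0 and a fully disjoint candidate scores 0.0; as the cited paper argues, such overlap scores correlate poorly with human judgements for open-ended responses.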
Dialogue Response Evaluation (Lowe et al., 2017)
Towards an Automatic Turing Test
Problems of existing automatic evaluation:
• can be biased
• correlates poorly with human judgements of response quality
• word overlap may be misleading
Solution:
• collect a dataset of accurate human scores for a variety of dialogue responses (e.g., coherent/incoherent, relevant/irrelevant, etc.)
• use this dataset to train an automatic dialogue evaluation model – learn to compare the reference to candidate responses
• use an RNN to predict scores by comparing against human scores
Context of Conversation
Speaker A: Hey, what do you want to do tonight?
Speaker B: Why don’t we go see a movie?
Model Response
Nah, let’s do something active.
Reference Response
Yeah, the film about Turing looks great!
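The learned scorer can be sketched as a pair of bilinear terms comparing the model response against both the conversation context and the reference response, in the spirit of the paper's scoring function; the tiny random vectors and matrices below are placeholders, not trained parameters.

```python
# Sketch of a learned dialogue-response scorer: the model response r_hat is
# compared against the context c and the reference r via learned bilinear
# terms, score = (c^T M r_hat + r^T N r_hat - alpha) / beta. In training,
# M and N would be fit to minimize squared error against human scores.
import random

random.seed(0)
DIM = 4

def randvec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

def randmat():
    return [randvec() for _ in range(DIM)]

def bilinear(x, M, y):
    # x^T M y
    return sum(x[i] * M[i][j] * y[j] for i in range(DIM) for j in range(DIM))

def learned_score(context, reference, response, M, N, alpha=0.0, beta=1.0):
    return (bilinear(context, M, response)
            + bilinear(reference, N, response) - alpha) / beta

c, r, r_hat = randvec(), randvec(), randvec()
M, N = randmat(), randmat()
print(learned_score(c, r, r_hat, M, N))
```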
End-to-End Learning for Dialogues
Multimodality
Dialogue Breadth
Dialogue Depth
Outline
Introduction
Background Knowledge
Neural Network Basics
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management
Dialogue State Tracking (DST)
Dialogue Policy Optimization
Natural Language Generation (NLG)
Evaluation
Recent Trends and Challenges
End-to-End Neural Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
ChitChat Hierarchical Seq2Seq (Serban et al., 2016)
Learns to generate dialogues from offline dialogue data
No state, action, intent, slot, etc.
http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11957
ChitChat Hierarchical Seq2Seq (Serban et al., 2017)
A hierarchical seq2seq model with a Gaussian latent variable for generating dialogues, capturing utterance-level properties (like topic or sentiment)
https://arxiv.org/abs/1605.06069
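The Gaussian latent variable at the heart of such models can be sketched as follows: a context vector is mapped to a mean and log-variance, and a latent z is drawn with the reparameterization trick so gradients can flow through the sampling step. The encoder/decoder networks are stubbed out; only the latent mechanics are shown.

```python
# Sketch of the Gaussian latent variable: z = mu + sigma * eps
# (reparameterization trick), plus the KL regularizer used in the
# variational training objective.
import math, random

random.seed(1)

def latent_sample(mu, log_var):
    """Draw z ~ N(mu, exp(log_var)) via reparameterization."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) per dimension, summed."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

mu, log_var = [0.2, -0.1], [0.0, 0.0]
z = latent_sample(mu, log_var)  # conditions the decoder at each utterance
print(z, kl_to_standard_normal(mu, log_var))  # KL here is 0.025
```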
https://arxiv.org/abs/1702.01932
E2E Joint NLU and DM (Yang et al., 2017)
Errors from DM can be propagated to NLU for regularization + robustness
Results (frame accuracy, %):
Model                DM    NLU
Baseline (CRF+SVMs)  7.7   33.1
Pipeline-BLSTM       12.0  36.4
JointModel           22.8  37.4
Both DM and NLU performance is improved.
https://arxiv.org/abs/1612.00913
E2E Supervised Dialogue System (Wen et al., 2016)
Components: Intent Network, Belief Tracker, Database Operator, Policy Network, Generation Network
[Figure: for the input "Can I have korean", the belief tracker outputs a distribution over food values (Korean 0.7, British 0.2, French 0.1); the database operator issues the MySQL query "select * where food=Korean" and sets a DB pointer over matching restaurants; the generation network emits a delexicalized response such as "<v.name> serves great <v.food> ."]
https://arxiv.org/abs/1604.04562
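A toy version of that flow can be sketched as follows: the belief tracker's most probable value forms the database query, and a matching entity fills the delexicalized `<v.*>` placeholders. The in-memory "database" rows are illustrative stand-ins (the food assignments are assumed, not from the paper).

```python
# Sketch: belief-tracker distribution -> database operator -> delexicalized
# template filling. Database contents are hypothetical examples.

belief_food = {"korean": 0.7, "british": 0.2, "french": 0.1}
database = [
    {"name": "Little Seoul", "food": "korean"},
    {"name": "Curry Prince", "food": "indian"},
]

top_food = max(belief_food, key=belief_food.get)  # most probable slot value
matches = [row for row in database if row["food"] == top_food]

template = "<v.name> serves great <v.food> ."
if matches:
    reply = (template.replace("<v.name>", matches[0]["name"])
                     .replace("<v.food>", top_food))
else:
    reply = "Sorry, I found nothing matching."
print(reply)  # Little Seoul serves great korean .
```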
E2E MemNN for Dialogues (Bordes et al., 2016)
Split dialogue system actions into subtasks
API issuing
API updating
Option displaying
Information informing
https://arxiv.org/abs/1605.07683
E2E RL-Based KB-InfoBot (Dhingra et al., 2017)
Movie=?; Actor=Bill Murray; Release Year=1993
Find me Bill Murray's movie.
I think it came out in 1993.
When was it released?
Groundhog Day is a Bill Murray movie which came out in 1993.
KB-InfoBot User
Entity-Centric Knowledge Base
Idea: differentiable database for propagating the gradients
http://www.aclweb.org/anthology/P/P17/P17-1045.pdf
Movie               Actor          Release Year
Groundhog Day       Bill Murray    1993
Australia           Nicole Kidman  X
Mad Max: Fury Road  X              2015
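The "soft" lookup idea can be sketched as follows: instead of issuing a hard query, the agent keeps a distribution over each slot's value and scores every KB row by the product of slot probabilities, backing off to a small uniform probability for unknown cells (the X entries). The numbers are illustrative, and this scalar version omits the paper's differentiable training details.

```python
# Sketch of a soft KB lookup: belief distributions over slots induce a
# posterior over KB rows; None stands for an unknown (X) cell.

KB = [
    {"movie": "Groundhog Day",      "actor": "bill murray",   "year": "1993"},
    {"movie": "Australia",          "actor": "nicole kidman", "year": None},
    {"movie": "Mad Max: Fury Road", "actor": None,            "year": "2015"},
]
UNKNOWN_PROB = 1.0 / len(KB)  # simple back-off for missing cells

def row_posterior(slot_beliefs):
    """Product of per-slot probabilities, normalized over KB rows."""
    scores = []
    for row in KB:
        p = 1.0
        for slot, belief in slot_beliefs.items():
            cell = row[slot]
            p *= belief.get(cell, 0.0) if cell is not None else UNKNOWN_PROB
        scores.append(p)
    total = sum(scores)
    return [s / total for s in scores] if total else scores

beliefs = {"actor": {"bill murray": 0.8, "nicole kidman": 0.2},
           "year":  {"1993": 0.9, "2015": 0.1}}
posterior = row_posterior(beliefs)
print(posterior)  # mass concentrates on the Groundhog Day row
```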
E2E RL-Based System (Zhao and Eskenazi, 2016)
Joint learning of NLU, DST, and dialogue policy
Deep RL for training: deep Q-network with a deep recurrent network
[Figure: learning curves comparing Baseline RL and Hybrid-RL]
http://www.aclweb.org/anthology/W/W16/W16-36.pdf
E2E LSTM-Based Dialogue Control (Williams and Zweig, 2016)
Idea: an LSTM maps from raw dialogue history directly to a distribution over system actions
Developers can provide software including business rules & programmatic APIs
LSTM can take actions in the real world on behalf of the user
The LSTM can be optimized using supervised learning (SL) or reinforcement learning (RL)
https://arxiv.org/abs/1606.01269
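The action-selection step can be sketched as follows: a recurrent model would produce scores over system actions from the raw dialogue history, and the developer's business rules then mask out disallowed actions before the distribution is renormalized. The network is stubbed with fixed scores, and the action names and rule are hypothetical.

```python
# Sketch: business-rule action masking over (stubbed) LSTM action scores.
import math

ACTIONS = ["ask_phone_number", "place_call", "offer_help", "end_call"]

def masked_action_distribution(scores, allowed):
    """Softmax over scores with disallowed actions zeroed out."""
    exp = [math.exp(s) if a in allowed else 0.0
           for a, s in zip(ACTIONS, scores)]
    total = sum(exp)
    return {a: e / total for a, e in zip(ACTIONS, exp)}

# Hypothetical rule: cannot place a call before a phone number is known.
scores = [1.0, 2.0, 0.5, 0.1]  # stand-in for LSTM outputs
allowed = {"ask_phone_number", "offer_help", "end_call"}
dist = masked_action_distribution(scores, allowed)
print(max(dist, key=dist.get))  # place_call is masked despite its high score
```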
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
Idea: supervised learning for each component and reinforcement learning for end-to-end training of the neural dialogue system
Components: Language Understanding (LU), Dialogue Management (DM), Natural Language Generation (NLG), and a user simulator with user agenda modeling
Text input: Are there any action movies to see this weekend?
Semantic frame: request_movie(genre=action, date=this weekend)
System action/policy: request_location
User dialogue action: inform(location=San Francisco)
[Figure: a BiLSTM LU tagger labels each word (e.g., B-type, O) and predicts the intent at EOS, across turns t-2, t-1, and t]
https://arxiv.org/abs/1703.01008
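The component flow above can be mimicked with a toy pipeline: LU parses the utterance into a semantic frame, the dialogue policy picks the next system action, and NLG realizes it. Each component is a hand-written stub standing in for the trained neural model; the slot names and templates follow the example on the slide.

```python
# Toy LU -> DM -> NLG pipeline with hand-written stubs.

def understand(utterance):
    # Stub LU: keyword spotting instead of a BiLSTM tagger.
    frame = {"intent": "request_movie", "slots": {}}
    if "action" in utterance:
        frame["slots"]["genre"] = "action"
    if "this weekend" in utterance:
        frame["slots"]["date"] = "this weekend"
    return frame

def policy(frame):
    # Stub DM: request the first slot still missing.
    for slot in ("genre", "date", "location"):
        if slot not in frame["slots"]:
            return "request_" + slot
    return "inform_movies"

def generate(action):
    templates = {"request_location": "Which city are you in?",
                 "inform_movies": "Here is what I found."}
    return templates.get(action, "Could you tell me more?")

frame = understand("Are there any action movies to see this weekend?")
action = policy(frame)
print(frame, action, generate(action))  # policy asks for the missing location
```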
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!
The system learns how to interact with users efficiently for task completion:
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
https://arxiv.org/abs/1703.01008
Hierarchical RL for Composite Tasks (Peng et al., 2017)
Travel Planning
Actions
• A set of subtasks that needs to be fulfilled collectively
• Build a dialogue manager that satisfies cross-subtask constraints (slot constraints)
• Temporally constructed goals, e.g.,
• hotel_check_in_time > departure_flight_time
• # flight_tickets = # people checking in the hotel
• hotel_check_out_time < return_flight_time
https://arxiv.org/abs/1704.03084 (Peng et al., EMNLP 2017)
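The cross-subtask constraints listed above can be checked over a candidate composite plan as sketched below; the field names mirror the slide, while the concrete values are made up for illustration.

```python
# Sketch: validating a travel plan against the slide's slot constraints.

def satisfies_constraints(plan):
    return (plan["hotel_check_in_time"] > plan["departure_flight_time"]
            and plan["n_flight_tickets"] == plan["n_hotel_guests"]
            and plan["hotel_check_out_time"] < plan["return_flight_time"])

plan = {
    "departure_flight_time": 9,   # hours of day, same-day trip for simplicity
    "hotel_check_in_time": 15,
    "hotel_check_out_time": 11,
    "return_flight_time": 18,
    "n_flight_tickets": 2,
    "n_hotel_guests": 2,
}
print(satisfies_constraints(plan))  # True

plan["hotel_check_out_time"] = 20   # checkout after the return flight
print(satisfies_constraints(plan))  # False
```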
Hierarchical RL for Composite Tasks (Peng et al., 2017)
The dialogue model makes decisions over two levels: a meta-controller and a controller
The agent learns both policies simultaneously (mitigating reward sparsity issues)
Meta-controller: the policy π_g(g_t, s_t; θ_1) over the optimal sequence of subgoals to follow
Controller: the policy π_{a,g}(a_t, g_t, s_t; θ_2) for each subgoal g_t
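The two-level decision loop can be sketched as follows: the top-level policy picks the next subgoal, and the low-level policy picks primitive actions toward it. Both policies are stubbed as lookup tables with hypothetical subgoal/action names; in the paper they are learned jointly with deep RL.

```python
# Sketch of the hierarchical decision loop with stubbed (tabular) policies.

pi_g = {  # stub meta-controller: state -> subgoal
    "start": "book_flight",
    "flight_done": "book_hotel",
}
pi_ag = {  # stub controller: (subgoal, state) -> primitive action
    ("book_flight", "start"): "request_departure_date",
    ("book_hotel", "flight_done"): "request_hotel_area",
}

def step(state):
    goal = pi_g[state]             # meta-controller chooses a subgoal
    action = pi_ag[(goal, state)]  # controller acts toward that subgoal
    return goal, action

print(step("start"))        # ('book_flight', 'request_departure_date')
print(step("flight_done"))  # ('book_hotel', 'request_hotel_area')
```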
https://arxiv.org/abs/1704.03084 (Peng et al., EMNLP 2017)
Outline
Introduction
Background Knowledge
Neural Network Basics
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management
Dialogue State Tracking (DST)
Dialogue Policy Optimization
Natural Language Generation (NLG)
Recent Trends and Challenges
End-to-End Neural Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Brain Signal for Understanding
Misunderstanding detection from brain signals
Green: listening to the correct answer
Red: listening to the wrong answer
http://dl.acm.org/citation.cfm?id=2388695
Detecting misunderstanding via brain signals in order to correct the understanding results
Video for Intent Understanding