Dialogue Policy Optimization
Dialogue management in a RL framework
[Figure: RL framework for dialogue. The system agent receives observation O from the user, takes action A, and obtains reward R]
Reward for RL ≅ Evaluation for System
Dialogue is a special RL task
A human is involved both in the interaction and in rating (evaluating) the dialogue
Fully human-in-the-loop framework
Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
Material: http://deepdialogue.miulab.tw
Reinforcement Learning for Dialogue Policy Optimization
[Figure: RL-based dialogue loop. Language understanding maps the user input (o) to state s; the dialogue policy a = 𝜋(s) selects system action a; language (response) generation produces the response; reward tuples (s, a, r, s') are collected to optimize Q(s, a)]
Type of Bots | State | Action | Reward
Social ChatBots | Chat history | System response | # of turns maximized; intrinsically motivated reward
InfoBots (interactive Q/A) | User current question + context | Answers to current question | Relevance of answer; # of turns minimized
Task-Completion Bots | User current input + context | System dialogue act w/ slot values (or API calls) | Task success rate; # of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
Dialogue Reinforcement Learning Signal
Typical reward function
A per-turn penalty of -1
Large reward at completion if successful
Typically requires domain knowledge
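The typical reward function above can be sketched as follows; the numeric values are illustrative assumptions, not fixed by the tutorial.

```python
# Sketch of a typical task-completion dialogue reward: a small per-turn
# penalty encourages short dialogues, and a large terminal bonus is
# granted only on task success. Values are hypothetical.

TURN_PENALTY = -1      # assumed per-turn cost
SUCCESS_REWARD = 20    # assumed terminal bonus at successful completion

def turn_reward(is_final: bool, task_success: bool) -> int:
    """Reward emitted after one dialogue turn."""
    if is_final and task_success:
        return SUCCESS_REWARD + TURN_PENALTY
    return TURN_PENALTY

def episode_return(num_turns: int, task_success: bool) -> int:
    """Undiscounted return of a whole dialogue."""
    return sum(turn_reward(t == num_turns - 1, task_success)
               for t in range(num_turns))
```

A successful 5-turn dialogue thus yields 20 - 5 = 15, while any failed dialogue accumulates only penalties, which is the signal the policy learner optimizes.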
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
The user simulator is usually required for dialogue system training before deployment
Neural Dialogue Manager (Li et al., 2017)
Deep Q-network for training DM policy
Input: current semantic frame observation, database returned results
Output: system action
Example: the simulated user sends the semantic frame request_movie(genre=action, date=this weekend); the DQN-based dialogue management (DM), consulting the backend DB, replies with system action/policy request_location.
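As a minimal illustration of learning Q(s, a) from collected (s, a, r, s') tuples, here is a tabular Q-learning sketch. This is not the paper's neural DQN, and the action set is a hypothetical example.

```python
# Tabular Q-learning sketch for a dialogue policy: states are hashed
# observations (e.g., serialized semantic frames), actions are system
# dialogue acts, and Q(s, a) is updated from (s, a, r, s') tuples.
from collections import defaultdict

ACTIONS = ["request_location", "request_date", "inform_result"]  # assumed act set

class TabularDialoguePolicy:
    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.alpha, self.gamma = alpha, gamma

    def act(self, state):
        """Greedy action a = argmax_a Q(s, a)."""
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        """One-step TD update toward r + gamma * max_a' Q(s', a')."""
        target = r + self.gamma * max(self.q[(s_next, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

A DQN replaces the table with a neural network over the semantic-frame observation and DB results, but the update target has the same form.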
https://arxiv.org/abs/1703.01008
SL + RL for Sample Efficiency (Su et al., 2017)
Issues with RL for DM
slow learning speed
cold start
Solutions
Sample-efficient actor-critic
Off-policy learning with experience replay
Better gradient update
Utilizing supervised data
Pretrain the model with SL and then fine-tune with RL
Mix SL and RL data during RL learning
Combine both
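One of the strategies above, mixing SL and RL data during RL learning, can be sketched as a minibatch sampler. The transition format and demonstration fraction are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: draw minibatches that mix expert (supervised) demonstrations
# with transitions from the RL experience-replay buffer, so the policy
# keeps anchoring on good behavior while exploring.
import random

def sample_batch(demo_data, rl_buffer, batch_size=8, demo_fraction=0.25):
    """Return a minibatch mixing demonstrations and RL transitions."""
    n_demo = int(batch_size * demo_fraction)
    n_rl = batch_size - n_demo
    batch = random.sample(demo_data, min(n_demo, len(demo_data)))
    batch += random.sample(rl_buffer, min(n_rl, len(rl_buffer)))
    return batch
```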
https://arxiv.org/pdf/1707.00130.pdf (Su et al., SIGDIAL 2017)
Online Training (Su et al., 2015; Su et al., 2016)
Policy learning from real users
Infer reward directly from dialogues
(Su et al., 2015)
User rating
(Su et al., 2016)
Reward modeling on user binary success rating
[Figure: an embedding function maps the dialogue to a dialogue representation; a reward model classifies success/fail to provide the reinforcement signal, querying user ratings when needed]
http://www.anthology.aclweb.org/W/W15/W15-46.pdf; https://www.aclweb.org/anthology/P/P16/P16-1230.pdf
Interactive RL for DM (Shah et al., 2016)
Use a third agent to provide immediate, interactive feedback to the DM
https://research.google.com/pubs/pub45734.html
Interpreting Interactive Feedback (Shah et al., 2016)
https://research.google.com/pubs/pub45734.html
Dialogue Management Evaluation
Metrics
Turn-level evaluation: system action accuracy
Dialogue-level evaluation: task success rate, reward
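The dialogue-level metrics above can be computed directly from logged dialogues; the record format used here is an illustrative assumption.

```python
# Sketch: compute task success rate and average reward over a set of
# logged dialogues, assuming a turn-penalty + success-bonus reward.

def task_success_rate(dialogues):
    """Fraction of dialogues whose task was completed."""
    return sum(d["success"] for d in dialogues) / len(dialogues)

def average_reward(dialogues, turn_penalty=-1, success_reward=20):
    """Mean dialogue return under the assumed reward scheme."""
    returns = [d["turns"] * turn_penalty + (success_reward if d["success"] else 0)
               for d in dialogues]
    return sum(returns) / len(returns)
```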
Outline
Introduction
Background Knowledge
Neural Network Basics
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management
Dialogue State Tracking (DST)
Dialogue Policy Optimization
Natural Language Generation (NLG)
Evaluation
Recent Trends and Challenges
End-to-End Neural Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Natural Language Generation (NLG)
Mapping semantic frame into natural language
inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"
Template-Based NLG
Define a set of rules to map frames to NL
Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability

Semantic Frame | Natural Language
confirm() | "Please tell me more about the product you are looking for."
confirm(area=$V) | "Do you want somewhere in the $V?"
confirm(food=$V) | "Do you want a $V restaurant?"
confirm(food=$V, area=$W) | "Do you want a $V restaurant in the $W?"
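A template-based realizer for the table above fits in a few lines; the lookup-key scheme (act name plus sorted slot names) is an implementation assumption.

```python
# Sketch of a template-based NLG realizer: a hand-written table maps
# each dialogue act + slot combination to a natural-language pattern,
# and slot values are filled in at generation time.

TEMPLATES = {
    ("confirm",): "Please tell me more about the product you are looking for.",
    ("confirm", "area"): "Do you want somewhere in the {area}?",
    ("confirm", "food"): "Do you want a {food} restaurant?",
    ("confirm", "area", "food"): "Do you want a {food} restaurant in the {area}?",
}

def realize(act, slots):
    """Map a dialogue act and slot values to a sentence via templates."""
    key = (act, *sorted(slots))   # deterministic key: act + sorted slot names
    return TEMPLATES[key].format(**slots)
```

The cons are visible immediately: every act/slot combination needs its own hand-written entry, so the table grows with the domain.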
Plan-Based NLG (Walker et al., 2002)
Divide the problem into a pipeline
Statistical sentence plan generator (Stent et al., 2009)
Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)

Example: Inform(name=Z_House, price=cheap) → "Z House is a cheap restaurant."

Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge

[Pipeline: Sentence Plan Generator → Sentence Plan Reranker → Surface Realizer, operating over a syntactic tree]
Class-Based LM NLG (Oh and Rudnicky, 2000)
Class-based language modeling
NLG by decoding
Pros: easy to implement/understand, simple rules
Cons: computationally inefficient

Classes: inform_area, inform_address, …, request_area, request_postcode
http://dl.acm.org/citation.cfm?id=1117568
Phrase-Based NLG (Mairesse et al., 2010)
[Figure: a dynamic Bayesian network (DBN) maps semantic stacks to phrases]
Example: Inform(name=Charlie_Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre) → "Charlie Chan is a Chinese Restaurant near Cineworld in the centre"

Pros: efficient, good performance
Cons: requires semantic alignments between realization phrases and semantic stacks
http://dl.acm.org/citation.cfm?id=1858838
RNN-Based LM NLG (Wen et al., 2015)
Input: Inform(name=Din_Tai_Fung, food=Taiwanese), encoded as a dialogue-act 1-hot representation (0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0, …) that conditions the RNN.
Delexicalisation: "<BOS> Din Tai Fung serves Taiwanese ." → "<BOS> SLOT_NAME serves SLOT_FOOD ."
Output: conditioned on the dialogue act and using slot weight tying, the RNN generates "SLOT_NAME serves SLOT_FOOD . <EOS>" token by token; slot tokens are then filled back in.
http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
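The delexicalisation step above can be sketched as simple string substitution. This is a toy version; real systems match slot values against the database or ontology rather than raw strings.

```python
# Sketch of delexicalisation for RNN-LM NLG: slot values in training
# sentences are replaced by SLOT_<NAME> placeholders so the model learns
# slot-independent patterns; generated placeholders are filled back in
# from the dialogue act at test time.

def delexicalise(sentence, slots):
    """Replace slot values with SLOT_<NAME> placeholder tokens."""
    for name, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{name.upper()}")
    return sentence

def relexicalise(sentence, slots):
    """Fill placeholder tokens back with the act's slot values."""
    for name, value in slots.items():
        sentence = sentence.replace(f"SLOT_{name.upper()}", value)
    return sentence
```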
Handling Semantic Repetition
Issue: semantic repetition
Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.
Din Tai Fung is a child friendly restaurant, and also allows kids.
Deficiency in either model or decoding (or both)
Mitigation
Post-processing rules (Oh & Rudnicky, 2000)
Gating mechanism (Wen et al., 2015)
Attention (Mei et al., 2016; Wen et al., 2015)
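A post-processing rule in the spirit of Oh & Rudnicky (2000) can be sketched as reranking sampled candidates by their slot-mention counts; this heuristic is an illustration, not the original system.

```python
# Sketch: among candidate realizations, prefer the one that mentions
# each required slot value exactly once, penalizing repetitions and
# omissions equally.

def slot_mention_counts(sentence, slot_values):
    """Count occurrences of each slot value in the sentence."""
    return {v: sentence.count(v) for v in slot_values}

def pick_best(candidates, slot_values):
    """Choose the candidate whose slot mentions are closest to one each."""
    def penalty(s):
        return sum(abs(c - 1) for c in slot_mention_counts(s, slot_values).values())
    return min(candidates, key=penalty)
```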
Semantically Conditioned LSTM (Wen et al., 2015)
Original LSTM cell plus a dialogue act (DA) cell that modifies the cell state Ct
[Figure: an LSTM cell with input, forget, and output gates (it, ft, ot) over xt and ht-1, extended with a reading gate rt; the DA cell carries dt-1 → dt, which feeds into Ct]
Example: Inform(name=Seven_Days, food=Chinese) → dialogue-act 1-hot representation d0 = (0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …)
Idea: using gate mechanism to control the generated semantics (dialogue act/slots)
http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
Structural NLG (Dušek and Jurčíček, 2016)
Goal: NLG based on the syntax tree
Encode trees as sequences
Seq2Seq model for generation
https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79
Contextual NLG (Dušek and Jurčíček, 2016)
Goal: adapting users’ way of speaking, providing context-aware responses
Context encoder
Seq2Seq model
https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203
Controlled Text Generation (Hu et al., 2017)
Idea: NLG based on generative adversarial network (GAN) framework
c: targeted sentence attributes
https://arxiv.org/pdf/1703.00955.pdf
NLG Evaluation
Metrics
Subjective: human judgement
(Stent et al., 2005)
Adequacy: correct meaning
Fluency: linguistic fluency
Readability: fluency in the dialogue context
Variation: multiple realizations for the same concept
Objective: automatic metrics
Word overlap: BLEU (Papineni et al., 2002), METEOR, ROUGE
Word embedding based: vector extrema, greedy matching, embedding average
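The embedding-average metric can be sketched as follows; the toy 2-d word vectors are invented for illustration.

```python
# Sketch of the embedding-average metric: average the word vectors of
# hypothesis and reference, then score with cosine similarity.
import math

EMB = {  # hypothetical 2-d word vectors
    "nice": [1.0, 0.0], "great": [0.9, 0.1],
    "restaurant": [0.0, 1.0], "place": [0.1, 0.9],
}

def avg_embedding(tokens):
    """Mean of the in-vocabulary word vectors."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def embedding_average_score(hyp, ref):
    return cosine(avg_embedding(hyp.split()), avg_embedding(ref.split()))
```

Unlike word overlap, this rewards "nice restaurant" vs. "great place" despite zero shared words, which is exactly the gap such metrics try to close.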
There is a gap between human perception and automatic metrics
Evaluation
Dialogue System Evaluation
Dialogue model evaluation
Crowd sourcing
User simulator
Response generator evaluation
Word overlap metrics
Embedding based metrics
Crowdsourcing for Dialogue System Evaluation (Yang et al., 2012)
http://www-scf.usc.edu/~zhaojuny/docs/SDSchapter_final.pdf
The normalized mean scores of Q2 and Q5 for approved ratings in each category; a higher score maps to a higher level of task success.
User Simulation
Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space
Approach
Rule-based: crafted by experts (Li et al., 2016)
Learning-based (Schatzmann et al., 2006; El Asri et al., 2016; Crook and Marin, 2017)

[Figure: a simulated user, trained from a dialogue corpus, interacts with the dialogue management (DM) module (dialogue state tracking (DST) and dialogue policy) in place of a real user]
Elements of User Simulation
[Figure: the user simulation (user model + reward model) outputs a distribution over user dialogue acts (semantic frames); an error model injects recognition and LU errors before these acts reach the DM (dialogue state tracking and dialogue policy optimization), which consults backend action/knowledge providers; the reward model supplies the reward]
The error model enables the system to maintain robustness.
Rule-Based Simulator for RL Based System (Li et al., 2016)
rule-based simulator + collected data
starts with sets of goals, actions, KB, slot types
publicly available simulation framework
movie-booking domain: ticket booking and movie seeking
provide procedures to add and test own agent
http://arxiv.org/abs/1612.05688
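An agenda-style rule-based simulator in this spirit can be sketched as follows; the act names and goal format are assumptions for illustration, not the framework's actual API.

```python
# Sketch of a rule-based user simulator: the simulated user holds a goal
# (constraints it can inform, requests it still wants answered) and
# reacts to system acts with simple hand-written rules.

class RuleBasedUserSim:
    def __init__(self, goal):
        # goal = {"constraints": {slot: value}, "requests": [slot, ...]}
        self.goal = goal
        self.pending = list(goal["requests"])

    def respond(self, system_act, slot=None):
        """Return the simulated user's dialogue act for one turn."""
        if system_act == "request" and slot in self.goal["constraints"]:
            return ("inform", {slot: self.goal["constraints"][slot]})
        if system_act == "inform" and self.pending:
            return ("request", {self.pending.pop(0): None})
        return ("thanks", {})
```

Starting each episode from a sampled goal (sets of goals, actions, KB entries, slot types) yields endless cheap dialogues for RL exploration.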
Model-Based User Simulators
Bi-gram models (Levin et al., 2000)
Graph-based models (Scheffler and Young, 2000)
Data-driven simulator (Jung et al., 2009)
Neural models (deep encoder-decoder)
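A bi-gram user model estimates P(user act | system act) from a dialogue corpus; the corpus format here is an assumption for illustration.

```python
# Sketch of a bi-gram user model (in the style of Levin et al., 2000):
# the simulated user's next act depends only on the system's last act,
# with probabilities estimated by counting over logged dialogues.
from collections import Counter, defaultdict

def fit_bigram_model(corpus):
    """corpus: list of (system_act, user_act) pairs from logged dialogues.
    Returns {system_act: {user_act: probability}}."""
    counts = defaultdict(Counter)
    for sys_act, user_act in corpus:
        counts[sys_act][user_act] += 1
    return {s: {u: c / sum(ctr.values()) for u, c in ctr.items()}
            for s, ctr in counts.items()}
```

Sampling a user act from the fitted distribution gives a cheap, if context-blind, simulated user; the graph-based and neural models above add the missing dialogue history.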
Data-Driven Simulator (Jung et al., 2009)
Three-step process
1) User intention simulator
[Figure: from the current discourse status and the user's semantic frame at turn t-1, the simulator (*) computes all possible semantic frames for turn t given the previous turn's information and discourse features (DD+DI), then (*) randomly selects one possible semantic frame, e.g., request+search_loc]
Data-Driven Simulator (Jung et al., 2009)
Three-step process
1) User intention simulator
2) User utterance simulator
Example: given the intention request+search_loc and a list of POS tags associated with the semantic frame, the utterance simulator generates the user utterance using a language model plus rules.