

Dialogue Policy Optimization

Dialogue management in an RL framework


[Figure: RL loop — the system receives observation O from the user, takes action A, and obtains reward R]
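To make the loop concrete, here is a minimal Python sketch, assuming a hypothetical Gym-style user environment `user_env` and a `policy` callable; both names are illustrative, not part of any specific toolkit.

```python
# Minimal sketch of the observation/action/reward loop above, assuming a
# hypothetical Gym-style `user_env` and a `policy` callable (both are
# illustrative names, not a real library API).
def run_episode(user_env, policy, max_turns=20):
    observation = user_env.reset()        # initial user turn (O)
    total_reward = 0.0
    for _ in range(max_turns):
        action = policy(observation)      # system action (A)
        observation, reward, done = user_env.step(action)
        total_reward += reward            # reward (R) from the user
        if done:
            break
    return total_reward
```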


Reward for RL ≅ Evaluation for System

Dialogue is a special RL task

A human is involved in both the interaction and the rating (evaluation) of a dialogue

Fully human-in-the-loop framework

Rating: correctness, appropriateness, and adequacy

- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost


Material: http://deepdialogue.miulab.tw


Reinforcement Learning for Dialogue Policy Optimization


[Figure: modular RL pipeline — language understanding maps user input o to state s; the dialogue policy a = π(s) chooses action a; language (response) generation produces the system response; the agent collects rewards (s, a, r, s′) and optimizes Q(s, a)]
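A minimal tabular sketch of the "collect (s, a, r, s′), optimize Q(s, a)" loop; in deep RL the table is replaced by a neural network, and the hashable state/action encodings here are an assumption for illustration.

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # tabular Q(s, a); a neural network in deep RL

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # One-step TD update from a collected transition (s, a, r, s').
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def select_action(s, actions, epsilon=0.1):
    # Epsilon-greedy policy a = π(s) over the learned Q-values.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```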

Type of Bots | State | Action | Reward
Social ChatBots | Chat history | System response | # of turns maximized; intrinsically motivated reward
InfoBots (interactive Q/A) | User current question + context | Answers to current question | Relevance of answer; # of turns minimized
Task-Completion Bots | User current input + context | System dialogue act w/ slot values (or API calls) | Task success rate; # of turns minimized

Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories


Dialogue Reinforcement Learning Signal

Typical reward function

-1 per-turn penalty

Large reward at completion if successful

Typically requires domain knowledge
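A minimal sketch of this reward shape; the magnitudes (here ±20) are illustrative and, as noted, typically tuned with domain knowledge.

```python
# Typical task-oriented reward: -1 per turn, large terminal bonus/penalty.
# The ±20 magnitude is an illustrative choice, not a fixed standard.
def turn_reward(done, success, success_reward=20, turn_penalty=-1):
    if not done:
        return turn_penalty       # per-turn penalty keeps dialogues short
    return success_reward if success else -success_reward
```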

✔ Simulated user

✔ Paid users (Amazon Mechanical Turk)

✖ Real users


A user simulator is usually required for dialogue system training before deployment


Neural Dialogue Manager (Li et al., 2017)

Deep Q-network (DQN) for training the DM policy

Input: current semantic frame observation and database-returned results

Output: system action

[Figure: DQN-based dialogue management (DM) — the simulated user issues a semantic frame, e.g. request_movie(genre=action, date=this weekend); the DM consults the backend DB and outputs a system action/policy, e.g. request_location]

https://arxiv.org/abs/1703.01008
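A minimal PyTorch sketch of a DQN policy network in this spirit; the state featurization, layer sizes, and action inventory are assumptions for illustration, not the exact setup of Li et al. (2017).

```python
import torch
import torch.nn as nn

class DialogueDQN(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=80):
        super().__init__()
        # State = features of the semantic frame + DB query results.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),   # Q(s, a) per system action
        )

    def forward(self, state):
        return self.net(state)

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy choice among system actions (e.g. request_location).
        if torch.rand(1).item() < epsilon:
            return torch.randint(0, self.net[-1].out_features, (1,)).item()
        with torch.no_grad():
            return self.forward(state).argmax().item()
```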


SL + RL for Sample Efficiency (Su et al., 2017)

Issues with RL for DM

slow learning speed

cold start

Solutions

Sample-efficient actor-critic

Off-policy learning with experience replay

Better gradient update

Utilizing supervised data

Pretrain the model with SL and then fine-tune with RL

Mix SL and RL data during RL learning

Combine both

https://arxiv.org/pdf/1707.00130.pdf (Su et al., SIGDIAL 2017)
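A minimal sketch of the "pretrain with SL, then fine-tune with RL" recipe, assuming a hypothetical `policy_net` classifier over system actions and a corpus of (state, expert_action) tensors; the optimizer and schedule are illustrative.

```python
import torch
import torch.nn as nn

def sl_pretrain(policy_net, corpus, epochs=5, lr=1e-3):
    # Supervised warm start: imitate expert actions with cross-entropy,
    # avoiding RL's cold start; RL fine-tuning continues from these weights.
    opt = torch.optim.Adam(policy_net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for state, expert_action in corpus:
            loss = loss_fn(policy_net(state).unsqueeze(0),
                           expert_action.view(1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy_net
```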


Online Training (Su et al., 2015; Su et al., 2016)

Policy learning from real users

- Infer reward directly from dialogues (Su et al., 2015)
- User rating (Su et al., 2016): reward modeling on the user's binary success rating

[Figure: reward model — an embedding function maps the dialogue representation to a success/fail prediction that serves as the reinforcement signal; uncertain cases query the user for a rating]

http://www.anthology.aclweb.org/W/W15/W15-46.pdf; https://www.aclweb.org/anthology/P/P16/P16-1230.pdf
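A minimal sketch of a learned reward model over a dialogue embedding; Su et al. (2016) use a Gaussian-process classifier on a recurrent dialogue representation, so this feed-forward stand-in only illustrates the interface.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    # Predicts P(success) from a fixed-size dialogue representation; the
    # prediction replaces explicit user ratings as the RL signal.
    def __init__(self, dialogue_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dialogue_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, dialogue_repr):
        return torch.sigmoid(self.net(dialogue_repr))
```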


Interactive RL for DM (Shah et al., 2016)


Use a third agent to provide interactive, immediate feedback to the DM

https://research.google.com/pubs/pub45734.html


Interpreting Interactive Feedback (Shah et al., 2016)

https://research.google.com/pubs/pub45734.html


Dialogue Management Evaluation

Metrics

Turn-level evaluation: system action accuracy

Dialogue-level evaluation: task success rate, reward
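Both levels reduce to simple counts over logged dialogues; a minimal sketch, assuming per-turn predicted/reference actions and a per-dialogue success flag:

```python
def action_accuracy(predicted, reference):
    # Turn-level: fraction of system actions that match the reference.
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)

def task_success_rate(dialogues):
    # Dialogue-level: fraction of dialogues that achieved the user goal.
    return sum(d["success"] for d in dialogues) / len(dialogues)
```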


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


Natural Language Generation (NLG)

Mapping a semantic frame into natural language

inform(name=Seven_Days, foodtype=Chinese) → “Seven Days is a nice Chinese restaurant”


Template-Based NLG

Define a set of rules to map frames to NL


Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability

Semantic Frame | Natural Language
confirm() | “Please tell me more about the product you are looking for.”
confirm(area=$V) | “Do you want somewhere in the $V?”
confirm(food=$V) | “Do you want a $V restaurant?”
confirm(food=$V, area=$W) | “Do you want a $V restaurant in the $W?”
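A minimal sketch of the rule table above as code; the lookup key scheme (act name plus sorted slot names) is an illustrative convention.

```python
TEMPLATES = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def generate(act, slots):
    # Look up the rule for this act/slot combination and fill in values;
    # every new combination needs a hand-written rule (poor scalability).
    key = "{}({})".format(act, ",".join(sorted(slots)))
    return TEMPLATES[key].format(**slots)

print(generate("confirm", {"food": "Chinese", "area": "centre"}))
# -> Do you want a Chinese restaurant in the centre?
```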


Plan-Based NLG (Walker et al., 2002)

Divide the problem into a pipeline:
- Statistical sentence plan generator (Stent et al., 2009)
- Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)

Inform(name=Z_House, price=cheap) → “Z House is a cheap restaurant.”

Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge

[Figure: Sentence Plan Generator → Sentence Plan Reranker → Surface Realizer, passing a syntactic tree between stages]


Class-Based LM NLG (Oh and Rudnicky, 2000)

Class-based language modeling

NLG by decoding


Pros: easy to implement/understand, simple rules
Cons: computationally inefficient

Classes: inform_area, inform_address, request_area, request_postcode

http://dl.acm.org/citation.cfm?id=1117568
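A minimal sketch of the idea: one simple language model per utterance class, with generation by sampled decoding. The bigram model and toy training data are illustrative; Oh and Rudnicky additionally oversample candidates and rerank them.

```python
import random
from collections import defaultdict

# One bigram LM per utterance class (e.g. inform_area); NLG = decoding.
class ClassLM:
    def __init__(self):
        self.bigrams = defaultdict(list)

    def train(self, utterances):
        for utt in utterances:
            tokens = ["<BOS>"] + utt.split() + ["<EOS>"]
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[prev].append(word)

    def generate(self, max_len=20):
        # Sample word-by-word until <EOS>; reranking is omitted for brevity.
        word, out = "<BOS>", []
        for _ in range(max_len):
            word = random.choice(self.bigrams[word])
            if word == "<EOS>":
                break
            out.append(word)
        return " ".join(out)

lm = ClassLM()
lm.train(["the area is SLOT_AREA", "it is in the SLOT_AREA area"])
print(lm.generate())
```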


Phrase-Based NLG (Mairesse et al., 2010)

[Figure: a semantic DBN over semantic stacks paired with a phrase DBN over realization phrases generates “Charlie Chan is a Chinese restaurant near Cineworld in the centre” from Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)]

Pros: efficient, good performance
Cons: requires semantic alignments

http://dl.acm.org/citation.cfm?id=1858838


RNN-Based LM NLG (Wen et al., 2015)

[Figure: RNN LM conditioned on the dialogue act 1-hot representation of Inform(name=Din Tai Fung, food=Taiwanese) — the input “<BOS> Din Tai Fung serves Taiwanese .” is delexicalised to “<BOS> SLOT_NAME serves SLOT_FOOD .”, the output is “SLOT_NAME serves SLOT_FOOD . <EOS>”, and slot weights are tied]

http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
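A minimal sketch of the delexicalisation step around such an LM; the naive string replacement stands in for the real preprocessing.

```python
def delexicalise(utterance, slots):
    # Replace slot values with placeholder tokens so the LM generalizes
    # across restaurant names, cuisines, etc.
    for name, value in slots.items():
        utterance = utterance.replace(value, "SLOT_" + name.upper())
    return utterance

def relexicalise(template, slots):
    # Fill the generated placeholders back in with the actual slot values.
    for name, value in slots.items():
        template = template.replace("SLOT_" + name.upper(), value)
    return template

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
delexed = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(delexed)                        # SLOT_NAME serves SLOT_FOOD .
print(relexicalise(delexed, slots))   # Din Tai Fung serves Taiwanese .
```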


Handling Semantic Repetition

Issue: semantic repetition

Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.

Din Tai Fung is a child friendly restaurant, and also allows kids.

Deficiency in either model or decoding (or both)

Mitigation

- Post-processing rules (Oh & Rudnicky, 2000)
- Gating mechanism (Wen et al., 2015)
- Attention (Mei et al., 2016; Wen et al., 2015)


Semantic Conditioned LSTM (Wen et al., 2015)

Idea: use a gating mechanism to control the generated semantics (dialogue act/slots)

Original LSTM cell plus a dialogue act (DA) cell that modifies the cell state C_t

[Figure: SC-LSTM cell — the standard gates i_t, f_t, o_t read (x_t, h_{t-1}) as usual, while an extra reading gate r_t updates the DA state d_t = r_t ⊙ d_{t-1}, initialized with the dialogue act 1-hot representation d_0 for Inform(name=Seven_Days, food=Chinese); d_t feeds into the cell state C_t]

http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
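A minimal PyTorch sketch of the cell in the figure: the usual LSTM gates plus a reading gate r_t that gradually consumes the DA vector d_t. The fused gate projection and layer sizes are implementation choices for illustration, not the authors' exact code.

```python
import torch
import torch.nn as nn

class SCLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, da_dim):
        super().__init__()
        # i, f, o gates and candidate cell state from (x_t, h_{t-1}).
        self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        self.read_gate = nn.Linear(input_dim + hidden_dim, da_dim)
        self.da_proj = nn.Linear(da_dim, hidden_dim, bias=False)

    def forward(self, x, h, c, d):
        z = torch.cat([x, h], dim=-1)
        i, f, o, c_hat = self.gates(z).chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        r = torch.sigmoid(self.read_gate(z))   # reading gate r_t
        d = r * d                              # d_t = r_t * d_{t-1}
        # DA state is injected into the cell state, steering the semantics.
        c = f * c + i * torch.tanh(c_hat) + torch.tanh(self.da_proj(d))
        h = o * torch.tanh(c)
        return h, c, d
```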


Structural NLG (Dušek and Jurčíček, 2016)

Goal: NLG based on the syntax tree

Encode trees as sequences

Seq2Seq model for generation

https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79


Contextual NLG (Dušek and Jurčíček, 2016)

Goal: adapt to the user's way of speaking, providing context-aware responses

Context encoder

Seq2Seq model

https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203


Controlled Text Generation (Hu et al., 2017)

Idea: NLG based on generative adversarial network (GAN) framework

c: targeted sentence attributes

https://arxiv.org/pdf/1703.00955.pdf


NLG Evaluation

Metrics

Subjective: human judgement (Stent et al., 2005)
- Adequacy: correct meaning
- Fluency: linguistic fluency
- Readability: fluency in the dialogue context
- Variation: multiple realizations for the same concept

Objective: automatic metrics
- Word overlap: BLEU (Papineni et al., 2002), METEOR, ROUGE
- Word embedding based: vector extrema, greedy matching, embedding average

There is a gap between human perception and automatic metrics
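For instance, a word-overlap score can be computed with NLTK's sentence-level BLEU (smoothed, since single NLG outputs are short); the example strings are illustrative.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["seven days is a nice chinese restaurant".split()]
candidate = "seven days is a good chinese restaurant".split()

# Smoothed sentence-level BLEU; correlates only loosely with human ratings.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```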


Evaluation


Dialogue System Evaluation


Dialogue model evaluation
- Crowdsourcing
- User simulator

Response generator evaluation
- Word overlap metrics
- Embedding-based metrics


Crowdsourcing for Dialogue System Evaluation (Yang et al., 2012)

http://www-scf.usc.edu/~zhaojuny/docs/SDSchapter_final.pdf

[Figure: the normalized mean scores of Q2 and Q5 for approved ratings in each category; a higher score maps to a higher level of task success]


User Simulation

Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space

Approaches:
- Rule-based: crafted by experts (Li et al., 2016)
- Learning-based (Schatzmann et al., 2006; El Asri et al., 2016; Crook and Marin, 2017)

[Figure: a simulated user, trained from a dialogue corpus, interacts with the dialogue management (DM) module — dialogue state tracking (DST) and dialogue policy — in place of a real user]
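A minimal sketch of a rule-based, agenda-style user simulator; the goal format and act names loosely follow the movie-booking setting but are illustrative.

```python
import random

class RuleSimulator:
    def __init__(self, goal):
        self.goal = goal                     # e.g. {"genre": "action"}
        self.agenda = list(goal.items())     # constraints still to convey
        random.shuffle(self.agenda)

    def respond(self, system_act):
        # Answer a system request if the slot is in the goal; otherwise
        # push the next constraint from the agenda.
        if system_act.startswith("request_"):
            slot = system_act[len("request_"):]
            if slot in self.goal:
                return ("inform", slot, self.goal[slot])
        if self.agenda:
            slot, value = self.agenda.pop()
            return ("inform", slot, value)
        return ("thanks", None, None)        # goal conveyed: close dialogue

sim = RuleSimulator({"genre": "action", "date": "this weekend"})
print(sim.respond("request_genre"))   # ('inform', 'genre', 'action')
```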


Elements of User Simulation

[Figure: elements of user simulation — the user model produces a distribution over user dialogue acts (semantic frames); an error model injects recognition and LU errors; a reward model supplies the reward; the simulated user exchanges system dialogue acts with the dialogue management (DM) module (DST and dialogue policy optimization) and the backend action/knowledge providers]

The error model enables the system to maintain robustness


Rule-Based Simulator for an RL-Based System (Li et al., 2016)


- rule-based simulator + collected data
- starts with sets of goals, actions, KB, slot types
- publicly available simulation framework
- movie-booking domain: ticket booking and movie seeking
- provides procedures to add and test your own agent

http://arxiv.org/abs/1612.05688


Model-Based User Simulators

Bi-gram models (Levin et al., 2000)

Graph-based models (Scheffler and Young, 2000)

Data-driven simulator (Jung et al., 2009)

Neural models (deep encoder-decoder)
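As a concrete example of the simplest of these, a bi-gram user model predicts the next user act from the last system act alone; the training pairs below are illustrative.

```python
import random
from collections import Counter, defaultdict

# Bi-gram user simulator in the spirit of Levin et al. (2000):
# P(user_act | system_act) estimated from (system_act, user_act) pairs.
class BigramUserSim:
    def __init__(self, pairs):
        self.counts = defaultdict(Counter)
        for sys_act, user_act in pairs:
            self.counts[sys_act][user_act] += 1

    def respond(self, sys_act):
        # Sample a user act proportionally to corpus co-occurrence counts.
        acts = self.counts[sys_act]
        return random.choices(list(acts), weights=acts.values())[0]

sim = BigramUserSim([("request_area", "inform_area"),
                     ("request_area", "inform_area"),
                     ("request_area", "reject")])
print(sim.respond("request_area"))   # inform_area ~2/3 of the time
```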


Data-Driven Simulator (Jung et al., 2009)


Three-step process:
1) User intention simulator

[Figure: given the previous turn's discourse status and user semantic frame, the intention simulator computes all possible semantic frames for the current turn using features (DD+DI), then randomly selects one, e.g. request+search_loc]


Data-Driven Simulator (Jung et al., 2009)


Three-step process:
1) User intention simulator
2) User utterance simulator

Given a list of POS tags associated with the semantic frame, an LM plus rules generate the user utterance, e.g. request+search_loc → “I want to go to the city hall”.
