Task-Oriented Dialogue System
(Young, 2000)
Pipeline: Speech Recognition → Language Understanding (LU) → Dialogue Management (DM) → Natural Language Generation (NLG), supported by backend action / knowledge providers
• Language Understanding (LU): domain identification, user intent detection, slot filling
• Dialogue Management (DM): dialogue state tracking (DST), dialogue policy
Example flow:
Speech signal / text input: "Are there any action movies to see this weekend?"
Hypothesis: are there any action movies to see this weekend
Semantic frame: request_movie, genre=action, date=this weekend
System action/policy: request_location
Text response: "Where are you located?"
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management – Dialogue State Tracking (DST)
Dialogue Management – Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Language Understanding (LU)
Pipelined:
1. Domain Classification
2. Intent Classification
3. Slot Filling
LU – Domain/Intent Classification
• Mainly viewed as an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), …, (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Example: "find me a cheap taiwanese restaurant in oakland"
Domain candidates: Movies, Restaurants, Sports, Weather, Music, …
Intent candidates: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics, …
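To make the classification formulation concrete, here is a minimal sketch (PyTorch; all layer sizes and names are illustrative, not taken from the cited papers) of an LSTM classifier that reads the whole utterance and then predicts a domain or intent label:

```python
import torch
import torch.nn as nn

class UtteranceClassifier(nn.Module):
    """Minimal LSTM utterance classifier for domain/intent detection (sketch)."""
    def __init__(self, vocab_size, n_classes, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, word_ids):                 # word_ids: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(word_ids))
        return self.out(h_n[-1])                 # logits over domains or intents
```

Predicting only from the final hidden state matches the observation on the next slide that making the intent decision after reading all words performs better.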
DNN for Domain/Intent Classification
(Ravuri & Stolcke, 2015)
RNNs and LSTMs for utterance classification; making the intent decision after reading all words performs better.
DNN for Dialogue Act Classification
(Lee & Dernoncourt, 2016)
RNNs and CNNs for dialogue act classification.
LU – Slot Filling
Treated as a sequence tagging task: given a collection of tagged word sequences S = {((w_{1,1}, w_{1,2}, …, w_{1,n1}), (t_{1,1}, t_{1,2}, …, t_{1,n1})), ((w_{2,1}, …, w_{2,n2}), (t_{2,1}, …, t_{2,n2})), …}, where t_i ∈ M, the goal is to estimate tags for a new word sequence.

Example (IOB tagging):
Word sequence: flights from Boston to New       York      today
Entity tags:   O       O    B-city O  B-city    I-city    O
Slot tags:     O       O    B-dept O  B-arrival I-arrival B-date
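As a concrete sketch of this tagging formulation (a plain bidirectional LSTM tagger; hyperparameters are assumptions, and the cited models on the following slides differ in their details):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal bidirectional LSTM slot tagger over IOB labels (sketch)."""
    def __init__(self, vocab_size, tagset_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, word_ids):          # word_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))
        return self.out(h)                # per-token logits over the tag set M
```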
RNN for Slot Tagging – I
(Yao et al., 2013; Mesnil et al., 2015)
Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams (look-around)
c. Bi-directional LSTMs
[Figure: (a) LSTM, (b) LSTM-LA, (c) bLSTM taggers; inputs w_0 … w_n, hidden states h_0 … h_n (forward and backward for the bLSTM), outputs y_0 … y_n]
RNN for Slot Tagging – II
(Kurata et al., 2016; Simonnet et al., 2015)
Encoder-decoder networks
• Leverage sentence-level information
Attention-based encoder-decoder
• Use attention (as in MT) in the encoder-decoder network
• Attention is estimated using a feed-forward network with inputs h_t and s_t at time t
[Figure: an encoder-decoder tagger reading the input sequence reversed, and an attention-based variant where a context vector c_i is computed from encoder states h_0 … h_n and decoder states s_0 … s_n]
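A minimal sketch of the feed-forward (additive) attention described above; the parameter shapes are my assumptions rather than the exact parameterization of the cited papers:

```python
import torch
import torch.nn.functional as F

def additive_attention(h, s_t, W_h, W_s, v):
    """Score each encoder state h_i against decoder state s_t (sketch).

    h:   (seq_len, d)  encoder states h_0..h_n
    s_t: (d,)          decoder state at time t
    W_h: (d, a), W_s: (d, a), v: (a,)  learned parameters
    """
    scores = torch.tanh(h @ W_h + s_t @ W_s) @ v   # feed-forward scoring, (seq_len,)
    alpha = F.softmax(scores, dim=0)               # attention distribution
    return alpha, alpha @ h                        # weights and context vector c_t
```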
RNN for Slot Tagging – III
(Jaech et al., 2016; Tafforeau et al., 2016)
Multi-task learning
• Goal: exploit data from domains/tasks with abundant data to improve those with less data
• Lower layers are shared across domains/tasks
• Output layer is task-specific
Joint Segmentation and Slot Tagging
(Zhai et al., 2017)
• Encoder that segments
• Decoder that tags the segments
Joint Semantic Frame Parsing
• Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction in the same output sequence
  [Figure: an RNN tags "taiwanese food please" as B-type O O and outputs the intent FIND_REST at EOS]
• Parallel-based (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches
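A sketch in the spirit of the parallel-based variant: a shared encoder with separate slot and intent branches (the mean-pooling in the intent branch is my simplification, not a detail from the cited work):

```python
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    """Joint slot filling + intent prediction with two output branches (sketch)."""
    def __init__(self, vocab_size, n_slots, n_intents,
                 embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, n_slots)      # per-token tags
        self.intent_head = nn.Linear(2 * hidden_dim, n_intents)  # utterance label

    def forward(self, word_ids):
        h, _ = self.encoder(self.embed(word_ids))     # (batch, seq, 2*hidden)
        return self.slot_head(h), self.intent_head(h.mean(dim=1))
```

Training typically sums the two losses, so errors in either branch regularize the shared encoder.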
Contextual LU
Example (domain: communication; intent: send_email):
Single-turn utterance: "just sent email to bob about fishing this weekend"
  Slot tags: O O O O B-contact_name O B-subject I-subject I-subject
  → send_email(contact_name="bob", subject="fishing this weekend")
Multi-turn:
  U1: "are we going to fish this weekend"
      Tags: B-message I-message I-message I-message I-message I-message I-message
  → send_email(message="are we going to fish this weekend")
  U2: "send email to bob"
  → send_email(contact_name="bob")
LU subtasks: domain identification, intent prediction, slot filling
Contextual LU
User utterances are highly ambiguous in isolation.
Example (restaurant booking):
U: Book a table for 10 people tonight.
S: Which restaurant would you like to book a table for?
U: Cascal, for 6. (does "6" fill # people or time?)
Contextual LU
(Bhargava et al., 2013; Hori et al., 2015)
Leveraging contexts:
• Used for individual tasks
• Seq2Seq model: words are input one at a time; tags are output at the end of each utterance
• Extension: LSTM with speaker-role-dependent layers
End-to-End Memory Networks
(Sukhbaatar et al., 2015)
U: "i d like to purchase tickets to see deepwater horizon"
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
[Figure: the history utterances are stored as memory vectors m_0 … m_{n-1}, queried by the current utterance encoding u]
E2E MemNN for Contextual LU
(Chen et al., 2016)
Idea: additionally incorporate contextual knowledge during slot tagging; track dialogue states in a latent way.
1. Sentence encoding: RNN_in encodes the current utterance c; a contextual sentence encoder RNN_mem encodes the history utterances {x_i} into memory representations {m_i}
2. Knowledge attention: inner products between the current-utterance encoding u and each m_i give the attention distribution p_i
3. Knowledge encoding: the attention-weighted sum h is combined (via W_kg) into a knowledge-encoded representation o that conditions the RNN tagger producing the slot tagging sequence y
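Steps 2 and 3 reduce to a softmax over inner products followed by a weighted sum; a minimal sketch in my notation:

```python
import torch
import torch.nn.functional as F

def memory_attention(u, memory):
    """Knowledge attention over dialogue history (sketch of steps 2-3 above).

    u:      (d,)   encoding of the current utterance
    memory: (n, d) encodings m_0..m_{n-1} of the history utterances
    """
    p = F.softmax(memory @ u, dim=0)   # inner products -> attention distribution p_i
    h = p @ memory                     # attention-weighted sum of memory vectors
    return p, h                        # h is then combined with u (via W_kg) into o
```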
Analysis of Attention
The same ticket-booking dialogue as on the previous slide, with the learned attention distribution overlaid: the largest weights (e.g. 0.69, 0.16, 0.13) concentrate on the history turns most relevant to tagging the current utterance.
Sequential Dialogue Encoder Network
(Bapna et al., SIGDIAL 2017)
Past and current turn encodings are fed to a feed-forward network.
Structural LU
(Chen et al., 2016) K-SAN: prior knowledge as a teacher
[Figure: K-SAN architecture; substructures {x_i} of the knowledge-guided structure for the input sentence "show me the flights from seattle to san francisco" (rooted at ROOT) are encoded by a knowledge encoding module into memory vectors m_i; inner products with the sentence encoding give the knowledge attention distribution p_i; the attention-weighted sum of the encoded knowledge representations forms a knowledge-guided representation that conditions the RNN tagger producing the slot tagging sequence]
Structural LU
(Chen et al., 2016) Sentence structural knowledge stored as memory
[Figure: for the sentence "show me the flights from seattle to san francisco", structural knowledge can come from syntax (a dependency tree rooted at "show") or semantics (an AMR graph relating show, you, I, and flight, with city nodes Seattle and San Francisco); numbered substructures serve as memory entries]
Structural LU
(Chen et al., 2016) Sentence structural knowledge stored as memory
Even when trained on less data, K-SAN pays similar attention to the salient substructures that are important for tagging.
LU Importance
(Li et al., 2017)
• Compares the downstream effect of different types of LU errors
• Slot filling is more important than intent detection in language understanding
[Figure: sensitivity to intent errors vs. sensitivity to slot errors]
LU Evaluation
Metrics
• Sub-sentence level: intent accuracy, slot F1
• Sentence level: whole-frame accuracy
Elements of Dialogue Management
(figure from Gašić)
Focus: Dialogue State Tracking
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
Example:
S: How can I help you?
U: Book a table at Sumiko for 5.
   Belief: # people = 5 (0.5), time = 5 (0.5)  ("5" is ambiguous)
S: How many people?
U: 3
   Belief: # people = 3 (0.8), time = 5 (0.8)
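An illustrative, non-neural sketch of maintaining such a distribution across turns (the 50/50 mixing weight is an arbitrary assumption; the trackers on the following slides learn this update instead):

```python
def update_belief(belief, slu_hypotheses, mix=0.5):
    """Mix the prior belief with the current turn's SLU evidence (sketch).

    belief, slu_hypotheses: {slot: {value: prob}}
    """
    updated = {}
    for slot in set(belief) | set(slu_hypotheses):
        prior = belief.get(slot, {})
        obs = slu_hypotheses.get(slot, {})
        scores = {v: (1 - mix) * prior.get(v, 0.0) + mix * obs.get(v, 0.0)
                  for v in set(prior) | set(obs)}
        total = sum(scores.values()) or 1.0
        updated[slot] = {v: s / total for v, s in scores.items()}  # renormalize
    return updated
```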
Multi-Domain Dialogue State Tracking (DST)
A full representation of the system's belief of the user's goal at any point during the dialogue
Used for making API calls
Example dialogue:
A: Do you wanna take Angela to go see a movie tonight?
B: Sure, I will be home by 6.
A: Let's grab dinner before the movie. How about some Mexican?
B: Let's go to Vive Sol and see Inferno after that.
A: Angela wants to watch the Trolls movie.
B: Ok. Let's catch the 8 pm show.
[Figure: the tracked multi-domain state, with a Restaurants frame (Vive Sol, Mexican cuisine, date 11/15/16, candidate times around 6:30-7:30 pm) and a Movies frame (Inferno / Trolls, Century 16, candidate times 6-9 pm, 2-3 people)]
Dialog State Tracking Challenge (DSTC)
(Williams et al., 2013; Henderson et al., 2014a; Henderson et al., 2014b; Kim et al., 2016a; Kim et al., 2016b)

Challenge  Type           Domain               Data Provider  Main Theme
DSTC1      Human-Machine  Bus Route            CMU            Evaluation Metrics
DSTC2      Human-Machine  Restaurant           U. Cambridge   User Goal Changes
DSTC3      Human-Machine  Tourist Information  U. Cambridge   Domain Adaptation
DSTC4      Human-Human    Tourist Information  I2R            Human Conversation
DSTC5      Human-Human    Tourist Information  I2R            Language Adaptation
NN-Based DST
(Henderson et al., 2013; Mrkšić et al., 2015; Mrkšić et al., 2016); figure from Wen et al., 2016
Neural Belief Tracker
(Mrkšić et al., 2016)
DST Evaluation
Dialogue State Tracking Challenges:
• DSTC2-3: human-machine
• DSTC4-5: human-human
Metrics:
• Tracked-state accuracy with respect to the user goal
• Recall/precision/F-measure of individual slots
Elements of Dialogue Management
(figure from Gašić)
Focus: Dialogue Policy Optimization
Dialogue Policy Optimization
Dialogue management in an RL framework
[Figure: RL loop in which the dialogue manager is the agent and the user, reached through language understanding and natural language generation, is the environment; they exchange observation O, action A, and reward R] (slides credit: Pei-Hao Su)
The optimized dialogue policy selects the action that maximizes the future reward.
Correct reward signals are a crucial factor in dialogue policy training.
Reward for RL ≅ Evaluation for System
Dialogue is a special RL task: humans are involved in both the interaction and the rating (evaluation) of the dialogue, i.e. a fully human-in-the-loop framework.
Rating criteria: correctness, appropriateness, and adequacy
• Expert rating: high quality, high cost
• User rating: unreliable quality, medium cost
• Objective rating: checks desired aspects, low cost
Reinforcement Learning for Dialogue Policy Optimization
[Figure: the RL cycle; user input o passes through language understanding to produce state s, the dialogue policy a = π(s) selects action a, realized through language (response) generation; the agent collects rewards (s, a, r, s') and optimizes Q(s, a)]
Type of bots and their RL formulation:
• Social chatbots: State = chat history; Action = system response; Reward = # of turns maximized, intrinsically motivated reward
• InfoBots (interactive Q/A): State = current user question + context; Action = answers to the current question; Reward = relevance of answer, # of turns minimized
• Task-completion bots: State = current user input + context; Action = system dialogue act w/ slot values (or API calls); Reward = task success rate, # of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
Dialogue Reinforcement Learning Signal
Typical reward function:
• -1 per-turn penalty
• Large reward at completion if successful
Typically requires domain knowledge:
• ✔ Simulated user
• ✔ Paid users (Amazon Mechanical Turk)
• ✖ Real users
A user simulator is usually required for dialogue system training before deployment.
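A sketch of such a reward function (the exact magnitudes are assumptions; only the shape, a small per-turn penalty plus a large terminal reward, comes from the slide):

```python
def turn_reward(done, success, success_reward=20, failure_penalty=-10):
    """Typical task-completion reward signal (sketch)."""
    if not done:
        return -1                       # per-turn penalty favors short dialogues
    return success_reward if success else failure_penalty
```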
Neural Dialogue Manager
(Li et al., 2017)
Deep Q-network for training the DM policy:
• Input: current semantic frame observation, database returned results
• Output: system action
[Figure: the DQN-based dialogue management (DM) receives the semantic frame request_movie, genre=action, date=this weekend, plus backend DB results, and outputs the system action/policy request_location; training interacts with a simulated user]
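A minimal sketch of such a DQN policy (the sizes and the epsilon-greedy exploration are generic RL choices, not details from the cited paper):

```python
import torch
import torch.nn as nn

class DQNPolicy(nn.Module):
    """State features (semantic frame + DB results) -> Q-value per action (sketch)."""
    def __init__(self, state_dim, n_actions, hidden_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def act(self, state, epsilon=0.1):
        # epsilon-greedy exploration over system actions (e.g. request_location)
        if torch.rand(1).item() < epsilon:
            return torch.randint(self.net[-1].out_features, (1,)).item()
        with torch.no_grad():
            return self.net(state).argmax().item()
```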
SL + RL for Sample Efficiency
(Su et al., 2017)
Issues with RL for DM:
• Slow learning speed
• Cold start
Solutions:
• Sample-efficient actor-critic
  - Off-policy learning with experience replay
  - Better gradient updates
• Utilizing supervised data
  - Pretrain the model with SL, then fine-tune with RL
  - Mix SL and RL data during RL training
• Combine both
Online Training
(Su et al., 2015; Su et al., 2016)
Policy learning from real users:
• Infer the reward directly from dialogues (Su et al., 2015)
• User rating (Su et al., 2016): reward modeling on users' binary success ratings
[Figure: dialogue representation → embedding function → reward model predicting success/fail; the model supplies the reinforcement signal and queries users for ratings]
Interactive RL for DM
(Shah et al., 2016)
• Use a third agent to provide immediate, interactive feedback to the DM
Dialogue Management Evaluation
Metrics
• Turn-level evaluation: system action accuracy
• Dialogue-level evaluation: task success rate, reward
Natural Language Generation (NLG)
Mapping a semantic frame into natural language
Example: inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"
Template-Based NLG
Define a set of rules to map frames to NL
Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability

Semantic Frame             Natural Language
confirm()                  "Please tell me more about the product you are looking for."
confirm(area=$V)           "Do you want somewhere in the $V?"
confirm(food=$V)           "Do you want a $V restaurant?"
confirm(food=$V, area=$W)  "Do you want a $V restaurant in the $W?"
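The whole approach fits in a lookup plus string formatting; a sketch using a hypothetical template table that mirrors the confirm() rows above:

```python
# Hypothetical template table keyed by dialogue act + sorted slot signature.
TEMPLATES = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def template_nlg(act, **slots):
    """Template-based NLG (sketch): select a rule, then fill in slot values."""
    key = f"{act}({','.join(sorted(slots))})"
    return TEMPLATES[key].format(**slots)

# template_nlg("confirm", food="Chinese") -> "Do you want a Chinese restaurant?"
```

Every new act/slot combination needs its own hand-written entry, which is exactly the scalability problem noted above.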
Plan-Based NLG
(Walker et al., 2002)
Divide the problem into a pipeline: Sentence Plan Generator → Sentence Plan Reranker → Surface Realizer (operating over a syntactic tree)
• Statistical sentence plan generator (Stent et al., 2009)
• Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)
Example: Inform(name=Z_House, price=cheap) → "Z House is a cheap restaurant."
Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge
Class-Based LM NLG
(Oh and Rudnicky, 2000)
• Class-based language modeling; NLG by decoding
• Classes: inform_area, inform_address, …, request_area, request_postcode
Pros: easy to implement/understand, simple rules
Cons: computationally inefficient
RNN-Based LM NLG
(Wen et al., 2015)
• Delexicalisation: "Din Tai Fung serves Taiwanese ." → "SLOT_NAME serves SLOT_FOOD ."
• The dialogue act is encoded as a 1-hot representation, e.g. Inform(name=Din Tai Fung, food=Taiwanese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …
• Input: <BOS> SLOT_NAME serves SLOT_FOOD . (generation conditioned on the dialogue act)
• Output: SLOT_NAME serves SLOT_FOOD . <EOS>
• Slot weight tying
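Delexicalisation itself is simple string surgery; a sketch (real systems match slot values against an ontology rather than raw substrings):

```python
def delexicalise(utterance, slot_values):
    """Replace slot values with placeholder tokens (sketch).

    slot_values: e.g. {"name": "Din Tai Fung", "food": "Taiwanese"}
    """
    for slot, value in slot_values.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance

def relexicalise(template, slot_values):
    """After generation, fill the placeholders back in."""
    for slot, value in slot_values.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

# delexicalise("Din Tai Fung serves Taiwanese .",
#              {"name": "Din Tai Fung", "food": "Taiwanese"})
# -> "SLOT_NAME serves SLOT_FOOD ."
```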
Handling Semantic Repetition
Issue: semantic repetition
• "Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese."
• "Din Tai Fung is a child friendly restaurant, and also allows kids."
Deficiency in either the model or the decoding (or both)
Mitigation:
• Post-processing rules (Oh & Rudnicky, 2000)
• Gating mechanism (Wen et al., 2015)
• Attention (Mei et al., 2016; Wen et al., 2015)
Semantic Conditioned LSTM
(Wen et al., 2015)
Idea: use a gating mechanism to control the generated semantics (dialogue act/slots)
• Original LSTM cell: gates i_t, f_t, o_t operate on x_t and h_{t-1} to update c_t and h_t
• Dialogue act (DA) cell: a reading gate r_t updates the DA vector, d_t = r_t ⊙ d_{t-1}, which modifies c_t
• The dialogue act, e.g. Inform(name=Seven_Days, food=Chinese), enters as the 1-hot vector d_0
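Written out from the components above (following Wen et al., 2015; treat the exact parameterization as a reconstruction rather than a verbatim copy):

```latex
\begin{aligned}
i_t &= \sigma(W_{wi} x_t + W_{hi} h_{t-1}) \\
f_t &= \sigma(W_{wf} x_t + W_{hf} h_{t-1}) \\
o_t &= \sigma(W_{wo} x_t + W_{ho} h_{t-1}) \\
r_t &= \sigma(W_{wr} x_t + \alpha W_{hr} h_{t-1}) \\
\hat{c}_t &= \tanh(W_{wc} x_t + W_{hc} h_{t-1}) \\
d_t &= r_t \odot d_{t-1} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t + \tanh(W_{dc}\, d_t) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The reading gate r_t gradually drains the DA vector d_t as slots are realized, so the generator stops re-emitting semantics it has already expressed.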
Structural NLG
(Dušek and Jurčíček, 2016)
Goal: NLG based on the syntax tree
• Encode trees as sequences
• Seq2Seq model for generation
Contextual NLG
(Dušek and Jurčíček, 2016)
Goal: adapt to users' way of speaking, providing context-aware responses
• Context encoder
• Seq2Seq model
Controlled Text Generation
(Hu et al., 2017)
Idea: NLG based on a generative adversarial network (GAN) framework, where c encodes the targeted sentence attributes
NLG Evaluation
Metrics
• Subjective: human judgement (Stent et al., 2005)
  - Adequacy: correct meaning
  - Fluency: linguistic fluency
  - Readability: fluency in the dialogue context
  - Variation: multiple realizations for the same concept
• Objective: automatic metrics
  - Word overlap: BLEU (Papineni et al., 2002), METEOR, ROUGE
  - Word-embedding based: vector extrema, greedy matching, embedding average
There is a gap between human perception and automatic metrics.
E2E Joint NLU and DM
(Yang et al., 2017)
Errors from the DM can be propagated to the NLU for regularization and robustness.

Model                DM    NLU
Baseline (CRF+SVMs)  7.7   33.1
Pipeline-BLSTM       12.0  36.4
JointModel           22.8  37.4

Both DM and NLU performance (frame accuracy) improves.
E2E Supervised Dialogue System
(Wen et al., 2017)
[Figure: modular end-to-end architecture for the input "Can I have korean":
• Intent Network: encodes the delexicalised input "Can I have <v.food>" into a representation z_t
• Belief Tracker: maintains a distribution p_t over slot values, e.g. food: Korean 0.7, British 0.2, French 0.1
• Database Operator: forms the MySQL query "Select * where food=Korean" over the DB (Seven Days, Curry Prince, Nirala, Royal Standard, Little Seoul) and returns a DB pointer x_t, with fields copied into the response
• Policy Network: combines z_t, p_t, and x_t
• Generation Network: outputs "<v.name> serves great <v.food> ."]
E2E MemNN for Dialogues
(Bordes et al., 2017)
Split dialogue system actions into subtasks:
• API issuing
• API updating
• Option displaying
• Information informing
E2E RL-Based KB-InfoBot
(Dhingra et al., 2017)
Idea: a differentiable database lets gradients propagate through KB lookups.
Example (user goal: Movie=?; Actor=Bill Murray; Release Year=1993):
U: Find me the Bill Murray's movie.
S: When was it released?
U: I think it came out in 1993.
S: Groundhog Day is a Bill Murray movie which came out in 1993.

Entity-centric knowledge base:
Movie               Actor          Release Year
Groundhog Day       Bill Murray    1993
Australia           Nicole Kidman  X
Mad Max: Fury Road  X              2015
E2E RL-Based System
(Zhao and Eskenazi, 2016)
• Joint learning of NLU, DST, and dialogue policy
• Deep RL for training: deep Q-network with a deep recurrent network
[Figure: learning curves comparing Baseline RL and Hybrid-RL]
E2E LSTM-Based Dialogue Control
(Williams and Zweig, 2016)
Idea: an LSTM maps from the raw dialogue history directly to a distribution over system actions
• Developers can provide software including business rules & programmatic APIs
• The LSTM can take actions in the real world on behalf of the user
• The LSTM can be optimized using SL or RL
E2E Task-Completion Bot (TC-Bot)
(Li et al., 2017)
Idea: supervised learning for each component, plus reinforcement learning for end-to-end training of the neural dialogue system.
[Figure: end-to-end neural dialogue system; the Language Understanding (LU) module tags each user turn (per-word slot tags such as B-type/<slot>/O plus an <intent> prediction at EOS, over times t-2, t-1, t); the text input "Are there any action movies to see this weekend?" yields the semantic frame request_movie, genre=action, date=this weekend; Dialogue Management (DM) selects the system action/policy request_location; Natural Language Generation (NLG) realizes the response; a user simulator with agenda modeling and a user goal replies with the user dialogue action Inform(location=San Francisco)]
E2E Task-Completion Bot (TC-Bot)
(Li et al., 2017)
User goal: two tickets for "the witch" tomorrow 9:30 PM at Regal Meridian 16, Seattle.

RULE-BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don't care.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!

REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!

The system can learn to interact with users efficiently for task completion.
Hierarchical RL for Composite Tasks
(Peng et al., 2017)
Example: travel planning, a set of subtasks that need to be fulfilled collectively.
• Build a dialogue manager that satisfies cross-subtask constraints (slot constraints)
• Temporally constructed goals, e.g.:
  - hotel_check_in_time > departure_flight_time
  - # flight_tickets = # people checking in to the hotel
  - hotel_check_out_time < return_flight_time
Hierarchical RL for Composite Tasks
(Peng et al., 2017)
The dialogue model makes decisions over two levels: a meta-controller and a controller. The agent learns both policies simultaneously, which mitigates reward sparsity:
• Meta-controller: the policy π_g(g_t, s_t; θ_1) over the optimal sequence of goals to follow
• Controller: the policy π_{a,g}(a_t, g_t, s_t; θ_2) over actions for each sub-goal g_t
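A sketch of the two-level decision loop (all names here are hypothetical, not taken from the paper):

```python
def hierarchical_step(meta_policy, controller, state, goal):
    """One step of hierarchical control (sketch).

    The meta-controller picks a new sub-goal g_t whenever none is active or
    the current one is achieved; the controller picks the action for that
    sub-goal.
    """
    if goal is None or goal.achieved(state):   # hypothetical goal interface
        goal = meta_policy(state)              # pi_g(g_t, s_t; theta_1)
    action = controller(state, goal)           # pi_{a,g}(a_t, g_t, s_t; theta_2)
    return action, goal
```

In such hierarchies, intrinsic rewards for completing a sub-goal typically give the controller dense feedback, which is one way the two-level setup mitigates reward sparsity.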
Social Chat Bots
The success of XiaoIce (小冰)
Problem setting and evaluation:
• Maximize user engagement by automatically generating enjoyable and useful conversations
Learning a neural conversation engine:
• A data-driven engine trained on social chitchat data (Sordoni+ 15; Li+ 16)
• Persona-based models and speaker-role-based models (Li+ 16; Luan+ 17)
• Image-grounded models (Mostafazadeh+ 17)
• Knowledge-grounded models (Ghazvininejad+ 17)