(1)

Open-Domain Neural Dialogue Systems

opendialogue.miulab.tw

YUN-NUNG (VIVIAN) CHEN, JIANFENG GAO

How can I help you?

(2)


Outline

PART I. Introduction & Background Knowledge

PART II. Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


Break

(3)

Introduction & Background Knowledge

Introduction


(4)


Outline

PART I. Introduction & Background Knowledge

Dialogue System Introduction

Neural Network Basics

Reinforcement Learning

PART II. Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(5)

Brief History of Dialogue Systems

Early 1990s: keyword spotting (e.g., AT&T)
  System: "Please say collect, calling card, person, third number, or operator"

Early 2000s:
  Intent determination (Nuance's Emily™, AT&T HMIHY)
    User: "Uh… we want to move… we want to change our phone line from this house to another house"
  Task-specific argument extraction (e.g., Nuance, SpeechWorks)
    User: "I want to fly from Boston to New York next week."
  Multi-modal systems (e.g., Microsoft MiPad, Pocket PC)
  TV voice search (e.g., Bing on Xbox)
  DARPA CALO Project

2017: virtual personal assistants
  Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)

(6)

Why Do We Need Dialogue Systems?

"I am smart" → Turing test ("I" talk like a human)
"I have a question" → Information consumption
"I need to get this done" → Task completion
"What should I do?" → Decision support

(7)

Why Do We Need Dialogue Systems? (Information Consumption)

"I have a question" → information consumption, e.g.:
• What is the employee review schedule?
• Which room is the dialogue tutorial in?
• When is the IJCNLP 2017 conference?
• What does NLP stand for?

(8)

Why Do We Need Dialogue Systems? (Task Completion)

"I need to get this done" → task completion, e.g.:
• Book me the flight from Seattle to Taipei
• Reserve a table at Din Tai Fung for 5 people, 7 PM tonight
• Schedule a meeting with Bill at 10:00 tomorrow

(9)

Why Do We Need Dialogue Systems? (Decision Support)

"What should I do?" → decision support, e.g.:
• Is this product worth buying?

(10)

Why Do We Need Dialogue Systems? (Summary)

Information consumption, task completion, and decision support constitute task-oriented dialogues; the Turing test ("I" talk like a human) corresponds to social chit-chat.

(11)

Language Empowering Intelligent Assistants

Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016), Apple HomePod (2017)

(12)

Intelligent Assistants

Intelligent assistants span two styles: task-oriented, and engaging (social bots).

(13)

Why Natural Language?

Global digital statistics (January 2017):
• Total population: 7.48B
• Unique mobile users: 4.92B
• Internet users: 3.77B
• Active social media users: 2.79B
• Active mobile social users: 2.55B

As devices evolve, speech is becoming the more natural and convenient input.

(14)

Spoken Dialogue System (SDS)

Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

Examples from fiction: JARVIS (Iron Man's personal assistant) and Baymax (personal healthcare companion).

Good dialogue systems assist users to access information conveniently and finish tasks efficiently.

(15)

App → Bot

A bot is responsible for a "single" domain, similar to an app. Users can initiate dialogues instead of following the GUI design.

(16)

GUI vs. CUI (Conversational UI)

Example: https://github.com/enginebai/Movie-lol-android

(17)

GUI vs. CUI (Conversational UI)

Aspect                 Website/App GUI                  Messaging CUI
Situation              Navigation, no specific goal     Searching, with a specific goal
Information quantity   More                             Less
Information precision  Low                              High
Display                Structured                       Non-structured
Interface              Graphics                         Language
Manipulation           Mainly clicking                  Mainly text or speech input
Learning               Takes time to learn and adapt    No need to learn
Entrance               App download                     Incorporated in any messaging interface
Flexibility            Low, like operating a machine    High, like conversing with a human

(18)

Two Branches of Dialogue Systems

Task-Oriented Bot
• Personal assistant, helps users achieve a certain task
• Combination of rules and statistical components
  - POMDP for spoken dialog systems (Williams and Young, 2007)
  - End-to-end trainable task-oriented dialogue system (Wen et al., 2016)
  - End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)

Chit-Chat Bot
• No specific goal, focuses on natural responses
• Uses variants of the seq2seq model
  - A neural conversation model (Vinyals and Le, 2015)
  - Reinforcement learning for dialogue generation (Li et al., 2016)
  - Conversational contextual cues for response ranking (Al-Rfou et al., 2016)

(19)

Task-Oriented Dialogue System (Young, 2000)

Pipeline from speech signal to response:
1. Speech recognition: speech signal → text input ("Are there any action movies to see this weekend?") → hypothesis "are there any action movies to see this weekend"
2. Language understanding (LU): domain identification, user intent detection, slot filling → semantic frame: request_movie(genre=action, date=this weekend)
3. Dialogue management (DM): dialogue state tracking (DST) and dialogue policy, consulting backend actions / knowledge providers → system action/policy: request_location
4. Natural language generation (NLG) → text response: "Where are you located?"

(20)


Outline

PART I. Introduction & Background Knowledge

Dialogue System Introduction

Neural Network Basics

Reinforcement Learning

PART II. Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(21)

Machine Learning ≈ Looking for a Function

• Speech recognition: f(audio) = "你好 (Hello)"
• Image recognition: f(image) = cat
• Go playing: f(board) = 5-5 (next move)
• Chat bot: f("Where is IJCNLP?") = "The location is…"

Given a large amount of data, the machine learns what the function f should be.

(22)

Machine Learning

Major paradigms: supervised learning, unsupervised learning, and reinforcement learning.

Deep learning is a family of machine learning approaches based on neural networks.

(23)

A Single Neuron

A neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

$z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b, \qquad y = \sigma(z)$

where the activation here is the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$ and $b$ is the bias. $w$ and $b$ are the parameters of this neuron.

(24)

A Single Neuron

Interpreting the output as a score, a single neuron realizes $f: \mathbb{R}^N \rightarrow \mathbb{R}$ and makes a binary decision, e.g., for digit detection:

• $y \ge 0.5$ → "is 2"
• $y < 0.5$ → "not 2"

A single neuron can only handle binary classification; a sketch follows below.
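To make the computation concrete, here is a minimal NumPy sketch of a single sigmoid neuron used as a binary classifier; the input, weight, and bias values are made up for illustration:

    import numpy as np

    def sigmoid(z):
        # Sigmoid activation: squashes z into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b):
        # z = w . x + b, then y = sigmoid(z)
        return sigmoid(np.dot(w, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # toy input features
    w = np.array([0.8, 0.1, -0.4])   # toy weights
    b = 0.2                          # toy bias
    y = neuron(x, w, b)
    print('"is 2"' if y >= 0.5 else '"not 2"')   # threshold at 0.5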

(25)

A Layer of Neurons

Handwriting digit classification, $f: \mathbb{R}^N \rightarrow \mathbb{R}^M$: one neuron per class ("1" or not, "2" or not, "3" or not, …), i.e., 10 neurons for 10 classes.

A layer of neurons can handle multiple possible outputs; the predicted class is the one whose neuron outputs the maximum value.

(26)

Deep Neural Networks (DNN)

A fully connected feedforward network maps an input vector x through layers 1, 2, …, L to an output vector y, realizing $f: \mathbb{R}^N \rightarrow \mathbb{R}^M$. A deep NN is one with multiple hidden layers.
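A minimal sketch of the forward pass of such a fully connected network (the layer sizes, random parameters, and the ReLU nonlinearity are illustrative choices):

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def feedforward(x, weights, biases):
        # h_l = relu(W_l h_{l-1} + b_l), applied layer by layer;
        # the whole network realizes a function f: R^N -> R^M
        h = x
        for W, b in zip(weights, biases):
            h = relu(W @ h + b)
        return h

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # toy 3-4-2 net
    biases = [np.zeros(4), np.zeros(2)]
    print(feedforward(np.array([1.0, -0.5, 0.3]), weights, biases))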

(27)

Recurrent Neural Network (RNN)

At each time step the hidden state is updated from the current input and the previous state, $h_t = \sigma(U x_t + W h_{t-1})$ with $\sigma$ typically tanh or ReLU, so an RNN can learn accumulated sequential (time-series) information.

Reference: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
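A minimal sketch of this recurrence unrolled over a sequence (the dimensions and random parameters are made up; tanh follows the standard formulation):

    import numpy as np

    def rnn(xs, U, W, h0):
        # h_t = tanh(U x_t + W h_{t-1}): the hidden state accumulates
        # information from the whole prefix of the sequence
        h, states = h0, []
        for x in xs:
            h = np.tanh(U @ x + W @ h)
            states.append(h)
        return states

    rng = np.random.default_rng(0)
    U, W = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))
    xs = rng.normal(size=(5, 4))            # a length-5 input sequence
    print(len(rnn(xs, U, W, np.zeros(8))))  # 5 hidden states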

(28)


Outline

PART I. Introduction & Background Knowledge

Dialogue System Introduction

Neural Network Basics

Reinforcement Learning

PART II. Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(29)

Reinforcement Learning

RL is a general-purpose framework for decision making:
• RL is for an agent with the capacity to act
• Each action influences the agent's future state
• Success is measured by a scalar reward signal
• Goal: select actions to maximize future reward

(30)

Scenario of Reinforcement Learning

At each step the agent receives an observation o_t from the environment, takes an action a_t, and obtains a reward r_t; it learns to take actions that maximize expected reward. In a game-playing example, the action is the next move, and the reward is 1 for a win, -1 for a loss, and 0 otherwise.

(31)

Supervised vs. Reinforcement

Supervised: learning from a teacher, with a correct answer per input, e.g., "Hello" → say "Hi"; "Bye bye" → say "Good bye".

Reinforcement: learning from critics; after a long exchange starting from "Hello", the only feedback is an overall judgment such as "Bad".

(32)

Sequential Decision Making

Goal: select actions to maximize total future reward.
• Actions may have long-term consequences
• Reward may be delayed
• It may be better to sacrifice immediate reward to gain more long-term reward

(33)

Deep Reinforcement Learning

In deep RL, the mapping from observation to action is a function realized by a DNN: the observation is the function's input, the action is its output, and the reward is used to pick the best function.

(34)

Reinforcement Learning

• Start from state s_0
• Choose action a_0
• Transit to s_1 ~ P(s_0, a_0)
• Continue…
• Total reward (with discount factor $\gamma$): $R = \sum_{t \ge 0} \gamma^t r_t$

Goal: select actions that maximize the expected total reward.
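As a concrete illustration of the total reward (the discount factor and the reward values below are made up):

    def discounted_return(rewards, gamma=0.99):
        # R = sum_t gamma^t * r_t
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    # e.g., a per-turn penalty of -1 followed by a success bonus
    print(discounted_return([-1, -1, -1, 20]))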

(35)

Reinforcement Learning Approaches

• Policy-based RL: search directly for the optimal policy $\pi^*$, i.e., the policy achieving maximum future reward
• Value-based RL: estimate the optimal value function $Q^*(s, a)$, i.e., the maximum value achievable under any policy (see the sketch below)
• Model-based RL: build a model of the environment and plan (e.g., by lookahead) using the model
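A minimal sketch of value-based RL: tabular Q-learning with an epsilon-greedy policy (the dialogue-flavored action names and all constants are illustrative, not from the source):

    import random
    from collections import defaultdict

    Q = defaultdict(float)                 # Q(s, a), zero-initialized
    ACTIONS = ["request_location", "inform_movie"]
    ALPHA, GAMMA = 0.1, 0.99

    def q_update(s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    def epsilon_greedy(s, eps=0.1):
        # explore with probability eps, otherwise exploit the current Q
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(s, a)])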

(36)

Task-Oriented Dialogue Systems


(37)

Task-Oriented Dialogue System (Young, 2000)

Pipeline from speech signal to response:
1. Speech recognition: speech signal → text input ("Are there any action movies to see this weekend?") → hypothesis "are there any action movies to see this weekend"
2. Language understanding (LU): domain identification, user intent detection, slot filling → semantic frame: request_movie(genre=action, date=this weekend)
3. Dialogue management (DM): dialogue state tracking (DST) and dialogue policy, consulting backend actions / knowledge providers → system action/policy: request_location
4. Natural language generation (NLG) → text response: "Where are you located?"

(38)


Outline

PART I. Introduction & Background Knowledge

PART II. Task-Oriented Dialogue Systems

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management – Dialogue State Tracking (DST)

Dialogue Management – Dialogue Policy Optimization

Natural Language Generation (NLG)

End-to-End Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(39)

Language Understanding (LU)

Pipelined:
1. Domain classification
2. Intent classification
3. Slot filling

(40)

LU – Domain/Intent Classification

Mainly viewed as an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), …, (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.

Example: "find me a cheap taiwanese restaurant in oakland"
• Domain candidates: Movies, Restaurants, Sports, Weather, Music
• Intent candidates: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics

(41)

DNN for Domain/Intent Classification (Ravuri & Stolcke, 2015)

RNNs and LSTMs for utterance classification; making the intent decision after reading all words performs better.
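A minimal PyTorch sketch of such an utterance classifier: the intent decision is made from the LSTM's final hidden state, i.e., after reading all words (the vocabulary size, dimensions, and number of intents are placeholders):

    import torch
    import torch.nn as nn

    class IntentClassifier(nn.Module):
        # embed words, run an LSTM over the utterance, and classify
        # from the final hidden state (after reading all words)
        def __init__(self, vocab_size, num_intents, emb_dim=100, hid_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, num_intents)

        def forward(self, token_ids):        # (batch, seq_len)
            _, (h_n, _) = self.lstm(self.embed(token_ids))
            return self.out(h_n[-1])         # (batch, num_intents) logits

    model = IntentClassifier(vocab_size=5000, num_intents=5)
    logits = model(torch.randint(0, 5000, (1, 8)))  # one 8-token utterance
    print(logits.argmax(dim=-1))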

(42)

DNN for Dialogue Act Classification (Lee & Dernoncourt, 2016)

RNNs and CNNs for dialogue act classification.

(43)

LU – Slot Filling

As a sequence tagging task: given a collection of tagged word sequences, S = {((w_{1,1}, w_{1,2}, …, w_{1,n1}), (t_{1,1}, t_{1,2}, …, t_{1,n1})), ((w_{2,1}, w_{2,2}, …, w_{2,n2}), (t_{2,1}, t_{2,2}, …, t_{2,n2})), …} where t_i ∈ M, the goal is to estimate tags for a new word sequence.

Example (IOB tagging): "flights from Boston to New York today"
• Entity tags: O  O  B-city  O  B-city  I-city  O
• Slot tags:   O  O  B-dept  O  B-arrival  I-arrival  B-date

(44)

RNN for Slot Tagging – I (Yao et al., 2013; Mesnil et al., 2015)

Variations:
a. RNNs with LSTM cells
b. Look-ahead input: a sliding window of n-grams (LSTM-LA)
c. Bi-directional LSTMs (bLSTM)

In each case the network reads words w_0 … w_n and emits a tag y_t per position; the bLSTM combines forward states h_t^f and backward states h_t^b (a sketch of variant (c) follows).
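A minimal PyTorch sketch of variant (c), a bi-directional LSTM tagger emitting one tag per word (the sizes and tag inventory are placeholders):

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        # bLSTM slot tagger: one IOB tag per input word, using both
        # forward and backward hidden states at each position
        def __init__(self, vocab_size, num_tags, emb_dim=100, hid_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                                  bidirectional=True)
            self.out = nn.Linear(2 * hid_dim, num_tags)

        def forward(self, token_ids):          # (batch, seq_len)
            states, _ = self.bilstm(self.embed(token_ids))
            return self.out(states)            # (batch, seq_len, num_tags)

    tagger = BiLSTMTagger(vocab_size=5000, num_tags=7)  # e.g., an IOB tag set
    logits = tagger(torch.randint(0, 5000, (1, 7)))     # a 7-word utterance
    print(logits.argmax(dim=-1))                        # one tag id per word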

(45)

RNN for Slot Tagging – II (Kurata et al., 2016; Simonnet et al., 2015)

Encoder-decoder networks
• Leverage sentence-level information: the encoder reads the whole (reversed) sentence, and the decoder emits the tag sequence y_0 … y_n.

Attention-based encoder-decoder
• Use of attention (as in MT) in the encoder-decoder network
• The attention over encoder states is estimated using a feed-forward network with inputs h_t and s_t at time t, yielding a context vector c_i.

(46)

RNN for Slot Tagging – III (Jaech et al., 2016; Tafforeau et al., 2016)

Multi-task learning
• Goal: exploit data from domains/tasks with a lot of data to improve ones with less data
• Lower layers are shared across domains/tasks
• The output layer is specific to each task

(47)

Joint Segmentation and Slot Tagging (Zhai et al., 2017)

• An encoder that segments the input
• A decoder that tags the segments

(48)

Joint Semantic Frame Parsing

Slot filling and intent prediction can be modeled jointly:
• Sequence-based (Hakkani-Tür et al., 2016): slot filling and intent prediction in the same output sequence; e.g., a bLSTM tags "taiwanese food please" as B-type O O and emits the intent FIND_REST at the end-of-sentence position.
• Parallel-based (Liu et al., 2016): intent prediction and slot filling are performed in two branches.

(49)

Contextual LU

LU performs domain identification → intent prediction → slot filling. Single-turn example (domain: communication, intent: send_email):

  "just sent email to bob about fishing this weekend"
  just/O sent/O email/O to/O bob/B-contact_name about/O fishing/B-subject this/I-subject weekend/I-subject
  → send_email(contact_name="bob", subject="fishing this weekend")

With context, successive turns refine the frame:

  U1: "are we going to fish this weekend"
      all words tagged B-message/I-message → send_email(message="are we going to fish this weekend")
  U2: "send email to bob"
      bob tagged B-contact_name → send_email(contact_name="bob")

(50)

Contextual LU

User utterances are highly ambiguous in isolation. In a restaurant-booking dialogue:

  U: Book a table for 10 people tonight.
  S: Which restaurant would you like to book a table for?
  U: Cascal, for 6.

Does "6" fill the #people slot or the time slot? Context is needed to decide.

(51)

Contextual LU (Bhargava et al., 2013; Hori et al., 2015)

Leveraging contexts:
• Used for individual tasks
• Seq2Seq model: words are input one at a time, tags are output at the end of each utterance
• Extension: LSTM with speaker-role-dependent layers

(52)

End-to-End Memory Networks (Sukhbaatar et al., 2015)

History utterances are stored as memory vectors m_0 … m_{n-1}, and the current utterance is encoded as u. Example dialogue:

U: "i d like to purchase tickets to see deepwater horizon"
S: "for which theatre"
U: "angelika"
S: "you want them for angelika theatre?"
U: "yes angelika"
S: "how many tickets would you like ?"
U: "3 tickets for saturday"
S: "What time would you like ?"
U: "Any time on saturday is fine"
S: "okay , there is 4:10 pm , 5:40 pm and 9:20 pm"
U: "Let's do 5:40"

(53)

E2E MemNN for Contextual LU (Chen et al., 2016)

Idea: additionally incorporate contextual knowledge during slot tagging, i.e., track dialogue states in a latent way.

1. Sentence encoding: RNN_mem encodes the history utterances {x_i} into memory representations m_i, and a contextual sentence encoder RNN_in encodes the current utterance c into u.
2. Knowledge attention: the attention distribution p_i is computed from the inner product of u with each m_i.
3. Knowledge encoding: the weighted sum of memories, mapped by W_kg, gives the knowledge encoding o, which conditions the RNN tagger that produces the slot tagging sequence y.
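Steps 2 and 3 reduce to a few lines; a minimal NumPy sketch with made-up dimensions and random vectors in place of the learned encoders:

    import numpy as np

    def softmax(v):
        e = np.exp(v - v.max())
        return e / e.sum()

    memory = np.random.randn(10, 64)   # m_i: encoded history utterances
    u = np.random.randn(64)            # encoded current utterance

    p = softmax(memory @ u)            # knowledge attention from inner products
    o = p @ memory                     # knowledge encoding: weighted sum
    print(p.round(2), o.shape)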

(54)

Analysis of Attention

For the ticket-booking dialogue above, when tagging the final user utterance the knowledge attention concentrates on a few salient history turns (weights of 0.69, 0.16, and 0.13) rather than spreading uniformly over the whole history.

(55)

Sequential Dialogue Encoder Network (Bapna et al., SIGDIAL 2017)

Past and current turn encodings are input to a feed-forward network.

(56)

Structural LU (Chen et al., 2016)

K-SAN: prior knowledge acts as a teacher.

• Knowledge encoding module: substructures {x_i} of the knowledge-guided structure (e.g., the parse of "show me the flights from seattle to san francisco", rooted at ROOT) are encoded as memories m_i.
• Knowledge attention: an attention distribution p_i over the substructures is computed via the inner product with the sentence encoding.
• The weighted sum of the encoded knowledge representations yields a knowledge-guided representation that conditions the RNN tagger producing the slot tagging sequence.

(57)

Structural LU (Chen et al., 2016)

Sentence structural knowledge is stored as memory. For the sentence s = "show me the flights from seattle to san francisco", the substructures can come from:
• Syntax (dependency tree): rooted at "show", with substructures for "me", "the flights", "from seattle", and "to san francisco"
• Semantics (AMR graph): a show concept with arguments you, I, and flight, where flight connects the city nodes Seattle and San Francisco

(58)

Structural LU (Chen et al., 2016)

Sentence structural knowledge is stored as memory. Even when trained with less data, K-SAN pays similar attention to the salient substructures that are important for tagging.

(59)

LU Importance (Li et al., 2017)

Comparing different types of LU errors (sensitivity to intent errors vs. sensitivity to slot errors) shows that slot filling is more important than intent detection in language understanding.

(60)

LU Evaluation

Metrics:
• Sub-sentence-level: intent accuracy, slot F1
• Sentence-level: whole-frame accuracy

(61)


Outline

PART I. Introduction & Background Knowledge

PART II. Task-Oriented Dialogue Systems

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management – Dialogue State Tracking (DST)

Dialogue Management – Dialogue Policy Optimization

Natural Language Generation (NLG)

End-to-End Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(62)

Elements of Dialogue Management (Figure from Gašić)

This part focuses on dialogue state tracking.

(63)

Dialogue State Tracking (DST)

Maintain a probabilistic distribution over states instead of a 1-best prediction, for better robustness to SLU errors or ambiguous input:

  S: How can I help you?
  U: Book a table at Sumiko for 5.   → # people: 5 (0.5); time: 5 (0.5)
  S: How many people?
  U: 3                               → # people: 3 (0.8); time: 5 (0.8)
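A minimal sketch of such a belief state, one distribution per slot, blended with new SLU evidence each turn (the update rule here is a simple heuristic, not a calibrated tracker; all scores are illustrative):

    # belief after "Book a table at Sumiko for 5": "5" is ambiguous
    belief = {
        "#people": {"5": 0.5},
        "time":    {"5": 0.5},
    }

    def update(slot, value, prob):
        # discount the old hypotheses, then add the new evidence
        dist = belief.setdefault(slot, {})
        for v in dist:
            dist[v] *= (1.0 - prob)
        dist[value] = dist.get(value, 0.0) + prob

    update("#people", "3", 0.8)   # user answers "3" to "How many people?"
    print(belief)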

(64)

Multi-Domain Dialogue State Tracking (DST)

A full representation of the system's belief of the user's goal at any point during the dialogue; used for making API calls. Example conversation:

  A: Do you wanna take Angela to go see a movie tonight?
  B: Sure, I will be home by 6.
  A: Let's grab dinner before the movie. How about some Mexican?
  B: Let's go to Vive Sol and see Inferno after that.
  A: Angela wants to watch the Trolls movie.
  B: Ok. Lets catch the 8 pm show.

As the dialogue proceeds, the tracked state spans multiple domains, e.g., Restaurants (Vive Sol, Mexican cuisine, candidate times 6 pm / 6:30 pm / 7 pm, 2-3 people, date 11/15/16) and Movies (Inferno → Trolls, Century 16, candidate times 7:30 pm / 8 pm / 9 pm).

(65)

Dialog State Tracking Challenge (DSTC)
(Williams et al., 2013; Henderson et al., 2014a; Henderson et al., 2014b; Kim et al., 2016a; Kim et al., 2016b)

Challenge  Type           Domain               Data Provider  Main Theme
DSTC1      Human-Machine  Bus Route            CMU            Evaluation Metrics
DSTC2      Human-Machine  Restaurant           U. Cambridge   User Goal Changes
DSTC3      Human-Machine  Tourist Information  U. Cambridge   Domain Adaptation
DSTC4      Human-Human    Tourist Information  I2R            Human Conversation
DSTC5      Human-Human    Tourist Information  I2R            Language Adaptation

(66)

NN-Based DST (Henderson et al., 2013; Mrkšić et al., 2015; Mrkšić et al., 2016)

(Figure from Wen et al., 2016)

(67)

Neural Belief Tracker (Mrkšić et al., 2016)

(68)

DST Evaluation

Dialogue State Tracking Challenges:
• DSTC2-3: human-machine
• DSTC4-5: human-human

Metrics:
• Tracked state accuracy with respect to the user goal
• Recall/precision/F-measure of individual slots

(69)


Outline

PART I. Introduction & Background Knowledge

PART II. Task-Oriented Dialogue Systems

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management – Dialogue State Tracking (DST)

Dialogue Management – Dialogue Policy Optimization

Natural Language Generation (NLG)

End-to-End Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(70)

Elements of Dialogue Management (Figure from Gašić)

This part focuses on dialogue policy optimization.

(71)

Dialogue Policy Optimization

Dialogue management in an RL framework: the user is the environment; the agent comprises language understanding, the dialogue manager, and natural language generation. The agent receives an observation O and a reward R from the user and emits an action A. (Slide credit: Pei-Hao Su)

The optimized dialogue policy selects the best action, i.e., the one maximizing future reward. Correct rewards are a crucial factor in dialogue policy training.

(72)

Reward for RL ≅ Evaluation for the System

Dialogue is a special RL task: a human is involved in the interaction and in rating (evaluating) the dialogue, i.e., a fully human-in-the-loop framework.

Rating criteria: correctness, appropriateness, and adequacy
• Expert rating: high quality, high cost
• User rating: unreliable quality, medium cost
• Objective rating: checks desired aspects, low cost

(73)

Reinforcement Learning for Dialogue Policy Optimization

Loop: user input o → language understanding → state s → dialogue policy a = π(s) → language (response) generation → response; collect rewards (s, a, r, s') and optimize Q(s, a).

• Social chatbots: state = chat history; action = system response; reward = # of turns maximized, intrinsically motivated reward
• InfoBots (interactive Q/A): state = current user question + context; action = answers to the current question; reward = relevance of answer, # of turns minimized
• Task-completion bots: state = current user input + context; action = system dialogue act w/ slot values (or API calls); reward = task success rate, # of turns minimized

Goal: develop a generic deep RL algorithm to learn a dialogue policy for all bot categories.

(74)

Dialogue Reinforcement Learning Signal

Typical reward function (see the sketch below):
• -1 per-turn penalty
• Large reward at completion if successful
• Typically requires domain knowledge

Who provides the reward signal?
✔ Simulated users
✔ Paid users (Amazon Mechanical Turk)
✖ Real users

A user simulator is usually required to train a dialogue system before deployment.
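A minimal sketch of this typical reward function (the success bonus of 20 is an arbitrary illustrative value):

    def dialogue_reward(num_turns, success, turn_penalty=-1, success_bonus=20):
        # -1 per turn, plus a large completion bonus if the task succeeded
        return num_turns * turn_penalty + (success_bonus if success else 0)

    print(dialogue_reward(num_turns=5, success=True))    # 15
    print(dialogue_reward(num_turns=5, success=False))   # -5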

(75)

Neural Dialogue Manager (Li et al., 2017)

Deep Q-network for training the DM policy:
• Input: current semantic frame observation, database-returned results
• Output: system action

Example: given the semantic frame request_movie(genre=action, date=this weekend), the DQN-based dialogue manager queries the backend DB and emits the system action/policy request_location. Training is done against a simulated user.
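A minimal PyTorch sketch of the DQN policy network: a featurized dialogue state in, Q-values over system actions out (the state dimension, hidden size, and action set are placeholders, not the paper's exact configuration):

    import torch
    import torch.nn as nn

    class DQNPolicy(nn.Module):
        # maps a featurized state (semantic frame + DB results) to
        # Q(s, a) for each system action
        def __init__(self, state_dim=30, num_actions=4, hidden=80):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, num_actions))

        def forward(self, state):
            return self.net(state)

    ACTIONS = ["request_location", "request_date", "inform_movie", "close"]
    policy = DQNPolicy(num_actions=len(ACTIONS))
    state = torch.randn(1, 30)    # featurized dialogue state
    print(ACTIONS[policy(state).argmax(dim=-1).item()])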

(76)

SL + RL for Sample Efficiency (Su et al., 2017)

Issues with RL for DM: slow learning speed and cold start.

Solutions:
• Sample-efficient actor-critic
  - Off-policy learning with experience replay
  - Better gradient updates
• Utilizing supervised data
  - Pretrain the model with SL and then fine-tune with RL
  - Mix SL and RL data during RL learning
  - Combine both

(77)

Online Training (Su et al., 2015; Su et al., 2016)

Policy learning from real users:
• Infer the reward directly from dialogues (Su et al., 2015)
• User rating (Su et al., 2016): reward modeling on users' binary success ratings; a dialogue representation is embedded, and a reward model produces the reinforcement signal (querying the user's success/fail rating)

(78)

Interactive RL for DM (Shah et al., 2016)

Use a third agent to provide immediate, interactive feedback to the DM.

(79)

Dialogue Management Evaluation

Metrics:
• Turn-level evaluation: system action accuracy
• Dialogue-level evaluation: task success rate, reward

(80)


Outline

PART I. Introduction & Background Knowledge

PART II. Task-Oriented Dialogue Systems

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management – Dialogue State Tracking (DST)

Dialogue Management – Dialogue Policy Optimization

Natural Language Generation (NLG)

End-to-End Task-Oriented Dialogue Systems

PART III. Social Chat Bots

PART IV. Evaluation

PART V. Recent Trends and Challenges


(81)

Natural Language Generation (NLG)

Mapping a semantic frame into natural language:

  inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"

(82)

Template-Based NLG

Define a set of rules to map frames to NL:

Semantic Frame              Natural Language
confirm()                   "Please tell me more about the product you are looking for."
confirm(area=$V)            "Do you want somewhere in the $V?"
confirm(food=$V)            "Do you want a $V restaurant?"
confirm(food=$V, area=$W)   "Do you want a $V restaurant in the $W?"

Pros: simple, error-free, easy to control. Cons: time-consuming, poor scalability.
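A minimal sketch of this template lookup using the four rules above (the frame encoding and the placeholder substitution scheme are implementation choices, not from the source):

    TEMPLATES = {
        ("confirm", ()):               'Please tell me more about the product you are looking for.',
        ("confirm", ("area",)):        'Do you want somewhere in the $V?',
        ("confirm", ("food",)):        'Do you want a $V restaurant?',
        ("confirm", ("food", "area")): 'Do you want a $V restaurant in the $W?',
    }

    def realize(act, slots):
        # pick the template matching (act, slot names), then fill $V/$W
        text = TEMPLATES[(act, tuple(slots))]
        for placeholder, value in zip(("$V", "$W"), slots.values()):
            text = text.replace(placeholder, value)
        return text

    print(realize("confirm", {"food": "Chinese", "area": "centre"}))
    # -> Do you want a Chinese restaurant in the centre?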

(83)

Plan-Based NLG (Walker et al., 2002)

Divide the problem into a pipeline: sentence plan generator → sentence plan reranker → surface realizer (producing a syntactic tree).
• Statistical sentence plan generator (Stent et al., 2009)
• Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)

Example: inform(name=Z_House, price=cheap) → "Z House is a cheap restaurant."

Pros: can model complex linguistic structures. Cons: heavily engineered, requires domain knowledge.

(84)

Class-Based LM NLG (Oh and Rudnicky, 2000)

Class-based language modeling with classes such as inform_area, inform_address, request_area, request_postcode; NLG is performed by decoding from the class-conditioned LM.

Pros: easy to implement/understand, simple rules. Cons: computationally inefficient.

(85)

RNN-Based LM NLG (Wen et al., 2015)

The generator is an RNN LM conditioned on the dialogue act (a 1-hot representation of inform(name=Din Tai Fung, food=Taiwanese)), with slot weight tying. Slot values are delexicalised:

  Input:  <BOS> SLOT_NAME serves SLOT_FOOD .
  Output: SLOT_NAME serves SLOT_FOOD . <EOS>

After generation, the slots are re-lexicalised: "<BOS> Din Tai Fung serves Taiwanese ."
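A minimal sketch of the delexicalise/relexicalise steps around the generator (the generator itself is omitted; the frame below is the slide's own example):

    frame = {"SLOT_NAME": "Din Tai Fung", "SLOT_FOOD": "Taiwanese"}

    def delexicalise(text, frame):
        # replace slot values with placeholder tokens before training/decoding
        for slot, value in frame.items():
            text = text.replace(value, slot)
        return text

    def relexicalise(text, frame):
        # substitute generated placeholders back with the frame's values
        for slot, value in frame.items():
            text = text.replace(slot, value)
        return text

    delex = delexicalise("Din Tai Fung serves Taiwanese .", frame)
    print(delex)                        # SLOT_NAME serves SLOT_FOOD .
    print(relexicalise(delex, frame))   # Din Tai Fung serves Taiwanese .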

(86)

Handling Semantic Repetition

Issue: semantic repetition, e.g.:
• "Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese."
• "Din Tai Fung is a child friendly restaurant, and also allows kids."

The deficiency lies in the model, the decoding, or both. Mitigation:
• Post-processing rules (Oh & Rudnicky, 2000)
• Gating mechanism (Wen et al., 2015)
• Attention (Mei et al., 2016; Wen et al., 2015)
