Deep Learning for Dialogue Systems

(1)

Deep Learning for Dialogue Systems

deepdialogue.miulab.tw

(2)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


Material: http://deepdialogue.miulab.tw

Break

(3)

Introduction

(4)

Brief History of Dialogue Systems

Early 1990s: keyword spotting (e.g., AT&T)
  System: “Please say collect, calling card, person, third number, or operator”
Late 1990s–early 2000s: intent determination (Nuance’s Emily™, AT&T HMIHY)
  User: “Uh… we want to move… we want to change our phone line from this house to another house”
Early 2000s: task-specific argument extraction (e.g., Nuance, SpeechWorks)
  User: “I want to fly from Boston to New York next week.”
Early 2000s: multi-modal systems (e.g., Microsoft MiPad, Pocket PC); DARPA CALO Project
2010s–2017: TV voice search (e.g., Bing on Xbox); virtual personal assistants: Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)

(5)

Language Empowering Intelligent Assistant

Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016), Apple HomePod (2017)

(6)

Why Do We Need Them?

Get things done
  e.g., set up an alarm/reminder, take notes
Easy access to structured data, services, and apps
  e.g., find documents/photos/restaurants
Assist with your daily schedule and routine
  e.g., commute alerts to/from work
Be more productive in managing your work and personal life

(7)

Why Natural Language?

Global Digital Statistics (January 2015):
  Global population: 7.21B
  Active internet users: 3.01B
  Active social media accounts: 2.08B
  Active unique mobile users: 3.65B

As devices evolve, the more natural and convenient input is speech.

(8)

Spoken Dialogue System (SDS)

Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions.

Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

Fictional examples: JARVIS (Iron Man’s personal assistant), Baymax (a personal healthcare companion).

Good dialogue systems assist users in accessing information conveniently and finishing tasks efficiently.

(9)

App → Bot

A bot is responsible for a “single” domain, similar to an app.

Users can initiate dialogues instead of following the GUI design.

(10)

GUI vs. CUI (Conversational UI)

(Screenshot comparison of a movie app’s GUI and a conversational UI: https://github.com/enginebai/Movie-lol-android)

(11)

GUI vs. CUI (Conversational UI)

                       Website/App GUI                  Messaging CUI
Situation              Navigation, no specific goal     Searching, with a specific goal
Information quantity   More                             Less
Information precision  Low                              High
Display                Structured                       Non-structured
Interface              Graphics                         Language
Manipulation           Mainly clicking                  Mainly text or speech input
Learning               Takes time to learn and adapt    No need to learn
Entrance               App download                     Incorporated in any message-based interface
Flexibility            Low, like operating a machine    High, like conversing with a human

(12)


Challenges

Variability in Natural Language

Robustness

Recall/Precision Trade-off

Meaning Representation

Common Sense, World Knowledge

Ability to Learn

Transparency


(13)

Two Branches of Bots

Task-Oriented Bot
  Personal assistant that helps users achieve a certain task
  Combination of rules and statistical components
    POMDP for spoken dialog systems (Williams and Young, 2007)
    End-to-end trainable task-oriented dialogue system (Wen et al., 2016)
    End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)

Chit-Chat Bot
  No specific goal; focuses on natural responses
  Uses variants of the seq2seq model
    A neural conversation model (Vinyals and Le, 2015)
    Reinforcement learning for dialogue generation (Li et al., 2016)
    Conversational contextual cues for response ranking (Al-Rfou et al., 2016)

(14)

Task-Oriented Dialogue System (Young, 2000)

Speech Signal → Speech Recognition → Text Input / Hypothesis
  e.g., “are there any action movies to see this weekend”

Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
→ Semantic Frame: request_movie(genre=action, date=this weekend)

Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
(queries Backend Action / Knowledge Providers)
→ System Action/Policy: request_location

Natural Language Generation (NLG)
→ Text response: “Where are you located?”

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(15)

Interaction Example

User: find a good eating place for taiwanese food
Intelligent Agent: Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?

(16)

Task-Oriented Dialogue System (Young, 2000)

(Pipeline figure repeated as a roadmap; language understanding is discussed next.)

(17)

1. Domain Classification: Requires Predefined Domain Ontology

User: find a good eating place for taiwanese food

The agent classifies the utterance against organized domain knowledge (databases): Restaurant DB, Taxi DB, Movie DB → Classification!

(18)

2. Intent Detection: Requires Predefined Schema

User: find a good eating place for taiwanese food

Within the Restaurant DB domain, the agent classifies the intent: FIND_RESTAURANT, FIND_PRICE, FIND_TYPE, … → Classification!

(19)

3. Slot Filling: Requires Predefined Schema

User: find a good eating place for taiwanese food

Restaurant DB:
  Restaurant   Rating   Type
  Rest 1       good     Taiwanese
  Rest 2       bad      Thai
  …            …        …

Semantic frame: FIND_RESTAURANT(rating=“good”, type=“taiwanese”)
→ SELECT restaurant { rest.rating=“good”, rest.type=“taiwanese” }

Slot filling as sequence labeling:
  find a good     eating place for taiwanese food
  O    O B-rating O      O     O   B-type    O

(20)

Task-Oriented Dialogue System (Young, 2000)

(Pipeline figure repeated; dialogue management is discussed next.)

(21)

Requires Hand-Crafted States

User: find a good eating place for taiwanese food
User: i want it near to my office

Hand-crafted state lattice:
NULL → {location, rating, type} → {loc+rating, rating+type, loc+type} → all

(22)

State Tracking: Requires Hand-Crafted States

(Same state lattice as the previous slide; the tracked state advances through the lattice as each utterance arrives.)

(23)

Handling Errors and Confidence

User: find a good eating place for taixxxx food   (misrecognized word)

Competing hypotheses, each with some confidence:
  FIND_RESTAURANT rating=“good”, type=“taiwanese” ?
  FIND_RESTAURANT rating=“good”, type=“thai” ?
  FIND_RESTAURANT rating=“good” ?

Within the hand-crafted state lattice, the tracker must weigh states such as rating=“good”, type=“thai” against rating=“good”, type=“taiwanese”.

(24)

Dialogue Policy for Agent Action

Inform(location=“Taipei 101”) → “The nearest one is at Taipei 101”
Request(location) → “Where is your home?”
Confirm(type=“taiwanese”) → “Did you want Taiwanese food?”

(25)

Task-Oriented Dialogue System (Young, 2000)

(Pipeline figure repeated; natural language generation is discussed next.)

(26)

Output / Natural Language Generation

Goal: generate natural language or a GUI display given the selected dialogue action for interactions

Inform(location=“Taipei 101”) → “The nearest one is at Taipei 101” (vs. a GUI rendering)
Request(location) → “Where is your home?” (vs. a GUI rendering)
Confirm(type=“taiwanese”) → “Did you want Taiwanese food?” (vs. a GUI rendering)

(27)

Neural Network Basics

Reinforcement Learning

(28)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


(29)

Machine Learning ≈ Looking for a Function

Speech Recognition: f(audio) = “你好 (Hello)”
Image Recognition: f(image) = “cat”
Go Playing: f(board) = 5-5 (next move)
Chat Bot: f(“Where is Westin?”) = “The address is…”

Given a large amount of data, the machine learns what the function f should be.

(30)

Machine Learning

Machine learning comprises supervised learning, unsupervised learning, and reinforcement learning.

Deep learning is a type of machine learning approach based on “neural networks”.

(31)

A Single Neuron

z = w_1 x_1 + w_2 x_2 + ⋯ + w_N x_N + b
y = σ(z) = 1 / (1 + e^{−z})   (sigmoid activation function)

w and b are the parameters of this neuron (weights and bias).
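As a minimal runnable sketch of this neuron (numpy; the weights, bias, and input below are arbitrary toy values, not anything from the slides):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """A single neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + b          # z = w1*x1 + ... + wN*xN + b
    return sigmoid(z)             # y = sigma(z)

# Toy example: 3 inputs, hand-picked parameters
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.4, -0.3, 0.1])
b = 0.2
print(neuron(x, w, b))            # a value in (0, 1)
```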

(32)

A Single Neuron

A single neuron implements f : R^N → R.

Example (detector for the digit “2”):
  y ≥ 0.5 → input is “2”
  y < 0.5 → input is not “2”

A single neuron can only handle binary classification.

(33)

A Layer of Neurons

Handwriting digit classification: f : R^N → R^M

A layer of neurons can handle multiple possible outputs; the predicted class is the one whose neuron outputs the maximum value.
Each output neuron answers one question: “1” or not, “2” or not, “3” or not, …
10 neurons ↔ 10 classes: which one is max?

(34)

Deep Neural Networks (DNN)

Fully connected feedforward network f : R^N → R^M:
input vector x → Layer 1 → Layer 2 → … → Layer L → output vector y

Deep NN: multiple hidden layers.
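A minimal numpy sketch of this forward pass; the layer sizes and random parameters are chosen purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    """Apply each layer in turn: a_{l+1} = sigma(W_l a_l + b_l)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# f : R^N -> R^M with two hidden layers (sizes are arbitrary here)
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]              # N=4 inputs, M=3 outputs
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

y = feedforward(rng.normal(size=4), weights, biases)
print(y.shape)                    # (3,)
```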

(35)

Recurrent Neural Network (RNN)

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

At each time step, the hidden state is updated from the current input and the previous state (activation: tanh or ReLU).

RNN can learn accumulated sequential information (time-series)
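A toy recurrent step in numpy, following the standard Elman-style update h_t = tanh(U x_t + W h_{t-1}); all parameters here are random placeholders:

```python
import numpy as np

def rnn_forward(xs, U, W):
    """Run a vanilla RNN over a sequence, accumulating a hidden state."""
    h = np.zeros(W.shape[0])
    states = []
    for x in xs:                      # one time step per input vector
        h = np.tanh(U @ x + W @ h)    # new state mixes input and history
        states.append(h)
    return states

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(16, 8))   # input-to-hidden
W = rng.normal(scale=0.1, size=(16, 16))  # hidden-to-hidden (recurrent)
sequence = [rng.normal(size=8) for _ in range(5)]
print(len(rnn_forward(sequence, U, W)))   # 5 hidden states
```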

(36)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


(37)


Reinforcement Learning

RL is a general-purpose framework for decision making

RL is for an agent with the capacity to act

Each action influences the agent’s future state

Success is measured by a scalar reward signal

Goal: select actions to maximize future reward

(38)

Scenario of Reinforcement Learning

The agent learns to take actions that maximize expected reward.

At each step the agent receives an observation o_t from the environment, takes an action a_t (e.g., the next move in a game), and receives a reward r_t (e.g., if win, reward = 1; if loss, reward = -1; otherwise, reward = 0).

(39)

Supervised vs. Reinforcement

Supervised: learning from a teacher
  “Hello” → say “Hi”; “Bye bye” → say “Good bye”

Reinforcement: learning from critics
  The agent converses over many turns and only receives an overall judgment (e.g., “Bad”).

(40)


Sequential Decision Making

Goal: select actions to maximize total future reward

Actions may have long-term consequences

Reward may be delayed

It may be better to sacrifice immediate reward to gain more long-term reward


(41)

Deep Reinforcement Learning

Environment ⇄ Agent: observation → function → action, with the reward used to pick the best function.

In deep RL, that function (input: observation, output: action) is a DNN.

(42)

Reinforcement Learning

Start from state s_0
Choose action a_0
Transit to s_1 ∼ P(s_0, a_0)
Continue…

Total reward: R = Σ_t γ^t r_t (the discounted sum of per-step rewards)

Goal: select actions that maximize the expected total reward
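The return formula above, evaluated along one sampled trajectory as a runnable sketch (the per-step rewards here are an invented stand-in for a real environment):

```python
import random

def total_reward(rewards, gamma=1.0):
    """Return R = sum_t gamma^t * r_t; gamma=1 gives the plain sum."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Roll out one toy episode: random rewards per step
random.seed(0)
rewards = [random.choice([-1, 0, 1]) for _ in range(10)]
print(total_reward(rewards, gamma=0.9))
```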


(43)

Reinforcement Learning Approaches

Policy-based RL
  Search directly for the optimal policy π*, the policy achieving maximum future reward

Value-based RL
  Estimate the optimal value function Q*(s, a), the maximum value achievable under any policy

Model-based RL
  Build a model of the environment
  Plan (e.g., by lookahead) using the model

(44)

Modular Dialogue System


(45)

Task-Oriented Dialogue System (Young, 2000)

(Pipeline figure repeated; each module is covered in turn in this section.)

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(46)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


(47)

Language Understanding (LU)

A pipelined approach:
1. Domain Classification
2. Intent Classification
3. Slot Filling

(48)

LU – Domain/Intent Classification

Mainly viewed as an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), …, (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.

Example: “find me a cheap taiwanese restaurant in oakland”
  Domain candidates: Movies, Restaurants, Sports, Weather, Music
  Intent candidates: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics
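To make the classification framing concrete, a deliberately simple bag-of-words scorer standing in for the DNN classifiers covered next; the training utterances and labels are invented:

```python
from collections import Counter, defaultdict

# Toy training set D = {(u_i, c_i)}: utterances with domain labels
D = [
    ("find me a cheap taiwanese restaurant in oakland", "Restaurants"),
    ("book a table for two tonight", "Restaurants"),
    ("show me action movies this weekend", "Movies"),
    ("buy tickets for the 8 pm show", "Movies"),
]

# "Train": count how often each word occurs under each label
word_counts = defaultdict(Counter)
for utterance, label in D:
    word_counts[label].update(utterance.split())

def classify(utterance):
    """Score each label by word overlap with its training utterances."""
    scores = {label: sum(counts[w] for w in utterance.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(classify("any good restaurant nearby"))   # -> Restaurants
```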


(49)

DNN for Domain/Intent Classification – I (Sarikaya et al., 2011)

Deep belief nets (DBN)
  Unsupervised training of weights
  Fine-tuning by back-propagation
  Compared to MaxEnt, SVM, and boosting

http://ieeexplore.ieee.org/abstract/document/5947649/

(50)

DNN for Domain/Intent Classification – II (Tur et al., 2012; Deng et al., 2012)

Deep convex networks (DCN)
  Simple classifiers are stacked to learn complex functions
  Feature selection of salient n-grams
  Extension to kernel-DCN

http://ieeexplore.ieee.org/abstract/document/6289054/; http://ieeexplore.ieee.org/abstract/document/6424224/

(51)

DNN for Domain/Intent Classification – III (Ravuri & Stolcke, 2015)

RNNs and LSTMs for utterance classification
Making the intent decision after reading all words performs better

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf

(52)

DNN for Dialogue Act Classification – IV (Lee & Dernoncourt, 2016)

RNNs and CNNs for dialogue act classification

(53)

LU – Slot Filling

Slot filling as a sequence tagging task: given a collection of tagged word sequences, S = {((w_{1,1}, w_{1,2}, …, w_{1,n1}), (t_{1,1}, t_{1,2}, …, t_{1,n1})), ((w_{2,1}, w_{2,2}, …, w_{2,n2}), (t_{2,1}, t_{2,2}, …, t_{2,n2})), …} where t_{i,j} ∈ M, the goal is to estimate tags for a new word sequence.

Example: “flights from Boston to New York today”
  Entity tags: O O B-city O B-city    I-city    O
  Slot tags:   O O B-dept O B-arrival I-arrival B-date
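How the IOB tags line up with words, as a runnable sketch; a trivial lexicon lookup stands in for the learned sequence tagger, and the lexicon itself is invented for illustration:

```python
# IOB slot tagging for: "flights from Boston to New York today"
# Gold slot tags:        O      O     B-dept O  B-arrival I-arrival B-date
words = "flights from Boston to New York today".split()

# Invented lexicon in place of a trained sequence model
lexicon = {("Boston",): "dept", ("New", "York"): "arrival", ("today",): "date"}

tags = ["O"] * len(words)
for span, slot in lexicon.items():
    for i in range(len(words) - len(span) + 1):
        if tuple(words[i:i + len(span)]) == span:
            tags[i] = "B-" + slot                      # begin the slot
            for j in range(i + 1, i + len(span)):
                tags[j] = "I-" + slot                  # inside the slot

print(list(zip(words, tags)))
```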

(54)

Recurrent Neural Nets for Slot Tagging – I (Yao et al., 2013; Mesnil et al., 2015)

Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams (LSTM-LA)
c. Bi-directional LSTMs (bLSTM)

(Figure: (a) LSTM, (b) LSTM-LA, (c) bLSTM tagging architectures; words w_0 … w_n in, tags y_0 … y_n out.)

http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380

(55)

Recurrent Neural Nets for Slot Tagging – II (Kurata et al., 2016; Simonnet et al., 2015)

Encoder-decoder networks
  Leverage sentence-level information

Attention-based encoder-decoder
  Use of attention (as in MT) in the encoder-decoder network
  Attention is estimated using a feed-forward network with inputs h_t and s_t at time t

(Figure: encoder-decoder and attention-based tagging architectures.)

http://www.aclweb.org/anthology/D16-1223

(56)

Recurrent Neural Nets for Slot Tagging – III (Jaech et al., 2016; Tafforeau et al., 2016)

Multi-task learning
  Goal: exploit data from domains/tasks with abundant data to improve those with less data
  Lower layers are shared across domains/tasks
  Output layer is specific to each task

https://arxiv.org/abs/1604.00117; http://www.sensei-conversation.eu/wp-content/uploads/2016/11/favre_is2016b.pdf

(57)

Joint Segmentation and Slot Tagging (Zhai et al., 2017)

An encoder that segments and a decoder that tags the segments.

https://arxiv.org/pdf/1701.04027.pdf

(58)

Joint Semantic Frame Parsing

Sequence-based (Hakkani-Tür et al., 2016)
• Slot filling and intent prediction in the same output sequence
  (Figure: an RNN tags “taiwanese food please” as B-type O O, then emits the intent FIND_REST at EOS.)

Parallel (Liu and Lane, 2016)
• Intent prediction and slot filling are performed in two branches

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454

(59)

Contextual LU

Domain Identification → Intent Prediction → Slot Filling

U: “just sent email to bob about fishing this weekend”
  Tags: O O O O B-contact_name O B-subject I-subject I-subject
  Domain: communication; Intent: send_email
  → send_email(contact_name=“bob”, subject=“fishing this weekend”)

U1: “are we going to fish this weekend”
  Tags: B-message I-message I-message I-message I-message I-message I-message
  → send_email(message=“are we going to fish this weekend”)

U2: “send email to bob”
  Tags: O O O B-contact_name
  → send_email(contact_name=“bob”)

(60)

Contextual LU

User utterances are highly ambiguous in isolation.

Restaurant booking example:
  U: Book a table for 10 people tonight.
  S: Which restaurant would you like to book a table for?
  U: Cascal, for 6.   ← does “6” mean #people or time?

(61)

Contextual LU (Bhargava et al., 2013; Hori et al., 2015)

Leveraging contexts
  Used for individual tasks

Seq2Seq model
  Words are input one at a time; tags are output at the end of each utterance
  Extension: LSTM with speaker-role-dependent layers

https://www.merl.com/publications/docs/TR2015-134.pdf

(62)

End-to-End Memory Networks (Sukhbaatar et al., 2015)

U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”

(Figure: history utterances stored as memories m_0 … m_{n-1}, queried by the current utterance u.)

(63)

E2E MemNN for Contextual LU (Chen et al., 2016)

Idea: additionally incorporate contextual knowledge during slot tagging → track dialogue states in a latent way.

1. Sentence encoding: a sentence encoder (RNN_in) encodes the current utterance c into u, and a contextual sentence encoder (RNN_mem) encodes the history utterances {x_i} into memory representations m_i.
2. Knowledge attention: the inner product of u with each m_i gives the attention distribution p_i.
3. Knowledge encoding: the attention-weighted sum of the memories (through W_kg) yields the knowledge encoding o, which conditions the RNN tagger that outputs the slot tagging sequence y.

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf

(64)

Analysis of Attention

(The conversation from the previous slide, annotated with attention weights: a few salient history turns receive most of the attention mass, e.g., 0.69, 0.16, and 0.13.)

(65)

Sequential Dialogue Encoder Network (Bapna et al., 2017)

Past and current turn encodings are input to a feed-forward network.

Bapna et al., SIGDIAL 2017

(66)

Structural LU (Chen et al., 2016)

K-SAN: prior knowledge as a teacher.

Knowledge-guided architecture: substructures of the input sentence (e.g., from the dependency parse of “show me the flights from seattle to san francisco”) are encoded as memories m_i; knowledge attention p_i over them yields a knowledge-guided representation that conditions the RNN tagger for the slot tagging sequence.

http://arxiv.org/abs/1609.03286

(67)

Structural LU (Chen et al., 2016)

Sentence structural knowledge stored as memory, for the sentence s = “show me the flights from seattle to san francisco”:
  Syntax (dependency tree): ROOT → show → {me, the flights, from seattle, to san francisco}
  Semantics (AMR graph): show(you, flight); city: Seattle, San Francisco

http://arxiv.org/abs/1609.03286

(68)

Structural LU (Chen et al., 2016)

Sentence structural knowledge stored as memory.

Even with less training data, K-SAN pays similar attention to the salient substructures that are important for tagging.

http://arxiv.org/abs/1609.03286

(69)

LU Importance (Li et al., 2017)

Comparing different types of LU errors (sensitivity to intent errors vs. sensitivity to slot errors): the downstream system is more sensitive to slot errors, i.e., slot filling is more important than intent detection in language understanding.

http://arxiv.org/abs/1703.07055

(70)


LU Evaluation

Metrics

Sub-sentence-level: intent accuracy, slot F1

Sentence-level: whole frame accuracy
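The two levels of metrics, computed on a toy prediction; for brevity this sketch scores slot F1 over individual slot-value pairs, whereas real evaluations usually score whole slot chunks:

```python
gold = {"intent": "FIND_RESTAURANT",
        "slots": {"rating": "good", "type": "taiwanese"}}
pred = {"intent": "FIND_RESTAURANT",
        "slots": {"rating": "good", "type": "thai"}}

intent_ok = gold["intent"] == pred["intent"]             # intent accuracy
tp = sum(pred["slots"].get(s) == v for s, v in gold["slots"].items())
precision = tp / len(pred["slots"])
recall = tp / len(gold["slots"])
f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
frame_ok = intent_ok and pred["slots"] == gold["slots"]  # whole-frame accuracy

print(intent_ok, round(f1, 2), frame_ok)                 # True 0.5 False
```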


(71)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


(72)

Elements of Dialogue Management

(Figure from Gašić)

Dialogue State Tracking

(73)

Dialogue State Tracking (DST)

Maintain a probabilistic distribution over dialogue states instead of a 1-best prediction for better robustness.

(Figure: a 1-best tracker is incorrect for both competing hypotheses.)

(74)

Dialogue State Tracking (DST)

Maintain a probabilistic distribution over dialogue states instead of a 1-best prediction, for better robustness to SLU errors or ambiguous input.

S: How can I help you?
U: Book a table at Sumiko for 5
   Belief: # people = 5 (0.5) | time = 5 (0.5)
S: How many people?
U: 3
   Belief: # people = 3 (0.8) | time = 5 (0.8)
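A sketch of the idea: keep a distribution over slot values and update it with each turn's (noisy) SLU evidence. The convex-mix update below is an illustrative simplification, not any specific published tracker, and the scores are made up:

```python
def update_belief(belief, evidence, alpha=0.8):
    """Mix the old distribution with the new turn's SLU scores."""
    values = set(belief) | set(evidence)
    new = {v: (1 - alpha) * belief.get(v, 0.0) + alpha * evidence.get(v, 0.0)
           for v in values}
    z = sum(new.values()) or 1.0
    return {v: p / z for v, p in new.items()}          # renormalize

# Turn 1: "Book a table at Sumiko for 5" -- ambiguous ("5" = people or time?)
people = {"5": 0.5}
# Turn 2: user answers "3" to "How many people?"
people = update_belief(people, {"3": 0.9, "5": 0.1})
print(people)   # probability mass shifts toward 3 people
```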


(75)

Multi-Domain Dialogue State Tracking (DST)

A full representation of the system's belief of the user's goal at any point during the dialogue; used for making API calls.

Example conversation:
  “Do you wanna take Angela to go see a movie tonight?”
  “Sure, I will be home by 6.”
  “Let's grab dinner before the movie.”
  “How about some Mexican?”
  “Let's go to Vive Sol and see Inferno after that.”
  “Angela wants to watch the Trolls movie.”
  “Ok. Lets catch the 8 pm show.”

Tracked multi-domain state (figure):
  Restaurants: Vive Sol, Mexican cuisine, time candidates 6 pm / 6:30 pm / 7 pm, 2–3 people, date 11/15/16
  Movies: Century 16, Inferno → Trolls, time candidates 7:30 pm / 8 pm / 9 pm, date 11/15/16

(76)

Dialog State Tracking Challenge (DSTC)
(Williams et al., 2013; Henderson et al., 2014a; Henderson et al., 2014b; Kim et al., 2016a; Kim et al., 2016b)

Challenge   Type            Domain               Data Provider   Main Theme
DSTC1       Human-Machine   Bus Route            CMU             Evaluation Metrics
DSTC2       Human-Machine   Restaurant           U. Cambridge    User Goal Changes
DSTC3       Human-Machine   Tourist Information  U. Cambridge    Domain Adaptation
DSTC4       Human-Human     Tourist Information  I2R             Human Conversation
DSTC5       Human-Human     Tourist Information  I2R             Language Adaptation

(77)

Neural Dialogue State Tracking (Henderson et al., 2013; Mrkšić et al., 2015; Mrkšić et al., 2016)

(Figure from Wen et al., 2016)

http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777

(78)

Neural Belief Tracker (Mrkšić et al., 2016)

https://arxiv.org/abs/1606.03777

(79)

Multichannel Tracker (Shi et al., 2016)

Training a multichannel CNN for each slot:
  Chinese character CNN
  Chinese word CNN
  English word CNN

https://arxiv.org/abs/1701.06247

(80)


DST Evaluation

Dialogue State Tracking Challenges
  DSTC2-3: human-machine
  DSTC4-5: human-human

Metrics
  Tracked state accuracy with respect to the user goal
  Recall/precision/F-measure for individual slots


(81)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


(82)

Elements of Dialogue Management

(Figure from Gašić)

Dialogue Policy Optimization

(83)

Dialogue Policy Optimization

Dialogue management in an RL framework: the user is the environment; the agent (the dialogue manager, with language understanding on the input side and natural language generation on the output side) receives observation O and reward R and emits action A. (Slides credited to Pei-Hao Su)

An optimized dialogue policy selects the best action, the one that maximizes the future reward. Correct rewards are a crucial factor in dialogue policy training.

(84)

Reward for RL ≅ Evaluation for the System

Dialogue is a special RL task: humans participate in the interaction and in rating (evaluating) the dialogue, enabling a fully human-in-the-loop framework.

Rating dimensions: correctness, appropriateness, and adequacy
  Expert rating: high quality, high cost
  User rating: unreliable quality, medium cost
  Objective rating: checks desired aspects, low cost

(85)

Reinforcement Learning for Dialogue Policy Optimization

Pipeline: user input (o) → language understanding → state s → dialogue policy a = π(s) → language (response) generation → response. Training collects experiences (s, a, r, s’) and optimizes Q(s, a).

Type of Bots                 State                            Action                                               Reward
Social ChatBots              Chat history                     System response                                      # of turns maximized; intrinsically motivated reward
InfoBots (interactive Q/A)   Current user question + context  Answers to current question                          Relevance of answer; # of turns minimized
Task-Completion Bots         Current user input + context     System dialogue act w/ slot values (or API calls)    Task success rate; # of turns minimized

Goal: develop a generic deep RL algorithm to learn a dialogue policy for all bot categories.

(86)

Dialogue Reinforcement Learning Signal

Typical reward function:
  -1 per-turn penalty
  Large reward at completion if successful
  Typically requires domain knowledge

✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users

The user simulator is usually required for dialogue system training before deployment
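The typical reward function described above, as a one-liner; the +20 success bonus is an arbitrary illustrative magnitude:

```python
def dialogue_reward(num_turns, success, success_bonus=20):
    """Per-turn penalty of -1, plus a large bonus on successful completion."""
    return -num_turns + (success_bonus if success else 0)

print(dialogue_reward(num_turns=6, success=True))    # 14
print(dialogue_reward(num_turns=6, success=False))   # -6
```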


(87)

Neural Dialogue Manager (Li et al., 2017)

Deep Q-network for training the DM policy
  Input: current semantic frame observation, database returned results
  Output: system action

Example: semantic frame request_movie(genre=action, date=this weekend) → DQN-based dialogue management (DM), interacting with a simulated user and a backend DB → system action/policy: request_location

https://arxiv.org/abs/1703.01008
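A compressed sketch of the core of such a training loop: a Q-function over (dialogue state, system action) updated from simulated-user experience. Tabular Q-learning stands in for the deep Q-network, and the states, actions, and rewards are invented:

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, action)], stand-in for a DQN
actions = ["request_location", "inform_movie", "confirm_date"]
alpha, gamma, eps = 0.1, 0.9, 0.2

def choose_action(state):
    """Epsilon-greedy over Q-values, as a DQN policy would do."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One simulated-user turn (all values invented)
random.seed(0)
q_update("frame:request_movie", "request_location", r=-1,
         s_next="frame:request_movie+location")
print(Q[("frame:request_movie", "request_location")])
print(choose_action("frame:request_movie"))
```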

(88)

SL + RL for Sample Efficiency (Su et al., 2017)

Issues with RL for DM: slow learning speed and cold start.

Solutions
  Sample-efficient actor-critic
    Off-policy learning with experience replay
    Better gradient update
  Utilizing supervised data
    Pretrain the model with SL and then fine-tune with RL
    Mix SL and RL data during RL learning
    Combine both

https://arxiv.org/pdf/1707.00130.pdf (Su et al., SIGDIAL 2017)

(89)

Online Training (Su et al., 2015; Su et al., 2016)

Policy learning from real users
  Infer the reward directly from dialogues (Su et al., 2015)
  User rating (Su et al., 2016)

Reward modeling on user binary success ratings: the dialogue representation passes through an embedding function to a reward model that outputs success/fail, providing the reinforcement signal (optionally querying the user for a rating).

http://www.anthology.aclweb.org/W/W15/W15-46.pdf; https://www.aclweb.org/anthology/P/P16/P16-1230.pdf

(90)

Interactive RL for DM (Shah et al., 2016)

Use a third agent to provide interactive (immediate) feedback to the DM.

https://research.google.com/pubs/pub45734.html

(91)

Interpreting Interactive Feedback (Shah et al., 2016)

https://research.google.com/pubs/pub45734.html

(92)

Dialogue Management Evaluation

Metrics
  Turn-level evaluation: system action accuracy
  Dialogue-level evaluation: task success rate, reward

(93)


Outline

Introduction

Background Knowledge

Neural Network Basics

Reinforcement Learning

Modular Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Dialogue Management

Dialogue State Tracking (DST)

Dialogue Policy Optimization

Natural Language Generation (NLG)

Evaluation

Recent Trends and Challenges

End-to-End Neural Dialogue System

Multimodality

Dialogue Breadth

Dialogue Depth


(94)

Natural Language Generation (NLG)

Mapping a semantic frame into natural language:
  inform(name=Seven_Days, foodtype=Chinese) → “Seven Days is a nice Chinese restaurant”

(95)

Template-Based NLG

Define a set of rules to map frames to NL.

Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability

Semantic Frame               Natural Language
confirm()                    “Please tell me more about the product you are looking for.”
confirm(area=$V)             “Do you want somewhere in the $V?”
confirm(food=$V)             “Do you want a $V restaurant?”
confirm(food=$V, area=$W)    “Do you want a $V restaurant in the $W?”
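The rule table above, directly as code; this hypothetical minimal renderer just keys templates on the dialogue act and slot names:

```python
# Map (dialogue act, sorted slot names) to a natural-language template
templates = {
    ("confirm", ()): "Please tell me more about the product you are looking for.",
    ("confirm", ("area",)): "Do you want somewhere in the {area}?",
    ("confirm", ("food",)): "Do you want a {food} restaurant?",
    ("confirm", ("area", "food")): "Do you want a {food} restaurant in the {area}?",
}

def generate(act, **slots):
    """Pick the template matching the act and slot names, then fill values."""
    key = (act, tuple(sorted(slots)))
    return templates[key].format(**slots)

print(generate("confirm", food="Thai", area="centre"))
# -> Do you want a Thai restaurant in the centre?
```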

(96)

Plan-Based NLG (Walker et al., 2002)

Divide the problem into a pipeline: Sentence Plan Generator → Sentence Plan Reranker → Surface Realizer (producing a syntactic tree).
  Statistical sentence plan generator (Stent et al., 2009)
  Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)

Example: Inform(name=Z_House, price=cheap) → “Z House is a cheap restaurant.”

Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge

(97)

Class-Based LM NLG (Oh and Rudnicky, 2000)

Class-based language modeling; NLG by decoding.
Classes include, e.g., inform_area, inform_address, request_area, request_postcode.

Pros: easy to implement/understand, simple rules
Cons: computationally inefficient

http://dl.acm.org/citation.cfm?id=1117568

(98)

Phrase-Based NLG (Mairesse et al., 2010)

A semantic DBN and a phrase DBN map a dialogue act to realization phrases aligned with semantic stacks:
  Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)
  → “Charlie Chan is a Chinese restaurant near Cineworld in the centre”

Pros: efficient, good performance
Cons: requires semantic alignments

http://dl.acm.org/citation.cfm?id=1858838

(99)

RNN-Based LM NLG (Wen et al., 2015)

Input: the dialogue act, e.g., Inform(name=Din Tai Fung, food=Taiwanese), as a 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…

Delexicalisation: “<BOS> Din Tai Fung serves Taiwanese .” → “<BOS> SLOT_NAME serves SLOT_FOOD .”

The RNN LM, conditioned on the dialogue act (with slot weight tying), generates the delexicalised output “SLOT_NAME serves SLOT_FOOD . <EOS>”, which is then re-lexicalised.

http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
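The delexicalisation/re-lexicalisation step itself is simple enough to show; the RNN generation is omitted here, and the "generated" token sequence below merely pretends to be the model's output:

```python
# Dialogue act to realize
act = {"intent": "inform", "name": "Din Tai Fung", "food": "Taiwanese"}

# Pretend the RNN LM (conditioned on the act's 1-hot encoding) emitted this
# delexicalised token sequence:
generated = ["SLOT_NAME", "serves", "SLOT_FOOD", "."]

def relexicalise(tokens, act):
    """Substitute slot values back into the placeholder tokens."""
    mapping = {"SLOT_NAME": act["name"], "SLOT_FOOD": act["food"]}
    return " ".join(mapping.get(t, t) for t in tokens)

print(relexicalise(generated, act))   # Din Tai Fung serves Taiwanese .
```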

(100)

Handling Semantic Repetition

Issue: semantic repetition
  “Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.”
  “Din Tai Fung is a child friendly restaurant, and also allows kids.”

This reflects a deficiency in either the model or the decoding (or both).

Mitigation
  Post-processing rules (Oh & Rudnicky, 2000)
  Gating mechanism (Wen et al., 2015)
  Attention (Mei et al., 2016; Wen et al., 2015)
