Deep Learning for Dialogue Systems

Academic year: 2022

(1)

Yun-Nung (Vivian) Chen 陳縕儂

http://vivianchen.idv.tw

(2)

NTUMIULAB

• Introduction

• Background Knowledge

• Modular Dialogue System

• System Evaluation

• Recent Trends of Learning Dialogues

Outline

2

(4)

NTUMIULAB

Early 1990s

Early 2000s

2017 Multi-modal systems

e.g., Microsoft MiPad, Pocket PC

Keyword Spotting (e.g., AT&T)

System: “Please say collect, calling card, person, third number, or operator”

TV Voice Search e.g., Bing on Xbox

Intent Determination (Nuance’s Emily™, AT&T HMIHY)

User: “Uh…we want to move…we want to change our phone line from this house to another house”

Task-specific argument extraction (e.g., Nuance, SpeechWorks)

User: “I want to fly from Boston to New York next week.”

Brief History of Dialogue Systems

Apple Siri (2011)

Google Now (2012)

Facebook M & Bot (2015)

Google Home (2016)

Microsoft Cortana (2014)

Amazon Alexa/Echo (2014)

Google Assistant (2016)

DARPA CALO Project

Virtual Personal Assistants

(5)

NTUMIULAB

Language Empowering Intelligent Assistant

Apple Siri (2011) Google Now (2012)

Facebook M & Bot (2015) Google Home (2016)

Microsoft Cortana (2014)

Amazon Alexa/Echo (2014)

Google Assistant (2016)

Apple HomePod (2017)

(6)

NTUMIULAB

Why Do We Need Them?

• Get things done

• E.g. set up alarm/reminder, take note

• Easy access to structured data, services and apps

• E.g. find docs/photos/restaurants

• Assist your daily schedule and routine

• E.g. commute alerts to/from work

• Be more productive in managing your work and personal life

6

(7)

NTUMIULAB

Why Natural Language?

• Global Digital Statistics (2018 January)

Total Population: 7.59B

Internet Users: 4.02B

Unique Mobile Users: 5.14B

Active Social Media Users: 3.20B

Active Mobile Social Users: 2.96B

Device input is evolving toward speech, which is more natural and convenient.

13% 4%

7% 14%

(8)

NTUMIULAB

Spoken Dialogue System (SDS)

• Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.

• Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

JARVIS – Iron Man’s Personal Assistant

Baymax – Personal Healthcare Companion

Good dialogue systems help users access information conveniently and finish tasks efficiently.

(9)

NTUMIULAB

App → Bot

• A bot is responsible for a “single” domain, similar to an app

9

Users can initiate dialogues instead of following the GUI design

(10)

NTUMIULAB

GUI v.s. CUI (Conversational UI)

10

https://github.com/enginebai/Movie-lol-android

(11)

NTUMIULAB

GUI v.s. CUI (Conversational UI)

Aspect | Website/APP’s GUI | Msg’s CUI
Situation | Navigation, no specific goal | Searching, with specific goal
Information Quantity | More | Less
Information Precision | Low | High
Display | Structured | Non-structured
Interface | Graphics | Language
Manipulation | Click | Mainly text or speech as input
Learning | Need time to learn and adapt | No need to learn
Entrance | App download | Incorporated in any msg-based interface
Flexibility | Low, like machine manipulation | High, like conversing with a human

11

(12)

NTUMIULAB

Conversational Agents

Chit-Chat

Task-Oriented

(13)

NTUMIULAB

Challenges

• Variability in Natural Language

• Robustness

• Recall/Precision Trade-off

• Meaning Representation

• Common Sense, World Knowledge

• Ability to Learn

• Transparency

13

(14)

NTUMIULAB

Task-Oriented Dialogue System (Young, 2000)

14

Speech Signal → Speech Recognition → Hypothesis: are there any action movies to see this weekend

Text Input: Are there any action movies to see this weekend?

→ Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

→ Semantic Frame: request_movie (genre=action, date=this weekend)

→ Dialogue Management (DM), connected to Backend Action / Knowledge Providers

• Dialogue State Tracking (DST)

• Dialogue Policy

→ System Action/Policy: request_location

→ Natural Language Generation (NLG) → Text response: Where are you located?

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(15)

NTUMIULAB

Interaction Example

15

User: find a good eating place for taiwanese food

Intelligent Agent: Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?

(17)

NTUMIULAB

Requires Predefined Domain Ontology

17

find a good eating place for taiwanese food

User

Organized Domain Knowledge (Database)

Intelligent Agent

Restaurant DB Taxi DB Movie DB

Classification!

(18)

NTUMIULAB

Requires Predefined Schema

18

find a good eating place for taiwanese food

User

Intelligent Agent

Restaurant DB

FIND_RESTAURANT FIND_PRICE

FIND_TYPE :

Classification!

(19)

NTUMIULAB

Requires Predefined Schema

find a good eating place for taiwanese food

User

Intelligent Agent

19

Restaurant DB

Restaurant | Rating | Type
Rest 1 | good | Taiwanese
Rest 2 | bad | Thai
: | : | :

Semantic Frame: FIND_RESTAURANT rating=“good” type=“taiwanese”

SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }

Sequence Labeling: O O B-rating O O O B-type O
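The step from sequence labels to a semantic frame can be sketched as below. This is a minimal illustration, not the system described on the slides: the function name `bio_to_slots` and the frame layout (a dict with `intent` and `slots`) are assumptions made for the example, and the intent is supplied by hand rather than predicted.

```python
def bio_to_slots(tokens, tags):
    """Collect slot values from BIO tags: B-x starts slot x, I-x continues it."""
    slots, current = {}, None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = token
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + token
        else:
            current = None  # an O tag (or a stray I- tag) closes the open slot
    return slots


tokens = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]

# The intent label is assumed to come from a separate classifier.
frame = {"intent": "FIND_RESTAURANT", "slots": bio_to_slots(tokens, tags)}
print(frame)
```

The resulting slot dictionary carries the same rating and type constraints that the SELECT query above places on the restaurant table.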

(21)

NTUMIULAB

Requires Hand-Crafted States

User

Intelligent Agent

find a good eating place for taiwanese food

Hand-crafted states, by which slots have been filled:

• NULL

• location; rating; type

• loc, rating; rating, type; loc, type

• all

User: i want it near to my office

(23)

NTUMIULAB

Handling Errors and Confidence

User

Intelligent Agent

find a good eating place for taixxxx food

Possible semantic frames:

• FIND_RESTAURANT rating=“good” type=“taiwanese”

• FIND_RESTAURANT rating=“good” type=“thai”

• FIND_RESTAURANT rating=“good”

States: NULL; location; rating; type; loc, rating; rating, type; loc, type; all

Candidate states: rating=“good”, type=“thai” (?) or rating=“good”, type=“taiwanese” (?)

(24)

NTUMIULAB

Dialogue Policy for Agent Action

• Inform(location=“Taipei 101”)

• “The nearest one is at Taipei 101”

• Request(location)

• “Where is your home?”

• Confirm(type=“taiwanese”)

• “Did you want Taiwanese food?”

24

(25)

NTUMIULAB

Task-Oriented Dialogue System (Young, 2000)

Speech Recognition

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling Hypothesis

are there any action movies to see this weekend

Semantic Frame

request_movie

genre=action, date=this weekend

System Action/Policy

request_location

Text Input

Are there any action movies to see this weekend?

Speech Signal

Dialogue Management (DM)

• Dialogue State Tracking (DST)

• Dialogue Policy

Backend Action / Knowledge Providers Natural Language

Generation (NLG) Text response

Where are you located?

(26)

NTUMIULAB

Output / Natural Language Generation

• Goal: given the selected dialogue action, generate natural language or a GUI for the interaction

• Inform(location=“Taipei 101”)

• “The nearest one is at Taipei 101”

• Request(location)

• “Where is your home?”

• Confirm(type=“taiwanese”)

• “Did you want Taiwanese food?”
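One simple way to realize the mapping from dialogue action to text is a template lookup. The sketch below assumes a hypothetical `TEMPLATES` table and `generate` function; the slides do not prescribe this design, and the neural generators covered later replace it with learned models.

```python
# Hypothetical template table keyed by (dialogue act, slot).
TEMPLATES = {
    ("inform", "location"): "The nearest one is at {value}",
    ("request", "location"): "Where is your home?",
    ("confirm", "type"): "Did you want {value} food?",
}


def generate(act, slot, value=None):
    """Fill the template for the selected dialogue action."""
    template = TEMPLATES[(act, slot)]
    return template.format(value=value) if value is not None else template


print(generate("inform", "location", "Taipei 101"))
print(generate("request", "location"))
print(generate("confirm", "type", "Taiwanese"))
```

Templates are predictable and easy to control, but every new act and slot combination needs a hand-written entry.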

(27)

NTUMIULAB

• Introduction

• Background Knowledge

• Neural Network Basics

• Reinforcement Learning

• Modular Dialogue System

• System Evaluation

• Recent Trends of Learning Dialogues

Outline

27

(29)

NTUMIULAB

Machine Learning ≈ Looking for a Function

• Speech Recognition: f(speech signal) = “你好 (Hello)”

• Image Recognition: f(image) = cat

• Go Playing: f(board position) = 5-5 (next move)

• Chat Bot: f(“Where is KAIST?”) = “The address is…”

Given a large amount of data, the machine learns what the function f should be.

(30)

NTUMIULAB

Machine Learning

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning

Deep learning is a family of machine learning approaches based on “neural networks”.

(31)

NTUMIULAB

A Single Neuron

$z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b$, where $b$ is the bias

$y = \sigma(z)$, where $\sigma$ is the activation function

Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$

w, b are the parameters of this neuron
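As a quick illustration, a single sigmoid neuron can be written in a few lines. The function names `sigmoid` and `neuron` and the example weights are chosen here for the sketch and do not come from the slides.

```python
import math


def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Example values (arbitrary): z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```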

31

(32)

NTUMIULAB

A Single Neuron

$y = \sigma(z)$, $z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b$

$y \ge 0.5$: is “2”

$y < 0.5$: not “2”

A single neuron can only handle binary classification

$f: \mathbb{R}^N \to \mathbb{R}^M$

(33)

NTUMIULAB

A Layer of Neurons

• Handwriting digit classification

$f: \mathbb{R}^N \to \mathbb{R}^M$

A layer of neurons can handle multiple possible outputs, and the result depends on the max one

[Figure: inputs $x_1, x_2, \dots, x_N$ feed 10 neurons (10 classes) with outputs $y_1$ (“1” or not), $y_2$ (“2” or not), $y_3$ (“3” or not), …; which one is max?]
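The "which one is max" rule can be sketched as follows, with one sigmoid neuron per class and the prediction taken as the index of the largest output. The names `layer` and `predict` and the tiny weight matrix are assumptions for illustration; a real digit classifier would have 10 neurons and learned weights.

```python
import math


def layer(x, weights, biases):
    """One sigmoid neuron per class; weights[k] is the weight vector of neuron k."""
    outputs = []
    for w, b in zip(weights, biases):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def predict(x, weights, biases):
    """Return the index of the neuron with the largest output."""
    y = layer(x, weights, biases)
    return max(range(len(y)), key=y.__getitem__)


# Three classes over two inputs (arbitrary example weights).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
print(predict([0.2, 0.9], weights, biases))
```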

(34)

NTUMIULAB

Deep Neural Networks (DNN)

• Fully connected feedforward network

[Figure: input vector x = ($x_1, x_2, \dots, x_N$) → Layer 1 → Layer 2 → … → Layer L → output vector y = ($y_1, y_2, \dots, y_M$)]

Deep NN: multiple hidden layers

$f: \mathbb{R}^N \to \mathbb{R}^M$

(35)

NTUMIULAB

Recurrent Neural Network (RNN)

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

Activation function: tanh, ReLU

time

RNN can learn accumulated sequential information (time-series)
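A minimal sketch of how the hidden state accumulates information over time is given below. It assumes the common recurrence $h_t = \tanh(W h_{t-1} + U x_t + b)$; the names `rnn_step` and `run_rnn`, the bias term, and the plain-list matrices are choices made for this example rather than details taken from the slide.

```python
import math


def rnn_step(h_prev, x, W, U, b):
    """h_t = tanh(W h_{t-1} + U x_t + b), with matrices given as lists of rows."""
    h = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W[i][j] * h_prev[j] for j in range(len(h_prev)))
        z += sum(U[i][k] * x[k] for k in range(len(x)))
        h.append(math.tanh(z))
    return h


def run_rnn(xs, W, U, b):
    """Run the recurrence over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = rnn_step(h, x, W, U, b)
        states.append(h)
    return states


# One hidden unit, one input feature: the second state still depends on the first input.
states = run_rnn([[1.0], [0.0]], W=[[0.5]], U=[[1.0]], b=[0.0])
print(states)
```

Even though the second input is zero, the second hidden state is non-zero because it carries over information from the first step.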

(36)

NTUMIULAB

• Introduction

• Background Knowledge

• Neural Network Basics

• Reinforcement Learning

• Modular Dialogue System

• System Evaluation

• Recent Trends of Learning Dialogues

Outline

36

(37)

NTUMIULAB

Reinforcement Learning

• RL is a general purpose framework for decision making

• RL is for an agent with the capacity to act

• Each action influences the agent’s future state

• Success is measured by a scalar reward signal

• Goal: select actions to maximize future reward

Observation Action Reward

(38)

NTUMIULAB

Scenario of Reinforcement Learning

Agent learns to take actions to maximize expected reward.

Environment

Observation $o_t$

Action $a_t$ (Next Move)

Reward $r_t$: if win, reward = 1; if loss, reward = -1; otherwise, reward = 0

(39)

NTUMIULAB

Supervised v.s. Reinforcement

• Supervised: learning from teacher

“Hello” → Say “Hi”

“Bye bye” → Say “Good bye”

• Reinforcement: learning from critics

Hello ☺ …… ……. ……. OXX???! → Bad

(40)

NTUMIULAB

Sequential Decision Making

• Goal: select actions to maximize total future reward

• Actions may have long-term consequences

• Reward may be delayed

• It may be better to sacrifice immediate reward to gain more long-term reward

40

(41)

NTUMIULAB

Deep Reinforcement Learning

[Figure: the Environment sends an Observation (function input) to a DNN, which outputs an Action (function output); the Reward is used to pick the best function]

(42)

NTUMIULAB

Reinforcement Learning

• Start from state $s_0$

• Choose action $a_0$

• Transit to $s_1 \sim P(s_0, a_0)$

• Continue…

• Total reward:

Goal: select actions that maximize the expected total reward
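The total-reward formula is not preserved in this text, so the sketch below assumes the usual discounted sum $r_0 + \gamma r_1 + \gamma^2 r_2 + \dots$ with a discount factor $\gamma$. The function name `discounted_return` and the default `gamma=0.9` are illustrative assumptions, not values from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards along one episode, discounting step t by gamma**t (assumed form)."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A delayed reward of 1 arriving two steps later is worth gamma**2 now.
print(discounted_return([0, 0, 1], gamma=0.5))
```

With `gamma` below 1, rewards that arrive later count for less, which is one way to trade off immediate against long-term reward.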

(43)

NTUMIULAB

Reinforcement Learning Approach

• Policy-based RL

• Search directly for the optimal policy $\pi^*$, the policy achieving maximum future reward

• Value-based RL

• Estimate the optimal value function $Q^*(s, a)$, the maximum value achievable under any policy

• Model-based RL

• Build a model of the environment

• Plan (e.g. by lookahead) using model

(44)

NTUMIULAB

• Introduction

• Background Knowledge

• Modular Dialogue System

• Spoken/Natural Language Understanding (SLU/NLU)

• Dialogue Management

• Dialogue State Tracking (DST)

• Dialogue Policy Optimization

• Natural Language Generation (NLG)

• System Evaluation

• Recent Trends of Learning Dialogues

Outline

44

(47)

NTUMIULAB

Language Understanding (LU)

• Pipelined

1. Domain Classification

2. Intent Classification

3. Slot Filling

(48)

NTUMIULAB

LU – Domain/Intent Classification

As an utterance classification task

• Given a collection of utterances $u_i$ with labels $c_i$, $D = \{(u_1, c_1), \dots, (u_n, c_n)\}$ where $c_i \in C$, train a model to estimate labels for new utterances $u_k$.

find me a cheap taiwanese restaurant in oakland

Domain | Intent
Movies | find_movie, buy_tickets
Restaurants | find_restaurant, find_price, book_table
Music | find_lyrics, find_singer
Sports |

(49)

NTUMIULAB

• Deep belief nets (DBN)

• Unsupervised training of weights

• Fine-tuning by back-propagation

• Compared to MaxEnt, SVM, and boosting

Domain/Intent Classification

(Sarikaya et al., 2011)

49

http://ieeexplore.ieee.org/abstract/document/5947649/

(50)

NTUMIULAB

• Deep convex networks (DCN)

• Simple classifiers are stacked to learn complex functions

• Feature selection of salient n-grams

• Extension to kernel-DCN

Domain/Intent Classification

(Tur et al., 2012; Deng et al., 2012)

50

(51)

NTUMIULAB

• RNN and LSTMs for utterance classification

Domain/Intent Classification

(Ravuri & Stolcke, 2015)

51

Intent decision after reading all words performs better

(52)

NTUMIULAB

• RNN and CNNs for dialogue act classification

Dialogue Act Classification

(Lee & Dernoncourt, 2016)

(53)

NTUMIULAB

LU – Slot Filling

flights from Boston to New York today

Entity Tag: O O B-city O B-city I-city O

Slot Tag: O O B-dept O B-arrival I-arrival B-date

As a sequence tagging task

• Given a collection of tagged word sequences, $S = \{((w_{1,1}, w_{1,2}, \dots, w_{1,n_1}), (t_{1,1}, t_{1,2}, \dots, t_{1,n_1})), ((w_{2,1}, w_{2,2}, \dots, w_{2,n_2}), (t_{2,1}, t_{2,2}, \dots, t_{2,n_2})), \dots\}$ where $t_i \in M$, the goal is to estimate tags for a new word sequence.

(54)

NTUMIULAB

Slot Tagging

(Yao et al, 2013; Mesnil et al, 2015)

• Variations:

a. RNNs with LSTM cells

b. Input, sliding window of n-grams

c. Bi-directional LSTMs

[Figure: (a) LSTM, (b) LSTM-LA and (c) bLSTM taggers, each mapping words $w_0, w_1, w_2, \dots, w_n$ through hidden states to tags $y_0, y_1, y_2, \dots, y_n$; the bLSTM has forward ($h^f$) and backward ($h^b$) hidden states]

http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380

(55)

NTUMIULAB

• Encoder-decoder networks

• Leverages sentence level information

• Attention-based encoder-decoder

• Use of attention (as in MT) in the encoder-decoder network

• Attention is estimated using a feed-forward network with input: ht and st at time t

Slot Tagging

(Kurata et al., 2016; Simonnet et al., 2015)

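[Figure: encoder-decoder taggers; the encoder reads the words $w_0, \dots, w_n$ (also shown in reversed order $w_n, \dots, w_0$), and the decoder, with states $s_0, \dots, s_n$ and attention context $c_i$, emits tags $y_0, \dots, y_n$]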

http://www.aclweb.org/anthology/D16-1223

(56)

NTUMIULAB

• Encoder that segments

• Decoder that tags the segments

Joint Segmentation & Slot Tagging (Zhai+, 2017)

56

https://arxiv.org/pdf/1701.04027.pdf

(57)

NTUMIULAB

• Multi-task learning

• Goal: exploit data from domains/tasks with a lot of data to improve ones with less data

• Lower layers are shared across domains/tasks

• Output layer is specific to task

Multi-Task Slot Tagging

(Jaech et al., 2016; Tafforeau et al., 2016)

57

https://arxiv.org/abs/1604.00117; http://www.sensei-conversation.eu/wp-content/uploads/2016/11/favre_is2016b.pdf

(58)

NTUMIULAB

Semi-Supervised Slot Tagging

(Lan+, 2018)

58

O. Lan, S. Zhu, and K. Yu, “Semi-supervised Training using Adversarial Multi-task Learning for Spoken Language Understanding,” in Proceedings of ICASSP, 2018.

Slot Tagging

Model

The BLM exploits unsupervised knowledge; the shared-private framework and adversarial training make the slot tagging model generalize better

https://speechlab.sjtu.edu.cn/papers/oyl11-lan-icassp18.pdf

• Idea: language model objective can enhance other tasks

(59)

NTUMIULAB

LU Evaluation

• Metrics

• Sub-sentence-level: intent accuracy, slot F1

• Sentence-level: whole frame accuracy
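The three metrics can be computed as in the sketch below. It is a simplification: slot F1 is computed here over exact (slot, value) pairs per utterance, whereas evaluation scripts in the literature usually score tagged chunks. The function name `lu_metrics` and the toy data are assumptions for illustration.

```python
def lu_metrics(gold, pred):
    """gold/pred: lists of (intent, slots) pairs, where slots maps slot name -> value."""
    n = len(gold)
    intent_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # A frame is correct only if the intent and every slot match.
    frame_ok = sum(g == p for g, p in zip(gold, pred))
    tp = n_gold = n_pred = 0
    for (_, gs), (_, ps) in zip(gold, pred):
        g_items, p_items = set(gs.items()), set(ps.items())
        tp += len(g_items & p_items)
        n_gold += len(g_items)
        n_pred += len(p_items)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"intent_acc": intent_ok / n, "slot_f1": f1, "frame_acc": frame_ok / n}


gold = [("find_restaurant", {"type": "taiwanese", "rating": "good"}),
        ("find_movie", {"genre": "action"})]
pred = [("find_restaurant", {"type": "thai", "rating": "good"}),
        ("find_movie", {"genre": "action"})]
metrics = lu_metrics(gold, pred)
print(metrics)
```

In this toy case both intents are right, but one wrong slot value makes the whole first frame count as incorrect, which is why whole frame accuracy is the strictest of the three.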

59

(60)

NTUMIULAB

Joint Semantic Frame Parsing

• Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction in the same output sequence

• Parallel (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches

[Figure: an RNN tagger with hidden states $h_{t-1}, h_t, h_{t+1}, h_{T+1}$ (weights U, W, V) reads “taiwanese food please EOS” and outputs the slot tags B-type O O (slot filling), then FIND_REST at EOS (intent prediction)]

(61)

NTUMIULAB

Slot-Gated Joint SLU

(Goo+, 2018)

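[Figure: a BLSTM reads the word sequence $x_1, \dots, x_4$; slot attention yields the slot contexts $c_i^S$ and the slot sequence $y_1^S, \dots, y_4^S$; intent attention yields the intent context $c^I$ and the intent $y^I$; the slot gate combines $c_i^S$ and $c^I$ through $W$, tanh and $v$ into $g$]

Slot Gate

$g = \sum v \cdot \tanh(c_i^S + W \cdot c^I)$

Slot Prediction

$y_i^S = \mathrm{softmax}(W^S (h_i + g \cdot c_i^S) + b^S)$

$g$ will be larger if slot and intent are better related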
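To make the slot gate concrete, the sketch below computes $g$ for one time step and applies it to the slot context before the softmax layer. It reads the sum in the gate formula as a sum over vector elements; the function names `slot_gate` and `gated_slot_input` and the plain-list vectors are assumptions for illustration, not code from the paper.

```python
import math


def slot_gate(c_slot, c_intent, W, v):
    """g = sum_k v[k] * tanh(c_slot[k] + (W c_intent)[k]), a scalar gate."""
    g = 0.0
    for k in range(len(c_slot)):
        proj = sum(W[k][j] * c_intent[j] for j in range(len(c_intent)))
        g += v[k] * math.tanh(c_slot[k] + proj)
    return g


def gated_slot_input(h, c_slot, g):
    """h_i + g * c_i^S, the vector passed to the slot softmax layer."""
    return [hk + g * ck for hk, ck in zip(h, c_slot)]


identity = [[1.0, 0.0], [0.0, 1.0]]
g = slot_gate([1.0, 0.0], [0.0, 0.0], identity, [1.0, 1.0])
print(g, gated_slot_input([1.0, 2.0], [0.5, 0.5], 2.0))
```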

(62)

NTUMIULAB

Contextual LU

62

Domain Identification → Intent Prediction → Slot Filling

U: just sent email to bob about fishing this weekend

S: O O O O B-contact_name O B-subject I-subject I-subject

D: communication

I: send_email

→ send_email(contact_name=“bob”, subject=“fishing this weekend”)

U1: send email to bob

S1: B-contact_name (on “bob”)

→ send_email(contact_name=“bob”)

U2: are we going to fish this weekend

S2: B-message I-message I-message I-message I-message I-message I-message

→ send_email(message=“are we going to fish this weekend”)

(63)

NTUMIULAB

• User utterances are highly ambiguous in isolation

Contextual LU

Restaurant Booking

U: Book a table for 10 people tonight.

S: Which restaurant would you like to book a table for?

U: Cascal, for 6. (is “6” the #people or the time?)

(64)

NTUMIULAB

• Leveraging contexts

• Used for individual tasks

• Seq2Seq model

• Words are input one at a time, tags are output at the end of each utterance

• Extension: LSTM with speaker role dependent layers

Contextual LU

(Bhargava et al., 2013; Hori et al, 2015)

64

https://www.merl.com/publications/docs/TR2015-134.pdf

(65)

NTUMIULAB

U: “i d like to purchase tickets to see deepwater horizon”

S: “for which theatre”

U: “angelika”

S: “you want them for angelika theatre?”

U: “yes angelika”

S: “how many tickets would you like ?”

U: “3 tickets for saturday”

S: “What time would you like ?”

U: “Any time on saturday is fine”

S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”

U: “Let’s do 5:40”

End-to-End Memory Networks

(Sukhbaatar et al, 2015)

m0

mi

mn-1 u

(66)

NTUMIULAB

E2E MemNN for Contextual LU

(Chen+, 2016)

66

[Figure: 1. Sentence Encoding: the history utterances {$x_i$} are encoded by the contextual sentence encoder RNNmem into memory representations $m_i$, and the current utterance c is encoded by the sentence encoder RNNin into u. 2. Knowledge Attention: the inner product of u with each $m_i$ gives the knowledge attention distribution $p_i$. 3. Knowledge Encoding: the weighted sum h of the memories is passed through $W_{kg}$ to give the knowledge encoding representation o, which is fed (via M) into the RNN tagger that outputs the slot tagging sequence y.]

Idea: additionally incorporating contextual knowledge during slot tagging

→ track dialogue states in a latent way

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
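The knowledge attention step (inner product, softmax, weighted sum) can be sketched as follows. The encoders are left out: `u` and `memories` stand in for vectors that the sentence encoders would produce, and the names `softmax` and `attend` are assumptions for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(u, memories):
    """p_i = softmax(u . m_i); h = sum_i p_i * m_i."""
    scores = [sum(a * b for a, b in zip(u, m)) for m in memories]
    p = softmax(scores)
    h = [sum(p_i * m[k] for p_i, m in zip(p, memories)) for k in range(len(u))]
    return p, h


# Placeholder vectors: the first memory is more similar to the current utterance.
p, h = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(p, h)
```

History utterances whose encodings align with the current utterance receive more attention mass, so they contribute more to the summary vector passed to the tagger.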

(67)

NTUMIULAB U: “i d like to purchase tickets to see deepwater horizon”

S: “for which theatre”

U: “angelika”

S: “you want them for angelika theatre?”

U: “yes angelika”

S: “how many tickets would you like ?”

U: “3 tickets for saturday”

S: “What time would you like ?”

U: “Any time on saturday is fine”

S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”

U: “Let’s do 5:40”

Analysis of Attention

0.69

0.13

0.16

(68)

NTUMIULAB

Role-Based & Time-Aware Attention

(Su+, 2018)

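[Figure: history utterances u1 to u6 from two roles (Tourist, Guide) and the current utterance u7. Sentence-level time-decay attention weights each history utterance ($\alpha_{u_i}$); role-level time-decay attention weights each role ($\alpha_{r_1}$, $\alpha_{r_2}$). The weighted history summary is combined with the current utterance ($w_t, w_{t+1}, \dots, w_T$) through dense layers for spoken language understanding.]

Time-Decay Attention Function ($\alpha_u$ & $\alpha_r$): $\alpha$ as a function of $d$, in convex, linear and concave forms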

(69)

NTUMIULAB

Learnable Time-Decay Attention

(Su+, 2019)

[Figure: history utterances u1 to u6 from two roles (Tourist, Guide) and the current utterance u7 ($w_t, w_{t+1}, \dots, w_T$). Sentence-level time-decay attention ($\alpha_{u_i}$) and role-level time-decay attention ($\alpha_{r_1}$, $\alpha_{r_2}$) are each produced by an attention model over the convex, linear and concave decay functions. The weighted history summary is combined with the current utterance through dense layers for spoken language understanding.]

S.-Y. Su, P.-C. Yuan, and Y.-N. Chen, "Learning Context-Sensitive Time-Decay Attention for Role-Based Dialogue Modeling," in Submission.

(70)

NTUMIULAB

Structural LU

(Chen et al., 2016)

• K-SAN: prior knowledge as a teacher

[Figure: the knowledge encoding module encodes the knowledge-guided structure {$x_i$} of the input sentence (“show me the flights from seattle to san francisco”, rooted at ROOT) into memory vectors $m_i$; the inner product with the sentence encoding gives the knowledge attention distribution $p_i$; the weighted sum forms the encoded knowledge representation, which yields the knowledge-guided representation fed (via M) into the RNN tagger (weights U, W, V) that outputs the slot tagging sequence.]

http://arxiv.org/abs/1609.03286

(71)

NTUMIULAB

Structural LU

(Chen et al., 2016)

• Sentence structural knowledge stored as memory

71

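Sentence s: show me the flights from seattle to san francisco

[Figure: Syntax (Dependency Tree): a tree rooted at ROOT over the words show, me, the, flights, from, seattle, to, san francisco, with substructures numbered 1 to 4]

[Figure: Semantics (AMR Graph): a graph over show, you, flight, I, and two city nodes (Seattle, San Francisco), with substructures numbered 1 to 4]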

http://arxiv.org/abs/1609.03286

(72)

NTUMIULAB

Structural LU

(Chen et al., 2016)

• Sentence structural knowledge stored as memory

http://arxiv.org/abs/1609.03286

Even with less training data, K-SAN allows the model to pay similar attention to the salient substructures that are important for tagging.

(73)

NTUMIULAB

Semantic Frame Representation

• Requires a domain ontology: early connection to backend

• Contains core content (intent, a set of slots with fillers)

Restaurant Domain (restaurant: price, type, location)

find me a cheap taiwanese restaurant in oakland

find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”)

Movie Domain (movie: year, genre, director)

show me action movies directed by james cameron

find_movie (genre=“action”, director=“james cameron”)

(74)

NTUMIULAB

• Learning key domain concepts from goal-oriented human-human conversations

• Clustering with mutual information and KL divergence (Chotimongkol & Rudnicky, 2002)

• Spectral clustering based slot ranking model (Chen et al., 2013)

• Use a state-of-the-art frame-semantic parser trained for FrameNet

• Adapt the generic output of the parser to the target semantic space

LU – Learning Semantic Ontology

(Chen+, 2013)

74

(75)

NTUMIULAB

• Transfer dialogue acts across domains

• Dialogue acts are similar for multiple domains

• Learning new intents by information from other domains

LU – Intent Expansion

(Chen+, 2016)

[Figure: a CDSSM maps the utterance U and each action (intent) $A_1, A_2, \dots, A_n$ to 300-dimensional vectors; CosSim between U and each $A_i$ gives $P(A_1 \mid U), P(A_2 \mid U), \dots, P(A_n \mid U)$. The training data covers intents 1 to K, e.g. <change_note> “adjust my note” and <change_setting> “volume turn down”. Embedding generation produces intent representations for new intents K+1, K+2, e.g. <change_calender> for “postpone my meeting to five pm”.]

The dialogue act representations can be automatically learned for other domains

http://ieeexplore.ieee.org/abstract/document/7472838/
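The scoring step of this approach, cosine similarity followed by a softmax to obtain $P(A \mid U)$, can be sketched as below. The embedding model itself is not reproduced: the short vectors stand in for the 300-dimensional CDSSM outputs, and the function names and the smoothing factor `gamma` are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intent_posteriors(u, intents, gamma=1.0):
    """Softmax over scaled cosine similarities between the utterance and each intent."""
    sims = {name: gamma * cosine(u, vec) for name, vec in intents.items()}
    m = max(sims.values())
    exps = {name: math.exp(s - m) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Placeholder embeddings; a new intent only needs a vector, not labeled utterances.
intents = {"change_calender": [1.0, 0.0], "change_note": [0.0, 1.0]}
posteriors = intent_posteriors([0.9, 0.1], intents, gamma=5.0)
print(posteriors)
```

Because intents are compared in a shared embedding space, a new intent can be scored as soon as an embedding is generated for it.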

(76)

NTUMIULAB

LU – Language Extension

(Upadhyay+, 2018)

• Source language: English (full annotations)

• Target language: Hindi (limited annotations)

76

RT: round trip, FC: from city, TC: to city, DDN: departure day name

http://shyamupa.com/papers/UFTHH18.pdf

(77)

NTUMIULAB

LU – Language Extension

(Upadhyay+, 2018)

77

• Train on Target (Lefevre et al, 2010): English Train → MT → Hindi Train → Hindi Tagger; Hindi Test → Hindi Tagger → SLU Results

• Test on Source (Jabaian et al, 2011): Hindi Test → MT → English Test → English Tagger → SLU Results

• Joint Training: English Train (Large) + Hindi Train (Small) → Bilingual Tagger; Hindi Test → Bilingual Tagger → SLU Results

With joint training, an MT system is not required and both languages can be processed by a single model

http://shyamupa.com/papers/UFTHH18.pdf

(78)

NTUMIULAB

• Introduction

• Background Knowledge

• Modular Dialogue System

• Spoken/Natural Language Understanding (SLU/NLU)

• Dialogue Management

• Dialogue State Tracking (DST)

• Dialogue Policy Optimization

• Natural Language Generation (NLG)

• System Evaluation

• Recent Trends of Learning Dialogues

Outline

78

(79)

NTUMIULAB

Elements of Dialogue Management

79

(Figure from Gašić)

Dialogue State Tracking

(80)

NTUMIULAB

• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness

Dialogue State Tracking (DST)

80

Incorrect for both!

(81)

NTUMIULAB

Dialogue State Tracking (DST)

• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input

81

S: How can I help you?

U: Book a table at Sumiko for 5

Slot | Value
# people | 5 (0.5)
time | 5 (0.5)

S: How many people?

U: 3

Slot | Value
# people | 3 (0.8)
time | 5 (0.8)
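Keeping a distribution per slot, rather than a single best value, can be sketched as below. The update rule (scale the old mass by one minus the new confidence, then add the new evidence) is an assumption made for this example, not the tracker used in the slides, and it does not model the cross-slot reasoning that raises time=5 to 0.8 in the example above. The name `update_belief` is hypothetical.

```python
def update_belief(belief, slu_hyps):
    """belief: slot -> {value: prob}; slu_hyps: list of (slot, value, confidence)."""
    new_belief = {slot: dict(dist) for slot, dist in belief.items()}
    for slot, value, conf in slu_hyps:
        dist = new_belief.setdefault(slot, {})
        # Scale down the existing mass, then add the new evidence.
        for v in dist:
            dist[v] *= (1.0 - conf)
        dist[value] = dist.get(value, 0.0) + conf
    return new_belief


# Turn 1: "for 5" is ambiguous between the number of people and the time.
b1 = update_belief({}, [("#people", "5", 0.5), ("time", "5", 0.5)])
# Turn 2: "3" answers "How many people?".
b2 = update_belief(b1, [("#people", "3", 0.8)])
print(b1)
print(b2)
```

Because the earlier hypothesis keeps some probability mass instead of being overwritten, the system can still recover if the later SLU result turns out to be wrong.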

(82)

NTUMIULAB

• A full representation of the system's belief of the user's goal at any point during the dialogue

• Used for making API calls

Multi-Domain Dialogue State Tracking

82

U: I wanna buy two tickets for tonight at the Shoreline theater.

S: Which movie are you interested in?

U: Inferno.

Movies (candidate values range from less likely to more likely)

Slot | Value
Date | 11/15/17
Time | 6 pm, 7 pm, 8 pm, 9 pm
#People | 2
Theater | Century 16 Shoreline
Movie | Inferno

(83)

NTUMIULAB

• A full representation of the system's belief of the user's goal at any point during the dialogue

• Used for making API calls

Multi-Domain Dialogue State Tracking

83

U: I wanna buy two tickets for tonight at the Shoreline theater.

S: Which movie are you interested in?

U: Inferno.

S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?

U: We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?

Movies (candidate values range from less likely to more likely)

Slot | Value
Date | 11/15/17
Time | 6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm
#People | 2
Theater | Century 16 Shoreline
Movie | Inferno

Restaurants

Slot | Value
Date | 11/15/17
Time | 6:00 pm, 6:30 pm, 7:00 pm
Restaurant | Cascal
#People | 2

(84)

NTUMIULAB

• A full representation of the system's belief of the user's goal at any point during the dialogue

• Used for making API calls

Multi-Domain Dialogue State Tracking

84

U: Inferno.

S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?

U: We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?

S: Cascal has a table for 2 at 6pm and 7:30pm.

U: OK, let me get the table at 6 and tickets for the 7:30 showing.

Movies (candidate values range from less likely to more likely)

Slot | Value
Date | 11/15/17
Time | 6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm
#People | 2
Theater | Century 16 Shoreline
Movie | Inferno

Restaurants

Slot | Value
Date | 11/15/17
Time | 6:00 pm, 6:30 pm, 7:00 pm
Restaurant | Cascal
#People | 2

(85)

NTUMIULAB

RNN-CNN DST

(Mrkšić+, 2015)

85

(Figure from Wen et al, 2016)

http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777

(86)

NTUMIULAB

Neural Belief Tracker

(Mrkšić+, 2016)

• Candidate pairs are considered

https://arxiv.org/abs/1606.03777

Belief State Updates: [bt]

Previous Belief State: [bt-1]

(87)

NTUMIULAB

• More advanced encoder

• Global modules share parameters for all slots

• Local modules learn slot-specific feature representations

Global-Locally Self-Attentive DST

(Zhong+, 2018)

87

http://www.aclweb.org/anthology/P18-1135

(88)

NTUMIULAB

Dialog State Tracking Challenge (DSTC)

(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)

Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation
