
Academic year: 2022


(1)

Towards Conversational AI

Applied Deep Learning

May 31st, 2021 http://adl.miulab.tw

(2)

What can machines achieve now or in the future?

Iron Man (2008)

2

(3)

Language Empowering Intelligent Assistants

Apple Siri (2011)

Google Now (2012)

Microsoft Cortana (2014)

Amazon Alexa/Echo (2014)

Google Home (2016)

Google Assistant (2016)

Apple HomePod (2017)

Facebook Portal (2019)

3

(4)

Why Natural Language?

Global Digital Statistics (January 2021)

Total Population: 7.83B

Unique Mobile Users: 5.22B (66.6%)

Internet Users: 4.66B (59.5%)

Active Mobile Social Users: 4.20B (53.6%)

Device input is evolving towards speech, the more natural and convenient modality.

4

(5)

Why and When We Need?

“I want to chat” ⇒ Turing Test (talk like a human)

“I have a question” ⇒ Information consumption

“I need to get this done” ⇒ Task completion

“What should I do?” ⇒ Decision support

Social Chit-Chat

Task-Oriented Dialogues

• Is this course good to take?

• Book me the train ticket from Kaohsiung to Taipei

• Reserve a table at Din Tai Fung for 5 people, 7PM tonight

• Schedule a meeting with Vivian at 10:00 tomorrow

• What is today’s agenda?

• What does NLP stand for?

5

(6)

Intelligent Assistants

Task-Oriented

6

(7)

App → Bot

A bot is responsible for a “single” domain, similar to an app

Users can initiate dialogues instead of following the GUI design

7

(8)

Two Branches of Conversational AI

Chit-Chat

Task-Oriented

8

(9)

Task-Oriented Dialogue Systems

9

(10)

Task-Oriented Dialogue Systems

(Young, 2000)

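LU: Language Understanding

DST: Dialogue State Tracking

DP: Dialogue Policy Learning

NLG: Natural Language Generation

[Figure: the user's speech passes through ASR into LU, DST, DP and NLG, and the response is spoken through TTS]

User: Can you help me book a 5-star hotel on Sunday?

System: For how many people?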

10

(11)

Modular Task-Oriented Dialogue Systems

Language Understanding

11

(12)

Language Understanding (LU)

LU is a turn-level task that maps utterances to semantic frames.

Input: raw user utterance

Output: semantic frame (e.g. speech-act, intent, slots)

User: For two people, thanks!

LU output: people_num=2

[Figure: the LU module highlighted in the LU ⇒ DST ⇒ DP ⇒ NLG pipeline]

(13)

Language Understanding (LU)

Pipelined

1. Domain Classification

2. Intent Classification

3. Slot Filling

(14)

1. Domain Identification

Requires Predefined Domain Ontology

find a good eating place for taiwanese food

User

Organized Domain Knowledge (Database)

Intelligent Agent

Restaurant DB Taxi DB Movie DB

Classification!

14

(15)

2. Intent Detection

Requires Predefined Schema

find a good eating place for taiwanese food

User

Intelligent Agent

Restaurant DB

Candidate intents: FIND_RESTAURANT, FIND_PRICE, FIND_TYPE, …

Classification!

15

(16)

3. Slot Filling

Requires Predefined Schema

find a good eating place for taiwanese food

User

Intelligent Agent

Restaurant DB

Restaurant   Rating   Type
Rest 1       good     Taiwanese
Rest 2       bad      Thai
:            :        :

Semantic Frame: FIND_RESTAURANT, rating=“good”, type=“taiwanese”

SELECT restaurant { rest.rating=“good”, rest.type=“taiwanese” }

Sequence Labeling:

find   a   good       eating   place   for   taiwanese   food
O      O   B-rating   O        O       O     B-type      O
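The step from the tag sequence to the semantic frame can be sketched in a few lines. This is a minimal illustration, assuming IOB-style tags (B-x opens a slot, I-x continues it) and assuming the intent comes from a separate classifier; the function name `tags_to_slots` is hypothetical.

```python
def tags_to_slots(words, tags):
    """Collect slot values from IOB tags (B-x starts a slot, I-x continues it)."""
    slots = {}
    current = None  # name of the slot currently being extended
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = word
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + word
        else:
            current = None
    return slots


words = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]

# The intent is assumed to be predicted elsewhere (intent detection step).
frame = {"intent": "FIND_RESTAURANT", "slots": tags_to_slots(words, tags)}
print(frame)
```

The resulting slots are what a backend would turn into the database query shown on the slide.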

16

(17)

Slot Tagging

(Yao et al, 2013; Mesnil et al, 2015)

Variations:

a. RNNs with LSTM cells

b. Input, sliding window of n-grams

c. Bi-directional LSTMs

[Figure: three taggers mapping words w_0 … w_n to labels y_0 … y_n: (a) LSTM, (b) LSTM-LA, (c) bLSTM with forward (f) and backward (b) hidden states]

http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380

17

(18)

Slot Tagging

(Kurata et al., 2016; Simonnet et al., 2015)

Encoder-decoder networks

Leverages sentence level information

Attention-based encoder-decoder

Use of attention (as in MT) in the encoder-decoder network

Attention is estimated using a feed-forward network with input: h_t and s_t at time t

[Figure: encoder-decoder slot taggers mapping words w_0 … w_n to labels y_0 … y_n, with decoder states s_0 … s_n and attention context c_i]

http://www.aclweb.org/anthology/D16-1223

18

(19)

Joint Semantic Frame Parsing

Sequence-based (Hakkani-Tur+, 2016)

[Figure: an RNN with hidden states h_{t-1}, h_t, h_{t+1}, h_{T+1} and weights U, W, V reads “taiwanese food please EOS”; the words are tagged B-type, O, O (slot filling) and the EOS step outputs FIND_REST (intent prediction)]

Parallel-based (Liu and Lane, 2016)

                                      Attention Mechanism   Intent-Slot Relationship
Sequence-based (Hakkani-Tur+, ‘16)    X                     Δ (Implicit)
Parallel-based (Liu & Lane, ‘16)      √                     Δ (Implicit)
Slot-Gated Joint Model                √                     √ (Explicit)

19

(20)

Slot-Gated Joint SLU

(Goo+, 2018)

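[Figure: a BLSTM encodes the word sequence x_1 … x_4; slot attention gives a slot context c_i^S per step and intent attention gives an intent context c^I used to predict y^I; the slot gate combines c_i^S and c^I (through W, tanh and v) into g before the slot sequence y_1^S … y_4^S is predicted]

Slot Gate

$$g = \sum v \cdot \tanh\left(c_i^S + W \cdot c^I\right)$$

Slot Prediction

$$y_i^S = \mathrm{softmax}\left(W^S \left(h_i + g \cdot c_i^S\right) + b^S\right)$$

g will be larger if slot and intent are better related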
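The two slot-gate formulas can be traced numerically with a small pure-Python sketch. All dimensions and values below are toy assumptions, and the helper names (`matvec`, `slot_gate`, `slot_distribution`) are hypothetical; it only illustrates how the scalar gate g scales the slot context before the softmax.

```python
import math


def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def slot_gate(c_slot, c_intent, W, v):
    """g = sum_k v_k * tanh(c_slot_k + (W c_intent)_k), a scalar."""
    Wc = matvec(W, c_intent)
    return sum(vk * math.tanh(cs + wc) for vk, cs, wc in zip(v, c_slot, Wc))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def slot_distribution(h, c_slot, g, W_S, b_S):
    """y = softmax(W_S (h + g * c_slot) + b_S)."""
    gated = [hi + g * ci for hi, ci in zip(h, c_slot)]
    logits = [l + b for l, b in zip(matvec(W_S, gated), b_S)]
    return softmax(logits)


# Toy 2-dimensional states and 3 slot labels (all values are made up).
h = [0.5, -0.2]
c_slot = [0.1, 0.3]
c_intent = [0.4, -0.1]
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
W_S = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_S = [0.0, 0.0, 0.0]

g = slot_gate(c_slot, c_intent, W, v)
probs = slot_distribution(h, c_slot, g, W_S, b_S)
print(g, probs)
```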

20

(21)

Contextual Language Understanding

User utterances are highly ambiguous in isolation

Domain: Restaurant Booking

U: Book a table for 10 people tonight.

S: Which restaurant would you like to book a table for?

U: Cascal, for 6.

Is “6” the #people or the time?

21

(22)

End-to-End Memory Networks

(Sukhbaatar et al, 2015)

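[Figure: the history utterances below are stored as memory vectors m_0, …, m_i, …, m_{n-1}; the current utterance is encoded as u]

U: “i d like to purchase tickets to see deepwater horizon”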

S: “for which theatre”

U: “angelika”

S: “you want them for angelika theatre?”

U: “yes angelika”

S: “how many tickets would you like ?”

U: “3 tickets for saturday”

S: “What time would you like ?”

U: “Any time on saturday is fine”

S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”

U: “Let’s do 5:40”

22

(23)

E2E MemNN for Contextual LU

(Chen+, 2016)

Idea: additionally incorporating contextual knowledge during slot tagging

1. Sentence Encoding: the history utterances {x_i} are encoded by the contextual sentence encoder RNN_mem into memory representations m_i; the current utterance c is encoded by the sentence encoder RNN_in into u

2. Knowledge Attention: the inner product of u with each m_i gives the knowledge attention distribution p_i

3. Knowledge Encoding: the weighted sum h of the memories and W_kg give the knowledge encoding representation o

RNN Tagger: o is fed (through M) into an RNN tagger with inputs w_t, hidden states h_t and weights U, W, V, which outputs the slot tagging sequence y

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf

23

(24)

E2E MemNN for Contextual LU

(Chen+, 2016)

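Current utterance: U: “Let’s do 5:40”

[Figure: knowledge attention weights of 0.69, 0.13 and 0.16 shown over the history utterances below]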

U: “i d like to purchase tickets to see deepwater horizon”

S: “for which theatre”

U: “angelika”

S: “you want them for angelika theatre?”

U: “yes angelika”

S: “how many tickets would you like ?”

U: “3 tickets for saturday”

S: “What time would you like ?”

U: “Any time on saturday is fine”

S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”

24

(25)

Recent Advances in NLP

Contextual Embeddings (ELMo & BERT)

Boost performance on many understanding tasks with pre-trained language models

?

25

(26)

26

(27)

27

(28)

Robustness – Adapting to ASR

(Huang & Chen, 2019)

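Stage 1: Pre-Training on Sequential Texts

[Figure: an LSTM language model reads “What a day” and, through a linear layer, predicts “a day <EOS>”]

Stage 2: Pre-Training on Lattices

[Figure: a LatticeLSTM language model reads word lattices whose arcs carry probabilities, e.g. “the, 1.0”, 0.8, 0.2]

Fine-Tuning

[Figure: LatticeLSTM outputs are max-pooled and passed through a linear layer for classification]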

28

(29)

Robustness – Adapting to ASR

(Huang & Chen, 2019)

Idea: lattices may include correct words

Goal: feed lattices into Transformer

[Figure: a word lattice for “<s> cheapest airfare to Milwaukee </s>” with competing hypotheses “airfare”, “fair”, “affair” and “air” (arc probabilities such as 0.4, 0.3, 0.3; the remaining arcs have probability 1) is fed as <S> w_1 w_2 … w_{m-1} w_m <E> into a Transformer encoder, followed by a linear layer that predicts y]

SLU performance is improved by leveraging the lattices without increasing training/inference time

Chao-Wei Huang and Yun-Nung Chen, “Adapting Pretrained Transformer to Lattices for Spoken Language Understanding,” in Proceedings of 2019 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2019.

(30)

Robustness – Adapting to ASR

(Huang & Chen, 2019)

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in ICASSP, 2019.

The contextual embeddings of the recognized texts should be similar to those of the ground-truth transcripts.

Confusion-Aware Fine-Tuning

Supervised

Unsupervised

30

(31)

Scalability – Multilingual LU

(Upadhyay+, 2018)

Source language: English (full annotations)

Target language: Hindi (limited annotations)

RT: round trip, FC: from city, TC: to city, DDN: departure day name

http://shyamupa.com/papers/UFTHH18.pdf

31

(32)

Scalability – Multilingual LU

(Upadhyay+, 2018)

Train on Target (Lefevre et al, 2010): English Train ⇒ MT ⇒ Hindi Train ⇒ Hindi Tagger; Hindi Test ⇒ Hindi Tagger ⇒ SLU Results

Test on Source (Jabaian et al, 2011): Hindi Test ⇒ MT ⇒ English Test ⇒ English Tagger ⇒ SLU Results

Joint Training: English Train (Large) + Hindi Train (Small) ⇒ Bilingual Tagger; Hindi Test ⇒ Bilingual Tagger ⇒ SLU Results

With joint training, an MT system is not required and both languages can be processed by a single model

http://shyamupa.com/papers/UFTHH18.pdf

32

(33)

LU Evaluation

Metrics

Sub-sentence-level: domain/intent accuracy, slot F1

Sentence-level: whole frame accuracy

Utterance: For 2 people thanks

Slot: O B-people O O ⇒ Slot-F1

Domain: Hotel ⇒ Acc

Intent: Hotel_Book ⇒ Acc

Whole frame ⇒ Frame Accuracy
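The three metrics can be sketched as follows. This is a simplified illustration: slot F1 is computed over (slot, value) pairs per utterance rather than with chunk-level CoNLL scoring, and the frame layout and the name `evaluate_lu` are assumptions made for the example.

```python
def evaluate_lu(gold, pred):
    """gold/pred: lists of frames {"intent": str, "slots": {name: value}}."""
    n = len(gold)
    intent_hits = frame_hits = 0
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        intent_ok = g["intent"] == p["intent"]
        g_slots, p_slots = set(g["slots"].items()), set(p["slots"].items())
        intent_hits += intent_ok
        # A frame counts only if the intent and every slot are correct.
        frame_hits += intent_ok and g_slots == p_slots
        tp += len(g_slots & p_slots)
        n_pred += len(p_slots)
        n_gold += len(g_slots)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"intent_acc": intent_hits / n, "slot_f1": f1, "frame_acc": frame_hits / n}


gold = [
    {"intent": "Hotel_Book", "slots": {"people": "2"}},
    {"intent": "Hotel_Book", "slots": {"star": "5", "day": "sunday"}},
]
pred = [
    {"intent": "Hotel_Book", "slots": {"people": "2"}},
    {"intent": "Hotel_Book", "slots": {"star": "5"}},  # misses day=sunday
]
scores = evaluate_lu(gold, pred)
print(scores)
```

In this toy case the intent is always right, but one missed slot halves the frame accuracy, which is why frame accuracy is the strictest of the three.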

(34)

Modular Task-Oriented Dialogue Systems

Dialogue State Tracking

34

(35)

Dialogue State Tracking

DST is a dialogue-level task that maps partial dialogues into dialogue states.

Input: a dialogue / a turn with its previous state

Output: dialogue state (e.g. slot-value pairs)

User: Can you help me book a 5-star hotel on Sunday?

Dialogue state: Hotel_Book ( star=5, day=sunday )

User: For two people, thanks!

NLU output: people_num=2

Dialogue state: Hotel_Book ( star=5, day=sunday, people_num=2 )

[Figure: the DST module highlighted in the NLU ⇒ DST ⇒ DP ⇒ NLG pipeline]

(36)

Dialogue State Tracking

request (restaurant; foodtype=Thai)

inform (area=centre)

request (address)

bye ()

36

(37)

Dialogue State Tracking

Requires Hand-Crafted States

User

Intelligent Agent

User: find a good eating place for taiwanese food

User: i want it near to my office

Hand-crafted states (which slots are filled): NULL; location; rating; type; loc, rating; rating, type; loc, type; all

37


(39)

Dialogue State Tracking

Handling Errors and Confidence

User

Intelligent Agent

User: find a good eating place for taixxxx food

Competing LU hypotheses:

FIND_RESTAURANT rating=“good” type=“taiwanese”

FIND_RESTAURANT rating=“good” type=“thai”

FIND_RESTAURANT rating=“good”

Hand-crafted states: NULL; location; rating; type; loc, rating; rating, type; loc, type; all

Candidates for the “rating, type” state: rating=“good”, type=“thai” or rating=“good”, type=“taiwanese”?

39

(40)

DST Problem Formulation

The DST dataset consists of

Goal: for each informable slot

e.g. price=cheap

Requested: slots by the user

e.g. moviename

Method: search method for entities

e.g. by constraints, by name

The dialogue state is

the distribution over possible slot-value pairs for goals

the distribution over possible requested slots

the distribution over possible methods
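As a minimal sketch, the three parts of the dialogue state can be held as plain dictionaries of probabilities. The raw scores, the slot names beyond those on the slide, and the choice to treat each requested slot as an independent probability are assumptions for illustration only.

```python
def normalize(scores):
    """Turn non-negative scores into a probability distribution."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def top_hypothesis(dist):
    """Return the most probable (value, probability) pair."""
    return max(dist.items(), key=lambda kv: kv[1])


state = {
    # One distribution over values for each informable slot.
    "goal": {"price": normalize({"cheap": 3.0, "moderate": 1.0})},
    # Probability that each slot was requested (assumed independent per slot).
    "requested": {"moviename": 0.9},
    # Distribution over search methods.
    "method": normalize({"by_constraints": 4.0, "by_name": 1.0}),
}

print(top_hypothesis(state["goal"]["price"]))
print(top_hypothesis(state["method"]))
```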

40

(41)

Dialogue State Tracking (DST)

Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input

S: How can I help you?

U: Book a table at Sumiko for 5

Slot        Value
# people    5 (0.5)
time        5 (0.5)

S: How many people?

U: 3

Slot        Value
# people    3 (0.8)
time        5 (0.8)

41

(42)

Multi-Domain Dialogue State Tracking

A full representation of the system's belief of the user's goal at any point during the dialogue

Used for making API calls

U: I wanna buy two tickets for tonight at the Shoreline theater.

S: Which movie are you interested in?

U: Inferno.

Belief state, Movies domain (the figure shades candidate values from less likely to more likely):

Date       11/15/17
Time       6 pm, 7 pm, 8 pm, 9 pm
#People    2
Theater    Century 16 Shoreline
Movie      Inferno

42

(43)

Multi-Domain Dialogue State Tracking

A full representation of the system's belief of the user's goal at any point during the dialogue

Used for making API calls

U: I wanna buy two tickets for tonight at the Shoreline theater.

S: Which movie are you interested in?

U: Inferno.

S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?

U: We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?

Belief state, Movies domain (the figure shades candidate values from less likely to more likely):

Date       11/15/17
Time       6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm
#People    2
Theater    Century 16 Shoreline
Movie      Inferno

Belief state, Restaurants domain:

Date          11/15/17
Time          6:00 pm, 6:30 pm, 7:00 pm
Restaurant    Cascal
#People       2

43

(44)

Multi-Domain Dialogue State Tracking

A full representation of the system's belief of the user's goal at any point during the dialogue

Used for making API calls

U: Inferno.

S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?

U: We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?

S: Cascal has a table for 2 at 6pm and 7:30pm.

U: OK, let me get the table at 6 and tickets for the 7:30 showing.

Belief state, Movies domain (the figure shades candidate values from less likely to more likely):

Date       11/15/17
Time       6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm
#People    2
Theater    Century 16 Shoreline
Movie      Inferno

Belief state, Restaurants domain:

Date          11/15/17
Time          6:00 pm, 6:30 pm, 7:00 pm
Restaurant    Cascal
#People       2

44

(45)

Discriminative DST – Single Turn

Data

• Observations labeled w/ dialogue state

Model

• Neural networks

• Ranking models

Prediction

• Distribution over dialogue states – Dialogue State Tracking

45

(46)

DNN for DST

[Figure: multi-turn conversation ⇒ feature extraction ⇒ DNN ⇒ state of this turn]

A slot value distribution for each slot

(47)

Discriminative DST – Multiple Turns

Data

• Sequence of observations labeled w/ dialogue states

Model

• Sequential models

– Recurrent neural networks (RNN)

Prediction

• Distribution over dialogue states – Dialogue State Tracking

47

(48)

Recurrent Neural Network (RNN)

Elman-type

Jordan-type

48

(49)

RNN-Based DST

Idea: internal memory for representing dialogue context

Input

most recent dialogue turn

last machine dialogue act

dialogue state

memory layer

Output

update its internal memory

distribution over slot values

(50)

RNN-CNN DST

(Mrkšić+, 2015)

(Figure from Wen et al, 2016)

http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777

50

(51)

Global-Locally Self-Attentive DST

(Zhong+, 2018)

More advanced encoder

Global modules share parameters for all slots

Local modules learn slot-specific feature representations

http://www.aclweb.org/anthology/P18-1135

51

(52)

Generative DST

Generating the state as a sequence (Lei+, 2018) or dialogue state updates (Lin+, 2020)

(Dialogue history) ⇒ (slot1=val, slot2=val, …)

Given a dialogue and a slot, generate the value of the slot (Wu+, 2019; Gao+, 2019; Ren+, 2019; Zhou & Small, 2019; Kim+, 2019; Le+, 2020) ⇒ requires multiple forward passes, one per slot

(Dialogue history, slot1) ⇒ val

52

(53)

Handling Unknown Slot Values

(Xu & Hu, 2018)

Issue: fixed value sets in DST

http://aclweb.org/anthology/P18-1134

Input: <sys> would you like some Thai food <usr> I prefer Italian one <food>

[Figure: the attention distribution over the input words and the special candidates other, dontcare and none; the pointer selects “Italian”]

Pointer networks for generating unknown values

(54)

TRADE: Transferable DST

(Wu+, 2019)

Utterance Encoder: encodes the dialogue utterances, e.g.

…....

Bot: Which area are you looking for the hotel?

User: There is one at east town called Ashley Hotel.

Slot Gate: maps the context vector to PTR, NONE or DONTCARE

State Generator: generates the value for a (domain, slot) pair, e.g. “Ashley” for domain hotel and slot name

Domains: Hotel, Train, Attraction, Restaurant, Taxi

Slots: Price, Area, Day, Departure, name, LeaveAt, food, etc.

(55)

TripPy: Handling OOV & Rare Values

(Heck+, 2020)

55

(56)

DST Evaluation

Dialogue State Tracking Challenges

DSTC2-3, human-machine

DSTC4-5, human-human

DSTC8, human-machine

Metric

Tracked state accuracy with respect to user goal

Recall/Precision/F-measure of individual slots

Input Dialogue:

USER: Can you help me book a 5-star hotel on Sunday?

SYSTEM: For how many people?

USER: For two people, thanks!

Output Dialogue State:

Hotel_Book (star=5, day=sunday)

Hotel_Book (star=5, day=sunday, people_num=2)

⇒ Slot Acc / Joint Acc
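Slot accuracy and joint accuracy can be sketched as below, assuming one predicted state per turn stored as a slot-to-value dictionary. The function name `dst_accuracy` and the wrong prediction in the second turn are made up for the example; a missing slot is treated as an empty value.

```python
def dst_accuracy(gold_states, pred_states):
    """Each state is a dict slot -> value, one per turn."""
    slots = sorted({s for state in gold_states + pred_states for s in state})
    # Joint accuracy: the whole state of a turn must match.
    joint = sum(g == p for g, p in zip(gold_states, pred_states))
    # Slot accuracy: each slot of each turn is scored separately.
    slot_hits = sum(
        g.get(s) == p.get(s)
        for g, p in zip(gold_states, pred_states)
        for s in slots
    )
    n_turns = len(gold_states)
    return {
        "joint_acc": joint / n_turns,
        "slot_acc": slot_hits / (n_turns * len(slots)),
    }


gold = [
    {"star": "5", "day": "sunday"},
    {"star": "5", "day": "sunday", "people_num": "2"},
]
pred = [
    {"star": "5", "day": "sunday"},
    {"star": "5", "day": "sunday", "people_num": "3"},  # wrong value
]
result = dst_accuracy(gold, pred)
print(result)
```

A single wrong slot makes the whole turn count as a joint error, so joint accuracy is typically much lower than slot accuracy.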

(57)

Dialog State Tracking Challenge (DSTC)

(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)

Challenge   Type            Domain                Data Provider   Main Theme
DSTC1       Human-Machine   Bus Route             CMU             Evaluation Metrics
DSTC2       Human-Machine   Restaurant            U. Cambridge    User Goal Changes
DSTC3       Human-Machine   Tourist Information   U. Cambridge    Domain Adaptation
DSTC4       Human-Human     Tourist Information   I2R             Human Conversation
DSTC5       Human-Human     Tourist Information   I2R             Language Adaptation

57

(58)

DSTC4-5

◉ Type: Human-Human

◉ Domain: Tourist Information

Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.

Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?

Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.

Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.

Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures.

Guide: Okay, um-

Tourist: Hm.

Guide: Let's try this one, okay?

Tourist: Okay.

Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.

Tourist: Um. Wow, that's good.

Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.

Tourist: Oh okay. That's- the price is reasonable actually. It's good.

{Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ}

{Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}

58

(59)

Multi-Domain DST Data

MultiWoZ 2.0 ⇒ 2.1 ⇒ 2.2 ⇒ 2.3 ⇒ ……

SGD: natural language described schema for better scalability

59

(60)

MultiWOZ 2.1 Leaderboard

60

(61)

Modular Task-Oriented Dialogue Systems

Dialogue Policy Learning

61

(62)

Dialogue Policy Learning

DP decides the system action for interacting with users based on dialogue states.

Input: dialogue state + KB results

Output: system action (speech-act + slot-value pairs)

User: Can you help me book a 5-star hotel on Sunday?

User: For two people, thanks!

Dialogue state: Hotel_Book ( star=5, day=sunday, people_num=2 )

System action: Inform ( hotel_name=B&B )

[Figure: the DP module highlighted in the NLU ⇒ DST ⇒ DP ⇒ NLG pipeline, with access to the KB]

(63)

Dialogue Policy Learning

User acts:

request (restaurant; foodtype=Thai)

inform (area=centre)

request (address)

bye ()

System acts:

greeting ()

request (area)

inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai)

inform (address=24 Green street)

63

(64)

Supervised vs. Reinforcement

Supervised: learning from teacher

“Hello” ⇒ Say “Hi”

“Bye bye” ⇒ Say “Good bye”

Reinforcement: learning from critics

[Figure: the agent says “Hello ☺”, the conversation continues (“……”, “OXX???”), and only the final feedback “Bad” is given]

64

(65)

Dialogue Policy Optimization

Dialogue management in a RL framework

[Figure: RL loop. The Agent (Dialogue Manager) sends Action A through Natural Language Generation to the Environment (User); Observation O comes back through Language Understanding, together with Reward R]

Select the best action that maximizes the future reward

(66)

Reward for RL ≅ Evaluation for System

Dialogue is a special RL task

Humans are involved in both the interaction and the rating (evaluation) of a dialogue

Fully human-in-the-loop framework

Rating: correctness, appropriateness, and adequacy

- Expert rating: high quality, high cost

- User rating: unreliable quality, medium cost

- Objective rating: check desired aspects, low cost

66

(67)

Dialogue Reinforcement Learning Signal

Typical reward function

-1 per-turn penalty

Large reward at completion if successful

Typically requires domain knowledge

✔ Simulated user

✔ Paid users (Amazon Mechanical Turk)

✖ Real users


The user simulator is usually required for dialogue system training before deployment
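The typical reward function described on this slide can be written as a one-line sketch. The bonus of 20 is a hypothetical value chosen for illustration; in practice the magnitude is a design choice that depends on the domain.

```python
def dialogue_reward(num_turns, success, success_bonus=20, turn_penalty=-1):
    """Per-turn penalty plus a large bonus if the task was completed.

    success_bonus=20 is an assumed value, not one taken from the slides.
    """
    return turn_penalty * num_turns + (success_bonus if success else 0)


print(dialogue_reward(6, True))
print(dialogue_reward(6, False))
```

The per-turn penalty pushes the policy towards shorter dialogues, while the completion bonus keeps it from ending early without finishing the task.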

67

(68)

Neural Dialogue Manager

(Li et al., 2017)

Deep Q-network for training dialogue policy

Input: current semantic frame observation, database returned results

Output: system action

Example: semantic frame request_movie (genre=action, date=this weekend) ⇒ system action/policy request_location

[Figure: the DQN-based Dialogue Management (DM) module interacts with a Simulated User and a Backend DB]

https://arxiv.org/abs/1703.01008
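The core of Q-learning, which a deep Q-network approximates with a neural network, is the bootstrapped target and the greedy action choice. The sketch below shows only these two pieces in plain Python; the Q-values, the discount factor 0.9, and the action names other than request_location are hypothetical, and it is not the training loop of the cited system.

```python
def q_learning_target(reward, next_q_values, done, gamma=0.9):
    """Target y = r if the dialogue ended, else r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


def greedy_action(q_values):
    """Pick the system action with the highest estimated Q-value."""
    return max(q_values, key=q_values.get)


# Hypothetical Q-values for the current dialogue state.
q = {"request_location": 1.2, "request_date": 0.7, "inform_movie": -0.3}
action = greedy_action(q)

# One non-terminal turn with the -1 per-turn penalty as reward.
target = q_learning_target(-1, [1.0, 2.0], done=False, gamma=0.9)
print(action, target)
```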

68

(69)

E2E Task-Completion Bot (TC-Bot)

(Li et al., 2017)

Idea: SL for each component and RL for end-to-end training the neural dialogue system

[Figure: a User Simulation (User Goal, User Model) interacts in natural language with the Neural Dialogue System. LU tags words w_i, w_{i+1}, … with <slot> labels and predicts the <intent> at EOS; DST maintains the state s_t from s_1 … s_n; Dialogue Policy Learning selects among actions a_1 … a_k using the Knowledge Database; NLG generates the word sequence w_0 w_1 w_2 … EOS]

Example: “Are there any action movies to see this weekend?” ⇒ request_location

69

(70)

User goal: Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.

RULE BASED SYSTEM

User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?

Agent: Which theater do you want?

User: AMC pacific place 11 theater, please!

Agent: How many tickets do you want?

User: Two!

Agent: What date?

User: Tomorrow!

Agent: Which city do you want?

User: Seattle!

Agent: Do you have any specification for zip code?

User: No, I do not care.

Agent: Do you have any distance constraint?

User: No, I do not care.

Agent: How about video format? 2d? Imax 3d?

User: Any kind is okay.

Agent: Do you have any constraint to price?

User: No!

Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.

User: Okay, thanks.

Agent: Thanks!

REINFORCEMENT LEARNING SYSTEM

User: I want two tickets for movie ‘Deadpool’.

Agent: What is the starttime you prefer?

User: 9:00 PM

Agent: Which day?

User: Tomorrow

Agent: Which city?

User: Seattle

Agent: Which theater do you prefer?

User: AMC Pacific Place 11 theater.

Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.

User: Okay, thanks.

Agent: Thanks!

Reinforcement Learning Agent

(Li et al., 2017)

Reinforcement learning system: skips the requests the user may not care about to improve efficiency

Rule-based system issue: no notion about what requests can be skipped

70

(71)

Online Training

(Su+, 2015; Su+, 2016)

Policy learning from real users

Infer reward directly from dialogues (Su et al., 2015)

User rating (Su et al., 2016)

Reward modeling on user binary success rating

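[Figure: an embedding function maps the dialogue to a dialogue representation; the reward model predicts success/fail from it, which serves as the reinforcement signal, and can query the user for a rating]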

(72)

Interactive RL for DP

(Shah+, 2016)

Immediate Feedback

https://research.google.com/pubs/pub45734.html

Use a third agent for providing interactive feedback to the policy

72

(73)

Planning – Deep Dyna-Q

(Peng+, 2018)

◉ Idea: learning with real users with planning

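[Figure: Deep Dyna-Q loop. Human conversational data initializes the policy model (imitation learning) and the world model (supervised learning). Acting with the user yields real experience, which is used for direct reinforcement learning of the policy and for world model learning; the world model, acting as a user model, supplies simulated experience for planning]

Policy learning suffers from the poor quality of fake experiences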

73

(74)

Robust Planning – D3Q (Su+, 2018)

Idea: add a discriminator to filter out the bad experiences

[Figure: the Deep Dyna-Q loop with a discriminator added. The discriminator is trained (discriminative training) on real experience and on simulated experience from the world model, and controls which simulated experience is used for planning (controlled planning). Inside the system, NLU produces the semantic frame, DST the state representation, policy learning the system action, and NLG the response to the user]

S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, “Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning,” (to appear) in Proc. of EMNLP, 2018.

74

(75)

Robust Planning – D3Q

(Su+, 2018)

Policy learning is more robust and shows improvement in human evaluation

75

(76)

Multi-Domain – Hierarchical RL

(Peng+, 2017)

Travel Planning

Actions

Set of tasks that need to be fulfilled collectively!

Build a DM for cross-subtask constraints (slot constraints)

Temporally constructed goals

hotel_check_in_time > departure_flight_time

#flight_tickets = #people checking in the hotel

hotel_check_out_time < return_flight_time

https://arxiv.org/abs/1704.03084
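The three cross-subtask constraints listed above can be checked mechanically once both subtasks have filled their slots. The sketch below is only an illustration: the dictionary keys beyond the slot names on the slide (such as `hotel_guests`), the function name, and the dates are assumptions.

```python
from datetime import datetime


def violated_constraints(plan):
    """Return the names of the cross-subtask constraints that the plan breaks."""
    checks = {
        "check_in_after_departure": plan["hotel_check_in_time"] > plan["departure_flight_time"],
        "tickets_match_guests": plan["flight_tickets"] == plan["hotel_guests"],
        "check_out_before_return": plan["hotel_check_out_time"] < plan["return_flight_time"],
    }
    return [name for name, ok in checks.items() if not ok]


# A made-up travel plan combining the flight and hotel subtasks.
plan = {
    "departure_flight_time": datetime(2021, 6, 1, 9, 0),
    "hotel_check_in_time": datetime(2021, 6, 1, 15, 0),
    "hotel_check_out_time": datetime(2021, 6, 4, 11, 0),
    "return_flight_time": datetime(2021, 6, 4, 18, 0),
    "flight_tickets": 2,
    "hotel_guests": 3,
}
print(violated_constraints(plan))
```

A dialogue manager that treats the flight and hotel subtasks independently has no way to notice such a mismatch, which motivates modeling the subtasks jointly.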

76

(77)

Multi-Domain – Hierarchical RL

(Peng+, 2017)

Model makes decisions over two levels: meta-controller & controller

The agent learns these policies simultaneously

Meta-Controller: policy of optimal sequence of goals to follow, $\pi_g(g_t, s_t; \theta_1)$

Controller: policy $\pi_{a,g}(a_t, g_t, s_t; \theta_2)$ for each sub-goal $g_t$

(mitigate reward sparsity issues)

Multiple policies need to collaborate with each other for better multi-domain interactions

(78)

Dialogue Policy Evaluation

Metrics

Turn-level evaluation: system action accuracy

Dialogue-level evaluation: task success rate, reward

Dialogue State: Hotel_Book ( star=5, day=sunday, people_num=2 )

KB State: rest1=B&B

System Action: inform ( hotel_name=B&B )

(79)

Modular Task-Oriented Dialogue Systems

Natural Language Generation

79

(80)

Natural Language Generation

NLG maps system actions to natural language responses.

Input: system speech-act + slot-value (optional)

Output: natural language response

User: Can you help me book a 5-star hotel on Sunday?

User: For two people, thanks!

System action: Inform ( hotel_name=B&B )

System response: I have booked hotel B&B for you.

[Figure: the NLG module highlighted in the NLU ⇒ DST ⇒ DP ⇒ NLG pipeline]

(81)

Template-Based NLG

Define a set of rules to map frames to natural language

Pros: simple, error-free, easy to control

Cons: time-consuming, rigid, poor scalability

Semantic Frame              Natural Language
confirm()                   “Please tell me more about the product you are looking for.”
confirm(area=$V)            “Do you want somewhere in the $V?”
confirm(food=$V)            “Do you want a $V restaurant?”
confirm(food=$V,area=$W)    “Do you want a $V restaurant in the $W?”
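A template-based generator for the rules above can be sketched with Python's standard `string.Template`, whose `$V` placeholder syntax happens to match the slide. The lookup key (act plus ordered slot names) and the function name `realize` are assumptions made for this example.

```python
from string import Template

# Templates keyed by (dialogue act, ordered slot names).
TEMPLATES = {
    ("confirm", ()): Template("Please tell me more about the product you are looking for."),
    ("confirm", ("area",)): Template("Do you want somewhere in the $V?"),
    ("confirm", ("food",)): Template("Do you want a $V restaurant?"),
    ("confirm", ("food", "area")): Template("Do you want a $V restaurant in the $W?"),
}


def realize(act, slots):
    """slots: list of (name, value) pairs in template order."""
    names = tuple(name for name, _ in slots)
    # The first slot value fills $V, the second fills $W.
    values = dict(zip("VW", (value for _, value in slots)))
    return TEMPLATES[(act, names)].substitute(values)


print(realize("confirm", [("food", "Thai"), ("area", "centre")]))
print(realize("confirm", []))
```

A frame with no matching template raises a `KeyError`, which makes the scalability problem concrete: every new act and slot combination needs its own hand-written rule.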

81

(82)

RNN-Based LM NLG

(Wen et al., 2015)

Dialogue act: Inform(name=Din Tai Fung, food=Taiwanese)

Dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…

Delexicalisation: “<BOS> Din Tai Fung serves Taiwanese .” ⇒ “<BOS> SLOT_NAME serves SLOT_FOOD .”

Input: <BOS> SLOT_NAME serves SLOT_FOOD .

Output: SLOT_NAME serves SLOT_FOOD . <EOS>

The output is conditioned on the dialogue act
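Delexicalisation and its inverse can be sketched with simple string replacement. This is an illustration only: it assumes slot values appear verbatim in the sentence, and the function names are hypothetical.

```python
def delexicalise(sentence, slots):
    """Replace each slot value with a SLOT_<NAME> placeholder."""
    for name, value in slots.items():
        sentence = sentence.replace(value, "SLOT_" + name.upper())
    return sentence


def relexicalise(template, slots):
    """Inverse step: fill the placeholders back in with the slot values."""
    for name, value in slots.items():
        template = template.replace("SLOT_" + name.upper(), value)
    return template


slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(template)
print(relexicalise(template, slots))
```

Training on the delexicalised form lets one sentence pattern cover every restaurant name and food type, and the generated output is relexicalised before it is shown to the user.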

(83)

Semantic Conditioned LSTM

(Wen et al., 2015)

Issue: semantic repetition

Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.

Din Tai Fung is a child friendly restaurant, and also allows kids.

Idea: using gate mechanism to control the generated semantics (dialogue act/slots)

[Figure: an LSTM cell (gates i_t, f_t, o_t, cell C_t, inputs x_t and h_{t-1}, output h_t) is extended with a DA cell, in which a gate r_t computed from x_t and h_{t-1} updates the dialogue act vector from d_{t-1} to d_t; d_0 is the dialog act 1-hot representation 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, … of Inform(name=Seven_Days, food=Chinese)]

(84)

Structural NLG

(Sharma+, 2017; Nayak+, 2017)

Delexicalized slots do not consider the word level information

Slot value-informed sequence to sequence models

84

(85)

Contextual NLG

(Dušek and Jurčíček, 2016)

Goal: adapt to the user’s way of speaking and provide context-aware responses

Context encoder

Seq2Seq model

https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203

85

(86)

Controlled Text Generation

(Hu et al., 2017)

Idea: NLG based on generative adversarial network (GAN) framework

c: targeted sentence attributes

86

(87)

Issues in NLG

Issue

NLG tends to generate shorter sentences

NLG may generate grammatically-incorrect sentences

Solution

Generate word patterns in an order

Consider linguistic patterns

87

(88)

Hierarchical NLG w/ Linguistic Patterns

(Su et al., 2018)

Input Semantics: name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One]

ENCODER: the semantic 1-hot representation [ … 1, 0, 0, 1, 0, …] is encoded by a Bidirectional GRU Encoder into h_enc

Hierarchical Decoder (GRU decoders, one linguistic pattern per layer):

DECODING LAYER1 (NOUN + PROPN + PRON): All Bar One place it Midsummer House

DECODING LAYER2 (VERB): All Bar One is priced place it is called Midsummer House

DECODING LAYER3 (ADJ + ADV): All Bar One is moderately priced Italian place it is called Midsummer House

DECODING LAYER4 (Others): Near All Bar One is a moderately priced Italian place it is called Midsummer House

Each decoder step takes the output from the last layer, y_t^{i-1}, and the last output, y_{t-1}^i

1. Repeat-input

2. Inner-Layer Teacher Forcing

3. Inter-Layer Teacher Forcing

4. Curriculum Learning

88

(89)

Fine-Tuning Pre-Trained GPT-2

Fine-tuning for conditional generation

89

Pre-trained models have better capability of generating fluent sentences

(90)

NLG Evaluation

Automatic metrics

Human evaluation

90

System Action: inform(name=B&B)

System Response: I have booked hotel B&B for you.

(91)

Automatic Evaluation

Perplexity ⇒ how likely the model is to generate the gold response

N-gram overlapping ⇒ BLEU etc.

Slot error rate ⇒ whether the given slots are mentioned

Distinct N-grams ⇒ response diversity

[Figure: a scorer compares the model response “Do you have any other plans this weekend?” with the gold response “What do you do in the coming days?” and outputs a score]
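Two of the automatic metrics listed above are simple enough to sketch directly. The versions below are simplified assumptions: distinct-n uses whitespace tokenization, and the slot error rate counts only missing slot values (published variants also count redundant ones). The function names are hypothetical.

```python
def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams over all responses."""
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in (r.split() for r in responses)
        for i in range(len(tokens) - n + 1)
    ]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def slot_error_rate(response, slots):
    """Fraction of required slot values that are missing from the response."""
    missing = sum(value not in response for value in slots.values())
    return missing / len(slots)


print(distinct_n(["a b a b"], 1))
print(distinct_n(["a b a b"], 2))
print(slot_error_rate("I have booked hotel B&B for you.", {"name": "B&B"}))
```

Unlike BLEU, neither metric needs a gold response: distinct-n looks only at the generated text, and the slot error rate only at the input semantics.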

(92)

Human Evaluation Likert

Judges are asked to give ratings 0-5 according to “Humanness, Fluency and Coherence”

Dialogue History: I could teach a few classes this weekend and I don’t know what to do

Model Response: Do you have any other plans this weekend?

Human Evaluator ⇒ Likert ratings: Humanness, Fluency, Coherence

(93)

Human Evaluation Dynamic Likert

A human judge interacts with the model and gives ratings 0-5 according to “Humanness, Fluency and Coherence”

[Figure: the human evaluator converses with the model and then gives Likert ratings for Humanness, Fluency and Coherence; a second panel, labeled ACUTE-EVAL (Li et al., 2019), shows human evaluators judging after the conversation]

(94)

Human Evaluation A/B

Judges are asked to choose the best one according to “Humanness, Fluency and Coherence”

Dialogue History ⇒ Model A Response: Do you have any other plans this weekend?

Dialogue History ⇒ Model B Response: I don’t know

Human Evaluator ⇒ A / B Testing: Humanness, Fluency, Coherence
