Towards Conversational AI

(1)


Applied Deep Learning

May 31st, 2021 http://adl.miulab.tw

(2)

What can machines achieve now or in the future?

Iron Man (2008)


(3)

Language Empowering Intelligent Assistants

Apple Siri (2011) Google Now (2012)

Google Home (2016)

Microsoft Cortana (2014)

Amazon Alexa/Echo (2014)

Google Assistant (2016)

Apple HomePod (2017) Facebook Portal (2019)


(4)

Why Natural Language?

◉

Global Digital Statistics (2021 January)

Total Population: 7.83B
Unique Mobile Users: 5.22B (66.6%)
Internet Users: 4.66B (59.5%)
Active Mobile Social Users: 4.20B (53.6%)

Device input is evolving towards speech, which is more natural and convenient.

(5)

Why and When We Need?

“I want to chat” ⇒ Turing Test (talk like a human)
“I have a question” ⇒ Information consumption
“I need to get this done” ⇒ Task completion
“What should I do?” ⇒ Decision support

Social Chit-Chat vs. Task-Oriented Dialogues

• Is this course good to take?

• Book me the train ticket from Kaohsiung to Taipei

• Reserve a table at Din Tai Fung for 5 people, 7PM tonight

• Schedule a meeting with Vivian at 10:00 tomorrow

• What is today’s agenda?

• What does NLP stand for?


(6)

Intelligent Assistants

Task-Oriented


(7)

App → Bot

◉

A bot is responsible for a “single” domain, similar to an app

Users can initiate dialogues instead of following the GUI design


(8)

Two Branches of Conversational AI

Chit-Chat

Task-Oriented


(9)

Task-Oriented Dialogue Systems


(10)

Task-Oriented Dialogue Systems

(Young, 2000)

LU: Language Understanding
DST: Dialogue State Tracking
DP: Dialogue Policy Learning
NLG: Natural Language Generation

User (through ASR): Can you help me book a 5-star hotel on Sunday?
System (through TTS): For how many people?

(11)

Modular Task-Oriented Dialogue Systems

Language Understanding


(12)

Language Understanding (LU)

◉

LU is a turn-level task that maps utterances to semantic frames.

○

Input: raw user utterance

○

Output: semantic frame (e.g. speech-act, intent, slots)

Example: “For two people, thanks!” ⇒ LU ⇒ people_num=2 (passed on to DST, DP, and NLG)

(13)

Language Understanding (LU)

◉

Pipelined

1. Domain Classification
2. Intent Classification
3. Slot Filling

(14)

1. Domain Identification

Requires Predefined Domain Ontology

find a good eating place for taiwanese food

User

Organized Domain Knowledge (Database)

Intelligent Agent

Restaurant DB Taxi DB Movie DB

Classification!


(15)

2. Intent Detection

Requires Predefined Schema

User

Intelligent Agent

Restaurant DB

FIND_RESTAURANT FIND_PRICE

FIND_TYPE :

Classification!


(16)

3. Slot Filling

Requires Predefined Schema

User

Intelligent Agent

Restaurant DB

Restaurant  Rating  Type
Rest 1      good    Taiwanese
Rest 2      bad     Thai
:           :       :

Utterance:         find a good eating place for taiwanese food
Sequence Labeling: O O B-rating O O O B-type O
Semantic Frame:    FIND_RESTAURANT (rating=“good”, type=“taiwanese”)
Query:             SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }
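The step from sequence labels to a semantic frame can be sketched as below. This is only an illustrative sketch: the function name `bio_to_slots` and the dictionary layout of the frame are assumptions, not part of the slides, and `I-` tags (for multi-word values) are handled even though the slide example only uses `B-` tags.

```python
def bio_to_slots(tokens, tags):
    """Collect slot values from BIO tags (B-x starts a slot, I-x continues it)."""
    slots = {}
    current = None  # name of the slot currently being extended, if any
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = token
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + token
        else:
            current = None  # an O tag (or a mismatched I- tag) closes the slot
    return slots


# Example taken from the slide; the intent label comes from the intent detector.
tokens = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
frame = {"intent": "FIND_RESTAURANT", "slots": bio_to_slots(tokens, tags)}
print(frame)
```

The resulting slot dictionary is what a backend query such as the `SELECT restaurant {…}` example would be built from.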

(17)

Slot Tagging

(Yao et al, 2013; Mesnil et al, 2015)

◉

Variations:

a. RNNs with LSTM cells

b. Input, sliding window of n-grams

c. Bi-directional LSTMs

[Figure: three taggers over words 𝑤₀ … 𝑤_𝑛 with outputs 𝑦₀ … 𝑦_𝑛: (a) LSTM with hidden states ℎ₀ … ℎ_𝑛; (b) LSTM-LA with hidden states ℎ₀ … ℎ_𝑛; (c) bLSTM with forward states ℎ₀^𝑓 … ℎ_𝑛^𝑓 and backward states ℎ₀^𝑏 … ℎ_𝑛^𝑏]

http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380


(18)

Slot Tagging

(Kurata et al., 2016; Simonnet et al., 2015)

◉

Encoder-decoder networks

○

Leverages sentence level information

◉

Attention-based encoder-decoder

○

Use of attention (as in MT) in the encoder-decoder network

○

Attention is estimated using a feed-forward network with inputs h_t and s_t at time t

[Figure: encoder states ℎ₀ … ℎ_𝑛 over words 𝑤₀ … 𝑤_𝑛 (also shown in reversed order 𝑤_𝑛 … 𝑤₀), decoder states 𝑠₀ … 𝑠_𝑛, and a context vector c_i computed from ℎ₀…ℎ_𝑛]

http://www.aclweb.org/anthology/D16-1223


(19)

Joint Semantic Frame Parsing

◉ Sequence-based (Hakkani-Tur+, 2016)

◉ Parallel-based (Liu and Lane, 2016)

[Figure: an RNN (weights U, W, V; hidden states h_t-1, h_t, h_t+1, h_T+1) reads “taiwanese food please EOS”. Slot filling tags the words (B-type, O, …) and intent prediction outputs FIND_REST at EOS.]

Model                                 Attention Mechanism   Intent-Slot Relationship
Sequence-based (Hakkani-Tur+, ‘16)    X                     Δ (Implicit)
Parallel-based (Liu & Lane, ‘16)      √                     Δ (Implicit)
Slot-Gated Joint Model                √                     √ (Explicit)

(20)

Slot-Gated Joint SLU

(Goo+, 2018)

[Figure: a BLSTM encodes the word sequence 𝑥₁ … 𝑥₄. Slot attention gives a slot context 𝑐_𝑖^𝑆 per word, and intent attention gives the intent context 𝑐^𝐼 used for the intent 𝑦^𝐼. The slot gate combines 𝑐_𝑖^𝑆 and 𝑐^𝐼 (through 𝑊, tanh, and 𝑣) into 𝑔, which feeds the slot sequence 𝑦₁^𝑆 … 𝑦₄^𝑆.]

Slot Gate

$g = \sum v \cdot \tanh(c_i^S + W \cdot c^I)$

Slot Prediction

$y_i^S = \mathrm{softmax}(W^S (h_i + g \cdot c_i^S) + b^S)$

$g$ will be larger if slot and intent are better related
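A minimal numeric sketch of the slot gate and the gated slot prediction, written in plain Python so the arithmetic is visible. The function names, the toy vectors, and the identity weight matrices are assumptions for illustration only; the placement of the parentheses in the slot prediction follows one reading of the slide formula.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def slot_gate(c_slot, c_intent, W, v):
    """g = sum_k v[k] * tanh(c_slot[k] + (W c_intent)[k])."""
    projected = matvec(W, c_intent)
    return sum(vk * math.tanh(cs + p) for vk, cs, p in zip(v, c_slot, projected))


def slot_prediction(h, c_slot, g, W_s, b_s):
    """y = softmax(W_s (h + g * c_slot) + b_s)."""
    gated = [hk + g * ck for hk, ck in zip(h, c_slot)]
    logits = [z + b for z, b in zip(matvec(W_s, gated), b_s)]
    return softmax(logits)


# Toy values (made up): 2-dimensional contexts, 2 slot labels.
identity = [[1.0, 0.0], [0.0, 1.0]]
c_slot, c_intent = [0.5, -0.2], [0.1, 0.3]
g = slot_gate(c_slot, c_intent, W=identity, v=[1.0, 1.0])
y = slot_prediction(h=[0.2, 0.4], c_slot=c_slot, g=g, W_s=identity, b_s=[0.0, 0.0])
print(g, y)
```

A larger `g` scales up the contribution of the slot context `c_slot` to the prediction, which is the sense in which the gate ties slot filling to the intent.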

(21)

Contextual Language Understanding

◉

User utterances are highly ambiguous in isolation

Domain: Restaurant Booking

U: Book a table for 10 people tonight.
S: Which restaurant would you like to book a table for?
U: Cascal, for 6. ⇒ is “6” the #people or the time?

(22)

End-to-End Memory Networks

(Sukhbaatar et al, 2015)

[Figure labels: the history utterances fill memory slots m₀ … m_i … m_n-1; u is the current utterance]

U: “i d like to purchase tickets to see deepwater horizon”

S: “for which theatre”

U: “angelika”

S: “you want them for angelika theatre?”

U: “yes angelika”

S: “how many tickets would you like ?”

U: “3 tickets for saturday”

S: “What time would you like ?”

U: “Any time on saturday is fine”

S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”

U: “Let’s do 5:40”


(23)

E2E MemNN for Contextual LU

(Chen+, 2016)

Idea: additionally incorporating contextual knowledge during slot tagging

1. Sentence Encoding: the contextual sentence encoder RNN_mem turns each history utterance {x_i} into a memory representation m_i, and the sentence encoder RNN_in turns the current utterance c into u
2. Knowledge Attention: the knowledge attention distribution p_i is computed from the inner product of u and m_i
3. Knowledge Encoding: the weighted sum h = ∑ p_i m_i is passed through W_kg to give the knowledge encoding representation o

[Figure: an RNN tagger (weights U, W, V, M; words w_t-1, w_t; hidden states h_t-1, h_t) uses o to produce the slot tagging sequence y (y_t-1, y_t)]

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
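The three steps (sentence encoding, knowledge attention, knowledge encoding) can be sketched with toy vectors. This is a simplified sketch under stated assumptions: the encoders are skipped and the vectors are given directly, the names are hypothetical, and adding `u` to the weighted sum before applying `W_kg` is one reading of the figure rather than something the slide spells out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_encoding(u, memories, W_kg):
    """Attend over memory vectors with u and return (attention, encoding o)."""
    # Knowledge attention: inner product of u with every memory vector m_i.
    scores = [sum(a * b for a, b in zip(u, m)) for m in memories]
    p = softmax(scores)
    # Weighted sum of the memory vectors.
    h = [sum(p_i * m[k] for p_i, m in zip(p, memories)) for k in range(len(u))]
    # Knowledge encoding (assumption: h and u are summed before W_kg).
    combined = [hk + uk for hk, uk in zip(h, u)]
    o = [sum(w * x for w, x in zip(row, combined)) for row in W_kg]
    return p, o


# Toy example: the current utterance matches the first history utterance best.
u = [1.0, 0.0]
memories = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
p, o = knowledge_encoding(u, memories, W_kg=[[1.0, 0.0], [0.0, 1.0]])
print(p, o)
```

In the full model, `o` would be fed to the RNN tagger at every time step alongside the current word.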


(24)

E2E MemNN for Contextual LU

(Chen+, 2016)

0.69

0.13

0.16

U: “Let’s do 5:40”

U: “i d like to purchase tickets to see deepwater horizon”

S: “for which theatre”

U: “angelika”

S: “you want them for angelika theatre?”

U: “yes angelika”

S: “how many tickets would you like ?”

U: “3 tickets for saturday”

S: “What time would you like ?”

U: “Any time on saturday is fine”

S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”


(25)

Recent Advances in NLP

◉

Contextual Embeddings (ELMo & BERT)

○ Pretrained language models boost performance on many understanding tasks

(26)


(27)


(28)

Robustness – Adapting to ASR

(Huang & Chen, 2019)

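Stage 1: Pre-Training on Sequential Texts

[Figure: an LSTM language model with a linear output layer reads “What a day” and predicts “a day <EOS>”]

Stage 2: Pre-Training on Lattices

[Figure: a LatticeLSTM reads a word lattice whose arcs carry probabilities (e.g. “the, 1.0”; branches weighted 0.8 / 0.2 and 0.9 / 0.1)]

Fine-Tuning

[Figure: LatticeLSTM, max pooling, and a linear layer for classification]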

(29)

Robustness – Adapting to ASR

(Huang & Chen, 2019)

◉

Idea: lattices may include correct words

◉

Goal: feed lattices into Transformer

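[Figure: a word lattice for “<s> cheapest airfare to Milwaukee </s>” with competing arcs (airfare, fair, affair, air) and arc probabilities (1 on unambiguous arcs; 0.4, 0.3, 0.3 on competing ones) is fed as tokens 𝑤₁ 𝑤₂ . . . 𝑤_𝑚−1 𝑤_𝑚 between <S> and <E> into a Transformer encoder, followed by a linear layer that predicts 𝑦]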

Chao-Wei Huang and Yun-Nung Chen, “Adapting Pretrained Transformer to Lattices for Spoken Language Understanding,”

in Proceedings of 2019 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2019.

SLU performance is improved by leveraging the lattices without increasing training/inference time

(30)

Robustness – Adapting to ASR

(Huang & Chen, 2019)

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in ICASSP, 2019.

The contextual embeddings of the recognized texts should be similar to those of the ground-truth transcripts.

◉

Confusion-Aware Fine-Tuning

○

Supervised

○

Unsupervised


(31)

Scalability – Multilingual LU

(Upadhyay+, 2018)

◉

Source language: English (full annotations)

◉

Target language: Hindi (limited annotations)

RT: round trip, FC: from city, TC: to city, DDN: departure day name

http://shyamupa.com/papers/UFTHH18.pdf


(32)

Scalability – Multilingual LU

(Upadhyay+, 2018)

Train on Target (Lefevre et al, 2010):
English Train ⇒ MT ⇒ Hindi Train ⇒ Hindi Tagger; Hindi Test ⇒ Hindi Tagger ⇒ SLU Results

Test on Source (Jabaian et al, 2011):
Hindi Test ⇒ MT ⇒ English Test ⇒ English Tagger ⇒ SLU Results

Joint Training:
English Train (Large) + Hindi Train (Small) ⇒ Bilingual Tagger; Hindi Test ⇒ Bilingual Tagger ⇒ SLU Results

With joint training, an MT system is not required and both languages can be processed by a single model

http://shyamupa.com/papers/UFTHH18.pdf


(33)

LU Evaluation

◉

Metrics

○

Sub-sentence-level: domain/intent accuracy, slot F1

○

Sentence-level: whole frame accuracy

Utterance: For 2 people thanks
Slot: O B-people O O ⇒ Slot-F1
Domain: Hotel ⇒ Acc
Intent: Hotel_Book ⇒ Acc
Whole frame (domain + intent + slots) ⇒ Frame Accuracy
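The three metrics can be sketched as follows. This is an illustrative sketch, not an official scorer: the frame layout (a dict with `domain`, `intent`, and `slots`) is an assumption, and slot F1 is computed here over (slot, value) pairs rather than over tagged spans as chunk-based scorers do. The second example frame is made up to show a partial miss.

```python
def intent_accuracy(gold, pred):
    """Fraction of utterances whose intent is predicted correctly."""
    return sum(g["intent"] == p["intent"] for g, p in zip(gold, pred)) / len(gold)


def slot_f1(gold, pred):
    """Micro-averaged F1 over (slot, value) pairs."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_pairs, p_pairs = set(g["slots"].items()), set(p["slots"].items())
        tp += len(g_pairs & p_pairs)
        fp += len(p_pairs - g_pairs)
        fn += len(g_pairs - p_pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frame_accuracy(gold, pred):
    """Fraction of utterances whose whole frame is exactly right."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = [
    {"domain": "Hotel", "intent": "Hotel_Book", "slots": {"people": "2"}},
    {"domain": "Hotel", "intent": "Hotel_Book", "slots": {"star": "5", "day": "sunday"}},
]
pred = [
    {"domain": "Hotel", "intent": "Hotel_Book", "slots": {"people": "2"}},
    {"domain": "Hotel", "intent": "Hotel_Book", "slots": {"star": "5"}},  # misses day
]
print(intent_accuracy(gold, pred), slot_f1(gold, pred), frame_accuracy(gold, pred))
```

Frame accuracy is the strictest of the three: one missed slot makes the whole utterance count as wrong even though the intent is correct.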

(34)

Dialogue State Tracking


(35)

Dialogue State Tracking

◉

DST is a dialogue-level task that maps partial dialogues into dialogue states.

○

Input: a dialogue / a turn with its previous state

○

Output: dialogue state (e.g. slot-value pairs)

Example:

User: Can you help me book a 5-star hotel on Sunday?
⇒ Hotel_Book ( star=5, day=sunday )

User: For two people, thanks! (NLU: people_num=2)
⇒ Hotel_Book ( star=5, day=sunday, people_num=2 )
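The turn-by-turn update in this example can be sketched as a simple dictionary merge. This is a deliberately naive sketch: the function name `update_state` is hypothetical, it assumes the NLU output is a set of slot-value pairs, and it keeps only a single best value per slot instead of a distribution.

```python
def update_state(previous_state, turn_slots):
    """Return a new dialogue state: previous slots overridden by this turn's slots."""
    new_state = dict(previous_state)  # copy so earlier states stay unchanged
    new_state.update(turn_slots)
    return new_state


state_0 = {}
state_1 = update_state(state_0, {"star": "5", "day": "sunday"})
state_2 = update_state(state_1, {"people_num": "2"})
print(state_1)
print(state_2)
```

Copying the state at each turn keeps the full history of states available, which is what turn-level DST evaluation compares against.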

(36)

Dialogue State Tracking

request (restaurant; foodtype=Thai)

inform (area=centre)

request (address)

bye ()


(37)

Dialogue State Tracking

Requires Hand-Crafted States

User

Intelligent Agent

location rating type

loc, rating

rating, type

loc, type all

i want it near to my office

NULL


(38)

Dialogue State Tracking

Requires Hand-Crafted States

User

Intelligent Agent

loc, rating

rating, type

loc, type all

i want it near to my office

NULL


(39)

Dialogue State Tracking

Handling Errors and Confidence

User

Intelligent Agent

find a good eating place for taixxxx food

type=“taiwanese”

type=“thai”

loc, rating

rating, type

loc, type all

NULL

?

rating=“good”, type=“thai”

rating=“good”, type=“taiwanese”

?


(40)

DST Problem Formulation

◉

The DST dataset consists of

○

Goal: for each informable slot

■ e.g. price=cheap

○

Requested: slots requested by the user

■ e.g. moviename

○

Method: search method for entities

■ e.g. by constraints, by name

◉

The dialogue state is

○

the distribution over possible slot-value pairs for goals

○

the distribution over possible requested slots

○

the distribution over possible methods


(41)

Dialogue State Tracking (DST)

◉

Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input

System: How can I help you?
User: Book a table at Sumiko for 5

Slot      Value
# people  5 (0.5)
time      5 (0.5)

System: How many people?
User: 3

Slot      Value
# people  3 (0.8)
time      5 (0.8)

(42)

Multi-Domain Dialogue State Tracking

◉

A full representation of the system's belief of the user's goal at any point during the dialogue

◉

Used for making API calls

Movies

Less Likely

More Likely Date

Time

#People

6 pm 2 11/15/17

7 pm 8 pm 9 pm

Century 16 Shoreline

#People Theater

Inferno.

Inferno Movie

Which movie are you interested in?

I wanna buy two tickets for tonight at the Shoreline theater.


(43)

Multi-Domain Dialogue State Tracking

◉

A full representation of the system's belief of the user's goal at any point during the dialogue

◉

Used for making API calls

Movies

Less Likely

More Likely

I wanna buy two tickets for tonight at the Shoreline theater.

Date Time

#People

6:30 pm 2 11/15/17

7:30 pm 8:45 pm 9:45 pm

Which movie are you interested in?

Inferno.

Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?

We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?

Restaurants

6:00 pm 6:30 pm 11/15/17

Date

Time 7:00 pm

Cascal

#People 2 Restaurant


(44)

Multi-Domain Dialogue State Tracking

◉

A full representation of the system's belief of the user's goal at any point during the dialogue

◉

Used for making API calls

Movies

Less Likely

More Likely Date

Time

#People

6:30 pm 2 11/15/17

7:30 pm 8:45 pm 9:45 pm

Inferno.

Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?

We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?

Restaurants

6:00 pm 6:30 pm 11/15/17

Date

Time 7:00 pm

Cascal

Cascal has a table for 2 at 6pm and 7:30pm.

OK, let me get the table at 6 and tickets for the 7:30 showing.

#People 2 Restaurant


(45)

Discriminative DST – Single Turn

Data

Model

Prediction

• Observations labeled w/ dialogue state

• Distribution over dialogue states – Dialogue State Tracking

• Neural networks

• Ranking models


(46)

DNN for DST

[Figure: multi-turn conversation ⇒ feature extraction ⇒ DNN ⇒ a slot value distribution for each slot (state of this turn)]

(47)

Discriminative DST – Multiple Turns

Data

Model

Prediction

• Sequence of observations labeled w/ dialogue states

• Distribution over dialogue states – Dialogue State Tracking

• Sequential models

– Recurrent neural networks (RNN)


(48)

Recurrent Neural Network (RNN)

◉

Elman-type

◉

Jordan-type


(49)

RNN-Based DST

◉

Idea: internal memory for representing dialogue context

○ Input

■ most recent dialogue turn

■ last machine dialogue act

■ dialogue state

■ memory layer

○ Output

■ update its internal memory

■ distribution over slot values

(50)

RNN-CNN DST

(Mrkšić+, 2015)

(Figure from Wen et al, 2016)

http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777


(51)

Global-Locally Self-Attentive DST

(Zhong+, 2018)

◉

More advanced encoder

○ Global modules share parameters for all slots

○ Local modules learn slot-specific feature representations

http://www.aclweb.org/anthology/P18-1135


(52)

Generative DST

●

Generating the state as a sequence (Lei+, 2018) or dialogue state updates (Lin+, 2020)

(Dialogue history) ⇒ (slot1=val,slot2=val …)

●

Given a dialogue and a slot, generate the value of the slot (Wu+, 2019; Gao+, 2019; Ren+, 2019; Zhou & Small, 2019; Kim+, 2019; Le+, 2020) ⇒ requires multiple forward passes

(Dialogue history, slot1) ⇒ val


(53)

Handling Unknown Slot Values

(Xu & Hu, 2018)

◉

Issue: fixed value sets in DST

http://aclweb.org/anthology/P18-1134

<sys> would you like some Thai food

Attention Dist.

<usr> I prefer Italian one <food>

“Italian”

other dontcare

none

Italian

Pointer networks for generating unknown values

(54)

TRADE: Transferable DST

(Wu+, 2019)

Domains: Hotel, Train, Attraction, Restaurant, Taxi

Slots: Price, Area, Day, Departure, name, LeaveAt, food, etc.

[Figure: an utterance encoder reads the dialogue (…, Bot: “Which area are you looking for the hotel?”, User: “There is one at east town called Ashley Hotel.”). For a given domain and slot (Ex: hotel, name), a slot gate chooses among PTR, NONE, and DONTCARE from the context vector, and the state generator produces the value (here “Ashley”).]

(55)

TripPy: Handling OOV & Rare Values

(Heck+, 2020)


(56)

DST Evaluation

◉

Dialogue State Tracking Challenges

○ DSTC2-3, human-machine

○ DSTC4-5, human-human

○ DSTC8, human-machine

◉

Metric

○ Tracked state accuracy with respect to user goal

○ Recall/Precision/F-measure of individual slots

Input Dialogue:

USER: Can you help me book a 5-star hotel on Sunday?

SYSTEM: For how many people?

USER: For two people, thanks!

Output Dialogue State:

Turn 1: Hotel_Book (star=5, day=sunday)
Turn 2: Hotel_Book (star=5, day=sunday, people_num=2)

⇒ Slot Acc / Joint Acc
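A short sketch of the two accuracies on this example. The function names and the list-of-dicts state format are assumptions for illustration; benchmark scorers differ in details such as how unfilled slots are counted, and the wrong `people_num` prediction below is made up to show the difference between the two numbers.

```python
def joint_accuracy(gold_states, pred_states):
    """Fraction of turns where the whole predicted state equals the gold state."""
    return sum(g == p for g, p in zip(gold_states, pred_states)) / len(gold_states)


def slot_accuracy(gold_states, pred_states, slot_names):
    """Fraction of (turn, slot) pairs predicted correctly; unfilled slots count as None."""
    correct = total = 0
    for gold, pred in zip(gold_states, pred_states):
        for slot in slot_names:
            correct += gold.get(slot) == pred.get(slot)
            total += 1
    return correct / total


gold_states = [
    {"star": "5", "day": "sunday"},
    {"star": "5", "day": "sunday", "people_num": "2"},
]
pred_states = [
    {"star": "5", "day": "sunday"},
    {"star": "5", "day": "sunday", "people_num": "3"},  # one wrong value
]
slots = ["star", "day", "people_num"]
print(joint_accuracy(gold_states, pred_states))
print(slot_accuracy(gold_states, pred_states, slots))
```

Joint accuracy penalises the second turn entirely, while slot accuracy still credits the two slots that were tracked correctly.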

(57)

Dialog State Tracking Challenge (DSTC)

(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)

Challenge  Type           Domain               Data Provider  Main Theme
DSTC1      Human-Machine  Bus Route            CMU            Evaluation Metrics
DSTC2      Human-Machine  Restaurant           U. Cambridge   User Goal Changes
DSTC3      Human-Machine  Tourist Information  U. Cambridge   Domain Adaptation
DSTC4      Human-Human    Tourist Information  I2R            Human Conversation
DSTC5      Human-Human    Tourist Information  I2R            Language Adaptation

(58)

DSTC4-5

◉ Type: Human-Human

◉ Domain: Tourist Information

Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.

Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?

Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.

Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.

Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures.

Guide: Okay, um-

Tourist: Hm.

Guide: Let's try this one, okay?

Tourist: Okay.

Guide: It's InnCrowd Backpackers Hostel in Singapore.

If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.

Tourist: Um. Wow, that's good.

Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.

Tourist: Oh okay. That's- the price is reasonable actually. It's good.

{Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ}

{Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}

(59)

Multi-Domain DST Data

◉

MultiWoZ 2.0 ⇒ 2.1 ⇒ 2.2 ⇒ 2.3 ⇒ ……

◉

SGD: natural language described schema for better scalability


(60)

MultiWOZ 2.1 Leaderboard


(61)

Dialogue Policy Learning


(62)

Dialogue Policy Learning

◉

DP decides the system action for interacting with users based on dialogue states.

○

Input: dialogue state + KB results

○

Output: system action (speech-act + slot-value pairs)

Example: dialogue state Hotel_Book ( star=5, day=sunday, people_num=2 ) + KB results ⇒ DP ⇒ system action Inform ( hotel_name=B&B )

(63)

Dialogue Policy Learning

request (restaurant; foodtype=Thai)

inform (area=centre)

request (address)

bye ()

greeting ()

request (area)

inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai)

inform (address=24 Green street)


(64)

Supervised vs. Reinforcement

◉

Supervised: learning from teacher

“Hello” ⇒ Say “Hi”
“Bye bye” ⇒ Say “Good bye”

◉

Reinforcement: learning from critics

[Figure: the agent converses (“Hello ☺”, ……, “OXX???”, ……) and only afterwards receives the critic's verdict: Bad]

(65)

Dialogue Policy Optimization

◉

Dialogue management in a RL framework

[Figure: the agent (Language Understanding, Dialogue Manager, Natural Language Generation) interacts with the environment (User): it receives Observation O and Reward R and emits Action A]

Select the best action that maximizes the future reward

(66)

Reward for RL ≅ Evaluation for System

◉

Dialogue is a special RL task

● Humans are involved in both the interaction and the rating (evaluation) of a dialogue

● Fully human-in-the-loop framework

◉

Rating: correctness, appropriateness, and adequacy

- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: check desired aspects, low cost

(67)

Dialogue Reinforcement Learning Signal

◉

Typical reward function

○ -1 per turn as a penalty

○ Large reward at completion if successful

◉

Typically requires domain knowledge

○ ✔ Simulated user

○ ✔ Paid users (Amazon Mechanical Turk)

○ ✖ Real users


The user simulator is usually required for dialogue system training before deployment
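The typical reward function described above (a per-turn penalty plus a large reward on successful completion) can be sketched as follows. The function name and the numeric values for success and failure are assumptions chosen for illustration; only the -1 per-turn penalty comes from the slide.

```python
def dialogue_return(num_turns, success, success_reward=20, failure_reward=0, turn_penalty=-1):
    """Total reward of one dialogue: per-turn penalty plus a final success/failure reward.

    success_reward and failure_reward are hypothetical values, not from the slides.
    """
    final_reward = success_reward if success else failure_reward
    return num_turns * turn_penalty + final_reward


# A short successful dialogue scores higher than a long successful one,
# and both score higher than a failed dialogue of similar length.
print(dialogue_return(num_turns=5, success=True))
print(dialogue_return(num_turns=12, success=True))
print(dialogue_return(num_turns=8, success=False))
```

With a simulated user, the `success` flag can be computed automatically by checking the final booking against the simulator's goal, which is one reason simulators are convenient for training.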


(68)

Neural Dialogue Manager

(Li et al., 2017)

◉

Deep Q-network for training dialogue policy

○

Input: current semantic frame observation, database returned results

○

Output: system action

Example: semantic frame request_movie (genre=action, date=this weekend) ⇒ system action/policy request_location

[Figure: the DQN-based dialogue management (DM) module interacts with a simulated user and a backend DB]

https://arxiv.org/abs/1703.01008


(69)

E2E Task-Completion Bot (TC-Bot)

(Li et al., 2017)

Idea: SL for each component and RL for end-to-end training the neural dialogue system

[Figure: a user simulation (user goal, user model, NLG producing words w₀ w₁ w₂ … EOS) talks to the neural dialogue system, which consists of LU (tagging words w_i, w_i+1, … EOS with slot labels such as <slot> and O), DST (state 𝑠_𝑡 from 𝑠₁ 𝑠₂ … 𝑠_𝑛), dialogue policy learning (choosing among actions 𝑎₁ 𝑎₂ … 𝑎_𝑘), and NLG, backed by a knowledge database]

Example: “Are there any action movies to see this weekend?” ⇒ request_location

(70)

User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.

RULE BASED SYSTEM

User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?

Agent: Which theater do you want?

User: AMC pacific place 11 theater, please!

Agent: How many tickets do you want?

User: Two!

Agent: What date?

User: Tomorrow!

Agent: Which city do you want?

User: Seattle!

Agent: Do you have any specification for zip code?

User: No, I do not care.

Agent: Do you have any distance constraint?

User: No, I do not care.

Agent: How about video format? 2d? Imax 3d?

User: Any kind is okay.

Agent: Do you have any constraint to price?

User: No!

Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.

User: Okay, thanks.

Agent: Thanks!

REINFORCEMENT LEARNING SYSTEM

User: I want two tickets for movie ‘Deadpool’.

Agent: What is the starttime you prefer?

User: 9:00 PM

Agent: Which day?

User: Tomorrow

Agent: Which city?

User: Seattle

Agent: Which theater do you prefer?

User: AMC Pacific Place 11 theater.

Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.

User: Okay, thanks.

Agent: Thanks!

Reinforcement Learning Agent

(Li et al., 2017)

Skip the requests the user may not care about to improve efficiency

Issue: no notion about what requests can be skipped

(71)

Online Training

(Su+, 2015; Su+, 2016)

◉

Policy learning from real users

○ Infer reward directly from dialogues (Su et al., 2015)

○ User rating (Su et al., 2016)

◉

Reward modeling on user binary success rating

[Figure: an embedding function maps the dialogue to a dialogue representation; the reward model predicts success/fail from it and provides the reinforcement signal, with a query for the user's rating]

(72)

Interactive RL for DP

(Shah+, 2016)

Immediate Feedback

https://research.google.com/pubs/pub45734.html

Use a third agent for providing interactive feedback to the policy


(73)

Planning – Deep Dyna-Q

(Peng+, 2018)

◉ Idea: learning with real users with planning

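[Figure: Deep Dyna-Q loop. Human conversational data initializes the policy model (imitation learning) and the world model (supervised learning). Acting with the user yields real experience, which drives direct reinforcement learning of the policy model and world model learning; the world model in turn provides experience for planning.]

Policy learning suffers from the poor quality of fake experiences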

(74)

Robust Planning – D3Q (Su+, 2018)

◉ Idea: add a discriminator to filter out the bad experiences

[Figure: the Deep Dyna-Q loop with controlled planning. Human conversational data initializes the policy model (imitation learning) and the world model (supervised learning); real experience drives direct reinforcement learning, world model learning, and discriminative training of a discriminator, which filters the simulated experience used for planning.]

[Figure: system view with NLU (semantic frame), DST (state representation), policy learning (system action), NLG, the user, the world model, and the discriminator separating real experience from simulated experience]

S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, “Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning," (to appear) in Proc. of EMNLP, 2018.

(75)

Robust Planning – D3Q

(Su+, 2018)

Policy learning is more robust and shows improvement in human evaluation

(76)

Multi-Domain – Hierarchical RL

(Peng+, 2017)

Travel Planning

Actions

• Set of tasks that need to be fulfilled collectively!

• Build a DM for cross-subtask constraints (slot constraints)

• Temporally constructed goals

• hotel_check_in_time > departure_flight_time

• # flight_tickets = #people checking in the hotel

• hotel_check_out_time < return_flight_time
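The cross-subtask slot constraints listed above can be written as a simple check over a composite goal. This is only a sketch of the constraints themselves, not of the hierarchical RL agent; the dictionary keys and the example dates are hypothetical.

```python
from datetime import datetime


def satisfies_trip_constraints(trip):
    """Check the three cross-subtask constraints between flight and hotel bookings."""
    return (
        trip["hotel_check_in_time"] > trip["departure_flight_time"]
        and trip["num_flight_tickets"] == trip["num_hotel_guests"]
        and trip["hotel_check_out_time"] < trip["return_flight_time"]
    )


# Made-up example trip.
trip = {
    "departure_flight_time": datetime(2021, 6, 1, 9, 0),
    "hotel_check_in_time": datetime(2021, 6, 1, 15, 0),
    "hotel_check_out_time": datetime(2021, 6, 4, 11, 0),
    "return_flight_time": datetime(2021, 6, 4, 18, 0),
    "num_flight_tickets": 2,
    "num_hotel_guests": 2,
}
print(satisfies_trip_constraints(trip))

# Booking only one flight ticket for two hotel guests violates the second constraint.
bad_trip = dict(trip, num_flight_tickets=1)
print(satisfies_trip_constraints(bad_trip))
```

A dialogue manager that handles flights and hotels independently has no way to enforce such checks, which is the motivation for modelling the subtasks jointly.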

https://arxiv.org/abs/1704.03084


(77)

Multi-Domain – Hierarchical RL

(Peng+, 2017)

◉ Model makes decisions over two levels: meta-controller & controller

◉ The agent learns these policies simultaneously

○ Meta-Controller: policy of optimal sequence of goals to follow, $\pi_g(g_t, s_t; \theta_1)$

○ Controller: policy $\pi_{a,g}(a_t, g_t, s_t; \theta_2)$ for each sub-goal $g_t$ (mitigate reward sparsity issues)

Multiple policies need to collaborate with each other for better multi-domain interactions

(78)

Dialogue Policy Evaluation

◉

Metrics

○

Turn-level evaluation: system action accuracy

○

Dialogue-level evaluation: task success rate, reward

Dialogue State: Hotel_Book ( star=5, day=sunday, people_num=2 )

KB State: rest1=B&B

System Action: inform ( hotel_name=B&B )

(79)

Natural Language Generation


(80)

Natural Language Generation

◉

NLG maps system actions to natural language responses.

○

Input: system speech-act + slot-value (optional)

○

Output: natural language response

Example: system action Inform ( hotel_name=B&B ) ⇒ NLG ⇒ “I have booked a hotel B&B for you.”

(81)

Template-Based NLG

◉

Define a set of rules to map frames to natural language

Pros: simple, error-free, easy to control

Cons: time-consuming, rigid, poor scalability

Semantic Frame               Natural Language
confirm()                    “Please tell me more about the product you are looking for.”
confirm(area=$V)             “Do you want somewhere in the $V?”
confirm(food=$V)             “Do you want a $V restaurant?”
confirm(food=$V,area=$W)     “Do you want a $V restaurant in the $W?”
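The rule table above translates almost directly into code. In this sketch the templates are the ones from the slide, while the lookup scheme (a dictionary keyed by the sorted slot names) and the function name `generate_confirm` are assumptions made for illustration.

```python
# Templates for the confirm() act, keyed by the sorted names of the slots present.
CONFIRM_TEMPLATES = {
    (): "Please tell me more about the product you are looking for.",
    ("area",): "Do you want somewhere in the {area}?",
    ("food",): "Do you want a {food} restaurant?",
    ("area", "food"): "Do you want a {food} restaurant in the {area}?",
}


def generate_confirm(**slots):
    """Fill the template that matches exactly the given slots."""
    key = tuple(sorted(slots))
    if key not in CONFIRM_TEMPLATES:
        raise KeyError(f"no template for slots {key}")
    return CONFIRM_TEMPLATES[key].format(**slots)


print(generate_confirm())
print(generate_confirm(area="centre"))
print(generate_confirm(food="Thai", area="centre"))
```

The `KeyError` branch shows the scalability problem named in the cons: every new slot combination needs its own hand-written rule.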

(82)

RNN-Based LM NLG

(Wen et al., 2015)

Dialogue act: Inform(name=Din Tai Fung, food=Taiwanese)

Dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…

Delexicalisation: <BOS> Din Tai Fung serves Taiwanese . ⇒ <BOS> SLOT_NAME serves SLOT_FOOD .

Input: <BOS> SLOT_NAME serves SLOT_FOOD .

Output: SLOT_NAME serves SLOT_FOOD . <EOS> (conditioned on the dialogue act)
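Delexicalisation and its inverse can be sketched with plain string replacement. This is a simplified sketch: the function names are hypothetical, and exact string matching is an assumption that would not cope with paraphrased or inflected slot values.

```python
def delexicalise(sentence, slots):
    """Replace each slot value in the sentence with a SLOT_<NAME> placeholder."""
    for name, value in slots.items():
        sentence = sentence.replace(value, "SLOT_" + name.upper())
    return sentence


def lexicalise(template, slots):
    """Fill SLOT_<NAME> placeholders back in with the actual slot values."""
    for name, value in slots.items():
        template = template.replace("SLOT_" + name.upper(), value)
    return template


slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(template)
print(lexicalise(template, slots))
```

The language model is trained on the delexicalised form, so one learned pattern can be reused for any restaurant name or food type at generation time.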

(83)

Semantic Conditioned LSTM

(Wen et al., 2015)

◉

Issue: semantic repetition

○ Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.

○ Din Tai Fung is a child friendly restaurant, and also allows kids.

Idea: using gate mechanism to control the generated semantics (dialogue act/slots)

[Figure: an LSTM cell (gates i_t, f_t, o_t, cell C_t, inputs x_t and h_t-1, output h_t) extended with a DA cell (gate r_t, dialogue act vectors d_t-1 and d_t). The initial vector d₀ is the dialog act 1-hot representation 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, … of Inform(name=Seven_Days, food=Chinese).]

(84)

Structural NLG

(Sharma+, 2017; Nayak+, 2017)

◉

Delexicalized slots do not consider the word level information

◉

Slot value-informed sequence to sequence models


(85)

Contextual NLG

(Dušek and Jurčíček, 2016)

◉

Goal: adapt to the user's way of speaking and provide context-aware responses

○

Context encoder

○

Seq2Seq model

https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203


(86)

Controlled Text Generation

(Hu et al., 2017)

◉

Idea: NLG based on generative adversarial network (GAN) framework

○

c: targeted sentence attributes


(87)

Issues in NLG

◉

Issue

○

NLG tends to generate shorter sentences

○

NLG may generate grammatically-incorrect sentences

◉

Solution

○

Generate word patterns in an order

○

Consider linguistic patterns


(88)

Hierarchical NLG w/ Linguistic Patterns

(Su et al., 2018)

Input Semantics: name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One]

ENCODER: semantic 1-hot representation [ … 1, 0, 0, 1, 0, …] ⇒ Bidirectional GRU Encoder ⇒ 𝒉enc

Hierarchical Decoder (one GRU decoder per decoding layer):

1. NOUN + PROPN + PRON: All Bar One place it Midsummer House
2. VERB: All Bar One is priced place it is called Midsummer House
3. ADJ + ADV: All Bar One is moderately priced Italian place it is called Midsummer House
4. Others: Near All Bar One is a moderately priced Italian place it is called Midsummer House

Decoder inputs: output from last layer 𝒚_𝒕^{𝒊−𝟏}, last output 𝒚_{𝒕−𝟏}^𝒊

Training techniques:

1. Repeat-input
2. Inner-Layer Teacher Forcing
3. Inter-Layer Teacher Forcing
4. Curriculum Learning

(89)

Fine-Tuning Pre-Trained GPT-2

◉

Fine-tuning for conditional generation


Pre-trained models have better capability of generating fluent sentences

(90)

NLG Evaluation

◉

Automatic metrics

◉

Human evaluation

System Action: inform(name=B&B)

System Response: I have booked a hotel B&B for you.

(91)

Automatic Evaluation

◉

Perplexity ⇒ how likely the model is to generate the gold response

◉

N-gram overlapping ⇒ BLEU etc.

◉

Slot error rate ⇒ whether the given slots are mentioned

◉

Distinct N-grams ⇒ response diversity

Model Response: Do you have any other plans this weekend?

Gold Response: What do you do in the coming days?

[Figure: model response + gold response ⇒ Scorer ⇒ Score]
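Two of the automatic metrics listed above are simple enough to sketch directly. Both functions are illustrative simplifications: tokenisation is plain whitespace splitting, and this `slot_error_rate` only counts missing slot values by exact string match, whereas published definitions typically also count redundant slots.

```python
def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams over a set of responses."""
    ngrams = []
    for response in responses:
        tokens = response.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def slot_error_rate(response, slots):
    """Fraction of required slot values that do not appear in the response."""
    if not slots:
        return 0.0
    missing = sum(1 for value in slots.values() if value not in response)
    return missing / len(slots)


responses = ["i do not know", "i do not care"]
print(distinct_n(responses, 1))
print(distinct_n(responses, 2))
print(slot_error_rate("I have booked a hotel B&B for you.", {"name": "B&B"}))
```

Low distinct-n values indicate that a model keeps producing the same generic phrases, which perplexity and BLEU alone do not reveal.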

(92)

Human Evaluation Likert

◉

Judges are asked to give ratings 0-5 according to “Humanness, Fluency and Coherence”

Dialogue History: I could teach a few classes this weekend and I don’t know what to do

Model Response: Do you have any other plans this weekend?

Human Evaluator ⇒ Likert: Humanness, Fluency, Coherence

(93)

Human Evaluation Dynamic Likert

◉

A human judge interacts with the model and gives ratings 0-5 according to “Humanness, Fluency and Coherence”

[Figure: a human evaluator converses with the model and, after the conversation, gives Likert ratings: Humanness, Fluency, Coherence]

ACUTE-EVAL (Li et al., 2019)

(94)

Human Evaluation A/B

◉

Judges are asked to choose the best one according to “Humanness, Fluency and Coherence”

Dialogue History ⇒ Model A and Model B

Model A Response: Do you have any other plans this weekend?

Model B Response: I don’t know

Human Evaluator ⇒ A / B Testing: Humanness, Fluency, Coherence