Course Logistics

(1)

(2)


(3)


Outline

• Introduction and Background
• Modular Dialogue System
  • Spoken/Natural Language Understanding (SLU/NLU)
  • Dialogue State Tracking (DST)
  • Dialogue Policy
  • Natural Language Generation (NLG)
• End-to-End Learning for Dialogue Systems
• Conclusion

(4)

Part I

Introduction & Background


(5)


Language Empowering Intelligent Assistants

Apple Siri (2011) Google Now (2012) Microsoft Cortana (2014)

(6)

Dialogue System

• Task-Oriented
  • Personal assistant, achieve a certain task
  • Combination of rules and statistical components
    • POMDP for spoken dialog systems (Williams and Young, 2007)
    • Learning End-to-End Goal-Oriented Dialog (Bordes and Weston, 2016)
    • A Network-based End-to-End Trainable Task-Oriented Dialogue System (Wen et al., 2016)
• Chit-Chat
  • No specific goal, focus on conversation flow
  • Work using variants of seq2seq model
    • A Neural Conversation Model (Vinyals and Le, 2015)
    • Deep Reinforcement Learning for Dialogue Generation (Li et al., 2016)
    • Conversational Contextual Cues: The Case of Personalization and History for Response Ranking (Al-Rfou et al., 2016)

(7)

Pipelined Task-Oriented Dialogue System

[Figure: pipeline of a task-oriented dialogue system]

• Speech Signal → Speech Recognition → Hypothesis: "are there any action movies to see this weekend"
• Text Input: "Are there any action movies to see this weekend?"
• Language Understanding (LU)
  • Domain Identification
  • User Intent Detection
  • Slot Filling
  → Semantic Frame: request_movie (genre=action, date=this weekend)
• Dialogue Management (DM)
  • Dialogue State Tracking (DST)
  • Dialogue Policy
• Natural Language Generation (NLG) → Text response

(8)

Part II

Modular Dialogue System



(10)

Semantic Frame Representation

• Requires a domain ontology
• Contains core content (intent, a set of slots with fillers)

Restaurant Domain
  Utterance: find a cheap taiwanese restaurant in oakland
  Frame: find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”)
  Ontology: restaurant → price, type, location

Movie Domain
  Utterance: show me action movies directed by james cameron
  Frame: find_movie (genre=“action”, director=“james cameron”)
  Ontology: movie → year, genre, director
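A semantic frame and its ontology can be sketched as a small data structure. This is a minimal illustration, not part of the original slides: the class name `SemanticFrame`, the `ONTOLOGY` dictionary, and the `validate` method are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: each intent maps to the slot names it allows.
ONTOLOGY = {
    "find_restaurant": {"price", "type", "location"},
    "find_movie": {"year", "genre", "director"},
}


@dataclass
class SemanticFrame:
    intent: str
    slots: dict = field(default_factory=dict)

    def validate(self, ontology=ONTOLOGY):
        """Return True if the intent and every slot name appear in the ontology."""
        allowed = ontology.get(self.intent)
        return allowed is not None and set(self.slots) <= allowed

    def __str__(self):
        args = ", ".join(f'{name}="{value}"' for name, value in self.slots.items())
        return f"{self.intent}({args})"


restaurant_frame = SemanticFrame(
    "find_restaurant",
    {"price": "cheap", "type": "taiwanese", "location": "oakland"},
)
movie_frame = SemanticFrame(
    "find_movie", {"genre": "action", "director": "james cameron"}
)
```

A frame whose slots fall outside its domain, such as `SemanticFrame("find_movie", {"price": "cheap"})`, fails `validate()`, which is the practical reason the representation requires an ontology.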

(11)

Language Understanding (LU)

• Pipelined
  1. Domain Classification
  2. Intent Classification
  3. Slot Filling

(12)

LU – Domain/Intent Classification

Mainly viewed as an utterance classification task

• Given a collection of utterances $u_i$ with labels $c_i$, $D = \{(u_1, c_1), \dots, (u_n, c_n)\}$ where $c_i \in C$, train a model to estimate labels for new utterances $u_k$.

Example utterance: find me a cheap taiwanese restaurant in oakland

Domains: Movies, Restaurants, Sports, Weather, Music, …
Intents: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics, …
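The utterance classification setup can be illustrated with a tiny bag-of-words Naive Bayes classifier. This is only a sketch under stated assumptions: the slides do not prescribe a model, the training utterances (other than the two quoted in the slides) are made up, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(data):
    """Collect per-label word counts and label counts from (utterance, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for utterance, label in data:
        tokens = utterance.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(utterance, model):
    """Return argmax over labels of log P(c) + sum of log P(w | c), add-one smoothed."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in utterance.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set D = {(u_i, c_i)}; sentences 2 and 4 are invented for illustration.
D = [
    ("find me a cheap taiwanese restaurant in oakland", "Find_restaurant"),
    ("book a table for two tonight", "Book_table"),
    ("show me action movies directed by james cameron", "Find_movie"),
    ("buy two tickets for the late show", "Buy_tickets"),
]
model = train_naive_bayes(D)
predicted = classify("is there a cheap restaurant in town", model)
```

The same code covers domain classification by swapping the label set C from intents to domains.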

(13)

Language Understanding - Slot Filling

Example: Is there um a cheap place in the centre of town please?

Word     Tag
Is       O
there    O
um       O
a        O
cheap    B-price
place    O
in       O
the      O
centre   B-area
of       I-area
town     I-area
please?  O

As a sequence tagging task
• CRF for tagging each utterance

As a classification task
• SVM for each slot value pair
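Once a tagger has produced IOB labels, turning them into slot-value pairs is a simple grouping step. The sketch below assumes the tag alignment implied by the slide (B-price on "cheap", B-area/I-area on "centre of town"); the function name `iob_to_slots` is hypothetical.

```python
def iob_to_slots(tokens, tags):
    """Group B-/I- tagged tokens into a {slot: value} dictionary."""
    slots = {}
    current_slot, current_words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current_slot is not None:
                slots[current_slot] = " ".join(current_words)
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_slot is not None:
                slots[current_slot] = " ".join(current_words)
            current_slot, current_words = None, []
    if current_slot is not None:
        slots[current_slot] = " ".join(current_words)
    return slots


tokens = "Is there um a cheap place in the centre of town please".split()
tags = ["O", "O", "O", "O", "B-price", "O", "O", "O",
        "B-area", "I-area", "I-area", "O"]
slots = iob_to_slots(tokens, tags)
```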

(14)

Language Understanding

• Intent Classification (Ravuri and Stolcke, 2015)
• IOB Sequence Labeling for Slot Filling (Hakkani-Tur et al., 2016)

[Figure: RNN architectures over input words w_0, w_1, w_2, …, w_n with hidden states h_0, …, h_n and outputs y_0, …, y_n; for intent classification a single "intent" output is produced. Panels: (a) LSTM, (b) LSTM-LA, (c) bLSTM-LA. The bidirectional variant uses forward states h_t^f and backward states h_t^b.]
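For reference, a bidirectional tagger such as the one in panel (c) is commonly written as follows. This is a generic formulation rather than the exact parameterisation of the cited paper: the output weights $W_y$ and bias $b_y$ are assumed symbols, and the look-ahead (LA) inputs are omitted.

```latex
\begin{aligned}
h_t^{f} &= \mathrm{LSTM}_f\!\left(w_t,\, h_{t-1}^{f}\right), \\
h_t^{b} &= \mathrm{LSTM}_b\!\left(w_t,\, h_{t+1}^{b}\right), \\
y_t &= \operatorname{softmax}\!\left(W_y \left[h_t^{f};\, h_t^{b}\right] + b_y\right),
\qquad t = 0, \dots, n.
\end{aligned}
```

Each tag $y_t$ therefore depends on the words both before and after position $t$.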

(15)

Joint Semantic Frame Parsing

Sequence-based (Hakkani-Tur et al., 2016)
• Slot filling and intent prediction in the same output sequence

Parallel (Liu and Lane, 2016)
• Intent prediction and slot filling are performed in two branches

[Figure: RNN unrolled over the input "taiwanese food please EOS" with hidden states h_{t-1}, h_t, h_{t+1}, h_{T+1} and weight matrices U and W.]
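In the sequence-based formulation, the intent can be read off the label emitted at the end-of-sentence token while the earlier labels are slot tags. The sketch below shows only this decoding step; the labels are hypothetical, since the slide shows the words but not their tags, and `split_joint_output` is an assumed name.

```python
def split_joint_output(tokens, labels):
    """Separate a joint label sequence into per-word slot tags and the intent.

    Assumes the last token is the end-of-sentence marker "EOS" and that its
    label carries the intent.
    """
    if len(tokens) != len(labels) or not tokens or tokens[-1] != "EOS":
        raise ValueError("expected aligned sequences ending in EOS")
    return list(zip(tokens[:-1], labels[:-1])), labels[-1]


tokens = ["taiwanese", "food", "please", "EOS"]
# Hypothetical labels for illustration only.
labels = ["B-type", "O", "O", "find_restaurant"]
word_tags, intent = split_joint_output(tokens, labels)
```

In the parallel formulation the two outputs would instead come from separate branches of the network, so no such splitting is needed.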


(17)


Dialogue State Tracking (DST)

(18)

Dialogue State Tracking (DST)

• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness

[Figure annotation: "Incorrect for both!"]

Slide credit: Sungjin Lee

(19)

Dialogue State Tracking (DST)

• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input

System: How can I help you?
User: Book a table at Sumiko for 5
System: How many people?

Slot      Value
# people  5 (0.5)
time      5 (0.5)

Slot Value
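The idea of keeping a distribution rather than a single guess can be sketched with a simple multiplicative (Bayes-style) update. This is an illustration only, not the tracker used in any cited system: the function name, the `floor` parameter, and the second-turn SLU scores are all assumptions.

```python
def update_belief(belief, observation, floor=0.1):
    """Multiply the prior belief by the SLU scores and renormalise.

    belief: {hypothesis: probability} carried over from earlier turns.
    observation: {hypothesis: SLU confidence} for the current turn.
    floor: score assumed for hypotheses missing from either side (an assumption).
    """
    hypotheses = set(belief) | set(observation)
    unnormalised = {
        h: belief.get(h, floor) * observation.get(h, floor) for h in hypotheses
    }
    total = sum(unnormalised.values())
    return {h: score / total for h, score in unnormalised.items()}


# After "Book a table at Sumiko for 5" both readings are equally likely.
belief = {"# people=5": 0.5, "time=5": 0.5}

# Hypothetical SLU scores for the user's next reply, favouring the party-size reading.
observation = {"# people=5": 0.9, "time=5": 0.1}
belief = update_belief(belief, observation)
```

Because both hypotheses stay in the belief, a later turn can still recover the "time" reading if the evidence shifts, which a 1-best tracker could not do.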


(21)

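Dialogue Policy Optimization

• Dialogue management in a RL framework

[Figure: reinforcement learning loop. Environment: the User, reached through Language Understanding and Natural Language Generation. Agent: the Dialogue Manager. The agent emits Action A and receives Observation O and Reward R.]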

(22)

Reward for RL ≅ Evaluation for SDS

• Dialogue is a special RL task
  • Humans are involved both in the interaction and in rating (evaluating) a dialogue
  • Fully human-in-the-loop framework
• Rating: correctness, appropriateness, and adequacy
  - Expert rating: high quality, high cost
  - User rating: unreliable quality, medium cost
  - Objective rating: checks desired aspects, low cost

(23)

Dialogue Reinforcement Signal

Typical Reward Function
• Per-turn penalty: -1
• Large reward at completion if successful

Typically requires domain knowledge
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
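The typical reward function above can be written in a few lines. The per-turn penalty of -1 comes from the slide; the size of the success bonus (20 here) is an assumed value, as is the function name.

```python
def episode_return(num_turns, success, turn_penalty=-1, success_reward=20):
    """Total reward of a dialogue: a penalty per turn plus a bonus on success."""
    total = num_turns * turn_penalty
    if success:
        total += success_reward  # granted once, at completion
    return total


short_success = episode_return(num_turns=5, success=True)
long_failure = episode_return(num_turns=8, success=False)
```

With these values a short successful dialogue scores 15 and a long failed one scores -8, so the policy is pushed toward completing the task in as few turns as possible.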

(24)

User Simulation

• Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space
• Approach
  • Rule-based, crafted by experts (Li et al., 2016)
  • Learning-based (Schatzmann et al., 2006)

[Figure: Dialogue Corpus, Simulated User, and Real User; the user interacts with Dialogue Management (DM), which comprises Dialogue State Tracking (DST) and Dialogue Policy.]
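A rule-based simulated user can be as simple as an object that answers system requests from a fixed goal. The sketch below is a toy illustration and not the simulator of Li et al. (2016); the class, the action tuples, and the "dontcare" fallback are assumptions.

```python
class RuleBasedUser:
    """Toy simulator: answers slot requests from a fixed user goal."""

    def __init__(self, goal):
        self.goal = dict(goal)

    def respond(self, system_action):
        act, slot = system_action
        if act == "request" and slot in self.goal:
            return ("inform", slot, self.goal[slot])
        if act == "request":
            # The goal says nothing about this slot.
            return ("inform", slot, "dontcare")
        return ("bye", None, None)


def run_dialogue(user, slots_to_ask):
    """A trivial hand-written policy that requests each slot once, in order."""
    state = {}
    for slot in slots_to_ask:
        _, name, value = user.respond(("request", slot))
        state[name] = value
    user.respond(("close", None))
    return state


user = RuleBasedUser({"genre": "action", "date": "this weekend"})
final_state = run_dialogue(user, ["genre", "date", "location"])
```

In an RL setup, the hand-written loop in `run_dialogue` would be replaced by the learned policy, and the simulator would supply the observations and the success signal.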


(26)

Natural Language Generation (NLG)

• Mapping semantic frame into natural language

inform(name=Seven_Days, foodtype=Chinese) → Seven Days is a nice Chinese restaurant

(27)

Template-Based NLG

• Define a set of rules to map frames to NL

Semantic Frame              Natural Language
confirm()                   “Please tell me more about the product you are looking for.”
confirm(area=$V)            “Do you want somewhere in the $V?”
confirm(food=$V)            “Do you want a $V restaurant?”
confirm(food=$V,area=$W)    “Do you want a $V restaurant in the $W?”

Pros: simple, error-free, easy to control
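The template table maps directly onto a lookup keyed by dialogue act and slot names. In this sketch the placeholders are renamed from $V/$W to the slot names so that Python's `string.Template` can fill them; that renaming and the `generate` function are assumptions made for the example.

```python
from string import Template

# Keys are (dialogue act, set of slot names); values are the surface templates.
TEMPLATES = {
    ("confirm", frozenset()): Template(
        "Please tell me more about the product you are looking for."
    ),
    ("confirm", frozenset({"area"})): Template(
        "Do you want somewhere in the $area?"
    ),
    ("confirm", frozenset({"food"})): Template(
        "Do you want a $food restaurant?"
    ),
    ("confirm", frozenset({"food", "area"})): Template(
        "Do you want a $food restaurant in the $area?"
    ),
}


def generate(act, **slots):
    """Look up the template for (act, slot names) and fill in the slot values."""
    template = TEMPLATES[(act, frozenset(slots))]
    return template.substitute(slots)


sentence = generate("confirm", food="Chinese", area="centre")
```

Any frame without a matching key raises `KeyError`, which reflects the approach: every act and slot combination needs its own hand-written rule.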

(28)

Trainable Generator (Walker et al., 2002)

• Divide the problem into a pipeline
  • Statistical sentence plan generator (Stent et al., 2009)
  • Statistical surface realization (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)

[Figure: Inform(name=Z_House, price=cheap) → Sentence Plan Generator → Sentence Plan Reranker → Surface Realiser → "Z House is a cheap restaurant."]

Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge

(29)

RNN Language Generator (Wen et al., 2015)

Dialogue act: Inform(name=EAT, food=British)
Dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0, …

Input (delexicalised):  <BOS> SLOT_NAME serves SLOT_FOOD .
Output (delexicalised): SLOT_NAME serves SLOT_FOOD . <EOS>
Lexicalised:            <BOS> EAT serves British .
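The delexicalisation step around the generator can be sketched with plain string replacement. This shows only the pre- and post-processing, not the RNN itself; the function names are hypothetical, and naive replacement is assumed to be safe because the slot values do not overlap.

```python
def delexicalise(sentence, slots):
    """Replace slot values in a sentence with SLOT_<NAME> placeholders."""
    for name, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{name.upper()}")
    return sentence


def lexicalise(template, slots):
    """Replace SLOT_<NAME> placeholders with the values from the dialogue act."""
    for name, value in slots.items():
        template = template.replace(f"SLOT_{name.upper()}", value)
    return template


# Slots of the dialogue act Inform(name=EAT, food=British).
slots = {"name": "EAT", "food": "British"}
delex = delexicalise("EAT serves British .", slots)
surface = lexicalise("SLOT_NAME serves SLOT_FOOD .", slots)
```

Training on delexicalised sentences lets one learned pattern serve every restaurant name and food type in the database.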


(31)

E2E Joint NLU and DM (Yang et al., 2017)

• Errors from DM can be propagated to NLU for better robustness

(32)

E2E Supervised Dialogue System (Wen et al., 2016)

Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv:1604.04562v2.

[Figure: network architecture]
• Input: "Can I have korean", delexicalised to "Can I have <v.food>"
• Intent Network
• Belief Tracker: Korean 0.7, British 0.2, French 0.1, …
• Database Operator: MySQL query “Select * where food=Korean”
• Database: Seven Days, Curry Prince, Nirala, Royal Standard, Little Seoul, …
• DB pointer; binary vector shown in the figure: 0 0 0 … 0 1
• Policy Network
• Generation Network: "<v.name> serves great <v.food> ." with Copy_field
• Vectors labeled in the figure: x^t, z^t, p^t, q^t
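The path from belief state to database query can be sketched as follows. This is an illustration under several assumptions: SQLite stands in for the MySQL database on the slide, the table layout and the placeholder restaurant rows are invented, and the binary vector is one plausible reading of the figure.

```python
import sqlite3

# Belief over the food slot, as printed on the slide.
belief = {"Korean": 0.7, "British": 0.2, "French": 0.1}
best_food = max(belief, key=belief.get)

# Toy in-memory database; the rows are placeholders made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurant (name TEXT, food TEXT)")
conn.executemany(
    "INSERT INTO restaurant VALUES (?, ?)",
    [
        ("Restaurant A", "British"),
        ("Restaurant B", "French"),
        ("Restaurant C", "Korean"),
    ],
)

# Parameterised form of the slide's "Select * where food=Korean".
rows = conn.execute(
    "SELECT name FROM restaurant WHERE food = ?", (best_food,)
).fetchall()
matches = [name for (name,) in rows]

# Binary vector over database rows marking the matching entries (assumed layout).
all_names = [
    name for (name,) in conn.execute("SELECT name FROM restaurant ORDER BY rowid")
]
db_vector = [1 if name in matches else 0 for name in all_names]
conn.close()
```

Taking the argmax before querying is the non-differentiable step in this pipeline; the rest of the network consumes only the resulting vector.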

(33)

Info Access Bot (Info-Bot) (Dhingra et al., 2016)

User: Find me the Bill Murray’s movie.
KB-InfoBot: When was it released?
User: I think it came out in 1993.
KB-InfoBot: Groundhog Day is a Bill Murray movie which came out in 1993.

Query: Movie=?; Actor=Bill Murray; Release Year=1993

Knowledge Base (head, relation, tail)
(Groundhog Day, actor, Bill Murray)
(Groundhog Day, release year, 1993)
(Australia, actor, Nicole Kidman)
(Mad Max: Fury Road, release year,
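Answering the query Movie=? amounts to finding heads whose triples match every known (relation, tail) pair. The sketch below is a hard, exact-match lookup over the complete triples shown on the slide; it is a simplification for illustration and is not claimed to be the lookup method of the cited paper. The function name is hypothetical.

```python
# Only the complete triples from the slide; the truncated one is left out.
KB = [
    ("Groundhog Day", "actor", "Bill Murray"),
    ("Groundhog Day", "release year", "1993"),
    ("Australia", "actor", "Nicole Kidman"),
]


def find_heads(kb, constraints):
    """Return the heads that satisfy every (relation, tail) constraint."""
    candidates = {head for head, _, _ in kb}
    for relation, tail in constraints.items():
        candidates &= {h for h, r, t in kb if r == relation and t == tail}
    return sorted(candidates)


# Movie=?; Actor=Bill Murray; Release Year=1993
answer = find_heads(KB, {"actor": "Bill Murray", "release year": "1993"})
```

Each user turn adds a constraint and shrinks the candidate set, which is why the bot asks "When was it released?" when the actor alone does not identify the movie.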

(34)

E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)

Idea: supervised learning for each component and reinforcement learning for end-to-end training the neural dialogue system

[Figure: End-to-End Neural Dialogue System interacting with a User Simulator]
• User Simulator: User Goal, User Agenda Modeling, Natural Language Generation (NLG)
• Text Input: "Are there any action movies to see this weekend?"
• Language Understanding (LU): tags the words w_i, w_{i+1}, w_{i+2}, EOS with slot labels (e.g. B-type, O, O) and an <intent> label at EOS
• Semantic Frame: request_movie (genre=action, date=this weekend)
• Dialogue Management (DM) over Time t-2, Time t-1, Time t
• System Action / Policy: request_location
• User Dialogue Action: Inform(location=San Francisco)

(35)


E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)



User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.

RULE BASED SYSTEM

User: Can I get 2 tickets for the witch?

Agent: Which theater do you want?

User: Regal meridian 16

Agent: What time would you like to see it?

User: 9:30 pm

Agent: What date would you like to see it?

User: Tomorrow!

Agent: How many people are you?

User: Just two adults.

Agent: What is your location?

User: Seattle!

Agent: What kind of video format do you like?

User: Oh, any kind is okay.

Agent: Do you have any price constraint?

User: No, I don’t care.

REINFORCEMENT LEARNING SYSTEM

User: Can I get 2 tickets for the witch?

Agent: What date would you like to see it?

User: Tomorrow

Agent: Which theater do you prefer?

User: Regal meridian 16

Agent: What is your location?

User: Seattle

Agent: What time would you like to see it?

User: 9:30 pm.

Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.