
(1)

Slides credit from Gašić

(2)

Review

2

(3)

3

Task-Oriented Dialogue System

(Young, 2000)

3

Speech Recognition

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

Dialogue Management (DM)

• Dialogue State Tracking (DST)

• Dialogue Policy

Natural Language Generation (NLG)

(Pipeline example)

Speech Signal → Text Input: "Are there any action movies to see this weekend?"

Hypothesis: "are there any action movies to see this weekend"

Semantic Frame: request_movie(genre=action, date=this weekend)

System Action/Policy: request_location

Text Response: "Where are you located?"

Backend Database / Knowledge Providers

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(4)

4

Task-Oriented Dialogue System

(Young, 2000)

4

Speech Recognition

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

Dialogue Management (DM)

• Dialogue State Tracking (DST)

• Dialogue Policy

Natural Language Generation (NLG)

(Pipeline example)

Speech Signal → Text Input: "Are there any action movies to see this weekend?"

Hypothesis: "are there any action movies to see this weekend"

Semantic Frame: request_movie(genre=action, date=this weekend)

System Action/Policy: request_location

Text Response: "Where are you located?"

Backend Action / Knowledge Providers

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short

(5)

Dialogue Management

5

(6)

6

Example Dialogue

6

request (restaurant; foodtype=Thai)

inform (area=centre)

request (address)

bye ()

(7)

7

Elements of Dialogue Management

(Figure from Gašić)

(8)

8

Dialogue State Tracking (DST)

Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to recognition errors

8

(Figure: the 1-best hypothesis is incorrect for both examples)

(9)

9

Dialogue State Tracking (DST)

Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input

9

System: How can I help you?

User: Book a table at Sumiko for 5

System: How many people?

User: 3

Belief after turn 1:

Slot | Value
# people | 5 (0.5)
time | 5 (0.5)

Belief after turn 2:

Slot | Value
# people | 3 (0.8)
time | 5 (0.8)
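To make the "distribution instead of 1-best" idea concrete, here is a minimal sketch (not from the slides; the update rule and the interpolation weight are illustrative assumptions) of tracking a per-slot belief across turns:

```python
from collections import defaultdict

def update_belief(belief, slu_hypotheses, weight=0.8):
    """Blend the previous slot distribution with the new turn's SLU confidences."""
    new_belief = defaultdict(float)
    for value, prob in belief.items():
        new_belief[value] += (1.0 - weight) * prob   # keep part of the old belief
    for value, conf in slu_hypotheses:
        new_belief[value] += weight * conf           # fold in the new evidence
    total = sum(new_belief.values()) or 1.0
    return {value: p / total for value, p in new_belief.items()}

# Turn 1: "Book a table at Sumiko for 5" -- ambiguous, SLU gives #people=5 with 0.5
people_belief = update_belief({}, [("5", 0.5)])
# Turn 2: system asks "How many people?", user answers "3"
people_belief = update_belief(people_belief, [("3", 0.9)])
print(people_belief)   # "3" now dominates, but "5" keeps residual probability mass
```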

(10)

10

1-Best Input w/o State Tracking

10

(11)

11

N-Best Inputs w/o State Tracking

11

(12)

12

N-Best Inputs w/ State Tracking

12

(13)

13

Dialogue State Tracking (DST)

Definition

Representation of the system's belief of the user's goal(s) at any time during the dialogue

Challenge

How to define the state space?

How to tractably maintain the dialogue state?

Which actions to take for each state?

13

Define dialogue as a control problem where the behavior can be automatically learned

(14)

Introduction to RL

14

Reinforcement Learning

(15)

15

Reinforcement Learning

RL is a general-purpose framework for decision making

RL is for an agent with the capacity to act

Each action influences the agent's future state

Success is measured by a scalar reward signal

Goal: select actions to maximize future reward

Big three: action, state, reward

(16)

16

Reinforcement Learning

16

(Figure: agent–environment loop — the agent receives an observation and a reward ("Don't do that") and takes an action in the environment)

(17)

17

Reinforcement Learning

17

(Figure: agent–environment loop — the agent receives an observation and a reward ("Thank you.") and takes an action in the environment)

Agent learns to take actions to maximize expected reward.

(18)

18

Supervised vs. Reinforcement

Supervised (learning from a teacher):

"Hello" → say "Hi"

"Bye bye" → say "Good bye"

Reinforcement (learning from critics):

the agent holds a whole conversation ("Hello", ……) and only receives feedback such as "Bad" afterwards

(19)

Scenario of Reinforcement Learning

Environment → Observation → Agent → Action (next move)

Reward: if win, reward = 1; if loss, reward = −1; otherwise, reward = 0

Agent learns to take actions to maximize expected reward.

(20)

20

RL Based AI Examples

Play games: Atari, poker, Go, …

Explore worlds: 3D worlds, …

Control physical systems: manipulate, …

Interact with users: recommend, optimize, personalize, …

(21)

21

Agent and Environment

(Figure: at each step the agent receives observation o_t and reward r_t from the environment and takes action a_t, e.g., MoveLeft / MoveRight)

(22)

22

Agent and Environment

At time step t

The agent

Executes action at

Receives observation ot

Receives scalar reward rt

The environment

Receives action at

Emits observation ot+1

Emits scalar reward rt+1

t increments at each environment step
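The loop above is easy to state in code; the sketch below is illustrative (ToyEnv and RandomAgent are made-up placeholders, not part of any library):

```python
import random

class ToyEnv:
    def step(self, action):
        observation = random.random()                 # emit next observation o_{t+1}
        reward = 1.0 if action == "right" else 0.0    # emit scalar reward r_{t+1}
        return observation, reward

class RandomAgent:
    def act(self, observation, reward):
        return random.choice(["left", "right"])       # execute action a_t

env, agent = ToyEnv(), RandomAgent()
o_t, r_t, total = 0.0, 0.0, 0.0
for t in range(10):                                   # t increments at each env. step
    a_t = agent.act(o_t, r_t)                         # agent receives o_t, r_t and acts
    o_t, r_t = env.step(a_t)                          # environment receives a_t
    total += r_t
print("return over 10 steps:", total)
```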


(23)

23

State

Experience is the sequence of observations, actions, and rewards

State is the information used to determine what happens next

what happens next depends on the history of experience

The agent selects actions

The environment selects observations/rewards

The state is a function of the history of experience: s_t = f(H_t)

(24)

24


Environment State

The environment state s_t^e is the environment's private representation

whatever data the environment uses to pick the next observation/reward

may not be visible to the agent

may contain irrelevant information

(25)

25


Agent State

The agent state s_t^a is the agent's internal representation

whatever data the agent uses to pick the next action

the information used by RL algorithms

can be any function of the history of experience

(26)

26

Information State

An information state (a.k.a. Markov state) contains all useful information from history

The future is independent of the past given the present

Once the state is known, the history may be thrown away

The state is a sufficient statistic of the future

A state is Markov iff
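The condition itself did not survive extraction; the standard form is:

```latex
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]
```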

(27)

27

Fully Observable Environment

Full observability: agent directly observes environment state

information state = agent state = environment state

This is a Markov decision process (MDP)

(28)

28

Partially Observable Environment

Partial observability: agent indirectly observes environment

agent state ≠ environment state

Agent must construct its own state representation s_t^a, for example (standard forms reconstructed below)

Complete history

Beliefs of environment state

Hidden state (from an RNN)
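The formulas for these three representations did not survive extraction; the standard forms (following the usual notation, with σ a squashing function and W weight matrices — an assumption) are:

```latex
% Complete history
s_t^a = H_t = o_1, r_1, a_1, \ldots, a_{t-1}, o_t, r_t
% Beliefs of environment state
s_t^a = \big(\mathbb{P}[S_t^e = s^1], \ldots, \mathbb{P}[S_t^e = s^n]\big)
% Hidden (recurrent) state
s_t^a = \sigma\!\big(W_s\, s_{t-1}^a + W_o\, o_t\big)
```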

This is a partially observable Markov decision process (POMDP)

(29)

29

Reward

Reinforcement learning is based on the reward hypothesis

A reward r_t is a scalar feedback signal

Indicates how well the agent is doing at step t

Reward hypothesis: all agent goals can be described by the maximization of expected cumulative reward

(30)

30

Sequential Decision Making

Goal: select actions to maximize total future reward

Actions may have long-term consequences

Reward may be delayed

It may be better to sacrifice immediate reward to gain more long-term reward

30

(31)

31

Elements of Dialogue Management

(Figure from Gašić)

Dialogue state tracking

(32)

32

Generative vs. Discriminative

Generative

The state generates the observation

Discriminative

The state depends on the observation

32

(33)

Generative Approach

33

Dialogue State Tracking

(34)

34

Markov Process

A Markov process is a memoryless random process

i.e. a sequence of random states S1, S2, ... with the Markov property

34

Student Markov chain

Sample episodes from S1=C1

• C1 C2 C3 Pass Sleep

• C1 FB FB C1 C2 Sleep

• C1 C2 C3 Pub C2 C3 Pass Sleep

• C1 FB FB C1 C2 C3 Pub

• C1 FB FB FB C1 C2 C3 Pub C2 Sleep

(35)

35

Markov Reward Process (MRP)

A Markov reward process is a Markov chain with values

(Figure: Student MRP)

The return Gt is the total discounted reward from time-step t
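The return formula itself did not survive extraction; the standard definition, with discount factor γ ∈ [0, 1], is:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```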

35

(36)

36

Markov Decision Process (MDP)

A Markov decision process is an MRP with decisions

It is an environment in which all states are Markov

36

Student MDP

(37)

37

Markov Decision Process (MDP)

S: finite set of states/observations

A: finite set of actions

P : transition probability

R : immediate reward

γ : discount factor

Goal: choose a policy π at each time t that maximizes the expected overall return:
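The objective did not survive extraction; a standard way to write the expected overall (discounted) return being maximized is:

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t} \,\middle|\, \pi \right]
```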

37

(38)

38

DM as Markov Decision Process (MDP)

38

Data:

• Dialogue states

• Reward – a measure of dialogue quality

Model:

• Markov decision process (MDP)

Prediction:

• System actions

(39)

39

DM as Partially Observable Markov Decision Process (POMDP)

39

Data:

• Noisy observations of dialogue states

• Reward – a measure of dialogue quality

Model:

• Partially observable Markov decision process (POMDP)

Prediction:

• Distribution over dialogue states – Dialogue State Tracking

• Optimal system actions

(40)

40

Markov Decision Process (MDP)

States can be fully observed

State depends on the previous state and the action

(Figure: s_t, a_t → s_{t+1} with transition probability; reward r_t)

(41)

41

Partially Observable Markov Decision Process (POMDP)

State generates a noisy observation

(Figure: hidden states s_t → s_{t+1} with transition probability; observations o_t, o_{t+1} emitted with observation probability; reward r_t)

State is unobservable and depends on the previous state and the action

summation over all possible states at every dialogue turn – intractable!
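The summation referred to here is the standard POMDP belief update (not shown on the extracted slide), which must marginalize over every possible hidden state at each turn:

```latex
b_{t+1}(s_{t+1}) \;\propto\; P(o_{t+1} \mid s_{t+1}) \sum_{s_t} P(s_{t+1} \mid s_t, a_t)\, b_t(s_t)
```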

(42)

42

Dialogue State Tracking (DST)

Requirement

Dialogue history

Keep track of what has happened so far in the dialogue

Normally done via Markov property

Task-oriented dialogue

Need to know what the user wants

Modeled via the user goal

Robustness to errors

Need to know what the user says

Modeled via the user action

42

(43)

43

Dialogue State Factorization

Decompose the dialogue state into conditionally independent elements:

User goal g_t

User action u_t

Dialogue history d_t

(Figure: dynamic Bayesian network over two turns — user goal g_t, user action u_t, dialogue history d_t, system action a_t, observation o_t, and reward r_t, with g_{t+1}, u_{t+1}, d_{t+1}, o_{t+1} at the next turn)

summation over all possible goals – intractable!

summation over all possible histories and user actions – intractable!

(44)

44

Generative DST

POMDPs are normally intractable for everything

Two approximations enable POMDP for dialogues

I. Hidden Information State (HIS) system (Young et al., 2010)

II. Bayesian Update of Dialogue State (BUDS) system

(Thomson and Young, 2010)

44

(45)

45

Hidden Information State (HIS)

Dialogue state: distribution over the most likely hypotheses

(46)

46

HIS Partitions

46


(47)

47

Pruning

47


(48)

48

Pruning

48


(49)

49

Bayesian Update of Dialogue State (BUDS)

Idea

Further decomposes the dialogue state

Produces a tractable state update

Transition and observation probability distributions can be parameterized

49

(50)

50

BUDS Belief Tracking

Expectation propagation

Allows parameter tying

Handles factorized hidden variables

Handles large state spaces

Example

50

(51)

Discriminative Approach

51

Dialogue State Tracking

(52)

52

Generative vs. Discriminative

Generative

The state generates the observation

Discriminative

The state depends on the observation

52

Directly model dialogue states given arbitrary input features

Assumption: observations at each turn are independent

(53)

53

DST Problem Formulation

The DST dataset consists of

Goal: for each informable slot

e.g. price=cheap

Requested: slots by the user

e.g. moviename

Method: search method for entities

e.g. by constraints, by name

The dialogue state is

the distribution over possible slot-value pairs for goals

the distribution over possible requested slots

the distribution over possible methods
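As a concrete illustration (the slot names and numbers below are made up, not taken from a dataset), a DSTC-style dialogue state groups these three distributions:

```python
# Illustrative DSTC-style dialogue state: three distributions per turn.
dialogue_state = {
    # distribution over slot-value pairs for the user goal
    "goal": {
        "food":       {"thai": 0.7, "chinese": 0.2, "none": 0.1},
        "pricerange": {"cheap": 0.9, "none": 0.1},
    },
    # distribution over slots requested by the user
    "requested": {"address": 0.8, "phone": 0.1},
    # distribution over search methods for entities
    "method": {"byconstraints": 0.85, "byname": 0.05, "none": 0.1},
}
```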

53

(54)

54

Class-Based DST

54

Data:

• Observations labeled w/ dialogue state

Model:

• Neural networks

• Ranking models

Prediction:

• Distribution over dialogue states – Dialogue State Tracking

(55)

55

DNN for DST

55

(Figure: multi-turn conversation → feature extraction → DNN → a slot-value distribution for each slot, i.e., the state of this turn)
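A minimal sketch of this setup (illustrative PyTorch code; the layer sizes and the per-slot design are assumptions, not the architecture from the slide):

```python
import torch
import torch.nn as nn

class SlotTracker(nn.Module):
    """Feed-forward tracker: turn-level features -> distribution over one slot's values."""
    def __init__(self, feature_dim, num_values):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_values),
        )

    def forward(self, turn_features):
        return self.net(turn_features).softmax(dim=-1)   # slot-value distribution

# one tracker per slot, e.g. "food" with 3 candidate values plus "none"
food_tracker = SlotTracker(feature_dim=300, num_values=4)
features = torch.randn(1, 300)        # extracted features for the current turn
print(food_tracker(features))         # belief over food values for this turn
```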

(56)

56

Sequence-Based DST

56

Data:

• Sequence of observations labeled w/ dialogue state

Model:

• Recurrent neural networks (RNN)

Prediction:

• Distribution over dialogue states – Dialogue State Tracking

(57)

57

Recurrent Neural Network (RNN)

Elman-type

Jordan-type
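The slide's equations did not survive extraction; the standard recurrences (with σ an activation function and W, b learned parameters — notation assumed) are:

```latex
% Elman-type: the previous hidden state feeds back
h_t = \sigma_h(W_x x_t + W_h h_{t-1} + b_h), \qquad y_t = \sigma_y(W_y h_t + b_y)
% Jordan-type: the previous output feeds back
h_t = \sigma_h(W_x x_t + W_p y_{t-1} + b_h), \qquad y_t = \sigma_y(W_y h_t + b_y)
```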

57

(58)

58

RNN DST

Idea: internal memory for representing dialogue context

Input

most recent dialogue turn

last machine dialogue act

dialogue state

memory layer

Output

update its internal memory

distribution over slot values

58
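A minimal sketch of this idea (illustrative PyTorch code using a GRU cell; the feature sizes and layer choices are assumptions, not the exact tracker from the literature):

```python
import torch
import torch.nn as nn

class RNNTracker(nn.Module):
    """Internal memory summarizes the dialogue; each turn emits a slot-value distribution."""
    def __init__(self, turn_dim, act_dim, hidden_dim, num_values):
        super().__init__()
        self.cell = nn.GRUCell(turn_dim + act_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, num_values)

    def forward(self, turn_features, machine_act_features, memory):
        x = torch.cat([turn_features, machine_act_features], dim=-1)
        memory = self.cell(x, memory)                     # update internal memory
        belief = self.output(memory).softmax(dim=-1)      # distribution over slot values
        return belief, memory

tracker = RNNTracker(turn_dim=100, act_dim=20, hidden_dim=64, num_values=4)
memory = torch.zeros(1, 64)
for _ in range(3):                                        # three dialogue turns
    belief, memory = tracker(torch.randn(1, 100), torch.randn(1, 20), memory)
print(belief)
```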

(59)

59

RNN-CNN DST

(Figure from Wen et al., 2016)

http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190

(60)

60

Multichannel Tracker

(Shi et al., 2016)

60

Training a multichannel CNN for each slot

Chinese character CNN

Chinese word CNN

English word CNN

https://arxiv.org/abs/1701.06247

(61)

61

DST Evaluation

Metric

Tracked state accuracy with respect to the user goal

L2-norm between the hypothesized distribution and the true label

Recall/Precision/F-measure for individual slots
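A small sketch of the first two metrics for a single slot (the values and distributions below are made up for illustration):

```python
import numpy as np

values = ["thai", "chinese", "indian", "none"]
hypothesis = np.array([0.6, 0.3, 0.05, 0.05])   # tracker's belief over slot values
true_label = np.array([1.0, 0.0, 0.0, 0.0])     # gold value is "thai"

accuracy = float(hypothesis.argmax() == true_label.argmax())   # 1-best matches goal?
l2 = np.linalg.norm(hypothesis - true_label)                   # L2-norm vs. true label
print(f"accuracy={accuracy}, L2={l2:.3f}")
```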

61

(62)

62

Dialog State Tracking Challenge (DSTC)

(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)

Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation

(63)

63

DSTC1

Type: Human-Machine

Domain: Bus Route

63

(64)

64

DSTC4-5

Type: Human-Human

Domain: Tourist Information

64

Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.

Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?

Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.

Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.

Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures.

Guide: Okay, um-

Tourist: Hm.

Guide: Let's try this one, okay?

Tourist: Okay.

Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.

Tourist: Um. Wow, that's good.

Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.

Tourist: Oh okay. That's- the price is reasonable actually. It's good.

{Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ}

{Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}

(65)

65

Concluding Remarks

Dialogue state tracking (DST) in DM uses the Markov assumption to model the user goal and to be robust to errors

Generative models for DST are based on POMDP

Hidden Information State (HIS)

state = (user goal, user action, dialogue history)

transitions are hand-crafted and the goals are grouped together to allow tractable belief tracking

65

Bayesian Update of Dialogue State (BUDS)

further factorizes the state

allows tractable belief tracking and learning of the shapes of distributions via expectation propagation

Discriminative models directly estimate dialogue states given arbitrary input features
