Open-Domain Neural Dialogue Systems
opendialogue.miulab.tw
Yun-Nung (Vivian) Chen, Jianfeng Gao
How can I help you?
Material: http://opendialogue.miulab.tw
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Introduction & Background Knowledge
Outline
PART I. Introduction & Background Knowledge
Dialogue System Introduction
Neural Network Basics
Reinforcement Learning
PART II. Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Brief History of Dialogue Systems

Early 1990s – Keyword spotting (e.g., AT&T): System: “Please say collect, calling card, person, third number, or operator”. Intent determination (Nuance’s Emily™, AT&T HMIHY): User: “Uh…we want to move…we want to change our phone line from this house to another house”
Early 2000s – Task-specific argument extraction (e.g., Nuance, SpeechWorks): User: “I want to fly from Boston to New York next week.” Multi-modal systems (e.g., Microsoft MiPad, Pocket PC); DARPA CALO project; TV voice search (e.g., Bing on Xbox)
2017 – Virtual personal assistants: Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)
Why Do We Need Dialogue Systems?

“I am smart” → Turing Test (“I” talk like a human)
“I have a question” → Information consumption
• What is the employee review schedule?
• Which room is the dialogue tutorial in?
• When is the IJCNLP 2017 conference?
• What does NLP stand for?
“I need to get this done” → Task completion
• Book me the flight from Seattle to Taipei.
• Reserve a table at Din Tai Fung for 5 people, 7 PM tonight.
• Schedule a meeting with Bill at 10:00 tomorrow.
“What should I do?” → Decision support
• Is this product worth buying?

Information consumption, task completion, and decision support are task-oriented dialogues.
Language Empowering Intelligent Assistant
Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016), Apple HomePod (2017)
Intelligent Assistants
Intelligent assistants span a spectrum from task-oriented to engaging (social bots).
Why Natural Language?
Global Digital Statistics (January 2017):
• Total population: 7.48B
• Unique mobile users: 4.92B
• Internet users: 3.77B
• Active social media users: 2.79B
• Active mobile social users: 2.55B

Device input is evolving toward speech as the more natural and convenient modality.
Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

Examples: JARVIS – Iron Man’s personal assistant; Baymax – personal healthcare companion.

Good dialogue systems help users access information conveniently and finish tasks efficiently.
App → Bot

A bot is responsible for a “single” domain, similar to an app. Users can initiate dialogues instead of following the GUI design.
GUI vs. CUI (Conversational UI)
https://github.com/enginebai/Movie-lol-android
GUI vs. CUI (Conversational UI)

 | Website/App GUI | Messenger CUI
Situation | Navigation, no specific goal | Searching, with a specific goal
Information quantity | More | Less
Information precision | Low | High
Display | Structured | Unstructured
Interface | Graphics | Language
Manipulation | Mainly clicks | Mainly text or speech input
Learning | Takes time to learn and adapt | No need to learn
Entrance | App download | Incorporated in any messaging-based interface
Flexibility | Low, like machine manipulation | High, like conversing with a human
Two Branches of Dialogue Systems

Task-Oriented Bot
• Personal assistant; helps users achieve a certain task
• Combination of rules and statistical components
  - POMDP for spoken dialog systems (Williams and Young, 2007)
  - End-to-end trainable task-oriented dialogue system (Wen et al., 2016)
  - End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)

Chit-Chat Bot
• No specific goal; focuses on natural responses
• Uses variants of the seq2seq model
  - A neural conversation model (Vinyals and Le, 2015)
  - Reinforcement learning for dialogue generation (Li et al., 2016)
  - Conversational contextual cues for response ranking (Al-Rfou et al., 2016)
Task-Oriented Dialogue System (Young, 2000)

• Speech Recognition: speech signal → text hypothesis (“are there any action movies to see this weekend”)
• Language Understanding (LU): domain identification, user intent detection, slot filling → semantic frame: request_movie(genre=action, date=this weekend)
• Dialogue Management (DM): dialogue state tracking (DST) and dialogue policy, consulting backend action / knowledge providers → system action/policy: request_location
• Natural Language Generation (NLG): system action → text response (“Where are you located?”)
Outline
PART I. Introduction & Background Knowledge
Dialogue System Introduction
Neural Network Basics
Reinforcement Learning
PART II. Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Machine Learning ≈ Looking for a Function
• Speech recognition: f(audio) = “你好 (Hello)”
• Image recognition: f(image) = “cat”
• Go playing: f(board) = 5-5 (next move)
• Chat bot: f(“Where is IJCNLP?”) = “The location is…”

Given a large amount of data, the machine learns what the function f should be.
Machine Learning
Branches: supervised learning, unsupervised learning, reinforcement learning.
Deep learning is a machine learning approach based on “neural networks”.
A Single Neuron
z = w1·x1 + w2·x2 + … + wN·xN + b (b: the bias)
y = σ(z), with the sigmoid activation function σ(z) = 1 / (1 + e^(−z))
w and b are the parameters of this neuron.
A Single Neuron
With the same computation, thresholding the output gives a binary decision: e.g., for a handwritten “2” detector, predict “2” if y ≥ 0.5 and not “2” if y < 0.5.
A single neuron can only handle binary classification.
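The neuron above can be sketched in a few lines (a minimal illustration; the weights below are made up, not trained):

```python
import math

def neuron(x, w, b):
    """A single neuron: weighted sum plus bias, squashed by a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # z = w·x + b
    return 1.0 / (1.0 + math.exp(-z))             # y = sigmoid(z)

def is_two(x, w, b):
    """Binary decision: class "2" if y >= 0.5, otherwise not "2"."""
    return neuron(x, w, b) >= 0.5

# Illustrative (untrained) parameters: the neuron fires when x1 dominates.
w, b = [2.0, -1.0], 0.0
print(neuron([1.0, 0.0], w, b))  # sigmoid(2.0) ≈ 0.88 → "is 2"
print(neuron([0.0, 1.0], w, b))  # sigmoid(-1.0) ≈ 0.27 → "not 2"
```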
A Layer of Neurons
Handwritten digit classification: f : R^N → R^M
A layer of neurons can handle multiple possible outputs; with 10 neurons for the 10 digit classes, each neuron scores one class (“1” or not, “2” or not, …) and the prediction is the class with the maximum output.
Deep Neural Networks (DNN)
Fully connected feedforward network: f : R^N → R^M
An input vector x = (x1, …, xN) passes through Layer 1, Layer 2, …, Layer L to produce the output vector y = (y1, …, yM). A deep NN is one with multiple hidden layers.
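The layer-by-layer computation f : R^N → R^M can be sketched as follows (an illustrative, untrained network; all weights are made up):

```python
import math

def layer(x, W, b):
    """One fully connected layer: out_j = sigmoid(W[j] . x + b[j])."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
        for row, bj in zip(W, b)
    ]

def feedforward(x, layers):
    """f: R^N -> R^M as a stack of layers; each output feeds the next layer."""
    for W, b in layers:
        x = layer(x, W, b)
    return x

# A tiny 2-3-2 network with illustrative (untrained) weights.
net = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # R^2 -> R^3
    ([[1.0, -1.0, 0.5], [-0.5, 0.2, 0.3]], [0.0, 0.0]),          # R^3 -> R^2
]
y = feedforward([1.0, 2.0], net)
predicted_class = max(range(len(y)), key=lambda j: y[j])  # pick the max output
```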
Recurrent Neural Network (RNN)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Activation function f: tanh or ReLU, applied at each time step.
An RNN can accumulate sequential information over time, which suits time-series inputs.
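A minimal sketch of the recurrence (scalar state for readability; real RNNs use vectors and weight matrices):

```python
import math

def rnn(inputs, Wx, Wh, b, h0=0.0):
    """A scalar vanilla RNN: h_t = tanh(Wx * x_t + Wh * h_{t-1} + b).

    The hidden state h_t accumulates information from the whole prefix
    x_1..x_t, which is what lets an RNN model sequences.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(Wx * x + Wh * h + b)
        states.append(h)
    return states

# Illustrative weights; each state depends on all inputs seen so far.
states = rnn([1.0, 0.5, -1.0], Wx=0.8, Wh=0.5, b=0.0)
```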
Outline
PART I. Introduction & Background Knowledge
Dialogue System Introduction
Neural Network Basics
Reinforcement Learning
PART II. Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Reinforcement Learning
RL is a general-purpose framework for decision making:
• RL is for an agent with the capacity to act
• Each action influences the agent’s future state
• Success is measured by a scalar reward signal
• Goal: select actions to maximize future reward
Scenario of Reinforcement Learning
The agent learns to take actions that maximize expected reward: it receives an observation o_t from the environment, takes an action a_t (e.g., the next move in a game), and obtains a reward r_t (e.g., reward = 1 for a win, −1 for a loss, 0 otherwise), after which the environment presents the next observation.
Supervised vs. Reinforcement

Supervised: learning from a teacher, with the correct response given for each input (e.g., “Hello” → say “Hi”; “Bye bye” → say “Good bye”).
Reinforcement: learning from critics; after a whole exchange, only a coarse delayed signal (e.g., “Bad”) is received.
Sequential Decision Making
Goal: select actions to maximize total future reward
Actions may have long-term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more long-term reward
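The trade-off above is usually captured with a discounted return (the discount factor here is an assumption for illustration; the slides do not fix a formula):

```python
def discounted_return(rewards, gamma=0.9):
    """Total future reward R = r_0 + gamma*r_1 + gamma^2*r_2 + ...

    gamma < 1 discounts delayed rewards; maximizing R can still favor
    sacrificing immediate reward for larger long-term reward.
    """
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

# Sacrificing now (-1) for a big delayed reward (+10) beats being greedy.
patient = discounted_return([-1, 0, 10])  # -1 + 0.9*0 + 0.81*10 = 7.1
greedy = discounted_return([1, 0, 0])     # 1.0
```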
Deep Reinforcement Learning
The observation from the environment is the input to a DNN, the action is its output, and the reward is used to pick the best function: deep RL uses a deep network as the decision function.
Reinforcement Learning Formulation

Start from state s0; choose action a0; transition to s1 ~ P(s0, a0); continue…
Total reward: R = r(s0, a0) + r(s1, a1) + … (the sum of per-step rewards)
Goal: select actions that maximize the expected total reward E[R]
Reinforcement Learning Approach
Policy-based RL: search directly for the optimal policy π*, i.e., the policy achieving maximum future reward.
Value-based RL: estimate the optimal value function Q*(s, a), i.e., the maximum value achievable under any policy.
Model-based RL: build a model of the environment and plan (e.g., by lookahead) using the model.
Task-Oriented Dialogue Systems
Task-Oriented Dialogue System (Young, 2000)

• Speech Recognition: speech signal → text hypothesis (“are there any action movies to see this weekend”)
• Language Understanding (LU): domain identification, user intent detection, slot filling → semantic frame: request_movie(genre=action, date=this weekend)
• Dialogue Management (DM): dialogue state tracking (DST) and dialogue policy, consulting backend action / knowledge providers → system action/policy: request_location
• Natural Language Generation (NLG): system action → text response (“Where are you located?”)
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management – Dialogue State Tracking (DST)
Dialogue Management – Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Language Understanding (LU)
Pipelined: 1. domain classification → 2. intent classification → 3. slot filling
LU – Domain/Intent Classification
• Mainly viewed as an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u1, c1), …, (un, cn)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.

Example: “find me a cheap taiwanese restaurant in oakland”
Domains: Movies, Restaurants, Sports, Weather, Music, …
Intents: find_movie, buy_tickets, find_restaurant, book_table, find_lyrics, …
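As a toy illustration of utterance classification (keyword matching rather than the neural models discussed next; the keyword sets are invented):

```python
# A minimal bag-of-words intent classifier (an illustrative sketch, not the
# tutorial's model): score each intent by keyword overlap with the utterance.
INTENT_KEYWORDS = {
    "find_restaurant": {"restaurant", "food", "eat", "cheap"},
    "find_movie": {"movie", "film", "watch"},
    "book_table": {"book", "table", "reserve"},
}

def classify_intent(utterance):
    words = set(utterance.lower().split())
    # Pick the intent whose keyword set overlaps the utterance the most.
    return max(INTENT_KEYWORDS, key=lambda c: len(INTENT_KEYWORDS[c] & words))

print(classify_intent("find me a cheap taiwanese restaurant in oakland"))
# → find_restaurant
```

A real system replaces the keyword overlap with a learned scorer (e.g., the RNN/LSTM classifiers on the next slides), but the input/output contract is the same.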
DNN for Domain/Intent Classification
(Ravuri & Stolcke, 2015)
Intent decision after reading all words performs better
RNN and LSTMs for utterance classification
DNN for Dialogue Act Classification
(Lee & Dernoncourt, 2016)
RNN and CNNs for dialogue act classification
LU – Slot Filling

• As a sequence tagging task: given a collection of tagged word sequences S = {((w1,1, …, w1,n1), (t1,1, …, t1,n1)), ((w2,1, …, w2,n2), (t2,1, …, t2,n2)), …} where t_i ∈ M, the goal is to estimate tags for a new word sequence.

Example: “flights from Boston to New York today”
Entity tags: O O B-city O B-city I-city O
Slot tags: O O B-dept O B-arrival I-arrival B-date
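Slot tags in the IOB scheme above decode into slot values like this (a generic sketch of the scheme, not a specific system's code):

```python
# Decoding IOB slot tags into slot values: B-x starts slot x, I-x continues
# it, and O is outside any slot.
def iob_to_slots(words, tags):
    slots, name, value = [], None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if name:                      # close the previous slot
                slots.append((name, " ".join(value)))
            name, value = tag[2:], [word]
        elif tag.startswith("I-") and name == tag[2:]:
            value.append(word)            # continue the current slot
        else:                             # "O": close any open slot
            if name:
                slots.append((name, " ".join(value)))
            name, value = None, []
    if name:
        slots.append((name, " ".join(value)))
    return slots

words = "flights from Boston to New York today".split()
tags = ["O", "O", "B-dept", "O", "B-arrival", "I-arrival", "B-date"]
print(iob_to_slots(words, tags))
# → [('dept', 'Boston'), ('arrival', 'New York'), ('date', 'today')]
```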
RNN for Slot Tagging – I
(Yao et al., 2013; Mesnil et al., 2015) Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams
c. Bi-directional LSTMs
(Figure: (a) LSTM, (b) LSTM-LA, (c) bLSTM taggers, each mapping words w0 … wn to tags y0 … yn.)
RNN for Slot Tagging – II
(Kurata et al., 2016; Simonnet et al., 2015)
Encoder-decoder networks: leverage sentence-level information.
Attention-based encoder-decoder: uses attention (as in MT) in the encoder-decoder network; the attention is estimated by a feed-forward network with inputs h_t and s_t at time t.
(Figure: encoder-decoder and attention-based encoder-decoder slot taggers.)
RNN for Slot Tagging – III
(Jaech et al., 2016; Tafforeau et al., 2016) Multi-task learning
Goal: exploit data from domains/tasks with a lot of data to improve ones with less data
Lower layers are shared across domains/tasks
Output layer is specific to task
Joint Segmentation and Slot Tagging
(Zhai et al., 2017) An encoder segments the input; a decoder tags the segments.
(Figure: an RNN reads “taiwanese food please”, outputs the slot tags B-type O O, and predicts the intent FIND_REST at the end-of-sentence position.)
Joint Semantic Frame Parsing (Slot Filling + Intent Prediction)

• Sequence-based (Hakkani-Tür et al., 2016): slot filling and intent prediction in the same output sequence
• Parallel-based (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches
Contextual LU
(Domain identification, intent prediction, and slot filling across turns)

S1: “just sent email to bob about fishing this weekend”
   Domain: communication; Intent: send_email
   Slot tags: O O O O B-contact_name O B-subject I-subject I-subject
   → send_email(contact_name=“bob”, subject=“fishing this weekend”)
U1: “are we going to fish this weekend”
   Slot tags: B-message I-message I-message I-message I-message I-message I-message
   → send_email(message=“are we going to fish this weekend”)
U2: “send email to bob” (“bob” tagged B-contact_name)
   → send_email(contact_name=“bob”)
Contextual LU
User utterances are highly ambiguous in isolation. E.g., in restaurant booking:
U: “Book a table for 10 people tonight.”
S: “Which restaurant would you like to book a table for?”
U: “Cascal, for 6.” (does “6” fill the # people slot or the time slot?)
Contextual LU
(Bhargava et al., 2013; Hori et al., 2015) Leveraging contexts:
• Used for individual tasks
• Seq2Seq model: words are input one at a time; tags are output at the end of each utterance
• Extension: LSTM with speaker-role-dependent layers
End-to-End Memory Networks
(Sukhbaatar et al., 2015)

U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like?”
U: “3 tickets for saturday”
S: “what time would you like?”
U: “any time on saturday is fine”
S: “okay, there is 4:10 pm, 5:40 pm and 9:20 pm”
U: “let’s do 5:40”

History turns are stored as memories m0 … mn−1 and attended with the current utterance encoding u.
E2E MemNN for Contextual LU
(Chen et al., 2016)

Idea: additionally incorporate contextual knowledge during slot tagging, tracking dialogue states in a latent way.
1. Sentence encoding: RNN encoders map the history utterances {x_i} to memory representations m_i and the current utterance c to a vector u.
2. Knowledge attention: an attention distribution p_i over memories is computed from the inner product of u and each m_i.
3. Knowledge encoding: the attention-weighted sum of memories, transformed by W_kg, yields a knowledge-encoded representation o that is fed to the RNN slot tagger producing the tag sequence y.
Analysis of Attention
In the ticket-purchasing dialogue above, the learned attention concentrates on the history turns most relevant to tagging the current utterance (e.g., weights of 0.69, 0.16, and 0.13 on the salient turns).
Sequential Dialogue Encoder Network
(Bapna et al., SIGDIAL 2017) Past and current turn encodings are input to a feed-forward network.
Structural LU
(Chen et al., 2016) K-SAN: prior knowledge as a teacher.
(Figure: a knowledge-encoding module attends over substructures of the input sentence “show me the flights from seattle to san francisco” (knowledge attention distribution p_i, with a weighted sum forming a knowledge-guided representation) to guide the RNN slot tagger.)
Structural LU
(Chen et al., 2016) Sentence structural knowledge stored as memory: e.g., the syntax (dependency tree) or semantics (AMR graph) of “show me the flights from seattle to san francisco”, whose substructures serve as memory entries.
Structural LU
(Chen et al., 2016) Sentence structural knowledge stored as memory
Even with less training data, K-SAN pays similar attention to the salient substructures that are important for tagging.
LU Importance
(Li et al., 2017) Comparing different types of LU errors shows that slot filling is more important than intent detection in language understanding.
(Figures: sensitivity to intent error vs. sensitivity to slot error.)
LU Evaluation
Metrics
Sub-sentence-level: intent accuracy, slot F1
Sentence-level: whole frame accuracy
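The sentence-level metric credits an utterance only when the intent and all slots match; a minimal sketch:

```python
# Sentence-level LU metric: a frame is correct only if the intent AND every
# slot match the gold annotation (an illustrative implementation).
def frame_accuracy(predictions, golds):
    correct = sum(
        1 for p, g in zip(predictions, golds)
        if p["intent"] == g["intent"] and p["slots"] == g["slots"]
    )
    return correct / len(golds)

gold = [
    {"intent": "find_restaurant", "slots": {"food": "taiwanese", "area": "oakland"}},
    {"intent": "find_movie", "slots": {"genre": "action"}},
]
pred = [
    {"intent": "find_restaurant", "slots": {"food": "taiwanese", "area": "oakland"}},
    {"intent": "find_movie", "slots": {"genre": "comedy"}},  # one slot wrong
]
print(frame_accuracy(pred, gold))  # → 0.5
```

Because a single wrong slot zeroes out the whole frame, frame accuracy is strictly harsher than per-slot F1.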
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management – Dialogue State Tracking (DST)
Dialogue Management – Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Elements of Dialogue Management
(Figure from Gašić)
Dialogue State Tracking
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
Example (the ambiguous “5” is split between # people and time until clarified):
S: How can I help you?
U: Book a table at Sumiko for 5
   Belief: # people = 5 (0.5); time = 5 (0.5)
S: How many people?
U: 3
   Belief: # people = 3 (0.8); time = 5 (0.8)
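A belief state can be represented as per-slot distributions; the mixing rule below is an illustrative update, not the tracker used in the DSTC systems:

```python
# A toy dialogue state tracker: per-slot probability distributions, updated
# by mixing the previous belief with new SLU evidence (illustrative rule).
def update_belief(belief, slu, weight=0.8):
    """Update only the slots that received new SLU evidence."""
    new = {slot: dist.copy() for slot, dist in belief.items()}
    for slot, obs in slu.items():
        old = new.get(slot, {})
        values = set(old) | set(obs)
        new[slot] = {
            v: (1 - weight) * old.get(v, 0.0) + weight * obs.get(v, 0.0)
            for v in values
        }
    return new

# Turn 1: "Book a table at Sumiko for 5" -- is 5 the party size or the time?
belief = {"people": {"5": 0.5}, "time": {"5": 0.5}}
# Turn 2: "3" answers "How many people?" -- strong evidence for people=3.
belief = update_belief(belief, {"people": {"3": 1.0}})
# people now favors "3" (0.8) over "5" (0.1); time is untouched.
```

Keeping the full distribution (rather than the 1-best value) is what lets the tracker recover when the SLU hypothesis was wrong.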
Multi-Domain Dialogue State Tracking (DST)
A full representation of the system’s belief of the user’s goal at any point during the dialogue; used for making API calls.

A: Do you wanna take Angela to go see a movie tonight?
B: Sure, I will be home by 6.
A: Let’s grab dinner before the movie. How about some Mexican?
B: Let’s go to Vive Sol and see Inferno after that.
A: Angela wants to watch the Trolls movie.
B: Ok. Let’s catch the 8 pm show.

(Figure: the tracked state spans two domains and is updated turn by turn, e.g., Restaurants: Vive Sol, Mexican cuisine, 11/15/16, candidate times 6–7:30 pm, 2–3 people; Movies: Trolls at Century 16, candidate show times 8–9 pm.)
Dialog State Tracking Challenge (DSTC)
(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
Challenge Type Domain Data Provider Main Theme
DSTC1 Human-Machine Bus Route CMU Evaluation Metrics
DSTC2 Human-Machine Restaurant U. Cambridge User Goal Changes
DSTC3 Human-Machine Tourist Information U. Cambridge Domain Adaptation
DSTC4 Human-Human Tourist Information I2R Human Conversation
DSTC5 Human-Human Tourist Information I2R Language Adaptation
NN-Based DST
(Henderson et al., 2013; Mrkšić et al., 2015; Mrkšić et al., 2016) (Figure from Wen et al., 2016)
Neural Belief Tracker
(Mrkšić et al., 2016)
DST Evaluation
Dialogue State Tracking Challenges
DSTC2-3, human-machine
DSTC4-5, human-human
Metric
Tracked-state accuracy with respect to the user goal
Recall/precision/F-measure for individual slots
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management – Dialogue State Tracking (DST)
Dialogue Management – Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Elements of Dialogue Management
(Figure from Gašić)
Dialogue Policy Optimization
Dialogue Policy Optimization
Dialogue management in an RL framework: the user is the environment and the dialogue manager is the agent. Language understanding maps user input to observations O, the dialogue policy selects actions A, rewards R come back from the interaction, and natural language generation realizes the chosen actions. (Slides credited to Pei-Hao Su)

An optimized dialogue policy selects the best action to maximize the future reward. Correct rewards are a crucial factor in dialogue policy training.
Reward for RL ≅ Evaluation for System
Dialogue is a special RL task: humans take part both in the interaction and in rating (evaluating) the dialogue, making it a fully human-in-the-loop framework.
Rating: correctness, appropriateness, and adequacy
• Expert rating: high quality, high cost
• User rating: unreliable quality, medium cost
• Objective rating: checks desired aspects, low cost
Reinforcement Learning for Dialogue Policy Optimization
The RL loop: user input (o) → language understanding → state s → dialogue policy a = π(s) → language (response) generation → response; collect rewards (s, a, r, s’) and optimize Q(s, a).

Type of Bots | State | Action | Reward
Social chatbots | chat history | system response | # of turns maximized; intrinsically motivated reward
InfoBots (interactive Q/A) | current user question + context | answer to the current question | relevance of answer; # of turns minimized
Task-completion bots | current user input + context | system dialogue act with slot values (or API calls) | task success rate; # of turns minimized

Goal: develop a generic deep RL algorithm to learn a dialogue policy for all bot categories.
Dialogue Reinforcement Learning Signal
Typical reward function:
• −1 per-turn penalty
• a large reward at completion if successful
Defining it typically requires domain knowledge.
✔ Simulated users
✔ Paid users (Amazon Mechanical Turk)
✖ Real users

A user simulator is usually required for dialogue system training before deployment.
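The typical reward shape above can be written directly (the +20 success bonus is an illustrative magnitude, not specified by the slides):

```python
# Typical task-oriented dialogue reward: -1 per turn, plus a large bonus at
# the end if the task succeeded (the bonus size here is illustrative).
def dialogue_reward(num_turns, success, success_bonus=20):
    return -1 * num_turns + (success_bonus if success else 0)

short_success = dialogue_reward(5, True)    # -5 + 20 = 15
long_success = dialogue_reward(15, True)    # -15 + 20 = 5
failure = dialogue_reward(5, False)         # -5
```

The per-turn penalty is what pushes the learned policy toward shorter successful dialogues rather than merely successful ones.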
Neural Dialogue Manager
(Li et al., 2017) Deep Q-network for training the DM policy
• Input: the current semantic frame observation and database-returned results
• Output: system action
E.g., semantic frame request_movie(genre=action, date=this weekend) → system action request_location; the DQN-based dialogue manager is trained against a simulated user with a backend DB.
SL + RL for Sample Efficiency
(Su et al., 2017) Issues with RL for DM: slow learning speed and cold start.
Solutions:
• Sample-efficient actor-critic: off-policy learning with experience replay; better gradient updates
• Utilizing supervised data: pretrain the model with SL and then fine-tune with RL; mix SL and RL data during RL learning
• Combine both
Online Training
(Su et al., 2015; Su et al., 2016) Policy learning from real users:
• Infer the reward directly from dialogues (Su et al., 2015)
• User rating (Su et al., 2016): reward modeling on users’ binary success ratings
(Figure: a dialogue representation is embedded and fed to a reward model that predicts success/fail, providing the reinforcement signal; the system can also query the user for a rating.)
Interactive RL for DM
(Shah et al., 2016) Use a third agent to provide immediate, interactive feedback to the DM.
Dialogue Management Evaluation
Metrics
Turn-level evaluation: system action accuracy
Dialogue-level evaluation: task success rate, reward
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management – Dialogue State Tracking (DST)
Dialogue Management – Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Natural Language Generation (NLG)
Mapping semantic frame into natural language
inform(name=Seven_Days, foodtype=Chinese) → “Seven Days is a nice Chinese restaurant.”
Template-Based NLG
Define a set of rules to map frames to NL.

Semantic Frame | Natural Language
confirm() | “Please tell me more about the product you are looking for.”
confirm(area=$V) | “Do you want somewhere in the $V?”
confirm(food=$V) | “Do you want a $V restaurant?”
confirm(food=$V, area=$W) | “Do you want a $V restaurant in the $W?”

Pros: simple, error-free, easy to control. Cons: time-consuming, poor scalability.
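The rule table above maps naturally onto a lookup keyed by dialogue act and slot names (a minimal sketch using Python string templates):

```python
from string import Template

# Template-based NLG: map a dialogue act + slot names to a canned sentence,
# substituting slot values into $-placeholders (mirrors the table above).
TEMPLATES = {
    ("confirm",): "Please tell me more about the product you are looking for.",
    ("confirm", "area"): "Do you want somewhere in the $area?",
    ("confirm", "food"): "Do you want a $food restaurant?",
    ("confirm", "area", "food"): "Do you want a $food restaurant in the $area?",
}

def generate(act, **slots):
    key = (act, *sorted(slots))  # the rule is selected by act + slot names
    return Template(TEMPLATES[key]).substitute(slots)

print(generate("confirm", food="Chinese", area="centre"))
# → Do you want a Chinese restaurant in the centre?
```

The cons in the slide show up directly: every new act/slot combination needs another hand-written entry in `TEMPLATES`.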
Plan-Based NLG
(Walker et al., 2002) Divide the problem into a pipeline: sentence plan generator → sentence plan reranker → surface realizer (producing a syntactic tree).
• Statistical sentence plan generator (Stent et al., 2009)
• Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)
E.g., inform(name=Z_House, price=cheap) → “Z House is a cheap restaurant.”
Pros: can model complex linguistic structures. Cons: heavily engineered, requires domain knowledge.
Class-Based LM NLG
(Oh and Rudnicky, 2000) Class-based language modeling; NLG by decoding.
Classes: inform_area, inform_address, …, request_area, request_postcode
Pros: easy to implement/understand, simple rules. Cons: computationally inefficient.
RNN-Based LM NLG
(Wen et al., 2015)
Input: a 1-hot dialogue-act representation, e.g., inform(name=Din Tai Fung, food=Taiwanese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0, …; generation is conditioned on the dialogue act, with slot weight tying.
Delexicalisation: “<BOS> Din Tai Fung serves Taiwanese .” becomes “<BOS> SLOT_NAME serves SLOT_FOOD .”
Output: “SLOT_NAME serves SLOT_FOOD . <EOS>”, later relexicalised with the actual slot values.
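Delexicalisation and its inverse amount to simple string substitution (an illustrative helper, not the paper's code):

```python
# Delexicalisation for NLG training data: replace slot values with slot
# tokens so the generator learns "SLOT_NAME serves SLOT_FOOD ." once, then
# relexicalise the decoded output with the actual values.
def delexicalise(sentence, slots):
    for name, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{name.upper()}")
    return sentence

def relexicalise(template, slots):
    for name, value in slots.items():
        template = template.replace(f"SLOT_{name.upper()}", value)
    return template

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
delex = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(delex)                       # → SLOT_NAME serves SLOT_FOOD .
print(relexicalise(delex, slots))  # → Din Tai Fung serves Taiwanese .
```

Training on delexicalised sentences lets one learned pattern cover every restaurant name and cuisine, which is what gives the RNN-LM generator its scalability over templates.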
Handling Semantic Repetition
Issue: semantic repetition
• “Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.”
• “Din Tai Fung is a child-friendly restaurant, and also allows kids.”
The deficiency lies in the model, the decoding, or both. Mitigations:
• Post-processing rules (Oh & Rudnicky, 2000)
• Gating mechanism (Wen et al., 2015)
• Attention (Mei et al., 2016; Wen et al., 2015)