(1)

How the Context Matters
Language & Interaction in Dialogues

Yun-Nung (Vivian) Chen

(2)

Introduction

Word-Level Contexts in Sentences

Learning from Prior Knowledge –

Knowledge-Guided Structural Attention Networks (K-SAN) [Chen et al., ‘16]

Learning from Observations –

Modularizing Unsupervised Sense Embedding (MUSE) [Lee & Chen, ‘17]

Sentence-Level Contexts in Dialogues

Inference –

Leveraging Behavioral Patterns for Personalized Understanding [Chen et al., ‘15]

Investigation of Understanding Impact –

Reinforcement Learning Based Neural Dialogue System [Li et al., ‘17]

Misunderstanding Impact [Li et al., ‘17]

(3)


(4)

Dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via conversational interactions.

Dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

JARVIS – Iron Man’s Personal Assistant
Baymax – Personal Healthcare Companion

(5)


LU and DM significantly benefit from contexts in sentences and in dialogues

Speech Recognition

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

Dialogue Management (DM)

• Dialogue State Tracking (DST)

• Dialogue Policy

Natural Language Generation (NLG)

Hypothesis

are there any action movies to see this weekend

Semantic Frame

request_movie

genre=action, date=this weekend

System Action/Policy

request_location

Text response

Where are you located?

Text Input

Are there any action movies to see this weekend?

Speech Signal

Backend Action / Knowledge Providers
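
The hand-off from LU to DM in the pipeline above can be sketched in a few lines. This is only an illustration: the `SemanticFrame` class, the `REQUIRED_SLOTS` list, and the `inform_movie` action are hypothetical names introduced here, and the rule-based policy stands in for whatever policy the system actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """LU output: one intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical: slots assumed necessary before the backend can be queried.
REQUIRED_SLOTS = ("genre", "date", "location")


def next_system_action(frame: SemanticFrame) -> str:
    """Toy dialogue policy: request the first missing slot, otherwise inform."""
    for slot in REQUIRED_SLOTS:
        if slot not in frame.slots:
            return f"request_{slot}"
    return "inform_movie"  # hypothetical action name


# Frame for "are there any action movies to see this weekend"
frame = SemanticFrame("request_movie", {"genre": "action", "date": "this weekend"})
action = next_system_action(frame)
print(action)  # request_location
```

With the location slot missing, the toy policy produces the same system action as the slide, which NLG would then render as "Where are you located?".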

(6)

Word-level context

Prior knowledge such as linguistic syntax

Collocated words

Sentence-level context

Smartphone companies including apple, blueberry, and sony will be invited.

show me the flights from seattle to san francisco

(browsing movie reviews…)

Find me a good action movie this weekend

London Has Fallen is currently the number 1 action movie in America

request_movie (genre=action, date=this weekend)

Contexts provide informative cues for better understanding

(7)


Knowledge-Guided Structural Attention Network (K-SAN)

Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” preprint arXiv: 1609.00777, 2016.

(8)

Syntax (Dependency Tree)

Semantics (AMR Graph)

[Figure: dependency tree and AMR graph for the sentence s "show me the flights from seattle to san francisco", each with four numbered substructures.]

(9)

Prior knowledge as a teacher

[Figure: K-SAN architecture. The input sentence s ("show me the flights from seattle to san francisco") and its knowledge-guided structure {xi} feed the knowledge encoding module: sentence encoding (CNNin) gives u, knowledge encoding (CNNkg) gives mi, their inner product gives the knowledge attention distribution pi, and the weighted sum of the encoded knowledge representations gives h, from which the knowledge-guided representation o is produced (NNout). An RNN tagger (weights U, W, V, M; hidden states ht-1, ht, ht+1) reads the words wt and outputs the slot tagging sequence y.]

(10)

Sentence s: show me the flights from seattle to san francisco

Syntax (Dependency Tree)

[Figure: dependency tree of the sentence, rooted at "show" (ROOT), with four numbered substructures.]

Knowledge-Guided Substructure xi
1. show me
2. show flights the
3. show flights from seattle
4. show flights to francisco san

Semantics (AMR Graph)

(s / show
   :ARG0 (y / you)
   :ARG1 (f / flight
      :source (c / city
         :name (d / name :op1 Seattle))
      :destination (c2 / city
         :name (s2 / name :op1 San :op2 Francisco)))
   :ARG2 (i / I))

Knowledge-Guided Substructure xi
1. show you
2. show flight seattle
3. show flight san francisco
4. show i

[Figure: AMR graph of the sentence with the same four numbered substructures.]
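
The syntactic substructures listed on this slide read as root-to-leaf paths of the dependency tree. A minimal sketch of that extraction follows, assuming the tree is given as a head-to-dependents mapping; the mapping below is written by hand for illustration (it is not parser output), and using words as node ids assumes no word repeats in the sentence.

```python
def root_to_leaf_paths(children, root):
    """Return every path from the root to a leaf, in depth-first order."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        kids = children.get(node, [])
        if not kids:  # a leaf closes one substructure
            paths.append(prefix)
        for kid in kids:
            walk(kid, prefix)

    walk(root, [])
    return paths


# Hand-written head -> dependents mapping for the example sentence (assumption).
children = {
    "show": ["me", "flights"],
    "flights": ["the", "from", "to"],
    "from": ["seattle"],
    "to": ["francisco"],
    "francisco": ["san"],
}

paths = root_to_leaf_paths(children, "show")
for i, path in enumerate(paths, 1):
    print(f"{i}. {' '.join(path)}")
```

Each path can then be encoded separately as one substructure xi.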

(11)


The model will pay more attention to more important substructures that may be crucial for slot tagging.
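
The attention step named in the K-SAN diagram (inner product, attention distribution pi, weighted sum) can be sketched with NumPy. The vectors here are random stand-ins for the CNN encodings, and the final combination `o = h + u` is an assumption made for illustration; the slide only shows that o is derived from h.

```python
import numpy as np


def knowledge_attention(u, M):
    """u: encoded sentence, shape (d,). M: encoded substructures m_i, shape (n, d)."""
    scores = M @ u                       # inner product of u with each m_i
    scores = scores - scores.max()       # shift for numerical stability
    p = np.exp(scores) / np.exp(scores).sum()  # attention distribution p_i
    h = p @ M                            # weighted sum of encoded knowledge
    o = h + u                            # assumption: plain sum as the combination
    return p, h, o


rng = np.random.default_rng(0)
u = rng.normal(size=8)        # stand-in for the sentence encoding
M = rng.normal(size=(4, 8))   # stand-in for four substructure encodings
p, h, o = knowledge_attention(u, M)
print(p.round(3), p.sum())
```

Substructures whose encoding aligns with the sentence encoding receive larger weights pi and therefore contribute more to h.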


(14)

ATIS Dataset (F1 slot filling)

Model                   Small (1/40)   Medium (1/10)   Large
Tagger (GRU)            73.83          85.55           93.11
Encoder-Tagger (GRU)    72.79          88.26           94.75
K-SAN (Stanford dep)    74.60+         87.99           94.86+
K-SAN (Syntaxnet dep)   74.35+         88.40+          95.00+
K-SAN (AMR)             74.32+         88.14           94.85+
K-SAN (JAMR)            74.27+         88.27+          94.89+

Syntax provides richer knowledge and more general guidance when less training data is available.

Semantics captures the most salient information, so it achieves similar performance with far fewer substructures.

(15)

Joint Intent Prediction and Slot Filling

[Figure: K-SAN extended for joint parsing. The knowledge encoding module (CNNkg, CNNin, NNout) and the RNN tagger are unchanged; after the slot tagging sequence y, an extra EOS step at the end of the input sentence outputs the intent.]

Extend the K-SAN model to joint semantic frame parsing by outputting the user intent at the last time step (Hakkani-Tur et al.).


(17)

When data is scarce, K-SAN with joint parsing significantly improves the performance (slot & frame)

Indep = Slot (Indep), Joint = Slot (Joint)

ATIS Dataset (train: 4478 / dev: 500 / test: 893)

                    Small (1/40)              Medium (1/10)             Large
Model               Indep   Joint   Frame     Indep   Joint   Frame     Indep   Joint   Frame
Tagger              73.8    73.0    33.5      85.6    86.4    58.5      93.1    93.4    79.7
Encoder-Tagger      72.8    71.9    35.2      88.3    87.5    61.9      94.8    93.1    82.5
K-SAN (Syntax)      74.4+   74.6+   37.6+     88.4+   88.2+   63.5+     95.0+   95.4+   84.3+
K-SAN (Semantics)   74.3+   73.4+   37.1+     88.3    88.1+   63.6+     94.9+   95.1+   83.8+

Communication (train: 10479 / dev: 1000 / test: 2300)

                    Small (1/40)              Medium (1/10)             Large
Model               Indep   Joint   Frame     Indep   Joint   Frame     Indep   Joint   Frame
Tagger              45.5    50.3    48.9      69.0    69.8    68.2      80.4    79.8    79.5
Encoder-Tagger      45.5    47.7    52.7      69.4    73.1    71.4      85.7    86.0    83.9
K-SAN (Syntax)      45.0    55.1+   57.2+     69.5+   75.3+   73.5+     85.0    84.5    84.5
K-SAN (Semantics)   45.1    55.0    54.1+     69.1    74.3+   73.8+     85.3    85.2    83.4

(18)

Darker blocks and lines correspond to higher attention weights


Even with less training data, K-SAN pays similar attention to the salient substructures that are important for tagging.


(20)

Modularizing Unsupervised Sense Embeddings (MUSE)

(21)

Word embeddings are trained on a corpus in an unsupervised manner

Using the same embeddings for different senses for NLP tasks, e.g. NLU, POS tagging

Finally I chose Google instead of Apple.

Can you buy me a bag of apples, oranges, and bananas?

G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” preprint arXiv: 1704.04601, 2017.

Words with different senses should correspond to different embeddings

(22)

Smartphone companies including blackberry, and sony will be invited.

Input: unannotated text corpus

Two key mechanisms

Sense selection given a text context

Sense representation to embed statistical characteristics of sense identity

apple

apple-1 apple-2 sense selection

sense embedding

(23)

Efficient sense selection

[Neelakantan et al., 2014; Li and Jurafsky, 2015]

Use word embeddings as input to update the sense posterior given words

Introduce ambiguity

Purely sense-level embedding [Qiu et al., 2016]

Inefficient sense selection → exponential time complexity

Prior approaches suffer from either ambiguity or inefficiency

(24)

Sense selection

Policy-based

Value-based

Corpus: { Smartphone companies including apple, blackberry, and sony will be invited. }

[Figure: MUSE architecture. A collocation is sampled from the corpus. The sense selection module chooses a sense for the target word Ct = wi from its context (scores q(zi1|C̄t), q(zi2|C̄t), q(zi3|C̄t); matrices P and Qi) and for the collocated word Ct' = wj (scores q(zj1|C̄t'), q(zj2|C̄t'), q(zj3|C̄t'); matrices P and Qj). The sense representation module (matrices U and V) learns sense representations with a skip-gram approximation and negative sampling, e.g. P(zj2|zi1) against P(zuv|zi1), and sends a reward signal back to sense selection.]

(25)

Learning algorithm

Sense selection strategy

Stochastic policy: selects the sense based on the probability distribution

Greedy: selects the sense with the largest Q-value (no exploration)

ε-Greedy: selects a random sense with probability ε, and otherwise adopts the greedy strategy

Boltzmann: samples the sense based on the Boltzmann distribution modeled by Q-value
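
The four selection strategies listed above differ only in how they turn per-sense scores into a choice. The sketch below is a minimal illustration, not the authors' implementation: the function name, the `epsilon` and `temperature` defaults, and the example scores are assumptions.

```python
import math
import random


def select_sense(scores, strategy, epsilon=0.05, temperature=1.0, rng=random):
    """Pick a sense index from per-sense scores.

    For "stochastic", scores are treated as a probability distribution;
    for the other strategies they are treated as Q-values.
    """
    n = len(scores)
    greedy = max(range(n), key=lambda k: scores[k])
    if strategy == "stochastic":
        return rng.choices(range(n), weights=scores, k=1)[0]
    if strategy == "greedy":
        return greedy  # no exploration
    if strategy == "epsilon-greedy":
        return rng.randrange(n) if rng.random() < epsilon else greedy
    if strategy == "boltzmann":
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp((q - top) / temperature) for q in scores]
        return rng.choices(range(n), weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")


q_values = [0.1, 0.7, 0.2]  # illustrative scores for three senses of one word
rng = random.Random(0)
print(select_sense(q_values, "greedy"))
print(select_sense(q_values, "epsilon-greedy", epsilon=0.1, rng=rng))
print(select_sense(q_values, "boltzmann", temperature=0.5, rng=rng))
```

Greedy never explores, ε-greedy explores uniformly, and Boltzmann explores in proportion to how promising each sense currently looks.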


(26)

Dataset: SCWS for multi-sense embedding evaluation

He borrowed the money from banks. I live near to a river. correlation=?

Approach                   MaxSimC   AvgSimC
Huang et al., 2012         26.1      65.7
Neelakantan et al., 2014   60.1      69.3
Tian et al., 2014          63.6      65.4
Li & Jurafsky, 2015        66.6      66.8
Bartunov et al., 2016      53.8      61.2
Qiu et al., 2016           64.9      66.1

[Figure: the word "bank" has senses bank-1 and bank-2, combined with weights 0.6 and 0.4; labels: Baseline, MaxSimC.]
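
The two column names can be made concrete with a small sketch. It assumes the usual definitions: MaxSimC compares only the most probable sense of each word in its context, while AvgSimC averages over all sense pairs weighted by their probabilities. The toy vectors and the single-sense "river" entry are made up for illustration; only the 0.6 / 0.4 weights come from the slide.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def max_sim_c(p1, S1, p2, S2):
    """Similarity between the most probable sense of each word."""
    return cosine(S1[int(np.argmax(p1))], S2[int(np.argmax(p2))])


def avg_sim_c(p1, S1, p2, S2):
    """Probability-weighted average similarity over all sense pairs."""
    return float(sum(p1[i] * p2[j] * cosine(S1[i], S2[j])
                     for i in range(len(p1)) for j in range(len(p2))))


# Toy example: "bank" has two senses with context probabilities 0.6 and 0.4.
p_bank = np.array([0.6, 0.4])
S_bank = np.array([[1.0, 0.0],    # bank-1 (made-up vector)
                   [0.0, 1.0]])   # bank-2 (made-up vector)
p_river = np.array([1.0])
S_river = np.array([[0.0, 1.0]])  # made-up vector

max_score = max_sim_c(p_bank, S_bank, p_river, S_river)
avg_score = avg_sim_c(p_bank, S_bank, p_river, S_river)
print(max_score, avg_score)
```

In this toy case MaxSimC commits to bank-1 and scores 0, whereas AvgSimC still credits the 0.4 probability mass on bank-2.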


(29)

Dataset: SCWS for multi-sense embedding evaluation

He borrowed the money from banks. I live near to a river. correlation=?

Approach                   MaxSimC   AvgSimC
Huang et al., 2012         26.1      65.7
Neelakantan et al., 2014   60.1      69.3
Tian et al., 2014          63.6      65.4
Li & Jurafsky, 2015        66.6      66.8
Bartunov et al., 2016      53.8      61.2
Qiu et al., 2016           64.9      66.1
MUSE-Policy                66.1      67.4
MUSE-Greedy                66.3      68.3
MUSE-ε-Greedy              67.4+     68.6


(31)

Category                       Approach                          ESL-50   RD-300   TOEFL-80
Conventional Word Embedding    Global Context                    47.73    45.07    60.87
                               SkipGram                          52.08    55.66    66.67
Word Sense Disambiguation      IMS+SkipGram                      41.67    53.77    66.67
Unsupervised Sense Embedding   EM                                27.08    33.96    40.00
                               MSSG (Neelakantan et al., 2014)   57.14    58.93    78.26
                               CRP (Li & Jurafsky, 2015)         50.00    55.36    82.61
                               MUSE-Policy                       52.38    51.79    79.71
                               MUSE-Greedy                       57.14    58.93    79.71
                               MUSE-ε-Greedy                     61.90+   62.50+   84.06+
                               MUSE-Boltzmann                    64.29+   66.07+   88.41+
Supervised Sense Embedding     Retro-GlobalContext               63.64    66.20    71.01
                               Retro-SkipGram                    56.25    65.09    73.33

MUSE with exploration achieves the state-of-the-art results for synonym selection.

(32)

KNN senses sorted by collocation likelihood

Context: … braves finish the season in tie with the los angeles dodgers …
KNN senses: scoreless otl shootout 6-6 hingis 3-3 7-7 0-0

Context: … his later years proudly wore tie with the chinese characters for …
KNN senses: pants trousers shirt juventus blazer socks anfield

Context: … of the mulberry or the blackberry and minos sent him to …
KNN senses: cranberries maple vaccinium apricot apple

Context: … of the large number of blackberry users in the us federal …
KNN senses: smartphones sap microsoft ipv6 smartphone

Context: … ladies wore extravagant head ornaments combs pearl necklaces face …
KNN senses: venter thorax neck spear millimeters fusiform

Context: … appoint john pope republican as head of the new army of …
KNN senses: multi-party appoints unicameral beria appointed

MUSE learns sense embeddings in an unsupervised way and achieves the first purely sense-level representation learning system with linear-time sense selection.

(33)


Leveraging Behavior Patterns of Mobile Apps for Personalized Spoken Language Understanding

Y.-N. Chen, S. Ming, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.

(34)

Task: user intent prediction

Challenge: language ambiguity

User preference

Some people prefer “Message” to “Email”

Some people prefer “Outlook” to “Gmail”

App-level contexts

“Message” is more likely to follow “Camera”

“Email” is more likely to follow “Excel”

Example: "send to vivian" → Email vs. Message? (Communication)

(35)

Subjects’ app invocation is logged on a daily basis

Subjects annotate their app activities with

Task Structure: link applications that serve a common goal

Task Description: briefly describe the goal or intention of the task

Subjects use a wizard system to perform the annotated task by speech

Meta: TASK59; 20150203; 1; Tuesday; 10:48
Desc: play music via bluetooth speaker
App: com.android.settings → com.lge.music

Dialogue
W1: Ready.
U1: Connect my phone to bluetooth speaker. (SETTINGS)
W2: Connected to bluetooth speaker.
U2: And play music. (MUSIC)
W3: What music would you like to play?
U3: Shuffle playlist. (MUSIC)
W4: I will play the music for you.

(36)

[Figure: feature matrix. Rows are user utterances; columns are lexical features (photo, tell, check, send, email), behavioral features (NULL, CAMERA, CHROME), and intended apps (CAMERA, IM, CHROME, EMAIL). Observed entries are 1; fractional scores (.85, .70, .95, .80, .55) mark hidden semantics.]

Train (user utterance → intended app):
take this photo → CAMERA
tell vivian this is me in the lab → IM
check my grades on website → CHROME
send an email to professor → EMAIL
take a photo of this / send it to alice → CAMERA / IM

Test:
take a photo of this / send it to alex

Issue: unobserved hidden semantics may benefit understanding

(37)

The decomposed matrices represent low-rank latent semantics for utterances and words/histories/apps respectively

The product of two matrices fills the probability of hidden semantics

[Figure: the utterance-by-feature matrix, of size |U| × (|W| + |H| + |A|), is factorized into a |U| × d matrix and a d × (|W| + |H| + |A|) matrix, where the columns cover words (W), histories (H), and apps (A).]

(38)

Model implicit feedback by completing the matrix

do not treat unobserved facts as negative samples (true or false)

give observed facts higher scores than unobserved facts

Objective: [Equation: ranking objective over pairs of an observed fact f+ and an unobserved fact f, for utterance u and feature x]

the model can be learned by SGD updates with fact pairs
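
One common way to realize "observed facts score higher than unobserved facts" with SGD over fact pairs is a pairwise logistic ranking loss in the style of Bayesian Personalized Ranking. The sketch below uses that loss as a stand-in, since the slide's objective did not survive extraction; the function name, hyperparameters, and toy matrix are all assumptions.

```python
import numpy as np


def train_pairwise_mf(observed, n_rows, n_cols, dim=4, lr=0.1, reg=0.01,
                      epochs=500, seed=0):
    """Learn row/column factors so observed cells outrank unobserved ones."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(n_rows, dim))   # utterance factors
    Q = 0.1 * rng.normal(size=(n_cols, dim))   # feature/app factors
    observed_set = set(observed)
    for _ in range(epochs):
        for u, pos in observed:
            neg = int(rng.integers(n_cols))    # sample a candidate unobserved fact
            if (u, neg) in observed_set:
                continue
            pu, qp, qn = P[u].copy(), Q[pos].copy(), Q[neg].copy()
            diff = pu @ (qp - qn)
            g = 1.0 / (1.0 + np.exp(diff))     # gradient of ln sigmoid(diff)
            P[u] += lr * (g * (qp - qn) - reg * pu)
            Q[pos] += lr * (g * pu - reg * qp)
            Q[neg] += lr * (-g * pu - reg * qn)
    return P, Q


# Toy matrix: utterances 0 and 1 fire columns 0 and 1; utterance 2 fires columns 2 and 3.
observed = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3)]
P, Q = train_pairwise_mf(observed, n_rows=3, n_cols=4)
scores = P @ Q.T   # completed matrix: a score for every cell, observed or not
print(scores.round(2))
```

Unobserved cells are never pushed toward zero as explicit negatives; they are only ranked below the observed cells of the same row, which is what lets the completed matrix assign them usable scores.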

(39)

Reasoning with Matrix Factorization for Implicit Intents

[Figure: the same utterance-by-feature matrix (lexical, behavioral, and intended-app columns) with train and test utterances; matrix factorization fills in scores for the test utterances "take a photo of this" and "send it to alex".]


(41)


Lexical features are useful for predicting intended apps in both user-independent and user-dependent models.


(43)

Dataset: 533 dialogues (1,607 utterances); 455 multi-turn dialogues

Google recognized transcripts (word error rate = 25%)

Evaluation metric: accuracy of user intent prediction (ACC) and mean average precision of ranked intents (MAP)

Baseline: Maximum Likelihood Estimation (MLE), Multinomial Logistic Regression (MLR)

Results (ACC / MAP):

Approach                      Lexical       Behavioral    All
(a) MLE, User-Indep           -             13.5 / 19.6   -
(b) MLE, User-Dep             -             20.2 / 27.9   -
(c) MLR, User-Indep           42.8 / 46.4   14.9 / 18.7   46.2+ / 50.1+
(d) MLR, User-Dep             48.2 / 52.1   19.3 / 25.2   50.1+ / 53.9+
(e) (c) + Personalized MF     47.6 / 51.1   16.4 / 20.3   50.3+* / 54.2+*
(f) (d) + Personalized MF     48.3 / 52.7   20.6 / 26.7   51.9+* / 55.7+*

Personalized MF significantly improves MLR results by considering hidden semantics.
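
The two metrics in the table can be computed as in the sketch below. It assumes ACC counts an utterance as correct when the top-ranked app is a gold app, and that MAP is the mean of per-utterance average precision; the ranked lists and gold sets are made-up examples.

```python
def accuracy(ranked_lists, gold_sets):
    """Fraction of utterances whose top-ranked intent is a gold intent."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets)
               if ranked and ranked[0] in gold)
    return hits / len(ranked_lists)


def average_precision(ranked, gold):
    """Average of precision@k over the ranks k where a gold intent appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, 1):
        if item in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0


def mean_average_precision(ranked_lists, gold_sets):
    aps = [average_precision(r, g) for r, g in zip(ranked_lists, gold_sets)]
    return sum(aps) / len(aps)


# Made-up predictions for two utterances whose gold app is IM.
ranked_lists = [["IM", "EMAIL", "CAMERA"], ["EMAIL", "IM", "CAMERA"]]
gold_sets = [{"IM"}, {"IM"}]

acc = accuracy(ranked_lists, gold_sets)
mean_ap = mean_average_precision(ranked_lists, gold_sets)
print(acc, mean_ap)  # 0.5 0.75
```

MAP rewards a model for ranking the right app second rather than last, which is why it sits above ACC in every row of the table.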

(44)

App functionality modeling

Learning app embeddings

(45)


Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” preprint arXiv: 1703.01008, 2017.

X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” preprint arXiv: 1703.07055, 2017.

(46)

Dialogue management is framed as a reinforcement learning task

Agent learns to select actions to maximize the expected reward

[Figure: agent-environment loop. The environment sends an observation and a reward to the agent.]

Reward:
If the right ticket is booked, reward = +30
If the task fails, reward = -30
Otherwise, reward = -1
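
The reward scheme above can be written down directly. In this sketch the outcome labels ("success", "failure", "ongoing") and the optional discount factor `gamma` are assumptions added for illustration; only the three reward values come from the slide.

```python
def turn_reward(outcome):
    """Per-turn reward: +30 on success, -30 on failure, -1 for any other turn."""
    if outcome == "success":
        return 30
    if outcome == "failure":
        return -30
    return -1


def dialogue_return(outcomes, gamma=1.0):
    """Sum of (optionally discounted) turn rewards over one dialogue."""
    return sum((gamma ** t) * turn_reward(o) for t, o in enumerate(outcomes))


# A dialogue that books the right ticket on its fifth turn.
outcomes = ["ongoing"] * 4 + ["success"]
total = dialogue_return(outcomes)
print(total)
```

The -1 per-turn penalty is what pushes the agent toward shorter dialogues: the same success reached after more turns yields a lower return.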

(47)

[Figure: the user simulator (user agenda modeling, natural language generation) plays the environment and the neural dialogue system (language understanding, dialogue management) plays the agent, exchanging observations and actions. Example text input: "Are there any action movies to see this weekend?"; example dialogue policy output: request_location.]

(48)

NLU and NLG are trained in a supervised manner

DM is trained in a reinforcement learning framework (NLU and NLG can be fine-tuned)

[Figure: end-to-end neural dialogue system paired with a user simulator. The user simulator holds a user goal and uses user agenda modeling plus natural language generation to produce text input (e.g. "Are there any action movies to see this weekend?"). Language understanding tags each word wi with a slot label (e.g. B-type, O) and predicts an <intent> at EOS, giving a semantic frame (request_movie; genre=action, date=this weekend). LU outputs at times t-2, t-1, and t feed dialogue management, which outputs the dialogue policy (request_location). Example user dialogue action: Inform(location=San Francisco).]

(49)

DM receives frame-level information

No error model: perfect recognizer and LU

Error model: simulate the possible errors

[Figure: frame-level user simulation. The user model produces user dialogue acts (semantic frames), which pass through an error model (recognition error, LU error) to dialogue management (dialogue state tracking and dialogue policy optimization); DM returns system dialogue acts to the user simulation.]

(50)

User simulator sends natural language

No recognition error

Errors from NLG or LU

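[Figure: natural-language user simulation. The user model's output passes through natural language generation (NLG) as natural language (NL), then through language understanding (LU) to dialogue management (dialogue state tracking and dialogue policy optimization); DM returns system dialogue acts to the user simulation.]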

(51)

[Figure: results in two settings, frame-level semantics and natural language.]

The RL agent is able to learn how to interact with users to complete tasks more efficiently and effectively, and outperforms the rule-based agent.


(53)

[Figure: Learning Curve of System Performance. x-axis: Simulation Epoch (0 to 492); y-axis: Success Rate (0 to 1). Curves: Upper Bound; RL agent (DQN) without LU errors and with 5% LU errors; rule-based agent without LU errors and with 5% LU errors. Annotation: >5% performance drop.]

The system performance is sensitive to LU errors (sentence-level contexts), for both rule-based and RL agents.

(54)

Intent error type

I0: random

I1: within group

I2: between group

Intent error rate

I3: 0.00

I4: 0.10

I5: 0.20

Group 1: greeting(), thanks(), etc.
Group 2: inform(xx)
Group 3: request(xx)

Example: request_moviename(actor=Robert Downey Jr) → request_year

(55)

Slot error type

S0: random

S1: slot deletion

S2: value substitution

S3: slot substitution

Slot error rate

S4: 0.00

S5: 0.10

S6: 0.20

Slot errors significantly degrade the RL system performance

Value substitution has the largest impact on the system performance

Example: request_moviename(actor=Robert Downey Jr), where slot substitution changes actor to director and value substitution changes Robert Downey Jr to Robert Downey Sr
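
A slot-level error model of this kind can be simulated by corrupting each slot of a semantic frame with a given probability. The sketch below is an assumption-laden illustration, not the authors' simulator: the function name, the frame layout, and the `slot_vocab` table are hypothetical, and intent errors are left out.

```python
import random

SLOT_ERROR_TYPES = ("slot_deletion", "value_substitution", "slot_substitution")


def corrupt_slots(frame, error_rate, error_type, slot_vocab, rng):
    """Return a noisy copy of frame; each slot is corrupted with prob. error_rate.

    error_type is one of SLOT_ERROR_TYPES, or "random" to pick one per slot.
    slot_vocab maps every known slot name to its candidate values (hypothetical).
    """
    noisy = {"intent": frame["intent"], "slots": {}}
    for slot, value in frame["slots"].items():
        if rng.random() >= error_rate:
            noisy["slots"][slot] = value           # slot passes through unchanged
            continue
        kind = rng.choice(SLOT_ERROR_TYPES) if error_type == "random" else error_type
        if kind == "slot_deletion":
            continue                               # slot is dropped
        if kind == "value_substitution":           # keep the slot, swap the value
            others = [v for v in slot_vocab[slot] if v != value]
            noisy["slots"][slot] = rng.choice(others) if others else value
        else:                                      # keep the value, swap the slot
            others = [s for s in slot_vocab if s != slot and s not in frame["slots"]]
            noisy["slots"][rng.choice(others) if others else slot] = value
    return noisy


frame = {"intent": "request_moviename", "slots": {"actor": "Robert Downey Jr"}}
slot_vocab = {"actor": ["Robert Downey Jr", "Robert Downey Sr"],
              "director": ["Robert Downey Sr"]}
rng = random.Random(0)

deleted = corrupt_slots(frame, 1.0, "slot_deletion", slot_vocab, rng)
value_sub = corrupt_slots(frame, 1.0, "value_substitution", slot_vocab, rng)
slot_sub = corrupt_slots(frame, 1.0, "slot_substitution", slot_vocab, rng)
clean = corrupt_slots(frame, 0.0, "random", slot_vocab, rng)
print(deleted["slots"], value_sub["slots"], slot_sub["slots"], clean["slots"])
```

Running the dialogue manager on frames corrupted at rates 0.00, 0.10, and 0.20 for each error type would reproduce the shape of the comparison described on the slide.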

(56)

[Figure: two panels, one varying the intent error rate and one varying the slot error rate.]

The RL agent has better robustness to intent errors in terms of dialogue-level performance

(57)

Word-level contexts in sentences help understand word meanings

Learning from Prior Knowledge –

K-SAN achieves better LU via known knowledge [Chen et al., ‘16]

Learning from Observations –

MUSE learns sense embeddings with efficient sense selection [Lee & Chen, ‘17]

Sentence-level contexts have different impacts on dialogue performance

Inference –

App contexts improve personalized understanding via inference [Chen et al., ‘15]

Investigation of Understanding Impact –

Slot errors degrade system performance more than intent errors [Li et al., ‘17]

Contexts from different levels provide cues for better understanding in supervised and unsupervised ways


(58)

Q & A
