▪ Subjects’ app invocations are logged on a daily basis
▪ Subjects annotate their app activities with
  ▪ Task Structure: link applications that serve a common goal
  ▪ Task Description: briefly describe the goal or intention of the task
▪ Subjects use a wizard system to perform the annotated task by speech
Example annotated task:
▪ Meta: TASK59; 20150203; 1; Tuesday; 10:48
▪ Desc: play music via bluetooth speaker
▪ App: com.android.settings, com.lge.music
▪ Intended apps: SETTINGS, MUSIC
▪ Dialogue (W = wizard, U = user):
  W1: Ready.
  U1: Connect my phone to bluetooth speaker.
  W2: Connected to bluetooth speaker.
  U2: And play music.
  W3: What music would you like to play?
  U3: Shuffle playlist.
  W4: I will play the music for you.
Y.-N. Chen, M. Sun, A. I. Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
[Figure: feature matrix for personalized SLU. Rows are utterances; columns are lexical features (photo, tell, check, send), behavioral features (NULL, CAMERA, CHROME, ...), and intended apps (CAMERA, IM, CHROME, EMAIL). Training rows include "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on website" → CHROME, "send an email to professor" → EMAIL, and "take a photo of this / send it to alice" → CAMERA, IM. Observed facts are 1s; learned probabilities (e.g., .85, .95, .80, .70, .55) fill unobserved cells. Test row: "take a photo of this / send it to alex", whose hidden semantics must be inferred.]
Issue: unobserved hidden semantics may benefit understanding
▪ The decomposed matrices represent low-rank latent semantics for utterances and words/histories/apps, respectively
▪ The product of the two matrices fills in the probabilities of the hidden semantics
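A minimal numpy sketch of this idea (the toy matrix, the rank d = 2, and all variable names are illustrative, not the paper's setup): a truncated SVD factorizes the observed fact matrix, and the product of the two low-rank factors assigns scores to the unobserved cells.

```python
import numpy as np

# Toy observed matrix: rows = utterances; columns = word/history/app features.
# 1 = observed fact, 0 = unobserved (not assumed false).
M = np.array([
    [1., 1., 0., 0.],
    [1., 0., 1., 0.],
    [0., 1., 0., 1.],
])

d = 2  # latent dimension (illustrative)

# Truncated SVD as the simplest low-rank factorization: M ≈ U_d @ V_d
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U_d = U[:, :d] * s[:d]   # latent semantics of utterances (|U| x d)
V_d = Vt[:d, :]          # latent semantics of features   (d x |W|+|H|+|A|)

M_hat = U_d @ V_d        # the product fills in unobserved cells
print(np.round(M_hat, 2))
```

The reconstruction keeps observed entries close to 1 while also producing nonzero scores for some unobserved entries, which is exactly the "filled-in hidden semantics" behavior described above.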
Factorization: the |𝑼| × (|𝑾| + |𝑯| + |𝑨|) matrix is approximated by the product of an |𝑼| × 𝒅 matrix and a 𝒅 × (|𝑾| + |𝑯| + |𝑨|) matrix, where 𝒅 is the latent dimension.
▪ Model implicit feedback by completing the matrix
  ▪ do not treat unobserved facts as negative samples (true or false)
  ▪ give observed facts higher scores than unobserved facts
▪ Objective: maximize Σ ln σ(θ(f+) − θ(f−)) over pairs of an observed fact f+ and an unobserved fact f−, where σ is the sigmoid function
  ▪ the model can be trained by SGD updates over fact pairs (f+, f−)
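A compact sketch of the pairwise SGD training loop (toy data; sizes, learning rate, and regularization are illustrative, not the paper's settings). Each step samples an observed fact f+ and an unobserved fact f− for the same utterance and nudges the factor matrices so the observed score exceeds the unobserved one:

```python
import numpy as np

rng = np.random.default_rng(0)

n_utt, n_feat, d = 20, 10, 4                           # illustrative sizes
M = (rng.random((n_utt, n_feat)) < 0.3).astype(float)  # toy observed facts

U = rng.normal(scale=0.1, size=(n_utt, d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(n_feat, d))  # latent feature factors
lr, reg = 0.05, 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(20000):
    u = rng.integers(n_utt)
    pos = np.flatnonzero(M[u] == 1)   # observed facts f+
    neg = np.flatnonzero(M[u] == 0)   # unobserved facts f-
    if len(pos) == 0 or len(neg) == 0:
        continue
    i, j = rng.choice(pos), rng.choice(neg)
    x = U[u] @ (V[i] - V[j])          # theta(f+) - theta(f-)
    g = sigmoid(-x)                   # gradient scale of ln sigmoid(x)
    u_old = U[u].copy()
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * u_old - reg * V[i])
    V[j] += lr * (-g * u_old - reg * V[j])

scores = U @ V.T
print(scores[M == 1].mean() > scores[M == 0].mean())  # observed facts rank higher
```

Note the loss never labels unobserved cells as false; it only asks that observed cells outrank them, which is the point of the implicit-feedback formulation above.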
Reasoning with Matrix Factorization for Implicit Intents
[Figure: the same feature matrix at test time. The test utterance "take a photo of this / send it to alex" is scored against the lexical, behavioral, and intended-app columns, and the factorized model infers the intended apps (CAMERA, IM), including the implicit one.]
▪ Dataset: 533 dialogues (1,607 utterances); 455 multi-turn dialogues
▪ Google-recognized transcripts (word error rate = 25%)
▪ Evaluation metrics: accuracy of user intent prediction (ACC) and mean average precision of ranked intents (MAP)
▪ Baselines: Maximum Likelihood Estimation (MLE) and Multinomial Logistic Regression (MLR)
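As a concrete reference for the two metrics, a small sketch (function names and the toy data are illustrative): ACC checks only the top-ranked app, while MAP averages precision over the positions of all intended apps in each ranked list.

```python
def accuracy(ranked_lists, gold_sets):
    """Top-1 intent accuracy: the highest-ranked app is an intended one."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets)
               if ranked[0] in gold)
    return hits / len(ranked_lists)

def mean_average_precision(ranked_lists, gold_sets):
    """MAP over ranked intent lists."""
    ap_sum = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        hits, precisions = 0, []
        for k, app in enumerate(ranked, start=1):
            if app in gold:
                hits += 1
                precisions.append(hits / k)
        ap_sum += sum(precisions) / max(len(gold), 1)
    return ap_sum / len(ranked_lists)

# Toy example: two utterances with ranked app predictions
ranked = [["CAMERA", "IM", "CHROME"], ["EMAIL", "CHROME", "IM"]]
gold = [{"CAMERA", "IM"}, {"CHROME"}]
print(accuracy(ranked, gold))                # 0.5
print(mean_average_precision(ranked, gold))  # 0.75
```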
Lexical features are useful for predicting intended apps in both user-independent and user-dependent models.
Approach                    | Lexical     | Behavioral  | All
(a) MLE, User-Indep         | –           | –           | 13.5 / 19.6
(b) MLE, User-Dep           | –           | –           | 20.2 / 27.9
(c) MLR, User-Indep         | 42.8 / 46.4 | 14.9 / 18.7 | 46.2+ / 50.1+
(d) MLR, User-Dep           | 48.2 / 52.1 | 19.3 / 25.2 | 50.1+ / 53.9+
(e) (c) + Personalized MF   | 47.6 / 51.1 | 16.4 / 20.3 | 50.3+* / 54.2+*
(f) (d) + Personalized MF   | 48.3 / 52.7 | 20.6 / 26.7 | 51.9+* / 55.7+*
(each cell: ACC / MAP)

Personalized MF significantly improves MLR results by considering hidden semantics.
▪ App functionality modeling
  ▪ Learning app embeddings
Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv preprint arXiv:1703.01008, 2017.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv preprint arXiv:1703.07055, 2017.
▪ Dialogue management is framed as a reinforcement learning task
▪ The agent learns to select actions that maximize the expected reward
[Figure: agent-environment loop; the agent receives an observation and a reward from the environment and emits an action]
▪ Reward: +30 for booking the right ticket, -30 for failing, -1 for every other turn
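The reward scheme above can be written down directly; a minimal sketch (function and parameter names are illustrative) where the per-turn penalty encourages short, successful dialogues:

```python
def turn_reward(done: bool, success: bool) -> int:
    """Reward signal for the ticket-booking agent.

    +30 when the dialogue ends with a correctly booked ticket,
    -30 when it ends in failure, and -1 for every intermediate
    turn, so shorter successful dialogues earn more total reward.
    """
    if done:
        return 30 if success else -30
    return -1

# A 5-turn dialogue that ends in success: four -1 steps, then +30
rewards = [turn_reward(done=(t == 4), success=True) for t in range(5)]
print(sum(rewards))  # 26
```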
[Figure: end-to-end task-completion neural dialogue system. A user simulator (user agenda modeling + natural language generation) interacts with the neural dialogue system (language understanding + dialogue management). Example: the text input "Are there any action movies to see this weekend?" leads the dialogue policy to select request_location.]
▪ NLU and NLG are trained in a supervised manner
▪ DM is trained in a reinforcement learning framework (NLU and NLG can be fine-tuned)
[Figure: end-to-end neural dialogue system architecture. Language understanding tags each input word w_i with BIO slot labels (e.g., B-type, O) and predicts an <intent> at EOS; dialogue management tracks user dialogue actions (e.g., Inform(location=San Francisco)) across time steps t-2, t-1, t, and the dialogue policy selects a system action (e.g., request_location); natural language generation produces the response word by word (w_0, w_1, w_2, ..., EOS). A user simulator with agenda-based user modeling and a user goal drives the interaction. Example: text input "Are there any action movies to see this weekend?" → semantic frame request_movie(genre=action, date=this weekend).]
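A small sketch of the frame-level interface between LU and DM (the class, field names, and the missing-slot policy rule are illustrative), using the example utterance from the figure:

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """Frame-level LU output passed to dialogue management."""
    intent: str
    slots: dict = field(default_factory=dict)

# LU output for: "Are there any action movies to see this weekend?"
frame = SemanticFrame(
    intent="request_movie",
    slots={"genre": "action", "date": "this weekend"},
)

# A toy policy rule: request the first missing constraint, else answer
missing = [s for s in ("genre", "date", "location") if s not in frame.slots]
action = f"request_{missing[0]}" if missing else "inform_result"
print(action)  # request_location
```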
▪ DM receives frame-level information
  ▪ No error model: assumes a perfect recognizer and LU
  ▪ Error model: simulates the possible errors
▪ Error model covers:
  ▪ Recognition errors
  ▪ LU errors
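A minimal sketch of such an error model (the error types, rates, and intent inventory are illustrative, not the paper's settings): before the frame reaches the DM, the intent may be swapped and slots may be dropped to mimic recognition and LU mistakes.

```python
import random

def corrupt_frame(frame, intent_err=0.1, slot_err=0.1,
                  intents=("request_movie", "request_ticket"), seed=None):
    """Simulate recognition/LU errors on a frame-level input.

    With probability intent_err, replace the intent with a different
    one (an LU intent error); drop each slot independently with
    probability slot_err (simulating recognition/deletion errors).
    """
    rng = random.Random(seed)
    intent, slots = frame
    if rng.random() < intent_err:
        intent = rng.choice([i for i in intents if i != intent])
    slots = {k: v for k, v in slots.items() if rng.random() >= slot_err}
    return intent, slots

clean = ("request_movie", {"genre": "action", "date": "this weekend"})
noisy = corrupt_frame(clean, intent_err=0.3, slot_err=0.3, seed=0)
print(noisy)
```

Training the dialogue policy against such corrupted frames, instead of the perfect ones, is what lets the experiments measure how LU quality affects the learned policy.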