▪ Subjects’ app invocations are logged on a daily basis
▪ Subjects annotate their app activities with
  ▪ Task Structure: link applications that serve a common goal
  ▪ Task Description: briefly describe the goal or intention of the task
▪ Subjects use a wizard system to perform the annotated task by speech
Example annotated task:
▪ Meta: TASK59; 20150203; 1; Tuesday; 10:48
▪ Desc: play music via bluetooth speaker
▪ App: com.android.settings, com.lge.music
▪ Intended apps: SETTINGS, MUSIC
▪ Dialogue (W = wizard, U = user):
  W1: Ready.
  U1: Connect my phone to bluetooth speaker.
  W2: Connected to bluetooth speaker.
  U2: And play music.
  W3: What music would you like to play?
  U3: Shuffle playlist.
  W4: I will play the music for you.
Y.-N. Chen, M. Sun, A. I. Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
[Figure: feature matrix for personalized SLU. Rows are utterances; columns are lexical features (photo, tell, check, send), behavioral features (NULL, CAMERA, CHROME, ...), and intended apps (CAMERA, IM, CHROME, EMAIL). Training rows include "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on website" → CHROME, "send an email to professor" → EMAIL, and "take a photo of this / send it to alice" → CAMERA, IM. Observed facts are 1s; learned probabilities (e.g., .85, .95, .80, .70, .55) fill unobserved cells. Test row: "take a photo of this / send it to alex", whose hidden semantics must be inferred.]
Issue: unobserved hidden semantics may benefit understanding
▪ The decomposed matrices represent low-rank latent semantics for utterances and words/histories/apps, respectively
▪ The product of the two matrices fills in the probabilities of the hidden semantics
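A minimal numpy sketch of this idea (the toy matrix, the rank d = 2, and all variable names are illustrative, not the paper's setup): a truncated SVD factorizes the observed fact matrix, and the product of the two low-rank factors assigns scores to the unobserved cells.

```python
import numpy as np

# Toy observed matrix: rows = utterances; columns = word/history/app features.
# 1 = observed fact, 0 = unobserved (not assumed false).
M = np.array([
    [1., 1., 0., 0.],
    [1., 0., 1., 0.],
    [0., 1., 0., 1.],
])

d = 2  # latent dimension (illustrative)

# Truncated SVD as the simplest low-rank factorization: M ≈ U_d @ V_d
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U_d = U[:, :d] * s[:d]   # latent semantics of utterances (|U| x d)
V_d = Vt[:d, :]          # latent semantics of features   (d x |W|+|H|+|A|)

M_hat = U_d @ V_d        # the product fills in unobserved cells
print(np.round(M_hat, 2))
```

The reconstruction keeps observed entries close to 1 while also producing nonzero scores for some unobserved entries, which is exactly the "filled-in hidden semantics" behavior described above.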
Factorization: the |𝑼| × (|𝑾| + |𝑯| + |𝑨|) matrix is approximated by the product of an |𝑼| × 𝒅 matrix and a 𝒅 × (|𝑾| + |𝑯| + |𝑨|) matrix, where 𝒅 is the latent dimension.
▪ Model implicit feedback by completing the matrix
  ▪ do not treat unobserved facts as negative samples (true or false)
  ▪ give observed facts higher scores than unobserved facts
▪ Objective: maximize Σ ln σ(θ(f+) − θ(f−)) over pairs of an observed fact f+ and an unobserved fact f−, where σ is the sigmoid function
  ▪ the model can be trained by SGD updates over fact pairs (f+, f−)
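A compact sketch of the pairwise SGD training loop (toy data; sizes, learning rate, and regularization are illustrative, not the paper's settings). Each step samples an observed fact f+ and an unobserved fact f− for the same utterance and nudges the factor matrices so the observed score exceeds the unobserved one:

```python
import numpy as np

rng = np.random.default_rng(0)

n_utt, n_feat, d = 20, 10, 4                           # illustrative sizes
M = (rng.random((n_utt, n_feat)) < 0.3).astype(float)  # toy observed facts

U = rng.normal(scale=0.1, size=(n_utt, d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(n_feat, d))  # latent feature factors
lr, reg = 0.05, 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(20000):
    u = rng.integers(n_utt)
    pos = np.flatnonzero(M[u] == 1)   # observed facts f+
    neg = np.flatnonzero(M[u] == 0)   # unobserved facts f-
    if len(pos) == 0 or len(neg) == 0:
        continue
    i, j = rng.choice(pos), rng.choice(neg)
    x = U[u] @ (V[i] - V[j])          # theta(f+) - theta(f-)
    g = sigmoid(-x)                   # gradient scale of ln sigmoid(x)
    u_old = U[u].copy()
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * u_old - reg * V[i])
    V[j] += lr * (-g * u_old - reg * V[j])

scores = U @ V.T
print(scores[M == 1].mean() > scores[M == 0].mean())  # observed facts rank higher
```

Note the loss never labels unobserved cells as false; it only asks that observed cells outrank them, which is the point of the implicit-feedback formulation above.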
Reasoning with Matrix Factorization for Implicit Intents
[Figure: the same feature matrix at test time. The test utterance "take a photo of this / send it to alex" is scored against the lexical, behavioral, and intended-app columns, and the factorized model infers the intended apps (CAMERA, IM), including the implicit one.]
▪ Dataset: 533 dialogues (1,607 utterances); 455 multi-turn dialogues
▪ Google-recognized transcripts (word error rate = 25%)
▪ Evaluation metrics: accuracy of user intent prediction (ACC) and mean average precision of ranked intents (MAP)
▪ Baselines: Maximum Likelihood Estimation (MLE) and Multinomial Logistic Regression (MLR)
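As a concrete reference for the two metrics, a small sketch (function names and the toy data are illustrative): ACC checks only the top-ranked app, while MAP averages precision over the positions of all intended apps in each ranked list.

```python
def accuracy(ranked_lists, gold_sets):
    """Top-1 intent accuracy: the highest-ranked app is an intended one."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets)
               if ranked[0] in gold)
    return hits / len(ranked_lists)

def mean_average_precision(ranked_lists, gold_sets):
    """MAP over ranked intent lists."""
    ap_sum = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        hits, precisions = 0, []
        for k, app in enumerate(ranked, start=1):
            if app in gold:
                hits += 1
                precisions.append(hits / k)
        ap_sum += sum(precisions) / max(len(gold), 1)
    return ap_sum / len(ranked_lists)

# Toy example: two utterances with ranked app predictions
ranked = [["CAMERA", "IM", "CHROME"], ["EMAIL", "CHROME", "IM"]]
gold = [{"CAMERA", "IM"}, {"CHROME"}]
print(accuracy(ranked, gold))                # 0.5
print(mean_average_precision(ranked, gold))  # 0.75
```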
Lexical features are useful for predicting intended apps in both user-independent and user-dependent models.
Approach                    | Lexical     | Behavioral  | All
(a) MLE, User-Indep         | –           | –           | 13.5 / 19.6
(b) MLE, User-Dep           | –           | –           | 20.2 / 27.9
(c) MLR, User-Indep         | 42.8 / 46.4 | 14.9 / 18.7 | 46.2+ / 50.1+
(d) MLR, User-Dep           | 48.2 / 52.1 | 19.3 / 25.2 | 50.1+ / 53.9+
(e) (c) + Personalized MF   | 47.6 / 51.1 | 16.4 / 20.3 | 50.3+* / 54.2+*
(f) (d) + Personalized MF   | 48.3 / 52.7 | 20.6 / 26.7 | 51.9+* / 55.7+*
(each cell: ACC / MAP)

Personalized MF significantly improves MLR results by considering hidden semantics.
▪ App functionality modeling
  ▪ Learning app embeddings
Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv preprint arXiv:1703.01008, 2017.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv preprint arXiv:1703.07055, 2017.
▪ Dialogue management is framed as a reinforcement learning task
▪ The agent learns to select actions that maximize the expected reward
[Figure: agent-environment loop; the agent receives an observation and a reward from the environment and emits an action]
▪ Reward: +30 for booking the right ticket, -30 for failing, -1 for every other turn
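The reward scheme above can be written down directly; a minimal sketch (function and parameter names are illustrative) where the per-turn penalty encourages short, successful dialogues:

```python
def turn_reward(done: bool, success: bool) -> int:
    """Reward signal for the ticket-booking agent.

    +30 when the dialogue ends with a correctly booked ticket,
    -30 when it ends in failure, and -1 for every intermediate
    turn, so shorter successful dialogues earn more total reward.
    """
    if done:
        return 30 if success else -30
    return -1

# A 5-turn dialogue that ends in success: four -1 steps, then +30
rewards = [turn_reward(done=(t == 4), success=True) for t in range(5)]
print(sum(rewards))  # 26
```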
[Figure: end-to-end task-completion neural dialogue system. A user simulator (user agenda modeling + natural language generation) interacts with the neural dialogue system (language understanding + dialogue management). Example: the text input "Are there any action movies to see this weekend?" leads the dialogue policy to select request_location.]
▪ NLU and NLG are trained in a supervised manner
▪ DM is trained in a reinforcement learning framework (NLU and NLG can be fine-tuned)
[Figure: end-to-end neural dialogue system architecture. Language understanding tags each input word w_i with BIO slot labels (e.g., B-type, O) and predicts an <intent> at EOS; dialogue management tracks user dialogue actions (e.g., Inform(location=San Francisco)) across time steps t-2, t-1, t, and the dialogue policy selects a system action (e.g., request_location); natural language generation produces the response word by word (w_0, w_1, w_2, ..., EOS). A user simulator with agenda-based user modeling and a user goal drives the interaction. Example: text input "Are there any action movies to see this weekend?" → semantic frame request_movie(genre=action, date=this weekend).]
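A small sketch of the frame-level interface between LU and DM (the class, field names, and the missing-slot policy rule are illustrative), using the example utterance from the figure:

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """Frame-level LU output passed to dialogue management."""
    intent: str
    slots: dict = field(default_factory=dict)

# LU output for: "Are there any action movies to see this weekend?"
frame = SemanticFrame(
    intent="request_movie",
    slots={"genre": "action", "date": "this weekend"},
)

# A toy policy rule: request the first missing constraint, else answer
missing = [s for s in ("genre", "date", "location") if s not in frame.slots]
action = f"request_{missing[0]}" if missing else "inform_result"
print(action)  # request_location
```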
▪ DM receives frame-level information
  ▪ No error model: assumes a perfect recognizer and LU
  ▪ Error model: simulates the possible errors
▪ Error model covers:
  ▪ Recognition errors
  ▪ LU errors
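A minimal sketch of such an error model (the error types, rates, and intent inventory are illustrative, not the paper's settings): before the frame reaches the DM, the intent may be swapped and slots may be dropped to mimic recognition and LU mistakes.

```python
import random

def corrupt_frame(frame, intent_err=0.1, slot_err=0.1,
                  intents=("request_movie", "request_ticket"), seed=None):
    """Simulate recognition/LU errors on a frame-level input.

    With probability intent_err, replace the intent with a different
    one (an LU intent error); drop each slot independently with
    probability slot_err (simulating recognition/deletion errors).
    """
    rng = random.Random(seed)
    intent, slots = frame
    if rng.random() < intent_err:
        intent = rng.choice([i for i in intents if i != intent])
    slots = {k: v for k, v in slots.items() if rng.random() >= slot_err}
    return intent, slots

clean = ("request_movie", {"genre": "action", "date": "this weekend"})
noisy = corrupt_frame(clean, intent_err=0.3, slot_err=0.3, seed=0)
print(noisy)
```

Training the dialogue policy against such corrupted frames, instead of the perfect ones, is what lets the experiments measure how LU quality affects the learned policy.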