Multimodality
Definition
Multimodality describes communication practices in terms of the textual, aural, linguistic, spatial, and visual resources (or modes) used to compose messages.
Where media are concerned, multimodality is the use of several modes (media) to create a single artifact.
Brain Signal for Understanding
Misunderstanding detection by brain signal
Figure: brain signals while listening to the correct answer (green) vs. the wrong answer (red)
http://dl.acm.org/citation.cfm?id=2388695
Detecting misunderstanding via brain signals in order to correct the understanding results
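One way to read "correct the understanding results" is as n-best fallback: when a detector flags a likely misunderstanding, the system rejects its top hypothesis. A minimal sketch, assuming a hypothetical detector that already turns brain-signal features into a misunderstanding probability; the detector, the 0.5 threshold, and the fallback rule are illustrative assumptions, not the cited method:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    intent: str
    score: float  # SLU confidence for this hypothesis


def correct_understanding(nbest: List[Hypothesis],
                          misunderstanding_prob: float,
                          threshold: float = 0.5) -> Optional[Hypothesis]:
    """Pick the hypothesis to act on.

    misunderstanding_prob is assumed to come from a hypothetical classifier over
    brain-signal features recorded while the user hears the system's answer.
    """
    ranked = sorted(nbest, key=lambda h: h.score, reverse=True)
    if not ranked:
        return None
    if misunderstanding_prob < threshold:
        return ranked[0]  # no misunderstanding detected: keep the top hypothesis
    # Misunderstanding detected: reject the top hypothesis and fall back to the next one
    return ranked[1] if len(ranked) > 1 else None
```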
Eye Tracking for Understanding
Better understanding using additional multimodal information
Improving understanding via non-textual signal
Video for Intent Understanding
Proactive (from camera)
"I want to see a movie on TV!"
Intent: turn_on_tv
Sir, may I turn on the TV for you?
Proactively understanding user intent to initiate dialogues.
App Behavior for Understanding
Task: user intent prediction
Challenge: language ambiguity
User preference
Some people prefer “Message” to “Email”
Some people prefer “Outlook” to “Gmail”
App-level contexts
“Message” is more likely to follow “Camera”
“Email” is more likely to follow “Excel”
Example: "send to vivian" is ambiguous between two Communication apps: Email? Message?
Considering behavioral patterns in the usage history to model understanding for intent prediction.
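The two cues on this slide, user preference and app-level context, can be combined in a simple count-based model. A minimal sketch, assuming the user's history is a hypothetical list of (previous_app, chosen_app) pairs and that the candidate apps for an ambiguous request such as "send to vivian" are already known; the add-one smoothing and the product of two factors are illustrative choices, not the cited model:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_apps(history: Iterable[Tuple[str, str]],
              candidates: List[str],
              previous_app: str) -> List[Tuple[str, float]]:
    """Rank candidate apps for an ambiguous request.

    history: hypothetical log of (previous_app, chosen_app) pairs for one user.
    Score = P(app | user preference) * P(app | previous_app), both add-one
    smoothed over the candidate set, then renormalized.
    """
    history = list(history)
    preference = Counter(app for _, app in history)             # user preference
    transition = Counter((prev, app) for prev, app in history)  # app-level context
    k = len(candidates)
    pref_total = sum(preference[a] for a in candidates)
    ctx_total = sum(transition[(previous_app, a)] for a in candidates)

    scores = {}
    for app in candidates:
        p_pref = (preference[app] + 1) / (pref_total + k)
        p_ctx = (transition[(previous_app, app)] + 1) / (ctx_total + k)
        scores[app] = p_pref * p_ctx
    total = sum(scores.values())
    return sorted(((a, s / total) for a, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)
```

With a history in which "Message" often follows "Camera" and "Email" often follows "Excel", the same request "send to vivian" is resolved differently depending on the app the user was just using.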
Evolution Roadmap
Dialogue breadth (coverage)
Dialogue depth (complexity)
What is influenza?
I’ve got a cold, what do I do?
Tell me a joke.
I feel sad…
Evolution Roadmap
Dialogue breadth (coverage): single domain systems → extended systems → multi-domain systems → open domain systems
Dialogue depth (complexity)
Intent Expansion
(Chen et al., 2016)
Transfer dialogue acts across domains
Dialogue acts are similar for multiple domains
Learning new intents using information from other domains
Figure: CDSSM embedding generation for new intents. Intent representations 1…K are learned from training data such as <change_note> "adjust my note" and <change_setting> "volume turn down"; new intents K+1, K+2 (e.g., <change_calender>) are embedded in the same space, illustrated with the utterance "postpone my meeting to five pm".
The dialogue act representations can be automatically learned for other domains
http://ieeexplore.ieee.org/abstract/document/7472838/
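The key idea is that intents live in the same embedding space as utterances, so adding intent K+1 only requires computing its representation, not retraining a classifier over a fixed label set. A minimal sketch, assuming a hypothetical `embed` function; here a plain bag-of-words vector stands in for the trained CDSSM encoder, and the intent descriptions reuse the slide's example utterances:

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a CDSSM encoder: lowercase bag of words."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


class IntentSpace:
    def __init__(self) -> None:
        self.intents: Dict[str, Counter] = {}

    def add_intent(self, name: str, description: str) -> None:
        # Adding a new intent only stores its embedding; existing intents are untouched
        self.intents[name] = embed(description)

    def classify(self, utterance: str) -> str:
        # Nearest intent by cosine similarity in the shared space
        u = embed(utterance)
        return max(self.intents, key=lambda name: cosine(u, self.intents[name]))
```

Usage: after `add_intent("<change_calender>", "postpone my meeting")`, the utterance "postpone my meeting to five pm" is routed to the new intent while "adjust my note" still maps to `<change_note>`.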
Zero-Shot Learning
(Dauphin et al., 2014)
Semantic utterance classification
Use query click logs to define a task that makes the networks learn the meaning or intent behind the queries
The semantic features H are the last hidden layer of the DNN
The zero-shot discriminative embedding (ZDE) model combines H with the minimization of the entropy of a zero-shot classifier
https://arxiv.org/abs/1401.0509
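In the zero-shot setting, a class is represented by embedding its name with the same network H that embeds utterances, and the utterance is assigned to the nearest class; ZDE additionally prefers representations under which this classifier is confident, i.e. has low entropy. A minimal sketch of the classifier and its entropy, assuming precomputed embeddings and the form P(c | x) ∝ exp(-||H(x) - H(c)||); the Euclidean distance and the toy vectors are assumptions for illustration:

```python
import math
from typing import Dict, List

Vector = List[float]


def zero_shot_probs(h_x: Vector, class_embeddings: Dict[str, Vector]) -> Dict[str, float]:
    """P(c | x) proportional to exp(-||H(x) - H(c)||): softmax over negative distances."""
    logits = {c: -math.dist(h_x, h_c) for c, h_c in class_embeddings.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exp = {c: math.exp(l - m) for c, l in logits.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def entropy(probs: Dict[str, float]) -> float:
    """Entropy of the zero-shot classifier's output; ZDE adds a term like this to its loss."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)
```

An utterance embedded halfway between two class embeddings yields maximal entropy (log 2 for two classes), which is exactly the kind of ambiguous representation the entropy term pushes the network away from.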
Domain Adaptation for SLU
(Kim et al., 2016)
Frustratingly easy domain adaptation
Novel neural approaches to domain adaptation
Improve slot tagging on several domains
http://www.aclweb.org/anthology/C/C16/C16-1038.pdf
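"Frustratingly easy domain adaptation" refers to Daumé III's (2007) feature augmentation, which Kim et al. revisit with neural models: every feature is copied into a shared block and a domain-specific block, so a linear model can learn both what transfers across domains and what does not. A minimal sketch of the classic augmentation only (not the neural variants in the paper), using hypothetical slot-tagging feature names:

```python
from typing import Dict, List

Features = Dict[str, float]


def augment(features: Features, domain: str, domains: List[str]) -> Features:
    """Map x to <shared copy, copy for its own domain, zeros for the other domains>.

    With sparse dict features, the all-zero blocks for other domains are simply absent.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain}")
    out: Features = {}
    for name, value in features.items():
        out[f"shared:{name}"] = value    # block shared by all domains
        out[f"{domain}:{name}"] = value  # block seen only by this domain
    return out
```

A single linear tagger trained on augmented features from all domains shares evidence through the `shared:` weights while the domain-prefixed weights absorb domain-specific behavior.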
Policy for Domain Adaptation
(Gašić et al., 2015)
Bayesian committee machine (BCM) enables the estimated Q-functions to share knowledge across domains
Figure: committee model combining the Q-functions Q_R, Q_H, Q_L estimated from domain data D_R, D_H, D_L
The policy for a new domain can be boosted by the committee policy
http://ieeexplore.ieee.org/abstract/document/7404871/
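The BCM merges Gaussian predictions from committee members trained on different data. A minimal sketch of the standard BCM combination rule (Tresp, 2000) for one belief-action pair, assuming each domain's Gaussian-process Q-function returns a posterior mean and variance and that all members share a zero-mean prior with variance `prior_var`; this shows the generic rule, not the full committee policy of the cited paper:

```python
from typing import List, Tuple


def bcm_combine(means: List[float], variances: List[float],
                prior_var: float) -> Tuple[float, float]:
    """Combine M Gaussian estimates of Q(b, a) with the BCM rule.

    precision = sum_i 1/var_i - (M - 1)/prior_var
    mean      = (1/precision) * sum_i mean_i/var_i
    The (M - 1) prior term removes the shared zero-mean prior that every
    member would otherwise count once.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance per mean")
    m = len(means)
    precision = sum(1.0 / v for v in variances) - (m - 1) / prior_var
    if precision <= 0:
        raise ValueError("combined precision must be positive; "
                         "member variances should not exceed prior_var")
    var = 1.0 / precision
    mean = var * sum(mu / v for mu, v in zip(means, variances))
    return mean, var
```

A member whose variance equals the prior variance (a domain with no relevant data) contributes nothing, so the committee estimate follows the confident members.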
Multi-Domain Dialogue System
Hierarchical reinforcement learning for DM
Meta-controller: select the goal/domain
Controller: select the action
https://arxiv.org/pdf/1704.03084.pdf
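A minimal tabular sketch of the two-level control loop, assuming dialogue states are hashable keys and that the controller receives an intrinsic reward when it completes the sub-goal (domain) chosen by the meta-controller. The cited work uses deep Q-networks; the tabular Q-tables and the simplified meta-level update (a fixed discount rather than one that depends on how many turns the sub-goal took) are assumptions:

```python
import random
from collections import defaultdict
from typing import Hashable, Iterable


class HierarchicalQAgent:
    """Meta-controller picks a goal/domain; controller picks primitive dialogue actions."""

    def __init__(self, goals: Iterable[str], actions: Iterable[str],
                 alpha: float = 0.5, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.goals, self.actions = list(goals), list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.meta_q = defaultdict(float)  # key (state, goal): meta-controller values
        self.ctrl_q = defaultdict(float)  # key ((state, goal), action): controller values
        self.rng = random.Random(seed)

    def _epsilon_greedy(self, q, key, options):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=lambda o: q[(key, o)])

    def select_goal(self, state: Hashable) -> str:
        return self._epsilon_greedy(self.meta_q, state, self.goals)

    def select_action(self, state: Hashable, goal: str) -> str:
        return self._epsilon_greedy(self.ctrl_q, (state, goal), self.actions)

    def update_controller(self, state, goal, action, intrinsic_r, next_state, goal_done):
        # Q-learning on intrinsic reward; the goal is part of the controller's state
        best_next = 0.0 if goal_done else max(
            self.ctrl_q[((next_state, goal), a)] for a in self.actions)
        key = ((state, goal), action)
        self.ctrl_q[key] += self.alpha * (intrinsic_r + self.gamma * best_next - self.ctrl_q[key])

    def update_meta(self, start_state, goal, extrinsic_return, end_state, dialogue_done):
        # Q-learning over goals, using the extrinsic reward collected while pursuing the goal
        best_next = 0.0 if dialogue_done else max(
            self.meta_q[(end_state, g)] for g in self.goals)
        key = (start_state, goal)
        self.meta_q[key] += self.alpha * (extrinsic_return + self.gamma * best_next - self.meta_q[key])
```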
Evolution Roadmap
Dialogue depth (complexity): knowledge-based systems → common sense systems → empathetic systems
Dialogue breadth (coverage)
High-Level Intention for Dialogue Planning
(Sun et al., 2016a; Sun et al., 2016b)
High-level intention may span several domains
Schedule a lunch with Vivian.
Sub-tasks: find restaurant, check location, contact, play music
System turns: "What kind of restaurants do you prefer?" / "The distance is …" / "Should I send the restaurant information to Vivian?"
Users can interact via high-level descriptions and the system learns how to plan the dialogues
http://dl.acm.org/citation.cfm?id=2856818; http://www.lrec-conf.org/proceedings/lrec2016/pdf/75_Paper.pdf
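One simple way a system could learn such a plan is from past multi-app sessions that served the same high-level intention. A minimal sketch, assuming sessions are already grouped by intention; the support threshold and the ordering by average relative position are illustrative assumptions, not the cited method:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def learn_plan(sessions: Sequence[Sequence[str]], min_support: float = 0.5) -> List[str]:
    """Derive an ordered app plan for one high-level intention.

    sessions: hypothetical past app sequences the user went through for this intention.
    Keeps apps used in at least `min_support` of the sessions and orders them by
    their average relative position (0 = first step, 1 = last step).
    """
    if not sessions:
        return []
    positions: Dict[str, List[float]] = defaultdict(list)
    for seq in sessions:
        last = max(len(seq) - 1, 1)
        seen = set()
        for i, app in enumerate(seq):
            if app in seen:  # count only the first use of an app per session
                continue
            seen.add(app)
            positions[app].append(i / last)
    kept = {app: sum(p) / len(p) for app, p in positions.items()
            if len(p) / len(sessions) >= min_support}
    return sorted(kept, key=kept.get)
```

For "Schedule a lunch with Vivian.", the learned plan would drive the system through the kept apps in order, asking each app's questions in turn, while rarely used steps such as playing music drop out.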
Empathy in Dialogue System
(Fung et al., 2016)
Embed an empathy module
Recognize emotion using multimodality
Generate emotion-aware responses
Figure: the emotion recognizer takes vision, speech, and text input
https://arxiv.org/abs/1605.04072
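A minimal late-fusion sketch of such a module, assuming each modality's recognizer already outputs a probability distribution over a small emotion set; the label set, fusion weights, and response templates are hypothetical and not taken from the cited system:

```python
from typing import Dict, Tuple

EMOTIONS = ("happy", "sad", "angry", "neutral")  # hypothetical label set

# Hypothetical emotion-aware response templates
RESPONSES = {
    "happy": "That's great to hear!",
    "sad": "I'm sorry to hear that. Do you want to talk about it?",
    "angry": "I understand this is frustrating. Let me try to help.",
    "neutral": "I see. Tell me more.",
}


def fuse(modality_probs: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average of per-modality emotion distributions (late fusion)."""
    scores = {e: 0.0 for e in EMOTIONS}
    total_weight = 0.0
    for modality, probs in modality_probs.items():
        w = weights.get(modality, 0.0)
        total_weight += w
        for e in EMOTIONS:
            scores[e] += w * probs.get(e, 0.0)
    if total_weight == 0.0:
        raise ValueError("at least one modality needs a positive weight")
    return {e: s / total_weight for e, s in scores.items()}


def respond(modality_probs: Dict[str, Dict[str, float]],
            weights: Dict[str, float]) -> Tuple[str, str]:
    """Recognize the dominant emotion, then pick an emotion-aware response."""
    fused = fuse(modality_probs, weights)
    emotion = max(fused, key=fused.get)
    return emotion, RESPONSES[emotion]
```

Late fusion keeps each recognizer independent; an early-fusion design that concatenates vision, speech, and text features into one classifier is an equally plausible alternative.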
Concluding Remarks
Multimodal signals
Detect misunderstanding
Benefit understanding
Dialogue breadth
Single domain → extended domain → multi-domain → open domain
Dialogue depth
Knowledge-based system → common sense system → empathetic system