Multimodality



Definition



Multimodality describes communication practices in terms of the textual, aural, linguistic, spatial, and visual resources - or modes - used to compose messages.



Where media are concerned, multimodality is the use of several modes (media) to create a single artifact.

Brain Signal for Understanding



Misunderstanding detection by brain signal



Figure legend: green = listening to the correct answer; red = listening to the wrong answer.

http://dl.acm.org/citation.cfm?id=2388695

Detecting misunderstanding via brain signal in order to correct the understanding results


Eye Tracking for Understanding



Better understanding using additional multimodal information

Improving understanding via non-textual signal

Video for Intent Understanding

Proactive (from camera): “I want to see a movie on TV!”

Intent: turn_on_tv

System: “Sir, may I turn on the TV for you?”

Proactively understanding user intent to initiate the dialogues.

App Behavior for Understanding



Task: user intent prediction



Challenge: language ambiguity

• User preference
  • Some people prefer “Message” to “Email”
  • Some people prefer “Outlook” to “Gmail”
• App-level contexts
  • “Message” is more likely to follow “Camera”
  • “Email” is more likely to follow “Excel”

Example: the utterance “send to vivian” expresses a Communication intent, but it could mean either Email or Message.

Considering behavioral patterns in history to model understanding for intent prediction.
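The two cues above (user preference and app-level context) can be combined in a simple scoring rule. The sketch below is only an illustration, not the model from the slides: the usage log, the app names in it, and the function `predict_app` are hypothetical, and the score is a smoothed product of how often the user picks an app and how often that app follows the previously used one.

```python
from collections import Counter

# Hypothetical usage log for one user: (previous_app, chosen_app) pairs.
HISTORY = [
    ("Camera", "Message"),
    ("Camera", "Message"),
    ("Excel", "Email"),
    ("Camera", "Message"),
    ("Excel", "Email"),
    ("Browser", "Message"),
]

# Apps that could serve the ambiguous request "send to vivian".
CANDIDATES = ["Email", "Message"]


def predict_app(previous_app, history=HISTORY, candidates=CANDIDATES, alpha=1.0):
    """Score each candidate by P(app) * P(previous_app | app), add-alpha smoothed."""
    app_counts = Counter(app for _, app in history)
    pair_counts = Counter(history)
    prev_vocab = {prev for prev, _ in history} | {previous_app}

    scores = {}
    for app in candidates:
        # User preference: how often this user picks the app overall.
        prior = (app_counts[app] + alpha) / (len(history) + alpha * len(candidates))
        # App-level context: how often the app follows the previous app.
        context = (pair_counts[(previous_app, app)] + alpha) / (
            app_counts[app] + alpha * len(prev_vocab)
        )
        scores[app] = prior * context

    total = sum(scores.values())
    return {app: score / total for app, score in scores.items()}


after_camera = predict_app("Camera")
after_excel = predict_app("Excel")
print(max(after_camera, key=after_camera.get))
print(max(after_excel, key=after_excel.get))
```

With this toy log, the same utterance resolves to a different app depending on which app was used just before, which is the behavior the slide describes.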

Evolution Roadmap

Axes: dialogue breadth (coverage) and dialogue depth (complexity)

Example utterances:

What is influenza?

I’ve got a cold, what do I do?

Tell me a joke.

I feel sad…

Evolution Roadmap

Dialogue breadth (coverage): single-domain systems → extended systems → multi-domain systems → open-domain systems


Intent Expansion

(Chen et al., 2016)



Transfer dialogue acts across domains

• Dialogue acts are similar across multiple domains
• New intents are learned using information from other domains

[Figure: a CDSSM is trained on data for the K known intents, e.g. “adjust my note” for <change_note> and “volume turn down” for <change_setting>. Embedding generation then extends the intent representation with new intents K+1, K+2, ..., such as <change_calender>, so that an utterance like “postpone my meeting to five pm” can be matched to a new intent.]

The dialogue act representations can be automatically learned for other domains.

http://ieeexplore.ieee.org/abstract/document/7472838/
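The idea of matching an utterance to an intent that has no training data can be sketched with embedding similarity. This is a minimal illustration, not the CDSSM from the cited paper: the 4-dimensional word vectors in `WORD_VECS` are hand-written, hypothetical stand-ins for learned embeddings, and the helper names `embed`, `cosine`, and `classify` are assumptions. Intent labels are embedded from their own words, so adding a label is enough to make it a candidate.

```python
import math

# Hypothetical word vectors standing in for learned CDSSM embeddings.
# Dimensions (for readability only): [modify, note, setting, calendar].
WORD_VECS = {
    "change": [1.0, 0.0, 0.0, 0.0],
    "adjust": [1.0, 0.0, 0.0, 0.0],
    "postpone": [0.8, 0.0, 0.0, 0.6],
    "note": [0.0, 1.0, 0.0, 0.0],
    "setting": [0.0, 0.0, 1.0, 0.0],
    "volume": [0.0, 0.0, 1.0, 0.0],
    "calender": [0.0, 0.0, 0.0, 1.0],  # spelling follows the slide's label
    "meeting": [0.0, 0.0, 0.0, 1.0],
}


def embed(text):
    """Sum the vectors of known words; unknown words are ignored."""
    vec = [0.0] * 4
    for word in text.lower().replace("_", " ").split():
        for i, value in enumerate(WORD_VECS.get(word, [0.0] * 4)):
            vec[i] += value
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(utterance, intents):
    """Return the intent whose label embedding is closest to the utterance."""
    u = embed(utterance)
    return max(intents, key=lambda intent: cosine(u, embed(intent)))


known = ["change_note", "change_setting"]   # the K intents with training data
expanded = known + ["change_calender"]      # intent K+1, added without new data

print(classify("postpone my meeting to five pm", expanded))
print(classify("adjust my note", expanded))
```

In a real system the embeddings would come from the trained model; the point of the sketch is only that the new intent is added by generating its embedding, not by retraining a classifier.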


Zero-Shot Learning

(Dauphin et al., 2014)



Semantic utterance classification

• Use query click logs to define a task that makes the networks learn the meaning or intent behind the queries
• The semantic features H are the last hidden layer of the DNN
• The zero-shot discriminative embedding model combines H with the minimization of the entropy of a zero-shot classifier

https://arxiv.org/abs/1401.0509
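A zero-shot classifier of this kind can be sketched as a softmax over distances between an utterance embedding and the embeddings of the class names, with the prediction entropy as the quantity to minimize. The code below is a sketch under assumptions: the embeddings are hypothetical 2-d vectors rather than DNN hidden layers, the distance-based softmax and the `scale` parameter are illustrative choices, and no training loop is shown.

```python
import math


def zero_shot_posterior(utterance_vec, class_vecs, scale=1.0):
    """Softmax over negative Euclidean distances to each class-name embedding."""
    logits = {
        name: -scale * math.dist(utterance_vec, vec)
        for name, vec in class_vecs.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def entropy(posterior):
    """Shannon entropy (in nats) of a class posterior."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0)


def mean_entropy(utterance_vecs, class_vecs, scale=1.0):
    """Average prediction entropy over utterances: the term to be minimized."""
    return sum(
        entropy(zero_shot_posterior(vec, class_vecs, scale))
        for vec in utterance_vecs
    ) / len(utterance_vecs)


# Hypothetical embeddings of two class names and two utterances.
class_vecs = {"restaurant": [1.0, 0.0], "weather": [0.0, 1.0]}
confident = [0.9, 0.1]   # close to one class name
ambiguous = [0.5, 0.5]   # equally far from both

print(zero_shot_posterior(confident, class_vecs))
print(mean_entropy([confident], class_vecs), mean_entropy([ambiguous], class_vecs))
```

Lower mean entropy corresponds to embeddings that place each utterance clearly nearer to one class name, which is why the entropy term is useful even without labeled utterances.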


Domain Adaptation for SLU

(Kim et al., 2016)



Frustratingly easy domain adaptation



Novel neural approaches to domain adaptation



Improve slot tagging on several domains

http://www.aclweb.org/anthology/C/C16/C16-1038.pdf
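For background, “frustratingly easy domain adaptation” is the name of a classic feature augmentation trick: every feature vector is copied into a shared block and into a block reserved for its own domain, with zeros in the other domains' blocks. The sketch below shows only that classic trick, not the neural variants of the cited paper; the function name `augment` and the domain names are hypothetical.

```python
def augment(features, domain, domains):
    """Return [shared copy, one block per domain], zero outside the own domain."""
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain}")
    zeros = [0.0] * len(features)
    augmented = list(features)  # shared block, seen by every domain
    for name in domains:
        # Domain-specific block: a copy for the example's own domain, zeros otherwise.
        augmented.extend(features if name == domain else zeros)
    return augmented


domains = ["restaurant", "hotel"]
print(augment([1.0, 2.0], "restaurant", domains))
print(augment([1.0, 2.0], "hotel", domains))
```

A single model trained on the augmented vectors can then learn weights that are shared across domains alongside weights that apply to one domain only.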


Policy for Domain Adaptation

(Gašić et al., 2015)



Bayesian committee machine (BCM) enables estimated Q-function to share knowledge across domains

[Figure: Q-function estimates Q_R, Q_H, and Q_L, learned from the domain data D_R, D_H, and D_L, are combined by the committee model.]

The policy from a new domain can be boosted by the committee policy

http://ieeexplore.ieee.org/abstract/document/7404871/
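The committee combination can be illustrated with the standard Bayesian committee machine rule for Gaussian estimates: precisions are added, and the prior precision is subtracted M - 1 times so that the shared prior is not counted more than once. The sketch below applies the rule to scalar Q-value estimates for one belief-action pair; the function name `bcm_combine` and the numbers are hypothetical, and a zero-mean prior is assumed.

```python
def bcm_combine(means, variances, prior_variance):
    """Combine M Gaussian estimates of one Q-value with the BCM rule.

    precision = sum_i 1/var_i - (M - 1)/prior_variance
    mean      = (1/precision) * sum_i mean_i/var_i
    """
    m = len(means)
    precision = sum(1.0 / v for v in variances) - (m - 1) / prior_variance
    if precision <= 0:
        raise ValueError("member variances must not exceed the prior variance")
    variance = 1.0 / precision
    mean = variance * sum(mu / v for mu, v in zip(means, variances))
    return mean, variance


# Hypothetical estimates of one Q(b, a) from three domain-specific members.
mean, variance = bcm_combine(
    means=[1.0, 3.0, 2.0],
    variances=[1.0, 1.0, 2.0],
    prior_variance=4.0,
)
print(mean, variance)
```

Members with low variance dominate the combined estimate, so a policy in a new domain with little data can lean on the more confident estimates from other domains.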


Multi-Domain Dialogue System



Hierarchical reinforcement learning for DM



Meta-controller: select the goal/domain



Controller: select the action

https://arxiv.org/pdf/1704.03084.pdf
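The two-level decision in hierarchical dialogue management can be sketched as two lookups: the meta-controller chooses a goal (domain), and the controller chooses an action conditioned on that goal. The Q-tables, state names, domains, and actions below are hypothetical placeholders; in the cited work both levels are learned with reinforcement learning, which this sketch does not implement.

```python
# Hypothetical, hand-filled Q-tables standing in for learned value functions.
META_Q = {  # state -> {goal/domain: value}
    "start": {"flight": 0.9, "hotel": 0.4},
    "flight_done": {"flight": 0.0, "hotel": 0.8},
}
CONTROLLER_Q = {  # (goal, state) -> {action: value}
    ("flight", "start"): {"request_date": 0.7, "inform_price": 0.2},
    ("hotel", "flight_done"): {"request_city": 0.6, "request_date": 0.3},
}


def greedy(q_values):
    """Pick the key with the highest estimated value."""
    return max(q_values, key=q_values.get)


def act(state):
    """Meta-controller selects the goal/domain, then the controller selects the action."""
    goal = greedy(META_Q[state])
    action = greedy(CONTROLLER_Q[(goal, state)])
    return goal, action


print(act("start"))
print(act("flight_done"))
```

Splitting the decision this way lets the top level reason about which domain to handle next while each lower-level policy only has to handle actions within its own domain.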

Evolution Roadmap

Dialogue depth (complexity): knowledge-based systems → common sense systems → empathetic systems


High-Level Intention for Dialogue Planning

(Sun et al., 2016; Sun et al., 2016)



High-level intention may span several domains

User: Schedule a lunch with Vivian.

Candidate domains: find restaurant, check location, contact, play music

System: What kind of restaurants do you prefer?

System: The distance is …

System: Should I send the restaurant information to Vivian?

Users can interact via high-level descriptions and the system learns how to plan the dialogues

http://dl.acm.org/citation.cfm?id=2856818; http://www.lrec-conf.org/proceedings/lrec2016/pdf/75_Paper.pdf


Empathy in Dialogue System

(Fung et al., 2016)



Embed an empathy module



Recognize emotion using multimodality



Generate emotion-aware responses

[Figure: an emotion recognizer that takes vision, speech, and text as input.]

https://arxiv.org/abs/1605.04072


Concluding Remarks



Multimodal signals

• Detect misunderstanding
• Benefit understanding

Dialogue breadth

• Single domain → Extended domain → Multi-domain → Open domain

Dialogue depth

• Knowledge-based system → Common sense system → Empathetic system