Academic year: 2022

Multimodality

Definition

Multimodality describes communication practices in terms of the textual, aural, linguistic, spatial, and visual resources (or modes) used to compose messages.

Where media are concerned, multimodality is the use of several modes (media) to create a single artifact.

Brain Signal for Understanding

Misunderstanding detection by brain signal

Green: listen to the correct answer

Red: listen to the wrong answer

http://dl.acm.org/citation.cfm?id=2388695

Detecting misunderstanding via brain signal in order to correct the understanding results

Eye Tracking for Understanding

Better understanding using additional multimodal information

Improving understanding via non-textual signal

Video for Intent Understanding

Proactive (from camera)

User: “I want to see a movie on TV!”

Intent: turn_on_tv

System: “Sir, may I turn on the TV for you?”

Proactively understanding user intent to initiate the dialogues.

App Behavior for Understanding

Task: user intent prediction

Challenge: language ambiguity

User preference

Some people prefer “Message” to “Email”

Some people prefer “Outlook” to “Gmail”

App-level contexts

“Message” is more likely to follow “Camera”

“Email” is more likely to follow “Excel”

Example: “send to vivian” could mean Email or Message (both Communication apps)

Considering behavioral patterns in history to model understanding for intent prediction.
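The two cues above (user preference and app-level context) can be combined in a simple count-based scorer. This is a minimal sketch, not the method behind the slide: the `(previous_app, chosen_app)` log format, the function names, and the add-alpha smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(history):
    """Count user preference and app-to-app transitions.

    history: list of (previous_app, chosen_app) pairs (hypothetical log format).
    """
    preference = Counter(chosen for _, chosen in history)
    follows = defaultdict(Counter)
    for previous, chosen in history:
        follows[previous][chosen] += 1
    return preference, follows


def predict(preference, follows, previous_app, candidates, alpha=1.0):
    """Rank candidates by P(app) * P(app | previous_app), add-alpha smoothed."""
    context = follows.get(previous_app, Counter())
    pref_total = sum(preference[c] for c in candidates)
    ctx_total = sum(context[c] for c in candidates)
    n = len(candidates)
    scores = {}
    for c in candidates:
        p_pref = (preference[c] + alpha) / (pref_total + alpha * n)
        p_ctx = (context[c] + alpha) / (ctx_total + alpha * n)
        scores[c] = p_pref * p_ctx
    return max(scores, key=scores.get), scores


# Toy history: this user messages after Camera and emails after Excel.
history = [("Camera", "Message"), ("Excel", "Email")] * 3
preference, follows = train(history)
after_camera, _ = predict(preference, follows, "Camera", ["Email", "Message"])
after_excel, _ = predict(preference, follows, "Excel", ["Email", "Message"])
print(after_camera, after_excel)
```

With equal overall preference for both apps, the previous app decides: “send to vivian” resolves to Message after Camera and to Email after Excel.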

Evolution Roadmap

Dialogue breadth (coverage)

Dialogue depth (complexity)

What is influenza?

I’ve got a cold. What do I do?

Tell me a joke.

I feel sad…

Evolution Roadmap

Single domain systems

Extended systems

Multi-domain systems

Open domain systems

Dialogue breadth (coverage)

Dialogue depth (complexity)

What is influenza?

I’ve got a cold. What do I do?

Tell me a joke.

I feel sad…

Intent Expansion

(Chen et al., 2016)

Transfer dialogue acts across domains

Dialogue acts are similar for multiple domains

Learning new intents from information in other domains

[Figure: a CDSSM is trained on utterance/intent pairs (e.g. “adjust my note” with <change_note>, “volume turn down” with <change_setting>) and generates embeddings for intent representations 1 to K; a new intent such as <change_calender> is added as K+1, K+2, and so on.]

The dialogue act representations can be automatically learned for other domains

http://ieeexplore.ieee.org/abstract/document/7472838/

Example utterance: “postpone my meeting to five pm”
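Once utterances and intents share an embedding space, adding an intent amounts to adding one more vector. The sketch below shows only that mechanism: the 3-dimensional vectors are made-up stand-ins for CDSSM outputs, and `classify` is a hypothetical helper, not code from Chen et al.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(utterance_vec, intent_vecs):
    """Return the intent whose embedding is closest to the utterance."""
    return max(intent_vecs, key=lambda name: cosine(utterance_vec, intent_vecs[name]))


# Made-up embeddings for the K intents seen in training (assumption).
intent_vecs = {
    "change_note": [0.9, 0.1, 0.0],
    "change_setting": [0.1, 0.9, 0.0],
}

# Stand-in embedding for "postpone my meeting to five pm" (assumption).
utterance = [0.2, 0.1, 0.9]

before = classify(utterance, intent_vecs)

# Intent K+1 is added by embedding it; the classifier is not retrained.
intent_vecs["change_calender"] = [0.1, 0.0, 0.9]
after = classify(utterance, intent_vecs)

print(before, after)
```

Before the expansion the utterance is forced onto the nearest known intent; afterwards it maps to the new one.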

Zero-Shot Learning

(Dauphin et al., 2014)

Semantic utterance classification

Use query click logs to define a task that makes the networks learn the meaning or intent behind the queries

The semantic features H are the last hidden layer of the DNN

A zero-shot discriminative embedding model combines H with minimization of the entropy of a zero-shot classifier

https://arxiv.org/abs/1401.0509
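The entropy term can be made concrete with a small sketch. It assumes the zero-shot classifier is a softmax over negative distances between the utterance embedding and each class embedding; that form, the scale `lam`, and the toy 2-dimensional embeddings are assumptions, not details given on the slide.

```python
import math


def zero_shot_probs(h_x, class_embs, lam=1.0):
    """Softmax over negative Euclidean distances to each class embedding."""
    weights = [math.exp(-lam * math.dist(h_x, h_c)) for h_c in class_embs]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(probs):
    """Shannon entropy in nats; low values mean a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Two toy class embeddings (assumption).
class_embs = [[0.0, 0.0], [1.0, 0.0]]

ambiguous = zero_shot_probs([0.5, 0.0], class_embs, lam=5.0)
confident = zero_shot_probs([0.0, 0.0], class_embs, lam=5.0)

print(entropy(ambiguous), entropy(confident))
```

An utterance embedded halfway between two classes has maximal entropy, so minimizing entropy on unlabeled utterances pushes embeddings toward a single class.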

Domain Adaptation for SLU

(Kim et al., 2016)

Frustratingly easy domain adaptation

Novel neural approaches to domain adaptation

Improve slot tagging on several domains

http://www.aclweb.org/anthology/C/C16/C16-1038.pdf
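The phrase “frustratingly easy” comes from Daumé III's (2007) feature-augmentation trick, sketched below for sparse slot-tagging features. This is the classic non-neural version, shown for intuition only; it is not the neural architecture of Kim et al., and the feature names are hypothetical.

```python
def augment(features, domain):
    """Duplicate each feature into a shared copy and a domain-specific copy.

    features: dict mapping feature name to value (hypothetical sparse format).
    """
    augmented = {f"shared:{name}": value for name, value in features.items()}
    augmented.update({f"{domain}:{name}": value for name, value in features.items()})
    return augmented


token_features = {"word=pm": 1.0, "prev=five": 1.0}
calendar_view = augment(token_features, "calendar")
alarm_view = augment(token_features, "alarm")

print(sorted(calendar_view))
print(sorted(alarm_view))
```

A single tagger trained on the augmented features can put weight on the shared copies for behaviour common to all domains and on the domain copies for domain-specific behaviour.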

Policy for Domain Adaptation

(Gašić et al., 2015)

Bayesian committee machine (BCM) enables estimated Q-function to share knowledge across domains

[Figure: committee model combining the Q-functions QR, QH, QL estimated from domain data DR, DH, DL]

The policy from a new domain can be boosted by the committee policy

http://ieeexplore.ieee.org/abstract/document/7404871/
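A Bayesian committee machine merges Gaussian estimates by adding precisions and subtracting the prior precision that would otherwise be counted more than once. The sketch applies the standard BCM rule to scalar Q-value estimates for a single state-action pair; the zero-mean prior and the numbers are assumptions, and the Gaussian-process details of Gašić et al. are omitted.

```python
def bcm_combine(means, variances, prior_var):
    """Combine M Gaussian estimates with the Bayesian committee machine rule.

    precision = sum(1 / var_i) - (M - 1) / prior_var
    mean      = (1 / precision) * sum(mean_i / var_i)
    Assumes a zero-mean prior with variance prior_var.
    """
    m = len(means)
    precision = sum(1.0 / v for v in variances) - (m - 1) / prior_var
    if precision <= 0:
        raise ValueError("member variances must not exceed the prior variance")
    variance = 1.0 / precision
    mean = variance * sum(mu / v for mu, v in zip(means, variances))
    return mean, variance


# Q estimates from a well-trained domain (low variance) and a new domain
# with little data (high variance); the numbers are illustrative only.
q_mean, q_var = bcm_combine(means=[0.0, 10.0], variances=[0.1, 10.0], prior_var=100.0)
print(q_mean, q_var)
```

The committee estimate stays close to the confident member, which is how a data-poor domain can lean on knowledge from data-rich ones.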

Multi-Domain Dialogue System

Hierarchical reinforcement learning for DM

Meta-controller: select the goal/domain

Controller: select the action

https://arxiv.org/pdf/1704.03084.pdf

[Figure: meta-controller and controller]
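The two-level decision can be sketched with lookup tables in place of the deep networks. Everything here is illustrative: the domains, actions, and Q-values are invented, and the sketch picks a goal on every turn, whereas a full hierarchical learner keeps a goal until its subtask ends.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random key with probability epsilon, else the highest-valued key."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))
    return max(q_values, key=q_values.get)


class HierarchicalPolicy:
    """Meta-controller selects the goal/domain; controller selects the action."""

    def __init__(self, meta_q, controller_q, epsilon=0.0, seed=0):
        self.meta_q = meta_q                # state -> {goal: value}
        self.controller_q = controller_q    # (state, goal) -> {action: value}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        goal = epsilon_greedy(self.meta_q[state], self.epsilon, self.rng)
        action = epsilon_greedy(self.controller_q[(state, goal)], self.epsilon, self.rng)
        return goal, action


# Hypothetical Q-tables for one dialogue state.
meta_q = {"start": {"flight": 0.8, "hotel": 0.3}}
controller_q = {
    ("start", "flight"): {"request_date": 0.6, "request_city": 0.9},
    ("start", "hotel"): {"request_checkin": 0.7, "request_city": 0.4},
}

policy = HierarchicalPolicy(meta_q, controller_q)
goal, action = policy.act("start")
print(goal, action)
```

Conditioning the controller on the chosen goal lets the same state lead to different actions depending on which domain is being served.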

Evolution Roadmap

Knowledge-based systems

Common sense systems

Empathetic systems

Dialogue breadth (coverage)

Dialogue depth (complexity)

What is influenza?

I’ve got a cold. What do I do?

Tell me a joke.

I feel sad…

High-Level Intention for Dialogue Planning

(Sun et al., 2016a; Sun et al., 2016b)

High-level intention may span several domains

User: “Schedule a lunch with Vivian.”

Domains involved: find restaurant, check location, contact, play music

System: “What kind of restaurants do you prefer?”

System: “The distance is …”

System: “Should I send the restaurant information to Vivian?”

Users can interact via high-level descriptions and the system learns how to plan the dialogues

http://dl.acm.org/citation.cfm?id=2856818; http://www.lrec-conf.org/proceedings/lrec2016/pdf/75_Paper.pdf

Empathy in Dialogue System

(Fung et al., 2016)

Embed an empathy module

Recognize emotion using multimodality

Generate emotion-aware responses

[Figure: emotion recognizer with vision, speech, and text inputs]

https://arxiv.org/abs/1605.04072
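One simple way to combine the three modalities is late fusion: each modality produces its own emotion distribution and the distributions are averaged. The sketch assumes this scheme for illustration; it is not claimed to be the recognizer of Fung et al., and the scores are made up.

```python
def fuse_emotions(scores_by_modality, weights=None):
    """Weighted average of per-modality emotion distributions (late fusion)."""
    modalities = list(scores_by_modality)
    if weights is None:
        weights = {m: 1.0 for m in modalities}
    total_weight = sum(weights[m] for m in modalities)
    fused = {}
    for m in modalities:
        for emotion, p in scores_by_modality[m].items():
            fused[emotion] = fused.get(emotion, 0.0) + weights[m] * p / total_weight
    return max(fused, key=fused.get), fused


# Made-up outputs of three hypothetical per-modality classifiers.
scores = {
    "vision": {"sad": 0.6, "happy": 0.4},
    "speech": {"sad": 0.7, "happy": 0.3},
    "text": {"sad": 0.5, "happy": 0.5},
}

label, fused = fuse_emotions(scores)
print(label, fused)
```

Here the text alone is uninformative, but vision and speech tip the fused decision, which an emotion-aware response generator could then condition on.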

Concluding Remarks

Multimodal signals

Detect misunderstanding

Benefit understanding

Dialogue breadth

Single domain → Extended domain → Multi-domain → Open domain

Dialogue depth

Knowledge-based system → Common sense system → Empathetic system
