(1)

End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding

Yun-Nung (Vivian) Chen, Dilek Hakkani-Tür, Gökhan Tür, Jianfeng Gao, Li Deng

(2)

Outline

Introduction
• Spoken Dialogue System
• Spoken/Natural Language Understanding (SLU/NLU)

Contextual Spoken Language Understanding
• Model Architecture
• End-to-End Training

Experiments

Conclusion & Future Work


(3)

Outline


(4)

Dialogue System Pipeline

[Figure: dialogue system pipeline]
Speech Signal → ASR → Hypothesis: "are there any action movies to see this weekend"
→ Language Understanding (LU): user intent detection, slot filling
  → Semantic Frame (intents, slots): request_movie(genre=action, date=this weekend)
→ Dialogue Management (DM): dialogue state tracking, policy decision
  → System Action: request_location
→ Output Generation → Text response: "Where are you located?" / Screen display: location?
(Text input "Are there any action movies to see this weekend?" can also enter the pipeline directly at LU.)
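To make the data flow of this pipeline concrete, here is a minimal, self-contained Python sketch. All function bodies are hypothetical stubs that simply reproduce the slide's running example; they are not the components used in the talk.

```python
# Hypothetical sketch of one turn through the pipeline on this slide.

def asr(speech_signal):
    # speech recognition -> hypothesis
    return "are there any action movies to see this weekend"

def understand(hypothesis):
    # LU: intent detection + slot filling -> semantic frame
    return {"intent": "request_movie",
            "slots": {"genre": "action", "date": "this weekend"}}

def decide(state):
    # DM policy: ask for the missing "location" slot (action names are illustrative)
    return "request_location" if "location" not in state["slots"] else "inform_movies"

def generate(system_action):
    # output generation: system action -> text response
    return {"request_location": "Where are you located?"}.get(system_action, "...")

def dialogue_turn(speech_signal, state):
    frame = understand(asr(speech_signal))   # ASR + LU
    state["slots"].update(frame["slots"])    # dialogue state tracking
    return generate(decide(state))           # policy decision + generation

print(dialogue_turn(b"<speech>", {"slots": {}}))  # -> "Where are you located?"
```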

(5)

LU Importance

[Figure: learning curve of system performance — success rate (0–1) vs. simulation epoch (0–495) for an upper bound, a DQN (RL) agent, and a rule-based agent, each with 0% or 5% simulated LU errors; this slide highlights the RL and rule agents without LU errors.]

(6)

LU Importance

[Figure: the same learning curve, now also showing the RL and rule agents with 5% simulated LU errors; the annotation marks a >5% drop in success rate.]

The system performance is sensitive to LU errors for both rule-based and reinforcement learning agents.

(7)

Dialogue System Pipeline

SLU usually focuses on understanding single-turn utterances. The understanding result is usually influenced by 1) local observations and 2) global knowledge.

[Figure: the dialogue system pipeline from the earlier slide, with LU marked as the current bottleneck; LU errors propagate to the downstream dialogue management and generation components.]

(8)

Spoken Language Understanding

Domain Identification → Intent Prediction → Slot Filling

Single-turn example:
U: just sent email to bob about fishing this weekend
S: O O O O B-contact_name O B-subject I-subject I-subject
D: communication   I: send_email
→ send_email(contact_name="bob", subject="fishing this weekend")

Multi-turn example:
U1: are we going to fish this weekend
S1: B-message I-message I-message I-message I-message I-message I-message
→ send_email(message="are we going to fish this weekend")
U2: send email to bob
S2: O O O B-contact_name
→ send_email(contact_name="bob")
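To make the slot-filling step above concrete, here is a minimal sketch (not from the talk) that converts a BIO-tagged token sequence into the slot/value pairs of a semantic frame:

```python
# Hedged sketch: BIO slot tags -> semantic-frame slots.

def tags_to_slots(tokens, tags):
    slots, name, value = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # start of a new slot value
            if name:
                slots[name] = " ".join(value)
            name, value = tag[2:], [token]
        elif tag.startswith("I-") and name:      # continuation of the current slot
            value.append(token)
        else:                                    # "O": token outside any slot
            if name:
                slots[name] = " ".join(value)
            name, value = None, []
    if name:
        slots[name] = " ".join(value)
    return slots

tokens = "just sent email to bob about fishing this weekend".split()
tags = ["O", "O", "O", "O", "B-contact_name", "O", "B-subject", "I-subject", "I-subject"]
print(tags_to_slots(tokens, tags))
# {'contact_name': 'bob', 'subject': 'fishing this weekend'}
```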

(9)

Outline


(10)

MODEL ARCHITECTURE

[Figure: model architecture. History utterances {x_i} are embedded by a contextual sentence encoder (RNN_mem) into memory representations m_i; the current utterance c is embedded by a sentence encoder (RNN_in) into a vector u. The inner product of u with each m_i, followed by a softmax, gives the knowledge attention distribution p_i. The weighted sum h = Σ_i p_i m_i is mapped through W_kg to the knowledge encoding representation o, which is fed (through weights M) into an RNN tagger that emits the slot tagging sequence y for the current utterance.]

Idea: additionally incorporate contextual knowledge during slot tagging

1. Sentence Encoding  2. Knowledge Attention  3. Knowledge Encoding

Chen et al., "End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Interspeech, 2016.
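A minimal NumPy sketch of steps 2–3 (knowledge attention and knowledge encoding), assuming the sentence encoders have already produced the memory vectors m_i and the current-utterance vector u; the dimensions and the exact parameterization of W_kg are illustrative, not the paper's equations.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def knowledge_encoding(memories, u, W_kg):
    """memories: (n_history, d) history encodings m_i; u: (d,) current-utterance encoding."""
    p = softmax(memories @ u)   # knowledge attention distribution p_i (inner product + softmax)
    h = p @ memories            # weighted sum of memory representations
    o = W_kg @ h                # knowledge encoding representation o, later fed to the RNN tagger
    return o, p

rng = np.random.default_rng(0)
m = rng.normal(size=(3, 8))     # three history utterances, encoding dim 8
u = rng.normal(size=8)
o, p = knowledge_encoding(m, u, rng.normal(size=(8, 8)))
print(p.round(3), o.shape)      # attention over the three history turns, (8,)
```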

(11)

MODEL ARCHITECTURE

[Figure: the same architecture, with CNNs substituted for the two RNN sentence encoders (RNN_in and RNN_mem).]
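For the CNN variant, a generic convolution-plus-max-pooling sentence encoder is one plausible instantiation; the talk's exact CNN configuration is not shown on the slide, so the sketch below is only illustrative.

```python
import numpy as np

def cnn_sentence_encoder(word_vectors, filters, bias):
    """word_vectors: (seq_len, emb_dim); filters: (n_filters, width, emb_dim)."""
    n_filters, width, _ = filters.shape
    seq_len = word_vectors.shape[0]
    feats = np.zeros((n_filters, seq_len - width + 1))
    for f in range(n_filters):
        for t in range(seq_len - width + 1):
            window = word_vectors[t:t + width]              # n-gram window
            feats[f, t] = np.sum(window * filters[f]) + bias[f]
    return np.tanh(feats).max(axis=1)                       # max-pool over time

rng = np.random.default_rng(1)
sent = rng.normal(size=(9, 16))        # 9 tokens, embedding dim 16
enc = cnn_sentence_encoder(sent, rng.normal(size=(32, 3, 16)), np.zeros(32))
print(enc.shape)                       # (32,) fixed-length sentence vector
```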

(12)

END-TO-END TRAINING

• Tagging objective
• RNN tagger

[Figure: the RNN tagger unrolled over time; the knowledge encoding o is fed into every step (through weights M) alongside the word inputs w_t, and the network outputs the slot tag sequence y_t for the contextual and current utterances.]

The whole network is trained end to end; the attention distribution is figured out automatically, without explicit supervision.
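A hedged sketch of the tagging objective implied by this slide, written as standard sequence log-likelihood; the exact conditioning used in the paper may differ.

```latex
% Maximize the log-likelihood of the reference slot tags y_t for the current
% utterance, conditioned on its words w_{1:t} and the knowledge encoding o;
% the attention inside o receives no extra supervision.
\max_{\theta} \; \sum_{t=1}^{T} \log p\bigl(y_t \mid w_{1:t},\, o;\ \theta\bigr)
```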

(13)

Outline


(14)

EXPERIMENTS

• Dataset: Cortana communication session data
  – GRU for all RNNs
  – adam optimizer
  – embedding dim = 150
  – hidden units = 100
  – dropout = 0.5

Model           Training Set   Knowledge Encoding         Sentence Encoder   First Turn   Other   Overall
RNN Tagger      single-turn    x                          x                  60.6         16.2    25.5

The model trained on single-turn data performs worse on non-first turns due to mismatched training data.
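For reference, the training setup listed above can be written as a plain configuration dict; this is only a restatement of the slide's hyperparameters, not the talk's actual training code.

```python
# Hedged sketch of the stated training configuration.
config = {
    "rnn_cell": "GRU",        # used for all RNNs (encoders and tagger)
    "optimizer": "adam",
    "embedding_dim": 150,
    "hidden_units": 100,
    "dropout": 0.5,
}
```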

(15)

EXPERIMENTS

Model           Training Set   Knowledge Encoding         Sentence Encoder   First Turn   Other   Overall
RNN Tagger      single-turn    x                          x                  60.6         16.2    25.5
RNN Tagger      multi-turn     x                          x                  55.9         45.7    47.4

Treating multi-turn data as single-turn data for training performs reasonably.

(16)

EXPERIMENTS

Model           Training Set   Knowledge Encoding         Sentence Encoder   First Turn   Other   Overall
RNN Tagger      single-turn    x                          x                  60.6         16.2    25.5
RNN Tagger      multi-turn     x                          x                  55.9         45.7    47.4
Encoder-Tagger  multi-turn     current utt (c)            RNN                57.6         56.0    56.3
Encoder-Tagger  multi-turn     history + current (x, c)   RNN                69.9         60.8    62.5

Encoding the current and history utterances improves performance but increases training time.

(17)

EXPERIMENTS

Model           Training Set   Knowledge Encoding         Sentence Encoder   First Turn   Other   Overall
RNN Tagger      single-turn    x                          x                  60.6         16.2    25.5
RNN Tagger      multi-turn     x                          x                  55.9         45.7    47.4
Encoder-Tagger  multi-turn     current utt (c)            RNN                57.6         56.0    56.3
Encoder-Tagger  multi-turn     history + current (x, c)   RNN                69.9         60.8    62.5
Proposed        multi-turn     history + current (x, c)   RNN                73.2         65.7    67.1

The proposed memory network significantly outperforms all other approaches, with much less training time.

(18)

EXPERIMENTS

Model           Training Set   Knowledge Encoding         Sentence Encoder   First Turn   Other   Overall
RNN Tagger      single-turn    x                          x                  60.6         16.2    25.5
RNN Tagger      multi-turn     x                          x                  55.9         45.7    47.4
Encoder-Tagger  multi-turn     current utt (c)            RNN                57.6         56.0    56.3
Encoder-Tagger  multi-turn     history + current (x, c)   RNN                69.9         60.8    62.5
Proposed        multi-turn     history + current (x, c)   RNN                73.2         65.7    67.1
Proposed        multi-turn     history + current (x, c)   CNN                73.8         66.5    68.0

Using a CNN for sentence encoding produces comparable results with shorter training time.

(19)

Outline


(20)

Conclusion

• The proposed end-to-end memory network stores contextual knowledge, which can be exploited dynamically through an attention model to carry knowledge over for multi-turn understanding.

• The end-to-end model performs a tagging task instead of classification.

• The experiments show the feasibility and robustness of modeling knowledge carryover through memory networks.

(21)

Future Work

• Leverage not only local observations but also global knowledge for better language understanding
  – Syntax or semantics can serve as global knowledge to guide the understanding model
  – "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks," arXiv preprint arXiv:1609.03286

(22)

Q & A

Thanks for your attention!

The code will be available at

https://github.com/yvchen/ContextualSLU
