
Task-Oriented Dialogue Systems


(1)

Task-Oriented Dialogue Systems (Young, 2000)

[Pipeline figure: speech signal → speech recognition → language understanding → dialogue management (backed by a database) → natural language generation → text response]

• Speech Recognition: speech signal → text hypothesis, e.g., "are there any action movies to see this weekend"
• Language Understanding (LU): domain identification, user intent detection, slot filling; text input "Are there any action movies to see this weekend?" → semantic frame request_movie(genre=action, date=this weekend)
• Dialogue Management (DM): dialogue state tracking (DST) and dialogue policy; semantic frame → system action/policy, e.g., request_location
• Natural Language Generation (NLG): system action → text response, e.g., "Where are you located?"
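To make the data flow concrete, here is a minimal Python sketch of this pipeline with toy rule-based components; every function and rule below is a hypothetical stand-in for the corresponding learned module, not part of the original slides.

```python
# Minimal sketch of the task-oriented dialogue pipeline with toy components.

def understand(text: str) -> dict:
    """LU: map an utterance to a semantic frame (intent + slots)."""
    frame = {"intent": "request_movie", "slots": {}}
    if "action" in text.lower():
        frame["slots"]["genre"] = "action"
    if "this weekend" in text.lower():
        frame["slots"]["date"] = "this weekend"
    return frame

def decide(frame: dict, state: dict) -> str:
    """DM: track dialogue state and pick a system action via a simple policy."""
    state.update(frame["slots"])          # dialogue state tracking
    if "location" not in state:           # dialogue policy
        return "request_location"
    return "inform_movies"

def generate(action: str) -> str:
    """NLG: map a system action to a text response."""
    templates = {"request_location": "Where are you located?"}
    return templates.get(action, "Sorry, I didn't get that.")

state: dict = {}
user_text = "Are there any action movies to see this weekend?"
frame = understand(user_text)
print(generate(decide(frame, state)))     # -> "Where are you located?"
```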

(2)

Natural Language Understanding (NLU)

Parses natural language into structured semantics (NLU: language → frame).

Natural Language: "McDonald's is a cheap restaurant nearby the station."

Semantic Frame: RESTAURANT="McDonald's", PRICE="cheap", LOCATION="nearby the station"

(3)

Natural Language Generation (NLG)

Constructs natural language from structured semantics (NLG: frame → language).

Semantic Frame: RESTAURANT="McDonald's", PRICE="cheap", LOCATION="nearby the station"

Natural Language: "McDonald's is a cheap restaurant nearby the station."

(4)

Duality between NLU and NLG

NLU and NLG operate on the same pair in opposite directions:

Natural Language: "McDonald's is a cheap restaurant nearby the station."
⇅ NLU / NLG
Semantic Frame: RESTAURANT="McDonald's", PRICE="cheap", LOCATION="nearby the station"

How can we leverage this dual relationship?

(5)

Solution: Dual Supervised Learning for NLU & NLG (Su et al., 2019)

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, "Dual Supervised Learning for Natural Language Understanding and Generation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

(6)

DSL: Dual Supervised Learning (Xia et al., 2017)

◉ Originally proposed for machine translation
◉ Consider two domains $X$ and $Y$ and two dual tasks: $X \to Y$ with parameters $\theta_{x \to y}$, and $Y \to X$ with parameters $\theta_{y \to x}$

By the chain rule, we have
$$P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x).$$
Ideally, the two learned models satisfy
$$P(x, y) = P(x \mid y; \theta_{y \to x})\,P(y) = P(y \mid x; \theta_{x \to y})\,P(x).$$

Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., & Liu, T.-Y., "Dual supervised learning," in Proc. of ICML, 2017.

(7)

Dual Supervised Learning

◉ Exploit the duality by forcing the two models to satisfy the probabilistic constraint
$$P(x \mid y; \theta_{y \to x})\,P(y) = P(y \mid x; \theta_{x \to y})\,P(x)$$

Objective function (the constraint is relaxed into a duality regularizer added to each supervised loss):
$$\min_{\theta_{x \to y}} \; \mathbb{E}\big[\ell_1(f(x; \theta_{x \to y}), y)\big] + \lambda_{x \to y}\,\ell_{\text{duality}}$$
$$\min_{\theta_{y \to x}} \; \mathbb{E}\big[\ell_2(g(y; \theta_{y \to x}), x)\big] + \lambda_{y \to x}\,\ell_{\text{duality}}$$
where, following Xia et al. (2017), $\ell_{\text{duality}} = \big(\log \hat{P}(x) + \log P(y \mid x; \theta_{x \to y}) - \log \hat{P}(y) - \log P(x \mid y; \theta_{y \to x})\big)^2$.

This requires estimates of the marginals: how do we model the marginal distributions of $X$ and $Y$?

Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., & Liu, T.-Y., "Dual supervised learning," in Proc. of ICML, 2017.
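To make the regularizer concrete, here is a minimal PyTorch-style sketch of the two regularized objectives; the function name, signature, and default weights are assumptions of this sketch, not the authors' code.

```python
import torch

def dsl_losses(log_p_y_given_x: torch.Tensor, log_p_x_given_y: torch.Tensor,
               log_p_x: torch.Tensor, log_p_y: torch.Tensor,
               loss_xy: torch.Tensor, loss_yx: torch.Tensor,
               lam_xy: float = 0.01, lam_yx: float = 0.01):
    """Dual-supervised-learning objectives for one batch.

    log_p_y_given_x: log P(y|x; θ_x→y) from the x→y model (here: NLU)
    log_p_x_given_y: log P(x|y; θ_y→x) from the y→x model (here: NLG)
    log_p_x, log_p_y: log marginals from pretrained estimators
    loss_xy, loss_yx: the ordinary supervised losses l1 and l2
    """
    # Squared violation of log P̂(x) + log P(y|x) = log P̂(y) + log P(x|y)
    l_duality = (log_p_x + log_p_y_given_x
                 - log_p_y - log_p_x_given_y).pow(2)
    return loss_xy + lam_xy * l_duality, loss_yx + lam_yx * l_duality
```

In Xia et al.'s setup the marginal estimators are trained beforehand and held fixed; only the two conditional models are updated with these losses.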

(8)

Dual Supervised Learning

◉ Let's go back to NLU and NLG: let $X$ be natural language and $Y$ be semantic frames

Natural Language $x$: "McDonald's is a cheap restaurant nearby the station."
⇅ NLU ($x \to y$) / NLG ($y \to x$)
Semantic Frame $y$: RESTAURANT="McDonald's", PRICE="cheap", LOCATION="nearby the station"

Applying DSL therefore requires the marginals $\log \hat{P}(x)$ and $\log \hat{P}(y)$.

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, "Dual Supervised Learning for Natural Language Understanding and Generation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

(9)

Natural Language: Modeling $\log \hat{P}(x)$

◉ Language modeling: a GRU factorizes the sentence autoregressively, predicting $P(x_d \mid x_1, \dots, x_{d-1})$ at each step

[Figure: a GRU reads the prefix $x_1, \dots, x_{d-1}$ and outputs the next-word distribution $P(x_d \mid x_1, \dots, x_{d-1})$]

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, "Dual Supervised Learning for Natural Language Understanding and Generation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
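A minimal GRU language-model sketch for estimating $\log \hat{P}(x)$, in the spirit of the figure above; the vocabulary size and layer dimensions are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    """Autoregressive LM: log P̂(x) = Σ_d log P(x_d | x_1..x_{d-1})."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def log_prob(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (batch, length) word ids; returns per-example log P̂(x)."""
        inp, tgt = tokens[:, :-1], tokens[:, 1:]
        h, _ = self.gru(self.embed(inp))
        logp = torch.log_softmax(self.out(h), dim=-1)  # (B, L-1, vocab)
        # Sum the log-probability of each gold next token.
        return logp.gather(-1, tgt.unsqueeze(-1)).squeeze(-1).sum(dim=1)
```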

(10)

Semantic Frame: Modeling $\log \hat{P}(y)$

◉ We treat NLU as a multi-label classification problem
◉ Each label is a slot-value pair, so a frame is a binary (multi-hot) vector over all slot-value pairs

Example: RESTAURANT="McDonald's", PRICE="cheap", LOCATION="nearby the station" → [0, 1, ..., 0, 1]

How do we model the marginal distribution of $y$?

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, "Dual Supervised Learning for Natural Language Understanding and Generation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
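As a small illustration of this encoding (the label inventory below is a made-up toy set, not the real E2E label set):

```python
# Hypothetical label inventory; in practice it is collected from training data.
LABELS = ["RESTAURANT=McDonald's", "RESTAURANT=KFC",
          "PRICE=cheap", "PRICE=expensive",
          "LOCATION=nearby the station"]

def frame_to_vector(frame: set[str]) -> list[int]:
    """Encode a semantic frame as a multi-hot binary vector."""
    return [1 if label in frame else 0 for label in LABELS]

y = frame_to_vector({"RESTAURANT=McDonald's", "PRICE=cheap",
                     "LOCATION=nearby the station"})
print(y)  # [1, 0, 1, 0, 1]
```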

(11)

Semantic Frame: Modeling $\log \hat{P}(y)$

◉ Naïve approach
  ○ Estimate each label's prior probability $\hat{P}(y_i)$ on the training set
  ○ $\hat{P}(y) = \prod_i \hat{P}(y_i)$

Assumption: labels are independent. In practice they are correlated across label groups such as:

Restaurant: "McDonald's", "KFC", "PizzaHut"
Price: "cheap", "expensive"
Food: "Pizza", "Hamburger", "Chinese"

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, "Dual Supervised Learning for Natural Language Understanding and Generation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
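A tiny numpy sketch of the naïve estimator on made-up data:

```python
import numpy as np

# Toy training set of multi-hot frame vectors (rows = examples).
Y = np.array([[1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 0, 1, 1]])

priors = Y.mean(axis=0)                      # P̂(y_i = 1) for each label

def naive_log_prob(y: np.ndarray) -> float:
    """log P̂(y) under the label-independence assumption."""
    p = np.where(y == 1, priors, 1.0 - priors)
    return float(np.log(p).sum())

print(naive_log_prob(np.array([1, 0, 1, 0, 1])))
```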

(12)

Semantic Frame: Modeling $\log \hat{P}(y)$

◉ Masked autoencoder for distribution estimation (MADE)

[Figure: an autoencoder in which every unit is assigned an ordering number (e.g., inputs/outputs ordered 2, 1, 3 and hidden units 1, 2, 2, 1); connections that violate the ordering are masked out]

Masking certain connections introduces a sequential dependency among the labels, so the network models an autoregressive factorization → the marginal distribution of $y$.

Germain, M., Gregor, K., Murray, I., & Larochelle, H., "MADE: Masked autoencoder for distribution estimation," in Proceedings of ICML, 2015.
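For concreteness, a sketch of the standard MADE mask construction from Germain et al. (2015); the layer sizes and random seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def made_masks(n_in: int, n_hidden: int):
    """Build MADE masks for one hidden layer (Germain et al., 2015).

    Each unit gets a degree; a connection is kept only if it never lets
    output i see inputs with degree >= i, giving an autoregressive model.
    """
    deg_in = np.arange(1, n_in + 1)                  # input degrees 1..D
    deg_h = rng.integers(1, n_in, size=n_hidden)     # hidden degrees in 1..D-1
    mask_w = (deg_h[:, None] >= deg_in[None, :])     # input -> hidden
    mask_v = (deg_in[:, None] > deg_h[None, :])      # hidden -> output
    return mask_w.astype(float), mask_v.astype(float)

mask_w, mask_v = made_masks(n_in=5, n_hidden=8)
# A forward pass multiplies weights elementwise by these masks, e.g.:
# h = sigmoid((W * mask_w) @ y + b); logits = (V * mask_v) @ h + c
```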

(13)

[Model architectures: the NLU model encodes the word sequence "McDonald's is ... station" with a GRU and applies a linear layer to predict the multi-hot label vector [0, 1, ..., 0, 1]; the NLG model conditions a GRU decoder on the label vector and generates the sentence "<BOS> McDonald's is ... <EOS>" word by word]

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, "Dual Supervised Learning for Natural Language Understanding and Generation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
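A minimal sketch of the NLU half of this figure (GRU encoder plus linear multi-label head); all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GRUNLU(nn.Module):
    """Sketch of the NLU side: GRU encoder + linear multi-label head."""
    def __init__(self, vocab_size=10000, n_labels=300,
                 emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_labels)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (batch, length) -> per-label probabilities P(y_i=1|x)."""
        _, h_last = self.gru(self.embed(tokens))
        return torch.sigmoid(self.head(h_last[-1]))
```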

(14)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain
◉ Metrics: F1 score for NLU; BLEU and ROUGE-1 for NLG

[Bar chart (scores ranging roughly 50-75): the NLU baseline (F1) and NLG baseline (BLEU, ROUGE-1)]

(15)

NLU/NLG Results (cont.)

◉ Same setup: E2E NLG data, 50k restaurant-domain examples; F1 for NLU, BLEU and ROUGE-1 for NLG

[Bar chart: adds DSL w/o MADE bars alongside the NLU and NLG baselines]

(16)

NLU/NLG Results (cont.)

◉ Same setup: E2E NLG data, 50k restaurant-domain examples; F1 for NLU, BLEU and ROUGE-1 for NLG

[Bar chart: adds DSL w/ MADE bars alongside DSL w/o MADE and the NLU/NLG baselines]

(17)

Task-Oriented Dialogue Systems (Young, 2000)

[Recap of the pipeline figure from slide (1): speech recognition → language understanding (LU) → dialogue management (DM) → natural language generation (NLG), with the same movie-request example]


(20)

Unstructured Knowledge Access

◉ A machine reads a large collection of text
  ○ serves as a teacher
◉ A user can ask questions
  ○ serves as a student
  ○ in a conversational manner
→ Conversational QA

(21)

Solution: FlowDelta (Yeh & Chen, 2019)

Yi-Ting Yeh and Yun-Nung Chen, "FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension," in Proceedings of the Machine Reading for Question Answering Workshop at EMNLP (MRQA), 2019.

(22)

FlowDelta: Information Gain in Dialogue Flow

◉ Idea: model the difference of hidden states across question turns in multi-turn dialogues

[Figure: the conversation flow runs over context positions $j$ (context representations $c_{t,j}$) and across question turns $t$ (Q1, Q2, Q3, ...); at each position, FlowDelta takes the difference $\Delta$ between the flow hidden states of turn $t-1$ and turn $t$, i.e., the information gain from one turn to the next]

Yi-Ting Yeh and Yun-Nung Chen, "FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension," in Proceedings of the Machine Reading for Question Answering Workshop at EMNLP (MRQA), 2019.
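A minimal tensor-level sketch of this delta computation; the tensor shapes, the symbol names, and the zero-padding for the first turn are assumptions of this sketch rather than details from the paper.

```python
import torch

def flow_delta(flow: torch.Tensor) -> torch.Tensor:
    """Information gain between consecutive turns' flow hidden states.

    flow: (turns, context_len, hidden) -- a hidden state for each question
    turn t and context position j (layout assumed for this sketch).
    Returns delta[t, j] = flow[t, j] - flow[t-1, j], zero-padded at t = 0.
    """
    delta = flow[1:] - flow[:-1]
    pad = torch.zeros_like(flow[:1])          # no previous turn at t = 0
    return torch.cat([pad, delta], dim=0)

# Usage: feed the deltas alongside the flow states into the reasoning
# layers, so the model can attend to what changed between turns.
flow = torch.randn(3, 50, 128)                # 3 turns, 50 tokens, dim 128
print(flow_delta(flow).shape)                 # torch.Size([3, 50, 128])
```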

(23)

FlowDelta (Yeh & Chen, 2019)

◉ Idea: model the difference of hidden states in multi-turn dialogues; the mechanism plugs into different base readers

[Architecture figure: left, FlowQA encodes the i-th question and the context, applies dialogue-reasoning layers, and outputs the i-th answer; right, a BERT-based variant feeds the i-th question and context through BERT layers $l_1, \dots, l_{k-1}, l_k$, with dialogue reasoning over the intermediate layers, to produce the i-th answer]

(24)

Conversational QA Results

◉ Data: QuAC, CoQA

[Bar chart (scores ranging roughly 60-80) on CoQA and QuAC: the FlowQA and BERT baselines]

Yi-Ting Yeh and Yun-Nung Chen, "FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension," in Proceedings of the Machine Reading for Question Answering Workshop at EMNLP (MRQA), 2019.

(25)

Conversational QA Results (cont.)

◉ Same setup: QuAC and CoQA

[Bar chart: adds a + Flow variant alongside the FlowQA and BERT baselines]

Yi-Ting Yeh and Yun-Nung Chen, "FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension," in Proceedings of the Machine Reading for Question Answering Workshop at EMNLP (MRQA), 2019.

(26)

Conversational QA Results (cont.)

◉ Same setup: QuAC and CoQA

[Bar chart: adds + FlowDelta variants alongside the + Flow models and the FlowQA and BERT baselines]

Yi-Ting Yeh and Yun-Nung Chen, "FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension," in Proceedings of the Machine Reading for Question Answering Workshop at EMNLP (MRQA), 2019.

(27)

QuAC Leaderboard

[Figure: screenshot of the QuAC leaderboard]

(28)

Summary

◉ Spoken language embeddings are needed for better conversational AI
  ○ Written texts are enough for pre-training embeddings, but there is a mismatch when applying them to spoken language
  ○ 1) Adapting the Transformer to ASR lattices; 2) adapting contextualized embeddings to be robust to misrecognition

◉ Leveraging the duality of NLU and NLG improves scalability
  ○ Apply dual supervised learning to exploit the duality
  ○ Modeling the data distribution (the marginals) is important
  ○ Better performance and flexibility for diverse NLU/NLG models

◉ Conversational QA enables unstructured knowledge access
  ○ FlowDelta: modeling the information gain in dialogue flow guides better understanding

Teachers can design short practice tasks to help students focus on one learning target at a time Inferencing task – to help students infer meaning while reading. Skimming task –