Task-Oriented Dialogue Systems (Young, 2000)
Speech Signal → Speech Recognition → Hypothesis: are there any action movies to see this weekend
Text Input: Are there any action movies to see this weekend?
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
→ Semantic Frame: request_movie (genre=action, date=this weekend)
Dialogue Management (DM), backed by a Database
• Dialogue State Tracking (DST)
• Dialogue Policy
→ System Action/Policy: request_location
Natural Language Generation (NLG)
→ Text response: Where are you located?
Natural Language Understanding (NLU)
◉ Parse natural language into structured semantics (Natural Language → NLU → Semantic Frame)
Natural Language
McDonald’s is a cheap restaurant nearby the station.
Semantic Frame
RESTAURANT=“McDonald’s”
PRICE=“cheap”
LOCATION= “nearby the station”
Natural Language Generation (NLG)
◉ Construct natural language based on structured semantics
Natural Language
McDonald’s is a cheap restaurant nearby the station.
Semantic Frame
RESTAURANT=“McDonald’s”
PRICE=“cheap”
LOCATION= “nearby the station”
(Semantic Frame → NLG → Natural Language)
Duality between NLU and NLG
Natural Language
McDonald’s is a cheap restaurant nearby the station.
Semantic Frame
RESTAURANT=“McDonald’s”
PRICE=“cheap”
LOCATION= “nearby the station”
NLU: Natural Language → Semantic Frame; NLG: Semantic Frame → Natural Language
How can we leverage this dual relationship?
Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of The 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
Solution:
Dual Supervised Learning for NLU & NLG
(Su et al., 2019)
DSL: Dual Supervised Learning (Xia et al., 2017)
◉ Proposed for machine translation
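◉ Consider two domains X and Y, and two tasks X → Y (parameters $\theta_{x \to y}$) and Y → X (parameters $\theta_{y \to x}$)
We have $P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x)$
Ideally $P(x, y) = P(x \mid y; \theta_{y \to x})\,P(y) = P(y \mid x; \theta_{x \to y})\,P(x)$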
Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., & Liu, T. Y., “Dual supervised learning,” in Proc. of ICML, 2017.
Dual Supervised Learning
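◉ Exploit the duality by forcing the models to follow the probabilistic constraint $P(x \mid y; \theta_{y \to x})\,P(y) = P(y \mid x; \theta_{x \to y})\,P(x)$
Objective function
$\min_{\theta_{x \to y}} \mathbb{E}\left[ l_1(f(x; \theta_{x \to y}), y) \right] + \lambda_{x \to y}\, l_{duality}$
$\min_{\theta_{y \to x}} \mathbb{E}\left[ l_2(g(y; \theta_{y \to x}), x) \right] + \lambda_{y \to x}\, l_{duality}$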
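The slide does not spell out the duality term. In Xia et al. (2017) it is the squared gap between the two factorizations of $\log P(x, y)$, computed with estimated marginals $\hat{P}$. Written in the notation used here, it would read roughly as follows:

```latex
l_{duality} = \Big( \log \hat{P}(x) + \log P(y \mid x; \theta_{x \to y})
              - \log \hat{P}(y) - \log P(x \mid y; \theta_{y \to x}) \Big)^{2}
```

The term is zero exactly when both models agree on the joint probability, which is why estimates of the marginals $\hat{P}(x)$ and $\hat{P}(y)$ are needed.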
How to model the marginal distributions of 𝑋 and 𝑌?
Dual Supervised Learning
◉ Let’s go back to NLU and NLG
Natural Language
McDonald’s is a cheap restaurant nearby the station.
Semantic Frame
RESTAURANT=“McDonald’s”
PRICE=“cheap”
LOCATION= “nearby the station”
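NLU: Natural Language (X) → Semantic Frame (Y)
NLG: Semantic Frame (Y) → Natural Language (X)
Marginals to model: $\log P(x)$ for natural language and $\log P(y)$ for semantic frames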
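As a rough illustration of how the duality constraint becomes a training signal, the sketch below computes the squared gap between the two factorizations of log P(x, y) and adds it to each task loss. The function names, the example probabilities, and the weights `lambda_xy` and `lambda_yx` are assumptions made for illustration, not values from the paper.

```python
import math

def duality_loss(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared gap between log P(x) + log P(y|x) and log P(y) + log P(x|y)."""
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return gap ** 2

def regularized_losses(nlu_loss, nlg_loss, l_dual, lambda_xy=0.1, lambda_yx=0.1):
    """Add the weighted duality term to the NLU (x -> y) and NLG (y -> x) losses."""
    return nlu_loss + lambda_xy * l_dual, nlg_loss + lambda_yx * l_dual

# Consistent models: 0.2 * 0.5 == 0.4 * 0.25, so the gap vanishes.
consistent = duality_loss(math.log(0.2), math.log(0.5), math.log(0.4), math.log(0.25))

# Inconsistent models: 0.2 * 0.5 != 0.4 * 0.5, so the penalty is positive.
inconsistent = duality_loss(math.log(0.2), math.log(0.5), math.log(0.4), math.log(0.5))

nlu_total, nlg_total = regularized_losses(1.0, 2.0, inconsistent)
```

Both tasks share the same penalty, so a disagreement about the joint probability pushes on the NLU and NLG parameters at the same time.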
Natural Language log 𝑃(𝑥)
◉ Language modeling
A GRU language model reads the previous token $x_{d-1}$ and predicts $P(x_d \mid x_1, \dots, x_{d-1})$
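The chain rule behind this slide is log P(x) = Σ_d log P(x_d | x_1, …, x_{d-1}). The sketch below shows that sum with a smoothed bigram model standing in for the GRU. This is a deliberate simplification, since the bigram model conditions on the previous token only. The corpus, the smoothing constant `alpha`, and all function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<BOS>", "<EOS>"

def train_bigram(corpus):
    """Count next-token occurrences for every previous token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def log_prob(sentence, counts, vocab_size, alpha=1.0):
    """Chain rule: sum the log-probability of each token given its history.

    The history is truncated to the previous token (bigram stand-in for a GRU).
    """
    tokens = [BOS] + sentence.split() + [EOS]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = counts[prev][cur] + alpha
        denominator = sum(counts[prev].values()) + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total

corpus = ["the food is cheap", "the food is good"]
vocab = {tok for s in corpus for tok in s.split()} | {EOS}
counts = train_bigram(corpus)

seen = log_prob("the food is cheap", counts, len(vocab))
shuffled = log_prob("cheap is food the", counts, len(vocab))
```

A fluent sentence from the training data receives a higher log P(x) than a shuffled one, which is the property the marginal estimate needs.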
Semantic Frame log 𝑃(𝑦)
◉ We treat NLU as a multi-label classification problem
◉ Each label is a slot-value pair
RESTAURANT=“McDonald’s”
PRICE=“cheap”
LOCATION= “nearby the station”
Multi-hot label vector: [0, 1, ..., 0, 1]
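To make the multi-label view concrete, the sketch below turns semantic frames into multi-hot vectors with one position per slot-value pair. The second frame and the helper names are hypothetical, and sorting the label inventory is just one arbitrary way to fix the positions.

```python
def build_label_index(frames):
    """Assign one vector position to every slot-value pair seen in the data."""
    labels = sorted({(slot, value) for frame in frames for slot, value in frame.items()})
    return {label: i for i, label in enumerate(labels)}

def to_multi_hot(frame, label_index):
    """Encode a semantic frame as a 0/1 vector over all known slot-value pairs."""
    vector = [0] * len(label_index)
    for slot, value in frame.items():
        vector[label_index[(slot, value)]] = 1
    return vector

frames = [
    {"RESTAURANT": "McDonald's", "PRICE": "cheap", "LOCATION": "nearby the station"},
    {"RESTAURANT": "KFC", "PRICE": "expensive"},  # hypothetical second example
]
label_index = build_label_index(frames)
first = to_multi_hot(frames[0], label_index)
second = to_multi_hot(frames[1], label_index)
```

NLU then predicts such a vector from the utterance, and NLG is conditioned on it.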
How to model the marginal distribution of 𝑦?
Semantic Frame log 𝑃(𝑦)
◉ Naïve approach
○ Calculate the prior probability $P(y_i)$ of each label on the training set.
○ $P(y) = \prod_i P(y_i)$
Assumption: labels are independent
Restaurant: “McDonald’s”
Restaurant: “KFC”
Restaurant: “PizzaHut”
Price: “cheap”
Price: “expensive”
Food: “Pizza”
Food: “Hamburger”
Food: “Chinese”
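A minimal sketch of the naïve estimate, under two assumptions that go beyond the slide: each label is treated as an independent Bernoulli variable, so an absent label contributes 1 − P(y_i), and the priors use add-one smoothing so that no probability is exactly 0 or 1. The training vectors are made up.

```python
import math

def label_priors(train_vectors, alpha=1.0):
    """Estimate P(y_i = 1) for each label from multi-hot training vectors."""
    n = len(train_vectors)
    num_labels = len(train_vectors[0])
    return [
        (sum(vec[i] for vec in train_vectors) + alpha) / (n + 2 * alpha)
        for i in range(num_labels)
    ]

def naive_log_p_y(y, priors):
    """log P(y) under the independence assumption: a sum of per-label terms."""
    return sum(math.log(p if y_i == 1 else 1.0 - p) for y_i, p in zip(y, priors))

train_vectors = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
priors = label_priors(train_vectors)
log_p = naive_log_p_y([1, 0, 1], priors)
```

Because the labels are scored independently, this estimate cannot express that two values of the same slot rarely co-occur, which motivates a model with dependencies among labels.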
Semantic Frame log 𝑃(𝑦)
◉ Masked autoencoder for distribution estimation (MADE)
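[Figure: MADE network whose input, hidden, and output units carry numbers (2 1 3; 1 2 2 1; 2 1 3) that determine which connections are masked]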
Introduce sequential dependency among labels by masking certain connections
→ marginal distribution of 𝑦
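The masking idea can be sketched as follows. Each unit gets a degree, an input-to-hidden connection is kept when the hidden degree is at least the input degree, and a hidden-to-output connection is kept when the output degree is strictly larger than the hidden degree. Output d then depends only on labels before d, so P(y) factorizes as Π_d P(y_d | y_1, …, y_{d-1}). The natural ordering 1, 2, 3 and the helper names are assumptions. The hidden degrees [1, 2, 2, 1] echo the middle row of numbers on the slide.

```python
def made_masks(input_degrees, hidden_degrees):
    """Build the two binary masks of a one-hidden-layer MADE.

    mask_in[k][d]  == 1 when hidden unit k may read input d.
    mask_out[d][k] == 1 when output d may read hidden unit k.
    """
    mask_in = [[1 if h >= d else 0 for d in input_degrees] for h in hidden_degrees]
    mask_out = [[1 if d > h else 0 for h in hidden_degrees] for d in input_degrees]
    return mask_in, mask_out

def connectivity(mask_in, mask_out):
    """conn[d][j] counts the unmasked paths from input j to output d."""
    num_labels, num_hidden = len(mask_out), len(mask_in)
    return [
        [sum(mask_out[d][k] * mask_in[k][j] for k in range(num_hidden))
         for j in range(num_labels)]
        for d in range(num_labels)
    ]

mask_in, mask_out = made_masks(input_degrees=[1, 2, 3], hidden_degrees=[1, 2, 2, 1])
conn = connectivity(mask_in, mask_out)

# Autoregressive property: output d has no path from input j when j >= d.
is_autoregressive = all(
    conn[d][j] == 0 for d in range(3) for j in range(3) if j >= d
)
```

The first output has no incoming paths at all, so it models an unconditional P(y_1), while each later output can see only the labels that precede it.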
Germain, M., Gregor, K., Murray, I., & Larochelle, H., “MADE: Masked autoencoder for distribution estimation,”
in Proceedings of International Conference on Machine Learning, 2015.
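[Figure: NLU model. A GRU reads the utterance “McDonald’s is … station” and a linear layer outputs the multi-hot label vector (0 1 … 0 1).]
[Figure: NLG model. A GRU takes the label vector (0 1 … 0 1) and the input tokens “<BOS> McDonald’s … station” and generates “McDonald’s is … <EOS>”.]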
NLU/NLG Results
◉ E2E NLG data: 50k examples in the restaurant domain
◉ NLU: F-1 score; NLG: BLEU, ROUGE
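For reference, an F1 score over predicted slot-value pairs could be computed as in the sketch below. The slide does not say whether the reported score is micro- or macro-averaged, so micro-averaging is an assumption here, and the example frames are made up.

```python
def micro_f1(predicted, gold):
    """Micro-averaged F1 over sets of (slot, value) pairs, one set per example."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # pairs predicted and correct
        fp += len(pred - ref)   # pairs predicted but wrong
        fn += len(ref - pred)   # pairs missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = [{("RESTAURANT", "McDonald's"), ("PRICE", "cheap"), ("LOCATION", "nearby the station")}]
predicted = [{("RESTAURANT", "McDonald's"), ("PRICE", "cheap"), ("FOOD", "Hamburger")}]
score = micro_f1(predicted, gold)
```

Here two of three predicted pairs are correct and two of three gold pairs are recovered, so precision and recall are both 2/3.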
[Figure: bar chart of F1, BLEU, and ROUGE-1 (axis 50 to 75) for the NLU baseline and the NLG baseline]
[Figure: bar chart of F1, BLEU, and ROUGE-1 (axis 50 to 75) comparing the NLU and NLG baselines with DSL w/o MADE]
[Figure: bar chart of F1, BLEU, and ROUGE-1 (axis 50 to 75) comparing the NLU and NLG baselines, DSL w/o MADE, and DSL w/ MADE]
Unstructured Knowledge Access
◉ A machine reads big text data
○ serves as a teacher
◉ A user can ask questions
○ serves as a student
○ in a conversational manner
→ Conversational QA
Solution: FlowDelta
(Yeh & Chen, 2019)
Yi-Ting Yeh and Yun-Nung Chen, "FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension," in Proceedings of Machine Reading for Question Answering Workshop at EMNLP (MRQA), 2019.
FlowDelta: Information Gain in Dialogue Flow
◉ Idea: model the difference of hidden states in multi-turn dialogues
[Figure: FlowDelta: Modeling Flow Information Gain. Axes: conversation flow (over context) and time (question turns Q1, Q2, Q3, …). Symbols shown for context word j at turn t: $h_{t-1,j}$, $h_{t,j}$, $c_{t,j}$, with Δ marking the difference between hidden states of consecutive turns.]
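The core quantity, the change of each context word's hidden state from one question turn to the next, can be sketched as below. How the paper feeds this difference back into the flow layer is not shown on the slide, so the code only computes Δ. The data layout, the zero vector for the first turn, and the function name are assumptions.

```python
def flow_delta(hidden_by_turn):
    """Return deltas[t][j] = h[t][j] - h[t-1][j] for every context word j.

    hidden_by_turn[t][j] is the hidden vector of context word j at question turn t.
    The first turn has no predecessor, so its delta is a zero vector (assumption).
    """
    deltas = []
    for t, turn in enumerate(hidden_by_turn):
        if t == 0:
            deltas.append([[0.0] * len(h) for h in turn])
        else:
            previous = hidden_by_turn[t - 1]
            deltas.append([
                [a - b for a, b in zip(h, h_prev)]
                for h, h_prev in zip(turn, previous)
            ])
    return deltas

# Three question turns, two context words, hidden size 2 (made-up values).
hidden_by_turn = [
    [[1.0, 2.0], [0.0, 0.0]],
    [[1.5, 2.0], [0.0, 1.0]],
    [[1.5, 4.0], [0.0, 1.0]],
]
deltas = flow_delta(hidden_by_turn)
```

A zero delta means a context word's representation did not change between turns, while a large delta marks where the latest question shifted the focus of the conversation.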
FlowDelta (Yeh & Chen, 2019)
◉ Idea: model the difference of hidden states in multi-turn dialogues
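[Figure: two base models. FlowQA: the i-th question and the context pass through encoding and dialogue reasoning to produce the i-th answer. BERT: the i-th question and the context pass through BERT layers 1, …, k-1, k with dialogue reasoning to produce the i-th answer.]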
Conversational QA Results
◉ Data: QuAC, CoQA
[Figure: bar chart on CoQA and QuAC (axis 60 to 80); legend entries: FlowQA, BERT]
[Figure: bar chart on CoQA and QuAC (axis 60 to 80); legend entries: FlowQA, BERT, + Flow]
[Figure: bar chart on CoQA and QuAC (axis 60 to 80); legend entries: FlowQA, + FlowDelta, BERT, + Flow, + FlowDelta]