Task-Oriented Dialogue Systems

(1)

Task-Oriented Dialogue Systems (Young, 2000)

Speech Recognition

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

Dialogue Management (DM)

• Dialogue State Tracking (DST)

• Dialogue Policy

Natural Language Generation (NLG)

Hypothesis

are there any action movies to see this weekend

Semantic Frame

request_movie

genre=action, date=this weekend

System Action/Policy

request_location

Text response

Where are you located?

Text Input

Are there any action movies to see this weekend?

Speech Signal

Database
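The example flow on this slide (hypothesis, semantic frame, system action, text response) can be sketched as a small data structure plus a toy policy. This is only an illustration under stated assumptions: the names `SemanticFrame`, `choose_action`, `RESPONSES`, and the fallback action `inform` are hypothetical and do not come from the slides; only the frame contents, `request_location`, and the response text are taken from the example.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """Output of language understanding: an intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)


def choose_action(frame: SemanticFrame) -> str:
    """Toy dialogue policy (assumption): ask for a location if none is known."""
    if frame.intent == "request_movie" and "location" not in frame.slots:
        return "request_location"
    return "inform"  # hypothetical fallback action, not from the slides


# Toy NLG step (assumption): one canned response per system action.
RESPONSES = {"request_location": "Where are you located?"}

frame = SemanticFrame("request_movie", {"genre": "action", "date": "this weekend"})
action = choose_action(frame)
print(action, "->", RESPONSES[action])
```

A real system would replace each step with a learned component; the sketch only makes the interfaces between the modules concrete.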


(2)

Natural Language Understanding (NLU)

◉ Parse natural language into structured semantics

NLU

Natural Language

McDonald’s is a cheap restaurant nearby the station.

Semantic Frame

RESTAURANT=“McDonald’s”

PRICE=“cheap”

LOCATION=“nearby the station”


(3)

Natural Language Generation (NLG)

◉ Construct natural language based on structured semantics

Natural Language

McDonald’s is a cheap restaurant nearby the station.

Semantic Frame

RESTAURANT=“McDonald’s”

PRICE=“cheap”

LOCATION=“nearby the station”

NLG


(4)

Duality between NLU and NLG

Natural Language

McDonald’s is a cheap restaurant nearby the station.

Semantic Frame

RESTAURANT=“McDonald’s”

PRICE=“cheap”

LOCATION=“nearby the station”

NLU: Natural Language → Semantic Frame; NLG: Semantic Frame → Natural Language

How can we leverage this dual relationship?


(5)

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of The 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

Solution:

Dual Supervised Learning for NLU & NLG

(Su et al., 2019)


(6)

DSL: Dual Supervised Learning (Xia et al., 2017)

◉ Proposed for machine translation

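◉ Consider two domains $X$ and $Y$, and two tasks $X \to Y$ and $Y \to X$

[Figure: the task $X \to Y$ is modeled with parameters $\theta_{x \to y}$ and the task $Y \to X$ with parameters $\theta_{y \to x}$]

We have $P(x, y) = P(x \mid y) P(y) = P(y \mid x) P(x)$

Ideally $P(x, y) = P(x \mid y; \theta_{y \to x}) P(y) = P(y \mid x; \theta_{x \to y}) P(x)$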
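The identity above holds for any exact joint distribution, which a small numeric check can confirm. The sketch below uses a made-up 2×2 joint table; the values and the helper names are assumptions for illustration only.

```python
import math

# Hypothetical joint distribution P(x, y) over two tiny domains (sums to 1).
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.4, ("x1", "y1"): 0.2,
}


def marginal_x(x):
    """P(x) = sum over y of P(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """P(y) = sum over x of P(x, y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def cond_x_given_y(x, y):
    return joint[(x, y)] / marginal_y(y)


def cond_y_given_x(y, x):
    return joint[(x, y)] / marginal_x(x)


# Both factorizations recover the same joint probability for every pair.
for (x, y), p in joint.items():
    left = cond_x_given_y(x, y) * marginal_y(y)
    right = cond_y_given_x(y, x) * marginal_x(x)
    assert math.isclose(left, p) and math.isclose(right, p)
```

Two separately trained models give no such guarantee, which is why the equality is only an ideal for the parameterized conditionals.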

Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., & Liu, T. Y., “Dual supervised learning,” in Proc. of ICML, 2017.


(7)

Dual Supervised Learning

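◉ Exploit the duality by forcing models to follow the probabilistic constraint $P(x \mid y; \theta_{y \to x}) P(y) = P(y \mid x; \theta_{x \to y}) P(x)$

Objective function

$$\begin{cases} \min_{\theta_{x \to y}} \mathbb{E}\left[ l_1(f(x; \theta_{x \to y}), y) \right] + \lambda_{x \to y}\, l_{duality} \\ \min_{\theta_{y \to x}} \mathbb{E}\left[ l_2(g(y; \theta_{y \to x}), x) \right] + \lambda_{y \to x}\, l_{duality} \end{cases}$$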
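The slide does not spell out the duality term. One common choice, following Xia et al. (2017), is the squared gap between the two log-factorizations of the joint probability. The sketch below assumes that form; the function names and example probabilities are hypothetical.

```python
import math


def duality_loss(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared gap between log P(x)P(y|x) and log P(y)P(x|y) (assumed form)."""
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return gap ** 2


def total_loss(task_loss, lam, l_duality):
    """Task loss plus the weighted duality regularizer."""
    return task_loss + lam * l_duality


# Consistent models: 0.4 * 0.25 == 0.5 * 0.2, so the penalty is (almost) zero.
consistent = duality_loss(math.log(0.4), math.log(0.25), math.log(0.5), math.log(0.2))

# Inconsistent models: 0.4 * 0.5 != 0.5 * 0.2, so the penalty is positive.
inconsistent = duality_loss(math.log(0.4), math.log(0.5), math.log(0.5), math.log(0.2))

print(consistent, inconsistent)
```

The same penalty is added to both objectives, each with its own weight, so gradients from the constraint reach both models.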

How to model the marginal distributions of 𝑋 and 𝑌?

Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., & Liu, T. Y., “Dual supervised learning,” in Proc. of ICML, 2017.


(8)

Dual Supervised Learning

◉ Let’s go back to NLU and NLG

Natural Language

McDonald’s is a cheap restaurant nearby the station.

Semantic Frame

RESTAURANT=“McDonald’s”

PRICE=“cheap”

LOCATION=“nearby the station”

NLU: Natural Language → Semantic Frame; NLG: Semantic Frame → Natural Language

Natural Language: $X$, with marginal $\log \hat{P}(x)$

Semantic Frame: $Y$, with marginal $\log \hat{P}(y)$


(9)

Natural Language: $\log \hat{P}(x)$

◉ Language modeling

[Figure: a GRU language model reads $x_{d-1}$ and outputs $P(x_d \mid x_1, \dots, x_{d-1})$]
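A language model gives the sentence-level marginal through the chain rule, $\log \hat{P}(x) = \sum_d \log P(x_d \mid x_1, \dots, x_{d-1})$. The sketch below swaps the GRU for a count-based bigram model so that it runs with the standard library alone. The toy corpus and the function name are assumptions, and a bigram model conditions only on the previous token rather than the full history.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real setup would use the training utterances.
corpus = [
    ["<BOS>", "mcdonald's", "is", "cheap", "<EOS>"],
    ["<BOS>", "kfc", "is", "cheap", "<EOS>"],
    ["<BOS>", "mcdonald's", "is", "nearby", "<EOS>"],
]

bigrams = Counter()   # counts of (previous token, current token)
contexts = Counter()  # counts of the previous token alone
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigrams[(prev, cur)] += 1
        contexts[prev] += 1


def log_p_sentence(tokens):
    """Sum of per-token conditional log-probabilities (chain rule).

    No smoothing: an unseen bigram has probability 0 and math.log would raise.
    """
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log(bigrams[(prev, cur)] / contexts[prev])
    return total


print(log_p_sentence(["<BOS>", "mcdonald's", "is", "cheap", "<EOS>"]))
```

With a GRU, each conditional would instead come from a softmax over the vocabulary given the hidden state, but the summation over positions is the same.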


(10)

Semantic Frame: $\log \hat{P}(y)$

◉ We treat NLU as a multi-label classification problem

◉ Each label is a slot-value pair

RESTAURANT=“McDonald’s”

PRICE=“cheap”

LOCATION=“nearby the station”

0 1 . . . 0 1
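The multi-label view can be made concrete by encoding a frame as a binary vector with one position per slot-value pair. The label inventory below is a hypothetical five-label example, not the actual label set of any dataset.

```python
# Hypothetical label inventory: one entry per slot-value pair.
LABELS = [
    ("RESTAURANT", "McDonald's"),
    ("RESTAURANT", "KFC"),
    ("PRICE", "cheap"),
    ("PRICE", "expensive"),
    ("LOCATION", "nearby the station"),
]


def encode(frame):
    """Return a multi-hot vector: 1 where the slot-value pair is in the frame."""
    active = set(frame.items())
    return [1 if label in active else 0 for label in LABELS]


frame = {
    "RESTAURANT": "McDonald's",
    "PRICE": "cheap",
    "LOCATION": "nearby the station",
}
vector = encode(frame)
print(vector)  # [1, 0, 1, 0, 1]
```

NLU then predicts this vector from the utterance, and NLG conditions on it to produce the utterance.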

How to model the marginal distribution of $y$?


(11)

Semantic Frame: $\log \hat{P}(y)$

◉ Naïve approach

○ Calculate the prior probability $\hat{P}(y_i)$ of each label on the training set.

○ $\hat{P}(y) = \prod_i \hat{P}(y_i)$

Assumption: labels are independent

Restaurant: “McDonald’s”

Restaurant: “KFC”

Restaurant: “PizzaHut”

Price: “cheap”

Price: “expensive”

Food: “Pizza”

Food: “Hamburger”

Food: “Chinese”
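The naïve approach can be sketched as follows: estimate each label's frequency on the training set, then multiply per-label probabilities under the independence assumption. The four training vectors are made up, and treating each label as an independent Bernoulli variable (so that absent labels contribute $1 - \hat{P}(y_i)$) is one reading of the product on the slide.

```python
import math

# Hypothetical training frames as multi-hot vectors over four labels.
train = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
]

num_labels = len(train[0])

# Prior of each label: fraction of training frames in which it is active.
prior = [sum(row[i] for row in train) / len(train) for i in range(num_labels)]


def log_p_frame(y):
    """log P(y) under independent labels (assumed Bernoulli per label).

    A prior of exactly 0 or 1 would need smoothing to avoid log(0).
    """
    total = 0.0
    for y_i, p_i in zip(y, prior):
        total += math.log(p_i if y_i == 1 else 1.0 - p_i)
    return total


print(prior)                      # [0.75, 0.25, 0.75, 0.25]
print(log_p_frame([1, 0, 1, 0]))
```

The weakness is visible in the label list: two restaurant names can never both be active in one frame, yet an independent model gives that combination nonzero probability.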


(12)

Semantic Frame: $\log \hat{P}(y)$

◉ Masked autoencoder for distribution estimation (MADE)

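[Figure: MADE network with numbered units (2 1 3; 1 2 2 1; 2 1 3); connections are masked according to the unit numbers]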

Introduce a sequential dependency among labels by masking certain connections, which yields the marginal distribution of $y$

Germain, M., Gregor, K., Murray, I., & Larochelle, H., “MADE: Masked autoencoder for distribution estimation,” in Proceedings of International Conference on Machine Learning, 2015.
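The masking idea in MADE can be sketched with plain lists. Each unit gets a number (its degree), and a connection is kept only if it respects the ordering, so each output depends only on inputs that come earlier. The degrees below are read off the numbers in the figure, assuming the outer rows are input/output units and the middle row is the hidden layer; that reading is an assumption, and the weights themselves are omitted.

```python
# Degrees assumed from the figure: inputs/outputs 2 1 3, hidden units 1 2 2 1.
input_degrees = [2, 1, 3]
hidden_degrees = [1, 2, 2, 1]

# Input -> hidden: hidden unit k may see input d only if its degree is >= the input's.
mask_in = [[1 if hk >= di else 0 for di in input_degrees] for hk in hidden_degrees]

# Hidden -> output: output d may see hidden unit k only if its degree is strictly larger.
mask_out = [[1 if do > hk else 0 for hk in hidden_degrees] for do in input_degrees]


def depends_on(out_idx, in_idx):
    """True if at least one unmasked path links input in_idx to output out_idx."""
    return any(
        mask_out[out_idx][k] and mask_in[k][in_idx]
        for k in range(len(hidden_degrees))
    )


# Autoregressive property: output d depends only on inputs with a smaller degree.
for o, d_out in enumerate(input_degrees):
    for i, d_in in enumerate(input_degrees):
        assert depends_on(o, i) == (d_in < d_out)
```

Because every output is conditioned only on earlier labels, the product of the output probabilities is a valid joint distribution over the label vector.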


(13)

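[Figure: model architecture]

NLU: a GRU reads the utterance (“McDonald’s is … station”) and a linear layer outputs the multi-label vector (0 1 . . . 0 1).

NLG: a GRU takes the multi-label vector (0 1 . . . 0 1) and the tokens “<BOS> McDonald’s … station” as input and generates “McDonald’s is … <EOS>”.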

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of The 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

(14)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain

◉ NLU: F1 score; NLG: BLEU, ROUGE

[Bar chart: F1, BLEU, and ROUGE-1 (axis range 50 to 75) for the NLU baseline and the NLG baseline]
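For the NLU metric, an F1 score over predicted slot-value pairs can be computed as below. This is a minimal sketch that assumes micro-averaging over all examples; the slides do not state which averaging is used, and the function name and example labels are hypothetical.

```python
def f1_score(predicted, gold):
    """Micro-averaged F1 over sets of slot-value labels (assumed averaging)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # labels predicted and correct
        fp += len(pred - ref)   # labels predicted but wrong
        fn += len(ref - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"RESTAURANT=McDonald's", "PRICE=cheap", "LOCATION=nearby the station"}]
pred = [{"RESTAURANT=McDonald's", "PRICE=cheap", "FOOD=Pizza"}]
score = f1_score(pred, gold)
print(score)
```

Here two of three predicted labels are correct and two of three gold labels are found, so precision and recall are both 2/3.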


(15)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain

◉ NLU: F1 score; NLG: BLEU, ROUGE

[Bar chart: F1, BLEU, and ROUGE-1 (axis range 50 to 75) comparing the NLU baseline and the NLG baseline with DSL w/o MADE]


(16)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain

◉ NLU: F1 score; NLG: BLEU, ROUGE

[Bar chart: F1, BLEU, and ROUGE-1 (axis range 50 to 75) comparing the NLU baseline and the NLG baseline with DSL w/o MADE and DSL w/ MADE]


(17)