(1)

Robust and Scalable Conversational AI

Computer Science & Information Engineering, National Taiwan University

Yun-Nung (Vivian) Chen

16th Workshop on Spoken Dialogue Systems for PhDs, Postdocs & New Researchers (YRRSDS 2020)

(2)

Language Empowering Intelligent Assistants

Apple Siri (2011)
Google Now (2012)
Microsoft Cortana (2014)
Amazon Alexa/Echo (2014)
Google Home (2016)
Google Assistant (2016)
Apple HomePod (2017)
Facebook Portal (2019)

(3)

Task-Oriented Dialogue Systems (Young, 2000)

Pipeline: Speech Recognition → Language Understanding (LU) → Dialogue Management (DM) → Natural Language Generation (NLG)

Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling

Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy (backed by the Database)

Example flow:
Speech Signal / Text Input: “Are there any action movies to see this weekend?”
ASR Hypothesis: “are there any action movies to see this weekend”
Semantic Frame: request_movie (genre=action, date=this weekend)
System Action / Policy: request_location
Text Response (NLG): “Where are you located?”
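To make the LU output concrete, here is a minimal sketch of the semantic frame above as a data structure (the class and field names are illustrative, not from any particular dialogue toolkit):

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """Structured LU output: an intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)

# The example turn above, expressed as a frame:
frame = SemanticFrame(intent="request_movie",
                      slots={"genre": "action", "date": "this weekend"})
```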

(4)

Recent Advances in NLP

◉ Contextual Embeddings (ELMo & BERT)

○ Pre-trained language models boost performance on many understanding tasks


(6)

ASR error example: “List all flights tomorrow” misrecognized as “Lift all lights to Morocco”

(7)

Task-Oriented Dialogue Systems (Young, 2000)

Pipeline: Speech Recognition → Language Understanding (LU) → Dialogue Management (DM) → Natural Language Generation (NLG)

Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling

Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy (backed by the Database)

Example flow:
Speech Signal / Text Input: “Are there any action movies to see this weekend?”
ASR Hypothesis: “are there any action movies to see this weekend”
Semantic Frame: request_movie (genre=action, date=this weekend)
System Action / Policy: request_location
Text Response (NLG): “Where are you located?”

(8)

Mismatch between Written and Spoken Languages

◉ Goal: ASR-Robust Contextualized Embeddings

○ Learning spoken contextualized word embeddings → better performance on spoken language understanding tasks

Training

• Written language

Testing

• Spoken language

• Include recognition errors


(9)

Solution: LatticeLM

(Huang & Chen, ACL 2020)

Chao-Wei Huang and Yun-Nung Chen, “Learning Spoken Language Representations with Neural Lattice Language Modeling,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

(10)

ASR Lattices for Preserving Uncertainty

◉ Idea: lattices may include correct words

[Lattice figure: “<s> cheapest {airfare (0.4) | fair (0.3) | affair (0.3) → air} to Milwaukee </s>”, where competing arcs preserve the recognizer’s uncertainty]

◉ LatticeRNN helps (Ladhak et al., 2016)

◉ LM pre-training helps

Chao-Wei Huang and Yun-Nung Chen, “Learning Spoken Language Representations with Neural Lattice Language Modeling,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
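A minimal sketch of a word lattice as a weighted DAG may help; the class below is illustrative (real lattices come from the ASR decoder), with arcs loosely following the “cheapest airfare to Milwaukee” example:

```python
from collections import defaultdict

class Lattice:
    """A word lattice: nodes connected by (word, probability) arcs."""
    def __init__(self):
        self.arcs = defaultdict(list)        # src node -> [(dst, word, prob)]

    def add_arc(self, src, dst, word, prob):
        self.arcs[src].append((dst, word, prob))

lat = Lattice()
lat.add_arc(0, 1, "cheapest", 1.0)
# Competing arcs keep the recognizer's uncertainty instead of committing
# to a single 1-best hypothesis:
lat.add_arc(1, 2, "airfare", 0.4)
lat.add_arc(1, 2, "fair", 0.3)
lat.add_arc(1, 3, "affair", 0.3)
lat.add_arc(3, 2, "air", 1.0)
lat.add_arc(2, 4, "to", 1.0)
lat.add_arc(4, 5, "Milwaukee", 1.0)
```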

(11)

Lattice Language Modeling

1) LatticeLSTM encodes nodes of a lattice

2) The goal is to predict the outgoing transitions (words) given a node’s representation

◉ The one-hypothesis lattice reduces to normal language modeling

[Diagram: a LatticeLSTM encodes each lattice node; a linear layer over the node representation predicts the distribution of outgoing words]

Chao-Wei Huang and Yun-Nung Chen, “Learning Spoken Language Representations with Neural Lattice Language Modeling,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

Issue: LatticeLSTM is prohibitively slow to run
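The following sketch shows the core of the lattice-LM objective as described above; it is my simplification (a probability-weighted pooling of predecessor states standing in for the LatticeLSTM recurrence), not the paper’s implementation:

```python
import torch
import torch.nn as nn

def node_state(pred_states: torch.Tensor, arc_probs: torch.Tensor) -> torch.Tensor:
    """Encode a lattice node by pooling predecessor states, weighted by arc probability.
    pred_states: (n_pred, hidden); arc_probs: (n_pred,)"""
    w = arc_probs / arc_probs.sum()
    return (w.unsqueeze(-1) * pred_states).sum(dim=0)

hidden, vocab = 128, 10000
next_word = nn.Linear(hidden, vocab)     # predicts the node's outgoing words

states = torch.randn(2, hidden)          # states of two incoming arcs
logits = next_word(node_state(states, torch.tensor([0.8, 0.2])))
# Train by maximizing the likelihood of the observed outgoing transition:
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([42]))
```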

(12)

Efficient Two-Stage Pre-Training

Stage 1: Pre-training on sequential texts — a standard LSTM language model reads running text (e.g., “What a day”) and predicts the next words (“a day <EOS>”).

Stage 2: Pre-training on lattices — the weights initialize a LatticeLSTM, which continues language-model pre-training on ASR lattices.

Fine-tuning: the LatticeLSTM encodes the lattice, node representations are max-pooled, and a classifier is trained for the downstream task.
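Stage 1 and stage 2 can share one model because, as noted earlier, a one-hypothesis lattice reduces to normal language modeling; a plain sentence is just a linear chain of probability-1 arcs. A tiny sketch of that reduction (the helper name is mine):

```python
def as_linear_chain(tokens):
    """View a token list as lattice arcs (src, dst, word, prob)."""
    return [(i, i + 1, w, 1.0) for i, w in enumerate(tokens)]

print(as_linear_chain(["What", "a", "day"]))
# [(0, 1, 'What', 1.0), (1, 2, 'a', 1.0), (2, 3, 'day', 1.0)]
```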

(13)

Spoken Language Understanding Results

◉ Intent Prediction

○ Word Error Rate: 45.6% (SNIPS); 15.6% (ATIS)

[Bar chart: intent accuracy (80–100) on ATIS and SNIPS for 1-Best and 1-Best + LatticeLSTM]

(14)

Spoken Language Understanding Results

◉ Intent Prediction

○ Word Error Rate: 45.6% (SNIPS); 15.6% (ATIS)

[Bar chart: intent accuracy (80–100) on ATIS and SNIPS, adding LatticeLM alongside 1-Best and 1-Best + LatticeLSTM]


(16)

Spoken Language Understanding Results

◉ Dialogue Act Prediction

○ Word Error Rate: 32.0% (MRDA); 28.4% (SWDA)

[Bar chart: dialogue act accuracy (50–75) on SWDA and MRDA for 1-Best, 1-Best + LatticeLSTM, and LatticeLM]

(17)

What if we do not have ASR lattices?


(18)

Solution:

Learning ASR-Robust Embeddings

(Huang & Chen, ICASSP 2020)

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

(19)

ASR-Robust Contextualized Embeddings

◉ Confusion-Aware Fine-Tuning

○ Supervised

○ Unsupervised

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
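A minimal sketch of what a confusion-aware objective can look like, under my assumptions (function and weight names are hypothetical): pull the contextualized embedding of a misrecognized word toward the embedding of the word it was confused with. In the supervised case the aligned pairs come from reference transcripts; without transcripts, confusion pairs must be derived unsupervisedly, e.g. from acoustic similarity.

```python
import torch
import torch.nn.functional as F

def confusion_loss(asr_emb: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
    """asr_emb, ref_emb: (batch, hidden) embeddings of aligned confusion pairs."""
    return (1.0 - F.cosine_similarity(asr_emb, ref_emb, dim=-1)).mean()

# During fine-tuning, combined with the LM objective (lambda_conf is a
# hypothetical weighting hyperparameter):
# total_loss = lm_loss + lambda_conf * confusion_loss(e_asr, e_ref)
```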

(20)

Spoken Language Understanding Results

◉ Airline Travel Information System (ATIS)

○ Word Error Rate: 16.4%

[Bar chart: ATIS intent accuracy and slot F1 (90–99) for the baseline embeddings]

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

(21)

Spoken Language Understanding Results

◉ Airline Travel Information System (ATIS)

○ Word Error Rate: 16.4%

[Bar chart: ATIS intent accuracy and slot F1 (90–99), adding “+ LM fine-tuning”]

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

(22)

Spoken Language Understanding Results

◉ Airline Travel Information System (ATIS)

○ Word Error Rate: 16.4%

[Bar chart: ATIS intent accuracy and slot F1 (90–99), adding “+ LM + Confusion (Supervised)”]

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

(23)

Spoken Language Understanding Results

◉ Airline Travel Information System (ATIS)

○ Word Error Rate: 16.4%

[Bar chart: ATIS intent accuracy and slot F1 (90–99), adding “+ LM + Confusion (Unsupervised)”]

Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

(24)

Task-Oriented Dialogue Systems (Young, 2000)

Pipeline: Speech Recognition → Language Understanding (LU) → Dialogue Management (DM) → Natural Language Generation (NLG)

Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling

Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy (backed by the Database)

Example flow:
Speech Signal / Text Input: “Are there any action movies to see this weekend?”
ASR Hypothesis: “are there any action movies to see this weekend”
Semantic Frame: request_movie (genre=action, date=this weekend)
System Action / Policy: request_location
Text Response (NLG): “Where are you located?”

(25)

Natural Language Understanding (NLU)

Parse natural language into structured semantics.

Natural Language: “McDonald’s is a cheap restaurant nearby the station.”

→ NLU →

Semantic Frame: RESTAURANT=“McDonald’s”, PRICE=“cheap”, LOCATION=“nearby the station”

(26)

Natural Language Generation (NLG)

Construct natural language based on structured semantics.

Semantic Frame: RESTAURANT=“McDonald’s”, PRICE=“cheap”, LOCATION=“nearby the station”

→ NLG →

Natural Language: “McDonald’s is a cheap restaurant nearby the station.”

(27)

Duality between NLU and NLG

Natural Language ($X$): “McDonald’s is a cheap restaurant nearby the station.”

⇄ NLG: $Y \to X$; NLU: $X \to Y$

Semantic Frame ($Y$): RESTAURANT=“McDonald’s”, PRICE=“cheap”, LOCATION=“nearby the station”

How can we leverage this dual relationship?

(28)

Solution:

Dual Supervised Learning for NLU & NLG

(Su et al., ACL 2019)

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

(29)

DSL: Dual Supervised Learning (Xia et al., 2017)

◉ Proposed for machine translation

◉ Consider two domains $X$ and $Y$, and two tasks $X \to Y$ and $Y \to X$ with model parameters $\theta_{x \to y}$ and $\theta_{y \to x}$

We have $P(x, y) = P(x \mid y)P(y) = P(y \mid x)P(x)$

Ideally, $P(x \mid y; \theta_{y \to x})P(y) = P(y \mid x; \theta_{x \to y})P(x)$

Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., and Liu, T.-Y., “Dual Supervised Learning,” in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.

(30)

Dual Supervised Learning

◉ Exploit the duality by forcing the models to satisfy the probabilistic constraint $P(x \mid y; \theta_{y \to x})P(y) = P(y \mid x; \theta_{x \to y})P(x)$

Objective functions:

$\min_{\theta_{x \to y}} \mathbb{E}[\ell_1(f(x; \theta_{x \to y}), y)] + \lambda_{x \to y}\,\ell_{\text{duality}}$

$\min_{\theta_{y \to x}} \mathbb{E}[\ell_2(g(y; \theta_{y \to x}), x)] + \lambda_{y \to x}\,\ell_{\text{duality}}$

How do we model the marginal distributions of $X$ and $Y$?

Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., and Liu, T.-Y., “Dual Supervised Learning,” in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
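In Xia et al. (2017) the duality term penalizes the squared gap between the two factorizations of $\log P(x, y)$; a minimal sketch:

```python
import torch

def duality_loss(log_px, log_p_y_given_x, log_py, log_p_x_given_y):
    """Squared gap between log P(x) + log P(y|x) and log P(y) + log P(x|y)."""
    return (log_px + log_p_y_given_x - log_py - log_p_x_given_y) ** 2

# Toy log-probabilities: the two factorizations disagree by 0.5, so the
# penalty is 0.25.
l = duality_loss(torch.tensor(-5.0), torch.tensor(-2.0),
                 torch.tensor(-4.0), torch.tensor(-3.5))
```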

(31)

Dual Supervised Learning

◉ Back to NLU and NLG: natural language is domain $X$, semantic frames are domain $Y$

Natural Language ($X$): “McDonald’s is a cheap restaurant nearby the station.”

Semantic Frame ($Y$): RESTAURANT=“McDonald’s”, PRICE=“cheap”, LOCATION=“nearby the station”

NLG maps $Y \to X$ and NLU maps $X \to Y$; the duality constraint additionally requires the marginal estimates $\log \hat{P}(x)$ and $\log \hat{P}(y)$.

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

(32)

Natural Language: $\log \hat{P}(x)$

◉ Language modeling with a GRU: $\hat{P}(x) = \prod_d P(x_d \mid x_1, \ldots, x_{d-1})$

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
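A sketch of scoring $\log \hat{P}(x)$ with a GRU language model (layer sizes are placeholders, not the paper’s configuration):

```python
import torch
import torch.nn as nn

class GRULM(nn.Module):
    def __init__(self, vocab=10000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def log_prob(self, ids):                      # ids: (1, T) token ids
        h, _ = self.gru(self.emb(ids[:, :-1]))    # h_d summarizes x_1 .. x_d
        logp = torch.log_softmax(self.out(h), dim=-1)
        # sum_d log P(x_d | x_1, ..., x_{d-1})
        return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).sum()

lm = GRULM()
print(lm.log_prob(torch.tensor([[1, 7, 42, 2]])))
```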

(33)

Semantic Frame: $\log \hat{P}(y)$

◉ We treat NLU as a multi-label classification problem

◉ Each label is a slot-value pair, so a frame is a binary label vector, e.g.

RESTAURANT=“McDonald’s”, PRICE=“cheap”, LOCATION=“nearby the station” → [0, 1, …, 0, 1]

How do we model the marginal distribution of $y$?

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

(34)

Semantic Frame: $\log \hat{P}(y)$

◉ Naïve approach

○ Estimate each label’s prior probability $\hat{P}(y_i)$ on the training set

○ $\hat{P}(y) = \prod_i \hat{P}(y_i)$

○ Assumption: labels are independent. Example label space:

Restaurant: “McDonald’s” / “KFC” / “PizzaHut”

Price: “cheap” / “expensive”

Food: “Pizza” / “Hamburger” / “Chinese”

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
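The naive estimate in code, with made-up priors for illustration; summing per-label log-priors is exactly the independence assumption:

```python
import math

label_prior = {          # hypothetical priors counted on a training set
    'RESTAURANT="McDonald\'s"': 0.05,
    'PRICE="cheap"': 0.30,
    'LOCATION="nearby the station"': 0.10,
}

def log_p_naive(labels):
    """log P(y) = sum_i log P(y_i), assuming independent labels."""
    return sum(math.log(label_prior[l]) for l in labels)

print(log_p_naive(['RESTAURANT="McDonald\'s"', 'PRICE="cheap"']))
```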

(35)

Semantic Frame: $\log \hat{P}(y)$

◉ Masked Autoencoder for Distribution Estimation (MADE)

[Diagram: an autoencoder whose units carry degree numbers; masks remove connections so each output depends only on earlier-numbered inputs]

○ Introduces sequential dependency among labels by masking certain connections → models the marginal distribution of $y$ autoregressively

Germain, M., Gregor, K., Murray, I., and Larochelle, H., “MADE: Masked Autoencoder for Distribution Estimation,” in Proceedings of the International Conference on Machine Learning (ICML), 2015.
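A compact MADE-style sketch: degree-based masks enforce an autoregressive ordering over $D$ binary labels, so the network models $P(y) = \prod_d P(y_d \mid y_{<d})$ instead of assuming independence (the sizes and single hidden layer are my simplifications):

```python
import torch
import torch.nn as nn

D, H = 4, 8
m_in = torch.arange(1, D + 1)                      # input/output degrees 1..D
m_hid = torch.randint(1, D, (H,))                  # hidden degrees 1..D-1
mask1 = (m_hid[:, None] >= m_in[None, :]).float()  # hidden k sees inputs i <= m(k)
mask2 = (m_in[:, None] > m_hid[None, :]).float()   # output d sees hidden k with m(k) < d

W1 = nn.Parameter(torch.randn(H, D) * 0.1); b1 = nn.Parameter(torch.zeros(H))
W2 = nn.Parameter(torch.randn(D, H) * 0.1); b2 = nn.Parameter(torch.zeros(D))

def made_logits(y):                                # y: (batch, D) binary labels
    h = torch.relu(y @ (W1 * mask1).t() + b1)
    return h @ (W2 * mask2).t() + b2               # logit_d depends only on y_<d

logits = made_logits(torch.tensor([[1., 0., 1., 0.]]))
```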

(36)

[Diagram: the NLU model encodes “McDonald’s is … station” with a GRU, and a linear layer emits the binary label vector; the NLG model conditions a GRU decoder on the label vector to generate “McDonald’s is … <EOS>”]

Shang-Yu Su, Chao-Wei Huang, and Yun-Nung Chen, “Dual Supervised Learning for Natural Language Understanding and Generation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
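A sketch of the NLU side as drawn on the slide: a GRU reads the utterance and a linear layer scores every slot-value label; since this is multi-label classification, outputs go through a sigmoid rather than a softmax (sizes are placeholders):

```python
import torch
import torch.nn as nn

class NLU(nn.Module):
    def __init__(self, vocab=10000, dim=256, n_labels=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.cls = nn.Linear(dim, n_labels)

    def forward(self, ids):                    # ids: (batch, T) token ids
        _, h = self.gru(self.emb(ids))         # h: (1, batch, dim) final state
        return torch.sigmoid(self.cls(h[-1]))  # per-label probability

probs = NLU()(torch.randint(0, 10000, (1, 9)))
```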

(37)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain

◉ NLU: F-1 score; NLG: BLEU, ROUGE

[Bar chart: F1 / BLEU / ROUGE-1 scores (50–75) for the NLU and NLG baselines]

(38)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain

◉ NLU: F-1 score; NLG: BLEU, ROUGE

[Bar chart: F1 / BLEU / ROUGE-1 scores (50–75), adding DSL w/o MADE alongside the NLU and NLG baselines]

(39)

NLU/NLG Results

◉ E2E NLG data: 50k examples in the restaurant domain

◉ NLU: F-1 score; NLG: BLEU, ROUGE

[Bar chart: F1 / BLEU / ROUGE-1 scores (50–75), adding DSL w/ MADE alongside DSL w/o MADE and the baselines]

(40)

Summary

Robustness: spoken language embeddings are needed for better conversational AI

○ Embeddings are typically pre-trained on written text

○ A mismatch arises when they are applied to spoken language

1) LatticeLM preserves ASR uncertainty

2) Confusion-aware fine-tuning adapts contextualized embeddings to be robust to misrecognition

Scalability: leveraging the duality between NLU and NLG

○ Dual supervised learning exploits this duality

○ Modeling the data distribution properly (e.g., with MADE) is important

○ Better performance, with the flexibility to plug in diverse NLU/NLG models

(41)

◉ Yun-Nung (Vivian) Chen

◉ Assistant Professor, National Taiwan University

◉ y.v.chen@ieee.org / http://vivianchen.idv.tw

