Can Current Conversational Assistants Satisfy Users?

Academic year: 2022

Yun-Nung Vivian Chen

http://vivianchen.idv.tw

(2)

Iron Man (2008)

(3)

NTU MIULAB

Language Empowering Intelligent Assistants

• Apple Siri (2011)

• Google Now (2012)

• Microsoft Cortana (2014)

• Amazon Alexa/Echo (2014)

• Facebook M & Bot (2015)

• Google Home (2016)

• Google Assistant (2016)

• Apple HomePod (2017)

(4)

Task-Oriented Dialogue Systems (Young, 2000)

Speech Signal → Speech Recognition → Hypothesis: are there any action movies to see this weekend

Text Input: Are there any action movies to see this weekend?

Language Understanding (LU)

• Domain Identification

• User Intent Detection

• Slot Filling

→ Semantic Frame: request_movie (genre=action, date=this weekend)

Dialogue Management (DM), backed by a Database

• Dialogue State Tracking (DST)

• Dialogue Policy

→ System Action/Policy: request_location

Natural Language Generation (NLG)

→ Text response: Where are you located?
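The pipeline on this slide can be illustrated with a toy, rule-based sketch. The slot inventory, the response templates, and all function names here are hypothetical and chosen only to reproduce the movie example; a real system would typically use learned components at each stage.

```python
from dataclasses import dataclass, field

# Hypothetical slot inventory for the movie example (not from the talk).
REQUIRED_SLOTS = ("genre", "date", "location")

# Hypothetical templates mapping a system action to a text response.
TEMPLATES = {
    "request_genre": "What genre would you like?",
    "request_date": "When would you like to go?",
    "request_location": "Where are you located?",
    "inform_movies": "Here are the movies I found.",
}


@dataclass
class SemanticFrame:
    """Output of language understanding: an intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)


def update_state(state, frame):
    """Dialogue state tracking: merge newly understood slots into the state."""
    new_state = dict(state)
    new_state.update(frame.slots)
    return new_state


def choose_action(state):
    """Dialogue policy: request the first missing slot, otherwise answer."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return f"request_{slot}"
    return "inform_movies"


def generate(action):
    """Template-based natural language generation."""
    return TEMPLATES[action]


# The semantic frame shown on the slide.
frame = SemanticFrame("request_movie", {"genre": "action", "date": "this weekend"})
state = update_state({}, frame)
action = choose_action(state)
response = generate(action)
print(action, "->", response)
```

With the frame from the slide, the only missing slot in this toy inventory is the location, so the sketch selects `request_location` and produces the slide's text response.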

(5)

Recent Advances in NLP

• Contextual Embeddings (ELMo & BERT)

• Pre-training on natural language boosts performance on many understanding tasks

(9)

Mismatch between Written and Spoken Languages

Training

• Written language

Testing

• Spoken language

• Includes recognition errors

• Goal: ASR-Robust Contextualized Embeddings

✓ learning contextualized word embeddings specifically for spoken language

✓ achieving better performance on spoken language understanding tasks

‒ shows better results on ASR transcripts

‒ maintains similar results on manual transcripts

(10)

Solution 1:

Adapting Transformer to ASR Lattices

(11)

BERT/GPT Pre-Training & Fine-Tuning

• Pre-Training

[Figure: a Transformer Encoder reads w_1, w_2, ..., w_{m-1}, w_m; a Linear layer on top outputs w_2, w_3, ..., w_m, w_{m+1}]

• Fine-Tuning

[Figure: a Transformer Encoder reads <S>, w_1, w_2, ..., w_{m-1}, w_m, <E>; a Linear layer on top outputs y]

(12)

ASR Lattices

• Idea: lattices may include correct words

• Goal: feed lattices into Transformer

1) Linearize

2) Binary mask

3) Probabilistic mask

[Figure: ASR lattice over the tokens <s>, cheapest, airfare, fair, affair, air, to, Milwaukee, </s>, with arc probabilities 1, 0.4, 0.3, 0.3, 1, 1, 1, 1, 1, 1; the lattice tokens are fed, between <S> and <E>, into a Transformer Encoder followed by a Linear layer that outputs y]

(13)

Self-Attention (Vaswani+, 2017)

[Figure: self-attention computation. Inputs are projected into Query, Key, and Value (MatMulQ, MatMulK, MatMulV); Query-Key dot products pass through a softmax; the resulting weights multiply the Values, which are summed and passed to an FFNN]

Vaswani et al., “Attention Is All You Need”, in NIPS, 2017.
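As a reminder of what the diagram computes, here is a minimal pure-Python sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, as defined by Vaswani et al. The learned projection matrices are omitted, so `Q`, `K`, and `V` are assumed to be already projected; the function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for lists of query/key/value vectors.

    Returns (outputs, weights), where weights[i][j] is how much query i
    attends to key j.
    """
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    dim_v = len(V[0])
    outputs = [
        [sum(w * v[c] for w, v in zip(row, V)) for c in range(dim_v)]
        for row in weights
    ]
    return outputs, weights


# Two identical keys receive equal weight, so the output is the mean of the values.
outputs, weights = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [1.0, 0.0]], V=[[2.0], [4.0]])
print(weights, outputs)
```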

(14)

Attention Masks

• Binary masks

• Probabilistic masks

[Figure: ASR lattice over the tokens <s>, cheapest, airfare, fair, affair, air, to, Milwaukee, </s>, with arc probabilities 1, 0.4, 0.3, 0.3, 1, 1, 1, 1, 1, 1]
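The slides name the two mask types without defining them, so the following is only one plausible reading, not the author's formulation. It assumes a simplified version of the example lattice (three competing words with probabilities 0.4, 0.3, 0.3, and "air" left out), treats tokens as nodes listed in topological order, and defines the probabilistic mask entry (i, j) as the probability that a path through token i also passes through token j. The binary mask then marks every pair with non-zero probability.

```python
# Linearized lattice: tokens in topological order (simplified, hypothetical).
TOKENS = ["<s>", "cheapest", "airfare", "fair", "affair", "to", "Milwaukee", "</s>"]

# Transition probabilities between token indices (assumed values).
EDGES = {
    (0, 1): 1.0,
    (1, 2): 0.4, (1, 3): 0.3, (1, 4): 0.3,
    (2, 5): 1.0, (3, 5): 1.0, (4, 5): 1.0,
    (5, 6): 1.0, (6, 7): 1.0,
}


def forward_reach(n, edges):
    """reach[i][j] = probability of reaching token j when starting at token i."""
    reach = [[0.0] * n for _ in range(n)]
    for i in range(n):
        reach[i][i] = 1.0
        for j in range(i + 1, n):
            reach[i][j] = sum(
                reach[i][src] * p for (src, dst), p in edges.items() if dst == j
            )
    return reach


def probabilistic_mask(n, edges):
    """mask[i][j] = P(token j is on the path | token i is on the path)."""
    reach = forward_reach(n, edges)
    mask = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j >= i:
                mask[i][j] = reach[i][j]
            elif reach[0][i] > 0.0:
                # Bayes' rule, with reach[0][x] as the marginal probability of x.
                mask[i][j] = reach[0][j] * reach[j][i] / reach[0][i]
    return mask


def binary_mask(n, edges):
    """mask[i][j] = 1 when tokens i and j can occur on the same lattice path."""
    prob = probabilistic_mask(n, edges)
    return [[1 if p > 0.0 else 0 for p in row] for row in prob]


pmask = probabilistic_mask(len(TOKENS), EDGES)
bmask = binary_mask(len(TOKENS), EDGES)
for token, row in zip(TOKENS, bmask):
    print(f"{token:>10}", row)
```

In this sketch the competing words "airfare", "fair", and "affair" never attend to each other, because no single path contains two of them, while tokens shared by all paths stay fully visible.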

(15)

Spoken Language Understanding Results

• Airline Travel Information System (ATIS)

• Word Error Rate: 15.5%

[Figure: bar chart of Intent and Slot scores (axis 86 to 98); bars are added one system at a time for 1-Best, Lattice-Linear, Lattice-Binary, and Lattice-Prob]

(19)

Spoken Language Understanding Results

• Airline Travel Information System (ATIS)

• Word Error Rate: 26.3%

[Figure: bar chart of Intent and Slot scores (axis 86 to 98) for 1-Best, Lattice-Linear, Lattice-Binary, and Lattice-Prob]
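The word error rates quoted on the results slides are conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the ASR hypothesis and the reference, divided by the number of reference words. A minimal sketch of that standard definition follows; the function name and the example sentences are illustrative and unrelated to the reported figures.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("cheapest airfare to milwaukee", "cheapest affair to milwaukee"))
```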

(20)

What if we do not have ASR lattices?

(21)

Solution 2:

Learning ASR-Robust Embeddings

(22)

ASR-Robust Contextualized Embeddings

• Confusion-Aware Fine-Tuning

• Supervised

• Unsupervised
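The slide only names the supervised and unsupervised variants. Assuming the supervised variant has access to paired manual and ASR transcripts, one way to collect acoustically confusable word pairs is to align the two transcripts and keep the substituted words. The sketch below uses Python's `difflib` for the alignment; the alignment method and the function name are illustrative assumptions, not the author's procedure.

```python
from difflib import SequenceMatcher


def confusion_pairs(reference, hypothesis):
    """Return (reference word, ASR word) pairs where the transcripts disagree.

    Only one-to-one substitutions are kept; insertions, deletions, and
    unequal-length replacements are ignored in this simplified sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(ref[i1:i2], hyp[j1:j2]))
    return pairs


pairs = confusion_pairs(
    "cheapest airfare to milwaukee",
    "cheapest affair to milwaukee",
)
print(pairs)
```

Pairs gathered this way could then serve as the confusable words whose embeddings a confusion-aware fine-tuning objective tries to bring closer together.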

(23)

Spoken Language Understanding Results

• Airline Travel Information System (ATIS)

• Word Error Rate: 16.4%

[Figure: bar chart of Intent and Slot scores (axis 90 to 99); starting from a baseline, bars are added one system at a time for + LM fine-tuning, + LM + Confusion (Supervised), and + LM + Confusion (Unsupervised)]

(30)

Conversational AI for Unstructured Knowledge

• A machine reads big text data

• serves as a teacher

• A user can ask questions

• serves as a student

• in a conversational manner

→ Conversational QA

(31)

FlowDelta: Information Gain in Dialogue Flow

• Idea: model the difference of hidden states in multi-turn dialogues

[Figure: FlowDelta, Modeling Flow Information Gain. Conversation flow over the context (positions 1, 2, ..., j) across time (question turns Q1, Q2, Q3, ...); at each position j, the difference Δ between the entries indexed t−1,j and t,j is shown alongside c_{t,j}]
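The stated idea, taking the difference of hidden states across question turns, can be sketched in a few lines. The nested-list layout `hidden[t][j]` (turn t, context position j) and the zero delta for the first turn are assumptions made for illustration; how the deltas enter the full model is not shown here.

```python
def flow_delta(hidden):
    """Per-position change of hidden states between consecutive question turns.

    hidden[t][j] is the hidden vector at question turn t and context position j.
    The first turn has no predecessor, so its delta is set to zeros (assumption).
    """
    deltas = []
    for t, turn in enumerate(hidden):
        row = []
        for j, vec in enumerate(turn):
            if t == 0:
                row.append([0.0] * len(vec))
            else:
                prev = hidden[t - 1][j]
                row.append([a - b for a, b in zip(vec, prev)])
        deltas.append(row)
    return deltas


# Two turns, two context positions, two-dimensional hidden states.
hidden = [
    [[1.0, 2.0], [0.0, 0.0]],
    [[1.5, 1.0], [2.0, -1.0]],
]
deltas = flow_delta(hidden)
print(deltas[1])
```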

(32)

FlowDelta: Information Gain in Dialogue Flow

• Idea: model the difference of hidden states in multi-turn dialogues

[Figure: two architectures. FlowQA: the i-th question and the context pass through Encoding and Dialogue Reasoning to produce the i-th answer. BERT: the i-th question and the context pass through BERT layers l_1, ..., l_{k-1}, l_k and Dialogue Reasoning to produce the i-th answer]

(33)

Conversational QA Results

• Data: QuAC, CoQA

[Figure: bar chart of scores on CoQA and QuAC (axis 60 to 80); bars are added one system at a time, with legend entries FlowQA, + FlowDelta, BERT, + Flow, + FlowDelta]

(36)

QuAC Leaderboard

(37)

Summary

• Spoken language embeddings are needed for better conversational AI

• Written texts are enough for pre-training embeddings

• A mismatch arises when applying them to spoken language

1) Adapting Transformer to ASR lattices

2) Learning contextualized embeddings robust to misrecognition

• Conversational QA enables unstructured information access

• FlowDelta: information gain in dialogue flow guides better understanding

(38)

Yun-Nung (Vivian) Chen

Assistant Professor, National Taiwan University

y.v.chen@ieee.org / http://vivianchen.idv.tw
