Wei-Bin Liang, Chung-Hsien Wu, and Chia-Ping Chen*

Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

*Dept. of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan

Introduction

• Developing a spoken dialogue system requires attention to several critical issues:
• An imperfect Automatic Speech Recognition (ASR) module leads to task failures
  - A Partial Sentence Tree keeps all keywords and some non-keywords
  - A z-score test replaces unreliable recognized words with Filler
• Semantic features are used by the Spoken Dialogue Understanding module
  - Derivation Rules are represented as a vector
• The Dialogue Management module requires functionality that includes capturing the user's intentions and interactional patterns
  - A data-driven approach decides the Dialogue Act (DA) types
  - The Dialogue History is used for dialogue control

Framework

Models for Dialogue Act Detection

• At dialogue turn t, given the user's utterance U_t and the dialogue history H_t, the most likely DA A_t^* is selected by maximizing P(A | U_t, H_t) over all DA types A ∈ Ω.

Data Collection

[Figure: data collection setup. User: Speak, Listen. Operator: Listen, Select/Type. Operator output is delivered through TTS.]

Role  DA type        Utterance
O     Greeting       Welcome
U     Query_Service  What service can you provide?
O     Ans_Service    I can provide the information of historic spot and timetable of railway.
U     Query_Intro    Can you give an introduction to Anping-Fort?
O     Ans_Intro      Anping-Fort is also known as Fort Zeelandia. It was first built by the Dutch in 1624 as ...

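DA detection criterion:

$$
\begin{aligned}
A_t^{*} &= \arg\max_{A \in \Omega} P(A \mid U_t, H_t) \\
&= \arg\max_{A \in \Omega} \sum_{W} P(A, W \mid U_t, H_t) \\
&\approx \arg\max_{A \in \Omega} \max_{W} P(A, W \mid U_t, H_t) \\
&= \arg\max_{A \in \Omega} \max_{W} P(W \mid U_t, H_t)\, P(A \mid W, U_t, H_t) \\
&\approx \arg\max_{A \in \Omega} \max_{W} P(W \mid U_t)\, P(A \mid W, H_t) \\
&\propto \arg\max_{A \in \Omega} \max_{W} f(W, U_t)\, g(A, W)\, h(A, H_t)
\end{aligned}
$$

• The sum runs over all possible word sequences W; the max keeps the best-1 ASR output.
• The simplification of the conditioning terms assumes U_t ⊥ H_t.
• f(W, U_t) is the ASR score, g(A, W) is the lexical score, and h(A, H_t) is the history score.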
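The final line of the criterion, a product of an ASR score, a lexical score, and a history score maximized over DA types, can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function name `detect_da`, the dictionary inputs, and the numeric values are all hypothetical, and the scores are combined in the log domain purely for numerical convenience.

```python
import math

def detect_da(asr_score, lexical_scores, history_scores):
    """Return the DA type A that maximizes f(W, U_t) * g(A, W) * h(A, H_t).

    asr_score      -- f(W, U_t) for the best-1 ASR hypothesis W (assumed > 0)
    lexical_scores -- dict mapping DA type -> g(A, W)
    history_scores -- dict mapping DA type -> h(A, H_t)
    """
    best_da, best_val = None, -math.inf
    for da, g in lexical_scores.items():
        h = history_scores.get(da, 0.0)
        if asr_score <= 0.0 or g <= 0.0 or h <= 0.0:
            continue  # a zero score rules this DA out
        # With a single best-1 hypothesis, log f is the same for every DA,
        # so it shifts all candidates equally and does not change the argmax.
        val = math.log(asr_score) + math.log(g) + math.log(h)
        if val > best_val:
            best_da, best_val = da, val
    return best_da

# Hypothetical scores for one user turn.
lexical = {"Query_Intro": 0.6, "Query_Service": 0.3}
history = {"Query_Intro": 0.5, "Query_Service": 0.2}
print(detect_da(0.9, lexical, history))
```

Because only the best-1 ASR output is kept, the decision in this sketch is driven by the lexical and history scores alone.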

History Score

• Based on a Markov model assumption for the chain of DAs:

$$ h(A, H_t) = P(A_t = A \mid A_{t-1}) $$
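Under the Markov assumption, the history score reduces to a DA bigram probability. One way to estimate it is by relative frequency over annotated DA sequences, as in the sketch below. The function names are hypothetical, the absence of smoothing is an assumption (the source does not say how the probabilities are estimated), and the second training sequence is invented for illustration.

```python
from collections import Counter, defaultdict

def train_history_model(dialogues):
    """Estimate P(A_t = cur | A_{t-1} = prev) by relative frequency.

    dialogues -- list of DA sequences, one list of DA labels per dialogue
    """
    pair_counts = defaultdict(Counter)
    for das in dialogues:
        for prev, cur in zip(das, das[1:]):
            pair_counts[prev][cur] += 1
    model = {}
    for prev, counts in pair_counts.items():
        total = sum(counts.values())
        model[prev] = {cur: n / total for cur, n in counts.items()}
    return model

def history_score(model, da, prev_da):
    """h(A, H_t): probability of `da` given the previous DA (0.0 if unseen)."""
    return model.get(prev_da, {}).get(da, 0.0)

# The first sequence follows the example dialogue; the second is made up.
dialogues = [
    ["Greeting", "Query_Service", "Ans_Service", "Query_Intro", "Ans_Intro"],
    ["Greeting", "Query_Intro", "Ans_Intro"],
]
model = train_history_model(dialogues)
print(history_score(model, "Query_Service", "Greeting"))
```

Unseen transitions score zero here; a real system would likely smooth these estimates.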

DR-DA Matrix Training

• Partial Sentence Tree construction
• Derivation Rules (DRs) extraction
• Data-driven approach to decide the DA types
  - Based on the concept of the spectral clustering algorithm
• DR-DA matrix construction
  - All utterances and transcriptions are used to build this matrix
  - Entropy-based normalization

[Figure: spectral clustering of the N transcriptions of users' turns. An N × N similarity matrix is built, and its first Q eigenvalues are used for the size of the DA set.]

Each partial sentence σ is used to extract the derivation rules, represented as the vector b_σ.
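The DR-DA matrix construction with entropy-based normalization might look like the sketch below. The source does not give the normalization formula, so the weighting used here (one minus the normalized entropy of each DR's distribution over DA types, a scheme common in latent semantic analysis) is an assumption, as are the function names and the toy data.

```python
import math

def build_dr_da_matrix(samples, num_drs, num_das):
    """Count how often each derivation rule occurs with each DA type.

    samples -- list of (dr_indices, da_index) pairs, one per utterance
    Returns a num_drs x num_das matrix of counts (rows: DRs, columns: DAs).
    """
    counts = [[0.0] * num_das for _ in range(num_drs)]
    for dr_ids, da in samples:
        for i in set(dr_ids):
            counts[i][da] += 1.0
    return counts

def entropy_normalize(counts):
    """Weight each DR row by 1 - H(row) / log(num_das) (assumed scheme).

    A rule spread evenly over all DA types gets weight 0; a rule that
    occurs with a single DA type gets weight 1.
    """
    num_das = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        if total == 0 or num_das < 2:
            weighted.append(list(row))
            continue
        entropy = -sum((c / total) * math.log(c / total) for c in row if c > 0)
        weight = 1.0 - entropy / math.log(num_das)
        weighted.append([weight * c / total for c in row])
    return weighted

# Toy data: 2 derivation rules, 2 DA types.
samples = [([0, 1], 0), ([0], 0), ([1], 1)]
matrix = entropy_normalize(build_dr_da_matrix(samples, num_drs=2, num_das=2))
print(matrix)
```

In this toy case the first rule occurs only with DA 0 and keeps full weight, while the second rule is split evenly between the two DA types and is weighted down to zero.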

ASR Score

• HTK-based Mandarin speech recognizer
  - 297 lexical words
  - 39-dimensional MFCC features
  - 86.1% accuracy
• z-score to detect the unreliable recognized words

ASR output: Where ether Anping Fort
▼
Detection and substitution: Where Filler Anping Fort

Corpus

144 dialogues

Data        Utterances  Token types
Q-Data (U)  1,586       297
A-Data (O)  1,603       317

$$ z(w) = \frac{f(w) - \mu(w)}{\sigma(w)} \;\overset{\text{Filler}}{<}\; -2 $$

A recognized word w is replaced with Filler when z(w) < −2.
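The z-score test can be sketched as follows. The sketch assumes that f(w) is a per-word recognition score and that μ(w) and σ(w) are the mean and standard deviation of that score for word w; the poster does not define these quantities, and the function name and all numeric values below are hypothetical.

```python
def replace_unreliable(words, scores, stats, threshold=-2.0, filler="Filler"):
    """Replace words whose z-score falls below the threshold with a filler.

    words  -- recognized word sequence
    scores -- f(w) for each recognized word (assumed per-word ASR score)
    stats  -- dict mapping word -> (mu, sigma) for that word's score
    """
    out = []
    for w, f in zip(words, scores):
        mu, sigma = stats[w]
        if sigma == 0:
            out.append(w)  # no spread information: keep the word
            continue
        z = (f - mu) / sigma
        out.append(filler if z < threshold else w)
    return out

# Hypothetical scores for the ASR output "Where ether Anping Fort".
words = ["Where", "ether", "Anping", "Fort"]
scores = [-50.0, -80.0, -45.0, -30.0]
stats = {
    "Where": (-52.0, 4.0),
    "ether": (-60.0, 5.0),
    "Anping": (-44.0, 3.0),
    "Fort": (-31.0, 2.0),
}
print(replace_unreliable(words, scores, stats))
```

With these made-up numbers only "ether" scores far below its own mean (z = −4), so it is the only word replaced.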

Lexical Score

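• Broken into two terms, the DR score g_R and the named entity score g_N:

$$ g(A, W) \approx g_R(A, W)\, g_N(A, W) $$

• DR score g_R: the maximum over the partial sentences σ ∈ Γ(W), computed from the DR vector b_σ and the vector v for DA A
• Named entity score g_N: a product (∏) of keyword weights α for DA A

Named entity keywords (KW):
  Greet: Welcome, Hello
  Spot: Anping-Fort
  Time: Morning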
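The DR score term can be illustrated with a short sketch that compares each partial sentence's DR vector against a DA's vector and keeps the best match. The use of cosine similarity is an assumption made for this example, since the exact similarity measure cannot be recovered from the poster text; the function names and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dr_score(da_vector, partial_sentence_vectors):
    """Maximum similarity between a DA vector and any partial sentence's DR vector."""
    return max((cosine(b, da_vector) for b in partial_sentence_vectors), default=0.0)

# Hypothetical DR vectors for two partial sentences and one DA vector.
partial_sentences = [[1, 0, 0], [1, 1, 0]]
da_vector = [1, 1, 0]
print(dr_score(da_vector, partial_sentences))
```

Taking the maximum over partial sentences means a single well-matching partial sentence is enough to support a DA, even when other partial sentences contain misrecognized words.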

Evaluations

• Number of DA types

  #(DA Types)   37     38     39
  Accuracy      82.7   84.3   77.2

• Detection accuracy (%) for the lexical score

          40%-SIM  60%-SIM  86.1%-ASR  100%-REF
  DR-DA   26.3     47.4     82.9       93.3

• Detection accuracy (%) for the weighted history score

  Value of λ_L   0.5    0.6
  Accuracy (%)   84.3   84.6


[Figure: partial sentence tree for the reference "Where is the Anping Fort", with keyword (KW) and non-keyword (NonKW) nodes. Non-keywords may appear as filler, giving the partial sentences PS1, PS2, PS3, and PS4.]

Example for σ = PS₄

Named entity substitution: Where Filler Spot

Parsing result (Stanford Parser):
(ROOT
  (SBARQ
    (WHADVP (WRB Where))
    (SQ (VBP filler)
      (NP (NNP Spot)))))

Derivation rules:
DR1: WHADVP (WRB Where)
DR2: SQ (VBP filler)
DR3: WHADVP (WRB Where)
DR4: NP (NNP Spot)

Vector b_σ over DR₁, DR₂, ..., DR_L: (1, 1, 0, ...)
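The step from a parse tree to the vector b_σ can be sketched in a few lines. The sketch reads a bracketed parse, lists the rules it contains, and marks which rules from a fixed inventory are present. Writing rules as context-free productions ("SQ -> VBP NP") is an assumption that differs from the notation in the example above, and the function names and three-rule inventory are hypothetical.

```python
def parse_tree(text):
    """Parse a bracketed tree into (label, children); leaf words stay as strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()

def derivation_rules(tree):
    """List the productions of a tree in pre-order, e.g. 'SQ -> VBP NP'."""
    label, children = tree
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    rules = [f"{label} -> {' '.join(rhs)}"]
    for child in children:
        if not isinstance(child, str):
            rules.extend(derivation_rules(child))
    return rules

def rule_vector(rules, inventory):
    """Binary vector: 1 if the inventory rule occurs among the rules, else 0."""
    present = set(rules)
    return [1 if r in present else 0 for r in inventory]

parse = "(ROOT (SBARQ (WHADVP (WRB Where)) (SQ (VBP filler) (NP (NNP Spot)))))"
rules = derivation_rules(parse_tree(parse))
inventory = ["WHADVP -> WRB", "SQ -> VBP NP", "NP -> DT NN"]  # hypothetical DR_1..DR_L
print(rule_vector(rules, inventory))
```

The inventory plays the role of DR₁ to DR_L: in practice it would be collected from all training parses, so every partial sentence maps to a vector of the same length.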
