Wei-Bin Liang, Chung-Hsien Wu, and Chia-Ping Chen*
Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
*Dept. of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
Introduction
Developing a spoken dialogue system requires paying attention to several critical issues
An imperfect Automatic Speech Recognition (ASR) module leads to task failures
Partial Sentence Tree to keep all keywords and some non-keywords
z-score to replace the unreliable recognized words with Filler
Semantic features used by Spoken Dialogue Understanding module
Derivation Rules represented as a vector
The Dialogue Management module requires functionality that includes capturing the user's intentions and interactional patterns
Data-driven approach to decide the Dialogue Act (DA) types
Dialogue History for dialogue controlling
Framework
Models for Dialogue Act Detection
At dialogue turn t, given the user's utterance U_t and the dialogue history information H_t, find the most likely DA A_t^*
Data Collection
[Figure: data collection setup. User: Speak, Listen. Operator: Listen, Select/Type. Output through TTS.]
Role  DA type        Utterance
O     Greeting       Welcome
U     Query_Service  What service can you provide?
O     Ans_Service    I can provide the information of historic spot and timetable of railway.
U     Query_Intro    Can you give an introduction to Anping-Fort?
O     Ans_Intro      Anping-Fort is also known as Fort Zeelandia. It was first built by the Dutch in 1624 as ...
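\begin{align*}
A_t^* &= \arg\max_{A \in \Omega} P(A \mid U_t, H_t) \\
      &= \arg\max_{A \in \Omega} \sum_{W} P(A, W \mid U_t, H_t) && \text{(for all possible word sequences } W\text{)} \\
      &\approx \arg\max_{A \in \Omega} \max_{W} P(A, W \mid U_t, H_t) && \text{(1-best ASR output)} \\
      &= \arg\max_{A \in \Omega} \max_{W} P(W \mid U_t, H_t)\, P(A \mid W, U_t, H_t) \\
      &\approx \arg\max_{A \in \Omega} \max_{W} P(W \mid U_t)\, P(A \mid W, H_t) && (U_t \perp H_t) \\
      &\propto \arg\max_{A \in \Omega} \max_{W} \underbrace{f(W, U_t)}_{\text{ASR Score}}\; \underbrace{g(A, W)}_{\text{Lexical Score}}\; \underbrace{h(A, H_t)}_{\text{History Score}}
\end{align*}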
History Score
Based on a Markov model assumption for the chain of DAs:
\[ h(A, H_t) = P(A_t = A \mid A_{t-1}) \]
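The history score and the final DA decision can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `train_da_bigram` and `detect_da`, the add-k smoothing, and the toy inputs are assumptions. For a fixed 1-best word sequence W, the ASR score f(W, U_t) is the same for every candidate DA, so the sketch only multiplies the lexical score g(A, W) by the history score h(A, H_t).

```python
from collections import Counter, defaultdict


def train_da_bigram(dialogues, da_types, smoothing=1.0):
    """Estimate h(A, H_t) = P(A_t = A | A_{t-1}) from DA sequences.

    `dialogues` is a list of DA label sequences, one per dialogue.
    Add-k smoothing is an assumption; the poster does not specify one.
    """
    counts = defaultdict(Counter)
    for das in dialogues:
        for prev, cur in zip(das, das[1:]):
            counts[prev][cur] += 1

    def h(a, prev):
        total = sum(counts[prev].values()) + smoothing * len(da_types)
        return (counts[prev][a] + smoothing) / total

    return h


def detect_da(da_types, prev_da, lexical_score, h):
    """Return the DA maximizing g(A, W) * h(A, H_t) for a fixed 1-best W.

    `lexical_score` maps each DA to a hypothetical value of g(A, W).
    """
    return max(da_types, key=lambda a: lexical_score[a] * h(a, prev_da))
```

In practice the product would usually be computed in the log domain, and the two scores could be weighted against each other; both choices are left out here to keep the sketch close to the formula.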
DR-DA Matrix Training
Partial Sentence Tree Construction
Derivation Rules (DRs) Extraction
Data-driven approach to decide DA types
Concept of Spectral Clustering Algorithm
DR-DA Matrix Construction
All utterances and transcriptions are used to build this matrix
Entropy-based normalization
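The poster does not give the normalization formula, so the following is only a sketch of one common entropy-based weighting (in the style of log-entropy term weighting). It assumes that rows are derivation rules, columns are DA types, and each entry counts how often a DR occurs in utterances of a DA. A DR spread evenly over all DAs gets weight 0, and a DR specific to one DA gets weight 1. The function name `entropy_normalize` is hypothetical.

```python
import math


def entropy_normalize(counts):
    """Weight each DR row of a DR-DA count matrix by 1 - H(row) / log(Q).

    counts[i][j]: occurrences of DR i in utterances labeled with DA j.
    Q is the number of DA types (columns) and must be at least 2.
    This weighting scheme is an assumption, not taken from the poster.
    """
    n_da = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # A DR that was never observed carries no information.
            weighted.append([0.0] * n_da)
            continue
        probs = [c / total for c in row]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        weight = 1.0 - entropy / math.log(n_da)
        weighted.append([weight * p for p in probs])
    return weighted
```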
N transcriptions of users' turns
N × N similarity matrix
The first Q eigenvalues determine the number of DA types
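How Q is read off the eigenvalues is not stated on the poster. As a sketch under that caveat, the example below uses the eigengap heuristic on the degree-normalized similarity matrix: Q is placed at the largest drop between consecutive eigenvalues. The eigenvalues are computed with a small cyclic Jacobi routine so the example needs only the standard library. The function names and the choice of heuristic are assumptions.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def estimate_num_da_types(similarity):
    """Pick Q at the largest eigengap of D^{-1/2} S D^{-1/2} (an assumed heuristic).

    `similarity` is a symmetric N x N matrix with positive row sums.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    normalized = [
        [similarity[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    eig = symmetric_eigenvalues(normalized)
    gaps = [eig[i] - eig[i + 1] for i in range(n - 1)]
    return 1 + max(range(n - 1), key=lambda i: gaps[i])
```

For a similarity matrix made of two clearly separated groups of turns, the two leading eigenvalues are close to 1 and the rest close to 0, so the heuristic returns Q = 2.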
Each partial sentence σ is used to extract the derivation rules b_σ
ASR Score
HTK-based Mandarin speech recognizer
297 Lexical words
39 dim MFCC
86.1% Accuracy
z-score to detect unreliable recognized words
ASR output: Where ether Anping Fort
After detection and substitution: Where Filler Anping Fort
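The substitution step above can be sketched as follows. The sketch assumes that f(w) is a per-word recognition score and that the mean μ(w) and standard deviation σ(w) of that score are available for each word. A word is replaced with Filler when z(w) = (f(w) − μ(w)) / σ(w) falls below −2. The function name `substitute_fillers` and the numbers in the usage note are hypothetical.

```python
def substitute_fillers(words, scores, mean, std, threshold=-2.0, filler="Filler"):
    """Replace words whose z-score is below `threshold` with a filler token.

    words:  recognized words, in order
    scores: f(w) for each recognized word (same order as `words`)
    mean, std: dicts giving mu(w) and sigma(w) per word (assumed to be
               estimated beforehand; the poster does not say how)
    """
    output = []
    for word, score in zip(words, scores):
        z = (score - mean[word]) / std[word]
        output.append(filler if z < threshold else word)
    return output
```

With made-up scores in which only "ether" lies more than two standard deviations below its mean, the call turns `["Where", "ether", "Anping", "Fort"]` into `["Where", "Filler", "Anping", "Fort"]`.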
Corpus
144 dialogues
Q-Data (U): 1,586 utterances, 297 token types
A-Data (O): 1,603 utterances, 317 token types
\[ z(w) = \frac{f(w) - \mu(w)}{\sigma(w)} \]
A word w is replaced with Filler if z(w) < −2
Lexical Score
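Broken into two terms, a DR score g_R and a named entity score g_N, maximized over the partial sentences σ ∈ Γ(W):
\[ g(A, W) \approx \max_{\sigma \in \Gamma(W)} \underbrace{g_R(A, \sigma)}_{\text{DR Score}}\; \underbrace{g_N(A, \sigma)}_{\text{Named Entity Score}} \]
g_R is computed from the derivation-rule vector b_σ and a vector v of the DR-DA matrix; g_N is a product of α terms.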
Named Entity Score
DA A    Keywords (KW) α
Greet   Welcome, Hello
Spot    Anping-Fort
Time    Morning
Evaluations
Number of DA types
#(DA Types)    37    38    39
Accuracy (%)   82.7  84.3  77.2

Detection accuracy (%) for the lexical score
        40%-SIM  60%-SIM  86.1%-ASR  100%-REF
DR-DA   26.3     47.4     82.9       93.3

Detection accuracy (%) for the weighted history score
Value of λ_L   0.5   0.6
Accuracy (%)   84.3  84.6
[Figure: partial sentence tree for the reference sentence "Where is the Anping Fort", with keyword (KW) and non-keyword (NonKW) nodes, filler branches, and the resulting partial sentences PS1 to PS4.]
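One way to read the partial sentence tree is that every keyword is kept, while each non-keyword is either kept or replaced by filler. The sketch below enumerates the partial sentences under that reading. The exact tree construction is not spelled out on the poster, so the enumeration rule, the keyword set, and the function name `partial_sentences` are assumptions.

```python
from itertools import product


def partial_sentences(tokens, keywords, filler="filler"):
    """Enumerate partial sentences: keep keywords, optionally fill non-keywords.

    tokens:   the recognized sentence as a list of tokens
    keywords: set of tokens treated as keywords (always kept)
    """
    options = [
        (token,) if token in keywords else (token, filler)
        for token in tokens
    ]
    return [list(choice) for choice in product(*options)]
```

For the tokens `["Where", "is", "the", "Anping Fort"]` with keywords `{"Where", "Anping Fort"}`, this produces four partial sentences, which is consistent with the four leaves PS1 to PS4 in the figure.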
σ = PS4
Named entity substitution: Where Filler Spot
Parsing result (Stanford Parser):
(ROOT
  (SBARQ
    (WHADVP (WRB Where))
    (SQ (VBP filler)
      (NP (NNP Spot)))))
Derivation Rules
DR1: WHADVP (WRB Where)
DR2: SQ (VBP filler)
DR3: WHADVP (WRB Where)
DR4: NP (NNP Spot)
DR vector over (DR1, DR2, ..., DRL): (1, 1, 0, ...)
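The step from a bracketed parse to a binary DR vector can be sketched as below. Note the assumption: the sketch treats each derivation rule as an ordinary context-free production (parent label and the labels of its children), which is simpler than the DR notation shown on the poster. The parser output is taken as a plain string, so no parser library is required. All function names and the three-rule inventory in the usage note are hypothetical.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def derivation_rules(tree):
    """List productions 'LHS -> RHS' in pre-order (assumed DR format)."""
    rules = []

    def visit(node):
        if isinstance(node, str):
            return
        label, children = node
        rhs = " ".join(c if isinstance(c, str) else c[0] for c in children)
        rules.append(f"{label} -> {rhs}")
        for child in children:
            visit(child)

    visit(tree)
    return rules


def dr_vector(rules, inventory):
    """Binary indicator vector b_sigma over a fixed DR inventory."""
    present = set(rules)
    return [1 if dr in present else 0 for dr in inventory]
```

Applied to the parse of "Where Filler Spot" with the inventory `["WRB -> Where", "VBP -> filler", "NP -> NN"]`, `dr_vector` returns `[1, 1, 0]`: the first two rules occur in the tree and the third does not.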