
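Department of Electrical Engineering, National Cheng Kung University

Master's Thesis

Implementation of an Interactive Spoken Dialogue Learning Hardware System with a Replaceable-Text Architecture on the SPCE061A Embedded Single-Chip Microcontroller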

Embedded System Design based on SPCE061A for Interactive Spoken Dialogue Learning System with a Programmable Dialogue

Student: Yung-Shing Kuo (郭昀昇)    Advisor: Jhing-Fa Wang (王駿發)

Department of Electrical Engineering National Cheng Kung University

Tainan, Taiwan, R.O.C.

July 2005 (Year 94 of the Republic of China)

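CHINESE ABSTRACT

Implementation of an Interactive Spoken Dialogue Learning Hardware System with a Replaceable-Text Architecture on the SPCE061A Embedded Single-Chip Microcontroller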

Yung-Shing Kuo (郭昀昇)*

Jhing-Fa Wang (王駿發)**

Department of Electrical Engineering, National Cheng Kung University

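This thesis proposes an interactive spoken dialogue learning system with a replaceable-text (programmable dialogue) architecture and implements it as an embedded single-chip hardware system built around the SPCE061A micro controller unit (MCU). The result gives the single chip speech recognition and interactive spoken dialogue capabilities. Besides scenario-based situational dialogue practice from a built-in text, the system lets users record their own dialogue sentences and replace the dialogue content, enabling varied dialogues. While conversing with the user, the system handles dialogue event triggering and tracking, extraction of the attributes of the user's spoken dialogue, and detection of the dialogue event to which the system should respond. For the embedded hardware implementation, the LPC parameter extraction used for dialogue attribute extraction and the dynamic time warping (DTW) algorithm needed for event detection are implemented on the SPCE061A MCU hardware module, whose clock rate and memory are limited. Under these constraints, this thesis proposes using the idle time during audio sampling to perform LPC parameter extraction, together with a method for executing DTW within the limited memory, to achieve real-time frame-synchronous processing. Finally, we hope that this hardware system will broaden the applications of interactive spoken dialogue systems, such as interactive spoken dialogue IC toys, and help build a platform for speech learning and information exchange for the domestic digital content industry.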

* The author

ABSTRACT

Embedded System Design based on SPCE061A for Interactive Spoken Dialogue Learning System with a Programmable Dialogue

Yung-Shing Kuo*

Jhing-Fa Wang**

Department of Electrical Engineering, National Cheng Kung University

In this thesis, an embedded system based on the SPCE061A is designed for an interactive spoken dialogue learning system (ISDLS) with a programmable dialogue. The system provides speech recognition and interactive spoken dialogue capabilities. Besides the default dialogue, the proposed ISDLS lets users record programmable dialogue content for varied conversation practice. When a user begins a dialogue, ISDLS performs dialogue trigger and tracking, dialogue attribute extraction and dialogue event feedback so that it can interact with the user. For the embedded system design, two components are implemented on the SPCE061A, a low-cost, resource-limited micro controller unit (MCU): the LPCC feature extraction used for dialogue attribute extraction, and the dynamic time warping (DTW) algorithm used for dialogue event feedback. While input speech is being sampled, the idle time of the MCU is used to perform LPCC extraction and DTW as frame-synchronous operations, which gives ISDLS real-time capability. We implement the ISDLS embedded system, which can be applied to interactive speech toys to enrich the entertainment offerings of the digital content industry.

* the author


ACKNOWLEDGMENTS

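I thank my advisor, Professor Jhing-Fa Wang, for his careful guidance during my two years in the graduate program at National Cheng Kung University. He led me into the treasury of scholarship and knowledge and refined my thinking. He taught me how to analyze and solve problems and cultivated my ability to think independently. I have also learned a great deal from him about life and about dealing with people. I sincerely thank Professor Wang and offer him my highest respect and gratitude.

I thank my senior 林順傑 for his enthusiastic guidance, which gave me new ideas about the system architecture, and for his extensive help and corrections in writing this thesis. My seniors 林博川 and 楊宗憲 gave their full support on hardware and software, which brought breakthroughs in the system implementation. I also thank my classmates and good friends 宋豪靜, 李俊賢, 廖上嘉 and 簡菎廷. During these two years of graduate study we encouraged one another, and they brought me new perspectives and lively ideas.

Over these years, I thank Uncle and Aunt Chiu (邱伯伯 and 邱媽媽) for taking care of me. Even while living on my own in Kaohsiung and Tainan, I could still feel the warmth of a family; thank you very much. I thank my girlfriend 璿芝 for her support and understanding, and for sharing many happy and difficult times with me.

Finally, I especially thank my hardworking parents, who nurtured a sound mind and body in me and allowed me to complete my studies without worries. All my honors belong to you, and with boundless gratitude I dedicate this thesis to you.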


LIST OF TABLES

Table 3-1 The configuration registers used in ISDLS

Table 3-2 Memory allocation of question database and answer database

Table 3-3 Circuit signals description

Table 4-1 Profile of autocorrelation based on SPCE061A

Table 4-2 ISDLS functions

LIST OF FIGURES

Figure 2-1 Question and Answer separating

Figure 2-2 ISDLS database construction

Figure 2-3 An example of use case for ISDLS

Figure 2-4 Data flow of ISDLS

Figure 2-5 Data flow of programmable dialogue mode ISDLS

Figure 2-6 Procedure of dialogue trigger and tracking

Figure 2-7 Flow chart of endpoint detection

Figure 2-8 Flow chart of speech feature extraction

Figure 2-9 The comparison between two speech templates

Figure 3-1 The block diagram of SPCE061A

Figure 3-2 SUNPLUS SPCE061 EMU development board layout

Figure 3-3 The system control flow of ISDLS

Figure 3-4 Memory allocation for the ISDLS

Figure 3-5 2K words Programming SRAM scheduling and overlay

Figure 3-6 Sampling and features extraction simultaneously

Figure 3-7 Frame-synchronous LPCC with Frame-overlap displacement

Figure 3-8 DTW execution with two memory buffers

Figure 3-9 Execution sequence of floating-point and fixed-point

Figure 3-10 Design flow of floating-point conversion

Figure 3-11 Coarse system block diagram

Figure 3-12 Power supply circuit

Figure 3-13 Microphone circuit

Figure 3-14 Speaker circuit

Figure 3-15 Keyboard circuit

Figure 3-16 SIO Memory interface

Figure 3-17 Write and Read timing of SIO

Figure 3-18 Overall system circuit

Figure 4-1 The coding flow of SUNPLUS IDE

Figure 4-2 The design architecture of ISDLS

Figure 4-3 The prototype board of ISDLS

CHAPTER 1

INTRODUCTION

1.1 Background

Many language learners find it difficult to acquire conversational ability. Talking with foreign teachers in class or chatting with foreigners on the street can improve conversation skills. However, it is not easy to find a partner, especially one who speaks the target language, with whom to practice and build up your language abilities whenever you want.

Today, speech recognition technology lets computers act as stand-ins for foreign conversation partners for language learners. A computer can use speech recognition to understand what the speaker says, so users can learn a language by interacting with the computer they are talking to. Users feed speech into the computer, which compares the input speech with templates saved in a database and finally scores it.

There are many language learning products on the market that use speech recognition, such as EZ Talk, Talk to Me, MyET and Live ABC. These products let users practice on a computer and develop their oral expression and listening comprehension. However, the software runs only on desktop computers, so it is not portable. There are also many small, portable language learning machines on the market, but they have no built-in speech recognition. They only replay speech content, so users cannot interact with them. Besides, all of them are expensive.

1.2 Motivation

Implementations of Dynamic Time Warping (DTW) have been commonly employed in mobile phones, PDAs and toys, but most current implementations run on DSP platforms, which are expensive.

Nowadays, MCUs (Micro Controller Units) are becoming faster and cheaper, which makes it possible to run speech recognition on them. We implement DTW on an MCU and use the MCU as the kernel of an embedded system for language learning. The resulting system has the advantages of being portable and inexpensive.

In this thesis, we implement the entire embedded system based on the SPCE061A. The SPCE061A requires only a few peripheral components for system operation, and it comes with a flexible, powerful debugging and development environment.

We also present an architecture for language conversation training that supports interactive dialogue between users and the system. In addition, users can record their own dialogue content in the system, which makes it a programmable dialogue system for language learning.

1.3 Organization of Thesis

The rest of this thesis is organized into the following chapters:

CHAPTER 2 THE PROPOSED ISDLS ARCHITECTURE

Overview of the Interactive Spoken Dialogue Learning System (ISDLS), its functions, and the underlying theory.

CHAPTER 3 IMPLEMENTATION FOR ISDLS BASED ON SPCE061A

Description of the modified LPCC and DTW algorithms, the design of the SPCE061A control flow, and the hardware implementation of the proposed system.

CHAPTER 4 SYSTEM IMPLEMENTATION AND VERIFICATION

The implementation of software and hardware.

CHAPTER 5 CONCLUSIONS AND FUTURE WORKS

Conclusion and future works of this thesis.


CHAPTER 2

THE PROPOSED INTERACTIVE SPOKEN DIALOGUE LEARNING SYSTEM

2.1 Interactive Spoken Dialogue Learning System with a Programmable Dialogue

The Interactive Spoken Dialogue Learning System (ISDLS) is designed for practicing language conversation. In basic language courses, students usually practice fixed conversations. For this reason, we make ISDLS a limited-domain dialogue system.

We store the dialogue document, which is the teaching material of a language course, in the ISDLS database. The dialogue document consists of the recorded speech of the conversations, and students practice these dialogues with ISDLS.

Figure 2-1 Question and Answer separating

We simplify the data structure of the dialogue by classifying each utterance into one of two classes: question or answer. As shown in Figure 2-1, the dialogue document is separated into a question database and an answer database. When the user asks ISDLS a question, ISDLS searches for the question in the question database. If the question is found, the system responds with the corresponding answer. Conversely, when ISDLS asks the user a question and the user replies, the system searches for the reply in the answer database and checks it. If the answer is correct, ISDLS sends the next question to the student.

Figure 2-2 ISDLS database construction

Figure 2-2 shows the procedure for constructing the ISDLS databases. The databases hold two types of data: the speech sounds of the dialogue and their speech features. The “dialogue attribute extraction” block performs speech feature extraction. We extract speech features from the dialogue document and store them in the database, together with the speech sounds of the dialogue document.

ISDLS uses these speech features as templates to recognize input speech, and plays back the stored speech sounds as its responses to the user.

Figure 2-3 An example of using ISDLS contextual dialogue mode

Figure 2-3 shows an example of using ISDLS. The student can act as either the questioner or the answerer. The conversation between the student and ISDLS is fixed, and the user needs to respond with the correct line of the dialogue. This “Questions and Answers follow” procedure gives the student practice in pairing and matching base sentences in conversational form.

[Figure 2-4: read from the contextual dialogue database → dialogue trigger and tracking → dialogue attribute extraction → dialogue event feedback → “Dialogue finish?” (Yes: End; No: continue the dialogue)]

Figure 2-4 Data flow of ISDLS contextual dialogue mode.

Figure 2-4 shows the data flow of the ISDLS contextual dialogue mode. When a speaker wants to talk with ISDLS, the “Dialogue trigger and tracking” function waits for speech input. Once it confirms that a speech signal is arriving, it records it. After recording, ISDLS sends the speech signal to the “Dialogue attribute extraction” block, which converts it into speech features. In the “Dialogue event feedback” block, these features are compared with the feature templates saved in the database. The system checks the similarity between the input speech and the template speech and then responds to the user with speech.

The description above covers ISDLS and its operating procedure, which we call the “Contextual Dialogue Mode” of ISDLS. In addition, we define another mode for ISDLS, the “Programmable Dialogue Mode”.

Figure 2-5 shows the data flow of the programmable dialogue mode. Its procedure is very similar to that of the contextual dialogue mode, but the dialogue source is no longer the dialogue document. Instead, users record their own dialogue (questions and answers) into ISDLS.

Another difference in the programmable dialogue mode is the “Q&A index table”, which records which question corresponds to which answer. When users record a question and an answer into the database, the system establishes the Q&A link and records it in the Q&A index table. When a user inputs a question, the system checks the index table and feeds back the corresponding answer. When users add new dialogue to the database, ISDLS updates the Q&A index table automatically.

[Figure 2-5: programmable dialogue recording writes speech (questions and answers) into the programmable dialogue database; interactive dialogue reads from it through dialogue trigger and tracking and dialogue event feedback, then ends]

Figure 2-5 Data flow of programmable dialogue mode ISDLS.


2.2 Dialogue trigger and tracking

In our dialogue system, speech input is detected automatically, so users do not need to push buttons to record speech. While ISDLS is operating, users talk to it just as they would talk to another person, through a hands-free interface. ISDLS therefore needs to detect when speech starts and when it stops. We call this mechanism of detecting changes in the speech signal dialogue trigger and tracking.

[Figure 2-6 Procedure of dialogue trigger and tracking: speech input → sampling → frame → endpoint detection of the starting point (“Dialogue trigger?”) → endpoint detection of the stop point (“Dialogue tracking?”) → stop]

Figure 2-6 shows the procedure of dialogue trigger and tracking. Each function is described below:

a. Sampling: ISDLS receives the analog signal and samples it into a discrete-time signal with the ADC. Because the useful bandwidth of human speech is about 4 kHz, ISDLS samples at 8 kHz to avoid aliasing.

b. Frame: At an 8 kHz sampling rate, we receive 8000 samples per second. We cut these samples into short time segments (frames) for signal processing. For endpoint detection the frame period is usually 10 ms, which is 80 samples at 8 kHz.

c. Endpoint Detection: The endpoint detection algorithm determines when speech starts and when it stops. ISDLS detects the start of the input speech and begins recording, then detects the end of the speech and stops processing. This procedure is the key function of dialogue trigger and tracking: we regard start-point detection as dialogue trigger and stop-point detection as dialogue tracking. Endpoint detection usually combines energy estimation and the zero-crossing rate, as described below:

i. Energy estimation: Define E(n) as the energy of the discrete-time signal X(n), as in Eq (2.1):

$$E(n) = \sum_{n=0}^{N-1} X^{2}(n) \qquad \text{Eq (2.1)}$$

At an 8 kHz sampling rate N is 80, so E(n) is the energy of a 10 ms interval. Because the energy of speech is much larger than that of silence and noise, we can use this property to detect voiced segments. Squaring X(n) is computationally expensive, so we use the absolute value instead of the square:

$$E(n) = \sum_{n=0}^{N-1} \left| X(n) \right| \qquad \text{Eq (2.2)}$$

ii. Zero-crossing rate: When two adjacent samples have opposite signs, the signal crosses zero. The number of zero crossings in one interval is defined as the zero-crossing rate (ZCR):

$$ZCR = \sum_{n=0}^{N-1} u\big[-\operatorname{sign}(X(n)) \times \operatorname{sign}(X(n+1))\big] \qquad \text{Eq (2.3)}$$

$$u(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad \operatorname{sign}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

E(n) alone has difficulty separating nasals and voiceless consonants from noise, because their energy is similar to that of noise. However, the ZCR of noise is larger than that of voiced speech and voiceless consonants, and the ZCR of silence is zero. We therefore set a ZCR threshold to separate speech from noise: if the ZCR of a segment is greater than the threshold, we classify the segment as noise.

iii. Combining E(n) and ZCR: We combine E(n) and ZCR to detect endpoints. We define two energy thresholds, TH_low and TH_high, and one ZCR threshold, TH_zcr, and compute the energy of each frame. TH_low is a conservative energy threshold that helps detect nasals and voiceless consonants, whose energy is lower. TH_high is greater than TH_low and is used to detect voiced speech, which has larger energy. TH_zcr helps distinguish voiceless sounds from noise. Figure 2-7 illustrates the flow chart of endpoint detection.

Figure 2-7 Flow chart of endpoint detection (a) Flow chart of starting speech detection. (b) Flow chart of stopping speech detection.
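The endpoint detection steps above can be sketched in C as follows. This is a minimal illustration, not the ISDLS firmware. It computes Eq (2.2) and Eq (2.3) on 80-sample frames and then applies a hypothetical decision rule, because the exact branches of Figure 2-7 are not reproduced here. The names `Endpointer`, `endpoint_step` and `hangover` (the number of consecutive non-speech frames required to declare the stop point) are illustrative assumptions. The sketch triggers on TH_high alone, whereas the thesis also uses TH_low to catch weak onsets such as nasals.

```c
#include <stdint.h>
#include <stdlib.h>

#define EP_FRAME_LEN 80   /* 10 ms at 8 kHz, as in the text */

/* Eq (2.2): frame energy with absolute values instead of squares. */
uint32_t frame_energy(const int16_t *x, int n)
{
    uint32_t e = 0;
    for (int i = 0; i < n; i++)
        e += (uint32_t)abs(x[i]);
    return e;
}

/* Eq (2.3): count adjacent sample pairs whose signs differ,
   with sign(0) = +1 and u(v) = 1 for v >= 0. */
int frame_zcr(const int16_t *x, int n)
{
    int zcr = 0;
    for (int i = 0; i + 1 < n; i++) {
        int s0 = (x[i] >= 0) ? 1 : -1;
        int s1 = (x[i + 1] >= 0) ? 1 : -1;
        if (-s0 * s1 >= 0)            /* u[-sign(X(n)) * sign(X(n+1))] */
            zcr++;
    }
    return zcr;
}

typedef enum { EP_IDLE, EP_START, EP_SPEECH, EP_STOP } EpEvent;

typedef struct {
    uint32_t th_low;    /* TH_low: conservative energy threshold */
    uint32_t th_high;   /* TH_high: energy threshold for voiced speech */
    int th_zcr;         /* TH_zcr: above this, a quiet frame is treated as noise */
    int hangover;       /* hypothetical: non-speech frames needed to stop */
    int in_speech;      /* 1 between dialogue trigger and stop point */
    int silent_run;     /* consecutive non-speech frames seen so far */
} Endpointer;

/* Process one EP_FRAME_LEN-sample frame and report the endpoint event. */
EpEvent endpoint_step(Endpointer *ep, const int16_t *frame)
{
    uint32_t e = frame_energy(frame, EP_FRAME_LEN);
    int z = frame_zcr(frame, EP_FRAME_LEN);
    /* Hypothetical rule: loud frames are speech; quieter frames count as
       speech only if their ZCR does not look like noise. */
    int speech_like = (e >= ep->th_high) ||
                      (e >= ep->th_low && z <= ep->th_zcr);

    if (!ep->in_speech) {
        if (e >= ep->th_high) {            /* dialogue trigger: start point */
            ep->in_speech = 1;
            ep->silent_run = 0;
            return EP_START;
        }
        return EP_IDLE;
    }
    ep->silent_run = speech_like ? 0 : ep->silent_run + 1;
    if (ep->silent_run >= ep->hangover) {  /* dialogue tracking: stop point */
        ep->in_speech = 0;
        return EP_STOP;
    }
    return EP_SPEECH;
}
```

The hangover counter keeps short pauses inside an utterance from ending the recording early. Its value, like the three thresholds, would have to be tuned on real recordings.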

2.3 Dialogue attribute extraction

Human speech is varied and complex and carries a great deal of information in both the time and frequency domains. The human brain handles it easily, but a computer cannot. To process speech signals by computer, we must reduce the amount of data and extract appropriate features from the speech signals.

Figure 2-8 is the flow chart of feature extraction. In this thesis, each frame is 256 samples long, with 64 samples of overlap between consecutive frames. The sampling rate is 8 kHz, so ISDLS receives 8000 samples per second. Because each new frame advances by 256 - 64 = 192 samples, we obtain about 42 frames per second (8000 / 192 ≈ 41.7). When speech signals are incoming, the system converts them into speech features frame by frame.
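The frame arithmetic above can be checked with a short C sketch, assuming a frame shift of 256 - 64 = 192 samples. In steady state the rate is 8000 / 192 ≈ 41.7 frames per second, which the text rounds to about 42. A one-second buffer holds only 41 complete frames, because the first frame needs a full 256 samples. The helper names are hypothetical.

```c
#define SAMPLE_RATE 8000                      /* Hz */
#define FRAME_LEN   256                       /* samples per frame */
#define OVERLAP     64                        /* samples shared by consecutive frames */
#define FRAME_SHIFT (FRAME_LEN - OVERLAP)     /* 192 new samples per frame */

/* Number of complete frames that fit in n samples. */
int num_frames(int n)
{
    if (n < FRAME_LEN)
        return 0;
    return 1 + (n - FRAME_LEN) / FRAME_SHIFT;
}

/* Steady-state frame rate: one new frame every FRAME_SHIFT samples. */
double frames_per_second(void)
{
    return (double)SAMPLE_RATE / FRAME_SHIFT;
}
```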


Several speech features could be used, such as LPCC, MFCC and PLP. We choose LPCC (Linear Prediction Cepstrum Coefficient) as our feature extraction algorithm. The major advantage of LPCC is its low computational cost, which makes it suitable for implementation in an embedded system.

Figure 2-8 Flow chart of speech feature extraction
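The three LPCC stages described below can be sketched in floating-point C: autocorrelation in Eq (2.4), Durbin's recursion, and the cepstrum conversion in Eq (2.5). This is an illustration under stated assumptions. The order is 10, and the predictor convention is $x(n) \approx \sum_k a[k]\,x(n-k)$, which is the convention under which Eq (2.5) holds. No pre-emphasis or windowing is applied, because the text does not mention them. Firmware for the 16-bit SPCE061A would need a fixed-point version, so this sketch shows only the arithmetic, not the thesis's implementation.

```c
#define LPC_ORDER 10   /* 10th-order analysis, as in the text */

/* Eq (2.4): R(k) = sum_{n=0}^{N-1-k} x(n) x(n+k) for k = 0..LPC_ORDER. */
void autocorrelation(const double *x, int n, double *r)
{
    for (int k = 0; k <= LPC_ORDER; k++) {
        double sum = 0.0;
        for (int i = 0; i + k < n; i++)
            sum += x[i] * x[i + k];
        r[k] = sum;
    }
}

/* Durbin's recursion: from R(0..p) compute a[1..p] such that
   x(n) ~ sum_{k=1}^{p} a[k] x(n-k). Returns the final prediction error,
   or -1 for a silent frame (R(0) <= 0). a[0] is unused and set to 0. */
double durbin(const double *r, double *a)
{
    double tmp[LPC_ORDER + 1];
    double err = r[0];
    for (int i = 0; i <= LPC_ORDER; i++)
        a[i] = 0.0;
    if (err <= 0.0)
        return -1.0;
    for (int i = 1; i <= LPC_ORDER; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;              /* reflection coefficient */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];  /* update lower-order coefficients */
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;
        if (err <= 0.0)
            break;                         /* numerically singular: stop early */
    }
    return err;
}

/* Eq (2.5): C[n] = a[n] + sum_{k=1}^{n-1} (k/n) C[k] a[n-k], 1 <= n <= p. */
void lpc_to_cepstrum(const double *a, double *c)
{
    c[0] = 0.0;   /* C[0] is not used by Eq (2.5) */
    for (int n = 1; n <= LPC_ORDER; n++) {
        double sum = a[n];
        for (int k = 1; k < n; k++)
            sum += ((double)k / n) * c[k] * a[n - k];
        c[n] = sum;
    }
}
```

For each frame, the calls run in order: `autocorrelation`, then `durbin`, then `lpc_to_cepstrum`. The resulting `c[1..10]` is the feature vector that would be stored as a template or passed to DTW.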

The LPCC computation has three parts: autocorrelation, Durbin's recursion and cepstrum coefficient conversion. The autocorrelation function is given in Eq (2.4):

$$R(k) = \sum_{n=0}^{N-1-k} x(n)\,x(n+k) \qquad \text{Eq (2.4)}$$

In general, the value of k is 0~11; we use R(1)~R(10) as the 10th-order autocorrelation coefficients (Durbin's recursion additionally uses R(0)).

After autocorrelation, we calculate the LPC (Linear Predictive Coding) coefficients a[n] with Durbin's recursive algorithm.

Finally, we evaluate the cepstrum coefficients with the conversion formula in Eq (2.5):

$$C[n] = a[n] + \sum_{k=1}^{n-1} \frac{k}{n}\, C[k]\, a[n-k], \qquad 1 \le n \le 10 \qquad \text{Eq (2.5)}$$

where a[n] are the LPC coefficients and C[n] are the cepstrum coefficients. C[n] is the final result of speech feature extraction, and in the programmable (flexible-content) dialogue mode we save these features to the ISDLS database.

2.4 Dialogue event feedback

Because our dialogue system needs to know what the speaker says, a speech recognition mechanism is necessary. We select Dynamic Time Warping (DTW) as our speech recognition algorithm. Speech recognition based on DTW is simple to implement and fairly effective for small-vocabulary speech recognition.

DTW has been widely used to derive the overall distortion between two speech templates. In our system, each speech template (answer or question) consists of a sequence of speech feature vectors. The overall distortion is the accumulated distance between the feature vectors of the two templates when they are aligned so that this accumulated distance is minimal. DTW warps two speech templates $X = x_1 x_2 \ldots x_N$ and $Y = y_1 y_2 \ldots y_M$ in the time dimension to compensate for nonlinear timing differences, as illustrated in Figure 2-9.

DTW is equivalent to finding the minimum-distance path through the trellis between two templates. Let d(i,j) be the distance between speech vectors $x_i$ and $y_j$. To find the optimal path from the starting point (1,1) to the end point (N,M), moving from left to right, we need to compute the optimal accumulated distance D(N,M). In principle, we could enumerate every possible accumulated distance from (1,1) to (N,M) and pick the smallest one. D(N,M) is then the minimum distance between the two speech templates.


Figure 2-9 The comparison between two speech templates $X = x_1 x_2 \ldots x_N$ and $Y = y_1 y_2 \ldots y_M$

The minimum distance D(i,j) must satisfy the following recursion:

$$D(i,j) = d(i,j) + \min\big\{D(i-1,j),\ D(i-1,j-1),\ D(i,j-1)\big\} \qquad \text{Eq (2.6)}$$

Eq (2.6) shows that although there are three possible moves into each cell, we only need to keep the best one for each pair (i, j). The recursion allows the optimal path search to be conducted incrementally from left to right. As we move forward, we can identify the optimal match $y_j$ for $x_i$ and save its index in a back-pointer table B(i,j). Once D(N,M) has been computed, the optimal path can be traced back through this table.

The smaller the minimum distortion between two speech templates, the more similar they are. In our system, we compare the input speech with the speech templates saved in the database. Using DTW, the system finds the template most similar to the input speech.
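The DTW recursion of Eq (2.6) can be sketched in C as follows. Several assumptions apply. Feature vectors are stored frame by frame in flat arrays. The local distance d(i,j) is the city-block distance (sum of absolute differences), because the text does not specify one. No slope constraints or length normalization are applied. Only two rows of the accumulated-distance table are kept. This is enough to obtain D(N,M), but it drops the back-pointer table B(i,j), so the warping path itself cannot be traced. Indices are 0-based, so D(N,M) ends up in `prev[m - 1]`. `Template`, `best_template` and `MAX_M` are hypothetical names.

```c
#include <float.h>
#include <stddef.h>

#define MAX_M 128   /* hypothetical maximum template length in frames */

/* d(i,j): city-block distance between two dim-dimensional feature vectors. */
static double frame_dist(const double *x, const double *y, int dim)
{
    double s = 0.0;
    for (int k = 0; k < dim; k++) {
        double t = x[k] - y[k];
        s += (t < 0.0) ? -t : t;
    }
    return s;
}

/* Eq (2.6) with two rolling rows. x has n frames, y has m frames.
   Returns D(N,M), or -1 if the sizes are invalid. */
double dtw_distance(const double *x, int n, const double *y, int m, int dim)
{
    double prev[MAX_M], cur[MAX_M];
    if (n <= 0 || m <= 0 || m > MAX_M)
        return -1.0;
    /* First row: only moves from D(i, j-1) are possible. */
    prev[0] = frame_dist(x, y, dim);
    for (int j = 1; j < m; j++)
        prev[j] = prev[j - 1] + frame_dist(x, y + j * dim, dim);
    for (int i = 1; i < n; i++) {
        const double *xi = x + i * dim;
        cur[0] = prev[0] + frame_dist(xi, y, dim);   /* only D(i-1, 0) */
        for (int j = 1; j < m; j++) {
            double best = prev[j];                   /* D(i-1, j)   */
            if (prev[j - 1] < best)
                best = prev[j - 1];                  /* D(i-1, j-1) */
            if (cur[j - 1] < best)
                best = cur[j - 1];                   /* D(i, j-1)   */
            cur[j] = frame_dist(xi, y + j * dim, dim) + best;
        }
        for (int j = 0; j < m; j++)
            prev[j] = cur[j];
    }
    return prev[m - 1];
}

typedef struct {
    const double *frames;   /* len * dim values, frame by frame */
    int len;
} Template;

/* Return the index of the template with the smallest DTW distance. */
int best_template(const double *x, int n, const Template *t, int count,
                  int dim, double *best_dist)
{
    int best = -1;
    double bd = DBL_MAX;
    for (int i = 0; i < count; i++) {
        double d = dtw_distance(x, n, t[i].frames, t[i].len, dim);
        if (d >= 0.0 && d < bd) {
            bd = d;
            best = i;
        }
    }
    if (best_dist != NULL)
        *best_dist = bd;
    return best;
}
```

With only two rows, the working memory grows with the template length instead of with N × M. That trade-off matters on a 2K-word SRAM, at the cost of losing the path.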

In the contextual dialogue mode, the dialogue sequence is fixed, so the expected speech template is known in advance and the dialogue feedback simply follows the sequence. Therefore, the dialogue event feedback only needs to compare the input speech with that template. If the similarity between the two utterances is greater than a threshold, the system considers that the input speech matches the template sentence and then feeds back the next line of the dialogue.
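The following minimal C sketch shows the contextual-mode check. It assumes that the DTW distance to the single expected template has already been computed. Because a smaller DTW distance means greater similarity, "similarity greater than a threshold" is expressed here as "distance at or below `max_dist`". The threshold value and the `ContextualScript` type are hypothetical. `script_finished` corresponds to the “Dialogue Finish?” decision in Figure 2-4.

```c
/* Hypothetical state for the fixed dialogue script of the contextual mode. */
typedef struct {
    int step;        /* index of the line the user is expected to say next */
    int num_steps;   /* number of user lines in the script */
} ContextualScript;

/* dtw_dist: DTW distance between the input speech and the expected
   template (smaller means more similar). Returns 1 and advances the script
   if the reply is accepted, or 0 if it is rejected and should be repeated. */
int contextual_feedback(ContextualScript *s, double dtw_dist, double max_dist)
{
    if (s->step >= s->num_steps)
        return 0;                          /* script already finished */
    if (dtw_dist < 0.0 || dtw_dist > max_dist)
        return 0;                          /* not similar enough: retry */
    s->step++;                             /* matched: play the next line */
    return 1;
}

/* The "Dialogue Finish?" decision of Figure 2-4. */
int script_finished(const ContextualScript *s)
{
    return s->step >= s->num_steps;
}
```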

In the programmable dialogue mode, each speech sentence has its own index number in the dialogue database. After DTW, the system obtains the index number of the input speech and looks it up in the index table, which records every relation between questions and answers. If you ask ISDLS a question, it feeds back a corresponding answer from the answer database. In ISDLS, the same question can correspond to one or more different answers. For example, if I say “How are you?”, ISDLS randomly responds “I am fine.” or “I am not happy.” Users can also add another response, such as “Not bad.”; ISDLS stores it in the answer database and updates the Q&A index table.
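The Q&A index table can be sketched in C as a list of (question, answer) index pairs. The sketch assumes that question indices come from DTW recognition over the question database and that answer indices point into the answer database. When a question is linked to several answers, one of them is returned at random, chosen here by reservoir sampling with `rand()`. Adding a response such as “Not bad.” is then a single `qa_add_link` call. All names and the fixed capacity `MAX_LINKS` are hypothetical.

```c
#include <stdlib.h>

#define MAX_LINKS 64   /* hypothetical capacity of the Q&A index table */

typedef struct {
    int question;   /* index in the question database */
    int answer;     /* index in the answer database */
} QALink;

typedef struct {
    QALink links[MAX_LINKS];
    int count;
} QAIndexTable;

/* Record that 'answer' is a valid reply to 'question'.
   Returns 0 on success, -1 if the table is full. */
int qa_add_link(QAIndexTable *t, int question, int answer)
{
    if (t->count >= MAX_LINKS)
        return -1;
    t->links[t->count].question = question;
    t->links[t->count].answer = answer;
    t->count++;
    return 0;
}

/* Return one answer linked to 'question', chosen at random, or -1 if the
   question has no answers. Reservoir sampling: the m-th match replaces the
   current choice with probability 1/m, so every match is about equally
   likely (up to the small bias of rand() % m). */
int qa_pick_answer(const QAIndexTable *t, int question)
{
    int matches = 0;
    int chosen = -1;
    for (int i = 0; i < t->count; i++) {
        if (t->links[i].question != question)
            continue;
        matches++;
        if (rand() % matches == 0)
            chosen = t->links[i].answer;
    }
    return chosen;
}
```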


CHAPTER 3

EMBEDDED SYSTEM DESIGN FOR ISDLS BASED ON SPCE061A

In the past, implementing a speech recognition system required a powerful computer to process the speech signal. Nowadays, MCUs (Micro Controller Units) are becoming ever faster and are fast enough to execute speech recognition. In addition, an embedded system built around an MCU is cheaper than a personal computer system. Therefore, we choose an MCU as the target device for implementing our proposed system.

There are many kinds of MCU on the market, such as the 8051 from INTEL, the ARM7 from ARM, the S3C2410TK from SAMSUNG, the EM60000 from ELAN, and the SPCE061A from SUNPLUS. Why do we choose the SPCE061A for our system? Because the SPCE061A is a 16-bit MCU, it provides more precision in speech signal calculation than traditional 8-bit architectures. In addition, the SPCE061A provides an 8-channel ADC (including one microphone input channel) and a 2-channel DAC. This gives it an advantage over other MCUs, because it saves external ADC and DAC circuitry when we build the embedded system.

Before implementing the proposed system on an embedded platform, we need to know how many resources are available. This chapter therefore begins with an overview of the development platform. After the overview of the SPCE061A, the HW/SW design and the system implementation are described.

3.1 Overview of SPCE061A

To implement the proposed ISDLS, we use the SUNPLUS™ SPCE061 EMU development board to carry out the embedded design. The principal component of the SPCE061 EMU board is the SPCE061A device.

3.1.1 Specification of SPCE061A device

The SPCE061A is a 16-bit product built around the SUNPLUS 16-bit microprocessor µ'nSP®. It operates from 3.0 V to 3.6 V, and its operating speed of 0.32 MHz to 49.152 MHz lets it be used easily in different applications.

The SPCE061A memory consists of 32K words of flash memory and 2K words of working SRAM.

As shown in Figure 3-1, the SPCE061A is equipped with the SUNPLUS 16-bit core µ'nSP®. The µ'nSP® includes eight registers: R1-R4 (General-purpose Registers), PC (Program Counter), SP (Stack Pointer), BP (Base Pointer) and SR (Segment Register). The interrupts include three FIQs (Fast Interrupt Requests), eight IRQs (Interrupt Requests), and one software interrupt, the BREAK instruction.


Figure 3-1: The block diagram of SPCE061A

In the SPCE061A, the SRAM is 2K words, ranging from 0x0000 through 0x07FF, with an access time of two CPU clock cycles. The flash memory ranges from 0x8000 through 0xFFFF. It is also high-speed memory, with the same two-clock access time as the SRAM.

The system clock takes a 32768 Hz base frequency and pumps it up to 20.48 MHz to 49.152 MHz. The SPCE061A has an eight-channel 10-bit ADC, made up of seven line-in channels from IOA[6:0] and one microphone input channel through an amplifier and AGC controller. It also has two 10-bit DACs for audio output, DAC1 and DAC2, with 2.0 mA or 3.0 mA driving current.

The SPCE061A provides a set of configuration registers that control system functions and properties, and each register can be configured individually.

For example, to access the internal flash ROM, we first write 0xAAAA to P_Flash_Ctrl (register address 0x7555) to enable flash access. We then write 0x5511 to P_Flash_Ctrl for a page erase, or 0x5533 to P_Flash_Ctrl to program the flash ROM.
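The flash-access sequence above can be sketched in C. To keep the example runnable on a host, register writes go through a hypothetical `RegWriteFn` hook instead of a direct store to address 0x7555; on the target, the hook would perform the memory-mapped write. Only the two control writes described in the text are shown. The subsequent accesses that actually erase a page or program a word are not described here and should be taken from the SPCE061A data sheet. `record_write` is a host-side stand-in that logs the writes.

```c
#include <stdint.h>

#define P_FLASH_CTRL      0x7555u   /* address from Table 3-1 */
#define FLASH_CMD_ENABLE  0xAAAAu   /* enable flash access */
#define FLASH_CMD_ERASE   0x5511u   /* page erase */
#define FLASH_CMD_PROGRAM 0x5533u   /* program flash ROM */

/* Hypothetical hook standing in for a 16-bit memory-mapped register store. */
typedef void (*RegWriteFn)(uint16_t addr, uint16_t value);

static void flash_command(RegWriteFn write_reg, uint16_t cmd)
{
    write_reg(P_FLASH_CTRL, FLASH_CMD_ENABLE);  /* step 1: enable flash access */
    write_reg(P_FLASH_CTRL, cmd);               /* step 2: select erase or program */
}

void flash_begin_page_erase(RegWriteFn write_reg)
{
    flash_command(write_reg, FLASH_CMD_ERASE);
}

void flash_begin_program(RegWriteFn write_reg)
{
    flash_command(write_reg, FLASH_CMD_PROGRAM);
}

/* Host-side stand-in for testing: logs writes instead of touching hardware. */
uint16_t log_addr[8], log_val[8];
int log_count = 0;

void record_write(uint16_t addr, uint16_t value)
{
    if (log_count < 8) {
        log_addr[log_count] = addr;
        log_val[log_count] = value;
        log_count++;
    }
}
```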

Table 3-1 lists the configuration registers used in ISDLS. They include registers for I/O configuration, timer/counter control, interrupts, the system clock, ADC/DAC, serial interface I/O control and flash access. ISDLS operates by reading and writing specific values from and to these registers, so they are central to the control design of ISDLS.

Table 3-1 The configuration registers used in ISDLS

| Configuration register | Address | Description |
|---|---|---|
| P_IOA_Data | 0x7000 | Write data into data register and read from IOA pad |
| P_IOA_Buffer | 0x7001 | Write data into buffer register and read from buffer register |
| P_IOA_Dir | 0x7002 | Direction vector for IOA |
| P_IOA_Attrib | 0x7003 | Attribute vector for IOA |
| P_IOA_Latch | 0x7004 | Latch Port A data for key-change wake-up |
| P_IOB_Data | 0x7005 | Write data into data register and read from IOB pad |
| P_IOB_Buffer | 0x7006 | Write data into buffer register and read from buffer register |
| P_IOB_Dir | 0x7007 | Direction vector for IOB |
| P_IOB_Attrib | 0x7008 | Attribute vector for IOB |
| P_TimerA_Data | 0x700A | Data port for TimerA |
| P_TimerA_Ctrl | 0x700B | Control port for TimerA |
| P_INT_Ctrl | 0x7010 | Control port for interrupt source |
| P_INT_Clear | 0x7011 | Clear interrupt source |
| P_SystemClock | 0x7013 | Change system clock frequency |
| P_ADC | 0x7014 | Data port for ADC |
| P_ADC_Ctrl | 0x7015 | Control port for ADC |
| P_DAC1 | 0x7017 | Data port for DAC1 |
| P_DAC_Ctrl | 0x702A | Control port for two DACs and audio output mode |
| P_Flash_Ctrl | 0x7555 | Control port for flash access mode |
| P_SIO_Data | 0x701A | Serial interface I/O data port |
| P_SIO_Addr_Low | 0x701B | Address port, low |
| P_SIO_Addr_Mid | 0x701C | Address port, middle |
| P_SIO_Addr_High | 0x701D | Address port, high |
| P_SIO_Ctrl | 0x701E | Control port |
| P_SIO_Start | 0x701F | Start port for serial interface |
| P_SIO_Stop | 0x7020 | Stop port for serial interface |

3.1.2 SUNPLUS SPCE061A EMU Development Board

Designers can use the SUNPLUS SPCE061 EMU development board as a desktop development system. It provides a hardware platform to start developing embedded systems immediately; Figure 3-2 shows a layout diagram of the SPCE061 EMU development board.

Figure 3-2 SUNPLUS SPCE061 EMU development board layout

The SPCE061 EMU development board communicates with a personal computer through the SUNPLUS PROBE. SUNPLUS also provides an integrated development environment (IDE) for developing embedded software. We develop the embedded software in the IDE and download the firmware into the SPCE061A through the PROBE. With the PROBE, we can also debug the source code on the SPCE061A online and modify it in the IDE quickly and easily.

The board provides 32 general-purpose I/O pins, through which users can add peripheral devices such as a keypad and an LED display. The board also provides SIO pins for connecting an external serial RAM.

In the next section, we describe the control design of ISDLS in detail.

3.2 Control design for the embedded system based on SPCE061A

In Section 2.1, we gave a rough sketch of ISDLS. In this section, we implement the whole ISDLS, with all of its system functions, in the embedded system.

These functions include endpoint detection, LPCC feature extraction and DTW. From the previous section, we know how many resources the SPCE061A provides and how to control it through its configuration registers. In addition, implementing ISDLS requires understanding the assembly language of the SUNPLUS µ'nSP™ and writing assembly programs to accomplish the desired functions. This thesis does not describe the assembly language or program code; instead, it outlines the control and design flow of ISDLS.

3.2.1 System Control Flow

According to Chapter 2, ISDLS has two modes: the contextual dialogue mode and the programmable dialogue mode. In the contextual dialogue mode, all of the speech dialogue data are prepared beforehand. In contrast, the programmable dialogue mode has no fixed dialogue data saved in the database; all dialogue data are recorded by the speakers themselves.

Figure 3-3 shows the system control flow of ISDLS, in which both modes appear: the contextual dialogue mode and the programmable dialogue mode. In the contextual dialogue mode, the system takes speech input and compares it with the dialogue document. This involves endpoint detection, LPCC feature extraction and speech response.

The programmable dialogue mode is in turn divided into two sub-modes: programmable dialogue recording and interactive dialogue.

Interactive dialogue is processed like the contextual dialogue mode, except that the recognition result must be looked up in the index table. In programmable dialogue recording, speakers record their own speech dialogue. The system processes each utterance and saves it in the question database or the answer database for later use in interactive dialogue. This processing includes SUNPLUS A2000 speech compression and LPCC feature extraction.

SUNPLUS A2000 is a speech compression algorithm proposed by SUNPLUS, with a bit rate of 16 kbps. The system compresses the recorded dialogue before saving it in the database, which reduces memory usage.

In our system, the LPCC features occupy a data rate of 12.5 kbps.

Figure 3-3 The system control flow of ISDLS


3.2.2 Memory Allocation

Before discussing the design flows of the system functions, we need to know how the memory map is allocated. This is especially important in an embedded system, which has far fewer resources. The SPCE061A provides several kinds of memory for a variety of purposes. Figure 3-4 illustrates the memory map of the proposed dialogue system and the MCU access addresses.

Figure 3-4 Memory allocation for the ISDLS (a) External memory (b) Internal memory

In Figure 3-4(b), the SPCE061A device contains 2K words of high-speed SRAM and 32K words of programmable flash ROM. The program code is stored in the flash ROM from 0x8000 to 0xA7FF. The region from 0xA800 to 0xFFF5 is reserved for the question database, which contains the question features and compressed waveforms for ISDLS. The MCU accesses the question database to recognize questions spoken by the speaker or to play question waveforms to the speaker. Figure 3-4(a) shows the memory map of the answer database, which is stored in the external memory of the embedded system. Table 3-2 lists the memory allocation of the contextual dialogue database and the programmable dialogue database.

Table 3-2 Memory allocation of question database and answer database

| Location | Address Region | Description |
|---|---|---|
| Internal Memory | 0xA800~0xFFF5 | Contextual Dialogue Database |
| External Memory | 0x0000~0xFFFF | Programmable Dialogue Database |

The question database is 20K words. If all of it is used to store A2000-compressed speech, it holds 20 seconds of compressed speech. If it stores speech features instead, it holds about 25 seconds of speech features. The answer database is 32K words, which is enough for 32 seconds of compressed speech or 40 seconds of speech features.
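The capacity figures above can be checked with a short C calculation. It assumes 1K word = 1024 16-bit words, 16 kbps for A2000 and 12.5 kbps for LPCC features (both rates are given in Section 3.2.1). The 20K-word question database works out to about 20.5 s of compressed speech or 26.2 s of features. The 32K-word answer database works out to about 32.8 s or 41.9 s. The text's values of 25 s and 40 s are therefore rounded down. The function names are hypothetical.

```c
#include <stdio.h>

#define WORD_BITS 16   /* SPCE061A memory words are 16 bits wide */

/* Seconds of data that fit in 'words' memory words at the given bit rate. */
double storage_seconds(long words, double bits_per_second)
{
    return (double)words * WORD_BITS / bits_per_second;
}

void print_capacity(void)
{
    const long question_words = 20L * 1024;   /* assumed: 20K words */
    const long answer_words = 32L * 1024;     /* assumed: 32K words */
    printf("question DB: %.2f s A2000, %.2f s LPCC\n",
           storage_seconds(question_words, 16000.0),
           storage_seconds(question_words, 12500.0));
    printf("answer DB:   %.2f s A2000, %.2f s LPCC\n",
           storage_seconds(answer_words, 16000.0),
           storage_seconds(answer_words, 12500.0));
}
```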

The programming SRAM in internal memory is only 2K words. The LPCC variables take 242 words, the DTW parameters take 1380 words, and the SUNPLUS A2000 processing takes 1606 words, so ISDLS needs 3228 words of programming memory in total. The 2K-word programming SRAM is therefore not large enough, and this problem has to be solved.


In ISDLS, the interactive dialogue mode and the programmable dialogue recording mode never run at the same time. Therefore, not all parameters and variables need to exist simultaneously, and they do not all need to reside in the programming SRAM at once; we can schedule and define them phase by phase.

The memory space of these variables is overlaid across the different processing phases, as shown in Figure 3-5: the data for LPCC and DTW (1622 words) and the data for SUNPLUS A2000 (1606 words) share the same memory space, so each phase fits within the 2K-word SRAM.
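The overlay can be pictured as a C union whose members are the variable sets of the two modes. This is an illustrative sketch only: the field names are hypothetical, `int16_t` stands in for one 16-bit SPCE061A word, and the word counts are the ones given in this section. Because a union is only as large as its largest member, the overlay needs 1622 words instead of 3228.

```c
#include <stdint.h>

/* Word counts from Section 3.2.2 (one word = 16 bits). */
#define LPCC_WORDS   242
#define DTW_WORDS   1380
#define A2000_WORDS 1606
#define SRAM_WORDS  2048   /* 2K-word programming SRAM */

/* Hypothetical layout: the members are placeholders for the real variables. */
typedef union {
    struct {                        /* interactive dialogue mode */
        int16_t lpcc[LPCC_WORDS];
        int16_t dtw[DTW_WORDS];
    } recog;
    int16_t a2000[A2000_WORDS];     /* recording/playback with A2000 */
} sram_overlay;

/* The overlay occupies only as many words as its largest member. */
#define OVERLAY_WORDS (sizeof(sram_overlay) / sizeof(int16_t))

_Static_assert(sizeof(sram_overlay) / sizeof(int16_t) <= SRAM_WORDS,
               "overlay must fit in the 2K-word programming SRAM");
```

On the real target the same effect can be achieved by having both modes define their data at the same SRAM addresses, as Figure 3-5 shows; the union simply makes the sharing explicit.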

Figure 3-5 2K words Programming SRAM scheduling and overlay.

[Figure: the 2K-word programming SRAM (0x0000~0x07FF) over time, holding either the LPCC and DTW data (1622 words) or the SUNPLUS A2000 data (1606 words).]


3.2.3 LPCC and DTW units

LPCC and DTW account for most of the computation in ISDLS, so their processing procedures must be modified to meet the real-time requirement. They also consume a large amount of memory. We therefore propose several methods to accelerate LPCC and DTW and to reduce their memory usage.

3.2.3.1 Frame-Synchronous design for LPCC unit

As speech arrives, the system converts it into speech features frame by frame. In this thesis, the frame length is 256 samples with a 64-sample overlap, and the sampling rate is 8 kHz. Conventionally, all samples are first recorded in memory and then processed.

Consecutive samples are 1/8000 second (125 µs) apart. We regard this sampling interval as a processing resource that would otherwise be wasted. Instead of waiting until all speech samples have been collected, our system performs sampling and feature extraction simultaneously, as shown in Figure 3-6: the idle time spread over the sampling procedure is used to execute LPCC feature extraction.

Figure 3-6 Sampling and features extraction simultaneously

The feature extraction algorithm is described in Section 2.3. To use the idle time, we first set up three frame buffers (frame buffers 1, 2 and 3) to store the sampled data. When a frame buffer becomes full, LPCC is computed for that buffer. Each LPCC frame is 256 samples long. Frame buffer 1 holds 256 samples, while in Figure 3-7 frame buffers 2 and 3 hold 192 samples each, because the system combines each of them with the last 64 samples of the previous frame buffer to form a 256-sample frame for LPCC; these 64 samples are the frame overlap.

[Figure: frame buffers 1, 2 and 3 with access points I, II and III; each frame overlaps the previous one by 64 points, the last 64 points of frame buffer 3 are copied to the first 64 points of frame buffer 1, and the LPCC processing time of each frame runs while the next frame buffer is being filled.]

Figure 3-7 Frame-synchronous LPCC with Frame-overlap displacement

The frame-synchronous procedure works step by step as follows:

Step 1: Store the sampled data into frame buffer 1. When the number of stored samples reaches #256 (access point I), start the first LPCC calculation on frame buffer 1.

Step 2: Continue storing sampled data into frame buffer 2 while the first LPCC calculation is running. When the number of stored samples reaches #448 (access point II), frame buffer 2 is full, and the first LPCC calculation should have completed. Frame buffer 2 is then combined with the last 64 samples of frame buffer 1, and the second LPCC calculation starts.

Step 3: Repeat Step 2 for frame buffer 3. When the number of stored samples reaches #640 (access point III), frame buffer 3 is full, and the third LPCC calculation is performed on frame buffer 3 combined with the last 64 samples of frame buffer 2. At the same time, the last 64 samples of frame buffer 3 are copied to the first 64 samples of frame buffer 1, providing the overlap for the next LPCC calculation.

Step 4: Repeat Steps 1 to 3 until the required number of frames is reached or a stop condition is met.

The copy in Step 3 is what keeps the memory usage of LPCC small: because frame buffers 1 to 3 are reused throughout sampling and feature extraction, only 640 words are needed for feature extraction no matter how long the input speech is.
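The buffer handling in Steps 1 to 4 can be sketched in C as below. This is a host-side illustration under our own assumptions, not the thesis's assembly code: the three frame buffers are laid out contiguously in one 640-word array, and the hypothetical `ring_push` is called once per sample, as the sampling interrupt would be. At each access point it returns a pointer to a complete 256-sample frame so that the caller can run LPCC on it.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 256                        /* samples per LPCC frame */
#define OVERLAP    64                        /* samples shared by consecutive frames */
#define SHIFT     (FRAME_LEN - OVERLAP)      /* 192 new samples per frame */
#define BUF_LEN   (FRAME_LEN + 2 * SHIFT)    /* 640 words in total */

/* Buffer 1 = buf[0..255], buffer 2 = buf[256..447], buffer 3 = buf[448..639]. */
typedef struct {
    int16_t buf[BUF_LEN];
    int     pos;             /* index where the next sample is stored */
} frame_ring;

static void ring_init(frame_ring *r) { r->pos = 0; }

/* Store one sample. At access points I, II and III, return a pointer to the
   256-sample frame that is ready for LPCC; otherwise return NULL. */
static const int16_t *ring_push(frame_ring *r, int16_t sample)
{
    const int16_t *frame = NULL;

    r->buf[r->pos++] = sample;
    if (r->pos == FRAME_LEN) {                 /* access point I   (#256) */
        frame = &r->buf[0];
    } else if (r->pos == FRAME_LEN + SHIFT) {  /* access point II  (#448) */
        frame = &r->buf[SHIFT];                /* last 64 of buffer 1 + buffer 2 */
    } else if (r->pos == BUF_LEN) {            /* access point III (#640) */
        frame = &r->buf[2 * SHIFT];            /* last 64 of buffer 2 + buffer 3 */
        /* Step 3: last 64 samples of buffer 3 become the first 64 of buffer 1 */
        memcpy(&r->buf[0], &r->buf[BUF_LEN - OVERLAP], OVERLAP * sizeof(int16_t));
        r->pos = OVERLAP;                      /* refill buffer 1 after the overlap */
    }
    return frame;
}
```

Because the buffers are contiguous, the frames for buffers 2 and 3 pick up the last 64 samples of the preceding buffer without any copying; only the wrap-around from buffer 3 back to buffer 1 needs the 64-sample copy of Step 3. Successive frames start 192 samples apart, which gives the 64-sample overlap.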

The performance of the MCU also limits frame-synchronous LPCC, whose principle is to use the idle time during sampling. If an LPCC calculation cannot complete within one frame processing time (shown in Figure 3-7), the frame-synchronous operation breaks down. We therefore have to make sure that the MCU is fast enough.
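The real-time condition can be quantified. Each frame adds 192 new samples (256 minus the 64-sample overlap), so at 8 kHz one LPCC calculation must finish within 192/8000 s = 24 ms. The helpers below compute this budget. The CPU clock is left as a parameter because this section does not state it; the 24.576 MHz used in the example is only an assumed value. The budget also ignores the cycles consumed by the sampling interrupt itself, so the real margin is smaller.

```c
#define FS_HZ      8000                    /* sampling rate */
#define FRAME_LEN   256                    /* samples per LPCC frame */
#define OVERLAP      64                    /* samples shared with the previous frame */
#define SHIFT      (FRAME_LEN - OVERLAP)   /* 192 new samples per frame */

/* Interval between two samples in microseconds. */
static double sample_period_us(void) { return 1e6 / FS_HZ; }

/* Deadline for one LPCC calculation: the time to collect SHIFT new samples. */
static double frame_deadline_ms(void) { return 1e3 * SHIFT / FS_HZ; }

/* CPU cycles available per frame for a given CPU clock in Hz. */
static long long cycles_per_frame(long long cpu_hz)
{
    return cpu_hz * SHIFT / FS_HZ;
}
```

Under the assumed 24.576 MHz clock, `cycles_per_frame(24576000)` gives 589,824 cycles per frame for LPCC plus everything else that runs between access points.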


3.2.3.2 Frame-Synchronous design for DTW unit

Section 2.4 discussed DTW and showed that it requires a lot of memory, whereas the embedded system has little memory to spare. After analyzing the DTW algorithm, we drop the back-pointer table and restructure DTW for frame-synchronous processing, which reduces memory usage.

Another advantage of frame-synchronous DTW is that, like frame-synchronous LPCC, it can run in the idle time during sampling.

Only two memory buffers, a previous state buffer and a present state buffer, are used to execute DTW. In Figure 3-8, quantities of the previous state carry the subscript “t−1” and those of the present state carry the subscript “t”.

When the system computes the minimum accumulated distance between the input frame feature and the reference frame features at time t, it consults the results computed at time t−1, which are kept in the previous state buffer.

[Figure: reference frame features $y_1, y_2, \ldots, y_M$ against the input frame features at times $t-1$ and $t$; the previous state buffer holds $D_{t-1}(1), \ldots, D_{t-1}(M)$ and the present state holds $d_t(1), \ldots, d_t(M)$.]

Figure 3-8 DTW execution with two memory buffer.

(44)

The execution flow is described below:

Step 1: Initialization

If the previous state buffer is empty, compute $D_t(1) = d_t(1)$; otherwise, compute $D_t(1) = D_{t-1}(1) + d_t(1)$.

Step 2: Iteration

For $i = 2, \ldots, M$: if the previous state buffer is empty, compute $D_t(i) = d_t(i) + D_t(i-1)$; otherwise, compute

$$D_t(i) = d_t(i) + \min\{D_{t-1}(i),\ D_{t-1}(i-1),\ D_t(i-1)\}$$

Step 3: Reproduction

For $i = 1, \ldots, M$: $D_{t-1}(i) = D_t(i)$.

$D_t(M)$ is the minimum distance.

Step 4: Repeat Steps 1 to 3 until all input frame features have been processed.
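Steps 1 to 4 can be sketched in C with exactly two state buffers of M entries each, instead of an M × T table plus back pointers. The sketch rests on assumptions not stated in this section: fixed-point (`int32_t`) features with a hypothetical order of 10, and the sum of absolute differences as the local distance $d_t(i)$; the thesis's FDTW( ) may use a different measure. `MAX_REF` is a hypothetical bound on the number of reference frames M, and C index `i` corresponds to frame `i + 1` in the text.

```c
#include <stdint.h>

#define ORDER   10    /* assumed number of LPCC coefficients per frame */
#define MAX_REF 128   /* hypothetical upper bound on reference frames M */

typedef struct {
    int32_t prev[MAX_REF];  /* previous state buffer: D_{t-1}(i) */
    int32_t cur[MAX_REF];   /* present state buffer:  D_t(i) */
    int     m;              /* number of reference frames M */
    int     have_prev;      /* 0 while the previous state buffer is empty */
} dtw_state;

/* Assumed local distance d_t(i): sum of absolute differences. */
static int32_t local_dist(const int32_t *x, const int32_t *y)
{
    int32_t d = 0;
    for (int k = 0; k < ORDER; k++) {
        int32_t diff = x[k] - y[k];
        d += diff < 0 ? -diff : diff;
    }
    return d;
}

static int32_t min3(int32_t a, int32_t b, int32_t c)
{
    int32_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Reset for a new utterance; returns -1 if M is out of range. */
static int dtw_init(dtw_state *s, int m)
{
    if (m < 1 || m > MAX_REF)
        return -1;
    s->m = m;
    s->have_prev = 0;
    return 0;
}

/* Process one input frame x against the M reference frames (Steps 1 to 3). */
static void dtw_step(dtw_state *s, const int32_t *x, const int32_t ref[][ORDER])
{
    int32_t d = local_dist(x, ref[0]);

    /* Step 1: initialization of D_t(1) */
    s->cur[0] = s->have_prev ? s->prev[0] + d : d;

    /* Step 2: iteration over the remaining reference frames */
    for (int i = 1; i < s->m; i++) {
        d = local_dist(x, ref[i]);
        if (!s->have_prev)
            s->cur[i] = d + s->cur[i - 1];
        else
            s->cur[i] = d + min3(s->prev[i], s->prev[i - 1], s->cur[i - 1]);
    }

    /* Step 3: reproduction, D_{t-1}(i) = D_t(i) */
    for (int i = 0; i < s->m; i++)
        s->prev[i] = s->cur[i];
    s->have_prev = 1;
}

/* D_t(M): accumulated distance after the last processed input frame. */
static int32_t dtw_result(const dtw_state *s) { return s->cur[s->m - 1]; }
```

Each call to `dtw_step` consumes one input frame as soon as its LPCC features are ready, which is what lets DTW run frame-synchronously alongside sampling while using only 2 × M words of state.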


3.2.4 Fixed-point design for LPCC features

Because the SPCE061A lacks an FPU (floating-point unit), its floating-point performance is poor, so we have to convert floating-point values to fixed-point. IEEE 754 single precision is a 32-bit floating-point format; its largest positive value is (2−2⁻²³)×2¹²⁷ and its most negative value is −(2−2⁻²³)×2¹²⁷, so it can represent a very wide range.

Our 32-bit fixed-point format has 16 integer bits and 16 fractional bits. Its largest positive value is 32767.9999847 and its smallest value is −32768. This range is much narrower than that of floating-point, but fixed-point arithmetic is faster for the same operation because it is carried out with ordinary integer instructions. In our LPCC processing, however, the autocorrelation results can exceed 32768, so we keep the autocorrelation results in floating-point and also perform Durbin’s recursive procedure and the cepstrum coefficient conversion in floating-point.

This approach has a critical weakness: the MCU has no floating-point instructions, so every floating-point calculation is emulated in software. DTW takes most of the computation time of ISDLS because it involves a large number of arithmetic operations; if DTW were processed in floating-point, the system would not run in real time. Alternatively, we could use fixed-point throughout, but then the integer part would have to be extended to 32 bits to hold the autocorrelation results, turning the 32-bit fixed-point format into a 48-bit one and requiring more memory for every stored result. That is unsuitable for our system, which does not have that much memory. This is a trade-off between performance and memory.

We propose a method to resolve this dilemma. The cepstrum coefficient conversion is the last step of the LPCC procedure and produces the speech features. After numerical analysis, we store the speech features in fixed-point format, so DTW can process them with fixed-point operations only, which makes ISDLS more efficient. We designed a procedure that converts the floating-point speech features to fixed-point numbers. Figure 3-9 shows the execution sequence of the proposed method, and Figure 3-10 shows the design flow of the floating-point to fixed-point conversion.
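A minimal sketch of a float-to-fixed conversion for the 16.16 format described above (16 integer bits, 16 fractional bits) follows. It assumes round-to-nearest and saturation at the format limits; the thesis's FloatToFix( ) is written in assembly, and its exact rounding behavior is not specified here.

```c
#include <stdint.h>

#define Q16_ONE 65536.0   /* 2^16: scale for 16 fractional bits */

/* Convert a float to 16.16 fixed point, rounding to nearest and saturating
   at the representable range [-32768, 32767.9999847]. */
static int32_t float_to_q16(float f)
{
    double s = (double)f * Q16_ONE;
    if (s >= (double)INT32_MAX) return INT32_MAX;
    if (s <= (double)INT32_MIN) return INT32_MIN;
    s += (s >= 0.0) ? 0.5 : -0.5;   /* round half away from zero */
    return (int32_t)s;              /* the cast truncates toward zero */
}

/* Inverse mapping, useful for checking converted features on a host PC. */
static double q16_to_double(int32_t q) { return q / Q16_ONE; }
```

Once the features are in this format, the DTW distance terms reduce to 32-bit integer additions, subtractions and comparisons, which the MCU executes directly.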

[Figure: Durbin's recursive procedure and cepstrum coefficient (floating-point) → floating to fixed-point conversion → DTW (fixed-point).]

Figure 3-9 Execution sequence of floating-point and fixed-point.

Figure 3-10 Design flow of floating-point conversion


3.3 The circuit design for the embedded system

Figure 3-11 shows the coarse system block diagram. The embedded system circuit includes the power supply, microphone, speaker, keyboard and external memory. The SPCE061A is the core of the embedded system; it executes the program code and controls the peripheral devices.

Figure 3-11 Coarse system block diagram

Table 3-3 lists the circuit signals connected to the SPCE061A and their descriptions. Our design uses 24 pins of the SPCE061A.

Table 3-3 Circuit signals description

Pin Name | Width | Direction | Description

Power supply circuit
VDD | 1 bit | Input | Positive supply for logic.
VSS | 1 bit | Input | Ground reference for logic and I/O pins.
AVDD | 1 bit | Input | Positive supply for analog circuit including ADC and DAC.
AVSS | 1 bit | Input | Ground reference for analog circuit including ADC and DAC.
VDDIO | 1 bit | Input | Positive supply for I/O pins.
VSSIO | 1 bit | Input | Ground reference for I/O pins.

Keyboard
IOA[7:0] | 8 bits | In-Out | Bi-directional I/O ports.

Microphone circuit interface signals
VMIC | 1 bit | Output | Microphone power supply.
MICP | 1 bit | Input | Microphone differential input (positive).
MICN | 1 bit | Input | Microphone differential input (negative).
VADREF | 1 bit | Output | AD reference voltage.
MICOUT | 1 bit | Output | Microphone amplifier output.
OPI | 1 bit | Input | Microphone amplifier input.
AGC | 1 bit | Input | AGC control pin.

Speaker circuit interface signals
DAC1 | 1 bit | Output | Audio DAC1 output.

External serial RAM interface signals
IOB0 | 1 bit | In-Out | Serial interface clock.
IOB1 | 1 bit | In-Out | Serial interface data.

Each part of the circuit is briefly described in the following sections.

3.3.1 Power supply circuit

Our system is powered by only two batteries. VDDIO is the positive supply for the I/O pins and sets the logic-high level to 3.6 V.

Figure 3-12 Power supply circuit


3.3.2 Speech recording circuit

The speech recording circuit consists of the microphone circuit, the AGC circuit and the ADC.

Because most of the ADC and AGC circuitry is integrated inside the SPCE061A, the external circuit is simple and compact. The microphone converts the voice into an electrical signal, a capacitor blocks the DC component, and the signal is then fed to the internal amplifier of the SPCE061A. The AGC circuit adjusts the amplifier gain automatically to keep the signal within a suitable voltage range.

[Figure: microphone circuit connecting the microphone through resistors and capacitors to pins MICP, MICN, VMIC, VADREF, MICOUT, OPI and AGC of the SPCE061A.]

Figure 3-13 Microphone circuit

3.3.3 Amplifier circuit

Because the SPCE061A contains a built-in 10-bit DAC, we only need to connect the speaker to pin DAC1 through a simple BJT amplifier.

Figure 3-14 Speaker circuit


3.3.4 Keyboard circuit

The keyboard circuit is used to control the system flow. The keypad is connected to IOA[7:0].

Figure 3-15 Keyboard circuit

3.3.5 Memory interface circuit

Because the flash ROM of the SPCE061A holds only 32K words, which is not enough to store the dialogue data, we expand the memory space with an external 32K-word serial memory, the SPRS512. The SPCE061A communicates with the SPRS512 through SIO (IOB0 and IOB1), which transfers data bit by bit. Figure 3-17 shows the SIO timing diagram.

Figure 3-16 SIO Memory interface


Figure 3-17 Write and Read timing of SIO

3.3.6 Overall system circuit

Figure 3-18 integrates all of these circuits. In Figure 3-18, pins X32I and X32O are the crystal oscillator input and output, which provide a 32768 Hz oscillation. We reduced the number of circuit components as far as possible to lower the hardware cost of the embedded system.


CHAPTER 4

SYSTEM IMPLEMENTATION AND VERIFICATION

4.1 System implementation

The SUNPLUS µ'nSP® Integrated Development Environment (IDE) provides a GCC compiler and the Xasm16 assembler, so software can be developed in both C and assembly. Figure 4-1 shows the coding flow of the SUNPLUS IDE. We created a project in this IDE to develop ISDLS and wrote the system control flow in C, whose conditional and looping constructs make it easy to express the control flow of ISDLS.

Figure 4-1: The coding flow of SUNPLUS IDE


However, C code does not run efficiently on the SPCE061A; we suspect that GCC does not generate well-optimized code for the µ'nSP® instruction set. The µ'nSP® IDE provides a profile function for analyzing parts of a program, including instruction cycles, IRQs, label flow and other significant information. We used this profile tool to compare C and assembly implementations of the same autocorrelation routine.

Table 4-1 Profile of autocorrelation based on SPCE061A

Function | Source code | Instruction cycles
Autocorrelation | C | 675835
Autocorrelation | Assembler | 79850

As Table 4-1 shows, the assembly version of the autocorrelation needs 79,850 instruction cycles versus 675,835 for the C version, roughly 8.5 times fewer. For this reason, we implement interrupt control, sampling, memory access, endpoint detection, LPCC and DTW in assembly. Figure 4-2 shows the design architecture of ISDLS. The ISDLS functions are developed in separate files, each with its own purpose. System.asm is the most important one; it contains the floating-point operations and the LPCC and DTW functions. ISR.asm handles interrupts: when an interrupt occurs, the corresponding function in ISR.asm is called. Flash.asm and SIO.asm take charge of memory access, and Hardware.inc stores the system parameters.

All functions are listed in Table 4-2. They are called from main.c, which controls the system execution flow.

[Figure 4-2: design architecture of ISDLS, with Main.c on top of SIO.asm, System.asm, ISR.asm, Hardware.inc and Flash.asm.]

Table 4-2 ISDLS functions

Function name | Description

System.asm
System_Initial( ) | Initialize system.
TimerA_Interrupt_Init( ) | Initialize interrupt service.
F_Key_Scan( ) | Keyboard scan function.
ADD32( ) | 32-bit add operation.
SUB32( ) | 32-bit subtract operation.
ABS32( ) | 32-bit absolute value operation.
fabs( ) | Floating-point absolute value operation.
fmin( ) | Return the minimum of two floating-point values.
AutoCorrelation( ) | Autocorrelation initialization.
F_AutoCorrelation1( ) | Return the autocorrelation result.
CalculateOneFrame( ) | Execute Durbin's recursive procedure and cepstrum coefficient conversion.
FloatToFix( ) | Convert floating-point to fixed-point.
FDTW( ) | Fixed-point DTW calculation.

ISR.asm
FIQ( ) | High-priority interrupt function, used for the A2000 compression algorithm.
IRQ1( ) | Low-priority interrupt function, used for speech signal sampling and endpoint detection.

SIO.asm
SIOInitial( ) | Initialize the serial RAM I/O ports.
SIOSendAWord( ) | Send a word to serial RAM.
SIOReadAWord( ) | Read a word from serial RAM.

Flash.asm
CleanFlashPage( ) | Clean the flash ROM.
CopyFeature( ) | Copy speech features to flash ROM.


4.2 System verification

We built an ISDLS prototype board to verify the hardware design. Its circuit contains the power supply, microphone, amplifier, 4x4 keyboard, SPRS512 serial RAM and a 7-segment display for executing the ISDLS functions. We also built the PROBE circuit to download programs into the SPCE061A. Figure 4-3 shows the prototype board.

Testing the board revealed some background noise in the amplifier circuit, which we conjecture is caused by BJT mismatch. In future work we will fix this problem and remove a few components from the prototype board for layout.

Figure 4-3: The prototype board of ISDLS


CHAPTER 5

CONCLUSIONS AND FUTURE WORKS

We proposed a dialogue system for conversation training, called the interactive spoken dialogue learning system (ISDLS). Its dialogue content is replaceable: it can be swapped for a new dialogue document from another language lesson, and users can also record their own dialogues with ISDLS.

To obtain a portable, real-time dialogue learning system, this work proposes an embedded system based on the SPCE061A. According to the computational complexity analysis of the proposed ISDLS, the heaviest computational load comes from the LPCC and DTW calculations, so we optimized both algorithms to make ISDLS operate in real time. Testing showed that the SPCE061A performs within the expected real-time specifications. The system was developed with the SUNPLUS SPCE061A EMU development board. To realize the proposed ISDLS, this work presents the embedded system circuit design, the speech recognition implementation and the control flow design of ISDLS.

In the future, with different software and hardware designs, the proposed embedded system architecture can be used for other sound recognition applications, and the SPCE061A can be integrated into other hand-held devices or speech toys to provide speech recognition.

