
Section 2.1 introduces the three approaches and their principal advantages and disadvantages. Section 2.2 discusses implementations of these approaches and their pros and cons. Section 3 presents REM in detail. Sections 4 and 5 describe the implementation and discuss all scenarios. Section 6 concludes the thesis.

Chapter 2 Background

This chapter covers the background of our research and current dialogue systems.

2.1 Approach of Dialogue

There are three basic strategies for a dialogue system: the finite state-based approach, the frame-based approach [3], and the information state update (ISU) approach. We discuss the pros and cons of each.

2.1.1 Finite State-Based Approach

In a basic finite state-based system, the dialogue structure is represented as a set of states and the possible transitions between them. The states represent information obtained from the user, and the transitions determine the possible paths. The system progresses through a series of states, and the user's input determines which transition is taken. In the finite state-based approach, the system consists of a sequence of predetermined questions and corresponding keywords, as shown in Figure 2.1. All paths and questions are predefined.

Most commercial systems are implemented with this approach, because the dialogue flow is specified as a sequence of states with possible transitions. The system drives the transitions of the dialogue by recognizing the answer to each question, that is, words or phrases in the user's responses.

We take home control as an example. The example shows the specified flow of a dialogue system that verifies the user's response at each state. We assume that the accuracy of the speech recognizer is 100%.

Figure 2.1: Finite State-Based approach for component in home control

System: Which one do you want to operate? [Question 0]

User: The television in the bedroom. [Keyword 0]

System: Which television? [Question 1]

User: In the bedroom. [Keyword 1]

System: What is the instruction? [Question 2]

User: Turn on and go to channel 36. [Keyword 2]

System: What is the next instruction? [Question 4]

User: Go to channel 36. [Keyword 4]

System: What is the next instruction?

User: No.

In this conversation, the system asks a sequence of questions about the controlled device and its behaviors. The user's answers have to satisfy the transition condition; otherwise the system asks the same question until the correct keywords occur. The system cannot handle over-information, that is, a response that provides more information than the current question asks for [4]. A problem with the ordering of behaviors also occurs. If a user gives the "Go to channel 36" command before the "Turn on the TV" command, the system cannot execute the commands correctly.
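The behavior described above can be illustrated with a minimal sketch. The question texts, the keyword sets, and the function name run_dialogue are our own illustrative assumptions and do not come from any particular system; substring matching stands in for the speech recognizer.

```python
# Hypothetical states: (question, accepted keywords), loosely following Figure 2.1.
STATES = [
    ("Which one do you want to operate?", {"television", "tv"}),
    ("Which television?", {"bedroom", "living room"}),
    ("What is the instruction?", {"turn on", "turn off"}),
]


def run_dialogue(user_inputs):
    """Walk the fixed question sequence; return (collected keywords, questions asked)."""
    collected, asked = [], []
    state = 0
    for utterance in user_inputs:
        if state == len(STATES):
            break  # final state reached; remaining input is ignored
        question, keywords = STATES[state]
        asked.append(question)
        text = utterance.lower()
        match = next((k for k in sorted(keywords) if k in text), None)
        if match is None:
            continue  # transition condition not met: the same question is repeated
        collected.append(match)  # only one keyword is consumed per state
        state += 1
    return collected, asked
```

Note that the first utterance in Figure 2.1 already names the bedroom, yet the sketch, like the system it imitates, still asks "Which television?" because each state consumes only one keyword.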

Generally, a system using the finite state-based approach restricts the user's input to a word or phrase. Because each state receives a predetermined word or simple phrase, language understanding becomes easier, and the system is easy to construct step by step. However, the dialogue flow is passive for the user because of the restricted input. Even if the input is natural, such as full sentences, the system must extract the words or phrases from the sentences using the speech recognizer.

In 1996, B. Hansen described a toolkit based on a typical finite state machine that automatically generates prompts in a variety of styles [5]. The toolkit provides several styles for the format of questions, so a programmer can design a set of questions in the selected styles and construct a system from a series of predefined questions.

Advantages

The principal advantages of the finite state-based approach are its simplicity and its execution efficiency. For a developer, a predetermined transition network suits well-structured tasks that involve question-response conversation. A well-structured task has ordered states, information at each state that does not depend on later states, and a clear transition network. The constructed system guides the flow of the dialogue and decides the next question.

Moreover, because the user's responses are restricted, the technological demands are reduced, particularly on the speech recognizer. The lack of flexibility is the trade-off against natural input. If the dialogue system is built into a mobile device, the technological demand is a major implementation issue. For these reasons, most commercial systems are implemented with the finite state-based approach. As in the example above, developers define a series of states and a transition network to complete a well-structured task.

Disadvantages

The finite state-based approach is not suitable for less well-structured tasks that involve unpredicted conversation between the user and the system. In a less well-structured task, the information of the states is interdependent or the transition network changes at runtime. Dependence means that the current state needs information from the next state.

A simple example of a changing network is an entry point that changes or is unpredictable each time. Because a system using the finite state-based approach has a fixed transition network, the entry point cannot change. This is not natural for the user, who must give the predetermined keywords in the order the dialogue system requires. As a result, the system cannot handle additional information within a single turn. For the user, a dialogue system using the finite state-based approach is not flexible enough for everyday use.

2.1.2 Frame-Based Approach

A basic frame-based system asks the user questions that enable the system to fill slots in a template in order to perform a task. In the frame-based approach, the dialogue flow is not predetermined but depends on the user's input. The frame-based approach is therefore user-initiative, because the user provides information on the user's own initiative.

The questions or tasks in the frame-based approach depend on their preconditions. A precondition is information from the user, that is, the answer to a question. In the Philips timetable system [6], predefined conditions compose each task or question.

The frame-based approach can also incorporate a mathematical model for decision making. In 2005, Chin-Han Tsai proposed a dialogue strategy [7] for navigation that uses SA-Q learning [8] and Markov decision processes [9].

We take two home-control examples. They show the specified conditions of a dialogue system using the frame-based approach, as shown in Figure 2.2. We assume that the accuracy of the speech recognizer is 100%.

Figure 2.2: Frame-Based Model

Conversation 1

System: What is your command?

User: Turn on the TV. [Task 2]

System: Which TV? [Question 2]

User: In the bedroom.

System: Complete your command. [Task 3]

Conversation 2

System: What is your command?

User: The TV in the bedroom. [Task 4]

System: What is your instruction? [Question 4]

User: Turn on.

System: Complete your command. [Task 3]

In both conversations, the system first asks a general question. The user can give any imperative sentence as a command, and the system then finds the relevant keywords in the user's command. The system fills the slots (or satisfies the conditions) to complete a task.

The system can handle over-information in some circumstances, when the user provides more information than was asked for. However, if the programmer did not anticipate a situation, the system cannot complete the task. In summary, the preconditions determine the flexibility of the dialogue system and how the system executes actions.
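The two conversations can be reproduced with a minimal slot-filling sketch. The slot names, keyword lists, question texts, and function names below are our own illustrative assumptions, not the Philips system or any other published implementation.

```python
# Hypothetical slot vocabulary for one home-control frame.
SLOT_KEYWORDS = {
    "device": ["tv"],
    "location": ["bedroom", "living room"],
    "action": ["turn on", "turn off"],
}
# Hypothetical question asked when the corresponding slot is still empty.
QUESTIONS = {
    "device": "What is your command?",
    "location": "Which TV?",
    "action": "What is your instruction?",
}


def fill_slots(frame, utterance):
    """Fill every empty slot whose keyword occurs in the utterance."""
    text = utterance.lower()
    for slot, keywords in SLOT_KEYWORDS.items():
        if frame.get(slot) is None:
            for keyword in keywords:
                if keyword in text:
                    frame[slot] = keyword
                    break
    return frame


def next_question(frame):
    """Return the question for the first empty slot, or None when the frame is complete."""
    for slot in SLOT_KEYWORDS:
        if frame.get(slot) is None:
            return QUESTIONS[slot]
    return None
```

Because one utterance may fill several slots, the order of questions depends on what the user has already said, which is the source of the flexibility discussed above. An utterance whose keywords are absent from SLOT_KEYWORDS fills nothing, which corresponds to a situation the programmer did not anticipate.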

Advantages

The frame-based approach has several advantages over the finite state-based approach.

For users, the frame-based approach offers greater flexibility and the ability to use natural language as input. It is difficult to constrain the user's responses to those required by the system, even when the system has been carefully designed [10]. Under the frame-based approach, the user can provide over-information. Fewer transitions are then needed, which results in a more efficient and more natural dialogue flow.

Disadvantages

Both the finite state-based approach and the frame-based approach are appropriate only for well-defined tasks. In the frame-based approach, every task must be decomposed into several meaningful slots (or conditions), and only well-defined tasks can be decomposed.

In this context, the determination of the system's next action is fairly limited: developers define when tasks (or questions) occur according to the chosen sets of conditions. The frame-based approach therefore lacks scalability.

2.1.3 Information State Update Approach

The ISU approach consists of five concepts.

• Informational components, including aspects of common context and internal motivating factors.

• Formal representations of informational components.

• A set of dialogue moves that trigger the update of the information state.

• A set of update rules that govern the updating of the information state.

• An update strategy for deciding which rule(s) to apply.

In a sense, the ISU approach is similar to the frame-based approach: both have conditions and rules. The difference is that the ISU approach has an information state, which represents the dialogue state.

Moreover, the programmer must define the informational components in a formal representation. The informational components can be private or public information. Further, the dialogue moves serve as triggers to update the information state. The set of dialogue moves influences the possible messages that can be sent and the updates to be made.

Finally, the programmer should design a set of update rules and choose an update strategy. The update rules formalize the way the information state is changed. There are several types of update strategy, such as taking the first applicable rule or applying each rule in sequence.
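A minimal sketch of these concepts follows, assuming the information state and the dialogue move are plain dictionaries and each update rule is a (name, precondition, effect) triple. The rule names, the dictionary keys, and the function names are hypothetical and chosen only for illustration.

```python
def make_rules():
    """Hypothetical update rules as (name, precondition, effect) triples."""
    def integrate_device(state, move):
        state["device"] = move["device"]

    def integrate_action(state, move):
        state["actions"].append(move["action"])

    return [
        ("integrate_device", lambda s, m: "device" in m, integrate_device),
        ("integrate_action", lambda s, m: "action" in m, integrate_action),
    ]


def update(state, move, rules, strategy="all"):
    """Apply update rules to the information state for one dialogue move.

    strategy="first": apply only the first rule whose precondition holds.
    strategy="all":   apply every applicable rule in sequence.
    Returns the names of the rules that fired.
    """
    fired = []
    for name, precondition, effect in rules:
        if precondition(state, move):
            effect(state, move)
            fired.append(name)
            if strategy == "first":
                break
    return fired
```

With the move {"device": "tv", "action": "turn on"}, the "all" strategy fires both rules, while the "first" strategy integrates only the device and leaves the action for a later update.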

Advantages

The ISU approach has several advantages over the frame-based approach. It provides the information state to represent the dialogue state, and its rules can be more complex than those of the frame-based approach.

The ISU approach is more flexible than the frame-based approach, because the dialogue moves allow the system to be mixed-initiative. The dialogue moves can determine the next information state from the previous information state and the user's input.

As with the frame-based approach, it is difficult to constrain the user's responses to those required by the system, even when the system has been carefully designed. The ISU approach can update a set of informational components within a single conversation.

Disadvantages

The disadvantages of the ISU approach lie in how to define the informational components that represent the information state and how to design the conditions and effects of the rules. In short, the ISU approach is more complex than the frame-based model because it involves more levels.

2.2 Current Dialogue System

In this section we look at a few implementations of dialogue systems built with a finite state-based or frame-based strategy.

A number of researchers have focused on a single strategy for implementation. Some simple applications, such as telephone booking systems, are still constructed with the finite state-based strategy. The requirements of a telephone booking system are easily structured, so the finite state-based strategy is sufficient.

On the other hand, the frame-based strategy is mostly applied to database query systems, such as voice navigation systems. A voice navigation system analyzes the user's instruction, for example "Take me to NBA store on 5th avenue." The system fills in the slots of the specified task. The user's instruction is taken apart into several patterns, such as verb or destination. The system completes the user's request based on the result of decomposing the user's input.

2.2.1 FSM-based Dialogue Systems

Since 1992, the Center for Spoken Language Understanding (CSLU) has developed the CSLU Toolkit [11] as a complete system, including speech recognition and authoring tools. With the CSLU Toolkit, we can use the Rapid Application Developer (RAD) to build real-world dialogue systems. A programmer defines activities by placing objects and assigning transitions between activities. In 1999, Michael F. McTear reported on giving undergraduate students practical experience in the specification and development of spoken dialogue systems [12].

We use a simple ticketing system as an illustration, following the theory of finite state machines [13]. We define a set of questions as a finite set of states S = {Departure, Time, Destination, End}, and we define the transitions as shown in Figure 2.3.

First, the ticketing system asks the user for the departure point. If the system does not get an exact answer from the user, it stays in the state Departure until it receives the correct answer. Once the ticketing system has received correct answers for the three states Departure, Time, and Destination, it enters the state End.
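The ticketing machine can be sketched in a few lines. We assume that the transition labels 1 and 0 in Figure 2.3 stand for a correct and an incorrect answer respectively; the names TICKET_STATES, step, and run are hypothetical.

```python
# The finite set of states S, in the order in which the questions are asked.
TICKET_STATES = ["Departure", "Time", "Destination", "End"]


def step(state, answer_ok):
    """Advance to the next state on a correct answer (1); otherwise stay (0)."""
    if state == "End":
        return "End"
    if answer_ok:
        return TICKET_STATES[TICKET_STATES.index(state) + 1]
    return state


def run(answers, start="Departure"):
    """Feed a sequence of 0/1 answer flags and return the visited states."""
    path = [start]
    for ok in answers:
        path.append(step(path[-1], ok))
    return path
```

For example, run([0, 1, 1, 0, 1]) holds the state Departure once, holds Destination once, and then reaches End.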

The ticketing system with predefined questions lacks flexibility. The user must satisfy the condition of every state, which is unfriendly. Even if programmers design a comprehensive finite state machine, unpredictable user input or unpredictable device behaviour can cause the system to fail.

2.2.2 Frame-based Dialogue Systems

In 1995, "The Philips Automatic Train Timetable Information System" [6] provided information about train connections between German cities. In 2000, "MIMIC: an adaptive mixed initiative spoken dialogue system for information queries" [14] provided movie showtime information. The two systems had the same aim: constructing an appropriate database query for what the user requires. The Philips system is still user-initiative, whereas MIMIC is mixed-initiative, modelling initiative during dialogue interaction [15].

The task specification in MIMIC consists of four slots: Question-Type, Movie, Theater, and Town. MIMIC uses a goal selection algorithm to determine the action of the system to find the goal with basic probability (bap). MIMIC updates the baps at runtime, starting from a set of initial values.

Figure 2.3: Example of Finite State-Based Dialogue System (states Departure, Time, Destination, and End; transitions labeled 0 and 1)

The frame-based strategy is suitable for well-defined tasks. If a task has an unknown number of slots, e.g., controlling a device, the frame-based strategy becomes complex, because a user may give a series of instructions to a device. To handle an unknown number of slots, the developer must design more combinations of slots for tasks. In this respect, the frame-based strategy resembles the finite state-based strategy: both cope poorly with unanticipated situations.

2.2.3 Information State Update Dialogue System

In 2007, Amores et al. [16] [17] proposed MIMUS, a multimodal and multilingual dialogue system for the home domain. MIMUS follows the Information State Update approach to dialogue management and was developed under the EU-funded TALK project [18].

MIMUS

The architecture of MIMUS is a set of OAA (Open Agent Architecture) agents [2]. The system core is the Dialogue Manager, which processes all requests from other agents and the user's input, and provides appropriate output. Information transfer between all agents is controlled by the system core.

Because of the Open Agent Architecture, each manager completes a part of the user's request. Communication between managers still passes through the dialogue manager: each manager sends the result of its subtask to the dialogue manager. The main approach used to implement the system is Information State Update (ISU). The principal element of the ISU approach is the dialogue history, which records dialogue states and is updated by update rules.

The informational components in MIMUS are Dialogue Move, Type, Arguments, and Contents (DTAC) [19]. The DTAC obtained for a keyword or a phrase triggers the dialogue update rule.

2.3 Summary

In summary, every approach has its own advantages and disadvantages, and implementations exist for each. Some are easier to construct; some are friendlier for users. However, the implemented systems must obtain missing information by asking the user.

Chapter 3 Design

3.1 Preliminaries

In this section we define the notation for control commands. We were inspired by Phrase Structure Grammar (PSG), a grammatical notion presented by Chomsky in Syntactic Structures (1957) to represent the structure of phrases in a language [20]. Based on PSG, we can decompose a sentence into noun phrases (NP) and verb phrases (VP). In general, a VP contains verbs and NPs, and an NP contains nouns.

Further, we define some notation (Table 3.1). We call the device that receives the user's command the dominated device (Cx). A general noun (N) often denotes the operated state of the dominated device. The user's commands map to the behaviors (bxi) of the dominated device. The subscript x corresponds to the operated device (Cx), and the subscript i is an index giving the order of the behaviors. We define the description Dx, which includes adj and Loc.

Table 3.1: Notation of Cluster

Table 3.2: Notation of Set

We also define notation for sets (Table 3.2). Following PSG, the structure of a sentence is often drawn as a parse tree, as shown in Figure 3.1. From the parse tree, we can easily obtain the structure of the sentence. Consider the example "Turn on the red TV in the room and go to channel 36". We can decompose the sentence into several parts using PSG.

First, the sentence is split into two VPs according to the verbs "Turn on" and "go to". Second, the VPs are split into several segments (Table 3.3).
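The first decomposition step can be approximated without a full parser. The sketch below assumes a small hand-written verb lexicon (VERBS) instead of a real PSG parse, so it only illustrates the idea of splitting a command at its verbs; the function name split_verb_phrases is hypothetical.

```python
import re

# Hypothetical verb lexicon; a real system would take the verbs from a tagger or parser.
VERBS = ["turn on", "turn off", "go to"]


def split_verb_phrases(sentence):
    """Split an imperative sentence into verb phrases, one per known verb."""
    pattern = "|".join(re.escape(v) for v in VERBS)
    text = sentence.lower()
    starts = [m.start() for m in re.finditer(r"\b(?:%s)\b" % pattern, text)]
    phrases = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        chunk = text[start:end].strip()
        # Drop a trailing conjunction that joined this VP to the next one.
        chunk = re.sub(r"\s+and$", "", chunk)
        phrases.append(chunk)
    return phrases
```

Applied to the example sentence, the function returns two verb phrases, one beginning with "turn on" and one beginning with "go to".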

3.2 Definition of Task

We define a task as having three slots: Cx, bxi, and Dx. A task represents an imperative machine-control command composed of the elements defined above. The user's sentence is transformed through semantic parsing, and the task generator, introduced in the next section, generates the task from the result.

A task must include at least Cx or bxi; for example, a Cx alone can compose a task. A task is composed of only these three kinds of elements, but the number of bxi and Dx is not limited.

A task may include more than one behavior (bx), for example, task = {tv, turn on, go to the channel 36, bedroom}. This task means that the operated device tv receives more than one command, namely turn on and go to channel 36.

However, information may be missing unexpectedly, for example, task = {tv, bedroom}. This task lacks behaviors; the sub-models in Section 3.3 analyze the task and deal with the missing information.
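The task definition can be sketched as a small data structure. The class name Task, its field names, and the helper methods are our own assumptions for illustration; they are not the REM implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """A task with the three slots Cx, bxi, and Dx defined in this section."""
    component: Optional[str] = None                        # Cx: the dominated device
    behaviors: List[str] = field(default_factory=list)     # bx1, bx2, ... in order
    descriptions: List[str] = field(default_factory=list)  # Dx: adj and Loc

    def is_valid(self):
        """A task needs at least a component or one behavior."""
        return self.component is not None or bool(self.behaviors)

    def missing(self):
        """Names of the slots that still lack information (hypothetical helper)."""
        gaps = []
        if self.component is None:
            gaps.append("component")
        if not self.behaviors:
            gaps.append("behaviors")
        return gaps
```

For the two example tasks, Task("tv", ["turn on", "go to the channel 36"], ["bedroom"]) has no missing slots, whereas Task("tv", [], ["bedroom"]) reports that behaviors are missing, which is the information the system would have to obtain.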

3.3 Block Diagram

In this section we discuss the block diagram of REM. The architecture of the proposed model follows OAA [2]. All sub-models in the model are agents, each responsible for a different function, based on the theory of distributed agents. Every agent is thus responsible for a part of the task. The block diagram is shown in Figure 3.2.

We now introduce the functions of the sub-models in REM and their flow charts.

Component Sub-model

The facilitator receives the tagged words and dispatches a part of the task to the suitable agent according to the label. The tagged words, which carry information about the component, including its descriptions, are stored in memory, and the facilitator sends the words to the component sub-model. The component sub-model fills the actual command into the instruction file.

A component mapping table supports the component sub-model in mapping the user's command to an actual instruction. The component sub-model can handle an unknown device according to some rules. For example, if we want to add a new TV to the system, NLTK can identify the word "TV", and the word "TV" has a device type code, so the component sub-model can update the component mapping table. The component sub-model analyzes the description of the device, including its location and additional features of the device. The

