

基於人類示範之意圖推論於工具操作任務之應用

Intention Deduction by Demonstration for Tool-Handling Tasks

研究生: 陳豪宇
Student: Hoa-Yu Chan

指導教授: 傅心家 教授
Advisor: Prof. Hsin-Chia Fu

楊谷洋 教授
Prof. Kuu-young Young

國立交通大學資訊科學與工程研究所博士論文

A Dissertation
Submitted to the Institute of Computer Science and Engineering
College of Computer Science
National Chiao Tung University
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
in
Computer Science

June 2012

中文摘要

In the near future, it can be expected that more and more robot working environments will shift from factories to home environments, and it is impractical for a robot to require a dedicated program for every household task it may face. Researchers have therefore proposed the concept of letting the robot learn from demonstration, which can reduce the user's burden of analysis and programming. Many methods that follow this concept, however, need to restrict the user's motions or the task plan in order to deduce the user's intention. To avoid these restrictions, we propose a new approach that lets the robot deduce the demonstrator's intention from the trajectories recorded during the execution of tool-handling tasks. Tool-handling tasks are common in the home environment, but analyzing the demonstrator's intention in them is not easy. Our approach is based on the concept of cross-validation to locate the trajectory segments that correspond to delicate and skillful maneuvering, and it uses dynamic programming to search for the most probable intention. The proposed approach does not require predefined motions and does not constrain the motion speed; it also allows the order of operations to be altered and redundant operations to be added during demonstration. In the experiments, the proposed approach is tested on three tasks, namely the pouring, coffee-making, and fruit jam tasks, with the locations and the number of the task objects varied across demonstrations to test their influence. We further analyze the influence of the task parameters to study the applicability of the approach. The experimental results show that our approach can not only deduce the user's intention for these three tasks, but also allow the user to demonstrate in a more natural and efficient manner without constraints.

Intention Deduction by Demonstration for Tool-Handling Tasks

Student: Hoa-Yu Chan
Advisor: Prof. Hsin-Chia Fu
Prof. Kuu-young Young

Institute of Computer Science and Engineering
National Chiao Tung University

Abstract

In the near future, as more robots come into home-like environments, programming them for task execution becomes very demanding, if not infeasible. The concept of learning from demonstration is thus introduced, which may remove the load of detailed analysis and programming from the user. However, many methods which follow the concept of learning from demonstration limit the motions of the user or the task plan in order to deduce the intentions of the user. To avoid these limitations, in this dissertation, we propose a novel approach for the robot to deduce the intention of the demonstrator from the trajectories recorded during task execution. We focus on the tool-handling task, which is common in the home environment but complicated to analyze. We apply the concept of cross-validation to locate the portions of the trajectory that correspond to delicate and skillful maneuvering, and apply an algorithm based on dynamic programming to search for the most probable intention. The proposed approach does not predefine motions or put constraints on motion speed, while allowing the event order to be altered and redundant operations to be present during demonstration. In the experiments, we apply the proposed approach to three different kinds of tasks: the pouring, coffee-making, and fruit jam tasks, with the number of objects and their locations varied during demonstrations. To further investigate its scalability and generality, we also perform intensive analysis on the parameters involved in the tasks. The results show that our approach can not only deduce the intention of the user for these three kinds of tasks, but also allow the user to demonstrate in a natural and efficient manner without constraints.

Acknowledgment

First, I would like to thank my advisors, Prof. Kuu-young Young and Prof. Hsin-Chia Fu, for their guidance, support, and assistance throughout the ten years of my doctoral studies. I also thank the members of my oral defense committee, Prof. 林正中, Prof. 單智君, Prof. 蔡清池, Prof. 蘇木春, Prof. 許秋婷, and Dr. 林銘瑤, for the valuable comments they gave during the defense, which made this dissertation more complete. I am also grateful to my classmates 一元, 木政, 修任, and 一哲, and to my labmates in the「人與機器實驗室」, for their help with the research and experiments, without which this work could not have been completed. Finally, I thank my family: my parents, my grandfather and grandmother, my sister, and our pets, the cats, the turtle, and the dog 乖乖.


Table of contents

中文摘要 . . . ii

Abstract . . . iii

Acknowledgment . . . iv

Table of contents . . . v

List of Tables . . . vii

List of Figures . . . viii

Nomenclature . . . xi

1 Introduction . . . 1

1.1 Background . . . 1

1.2 Contributions of the Dissertation . . . 2

1.3 Organization of the Dissertation . . . 3

2 Preliminaries and Surveys . . . 4

2.1 Motion Recognition Type . . . 4

2.2 Motion Matching Type . . . 5

2.3 Motion Synchronization Type . . . 6

3 Proposed Approach . . . 7

3.1 Intention Deduction . . . 7

3.1.1 Intention in Tool-handling Task . . . 7


3.2 Similar Function . . . 16

3.2.1 Reasoning for Similar Function . . . 16

3.2.2 Implementation of Similar Function . . . 18

3.3 Experimental Design for Extensibility and Robustness . . . 20

4 Experiments . . . 22

4.1 System Implementation . . . 22

4.2 Pouring Task . . . 27

4.2.1 Robustness in Pouring Tasks . . . 31

4.3 Coffee-making Task . . . 34

4.3.1 Robustness in Coffee-making Tasks . . . 41

4.4 Fruit Jam Task . . . 45

4.5 Discussion . . . 46

4.5.1 Comparison with Previous Approaches . . . 46

4.5.2 Application Scenario for Breakfast Preparation . . . 47

5 Conclusions . . . 48

5.1 Conclusions . . . 48

5.2 Future Research . . . 48

List of Tables

4.1 Specifications of the tracking system Polhemus FASTRAK . . . 25
4.2 Specifications of the Mitsubishi RV-2A . . . 26
4.3 Denavit-Hartenberg parameters of the Mitsubishi RV-2A . . . 26
4.4 Average position error between the trajectories of human operator and the generated trajectories in the pouring tasks . . . 31
4.5 Number of success in the presence of redundant operations . . . 33
4.6 The average path length and demonstration time in the pouring tasks . . . 34
4.7 Average position error between the trajectories of human operator

List of Figures

3.1 Conceptual diagram of the proposed approach . . . 8
3.2 A pouring task: (a) the setting of the vessels, (b) pouring vessel A to vessel B, and (c) pouring vessel A to vessel C . . . 8
3.3 Process for motion generation . . . 9
3.4 Examples for (a) motion cutting and (b) motion generation based on the pouring task shown in Fig. 3.2 . . . 10
3.5 Process for MI evaluation: (a) standard MI evaluation and (b) MI evaluation with strategy . . . 10
3.6 Process for MI generation . . . 12
3.7 Process for optimal MI derivation . . . 12
3.8 The validation of MI: (a) with MI and (b) with MI and operating order . . . 17
4.1 System implementation . . . 23
4.2 Tracking system Polhemus FASTRAK . . . 24
4.3 Long range transmitter . . . 24
4.4 Mitsubishi RV-2A type six-axis robot arm . . . 25
4.5 Experimental setups for the pouring task: (a) human demonstration and (b) robot execution . . . 28
4.6 The derived intentions for the eight demonstrations of the pouring task . . . 29
4.7 Experimental results for the pouring 4 cups task executed by both the human operator and robot manipulator under new environmental states: (a) trajectory in X-Y plane, (b) variation of the height, and (c) variation of the tilt angle of vessel A . . . 30
4.8 The errors of the generated trajectories with the different numbers of the training demonstrations in the pouring 2~4 cups task . . . 32
4.9 The calculating time with the different numbers of demonstrations in the six pouring tasks . . . 33
4.10 Experimental setups for the coffee-making task: (a) human demonstration and (b) robot execution . . . 35
4.11 The derived intentions for the eight demonstrations of the coffee-making task 1 . . . 37
4.12 The derived intentions for the eight demonstrations of the coffee-making task 2 . . . 38
4.13 Experimental results for the coffee-making task 1 executed by both the human operator and robot manipulator under new environmental states: (a) trajectory in X-Y plane, (b) variation of the height, and (c) variation of the tilt angle of spoon A . . . 39
4.14 Experimental results for the coffee-making task 2 executed by both the human operator and robot manipulator under new environmental states: (a) trajectory in X-Y plane, (b) variation of the height, and (c) variation of the tilt angle of spoon A . . . 40
4.15 The errors of the generated trajectories with the different numbers of the training demonstrations in the coffee-making tasks . . . 41
4.16 The calculating time with the different numbers of demonstrations in
4.17 Experimental setups for the fruit jam task: (a) human demonstration and (b) robot execution . . . 42
4.18 The derived intentions for the ten demonstrations of the fruit jam task . . . 43
4.19 Experimental results for the fruit jam task executed by both the human operator and robot manipulator under new environmental states: (a) trajectory in X-Y plane, (b) variation of the height, and (c) variation of the tilt angle of knife A . . . 44

Nomenclature

D    delicate motion
E    the difference between a generated motion and a validating motion
G    generated motion
i    an index of training motions or generated motions
j, k an index of delicate motions or move motions
M    move motion
MI   an index linking to a set of delicate motions
Q    a motion consisting of delicate motions and move motions
T    training motion

Chapter 1

Introduction

1.1 Background

Due to the progress in service robots, more robots are entering the home or office environment. It can be expected that many challenging problems will emerge when they deal with these highly uncertain and varying environments, such as path planning and manipulation [1, 2]. One issue of interest is how to teach the robot to perform daily tasks effectively. To relieve the human operator from detailed task analysis and program coding, researchers have proposed letting the robot learn how to execute the task by itself from observing human demonstration [3]. However, the motions of a human demonstration are difficult to analyze because the motions related to the home or office environment may not be predefinable for robot learning. Besides, at the trajectory level, it is difficult for a human to repeat a motion at exactly the same speed and along exactly the same trajectory over multiple demonstrations, and the operated objects may not be fixed in the environment. Moreover, at the task level, some tasks can be accomplished with a varying operating order or with redundant motions. These pose challenges to learning from demonstration.

Many researchers have proposed approaches to tackling these challenges [4–7]. Among them, Calinon proposed an approach using Gaussian Mixture Regression (GMR) and Lagrange optimization to extract the unchanged motions from multiple demonstrations [8, 9]; that approach demands that the order of the operating motions be the same during demonstrations. Dautenhahn and Nehaniv proposed an approach for the robot to learn from human demonstration by imitation [10], referred to as the correspondence problem, and later the team developed a system that can learn 2D arranging tasks [11, 12]. Dillmann proposed a hierarchical structure for the robot to deal with complex tasks while the motion order can be changed [13], and later they went on to analyze human motion features for high-level tasks [14]. With both symbolic and trajectory levels of skill representation, Ogawara proposed a method that determines the essential motions from the possible motions [15]. However, these methods may need to predefine human motions, limit the human operator to certain motion types or speeds, or limit the operating order. The robot may thus not be able to learn the task automatically, or the human operator may not be able to demonstrate the task naturally and efficiently. In contrast, babies of 7 to 8 months old can learn motions from regular patterns, although the learning process is still unclear [16]. This motivates us to propose a method which can learn motions by demonstration without these shortcomings.

1.2 Contributions of the Dissertation

In this dissertation, we propose an approach for the robot to learn the human intention from her/his demonstration. To allow the human operator more natural and efficient manipulation during demonstration, the proposed approach (a) does not need to pre-define motions, (b) does not constrain the operator to perform the task with certain motion speed or motion type, (c) allows the order of the events to be altered, and (d) allows some redundant operations.

For the motions of human manipulations, Dillmann classifies them into three different types by their goals: transport operations for moving objects, device handling for changing the internal states of the objects, and tool handling for using a tool to interact with objects. We focus on the tool-handling task, which is common in daily life [14]. The motions of this kind of task can be performed continuously without stopping, because the tool can be operated on multiple objects sequentially without leaving the hand, and the task is complicated to analyze. Without predefined motions, the method of cross-validation is suitable for deciding the unknown parameters of a learning system. Based on the concept of cross-validation, but with some modification, we propose an approach to identifying the portions of the trajectory corresponding to the delicate and dexterous maneuvers of the demonstrator, referred to as motion features. These motion features, in some sense, exhibit the human skill in executing a certain task. The challenge is how to find the correct intentions among all possible candidates; to this end, we apply the method of dynamic programming for the search. For demonstration, experiments based on three different kinds of tasks, the pouring, coffee-making, and fruit jam tasks, are performed. During the experiments, the locations of the operated objects and the operating sequence may vary, and the motion features derived from the demonstrated trajectories are used for task execution under different experimental settings. To further demonstrate the scalability and generality of the proposed approach, we perform intensive analysis on the parameters involved in the tasks, such as the numbers of objects and demonstrations, among others.

1.3 Organization of the Dissertation

This dissertation is divided into five chapters. In Chapter 1, the background of robot learning from demonstration and the contributions are introduced to explain why we propose such a new approach for intention deduction. In Chapter 2, previous approaches are introduced and discussed. In Chapter 3, we describe the proposed approach, which is based on the concept of cross-validation to deduce the intention from demonstrations. In Chapter 4, the experimental results are reported for performance evaluation, and a discussion on the proposed approach is given. Finally, in Chapter 5, the conclusions are presented and suggestions are stated for future research.

Chapter 2

Preliminaries and Surveys

Many researchers have proposed letting the robot learn how to execute the tool-handling task by itself from observing human demonstration [4–7]. The tool-handling task is the focus of this dissertation; it involves the interactions between tools and objects [17]. The resultant trajectory from task execution can be mainly divided into two types of motions: delicate motions (D) for delicate maneuvers and move motions (M) between the delicate motions [15]. The delicate motion is of much interest, since it serves to achieve the goal; by contrast, the move motion is not very critical. Meanwhile, the delicate and move motions are executed alternately. As some delicate motions are noncontact motions, which do not contact the operated objects, such as pouring motions, the operated objects may not be determined directly. Besides, some tasks may still be accomplished when the order of executing the delicate motions is changed or when redundant delicate motions are added during demonstration. Therefore, in order to tackle the uncertainty in demonstration, multiple demonstrations are collected. The robot may need to recognize the delicate motions and analyze their ordering from the multiple demonstrations. The current approaches can basically be classified into three types based on whether they use pre-known human motions or certain assumptions [4].

2.1 Motion Recognition Type

The approaches of the first type are based on pattern recognition methods to classify motions before analyzing the ordering of the delicate motions [18–27]. In this type, a human needs to predefine a set of operating motions before the robot observes the human demonstrations. The human collects all possible operating motions, labels the collected motions into different classes, and uses them as training data. For example, in [18], the researchers define a set of human motion data, sign, Garbage, Start, and End, to classify human motions. Usually, the Hidden Markov Model (HMM) is used to learn and recognize the operating motions because HMM can process stochastic human motion data in space and time. After motion recognition, in order to analyze the relation between operating motions, the relation may be transformed into a symbolic sequence problem, such as the longest common subsequence (LCS) problem, to find the common relation in the multiple demonstrations. Finally, the recognized motions are used to generate an operating trajectory based on the motion relations for robot execution. The concept of this type is not complicated, and the system can be implemented easily. However, it may not be able to handle all possible motions of daily life because the operating motions need to be collected before recognition.

2.2 Motion Matching Type

The approaches of the second type segment human motions and match the common motions without predefining motions [14, 15, 28–35]. First, the motions of the demonstrations are segmented based on some motion features, such as the difference of operating speed [34, 36] or the cyclic motion [37], without predefined motions. The approaches then search for the common motions of all demonstrations by motion matching, and they analyze the essential motions and the ordering of the motions. In [15], the problem of searching for the common motions between the demonstrations is transformed into a multiple sequence alignment problem to handle redundant motions. In [14], the researchers measure the similarities between all possible permutations of the motions of the subtasks to analyze the hierarchical structure of the common motions of the task. Finally, the common motions are taken as the essential motions of the task and used to generate a new operating trajectory for execution. These approaches do not need to predefine possible motions, so this type can handle many known or unknown tasks of daily life. However, the motion hints used for segmentation, such as the difference of operating speed or the cyclic motion, limit human motions because only hint-following motions can be segmented correctly.

2.3 Motion Synchronization Type

The approaches of the third type find common portions from synchronized operating trajectories without motion segmentation [8, 9, 38–55]. First, the operating trajectories of the multiple demonstrations are synchronized as signals to avoid segmentation. Usually, synchronization methods such as Dynamic Time Warping (DTW) [56] or Continuous Profile Models (CPM) [57, 58] are used to synchronize multiple signals. After synchronization, the differences of each portion of the synchronized trajectories are measured to find common operating motions from the multiple demonstrations. For example, the synchronized trajectories can be transformed into a probability distribution by using Gaussian Mixture Models (GMM) to estimate the expected operating trajectory [9]. Finally, the common operating trajectory is used to generate a new operating trajectory for robot execution in a new environment. The approaches of this type can learn tasks without predefining motions or limiting the human to follow some motion hints. However, these synchronization methods cannot handle permuted motions, so the ordering of the motions must be the same during the multiple demonstrations. The order of the operating motions is limited, and the human may not be able to demonstrate the task naturally and efficiently.

Chapter 3

Proposed Approach

In this chapter, we describe the proposed approach, which takes the intention deduction problem as that of locating the delicate motions in the demonstrated trajectory. In Sec. 3.1, we describe the process of intention deduction. In Sec. 3.2, we explain the similar function, which is used in intention deduction. And, in Sec. 3.3, we design a series of experiments for investigating its extensibility and robustness.

3.1 Intention Deduction

In this section, first, we introduce what the intention is in the tool-handling task, and then formulate it. Then, we use the concept of validation to evaluate the motion index candidates. Finally, we use dynamic programming to search the optimal motion index from all possible candidates.

3.1.1 Intention in Tool-handling Task

Fig. 3.1 shows the conceptual diagram of the proposed approach for intention deduction from demonstration. In Fig. 3.1, the robot first observes a series of human demonstrations and records the corresponding trajectories and environmental states. From these recorded motion data, the robot searches for the possible intentions that lead to the delicate motions. The derived intentions can then be used to generate new trajectories that respond to new environmental states. Let us take the pouring task shown in Fig. 3.2 as an example. In Fig. 3.2(a), three vessels A, B, and C are arbitrarily located on the table. In Figs. 3.2(b)-(c), the operator pours the content from vessel A to vessels B and C, respectively, and then places vessel A back on the table. During the demonstrations, the initial locations of the vessels may vary, and so does the pouring sequence. From the recorded trajectories and corresponding locations of the vessels (environmental states), the proposed approach identifies the intention of the operator, i.e., the portions of the trajectory that correspond to the two pouring actions (delicate motions). With the derived intention, the robot is then able to execute the pouring task with the vessels located at various locations and possibly altered pouring sequences.

Figure 3.1: Conceptual diagram of the proposed approach.

Figure 3.2: A pouring task: (a) the setting of the vessels, (b) pouring vessel A to vessel B, and (c) pouring vessel A to vessel C.

Before discussing the process of intention deduction, we first describe how the motion can be generated under new environmental states when the human intention has already been derived. We start with the representation of the intention $I$. Assume that there are $N$ delicate motions and $S$ objects involved in a demonstrated task. Because the intention is closely related to the delicate motions of the maneuver, $I$ is formulated as a set of delicate motions, $D_n(t)$, associated with the corresponding objects $Obj_s$:

$$I = \{D_1(t), D_2(t), \dots, D_N(t);\ Obj_1, Obj_2, \dots, Obj_S\} \qquad (3.1)$$

where $D_n(t)$ stands for the part of the demonstrated trajectory for delicate motion $n$, and $Obj_s$ for the position and orientation of object $s$. Note that, because an object may correspond to one, several, or no delicate motion, the number of delicate motions may not be equal to that of the objects. We then introduce the motion index ($MI$), which serves as an index linking to $I$. $MI$ is formulated as an ordered set of tuples $d_j = \{n_j, l_j, s_j\}$, which provide the starting time $n_j$, end time $l_j$, and the number of the operated object $s_j$ for each delicate motion $D$:

$$MI = \{d_1, d_2, \dots, d_N\} \qquad (3.2)$$

Figure 3.3: Process for motion generation.

where $MI$ represents $I$. Fig. 3.3 shows the process for motion generation. According to $MI$, the motion cutting module locates the delicate motions $D_j$ in the demonstrated motion in order. To respond to the new environmental state, the motion adjustment module moves these $D_j$ to match the new locations of the objects, and they become $D_{Gj}$. Finally, the motion connection module uses the move motions $M_{Gj}$ to smoothly connect every two $D_{Gj}$. As its accuracy is not that critical, $M_{Gj}$ is generated using a cubic polynomial. With both $D_{Gj}$ and $M_{Gj}$, we now have a feasible trajectory $Q_G$ corresponding to the new environmental state:

$$Q_G = \{M_{G1}, D_{G1}, M_{G2}, D_{G2}, \dots, D_{GN}, M_{G(N+1)}\} \qquad (3.3)$$

Fig. 3.4(a) shows an example for motion cutting based on the pouring task shown in Fig. 3.2, and Fig. 3.4(b) that of motion generation. In Fig. 3.4(a), the demonstrated trajectory during task execution is projected on the X-Y plane, where the yellow and green rectangles indicate the locations of vessels B and C. The yellow and green trajectories are the delicate motions determined by the motion cutting module according to the given M I. In Fig. 3.4(b), the yellow and green rectangles indicate the locations of vessels B and C in the new environmental state. In re-sponding to these new locations of vessels B and C, the delicate motions identified in Fig. 3.4(a) are transformed to be the yellow and green trajectories by the motion adjustment module. Finally, the three move motions, as the red trajectories, are utilized to smoothly connect the two delicate motions.

Figure 3.4: Examples for (a) motion cutting and (b) motion generation based on the pouring task shown in Fig. 3.2.

Figure 3.5: Process for MI evaluation: (a) standard MI evaluation and (b) MI evaluation with strategy.

3.1.2 Derivation of Optimal Motion Index

From the motion generation process discussed above, we can take the intention deduction process as that of finding the proper motion index $MI$. To find the optimal $MI$ among all $MI$ candidates, we first introduce the process for $MI$ evaluation, shown in Fig. 3.5(a). This process evaluates the fitness of the $MI$ candidates derived from the demonstrated motion, based on the reasoning that a proper $MI$ should lead to a generated motion very similar to the human demonstrated motion, which includes all the delicate motions. In Fig. 3.5(a), from the demonstrated motions, we select one demonstrated motion as the validating motion and the rest as the training motions. We will discuss the selection of validating and training motions later. For an $MI$ candidate derived from the validating motion, the motion generation module, described above, generates motions based on the training motions and the environmental state corresponding to the validating motion; the generated motions, with their lengths set equal to that of the validating motion, are then compared with the validating motion via the motion comparison module, yielding the differences between them (marked as errors). Because the operator may perform the demonstrations at different speeds and possibly with different orders for the events involved, the corresponding delicate motions are likely to have various sampling rates, or to appear in different portions of the demonstrated trajectories. To tackle this, our strategy is to let each of the delicate motions of the validating motion be compared with every portion of the training motion, accompanied by altering sampling rates, as shown in Fig. 3.5(b). Through this comparison process, the generated motion whose delicate motions lead to the minimum difference when compared with those of the validating motion is determined as the output and sent to the motion comparison module for the following comparison. As a high search complexity is expected, we come up with an approach analogous to that of dynamic time warping (DTW) in execution [56]. Details of this strategy will be explained in the next section.

We go on with the process for $MI$ generation, shown in Fig. 3.6. In Fig. 3.6, among all the demonstrated motions, one demonstrated motion is first selected as the validating motion, denoted as $Q_V$, and the rest as the training motions, $Q_T$, for each sequence of the process.

Figure 3.6: Process for MI generation.

Figure 3.7: Process for optimal MI derivation.

The process is repeated until each of the demonstrated motions has served as the validating motion once. In the next step, the $MI$ generator locates all possible $MI$ candidates from $Q_V$. Because the proposed approach does not constrain the human operator to perform the task with a certain motion speed or motion type, and also allows the order of the events to be altered during demonstration, there is in fact no a priori knowledge for the selection of $MI$. The criterion for $MI$ generation is thus to let an $MI$ candidate correspond to every portion of $Q_V$ with a duration longer than 0.3 second, as a human cannot cognize an event until 0.3 second after it happens [59]. It can be expected that there will be a huge number of $MI$ candidates. That is why we employ the method of dynamic programming for the search of the optimal $MI$.
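As a concrete illustration of this generation criterion, the sketch below enumerates the elementary candidates, i.e., every (start, end, object) triple whose duration exceeds 0.3 s at the 30 Hz sampling rate of the tracking system. The function name and signature are hypothetical; a full MI candidate is an ordered combination of such non-overlapping triples, which is why the candidate space explodes and dynamic programming is needed.

```python
def delicate_candidates(num_samples, num_objects, rate_hz=30, min_dur=0.3):
    """Enumerate every triple d = (n, l, s): a start sample n, an end sample l,
    and an operated object s, such that the segment lasts longer than 0.3 s."""
    min_len = int(round(min_dur * rate_hz))
    candidates = []
    for n in range(num_samples):
        for l in range(n + min_len, num_samples):
            for s in range(num_objects):
                candidates.append((n, l, s))
    return candidates

# Example: a 6 s demonstration at 30 Hz with 4 possible objects already yields
# tens of thousands of elementary candidates.
print(len(delicate_candidates(num_samples=180, num_objects=4)))
```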

With the $MI$ evaluation process in Fig. 3.5(b) and the $MI$ generation process in Fig. 3.6, Fig. 3.7 shows the entire process for optimal $MI$ derivation. For the demonstrated motions, the process is repeated until each of them has served as the validating motion once. Via the $MI$ generation process, $MI$ candidates along with the validating and training motions are sent into the $MI$ evaluation process to determine which $MI$ candidate leads to the minimum error, identified as an optimal $MI$ candidate. As each validating motion corresponds to one optimal $MI$ candidate, the outputs of the outer dotted block are the optimal $MI$ candidates for each of them. Finally, the optimal $MI$ is determined to be the one with the minimum error among all optimal $MI$ candidates.

3.1.3 Implementation of Intention Deduction

For the mathematical formulation of this optimal $MI$ derivation process, we start with the description of $MI$ for a given validating motion $Q_V$, denoted as $MI_V$:

$$MI_V = \{d_{V1}, d_{V2}, \dots, d_{VN}\} \qquad (3.4)$$

with

$$d_{Vj} = \{n_{Vj}, l_{Vj}, s_{Vj}\} \qquad (3.5)$$

where $d_{Vj}$ indexes the delicate motion $D_{Vj}$, with $n_{Vj}$, $l_{Vj}$, and $s_{Vj}$ the starting time, end time, and number of the operated object. According to $MI_V$, $Q_V$ can then be expressed as the combination of a series of delicate and move motions:

$$Q_V = \{M_{V1}, D_{V1}, M_{V2}, D_{V2}, \dots, D_{VN}, M_{V(N+1)}\} \qquad (3.6)$$

On the other hand, with the same $MI_V$, the generated motion $Q_G^i$ for each training motion $Q_T^i$ can be formulated as

$$Q_G^i = \{M_{G1}^i, D_{G1}^i, M_{G2}^i, D_{G2}^i, \dots, D_{GN}^i, M_{G(N+1)}^i\} \qquad (3.7)$$

where $D_{Gj}^i$ and $M_{Gj}^i$ are its delicate and move motions, respectively. $D_{Gj}^i$ can be determined via the $MI$ evaluation process above, in which the minimization between $D_{Gj}^i$ and $D_{Vj}$ is dealt with by a DTW-like method:

$$D_{Gj}^i = \mathrm{similar}(Q_T^i, D_{Vj}) \qquad (3.8)$$

In the similar function, the training motion $Q_T^i$ is transformed to match the environment of the validating motion according to the possible operated object $s_{Vj}$, and the generated delicate motion $D_{Gj}^i$ is searched for in the transformed motion so as to be as similar to $D_{Vj}$ as possible. The search can therefore use a DTW-like method to minimize the difference between $D_{Gj}^i$ and $D_{Vj}$ [56]. Details of this method will be explained in the next section.

After the delicate motions are generated, $M_{Gj}^i$ is determined by the function $M_G$, which utilizes a cubic polynomial to smoothly connect the two delicate motions $D_{G(j-1)}^i$ and $D_{Gj}^i$:

$$M_{Gj}^i = M_G(D_{G(j-1)}^i, D_{Gj}^i) \qquad (3.9)$$

To determine the optimal motion index $MI_V^*$, $Q_V$ will be compared with all $Q_G$ generated according to every $MI_V$. Because we are looking for an $MI_V$ that may induce all the necessary delicate motions, $MI_V^*$ should not induce too much deviation between the delicate motions for $Q_V$ and $Q_G$, and consequently between the move motions for them. By taking $E_{max}$ as the maximum difference between the delicate and move motions for $Q_V$ and those $Q_G$ generated for all the training motions corresponding to some $MI_V$, we determine $MI_V^*$, among all $MI_V$, to be the one that leads to the smallest $E_{max}$:

$$MI_V^* = \arg\min_{MI_V} E_{max} \qquad (3.10)$$

with

$$E_{max} = \sum_{j=1}^{N} E_D(D_{Vj}) + \sum_{j=1}^{N+1} E_M(D_{V(j-1)}, D_{Vj}) \qquad (3.11)$$

where

$$E_D(D_V) = \max_i \|D_V - D_G^i\|^2 \qquad (3.12)$$

$$E_M(D_{Va}, D_{Vb}) = \max_i \|M_V(D_{Va}, D_{Vb}) - M_G(D_{Ga}^i, D_{Gb}^i)\|^2 \qquad (3.13)$$

Here, $E_D$ computes the difference between the respective delicate motions for $Q_V$ and those $Q_G$, and $E_M$ that for the move motions, with $M_V$ as a function which outputs the move motion part between two delicate motions of the validating motion, $D_{Va}$ and $D_{Vb}$. Because each demonstrated motion serves as the validating motion once, each validating motion yields its own optimal $MI_V^*$ with a corresponding minimum $E_{max}$, denoted as $E^*$. As the length $L_V$ for each $Q_V$ may not be the same, $E^*$ needs to be normalized before the comparison. $MI^{**}$ is then formulated as

$$MI^{**} = \arg\min_{MI_V^*} E^*/L_V \qquad (3.14)$$

The search for $MI^{**}$ is of high complexity, as exhibited in Eqs. (3.10)-(3.14) above. As an attempt to enhance search efficiency, we employ the method of dynamic programming [60] and let the computation of $E^*$ in Eq. (3.14) be expressed in a recursive formulation:

$$E^* = \min_{d_{Vk}} \left( E_R(D_{Vk}) + E_M(D_{Vk}, D_{V(N+1)}) \right) \qquad (3.15)$$

with

$$E_R(D_{Vk}) = \min_{d_{V(k-1)}} \left( E_R(D_{V(k-1)}) + E_M(D_{V(k-1)}, D_{Vk}) \right) + E_D(D_{Vk}) \qquad (3.16)$$

where $E_R(D_{Vk})$ stands for the minimum difference between the motions from the first move motion to a given delicate motion; $d_{Vk}$ and $d_{V(k-1)}$, described in Eq. (3.5), index the delicate motions $D_{Vk}$ and $D_{V(k-1)}$; and $1 \le k \le N$. Because the number of delicate motions is not known in advance, $N$ and $k$ are not specific numbers. Also note that the first move motion is generated between $D_{V0}$ and $D_{V1}$, and the last one between $D_{VN}$ and $D_{V(N+1)}$, with $D_{V0}$ and $D_{V(N+1)}$ taken as the first and last point of the trajectory, respectively. In Eq. (3.15), $E^*$ is derived as the minimum over all $E_R(D_{Vk})$, with $E_R(D_{Vk})$ computed recursively via Eq. (3.16). With Eqs. (3.15) and (3.16), dynamic programming can take advantage of the table generated for $E_R(D_{Vk})$ to simplify the computation in deriving $E^*$.
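A minimal sketch of this recursion is given below, assuming the elementary candidates are (n, l, s) tuples and that the callables e_d and e_m return the error terms of Eqs. (3.12)-(3.13); all names are illustrative. Sorting the candidates by end time lets each $E_R$ value be computed once and reused, which is the table-filling step described above.

```python
def optimal_mi_error(candidates, e_d, e_m, start, end):
    """Evaluate Eqs. (3.15)-(3.16): E_R(d) is the best accumulated error of any
    chain of non-overlapping delicate-motion candidates that ends with d."""
    cands = sorted(candidates, key=lambda d: d[1])   # process by end sample
    e_r = {}
    for d in cands:
        n, l, s = d
        best = e_m(start, d)                         # chain starting at the first trajectory point
        for p in cands:
            if p[1] < n and p in e_r:                # predecessor must finish before d starts
                best = min(best, e_r[p] + e_m(p, d))
        e_r[d] = best + e_d(d)                       # Eq. (3.16)
    # Eq. (3.15): close every chain with the move motion to the last trajectory point
    return min(e_r[d] + e_m(d, end) for d in cands)
```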

Based on the discussions above, the intention derivation algorithm is formulated in Algorithm 1. The time complexity of this optimal $MI$ derivation process is related to the number ($R$) and length ($L_V$) of the demonstrated motions and the number ($S$) of objects involved in the task. Here, the lengths of the demonstrated motions are assumed to be close. In Eqs. (3.15) and (3.16), the generation of the table for $E_R(D_{Vk})$ takes up most of the time consumed. The table has $O(L_V^2 \cdot S)$ elements, and each element deals with a complexity of the order of $O(R \cdot L_V^3 \cdot S)$. During the entire process, the table needs to be generated $R$ times. The final time complexity is thus of the order of $O(R^2 \cdot L_V^5 \cdot S^2)$.

The divide-and-conquer method [60] may also be an alternative for solving Eq. (3.14). However, because our proposed approach takes every portion of the trajectory of the validating demonstration as a candidate for a possible delicate motion, it is not straightforward to divide the trajectory properly. Consequently, the search for the optimal solution may demand a large number of divisions, leading to a high computational load.

Algorithm 1 Find the intention of the task from R demonstrations
Input: the demonstrated trajectories Q_i (1 ≤ i ≤ R) for the R demonstrations
Output: the optimal MI**
1: for i = 1 to R do
2:   Select Q_i among the R recorded trajectories as the validating motion Q_V and the rest as the training demonstrations Q_T
3:   Apply the method of dynamic programming, based on Eq. (3.10), to determine the optimal MI* for Q_V
4: end for
5: Utilize Eq. (3.14) to determine the optimal MI** for the demonstrator among the MI* of the R validating motions
6: return MI**
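A compact Python sketch of Algorithm 1 follows. It only fixes the leave-one-out structure: find_optimal_mi stands for the dynamic-programming step of Eq. (3.10) and error_of for the resulting minimum error E*; both callables, as well as the assumption that len(q_v) gives the trajectory length L_V, are placeholders rather than the dissertation's actual interfaces.

```python
def deduce_intention(demos, find_optimal_mi, error_of):
    """Algorithm 1: each demonstration serves as the validating motion once, and
    the MI* with the smallest length-normalized error E*/L_V becomes MI**."""
    best_mi, best_score = None, float("inf")
    for i, q_v in enumerate(demos):
        q_t = demos[:i] + demos[i + 1:]                 # the remaining training motions
        mi_star = find_optimal_mi(q_v, q_t)             # dynamic-programming search, Eq. (3.10)
        score = error_of(mi_star, q_v, q_t) / len(q_v)  # normalization of Eq. (3.14)
        if score < best_score:
            best_mi, best_score = mi_star, score
    return best_mi
```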

3.2 Similar Function

In this section, the reason why the similar function is used is first explained, and the implementation of the similar function, which is based on the dynamic time warping method, is then discussed.

3.2.1 Reasoning for Similar Function

Figure 3.8: The validation of MI: (a) with MI and (b) with MI and operating order.

training data, because the fitness of each possible MI candidate is evaluated by the validation. In Fig. 3.8(a), the fitness of each MI candidate can be evaluated by comparing the difference between the generated motion and the validating motion in the motion comparison module, and the MI candidate which leads to the minimum error is the optimal MI candidate. However, the operating speed of the generated motion and the ordering of its delicate motions may differ from those of the validating motion, because the motion generation module uses the MI candidate of the training motion to generate the motion. The motion comparison module may need to use DTW to deal with the differences in motion speed [56], but DTW cannot deal with a different ordering of the delicate motions. Therefore, when DTW is used in the motion comparison module, we need to align the ordering of the delicate motions of the generated motion with that of the validating motion, as shown in Fig. 3.8(b). In Fig. 3.8(b), the motion speeds of the delicate motions are not included in the operating order because DTW can handle them.

With the concept of validation, the correct operating order and the optimal MI may be searched for simultaneously by validating all possible pairs of operating order and MI candidate when a single training motion is inputted, as shown in Fig. 3.8(b). Moreover, the MI candidates of the training motion can be evaluated more accurately by inputting multiple validating motions, each with its own operating order. However, the number of possible operating orders of each validating motion increases factorially with the number of the MI-assigned delicate motions, so a large amount of computing time is needed to evaluate all operating orders for each MI candidate. In contrast, when multiple training motions and a single validating motion are inputted, the search space is much larger because each training motion has an individual MI and an individual operating order. Therefore, we need a method to tackle this search problem.

It is known that the optimal MI candidate leads to the minimum difference, and that the difference between two similar motions is smaller than that between two randomly chosen motions. Therefore, a shortcut method is proposed. The MI candidate of the validating motion is inputted into the motion generation module without inputting possible operating-order candidates for the multiple training motions. The motion cutting module of the motion generation module is modified to use the similar function to minimize the difference between the delicate motion of the generated motion and that of the validating motion, as shown in Fig. 3.5(b). Note that this shortcut method may not exactly match the concept of validation.

We use the similar function to search each training motion for the portion that is as close as possible to the MI-assigned delicate motion of the validating motion, so that the difference between the generated delicate motion and the MI-assigned delicate motion leads to the minimum error. Besides, we use a DTW-like method to resample the delicate motion of the generated motion in the calculation of the similar function, so the motion comparison module can calculate the difference without conducting DTW again.

3.2.2 Implementation of Similar Function

To calculate the similar function in Eq. (3.8), we use a DTW-like method, which uses the technique of dynamic programming to minimize the error between the delicate motions of the validating motion and the training motion. In the calculation of the similar function, first, the training motion $Q_T^i$ is transformed to match the environment of the validating motion according to the MI-assigned operated object $s_{Vj}$. Then, the transformed motion is resampled to $P_T^i$ so that the MI-assigned delicate motion can be matched against some portion of the training motion. After that, the minimum error between the MI-assigned delicate motion of the validating motion and some portion of the training motion can be calculated as

$$E_{similar}^* = \min_{1 \le \varrho \le 2L_T^i - 1} H(\varrho,\ l_{Vj} - n_{Vj} + 1) \qquad (3.17)$$

with

$$H(\varrho, \varsigma) = \begin{cases} \infty, & \text{if } \varrho \le 0,\\[2pt] \|P_T^i(\varrho) - D_{Vj}(\varsigma)\|^2 + \displaystyle\min_{\varrho-4 \le \iota \le \varrho-1} H(\iota, \varsigma - 1), & \text{if } \varrho \ge 1,\ \varsigma \ge 1,\\[2pt] 0, & \text{otherwise.} \end{cases} \qquad (3.18)$$

where $E_{similar}^*$ is the minimum error between the delicate motions $D_{Vj}$ and $D_{Gj}^i$, $H(\varrho, \varsigma)$ is a recursive function that outputs the minimum error between $D_{Vj}(1 \sim \varsigma)$ and $P_T^i(u \sim \varrho)$ ($u$ is determined automatically during the process of minimization), and the other symbols are explained in the following. The calculation of the similar function differs from DTW in two respects. First, in order to measure the difference independently of the time length of the different generated motions, the generated motion is mapped and resampled to the time of the validating motion, and the difference is measured based on the time of the validating demonstration in the calculation of $H(\varrho, \varsigma)$. Because we want to search for the minimum error under a dynamic speed whose range is from 1/2 to 2 times the original speed of the training motion, the number of samples of $P_T^i$ is $2L_T^i - 1$, i.e., twice the sampling rate of $Q_T^i$, and the four answers of the subproblems ($H(\varrho-4, \varsigma-1) \sim H(\varrho-1, \varsigma-1)$) are used to solve the problem $H(\varrho, \varsigma)$. Second, in order to check each possible start point of $P_T^i$, the number of cases in which $H(\varrho, \varsigma)$ is set to zero is larger than when using DTW, because each sample point of $P_T^i$ is a possible start point of the similar motion. Therefore, $H(\varrho, \varsigma)$ is a function that outputs the minimum error between $D_{Vj}(1 \sim \varsigma)$ and $P_T^i(u \sim \varrho)$, and the result of the similar function can be obtained by tracking the selections made in the calculation of the minimum error $E_{similar}^*$. Moreover, some elements of the table $H(\varrho, \varsigma)$ can be shared between the different inputs $D_{Vj}$ to reduce the calculation time, so the time complexity of calculating the similar function over all possible inputs is $O(R \cdot L_V^3 \cdot S)$ when the range of the dynamic speed is fixed and the demonstration times of the validating motion and the training motions are assumed to be close.
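The recursion of Eqs. (3.17)-(3.18) can be tabulated directly. The sketch below is a simplified illustration under stated assumptions: p_t is the transformed and resampled training motion (2L_T - 1 rows) and d_v the MI-assigned delicate motion, both as NumPy arrays of 7-dimensional samples; it returns only the minimum error and omits the backpointer tracking that the full method uses to recover the matched portion and its resampling.

```python
import numpy as np

def similar_error(p_t, d_v):
    """Fill the H(rho, sigma) table of Eq. (3.18) and return E*_similar of Eq. (3.17).
    Row 0 of the table corresponds to rho <= 0 (infinite cost); column 0 to
    sigma = 0, where the cost is zero so a match may start at any sample of p_t."""
    n_p, m = len(p_t), len(d_v)
    h = np.full((n_p + 1, m + 1), np.inf)
    h[1:, 0] = 0.0
    for rho in range(1, n_p + 1):
        for sig in range(1, m + 1):
            cost = np.sum((p_t[rho - 1] - d_v[sig - 1]) ** 2)
            # min over the four predecessors H(rho-4 .. rho-1, sigma-1)
            h[rho, sig] = cost + h[max(rho - 4, 0):rho, sig - 1].min()
    return h[1:, m].min()
```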

Because the similar function is a shortcut method that does not calculate the difference between the move motions of the validating motion and the generated motions, it does not minimize the total difference between the validating motion and the generated motions. Although a penalty, namely the expected error between the move motions of the validating motion and the generated motions, could be used in the calculation of the similar function, we do not use it in our experiments because this expected error is difficult to estimate precisely. Note that, in the experiments, it is observed that creating the $E_R(D_V)$ table by using the similar function reuses many identical $E_M$ values. Therefore, if a cache is used to store the $E_M$ values, the amount of calculation can be reduced.

3.3 Experimental Design for Extensibility and Robustness

The proposed approach is developed for general tool-handling tasks, with the appealing features of (a) no need for pre-defined motions, (b) no constraints on motion speed or motion type, (c) allowance for event-order altering, and (d) allowance for redundant operations during demonstration. To further investigate its extensibility and robustness, we design a series of experiments based on three different kinds of tasks: the pouring, coffee-making, and fruit jam tasks.

In the pouring task, the operator is asked to hold a vessel and pour the content into other vessel(s), as described in the example shown in Fig. 3.2. The experiments are designed to evaluate the influence of the following factors:

• pouring order during execution;
• number of vessels to pour.

In the coffee-making task, the operator uses a spoon to scoop coffee powder, sugar, or milk from the jars into the coffee cup and stir it. The number of jars is fixed for the experiments, but the operator can access the same jar(s) one or several times. In addition to the performance on this coffee-making task, the effect of the number of demonstrations on MI derivation and the time complexity during task execution are also evaluated. To further explore its generality, in the fruit jam task, the operator picks up a knife from the table, scoops the fruit jam from the jar, spreads it on the toast in a zigzag motion, and places the knife back on the table. Meanwhile, we also analyze how the presence of redundant motions affects system performance.

To simulate the situation in which the robot learns a task in a home environment, we assume that the robot uses a vision system to identify objects by comparing the difference between background and foreground across the multiple demonstrations, thereby obtaining the positions of the objects. Moreover, the handled tool, whose position varies during the demonstration, can also be identified. We suggest that the operated objects change their positions across the multiple demonstrations, but some operated objects which are heavy or fixed may not be moved in the task. These objects cannot be identified by the vision system. Fortunately, no matter how many operated objects are hidden in the background, they can be treated as one special object, the background object, whose position does not change during demonstrations, just like the origin. On the other hand, although we suggest that the positions of the operated objects be changed during the multiple demonstrations, we do not require that the positions of the unoperated objects remain fixed. For example, if there are three cups in a two-cup pouring task, the position of the unoperated cup is allowed to change carelessly for some reason during the demonstrations. Therefore, even if some identified objects that are not operated in the demonstrations are inputted into the intention deduction module, our learning method can still handle them.

Chapter 4

Experiments

In this chapter, the system implementation is first introduced. Then, the results of the experiments on the pouring, coffee-making, and fruit jam tasks are reported, and the robustness of the approach is evaluated. Finally, we discuss how the proposed approach performs when compared with previous ones, and an application scenario for breakfast preparation is proposed to demonstrate the practicality of this approach in daily life.

4.1 System Implementation

The proposed system is implemented for the experiments, as shown in Fig. 4.1. In Fig. 4.1, the task environment consists of the objects, including the tool, the operated objects, and the background objects. The human operator can see the task environment and operate the objects. The positions and orientations of the objects are measured by the Polhemus FASTRAK tracking system, which consists of a system electronics unit, receivers, and a power supply, shown in Fig. 4.2, and a long range transmitter, shown in Fig. 4.3. The update rate of the positions (XYZ) and orientations (ZYX Euler angles) of the receivers is 30 Hz, and the specifications of the tracking system are described in Table 4.1. After the human operator demonstrates a task multiple times, the operating trajectories are recorded as 7-dimensional sequences, which consist of positions and orientations (in the form of quaternions) in 3 and 4 dimensions, respectively. The position data are normalized by their standard deviation to balance the effects of the errors due to positions and orientations. These trajectories, together with the positions and orientations of all possible operated objects, are then inputted into our learning method to deduce the intention. The executable operating trajectory for the robot manipulation is generated as a 7-dimensional sequence (positions and orientations) according to the positions and orientations of the operated objects in the environment faced by the robot.
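For concreteness, the snippet below shows one way the recorded 7-dimensional samples could be prepared, assuming each sample is stored as x, y, z followed by a unit quaternion and that the normalization divides the position part by a single overall standard deviation; the storage layout and the exact normalization scheme are not spelled out in the text, so both are assumptions here.

```python
import numpy as np

def normalize_demo(poses):
    """Prepare one demonstrated trajectory: a T x 7 array whose rows hold a
    position (x, y, z) and an orientation quaternion. The position part is
    scaled by its standard deviation so position and orientation errors
    contribute comparable weight to the motion comparison."""
    traj = np.asarray(poses, dtype=float)
    pos, quat = traj[:, :3], traj[:, 3:]
    pos = pos / pos.std()                                        # balance against quaternion errors
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)    # keep quaternions unit length
    return np.hstack([pos, quat])
```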

Figure 4.1: System implementation.

The program is executed on a computer with an Intel E6300 CPU running at 1.86 GHz with 3.62 GB of RAM. Although this CPU has two cores, the program only uses one core when measuring the calculation time. The robot manipulator, a position-controlled Mitsubishi RV-2A, is shown in Fig. 4.4, with its specifications listed in Table 4.2. The Denavit-Hartenberg parameters of this robot manipulator are specified in Table 4.3 [61], and the transformation matrix of each joint is defined as

$$A_i^{i-1}(\theta_i) = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (4.1)$$

This robot manipulator can accept position commands over the network via an Ethernet interface card, and each position command is executed within the operation control time of 7.1 ms (about 141 Hz). Therefore, the generated robot trajectory is resampled at 141 Hz for the robot manipulator, and the position command (J1-J6) of each sampling point is solved by inverse kinematics [61]. In order to avoid damaging the devices, all vessels are empty during the demonstrations and experiments.
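A small sketch of Eq. (4.1) and of chaining the six joint transforms is given below; the dh_table layout (one (a, alpha, d, theta-offset) row per joint, mirroring Table 4.3) and the function names are assumptions made for illustration, not the controller's actual interface.

```python
import numpy as np

def dh_transform(theta, alpha, a, d):
    """Homogeneous joint transform A_i^{i-1}(theta_i) of Eq. (4.1)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_table):
    """Chain the six joint transforms of the RV-2A using the Denavit-Hartenberg
    parameters of Table 4.3 (a_i, alpha_i, d_i, and the theta offset)."""
    t = np.eye(4)
    for theta, (a, alpha, d, offset) in zip(joint_angles, dh_table):
        t = t @ dh_transform(theta + offset, alpha, a, d)
    return t

# Table 4.3 written as (a_i, alpha_i, d_i, theta offset) rows:
RV2A_DH = [
    (0.10, -np.pi / 2, 0.35, 0.0),
    (0.25,  0.0,       0.00, -np.pi / 2),
    (0.13, -np.pi / 2, 0.00, -np.pi / 2),
    (0.00,  np.pi / 2, 0.25, 0.0),
    (0.00, -np.pi / 2, 0.00, 0.0),
    (0.00,  0.0,       0.24, 0.0),
]
```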

Figure 4.2: Tracking system Polhemus FASTRAK.

Figure 4.3: Long range transmitter.

Figure 4.4: Mitsubishi RV-2A type six-axis robot arm.

Table 4.1: Specifications of the tracking system Polhemus FASTRAK.

Latency            4 milliseconds
Interface          RS-232 with selectable baud rates up to 115.2K baud
Static accuracy    position 0.03" RMS; orientation 0.15 degrees RMS
Resolution         position 0.0002 inches; orientation 0.025 degrees

Table 4.2: Specifications of the Mitsubishi RV-2A.

Degrees of freedom              6
Maximum load capacity (rating)  2 kg
Maximum reach radius            621 mm
Working area                    J1 320° (-160 to +160), J2 180° (-45 to +135), J3 120° (+50 to +170), J4 320° (-160 to +160), J5 240° (-120 to +120), J6 400° (-200 to +200)
Maximum speed (degree/s)        J1 150, J2 150, J3 180, J4 240, J5 180, J6 330
Repeat position accuracy        ±0.04 mm

Table 4.3: Denavit-Hartenberg parameters of the Mitsubishi RV-2A.

Link   a_i (m)   α_i (rad.)   d_i (m)   θ_i (rad.)
1      0.1       -π/2         0.35      θ1
2      0.25      0            0         θ2 - π/2
3      0.13      -π/2         0         θ3 - π/2
4      0         π/2          0.25      θ4
5      0         -π/2         0         θ5
6      0         0            0.24      θ6

4.2 Pouring Task

We first applied the proposed approach to the pouring task shown in Fig. 4.5. The experiment is divided into two stages: (a) human demonstration and (b) robot execution. Fig. 4.5(a) shows the experimental setup for human demonstration, which includes the human operator and the Polhemus FASTRAK tracking system. There are five vessels placed randomly on the table. The human operator held a vessel (vessel A) and poured the content into the other vessels (vessels B, C, D, and E) on the table. The Polhemus FASTRAK tracking system, with a sampling rate of 30 Hz for each of the sensors, was used to measure and record the demonstrated trajectories and the positions of the objects. These trajectories were recorded as 7-dimensional sequences, which consist of positions and orientations (in the form of quaternions) in 3 and 4 dimensions, respectively, with the positions normalized by their standard deviation. From these recorded trajectories, we applied the intention deduction algorithm, discussed in Sec. 3.1, to derive the intention of the operator from all possible intentions. We then moved on to the second stage of the experiment, and let the Mitsubishi RV-2A 6-DOF robot manipulator follow the derived intention to execute the pouring task under new environmental states, as shown in Fig. 4.5(b). There are two changeable parameters of the pouring tasks in the experiment, namely the poured vessels ({B,C}, {B,C,D}, and {B,C,D,E}) and the pouring order (arbitrary order or same order), so there are 6 different settings of the pouring tasks. The human operator demonstrated 18 times in each setting, for a total of 108 demonstrations.

To test the results of our method, for each of the six pouring tasks, 8 of the demonstrations are randomly selected as training data and inputted into the learning method, and the remaining demonstrations whose operating orders match those of the generated trajectories are selected as testing data. In each test, the positions of vessels B to E and the origin are inputted, to test the effect of the unoperated objects and the background object, because the robot does not know which object is operated. These processes are executed 5 times, and the average errors, which describe the difference between the generated trajectories and the trajectories of the testing data, are calculated by DTW. We then moved on to the second stage of the experiment, and let the Mitsubishi RV-2A 6-DOF robot manipulator follow the generated trajectories to execute the pouring tasks under a new environmental state, as shown in Fig. 4.5(b).
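The error measure can be pictured with the short sketch below: a standard DTW alignment over the 3-D position parts of the two trajectories, with the matched point-to-point distances averaged along the alignment path. The averaging step is an assumption (the text only states that the errors are calculated by DTW), and the function name is illustrative.

```python
import numpy as np

def dtw_position_error(gen, ref):
    """Average 3-D position error between a generated trajectory and a test
    trajectory after DTW alignment (both given as T x 7 pose sequences)."""
    n, m = len(gen), len(ref)
    dist = np.linalg.norm(gen[:, None, :3] - ref[None, :, :3], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # backtrack to count the alignment length and average the matched distances
    i, j, total, steps = n, m, 0.0, 0
    while i > 0 and j > 0:
        total += dist[i - 1, j - 1]
        steps += 1
        moves = {(i - 1, j): acc[i - 1, j], (i, j - 1): acc[i, j - 1], (i - 1, j - 1): acc[i - 1, j - 1]}
        i, j = min(moves, key=moves.get)
    return total / steps
```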

Figure 4.5: Experimental setups for the pouring task: (a) human demonstration and (b) robot execution.

Fig. 4.6 shows the derived intentions for each of the eight demonstrations of the pouring task in one test. Because the tilt-angle changes are the clear features of pouring motions, time series graphs of the tilt angle of the trajectories are used to illustrate the results. In Fig. 4.6, delicate motions related to vessels B, C, D, and E were identified from the trajectory of vessel A, marked by the yellow, green, blue, and purple blocks, respectively. It was observed that most of the delicate motions were located at those portions with evident tilt-angle changes, indicating the pouring action. The derived intention for demonstration 3 was determined to be optimal among all.

Fig. 4.7 shows one of the generated trajectories and one of the trajectories of the testing data. In Fig. 4.7, the black line is the trajectory of the testing data, and the colored line is the generated trajectory, which consists of the red segments (move motions) and the other colored segments (delicate motions for the operated objects). In the X-Y plane subfigure of Fig. 4.7, the colored rectangles (yellow, green, blue, and purple) indicate the locations of vessels B, C, D, and E.

[Figure 4.6: tilt angle (rad.) versus time (s) of vessel A for Demonstrations 1 to 8 of the pouring task.]

[Figure 4.7: dashed black line, trajectory of the human operator; colored line, generated trajectory executed by the robot.]

Figure 4.7: Experimental results for the pouring 4 cups task executed by both the human operator and robot manipulator under new environmental states: (a) trajectory in X-Y plane, (b) variation of the height, and (c) variation of the tilt angle of vessel A.


From the X-Y plane, height, and tilt-angle subfigures of Fig. 4.7, it was observed that most of the delicate motions were located near the operated objects, at the portions with minimum height, and at the portions with maximum tilt angle, since these are the characteristic features of pouring motions. The generated trajectory and the human trajectory exhibited a certain degree of similarity, although they were not exactly the same. Nevertheless, the generated trajectory can be used to drive the robot to accomplish the pouring task. Moreover, the generated trajectories of the other tasks also show the correct delicate motions and match the human operations.

Table 4.4: Average position error between the trajectories of the human operator and the generated trajectories in the pouring tasks.

Pouring task   Order       Error (m)
2 cups         Same        0.052
2 cups         Arbitrary   0.051
3 cups         Same        0.049
3 cups         Arbitrary   0.049
4 cups         Same        0.052
4 cups         Arbitrary   0.045

Table 4.4 shows the average position errors between the trajectories of the human operator and the generated trajectories for the six combinations of the pouring task; these errors are calculated by DTW on the 3-D positions. First, it was observed that the order of the operations does not have a great effect on the results of our method. Second, it was observed that the number of operated objects also does not have a great effect on the results of our method.

4.2.1 Robustness in Pouring Tasks

Figure 4.8 shows the relation between the errors and the number of training demonstrations (2 to 8) for the 2- to 4-cup pouring tasks of Table 4.4. First, it was observed that our method may need 4 to 5 demonstrations to learn the goals of the 2- to 4-cup

[Figure 4.8 legend: 2, 3, and 4 vessels with same (s) and arbitrary (a) orders, with unoperated objects; horizontal axis: number of demonstrations, vertical axis: error (m).]

Figure 4.8: The errors of the generated trajectories with different numbers of training demonstrations in the pouring 2 to 4 cups tasks.

pouring tasks. Second, it was observed that the number of demonstrations needed to decrease the errors of the 4-cup pouring task is larger than that for the 2- and 3-cup pouring tasks, which implies that the difficulty of learning a task increases with the number of poured cups. Third, it was observed that many of the errors of the arbitrary-order pouring tasks are smaller than those of the same-order pouring tasks, which implies that the human operator may introduce more unusual operations in the same-order pouring tasks, since he/she needs to pay more attention to the unnatural operating order during demonstration. As for the calculation time for the pouring tasks, Fig. 4.9 shows that it increases approximately quadratically with the number of demonstrations, which matches the expected time complexity.
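One simple way to check the roughly quadratic growth is to fit a second-order polynomial to the measured calculation times and inspect the residual; the sketch below does exactly this and takes the demonstration counts and times as inputs rather than hard-coding values from Fig. 4.9.

```python
import numpy as np

def quadratic_fit_quality(n_demos, calc_time):
    """Fit t = a*n^2 + b*n + c to measured calculation times.

    n_demos:   demonstration counts (e.g. 3, 4, ..., 8).
    calc_time: measured calculation times in seconds for those counts.
    Returns the fitted coefficients and the relative residual; a small
    residual supports the observation that the calculation time grows
    roughly quadratically with the number of demonstrations.
    """
    n = np.asarray(n_demos, dtype=float)
    t = np.asarray(calc_time, dtype=float)
    coeffs = np.polyfit(n, t, deg=2)
    residual = t - np.polyval(coeffs, n)
    return coeffs, float(np.linalg.norm(residual) / np.linalg.norm(t))
```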

We further evaluate how the presence of redundant operations during demonstration may affect system performance. The analysis was based on the 2-vessel pouring task. Into a group of 2-vessel demonstrations, we gradually added some demonstrations involving three or four vessels, taken as the introduction of redundant operations. With this, we attempted to find out whether the proposed approach could still successfully recognize the demonstrations as

[Figure 4.9 legend: 2, 3, and 4 vessels with same (s) and arbitrary (a) orders, with unoperated objects; horizontal axis: number of demonstrations, vertical axis: calculation time t (s).]

Figure 4.9: The calculation time with different numbers of demonstrations in the six pouring tasks.

involving redundant operations. Table 4.5 lists the number of successes out of 10 tests as the number of demonstrations involving 3 or 4 vessels increased from 1 to 4 out of a total of 8 demonstrations. Because redundant operations are not common motions that can be found in every training demonstration in Eq. (3.8), a larger proportion of redundant operations usually led to a larger Emax in deriving the

optimal MI, as shown in Eq. (3.14). Even when the demonstrations involving redundant operations made up half of the total demonstrations, the proposed approach could still reach a high success rate of 80%.
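The success counts in Table 4.5 below can be gathered with a protocol along the following lines; deduce_intention and is_success are placeholders for the learning method and for the check that the deduced intention matches the 2-vessel pouring goal, and they are passed in as callables because they are not specified here.

```python
import random

def count_successes(clean_demos, redundant_demos, n_redundant,
                    deduce_intention, is_success, trials=10, seed=0):
    """Count how often the deduced intention survives redundant demonstrations.

    clean_demos:     demonstrations of the plain 2-vessel pouring task.
    redundant_demos: demonstrations that additionally pour a 3rd or 4th vessel.
    n_redundant:     how many redundant demonstrations to mix into each
                     training set of 8 demonstrations.
    deduce_intention, is_success: caller-supplied hooks for the learning
                     method and the success check, respectively.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        training = (rng.sample(clean_demos, 8 - n_redundant)
                    + rng.sample(redundant_demos, n_redundant))
        rng.shuffle(training)
        if is_success(deduce_intention(training)):
            successes += 1
    return successes
```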

Table 4.5: Number of successes in the presence of redundant operations.

2 vessels   3 vessels   4 vessels   Number of successes
7           1           0           9
6           1           1           9
5           2           1           8
4           2           2           8

Regarding the operating order of the tasks, it was observed in the experiment that the path length and demonstration time of the tasks whose operating orders are arbitrary are smaller than those of the tasks whose operating orders are the same, as


shown in Table 4.6. Therefore, an arbitrary order of the operations can decrease the path length and the demonstration time of the operations. Besides, it can decrease the probability of operation mistakes, pause motions, and redundant operations, because the operator does not need to care about the order of the operations when he/she demonstrates a complex task. From Fig. 4.9 and Table 4.6, it is clear that the calculation time increases with the demonstration time, which means that arbitrary operating orders can also decrease the calculation time in the pouring tasks. Therefore, in order to decrease the calculation time of our method, the operator should accomplish each demonstration as quickly as possible and need not care about the operating order in each demonstration.
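The path lengths and demonstration times reported in Table 4.6 can be computed directly from the recorded position samples; a minimal sketch is shown below, where the duration is taken simply as the number of samples divided by the 30 Hz sampling rate.

```python
import numpy as np

def path_length_and_time(positions, sampling_rate_hz=30.0):
    """Path length (m) and duration (s) of one demonstration.

    positions: (T, 3) array of recorded x, y, z samples in meters.
    The length is the sum of distances between consecutive samples, and the
    duration follows from the sample count and the sampling rate.
    """
    p = np.asarray(positions, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(p, axis=0), axis=1)
    return float(segment_lengths.sum()), (len(p) - 1) / sampling_rate_hz
```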

Table 4.6: The average path length and demonstration time in the pouring tasks.

Pouring task     Order            Path length (m)   Demonstration time (s)
Pouring 2 cups   Same order       1.017             3.450
Pouring 2 cups   Arbitrary order  0.973             3.452
Pouring 3 cups   Same order       1.436             4.900
Pouring 3 cups   Arbitrary order  1.320             4.569
Pouring 4 cups   Same order       1.776             6.396
Pouring 4 cups   Arbitrary order  1.523             5.856

4.3 Coffee-making Task

In the coffee-making tasks, a spoon (spoon A) and a vessel (vessel B) are placed randomly within a 0.21 × 0.21 m² area on the table in each demonstration, and three vessels (vessels C, D, and E) are placed at assigned positions that are not changed during the experiment, as shown in Fig. 4.10. Spoon A is always held as a tool and operated to perform the scooping and stirring motions on the vessels. Two different coffee-making tasks are designed to test the effects of repeated operations. The motions of the first coffee-making task use spoon A to scoop


Figure 4.10: Experimental setups for the coffee-making task: (a) human demonstration and (b) robot execution.

The motions of the second coffee-making task use spoon A to scoop the content of vessel C into vessel B, to scoop the content of vessel C into vessel B again, to scoop the content of vessel D into vessel B, and to stir the content of vessel B. Both coffee-making tasks are demonstrated 22 times, for a total of 44 demonstrations.

In the coffee-making task, the Polhemus FASTRAK tracking system, with a sampling rate of 30 Hz for the sensors attached to spoon A, was used to measure the demonstrated trajectories. These trajectories were recorded as 7-dimensional sequences consisting of positions and orientations (in the form of quaternions) in 3 and 4 dimensions, respectively, with the positions normalized by their standard deviation.

In the coffee-making tasks, vessels C, D, and E never change their positions in the training data. In order to recognize operated objects in the background, a single background object is created to represent all of the background objects, and its position is set at the origin. Therefore, in the coffee-making tasks, only two candidate operated objects are provided as input: vessel B and the background object.
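A minimal sketch of this input construction is shown below; the dictionary layout and the function name are assumptions introduced only for illustration.

```python
import numpy as np

def build_candidate_objects(vessel_b_position):
    """Candidate operated objects for the coffee-making tasks.

    Vessels C, D, and E never move during the experiment, so they are all
    represented by one background object placed at the origin; only vessel B
    keeps its measured position. The dictionary layout is illustrative.
    """
    return {
        "vessel_B": np.asarray(vessel_b_position, dtype=float),
        "background": np.zeros(3),   # stands in for vessels C, D, and E
    }
```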

For each of the two coffee-making tasks, 8 of the demonstrations are randomly selected as training data for the learning method, and the remaining demonstrations are selected as testing data. These processes are
