Design of Sensing System and Anticipative Behavior for Human Following of Mobile Robots


Jwu-Sheng Hu, Member, IEEE, Jyun-Ji Wang, and Daniel Minare Ho

Abstract—The human-following behavior design for mobile robots is considered in this paper. In particular, the issue should be addressed from the perspective of human–robot interaction since humans are aware of the following actions. This makes the problem quite different from human tracking, where recognition and location accuracy are the main concerns. An anticipative human-following behavior is proposed by incorporating a human model. The human model is constructed using relevant scientific studies about human walk and social interaction, allowing the robot to predict the human trajectory and to take preemptive action. To realize the idea, it is necessary to have a robust sensing system that is capable of tracking the human location persistently. In this paper, we also propose a sensing system based on a novel 3-D mean-shift algorithm for an RGBD camera. The system performance is assessed through experimental evaluation of three specific human-following scenarios: following from behind, following on the side, and following in front. Each of these scenarios has its particularities and applications, thus providing insight about the effectiveness and usability of anticipative behavior.

Index Terms—Human following, mobile robot, tracking.

I. INTRODUCTION

The rapid development of robotics technology has made it possible for robots to be used for tasks that require cooperation with humans, such as health-care-related tasks, home automation, and construction. For instance, [1] presents a few examples of robots that can help carry heavy objects. This development has been driven, in great part, by technological advances in human–robot interaction (HRI). On the human side, HRI involves aspects of human dynamics, behavior, and perception; on the robot side, it involves a variety of sensors and algorithms that are required for high-quality and safe cooperation. This paper focuses on one specific aspect of HRI: human following. Following is an important capability for mobile robots working together with humans. A typical application is when a person needs to guide a robot toward a place in order to execute a task. For example, a construction robot, such as the one presented in [2], could benefit from this capability by autonomously following the human operator to the work position. Another application is to allow a human user to teach the robot how to get to a specific location, as was discussed in [3]. In some cases, following the human is an integral part of the robot’s task, e.g., when the robot is carrying objects for the human or when the robot needs to perform a task that requires interacting with a walking human.

Manuscript received October 18, 2012; revised February 19, 2013 and March 26, 2013; accepted April 27, 2013. Date of publication May 13, 2013; date of current version September 19, 2013.

The authors are with the Department of Electrical Engineering, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIE.2013.2262758

Table I shows an overview of recent human-following works in robotics. Many of these works focus on the tracking algorithms, and they present a number of techniques for detecting humans and recovering them in the case of occlusions. However, only a few focus on the following behavior itself and on specific aspects of modeling the human walk. Note also that there are no standard performance measures that allow comparative evaluation of these works.

The basic human-following methodology is to simply move the robot toward the human position and then to stop the robot when it is close enough to the human. In this way, when the human moves, the robot will follow afterward. This approach is usually sufficient to satisfy the requirements in applications in which the robot only has to follow the human and no special interaction is required. A similar but smoother strategy is the virtual spring approach [7], [17].

Fig. 1(a) shows a schematic visualization of this analogy. In this implementation, a virtual spring connects the human and the robot. When the human moves, the robot smoothly follows afterward. Furthermore, constraining this virtual spring to a specific angle allows the robot to follow the human side by side. One limitation of the virtual spring is that it is a passive behavior, in the sense that the robot does not make any assumption about the human movement, and therefore, it will not take any preemptive action.

In order to overcome this, works such as [16] use anticipation methods that allow the robot to take preemptive action during the following, as shown in Fig. 1(b). They present a side-by-side following strategy that is designed to maximize a utility function for both human and robot. This approach requires the tuning of a considerable number of parameters, which were estimated from the trajectory data of two people walking side by side. Another way to model the human and robot interaction is by modeling the environment with a potential field as in [15]. This is based on the concept that the human motion can be described in terms of social forces. For example, if the robot intercepts a person as in Fig. 1(c), the robot will apply a repulsive social force on the human, causing him or her to alter his or her original trajectory. Other works also used human-following strategies in specific applications: [18] developed a robot to guide elderly people in retirement houses, [19] implemented a robot to guide people in shops and supermarkets, and [20] implemented a robotic supermarket cart that also serves as a walking aid.

0278-0046 © 2013 IEEE

TABLE I

RECENT RELEVANT HUMAN-FOLLOWING WORKS

Fig. 1. (a) Following with a virtual spring [7], [17]. (b) Anticipative following [16]. (c) Human behavior modeled with social forces [15].

In order to implement a generalized following behavior, it is desirable to predict the human intention. This, in turn, requires real-time human tracking and obstacle avoidance capability. This paper proposes a modified meanshift algorithm with depth information using an RGBD camera, which acquires both color information (RGB) and depth information (D). The algorithm uses the depth information as the weighting of spatial similarity to reduce the influence of background with a similar color distribution. The meanshift algorithm is combined with leg tracking from a laser range finder (LRF) and ultrasound information for obstacle avoidance to form the sensing system for the mobile robot. The second contribution of this paper is a novel human-following behavior based on the observation that the human walk pattern changes depending on the robot’s positioning scheme. This is more pronounced when the human tries to command the robot using her/his movement. This issue is seldom addressed in previous works on human following. To make the issue more prominent, we consider the “front-following” behavior, in which the robot tries to stay in front of the human being followed. Accompanying a human from the front is useful in many applications. If the robot carries tools, materials, or merchandise to be dispensed, it is more natural for the person to access the items if the robot is in front. Factories, supermarkets, libraries, and restaurants are a few examples of environments where people could benefit from robot assistants with this behavior. This behavior not only offers a challenging case for HRI but also has practical usage (e.g., when the human has to fetch objects from and/or place them on the robot frequently). Different from the related works mentioned previously, this paper focuses on the normal human walk, as explained later. In this case, it is possible to make assumptions about the human movement that allow the estimation and prediction of the human trajectory. Moreover, we analyze the human and robot interaction in the light of commands embedded in actions [21], using the idea of social forces in [15] and also mutual utility maximization as in [16].

This paper proposes an anticipative human-following behavior for mobile robots. The term anticipative means that the robot will take preemptive action based on a predictive human model. The system is composed of a human-tracking system, a human walk model for estimation of hidden states, and a control strategy that can be adapted to various following scenarios. The experimental evaluation is based on three specific human-following scenarios: following from behind, following on the side, and following in front. The particularities of these human-following scenarios are explained. With these experiments, we show that, by using scientific data and experimental observations about human walk, it is possible to improve the performance of the human-following algorithm. As mentioned previously, there are no standard metrics for assessing the performance of HRI. Reference [22] suggests a few application-driven alternatives. In this paper, the performance is evaluated with measurable quantities, such as time and control effort, and also by qualitative means, in which human volunteers subjectively rate the system performance and express their opinions.

Fig. 2. Sensors and field of view on the mobile robot.

Fig. 3. (a) Scenarios of human following. (b) Relative position and pose of the robot (R) to the human (H).

II. PROBLEM FORMULATION AND SYSTEM OVERVIEW

The sensors on the mobile robot used to investigate the proposed algorithm are shown in Fig. 2. The frontal direction is defined as the direction of the RGBD camera and the LRF, which are used to detect the human. The sonar sensor is used for obstacle avoidance. Fig. 3(a) depicts the three specific human-following scenarios considered in this paper: following from behind, following on the side, and following in front. Fig. 3(b) shows the relative position and pose of the robot to the human. Two angles relative to the heading direction of the human are important: the angle α of the robot’s position vector and the angle β of the robot’s frontal direction. The desired angles for the human-following behavior are shown in Table II.

The overall following behavior control system comprises two main functional blocks, control and sensing, as shown in Fig. 4. The sensing block is responsible for observing the human motion, while the control block determines the robot command for following, including obstacle avoidance. The human position measurements $(x_{H0}, y_{H0})$ and $(x_{H1}, y_{H1})$ are obtained by 3D MeanShift and LegTracking, respectively, as explained later. Robot Odometry provides an estimate of the robot states from the odometer. Human Walk Model Estimation is responsible for fusing the measurements and for estimating the states of the human: position and pose $p_H = [x_H \; y_H \; \theta_H]^T$, linear velocity ($v_H$), and rotational velocity ($\omega_H$). The variable $b_{switch}$ is the switch that commands the robot to conduct either passive or anticipative behavior, as explained later. Robot Command Generation uses the estimated parameters from the human walk model and ultrasound data for obstacle detection to generate the control inputs $(v_R, \omega_R)$.

TABLE II

DESIRED ANGLES FOR THE FOLLOWING BEHAVIOR

Fig. 4. Overall control system block diagram.

III. SENSING SYSTEM

As shown in Fig. 4, the sensing system contains four functional subblocks. The human leg tracking by LRF is implemented using the adaptive breakpoint detector from [25]. The estimation of the robot state from the odometer is standard practice for two-wheel differential-drive robots [44]. Therefore, in the following, we focus on introducing the proposed modified meanshift algorithm with depth information and the human walk model estimation.

A. Modified Meanshift Algorithm With Depth Information

The RGBD camera is composed of an RGB camera and a depth camera. It can capture the color and depth images simultaneously. The depth data have been calibrated to match the correct pixel location in the color image. Depending on the camera model, the pixel location and depth can be converted to a 3-D position in the camera coordinate frame.
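The conversion from pixel location and depth to a 3-D camera-frame position can be illustrated with a minimal sketch. It assumes an ideal pinhole camera model, which the text does not specify; the function name and the intrinsic parameters `fx`, `fy`, `cx`, and `cy` are hypothetical and would come from the calibration of the actual camera.

```python
def pixel_to_camera_xyz(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with measured depth into camera coordinates.

    Assumption: ideal pinhole model without lens distortion.
    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    The returned coordinates use the same length unit as `depth`.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Illustrative intrinsics only (not taken from the paper).
point = pixel_to_camera_xyz(u=845, v=240, depth=1000.0,
                            fx=525.0, fy=525.0, cx=320.0, cy=240.0)
```

A pixel at the principal point maps onto the optical axis, so its lateral coordinates are zero regardless of depth.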

In the meanshift algorithm [26], the target model and the target candidate are represented by color probability density functions (pdfs) distributed in the 2-D image plane. We consider the center of the target model $q$ as location 0, and the candidate model $p(y)$ is defined at location $y$. The image data are classified into $m$-bin histograms in order to reduce the computational complexity. Thus, the target model and the target candidate are defined as

$$q = \{q_u\}_{u=1,\ldots,m}, \qquad \sum_{u=1}^{m} q_u = 1$$

$$p(y) = \{p_u(y)\}_{u=1,\ldots,m}, \qquad \sum_{u=1}^{m} p_u(y) = 1.$$

A rectangular region with $N$ pixels is selected in the image as the target. Let $x_i^*$ be the normalized pixel location in the region, with corresponding depth $l_i$. The region is centered at 0. We define the function $b : R^2 \to \{1, \ldots, m\}$ as the color index, so that $b(x_i^*)$ is the index of the bin, in the quantized feature space, of the pixel at location $x_i^*$. The depth information is added to the target model and the target candidate as a kernel weighting function. The modified probability of the feature $u = 1, \ldots, m$ in the target model is then computed as

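$$q_u = C \sum_{i=1}^{N} k\left(\left\|x_i^*\right\|^2\right) s\left(\left\|\frac{l_i - l_{avg}}{h_l}\right\|^2\right) \delta\left[b(x_i^*) - u\right] \tag{1}$$

where the kernel function $k$ is a 2-D normal function and $s$ is a 1-D normal function. $l_{avg}$ is the average depth in the central area of the region. The bandwidth $h_l$ depends on the maximum movement of the target between successive frames in the camera coordinate. The depth kernel function $s$ reduces the influence of background that has a similar color to the tracked object. $\delta$ is the Kronecker delta function. From the condition $\sum_{u=1}^{m} q_u = 1$, the normalization constant $C$ is derived as

$$C = \left[\sum_{i=1}^{N} k\left(\left\|x_i^*\right\|^2\right) s\left(\left\|\frac{l_i - l_{avg}}{h_l}\right\|^2\right)\right]^{-1}. \tag{2}$$

As with the modified target model with the depth kernel function, the modified target candidate with $N_c$ pixels and bandwidth $h$ is

$$p_u(y) = C_h \sum_{i=1}^{N_c} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) s\left(\left\|\frac{l_i - l_{avg}}{h_l}\right\|^2\right) \delta\left[b(x_i) - u\right] \tag{3}$$

where

$$C_h = \left[\sum_{i=1}^{N_c} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) s\left(\left\|\frac{l_i - l_{avg}}{h_l}\right\|^2\right)\right]^{-1}. \tag{4}$$

Similar to the meanshift procedure in [26], the current location $y_0$ moves to the new location $y_1$ according to the relation

$$y_1 = \frac{\sum_{i=1}^{N_c} x_i w_i \, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right) s\left(\left\|\frac{l_i - l_{avg}}{h_l}\right\|^2\right)}{\sum_{i=1}^{N_c} w_i \, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right) s\left(\left\|\frac{l_i - l_{avg}}{h_l}\right\|^2\right)} \tag{5}$$

where $g(x) = -k'(x)$, and

$$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}} \, \delta\left[b(x_i) - u\right]. \tag{6}$$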

Once the new location $y_1$ is obtained, the average depth $l_{avg}$ is recomputed over the current central region. The bandwidth $h$ is adapted according to the scale of the target [26].

The 3-D location of the target is computed from the new location $y_1$ and the average depth $l_{avg}$. The result is projected onto the 2-D plane with a predefined transformation to obtain the human location $(x_{H0}, y_{H0})$.
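One iteration of the depth-weighted update in (5) and (6) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: both kernels use the normal profile exp(-r/2), for which g = -k' is proportional to k so the constant cancels in (5); the weight uses the square-root ratio of the standard meanshift tracker [26]; bins are indexed from 0; and the pixel tuple layout and function names are hypothetical.

```python
import math


def normal_profile(r2):
    """Normal kernel profile evaluated at a squared, bandwidth-scaled distance."""
    return math.exp(-0.5 * r2)


def candidate_histogram(pixels, y, h, l_avg, h_l, m):
    """Depth-weighted color histogram around location y, as in (3) and (4).

    pixels: list of (px, py, depth, bin_index) tuples with 0-based bin_index.
    """
    hist = [0.0] * m
    for px, py, depth, bin_idx in pixels:
        r2 = ((y[0] - px) ** 2 + (y[1] - py) ** 2) / h ** 2
        d2 = ((depth - l_avg) / h_l) ** 2
        hist[bin_idx] += normal_profile(r2) * normal_profile(d2)
    total = sum(hist)
    return [value / total for value in hist] if total > 0.0 else hist


def mean_shift_step(pixels, q, y0, h, l_avg, h_l):
    """Return the new location y1 from the current location y0, as in (5)."""
    p = candidate_histogram(pixels, y0, h, l_avg, h_l, len(q))
    num_x = num_y = den = 0.0
    for px, py, depth, bin_idx in pixels:
        if p[bin_idx] <= 0.0:
            continue  # bin not represented in the candidate; skip the pixel
        w = math.sqrt(q[bin_idx] / p[bin_idx])  # weight w_i of (6)
        r2 = ((y0[0] - px) ** 2 + (y0[1] - py) ** 2) / h ** 2
        d2 = ((depth - l_avg) / h_l) ** 2
        weight = w * normal_profile(r2) * normal_profile(d2)
        num_x += px * weight
        num_y += py * weight
        den += weight
    if den == 0.0:
        return y0  # no usable pixel; keep the current location
    return (num_x / den, num_y / den)


# Same-colored target (depth 1000) and background (depth 3000) on opposite sides.
pixels = [(5.0, 0.0, 1000.0, 0), (6.0, 0.0, 1000.0, 0),
          (-5.0, 0.0, 3000.0, 0), (-6.0, 0.0, 3000.0, 0)]
y1 = mean_shift_step(pixels, q=[1.0], y0=(0.0, 0.0), h=10.0, l_avg=1000.0, h_l=100.0)
```

In this toy case the color alone cannot separate target from background, yet the depth kernel suppresses the far pixels and the estimate moves toward the target side.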

Fig. 5. Overview of the human walk model.

Fig. 6. Turning strategies during normal walk.

B. Human Walk Model Estimation

The human walk model estimation is summarized in the diagram in Fig. 5. The observer is responsible for fusing the measurements $(x_{H0}, y_{H0})$ and $(x_{H1}, y_{H1})$ and for generating the estimates $(p_H, v_H, \omega_H)$. The spin turn detector resets the observer in case a spin turn happens, as explained later. The interaction model combines the human and robot states to generate the vector $b$, which configures the robot behavior to be either passive or anticipative.

Human beings have the agility and flexibility to perform many different movement patterns. In this paper, we focus on one of these patterns: normal walk. Different people have different gaits, but there is still a significantly high degree of similarity [27]. In this paper, the human walk model is considered on a macroscopic scale without going into the biomechanical details. In this way, it is possible to make a more generic and robust model that can adapt to a wider range of people. In normal walk, humans move forward most of the time and avoid turning. That is because the symmetry and structure of the human locomotion mechanism appear to favor straight-ahead walking. Therefore, in order to perform turns, the central nervous system (CNS) must give up this symmetry [28]. Moreover, it is known that the CNS has an optimizing behavior [29]. Therefore, it is expected that the CNS will favor trajectories with minimal turning.

As for turning behavior during human walk, two basic strategies can be observed: step turn and spin turn [30]. Fig. 6 shows a schematic representation of both strategies. The step turn is a slow and smooth turn. In this strategy, the steps have a constant rhythm, and the walking process is not interrupted. The spin turn is a fast and sharp turn. It requires breaking the forward movement in order to rotate the body to the new direction. As a result, the spin turn requires considerably more effort than the step turn; therefore, the CNS favors the execution of the step turn strategy.


During normal walk, the human moves strictly forward without any sideways movement. This leads to a nonholonomic model, an approach that was also proposed in [31]. The kinematic model of the human is given by

$$\dot{p}_H = \begin{bmatrix} \dot{x}_H \\ \dot{y}_H \\ \dot{\theta}_H \end{bmatrix} = \begin{bmatrix} \cos(\theta_H) & 0 \\ \sin(\theta_H) & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_H \\ \omega_H \end{bmatrix} \tag{7}$$

where $\dot{p}_H$ is the first derivative of the human posture $p_H = [x_H \; y_H \; \theta_H]^T$, in which $x_H$ and $y_H$ are the human position coordinates and $\theta_H$ is the human orientation. $v_H$ is the forward walking velocity, and $\omega_H$ is the turning velocity.

The leg-tracking data from the LRF and the meanshift result from the RGBD camera have different rates and different sensing regions. The single-constraint-at-a-time (SCAAT) framework [46] is used to solve the sensor fusion problem. Whenever one measurement is available, the time $t_{k,k-1}$ elapsed since the previous estimate is computed. The kinematic model of the human can be written as

$$p_H(k) = f_H\left(p_H(k-1), v_H(k-1), \omega_H(k-1)\right) \tag{8}$$

where $f_H$ is a nonlinear function given by

$$f_H(p_H, v_H, \omega_H) = \begin{cases} \begin{bmatrix} x_H + \frac{v_H}{\omega_H}\left(\sin(\theta_H + \omega_H t_{k,k-1}) - \sin\theta_H\right) \\ y_H + \frac{v_H}{\omega_H}\left(-\cos(\theta_H + \omega_H t_{k,k-1}) + \cos\theta_H\right) \\ \theta_H + \omega_H t_{k,k-1} \end{bmatrix}, & \omega_H \neq 0 \\[2ex] \begin{bmatrix} x_H + v_H t_{k,k-1}\cos(\theta_H) \\ y_H + v_H t_{k,k-1}\sin(\theta_H) \\ \theta_H \end{bmatrix}, & \omega_H = 0. \end{cases} \tag{9}$$

The geometric interpretation of this function is that the human moves along an arc of radius $v_H/\omega_H$. Given the leg-tracking data $(x_{H1}, y_{H1})$ from the LRF or the 3-D meanshift data $(x_{H0}, y_{H0})$ from the RGBD camera, the state-space formulation for the human posture observation is given as follows:

$$\mathbf{x}_H(k) = \begin{bmatrix} p_H(k) \\ v_H(k) \\ \omega_H(k) \end{bmatrix} = \begin{bmatrix} f_H\left(\mathbf{x}_H(k-1)\right) \\ v_H(k-1) \\ \omega_H(k-1) \end{bmatrix} + \mathbf{w}_H(k-1)$$

$$\mathbf{z}_H(k) = \begin{bmatrix} I_{2\times 2} & 0_{2\times 3} \end{bmatrix} \mathbf{x}_H(k) + \mathbf{v}_H(k)$$

$$\mathbf{w}_H(k) \sim N\left(0, Q_H(k)\right), \qquad \mathbf{v}_H(k) \sim N\left(0, R_H(k)\right) \tag{10}$$

where $\mathbf{z}_H(k) = [x_H(k) \; y_H(k)]^T$, $\mathbf{w}_H$ is the process noise, and $\mathbf{v}_H$ is the measurement noise (boldface distinguishes the state and noise vectors from the scalar position $x_H$ and velocity $v_H$). $Q_H$ and $R_H$ are the respective covariance matrices of the process and measurement noises. This paper uses the same measurement noise covariance matrix $R_H$ for the measurements from both sensors.

The formulation is suitable for various kinds of nonlinear filtering algorithms to estimate the states. The unscented Kalman filter is used in this paper, as explained in the experimental implementation. Note that, if the measurements from both sensors are unavailable (e.g., out of range), the robot stops the following behavior.
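The prediction function $f_H$ in (9) can be sketched directly. The function name is hypothetical, and the small threshold `eps` that selects the straight-line branch is an implementation assumption introduced here to avoid dividing by a near-zero turning velocity; the text itself distinguishes only the exact cases.

```python
import math


def predict_human_pose(x, y, theta, v, omega, dt, eps=1e-9):
    """Propagate the human pose over the elapsed time dt, following (9).

    Assumption: |omega| <= eps is treated as the straight-walk case.
    """
    if abs(omega) > eps:
        radius = v / omega  # arc radius v_H / omega_H
        x_new = x + radius * (math.sin(theta + omega * dt) - math.sin(theta))
        y_new = y + radius * (-math.cos(theta + omega * dt) + math.cos(theta))
    else:
        x_new = x + v * dt * math.cos(theta)
        y_new = y + v * dt * math.sin(theta)
    return (x_new, y_new, theta + omega * dt)


# Quarter turn to the left at 1 m/s over 1 s (illustrative values).
pose = predict_human_pose(0.0, 0.0, 0.0, v=1.0, omega=math.pi / 2, dt=1.0)
```

Because the elapsed time is passed in explicitly, the same function serves measurements that arrive at different rates from the two sensors.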

When the human turning velocity $\omega_H$ or the measurement errors exceed a certain threshold $\omega_{st}$, it may indicate that the human is performing a spin turn, therefore violating the smooth nonholonomic model. In this case, a spin turn detector running in parallel with the unscented Kalman filter (UKF) resets the filter when a spin turn is detected. The turning velocity $\omega_H$ is set to zero, and the human pose $p_H$ is recalculated based on the average of the last $k_{st}$ observations of the human position. This avoids instability and false loops in the estimated trajectory.

Fig. 7. Comparison between a social robot and a conventional robot [33]. (a) Conventional robot. (b) Social robot.

Fig. 8. Inclusion of an extra level in the robot model.

C. Interaction Model

It is known that humans have specific walking behaviors, patterns, and formations while walking in groups [32]. A similar phenomenon is observed when a robot is present: the human changes his or her walking behavior. Furthermore, these changes depend on the robot movement pattern. This indicates that a model that considers only the human walk is insufficient. To begin with, consider the HRI model in Fig. 7 suggested by [33]. It has already been shown by previous works [34] that the interaction between robots and humans can be enhanced by improving the human model the robot works with. Based on these works, we propose to insert a robot model inside the human model, as shown in Fig. 8. This represents a model of how the human perceives the robot, which further emulates the interaction between humans. It is natural that the concepts of reflected self [35], [36], self-concept [37], and empathy [38], which are fundamental in human–human interaction and mutual understanding, should also be of fundamental importance in HRI.

Fig. 9. Detailed human model.

Fig. 10. Two behaviors that the human perceives the robot following strategy. (a) Anticipative behavior. (b) Passive behavior.

However, it is currently not feasible to model completely how the human perceives the robot. Therefore, in this paper, we attempt to model only the way the human perceives the robot following strategy. Moreover, only two possible behaviors are considered: anticipative behavior and passive behavior, as shown in Fig. 9. The system switches to the passive behavior when the human comes within a circle of radius $d_i$ (the interaction distance) around the robot. It switches back to the anticipative behavior when the human leaves the area of interaction. The variable $b_{switch}$ indicates to the command generation system whether it should enable the passive behavior or the anticipative behavior; this switching scheme is called the combined strategy.
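The switching rule of the combined strategy can be sketched as a simple distance test. The function name, the string return values, and the treatment of the boundary (a distance exactly equal to the interaction distance counts as inside) are assumptions made for illustration; the text does not specify them.

```python
import math


def select_behavior(human_xy, robot_xy, d_i):
    """Return the behavior selected by b_switch for the current configuration.

    Assumption: the interaction area is a disk of radius d_i around the robot,
    and its boundary belongs to the interaction area.
    """
    distance = math.hypot(human_xy[0] - robot_xy[0], human_xy[1] - robot_xy[1])
    return "passive" if distance <= d_i else "anticipative"


# Illustrative values in meters (not taken from the paper).
mode_near = select_behavior((0.0, 0.0), (0.5, 0.0), d_i=0.8)
mode_far = select_behavior((0.0, 0.0), (1.5, 0.0), d_i=0.8)
```

Here the first call selects the passive behavior and the second the anticipative one. A practical implementation might add hysteresis around the boundary to avoid rapid toggling, which is a design choice outside what the text describes.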

The anticipative behavior is illustrated in Fig. 10(a). The robot anticipates the human trajectory based on the human walk kinematic model described previously. Therefore, in the human model in Fig. 9, the anticipative behavior state in the robot model indicates that the human expects the robot to predict his or her trajectory. The passive behavior shown in Fig. 10(b) indicates that the robot has no anticipative action as it simply keeps a constant distance from the human. Therefore, in the human model, the human expects the robot to respond only to his or her position, without making any assumptions about his or her trajectory.

Fig. 11. Comparison between the passive and anticipatory behaviors.

The anticipative behavior is applicable when the human walks without trying to influence the robot trajectory. Conversely, the passive behavior is applicable when the human tries to control the robot trajectory. Since the robot does not respond to any input from the human except for his or her position, it is very natural for the human to use his or her position to control the robot. The need to control the robot arises naturally in scenarios in which it is not clear where the robot should go. This human behavior was observed in the experiments.

That can be considered a command embedded in action, as explained in [21]. In this kind of command, the robot observes the human behavior and responds accordingly. The existence of the command embedded in action and the necessity of both behaviors are best illustrated by the front following case, as in Fig. 11. In this case, the robot is in front of the human, and it acts like a cart being pushed. With the passive behavior, the robot cannot follow the human walking direction if it is not aligned with the human, as in Fig. 11(a). With the anticipative behavior in Fig. 11(c), the robot follows the human walking direction even if it is not aligned with the human. When the human wishes to turn at a corridor corner, the human may try to issue a command embedded in action to tell the robot where it should go, as in Fig. 11(b). In this case, if the robot did not interpret the command embedded in action, it would move in an undesirable direction, as in Fig. 11(d). This can also be analyzed in the light of the social forces described in [15]. The human approaches the robot from a certain direction so as to virtually push it in the opposite direction.

IV. ROBOT COMMAND GENERATION

Dynamic control strategies for the two-wheel differential drive mobile robot have been extensively studied in many works such as [23], [39], and [40]. In this paper, we focus on generating the command to carry out the behaviors described in the previous section. The following positions are those considered in Fig. 3 and Table II. The arrows in Fig. 3(a) indicate the possible transitions among the following strategies. Red arrows represent transitions that require a 180° turn, while green ones do not.

The passive behavior is implemented similarly to the virtual spring method developed by [7] and [17]. It is activated by the input $b_{switch}$, and it is designed to respond to social forces and commands embedded in actions. The anticipative behavior is implemented with a control rule to stabilize the robot at a following angle around the human:

$$\begin{bmatrix} v_R^* \\ \omega_R^* \end{bmatrix} = \begin{bmatrix} K_v\left(D_X + d_f\cos(\alpha)\right) + \cos(\theta_H - \theta_R)\, v_H \\ K_y v_r\left(D_Y + d_f\sin(\alpha)\right) + K_\theta\left(\beta + \theta_H - \theta_R\right)\omega_H \end{bmatrix} \tag{11}$$

where $K_v$, $K_y$, and $K_\theta$ are tunable constants to be chosen according to the mobile robot characteristics and the desired following behavior. $v_r$ is a reference velocity, and $d_f$ is the desired distance between the robot and the human. The velocities $v_H$ and $\omega_H$ are included in the control rule as a prediction of the human trajectory. $\alpha$ is the following angle, and $\beta$ is the desired relative angle between the robot and the human. $D_X$ and $D_Y$ are calculated as

$$D_X = \left((x_H, y_H) - (x_R, y_R)\right) \cdot \hat{i}_H$$

$$D_Y = \left((x_H, y_H) - (x_R, y_R)\right) \cdot \hat{j}_H \tag{12}$$

where $\hat{i}_H$ and $\hat{j}_H$ are unit vectors that point in the human front and left directions, respectively. Please refer to Fig. 3(b) for the definitions of these variables and vectors.
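The projections in (12) can be sketched as follows. The function name and the tuple layout are hypothetical; the unit vectors are built from the human orientation $\theta_H$, with the left direction taken as the front direction rotated by 90° counterclockwise, which is an assumption consistent with the stated front and left directions.

```python
import math


def relative_offsets(human_pose, robot_xy):
    """Compute (D_X, D_Y) of (12) from the human pose and the robot position.

    human_pose: (x_H, y_H, theta_H); robot_xy: (x_R, y_R).
    """
    x_h, y_h, theta_h = human_pose
    dx, dy = x_h - robot_xy[0], y_h - robot_xy[1]
    i_h = (math.cos(theta_h), math.sin(theta_h))    # human front direction
    j_h = (-math.sin(theta_h), math.cos(theta_h))   # human left direction
    d_x = dx * i_h[0] + dy * i_h[1]
    d_y = dx * j_h[0] + dy * j_h[1]
    return (d_x, d_y)


# Robot exactly beside the human, on the left, at the desired distance d_f.
d_f = 1.5
d_x, d_y = relative_offsets((0.0, 0.0, 0.0), (0.0, d_f))
alpha = math.pi / 2
position_errors = (d_x + d_f * math.cos(alpha), d_y + d_f * math.sin(alpha))
```

With the robot placed at the desired side position and a following angle of 90°, both position terms that enter the control rule (11) are zero up to rounding, so only the velocity and heading terms remain.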

Lastly, the command generated from the behavior control is combined with the one generated by obstacle avoidance. In the human-following scenario, we assume that the human is sufficiently close to the robot, in the sense that there is always a simple path toward the human. Therefore, only local path planning is used for obstacle avoidance. In this paper, the POTBUG algorithm developed in [41] is adopted. This algorithm is a combination of potential field navigation and BUG navigation. In this paper, the radius of the focus circle is set to the distance to the closest reachable obstacle. The potential $U$ for each point in the focus circle is calculated as

$$U(p_\theta) = U_{Attractor}(d_a) + \sum_{i} U_{Obstacle}(d_i) \tag{13}$$

where $p_\theta$ is a point in the focus circle, $U_{Attractor}$ is a monotonically increasing function, $d_a$ is the distance between $p_\theta$ and the subgoal, $U_{Obstacle}$ is a monotonically decreasing function, and $d_i$ is the distance between $p_\theta$ and the $i$th obstacle. $U_{Obstacle}$ is infinity for distances below a safety value. This ensures that the robot will not collide with obstacles.
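The evaluation of (13) over the forward part of the focus circle can be sketched as below. The text specifies only that the attractor term is monotonically increasing and that the obstacle term is monotonically decreasing and infinite below a safety distance; the particular forms used here (a linear attractor and an inverse-distance obstacle term), the angular sampling, and all names are assumptions chosen for illustration and are not taken from [41].

```python
import math


def attractor_potential(d_a):
    """Hypothetical attractor: increases linearly with distance to the subgoal."""
    return d_a


def obstacle_potential(d, d_safe):
    """Hypothetical obstacle term: infinite within d_safe, decreasing beyond it."""
    return math.inf if d <= d_safe else 1.0 / d


def lowest_potential_heading(robot_xy, robot_theta, radius, subgoal,
                             obstacles, d_safe, samples=37):
    """Return the heading of the forward focus-circle point with the lowest U."""
    best_u, best_angle = None, robot_theta
    for k in range(samples):
        # Sample the forward semicircle from -90 to +90 degrees around the heading.
        angle = robot_theta - math.pi / 2 + math.pi * k / (samples - 1)
        px = robot_xy[0] + radius * math.cos(angle)
        py = robot_xy[1] + radius * math.sin(angle)
        u = attractor_potential(math.hypot(px - subgoal[0], py - subgoal[1]))
        u += sum(obstacle_potential(math.hypot(px - ox, py - oy), d_safe)
                 for ox, oy in obstacles)
        if best_u is None or u < best_u:
            best_u, best_angle = u, angle
    return best_angle


# Subgoal straight ahead, with one obstacle blocking the direct heading.
theta_free = lowest_potential_heading((0.0, 0.0), 0.0, 1.0, (5.0, 0.0), [], 0.5)
theta_blocked = lowest_potential_heading((0.0, 0.0), 0.0, 1.0, (5.0, 0.0),
                                         [(1.0, 0.0)], 0.5)
```

Without obstacles the selected heading points at the subgoal, while the blocked case yields a heading that steers outside the safety distance of the obstacle.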

The algorithm works in two modes: free mode and avoid mode. In the free mode, the command $(v_R, \omega_R)$ is equal to $(v_R^*, \omega_R^*)$ from the anticipative behavior as in (11) or from the passive behavior. In the avoid mode, the value of $(v_R, \omega_R)$ is overridden as

$$\begin{bmatrix} v_R \\ \omega_R \end{bmatrix} = \begin{bmatrix} \min(v_R^*, v_{safe}) \\ K_\theta(\theta_{pa} - \theta_R) \end{bmatrix} \tag{14}$$

where $v_{safe}$ is the velocity from which the robot can come to a complete stop considering the maximum acceleration (deceleration) $a_{max}$. $\theta_{pa}$ is the angle of the point $p_\theta$ in the forward part of the focus circle that has the lowest potential $U$. $\Sigma U_{Obstacle}$ generates smooth equipotential contours around the obstacles. Therefore, in order to move around the obstacles, the algorithm sets subgoals on these contours. The contour is chosen at an avoid distance $d_{ao}$. If the obstacle is recognized to be the human, the avoid distance is set to $d_{ah}$. Similar to [41], the algorithm is summarized in the following steps.

If there is no obstruction in the forward semicircle $r_c$:

    Free mode: output $(v_R, \omega_R) = (v_R^*, \omega_R^*)$

Else if there is an obstruction:

    Avoid mode: output $(v_R, \omega_R)$ as in (14)

    If the obstacle is not a human, set the subgoal on the $d_{ao}$ contour of $\Sigma U_{Obstacle}$;

    Else if the obstacle is a human, set the subgoal on the $d_{ah}$ contour of $\Sigma U_{Obstacle}$.

TABLE III

ROBOT SENSORS

Fig. 12. Experimental setting: T-shaped corridor.

The main difference is that [41] has a fixed avoidance distance, while we use different distances for the human ($d_{ah}$) and other obstacles ($d_{ao}$), with $d_{ah}$ larger than $d_{ao}$. Also, [41] has three modes, namely, free, engaged, and approach, while this work uses only two modes, namely, free and avoid (engaged).
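The two-mode command selection, including the override in (14), can be sketched as follows. The function names are hypothetical. The helper for the safe velocity assumes constant deceleration, giving the square root of twice the maximum deceleration times the free distance; this formula is an assumption added for illustration, since the text only states that the robot must be able to stop.

```python
import math


def safe_velocity(a_max, free_distance):
    """Hypothetical v_safe: largest speed that still stops within free_distance
    under constant deceleration a_max."""
    return math.sqrt(2.0 * a_max * max(free_distance, 0.0))


def robot_command(v_star, w_star, obstructed, theta_pa, theta_r, v_safe, k_theta):
    """Return (v_R, omega_R) for the free mode or the avoid mode."""
    if not obstructed:
        # Free mode: pass the behavior command through unchanged.
        return (v_star, w_star)
    # Avoid mode, as in (14): cap the speed and steer toward theta_pa.
    return (min(v_star, v_safe), k_theta * (theta_pa - theta_r))


# Illustrative values only (not taken from the paper).
v_safe = safe_velocity(a_max=0.5, free_distance=0.25)
command = robot_command(v_star=0.8, w_star=0.1, obstructed=True,
                        theta_pa=0.3, theta_r=0.1, v_safe=v_safe, k_theta=2.0)
```

The different avoid distances for humans and other obstacles affect where the subgoal is placed, and therefore the value of the steering angle passed in, rather than the structure of this selection step.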

V. EXPERIMENTAL INVESTIGATION

The Pioneer3DX robot was used for the experiments. The sensors shown in Fig. 2 are listed in Table III.

A number of experiments were designed to evaluate the performance of the proposed human model and also to investigate the human interaction with the robot in different scenarios. The experimental setting was a T-shaped corridor like the one shown in Fig. 12. Each experiment was performed by eight participants, who were asked to take the robot from a starting location at one side of the corridor to an end location at the other side.

Three measures are used to assess the performance. The first one is the time the participant spends to take the robot from the start position to the end position. The second one is the robot total displacement as a measure of the robot effort. Lastly, the human total displacement obtained from the tracking algorithm is used to represent the human effort. The arbitrary parameters for the algorithms used in these experiments are described in Table IV.
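The displacement measures can be computed from a logged trajectory as the accumulated distance between consecutive samples. This is a minimal sketch that assumes "total displacement" means path length and that trajectories are stored as lists of planar points; the function name is hypothetical.

```python
import math


def total_displacement(points):
    """Sum of Euclidean distances between consecutive (x, y) samples."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Illustrative trajectory in meters (not experimental data).
robot_path = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
robot_effort = total_displacement(robot_path)
```

The same function applies to the human trajectory obtained from the tracking algorithm, so both effort measures share one definition.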

A. Human Tracking With Modified Meanshift Method

The RGBD camera captures frames at 30 fps with an image size of 640 × 480. The maximum speed of humans is limited to 300 cm/s. Therefore, the maximum displacement between successive frames is 300/30 = 10 cm. As a result, the parameter $h_d$ is set to 100 mm. The feature space is normalized RGB. The color histograms are divided into 216 bins, i.e., $m = 216$.

The initial tracking regions, assigned by hand, for the original meanshift method [26] and the proposed method were the same.

TABLE IV

PARAMETERS FOR EXPERIMENTS

Fig. 13. Results of the body tracking using meanshift algorithm (dashed line) and the proposed method (solid line), where frames 1, 10, 40, 60, 100, 130, 185, 198, 220, 240, 336, and 398 are listed.

The body was selected as the tracked target. In Fig. 13, the tracking block of the original meanshift algorithm was attracted to the wall in frame 60. When the target later walked near the tracking block, the tracking block remained locked on the wall. The proposed method tracked well in every frame. The depth kernel function reduces the influence of background that has a similar color to the tracked object.

B. Back Following

Table V shows quantitative performance measures for each behavior. Fig. 14 shows one representative path for each of the behaviors. Most participants commented that they could not observe any difference between the behaviors. From the experimental evaluation, we did not observe significant differences in performance either. One observable difference between the results is that, in the passive back following, the robot makes a wider turn than in the anticipative back following, as observed in Fig. 14. The reason for this is that, in the anticipative back following, the robot can estimate the human trajectory and is therefore able to turn faster.

TABLE V

EXPERIMENTAL EVALUATION OF BACK FOLLOWING

Fig. 14. Robot and human trajectories for the back following. (a) Passive back following. (b) Anticipative back following.

TABLE VI

EXPERIMENTAL EVALUATION OF SIDE FOLLOWING

C. Side Following

Table VI shows quantitative performance measures for each behavior, while Fig. 15 shows a comparison between the trajectories. Most participants commented that the anticipative behavior is faster and more human-like. A few participants walked faster than the robot could accompany, causing the anticipative behavior to have the same effect as the passive behavior.

Similar to the back following scenario, Table VI does not show significant differences in performance between the passive and anticipative behaviors for time and displacements.

Fig. 15. Robot and human trajectories over time for the side following. (a) Passive side following. (b) Anticipative side following.

Fig. 16. Relative following angle over time for the trajectories in Fig. 15. (a) Passive side following. (b) Anticipative side following.

TABLE VII

EXPERIMENTAL EVALUATION OF FRONT FOLLOWING

However, a distinct difference between the behaviors was the position of the robot relative to the human, measured with a relative angle. The ideal relative angle α should be 90° (see Table II), as the robot would be right beside the human. In the anticipative behavior, as in Fig. 16, the robot stays beside the human most of the time and only loses its position when the human turns, because the robot is already at its maximum speed. In the passive behavior, the robot has more difficulty following side by side with the human since it does not anticipate the human trajectory. The fact that the participants considered the anticipative behavior more human-like is compatible with the results from [16], as human walk behavior is highly anticipative.
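To make the relative-angle measure concrete, the following minimal Python sketch computes an unsigned angle between the human heading and the line from the human to the robot. This particular definition, the coordinate convention (heading measured counterclockwise from the x-axis), and the function name are assumptions for illustration; the definition actually used for α is given with Table II and may differ.

```python
import math


def relative_angle_deg(human_xy, human_heading_rad, robot_xy):
    """Unsigned angle (degrees) between the human heading and the human-to-robot line.

    Assumed convention: 0 means the robot is straight ahead of the human,
    90 means directly beside, and 180 means directly behind.
    """
    dx = robot_xy[0] - human_xy[0]
    dy = robot_xy[1] - human_xy[1]
    bearing = math.atan2(dy, dx)
    diff = bearing - human_heading_rad
    # Wrap the difference to (-pi, pi] before taking the magnitude.
    diff = math.atan2(math.sin(diff), math.cos(diff))
    return abs(math.degrees(diff))


# Human at the origin walking along +x.
beside = relative_angle_deg((0.0, 0.0), 0.0, (0.0, -1.0))   # robot on the right side
behind = relative_angle_deg((0.0, 0.0), 0.0, (-1.0, 0.0))   # robot directly behind
ahead = relative_angle_deg((0.0, 0.0), 0.0, (1.0, 0.0))     # robot directly in front
```

Under this assumed convention, a side-following robot that keeps its position yields a value close to 90°, and deviations from 90° during a turn indicate that the robot is falling behind or moving ahead of the human.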

D. Front Following

Four different robot behaviors were tested: anticipative front following, passive front following, combined front following strategy (interaction model), and passive back following. The back following was added to this experiment as a benchmark. The behaviors were tested in random order with different participants to avoid biasing the results. In the front following behaviors, the robot was restricted to only follow the human in front, i.e., the robot never moves toward the human. The combined strategy has characteristics from the passive and anticipative behaviors using the interaction model in Section III. As previously explained, the passive behavior is enabled when the human approaches the robot within an interaction distance d_i, and the anticipative behavior is enabled when the human leaves this interaction area.
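The switching rule of the combined strategy can be sketched in a few lines of Python. This is a simplified illustration that uses only the human-robot distance and a single threshold; the function name and the threshold value are hypothetical, and the interaction model in Section III may include additional conditions.

```python
import math


def select_front_following_mode(human_xy, robot_xy, d_i):
    """Return 'passive' inside the interaction area and 'anticipative' outside it."""
    distance = math.hypot(human_xy[0] - robot_xy[0], human_xy[1] - robot_xy[1])
    return "passive" if distance <= d_i else "anticipative"


D_I = 1.0  # hypothetical interaction distance in meters

# The human starts far from the robot, approaches it, and then falls back again.
human_path = [(0.0, 0.0), (0.8, 0.0), (1.4, 0.0), (0.5, 0.0)]
robot_xy = (2.0, 0.0)
modes = [select_front_following_mode(h, robot_xy, D_I) for h in human_path]
```

A threshold of this kind switches abruptly at the boundary of the interaction area; a practical implementation might add hysteresis to avoid rapid toggling, which is not addressed here.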

Table VII shows quantitative performance measures for each of the behaviors. Fig. 17 shows one representative walking path for each of the tested behaviors. In detail, the figure shows the human and robot orientations when walking around the corner of the T-shaped intersection. From Table VII and Fig. 17, the anticipative front following had a significantly worse performance than the passive front following. The reason for the poor performance of the pure anticipative front following is that most participants intuitively first walked in the direction opposite to the one in which they intended to turn, thus misleading the robot. In practical terms, the difference is that, to take the robot to the goal, in the anticipative front following, the human simply has to walk to the desired position and the robot will follow in front. In the passive front following, the human has to actively control the walking direction in order to keep the robot in front, as in an inverted pendulum problem. See Fig. 11 for a comparison.

Fig. 17. Robot and the human walked paths. Below each trajectory: robot and human orientations.

In order to further investigate this issue, some of the participants were told to take the anticipative front following experiment again after being briefed that the robot would turn in the same direction as they turn. Interestingly, at times, the participants still insisted on turning in the opposite direction. This hints that this decision may be happening at a subconscious level. This intuition may come from the fact that, when pushing an object such as a supermarket cart, it is easier to make turns by first rotating it around its center of mass. In the case of the front following, this would mean walking in the opposite direction. In contrast, a few of the test subjects intuitively turned directly in the direction they wanted the robot to go and had more difficulties with the passive front following than with the anticipative front following. It was observed that these participants attributed more human characteristics to the robot when describing its behavior than the other participants.

However, the passive front following had a clear disadvantage when walking through the straight sections of the corridor. It requires the human to be precisely in front of the robot when walking straight, causing an oscillating path when the human fails to do so, as shown in Fig. 17.

The combined strategy is a compromise between the passive and anticipative behaviors. It is anticipative at most times except when the human's intent to control the robot is detected. The trajectory observed for the combined strategy is similar to that of the passive front following, as shown in Fig. 17. However, the time required to take the robot from the start position to the end position is significantly lower and is close to the benchmark case of back following. This happened because, in the combined strategy, the robot can move faster since it estimates the future position of the human, and yet it does not go in the wrong direction since it assumes a passive behavior when it detects that the human is trying to control the robot.

VI. CONCLUSION

In this paper, we have proposed a human-following strategy that is based on predictions yielded by a human walk model. The model was based on relevant scientific studies of human walk kinematics and HRI. The following strategy was experimentally evaluated in different scenarios and with different parameters. It was observed that the prediction cannot be based on the human walk kinematics alone. In the presence of the robot, the human clearly alters his or her behavior, which indicates the necessity of a joint walk model. It was possible to improve the performance of the following algorithm by using the interaction model that can switch between the passive and anticipative models of the human behavior. This was especially the case for the front following, in which a wrong prediction may cause the robot to go in the incorrect direction.

As the paths observed in the front following experiment show, people naturally use their own position to try to tell the robot where it should go. This human behavior can be analyzed in the light of the social forces mentioned in [15] and [42] and the spontaneous reciprocal altruism [43] (also referred to as mutual utility maximization in [16]). In other words, this could mean that the human naturally tries to help the robot to go in the correct direction. This kind of behavior is expected from humans interacting in a social environment. However, assessing whether this kind of behavior equally applies to humans interacting with robots would require further studies.

REFERENCES

[1] K. Kosuge and Y. Hirata, “Human–robot interaction,” in Proc. IEEE Int. Conf. ROBIO, 2004, pp. 8–11.

[2] S. Y. Lee, K. Y. Lee, S. H. Lee, J. W. Kim, and C. S. Han, “Human–robot cooperation control for installing heavy construction materials,” Autonom. Robots, vol. 22, no. 3, pp. 305–319, Apr. 2007.

[3] L. Iocchi, J. Ruiz-del-Solar, and T. van der Zant, “Domestic service robots in the real world,” J. Intell. Robot. Syst., vol. 66, no. 1/2, pp. 183–186, Apr. 2012.

[4] Z. Chen and S. T. Birchfield, “Person following with a mobile robot using binocular feature-based tracking,” in Proc. IEEE Int. Conf. IROS, 2007, pp. 815–820.

[5] H. Kwon, Y. Yoon, J. B. Park, and A. C. Kak, “Person tracking with a mobile robot using two uncalibrated independently moving cameras,” in Proc. IEEE ICRA, 2005, pp. 2877–2883.

[6] M. Kobilarov, G. Sukhatme, J. Hyams, and P. Batavia, “People tracking and following with mobile robot using an omnidirectional camera and a laser,” in Proc. IEEE Int. Conf. Robot. Autom., 2006, pp. 557–562.

[7] R. C. Luo, N. W. Chang, S. C. Lin, and S. C. Wu, “Human tracking and following using sensor fusion approach for mobile assistive companion robot,” in Proc. 35th Annu. Conf. IEEE IECON, 2009, pp. 2235–2240.


[8] R. C. Luo, C. H. Huang, and T. T. Lin, “Human tracking and following using sound source localization for multisensor based mobile assistive companion robot,” in Proc. 36th Annu. Conf. IEEE IECON, 2010, pp. 1552–1557.

[9] E. A. Topp and H. I. Christensen, “Tracking for following and passing persons,” in Proc. IEEE/RSJ Int. Conf. IROS, Edmonton, AB, Canada, Mar. 2005, pp. 2321–2327.

[10] H. Kim, W. Chung, and Y. Yoo, “Detection and tracking of human legs for a mobile service robot,” in Proc. IEEE/ASME Int. Conf. AIM, 2010, pp. 812–817.

[11] W. Chung, H. Kim, Y. Yoo, C. Moon, and J. Park, “The detection and following of human legs through inductive approaches for a mobile robot with a single laser range finder,” IEEE Trans. Ind. Electron., vol. 59, no. 8, pp. 3156–3166, Aug. 2012.

[12] H. Takemura, N. Zentaro, and H. Mizoguchi, “Development of vision based person following module for mobile robots in/out door environment,” in Proc. IEEE Int. Conf. ROBIO, 2009, pp. 1675–1680.

[13] V. Alvarez-Santos, X. Pardo, R. Iglesias, A. Canedo-Rodriguez, and C. Regueiro, “Feature analysis for human recognition and discrimination: Application to a person following behaviour in a mobile robot,” Robot. Autonom. Syst., vol. 60, no. 8, pp. 1021–1036, Aug. 2012.

[14] F. Hoshino and K. Morioka, “Human following robot based on control of particle distribution with integrated range sensors,” in Proc. IEEE/SICE Int. Symp. Syst. Integr., 2011, pp. 212–217.

[15] A. Garrell Zulueta and A. Sanfeliu Cortés, “Model validation: Robot behavior in people guidance mission using DTM model and estimation of human motion behavior,” in Proc. IEEE/RSJ Int. Conf. IROS, 2010, pp. 5836–5841.

[16] L. Y. Morales Saiki, S. Satake, R. Huq, D. Glass, T. Kanda, and N. Hagita, “How do people walk side-by-side? Using a computational model of human behavior for a social robot,” in Proc. 7th ACM/IEEE Int. Conf. HRI, 2012, pp. 301–308.

[17] K. Morioka, J. H. Lee, and H. Hashimoto, “Human-following mobile robot in a distributed intelligent sensor network,” IEEE Trans. Ind. Electron., vol. 51, no. 1, pp. 229–237, Feb. 2004.

[18] J. Pineau, M. Montemerlo, M. Pollack, N. Roy, and S. Thrun, “Towards robotic assistants in nursing homes: Challenges and results,” Robot. Autonom. Syst., vol. 42, no. 3/4, pp. 271–281, Mar. 31, 2003.

[19] H. M. Gross, H. J. Böhme, C. Schröter, S. Müller, A. König, C. Martin, M. Merten, and A. Bley, “ShopBot: Progress in developing an interactive mobile shopping assistant for everyday use,” in Proc. IEEE Int. Conf. SMC, 2008, pp. 3471–3478.

[20] H. Kaindl, B. Putz, D. Ertl, H. Hüttenrauch, and C. Bogdan, “A walking aid integrated in a semi-autonomous robot shopping cart,” in Proc. 4th Int. Conf. Adv. Comput.-Human Interact., 2011, pp. 218–221.

[21] K. Kobayashi and S. Yamada, “Extending commands embedded in actions for human–robot cooperative tasks,” Int. J. Social Robot., vol. 2, no. 2, pp. 159–173, Jun. 2010.

[22] A. Steinfeld, T. Fong, D. Kaber, M. Lewis, J. Scholtz, A. Schultz, and M. Goodrich, “Common metrics for human–robot interaction,” in Proc. 1st ACM SIGCHI/SIGART Conf. Human-Robot Interact., 2006, pp. 33–40.

[23] R. Fierro and F. Lewis, “Control of nonholonomic mobile robot: Backstepping kinematics into dynamics,” J. Robot. Syst., vol. 14, no. 3, pp. 149–163, 1997.

[24] J. S. Hu, C. W. Juan, and J. J. Wang, “A spatial-color mean-shift object tracking algorithm with scale and orientation estimation,” Pattern Recognit. Lett., vol. 29, no. 16, pp. 2165–2173, Dec. 2008.

[25] G. A. Borges and M. J. Aldon, “Line extraction in 2D range images for mobile robotics,” J. Intell. Robot. Syst., vol. 40, no. 3, pp. 267–297, Jul. 2004.

[26] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–577, May 2003.

[27] G. Scott, Analysis of Human Motion: A Textbook in Kinesiology. New York, NY, USA: Crofts, 1946.

[28] G. Courtine and M. Schieppati, “Human walking along a curved path. I. Body trajectory, segment orientation and the effect of vision,” Eur. J. Neurosci., vol. 18, no. 1, pp. 177–190, Jul. 2003.

[29] G. Arechavaleta, J. P. Laumond, H. Hicheur, and A. Berthoz, “An optimality principle governing human walking,” IEEE Trans. Robot., vol. 24, no. 1, pp. 5–14, Feb. 2008.

[30] K. Hase and R. Stein, “Turning strategies during human walking,” J. Neurophysiol., vol. 81, no. 6, pp. 2914–2922, Jun. 1999.

[31] G. Arechavaleta, J. P. Laumond, H. Hicheur, and A. Berthoz, “On the nonholonomic nature of human locomotion,” Autonom. Robots, vol. 25, no. 1/2, pp. 25–35, Aug. 2008.

[32] M. Costa, “Interpersonal distances in group walking,” J. Nonverbal Behav., vol. 34, no. 1, pp. 15–26, Mar. 2010.

[33] T. Sato, Y. Nishida, J. Ichikawa, Y. Hatamura, and H. Mizoguchi, “Active understanding of human intention by a robot through monitoring of human behavior,” in Proc. IEEE/RSJ/GI Int. Conf. IROS, 1994, vol. 1, pp. 405–414.

[34] M. A. Goodrich and A. C. Schultz, “Human–robot interaction: A survey,” Found. Trends Human-Comput. Interact., vol. 1, no. 3, pp. 203–275, Feb. 2007.

[35] K. N. Ochsner, J. S. Beer, E. R. Robertson, J. C. Cooper, J. D. E. Gabrieli, J. F. Kihlstrom, and M. D’Esposito, “The neural correlates of direct and reflected self-knowledge,” Neuroimage, vol. 28, no. 4, pp. 797–814, Dec. 2005.

[36] H. M. Wallace, “The reflected self: Creating yourself as (you think) others see you,” in Handbook of Self and Identity. New York, NY, USA: Guilford, 2003, p. 91.

[37] H. Markus, J. Smith, and R. L. Moreland, “Role of the self-concept in the perception of others,” J. Personal. Social Psychol., vol. 49, no. 6, pp. 1494–1512, Dec. 1985.

[38] J. Decety and P. L. Jackson, “The functional architecture of human empathy,” Behav. Cognit. Neurosci. Rev., vol. 3, no. 2, pp. 71–100, Jun. 2004.

[39] B. d’Andrea-Novel, G. Campion, and G. Bastin, “Control of nonholonomic wheeled mobile robots by state feedback linearization,” Int. J. Robot. Res., vol. 14, no. 6, pp. 543–559, Dec. 1995.

[40] A. De Luca, G. Oriolo, and C. Samson, “Feedback control of a nonholonomic car-like robot,” in Robot Motion Planning and Control. New York, NY, USA: Springer-Verlag, 1998, pp. 171–253.

[41] M. Weir, A. Buck, and J. Lewis, “POTBUG: A mind’s eye approach to providing bug-like guarantees for adaptive obstacle navigation using dynamic potential fields,” in Proc. From Animals Animats, 2006, vol. 9, pp. 239–250.

[42] B. R. Fajen, W. H. Warren, S. Temizer, and L. P. Kaelbling, “A dynamical model of visually-guided steering, obstacle avoidance, and route selection,” Int. J. Comput. Vis., vol. 54, no. 1–3, pp. 13–34, Aug. 2003.

[43] E. Fehr and U. Fischbacher, “The nature of human altruism,” Nature, vol. 425, no. 6960, pp. 785–791, Oct. 2003.

[44] G. Cook, Mobile Robots: Navigation, Control and Remote Sensing, 1st ed. Hoboken, NJ, USA: Wiley-IEEE Press, Jun. 28, 2011.

[45] J. Y. Jung, B. K. Dan, K. H. An, S. W. Jung, and S. J. Ko, “Real-time human tracking using fusion sensor for home security robot,” in Proc. IEEE ICCE, 2012, pp. 420–421.

[46] G. Welch and G. Bishop, “SCAAT: Incremental tracking with incomplete information,” in Proc. 24th Annu. Conf. Comput. Graph. Interact. Tech., 1997, pp. 333–344.

Jwu-Sheng Hu (M’94) received the B.S. degree from the Department of Mechanical Engineering, National Taiwan University, Taipei, Taiwan, in 1984 and the M.S. and Ph.D. degrees from the Department of Mechanical Engineering, University of California, Berkeley, CA, USA, in 1988 and 1990, respectively. From 1991 to 1993, he was an Assistant Professor with the Department of Mechanical Engineering, Wayne State University, Detroit, MI, USA. Since 1993, he has been with the Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan, where he became a Full Professor in 1998. Since 2008, he has been working part-time at the Industrial Technology Research Institute of Taiwan, where he serves as the advisor for the intelligent robotics program and the principal investigator of large-scale robotics research projects funded by the Ministry of Economic Affairs. He also serves as an advisor at the National Chip Implementation Center of Taiwan for embedded system design applications. His current research interests include robotics, mechatronics, and embedded systems.


Jyun-Ji Wang received the B.S. degree from the Department of Electrical and Control Engineering, National Chiao Tung University, Taipei, Taiwan, in 2006, where he is currently working toward the Ph.D. degree in the Department of Electrical Engineering.

His current research interests include robotics and machine vision.

Daniel Minare Ho received the B.S. degree from the Department of Mechatronics Engineering, University of Brasilia, Brasilia, Brazil, in 2010 and the M.S. degree from the Electrical Engineering and Computer Science Department, National Chiao Tung University, Taipei, Taiwan.

He currently works in the IT industry. His research interests include robotics and machine vision.