Chapter 1
1.4 Thesis Organization
This thesis is organized as follows. In Chapter 2, we introduce the hardware design of our omnidirectional robot, including its holonomic kinematics, along with background on the human tracking system and social norms. In Chapter 3, the proposed socially-aware navigation using the extended social force model (ESFM) is presented. With the grouping human model and the ESFM, several robot behaviors are designed. At the end of Chapter 3, a multi-policy decision making procedure is proposed. In Chapter 4, experiments demonstrate the interaction detection and the resulting robot behaviors. We implement the system on an omnidirectional mobile robot and evaluate its effects through real interaction with humans. Finally, Chapter 5 gives the summary, conclusions, and future work.
Chapter 2
Preliminaries
2.1 Omnidirectional Mobile Robot
In this section we introduce the design and several important kinematic properties of the omnidirectional mobile robot used in this work. To imitate human movement, which is in fact a hybrid of non-holonomic and holonomic movement, a three-wheeled mobile robot was designed and used. With omnidirectional movement, natural and human-like navigation becomes possible.
2.1.1 Hardware Design
The architecture of our omnidirectional mobile robot is described below. The hardware design is shown in Figure 2.1.
Figure 2.1. Hardware Design of Omnidirectional Mobile Robot
Figure 2.2. Example of an omnidirectional wheel.
The mobile robot has a hexagonal shape and carries three wheels, one laser range finder, and one camera. Each wheel has several small sliding wheels, as shown in Figure 2.2, which give the robot the ability to move laterally. The wheels divide the plane equally, meaning that the angle between the axes of two adjacent wheels is 120°.
Figure 2.3. The Hardware Structure of A Three Omnidirectional Wheels Mobile Robot.
For a mobile robot moving on the ground, the maximum number of degrees of freedom is three: translation along the x-axis, translation along the y-axis, and rotation about the z-axis. Therefore, the pose of a mobile platform can be written as $[x\ y\ \theta]^T$. Correspondingly, the control input of the platform is $[v_T\ v_L\ \omega]^T$, whose components are the tangential velocity, lateral velocity, and angular velocity, respectively.
Figure 2.4. Kinematic Model of Holonomic Mobile Robot.
The model of this robot structure is shown in Figure 2.3 above. The wheel velocities are denoted $v_l$, $v_r$, and $v_b$ for the left, right, and back wheels, respectively. With $R$ as the radius and the angle between the directions of $v_l$ and $v_r$ as parameters, the mathematical model can be written as the equation below.
A laser range finder (LRF) and a camera, a Logitech C170, are used. The LRF, which is mounted at height, is able to detect obstacles and human legs. The scanning rate is up to 50 Hz and the operating range is from 20 cm to 20 m. This is sufficient for indoor environments, and even for a square. The field of view is 270° and the resolution is 5°. The laser data are expressed as a set $P = \{P_n = [d_n\ \phi_n]^T\}$, $n \in N_{laser}$, where $N_{laser}$ is the number of total laser data. This sequence of laser points can be used as a geometric feature for modeling the environment.
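As an illustration of how a body-frame command $[v_T\ v_L\ \omega]^T$ can be mapped to three wheel speeds, a generic three-wheel omnidirectional model can be sketched in Python. This is not the model of this thesis: the wheel mounting angles (+60°, -60°, and 180° in a body frame with x forward and y to the left), the interpretation of `radius` as the distance from the robot center to each wheel, the sign convention, and the function name `wheel_speeds` are all assumptions of the sketch.

```python
import math

# Assumed wheel mounting angles in the body frame (x forward, y to the left).
# These values are illustrative; the actual robot geometry may differ.
WHEEL_ANGLES = {
    "left": math.radians(60.0),
    "right": math.radians(-60.0),
    "back": math.radians(180.0),
}


def wheel_speeds(v_t, v_l, omega, radius):
    """Map a body-frame command (v_T, v_L, omega) to wheel rim speeds.

    radius is assumed to be the distance from the robot center to each wheel.
    Each wheel is assumed to drive along the tangent of the circle it sits on.
    """
    speeds = {}
    for name, angle in WHEEL_ANGLES.items():
        # Tangent direction at mounting angle `angle` is (-sin, cos).
        speeds[name] = (-math.sin(angle) * v_t
                        + math.cos(angle) * v_l
                        + radius * omega)
    return speeds
```

Under these assumptions, a pure rotation drives all three wheels at the same speed, while a pure translation produces wheel speeds that sum to zero because the wheels are spaced 120° apart.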
2.1.2 Omnidirectional Movement
Unlike differential-drive movement, omnidirectional movement, also called holonomic movement, is not subject to the sliding constraint (no sliding motion).
Because of this special mechanism, the robot can have a lateral velocity while rotating about the z-axis at the same time. This property gives the robot greater maneuverability in navigation.
The holonomic model can be expressed as:
With this model, several movements can be achieved. With zero angular velocity, lateral and diagonal movements are possible: without changing its heading, the robot can move in a diagonal direction. The control input can be written as:
On the other hand, with a nonzero angular velocity, the robot can achieve a powerful movement called “rotating without changing translation direction”. While rotating its heading angle, the robot can simultaneously move toward a fixed direction in the global coordinate frame. From studies of human motion, this movement is the most useful and common one when we turn and move forward at the same time. Note that the translational direction is expressed relative to the global coordinate frame. If the global translational velocity along $x$ and $y$ and the angular velocity are given, the control input can be written as:
An example of this movement is illustrated in Figure 2.5.
Figure 2.5. Rotation without Changing Translation Direction.
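The movement in Figure 2.5 can be illustrated with a short Python sketch. It assumes the standard planar rotation from the global frame into the robot frame, with the lateral velocity taken as positive to the robot's left; the function names `global_to_body` and `simulate`, the time step, and the simple Euler integration are assumptions of this sketch rather than the controller used on the robot.

```python
import math


def global_to_body(vx_global, vy_global, omega, heading):
    """Rotate a global-frame translational velocity into the robot frame.

    Returns (v_T, v_L, omega): tangential, lateral, and angular velocity.
    """
    c, s = math.cos(heading), math.sin(heading)
    v_t = c * vx_global + s * vy_global
    v_l = -s * vx_global + c * vy_global
    return v_t, v_l, omega


def simulate(vx_global, vy_global, omega, steps=100, dt=0.05):
    """Integrate the pose while recomputing the body command every step."""
    x = y = heading = 0.0
    for _ in range(steps):
        v_t, v_l, w = global_to_body(vx_global, vy_global, omega, heading)
        # Convert the body command back to global motion to update the pose.
        x += (v_t * math.cos(heading) - v_l * math.sin(heading)) * dt
        y += (v_t * math.sin(heading) + v_l * math.cos(heading)) * dt
        heading += w * dt
    return x, y, heading
```

Because the body command is recomputed from the current heading at every step, the simulated robot keeps translating along the same global direction while its heading keeps changing.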
With these holonomic movements, the robot can move in a more natural, human-like way.
In addition, the robot is better able to avoid obstacles in dense or narrow environments.
2.2 Human Tracking System
In this section we describe the human tracking system, which uses a laser range finder and a web camera. The human tracking system is composed of two parts: (1) human detection and (2) human tracking. In this work, the laser range finder is used to detect the human position, which is the input of the tracking system. On the other hand, human information
2.2.1 Human Detection and Tracking
Human detection using laser range finders has been studied in human-robot interaction and navigation for years, and several methods have been proposed in the literature. These methods are designed to detect legs or bodies, which can then be tracked by a laser-based system. Recall the laser
distinguished from the data. Some important geometry features are described in [39].
Following our previous work, a modified “inscribed angle variance” (IAV) [40] method, which studies leg patterns, is used for human detection. The flow chart of the hybrid approach to human detection is shown in Figure 2.6.
Figure 2.6. Flow Chart of Human Detection System
The hybrid approach can be decomposed into five parts: segmentation, pattern extraction, IAV filter, coordinate transformation, and motion detection. After a sequence of laser scanning data is collected by the laser range finder, the measured points are clustered into several sub-groups based on the Euclidean distance between two adjacent points. To determine whether these segments are legs, several leg patterns are matched against their geometric traits. Figure 2.7 illustrates the three patterns used in this work: the LA (legs apart) pattern, the FS (front straddle) pattern, and the SL (legs together) pattern.
However, these patterns can also be produced by false positives such as chairs or electric fans. To overcome this problem, a modified IAV method is used. Using the specific range and arc shape of a human leg, this method can separate human legs from other objects.
After that, the human position is calculated from these segments by coordinate transformation. Finally, a motion detector senses whether a human is moving. Thus, the output of the detection system is the position and motion information of all humans in the robot’s field of view.
Figure 2.7. Leg Patterns.
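The segmentation step, which clusters scan points by the Euclidean distance between adjacent points, can be sketched in Python as follows. The function name `segment_scan`, the polar input format, and the threshold value in the test are assumptions for illustration; the pattern matching and IAV filtering stages are not covered here.

```python
import math


def segment_scan(ranges, angles, dist_threshold):
    """Split an ordered laser scan into segments of nearby points.

    ranges and angles are polar readings in scan order. A new segment starts
    whenever two adjacent points are farther apart than dist_threshold.
    Returns a list of segments, each a list of (x, y) points.
    """
    points = [(r * math.cos(a), r * math.sin(a)) for r, a in zip(ranges, angles)]
    segments = []
    for point in points:
        if segments:
            last = segments[-1][-1]
            gap = math.hypot(point[0] - last[0], point[1] - last[1])
            if gap <= dist_threshold:
                segments[-1].append(point)
                continue
        segments.append([point])
    return segments
```

Each resulting segment would then be a candidate for the leg-pattern matching described above.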
Once the detection procedure has finished, a Sequential Importance Resampling (SIR) particle filter based human tracking system is applied [41]. To predict the human state, which includes position and velocity, the state is represented by a probability model $p(x_t \mid x_{t-1})$. The prediction model, based on the assumption of constant velocity, can be modeled as:
constant covariance. For the update stage, a distance model serving as the measurement model can be expressed as:
$$p(z_t \mid x_t) = \left(\sqrt{2\pi}\,\sigma\right)^{-1} \exp\left(-\frac{d_i^2}{2\sigma^2}\right) \quad (2.)$$
where $d_i$ is the distance between the $i$-th particle state and the estimated pose. In the weight update step, the posterior distribution function (PDF) of the human state is approximated by a set of weighted particles. Finally, the human state $[x_t\ y_t\ \theta_t\ v_t]$ is estimated as the weighted mean of all the particles.
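The weighting and estimation steps can be sketched in Python as follows. The sketch assumes a Gaussian form for the distance-based measurement model and restricts the state to the $(x, y)$ position, since averaging the heading angle would need a circular mean. The function names, the uniform-weight fallback, and the omission of the resampling step are simplifications of this sketch, not part of the tracking system in [41].

```python
import math


def gaussian_weight(distance, sigma):
    """Measurement likelihood of one particle given its distance d_i."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2)) / norm


def estimate_position(particles, measurement, sigma):
    """Weight particles by distance to the measurement and average them.

    particles is a list of (x, y) hypotheses; measurement is the detected (x, y).
    Returns the weighted-mean position and the normalized weights.
    """
    raw = [
        gaussian_weight(math.hypot(px - measurement[0], py - measurement[1]), sigma)
        for px, py in particles
    ]
    total = sum(raw)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(particles)] * len(particles)
    else:
        weights = [w / total for w in raw]
    x = sum(w * px for w, (px, _) in zip(weights, particles))
    y = sum(w * py for w, (_, py) in zip(weights, particles))
    return (x, y), weights
```

Particles closer to the measurement receive larger weights, so the weighted mean is pulled toward the detected position.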
Figure 2.8. Laser-Based Leg Detection Example.
2.2.2 Human Skeleton Keypoints Estimation
Although laser measurement is an efficient way to obtain accurate distance information, other human information such as heading angle or posture is hard to obtain.
According to sociological research and observation, a human may face a direction different from the moving direction. This characteristic of human motion differs greatly from the assumption made in previous works and cannot be calculated directly from the SIR tracking system. Therefore, a vision-based skeleton detection system is used in our work.
Cao et al. [42] proposed a real-time multi-person pose estimation method, named OpenPose, that can detect human skeletons from RGB images. This work is the first bottom-up representation of association scores via Part Affinity Fields (PAFs). PAFs are a set of 2D vector fields that encode the degree of association between parts of the human body in the image domain. Taking a color image of size 𝑤 × ℎ, the system produces the 2D locations of keypoints for each person in the image. The input image is first analyzed by a convolutional network, which is initialized with the first 10 layers of VGG-19 [43] and fine-tuned, to generate a set of feature maps F. These feature maps are then fed into a feedforward network that simultaneously predicts a set of 2D confidence maps 𝑆 of body part locations and a set of 2D vector fields 𝐿 of part affinities. Finally, the confidence maps and the affinity fields are parsed by greedy inference to output the 2D keypoints representing the joint positions of all people in the image. The system flow is shown in Figure 2.9 below.
Figure 2.9. System flow of multi-person pose estimation method.
This method uses a two-branch Convolutional Neural Network (CNN) to predict confidence maps 𝑆𝑡 and part affinity fields 𝐿𝑡, respectively. Each branch is an iterative prediction architecture: the predictions of each stage are concatenated with the image features and passed to the next stage, as shown in Figure 2.10. The upper branch predicts the confidence maps 𝑆𝑡 and the bottom one predicts the PAFs 𝐿𝑡.
Figure 2.10. Architecture of The Two-Branch Multi-Stage CNN.
After the humans in the image are detected, the list of 2D keypoints obtained from the CNN provides useful information about human orientation and facing direction. An example is shown in Figure 2.11 below.
Figure 2.11. Example of the keypoints of a person.
Here, the domain of keypoints can be represented as:
$$Domain_{KP} = \{p_0, p_1, p_2, \ldots, p_{17}\} \quad (2.)$$
$$p_i = [x_i^{img}\ y_i^{img}], \quad \forall p_i \in Domain_{KP} \quad (2.)$$
In detail, {𝑝0, 𝑝14, 𝑝15, 𝑝16, 𝑝17} are the points of nose, eyes and ears, respectively.
{𝑝1, 𝑝2, 𝑝5} are shoulders; {𝑝3, 𝑝4, 𝑝6, 𝑝7} are arms. We then focus on these points to calculate the human orientation. The methodology is described in Chapter 3.
2.3 Social Force Model
Helbing et al. [1] proposed the basic idea of social forces governing interactions in the late 20th century. It is based on the concept that changes in behavior can be explained in terms of social forces or potential fields. These social forces can be
objects based on the type of interaction.
An attractive force describes the moving intention of an agent. Let $n_{i \to g_i}$ be the unit vector from agent $i$ toward its goal and $k_a$ be the parameter representing the strength of the force. The attractive force can be expressed as:
$$f_i^{attr} = k_a \cdot n_{i \to g_i} \quad (2.)$$
Second, the repulsive force, which describes the interaction between an agent $i$ and another agent $j$ and prevents collisions, can be modeled as:
$$f_{i,j}^{int} = \alpha_p \exp\left(-\frac{d_{i,j}}{\beta_p}\right) \cdot n_{j \to i} \quad (2.)$$
where {𝛼𝑝 𝛽𝑝} are the SFM parameters for human, 𝑑𝑖,𝑗 is the distance between these two agents and 𝑛𝑗→𝑖 is the unit vector from agent 𝑗 to 𝑖.
Similarly, the repulsive force between an agent $i$ and an obstacle $o \in O$ in the neighborhood of $i$, with different SFM parameters $\{\alpha_o\ \beta_o\}$ and unit vector $n_{o \to i}$, can be modeled as:
$$f_{i,o}^{obs} = \alpha_o \exp\left(-\frac{d_{i,o}}{\beta_o}\right) \cdot n_{o \to i} \quad (2.)$$
Finally, the resultant force is the summation of all exerted forces. For an agent $i$, the overall social force can be expressed as:
$$f_i = f_i^{attr} + \sum_{j \neq i} f_{i,j}^{int} + \sum_{o \in O} f_{i,o}^{obs} \quad (2.)$$
An illustration of a Social Force Model is shown in Figure 2.12.
Figure 2.12. Illustration of the social force model. (a) shows an example with two agents and a wall. Green arrows denote repulsive forces and red denotes the attractive one. The overall social force $f_i$ is shown in (b).
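The three force terms and their sum can be sketched in Python as follows. Agents and obstacles are treated as 2D points, with each obstacle represented by a single point such as its closest point to the agent; that representation, the function names, and the parameter values in the test are assumptions of this sketch.

```python
import math


def unit(dx, dy):
    """Return the unit vector of (dx, dy), or (0, 0) for a zero vector."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0.0 else (0.0, 0.0)


def repulsive_force(pos_i, pos_other, alpha, beta):
    """alpha * exp(-d / beta) * n, with n pointing from the other body to i."""
    dx, dy = pos_i[0] - pos_other[0], pos_i[1] - pos_other[1]
    d = math.hypot(dx, dy)
    nx, ny = unit(dx, dy)
    magnitude = alpha * math.exp(-d / beta)
    return (magnitude * nx, magnitude * ny)


def social_force(pos_i, goal_i, others, obstacles, k_a, human_params, obstacle_params):
    """Sum the attractive force and all repulsive forces acting on agent i.

    human_params and obstacle_params are (alpha, beta) pairs.
    """
    gx, gy = unit(goal_i[0] - pos_i[0], goal_i[1] - pos_i[1])
    fx, fy = k_a * gx, k_a * gy
    for other in others:
        rx, ry = repulsive_force(pos_i, other, *human_params)
        fx, fy = fx + rx, fy + ry
    for obstacle in obstacles:
        rx, ry = repulsive_force(pos_i, obstacle, *obstacle_params)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)
```

With no neighbors the result reduces to the attractive force alone, and a neighbor standing between the agent and its goal reduces the net force toward the goal.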
2.4 Gap Analyzing
Borrowing the concept from Nearness-Diagram proposed by Minguez and Montano [6] and the previous work from our lab [5], a laser-based gap analyzing method is used.
Figure 2.13. Regions obtained from gap analyzing.
From the laser data $P = \{P_n = [d_n\ \phi_n]^T\}$, $n \in N_{laser}$, we first calculate the Euclidean distance between two adjacent points. With the threshold $Gap_{thre}$, we can cluster the laser points into several groups; some of them are humans and the rest are environmental obstacles. A list of gaps $\widehat{Gap}$ is built after this procedure.
A gap occurs when the Euclidean distance between two adjacent points exceeds the threshold $Gap_{thre}$. In terms of the range readings $d_i$, the condition is:
$$|d_i - d_{i-1}| > Gap_{thre}, \quad Gap_{thre} > 0 \quad (2.)$$
Two types of edges form the start and end of a gap. They are named $Edge_{rising}$ and $Edge_{falling}$ and are defined as:
$$Edge_{rising} = \{E(d_i, d_{i-1}) \mid d_i - d_{i-1} > Gap_{thre}\}$$
$$Edge_{falling} = \{E(d_i, d_{i-1}) \mid d_{i-1} - d_i > Gap_{thre}\}$$
An example illustrating the environment and the result of gap analysis is shown in Figure 2.13 above. A rising edge $E(P_a, P_{a-1})$ and a falling edge $E(P_{b+1}, P_b)$ form an area $\Gamma_i = [\phi_{start}, \phi_{end}]^T$ that describes the gap and free space, where $\phi_{start}$ and $\phi_{end}$ are the start and end angles of the $i$-th region, respectively. After all the possible gaps are calculated, a list of gaps $\widehat{Gap}$ is obtained. In this example, the regions are $\Gamma_1 = [\phi_a, \phi_b]^T$, $\Gamma_2 = [\phi_c, \phi_d]^T$, and $\Gamma_3 = [\phi_e, \phi_f]^T$.
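The edge definitions and the pairing of a rising edge with the following falling edge can be sketched in Python as follows. The function name `find_gap_regions` and the choice to ignore regions that are still open at the scan boundaries are assumptions of this sketch.

```python
def find_gap_regions(ranges, angles, gap_threshold):
    """Pair each rising edge with the next falling edge to form a region.

    A rising edge at index a means ranges[a] - ranges[a-1] > gap_threshold.
    A falling edge at index b+1 means ranges[b] - ranges[b+1] > gap_threshold.
    Returns a list of (phi_start, phi_end) tuples, one per region.
    """
    regions = []
    start = None
    for i in range(1, len(ranges)):
        jump = ranges[i] - ranges[i - 1]
        if jump > gap_threshold:
            start = i  # rising edge: free space opens at beam i
        elif -jump > gap_threshold and start is not None:
            # falling edge: free space closes at the previous beam
            regions.append((angles[start], angles[i - 1]))
            start = None
    return regions
```

For a scan whose ranges jump from near to far and back again, each far stretch bounded by a rising and a falling edge is reported as one region.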
2.5 Multi-Policy Decision Making
Decision making has been studied for decades and applied to many robot frameworks. Here, we apply the Multi-Policy Decision Making (MPDM) method proposed by Mehta et al. [44]. Given a set of policies 𝛏, an observation z, and the joint state x of the robot and detected humans, MPDM simulates the joint state forward for each executable policy and produces an optimal policy based on the cost function. In this section, we first describe the body of MPDM and then the cost function.
2.5.1 Multi-Policy Decision Making
In the decision making procedure, Mehta et al. [44] first define a function that simulates the joint state forward over a given time interval. Given the joint state and one policy, this function, named simulate_forward, calculates the future trajectories of the humans and the robot and the total cost during the time interval.
Algorithm MPDM(𝐱, 𝐳, 𝒕𝑯, 𝑵𝒔)
The body of MPDM is shown in the algorithm above. Given the current observation from the sensors, we first sample the joint state $N_s$ times from the observation. Each sample is simulated forward by the function simulate_forward with a policy and a given time horizon $t_H$. The function returns the cost of each run, and an expectation over the samples gives the mean cost while accounting for the uncertainties. To evaluate this expectation, a sampling technique, importance sampling, is used. The approximation of the expectation can be expressed as:
$$E_x\{C(x, \xi)\} \approx \sum_{s \in S} w_s\, C(x_s, \xi) \quad (2.)$$
where $S$ is the set of samples drawn from the distribution $P(x \mid z)$ and $w_s$ is the importance weight. These samples result in different future trajectories that take the uncertainty into account. The simulate_forward function is shown in the algorithm below.
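The weighted expectation and the selection of the lowest-cost policy can be sketched in Python as follows. Here `simulate_forward` is passed in as a hypothetical callable that returns only a scalar cost for one sampled joint state, one policy, and a horizon; its real signature and return values, as well as the names `expected_cost` and `mpdm`, are assumptions of this sketch and not the algorithm of [44].

```python
def expected_cost(samples, weights, policy, simulate_forward, horizon):
    """Weighted sum of simulated costs over the sampled joint states."""
    return sum(
        w * simulate_forward(x, policy, horizon)
        for x, w in zip(samples, weights)
    )


def mpdm(samples, weights, policies, simulate_forward, horizon):
    """Return the policy with the lowest expected cost, plus all costs."""
    costs = {
        policy: expected_cost(samples, weights, policy, simulate_forward, horizon)
        for policy in policies
    }
    best = min(costs, key=costs.get)
    return best, costs
```

Passing the simulator as an argument keeps the selection logic independent of how trajectories and costs are actually computed.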
Algorithm simulate_forward(𝐱, 𝛏, 𝒕𝑯)
As a general principle in designing the cost function, robot navigation has two main goals: to make progress toward its target position and to avoid disturbing humans or colliding with obstacles. The two components of the cost function are:
Force: The summation of the maximum repulsive force exerted on another agent can be regarded as the cost of disturbing others. The cost can be expressed as:
$$F(\hat{X}(\xi)) = \sum_{t'=t}^{t_H} \max_{j \neq r} \left\| f_{r,j}^{int}(\hat{x}(t', \xi)) \right\| \quad (2.)$$
Progress: This term encourages the robot to move toward the goal. With the unit vector $n_{r \to g_r}$ toward its goal and the future position $x_r(t_H, \xi)$ obtained by applying a policy, the cost of progress can be expressed as:
$$PG(\hat{X}(\xi)) = (x_r(t_H, \xi) - x_r) \cdot n_{r \to g_r} \quad (2.)$$
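The two cost components can be sketched in Python as follows. The input layout (a list of per-step interaction forces, and start, end, and goal positions as 2D tuples), the function names, and the choice to compute the goal direction from the start position are assumptions of this sketch. How the two terms are weighted against each other is not specified here, so the sketch keeps them separate.

```python
import math


def force_cost(repulsive_forces_per_step):
    """Sum over time steps of the largest repulsive-force magnitude.

    repulsive_forces_per_step[t] lists the (fx, fy) interaction forces
    between the robot and each other agent at simulated step t.
    """
    total = 0.0
    for forces in repulsive_forces_per_step:
        if forces:
            total += max(math.hypot(fx, fy) for fx, fy in forces)
    return total


def progress_cost(start, end, goal):
    """Displacement over the horizon projected onto the unit vector to the goal."""
    gx, gy = goal[0] - start[0], goal[1] - start[1]
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return 0.0
    return ((end[0] - start[0]) * gx + (end[1] - start[1]) * gy) / norm
```

A step with no nearby agents contributes nothing to the Force term, and motion perpendicular to the goal direction contributes nothing to the Progress term.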
Chapter 3
Socially-Aware Navigation
3.1 System Architecture
The overview of our system is shown in Figure 3.1. In a multi-human indoor environment, the robot chooses proper actions and movements based on the current situation, which is described by the states of the humans and obstacles. Each control loop starts from the sensing module, which combines a laser-based submodule with a vision-based one and collects data including human information and obstacle positions. In the second stage, we construct models of the humans and static obstacles from the collected data. After modeling the environment, the main decision making procedure is applied. This multi-policy decision making procedure chooses an optimal policy that matches the social situation the robot is currently facing. Recall that the velocities controlled by the system are the translation along the x-axis, the translation along the y-axis, and the rotation about the z-axis, namely $[v_T\ v_L\ \omega]^T$; the control module sends the proper velocity command to the actuators. Following the optimal policy, the robot navigates the environment by controlling its velocities to achieve socially-aware navigation. Each functional module is described below.
The sensing module is composed of laser-based and vision-based models. For the laser-based model, we collect the laser data as geometric features such as distance and angle. For the vision-based model, we use the multi-person 2D keypoint detection method proposed by Cao et al. [42] to calculate the human heading direction. The details are described in Chapter 2.
Figure 3.1. Overall system architecture.
In the human detection module, we use the sensing data to construct an individual human model and go one step further to construct a group model from it. Together, the individual and group models represent the social situation faced by the robot. After construction, this information forms a joint state of the robot and pedestrians, which is part of the decision input.
For other static obstacles in the environment, such as walls, sofas, and chairs, we also collect geometric data from the laser. Here, we use the idea of gap analysis [6]. This method clusters the laser data into several groups based on the Euclidean distance between two adjacent laser points. From the analysis, a sequence of gaps is identified that describes the free space in the robot's configuration. After clustering, we also perform line detection to find where walls are. The details are described in Chapter 2.
The policy set consists of several policies, also known as robot behaviors, including “Go-Forward”, “Follow Agent”, “Wall Following”, and “Stop”. The size of the executable policy list is dynamic because the number of agents who can be followed is not known in advance. Therefore, this module also takes human states as input to construct the current policy set.
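The dynamic size of the policy set can be illustrated with a short Python sketch. The representation of a policy as a (name, target) pair, the function name `build_policy_set`, and the assumption that every tracked agent passed in is a valid agent to follow are hypothetical choices made for illustration.

```python
def build_policy_set(followable_agents):
    """Assemble the executable policies for the current control loop.

    followable_agents is a list of identifiers of agents that could be followed.
    Each policy is a (name, target) pair; target is None for fixed behaviors.
    """
    policies = [("Go-Forward", None), ("Wall Following", None), ("Stop", None)]
    # One Follow Agent policy per candidate, so the set size varies per loop.
    policies.extend(("Follow Agent", agent_id) for agent_id in followable_agents)
    return policies
```

With no followable agents the set contains only the three fixed behaviors, and it grows by one entry for each agent that can be followed.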
In the Multi-Policy Decision Making module, we take the human model, the gaps, and a list of executable policies into consideration. First, this module samples from the current observation of the joint state. Second, it simulates all the agents, including humans and the robot, forward in time for some duration with each executable policy and then calculates the total cost of each policy. Based on the cost, an optimal policy is chosen by this decision making procedure for the robot to execute. The submodules of this procedure are shown in Figure 3.2.
As described in Chapter 2, the control module is composed of a robot control model and an actuator model. The robot control model follows the optimal policy $\xi_{optimal}$ to generate the velocity $[v_T\ v_L\ \omega]$. The actuator model then generates the output velocities $[v_l\ v_r\ v_b]$ for the wheels. The hardware design and system details can be found in Chapter 2 as well.
3.2 Human Model Construction
To achieve socially-aware navigation, we take human feelings and natural behavior into consideration. We first define an individual model, which represents a human with several pieces of supporting information including position, orientation, and velocity. Constructed from sensing data, this individual model expresses the human trajectory and behavior that serve as social cues. Unlike previous works, we go one step further and construct a group model that describes the dynamics and behavior of a crowd. The behavior between pedestrians in a group gives us more information about human crowds. In real environments, pedestrians walk either individually or with others to form a group.
Following this concept, we propose a human model, including individual and dynamic group models, that captures the social situation surrounding the robot.
3.2.1 Individual Model Construction
To construct an individual human model, we first define the components of this model.
The information of an individual model includes position, orientation, and velocity. The