Chapter 1 Introduction
1.5 Thesis Organization
The remainder of this thesis is organized as follows. In Chapter 2, we describe the system configuration of the vehicle and the principles of the proposed learning, human following, and guidance techniques. In Chapter 3, the proposed method for improving the practicability of camera calibration, the method for calibrating the odometer, and the method of using reference data for distance computation are described. In Chapter 4, a method for improving the practicability of human facing direction detection is described. The proposed path planning techniques are described in Chapter 5. The method for learning the guidance path is described in Chapter 6. Experimental results are shown in Chapter 7. Finally, conclusions and suggestions for future work are given in Chapter 8.
Figure 1.1 Flowcharts of proposed systems. (b) Tour guiding in narrow indoor environments technique (learning phase and guiding phase).
Chapter 2
System Configuration and Guidance Principles
2.1 Introduction
There have been many studies on using machines to reduce the human burden of patrolling. Autonomous land vehicles equipped with cameras are commonly used for this kind of work because cameras can "see" the objects that need to be monitored in place of human eyes. Usually, the path of a patrolling vehicle must be keyed into a computer manually, and assigning a path to a vehicle in this way is not easy for a user who is not skilled at machine handling. This inconvenience can be resolved with an autonomous vehicle equipped with a camera that can follow people and remember the path it has traveled. With such an intelligent vehicle, we can lead it along the path once; the vehicle then remembers the path and can travel back to the start point by itself. Additionally, the patrol path can easily be changed to suit various application needs.
On the other hand, when we visit an unfamiliar place such as an office in a building, we often need a guide to take us from outside the elevator to the office, and it is desirable for the guide to introduce the environment as well. A robot can do this job instead of a human being. However, when the hallway is narrow, the robot may hit the walls. This problem can be solved with an autonomous land vehicle equipped with ultrasonic sensors, which keep the vehicle navigating in the middle of the path using the ultrasonic signals they provide.
The hardware and software used in this study are described in Section 2.2. The person-following and patrolling system reaches its goal in two major stages. The first is the learning process, which makes the vehicle more intelligent. In Section 2.3, we introduce the principle of the proposed learning technique and describe how the vehicle follows a person and learns path information.
The second stage is the path planning process, which refines the learned path. In Section 2.4, we describe the major steps of refining the path obtained from the learning process. For the guiding vehicle system, we introduce the principles of the learning and guiding strategies in Section 2.5.
2.2 System Configuration
In our system, we use the Pioneer 3, a vehicle made by ActiveMedia Robotics Technologies Inc., as a test bed. The vehicle is equipped with a pan-tilt-zoom (PTZ) camera whose parameters, such as the image resolution and the image format, can be adjusted through a graphical user interface.
Because wireless signals may sometimes suffer interference from unknown sources, we control the vehicle through a network cable connecting our computer to the vehicle. In this way, we obtain stable communication among the vehicle, the program, and the PTZ camera. A diagram illustrating this configuration is shown in Figure 2.1.
Figure 2.1 Equipment connections in this study (laptop and network cable).
2.2.1 Hardware configuration
The hardware used in our system consists of three parts. The first part is a laptop that runs our program. A kernel program executed on the laptop controls the vehicle by issuing commands to it and can also retrieve the status information of the vehicle.
The second part is the vehicle, which has an aluminum body measuring 44 cm × 38 cm × 22 cm and three wheels, each 16.5 cm in diameter. The vehicle carries three 12 V batteries, each of which can power the vehicle for 18 to 24 hours on one charge. The vehicle can reach a forward speed of 160 cm per second and a rotation speed of 300 degrees per second. Its embedded control system moves the vehicle forward or backward and turns it according to the user's commands. The appearance of the vehicle is shown in Figure 2.2.
The third part is a digital IP camera with panning, tilting, and zooming (PTZ) capabilities. The PTZ IP camera used in this study is an AXIS 213 PTZ made by AXIS, as shown in Figure 2.3. The camera is 130 mm high, 104 mm wide, and 130 mm deep, and weighs 700 g. Its pan range is 340 degrees and its tilt range is 100 degrees. It has 26x optical zoom and 12x digital zoom capabilities. In our experiments, images are captured at a resolution of 320×240 pixels to raise image processing efficiency. The camera is connected directly to the laptop by a network cable for transmitting the captured images.
Figure 2.2 The vehicle Pioneer 3 used in this study. (a) A front view of the vehicle. (b) A side view of the vehicle.
2.2.2 Software configuration
ActiveMedia Robotics provides ARIA, an application programming interface for controlling the vehicle used in this study. ARIA is an object-oriented C++ interface, usable under Linux or Win32, that can dynamically control the velocity, heading, and other navigation settings of the vehicle. We use ARIA to communicate with the embedded system of the vehicle, and we use Borland C++ Builder as the development tool in our experiments.
Figure 2.3 The pan-tilt-zoom camera used in this study. (a) A perspective view of the camera. (b) A front view of the camera. (c) A left-side view of the camera.
2.3 Learning Strategy and Major Steps in Proposed Process
The proposed learning strategy is based on human following [2]. When the vehicle follows a person, it uses features of the clothes the person wears.
Two situations must be handled: in the first, the vehicle can find the person by the clothing features; in the second, it cannot.
In the first situation, the vehicle keeps following the person and tries not to lose sight of the person. In the second situation, the vehicle can no longer follow the person. Because the camera on the vehicle always pans to follow the person, the vehicle always knows on which side of it (left or right) the person is. If the person disappears from in front of the vehicle, the vehicle can determine the side on which the person was last seen. The vehicle then turns by an angle to correct its direction and searches for the person in the acquired images. If the vehicle has turned a preset number of times and still cannot find the person, the system stops the learning process.
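The recovery behavior described above can be sketched as a bounded search loop. This is a minimal C++ sketch, not the thesis implementation: the names `sideFromPan` and `searchForPerson`, the callbacks `turn` and `personVisible`, the fixed turning step, and the sign conventions (a positive pan angle means the person was last seen on the right; a positive turn is counterclockwise) are all assumptions for illustration.

```cpp
#include <cassert>
#include <functional>

// Which side of the vehicle the person was last seen on.
enum class Side { Left, Right };

// Assumption: a positive camera pan angle (radians) points to the right.
Side sideFromPan(double panAngleRad) {
    return panAngleRad >= 0.0 ? Side::Right : Side::Left;
}

// Hypothetical recovery loop. Turns toward the side where the person was last
// seen, one fixed step at a time, and checks the newly acquired image after
// each turn. Returns the number of turns needed to re-find the person, or -1
// if the person is still missing after maxTurns (learning should then stop).
// Assumption: positive turn angles are counterclockwise (to the left).
int searchForPerson(Side lastSeen, double stepDeg, int maxTurns,
                    const std::function<void(double)>& turn,
                    const std::function<bool()>& personVisible) {
    assert(stepDeg > 0.0 && maxTurns >= 0);
    const double signedStep = (lastSeen == Side::Right) ? -stepDeg : stepDeg;
    for (int i = 1; i <= maxTurns; ++i) {
        turn(signedStep);         // rotate the vehicle in place by one step
        if (personVisible()) {    // clothing features found again
            return i;
        }
    }
    return -1;
}
```

Bounding the number of turns is what lets the system end the learning stage cleanly instead of rotating indefinitely when the person has really left.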
Our system learns two major kinds of data in the learning stage. The first is path data, which is used in path planning. The second is information about the objects that the person wants the vehicle to monitor. The person only needs to stand in front of an object and look at it; the vehicle then moves forward to learn the information of the object. These data are used in the patrolling process. The major steps are shown in Figure 2.4.
2.4 Path Planning and Patrolling Principle and Major Steps in Proposed Process
After learning the path and object data in the learning stage, the vehicle refines the path and straightens its rough parts by analyzing these data with a binary-cut line fitting technique. The path planning process has three major steps. The first step is to retain the nodes where the monitored objects are located, which divides the path into segments. These path segments are then pushed into an initially empty queue.
Figure 2.4 An illustration of the automatic learning process by person following.
The second step is to pop one segment from the queue and check whether it is smooth enough. If it is not, the third step splits the segment into two smaller parts and pushes them into the queue. The second step is then repeated until the queue is empty. The segments that are smooth enough are saved as the path data for use in the patrolling process. The major steps are illustrated in Figure 2.5.
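The three path-planning steps map naturally onto a FIFO queue. The text does not define "smooth enough" or where a segment is cut, so this C++ sketch assumes (1) a segment is smooth when every intermediate point lies within a tolerance `tol` of the straight chord joining its end points, and (2) a non-smooth segment is cut at its middle point index. `Point`, `Segment`, `refinePath`, and `tol` are hypothetical names, not the thesis code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

struct Point { double x, y; };
using Segment = std::pair<std::size_t, std::size_t>;  // [first, last] point indices

// Perpendicular distance from p to the line through a and b.
double distanceToChord(const Point& p, const Point& a, const Point& b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Assumed smoothness test: all intermediate points stay within tol of the chord.
bool smoothEnough(const std::vector<Point>& pts, std::size_t first,
                  std::size_t last, double tol) {
    for (std::size_t i = first + 1; i < last; ++i)
        if (distanceToChord(pts[i], pts[first], pts[last]) > tol) return false;
    return true;
}

std::vector<Segment> refinePath(const std::vector<Point>& pts,
                                std::vector<std::size_t> objectNodes, double tol) {
    std::vector<Segment> result;
    if (pts.size() < 2) return result;
    for (std::size_t n : objectNodes) assert(n < pts.size());
    // Step 1: keep the start, the end, and every monitored-object node as
    // breakpoints, and push the resulting segments into an empty queue.
    objectNodes.push_back(0);
    objectNodes.push_back(pts.size() - 1);
    std::sort(objectNodes.begin(), objectNodes.end());
    objectNodes.erase(std::unique(objectNodes.begin(), objectNodes.end()),
                      objectNodes.end());
    std::queue<Segment> q;
    for (std::size_t i = 0; i + 1 < objectNodes.size(); ++i)
        q.push({objectNodes[i], objectNodes[i + 1]});
    // Steps 2 and 3: pop a segment; keep it if smooth, otherwise cut it in two.
    while (!q.empty()) {
        const Segment s = q.front();
        q.pop();
        if (smoothEnough(pts, s.first, s.second, tol)) {
            result.push_back(s);
        } else {
            // A non-smooth segment has at least one interior point, so mid is
            // strictly inside and both halves are shorter: the loop terminates.
            const std::size_t mid = s.first + (s.second - s.first) / 2;
            q.push({s.first, mid});
            q.push({mid, s.second});
        }
    }
    std::sort(result.begin(), result.end());  // order along the learned path
    return result;
}
```

Cutting at the point of maximum deviation instead of the middle index would turn this into the Ramer-Douglas-Peucker simplification; which rule the thesis uses is not stated here.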
After path planning, the next process is navigating back to the start point. This navigation process includes three major steps. The first step is to read and sort the path data obtained from path planning. The second step is to move forward according to the path data and check whether there is a monitored object at the current position. If there is, the third step matches the monitored object. The second and third steps are repeated until the vehicle arrives at the start point. The major steps are illustrated in Figure 2.6.
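One way to organize these three steps in code is sketched below in C++. It assumes the refined path is stored as node-index segments ordered from the learned start to the learned end, so "sorting" for the return trip means ordering them from the end back to the start; the actual driving in step 2 is abstracted away. `Visit`, `planReturnTrip`, and `objectNodes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Segment = std::pair<std::size_t, std::size_t>;  // [first, last] node indices

// One stop on the return trip: the node reached and whether a monitored
// object must be matched there (step 3).
struct Visit {
    std::size_t node;
    bool matchObject;
};

std::vector<Visit> planReturnTrip(std::vector<Segment> segments,
                                  const std::set<std::size_t>& objectNodes) {
    std::vector<Visit> visits;
    if (segments.empty()) return visits;
    // Step 1: sort so the last learned segment is traveled first.
    std::sort(segments.begin(), segments.end(),
              [](const Segment& a, const Segment& b) { return a.first > b.first; });
    // The trip starts at the end of the learned path; check it as well.
    const std::size_t startNode = segments.front().second;
    visits.push_back({startNode, objectNodes.count(startNode) > 0});
    // Step 2: drive each segment backwards, ending at its first node, and
    // flag the nodes where an object was learned.
    for (const Segment& s : segments) {
        assert(s.first < s.second);
        visits.push_back({s.first, objectNodes.count(s.first) > 0});
    }
    return visits;
}
```

The last entry of the returned list is always node 0, the learned start point, which is where the patrolling loop stops.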
2.5 Vehicle Navigation Principle and Major Steps in Proposed Process
Our vehicle navigation system has two major processes. The first is the learning process, in which the vehicle learns the information of a navigation path. We divide the navigation path into several segments, and the dividing points are the positions where the vehicle needs to turn. The vehicle learns two major kinds of information for every segment of the navigation path: the environment of the path along which the vehicle navigates, and the turning node that the vehicle passes. The major steps are illustrated in Figure 2.7.
Figure 2.5 An illustration of the path planning process.
The second part of our system is the navigation process. After reading the data of the navigation path, which the turning nodes obtained in the learning process divide into several segments, the vehicle can navigate along the path, guiding a visitor if necessary, by performing two tasks on every segment: navigating in the middle of the hallway and finding the position of the turning node. These major steps are illustrated in Figure 2.8.
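Navigation in the middle of the hallway can be illustrated with a simple proportional rule on the left and right ultrasonic ranges. This is a hedged C++ sketch, not the controller used in this study: the gain `kp`, the rate limit `maxRotDegPerSec`, the function name `centeringRotVel`, and the sign convention (a positive rotational velocity turns the vehicle counterclockwise, that is, to the left) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical centering rule. leftCm and rightCm are the distances to the
// left and right walls reported by side-facing ultrasonic sensors.
// If the vehicle is closer to the left wall (leftCm < rightCm), the offset is
// negative and the command turns it right (clockwise), back toward the middle.
double centeringRotVel(double leftCm, double rightCm,
                       double kp = 0.5, double maxRotDegPerSec = 20.0) {
    assert(leftCm >= 0.0 && rightCm >= 0.0 && kp >= 0.0);
    const double offset = (leftCm - rightCm) / 2.0;  // signed offset from the center line (cm)
    const double cmd = kp * offset;                  // deg/s, positive = turn left
    return std::min(std::max(cmd, -maxRotDegPerSec), maxRotDegPerSec);
}
```

Clamping the command keeps a sudden range jump, such as an open doorway on one side, from producing a violent turn.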
Figure 2.6 An illustration of the patrolling process.
Figure 2.7 An illustration of the learning process of the vehicle navigation system.
Figure 2.8 An illustration of the navigation process.
Chapter 3
Camera and Odometer Calibration
3.1 Introduction
The camera and the odometer are two important devices on the vehicle used in this study. In this chapter, we describe the proposed calibration methods for these two devices.
Distance information is important in the person following process. The vehicle in our system computes distance by analyzing images captured by the camera. However, the inverse mapping from 2D image coordinates to 3D world positions is ambiguous. Wang and Tsai [1] proposed an angular-mapping method for camera calibration to compute the distance between a vehicle and a person. Chen and Tsai [9] proposed an area-tracking method to handle the situation in which the vehicle cannot see all of the person's clothes. In Section 3.2, we review these methods and propose another method that improves the practicability of camera calibration.
For vehicle navigation in indoor environments, the vehicle location is the most important information for keeping the vehicle on the correct path. Although the odometer of the vehicle is precise enough for inferring the vehicle location in most applications, it cannot be relied on alone during navigation, because mechanical errors accumulate incrementally, making the odometer data imprecise and causing deviations from the planned navigation path. Hence, to keep the navigation on track, a vision-based odometer calibration technique is desirable to eliminate these errors. In Section 3.3, we propose a technique for this purpose.
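A short calculation shows why uncorrected odometer errors matter. Assume, for illustration only, that the odometer heading is off by a constant small angle δ; after the vehicle has traveled a straight distance L, its lateral deviation from the planned path is L·sin δ. The C++ sketch below uses a hypothetical function name and an illustrative 1° error, not a measurement from this study.

```cpp
#include <cassert>
#include <cmath>

// Lateral deviation (same unit as distance) after traveling `distance` in a
// straight line while the heading estimate is off by `headingErrorDeg`.
double lateralDeviation(double distance, double headingErrorDeg) {
    assert(distance >= 0.0);
    const double pi = std::acos(-1.0);
    return distance * std::sin(headingErrorDeg * pi / 180.0);
}
```

For example, `lateralDeviation(1000.0, 1.0)` is about 17.45, so a 1° heading error already puts the vehicle roughly 17 cm off a 10 m hallway path, which is enough to matter in a narrow corridor.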
3.2 Camera Calibration by Cross Shape Detection
3.2.1 Idea of Proposed Camera Calibration Method
Distance information is indispensable for a person-following vehicle. Using it, the vehicle can keep a safe distance from the person and avoid striking him or her; when the person is far away, the vehicle can move forward to observe the person more clearly and avoid losing track of the person. Our system computes distance by analyzing the images captured by the camera. Through imaging, points in the 3D world coordinate system are mapped into the 2D image coordinate system. However, the inverse mapping from a 2D image point to its corresponding 3D world location is ambiguous, because each image point is the projection of a light ray onto the image sensor, and every 3D point along that ray projects to the same image point. The light ray can be described by a longitude angle and a latitude angle in the 3D world space. To determine the longitude and latitude angles (called simply longitude and latitude in the sequel) corresponding to each image point, Wang and Tsai [1] proposed a 2D mapping method for angular-mapping camera calibration.
However, their method involves repetitive steps that easily introduce errors. We propose a method that simplifies the steps of camera calibration and is suitable for ordinary users. In Section 3.2.3, we review the angular-mapping camera calibration method. In Section 3.2.4, we describe the proposed method.
Before describing these methods, we first introduce, in Section 3.2.2, the coordinate systems and the directional angles of the camera used in this study.
3.2.2 Coordinate Systems and Directional Angles of Camera
Four coordinate systems are used in this study to describe the relative locations between the vehicle and encountered objects. They are shown in Figure 3.1 and defined as follows.
(1) Image coordinate system (ICS): denoted as (u, v). The uv-plane of the system is coincident with the image plane and the origin I of the ICS is placed at the center of the image plane.
(2) Global coordinate system (GCS): denoted as (x, y). The x-axis and the y-axis lie on the ground, and the origin G of the GCS is a predefined point on the ground. In this study, we define G as the starting position of the person-following process.
(3) Vehicle coordinate system (VCS): denoted as (Vx, Vy). The VxVy-plane coincides with the ground, and the origin V is placed at the midpoint of the line segment connecting the contact points of the two driving wheels with the ground. The Vx-axis is parallel to the line segment joining the two driving wheels and passes through V. The Vy-axis is perpendicular to the Vx-axis and also passes through V.
(4) Spherical coordinate system (SCS): denoted as (ρ, θ, φ). This system was proposed by Wang and Tsai [1]. It is a 3D polar (spherical) coordinate system, which we explain, for convenience, in terms of a 3D Cartesian coordinate system with coordinates (i, j, k). The origin S of the spherical system, which is also the origin of the Cartesian system, is the optical center of the camera. The ij-plane of the Cartesian system is parallel to the uv-plane of the ICS. A point P at coordinates (i, j, k) in the Cartesian space is represented by a 3-tuple (ρ, θ, φ) in the spherical space. The value ρ, with ρ ≥ 0, is the distance between the point P and the origin S. The longitude θ is the angle between the positive k-axis and the projection of the line SP onto the ik-plane. The latitude φ is the angle between the ik-plane and the line SP.
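These definitions translate directly into a conversion from the camera-centered Cartesian coordinates (i, j, k) to (ρ, θ, φ). The C++ sketch below is an illustration only: it assumes θ is measured from the positive k-axis toward the positive i-axis and φ is positive toward the positive j-axis, since the text does not fix these signs. `Spherical` and `toSpherical` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Spherical {
    double rho;    // distance from the optical center S, rho >= 0
    double theta;  // longitude (rad): angle from +k to the projection of SP on the ik-plane
    double phi;    // latitude (rad): angle between the ik-plane and SP
};

Spherical toSpherical(double i, double j, double k) {
    Spherical s;
    s.rho = std::sqrt(i * i + j * j + k * k);
    s.theta = std::atan2(i, k);                       // in [-pi, pi]
    s.phi = std::atan2(j, std::sqrt(i * i + k * k));  // in [-pi/2, pi/2]
    assert(s.rho >= 0.0);
    return s;
}
```

For example, a point straight ahead on the optical axis, (0, 0, 2), gives ρ = 2 and θ = φ = 0, while (1, 0, 1) has longitude π/4.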
Two kinds of directional angles of the camera are used in this study: the pan angle and the tilt angle. The pan angle of the camera, denoted by θc, is defined in the VCS. It represents the horizontal rotation of the camera and is important for coordinate transformation.
Figure 3.1 The coordinate systems used in this study. (a) The image coordinate system. (b) The global coordinate system. (c) The vehicle coordinate system. (d) The spherical coordinate system.
We define the direction of the y-axis as the zero direction, so θc is the angle between the camera direction and the y-axis. The pan angle θc ranges from 0 to π when the camera direction lies in the first or fourth quadrant, and from 0 to −π when it lies in the second or third quadrant, as shown in Figure 3.2. The tilt angle of the camera, denoted as φc, is defined as the angle between the optical axis of the camera and the ground; it represents the vertical tilt of the camera. We define φc to be zero when the optical axis is parallel to the ground. The tilt angle φc ranges from 0 to π/2 when the camera tilts up, and from 0 to −π/2 otherwise, as shown in Figure 3.3.
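When pan angles are accumulated, for example over repeated camera panning, they can leave the range defined above. The small C++ helpers below, with hypothetical names `wrapPan` and `clampTilt`, are a sketch of one way to keep angle values consistent with these conventions: the pan angle is wrapped into (−π, π] and the tilt angle is clamped to [−π/2, π/2].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

const double kPi = std::acos(-1.0);

// Wrap a pan angle into (-pi, pi]: [0, pi] covers the first and fourth
// quadrants and (-pi, 0) the second and third, measured from the y-axis.
double wrapPan(double thetaC) {
    double a = std::fmod(thetaC, 2.0 * kPi);  // now in (-2*pi, 2*pi)
    if (a > kPi) {
        a -= 2.0 * kPi;
    } else if (a <= -kPi) {
        a += 2.0 * kPi;
    }
    return a;
}

// Clamp a tilt angle into [-pi/2, pi/2]; zero means the optical axis is level.
double clampTilt(double phiC) {
    assert(!std::isnan(phiC));
    return std::min(std::max(phiC, -kPi / 2.0), kPi / 2.0);
}
```

For instance, a pan angle of 3π/2 wraps to −π/2, meaning the camera faces the same direction but the value now lies in the second or third quadrant range.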
3.2.3 Review of Adopted Camera Calibration Method
Wang and Tsai [1] proposed a nonlinear angular mapping method to precisely obtain the angular transformation from the real world to the image. Using the angular information of the light rays and the height of the camera, we can compute the relative distances of targets in images.
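As a worked illustration of how the camera height and ray angles yield distance, consider a point on a flat floor whose light ray makes a depression angle α below the horizontal, with the optical center of the camera at height h above the floor; its horizontal distance is d = h / tan α. The C++ sketch below approximates α as −(φc + φ), where φ is the latitude of the image point relative to the optical axis. This additive combination is an assumption that holds exactly only for points on the vertical center line of the image, and `groundDistance` is a hypothetical name, not the mapping of [1].

```cpp
#include <cassert>
#include <cmath>

// Horizontal distance to a floor point seen along a ray.
//   h     : height of the camera optical center above the floor (cm)
//   tiltC : camera tilt angle phi_c (rad), negative when tilted down
//   phi   : latitude of the image point relative to the optical axis (rad)
// Returns -1 when the ray does not point below the horizon, because such a
// ray never meets the floor.
double groundDistance(double h, double tiltC, double phi) {
    assert(h > 0.0);
    const double depression = -(tiltC + phi);  // angle below the horizontal
    if (depression <= 0.0) return -1.0;
    return h / std::tan(depression);
}
```

For example, with the camera 100 cm above the floor and tilted 30° down, a point at the image center lies about 173 cm away (100 / tan 30°).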
Figure 3.2 The pan angle of the camera. (a) 0 ≤ θc ≤ π. (b) 0 ≥ θc ≥ −π.
The coordinate system used in this method is shown in Figure 3.4, which