Chapter 1 Introduction
1.5 Thesis Organization
The remainder of this thesis is organized as follows. In Chapter 2, the configuration of the proposed system and the system processes are introduced in detail. In Chapter 3, the proposed method for constructing the environment map and learning the environment is described. In Chapter 4, the proposed method for detecting vertical-line features in the outdoor environment with an omni-camera is presented. In Chapter 5, the proposed method for vehicle localization during tours in outdoor park environments by using computer vision techniques is described. In Chapter 6, the proposed method for AR-based tour guidance is presented. In Chapter 7, experimental results and discussions are included. Finally, conclusions and some suggestions for future work are given in Chapter 8.
Chapter 2
System Design and Processes
2.1 Ideas of Proposed Method
To monitor the environment surrounding the video surveillance vehicle, we use omni-cameras instead of traditional projective cameras to acquire environmental images. In this study we affix an omni-camera on top of the vehicle, as shown in Figure 2.1. The omni-cameras in the device can monitor a 360-degree view of the car's surroundings and acquire the necessary scene information outside the vehicle.
Figure 2.1 The video surveillance vehicle used in this study with an omni-camera affixed on the car roof. (a) A front view of the vehicle. (b) A side view of the vehicle.
The video surveillance vehicle has high mobility, so we can move the system anywhere. However, we have to determine the best location on top of the vehicle at which to affix the omni-imaging device in order to improve the imaging effect. As illustrated in Figures 2.2(a) and 2.2(b), if we affix the device at the front middle of the top of the vehicle, half of the omni-image acquired with the device will be occupied by the vehicle body. In contrast, if we affix it at the right-front position on the top of the vehicle, only a quarter of the omni-image is occupied by the vehicle body. Therefore, in this study we affix the omni-imaging device at the right-front of the top of the vehicle.
Figure 2.2 Positions of the omni-imaging device affixed to the video surveillance vehicle roof and the corresponding FOV. (a) The device is affixed at the rear-middle of the car roof. (b) The device is affixed at the right-front of the car roof.
Furthermore, we use the upper omni-camera in the omni-imaging device to capture features at larger heights, such as those of buildings and light poles. In more detail, each omni-camera captures a hemispherical view, so we can use this wide view to detect the vertical lines in the environment. In addition, a vertical line in the real world, when projected into the omni-image, lies on a line that passes through the center of the omni-image. Therefore, we regard vertical lines in the real world as good features and detect them in omni-images by utilizing this characteristic.
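The radial-line property described above can be checked numerically. The following is a minimal sketch in Python (chosen for brevity; the study itself is implemented in C++), and it is not the detection method of this thesis. The helper names `distance_to_center` and `is_radial`, the image center, and the pixel tolerance `tol` are all illustrative assumptions. The sketch tests whether the infinite line through two image points passes near the center of the omni-image.

```python
import math

def distance_to_center(p1, p2, center):
    """Perpendicular distance from `center` to the infinite line through p1 and p2."""
    (x1, y1), (x2, y2), (cx, cy) = p1, p2, center
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p1 and p2 must be distinct points")
    # |cross product| / |direction| gives the point-to-line distance.
    return abs(dx * (cy - y1) - dy * (cx - x1)) / length

def is_radial(p1, p2, center, tol=2.0):
    """True if the line through p1 and p2 passes within `tol` pixels of the center.

    `tol` is an illustrative threshold, not a value from the study.
    """
    return distance_to_center(p1, p2, center) <= tol

center = (320.0, 240.0)  # assumed image center for a 640x480 toy image
# A segment lying on a ray from the center: a candidate vertical-line feature.
print(is_radial((400.0, 300.0), (480.0, 360.0), center))
# A horizontal segment far from the center: rejected.
print(is_radial((100.0, 100.0), (200.0, 100.0), center))
```

In a sketch like this, segments that fail the test can be discarded early, so that only radial candidates are considered as vertical-line features.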
Next, we localize the vehicle using combinations of line features. For example, at some positions along navigation paths, the omni-camera may detect the feature lines of buildings and light poles at the same time. In such cases, we regard the combination of the two types of features as a new type of feature learned by the system.
Moreover, we generate perspective-view images, transform them into passenger-view images to be seen on the mobile device held by the passenger, and augment tour-guidance information on them. Even a back-seat passenger in the vehicle can watch the front passenger's view image on his/her own mobile device for tour guidance.
Finally, we implement the system by using a 4G/LTE network for data transmission. By using such a high-speed network, we can send the data to the server to achieve faster computations and send the results to the mobile device held by the passenger. Every passenger in the vehicle can view the tour guidance via the system.
Furthermore, because we display the result on the web, even people who are not in the vehicle can enjoy this AR-based tour guidance.
2.2 System Configuration
In this section, we introduce the configuration of the proposed system in more detail. The hardware of the system includes: 1) a video surveillance vehicle, 2) an omni-camera, 3) a laptop computer, 4) a server computer, and 5) a pad. The software includes: 1) a program used to integrate the components of the proposed system, 2) the drivers of the omni-cameras, and 3) the image acquisition program developed by AISYS VISION Company, which is a provider of CCD cameras. The omni-camera is controlled by the laptop computer, and the pad receives information from the server computer, which is kept at a cloud site.
2.2.1 Hardware Configuration
The surveillance vehicle, a Delica made by Mitsubishi Co., measures 469cm×169cm×196cm and is equipped with a working table and a power supply. System operators may sit inside the surveillance vehicle to operate the laptop computer and monitor the entire surrounding environment. Moreover, a steel frame is affixed to the top of the vehicle, and the omni-imaging device is mounted on this frame. In addition, extension USB cords and a cross-over cable running through the video surveillance vehicle were added to facilitate the transmission of images captured with the omni-imaging device. The entire video surveillance system is shown in Fig. 2.3.
[Figure 2.3 diagram labels: camera system (affixed on the surveillance vehicle), laptops, 4G/LTE network, server computer, pad.]
Figure 2.3 Structure of the proposed surveillance system.
To control the entire guidance system, we use a laptop computer, a server computer, and a pad as control units, with the laptop handling the omni-imaging device. The laptop is produced by TOSHIBA Computer, Inc. The pad, named Eee Pad Slate B121, is produced by ASUS Computer, Inc. Detailed specifications of these devices are listed in Table 2.1.
The omni-imaging device used in this study consists of two omni-cameras combined coaxially in the longitudinal direction, connected back to back, and fastened by a specially designed steel holder. Each camera includes a lens of model JHF8M-5MP, which is shown in Fig. 2.4(a), and a CMOS sensor of model AISYS ALTAIR U500C, which is shown in Fig. 2.4(b). The JHF8M-5MP model is a mega-pixel lens with the parameters of 2/3", 8mm, and F2.8-22. The specification of the CMOS camera sensor is shown in Table 2.2 and the specification of the lens is shown in Table 2.3. The entire omni-imaging device shown in Fig. 2.4(c) is formed with a pair of AISYS ALTAIR U500C cameras and is affixed on the top of the steel holder.
Table 2.1 Specifications of the laptop computers and the pad used in this study.
Satellite A660 Eee Pad Slate
CPU Intel Core i5-480M
Network Fast Ethernet LAN WLAN 802.11 b/g/n 2.4 GHz
Table 2.2 Specification of the CMOS cameras used in the imaging device.
AISYS ALTAIR U500Color Camera 5.0M
Sensor type CMOS
Sensor size 1/2.5" (5.70 x 4.28 mm)
Pixel size 2.2 x 2.2 μm
Frames per second 3~7 FPS
Transfer type USB 2.0 (480 Mbit/s)
Table 2.3 Specification of the lens used in the imaging device.
Lens JHF8M-5MP
Focal Length 8 mm
Maximum Relative Aperture 1:2.8
Iris F2.8 – 22
Angular Field of View 57.9 x 45.0 deg
Image Format 8.8 x 6.6 mm (D11mm)
Minimum Object Distance 0.1 m (From Front Vertex)
Figure 2.4 The components of the camera device and the entire device. (a) AISYS ALTAIR U500C cameras. (b) JHF8M-5MP lens. (c) Entire camera device.
2.2.2 Software Configuration
We use Visual Studio 2010 (VS 2010) as the development platform to build our guidance system. VS 2010 is a program development tool for the Windows operating system. The programming language we use is C++, which is a widely used language. The laptop and the pad both run the Windows 7 operating system.
To use the camera devices, we have to install the drivers of the ALTAIR U500C cameras on the laptop. The camera company also provides the corresponding software development kits (SDKs). In addition, the source code is available, so we can understand the purpose of the functions called in the program. Accordingly, we can adjust the parameters of each camera, such as the exposure value or the global color gain, through the SDK. Moreover, the camera company provides the SDK not only for VS 2010 but also for BCB, VB.NET, and C#.NET.
2.2.3 Network Configuration
The configuration of the network used in this study is shown in Figure 2.5, where the cameras and the laptop computer are connected through the USB cable. The server computer at the cloud site accesses the images captured by the omni-cameras through the laptop via the 4G/LTE network, so that the system always works with correct and up-to-date images and messages. Moreover, the pad also accesses the resulting images from the server computer via the 4G/LTE network.
[Figure 2.5 diagram labels: camera, omni-image, USB port, laptop, 4G/LTE network, server, pad, display.]
Figure 2.5 The architecture of the local network used in this study.
2.3 Network System
In this section, we describe in detail the design of the network system used in this study. In Section 2.3.1, we describe the server-side system, which carries out complicated tasks with long computing times on the server computer at the cloud site. In Section 2.3.2, we describe the client-side system, which has two functions: 1) displaying the result on the pad device, and 2) sending the omni-image data to the server. Finally, in Section 2.3.3, we introduce the cooperative operations between the client and the server sides.
2.3.1 Server-side System
The server-side system runs on a virtual machine (VM) of the cloud server. The laptop computer connects to it to send image data, and the pad device receives the results from it. Moreover, because the programs implementing the proposed vision-based techniques impose heavy computation loads, we run them on a more powerful cloud server.
The system in the server computer gets images from the cameras on top of the vehicle. Then, it detects the line features in the images. Because the computational work is heavy, we divide the work into four parts to run them on the multi-CPU server.
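The four-way division of work mentioned above might be organized along the following lines. This is only a sketch in Python under stated assumptions: the text does not say how the work is partitioned, so the split by direction angle, the per-angle vote counts, the threshold of 10, and the function names `split_into_parts`, `detect_in_chunk`, and `detect_parallel` are all hypothetical. A thread pool is used to keep the example short; in CPython, CPU-bound work would need a process pool (or native code, as in the C++ implementation of this study) to gain real multi-CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_parts(items, parts=4):
    """Split `items` into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def detect_in_chunk(chunk):
    """Placeholder for per-chunk line detection (hypothetical).

    An angle is reported when its vote count reaches an assumed threshold.
    """
    return [angle for angle, votes in chunk if votes >= 10]

def detect_parallel(votes_by_angle, parts=4):
    """Run the placeholder detector on each chunk concurrently and merge the results."""
    chunks = split_into_parts(votes_by_angle, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() returns results in the same order as the input chunks.
        results = list(pool.map(detect_in_chunk, chunks))
    return [angle for part in results for angle in part]

# Toy input: (angle in degrees, vote count) for every integer direction.
votes = [(a, 12 if a % 90 == 45 else 3) for a in range(360)]
print(detect_parallel(votes))
```

Because each chunk is processed independently and the results are merged in chunk order, the combined output is the same as that of a single sequential pass.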
Next, we use the detected line features and the learned data to locate the vehicle position. In more detail, the database of the learned information is saved in the cloud server, which has a large amount of storage. Furthermore, the system uses the location of the vehicle and the learned data to calculate the positions of the buildings in the generated perspective image, and augments the building names on it for the passenger to inspect. Finally, we also put the resulting image on the web site so that all users, not just the passengers inside the car, can get tour guidance by connecting to the web site.
2.3.2 Client-side System
The client-side system involves two components, namely, the laptop computer and the pad device. The laptop computer acquires the images by connecting to the omni-imaging device via the USB cable. Moreover, it transforms the omni-image into the passenger-image. The client-side system of the laptop then sends two images, namely the passenger-image and the omni-image, to the server after their resolutions are reduced to one sixteenth. Reducing the omni-image resolution to one sixteenth does not make the vehicle localization result wrong in most cases, but it greatly decreases the amount of data to transmit and increases the overall processing speed of the system. In summary, the proposed method reduces the huge amount of data to two small images, while the system can still run continuously with high vehicle localization precision.
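As an illustration of the resolution reduction described above, the following Python sketch downsamples a grayscale image by block averaging. It assumes that "one sixteenth" refers to the pixel count, that is, a factor of 4 along each axis; the text does not state this explicitly, nor does it specify the resampling method, so both the factor and the averaging are assumptions, and `reduce_resolution` is a hypothetical helper.

```python
def reduce_resolution(image, factor=4):
    """Downsample a grayscale image (list of rows) by averaging factor x factor blocks.

    With factor=4 the output has 1/16 of the original pixel count.
    Rows and columns that do not fill a complete block are dropped.
    """
    height = len(image) - len(image) % factor
    width = len(image[0]) - len(image[0]) % factor
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block_sum = sum(
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            # Integer average of the block (rounded down).
            row.append(block_sum // (factor * factor))
        reduced.append(row)
    return reduced

# Toy 8x8 image whose pixel value equals its column index.
image = [[x for x in range(8)] for _ in range(8)]
small = reduce_resolution(image)
print(len(small), len(small[0]))  # 2 2
print(small)                      # [[1, 5], [1, 5]]
```

Under this assumption, the 64-pixel toy image shrinks to 4 pixels, which is the same 16-fold reduction in transmitted data that the text describes.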
Finally, the pad device displays the processed images by connecting to the server and receiving processed images from it. As mentioned previously, the server system sends the processed images for AR-based guidance via the Internet, so the user can just visit the web site by using the pad device without running other applications.
2.3.3 Cooperation between Client and Server Sides
The details of the functions of the server and client side systems are described in Sections 2.3.1 and 2.3.2, respectively. Here we describe the cooperation between them in more detail. An illustration of the cooperation between the two systems is shown in Figure 2.6.
After the client-side system on the laptop computer acquires an omni-image, it first transforms the omni-image into a passenger-image. Then, it sends two images, namely the passenger-image and a reduced version of the original omni-image, to the server.
After the server gets the data, it starts to analyze them. First, the system detects the vertical line features in the reduced omni-image and matches them with the learned feature data to localize the vehicle. Next, the system calculates the position of each building and augments its name on the passenger-view image. Finally, the system sends the AR result to the client-side device via the web site. That is, the pad device on the client side displays the AR result by visiting the web site, so that any user in the vehicle can enjoy the guidance by using this system.
Figure 2.6 Cooperation between client and server sides.
2.4 System Processes
2.4.1 Learning Process
The first part of our guidance system is the learning process, which plays an important role in our system. At the beginning, we need a real-world environment map. Therefore, we choose an open-source map on the Internet to construct the first part of the environment map. Next, we choose a path on the environment map and define it. Then, we save the result into the database.
Next, we mount the omni-camera on the top of the vehicle and drive the vehicle along the chosen path. While the vehicle travels on the path, the learning system detects the vertical lines “seen” in the images acquired with the omni-camera. The operator can see the vertical lines detected by the system on the laptop computer, and then stops the vehicle and starts to “learn” the features. The learned features are of two types: building and light pole. The system saves information, such as the type of each feature (building or light pole), the position of the feature in the map, the angle of the feature, etc., as the learning result into the database. The details of the feature information will be discussed in Chapter 3. After the path has been traveled, the system analyzes the information automatically. In this process, the system in effect simulates the travel once again and saves the analyzed data in the form of tables, which may be looked up in the navigation process to speed up the guidance system. In more detail, the system can use the tables to match the detected features while running the navigation system, rather than analyzing the data again and again. A flowchart of the above learning process is shown in Fig. 2.7.
Figure 2.7 Flowchart of learning process.
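To make the idea of a precomputed lookup table in the learning process more concrete, the following Python sketch stores learned features and retrieves the one closest in angle to a detected feature. It is only an illustration under assumptions: the record fields follow the items listed in the text (type, map position, angle), but the class `LearnedFeature`, the functions `build_angle_table` and `nearest_by_angle`, the units, the angle convention, and the nearest-angle matching rule are all hypothetical. The actual table format and matching method of this study are described in later chapters.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass(frozen=True)
class LearnedFeature:
    kind: str         # "building" or "light pole", as in the text
    map_x: float      # feature position on the environment map (units assumed)
    map_y: float
    angle_deg: float  # angle recorded at learning time (convention assumed)

def build_angle_table(features):
    """Precompute a table sorted by angle so that lookups need no re-analysis."""
    return sorted(features, key=lambda f: f.angle_deg)

def nearest_by_angle(table, angle_deg):
    """Return the learned feature whose recorded angle is closest to `angle_deg`.

    Angles are compared directly, without 360-degree wrap-around, to keep the sketch short.
    """
    if not table:
        raise ValueError("empty table")
    angles = [f.angle_deg for f in table]
    i = bisect_left(angles, angle_deg)
    # Only the neighbors around the insertion point can be the closest entry.
    candidates = table[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda f: abs(f.angle_deg - angle_deg))

learned = [
    LearnedFeature("building", 12.0, 30.0, 75.0),
    LearnedFeature("light pole", 5.0, 8.0, 20.0),
    LearnedFeature("building", 40.0, 2.0, 160.0),
]
table = build_angle_table(learned)
match = nearest_by_angle(table, 70.0)
print(match.kind, match.angle_deg)  # building 75.0
```

The design point illustrated here is the one made in the text: the table is built once after learning, and each query during navigation is a cheap lookup rather than a repeated analysis.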
2.4.2 Navigation Process
The second part of the proposed guidance system is the navigation process. In Section 2.4.1, we described how the system learns about the environment through the learning process. Accordingly, we can estimate the position of the vehicle on the environment map and implement our tour guidance system in the navigation process. First of all, we use the captured omni-image to detect the vertical line-shaped objects in the environment. This process will be introduced in detail in Chapter 4. Then, by using the learned information and the detected features, the system can localize the current position of the vehicle in the environment map. The detailed process will be introduced in Chapter 5. Next, the system can calculate the position of the building and augment the building information on the passenger-view image. The detailed process will be introduced in Chapter 6. The entire navigation process is shown as a flowchart in Figure 2.8.
Figure 2.8 Flowchart of proposed tour guidance system.
Chapter 3
Learning of Environments
3.1 Ideas of Proposed Environment Learning Techniques
In this chapter, we describe the details of the method we propose to generate the environment map for use in the proposed AR-based tour guidance system. To complete the system, we must construct the environment map for use in the navigation phase, which includes the information about the path of the tour, the line features detected for vehicle localization, and the building information.
The first part of environment learning is the construction of a real-world map. We choose “OpenStreetMap” to construct our environment map. OpenStreetMap is an open data commons in which people can freely modify the map on the Internet, like Wikipedia. We use the real-world map acquired from there as the base of the environment map for this study, and define features of the environment for our system.
In more detail, each feature we define will be marked with an icon on the map. The selected path is also marked on the map. In addition, the system can also show the vehicle position on the map during the tour so that the user can see the map clearly.
Next, we learn the line-shaped features for vehicle localization. To make our system more accurate, we have to get more information about the features. In addition, we learn not only the position of each feature on the map but also the information about how the camera on the vehicle can “see” it. The details will be described in Section 3.3.
Furthermore, we learn the building information for the system to show in the AR image. The building information includes the building name, the building area in the map, the area where the camera can see, and so on.
Finally, we merge all the data of the environment as the environment map that the system can use for navigation in the tour. The detail of environment learning will be described in the following.
3.2 Coordinate Systems Used in This Study
In this section, we will introduce the coordinate systems used in this study, which describe the relations between the used devices and the environment map. The following are the four coordinate systems used in this study.
(1) World coordinate system (WCS): denoted as (x, y, z) as shown in Figure 3.1(a). The origin OW of the WCS, a pre-defined point on the ground, is regarded as the starting position of the path traversed by the vehicle during the learning and navigation processes.
(2) Camera coordinate system (CCS): denoted as (X, Y, Z) as shown in Figure 3.1(b). The origin Om of the CCS, a focal point of the hyperboloidal-shaped mirror, lies on the X-Y plane, which is coincident with the image plane. The Z-axis passes through the optical center of the lens of the upper CMOS camera in the omni-imaging device.
(3) Image coordinate system (ICS): denoted as (u, v) as shown in Figure 3.1(c). The u-v plane of this system coincides with the image plane with the origin OC located at the center of the image plane.