Thesis Organization - Augmented Reality-Based Indoor Navigation Using Top-View Omni-Directional Computer Vision and Mobile Devices

Chapter 1 Introduction

1.5 Thesis Organization

The remainder of this thesis is organized as follows. In Chapter 2, we introduce the configuration of the proposed system and the system processes in detail. In Chapter 3, we introduce the proposed process for learning an indoor environment, which produces the data used by the proposed system. In Chapter 4, the proposed user localization method for indoor environments and the proposed user orientation detection method are described. In Chapter 5, we introduce the proposed path planning technique. In Chapter 6, we describe the proposed AR technique, a method for conducting the perspective transformation for information displays on the user’s device, and the adopted technique for rendering augmentations on real images. In Chapter 7, experimental results showing the feasibility of the proposed techniques for indoor navigation are presented. Finally, conclusions and some suggestions for future work are given in Chapter 8.

Chapter 2 Ideas of Proposed Methods and System Design

2.1 Ideas of Proposed Method

In this study, we propose an image-based localization technique for AR-based guidance in indoor environments. The system analyzes the omni-images captured by the cameras affixed to the ceiling, and finds the user’s foot points in the omni-images. Once the foot points are found, we transform their coordinates from the image coordinate system (ICS) into the global coordinate system (GCS) to get the actual position of the user in the indoor environment. To conduct this transformation, we construct a mapping table between the ICS and the GCS in advance.
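The table lookup described above can be sketched as follows. This is only a minimal illustration: it assumes that the mapping table is stored as a dictionary from pixel coordinates to floor coordinates, and that a pixel missing from the table falls back to the nearest calibrated entry. The names `mapping_table` and `ics_to_gcs` and the calibration values are hypothetical and are not taken from the proposed system.

```python
import math

def ics_to_gcs(mapping_table, u, v):
    """Map an image point (u, v) to global coordinates via a lookup table.

    mapping_table: dict {(u, v): (x, y)} built in advance (assumed layout).
    Falls back to the nearest calibrated pixel when (u, v) is not listed.
    """
    if (u, v) in mapping_table:
        return mapping_table[(u, v)]
    nearest = min(mapping_table, key=lambda p: math.hypot(p[0] - u, p[1] - v))
    return mapping_table[nearest]

# Toy table with made-up calibration values (pixels -> centimeters).
table = {
    (320, 240): (0.0, 0.0),
    (400, 240): (100.0, 0.0),
    (320, 300): (0.0, 80.0),
}
foot_gcs = ics_to_gcs(table, 395, 242)  # a detected foot point in the image
```

A denser table, or interpolation between neighboring entries, would give smoother positions than the nearest-entry fallback used here.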

Next, after detecting the user’s location, we must detect the user’s orientation. The simplest way to do so is to track the user’s locations in consecutively acquired images, and use the resulting motion vectors of the user’s foot points to compute the user’s orientation. However, this method does not work when the user is not walking, because no motion vector is then available. For this situation, we propose other techniques to overcome the problem. The first technique is to utilize the orientation sensor installed in the user’s device mentioned previously.
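The motion-vector idea can be illustrated with a short sketch. It assumes that foot points are already expressed in the GCS, that the heading is measured counterclockwise from the +x axis, and that a displacement below an assumed threshold `min_step` means the user is standing still; the function name and the threshold are hypothetical.

```python
import math

def orientation_from_motion(prev_xy, curr_xy, min_step=5.0):
    """Return the heading in degrees (0 = +x axis, counterclockwise).

    Returns None when the user moved less than min_step (an assumed
    threshold in the same unit as the coordinates), i.e. when no usable
    motion vector exists.
    """
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    if math.hypot(dx, dy) < min_step:
        return None
    return math.degrees(math.atan2(dy, dx)) % 360.0

heading = orientation_from_motion((100.0, 100.0), (100.0, 130.0))
standing = orientation_from_motion((100.0, 100.0), (101.0, 101.0))
```

The `None` result corresponds to the case discussed above, in which another orientation source must be used.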

The orientation sensor measures the azimuth angle of the device by detecting changes and disturbances in the magnetic field in the surrounding environment. However, according to our experimental experience, the detected azimuth values are not stable enough for our application due to indoor magnetic interference from various sources.

Therefore, we propose a second technique to improve the stability of detected orientations, that is, to attach a color edge mark on the top edge of the user device, and detect this line mark appearing in the omni-image to compute a more accurate orientation of the user at each visiting target.

In addition, in order to guide users to their desired destinations, we further propose a path planning technique for the proposed indoor AR navigation system. An environment map model is constructed first from a graphic drawing of the floor plan of the environment. Then, walkable regions in the floor plan are detected by image processing techniques with the graphic drawing as input. In this way, we can know where the obstacles are in the environment. The orientations of the obstacles are then analyzed to decide how to avoid them and where to go next. When a user wants to go to a destination, the system searches the constructed environment map for the destination point, and plans a path starting from the user position and ending at the destination. When the planned path collides with an obstacle, it follows the orientation of the obstacle’s boundary to avoid the collision and goes to the next immediate visiting spot. By repeating the above steps until the specified destination is reached, we finally obtain a complete navigation path as the desired path planning result.
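Two building blocks of this procedure, namely detecting walkable regions and checking whether a straight segment collides with an obstacle, can be sketched as below. The sketch assumes that the floor plan is a grayscale grid in which dark pixels are obstacles, and it uses a simple threshold and uniform sampling along the segment; the function names and the threshold value are hypothetical, and the boundary-following step itself is not shown.

```python
def walkable_mask(floor_plan, threshold=128):
    """Mark bright pixels as walkable (True); dark pixels are assumed obstacles."""
    return [[pixel >= threshold for pixel in row] for row in floor_plan]

def first_collision(mask, start, goal):
    """Sample the straight segment start -> goal on the grid.

    Returns the first blocked (row, col) cell, or None if the segment is free.
    """
    (r0, c0), (r1, c1) = start, goal
    steps = max(abs(r1 - r0), abs(c1 - c0), 1)
    for i in range(steps + 1):
        r = round(r0 + (r1 - r0) * i / steps)
        c = round(c0 + (c1 - c0) * i / steps)
        if not mask[r][c]:
            return (r, c)
    return None

# Toy floor plan: 255 = free floor, 0 = obstacle.
plan = [
    [255, 255, 255, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255, 255, 255, 255],
]
mask = walkable_mask(plan)
blocked_at = first_collision(mask, (1, 0), (1, 4))
free_row = first_collision(mask, (0, 0), (0, 4))
```

When `first_collision` reports a blocked cell, a planner of the kind described above would start following the obstacle boundary from that point.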

When the client-side system receives the navigation information sent from the server, it displays the information on the device screen. The navigation information includes the visiting target information and the navigation path itself. The visiting target information includes the name of the visiting target and its coordinates in the GCS. The navigation path contains the GCS coordinates of the points on the path. In order to display the information in an AR manner, the client-side system must transform the GCS coordinates onto a 2D screen plane. The field-of-view of the camera of the client device must also be estimated to get a perspective projection matrix. With this matrix, the 3D points of the navigation information can be transformed into the 2D screen plane. Then, the navigation information can be overlaid onto the real places or objects in the image taken of the current scene, so that the user can understand the surrounding environment easily, achieving the major goal of AR-based indoor environment guidance of this study. This step of overlaying navigation information on real environment images for display on the user’s mobile device will be called display rendering in the remainder of this thesis.
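As an illustration of the projection step, the following sketch builds an OpenGL-style perspective matrix from a vertical field-of-view and projects one point to pixel coordinates. It assumes that the point has already been transformed from the GCS into camera coordinates (camera looking down the negative z axis); the field-of-view, clipping distances, and screen size used here are assumed values, not measured ones.

```python
import math

def perspective_matrix(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix (row-major 4x4)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project_to_screen(point_cam, matrix, width, height):
    """Project a camera-space point to pixel coordinates (origin at top-left)."""
    x, y, z = point_cam
    vec = (x, y, z, 1.0)
    cx, cy, cz, cw = (sum(m * v for m, v in zip(row, vec)) for row in matrix)
    if cw <= 0.0:
        return None  # the point lies behind the camera
    ndc_x, ndc_y = cx / cw, cy / cw
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)

# Assumed values: 60-degree vertical field-of-view, 1024x600 screen.
matrix = perspective_matrix(60.0, 1024 / 600, 0.1, 100.0)
center = project_to_screen((0.0, 0.0, -5.0), matrix, 1024, 600)
behind = project_to_screen((0.0, 0.0, 5.0), matrix, 1024, 600)
```

A point on the optical axis lands at the screen center, and points behind the camera are rejected so that they are not drawn.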

2.2 Ideas of System Design

In this study, the proposed system has a client-server architecture, which may be decomposed into two parts: a server side and a client side. The server-side system handles the complicated tasks that involve heavy computation, and it runs on a centralized computer. The server-side system will be introduced in more detail in Section 2.2.1. The client-side system runs on the user’s mobile device; it obtains navigation information from the server-side system and displays it on the screen of the device. The client-side system will be introduced in more detail in Section 2.2.2.

Finally, the cooperation between the client and server sides will be introduced in Section 2.2.3.

2.2.1 Server-side System

The server-side system runs on a centralized computer as mentioned, and is connected to the cameras on the ceiling through a local area network. In the learning stage, we build an environment map, which includes environment information such as target locations, target titles, and camera locations. In the navigation stage, the server accesses the omni-images captured from the cameras, and analyzes them to detect the user’s location and orientation at each visited spot. After the server detects a user via images acquired by the cameras, it sends the user’s location, orientation, and the information of nearby visiting targets to the user’s client-side system. All of this information is updated when the user moves. When the user wants to reach a certain destination, the server receives a request from the client, plans a path from the user’s location to the destination, and sends a set of intermediate points of the path to the user’s client-side system for display.

As a whole, the server is designed mainly for conducting human localization and path planning, both of which are computationally heavy tasks. Because the client-side system runs on the user’s mobile device, which has less power and computational capability than the centralized computer, conducting these tasks on the server increases the computational performance and reduces the battery usage of the client-side system.

2.2.2 Client-side System

The client-side system runs on the user’s mobile device. Because the mobile device held by the user (like an iPad) has less power and computational capability than a laptop or desktop computer, the client-side system on it must be assigned as few tasks as possible to reduce the power consumption and increase the computational performance. Therefore, most tasks carried out by the client-side system are limited to those related to information display, such as view projection, display rendering, and creation of the navigation path’s geometric shape (arrows, thick line segments, etc.).

When a user enters the environment, the user’s client-side system is connected to the server through a network and receives relevant information from the server. Then, the client-side system just needs to display the information on the screen of the user’s mobile device.

2.2.3 Cooperation between Client and Server Sides

The server-side and client-side systems were described in Sections 2.2.1 and 2.2.2. Here we describe the cooperation between the two systems in more detail. An illustration of this cooperation is shown in Figure 2.1.

When the client is connected to the server, the latter will begin to detect the user’s location and orientation, and send the location coordinates, the orientation vector, and the nearby environment information to the user. The information will be updated continuously to make sure that the user can receive correct and immediate messages.

When the user wants to reach a certain destination, the client-side system will send a request, which includes the name of the destination, to the server. After the server receives the request, it will plan a path starting from the user’s location and ending at the destination. Finally, a set of intermediate points of the path will be sent to the client-side system.

Figure 2.1 Cooperation between client and server sides.
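The information exchanged between the two sides can be pictured with a small message sketch. The wire format of the proposed system is not specified in this chapter, so the use of JSON, the class name `NavigationUpdate`, and all field names below are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class NavigationUpdate:
    """Hypothetical server-to-client message; all field names are assumptions."""
    location: list                                        # user position in the GCS
    orientation: list                                     # direction vector in the GCS
    nearby_targets: list = field(default_factory=list)    # [{"name": ..., "position": ...}]
    path: list = field(default_factory=list)              # intermediate points, if a path was requested

def encode(update):
    """Serialize an update to a JSON string for transmission."""
    return json.dumps(asdict(update))

def decode(text):
    """Rebuild an update from a received JSON string."""
    return NavigationUpdate(**json.loads(text))

message = NavigationUpdate(
    location=[120.0, 80.0],
    orientation=[0.0, 1.0],
    nearby_targets=[{"name": "Printer", "position": [150.0, 90.0]}],
)
received = decode(encode(message))
```

In this sketch the `path` field stays empty until the client has requested a destination, mirroring the two phases of cooperation described above.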

2.3 System Configuration

In this section, we introduce the configuration of the proposed system. The hardware of the proposed system includes the fisheye cameras, which we use for human detection, and the mobile device, which we use as the client-side device. The hardware will be introduced in more detail in Section 2.3.1. In Section 2.3.2, we describe how the hardware is connected over the network and how it operates. Finally, in Section 2.3.3, we introduce the software development environments and the operating systems we use in the server-side and client-side systems.

2.3.1 Hardware Configuration

The camera we use in this study is an Axis 207MW, made by Axis Communications, whose original lens is replaced with a fisheye lens to expand its field-of-view. The Axis 207MW camera has dimensions of 85×55×40 mm (3.3”×2.2”×1.6”, not including the antenna) and a weight of 190 g (0.42 lb, not including the power supply). Its appearance is shown in Figure 2.2(a). The maximum resolution of the images captured with it is 1280×1024 pixels. For efficiency, we use a resolution of 640×480 pixels in our system, at a frame rate of up to 15 fps. The cameras can be accessed through wireless networks (IEEE 802.11g/b), but for higher speed, we access the cameras through the Ethernet.

We build our experimental environment in the Computer Vision Lab at National Chiao Tung University by installing several fisheye cameras on the ceiling of the lab (see Figure 2.2(b)). The images captured from the cameras are analyzed by the centralized computer to detect the user’s location and orientation. The server sends the navigation information to the user’s mobile device so that the user can begin the navigation.

The mobile device we use in the experiment is an HTC Flyer tablet made by HTC Corporation. Its appearance is shown in Figure 2.3. The HTC Flyer has dimensions of 195×122×13.2 mm (7.7”×4.8”×0.5”) and a weight of 420 g (0.93 lb). It has a 7-inch screen, a camera acquiring 5-megapixel images, and an e-compass that can detect the device orientation in a magnetic field, among other components. The user uses the HTC Flyer as the client device, and connects it to the server through a wireless network.


Figure 2.2 The camera used in the proposed system. (a) The appearance of the camera. (b) The camera installed on the ceiling in the indoor environment.

Figure 2.3 The HTC Flyer used as the client device in this study.

2.3.2 Network Configuration

Using the Ethernet is more reliable for our application in this study than using a wireless network. Therefore, the cameras and the centralized computer are connected through a local area network (LAN) in this study. The server can access the images captured from the cameras in a more reliable way through the Ethernet, and so one can make sure that the system always accesses correct and immediate images and messages.

The client device we use is a mobile device, so it must access the server through a wireless network. The most commonly used wireless networks currently are Wi-Fi and 3G networks, and the client device can access the server and receive the navigation information using either of them. For reliability and speed considerations, we set up a Wi-Fi access point in our experimental environment, and the user can connect to the server through the Wi-Fi network in the environment. The complete network architecture is shown in Figure 2.4.

Figure 2.4 The network architecture of the proposed system. (Diagram labels: LAN, Server, Client Device, Camera, Camera, Wi-Fi Network.)

2.3.3 Software Configuration

The server-side system is written in the C# programming language using the Microsoft Visual Studio 2010 development environment, and it operates on the Windows 7 operating system. The server-side system accesses the cameras through the AXIS Media Control SDK (AMC SDK), which is provided by the manufacturer of the cameras, Axis Communications. The AMC SDK provides the application programming interface (API) for developers to access the camera images or control the cameras using the C# and C++ programming languages.

The client-side system is written in the Java programming language and operates on the Android 2.3.4 operating system. It uses Qualcomm’s Augmented Reality (QCAR) platform, which provides many useful functions for AR development on mobile devices. In our system, however, we use QCAR only to handle the capturing of camera images. The rendering of 3D augmented objects is conducted by the Android OpenGL API.

2.4 System Processes

2.4.1 Learning Process

The goal of the learning process of the proposed system is to establish the environment map, which includes information about the visiting targets, cameras, magnetic fields, and obstacle orientations. The entire learning process is shown in Figure 2.5, and more details of it will be described in Chapter 3. Only a brief description of the process is given here.

First, we establish an environment map in the form of a floor plan drawing. The floor plan is drawn at a specific ratio relative to the actual size of the environment. After specifying the ratio, we compute the corresponding size in pixels. This scaling ratio is necessary for the transformation between the ICS and the GCS. Next, the visiting targets of the environment are specified on the environment map. Furthermore, we must also specify the installation information of the fisheye cameras. The installation information includes the locations and heights of the cameras, which are needed for computing the transformation between the image coordinate system and the map coordinate system.
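The role of the scaling ratio can be shown with a short worked example. The ratio of 2 cm per map pixel and the room size below are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def world_to_map(x_cm, y_cm, cm_per_pixel):
    """Convert real-world floor coordinates (cm) to map pixel coordinates."""
    return (x_cm / cm_per_pixel, y_cm / cm_per_pixel)

def map_to_world(px, py, cm_per_pixel):
    """Convert map pixel coordinates back to real-world coordinates (cm)."""
    return (px * cm_per_pixel, py * cm_per_pixel)

CM_PER_PIXEL = 2.0                                         # assumed drawing ratio
room_in_pixels = world_to_map(800.0, 600.0, CM_PER_PIXEL)  # an 8 m x 6 m room
target_in_cm = map_to_world(150.0, 90.0, CM_PER_PIXEL)     # a target marked on the map
```

Under these assumptions, an 8 m by 6 m room occupies 400 by 300 pixels on the map, and a target marked at map pixel (150, 90) lies 3 m and 1.8 m from the map origin.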

After the environment map is established, the learning process can be decomposed into two phases: learning for path planning and learning for human localization. Before we perform path planning, the system must know the information of obstacles, because the path planning algorithm uses this information to determine how to avoid the obstacles in the environment. Therefore, the goal of the path planning phase is to analyze the information of obstacles, which includes obstacle locations and obstacle orientations. A more detailed description of obstacle analysis will be given in Section 3.3.3.

In the human localization phase, we calibrate the cameras, including the server-side fisheye cameras and the client-side on-device camera, to map points between different coordinate systems. A more detailed description of the camera calibration process will be given in Section 3.4. Furthermore, the system detects the user’s orientation with the aid of the e-compass on the client device. The e-compass, as mentioned before, is an orientation sensor that measures the azimuth angle of the device by detecting the changes and disturbances in the magnetic field around the currently visited spot. However, the magnetic field is disturbed by the structural steel elements in a building, so it does not have an identical distribution at every location in the environment. To learn the magnetic field in the visited environment, we establish an azimuth map, which keeps a record of the azimuth values of four directions for every sample location in the environment map. A more detailed description of the magnetic field learning process will be given in Section 3.3.4.
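One possible way of using such an azimuth map is sketched below. The actual procedure is the one given in Section 3.3.4; this sketch merely assumes that the map stores, for each sample location, the azimuth learned for each of four reference directions, and that an e-compass reading is matched to the reference direction with the closest learned azimuth at the nearest sample location. The data layout, function names, and azimuth values are all hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def corrected_direction(azimuth_map, location, reading):
    """Pick the reference direction whose learned azimuth is closest to the reading.

    azimuth_map: {(x, y): {direction_deg: learned_azimuth_deg}} (assumed layout).
    """
    sample = min(
        azimuth_map,
        key=lambda p: math.hypot(p[0] - location[0], p[1] - location[1]),
    )
    learned = azimuth_map[sample]
    return min(learned, key=lambda d: angle_diff(learned[d], reading))

# Made-up learned azimuths at two sample locations.
azimuth_map = {
    (0, 0): {0: 12.0, 90: 97.0, 180: 188.0, 270: 281.0},
    (500, 0): {0: 350.0, 90: 80.0, 180: 171.0, 270: 262.0},
}
near_second = corrected_direction(azimuth_map, (480, 10), 355.0)
near_first = corrected_direction(azimuth_map, (10, 10), 100.0)
```

The circular difference in `angle_diff` keeps readings near 0° and 360° from being treated as far apart.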

After the above learning steps, we have completed the preparatory work needed in the navigation stage of the proposed system process. In the next section, we describe the work conducted in the navigation stage.

2.4.2 Navigation Process

In the navigation stage, the server analyzes the omni-images captured with the cameras continuously, and sends the environment information to the client. The client-side system displays the information on the screen of the user’s mobile device. When the user wants to reach a certain destination, the server will plan a path, and send a set of intermediate points of the path to the client. The entire navigation process proposed in this study is shown in Figure 2.6.

At the server side, the first step is human location detection. The proposed human localization process transforms the detected human location from the ICS into the GCS using the camera information we have acquired in the learning stage. Then the user’s location is used in the steps of human tracking and human orientation detection. The objective of the human tracking step is to identify the same human in consecutive video frames, and then compute the user’s speed to determine whether the user is walking or not. In the human orientation detection step, we detect the orientation by analyzing the color edge mark, which is on the top of the client device, in the omni-image. However, when the color edge mark is not observable in the omni-image, another technique must be adopted. For this, we compute the orientation by use of detected human motions, or by the azimuth map constructed in the learning stage. Here we also determine the nearby visiting targets seen by the user according to the user’s location. Finally, the server sends the information of the user’s location and orientation, and the nearby visiting targets to the user’s mobile device (the client).
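The fallback order among the orientation sources can be summarized in a few lines. This sketch assumes that the edge-mark result is preferred whenever the mark is visible, that the motion-based result is used only while the user is walking, and that the azimuth-map result is the last resort; the function name and the walking-speed threshold are hypothetical.

```python
def choose_orientation(mark_angle, motion_angle, compass_angle, speed, walking_speed=20.0):
    """Return (source, angle) following the fallback order described in the text.

    mark_angle is None when the color edge mark is not visible in the omni-image.
    walking_speed (cm/s) is an assumed threshold for deciding that the user walks.
    """
    if mark_angle is not None:
        return ("edge_mark", mark_angle)
    if speed >= walking_speed and motion_angle is not None:
        return ("motion", motion_angle)
    return ("azimuth_map", compass_angle)

with_mark = choose_orientation(45.0, 50.0, 60.0, speed=80.0)
walking = choose_orientation(None, 50.0, 60.0, speed=80.0)
standing = choose_orientation(None, 50.0, 60.0, speed=3.0)
```

Returning the source together with the angle lets later steps know how reliable the estimate is likely to be.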

Next, if the server receives a request indicating that the client wants to reach a certain destination, the server begins the path planning process; if not, the server continues to conduct human localization repeatedly. In the first step of path planning, the system tries to find a path starting from the user’s location and ending at the desired destination using the obstacle information analyzed in the learning stage. However, the path found may not be of the simplest form; i.e., there may exist two non-adjacent points on the path that can be connected directly without hitting an obstacle. In such cases, we simplify the path. Finally, the resulting set of intermediate points of the path will be sent to the client.
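The simplification step can be illustrated with a greedy shortcutting sketch: from each kept point, jump to the farthest later point that can be reached along an obstacle-free straight segment. This is a common generic technique, shown here on an assumed occupancy grid, and is not necessarily the exact procedure used in the proposed system; all names are hypothetical.

```python
def line_is_free(mask, a, b):
    """Check that the sampled straight segment a -> b crosses only walkable cells."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0), 1) * 2  # oversample the segment
    for i in range(steps + 1):
        r = round(r0 + (r1 - r0) * i / steps)
        c = round(c0 + (c1 - c0) * i / steps)
        if not mask[r][c]:
            return False
    return True

def simplify_path(mask, path):
    """Greedily replace runs of points by direct segments where possible."""
    result = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Walk back from the end until a directly reachable point is found.
        while j > i + 1 and not line_is_free(mask, path[i], path[j]):
            j -= 1
        result.append(path[j])
        i = j
    return result

# Toy grid: True = walkable; a wall blocks column 2 on rows 1 to 4.
grid = [[not (c == 2 and r >= 1) for c in range(5)] for r in range(5)]
detour = [(4, 0), (3, 0), (2, 0), (1, 0), (0, 0), (0, 1), (0, 2),
          (0, 3), (0, 4), (1, 4), (2, 4), (3, 4), (4, 4)]
simplified = simplify_path(grid, detour)
straight = simplify_path(grid, [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)])
```

The simplified path keeps the original start and end points but contains far fewer intermediate points, which also reduces the amount of data sent to the client.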

When the client receives the navigation information mentioned above, it begins to conduct the work of display rendering by “drawing” the information, which includes the visiting target information and the navigation path, on the device screen for the user to inspect. In order to map real world objects onto the mobile device
