Thesis Organization - Construction of an Around-Car 3D Event Data Recorder Using Multiple KINECT Devices and Its Applications

Chapter 1 Introduction

1.5 Thesis Organization

The remainder of this thesis is organized as follows. In Chapter 2, we introduce the configuration of the proposed system and the system processes in detail. In Chapter 3, we present the design of the proposed 3D around-car imaging system. In Chapter 4, a method for constructing 3D images is described. In Chapter 5, we introduce the proposed method for modeling around-car objects, including the calibration and merging of multiple KINECT devices. In Chapter 6, the proposed method for browsing a 3D image that integrates long-range views and nearby objects is described. In Chapter 7, experimental results and discussions are presented. Finally, conclusions and some suggestions for future work are given in Chapter 8.

Chapter 2 System Design and Processes

2.1 Ideas of Proposed System

To build a complete panoramic view around the car, we affix KINECT devices with different orientations to the car body at different locations. We first consider the minimal number of KINECT devices needed to cover the full 360-degree surround of the car. Since the horizontal viewing angle of a KINECT device is 57 degrees, at least 7 devices are required to cover 360 degrees (360/57 is about 6.3). However, our final design of the 3D around-car system uses 14 KINECT devices, as mentioned in Chapter 1. The reasons and more details of the design are described in Chapter 3.
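The count of seven follows from dividing the full circle by the horizontal viewing angle and rounding up. The short Python sketch below reproduces this arithmetic. It assumes an idealized arrangement in which all devices share one center and no overlap between neighboring views is required; the function and constant names are illustrative only. On a real car the devices sit at different positions, which is one reason a practical design may need more devices.

```python
import math

KINECT_HORIZONTAL_FOV_DEG = 57.0  # horizontal viewing angle quoted in the text


def min_devices_for_full_circle(horizontal_fov_deg: float) -> int:
    """Smallest number of devices whose horizontal views sum to at least 360 degrees."""
    return math.ceil(360.0 / horizontal_fov_deg)


n_min = min_devices_for_full_circle(KINECT_HORIZONTAL_FOV_DEG)

# Angle left over for overlap between neighboring views in this idealized case.
spare_overlap_deg = n_min * KINECT_HORIZONTAL_FOV_DEG - 360.0
```

With seven devices the total spare angle is small, so each pair of neighboring views would overlap by only a few degrees even in the ideal case.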

With the 14 KINECT devices, we acquire depth and color images through universal serial bus (USB) connections. Since each KINECT device needs its own USB controller, the proposed system needs 14 USB controllers, and hence a computer with at least 14 USB controllers. Such a computer is not commercially available, so we inserted a USB extension card into the motherboard of a common desktop computer for use in this study. However, the computing speed then became so slow that we decided to build the system with two computers, each using seven USB controllers. An analysis of the resulting processing speed is given in Chapter 3.

In Section 2.2, we describe the configuration of the proposed system, including the hardware and the software. The processes conducted by the proposed system, which include a learning process and a data recording and analysis process, are described in Section 2.3.

2.2 System Configuration

To build the proposed 3D imaging system, we affix 14 KINECT devices to the car, as mentioned before. Color and depth images are acquired from the KINECT devices via OpenNI, an open-source software package that serves as the device driver. In Section 2.2.1, we review the functions of the KINECT device and describe how and where we affix the 14 KINECT devices on the car. In Section 2.2.2, we introduce the system development environment for this study and the functionality of each component in the system.

2.2.1 Hardware Configuration

First, we review the structure of the KINECT device, as shown in Fig. 2.1. Inside its sensor case, a KINECT device contains an infrared (IR) emitter, an RGB camera, an IR depth sensor, and a tilt motor that controls the tilting angle of the device.

Fig. 2.1 The structure of a KINECT device.

The functions of the components of the KINECT device are explained below.

1. An infrared (IR) emitter and an IR depth sensor: The IR emitter projects a pattern of infrared light onto objects in the environment, and the IR depth sensor reads the reflected pattern to compute the distances between the objects and the KINECT device as depth values.

2. An RGB camera: The camera in the KINECT device senses image data in three color channels, R, G, and B.

3. An accelerometer: The accelerometer is configured for a 2G sensing range (G is the unit of acceleration due to gravity), by which we can know the current operational condition of the tilter (i.e., the tilt motor).

4. A multi-array microphone: This component contains four microphones, which may be used to record sound and to determine the direction of the sound source.

Some more specifications of the KINECT device are listed in Table 2.1.

In particular, the viewing angles, the tilt range of the device, and the range of the sensed depth values are important for the design of the proposed system. We discuss these parameters in more detail in Chapter 3. Furthermore, some specifications of the desktop computers used in this study are listed in Table 2.2.

Table 2.1 Specifications of the KINECT device.

Item                          Specification
Horizontal viewing angle      57 degrees
Vertical viewing angle        43 degrees
Tilt range of the device      ±27 degrees
Range of the depth sensor     1.2-3.5 meters
Resolution of color images    640 x 480
Resolution of depth images    320 x 240

Table 2.2 Specifications of the desktop computer used in this study.

Item                                          Specification
Processor                                     Intel Core i7-3770
Memory                                        16 GB
Motherboard                                   GA-Z77X-UD4H-1
Graphics card                                 GV-R7750C 2GI
PCIE USB 3.0 extension card (AISYS Vision)    Aguila SU16T base x 1, Aguila SU16T expansion x 1

Fig. 2.2 The USB extension card (the upper board is the expansion and the lower board is the base).

The Aguila SU16T base listed in Table 2.2 and shown in Fig. 2.2 contains four USB controllers, and the Aguila SU16T expansion adds four more. We use two computers, each holding one Aguila SU16T base and one Aguila SU16T expansion. Thus each computer provides eight USB controllers, of which only seven are used in this study. In addition, we affix ferrous boxes to the car to hold the KINECT devices, one box per device. Two examples are shown in Fig. 2.3.
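The controller budget described above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the helper `assign_devices` and the constant names are hypothetical, and the even split is an assumption consistent with the seven controllers per computer stated in the text.

```python
CONTROLLERS_PER_BASE = 4       # Aguila SU16T base
CONTROLLERS_PER_EXPANSION = 4  # Aguila SU16T expansion
NUM_COMPUTERS = 2
NUM_KINECTS = 14

controllers_per_computer = CONTROLLERS_PER_BASE + CONTROLLERS_PER_EXPANSION
total_controllers = NUM_COMPUTERS * controllers_per_computer


def assign_devices(num_devices, num_computers, per_computer_limit):
    """Spread devices as evenly as possible, one device per USB controller."""
    if num_devices > num_computers * per_computer_limit:
        raise ValueError("not enough USB controllers for all devices")
    base, extra = divmod(num_devices, num_computers)
    # The first `extra` computers take one additional device each.
    return [base + (1 if i < extra else 0) for i in range(num_computers)]


devices_per_computer = assign_devices(NUM_KINECTS, NUM_COMPUTERS, controllers_per_computer)
unused_controllers = total_controllers - NUM_KINECTS
```

Under these assumptions each computer drives seven devices and one controller per computer stays free.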

Fig. 2.3 Ferrous boxes for holding KINECT devices. (a) Without a KINECT device. (b) With a KINECT device.

2.2.2 Software Configuration

Regarding the software configuration, we first review OpenNI, an open-source software framework for developing 3D sensing middleware libraries and applications, as shown in Fig. 2.4. It provides a tool for getting depth and color images from the KINECT devices; in other words, it provides an interface between the hardware and the computer.

Fig. 2.4 Illustration of the role OpenNI plays.

Secondly, OpenGL is an application programming interface (API) for generating 2D and 3D images. This API is typically used to interact with a GPU to achieve hardware-accelerated rendering. In our case, OpenGL is the tool for rendering the 3D data obtained by transforming the color and depth images. The details will be described later.

Finally, we use Visual Studio 2010 as the development environment for integrating multiple libraries, such as OpenNI and OpenCV. On this software platform, we write programs in the C and C++ languages with these libraries to carry out image processing, rendering, and other tasks.

2.3 System Processes

2.3.1 Learning Process

Before the data recording process, a learning process is necessary. We divide the learning process into two parts as follows.

The first part is the process for system calibration, including the calibration of the KINECT devices and of some system parameters. The first task in this process is to find the height of each KINECT device with respect to the road in the 3D image. For this, we first transform the depth and color images acquired by a KINECT device into a 3D image. Then, we find the desired height by trial and error using the 3D image; once an appropriate height parameter is found, we use it in the subsequent steps.
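To make the idea of turning a depth image into 3D data concrete, the Python sketch below back-projects a single depth pixel into a 3D point and then shifts it by a candidate device height. It is a minimal sketch under stated assumptions, not the conversion used in this thesis (which is described in Chapter 4): it assumes an ideal pinhole camera whose focal lengths follow from the 57-degree and 43-degree viewing angles, a 320 x 240 depth image, a level device, and depth measured along the optical axis. All function names are hypothetical.

```python
import math


def focal_length_px(image_size_px, fov_deg):
    """Focal length in pixels of an ideal pinhole camera with the given field of view."""
    return (image_size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def back_project(u, v, depth_m, width=320, height=240, hfov_deg=57.0, vfov_deg=43.0):
    """Map pixel (u, v) with depth depth_m to (x right, y up, z forward) in meters."""
    fx = focal_length_px(width, hfov_deg)
    fy = focal_length_px(height, vfov_deg)
    cx, cy = width / 2.0, height / 2.0
    x = (u - cx) * depth_m / fx
    y = (cy - v) * depth_m / fy  # image rows grow downward, so flip the sign
    return (x, y, depth_m)


def height_above_road(point, device_height_m):
    """Height of a point above the road for a level device at a candidate height."""
    return point[1] + device_height_m


center = back_project(160, 120, 2.0)      # pixel at the image center, 2 m away
right_edge = back_project(320, 120, 2.0)  # pixel at the right image border
```

In a trial-and-error search, one would vary `device_height_m` until points known to lie on the road come out with a height close to zero.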

Secondly, we calibrate the geometric relation between every two neighboring KINECT devices. For this, we use a box with a simple shape as the calibration target, and use the previous result, the height of each KINECT device, to locate the calibration target precisely. The result of this process is the relative angle between every two neighboring KINECT devices, which is used in the data recording and analysis process. A flowchart of this part of the learning process is shown in Fig. 2.5, and the details of the calibration are described in Chapter 5.
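Once a relative angle between two neighboring devices is known, points seen by one device can be expressed in the frame of the other by a rotation. The Python sketch below illustrates only this step. It assumes that the relative angle is a rotation about the vertical axis and ignores any translation between the devices; the actual calibration procedure is the one in Chapter 5, and the function name here is hypothetical.

```python
import math


def rotate_about_vertical(point, angle_deg):
    """Rotate (x, y, z) about the vertical y axis; a positive angle turns +z toward +x."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


# A point 1 m straight ahead of a device that is turned 90 degrees
# relative to its neighbor appears 1 m to the side of the neighbor.
p_in_neighbor_frame = rotate_about_vertical((0.0, 0.5, 1.0), 90.0)
```

The height coordinate is unchanged by this rotation, which is why the device heights can be calibrated first and reused here.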

The second part is the process for stitching multiple images to construct a long-range panoramic view. In this process, we first learn a threshold value for separating far-view contents from near-view ones in each color image taken by a KINECT device. This is done by trial and error. The resulting set of far views can then be combined into a panoramic view by a stitching process, which is described in more detail in Chapter 6. A flowchart of this part of the learning process is shown in Fig. 2.6.
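One possible reading of the far/near separation is a per-pixel test against a threshold. The Python sketch below shows such a test. It is an assumption, not the method of this thesis: it supposes that the threshold is applied to a depth value aligned with each color pixel, and that pixels without a valid depth reading (stored as 0) are treated as far. The function name and the sample values are made up for illustration.

```python
def split_far_near(depth_rows, threshold_m, invalid=0.0):
    """Return a mask that is True for far-view pixels and False for near-view pixels."""
    return [[(d == invalid) or (d > threshold_m) for d in row] for row in depth_rows]


# A tiny 2 x 2 depth image in meters; 0.0 marks a pixel with no depth reading.
sample_depth = [[0.0, 1.5],
                [3.0, 4.2]]
far_mask = split_far_near(sample_depth, threshold_m=3.5)
```

Trying several values of `threshold_m` and inspecting the resulting masks mirrors the trial-and-error procedure mentioned in the text.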

Fig. 2.5 The learning process for system calibration.

Fig. 2.6 The learning process for image stitching.

2.3.2 Data Recording and Analysis Process

In the data recording and analysis process, two computers are used as the controllers of the 14 KINECT devices and are connected by a cable. They form a master-slave structure, as shown in Fig. 2.7. The software implementation is based on a client-server architecture, in which we use Windows sockets to conduct the communication between the two computers.

Fig. 2.7 Master-slave structure of the proposed data recording and analysis process, where PC1 is the master computer and PC2 is the slave computer.

At the beginning of the data recording and analysis process, as shown in Fig. 2.8, the master at the client side receives an instruction from the user and then sends a recording request to the slave at the server side. After getting the request, the slave starts its recording process and returns a reply to the master. On receiving the reply, the master starts its own recording process as well. All the communication is implemented with multiple threads, because the two computers have to communicate and record at the same time. The ending of the data recording and analysis process is shown in Fig. 2.9.
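The request and reply sequence described above can be sketched with ordinary TCP sockets and one extra thread. The Python code below is only an illustration of the message order, not the implementation of this thesis, which uses Windows sockets from C and C++. The message strings `START` and `OK`, the function names, and the use of the loopback address on a single machine are all assumptions; the recording itself is replaced by a log entry.

```python
import socket
import threading


def slave_server(listener, events):
    """Server (slave): wait for a request, start recording, then reply."""
    conn, _ = listener.accept()
    with conn:
        if conn.recv(16) == b"START":
            events.append("slave: recording")  # the slave starts recording
            conn.sendall(b"OK")                # the slave replies to the master


def master_client(port, events):
    """Client (master): send the request and start recording once the reply arrives."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        events.append("master: request")       # the master sends the request
        sock.sendall(b"START")
        if sock.recv(16) == b"OK":
            events.append("master: recording")  # the master starts recording


def run_handshake():
    events = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the operating system pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=slave_server, args=(listener, events))
    server.start()
    master_client(port, events)
    server.join()
    listener.close()
    return events


events = run_handshake()
```

Because the master records only after the reply arrives, the slave always starts first, and the two start times differ by roughly one message delay.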

Fig. 2.8 Starting the data recording and analysis process. Steps between the client (master) and the server (slave): 1. instruction; 2. request; 3. recording (slave); 4. reply; 5. recording (master).

Fig. 2.9 Ending the data recording and analysis process. Steps between the client (master) and the server (slave): 1. wait for the stop pattern; 2. stop request; 3. stop recording; 4. stop reply; 5. stop recording.

Furthermore, a series of tasks are conducted in the data recording and analysis process as described in Section 1.3, including (1) transforming the color and depth images acquired by each KINECT device at each time instant into a 3D image; (2) stitching all the color images into a panoramic color background image; (3) extracting nearby 3D objects from the 3D image corresponding to each KINECT device; (4) merging the extracted 3D objects into the panoramic color background image; and (5) allowing the user to browse the merged result from any viewpoint and displaying the corresponding partial view.

Chapter 3 Design of Proposed 3D Around-car Imaging System

3.1 Idea of Proposed 3D Around-car Imaging System

When constructing a 3D EDR, it is important to let the EDR “see” the view around the car with no blind spot. A single KINECT device, whose horizontal viewing angle is only 57 degrees, is obviously not enough. Instead, we have to affix multiple KINECT devices around the car. In addition, the design differs for each part of the car. Before describing the proposed system design, we briefly review a car model with multiple cameras produced by Luxgen Motor Co. Ltd.

Luxgen has released a new car equipped with six RGB cameras around the car body, which are called “eagle views.” One camera is affixed to the front of the car, and another to the rear. The remaining four cameras are affixed below the side mirrors, two on each side, with one facing the rear of the car and the other facing obliquely toward the rear, as shown in Fig. 3.1.

Fig. 3.1 The cameras affixed on the body of the car. (a) (b) Front part of the car. (c) (d) Side part of the car. (e) (f) Rear part of the car. (g) (h) The recorder on the mirror.

This design inspired us, because we can likewise affix cameras on the side mirrors. With these cameras, we can see the around-car view as if through the windows of the car. If someone comes close to a window of the car, he or she will be detected by the nearby KINECT device.

Regarding the coverage of the front view of the car, Luxgen uses only one camera, because the camera is of the fisheye type, which yields a wider view than a normal projective camera. However, the overlap between the front view and each side view is narrow, so there exist four blind spots at the corners. To improve on this, we use four additional KINECT devices to cover the corner views in our design. Specifically, to cover the front-left corner, a KINECT device with its view covering the portion of (a) as shown in Fig. 3.2 is deployed. A second KINECT device is used to cover the symmetric portion on the right in the front. For the rear-left corner, a third KINECT device with its view covering the portion of (c) is deployed, and a fourth KINECT device is used to cover the symmetric portion on the right in the rear.

As for the other deployed KINECT devices, we affix three KINECT devices for the front view, four for each side view, and three for the rear view. In addition, we affix a KINECT device on each of the two side mirrors, looking backward to cover the view portion of (B) as shown in Fig. 3.2. More details are described in Section 3.2.

3.2 Details of System Design

3.2.1 Front Part

In this section, we focus on the part of our design that is related to the driver’s views. Specifically, we affix KINECT devices to proper car body parts to cover the “blind spots” that the driver cannot notice during driving, for example, the part of the front view that is lower than the engine hood.

Fig. 3.2 Proposed design of the KINECT-device system affixed on the vehicle and the views of the KINECT devices.

When one drives a normal car in the street, the car might accidentally run over a dog or some other animal. In this situation, the driver will not know what is happening, because it takes place within the blind spots of the car. With our multiple-KINECT-device system on the car, these blind spots can be eliminated and such accidents can be avoided. Moreover, the blind spots of a truck driver are much larger than those of the driver of a usual vehicle, so a design that completely covers the surround of a vehicle is really important, and such a design is proposed in this study.

Also, we affix three ferrous boxes on the bumper (as shown in Fig. 3.6(a)) to make maximal use of the KINECT devices. This allows us to see the region below the engine hood, as shown in Fig. 3.3. The yellow object marks the limit of the driver’s view (anything lower or closer than this object cannot be seen by the driver).

Fig. 3.3 A test for the driver’s view. (a) Side view. (b) Front view.

3.2.2 Right- and Left-side Parts

Originally, a KINECT device was put on an iron stand attached to the side of the car, as shown in Fig. 3.4. This design was intended to make maximal use of the depth information, which is available in the range from 0.5 m to 6 m according to our experiments. However, this design violates vehicle modification regulations. Moreover, passing cars could be scratched by the iron stand while we drive on the road, so this was not an appropriate design.

Fig. 3.4 A car-side iron stand for holding a KINECT device. (a) With a KINECT device. (b) Without a KINECT device.

After this experience, a new design was developed, with the KINECT devices affixed on the higher side racks on the car roof, as shown in Fig. 3.5. This design is considered safer and more convenient. Each ferrous box fixes the position of its KINECT device, so the position of each KINECT device does not change whenever we put the devices back on the car.

Fig. 3.5 Ferrous boxes for holding KINECT devices. (a) With a KINECT device. (b) Without a KINECT device.

With this new design, the KINECT devices are held firmly in place and are safer while we are driving. However, although the tilt motor of a KINECT device is adjustable, these two KINECT devices are mounted too high to cover the lower view ranges. To solve this problem, we affix two more KINECT devices on the side mirrors to cover the lower view ranges, as shown in Figs. 3.6(c) and 3.6(d).

3.2.3 Rear Part

In the previous part, we used a ferrous bar and affixed the boxes to it. In contrast, at the rear we have no space to put a ferrous bar, or some ferrous stand is already present, so we drilled a hole for fastening a screw and affixed the ferrous box directly, as shown in Fig. 3.6(b). We use this method only as a last resort.

The rear part is similar to the front part, but we have to consider the case in which the car is driven backward. From common drivers’ experience, the back view is known to be a serious blind spot of the car, so we extend the view by using three KINECT devices (originally only one was used, which is not enough). Furthermore, we use the two KINECT devices on the side mirrors to cover more of the back view.

Fig. 3.6 Around-car KINECT devices. (a) A front view. (b) A back view. (c) A lateral view. (d) A rear-view mirror.

3.3 System Performance Analysis

3.3.1 Ranges of Camera Views

With the proposed 3D imaging system, driving the car is like carrying a box that collects 3D data. At each time instant, every view is dropped into this box, which has the shape of a parallelepiped. The size of this box is that of the car extended by 6 m outward. After recording the KINECT images and processing them, we can see a 3D image around the car at every time instant. However, the depth data of the KINECT devices are only partially available in outdoor environments because sunlight, which contains a full spectrum of light including infrared, interferes with the infrared light emitted by the device. Nevertheless, according to our experiments, the depth data acquired by the KINECT device are still usable around sunset. Our experimental results on this aspect, collected in August, are shown in Fig. 3.7. We know from these results that the size of the parallelepiped box mentioned above varies, and the 6 m range is just an ideal case.

Fig. 3.7 The relationship between the depth image quality and the sun intensity (the x-axis specifies time, the y-axis specifies the available depth range).

Though this is bad news for our application, we can still use the color images as the views in the daytime and use the depth images to construct 3D models at night, since the quality of the depth images is quite good at night. In this way, we improve the vision at night, which is when traditional 2D EDRs do not work well.

3.3.2 Imaging Sequence and Speed

We use two computers to increase our imaging speed. Two problems thus arise. The first is the communication time, and the second is the synchronization of the frame rates, measured in frames per second (FPS).

First, we have to know the imaging speed of a single KINECT device. According to the specification of the KINECT device, the imaging speed is 30 FPS; in other words, one image is taken about every 33 ms. A single request from the master computer to the slave takes about 1 ms according to our experiments. Since the communication steps do not affect the imaging speed much, we can conduct sequential processes at this speed for applications.
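The claim that communication has little effect on the imaging speed can be checked by comparing the request time with the frame period. The Python sketch below does this with the two figures quoted in the text (30 FPS and about 1 ms per request); the variable names are illustrative, and the comparison assumes one request per frame, which is an upper bound for the start and stop messages described in Chapter 2.

```python
FPS = 30          # imaging speed of one KINECT device, from its specification
REQUEST_MS = 1.0  # approximate time of one master-to-slave request, from the text

frame_period_ms = 1000.0 / FPS                  # time between two consecutive frames
overhead_fraction = REQUEST_MS / frame_period_ms  # share of one frame period spent on a request
```

Under these assumptions a request costs about 3 percent of one frame period, so it fits well within the time between two frames.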
