Thesis Organization - 3D Environment Modeling and Monitoring for Video Surveillance Applications Using KINECT Images

Chapter 1 Introduction

1.5 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 introduces the configuration of the proposed system and describes the system processes in detail. Chapter 3 presents the hardware design of the 3D video surveillance system and analyzes its performance. Chapter 4 describes the proposed schemes for converting KINECT data into 3D image data and for correcting the conversion result. Chapter 5 describes the proposed methods for calibrating the KINECT devices and modeling the indoor environment. Chapter 6 introduces the proposed human detection and tracking method. Chapter 7 introduces the proposed human modeling method and the 3D display of the result. Chapter 8 shows experimental results of the entire system process. Finally, conclusions and suggestions for future work are given in Chapter 9.

Chapter 2 Ideas of Proposed Methods and System Design

2.1 Ideas of System Design

To construct the proposed 3D video surveillance system, it is important to design an appropriate structure for the video acquisition device. The field of view of a single KINECT device is not wide enough, so we construct an octagonal 9-KINECT imaging device from multiple KINECT devices to extend the field of view. This device can monitor a sufficiently large indoor environment as a whole, and it can also make full use of the tilting mechanism in each KINECT device for dynamic human activity tracking. The details of the octagonal 9-KINECT imaging device are introduced in Chapter 3.

After constructing the octagonal 9-KINECT imaging device, we affix it to the ceiling of our experimental environment at a suitable height. The KINECT devices in it acquire image data of the surrounding environment by tilting from top to bottom, giving a full 360° view. Because the KINECT devices in the octagonal 9-KINECT imaging device work individually and the computer controller acquires images sequentially, we set an image acquisition order for the KINECT devices. When acquiring data from the KINECT devices, we sort the data according to this order.
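The sorting step can be pictured with a minimal sketch. The device IDs, the order list, and the frame dictionaries below are hypothetical placeholders chosen for illustration; the thesis does not specify them here.

```python
# Hypothetical illustration: the device IDs and the order list are assumptions,
# not values taken from the thesis.
ACQUISITION_ORDER = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # e.g. 0-7 outer devices, 8 downward-looking

def sort_by_acquisition_order(frames, order=ACQUISITION_ORDER):
    """Return the frames sorted by the position of their device ID in `order`."""
    rank = {device_id: i for i, device_id in enumerate(order)}
    return sorted(frames, key=lambda frame: rank[frame["device_id"]])

# Frames may arrive in any order; each carries the ID of the device it came from.
frames = [
    {"device_id": 8, "data": "frame from device 8"},
    {"device_id": 2, "data": "frame from device 2"},
    {"device_id": 0, "data": "frame from device 0"},
]
ordered = sort_by_acquisition_order(frames)
print([frame["device_id"] for frame in ordered])
```

Keeping the order in one list means a different acquisition sequence only requires changing that list.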

Finally, we design several software processing units to analyze the data acquired from the KINECT devices and to display the result. The hardware devices used in this study, and the software for processing image data and displaying the processing result, are described in Section 2.2. The system processes are introduced in Section 2.3.

2.2 System Configuration

In this section, we introduce the configuration of the proposed 3D video surveillance system. The hardware of the proposed system includes the KINECT devices, which are used throughout this study, and the devices needed to acquire data from multiple KINECT devices. The hardware is introduced in detail in Section 2.2.1. Section 2.2.2 describes the software development environment for processing data and displaying results.

2.2.1 Hardware Configuration

The main sensor used throughout this study is the KINECT device made by Microsoft. It consists of one RGB camera, a pair of 3D depth sensors, a multi-array microphone, and a motorized tilt mechanism. Its appearance is shown in Figure 2.1.

Its vertical and horizontal viewing angles are 43° and 57°, respectively. Its vertical tilt angle ranges from −27° to +27°. Its sensing distance for the color image, the depth image, and skeleton tracking ranges from 1.2 meters to 3.6 meters, but the actual sensing distance used in this study is larger and is discussed in Section 2.2.2. The maximum resolution of the color image and the depth image captured from the KINECT device is 1280×960 pixels, at a lower frame rate. For efficiency, we usually use resolutions of 640×480 pixels and 320×240 pixels in our system, with the frame rate kept at 30 fps. Its audio format is 16-kHz, 24-bit mono pulse code modulation (PCM). Its audio unit has a four-microphone array with a 24-bit analog-to-digital converter (ADC), and a Kinect-resident signal processing unit that provides acoustic echo cancellation and noise suppression.

In this study, we do not use the audio device or the skeleton tracking function.

Figure 2.1 The KINECT device used in this study.

A single KINECT device delivers its data to the data-processing device (a computer) over USB, so the data-processing device must provide enough USB ports for multiple KINECT devices. Furthermore, the data volume delivered by a single KINECT device is so large that an ordinary USB port extension cannot be used unless USB controllers are also added to the data-processing device. In other words, the number of usable KINECT devices is limited by the number of USB controllers rather than by the number of USB ports, so we must add USB controllers, not merely USB ports. We therefore install the Aguila SU16T Base and the Aguila SU16T Expansion in our data-processing device to extend both the USB ports and the USB controllers. The two boards are shown in Figure 2.2. The Aguila SU16T Base is installed on the motherboard through PCI Express x16, and the Aguila SU16T Expansion is installed on the Aguila SU16T Base. Together they provide 8 USB controllers, each with 2 USB ports, for 16 ports in total.
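A rough calculation illustrates why one KINECT device can occupy most of a USB controller. The sketch below assumes uncompressed streams of 4 bytes per color pixel and 2 bytes per depth pixel, and compares them with the nominal 480 Mbit/s rate of a USB 2.0 controller. Both the pixel formats and the use of USB 2.0 are assumptions made for illustration; the on-wire format of the device may differ.

```python
USB2_NOMINAL_MBIT_PER_S = 480  # nominal USB 2.0 high-speed rate (assumed bus type)

def stream_mbit_per_s(width, height, fps, bytes_per_pixel):
    """Raw (uncompressed) data rate of one image stream in Mbit/s."""
    return width * height * fps * bytes_per_pixel * 8 / 1e6

# Assumed in-memory formats: 4 bytes per color pixel, 2 bytes per depth pixel.
color = stream_mbit_per_s(640, 480, 30, 4)
depth = stream_mbit_per_s(640, 480, 30, 2)
total = color + depth

print(f"color: {color:.1f} Mbit/s, depth: {depth:.1f} Mbit/s, total: {total:.1f} Mbit/s")
print(f"share of one controller: {total / USB2_NOMINAL_MBIT_PER_S:.0%}")
```

Under these assumptions, a single device at 640×480 and 30 fps already approaches the nominal capacity of one controller, which is consistent with adding controllers rather than sharing one controller through a hub.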

Figure 2.2 The Aguila SU16T Base (top), installed on PCI Express x16, and the Aguila SU16T Expansion (bottom).

2.2.2 Software Configuration

After the hardware of the 3D video surveillance system is constructed, we build a data-processing system to implement the desired functions of the 3D video surveillance system. The system is written in the C++ programming language using the Microsoft Visual Studio 2010 development environment, and runs under the Windows 7 operating system. The system initializes the KINECT devices and acquires image data from them through the Kinect-for-Windows SDK, which is provided by Microsoft. Note that the maximum sensing distance available through the Kinect-for-Windows SDK is 4 meters, because Microsoft considers measurements at distances below 4 meters to be more precise than those beyond 4 meters. The system also uses open-source libraries such as the Open Source Computer Vision library (OpenCV) and the Open Graphics Library (OpenGL) to assist data processing. With the OpenCV application programming interface (API), the system processes the image data, and with the OpenGL API it displays the result in 3D.

2.3 System Processes

With the hardware and software configurations described, this section introduces the whole process of the proposed system in detail. We separate the system process into four parts.

The first part is a data conversion process. Because the depth information acquired from the KINECT devices is not 3D in nature, we convert it to 3D data, which are also used by the other processes. The details of the conversion scheme are described in Chapter 4.
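As a generic illustration of what converting a depth pixel to a 3D point involves, the following sketch back-projects a pixel with an ideal pinhole model. It is not the conversion scheme of Chapter 4. It assumes the viewing angles quoted in Section 2.2.1, a principal point at the image center, no lens distortion, and a depth value measured along the optical axis; the function name is hypothetical.

```python
import math

H_FOV_DEG, V_FOV_DEG = 57.0, 43.0  # viewing angles quoted in Section 2.2.1

def depth_pixel_to_3d(u, v, depth_mm, width=640, height=480):
    """Back-project pixel (u, v) with depth depth_mm to camera coordinates in mm.

    Assumes an ideal pinhole camera whose focal lengths follow from the
    field of view, with the principal point at the image center.
    """
    fx = (width / 2) / math.tan(math.radians(H_FOV_DEG / 2))
    fy = (height / 2) / math.tan(math.radians(V_FOV_DEG / 2))
    cx, cy = width / 2, height / 2
    x = (u - cx) * depth_mm / fx
    y = (v - cy) * depth_mm / fy
    return x, y, depth_mm

# The image center maps onto the optical axis; the right image edge maps to
# the half-width of the field of view at that depth.
print(depth_pixel_to_3d(320, 240, 2000))
print(depth_pixel_to_3d(640, 240, 2000))
```

A real device needs calibrated intrinsic parameters and a correction step, which is the subject of the conversion and correction schemes described later.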

The second part is a model construction process for the indoor environment. First, we use the 3D data obtained from each KINECT device by the data conversion process just mentioned to calibrate the spatial relations between the KINECT devices. Afterwards, we use the calibration result to merge the 3D data and construct an indoor environment model. Finally, we display the model with color images in 3D. The flow of the process is shown in Figure 2.3, and the details of the calibration strategy, the merging algorithm, and the model display scheme are introduced in Chapter 5.
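To make the merging idea concrete, the sketch below brings the points of each outer device into one shared frame and concatenates them. It is a simplification under stated assumptions, not the calibration or merging method of Chapter 5: it assumes the 8 outer devices face outward at ideal 45° intervals (360° divided by 8), and it ignores translation offsets and tilt, which a real calibration would estimate. All function names are hypothetical.

```python
import math

def rotate_about_vertical(point, angle_deg):
    """Rotate a point (x, y, z) about the vertical y-axis by angle_deg."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

def merge_point_clouds(clouds, yaw_step_deg=45.0):
    """Rotate each device's points into a shared frame and concatenate them.

    clouds[i] holds the points of outer device i, which is assumed to face
    the direction i * yaw_step_deg around the vertical axis.
    """
    merged = []
    for i, cloud in enumerate(clouds):
        merged.extend(rotate_about_vertical(p, i * yaw_step_deg) for p in cloud)
    return merged

# One point straight ahead of device 0 and one straight ahead of device 2.
clouds = [[(0.0, 0.0, 1000.0)], [], [(0.0, 0.0, 1000.0)]]
for point in merge_point_clouds(clouds):
    print(tuple(round(c, 3) for c in point))
```

In the shared frame, the point seen by device 2 ends up a quarter turn away from the point seen by device 0, as expected for devices 90° apart.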

[Flowchart: each KINECT device supplies color images and depth images; data conversion yields 3D data; calibration yields a calibration result; merging yields the model; rendering; display.]

Figure 2.3 The model construction process of the indoor environment.

The third part is a process of human activity tracking. First, we use depth images to detect human activities. The detection strategy used in this study includes background learning and noise elimination. The details of human detection are described in Section 6.2. Next, we use the detection result to track human activities. When tracking human activities, we adjust the tilt of the KINECT device dynamically. Furthermore, we change the viewpoint by timely handoff between KINECT devices and display the result with color images in 3D.
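Background learning and noise elimination on depth images can be sketched generically as follows. This is not the detection method of Section 6.2. It assumes a per-pixel median as the learned background, a fixed depth-difference threshold, a depth value of 0 meaning "no measurement", and a minimum pixel count as a crude noise filter; all names and parameter values are hypothetical.

```python
from statistics import median

def learn_background(depth_frames):
    """Per-pixel median depth over several frames (each a flat list of mm values)."""
    return [median(pixel_values) for pixel_values in zip(*depth_frames)]

def detect_foreground(depth_frame, background, threshold_mm=200, min_pixels=3):
    """Mark pixels whose depth differs from the background by more than threshold_mm.

    Pixels with depth 0 are treated as invalid. If fewer than min_pixels
    pixels changed, the change is treated as noise and nothing is reported.
    """
    mask = [d > 0 and abs(d - b) > threshold_mm for d, b in zip(depth_frame, background)]
    if sum(mask) < min_pixels:
        return [False] * len(mask)
    return mask

# Six-pixel toy frames: an empty room, then an object 1 m closer on three pixels.
background = learn_background([[3000] * 6, [3000] * 6, [3010] * 6])
person = detect_foreground([3000, 2000, 2000, 2000, 0, 3000], background)
speckle = detect_foreground([3000, 2000, 3000, 3000, 3000, 3000], background)
print(person)
print(speckle)
```

The first frame reports the three changed pixels, while the single changed pixel in the second frame is discarded as noise.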

Figure 2.4 The process of tracking human activities.

The fourth part is a process of human model construction and human activity display. We convert the 3D data recorded by the KINECT devices, using the data conversion process proposed in this study, to build the human model. To do so, we first segment the human activity out of each frame from the individual KINECT devices by using the detection method described in Section 6.2. Next, we merge the 3D data obtained for each individual KINECT device. Then, we merge the per-device results again to build a finer human model. Finally, we display the human model and show the human features extracted from the model. The whole process is shown in Figure 2.5, and the details of the process are introduced in Chapter 7.

Figure 2.5 The process of constructing the human model and displaying human activities.

Chapter 3 Design of Proposed Octagonal 9-KINECT Imaging Device

3.1 Introduction to KINECT Device

In this study, we have designed an octagonal 9-KINECT imaging device for environment monitoring. Some basic specifications of its basic unit, the KINECT device, were presented in Section 2.2.1. In this section, we introduce the structure of the KINECT device in detail.

The height of the whole KINECT device is 70 millimeters, the width of its main part is 283 millimeters, and the thickness of its main part is 60 millimeters. The area of the base of the KINECT device is 9072 square millimeters. The structure specifications are shown in Figure 3.1.

The panning angle of the KINECT device can also be changed by manual adjustment, but we do not use panning in this study because the constructed 9-KINECT imaging device is hung high on the ceiling to monitor the environment from above. The KINECT device contains a gravity sensor that detects the tilting angle between the device and the ground. We use this tilting function to monitor wider areas of the environment.

Figure 3.1 The structure specifications for each part of the KINECT device. (a) The height of the KINECT device. (b) The width of the main part of the KINECT device.

3.2 Ideas of Proposed Design

In this study, we want to use multiple KINECT devices for the proposed 3D video surveillance system, but multiple KINECT devices cannot be used directly without being organized. We therefore propose the octagonal 9-KINECT imaging device to organize them. The idea behind its design is described in this section.

First, we have to decide how many KINECT devices to use. As mentioned in Section 2.2.1, the horizontal viewing angle of a single KINECT device is 57°, so at least 7 KINECT devices are needed for a full 360° view. In our design, we use 8 KINECT devices to cover the full view with a certain degree of overlap. However, when the 8 KINECT devices sense outward for a full 360° view, the combination of their 8 views leaves a missing field of view, namely, the middle part. We therefore add a downward-looking KINECT device to cover the missing field of view, so 9 KINECT devices are used in total. The basic placement idea of the 9 KINECT devices is illustrated in Figure 3.2.
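The device count follows from simple arithmetic, shown in the sketch below. The overlap figure treats all devices as if they shared one viewpoint and were evenly spaced in angle. This is an idealization, since the real devices sit on the sides of an octagon, so the actual overlap depends on distance. The function names are hypothetical.

```python
import math

H_FOV_DEG = 57.0  # horizontal viewing angle of one KINECT device

def min_devices_for_full_circle(h_fov_deg=H_FOV_DEG):
    """Smallest number of devices whose views can add up to 360 degrees."""
    return math.ceil(360.0 / h_fov_deg)

def overlap_per_neighbor_deg(n_devices, h_fov_deg=H_FOV_DEG):
    """Angular overlap between adjacent views for n evenly spaced devices."""
    return h_fov_deg - 360.0 / n_devices

print(min_devices_for_full_circle())          # 7
print(overlap_per_neighbor_deg(8))            # 12.0
print(round(overlap_per_neighbor_deg(7), 2))  # 5.57
```

With 7 devices the neighboring views would overlap by under 6°, whereas 8 devices spaced 45° apart give 12° of overlap between neighbors.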

With this basic placement idea, we make a container for the 9 KINECT devices as shown in Figure 3.3, which is a copy of Figure 1.1. To make use of the tilting device within each KINECT device, we place the 8 outward-sensing KINECT devices on individual bases outside the container, as shown in Figure 3.3.

Figure 3.2 The basic placement idea of the proposed octagonal 9-KINECT imaging device. The central KINECT device looks downward and the others sense outward.

3.3 Details of Design

With the design idea as described above, we will now introduce the design specification of the proposed octagonal 9-KINECT imaging device in detail. We will separate the design specification into three main parts: interchangeable bases for KINECT devices, the container, and the top part. The whole appearance of the octagonal 9-KINECT imaging device is already shown in Figure 3.3.

Figure 3.3 The octagonal 9-KINECT imaging device.

3.3.1 Interchangeable Bases for KINECT Devices

The first part consists of the interchangeable bases for the outer 8 KINECT devices. We want the outer 8 KINECT devices to sense more information above the ground when they are placed on the bases at a suitable height. Therefore, we designed an incline for every base. The tilt angle of the incline is 30°. Because the area of the base of the KINECT device is 9072 square millimeters, we design the incline to have an area of 100×100 square millimeters to fit it. We also make two screw holes to fix the whole base. The base is shown in Figure 3.4.

Figure 3.4 The interchangeable base.

3.3.2 Container

The second part is the container. All the cables of the KINECT devices are put in the container. We design the container in an octagonal shape for the outer 8 KINECT devices. Because the width of the main part of the KINECT device is 283 millimeters and we want to avoid collisions when the tilting angles of the KINECT devices change, we designed each edge of the octagon to be 320 millimeters. The height of the octagonal container is 300 millimeters.

Then, on each side of the octagonal container, we make one square hole and two screw holes. The size of the square hole is 25×25 square millimeters. For each KINECT device on an interchangeable base outside the octagonal container, the transmission line and power line of the device pass into the container through the square hole. The two screw holes are used to fix the interchangeable base.

Furthermore, we made a rectangular hole whose size is 70×150 square millimeters at the center of the bottom of the octagonal container. The inner KINECT device can look downward through the rectangular hole.

Finally, the cap of the octagonal container is a cross-shaped plate, which connects the container with the top part. The width of the cross-shaped plate's edge is 320 millimeters and its length is 775 millimeters. We made one circular hole whose diameter is 230 millimeters and twelve screw holes on the cross-shaped plate. The plugs of the 9 KINECT devices can be put into the top part, and all cables of the KINECT devices can be arranged through the circular hole. The twelve screw holes are used to connect the octagonal container with the top part. The octagonal container is shown in Figure 3.5.
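The container dimensions can be cross-checked with regular-octagon geometry. The sketch below assumes the container cross-section is a regular octagon with 320-millimeter edges. The comparison with the 775-millimeter plate length is an observation made here, not a relationship stated in the thesis, and the function names are hypothetical.

```python
import math

def octagon_across_flats(side_mm):
    """Distance between two opposite edges of a regular octagon."""
    return side_mm * (1 + math.sqrt(2))

def octagon_across_corners(side_mm):
    """Distance between two opposite corners of a regular octagon."""
    return side_mm / math.sin(math.pi / 8)

side = 320.0
print(round(octagon_across_flats(side), 1))    # 772.5
print(round(octagon_across_corners(side), 1))  # 836.2
print(side - 283.0)                            # 37.0 mm margin beside each device
```

Under this assumption, the edge-to-edge distance of about 772.5 millimeters is close to the 775-millimeter length of the cross-shaped plate, and each 320-millimeter edge leaves 37 millimeters of margin around the 283-millimeter width of a KINECT device.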

Figure 3.5 The octagonal container. (a) The whole appearance of the octagonal container. (b) The side of the octagonal container. (c) The bottom of the octagonal container. (d) The cap of the octagonal container.

3.3.3 Top Part

The third part is the top part, which itself consists of three parts. The first is a circular plate with a diameter of 600 millimeters. There are four screw holes on the plate, which are used to fix the whole octagonal 9-KINECT imaging device to the ceiling.

The second part of the top part is a hollow cylinder. We set two power extension sockets in the hollow cylinder to extend the power lines of the 9 KINECT devices. The diameter of the hollow cylinder is 400 millimeters and its height is 650 millimeters. We make one square hole and one rectangular hole on the surface of the hollow cylinder. The size of the square hole is 100×100 square millimeters. The two plugs of the power extension cords pass through the square hole to an outer socket. The size of the rectangular hole is 400×150 square millimeters. A user can reach into the octagonal 9-KINECT imaging device through the rectangular hole.

The third part of the top part is another cross-shaped plate. Its design specification is the same as that of the cross-shaped plate of the octagonal container. A user can arrange all cables of the 9 KINECT devices through the circular hole. We connect the top part and the octagonal container using the twelve screw holes. Finally, we welded the three parts of the top part together. The top part is shown in Figure 3.6.

Figure 3.6 The top part. (a) The whole appearance of the top part. (b) The circular plate of the top part. (c) The hollow cylinder of the top part. (d) The crossed plate of the top part.

3.4 Analysis of Device Performance

In this study, we take 3,000 millimeters as the suitable height from the bottom of the octagonal 9-KINECT imaging device to the ground. If a different height is needed, the hollow cylinder of the top part can be replaced. The vertical tilt angle of the outer 8 KINECT devices on the interchangeable bases ranges from 3° to 57°. This range can be changed by using an interchangeable base whose incline has a different tilt angle. Note, however, that because of the gravity sensor in the KINECT device, the tilting device does not work when the vertical tilt angle of the KINECT device is smaller than 60°. We would like to use the range of the vertical tilt angle from […]° to […]°.

3.4.1 Coverage of Views

With the height from the bottom of the octagonal 9-KINECT imaging device to the ground and the range of the vertical tilt angle, we can analyze the coverage of views of the octagonal 9-KINECT imaging device. We separate the analysis of the coverage of views into the color image side and the depth image side.

On the color image side, we use a single KINECT device to analyze the maximum and minimum sensing ranges of the field of view. The maximum sensing range is approximately 45,000 millimeters with a […]° vertical tilt angle of the KINECT device. A diagram illustrating this case is shown in Figure 3.7. The minimum sensing range is approximately 2,350 millimeters with a […]° vertical tilt angle of the KINECT device, and an illustrative diagram is shown in Figure 3.8.

We now analyze the coverage of views when all of the 9 KINECT devices are used. Because we want to have more overlapping views between the 9 KINECT devices to facilitate human model construction, we use the minimum sensing range.

Also, we can use a circle whose diameter is approximately 6,730 millimeters to
