Chapter 4 Construction of 3D Images from KINECT Images

4.5 Experimental Results

4.5.2 Results of 3D Image Correction

We apply the correction algorithm to the 3D data of the constructed 3D image and display the result with OpenGL. One problem remains: the corners of the 3D image are still irregularly curved. One solution is to avoid using the 3D data at the corners of the 3D image. An example of the results of such geometric correction for planes is shown in Figure 4.9, and an example for an indoor environment is shown in Figure 4.10.

(a) (b)

(c) (d)

Figure 4.9 Results of geometric correction. (a) Original data seen from the top before correction. (b) Data seen from the top after correction. (c) Original data seen from the side before correction. (d) Data seen from the side after correction.

(a) (b)

(c) (d)

Figure 4.10 Results of geometric correction. (a) Original data seen from the top before correction. (b) Data seen from the top after correction. (c) Original data seen from the front before correction. (d) Data seen from the front after correction.

Chapter 5 Construction of 3D Indoor Environment Model from Multiple KINECT Images

5.1 Introduction

In this chapter, we describe how we construct the indoor environment model for 3D video surveillance using images acquired by the octagonal 9-KINECT imaging device. More specifically, we use the nine KINECT devices to get nine sets of KINECT images and convert them into nine 3D images individually. Then, we merge the nine 3D images to build up an indoor environment model. Before doing so, however, we must calibrate the spatial relations between the nine KINECT devices.

The details of the proposed calibration technique will be described in Section 5.2. After the calibration work, we use the results to merge the nine 3D images by shifting, rotating, and combining them to build up the indoor environment model. Finally, we display the model in 3D. The details of data merging and model display will be given in Sections 5.3 and 5.4, respectively.

5.2 Calibration of KINECT Devices

5.2.1 Review of Iterative Closest Point (ICP) Algorithm

The iterative closest point (ICP) algorithm [11] can be employed to minimize the difference between two groups of points. It is often used to match objects, which are constructed by many points, to compute their similarity. It is useful for constructing 2D or 3D images from different views, because object registration or stitching requires shape matching.

The concept of the algorithm is simple. It iteratively revises the transformation, which consists of a translation and a rotation, from one object to another in order to minimize the total distance between the points of the two objects. The algorithm is as follows.

Algorithm 5.1: ICP algorithm

Input: a group of points GA, another group of points GB, a set of transformations Tis, an initial minimum value M, and an initial transformation T0.

Output: a transformation T which is the relation between group GA and group GB.

Steps:

Step 1. Apply a transformation Ti, which has not been used yet, to all points in group GB.

Step 2. For each point in group GB, find the point in group GA with the minimum distance to it; call the found points PMDs.

Step 3. Compute the values VMDs of the minimum distances between the found points PMDs in group GA and the corresponding points in group GB.

Step 4. Sum up the values VMDs to get a total sum TS.

Step 5. If the total sum TS is smaller than the minimum value M, update the minimum value M with the total sum TS and the desired transformation T with the transformation Ti.

Step 6. Repeat Step 1 through Step 5 if the transformations Tis are not exhausted yet.

Step 7. Take the last updated transformation T as the output.
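
The steps of Algorithm 5.1 can be sketched in Python as below. This is only a minimal illustration under stated assumptions: a transformation is represented as a hypothetical tuple (angle in degrees, dx, dz), the rotation is taken about the vertical y axis, M is initialized to infinity, and the nearest-point search is brute force. The names `apply_transform` and `search_transform` are illustrative and do not come from this study.

```python
import math

def apply_transform(t, p):
    """Rotate p = (x, y, z) about the y axis by t[0] degrees, then shift by (t[1], 0, t[2]).

    Assumption: this tuple representation and the choice of axis are for illustration only.
    """
    angle_deg, dx, dz = t
    a = math.radians(angle_deg)
    x, y, z = p
    return (math.cos(a) * x + math.sin(a) * z + dx,
            y,
            -math.sin(a) * x + math.cos(a) * z + dz)

def search_transform(ga, gb, transforms):
    """Return the candidate transformation with the smallest total distance TS, and that TS."""
    best_t, best_sum = None, math.inf      # T and M (M assumed to start at infinity)
    for t in transforms:                   # Steps 1 and 6: try every candidate once
        moved = [apply_transform(t, p) for p in gb]
        # Steps 2 to 4: distance to the nearest point of GA for each moved point, summed up
        total = sum(min(math.dist(q, a) for a in ga) for q in moved)
        if total < best_sum:               # Step 5: keep the best candidate so far
            best_t, best_sum = t, total
    return best_t, best_sum                # Step 7
```

Each candidate costs one pass over all point pairs, so in practice the point groups would be restricted to a small region before running such a search.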

5.2.2 Calibration of Spatial Relation between KINECT Devices

In this section, we use the ICP algorithm to calibrate the spatial relations of the nine KINECT devices in the octagonal 9-KINECT imaging device. When the same object is captured by two different KINECT devices, merging its two 3D images with the ICP algorithm yields the transformation between them. This transformation is exactly the spatial relation of the two KINECT devices, because the transformation between the 3D images is equivalent to the transformation between the KINECT devices. Based on this concept, we should prepare three things before starting calibration.

First, we should decide the range of the transformation parameters. For this, we divide the transformation into two parts: a rotation and a translation. For the rotation, because the sensing directions of the nine KINECT devices of the octagonal 9-KINECT imaging device are fixed, the angles between the nine KINECT devices are also fixed, and we use the values of these angles for the rotation. For the translation, we divide it into two directions to facilitate running the ICP algorithm.

The position of each of the nine KINECT devices is fixed, so the distance between every two of the nine KINECT devices is also fixed. We enlarge the values of these distances and decompose them into two directions to obtain the translation ranges in the two directions.

Second, we should find the overlap region of the 3D images acquired from every two KINECT devices. Using the overlap region, we can merge the 3D images of an identical object “seen” from different KINECT devices by the ICP algorithm in order to get the resulting transformation. The overlap regions may be identified manually.

Third, we should choose objects whose 3D images from different KINECT devices can be merged in the overlap regions; we call them calibration targets. Basically, a calibration target should be big enough and clearly visible in the overlap region. For this, we use common objects which appear in the indoor environment as calibration targets, such as a couch, a table, a chair, or a clapboard. If there is no conspicuous object in the overlap region, we also use a box placed at a suitable height as the calibration target. Some calibration targets are shown in Figure 5.1.

(a) (b)

(c) (d)

Figure 5.1 Some calibration targets used in this study. (a) A couch. (b) A clapboard. (c) A chair and a table. (d) A box with a suitable height.
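
The first preparation step above, which fixes the rotation and bounds the translation in two directions, might be sketched as follows. The function name, the (angle, dx, dz) tuple, the symmetric search range around zero, and the grid step are all assumptions made for illustration; the study does not state the axes or the step actually used.

```python
def candidate_transforms(angle_deg, max_dx, max_dz, step):
    """Enumerate candidate (angle, dx, dz) tuples for one pair of neighboring devices.

    angle_deg      : the fixed angle between the two devices (used as is)
    max_dx, max_dz : enlarged distance bounds along the two translation directions
    step           : grid spacing (hypothetical; not given in the source)
    """
    nx = int(round(max_dx / step))
    nz = int(round(max_dz / step))
    candidates = []
    for i in range(-nx, nx + 1):           # first translation direction
        for k in range(-nz, nz + 1):       # second translation direction
            candidates.append((angle_deg, i * step, k * step))
    return candidates
```

For example, `candidate_transforms(45.0, 20, 10, 10)` yields 5 × 3 = 15 candidates, all with the same rotation angle; the 45.0 here is only a placeholder value.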

5.2.3 Algorithm for KINECT Device Calibration

With the preparation done, we start to calibrate the spatial relations between the nine KINECT devices in the octagonal 9-KINECT imaging device. First, we label the nine KINECT devices with numbers such that two consecutively numbered KINECT devices are neighbors. Then, we use the 3D images which include the pre-selected calibration target in their overlap region to calibrate the inter-KINECT relation parameters by the ICP algorithm. In total, we conduct eight such calibrations.

Before we conduct such calibrations each time, we reset the range of the possible transformations between the two devices for the ICP algorithm, set the two 3D images including the calibration target from two neighboring KINECT devices as inputs to the ICP algorithm, and use the overlap region in the images to assist the calibration work. The proposed algorithm for KINECT-device calibration is as follows.

Algorithm 5.2: KINECT device calibration.

Input: the 3D images CT0, CT1, …, CT8 which are constructed from KINECT images acquired by the nine KINECT devices D0, D1, …, D8 and include the calibration target; the transformation NTj and the overlap region ORj between every two neighboring KINECT devices Dj and Dj+1 where j = 0, 1, …, 7; a counter with its value C set to be 0 initially.

Output: the transformation Rk between every two KINECT devices Dk and Dk+1, where k = 0, 1, …, 7, which can be used to “register” the 3D images CTk and CTk+1.

Steps:

Step 1. Take two 3D images CTc and CTc+1, which include the calibration target in their overlap region, as input data for the ICP algorithm.

Step 2. Set the transformation NTc as the set of transformations for the ICP algorithm.

Step 3. Start the ICP algorithm described in Section 5.2.1 while using the overlap region ORc to assist in finding the calibration target for the ICP algorithm.

Step 4. Store the transformation resulting from the ICP algorithm as the transformation Rc.

Step 5. Increment the value C of the counter by 1.

Step 6. If the value C is smaller than eight, then repeat Steps 1 through 5; else, exit.
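
The loop of Algorithm 5.2 can be outlined in Python as below. This is a sketch under assumptions: each overlap region is modeled as a hypothetical pair of predicates (one per image, since the two images are in different device coordinates), and `register` stands for any ICP-like routine that takes two point groups and a set of candidate transformations. None of these names come from this study.

```python
def calibrate_devices(images, candidate_sets, overlap_regions, register):
    """Return one transformation per pair of neighboring devices.

    images[c]          : 3D points of image CTc as (x, y, z) tuples
    candidate_sets[c]  : candidate transformations NTc for devices Dc and Dc+1
    overlap_regions[c] : (in_first, in_second), predicates that select the points of
                         CTc and CTc+1 lying in the overlap region ORc (assumed form)
    register           : function (ga, gb, transforms) -> transformation
    """
    results = []
    for c in range(len(images) - 1):                 # C = 0, 1, ..., 7 for nine devices
        in_first, in_second = overlap_regions[c]
        # Steps 1 and 3: keep only the points inside the overlap region
        ga = [p for p in images[c] if in_first(p)]
        gb = [p for p in images[c + 1] if in_second(p)]
        # Steps 2 and 4: run the registration with NTc and store the result as Rc
        results.append(register(ga, gb, candidate_sets[c]))
    return results
```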

5.3 Environment Model Construction

5.3.1 Idea of Construction

After calibrating the spatial relations between the nine KINECT devices in the octagonal 9-KINECT imaging device, we obtain eight transformations between the nine KINECT devices. As mentioned previously, a transformation between two 3D images is equivalent to the transformation between the two corresponding KINECT devices, and vice versa. So we use the resulting transformations to “register” the nine 3D images, which are constructed from KINECT images acquired by the nine KINECT devices. By doing so, we can merge the nine 3D images into one to construct the indoor environment model.

5.3.2 Merge of Multiple 3D Images

In Section 5.2.3, we labeled the nine KINECT devices with numbers. Accordingly, we label the nine 3D images with the same numbers as those of the nine KINECT devices, and then merge the nine 3D images sequentially according to the numbers. We use the first 3D image as a pivot and merge the others into it, so each of the other eight 3D images must have one or more transformations applied to it before merging.

The merging process is run eight times. The result of merging the nine 3D images is the desired indoor environment model. The merging algorithm is as follows.

Algorithm 5.3: merging nine 3D images.

Input: nine 3D images IS0, IS1, …, IS8 constructed from images acquired by the nine KINECT devices D0, D1, …, D8 respectively; eight transformations RT0, RT1, …, RT7 from the calibration results where RTi represents the spatial relation between the two KINECT devices Di and Di+1 and i = 0, 1, …, 7; a counter with its value C set to be 0 initially.

Output: the merging result MR.

Step 6. If the value C is smaller than eight, then repeat Steps 3 through 5; else, go to the next step.

Step 7. Take the final merging result MR as the desired indoor environment model.
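
A possible reading of the merging process is sketched below. It follows only the prose description in this section and is not a transcription of the steps of Algorithm 5.3. Two assumptions are made: each transformation is a callable that maps one point to another, and RTi maps a point from the coordinate frame of device Di+1 into that of device Di (the direction is not stated in the text). The function name is illustrative.

```python
def merge_images(images, transforms):
    """Merge all 3D images into the coordinate frame of images[0].

    images[i]     : list of (x, y, z) points of image ISi
    transforms[i] : callable for RTi, assumed to map a point from the frame of
                    device D(i+1) into the frame of device Di
    """
    merged = list(images[0])                 # the first image is the pivot
    for i in range(1, len(images)):          # eight merges for nine images
        points = images[i]
        # Image i needs RT(i-1), RT(i-2), ..., RT0 applied in that order
        for j in range(i - 1, -1, -1):
            points = [transforms[j](p) for p in points]
        merged.extend(points)
    return merged
```

Under this reading, the image from device D8 passes through all eight transformations, while the image from device D1 needs only RT0.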

5.4 Experimental Results

The result of indoor environment modeling by merging the nine 3D images acquired from the nine KINECT devices of the octagonal 9-KINECT imaging device is shown in Figure 5.2.

(a)

(b)

(c)

Figure 5.2 The constructed indoor environment model. (a) The indoor environment model seen from the top. (b) and (c) The indoor environment model seen from different views.

Chapter 6 Human Tracking by Tilting KINECT Devices

6.1 Introduction

In this chapter, we introduce the proposed human tracking method, which uses the octagonal 9-KINECT imaging device for the 3D video surveillance system. To track human activities, we must detect the human first. We therefore divide the subject into two parts: human detection and human tracking.

In the human detection part, the depth image acquired by the KINECT device may be regarded as a kind of image similar to a gray-level image, so we may apply moving-object detection methods which have been used for color images to the depth image to conduct human detection. For this, we first use the background subtraction technique to detect moving objects in the depth image. Then, we use a noise reduction scheme to reduce noise in the resulting image. Finally, we apply a region growing scheme with a suitable threshold to the noise-reduced image and get the whole moving object in the depth image as the result. The details of the proposed detection algorithm will be described in Section 6.2.

In the human tracking part, by analyzing the moving object in two consecutive frames of the depth images, we can determine where the object is heading and how far it moves between the two images. Accordingly, we can adjust the tilt angle of the KINECT devices in the 9-KINECT imaging device to track the object or perform the handoff between KINECT devices. The details of the proposed human tracking process will be introduced in Section 6.3, and the experimental results will be shown in Section 6.4.

6.1.1 Review of Background Subtraction

Background subtraction is a technique commonly used in the fields of image processing and computer vision for object segmentation. We can use it to separate the foreground of an image from its background, because we are usually interested in the objects in the foreground of the image. It is also a widely used approach for detecting moving objects in videos acquired from static cameras. The basic approach is to detect moving objects from the difference between the frame which includes the moving objects and a reference frame, often called the background image.

An example of background subtraction results is shown in Figure 6.1.

(a) (b)

(c) (d)

Figure 6.1 An example for the background subtraction. (a) The background image. (b) An image with moving objects. (c) The image of the difference between (a) and (b) with some noise. (d) The resulting image of background subtraction.
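
The basic approach reviewed above can be illustrated with a short Python sketch. It assumes that images are stored as equally sized lists of rows and that a pixel counts as changed when its absolute difference from the background exceeds a threshold; the function name and the threshold value are hypothetical.

```python
def subtract_background(frame, background, threshold):
    """Return a binary mask with 1 where the frame differs from the background.

    frame, background : images as equally sized lists of rows of numbers
    threshold         : minimum absolute difference counted as a change (assumed)
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

background = [[100, 100, 100],
              [100, 100, 100]]
frame = [[100, 102, 60],
         [100, 55, 100]]
mask = subtract_background(frame, background, 10)   # [[0, 0, 1], [0, 1, 0]]
```

Small differences such as 102 versus 100 fall below the threshold and are suppressed, while any remaining isolated pixels are left to the later noise reduction.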

6.1.2 Review of Noise Reduction Method

There are many methods to reduce noise in the image. Mathematical morphology operations are often used to assist reducing noise in the image. Mathematical morphology is a theory for analysis and processing of geometrical structures. It is most commonly applied to digital images. Mathematical morphology has two basic operators. One is the erosion operator and the other is the dilation operator.

Before explaining the two operators, we should define some variables for input data. We use the variable A as the input image and use the variable B as the structuring element. The structuring element is a binary image with a simple and pre-defined shape but smaller than the input image. We also use the variable a as the pixel of the input image A and use the variable b as the pixel of the structuring element B. With the definitions above, we start to describe the two operators.

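For the erosion operator, the erosion of the input image A by the structuring element B is defined by:

$$A \ominus B = \bigcap_{b \in B} A_{-b}, \tag{6.1}$$

where $A_{-b}$ denotes the translation of A by $-b$. If the structuring element B has a center on the origin, then the erosion of A by B can be understood as the locus of the points reached by the center of B when B moves inside A. An example of the results of applying the erosion operator is shown in Figure 6.2.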

(a) (b)

Figure 6.2 An example for erosion results. (a) The original image. (b) The image after erosion.

For the dilation operator, the dilation of the input image A by the structuring element B is defined by:

$$A \oplus B = \bigcup_{b \in B} A_{b}, \tag{6.2}$$

where $A_{b}$ denotes the translation of A by b.

If the structuring element B has a center on the origin, then the dilation of A by B can be understood as the locus of the points covered by B when the center of B moves inside A. An example of the results of applying the dilation operator is shown in Figure 6.3.

(a) (b)

Figure 6.3 An example for dilation results. (a) The original image. (b) The image after dilation.

In Figure 6.2, the thin parts of the object in the original image disappear in Figure 6.2 (b) after the erosion operator is applied. In Figure 6.3, the thin parts of the object in the original image become thicker in Figure 6.3 (b) after the dilation operator is applied. So we can use the erosion operator to reduce noise in the image, and then use the dilation operator to restore the parts of objects which were lost through erosion.
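
The two operators can be written directly from their set definitions, as in the following Python sketch. A binary image is represented here as a set of (row, column) coordinates of its foreground pixels, and the structuring element as a set of offsets around the origin; this representation and the function names are assumptions chosen for brevity.

```python
def translate(points, offset):
    """Shift every (row, col) in points by the given offset."""
    dr, dc = offset
    return {(r + dr, c + dc) for r, c in points}

def dilate(a, b):
    """Dilation: union of the translations of A by every b in B."""
    result = set()
    for offset in b:
        result |= translate(a, offset)
    return result

def erode(a, b):
    """Erosion: intersection of the translations of A by -b for every b in B."""
    result = None
    for dr, dc in b:
        shifted = translate(a, (-dr, -dc))
        result = shifted if result is None else result & shifted
    return result if result is not None else set()

# A 3x3 structuring element centered on the origin
B = {(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
block = {(r, c) for r in (1, 2, 3) for c in (1, 2, 3)}
A = block | {(7, 7)}                  # a 3x3 object plus one isolated noise pixel
restored = dilate(erode(A, B), B)     # erosion followed by dilation
```

In this example the erosion keeps only the pixel (2, 2), which removes the isolated noise pixel, and the subsequent dilation grows that pixel back into the original 3x3 object.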

6.2 Human Detection

6.2.1 Background Learning

To use the background subtraction technique, we should first conduct background learning of the indoor environment, which is the experimental place in this study, and get the 3D image of the background. However, when we used a KINECT device to sense a static region, we discovered that the values at the same pixel location in two consecutive depth images acquired from the same KINECT device sometimes differ: one has a depth value, but the other has none. This problem comes from the infrared light rays sent out by the KINECT device. Because the reflection path of the infrared light rays is interfered with in some situations, the total amount of reflected infrared light differs from one depth measurement to another, which indirectly affects the production of the depth image.

To solve the problem and complete the background learning, we use a KINECT device to sense a static region for a while to get multiple KINECT images. Then, we average the values of all the pixels at the same locations in the depth images to get an average depth image. Also, we choose one color image from the acquired color images as the background color image. Finally, we choose one mapping array from the multiple ones produced by the Kinect-for-Windows SDK as the final mapping array, which will be used for constructing the 3D image of the background.

With the three measures above, we can construct a 3D image of the background. Because we actually want to use the depth image for background subtraction, we do not construct the 3D image of the background immediately but regard the results of the three measures as a set of the 3D image constructor of the background.
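
The depth averaging described above might look like the following Python sketch. One point is an assumption rather than something stated in the text: a value of 0 is taken to mark a pixel without depth information, and such readings are left out of the average so that they do not pull the result toward zero. The function name is illustrative.

```python
def average_depth(depth_images):
    """Average several depth images pixel by pixel, skipping readings of 0.

    depth_images : list of images, each an equally sized list of rows.
    Assumption: 0 means "no depth information"; a pixel with no valid
    reading in any image keeps the value 0.0 in the result.
    """
    rows, cols = len(depth_images[0]), len(depth_images[0][0])
    average = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [img[r][c] for img in depth_images if img[r][c] > 0]
            if valid:
                average[r][c] = sum(valid) / len(valid)
    return average

frames = [[[1000, 0]], [[1002, 0]], [[0, 0]]]
avg = average_depth(frames)           # [[1001.0, 0.0]]
```

The first pixel has a missing reading in the third frame, yet its average still reflects the two valid measurements.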

However, when the vertical tilt angle of the KINECT device is changed, the field of view of the KINECT device also changes. So we should redo the background learning with different vertical tilt angles of the KINECT device. Because the vertical tilt angle ranges from -25° to -55°, we do the background learning in steps of 2 degrees of the vertical tilt angle from -25° to -55°. Furthermore, we use nine KINECT devices in the octagonal 9-KINECT imaging device to do the background learning, so we apply the processing task described above to the images taken by the nine KINECT devices in the experimental place and get many sets of 3D image constructors of the background. However, the KINECT device which is at the center of the octagonal 9-KINECT imaging device and senses from top to bottom is used to do background learning only once. The background learning algorithm using the nine KINECT devices of the octagonal 9-KINECT imaging device is as follows.

Algorithm 6.1: background learning algorithm for the nine KINECT devices of the octagonal 9-KINECT imaging device

Input: the nine KINECT devices D0, D1, …, D8 of the octagonal 9-KINECT imaging device, where the KINECT device D0 is at the center of the octagonal 9-KINECT imaging device and senses from top to bottom; the experimental place EP without moving objects; an angle value AG; a counter with its value C set to be 0 initially.

Output: many sets of the 3D image constructors of the background BG whose depth images will be used for the background subtraction technique.

Steps:

Step 1. Set the value AG of the angle to -25°.

Step 2. If the KINECT device DC is not the KINECT device D0, then set the tilt angle of the KINECT device DC to the angle value AG; else, go to the next step.

Step 3. Use the KINECT device DC to sense the experimental place EP for a while, and get a set of depth images, DI, of the KINECT images and a set of color images, CI, of the KINECT images.

Step 4. Average the multiple depth images in DI to get an average depth image AVGD.

Step 5. Choose one color image from those color images of the KINECT images as the background color image BGC.

Step 6. Use the Kinect-for-Windows SDK with those depth images in DI as input to produce a set of mapping arrays, MA.

Step 7. Choose one mapping array from MA as the final mapping array FMA for constructing the 3D image of the background.

Step 8. Regard the average depth image AVGD, the background color image BGC and the final mapping array FMA as a set of the 3D image constructor of the background, CT3D.

Step 9. Put CT3D into the set of the 3D image constructors of the background BG.

Step 10. Decrement the angle value AG by 2.