Merge of Multiple 3D Images - Environment Model Construction

5.3 Environment Model Construction

5.3.2 Merge of Multiple 3D Images

In Section 5.2.3, we labeled the nine KINECT devices with numbers, so the nine 3D images carry the same numbers as the KINECT devices that produced them. We merge the nine 3D images sequentially in this numerical order. The first 3D image serves as the pivot and the others are merged into it, so each of the last eight 3D images must be transformed before merging, and images later in the sequence require more transformations.

The merging process is run eight times. The result of merging the nine 3D images is the desired indoor environment model. The merging algorithm is as follows.

Algorithm 5.3: merging nine 3D images.

Input: nine 3D images IS₀, IS₁, …, IS₈ constructed from images acquired by the nine KINECT devices D₀, D₁, …, D₈, respectively; eight transformations RT₀, RT₁, …, RT₇ from the calibration results, where RTᵢ represents the spatial relation between the two KINECT devices Dᵢ and Dᵢ₊₁ and i = 0, 1, …, 7; a counter with its value C set to be 0 initially.

Output: the merging result MR.

Step 6. If the value C is smaller than eight, then repeat Steps 3 through 5; else, go to the next step.

Step 7. Take the final merging result MR as the desired indoor environment model.
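
Steps 1 through 5 of Algorithm 5.3 are not reproduced above, so the following Python fragment is only a minimal sketch of the sequential merging idea, not the exact procedure. It assumes that each 3D image is a list of (x, y, z) points, that each RTᵢ is a 4×4 homogeneous matrix mapping coordinates of Dᵢ₊₁ into the coordinate system of Dᵢ, and that the names merge_images, mat_mul, apply_transform and translation are hypothetical helpers introduced for illustration.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def apply_transform(m, point):
    """Apply a 4x4 homogeneous transform to a 3D point (x, y, z)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[r][k] * v[k] for k in range(4)) for r in range(3))


def translation(tx, ty, tz):
    """Build a pure-translation transform (used only for the toy example)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


IDENTITY = translation(0.0, 0.0, 0.0)


def merge_images(images, transforms):
    """Merge all point sets into the coordinate system of images[0].

    Assumption: transforms[i] maps D_{i+1} coordinates into D_i coordinates,
    so image k needs the chained product RT_0 * RT_1 * ... * RT_{k-1}.
    """
    if len(transforms) != len(images) - 1:
        raise ValueError("need exactly one transform per consecutive pair")
    merged = list(images[0])          # the pivot image is taken as-is
    chain = IDENTITY
    for c in range(len(transforms)):  # one merging pass per remaining image
        chain = mat_mul(chain, transforms[c])
        merged.extend(apply_transform(chain, p) for p in images[c + 1])
    return merged


# Toy example with three images instead of nine.
images = [[(0, 0, 0)], [(0, 0, 0)], [(0, 0, 0)]]
transforms = [translation(1, 0, 0), translation(0, 2, 0)]
merged = merge_images(images, transforms)
```

With nine images and eight transformations the loop body runs eight times, which matches the number of merging passes stated above.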

5.4 Experimental Results

The result of indoor environment modeling by merging the nine 3D images acquired from the nine KINECT devices of the octagonal 9-KINECT imaging device is shown in Figure 5.2.

Figure 5.2 The constructed indoor environment model. (a) The indoor environment model seen from the top. (b) and (c) The indoor environment model seen from different views.

Chapter 6 Human Tracking by Tilting KINECT Devices

6.1 Introduction

In this chapter, we introduce the proposed human tracking method, which uses the octagonal 9-KINECT imaging device in a 3D video surveillance system. To track human activities, we must first detect the human. We therefore divide the subject into two parts: human detection and human tracking.

In the human detection part, the depth image acquired by a KINECT device can be regarded as a kind of image similar to a gray-level image, so moving object detection methods that have been used for color images can be applied to the depth image to conduct human detection. First, we use the background subtraction technique to detect moving objects in the depth image. Then, we use a noise reduction scheme to reduce noise in the resulting image. Finally, we apply a region growing scheme with a suitable threshold to the noise-reduced image and obtain the whole moving object in the depth image as the result. The details of the proposed detection algorithm are described in Section 6.2.

In the human tracking part, by analyzing the moving object in two consecutive frames of the depth images, we can determine the direction in which the object is moving and the distance it moves between the two images. Accordingly, we can adjust the tilt angle of the KINECT devices in the 9-KINECT imaging device to track the object, or perform a handoff between KINECT devices. The details of the proposed human tracking process are introduced in Section 6.3, and the experimental results are shown in Section 6.4.

6.1.1 Review of Background Subtraction

Background subtraction is a technique commonly used in image processing and computer vision for object segmentation. It separates the foreground of an image from its background, since the objects of interest usually lie in the foreground. It is also a widely used approach for detecting moving objects in videos acquired from static cameras. The basic approach is to detect moving objects from the difference between the current frame, which includes the moving objects, and a reference frame, often called the background image.

An example of background subtraction results is shown in Figure 6.1.

Figure 6.1 An example of background subtraction. (a) The background image. (b) An image with moving objects. (c) The image of the difference between (a) and (b) with some noise. (d) The resulting image of background subtraction.
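
The basic approach can be illustrated with a small Python sketch. It is not the implementation used in this study: images are assumed to be nested lists of gray-level values, and the function name subtract_background and the parameter min_diff (a minimum difference below which a pixel is treated as unchanged) are hypothetical.

```python
def subtract_background(background, frame, min_diff):
    """Return a binary mask that is 1 where the frame differs from the
    background by more than min_diff, and 0 elsewhere."""
    return [[1 if abs(b - f) > min_diff else 0
             for b, f in zip(background_row, frame_row)]
            for background_row, frame_row in zip(background, frame)]


# Toy 2x3 example: one pixel changes strongly, one changes only slightly.
background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[10, 11, 10],
         [10, 4, 10]]
mask = subtract_background(background, frame, min_diff=2)
```

In this toy example the slight change of 1 is ignored, while the change of 6 is marked as foreground. Small differences that survive such a test correspond to the noise visible in Figure 6.1(c).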

6.1.2 Review of Noise Reduction Method

There are many methods to reduce noise in an image, and mathematical morphology operations are often used for this purpose. Mathematical morphology is a theory for the analysis and processing of geometrical structures, and it is most commonly applied to digital images. It has two basic operators: the erosion operator and the dilation operator.

Before explaining the two operators, we define some variables for the input data. We use A to denote the input image and B to denote the structuring element, which is a binary image with a simple pre-defined shape that is smaller than the input image. We also use a to denote a pixel of the input image A and b to denote a pixel of the structuring element B. With these definitions, we describe the two operators.

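For the erosion operator, the erosion of the input image A by the structuring element B is defined by:

$$A \ominus B = \bigcap_{b \in B} A_{-b}, \tag{6.1}$$

where A_{-b} denotes the image A translated by −b. If the structuring element B has a center on the origin, then the erosion of A by B can be understood as the locus of the points reached by the center of B when B moves inside A. An example of the results of applying the erosion operator is shown in Figure 6.2.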

Figure 6.2 An example of erosion results. (a) The original image. (b) The image after erosion.

For the dilation operator, the dilation of the input image A by the structuring element B is defined by:

$$A \oplus B = \bigcup_{b \in B} A_{b}, \tag{6.2}$$

where A_{b} denotes the image A translated by b.

If the structuring element B has a center on the origin, then the dilation of A by B can be understood as the locus of the points covered by B when the center of B moves inside A. An example of the results of applying the dilation operator is shown in Figure 6.3.

Figure 6.3 An example of dilation results. (a) The original image. (b) The image after dilation.

In Figure 6.2, the thin parts of the object in the original image disappear in Figure 6.2(b) after the erosion operator is applied. In Figure 6.3, the thin parts of the object in the original image become thicker in Figure 6.3(b) after the dilation operator is applied. So we can use the erosion operator to reduce noise in the image and then use the dilation operator to restore the parts of objects that were lost during erosion.
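
The two operators and their combined use for noise reduction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: images are binary nested lists, the structuring element B is given as a list of (row, column) offsets relative to its center, pixels outside the image are treated as background, and the names erode, dilate and CROSS are hypothetical.

```python
def _is_set(image, r, c):
    """True if (r, c) lies inside the image and holds a foreground pixel."""
    return 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] == 1


def erode(image, offsets):
    """Keep a pixel only if B, centered on it, fits entirely inside A."""
    return [[1 if all(_is_set(image, r + dr, c + dc) for dr, dc in offsets) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]


def dilate(image, offsets):
    """Set every pixel covered by B when the center of B moves inside A."""
    return [[1 if any(_is_set(image, r - dr, c - dc) for dr, dc in offsets) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]


# A plus-shaped structuring element centered on the origin (an assumption).
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

# A 3x3 object plus one isolated noise pixel in the top-left corner.
image = [[1, 0, 0, 0, 0, 0],
         [0, 0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 0]]

eroded = erode(image, CROSS)      # removes the noise pixel, shrinks the object
opened = dilate(eroded, CROSS)    # restores part of the shrunken object
```

Erosion followed by dilation removes the isolated pixel while keeping the main object, although the corners of the object are not recovered with this particular structuring element.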

6.2 Human Detection

6.2.1 Background Learning

To use the background subtraction technique, we must first conduct background learning of the indoor environment, which is the experimental place in this study, and obtain the 3D image of the background. However, when we used a KINECT device to sense a static region, we discovered that a pixel at the same location in two consecutive depth images acquired from the same KINECT device sometimes differs: it has a depth value in one image but no depth value in the other. This problem originates in the infrared light rays emitted by the KINECT device. Because the reflection path of the infrared light rays is disturbed in some situations, the amount of reflected infrared light varies between depth measurements, which in turn affects the resulting depth image.

To solve the problem and complete the background learning, we use a KINECT device to sense a static region for a while and obtain multiple KINECT images. Then, we average the values of the pixels at the same location across the depth images to get an average depth image. We also choose one of the acquired color images as the background color image. Finally, we choose one of the mapping arrays produced by the Kinect-for-Windows SDK as the final mapping array, which will be used for constructing the 3D image of the background.

With the three measures above, we can construct a 3D image of the background.

However, because what we actually need for background subtraction is the depth image, we do not construct the 3D image of the background immediately; instead, we regard the results of the three measures as a set of the 3D image constructor of the background.

However, when the vertical tilt angle of a KINECT device changes, its field of view changes as well, so background learning must be redone for each vertical tilt angle. Because the vertical tilt angle ranges from −25° to −55°, we perform background learning in steps of 2 degrees from −25° to −55°. Furthermore, the octagonal 9-KINECT imaging device contains nine KINECT devices, so we apply the processing described above to the images taken by each of them in the experimental place and obtain many sets of 3D image constructors of the background. The one KINECT device at the center of the octagonal 9-KINECT imaging device, which senses from top to bottom, undergoes background learning only once. The background learning algorithm using the nine KINECT devices of the octagonal 9-KINECT imaging device is as follows.

Algorithm 6.1: background learning algorithm for the nine KINECT devices of the octagonal 9-KINECT imaging device

Input: the nine KINECT devices D₀, D₁, …, D₈ of the octagonal 9-KINECT imaging device, where the KINECT device D₀ is at the center of the octagonal 9-KINECT imaging device and senses from top to bottom; the experimental place EP without moving objects; an angle value AG; a counter with its value C set to be 0 initially.

Output: many sets of the 3D image constructors of the background BG, whose depth images will be used for the background subtraction technique.

Steps:

Step 1. Set the value AG of the angle to −25°.

Step 2. If the KINECT device D_C is not the KINECT device D₀, then set the tilt angle of the KINECT device D_C to the angle value AG; else, go to the next step.

Step 3. Use the KINECT device D_C to sense the experimental place EP for a while, and get a set of depth images, DI, of the KINECT images and a set of color images, CI, of the KINECT images.

Step 4. Average the multiple depth images in DI to get an average depth image AVGD.

Step 5. Choose one color image from the color images in CI as the background color image BGC.

Step 6. Use the Kinect-for-Windows SDK with the depth images in DI as input to produce a set of mapping arrays, MA.

Step 7. Choose one mapping array from MA as the final mapping array FMA for constructing the 3D image of the background.

Step 8. Regard the average depth image AVGD, the background color image BGC and the final mapping array FMA as a set of the 3D image constructor of the background, CT_3D.

Step 9. Put CT_3D into the set of the 3D image constructors of the background BG.

Step 10. Decrement the angle value AG by 2°.

Step 11. If the KINECT device D_C is the KINECT device D₀, then go to Step 13; else, go to the next step.

Step 12. If the value AG is not smaller than −55°, then repeat Steps 2 through 11; else, go to the next step.

Step 13. Increment the value C of the counter by 1.

Step 14. If the value C is smaller than nine, then repeat Steps 1 through 13; else, exit.
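
Step 4 and the tilt-angle loop of Steps 1, 10 and 12 can be sketched in Python as follows. This is only an illustration under stated assumptions: depth images are nested lists, a value of 0 is assumed to mean "no depth information", and such readings are excluded from the average (the text above does not specify how missing values are treated). The names average_depth and tilt_schedule are hypothetical, and no Kinect-for-Windows SDK call is shown.

```python
def average_depth(frames, missing=0):
    """Average several depth images pixel by pixel.

    Assumption: readings equal to `missing` carry no depth information
    and are skipped; a pixel with no valid reading at all stays `missing`.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    average = [[missing] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [f[r][c] for f in frames if f[r][c] != missing]
            if valid:
                average[r][c] = sum(valid) / len(valid)
    return average


def tilt_schedule(start=-25, stop=-55, step=2):
    """List the vertical tilt angles visited, from start down to stop."""
    return list(range(start, stop - 1, -step))


# Three toy 1x2 depth images of a static scene with dropped readings.
frames = [[[1000, 0]],
          [[1010, 0]],
          [[0, 2000]]]
avgd = average_depth(frames)
angles = tilt_schedule()
```

Under these assumptions, each of the eight tilting devices yields sixteen sets of 3D image constructors (one per angle from −25° to −55°), while the central device D₀ yields one.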

6.2.2 Human Detection by Depth Image

With the background learning done, we start to conduct human detection. As mentioned in Chapter 1, we make the following two assumptions.

1. The indoor environment does not change over time.

2. The detected moving objects are humans, for the purpose of security surveillance.

We follow these two assumptions in designing the background subtraction technique.

When we use a KINECT device to sense the indoor environment with human activities and get a pair of KINECT images, we subtract the depth image in this pair from the background depth image obtained from the background learning and get a subtracted depth image. The subtracted depth image contains the human shape together with many fragments. The fragments arise because the reflected infrared light rays are disturbed in some situations, as described in Section 6.2.1, which causes fluctuations in the depth image. We regard the fragments as a kind of noise. We therefore apply the erosion operator of mathematical morphology to the subtracted depth image to remove the small fragments. Then, we apply the dilation operator of mathematical morphology to the resulting depth image to restore the parts of the human shape and of the big fragments that were shrunk by the erosion operator. Finally, we apply the region growing scheme with a suitable threshold to the resulting depth image to find the human shape. An example of the results of human detection is shown in Figure 6.4.

Figure 6.4 An example of human detection results. (a) The background depth image. (b) The depth image with human activities. (c) The subtracted depth image with many fragments and the human shape. (d) The depth image with the human shape and big fragments after doing erosion and dilation. (e) The final human depth image after applying the region growing scheme with a suitable threshold.

6.2.3 Detection Algorithm

Based on the idea of human detection described in Section 6.2.2, we propose an algorithm that implements the idea using the nine KINECT devices of the octagonal 9-KINECT imaging device. The result will be used for human tracking. The detection algorithm is as follows.

Algorithm 6.2: Human detection by the nine KINECT devices.

Input: the nine KINECT devices D₀, D₁, …, D₈ of the octagonal 9-KINECT imaging device; the background depth images BDIs from the results of the background learning; the threshold value T which will be used in the region growing scheme; the indoor environment IE; a counter with its value C set to be 0 initially.

Output: the device RD which detects human activities.

Steps:

Step 1. Use the KINECT device D_C to sense the indoor environment IE and get a depth image DI.

Step 2. Subtract the depth image DI from the corresponding background depth image BDI in BDIs and get a subtracted depth image SDI.

Step 3. Apply the erosion and dilation operators of mathematical morphology to the subtracted depth image SDI to get the temporary depth image TDI.

Step 4. Apply the region growing scheme to the temporary depth image TDI with the threshold value T, and get the final depth image FDI.

Step 5. If there is a human shape in the final depth image FDI, then go to Step 8; else, go to the next step.

Step 6. If the value C is smaller than eight, then increment the value C of the counter by 1; else, set the value C to be 0.

Step 7. Repeat Steps 1 through 6.

Step 8. Record the KINECT device D_C as the result device RD and exit.
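
The details of the region growing scheme in Step 4 are not given above, so the following Python sketch shows only one plausible form of it. It assumes that the temporary depth image TDI is a nested list holding depth values for foreground pixels and 0 elsewhere, that a 4-connected neighbor joins a region when its depth differs from the current pixel by at most the threshold T, and that the largest region is taken as the human shape. The names grow_region and largest_region are hypothetical.

```python
from collections import deque


def grow_region(depth, seed, threshold):
    """Grow a region from `seed` over 4-connected foreground pixels whose
    depth differs from the neighboring region pixel by at most `threshold`."""
    rows, cols = len(depth), len(depth[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region or depth[nr][nc] == 0:
                continue
            if abs(depth[nr][nc] - depth[r][c]) <= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def largest_region(depth, threshold):
    """Return the largest grown region (assumed here to be the human shape)."""
    seen, best = set(), set()
    for r in range(len(depth)):
        for c in range(len(depth[0])):
            if depth[r][c] != 0 and (r, c) not in seen:
                region = grow_region(depth, (r, c), threshold)
                seen |= region
                if len(region) > len(best):
                    best = region
    return best


# Toy TDI: a 2x2 block at about 2 m and one leftover fragment at 0.9 m.
tdi = [[0, 0, 0, 0, 0],
       [0, 2000, 2010, 0, 900],
       [0, 2005, 2020, 0, 0],
       [0, 0, 0, 0, 0]]
human = largest_region(tdi, threshold=30)
```

In a full implementation, an empty or very small result would correspond to the "no human shape" branch of Step 5.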

6.3 Human Tracking

6.3.1 Human Tracking with Single KINECT Device

Once the human is detected, we can start to track the human’s activities. From the result of the human detection algorithm in Section 6.2, we know which of the nine KINECT devices of the octagonal 9-KINECT imaging device detected the human, and we call that KINECT device the tracking KINECT device.

When we use the tracking KINECT device to track the human’s activities, we get multiple KINECT images. We apply the methods described in Section 6.2.2 to the depth images of those KINECT images, together with the background depth images obtained from the background learning, to get the human’s depth images.

Next, we construct the human’s 3D data from the human’s depth images by the algorithm described in Chapter 4. Then, we analyze the human’s 3D data together with the frame rate of the tracking KINECT device to get the moving velocity and direction of the human. With this information, we can predict the next position of the human, and the tracking KINECT device can dynamically adjust its tilt angle accordingly to track the human.
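
The velocity estimation and position prediction can be sketched in Python as follows. This is a minimal illustration, not the method of this study in detail: it assumes that the human’s 3D data in each frame is a list of (x, y, z) points, that the human’s position is represented by the centroid of those points, and that the motion is approximately constant between consecutive frames. The names centroid and predict_next are hypothetical.

```python
def centroid(points):
    """Mean position of a set of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def predict_next(prev_points, curr_points, frame_rate):
    """Estimate the velocity between two consecutive frames and
    extrapolate the position one frame ahead (constant-velocity assumption)."""
    dt = 1.0 / frame_rate                      # time between two frames
    p0, p1 = centroid(prev_points), centroid(curr_points)
    velocity = tuple((b - a) / dt for a, b in zip(p0, p1))
    next_position = tuple(b + v * dt for b, v in zip(p1, velocity))
    return velocity, next_position


# Toy example: the human moves 0.1 units along x between two frames.
prev_points = [(0.0, 0.0, 2.0), (0.0, 2.0, 2.0)]
curr_points = [(0.1, 0.0, 2.0), (0.1, 2.0, 2.0)]
velocity, next_position = predict_next(prev_points, curr_points, frame_rate=10)
```

The direction of motion is given by the direction of the velocity vector, and the predicted position can then be compared with the field of view of the tracking KINECT device.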

6.3.2 Handoff between KINECT Devices

When the human is about to leave the field of view of the tracking KINECT device, we must use one of the other KINECT devices of the octagonal 9-KINECT imaging device to keep tracking the human. So there is a handoff problem between the nine KINECT devices. Because we know the spatial relations between the nine KINECT devices and the overlap regions of every two neighboring KINECT devices, the handoff task becomes easier to conduct. The handoff strategy we adopt is that when the human enters the overlap region of the tracking KINECT device and one of its neighboring KINECT devices, we let that neighboring KINECT device assume the role of the new tracking KINECT device to complete the handoff task.

6.3.3 Tracking Algorithm

By integrating the dynamic human tracking technique with a single KINECT device described in Section 6.3.1 and the handoff strategy described in Section 6.3.2, we obtain a tracking algorithm that uses the nine KINECT devices of the octagonal 9-KINECT imaging device. The algorithm is described as follows.

Algorithm 6.3: human tracking using the nine KINECT devices

Input: the tracking KINECT device TKD which is assigned according to the result of the human detection algorithm described in Section 6.2; the neighboring KINECT devices NKDs of the tracking KINECT device TKD; the overlap regions ORs of the tracking KINECT device TKD and its neighboring KINECT devices NKDs.

Output: the new tracking KINECT device RKD for continuing to track the human activities, which will be set to “null” if the human goes out of the fields of view of the nine KINECT devices.

Steps:

Step 1. Use the tracking KINECT device TKD to track the human and get some KINECT images KIs.

Step 2. Apply the methods described in Section 6.2.2 to the depth images of the KINECT images KIs to get the human’s depth images HDIs.

Step 3. Construct the human’s 3D data H3Ds from the human’s depth images HDIs.

Step 4. Analyze the human’s 3D data H3Ds and get the moving velocity MV of the human and the moving direction MD of the human.

Step 5. Use the moving velocity MV and moving direction MD to predict the next position NP.

Step 6. If the position NP is still in the field of view of the tracking KINECT device TKD, then repeat Steps 1 through 5; else, go to the next step.

Step 7. If the position NP is in the field of view of the tracking KINECT device TKD at a different tilt angle, then change its tilt angle by its tilting device and repeat Steps 1 through 6; else, go to the next step.

Step 8. If the position NP is in one of the overlap regions ORs, then take the involved neighboring KINECT device NKD, which shares this overlap region with the tracking KINECT device TKD, as the new tracking KINECT device RKD and exit; else, go to the next step.

Step 9. If the position NP is out of the fields of view of all the nine KINECT devices, then set the new tracking KINECT device RKD as null and exit.
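
The order of the decisions in Steps 6 through 9 can be summarized by the following Python sketch. It is an illustration only: the geometric tests are passed in as functions because their exact form depends on the calibration results, any position not covered by the first three tests is treated as lost, and the names next_action, in_current_fov, tilt_covering and overlap_neighbor are hypothetical.

```python
def next_action(position, in_current_fov, tilt_covering, overlap_neighbor):
    """Decide what the tracker does with the predicted position NP.

    in_current_fov(position)   -> True if TKD sees NP at its current tilt angle
    tilt_covering(position)    -> a tilt angle of TKD that covers NP, or None
    overlap_neighbor(position) -> the neighboring device sharing the overlap
                                  region that contains NP, or None
    """
    if in_current_fov(position):             # Step 6: keep tracking as is
        return ("keep", None)
    angle = tilt_covering(position)
    if angle is not None:                    # Step 7: tilt the same device
        return ("tilt", angle)
    neighbor = overlap_neighbor(position)
    if neighbor is not None:                 # Step 8: hand off to a neighbor
        return ("handoff", neighbor)
    return ("lost", None)                    # Step 9: RKD is set to null


# Toy one-dimensional layout along x, chosen purely for illustration.
def in_fov(p):
    return 0.0 <= p[0] < 2.0


def tilt(p):
    return -35 if 2.0 <= p[0] < 3.0 else None


def overlap(p):
    return "D2" if 3.0 <= p[0] < 4.0 else None


actions = [next_action((x, 0.0, 0.0), in_fov, tilt, overlap)
           for x in (1.0, 2.5, 3.5, 9.0)]
```

Tilting is tried before handoff, so the tracking KINECT device changes only when the human can no longer be followed by tilting alone.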

6.4 Experimental Results

An example of human tracking by tilting KINECT devices is shown in this section. The path of the human activities is shown in Figure 6.5. The 3D image sequences of the tracked human activities are displayed in 3D images by the OpenGL
