
3.3 Calibration of Camera by Angular Mapping

3.3.2 Nonlinear Angular Mapping

Because of camera distortion, the linear angular transformation method described in Section 3.3.1 cannot yield the correct angles θi and ϕi corresponding to points in the image coordinate system. To obtain the angular transformation from the real world to the image precisely, we propose a real-world data acquisition method based on angular-mapping camera calibration. Since camera distortion exists both horizontally and vertically, we have to consider the horizontal and vertical directions at the same time when computing the longitude and latitude values of each point in the image.

In the proposed method, we attach a grid with m vertical lines and n horizontal lines to a wall that is perpendicular to the ground. This gives a real-world point set V = {V00, V01, …, Vmn}, where Vij = (θij, φij) is the pair of longitude and latitude values in the SCS of the point at the intersection of the ith vertical line and the jth horizontal line. The set V of intersection points is known in advance. The corresponding point set P = {P00, P01, …, Pmn} appearing in the image may be identified manually, where Pij = (uij, vij) is the point in the ICS corresponding to Vij. The detailed process of this nonlinear angular mapping is described in the following algorithm.

Algorithm 3.2 The real location data acquisition by image taking and mapping.

Input: An image I, as shown in Figure 3.11, and a set of longitude and latitude pairs V = {V00, V01, …, Vmn}, as mentioned above.

Output: A point set P = {P00, P01, …, Pmn} in I corresponding to V, with Pij corresponding to Vij, where i = 0, 1, …, m and j = 0, 1, …, n.

Steps:

Step 1. Attach a grid with m vertical lines and n horizontal lines on a wall, which is perpendicular to the ground.

Step 2. According to the interval distance of the grid on the wall and the distance Dic from the wall to the camera, measure the longitude and latitude values of each point Vij in the set V.

Step 3. Fix the longitude and latitude intervals at 5° by adjusting the spacing between the vertical lines and between the horizontal lines of the grid on the wall, under the constraint Dic = 170 cm, as shown in Figure 3.10.

Step 4. Mark yellow points at the intersections of the lines, as shown in Figure 3.11.

Step 5. Record the coordinates of each yellow point Pij(uij, vij) in the ICS and group all such points as a set P.

Step 6. For each point Pij in P in the image, manually identify the corresponding point Vij in V with the longitude and latitude values θij and φij, as shown in Figure 3.12, and set up the mapping.
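Step 3 fixes the angular spacing by choosing where the lines are placed on the wall. As a rough sketch, assuming each angle is measured along a horizontal or vertical axis through the point on the wall directly in front of the camera, a line at angle α lies at an offset of Dic·tan α from that point. The function name `wall_offset_cm` and the centre-point assumption are illustrative and are not taken from the original procedure.

```python
import math

def wall_offset_cm(distance_cm, angle_deg):
    """Offset on the wall (cm) from the point facing the camera
    to the grid line seen at angle_deg (assumed on-axis geometry)."""
    return distance_cm * math.tan(math.radians(angle_deg))

D_IC = 170.0  # wall-to-camera distance used in Step 3, in cm

# Offsets of the grid lines at 0, 5, 10, ..., 30 degrees
offsets = [wall_offset_cm(D_IC, k * 5) for k in range(0, 7)]
```

Under this assumption the spacing between neighbouring lines grows with the angle, so equal 5° steps do not correspond to equally spaced lines on the wall.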

We now know the longitude and latitude values of the yellow points in the image. To obtain the values for the other pixels in the image, we use an interpolation method, as described in the following algorithm.

Algorithm 3.3 Interpolation for computing viewing angles of any point.

Input: An image point I(u, v) in the ICS, the point set P, and the point set V.

Step 2. Decide whether the point I is in the region bounded by the four endpoints Pij, P(i+1)j, Pi(j+1), and P(i+1)(j+1) by substituting the coordinates (u, v) of I into inequalities (3.1) and (3.2). If both inequalities are satisfied, the point I is regarded as lying in the region; otherwise, repeat Step 2 for the next region.

Step 3. Define a line Mh that passes through the point I and whose slope is the average of the slopes of L1 and L3, and obtain the two intersections q(uq, vq) and r(ur, vr) of Mh with L0 and L2, as shown in Figure 3.13.

Step 4. Define a line Mv that passes through the point I and whose slope is the average of the slopes of L0 and L2, and obtain the two intersections s(us, vs) and t(ut, vt), as shown in Figure 3.13.

Step 5. Use an interpolation method to obtain the longitude and the latitude (θI, φI) of I in the SCS by the following equations according to the geometric ratio principle:

Figure 3.10 An illustration of attaching the lines on the wall.

By the interpolation method, each pixel in the image coordinate system can be mapped to longitude and latitude values in the SCS. With this information, we can get the angular position of objects in the image, as described in the following.
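The interpolation of Algorithm 3.3 can be sketched as follows. Several points are assumptions made only for illustration: L1 and L3 are taken to be the top and bottom edges of the region and L0 and L2 its left and right edges; s and t are taken to be the intersections of Mv with the top and bottom edges; the angles are assumed to vary linearly along Mh and Mv; and unit direction vectors are averaged instead of slopes, a substitution that avoids infinite slopes on near-vertical edges. All function and parameter names are hypothetical.

```python
import math

def _unit(a, b):
    """Unit direction vector from point a to point b."""
    n = math.hypot(b[0] - a[0], b[1] - a[1])
    return ((b[0] - a[0]) / n, (b[1] - a[1]) / n)

def _intersect(p, d, a, b):
    """Intersection of the line p + t*d with the line through a and b."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    rx, ry = a[0] - p[0], a[1] - p[1]
    det = -d[0] * ey + ex * d[1]          # zero if the lines are parallel
    t = (-rx * ey + ex * ry) / det
    return (p[0] + t * d[0], p[1] + t * d[1])

def _dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def interpolate_angles(I, tl, tr, bl, br,
                       theta_left, theta_right, phi_top, phi_bottom):
    """Longitude and latitude of image point I inside the quadrilateral
    with corners tl, tr, bl, br (top-left, top-right, bottom-left,
    bottom-right), given the angles on its four edges."""
    # Direction of Mh: average direction of the top and bottom edges
    d1, d2 = _unit(tl, tr), _unit(bl, br)
    dh = (d1[0] + d2[0], d1[1] + d2[1])
    # Direction of Mv: average direction of the left and right edges
    d3, d4 = _unit(tl, bl), _unit(tr, br)
    dv = (d3[0] + d4[0], d3[1] + d4[1])

    q = _intersect(I, dh, tl, bl)   # Mh with the left edge
    r = _intersect(I, dh, tr, br)   # Mh with the right edge
    s = _intersect(I, dv, tl, tr)   # Mv with the top edge
    t = _intersect(I, dv, bl, br)   # Mv with the bottom edge

    # Geometric ratios along Mh and Mv
    theta = theta_left + (theta_right - theta_left) * _dist(q, I) / _dist(q, r)
    phi = phi_top + (phi_bottom - phi_top) * _dist(s, I) / _dist(s, t)
    return theta, phi
```

For a square region with corners (0, 0), (10, 0), (0, 10), and (10, 10), longitudes 0° and 5° on the left and right edges, and latitudes 10° and 5° on the top and bottom edges, the point (2.5, 5) maps to a longitude of 1.25° and a latitude of 7.5°.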

(a) (b)

Figure 3.11 A method of finding image coordinates of tessellated points in the grabbed image. (a) A grabbed image with tessellated points. (b) The tessellated points marked by yellow points.

Figure 3.12 The points on the wall corresponding to the yellow points in Figure 3.11(b).

Figure 3.13 An illustration of the interpolation method, in which a region contains the point I in the ICS.

3.4 Vehicle Location Techniques Using Angular Mapping

Using the calibration by the nonlinear angular mapping method described in Section 3.3.2, we can get the longitude and latitude values of each pixel in the image. Since the camera is mounted on the arm of the vehicle, the directional angles of the camera are not always zero. When the pan angle of the camera is not zero, the direction of an object with respect to the camera differs from its direction with respect to the vehicle. To track the target object correctly, we have to transform the directional angles with respect to the camera into ones with respect to the vehicle. The longitude and latitude values obtained by mapping the image coordinates are not enough for this transformation; we need more data to resolve the ambiguity in distance estimation.

First, we have to calculate the distance between the object and the camera from the latitude value and the height of the camera above the ground; our method for this purpose is discussed in Section 3.4.1. The way we propose for computing the directional angles of the camera with respect to the vehicle is stated in Section 3.4.2.

3.4.1 2D to 3D Distance Transformation

As shown in Figure 3.14, we know the coordinates of an object in the ICS after taking an image of it. After the angular mapping, we have the longitude and latitude of each image point of the object in the SCS. Since we have the latitude value of the object and know the height of the camera, we can compute the distance between the camera and the object if we know the ground contact point of the object in the image. The algorithm is described in the following.

Algorithm 3.4 Transformation of 2D image point to 3D real world point.

Input: Camera height Hc, and the ground contact point P(u, v) in the ICS of an object.

Output: The distance Doc between the object and the camera.

Steps:

Step 1. Transform (u, v) of P into its (θ, φ), the longitude and the latitude in the SCS, by the proposed mapping method described in Section 3.3.2.

Step 2. Compute Doc by the following equation:

Since we know the distance between the vehicle and the object by applying the above algorithm, what remains is to turn the vehicle toward the object and move it forward. The way we propose to find the angle through which the vehicle has to turn is introduced in the next section.
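The idea behind Algorithm 3.4 can be illustrated with basic trigonometry. The sketch below is written under assumptions of our own: Doc is taken to be the horizontal ground distance, and both the latitude of the ground contact point and the camera tilt are measured as positive angles below the horizontal, so that the total depression angle is their sum. The function name `ground_distance` and this sign convention are hypothetical.

```python
import math

def ground_distance(camera_height, latitude_deg, tilt_deg=0.0):
    """Horizontal distance to a ground contact point seen at a given
    depression angle (assumed: latitude + tilt, both positive downward)."""
    depression_deg = latitude_deg + tilt_deg
    if not 0.0 < depression_deg < 90.0:
        raise ValueError("ground point must lie below the horizon")
    return camera_height / math.tan(math.radians(depression_deg))
```

For example, with a camera 100 cm above the ground and a ground contact point seen 45° below the horizontal, the sketch returns a distance of about 100 cm; the same result is obtained when the 45° is split into a 30° tilt and a 15° latitude.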

Figure 3.14 The distance between the object and the vehicle. (a) The camera has no tilt, i.e., φc = 0. (b) The camera has a tilt angle of φc.

3.4.2 Angle Transformation between Coordinate Systems

As shown in Figure 3.15, we know the point of the object in the image and can get the distance from the vehicle to the object according to the above algorithms. However, the rotation center of the camera is different from that of the vehicle. Therefore, we have to compute the angle θv, the angle through which the vehicle has to turn to aim at the object, in the PCS. The transformation from (u, v) in the ICS to θv in the PCS is described in the following algorithm.

Algorithm 3.5 The angular transformation from the ICS to the PCS.

Input: The distance between the rotation center of the camera and the vehicle, and the ground contact point (u, v) of the object in the ICS.

Output: The directional angle θv of the object in the PCS.

Steps:

Step 1. Transform (u, v) to (θ, φ), the longitude and latitude in the SCS, by the proposed mapping method in Section 3.3.2.

Step 2. Compute θv in the PCS by the following equation according to Figure 3.15:

Figure 3.15 The rotation angle of the vehicle to obtain the tracking of the object.

From the above algorithm and the one in the previous section, we can transform the position of the object in the image into the polar coordinate system, consisting of the distance and the directional angle, (Dic, θv). With this transformation, the location of an object in the real world becomes known, which helps us track target objects.
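The effect of the offset between the two rotation centers can be sketched as follows. This is an illustration under assumptions of our own: the rotation center of the camera is taken to lie on the forward axis of the vehicle, a distance `d_cv` ahead of the vehicle's rotation center, and `theta_deg` is the direction of the object measured at the camera relative to that forward axis, with any pan angle already included. The names `vehicle_turn_angle`, `d_oc`, and `d_cv` are hypothetical.

```python
import math

def vehicle_turn_angle(d_oc, theta_deg, d_cv):
    """Direction (degrees) and distance of the object as seen from the
    vehicle's rotation center, under the assumed on-axis camera offset."""
    theta = math.radians(theta_deg)
    forward = d_cv + d_oc * math.cos(theta)   # along the vehicle's heading
    lateral = d_oc * math.sin(theta)          # perpendicular to the heading
    theta_v = math.degrees(math.atan2(lateral, forward))
    d_ov = math.hypot(forward, lateral)
    return theta_v, d_ov
```

With this geometry, an object seen at 90° from the camera at a distance equal to the offset appears at 45° from the vehicle's rotation center, which shows why the camera angle cannot be used directly as the turning angle.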

Chapter 4

Human Detection by Image Analysis for Indoor Security Patrolling

4.1 Overview of Human Detection

Many kinds of features and sensors can be used to detect human beings. Since visual perception is the only sensing capability of the system proposed in this study, image analysis is a natural way to detect human beings. The face is an obvious characteristic of human beings. As a result, we propose a method to detect human faces by color and shape features in images. The method for face detection will be described in Section 4.3. Sometimes the limited camera resolution makes the acquired image unclear, and a large distance between a person and the camera can make it difficult to segment a clear face region out of an image of the person. To overcome this limitation, we propose a blockwise frame difference method to extract moving objects in the image and decide whether a moving object is similar to a human body. The motion detection method will be described in Section 4.4. Before the details of these techniques are described, we first give a brief introduction to the proposed process in Section 4.2.

4.2 Proposed Process of Human Detection

The proposed process of human detection has two major parts: human face detection and human body detection. The features we adopt to detect a human face are color and shape. The color of the face undoubtedly is just the skin color, and the skin color has been studied intensively in recent years. In this study, we adopt an elliptic skin model to determine if the color of a pixel is skin color or not.

After getting all the skin color regions in an image, we have to recognize which one is similar to the shape of a human face. As the contour of a human face is roughly elliptic in shape, we propose a method for matching each skin color region with an elliptic shape mask. On the other hand, to avoid erroneously recognizing an elliptic non-face region as a face from skin color regions, we make a double check by motion detection.

If nothing is detected by the face detection process, we consider that a person might still exist at a far distance. We then try to confirm this by detecting the existence of a human body using moving regions in the image, which can be obtained by an additional process of frame differencing. Conventional frame differencing does not work when a moving region appears against a changing background. We therefore propose a blockwise frame differencing technique to detect moving regions. After performing this technique, we can get moving regions in an image and detect any human body by applying a shape recognition technique to these moving regions.

The system will stop the human detection process and start a human tracking process as soon as a face is detected in an image. The process of human tracking will be described in the next chapter. The major steps of the proposed process of human detection are presented as follows.

Step 1. Capture an image.

Step 2. Apply region segmentation by skin color identification to extract skin regions, and motion detection by blockwise frame differencing to extract motion regions.

Step 3. Fit each extracted skin region with an ellipse to detect a possible human face.

Step 4. Detect human bodies by applying shape recognition to the extracted motion regions.

The proposed process of human detection is illustrated in Figure 4.1.

Figure 4.1 The proposed process of human detection.

4.3 Human Detection by Faces

In order to detect faces in images, we have to choose features that define a face. The features we use in this study are color and shape, as mentioned previously. A face can be roughly represented by an ellipse filled with skin color. Thus, we detect a human face in the image by searching for a skin-colored ellipse.

More specifically, recognizing a skin-colored ellipse in an image requires two tasks: defining skin color and conducting pattern recognition of ellipses. Given a captured image, we first segment out any skin regions and then fit shapes to the regions. If a skin color region is close to an ellipse in shape, we decide that a face is detected. In Section 4.3.1, we will introduce the proposed method for classifying skin color, and in Section 4.3.2, we will describe the proposed method for ellipse shape recognition.

4.3.1 Skin Region Segmentation Using Color Classification

Skin region segmentation is commonly used for face detection. The goal is to determine whether a color pixel is of a skin color or not. Before the classifier for skin color is defined, a color representation must be chosen, since this choice affects the complexity of the classifier.

In this section, we describe the color space and the classification algorithm proposed in this study.

4.3.1.1 YCbCr Color Space

Many common color models are used in the field of computer vision, for example, RGB, HSI, HSV, YCbCr, CMY, and CIE. Each color model has its own characteristics and is applicable to a specific set of applications. For skin region segmentation, classifiers for different color models have been proposed in many research works.

In this study, we choose YCbCr as the color space for detecting skin color in images. According to Chai and Bouzerdoum [20], the distribution of skin color in the YCbCr color space is compact, and the distributions of the skin colors of different human races are similar. As a result, transforming images from the RGB color space into the YCbCr color space can reduce the complexity of skin-color pixel classification.

In the YCbCr color space, Y represents the luminance, Cb represents the chrominance of blueness, and Cr represents the chrominance of redness. Y is coded from 16 to 235, where code 16 is black and 235 is white. Cb and Cr range from 16 to 240. RGB values can be transformed into the YCbCr color space by (4.1) below:

\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\begin{bmatrix}
65.481 & 128.553 & 24.966 \\
-37.797 & -74.203 & 112 \\
112 & -93.786 & -18.214
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{4.1}
\]

Given input RGB values within the range [0, 1], the output values will be within the range [16, 235] for Y and [16, 240] for Cb and Cr.
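The RGB-to-YCbCr transformation can be written as a short function. The sketch below uses the standard ITU-R BT.601 coefficients for RGB inputs in [0, 1]; the function name `rgb_to_ycbcr` is our own.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert RGB values in [0, 1] to YCbCr (ITU-R BT.601 ranges:
    Y in [16, 235], Cb and Cr in [16, 240])."""
    y = 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128.0 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128.0 + 112.0 * r - 93.786 * g - 18.214 * b
    return y, cb, cr
```

Black, (0, 0, 0), maps to approximately (16, 128, 128) and white, (1, 1, 1), to approximately (235, 128, 128), which matches the statement that the chrominance values can only be 128 at the darkest and brightest luminance.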

Figure 4.2 shows the YCbCr color model with the value of Y being 126. Because the transformation from the RGB space to the YCbCr space is linear and the RGB values are confined to a limited range, not every combination of Cb and Cr values corresponds to a valid color when the value of Y is 126. Some YCbCr color models with different Y values are shown in Figure 4.3. When Y is 16, which is the darkest luminance, Cb and Cr can only have the value of 128. Likewise, when Y is 235, which is the brightest luminance, Cb and Cr can also only have the value of 128. Figure 4.4 shows a 3D YCbCr color model [23][24].

Figure 4.2 YCbCr color model with Y = 126 (axes Cb and Cr, each ranging from 16 to 240).

Figure 4.3 YCbCr color models with different Y values.

Figure 4.4 3D YCbCr color model in [23][24].

4.3.1.2 Adopted Skin Color Model

Previous studies have found that pixels belonging to skin regions exhibit similar Cb and Cr values [20][21]. Chai and Ngan [21] used a fixed-range skin color map in the Cb-Cr plane for face segmentation, in which Cb ranges from 77 to 127 and Cr ranges from 133 to 173, so that the skin color region is a rectangle. However, when we observe the distribution of skin color in the Cb-Cr plane, it is found to be more similar to an oblique ellipse, as shown in Figure 4.5. In the study by Lee and Yoo [22], a new statistical color model for skin detection with elliptical boundaries was suggested. Thus, we define an oblique ellipse in the Cb-Cr plane as the skin color model in this study, and the parameters of the elliptic skin model are adjusted by experiments. The center of the elliptic model is taken to be 103 for Cb and 158 for Cr. The angle of rotation is set to 145 degrees, and the lengths of the major and minor axes are set to 25.39 and 14.03, respectively.
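A pixel can be tested against the elliptic skin model by rotating its (Cb, Cr) offset from the center into the axes of the ellipse. The sketch below makes two assumptions that the text does not state: the rotation angle is measured counterclockwise from the positive Cb axis, and the values 25.39 and 14.03 are treated as semi-axis lengths. The function name `is_skin` is hypothetical.

```python
import math

CB0, CR0 = 103.0, 158.0      # center of the elliptic model
ANGLE_DEG = 145.0            # rotation angle (assumed from the +Cb axis)
SEMI_MAJOR = 25.39           # assumed to be a semi-axis length
SEMI_MINOR = 14.03           # assumed to be a semi-axis length

def is_skin(cb, cr):
    """Return True if (cb, cr) lies inside the oblique ellipse."""
    a = math.radians(ANGLE_DEG)
    dx, dy = cb - CB0, cr - CR0
    # Coordinates of the offset along the major and minor axes
    u = dx * math.cos(a) + dy * math.sin(a)
    v = -dx * math.sin(a) + dy * math.cos(a)
    return (u / SEMI_MAJOR) ** 2 + (v / SEMI_MINOR) ** 2 <= 1.0
```

Under these assumptions the major axis runs from low Cb and high Cr toward high Cb and low Cr, so a point such as (86.62, 169.47), about 20 units from the center along that axis, is classified as skin, while a point 20 units away along the minor axis is not.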

Figure 4.5 Distribution of conditional probability density function of skin color in Cb-Cr plane [20].

Figure 4.6 The elliptic skin model used in this study.

4.3.2 Detection of Human Face by Ellipse Shape Fitting

Since the shape of a human face is close to an ellipse, we propose in this study a pattern recognition method for ellipse shape detection to distinguish face regions from other skin regions segmented out by the skin color classification described in Section 4.3.1. More specifically, after segmenting the skin color regions out of an image, we determine whether each region is similar to an ellipse. If so, we take it to be a human face.

The method to decide whether a region is elliptic in shape is described in the following algorithm.

Algorithm 4.1 Face detection by pattern recognition of ellipses.

Input: A skin region set R = {R1, R2, …, Rn}, as shown in Figure 4.7(b). The width of a region Ri in R is denoted by wi, and the height of Ri by hi. The boundaries of the region Ri are denoted by lefti, righti, topi, and bottomi. The number of pixels in region Ri is denoted by pi.

Output: A face region Rface.

Steps:

where c is a pre-selected constant.

Step 2. Make a rectangular mask rectanglei for Ri with width wi and height 1.2×wi, and an elliptic mask ellipsei for Ri with its major axis length being 1.2×wi and its minor axis length being wi, as shown in Figure 4.7(c).

Step 3. To fit each region in R' with an ellipse shape, compute the number, ini, of the pixels of region Ri within ellipsei. Additionally, compute the number, outi, of the pixels of region Ri within rectanglei and outside ellipsei. That is, compute

\[
\mathit{in}_i = |R_i \cap \mathit{ellipse}_i|, \quad \forall R_i \in R',
\]
\[
\mathit{out}_i = |R_i \cap (\mathit{rectangle}_i - \mathit{ellipse}_i)|, \quad \forall R_i \in R'.
\]

… 1] which is defined in advance.

By the ellipse shape fitting as described above, we can detect the face region in images, as shown by the example in Figure 4.7.
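The pixel counts of Step 3 can be sketched as follows. The sketch assumes that the rectangular mask is anchored at the top-left corner of the region's bounding box and that the elliptic mask is inscribed in it; a pixel is assigned to a mask according to the position of its center. The function name `ellipse_fit_counts` and the representation of a region as a set of (x, y) pixel coordinates are our own choices.

```python
def ellipse_fit_counts(region, left, top, width):
    """Count region pixels inside the elliptic mask (in_count) and
    inside the rectangular mask but outside the ellipse (out_count)."""
    height = 1.2 * width
    cx, cy = left + width / 2.0, top + height / 2.0   # mask center
    a, b = width / 2.0, height / 2.0                  # semi-axes
    in_count = out_count = 0
    for (x, y) in region:
        px, py = x + 0.5, y + 0.5                     # pixel center
        if not (left <= px < left + width and top <= py < top + height):
            continue                                  # outside the rectangle
        if ((px - cx) / a) ** 2 + ((py - cy) / b) ** 2 <= 1.0:
            in_count += 1
        else:
            out_count += 1
    return in_count, out_count

# A region that fills its whole 10 x 12 rectangle
full = {(x, y) for x in range(10) for y in range(12)}
in_full, out_full = ellipse_fit_counts(full, 0, 0, 10)
```

A region that resembles an ellipse yields a large in-count and a small out-count, whereas a region that fills the corners of the rectangle, as in this example, yields a noticeably larger out-count.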

Figure 4.7 The detection of a human face by ellipse shape fitting. (a) Input image. (b) Skin segmentation. (c) Rectangular and elliptic masks. (d) Detected face region.

4.4 Human Body Detection by Motion Analysis

Two kinds of misjudgments can occur in human detection using the face detection method proposed in the last section. One is recognizing a face-like object in the image as a human face, and the other is detecting nothing when a person does exist at a distance from the vehicle. To avoid the first kind of mistake, we need an additional feature to confirm whether the detected face region is a human face or not. To avoid the second kind of mistake, we have to detect humans by features other than the face alone. In this study, we use techniques of motion detection and human body recognition to reduce the effects of these drawbacks and so increase the reliability of the proposed system.

In a fixed camera system, the moving parts of the scene can be detected by frame differencing against fixed backgrounds learned in advance. Unfortunately, this method does not work in our system because the background in the image is always changing, as the camera is carried by the moving vehicle and robot arm. We thus propose in this study a method
