There are many kinds of features and sensors for detecting human beings. Since vision is the only sensing capability of the system proposed in this study, image analysis is a natural way to detect human beings. The face is an obvious characteristic of a human being, so we propose a method to detect human faces in images using color and shape features. The face detection method is described in Section 4.3. Sometimes the limited camera resolution makes the acquired image unclear, and a long distance between a person and the camera can make it difficult to segment a clear face region out of the image of the person. To compensate for this limitation, we propose a blockwise frame differencing method that extracts moving objects in the image and decides whether a moving object is similar to a human body. The motion detection method is presented in Section 4.4. Before these techniques are described in detail, we give a brief introduction to the proposed process in Section 4.2.
4.2 Proposed Process of Human Detection
The proposed process of human detection has two major parts: human face detection and human body detection. The features we adopt to detect a human face are color and shape. The color of a face is skin color, which has been studied intensively in recent years. In this study, we adopt an elliptic skin model to determine whether the color of a pixel is skin color.
After obtaining all the skin color regions in an image, we have to recognize which of them resembles the shape of a human face. As the contour of a human face is roughly elliptic, we propose a method that matches each skin color region against an elliptic shape mask. In addition, to avoid erroneously recognizing an elliptic non-face skin color region as a face, we double-check the result by motion detection.
If no face is detected by the face detection process, we assume that a person might exist at a long distance. We then try to confirm this by detecting a human body using moving regions in the image, which are obtained by an additional frame differencing process. Conventional frame differencing fails to detect moving regions when the background itself is changing, so we propose a blockwise frame differencing technique to detect moving regions. After applying this technique, we obtain the moving regions in an image and detect any human body by applying a shape recognition technique to these regions.
The system stops the human detection process and starts a human tracking process as soon as a face is detected in an image. The human tracking process is described in the next chapter. The major steps of the proposed human detection process are as follows.
Step 1. Capture an image.
Step 2. Apply region segmentation by skin color identification and motion detection by blockwise frame differencing to extract motion regions.
Step 3. Fit each extracted skin region with an ellipse to detect a possible human face.
Step 4. Detect a human body by applying shape recognition to the extracted motion regions.
The proposed process of human detection is illustrated in Figure 4.1.
Figure 4.1 The proposed process of human detection.
4.3 Human Detection by Faces
In order to detect faces in images, we have to choose features that define a face. As mentioned previously, the features used in this study are color and shape. The rough outline of a face can be represented by an ellipse of skin color. Thus, we detect a human face in an image by searching for a skin-colored ellipse.
More specifically, recognizing a skin-colored ellipse in an image involves two tasks: defining skin color and recognizing elliptic shapes. Given a captured image, we first segment out all skin regions and then fit shapes to these regions. If a skin color region is close to an ellipse in shape, a face is considered detected. In Section 4.3.1, we introduce the proposed skin color classification method, and in Section 4.3.2, we describe the proposed method for ellipse shape recognition.
4.3.1 Skin Region Segmentation Using Color Classification
Skin region segmentation is commonly used for face detection; its goal is to determine whether a color pixel is of skin color. Before defining a skin color classifier, we must choose a color representation, and this choice is important because it affects the complexity of the classifier.
In this section, we describe the color space and the classification algorithm proposed in this study.
4.3.1.1 YCbCr Color Space
Many common color models are used in the field of computer vision, for example, RGB, HSI, HSV, YCbCr, CMY, and CIE. Each color model has its own characteristics and suits a specific set of applications. For skin region segmentation, classifiers for different color models have been proposed in many research works.
In this study, we choose YCbCr as the color space for detecting skin color in images. According to Chai and Bouzerdoum [20], skin colors are compactly clustered in the YCbCr color space, and the skin color distributions of different human races are similar. As a result, transforming images from the RGB color space into the YCbCr color space reduces the complexity of skin-color pixel classification.
In the YCbCr color space, Y represents the luminance, Cb the blue-difference chrominance, and Cr the red-difference chrominance. Y ranges from 16 to 235, where 16 is black and 235 is white, while Cb and Cr range from 16 to 240. RGB values can be transformed into the YCbCr color space by (4.1) below:
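$$
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix}
=
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
+
\begin{bmatrix}
 65.481 & 128.553 & 24.966 \\
-37.797 & -74.203 & 112.000 \\
112.000 & -93.786 & -18.214
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{4.1}
$$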
Given input RGB values in the range [0, 1], the output values lie in the range [16, 235] for Y and [16, 240] for Cb and Cr.
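As a quick check of (4.1), the following Python sketch converts a single RGB pixel with components in [0, 1] to YCbCr. It is a minimal illustration, not the implementation used in this study, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel (components in [0, 1]) to YCbCr using (4.1)."""
    # Offsets (16, 128, 128) plus the 3x3 coefficient matrix of (4.1).
    y = 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128.0 - 37.797 * r - 74.203 * g + 112.000 * b
    cr = 128.0 + 112.000 * r - 93.786 * g - 18.214 * b
    return y, cb, cr
```

Because each chrominance row of the matrix sums to zero, any gray pixel (R = G = B) gets Cb = Cr = 128; black maps to (16, 128, 128) and white to approximately (235, 128, 128).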
Figure 4.2 shows the YCbCr color model with Y = 126. Because the RGB cube maps onto only part of the full YCbCr value range, some combinations of Cb and Cr do not correspond to valid colors when Y is 126. YCbCr color models with different Y values are shown in Figure 4.3. When Y is 16, the darkest luminance, Cb and Cr can only take the value 128; likewise, when Y is 235, the brightest luminance, Cb and Cr can also only take the value 128. Figure 4.4 shows a 3D YCbCr color model [23][24].
Figure 4.2 YCbCr color model with Y = 126 (axes Cb and Cr, each ranging from 16 to 240).
Figure 4.3 YCbCr color models with different Y values.
Figure 4.4 3D YCbCr color model in [23][24].
4.3.1.2 Adopted Skin Color Model
Previous studies have found that pixels belonging to skin regions exhibit similar Cb and Cr values [20][21]. Chai and Ngan [21] used a fixed-range skin color map in the Cb-Cr plane for face segmentation, with Cb between 77 and 127 and Cr between 133 and 173, so that the skin color region is a rectangle. However, when we observe the distribution of skin color in the Cb-Cr plane, it more closely resembles an oblique ellipse, as shown in Figure 4.5. Lee and Yoo [22] likewise suggested a statistical color model for skin detection with elliptical boundaries. Thus, we define an oblique ellipse in the Cb-Cr plane as the skin color model in this study and adjust its parameters by experiments. The center of the elliptic model is at Cb = 103 and Cr = 158, the angle of rotation is 145 degrees, and the lengths of the major and minor axes are 25.39 and 14.03, respectively.
Figure 4.5 Distribution of conditional probability density function of skin color in Cb-Cr plane [20].
Figure 4.6 The elliptic skin model used in this study.
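A minimal sketch of the elliptic skin test, using the center, rotation angle, and axis values stated above. Two details are not settled by the text and are assumptions here: 25.39 and 14.03 are treated as semi-axis lengths, and the rotation angle is measured counterclockwise from the Cb axis. The name `is_skin` is hypothetical.

```python
import math

# Elliptic skin model parameters from Section 4.3.1.2.
CB0, CR0 = 103.0, 158.0        # ellipse center in the Cb-Cr plane
THETA = math.radians(145.0)    # rotation of the major axis from the Cb axis (assumed CCW)
A, B = 25.39, 14.03            # ASSUMPTION: semi-axis lengths (major, minor)


def is_skin(cb: float, cr: float) -> bool:
    """Return True if (cb, cr) lies inside or on the oblique skin ellipse."""
    dcb, dcr = cb - CB0, cr - CR0
    # Rotate the offset by -THETA so the ellipse axes align with x and y.
    x = math.cos(THETA) * dcb + math.sin(THETA) * dcr
    y = -math.sin(THETA) * dcb + math.cos(THETA) * dcr
    return (x / A) ** 2 + (y / B) ** 2 <= 1.0
```

A neutral gray pixel (Cb = Cr = 128) falls outside this ellipse, so it is not classified as skin.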
4.3.2 Detection of Human Face by Ellipse Shape Fitting
Since the shape of a human face is close to an ellipse, we propose in this study a pattern recognition method for ellipse shape detection to distinguish face regions from other skin regions segmented out by the skin color classification described in Section 4.3.1. More specifically, after segmenting the skin color regions out of an image, we determine whether each region is similar to an ellipse; if so, we take it to be a human face.
The method for deciding whether a region is elliptic in shape is described by the following algorithm.
Algorithm 4.1 Face detection by pattern recognition of ellipses.
Input: A skin region set R = {R_1, R_2, ..., R_n}, as shown in Figure 4.7(b). The width of a region R_i in R is denoted by w_i and its height by h_i. The boundaries of R_i are denoted by left_i, right_i, top_i, and bottom_i, and the number of pixels in R_i is denoted by p_i.
Output: A face region R_face.
Steps:
where c is a pre-selected constant.
Step 2. Make a rectangular mask rectangle_i for R_i with width w_i and height 1.2×w_i, and an elliptic mask ellipse_i for R_i with horizontal axis length w_i and vertical axis length 1.2×w_i, as shown in Figure 4.7(c).
Step 3. To fit each region in R' with an ellipse, compute the number in_i of pixels of region R_i that lie within ellipse_i, and the number out_i of pixels of R_i that lie within rectangle_i but outside ellipse_i. That is, compute

$$
\mathit{in}_i = \left| R_i \cap \mathit{ellipse}_i \right|, \qquad
\mathit{out}_i = \left| R_i \cap \left( \mathit{rectangle}_i \setminus \mathit{ellipse}_i \right) \right|, \qquad
\forall R_i \in R'.
$$

1] which is defined in advance.
By the ellipse shape fitting as described above, we can detect the face region in images, as shown by the example in Figure 4.7.
Figure 4.7 The detection of human face by ellipse shape fitting. (a) Input image. (b) Skin segmentation. (c) Rectangular and elliptic masks. (d) Detected face region.
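Steps 2 and 3 of Algorithm 4.1 can be sketched as follows. The sketch assumes that a region is given as a set of (row, column) pixel coordinates, that both masks are anchored at the region's top-left corner (left_i, top_i), and that a pixel counts as inside a mask when its center is; none of these details is stated in the text. Because the filtering of Step 1 and the final decision are not recoverable here, the sketch only computes in_i and out_i. The name `ellipse_fit_counts` is hypothetical.

```python
def ellipse_fit_counts(region: set[tuple[int, int]]) -> tuple[int, int]:
    """Return (in_i, out_i) for a skin region given as (row, col) pixels."""
    top = min(r for r, _ in region)
    left = min(c for _, c in region)
    w = max(c for _, c in region) - left + 1   # region width w_i
    h = 1.2 * w                                # mask height 1.2 * w_i
    # ASSUMPTION: masks anchored at (left_i, top_i); ellipse inscribed in rectangle.
    cx, cy = left + w / 2.0, top + h / 2.0     # ellipse center
    a, b = w / 2.0, h / 2.0                    # horizontal / vertical semi-axes
    n_in = n_out = 0
    for r, c in region:
        px, py = c + 0.5, r + 0.5              # pixel center
        if py - top > h:
            continue                           # below rectangle_i: not counted
        if ((px - cx) / a) ** 2 + ((py - cy) / b) ** 2 <= 1.0:
            n_in += 1                          # inside ellipse_i
        else:
            n_out += 1                         # inside rectangle_i, outside ellipse_i
    return n_in, n_out
```

A region that fills the elliptic mask exactly yields out_i = 0, whereas a filled rectangle with the same bounding box yields a clearly larger out_i; Algorithm 4.1 bases its decision on these counts and a value defined in advance.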
4.4 Human Body Detection by Motion Analysis
Two kinds of misjudgment can occur when humans are detected by the face detection method described in the last section. One is recognizing a face-like object in the image as a human face, and the other is detecting nothing when a person does exist at a distance from the vehicle. To avoid the first kind of mistake, we need an additional feature to confirm whether the detected face region is really a human face. To avoid the second kind, we have to detect humans by features other than the face. In this study, we use motion detection and human body recognition to reduce the effects of these drawbacks and thus increase the reliability of the proposed system.
In a fixed camera system, the moving parts of a scene can be detected by frame differencing against a fixed background learned in advance. Unfortunately, this method does not work in our system because the background in the image changes constantly as the camera moves with the vehicle and the robot arm. We thus propose a frame differencing method suited to our vehicle system, which is presented in Section 4.4.1.
In Section 4.4.2, we will describe the method of human body recognition by motion detection.
4.4.1 Motion Detection by Shift Tolerance Blockwise Frame Differencing
First, we define some terms used in the proposed method.
(1) Current image: The image captured from the camera at the current moment or equivalently in the current navigation cycle.
(2) Reference image: The image captured from the camera at the previous moment.
(3) Block: A square region of pixels, which is the basic processing unit of the image.
(4) Searching window: A square region of pixels larger than a block.
The basic idea of blockwise frame differencing is to subtract the current image from the reference image block by block. If the difference between a target block in the current image and the candidate block at the same position in the reference image is below a threshold, we consider that no motion has taken place, i.e., the target block is stationary. Otherwise, we search for the block that best matches the target block within the searching window in the reference image. If the difference between the target block and this best match block is below the threshold, we regard the target block as stationary; otherwise, it is moving. Repeating these steps for each block in the current image yields all the moving parts of the current image. The details are given in the following algorithm.
Algorithm 4.2 Shift tolerance blockwise frame differencing.
Input: a current image I_c, a reference image I_r, a block size of s×s, and a searching range w, which makes the size of the searching window (2w+s)×(2w+s).
Output: a difference image I_d.
Steps:
Step 2. Define the searching window of size (2w+s)×(2w+s), as shown in Figure 4.9. Subtract the target block b_ij from the candidate block at the same position in the reference image. If the difference is below the threshold t, regard the target block b_ij as stationary and go to Step 5. Otherwise, go to Step 3.
Step 3. Find the best match block of the target block b_ij within the searching window in the reference image by subtracting b_ij from each of the blocks within the searching window, as shown in Figure 4.10.
Step 4. If the difference between b_ij and the best match block is below the threshold t, regard b_ij as stationary; otherwise, regard it as moving.
Step 5. Repeat Steps 2 through 4 for each block in B_c to decide whether it is stationary or moving.
Step 6. Obtain the complete frame difference image I_d by filling the moving blocks with white and the stationary blocks with black, as shown in Figure 4.11.
Figure 4.8 The image is segmented into blocks.
Figure 4.9 The searching window: a target block b_ij of size s×s in the current image I_c and the corresponding (2w+s)×(2w+s) searching window in the reference image I_r, extending w pixels beyond the block on each side.
Figure 4.10 The blocks within the searching window in I_r.
Figure 4.11 An example of blockwise frame difference images.
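Algorithm 4.2 can be sketched in Python for grayscale images stored as lists of rows. The text does not specify the block difference measure, so the sketch assumes the sum of absolute differences (SAD); it also assumes the image dimensions are multiples of s and clips the searching window at the image border. The function names are hypothetical.

```python
def block_sad(cur, ref, y, x, ry, rx, s):
    """SAD between the s x s block of cur at (y, x) and of ref at (ry, rx)."""
    return sum(abs(cur[y + i][x + j] - ref[ry + i][rx + j])
               for i in range(s) for j in range(s))


def blockwise_frame_difference(cur, ref, s, w, t):
    """Shift-tolerant blockwise frame differencing (sketch of Algorithm 4.2).

    cur, ref: same-size grayscale images (lists of rows).
    s: block size, w: searching range, t: difference threshold.
    Returns I_d with 255 (white) for moving blocks and 0 (black) otherwise.
    """
    height, width = len(cur), len(cur[0])
    diff = [[0] * width for _ in range(height)]
    for y in range(0, height - s + 1, s):
        for x in range(0, width - s + 1, s):
            # Step 2: compare with the co-located block in the reference image.
            if block_sad(cur, ref, y, x, y, x, s) < t:
                continue  # stationary
            # Step 3: best match within +/- w pixels, clipped to the image.
            best = min(
                block_sad(cur, ref, y, x, ry, rx, s)
                for ry in range(max(0, y - w), min(height - s, y + w) + 1)
                for rx in range(max(0, x - w), min(width - s, x + w) + 1)
            )
            # Step 4: still too different after shifting, so the block is moving.
            if best >= t:
                for i in range(s):
                    diff[y + i][x:x + s] = [255] * s  # Step 6: paint it white
    return diff
```

With w = 0 the sketch reduces to plain blockwise differencing; a positive w tolerates camera shifts of up to w pixels, which is the purpose of the searching window.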
4.4.2 Human Body Detection
Using the blockwise frame differencing method described in the last section, we obtain a difference image from every two consecutive image frames. The difference image shows the moving regions at the current moment. Since the moving regions are computed blockwise instead of pixelwise, they do not have the complete shape of a human body. Moreover, the human body is irregular in shape, so it is not feasible to detect a human body by fitting a detected region with a simple, well-known shape.
Furthermore, when the vehicle is moving in an open space and the system detects some moving regions, we can assume that something is moving in the field of view of the camera. Since the shape of a human body is complicated and hard to define, we use only the ratio of the width to the height of a moving region as the feature for human body detection in this study. For a normal person, this feature corresponds to the ratio of the shoulder width to the body height, which is around 1/4, as shown by the Vitruvian Man drawn by Leonardo da Vinci.
However, if two consecutive images both include the same person but at different positions, the difference of the two images cannot form the complete shape of a human body. If the positions in the two images are close, the difference image might show a moving region that is too “thin,” and if the positions are far apart, it might show a moving region that is too “thick.” Thus, we widen the range of the ratio to between 1/8 and 1 to account for the overlapping of body regions in consecutive images. The following algorithm presents the human body detection method based on the body proportion feature discussed above.
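The proportion test can be sketched as a simple filter over the bounding boxes of moving regions. It covers only the width-to-height criterion: the constant c of Step 1 and the selection rule of Step 2 in Algorithm 4.3 are not reproduced, and treating the range [1/8, 1] as inclusive is an assumption. Boxes are hypothetically given as (left, top, right, bottom) with inclusive pixel bounds, and the function names are hypothetical.

```python
def is_body_proportion(width: int, height: int,
                       low: float = 1 / 8, high: float = 1.0) -> bool:
    """True if width/height lies in the widened body range [low, high]."""
    if width <= 0 or height <= 0:
        return False
    return low <= width / height <= high


def body_candidates(boxes):
    """Keep moving-region boxes (left, top, right, bottom), inclusive bounds,
    whose width-to-height ratio is body-like."""
    return [b for b in boxes
            if is_body_proportion(b[2] - b[0] + 1, b[3] - b[1] + 1)]
```

A 40×160 region (ratio 1/4, the Vitruvian proportion) passes, while a very thin 10×100 region or a wide 120×100 region is rejected.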
Algorithm 4.3 Human body detection by moving region proportion.
Input: A moving region set R = {R_1, R_2, ..., R_n}, as shown in Figure 4.7(b). The width of a region R_i in R is denoted by w_i and its height by h_i.
Output: A human body region R_body.
Steps:
where c is a pre-selected constant.
Step 2. If R' = ∅, no human body is detected in the moving regions. Else, if
4.4.3 Some Experimental Results
In this section, we show some experimental results of human body detection in Figure 4.12. Figure 4.12(a) shows a case in which neither of the two consecutive images contains a person, and the difference image shows only some spotty moving regions. In Figure 4.12(b), the difference image shows the complete shape of a person. However, if two consecutive images both include the same person at different positions, the difference of the two images cannot form the complete shape of a human body. If the positions in the two images are close, the difference image might show only the contours of the moving person, as shown in Figure 4.12(c); if the positions are far apart, the difference image might show the union of the person's regions in the two images, as shown in Figure 4.12(d).
Figure 4.12 The order of the images: the current image, the reference image, and the difference image obtained by the proposed blockwise frame differencing. (a) The person regions are close. (b) The person regions are far apart. (c) The person exists in only one of the images. (d) No person exists in either image.
Chapter 5
Human Tracking in Indoor Environment
5.1 Basic Idea of Human Tracking
After a human face is detected by the human detection method described in the previous chapter, the system extracts the color of the person’s clothes and saves the image region of the clothes on the PC. The vehicle can then track the target person by continuously detecting the clothes. In this chapter, we describe the entire process of human tracking in detail. In Section 5.2, we first present the human tracking process step by step: the vehicle navigates according to the position of the target person and performs face detection at the same time. The system also estimates the distance to the target person from the detected face.
In Section 5.3, we propose a method for extracting the colors of human clothes. Based on the position of the previously detected face, we compute the position of the human body, extract the color of the clothes by region growing, and save the image region of the clothes. Then, in Section 5.4, we describe the details of human tracking by clothes, where the intersection of the clothes images is used to compute the position of the target person in the image. Section 5.5 presents two applications of the proposed human tracking method: stranger tracking and person following.
5.2 Proposed Process of Human Tracking
In the human tracking process, the vehicle tracks the target person by consecutively detecting the person’s clothes. In the previous chapter, we described how to estimate the location of the target person’s face in the image; the system then extracts the clothes region of the person to facilitate tracking. The idea behind the tracking method is to keep the target person at the center of the image, which means that the head of the vehicle always aims at the target person as the vehicle moves forward. After turning its head toward the person, the vehicle moves toward the person by a constant distance. Figure 5.1 shows a cycle of the human tracking process.
Figure 5.1 A cycle of the human tracking process.
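The centering idea can be illustrated by converting the target's horizontal offset from the image center into a turning angle for the vehicle head. This sketch assumes a pinhole camera with a known horizontal field of view, which the text does not give; the function name and the 60-degree field of view in the example are hypothetical.

```python
import math


def heading_correction_deg(target_x: float, image_width: int,
                           hfov_deg: float) -> float:
    """Angle in degrees to turn so a target at pixel column target_x moves to
    the image center. Positive: target right of center; negative: left."""
    # ASSUMPTION: pinhole model; focal length in pixels from the field of view.
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    offset_px = target_x - image_width / 2.0   # signed offset from center
    return math.degrees(math.atan2(offset_px, focal_px))
```

For a 640-pixel-wide image and an assumed 60-degree field of view, a target at the right border gives a turn of about 30 degrees, and a centered target gives 0, meaning the vehicle can move straight ahead.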
The proposed process of human tracking is described in the following