1.1 Motivation
Multimedia applications are now widely used in education, security, entertainment, and medicine, and they provide added value for their users.
Image segmentation techniques, which subdivide an image or video frame into its constituent regions or objects, are used in many multimedia applications and service robots and have become an important research topic in recent years. The objects of interest can be extracted from an image as the foreground and then used in industrial inspection, autonomous target acquisition, medical image processing, traffic flow monitoring, human detection, depth estimation, and so on.
Face detection techniques have developed rapidly and are also an important topic in computer vision. A robust face detection system can locate each human face in an image and is widely applied in face recognition, face tracking, automatic surveillance systems, human-machine interfaces, and home care systems.
The human visual system has a remarkable ability to recognize objects from a single view. For computer vision, however, extracting objects from a scene without any background information is a very challenging task. Therefore, color features, texture features, shape matching, and other information acquired by analyzing the original image are used to achieve more accurate segmentation. Since a 2D image display does not satisfy the user's requirements, we reconstruct a stereo image from a single camera view by estimating the depths of the extracted subjects.
In summary, this motivates us to design a robust method that uses feature-based, shape-based, and spatial information of an image to recognize different segmented regions as the same object and to confirm object boundaries. Face detection techniques are used to locate human faces by extracting skin-color regions and then fitting them with ellipse templates. We also propose an improved automatic seeded region growing algorithm to simplify the pre-processing procedure. Furthermore, human bodies are extracted by using semantic rules. After the human extraction procedure, the depths are estimated by analyzing the image information and acquiring the vertical y-coordinate values in the image.
The system flowchart is illustrated in Fig. 1.1. Our system can be separated into three components. The first component is image segmentation. Edge detection is performed first, and the detected edges are used for elliptical template matching and automatic seed selection. The color image is then transformed from the RGB to the YCbCr color space and used for skin-color extraction and the region growing procedure.
The second component is human recognition. The human of interest is extracted by using semantic rules, and adjacent regions with high similarity are merged to complete the human recognition. The third component is depth estimation. The Hough transform is used to detect the image lines that correspond to parallel lines in the world; their intersections give the vanishing points and vanishing lines. Then the vertical y-coordinate values in the image are detected, and the depths are estimated based on the vanishing line or on the depth look-up table of the camera.
[Figure: block diagram. Stages: Image segmentation, Human recognition, Depth estimation. Blocks: Input image, Edge detection, Skin-color extraction, Automatic seed selection, Ellipse fitting method, Seeded region growing, Extract human face, Extract human body, Complete human detection, Hough transform, Vanishing lines and points detection, Depth map generation, Complete depth estimation.]
Fig. 1.1 The block diagram of the human recognition system.
1.2 Face Detection
The purpose of face detection is to localize and extract the face region from the background. Most face detection techniques can be classified as feature-based or image-based approaches [1]. Feature-based approaches can be divided into low-level analysis, feature analysis, and active shape models. Image-based approaches can be divided into linear subspace methods, neural networks, and statistical approaches. A more detailed classification is shown in Fig. 1.2.
Color-based methods belong to low-level analysis, and a number of skin-color models have been constructed. A color image is usually specified in RGB components. The RGB model is suitable for color display but not for color analysis, because the R, G, and B components are highly correlated. In addition, distance in the RGB color space does not represent perceptual difference on a uniform scale. In image processing and analysis, these components are therefore often transformed into other color spaces such as normalized RGB, HSV, CIE Lab, HSI, YIQ, and YCbCr. In this thesis, we detect skin regions in the YCbCr color space for three reasons. First, the chrominance information can be used effectively for modeling skin color in the YCbCr color space. Second, the chrominance components (Cb and Cr) are explicitly separated from the luminance component (Y) in the YCbCr model. Third, the YCbCr space is used in most image and video coding standards.
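The chrominance-based skin test described above can be sketched as follows. This is only an illustration under stated assumptions: the conversion uses the full-range BT.601 coefficients found in JFIF, and the Cb and Cr bounds are illustrative values chosen for the example, not the skin-color model of this thesis. The names rgb_to_ycbcr, is_skin, and skin_mask are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 coefficients as used in JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed, illustrative chrominance bounds; not taken from the thesis.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(r, g, b):
    """Classify one pixel using only Cb and Cr, ignoring the luminance Y."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_mask(image):
    """image: 2D list of (r, g, b) tuples. Returns a 2D list of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

Because the decision depends only on Cb and Cr, the test is largely insensitive to brightness changes, which is the practical benefit of separating chrominance from luminance.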
In most cases, the shape of the human face is similar to an ellipse. Ellipse fitting is a feature analysis method and is employed to extract the face region.
We first detect the edges of the original image and then use the edge detection results to perform elliptical template matching. The parameters that describe the ellipse, namely its center and its major and minor radii, vary widely across human faces. Searching the whole image while covering these shape variations would make the computational burden very high. Instead, we use the elliptical model for face search proposed by Tang and Chen [2], which is much more computationally efficient. Furthermore, we improve on it by performing the elliptical template matching only on the candidate regions instead of the whole image.
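To make the idea of matching an elliptical template against an edge map concrete, the following sketch scores a candidate ellipse by the fraction of sampled perimeter points that fall on edge pixels. It is a simplified illustration, not the model of Tang and Chen [2]: the ellipse is assumed to be axis-aligned, and the scoring rule and the names ellipse_score and best_ellipse are hypothetical.

```python
import math


def ellipse_score(edges, cx, cy, a, b, samples=72):
    """Fraction of sampled ellipse perimeter points that lie on edge pixels.

    edges: 2D list of 0/1 values indexed as edges[y][x].
    (cx, cy): ellipse center; a, b: horizontal and vertical semi-axes.
    """
    height, width = len(edges), len(edges[0])
    hits = 0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x = int(round(cx + a * math.cos(t)))
        y = int(round(cy + b * math.sin(t)))
        if 0 <= x < width and 0 <= y < height and edges[y][x]:
            hits += 1
    return hits / samples


def best_ellipse(edges, candidates):
    """Return the (cx, cy, a, b) candidate with the highest score."""
    return max(candidates, key=lambda c: ellipse_score(edges, *c))
```

Restricting the candidate list to centers and radii that are plausible for a detected skin region, rather than enumerating every position in the image, is what keeps the search cheap.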
Fig. 1.2 Face detection divided into two approaches [1].
1.3 A Hybrid Method of Seeded Region Growing
Many image segmentation techniques have been proposed in the past few years. Most of them can be classified into four main categories: thresholding, edge-based, region-based, and hybrid techniques. Thresholding methods are based on the assumption that adjacent pixels whose values lie within a certain range belong to the same class. Edge-based methods assume that pixel values change abruptly at the boundary between two different regions. Region-based methods assume that adjacent pixels within the same region have similar visual features such as intensity, color, or texture. Hybrid methods integrate the results of boundary detection and region growing to achieve better segmentation.
Hybrid methods combine boundary detection with region growing to overcome the discontinuous edges that edge-based methods alone produce. Seeded region growing (SRG) is a hybrid method that is widely used for segmenting object regions or boundaries from images. It was first proposed by Adams and Bischof [3]: it starts with assigned seeds and grows regions by merging each pixel into its nearest neighboring seed region. The initial seeds are selected manually. Later, Fan et al. [4] presented a color image segmentation algorithm that automates the initial seed selection by integrating color-edge extraction and seeded region growing in the YUV color space. Edge detection is performed on the Y, U, and V components individually, and the edges are determined by the fast entropic thresholding technique. The edge results for the three color components are then integrated to obtain color edges, and the centroids between adjacent edge regions are taken as the initial seeds. This approach is too sensitive and may generate an excessive number of seeds. Shih et al. [5] proposed an automatic seeded region growing algorithm in which an initial seed is generated if a pixel has high similarity to its neighbors. Its disadvantage is that the computation is considerably more complicated.
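The basic region growing step can be sketched as follows, in the spirit of Adams and Bischof [3]. The sketch makes several simplifying assumptions that are not taken from the cited papers: the image is grayscale, neighbors are 4-connected, similarity is the absolute difference between a pixel and the current region mean, and that difference is computed once, when the pixel is queued. The function name seeded_region_growing is hypothetical.

```python
import heapq


def seeded_region_growing(image, seeds):
    """Grow one region per seed over a grayscale image.

    image: 2D list of intensities; seeds: list of (y, x), one per region.
    Returns a 2D list of region indices (0 .. len(seeds) - 1).
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    sums = [0.0] * len(seeds)
    counts = [0] * len(seeds)
    heap = []  # entries: (difference to region mean, y, x, region)

    def assign(y, x, region):
        labels[y][x] = region
        sums[region] += image[y][x]
        counts[region] += 1

    def push_neighbors(y, x, region):
        mean = sums[region] / counts[region]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] == -1:
                heapq.heappush(heap, (abs(image[ny][nx] - mean), ny, nx, region))

    for region, (y, x) in enumerate(seeds):
        assign(y, x, region)
    for region, (y, x) in enumerate(seeds):
        push_neighbors(y, x, region)

    # Always merge the unlabeled pixel that is most similar to a bordering region.
    while heap:
        _, y, x, region = heapq.heappop(heap)
        if labels[y][x] != -1:
            continue  # already claimed by a region with a better match
        assign(y, x, region)
        push_neighbors(y, x, region)
    return labels
```

The quality of the result depends strongly on where the seeds are placed, which is why automating seed selection matters.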
To reduce the computational complexity, we develop a simpler method. It first detects edges using the Canny operator [6]. If the neighborhood of a pixel contains no or only a few detected edge pixels, the pixel is considered a seed candidate.
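A minimal sketch of this seed candidate rule is given below. It assumes that a binary edge map is already available (for example, from a Canny detector) and that the neighborhood is a square window that includes the pixel itself; the window radius, the edge-count threshold, and the name seed_candidates are hypothetical choices made for illustration.

```python
def seed_candidates(edges, radius=1, max_edges=0):
    """Return pixels whose neighborhood contains at most max_edges edge pixels.

    edges: 2D list of 0/1 values indexed as edges[y][x].
    The neighborhood is the (2 * radius + 1) square window around the pixel,
    clipped at the image border and including the pixel itself.
    """
    height, width = len(edges), len(edges[0])
    candidates = []
    for y in range(height):
        for x in range(width):
            count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if edges[ny][nx]:
                        count += 1
            if count <= max_edges:
                candidates.append((y, x))
    return candidates
```

Each pixel needs only a count of edge pixels in a small window, so the rule is much cheaper than evaluating a color similarity measure for every pixel.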
1.4 Depth Estimation
Depth estimation has become an important topic in multimedia applications, and several methods have been proposed to reconstruct a stereo image from a 2D image. Xiong et al. [7] presented a method to obtain depth information from focus and defocus. Criminisi et al. [8] compute 3D affine measurements from a single perspective view of a scene given only the vanishing line and vanishing point information determined from the image. Luong et al. [9] showed that correspondences between three images taken by the same camera with fixed internal parameters are sufficient to recover the internal parameters of the camera, to compute coherent perspective projection matrices, and to reconstruct 3D structure up to a similarity.
In this thesis, we estimate human depth based on vanishing lines and points. When the vanishing lines and points are not available, we use another method, based on the depth look-up table of the camera, to estimate the depths.
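The geometry behind vanishing-point-based depth can be illustrated with a short sketch. It is not the estimation procedure of this thesis; it assumes a pinhole camera whose optical axis is parallel to a flat ground plane, so that the vanishing line of the ground is a horizontal image row and a ground point imaged at row y has depth Z = f * h / (y - y_horizon). The parameters focal_px (focal length in pixels) and camera_height, and both function names, are hypothetical.

```python
def line_intersection(line1, line2):
    """Intersect two image lines given as (a, b, c) with a*x + b*y + c = 0.

    Uses the cross product of the homogeneous line vectors. Returns (x, y),
    or None when the lines are parallel in the image.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    x = b1 * c2 - b2 * c1
    y = c1 * a2 - c2 * a1
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None
    return x / w, y / w


def depth_from_row(y_foot, y_horizon, focal_px, camera_height):
    """Depth of a ground point from its image row (rows increase downward).

    Assumes a level pinhole camera over a flat ground plane.
    """
    dy = y_foot - y_horizon
    if dy <= 0:
        raise ValueError("the ground point must lie below the vanishing line")
    return focal_px * camera_height / dy
```

Under these assumptions, a person whose feet appear closer to the vanishing line is farther from the camera, which is why the vertical y-coordinate carries depth information.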
1.5 Thesis Outline
The thesis is organized as follows. Chapter 2 describes the skin-color extraction method and the elliptical template matching technique. Chapter 3 introduces an improved method of automatic seeded region growing. Chapter 4 presents the experimental results of our object segmentation and depth estimation system. Finally, Chapter 5 concludes the thesis with a discussion.