1.1 Motivation
Human activity recognition from video streams has many applications, such as home care systems, human-machine interfaces, and automatic surveillance.
However, human activities have no rigid syntax or well-defined structure, which makes human activity recognition a very challenging task.
Several human activity recognition methods have been proposed in the past few years. Yamato et al. [1] turn image frames into a symbol sequence and use a hidden Markov model (HMM) to recognize human actions. Bobick and Davis [2] recognize human activities by comparing the motion-energy and motion-history of template images with those of temporal images. Cohen and Li [3] use a view-independent 3-D shape description to classify and identify human activity with support vector machines (SVMs). There have also been some significant projects on detecting and tracking people and recognizing their activities. W4 [4] is one of them. W4 detects people (a single person or people in a group) by adopting an adaptive background model, and it identifies their activities by finding the body parts on the silhouette boundary.
In vision-based systems, foreground subject extraction is usually the first and an important step, and it is also the objective of this thesis. If we can improve the accuracy of foreground object extraction, then the overall performance of a surveillance system can be improved.
The proposed system can be separated into four components. The first component is background modeling. The second component is foreground subject extraction.
The third component is the transformation of image data into a smaller space in which posture recognition is easier. The fourth component is the posture classification of each image frame and activity recognition from frame sequences. In this thesis, we focus on the first two components to improve the accuracy of foreground extraction and thereby enhance the performance of an activity surveillance system.
Fig. 1.1 The flowchart of our human activity recognition system. The stages, in order, are: background modeling, foreground subject extraction, transformation of image data, and posture classification and activity recognition.
1.2 Background Modeling
Background subtraction is widely used for detecting moving objects in image frames from static cameras. Most of this work has been based on background subtraction using the color or luminance component. In these approaches, the difference between the incoming frame and the background image is computed to detect foreground objects. W4 [4] is a well-known example. It records the maximum luminance, the minimum luminance, and the maximum inter-frame difference at every pixel position over a background video. For each pixel of an incoming frame, the maximum and minimum luminance at that position are then subtracted from the pixel value. If the absolute value of this difference is larger than the maximum inter-frame difference, the pixel is classified as foreground.
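The W4-style test described above can be sketched as follows. This is a minimal illustration, not the implementation of [4]: frames are assumed to be small luminance images stored as nested lists, the function names `build_w4_model` and `w4_foreground` are hypothetical, and flagging a pixel when either difference exceeds the threshold is an assumption, since the text does not state how the two differences are combined.

```python
from typing import List, Tuple

Frame = List[List[float]]  # luminance image stored as rows of pixel values


def build_w4_model(frames: List[Frame]) -> Tuple[Frame, Frame, Frame]:
    """Per-pixel minimum, maximum, and maximum inter-frame difference
    over a background-only sequence."""
    rows, cols = len(frames[0]), len(frames[0][0])
    lo = [[min(f[r][c] for f in frames) for c in range(cols)] for r in range(rows)]
    hi = [[max(f[r][c] for f in frames) for c in range(cols)] for r in range(rows)]
    # Largest absolute change between consecutive background frames.
    diff = [[max((abs(b[r][c] - a[r][c]) for a, b in zip(frames, frames[1:])),
                 default=0.0)
             for c in range(cols)] for r in range(rows)]
    return lo, hi, diff


def w4_foreground(frame: Frame, lo: Frame, hi: Frame, diff: Frame) -> List[List[bool]]:
    """Assumed rule: foreground if the pixel departs from the recorded minimum
    or maximum by more than the maximum inter-frame difference."""
    return [[abs(frame[r][c] - lo[r][c]) > diff[r][c]
             or abs(frame[r][c] - hi[r][c]) > diff[r][c]
             for c in range(len(frame[0]))] for r in range(len(frame))]
```

In this sketch a pixel that stays within the range seen during background training is kept as background, while a pixel far outside that range is marked as foreground.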
Background subtraction is extremely sensitive to dynamic scene changes caused by illumination change. To reduce the artifacts caused by varying luminance, we develop a method that is more robust to illumination changes. To this end, the method uses the frame ratio rather than the frame difference in the luminance component.
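A small sketch can show why a ratio may be less sensitive to illumination than a difference. It assumes a simple multiplicative illumination model, in which a global lighting change scales every background pixel by the same gain; the function names and the `eps` guard against division by zero are hypothetical and are not taken from the method developed in this thesis.

```python
from typing import List

Frame = List[List[float]]  # luminance image stored as rows of pixel values


def frame_difference(frame: Frame, background: Frame) -> Frame:
    """Per-pixel luminance difference between a frame and the background."""
    return [[p - b for p, b in zip(fr, br)] for fr, br in zip(frame, background)]


def frame_ratio(frame: Frame, background: Frame, eps: float = 1e-6) -> Frame:
    """Per-pixel luminance ratio; eps (an assumed guard) avoids division by zero."""
    return [[p / (b + eps) for p, b in zip(fr, br)] for fr, br in zip(frame, background)]


# Background pixels of different brightness, all dimmed by the same gain of 0.8.
background = [[50.0, 200.0]]
dimmed = [[40.0, 160.0]]
differences = frame_difference(dimmed, background)  # varies with pixel brightness
ratios = frame_ratio(dimmed, background)            # nearly the same for both pixels
```

Under this assumed model the difference grows with the brightness of the background pixel, so a single difference threshold treats dark and bright regions unequally, whereas the ratio stays close to the common gain.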
If we use only the luminance for background subtraction, we cannot detect a foreground pixel correctly when the colors of the foreground and the background are similar.
To make full use of the color information of a pixel, it is imperative to perform the segmentation in the color domain. In our system, we build our background model in the HSV color space. We use both the luminance and the chromatic components in the background model.
According to our investigation, the CIELAB color space was designed to be more sensitive to color differences, and it also carries the attributes of hue, saturation, and lightness. In this thesis, a color difference formula in the CIELAB space is proposed to differentiate two colors effectively, and its advantage becomes significant for colors that are close to each other. The background model records the maximum inter-frame color difference at every pixel position over a background video. If the color difference between the background and an incoming pixel is larger than this preset maximum color difference, the pixel belongs to the foreground. In this way, the measured color difference between the background and the foreground becomes larger, and thus the effectiveness of foreground object extraction can be improved greatly.
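The thresholding scheme described above can be sketched as follows. Several choices here are assumptions made only for illustration: the Euclidean CIE76 distance stands in for the color difference formula proposed in this thesis, pixels are assumed to be already converted to (L*, a*, b*) tuples, the per-pixel mean of the background frames is used as the reference color, and the names `delta_e76`, `build_lab_model`, and `lab_foreground` are hypothetical.

```python
import math
from typing import List, Tuple

Lab = Tuple[float, float, float]   # (L*, a*, b*) values of one pixel
LabFrame = List[List[Lab]]         # image stored as rows of Lab pixels


def delta_e76(p: Lab, q: Lab) -> float:
    """CIE76 color difference: Euclidean distance in CIELAB (a stand-in formula)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def build_lab_model(frames: List[LabFrame]):
    """Return the per-pixel mean color (assumed reference) and the per-pixel
    maximum inter-frame color difference over a background-only sequence."""
    rows, cols, n = len(frames[0]), len(frames[0][0]), len(frames)
    mean = [[tuple(sum(f[r][c][k] for f in frames) / n for k in range(3))
             for c in range(cols)] for r in range(rows)]
    max_diff = [[max((delta_e76(a[r][c], b[r][c]) for a, b in zip(frames, frames[1:])),
                     default=0.0)
                 for c in range(cols)] for r in range(rows)]
    return mean, max_diff


def lab_foreground(frame: LabFrame, mean, max_diff) -> List[List[bool]]:
    """Foreground if the color difference to the reference exceeds the recorded maximum."""
    return [[delta_e76(frame[r][c], mean[r][c]) > max_diff[r][c]
             for c in range(len(frame[0]))] for r in range(len(frame))]
```

Because the threshold is learned per pixel, positions where the background itself fluctuates strongly tolerate larger color differences than stable positions.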
1.3 Foreground Subject Extraction
Foreground subject extraction is an important step in a vision-based human activity recognition system. Many authors have developed methods for detecting people in images. Park and Aggarwal separated foreground pixels from the background by computing the Mahalanobis distance at each pixel in the HSV color model [5]. Leung and Yang built a human body outline labeling system [6]. Jabri and Duric [7] used color and edge information to improve the quality and reliability of the results. They have all tried to recover the actual pose of a person from the human body outline or from silhouettes.
Furthermore, moving cast shadows pose a challenge for accurate foreground subject detection. Many attempts have been made to tackle the shadow suppression problem [8]−[13] encountered in background subtraction. Horprasert et al. [8] and Cucchiara et al. [9] exploited the observation that shadows have chromaticity similar to, but brightness lower than, the background model. Under the proposed framework in the HSV and CIELAB color spaces, we can effectively identify shadows in the detected foreground subject.
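The rationale that a shadow keeps the chromaticity of the background but lowers its brightness can be sketched as a per-pixel test in HSV. This is only an illustration in the spirit of [8] and [9], not their exact formulation or the method of this thesis: the function name `is_shadow`, the use of absolute saturation and hue differences, and all threshold values are assumptions, and HSV components are assumed to lie in [0, 1].

```python
from typing import Tuple

HSV = Tuple[float, float, float]  # (hue, saturation, value), each in [0, 1]


def is_shadow(pixel: HSV, background: HSV,
              alpha: float = 0.4, beta: float = 0.9,
              tau_s: float = 0.1, tau_h: float = 0.1) -> bool:
    """Return True when the pixel looks like a darkened copy of the background.

    alpha and beta bound the brightness ratio; tau_s and tau_h bound the
    saturation and hue changes. All four values are illustrative assumptions.
    """
    h, s, v = pixel
    hb, sb, vb = background
    if vb <= 0.0:
        return False  # a black background pixel cannot be darkened further
    ratio = v / vb
    dh = abs(h - hb)
    dh = min(dh, 1.0 - dh)  # hue is circular, so take the shorter arc
    return alpha <= ratio <= beta and abs(s - sb) <= tau_s and dh <= tau_h
```

A pixel that passes this test would be removed from the foreground mask, while a pixel whose hue or saturation differs clearly from the background would be kept as part of the subject.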
After building background models in the HSV and CIELAB color spaces, we can extract foreground subjects from video frames by thresholding the color difference between each pixel of an image frame and the background model.
1.4 Thesis Outline
The thesis is organized as follows. Before introducing the techniques of our human activity recognition system, Chapter 2 presents the basic concepts concerning color difference formulae in the HSV and CIELAB color spaces. In that chapter, we first introduce the HSV and CIELAB color spaces, and then some color difference formulae. Chapter 3 describes in detail our CIELAB-based method, which uses these difference formulae to build a statistical background model for foreground subject extraction. In Chapter 4, the experimental results of foreground object extraction in the HSV and CIELAB color spaces are shown and compared. Finally, we conclude this thesis with a discussion in Chapter 5.