There has been growing interest in multimedia services due to advances in computer technology and digital signal processing. Examples include video conferencing, interactive multimedia, community security, and distance learning. These services enrich our lives and make them more convenient.
To support these services, an object-based coding standard, MPEG-4 [1], was introduced. Unlike its predecessors, MPEG-1 [2] and MPEG-2 [3], MPEG-4 provides not only large coding gains but also new functionalities for multimedia applications, such as content-based interactivity and content-based scalability. To provide these functionalities, it first partitions a scene into objects with different content, for example a static background, a walking person, and background music. These objects are then encoded and decoded separately. Under this concept, we can manipulate the objects of interest and efficiently store, transmit, and present the multimedia data.
In the MPEG-4 video coding system, each frame of a video sequence is partitioned into semantic video objects, which are represented by video object planes (VOPs). Each VOP carries information such as shape, texture, and motion, which is used by the encoding tools specified in MPEG-4.
However, extracting video object planes is not a simple process, because a VOP cannot be uniquely extracted from a single low-level feature such as motion, intensity, or color, and VOP extraction is not a normative part of the MPEG-4 video coding scheme.
Nevertheless, VOP segmentation constitutes the basis for content-based representation of natural video sequences. Thus, video object segmentation, which extracts an object from a video sequence, is a critical process for MPEG-4 coding and a crucial factor in the future success of MPEG-4 as a content-based video coding standard.
In general, video object segmentation algorithms can be roughly classified into two categories according to their primary segmentation criterion: background-based algorithms [4], [5], [6] and non-background-based algorithms [7], [8], [9], [10], [11]. Non-background-based algorithms perform well, but they are computationally complex. These algorithms extract an initial object shape or edge by change detection between consecutive frames, and then track the new object boundary or shape by motion estimation. They use spatial homogeneity as the primary segmentation criterion to partition a frame into homogeneous regions. There are three major steps in these algorithms. First, morphological filters are used to simplify the image, and the watershed algorithm is applied to determine the region boundaries. Second, temporal change detection is used to find the moving regions. Finally, temporal-spatial fusion is used to extract a fine moving-object shape. Temporal-spatial fusion calculates the motion vector of each region by motion estimation and merges regions with similar motion or gray level to form the final object region. In the object tracking step, region-based motion estimation is used to extract the new object shape. However, these algorithms are not suitable for real-time systems, because both the watershed algorithm and motion estimation are computationally intensive operations.
The other approaches are background-based algorithms, which use background information as a reference to extract a moving object. Their segmentation criterion is temporal change detection between the current frame and a reference image known in advance. This scenario is generally feasible for some surveillance applications, because the reference image is easy to obtain when there is no foreground object in the scene. It does not hold for most other applications, in which the first frame of the video sequence is not a background frame.
Background-based algorithms can still extract objects in the scene if they suddenly stop moving, which occurs frequently. We consider these algorithms more efficient than those in the previous category, since they rely on the motion that distinguishes a moving object from the background. Since moving objects generate changes in image intensity and motion detection is closely related to temporal change detection, this scheme is also based on temporal change detection, computed as the difference between the current frame and the background frame. The background is derived by integrating these differences over the preceding successive frames.
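The change detection step described above can be sketched as a thresholded absolute difference between the current frame and the background frame. This is only a minimal illustration, not the method of [4] or [5]: the function name change_mask, the threshold of 20 gray levels, and the small test frames are all assumptions introduced here.

```python
def change_mask(frame, background, threshold=20):
    """Return a binary mask that is 1 where the frame differs from the background.

    frame and background are equally sized 2-D lists of gray levels.
    threshold is a hypothetical value chosen only for this illustration.
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical 3x4 frames: a flat background and a bright two-pixel object.
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame = [
    [10, 12, 10, 10],    # the value 12 is small noise, below the threshold
    [10, 200, 200, 10],  # the object
    [10, 10, 9, 10],
]

mask = change_mask(frame, background)
for row in mask:
    print(row)
```

In this toy case only the two object pixels are marked as changed. In practice a fixed threshold is fragile, since noise and illumination drift also produce intensity changes.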
There are, however, some problems with temporal change detection between the current frame and the background frame. One is that temporal changes in image intensity are also generated by noise or illumination drift. Another is that a moving object generates perturbations in the temporal changes. An example is the area referred to as uncovered background, which appears when an object moves away and the background has not been updated. It does not belong to a moving object, but it is generally detected as temporally changed. These approaches have to eliminate uncovered background by completely reconstructing the background frame.
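The uncovered background effect can be reproduced on a single image row. The sketch below assumes a reference that still contains the object at its old position (named stale_background here, a hypothetical name) and compares it with an ideal, object-free background. All values and the threshold are made up for illustration.

```python
def changed(a, b, threshold=20):
    """Binary change mask between two equally long rows of gray levels."""
    return [1 if abs(x - y) > threshold else 0 for x, y in zip(a, b)]


true_background = [10, 10, 10, 10, 10, 10, 10, 10]
# Reference captured while the object (gray level 200) covered pixels 1 and 2.
stale_background = [10, 200, 200, 10, 10, 10, 10, 10]
# The object has since moved to pixels 3 and 4.
current = [10, 10, 10, 200, 200, 10, 10, 10]

detected = changed(current, stale_background)    # old and new positions
object_mask = changed(current, true_background)  # new position only

# Pixels reported as changed that do not belong to the object.
uncovered = [1 if d == 1 and o == 0 else 0 for d, o in zip(detected, object_mask)]

print(detected)
print(object_mask)
print(uncovered)
```

Pixels 1 and 2 are flagged as changed although they now show only background, which is exactly the false detection that has to be removed. A real system does not have true_background available, so the comparison here only serves to make the error visible.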
In this thesis, we propose a new video object extraction algorithm based on [4] and [5] and apply it to real-time surveillance and video conferencing systems. We use background information to extract moving objects for higher efficiency, and an edge operator to improve object extraction performance. In a video conferencing system, the first frame of the video sequence is generally not a background frame. Therefore, the uncovered background problem must be solved when consecutive change detection is used to construct the initial background information. We propose a modified connected component algorithm that partitions an image into homogeneous regions with less computation. The uncovered background region can then be detected using the moving edges obtained by applying the edge operator to the frame difference.
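For readers unfamiliar with connected component labeling, the following sketch partitions a gray-level image into homogeneous regions using a textbook 4-connected flood fill. It is not the modified algorithm proposed in this thesis. The function name label_regions, the tolerance tol, and the sample image are assumptions made for illustration only.

```python
from collections import deque


def label_regions(image, tol=10):
    """Label 4-connected regions whose neighboring pixels differ by at most tol.

    Returns (labels, count), where labels has the same shape as image and
    region labels start at 1.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # pixel already belongs to a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first growth of the current region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 3x4 image with a dark, a bright, and a mid-gray area.
image = [
    [10, 12, 200, 198],
    [11, 13, 201, 199],
    [10, 90, 92, 200],
]

labels, count = label_regions(image)
for row in labels:
    print(row)
print(count)
```

Because each pixel is compared only with its immediate neighbor, a slow intensity ramp can chain into a single region. Comparing against a region mean instead is a common alternative, at some extra cost.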
Once the initial background information has been constructed, we use a proposed motion estimation method to track the moving object. Using edge information allows the object to be segmented precisely, free of noise in the background. The uncovered background problem still arises because the background is not updated. We solve it with both background prediction and change detection; combining these two methods efficiently resolves the uncovered background problem during moving object tracking.
This thesis is organized as follows. Chapter 2 reviews recent research on video object segmentation. Chapter 3 describes the framework of our video object segmentation algorithm and the initial background construction. Chapter 4 describes the object tracking algorithm in detail. Chapter 5 presents the experimental results and discussion. Chapter 6 concludes the thesis.