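Motivation - A Study of Image Processing and Computer Vision Techniques Applied to Complex Document Image Analysis, Nighttime Driving Assistance, and Video Surveillance Systems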

Chapter 1. Introduction

1.1 Motivation

Extracting objects from observed images by image thresholding is useful in the early processing stages of a vision system [10][11]. To date, many researchers have developed valuable thresholding techniques [12]-[26] for applications that include image segmentation, pattern recognition, and document analysis, among others. The underlying concept of most of these methods is that thresholding is carried out as a simple classification procedure in which the pixels of the image are assigned to two classes: foreground object pixels and background pixels. Hence most of them were developed for bi-level thresholding. Surveys and comparative performance studies of these methods were presented in the works of Sahoo et al. [10] and Lee et al. [11]. In addition, Trier and Jain [27][28] proposed a goal-directed evaluation methodology that tests thresholding methods by the recognition performance of an OCR process run on the binarization results they produce. The conventional thresholding techniques are all based on finding the threshold value that optimizes a criterion function. They are indeed effective in bi-level thresholding. However, as the number of desired thresholds increases, the computation needed to obtain the optimal threshold values grows substantially, and the search for the optimum of the criterion function becomes exhaustive. Another problem with the conventional methods is that the number of segments into which the image should be divided cannot be determined suitably and automatically. Thus, we intend to develop a computationally fast and effective automatic multilevel thresholding approach that overcomes these image segmentation issues of the conventional methods.
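As a minimal sketch of why the search becomes exhaustive, the following Python example uses between-class variance (the criterion of Otsu's method) as a stand-in criterion function. The text above does not state which criteria the cited methods use, so this choice, the function names, and the synthetic histogram are all assumptions made for illustration only.

```python
from itertools import combinations
from math import comb

import numpy as np


def between_class_variance(hist, thresholds):
    """Between-class variance of the classes cut by `thresholds`.

    A gray level g falls in class k when edges[k] <= g < edges[k + 1],
    where edges = [0, *thresholds, len(hist)].
    """
    prob = hist / hist.sum()
    levels = np.arange(len(hist))
    total_mean = (prob * levels).sum()
    edges = [0, *thresholds, len(hist)]
    variance = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        weight = prob[lo:hi].sum()
        if weight > 0:
            mean = (prob[lo:hi] * levels[lo:hi]).sum() / weight
            variance += weight * (mean - total_mean) ** 2
    return variance


def exhaustive_thresholds(hist, count):
    """Try every combination of `count` thresholds and keep the best one."""
    candidates = combinations(range(1, len(hist)), count)
    return max(candidates, key=lambda t: between_class_variance(hist, t))


# Synthetic 16-level histogram with two well-separated modes (made-up data).
hist = np.array([0, 5, 9, 5, 0, 0, 0, 0, 0, 0, 0, 4, 8, 4, 0, 0], dtype=float)
best = exhaustive_thresholds(hist, 1)

# Bi-level classification: True marks gray levels in the brighter class.
binary = np.arange(len(hist)) >= best[0]

# Number of candidate threshold sets for a 256-level image.
search_sizes = {k: comb(255, k) for k in (1, 2, 3)}
```

For a 256-level image this brute-force search evaluates 255 candidates for one threshold, 32,385 for two, and 2,731,135 for three, which is the growth in computation described above. The sketch also takes the number of thresholds as an input, so it does nothing to determine the number of segments automatically.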

For document image analysis, the objects of interest in a document image are textual objects. Extracting textual objects from document images supports many useful applications in document analysis and understanding, such as optical character recognition, document retrieval, and compression [4][5]. To date, many techniques have been presented for extracting textual objects from monochromatic document images [29]-[32]. In recent years, advances in multimedia publishing and printing technology have led to an increasing number of real-life documents in which stylistic character strings are printed with pictorial, textured, and decorated objects and colorful, varied background components. However, most conventional approaches do not work well for extracting textual objects from real-life complex document images. Compared to monochromatic document images, text extraction in complex document images poses many difficulties associated with the complexity of background images, the variety and shading of character illumination, characters superimposed on illustrations and pictures, and other decorated background components. As a result, there is an increasing demand for a system that is able to read and extract the textual information printed on pictorial and textured regions in color images as well as in monochromatic main text regions.

Since most textual objects show sharp and distinctive edge features, methods based on edge information [33]-[36] have been developed. Such methods apply an edge detection operator to extract the edge features of textual objects, and then use these features to extract characters from document images. Edge-based methods are capable of extracting textual objects under different homogeneous illuminations from graphic backgrounds. However, when the characters adjoin or touch graphical objects, texture patterns, or backgrounds with sharply varying contours, edge-feature vectors of non-text objects with similar characteristics may also be identified as text, and thus the characters in the extracted textual regions are blurred by those non-text objects. Several conventional color-segmentation-based methods for text extraction from color document images have also been proposed [37]-[41]. These methods use color clustering or quantization to determine the prototype colors of a document, so as to facilitate the detection of textual objects in the separated color planes. However, most of these methods have difficulty extracting characters that are embedded in complex backgrounds or that touch other graphical objects. This is because the prototype colors are determined from a global view, so that appropriate prototype colors cannot easily be selected for distinguishing textual objects from the graphical objects they touch and from complex backgrounds without sufficient contrast.

In vision-based systems for driver assistance and autonomous vehicle guidance, many researchers have also developed valuable techniques for recognizing vehicles and obstacles of interest in images of the road environment outside the car [6]-[9]. These techniques support camera-assisted systems that help drivers understand possible hazards on the road and that automatically control vehicle apparatus such as headlights and windshield wipers. A vision-based vehicle and obstacle detection system aims to identify vehicles, obstacles, traffic signs, and other patterns on the road from captured image sequences by means of image processing and pattern recognition techniques. Researchers in this field continue to raise new questions and concepts [42][43]. Depending on how the objects of interest on the road are defined, different techniques are applied to the captured image sequences to detect them as vehicles or obstacles. Vehicles can be located in an image sequence by searching the images for specific patterns based on typical vehicle features, such as shape, symmetry, or their surrounding bounding boxes [44]-[46].

Until recently, most conventional work focused on detecting vehicles in daytime road environments. However, under the poorly illuminated conditions of nighttime road environments, the obvious features that are effective for detecting vehicles in daytime become invalid. Thus, most of the above-mentioned conventional techniques do not work well in nighttime road environments. At night, and under dark illumination conditions in general, the only visual features of vehicles are their headlights and taillights. Headlights and taillights are visible if a vehicle lies within the visible range of the CCD camera mounted on a camera-assisted car. However, many other illuminant sources coexist with the vehicle lights in nighttime road environments, such as street lamps, traffic lights, and road reflector plates on the ground. These non-vehicle illuminant sources cause many difficulties in detecting actual vehicles in nighttime road scenes.

Video surveillance is an emerging application of video compression and communication for security in modern life. However, conventional monitoring systems are mostly analog systems, which consume many tapes and require human effort to replace the tapes frequently. The recording time and image quality of such systems cannot compete with those of digital monitoring systems. In digital monitoring systems, transform coding techniques are the most popular for video recording applications.

In the early development of this field, DCT-based (Discrete Cosine Transform) coding techniques were commonly used, and they have since become an element of the JPEG image compression standard. Accordingly, DCT-based coding can be seen in many electronic devices today. In recent years, researchers have demonstrated that DWT-based (Discrete Wavelet Transform) transform coding [47]-[50] outperforms DCT-based methods. Hence, newly emerging compression standards such as the video compression standard MPEG-4 [51][52] and the still image compression standard JPEG2000 use DWT-based methods [53][54]. Because multiple-CCD-camera systems are continuously and heavily loaded with image sequences, the speed of image compression is critical in such systems. At present, DWT-based compression techniques suffer from high computational complexity and so cannot support multi-channel video recording at a high frame rate. A new, highly efficient DWT-based technique that yields images of high visual quality is therefore in significant demand for such applications.
