Review of Related Works - Scale-Space Binarization and Accumulated Gradient Projection Applied to the Extraction and Recognition of License Plate Characters

This chapter briefly describes three techniques that motivate and underpin this work. First, methods for recognizing license plate characters are reviewed. Second, scale-space theory and its most popular representation, the difference-of-Gaussian function, are discussed. Finally, the most popular image binarization methods are described and compared.

2.1. License Plate Recognition

In traditional license plate recognition (LPR) systems, the first step is a detection function that finds the areas where license plates may appear. This function requires high-speed feature detection and therefore generally relies on simple image features such as gradient energy or Haar-like features [51]. To keep detection fast, traditional methods often assume a fixed camera angle and tolerate only a small deviation in plate size and orientation.

Within the detected areas, more specific rules are applied to localize the entire license plate accurately and to compute the histogram used for binarization. Once the plate is binarized, its baseline becomes an important reference for character segmentation and normalization. Segmentation is often done by projecting the TRUE pixels of the binarized plate image onto the baseline and taking the valleys of the projected histogram as segmentation boundaries. For the segmented characters, statistical features are extracted and fed into a classifier such as template matching [16], vector quantization [4], support vector machine (SVM) [15], or neural networks (NN) [5][6] for recognition. The statistical features include vectors such as CC (contour-crossing count) [46], PBA (peripheral background area) [47], and CS (character shape), which are commonly used for recognizing license plate characters.
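The projection step described above can be sketched as follows. This is a minimal illustration rather than the method of any cited system: it assumes the plate has already been binarized and rotated so that the baseline is horizontal, in which case projecting onto the baseline reduces to counting TRUE pixels per column. The function name `segment_by_projection` and the parameter `min_count` are hypothetical.

```python
import numpy as np

def segment_by_projection(binary, min_count=1):
    """Split a binarized plate into column ranges separated by projection valleys.

    binary    : 2-D array, nonzero/True where a pixel belongs to a character.
    min_count : columns whose projection falls below this value count as valleys.
    Returns a list of half-open (start, end) column ranges, one per segment.
    """
    # Projection onto a horizontal baseline: number of TRUE pixels in each column.
    profile = binary.astype(bool).sum(axis=0)
    occupied = profile >= min_count
    # Pad with False so that every run of occupied columns has both edges.
    padded = np.concatenate(([False], occupied, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # first column of each run
    ends = np.flatnonzero(edges == -1)    # one past the last column of each run
    return list(zip(starts.tolist(), ends.tolist()))
```

Real plates usually need a `min_count` above 1 so that noise pixels and thin frame lines do not bridge neighbouring characters; a suitable value depends on the plate height and is not specified in the source.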

2.2. Scale Space Theory

The concept of scale space [11] starts from the basic observation that real-world objects are composed of different structures at different scales. In other words, real-world objects may appear in different ways depending on the scale of observation. A computer that must detect an object in an image therefore has to consider all the scales at which that object may appear, so that the target of interest is captured at the correct scale.

Earlier works such as [12] and [13] have suggested that the Gaussian function is the best choice of scale-space kernel. In addition, the author of [13] showed that the difference-of-Gaussian (DoG) function closely approximates the scale-normalized Laplacian of Gaussian, σ²∇²G, which detailed experiments in [14] showed to produce the most stable image features among a range of candidate image functions.
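The approximation mentioned above can be made explicit with a short derivation. It is a sketch of the standard argument, in which the constant factor k between two neighbouring scales is a symbol introduced here for illustration and does not appear in the text above. The Gaussian satisfies the heat diffusion equation in σ, and replacing the derivative by a finite difference between the scales σ and kσ gives the DoG function:

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^{2} G
\qquad\text{and}\qquad
\frac{\partial G}{\partial \sigma} \approx
\frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Longrightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^{2} \nabla^{2} G .
```

Since the factor (k - 1) is the same at every scale, it does not change where the extrema of the response are located.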

Using a Gaussian function as the smoothing kernel has two additional advantages. First, the two-dimensional Gaussian is symmetric and separable, so the two-dimensional convolution can be decomposed into two independent one-dimensional convolutions. This greatly reduces the computation and shortens the processing time needed to produce images at different scales. Second, the Fourier transform of a Gaussian function is another Gaussian function [17]. Consequently, successive convolution with the Gaussian kernels G(σ1) and G(σ2) is equivalent to a single convolution with G(σ3), where

$$\sigma_3^2 = \sigma_1^2 + \sigma_2^2 \qquad (1)$$

Based on (1), and assuming that a Gaussian point spread function (PSF) approximates the image capturing process [18], the blur in the input image can be ignored if a sufficiently large observation scale is chosen, since σ3 ≈ σ2 when σ2 ≫ σ1. Here σ1 plays the role of the PSF blur and σ2 that of the observation scale.
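Both properties of the Gaussian kernel can be checked numerically. The sketch below reads the relation in (1) as σ3² = σ1² + σ2² and uses sampled, truncated kernels, so the agreement is approximate rather than exact. The function names and the truncation radius are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian on [-radius, radius], normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def gaussian_kernel_2d(sigma, radius):
    """Sampled 2-D Gaussian computed directly from x^2 + y^2."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(x, x)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def cascade_error(sigma1, sigma2, radius):
    """Largest absolute difference between G(sigma1) * G(sigma2) and G(sigma3)."""
    cascaded = np.convolve(gaussian_kernel(sigma1, radius),
                           gaussian_kernel(sigma2, radius))  # length 4*radius + 1
    sigma3 = np.hypot(sigma1, sigma2)  # sqrt(sigma1^2 + sigma2^2)
    single = gaussian_kernel(sigma3, 2 * radius)
    return float(np.max(np.abs(cascaded - single)))

# Separability: the 2-D kernel is the outer product of two 1-D kernels.
g1 = gaussian_kernel(2.0, 10)
separable_ok = np.allclose(gaussian_kernel_2d(2.0, 10), np.outer(g1, g1))
```

As an example of the negligible-blur argument, a PSF blur of σ1 = 1 combined with an observation scale of σ2 = 10 gives σ3 = √101, which differs from σ2 by about 0.5%.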

2.3. Image Binarization

Methods for binarizing gray-level images can be divided into two classes: global and local thresholding. Global thresholding methods generally binarize the image with a single threshold. In contrast, local methods change the threshold dynamically over the image according to local information. The threshold of a global method is often easier to determine than that of a local method because it is computed once for the entire image. However, global methods fail easily when the image contains noise, variable illumination, or a complex background. Local thresholding methods adapt better than global ones to illumination changes and complex backgrounds, but it is difficult to decide the extent of the local area used to determine the threshold, and they remain sensitive to noise.
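The difference between the two classes can be illustrated on a synthetic image. The sketch below is not one of the cited methods: it uses a plain local-mean threshold as a stand-in for local thresholding, and the test image (a left-to-right illumination ramp with dark vertical strokes), the window radius, and the offset are all assumptions chosen for illustration.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over each (2*radius+1)^2 window, with edge padding, via an integral image."""
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    k = 2 * radius + 1
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def global_threshold(img, t):
    """Mark pixels darker than a single threshold t as foreground."""
    return img < t

def local_mean_threshold(img, radius, offset):
    """Mark pixels darker than their neighbourhood mean minus offset as foreground."""
    return img < box_mean(img, radius) - offset

def make_ramp_plate(height=40, width=120):
    """Synthetic image: illumination ramp from 60 to 220 with strokes 50 levels darker."""
    ramp = np.tile(np.linspace(60.0, 220.0, width), (height, 1))
    truth = np.zeros((height, width), dtype=bool)
    for c in range(10, width - 10, 20):
        truth[10:30, c:c + 3] = True  # 3-pixel-wide dark stroke
    return ramp - 50.0 * truth, truth

img, truth = make_ramp_plate()
global_mask = global_threshold(img, img.mean())
local_mask = local_mean_threshold(img, radius=7, offset=10.0)
```

On this image the single global threshold labels the dimly lit background on the left as foreground, while the local threshold follows the ramp. The result depends on the window being wider than a stroke and narrower than the stroke spacing, which is the window-size difficulty noted above.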

Global thresholding methods often calculate the threshold by histogram analysis [7], [20]-[21]. Otsu’s method [7], derived from the viewpoint of discriminant analysis, is one of the global techniques most preferred by investigators. It directly evaluates the "goodness" of a candidate threshold and automatically selects an optimal threshold using only the zeroth- and first-order cumulative moments of the gray-level histogram. In practice, this method does not work well for images with shadows, inhomogeneous backgrounds, or complex background patterns [22]. It is also reported in [22] that neither a single threshold nor a set of multilevel global thresholds yields an accurate binary image in such cases.
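The threshold selection from the cumulative moments can be sketched as below. This is a minimal NumPy illustration of the commonly stated form of Otsu's criterion (maximizing the between-class variance), assuming an integer-valued image with gray levels in [0, levels). The function name `otsu_threshold` is hypothetical.

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Return the gray level k that maximizes the between-class variance.

    Pixels with value <= k form one class and pixels with value > k the other.
    gray must hold non-negative integers smaller than levels.
    """
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()                    # normalized histogram
    omega = np.cumsum(p)                     # zeroth-order cumulative moment
    mu = np.cumsum(p * np.arange(levels))    # first-order cumulative moment
    mu_total = mu[-1]                        # mean gray level of the whole image
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(levels)
    valid = denom > 0                        # skip levels that leave one class empty
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))
```

A binary image is then obtained as `gray > otsu_threshold(gray)`. Because the criterion uses only the histogram, it ignores where pixels are located, which is consistent with its weakness on shadows and inhomogeneous backgrounds noted above.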

Local thresholding methods generally find thresholds by statistical measurement in local areas [23]-[26], based on the principle that objects in an image contribute high spatial frequency components while illumination consists mainly of lower spatial frequencies [31]. The local intensity gradient (LIG) method in [23] is one of the most popular local thresholding methods. It first finds the pixels with a high intensity gradient as a reference for the initial threshold, and then extends the threshold to the whole image by region growing [30]. It uses a predetermined window size to calculate the regional gradient means, locates low-gradient areas in the image based on the regional means, and finds edge pixels by comparing each pixel’s intensity gradient with the regional means.

In general, local thresholding methods are more accurate than global ones in real-world situations. However, they still suffer from two problems that usually make them unsatisfactory for investigators. First, it is difficult to choose a proper size for the “local area” without prior information about the source image. Second, methods of this class are usually more computationally expensive than global ones, which makes them almost unacceptable for real-time applications.

There are also hybrid methods that binarize the image by referring to the expected content within the region of interest (ROI). Typical applications of hybrid binarization, such as license plate recognition (LPR) or automatic document analysis, often segment the image into areas and find the areas most likely to be ROIs before binarization. Such systems are often faster and more accurate than general (global or local) thresholding methods, but they usually require prior information about the ROI for fast detection and binarization. For example, in the LPR system [3], the author first uses Haar-like features to perform fast detection and find the ROIs (license plate candidates), and then performs peak-valley analysis within each ROI to binarize the license plate candidates. The peak-valley analysis refers to the histogram acquired in the ROI and assumes that parameters such as the number of characters, the character scale, and the orientation are already known. In the document binarization method [28], the input image is first segmented into ROIs of different types, containing contents such as characters, graphics, or images. Specific binarization methods are then applied within the ROIs according to the characteristics of each content type. Hybrid methods are usually not general enough to be transferred to different applications.
