
Chapter 3 The Extraction Method

3.4 Eliminate False Candidates

Fig. 3-9 (a) An imperfect sample character image. (b) The DOG responses: positive responses in red and negative responses in blue. (c) The inner profile set in red and the bounding rectangle in cyan. (d) Boundary set. (e) Extraction result.


Two stages of elimination filter out false character candidates in order to minimize the computational cost of the later stages.

The first stage of elimination is based on the geometric features captured by the CCA process.

After the CCA, each isolated group has its own preliminary features measured from its bounding rectangle, i.e., group width W, group height H, and group occupancy U (the pixel count CT relative to the bounding rectangle area), U = CT/(W×H). Groups with abnormal preliminary features are probably caused by non-character objects such as noise, background, or variable illumination, and are eliminated immediately. For example, a large ratio of W to H may represent a long edge or a thin line in the image; small W and H may be caused by noise or a spot; a large U may indicate a solid object or a shadow. Characters typically have an occupancy in the range 0.3 ≤ U ≤ 0.8.

The second stage of elimination is based on a quantity that measures the “goodness” of the profiles of each isolated group, namely the profile score S_P. In the ideal case, every edge pixel found by the CCA is adjacent to Set₁ or Set₂ pixels, i.e., C_P = C_E. In real-world images, however, this is often not the case and mostly C_P < C_E. Therefore the profile score, defined as S_P = C_P/C_E, is calculated for each isolated group to evaluate how close the group is to the ideal case. In our experiments, isolated groups with S_P < 0.8 are eliminated. The remaining isolated groups form the character candidates and can be used for recognition or other purposes hereafter.
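The two elimination stages can be sketched in C as follows. This is a minimal sketch, not the implementation used in the experiments: the struct Group and its field names are hypothetical, and the limits MIN_SIDE and MAX_ASPECT are placeholder values, since the text gives numeric thresholds only for the occupancy (0.3 ≤ U ≤ 0.8) and the profile score (S_P ≥ 0.8).

```c
#include <stdbool.h>

/* Preliminary features of one isolated group, as measured by the CCA.
   The struct and its field names are hypothetical. */
typedef struct {
    int width;         /* W: bounding-rectangle width in pixels  */
    int height;        /* H: bounding-rectangle height in pixels */
    int pixel_count;   /* CT: number of pixels in the group      */
    int edge_count;    /* C_E: edge pixels found by the CCA      */
    int profile_count; /* C_P: edge pixels adjacent to Set1/Set2 */
} Group;

/* Placeholder limits: the text gives no numbers for these two tests. */
enum { MIN_SIDE = 4, MAX_ASPECT = 4 };

/* Stage 1: geometric test on W, H and occupancy U = CT / (W x H). */
static bool passes_geometry(const Group *g)
{
    if (g->width < MIN_SIDE || g->height < MIN_SIDE)
        return false;                       /* noise or a spot */
    if (g->width > MAX_ASPECT * g->height)
        return false;                       /* long edge or thin line */
    double u = (double)g->pixel_count / ((double)g->width * g->height);
    return u >= 0.3 && u <= 0.8;            /* typical character occupancy */
}

/* Stage 2: the profile score S_P = C_P / C_E must reach 0.8. */
static bool passes_profile(const Group *g)
{
    if (g->edge_count <= 0)
        return false;
    /* S_P >= 0.8 written without floating point. */
    return 10 * g->profile_count >= 8 * g->edge_count;
}

static bool is_character_candidate(const Group *g)
{
    return passes_geometry(g) && passes_profile(g);
}
```

For example, a group with W = 20, H = 40, CT = 400, C_E = 100 and C_P = 90 has U = 0.5 and S_P = 0.9 and is kept, whereas the same group with C_P = 70 is rejected by the second stage.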

3.5. Implementation for Fast SSB

Besides stable and accurate performance, computational complexity is also important in evaluating a binarization algorithm. The demand for low-complexity methods is especially strong in real-time embedded systems, which need them not only to speed up the response to external events but also to reduce power consumption. Although the computational complexity of the method presented here is higher than that of a global thresholding method, a good implementation can still compute it efficiently and run as fast as a global method.

It is also expected to compete with most local thresholding methods in both accuracy and speed.

The issues discussed here concern the optimization of the implementation. For the proposed method, optimization can be considered from several aspects:

1. Simplify the convolution with the Gaussian filter.

2. Use integers instead of floating-point numbers.

3. Use shifts to replace multiplications and divisions.

4. Use an acceleration table for dynamic threshold propagation.

3.5.1. Optimization in convolution

The convolution with the Gaussian kernel takes much computation time because its cost is directly proportional to the size of the input image and of the Gaussian kernel. Let W denote the width and H the height of the input image, and let the Gaussian kernel be of size n×n. Convolving the input image with the Gaussian kernel directly needs H×W×n² multiplications and H×W×(n²−1) additions. Because the Gaussian function is symmetric and separable, the 2D convolution can be decomposed into a horizontal and a vertical pass. Each pass uses an n×1 Gaussian kernel, so that H×W×n multiplications and H×W×(n−1) additions are required per pass.

This reduces the per-pixel complexity from O(n²) to O(n).
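As a worked example of the saving, take a 640×480 input image and a 5-tap kernel. Both sizes appear elsewhere in this chapter; pairing them here is only for illustration.

```latex
\begin{align*}
\text{direct 2D:} \quad & H \times W \times n^2 = 480 \times 640 \times 25 = 7{,}680{,}000 \text{ multiplications} \\
\text{separable:} \quad & 2 \times H \times W \times n = 2 \times 480 \times 640 \times 5 = 3{,}072{,}000 \text{ multiplications}
\end{align*}
```

The separable form needs n/2 = 2.5 times fewer multiplications in this case, and the ratio grows linearly with n.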

3.5.2. Implement by integers and shifters

In computer systems, integer arithmetic is generally faster than floating-point arithmetic. In particular, many computer systems still have no hardware floating-point unit and support only integer operations. In addition, multiplications and divisions often take longer than simple operations such as addition, subtraction, or shifting; replacing them with shifts speeds up the computation and makes the algorithm more practical on computer systems of various grades. To allow an implementation with integers and shifts, we select the Gaussian kernel G(x) = G(y) = [1 4 8 4 1]. Fig. 3-10 compares three normalized Gaussian functions: the selected Gaussian kernel (Gau3), the ideal continuous Gaussian function with σ=1 (Gau1), and the ideal discrete Gaussian function (Gau2). It shows that the selected Gaussian kernel is close to the ideal discrete Gaussian function.

Fig. 3-10 Comparison of three Gaussian functions. Gau1: ideal continuous function, Gau2: ideal discrete function, Gau3: selected kernel

The selected Gaussian kernel has two advantages: first, the coefficients are integers; second, all the coefficients are powers of two, so the multiplications can be replaced by shifts. Based on the selected kernel, the convolution for the first Gaussian image I₁(x,y) (σ₁=1) can be written as

$$I_1(x,y) = \bigl(I(x,y) \otimes G(x)\bigr) \otimes G(y) \tag{11}$$

This can be achieved by Program 1, where T[x][y] is an intermediate array, I[x][y] is the gray-level intensity of I(x,y), and I1[x][y] is the Gaussian-smoothed gray-level intensity of I₁(x,y). The “<<” operator denotes a left shift. According to (1), the second Gaussian image I₂(x,y) can be obtained by convolving I₁(x,y) with the same Gaussian kernel, i.e.,

$$I_2(x,y) = \bigl(I_1(x,y) \otimes G(x)\bigr) \otimes G(y). \tag{12}$$

=============================================================

Program 1

/* Pass along x: kernel [1 4 8 4 1], with x4 written as <<2 and x8 as <<3. */
T[x][y] = (I[x-2][y] + I[x+2][y]) + ((I[x-1][y] + I[x+1][y]) << 2) + (I[x][y] << 3);

/* Pass along y on the intermediate array T. */
I1[x][y] = (T[x][y-2] + T[x][y+2]) + ((T[x][y-1] + T[x][y+1]) << 2) + (T[x][y] << 3);

=============================================================
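Program 1 gives only the per-pixel expressions. A self-contained version with the surrounding loops might look like the sketch below. The function names, the row-major layout, and the replication of border pixels are assumptions of this sketch; the text does not state how image borders are handled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clamp an index into [0, n-1]. Replicating the border is an assumption. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* One pass of the kernel [1 4 8 4 1] along a line of `len` samples that
   lie `stride` elements apart. Multiplications by 4 and 8 become shifts. */
static void pass_14841(const int32_t *src, int32_t *dst, int len, int stride)
{
    for (int i = 0; i < len; ++i) {
        int32_t m2 = src[clamp_index(i - 2, len) * stride];
        int32_t m1 = src[clamp_index(i - 1, len) * stride];
        int32_t c  = src[i * stride];
        int32_t p1 = src[clamp_index(i + 1, len) * stride];
        int32_t p2 = src[clamp_index(i + 2, len) * stride];
        dst[i * stride] = (m2 + p2) + ((m1 + p1) << 2) + (c << 3);
    }
}

/* Smooth a row-major width x height image of non-negative values.
   Returns 0 on success and -1 if the intermediate array cannot be allocated. */
static int gaussian_14841(const int32_t *in, int32_t *out, int width, int height)
{
    int32_t *tmp = malloc((size_t)width * height * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (int y = 0; y < height; ++y)   /* horizontal pass, row by row */
        pass_14841(in + (size_t)y * width, tmp + (size_t)y * width, width, 1);
    for (int x = 0; x < width; ++x)    /* vertical pass, column by column */
        pass_14841(tmp + x, out + x, height, width);
    free(tmp);
    return 0;
}
```

Because the kernel coefficients sum to 18, each call multiplies the image level by 18² = 324. Calling gaussian_14841 a second time on its own output yields the second Gaussian image.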

The same Program 1 can be reused by substituting I2[x][y] for I1[x][y] and I1[x][y] for I[x][y]. Note that the equivalent scale of I₂(x,y) is σ₂ = √2 based on equation (1). Finally, to compute the DOG image, the two Gaussian images must be normalized to the same level.

Therefore, the summation of the Gaussian kernel must be eliminated. The equation is written as

[Equation (13): not recoverable from the source; only the operators −, × and ∑ survive.]

The program to find the DOG image is implemented as:

It is important to check whether the value at each step exceeds the integer range of the computer system, and to trim some least significant bits (LSBs) from the operands if necessary. For an 8-bit gray-level input image, the maximum value of I₁(x,y) is 324×256 (324 = 18² is the sum of the coefficients of the 2D kernel), which requires an 18-bit signed integer, and the maximum value of I₂(x,y) is 324×324×256, which requires a 26-bit signed integer. For a 16-bit computer system, the program that calculates I₁(x,y) and I₂(x,y) can be changed to Program 3 to avoid integer overflow.
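The bit widths quoted above can be checked with a few lines of C. The helper names are hypothetical, and the sketch only reproduces the arithmetic; it does not show how Program 3 trims its operands.

```c
#include <stdint.h>

/* Bits needed by a signed two's-complement integer that must hold every
   value in [0, max_value]: the magnitude bits plus one sign bit. */
static int signed_bits_needed(uint64_t max_value)
{
    int bits = 0;
    while (max_value != 0) {
        ++bits;
        max_value >>= 1;
    }
    return bits + 1;
}

/* Number of LSBs to drop so that a value bounded by max_value fits into
   a signed word of word_bits bits (0 if it already fits). */
static int lsbs_to_trim(uint64_t max_value, int word_bits)
{
    int excess = signed_bits_needed(max_value) - word_bits;
    return excess > 0 ? excess : 0;
}
```

signed_bits_needed(324 * 256) gives 18 and signed_bits_needed(324ULL * 324 * 256) gives 26, matching the figures above. On a 16-bit system the first intermediate result would therefore have to lose at least 2 LSBs.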

3.5.3. Use acceleration table for dynamic threshold propagation

The dynamic threshold propagation often runs for more than ten iterations on a typical input image of size 640×480. Performing a whole-image scan in every iteration is very time-consuming. Therefore, an acceleration table is used to reduce the time required for propagation.

The acceleration table is composed of two first-in-first-out (FIFO) memories, namely FIFO A and FIFO B. When the propagation starts from the boundary set pixels, the coordinates of the adjacent non-boundary pixels, i.e., the pixels of Set_a^1, are sequentially stored into FIFO A. After one whole-image scan, all the boundary pixels have been visited and the locations of the adjacent non-boundary pixels have been saved. In the second iteration, the propagation starts from the pixels saved in FIFO A, which become the Set_b^2 pixels for this iteration. Again, the coordinates of the pixels adjacent to Set_b^2 form Set_a^2 and are stored into FIFO B. The process repeats the same flow, toggling between FIFO A and FIFO B in each iteration. As a result, a full-image scan is required only in the first iteration, and the computation is greatly reduced.
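The toggling of the two FIFOs can be sketched as follows. The sketch makes several assumptions that are not in the text: pixels are 4-connected, each FIFO is a plain array large enough for the whole image, and the per-pixel threshold update is replaced by simply recording the iteration in which a pixel is reached (the array level).

```c
#include <stdlib.h>

typedef struct { int *data; size_t head, tail; } Fifo; /* holds pixel indices */

static void fifo_push(Fifo *f, int v)  { f->data[f->tail++] = v; }
static int  fifo_pop(Fifo *f)          { return f->data[f->head++]; }
static int  fifo_empty(const Fifo *f)  { return f->head == f->tail; }
static void fifo_reset(Fifo *f)        { f->head = f->tail = 0; }

/* Queue the not-yet-reached 4-neighbours of pixel p and stamp them with iter. */
static void push_neighbours(int p, int w, int h, int *level, int iter, Fifo *out)
{
    const int dx[4] = { 1, -1, 0, 0 }, dy[4] = { 0, 0, 1, -1 };
    int x = p % w, y = p / w;
    for (int k = 0; k < 4; ++k) {
        int nx = x + dx[k], ny = y + dy[k];
        if (nx < 0 || nx >= w || ny < 0 || ny >= h)
            continue;
        int q = ny * w + nx;
        if (level[q] < 0) {            /* not reached yet */
            level[q] = iter;
            fifo_push(out, q);
        }
    }
}

/* On entry level[] is 0 on boundary-set pixels and -1 elsewhere. On return
   it holds the iteration in which each pixel was reached. Returns the number
   of iterations that reached new pixels, or -1 on allocation failure. */
static int propagate(int *level, int w, int h)
{
    size_t n = (size_t)w * h;
    Fifo a = { malloc(n * sizeof(int)), 0, 0 };
    Fifo b = { malloc(n * sizeof(int)), 0, 0 };
    if (a.data == NULL || b.data == NULL) { free(a.data); free(b.data); return -1; }
    Fifo *cur = &a, *next = &b;
    int iter = 1;

    /* The only whole-image scan: seed FIFO A from the boundary set. */
    for (int p = 0; p < (int)n; ++p)
        if (level[p] == 0)
            push_neighbours(p, w, h, level, iter, cur);

    while (!fifo_empty(cur)) {
        ++iter;
        while (!fifo_empty(cur))
            push_neighbours(fifo_pop(cur), w, h, level, iter, next);
        fifo_reset(cur);
        Fifo *t = cur; cur = next; next = t;   /* toggle FIFO A and FIFO B */
    }
    free(a.data);
    free(b.data);
    return iter - 1;
}
```

Every pixel enters a FIFO at most once, so the total work after the first scan is proportional to the number of pixels actually reached rather than to the image size times the number of iterations.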

Chapter 4 The Deformation Correction Method

In general, license plate characters undergo a certain degree of deformation when they are projected onto two-dimensional images. In mathematical terms, the deformation may be composed of any transformation such as rotation, scaling, affine transformation, or a mixture of these. It is difficult to recognize the characters without correcting the deformation beforehand. This chapter discusses a novel method for correcting the extracted characters in the proposed license plate recognition system.

4.1. Useful Properties for Deformation Correction

The extracted character candidates are not directly suitable for recognition because they have probably undergone geometric transformations such as rotation, affine deformation, or mixed deformation, caused by an abnormal camera location or capture angle. The method in this section tries to eliminate the geometric transformations of the character candidates and transform them into normal orientation for stable recognition. Fig. 4-1 shows some typical transformations of the normal plate image in Fig. 4-1(a): rotation in Fig. 4-1(b), affine deformation in Fig. 4-1(c), and mixed deformation in Fig. 4-1(d). Because invariant reference points are difficult to find, we utilize two useful properties of license plate characters to eliminate the geometric transformations. These properties may not be sufficient for a perfect recovery from the deformation; however, they can be used to detect the deformation and correct it to a certain degree, improving the success rate of recognition.

The first property used for correcting the geometric deformation of character candidates is the baseline. The baseline is an invisible line above which all the characters on a license plate are aligned. Among the geometric deformations of Fig. 4-1(b)-(d), the baseline can correct some, e.g., Fig. 4-1(b). Other deformations, e.g., Fig. 4-1(c)-(d), need more information than the baseline to be corrected for recognition. To correct these more complex deformations, a second property is adopted: the horizontal boundary lines of each candidate. Unlike the baseline, which belongs to a group of character candidates, the horizontal boundary lines are the left and right boundaries of a single character candidate. They can be used to normalize the slant angle of each character candidate so that it is brought into a state suitable for feature extraction and recognition.

Before locating the baseline, the character candidates are grouped by their sizes and positions. The rules of license plates [48], with an acceptable tolerance, are used to check whether character candidates belong to the same license plate. Candidates obeying the rules are grouped and considered as a single license plate. For each group of character candidates, a baseline is expected to exist below the group and can be found by the following methods.

Fig. 4-1 Typical geometric transformations in LPR systems: (a) normal plate, (b) rotational transformation, (c) affine deformation, (d) mixed deformation

4.2. Voting Boundary Method

The voting boundary method is suitable for finding the boundary lines of a group of pixels in an image. It works by assuming many straight-line candidates and detecting, by voting, the one that passes through the most edge pixels. The method is in some respects similar to the Hough transform [49] and shares its advantage of robust detection. However, it simplifies the computation of the Hough transform by replacing the complex trigonometric functions with simple additions and subtractions.

Fig. 4-2 A character candidate and the bottom pixels

Before the voting boundary method is applied, the edge pixels must be found in four directions, namely the top, bottom, left, and right boundary pixels. The edge pixels are the outermost pixels of an image group. For example, the bottom pixels are defined as the set of pixels that first appear when searching from bottom to top along each vertical pixel line. Fig. 4-2 shows an example of finding the bottom pixels, where the gray pixels are grouped by connected component analysis in the extraction stage and the pixels marked ‘B’ are the bottom pixels found according to the definition above.

The principle of the voting boundary method starts from similar triangles. In Fig. 4-3, for example, for the similar triangle pair ∆ABC and ∆ADE it is known that a×d = (a+b)×c. Let the line NG be one of the bottom boundary lines of the pixel group inside rectangle MNOP, and let the black circles be the corresponding bottom pixels. The distances from the bottom edge NO to each bottom pixel are stored in an array BP, where the array has w elements BP[x], x = 1 to w. If BP[x] is on the line NG, then it satisfies

$$x \times g = w \times BP[x], \tag{14}$$

where g is the height of point G above the bottom edge NO, so that g/w is the slope used in Fig. 4-4.

Including an error tolerance and rearranging the equation, BP[x] is regarded as being on line NG if it satisfies the inequality pair (15), where the variable r represents the thickness of the boundary line and can be adjusted according to the application.
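Filling the array BP follows directly from the definition of the bottom pixels. A minimal sketch, assuming a row-major binary mask with row 0 at the top and using −1 to mark empty columns; both are conventions of this sketch, and the array is indexed from 0 as usual in C rather than from 1:

```c
#include <stdint.h>

/* mask: row-major w x h image, non-zero where a pixel belongs to the group.
   bp[x] receives the distance from the bottom edge to the first group pixel
   met when column x is searched from bottom to top, or -1 if the column
   contains no group pixel. */
static void find_bottom_pixels(const uint8_t *mask, int w, int h, int *bp)
{
    for (int x = 0; x < w; ++x) {
        bp[x] = -1;
        for (int y = h - 1; y >= 0; --y) {
            if (mask[y * w + x]) {
                bp[x] = (h - 1) - y;
                break;
            }
        }
    }
}
```

The top, left, and right edge pixels can be collected in the same way by changing the scan direction.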

Fig. 4-3 Derivation of the voting boundary method

Each boundary pixel is voted into one of three sets according to the inequality pair: the first set, FIT, if the boundary pixel satisfies both inequalities, and otherwise the second set, UNDERFIT, or the third set, OVERFIT, depending on which inequality is violated (see Fig. 4-4). For each boundary line candidate, the distance between the start pixel p_1 at coordinate (x_1, y_1) and the end pixel p_n at coordinate (x_n, y_n) is measured as

$$d = (x_n - x_1) + (y_n - y_1)$$

and treated as the length of the boundary line.

The process of voting for boundary lines is drawn in Fig. 4-4. The computation is very simple because of the continuity of the x-axis: only one division, which gives the slope of the boundary line relative to the x-axis, is required at the beginning, and only a few additions and subtractions are required afterward. After the voting process, the line that gets the most votes in set FIT is taken as the true boundary line of the pixel group. Note that the sets UNDERFIT and OVERFIT can be used to delete improper character candidates if either of them is abnormally large.

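Fig. 4-4 Flow chart to vote boundary lines

1. Assign m = g/w and initialize x = 0, n = 0.

2. If n < BP[x] − 1, then UNDERFIT = UNDERFIT + 1; otherwise, if n ≥ BP[x] + 1, then OVERFIT = OVERFIT + 1; otherwise FIT = FIT + 1.

3. Update n = n + m and x = x + 1.

4. If x has reached the end, stop; otherwise return to step 2.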
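The steps of Fig. 4-4 translate almost directly into C. In the sketch below, the names Votes, vote_line and best_line are hypothetical, the constant 1 of the flow chart is generalized to the thickness r (an assumption), and the range of candidate end heights g tried by best_line is left to the caller because the text does not specify it.

```c
typedef struct { int fit, underfit, overfit; } Votes;

/* Vote the pixels bp[0..w-1] against the line through N with slope
   m = g / w, following the flow chart of Fig. 4-4. */
static Votes vote_line(const int *bp, int w, int g, int r)
{
    Votes v = { 0, 0, 0 };
    double m = (double)g / w;  /* the only division */
    double n = 0.0;            /* height of the candidate line at column x */
    for (int x = 0; x < w; ++x) {
        if (n < bp[x] - r)
            ++v.underfit;
        else if (n >= bp[x] + r)
            ++v.overfit;
        else
            ++v.fit;
        n += m;                /* only additions after the first step */
    }
    return v;
}

/* Try every end height g in [g_min, g_max] and return the first one that
   collects the most FIT votes. */
static int best_line(const int *bp, int w, int g_min, int g_max, int r)
{
    int best_g = g_min, best_fit = -1;
    for (int g = g_min; g <= g_max; ++g) {
        Votes v = vote_line(bp, w, g, r);
        if (v.fit > best_fit) {
            best_fit = v.fit;
            best_g = g;
        }
    }
    return best_g;
}
```

Several neighbouring values of g can collect the same number of FIT votes when r is large relative to w, so a real implementation would need a tie-breaking rule; this sketch simply keeps the first maximum.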

4.3. The Correction Method

The baseline is found by first locating the bottom pixels of each character candidate and then applying the voting boundary method to detect the line that passes through most of the bottom pixels.

Once the baseline is detected, the next step is to correct the rotation angle of the character candidates. As discussed above, the characters on a license plate are aligned above the baseline. If the baseline found by the voting process is rotated, all the character candidates on it are rotated too. Therefore, the character candidates can be rotated back to the normal position according to the detected baseline. During the recovery process, each character candidate is rotated and its preliminary features, such as width, height, and occupancy, are re-measured for the feature extraction in the next stage.

For each recovered character candidate, the voting boundary method that was used to find the baseline of multiple characters is applied again to find the horizontal boundary lines of the single candidate. Unlike the baseline case, the conditions for detecting horizontal boundary lines are adjusted to the characteristics of single characters.

After the voting boundary process, the true boundary line is selected according to the following two rules. First, the number of votes in set UNDERFIT must be zero, which means that all the edge pixels must lie inside the boundary lines. Second, instead of the number of votes in set FIT, the length of the boundary line is the key factor in selecting the true boundary line. The length of a boundary line is defined as the length from the first edge pixel to the last one in set FIT. The boundary line candidate with the longest length is selected as the true boundary line if its length exceeds a pre-defined threshold. For characters with curved boundaries, the thickness r in (15) can be adjusted to retain accurate results. A typical choice for characters of size 32×32 is r=2.

Based on the left and right boundary lines, each candidate is adjusted to balance the left and right boundaries. Fig. 4-5 shows an example of adjusting a deformed character based on the detected boundary lines: Fig. 4-5(a) is the source character and Fig. 4-5(b) is the character after adjustment, called the adjusted character. Rectangles ABCD and A′B′C′D′ are the rectangular borders of the source character and the adjusted character, respectively. w and w′ are the widths of the character before and after adjustment. The character height h is unchanged by the adjustment. Node1 to node4 are left edge pixels and node5 to node8 are right edge pixels. Node1 and node4 are the start and end pixels of the left boundary line, and node5 and node7 are those of the right boundary line. Our target is to arrange the left and right boundary lines symmetrically, i.e., any two pixels with the same y-coordinate on the left and right boundary lines have the same distance to the left and right borders of the outer rectangle, respectively. Once the deformation is corrected, the character candidates are passed to the next stage for recognition.

Fig. 4-5 Compensation of geometric deformation: (a) source character, (b) adjusted character

Chapter 5 The Recognition Method

After deformation correction, a novel method named the accumulated gradient projection vector method, or AGPV method for short, is applied to recognize the extracted character candidates.

5.1. Why AGPV

When dealing with the detection or recognition of characters, the edge (line) is a basic component that can never be ignored. Straight edges have a simple representation and stable characteristics, which make them easier to detect than other attributes of an image.

Numerous edge detection methods can be found in the literature [16]-[20], among which the Hough transform [16] is well known for its stable and reliable performance. However, the Hough transform is also known for its high cost in computation and memory. Although some methods [21][22] have been proposed to improve the speed and reduce the memory consumption of the Hough transform, they are sometimes still insufficiently accurate for some applications. In our study, the Hough transform provides an important concept: stable performance can be achieved by means of accumulation.

In this work we propose a novel accumulated gradient projection method for the detection of edges. The new method adopts the same concept as the Hough transform, accumulating pixels of similar attributes in order to achieve stable and reliable results. In addition, two more concepts are included to ensure the reliability of the method. First, the new method projects the pixels of similar gradient orientations onto an axis chosen parallel to the majority of these gradient orientations. In general, the gradient orientations of edge pixels are perpendicular to the direction of the edge, so the projection achieves the best measuring accuracy because the edges are measured from their perpendicular direction. Second, instead of referring to pixel intensity, which may be sensitive to illumination change, the new method accumulates the gradient magnitudes, which are relatively more stable against illumination change. The result is also stable against noise because it relies on the majority of the accumulation, which minimizes the effect of randomly distributed noise.

5.2. The AGPV Methods

There are four stages in recognizing a character with the AGPV method. First, determine the axes, including the nature axes and the augmented axes. Second, calculate the AGPVs based on these axes. Third, normalize the AGPVs for comparison with the standard ones. Fourth, match them against the standard AGPVs to validate the recognition result. The procedure is explained in detail in the following sections.

5.2.1. Determine Axes

Before discussing the AGPV method, it is important to introduce an essential property, the axes. An axis of a character is a specific orientation onto which the gradients of grouped pixels are projected and accumulated to form the desired feature vector. An axis is represented by a line that has the specific orientation and passes through the center of gravity of the grouped pixels. The axes of a character can be separated into two classes, named nature axes and augmented axes. The two classes are different in characteristics and

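From the document: Application of Scale-Space Binarization and Accumulated Gradient Projection Methods to the Extraction and Recognition of License Plate Characters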