Chapter 2. Background
2.3 Recognition
Shape is also an important feature of static waymarks. The authors of [3]
apply a corner detector to recognize object shape, as shown in Figure 2-17. [4] uses the relations between boundary lines (Figure 2-18) to determine the shape. A more recent approach [5] uses the distance to borders (DtBs) as the shape feature, as illustrated in Figure 2-19.
Figure 2-17 Corner detection result in [3]
Figure 2-18 The relation of boundary line in [4]
Figure 2-19 (a) The DtBs of a triangle (b) the feature vectors of (a) by DtBs [5]
After the shape recognition, we want to recognize the content inside the
shape. A current approach [2] applies the Histogram of Oriented Gradients (HOG) descriptor to obtain object features. In [2], the image is first divided into small connected regions, called cells. Each cell then gathers a histogram of gradient directions over the pixels within it. Finally, the histograms of all cells are concatenated into the descriptor. For better accuracy, the histograms are normalized by an intensity measure taken over a larger region of the image, called a block. This normalization makes the descriptor more robust to lighting changes.
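To make the cell-and-block computation concrete, the following sketch builds per-cell gradient histograms and L2-normalizes them over 2x2 blocks. It is a simplified illustration, not the exact descriptor of [2]; the 8x8 cell size, 9 bins, and 2x2 block geometry are assumed defaults.

```python
import numpy as np

def hog_sketch(img, cell=8, bins=9):
    """Simplified HOG: per-cell gradient histograms, L2-normalized per 2x2 block."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0            # unsigned orientation
    H, W = img.shape
    ch, cw = H // cell, W // cell                           # number of cells per axis
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hist[i, j], idx, m)                   # magnitude-weighted votes
    # block normalization: each 2x2 group of cells is L2-normalized together
    feats = []
    for i in range(ch - 1):
        for j in range(cw - 1):
            block = hist[i:i+2, j:j+2].ravel()
            feats.append(block / (np.linalg.norm(block) + 1e-9))
    return np.concatenate(feats)

v = hog_sketch(np.random.rand(32, 32))
```

The block-normalization step is what gives the descriptor its robustness to lighting changes.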
[10] proposes the Speeded Up Robust Features (SURF). As in HOG, the input image is divided into many sub-regions. Each sub-region is analyzed with Haar wavelets, and the filter responses in the horizontal and vertical directions are denoted dx and dy respectively. The wavelet responses dx and dy are summed over each sub-region to form two feature values. To capture additional information, the sums of the absolute responses, sum|dx| and sum|dy|, are also extracted. Each sub-region therefore has a four-dimensional descriptor vector v = (sum dx, sum dy, sum|dx|, sum|dy|). Figure 2-20 shows three examples of the SURF descriptor.
Figure 2-20 Three examples of SURF [10]
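The four sums per sub-region can be sketched as follows. This is a sketch of the idea, not the exact SURF implementation: the Haar wavelet responses are approximated by plain finite differences, and the 4x4 sub-region grid is an assumption.

```python
import numpy as np

def surf_subregion_descriptor(patch, grid=4):
    """Per sub-region, return (sum dx, sum dy, sum |dx|, sum |dy|) as in [10].
    Haar responses are approximated here by simple finite differences."""
    dx = np.diff(patch.astype(float), axis=1)[:-1, :]       # horizontal response
    dy = np.diff(patch.astype(float), axis=0)[:, :-1]       # vertical response
    h, w = dx.shape
    sh, sw = h // grid, w // grid
    desc = []
    for i in range(grid):
        for j in range(grid):
            rx = dx[i*sh:(i+1)*sh, j*sw:(j+1)*sw]
            ry = dy[i*sh:(i+1)*sh, j*sw:(j+1)*sw]
            desc.extend([rx.sum(), ry.sum(), np.abs(rx).sum(), np.abs(ry).sum()])
    return np.array(desc)

d = surf_subregion_descriptor(np.random.rand(20, 20))
```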
Chapter 3.
PROPOSED METHOD
The goal of the system is to detect and recognize static waymarks. The system includes two stages: 1) Bottom-up and 2) Top-down learning.
1) Bottom-up: In this stage, the static waymarks in the image are quickly identified.
2) Top-down learning: In this stage, we extract features of the static waymarks that are useful for recognition, using machine learning techniques to learn them.
The input image goes through the Bottom-up operation and a corresponding saliency map is generated. The saliency map shows where the interesting regions are and where the static waymarks lie. Second, we use the features obtained from Top-down learning to perform recognition.
Figure 3-1 Block diagram of the proposed system
3.1. BOTTOM-UP
In the Bottom-up stage, we want to quickly identify the locations of the static waymarks.
Here, we exploit physical properties to design our Bottom-up algorithm. Static waymarks such as fire hydrants and signboards have the following properties: 1) strong contrast with respect to their surrounding regions, 2) regular shape, and 3) simple texture. The three properties are discussed in the following sections.
The flow chart of the proposed Bottom-up is illustrated in Figure 3-2. First, the input image is decomposed into three color maps: R color, G color, and B color.
Second, these color maps go through three operations based on the three physical properties: 1) strong contrast, 2) regular shape, and 3) simple texture. Finally, a salient map is generated for each color map, and the three maps are combined.
Figure 3-2 Block diagram of the proposed Bottom-up
3.1.1. LINEAR FILTERING OF IMAGE DATA
Because the colors of the static waymarks are simple, similar to the approach of [6], we decompose an input image into a few feature maps, including the R color, G color, and B color maps, where r, g, and b denote the red, green, and blue components of the input image. For each color, negative values are set to zero.
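The decomposition can be sketched as below. Since the thesis only states that negative values are clipped, the opponent-style channel formulas are an assumption borrowed from the style of [6] (Itti et al.); the actual maps may differ.

```python
import numpy as np

def color_maps(rgb):
    """Decompose an RGB image into R, G, and B color maps with negatives
    clipped to zero. The opponent-like formulas are assumed, not stated."""
    r, g, b = [rgb[..., k].astype(float) for k in range(3)]
    R = np.clip(r - (g + b) / 2.0, 0, None)   # red minus the other channels
    G = np.clip(g - (r + b) / 2.0, 0, None)
    B = np.clip(b - (r + g) / 2.0, 0, None)
    return R, G, B

img = np.zeros((2, 2, 3)); img[..., 2] = 200.0   # pure blue patch
R, G, B = color_maps(img)
```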
In Figure 3-3, we can notice that the objects, (a) a fire hydrant, (b) a blue signboard, and (c) a disabled parking signboard, have strong contrast compared to their surroundings.
This is because they are designed to be salient to people. In the top-down stage, we also want to analyze the color of the objects. Thus, we want an algorithm that can not only capture the strong contrast but also support color analysis. Therefore, we apply the feature-pair proposed by [9].
(a) fire hydrant (b) blue signboard (c) disabled parking signboard
Figure 3-3 Examples of the static waymarks
3.1.2. STRONG CONTRAST
For each of the R color, G color, and B color, we compute the feature-pair distribution as proposed in [9].
Take the R component for example. We first analyze the relation between a central point and its 8-connected neighbors. In Figure 3-4, we define the eight neighbors of a point E and form the eight 2-D coordinates (E,A), (E,B), (E,C), (E,D), (E,F), (E,G), (E,H), and (E,I) as the R color-pairs, where A, B, ..., I are the R color values of the neighbors of E. These eight R color-pairs contribute to the R color-pair distribution.
Figure 3-4 The main idea of feature-pair
Figure 3-5 The feature-pair distribution
By collecting the R color-pairs of all image pixels, we obtain the R color-pair distribution shown in Figure 3-6. We can notice that the R color-pairs in smooth regions lie around the 45° line, whereas the R color-pairs across edges lie far from the 45° line.
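Collecting the 8-neighbour color-pairs for one channel can be sketched as below; the wrap-around borders introduced by the shifts are simply dropped.

```python
import numpy as np

def color_pair_distribution(channel):
    """Collect 8-neighbour value pairs (E, N) for every interior pixel E, as in
    [9]. Pairs in smooth regions fall near the 45-degree line; pairs across
    edges fall far from it."""
    c = channel.astype(float)
    pairs = []
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for di, dj in shifts:
        n = np.roll(np.roll(c, di, axis=0), dj, axis=1)
        inner = (slice(1, -1), slice(1, -1))   # drop wrapped-around border
        pairs.append(np.stack([c[inner].ravel(), n[inner].ravel()], axis=1))
    return np.concatenate(pairs)

img = np.full((10, 10), 100.0)
img[:, 5:] = 200.0                            # a vertical edge
p = color_pair_distribution(img)
off_diag = np.abs(p[:, 0] - p[:, 1])          # distance from the 45-degree line
```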
Figure 3-6 shows an example of the R color-pair distribution of the fire hydrant image. It is clear that the road and the grass are the major backgrounds of the image.
Therefore, the R color-pairs of these two regions form two major clusters in the R color-pair distribution. On the other hand, the fire hydrant maps to a smaller cluster in the top-right corner of the distribution. Moreover, the R color-pairs over the road-grass boundary and the fire hydrant-grass boundary form four clusters (shown in green) far away from the 45° line, as shown in Figure 3-6.
In the R color-pair distribution, we can easily notice that the boundary between the fire hydrant and the grass shows a stronger contrast than the road-grass boundary. From this, we can conclude two facts: (1) the fire hydrant is "less common" than the road and the grass; and (2) the fire hydrant has a stronger contrast compared to its background. We therefore conclude that the fire hydrant may attract the attention of most observers. We form the G color-pair and B color-pair distributions in the same way.
Figure 3-6 The example of R color-pair distribution
Similar to [10], we form a 3-D histogram by dividing the plane of feature-pair values into uniform cells and counting the number of feature-pairs in each cell. Most clusters lie around the diagonal of the histogram; the largest cluster corresponds to the background of the image; the foreground objects correspond to smaller clusters; and the clusters away from the diagonal correspond to edges.
Figure 3-7 The 3-D histogram
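Binning the feature-pair plane into uniform cells can be sketched with a 2-D histogram (the "3-D" view is the two pair axes plus the count axis); the 32-cell grid and 0-255 value range are assumptions.

```python
import numpy as np

def pair_histogram(pairs, value_range=256, n_cells=32):
    """Bin the 2-D feature-pair plane into uniform cells; the cell counts form
    the 3-D histogram described in the text."""
    hist, _, _ = np.histogram2d(
        pairs[:, 0], pairs[:, 1],
        bins=n_cells, range=[[0, value_range], [0, value_range]])
    return hist

pairs = np.array([[10.0, 12.0], [10.0, 11.0], [200.0, 50.0]])
h = pair_histogram(pairs)
```

The two near-diagonal pairs land in the same cell (a smooth-region cluster), while the (200, 50) pair lands in an off-diagonal cell (an edge).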
Based on the 3-D histogram, a contrast-weighting algorithm is proposed to weigh the saliency degree of each cell around the diagonal. This weighting algorithm contains three main parts: the "edge weight", which gathers the information provided by the off-diagonal cells; the "self weight", which determines whether a diagonal cell corresponds to a visually salient region; and the "foreground weight", which judges whether a diagonal cell does not correspond to the background.
The three parts are respectively based on three ideas: 1) the stronger the edge, the more salient the object; 2) the smaller the object, the more salient it is; and 3) the more an object differs from the background, the more salient it is. We combine the three properties as formulated below.
1) Edge weight
The computation of the edge weight is illustrated in Figure 3-8. The formulation is

    Edge_weight = sum over (i, j) in the off-diagonal cells of the 3-D histogram of hist(i, j) * d^2,   Eq. 3-4

where d is the distance from cell (i, j) to the diagonal.
Figure 3-8 The computation of edge weight
2) Self weight
The self weight reflects the size of an object, which indicates how much contrast it can provide. The smaller the object, the more salient it is. The formulation is

    Self_weight = sum over (i, j) in the histogram cells of the object of hist(i, j),   Eq. 3-5

3) Foreground weight
The foreground weight measures how different an object is from the background. The more an object differs from the background, the more salient it is. In the 3-D histogram, the maximum of the histogram on the diagonal corresponds to the background, and the foreground weight is the distance from a diagonal cell to this maximum, as illustrated in Figure 3-9. The formulation is

    Foreground_weight = d,   Eq. 3-6

where d is the distance from the diagonal cell to the histogram maximum.
Figure 3-9 The 3-D histogram
3.1.3. REGULAR SHAPE
As we know, the static waymarks are different from natural scenery such as trees and flowers. The main difference is their shape. In Figure 3-10, we can see that the shapes of (a) the fire hydrant, (b) the blue signboard, and (c) the disabled parking signboard are more regular than the surroundings.
(a) Fire hydrant (b) blue signboard (c) disabled parking signboard
Figure 3-10 Examples of the static waymarks
A regular shape is composed of straight lines, such as horizontal, vertical, and oblique lines, and circular arcs. Therefore, a good way to exploit this property is to detect the boundary. We find that filter banks have this capability.
Using the filters in Figure 3-11, we convolve with the R color, G color, and B color maps, and objects with regular shapes respond strongly.
Figure 3-11 The filter that can detect boundary [11]
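The boundary-detection step can be sketched as below. The two small edge filters here are assumed examples standing in for the actual bank of [11], and the dependency-free correlation loop replaces a library convolution.

```python
import numpy as np

def directional_responses(channel, filters):
    """Convolve a colour map with a bank of directional filters and keep the
    maximum absolute response per pixel; straight boundaries respond strongly."""
    c = channel.astype(float)
    out = np.zeros_like(c)
    for f in filters:
        kh, kw = f.shape
        resp = np.zeros_like(c)
        # valid-region 2-D correlation, written out to stay dependency-free
        for di in range(kh):
            for dj in range(kw):
                resp[:c.shape[0]-kh+1, :c.shape[1]-kw+1] += (
                    f[di, dj] * c[di:di+c.shape[0]-kh+1, dj:dj+c.shape[1]-kw+1])
        out = np.maximum(out, np.abs(resp))
    return out

horizontal = np.array([[-1.0, -1.0], [1.0, 1.0]])   # responds to horizontal edges
vertical = horizontal.T                              # responds to vertical edges
img = np.zeros((8, 8)); img[:, 4:] = 1.0             # a vertical boundary
r = directional_responses(img, [horizontal, vertical])
```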
3.1.4. SIMPLE TEXTURE
As we observe, the static waymarks are designed for easy identification, so they must be designed to differ from natural scenery. We notice that they are not complicated and have simple texture compared to natural scenery.
The paper [8] mentions that if the signal is a pulse, as in Figure 3-12 (a), the signal reconstructed from the phase spectrum alone will have a high response at the location of the input pulse.
Figure 3-12 (a) Original signal (pulse) (b) Reconstruction result using phase [8]
We find that these specially designed objects also behave like a pulse signal in a natural image. Take Figure 3-13 (a) and (b) for example: the blue signboard in the image has simple texture, and in the B (blue) channel it is the same as the situation depicted in Figure 3-12 (a).
Figure 3-13 (a) Blue signboard (b) B color of the blue signboard
Conversely, if the signal is not a pulse but a regular signal such as a sine wave, as in Figure 3-14 (a), then the reconstruction result in (b) has a lower response. We believe natural scenery has this property because it has a more complex structure.
Figure 3-14 (a) Original signal (sine) (b) Reconstruction result using phase [8]
The reconstruction using phase is computed as

    sM(x, y) = g(x, y) * || F^-1[ e^(j P(f)) ] ||^2

where P(f) is the phase spectrum of the image, F^-1 is the inverse Fourier transform, g(x, y) is a 2-D Gaussian filter, * denotes convolution, and sM(x, y) is the reconstruction result using phase.
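The phase-only reconstruction of [8] can be sketched directly: keep only the phase of the Fourier transform, invert, square, and smooth with a Gaussian. The 7-tap separable Gaussian used here is an assumed stand-in for g(x, y).

```python
import numpy as np

def phase_saliency(img, sigma=2.0):
    """Phase-spectrum reconstruction: sM = g * |F^-1[exp(j P(f))]|^2."""
    F = np.fft.fft2(img.astype(float))
    phase_only = np.exp(1j * np.angle(F))           # unit magnitude, phase kept
    recon = np.abs(np.fft.ifft2(phase_only)) ** 2
    # small separable Gaussian smoothing, dependency-free
    k = np.exp(-np.arange(-3, 4) ** 2 / (2 * sigma ** 2)); k /= k.sum()
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, recon)
    sm = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, sm)
    return sm

sig = np.zeros((16, 16)); sig[8, 8] = 1.0           # pulse-like input
s = phase_saliency(sig)
```

As the text predicts for a pulse, the response peaks exactly at the pulse location.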
3.1.5. COMBINATION
Based on the three physical properties, we obtain three saliency maps for each color map, each normalized to one. The three saliency maps are combined using their variances as weights (Eq. 3-13). The R, G, and B saliency maps are generated by Eq. 3-10, 3-11, and 3-12, respectively.
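Since Eq. 3-10 to 3-13 are not legible in the source, the following is only a plausible sketch of variance-weighted combination: each map is normalized and weighted by its variance, on the idea that a map with a few strong peaks is more informative than a flat one. The min-max normalization is an assumption.

```python
import numpy as np

def combine_saliency(maps):
    """Normalize each saliency map to [0, 1], weight it by its variance, and
    sum. A sketch of the Eq. 3-13 idea, not the exact formula."""
    combined = np.zeros_like(maps[0], dtype=float)
    for m in maps:
        m = m.astype(float)
        rng = m.max() - m.min()
        m = (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
        combined += m.var() * m                 # flat maps get near-zero weight
    return combined

flat = np.full((4, 4), 0.5)                     # uninformative map
peaky = np.zeros((4, 4)); peaky[2, 2] = 1.0     # one strong peak
out = combine_saliency([flat, peaky])
```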
3.2. TOP-DOWN LEARNING
In this section, we discuss how we design a classifier for the recognition of the static waymarks. In order to achieve robustness to illumination, scale, and rotation, we choose color and shape as the features; then, an SVM classifier is trained on the chosen features. Finally, we apply the Synthetic Minority Over-sampling Technique (SMOTE) to solve the problem of imbalanced data in machine learning. The following sections cover 1) color analysis, 2) shape analysis, 3) the SVM classifier, and 4) the SMOTE algorithm.
3.2.1. COLOR ANALYSIS
As we know, the RGB color space is sensitive to illumination change;
therefore, we choose the HSI space for color analysis due to its robustness to illumination change.
To obtain color features, we compute the histograms of hue and saturation and do not consider intensity. An example is shown in Figure 3-15 (b) and (c).
Figure 3-15 (a) Blue signboard (b) the saturation of (a) quantized into ten bins (c) the hue of (a) quantized into ten bins
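The ten-bin hue and saturation histograms can be sketched as below. As an assumption, the standard-library HSV conversion is used as a stand-in for the HSI conversion; intensity is ignored either way.

```python
import colorsys
import numpy as np

def hs_histograms(rgb, n_bins=10):
    """Per-pixel hue and saturation, each quantized into n_bins histogram bins;
    intensity is discarded, as in Section 3.2.1."""
    h_hist = np.zeros(n_bins)
    s_hist = np.zeros(n_bins)
    for px in rgb.reshape(-1, 3):
        h, s, _v = colorsys.rgb_to_hsv(*(px / 255.0))   # HSV as an HSI stand-in
        h_hist[min(int(h * n_bins), n_bins - 1)] += 1
        s_hist[min(int(s * n_bins), n_bins - 1)] += 1
    return h_hist / h_hist.sum(), s_hist / s_hist.sum()

blue_patch = np.tile(np.array([0, 0, 255], dtype=np.uint8), (8, 8, 1))
h_hist, s_hist = hs_histograms(blue_patch)
```

A uniformly blue patch concentrates all its mass in a single hue bin and the top saturation bin, which is exactly the kind of compact signature the classifier relies on.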
3.2.2. SHAPE ANALYSIS
Color information alone is not enough to represent the features of the static waymarks, so the shape of the object is also taken into consideration. There are many methods to extract shape features; in our system, we use a filter-based approach because the shape of the static waymarks is composed of lines in different directions. Here we use six directional lines to represent the shape, as shown in Figure 3-16. In order to be invariant to lighting change, we convolve the hue and saturation of the static waymarks with these filters, as shown in Figure 3-17.
Figure 3-16 The six directional lines
Figure 3-17 The six directional filters
The challenge of shape analysis is the rotation of the object. To handle this issue, we use the descriptor in [12]: we divide the patch into 24 regions, as shown in Figure 3-18 (a). The sum of the values in each region constitutes a 24-dimensional feature vector. If the object rotates, we have to adjust the feature vector accordingly, as shown in Figure 3-18.
Figure 3-18 (a) Original (b) rotated
To achieve rotation invariance, we have to adjust the feature vector. Here we choose the "maximum shift" method. In Figure 3-19, we divide the patch into 8 regions and compute the sum of each region.
Figure 3-19 Maximum shift
If the maximum is region 5, as in Figure 3-20 (a), we shift regions 5, 13, and 21 in Figure 3-20 (b) to the top, as shown in Figure 3-20 (c).
Figure 3-20 (a) Find the maximum region (b) the corresponding regions (c) the result of the shift
However, the "maximum shift" alone cannot completely solve the rotation problem. The example in Figure 3-21 demonstrates this: the object in (a) is rotated in (b). With "maximum shift" only, the result of filter 1 in (a) equals the result of filter 2 in (b). Therefore, we also have to adjust the filters.
Figure 3-21 (a) Find the maximum region (b) the corresponding regions (c) the result of the shift
Here we use the "minimum shift" to adjust the filters: we calculate the sum of each filter's result and shift the result with the minimum sum to the top, as shown in Figure 3-22.
Figure 3-22 Adjust the result of filters
Through these two shifting operations, our proposed method is invariant to rotation.
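The two shifting operations can be sketched as below. The (rings x sectors) layout of the 24 regions is an assumption; the source does not fully specify the region geometry.

```python
import numpy as np

def maximum_shift(desc):
    """Rotate the ring of region sums so the maximum region of the first ring
    comes first; all rings are shifted by the same offset ('maximum shift').
    desc: array of shape (n_rings, n_sectors)."""
    k = int(np.argmax(desc[0]))
    return np.roll(desc, -k, axis=1)

def minimum_shift(responses):
    """Reorder the per-filter results so the filter whose result has the
    minimum total sum comes first ('minimum shift')."""
    sums = [r.sum() for r in responses]
    k = int(np.argmin(sums))
    return responses[k:] + responses[:k]

d = np.array([[1, 5, 2, 3], [10, 20, 30, 40]])
shifted = maximum_shift(d)
resp = minimum_shift([np.array([3.0]), np.array([1.0]), np.array([2.0])])
```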
3.2.3. SUPPORT VECTOR MACHINE
We apply an SVM classifier for data classification. In Figure 3-23, there are two classes: one marked in blue and the other in white. We want to find a hyperplane that separates the two classes; H1, H2, and H3 are all possible separating hyperplanes. The main idea of the SVM classifier is to choose the hyperplane with the maximum margin between the two sets; therefore, H3 is selected as the decision plane.
Figure 3-23 Three hyperplanes that separate the two sets
We have training data labeled {x_i, y_i}, where i = 1, ..., l, y_i in {-1, 1}, and x_i in R^d.
In our proposed method, the vectors x_i are the color and shape features, the labels y_i are "1" for one class and "-1" for the other, d is the dimension of x_i, and l is the number of training samples. We find the hyperplane {w, b} that separates the two classes, as shown in Figure 3-24. The vector w is the normal to the hyperplane, |b| / ||w|| is the perpendicular distance from the hyperplane to the origin, and ||w|| is the Euclidean norm of w.
Figure 3-24 Find the hyperplane
In order to separate the two sets, the following constraints should be satisfied:

    y_i (w^T x_i + b) - 1 >= 0,  for all i

The solution can be expressed as a linear combination of the training vectors:

    w = sum from i = 1 to l of alpha_i y_i x_i
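As an illustration of the hyperplane {w, b} and the margin constraint, the sketch below trains a minimal linear SVM by sub-gradient descent on the hinge loss. This is a stand-in for whatever SVM solver the thesis actually used, and the toy data, learning rate, and regularization constant are all assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM via sub-gradient descent on the hinge loss.
    X: (l, d) feature vectors; y: labels in {-1, +1}. Returns (w, b)."""
    l, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points violating y(w^T x + b) >= 1
        if viol.any():
            grad_w = lam * w - (y[viol][:, None] * X[viol]).mean(axis=0)
            grad_b = -y[viol].mean()
        else:
            grad_w, grad_b = lam * w, 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# linearly separable toy data (an assumption for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```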
3.2.4. SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE (SMOTE)
A problem in our proposed method is imbalanced data: there are about 3000 negative samples but only about 40 static-waymark samples, a ratio of roughly 75 to 1.
In Figure 3-25 (a), if the training data is balanced, the estimated decision boundary (solid black line) approximates the true boundary (solid red line) even with a few mislabeled samples (black '+' symbols). On the contrary, if the training data is imbalanced (Figure 3-25 (b)), the estimated decision boundary may be very far from the true boundary even with only a few mislabeled samples.
Figure 3-25 (a) Balanced data (b) imbalanced data
Thus, when the training data is imbalanced, we synthesize new data for the minority set by applying the SMOTE algorithm in [13]. Synthetic samples are generated as follows: take the difference between the feature vector (sample) under consideration and one of its nearest neighbors, multiply this difference by a random number between 0 and 1, and add it to the feature vector under consideration. This selects a random point along the line segment between the two feature vectors, which effectively forces the decision region of the minority class to become more general. The pseudo-code for SMOTE from [13] follows.
Algorithm SMOTE(T, N, k)
Input: Number of minority class samples T; amount of SMOTE N%; number of nearest neighbors k
Output: (N/100) * T synthetic minority class samples
(* If N is less than 100%, only a random percent of the minority class samples will be SMOTEd. *)
(* The amount of SMOTE is assumed to be in integral multiples of 100. *)
numattrs = Number of attributes
Sample[ ][ ]: array for original minority class samples
newindex: keeps a count of the number of synthetic samples generated, initialized to 0
Synthetic[ ][ ]: array for synthetic samples
(* Compute k nearest neighbors for each minority class sample only. *)
for i <- 1 to T
    Compute the k nearest neighbors of i, and save the indices in nnarray
    Populate(N, i, nnarray)
endfor
Populate(N, i, nnarray) (* Function to generate the synthetic samples. *)
while N != 0
    Choose a random number between 1 and k, call it nn. This step chooses one of the k nearest neighbors of i.
    for attr <- 1 to numattrs
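The interpolation at the heart of SMOTE can also be sketched compactly in a few lines; this follows the description above (neighbor choice, random gap in [0, 1]), but the brute-force neighbor search and fixed seed are simplifications for illustration.

```python
import random

def smote(samples, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples as in [13]: pick a sample,
    pick one of its k nearest neighbours, and interpolate at a random point
    on the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        x = samples[i]
        # brute-force k nearest neighbours among the other minority samples
        others = sorted((s for j, s in enumerate(samples) if j != i),
                        key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)))
        nn = others[rng.randrange(min(k, len(others)))]
        gap = rng.random()                       # random point along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new = smote(minority, n_new=10)
```

Because every synthetic point lies on a segment between two minority samples, the new points stay inside the convex hull of the minority class, which is what makes its decision region more general rather than noisier.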
Chapter 4.
EXPERIMENTAL RESULTS
In this chapter, we show and discuss our experimental results. In the computer simulation, the proposed algorithm is coded in Matlab without code optimization and tested on a PC with an Intel® Core™2 Duo CPU running at 3 GHz. The first experimental stage is to capture videos on campus. We prepare three videos captured at different places covering four different static waymarks. Figures 4-1 and 4-2 show the three places and the selected waymarks.
Figure 4-1 The overview of three places
Figure 4-2 Left to right: blue signboard, fire hydrant, Disabled signboard, and parking signboard
In Table 4-1, for each static waymark, we randomly selected forty image patches from the video, and the SMOTE algorithm was used to synthesize two hundred image patches as the positive training samples. In addition, three thousand negative image patches were extracted from the same video. The video was captured in the afternoon.
Table 4-1 Number of training images
Static waymark        Number selected    After SMOTE
Blue signboard        40                 200
Fire hydrant          40                 200
Disabled signboard    40                 200
Parking signboard     40                 200
Negative images       3000               X
In Figure 4-3, we use four colored bounding boxes to indicate the four different static waymarks.
Figure 4-3 Different colored bounding boxes for the different static waymarks
The results for the four static waymarks, with variations in rotation and scale, are shown in Figures 4-4 and 4-5.
Figure 4-4 Different scales of static waymarks in the afternoon
Figure 4-5 Static waymarks in the afternoon with rotation variations
We also apply our algorithm to the videos captured at noon and in the evening. The results are shown in Figures 4-6 and 4-7.
Figure 4-6 Static waymarks at noon with rotation variations
Figure 4-7 Static waymarks in the evening with rotation variations
Table 4-2 shows the detection rate and false alarm rate for each static waymark at noon, in the afternoon, and in the evening. The number of testing images for each static waymark is about six hundred. From Table 4-2, we find that varying illumination has an important effect on the detection rate. Moreover, if there are many objects similar to a static waymark, as with the disabled signboard and the parking signboard, its detection rate is lower than the others.
Table 4-2 Detection rate and false alarm at different times for each static waymark
Compared to combining the three methods (Table 4-2), applying the three methods separately (Table 4-3) gives a slightly lower detection rate in the afternoon and a much lower detection rate at night, because of the varying illumination. This result shows that combining the three methods yields good bottom-up detection and a good detection rate.
On the other hand, Table 4-4 shows the average computing time, including detection and recognition, for the different bottom-up detection methods. The number of images is 2520 and the image size is 360x240.