Chapter 2. Background
2.3 Recognition
Shape is also an important feature of static waymarks. The authors of [3]
apply a corner detector to recognize object shape, as shown in Figure 2-17. [4] uses the relations between boundary lines (Figure 2-18) to determine the shape. A more recent approach [5] uses the distance to borders (DtBs) as the shape feature, as illustrated in Figure 2-19.
Figure 2-17 Corner detection result in [3]
Figure 2-18 The relation of boundary line in [4]
Figure 2-19 (a) The DtBs of a triangle (b) the feature vectors of (a) by DtBs [5]
After the shape recognition, we want to recognize the content inside the
shape. A current approach [2] applies the Histogram of Oriented Gradients (HOG) descriptor to obtain object features. In [2], the image is first divided into small connected regions, called cells. Each cell then gathers a histogram of gradient directions over the pixels within it. Finally, the histograms of all cells are concatenated into the descriptor. For better accuracy, the histograms are normalized by an intensity measure taken over a larger region of the image, called a block. This normalization makes the descriptor more robust to lighting changes.
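To make the cell-and-block computation concrete, the following sketch builds per-cell gradient histograms and L2-normalizes them over 2x2 blocks. It is a simplified illustration, not the exact descriptor of [2]; the 8x8 cell size, 9 bins, and 2x2 block geometry are assumed defaults.

```python
import numpy as np

def hog_sketch(img, cell=8, bins=9):
    """Simplified HOG: per-cell gradient histograms, L2-normalized per 2x2 block."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0            # unsigned orientation
    H, W = img.shape
    ch, cw = H // cell, W // cell                           # number of cells per axis
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hist[i, j], idx, m)                   # magnitude-weighted votes
    # block normalization: each 2x2 group of cells is L2-normalized together
    feats = []
    for i in range(ch - 1):
        for j in range(cw - 1):
            block = hist[i:i+2, j:j+2].ravel()
            feats.append(block / (np.linalg.norm(block) + 1e-9))
    return np.concatenate(feats)

v = hog_sketch(np.random.rand(32, 32))
```

The block-normalization step is what gives the descriptor its robustness to lighting changes.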
[10] proposes the Speeded Up Robust Features (SURF). As in HOG, the input image is divided into many sub-regions. Each sub-region is analyzed with Haar wavelets, and the filter responses in the horizontal and vertical directions are denoted dx and dy respectively. The wavelet responses dx and dy are summed over each sub-region to form two feature values. To capture additional information, the sums of the absolute responses, sum|dx| and sum|dy|, are also extracted. Each sub-region therefore has a four-dimensional descriptor vector v = (sum dx, sum dy, sum|dx|, sum|dy|). Figure 2-20 shows three examples of the SURF descriptor.
Figure 2-20 Three examples of SURF [10]
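The four sums per sub-region can be sketched as follows. This is a sketch of the idea, not the exact SURF implementation: the Haar wavelet responses are approximated by plain finite differences, and the 4x4 sub-region grid is an assumption.

```python
import numpy as np

def surf_subregion_descriptor(patch, grid=4):
    """Per sub-region, return (sum dx, sum dy, sum |dx|, sum |dy|) as in [10].
    Haar responses are approximated here by simple finite differences."""
    dx = np.diff(patch.astype(float), axis=1)[:-1, :]       # horizontal response
    dy = np.diff(patch.astype(float), axis=0)[:, :-1]       # vertical response
    h, w = dx.shape
    sh, sw = h // grid, w // grid
    desc = []
    for i in range(grid):
        for j in range(grid):
            rx = dx[i*sh:(i+1)*sh, j*sw:(j+1)*sw]
            ry = dy[i*sh:(i+1)*sh, j*sw:(j+1)*sw]
            desc.extend([rx.sum(), ry.sum(), np.abs(rx).sum(), np.abs(ry).sum()])
    return np.array(desc)

d = surf_subregion_descriptor(np.random.rand(20, 20))
```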
Chapter 3.
PROPOSED METHOD
The goal of the system is to detect and recognize static waymarks. The system includes two stages: 1) Bottom-up and 2) Top-down learning.
1) Bottom-up: In this stage, the static waymarks in the image are quickly identified.
2) Top-down learning: In this stage, we extract features of the static waymarks that are useful for recognition, using machine learning techniques to learn them.
The input image goes through the Bottom-up operation and a corresponding saliency map is generated. The saliency map shows where the interesting regions are and where the static waymarks lie. Second, we use the features obtained from Top-down learning to perform recognition.
Figure 3-1 Block diagram of the proposed system
3.1. BOTTOM-UP
In the Bottom-up stage, we want to quickly identify the locations of the static waymarks.
Here, we exploit physical properties to design our Bottom-up algorithm. Static waymarks such as fire hydrants and signboards have the following properties: 1) strong contrast with respect to their surrounding regions, 2) regular shape, and 3) simple texture. The three properties are discussed in the following sections.
The flow chart of the proposed Bottom-up is illustrated in Figure 3-2. First, the input image is decomposed into three color maps: R color, G color, and B color.
Second, these color maps go through three operations based on the three physical properties: 1) strong contrast, 2) regular shape, and 3) simple texture. Finally, a salient map is generated for each color map, and the three maps are combined.
Figure 3-2 Block diagram of the proposed Bottom-up
3.1.1. LINEAR FILTERING OF IMAGE DATA
Because the colors of the static waymarks are simple, similar to the approach of [6], we decompose an input image into a few feature maps, including the R color, G color, and B color maps, where r, g, and b denote the red, green, and blue components of the input image. For each color, negative values are set to zero.
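The decomposition can be sketched as below. Since the thesis only states that negative values are clipped, the opponent-style channel formulas are an assumption borrowed from the style of [6] (Itti et al.); the actual maps may differ.

```python
import numpy as np

def color_maps(rgb):
    """Decompose an RGB image into R, G, and B color maps with negatives
    clipped to zero. The opponent-like formulas are assumed, not stated."""
    r, g, b = [rgb[..., k].astype(float) for k in range(3)]
    R = np.clip(r - (g + b) / 2.0, 0, None)   # red minus the other channels
    G = np.clip(g - (r + b) / 2.0, 0, None)
    B = np.clip(b - (r + g) / 2.0, 0, None)
    return R, G, B

img = np.zeros((2, 2, 3)); img[..., 2] = 200.0   # pure blue patch
R, G, B = color_maps(img)
```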
In Figure 3-3, we can notice that the objects, (a) a fire hydrant, (b) a blue signboard, and (c) a disabled parking signboard, have strong contrast compared to their surroundings.
This is because they are designed to be salient to people. In the top-down stage, we also want to analyze the color of the objects. Thus, we want an algorithm that can not only capture the strong contrast but also support color analysis. Therefore, we apply the feature-pair proposed by [9].
(a) fire hydrant (b) blue signboard (c) disabled parking signboard
Figure 3-3 Examples of the static waymarks
3.1.2. STRONG CONTRAST
For each of the R color, G color, and B color, we compute the feature-pair distribution as proposed in [9].
Take the R component for example. We first analyze the relation between a central point and its 8-connected neighbors. In Figure 3-4, we define the eight neighbors of a point E and form the eight 2-D coordinates (E,A), (E,B), (E,C), (E,D), (E,F), (E,G), (E,H), and (E,I) as the R color-pairs, where A, B, ..., I are the R color values of the neighbors of E. These eight R color-pairs contribute to the R color-pair distribution.
Figure 3-4 The main idea of feature-pair
Figure 3-5 The feature-pair distribution
By collecting the R color-pairs of all image pixels, we obtain the R color-pair distribution shown in Figure 3-6. We can notice that the R color-pairs in smooth regions lie around the 45° line, whereas the R color-pairs across edges lie far from the 45° line.
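Collecting the 8-neighbour color-pairs for one channel can be sketched as below; the wrap-around borders introduced by the shifts are simply dropped.

```python
import numpy as np

def color_pair_distribution(channel):
    """Collect 8-neighbour value pairs (E, N) for every interior pixel E, as in
    [9]. Pairs in smooth regions fall near the 45-degree line; pairs across
    edges fall far from it."""
    c = channel.astype(float)
    pairs = []
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for di, dj in shifts:
        n = np.roll(np.roll(c, di, axis=0), dj, axis=1)
        inner = (slice(1, -1), slice(1, -1))   # drop wrapped-around border
        pairs.append(np.stack([c[inner].ravel(), n[inner].ravel()], axis=1))
    return np.concatenate(pairs)

img = np.full((10, 10), 100.0)
img[:, 5:] = 200.0                            # a vertical edge
p = color_pair_distribution(img)
off_diag = np.abs(p[:, 0] - p[:, 1])          # distance from the 45-degree line
```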
Figure 3-6 shows an example of the R color-pair distribution of the fire hydrant image. It is clear that the road and the grass are the major backgrounds of the image.
Therefore, the R color-pairs of these two regions form two major clusters in the R color-pair distribution. On the other hand, the fire hydrant maps to a smaller cluster in the top-right corner of the distribution. Moreover, the R color-pairs over the road-grass boundary and the fire hydrant-grass boundary form four clusters (shown in green) far away from the 45° line, as shown in Figure 3-6.
In the R color-pair distribution, we can easily notice that the boundary between the fire hydrant and the grass shows a stronger contrast than the road-grass boundary. From this, we can conclude two facts: (1) the fire hydrant is "less common" than the road and the grass; and (2) the fire hydrant has a stronger contrast compared to its background. We therefore conclude that the fire hydrant may attract the attention of most observers. We form the G color-pair and B color-pair distributions in the same way.
Figure 3-6 The example of R color-pair distribution
Similar to [10], we form a 3-D histogram by dividing the plane of feature-pair values into uniform cells and counting the number of feature-pairs in each cell. Most clusters lie around the diagonal of the histogram; the largest cluster corresponds to the background of the image; the foreground objects correspond to smaller clusters; and the clusters away from the diagonal correspond to edges.
Figure 3-7 The 3-D histogram
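Binning the feature-pair plane into uniform cells can be sketched with a 2-D histogram (the "3-D" view is the two pair axes plus the count axis); the 32-cell grid and 0-255 value range are assumptions.

```python
import numpy as np

def pair_histogram(pairs, value_range=256, n_cells=32):
    """Bin the 2-D feature-pair plane into uniform cells; the cell counts form
    the 3-D histogram described in the text."""
    hist, _, _ = np.histogram2d(
        pairs[:, 0], pairs[:, 1],
        bins=n_cells, range=[[0, value_range], [0, value_range]])
    return hist

pairs = np.array([[10.0, 12.0], [10.0, 11.0], [200.0, 50.0]])
h = pair_histogram(pairs)
```

The two near-diagonal pairs land in the same cell (a smooth-region cluster), while the (200, 50) pair lands in an off-diagonal cell (an edge).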
Based on the 3-D histogram, a contrast-weighting algorithm is proposed to weigh the saliency degree of each cell around the diagonal. This weighting algorithm contains three main parts: the "edge weight", which gathers the information provided by the off-diagonal cells; the "self weight", which determines whether a diagonal cell corresponds to a visually salient region; and the "foreground weight", which judges whether a diagonal cell does not correspond to the background.
The three parts are respectively based on three ideas: 1) the stronger the edge, the more salient the object; 2) the smaller the object, the more salient it is; and 3) the more an object differs from the background, the more salient it is. We combine the three properties as formulated below.
1) Edge weight
The computation of the edge weight is illustrated in Figure 3-8. The formulation is

    Edge_weight = sum over (i, j) in the off-diagonal cells of the 3-D histogram of hist(i, j) * d^2,   Eq. 3-4

where d is the distance from cell (i, j) to the diagonal.
Figure 3-8 The computation of edge weight
2) Self weight
The self weight reflects the size of an object, which indicates how much contrast it can provide. The smaller the object, the more salient it is. The formulation is

    Self_weight = sum over (i, j) in the histogram cells of the object of hist(i, j),   Eq. 3-5

3) Foreground weight
The foreground weight measures how different an object is from the background. The more an object differs from the background, the more salient it is. In the 3-D histogram, the maximum of the histogram on the diagonal corresponds to the background, and the foreground weight is the distance from a diagonal cell to this maximum, as illustrated in Figure 3-9. The formulation is

    Foreground_weight = d,   Eq. 3-6

where d is the distance from the diagonal cell to the histogram maximum.
Figure 3-9 The 3-D histogram
3.1.3. REGULAR SHAPE
As we know, the static waymarks are different from natural scenery such as trees and flowers. The main difference is their shape. In Figure 3-10, we can see that the shapes of (a) the fire hydrant, (b) the blue signboard, and (c) the disabled parking signboard are more regular than the surroundings.
(a) Fire hydrant (b) blue signboard (c) disabled parking signboard
Figure 3-10 Examples of the static waymarks
A regular shape is composed of straight lines, such as horizontal, vertical, and oblique lines, and circular arcs. Therefore, a good way to exploit this property is to detect the boundary. We find that filter banks have this capability.
Using the filters in Figure 3-11, we convolve with the R color, G color, and B color maps, and objects with regular shapes respond strongly.
Figure 3-11 The filter that can detect boundary [11]
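The boundary-detection step can be sketched as below. The two small edge filters here are assumed examples standing in for the actual bank of [11], and the dependency-free correlation loop replaces a library convolution.

```python
import numpy as np

def directional_responses(channel, filters):
    """Convolve a colour map with a bank of directional filters and keep the
    maximum absolute response per pixel; straight boundaries respond strongly."""
    c = channel.astype(float)
    out = np.zeros_like(c)
    for f in filters:
        kh, kw = f.shape
        resp = np.zeros_like(c)
        # valid-region 2-D correlation, written out to stay dependency-free
        for di in range(kh):
            for dj in range(kw):
                resp[:c.shape[0]-kh+1, :c.shape[1]-kw+1] += (
                    f[di, dj] * c[di:di+c.shape[0]-kh+1, dj:dj+c.shape[1]-kw+1])
        out = np.maximum(out, np.abs(resp))
    return out

horizontal = np.array([[-1.0, -1.0], [1.0, 1.0]])   # responds to horizontal edges
vertical = horizontal.T                              # responds to vertical edges
img = np.zeros((8, 8)); img[:, 4:] = 1.0             # a vertical boundary
r = directional_responses(img, [horizontal, vertical])
```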
3.1.4. SIMPLE TEXTURE
As we observe, the static waymarks are designed for easy identification, so they must be designed to differ from natural scenery. We notice that they are not complicated and have simple texture compared to natural scenery.
The paper [8] mentions that if the signal is a pulse, as in Figure 3-12 (a), the signal reconstructed from the phase spectrum alone will have a high response at the location of the input pulse.
Figure 3-12 (a) Original signal (pulse) (b) Reconstruction result using phase [8]
We find that these specially designed objects also behave like a pulse signal in a natural image. Take Figure 3-13 (a) and (b) for example: the blue signboard in the image has simple texture, and in the B (blue) channel it is the same as the situation depicted in Figure 3-12 (a).
Figure 3-13 (a) Blue signboard (b) B color of the blue signboard
Conversely, if the signal is not a pulse but a regular signal such as a sine wave, as in Figure 3-14 (a), then the reconstruction result in (b) has a lower response. We believe natural scenery has this property because it has a more complex structure.
Figure 3-14 (a) Original signal (sine) (b) Reconstruction result using phase [8]
The reconstruction using phase is computed as

    sM(x, y) = g(x, y) * || F^-1[ e^(j P(f)) ] ||^2

where P(f) is the phase spectrum of the image, F^-1 is the inverse Fourier transform, g(x, y) is a 2-D Gaussian filter, * denotes convolution, and sM(x, y) is the reconstruction result using phase.
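The phase-only reconstruction of [8] can be sketched directly: keep only the phase of the Fourier transform, invert, square, and smooth with a Gaussian. The 7-tap separable Gaussian used here is an assumed stand-in for g(x, y).

```python
import numpy as np

def phase_saliency(img, sigma=2.0):
    """Phase-spectrum reconstruction: sM = g * |F^-1[exp(j P(f))]|^2."""
    F = np.fft.fft2(img.astype(float))
    phase_only = np.exp(1j * np.angle(F))           # unit magnitude, phase kept
    recon = np.abs(np.fft.ifft2(phase_only)) ** 2
    # small separable Gaussian smoothing, dependency-free
    k = np.exp(-np.arange(-3, 4) ** 2 / (2 * sigma ** 2)); k /= k.sum()
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, recon)
    sm = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, sm)
    return sm

sig = np.zeros((16, 16)); sig[8, 8] = 1.0           # pulse-like input
s = phase_saliency(sig)
```

As the text predicts for a pulse, the response peaks exactly at the pulse location.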
3.1.5. COMBINATION
Based on the three physical properties, we obtain three saliency maps for each color map, each normalized to one. The three saliency maps are combined using their variances as weights (Eq. 3-13). The R, G, and B saliency maps are generated by Eq. 3-10, 3-11, and 3-12, respectively.
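Since Eq. 3-10 to 3-13 are not legible in the source, the following is only a plausible sketch of variance-weighted combination: each map is normalized and weighted by its variance, on the idea that a map with a few strong peaks is more informative than a flat one. The min-max normalization is an assumption.

```python
import numpy as np

def combine_saliency(maps):
    """Normalize each saliency map to [0, 1], weight it by its variance, and
    sum. A sketch of the Eq. 3-13 idea, not the exact formula."""
    combined = np.zeros_like(maps[0], dtype=float)
    for m in maps:
        m = m.astype(float)
        rng = m.max() - m.min()
        m = (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
        combined += m.var() * m                 # flat maps get near-zero weight
    return combined

flat = np.full((4, 4), 0.5)                     # uninformative map
peaky = np.zeros((4, 4)); peaky[2, 2] = 1.0     # one strong peak
out = combine_saliency([flat, peaky])
```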
3.2. TOP-DOWN LEARNING
In this section, we discuss how we design a classifier for the recognition of the static waymarks. In order to achieve robustness to illumination, scale, and rotation, we choose color and shape as the features; then, an SVM classifier is trained on the chosen features. Finally, we apply the Synthetic Minority Over-sampling Technique (SMOTE) to solve the problem of imbalanced data in machine learning. The following sections cover 1) color analysis, 2) shape analysis, 3) the SVM classifier, and 4) the SMOTE algorithm.
3.2.1. COLOR ANALYSIS
As we know, the RGB color space is sensitive to illumination change;
therefore, we choose the HSI space for color analysis due to its robustness to illumination change.
To obtain color features, we compute the histograms of hue and saturation and do not consider intensity. An example is shown in Figure 3-15 (b) and (c).
Figure 3-15 (a) Blue signboard (b) the saturation of (a) quantized into ten bins (c) the hue of (a) quantized into ten bins
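The ten-bin hue and saturation histograms can be sketched as below. As an assumption, the standard-library HSV conversion is used as a stand-in for the HSI conversion; intensity is ignored either way.

```python
import colorsys
import numpy as np

def hs_histograms(rgb, n_bins=10):
    """Per-pixel hue and saturation, each quantized into n_bins histogram bins;
    intensity is discarded, as in Section 3.2.1."""
    h_hist = np.zeros(n_bins)
    s_hist = np.zeros(n_bins)
    for px in rgb.reshape(-1, 3):
        h, s, _v = colorsys.rgb_to_hsv(*(px / 255.0))   # HSV as an HSI stand-in
        h_hist[min(int(h * n_bins), n_bins - 1)] += 1
        s_hist[min(int(s * n_bins), n_bins - 1)] += 1
    return h_hist / h_hist.sum(), s_hist / s_hist.sum()

blue_patch = np.tile(np.array([0, 0, 255], dtype=np.uint8), (8, 8, 1))
h_hist, s_hist = hs_histograms(blue_patch)
```

A uniformly blue patch concentrates all its mass in a single hue bin and the top saturation bin, which is exactly the kind of compact signature the classifier relies on.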
3.2.2. SHAPE ANALYSIS
Color information alone is not enough to represent the features of the static waymarks, so the shape of the object is also taken into consideration. There are many methods to extract shape features; in our system, we use a filter-based approach because the shape of the static waymarks is composed of lines in different directions. Here we use six directional lines to represent the shape, as shown in Figure 3-16. In order to be invariant to lighting change, we convolve the hue and saturation of the static waymarks with these filters, as shown in Figure 3-17.
Figure 3-16 The six directional lines
Figure 3-17 The six directional filters
The challenge of shape analysis is the rotation of the object. To handle this issue, we use the descriptor in [12]: we divide the patch into 24 regions, as shown in Figure 3-18 (a). The sum of the values in each region constitutes a 24-dimensional feature vector. If the object rotates, we have to adjust the feature vector accordingly, as shown in Figure 3-18.
Figure 3-18 (a) Original (b) rotated
To achieve rotation invariance, we have to adjust the feature vector. Here we choose the "maximum shift" method. In Figure 3-19, we divide the patch into 8 regions and compute the sum of each region.
Figure 3-19 Maximum shift
If the maximum is region 5, as in Figure 3-20 (a), we shift regions 5, 13, and 21 in Figure 3-20 (b) to the top, as shown in Figure 3-20 (c).
Figure 3-20 (a) Find the maximum region (b) the corresponding regions (c) the result of the shift
However, the "maximum shift" alone cannot completely solve the rotation problem. The example in Figure 3-21 demonstrates this: the object in (a) is rotated in (b). With "maximum shift" only, the result of filter 1 in (a) equals the result of filter 2 in (b). Therefore, we also have to adjust the filters.
Figure 3-21 (a) Find the maximum region (b) the corresponding regions (c) the result of the shift
Here we use the "minimum shift" to adjust the filters: we calculate the sum of each filter's result and shift the result with the minimum sum to the top, as shown in Figure 3-22.
Figure 3-22 Adjust the result of filters
Through these two shifting operations, our proposed method is invariant to rotation.
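The two shifting operations can be sketched as below. The (rings x sectors) layout of the 24 regions is an assumption; the source does not fully specify the region geometry.

```python
import numpy as np

def maximum_shift(desc):
    """Rotate the ring of region sums so the maximum region of the first ring
    comes first; all rings are shifted by the same offset ('maximum shift').
    desc: array of shape (n_rings, n_sectors)."""
    k = int(np.argmax(desc[0]))
    return np.roll(desc, -k, axis=1)

def minimum_shift(responses):
    """Reorder the per-filter results so the filter whose result has the
    minimum total sum comes first ('minimum shift')."""
    sums = [r.sum() for r in responses]
    k = int(np.argmin(sums))
    return responses[k:] + responses[:k]

d = np.array([[1, 5, 2, 3], [10, 20, 30, 40]])
shifted = maximum_shift(d)
resp = minimum_shift([np.array([3.0]), np.array([1.0]), np.array([2.0])])
```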
3.2.3. SUPPORT VECTOR MACHINE
We apply an SVM classifier for data classification. In Figure 3-23, there are two classes: one marked in blue and the other in white. We want to find a hyperplane that separates the two classes; H1, H2, and H3 are all possible separating hyperplanes. The main idea of the SVM classifier is to choose the hyperplane with the maximum margin between the two sets; therefore, H3 is selected as the decision plane.
Figure 3-23 Three hyperplanes that separate the two sets
We have training data labeled {x_i, y_i}, where i = 1, ..., l, y_i in {-1, 1}, and x_i in R^d.
In our proposed method, the vectors x_i are the color and shape features, the labels y_i are "1" for one class and "-1" for the other, d is the dimension of x_i, and l is the number of training samples. We find the hyperplane {w, b} that separates the two classes, as shown in Figure 3-24. The vector w is the normal to the hyperplane, |b| / ||w|| is the perpendicular distance from the hyperplane to the origin, and ||w|| is the Euclidean norm of w.
Figure 3-24 Find the hyperplane
In order to separate the two sets, the following constraints should be satisfied:

    y_i (w^T x_i + b) - 1 >= 0,  for all i

The solution can be expressed as a linear combination of the training vectors:

    w = sum from i = 1 to l of alpha_i y_i x_i
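As an illustration of the hyperplane {w, b} and the margin constraint, the sketch below trains a minimal linear SVM by sub-gradient descent on the hinge loss. This is a stand-in for whatever SVM solver the thesis actually used, and the toy data, learning rate, and regularization constant are all assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM via sub-gradient descent on the hinge loss.
    X: (l, d) feature vectors; y: labels in {-1, +1}. Returns (w, b)."""
    l, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points violating y(w^T x + b) >= 1
        if viol.any():
            grad_w = lam * w - (y[viol][:, None] * X[viol]).mean(axis=0)
            grad_b = -y[viol].mean()
        else:
            grad_w, grad_b = lam * w, 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# linearly separable toy data (an assumption for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```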
3.2.4. SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE (SMOTE)
A problem in our proposed method is imbalanced data: there are about 3000 negative samples but only about 40 static-waymark samples, a ratio of roughly 75 to 1.
In Figure 3-25 (a), if the training data is balanced, the estimated decision boundary (solid black line) approximates the true boundary (solid red line) even with a few mislabeled samples (black '+' symbols). On the contrary, if the training data is imbalanced (Figure 3-25 (b)), the estimated decision boundary may be very far from the true boundary even with only a few mislabeled samples.
Figure 3-25 (a) Balanced data (b) imbalanced data
Thus, when the training data is imbalanced, we synthesize new data for the minority set by applying the SMOTE algorithm in [13]. Synthetic samples are generated as follows: take the difference between the feature vector (sample) under consideration and one of its nearest neighbors, multiply this difference by a random number between 0 and 1, and add it to the feature vector under consideration. This selects a random point along the line segment between the two feature vectors, which effectively forces the decision region of the minority class to become more general. The pseudo-code for SMOTE from [13] follows.
Algorithm SMOTE(T, N, k)
Input: Number of minority class samples T; amount of SMOTE N%; number of nearest neighbors k
Output: (N/100) * T synthetic minority class samples
(* If N is less than 100%, only a random percent of the minority class samples will be SMOTEd. *)
(* The amount of SMOTE is assumed to be in integral multiples of 100. *)
numattrs = Number of attributes
Sample[ ][ ]: array for original minority class samples
newindex: keeps a count of the number of synthetic samples generated, initialized to 0
Synthetic[ ][ ]: array for synthetic samples
(* Compute k nearest neighbors for each minority class sample only. *)
for i <- 1 to T
    Compute the k nearest neighbors of i, and save the indices in nnarray
    Populate(N, i, nnarray)
endfor
Populate(N, i, nnarray) (* Function to generate the synthetic samples. *)
while N != 0
    Choose a random number between 1 and k, call it nn. This step chooses one of the k nearest neighbors of i.
    for attr <- 1 to numattrs
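The interpolation at the heart of SMOTE can also be sketched compactly in a few lines; this follows the description above (neighbor choice, random gap in [0, 1]), but the brute-force neighbor search and fixed seed are simplifications for illustration.

```python
import random

def smote(samples, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples as in [13]: pick a sample,
    pick one of its k nearest neighbours, and interpolate at a random point
    on the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        x = samples[i]
        # brute-force k nearest neighbours among the other minority samples
        others = sorted((s for j, s in enumerate(samples) if j != i),
                        key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)))
        nn = others[rng.randrange(min(k, len(others)))]
        gap = rng.random()                       # random point along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new = smote(minority, n_new=10)
```

Because every synthetic point lies on a segment between two minority samples, the new points stay inside the convex hull of the minority class, which is what makes its decision region more general rather than noisier.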
Chapter 4.
EXPERIMENTAL RESULTS
In this chapter, we show and discuss our experimental results. In the computer simulation, the proposed algorithm is coded in Matlab without code optimization and tested on a PC with an Intel® Core™2 Duo CPU running at 3 GHz. The first experimental stage is to capture videos on campus. We prepare three videos captured at different places covering four different static waymarks. Figures 4-1 and 4-2 show the three places and the selected waymarks.
Figure 4-1 The overview of three places
Figure 4-2 Left to right: blue signboard, fire hydrant, Disabled signboard, and parking signboard
In Table 4-1, for each static waymark, we randomly selected forty image patches from the video, and the SMOTE algorithm was used to synthesize two hundred image patches as the positive training samples. In addition, three thousand negative image patches were extracted from the same video. The video was captured in the afternoon.
Table 4-1 Number of training images
Static waymark        Number selected    After SMOTE
Blue signboard        40                 200
Fire hydrant          40                 200
Disabled signboard    40                 200
Parking signboard     40                 200
Negative images       3000               X
In Figure 4-3, we use four colored bounding boxes to indicate the four different static waymarks.
Figure 4-3 Different colored bounding boxes for the different static waymarks
The results for the four static waymarks, with variations in rotation and scale, are shown in Figures 4-4 and 4-5.
Figure 4-4 Different scales of static waymarks in the afternoon
Figure 4-5 Static waymarks in the afternoon with rotation variations
We also apply our algorithm to the videos captured at noon and in the evening. The results are shown in Figures 4-6 and 4-7.
Figure 4-6 Static waymarks at noon with rotation variations
Figure 4-7 Static waymarks in the evening with rotation variations
Table 4-2 shows the detection rate and false alarm rate for each static waymark at noon, in the afternoon, and in the evening. The number of testing images for each static waymark is about six hundred. From Table 4-2, we find that varying illumination has an important effect on the detection rate. Moreover, if there are many objects similar to a static waymark, as with the disabled signboard and the parking signboard, its detection rate is lower than the others.
Table 4-2 Detection rate and false alarm at different times for each static waymark
Compared to combining the three methods (Table 4-2), applying the three methods separately (Table 4-3) gives a slightly lower detection rate in the afternoon and a much lower detection rate at night, because of the varying illumination. This result shows that combining the three methods yields good bottom-up detection and a good detection rate.
On the other hand, Table 4-4 shows the average computing time, including detection and recognition, for the different bottom-up detection methods. The number of images is 2520 and the image size is 360x240.