
Chapter 2 Related Works

2.2 Hypothesis Verification (HV) Methods

The input to the HV step is the set of hypothesized vehicle locations generated by the HG step; the HV step verifies the correctness of these hypotheses. A. Khammari et al. [12] classified HV methods into two categories, namely template-based and appearance-based.

Template-based methods compute the correlation between the input and predefined vehicle patterns. M. Betke et al. [13] proposed a vehicle detection approach using deformable gray-scale template matching. J. Ferryman et al. [14] applied Principal Component Analysis (PCA) to manually sampled data to form a deformable model and used this model to confirm detection results.

Appearance-based methods require training data of vehicle images to learn the characteristics of the vehicle class; usually, the variability of the non-vehicle class is also modeled to improve performance. Each training sample is represented by local or global features. The decision boundary between the vehicle and non-vehicle classes is then learned either by a training process (e.g., a Support Vector Machine (SVM) [15]) or by modeling the probability distribution of the features in each class (e.g., using the Bayes rule [16]). In [17], the wavelet transform was used for feature extraction and SVMs were used for classification.

One of the most important issues in object detection is feature selection. In most cases, a large number of features is employed because the important features are unknown in advance, and many of them are redundant or even irrelevant. It is therefore necessary to use only those features that have high discriminative power and to ignore, or pay less attention to, the rest. Consequently, a powerful feature selection algorithm is highly desirable.

R. Wang et al. [18] proposed a vehicle detection system based on local features of three vehicle sub-regions. They used PCA and Independent Component Analysis (ICA) for feature extraction and combined the two. Each sub-region of the training data is projected onto its associated eigenspace and independent basis space to generate a PCA weight vector and an ICA coefficient vector, respectively. A likelihood evaluation is then performed based on the estimated joint probability of the projection weight vectors and the coefficient vectors. PCA models the low-frequency components of vehicles and ICA models the high-frequency components; their combination improves tolerance to variations in illumination and vehicle pose.

P. Viola and M. J. Jones [19] proposed a novel feature selection algorithm for human face detection. The algorithm uses training data to construct a cascade of boosted classifiers, where each layer of the cascade rejects some of the inputs that do not contain the object of interest. The features they used are Haar-like features, also called rectangular filters (validated by C. Papageorgiou et al. [20]), and the training algorithm is the AdaBoost machine learning algorithm [21], which selects useful features for each layer. The integral image is used to accelerate the computation of Haar-like features. The cascaded structure makes the classifier trained by AdaBoost suitable for real-time face detection applications.

Inspired by the AdaBoost algorithm, P. Negri et al. [22] combined Haar-like features and histograms of oriented gradients (HoG) within an AdaBoost framework. The idea is that a classifier composed of Haar-like features filters out easy negative inputs in the early part of the fused classifier, while in the later part a classifier composed of HoG features generates a finer decision boundary to remove negative inputs that resemble vehicles. The fused classifier thus achieves better performance than either single classifier.

In this study, we use a cascaded classifier trained by the AdaBoost algorithm to generate vehicle hypotheses. As the hypothesis verification method, we propose a false alarm elimination system that uses edge complexity for daytime and histogram matching together with intensity complexity for evening, combined with a survival algorithm to stabilize the final results. In the hypothesis generation step we do not restrict the false alarm rate; in other words, the Haar-like feature classifier focuses on a high detection rate and ignores the false alarm rate, and most of the false alarms are filtered out in a later step. With this three-stage structure, namely a high detection rate vehicle classifier, a false alarm eliminator, and a stabilizer, the whole vehicle detection system achieves satisfactory performance.

Chapter 3

Vehicle Detection System

In this chapter, the structure of the proposed system is defined. The system is composed of three stages: vehicle detector, false alarm eliminator, and stabilizer. Section 3.1 presents the diagram of the whole system and the key modules of each stage. Section 3.2 explains the heart of the vehicle detector, namely the AdaBoost vehicle classifier, including its concept and details. Section 3.3 illustrates the process and methods of the false alarm eliminator. Section 3.4 explains how the system stabilizes the detection results and erases further false alarms.

3.1 System Overview

First of all, we transform the raw color image into a gray-level image, and the gray-level image is downsized by a simple downsampling algorithm for acceleration. After resizing, the image is sent to the AdaBoost vehicle detector, which uses a sliding window to detect vehicles and confirms the results through its cascaded structure. The outputs of the vehicle detector stage are candidate regions (CRs) of vehicles, which are also the inputs of the false alarm eliminator stage. In the second stage, we first check the erasing mode, namely daytime or evening. In the daytime case, the Canny edge operator [23] is applied to each CR, and the system then examines the edge complexity to decide whether the CR is a vehicle or not. In the evening case, the system first calculates the histogram of each CR and then uses it for histogram matching and intensity complexity checking to verify the result. In both cases, all results must pass a size filter that removes CRs that are too big or too small. The final stage is the stabilizer, in which we use a survival algorithm to further ensure that the CRs are vehicles and to stabilize the detection results.
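To make the data flow concrete, the following is a minimal sketch of this three-stage pipeline, assuming OpenCV. The cascade file name, the Canny thresholds, the edge-density bounds and the size limits are illustrative placeholders rather than the trained model and tuned values of this work, and the evening checks and the survival algorithm are only indicated by comments (they are detailed in sections 3.3 and 3.4).

```python
import cv2
import numpy as np

# Illustrative placeholder; the actual trained cascade of this work
# is not reproduced here.
detector = cv2.CascadeClassifier("vehicle_cascade.xml")  # hypothetical model file

def size_ok(w, h, lo=12, hi=200):
    # Size filter: reject CRs that are too big or too small.
    return lo <= w <= hi and lo <= h <= hi

def detect_vehicles(frame_bgr, evening_mode, scale=0.5):
    # Stage 1: vehicle detector -- gray-level conversion, simple
    # downsampling, then sliding-window detection by the cascade.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_NEAREST)
    candidates = detector.detectMultiScale(small)  # candidate regions (CRs)

    # Stage 2: false alarm eliminator -- mode-dependent check + size filter.
    verified = []
    for (x, y, w, h) in candidates:
        roi = small[y:y + h, x:x + w]
        if evening_mode:
            # Evening: histogram matching and intensity complexity checks
            # (sketched in section 3.3) would go here.
            ok = True
        else:
            edges = cv2.Canny(roi, 100, 200)       # Canny edge operator [23]
            density = np.count_nonzero(edges) / edges.size
            ok = 0.05 <= density <= 0.45           # illustrative edge-complexity test
        if ok and size_ok(w, h):
            verified.append((x, y, w, h))

    # Stage 3: stabilizer -- the survival algorithm would track CRs across
    # frames here to stabilize the final results (omitted in this sketch).
    return verified
```

Downsampling before detection trades a small amount of localization accuracy for a large reduction in the number of sliding-window positions the cascade must evaluate.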

The diagram of the whole system is shown in Figure 3-1.

Figure 3-1 System diagram

3.2 Vehicle Detection using AdaBoost

This sub-system is the foundation of the entire system. As mentioned in section 2.1, most research efforts have focused on feature extraction and classification by learning or statistical models. The most important issue in object detection is selecting a suitable set of features that soundly represents the implicit invariants of the objects of interest. An intuitive method is to focus on the common components of these objects; for example, contour, color, symmetry, and texture are commonly used perceptual features. These features can be used alone or in combination to find objects of interest in the search area. Although such features are easy to implement, they have several limitations. One is that they are based on human perception, whose physical nature is usually not reliable enough for computer vision. Moreover, how to choose a suitable set of features, whether a single feature or a combination, is itself a very difficult problem, and choosing manually is time-consuming and inefficient. Therefore, research on feature extraction has turned to statistical and machine learning approaches.

The AdaBoost algorithm proposed by Y. Freund et al. [21] is a machine learning technique that has been widely used for pattern recognition. It combines weak classifiers into one strong classifier through a weighted voting mechanism and has shown high performance in various fields. P. Viola et al. [24] used the AdaBoost algorithm to build a pedestrian detector with progressively more complex rejection rules. In [19], the AdaBoost algorithm was originally applied to face detection and yielded performance comparable to the best previous research. Following the original idea, we want to use several features to describe a vehicle and use these features to detect vehicles. Therefore, we use Haar-like features, introduced in section 3.2.1, to describe vehicles, and the AdaBoost algorithm, explained in detail in section 3.2.3, to select useful Haar-like features and combine them into a strong classifier.

Our goal in this stage, namely the AdaBoost vehicle detector, is to build a high detection rate vehicle classifier. In other words, we use the detection rate as the prime criterion for deciding whether the performance of the detector is good enough; the false alarm rate is treated as reference information about the detector and is left to the false alarm eliminator and the stabilizer. We call this kind of training non-converging training, which will be explained in section 3.2.5. The entire system is designed and built under this concept.

3.2.1 Haar-like Features (Rectangle Features)

Haar basis functions (Haar-like features), used by Papageorgiou et al. [25] and extended by Rainer Lienhart et al. [26], provide information about the gray-level intensity distribution of two or more adjacent regions in an image. Figure 3-2 shows the set of Haar-like features, including the original and extended features. The output of a Haar-like feature on a region of a gray-level image is the sum of all pixel intensities in the black region subtracted from the sum of all pixel intensities in the white region. The sums over the black and white regions are normalized by a coefficient when the areas of the two regions differ. To reduce the computation time of the filters, P. Viola et al. [19] introduced the integral image, an intermediate representation of the input image. The concept is illustrated in Figure 3-3(a): the value of the integral image at a point (x, y) is the sum of all the pixels above and to the left. In Figure 3-3(b), the sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A; the value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D. The sum within D can therefore be computed as 4 + 1 − (2 + 3). Using integral images, the sum of any rectangular region can be calculated with only four references; as a result, the difference of two adjacent rectangular sums (a two-rectangle feature) needs only six references to the integral image, eight in the case of the three-rectangle Haar-like features, and nine for the four-rectangle Haar-like features.

Every Haar-like feature j is defined as f(j) = (rj, wj, hj, xj, yj), where rj is the type of the Haar feature, wj and hj are the width and height of the feature, and (xj, yj) is its position in the window. The value of f(j) is the weighted sum of the pixels in the white rectangles subtracted from that of the dark rectangles.

Figure 3-2 The Haar features [26]

Figure 3-3 The integral image: (a) the value at point (x, y); (b) computing a rectangle sum with four array references
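Both the integral image and the rectangle sums above are straightforward to express in code. The following NumPy sketch computes an integral image, the four-reference rectangle sum, and one two-rectangle feature; the 22×18 window matches the sub-window size used in section 3.2.2, while the feature placement is an arbitrary example.

```python
import numpy as np

def integral_image(img):
    # ii(x, y) = sum of all pixels above and to the left of (x, y), inclusive.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    # Sum of the w-by-h rectangle with top-left corner (x, y), using at most
    # four array references: D = 4 + 1 - (2 + 3) in the notation of Figure 3-3(b).
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

def two_rect_feature(ii, x, y, w, h):
    # Horizontal two-rectangle feature: left (white) sum minus right (black) sum.
    white = rect_sum(ii, x, y, w // 2, h)
    black = rect_sum(ii, x + w // 2, y, w // 2, h)
    return white - black

window = np.random.randint(0, 256, (18, 22)).astype(np.int64)  # a 22x18 sub-window
ii = integral_image(window)
print(two_rect_feature(ii, 2, 2, 8, 6))
```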

3.2.2 Weak Classifier

A weak classifier h(x, f, p, θ) consists of a Haar-like feature f (defined in section 3.2.1), a threshold θ and a polarity p indicating the direction of the inequality in Equation 3-1:

h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise,    (Equation 3-1)

where x is a 22×18 sub-window of an image. For each feature j, the AdaBoost algorithm is applied to decide an optimal threshold θj for which the classification error on the training database (containing positive and negative samples) is minimized. The threshold selection process for a weak classifier is a brute-force method: we can think of the threshold as a line that separates the positive samples from the negative samples, and the AdaBoost algorithm examines every candidate threshold to find the optimal one. The threshold selection process is illustrated in Figure 3-4: with the optimal threshold, the blue points (positive samples) and red points (negative samples) are separated with the lowest classification error. Because one weak classifier has only one threshold, its separating ability is limited, which is why this kind of classifier is called a weak classifier.
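The brute-force search can be sketched in a few lines. The code below assumes the value of one Haar-like feature has already been computed for every training sample; it tries each observed value as the threshold θ, for both polarities, and keeps the pair with the lowest weighted error.

```python
import numpy as np

def best_threshold(feature_values, labels, weights):
    # Brute-force search for the optimal (theta, p) of one weak classifier.
    # labels: 1 for vehicle, 0 for non-vehicle; weights sum to 1.
    best_err, best_theta, best_p = np.inf, None, None
    for theta in np.unique(feature_values):
        for p in (+1, -1):
            # h(x, f, p, theta) = 1 if p * f(x) < p * theta (Equation 3-1)
            pred = (p * feature_values < p * theta).astype(int)
            err = np.sum(weights * np.abs(pred - labels))
            if err < best_err:
                best_err, best_theta, best_p = err, theta, p
    return best_err, best_theta, best_p

# Tiny usage example with made-up feature values:
f = np.array([0.2, 0.4, 0.6, 0.9])
y = np.array([1, 1, 0, 0])
w = np.full(4, 0.25)
print(best_threshold(f, y, w))
```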

3.2.3 AdaBoost Algorithm

In the literature, several methods are commonly used for feature selection, for example Principal Component Analysis (PCA), used in [14, 17, 18], and Independent Component Analysis (ICA), used in [18]. Compared with these methods, the AdaBoost algorithm has shown its capability to improve not only the feature selection but also the detection rate.

In the beginning, we have to prepare a feature set originating from the permutation of Haar-like feature types, scales and positions. This feature set is many times larger than the number of pixels in the input image; even though each Haar-like feature can be computed very efficiently with the integral image, computing the complete set is still extremely time-consuming and costly. The AdaBoost algorithm uses this feature set to generate weak classifiers and finds precise vehicle hypotheses by iteratively combining weak classifiers, which in general have only moderate precision, into a strong classifier defined in Equation 3-2:

C(x) = 1 if Σt αt·ht(x) ≥ (1/2)·Σt αt, and 0 otherwise,    (Equation 3-2)

where h and C are the weak and strong classifiers respectively, and α is a weight coefficient for each h.
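To give a sense of the size of the feature set mentioned at the start of this subsection, the sketch below counts every position and scale of a single two-rectangle feature type inside the 22×18 detection window of section 3.2.2 (the 2×1 base size is an assumption for illustration); already this one type yields tens of thousands of features, far more than the 396 pixels of the window.

```python
def count_features(win_w=22, win_h=18, base_w=2, base_h=1):
    # Count every placement of one two-rectangle Haar feature type:
    # widths are multiples of base_w, heights multiples of base_h,
    # and each (w, h) pair can sit at (win_w - w + 1) * (win_h - h + 1) positions.
    count = 0
    for w in range(base_w, win_w + 1, base_w):
        for h in range(base_h, win_h + 1, base_h):
            count += (win_w - w + 1) * (win_h - h + 1)
    return count

print(count_features())  # 20691 placements for this single feature type
```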

Consider a 2D feature space full of positive and negative training samples. At first, the AdaBoost algorithm chooses the weak classifier with the highest accuracy in the current training round to split the training samples at its optimal threshold. As shown in Figure 3-5(a), samples misclassified by the previous weak learner are given more attention before the next weak classifier is added in the next round. By increasing their weights, the training process focuses on these difficult samples, which cannot be correctly separated by a single weak classifier. The way the AdaBoost algorithm selects and combines weak classifiers into a strong classifier is shown in Figure 3-5(b)-(g). In Figure 3-5(b), the misclassified blue points (positive samples) on the right side of the black line and red points (negative samples) on the left side of the black line are emphasized in the next round, as in Figure 3-5(c). Similarly, the misclassified blue points on the left side of the second black line and the red points on the right side are emphasized in the next round, as shown in Figure 3-5(d) and Figure 3-5(e). Finally, the strong classifier is formed as a linear combination of weak classifiers, shown in Figure 3-5(g), and the boosting algorithm for selecting a set of weak classifiers to compose a strong classifier is given in Table 3-1.


Figure 3-5 The process of selecting and combining weak classifiers

Table 3-1 The boosting algorithm for selecting and combining weak classifiers [29]

• Given example images (x1, y1), . . . , (xn, yn) where yi = 0, 1 for negative and positive examples respectively.
• Initialize weights w1,i = 1/(2m) for yi = 0 and w1,i = 1/(2l) for yi = 1, where m and l are the number of negatives and positives respectively.
• For t = 1, . . . , T:
– Normalize the weights so that wt is a probability distribution.
– Select the best weak classifier with respect to the weighted error: εt = min f,p,θ Σi wi |h(xi, f, p, θ) − yi|.
– Define ht(x) = h(x, ft, pt, θt), where ft, pt and θt are the minimizers of εt.
– Update the weights: wt+1,i = wt,i βt^(1−ei), where ei = 0 if example xi is classified correctly and ei = 1 otherwise, and βt = εt / (1 − εt).
• The final strong classifier is C(x) = 1 if Σt αt ht(x) ≥ (1/2) Σt αt and 0 otherwise, where αt = log(1/βt).

In this way T hypotheses are constructed, each using a single feature; the final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.
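For illustration, the loop of Table 3-1 can be condensed into a short, self-contained sketch over an explicit feature matrix. This is a didactic toy with brute-force stumps and simplified uniform initial weights, not the training code of the detector; the weight update and the final vote follow Table 3-1 and Equation 3-2.

```python
import numpy as np

def adaboost_train(X, y, T):
    # X: (n, d) feature-value matrix; y: labels in {0, 1}; T: rounds.
    n = len(y)
    w = np.full(n, 1.0 / n)  # uniform initial weights (Table 3-1 initializes per class)
    ensemble = []            # (alpha, dim, theta, p) per selected weak classifier
    for _ in range(T):
        w = w / w.sum()      # normalize the weights
        # Brute-force stump search: the weak classifier with lowest weighted error.
        best = (np.inf, 0, 0.0, 1)
        for d in range(X.shape[1]):
            for theta in np.unique(X[:, d]):
                for p in (+1, -1):
                    pred = (p * X[:, d] < p * theta).astype(int)
                    err = np.sum(w * (pred != y))
                    if err < best[0]:
                        best = (err, d, theta, p)
        err, d, theta, p = best
        beta = max(err, 1e-12) / max(1.0 - err, 1e-12)
        alpha = np.log(1.0 / beta)
        pred = (p * X[:, d] < p * theta).astype(int)
        w = w * beta ** (pred == y)  # shrink weights of correctly classified samples
        ensemble.append((alpha, d, theta, p))
    return ensemble

def adaboost_predict(ensemble, X):
    # Strong classifier of Equation 3-2: weighted vote against half the total weight.
    total = sum(a for a, _, _, _ in ensemble)
    votes = sum(a * (p * X[:, d] < p * theta) for a, d, theta, p in ensemble)
    return (votes >= 0.5 * total).astype(int)

# Toy usage: six 1-D samples that no single stump separates perfectly.
X = np.array([[0.1], [0.2], [0.35], [0.65], [0.8], [0.9]])
y = np.array([1, 1, 0, 1, 0, 0])
model = adaboost_train(X, y, T=3)
print(adaboost_predict(model, X))
```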

3.2.4 Cascaded Classifier

A cascaded classifier is a sequence of strong classifiers, each composed of at least one weak classifier. P. Viola et al. [19] also proposed a cascading algorithm for AdaBoost, described in Table 3-2. At each stage, if an input extracted by the searching window is classified as a vehicle, it is allowed to enter the next stage; otherwise, it is rejected by that stage. Only an input that passes every stage, including the last, is labeled as a vehicle. Figure 3-6 demonstrates the schema of the cascaded classifier.
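The detection-time control flow of the cascade is then very simple, as the sketch below shows; each stage is assumed to expose a score function and a stage threshold (names are illustrative).

```python
def cascade_classify(window, stages):
    # stages: list of (score_fn, stage_threshold) pairs, one per layer.
    # A window must pass every stage to be labeled a vehicle; most
    # non-vehicle windows are rejected cheaply by the early stages.
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False   # rejected by this stage
    return True            # passed all stages: labeled as vehicle
```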

Table 3-2 The training algorithm for building a cascaded detector [29]

• User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
• User selects target overall false positive rate, Ftarget.
• P = set of positive examples; N = set of negative examples.
• F0 = 1.0; D0 = 1.0; i = 0.
• While Fi > Ftarget:
– i ← i + 1
– ni = 0; Fi = Fi−1
– While Fi > f × Fi−1:
* ni ← ni + 1
* Use P and N to train a classifier with ni features using AdaBoost.
* Evaluate current cascaded classifier on validation set to determine Fi and Di.
* Decrease threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × Di−1 (this also affects Fi).
– N ← ∅
– If Fi > Ftarget then evaluate the current cascaded detector on the set of non-vehicle images and put any false detections into the set N.

Figure 3-6 Structure of cascaded classifier.

3.2.5 Non-Converging Training

As far as machine learning is concerned, training samples play a very important role in the training process. Many factors of the training samples can affect the final performance of the classifier, for example the number of training samples and the complexity of the content of the training images. Commonly, the positive training samples contain similar images of the object of interest; for instance, they are similar in the view of the object, the lighting condition, the contour and so forth. When collecting positive training samples, it is important to keep the difference between samples as small as possible. In other words, the main purpose of the positive training samples is to provide the training algorithm with the common features of the object of interest. Negative training samples, on the other hand, are everything except the object of interest. In this study, the objects of interest are vehicles, and we focus on the frontal view of vehicles.

Different from the usual way of preparing training samples, we collected not only similar samples but also dissimilar ones. More precisely, we collected frontal-view images of vehicles as positive samples and everything else as negative samples, from both daytime and evening. The samples differ considerably in lighting conditions and exhibit different features. Figure 3-7 shows some of the positive samples we collected, in daytime and evening, and Figure 3-8 shows some of the negative samples.


Figure 3-7 Some positive training samples of (a) Daytime (b) Evening


Figure 3-8 Some negative training samples of (a) Daytime (b) Evening

Generally, the complexity of the training samples affects the convergence of the training process: if the content of the training samples is too difficult, the training process of the AdaBoost algorithm will not finish, i.e., it is non-converging. According to the algorithm in Table 3-1, AdaBoost pays more attention to hard samples, which are difficult to classify with a single weak classifier. Moreover, following the algorithm in Table 3-2, AdaBoost builds progressively more complex decision rules to deal with hard samples as training goes on, as illustrated in Figure 3-9. As can be seen in Table 3-2, the outer while loop continues until Fi is smaller than Ftarget; therefore, if the training samples are too complex to classify, the training process will never stop. Besides, the performance criterion of the cascading algorithm is the false alarm rate; in order to meet this criterion, the training process tries to decrease the false alarms, and the detection rate decreases simultaneously.

Figure 3-9 Progressively complex decision rules of AdaBoost vehicle detector

Because we increased the complexity of the training samples, it is not realistic to expect the training process to converge. As a consequence, we use the detection rate as the prime criterion instead. By the algorithm in Table 3-2, the more layers the classifier has, the harder it is for an object to be classified as a vehicle; consequently, if we decrease the number of layers, the chance that an object is classified as a vehicle is higher. Therefore, we specify that the training process stops at a particular number of layers. It does not matter that the training procedure has not converged yet; we just want the classifier to detect as many vehicles as possible, with the false alarm rate serving as an assistant criterion. The statistical results will be presented in chapter 4.
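A sketch of this fixed-layer stopping rule is shown below. Here train_stage is an assumed, caller-supplied helper standing in for one pass of the inner loop of Table 3-2, returning a trained layer together with its per-layer detection and false alarm rates; the point of the sketch is only the outer loop, which replaces the convergence test with a fixed layer count.

```python
def train_cascade_fixed_layers(train_stage, P, N, num_layers):
    # Non-converging training: unlike Table 3-2, the outer loop does not run
    # until the overall false positive rate drops below Ftarget (which may
    # never happen with deliberately complex training samples). It simply
    # stops after a fixed number of layers; the detection rate is the prime
    # criterion and the false alarm rate is kept as reference information.
    cascade, D, F = [], 1.0, 1.0
    for i in range(num_layers):
        stage, d_i, f_i = train_stage(P, N, cascade)  # hypothetical helper
        cascade.append(stage)
        D *= d_i   # overall detection rate after i + 1 layers
        F *= f_i   # overall false alarm rate after i + 1 layers
        print(f"layer {i + 1}: D = {D:.3f}, F = {F:.3f}")
    return cascade
```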

3.3 False Alarm Eliminator

As mentioned in section 3.2, this stage of the system handles the false alarms generated by the previous stage, the AdaBoost vehicle detector. The first priority of the false alarm eliminator is to filter out as many false alarms as possible while affecting the detection rate as little as possible. As mentioned in section 3.2.5, decreasing the false alarm rate affects the detection rate at the same time; therefore, how to decrease the false alarm rate while keeping the detection rate as high as possible is the primary issue in this step. Additionally, computational complexity is also one of our concerns, because we want the system to be applicable to real-time applications. As a result, we use computationally inexpensive methods that exploit properties of vehicles to filter out false alarms.

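As an illustration of such lightweight checks, the sketch below measures daytime edge complexity as Canny edge density and performs the evening histogram comparison with OpenCV's histogram correlation. The particular complexity measures and all numeric bounds are assumptions for illustration, not the tuned criteria of the system.

```python
import cv2
import numpy as np

def daytime_vehicle_check(cr_gray, low=100, high=200,
                          min_density=0.05, max_density=0.45):
    # Daytime sketch: measure "edge complexity" as Canny edge density.
    # Vehicle fronts/rears show moderate edge structure, while road or sky
    # regions yield very few edges and cluttered background very many.
    edges = cv2.Canny(cr_gray, low, high)
    density = float(np.count_nonzero(edges)) / edges.size
    return min_density <= density <= max_density

def evening_vehicle_check(cr_gray, ref_hist, min_similarity=0.5):
    # Evening sketch: compare the CR's intensity histogram to a reference
    # vehicle histogram (ref_hist, built the same way from sample vehicle
    # CRs) using histogram correlation; an intensity complexity check
    # could be added analogously on the histogram spread.
    hist = cv2.calcHist([cr_gray], [0], None, [64], [0, 256])
    cv2.normalize(hist, hist)
    score = cv2.compareHist(hist, ref_hist, cv2.HISTCMP_CORREL)
    return score >= min_similarity
```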
