Object detection algorithm - Optimization of Multimedia Multi-Resolution Processing Applications for Distributed Scratchpad Memory Multicore Platforms

Object detection based on image processing has become an active research area in computer vision in recent years. For emerging embedded applications that rely on object recognition, such as automotive applications, surveillance, and intelligent robots, reliable object detection has a major influence on the performance and usability of the system. Viola and Jones proposed an AdaBoost-based procedure that reduces processing time substantially while achieving almost the same accuracy as other methods (e.g., skin color, neural network, and example-based methods). This section consists of four topics, introduced as follows.

• Haar-feature

AdaBoost-based object detection classifies images based on the values of simple features. There are several motivations for using features rather than pixels directly. The most common reason is that features can encode ad-hoc domain knowledge that is difficult to learn from a finite quantity of training data. For this system there is a second critical motivation: a feature-based system operates much faster than a pixel-based system. The simple features used are reminiscent of the Haar basis functions used by Papageorgiou et al. More specifically, two kinds of features are used. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 2-1). A three-rectangle feature computes the sum within the two outside rectangles subtracted from the sum in the center rectangle.

Figure 2-1 Haar-like feature based object detection
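The two feature types described above can be sketched directly on raw pixels. This is a minimal illustration, not the thesis implementation: the function names, the horizontal layout, and the sign convention of the two-rectangle feature (left minus right) are assumptions made for the example.

```python
def rect_sum(img, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(img, x, y, w, h):
    """Two horizontally adjacent w-by-h rectangles.

    Returns left sum minus right sum (the sign convention is an assumption).
    """
    return rect_sum(img, x, y, w, h) - rect_sum(img, x + w, y, w, h)


def three_rect_feature(img, x, y, w, h):
    """Three horizontally adjacent w-by-h rectangles.

    Returns the center sum minus the sum of the two outside rectangles.
    """
    left = rect_sum(img, x, y, w, h)
    center = rect_sum(img, x + w, y, w, h)
    right = rect_sum(img, x + 2 * w, y, w, h)
    return center - (left + right)


# Toy 2x6 image with a bright vertical band in the middle (hypothetical data).
img = [
    [1, 1, 5, 5, 1, 1],
    [1, 1, 5, 5, 1, 1],
]
two_value = two_rect_feature(img, 0, 0, 2, 2)      # 4 - 20
three_value = three_rect_feature(img, 0, 0, 2, 2)  # 20 - (4 + 4)
```

Each call to rect_sum here costs w×h additions, which is the cost that the integral image removes.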

• Cascade classifier

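[Diagram: Stage 0, Stage 1, ..., Stage 21 are evaluated in sequence, each over its own set of features (feature0, feature1, ...). A window that fails a stage is rejected; a window that passes is forwarded to the next stage, and a window that passes the last stage is a candidate.]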

Figure 2-2 Cascade classifier

The Haar-like features used are simply masks composed of two or three rectangles. When they are applied in object detection, the pixels masked by white and by black are summed separately. The difference between the two sums is compared against a predefined threshold, and a weak classifier is set if the difference exceeds the threshold. The weak classifiers produced by the features are combined by weighted averaging into a strong classifier, which indicates whether the examined area is an object or a non-object. Several strong classifiers are cascaded together as a degenerate decision tree. As long as the classifiers are cascaded in a proper order, the detection rate can be raised without sacrificing detection time. In our research, we use the default AdaBoost-trained classifier file, which has 2135 features and 22 stages, as shown in Figure 2-2.
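The early-rejection behavior of the cascade can be sketched as follows. This is a simplified model under stated assumptions: each weak classifier is a hypothetical (feature_fn, threshold, weight) triple that votes 1 when its feature value exceeds the threshold, and a stage passes when the weighted vote reaches the stage threshold. The toy features and numbers are invented for illustration and do not come from the trained classifier file.

```python
def strong_classifier(window, weak_classifiers, stage_threshold):
    """Weighted vote of the weak classifiers of one stage.

    weak_classifiers is a list of (feature_fn, threshold, weight) triples
    (a hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for feature_fn, threshold, weight in weak_classifiers:
        vote = 1 if feature_fn(window) > threshold else 0
        total += weight * vote
    return total >= stage_threshold


def cascade_detect(window, stages):
    """Run the stages in order and stop at the first failing stage.

    Returns (is_candidate, number_of_stages_evaluated).
    """
    for index, (weak_classifiers, stage_threshold) in enumerate(stages):
        if not strong_classifier(window, weak_classifiers, stage_threshold):
            return False, index + 1  # early termination
    return True, len(stages)


# Two toy stages over a window given as a flat list of pixel values.
stages = [
    ([(sum, 10, 1.0)], 1.0),
    ([(max, 5, 0.5), (sum, 20, 0.5)], 0.5),
]
rejected_early = cascade_detect([1, 1, 1], stages)
rejected_late = cascade_detect([4, 4, 4], stages)
accepted = cascade_detect([6, 6, 6], stages)
```

The second element of the result shows why the work per window varies: a window rejected at the first stage costs far less than one that reaches the last stage.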

• Integral image

Rectangle features can be computed very rapidly using an intermediate representation of the image called the integral image. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y') (2-1)

where ii(x, y) is the integral image and i(x, y) is the original image. Using the following pair of recurrences:

s(x, y) = s(x, y-1) + i(x, y) (2-2)

ii(x, y) = ii(x-1, y) + s(x, y) (2-3)

(where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0), the integral image can be computed in one pass over the original image.
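The recurrences (2-2) and (2-3) translate almost line for line into code. The sketch below assumes the image is a list of rows indexed as img[y][x]; the function name is illustrative.

```python
def integral_image(img):
    """Compute ii with ii[y][x] = sum of img[y'][x'] for all x' <= x, y' <= y."""
    height = len(img)
    width = len(img[0]) if height else 0
    ii = [[0] * width for _ in range(height)]
    for x in range(width):
        s = 0  # boundary condition s(x, -1) = 0
        for y in range(height):
            s += img[y][x]                       # (2-2): s(x, y) = s(x, y-1) + i(x, y)
            left = ii[y][x - 1] if x > 0 else 0  # boundary condition ii(-1, y) = 0
            ii[y][x] = left + s                  # (2-3): ii(x, y) = ii(x-1, y) + s(x, y)
    return ii


example = integral_image([[1, 2, 3],
                          [4, 5, 6]])
```

Every pixel is read once and every output is written once, which is the single pass mentioned above.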

[Diagram: regions A, B, C, and D of the image, with integral image reference points 1, 2, 3, and 4.]

Figure 2-3 Integral image

The sum of any rectangle can be computed with four references to the integral image (see Figure 2-3). To calculate the accumulated pixel value of area D, we simply compute p1 - p2 - p3 + p4, where p1 to p4 are the integral image values at points 1 to 4. For example, for an area of m×n pixels, the cost drops from m×n additions to three additions or subtractions. This method greatly reduces computational complexity. Unfortunately, it also implies frequent memory accesses to scattered data.
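The four-reference lookup can be sketched as below. The placement of the points is an assumption consistent with the formula p1 - p2 - p3 + p4: p4 is taken at the bottom-right corner of D, p1 diagonally above and to the left of D, p2 above D's right edge, and p3 to the left of D's bottom edge. References that fall outside the image are treated as 0.

```python
def integral_image(img):
    """Row-by-row integral image; same result as recurrences (2-2) and (2-3)."""
    ii = []
    for y, row in enumerate(img):
        running = 0
        out = []
        for x, value in enumerate(row):
            running += value
            out.append(running + (ii[y - 1][x] if y > 0 else 0))
        ii.append(out)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y), using four references."""
    def at(cx, cy):
        return ii[cy][cx] if cx >= 0 and cy >= 0 else 0

    p1 = at(x - 1, y - 1)
    p2 = at(x + w - 1, y - 1)
    p3 = at(x - 1, y + h - 1)
    p4 = at(x + w - 1, y + h - 1)
    return p1 - p2 - p3 + p4


ii = integral_image([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])
inner = rect_sum(ii, 1, 1, 2, 2)  # pixels 5, 6, 8, 9
```

The four reads land on two different rows of the integral image, which illustrates the scattered memory access pattern noted above.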

• Scaling window scanning

OpenCV is a library originally proposed by Intel in 2002. It targets computer vision applications on programmable platforms. As OpenCV has become more popular and is broadly used in various applications, it has been implemented on many platforms, including those with multicore architectures. CellCV is a parallelized implementation of OpenCV on the Cell Broadband Engine.

There are two detection approaches for AdaBoost-based detection: scaling the window and scaling the image. In this thesis, we choose the scaling-window approach. Because our simulation platform is the Cell processor, we follow CellCV's implementation. The detection mechanism uses a scaling window to scan the whole image from low resolution to high resolution. Scaling stops when the window size exceeds one of the image dimensions (see Figure 2-4).

Figure 2-4 Scaling window detection
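The scanning loop can be sketched as follows. The base window size of 20 pixels, the rounding rule, and the scan step are assumptions for illustration and are not taken from CellCV; only the stopping rule (stop once the window exceeds one image dimension) follows the description above.

```python
def window_sizes(img_w, img_h, base=20, factor=1.2):
    """Square window sizes from the base size upward.

    Scaling stops as soon as the window exceeds one of the image dimensions.
    """
    sizes = []
    scale = 1.0
    while True:
        win = int(round(base * scale))
        if win > img_w or win > img_h:
            break
        sizes.append(win)
        scale *= factor
    return sizes


def scan_positions(img_w, img_h, win, step):
    """Top-left corners of every window of size win that fits inside the image."""
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            yield x, y


sizes = window_sizes(640, 480)
positions = list(scan_positions(4, 4, 2, 1))
```

Each window size is one resolution level, and every level rescans the same integral image, which is the multi-resolution characteristic of this application.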

From the above description of AdaBoost-based object detection, we can identify some characteristics of this application. The task workload depends heavily on the input data, owing to the early termination of the cascade classifier. Therefore, no static schedule can guarantee load balance, and a dynamic dispatcher is required. Multi-resolution processing is another special characteristic of object detection: it involves repeated signal processing on the same input data at different granularities.
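The load-balance argument can be illustrated with a small simulation. It is a toy model, not the dispatcher used in this work: each task cost stands for the number of cascade stages a window passes through before rejection, the static schedule gives each core a contiguous block of equal task count, and the dynamic schedule hands the next task to whichever core becomes free first. All numbers are hypothetical.

```python
import heapq


def static_makespan(costs, num_workers):
    """Finish time when tasks are split into contiguous blocks of equal count."""
    n = len(costs)
    block = -(-n // num_workers) if n else 1  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, n, block)]
    return max(loads, default=0)


def dynamic_makespan(costs, num_workers):
    """Finish time when each task goes to the worker that becomes free first."""
    free_at = [0] * num_workers
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Two windows survive all 22 stages; the rest are rejected at the first stage.
costs = [22, 22, 1, 1, 1, 1, 1, 1]
static_time = static_makespan(costs, 2)
dynamic_time = dynamic_makespan(costs, 2)
```

Because the expensive windows cluster wherever an object appears, the static split leaves one core with almost all of the work, while the dynamic dispatch reaches the balanced finish time in this example.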

• Workload analysis

Object detection is made easy by the OpenCV APIs. To verify the performance of object detection in OpenCV, the latest OpenCV library was built on a 2.5 GHz Xeon workstation with 8 GB of main memory running Linux. Face detection was evaluated on different image sizes using different scaling factors. Table 2-1 summarizes the profiling results using a single thread. According to the results, object matching is the hot spot of object detection, occupying 95% or more of the execution time in all cases. The overall detection time depends on the image size and the scaling factor. The detector can process QVGA images in real time regardless of the scaling factor.

However, the ability to process QVGA in real time is already insufficient for today's applications. Since a scaling factor of 1.2 has been reported to have the best detection rate, we focus on object matching for VGA images using a scaling factor of 1.2 hereafter.

Table 2-1 Performance of row-based splitting

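Image size | Scaling factor | (unlabeled) | (unlabeled) | Object matching | (unlabeled) | Total | Object matching share
640×480 | 1.1 | 0.76 | 5.13 | 322.21 | 0.21 | 328.31 | 98%
640×480 | 1.2 | 0.83 | 5.17 | 166.48 | 0.10 | 172.59 | 96%
640×480 | 1.3 | 0.81 | 5.10 | 123.97 | 0.06 | 129.94 | 95%
QVGA | 1.1 | 0.51 | 0.79 | 65.36 | 0.03 | 66.68 | 98%
QVGA | 1.2 | 0.51 | 0.80 | 30.83 | 0.03 | 32.17 | 96%
QVGA | 1.3 | 0.58 | 0.93 | 26.50 | 0.02 | 28.03 | 95%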

• Detection kernel - object matching

According to Table 2-1, we take object matching as the computation kernel on the DSPs. The whole operation can be divided into summed-feature calculation, feature-threshold comparison, and stage-threshold comparison. Figure 2-5 illustrates the operations in feature calculation in detail. Each summed feature is multiplied by a weight, and the weighted values are added into a single feature sum. The variance represents the pixel distribution for a specific search position and search window. It is multiplied by a constant threshold to obtain the feature threshold. The feature sum is compared with the feature threshold, and the comparison decides whether the result is assigned value A or value B. Values A and B are defined during classifier training. The results are accumulated over all features in the stage, and the final result is compared with the stage threshold. This comparison determines whether the window passes to the next stage.

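[Diagram: sum_fea0, sum_fea1, and sum_fea2 are multiplied by weight0, weight1, and weight2 to give weighted_fea0, weighted_fea1, and weighted_fea2, which are added into sum. variance is multiplied by threshold_orig to give threshold. Comparing sum with threshold selects a or b, which is accumulated into stage_sum_fea.]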

Figure 2-5 Data flow of feature matching
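The data flow of Figure 2-5 can be sketched in a few lines. The variable names follow the figure labels; the dictionary layout, the direction of the two comparisons (value a when the sum is below the threshold, pass when the stage sum reaches the stage threshold), and all numeric values are assumptions for illustration.

```python
def eval_feature(sum_feas, weights, variance, threshold_orig, a, b):
    """Weighted feature sum compared against the variance-scaled threshold."""
    total = sum(s * w for s, w in zip(sum_feas, weights))  # weighted_fea0 + weighted_fea1 + ...
    threshold = variance * threshold_orig                  # feature threshold
    return a if total < threshold else b                   # comparison direction is assumed


def eval_stage(features, variance, stage_threshold):
    """Accumulate the per-feature results and compare with the stage threshold."""
    stage_sum_fea = 0.0
    for f in features:
        stage_sum_fea += eval_feature(f["sum_feas"], f["weights"], variance,
                                      f["threshold_orig"], f["a"], f["b"])
    return stage_sum_fea >= stage_threshold, stage_sum_fea


# Hypothetical stage with one two-rectangle and one three-rectangle feature.
features = [
    {"sum_feas": [100, 40], "weights": [-1, 3],
     "threshold_orig": 0.5, "a": 0.25, "b": 0.75},
    {"sum_feas": [10, 20, 10], "weights": [1, -2, 1],
     "threshold_orig": 0.25, "a": 0.5, "b": -0.5},
]
passed, stage_sum = eval_stage(features, variance=16, stage_threshold=1.0)
```

In a full detector each entry of sum_feas would come from a four-reference lookup in the integral image, so the kernel is dominated by those memory reads plus a few multiplications and comparisons per feature.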
