Chapter 2 System Architecture and Grid-Processing
2.2 Grid Processing
Fig. 2-5 Background image construction by GMM: (a) video sequence, (b) GMM background image
Fig. 2-6 Foreground image obtained by background subtraction: (a) current image, (b) foreground image
2.2.2 Candidate Selection
The image is divided into non-overlapping blocks, and every block in an image has the same size. To find the moving blocks that show a gray-level change, the foreground image is obtained by the GMM approach, and the foreground pixels are summed within each block, as shown in Eq. (2.9),
where S_k is the kth block, (x, y) are the coordinates in the scene, and T1 is the predefined threshold.
Foreground regions can be found by the GMM approach, but they may also include static objects. Consequently, the temporal difference of two successive frames is also employed. All pixels with value “1” in the difference image are considered to belong to moving objects in the scene. To reduce the disturbance of noise, the difference image is summed over each block to determine whether the block is moving, as defined by Eq. (2.10),
where T2 is the predefined threshold. To reduce the computational cost, only blocks whose background-subtraction and temporal-difference sums are both larger than the predefined thresholds are regarded as candidates containing moving objects, by Eq. (2.11). Figure 2-7 shows the results of “grid processing”.
Fig. 2-7 Results of grid processing
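The candidate selection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work: the function names, the block size of 10 pixels, the pixel-difference threshold diff_thresh, and the values of t1 and t2 (standing in for T1 and T2) are all assumptions chosen for the example.

```python
import numpy as np

def block_sums(mask, block):
    """Sum a 2-D array over non-overlapping block x block tiles."""
    h, w = mask.shape
    rows, cols = h // block, w // block
    trimmed = mask[:rows * block, :cols * block]   # drop partial tiles at the border
    return trimmed.reshape(rows, block, cols, block).sum(axis=(1, 3))

def candidate_blocks(foreground, prev_frame, curr_frame,
                     block=10, t1=20, t2=20, diff_thresh=15):
    """Return a boolean grid: True where a block passes both tests.

    foreground: binary GMM foreground image (1 = foreground pixel).
    prev_frame, curr_frame: two successive gray-level frames.
    All threshold values are hypothetical.
    """
    fg_sum = block_sums(foreground.astype(np.int64), block)
    # Binary temporal difference image: 1 where the gray level changed.
    moving = np.abs(curr_frame.astype(np.int64)
                    - prev_frame.astype(np.int64)) > diff_thresh
    td_sum = block_sums(moving.astype(np.int64), block)
    # A block is a candidate only if both sums exceed their thresholds.
    return (fg_sum > t1) & (td_sum > t2)
```

A block that is foreground but static (for example, an object that has stopped) fails the temporal test and is not passed on to the feature analysis.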
Chapter 3
Local Feature Analysis
This chapter is divided into three parts, one for each kind of local feature analysis. First, because smoke blurs the scene, this work compares the energy of the background with that of the foreground. Second, the one-dimensional temporal energy is analyzed: unlike other physical objects, smoke changes the energy of the scene smoothly. Finally, as with the previous feature, smoke changes the color configuration much more smoothly than other objects do.
All of the local features are computed by block-based processing.
3.1 2-D Spatial Wavelet Analysis
Although the Fourier transform has been the mainstay of transform-based image processing since the late 1950s, a more recent transformation, called the wavelet transform, is now making it even easier to compress, transmit, and analyze many images. Unlike the Fourier transform, whose basis functions are sinusoids, wavelet transforms are based on small waves, called wavelets, of varying frequency and limited duration. This allows them to provide the equivalent of a musical score for an image, revealing not only what notes (frequencies) to play but also when to play them.
Conventional Fourier transforms, on the other hand, provide only the notes or frequency information; temporal information is lost in the transform process.
We now transform an M-by-N image into the wavelet domain. The whole 2-D spatial wavelet transform can be decomposed into a horizontal wavelet transform and a vertical wavelet transform. Fig. 3-1 is the diagram of the horizontal wavelet transform. The direction from left to right is the wavelet decomposition, and the direction from right to left is the wavelet synthesis.
Fig. 3-1 Horizontal wavelet transformation
Each row of the image is regarded as an independent sequence, and the wavelet transform is applied to each row separately. In brief, after the horizontal wavelet transform an original image is decomposed into low-band information on the left side and high-band information on the right side. We use L and H to stand for the low-band and high-band information, respectively.
The vertical wavelet transform is then applied to the L and H data obtained by the horizontal transform, which completes the whole wavelet transform. Fig. 3-2 is the diagram of the vertical wavelet transform. The direction from left to right is the wavelet decomposition, and the direction from right to left is the wavelet synthesis. The data on the left side have been processed by the horizontal wavelet transform but not yet by the vertical wavelet transform. Each column of the image is regarded as an independent sequence, and the wavelet transform is applied to each column separately. After the vertical wavelet transform, the data are further separated into an upper part and a lower part. The upper part is the vertical low-band information and the lower part is the vertical high-band information, as shown on the right side of Fig. 3-2. Combined with the horizontal transform, the whole image is separated into four regions: horizontal low-band vertical low-band (LL), horizontal low-band vertical high-band (LH), horizontal high-band vertical low-band (HL), and horizontal high-band vertical high-band (HH).
Fig. 3-2 Vertical wavelet transformation
It is well known that wavelet subimages contain the texture and edge information of the original image. Edges produce local extrema in wavelet subimages [15]. The wavelet subimages LH, HL, and HH contain the horizontal, vertical, and diagonal high-frequency information of the original image, respectively. Fig. 3-3 shows the original image and its single-level wavelet subimages.
Fig. 3-3 Original image and its single level wavelet subimages
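The row-then-column decomposition described above can be sketched as follows. The text does not name the wavelet filter, so this example assumes the orthonormal Haar wavelet; the function names haar_rows and haar2d are illustrative only.

```python
import numpy as np

def haar_rows(a):
    """One-level Haar transform along each row: returns (low band, high band)."""
    even, odd = a[:, 0::2], a[:, 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar2d(image):
    """Single-level 2-D Haar decomposition into LL, LH, HL, HH.

    Naming follows the text: the first letter is the horizontal band,
    the second letter is the vertical band.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image height and width must be even")
    L, H = haar_rows(img)        # horizontal pass: rows treated independently
    LL, LH = haar_rows(L.T)      # vertical pass on L (transpose: columns become rows)
    HL, HH = haar_rows(H.T)      # vertical pass on H
    return LL.T, LH.T, HL.T, HH.T
```

With this filter an image that varies only from left to right has all of its detail in HL, and a constant image has no detail at all, which is consistent with the role of the subimages as edge and texture indicators.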
Because smoke blurs the texture and edges in the background of an image, high-frequency information becomes much less visible when smoke covers part of the scene. The detail subimages are therefore an important indicator of smoke, because their high-frequency values decrease. The energy of the details is calculated for each candidate block:
$$E(B_k, I_t) = \sum_{(x,y) \in B_k} \left( |LH_t(x,y)|^2 + |HL_t(x,y)|^2 + |HH_t(x,y)|^2 \right)$$
where B_k is the kth block of the scene and I_t is the input image at time t. The wavelet transform coefficients are shown in Fig. 3-4.
Fig. 3-4 Two-dimensional wavelet transform and its coefficients
Instead of using the energy of the input directly, we compute the energy ratio of the current frame to the background model, because the ratio cancels the effect of different conditions and gives an impartial measure of the decrease:
$$\alpha(B_k, t) = \frac{E(B_k, I_t)}{E(B_k, BG_t)}$$
where BG_t is the background model. The value of the energy ratio α is our first feature, in the spatial domain; it reflects the fact that the texture and edges of the scene observed by the camera are no longer as visible in the current input frame as they used to be. It is also possible to determine the location of smoke using the wavelet subimages, as shown in Fig. 3-5.
Fig. 3-5 Blurring of the edges is visible in the single-level wavelet subimages: (a) original frame without smoke, (b) frame with smoke
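The energy ratio α of one candidate block can be sketched as below. The sketch assumes a single-level orthonormal Haar wavelet (the filter is not specified in the text) and a block with even side lengths; detail_energy, energy_ratio, and the guard value eps are hypothetical names introduced for the example.

```python
import numpy as np

def detail_energy(block):
    """Sum of squared LH, HL and HH Haar coefficients of a block."""
    b = np.asarray(block, dtype=np.float64)
    # Corners of every 2x2 neighbourhood: p q / r s
    p, q = b[0::2, 0::2], b[0::2, 1::2]
    r, s = b[1::2, 0::2], b[1::2, 1::2]
    lh = (p + q - r - s) / 2.0   # horizontal low band, vertical high band
    hl = (p - q + r - s) / 2.0   # horizontal high band, vertical low band
    hh = (p - q - r + s) / 2.0   # high band in both directions
    return float((lh ** 2 + hl ** 2 + hh ** 2).sum())

def energy_ratio(current_block, background_block, eps=1e-12):
    """Ratio of detail energy in the current frame to that in the background."""
    return detail_energy(current_block) / max(detail_energy(background_block), eps)
```

If the current block is the background with its contrast halved and a constant brightness offset added, the ratio is 0.25: the offset does not reach the detail bands, and halving the amplitude quarters the energy. A ratio well below 1 is therefore consistent with the blurring effect described above.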
3.2 1-D Temporal Energy Analysis
A wave is a periodic oscillating function of time or space. In contrast, wavelets are localized waves: their energy is concentrated in time or space, which makes them suited to the analysis of transient signals. A differential signal readily captures sudden changes in a signal, and its computational cost is lower than that of other analysis methods. Figure 3-6 shows the original signal and the differential signal. Differencing is a practical way to quantify the amount of change.
Fig. 3-6 Block diagram of 1-D differential operation
Ordinary moving objects such as pedestrians or vehicles are solid, so the details behind them cannot be seen through their bodies. If an ordinary moving object passes through the candidate block, there is a sudden energy change because of the transition from the background to the foreground object. In contrast, smoke in its initial stage is semi-transparent, and the scene behind it becomes less visible as time goes by.
This process produces a gradual change of energy, and any abrupt variation is regarded as noise caused by a common disturbance. One-dimensional temporal differential analysis of the energy ratio α provides a proper evaluation of this phenomenon.
We obtain the variation information by the 1-D differential operation shown in Fig. 3-6. The disturbance can then be measured by summing the variations over a predefined time interval. Ordinary solid moving objects produce a large amount of variation, as in Fig. 3-7(b). Smoke changes the energy ratio smoothly and produces little variation, as shown in Fig. 3-7(e). The likelihood that the candidate block is a smoke region is inversely proportional to the parameter β,
where D[n] is the differential signal of the energy ratio α, t is the current frame, and N is the number of frames over which the summation is calculated.
Fig. 3-7 Comparison of changes on the wavelet energy ratio at the passage of an ordinary moving object and smoke objects. (a) Sample frame from the test sequence and temporal candidate block with an ordinary moving object. (b) Profile of the wavelet energy ratio in the selected block. (c) Differential signal of the wavelet energy ratio in the same block with possible observation of sudden change properties. (d) Sample frame from the test sequence and temporal candidate block with smoke objects. (e) Profile of the wavelet energy ratio in the selected block. (f) Differential signal of the wavelet energy ratio in the same block, with possible observation of gradual change properties.
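The temporal measure can be illustrated with a short sketch. The exact form of the summation is an assumption here: the example takes β as the sum of the absolute values of the differential signal D[n] over the last N frames, and the function name temporal_variation is hypothetical.

```python
import numpy as np

def temporal_variation(alpha_history, n_frames):
    """Sum of |D[n]| over the last n_frames differences of the energy ratio.

    alpha_history: energy ratio of one block, oldest value first.
    Assumes D[n] = alpha[n] - alpha[n - 1] and an absolute-value sum.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    a = np.asarray(alpha_history, dtype=np.float64)
    d = np.diff(a)                       # differential signal D[n]
    return float(np.abs(d[-n_frames:]).sum())

# Gradual fade, as expected for smoke, versus a brief passage of a solid object.
smoke_like = np.linspace(1.0, 0.5, 11)
solid_like = [1.0] * 5 + [0.2] + [1.0] * 5
```

Under this assumption the gradual fade gives β = 0.5, while the brief passage of a solid object gives β = 1.6, so the smaller value corresponds to the more smoke-like block.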
3.3 1-D Temporal HSV Analysis
It is hard to define smoke precisely by a specific color appearance. However, it is possible to characterize smoke by its effect on the color appearance of the region it covers. Like the energy, the color configuration changes gradually under smoke.
Color analysis is performed to identify those pixels in the image that respect the chromatic properties of smoke. The HSV color space and photometric invariant features are considered in the analysis. Photometric invariant features are functions that describe the color configuration of each image coordinate while discounting local color variations. HSV stands for hue, saturation, and value, and is also often called HSB (B for brightness). The HSV color space, shown in Fig. 3-8, is a model common in computer vision applications. It is one of the most common cylindrical-coordinate representations of points in an RGB color model; it rearranges the geometry of RGB in an attempt to be more perceptually relevant than the Cartesian representation, and the full spectrum of colors can be created by editing these three values:
Hue is another word for color. Red, blue, and yellow are the primary hues, and when combined in equal amounts they create the secondary hues orange, green, and violet. Moving around the cone changes the hue along the colors of the rainbow.
Saturation is the intensity of a color (or hue). When colors are mixed or black is added to a color, saturation and intensity drop. When white is added, the color becomes lighter but not necessarily more intense. Colors become purer as the S (saturation) value increases.
Value is the lightness of a color. As with saturation, adding black or white to a color affects value. Tints are colors with added white, and shades are colors with added black.
Fig. 3-8 HSV color space
It is often more natural to think about a color in terms of hue and saturation than in terms of additive or subtractive color components. The HSV model was developed in the 1970s for computer graphics applications and is used in color pickers, image analysis, and computer vision. Hue and saturation in the HSV color space, and the normalized-RGB color space, are two photometric invariant features in common use.
This work uses the HSV color space because it is fast to compute, since each channel can be obtained separately. From empirical analysis, smoke smoothly changes each HSV component of the point it covers, but it does not severely change the configuration of the HSV color system. However, the configuration is likely to
[Eq. (3.4): block-level components H(B_k, t), S(B_k, t), and V(B_k, t), defined from the pixel values H(x, y, t), S(x, y, t), and V(x, y, t)]
This investigation plots the HSV color histogram of a specific block in three different situations of a video sequence in order to characterize the presence or absence of smoke. The color histogram distribution in Fig. 3-9 (c) is similar to the one in Fig. 3-9 (a). However, the presence of a pedestrian makes the color histogram distribution in Fig. 3-9 (b) totally different from the one in Fig. 3-9 (a).
Fig. 3-9 HSV color histogram of a specific block (a) original image (b) covered by ordinary moving objects (c) covered by smoke
The variations of the three channels in the HSV color system are again obtained by the 1-D differential operation, as shown in Fig. 3-10. Ordinary solid moving objects produce a large amount of variation, as in the right column of Fig. 3-10 (b). Smoke changes the HSV components smoothly and produces few impulses, as shown in the right column of Fig. 3-10 (d).
The likelihood that the candidate block is a smoke region is inversely proportional to the parameter β of each channel.
Fig. 3-10 Comparison of changes in the HSV color components at the passage of an ordinary moving object and smoke objects. (a) Sample frame from the test sequence and temporal candidate block with an ordinary moving object. (b) Left column: profile of the H, S, and V color components in the selected block; right column: profile of the differential signal of the H, S, and V components in the same block, with possible observation of variance properties. (c) Sample frame from the test sequence and temporal candidate block with smoke objects. (d) Left column: profile of the H, S, and V color components in the selected block; right column: profile of the differential signal of the H, S, and V components in the same block, with possible observation of invariance properties.
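The per-channel temporal analysis can be sketched with the Python standard library and NumPy. Several details are assumptions made only for this illustration: each block is summarized by the mean of its per-pixel H, S, and V values, the variation is the sum of absolute differences over the last N frames, hue wrap-around is ignored, and the colors in the usage example are invented.

```python
import colorsys
import numpy as np

def block_hsv_mean(rgb_block):
    """Mean H, S, V of a block whose RGB values lie in [0, 1]."""
    pixels = np.asarray(rgb_block, dtype=np.float64).reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels])
    return hsv.mean(axis=0)

def hsv_variations(rgb_blocks, n_frames):
    """Return (beta_H, beta_S, beta_V) for one block tracked over time."""
    series = np.array([block_hsv_mean(b) for b in rgb_blocks])  # (frames, 3)
    d = np.diff(series, axis=0)          # 1-D differential of each channel
    return tuple(float(v) for v in np.abs(d[-n_frames:]).sum(axis=0))

def flat_block(color, size=10):
    """Helper for the example: a size x size block of one RGB color."""
    return np.full((size, size, 3), color, dtype=np.float64)

green = np.array([0.1, 0.6, 0.2])        # hypothetical background color
gray = np.array([0.7, 0.7, 0.7])         # hypothetical smoke color
dark = np.array([0.1, 0.1, 0.12])        # hypothetical clothing color

# Smoke: the background is blended gradually toward gray.
smoke_seq = [flat_block((1 - w) * green + w * gray)
             for w in np.linspace(0.0, 0.5, 11)]
# Solid object: the background is replaced for one frame, then restored.
solid_seq = [flat_block(green)] * 5 + [flat_block(dark)] + [flat_block(green)] * 5

beta_smoke = hsv_variations(smoke_seq, 10)
beta_solid = hsv_variations(solid_seq, 10)
```

In this synthetic example every channel varies less for the gradual blend than for the solid object, which is the behavior the three chromatic features rely on.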
Chapter 4
Classification and Verification
This chapter is divided into two parts. The first part introduces the classifier of this system. The five features proposed in the previous chapter are partially complementary and have different physical meanings. The 2-D spatial wavelet feature α of Section 3.1 distinguishes high-texture objects from smoke. The 1-D temporal energy feature βE of Section 3.2 distinguishes objects that suddenly change the texture in the candidate block. The 1-D temporal chromatic configuration features βH, βS, and βV of Section 3.3 distinguish objects that suddenly change the color structure in the candidate block. The five proposed features are combined into the feature vector x = [α, βE, βH, βS, βV] for each candidate block and classified by a cascade classifier. The second part of this chapter introduces the global feature verification of this system. Three verification processes, area ratio, contour analysis, and region analysis, are proposed to further reduce the false alarm rate.
4.1 Classification
The conventional AdaBoost [26] procedure can be easily interpreted as a greedy feature selection process. Consider the general problem of boosting, in which a set of classification functions is combined using a weighted majority vote. The challenge is to associate a large weight with each good classification function and a smaller weight with poor functions. AdaBoost is an aggressive mechanism for selecting a small set of good classification functions which nevertheless have significant variety. This study uses the five features introduced in the previous chapter, and the threshold selection for each feature is described below.
Given the distribution of each feature value over the block-based smoke database, we determine for each feature the optimal threshold classification function, that is, the threshold for which the minimum number of examples is misclassified. A weak classifier h(x, f, p, θ), given in Eq. (4.1), consists of a feature f, a threshold θ, and a polarity p indicating the direction of the inequality. Here x is a 10x10-pixel image.
$$h(x, f, p, \theta) = \begin{cases} 1, & \text{if } p f(x) < p\theta \\ 0, & \text{otherwise} \end{cases} \qquad (4.1)$$
Figure 4-1 is the flow chart of optimal threshold selection for each feature.
The selected threshold of each feature is trained on the block-based database of 10x10 smoke patches, which consists of 2,336 smoke images and 23,632 non-smoke images.
Fig. 4-1 Optimal threshold selection for each feature
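The threshold search for a single feature can be sketched as an exhaustive scan. This is an illustration under assumptions, not the procedure of Fig. 4-1 itself: the candidate thresholds are taken as the midpoints between sorted feature values plus one value beyond each end, the weak classifier follows the form h = 1 when p·f(x) < p·θ, and the function name best_stump and the sample values are hypothetical.

```python
import numpy as np

def best_stump(values, labels, weights=None):
    """Return (theta, polarity, error) with the smallest weighted error.

    values: feature value f(x) of every training block.
    labels: 1 for smoke, 0 for non-smoke.
    weights: optional example weights; uniform if omitted.
    """
    v = np.asarray(values, dtype=np.float64)
    y = np.asarray(labels, dtype=np.int64)
    w = (np.full(v.shape, 1.0 / v.size) if weights is None
         else np.asarray(weights, dtype=np.float64))
    u = np.unique(v)
    # Midpoints between neighbouring values, plus one threshold beyond each end.
    candidates = np.concatenate(([u[0] - 1.0], (u[:-1] + u[1:]) / 2.0, [u[-1] + 1.0]))
    best = (None, 1, np.inf)
    for theta in candidates:
        for p in (1, -1):
            pred = (p * v < p * theta).astype(np.int64)
            err = float(w[pred != y].sum())      # weighted misclassification
            if err < best[2]:
                best = (float(theta), p, err)
    return best

# Invented feature values: low values labelled smoke, high values non-smoke.
theta, polarity, error = best_stump([0.1, 0.2, 0.3, 0.8, 0.9], [1, 1, 1, 0, 0])
```

For these invented values the scan settles on a threshold halfway between 0.3 and 0.8 with polarity +1 and no misclassified example; reversing the labels yields the same threshold with polarity −1.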
The algorithm described in Table 4-1 is used to select key weak classifiers from the set of possible weak classifiers. In this study, the weak classifiers are the five local features introduced in the previous chapter, which are proposed to classify smoke blocks and non-smoke blocks.
Table 4-1 Boosting algorithm for learning a query online
T hypotheses are constructed, each using a single feature. The final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.
• Initialize the weights from the numbers of negatives and positives, respectively.
• For t = 1, . . . , T :
– Select the best weak classifier with respect to the weighted error
• The final strong classifier is:
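The boosting loop of Table 4-1 can be sketched as follows. The sketch assumes the standard discrete AdaBoost update of [26]: weights initialized to 1/(2m) and 1/(2l) for the m negatives and l positives, correctly classified examples down-weighted by β_t = ε_t / (1 − ε_t), and a final vote with weights log(1/β_t). Whether every step matches the table is an assumption, and all function names and the toy data are hypothetical.

```python
import math
import numpy as np

def stump_predict(X, feature, theta, polarity):
    """Weak classifier: 1 where polarity * f(x) < polarity * theta."""
    return (polarity * X[:, feature] < polarity * theta).astype(np.int64)

def best_weak_classifier(X, y, w):
    """Exhaustive search over features, thresholds and polarities."""
    best = None
    for j in range(X.shape[1]):
        u = np.unique(X[:, j])
        thetas = np.concatenate(([u[0] - 1.0], (u[:-1] + u[1:]) / 2.0, [u[-1] + 1.0]))
        for theta in thetas:
            for p in (1, -1):
                err = float(w[stump_predict(X, j, theta, p) != y].sum())
                if best is None or err < best[0]:
                    best = (err, j, float(theta), p)
    return best

def adaboost(X, y, rounds):
    """Return a list of (vote weight, feature, theta, polarity)."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.int64)
    m, l = int((y == 0).sum()), int((y == 1).sum())   # both classes must be present
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))
    model = []
    for _ in range(rounds):
        w = w / w.sum()                               # normalize the weights
        err, j, theta, p = best_weak_classifier(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)       # keep beta finite and positive
        beta = err / (1.0 - err)
        correct = stump_predict(X, j, theta, p) == y
        w = w * np.where(correct, beta, 1.0)          # down-weight correct examples
        model.append((math.log(1.0 / beta), j, theta, p))
    return model

def strong_classify(model, X):
    """Weighted vote: 1 where the score reaches half of the total vote weight."""
    X = np.asarray(X, dtype=np.float64)
    score = sum(a * stump_predict(X, j, theta, p) for a, j, theta, p in model)
    return (score >= 0.5 * sum(a for a, _, _, _ in model)).astype(np.int64)

# Toy data that no single threshold separates: positives lie at both ends.
X_toy = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_toy = [1, 1, 0, 0, 1, 1]
```

On the toy data one round of boosting cannot label all six examples correctly, but three rounds can, which shows how a weighted vote of simple threshold tests becomes more expressive than any single test.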
The cascade architecture [26][28] is a kind of degenerate decision tree that attempts to reject as many negatives as possible at the earliest stage possible. A positive result from the first classifier triggers the evaluation of a second classifier, a positive result from the second classifier triggers the evaluation of a third classifier, and so on. A negative outcome at any point leads to immediate rejection. Figure 4-2 shows the schematic depiction of a cascade classifier.
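Fig. 4-2 Schematic depiction of a cascade classifier. All candidate blocks enter classifier 1; a true (T) result passes a block on to classifiers 2 and 3 and then to further processing, and a false (F) result at any classifier rejects the candidate block.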
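The early-rejection behavior of the cascade can be sketched in a few lines. The stage tests, feature names, and thresholds below are hypothetical and serve only to show the control flow; they are not the trained stages of this system.

```python
def cascade_classify(stages, features):
    """Run the stages in order.

    Returns (accepted, number_of_stages_evaluated). Evaluation stops at the
    first stage that rejects, so later stages cost nothing for that block.
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(features):
            return False, index
    return True, len(stages)

# Hypothetical stage tests and thresholds, for illustration only.
stages = [
    lambda f: f["alpha"] < 0.8,
    lambda f: f["beta_e"] < 0.5,
    lambda f: max(f["beta_h"], f["beta_s"], f["beta_v"]) < 0.5,
]

textured_object = {"alpha": 0.95, "beta_e": 0.1, "beta_h": 0.1, "beta_s": 0.1, "beta_v": 0.1}
smoke_like = {"alpha": 0.4, "beta_e": 0.1, "beta_h": 0.05, "beta_s": 0.2, "beta_v": 0.1}
```

In this example the textured block is rejected after one stage, while the smoke-like block passes all three; most of the computation is therefore spent on the blocks that are hardest to reject.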
The number of cascade stages and the size of each stage must be sufficient to achieve similar detection performance while minimizing computation. Given a trained cascade of classifiers, the false positive rate of the cascade is
$$F = \prod_{i=1}^{K} f_i$$
where F is the false positive rate of the cascaded classifier, K is the number of classifiers, and f_i is the false positive rate of the ith classifier on the examples that get through to it. The detection rate is
$$D = \prod_{i=1}^{K} d_i$$
where D is the detection rate of the cascaded classifier, K is the number of classifiers, and d_i is the detection rate of the ith classifier on the examples that get through to it.
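As a worked illustration of the two products, the sketch below evaluates them for a hypothetical cascade of K = 3 stages in which every stage has a false positive rate of 0.3 and a detection rate of 0.9. The number of stages is an assumption made only for the example.

```python
import math

def cascade_rates(stage_fp_rates, stage_det_rates):
    """Overall false positive rate F and detection rate D of a cascade."""
    if len(stage_fp_rates) != len(stage_det_rates):
        raise ValueError("one false positive rate and one detection rate per stage")
    F = math.prod(stage_fp_rates)     # product of the per-stage f_i
    D = math.prod(stage_det_rates)    # product of the per-stage d_i
    return F, D

F, D = cascade_rates([0.3, 0.3, 0.3], [0.9, 0.9, 0.9])
```

Here F = 0.3³ = 0.027 and D = 0.9³ = 0.729: adding stages drives the false positive rate down quickly, but the detection rate also falls with every stage, which is why each stage needs a high per-stage detection rate.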
The cascade design process is driven by a set of detection and performance goals. In most cases, the trained classifiers are required to achieve high detection rates and low false positive rates. Table 4-2 gives the training algorithm for building a cascaded classifier.
Table 4-2 Training algorithm for building a cascaded detector.
• User selects values for f, the maximum acceptable false positive rate per layer and d, the minimum acceptable detection rate per layer.
• User selects the target overall false positive rate, Ftarget.
• P = set of positive examples
* Evaluate current cascaded classifier on validation set to determine Fi and Di
* Decrease threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_(i−1) (this also affects Fi)
– N ← 0
– If Fi > Ftarget then evaluate the current cascaded detector on the set of non-smoke images and put any false detections into the set N
A very simple framework is used to produce an effective classifier that is highly efficient. We select fi = 0.3 as the maximum acceptable false positive rate and di = 0.9 as the minimum acceptable detection rate of each layer. Each layer of the cascade is trained by AdaBoost [26] (as described in Table 4-1), with the number of features used being increased until the target detection and false positive rates are met for this level. The complete arrangement of the local feature cascade classifier is shown in Figure 4-3.
Fig. 4-3 Local feature cascade classifier
4.2 Global Feature Verification
Global feature verification is the final stage of the proposed system. It is an important stage because it discriminates well in tough cases such as large moving objects, light reflections over a large area, and trees with vivid leaves, all of which can satisfy the local features we proposed. Adding more local features is an inefficient way to solve these problems. Instead, the blocks in the same region are combined to obtain the whole region of each object, so that region-based features (global features) can be analyzed in each region.
Block-based connected components are therefore employed to combine neighboring blocks. The next section introduces block-based connected components, followed by the global feature verification methods of our system: “area ratio”, “contour analysis”, and “region analysis” verification.
Fig. 4-4 Flow chart of global feature verification
4.2.1 Block-Based Connected Components
Connectivity between pixels is a fundamental concept that simplifies the definition of numerous digital image concepts, such as regions and boundaries. To establish whether two pixels are connected, it must be determined whether they are neighbors and whether their gray levels satisfy a specified criterion of similarity [27]. For instance, in a binary image with values 0 and 1, two pixels may be 4-neighbors, but they are said to be connected only if they have the same value.
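Applied to blocks instead of pixels, the same idea can be sketched as a labeling pass over the grid of blocks that passed the cascade. The use of 4-connectivity and the function name label_blocks are assumptions made for this illustration.

```python
from collections import deque
import numpy as np

def label_blocks(candidates):
    """Label 4-connected groups of True blocks.

    candidates: 2-D boolean grid, one entry per block.
    Returns (labels, count); labels is 0 for background, 1..count for regions.
    """
    grid = np.asarray(candidates, dtype=bool)
    labels = np.zeros(grid.shape, dtype=np.int64)
    rows, cols = grid.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r, c] or labels[r, c]:
                continue
            count += 1                      # start a new region
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:                    # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Each labeled region can then be treated as one object when region-based features are measured; blocks that touch only diagonally are kept in separate regions under the 4-connectivity assumed here.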
Let V be the set of gray-level values used to define adjacency. In a binary image, V