Chapter 2 Moving Object Extraction
2.3 Moving Object Tracking
The connected components labeling process extracts all separate objects and gives them indexes. We use these data to build a table for these objects, which we call the Object Tracking Table. The object tracking table contains several items, including life time, coordinates, classification record, and so on. Table 1 shows all items in the object tracking table. Several items, such as the H/W ratio and the histogram, are useful for the further classification discussed in Chapter 3.
Table 1 : The items of object tracking table.
Item               Description
Index              Labeling number
Coordinates        Top-right and bottom-left coordinates
Life Times         Life period
FindorNot          Search result for comparing previous object tracking table
FindorNot Counter  Search result buffer
NeworNot           New object or not
Classify Record    Classify record
H/W Ratio          Height, width ratio
Size               Size
Histogram Data     Histogram for pixel number projected to width axis
Building the object tracking table takes several steps. First, we need to know whether an object that exists in the current frame still exists in the next frame. Second, we need to analyze the characteristics of the object to assist the classification process. The flow chart below shows how we determine whether an object seen in this frame is still present when the next frame arrives.
Figure 2-9 : Object tracking table update process.
We check every new object to see whether it belongs to one of the entries in the object tracking table. The first step is to check the overlapping status. There are three conditions: (1) one object contains the other, or the overlapping area is large; (2) the overlapping area is small, or more than one object overlaps; and (3) there is no overlap. When one object contains the other, or there appears to be a large overlapping area, we treat the two objects as the same one and update the object tracking table. If they share only a small overlapping area, or the comparison shows that the new object overlaps more than one object, we use the centers of the objects to tell them apart. We set a threshold on the distance between the two object centers. If the distance is above the threshold, the comparison fails. If the distance is below the threshold, we treat the two objects as the same. When more than one object is matched successfully, a final check is applied: we use histogram comparison as the final gate of the object tracking table update process. When a new object is linked to two or more entries in the object tracking table and they are all within the center threshold, we use the histogram of each object to calculate the SAD (Sum of Absolute Differences). Because our system runs in real time at 30 FPS, using SAD to measure similarity is practical.
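The update decision described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the system: the names `TrackedObject`, `overlap_ratio` and `match_object`, the box layout, the definition of "big overlap" as a ratio of the smaller box, and both threshold values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)


@dataclass
class TrackedObject:
    index: int
    box: Box
    histogram: Sequence[int]


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box (assumed measure)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return (w * h) / min(area_a, area_b)


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def sad(h1: Sequence[int], h2: Sequence[int]) -> int:
    """Sum of absolute differences between two histograms."""
    return sum(abs(p - q) for p, q in zip(h1, h2))


def match_object(new: TrackedObject, table: List[TrackedObject],
                 big_overlap: float = 0.7,            # assumed value
                 center_threshold: float = 20.0       # assumed value, in pixels
                 ) -> Optional[TrackedObject]:
    """Return the table entry matched to `new`, or None if it is a new object."""
    overlapping = [(overlap_ratio(new.box, old.box), old) for old in table]
    overlapping = [(r, old) for r, old in overlapping if r > 0]
    if not overlapping:
        return None  # no overlap at all
    if len(overlapping) == 1 and overlapping[0][0] >= big_overlap:
        return overlapping[0][1]  # containment or big overlap: same object
    # Small overlap or several overlapping objects: compare the centers.
    cx, cy = center(new.box)
    close = []
    for _, old in overlapping:
        ox, oy = center(old.box)
        if ((cx - ox) ** 2 + (cy - oy) ** 2) ** 0.5 < center_threshold:
            close.append(old)
    if not close:
        return None  # every center comparison failed
    if len(close) == 1:
        return close[0]
    # Final gate: the candidate with the smallest histogram SAD wins.
    return min(close, key=lambda old: sad(new.histogram, old.histogram))
```

A caller would update the matched entry (coordinates, life time, histogram) when a `TrackedObject` is returned and append a new entry when the result is `None`.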
Chapter 3 Human and Non-Human Detection
In this chapter, we explain how the human and non-human detection process works. Because of the requirement for low computing power, we choose a shape-based human model and classify humans by codebook matching, at some cost in the performance of distinguishing humans from other objects. People walking indoors are sometimes partly hidden by furniture such as desks or chairs. To solve this kind of problem, we provide Deformable Codebook Matching, a human detection algorithm for the first (upper) half of the body at different height/width ratios. With Deformable Codebook Matching, the system still works when the bottom half of someone's body is covered. Further, we use Deformable Codebook Matching to implement human detection for multiple people walking side by side. Figure 3-1 is the flow chart of the human and non-human detection process.
Figure 3-1 : The flow chart of human detection.
3.1 Codebook Classification
The ultimate goal of our system is to identify people and track individuals to find out what they are doing. At present, however, the most we can do is classify objects as human or non-human. The algorithm we use is presented below. For human recognition, we use a codebook to distinguish humans from other objects. First, we normalize the size of the human in any pose to 20 pixels horizontally by 40 pixels vertically, and then extract the shape and the histogram of the object as the feature code used to construct the codebook. Second, we match this feature vector against the code vectors in the codebook. The purpose of the matching process is to find the code vector in the codebook with the minimum distortion to the feature vector of the object. If the minimum distortion is less than a threshold, we consider the object human.
To describe how we use the codebook to classify humans from other objects, some variables must be defined first. We extract a series of features from every normalized image as the feature word $X$, and the codebook $C$ contains $N$ code words $V_j$ with the same number of elements as $X$. The distortion between a feature word and a code word, $Dis(X, V_j)$, is defined in Equation 3-1 as a sum over the elements of the difference between $V_j$ and $X$.
With these variables defined, we can explain the procedure of human detection. Every time we get a new foreground object, we normalize it to obtain an image of uniform size. After normalization, we extract the feature word $X$ from the new object; the way we extract $X$ is described below. The feature word $X$ is compared with every $V_j$ in the codebook ($C$). The comparison function $Dis(X, V_j)$ is given in Equation 3-1. $Dis_{min}$ is the minimum of the comparison results over the $N$ code words. If the value of $Dis_{min}$ is smaller than the threshold we defined, the object with the feature word $X$ is considered human; otherwise, it is not a human object. Figure 3-2 illustrates the comparison of $X$ with $V_j$ in the codebook.
Figure 3-2 : The procedure of the comparison with the codebook.
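The matching rule above (take the minimum distortion over the codebook and compare it with a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-element distortion is taken to be the absolute difference, which the text does not confirm, and the names `distortion` and `classify` are hypothetical.

```python
from typing import Sequence, Tuple


def distortion(x: Sequence[float], v: Sequence[float]) -> float:
    """Distortion between feature word x and code word v.

    Assumption: sum of absolute element differences. Equation 3-1 of the
    text may use a different per-element measure.
    """
    if len(x) != len(v):
        raise ValueError("feature word and code word must have the same length")
    return float(sum(abs(a - b) for a, b in zip(x, v)))


def classify(x: Sequence[float],
             codebook: Sequence[Sequence[float]],
             threshold: float) -> Tuple[bool, float]:
    """Return (is_human, dis_min) for feature word x."""
    dis_min = min(distortion(x, v) for v in codebook)
    return dis_min < threshold, dis_min
```

For example, with a toy codebook `[[0, 0, 0], [10, 10, 10]]` and a threshold of 5, the feature word `[1, 0, 1]` has a minimum distortion of 2 and is accepted, while `[5, 5, 5]` has a minimum distortion of 15 and is rejected.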
The feature word $X$ is extracted as follows. After normalizing the object image to 20 pixels by 40 pixels, we use a vector with twenty elements to describe the shape of the foreground object; a histogram vector of ten elements, obtained by projecting the object image onto the X axis, is also used to increase the accuracy. To extract the shape features of the foreground object, we draw a horizontal line at a fixed Y coordinate on the normalized image. Both the leftmost and the rightmost intersections of the horizontal line with the boundary of the object are recorded to represent the shape information. The shape features of the feature word are obtained by drawing ten horizontal lines. The X coordinates of the twenty intersections form the part of the feature word that represents the shape information. Figure 3-3 shows the feature word extraction in the image (white points).
Figure 3-3 : Feature word extraction: shape information.
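The shape part of the feature word can be sketched as below. The sketch assumes the normalized object is available as a binary mask (a list of rows, nonzero meaning foreground), that the features are ordered row by row from top to bottom as (leftmost, rightmost) pairs, and that a row without any foreground pixel is encoded as -1; none of these details are stated in the text, and the function name `shape_features` is hypothetical.

```python
from typing import List, Sequence


def shape_features(mask: Sequence[Sequence[int]], rows: Sequence[int]) -> List[int]:
    """For each Y coordinate in `rows`, record the X coordinates of the
    leftmost and rightmost foreground pixels on that horizontal line."""
    features: List[int] = []
    for y in rows:
        xs = [x for x, value in enumerate(mask[y]) if value]
        if xs:
            features.extend([xs[0], xs[-1]])
        else:
            # Assumption: an empty row is encoded as (-1, -1).
            features.extend([-1, -1])
    return features
```

With ten row coordinates on a 20 by 40 mask, this yields the twenty shape elements described above.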
There is something we need to note. The top two and bottom two rows of pixels are not suitable for the feature word because these pixels are changeable. To find the ten fixed Y-axis values, we calculate the standard deviation at each fixed Y coordinate over a total of four thousand training samples, and then choose the ten lowest values on each side to determine the Y coordinates of the ten horizontal lines.
After the shape information extraction described above, we have a feature word with twenty elements. The way to obtain the histogram of projection information is described below; Figure 3-4 (b) shows these final ten dimensions of the feature word.
We project the mask image onto the X axis and count the pixels to build a histogram of the projection on the X axis. We take ten values of the histogram for the feature word. If we used only the discrete shape information, some hollow objects might not be classified correctly. The histogram of projection information can eliminate many non-human objects and prevent false alarms. The extraction of the histogram of projection information is illustrated in Figure 3-4 (b).
(a)Object. (b)Histogram of projection
Figure 3-4 : Feature word extraction: histogram of projection information.
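The projection histogram can be sketched as follows. The text does not say how the 20 columns of the normalized image are reduced to ten values, so merging adjacent columns into equal-width bins is an assumption of this example, as is the function name `projection_histogram`.

```python
from typing import List, Sequence


def projection_histogram(mask: Sequence[Sequence[int]], bins: int = 10) -> List[int]:
    """Count foreground pixels per column, then merge adjacent columns
    into `bins` equal-width groups (assumed reduction to ten values)."""
    width = len(mask[0])
    if width % bins != 0:
        raise ValueError("mask width must be a multiple of the number of bins")
    column_counts = [sum(1 for row in mask if row[x]) for x in range(width)]
    step = width // bins
    return [sum(column_counts[i * step:(i + 1) * step]) for i in range(bins)]
```

A hollow object and a solid object with the same outline give the same shape features but different histogram values, which is why this part helps reject non-human objects.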
Having described our procedure for human detection, we illustrate how to build a codebook for the distortion measurement in the next section. Once the codebook for classification is built, the further step is to overcome some problems encountered in practice in order to improve the performance of the system. This is discussed in the section after the establishment of the codebook.
3.2 Training Algorithm
The design of the codebook is critical for the classification. The well-known partial distortion theorem for codebook design states that, for an optimal quantizer with sufficiently large N, each partition region makes an equal contribution to the distortion [16], [17]. Based on this theorem, we use the distortion sensitive competitive learning (DSCL) algorithm to design the codebook. To describe this algorithm, we define $V = \{V_j;\ j = 1, 2, \ldots, N\}$ as the codebook, where $V_j$ is the $j$-th code vector. $X_t$ is the $t$-th training vector and $L$ is the number of training vectors. $D_i$ is the partial distortion of region $R_i$, and $D$ is the average distortion of the codebook. The DSCL algorithm is described as follows.
3.2.1 Pre-classify (K-Means)
The first step of the training algorithm is to initialize a set of code words in the codebook, and we select K-Means to do this job. We use the K-Means algorithm to build $N$ prototypes from $M_t$ training samples as the first step of the training algorithm. A brief description of the K-Means algorithm is presented below.
The main purpose of the K-Means algorithm is to cluster all the samples, based on their attributes, into k partitions. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data. It assumes that the object attributes form a vector space. The objective is to place the centers so as to minimize the total intra-cluster variance, that is, the squared error function
$$J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2,$$
where there are k clusters $S_i$, $i = 1, 2, \ldots, k$, and $\mu_i$ is the centroid, or mean point, of all the points $x_j \in S_i$.
We use Lloyd's algorithm, the most common form of K-Means, which uses an iterative refinement heuristic. First, the input points are partitioned into k initial sets, either at random or using some heuristic. The algorithm then alternates two steps: it calculates the mean point, or center, of each set, and it constructs a new partition by associating each point with the closest center. These two steps are repeated until convergence, which is reached when the points no longer switch clusters (or, alternatively, when the centroids no longer change). Lloyd's algorithm and k-means are often used synonymously, but in reality Lloyd's algorithm is a heuristic for solving the k-means problem: with certain combinations of starting points and centroids, it can converge to a suboptimal answer, that is, a different and better solution to the minimization above exists.
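A compact sketch of Lloyd's algorithm is given below to make the two alternating steps concrete. It is a generic illustration, not the training code of this work: the initial centers are drawn at random from the samples (one of the options mentioned above), and the names `lloyd_kmeans`, `max_iter` and `seed` are choices made for the example.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(points: Sequence[Vector]) -> List[float]:
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def lloyd_kmeans(samples: Sequence[Vector], k: int,
                 max_iter: int = 100, seed: int = 0
                 ) -> Tuple[List[List[float]], List[int]]:
    """Cluster `samples` into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct samples as starting centers.
    centers = [list(p) for p in rng.sample(list(samples), k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Partition step: associate each point with its closest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(x, centers[j]))
            for x in samples
        ]
        if new_assignment == assignment:
            break  # converged: no point switched clusters
        assignment = new_assignment
        # Mean step: move each center to the mean of its members.
        for j in range(k):
            members = [x for x, a in zip(samples, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignment
```

In the setting of this section, `samples` would be the training feature words and `k` would be $N$, and the returned centers would serve as the initial code words for the DSCL training.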
3.2.2 Training Algorithm
We follow the DSCL algorithm in [17] to build the codebook for the codebook matching algorithm, and the steps of the training algorithm are listed below.
● Step 1 :
Initialization I. Set $V(0) = \{V_j(0);\ j = 1, 2, \ldots, N\}$ with the K-means algorithm.
● Step 2 :
Initialization II. Set $t = 0$.
● Step 3 :
Compute the distortion of each code word: $Dis_j = \lVert X_t - W_j(t) \rVert$.
● Step 4 :
Select the winner: the $k$-th code word.
Adjust the code word for the winner.
Otherwise, go to Step 8.
● Step 8 :
3.3 Deformable Codebook Matching
There are several problems for human detection in indoor environments, such as lighting changes or hidden foreground objects. When people walk indoors, it is common for the bottom half of the body to be covered by a background object. Some examples of this situation are shown in Figure 3-5. The camera is mounted on the ceiling and captures the video sequence at an angle of depression. From this angle, we can see that the bottom half of the body of a person in the scene is blocked by a table or board.
Figure 3-5 : Indoor video.
Figure 3-6 : Outdoor video.
The same situation arises in outdoor environments, as shown in Figure 3-6. In these situations, most human recognition systems that rely on full body features fail. We therefore propose the “Deformable Codebook Matching (DCBM)” algorithm, first to attack the problem of humans appearing with only half of the body visible, as shown in Figure 3-5 and Figure 3-6. Furthermore, we use the deformable codebook matching algorithm for multiple-occlusive human detection. This part is presented after the description of the DCBM algorithm.
Figure 3-7 : The flow chart of deformable codebook matching.
Figure 3-7 is the flow chart of the deformable codebook matching algorithm. The ratio filter tells us which detection algorithm to execute: full body matching, first half body matching, or multiple human matching. The result buffer receives the result from the object tracking table to correct the temporary result.
3.3.1 Full Body Matching
As discussed in the previous section, we use the height and the width of the object to decide which matching algorithm to use. Table 2 shows this process, where $R_{H/W}$ is the ratio between the height and the width of the object.
Table 2 : H/W ratio selecting table.
Matching algorithm          H/W ratio range
Full Body Matching          1.5 ≤ $R_{H/W}$ < 2.5
Half Body Matching I        1.2 ≤ $R_{H/W}$ < 1.5
Half Body Matching II       0.9 ≤ $R_{H/W}$ < 1.2
Multiple People Detection   0.65 ≤ $R_{H/W}$ < 0.9
Full body matching is the default matching algorithm. We use all the information in the feature word to distinguish humans from other objects. The distortion $Dis$ between the feature word and a code word in the codebook is defined in Equation 3-1, and $Dis_{min}$, defined in Equation 3-2, is the minimum of the distortion over the $N$ code words. If the value of $Dis_{min}$ is smaller than the threshold we set, the object is human; otherwise, it is not human. This part was already explained in Section 3.1.
The threshold of this matching algorithm is determined from test results. After the training procedure, we use another two hundred testing samples to test the codebook we built and to find a suitable threshold for the codebook. The testing result is shown in Figure 3-8. We take the value at the intersection as the threshold. Note that the feature word contains two parts, shape information and histogram information. Accordingly, there are two thresholds, one for the shape part of the feature word and one for the histogram part.
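The selection rule of Table 2 amounts to a range lookup on $R_{H/W}$, which can be sketched as follows. The function name `select_matching` and the returned labels are hypothetical; returning `None` for ratios outside every range is also an assumption, since the text does not say how such objects are handled.

```python
from typing import Optional


def select_matching(height: float, width: float) -> Optional[str]:
    """Pick the matching algorithm from the H/W ratio ranges of Table 2."""
    if width <= 0:
        raise ValueError("width must be positive")
    ratio = height / width
    if 1.5 <= ratio < 2.5:
        return "full_body"
    if 1.2 <= ratio < 1.5:
        return "half_body_1"
    if 0.9 <= ratio < 1.2:
        return "half_body_2"
    if 0.65 <= ratio < 0.9:
        return "multiple_people"
    return None  # assumption: ratios outside Table 2 are not matched
```

For instance, an object 40 pixels high and 20 pixels wide (ratio 2.0) is sent to full body matching, while one that is 15 by 20 (ratio 0.75) is treated as a candidate for multiple people detection.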
[Plot: Accurate (0.0 to 1.0) versus Threshold Value (0 to 60); curves: Human Object, Non-Human Object.]
Figure 3-8 : Threshold finding result.
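One way to read the threshold search of Figure 3-8 is sketched below. It assumes that the two curves are the fraction of human test samples accepted and the fraction of non-human test samples rejected at each candidate threshold, and that "the intersection" is the candidate where the two fractions are closest; this reading, the function names and the toy distortion values in the usage note are assumptions, not data from the experiments.

```python
from typing import Sequence, Tuple


def accuracy_at(threshold: float,
                human_dis: Sequence[float],
                nonhuman_dis: Sequence[float]) -> Tuple[float, float]:
    """Accuracy on human samples (Dis_min < threshold counts as correct) and
    on non-human samples (Dis_min >= threshold counts as correct)."""
    human_acc = sum(1 for d in human_dis if d < threshold) / len(human_dis)
    nonhuman_acc = sum(1 for d in nonhuman_dis if d >= threshold) / len(nonhuman_dis)
    return human_acc, nonhuman_acc


def intersection_threshold(thresholds: Sequence[float],
                           human_dis: Sequence[float],
                           nonhuman_dis: Sequence[float]) -> float:
    """Return the candidate threshold where the two accuracy curves are closest."""
    def gap(t: float) -> float:
        human_acc, nonhuman_acc = accuracy_at(t, human_dis, nonhuman_dis)
        return abs(human_acc - nonhuman_acc)
    return min(thresholds, key=gap)
```

With made-up minimum distortions `[5, 8, 10, 12]` for human samples and `[20, 25, 30, 40]` for non-human samples, scanning thresholds from 0 to 60 in steps of 5 selects 15, the first value at which both accuracies coincide. The same search would be run once for the shape part and once for the histogram part to obtain the two thresholds.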
3.3.2 First Half Body Matching
First half body matching is the most important part of deformable codebook matching. This matching algorithm is proposed to handle foreground objects that are covered by background objects, as in the cases of Figure 3-5 and Figure 3-6. It uses the limited information available to classify the moving object. The main idea is very simple: we use the H/W ratio $R_{H/W}$ to decide how many dimensions of the data to use for classification. Table 2 shows how we select the matching algorithm by looking up $R_{H/W}$. There are two levels of first half body matching. Strictly speaking, we should use a linear function to calculate the percentage of the data we need, but to reduce the computing power and simplify the system, we use only two stages to approximate the overall situation.
(a) People with only half body. (b) People with full body.
Figure 3-9 : Comparison with full and half body.
When people appear in the video with only the first half of the body, as we can see in Figure 3-9, the object contains somewhat less information about the human body than a full body human object. If we want to maintain the detection rate without modifying the codebook we have already built, the best and simplest way is to use the information that is not lost when the object is covered, or the information that is not affected by this situation.
The information that is not lost when the human object is covered by something is the upper half body shape-based features. By this definition, they are the first ten or eighteen shape features of the shape-based part of the feature word, as shown in Table 3. In the ordinary full body codebook matching algorithm, the shape-based information is twenty-dimensional data. As $R_{H/W}$ decreases, we also decrease the number of shape-based features. The size of the normalized object also needs to be changed for the matching procedure.
Table 3 : Half body matching table.
Note that the feature word contains two parts, information from the shape and information from the histogram. As mentioned before, the information we need for half body matching is the set of features that are not affected when the object is covered. When we obtain the histogram part of the feature word, we also normalize it into a fixed range, so even if the bottom part of the human is covered, the histogram of the human does not change much. The threshold for this part is also modified as required. The information from the histogram can therefore be treated as an invariant feature when the object is covered.
After recombining these two parts of the feature word, the only thing left to do is to compare the feature word with the codebook to obtain the distortion, as discussed in Section 3.1. This requires only a small modification of Equation 3-1. Equation 3-4 is the result of this modification: the sum is taken only over the features in $S$ and $H$, where $S$ is the set of shape-based features needed at the given level of the half body matching algorithm and $H$ is the set of all the features from the histogram.
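The restricted distortion can be sketched as below. Several points are assumptions of the example rather than statements from the text: the feature word is laid out as twenty shape elements followed by ten histogram elements, the per-element distortion is the absolute difference, Half Body Matching I keeps the first eighteen shape features and Half Body Matching II keeps the first ten, and the shape and histogram parts are returned separately so that each can be compared with its own threshold. The names `SHAPE_FEATURES_USED` and `partial_distortion` are hypothetical.

```python
from typing import Sequence, Tuple

SHAPE_DIM = 20  # shape elements at the start of the feature word
HIST_DIM = 10   # histogram elements that follow them

# Assumed number of shape features kept at each matching level.
SHAPE_FEATURES_USED = {"full_body": 20, "half_body_1": 18, "half_body_2": 10}


def partial_distortion(x: Sequence[float], v: Sequence[float],
                       n_shape: int) -> Tuple[float, float]:
    """Distortion over the first n_shape shape features (set S) and over
    all histogram features (set H), returned as (shape, histogram)."""
    if not 0 <= n_shape <= SHAPE_DIM:
        raise ValueError("n_shape must be between 0 and SHAPE_DIM")
    shape = sum(abs(x[i] - v[i]) for i in range(n_shape))
    hist = sum(abs(x[i] - v[i]) for i in range(SHAPE_DIM, SHAPE_DIM + HIST_DIM))
    return shape, hist
```

Because only the summation range changes, the codebook trained on full body samples is reused as it is; only the thresholds have to be adapted to the smaller number of terms.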
3.3.3 Multiple Occlusive Human Detection
Besides detecting a human whose bottom half is covered, DCBM can also be used to solve some more complex problems. For example, we can use DCBM for human detection in the multiple-occlusive case. We find that these two problems have much in common. First, the major problem of multiple-human detection is that some occluded parts cannot be seen when people walk side by side, as shown in Figure 3-10. Second, when we take a close look at the non-occluded parts of the two humans, we can see that the upper halves of their bodies are not covered. Given these two common points, we consider that the deformable codebook matching algorithm can be used to detect multiple-occlusive humans.
Figure 3-10 : Two persons walk side by side.
Figure 3-11 : Flow chart of multiple-occlusive human detection.
Figure 3-11 is the flow chart of this procedure. The first step is to select the objects that are probably occluded groups of two or more humans. This step consists of two parts: one is the size and ratio filter, and the other is the information from the histogram. We use the size and ratio filter to select objects of appropriate size whose H/W ratio is between 0.65 and 0.9, as given in Table 2. And we use