Chapter 3 Human and Non-Human Detection
3.3 Deformable Codebook Matching
3.3.3 Multiple Occlusive Human Detection
DCBM can also be applied to more complex problems than detecting a human whose lower body is covered. For example, it can be used to detect humans who occlude one another. The two problems have much in common. First, the main difficulty in multiple-human detection is that part of each person cannot be seen when people walk side by side, as shown in Figure 3-10. Second, a close look at the non-occluded parts of the two people shows that their upper bodies remain visible. Given these two observations, we use the deformable codebook matching algorithm to detect multiple mutually occluding humans.
Figure 3-10 : Two persons walk side by side.
Figure 3-11 : Flow chart of multiple-occlusive human detection.
Figure 3-11 shows the flow chart of this procedure. The first step selects objects that are likely to be two or more humans occluding one another. This step has two parts: a size and ratio filter, and an analysis of the projection histogram. The size and ratio filter keeps objects of appropriate size whose H/W ratio is between 0.65 and 0.9, the values given in Table 2. The histogram is then used to check whether the moving object contains two or more joined blobs. Finally, the histogram is used to separate these blobs, which are fed into the DCBM algorithm to recognize what they are.
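The size and ratio filter described above can be sketched as follows. This is a minimal sketch, not the thesis implementation: the function name `is_group_candidate` and the minimum-area value are hypothetical, and only the H/W range of 0.65 to 0.9 is taken from the text.

```python
def is_group_candidate(width, height, min_area=400, ratio_range=(0.65, 0.9)):
    """Return True if a blob's bounding box passes the size and ratio filter.

    width, height : bounding-box size in pixels.
    min_area      : assumed placeholder for the size test (not from the thesis).
    ratio_range   : accepted H/W range quoted in the text.
    """
    if width <= 0 or height <= 0:
        return False
    if width * height < min_area:
        return False  # too small to be a group of people
    ratio = height / width
    low, high = ratio_range
    return low <= ratio <= high
```

Only blobs that pass this test would go on to the histogram analysis; all others would be classified directly.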
Figure 3-12 : Projection histogram of Figure 3-10 (axes: Column Number, Number of Pixels).
Figure 3-13 : Projection histogram checking and separating.
Figure 3-12 is the projection histogram of the moving object in Figure 3-10. We can easily see two major blobs joined together, and the height/width ratio is within the accepted range. But how does our system know that two blobs are joined together? We use the first-order and second-order derivatives of the histogram to verify that its shape resembles a camel's humps. According to Fermat's theorem, let $f : (a, b) \to \mathbb{R}$ be a continuous function and suppose that $x_0 \in (a, b)$ is a local extremum of $f$. If $f$ is differentiable at $x_0$, then $f'(x_0) = 0$. We can therefore find three extrema of the histogram in Figure 3-13, excluding the beginning and ending points. The “second derivative test” tells us that if the function is twice differentiable in a neighborhood of a stationary point, the sign of the second derivative indicates whether the curve opens upward or downward there, that is, whether the point is a valley or a peak of the camel's humps. Using the first and second derivatives of the histogram, we can easily decide whether its shape is correct. This procedure distinguishes Figure 3-13 from Figure 3-14, which is the histogram of the waving shadow of a tree. Using the locations of the stationary points, we separate the connected blobs by cutting at the local minimum.
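The histogram check can be illustrated with a short discrete version of the derivative tests. This is a sketch under stated assumptions rather than the thesis code: the function names are hypothetical, finite differences stand in for the derivatives, and the accepted shape is taken to be exactly two peaks with one valley between them (the three extrema mentioned above).

```python
import numpy as np

def projection_histogram(mask):
    """Count foreground pixels in each column of a binary mask."""
    return (np.asarray(mask) > 0).sum(axis=0)

def find_extrema(hist):
    """Return (maxima, minima) indices of interior stationary points.

    A sign change of the first difference marks a stationary point;
    the sign of the change (a discrete second derivative) separates
    peaks (negative) from valleys (positive).
    """
    h = np.asarray(hist, dtype=float)
    slope = np.sign(np.diff(h))
    # Carry the last non-zero slope across flat runs so plateaus count once.
    for i in range(1, len(slope)):
        if slope[i] == 0:
            slope[i] = slope[i - 1]
    maxima, minima = [], []
    for i in range(1, len(h) - 1):
        left, right = slope[i - 1], slope[i]
        if left * right < 0:              # first-derivative test
            if right - left < 0:          # second-derivative test: peak
                maxima.append(i)
            else:                         # valley
                minima.append(i)
    return maxima, minima

def split_column(hist):
    """Return the column of the valley between two humps, or None."""
    maxima, minima = find_extrema(hist)
    if len(maxima) == 2 and len(minima) == 1 and maxima[0] < minima[0] < maxima[1]:
        return minima[0]
    return None

two_people = [0, 2, 5, 9, 12, 9, 5, 3, 5, 9, 13, 9, 5, 2, 0]
one_person = [0, 3, 7, 10, 7, 3, 0]
print(split_column(two_people), split_column(one_person))
```

A real histogram would probably need smoothing before this test, since small ripples create extra extrema; an irregular histogram such as the tree shadow in Figure 3-14 would then be rejected because it does not have the two-peak, one-valley pattern.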
Figure 3-14 : Histogram analysis of a tree waved shadow.
Figure 3-15 : Result of multiple-occlusive human detection.
The vertical red line inside the red block in Figure 3-15 is the result of the multiple-human separating algorithm. As we can see, the vertical red line splits the two blobs into individual ones. Finally, we cut out each separated blob, take its first (upper) half body, and classify it with the DCBM algorithm. This is how the Multiple Occlusive Human Detection algorithm works.
Figure 3-16 : Histogram analysis of three occlusive human
Chapter 4 Experimental Results
In this chapter, we show the experimental results of our human detection system. Chapter 2 discussed the importance of the threshold used in temporal differencing and binarization and presented a function for determining the optimal threshold; Section 4.1 shows the simulation results of this method. Section 4.2 presents the results of the noise elimination filter and compares them with the conventional approach. Section 4.3 presents the main simulation results of human detection with the deformable codebook matching (DCBM) algorithm, including single-human, multiple-human, and half-body cases. Sections 4.4 and 4.5 summarize the testing environments and the accuracy of the simulation results. Finally, Section 4.6 discusses the results.
4.1 Simulation Result of Optimal Threshold Finding Algorithm
Recall from Chapter 2 that our threshold is based on the background gray value: dark parts of the background frame get a lower threshold, while bright parts get a higher one. The threshold is adjusted according to the average and standard deviation of the current frame and the background model. To avoid heavy computation, this threshold selection is performed only during the background update process. Figure 4-1 plots an example of the threshold adjustment. The middle line is $TH_0$, the basic part of our threshold; the upper and lower lines are the threshold modified by the parameter β. The two turning points in Figure 4-1 are determined by the standard deviation and the average value of the current frame and the background model.
Figure 4-1 : The plot of threshold adjustment (axes: Background Pixel Gray Value, Threshold).
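The exact threshold function is defined in Chapter 2 and is not reproduced in this section. The sketch below only illustrates the qualitative behavior described here: a base value $TH_0$, lowered by β for dark background pixels and raised by β for bright ones, with two turning points derived from the mean and standard deviation. The three-level form, the turning points at mean ± one standard deviation, the use of background statistics only, and all default values are assumptions made for illustration.

```python
import numpy as np

def adaptive_threshold(background, th0=20.0, beta=10.0):
    """Per-pixel threshold: lower for dark background, higher for bright.

    th0 and beta are placeholder values, and the turning points
    (mean - std, mean + std) are an assumption, not the thesis formula.
    """
    bg = np.asarray(background, dtype=float)
    mean, std = bg.mean(), bg.std()
    low_turn, high_turn = mean - std, mean + std
    th = np.full(bg.shape, th0)
    th[bg < low_turn] = th0 - beta    # dark region: more sensitive
    th[bg > high_turn] = th0 + beta   # bright region: less sensitive
    return th

def foreground_mask(frame, background, th0=20.0, beta=10.0):
    """Binarize the absolute frame/background difference with the threshold above."""
    diff = np.abs(np.asarray(frame, dtype=float) - np.asarray(background, dtype=float))
    return diff > adaptive_threshold(background, th0, beta)
```

With such a threshold, the same gray-level change is detected in a dark region but ignored in a bright one, which matches the higher sensitivity in dark areas discussed for Figure 4-2.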
Figure 4-2(a) is a frame from one of our test videos. The scene is our laboratory with the lights completely off; the only light source is the infrared illumination from the camera itself. This frame shows poor luminance and bad conditions in an indoor environment. Figure 4-2(b) is the result with a fixed threshold: the lower half of the person's body cannot be detected. Figure 4-2(c) is the result with the conventional 3σ threshold rule, which several papers report to perform well. It is slightly better than Figure 4-2(b) but still not as good as our threshold. With our threshold adjustment function, Figure 4-2(d) shows not only the lower half of the body but also the shoulders and head more clearly. This shows that the higher-sensitivity part of our threshold adjustment function works.
(a) Current frame. (b) With fixed threshold.
(c) With regular 3σ threshold. (d) With our threshold.
Figure 4-2 : Example of utilization of the threshold adjustment function.
(a) Frame n (b) Frame n+1
Figure 4-3 : Scene of Figure 4-4.
Figure 4-3 shows another test video, taken in an office environment with the camera facing the light. When an object comes close to the camera, the camera's diaphragm (aperture) changes quickly, which causes a lot of noise during background subtraction and binarization. As Figure 4-4(a) shows, using a fixed threshold for the difference detects a lot of noise. With the conventional 3σ threshold, σ of the difference values becomes large, so the foreground object cannot be extracted.
(a) With fixed threshold.
(b) With regular 3σ threshold. (c) With our threshold.
Figure 4-4 : Example of utilizing three kinds of threshold.
The result with our threshold adjustment is shown in Figure 4-4(c); most of the noise in the image difference is eliminated. After this step, an erosion operator or the noise elimination filter can remove the remaining noise.
With our adjusted threshold, objects are segmented much more clearly than with the conventional 3σ threshold, and the result is more robust under heavy noise.
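The two baseline thresholds compared in Figure 4-4 can be sketched as follows, to show why each one fails in the backlit scene. This is an illustrative sketch: the function names and the fixed value 20 are hypothetical, and σ is assumed to be the standard deviation of the whole difference image, which is one common reading of the 3σ rule and may differ from the variant used in the cited papers.

```python
import numpy as np

def fixed_threshold_mask(frame, background, threshold=20.0):
    """Foreground where the absolute difference exceeds a fixed value."""
    diff = np.abs(np.asarray(frame, dtype=float) - np.asarray(background, dtype=float))
    return diff > threshold

def three_sigma_mask(frame, background):
    """Foreground where the absolute difference exceeds 3 sigma of the difference image."""
    diff = np.abs(np.asarray(frame, dtype=float) - np.asarray(background, dtype=float))
    return diff > 3.0 * diff.std()

# Synthetic 10x10 difference image: 46 noisy pixels (difference 30) caused by
# an exposure change, 4 object pixels (difference 40), and 50 unchanged pixels.
diff = np.zeros(100)
diff[:46] = 30.0
diff[46:50] = 40.0
frame = diff.reshape(10, 10)
background = np.zeros((10, 10))

print(fixed_threshold_mask(frame, background).sum())  # noise and object both pass
print(three_sigma_mask(frame, background).sum())      # large sigma hides the object
```

In this synthetic case the fixed threshold marks every noisy pixel as foreground, while the noise inflates σ so much that the 3σ threshold rejects the object as well.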
4.2 Simulation Result of Noise Elimination Filter
When we introduced our noise elimination filter, we mentioned that the key to the process is to postpone filtering until background subtraction is finished. Instead of more complex filters such as the median filter or the Gaussian filter, we use only a mean filter and obtain better results. We now demonstrate the result of applying the filter at this point in the pipeline.
(a) Original frame. (b) Regular Gaussian filter.
(c) Regular mean filter. (d) Our mean filter.
Figure 4-5 : Examples of noise elimination filter.
Figure 4-5 compares commonly used noise elimination filters. Figure 4-5(a) is the current frame of the video sequence: a pavement outside a building with trees waving next to the path. Figure 4-5(b) is the result of the conventional approach, a five-by-five Gaussian filter with σ = 1 applied before background subtraction. Figure 4-5(c) is the result of a five-by-five mean filter, also applied before background subtraction in the conventional way. Finally, Figure 4-5(d) is the result of a five-by-five mean filter applied after background subtraction. In Figure 4-5, the algorithm we present is clearly the most effective at eliminating the waving trees. The human object is not much affected by the filter and retains its original shape.
(a) Original frame. (b) Regular Gaussian filter.
(c) Regular mean filter. (d) Our mean filter.
Figure 4-6 : Other examples of noise elimination filter.
Figure 4-6 is another comparison of commonly used noise elimination filters. Figure 4-6(a) is the current frame of the video sequence: an indoor office with strong lighting changes. Figure 4-6(b) is the result of a five-by-five Gaussian filter with σ = 1 applied before background subtraction, in the conventional way. Figure 4-6(c) is the result of a five-by-five mean filter, also applied before background subtraction. Figure 4-6(d) is the result of a five-by-five mean filter applied after background subtraction.
Figure 4-5 and Figure 4-6 show that the algorithm we present handles not only the effect of background objects but also lighting changes.
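The idea of filtering after background subtraction can be sketched as below. This section does not state which image the mean filter is applied to, so the sketch assumes it is applied to the binarized difference image and re-thresholded; the function names, the threshold of 20, and the 0.5 keep ratio are all hypothetical.

```python
import numpy as np

def mean_filter(image, size=5):
    """size x size mean (box) filter with edge replication at the borders."""
    img = np.asarray(image, dtype=float)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    rows, cols = img.shape
    # Sum the size*size shifted copies of the image, then divide.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (size * size)

def subtract_then_filter(frame, background, threshold=20.0, keep_ratio=0.5):
    """Background subtraction first, mean filter afterwards (assumed order of steps)."""
    diff = np.abs(np.asarray(frame, dtype=float) - np.asarray(background, dtype=float))
    mask = (diff > threshold).astype(float)
    # A pixel survives only if most of its 5x5 neighborhood is also foreground.
    return mean_filter(mask) > keep_ratio
```

Under this assumption, small scattered responses such as waving leaves are averaged away because their neighborhoods are mostly background, while a large compact blob such as a person keeps its interior.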
4.3 Simulation Result of Deformable Codebook Matching
In this section we show the results of our human and non-human detection system. First, we explain the display layout of the system output.
Figure 4-7 : Snap shot of the output of our System.
Figure 4-7 is a snapshot of the output of our DSP system. There are two rectangular blocks in this frame. The small block in the frame marks the location of the human detected by the system, and the upper right red block shows the statistics of the results over a period of time. Because the display of our DSP system cannot show much information in the current frame, some results are shown on the PC. Figure 4-8 shows this kind of result: blue blocks mark non-human objects in the current frame, and red blocks mark the human objects we detected.
Figure 4-8 : Snap shot of the output of our system on PC.
4.3.1 Human and Non-Human Detection
First, we show the results in an indoor environment. Figure 4-9 shows the detection of humans seen from the front and from the side in a normal indoor environment.
(a) Front side of human detection (b) Lateral side of human detection.
Figure 4-9 : Results of normal indoor environment.
Figure 4-10 shows front and lateral human detection by our system in a normal outdoor environment. Figure 4-10(a) shows a human detected from the front, and Figure 4-10(b) shows a human detected from the side. Trees are waving at the boundary of the video sequence in Figure 4-10, but they do not affect our system.
(a) Front side of human detection (b) Lateral side of human detection.
Figure 4-10 : Result of normal outdoor environment.
Figure 4-11 and Figure 4-12 show non-human objects appearing in outdoor and indoor scenes, respectively.
(a) Result with non-human object (motorcycle).
(b) Result with non-human object (car and motorcycle).
(c) Result with non-human object (dog).
Figure 4-11 : Results of non-human object shown in the outdoor environment.
(a) Background frame. (b) Background frame.
(c) Moving chairs. (d) Automatic valve.
Figure 4-12 : Results of non-human object shown in the indoor environment.
Figure 4-13(a) shows a human carrying a bag or holding something in hand, and Figure 4-13(b) shows a human running along the path. Figure 4-14 shows multiple-human detection in outdoor and indoor scenes. Figure 4-15 shows scenes in which human and non-human objects move at the same time.
(a) People carries something. (b) People running.
Figure 4-13 : Results of complex human detection.
(a) Multiple-human detection outdoor. (b) Multiple-human detection indoor.
Figure 4-14 : Results of multiple human detection.
(a) Human and car in one frame. (b) Human and animal in one frame.
Figure 4-15 : Moving objects with human and non-human objects.
Figure 4-16 shows scenes with little or no light. In Figure 4-16(b), trees can be seen waving at the boundary.
(a) Infrared rays light source. (b) Dark and windy situation.
Figure 4-16 : Results of complex human detection.
4.3.2 First Half-Body
(a) Front. (b) Lateral.
Figure 4-17 : Results with only half-body (indoor).
(a) Covered by car. (b) Covered by car.
(c) Covered by background object. (d) Covered by background object.
Figure 4-18 : Results with only half-body (outdoor)
The figures above show how the DCBM algorithm works. Figure 4-17 simulates the legs and lower body being covered in an indoor environment; we can still detect the human passing through. Figure 4-18 shows the results in an outdoor environment, where the human in the video is likewise cut off at the waist.
4.3.3 Object Tracking Table
Figure 4-19 : Result of object tracking table.
The main purpose of the object tracking table is to prevent false alarms from the codebook classification. Its major targets are background objects. A waving tree or flag sometimes looks like a human body and may cause the system to make wrong decisions; object tracking prevents this kind of false alarm. Here we use the PC snapshot of the output because it makes the behavior easier to explain. The blue block in Figure 4-19 is a non-human object. The yellow block is a human object confirmed by the statistics, meaning that the object has been consistently detected as human over a period of time. The purple block is an object that does not move much and is detected as non-human most of the time; we discard it to prevent false alarms.
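The decision logic of the tracking table can be sketched as a per-object vote history. This is a minimal sketch, not the thesis data structure: the class name, the window length, the vote ratios, and the motion limit are all hypothetical, and the label names only mirror the block colors described above.

```python
import math
from collections import deque

class TrackedObject:
    """Recent classifier votes and positions for one moving object."""

    def __init__(self, window=10):
        self.window = window
        self.votes = deque(maxlen=window)    # True = classified as human
        self.centers = deque(maxlen=window)  # (x, y) blob centers

    def update(self, is_human, center):
        """Record the codebook decision and blob center for the current frame."""
        self.votes.append(bool(is_human))
        self.centers.append(center)

    def label(self, human_ratio=0.8, non_human_ratio=0.2, min_motion=5.0):
        """Return 'human', 'non_human', 'suppressed', or 'undecided'."""
        if len(self.votes) < self.window:
            return "undecided"               # not enough history yet
        ratio = sum(self.votes) / len(self.votes)
        (x0, y0), (x1, y1) = self.centers[0], self.centers[-1]
        motion = math.hypot(x1 - x0, y1 - y0)
        if ratio >= human_ratio:
            return "human"                   # consistently human (yellow block)
        if ratio <= non_human_ratio and motion < min_motion:
            return "suppressed"              # static, mostly non-human (purple block)
        return "non_human"                   # blue block
```

A waving tree that is occasionally classified as human would stay below the human ratio and barely move, so it would be suppressed instead of raising an alarm.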
4.3.4 Multiple Occlusive Human Detection
(a) One of results.
(b) One of results. (c) One of results.
Figure 4-20 : Results of multiple-occlusive man detection.
We have already explained how we detect two or three people walking side by side. The results are shown in Figure 4-20. As explained before, blue blocks mark non-human objects and red blocks mark human objects. The vertical line in the red block is created by the multiple-human detection algorithm. As Figure 4-20 shows, when people walk shoulder to shoulder, the proposed algorithm separates them and runs the DCBM algorithm on each for human detection.
4.4 Testing Environment
Figure 4-21 : List of testing environment.
We selected twenty-four scenes for testing, with more than thirty test videos. The list of testing environments is shown in Figure 4-21 and covers indoor, outdoor, and night scenes. The results for these scenes are reported below.
4.5 Accuracy of DCBM algorithm
Table 4 gives the test results for the twenty-four scenes listed above. The column marked Human gives the recognition results for human objects: the row marked positive means that a human object was recognized as human, and the row marked negative means that a human object was recognized as non-human. The column marked Non-Human gives the recognition results for non-human objects. The row marked “CBM Test” is the result of the algorithm with only normal codebook matching, and the “DCBM Test” row uses our deformable codebook matching algorithm to recognize the foreground objects that appear in the scene.
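The accuracy figures reported in the following tables can be understood as the share of correctly recognized objects in each column. The sketch below shows that calculation under the assumption that accuracy is simply correct / (correct + wrong); the function names are hypothetical and the counts in the example are made up for illustration, not taken from the tables.

```python
def accuracy_percent(correct, wrong):
    """Percentage of correctly recognized objects."""
    total = correct + wrong
    if total == 0:
        raise ValueError("no samples to evaluate")
    return 100.0 * correct / total

def summarize(human_positive, human_negative, nonhuman_correct, nonhuman_wrong):
    """Accuracy for the Human and Non-Human columns of one test."""
    return {
        "human": accuracy_percent(human_positive, human_negative),
        "non_human": accuracy_percent(nonhuman_correct, nonhuman_wrong),
    }

# Illustrative counts only (not results from the thesis).
print(summarize(human_positive=45, human_negative=5,
                nonhuman_correct=30, nonhuman_wrong=10))
```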
Table 4 : The accuracy of our system.
Human Non-Human
Human Non-Human
Human Non-Human
(a) Result of video with only full body human object.
Human Non-Human
Human Non-Human
(b) Result of video with half body human object.
Table 5: The statistic of the accuracy (Video with only full body human).
Human Non-Human
Table 6: The statistic of the accuracy (Video with lots half body human).
Human Non-Human
Table 7: The average of accuracy above.
Test         Average Accuracy (Human)    Average Accuracy (Non-Human)
CBM Test     77.2 %                      87.5 %
DCBM Test    93.0 %                      92.3 %
4.6 Discussion
First, Table 5 to Table 7 show that the accuracy of our system is more than ninety percent, which we consider sufficient for a warning system. Some points about Table 5, Table 6, and Table 7 need explanation. The first eighteen scenes are normal test videos, meaning that they contain no humans with only the first half body visible. The other six scenes contain many test samples in which only the human's first half body can be seen. This lets us see the advantage of our DCBM algorithm. Table 5 gives the results for the full-body videos. In Table 5, the accuracy of the DCBM algorithm is slightly lower than that of the normal codebook algorithm, but when some humans in the video are partly covered, the accuracy of the normal codebook algorithm becomes unacceptable, as Table 6 shows. Our DCBM algorithm clearly handles this case. Table 7 shows that our DCBM algorithm maintains its accuracy in the general situation.
Of course, some situations can still cause the system to fail. Figure 4-22 shows examples. Sometimes the color of a person's clothing is too close to the background, which can cause background subtraction to fail and cut the object in half, as shown in Figure 4-22(a). In addition, objects are normalized to a fixed size; if an object in the video is much smaller than this size, its features are no longer valid for the system, and the object may be classified as non-human. An example is shown in Figure 4-22(b).
(a) (b)
Figure 4-22 : Example of system failure #1.
(a) (b)
Figure 4-23 : Example of system failure #2.
Because codebook classification is shape-based, the shape of an object is the major feature for classification. The shape of a passing motorcycle resembles the shape of a walking person, so some people riding motorcycles are classified as human objects; an example is shown in Figure 4-23(a). The shape of a waving tree is unpredictable and is sometimes taken to be a human; an example is shown in Figure 4-23(b). Finally, although we have a multiple-human detection algorithm, people walking as a group, as in Figure 4-24, cannot be detected by our system.
Figure 4-24 : Example of system failure #3.
Chapter 5 Conclusion
In this thesis, a fast real-time human detection system with low computing power requirements is proposed. The first part of our human detection system segments the moving objects from the scene, using background subtraction to segment the moving blobs. We provide a simple and fast function to calculate the binarization threshold for varying environments and for videos taken by different cameras. In the second part of our system, we use simple trajectory tracking and condition judgment to provide data for the human detection algorithm and to decrease the false-alarm rate. The final part is human detection. Because of the requirement for low computing power, we choose a shape-based method with a trained codebook to classify human beings from other objects. People walking indoors are sometimes covered by furniture such as desks or chairs. To solve this kind of problem, we provide Deformable Codebook Matching, a human detection algorithm for the first half body with different height/width ratios. With Deformable Codebook Matching, the system still works when someone's bottom half body is covered. Further, we use