
4.2 Training Process for Adaboost Classifier

We extracted 150 sets of MHPH from 15 video clips of walking behavior, 100 sets of MHPH from 14 segments of squatting down behavior, 130 sets of MHPH from 10 segments of falling down behavior, 70 sets of MHPH from 5 segments of running behavior, and 120 sets of MHPH from 11 segments of other behaviors. The stages of training and identification are as follows:

Step 1: Classify 100 sets of MHPH for walking, falling down, running, and squatting down as positive images, and 50 sets of MHPH for other behaviors as negative images.

Step 2: Before Adaboost is trained, every positive image must be checked to confirm that its main features are correctly highlighted and assigned the right values. The sample-creation tool in OpenCV [20] is used to mark the featured area and create the positive samples needed for training.
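As a concrete illustration of this step, the following sketch writes the plain-text description file that OpenCV's sample-creation tool expects, listing each positive image together with the bounding box of its highlighted feature area. The file names and coordinates are hypothetical placeholders, not the actual training data.

positives = [
    # (image path, (x, y, width, height)) of the highlighted feature area;
    # these entries are illustrative placeholders.
    ("mhph/squat_001.bmp", (5, 5, 30, 30)),
    ("mhph/squat_002.bmp", (6, 4, 28, 32)),
]

with open("squat_positives.dat", "w") as f:
    for path, (x, y, w, h) in positives:
        # Format: <path> <object count> <x> <y> <width> <height>
        f.write(f"{path} 1 {x} {y} {w} {h}\n")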

Step 3: Using the highlighted positive samples from Step 2, OpenCV [20] is used to create the positive samples for every behavior: 100 sets of positive samples for walking, 100 sets for falling down, 50 sets for running, and 100 sets for squatting down.

Fig. 4.8 Positive samples created according to the MHPH of squatting down.
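A minimal sketch of the sample creation in Step 3, assuming OpenCV's opencv_createsamples utility is installed; the file names follow the sketch above, and the sample count matches the 100 squatting-down positives, but all names are illustrative.

import subprocess

# Pack the marked positive images into a .vec file of fixed-size samples.
subprocess.run([
    "opencv_createsamples",
    "-info", "squat_positives.dat",  # description file from Step 2
    "-vec", "squat_positives.vec",   # output file of packed positive samples
    "-num", "100",                   # number of positive samples to generate
    "-w", "20", "-h", "20",          # sample window size in pixels
], check=True)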

Step 4: Using OpenCV [20], the positive and negative images are trained into an effective classifier. During training, the minimum hit rate for every behavior is set to 0.995 and the maximum false alarm rate to 0.5. OpenCV [20] first selects Haar-like features according to the quantity of samples, and Adaboost then concentrates its training on the background area. This is because the idea behind a cascaded Adaboost classifier is to first use a few cheap features to eliminate the large portion of background information, and then train on the areas that are more difficult to distinguish. The training time and the number of features needed to eliminate background information therefore grow rapidly with each stage:

Behavior         Stage 1      Stage 2      Stage 3     Final stage
                 (time, features used to eliminate background information)
Walking          15 sec, 2    30 sec, 3    4 min, 7    stage 9: 17 hours, 21
Squatting down   21 sec, 3    41 sec, 4    7 min, 7    stage 14: 27 hours, 32
Falling down     13 sec, 4    37 sec, 5    2 min, 7    stage 12: 23 hours, 27
Running          21 sec, 2    33 sec, 3    4 min, 4    stage 7: 9 hours, 19

All input data used for training influences both the results after training and the efficiency of execution. The main influencing attributes are:

(1) Quantity of images: the more positive and negative images used, the longer the training. Training a smaller number of positive and negative images requires 3 hours, but training 100 sets of positive and negative images requires one working day.

(2) Ratio of positive to negative images: based on experience from previous studies, the quantity of positive images used during training is twice the quantity of negative images. A classifier trained under this condition gives better results, as the negative images help to narrow down the possible correct areas.

(3) Sample size: a larger sample size covers a broader range during training, but correspondingly consumes more time during both training and identification. Under otherwise identical conditions, a 10x10 sample size trains faster but is less accurate, while a 20x20 sample size trains longer and is more accurate.

(4) Number of stages: the number of cascade stages determines the effectiveness of the classifier, but more stages require a longer training duration.

(5) Minimum hit rate and maximum false alarm rate: these values determine the effectiveness of the classifier. When the minimum hit rate is set higher, or the maximum false alarm rate is set lower, the standard each stage must achieve is stricter and the classifier produces better results (a numerical sketch of how these per-stage values compound follows this list).

(6) BASIC or ALL mode: in BASIC mode, only the upright subset of the extended Haar-like feature set is used; ALL mode additionally includes the 45-degree rotated feature set. Training is faster in BASIC mode but yields a poorer detection rate; training takes longer in ALL mode, but the resulting classifier has a better detection rate.
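To illustrate how the per-stage settings in attributes (4) and (5) compound, note that a cascade of N stages has an overall hit rate of at least (minimum hit rate)^N and an overall false alarm rate of at most (maximum false alarm rate)^N. A minimal numerical sketch using the final stage counts reported in Step 4:

# Overall cascade bounds for each behavior's final stage count, using the
# per-stage minimum hit rate of 0.995 and maximum false alarm rate of 0.5.
stages = {"walking": 9, "squatting down": 14, "falling down": 12, "running": 7}

for behavior, n in stages.items():
    hit_rate = 0.995 ** n      # overall hit rate is at least this
    false_alarm = 0.5 ** n     # overall false alarm rate is at most this
    print(f"{behavior:14s} {n:2d} stages: hit >= {hit_rate:.3f}, false alarm <= {false_alarm:.1e}")

For example, the 14-stage squatting-down classifier still guarantees a hit rate of at least 0.995^14 ≈ 0.93 while driving the false alarm bound down to 0.5^14 ≈ 6x10^-5, which is why training is allowed to run to so many stages.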

We therefore ran the training stage several times with different attribute settings, used each resulting classifier for detection, and observed the variation in detection rate.
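For concreteness, the following sketch shows how one such training run could be launched with opencv_traincascade, the modern command-line successor of the OpenCV haartraining utility used in this work; the parameters mirror the settings described above, and every file name and count is an illustrative placeholder.

import subprocess

# Train a cascaded Adaboost classifier for one behavior.
subprocess.run([
    "opencv_traincascade",
    "-data", "walking_cascade",       # output directory for the staged classifier
    "-vec", "walking_positives.vec",  # packed positives from opencv_createsamples
    "-bg", "negatives.dat",           # list of negative (background) images
    "-numPos", "90",                  # positives consumed per stage
    "-numNeg", "50",                  # negatives consumed per stage
    "-numStages", "9",                # walking converged around stage 9 above
    "-minHitRate", "0.995",           # attribute (5): per-stage minimum hit rate
    "-maxFalseAlarmRate", "0.5",      # attribute (5): per-stage maximum false alarm
    "-w", "20", "-h", "20",           # attribute (3): sample size
    "-mode", "ALL",                   # attribute (6): include 45-degree features
], check=True)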

Step 5: Using 50 sets of MHPH derived from 5 video segments, the classifiers trained in Step 4 are used to detect the different behaviors.
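A minimal sketch of this detection step using OpenCV's Python interface; the cascade file and test image names are hypothetical placeholders.

import cv2

# Load one behavior's trained cascade and one test MHPH image.
cascade = cv2.CascadeClassifier("walking_cascade/cascade.xml")
mhph = cv2.imread("mhph/test_001.bmp", cv2.IMREAD_GRAYSCALE)

# Scan the image at multiple scales; each hit is an (x, y, w, h) rectangle
# where the walking classifier fired.
hits = cascade.detectMultiScale(mhph, scaleFactor=1.1, minNeighbors=3)
print("walking detected" if len(hits) > 0 else "walking not detected")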

Table 4.3 Results of walking detection

Classifier (images, sample size, stages)   Correct detections   Incorrect detections   Accuracy   Train duration   Detect duration
30, 10x10, 3                               31                   19                     62%        30 min           1 sec
50, 15x15, 5                               37                   13                     74%        4 hours          3 sec
100, 50x50, 9                              47                   3                      94%        1 day            5 sec

Table 4.4 Results of squatting down detection

Classifier (images, sample size, stages)   Correct detections   Incorrect detections   Accuracy   Train duration   Detect duration
—, —, 3                                    —                    —                      —          —                —
50, 15x15, 5                               34                   16                     68%        4 hours          3 sec
100, 50x50, 12                             43                   7                      86%        1 day            5 sec

Table 4.5 Results of falling down detection

Classifier (images, sample size, stages)   Correct detections   Incorrect detections   Accuracy   Train duration   Detect duration
30, 10x10, 3                               29                   21                     58%        30 min           1 sec
50, 15x15, 5                               33                   17                     66%        4 hours          3 sec
100, 50x50, 14                             39                   11                     78%        1 day            4 sec

Table 4.6 Results of running detection

Classifier (images, sample size, stages)   Correct detections   Incorrect detections   Accuracy   Train duration   Detect duration
20, 10x10, 3                               28                   22                     56%        30 min           1 sec
30, 15x15, 5                               31                   19                     62%        4 hours          3 sec
50, 50x50, 7                               36                   14                     72%        1 day            4 sec

As observed from the walking detection results in Table 4.3, when the amount of training data is small, the training duration is short but the detection accuracy is correspondingly low. The detection accuracy for squatting down and falling down is lower than that for walking. This is due to the greater variation in motion direction in their training data: the hands, legs, and body move differently in falling down and squatting down than in walking, which causes detection errors. A sufficiently large volume of training data can, however, resolve this error. We also observed that although Adaboost is limited by the accuracy of its classifiers and is most commonly used for face detection, its detection time in our experiments does not differ much from that application: it took an average of 1 to 3 seconds to detect one image. This result shows that Adaboost can be used for behavior detection. The greatest problem in behavior monitoring is the ability to accurately detect a specific behavior in real time, and with our formulation accurate real-time behavior detection is possible.
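A per-image timing figure of this kind can be reproduced with a sketch like the following, assuming a trained cascade and a test MHPH image as in Step 5 (file names are again placeholders):

import time
import cv2

cascade = cv2.CascadeClassifier("walking_cascade/cascade.xml")
mhph = cv2.imread("mhph/test_001.bmp", cv2.IMREAD_GRAYSCALE)

# Time a single detection pass over one MHPH image.
start = time.perf_counter()
cascade.detectMultiScale(mhph, scaleFactor=1.1, minNeighbors=3)
print(f"detection took {time.perf_counter() - start:.2f} seconds")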

Next, we compared our method with the method proposed by Alireza Fathi and Greg Mori [19], which uses low-level optical flow information from uncompressed video data to construct mid-level motion features. As Table 4.7 shows, the accuracy of our method for running detection is the same as that of Alireza Fathi et al. [19], and our walking detection accuracy is slightly higher. For detecting running towards the right and running towards the left, our method is superior to the method of Alireza Fathi et al.; moreover, our method can detect behaviors in compressed video data, while the method of Alireza Fathi et al. can only detect behaviors in uncompressed video data.

Table 4.7 Accuracy of our method compared with Alireza Fathi et al. [19]

Action    Our Method   Alireza Fathi et al.
Running   72%          72%
Walking   94%          92%

Table 4.8 Summary comparison of our method with Alireza Fathi et al. [19]

Compare Item       Our Method             Alireza Fathi et al.
Method             Adaboost               Multi-Adaboost
Video Type         Compressed video       Uncompressed video
Training Feature   MHPH                   Mid-level motion feature
Noise Reduction    Reduces slight noise   Does not handle noise

The problem with using a simple tool such as Adaboost to detect individual, binary behaviors, such as walking versus not walking or falling down versus not falling down, is that people may walk a certain distance before demonstrating such behaviors. We can only detect a specific behavior within a designated duration; more detailed detection beyond that is difficult. Multi-Adaboost can therefore be considered in later studies for training and detection.
