CHAPTER 4 EXPERIMENTAL RESULTS
4.2 Fuzzy Rule Construction for Action Recognition
To reduce the number of fuzzy sets, we select templates to represent a video sequence. The posture in a given activity varies only slightly between two image frames that are fewer than five frames apart in the video stream. Therefore, we selected every fifth frame as a posture template image; in our experiment this interval equals one-sixth of a second. An example is shown in Fig. 4.5: the image at time t1 was selected as template n1, and the image at time t2 was selected as the next template n2.
Fig. 4.5 Template selection with an interval of five frames.
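The sampling step can be illustrated with a minimal sketch. It assumes a 30 frames-per-second stream, which is the rate at which five frames span one-sixth of a second; the function names and the `fps` parameter are hypothetical and do not come from the original implementation.

```python
from fractions import Fraction

def select_template_indices(num_frames, step=5, start=0):
    """Return the 0-based indices of the frames kept as posture templates."""
    return list(range(start, num_frames, step))

def interval_seconds(step=5, fps=30):
    """Time between two consecutive templates.

    fps=30 is an assumption: it is the rate at which 5 frames last 1/6 second.
    """
    return Fraction(step, fps)

indices = select_template_indices(23)
print(indices)             # [0, 5, 10, 15, 20]
print(interval_seconds())  # 1/6
```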
We chose six kinds of essential templates each for “walking from right to left,” “walking from left to right” and “climbing down”; five for “climbing down”; three for “crouching”; and two for “jumping.” In total there are 28 kinds of essential templates, which we call the 28 classes. The number of essential templates for each activity depends on how long the activity takes. Each essential template is a cluster of five template images that show similar postures and come from five different training persons. Fig. 4.6 and Fig. 4.7 show some of the templates of two training models.
In Fig. 4.6 and Fig. 4.7, when a model bends down or squats, the body in the template image appears wider than in the other templates. This is because each image is resized until its height equals 128 pixels or its width equals 96 pixels. Images of a standing posture are usually resized according to their height, since their ratio of width to height is smaller than 96 to 128, and are therefore resized by the smaller scale. Conversely, when the body shape is shorter, the magnification factor becomes larger.
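The resizing rule can be sketched as follows, under the assumption that "resized until its height equals 128 pixels or its width equals 96 pixels" means a uniform scale equal to the smaller of the two candidate factors. The function names and the example bounding boxes are hypothetical and are not taken from the experiment.

```python
def resize_scale(width, height, max_width=96, max_height=128):
    """Uniform scale factor so that the image just fits in max_width x max_height."""
    return min(max_width / width, max_height / height)

def resized_size(width, height):
    """Image size (width, height) after applying the uniform scale."""
    s = resize_scale(width, height)
    return round(width * s), round(height * s)

# Hypothetical silhouette bounding boxes in pixels (width, height).
standing = (40, 160)   # tall and narrow: the height limit is reached first
crouching = (60, 70)   # short and wide: the width limit is reached first

print(resized_size(*standing))   # (32, 128)
print(resized_size(*crouching))  # (96, 112)
print(resize_scale(*crouching) > resize_scale(*standing))  # True
```

In this toy case the crouching silhouette is magnified more than the standing one, which is why the body looks wider in the resized template.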
Fig. 4.6 Some “essential templates of posture” of model A (Classes 3, 5, 8, 12, 15, 19, 21, 23 and 25).
Fig. 4.7 The “essential templates of posture” of model B corresponding to those in Fig. 4.6 (Classes 3, 5, 8, 12, 15, 19, 21, 23 and 25).
The template images were transformed to canonical space by the methods described in Chapter 2. The mean vectors and the standard deviation vectors of all templates were computed by Eq. (24). Each template image of a training model was treated as a center; hence, with five training models and 28 classes of templates, there were 140 mean vectors. In addition, there were six groups of mean vectors and standard deviation vectors, one for each of the six different sets of training models.
After the standard deviation vectors have been determined, the corresponding training video frames are input. The relationship between each image frame and each template is calculated using Eq. (27) in Section 3.4. We gathered three images into a group in order to include temporal information. The interval between consecutive images in a group is five frames, the same as in template selection. Training is performed off-line, so we gathered groups of three images from different start points to train the fuzzy rules. For example, the 1st, 6th and 11th frames are gathered as one training input; the 2nd, 7th and 12th frames as another; the 3rd, 8th and 13th frames as yet another; and so on. Different start points are used in our experiment because the starting posture of a testing video may differ from that of the training video. By using different start points, the system is able to learn many more combinations of image frames, which increases the accuracy of the fuzzy rules.
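The grouping of frames from successive start points can be sketched as below. This is an illustration only; the function name `frame_triples` and its parameters are hypothetical, and frame numbers are 1-based to match the example in the text.

```python
def frame_triples(num_frames, interval=5, group_size=3):
    """Yield 1-based frame-number groups (s, s + interval, s + 2 * interval)
    for every start point s that keeps the whole group inside the video."""
    span = interval * (group_size - 1)
    for start in range(1, num_frames - span + 1):
        yield tuple(start + k * interval for k in range(group_size))

triples = list(frame_triples(13))
print(triples)  # [(1, 6, 11), (2, 7, 12), (3, 8, 13)]
```

A video of N frames therefore yields N - 10 training groups instead of roughly N / 5, which is how shifting the start point exposes the system to more frame combinations.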
Each group of three images is converted to the posture sequence that has the maximum sum of the three membership function values in Eq. (27). Each posture sequence triggers its corresponding rule once. If the corresponding rule does not exist, a new rule is built in the IF-THEN form presented in Section 3.4.
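A minimal sketch of this step is given below. It assumes that the membership values of Eq. (27) are already available as one list per image, and that the label of each image can be chosen independently, in which case maximizing the sum of the three values is the same as taking the largest value for each image. The names `posture_sequence` and `train_rules`, the four-class toy data and the rule representation are hypothetical.

```python
from collections import Counter

def posture_sequence(memberships):
    """memberships: one list of per-class membership values for each image.
    Returns the 1-based class label (k for P_k) with the largest value per image."""
    return tuple(max(range(len(m)), key=m.__getitem__) + 1 for m in memberships)

def train_rules(groups):
    """groups: iterable of (memberships, activity). Each group triggers the rule
    'IF posture sequence THEN activity' once; unseen rules are created on demand."""
    rules = Counter()
    for memberships, activity in groups:
        rules[(posture_sequence(memberships), activity)] += 1
    return rules

# Toy data with 4 classes instead of 28 (values are made up for illustration).
groups = [
    ([[0.9, 0.1, 0.0, 0.0], [0.2, 0.7, 0.1, 0.0], [0.0, 0.3, 0.6, 0.1]], "WLR"),
    ([[0.8, 0.2, 0.0, 0.0], [0.1, 0.8, 0.1, 0.0], [0.0, 0.2, 0.7, 0.1]], "WLR"),
    ([[0.1, 0.2, 0.6, 0.1], [0.0, 0.1, 0.2, 0.7], [0.0, 0.1, 0.2, 0.7]], "JUMP"),
]
rules = train_rules(groups)
print(rules[((1, 2, 3), "WLR")])   # 2
print(rules[((3, 4, 4), "JUMP")])  # 1
```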
A threshold has to be set after all training patterns have been learned. The threshold is used to discard rules whose posture sequences occur relatively few times. The number of rules varies with the threshold; Table I shows the number of rules for different threshold values. The video of one person out of the six person models is taken out of the training data and used as the testing datum.
As Table I shows, the higher the threshold, the fewer rules we obtain. Although a higher threshold reduces the number of rules, fewer rules lose tolerance for ambiguity. If conflicting rules are generated, we choose the rule that is supported by the largest number of training instances.
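The pruning and conflict-resolution steps can be sketched as follows. Two points are assumptions rather than statements from the text: a rule is kept when its occurrence count is at least the threshold, and two rules conflict when they share a posture sequence but name different activities. The function name and the counts are hypothetical.

```python
def prune_and_resolve(rule_counts, threshold):
    """rule_counts: dict mapping (posture_sequence, activity) -> occurrence count.
    Returns a dict mapping each surviving posture sequence to one activity."""
    best = {}
    for (sequence, activity), count in rule_counts.items():
        if count < threshold:  # assumption: rules seen fewer times are discarded
            continue
        # Conflict: same sequence, different activity. Keep the better-supported one.
        if sequence not in best or count > best[sequence][1]:
            best[sequence] = (activity, count)
    return {sequence: activity for sequence, (activity, _) in best.items()}

# Made-up counts for illustration only.
counts = {
    ((1, 2, 3), "WLR"): 7,
    ((1, 2, 3), "WRL"): 3,
    ((13, 14, 13), "JUMP"): 4,
    ((2, 18, 18), "CUP"): 2,
}
print(prune_and_resolve(counts, 3))  # {(1, 2, 3): 'WLR', (13, 14, 13): 'JUMP'}
print(prune_and_resolve(counts, 5))  # {(1, 2, 3): 'WLR'}
```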
TABLE I
NUMBER OF RULES AT DIFFERENT THRESHOLDS
Training data models   Threshold = 3   Threshold = 4   Threshold = 5
Person 1               131             92              83
Person 2               130             101             80
Person 3               148             99              82
Person 4               157             105             75
Person 5               136             91              70
Person 6               150             114             87
The templates and the test patterns of the fuzzy rules are both sampled at an interval of five image frames. The postures of an activity should appear in the order that we intuitively perceive. For example, let P1 through P6 be the six linguistic labels of the activity “walking from left to right.” This activity should then have rules with the intuitive posture sequences (P1, P2, P3), (P2, P3, P4), (P3, P4, P5), (P4, P5, P6), (P5, P6, P1) and (P6, P1, P2). We call these rules essential rules. There are 24 essential rules in total for the six activities, but only 18 of them were found in our experiment when the threshold was set to three. The fuzzy rules far outnumber the essential rules because the essential rules are defined in spatiotemporal space, whereas the fuzzy rule base is built in canonical space. The fuzzy rule base is able to learn the hidden or interchangeable modes that exist in these actions. Fewer than 24 essential rules appear because the fuzzy rule base combines some similar rules into one rule. Some of the fuzzy rules obtained with the training data of person 1 and a threshold of three are listed in Table II. Two of the fuzzy rules are shown as template images in Fig. 4.8.
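The essential rules of the walking example can be enumerated with a short sketch. It reproduces the cyclic pattern listed for “walking from left to right”; whether every activity wraps around in the same way is not stated, so the function (a hypothetical name) is shown only for that example.

```python
def essential_rules(labels, length=3):
    """Return the cyclic windows of `length` consecutive posture labels."""
    n = len(labels)
    return [tuple(labels[(i + k) % n] for k in range(length)) for i in range(n)]

walking_left_to_right = ["P1", "P2", "P3", "P4", "P5", "P6"]
for rule in essential_rules(walking_left_to_right):
    print(rule)
# ('P1', 'P2', 'P3')
# ('P2', 'P3', 'P4')
# ('P3', 'P4', 'P5')
# ('P4', 'P5', 'P6')
# ('P5', 'P6', 'P1')
# ('P6', 'P1', 'P2')
```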
TABLE II
SOME OF THE OBTAINED FUZZY RULES
Number   Image 1   Image 2   Image 3   Class
1        P1        P1        P1        WLR
2        P1        P1        P2        WLR
3        P1        P1        P3        WLR
⋮        ⋮         ⋮         ⋮         ⋮
30       P4        P11       P12       WRL
⋮        ⋮         ⋮         ⋮         ⋮
60       P3        P13       P14       JUMP
⋮        ⋮         ⋮         ⋮         ⋮
80       P13       P16       P17       CROUCH
⋮        ⋮         ⋮         ⋮         ⋮
91       P2        P18       P18       CUP
⋮        ⋮         ⋮         ⋮         ⋮
129      P27       P28       P10       CDOWN
130      P28       P7        P7        CDOWN
131      P28       P28       P10       CDOWN
Fig. 4.8 Two examples of fuzzy rules. (a) Walking from left to right. (b) Climbing down.