
Chapter 4 Experimental Results

4.1 Image Rectification Result

Every frame of a video captured in total darkness must be rectified using Eq. (3.6) in Section 3.1. Fig. 4.3(a) shows a frame from the action recognition training data, converted to gray level. The results of rectifying this NIR image with different values of f are shown in Fig. 4.3(b) to (d). Among the tested values, f = 300 gives the most uniform illumination, so we use it to rectify the NIR images.

Fig. 4.3 Results of rectifying an NIR image with different f: (a) before rectification, (b) f = 200, (c) f = 300, (d) f = 400.
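The role of the parameter f can be illustrated with a small sketch. Eq. (3.6) is not reproduced in this chapter, so the code below does not implement it. As a stand-in, it assumes the classical cos^4 off-axis illumination falloff with tan(theta) = r / f, an optical centre at the image centre, and hypothetical function names. It only shows why a smaller f brightens the image borders more strongly than a larger f.

```python
import numpy as np

def falloff_gain(height, width, f):
    """Per-pixel gain that undoes an assumed cos^4 radial illumination falloff.

    f is the falloff parameter in pixels. The optical centre is assumed to
    lie at the image centre (an assumption of this sketch, not of Eq. (3.6)).
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    r2 = (ys - cy) ** 2 + (xs - cx) ** 2
    # cos^4(theta) with tan(theta) = r / f equals 1 / (1 + r^2 / f^2)^2,
    # so the compensating gain is (1 + r^2 / f^2)^2.
    return (1.0 + r2 / float(f) ** 2) ** 2

def rectify(gray, f):
    """Multiply a gray-level image by the gain and return an 8-bit image."""
    gain = falloff_gain(gray.shape[0], gray.shape[1], f)
    corrected = np.rint(gray.astype(np.float64) * gain)
    return np.clip(corrected, 0, 255).astype(np.uint8)
```

Under this assumed model, comparing the outputs for f = 200, 300 and 400 on the same frame mirrors the comparison in Fig. 4.3.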

4.2 Background Model and Foreground Object Extraction

To construct the background model, we first record about 2 seconds of video of the pure background (as in Fig. 4.1) in both the bright and the dark environments. After building the grayscale-value and HSV color space background models, we extract the foreground pixels using Eq. (2.5) and Eq. (2.6) in Section 2.1.2. We then refine the resulting foreground image with the shadow filter.

To obtain the best object extraction result, several parameters of the system have to be tuned. For the grayscale-value background models we set k = 2.3 in the bright environment and k = 2.0 in the dark environment; for the HSV color background models we set kv = 1.4 and kv = 1.1 respectively. The shadow filter uses the same parameters in both environments: Lncc = 0.95 in the grayscale value space, and kH = 1.3 and ks = 1.3 in the HSV color space. Fig. 4.4 shows the results of foreground extraction in the bright and dark environments.

Fig. 4.4 Results of foreground extraction: (a) an image frame in the bright environment, (b) foreground extracted from (a), (c) an image frame in the dark environment, (d) foreground extracted from (c).
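The way a multiplier such as k enters the foreground test can be sketched as follows. Eq. (2.5) and Eq. (2.6) are not reproduced in this chapter, so the sketch assumes a common per-pixel rule: a pixel is foreground when its gray value deviates from the background mean by more than k standard deviations. The function names and the `min_std` floor are hypothetical, and the HSV model and shadow filter are not covered.

```python
import numpy as np

def build_background_model(frames):
    """Per-pixel mean and standard deviation over pure-background frames."""
    stack = np.stack(frames).astype(np.float64)
    return stack.mean(axis=0), stack.std(axis=0)

def foreground_mask(frame, mean, std, k, min_std=1.0):
    """Assumed rule: foreground where |I - mean| > k * std."""
    # Floor the deviation so perfectly static pixels do not fire on tiny noise.
    sigma = np.maximum(std, min_std)
    return np.abs(frame.astype(np.float64) - mean) > k * sigma
```

With about 2 seconds of background video at 30 frames per second, `frames` would hold roughly 60 images; k = 2.3 (bright) or k = 2.0 (dark) would then be passed to `foreground_mask`.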

Finally, as described in Section 2.1.2, we apply a threshold to the foreground histograms in the X and Y directions to crop each foreground image to its minimal extent, and then resize it to 96×128 for normalization. In our experience, a threshold of about 10 pixels in both directions works well.
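The cropping and normalization step can be sketched as below. The sketch makes two assumptions that the text does not state: a histogram bin (column or row) is kept when it contains at least `threshold` foreground pixels, and 96×128 means 96 pixels wide by 128 pixels high. Nearest-neighbour resampling is used only to keep the example dependency-free.

```python
import numpy as np

def crop_by_projection(mask, threshold=10):
    """Crop a binary mask to the span of columns and rows whose
    foreground count reaches the threshold. Returns None if nothing qualifies."""
    mask = np.asarray(mask, dtype=bool)
    col_hist = mask.sum(axis=0)  # histogram in the X direction
    row_hist = mask.sum(axis=1)  # histogram in the Y direction
    cols = np.flatnonzero(col_hist >= threshold)
    rows = np.flatnonzero(row_hist >= threshold)
    if cols.size == 0 or rows.size == 0:
        return None
    return mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def resize_nearest(image, out_h=128, out_w=96):
    """Nearest-neighbour resize to out_h rows by out_w columns."""
    in_h, in_w = image.shape[:2]
    row_idx = np.arange(out_h) * in_h // out_h
    col_idx = np.arange(out_w) * in_w // out_w
    return image[row_idx][:, col_idx]
```

Thresholding the histograms rather than taking the raw bounding box keeps isolated noise pixels from enlarging the cropped region.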

4.3 The Day and Night Face and Action Recognition

4.3.1 Fuzzy Rule Construction

We construct the template model matrix and the fuzzy rule database from the training data. First, we choose key posture images from each action as essential templates; the number of key postures chosen for an action depends on its period. The key posture images of each action for one person (one model) are shown in Fig. 4.5. Each posture is regarded as one class.

Fig. 4.5 Key postures of the actions: (a) walking from right to left, (b) walking from left to right, (c) walking straight, (d) bending, (e) waving.

Fuzzy rules are constructed off-line. Each training input consists of three frames taken at a fixed interval, and we vary the start point to obtain many such inputs. For example, the 1st, 6th and 11th frames form one training input; the 2nd, 7th and 12th frames form another, and so on.

By using different start points, the system learns many more combinations of image frames, which increases the accuracy of the fuzzy rules.
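The grouping of frames by start point can be written down in a few lines. This is a minimal sketch with a hypothetical function name; it uses 0-based indices, so the triplet (0, 5, 10) corresponds to the 1st, 6th and 11th frames mentioned above.

```python
def frame_triplets(num_frames, interval=5):
    """Return all 0-based index triplets (s, s + interval, s + 2 * interval)
    that fit inside a video of num_frames frames."""
    last_start = num_frames - 2 * interval
    return [(s, s + interval, s + 2 * interval) for s in range(max(last_start, 0))]
```

For a 12-frame clip this yields exactly the two groups given as examples in the text.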

Each group of three images is converted to the posture sequence that maximizes the sum of the three membership function values in Eq. (2.44). Each posture sequence triggers its corresponding rule once. If no corresponding rule exists, a new rule is created in the IF-THEN form presented in Section 2.5.
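The bookkeeping described above can be sketched as a small rule table. Eq. (2.44) and the exact rule form of Section 2.5 are not reproduced in this chapter, so the sketch assumes that each frame has an independent vector of membership values (in which case the sum is maximized by the best posture per frame) and that a rule reads "IF the posture sequence is (p1, p2, p3) THEN the action is A". The class and method names are hypothetical.

```python
from collections import Counter, defaultdict

import numpy as np

def posture_sequence(memberships):
    """memberships: array of shape (3, num_postures), one row per frame.
    With independent rows, the sum of the three memberships is maximized
    by taking the best posture in each row."""
    m = np.asarray(memberships, dtype=float)
    return tuple(int(i) for i in m.argmax(axis=1))

class RuleBase:
    def __init__(self):
        # antecedent (posture sequence) -> how often each action triggered it
        self.rules = defaultdict(Counter)

    def train(self, memberships, action):
        seq = posture_sequence(memberships)
        # A missing rule is created on its first trigger, then counted.
        self.rules[seq][action] += 1
        return seq

    def classify(self, memberships):
        seq = posture_sequence(memberships)
        if seq not in self.rules:
            return None  # no rule matches this posture sequence
        return self.rules[seq].most_common(1)[0][0]
```

Keeping a trigger count per rule is one simple way to resolve the case where the same posture sequence is observed in more than one action.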

4.3.2 The Recognition Rate of Actions

To calculate the recognition rate of actions, we use off-line videos in our experiment. Each testing video is fed to the system from different starting frames, in the same way as for training the fuzzy rules; that is, we recognize the video starting from the first frame, from the second frame, from the third frame, and so on. Table I and Table II show the recognition rate of each action for each model in the bright and dark environments respectively, obtained with four-fold cross-validation. When the videos of Person 1 are tested, the templates and fuzzy rules are constructed from the other three persons, so the testing videos are never used for constructing the templates and fuzzy rules.

In the tables, WRL is the action “walking from right to left,” WLR is “walking from left to right,” WS is “walking straight,” WAVE is “waving,” and BEND is “bending.” The recognition rate is the number of correct recognitions divided by the total number of recognitions for each video.
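As a worked example of this definition, the sketch below recomputes the rates from the (correct/total) counts listed in Table I. The average is taken as a pooled rate, that is, all correct recognitions divided by all recognitions, which is the reading that reproduces the 1790/1870 entry of the table. The variable and function names are hypothetical.

```python
# (correct, total) counts per action for Persons 1 to 4, copied from Table I
TABLE_I = {
    "WRL":  [(108, 116), (105, 106), (118, 127), (121, 134)],
    "WLR":  [(95, 100), (110, 110), (112, 124), (96, 100)],
    "WS":   [(81, 81), (87, 98), (36, 39), (47, 50)],
    "WAVE": [(83, 83), (43, 45), (107, 107), (53, 54)],
    "BEND": [(48, 48), (66, 74), (200, 200), (74, 74)],
}

def recognition_rate(correct, total):
    """Recognition rate in percent."""
    return 100.0 * correct / total

def pooled_rate(table):
    """Sum the counts over all actions and persons, then take one rate."""
    correct = sum(c for cells in table.values() for c, _ in cells)
    total = sum(t for cells in table.values() for _, t in cells)
    return correct, total, recognition_rate(correct, total)
```

A pooled rate weights every recognition equally, so long videos contribute more than short ones; averaging the per-cell percentages instead would give a slightly different figure.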

Table I

The recognition rate of each activity in the bright environment

          Person 1          Person 2          Person 3          Person 4
WRL       93.1% (108/116)   99.1% (105/106)   92.9% (118/127)   90.3% (121/134)
WLR       95.0% (95/100)    100% (110/110)    90.3% (112/124)   96.0% (96/100)
WS        100% (81/81)      88.8% (87/98)     92.3% (36/39)     94.0% (47/50)
WAVE      100% (83/83)      95.6% (43/45)     100% (107/107)    98.1% (53/54)
BEND      100% (48/48)      89.2% (66/74)     100% (200/200)    100% (74/74)
Average   95.7% (1790/1870)

Table II

The recognition rate of each activity in the dark environment

          Person 1          Person 2          Person 3          Person 4
WRL       89.5% (68/76)     90.9% (60/66)     95.5% (64/67)     93.3% (56/60)
WLR       100% (80/80)      95.6% (43/45)     100% (66/66)      93.9% (46/49)
WS        100% (83/83)      100% (49/49)      100% (44/44)      100% (32/32)
WAVE      82.7% (62/75)     100% (94/94)      93.8% (45/48)     100% (55/55)
BEND      100% (86/86)      100% (70/70)      97.6% (80/82)     94.0% (63/67)
Average   96.5% (1249/1294)

4.3.3 The Recognition Rate of Faces

In the face recognition experiment, we collect face images of 8 persons in the bright environment and 9 persons in the dark environment to measure the face recognition accuracy. All face images, for both training and testing, are 50×60 pixels. First, the face images are projected onto the eigenspace using the EST transformation. Then, the CST transformation projects the resulting images onto the FisherFace space, where face recognition is performed. Each test image is compared with every training image using the L2 norm to find the most similar one. In the dark environment, 15 training images and 45 testing images are used for each person; in the bright environment, 10 training images and 100 testing images are used for each person. Fig. 4.6 shows the curves of accumulative eigenvalues.

The accumulative eigenvalues account for 98% of the image information when about 50 eigenvalues are used. Fig. 4.7 shows the correct rate of face recognition using the FisherFace method for different eigenspace dimensions. The best correct rates of face recognition in the bright and dark environments are recorded in Table III and Table IV respectively.
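The pipeline described above can be sketched with NumPy alone. The sketch reads the EST step as a principal component projection and the CST step as a Fisher linear discriminant projection, which is an interpretation and not a reproduction of the equations in Chapter 2. The function names, the use of a pseudo-inverse for the within-class scatter, and the 98% energy default are assumptions of this example.

```python
import numpy as np

def fit_pca(X, energy=0.98):
    """X: (n_samples, n_pixels). Keep the fewest principal axes whose
    accumulated eigenvalues reach the requested share of the total."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s ** 2
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.searchsorted(cumulative, energy) + 1)
    return mean, Vt[:n_keep].T

def fit_lda(Z, labels):
    """Z: eigenspace features. Returns at most (classes - 1) discriminant axes."""
    classes = np.unique(labels)
    overall = Z.mean(axis=0)
    d = Z.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        diff = (mc - overall)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)[: len(classes) - 1]
    return vecs[:, order].real

def nearest_neighbour(train_feats, train_labels, test_feats):
    """Label each test vector with the label of the closest training
    vector under the L2 norm."""
    dists = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=2)
    return train_labels[dists.argmin(axis=1)]
```

A face image of 50×60 pixels would be flattened to a 3000-element row of `X`; features are obtained as `(X - mean) @ W_pca @ W_lda`, where `W_pca` and `W_lda` are the matrices returned by `fit_pca` and `fit_lda`.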

Fig. 4.6 The curves of accumulative eigenvalues (a) in the bright environment, (b) in the dark environment. Horizontal axis: number of eigenvalues (dimensions of eigenspace); vertical axis: %.

Fig. 4.7 The curves of face recognition rate versus dimensions of eigenspace in the (a) bright environment, (b) dark environment. Horizontal axis: dimensions of eigenspace; vertical axis: %.

Table III

The correct rate of face recognition in the bright environment (rows: judged identity; columns: tested person; entries: number of test images)

Judge \ Test               Person 1  Person 2  Person 3  Person 4  Person 5  Person 6  Person 7  Person 8
Person 1                   100       0         0         1         0         0         2         0
Person 2                   0         100       0         0         0         0         0         0
Person 3                   0         0         99        0         0         0         0         0
Person 4                   0         0         0         99        0         0         0         0
Person 5                   0         0         0         0         100       0         2         0
Person 6                   0         0         1         0         0         100       0         4
Person 7                   0         0         0         0         0         0         96        0
Person 8                   0         0         0         0         0         0         0         96
Individual accuracy rate   100%      100%      99%       99%       100%      100%      96%       96%

The total accuracy rate is 98.7%.

Table IV

The correct rate of face recognition in the dark environment (rows: judged identity; columns: tested person; entries: number of test images)

Judge \ Test               Person 1  Person 2  Person 3  Person 4  Person 5  Person 6  Person 7  Person 8  Person 9
Person 1                   42        0         4         0         0         0         0         5         0
Person 2                   2         44        0         0         0         0         1         1         0
Person 3                   0         1         23        2         9         2         15        0         7
Person 4                   0         0         0         27        0         0         0         0         0
Person 5                   0         0         0         0         36        0         0         0         2
Person 6                   0         0         0         0         0         43        0         0         0
Person 7                   0         0         18        0         0         0         29        0         0
Person 8                   0         0         0         0         0         0         0         39        0
Person 9                   1         0         0         16        0         0         0         0         36
Individual accuracy rate   93.3%     97.8%     51.1%     60.0%     80.0%     95.6%     64.4%     86.7%     80.0%

The total accuracy rate is 78.8%.
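The accuracy figures of Table IV follow directly from its confusion counts, as the short sketch below shows. Each column holds the 45 test images of one person and each row is the identity returned by the system, so the individual accuracy is the diagonal entry divided by the column sum. The variable and function names are hypothetical.

```python
import numpy as np

# Table IV: rows = judged identity, columns = tested person (45 images each)
TABLE_IV = np.array([
    [42,  0,  4,  0,  0,  0,  0,  5,  0],
    [ 2, 44,  0,  0,  0,  0,  1,  1,  0],
    [ 0,  1, 23,  2,  9,  2, 15,  0,  7],
    [ 0,  0,  0, 27,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0, 36,  0,  0,  0,  2],
    [ 0,  0,  0,  0,  0, 43,  0,  0,  0],
    [ 0,  0, 18,  0,  0,  0, 29,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 39,  0],
    [ 1,  0,  0, 16,  0,  0,  0,  0, 36],
])

def individual_accuracy(confusion):
    """Percent of each tested person's images judged correctly."""
    return 100.0 * np.diag(confusion) / confusion.sum(axis=0)

def total_accuracy(confusion):
    """Percent of all test images judged correctly."""
    return 100.0 * np.trace(confusion) / confusion.sum()
```

The off-diagonal entries also show where the errors concentrate: Person 3 and Person 7 are frequently confused with each other, and 16 images of Person 4 are judged as Person 9.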

4.4 Sleep/Awake Detection

In the sleep/awake detection system, the detected region is divided into several hundred macroblocks of 5×5 pixels each (see Fig. 4.8). The region enclosed by the red rectangle measures 265×85 pixels, i.e., 53×17 = 901 macroblocks. The NIR camera normally samples at 30 frames per second, but recording at this rate would waste storage space. Because a sleeping person moves very little, we reduce the sampling rate to 2 frames per second for our recordings.
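The two numbers in this paragraph can be checked with a few lines of arithmetic. The helper names below are hypothetical; the sketch assumes that the lower sampling rate is obtained by keeping one frame out of every 15 from the 30 frames per second stream.

```python
def macroblock_grid(width, height, block=5):
    """Number of macroblock columns, rows and total for a region in pixels."""
    cols, rows = width // block, height // block
    return cols, rows, cols * rows

def subsample_indices(num_frames, camera_fps=30, record_fps=2):
    """Indices of the frames kept when reducing camera_fps to record_fps."""
    step = camera_fps // record_fps  # 15: keep one frame out of every 15
    return list(range(0, num_frames, step))
```

At 2 frames per second, one 30-second interval contains 60 recorded frames instead of 900.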

Fig. 4.8 The region of sleep/awake detection.

Table V shows the results of sleep/awake detection. Each interval is a 30-second video of a sleeping or an awake person. The MADI threshold is set to 6 according to the training data. The system outputs 1 when the person is judged to be awake and 0 otherwise.

Table V

The result of sleep/awake detection

            Awake                         Sleep
Interval    MADI     Judge 1   Judge 2    MADI    Judge 1   Judge 2
1           20.57    1         1          3.82    0         0
2           6.30     1         1          2.93    0         0
3           15.97    1         1          3.63    0         0
4           2.82     0         1          4.68    0         0
5           4.12     0         1          3.30    0         0
6           43.78    1         1          3.53    0         0
7           4.73     0         1          3.02    0         0
8           75.52    1         1          4.77    0         0
9           93.83    1         1          4.45    0         0
10          3.38     0         1          4.23    0         0
11          2.62     0         1          3.62    0         0
12          3.28     0         1          4.00    0         0
13          13.37    1         1          3.98    0         0
14          2.95     0         1          3.02    0         0
15          2.42     0         1          2.58    0         0
16          3.75     0         1          4.37    0         0
17          2.88     0         1          4.35    0         0
18          2.80     0         1          3.40    0         0
19          3.10     0         1          2.95    0         0
20          2.75     0         1          3.47    0         0
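The thresholding step can be checked against the MADI values of Table V. The sketch below assumes a strict comparison (MADI greater than 6 means awake); no value in the table equals 6, so the choice between strict and non-strict does not affect the result. With this rule the output matches the Judge 1 column for all 40 intervals. How the Judge 2 column is obtained is not described in this section, so it is not modelled here.

```python
MADI_THRESHOLD = 6.0  # set from the training data according to the text

def judge(madi, threshold=MADI_THRESHOLD):
    """Return 1 (awake) if the MADI exceeds the threshold, otherwise 0."""
    return 1 if madi > threshold else 0

# MADI values of the 20 awake and 20 sleep intervals, copied from Table V
AWAKE_MADI = [20.57, 6.30, 15.97, 2.82, 4.12, 43.78, 4.73, 75.52, 93.83, 3.38,
              2.62, 3.28, 13.37, 2.95, 2.42, 3.75, 2.88, 2.80, 3.10, 2.75]
SLEEP_MADI = [3.82, 2.93, 3.63, 4.68, 3.30, 3.53, 3.02, 4.77, 4.45, 4.23,
              3.62, 4.00, 3.98, 3.02, 2.58, 4.37, 4.35, 3.40, 2.95, 3.47]

awake_judged = [judge(m) for m in AWAKE_MADI]
sleep_judged = [judge(m) for m in SLEEP_MADI]
```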

4.5 Sleeping Posture Recognition

The action recognition system is also used to classify sleeping postures in this thesis. We set k = 2.0 for the grayscale-value background model and kv = 1.1 for the HSV color background model. To detect shadow pixels, we set Lncc = 0.95 in the grayscale value space and kH = 1.3 and ks = 1.3 in the HSV color space. Fig. 4.4 shows results of foreground extraction in the bright and dark environments. The key posture images of the four sleeping postures are shown in Fig. 4.9. For the right-foetus and left-foetus postures, we select different postures as templates according to how far the legs are drawn up. Table VI shows the correct rate of sleeping posture recognition.

Fig. 4.9 Key postures of sleeping postures: (a) log, (b) star-fish, (c) right-foetus, (d) left-foetus.

Table VI

The recognition rate of each sleeping posture

           Log             Star-fish       Right-foetus     Left-foetus
Person 1   100% (79/79)    100% (95/95)    96.1% (73/76)    98.2% (54/55)
Average    98.7% (301/305)

         

Chapter 5 Conclusion

In this thesis, we implement an automatic home health care system that combines face recognition, action recognition and sleep/awake detection of a person in both day and night. The test images are extracted by background subtraction in the action recognition system and by a Haar cascade classifier in the face recognition system. The test images are then transformed to a new space by eigenspace and canonical space projection for better efficiency and separability. Because actions, unlike faces, are dynamic, we group three images taken at a fixed interval to construct fuzzy rules that capture temporal information. In sleep/awake detection, the NIR images are first rectified using the illumination variation function. Motion estimation is then used to quantify the activity level of the sleeper.

NIR images look similar to gray-level images and carry less hue and saturation information than color images. Therefore, the correct rate of face recognition in the dark environment (78.8%) is much lower than in the bright environment (98.7%). However, the correct rates of action recognition in the bright and dark environments (95.7% and 96.5%) differ little, because the information provided by NIR images is sufficient to extract almost complete foreground images. In the sleep/awake detection system, we also obtain very good results using motion estimation. In the future, a new face recognition algorithm is needed to improve the correct rate in the dark environment.

