
Understanding of human behaviors from videos in nursing care monitoring systems

Chin-De Liu^a, Pau-Choo Chung^a,*, Yi-Nung Chung^b and Monique Thonnat^c

^a Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan 70101, Taiwan ROC
^b Department of Electrical Engineering, Dayeh University, Changhua 515, Taiwan ROC
^c Orion research team, INRIA BP93, FR 06902 Sophia Antipolis, France

Abstract. This paper addresses the issue of scenario-based understanding of human behavior from videos in a nursing care monitoring system. The analysis is carried out based on experiments consisting of single-state scenarios and multi-state scenarios, where the former monitor activities under situational contexts for elementary behavior reasoning, while the latter dictate the order of elementary behaviors for behavior reasoning, with a priori knowledge in a system profile for normality detection. By integrating the activities, situation context, and profile knowledge, we can have a better understanding of patients in a monitoring system. In activity recognition, a Negation-Selection mechanism is developed which employs a divide-and-conquer concept, with the Negation using posture transitions to preclude the negative set from the activities. The Selection that follows the Negation uses a moving history trace for activity recognition. Such a history trace comprises not only the pose from a single frame, but also history trajectory information. As a result, the activity can be more accurately identified. The developed approach has been built into a nursing care monitoring system for the elderly's daily life behaviors. Results have shown the promise of the approach, which can accurately interpret 85% of the regular daily behaviors. In addition, the approach is also applied to accident detection, which was found to have 90% accuracy with 0% false alarms.

Keywords: Behavior analysis, activity recognition, state machine, scenario recognition

1. Introduction

Due to the lengthening of human lives, the elderly's daily health care has become one of the most critical issues in our society. As such, how to apply current technology to improve the well-being of the elderly's daily life has become increasingly important. In order to address the increasing need for elderly care, establishing nursing homes or centers has become a necessity, some of which are installed with cameras in every bedroom and hallway for monitoring the elderly's condition to avoid unexpected accidents. However, this approach requires a dedicated person to keep watching the screens at all times, which is a tremendous human burden and is also prone to occasional human negligence. Thus an approach for understanding elderly behaviors from video sequences would provide great assistance in monitoring the elderly's situation.

Recently, many research works have been devoted to human behavior understanding. Kosuke Hara, Takashi Omori, and Reiko Ueno [7] used location sequences for estimating behavior patterns. This approach involves only position patterns; however, more precise human behavior understanding should also involve human activity and its contextual interaction. Based on this consideration, this paper develops multi-state scenario based human behavior understanding from video sequences. The transitions in a multi-state scenario are triggered by activity contexts reasoned from single-state scenarios, while each single-state scenario is the composition of activity and contextual information. In order to accurately recognize the activities from video sequences, a Negation-Selection mechanism is proposed in this paper.

*Corresponding author. Department of Electrical Engineering, National Cheng-Kung University, No. 1, University Road, Tainan 70101, Taiwan ROC. E-mail: pcchung@ee.ncku.edu.tw.


The Negation-Selection employs a divide-and-conquer idea where the Negation is used to reduce the recognition target domain using posture transitions, and the Selection then employs a history map for precise recognition of activities. With this approach, the activity can be more accurately recognized and used in the inference of behaviors combined with contextual information.

Some related works reported in the literature are surveyed as follows. G. Sudhir, John C.M. Lee, and Anil K. Jain [6] and Hisashi Miyamori and Shun-ichi Iisaku [9] employed shape matching to recognize tennis players' actions from video streams. Ismail Haritaoglu and David Harwood [10] performed human posture recognition from histogram projections. Jezekiel Ben-Arie, Zhiqian Wang, Purvin Pandit, and Shyamsundar Rajaram [11] extracted the angles of body parts from a sequence of video frames and then used maximum likelihood estimation for the recognition of human activities. Galata et al. [3] used Variable-Length Markov Models (VLMM) for recognizing human activities from a sequence of contours composed of variable-length point sets. Davis and Bobick [1,2,4,5] used motion-history images (MHIs) as a motion recency representation for activity recognition. Osama Masoud and Nikos Papanikolopoulos [14] employed template matching on the image difference for the recognition of human activities. Randal Nelson [15,16] performed activity recognition from a binary feature vector, each element of which represents whether an associated block in an image difference is in high motion. M. Masudur Rahman, Kazuya Nakamura, and Seiji Ishikawa [12] used eigen features extracted from the image motion vectors for the recognition of activities. N. Krahnstover, M. Yeasin, and R. Sharma [13] used the motions of feet to distinguish "walking" and "running". Hironobu Fujiyoshi and Alan J. Lipton [8] proposed the star-skeleton for the recognition of human activities. In accident behavior detection, Wenmiao Lu and Yap-Peng Tan [17] employed hypothesis analysis on the extracted object shape for swimming pool incident detection.

According to the above survey, most of the research has focused on recognizing only some particular actions, such as running, walking, tennis actions, and swimming accidents. These approaches are not applicable to activity recognition in our case, namely activities occurring in a nursing home or center. On the other hand, most of the recognition did not apply context information. As we know, a person's daily-life behavior is usually perceived from the activities the person performs together with his interactions with the surrounding context. Consequently, for daily-life behavior understanding, human activities should act as the primary cue, with context interaction acting as an assistant. Furthermore, since an activity is derived from a sequence of motion patterns, the instant posture and the motion trace should play equally important roles in the recognition. Based on these considerations, this paper performs human behavior recognition using a multi-state scenario based approach, combining the human activities derived from postures and motion traces with the surrounding contexts in the state reasoning. The results have been implemented in a healthcare monitoring system in a nursing center. The monitoring system extracts the elderly's daily behavior scenarios and performs statistics. Furthermore, a tele-consultation service [19,20] is also implemented to connect the nursing center with a supporting hospital, so that consultations can be provided in real time by physicians.

The remainder of the paper is organized as follows. Section 2 presents posture recognition and activity recognition using the Negation-Selection mechanism. The state machine employed for single-state scenarios and multi-state scenarios, and the inference rules, are described in Section 3. Section 4 presents the nursing care monitoring system which implements the behavior understanding. Section 5 gives the experimental results. Finally, conclusions are drawn in Section 6.

2. Activity recognition from video sequence

2.1. Feature extraction and posture recognition

To extract an activity from a video stream, the first step is to detect the foreground objects and image features. A simple and common method to detect foreground objects is to use a background model, with background subtraction and thresholding to determine the foreground pixels. The pixel intensity of a completely stationary background can be reasonably modeled as a normal distribution with two parameters: the mean m(x) and variance σ(x). The pixel I(x) is detected as a foreground pixel if the Gaussian probability p(I(x)) = G(I(x), m(x), σ(x)) falls below a threshold value.
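As a concrete illustration, the following is a minimal sketch of such a per-pixel Gaussian background model in Python; the running-average update rule, the learning rate alpha, and the threshold tau are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.02):
    """Running per-pixel Gaussian background model (assumed update rule)."""
    diff = frame - mean
    mean = mean + alpha * diff
    var = var + alpha * (diff ** 2 - var)
    return mean, var

def foreground_mask(frame, mean, var, tau=2.5e-3):
    """A pixel is foreground when its Gaussian likelihood under the
    background model falls below the threshold tau."""
    sigma = np.sqrt(np.maximum(var, 1e-6))
    p = np.exp(-0.5 * ((frame - mean) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return p < tau
```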

Based on the extracted foreground pixels, the posture is estimated using the horizontal histogram projection H(x) and the vertical histogram projection V(x), compared through the Kullback-Leibler information measure defined in [18] as $\rho = \arg\min_{1 \le i \le k} \{D_i\}$, where

$$D_i = \frac{1}{2}\sum_x \left[ H(x)\log\frac{H(x)}{h_i(x)} + h_i(x)\log\frac{h_i(x)}{H(x)} \right] + \frac{1}{2}\sum_x \left[ V(x)\log\frac{V(x)}{v_i(x)} + v_i(x)\log\frac{v_i(x)}{V(x)} \right]. \tag{1}$$

Here k is the number of postures in the database, ρ is the obtained posture number, and h_i(x) and v_i(x) are the horizontal and vertical projections of the i-th posture, respectively.
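A direct transcription of Eq. (1) in Python might look as follows; the histogram normalization and the small epsilon guard against empty bins are added assumptions.

```python
import numpy as np

def sym_kl(p, q, eps=1e-8):
    """Symmetric Kullback-Leibler divergence between two projection histograms."""
    p = p / (p.sum() + eps) + eps
    q = q / (q.sum() + eps) + eps
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def recognize_posture(H, V, templates):
    """Return the index i minimizing D_i of Eq. (1).
    templates: list of (h_i, v_i) projection pairs, one per stored posture."""
    scores = [sym_kl(H, h) + sym_kl(V, v) for h, v in templates]
    return int(np.argmin(scores))
```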

2.2. Negation-Selection for activity recognition

A simple activity is composed of two main postures and the motions between them. These two main postures can usually be determined from the beginning and ending segments of an activity. Based on these considerations, this paper develops a Negation-Selection mechanism for activity recognition. The Negation uses the beginning and ending posture segments to veto some activities. Following the Negation is the Selection, which then identifies the target activity from the remaining candidate activities using the history map.

In the Negation, two features, the starting posture and the ending posture, are employed as the abstraction coding of a posture sequence. The two codes are obtained by a weighted match count as follows:

$$j_{\mathrm{start}} = \arg\max_{1 \le j \le k} \sum_{i=1}^{N/2} c_i^{\mathrm{start}} F(p_i, j) \quad \text{and} \quad j_{\mathrm{end}} = \arg\max_{1 \le j \le k} \sum_{i=N/2}^{N} c_i^{\mathrm{end}} F(p_i, j), \tag{2}$$

where

$$F(p_i, j) = \begin{cases} 1, & \text{if } p_i = j \\ 0, & \text{otherwise}, \end{cases} \qquad c_i^{\mathrm{start}} = 0.3 \cdot (0.9)^i, \qquad c_i^{\mathrm{end}} = c_{N-i}^{\mathrm{start}}$$

are the weighting factors, k is the number of defined postures, p_i is the posture for frame i, and N is the length of the sequence. Once the starting posture and the ending posture are determined, the system uses Table 1 to eliminate the impossible activities.

Table 1
The impossible activities associated with the posture codes for Negation-Selection

j_start \ j_end   | Standing-pose            | Bending-pose             | Lying-pose
Standing-pose     | Sitting, lying, standing | Standing, lying          | Standing, sitting, walking, throwing, picking
Bending-pose      | Sitting, lying           | Walking, lying, standing | Walking, standing, throwing, picking
Lying-pose        | Sitting, lying           | Lying, throwing          | Walking, standing, …
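A sketch of the Negation step described above follows; the posture labels, the activity set, and the single veto entry shown are illustrative assumptions, and in practice the full contents of Table 1 would be encoded.

```python
POSTURES = ["standing", "bending", "lying"]
ACTIVITIES = {"sitting", "standing", "walking", "lying", "throwing", "picking"}

# Impossible activities per (j_start, j_end) pair, in the spirit of Table 1;
# only one entry is shown here, the rest would be filled in from the table.
IMPOSSIBLE = {
    ("standing", "standing"): {"sitting", "lying", "standing"},
}

def negation(posture_seq):
    """Eq. (2): weighted vote for the starting/ending posture, then veto."""
    n = len(posture_seq)
    weights = [0.3 * 0.9 ** i for i in range(n // 2)]

    def vote(frames, w):
        score = {p: 0.0 for p in POSTURES}
        for wi, p in zip(w, frames):
            score[p] += wi
        return max(score, key=score.get)

    j_start = vote(posture_seq[: n // 2], weights)
    # c_i^end = c_{N-i}^start: weights grow toward the final frame.
    j_end = vote(posture_seq[n // 2 :], list(reversed(weights)))
    return ACTIVITIES - IMPOSSIBLE.get((j_start, j_end), set())
```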


Fig. 1. The normalized vertical and horizontal projections of standing, bending, and lying down postures.

Fig. 2. The projection histogram of the history map in the vertical direction.

Table 2
One example of a daily schedule in a nursing center

Breakfast: AM 7:00 ∼ AM 8:00
Free time: AM 8:00 ∼ AM 11:00
Lunch: AM 11:00 ∼ PM 1:00
Take midday naps: PM 1:00 ∼ PM 2:30
Free time: PM 2:30 ∼ PM 4:00
Bath time: PM 4:00 ∼ PM 5:00
Dinner: PM 5:00 ∼ PM 6:30
Free time: PM 6:30 ∼ PM 7:30
Sleep time: PM 7:30 ∼ AM 7:00


Fig. 3. The projection histogram of the history map in the horizontal direction.

Following the "Negation" is the "Selection", which identifies the activity associated with the frame segment from the remaining activity candidates. An activity is the composition of a sequence of motions. The motion is computed from the motion history map (MHS) as

$$MHS_t(x) = \begin{cases} 255, & x \in M \\ MHS_{t-1}(x) - \frac{255}{N}, & \text{otherwise}, \end{cases} \tag{3}$$

where M represents the set containing motion pixels. In the computation, the MHS is adopted for activity recognition. Assume that $H_h^{A_i}$ and $V_v^{A_i}$ are the horizontal and vertical projections of the MHS of activity $A_i$, and $H_h$ and $V_v$ are the horizontal and vertical projections of the MHS from the input video sequence. The activity $A^*$ of the input sequence with regard to the history map is obtained as

$$A^* = \arg\max_{A_i \in \mathcal{A}} \left\{ -\log \left[ \sum_h \left( H_h - H_h^{A_i} \right)^2 + \sum_v \left( V_v - V_v^{A_i} \right)^2 \right] \right\}, \tag{4}$$

where $\mathcal{A}$ is the set of candidate activities remaining after the Negation.
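The Selection step can be sketched as follows; the choice of projection axes and the epsilon guard inside the logarithm are assumptions.

```python
import numpy as np

def update_mhs(mhs, motion_mask, n):
    """Eq. (3): moving pixels are set to 255, all others decay by 255/N."""
    return np.where(motion_mask, 255.0, np.maximum(mhs - 255.0 / n, 0.0))

def select_activity(mhs, candidates):
    """Eq. (4): choose the remaining candidate whose stored MHS projections
    are closest to those of the input sequence.
    candidates: dict mapping activity name -> (H_template, V_template)."""
    H = mhs.sum(axis=0)   # horizontal projection (per column)
    V = mhs.sum(axis=1)   # vertical projection (per row)

    def score(tpl):
        Ht, Vt = tpl
        return -np.log(np.sum((H - Ht) ** 2) + np.sum((V - Vt) ** 2) + 1e-8)

    return max(candidates, key=lambda a: score(candidates[a]))
```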

3. Behavior understanding with scenario recognition

Detection of abnormality in a situation is important in nursing centers. As abnormality is defined in contrast to normality, an abnormal situation should be detected against the contrast of normal routine behaviors. In a nursing center which also takes care of mentally hindered people, behavior detection using only activities has its limitations, as the mentally hindered people could occasionally perform some meaningless actions. However, it is worth mentioning that they are arranged to have a routine daily schedule and a restricted activity range, such as the schedule in Table 2 from one nursing center experimented with in our study. According to this strict schedule and location restriction, every patient has very regular activities within a certain time period and combination of locations. Thus, their daily normal behaviors can be established as scenarios and then used in the detection of abnormality.

In this paper the scenarios are divided into single-state scenarios and multi-state scenarios. A single-state scenario is an elementary behavior inferred from an activity and a context, while a multi-state scenario is a composition of several single-state scenarios denoting a behavior description.


To recognize our scenarios, context plays an important role. Context can be categorized into two classes: numerical contexts $C_a$ and symbolic contexts $C_b$. The numerical contexts include time duration, velocity, distance, size, etc. The symbolic contexts include marks, equipment, locations, and time periods. With a detected activity $A_t$ at time t, a single-state scenario is computed as the maximum likelihood between the observations and the given scenarios as

$$S^* = \arg\max_{S_i \in S} \{ P(A_t, C \mid S_i) \}, \tag{5}$$

where $P(A_t, C \mid S_i) = P(A_t \mid S_i) \cdot P(C \mid S_i) = P(A_t \mid S_i) \sum_{j=1}^{m} w_{ij} P(C_j \mid S_i)$ is the probability of the detected observations given the scenario. Here $w_{ij}$ is the weight of the j-th context in the i-th scenario, and m is the number of contexts. For symbolic contexts, $P(C_j \mid S_i) = 1$ if context $C_j$ is matched in scenario $S_i$, otherwise $P(C_j \mid S_i) = 0$. For numerical contexts, $P(C_j \mid S_i)$ is calculated with a Gaussian estimation $P(C_j \mid S_i) = G(C_j, m_{S_i}, \sigma_{S_i})$ with mean $m_{S_i}$ and variance $\sigma_{S_i}$.
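A minimal sketch of the single-state scenario scoring of Eq. (5) follows; the scenario dictionary layout and its field names are assumptions made for illustration.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian estimate used for numerical contexts."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def scenario_likelihood(activity, contexts, scenario):
    """Eq. (5): P(A_t, C | S_i) = P(A_t | S_i) * sum_j w_ij * P(C_j | S_i)."""
    p_activity = scenario["p_activity"].get(activity, 0.0)
    ctx_sum = 0.0
    for name, value in contexts.items():
        weight = scenario["weights"].get(name, 0.0)
        model = scenario["contexts"].get(name)
        if model is None:
            continue
        if model["kind"] == "symbolic":     # P = 1 iff the context matches
            ctx_sum += weight * (1.0 if value == model["value"] else 0.0)
        else:                               # numerical: Gaussian estimation
            ctx_sum += weight * gaussian(value, model["mean"], model["sigma"])
    return p_activity * ctx_sum

def best_scenario(activity, contexts, scenarios):
    """Return the scenario name S* maximizing Eq. (5)."""
    return max(scenarios, key=lambda s: scenario_likelihood(activity, contexts, scenarios[s]))
```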

To detect a multi-state scenario from a sequence of single-state scenarios, the likelihood function is computed as

$$\begin{aligned} P(S_t \mid S_m) &= P(S_{t1}, S_{t2}, \ldots, S_{tK} \mid S_{m1}, S_{m2}, \ldots, S_{mK}) \\ &= P(S_{t1} \mid S_{m1}) \cdot P(S_{t2} \mid S_{m1}, S_{m2}) \cdots P(S_{tK} \mid S_{mK-1}, S_{mK}) \\ &= P(S_{t1} \mid S_{m1}) \cdot \left( P(S_{t2} \mid S_{m2}) \, P(S_{t2} \mid S_{m1}) \right) \cdots \left( P(S_{tK} \mid S_{mK}) \, P(S_{tK} \mid S_{mK-1}) \right), \end{aligned} \tag{6}$$

where $S_t = [S_{t1}, S_{t2}, \ldots, S_{tK}]$ and $S_m = [S_{m1}, S_{m2}, \ldots, S_{mK}]$ represent the observed sequence and the model states, respectively. Taking the logarithm, the likelihood function can be computed as

$$\log P(S_t \mid S_m) = \sum_{k=1}^{K} \log \left( P(S_{tk} \mid S_{mk}) \cdot P(S_{tk} \mid S_{mk-1}) \right), \tag{7}$$

where K is the length of the single-state scenario sequence. The most likely single-state scenario sequence $\Omega$ is then found by the maximum likelihood approach with $\Omega = \arg\max_m \left( \frac{1}{K} \sum_{k=1}^{K} \log\left(P(S_{tk} \mid S_{mk}) \, P(S_{tk} \mid S_{mk-1})\right) \right)$. To handle the starting point of a sequence of single-state scenarios, the likelihood function is rewritten as

$$\Omega = \arg\max_s \left\{ \arg\max_m \left\{ \frac{1}{K} \sum_{k=1}^{K} \log \left( P(S_{tk} \mid S_{m,k+s}) \cdot P(S_{tk} \mid S_{m,k+s-1}) \right) \right\} \right\}. \tag{8}$$
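The sequence matching of Eqs. (7) and (8) can be sketched as follows; the conditional probability is passed in as a function P(a, b), an assumed interface rather than one defined in the paper.

```python
import math

def sequence_score(obs, model, P):
    """Average log-likelihood of Eq. (7) for an observed single-state scenario
    sequence against a model sequence; P(a, b) is the assumed probability of
    observing a given model state b."""
    total = 0.0
    for k in range(len(obs)):
        p = P(obs[k], model[k])
        if k > 0:
            p *= P(obs[k], model[k - 1])
        total += math.log(max(p, 1e-12))   # guard against log(0)
    return total / len(obs)

def best_alignment(obs, model, P):
    """Eq. (8): additionally search over start offsets s of the model sequence."""
    K = len(obs)
    best_score, best_offset = float("-inf"), 0
    for s in range(len(model) - K + 1):
        score = sequence_score(obs, model[s : s + K], P)
        if score > best_score:
            best_score, best_offset = score, s
    return best_score, best_offset
```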

With the above estimation, a multi-state scenario can be detected from each sequence of single-state scenarios of length K. Besides the observation sequence, time also plays an important role for multi-state scenarios: it serves as a support in multi-state scenario reasoning. To apply the time context in scenario reasoning, we construct a deterministic finite accepter (DFA) for each multi-state scenario, denoted as M = (Q, δ, q0, F), where Q is the set of states, δ is the transition function, which contains the time constraints of the multi-state scenario, q0 ∈ Q is the initial state, and F is the final state set. The time context is applied as a transition in the DFA to make a detected multi-state scenario arrive at the final state. Figure 4 shows the state machine for one multi-state scenario, "sitting on a chair and watching TV", describing the behavior associated with the camera in the TV area. The complete inference is triggered by the following single-state scenarios: "walking to chair", "sitting on a chair", and "static" with the temporal context "more than 3 minutes". Figure 5 shows the state machine for the multi-state scenario "going to sleep" in the bedrooms. It has a complete inference when lying on the bed for more than 10 minutes.
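A hedged sketch of such a DFA for the "sitting on a chair and watching TV" scenario follows; the state and event names are illustrative, and the time constraint is modeled as an event emitted when the person has stayed static for more than 3 minutes.

```python
# Transition table in the spirit of M = (Q, delta, q0, F); names are assumed.
WATCH_TV_DFA = {
    "start":      {"walking to chair": "approached"},
    "approached": {"sitting on a chair": "seated"},
    "seated":     {"static > 3 min": "watching TV"},
}
FINAL = {"watching TV"}

def run_dfa(events, dfa=WATCH_TV_DFA, q0="start"):
    """Feed detected single-state scenarios into the DFA; accept on a final state."""
    q = q0
    for e in events:
        q = dfa.get(q, {}).get(e, q)   # stay in the current state on non-matching events
    return q in FINAL

# Example: run_dfa(["walking to chair", "sitting on a chair", "static > 3 min"]) -> True
```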

Detection of abnormality is performed based on the inference from multi-state scenarios and a priori knowledge of the user profile contained in a system profile. In other words, an activity occurring in an unjustified temporal context, or unjustified against the user profile, is regarded as abnormal. For instance, the scenario "sitting on a chair and watching TV" detected under the temporal context "midnight" is regarded as an abnormality. If a person has a regular midday nap of one hour, the situation where he sleeps much longer than one hour is also regarded as an abnormality. When the scenario associated with the camera installed at the entrance of the bathroom detects that a person stays in the bathroom for longer than a certain period, this situation is regarded as another abnormality. For the nursing center we experimented with, if the "leaving bed and going out" scenario is detected under the temporal context "midnight", this is also regarded as an abnormality, since it is required that the elderly be accompanied by a nurse when they go to the toilet after bed time. Under such an abnormal situation, a signal is issued to notify the nurse center. Figure 6 shows an abnormality detected when the elderly walks to the entrance and exit.

Fig. 4. State machine for recognition of the scenario "sitting on a chair and watching TV".

Fig. 5. State machine of the scenario "going to sleep".

Fig. 6. State machine of the scenario "escape from the nursing center".

Another abnormality detected by the system is an accident. An accident such as "faint", "fall" or "slip" is usually associated with the activity "lying", so the activity "lying" acts as one major element in our current design of accident detection. Accident recognition combines the activity "lying" with spatial contexts: "lying" occurring under an unjustified spatial context is suspected to be an accident. Other pre- and post-activities are also applied to reduce the possibility of accident false alarms. For instance, if the activities "walking", "standing" or "running" are detected immediately after a suspected accident, the accident must have been a false alarm. Figure 7 shows the state machine for accident detection.

Fig. 7. State machine of the scenario "faint".

Fig. 8. In (a) and (b) a scenario "walking to bed" is detected, in (d) a scenario "sitting on the bed" is detected, in (e) an activity "static" is detected, in (f) a scenario "lying on bed" is detected, and a multi-state scenario "go to sleep" is detected.
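The post-activity false-alarm check described above can be sketched in a few lines; the window size is an assumed parameter.

```python
def confirm_accident(post_activities, window=5):
    """Suppress a suspected accident when purposeful activities such as
    walking, standing, or running immediately follow it."""
    cancelling = {"walking", "standing", "running"}
    return not any(a in cancelling for a in post_activities[:window])
```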

4. Monitoring system implementation in nursing center and hospital

The above-mentioned techniques are implemented in a nursing care monitoring system. The system is built in the nursing center "Hung-Chia Sanctuary for the Handicapped" and connected to "Chi-Mei Medical Center". The system architecture is shown in Fig. 9.


Fig. 9. System architecture of the nursing care monitoring system.

There are four modules: the Behavior Interpretation Module, the Announcement Service Module, the Behavior Statistics Module, and the Tele-Consultation Module. The Behavior Interpretation Module is used to extract elderly behaviors. The Announcement Service Module notifies caretakers based on a layered notification policy when certain behaviors are detected. The Behavior Statistics Module records the behavior history and contains statistics tools to generate reports for caretakers. In order to provide assistance in urgent situations, a Tele-Consultation Module is built for real-time audio and video communication between the nursing center and the hospital. Each module is described in detail in the following.

4.1. Announcement service

The announcement service notifies the related caretakers of the occurrence of certain special behaviors. The notification methods include broadcast, e-mail, and multimedia service notification. Notifications associated with some behaviors require a response confirmation from caretakers within a time limit. Layered notification is also designed in this system: if a notification at one level is not responded to within the time limit, the next level of notification is activated, in order to ensure that the event is received appropriately. Table 7 shows the attributes used in the notification policy, including the notification targets, the response requirement, the response time limit, auto-cancel, and layered notification for some typical behaviors. The notification targets include the staff in the nursing home, nurses and doctors in the hospital, and the patient's family members. Auto-cancel indicates whether the notification event is canceled if the behavior does not continue in the following video frames.
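A sketch of such a layered escalation policy follows; the level ordering, the single behavior entry, and the send/wait_for_ack primitives are assumptions made for illustration.

```python
# Illustrative escalation chain for one behavior, in the spirit of Table 7.
POLICY = {
    "faint": {"levels": [["staff"], ["nurse", "doctor"], ["family"]],
              "response_limit": 3.0},   # seconds
}

def notify(behavior, send, wait_for_ack):
    """Escalate through the notification levels until someone acknowledges."""
    policy = POLICY[behavior]
    for targets in policy["levels"]:
        for target in targets:
            send(target, behavior)
        if wait_for_ack(policy["response_limit"]):
            return True           # acknowledged within the time limit
    return False                  # exhausted all levels without a response
```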

4.2. Tele-consultation service

When the staff in the nursing center need guidance for handling an unexpected accident, advice from the supporting hospital is necessary. In order to provide real-time assistance, a tele-consultation service is built into the monitoring system. The tele-consultation service includes video and audio communication, which enables a doctor to give directions for the emergency process and observe the patient's situation in the nursing center.

4.3. Behavior cross statistic tools

The statistics of elderly behaviors provide significant information for healthcare in the nursing home and medical diagnosis in the hospital. For example, if an elderly person always sits or lies on the bed, this implies that the person gets little exercise; in this situation, the nursing home should arrange some exercise for him or her for health reasons. To provide this information, a database is set up for recording long-term behaviors and related information such as time duration. A statistics tool is also implemented in the system. For each behavior, the statistics tool analyzes the time duration, the number of occurrences, and the time intervals.
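As an illustration, a minimal aggregation over such a behavior log might look as follows; the record schema (behavior, start_time, end_time) is an assumption.

```python
from collections import defaultdict

def behavior_statistics(records):
    """records: iterable of (behavior, start_time, end_time) tuples.
    Returns per-behavior count, total duration, and mean interval between starts."""
    stats = defaultdict(lambda: {"count": 0, "duration": 0.0, "starts": []})
    for behavior, start, end in records:
        entry = stats[behavior]
        entry["count"] += 1
        entry["duration"] += end - start
        entry["starts"].append(start)
    for entry in stats.values():
        starts = sorted(entry.pop("starts"))
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        entry["mean_interval"] = sum(gaps) / len(gaps) if gaps else None
    return dict(stats)
```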

5. Experimental results

The developed system is applied in a nursing home (or nursing center) for tests. The multi-state scenarios and single-state scenarios are constructed based on the environment under each camera's monitoring. Figure 8 shows the multi-state scenario "going to sleep" detected in a test video at 10 frames per second: from frame 1004 to frame 1307, the single-state scenario "walking to bed" is detected; from frame 1911 to frame 2021, the single-state scenario "sitting on bed" is detected; from frame 5762 to frame 5842, the single-state scenario "lying on bed" is detected; and from frame 5842 to frame 7642, "static" with the temporal context "3 minutes" is detected. With these single-state scenarios, the multi-state scenario "going to sleep" is recognized. Normality reasoning is then performed with the results referred to the system profile.

To evaluate the performance of the system in the recognition of human scenarios, activities, and postures, we selected 7 video segments totaling about 120 minutes for testing. These videos contain the behaviors "going to sleep", "going to toilet", "watching TV" and "taking a walk". We first classified the videos manually into our defined scenarios, activities, and postures, which in total contain 15 multi-state scenarios, 35 single-state scenarios, 102 activity segments, and about 72,000 posture frames. In the real situation of a nursing center, the patients are relatively inactive, so even though there is a total of 72,000 (120 minutes × 60 seconds × 10 fps) posture frames, they contain relatively few scenarios and activities.

Table 3 shows the recognition rates for the postures.

Table 3
Posture recognition rate

                 Standing-pose   Bending-pose   Lying-pose
Standing-pose    54141           16587          124
Bending-pose     17438           47528          11180
Lying-pose       421             7885           60696
Rate             75%             66%            84%

Table 4
Activity recognition rate based on posture sequence analysis

Activity     Total   Success   Rate
Sitting      15      10        66%
Walking      12      11        91%
Lying down   8       7         87%
Standing     15      10        66%


Table 5
Single-state scenario recognition results

Scenario             Number   True-positive   False-positive   Rate
Walking to bed       5        5               1                83%
Lying on bed         3        3               2                60%
Leaving bed          4        4               1                80%
Sitting on a chair   4        4               1                80%
Going to toilet      4        3               3                57%

Table 6
Multi-state scenario recognition results

Scenario                                                Number   True-positive   False-positive   Rate
Sitting on a sofa and watching TV                       4        4               2                67%
Going to sleep                                          3        3               1                75%
Leaving bed to toilet, and coming back to sleep on bed  3        3               0                100%

From the results, we find that the performance for the bending-pose is the lowest. This is because the bending-pose is relatively similar to the standing-pose, causing it to be easily misrecognized as the standing-pose. The standing-pose also shows lower performance than the lying-pose, because the standing-pose is extracted in moving situations and therefore involves more complicated background conditions. On the other hand, as the lying-pose is extracted in static situations against a fixed background, it presents a higher recognition rate.

Table 4 shows the activity recognition results. According to Table 4, sitting and standing, being two activities that involve only a posture transition, have lower recognition rates, whereas "walking", which keeps the same posture throughout, has a higher recognition rate. Lying down also has a higher recognition rate, partly due to the higher recognition rate of its primary related postures.

Table 5 shows the recognition results of single-state scenarios. We have found that the system has lower recognition rates for "lying on bed" and "going to toilet". This is because the foreground objects in these two scenarios are too small; we expect that a much higher recognition rate for the activity "lying" would be achieved if the foreground object were larger. Also, the toilet is positioned at the end of the hallway, which is usually exposed to strong sunlight during the afternoon, causing strong bleaching in the video images and degrading our recognition rate. It is also worth mentioning that, because of their obvious motions, the scenarios "walking to bed" and "leaving bed" have higher recognition rates.

Table 6 lists the recognition results of multi-state scenarios. From Table 6 we find a lower recognition rate for "sitting on a chair and watching TV" and higher recognition rates for "going to sleep" and "leaving bed to toilet and coming back to sleep on bed". In both Table 5 and Table 6, the low recognition rates are usually associated with videos taken from the camera whose direction of view lies along the hallway. We think that this long extension of the viewing direction causes drastic differences in object size; particularly when the object is at the far end, foreground object detection becomes much more challenging. We also found that the recognition rates for the cameras in the bedrooms are usually much higher than those in the hallways. The reason may again be that the bedroom has a shorter depth of view than the hallway, giving clearer and more stable object features.

6. Conclusions

In view of the increasing necessity of elderly care, a scenario-based human behavior understanding from video streams for a nursing center has been developed. In the developed approach, a behavior scenario is established by a multi-state scenario with its state transitions triggered by activity-context inferred single-state scenarios. In this design, the activities, contextual information, and profile knowledge are integrated in the scenario reasoning. For obtaining accurate activity recognition, a Negation-Selection mechanism which adopts a divide-and-conquer idea is also developed.


Table 7
Attributes in the notification policy. N: nurse, D: doctor, F: family member, S: staff, Y: yes, N: no

Behavior                       Notification targets   Response from related people   Response time limitation   Auto-cancel   Layer notification
Faint                          N, D, F, S             Y                              3 seconds                  Y             Y
Go to toilet too long          S                      Y                              1 minute                   Y             Y
Sleep too long                 S, N                   Y                              5 minutes                  Y             N
Walk to unallowable location   S                      Y                              1 minute                   Y             Y
Self-injure                    N, D, F, S             Y                              1 minute                   N             Y

The Negation precludes the negative set from the activities using posture transitions, and the Selection, which follows the Negation, uses a moving history trace for activity recognition. In contrast to existing approaches which conduct only posture and activity recognition, our system embeds activities into a state machine and combines them with contextual information for behavior recognition. Since high-level behaviors exist only under certain contextual environments, this approach provides a higher potential for behavior understanding. Results have indicated the promise of the approach, which can accurately interpret 85% of the regular daily behaviors. The approach is also employed for accident detection, which was found to have 90% accuracy with 0% false alarms.

The developed approach has been implemented in a monitoring system to understand the elderly's daily behaviors in a nursing center. The system includes four modules: the Behavior Interpretation Module, the Behavior Statistics Module, the Announcement Service Module, and the Tele-Consultation Module. With the system, the nursing center can better understand the behaviors of the elderly and provide appropriate healthcare services.

Acknowledgements

This work was supported by National Science Council Taiwan, under Grant NSC 93-2213-E-006-049.

References

[1] A.F. Bobick and J.W. Davis, The recognition of human movement using temporal templates, IEEE Trans. on PAMI 23(3) (2001).
[2] A.F. Bobick, Movement activity and action: The role of knowledge in the perception of motion, in: Proc. Workshop Knowledge-based Vision in Man and Machine, 1997.
[3] A. Galata, N. Johnson and D. Hogg, Learning variable-length Markov models of behavior, Computer Vision and Image Understanding 81(3) (2001), 398–413.
[4] A. Bobick and J. Davis, An appearance based representation of action, in: Proc. Int'l Conf. Pattern Recognition, vol. 1, 1996.
[5] J.W. Davis, Hierarchical motion history images for recognizing human motion, in: Proc. IEEE Workshop Detection and Recognition of Events in Video, 2001, pp. 39–46.
[6] G. Sudhir, J.C.M. Lee and A.K. Jain, Automatic classification of tennis video for high-level content-based retrieval, in: Proc. IEEE Int'l Workshop Content-Based Access of Image and Video Database, 1998, pp. 81–90.
[7] K. Hara, T. Omori and R. Ueno, Detection of unusual human behavior in intelligent house, in: Proc. IEEE Int'l Workshop Neural Networks for Signal Processing, 2002, pp. 697–706.
[8] H. Fujiyoshi and A.J. Lipton, Real-time human motion analysis by image skeletonization, in: Proc. IEEE Workshop Applications of Computer Vision, 1998, pp. 15–21.
[9] H. Miyamori and S.-I. Iisaku, Video annotation for content-based retrieval using human behavior analysis and domain knowledge, in: Proc. Int'l Conf. Automatic Face and Gesture Recognition, 2000.
[10] I. Haritaoglu, D. Harwood and L.S. Davis, W4: Real-time surveillance of people and their activities, IEEE Trans. on PAMI 22(8) (2000).
[11] J. Ben-Arie, Z. Wang, P. Pandit and S. Rajaram, Human activity recognition using multidimensional indexing, IEEE Trans. on PAMI 24(8) (2002).
[12] M.M. Rahman, K. Nakamura and S. Ishikawa, Recognizing human behavior using universal eigenspace, in: Proc. Int'l Conf. Pattern Recognition, vol. 1, 2002, pp. 295–298.
[13] N. Krahnstover, M. Yeasin and R. Sharma, Towards a unified framework for tracking and analysis of human motion, in: Proc. IEEE Workshop Detection and Recognition of Events in Video, 2001, pp. 47–54.
[14] O. Masoud and N. Papanikolopoulos, Recognizing human activities, in: Proc. IEEE Conf. Advanced Video and Signal Based Surveillance, 2003, pp. 157–162.
[15] R. Polana and R. Nelson, Low level recognition of human motion, in: Proc. IEEE Workshop Motion of Non-Rigid and Articulated Objects, 1994, pp. 77–82.
[16] R. Polana and R. Nelson, Recognizing activities, in: Proc. Int'l Conf. IAPR, vol. 1, 1994, pp. 815–818.
[17] W. Lu and Y.-P. Tan, A vision-based approach to early detection of drowning incidents in swimming pools, IEEE Trans. on Circuits and Systems for Video Technology 14(2) (2004).
[18] S. Kullback, Information Theory and Statistics, Dover, 1968.
[19] C.-S. Lo, C.-W. Yang, P.-C. Chung, Y.-C. Ouyang, S.-K. Lee and P.-S. Liao, A mammography tele-consultation pilot system in Taiwan, Journal of High Speed Networks 9 (2000), 31–46.
[20] C.-C. Lee, P.-C. Chung, D.-R. Duh, Y.S. Han and C.-W. Lin, A practice of a collaborative multipoint medical teleconsultation system on broadband network, Journal of High Speed Networks 13(3) (2004).
