
National Science Council (Executive Yuan) Research Project Final Report

A Study of Intelligent Fall Detection Systems (智慧型跌倒偵測系統之研究)

Project type: Individual project

Project number: NSC 100-2221-E-011-134-

Project period: August 1, 2011 to December 31, 2012
Host institution: Department of Electronic Engineering, National Taiwan University of Science and Technology

Principal investigator: Yie-Tarng Chen (陳郁堂)

Project members (master's students serving as part-time assistants): 劉殷助, 曾柏叡, 鍾紹鵬, 鄭凱文

Report attachment: report on attendance at an international conference and the published paper

Public access: this project involves patents or other intellectual property rights; it may be made available for public inquiry one year after completion

March 26, 2013

Chinese Abstract: This project developed an intelligent fall detection system that monitors elderly people through a camera. Using computer-vision techniques, the frames returned by the camera are processed for feature extraction and analysis, and an alarm is issued promptly once a fall is detected. Unlike previous studies, the system uses a hybrid ("cocktail") algorithm that combines human-skeleton and shape features to raise detection accuracy, so that it can reliably distinguish falls from fall-like behavior. Existing 3-D human-skeleton construction algorithms, however, suffer from excessive computational complexity and cannot meet real-time requirements. To reduce the cost of skeleton construction, we developed a new 2-D skeleton construction algorithm that avoids color information and relies mainly on shape information, so it still operates correctly in dim lighting. We also adopt the Douglas-Peucker algorithm to reduce the number of pixels on the human contour, and we use a sampling technique to lower the complexity of the detection algorithm so that the real-time requirement is met. The automatic fall detection system was implemented with Intel OpenCV, and its performance was evaluated on a variety of fall detection videos. Experimental results show that the proposed fall detection system significantly outperforms existing methods.

Chinese Keywords: fall detection, computer vision, human skeleton construction

English Abstract: This project proposed and implemented a novel real-time human fall detection system. Because the system is based on a combination of skeletal features and various human shapes, it can distinguish between fall-down and fall-like incidents with a high degree of accuracy. Normally, complex 3-D models are used to extract human posture information. However, to reduce the high computational cost of human skeleton extraction, we propose using a 2-D skeleton model instead. The proposed 2-D skeleton model only uses shape analysis to locate body parts without color cues. As a result, the model is effective in low-light conditions. We apply the Douglas-Peucker algorithm to reduce the number of pixels in the human contour, which speeds up the human skeleton extraction process. Furthermore, we acquire an image of the human skeleton every 0.4 seconds instead of in each frame, which enables us to detect changes in the victim's skeleton during the fall.

English Keywords: human fall detection, computer vision, 2-D skeleton model

National Science Council Research Project Report

A Study of Intelligent Fall Detection Systems (智慧型跌倒偵測系統之研究)

The Study of Video-based Human Fall Detection Systems

Project number: NSC 100-2221-E-011-134

Project period: August 1, 2011 to December 31, 2012

Principal investigator: Yie-Tarng Chen, Department of Electronic Engineering, National Taiwan University of Science and Technology

Project members: 鄭凱文, 劉殷助, 曾柏叡, 鍾紹鵬, Department of Electronic Engineering, National Taiwan University of Science and Technology

I. Chinese Abstract

This project developed an intelligent fall detection system that monitors elderly people through a camera. Using computer-vision techniques, the frames returned by the camera are processed for feature extraction and analysis, and an alarm is issued promptly once a fall is detected. Unlike previous studies, the system uses a hybrid ("cocktail") algorithm that combines human-skeleton and shape features to raise detection accuracy, so that it can reliably distinguish falls from fall-like behavior. Existing 3-D human-skeleton construction algorithms, however, suffer from excessive computational complexity and cannot meet real-time requirements. To reduce the cost of skeleton construction, we developed a new 2-D skeleton construction algorithm that avoids color information and relies mainly on shape information, so it still operates correctly in dim lighting. We also adopt the Douglas-Peucker algorithm to reduce the number of pixels on the human contour, and we use a sampling technique to lower the complexity of the detection algorithm so that the real-time requirement is met. The automatic fall detection system was implemented with Intel OpenCV, and its performance was evaluated on a variety of fall detection videos. Experimental results show that the proposed fall detection system significantly outperforms existing methods.

English Abstract

This project proposed and implemented a novel real-time human fall detection system. Because the system is based on a combination of skeletal features and various human shapes, it can distinguish between fall-down and fall-like incidents with a high degree of accuracy. Normally, complex 3-D models are used to extract human posture information. However, to reduce the high computational cost of human skeleton extraction, we propose using a 2-D skeleton model instead. The proposed 2-D skeleton model only uses shape analysis to locate body parts without color cues. As a result, the model is effective in low-light conditions. We apply the Douglas-Peucker algorithm to reduce the number of pixels in the human contour, which speeds up the human skeleton extraction process. Furthermore, we acquire an image of the human skeleton every 0.4 seconds instead of in each frame, which enables us to detect changes in the victim's skeleton during the fall.

1. Introduction

The global population has been aging rapidly in recent years. In Taiwan, the senior citizen population exceeded 10 percent of the total population in 2007, and it is expected to reach 3.68 million in 2020. Hence, an automatic human fall detection system for senior citizens has become an important issue for smart homes.

2. Research Objectives

We address this problem using a video-based fall incident detection system for senior citizens. For these people, a fall-down incident normally occurs suddenly, within approximately 0.45 to 0.85 seconds. When that happens, both the posture and shape of the victim change rapidly, as shown in Fig. 1. The victim then lies on the floor and becomes inactive. Hence, drastic changes in the human posture and shape are important features for human fall detection. However, modeling human postures with low computational complexity is a challenging issue. In previous research, simple features derived from shape analysis were used for fall detection instead of the human skeleton, a popular approach in human behavior analysis; the high computational cost of human skeleton extraction deters its use in real-time human fall detection. No single approach based on simple features can detect all kinds of human falls. Most existing video-based fall detection systems based on simple features suffer from high false alarm rates because they fail to differentiate "fall-like" activities from "fall-down" ones. To provide a reliable fall detection system, an ingenious combination of several approaches is necessary. Hence, the intelligent combination of different fall detection approaches, which can increase detection accuracy while still satisfying the real-time constraint, has become a major research issue in human fall detection.

3. Literature Review

Many efforts have been made concerning fall detection. Wearable sensor-based devices (Zhang et al, 2006) such as accelerometers (Lindemann et al, 2005) have been used to detect abnormal acceleration. However, this approach becomes useless if senior citizens do not wear the sensor device or the batteries in the sensors are dead. Several video-based fall detection technologies have been developed. For instance, the aspect ratio of the bounding box (William et al, 2007)(Töreyin et al, 2005)(Vishwakarma et al, 2007), the ratio between its x-axis and y-axis extents, is used to detect a fall: when a fall-down incident occurs, the aspect ratio of the bounding box changes. The fall angle, the angle between the ground and the person, is another popular approach (Vishwakarma et al, 2007); when this angle is less than 45 degrees, a fall alarm is triggered. However, this approach may fail if the person falls toward the camera. The centroid of the person is also used to detect a fall-down incident, since the centroid changes significantly and rapidly during a fall. In light of this, a vertical projection histogram V(x), defined as

    H(x, y) = 1 if (x, y) is a pixel within the human object, and 0 otherwise    (1)

    V(x) = \sum_{y} H(x, y)    (2)

has been suggested in (Lin et al, 2007) to detect a fall. When a fall occurs, the vertical projection histogram changes significantly. Horizontal and vertical gradients (Vishwakarma et al, 2007) are yet another technique to detect a fall: when a fall-down incident occurs, the vertical gradient is less than the horizontal gradient.

Several Hidden Markov Model (HMM) approaches have been proposed for fall detection (Töreyin et al, 2005). In (Töreyin et al, 2005), three-state HMMs are used to classify fall events; the feature parameters of the HMMs are extracted from temporal wavelet signals describing the bounding box of the moving object. Anderson also used multiple features extracted from the silhouette, such as the magnitude of the motion vector, the determinant of the covariance matrix, and the ratio of width to height of the bounding box, to train HMMs that verify walking, kneeling, getting up, and falling.

Hsieh et al. (Hsu et al, 2008) proposed a triangulation-based skeleton extraction approach to analyze human movements. However, that system is not specifically designed for the detection of falling-down incidents. Rougier et al. (Rougier et al, 2007) proposed a fall detection approach based on the Motion History Image (MHI) (Bobick et al, 2001) and changes in human shape.

Inspired by efficient human skeleton extraction (Hsu et al, 2008) and shape analysis in fall detection (Rougier et al, 2007), we propose a novel video-based human fall detection system that combines posture estimation and shape analysis. We attempt to use skeletal information to differentiate "fall-down" from "fall-like" activities. To alleviate the high computational cost of human skeleton extraction, we extract the human posture based on 2-D skeleton models instead of complex 3-D ones, despite the fact that 3-D human models provide more information than 2-D ones. We then apply the Douglas-Peucker algorithm (Douglas et al, 1973) to reduce the number of pixels in the human contour, which effectively speeds up human skeleton extraction, since the computational cost of the 2-D extraction is proportional to the number of pixels in the human contour. Simultaneously, we acquire the human skeleton every 0.4 seconds instead of in each frame; consequently, we detect the change between skeletons rather than performing skeleton matching, because the objective of the human fall detection system is to detect a fall-down incident rather than a fall-down posture. In summary, the proposed fall detection scheme consists of posture change detection, shape change detection, and inactivity detection. We use the human skeleton to detect a large posture change. We then use an ellipse to fit the human body and use the orientation and the ratio of height and width of the ellipse to analyze the change in human shape. Finally, we confirm the human fall incident by sensing the immobility of the person for a period of time. The major contribution of this paper is to propose a novel real-time fall detection approach for elderly people, an intelligent combination of three fall detection approaches that increases detection accuracy while still satisfying the real-time constraint.

4. Research Methods and Results

Fall-down Detection Scheme

Foreground object segmentation and object tracking with occlusion handling are important steps before human fall detection, as shown in Fig. 2. Lighting conditions and the shadows of moving objects can seriously affect the accuracy of foreground object segmentation. Moreover, viewing an object from a different visual angle can produce a different shape, and the same scene may include several objects in an occlusion situation where a far object is covered by a near object. For foreground object segmentation, we use the Bayesian approach (Huang et al, 2003) instead of the popular background subtraction approach, because the shadows of moving objects can seriously affect object segmentation in background subtraction. A comparison of the background subtraction approach and the Bayesian approach under the shadow of a foreground object is shown in Fig. 3. However, the details of foreground object segmentation and object tracking are beyond the scope of this paper.

The proposed human fall detection system, shown in Fig. 4, integrates posture analysis, shape analysis and inactivity detection. We assume that a person is immobile on the floor after a fall and all video sequences are captured from a stationary camera.

The human skeleton is first extracted through a depth-first search of the Delaunay meshes. A distance between two sampled skeletons that exceeds a threshold indicates a posture change. We use an ellipse to approximate the human shape, and we use the orientation of the ellipse and the ratio of its major and minor semi-axes to detect a change in the human shape. We then confirm a human fall incident by monitoring the inactivity of the person for a period of time.

1. Posture Analysis with Human Skeleton

We use the Douglas-Peucker algorithm (Douglas et al, 1973) to approximate the contour of foreground objects with fewer vertices. We next perform a constrained Delaunay triangulation to partition a foreground object blob into triangular meshes, and we extract the human skeleton through a depth-first search on the centers of the triangular meshes. Finally, we calculate a distance map of the human skeleton and detect a posture change by comparing the distance maps of two human skeletons every 0.4 seconds. The human skeleton extraction procedure is illustrated in Fig. 5.

(i). Douglas-Peucker Algorithm

The Douglas-Peucker algorithm approximates a curve by recursively dividing the poly-line. In the beginning, we connect the two endpoints of a poly-line as the initial approximation and calculate the perpendicular distance of each intermediate point to the approximation line. If every distance is less than a predefined threshold ε, the approximating straight line is an acceptable solution: the endpoints are kept and the other intermediate points are discarded. Otherwise, if any perpendicular distance is larger than the threshold ε, the approximation is unacceptable; we then choose the point with the largest perpendicular distance as a new vertex and subdivide the original poly-line into two shorter poly-lines. This process continues until the approximation is acceptable. An example of the Douglas-Peucker algorithm is presented in Fig. 6.
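Since the system is built on Intel OpenCV, this step can be sketched with cv2.approxPolyDP, which implements the Douglas-Peucker algorithm. The helper below is our own illustration under stated assumptions (OpenCV 4's two-value findContours return; an arbitrary threshold value), not the project's code.

```python
import cv2
import numpy as np

def simplify_contour(mask: np.ndarray, epsilon: float = 3.0) -> np.ndarray:
    """Return the largest foreground contour approximated with fewer vertices."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    # approxPolyDP implements Douglas-Peucker; epsilon plays the role of the
    # perpendicular-distance threshold described above.
    return cv2.approxPolyDP(contour, epsilon, True)
```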

(ii). Delaunay Triangulation

Definition: Let S be a set of points in the plane. A triangulation T is a Delaunay triangulation of S if, for each edge e of T, there exists a circle C with the following properties: (1) the endpoints of edge e are on the boundary of C, and (2) no other vertex of S is in the interior of C.

Definition: Let G be a straight-line planar graph. A triangulation T is a constrained Delaunay triangulation of G if each edge of G is an edge of T and, for each remaining edge e of T, there exists a circle C with the following properties: (1) the endpoints of edge e are on the boundary of C, and (2) if any vertex v of G is in the interior of C, then it cannot be "seen" from at least one of the endpoints of e.

After simplifying the human contour, we use Delaunay triangulation, which is well studied in computational geometry, to partition foreground objects into triangular meshes. Fig. 7 illustrates the Delaunay triangulation. We can obtain the Delaunay meshes of a human object as follows.

(1) Select boundary nodes of Delaunay meshes:

A polygon is used to approximate the boundary of a human object. The polygon vertices are the boundary nodes of the Delaunay meshes. We use a simple heuristic to select boundary points with high curvatures as the boundary nodes of the Delaunay meshes.

(2) Choose the interior nodes:

The edge points or corners within the object's boundary are chosen as the interior nodes of the Delaunay meshes.

(3) Perform Delaunay triangulation:

A constrained Delaunay triangulation is performed on the boundary nodes and on the interior nodes. With the bounding polygon as a constraint, the triangulation uses line segments to connect consecutive boundary nodes as edges and form triangles only within the boundary.

The details of the Delaunay triangulation algorithm can be found in (Chew, 1987).
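As a purely illustrative sketch of this step, the snippet below computes an unconstrained Delaunay triangulation of the simplified contour vertices with SciPy and then keeps only the triangles whose centroids fall inside the contour polygon. This is a simplified stand-in for the constrained Delaunay triangulation of (Chew, 1987) used in the report, not the report's algorithm; SciPy and Matplotlib are assumed to be available.

```python
import numpy as np
from scipy.spatial import Delaunay
from matplotlib.path import Path

def triangulate_contour(vertices: np.ndarray) -> np.ndarray:
    """vertices: (N, 2) simplified contour points, ordered along the boundary.
    Returns (M, 3) triangle vertex indices whose centroids lie inside the polygon."""
    tri = Delaunay(vertices)                              # unconstrained triangulation
    centroids = vertices[tri.simplices].mean(axis=1)
    inside = Path(vertices).contains_points(centroids)    # crude substitute for the
    return tri.simplices[inside]                          # boundary constraint
```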

(iii). Human Skeleton Extraction

We first connect all triangular meshes and obtain the centroid of each mesh. We can then find a spanning tree that serves as the human skeleton via a depth-first search from the root node of the triangular meshes (Hsu et al, 2008). The details of the human skeleton extraction algorithm are as follows (an illustrative sketch is given after the steps).

Human Skeleton Extraction Algorithm

(1) Calculate the centroid C of the human posture.

(2) Construct a graph G by connecting the centers of any two connected triangular meshes, i.e., meshes that share a common edge.

(3) Among all nodes in G, find the node with the largest y coordinate and degree 1 as the root node R.

(4) Perform a depth-first search on G and obtain a spanning tree F.

(5) Find all leaf nodes L and branch nodes B of the spanning tree F.

(6) Extract the skeleton S by connecting any two nodes in {R, L, B, C} if there exists a path between these two nodes.
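The following minimal sketch reflects our reading of steps (1)-(5): it builds the centroid graph of the triangular meshes and extracts a skeleton tree by depth-first search. The pruning of step (6) (keeping only root, leaf, branch, and centroid nodes) is omitted for brevity, and all names are our own, not the original implementation.

```python
import numpy as np
from collections import defaultdict

def skeleton_from_triangles(vertices: np.ndarray, triangles: np.ndarray):
    """vertices: (N, 2) points; triangles: (M, 3) vertex indices of the Delaunay meshes."""
    centroids = vertices[triangles].mean(axis=1)          # one graph node per triangle
    # Step (2): connect triangles that share a common edge.
    edge_owner, graph = {}, defaultdict(set)
    for t, tri in enumerate(triangles):
        for e in [(tri[0], tri[1]), (tri[1], tri[2]), (tri[0], tri[2])]:
            e = tuple(sorted(e))
            if e in edge_owner:
                graph[t].add(edge_owner[e]); graph[edge_owner[e]].add(t)
            else:
                edge_owner[e] = t
    # Step (3): root = degree-1 node with the largest y coordinate.
    leaves = [n for n in graph if len(graph[n]) == 1]
    root = max(leaves, key=lambda n: centroids[n][1])
    # Steps (4)-(5): depth-first search yields the spanning tree (parent links).
    parent, stack, seen = {root: None}, [root], {root}
    while stack:
        n = stack.pop()
        for m in graph[n]:
            if m not in seen:
                seen.add(m); parent[m] = n; stack.append(m)
    return centroids, parent      # skeleton edges: (m, parent[m]) for every m != root
```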

(iv). Distance Map of a Human Skeleton

The distance map, also known as the distance transform (Borgefors, 1986), is employed to compute the distance between two sampled skeletons. In the distance transform of a binary image, the value of each pixel is the shortest distance to any pixel of the foreground object. The distance map of the binary image S_1, denoted as DM_{S_1}, can be represented as follows:

    DM_{S_1}(p) = \min_{q \in S_1} Dist(p, q)    (3)

where Dist(p, q) is the Euclidean distance between pixel p and pixel q. Before calculating the distance between two skeletons, we must normalize them to the same size. Consequently, the distance between the two skeletons S_1 and S_2, denoted as Dist(S_1, S_2), can be calculated as follows:

    Dist(S_1, S_2) = \frac{1}{|DM|} \sum_{p} \left| DM_{S_1}(p) - DM_{S_2}(p) \right|    (4)

where |DM| represents the image size of the distance map. Fig. 8 illustrates the distance map, and Fig. 9 illustrates the Delaunay triangulation, the human skeleton extraction, and the distance map of a skeleton.
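A minimal sketch of Eqs. (3)-(4) with OpenCV follows; the helper names and the rasterized-skeleton layout are our assumptions, not the project's code. Note that cv2.distanceTransform measures the distance to the nearest zero pixel, so the skeleton mask is inverted before the call.

```python
import cv2
import numpy as np

def skeleton_distance_map(skel_mask: np.ndarray) -> np.ndarray:
    """skel_mask: uint8 image, 255 on skeleton pixels, 0 elsewhere.
    Returns DM(p) = min over skeleton pixels q of Dist(p, q), Eq. (3)."""
    return cv2.distanceTransform(255 - skel_mask, cv2.DIST_L2, 3)

def skeleton_distance(skel1: np.ndarray, skel2: np.ndarray) -> float:
    """Both skeleton images are assumed already normalized to the same size."""
    dm1, dm2 = skeleton_distance_map(skel1), skeleton_distance_map(skel2)
    return float(np.abs(dm1 - dm2).sum() / dm1.size)      # Eq. (4)
```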


2. Human Shape Analysis

We use an ellipse to approximate the human shape instead of a bounding box. An ellipse can provide more precise shape information than a bounding box, especially when a person carries an object. A comparison between an ellipse and a bounding box in shape analysis is shown in Fig. 10.

(i). Ellipse Fitting

We can obtain the contour of a human and approximate the human shape with an ellipse using moments (Pratt, 2007). The ellipse is defined by its center (x̄, ȳ), its orientation θ, and the lengths I_a and I_b of its major and minor semi-axes.

For a gray-level image f(x, y), the moments are given by

    m_{pq} = \int\int x^p y^q f(x, y) \, dx \, dy, \quad p, q = 0, 1, 2, \ldots    (5)

The center of gravity is obtained by computing the coordinates of the center of mass from the zero- and first-order spatial moments: x̄ = m_{10}/m_{00}, ȳ = m_{01}/m_{00}. The center (x̄, ȳ) is used to compute the central moments as follows:

    \mu_{pq} = \int\int (x - \bar{x})^p (y - \bar{y})^q f(x, y) \, dx \, dy    (6)

The orientation of the ellipse is given by the tilt angle between the major axis of the person and the x-axis, and it can be computed from the second-order central moments:

    \theta = \frac{1}{2} \arctan\left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right)    (7)

We then compute the major and the minor semi-axes of the ellipse, which correspond, respectively, to the maximum and minimum eigenvalues of the covariance matrix

    J = \begin{pmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{pmatrix}    (8)

The minimum eigenvalue I_min and the maximum eigenvalue I_max are given, respectively, by

    I_{min} = \frac{\mu_{20} + \mu_{02} - \sqrt{4\mu_{11}^2 + (\mu_{20} - \mu_{02})^2}}{2}    (9)

    I_{max} = \frac{\mu_{20} + \mu_{02} + \sqrt{4\mu_{11}^2 + (\mu_{20} - \mu_{02})^2}}{2}    (10)

The major and the minor semi-axes of the ellipse are then given, respectively, by

    I_a = \left(\frac{4}{\pi}\right)^{1/4} \left(\frac{I_{max}^3}{I_{min}}\right)^{1/8}    (11)

    I_b = \left(\frac{4}{\pi}\right)^{1/4} \left(\frac{I_{min}^3}{I_{max}}\right)^{1/8}    (12)

Finally, the ratio of the major and the minor semi-axes of the ellipse, ρ, can be defined as follows:

    \rho = \frac{I_a}{I_b}    (13)

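A minimal sketch of Eqs. (5)-(13) using OpenCV image moments is given below. The helper name is ours; arctan2 is used instead of arctan to handle the degenerate case μ20 = μ02, and the ratio I_a/I_b from Eqs. (11)-(13) is computed through its algebraic simplification sqrt(I_max/I_min).

```python
import cv2
import numpy as np

def ellipse_features(mask: np.ndarray):
    """Return (theta, rho) of the ellipse fitted to a binary foreground mask."""
    m = cv2.moments((mask > 0).astype(np.uint8), True)     # True: treat as binary image
    mu20, mu02, mu11 = m['mu20'], m['mu02'], m['mu11']
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)            # Eq. (7)
    root = np.sqrt(4.0 * mu11 ** 2 + (mu20 - mu02) ** 2)
    i_max = (mu20 + mu02 + root) / 2.0                           # Eq. (10)
    i_min = (mu20 + mu02 - root) / 2.0                           # Eq. (9)
    # Ia / Ib from Eqs. (11)-(13) simplifies to sqrt(Imax / Imin).
    rho = float(np.sqrt(i_max / i_min)) if i_min > 0 else float('inf')
    return float(theta), rho
```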

(ii). Ellipse Features for Fall-down Detection

Two features, derived from the orientation of the ellipse, θ, and the ratio of its major and minor semi-axes, ρ, are used to discriminate a falling incident from normal daily activities. To avoid the influence of the size of foreground objects on the feature thresholds, we use the change rate of the ellipse features within a sliding window instead of a fixed threshold.

The change rate of the orientation of the ellipse, R_θ, and of the ratio of the major and minor ellipse semi-axes, R_ρ, are given in Eqs. (14) and (15), respectively:

    R_\theta = \frac{\left| \theta(n) - \theta(SW) \right|}{\theta(SW)}    (14)

where θ(n) represents the orientation of the object's ellipse in the n-th frame and θ(SW) represents the average orientation in the sliding window;

    R_\rho = \frac{\left| \rho(n) - \rho(SW) \right|}{\rho(SW)}    (15)

where ρ(n) represents the proportion (axis ratio) of the object's ellipse in the n-th frame and ρ(SW) represents the average proportion of the object's ellipse in the sliding window.

If a person falls perpendicular to the camera's optical axis, the change in the orientation of the ellipse is extremely large, so the change rate of the orientation, R_θ, becomes relatively high; if no fall-down incident occurs, R_θ stays relatively low. On the other hand, if a person falls in line with the camera's optical axis, the change rate of the proportion of the ellipse's length and width, R_ρ, becomes relatively high; if it is not a fall-down incident, R_ρ remains relatively low. To conform with the definition of a fall-down incident, we set the period of the sliding window (SW) to between 0.4 and 0.8 seconds in the calculation of R_θ and R_ρ.
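The sketch below shows one possible formulation of Eqs. (14)-(15). The class structure, the window length in frames, and the use of the absolute relative difference are our assumptions rather than the project's exact implementation.

```python
from collections import deque

class EllipseChangeRate:
    """Track R_theta and R_rho of Eqs. (14)-(15) over a sliding window."""
    def __init__(self, window_frames: int = 24):      # ~0.8 s at 30 fps (assumed)
        self.thetas = deque(maxlen=window_frames)
        self.rhos = deque(maxlen=window_frames)

    def update(self, theta: float, rho: float):
        r_theta = r_rho = 0.0
        if self.thetas:
            mean_t = sum(self.thetas) / len(self.thetas)
            mean_r = sum(self.rhos) / len(self.rhos)
            r_theta = abs(theta - mean_t) / abs(mean_t) if mean_t else 0.0   # Eq. (14)
            r_rho = abs(rho - mean_r) / mean_r if mean_r else 0.0            # Eq. (15)
        self.thetas.append(theta)
        self.rhos.append(rho)
        return r_theta, r_rho
```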

3. Fall-Down Confirmation

We check the following two conditions to confirm a fall-down:

(1) The motion of the human object is smaller than a threshold for a period of time.

(2) Both R_θ in Eq. (14) and R_ρ in Eq. (15) are smaller than their thresholds for a period of time.

The first condition assures that the person is inactive for a period of time after a possible fall, whereas the second condition guarantees that both the change rate of the orientation of the ellipse that approximates the human shape and the change rate of the ratio of its major and minor semi-axes approach zero within a fixed time interval, i.e., the human shape becomes immobile after a possible fall.
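A minimal sketch of this two-condition confirmation is shown below. The R_θ and R_ρ thresholds and the 3-second unmoving time follow Table 3; the motion threshold value is an assumption, since the report does not list it.

```python
def confirm_fall(motion, r_theta, r_rho, fps=30,
                 unmoving_time=3.0, motion_th=0.02, r_theta_th=0.28, r_rho_th=0.3):
    """motion, r_theta, r_rho: per-frame sequences observed after a candidate fall.
    Returns True only if all three stay below their thresholds for the whole window."""
    need = int(unmoving_time * fps)
    recent = list(zip(motion, r_theta, r_rho))[-need:]
    if len(recent) < need:
        return False          # not enough evidence of inactivity yet
    return all(m < motion_th and rt < r_theta_th and rr < r_rho_th
               for m, rt, rr in recent)
```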

EXPERIMENTAL RESULTS

We implemented the proposed skeleton-based fall detection system with Intel's OpenCV library (Bradski et al, 2008) and decoded the compressed video by means of the FFmpeg library. All test videos were acquired from a single stationary, un-calibrated camera in the MPEG-1 format with a resolution of 320×240 pixels at 30 frames per second. Human activities in the test videos include fall incidents such as forward falls, backward falls, and sideways falls, and daily activities such as walking, running, and squatting, as shown in Fig. 11. Our experiments were run on a computer with Windows XP, an Intel Pentium D 3.2 GHz CPU, and 2 GB of RAM.

Table 1 The confusion matrix

                     Detected falling       Detected normal
  Actual falling     True Positive (TP)     False Negative (FN)
  Actual normal      False Positive (FP)    True Negative (TN)

Performance metrics in fall detection experiments can be presented in a confusion matrix, as in Table 1, where true positive (TP) and true negative (TN) are the counts of correct detection while false positive (FP) and false negative (FN) are the counts of incorrect prediction. The detection rate, the fraction of human fall events that are correctly detected, is defined by

    detection rate = \frac{TP}{TP + FN}

The false alarm rate, the fraction of non-fall events that are incorrectly predicted, can be defined by

    false alarm rate = \frac{FP}{TN + FP}

1. EXPERIMENTAL RESULTS WITH DIFFERENT THRESHOLDS

In the proposed fall detection system, thresholds such as those for skeleton change detection, ellipse orientation change detection, ellipse aspect ratio change detection, and inactivity detection can influence the detection rate and the false alarm rate. To tune the performance of the proposed fall detection system, we conducted a series of experiments under different skeleton distance thresholds, as shown in Table 2. As a trade-off between the detection rate and the false alarm rate, a skeleton distance of 0.056 tunes the system to a detection rate of 90.91% and a false alarm rate of 6.25%. Table 3 shows the major system thresholds used for the proposed fall detection system in our experiments.
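For reference, the two performance metrics defined above can be computed directly from the confusion-matrix counts of Table 1; the following minimal sketch (function name is ours) reproduces, for example, the Table 2 figures at a skeleton distance of 0.056.

```python
def fall_metrics(tp: int, fn: int, fp: int, tn: int):
    """Detection rate = TP/(TP+FN); false alarm rate = FP/(TN+FP)."""
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (tn + fp) if (tn + fp) else 0.0
    return detection_rate, false_alarm_rate

# Example from Table 2 (skeleton distance = 0.056): TP=20, FN=2, FP=2, TN=30
# gives a 90.91% detection rate and a 6.25% false alarm rate.
print(fall_metrics(20, 2, 2, 30))
```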

Table 2 Experimental results of the proposed fall detection with different skeleton distance thresholds

  Skeleton distance    0.054    0.055    0.056    0.057    0.058    0.059    0.060
  TP                      22       20       20       20       19       18       17
  FN                       0        2        2        2        2        4        5
  TN                      27       30       30       30       30       31       31
  FP                       5        2        2        2        2        1        1
  Detection rate        100%   90.91%   90.91%   90.91%   90.91%   81.82%   77.27%
  False alarm rate    15.63%    6.25%    6.25%    6.25%    6.25%    3.12%    3.12%

Table 3 System thresholds in the fall detection experiments

  Parameter                               Threshold
  Change ratio of orientation, R_θ        0.28
  Change ratio of proportion, R_ρ         0.3
  Skeleton distance, Dist(S_1, S_2)       0.056
  Unmoving time                           3 sec

2. PERFORMANCE COMPARISON BETWEEN DIFFERENT APPROACHES

In our experiments, we compared the proposed fall detection system with three different fall detection approaches:

  • the skeleton match (Hsu et al, 2008),

  • the posture analysis, and

  • the shape analysis (Rougier et al, 2007).

In the posture analysis, the distance map of two sampled human skeletons is calculated every 0.4 seconds, and a human fall is detected if this distance is larger than a threshold, i.e., only a drastic change in the human posture, as discussed in Section II.1, is accepted as indicating a human fall. On the other hand, the shape analysis only uses the change rates of the ellipse angle and of the ratio between the major and minor semi-axes of the ellipse to detect a human fall, i.e., only a drastic change in the human shape, as discussed in Section II.2, is accepted as indicating a human fall.

Tables 4 and 5 show the experimental results for the human skeleton match and the posture analysis, respectively. The experimental results for the shape analysis, utilizing the change ratios of the two ellipse features, are tabulated in Table 6. From Tables 4 to 6, we can observe that fall detection systems based on a single approach yield a high false positive rate and low detection accuracy because they cannot differentiate a sit-down from a fall-down.

Table 4 The experimental results of the skeleton match

  Event      No. of videos   Detected falling   Detected non-falling   TP   TN   FP   FN
  Falling         22                16                   6             16    0    0    6
  Sit-down         8                 1                   7              0    7    1    0
  Squat            8                 0                   8              0    8    0    0
  Walking          8                 0                   8              0    8    0    0
  Running          8                 0                   8              0    8    0    0

Table 5 The experimental results of the posture analysis

  Event      No. of videos   Detected falling   Detected non-falling   TP   TN   FP   FN
  Falling         22                22                   0             22    0    0    0
  Sit-down         8                 8                   0              0    0    8    0
  Squat            8                 0                   8              0    8    0    0
  Walking          8                 3                   5              0    5    3    0
  Running          8                 0                   8              0    8    0    0

Table 6 The experimental results of the shape analysis

  Event      No. of videos   Detected falling   Detected non-falling   TP   TN   FP   FN
  Falling         22                20                   2             20    0    0    2
  Sit-down         8                 5                   3              0    3    5    0
  Squat            8                 0                   8              0    8    0    0
  Walking          8                 0                   8              0    8    0    0
  Running          8                 0                   8              0    8    0    0

Table 7 The experimental results of the proposed fall detection with the skeleton distance = 0.056

  Event      No. of videos   Detected falling   Detected non-falling   TP   TN   FP   FN
  Falling         22                20                   2             20    0    0    2
  Sit-down         8                 2                   6              0    6    2    0
  Squat            8                 0                   8              0    8    0    0
  Walking          8                 0                   8              0    8    0    0
  Running          8                 0                   8              0    8    0    0

Table 8 Comparison of the four fall detection schemes

  Fall detection approach    Detection rate   False alarm rate
  The skeleton match             0.727             0.031
  The posture analysis           1                 0.34
  The shape analysis             0.909             0.16
  The proposed scheme            0.909             0.06

Table 9 Comparison of the proposed scheme and the shape analysis in terms of the execution time, detection rate, and false alarm rate

  Performance metric    The proposed detection scheme    The shape analysis
  Detection rate                  90.9%                        90.9%
  False alarm rate                6.25%                        15.6%
  Execution time                  4.21 sec                     1.36 sec

The experimental results for the proposed fall detection system are shown in Table 7. Two fall-down incidents were not detected because they were slow falls, which did not register as fast posture changes in the first step of the proposed approach. On the other hand, two sit-down activities were flagged as fall incidents because fast sit-down activities trigger the posture-change step of the proposed approach. Table 8 summarizes the performance of the four human fall detection approaches in terms of the detection rate and the false alarm rate. These results demonstrate that an intelligent combination of different fall detection approaches can provide reliable fall detection. Table 9 compares the proposed hybrid fall detection approach with the shape analysis approach (Rougier et al, 2007) in terms of the execution time, the detection rate, and the false alarm rate. We can observe from Tables 8 and 9 that our approach achieves high detection accuracy and a lower false alarm rate than the other systems within a reasonable execution time.

CONCLUSIONS

Since the global population is aging rapidly, fall detection for elderly people has become an important issue in smart homes. The major contribution of this paper is a novel real-time fall detection approach for elderly people, an intelligent combination of the skeleton-change and shape-change detection schemes that still satisfies the real-time constraint. A human skeleton is first extracted from the human posture, and a distance between two sampled skeletons that exceeds a threshold flags a posture change. We then use an ellipse to approximate the human shape; the orientation and the ratio of the major and minor semi-axes of the ellipse are used to detect a change in the human shape. Finally, we confirm a human fall incident by monitoring the inactivity of the person for a period of time. Experimental results indicate that the proposed hybrid human fall detection system can achieve a high detection rate and a low false alarm rate with reasonable computational cost.

REFERENCES

Bobick, A. and Davis, J., 2001, "The Recognition of Human Movement Using Temporal Templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 3, pp. 257-267.

Borgefors, G., 1986, "Distance Transformations in Digital Images," Computer Vision, Graphics, and Image Processing, Vol. 34, No. 3, pp. 344-371.

Bradski, G. R. and Kaehler, A., 2008, Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, Inc., Sebastopol, CA, USA.

Chew, L. P., 1987, "Constrained Delaunay Triangulations," Proceedings of the Third Annual ACM Symposium on Computational Geometry, pp. 215-222, Waterloo, Ontario, Canada.

Douglas, D. and Peucker, T., 1973, "Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature," The Canadian Cartographer, Vol. 10, No. 2, pp. 112-122.

Hsu, Y. T., Liao, H. Y. M., Chen, C. C., and Hsieh, J. W., 2008, "Video-based Human Movement Analysis and Its Application to Surveillance Systems," IEEE Transactions on Multimedia, Vol. 10, No. 3, pp. 372-392.

Huang, W., Gu, I. Y. H., Li, Q., and Tian, L., 2003, "Foreground Object Detection from Videos Containing Complex Background," Proceedings of the Eleventh ACM International Conference on Multimedia, pp. 2-10, Berkeley, CA, USA.

Lin, C.-W., Ling, Z.-H., Cheng-Yeng, and Kuo, C. J., 2007, "Compressed-domain Fall Incident Detection for Intelligent Homecare," Journal of VLSI Signal Processing Systems, Vol. 49, No. 3, pp. 393-408.

Lindemann, U., Hock, A., and Stuber, M., 2005, "Evaluation of a Fall Detector Based on Accelerometers: A Pilot Study," Medical and Biological Engineering and Computing, Vol. 43, No. 5, pp. 548-551.

Pratt, W., 2007, Digital Image Processing, 4th ed., John Wiley and Sons, Ltd., New York, USA.

Rougier, C., Meunier, J., St-Arnaud, A., and Rousseau, J., 2007, "Fall Detection from Human Shape and Motion History Using Video Surveillance," Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops, pp. 875-880, Niagara Falls, Canada.

Rougier, C., Meunier, J., St-Arnaud, A., and Rousseau, J., 2008, "Procrustes Shape Analysis for Fall Detection," The Eighth International Workshop on Visual Surveillance, pp. 50-62, Marseille, France.

Töreyin, B. U., Dedeoglu, Y., and Cetin, A. E., 2005, "HMM-based Falling Person Detection Using Both Audio and Video," Lecture Notes in Computer Science, Vol. 3766, pp. 211-220, Springer-Verlag, Berlin.

Vishwakarma, V., Mandal, C., and Sural, S., 2007, "Automatic Detection of Human Fall in Video," International Conference on Pattern Recognition and Machine Intelligence, Lecture Notes in Computer Science, pp. 612-613, Kolkata, India.

William, A., Ganesan, D., and Hanson, A., 2007, "Aging in Place: Fall Detection and Localization in a Distributed Smart Camera Network," Proceedings of the 15th International Conference on Multimedia, pp. 892-901, Augsburg, Germany.

Zhang, T., Wang, J., and Xu, L., 2006, "Using Wearable Sensor and NMF Algorithm to Realize Ambulatory Fall Detection," International Conference on Advances in Natural Computation, Lecture Notes in Computer Science, pp. 488-491, Xi'an, China.


Fig. 1 The features of the fall-down incident

Fig. 2 The flowchart of the proposed fall detection system (input video → foreground object segmentation → object tracking → fall detection → fall alarm generation)

Fig. 3 The comparison of the background subtraction and the Bayesian approach under the shadow of a foreground object

Fig. 4 The flowchart of the fall-down detection (use the skeleton to detect a posture change → analyze the ratio of the human shape to detect a shape change → confirm the fall based on the immobility of the person → fall-down alarm)

Fig. 5 The flowchart of the human skeleton extraction (object contour → Douglas-Peucker algorithm → Delaunay triangulation → build skeleton tree → distance transform → distance map)

Fig. 6 An example of the Douglas-Peucker algorithm

Fig. 7 The Delaunay triangulation

Fig. 8 An example of the distance map

Fig. 9 An example of the Delaunay triangulation, the human skeleton extraction, and the distance map of a skeleton

Fig. 10 The comparison between an ellipse and a bounding box in the shape analysis

Fig. 11 Parts of the testing videos in experiments: walking, sit-down, and fall

Yie-Tarng Chen, Department of Electronic Engineering, National Taiwan University of Science and Technology

Joint IAPR International Workshops on Structural and Syntactic Pattern Recognition (SSPR 2012) and Statistical Techniques in Pattern Recognition (SPR 2012)

Report on Attendance at an International Conference

1. Conference Participation

The 21st Joint IAPR International Workshops on Structural and Syntactic Pattern Recognition (SSPR 2012) and Statistical Techniques in Pattern Recognition (SPR 2012) were held from November 7 to 9, 2012, at the Aki Grand Hotel in Miyajimaguchi, Hiroshima, Japan. The venue is surrounded by beautiful scenery; across the water lies Miyajima, counted among Japan's three most famous historic sites and a must-see destination for Japanese travelers.

The workshops are a biennial event of the International Association for Pattern Recognition and attracted more than one hundred pattern recognition researchers from 18 countries. A total of 80 papers were presented. This year's Pierre Devijver Award was presented to Professor George Nagy of Rensselaer Polytechnic Institute in the United States, in recognition of his contributions to pattern recognition over four decades, beginning in 1966.

The workshops covered the following topics: graphical models, graph-based representations, kernel methods for structured data, unsupervised learning, clustering analysis, and applications of pattern recognition to text, shape, still images, and video, spanning all areas of pattern recognition.

There were three invited lectures: Pierre Devijver Award winner Professor George Nagy spoke on "Estimation, Learning, Adaptation: Systems that Improve with Use"; Professor Kenichi Kanatani spoke on "Optimization Technologies for Geometric Estimation: Beyond Minimization"; and Professor Ales Leonardis spoke on "Hierarchical Compositional Representations of Object Structure".

In his lecture "Optimization Technologies for Geometric Estimation: Beyond Minimization", Professor Kenichi Kanatani noted that one of the most important tasks in computer vision is computing geometric constraints on 2-D or 3-D objects. A geometric constraint expresses, with a simple equation, that objects lie on a line or on a common plane, are mutually parallel, or are related by the geometric projection that produces the camera image; inference based on such constraints is called geometric estimation. In the presence of noise, however, these assumptions do not necessarily hold. In this talk Professor Kanatani gave a detailed account of the historical development of the field and of important advances in recent years. The content was excellent.

In his lecture "Hierarchical Compositional Representations of Object Structure", Professor Ales Leonardis observed that recognition of visual object categories involves three interrelated issues: (1) an object representation general enough to cover each category; (2) learning from a set of input images while minimizing manual preprocessing such as hand-labeled image annotation; and (3) efficient inference algorithms that can match large numbers of images and objects. The talk centered on the hierarchical compositional learning framework developed by Professor Leonardis, which represents multiple object categories with 2-D shape and uses a coarse-to-fine matching scheme to achieve efficient object detection. The learning framework first extracts fragments of object contours and learns the spatial statistics of these fragments to build the hierarchy. The content was very interesting.

2. Reflections on the Conference

I presented a paper entitled "A Novel Shadow-assistant Human Fall Detection Scheme Using a Cascade of SVM Classifiers". Detecting human fall incidents with computer vision has been an active research topic in recent years. However, most published methods cannot accurately distinguish falls from fall-like actions such as sitting down or squatting. In our paper we proposed a human fall detection method that differs from previous work; its core idea is to use shadows to assist fall detection. Instead of estimating the height of a person in the image with a complex 3-D model, as in earlier work, the paper estimates the person's height from moving-shadow information, which greatly reduces computation time. The system uses the shadow information and the aspect ratio of the human silhouette as features and a support vector machine (SVM) as the classifier, so it can accurately distinguish falls from fall-like actions. The system also detects falls quickly: it can make highly accurate decisions from a very short sequence of 1-10 frames. In experiments with a bird's-eye-view camera setup, the proposed fall detection achieved a 100% detection rate with a false alarm rate below 5.5%, a substantial improvement in accuracy over other fall detection methods.

Several of the presentations at this conference were rewarding; the following left the deepest impression.

Chesner Desir (France), Simon Bernard (Belgium), and colleagues presented "A New Random Forest Method for One-Class Classification". A one-class classifier is a binary classifier in which only one class (the target class) is the main subject of learning. Such classifiers are mainly used when counter-examples (outliers) are difficult to collect, for example in identity verification, handwritten character recognition, and machine or structural monitoring. The paper proposes a new one-class classification algorithm called One-Class Random Forests (OCRF), a one-class method built on the random forest framework. The OCRF algorithm uses two randomization principles, bagging and random feature selection. Its two main steps are: (1) extract prior information from the original feature space of the target data to steer the learning process; and (2) use the random subspace method (RSM) together with bagging and random feature selection in the random forest to reduce dimensionality and create diverse classification trees. In summary, the OCRF method exploits: (1) the combination of diverse weak classifiers, a well-known way to increase classification accuracy; and (2) efficient generation of outliers by controlling their number and location, subsampling the training data in both samples and features.

Gerard Sanroma (Netherlands) and colleagues presented "Recognition of Long Term Behavior by Parsing Sequences of Short Term Actions with a Stochastic Regular Grammar". Recognizing human behavior from video is a popular research topic, but existing studies usually restrict the problem to recognizing actions within a short time span (short-term actions); complex behavior over longer periods has received less attention. This paper proposes recognizing long-term behavior with a stochastic grammar. It assumes that short-term actions have a specific structure and uses that structure to improve their recognition, while complex behavior is modeled with a syntactic approach: any behavior to be recognized in a video can be expressed with grammar rules. The method is particularly suitable for recognizing behavior with little training data, such as threatening behavior. Trajectory information is first extracted for a known set of actions, with the main goal of computing the probability that test data belong to each known action. To further reduce noise, PCA is used to re-project the extracted feature vectors, and KNN is then used to compute the similarity to each class. Short-term actions often follow causal behavior patterns, and most such patterns comprise tens or even hundreds of short-term actions, so the authors treat complex behavior as "grammar rules" over behavior patterns; recognizing a complex behavior then amounts to checking whether it conforms to the grammar.

Adrian Perez (Spain) presented "Online Metric Learning Methods Using Soft Margins and Least Squares Formulations". Organizing, classifying, or representing data sets is of key importance in many application domains, from image analysis to pattern recognition and data mining. Distance-based methods form a well-established class of approaches to classification, regression, estimation, and clustering. The performance of such methods depends on the mechanism used to express the relations between input samples, and this mechanism is closely tied to the object representation. In recent years, distance metric learning (DML) has been a continuously active research area. In most cases, DML aims to learn a suitable distance matrix. Although there are many approaches, the most common is to define a (usually convex) criterion function that expresses the desired objective, essentially pulling similar objects closer while pushing dissimilar objects apart. For large-scale problems, solving the global optimization is very time-consuming, so an effective and efficient learning mechanism is urgently needed; sequential (online) learning algorithms arose in response, optimizing the criterion function over sample pairs obtained sequentially. Unfortunately, many practical and theoretical problems appear. On the one hand, different ways of sequentially enforcing the additional constraints may lead to different solutions requiring different amounts of computation; on the other hand, the performance of the final answer may deviate significantly from the ideal. This paper introduces a family of online metric learning algorithms based on margin maximization and offers a new perspective on current online metric learning algorithms. In addition to describing different online metric learning algorithms with a passive-aggressive mechanism, the paper also formulates them with least-squares formulations. Thorough comparative experiments bring out the advantages and disadvantages of the various online metric learning algorithms.

Robert P. W. Duin (Netherlands) presented "Mode Seeking Clustering by KNN and Mean Shift Evaluated". The mean shift clustering algorithm realizes the idea of quickly finding the modes of a probability density function in a nonparametric way, where the density is estimated with a Parzen kernel; it can be used, for example, to segment images. Although reliable estimation requires many data points, rapid advances in memory capacity and CPUs have made mean shift feasible in many applications. Nevertheless, the gradient-tracking computation of mean shift remains a heavy burden in high-dimensional spaces, so its use has mainly been limited to small-scale applications such as three-channel color images. For clustering, an alternative to estimating the density with a Parzen kernel is the k-nearest-neighbor (kNN) rule, also proposed in 1976 by Fukunaga and his colleagues Koontz and Narendra, although it did not receive much attention. In this talk the author revisited the kNN approach so that it can be applied to a wider range of problems, in both data dimensionality and data volume. Mode-seeking algorithms first estimate the density function of the data set; the modes are the local maxima of the density. In the clustering stage, each data point follows the density gradient to its mode, and the data points that converge to the same mode form a cluster. The kNN mode-seeking algorithm defines the density of each data point in terms of the distance to its k-th neighbor. Each data point is given a pointer to the data point of maximum density in its neighborhood; if a point itself has the maximum density in its neighborhood, the pointer points to the point itself and that point is a mode. All points whose pointers flow to the same mode form one cluster. The biggest difference between the kNN and mean shift clustering algorithms is that mean shift uses a fixed spatial window size, whereas kNN uses a fixed number of neighbors. The paper compares kNN mode seeking with mean shift clustering: for large-scale problems, for instance with dimensionality in the hundreds, kNN performs better than mean shift; for small-scale data, kNN performs worse than mean shift.

Makoto Yamada (NTT, Japan) presented "Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation". Change-point detection aims to discover samples at which a time series changes abruptly, a problem that has attracted researchers in statistics and data mining for decades. Change-point detection can be divided into real-time detection and retrospective detection. Real-time detection requires an immediate response, for example in robot control applications, whereas retrospective detection generally tolerates longer computation; its applications include weather anomaly detection, signal segmentation, and network intrusion systems. The authors address retrospective change-point detection and propose a nonparametric method. They present a new statistical change-point detection algorithm whose main idea is to apply a nonparametric divergence estimator to two time-series segments, one from the past and one from the present, using the relative Pearson divergence as the divergence measure. Nonparametric change-point detection methods often use, for example, kernel density estimation to estimate the probability distributions of the past and present segments; however, because of the curse of dimensionality, this approach tends to lose accuracy on high-dimensional data. To overcome this difficulty, the paper proposes estimating the density ratio directly, without computing the distributions of the past and present segments. The rationale for density-ratio estimation is that knowing the two densities implies knowing their ratio, but not vice versa, so direct density-ratio estimation greatly simplifies the difficulties encountered in density estimation. Many algorithms (for example, logistic-regression-based methods and KLIEP) follow this idea. The contribution of the paper is twofold. The first contribution is to apply a recently proposed density-ratio estimation method called unconstrained least-squares importance fitting (uLSIF), whose notable advantages are that it achieves the optimal nonparametric convergence rate, has the best numerical stability, and is more robust than KLIEP. The second contribution is to extend uLSIF-based change-point detection to relative uLSIF (RuLSIF). The basic idea of RuLSIF is to consider a relative density ratio in order to avoid unbounded density ratios. RuLSIF is shown to have better nonparametric convergence than plain uLSIF, which means that RuLSIF provides a better estimate from a small number of samples. Experiments show that the RuLSIF-based change-point detection method compares favorably with other methods.

Laura Antanas (Belgium) presented "A Relational Kernel-Based Framework for Hierarchical Image Understanding". Low-level image features are not sufficient to represent high-level, complex patterns; on the other hand, describing an image with a hierarchical structure, or with a graph-based representation of its constituent elements, better matches the intuition of human vision. Most current part models in computer vision use a fixed compositional structure or a star-shaped structure. In recent years some work has begun to use higher-order relational representations for image or object recognition, mostly model-based methods driven by image grammar rules. In contrast, the method in this paper starts from manually annotated examples and then applies learning, combined with domain knowledge, to describe qualitatively the components that make up the objects in an image. Relational representations were a common tool in early syntactic and statistical pattern recognition, but they are rarely used in modern computer vision, mainly because of their purely symbolic nature; moreover, purely relational representations cannot handle noisy data. Recent successes in combining statistical learning with relational representations motivated this work to apply the technique to hierarchical image recognition. The paper uses kLog, a logical and relational language for kernel-based learning, to perform hierarchical object detection: the lowest image layer consists of local interest points and their descriptors, the next layer composes object parts, and the top layer constitutes the object structure.

3. Suggestions and Conclusions

The 21st Joint IAPR International Workshops on Structural and Syntactic Pattern Recognition (SSPR 2012) and Statistical Techniques in Pattern Recognition (SPR 2012) brought together scholars from eighteen countries. Graph-based representations, kernel methods, and unsupervised learning were the main topics of this edition. We presented a paper entitled "A Novel Shadow-assistant Human Fall Detection Scheme Using a Cascade of SVM Classifiers". At the conference I met many young European scholars; the new generation is clearly pushing the field forward. Attending this international conference was very rewarding, and I brought back a copy of the proceedings. Finally, I thank the National Science Council for supporting this project.


A Novel Shadow-Assistant Human Fall Detection Scheme Using a Cascade of SVM Classifiers

Yie-Tarng Chen, You-Rong Lin, and Wen-Hsien Fang Department of Electronic Engineering,

National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

{ytchen,whf}@mail.ntust.edu.tw

Abstract. Visual recognition of human fall incidents in video clips has been an active research issue in recent years. However, most published methods cannot effectively differentiate between fall-down and fall-like incidents such as sitting and squatting. In this paper, we present a novel shadow-assistant method for detecting human falls. Normally, complex 3-D models are used to estimate the human height; to reduce the high computational cost, only moving-shadow information is used in this context. Because the system is based on a combination of shadow-assistant height estimation and a cascade of SVM classifiers, it can distinguish between fall-down and fall-like incidents with a high degree of accuracy from a very short sequence of 1-10 frames. Our experimental results demonstrate that under a bird's-eye-view camera setting, the proposed system still achieves a 100% detection rate and a low false alarm rate, while the detection rates of other fall detection schemes drop dramatically.

Keywords: fall detection, SVM.

1 Introduction

In recent years, visual recognition of human fall incidents in video clips has been an active research issue. In this paper, we consider the problem of using a mono un-calibrated camera to detect whether senior citizens fall over, called fall-down incidents hereafter. Such incidents normally occur suddenly and take approximately 0.45 to 0.85 seconds. Both the posture and shape of the victim change rapidly, and he/she usually lies inactive on the floor. Hence, drastic changes in the posture, shape, and height of the body are key features in human fall detection. However, modeling those features with low computational complexity is not a trivial task, especially for accurate human height estimation.

A number of fall detection schemes have been proposed [4-5]. Simple features derived from shape analysis, such as the aspect ratio of the bounding box, the angle of the fall, and a vertical projection histogram, have been used for fall detection. Rougier [4] proposed a fall detection approach based on the Motion History Image (MHI) [3] and changes in body shape. Hidden Markov Models (HMMs) have also been utilized for fall detection. Hsieh [5] developed a triangulation-based skeleton extraction approach to analyze human movements; however, it is not designed specifically for detecting fall-down incidents. No approach based on simple features can detect all kinds of human falls. Most video-based fall detection systems based on simple features suffer from high false alarm rates because they do not differentiate between fall-like and fall-down incidents. The high computational cost of human skeleton extraction discourages researchers from using it for real-time human fall detection. Hence, there is a need for a reliable fall detection system, a combination of several approaches, which can increase the detection accuracy while still satisfying the real-time constraint.

This work was supported by the National Science Council of R.O.C. under contract NSC 100-2221-E-011-134.

G.L. Gimel'farb et al. (Eds.): SSPR & SPR 2012, LNCS 7626, pp. 710-718, 2012.

© Springer-Verlag Berlin Heidelberg 2012

1.1 A Motivating Example

Figure 1 illustrates the motivation for this paper, which attempts to differentiate the falling posture through shadow information. We can observe a correlation between the height of standing, sitting-down, and falling postures and their corresponding shadow areas. In particular, the shadow area approaches 0 for a falling posture. Hence, we investigate the possibility of utilizing shadow information for human fall detection. However, the shadow is not a stable image cue, since it depends strongly on the capturing conditions: if a person stands directly below a light source, where the projection angle of the light is vertical, the person's shadow still cannot be detected. Hence, an intelligent combination of the shadow information with other approaches is needed to increase detection accuracy.

In this paper, we propose a real-time video-based human fall detection system that supports both bird's-eye-view and flat-view camera settings. The proposed system applies a novel shadow-assistant human height estimation scheme to differentiate between fall-down and fall-like incidents. Normally, complex 3-D models are required to estimate human height in a bird's-eye-view camera setting.

Fig. 1. The shadow (blue) and human foreground (red) for standing, sitting and falling
