3-D human posture recognition system using 2-D shape features


Abstract—This paper presents an integrated framework for recognizing 3D human posture from 2D images. A flexible combinational algorithm motivated by the novel view expressed by Cyr and Kimia [1] is proposed to generate the aspects of 3D human postures as the posture prototype using features extracted from the collected 2D images sampled at random intervals from the viewing sphere. Frequency and phase information of the posture are calculated from the Fourier descriptors (FDs) of the sampled points on the posture contour as the main and assistant features to extract the characteristic views as the aspects. Moreover, a modified particle filter is applied to improve the robustness of human posture recognition for continuous monitoring. Experimental trials on synthetic and real sequences have shown the effectiveness of the proposed method.

I. INTRODUCTION

HUMAN posture recognition is an important step towards human behavior analysis, which can be applied to monitor the behavior of people at home, especially elders with limited autonomy. However, the performance of vision systems depends on the angle from which the human posture is viewed. The simplest description of a human posture is a densely sampled collection of independent views. Postures can be described in detail with a large number of collected 2D views, but the computing time for recognition also grows because of the larger search space. Therefore, the main objective of this study is to propose a framework for 3-D human posture recognition that is efficient in both modeling and search.

Existing approaches [2] to human posture recognition can be classified as direct or indirect according to how the human body model is described, and as two-dimensional (2-D) or three-dimensional (3-D) according to the dimensionality of the human body model. The first way, the direct approach, relies on a detailed human body model. For example, Ghost [3] constructs a silhouette-based body model and combines hierarchical body pose estimation, a convex hull analysis of the silhouette, and a partial mapping from the body parts to the silhouette segments. Furthermore, Pfinder [4] uses color information to build a multi-class statistical model and then detects the human body parts in combination with shape detection. However, occlusions and perspective distortion make the results unreliable.

Manuscript received September 15, 2006. This work was supported by the National Science Council of the R.O.C. under grant no. NSC94-2218-E009064 and the DOIT TDPA Program under project number 95-EC-17-A-04-S1-054.

J. S. Hu is with the Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, ROC (phone: 886-3-5712121 ext. 54424; e-mail: jshu@cn.nctu.edu.tw).

T. M. Su and P. C. Lin are with the Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, ROC (e-mail: tzungminsu@ieee.org and peiching.ece93g@nctu.edu.tw).

The second way, the indirect approach, extracts features of the human body instead of building a detailed human body model and combines them with classifiers to estimate the human posture. The indirect approach uses less information than the direct approach but is more robust. For example, Ozer et al. [5] extracted the AC coefficients as features and adopted principal component analysis as the classifier. Furthermore, Ozer et al. [6] extracted color, edge and shape as features and adopted the hidden Markov model as the classifier. Besides, complex 3-D models have been built with different equipment. For example, Delamarre et al. [7] proposed a method to build a 3-D human body model via three or more cameras and then calculated the projection of the silhouette to compare it with the 2-D projections in the database. Moreover, 3-D laser scanners [8] and thermal cameras [9] have also been adopted to build the 3-D human body model. However, the above 3-D solutions suffer from long computing times and high device cost.

In order to reduce the cost and computing time, some methods have been studied to extract a minimal set of object views with a single camera. Aspect-graph representations, which focus on changes in the shape of the projection of the object, are one kind of method for obtaining a minimal set of object views. The underlying theory that describes 3D objects by an aspect graph was proposed by Koenderink et al. [10]. The traditional aspect-graph method [11] is based on the assumption that an object belongs to a limited class of shapes and that characteristic views can be extracted using prior knowledge of the object. In our previous work [12], a similarity-based aspect-graph approach was proposed to recognize free-form objects using the frequency information of FDs and point-to-point lengths. For recognizing human posture in this work, the phase information of FDs is adopted to avoid misclassifying human postures with similar body shapes, and a modified similarity-based aspect-graph approach is proposed to improve the efficiency of human posture recognition. The proposed method is applied to video data in which a sequence of human motion is observed. To improve the robustness, a modified particle filter is applied as a post-processing stage to reduce the errors caused by inaccurate foreground detection. This method is also useful for monitoring the continuous behavior of a person.

The rest of the paper is organized as follows. Section II describes the procedure of extracting the frequency and phase information of FDs. Next, Section III describes the novelty of this work, the procedures of building a modified

3-D Human Posture Recognition System Using 2-D Shape Features

Jwu-Sheng Hu, Member, IEEE, Tzung-Min Su, Student Member, IEEE, and Pei-Ching Lin


similarity-based aspect-graph representations and recognizing 3-D objects using 2-D object views. Furthermore, the modified particle filter is described. In Section IV, some experimental results are presented to demonstrate the performance of the proposed method in 3-D human posture recognition. Conclusions are finally made in Section V.

II. FEATURE EXTRACTION

A. Foreground Detection and Contour Extraction

In this work, the shape feature is used to measure the similarity between two object views. To extract the shape information from the foreground object, Canny edge detection [13] is applied to extract the shape edges and the Gradient Vector Flow (GVF) snake [14] is then applied to extract the contour. Suppose the contour is described by a set $Z$ composed of $N$ points $z(i)$, where each point is written in complex form as in (1).

$$Z = \{z(i)\} = \{x_i + j y_i\}, \quad 0 \le i \le N \tag{1}$$

B. Fourier Descriptors

In order to avoid variations in shift and scale, the points in the set $Z$ are resampled by (2).

$$\tilde{Z} = \{\tilde{z}(i)\} = \{L_c[(x_i - x_c) + j(y_i - y_c)]/L\} \tag{2}$$

where $0 \le i \le N$, $L$ is the contour length, $L_c$ is the expected contour length and $(x_c, y_c)$ is the contour center.

Then the Fourier transform is applied to $\tilde{Z}$ to calculate the FDs with (3).

$$\tilde{Z}(k) = \sum_{n=0}^{N-1} \tilde{z}(n)\exp(-j2\pi kn/N), \quad k = 0, 1, 2, \ldots, N-1 \tag{3}$$
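As an illustration of (2) and (3), the following minimal sketch normalizes a closed contour and computes its FDs with a direct discrete Fourier transform. It makes two assumptions that the text does not spell out: the contour center is taken as the mean of the sampled points, and the contour length is taken as the perimeter of the polygon through the samples. The function names and the small square contour are hypothetical and serve only to show that the normalized FDs do not change when the contour is shifted and scaled.

```python
import cmath
import math


def normalize_contour(points, expected_length):
    """Eq. (2): shift to the contour center and scale to a common length.

    points: list of (x, y) contour samples; expected_length: L_c.
    Assumption: the center is the mean of the samples and L is the
    perimeter of the closed polygon through them.
    """
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    length = sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))
    scale = expected_length / length
    return [complex(x - xc, y - yc) * scale for x, y in points]


def fourier_descriptors(z):
    """Eq. (3): direct O(N^2) discrete Fourier transform of the contour."""
    n = len(z)
    return [
        sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


# A square and a shifted, enlarged copy of it (hypothetical test data).
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted_scaled = [(3 * x + 10, 3 * y - 5) for x, y in square]

fd_a = fourier_descriptors(normalize_contour(square, expected_length=4.0))
fd_b = fourier_descriptors(normalize_contour(shifted_scaled, expected_length=4.0))
```

Because both contours are centered and scaled to the same length before the transform, `fd_a` and `fd_b` agree up to rounding error, and the zero-frequency term vanishes.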

The magnitudes of the low-frequency components of $\tilde{Z}(k)$ are extracted to describe the human posture without the influence of high-frequency noise and are defined as $FD_m$. The method for extracting $FD_m$ is described by (4).

$$FD_m = \{|\tilde{Z}(k)|, |\tilde{Z}(N-k)|, \; 1 \le k \le T_2\} \tag{4}$$

However, although $FD_m$ describes the shape of a human posture, the direction information of the posture is lost without the phase information of $\tilde{Z}(k)$. For example, the postures of slide standing and lying down are easily classified as the same posture using only $FD_m$. Therefore, to improve the efficiency of classifying human postures, the phase information in $\tilde{Z}(k)$ is adopted in this work. Suppose the phase information is $\theta_z$; $\theta_z$ can be calculated using $\tilde{Z}(1)$ and $\tilde{Z}(N-1)$, which is derived from the work of [12]. $\tilde{Z}(1)$ and $\tilde{Z}(N-1)$ are represented as (5) and (6), and $\theta_z$ is calculated using (7).

$$\tilde{Z}(1) = |\tilde{Z}(1)| \exp(j\theta_1) = R_1 + jI_1 \tag{5}$$

$$\tilde{Z}(N-1) = |\tilde{Z}(N-1)| \exp(j\theta_{N-1}) = R_{N-1} + jI_{N-1} \tag{6}$$

$$\theta_z = (\theta_1 + \theta_{N-1})/2 = (\arctan(I_1/R_1) + \arctan(I_{N-1}/R_{N-1}))/2 \tag{7}$$

where $R_1$ and $R_{N-1}$ are the real parts of $\tilde{Z}(1)$ and $\tilde{Z}(N-1)$, $I_1$ and $I_{N-1}$ are the imaginary parts of $\tilde{Z}(1)$ and $\tilde{Z}(N-1)$, and $\theta_1$ and $\theta_{N-1}$ are the phases of $\tilde{Z}(1)$ and $\tilde{Z}(N-1)$.

III. SIMILARITY-BASED ASPECT-GRAPH

A. Similarity Function

To calculate the similarity between human postures, a similarity metric must be applied to the extracted features. Suppose the magnitudes of the low-frequency components extracted from the test image and from the database are $FD_m^T$ and $FD_m^D$, and the phase information extracted from the test image and from the database is $\theta_z^T$ and $\theta_z^D$. Then the similarity between $FD_m^T$ and $FD_m^D$ is calculated using the one-norm distance described by (8), and the similarity between $\theta_z^T$ and $\theta_z^D$ is calculated using the one-norm distance described by (9).

$$d_F = \sum_{k=1}^{T_2} \left( |FD_m^T(k) - FD_m^D(k)| + |FD_m^T(N-k) - FD_m^D(N-k)| \right) \tag{8}$$

$$d_p = |\theta_z^T - \theta_z^D| \tag{9}$$
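The features in (4) and (7) and the distances in (8) and (9) can be sketched as follows. This is only an illustration under stated assumptions: the phase is computed with the four-quadrant arctangent `math.atan2` instead of $\arctan(I/R)$ to avoid division by zero (the two differ by $\pi$ when the real part is negative), the phase is expressed in degrees, and the synthetic ellipse contour and all function names are hypothetical.

```python
import cmath
import math


def dft(z):
    """Direct discrete Fourier transform, as in (3)."""
    n = len(z)
    return [
        sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def magnitude_features(fd, t2):
    """Eq. (4): pairs (|Z(k)|, |Z(N-k)|) for 1 <= k <= T2."""
    n = len(fd)
    return [(abs(fd[k]), abs(fd[n - k])) for k in range(1, t2 + 1)]


def phase_feature(fd):
    """Eqs. (5)-(7): mean phase of Z(1) and Z(N-1), in degrees.

    Assumption: atan2 replaces arctan(I/R) from (7).
    """
    theta_1 = math.atan2(fd[1].imag, fd[1].real)
    theta_n1 = math.atan2(fd[-1].imag, fd[-1].real)
    return math.degrees((theta_1 + theta_n1) / 2)


def d_f(fd_m_test, fd_m_db):
    """Eq. (8): one-norm distance between two magnitude feature sets."""
    return sum(
        abs(a - c) + abs(b - d) for (a, b), (c, d) in zip(fd_m_test, fd_m_db)
    )


def d_p(theta_test, theta_db):
    """Eq. (9): one-norm distance between two phase features."""
    return abs(theta_test - theta_db)


# Hypothetical data: an ellipse-like contour and a copy rotated by 60 degrees.
n_points = 16
ellipse = [
    2 * cmath.exp(2j * math.pi * m / n_points) + cmath.exp(-2j * math.pi * m / n_points)
    for m in range(n_points)
]
rotated = [z * cmath.exp(1j * math.radians(60)) for z in ellipse]

fd_e, fd_r = dft(ellipse), dft(rotated)
dist_f = d_f(magnitude_features(fd_e, 3), magnitude_features(fd_r, 3))
dist_p = d_p(phase_feature(fd_e), phase_feature(fd_r))
```

In this example the magnitude distance `dist_f` is zero up to rounding error, since rotation does not change the magnitudes, while the phase distance `dist_p` recovers the 60-degree rotation. This is the reason the phase is useful for separating postures whose outlines are similar but differently oriented.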

B. Generation of Aspects and Characteristic Views

In our previous work [12], a similarity-based aspect-graph approach was proposed to extract the characteristic views of complex free-form objects via two features, $FD_m$ and point-to-point lengths. However, the point-to-point length is not suitable for human posture recognition because of its rotation invariance, its long computing time and appearance variations. In this work, the phase information of FDs, $\theta_z$, is adopted to replace point-to-point lengths to avoid misclassifying human postures with similar body shapes. Moreover, a modified similarity-based aspect-graph approach is proposed to improve the efficiency of human posture recognition with $FD_m$ and $\theta_z$. One or more characteristic views can be combined into the same aspect, so that the differences between similar human shapes are maintained by keeping the phase information in the database.

Fig. 1. The flowchart of the proposed method

Fig. 2. The procedure of the proposed combinational algorithm

Suppose $V_{new}^n$ denotes the newly sampled view of the $n$th object, $C_m^n(i)$ denotes the $i$th characteristic view of the $m$th aspect of the $n$th object, $C_{m_{min}-1}^n$ and $C_{m_{min}+1}^n$ denote the neighbor views of $C_{m_{min}}^n$, which has the minimum distance to $V_{new}^n$, and $A_{m_{min}}$ denotes the aspect that has the minimum distance to $V_{new}^n$, where $m_{min}$ is the index of $A_{m_{min}}$. Then four steps are imposed to form aspects and characteristic views as Step A-1 to A-4, and the flowchart of the modified aspect-graph representation is illustrated in Fig. 2.

Step A-1:

When the number of existing aspects of the $n$th object equals zero, $V_{new}^n$ is regarded as a characteristic view of a new aspect.

Step A-2:

When the number of existing aspects of the $n$th object equals one or two,

(A-2.1) If (10) and (11) are both satisfied, $V_{new}^n$ is combined into the $m_{min}$th aspect and the characteristic views of the $m_{min}$th aspect stay the same.

(A-2.2) Otherwise, if (10) is satisfied but (11) is not, $V_{new}^n$ is combined into the $m_{min}$th aspect and is regarded as a new characteristic view of the $m_{min}$th aspect.

(A-2.3) Otherwise, if neither (10) nor (11) is satisfied, a new aspect of the $n$th object is built, and $V_{new}^n$ is regarded as the characteristic view of the new aspect.

$$\min_{\forall C_m^n \in A_{m_{min}}} d_F(V_{new}^n, C_m^n) < T_3 \tag{10}$$

$$\min_{\forall C_m^n \in A_{m_{min}}} d_p(V_{new}^n, C_m^n) < T_5 \tag{11}$$

where $T_3$ and $T_5$ are both predefined threshold values.

Step A-3:

When the number of existing aspects of the $n$th object equals three or more,

(A-3.1) If (12) or (13) is satisfied and (11) is not, a new aspect is built and $V_{new}^n$ is regarded as the characteristic view of the new aspect.

(A-3.2) Otherwise, if neither (12) nor (13) is satisfied and (11) is satisfied, $V_{new}^n$ is combined into the $m_{min}$th aspect and the characteristic views of the $m_{min}$th aspect stay the same.

(A-3.3) Otherwise, if none of (12), (13) and (11) is satisfied, $V_{new}^n$ is combined into the $m_{min}$th aspect and is regarded as a new characteristic view of the $m_{min}$th aspect.

$$\min_{\forall C_m^n \in A_{m_{min}}} d_F(V_{new}^n, C_m^n) > T_4 \tag{12}$$

$$T_3 < \min_{\forall C_m^n \in A_{m_{min}}} d_F(V_{new}^n, C_m^n) < T_4 \;\text{ and }\; d_F(V_{new}^n, C_{m_{min} \pm 1}^n) > T_4 \tag{13}$$

Moreover, if a new aspect is built, the aspect order is decided using (14). If the similarity distance between $V_{new}^n$ and $C_{m_{min}+1}^n$ is larger than the similarity distance between $V_{new}^n$ and $C_{m_{min}-1}^n$, the new aspect is inserted between aspect $m_{min}$ and aspect $m_{min}-1$. Otherwise, the new aspect is inserted between aspect $m_{min}$ and aspect $m_{min}+1$. Therefore, similar aspects are close to each other.

$$d_F(V_{new}^n, C_{m_{min}+1}^n) > d_F(V_{new}^n, C_{m_{min}-1}^n) \tag{14}$$

Besides, $T_3$ and $T_4$ are two predefined threshold values with $T_4 > T_3$. If both $T_3$ and $T_4$ are small, the criterion for combining 2D views becomes strict and thus the number of aspects increases. Moreover, if the difference between $T_3$ and $T_4$ becomes smaller, the tolerated difference between 2D views inside an aspect becomes smaller and thus the number of aspects increases.
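The decision rules in Steps A-1 to A-3 and the ordering rule (14) can be sketched as below. This is an interpretation, not the authors' implementation, and it rests on several assumptions: a view is a hypothetical pair `(features, theta)`, the distance from a view to an aspect is the minimum over that aspect's characteristic views, neighbor aspects are indexed circularly, (14) is applied only when at least three aspects exist (otherwise the new aspect is appended), and the cases the steps leave unspecified (shape distance too large while the phase condition (11) holds) are treated as a new aspect.

```python
def d_f(view, char_view):
    """One-norm distance between magnitude features, as in (8)."""
    return sum(abs(a - b) for a, b in zip(view[0], char_view[0]))


def d_p(view, char_view):
    """Phase distance, as in (9)."""
    return abs(view[1] - char_view[1])


def aspect_distance(view, aspect, dist):
    """Minimum distance from a view to the characteristic views of an aspect."""
    return min(dist(view, c) for c in aspect)


def add_view(aspects, view, t3, t4, t5):
    """Insert a new view into the list of aspects; return the action taken."""
    if not aspects:  # Step A-1
        aspects.append([view])
        return "new aspect"
    n = len(aspects)
    dist_f = [aspect_distance(view, a, d_f) for a in aspects]
    m_min = min(range(n), key=dist_f.__getitem__)
    prev_d, next_d = dist_f[(m_min - 1) % n], dist_f[(m_min + 1) % n]
    phase_close = aspect_distance(view, aspects[m_min], d_p) < t5  # (11)
    if n <= 2:  # Step A-2: build a new aspect when (10) fails
        build_new = not dist_f[m_min] < t3
    else:  # Step A-3: build a new aspect when (12) or (13) holds
        cond12 = dist_f[m_min] > t4
        cond13 = t3 < dist_f[m_min] < t4 and prev_d > t4 and next_d > t4
        build_new = cond12 or cond13
    if build_new:
        if n < 3:
            aspects.append([view])  # assumption: no ordering yet
        elif next_d > prev_d:  # (14): place between m_min - 1 and m_min
            aspects.insert(m_min, [view])
        else:  # otherwise place between m_min and m_min + 1
            aspects.insert(m_min + 1, [view])
        return "new aspect"
    if phase_close:  # A-2.1 / A-3.2
        return "merged"
    aspects[m_min].append(view)  # A-2.2 / A-3.3
    return "new characteristic view"


# Hypothetical one-dimensional features and small thresholds for illustration.
aspects = []
views = [([0.0], 0), ([0.5], 5), ([0.5], 50), ([5.0], 50), ([10.0], 0), ([7.6], 90)]
results = [add_view(aspects, v, t3=1.0, t4=2.0, t5=10.0) for v in views]
```

With these toy values, the second view is absorbed, the third becomes an extra characteristic view of the first aspect because only its phase differs, and the last view is inserted between the aspects with features 5.0 and 10.0, so that neighboring aspects remain similar.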

C. Posture Recognition using 2D Characteristic Views

After building the aspect-graph representation of each human posture in the database, a test view of an unknown posture can be recognized using the similarity measure with the main features ($FD_m$) and the assistant features ($\theta_z$). Two steps are imposed as follows:

Step B-1:

The test view of an unknown human posture is compared with the characteristic views of the database via main features. Then, the first T₆ 2D characteristic views in the database having the smallest similarity distance with the test 2D view via main features are preserved to be further recognized.


Step B-2:

Suppose $A_{T_6}$ is defined as the set that contains the $T_6$ 2D characteristic views selected in Step B-1. Then the final similarity distance is calculated with the assistant features by (15).

$$d(V_j^i, C_m^n) = (d_p(V_j^i, C_m^n))^2 + (90 \cdot (L_{main}/L_{Max}))^2 \tag{15}$$

where $V_j^i$ denotes the 2D view of an unknown human posture, $C_m^n$ denotes the $m$th characteristic view of the $n$th human posture in the database, and $L_{main}$ denotes the similarity distance calculated using $FD_m$ between the unknown human posture and the $m$th characteristic view of the $n$th human posture, defined as $L_{main} = d_F(V_j^i, C_m^n)$ with $C_m^n \in A_{T_6}$. $L_{Max}$ is the largest such distance over the set, $L_{Max} = \max_{\forall C_m^n \in A_{T_6}} d_F(V_j^i, C_m^n)$.

D. Particle Filter

To support human behavior analysis, this work combines the temporal information in the video sequences with the results of human posture recognition. A modified particle filter, motivated by the view expressed by Kwatra et al. [16], is proposed to compute the statistics of the distribution of each human posture, and these statistics are then used to estimate a confidence measure for each posture.

The particle filter uses particles to represent the posterior probability distribution. Suppose the state at each time instant $t$ is the discrete posture state $x_t$, the measurement at each time instant $t$ is $z_t$, and the measurements up to time $t$ are $Z_t = \{z_1, z_2, \ldots, z_t\}$. Then $p(x_t|Z_t)$ is defined as the posterior probability density given the measurements $Z_t$, $p(x_t|Z_{t-1})$ is defined as the prior probability, $p(x_t|x_{t-1})$ is defined as the process density describing the dynamics, and $p(z_t|x_t)$ is defined as the observation density.

Furthermore, a set of $M$ samples, $S_M$, is defined to represent $p(x_t|Z_t)$, as described by (16). Each sample $s_k$ consists of a weighting parameter $\pi_k$, an accumulating parameter $a_k$ and a class parameter $c_k$. The weight of each sample is initialized as $1/M$ and each class has an equal number of samples.

$$S_M = \{s_k = (\pi_k, c_k, a_k), \; k = 1, 2, 3, \ldots, M\} \tag{16}$$

Moreover, $p(z_t|x_t)$ can be calculated from the similarity distance between $z_t$ and the images of the database via a zero-mean Gaussian distribution. A discrete state translation matrix $T$ is substituted for $p(x_t|x_{t-1})$.

The following four steps are the basic steps of the modified particle filter proposed in this work and are applied to generate a probabilistic estimate at each time instant.

Step C-1: (Prediction)

(1) Calculate $p(z_t|x)$ $\forall x \in X$ ($X$: the set of all postures in the database).

(2) Calculate $p(x_t|Z_{t-1}) = T'_{ij} = p(z_t|x = j) \cdot T_{ij}$, and let $\sum_j T'_{ij} = 1$.

(3) Calculate the class parameter $c_k^t$ for each sample $s_k$ by (17).

$$c_k^t = \max_{\forall j}(T'_{ij} \mid i = c_k^{t-1}) \tag{17}$$

Step C-2: (Update)

Update the weighting parameter $\pi_k^t$ and the accumulating parameter $a_k^t$ for each sample $s_k$ by (18)-(19).

$$\pi_k^t = p(z_t \mid x_t = c_k^t), \;\text{ and let }\; \sum_k \pi_k^t = 1 \tag{18}$$

$$a_k^t = a_{k-1}^t + \pi_k^t, \quad a_0^t = 0, \quad k = 1, 2, 3, \ldots, M \tag{19}$$

Step C-3: (Output)

Calculate the summation of the posterior probability density $p(x|Z_t)$ for each posture class of the database and select the posture having the maximum value as the estimated posture, as described by (20).

$$\hat{x}_t = \arg\max_{\forall j} p(x = j \mid Z_t) = \arg\max_{\forall j} \sum_{k \in \gamma_j} \pi_k^t \tag{20}$$

where $\gamma_j = \{k \mid c_k^t = j\}$.

Step C-4: (Selection)

Replace $p(x|Z_{t-1})$ by $p(x|Z_t)$ and approximate $p(x|Z_t)$ by resampling the samples of $S_M$.

(1) Generate a random variable $r \in [0,1]$ from a uniform distribution.

(2) For each sample, find the minimum integer $j$ that satisfies $a_j^t \ge r$ and assign $c_k^{t+1} = c_j^t$, $k = 1, 2, 3, \ldots, M$.
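Steps C-1 to C-4 can be sketched for one time instant as follows. The sketch is an interpretation under stated assumptions rather than the authors' code: (17) is read as selecting the index $j$ that maximizes $T'_{ij}$, a fresh random number is drawn for every sample in Step C-4, the Gaussian width `sigma` is a hypothetical parameter, `trans[i][j]` stores the translation probability from previous state `i` to present state `j`, and the three-state example data are invented for illustration.

```python
import math
import random


def particle_filter_step(classes, distances, trans, sigma, rng):
    """One iteration of Steps C-1 to C-4.

    classes: class parameter c_k of each sample at time t-1.
    distances: similarity distance between z_t and each posture class.
    Returns the estimated posture and the resampled class parameters.
    """
    n_states = len(distances)
    # C-1 (1): zero-mean Gaussian observation likelihood p(z_t | x).
    like = [math.exp(-0.5 * (d / sigma) ** 2) for d in distances]
    # C-1 (2): T'_ij = p(z_t | x = j) * T_ij, normalized over j.
    t_prime = []
    for i in range(n_states):
        row = [like[j] * trans[i][j] for j in range(n_states)]
        total = sum(row)
        t_prime.append([v / total for v in row])
    # C-1 (3): eq. (17), read here as the arg max over j (assumption).
    predicted = [max(range(n_states), key=t_prime[c].__getitem__) for c in classes]
    # C-2: normalized weights (18) and cumulative weights (19).
    weights = [like[c] for c in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w
        cumulative.append(acc)
    # C-3: eq. (20), sum the weights per class and take the largest.
    score = [0.0] * n_states
    for c, w in zip(predicted, weights):
        score[c] += w
    estimate = max(range(n_states), key=score.__getitem__)
    # C-4: resample, first cumulative weight that reaches r.
    resampled = []
    for _ in classes:
        r = rng.random()
        j = next((i for i, a in enumerate(cumulative) if a >= r), len(cumulative) - 1)
        resampled.append(predicted[j])
    return estimate, resampled


# Hypothetical three-posture example with ten samples per class.
rng = random.Random(0)
classes = [0] * 10 + [1] * 10 + [2] * 10
trans = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
estimates = []
for distances in ([3.0, 0.2, 3.0], [3.0, 0.3, 2.5], [2.8, 0.1, 3.0]):
    estimate, classes = particle_filter_step(classes, distances, trans, 1.0, rng)
    estimates.append(estimate)
```

In this toy run the measurement is always closest to posture 1, so every sample moves to that class in the first step and the estimate stays there.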

IV. EXPERIMENTAL RESULTS

This section describes several experiments that demonstrate the effectiveness of the proposed method. A SONY EVI-D30 PTZ camera is used to capture the video sequences containing human postures. Fig. 3(a) shows the indoor environment and Fig. 3(b) shows a sample frame of the video sequences. Eight basic human postures are built in the database and are illustrated in Fig. 4. The training views of the human postures are captured at 5-degree intervals and are collected as $W_d^n$, which contains 72 views for each human posture. The extra views of each object are captured from the trisection points between each 5-degree point and are collected as $W_t^n$, which contains 216 views for each human posture. The captured views are described by (21)-(22).

$$W_d^n = \{W^n(d)\}, \quad 1 \le n \le 8, \; 1 \le d \le 72 \tag{21}$$

$$W_t^n = \{W^n(t)\}, \quad 1 \le n \le 8, \; 1 \le t \le 216 \tag{22}$$

Furthermore, in the following experiments, $T_1$ is defined as 72, $T_2$ as 20, $T_3$ as 1450, $T_4$ as 1800, $T_5$ as 10 and $T_6$ as 10. The computing time for calculating the similarity between a test view and a view in the database is about 0.004 seconds with a P4 2.8G CPU and 512MB RAM.

Fig. 3. The indoor environment for capturing the test videos of human posture recognition and human behavior analysis

Fig. 4. The image database that contains 8 3-D human postures (Posture 1 to Posture 8)

A. 3D human posture recognition using 2D views via FDs

In the first experiment, the efficiency of the modified aspect-graph representation using 2D views is demonstrated with synthetic video sequences, where the background was removed both manually and automatically, the latter using the method proposed in our previous work [17]. The 2D views in $W_d^n$ were used for training the database and the 2D views in $W_t^n$ were used for estimating the performance. Tables I and II show the mean number of aspects and the Top1, Top2 and Top3 matching rates; the number of aspects of each human posture is smaller than the number of training views. Although the matching rates decrease when automatic background subtraction is adopted, the Top3 matching rates are still above 90%. Moreover, the computing time for recognition is reduced by using fewer aspects.

B. 3D human posture recognition using 2D views via FDs and the modified particle filter

In the second experiment, the modified particle filter is applied to combine the temporal information with the human posture recognition result obtained using FDs. Table III shows the human posture recognition results after applying the modified particle filter. The recognition results are all at least as high as the Top1 matching rates listed in Tables I and II.

TABLE I
THE RESULTS OF HUMAN POSTURE RECOGNITION USING 2D VIEWS VIA FDS WITH MANUAL BACKGROUND SUBTRACTION

| Result (index of postures) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | A.* |
|---|---|---|---|---|---|---|---|---|---|
| Aspect | 8 | 25 | 37 | 41 | 42 | 38 | 8 | 38 | 29.6 |
| Top 1 (%) | 94.9 | 99.1 | 98.2 | 100 | 99.1 | 96.3 | 99.5 | 100 | 98.4 |
| Top 2 (%) | 99.1 | 99.5 | 100 | 100 | 100 | 99.5 | 100 | 100 | 99.8 |
| Top 3 (%) | 100 | 99.5 | 100 | 100 | 100 | 99.5 | 100 | 100 | 99.9 |

*: A. denotes the average over the eight postures.

TABLE II
THE RESULTS OF HUMAN POSTURE RECOGNITION USING 2D VIEWS VIA FDS WITH AUTOMATIC BACKGROUND SUBTRACTION

| Result (index of postures) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | A.* |
|---|---|---|---|---|---|---|---|---|---|
| Aspect | 8 | 25 | 37 | 41 | 42 | 38 | 8 | 38 | 29.6 |
| Top 1 (%) | 90.8 | 84.9 | 89.4 | 86.1 | 96.3 | 93.5 | 81.5 | 94.0 | 89.6 |
| Top 2 (%) | 96.8 | 85.5 | 93.5 | 87.5 | 97.2 | 97.7 | 93.1 | 95.4 | 93.3 |
| Top 3 (%) | 99.5 | 90.2 | 96.3 | 90.2 | 98.6 | 97.7 | 95.4 | 98.2 | 95.8 |

TABLE III
THE RESULTS OF HUMAN POSTURE RECOGNITION USING 2D VIEWS VIA FDS AND THE MODIFIED PARTICLE FILTER

| Result (index of postures) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | A.* |
|---|---|---|---|---|---|---|---|---|---|
| C1** (%) | 100 | 100 | 99.5 | 100 | 100 | 98.6 | 100 | 100 | 99.8 |
| C2** (%) | 94.0 | 92.2 | 90.7 | 95.8 | 97.2 | 94.9 | 90.7 | 94.0 | 93.7 |

**: C1 and C2 denote the recognition results with manual and automatic background subtraction, respectively.

Moreover, Table IV shows the discrete state translation matrix $T$, which replaces the process density $p(x_t|x_{t-1})$. Suppose the column index $i$ denotes the previous state $x_{t-1}$ and the row index $j$ denotes the present state $x_t$. Then the criteria for setting the discrete state translation matrix $T_{ij}$ are listed below:

(1) When $i = j$, the coefficient $T_{ij}$ is given the highest value, 10.

(2) When $i \ne j$, the coefficient $T_{ij}$ is given by (23), where $T_s$ is the number of translation steps from state $i$ to state $j$.

$$T_{ij} = 10 - 2 \cdot T_s \tag{23}$$

(3) Adjust the coefficients $T_{ij}$ by $\pm 1$ manually according to the similarities between the states.

(4) Normalize the discrete state translation matrix $T_{ij}$ so that $\sum_j T_{ij} = 1$.
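Criteria (1), (2) and (4) can be sketched as a short function. The manual adjustment in criterion (3) is omitted because it depends on judgment that the text does not quantify. The step-count table `steps`, the clamping of negative coefficients to zero and the three-state chain used as input are assumptions made for this illustration; `matrix[i][j]` holds the coefficient for previous state `i` and present state `j`.

```python
def build_translation_matrix(steps):
    """Build a normalized discrete state translation matrix.

    steps[i][j] is T_s, the number of translation steps from state i to
    state j (hypothetical input; the diagonal entries are ignored).
    """
    n = len(steps)
    raw = [
        [
            # Criterion (1) on the diagonal, eq. (23) elsewhere.
            # Assumption: negative coefficients are clamped to zero.
            10.0 if i == j else max(10.0 - 2.0 * steps[i][j], 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Criterion (4): normalize so that the coefficients sum to one over j.
    return [[v / sum(row) for v in row] for row in raw]


# Hypothetical chain of three postures: 0 <-> 1 <-> 2.
steps = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
matrix = build_translation_matrix(steps)
```

For the chain above, staying in state 0 receives the weight 10/24, moving to the adjacent state 8/24 and moving two steps away 6/24, so nearby postures are favored as successors.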

C. Human behavior analysis via FDs and particle filter

In the final experiment, the proposed method is applied to one real video sequence to analyze human behavior. Fig. 5 shows the results of the human behavior analysis. Each panel contains three plots: the top plot shows the result with FDs only, the middle plot shows the result with FDs and the modified particle filter, and the bottom plot shows the result decided manually. The video sequence shown in Fig. 5 has a total of 300 frames and represents the human behavior "Stand, lie down and then standing up". As can be observed in Fig. 5, some misclassifications happen when the extracted posture contour is not accurate enough to classify the captured human posture, e.g., at the 700th frame the seventh posture (sit-to-stand) was regarded as the second posture (sit), and at the 870th frame the eighth posture (sit on the floor) was regarded as the fifth posture (squat). However, applying the particle filter to human behavior analysis is shown to be useful with real video sequences.

TABLE IV
THE DISCRETE STATE TRANSLATION MATRIX T

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.24 | 0.15 | 0.22 | 0.04 | 0.03 | 0.19 | 0.14 | 0.025 |
| 2 | 0.16 | 0.25 | 0.11 | 0.04 | 0.15 | 0.15 | 0.16 | 0.125 |
| 3 | 0.16 | 0.09 | 0.26 | 0.04 | 0.03 | 0.19 | 0.10 | 0.025 |
| 4 | 0.02 | 0.02 | 0.02 | 0.38 | 0.10 | 0.02 | 0.02 | 0.20 |
| 5 | 0.02 | 0.15 | 0.02 | 0.15 | 0.26 | 0.02 | 0.16 | 0.20 |
| 6 | 0.19 | 0.12 | 0.22 | 0.04 | 0.03 | 0.24 | 0.12 | 0.025 |
| 7 | 0.19 | 0.20 | 0.13 | 0.04 | 0.20 | 0.17 | 0.20 | 0.15 |
| 8 | 0.02 | 0.02 | 0.02 | 0.27 | 0.20 | 0.02 | 0.10 | 0.25 |

Fig. 5. Human behavior analysis: Stand, lie down and then standing up

V. CONCLUSIONS

This work proposes an integrated framework for recognizing 3D human postures with 2D images. Frequency and phase information of the posture are calculated from the FDs of the sampled points on the posture contour as the main and assistant features to extract the characteristic views as the aspects. A modified aspect-graph representation based on our previous work [12] is proposed to improve the efficiency of human posture recognition. Experimental trials on synthetic and real video sequences have shown the effectiveness of the proposed method. However, the computing time for recognizing an unknown posture increases as more human postures are added to the database, and real-time operation remains future work. Besides, although the completeness of the database increases as more 2-D views of a posture are collected, the number of extracted characteristic views of a posture is limited by the similarity measure between 2-D views. Furthermore, to combine the temporal information between video frames, a modified particle filter is proposed to improve the efficiency of human behavior analysis. A real video sequence that contains the human behavior "Stand, lie down and then standing up" is used to demonstrate the effectiveness of the modified particle filter. The experimental results show that the proposed framework is efficient in both human posture recognition and human behavior analysis.

REFERENCES

[1] C. M. Cyr and B. Kimia, "A Similarity-Based Aspect-Graph Approach to 3D Object Recognition," Int'l Journal of Computer Vision, vol. 57, no. 1, pp. 5-22, 2004.

[2] R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, "Probabilistic Posture Classification for Human-Behavior Analysis," IEEE Trans. on Systems, Man, and Cybernetics, vol. 35, no. 1, January 2005.

[3] I. Haritaoglu, D. Harwood, and L. S. Davis, "Ghost: A Human Body Part Labeling System Using Silhouettes," in 14th Int'l Conf. on Pattern Recognition, Brisbane, Australia, 1998.

[4] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-Time Tracking of the Human Body," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-785, 1997.

[5] L. B. Ozer and W. Wolf, "Real-time posture and activity recognition," in Proc. IEEE Workshop, Motion and Video Computing, pp. 133-138, Dec. 2002.

[6] L. B. Ozer, T. Lu, and W. Wolf, "Design of a real-time gesture recognition system: high performance through algorithms and software," IEEE Signal Processing Magazine, vol. 22, pp. 57-64, May 2005.

[7] Q. Delamarre and O. Faugeras, "3D Articulated Models and Multi-View Tracking with Silhouettes," Proc. 17th Int'l Conf. Computer Vision, pp. 716-721, Sept. 1999.

[8] N. Werghi and Y. Xiao, "Recognition of Human Body Posture from a Cloud of 3D Data Points using Wavelet Transform Coefficients," IEEE Int'l Conf. on Automatic Face and Gesture Recognition, pp. 70-75, 2002.

[9] S. Iwasawa, K. Ebihara, J. Ohya, and S. Morishima, "Real-time human posture estimation using monocular thermal images," IEEE Int'l Conf. on Automatic Face and Gesture Recognition, pp. 492-497, 1998.

[10] J. J. Koenderink and A. J. van Doorn, "The singularities of the visual mapping," Biol. Cyber., vol. 24, pp. 51-59, 1976.

[11] I. Shimshoni and J. Ponce, "Finite-Resolution Aspect Graphs of Polyhedral Objects," IEEE Trans. on Pattern Anal. Mach. Intell., pp. 315-327, 1997.

[12] J. S. Hu, T. M. Su, and C. C. Lin, "Shape Memorization and Recognition of 3-D Objects Using a Similarity-Based Aspect-Graph Approach," IEEE Int'l Conf. on Systems, Man, and Cybernetics, Oct. 2006.

[13] J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.

[14] C. Xu and J. L. Prince, "Gradient Vector Flow: A New External Force for Snakes," IEEE Conf. on Computer Vision and Pattern Recognition, pp. 66-71, 1997.

[15] A. Folkers and H. Samet, "Content-based Image Retrieval Using Fourier Descriptors on a Logo Database," Proceedings of the 16th Int'l Conf. on Pattern Recognition, vol. 3, pp. 521-524, August 2002.

[16] V. Kwatra, A. Bobick, and A. Johnson, "Temporal integration of multiple silhouette-based body-part hypotheses," IEEE Int'l Conf. on Computer Vision and Pattern Recognition, pp. 758-764, Dec. 2001.

[17] J. S. Hu, T. M. Su, and S. C. Jeng, "Robust Background Subtraction with Shadow and Highlight Removal for Indoor Environment Surveillance," IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, Oct. 2006.