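A High-Definition Robot Vision System for Human Body Tracking and Face Recognition Based on the Cooperative Control of Omnidirectional and PTZ Cameras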


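National Science Council, Executive Yuan: Research Project Final Report

Final report (condensed version)

Project category: Individual

Project number: NSC 98-2221-E-011-119-

Execution period: August 1, 2009 to July 31, 2010

Executing institution: Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology

Principal investigator: C. S. Fahn (范欽雄)

Disposition: This project involves patents or other intellectual property rights; it is open to public inquiry after 2 years

August 5, 2010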


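A High-Definition Robot Vision System for Human Body Tracking and Face Recognition Based on the Cooperative Control of Omnidirectional and PTZ Cameras

Project number: NSC-98-2221-E-011-119

Execution period: August 1, 2009 to July 31, 2010

Principal investigator: C. S. Fahn (范欽雄), Associate Professor, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology

Project participant: 羅金松, master's student, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology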

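CHINESE ABSTRACT

Because omni-directional cameras provide 360 degrees of scene information, they are widely used in video surveillance and robot vision. However, they have an obvious drawback: they produce only low-resolution images, so objects far from the camera cannot be correctly recognized. To overcome this problem, we propose a cooperative visual tracking system that combines an omni-directional camera with a PTZ camera. It detects and tracks human faces in the panoramic images received from the omni-directional camera and controls the PTZ camera to gaze at a selected face in order to obtain a high-resolution face image. First, the face detection procedure uses temporal differencing and a skin color filter to obtain moving faces. The detected faces are then passed to the face tracking procedure, which uses a particle filter to meet the real-time tracking requirement. Once a target face is selected, the PTZ camera is quickly directed to gaze at the target and zoom in on it. Face tracking then continues on the images received from the PTZ camera until the target face leaves the monitored range; the tracking procedure then switches back to the omni-directional images and starts again, which forms a closed-loop system in which the two cameras track faces cooperatively. The experimental results show that the face tracking accuracy of the proposed method is above 95% under normal conditions and above 82% when PTZ camera tracking is enabled. The overall system runs at 20 frames per second when PTZ camera tracking is not enabled and still reaches 5 frames per second when it is enabled. The system developed in this project is quite helpful for human-machine interfaces and video surveillance.

Keywords: high-definition face tracking, temporal differencing, skin and hair color filtering, particle filter, fusion of omni-directional and PTZ cameras.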

ABSTRACT

Omni-directional cameras, which provide a 360-degree field of view (FOV), are widely used in video surveillance and robot vision applications. However, they have an obvious drawback: they capture only low-resolution images, so objects far from the camera cannot be correctly identified. To overcome this problem, we propose a high-definition human face tracking system that fuses omni-directional and pan-tilt-zoom (PTZ) cameras. Our system first detects and tracks human faces in the panoramic images received from an omni-directional camera, and then controls the PTZ camera to fixate on a target face and capture a high-resolution image. At the beginning, the face detection procedure obtains moving human faces by means of temporal differencing together with skin and hair color filtering. The detected faces are then fed to the face tracking procedure, which employs a particle filter to track them iteratively. Once a target face is selected, the PTZ camera is directed to gaze at the target and quickly zoom in on it. The face tracking procedure then switches to the images received from the PTZ camera and continues tracking the target face until it leaves the FOV of the camera.

Keywords: High-definition human face tracking, temporal differencing, skin and hair color filtering, particle filter, fusion of omni-directional and PTZ cameras.

1. INTRODUCTION

Since the FOV of a traditional dioptric camera covers only part of a natural scene, an object of interest can be detected only while it is visible. In contrast, omni-directional cameras capture 360 degrees of scene information and output it in a single image. Owing to this wide FOV, omni-directional cameras are broadly used in vision-based applications, such as a robotic security guard [1].

A more recent approach combines omni-directional and PTZ cameras into a dual camera system for public area monitoring and target tracking. This approach overcomes both the low resolution of images captured with an omni-directional camera and the limited FOV of images captured with a PTZ camera. Yao et al. concentrated on improving the performance of a cooperative camera system (omni-directional and slaved PTZ cameras) and developed a distributed Kalman filter to exchange estimated trajectories among cameras [2].

Human faces are the most intuitive biometric features for human identification. A conventional robot vision system usually consists of off-the-shelf cameras that have limited view angles and can detect and track human faces accurately only at a short distance. Omni-directional cameras, with their 360-degree FOV, overcome the limited view angle. A robot vision system with an omni-directional camera is expected to be widely used in video surveillance and intelligent appliances because it detects activities in all directions. However, such a system is still shortsighted and cannot obtain a clear human face for further identification. The main goal of our research is to set up a human face tracking system that increases the effective view distance and provides a high-definition human face by fusing omni-directional and PTZ cameras.

2. SYSTEM DESIGN OVERVIEW

To enhance the overall robustness and spatial coverage of a robot vision system, we implement a dual camera system that tracks targets on the ground plane for an autonomous mobile robot. Both the omni-directional and PTZ cameras are mounted on the robot platform. The on-board PC performs face tracking on the images received from the omni-directional camera, then directs the PTZ camera to gaze at a selected target and rapidly zoom in on it. The enlarged target is continuously tracked with the PTZ camera until it leaves the FOV of that camera.

Besides the dual camera system, the power system of the robot platform supplies two DC motors that drive the two main wheels, whereas two casters are added to balance the platform during movement. Moreover, six ultrasonic sensors, a digital compass module, and an acceleration sensor are placed at the bottom level of the platform. In the middle level, an on-board PC with an Intel Dual Core 2 1.66 GHz processor and 1 GB of system memory is installed.

The video frames output by the omni-directional and PTZ cameras are captured by two USB 2.0 video grabber devices, one per camera, and sent to the PC through a USB interface. Fig. 1 shows our experimental robot platform.

Figure 1. The experimental mobile robot platform.

The human face tracking system that we propose first gets a polar image from the omni-directional camera and converts it to a panoramic one. Motion analysis is then applied to the panoramic image to acquire motion blobs. Next, the human face detection algorithm verifies the blobs to obtain moving faces. When a human face is detected, it is continuously located and its position is estimated. Subsequently, the system directs the PTZ camera to gaze at the target of interest, that is, a selected human face, and zoom in on it appropriately. If the target moves outside the FOV of the PTZ camera, the system starts afresh by detecting human faces with the omni-directional camera. We call this a closed-loop face tracking mechanism. The system block diagram is depicted in Fig. 2. Additionally, Fig. 3(a) shows the original polar image obtained from the omni-directional camera, which is converted to the panoramic image shown in Fig. 3(b).

Figure 2. The block diagram of our human face tracking system.

3. CAMERA CALIBRATION

Camera calibration builds up the geometric correspondence between the omni-directional and PTZ cameras and estimates the intrinsic and extrinsic parameters for mapping the 3D real world onto an image plane. In this section, we first describe the calibration method for our experimental omni-directional camera. Next, we elaborate on the PTZ camera calibration procedure used for controlling the PTZ camera to gaze at the target that was detected and tracked through the omni-directional camera beforehand.


Figure 3. Omni-directional image acquisition: (a) the polar image obtained from the omni-directional camera; (b) the panoramic image resulting from converting the polar image.

A. Omni-directional Camera Calibration

Omni-directional camera calibration is an active research area, and various projection models have been proposed for actual imaging processes [3]. The calibration of such an omni-directional camera mainly includes the estimation of the effective image region, the image center, and the transformation function.

1) Effective image region

The number of effective pixels received from the camera, which lie in the polar image, is far smaller than the total number of image pixels. Retrieving the effective image region avoids processing unused pixels and thereby increases the efficiency of subsequent image processing. The estimation process starts by placing a white tissue on the catoptric mirror that is mounted over the CCD camera, so that an entirely white polar image is obtained from the omni-directional camera sensor. The image is then converted to gray level and binarized with an adequate threshold. The external radius $R_{ext}$ and internal radius $R_{int}$ of the polar image are acquired by estimating the radii of the resulting donut-shaped object. The image center is obtained by computing the center of this object.
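The estimation of the effective image region can be illustrated with a minimal sketch. It assumes the white polar image is already available as a gray-level NumPy array; the function name `estimate_effective_region`, the default threshold of 128, and the use of the centroid as the image center are illustrative assumptions, not details given in the report.

```python
import numpy as np

def estimate_effective_region(gray, threshold=128):
    """Return (c_u, c_v, r_int, r_ext) of the bright donut in a gray image.

    gray      : 2D array, the gray-level version of the all-white polar image
    threshold : binarization threshold (hypothetical default)
    """
    mask = gray >= threshold                 # binarize the image
    vs, us = np.nonzero(mask)                # row (v) and column (u) indices
    c_u, c_v = us.mean(), vs.mean()          # centroid of the donut
    radii = np.hypot(us - c_u, vs - c_v)     # distance of each bright pixel
    return c_u, c_v, radii.min(), radii.max()
```

The smallest and largest distances from the centroid serve as rough estimates of $R_{int}$ and $R_{ext}$; a real image would likely need noise removal before this step.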

2) Image transformation

The image taken by the omni-directional camera is originally in polar form and must be transformed to a panoramic one for the convenience of tracking targets later. Fig. 4 shows the geometry mapping between the omni-directional and PTZ cameras. In the literature, several algorithms have been proposed for omni-directional image transformation [4]. To accomplish this transformation, our method first acquires the relationship between the polar and panoramic images using (1), which is graphically represented in Fig. 5.

$$u = c_u + S(y)\sin(x) \quad \text{and} \quad v = c_v + S(y)\cos(x), \tag{1}$$

where $c_u$ and $c_v$ are the coordinates of the common center of the polar and panoramic images; $u$ and $v$ are the pixel coordinates in the polar image domain; $x$ and $y$ are the pixel coordinates in the panoramic image domain; and $S(\cdot)$ is the stretching function along the y-axis.

[Figure 4 diagram. Labels: a world point p(X, Y, Z) in the natural surrounding scene; the target p*(x, y) in part of the panoramic image; the point p'(u, v) in the sensor image.]

Figure 4. The geometry mapping between the omni-directional and PTZ cameras.


Figure 5. The relationship between the polar and panoramic images.

Because our color-based face detection and tracking scheme remains effective on skewed and distorted images, we apply a one-to-one mapping along the vertical axis to reduce the computational load. The stretching function is then

$$S(y) = H_p - y, \tag{2}$$

where $H_p$, which equals $R_{ext}$, denotes the height of the panoramic image.
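The mapping in (1) with the stretching function in (2) can be sketched as a nearest-neighbor lookup. This is only an illustration under stated assumptions: the column index x is scaled to an azimuth over the full circle (the report writes sin(x) and cos(x) directly), the function name `polar_to_panoramic` is hypothetical, and out-of-range coordinates are simply clipped.

```python
import numpy as np

def polar_to_panoramic(polar, c_u, c_v, h_p, w_p=360):
    """Unwarp a polar image: u = c_u + S(y) sin(x), v = c_v + S(y) cos(x)."""
    y, x = np.mgrid[0:h_p, 0:w_p]            # panoramic pixel grid
    angle = 2.0 * np.pi * x / w_p            # assumption: column -> azimuth (rad)
    s = h_p - y                              # stretching function S(y) = H_p - y
    u = np.rint(c_u + s * np.sin(angle)).astype(int)
    v = np.rint(c_v + s * np.cos(angle)).astype(int)
    u = np.clip(u, 0, polar.shape[1] - 1)    # stay inside the polar image
    v = np.clip(v, 0, polar.shape[0] - 1)
    return polar[v, u]                       # nearest-neighbor sampling
```

With a panoramic width of 360 pixels, each column corresponds to one degree of azimuth, so the scaling above reduces to converting x from degrees to radians.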

B. PTZ Camera Calibration

The goal of PTZ camera calibration is to determine the relationship between the omni-directional and PTZ cameras so that the PTZ camera can be directed to focus on the selected target. To this end, the pose of the PTZ camera for target fixation is maintained by continuously estimating the required panning and tilting angles. Traditional camera calibration, however, is unable to meet this specific requirement.

Owing to the 360-degree FOV of an omni-directional camera, the direction in which a target of interest is situated can be obtained by calculating the azimuth at which it appears. Given the x-coordinate of a target position in the panoramic image, denote $\theta_1$ and $\theta_2$ as the previous and desired panning angles, respectively, and let $W_p$ be the width of the panoramic image. Fig. 6 illustrates an example of the panning control that turns the lens of the PTZ camera from $\theta_1$ to $\theta_2$ horizontally. The algorithm for determining the panning angle $\theta_p$ and the panning direction $D_p$ of the PTZ camera is stated as follows.


Figure 6. Illustration of the panning control from θ1 to θ2.

Input: the x-coordinate of a target position, the previous panning angle $\theta_1$, and the width $W_p$ of the panoramic image.

$\theta_2 = x \cdot 360^\circ / W_p$;
If $|\theta_2 - \theta_1| > 180^\circ$ then
    If $\theta_2 > \theta_1$ then
        $\theta_p = 360^\circ - \theta_2 + \theta_1$; $D_p$ = "left";
    Else
        $\theta_p = 360^\circ - \theta_1 + \theta_2$; $D_p$ = "right";
    End If
Else
    $\theta_p = |\theta_2 - \theta_1|$;
    If $\theta_2 > \theta_1$ then
        $D_p$ = "right";
    Else
        $D_p$ = "left";
    End If
End If

Output: the panning angle $\theta_p$ and the panning direction $D_p$.

The mirror in our omni-directional camera is hyperbolic and provides a view from 15 degrees above the horizon to 60 degrees below it. Hence, the relationship between a target point $p(x, y)$ in the panoramic image and the angle $\varphi$, measured from the vertical axis through the standing position of the omni-directional camera, should be linear under the assumption that the target is located on the ground plane and its size is known. We evaluate the function $f(y)$, which maps the y-coordinate of $p(x, y)$ to the angle $\varphi$ associated with the hyperbolic mirror, by selecting several points on a calibration board that is imaged by the omni-directional camera and then transformed into a panoramic image.
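The panning rule of this section, which always turns the PTZ camera through the shorter arc, can be written as a small function. The name `panning_control` and the use of degrees as plain floats are illustrative choices.

```python
def panning_control(x, theta1, w_p):
    """Return (panning angle in degrees, direction) for a target column x.

    x      : x-coordinate of the target in the panoramic image
    theta1 : previous panning angle in degrees
    w_p    : width of the panoramic image in pixels
    """
    theta2 = x * 360.0 / w_p                 # desired panning angle
    if abs(theta2 - theta1) > 180.0:
        # Crossing the 0/360 degree seam gives the shorter rotation.
        if theta2 > theta1:
            return 360.0 - theta2 + theta1, "left"
        return 360.0 - theta1 + theta2, "right"
    direction = "right" if theta2 > theta1 else "left"
    return abs(theta2 - theta1), direction
```

For example, with a 360-pixel-wide panoramic image, moving from 10 degrees to a target at column 350 yields a 20-degree turn to the left rather than a 340-degree turn to the right.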

Fig. 7 graphically shows an example of angular calibration, and (3) gives the aforementioned mapping function.

$$\varphi = f(y). \tag{3}$$

Figure 7. The relationship between the angle φ and the distance d.

Owing to the characteristic of the hyperbolic mirror, the relationship between the distance of a target from the two cameras and the angular value of the target in the polar image is linear. Once the angle $\varphi$ is obtained, the distance $d$ of the target can be calculated by

$$d = H_c \tan(\varphi), \tag{4}$$

where $H_c$ is the altitude at which the omni-directional camera is set. Substituting (3) into (4), the distance $d$ of the target can be estimated from the y-coordinate using

$$d = H_c \tan(f(y)). \tag{5}$$

Finally, given the known altitude $H_t$ at which the PTZ camera is set, the tilting angle $\theta_t$ for the PTZ camera to fixate on the target is derived from

$$\theta_t = \arctan(H_t / d). \tag{6}$$
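Equations (4) to (6) chain together as in the following sketch. The calibrated mapping $f(y)$ is not given in the report, so it is passed in as a callable; the function names and the constant mapping used in the usage note are hypothetical.

```python
import math

def target_distance(y, f, h_c):
    """Distance d = H_c * tan(f(y)); f returns the angle phi in radians."""
    return h_c * math.tan(f(y))

def tilt_angle(d, h_t):
    """Tilting angle theta_t = arctan(H_t / d), returned in radians."""
    return math.atan(h_t / d)
```

As a usage example with assumed values, a mapping that returns 45 degrees (`lambda y: math.radians(45.0)`) and a camera altitude of 89 cm give a distance of about 89 cm, and a PTZ altitude of 83 cm then gives a tilting angle of roughly 43 degrees.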

4. FACE DETECTION AND TRACKING

The human face is one of the main biometric features for security applications, and face detection is the first and an important processing step in an automated face or facial expression recognition system. The following first elaborates our face detection procedure and then presents the face tracking algorithm that we developed.

C. Face Detection

Face detection is a procedure that locates human faces in a scene. Several fast and accurate approaches to face detection have been proposed in the recent decade [5]. Our face detection procedure mainly comprises temporal differencing, motion analysis, and skin and hair color filtering. Owing to space limitations, the motion analysis is not described here.

1) Temporal differencing

Temporal differencing, which measures the dissimilarity between two consecutive images, is a fast method to retrieve motion blocks with a stationary camera. However, the detected motion blocks might be moving objects, noise, shadows of moving objects, or variations caused by lighting changes. Only the blocks that belong to moving objects are of interest. The temporal difference of images at time step $t$ is estimated by the following equation, which constitutes a binary image using a threshold $\varepsilon$:

$$\hat{I}_t = \begin{cases} 1 & \text{if } |I_t - I_{t-1}| \ge \varepsilon, \\ 0 & \text{otherwise,} \end{cases} \tag{7}$$

where $\hat{I}_t$ is the resulting binary image, and $I_t$ and $I_{t-1}$ are the gray level images at time steps $t$ and $t-1$, respectively.
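The thresholded frame difference can be sketched in a few lines of NumPy. The function name `temporal_difference` and the default threshold of 25 gray levels are assumptions for illustration; the report does not state the threshold value.

```python
import numpy as np

def temporal_difference(curr, prev, eps=25):
    """Binary motion image: 1 where |curr - prev| >= eps, else 0.

    curr, prev : gray-level frames (uint8 arrays of equal shape)
    eps        : difference threshold (hypothetical default)
    """
    # Widen to a signed type first so that uint8 subtraction cannot wrap.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff >= eps).astype(np.uint8)
```

The cast to a signed type matters in practice: subtracting two `uint8` frames directly would wrap around for negative differences and produce false motion pixels.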

2) Skin and hair color filtering

Color filtering, which detects specific colors, is among the most intuitive and fastest methods to find a human face. After the motion blocks are retrieved, it is used to screen out the pixels that possess neither skin nor hair colors. In addition to skin color filtering [6], hair color filtering is applied to prevent incomplete face regions from being lost during detection. The method based on the $YC_bC_r$ color model proposed in [7] is adopted to detect hair color regions. By combining the outcomes of the two color filters, we obtain human face regions with high confidence, each of which can be represented by a binary image. In this way, the moving human faces are initially located.
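A minimal sketch of color filtering in the $C_b C_r$ plane follows. The report does not list its thresholds, so the skin ranges used here ($C_b$ in [77, 127], $C_r$ in [133, 173]) are commonly cited values adopted purely as an assumption, and the hair mask is taken as a given input because the thresholds of [7] are not reproduced in the report. The function names are hypothetical.

```python
import numpy as np

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Binary mask of skin-colored pixels in an (H, W, 3) uint8 RGB image."""
    r, g, b = (rgb[..., k].astype(np.float64) for k in range(3))
    # Full-range BT.601 RGB -> CbCr conversion; the luma Y is not needed.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    in_cb = (cb >= cb_range[0]) & (cb <= cb_range[1])
    in_cr = (cr >= cr_range[0]) & (cr <= cr_range[1])
    return (in_cb & in_cr).astype(np.uint8)

def face_candidates(motion_mask, skin, hair):
    """Keep moving pixels that carry skin or hair color (all binary uint8)."""
    return motion_mask & (skin | hair)
```

Restricting the test to chrominance makes the filter less sensitive to brightness, which is one reason the $YC_bC_r$ model is popular for this task.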

D. Face Tracking

The aim of face tracking is to keep locating previously detected human faces in real time; that is, the trajectory of a moving face is obtained from successive image frames without interruption. This involves three main phases: candidate region location, feature extraction, and target tracking. The fundamental technique that we use is a particle filter [6], which is composed of four parts: propagation, observation, selection, and estimate, briefly described below.
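One iteration of a particle filter with these four parts can be sketched as follows. This is an illustration under several assumptions rather than the authors' implementation: the state is a 2D position, the dynamic model is a Gaussian random walk, the cue score $M$ is supplied by a hypothetical callable `score_fn` returning values in [0, 1], the selection threshold defaults to the mean weight, and the estimate is taken from the observation weights before resampling.

```python
import numpy as np

def particle_filter_step(samples, score_fn, rng, sigma=0.2, motion_std=2.0,
                         threshold=None):
    """Run propagation, observation, selection, and estimate once.

    samples  : (N, 2) array of particle positions at time t-1
    score_fn : hypothetical callable giving the cue score M per particle
    rng      : numpy.random.Generator used for noise and resampling
    """
    n = len(samples)
    # Propagation: random-walk dynamic model (an assumption).
    propagated = samples + rng.normal(0.0, motion_std, size=samples.shape)
    # Observation: distance Dis = 1 - M, then Gaussian weighting.
    dis = 1.0 - score_fn(propagated)
    weights = np.exp(-dis ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    # Estimate: state of the particle with the largest weight.
    estimate = propagated[np.argmax(weights)]
    # Selection: keep particles whose weight reaches the threshold and
    # copy them in proportion to their weights; copies are equally weighted.
    if threshold is None:
        threshold = weights.mean()
    keep = weights >= threshold
    if not keep.any():
        keep[:] = True
    probs = weights[keep] / weights[keep].sum()
    chosen = rng.choice(np.flatnonzero(keep), size=n, p=probs)
    return propagated[chosen], estimate
```

Drawing copies in proportion to the weights is one common way to realize the selection step; it concentrates the next generation of particles around likely face positions.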

Given: a sample set $S_{t-1} = \{(s_{t-1}^{(i)}, \pi_{t-1}^{(i)}) \mid i = 1, 2, \ldots, N\}$ at time step $t-1$, perform the following steps:

1. Propagation: propagate each sample $s_{t-1}^{(i)}$ with the weight $\pi_{t-1}^{(i)}$ by a dynamic model to obtain the sample set $S_t$.

2. Observation: weigh the samples using both color and motion cues as follows:

1) Calculate the distance between each sample and the target by $Dis_t^{(i)} = 1 - M_t^{(i)}$, where $M_t^{(i)}$ is a linear combination of color and motion cues.

2) Weigh each sample with $\pi_t^{(i)} = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(Dis_t^{(i)})^2}{2\sigma^2}\right)$, where $\sigma$ is the standard deviation of a Gaussian distribution.

3. Selection: if $\pi_t^{(i)}$ exceeds or equals a threshold, select $C_i$ even-weighted samples.

4. Estimate: acquire the state of the sample $s_t^{\otimes}$ with the largest weight as $\pi_t^{\otimes} = \pi_t^{*(j)}$ for $j = \arg\max_i \pi_t^{*(i)}$.

5. EXPERIMENTAL RESULTS

The visual tracking system that we built is a mobile robot platform equipped with an omni-directional camera and a PTZ camera mounted on the top. The altitude $H_c$ of the omni-directional camera above the ground plane is 89 cm, and the altitude $H_t$ of the PTZ camera is 83 cm. The resolution of the image frames received from both the omni-directional and PTZ cameras is 640 × 480 pixels, whereas the panoramic image size is set to 360 × 240 pixels. We conducted many experiments in our laboratory and in the corridor of the Research Building at National Taiwan University of Science and Technology. The results of distance estimation as well as face detection and tracking are presented in this section in order.

E. The Results of Face Detection

The moving faces are found by applying the face detection procedure to the panoramic images. This procedure consists of temporal differencing, motion analysis, skin and hair color filtering, and connected component labeling. The entire process takes 45 ms regardless of the number of moving faces in a scene, and an example of detecting a single face bounded with a box is shown in Fig. 8.

Figure 8. Illustration of a single face detected.

The detection rate is evaluated by performing face detection tests on different backgrounds and lighting conditions. TABLE I lists the results of the face detection tests in the cases of locating multiple faces.

TABLE I. THE EVALUATION OF MULTIPLE FACES DETECTION

Experimental condition                      Number of faces   Number of detected faces   Detection rate   Error rate
Normal                                      104               95                         91.3%            8.7%
Skin color-like objects in the background   112               92                         82.1%            17.9%
Low lighting                                107               80                         74.8%            25.2%

F. The Results of Face Tracking

Since there is no established way to validate face tracking by the fusion of omni-directional and PTZ cameras, we divide the validation into two kinds: one is the accuracy rate of face tracking with only the zoom in/out control of the PTZ camera enabled, where the camera already gazes at the target; the other is that with the pan/tilt/zoom control of the PTZ camera enabled. In each case, the accuracy rate is measured as the ratio of the number of faces effectively tracked to the number of faces appearing in natural scenes. Fig. 9 shows an example of tracking multiple faces without enabling the pan and tilt control of the PTZ camera. The evaluation of multiple faces tracking under this condition is recorded in TABLE II.

On the whole, the tracking accuracy in the corridor is higher than that in our laboratory because the latter has a cluttered background. The simple background in the corridor produces less noise disturbance, so the accuracy rate of face tracking increases. The entire face tracking process takes about 50 ms. The performance of single face tracking is clearly preferable to that of multiple faces tracking at each experimental site.


Figure 9. Illustration of multiple faces tracking with enabling the zoom in/out control: (a) the tracked face at frame #10; (b) the tracked face at frame #40.

TABLE II. THE EVALUATION OF MULTIPLE FACES TRACKING WITH ENABLING THE ZOOM IN/OUT CONTROL

Experimental site   Number of faces   Number of tracked faces   Accuracy rate   Error rate
Our laboratory      146               119                       81.5%           18.5%
A corridor          139               115                       82.7%           17.3%
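As a quick arithmetic check, the rates in TABLE I and TABLE II follow directly from the face counts: the success rate is the number of detected or tracked faces divided by the number of faces, and the error rate is its complement. The helper name `rates` is illustrative.

```python
def rates(total, hits):
    """Return (success rate, error rate) in percent, rounded to one decimal."""
    success = round(100.0 * hits / total, 1)
    return success, round(100.0 - success, 1)
```

For instance, `rates(104, 95)` reproduces the normal-condition row of TABLE I, and `rates(139, 115)` reproduces the corridor row of TABLE II.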

6. CONCLUSIONS

The robot vision system that we have presented can detect and track human faces in omni-directional images; furthermore, it can control the PTZ camera to gaze at the selected human face at a high accuracy rate. Our system not only has the advantage of the wide FOV brought by the omni-directional camera, but also employs the PTZ camera to overcome the low resolution problem. With the wide FOV, moving human faces in the area surrounding the mobile robot are located and tracked efficiently. Our experimental results show that face detection and tracking in panoramic images using the omni-directional camera takes only 50 ms per frame (i.e., 20 fps), whereas it takes 200 ms per frame (i.e., 5 fps) in zoom-in images using the PTZ camera with the pan/tilt/zoom control enabled. The performance of this robot vision system is very encouraging, and the system is quite useful in human interaction and video surveillance tasks.

REFERENCES

[1] P. Biber, S. Fleck, and T. Duckett, “3D modeling of indoor environments for a robotic security guard,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, San Diego, California, vol. 3, pp. 124-130.

[2] Y. Yao, B. Abidi, and M. Abidi, “Fusion of omni-directional and PTZ cameras for accurate cooperative tracking,” in Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, 2006, Sydney, Australia, p. 46.

[3] J. Kannala and S. Brandt, “A generic camera calibration method for fish-eye lenses,” in Proceedings of the International Conference on Pattern Recognition, 2004, Cambridge, United Kingdom, pp. 10-13.

[4] G. Scotti, L. Marcenaro, C. Coelho, F. Selvaggi, and C. S. Regazzoni, “Dual camera intelligent sensor for high definition 360 degrees surveillance,” IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 250-257, 2004.

[5] K. C. Yow and R. Cipolla, “Feature-based human face detection,” Image and Vision Computing, vol. 15, no. 9, pp. 712-735, 1997.

[6] C. S. Fahn, M. J. Kuo, and K. Y. Wang, “Real-time face tracking and recognition based on particle filtering and AdaBoosting techniques,” in Proceedings of the 13th International Conference on Human-Computer Interaction, 2009, San Diego, California, pp. 198-207.

[7] K. T. Song and W. J. Chen, “Face recognition and tracking for human-robot interaction,” in Proceedings of the International Conference on Systems, Man, and Cybernetics, 2004, The Hague, Netherlands, pp. 2877-2882.

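NSC-Funded Project: Derived R&D Results Promotion Data Sheet

Date: August 5, 2010

NSC-funded project

Project title: A High-Definition Robot Vision System for Human Body Tracking and Face Recognition Based on the Cooperative Control of Omnidirectional and PTZ Cameras

Principal investigator: C. S. Fahn (范欽雄)

Project number: 98-2221-E-011-119-

Field: Pattern recognition

R&D result title

(Chinese, translated) Human action recognition techniques based on the learning mechanism of hidden Markov models

(English) Human Actions Recognition Techniques Based on the Learning Mechanism of HMMs

Institution owning the result: National Taiwan University of Science and Technology

Inventors (creators): 范欽雄, 吳文華

Technical description

(Chinese, translated) In recent years, human action recognition has attracted considerable attention in computer vision and plays a major role in applications such as automatic surveillance systems, human-computer interfaces, home safety care systems, and smart home environments. In view of this, we propose a human action recognition method based on hidden Markov models (HMMs), which comprises four steps: moving object capture, feature extraction, action segmentation, and action recognition. First, the foreground person is extracted from the background model by background subtraction, and the extracted image is converted to a binary image. Next, a star skeleton is adopted to describe the posture features of human actions; the star skeleton is formed by connecting the protruding points of the foreground person's contour to the person's center point. In addition, the temporal image sequence is converted to a feature sequence and then to a symbol sequence, from which the HMMs of the actions are built. In the action segmentation step, the protruding points of the foreground contour are observed and the change in the number of stationary protruding points is computed to distinguish the symbol sequences of different actions; combined with a sliding window technique, the different actions are segmented one by one. In a prior training stage, an HMM is built for every action. In the final recognition step, the action model that best matches the segmented symbol sequence is found, and that action is the recognition result. Furthermore, we use the difference in the moving speed of the star skeleton's center point to recognize the human fall-down action.

(English) In this research, we proposed human action recognition techniques based on HMMs. The star skeleton was used to effectively and efficiently represent the posture features of each human action. To handle the recognition of continuous actions, action segmentation is conducted by combining the sliding window scheme with stable contact detection. The extreme points of the star skeleton that remain in the same place for a long enough period are the stable contacts. Primitive motion units (PMUs) that have a consistent number of stable contacts are regarded as a segmented action. When the period of a PMU is too long, we employ a sliding window to segment the continuous actions. We build an HMM for each action type except “fall-down”, and the recognition result is determined as the category that best matches the observed sequence. Moreover, we ...

Industry: Information services

Technology/product application scope: Automatic surveillance systems, human-computer interfaces, home safety care systems, and smart home environments

Feasibility of technology transfer and expected benefits: This technology can be applied to any visual tracking and recognition system, and it is expected to have a considerable advantage in markets that use vision-based image tracking and recognition systems.

Note: If a patent has not yet been applied for this R&D result, do not disclose the main patentable content.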

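NSC-Funded Research Project Final Report: Self-Evaluation Form

Please give an overall assessment of the degree to which the research content matches the original plan, the extent to which the expected objectives were achieved, the academic or application value of the research results (briefly describe their significance, value, impact, or potential for further development), whether they are suitable for publication in academic journals or for patent application, and the main findings or other relevant value.

1. Overall assessment of the degree to which the research content matches the original plan and the extent to which the expected objectives were achieved

■ Objectives achieved

□ Objectives not achieved (please explain, within 100 characters)

□ Experiment failed

□ Experiment interrupted for some reason

□ Other reasons:

2. Publication of the research results in academic journals or patent applications:

Papers: □ Published ■ Unpublished manuscript □ In preparation □ None

Patents: □ Granted ■ Pending □ None

Technology transfer: □ Transferred □ Under negotiation ■ None

Other: (within 100 characters)

3. Please assess the academic or application value of the research results in terms of academic achievement, technical innovation, and social impact (briefly describe their significance, value, impact, or potential for further development) (within 500 characters)

An omni-directional camera provides 360 degrees of scene information, but it has an obvious drawback: it produces only low-resolution images, so objects far from the camera cannot be correctly recognized. To overcome this problem, we propose a cooperative visual tracking system that combines an omni-directional camera with a PTZ camera. It detects and tracks human faces in the panoramic images received from the omni-directional camera and controls the PTZ camera to gaze at a selected face in order to obtain a high-resolution image.

First, the face detection procedure uses temporal differencing and a skin color filter to obtain moving faces. The detected faces are then passed to a real-time face tracking system built on a particle filter. Once a target face is selected, the PTZ camera quickly gazes at the target and zooms in on it. Face tracking then continues on the images received from the PTZ camera until the target face leaves the monitored range; the tracking procedure then switches back to the omni-directional images and starts again, which forms a closed-loop system in which the two cameras track faces cooperatively. The experimental results show that the face tracking accuracy is above 95% under normal conditions and above 82% when PTZ camera tracking is enabled. The overall system runs at 20 frames per second when PTZ camera tracking is not enabled and still reaches 5 frames per second when it is enabled. The system developed in this project is quite helpful for human-machine interfaces and video surveillance.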