Codebook matching - Human detection - Human Tracking Using a Weighted Resampling Particle Filter on an Active Camera

Chapter 2 Human detection

2.2 Codebook matching

[Flow chart: Moving object → Normalize → Shape feature extraction → Histogram feature extraction → Feature matching with codebook → Human / Nonhuman]

Fig. 2-3 Codebook matching

The codebook matching algorithm in this thesis is based on human-shape information. First, the moving object obtained from moving object extraction is normalized to 20×40 pixels. Shape feature extraction then locates the shape (boundary) pixels in the image, marked by red dots in Fig. 2-4. Ten Y-axis coordinates are chosen on each of the leftmost and rightmost boundaries of the object, and the X-axis coordinates of these twenty boundary points are arranged as a feature vector, shown as blue blocks in Fig. 2-4.

The projection onto the X-axis gives a histogram of the pixel values accumulated along the Y-axis. The histogram has ten bins, shown as green blocks in Fig. 2-4. Therefore, an object is represented by a 30-dimensional feature vector in our work.

Fig. 2-4 Feature word X

By observation, the shape pixels at the top and bottom of the Y-axis are not suitable as feature points, because these pixels vary considerably. To find the ten specific Y-axis coordinates, the standard deviation of the boundary position is calculated at each Y-axis value over the training samples, and the ten values with the lowest standard deviation are chosen on each side.

The feature vector is matched against the code vectors in the codebook, which is a list of feature vectors. The purpose of the matching process is to find the code vector with the minimum distortion with respect to the feature vector of the object. To describe how the codebook is used to separate humans from other objects, some variables must be defined first. Let the feature vector be denoted by $X$, with $M$ dimensions $X^0, \dots, X^i, \dots, X^{M-1}$. The codebook $C$ contains $N$ code words $V_0, \dots, V_j, \dots, V_{N-1}$. Like $X$, each $V_j$ has $M$ dimensions, $V_j^0, \dots, V_j^i, \dots, V_j^{M-1}$. The distortion between the feature word and a code word is defined in Eq. 2.4.

$$Dis_j = \|X - V_j\| = \sum_{i=0}^{M-1} |X^i - V_j^i| \qquad (2.4)$$

$$Dis_{min} = \min_j (Dis_j), \quad j = 0, \dots, N-1 \qquad (2.5)$$

With these definitions, the feature vector of the normalized moving object is compared with every $V_j$ in the codebook $C$. If $Dis_{min}$ is smaller than a user-defined threshold, the moving object with feature word $X$ is considered human; otherwise, it is a nonhuman object. The comparison of $X$ with $V_j$ is illustrated in Fig. 2-5.

Fig. 2-5 The procedure of the comparison with the codebook
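As a minimal sketch of the matching rule in Eq. 2.4 and Eq. 2.5, the following Python fragment compares one feature vector with every code vector and applies the threshold. The function names, the toy three-dimensional codebook and the threshold value are illustrative assumptions; the thesis itself uses 30-dimensional vectors and a user-defined threshold.

```python
from typing import Sequence, Tuple


def l1_distortion(x: Sequence[float], v: Sequence[float]) -> float:
    """Distortion of Eq. 2.4: sum of absolute differences over all dimensions."""
    if len(x) != len(v):
        raise ValueError("feature vector and code vector must have the same length")
    return sum(abs(xi - vi) for xi, vi in zip(x, v))


def match_codebook(
    x: Sequence[float],
    codebook: Sequence[Sequence[float]],
    threshold: float,
) -> Tuple[bool, int, float]:
    """Return (is_human, index of best code vector, minimum distortion)."""
    distortions = [l1_distortion(x, v) for v in codebook]
    j_min = min(range(len(distortions)), key=distortions.__getitem__)
    dis_min = distortions[j_min]  # Eq. 2.5
    return dis_min < threshold, j_min, dis_min


if __name__ == "__main__":
    toy_codebook = [[0, 0, 0], [1, 2, 3]]  # hypothetical code vectors
    print(match_codebook([1, 2, 4], toy_codebook, threshold=2.0))
```

The comparison is strict (`<`), following the wording "smaller than the threshold" in the text.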

Chapter 3 Human tracking

A human tracking system based on the particle filter algorithm is proposed in this thesis. The key idea of the particle filter is to approximate the probability distribution by a weighted sample set, where each sample represents one hypothetical state of the object with a corresponding discrete sampling probability [25]. The original resampling method selects samples using uniformly distributed random numbers, so it usually keeps some low-weight samples, which may decrease tracking accuracy. In this thesis, we propose a weighted resampling method that selects only high-weight samples.

3.1 Histogram color features

Most feature-based object tracking methods use color as the feature, since color information is more discriminative than gray scale. In our experiments, the HSV color space performed well during tracking, so we adopt it in our tracking algorithm.

The main purpose of using the HSV color space is to reduce the sensitivity to illumination (lightness) that is inherent in the RGB color space. Fig. 3-1 (a) and (b) show the RGB and HSV color spaces, respectively. The HSV model, also known as the HSB model, was created in 1978 by Alvy Ray Smith. It is a nonlinear transformation of the RGB color space and defines a color in terms of three components: hue, saturation, and value, described below [31].

1. Hue: the color type, ranging from 0 to 360 degrees. Each value corresponds to one color; for example, 0 is red, 30 is orange, and 60 is yellow. The scale wraps around, so 360 degrees is the same as 0 degrees.

2. Saturation: the intensity of the color, ranging from 0% to 100%. 0 means no color, that is, only gray values between black and white; 100 means the most intense, purest color.

3. Value: the brightness of the color, also ranging from 0% to 100%. 0 is always black. Depending on the saturation, 100 may be white or a more or less saturated color.

Fig. 3-1 (a) RGB color model [31] (b) HSV color model [32]

The transformation from RGB color model to HSV color model is written in the following equation.

[…] respectively. Generally, each color channel has 8 bits of data, which would produce a color histogram of 256×256×256 bins. To keep the histogram compact, the color data are quantized into 6×6×6 levels, so the color histogram has 216 bins in total.
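The 6×6×6 quantization can be sketched as follows. This is an illustration only: it uses the standard-library `colorsys.rgb_to_hsv` conversion rather than the thesis's own transformation, and the way the three quantized channels are combined into a single bin index is an assumption.

```python
import colorsys
from typing import Iterable, List, Tuple

BINS_PER_CHANNEL = 6  # 6 * 6 * 6 = 216 bins, as in the text


def hsv_bin_index(r: int, g: int, b: int, bins: int = BINS_PER_CHANNEL) -> int:
    """Map an 8-bit RGB pixel to one of bins**3 HSV histogram bins."""
    # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

    def quantize(value: float) -> int:
        # Clamp so that a value of exactly 1.0 falls into the last level.
        return min(int(value * bins), bins - 1)

    # Assumed index layout: hue is the slowest-varying channel.
    return (quantize(h) * bins + quantize(s)) * bins + quantize(v)


def color_histogram(pixels: Iterable[Tuple[int, int, int]],
                    bins: int = BINS_PER_CHANNEL) -> List[float]:
    """Normalized (unweighted) color histogram of a list of RGB pixels."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[hsv_bin_index(r, g, b, bins)] += 1.0
    total = sum(hist)
    return [count / total for count in hist] if total > 0 else hist
```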

3.2 Kernel function

A kernel function is used to weight the pixels that represent a target object. Different kernel profiles can be adopted for both the target and the candidate model, such as the Gaussian, flat, and Epanechnikov kernels. Let x be the normalized location of a pixel in the region defined as the target model; the Gaussian, flat, and Epanechnikov kernels [33] are then defined as follows.

1. Gaussian kernel

Fig. 3-2 (a) and (c) show that the Gaussian and Epanechnikov kernels have similar distributions: both take their highest value at the center. Looking at the ROI of the target model in Fig. 3-3 (a), the pixels closer to the center of the ROI contain more important information, while background pixels lie mostly near the ROI boundary. Therefore, the Gaussian and Epanechnikov kernels can de-emphasize the boundary information, and their accuracy is higher than that of the flat kernel.

Fig. 3-2 (a) Gaussian kernel (b) flat kernel (c) Epanechnikov kernel

Fig. 3-3 (a) Target object (b) Kernel function (c) Target object and Kernel function

3.3 Particle filter algorithm

The particle filter provides a robust tracking framework because it models uncertainty: it keeps its options open and considers multiple state hypotheses simultaneously. Since less likely object states have a chance to remain temporarily in the tracking process, particle filters can deal with short-lived occlusions [25].

The target model at location $y$ is defined as an $m$-bin histogram $q_y = \{q_y^{(u)}\}_{u=1 \dots m}$, computed by

$$q_y^{(u)} = f \sum_{i=1}^{I} k\left(\frac{\|y - x_i\|}{a}\right) \delta[h(x_i) - u] \qquad (3.7)$$

with the normalization factor

$$f = \frac{1}{\sum_{i=1}^{I} k\left(\frac{\|y - x_i\|}{a}\right)} \qquad (3.8)$$

where $I$ denotes the number of pixels in the ROI, $\delta$ is the Kronecker delta function, and $a = \sqrt{w^2 + h^2}$ is used to normalize the size of the object region. The sample model $p_y = \{p_y^{(u)}\}_{u=1 \dots m}$ is built in the same way as the target model.
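A kernel-weighted histogram in the sense of Eq. 3.7 and Eq. 3.8 might be computed as in the sketch below. The kernel definitions are not reproduced in this excerpt, so the Epanechnikov profile $k(r) = 1 - r^2$ for $r < 1$ used here is an assumption, as are the function names and the representation of each pixel as `(x, y, bin)` with zero-based bin indices.

```python
import math
from typing import Callable, Iterable, List, Tuple


def epanechnikov(r: float) -> float:
    """Assumed kernel profile: largest at the center, zero for r >= 1."""
    return 1.0 - r * r if r < 1.0 else 0.0


def kernel_histogram(
    center: Tuple[float, float],
    size: Tuple[float, float],
    pixels: Iterable[Tuple[float, float, int]],
    m: int,
    kernel: Callable[[float], float] = epanechnikov,
) -> List[float]:
    """Histogram of Eq. 3.7, normalized by the factor f of Eq. 3.8."""
    cx, cy = center
    w, h = size
    a = math.hypot(w, h)  # a = sqrt(w^2 + h^2)
    hist = [0.0] * m
    for px, py, u in pixels:
        r = math.hypot(cx - px, cy - py) / a
        hist[u] += kernel(r)  # the Kronecker delta selects bin u
    total = sum(hist)  # equals 1 / f
    return [value / total for value in hist] if total > 0 else hist
```

Because every pixel falls into exactly one bin, dividing by the sum of the accumulated kernel values is equivalent to multiplying by $f$.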

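The similarity $\rho$ between the target and sample models is the Bhattacharyya coefficient, from which the Bhattacharyya distance $d$ is derived. A larger $\rho$ means the two models are more similar, and $\rho$ equals 1 when the two histograms are identical.

$$p_y^{(u)} = f \sum_{i=1}^{I} k\left(\frac{\|y - x_i\|}{a}\right) \delta[h(x_i) - u] \qquad (3.9)$$

$$\rho[p, q] = \sum_{u=1}^{m} \sqrt{p^{(u)} q^{(u)}} \qquad (3.10)$$

$$d = \sqrt{1 - \rho[p, q]} \qquad (3.11)$$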
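Eq. 3.10 and Eq. 3.11 translate almost directly into code. The sketch below assumes both histograms are already normalized and have the same number of bins; the function names are illustrative.

```python
import math
from typing import Sequence


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity rho of Eq. 3.10 between two normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Distance d of Eq. 3.11; the max() guards against rounding above 1."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))
```

Identical histograms give a coefficient of 1 and a distance of 0, while histograms with no overlapping bins give a coefficient of 0 and a distance of 1.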

In the particle filter algorithm, the target model is represented by a state vector $s_{target}$:

$$s_{target} = \{x, v_x, y, v_y, w, h\} \qquad (3.12)$$

where $(x, y)$ is the center position of the ROI, $(v_x, v_y)$ is the object's motion, and $w$ and $h$ are the width and height of the ROI, respectively.

The initial sample set $S_{initial} = \{s^{(n)}\}_{n=1 \dots N}$ is computed by

$$s^{(n)} = I s_{target} + r.v. \qquad (3.13)$$

where $N$ is the number of samples, $I$ is an identity matrix, and $r.v.$ is a multivariate Gaussian random variable.

The sample set is propagated through a dynamic model according to

$$s_t = A s_{t-1} + r.v._{t-1} \qquad (3.14)$$

where $A$ defines the deterministic component of the model. Using every sample's weight and state vector, the target human's position and size can be obtained from the estimated state vector using the following equation. […] The opportune moment for the resample step is determined by the following equation.

$$N_{eff} < N_{ths} \qquad (3.17)$$

$$N_{eff} = \frac{1}{\sum_{n=1}^{N} \left(\omega_t^{(n)}\right)^2} \qquad (3.18)$$

$$N_{ths} = rate \cdot N \qquad (3.19)$$

where $rate \in (0,1)$, and $N_{eff}$ and $N_{ths}$ are the effective number of samples and the sample threshold, respectively. During the resample step, samples with high weights may be chosen several times, leading to identical copies, while others with relatively low weights may not be chosen at all.
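The sample initialization (Eq. 3.13), the propagation (Eq. 3.14) and the resampling test (Eq. 3.17 to Eq. 3.19) can be sketched together as follows. Several points are assumptions made for illustration: the matrix $A$ is taken to be a constant-velocity model, the noise components are independent Gaussians whose standard deviations reuse the values listed as state covariance in Section 5.1, and the default `rate` of 0.5 is arbitrary since the text only requires a value in (0, 1).

```python
import random
from typing import List, Sequence

# (x, v_x, y, v_y, w, h) noise levels, taken from the Section 5.1 settings.
SIGMA = (2.0, 0.5, 2.0, 0.5, 0.4, 0.8)


def init_samples(target: Sequence[float], n: int,
                 sigma: Sequence[float] = SIGMA, rng=random) -> List[List[float]]:
    """Eq. 3.13: each sample is the target state plus Gaussian noise."""
    return [[c + rng.gauss(0.0, s) for c, s in zip(target, sigma)]
            for _ in range(n)]


def propagate(sample: Sequence[float],
              sigma: Sequence[float] = SIGMA, rng=random) -> List[float]:
    """Eq. 3.14 with an assumed constant-velocity A (x += v_x, y += v_y)."""
    x, vx, y, vy, w, h = sample
    predicted = (x + vx, vx, y + vy, vy, w, h)
    return [c + rng.gauss(0.0, s) for c, s in zip(predicted, sigma)]


def effective_sample_size(weights: Sequence[float]) -> float:
    """Eq. 3.18, computed on weights normalized to sum to 1."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def needs_resampling(weights: Sequence[float], rate: float = 0.5) -> bool:
    """Eq. 3.17 and Eq. 3.19: resample when N_eff < rate * N."""
    return effective_sample_size(weights) < rate * len(weights)
```

With uniform weights $N_{eff}$ equals $N$, and it drops toward 1 as the weight concentrates on a single sample, which is when resampling becomes worthwhile.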

Given a sample set $S_{t-1}$ and the target model $q$, for the first iteration, $S_{t-1}$ is […]

2. Observe the color distributions:

(a) calculate the color distribution $p_{s_t^{(n)}}^{(u)} = f \sum_{i=1}^{I} k\left(\frac{\|s_t^{(n)} - x_i\|}{a}\right) \delta[h(x_i) - u]$ for each sample in the set $S_t$

(b) calculate the Bhattacharyya coefficient $\rho$ for each sample of the set $S_t$ […]

(a) calculate the normalized cumulative probabilities $c'_t$:

$c_t^{(0)} = 0$

$c_t^{(n)} = c_t^{(n-1)} + \omega_t^{(n)}$

$c'^{(n)}_t = \frac{c_t^{(n)}}{c_t^{(N)}}$

(b) generate a uniformly distributed random number $r \in [0,1]$ […]

Finally, resample by $S_t = S'_t$

Fig. 3-4 Weighted resampling particle filter

Our proposed tracking method is shown in Fig. 3-4. It differs from the original particle filter in its weighted resampling and its occlusion handler, which are explained in the following sections.

3.4 Weighted resampling algorithm

The original resample step in the particle filter selects samples randomly. Samples with high weights may be chosen several times, leading to identical copies, but some samples with relatively low weights are also selected.

Fig. 3-5 Original resampling

Fig. 3-5 shows that the sample points with high weights are in the ROI (green block), while samples with relatively low weights are in the red block. Although the two blocks have nearly the same similarity value, the actual target object is in the green block. Consequently, the filter may start tracking a different object as the target; in other words, tracking accuracy decreases. We therefore propose a weighted resampling algorithm to overcome this problem. First, choose the top sample set $S_t^{top}$, containing the $N_{top}$ samples with the highest weights in $S_t$.

$$N_{top} = top \cdot N \qquad (3.20)$$

$$S_t^{top} = \{s^{top(n)}\}_{n=1 \dots N_{top}} \qquad (3.21)$$

where $top$ is the top rate, set to 0.2; that is, $S_t^{top}$ contains only the samples with the top 20% of weights in $S_t$. Then $N$ samples are reproduced in $S_t$ according to the weights of $s^{top(n)}$. In this step, the $s^{top(n)}$ with relatively high weights are reproduced more times in $S_t$, while those with relatively low weights are reproduced at least once. Fig. 3-6 shows the weighted resampling result: most of the sample points lie in the green block, that is, in the target object region.

Fig. 3-6 Weighted resample
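One possible reading of the weighted resampling step is sketched below. The text does not specify how the $N$ copies are distributed among the top samples, so this sketch makes an assumption: every top sample is copied once, and the remaining $N - N_{top}$ slots are drawn at random in proportion to the top samples' weights. The function name and the `rng` parameter are illustrative.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def weighted_resample(samples: Sequence[T], weights: Sequence[float],
                      top: float = 0.2,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Rebuild an N-sample set from the top-weighted fraction of samples."""
    rng = rng or random.Random()
    n = len(samples)
    n_top = max(1, int(top * n))  # Eq. 3.20, keeping at least one sample
    # Indices of the n_top highest-weight samples (Eq. 3.21).
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)[:n_top]
    top_samples = [samples[i] for i in order]
    top_weights = [weights[i] for i in order]

    resampled = list(top_samples)  # every top sample appears at least once
    extra = n - n_top
    if sum(top_weights) > 0:
        resampled += rng.choices(top_samples, weights=top_weights, k=extra)
    else:
        resampled += rng.choices(top_samples, k=extra)  # degenerate weights
    return resampled
```

Unlike resampling over the full cumulative distribution, no sample outside the top fraction can survive, which is the property the text relies on to keep the particles on the target.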

3.5 Target update

A Gaussian mixture model (GMM) is applied to update the target model over time. $K$ Gaussian distributions are chosen to approximate any continuous […]

The idea of the GMM update algorithm is to update the color histogram of the target model. Each bin $q^{(u)}$ is modeled by $K = 3$ Gaussian distributions. The mean $\mu_k$, standard deviation $\sigma_k$, and weight $\pi_k$ are initialized as $\mu_k = q^{(u)}$, $\sigma_k = 1$, and $\pi_k = \frac{1}{K}$, where $k = 1, \dots, K$.

1. Sort $\{\pi_k\}_{k=1 \dots K}$ in descending order to obtain $\{\pi_a, \pi_b, \pi_c\}$ with $\pi_a \geq \pi_b \geq \pi_c$.

2. Update the bin's value by the following equation.

$$q'^{(u)} = A\mu_a + B\mu_b + C\mu_c \qquad (3.23)$$

where $A = 0.6$, $B = 0.25$, $C = 0.15$, and $a$, $b$, $c$ follow the descending order of step 1.

3. If the difference between the previous and current frame's $q^{(u)}$ is smaller than a threshold, find the first Gaussian distribution that satisfies Eq. 3.24.

$$|q^{(u)} - \mu_k| < 3\sigma_k \qquad (3.24)$$

Steps 1 to 3 produce the updated target model $q' = \{q'^{(u)}\}_{u=1 \dots m}$.
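The two computations that are fully specified above, the bin update of Eq. 3.23 and the matching test of Eq. 3.24, can be sketched as follows. How the matched Gaussian's parameters are subsequently updated is not given in this excerpt and is therefore left out. Representing each component as a `(mean, std, weight)` tuple and searching the components in their stored order are assumptions.

```python
from typing import Optional, Sequence, Tuple

Component = Tuple[float, float, float]  # (mean mu_k, std sigma_k, weight pi_k)
MIX = (0.6, 0.25, 0.15)                 # coefficients A, B, C of Eq. 3.23


def updated_bin(components: Sequence[Component]) -> float:
    """Eq. 3.23: blend the means, ranked by descending component weight."""
    ranked = sorted(components, key=lambda c: c[2], reverse=True)
    return sum(coef * mu for coef, (mu, _sigma, _pi) in zip(MIX, ranked))


def first_match(q_u: float, components: Sequence[Component]) -> Optional[int]:
    """Eq. 3.24: index of the first component within 3 sigma of q_u."""
    for k, (mu, sigma, _pi) in enumerate(components):
        if abs(q_u - mu) < 3.0 * sigma:
            return k
    return None  # no component matches the current bin value
```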

3.6 Occlusion handler

Normally, the particle filter can handle some occlusion conditions, but this depends on how widely the samples are spread in space. If some samples fall at the location where the human reappears after the occlusion, the human can still be tracked continuously. On the other hand, if an occlusion happens and the spread of the samples is too small to cover the region where the human reappears, resampling leads to track loss.

The occlusion handler in our work is based on the color similarity between the target and candidate models. The details are described below.

1. Create the candidate model $c = \{c^{(u)}\}_{u=1 \dots m}$ from the ROI in the current frame.

2. Compute the similarity value between the target model $q' = \{q'^{(u)}\}_{u=1 \dots m}$ and the candidate model $c = \{c^{(u)}\}_{u=1 \dots m}$.

3. If $similarity < ths_{sim}$, skip the resample step, and assume the candidate model is occluded by another object.

4. Increment the counter: $Count = Count + 1$.

5. During tracking, steps 1 to 4 are iterated until the tracked human reappears (similarity value larger than $ths_{sim}$) or $Count \geq 10$, which prevents the samples from spreading out of the image.

6. Then the resample step is restarted.
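The decision logic of steps 3 to 6 amounts to a small counter-based gate in front of the resample step, sketched below. The class and method names are hypothetical; the default threshold of 0.7 is taken from the flow chart in Fig. 3-8 and the limit of 10 from step 5.

```python
class OcclusionHandler:
    """Decides, frame by frame, whether the resample step should run."""

    def __init__(self, ths_sim: float = 0.7, max_count: int = 10) -> None:
        self.ths_sim = ths_sim
        self.max_count = max_count
        self.count = 0

    def should_resample(self, similarity: float) -> bool:
        """Return False while the target is assumed to be occluded."""
        if similarity < self.ths_sim and self.count < self.max_count:
            self.count += 1   # step 4: still occluded, skip resampling
            return False
        self.count = 0        # target reappeared, or waited long enough
        return True           # step 6: resampling is restarted
```

Skipping the resample step lets the samples keep spreading through the dynamic model, and the counter bounds that spread so the samples do not drift out of the image.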

Fig. 3-7 Occlusion handler (a) frame 681 (b) frame 685 (c) frame 690 (d) frame 695

[Flow chart nodes: PF similarity → similarity < 0.7? → Count < 10? → Occlusion handler (Count++) / Weighted resampling (Count = 0) → End]

Fig. 3-8 Occlusion handler flow chart

Fig. 4-1 Active camera control through RS485

The active camera is controlled with the Pelco P protocol [34] through an RS-232 to RS-485 converter. The pan (horizontal) angle, tilt (vertical) angle, and zoom step have to be controlled to achieve tracking.

A Pelco P message consists of 8 bytes, in the format shown in Fig. 4-2. Byte 1 and byte 7 are the start and stop bytes, always set to 0xA0 and 0xAF, respectively. Byte 2 is the receiver (camera) address; since only one camera is used in this thesis, byte 2 is always set to 0x00. Bytes 3 to 6 are used to control pan-tilt-zoom (PTZ), as shown in Fig. 4-3. The last byte is an XOR checksum byte.

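Byte 1: start (0xA0) | Byte 2: address | Byte 3: data byte 1 | Byte 4: data byte 2 | Byte 5: data byte 3 | Byte 6: data byte 4 | Byte 7: stop (0xAF) | Byte 8: XOR checksum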

Fig. 4-2 Message format

            | Bit 7      | Bit 6     | Bit 5        | Bit 4         | Bit 3      | Bit 2     | Bit 1      | Bit 0
Data byte 1 | Fixed to 0 | Camera On | Auto Scan On | Camera On/Off | Iris Close | Iris Open | Focus Near | Focus Far
Data byte 2 | Fixed to 0 | Zoom Wide | Zoom Tele    | Tilt Down     | Tilt Up    | Pan Left  | Pan Right  | 0 (for pan/tilt)
Data byte 3 | Pan speed: 00 (stop) to 3F (high speed), and 40 for Turbo
Data byte 4 | Tilt speed: 00 (stop) to 3F (high speed)

Fig. 4-3 Data byte 1 to 4 format
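Building a message from the fields above can be sketched as follows. The bit masks follow the Data byte 2 row in Fig. 4-3; the function name is hypothetical, and computing the checksum as the XOR of the first seven bytes is an assumption, since the text only states that the last byte is an XOR checksum.

```python
STX, ETX = 0xA0, 0xAF  # start and stop bytes

# Bit masks for data byte 2 (bit 1 = Pan Right ... bit 6 = Zoom Wide).
PAN_RIGHT, PAN_LEFT = 0x02, 0x04
TILT_UP, TILT_DOWN = 0x08, 0x10
ZOOM_TELE, ZOOM_WIDE = 0x20, 0x40


def build_message(address: int, data1: int, data2: int,
                  pan_speed: int, tilt_speed: int) -> bytes:
    """Assemble an 8-byte message: STX, address, 4 data bytes, ETX, checksum."""
    if not 0 <= pan_speed <= 0x40:
        raise ValueError("pan speed must be 0x00..0x3F, or 0x40 for turbo")
    if not 0 <= tilt_speed <= 0x3F:
        raise ValueError("tilt speed must be 0x00..0x3F")
    body = [STX, address & 0xFF, data1 & 0x7F, data2 & 0x7F,
            pan_speed, tilt_speed, ETX]
    checksum = 0
    for byte in body:
        checksum ^= byte  # assumed: XOR over bytes 1 to 7
    return bytes(body + [checksum])


if __name__ == "__main__":
    # Pan right at medium speed on camera address 0x00.
    print(build_message(0x00, 0x00, PAN_RIGHT, 0x20, 0x00).hex(" "))
```

Direction bits can be combined with a bitwise OR, for example `PAN_LEFT | TILT_UP`, to move diagonally.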

In this thesis, the image is divided into 9 regions associated with the pan-tilt directions, so that the moving object is kept in the center of the FOV. Every region has a specific direction, as shown in Fig. 4-4. If the target is located in the stop region, the camera is stopped; in the other regions, the camera speed is determined by a PID controller. Zoom-in or zoom-out is activated if the target's size becomes smaller or larger than the user-defined size, respectively. The details of camera control are shown in Fig. 4-5.

Fig. 4-4 Control direction for each region
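A sketch of the region lookup is given below. The exact region boundaries are not stated in the text, so splitting the image into equal thirds is an assumption, as are the image origin at the top-left corner and the function name.

```python
from typing import Optional, Tuple


def region_direction(tx: float, ty: float, width: int,
                     height: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (pan, tilt) directions for a target at pixel (tx, ty).

    (None, None) corresponds to the central stop region.
    """
    col = min(int(3 * tx / width), 2)   # 0 = left, 1 = center, 2 = right
    row = min(int(3 * ty / height), 2)  # 0 = top, 1 = middle, 2 = bottom
    pan = ("left", None, "right")[col]
    tilt = ("up", None, "down")[row]
    return pan, tilt
```

The camera is turned toward the side of the image where the target lies, which moves the target back toward the center of the FOV.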

[Flow chart nodes: Tracking target; Target size > upper bound? (Zoom out command); Target size < lower bound? (Zoom in command); Update upper bound & lower bound & target size; Target position in center region? (Stop command); PID control; Pan / Tilt command; Send PTZ command]

Fig. 4-5 Camera control flow chart

4.1 PID controller

A proportional-integral-derivative (PID) controller is a generic control-loop feedback mechanism widely used in industrial control systems [35]. The diagram of the PID controller is shown in Fig. 4-6.

Fig. 4-6 The PID controller

In a PID control system, the monitored plant/process is meant to be kept in an ideal state. The measured value of the plant/process, $y(t)$, is sent to the comparator and compared with the setting value $u(t)$. If the plant/process is affected by a disturbance, the measured value differs from the setting value and the comparator produces an error signal $e(t)$, which is sent to the controller. The controller produces the output signal $C_{out}$ to correct the plant/process and return it to the ideal state.

The output signal $C_{out}$ is defined by the following equation.

$$C_{out} = K_p e(t) + K_I \int e(t)\,dt + K_D \frac{de(t)}{dt} \qquad (4.1)$$

where $K_p$ is the proportional constant, $K_I$ is the integral constant, and $K_D$ is the derivative constant.

The controller consists of a proportional controller (P controller), an integral controller (I controller), and a derivative controller (D controller).

1. P controller: the error signal $e(t)$ multiplied by $K_p$. It can correct a disturbed plant/process, but a small steady-state error remains that it cannot remove.

2. I controller: the integral of the error signal $e(t)$ over time; in other words, it accumulates the error over the time it has persisted. It corrects the small steady-state error that the P controller cannot overcome, using the accumulated integral to bring the disturbed plant/process back to the setting value $u(t)$.

3. D controller: the derivative of the error signal $e(t)$. This gives the system an anticipatory view, so it can predict large variations of the plant/process.

The corresponding variables of the PID controller in our work are defined as follows:

Setting value $u(t)$: the center position of the image.

Error signal $e(t)$: the difference between the center position and the target position.

Measured value $y(t)$: the target position estimated by the tracking system.

Output signal $C_{out}$: the output, which is converted to the pan / tilt speed.
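A discrete-time version of Eq. 4.1 with these variables could look like the sketch below. The rectangular integration, the backward-difference derivative and the unit time step `dt` are assumptions; the class name is illustrative.

```python
from typing import Optional


class PID:
    """Discrete PID controller: C_out = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float = 1.0) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float) -> float:
        """One control step: setpoint is u(t), measured is y(t)."""
        error = setpoint - measured          # e(t)
        self.integral += error * self.dt     # running integral of e(t)
        if self.prev_error is None:
            derivative = 0.0                 # no history on the first step
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For tracking, one instance would be fed the horizontal image center and the target's x position, and a second instance the vertical center and the target's y position.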

We use two independent PID controllers, one for the horizontal and one for the vertical position difference, to estimate the pan and tilt speeds. $C_{out}$ is converted to the pan speed and tilt speed by Eq. 4.2 and Eq. 4.3.

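$$Speed_{pan} = C_{out} \cdot 0.1 + offset_{pan} \qquad (4.2)$$

$$Speed_{tilt} = C_{out} \cdot 0.1 + offset_{tilt} \qquad (4.3)$$

$$offset_{pan} = \begin{cases} offset_{pan}, & C_{out} \geq 0 \\ -offset_{pan}, & C_{out} < 0 \end{cases} \qquad (4.4)$$

$$offset_{tilt} = \begin{cases} offset_{tilt}, & C_{out} \geq 0 \\ -offset_{tilt}, & C_{out} < 0 \end{cases} \qquad (4.5)$$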

where $offset_{pan}$ and $offset_{tilt}$ are defined by the user. These values are related to the pan-tilt speed range given in the camera's specifications (0 to 64). Eq. 4.2 and Eq. 4.3 keep the speed produced by the PID controller within a suitable range, because if the speed is set too high, the camera may overshoot the object and tracking may be lost.
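The conversion in Eq. 4.2 to Eq. 4.5 can be sketched as below. Returning the speed as a magnitude plus a direction flag, rounding to an integer, and clamping to 0x3F (the highest non-turbo speed) are assumptions added so the result fits a speed byte; they are not stated in the text.

```python
from typing import Tuple


def to_speed(c_out: float, offset: float,
             max_speed: int = 0x3F) -> Tuple[int, bool]:
    """Convert a PID output to (speed magnitude, positive-direction flag)."""
    signed_offset = offset if c_out >= 0 else -offset  # Eq. 4.4 / Eq. 4.5
    speed = c_out * 0.1 + signed_offset                # Eq. 4.2 / Eq. 4.3
    magnitude = min(round(abs(speed)), max_speed)      # assumed clamp
    return magnitude, speed >= 0
```

With the offsets used in Section 5.2, `to_speed(c_out, 12)` would serve the pan axis and `to_speed(c_out, 6)` the tilt axis; the flag then selects between the two direction bits of the corresponding axis.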

[…] new width and height are defined as follows.

$$ratio_{w/h} = \frac{w_{initial}}{h_{initial}}$$

Chapter 5 Experimental results

The system was implemented on a PC with an Intel® Core™ i5 CPU 650 @ 3.20GHz and 4GB RAM, and developed in Borland C++ Builder 6.0 on Windows 7. It has been tested in several environments to verify its performance and stability, using both video files (uncompressed AVI format) and image sequences from the active camera.

5.1 Track on video file

Three videos were used to verify the tracking system, with the particle filter parameters set as follows.

Number of samples 𝑁 = 30

Number of bins in histogram 𝑚 = 6 ∗ 6 ∗ 6 = 216

State covariance (𝜎_𝑥, 𝜎_𝑣_𝑥, 𝜎_𝑦, 𝜎_𝑣_𝑦, 𝜎_𝑤, 𝜎_ℎ) = (2,0.5,2,0.5,0.4,0.8)

1. Video 1 is used to verify the occlusion handler in our system, as shown in Fig. 5-1 and Fig. 5-2. Fig. 5-1 shows the tracking result without the occlusion handler. Full occlusion happens in frame 685. If the particle filter resamples during the full occlusion, it may resample at incorrect positions, as shown in frame 689, and tracking is lost in frames 694 and 698. In contrast, when full occlusion happens in the particle filter with the occlusion handler, the resample step is not performed immediately, so the sample set keeps a wide spread and tracking continues after the full occlusion.

Fig. 5-1 Tracking without occlusion handler (frames 558, 673, 685, 689, 694, and 698)

Fig. 5-2 Tracking with occlusion handler (frames 558, 673, 685, 689, 694, and 698)

2. Video 2 is used to verify the tracking feature. Fig. 5-3 shows a human wearing a black jacket walking near a black chair. In this case, the target human has a color feature similar to the black chair, but the proposed system can still track the target human.

Fig. 5-3 Object with a color similar to the target human (frames 193, 258, 292, 351, 372, and 398)

3. Video 3 is used to verify the tracking performance in a complex situation. Fig. 5-4 shows the target human partially occluded by a chair. The target human sits down and stands up, as shown in Fig. 5-4 (b). Moreover, the target human is partially occluded by another human, as shown in Fig. 5-4 (c).

Fig. 5-4 (a) frame 1436, 1547 and 1605 (b) frame 2152, 2214 and 2277 (c) frame 2757, 2818 and 2838

5.2 Track on active camera

The active camera is set up in our laboratory. The environment is complex enough to verify the system's detection and tracking of a moving human. The parameters of the particle filter and PTZ control are set as follows:

Number of samples 𝑁 = 30

Number of bins in histogram 𝑚 = 6 ∗ 6 ∗ 6 = 216

State covariance $(\sigma_x, \sigma_{v_x}, \sigma_y, \sigma_{v_y}, \sigma_w, \sigma_h) = (10, 1, 10, 1, 1, 2)$

$offset_{pan} = 12$

$offset_{tilt} = 6$

Proportional constant $K_p = 0.9$

Integral constant $K_I = 0.1$

Derivative constant $K_D = 0.15$

$rate_{big} = 1.1$

$rate_{small} = 0.9$

1. Tracking by controlling pan and tilt only. The experimental result shows that the target human remains within the camera's FOV most of the time, no matter how he walks.

Fig. 5-5 Update pan and tilt command to track

2. Tracking to test zoom-in/out. In this case, the target human walks away from or approaches the camera.

Fig. 5-6 The effects of zoom in / out

Fig. 5-6 (a) shows that the target human has been detected and $Zoom_{layer}$ is initialized to 0. When a zoom-in happens, $Zoom_{layer}$ is increased by 1; when a zoom-out happens, it is decreased by 1. The values of $Zoom_{layer}$ are shown in Table 5-1.

Table 5-1 Zoom layer values in Fig. 5-6

Panel      | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i)
Zoom_layer | 0   | 0   | 1   | 2   | 1   | 2   | 3   | 2   | 1

3. Tracking by controlling pan, tilt, and zoom, with target human freely walking in the environment.

Fig. 5-7 Combination of pan, tilt and zoom in / out

Table 5-2 Zoom layer values in Fig. 5-7

Panel      | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i) | (j) | (k) | (l)
Zoom_layer | 0   | 0   | 1   | 1   | 1   | 0   | 0   | 1   | 2   | 1   | 0   | 0

4. Tracking a target human while more than one person is walking in the same environment.

Fig. 5-8 Human tracking in multiple objects

5. Tracking a target human while more than one person is walking in the same environment.

Fig. 5-9 Human tracking in multiple objects

Chapter 6 Conclusions and future work

6.1 Conclusions

The experimental results show that the proposed system can track a moving human with the particle filter algorithm on an active camera. The tracking system is also able to track the target human when more than one person is walking in the same environment. Moreover, zoom-in/out adjusts the image resolution of the tracked human.

There are several contributions in this research:

1. Our system can distinguish humans from nonhuman objects.

2. Weighted resampling helps the particle filter preserve the samples with high weights.

3. The occlusion handler can handle temporary full occlusion.

4. It can track the target human smoothly by using the PID controller to determine the […]
