Chapter 2 Human detection
2.2 Codebook matching
[Flow chart: Moving object → Normalize → Shape feature extraction → Histogram feature extraction → Feature matching with codebook → Human / Nonhuman]
Fig. 2-3 Codebook matching
The codebook matching algorithm in this thesis is based on human-shape information. First, the moving object obtained from moving object extraction is normalized to 20×40 pixels. Shape feature extraction then locates the shape (boundary) pixels in the image, marked by red dots in Fig. 2-4. Ten Y-axis coordinates are chosen on each of the leftmost and rightmost sides of the object's boundary, and the X-axis coordinates of the resulting twenty boundary points are arranged as a feature vector, shown as blue blocks in Fig. 2-4.
The X-axis projection computes a histogram of the pixel values along the Y-axis. The histogram has ten bins, shown as green blocks in Fig. 2-4. Therefore, an object is represented by a 30-dimensional feature vector in our work.
Fig. 2-4 Feature word X
By observation, the shape pixels near the top and bottom of the Y-axis are not suitable as feature points, because their positions vary considerably. To find the ten specific Y-axis coordinates, the standard deviation of the boundary position is computed at each Y-axis value over the training samples, and the ten values with the lowest standard deviation are chosen on each side.
The feature vector is matched against the code vectors in the codebook, which is a list of feature vectors. The purpose of the matching process is to find the code vector with the minimum distortion relative to the feature vector of the object. To describe how the codebook is used to distinguish humans from other objects, some variables must first be defined. Let the feature vector be denoted by $X$, with $M$ dimensions $X_0, \dots, X_i, \dots, X_{M-1}$. The codebook $C$ contains $N$ code words $V_0, \dots, V_j, \dots, V_{N-1}$. Each $V_j$, like $X$, has $M$ dimensions $V_{j0}, \dots, V_{ji}, \dots, V_{j(M-1)}$. The distortion between the feature word and the code words is defined in Eq. 2.4.
$Dis_j = \lVert X - V_j \rVert = \sum_{i=0}^{M-1} \lvert X_i - V_{ji} \rvert$ (2.4)
$Dis_{min} = \min_j (Dis_j), \quad j = 0, \dots, N-1$ (2.5)
With these definitions, the feature vector of the normalized moving object is compared with every $V_j$ in the codebook $C$. If $Dis_{min}$ is smaller than a user-defined threshold, the moving object with feature word $X$ is considered human; otherwise, it is a nonhuman object. The comparison of $X$ with $V_j$ is illustrated in Fig. 2-5.
Fig. 2-5 The procedure of the comparison with the codebook
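The matching rule of Eq. 2.4 and Eq. 2.5 can be sketched in a few lines of Python. This is only a minimal illustration, not the thesis implementation: the function name `match_codebook`, the use of NumPy, and the example threshold are assumptions.

```python
import numpy as np

def match_codebook(x, codebook, threshold):
    """Return (is_human, best_index, min_distortion) for feature word x.

    x        : feature word X, shape (M,)
    codebook : code words V_j stacked row-wise, shape (N, M)
    threshold: user-defined distortion threshold (hypothetical value in the example)
    """
    x = np.asarray(x, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dis = np.abs(codebook - x).sum(axis=1)   # Eq. 2.4: L1 distortion per code word
    j = int(np.argmin(dis))                  # Eq. 2.5: index of the minimum distortion
    return bool(dis[j] < threshold), j, float(dis[j])

# Example with a toy 3-dimensional codebook (the thesis uses M = 30).
is_human, j, dis_min = match_codebook([1, 2, 4], [[0, 0, 0], [1, 2, 3]], threshold=2)
```

In the example, the second code word differs from the feature word in one coordinate only, so it gives the minimum distortion and the object is accepted.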
Chapter 3 Human tracking
A human tracking system based on the particle filter algorithm is proposed in this thesis. The key idea of the particle filter is to approximate a probability distribution by a weighted sample set, in which each sample represents one hypothetical state of the object with a corresponding discrete sampling probability [25]. The original resample method selects samples using uniformly distributed random numbers, so it usually keeps some low-weight samples, which may decrease tracking accuracy. In this thesis, we propose a weighted resampling method that selects only high-weight samples.
3.1 Histogram color features
Most feature-based object tracking methods use color as the feature, since color carries more information than gray scale. In our experiments, the HSV color space performed well for tracking, so we apply it in our tracking algorithm.
The main purpose of the HSV color space is to reduce the sensitivity to illumination that is inherent in the RGB color space. Fig. 3-1 (a) and (b) show the RGB and HSV color spaces, respectively. The HSV model, also known as the HSB model, was created in 1978 by Alvy Ray Smith. It is a nonlinear transformation of the RGB color space and defines a color in terms of three components: hue, saturation, and value, described below [31]:
1. Hue: the color type, ranging from 0 to 360 degrees. Each value corresponds to one color; for example, 0 is red, 30 is orange, and 60 is yellow. 360 degrees is equivalent to 0 degrees.
2. Saturation: the intensity of the color, ranging from 0% to 100%. 0 means no color, that is, only gray values between black and white. 100 means the most intense, fully saturated color.
3. Value: the brightness of the color, also ranging from 0% to 100%. 0 is always black. Depending on the saturation, 100 may be white or a more or less saturated color.
Fig. 3-1 (a) RGB color model [31] (b) HSV color model [32]
The transformation from RGB color model to HSV color model is written in the following equation.
[…] respectively. In general, each color channel holds 8 bits of data, which yields a color histogram of 256×256×256 bins. To keep the histogram compact, each channel is quantized into 6 levels, so the color histogram has 6×6×6 = 216 bins in total.
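As a rough illustration of the 6×6×6 quantization, the following Python sketch converts RGB pixels to HSV with the standard-library `colorsys` module and accumulates a 216-bin histogram. The function name `hsv_histogram` and the bin-index layout (hue as the most significant digit) are assumptions made for the example; the thesis does not specify them.

```python
import colorsys
import numpy as np

def hsv_histogram(rgb_pixels, bins=6):
    """Normalized HSV histogram of an (n, 3) array of 8-bit RGB pixels."""
    hist = np.zeros(bins ** 3)
    for r, g, b in np.asarray(rgb_pixels, dtype=float) / 255.0:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)   # each component lies in [0, 1]
        # Map each channel to a level 0..bins-1 (a value of 1.0 falls in the last level).
        hq, sq, vq = (min(int(c * bins), bins - 1) for c in (h, s, v))
        hist[(hq * bins + sq) * bins + vq] += 1  # assumed bin layout
    return hist / max(hist.sum(), 1)

# One pure red and one black pixel land in two different bins.
hist = hsv_histogram([[255, 0, 0], [0, 0, 0]])
```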
3.2 Kernel function
A kernel function is used to weight the pixels that represent a target object. Different kernels can be adopted for both the target and the candidate model, such as the Gaussian, flat, and Epanechnikov kernels. Let x be the normalized pixel location in the region defined as the target model; the Gaussian, flat, and Epanechnikov kernels [33] are then defined as follows.
1. Gaussian kernel
Fig. 3-2 (a) and (c) show that the Gaussian and Epanechnikov kernels have similar distributions: both take their highest value at the center. In the ROI of the target model in Fig. 3-3 (a), the pixels closer to the center of the ROI contain more important information, while background pixels lie mostly near the ROI's boundary. Therefore, the Gaussian and Epanechnikov kernels de-emphasize the boundary information, and the accuracy is higher than with the flat kernel.
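The difference between the three kernels can be seen numerically with a short Python sketch. The profiles below are commonly used textbook forms, written up to a normalization constant; they are assumptions for illustration and are not necessarily the exact definitions used in this thesis. The parameter `sigma` is hypothetical.

```python
import numpy as np

def gaussian_kernel(r, sigma=0.5):
    """Gaussian profile of the normalized distance r (sigma is an assumed width)."""
    return np.exp(-0.5 * (np.asarray(r, dtype=float) / sigma) ** 2)

def flat_kernel(r):
    """Flat profile: every pixel inside the unit radius gets the same weight."""
    return np.where(np.asarray(r, dtype=float) <= 1.0, 1.0, 0.0)

def epanechnikov_kernel(r):
    """Epanechnikov profile: weight falls off quadratically to zero at r = 1."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= 1.0, 1.0 - r ** 2, 0.0)

r = np.linspace(0.0, 1.0, 5)            # from the ROI center to its boundary
weights = {k.__name__: k(r) for k in (gaussian_kernel, flat_kernel, epanechnikov_kernel)}
```

With these profiles, the Gaussian and Epanechnikov weights shrink toward the boundary, while the flat kernel weights boundary pixels as much as central ones.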
Fig. 3-2 (a) Gaussian kernel (b) flat kernel (c) Epanechnikov kernel
Fig. 3-3 (a) Target object (b) Kernel function (c) Target object and Kernel function
3.3 Particle filter algorithm
The particle filter provides a robust tracking framework because it models uncertainty: it keeps its options open and considers multiple state hypotheses simultaneously. Since less likely object states can temporarily remain in the tracking process, particle filters can deal with short-lived occlusions [25].
The target model at location $y$ is defined as an $m$-bin histogram $q_y = \{q_y(u)\}_{u=1 \dots m}$, computed by the equation below.
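$q_y(u) = f \sum_{i=1}^{I} k\left(\frac{\lVert y - x_i \rVert}{a}\right) \delta[h(x_i) - u]$ (3.7)
with the normalization factor
$f = \frac{1}{\sum_{i=1}^{I} k\left(\frac{\lVert y - x_i \rVert}{a}\right)}$ (3.8)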
where $I$ denotes the number of pixels in the ROI, $\delta$ is the Kronecker delta function, $h(x_i)$ maps the color of pixel $x_i$ to its histogram bin, and $a = \sqrt{w^2 + h^2}$ normalizes the size of the object region. The sample model $p_y = \{p_y(u)\}_{u=1 \dots m}$ is built in the same way as the target model.
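A kernel-weighted histogram in the sense of Eq. 3.7 and Eq. 3.8 could be computed as in the following Python sketch. The function name `kernel_histogram`, the 0-based bin indices, and the choice of an Epanechnikov-style profile for $k$ are assumptions for illustration only.

```python
import numpy as np

def kernel_histogram(pixel_xy, pixel_bins, center, w, h, m=216):
    """Kernel-weighted m-bin histogram of an ROI centered at `center`.

    pixel_xy  : (I, 2) pixel locations x_i
    pixel_bins: (I,) bin index h(x_i), here 0..m-1 instead of 1..m
    w, h      : width and height of the ROI
    """
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    pixel_bins = np.asarray(pixel_bins, dtype=int)
    a = np.hypot(w, h)                                   # a = sqrt(w^2 + h^2)
    r = np.linalg.norm(pixel_xy - np.asarray(center, dtype=float), axis=1) / a
    k = np.where(r <= 1.0, 1.0 - r ** 2, 0.0)            # assumed kernel profile
    q = np.bincount(pixel_bins, weights=k, minlength=m)  # sum_i k(.) * delta[h(x_i) - u]
    return q / k.sum()                                   # multiply by f (Eq. 3.8)

# Two pixels: one at the center (bin 0) and one halfway to the normalized radius (bin 1).
q = kernel_histogram([[0, 0], [3, 4]], [0, 1], center=(0, 0), w=6, h=8)
```

The pixel at the center contributes more to its bin than the off-center pixel, and the histogram sums to 1 because of the factor $f$.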
The similarity $\rho$ between the target and sample models is the Bhattacharyya coefficient (Eq. 3.10), from which the Bhattacharyya distance $d$ follows (Eq. 3.11). A larger $\rho$ means the two models are more similar; $\rho$ equals 1 when the two histograms are identical.
$p_y(u) = f \sum_{i=1}^{I} k\left(\frac{\lVert y - x_i \rVert}{a}\right) \delta[h(x_i) - u]$ (3.9)
$\rho[p, q] = \sum_{u=1}^{m} \sqrt{p(u)\,q(u)}$ (3.10)
$d = \sqrt{1 - \rho[p, q]}$ (3.11)
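Eq. 3.10 and Eq. 3.11 translate directly into code. The following Python sketch is a minimal illustration; the function name `bhattacharyya` is an assumption, and both histograms are assumed to be normalized so that they sum to 1.

```python
import numpy as np

def bhattacharyya(p, q):
    """Return (rho, d): Bhattacharyya coefficient (Eq. 3.10) and distance (Eq. 3.11)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    rho = float(np.sum(np.sqrt(p * q)))
    d = float(np.sqrt(max(0.0, 1.0 - rho)))   # clamp guards against rounding above 1
    return rho, d

same = bhattacharyya([0.5, 0.5], [0.5, 0.5])      # identical histograms
disjoint = bhattacharyya([1.0, 0.0], [0.0, 1.0])  # no overlap at all
```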
In the particle filter algorithm, the target model is represented by a state vector $s_{target}$.
$s_{target} = \{x, v_x, y, v_y, w, h\}$ (3.12)
where $(x, y)$ specify the center position of the ROI, $(v_x, v_y)$ the object's motion, and $w$ and $h$ denote the width and height of the ROI, respectively.
The initial sample set $S_{initial} = \{s^{(n)}\}_{n=1 \dots N}$ is computed by
$s^{(n)} = I s_{target} + r.v.$ (3.13)
where $N$ is the number of samples, $I$ is an identity matrix, and $r.v.$ is a multivariate Gaussian random variable.
The sample set is propagated through a dynamic model given by the following equation.
$s_t = A s_{t-1} + r.v._{t-1}$ (3.14)
where $A$ defines the deterministic component of the model. Using each sample's weight and state vector, the target human's position and size are obtained from the estimated state vector given by the following equation. […] The appropriate moment for the resample step is determined by the following equations.
$N_{eff} < N_{ths}$ (3.17)
$N_{eff} = \frac{1}{\sum_{n=1}^{N} \left(\omega_t^{(n)}\right)^2}$ (3.18)
$N_{ths} = rate \cdot N$ (3.19)
where $rate \in (0,1)$, and $N_{eff}$ and $N_{ths}$ are the effective number of samples and the given sample threshold, respectively. During the resample step, samples with high weights may be chosen several times, leading to identical copies, while others with relatively low weights may not be chosen at all.
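The resampling criterion of Eq. 3.17 to Eq. 3.19 can be checked with a short Python sketch. The function name `needs_resampling` and the default `rate=0.5` are assumptions for illustration; the thesis only requires $rate \in (0,1)$.

```python
import numpy as np

def needs_resampling(weights, rate=0.5):
    """Return (resample?, N_eff) for a set of sample weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # weights are normalized before use
    n_eff = 1.0 / np.sum(w ** 2)       # Eq. 3.18
    n_ths = rate * len(w)              # Eq. 3.19
    return bool(n_eff < n_ths), float(n_eff)

uniform = needs_resampling([0.25, 0.25, 0.25, 0.25])   # N_eff equals N, no resampling
degenerate = needs_resampling([1.0, 0.0, 0.0, 0.0])    # N_eff equals 1, resample
```

When all weights are equal, $N_{eff}$ equals $N$; when a single sample carries all the weight, $N_{eff}$ drops to 1 and the resample step is triggered.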
Given a sample set 𝑆𝑡−1 and the target model q, for the first iteration, 𝑆𝑡−1 is
2. Observe the color distributions:
(a) calculate the color distribution $p_{s_t^{(n)}}(u) = f \sum_{i=1}^{I} k\left(\frac{\lVert s_t^{(n)} - x_i \rVert}{a}\right) \delta[h(x_i) - u]$ for each sample in the set $S_t$
(b) calculate the Bhattacharyya coefficient for each sample of the set $S_t$: $\rho[p_s$ […]
(a) calculate the normalized cumulative probabilities $c'_t$
$c_t^{(0)} = 0$
$c_t^{(n)} = c_t^{(n-1)} + \omega_t^{(n)}$
$c_t'^{(n)} = \frac{c_t^{(n)}}{c_t^{(N)}}$
(b) generate a uniformly distributed random number $r \in [0,1]$
(c) use binary search to find the smallest $j$ for which $c_t'^{(j)} \ge r$
(d) set $s_t'^{(n)} = s_t^{(j)}$
Finally resample by $S_t = S'_t$
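The resampling steps (a) to (d) above can be sketched with the Python standard library. This is an illustrative sketch rather than the thesis code: the function name `resample` is an assumption, indices are 0-based, and `bisect.bisect_left` plays the role of the binary search.

```python
import bisect
import random

def resample(samples, weights, rng=random):
    """Draw len(samples) new samples according to the normalized cumulative weights."""
    cumulative, total = [], 0.0
    for w in weights:                     # c(n) = c(n-1) + w(n)
        total += w
        cumulative.append(total)
    c_norm = [c / total for c in cumulative]   # c'(n) = c(n) / c(N)
    new_samples = []
    for _ in samples:
        r = rng.random()                       # uniform random number in [0, 1)
        j = bisect.bisect_left(c_norm, r)      # smallest j with c'(j) >= r
        new_samples.append(samples[j])
    return new_samples

# Only "b" has a non-zero weight, so it is copied into every slot.
picked = resample(["a", "b", "c"], [0.0, 1.0, 0.0], rng=random.Random(0))
```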
Fig. 3-4 Weighted resampling particle filter
Our proposed tracking method is shown in Fig. 3-4. It differs from the original particle filter in two respects: weighted resampling and the occlusion handler. Both are explained in the following sections.
3.4 Weighted resampling algorithm
The original resample step in the particle filter selects samples randomly. Samples with high weights may be chosen several times, leading to identical copies, but some samples with relatively low weights are also selected in the resample step.
Fig. 3-5 Original resampling
Fig. 3-5 shows that the sample points with high weights are in the ROI (green block), while samples with relatively low weights are in the red block. Although the two blocks have nearly the same similarity value, the actual target object is in the green block. Consequently, the tracker may follow a different object as the target; in other words, tracking accuracy decreases. We therefore propose a weighted resampling algorithm to address this problem. First, choose the top sample set $S_t^{top}$, containing the $N_{top}$ samples with the highest weights, from set $S_t$.
$N_{top} = top \cdot N$ (3.20)
$S_t^{top} = \{s_{top}^{(n)}\}_{n=1 \dots N_{top}}$ (3.21)
where $top$ is the top rate, set to 0.2, so $S_t^{top}$ contains only the samples with the top 20% of weights from set $S_t$. $N$ samples are then reproduced in $S_t$ according to the weights of $s_{top}^{(n)}$. In this step, an $s_{top}^{(n)}$ with a relatively high weight is reproduced more times in $S_t$, while those with relatively low weights are reproduced at least once. Fig. 3-6 shows the weighted resampling result: most of the sample points lie in the green block, that is, in the target object region.
Fig. 3-6 Weighted resample
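One possible reading of the weighted resampling step is sketched below in Python. The thesis does not state exactly how the $N$ copies are distributed among the top samples, so the allocation used here (one guaranteed copy per top sample, the remaining copies drawn in proportion to weight) is an assumption, as are the function name `weighted_resample` and the uniform weights returned after resampling.

```python
import numpy as np

def weighted_resample(samples, weights, top=0.2, rng=None):
    """Rebuild an N-sample set from the top `top` fraction of samples by weight."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    n_top = max(1, int(round(top * n)))              # Eq. 3.20
    top_idx = np.argsort(weights)[::-1][:n_top]      # indices of S_t^top (Eq. 3.21)
    w_top = weights[top_idx] / weights[top_idx].sum()
    # Assumed allocation: every top sample is kept once, the rest follow the weights.
    extra = rng.choice(n_top, size=n - n_top, p=w_top)
    picks = np.concatenate([np.arange(n_top), extra])
    return samples[top_idx[picks]], np.full(n, 1.0 / n)

states = np.arange(10, dtype=float).reshape(10, 1)   # ten toy one-dimensional states
new_states, new_weights = weighted_resample(states, np.arange(1, 11),
                                            rng=np.random.default_rng(0))
```

In the example, only the two highest-weighted states survive, and each of them appears at least once in the new set.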
3.5 Target update
A Gaussian mixture model (GMM) is applied to update the target model over time. Let $K$ Gaussian distributions be chosen to approximate any continuous
The idea of the GMM update algorithm is to update the color histogram of the target model. Each bin $q(u)$ is modeled by $K = 3$ Gaussian distributions. The mean $\mu_k$, standard deviation $\sigma_k$, and weight $\pi_k$ are initialized as $\mu_k = q(u)$, $\sigma_k = 1$, and $\pi_k = \frac{1}{K}$, where $k = 1, \dots, K$.
1. Sort $\{\pi_k\}_{k=1 \dots K}$ in descending order to obtain the order $\{\pi_a, \pi_b, \pi_c\}$ with $\pi_a \ge \pi_b \ge \pi_c$.
2. Update the bin's value by the following equation.
$q'(u) = A\mu_a + B\mu_b + C\mu_c$ (3.23)
where $A = 0.6$, $B = 0.25$, $C = 0.15$, and $a$, $b$, $c$ follow the descending order from step 1.
3. If the difference between the previous and current frame's $q(u)$ is smaller than a threshold, find the first Gaussian distribution that satisfies Eq. 3.24.
$\lvert q(u) - \mu_k \rvert < 3\sigma_k$ (3.24)
Steps 1 to 3 produce the updated target model $q' = \{q'(u)\}_{u=1 \dots m}$.
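The parts of the update that are fully specified above, namely the weighted combination of Eq. 3.23 and the matching test of Eq. 3.24, can be sketched in Python as follows. The function names are assumptions, the sketch handles one bin with $K = 3$ components, and the scan order used to find the "first" matching distribution (component index order) is also an assumption. The parameter update that follows a match is not reproduced here.

```python
import numpy as np

MIX = np.array([0.6, 0.25, 0.15])   # A, B, C from Eq. 3.23

def updated_bin(mu, pi):
    """Eq. 3.23: combine the means in descending order of their weights pi."""
    order = np.argsort(pi)[::-1]                      # indices a, b, c
    return float(MIX @ np.asarray(mu, dtype=float)[order])

def first_match(q_u, mu, sigma):
    """Eq. 3.24: index of the first Gaussian within 3 sigma of q(u), or None."""
    for k, (m, s) in enumerate(zip(mu, sigma)):
        if abs(q_u - m) < 3 * s:
            return k
    return None

q_new = updated_bin(mu=[0.1, 0.2, 0.3], pi=[0.2, 0.5, 0.3])
k = first_match(0.5, mu=[5.0, 0.4, 0.0], sigma=[1.0, 0.1, 0.1])
```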
3.6 Occlusion handler
Normally, the particle filter can handle some occlusion conditions, but this depends on how widely the samples are spread in space. If some samples lie where the human reappears after the occlusion, the human can still be tracked continuously. If, on the other hand, occlusion occurs and the spread of the samples is too small to cover the region where the human reappears, resampling leads to track loss.
The occlusion handler in our work is based on the color similarity between the target and candidate models. The details are described below.
1. Create the candidate model $c = \{c(u)\}_{u=1 \dots m}$ from the ROI in the current frame.
2. Compute the similarity value between the target model $q' = \{q'(u)\}_{u=1 \dots m}$ and the candidate model $c = \{c(u)\}_{u=1 \dots m}$.
3. If $similarity < ths_{sim}$, do not perform the resample step; the candidate model is assumed to be occluded by another object.
4. Increment the counter: $Count = Count + 1$.
5. During tracking, steps 1 to 4 are repeated until the tracked human reappears (similarity value larger than $ths_{sim}$) or $Count \ge 10$, which prevents the samples from spreading out of the image.
6. Then the resample step is restarted.
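The per-frame decision made by the occlusion handler can be summarized in a few lines of Python. This is a sketch of the steps listed above, not the original code: the function name `occlusion_step` is an assumption, the threshold 0.7 and the reset of the counter to 0 are taken from the flow chart in Fig. 3-8, and the limit of 10 from step 5.

```python
def occlusion_step(similarity, count, ths_sim=0.7, max_count=10):
    """Return (do_resample, new_count) for the current frame."""
    if similarity < ths_sim and count < max_count:
        # Target is assumed to be occluded: skip resampling so the samples keep spreading.
        return False, count + 1
    # Target is visible again, or the counter ran out: resample and reset the counter.
    return True, 0

count = 0
decisions = []
for sim in (0.9, 0.4, 0.3, 0.8):        # hypothetical similarity values over four frames
    do_resample, count = occlusion_step(sim, count)
    decisions.append(do_resample)
```

For the hypothetical similarity sequence in the example, resampling is skipped in the two low-similarity frames and resumes once the similarity recovers.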
Fig. 3-7 Occlusion handler (a) frame 681 (b) frame 685 (c) frame 690 (d) frame 695
[Flow chart: compute PF similarity; if similarity < 0.7 and Count < 10, increment Count (occlusion handler); otherwise perform weighted resampling and set Count = 0; End]
Fig. 3-8 Occlusion handler flow chart
Fig. 4-1 Active camera control through RS485
The active camera is controlled by the Pelco P protocol [34] through an RS-232 to RS-485 converter. The pan (horizontal) angle, tilt (vertical) angle, and zoom step must be controlled to achieve tracking.
A Pelco P message consists of 8 bytes, with the message format shown in Fig. 4-2. Byte 1 and byte 7 are the start and stop bytes, always set to 0xA0 and 0xAF, respectively. Byte 2 is the receiver (camera) address; since only one camera is used in this thesis, byte 2 is always set to 0x00. Bytes 3 to 6 are the data bytes that control pan-tilt-zoom (PTZ), as shown in Fig. 4-3. The last byte is an XOR checksum byte.
[Message format: start byte (0xA0) | address | data byte 1 | data byte 2 | data byte 3 | data byte 4 | stop byte (0xAF) | checksum]
Fig. 4-2 Message format
            | Bit 7      | Bit 6     | Bit 5        | Bit 4         | Bit 3      | Bit 2     | Bit 1      | Bit 0
Data byte 1 | Fixed to 0 | Camera On | Auto Scan On | Camera On/Off | Iris Close | Iris Open | Focus Near | Focus Far
Data byte 2 | Fixed to 0 | Zoom Wide | Zoom Tele    | Tilt Down     | Tilt Up    | Pan Left  | Pan Right  | 0 (for pan/tilt)
Data byte 3 | Pan speed 00 (stop) to 3F (high speed) and 40 for Turbo
Data byte 4 | Tilt speed 00 (stop) to 3F (high speed)
Fig. 4-3 Data byte 1 to 4 format
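To make the message layout concrete, the following Python sketch assembles an 8-byte message from the fields described above. It is a hedged illustration only: the function name `pelco_p_message` and the constant names are assumptions, the bit masks are read off Fig. 4-3, and the checksum is assumed to be the XOR of the first seven bytes. Sending the bytes over the serial link is not shown.

```python
from functools import reduce

STX, ETX = 0xA0, 0xAF                     # start and stop bytes
# Bit masks for data byte 2, following Fig. 4-3 (bit 1 = Pan Right ... bit 6 = Zoom Wide).
PAN_RIGHT, PAN_LEFT = 0x02, 0x04
TILT_UP, TILT_DOWN = 0x08, 0x10
ZOOM_TELE, ZOOM_WIDE = 0x20, 0x40

def pelco_p_message(data2, pan_speed=0, tilt_speed=0, address=0x00, data1=0x00):
    """Build one 8-byte message; the checksum is assumed to be XOR of bytes 1 to 7."""
    if not 0 <= pan_speed <= 0x40 or not 0 <= tilt_speed <= 0x3F:
        raise ValueError("speed out of range")
    body = [STX, address, data1, data2, pan_speed, tilt_speed, ETX]
    checksum = reduce(lambda a, b: a ^ b, body)
    return bytes(body + [checksum])

# Pan right at speed 0x20 while tilting up at speed 0x10.
msg = pelco_p_message(PAN_RIGHT | TILT_UP, pan_speed=0x20, tilt_speed=0x10)
```

Direction bits can be combined with a bitwise OR, as in the example, so that pan and tilt are commanded in a single message.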
In this thesis, we divide the image into 9 regions associated with pan-tilt directions, so as to keep the moving object in the center of the FOV. Every region has a specific direction, as shown in Fig. 4-4. If the target is located in the stop region, the camera is set to stop. In the other regions, the camera speed is determined by a PID controller. Zoom-in or zoom-out is activated if the target's size becomes smaller or larger than the user-defined size. The details of the camera control are shown in Fig. 4-5.
[Diagram: image divided into 9 regions, each labeled with a pan-tilt direction, with STOP in the center region]
Fig. 4-4 Control direction for each region
[Flow chart: Tracking target; if target size > upper bound, zoom out command; if target size < lower bound, zoom in command; if target position in center region, stop command, otherwise PID control and pan/tilt command; update upper bound, lower bound, and target size; send PTZ command]
Fig. 4-5 Camera control flow chart
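The region-based decision can be sketched in Python as follows. The exact region boundaries are not given in the text, so this sketch assumes a grid of equal thirds and image coordinates whose y axis grows downward; the function names and the returned labels are likewise hypothetical.

```python
def region_direction(cx, cy, img_w, img_h):
    """Map the target center to a (pan, tilt) direction; (None, None) means stop."""
    col = min(int(3 * cx / img_w), 2)    # 0 = left, 1 = center, 2 = right
    row = min(int(3 * cy / img_h), 2)    # 0 = top, 1 = middle, 2 = bottom
    pan = ("left", None, "right")[col]
    tilt = ("up", None, "down")[row]
    return pan, tilt

def zoom_command(size, lower, upper):
    """Zoom out when the target is too large, zoom in when it is too small."""
    if size > upper:
        return "zoom_out"
    if size < lower:
        return "zoom_in"
    return None

direction = region_direction(10, 10, 640, 480)   # target in the top-left region
zoom = zoom_command(120, lower=80, upper=100)     # target larger than the upper bound
```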
4.1 PID controller
A proportional-integral-derivative controller (PID controller) is a generic control-loop feedback mechanism widely used in industrial control systems [35]. The diagram of the PID controller is shown in Fig. 4-6.
Fig. 4-6 The PID controller
In a PID control system, the monitored plant/process should be kept in an ideal state. The measured value of the plant/process, $y(t)$, is sent to the comparator and compared with the setting value $u(t)$. If the plant/process is affected by a disturbance, the measured value differs from the setting value and the comparator produces an error signal $e(t)$, which is sent to the controller. The controller produces the output signal $C_{out}$ to correct the plant/process and return it to the ideal state.
The output signal $C_{out}$ is defined by the following equation.
$C_{out} = K_p e(t) + K_I \int e(t)\,dt + K_D \frac{de(t)}{dt}$ (4.1)
where $K_p$ is the proportional constant, $K_I$ is the integral constant, and $K_D$ is the derivative constant.
The controller consists of a proportional controller (P controller), an integral controller (I controller), and a derivative controller (D controller).
1. P controller: the error signal $e(t)$ multiplied by $K_p$. A disturbed plant/process can be corrected by this controller, but a small steady-state error remains that it cannot remove.
2. I controller: the integral of the error signal $e(t)$ over time; in other words, the error multiplied by the time it has persisted. It corrects the small steady-state error that the P controller cannot overcome, using the accumulated integral to bring the disturbed plant/process back to the setting value $u(t)$.
3. D controller: the derivative of the error signal $e(t)$. This term lets the system anticipate changes and respond to a plant/process with large variation.
The corresponding variables of the PID controller in our work are defined as follows:
Setting value $u(t)$: the center position of the image.
Error signal $e(t)$: the difference between the center position and the target position.
Measured value $y(t)$: the target position estimated by the tracking system.
Output signal $C_{out}$: the output, which is converted to pan / tilt speed.
We use two independent PID controllers for the horizontal and vertical position differences to estimate the pan and tilt speeds. $C_{out}$ is converted to pan speed and tilt speed by Eq. 4.2 and Eq. 4.3.
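$Speed_{pan} = C_{out} \cdot 0.1 + offset_{pan}$ (4.2)
$Speed_{tilt} = C_{out} \cdot 0.1 + offset_{tilt}$ (4.3)
$offset_{pan} = \begin{cases} offset_{pan}, & C_{out} \ge 0 \\ -offset_{pan}, & C_{out} < 0 \end{cases}$ (4.4)
$offset_{tilt} = \begin{cases} offset_{tilt}, & C_{out} \ge 0 \\ -offset_{tilt}, & C_{out} < 0 \end{cases}$ (4.5)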
where $offset_{pan}$ and $offset_{tilt}$ are defined by the user. These values are related to the pan-tilt speed range given in the camera's specifications (0 to 64). Eq. 4.2 and Eq. 4.3 keep the speed value within a suitable range, because if the speed is set too high, the camera may overshoot the object and the track may be lost.
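A discrete-time version of Eq. 4.1 together with the speed conversion of Eq. 4.2 and Eq. 4.4 might look like the Python sketch below. It is a simplified illustration under stated assumptions: the class and function names are hypothetical, the integral is approximated by a running sum and the derivative by a finite difference with a fixed time step `dt`, and the sign of the returned speed is taken to encode the direction.

```python
class PID:
    """Minimal discrete PID controller (rectangular integration, backward difference)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative  # Eq. 4.1

def to_speed(c_out, offset):
    """Eq. 4.2 / 4.3 with the signed offset of Eq. 4.4 / 4.5."""
    signed_offset = offset if c_out >= 0 else -offset
    return c_out * 0.1 + signed_offset

pan_pid = PID(kp=0.9, ki=0.1, kd=0.15)        # constants from the experiments in Chapter 5
c_out = pan_pid.update(error=100.0)            # target 100 pixels from the image center
pan_speed = to_speed(c_out, offset=12)         # offset_pan = 12
```

A second, independent `PID` instance would be used for the tilt axis, as described above.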
[…] new width and height are defined as follows.
𝑟𝑎𝑡𝑖𝑜𝑤 ℎ⁄ =𝑤ℎ𝑖𝑛𝑖𝑡𝑖𝑎𝑙
Chapter 5 Experimental results
The system was implemented on a PC with an Intel® Core™ i5 650 CPU @ 3.20 GHz and 4 GB RAM, and developed in Borland C++ Builder 6.0 on Windows 7. It was tested in several environments to verify its performance and stability. Both video files (uncompressed AVI format) and image sequences from the active camera were tested.
5.1 Track on video file
Three videos were used to verify the tracking system, with the particle filter parameters set as follows.
Number of samples 𝑁 = 30
Number of bins in histogram 𝑚 = 6 ∗ 6 ∗ 6 = 216
State covariance (𝜎𝑥, 𝜎𝑣𝑥, 𝜎𝑦, 𝜎𝑣𝑦, 𝜎𝑤, 𝜎ℎ) = (2,0.5,2,0.5,0.4,0.8)
1. Video 1 is used to verify the occlusion handler in our system, as shown in Fig. 5-1 and Fig. 5-2. Figure 5-1 shows the tracking result without the occlusion handler. Full occlusion occurs in frame 685. If the particle filter resamples during full occlusion, it may resample at incorrect positions, as shown in frame 689, and the track is lost in frames 694 and 698. In contrast, when full occlusion occurs with the occlusion handler enabled, the resample step is not performed immediately, so the sample set keeps a wide spread and can resume tracking after the occlusion.
Fig. 5-1 Tracking without occlusion handler (frames 558, 673, 685, 689, 694, and 698)
Fig. 5-2 Tracking with occlusion handler (frames 558, 673, 685, 689, 694, and 698)
2. Video 2 is used to verify the tracking feature. Figure 5-3 shows a human wearing a black jacket walking near a black chair. In this case, the target human has a color feature similar to that of the black chair, but the proposed system can still track the target human.
Fig. 5-3 Object has similar color as target human (frames 193, 258, 292, 351, 372, and 398)
3. Video 3 is used to verify the tracking performance in a complex situation. Fig. 5-4 (a) shows the target human partially occluded by a chair. The target human sits down and stands up, as shown in Fig. 5-4 (b). Moreover, the target human is partially occluded by another human, as shown in Fig. 5-4 (c).
Fig. 5-4 (a) frame 1436, 1547 and 1605 (b) frame 2152, 2214 and 2277 (c) frame 2757, 2818 and 2838
5.2 Track on active camera
The active camera is set up in our laboratory. The environment is complex enough to verify the system while it detects and tracks a moving human. The parameters of the particle filter and PTZ control are set as follows:
Number of samples $N = 30$
Number of bins in histogram $m = 6 \cdot 6 \cdot 6 = 216$
State covariance $(\sigma_x, \sigma_{v_x}, \sigma_y, \sigma_{v_y}, \sigma_w, \sigma_h) = (10, 1, 10, 1, 1, 2)$
$offset_{pan} = 12$
$offset_{tilt} = 6$
Proportional constant $K_p = 0.9$
Integral constant $K_I = 0.1$
Derivative constant $K_D = 0.15$
$rate_{big} = 1.1$
$rate_{small} = 0.9$
1. Tracking by controlling pan and tilt only. The experimental result shows that the target human mostly stays within the camera's FOV, no matter how he walks.
Fig. 5-5 Update pan and tilt command to track
2. Tracking to test zoom-in and zoom-out. In this case, the target human walks away from or toward the camera.
Fig. 5-6 The effects of zoom in / out
Figure 5-6 (a) shows that the target human has been detected and $Zoom_{layer}$ is initialized to 0. $Zoom_{layer}$ is incremented by 1 on each zoom-in and decremented by 1 on each zoom-out. The values of $Zoom_{layer}$ are shown in Table 5-1.
Table 5-1 Zoom layer variation in Fig. 5-6
Panel          | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i)
$Zoom_{layer}$ | 0   | 0   | 1   | 2   | 1   | 2   | 3   | 2   | 1
3. Tracking by controlling pan, tilt, and zoom, with the target human walking freely in the environment.
Fig. 5-7 Combination of pan, tilt and zoom in / out
Table 5-2 Zoom layer variation in Fig. 5-7
Panel          | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i) | (j) | (k) | (l)
$Zoom_{layer}$ | 0   | 0   | 1   | 1   | 1   | 0   | 0   | 1   | 2   | 1   | 0   | 0
4. Tracking a target human while more than one person is walking in the same environment.
Fig. 5-8 Human tracking in multiple objects
5. Tracking a target human while more than one person is walking in the same environment.
Fig. 5-9 Human tracking in multiple objects
Chapter 6 Conclusions and Future work
6.1 Conclusions
The experimental results show that the proposed system can track a moving human with the particle filter algorithm on an active camera. The tracking system is also able to track the target human when more than one person is walking in the same environment. Moreover, zooming in and out adjusts the image resolution of the tracked human.
There are several contributions in this research:
1. Our system can accurately distinguish humans from nonhuman objects.
2. Weighted resampling helps the particle filter preserve the samples with high weights.
3. The occlusion handler can handle temporary full occlusion.
4. It can track target human smoothly by using the PID controller to determine the