
Chapter 4 Construction of Environmental Maps and Patrolling in Learned Environments

4.4 Patrolling in Indoor Environment

4.4.3 Patrolling Under Several Cameras

In order to expand the range of surveillance, we use multiple fisheye cameras in this study, so the vehicles will patrol under several cameras. This creates a hand-off problem between cameras, but the problem here can be simplified by the odometer in a vehicle, because the odometer value indicates the approximate position of the vehicle.

The mapping tables of the fisheye cameras are constructed in advance, so the range of surveillance of every camera is known. As shown in Figure 4.13, point A is the uppermost and leftmost point in Camera 2, and point B is the lowermost and rightmost point in Camera 1. The coordinates of these two points are known from the mapping tables of the two cameras; hence the coordinates of points C and D can be calculated, and afterward the equation of line L can be obtained. Substituting the odometer values Ox and Oy into the equation of line L, we can judge which camera should be used to get the image of the vehicle. The equation of L can be computed by:

(y2 − y1)x + (x2 − x1)y = x2y2 − x1y1, (4.13)

and the judgment is conducted as follows: if (y2 − y1)Ox + (x2 − x1)Oy − (x2y2 − x1y1) ≥ 0, then Camera 1 should be used; otherwise, Camera 2 should be used.

Figure 4.13 The overlapping area of Cameras 1 and 2, with corner points A(x1, y1), B(x2, y2), C(x1, y2), D(x2, y1) and the dividing line L.
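As an illustration, the following is a minimal Python sketch of this side-of-line test, written under the reconstructed signs of Equation (4.13); the function name and argument layout are ours, not the system's. In the patrolling system, the corner points A and B would come from the pre-built mapping tables.

def choose_camera(ox, oy, a, b):
    """Decide which camera should image the vehicle at odometer position (ox, oy).
    a = (x1, y1) and b = (x2, y2) are the corner points A and B of the
    overlapping area, read from the two cameras' mapping tables."""
    x1, y1 = a
    x2, y2 = b
    # The sign of (y2 - y1)x + (x2 - x1)y - (x2*y2 - x1*y1) tells on which
    # side of the dividing line L (through C and D) the vehicle lies.
    side = (y2 - y1) * ox + (x2 - x1) * oy - (x2 * y2 - x1 * y1)
    return 1 if side >= 0 else 2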

Chapter 5 Following Suspicious People Automatically and Other Applications

5.1 Introduction

Although there are several fisheye cameras on the ceiling, they cannot change their positions and cannot take images of a person from different angles. Besides, fisheye cameras sometimes cannot capture clear images of the person's face. In this study, a technique is designed to follow a suspicious person who breaks into the indoor environment under surveillance. Because the autonomous vehicles are highly mobile and can take clear images with the cameras equipped on them, they are suitable for assisting the surveillance of the indoor environment.

In order to let the vehicles follow a person, the position of the person should be calculated first. In this study, we only process a specific partial region of an image, called the tracking window, to decrease the amount of computation and to decrease the probability of interference by other dynamic obstacles. Afterward, the distance and angle between the vehicle and the person are calculated. Then, the central computer can order the vehicle to chase the person. The detailed process will be described in Section 5.2.

Because only a partial image is processed and the person moves in the environment, the next position of the person should be predicted in order to decide the region that should be processed. But the images are taken by fisheye cameras, so there is a high degree of distortion in them, and the predicted position should be adjusted when the person appears at different places in the image. The detailed process will be described in Section 5.3.

Because the FOV of one camera is finite, multiple cameras are used in this study to expand the range of surveillance. When a person moves from the region of one camera to that of another, it is necessary to switch cameras automatically to keep getting images of the person. The detailed process will be described in Section 5.4. Other applications, such as recording the trail of a person in the computer or calculating the walking speed of the person, are described in Section 5.5.

5.2 Calculating Position of a Person by Specific Partial Region in an Image

One goal in this study is to calculate the position of a person intruding into a space under surveillance. Although we could subtract the foreground from the background to find all the objects in the environment, the amount of computation would be very large. Hence, we only process a specific partial region of an image.

Besides reducing the amount of computation, there are other advantages to doing this: decreasing the probability of interference by other dynamic obstacles, and increasing the preciseness of the calculated position of the target.

First, the region of the door in a room under surveillance is set in advance, and this region is kept under close monitoring continually. When the door is opened, the foreground image of that region changes, so we can detect that the door has been opened by subtracting the foreground image from the background image in the door region. After the door is opened, the process of following any suspicious person starts.

When tracking a person, we use the same idea as finding the central point of a vehicle in Chapter 4. Only the pixels in a square region R are processed, and the size of R is changeable. The details of the process are described in Algorithm 5.1.

The coordinates C of a certain point are introduced here. They represent the position where we predict the person to be at this time, and the detailed process of calculating C will be described in Section 5.3.

Algorithm 5.1: Calculating the position of a person.

Input: An image which is taken by the fisheye camera, and the coordinates C of a predicted location of a person.

Output: The coordinates of the exact position of the person.

Steps:

Step 1. Calculate the range of the square region R by the use of the coordinates C, at which the center of R is located; the size of the region is small at first.

Step 2. Subtract the partial foreground image in R from the background image to find the pixels which have big differences between the two images.

Step 3. Apply the erosion and dilation processes to the image portion in R.

Step 4. Find the connected components in R.

Step 5. If there are some components whose sizes are larger than a threshold N in R, then go to Step 6; otherwise, go to Step 2 again with the square region R being enlarged.

Step 6. Calculate the distances between all the points in the component and the center of the image.

Step 7. Find the point whose distance calculated in Step 6 is the smallest.

Step 8. Convert the point found in Step 7 to the global coordinates as output.

In Step 5, we set a threshold N in advance, and if the size of a component is larger than N, we consider the component to be an object. When there is no object in the square region R, the size of R is increased and Step 2 is repeated with the new size. When there are some objects in R, we calculate the distances between all the points of the component and the center of the image, and find the point whose distance is the smallest. The reason is described below.

We find the person's feet by the rotational invariance property of omni-cameras. More specifically, it can be proved that the line through the human body (actually through the human's head and feet) always passes through the image center according to the above property, as shown in Figure 5.1.
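A short derivation sketching why this holds is given below, assuming an equidistant fisheye model with a vertical optical axis through the image center; the actual lens behavior recorded in the mapping tables need not follow this formula exactly. Let a person stand at floor point (X, Y), with the camera center at height H, and consider a body point at height h:

\theta = \operatorname{atan2}(Y, X), \qquad
\phi(h) = \arctan\!\left(\frac{\sqrt{X^2 + Y^2}}{H - h}\right), \qquad
(u, v) = f\,\phi(h)\,(\cos\theta,\ \sin\theta).

Since the azimuth θ does not depend on the height h, every body point projects onto the same radial ray through the image center; and the feet (h = 0) have the smallest zenith angle φ, hence the smallest image distance f·φ from the center.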


Figure 5.1 The rotational invariance property. (a) A real example. (b) A vertical sketch chart.

If a person stands at point x and a fisheye camera is affixed on the ceiling to take images of the person, where the distance between the camera center c and x is d, then the region of the projection will be from x to y, as seen laterally. So if we want to find the position where the person stands, we should check all the pixels in the region to find the point whose distance to c is the smallest, namely, the point x with the smallest distance d. Several real examples are shown in Figure 5.2, where the red points are the calculated positions of the tracked person.


Figure 5.2 Finding the positions of a person. (a)(b)(c) The person stands at several different places in an image. (d) The person stands at the center point in an image.
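Putting the steps together, the following is a minimal Python sketch of Algorithm 5.1 using OpenCV. The function name, thresholds, and window sizes are illustrative assumptions, and the final conversion to global coordinates via the mapping table (Step 8) is left out:

import numpy as np
import cv2

def locate_person(frame, background, c, diff_thresh=30, N=200, max_half=400):
    """Sketch of Algorithm 5.1: locate a person inside a square window R.
    frame, background: grayscale images (uint8 numpy arrays).
    c: predicted image coordinates C = (cx, cy) of the person.
    Returns the image point of the foreground pixel closest to the image
    center (the person's feet), or None if nothing large enough is found."""
    h, w = frame.shape
    img_center = np.array([w / 2.0, h / 2.0])
    cx, cy = int(c[0]), int(c[1])
    kernel = np.ones((3, 3), np.uint8)
    half = 50                                     # R is small at first (Step 1)
    while half <= max_half:
        x0, x1 = max(cx - half, 0), min(cx + half, w)
        y0, y1 = max(cy - half, 0), min(cy + half, h)
        # Step 2: background subtraction restricted to R.
        diff = cv2.absdiff(frame[y0:y1, x0:x1], background[y0:y1, x0:x1])
        mask = (diff > diff_thresh).astype(np.uint8)
        # Step 3: erosion and dilation to remove noise.
        mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
        # Step 4: connected components in R.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        # Step 5: keep the components whose sizes exceed the threshold N.
        big = [i for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= N]
        if not big:
            half *= 2                             # enlarge R and try again
            continue
        # Steps 6-7: pick the foreground pixel closest to the image center.
        ys, xs = np.nonzero(np.isin(labels, big))
        pts = np.stack([xs + x0, ys + y0], axis=1).astype(float)
        d = np.linalg.norm(pts - img_center, axis=1)
        return tuple(pts[int(np.argmin(d))].astype(int))
    return None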

5.3 Predicting Position of A Person

Because a person may move in the environment continuously and we only process a specific partial region of an image, the position C of the person in the next period of time should be predicted. Then, the region for tracking the person can be decided by C and Algorithm 5.1.

Because the positions of a person in every period of time are calculated continuously and the lengths of the periods are set equal, a simple solution to the prediction problem is to extend the line through two previous positions of the person to predict the future position. As shown in Figure 5.3, if the person moves from point A to point B in sequence, and the distance between A and B is D1, we extend the line segment AB to double its length to find the predicted point C for the person, where the distance D2 is set equal to D1.


Figure 5.3 Prediction of the position of a person.

But in this study, we need to track a person in images taken by fisheye cameras. Because there is a high degree of distortion in each of such omni-images, the speed of movement of a person in the image changes even if the person walks at a fixed speed. Specifically, if a person moves near the center of an image, his/her moving speed in the image will be faster than that of a person who moves near the edges of the same image. The reason is that the part of the image which neighbors the center is enlarged, and the part which neighbors the edges is shrunk, as shown in Figure 5.4, where the sizes of the two red squares are the same in the real world, but different in the image.

Figure 5.4 An example of distortion in the image taken by a fisheye camera.

Because of the problem mentioned above, three situations are identified in this study which should be dealt with for our purpose of smooth tracking of a person under the surveillance of a fisheye camera. Situation 1 occurs when a person walks from the edge toward the center of an image; in this situation, the distance of prediction should be enlarged, meaning that D2 should be longer than D1 in Figure 5.3. On the other hand, Situation 2 occurs when a person walks from the center toward the edge of an image; here, the distance of prediction should be shrunk, meaning that D2 should be shorter than D1. And Situation 3 occurs when the person walks around the center of the image; here, D2 should be set equal to D1.

As a summary, the formula for calculating D2 can be written as follows:

D2 = D1 + α(|OA| − |OB|), (5.1)

where the symbol O represents the central point of the image, |OA| represents the distance between point A and O, and |OB| is interpreted similarly. So |OA| − |OB| means the distance by which the person approaches the center O of the image when walking from point A to point B. The details of using Equation (5.1) are described below for all of the three situations.

The symbol α in Equation (5.1) is a ratio of enlarging and shrinking, whose definition is different in Situations 1 and 2. The equations for calculating α are derived in this study to be:

α = (p − q)/q for Situation 1; (5.2)

α = (p − q)/p for Situation 2, (5.3)

where p and q are the distances between two consecutive image points listed in the mapping table, as shown in Figure 5.5, with p assumed to be closer to the center of the image than q, and the ratio p/q averaged over the table.

Figure 5.5 Calculating p and q.

In more detail, in Situation 1, because the person walks from the edge toward the center of the image, |OA| − |OB| is larger than 0, and the distance D2 is enlarged, as shown in Figure 5.6. For example, if a person moves 30 cm in total and approaches the center of the image by a distance of 20 cm, and p/q is 5/4, then D1 is 30 cm and the prediction of D2 is 30 + 20 × (5 − 4)/4 = 35 cm. Situation 2 is similar, except that the person is leaving the center of the image by a distance of 20 cm; then the prediction of D2 is 30 − 20 × (5 − 4)/5 = 26 cm, as shown in Figure 5.7.

In Situation 3, because the value of |OA| − |OB| is zero, D2 is set equal to D1, as shown in Figure 5.8.


Figure 5.8 Situation 3.
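A minimal Python sketch of this prediction scheme follows; the function name and argument layout are ours, and p/q is assumed to be the pre-computed average spacing ratio from the mapping table:

import math

def predict_next_position(a, b, image_center, p_over_q):
    """Predict the next image position C of a person (Equation 5.1).
    a, b: the two most recent image positions (x, y) of the person.
    image_center: the central point O of the omni-image.
    p_over_q: average spacing ratio p/q from the mapping table (> 1)."""
    d1 = math.dist(a, b)                      # distance walked in the image, D1
    approach = math.dist(image_center, a) - math.dist(image_center, b)
    p, q = p_over_q, 1.0
    if approach > 0:                          # Situation 1: enlarge (Eq. 5.2)
        alpha = (p - q) / q
    elif approach < 0:                        # Situation 2: shrink (Eq. 5.3)
        alpha = (p - q) / p
    else:                                     # Situation 3: keep D1
        alpha = 0.0
    d2 = d1 + alpha * approach                # Equation (5.1)
    if d1 == 0:
        return b                              # person is standing still
    ux, uy = (b[0] - a[0]) / d1, (b[1] - a[1]) / d1
    return (b[0] + ux * d2, b[1] + uy * d2)   # extend AB by D2 to get C

# Example from the text: D1 = 30, approaching the center by 20, p/q = 5/4,
# so D2 = 30 + 20 * (5 - 4)/4 = 35; leaving instead gives 30 - 20/5 = 26.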

5.4 Using Multi-Cameras to Expand Range of Surveillance

Because the FOV of a camera is finite, we use multiple cameras in this study to expand the range of surveillance. A person may move between the FOV's of these cameras, so we have to decide which camera should be used to take images of the person. A solution to a similar problem is described in Section 4.4.3, but the target is now a person instead of a vehicle, so we cannot use the values of the odometer anymore.

Besides, because we only process a partial image to find the position of a person, if the person walks into the FOV of another camera, we should also decide which region in the image taken by that camera should be processed to continue locating the person. The proposed method for solving this problem is described in Algorithm 5.2.

Algorithm 5.2: Calculating the image coordinates of the position of a person in the image taken by a second camera.

Input: The mapping tables of two cameras C1 and C2, and the image coordinates I1 of the position of a person in the FOV of camera C1.

Output: The image coordinates I2 of the position of the person in the image taken by C2.

Steps:

Step 1. Calculate the coordinates of the four corner points A, B, C, and D of the overlapping area of cameras C1 and C2, and the equation of line L, as shown in Figure 5.9.

Step 2. Check the position of the person to see if he/she moves from the FOV of one camera to the other and crosses the line L. If yes, continue; otherwise, keep using the original camera C1 to take images.

Step 3. Convert I1 to the global coordinates G by the mapping table of C1.

Step 4. Convert G to the image coordinates I2 by the mapping table of C2.

Step 5. Use C2 to continue taking images of the person, and calculate the position of the person by the mapping table of C2.

In Algorithm 5.2, we first calculate the coordinates of points A, B, C, and D, and the equation of line L as shown in Figure 5.9, by the method described in Section 4.4.3. When a person walks from the FOV of one camera to the other and crosses the line L, for example from P1 to P2, the coordinates Pics of the person are calculated as described in Section 5.2 in the ICS (image coordinate system) and then converted to the coordinates Pgcs in the GCS (global coordinate system). Afterward, Pgcs is converted to the ICS of the other camera by its mapping table.


Figure 5.9 An example of hand-off when a person moves from P1 to P2.

For example, if a person walks from P1 to P2, that is, from a region within the FOV of camera 1 to a region within that of camera 2, and crosses the line L, the coordinates of P2 in the ICS of camera 1 will be converted to the GCS, and then converted to the ICS of camera 2, as described in the algorithm. Afterward, camera 2 is used to take images of the person.
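The two coordinate conversions of Steps 3 and 4 can be sketched in Python as follows, assuming (as an illustration only) that each mapping table is a dictionary from sampled image coordinates to global coordinates, so the inverse lookup for C2 is done by nearest neighbor:

def hand_off(i1, table_c1, table_c2):
    """Sketch of Algorithm 5.2, Steps 3-4: re-localize a person when the
    tracking switches from camera C1 to camera C2.
    i1: image coordinates of the person in C1's image (a key of table_c1).
    table_c1, table_c2: dicts mapping image coordinates -> global coordinates."""
    gx, gy = table_c1[i1]                     # Step 3: ICS of C1 -> GCS
    # Step 4: GCS -> ICS of C2, via the nearest sampled entry of C2's table.
    return min(table_c2,
               key=lambda i2: (table_c2[i2][0] - gx) ** 2
                              + (table_c2[i2][1] - gy) ** 2)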

5.5 Other Applications

In this section, two other applications of tracking a person are described. The first is that we can record the trail of a person, and the second is that we can calculate the walking speed of the person. Both the trail and the walking speed can be stored in a computer, and if something goes missing in a room, these data can be read out to check who has come into the room and where he/she has walked. Besides, the walking speed is a characteristic of the person, and it can help us to infer the person's sex, age, and finally identity.

5.5.1 Recording Trail of a Person

When a person breaks into the environment under surveillance, he/she is traced by the omni-cameras continuously, as described previously. The position of the person is calculated every 400 ms, and the coordinates of these positions can be saved in a file, as shown in Table 5.1.

Table 5.1 The record of a person's trail (the coordinates of the tracked positions, such as (379, 213), sampled every 400 ms).

We can then judge who broke into the room from the video, and draw the trail of the person on the map, as shown in Figure 5.10, to check where the person has walked, and hence judge whether he/she has taken the missing item or not.

Figure 5.10 The trail of the intruding person drawn on the map.
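A minimal sketch of such recording is given below, assuming a hypothetical get_position() callback that returns the tracked coordinates (or None when the person is not visible):

import csv
import time

def record_trail(get_position, path="trail.csv", interval_s=0.4):
    """Poll the tracker every 400 ms and append the person's coordinates,
    with a timestamp, to a CSV file for later playback on the map."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            pos = get_position()
            if pos is not None:
                writer.writerow([time.time(), pos[0], pos[1]])
                f.flush()                 # keep the file readable while running
            time.sleep(interval_s)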

5.5.2 Calculating Walking Speed of A Person

The purpose of calculating the walking speed of a person is that if the person who breaks into the environment under surveillance wears a mask on his/her face, then even if we have taken clear images of him/her with the camera equipped on the vehicle, we still cannot tell his/her identity. However, we can gather other characteristics of the person, such as the walking speed, which can help us to identify the person, as mentioned previously.

Because the position of the person is calculated every 400 ms, we can calculate the distance of movement of the person at the same frequency. Hence, we can take every two consecutive positions P1 and P2 of the person, and calculate the walking speed of the person by the following equation:

v = (|P1P2| / 100) × (60 / 0.4) meters/minute, (5.4)

where |P1P2| means the distance between positions P1 and P2 in cm. We can calculate the average walking speed by collecting a set S of all the walking speeds in a certain duration of time, removing data with zero values from S, and computing the average of the remaining values. The necessity of removing zero-valued data is that if a walking speed w is zero, the person is not moving at that moment, so w should not be counted in the average walking speed.
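A minimal Python sketch of this computation follows; the function name and the list-of-positions interface are illustrative assumptions:

import math

def average_walking_speed(positions, interval_s=0.4):
    """Average walking speed in meters/minute (Equation 5.4).
    positions: consecutive (x, y) positions of the person in cm, sampled
    every interval_s seconds (400 ms in this study). Zero speeds, i.e.
    moments when the person stands still, are discarded before averaging."""
    speeds = []
    for p1, p2 in zip(positions, positions[1:]):
        dist_cm = math.dist(p1, p2)
        speeds.append(dist_cm / 100.0 * 60.0 / interval_s)  # cm/0.4 s -> m/min
    moving = [s for s in speeds if s > 0]
    return sum(moving) / len(moving) if moving else 0.0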

Chapter 6 Experimental Results and Discussions

In this chapter, we show some experimental results of the proposed security patrolling system. The first is the results of calculating the positions Wreal of real-world points when a fisheye camera is affixed at different heights. We compare the values of Wreal calculated by the method proposed in Chapter 3 with those obtained by manual measurement.

The second is the results of calculating the positions of a person in an actual environment in the Computer Vision Laboratory, Department of Computer Science, National Chiao Tung University; the computed results are compared with the real positions of the person.

The third is the results of measuring how far an autonomous vehicle deviates from its original path when patrolling an actual environment, with the position and direction of the vehicle corrected by the method proposed in Chapter 4. The details will be described in Section 6.3.

6.1 Experimental Results of Calculating Positions of Real-world Points

In this experiment, we affixed the fisheye camera at several different heights, and calculated the values of the real-world point positions Wreal for every height by the method mentioned in Chapter 3 and by manual measurement simultaneously.

We construct a basic mapping table for the fisheye camera first. We affix the camera at a height of 20 cm above the calibration board, as shown in Figure 6.1. The real-world width between every two consecutive intersection points on the board is 1 cm, obtained by manual measurement. Then we calculate the image coordinates of the intersections of the lines in the image and construct a basic mapping table by the method proposed in Chapter 3.

Figure 6.1 The camera is affixed at 20 cm from the calibration board.
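As an illustration of this table-building step, here is a rough Python sketch using OpenCV's corner detector; the pattern size, the function name, and the use of a checkerboard detector are our assumptions, and the actual Chapter 3 procedure for a line-grid board may differ:

import cv2

def build_basic_mapping_table(image, pattern_size=(9, 6), square_cm=1.0):
    """Detect grid intersections in a calibration image and pair each image
    point with its known real-world offset (1 cm spacing on the board)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    ok, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not ok:
        raise RuntimeError("calibration pattern not found")
    corners = corners.reshape(-1, 2)
    table = {}
    for idx, (u, v) in enumerate(corners):
        # Row-major ordering of the detected intersections is assumed here.
        row, col = divmod(idx, pattern_size[0])
        table[(float(u), float(v))] = (col * square_cm, row * square_cm)
    return table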

After the table is constructed, we affix the camera at heights of 10 cm, 15 cm, 30 cm, and 40 cm above the calibration board, as shown in Figures 6.2, 6.3, 6.4, and 6.5, respectively. We can then calculate the values Wreal of the real-world points for every height by the method proposed in Chapter 3.
