Chapter 4 Automatic Detection of a Suspicious Passer-by with a Two-camera
4.3 Estimation of a Passer-by’s Distance and Height Information
4.3.3 Calculation of a passer-by’s distance and height in 3D space
At the beginning, note that we construct an r-ρ Table using the radial stretching function in Section 3.2 in the learning process. An r-ρ-Table records the corresponding relationships of an omni-camera between radial lengths r and elevation angles ρ, as shown in Table 4.1.
Table 4.1 r-ρ-Table
r r1 r2 r3 … rn
ρ ρ1 ρ2 ρ3 … ρn
To calculate a passer-by’s relevant 3D data, we do the same process as described in Section 4.3.2 using each omni-camera of a two-camera omni-directional imaging device. Then, we get the two positions of a passer-by’s head (Pup, Pdown) in the two omni-images taken by a two-camera omni-directional imaging device. When
the two positions Pup (xup, yup) and Pdown (xdown, ydown) are obtained, we can derive the radial distance in the following:
$$r_{up} = \sqrt{x_{up}^2 + y_{up}^2}; \quad r_{down} = \sqrt{x_{down}^2 + y_{down}^2}. \tag{4.11}$$
By using minimum distance estimation with r in the corresponding r-ρ Table, the corresponding elevation angles ρup, ρdown can be derived.
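The lookup described above can be sketched as follows. This is only a minimal illustration: the function names and the table values are hypothetical and are not taken from the study.

```python
import math

def radial_distance(x, y):
    """Eq. (4.11): radial length of an image point measured from the image center."""
    return math.hypot(x, y)

def lookup_elevation(r, r_table, rho_table):
    """Minimum-distance estimation: return the rho_i whose r_i is closest to r."""
    i = min(range(len(r_table)), key=lambda k: abs(r_table[k] - r))
    return rho_table[i]

# Hypothetical r-rho Table entries, for illustration only.
R_TABLE = [50.0, 100.0, 150.0, 200.0]    # radial lengths in pixels
RHO_TABLE = [10.0, 25.0, 40.0, 55.0]     # elevation angles in degrees

r_up = radial_distance(60.0, 80.0)
rho_up = lookup_elevation(r_up, R_TABLE, RHO_TABLE)
```

A linear scan is enough for a small table; a sorted table would also allow a binary search.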
Because the two omni-cameras are combined coaxially in the longitudinal direction and their camera coordinates (X, Y, Z) have the same axis directions, the two azimuth angles (θup, θdown) of Pup and Pdown are the same by the rotational-invariant property of the omni-camera. However, because of mechanical errors, a difference of less than 10 degrees between θup and θdown is accepted. Thus, if θup and θdown are matched within this tolerance of angular error, we can use the 3D data acquisition proposed in Section 2.3.3 to estimate a passer-by's position (X, Y, Z) with respect to the center of the upper omni-camera Cup in the WCS. In more detail, with HCup denoting the height of Cup, the passer-by's height Hpasser-by can be computed as follows:
$$H_{\text{passer-by}} = H_{C_{up}} - Z, \tag{4.12}$$
where HCup in this study is 256 cm, as measured manually in our experiment.
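The azimuth check and Eq. (4.12) might be sketched as below. The 10-degree tolerance and the 256 cm camera height come from the text; the function names and the handling of the 0/360-degree wrap-around are assumptions added for illustration.

```python
AZIMUTH_TOLERANCE_DEG = 10.0   # accepted angular error stated in the text
H_CUP_CM = 256.0               # measured height of C_up in the study

def azimuths_match(theta_up_deg, theta_down_deg, tol=AZIMUTH_TOLERANCE_DEG):
    """True if the two azimuth angles differ by less than tol degrees.

    The wrap-around at 360 degrees is handled explicitly (an assumption;
    the text does not discuss it).
    """
    diff = abs(theta_up_deg - theta_down_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff < tol

def passer_by_height(z_cm, h_cup_cm=H_CUP_CM):
    """Eq. (4.12): height of the passer-by from the Z coordinate of the head."""
    return h_cup_cm - z_cm
```

For example, a head located 86 cm below Cup would give a height of 170 cm under this reading of Eq. (4.12).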
To summarize, the flowchart in Figure 4.7 shows the overall process used to detect a passer-by in this study.
Figure 4.7 An overview of passers-by detection proposed in this study.
Chapter 5
Integration of Two Omni-images into a Top-view Image with a Pair of
Two-camera Omni-directional Imaging Devices
5.1 Introduction
In order to expand the range of surveillance, two pairs of two-camera omni-directional imaging devices are used in this study. A top-view image around a video surveillance car which comes from merging the two omni-images taken by the upper cameras in the two pairs of two-camera omni-directional imaging devices is available for users in the car. In this chapter, we describe all the relevant methods which are used in computing such an integrated top-view image.
The remainder of this chapter is organized as follows. In Section 5.2, we introduce the techniques we propose for construction of such an integrated top-view image. In Section 5.3, we describe the techniques we propose for superimposition of the video surveillance car shape and filling of ground texture in the top-view image.
5.2 Construction of a Top-view Image
5.2.1 Construction of a top-view image with an omni-camera
Because the view range of the upper camera in a two-camera omni-directional imaging device is wider than the lower one, we use the upper omni-camera to construct a top-view image. The height of the omni-camera affixed on the video surveillance car roof is known in this study, so we can derive a simple top-view image by assuming all pixels in the omni-image are on the ground.
Figure 5.1 illustrates the geometric relationship between a point on the ground and the upper omni-camera. A straightforward derivation (forward mapping) is given in the following. Note that forward mapping means mapping the pixels in an omni-image to those of the top-view image. Specifically, any pixel p at coordinates (u, v) in the ICS can be used to compute the radial distance r as follows:
$$r = \sqrt{u^2 + v^2}, \tag{5.1}$$
and the azimuth angle θ as follows:
$$\theta = \cos^{-1}\frac{u}{r} = \sin^{-1}\frac{v}{r}. \tag{5.2}$$
By using minimum-distance estimation with r in the corresponding r-ρ Table as described in Section 4.3.3, the corresponding elevation angle ρ can be derived. More specifically, we try to find the ri in the r-ρ Table with the minimum distance to r, and regard the corresponding ρi of ri in the table as the corresponding elevation angle ρ of r. Then, according to the known height of the mirror center dh and the known elevation angle ρ so computed, as shown in Figure 5.1, the horizontal distance between a scene point and the mirror base center, dw, can be computed as follows:
$$dw = dh \times \cot\rho. \tag{5.3}$$
Accordingly, by the rotational-invariant property of the omni-camera, the position (x, y) of a point P on the ground in the WCS can be obtained from Eqs. (5.2) and (5.3) as follows:
$$x = dw \cos\theta; \quad y = dw \sin\theta. \tag{5.4}$$
By mapping all pixels in the omni-image in this way, a top-view image can be obtained.
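The forward mapping of Eqs. (5.1) through (5.4) can be sketched for a single pixel as follows. The function name, the use of atan2 for the azimuth angle, and the table values in the usage note are assumptions for illustration; (u, v) is taken relative to the omni-image center, and ρ is assumed to lie strictly between 0 and 90 degrees.

```python
import math

def forward_map(u, v, dh, r_table, rho_table_deg):
    """Map an omni-image pixel (u, v) to a ground position (x, y)."""
    r = math.hypot(u, v)                        # Eq. (5.1)
    theta = math.atan2(v, u)                    # Eq. (5.2), quadrant-safe form
    # Minimum-distance estimation in the r-rho Table.
    i = min(range(len(r_table)), key=lambda k: abs(r_table[k] - r))
    rho = math.radians(rho_table_deg[i])
    dw = dh / math.tan(rho)                     # Eq. (5.3): dw = dh * cot(rho)
    return dw * math.cos(theta), dw * math.sin(theta)   # Eq. (5.4)
```

With a hypothetical one-entry table (r = 100, ρ = 45°) and dh = 200, the pixel (100, 0) maps to a ground point about 200 units away along the x axis.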
Figure 5.1 The ray tracing of a scene point P on the ground with a hyperbolic-shaped mirror.
However, a forward mapping will lead to a “broken” image with many unfilled points, as shown in Figure 5.2(b) (the black portion in the image). Thus, we derive a backward mapping for computing a “complete” top-view image. Note that backward mapping means mapping all pixels in a top-view image to those of an omni-image.
First, we may compute dw for every pixel with coordinates (x, y) in the WCS as follows:
$$dw = \sqrt{x^2 + y^2}, \tag{5.5}$$
and the azimuth angle θ can be derived as follows:
$$\theta = \cos^{-1}\frac{x}{dw} = \sin^{-1}\frac{y}{dw}. \tag{5.6}$$
Eq. (5.3) can be rewritten to derive the elevation angle as follows:
$$\rho = \tan^{-1}\frac{dh}{dw}. \tag{5.7}$$
By minimum distance estimation with ρ in the corresponding r-ρ Table as described in Section 4.3.3, the corresponding radial distance r can be derived. Accordingly, by the rotational-invariant property of the omni-camera, the image coordinates (u, v) of the corresponding pixel p in the ICS can be obtained from Eq. (5.6) as follows:
$$u = r \cos\theta; \quad v = r \sin\theta. \tag{5.8}$$
As a result, a complete top-view image can be obtained, as shown in Figure 5.2(c).
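The backward mapping of Eqs. (5.5) through (5.8) can be sketched in the same spirit. As before, the function name, the atan2 forms, and the table values in the usage note are assumptions; (x, y) is taken relative to the point on the ground directly below the camera.

```python
import math

def backward_map(x, y, dh, r_table, rho_table_deg):
    """Map a ground position (x, y) to omni-image coordinates (u, v)."""
    dw = math.hypot(x, y)                       # Eq. (5.5)
    theta = math.atan2(y, x)                    # Eq. (5.6), quadrant-safe form
    rho = math.degrees(math.atan2(dh, dw))      # Eq. (5.7), also valid at dw = 0
    # Minimum-distance estimation on rho in the r-rho Table.
    i = min(range(len(rho_table_deg)),
            key=lambda k: abs(rho_table_deg[k] - rho))
    r = r_table[i]
    return r * math.cos(theta), r * math.sin(theta)    # Eq. (5.8)
```

Because every top-view pixel is assigned a source pixel, no holes are left in the result, which is the motivation for backward mapping given above.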
Figure 5.2 An omni-image and its corresponding top-view images. (a) An omni-image. (b) A top-view image obtained from forward mapping. (c) A top-view image obtained from backward mapping.
Figure 5.3 An omni-image and its corresponding top-view images (continued). (a) An omni-image. (b) A top-view image obtained from forward mapping. (c) A top-view image obtained from backward mapping.
In order to improve the program speed, the corresponding relations of all pixels between an omni-image and a top-view image are stored into a table, called TopviewTable, in the learning process. Accordingly, constructing a top-view image does not require computing the above-mentioned equations again and again, but only a reference to the corresponding TopviewTable.
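One possible shape for such a table is sketched below. Only the name TopviewTable comes from the text; the layout of the table (a 2D list of source pixel coordinates), the parameters, and both function names are assumptions made for illustration.

```python
import math

def build_topview_table(size, scale, dh, r_table, rho_deg, omni_center):
    """Precompute, for each top-view pixel, the omni-image pixel (u, v) to copy.

    size: side length of the square top-view image in pixels (hypothetical).
    scale: ground distance covered by one top-view pixel (hypothetical).
    """
    half = size // 2
    cu, cv = omni_center
    table = []
    for row in range(size):
        line = []
        for col in range(size):
            x, y = (col - half) * scale, (row - half) * scale
            rho = math.degrees(math.atan2(dh, math.hypot(x, y)))   # Eq. (5.7)
            i = min(range(len(rho_deg)), key=lambda k: abs(rho_deg[k] - rho))
            theta = math.atan2(y, x)                               # Eq. (5.6)
            u = cu + round(r_table[i] * math.cos(theta))           # Eq. (5.8)
            v = cv + round(r_table[i] * math.sin(theta))
            line.append((u, v))
        table.append(line)
    return table

def render_topview(omni, table, fill=0):
    """Build a top-view image by table lookup only (no trigonometry per frame)."""
    h, w = len(omni), len(omni[0])
    return [[omni[v][u] if 0 <= v < h and 0 <= u < w else fill
             for (u, v) in line] for line in table]
```

The table is built once in the learning process; each new frame then costs only one indexed read per top-view pixel.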
5.2.2 Calculation of relative position of two omni-cameras
Figure 5.3 is an illustration of the layout of the video surveillance car roof where the length and the width of the car roof are obtained by manual measurement.
Accordingly, we can easily get the relative position between the pairs of two-camera omni-directional imaging devices with this layout. The red circles in the layout are the positions of our devices. If the front device is assumed as the origin, the offset between the two devices is (−110, −330) in cm.
Figure 5.4 An illustration of the layout of the video surveillance car roof.
5.2.3 Merging of two top-view images into a single one
At the beginning, we construct two top-view images using the omni-images taken from the upper cameras in the pairs of two-camera omni-directional imaging devices, respectively. To make the merging construction simple and fast, we divide a top-view image around the video surveillance car into two parts. As shown in Figure 5.4, CW is the relative distance between the two upper omni-cameras along the car width, and CL is the relative distance between them along the car length. One part is the front half top-view of the car surroundings (covering the image's upper part before the spot at CL / 2) and the other is the rear half top-view of the car surroundings (covering the image's lower part behind the spot at CL / 2). The front one is taken from the front upper omni-camera and the rear one is taken from the rear upper omni-camera. As described in Section 5.2.2, the relative position of the two upper cameras is described by an offset (CW, CL) = (−110, −330) between the front omni-camera and the rear one, so all the pixels in the rear top-view image need to be shifted by this offset. Thus, a preliminary integrated top-view image can be obtained as shown in Figure 5.5.
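The merging rule can be sketched as follows. The offset (−110, −330) comes from the text; the coordinate convention (positions expressed in cm relative to the front camera, with the rear camera at the negative offset) and the callable interface for the two top-view images are assumptions made for illustration.

```python
C_W, C_L = -110, -330   # offset of the rear camera relative to the front one (cm)

def merged_value(x, y, front, rear, offset=(C_W, C_L)):
    """Return the integrated top-view value at (x, y), given in the front camera frame.

    front and rear are hypothetical callables that return the value of each
    top-view image at a position relative to their own camera center.
    """
    ox, oy = offset
    if y >= oy / 2:                  # front half: before the spot at C_L / 2
        return front(x, y)
    return rear(x - ox, y - oy)      # rear half: shift into the rear camera frame
```

Splitting at CL / 2 means each position is served by the camera that is closer along the car length, so no blending between the two images is needed.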
Figure 5.5 An integrated top-view image.
To further obtain a good-looking top-view image, an ellipse shape is used in the top-view image as the viewing window. The parameters a and b of the ellipse shape are described in the following:
$$a = r - C_W; \quad b = r - C_L / 2, \tag{5.9}$$
where r is the radius of each top-view image taken from an omni-camera, CW is 110 cm, and CL is 330 cm, as mentioned previously. We discard the pixels with image coordinates (x, y) which satisfy the following inequality in the integrated top-view image:
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} > 1. \tag{5.10}$$
Thus, an integrated top-view image can be obtained as shown in Figure 5.6.
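The viewing-window test can be sketched as below, assuming that Eq. (5.9) reads a = r − CW and b = r − CL / 2 and that (x, y) is measured from the center of the viewing window. The function names and the radius used in the usage note are hypothetical.

```python
def ellipse_params(r, c_w=110.0, c_l=330.0):
    """Eq. (5.9) under the assumed reading: semi-axes a and b of the window."""
    return r - c_w, r - c_l / 2.0

def outside_ellipse(x, y, a, b):
    """Eq. (5.10): True if the pixel at (x, y) should be discarded."""
    return (x * x) / (a * a) + (y * y) / (b * b) > 1.0
```

With a hypothetical radius r = 500, the semi-axes are a = 390 and b = 335; a pixel exactly on the boundary is kept because the inequality is strict.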
Figure 5.6 An integrated top-view image with an ellipse shape.
5.3 Video Surveillance Car Shape Superimposition and Ground Texture Filling in Top-view Image
In the previous section, we derived a simple top-view image by assuming all pixels in the omni-image are on the ground. However, not everything in the scene lies flat on the ground, e.g., the video surveillance car itself, as shown in Figure 5.5. In this section, some techniques are proposed to solve this problem of drawing the video surveillance car in the integrated omni-image.
5.3.1 Construction of car shape
Because the two pairs of two-camera omni-directional imaging devices are affixed on the video surveillance car roof, the car shape in the top-view image is always fixed. By this property, we can construct a car shape with respect to a top-view image in the learning process. In this learning stage, the video surveillance car shape is selected manually. As shown in Figure 5.6, the yellow mark portions are the video surveillance car shape appearing originally in the non-integrated omni-images taken by the two upper omni-cameras.
Figure 5.7 An illustration of the video surveillance car shape.
5.3.2 Video surveillance car shape superimposition and ground texture filling
A simple texture synthesis scheme is adopted in this study to fill ground texture into the shape area of the video surveillance car, which is described in this section.
Specifically, to fill the yellow marked portions (the car shape) in Figure 5.6, for each car shape pixel pc, we find the closest non-car shape pixel pn to pc, and use the color of pn to fill pc. This simulates ground texture within the car shape area. Furthermore, the corresponding relations of all pixels within the car shape with those outside the car shape in an integrated top-view image are stored into a table called CarPatternMatch in the learning process to improve the program speed. After filling the ground texture, a real top-view image portion of a car as shown in Figure 5.7(a) is superimposed on the corresponding position in the integrated top-view image, with a result as shown in Figure 5.7(b).
Figure 5.8 A top view of a car and an integrated top-view image. (a) A top-view image of video surveillance car. (b) An integrated top-view image with video surveillance car shape superimposition and ground texture filling.
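The nearest-pixel filling and its lookup table can be sketched as follows. Only the name CarPatternMatch comes from the text; the brute-force search, the dictionary layout, and the function names are assumptions for illustration, and at least one non-car pixel is assumed to exist.

```python
def build_car_pattern_match(car_mask):
    """Map each car-shape pixel (row, col) to its closest non-car pixel.

    car_mask[row][col] is truthy for car-shape pixels. The search is a plain
    brute-force scan, which is acceptable for a one-time learning step.
    """
    h, w = len(car_mask), len(car_mask[0])
    ground = [(r, c) for r in range(h) for c in range(w) if not car_mask[r][c]]
    table = {}
    for r in range(h):
        for c in range(w):
            if car_mask[r][c]:
                table[(r, c)] = min(
                    ground, key=lambda g: (g[0] - r) ** 2 + (g[1] - c) ** 2)
    return table

def fill_ground(image, table):
    """Fill every car-shape pixel with the color of its matched ground pixel."""
    out = [row[:] for row in image]
    for (r, c), (gr, gc) in table.items():
        out[r][c] = image[gr][gc]
    return out
```

Since the car shape is fixed in the top-view image, the table is computed once and reused for every frame.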
Chapter 6
Automatic Detection of a Passing-by Car with a Two-camera Omni-directional Imaging Device
6.1 Proposed Idea of Automatic Detection of a Passing-by Car
In Chapter 5, we derive a simple top-view image by assuming all pixels in the omni-image are on the ground. However, if a passing-by car is in the surveillance area, the constructed top-view image will be incorrect, as shown in Figure 6.1. In this study, we propose a method to detect a passing-by car using a two-camera omni-directional imaging device.
Figure 6.1 A top-view image with a passing-by car.
A flowchart of the proposed method is shown in Figure 6.2. It consists of three
major stages: (1) detection of a passing-by car in an omni-image, (2) detection of a passing-by car in the real world, and (3) car shape superimposition and ground filling.
In the first stage, the ground is eliminated first, and then moment preserving thresholding is used to select the non-ground area. Finally, region growing, erosion, and dilation are used to capture the passing-by car. In the second stage, the information of the median point obtained from the process of region growing is used to construct a car model. After the car model is used to conduct template matching, 3D data acquisition is applied to obtain the position of the passing-by car. In the final stage, because the car shape in the omni-image and the position in the real world are known, we can eliminate the car shape in the omni-image, use ground filling, and superimpose a top-view shape of a car on to the top-view image. As a result, a correct and understandable top-view image can be obtained. These techniques are proposed to detect the passing-by car and solve the problems caused by the top-view image error in this study, and each method will be described in the following sections.
The remainder of this chapter is organized as follows. In Section 6.2, we introduce the detection of a car region in an omni-image. In Section 6.3, we describe the detection of a car position in the real world. Finally, the proposed techniques for passing-by car shape superimposition and ground texture filling in the top-view image are described in Section 6.4.
6.2 Detection of Car Region in an Omni-image
6.2.1 Detection of non-ground region
Figure 6.2 Flowchart of a passing-by car detection
At the beginning of non-ground region detection, we transform the two omni-images taken from a two-camera omni-directional imaging device into two grayscale omni-images, and each of the two images is processed in the following way, respectively. In order to discard the ground in the omni-image I, the ground needs to be learned. More specifically, a ground area is selected manually first. Let the number of total pixels in the area be n, and the gray value at pixel (u, v) in I be denoted by I(u, v). The mean value of the ground can be computed as follows:
$$\mathit{mean} = \frac{1}{n} \sum_{v} \sum_{u} I(u, v). \tag{6.1}$$
Figure 6.3 shows the interface we have designed for acquiring the data of such ground points more easily.
Figure 6.3 The interface for ground learning. (a) The omni-image taken by the upper omni-camera. (b) The omni-image taken by the lower one.
To eliminate the ground area in I, the gray values of all the pixels of I are reduced by the mean value of the ground to get a difference image Id. Then, the image Id is used to compute a threshold TH by the moment-preserving thresholding introduced in Section 4.2.1 [17]. Finally, if the difference value of a pixel is larger than TH, it is recorded as “1”; otherwise, as “0”. This process is the so-called bi-level thresholding. At the end, we obtain a binary image IBI with non-ground object pixels all labeled as “1,” as shown in Figure 6.4.
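The ground removal and bi-level thresholding can be sketched as below. The threshold TH is passed in as a parameter here; the moment-preserving computation of TH is not reproduced. The function names are hypothetical, and the signed difference follows the wording of the text (using an absolute difference instead would be a further assumption).

```python
def ground_mean(image, ground_pixels):
    """Eq. (6.1): mean gray value over the manually selected ground pixels.

    image[v][u] is a gray value; ground_pixels is a list of (u, v) pairs.
    """
    return sum(image[v][u] for (u, v) in ground_pixels) / len(ground_pixels)

def binarize_non_ground(image, mean, th):
    """Subtract the ground mean and label pixels whose difference exceeds th as 1."""
    return [[1 if (g - mean) > th else 0 for g in row] for row in image]
```

For instance, with ground samples 10, 12, and 11 the mean is 11, and a pixel with gray value 60 is labeled as non-ground for TH = 20.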
6.2.2 Detection of car region by region growing and component labeling
Figure 6.4 The non-ground omni-images. (a) The omni-image taken by the upper omni-camera. (b) The omni-image taken by the lower one.
A region growing method [18] is used in this study to cluster the related regions.
First, a seed is selected as the start point, and the eight neighboring points of this start point are examined to check whether they belong to the region or not. The proposed scheme for this connected-component check will be described later. Then, each of the points decided to belong to the region is used as a seed again, and the connected-component check is repeated, until no more region points can be found. A flowchart describing the concept is shown in Figure 6.5. More details of the method are described as an algorithm in the following.
Algorithm 6.1: Finding the connected region in an omni-image.
Input: A seed point and a binary image IBI, as described in Section 6.2.1.
Output: An image with the connected region of the seed.
Steps:
1 Select a seed P in the image as the start point.
2 Check the eight neighboring points Ti of P to see if they belong to the region or not.
2.1 Find the neighboring points Ni which have already been decided to belong to the region for each Ti.
2.2 Calculate the value of the similarity degree between Ti and each Ni.
Figure 6.5 A flowchart of the region growing we used.
2.3 Decide whether Ti belongs to the similar region according to the similarity values by the following steps.
2.3.1 Compare the values of similarity calculated in Step 2.2 with a threshold TH1 separately (the detail will be described later).
2.3.2 Calculate the number p of similarity values which are larger than TH1.
2.3.3 Calculate the number q of similarity values which are smaller than or equal to TH1.
2.3.4 Compare p with q, and if the value of p is larger than q, then mark the point Ti as not belonging to the region and go to Step 2 to process the next Ti; else, continue.
2.3.5 Calculate the similarity degree d between Ti and the average RGB values of all pixels in the region of the ground (the detail will be described later).
2.3.6 Compare the similarity degree d with another threshold TH2, and if d is smaller than TH2, then mark Ti as belonging to the ground region; else, mark Ti as not.
2.4 Gather the points Bi which belong to Ti and belong to the region of the ground.
3 If there are some points of Bi which are not examined yet, then regard each Bi as a seed P and go to Step 2 again to check if they belong to the ground region or not.
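The steps of Algorithm 6.1 can be sketched with a queue as follows. This is a simplified reading rather than the implementation used in the study: the image is assumed to be a 2D list of (r, g, b) tuples, the similarity degree is assumed to be the sum of absolute channel differences, and all function names are hypothetical.

```python
from collections import deque

def color_distance(a, b):
    """Similarity degree: sum of absolute differences of the color channels."""
    return sum(abs(x - y) for x, y in zip(a, b))

def neighbors8(r, c, h, w):
    """Coordinates of the (up to) eight neighbors of (r, c) inside the image."""
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w]

def grow_region(image, seed, th1, th2):
    """Return the set of (row, col) points connected to seed (Steps 1 to 3)."""
    h, w = len(image), len(image[0])
    region = {seed}
    total = list(image[seed[0]][seed[1]])      # running channel sums of the region
    queue = deque([seed])
    while queue:
        pr, pc = queue.popleft()               # Step 1: current seed P
        for tr, tc in neighbors8(pr, pc, h, w):        # Step 2: each T_i
            if (tr, tc) in region:
                continue
            t = image[tr][tc]
            # Step 2.1: neighbors N_i of T_i already in the region.
            members = [n for n in neighbors8(tr, tc, h, w) if n in region]
            # Steps 2.2 to 2.3.3: count dissimilar (p) and similar (q) neighbors.
            p = sum(color_distance(t, image[nr][nc]) > th1 for nr, nc in members)
            q = len(members) - p
            if p > q:                          # Step 2.3.4: reject T_i
                continue
            # Steps 2.3.5 and 2.3.6: compare with the region's average color.
            avg = [s / len(region) for s in total]
            if color_distance(t, avg) < th2:
                region.add((tr, tc))           # Step 2.4: accepted point B_i
                total = [s + ch for s, ch in zip(total, t)]
                queue.append((tr, tc))         # Step 3: use B_i as a new seed
    return region
```

A rejected point may be examined again from another seed, but the loop still terminates because each accepted point is queued only once.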
In Steps 2.1 and 2.2, when a point Ti is examined, all the neighboring points Ni of Ti which have already been decided to belong to the ground region are found out first, and a similarity degree between Ti and each of its eight neighboring points, as shown in Figure 6.6, is computed. The similarity degree between two points A and B is computed in the following way:
$$\text{similarity between } A \text{ and } B = |r_A - r_B| + |g_A - g_B| + |b_A - b_B|, \tag{6.1}$$
where rC, gC, and bC are the color values of point C with C = A or B here.
Figure 6.6 Illustration of calculation of the similarity degree between two image points.
In Step 2.3.1, after the similarity degree is calculated, the degree is compared with a threshold TH1, whose value may be adjusted by a user. If the value is large, then the scope of the ground region which is found will be enlarged; else, reduced.
In Steps 2.3.2 and 2.3.3, the two introduced values p and q are set to zero at first.
The value of p represents the number of points whose similarity degree is larger than TH1, and the value of q represents the number of points whose similarity degree is not so. Hence, if a degree is larger than TH1, then we add one to p, and if the degree is not so, then we add one to q. Afterward, in Step 2.3.4, if the value of p is larger than q, the point Ti is marked as not belonging to the region, and then go to Step 2 again to check the next Ti. If the value of p is not so, then an additional iterative process is conducted to examine Ti.
Sometimes the boundary between the connected region and obstacles is not very clear in images. So in Step 2.3.5, an average AVR is calculated first, which contains three values, namely, the average values Ravr, Gavr and Bavr of the red, green, and