Chapter 1 Introduction
1.5 Thesis Organization
In the remainder of this thesis, the system configuration and the idea of the proposed method are introduced in Chapter 2. In Chapter 3, the technique of using the pano-mapping table for unwarping of omni-images into multiple perspective-view images is described. In Chapter 4, the proposed automatic method for detection of a suspicious passer-by with a two-camera omni-directional imaging device is described.
In Chapter 5, the proposed method for integration of the two omni-images taken respectively with the two upper omni-cameras into a single top-view image is presented. In Chapter 6, the proposed automatic method for detection of a passing-by car with a two-camera omni-directional imaging device is presented. In Chapter 7, experimental results and discussions are included. Finally, conclusions and some suggestions for future work are given in Chapter 8.
Chapter 2
System Configuration, Camera Design, and Idea of Proposed Method
2.1 Idea of Proposed Monitoring of Nearby Objects around a Mobile Surveillance Car
In order to monitor the surrounding area of the video surveillance car, we mount on the roof of the car a pair of two-camera omni-directional imaging devices constructed in this study, as shown in Figure 2.1. More details about the devices and the proposed idea of using them are described here.
Figure 2.1 The video surveillance car used in this study is equipped with a pair of two-camera omni-directional imaging devices. (a) A front view of the video surveillance car. (b) A side view of the video surveillance car.
First, the locations on the car roof where the two imaging devices should be affixed need to be determined. Figure 2.2 illustrates that the video surveillance car body accounts for almost half of an omni-image taken by an imaging device affixed at the middle of the rear edge of the car roof, but only for a quarter of an omni-image taken by the same imaging device affixed instead at the right-rear corner of the car roof. Consequently, an imaging device affixed at a corner of the car roof has a better view range than one affixed at the front (or back) middle of the car roof. In this study, one of the imaging devices is therefore affixed at the right-front corner of the surveillance car roof, and the other is affixed at the left-rear corner.
Figure 2.2 Positions of cameras on the video surveillance car roof and the corresponding images. (a) The image captured at the rear-middle of the video surveillance car roof. (b) The image captured at the right-rear of the car roof.
The imaging devices, after being affixed, can be used to estimate relevant 3D data of objects (the details will be described in Section 2.3). Then, an integrated top-view image can be obtained to view the surrounding environment of the video surveillance car from the top (the details will be described in Chapter 5). Also, any passers-by can be detected automatically and marked on the top-view image (the details will be described in Chapter 4). If a user wants to see a suspicious passer-by directly, a corresponding perspective-view image may be generated for inspection (the details will be described in Chapter 3). An example of the top view and a generated perspective view is shown in Figure 2.3.
Furthermore, passing-by cars can also be detected and marked on the top-view image by algorithms proposed in this study, such as region growing, template matching, etc. (the detail will be described in Chapter 6).
Figure 2.3 Images of monitoring a passer-by. (a) Top-view image showing the surrounding area of the video surveillance car, with a red mark indicating the passer-by's position. (b) A corresponding perspective-view image containing the passer-by.
2.2 System Configuration
The proposed video surveillance system is described in this section in three parts: hardware configuration, software configuration, and network configuration. The hardware includes the video surveillance car, a pair of two-camera omni-directional imaging devices, and two laptop computers. The software includes the program used to integrate the vision-based system, the drivers of the omni-cameras, and the program developed by the ARTRAY Company, which is a provider of CCD cameras. Because each two-camera omni-directional imaging device is controlled by its own laptop computer, we construct a local network to handle the communication between the two computers.
2.2.1 Hardware configuration
The entire hardware structure of the proposed video surveillance system used in this study is shown in Figure 2.4. The video surveillance car we use is a Delica van made by Mitsubishi Co. It measures 469 cm × 169 cm × 196 cm, seats eight people, and has a working desk and a power supply designed especially for this study. To connect the four omni-cameras outside the video surveillance car with the two computers inside, four extension cords are run through the video surveillance car.
Each of the two-camera omni-directional imaging devices affixed on the video surveillance car roof includes two omni-cameras, and each of the omni-cameras is composed of a lens, a CMOS camera, a mirror, an acrylic tube, and a shelf. These imaging devices, the parameters of the mirrors, and the optical principle of the imaging devices will be described in detail in Section 2.3.
Figure 2.4 Structure of the proposed monitoring system.
As to the control unit, two notebook PCs are used to integrate the entire video surveillance system. In Figure 2.4, Computer A is an F6E laptop computer produced by ASUSTeK Computer Inc., and Computer B is an A300 laptop computer produced by Toshiba Co. The performance specifications of these computers are shown in Table 2.1. The cross-over cable used for communication between the two computers is a Cat-6 cable for gigabit Ethernet.
[Figure 2.4 labels: Computer A and Computer B form the control system and are linked by a cross-over cable; Camera system A and Camera system B are each connected to a computer by USB and fixed on the video surveillance car.]
Table 2.1 Specifications of the used laptop computers.
           ASUS F6E                            TOSHIBA A300
CPU        Intel Core 2 Duo T5850 / 2.16 GHz   Intel Core 2 Duo T9400 / 2.53 GHz
Chipset    Intel PM 965                        Intel GM965
RAM        4 GB DDR2 / 667 MHz                 2 GB DDR2 / 800 MHz
GPU        Intel GMA X3100                     ATI Radeon HD 3650 / 512 MB
Network    Gigabit LAN                         Fast Ethernet LAN
2.2.2 Software configuration
We use Borland C++ Builder as the development tool in this study to acquire omni-images and analyze them. It is fast and convenient to develop a GUI-based program using Borland C++ Builder. The programming language we use is C++, a widely used language. The operating system we use is Windows XP.
To access the images taken by the cameras, the drivers of the ARTCAM-200SO cameras and the ARTCAM-200SS cameras have to be installed on the computers. The Artcam Co. provides a development tool called Capture Module Software Developer Kit (SDK) that assists developers in communicating with the embedded system of the camera over a USB connection. In addition, the SDK is an object-oriented toolkit usable under Windows 2000 or XP in many languages, such as C++, C, VB.NET, C#.NET, and Delphi. Using the SDK, we can preview the image of each camera's view and capture the current image data. It is also convenient for developing any function that takes images grabbed with the cameras as input.
2.2.3 Network configuration
A network configuration is needed for communication between the two laptop computers because four omni-images are acquired from the pair of two-camera omni-directional imaging devices (CSA and CSB in Figure 2.5), and the images of each imaging device are processed by a separate notebook PC (COMA and COMB in Figure 2.5).
The network we propose for this is shown in Figure 2.5.
Figure 2.5 The entire proposed system and the network architecture of transmission.
As shown, COMA is used to display the top-view image of the surrounding area of the video surveillance car, and COMB is used to display the perspective-view images in a specified direction of the car. Therefore, COMB transforms the omni-images gathered from CSB into a top-view image and transmits the result to COMA, which then merges the two top-view images (one from CSA and the other from CSB) into an integrated top-view image of the car's surroundings. In the other direction, COMA transmits the omni-image gathered from CSA and a control signal to COMB, so that COMB knows the viewing direction and constructs the corresponding perspective-view image.
2.3 Design of a Pair of Two-camera Omni-directional Imaging Devices
2.3.1 System configuration
Each of the two-camera omni-directional imaging devices consists of two omni-cameras combined coaxially in the longitudinal direction, as shown in Figure 2.6(a). The entire system includes four lenses of model LV0612H, two CMOS cameras of model ARTCAM-200SO, and two CMOS cameras of model ARTCAM-200MI. Two of the four lenses and the two ARTCAM-200SO CMOS cameras are shown in Figure 2.6(b).
The LV0612H is a mega-pixel lens with the following specifications: 1/2" format, 6 mm focal length, and F1.2 aperture. The specifications of the CMOS cameras are shown in Table 2.2. Camera system A in Figure 2.4 is formed with the two ARTCAM-200SO cameras and affixed on the right-front of the video surveillance car roof. Camera system B is formed with the two ARTCAM-200MI cameras and affixed on the left-rear of the video surveillance car roof.
2.3.2 Camera Design Principle
To explain how we design the omni-camera used in our omni-directional imaging devices (there are four cameras of this kind in our system), we first derive the related formulas in the following.
Figure 2.6 (a) Two-camera omni-directional imaging device. (b) Two lenses and two ARTCAM-200SO CMOS cameras.
Table 2.2 Specifications of the used CMOS cameras.
                     ARTCAM-200SO             ARTCAM-200MI
Resolution           2.0 M pixels             2.0 M pixels
Dimension            33 mm × 33 mm × 50 mm    33 mm × 33 mm × 50 mm
CMOS sensor size     1/2" (6.4 mm × 4.8 mm)   1/2" (6.4 mm × 4.8 mm)
Mount                C-mount                  C-mount
Frames per second    8 fps                    5 fps
DirectShow camera    Yes                      No
The structure of each omni-camera with a hyperbolic-shaped mirror is illustrated in Figure 2.7, with the world coordinate system (WCS) specified by (X, Y, Z). The hyperbolic shape of the mirror in the camera coordinate system may be described [8] as:
$$\frac{X^2 + Y^2}{a^2} - \frac{Z^2}{b^2} = -1, \qquad (2.1)$$
where a and b are the parameters of the hyperbolic shape. The parameter d, as shown in Figure 2.7(b), is the distance between the optical center of the lens and the mirror center, whose value can be obtained by the simple formula d = 2c, where $c = \sqrt{a^2 + b^2}$. Also, it is noted that the axis of the camera is aligned with the axis of the hyperbolic mirror, and the camera center is fixed at one of the two focal points of the mirror.
(a) (b)
Figure 2.7 An illustration of used omni-camera structure. (a) Geometry of the omni-camera vision. (b) Geometry between the mirror and the CMOS sensor in camera.
By the geometry of the shape of a hyperboloid described by Eq. (2.1), the value ρ, which specifies an elevation angle shown in Figure 2.7 (a), can be computed by the following formula:
Furthermore, the angles θ and β in Figure 2.7 (a) can be computed as follows:
tan 1 ;
In Figure 2.7(b), by trigonometry, we have
$$\frac{r}{d} = \frac{S_w}{f}, \qquad (2.5)$$
where f is the focal length, r is the radius of the circular area of the base of the mirror, and $S_w$ is the width of the CMOS sensor.
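As an illustration of how an elevation angle follows from the mirror geometry, the short sketch below uses the relation commonly given for a hyperboloidal mirror whose second focal point holds the camera, $\tan\rho = \frac{(b^2 + c^2)\sin\beta - 2bc}{(b^2 - c^2)\cos\beta}$. That this is the form intended here is an assumption, and the function names are hypothetical. The second function cross-checks the relation against plain ray geometry on the hyperbola, taking the origin midway between the two focal points.

```python
import math


def elevation_from_beta(beta, b, c):
    """Elevation angle rho (radians) of the scene ray, given the angle beta
    (radians, measured from the horizontal at the lens center)."""
    num = (b * b + c * c) * math.sin(beta) - 2.0 * b * c
    den = (b * b - c * c) * math.cos(beta)
    # b < c makes den non-positive for 0 <= beta <= 90 degrees, so both
    # terms are negated to keep the result within [-90, 90] degrees.
    return math.atan2(-num, -den)


def elevation_direct(z0, a, b, c):
    """Return (beta, rho) for the mirror point at height z0 >= b,
    computed directly from the two focal points at (0, -c) and (0, +c)."""
    r0 = a * math.sqrt(z0 * z0 / (b * b) - 1.0)  # radius of the mirror point
    beta = math.atan2(z0 + c, r0)  # direction seen from the lens center
    rho = math.atan2(z0 - c, r0)   # direction of the ray through the mirror focus
    return beta, rho
```

For any mirror point, the two functions should return the same elevation angle, which gives a quick sanity check before such a relation is used in a design.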
Now we can explain how we design the omni-cameras we use in this study according to the above theoretical derivations. The goal is to design a mirror of the hyperbolic shape and determine the distance from the camera to the mirror.
Specifically, we have to derive the parameters, a, b, and c, of the hyperbolic shape so that we can ask an optics manufacturer to produce a mirror of such parameters for us.
Note that the distance from the camera to the mirror, denoted as d above, is just 2c because we put the camera at such a position that its optical center of the lens is located just at a focal point of the hyperbolic shape, as shown in Figure 2.7.
Because the projective camera we use has a focal length f of 6 mm and a sensor width $S_w$ of 2.4 mm, and because the circular area of the base of the mirror has a radius r of 4 cm, according to Eq. (2.5) and d = 2c, we can derive d and c as $d = rf/S_w = 10$ cm and $c = d/2 = 5$ cm. Then, using Eq. (2.4), we can reduce Eq. (2.2) to the following equation with only one variable b:
$$(b^2 + 25) \times 0.9287 - 10b = 0, \qquad (2.6)$$
from which b can be solved to be b = 3.3851. And by $c = 5 = \sqrt{a^2 + b^2}$, a can be solved to be 3.6797. Thus, the parameters of the hyperbolic mirror designed in this study are all obtained, that is, a = 3.6797 and b = 3.3851.
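The design computation above can be retraced numerically. The following is a minimal sketch under two stated assumptions that are not spelled out in the text: the angle β is taken as the direction from the lens center to the rim of the mirror (tan β = d/r), and the reduced condition in b has the form $(b^2 + c^2)\sin\beta - 2bc = 0$, of which the root smaller than c is kept. The function name is hypothetical.

```python
import math


def design_mirror(r_cm, f_mm, sw_mm):
    """Return (d, c, a, b) in cm for mirror base radius r (cm),
    focal length f (mm), and sensor width Sw (mm)."""
    d = r_cm * f_mm / sw_mm        # Eq. (2.5): r / d = Sw / f
    c = d / 2.0                    # the lens center sits at a focal point, d = 2c
    beta = math.atan2(d, r_cm)     # assumption: direction to the mirror rim
    s = math.sin(beta)
    # Solve s*b^2 - 2*c*b + s*c^2 = 0 and keep the root with b < c.
    b = (c - c * math.sqrt(1.0 - s * s)) / s
    a = math.sqrt(c * c - b * b)   # from c^2 = a^2 + b^2
    return d, c, a, b


d, c, a, b = design_mirror(r_cm=4.0, f_mm=6.0, sw_mm=2.4)
print(f"d = {d:.4f}, c = {c:.4f}, a = {a:.4f}, b = {b:.4f}")
```

With r = 4 cm, f = 6 mm, and a sensor width of 2.4 mm, the sketch yields values of a and b that agree with those reported above to about three decimal places; the small difference comes from the rounded coefficient 0.9287.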
2.3.3 3D data acquisition
Figure 2.8 Computation of depth using the two-camera omni-directional imaging device. (a) The ray tracing of a scene point P in the imaging device with a hyperbolic-shaped mirror. (b) A triangle in detail (part of (a)).
In this section, we briefly describe how to use the two elevation angles of a scene point P to get the relevant 3D data. Note that these elevation angles can be obtained by using a pano-mapping table (described in Chapter 3). Specifically, as shown in Figure 2.8(a), each image point of P is a projection of a corresponding point on the hyperboloid, which can be defined by the elevation angles α1 and α2. The upper hyperbolic mirror center is assumed to be the WCS center (0, 0, 0). The desired goal is to use α1 and α2 to get (x, y, z).
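In Figure 2.8(b), by the triangulation principle, the distance d between the scene point P and the center c1 of a hyperbolic-shaped mirror may be computed as
$$\frac{d}{\sin(90^\circ + \alpha_2)} = \frac{b}{\sin(\alpha_1 - \alpha_2)}, \qquad (2.6)$$
where b is the baseline of the stereo imaging device, that is, the distance between the two mirror centers (not to be confused with the mirror parameter b of Section 2.3.2). In the system we propose, b = 24.2 cm. Because $\sin(90^\circ + \alpha_2) = \cos\alpha_2$, Eq. (2.6) can be reduced to the following equation:
$$d = \frac{b\cos\alpha_2}{\sin(\alpha_1 - \alpha_2)},$$
and the horizontal distance dw and vertical distance Z in Figure 2.8(a) may thus be computed by: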
A system configuration of the upper omni-camera with a hyperbolic-shaped mirror is shown in Figure 2.9, with the WCS specified by (X, Y, Z) and the image coordinate system (ICS) specified by (U, V). The I(u, v) is an image point projected by a scene point P(x, y, z).
A triangulation which includes the angle θ in Figure 2.9 can be described by the pixel coordinates (u, v) as follows:
$$\sin\theta = \frac{v}{\sqrt{u^2 + v^2}}, \qquad \cos\theta = \frac{u}{\sqrt{u^2 + v^2}}.$$
Figure 2.9 The system configuration of upper omni-camera with a hyperbolic mirror.
Because the axis of the camera is aligned with the axis of the hyperbolic mirror, the azimuth angle θ of point P in the WCS and the azimuth angle θ of point I in the ICS are the same (according to the rotation-invariant property of the omni-camera). Therefore, the parameters x and y in the WCS can be estimated from the horizontal distance dw as follows:
$$x = d_w\cos\theta, \qquad y = d_w\sin\theta.$$
Together with the vertical distance Z, the unique position of a scene point P can be found. The method we use to transform each pixel in an omni-image to an azimuth angle and an elevation angle in the WCS will be described in Chapter 3. Therefore, if a pair of matching points (one in an omni-image taken by the upper omni-camera, and the other in an omni-image taken by the lower omni-camera) is known, the relevant 3D data can also be obtained.
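The 3D data acquisition described in this section can be sketched as a single function. This is an illustrative sketch rather than the implementation used in this study, and it rests on assumptions: the mirror center c1 is the origin, the second mirror center lies at a distance equal to the baseline along the vertical axis on the side where elevation angles are counted as positive, and the distance d follows the law of sines as $d = b\cos\alpha_2 / \sin(\alpha_1 - \alpha_2)$. The function and parameter names are hypothetical.

```python
import math


def locate_point(alpha1, alpha2, theta, baseline):
    """Return (x, y, z) of a scene point from its two elevation angles
    (radians), its azimuth angle theta (radians), and the stereo baseline."""
    # Distance from mirror center c1 to the point, by the law of sines.
    d = baseline * math.cos(alpha2) / math.sin(alpha1 - alpha2)
    dw = d * math.cos(alpha1)   # horizontal distance
    z = d * math.sin(alpha1)    # vertical distance (sign follows the angle convention)
    # The azimuth is shared by the scene point and its image point.
    return dw * math.cos(theta), dw * math.sin(theta), z
```

Note that the two elevation angles must differ, since a point seen under equal angles from both mirror centers lies at infinity and makes the denominator vanish.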
2.4 System Process
To learn all the information that the proposed system needs in order to conduct video surveillance with the pair of two-camera omni-directional imaging devices on the video surveillance car roof, we develop a learning interface for users. The entire learning process is shown in Figure 2.10.
Figure 2.10 Flowchart of proposed learning process.
In this study, the recorded data are of two kinds: camera-related and object-related. The camera-related data are used in a transformation to estimate relevant 3D data and in a transformation to construct top-view images. The former is obtained from the camera calibration processes, which will be described in Chapter 3, and the latter is obtained from the transformation of an omni-image into a top-view image, which will be described in Chapter 5. The object-related data are the shape of the video surveillance car in the top-view image. This shape is used to construct a top-view image which is not affected by the height of the video surveillance car. The process will be described in detail in Chapter 5.
After all the data are obtained, they are saved into some text files. These files are then used in the video surveillance more than once, so this is also a method for improving the speed of calculations without computing the same data over and over.
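The idea of saving learned data once and reading them back at the start of every surveillance run can be sketched as follows. The file layout (one row of space-separated numbers per line) and the function names are assumptions made for illustration only; the actual file format used in this study is not specified here.

```python
def save_table(path, rows):
    """Write each row of numbers as one line of space-separated text."""
    with open(path, "w", encoding="ascii") as fh:
        for row in rows:
            # repr() keeps enough digits for an exact round trip of floats.
            fh.write(" ".join(repr(float(v)) for v in row) + "\n")


def load_table(path):
    """Read the rows back as tuples of floats, skipping blank lines."""
    with open(path, "r", encoding="ascii") as fh:
        return [tuple(float(tok) for tok in line.split())
                for line in fh if line.strip()]
```

Loading such a file is a single pass over the text, so the cost of calibration is paid only once, in the learning stage.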
When the learning job has been done, the video surveillance system can start monitoring the surroundings. The entire process of monitoring suspicious passers-by proposed in this study is shown in Figure 2.11.
As shown in Figure 2.11, we read the related table files at the beginning. Computer A is used to show the top-view image, and Computer B is used to show the perspective-view image. The communication between the two computers is described in Section 2.2.3. The construction of a perspective-view image will be described in Chapter 3. The passer-by detection, which yields a red mark in the image, will be described in Chapter 4. The construction of a top-view image and the integration of the two camera systems will be described in Chapter 5. Finally, the passing-by car detection, which yields a yellow mark in the image, will be described in Chapter 6.
Because both the passer-by detection and passing-by car detection processes require heavy computations, the passing-by car detection process we propose is designed to be independent of the passers-by detection process. Such a compromise approach makes the execution of the two processes smoother.
Figure 2.11 Flowchart of the proposed video surveillance system.
Chapter 3
Using Pano-mapping Tables for Unwarping Omni-images into Multi-perspective-view Images
3.1 Idea of Pano-mapping for Omni-image Unwarping
If a suspicious passer-by approaches the video surveillance car, the perspective-view image in the suspect's direction with respect to the car should be made available to the user in the car for a clearer inspection. This requires unwarping of the omni-images taken with the camera devices used in this study.
Conventional methods for unwarping omni-images require the knowledge of certain camera parameters, such as the focal length of the lens and the coefficients of the mirror surface shape equation, to calibrate the camera before omni-image unwarping. However, in some situations we cannot get complete information about the omni-camera parameters. A solution to this problem is to use the space-mapping technique proposed by Jeng and Tsai [8], as mentioned previously. The technique is based on the use of a pano-mapping table, which may be regarded as a summary of the information conveyed by all the camera parameters. The pano-mapping table is created only once for each omni-camera and does not change even when the camera is moved around. The table is created by a calibration process making use of certain selected points in the world space with known coordinates and their corresponding pixels in an omni-image. The details will be described in Section 3.2. The table may be used to create perspective-view images, as described in Section 3.3.
Another advantage of using the space-mapping technique is that the corresponding relationship of an omni-camera between a radial length r and an elevation angle ρ can also be obtained. The corresponding relationship is defined as a table, called the r-ρ Table in this study. As in the method of 3D data extraction described in Section 2.3.3, if two corresponding pixels taken by a two-camera omni-directional imaging device are known, the corresponding elevation angles may be derived by use of the r-ρ Table, and the azimuth θ can also be computed by the rotation-invariant property of the omni-camera. Then the unique position of a scene point can be found. This will be very useful in the following chapters, for example in the detection of passers-by and passing-by cars.
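How a table of this kind might be used to turn a pixel into an azimuth angle and an elevation angle is sketched below. The sketch assumes that the table is a list of (r, ρ) pairs sorted by strictly increasing r, that values between entries are linearly interpolated, and that the pixel coordinates (u, v) are given relative to the image center. These are illustrative assumptions, and the function names are hypothetical.

```python
import math
from bisect import bisect_left


def rho_from_r(table, r):
    """Look up the elevation angle for radial length r in a sorted
    list of (r, rho) pairs, interpolating linearly between entries."""
    radii = [row[0] for row in table]
    if r <= radii[0]:
        return table[0][1]       # clamp below the first entry
    if r >= radii[-1]:
        return table[-1][1]      # clamp above the last entry
    i = bisect_left(radii, r)    # radii[i-1] < r <= radii[i]
    r0, rho0 = table[i - 1]
    r1, rho1 = table[i]
    t = (r - r0) / (r1 - r0)
    return rho0 + t * (rho1 - rho0)


def pixel_to_angles(table, u, v):
    """Return (theta, rho) for a pixel at (u, v) measured from the image center."""
    theta = math.atan2(v, u)     # azimuth is preserved by the omni-camera
    rho = rho_from_r(table, math.hypot(u, v))
    return theta, rho
```

For example, with a made-up table `[(0.0, -60.0), (100.0, -20.0), (200.0, 30.0)]`, a pixel at (30, 40) has radial length 50 and is mapped to an elevation halfway between the first two entries.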
The remainder of this chapter is organized as follows. In Section 3.2, we describe the technique we adopt for pano-mapping table creation in detail. In Section 3.3, we