
A Two-Omni-Camera Stereo Vision System With an Automatic Adaptation Capability to Any System Setup for 3-D Vision Applications

Shen-En Shih and Wen-Hsiang Tsai, Senior Member, IEEE

Abstract—A stereo vision system using two omni-cameras for 3-D vision applications is proposed, which has an automatic adaptation capability to any system setup before 3-D data computation is conducted. The adaptation, which yields the orientations and distance of the two omni-cameras, is accomplished by detecting and analyzing the horizontal lines appearing in the omni-images acquired with the cameras and a person standing in front of the cameras. Properties of line features in environments are utilized for detecting more precisely the horizontal lines that appear as conic sections in omni-images. The detection work is accomplished through the use of carefully chosen parameters and a refined Hough transform technique. The detected horizontal lines are utilized to compute the cameras’ orientations and distance, from which the 3-D data of space points are derived analytically. Compared with a traditional system using a pair of projective cameras with nonadjustable camera orientations and distance, the proposed system has the advantages of offering more flexibility in camera setups, better usability in wide areas, higher precision in computed 3-D data, and more convenience for nontechnical users. Good experimental results show the feasibility of the proposed system.

Index Terms—3-D vision applications, automatic adaptation, omni-camera, omni-image, stereo vision, system setup.

I. Introduction

WITH the advance of technologies, various types of vision systems have been designed for many applications, such as virtual and augmented reality, video surveillance, environment modeling, and TV games. Among these applications, human–machine interaction is a critical area [1]–[4]. For example, Microsoft Kinect [5] is a controller-free gaming system in the home entertainment field, which uses several sensors to interact with players. Most of these human–machine interaction applications require acquisition of the 3-D data of human bodies, meaning, in turn, the need of precise system calibration and setup to yield accurate 3-D data computation results in the application environment. However, from a consumer’s viewpoint, it is unreasonable to ask a user to set up a vision system very accurately, requiring, e.g., the system cameras to be affixed at accurate locations in precise orientations. Contrarily, it is usually preferable to allow a user to choose freely where to set up the system components.

Manuscript received May 27, 2012; revised August 8, 2012 and October 5, 2012; accepted October 8, 2012. Date of publication January 14, 2013; date of current version June 27, 2013. This work was supported in part by a grant from the Ministry of Economic Affairs, China, under Project MOEA 98-EC-17-A-02-S1-032 in the Technology Development Program for Academia. A preliminary version of this paper appeared in Advances in Multimedia Modeling (LNCS), vol. 6523, K. T. Lee, W. H. Tsai, H. Y. M. Liao, T. Chen, J. W. Hsieh, and T. T. Tseng, Eds. Berlin/Heidelberg, Germany: Springer, 2011, pp. 193–205. This paper was recommended by Associate Editor Y. Fu. The authors are with the Department of Computer Science, National Chiao Tung University, Hsinchu 30010, Taiwan (e-mail: peter159.cs98g@nctu.edu.tw; whtsai@cis.nctu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2013.2240161

In addition, many interactive systems used for the previously mentioned applications are composed of traditional projective cameras that collect less visual information than systems using omni-directional cameras (omni-cameras). To overcome these difficulties, a 3-D vision system that consists of two omni-cameras with a capability of automatic adaptation to any camera setup is proposed. While establishing the system, the user is allowed to place the two cameras freely in any orientations with any displacement.

Human–machine interaction has been studied intensively for many years. Laakso and Laakso [6] proposed a multiplayer game system using a top-view camera, which maps player avatar movements to physical ones and uses hand gestures to trigger actions. Magee et al. [7] proposed a special human–machine interface, which uses the symmetry between the left and right human eyes to control computer applications. Zabulis et al. [8] proposed a vision system composed of eight cameras mounted at room corners and two cameras mounted on the ceiling to localize multiple persons for wide-area exercise and entertainment applications. Starck et al. [9] proposed an advanced 3-D production studio with multiple cameras. The design considerations are first identified in that study, and some evaluation methods are proposed to provide insight into different design decisions.

Geometric features, such as points, lines, and spheres, in environments encode important information for online calibrations and adaptations [10], [11]. Several methods have been proposed to detect such features in environments. Ying [12], [13] proposed several methods for detecting geometric features when calibrating catadioptric cameras, which use the Hough transform to find the camera parameters by fitting detected line features into conic sections. Duan et al. [14] proposed a method for calibrating the effective focal length of a central catadioptric camera using a single space line, under the condition that the other parameters have been calibrated previously. Von Gioi et al. [15] proposed a method for detecting line segments in perspective images, which gives accurate results with a controlled number of false detections and requires no parameter tuning. Wu and Tsai [16] proposed a method for detecting lines directly in an omni-image using a Hough transform process without unwarping the omni-image. Maybank et al. [17] proposed a method based on the Fisher–Rao metric to detect lines in paracatadioptric images, which has the advantage that it does not produce multiple detections of a single space line. Yamazawa et al. [18] proposed a method for reconstructing 3-D line segments in images taken with three omni-cameras in known poses, based on trinocular vision with the use of the Gaussian sphere and a cubic Hough space [19]. Li et al. [20] proposed a vanishing point detection method based on cascaded 1-D Hough transforms, which requires only a small amount of computation time without losing accuracy.

In this paper, we propose a new 3-D vision system using two omni-cameras, which has a capability of automatic adaptation to any system setup for convenient in-field use. Specifically, the proposed vision system, as shown in Fig. 1, consists of two omni-cameras facing the user’s activity area. Each camera is affixed firmly to the top of a rod, forming an omni-camera stand, with the camera’s optical axis adjusted to be horizontal (i.e., parallel to the ground). The cameras are allowed to be placed freely in the environment at any location and in any orientation, resulting in an arbitrary system setup. Then, by the use of space line features in environments, the proposed vision system can adapt automatically to the arbitrarily established system configuration by just asking the user to stand still for a moment in the middle region of the activity area in front of the two cameras. After this adaptation operation, 3-D data can be computed precisely, as will be shown by the experimental results in this paper.

As an illustration of the proposed system, Fig. 1(c) shows the case of a user using a cot-covered fingertip as a 3-D cursor point, which is useful for 3-D space exploration in video games, virtual/augmented reality, 3-D graphic designs, and so on. The fingertip is detected and marked in red in that figure; its 3-D location can be computed by triangulation.

In contrast with a conventional vision system with two cameras whose configuration is fixed, the proposed system has several advantages. First, the system can be established freely, making it suitable for wide-ranging applications. This is a highly desired property, especially for consumer electronics applications such as home entertainment or in-house surveillance, since the user can place the system components flexibly without the need to adjust the positions of the existing furniture in the application environment. Second, since the proposed vision system uses omni-cameras, the viewing angle of the system is very wide. This can be seen as an improvement over commercial products such as Microsoft Kinect, since the player can now move more freely at a close distance to the sensors. This advantage is very useful for people who only have small spaces for entertainment. Also, the two cameras in the proposed system are totally separated from each other at a larger distance, resulting in the additional merit of yielding better triangulation precision and 3-D computation results due to the resulting longer baseline between the two cameras.

Fig. 1. Configuration and an illustration of the usage of the proposed system. (a) Illustration. (b) Real system used in this paper. (c) Omni-image of a user wearing a finger cot (marked in red).

In the remainder of this paper, an overview of the proposed system is described in Section II, and the details of the proposed techniques for use in the system are presented in Sections III–VI. Experimental results are included in Section VII, followed by conclusions in Section VIII.

II. Overview of Proposed System

The use of the proposed system for 3-D vision applications includes three stages: 1) in-factory calibration; 2) in-field system adaptation; and 3) 3-D data computation. The goal of the first stage is to calibrate the camera parameters efficiently in the factory environment. For this, a technique using landmarks and certain conveniently measurable system features is proposed. In the second stage, an in-field adaptation process is performed, which uses line features in environments to automatically compute the orientations of the cameras and the distance between them (i.e., the baseline of the system). In this stage, a user with a known height is asked to stand in the middle region in front of the two cameras to complete the adaptation. Subsequently, the 3-D data of any feature point (such as the fingertip shown in Fig. 1(c)) can be computed in the third stage.

A sketch of the three operation stages of the proposed system is described in the following algorithm (Algorithm 1). To simplify the expressions, we will refer to the left and right cameras as Cameras 1 and 2, and to their camera coordinate systems (CCSs) as CCSs 1 and 2, respectively.

Via Algorithm 1, the meaning of system adaptation, which is the main theme of this paper, can now be made clearer: with only the knowledge of the user’s height as input (see Step 6.3), the proposed system can infer the required values of the cameras’ orientations β1 and β2 and the baseline D for use in computing the 3-D data of space points. This is not the case when using a conventional stereo vision system with two cameras, in which the configuration of the cameras is fixed with their orientations and baseline unchangeable. This merit of the proposed system makes it easy to conduct a system setup in any room space by anyone for more types of applications, as mentioned previously.


Algorithm 1 Sketch of the Proposed System’s Operation

Stage 1. Calibration of omni-cameras.
Step 1. Set up a landmark and select at least two feature points Pi on it, called landmark points.
Step 2. Perform the following steps to calibrate Camera 1.
  2.1. Measure manually the radius of the mirror base of the camera as well as the distance between the camera and the mirror, as stated in Section VII-A.
  2.2. Take an omni-image I1 of the landmark points Pi with Camera 1 and extract the image coordinates of the pixels pi which correspond to Pi.
  2.3. Detect the circular boundary of the mirror base in I1, compute the center of the boundary as the camera center, and derive accordingly the focal length f1 of the camera, as described in Section VII-A.
  2.4. Calculate the eccentricity ε1 of the hyperboloidal mirror shape using the coordinates of pi and those of Pi, as stated in Section VII-A.
Step 3. Take an image I2 of the landmark points Pi with Camera 2 and perform operations similar to those of the last step to calibrate the camera, obtaining its focal length f2 and eccentricity ε2.

Stage 2. Adaptation to the system setup.
Step 4. Place the two camera stands at proper locations with appropriate orientations to meet the requirement of the application activity.
Step 5. Perform the following steps to calculate the included angle φ between the two optical axes of the cameras, as shown in Fig. 1(a).
  5.1. Capture two omni-images I1 and I2 of the application activity environment with Cameras 1 and 2, respectively.
  5.2. Detect space line features Li in omni-image I1 using the Hough transform technique as well as the parameters f1 and ε1, as described in Section IV.
  5.3. Detect space line features Ri in omni-image I2 similarly with the use of the parameters f2 and ε2.
  5.4. Calculate the angle φ using the detected line features Li and Ri in the way proposed in Section V.
Step 6. Perform the following steps to calculate the orientations of the two cameras and the baseline between them.
  6.1. Ask a user of the system to stand in the middle region in front of the two omni-cameras and take two images of the user using the cameras.
  6.2. Extract from the acquired images a pre-selected feature point on the user’s body, and compute the respective orientations β1 and β2 of the two cameras using the angle φ, as described in Section VI-A.
  6.3. Detect the user’s head and foot in the images, compute the in-between distance up to a scale, and use this distance as well as the corresponding known height of the user to calculate the baseline D between the cameras, as described in Section VI-C.

Stage 3. Acquisition of 3-D data of space points.
Step 7. Take two omni-images of a selected space feature point P (e.g., a fingertip, a handheld light point, a body spot, etc.) with both cameras, and extract the corresponding pixels p1 and p2 in the taken images.
Step 8. Calculate as output the 3-D position of P in terms of the coordinates of p1 and p2, the focal lengths f1 and f2, the eccentricities ε1 and ε2, the orientations β1 and β2, and the baseline D, using a triangulation-based method described in Section VI-B.


III. Structure of Omni-Cameras

The structure of omni-cameras used in this paper and the associated coordinate systems are defined as shown in Fig. 2. An omni-camera is composed of a perspective camera and a hyperboloidal-shaped mirror. The geometry of the mirror shape can be described in the CCS as

\[ \frac{(Z - c)^2}{a^2} - \frac{X^2 + Y^2}{b^2} = 1, \qquad a^2 + b^2 = c^2, \qquad Z < c. \]

The relation between the camera coordinates (X, Y, Z) of a space point P and the image coordinates (u, v) of its corresponding projection pixel p may be described [22] as

\[ \tan\alpha = \frac{Z}{\sqrt{X^2 + Y^2}} = \frac{(\varepsilon^2 + 1)\sin\beta - 2\varepsilon}{(\varepsilon^2 - 1)\cos\beta} \tag{1} \]

\[ \cos\beta = \frac{r}{\sqrt{r^2 + f^2}}, \qquad \sin\beta = \frac{f}{\sqrt{r^2 + f^2}}, \qquad r = \sqrt{u^2 + v^2} \tag{2} \]

where ε is the eccentricity of the mirror shape, with its value equal to c/a, and α is the elevation angle of P. The azimuth angle θ of P can be expressed in terms of the image and camera coordinates as

\[ \cos\theta = \frac{X}{\sqrt{X^2 + Y^2}} = \frac{u}{\sqrt{u^2 + v^2}}, \qquad \sin\theta = \frac{Y}{\sqrt{X^2 + Y^2}} = \frac{v}{\sqrt{u^2 + v^2}}. \tag{3} \]
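To make the camera model concrete, the following minimal Python sketch (not from the original paper; the function and variable names are illustrative) evaluates (1)–(3) to obtain the elevation and azimuth information of the space point seen at a pixel (u, v), assuming the calibrated focal length f and mirror eccentricity ε are available.

```python
import numpy as np

def pixel_to_angles(u, v, f, eps):
    """Compute (tan_alpha, cos_theta, sin_theta) for the space point seen at
    pixel (u, v), following (1)-(3); f and eps are the calibrated focal
    length and mirror eccentricity of the omni-camera."""
    r = np.hypot(u, v)                      # r = sqrt(u^2 + v^2), see (2)
    cos_b = r / np.hypot(r, f)              # cos(beta)
    sin_b = f / np.hypot(r, f)              # sin(beta)
    tan_alpha = ((eps**2 + 1) * sin_b - 2 * eps) / ((eps**2 - 1) * cos_b)  # (1)
    cos_theta, sin_theta = u / r, v / r     # (3); undefined at the image center (r = 0)
    return tan_alpha, cos_theta, sin_theta
```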

Fig. 2. Camera and hyperboloidal-shaped mirror structure.

IV. Space Line Detection in Omni-Images

We now describe the proposed method for detecting horizontal space lines in omni-images. Several ideas adopted to design the method are emphasized first. First, it is desired to eliminate initially as many nonhorizontal space lines in each acquired image as possible, since only horizontal space lines are used to find the included angle φ, as described in Section V. Second, it is hoped that the method can deal with large amounts of noise so that it can be used in an automatic process. Third, it is preferable to utilize certain properties in man-made environments to improve the detection result, including the two properties that space lines are mostly horizontal or vertical, and that space line edges are usually not close to one another.

This section is organized as follows. First, a quadratic formula describing the projection of a space line in an omni-image is derived in Section IV-A. Next, a refined Hough transform technique for detecting space lines is proposed in Section IV-B, which uses a novel adaptive thresholding scheme to produce robust detection results. Also, the projection of a vertical space line is derived and analyzed in Section IV-C. A peak cell extraction technique proposed for use in the refined Hough process is described in Section IV-D.

A. Projection of a Space Line in an Omni-Image

Given a space line L, we can construct a plane S that goes through L and the origin Om of a CCS, as shown in Fig. 3. Let NS = (l, m, n) denote the normal vector of S. Then, any point P = (X, Y, Z) on L satisfies the following plane equation:

\[ N_S \cdot P = lX + mY + nZ = 0 \tag{4} \]

where “·” denotes the inner-product operator. Combining (4) with (1) and (3), we get

\[ lR\cos\theta + mR\sin\theta + nR\tan\alpha = 0 \tag{5} \]

where R = √(X² + Y²). Dividing (5) by R√(l² + m² + n²) leads to

\[ \frac{l}{\sqrt{l^2 + m^2 + n^2}}\cos\theta + \frac{m}{\sqrt{l^2 + m^2 + n^2}}\sin\theta + \frac{n}{\sqrt{l^2 + m^2 + n^2}}\tan\alpha = 0 \]

that can be transformed into the following form:

\[ A\cos\theta + \sqrt{1 - A^2 - B^2}\,\sin\theta + B\tan\alpha = 0 \tag{6} \]

with the two parameters A and B being defined as

\[ A = \frac{l}{\sqrt{l^2 + m^2 + n^2}}, \qquad B = \frac{n}{\sqrt{l^2 + m^2 + n^2}}. \tag{7} \]

Accordingly, the normal vector NS of plane S, originally being (l, m, n), can now be expressed alternately as

\[ N_S = \left(A, \sqrt{1 - A^2 - B^2}, B\right). \tag{8} \]

It is assumed that m ≥ 0 in (6) and (8). In the case that m < 0, we may consider NS = (−l, −m, −n) instead, which also represents the same space plane S. Also, it can be seen from (7) that the parameters A and B satisfy the constraint A² + B² ≤ 1, implying that the Hough space is of a circular shape.

Parameters A and B are used in the Hough transform to detect space lines in omni-images. These two parameters are skillfully defined in (7), leading to several advantages. First, removal of vertical space lines can easily be achieved by ignoring the periphery region, as described in Section IV-C. Next, since the possible values of A and B range from −1 to 1, the size of the Hough space is fixed within this range. This is a necessary property for the use of the Hough transform technique, and is an improvement on a previous work [16]. Also, parameters A and B are used directly to describe the directional vector of the space line L, as will be shown in (14). Hence, one may divide the Hough space into more cells to yield a better precision.

Fig. 3. Illustration of a space line L projected on an omni-image as IL.

Combining (6) with (1) through (3), we can derive a conic section equation to describe the projection of a space line L onto an omni-image as follows:

\[ F_{A,B}(u, v) = C_1 u^2 + C_2 uv + C_3 v^2 + C_4 u + C_5 v + C_6 = 0 \tag{9} \]

where the coefficients C1 through C6 are

\[ C_1 = A^2 - B^2\left(C_7^2 - 1\right), \qquad C_2 = 2A\sqrt{1 - A^2 - B^2}, \qquad C_3 = 1 - A^2 - C_7^2 B^2, \]
\[ C_4 = 2ABC_7 f, \qquad C_5 = 2BC_7\sqrt{1 - A^2 - B^2}\,f, \qquad C_6 = B^2 f^2, \qquad C_7 = \frac{\varepsilon^2 + 1}{\varepsilon^2 - 1}. \]

The quadratic formula (9) will be called the target equation in the Hough transform subsequently, since the goal of the detection process is to find curves described by it in an omni-image.
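As an illustration of how the target equation can be evaluated in practice, the following Python sketch (illustrative only; the names are not from the paper) computes the coefficients of (9) for a given Hough cell (A, B) and evaluates F_{A,B} at a pixel.

```python
import numpy as np

def conic_coefficients(A, B, f, eps):
    """Coefficients C1..C6 of the target equation (9) for Hough parameters
    (A, B), focal length f, and mirror eccentricity eps."""
    C7 = (eps**2 + 1) / (eps**2 - 1)
    s = np.sqrt(max(1.0 - A**2 - B**2, 0.0))   # sqrt(1 - A^2 - B^2)
    C1 = A**2 - B**2 * (C7**2 - 1)
    C2 = 2 * A * s
    C3 = 1 - A**2 - C7**2 * B**2
    C4 = 2 * A * B * C7 * f
    C5 = 2 * B * C7 * s * f
    C6 = B**2 * f**2
    return C1, C2, C3, C4, C5, C6

def F(u, v, coeffs):
    """Evaluate F_{A,B}(u, v) of (9); it is (close to) zero on the projection
    of the corresponding space line."""
    C1, C2, C3, C4, C5, C6 = coeffs
    return C1*u*u + C2*u*v + C3*v*v + C4*u + C5*v + C6
```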

B. Hough Space Generation With Adaptive Thresholding

We define the Hough space to be 2-D with the parameters A and B described previously. Furthermore, we define the cell support for a cell at (A, B) in the Hough space as the set of those pixels that contribute to the accumulation of the value of that cell. Let L denote a space line described by the two parameters (A, B). Two properties of cell supports are desirable: 1) the pixels of the projection IL of L onto the omni-image are all included in the cell support for the cell (A, B); and 2) the pixels not on IL are not included in this cell support. Furthermore, it is desired that the shape of the cell support is of a certain fixed width and not too “thin,” so that (edge) pixels originally belonging to IL but deviating slightly from it are still counted. That is, a cell support is desired to be a space line projection with a certain width everywhere along the line, which is called an equal-width projection curve hereafter. In this section, we first show that commonly used curve detection methods do not generate the desired equal-width projection curves as cell supports, as shown in Fig. 4(a) and (b); thus, we propose in this paper an adaptive method for solving this problem to yield better results such as that shown in Fig. 4(c).

Fig. 4. Shapes of cell supports of four chosen Hough cells yielded by three methods. (a) Using the traditional accumulation method. (b) Using a threshold δ = 3000. (c) Using the proposed technique.

A commonly used method for curve detection to calculate the cell support is as follows [30]–[32]: for each pixel at coordinates (u, v), find all the Hough cells with parameter values (A, B) satisfying the target equation (9), and increment the value of each cell so found by one. Some cell supports calculated by this method are shown in Fig. 4(a), showing that the cell supports for some cells are not of equal width.

Another straightforward method for calculating the cell support is as follows [16], [33]: define a threshold δ first, and for each (edge) pixel with coordinates (u, v), find all the Hough cells with parameters (A, B) satisfying the inequality

\[ \left| F_{A,B}(u, v) \right| = \left| C_1 u^2 + C_2 uv + C_3 v^2 + C_4 u + C_5 v + C_6 \right| \le \delta \tag{10} \]

and increment the value of each cell so found by one. However, as shown in Fig. 4(b), it is impossible to find a good threshold δ that makes all the projection curves have equal widths. To solve this problem, it is necessary to develop a new method for adaptively determining the threshold value δ for each different cell support and each different pixel.

Conceptually, to draw an equal-width curve of F = 0, we have to compute the function values of F on the projection curve boundary, and define the threshold δ accordingly. To this aim, the method that we propose makes a novel use of total derivatives to estimate the function values of F on the boundary, and sets the threshold value δ in (10) accordingly. More specifically, δ is set in the proposed method to be

\[ \delta(A, B, u, v) = \max_{(\Delta u,\,\Delta v) = (\pm 1,\,\pm 1)} \left| \frac{\partial F_{A,B}}{\partial u}\,\Delta u + \frac{\partial F_{A,B}}{\partial v}\,\Delta v \right| \tag{11} \]

for different Hough cells with parameters (A, B) and different pixels at coordinates (u, v), where (Δu, Δv) denotes a one-pixel displacement. Accordingly, as shown in Fig. 4(c), the drawn curves now have uniform widths.

In summary, the Hough space can be generated using (10) with the threshold δ calculated by (11). With this improvement, the cell supports become equal-width projection curves, making the Hough transform process more robust in yielding a precise peak value that represents a detected space line.
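The accumulation step of (10)–(11) can be sketched as follows in Python; this is a purely illustrative brute-force implementation (real implementations would vectorize) that reuses the conic_coefficients() and F() helpers sketched in Section IV-A. Since the partial derivatives of (9) are ∂F/∂u = 2C1u + C2v + C4 and ∂F/∂v = C2u + 2C3v + C5, the maximum in (11) reduces to the sum of their absolute values.

```python
import numpy as np

def build_hough_space(edge_points, f, eps, n_cells=200):
    """Fill a 2-D Hough space over A, B in [-1, 1]: a cell (A, B) is
    incremented by every edge pixel (u, v) satisfying (10) with the adaptive
    threshold (11). Reuses conic_coefficients() and F() from the earlier
    sketch."""
    grid = np.linspace(-1.0, 1.0, n_cells)
    H = np.zeros((n_cells, n_cells))
    for i, A in enumerate(grid):
        for j, B in enumerate(grid):
            if A * A + B * B > 1.0:          # outside the circular Hough space
                continue
            C1, C2, C3, C4, C5, C6 = coeffs = conic_coefficients(A, B, f, eps)
            for (u, v) in edge_points:
                delta = abs(2*C1*u + C2*v + C4) + abs(C2*u + 2*C3*v + C5)  # (11)
                if abs(F(u, v, coeffs)) <= delta:                          # (10)
                    H[i, j] += 1
    return H, grid
```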

C. Additional Constraint on Vertical Space Lines

In man-made environments, most lines are either parallel to the floor (called horizontal space lines hereafter) or perpendicular to the floor (called vertical space lines). If we can eliminate vertical space lines from the detection results, the remaining ones are much more likely to be the desired horizontal ones, as stated in Section V. In this section, a constraint on vertical space lines is derived for the purpose of removing such lines.

As mentioned earlier, the omni-camera stands are vertically placed on the floor with the Y-axis of the camera coordinate system being a vertical line, as depicted in Fig. 1(a). As a result, the directional vector vL of a vertical space line L is just (0, 1, 0). Let S be the space plane going through L and the origin Om at camera coordinates (0, 0, 0). Also, let NS = (l, m, n) be the normal vector of plane S. By definition, the normal vector NS is perpendicular to vL, leading to the constraint

\[ N_S \cdot v_L = (l, m, n) \cdot (0, 1, 0) = m = 0. \]

This constraint, when combined with (7), results in the equality A² + B² = 1, which shows that the Hough cells of vertical space lines are located in the periphery region of the circular Hough space (as mentioned in Section IV-A). As a result, vertical space lines can easily be removed by just ignoring the periphery region of the Hough space. In the proposed method, this is achieved automatically by applying a filter to the Hough space, as described in Section IV-D.

Note that, in general, vertical and horizontal space lines do not correspond to curve segments with vertical and horizontal chords in omni-images. In fact, the projections of horizontal space lines may have any direction, as shown in Fig. 11(f). Also, the removal of a vertical space line will sometimes also eliminate a few horizontal space lines lying on the plane that goes through the vertical space line and the origin of the camera coordinate system. However, as shown in Figs. 7(a) and (b) and 11(e) and (f), many horizontal space lines can still be extracted.

D. Peak Cell Extraction

After the Hough space is generated, the last thing to do is to extract cells with peak values, called peak cells, which represent the detected space lines. The simplest way to accomplish this is to find the cells with large values. However, if we do so to get peak cells such as those shown in Fig. 5(a), we might get a bad detection result such as that shown in Fig. 5(b), with many of the detected space lines being too close to one another, so that fewer useful space lines can be extracted.

To solve this problem, we notice that the line edges in an environment are mostly not close to one another, meaning that two detected horizontal lines are usually separated by a certain distance. This, in turn, means that extracted peak cells should not be too close to one another. To find peak cells that are not too close to each other, the following filter is applied to the Hough space:

\[ \frac{1}{25} \begin{bmatrix} -1 & -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 \\ -1 & -1 & 24 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 \end{bmatrix}. \tag{12} \]

Then, we extract peak cells by choosing the cells with large values in the filtered Hough space to yield a better detection result, as shown in Fig. 5(c) and (d).

Furthermore, it is noted that when applying the filter to the Hough space, one of its side effects is the removal of the periphery region. This is a desired property mentioned in Section IV-C: the removal of the periphery region is equivalent to the removal of vertical space lines. Thus, expectedly, we can get more horizontal lines, as desired. To sum up, we have proposed a new method for detecting horizontal space lines in omni-images, with several novel techniques proposed in Sections IV-A–IV-D to improve the detection results. The proposed method for horizontal space line detection is summarized as Algorithm 2.

Algorithm 2 Detection of horizontal space lines in the form of conic sections in an omni-image.
Input: an omni-image I.
Output: 2-tuple values (Ai, Bi), as defined in (7), which describe the detected horizontal space lines in I.
Step 1. Extract the edge points in I by an edge detection algorithm [25].
Step 2. Set up a 2-D Hough space H with two parameters A and B, and set all the initial cell values to zero.
Step 3. For each detected edge point at coordinates (u, v) and each cell C with parameters (A, B), if (u, v, A, B) satisfies (10), in which the threshold value δ is adaptively calculated by (11), then increment the value of C by one.
Step 4. Apply the filter described by (12) to the Hough space H, choose those cells with maximum values, and take their corresponding parameters (Ai, Bi) as output.
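Step 4 of Algorithm 2 (filtering the Hough space with (12) and picking the peak cells) might be sketched as follows; this is an illustrative SciPy-based sketch, not the authors’ code, and it assumes the 2-D array H and grid returned by the accumulation sketch above.

```python
import numpy as np
from scipy.ndimage import convolve

def extract_peak_cells(H, grid, n_lines=50):
    """Apply the 5x5 filter of (12) to the Hough array H and return the
    (A, B) parameters of the n_lines largest cells of the filtered space,
    together with their filtered cell values (used as weights later)."""
    kernel = -np.ones((5, 5)) / 25.0
    kernel[2, 2] = 24.0 / 25.0
    H_filtered = convolve(H, kernel, mode='constant', cval=0.0)
    flat = np.argsort(H_filtered, axis=None)[::-1][:n_lines]
    rows, cols = np.unravel_index(flat, H.shape)
    # Each peak cell corresponds to one detected horizontal space line.
    return [(grid[i], grid[j], H_filtered[i, j]) for i, j in zip(rows, cols)]
```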

V. Calculation of Included Angle φ Between Two Cameras’ Optical Axes Using Detected Lines

In the proposed vision system, the omni-cameras are mounted on two vertical stands with the optical axes being parallel to the floor plane, as mentioned previously, but the cameras’ optical axes are allowed to be nonparallel, making an included angle φ as depicted in Fig. 1(a). To accomplish the 3-D data computation work under an arbitrary system setup, the included angle φ must be calculated first. A method for calculating the angle φ using a single manually chosen horizontal space line is proposed first in Section V-A. However, in order to conduct the adaptation process automatically, we have to calculate the angle φ using multiple automatically extracted horizontal space lines. To achieve this, a novel method is proposed in Section V-B, which utilizes all the detected space lines from the two omni-images taken with the cameras.

Fig. 5. Comparison of the traditional peak cell extraction method and the proposed one. (a) Hough space. (b) Fifty detected space lines using the traditional method. (c) Post-processed Hough space. (d) Fifty detected space lines using the proposed method.

The proposed method has several advantages. First, only the directional information of the space lines, which is a feature robust against noise, is used. Next, no line correspondence between the two omni-images needs to be derived; that is, it is unnecessary to decide which line in the left omni-image corresponds to which one in the right omni-image. This makes the proposed method fast, reliable, and suitable for a wide-baseline stereo system such as the one proposed in this paper. Also, the proposed method makes use of a good property of man-made environments: many line edges in such environments are parallel to one another, leading to an improvement in the robustness and correctness of the computation result.

A. Calculating Angle φ Using a Single Horizontal Space Line

In this section, a method for calculating the angle φ between the two cameras’ optical axes is proposed, using a single horizontal space line L in the environment. Let (A1, B1) be the parameters corresponding to line L in an omni-image taken with Camera 1, vL = (vx, vy, vz) be the directional vector of L in CCS 1, and S1 be the space plane going through line L and the origin of CCS 1. The normal vector of S1 can be derived, according to (8), to be

\[ n_1 = \left(A_1, \sqrt{1 - A_1^2 - B_1^2}, B_1\right). \]

Since S1 goes through line L, we know that vL and n1 are perpendicular, resulting in

\[ v_L \cdot n_1 = v_x A_1 + v_y\sqrt{1 - A_1^2 - B_1^2} + v_z B_1 = 0. \tag{13} \]

Furthermore, since L, being horizontal, is parallel to the xz-plane as shown in Fig. 1(a), we get another constraint vy = 0. This constraint can be combined with (13) to get

\[ v_L = (v_x, v_y, v_z) = (B_1, 0, -A_1). \tag{14} \]

Fig. 6. Illustration of the angles φ1, φ2, and φ. (a) Definition of φ1. (b) Relation between φ1, φ2, and φ.

Next, by referring to Fig. 6(a), it can be seen that the angle φ1 between the x-axis of CCS 1 and space line L is

\[ \phi_1 = \tan^{-1}(-A_1/B_1). \]

Similarly, let (A2, B2) be the parameters corresponding to the horizontal space line L in Camera 2. By following the same derivations described above, the angle φ2 between the x-axis of CCS 2 and line L can be derived to be

\[ \phi_2 = \tan^{-1}(A_2/B_2). \]

As depicted in Fig. 6(b), where L1 and L2 specify identically the single horizontal space line L, the angle φ between the two cameras’ optical axes can now easily be computed to be

\[ \phi = \phi_1 - \phi_2 = \tan^{-1}(-A_1/B_1) - \tan^{-1}(A_2/B_2). \tag{15} \]

B. Calculating Angle φ Reliably Using Several Detected Lines

Horizontal space lines can be detected from an omni-image using Algorithm 2, as described in Section IV. Let L1 be a space line so detected from the left omni-image with parameters (A1, B1), and let L2 be another detected similarly from the right omni-image with parameters (A2, B2). As stated previously, the angle φ can be calculated using (15) if the space lines L1 and L2 are an identical horizontal space line L in the environment.

However, the line correspondence problem of deciding whether L1 and L2 are identical or not is difficult for several reasons, especially for a wide-baseline stereo system such as the one proposed in this paper. First, the respective viewpoints and viewing fields of the two cameras differ largely. Thus, environment features, such as lighting and color, involved in the image-taking conditions at the two far-separated cameras might vary largely as well. Also, the extrinsic parameters of the two cameras are unknown; therefore, the involved geometric relationship is not available for use in determining the line correspondences. To get rid of these difficulties, we propose a novel statistics-based method for reliably finding the angle φ without the need to find such line correspondences.

More specifically, the proposed method makes use of two important properties. First, it is noticed that the correct value of the angle φ can still be calculated using (15) even when the two space lines L1 and L2 are not an identical one, but are parallel to each other. This can be seen from the fact that the angles φ1 and φ2 remain the same if L1 and L2 are parallel, so that the computed angle φ is still correct, as desired. Second, it can be seen that in man-made environments, many of the line edges are parallel to one another to make the environment neat and orderly. For example, tables, shelves, and lights are always placed to be parallel to walls and to one another. Combining these two properties, we can conclude that any two detected space lines L1 and L2 are very likely to be parallel to each other. Based on this observation, we assume every possible line pair L1 and L2 to be parallel, and compute accordingly a candidate value for the angle φ, where L1 is one of the space lines detected from the left omni-image, and L2 is another detected from the right omni-image. Then, we infer a correct value for the angle φ from the set of all the computed candidate values via a statistical approach based on the concept of voting.

In more detail, the proposed method is designed to include three main steps. First, we extract space lines from the left omni-image as described in Algorithm 2, and denote their line parameters (A and B) as li. Similarly, we detect space lines from the right omni-image with their parameters denoted as rj. In addition, we define two weights w(li) and w(rj) for li and rj, respectively, to be the cell values in the post-processed Hough space derived in Step 4 of Algorithm 2, which represent the trust measures of the detected space lines. Then, from each possible pair (li, rj), we calculate a value φij for the angle φ using (15), and a third weight wij defined as w(li) · w(rj). The value wij may be regarded as the trust measure of the calculated angle φij. Finally, we set up a set of bins, each for a distinct value of φ, and for each computed value φij, we increase the value of the corresponding bin by the weight wij. After such a weight accumulation process is completed, the bin with the largest value is found and the corresponding angle φij is taken as the desired value for the angle φ.
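The voting procedure described above can be sketched in Python as follows; this is an illustrative sketch, not the authors’ code. It uses arctan2 in place of the tan⁻¹ ratios of Section V-A to keep quadrant information and wraps each candidate angle into [−90°, 90°) because line directions are defined only up to sign; each detected line is assumed to be given as a triple (A, B, w) with w its post-filter Hough cell value.

```python
import numpy as np

def estimate_phi(left_lines, right_lines, bin_width_deg=1.0):
    """Weighted voting for the included angle phi (in degrees).
    left_lines / right_lines: iterables of (A, B, w), where (A, B) are the
    Hough parameters of a detected line and w its post-filter cell value.
    Every left/right pair is assumed parallel; its candidate angle, cf. (15),
    is accumulated with weight w_l * w_r, and the heaviest bin wins."""
    bins = {}
    for (A1, B1, w1) in left_lines:
        phi1 = np.degrees(np.arctan2(-A1, B1))      # angle of L w.r.t. x-axis of CCS 1
        for (A2, B2, w2) in right_lines:
            phi2 = np.degrees(np.arctan2(A2, B2))   # angle of L w.r.t. x-axis of CCS 2
            phi = (phi1 - phi2 + 90.0) % 180.0 - 90.0  # lines are directionless: wrap to [-90, 90)
            key = round(phi / bin_width_deg)
            bins[key] = bins.get(key, 0.0) + w1 * w2
    return max(bins, key=bins.get) * bin_width_deg
```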

An experimental result so obtained is shown in Fig. 7. In Fig. 7(a) and (b), 50 space lines with parameters li and rj were detected using Algorithm 2 from the left and right omni-images, respectively. For each possible pair (li, rj), where 1 ≤ i, j ≤ 50, the corresponding angle φij and weight wij were calculated and accumulated in bins, as described previously. The accumulation result is shown in Fig. 7(c), with the maximum occurring at φ = −23°, which is taken finally as the derived value of the angle φ.

VI. Proposed Technique for Baseline Derivation and Analytic Computation of 3-D Data

The world coordinate system X–Y–Z is defined as depicted in Fig. 8. The X-axis goes through the two camera centers O1 and O2, the Y-axis is taken to be parallel to the Y-axes of both CCSs, the Z-axis is defined to be perpendicular to the XY-plane, and the origin is defined to be the origin O1 of CCS 1. It is noted here that since the two omni-cameras are affixed firmly on the omni-camera stands and adjusted to an identical height as described in Section I, the axes X, Z, X1, Z1, X2, and Z2 are all on the same plane, as illustrated in Fig. 8.

Fig. 7. Experimental result of the proposed adaptation method for detecting the included angle φ. (a), (b) Left and right omni-images with the detected space lines superimposed on them. (c) Accumulation result for φ, with the maximum occurring at φ = −23°.

Since the two omni-cameras are allowed to be placed arbitrarily at any location with any orientation, it is necessary to find the baseline D and the orientation angles β1 and β2 (as defined in Fig. 8) in advance to calculate the 3-D data of space points. A novel method for calculating the orientation angles is proposed first in Section VI-A. After the orientations are derived, the 3-D data can be determined up to a scale, as discussed in Section VI-B. Then, a method using the known height of the user to determine the baseline D is proposed in Section VI-C. After the baseline D is derived, the absolute 3-D data of space feature points can be derived by the method proposed in Section VI-B. It is emphasized that all computations involved in these steps are done analytically, i.e., by the use of formulas without resorting to iterative algorithms.

A. Finding Two Cameras’ Orientations

Let the camera coordinates of CCS 1 be denoted as (X1, Y1, Z1), and those of CCS 2 as (X2, Y2, Z2), as shown in Fig. 8. As mentioned previously, the two CCSs X1–Y1–Z1 and X2–Y2–Z2 are allowed to be oriented arbitrarily (with Y1 and Y2 parallel to each other), and the only knowledge acquired by the proposed system is the angle φ between the two optical axes Z1 and Z2, which is derived using the detected space lines, as described previously in Section V.

Fig. 8. Top view of the coordinate systems. The baseline D, orientation angles β1 and β2, and a point Puser on the user’s body are also drawn.

To derive the angles β1 and β2, the user is asked to stand in the middle region in front of the two omni-cameras so that a feature point Puser on the user’s body may be utilized to draw a mid-perpendicular plane of the line segment O1O2, as shown in Fig. 8. Let (X1, Y1, Z1) be the coordinates of Puser in CCS 1, and (u1, v1) be the corresponding pixel’s image coordinates in the left omni-image. From (1) and (3), we have the equality

\[ \begin{bmatrix} X_1 & Y_1 & Z_1 \end{bmatrix}^T = \sqrt{X_1^2 + Y_1^2}\,\begin{bmatrix} \cos\theta_1 & \sin\theta_1 & \tan\alpha_1 \end{bmatrix}^T \]

where cos θ1, sin θ1, and tan α1 are computed from (u1, v1) according to (1) and (3). This equality shows that the directional vector from O1 to Puser is (cos θ1, sin θ1, tan α1) in CCS 1. An angle ψ1 is defined on the XZ-plane as illustrated in Fig. 8, which can be expressed as ψ1 = tan⁻¹(tan α1/cos θ1). Similarly, the angle ψ2 defined on the XZ-plane can be derived to be tan⁻¹(tan α2/cos θ2). Accordingly, we can derive β1 to be

\[ \beta_1 = \psi_1 - \left[\frac{\pi}{2} - \frac{\psi_2 - \psi_1 + \phi}{2}\right] = \frac{\psi_1 + \psi_2 + \phi}{2} - \frac{\pi}{2} \]

and β2 is just β2 = β1 − φ. This completes the derivations of the orientation angles β1 and β2 of the two cameras.
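A minimal Python sketch of the orientation computation, under the reconstruction of Section VI-A above (names are illustrative and angles are in radians), might look as follows.

```python
import numpy as np

def psi_from_pixel(u, v, f, eps):
    """Angle psi = arctan(tan(alpha)/cos(theta)) of the user's feature point,
    measured on the XZ-plane of the camera's CCS (Section VI-A)."""
    r = np.hypot(u, v)
    cos_b, sin_b = r / np.hypot(r, f), f / np.hypot(r, f)
    tan_alpha = ((eps**2 + 1) * sin_b - 2 * eps) / ((eps**2 - 1) * cos_b)
    cos_theta = u / r
    return np.arctan(tan_alpha / cos_theta)

def orientation_angles(psi1, psi2, phi):
    """beta1 and beta2 from psi1, psi2 and the included angle phi."""
    beta1 = (psi1 + psi2 + phi) / 2.0 - np.pi / 2.0
    return beta1, beta1 - phi
```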

B. Calculating 3-D Data of Space Feature Points

Let P be a space feature point with coordinates (X1, Y1, Z1) in CCS 1, and let the projection of P onto the omni-image taken by Camera 1 be the pixel p1 located at image coordinates (u1, v1). From (1) and (3) with R1 = √(X1² + Y1²), we have

\[ \begin{bmatrix} X_1 & Y_1 & Z_1 \end{bmatrix}^T = R_1 \begin{bmatrix} \cos\theta_1 & \sin\theta_1 & \tan\alpha_1 \end{bmatrix}^T \tag{16} \]

where cos θ1, sin θ1, and tan α1 are computed from (u1, v1) by (1) and (3). Equation (16) describes a light ray L1 going through the origin O1 with directional vector d1 = [cos θ1 sin θ1 tan α1]T in CCS 1. To transform the vector into the coordinate system X–Y–Z, we have to rotate d1 about the Y-axis through the angle β1, as illustrated in Fig. 8. As a result, the transformed light ray L1 goes through (0, 0, 0) with its directional vector d1 being

\[ d_1 = \begin{bmatrix} \cos\beta_1 & 0 & -\sin\beta_1 \\ 0 & 1 & 0 \\ \sin\beta_1 & 0 & \cos\beta_1 \end{bmatrix} \begin{bmatrix} \cos\theta_1 \\ \sin\theta_1 \\ \tan\alpha_1 \end{bmatrix}. \tag{17} \]

Similarly, let the space feature point P be located at (X2, Y2, Z2) in CCS 2, and let its projection onto the omni-image taken by Camera 2 be the pixel p2 located at image coordinates (u2, v2). Then, similarly to the derivation of (16), we can obtain the following equation to describe L2 in CCS 2:

\[ \begin{bmatrix} X_2 & Y_2 & Z_2 \end{bmatrix}^T = R_2 \begin{bmatrix} \cos\theta_2 & \sin\theta_2 & \tan\alpha_2 \end{bmatrix}^T \tag{18} \]

where R2 = √(X2² + Y2²). As illustrated in Fig. 8, we can transform the light ray L2 from CCS 2 to the coordinate system X–Y–Z by rotating the ray through the angle β2 and translating it by the vector [D 0 0]T. As a result, the transformed light ray L2 goes through (D, 0, 0) with its directional vector d2 being

\[ d_2 = \begin{bmatrix} \cos\beta_2 & 0 & -\sin\beta_2 \\ 0 & 1 & 0 \\ \sin\beta_2 & 0 & \cos\beta_2 \end{bmatrix} \begin{bmatrix} \cos\theta_2 \\ \sin\theta_2 \\ \tan\alpha_2 \end{bmatrix}. \tag{19} \]


We now have two light rays L1 and L2, both going through the space point P. If everything, including the system setup, camera calibration, and feature detection, were conducted accurately without incurring errors, these two lines would intersect perfectly at a single point, which is just P. But, unavoidably, various errors of imprecision always exist, so that the intersection point does not exist. One solution to this problem is to estimate the coordinates of point P as those of the midpoint Pm of the shortest line segment between the two light rays L1 and L2, as illustrated in Fig. 9.

To obtain this solution, let d be the vector perpendicular to d1 and d2 as shown in Fig. 9, which can be expressed as d1 × d2, where × denotes the cross-product operator. Since Q1 is on light ray L1, its coordinates (X1, Y1, Z1) can be expressed as

\[ \begin{bmatrix} X_1 & Y_1 & Z_1 \end{bmatrix}^T = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}^T + \lambda_1 d_1 \tag{20} \]

where λ1 is an unknown scaling factor. Let S1 be the plane containing P2, Q1, and Q2. As illustrated in Fig. 9, the normal vector n1 of plane S1 is d2 × d, or equivalently, d2 × (d1 × d2). Since P2 and Q1 are both on this plane, we know that the vector P2Q1 is perpendicular to n1. This fact can be expressed by

\[ \overrightarrow{P_2 Q_1} \cdot n_1 = \left(\begin{bmatrix} X_1 & Y_1 & Z_1 \end{bmatrix}^T - \begin{bmatrix} D & 0 & 0 \end{bmatrix}^T\right) \cdot \left(d_2 \times (d_1 \times d_2)\right) = 0. \]

Combining the above equality with (20), we get

\[ \left(\lambda_1 d_1 - \begin{bmatrix} D & 0 & 0 \end{bmatrix}^T\right) \cdot \left(d_2 \times (d_1 \times d_2)\right) = 0 \]

from which the unknown scalar λ1 can be solved to be

\[ \lambda_1 = D\,\frac{\left(d_2 \times (d_1 \times d_2)\right) \cdot e_1}{\left(d_2 \times (d_1 \times d_2)\right) \cdot d_1} \tag{21} \]

where e1 = [1 0 0]T. Similarly, since Q2 is on light ray L2, the coordinates (X2, Y2, Z2) of Q2 can be expressed as

\[ \begin{bmatrix} X_2 & Y_2 & Z_2 \end{bmatrix}^T = \begin{bmatrix} D & 0 & 0 \end{bmatrix}^T + \lambda_2 d_2 \tag{22} \]

where λ2 is another unknown scaling factor. Let S2 be the plane containing P1, Q1, and Q2. The normal vector n2 of this plane is d1 × d = d1 × (d1 × d2). Since P1 and Q2 are both on this plane, the vector P1Q2 is known to be perpendicular to n2, leading to the following equality:

\[ \overrightarrow{P_1 Q_2} \cdot n_2 = \left(\begin{bmatrix} X_2 & Y_2 & Z_2 \end{bmatrix}^T - \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}^T\right) \cdot \left(d_1 \times (d_1 \times d_2)\right) = 0. \]

Combining the above equality with (22), we get

\[ \left(\begin{bmatrix} D & 0 & 0 \end{bmatrix}^T + \lambda_2 d_2\right) \cdot \left(d_1 \times (d_1 \times d_2)\right) = 0 \]

which can be solved to get the unknown scalar λ2 as

\[ \lambda_2 = -D\,\frac{\left(d_1 \times (d_1 \times d_2)\right) \cdot e_1}{\left(d_1 \times (d_1 \times d_2)\right) \cdot d_2}. \tag{23} \]

Fig. 9. Illustration of deriving the middle point Pm of light rays L1 and L2.

Since Pm is the midpoint between Q1 and Q2, the coordinates (Xm, Ym, Zm) of Pm can be expressed as

\[ \begin{bmatrix} X_m \\ Y_m \\ Z_m \end{bmatrix} = \frac{1}{2}\left(\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} + \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix}\right) \]

which, when combined with (20), (21), (22), and (23), leads to the following estimation result for use as the desired 3-D data of the space point P:

\[ \begin{bmatrix} X_m \\ Y_m \\ Z_m \end{bmatrix} = \frac{1}{2} D \left[\, e_1 - \frac{\left(d_2 \times (d_1 \times d_2)\right) \cdot e_1}{\left(d_1 \times (d_1 \times d_2)\right) \cdot d_2}\, d_1 - \frac{\left(d_1 \times (d_1 \times d_2)\right) \cdot e_1}{\left(d_1 \times (d_1 \times d_2)\right) \cdot d_2}\, d_2 \right] \tag{24} \]

where e1 = [1 0 0]T and D is the baseline to be determined.
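The midpoint triangulation of (17)–(24) might be implemented as in the following sketch (illustrative only, not the authors’ code); d1_ccs and d2_ccs are the ray directions [cos θ, sin θ, tan α] obtained from the two image points, e.g., via the pixel_to_angles sketch of Section III.

```python
import numpy as np

def rotate_y(beta):
    """Rotation about the Y-axis used in (17) and (19)."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, 0.0, -s],
                     [0.0, 1.0, 0.0],
                     [s, 0.0, c]])

def triangulate(d1_ccs, d2_ccs, beta1, beta2, D):
    """Midpoint of the common perpendicular of rays L1 (through the origin,
    direction d1) and L2 (through (D, 0, 0), direction d2), cf. (20)-(24)."""
    d1 = rotate_y(beta1) @ np.asarray(d1_ccs, float)   # (17)
    d2 = rotate_y(beta2) @ np.asarray(d2_ccs, float)   # (19)
    w = np.cross(d1, d2)                               # common perpendicular d
    lam1 = D * np.cross(d2, w)[0] / np.dot(np.cross(d2, w), d1)    # (21): "[0]" picks the x-component, i.e. the dot product with e1
    lam2 = -D * np.cross(d1, w)[0] / np.dot(np.cross(d1, w), d2)   # (23)
    q1 = lam1 * d1                                     # point on L1, cf. (20)
    q2 = np.array([D, 0.0, 0.0]) + lam2 * d2           # point on L2, cf. (22)
    return 0.5 * (q1 + q2)                             # midpoint Pm, cf. (24)
```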

C. Finding Baseline D

To compute the baseline D, we make use of a fact about triangulation in binocular computer vision: the 3-D data can be determined up to a scale without knowing the value of the baseline D [26]. This fact can also be seen from (24), where the baseline D is a scaling factor of the computed 3-D data.

Specifically, within the omni-images taken of the user standing in front of the two cameras as mentioned previously, we extract two points, on the head and the feet of the user, respectively. Let Phead and Pfoot denote their real 3-D data, respectively. On the other hand, as stated previously, we can compute the 3-D data of the two points up to a scale, which we denote as P′head and P′foot, respectively, using (24) with the term D in it ignored. Then, the relations between the data Phead, Pfoot, P′head, and P′foot can be expressed as

\[ P_{head} = D \cdot P'_{head} \qquad \text{and} \qquad P_{foot} = D \cdot P'_{foot} \]

where D is the actual baseline value. Let H′ be the Euclidean distance between P′head and P′foot, and let H be the real distance between Phead and Pfoot, which is just the known height of the user. Then, the baseline D can finally be computed as D = H/H′.
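Continuing the triangulation sketch above, the baseline follows from the user’s known height once the head and foot points have been reconstructed up to scale (e.g., with the hypothetical triangulate function called with D = 1); the helper below only carries out the final ratio of Section VI-C.

```python
import numpy as np

def baseline_from_height(p_head_scaled, p_foot_scaled, user_height):
    """Baseline D from the user's known height H (Section VI-C): the head and
    foot points are reconstructed up to scale by setting D = 1 in (24), their
    distance is H', and D = H / H'."""
    h_scaled = np.linalg.norm(np.asarray(p_head_scaled, float)
                              - np.asarray(p_foot_scaled, float))
    return user_height / h_scaled
```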

After finding the baseline D, the system parameters are now all adapted. To sum up, the three steps of the proposed adaptation method are described as follows. First, the included angle φ between the two optical axes is determined using space line features, as discussed in Section V. Then, by asking the user to stand at the middle point in front of the two omni-cameras, the orientation angles β1 and β2 of the two cameras are calculated, as described in Section VI-A. Finally, the baseline D is calculated using the height H of the user, as described in this section. An overview of the proposed adaptation method is also described in Algorithm 1.

VII. Experimental Results and Discussions

In this section, we first describe how we calibrate the omni-cameras to obtain their intrinsic parameters in Section VII-A. Then, we present several experimental results to show the feasibility, reliability, and accuracy of the proposed line detection method, the system setup adaptation method, and the 3-D computation process in Sections VII-B to VII-D.

A. Omni-Camera Calibration

In the first step, the lens center and the focal length of the perspective camera should be calibrated. As illustrated in Fig. 10(a) and (b), the mirror boundary, appearing as a circle in each captured omni-image, was extracted to robustly estimate the camera center and the focal length according to [23]. Specifically, we found a circle to fit the circular mirror boundary like that appearing in Fig. 10(b), and defined the camera center as the center of the fitting circle. Also, as shown in Fig. 10(a), we derived the camera’s focal length f, according to the properties of similar triangles and the rotational invariance of the omni-camera [27], [28], as

\[ f = \frac{Mr}{R} \tag{25} \]

where M is the distance from the lens center to the camera center Om, R is the radius of the mirror base in the real-world space, and r is the radius of the mirror base in the taken image. The measured values in our experiments are R = 4.0 cm, M = 8.6 cm, and r = 243 pixels for both cameras, from which the focal length f was derived to be 522.45 (in pixels) according to (25).

Next, we solve for ε from (1) to get

\[ \varepsilon = \frac{\sec\beta + \sec\alpha}{\tan\beta - \tan\alpha}. \]

Combining the above equality with (1) and (2), we can get

\[ \varepsilon = \frac{\sqrt{1 + \dfrac{f^2}{u^2 + v^2}} + \sqrt{1 + \dfrac{Z^2}{X^2 + Y^2}}}{\dfrac{f}{\sqrt{u^2 + v^2}} - \dfrac{Z}{\sqrt{X^2 + Y^2}}}. \tag{26} \]

The above equation shows that, if we have a landmark point with known image coordinates (u, v) and known camera coordinates (X, Y, Z), then the eccentricity ε can be calculated. Although the eccentricity ε is theoretically a constant value, we found in this paper that a better accuracy in 3-D data computation can be achieved if a linear polynomial is used to describe ε. The reason is that such a polynomial can be used to cope with some types of errors, including the radial distortion of the perspective camera’s lens, the imprecise measurements coming from the calibration process, and the manufacturing imprecision of the hyperboloidal mirror shape. Accordingly, we propose the following first-order equation to describe the eccentricity ε, which comes from a functional expansion of ε with respect to the mirror’s radius r according to the rotational invariance property as used in several studies [27], [28]:

\[ \varepsilon = g \cdot r + h \tag{27} \]

where g and h are two coefficients, and r is as defined in (2).

Fig. 10. Illustration of omni-camera calibration. (a) Relationship between mirror and image plane. (b) Omni-image of a calibration board.

In our experiments, a calibration board as shown in Fig. 10(b) was designed and put in front of the omni-camera. Each cross point Pi on the board was taken as a landmark point, as stated in Algorithm 1, and used to calculate the eccentricity εi by (26). After the values εi corresponding to all the landmark points were derived according to (26), the coefficients g and h in (27) were finally computed using a Levenberg–Marquardt algorithm [29] to be −0.0022 and 1.9211, respectively.
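The eccentricity calibration of (26)–(27) can be sketched as follows (illustrative Python, not the authors’ code). Since (27) is linear in its coefficients, an ordinary least-squares fit such as numpy.polyfit reaches the same minimizer as the Levenberg–Marquardt iteration used by the authors, so the sketch uses it for brevity.

```python
import numpy as np

def eccentricity_sample(u, v, X, Y, Z, f):
    """Eccentricity estimate from one landmark with image coordinates (u, v)
    and camera coordinates (X, Y, Z), following (26)."""
    r = np.hypot(u, v)
    R = np.hypot(X, Y)
    return (np.sqrt(1 + f**2 / r**2) + np.sqrt(1 + Z**2 / R**2)) / (f / r - Z / R)

def fit_eccentricity_model(uv_list, xyz_list, f):
    """Fit the linear model eps = g*r + h of (27) to all landmark samples
    (least squares; equivalent to Levenberg-Marquardt for a linear model)."""
    r = np.array([np.hypot(u, v) for (u, v) in uv_list])
    eps = np.array([eccentricity_sample(u, v, X, Y, Z, f)
                    for (u, v), (X, Y, Z) in zip(uv_list, xyz_list)])
    g, h = np.polyfit(r, eps, 1)
    return g, h
```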

To demonstrate the effectiveness of the first-order approximation method, we conducted two experiments as follows. In these experiments, we measured the 3-D data of the 60 landmarks on a calibration board and computed the 3-D measurement errors. The average 3-D measurement error is 6.3% with a standard deviation of 1.4% when using a constant eccentricity, which is reduced to an average error of 1.9% with a standard deviation of 0.71% when using the first-order approximation. This shows the effectiveness of the first-order approximation method for computing the eccentricity ε. It is noted here that the first-order coefficient g is supposed to be small, since ε should be a constant in theory. Otherwise, it indicates one of three possibilities: 1) the measurements in the calibration are not accurate enough; 2) the lens of the perspective camera is heavily distorted; or 3) the mirror is not of a good hyperboloidal shape.

B. Space Line Detection Ability

In Sections IV-A, IV-D, and IV-B, three improvement techniques for increasing the detection ability and reliability of the proposed Hough-based space line detection method have been proposed, which are henceforth called parameterization, peak cell extraction, and accumulation, respectively. Some comparisons are provided here to show the effectiveness of the proposed improvement techniques. For parameterization, we compare the effect of our technique with that proposed in [16]. For peak cell extraction, we compare our technique using the proposed filter with a traditional method. For accumulation, we compare the adaptive thresholding technique that we propose with a traditional accumulation method [31], [32]. Accordingly, four different space line detection experiments have been designed, which are listed in Table I.

The input omni-image of the four experiments is shown in Fig. 11(a). In each experiment, we first found the edges in the omni-image to get those shown in Fig. 11(b). Then, we applied the Hough-based space line detection method to find 50 space lines. Finally, we drew the detected lines on the omni-image. The results of the four experiments are shown in Fig. 11(c)–(f), respectively.

As shown in Fig. 11(c), since the parameterization proposed in [16] has a singularity when n = 0, only space lines near the periphery region can be detected reliably. In contrast, when using the proposed parameterization technique, space lines in the center region can be detected, as shown in Fig. 11(d), but they are quite crowded. After using the proposed peak cell extraction technique, the detected lines are more separated, as shown in Fig. 11(e). Finally, after the proposed adaptive thresholding technique was applied in the last experiment, the detection result was improved further, yielding lines with more diversified directions, as shown in Fig. 11(f).

Fig. 11. Space line detection results of the four different experiments. (a) Input omni-image. (b) Edge detection results. (c)–(f) Fifty space lines detected in Experiments 1–4, respectively. Experiment 4, based on the use of all the proposed improvement methods, shows the best result.

To summarize, the proposed techniques have at least three advantages over the traditional ones. First, the proposed parameterization technique has no singularity problem, and the range of the Hough space is fixed within [−1, 1]. In contrast, the method proposed in [16] has a singularity when n = 0, and the range of its parameters goes from negative infinity to positive infinity. Second, space lines can be extracted more effectively by the proposed peak cell extraction technique. Third, the projection curve corresponding to the Hough cells in a cell support has equal width everywhere, which further improves the detection result.

C. Adaptation Ability

Some experimental results are given here to show the adaptation ability under different cameras and environments. Two types of cameras were used, which are perspective cameras and catadioptric omni-cameras, and three different environments were considered, which are a corridor, a hall, and a room, as shown in Fig. 12(a)–(c).

TABLE I
Four Different Space Line Detection Experiments

        Parameterization    Peak Cell Extraction    Accumulation
Exp. 1  Proposed in [16]    Traditional             Traditional
Exp. 2  Proposed            Traditional             Traditional
Exp. 3  Proposed            Proposed                Traditional
Exp. 4  Proposed            Proposed                Proposed

Fig. 12. Experimental results under different cameras and environments. (a) Corridor. (b) Hall. (c) Room. (d) Adaptation results of angle φ.

Four different experiments were conducted: Experiment 1 in the corridor with omni-cameras, Experiment 2 in the hall with omni-cameras, Experiment 3 in the room with omni-cameras, and Experiment 4 also in the room but with perspective cameras. In each experiment, the two cameras were oriented at different angles (i.e., −30°, −15°, 0°, 15°, and 30°). Fifty space line features were first extracted as proposed in Section IV. Then, the angle φ was automatically calculated using these lines as proposed in Section V. The results are shown in Fig. 12(d), in which the x-axis specifies the ground truth of the angle φ, and the y-axis specifies the absolute error of the calculated angle φ.

In Experiments 1 and 2, since the lines in the corridor and the hall are relatively simple and obvious, the adaptation result is accurate, with errors of about 2°, as shown by the green and purple curves in Fig. 12(d). Also, since omni-cameras are used in these experiments, the lines can still be captured even when the two cameras are oriented with a large angle between them. Thus, the adaptation result remains accurate when the angle φ is large. In Experiment 3, since the space lines in the room are more complicated, the adaptation becomes more difficult. However, since the omni-cameras capture a large field of view of the environment, plenty of space lines can be captured. Therefore, the adaptation result is also accurate, with errors of about 4°, as shown by the red curve in Fig. 12(d). In contrast, the adaptation errors are about 10° when perspective cameras are used, as shown by the blue curve in Fig. 12(d), and they become unacceptable (larger than 20°) when the included angle φ is large. These experimental results show the feasibility of the proposed adaptation methods and the power of the omni-cameras in the automatic adaptation process.

D. Adaptation and 3-D Acquisition Ability

A series of experiments was conducted to test the adaptation ability and the 3-D acquisition precision in the room environment shown in Fig. 12(c). In each of the experiments, the two cameras were placed at a distance of about 180 cm from each other, and both were oriented randomly within the range of ±40°. After the cameras were set up, two omni-images of the environment were captured, as shown, for example, in Fig. 13(a) and (d), respectively, and used to calculate the included angle φ according to Step 5 of Algorithm 1. Next, a user was asked to stand in the middle region in front of the two cameras, as shown in Fig. 13(b) and (e), to calculate the orientation angles β1 and β2 and the baseline D according to Step 6 of Algorithm 1. After these adaptation tasks were done, a board with 60 landmarks was held by the user, as shown in Fig. 13(c) and (f), to test the precision of the resulting 3-D computation.

In these experiments, three different degrees of adaptation were implemented and the corresponding results were compared: 1) no adaptation was conducted, with the camera orientations and baseline set to β1 = β2 = 0° and D = 180 cm (D being the ground-truth value); 2) the left omni-camera was set up to face forward with the values β1 = 0° and D = 180 cm, and β2 adapted to be −φ; and 3) all the parameters β1, β2, and D were adapted according to the proposed method. Denoting (Xi, Yi, Zi) as the ground-truth location of a landmark point and (X′i, Y′i, Z′i) as the calculated location, we define the 3-D error E of each landmark point as

\[ E = \frac{\sqrt{(X'_i - X_i)^2 + (Y'_i - Y_i)^2 + (Z'_i - Z_i)^2}}{\sqrt{X_i^2 + Y_i^2 + Z_i^2}}. \tag{28} \]

The comparison results are shown in Fig. 14, in which the vertical axis specifies the average of the 3-D errors, and the horizontal axis specifies the system orientation angle, defined as the maximum of the two orientation angles β1 and β2.
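For reference, the error measure (28) is simply the ratio of the Euclidean estimation error to the ground-truth range, e.g. (illustrative helper):

```python
import numpy as np

def relative_3d_error(p_true, p_est):
    """Relative 3-D error E of (28): distance between the estimated and
    ground-truth landmark positions, normalized by the ground-truth range."""
    p_true, p_est = np.asarray(p_true, float), np.asarray(p_est, float)
    return np.linalg.norm(p_est - p_true) / np.linalg.norm(p_true)
```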

Fig. 13. Sample omni-images of an experiment. (a), (d) Shots of the environment taken to calculate φ. (b), (e) User standing in the middle region in front of the cameras to calculate the baseline D and the orientation angles β1 and β2. (c), (f) Board held by the user to test the 3-D computation precision.

As can be seen from Fig. 14(a) and (b), when no parameter is adapted (results shown by the blue curve), the 3-D errors become larger as the orientation angle becomes larger, showing the necessity of an automatic system adaptation process. When only the orientation β2 of the right omni-camera is adapted (results shown by the red curve), the 3-D errors are sometimes lower but vary widely. This results from the fact that the left omni-camera is assumed to face forward in this case; thus, if the left omni-camera is actually placed to face forward in the experiment, the error is low, but otherwise it is large, as expected. Finally, when all the parameters β1, β2, and D are adapted (results shown by the purple curve), the 3-D errors are lower than 8% even when the system orientation angle is large. This shows the feasibility, reliability, and validity of the proposed system adaptation method.
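The curves in Fig. 14(a) correspond to the per-landmark errors of (28) averaged per system orientation angle. A small sketch of such a grouping is given below; it is our own illustration, and the record layout (β1, β2, error) is an assumption.

```python
# Sketch of grouping per-landmark errors by the system orientation angle,
# defined in the paper as the maximum of the two orientation angles.
# The record layout (beta1, beta2, error) is our own assumption.
import statistics
from collections import defaultdict

def average_error_by_orientation(records):
    buckets = defaultdict(list)
    for beta1, beta2, error in records:
        buckets[max(beta1, beta2)].append(error)
    return {angle: statistics.mean(errors) for angle, errors in sorted(buckets.items())}

# Example with three dummy trials (for illustration only).
trials = [(10, 25, 0.031), (10, 25, 0.028), (5, 40, 0.062)]
print(average_error_by_orientation(trials))  # e.g., {25: 0.0295, 40: 0.062} up to float rounding
```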

It is noted that these 3-D measurements were calculated under a system setup with certain unintended inaccuracies. For example, the two omni-camera stands are required to be adjusted to an identical height, but a small difference, say 1 cm, might still exist between the heights of the two stands. Similarly, although the optical axes are assumed to be parallel to the xz-plane, a small angle, say 1°, might be included between the optical axes and the xz-plane. To see the effect of such unintended setup inaccuracies, a plot of the average 3-D errors resulting from a series of planned inaccurate setups is drawn in Fig. 14(c). As can be seen, at the reasonable setup errors of 1 cm in height and 2° in included angle, the average 3-D error is 2.805%, which is tolerable in real-time game playing according to our experimental experience.
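To make this kind of sensitivity check concrete, the following simplified simulation is our own sketch and uses an idealized ray-intersection model rather than the paper's omni-camera geometry and analytic formulas: the right camera secretly has a 1 cm height offset and a 1° tilt about the x-axis, but triangulation assumes the ideal setup. All numerical values and function names are assumptions.

```python
# Simplified sensitivity sketch (idealized ray model, not the paper's
# omni-camera formulation). The right camera is actually 1 cm higher and
# tilted 1 deg, but the triangulation assumes the ideal setup.
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + t*d1 and c2 + s*d2."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    return ((c1 + t * d1) + (c2 + s * d2)) / 2.0

D = 180.0                                   # assumed baseline in cm
p_true = np.array([60.0, 30.0, 250.0])      # an assumed point in front of the cameras

# Actual (inaccurate) right-camera pose: 1 cm height offset, 1 deg tilt about x.
dy, tilt = 1.0, np.deg2rad(1.0)
R = np.array([[1, 0, 0],
              [0, np.cos(tilt), -np.sin(tilt)],
              [0, np.sin(tilt),  np.cos(tilt)]])
c2_actual = np.array([D, dy, 0.0])

# Measured viewing directions: the left camera is ideal; the right camera
# measures the direction in its own (tilted) frame.
d1 = p_true - np.zeros(3)
d2_measured = R.T @ (p_true - c2_actual)

# Triangulation that wrongly assumes the ideal right-camera pose (D, 0, 0).
p_est = triangulate_midpoint(np.zeros(3), d1, np.array([D, 0.0, 0.0]), d2_measured)
E = np.linalg.norm(p_est - p_true) / np.linalg.norm(p_true)
print(f"relative 3-D error under this setup inaccuracy: {100 * E:.2f}%")
```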

Using the proposed vision system, we have also created a game application in our experiments, which allows a user to play a 3-D maze game, as illustrated in Fig. 15. The game is played mainly with a finger wearing a yellow cot as a cursor, which controls the avatar going around, up, and down in the maze to reach the destination. The 3-D position of the simulated cursor is computed by analyzing the omni-image pair to detect the feature point of the finger cot and by calculating its 3-D position with the proposed method. Fig. 15 shows the game-playing environment and three views of the 3-D maze from different directions at a certain moment. When playing the game, the avatar moves toward the correct direction and responds as quickly as the player's finger moves. This real-time effect comes mainly from performing all the 3-D computations with the analytic formulas derived previously. It is noted in game playing that, if the player stands too far from the cameras, it becomes too hard to detect the feature point on his or her finger, which degrades the 3-D calculations.


Fig. 14. Experimental results of three different degrees of adaptation. (a) 3-D errors. (b) Standard deviations of the 3-D errors. The proposed adaptation method yields the best results, as shown by the purple curves. (c) 3-D errors resulting from the unintended inaccurate setups.

Fig. 15. Use of proposed vision system for a 3-D maze game. (a) Player of the game wearing a yellow finger cot with the two omni-cameras. (b)–(d) Three different views of the 3-D maze game at a certain moment.

Also, since there is a blind circle in the omni-images, the 3-D tracking process will fail momentarily when the finger cot falls into that region. Otherwise, in normal cases, the avatar can easily be controlled by the player, which shows the feasibility of the proposed system for game playing and other similar applications.
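As an illustration of the cursor-detection step, the yellow finger cot can be segmented by simple color thresholding before its 3-D position is computed. The sketch below uses OpenCV (4.x assumed) and is our own illustration: the HSV threshold values and the centroid-based feature point are assumptions, and the subsequent 3-D computation (omitted here) would use the analytic formulas derived in the paper.

```python
# Minimal sketch of detecting the yellow finger-cot feature point in one
# omni-image with OpenCV 4.x. The HSV range and centroid choice are assumed;
# matched feature points from the two omni-images would then be fed to the
# paper's analytic 3-D computation.
import cv2
import numpy as np

def detect_yellow_cot(image_bgr):
    """Returns the (u, v) image coordinates of the largest yellow blob, or None."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))   # rough yellow range
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    blob = max(contours, key=cv2.contourArea)
    m = cv2.moments(blob)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])          # blob centroid
```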

VIII. Conclusion

A new two-omni-camera stereo vision system for general 3-D vision applications with a capability of automatic adaptation to any camera setup was proposed. The adaptation process yields the values of the two omni-cameras' orientations and distance (baseline), from which the 3-D data of space feature points can be computed precisely. The experimental results showed the feasibility of the proposed system. In contrast, the cameras' orientations and distance of a conventional binocular vision system are all fixed because of its nonadjustable configuration.

The proposed vision system has several advantages over conventional systems. First, the user can interact with the system within a wide area because the proposed system uses two omni-cameras, instead of traditional projective cameras, to capture omni-images that cover large fields of view. This is a desired property for many applications. For example, it can be used in exhibitions to interact with people in a large area, in 3-D indoor surveillance of large public spaces, or in future virtual sporting environments where people walk or run in a wide area. Commercial products also try to solve the small field-of-view problem of conventional cameras; for example, the Microsoft Kinect uses a motorized tilt mechanism to track the user's activities [5]. In contrast, the proposed system does not suffer from this problem. Second, the proposed system can be set up flexibly, and so is appropriate for more real applications and more convenient for nontechnical users. Third, the proposed system yields better precision in computed 3-D data than traditional short-baseline stereo systems. This comes from the merit of the structure of the proposed system: the two omni-cameras are affixed to two independent camera stands that may be placed farther away from each other. It is noted that the proposed system is less applicable in environments with natural scenes as backgrounds, where horizontal parallel lines are fewer for use by the system.

Future studies may be directed to applying the proposed system to more human–machine interaction activities.

References

[1] Z. Y. Zhou, A. D. Cheok, Y. Qiu, and X. Yang, “The role of 3-D sound in human reaction and performance in augmented reality environments,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 37, no. 2, pp. 262–272, Mar. 2007.

[2] B. J. Tippetts, D. J. Lee, J. K. Archibald, and K. D. Lillywhite, “Dense disparity real-time stereo vision algorithm for resource-limited systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 10, pp. 1547–1555, Oct. 2011.

[3] Y. Sun, X. Chen, M. Rosato, and L. Yin, “Tracking vertex flow and model adaptation for three-dimensional spatiotemporal face analysis,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 40, no. 3, pp. 461–474, May 2010.

[4] K. Li, Q. Dai, W. Xu, J. Yang, and J. Jiang, “Three-dimensional motion estimation via matrix completion,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 539–551, Apr. 2012.

[5] Wikipedia Contributors. “Kinect,” Wikipedia, The Free Encyclopedia [Online]. Available: http://en.wikipedia.org/wiki/Kinect

[6] S. Laakso and M. Laakso, “Design of a body-driven multiplayer game system,” Comput. Entertainment, vol. 4, no. 4, article 4C, Oct.–Dec. 2006.
