5.3.2 Comparison of results of proposed method and Tsai’s method

Referring to Fig. 5.4 again, we used the calibration patterns of Figs. 5.4(a) through (c) to extract landmark points, which are marked with “+” in Fig. 5.4(f). These landmark points were used to conduct calibration both by the proposed method and by Tsai’s method as described previously. After the calibration process, the dot centers of the dot-matrix image were used to perform the coordinate transformation to get the coordinates of the dot centers in the display screen. The results of both methods, obtained by using a 5×5 dot-matrix pattern for calibration, are shown in Figs. 5.13 and 5.14 for comparison. Fig. 5.13 shows the results using the set of extracted image points identical to those used for calibration. Fig. 5.14 shows the results using a set of image points extracted from a new dot-matrix pattern image. In the figures, the listed data r:<xxx, yyy> specify the real coordinates of the dot centers in the dot-matrix pattern, c:<xxx, yyy> specify the coordinate transformation results using the proposed method, and t:<xxx, yyy> specify the coordinate transformation results using Tsai’s method.

Observing Figs. 5.13 and 5.14, we can see that the coordinate transformation errors of Tsai’s method are larger than those of the proposed method, especially near the image border. In Fig. 5.13, we see that no coordinate transformation error was created by the proposed method. This is owing to the fact that the input image points used in the transformation are the same as those used in calibration. In Fig. 5.14, we can see some small coordinate transformation errors created by the proposed method. They were caused

positions. But these errors are still smaller than those yielded by Tsai’s method.

5.4 Summary

A robust and accurate method for calibration and coordinate transformations from image coordinates to display screen positions has been proposed. By using three sequentially displayed calibration patterns, including a white-rectangle shape, a black-rectangle shape, and a dot matrix, relevant landmark points can be extracted accurately and robustly for calibration. Deformable pattern matching is used for creating more accurate geometric models for the display screen shapes in acquired images. The

Fig. 5.13 Comparison using calibration image points. The precise landmark position, the computed position by proposed method, and that by Tsai’s method are listed under each dot.

the results with a conventional calibration method, we showed that the proposed method has the merits of more accuracy, robustness, and simplicity.

Fig. 5.14 Comparison using re-captured image points. The precise landmark position, the computed position by proposed method, and that by Tsai’s method are listed under each dot.

Chapter 6 A camera mouse for computer cursor control

6.1 Introduction

It is common practice to control the cursor of a computer with a mouse on a flat pad at a close distance to the computer monitor. However, when playing shooting games on computers or making presentations on large screens, the user must keep a certain distance from the monitor. Also, the user tends to stand up in such cases, especially when making presentations. It is then inconvenient to hold a mouse to control the cursor, and a hand-held device that can be operated in the air is desirable for these applications.

From the technical point of view, we may adopt three ways to design such a kind of device: (1) using a conventional wireless remote controller with capabilities of paging and cursor movement control; (2) using a controller with a capability of detecting the device movement, which may be achieved by the use of inertial sensing devices like gyroscopes, accelerometers, etc. [46][47][48]; and (3) using a visual device like a video camera which has a self-locating capability provided by computer vision techniques [49][50].

Some drawbacks can be found in the first and second approaches. The main drawback inherent in the first approach is that the operator uses his/her fingers to push buttons. This results in slow responses, and is thus inappropriate for fast-speed game control. The main drawback of the second approach is that inertial sensing techniques are not mature yet and the costs of the devices are high.

In contrast, digital cameras and related devices with CCD sensors are becoming cheaper and more popular. Computer vision methods based on the use of such devices can also be implemented more stably and reliably nowadays. Therefore, in this study we adopt the third approach and use a web camera as the hand-held device for cursor control.

Some related studies can be found in [49][50][51][52]. Nesi and Bimbo [49] proposed a kind of vision-based 3D mouse which used stereo vision for hand tracking and gesture recognition in the 3D space. The mouse position was represented by the 3D position of the hand that was estimated by computing the center of gravity at each time instant. Two independent hand postures were defined in order to emulate the buttons on traditional 2D mouse for switching off the transmission of hand movements. Dementhon and Davis [50] developed another kind of vision-based 3D mouse which was based on an algorithm of 2D-to-3D correspondences. The mouse was a small object held in one hand of the user, on which there were four non-coplanar infrared sources. One camera was fixed next to the computer monitor and adjusted to face the user. The centroids of the four spots were computed by the micro-controller integrated with the camera, and then transmitted to the computer to calculate the pose of the 3D mouse sixty times per second.

Yang and Tsai [51] proposed an inside-out vision-based 3D mouse, which is a camera held by hand to view a square mark in front of the mouse. The orientation and the position of the mouse are computed via monocular images of the mark. Resolution adjustments and some speed constraints were proposed to reduce computation errors. Simulations and real image sequences were both conducted. Li, Hsu and Pung [52] proposed a 3D mouse which uses a mirror and a single camera to restore the 3D position of a finger tip. The camera is positioned in such a way that it captures both the hand as well as its mirror image. The captured images are then processed to extract the contour of the hand for locating the position of the finger tip. A prototype system was implemented and the performance of the 3D mouse before different backgrounds was analyzed.

All the above-mentioned methods are vision-based 3D mice which compute 3D information of the mouse position. For many applications, 3D information is not necessary, and a “vision-based 2D mouse” is sufficient. In this study, we concentrate on the design of such a kind of mouse. More specifically, we hold a web camera in hand as the mouse and let it look at a computer monitor screen. After taking an image of the display screen, we use image processing techniques to detect and track appropriate features of certain artificial landmarks attached to the monitor, as well as the monitor screen corners, thus achieving the function of locating the cursor which is controlled by the in-air movement of the hand-held camera. The experimental results show the feasibility of the proposed method.

Some merits of the proposed method are: (1) the method requires no complicated camera calibration; (2) the method is reliable because of the combined use of display feature detection and tracking; (3) the method is robust against loss of one or two landmark points in the tracking, which allows the user to have high freedom and space for hand movement; (4) the method tolerates unintended device rotation by using an affine transformation to correct the rotation error.

In the remainder of this chapter, we describe in detail the proposed method in Section 6.2. In Section 6.3, we show some experimental results. Finally, we make a summary of this chapter in Section 6.4.

6.2 Proposed Method

A system setup for the proposed method is illustrated in Fig. 6.1. In the sequel, we call the web camera we use as the mouse a camera mouse. The camera mouse is a video camera. The video taken by the mouse is transmitted through a USB or wireless connection to the computer for image processing and cursor position determination. It is desired to allow the mouse to have a larger movement area in the air, widening the application domain of the mouse. For this purpose, we make the following two assumptions.

(1) The camera mouse is operated at a sufficient distance from the computer monitor screen so that the area of the field of view (FOV) of the camera is at least four times the area of the screen (i.e., both the width and the height of the FOV are at least two times those of the screen, respectively).

(2) The four corners of the outer frame of the computer monitor are attached respectively with four landmarks, each being a black circle at the center of a white rectangular shape.

Fig. 6.1 Illustration of proposed camera mouse system. (Labels: driver side, device side.)

The reason why we attach landmarks for corner detection is that the colors of the image displayed on the computer monitor will sometimes be identical to that of the outer frame of the monitor, so that detection of the monitor corners by image processing becomes difficult. In contrast, with the black circle against the white background on the landmark, detection of the landmarks becomes easier. With the landmarks being detected, we can then locate the corners of the monitor screen easily because the relative location of each corner with respect to the corresponding landmark center is fixed. Fig. 6.2 shows an example of images of a computer monitor in which an image of a boy is displayed.

In the proposed method, the camera mouse system initially, in the detection stage, detects the four landmarks and utilizes them to locate the four corners of the computer monitor screen. Subsequently, the system initializes a tracking stage, in which it tries to track the four landmarks in each image frame in the video sequence taken with the camera mouse. Sometimes the hand movement might be too large such that one or two of the four corners disappear in the taken image sequence. The proposed system will, in such cases, use the position information of the existing corners in the current image frame as well as that of the corners in the last frame to predict the positions of the missing corners. The prediction is based on the use of the movement vectors of the corners. This type of prediction is conducted until the missing corners appear again in the FOV of the camera mouse.

Fig. 6.2 An example of computer monitor images.

With the four corners of the monitor screen in an image being detected, we can find next the relative location of the image center with respect to a corner of the monitor screen, which supposedly is the desired computer cursor position pointed to by the camera mouse.

More details of the proposed method are described in the following.

6.2.1 Locating landmarks and computer monitor screen border in the detection stage

We locate the landmarks and the border of the computer monitor screen in the following way.

1. Take an image frame from the video taken by the camera mouse, and threshold it into a binary image I with a pre-determined threshold value.

2. Find black regions in I by a connected component labeling algorithm [24], aiming to detect the black circular regions in the landmarks.

3. Filter out non-circular regions by checking the appropriateness of the area and the width-to-height ratio of each black region.

4. In an exhaustive manner, check in the following way the respective centers C₁, C₂, C₃, C₄ of every four remaining circular regions CR₁, CR₂, CR₃, CR₄ to see if the quadrilateral shape S formed by C₁ through C₄ meets the condition of being similar to the rectangular shape of the computer monitor screen:

a. check if the opposite sides of S are roughly equal in length, i.e., check if |C₁ − C₂| ≈ |C₃ − C₄| and if |C₁ − C₄| ≈ |C₂ − C₃|, where the notation | ⋅ | means the distance between two points;

b. check if the four corners formed by C₁ through C₄ are all of roughly right angles;

c. check if the width-to-height ratio R of S is roughly equal to that of the computer monitor, which is measured manually in advance, where R is computed as the ratio of (|C₁ − C₂| + |C₃ − C₄|)/2 over (|C₁ − C₄| + |C₂ − C₃|)/2.

Fig. 6.3 shows an illustration of checking the above three conditions.

5. If there exists at least one group of four black regions which meets the above conditions, then collect all such groups into a candidate set D_L of landmark regions and continue; otherwise, go to Step 1 to process the next image frame.

6. Find the real landmark circles in D_L in the following way according to a concept of edge strength of the shape formed by the black region centers. Refer to Fig. 6.4 for an illustration of the following steps.

a. For each group G_j of four black regions in D_L with region centers C₁ through C₄, take a side of the corresponding quadrilateral shape S_j, say C_iC_{i+1}, and find the corresponding parallel side C_i'C_{i+1}' of the computer monitor screen border by applying the Hough transform within a certain rectangular search window W to the left or right of line C_iC_{i+1} or below or above it.

b. Take the corresponding peak value in the Hough space as the edge strength E_i of the detected line C_i'C_{i+1}'.

c. Perform the last two steps for all the four sides of S_j and sum up the four edge strengths E₁ through E₄ to obtain a total energy strength E_j for G_j.

d. Find the maximum total energy strength E_m in D_L, and take the corresponding group S_m of four black regions as the desired set of landmark circles.
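The three geometric checks of Step 4 can be sketched as follows. This is only a minimal illustration, not the implementation used in this study: the function names and the tolerance values len_tol, angle_tol, and ratio_tol are assumptions introduced here, since the text only requires the conditions to hold "roughly." The centers are assumed to be given in the order C₁ (upper left), C₂ (upper right), C₃ (lower right), C₄ (lower left).

```python
import math

def dist(p, q):
    """Euclidean distance |p - q| between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def corner_angle(prev_pt, pt, next_pt):
    """Interior angle (degrees) at pt between the sides to prev_pt and next_pt."""
    v1 = (prev_pt[0] - pt[0], prev_pt[1] - pt[1])
    v2 = (next_pt[0] - pt[0], next_pt[1] - pt[1])
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def is_screen_like(c1, c2, c3, c4, monitor_ratio,
                   len_tol=0.15, angle_tol=15.0, ratio_tol=0.15):
    """Return True if quadrilateral c1..c4 passes checks (a), (b), and (c).

    The tolerances are hypothetical values chosen for illustration.
    """
    top, bottom = dist(c1, c2), dist(c3, c4)
    left, right = dist(c1, c4), dist(c2, c3)
    if min(top, bottom, left, right) == 0:
        return False  # degenerate shape: two centers coincide
    # (a) opposite sides roughly equal in length
    if abs(top - bottom) > len_tol * max(top, bottom):
        return False
    if abs(left - right) > len_tol * max(left, right):
        return False
    # (b) all four corners roughly right angles
    pts = [c1, c2, c3, c4]
    for i in range(4):
        angle = corner_angle(pts[i - 1], pts[i], pts[(i + 1) % 4])
        if abs(angle - 90.0) > angle_tol:
            return False
    # (c) width-to-height ratio R close to that of the monitor
    r = ((top + bottom) / 2) / ((left + right) / 2)
    return abs(r - monitor_ratio) <= ratio_tol * monitor_ratio
```

For example, with a 16:9 monitor, the centers (0, 0), (160, 0), (160, 90), (0, 90) pass all three checks, whereas a square of centers fails check (c).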

6.2.2 Computing cursor location in the tracking stage

In the above process of detecting landmark circles in an image frame of the taken video of the computer monitor, we also obtain the border of the computer monitor screen, i.e., the four sides C₁'C₂', C₂'C₃', C₃'C₄', C₄'C₁' of the screen. We may now use such border information to compute the location of the cursor in the computer monitor screen. Note that the position of the cursor is just the position of the image center with respect to the upper-left corner of the computer monitor screen, which we assume to be C₁' in the sequel.

Fig. 6.3 Illustration of detecting candidate landmark circles.

Fig. 6.4 Detecting of border lines. (Labels: one screen candidate, base line, search window W, detected line.)

Assume that the points C₁' through C₄' have image coordinates (x₁', y₁') through (x₄', y₄'), respectively, and that the image center, denoted as C₀', has image coordinates (x₀', y₀'). Without perspective transformation created by camera movement and rotation, it is not difficult to figure out that the relative cursor position with respect to C₁' may be computed simply to have coordinates (u₀, v₀) with u₀ = k(x₀' − x₁') and v₀ = k(y₀' − y₁') in the monitor screen coordinate system, where k is a scaling factor.

However, perspective transformation does exist, and so the above simple cursor coordinate computation must be modified. For this purpose, we use an affine transform which is defined as a mapping of the four corners C₁' through C₄' of the perspectively-transformed monitor screen shape S' to the four corners C₁'' through C₄'' of a corrected rectangular screen shape S''. Let C₁'' through C₄'' have coordinates (x₁'', y₁'') through (x₄'', y₄'') with C₁'' as the origin. How to define these coordinates will be described later in this section.

On the other hand, an affine transform may be described as

x'' = ax' + by' + c, (6.1-1)

y'' = dx' + ey' + f, (6.1-2)

where (x', y') and (x'', y'') are the coordinates of corresponding corners of S' and S'', respectively, and a through f are six unknown coefficients to be determined. Substituting the coordinate data of three of the four corresponding corner pairs of S' and S'' into the above equations, we can solve the six coefficients a through f. Then, the new coordinates (x₀'', y₀'') of the image center C₀' may be computed accordingly, and so may the new coordinates (x₁'', y₁'') of the upper-left corner C₁' of the monitor screen. In turn, the desired position (u₀, v₀) of the cursor may finally be computed as

u₀ = k(x₀'' − x₁'') = k × [a(x₀' − x₁') + b(y₀' − y₁')]; (6.2-1)

v₀ = k(y₀'' − y₁'') = k × [d(x₀' − x₁') + e(y₀' − y₁')]. (6.2-2)

Because both pairs of coordinates are obtained from Eqs. (6.1-1) and (6.1-2), the constant terms c and f cancel in these differences.

Fig. 6.5 shows an illustration of the above process.
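The computation of this section can be sketched as follows, under stated assumptions. The function names are hypothetical, and the use of Cramer's rule to solve the two 3×3 linear systems is a choice made here for self-containment; the text does not specify how the six coefficients are solved. The cursor position is evaluated as k times the difference between the transformed image center and the transformed upper-left corner.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve_affine(src, dst):
    """Solve x'' = a x' + b y' + c and y'' = d x' + e y' + f.

    src: three corner points (x', y') of S'.
    dst: the three corresponding corner points (x'', y'') of S''.
    Returns the coefficients (a, b, c, d, e, f).
    """
    m = [[x, y, 1.0] for x, y in src]
    d0 = det3(m)
    if abs(d0) < 1e-12:
        raise ValueError("the three source corners are collinear")

    def solve(rhs):
        # Cramer's rule: replace one column at a time by the right-hand side.
        out = []
        for col in range(3):
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            out.append(det3(mc) / d0)
        return out

    a, b, c = solve([p[0] for p in dst])
    d, e, f = solve([p[1] for p in dst])
    return a, b, c, d, e, f

def cursor_position(coeffs, center, corner1, k=1.0):
    """Cursor (u0, v0): transformed image center relative to transformed C1'."""
    a, b, c, d, e, f = coeffs
    x0 = a * center[0] + b * center[1] + c
    y0 = d * center[0] + e * center[1] + f
    x1 = a * corner1[0] + b * corner1[1] + c
    y1 = d * corner1[0] + e * corner1[1] + f
    return k * (x0 - x1), k * (y0 - y1)
```

As a usage example, mapping the corners (10, 20), (110, 20), (110, 80) to (0, 0), (100, 0), (100, 60) is a pure translation, so an image center at (60, 50) yields the cursor position (50, 30) for k = 1.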

6.2.3 Computing corner coordinates of corrected rectangular monitor screen shape

We now describe how we define the coordinates (x₁'', y₁'') through (x₄'', y₄'') of the four corners C₁'' through C₄'', respectively, of the corrected rectangular monitor screen shape S'' mentioned previously. Let the origin C₁'' be given the coordinates (0, 0). Next, we compute the side lengths L and H for use as those of S'' from the corners C₁' through C₄' of the perspectively-transformed monitor screen shape S' in the following way:

L = (|C₁' − C₂'| + |C₃' − C₄'|)/2, (6.3)

H = (|C₁' − C₄'| + |C₂' − C₃'|)/2, (6.4)

where | ⋅ | means the distance between two points. Then, the coordinates of corner C₂'' of S'' are set to be (L, 0), those of C₃'' to be (L, H), and those of C₄'' to be (0, H).

Fig. 6.5 Affine transformation and cursor position computation. (Labels: original screen, affined screen, final cursor position.)
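A minimal sketch of this corner assignment is given below. The function name is hypothetical. It assumes that L averages the two horizontal sides C₁'C₂' and C₃'C₄' and that H averages the two vertical sides C₁'C₄' and C₂'C₃', in the same way as the width-to-height ratio R of Section 6.2.1.

```python
import math

def corrected_corners(c1, c2, c3, c4):
    """Corners C1''..C4'' of the corrected rectangle S'' from corners C1'..C4' of S'."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    length = (dist(c1, c2) + dist(c3, c4)) / 2  # L: mean of the two horizontal sides
    height = (dist(c1, c4) + dist(c2, c3)) / 2  # H: mean of the two vertical sides
    # C1'' is the origin; the others follow in the order of the text.
    return [(0.0, 0.0), (length, 0.0), (length, height), (0.0, height)]
```

The returned corners can serve as the target points of the affine transform, paired with C₁' through C₄' in the same order.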

6.2.4 Dynamic tracking of landmarks for continuous cursor position computation

So far we have described how we find the locations of the landmarks specified by their centers C₁ through C₄, as well as the corners of the monitor screen C₁' through C₄'. Because the relative position between each pair of C_i and C_i' is fixed, we only have to track the landmark positions for continuous cursor position computation in the subsequent cycles, and so save time in the entire process.

Many methods have been proposed for target tracking [53][54][55]. In this study, we adopt the method of deformable template matching [56], which is effective for tracking objects with fixed geometric shapes. More specifically, we use each detected circular landmark center in the previous cycle to generate a tracking window for the current cycle. And within each tracking window, we try to detect exhaustively all possible circular shapes and select the best-matching circular shape as the detected circular landmark for the current cycle. The details of the proposed landmark tracking process are described in the following.

1. Use the four circular landmarks detected in the previous cycle as centers to generate four corresponding tracking windows, each with its side length being five times the diameter of a circular landmark.

2. Find a circular landmark within each tracking window W in an optimal way as described in the following.

a. Assume that the circle to be detected has the parameters (r_i, x_i, y_i), where r_i is the radius, (x_i, y_i) specifies the coordinates of the circle center, i = 1, 2, …, m, and m is a pre-selected number of possible candidate circles within the tracking window W.

b. For the candidate circular shape S_i corresponding to the parameter set (r_i, x_i, y_i), compute a fitting measure F_i as illustrated in Fig. 6.6, where (1) F_i is computed as the sum of the edge values of eight “sampling bars” located evenly around S_i; (2) each sampling bar is composed of n “sampling pixels,” with n/2 of them within the circle (called inner sampling pixels) and the other n/2 outside the circle (called outer sampling pixels), where n is a pre-selected number; and (3) the edge value E_i of a sampling bar is computed as the difference between the sum of the gray values of the inner sampling pixels and the sum of the gray values of the outer sampling pixels.

c. If the largest fitting measure F_max of all of the m candidate shapes is larger than a pre-selected threshold value, then take the corresponding candidate shape S_max to be the detected circular landmark within the tracking window W; otherwise, decide that no landmark exists in W.
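The fitting measure of Step 2 can be sketched as follows. This is an illustrative sketch only, with several assumptions not stated in the text: the image is a list of rows of gray values, the sampling pixels of a bar lie at radii r ± 1, r ± 2, …, r ± n/2 along the bar direction, pixels falling outside the image are skipped, and the edge value is taken as the outer sum minus the inner sum so that a black circle on a white background gives a positive value. The function names are hypothetical.

```python
import math

def fitting_measure(image, r, cx, cy, n=4, bars=8):
    """Sum of the edge values of sampling bars placed evenly around a circle.

    image: list of rows of gray values (0 = black, 255 = white).
    (r, cx, cy): radius and center of the candidate circle.
    n: number of sampling pixels per bar (n/2 inner, n/2 outer).
    """
    h, w = len(image), len(image[0])
    total = 0.0
    for b in range(bars):
        theta = 2 * math.pi * b / bars
        ux, uy = math.cos(theta), math.sin(theta)
        inner = outer = 0.0
        for s in range(1, n // 2 + 1):
            for radius, is_inner in ((r - s, True), (r + s, False)):
                px = int(round(cx + radius * ux))
                py = int(round(cy + radius * uy))
                if not (0 <= px < w and 0 <= py < h):
                    continue  # assumption: out-of-image pixels are ignored
                if is_inner:
                    inner += image[py][px]
                else:
                    outer += image[py][px]
        # Assumed sign convention: bright outside, dark inside is positive.
        total += outer - inner
    return total

def best_circle(image, candidates, threshold, n=4):
    """Return the candidate (r, x, y) with the largest measure, or None."""
    scored = [(fitting_measure(image, r, x, y, n), (r, x, y))
              for r, x, y in candidates]
    if not scored:
        return None
    f_max, s_max = max(scored, key=lambda t: t[0])
    return s_max if f_max > threshold else None
```

In practice, the candidate list would be generated by enumerating radii and centers inside the tracking window W, and the threshold would be tuned to the gray-level contrast of the landmarks.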

Fig. 6.6 Eight sampling bars at a candidate landmark. (Labels: a sampling bar; E = g₁ − g₂.)

6.2.5 Tracking with missing landmarks

Due to possible large movements of the camera mouse, screen corners might be out of the field of view of the camera, causing disappearance of one or two landmarks. In this case, we compute the displacement vector of a found landmark in the current cycle with respect to its version in the last cycle, and utilize the vector to compute a predicted position for each missing landmark, based on the position of the missing landmark in the last cycle. That is, we obtain the position of the missing landmark in the current cycle by adding the vector to the position of the landmark in the last cycle. We then use the predicted positions of the missing landmarks as well as the detected positions of the existing landmarks to compute the cursor position as described previously. Finally, if only one landmark is detected (i.e., if three landmarks are missing), then the detection of the landmarks is regarded as failed, and the cursor position of the last cycle is used as a substitute for the current cycle before the next cycle is started.
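The prediction described above can be sketched as follows. The function name and the data layout are hypothetical. As an assumption made here, the displacement vector is averaged over all landmarks found in the current cycle, whereas the text speaks of the displacement vector of a found landmark; a missing landmark is represented by None.

```python
def predict_landmarks(previous, current):
    """Fill in missing landmark positions for the current cycle.

    previous: four (x, y) landmark positions from the last cycle.
    current: four entries for the current cycle, (x, y) or None if missing.
    Returns four positions, or None if fewer than two landmarks were found
    (the caller then reuses the cursor position of the last cycle).
    """
    found = [i for i, p in enumerate(current) if p is not None]
    if len(found) < 2:
        return None  # detection is regarded as failed
    # Mean displacement of the found landmarks (averaging is an assumption).
    dx = sum(current[i][0] - previous[i][0] for i in found) / len(found)
    dy = sum(current[i][1] - previous[i][1] for i in found) / len(found)
    return [current[i] if current[i] is not None
            else (previous[i][0] + dx, previous[i][1] + dy)
            for i in range(4)]
```

For example, if the two upper landmarks both move by (2, 1) and the two lower ones leave the FOV, the lower landmarks are predicted at their last positions shifted by (2, 1).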

6.3 Experimental Results

In this section, we show the results of two of the experiments we have conducted. In the experiments, we continuously took images of the screens of a notebook PC and a desktop PC with a camera mouse as two videos, simulating the use of the mouse for cursor control. The program implementing the proposed method was written in C++ on a PC with a
