Chapter 4 User Identification by Color Image Analysis Using
4.3 Proposed Algorithm of User Identification
We introduced three techniques for user identification in the previous sections. In this section, we describe how the three techniques are integrated to identify users more reliably.
Algorithm 4.4. Integrated determination of the user identification number.
Input: an omni-image I and the foreground region R of a user.
Output: the user’s identification number n.
Steps
Step 1. Detect the middle points P1, P2, and P3 of the color regions and the approximating line L of each multicolor edge mark Mi in omni-image I by Algorithm 4.1.
Step 2. Use the normal direction dl of L, the middle points P1, P2, and P3, and the identification number table T to classify the edge mark Mi to obtain a user identification number n by Algorithm 4.2.
Step 3. Reduce classification errors and correct the identification number n into n′ by Algorithm 4.3.
Step 4. Take n′ as the output.
4.4 Experimental Results
In this section, we show experimental results of user identification obtained with the proposed algorithms. Figure 4.3 shows examples of successfully recognized multicolor edge marks with four different identification numbers. In each case, the obtained user identification number is shown at the bottom-right corner of the red rectangle enclosing the detected region of the user.
Figure 4.4 shows the effect of classification error reduction. Figures 4.4(a) through 4.4(d) form an image sequence obtained before the proposed classification error reduction technique is applied; the identification numbers obtained in this sequence are unstable, namely, 5-5-7-5. Figures 4.4(e) through 4.4(h) form an image sequence obtained after classification error reduction is carried out; the obtained numbers, 7-7-7-7, are now identical.
Figure 4.5 shows the results of applying classification error reduction when two users are in the environment. Figures 4.5(a) through 4.5(d) form an image sequence obtained before classification error reduction; the identification numbers obtained for the two users are unstable, namely, 5-7-7-5 and 5-3-3. Figures 4.5(e) through 4.5(h) form an image sequence obtained after classification error reduction; the obtained numbers, 7-7-7-7 and 5-5-5, are now identical for each user.
Figure 4.3 User identifications at different locations with different colors on edge marks. (a) Identification number 7. (b) Identification number 9. (c) Identification number 3. (d) Identification number 5.
Figure 4.4 Effect of user identification using classification error reduction for a single-user case. (a) to (d) An image sequence obtained before classification error reduction. (e) to (h) An image sequence obtained after classification error reduction.
Figure 4.5 Effect of user identification using classification error reduction for a two-user case. (a) to (d) An image sequence obtained before classification error reduction. (e) to (h) An image sequence obtained after classification error reduction.
Chapter 5
Multi-user Localization in Indoor Environments by Computer Vision Techniques
5.1 Review of a Previous Work and Idea of Proposed Method
In this study, we propose a multi-user localization method using image-based analysis techniques for AR-based guidance in indoor environments. We have built a vision-based infrastructure with fisheye cameras affixed to the ceiling. The server-side system accesses the omni-images captured by the cameras and detects both the locations and the orientations of the users. We integrate the single-user localization techniques described in Hsieh and Tsai [17] with the proposed multi-user identification technique to localize multiple users in indoor environments.
For multi-user location detection, we perform background/foreground separation to obtain a foreground image, and then apply connected component analysis to find the users’ activity regions. Then, the users’ foot points in these regions are found and transformed into the GCS. The proposed multi-user location detection scheme is described in more detail in Section 5.2.
For user viewing orientation detection, we integrate three different techniques. The first and simplest is to compute user motions from the users’ locations detected in consecutive video frames. The second is to use the orientation sensor on the client mobile device to detect the user’s orientation. The last is to attach multicolor edge marks to the mobile devices held by the users and then analyze the acquired omni-images to detect the marks, which are used to determine the orientations and identification numbers of the users. In addition, we use the user’s viewing frustum to detect the height of the viewing orientation. The proposed scheme for user viewing orientation detection is described in more detail in Section 5.3.
5.2 Multi-user Location Detection
5.2.1 Review of an Algorithm for Single-user Location Detection
In this study, we adopt the single-user location detection scheme from the previous study of Hsieh and Tsai [17]. To detect a user’s position, the user’s body is first extracted from the input image. For this, a background image is captured in the learning stage as a reference. Then, when the user enters the environment in the navigation stage, his/her body is found in the acquired fisheye-camera image by foreground region detection, which includes background subtraction, thresholding, and region growing. Next, the user’s foot point in the found body region is detected according to an optical property of the fisheye camera: in an image acquired with a downward-looking fisheye camera, a space line perpendicular to the ground appears as a radial line passing through the image center. Accordingly, because the user stands on the ground, the axis of his/her body appears in the image as a radial line through the image center, and so the user’s foot point can be taken to be the image point in the detected body region nearest to the image center. Finally, the user’s foot point is transformed into the GCS. In this way, we obtain the user’s location in the environment.
5.2.2 Proposed Technique for Multi-user Location Detection
Based on the single-user location detection scheme described previously, we propose a technique for multi-user location detection in this section. The first step is background/foreground separation. As shown in Figure 5.1, we capture a background image before running the server-side system. When users enter the environment, they appear as parts of the foreground. Therefore, we can obtain the users’ regions by finding the connected components in the foreground image.
Algorithm 5.1 below illustrates the steps to obtain such connected components in an omni-image.
Algorithm 5.1 Finding foreground regions in an omni-image.
Input: An omni-image I captured from a fisheye camera, a background image B captured beforehand, and a pre-selected threshold value TD.
Output: Foreground regions R1,R2, …, Rn in I.
Steps
Step 1. Subtract B from I to get a difference image D.
Step 2. Apply the threshold value TD to D to get a binary foreground image F:

$$F(u, v) = \begin{cases} 1, & \text{if } D(u, v) \ge T_D, \\ 0, & \text{otherwise,} \end{cases}$$

where D(u, v) denotes the value of pixel (u, v) in D.
Step 3. Apply the erosion operation to F to eliminate noise.
Step 4. Find connected components in F as the desired foreground regions R1, R2, …, Rn using a connected component labeling algorithm.
In Step 3, noise is reduced by applying the erosion operation to the foreground image. However, erosion also removes fine details of the foreground regions. Another way to reduce noise is to use a larger threshold value TD in Step 2.
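To make Algorithm 5.1 concrete, the following is a minimal Python sketch under several assumptions not stated in the text: grayscale images stored as lists of rows, an absolute difference |I − B| in Step 1, a 3x3 square structuring element for the erosion in Step 3 (border pixels are treated as background), and 4-connected breadth-first labeling in Step 4. The function name find_foreground_regions is hypothetical.

```python
from collections import deque


def find_foreground_regions(image, background, t_d):
    """Sketch of Algorithm 5.1 for grayscale images given as lists of rows.

    Returns a list of regions, each a list of (u, v) pixel coordinates.
    """
    h, w = len(image), len(image[0])
    # Steps 1-2: absolute difference D = |I - B|, thresholded at T_D.
    fg = [[1 if abs(image[v][u] - background[v][u]) >= t_d else 0
           for u in range(w)] for v in range(h)]
    # Step 3: 3x3 erosion; a pixel survives only if its whole neighborhood is foreground.
    eroded = [[0] * w for _ in range(h)]
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            if all(fg[v + dv][u + du] for dv in (-1, 0, 1) for du in (-1, 0, 1)):
                eroded[v][u] = 1
    # Step 4: 4-connected component labeling by breadth-first search.
    seen = [[False] * w for _ in range(h)]
    regions = []
    for v in range(h):
        for u in range(w):
            if eroded[v][u] and not seen[v][u]:
                region, queue = [], deque([(u, v)])
                seen[v][u] = True
                while queue:
                    x, y = queue.popleft()
                    region.append((x, y))
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if 0 <= nx < w and 0 <= ny < h and eroded[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                regions.append(region)
    return regions
```

An isolated noisy pixel is removed by the erosion, while each sufficiently large user blob survives as one region, shrunk by one pixel on every side; this shrinkage is the loss of detail mentioned above.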
Figure 5.1 Background/foreground separation. (a) The background image. (b) The image of the environment with two users. (c) The foreground image resulting from subtracting (a) from (b).
With the regions of the users extracted, we proceed to find the users’ foot points in the regions to determine their locations. As described in the previous section, we assume that the users of the proposed indoor AR navigation system are always standing on the ground, so that the axis of each user’s body is perpendicular to the ground and its image passes through the image center. Using this property, we detect the users’ foot points and then transform them into the GCS.
We can find the users’ locations by the following algorithm using the output of Algorithm 5.1 as the input.
Algorithm 5.2 Computation of the multi-user locations.
Input: The foreground regions R1, R2, …, Rn of multiple users.
Output: The locations of the users in the GCS.
Steps
Step 1. Find the point fi in Ri nearest to the omni-image center.
Step 2. Project fi onto the line through C and CR to obtain a projection point fi′, where C is the omni-image center and CR is the center of the bounding box circumscribing Ri.
Step 3. Transform fi′ into the GCS as the output.
The user location can be computed by the spatial transformation described in Section 3.4.3. An example of the results is shown in Figure 5.2.
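Steps 1 and 2 of Algorithm 5.2 can be sketched as follows, assuming each region is given as a list of (u, v) pixel coordinates and that CR is the center of the axis-aligned bounding box of the region. Step 3 is omitted because the spatial transformation of Section 3.4.3 is not reproduced here; the function name foot_point is hypothetical.

```python
def foot_point(region, center):
    """Steps 1-2 of Algorithm 5.2 for one region (list of (u, v) pixels).

    Returns the projected foot point f_i' in image coordinates.
    """
    cu, cv = center
    # Step 1: pixel of the region nearest to the omni-image center C.
    fu, fv = min(region, key=lambda p: (p[0] - cu) ** 2 + (p[1] - cv) ** 2)
    # Center C_R of the axis-aligned bounding box circumscribing the region.
    us = [p[0] for p in region]
    vs = [p[1] for p in region]
    ru, rv = (min(us) + max(us)) / 2, (min(vs) + max(vs)) / 2
    # Step 2: orthogonal projection of f_i onto the line through C and C_R.
    du, dv = ru - cu, rv - cv
    norm2 = du * du + dv * dv
    if norm2 == 0:  # degenerate case: the box is centered on C
        return (float(fu), float(fv))
    t = ((fu - cu) * du + (fv - cv) * dv) / norm2
    return (cu + t * du, cv + t * dv)
```

Projecting onto the radial line through CR keeps the foot point on the estimated body axis, so a noisy boundary pixel slightly off the axis does not shift the result sideways.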
Figure 5.2 Detected foot points of two users (shown as red circles). (a) The original image captured by the camera. (b) The foot points in MCS.
5.3 Detection of Users’ Viewing Orientations
5.3.1 Review of an Algorithm for User Orientation Detection
The second stage of user localization is orientation determination. We adopt the user orientation detection scheme proposed by Hsieh and Tsai [17], in which three techniques, namely, human motion estimation, magnetic-field sensing, and color edge mark detection, were proposed for use in different situations.
In motion estimation, a user’s position is detected in every navigation cycle as described in Section 5.2, resulting in a position sequence from which the user’s orientation can be computed. However, the user’s positions detected by image analysis are not always accurate, so the orientation computed in this way is not always smooth. One solution is to average the motion vectors obtained within a period of time to get a more stable estimate of the user’s orientation. However, such averaging undesirably delays the orientation result when the user is turning: the user’s orientation changes quickly during a turn, but the averaged orientation changes more slowly. Therefore, a turning flag is used in the adopted scheme to determine whether a user is turning, so that the result of motion vector averaging can change quickly enough to follow the user’s turning.
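The text does not specify how the turning flag is computed, so the following Python sketch uses an assumed rule: the flag is raised when a new motion vector deviates from the current averaged motion by more than a threshold angle (45 degrees here, a hypothetical value), and the averaging history is then discarded so that the estimate follows the turn immediately. The class MotionOrientation and its parameters are illustrative only.

```python
import math
from collections import deque


class MotionOrientation:
    """Hypothetical sketch: heading from averaged motion vectors with a turning flag."""

    def __init__(self, window=8, turn_threshold_deg=45.0):
        self.vectors = deque(maxlen=window)  # most recent motion vectors
        self.turn_threshold = math.radians(turn_threshold_deg)
        self.last_position = None
        self.turning = False

    def _average(self):
        # The sum of the vectors has the same direction as their mean.
        if not self.vectors:
            return None
        return (sum(v[0] for v in self.vectors), sum(v[1] for v in self.vectors))

    def update(self, position):
        """Feed one detected position (x, y); return the heading in degrees, or None."""
        if self.last_position is not None:
            mv = (position[0] - self.last_position[0], position[1] - self.last_position[1])
            if mv != (0, 0):  # a standing user gives no motion direction
                avg = self._average()
                self.turning = False
                if avg is not None:
                    cross = avg[0] * mv[1] - avg[1] * mv[0]
                    dot = avg[0] * mv[0] + avg[1] * mv[1]
                    # Assumed rule: flag a turn when the new vector deviates strongly.
                    self.turning = abs(math.atan2(cross, dot)) > self.turn_threshold
                if self.turning:
                    self.vectors.clear()  # drop stale history so the average follows the turn
                self.vectors.append(mv)
        self.last_position = position
        avg = self._average()
        return None if avg is None else math.degrees(math.atan2(avg[1], avg[0]))
```

For a user who walks east for three cycles and then steps north, plain averaging would still report about 18 degrees, whereas this flagged version reports 90 degrees.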
As to magnetic-field sensing, the magnetometer on the mobile device can be used to measure the azimuth angle of the device by detecting changes and disturbances of the magnetic field. In the learning stage, an azimuth map is established in which the magnetometer readings for four pre-selected major directions d0 through d3 in the environment are recorded. That is, for each sample environment spot p, a 6-tuple (x, y, a0, a1, a2, a3) is kept in the map, where (x, y) specifies the position of p, and a0 through a3 are the azimuth values obtained when the mobile device is oriented toward the four directions d0 through d3, respectively. To use the azimuth map in the navigation stage, with the detected user position p as input, the sample spot Ap nearest to p is first picked out from the map. Then, the azimuth angle value ap at p for the current orientation of the user is measured with the magnetometer. Finally, the desired user orientation dp is computed by interpolation using the learned azimuth values a0 through a3 of Ap recorded in the azimuth map.
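The nearest-spot lookup and the interpolation step can be sketched as below. The text does not give the interpolation formula, so this sketch assumes that d0 through d3 lie at 0, 90, 180, and 270 degrees in the GCS, that the learned readings a0 through a3 increase in the same rotational sense, and that piecewise-linear interpolation between adjacent learned readings is adequate. Both function names are hypothetical.

```python
def nearest_spot(azimuth_map, p):
    """Return the entry (x, y, a0, a1, a2, a3) whose spot is nearest to position p."""
    return min(azimuth_map, key=lambda e: (e[0] - p[0]) ** 2 + (e[1] - p[1]) ** 2)


def interpolate_orientation(readings, a_p, directions=(0.0, 90.0, 180.0, 270.0)):
    """Map a measured azimuth a_p (degrees) to a GCS direction (degrees).

    readings: learned azimuths a0..a3 recorded for directions d0..d3 at the spot.
    Assumes both sequences increase in the same rotational sense.
    """
    for k in range(4):
        a_lo, a_hi = readings[k], readings[(k + 1) % 4]
        span = (a_hi - a_lo) % 360.0   # learned reading gap between d_k and d_(k+1)
        offset = (a_p - a_lo) % 360.0  # how far a_p lies beyond a_k
        if span > 0 and offset <= span:
            d_lo = directions[k]
            d_span = (directions[(k + 1) % 4] - d_lo) % 360.0
            return (d_lo + d_span * offset / span) % 360.0
    raise ValueError("learned readings do not cover the measured azimuth")
```

With learned readings (10, 95, 190, 280) at the nearest spot, a measured azimuth of 52.5 degrees lies halfway between a0 and a1 and is mapped to 45 degrees. Using the local readings rather than a global offset lets the map absorb spot-specific magnetic disturbances.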
Finally, in color edge mark detection, the color edge mark appears as a strip in the image, so a line approximation scheme can be applied to detect it, as described in the last chapter. Under the assumption that the user holds the device horizontally, the color edge mark is parallel to the ground. Therefore, the orientation of the color edge mark can be determined from the orientation of the approximating line. Because the line has two mutually opposite directions, the one closer to the orientation u0 detected from the magnetometer readings is taken as the output.
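A possible sketch of this step fits the approximating line with a total-least-squares (principal axis) fit, which stands in for the line approximation of Algorithm 4.1 that is not reproduced here, and then resolves the two opposite directions using the reference u0. Angles are in degrees in a single planar frame; converting the image-space angle into the GCS is assumed to be handled elsewhere. The function names are hypothetical.

```python
import math


def line_direction(points):
    """Orientation (degrees, modulo 180) of the least-squares line through 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the principal axis of the scatter matrix (total least squares fit).
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180.0


def disambiguate(theta, u0):
    """Choose theta or theta + 180 (degrees), whichever is closer to the reference u0."""
    def gap(a, b):
        d = (a - b) % 360.0
        return min(d, 360.0 - d)

    flipped = (theta + 180.0) % 360.0
    return theta if gap(theta, u0) <= gap(flipped, u0) else flipped
```

A principal-axis fit is used instead of fitting y as a function of x so that near-vertical marks are handled without special cases.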
5.3.2 Proposed Technique for User Viewing Orientation Detection
Based on the ideas about user orientation detection described previously, we propose in this section a method for user viewing orientation detection. Recall that three techniques have been proposed for use in different situations, namely, human motion estimation, magnetic-field sensing, and multicolor edge mark detection.
In motion estimation, the positions of the users are detected as described in Section 5.2 and are used to compute the orientations of the users. We apply the motion estimation method described in Section 5.3.1 to each of the multiple users: the motion vectors are averaged to get a more stable result, and turning flags are used to determine whether the users are turning. In this way, the result of motion vector averaging can change quickly enough to follow each user’s turning.
In magnetic-field sensing, an azimuth map is likewise established in the learning stage. The desired orientation of each user can then be computed by interpolation using the learned azimuth values recorded in the azimuth map.
In multicolor edge mark detection, a multicolor edge mark detected as described in Chapter 4 is used for orientation detection as well. We compute the direction vector of the approximating line in order to determine the device orientation.
Under the assumption that the user holds the device horizontally, the multicolor edge mark becomes parallel to the ground. As shown by the example in Figure 5.3, the multicolor edge mark is represented as a yellow-green-pink line, and the red line and the yellow-green-pink line are projected onto identical image points; meanwhile, the vertical projection (shown as the dotted green line) of the multicolor edge mark is parallel to the red line. Therefore, we can determine the orientation of the color edge mark by the orientation of the red line.
Figure 5.3 The red line and the multicolor edge mark (shown as yellow-green-pink line) are projected onto identical image points. The vertical projection (shown as dotted green line) of the multicolor edge mark is parallel to the red line.
The following algorithm describes the process to detect the user orientation by the three methods.
Algorithm 5.3 User viewing orientation detection.
Input: A user’s position and body region in an image, obtained by the schemes described in Section 5.2.
Output: the user’s orientation d.
Steps
Step 1. Determine the user’s orientation using the magnetometer readings provided by the mobile device and the azimuth map constructed in the learning stage, resulting in a direction d0 for use as the initial user orientation.
Step 2. With d0 as a reference, apply multicolor edge mark detection to find another user orientation dc and a corresponding reliability index rc.
Step 3. If rc is larger than a pre-selected threshold, then take dc as the desired user orientation d and exit; otherwise, perform the next step.
Step 4. If the user is walking, then conduct human motion estimation to obtain a third user orientation d as the desired output and exit; otherwise, perform the next step.
Step 5. Take d0 obtained in Step 1 as the desired output and exit.
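Algorithm 5.3 is essentially a fallback cascade. The sketch below expresses it in Python with hypothetical interfaces: the edge-mark detector and the motion estimator are passed in as callables so that each is evaluated only when its step is reached, and the reliability threshold of 0.8 is an assumed value, not one given in the text.

```python
def viewing_orientation(d0, edge_mark, is_walking, motion_estimate, r_threshold=0.8):
    """Decision cascade of Algorithm 5.3 (hypothetical interfaces).

    d0: orientation from the magnetometer and azimuth map (Step 1).
    edge_mark: callable taking d0 and returning (d_c, r_c), or None if no mark is found.
    is_walking: whether the user is currently walking.
    motion_estimate: callable returning the orientation from human motion estimation.
    """
    result = edge_mark(d0)  # Step 2: edge mark detection with d0 as the reference
    if result is not None:
        d_c, r_c = result
        if r_c > r_threshold:  # Step 3: a reliable edge mark wins
            return d_c
    if is_walking:  # Step 4: fall back to motion estimation
        return motion_estimate()
    return d0  # Step 5: fall back to the magnetometer result
```

Passing callables rather than precomputed values mirrors the early exits of the algorithm: motion estimation is never run when a reliable edge mark has already been found.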
5.4 Review of User Tracking
5.4.1 User Tracking under a Single Fisheye Camera
The objective of user tracking is to identify the same person in consecutive video frames captured by fisheye cameras. To this end, we adopt a high-level tracking technique proposed by Senior et al. [19]. First, as described in Section 5.2, the foreground regions in a foreground image are extracted, and each foreground region is enclosed with a bounding box. Then, in each successive video frame, each foreground region is associated with one of the existing tracks of objects, including the users. A track here represents the movements of an identical object in consecutive video frames. Track association is achieved by constructing a tracking matrix that represents the distance between each foreground region and each existing track. Each row of the tracking matrix corresponds to one track, and each column corresponds to one foreground region. The distance is computed using the bounding box distance measure proposed in [19], as illustrated in Figure 5.4(a): the distance between two bounding boxes A and B is the smaller of the distance from the center of A to the nearest point on B and the distance from the center of B to the nearest point on A. In particular, if either box center lies within the other box, as shown in Figure 5.4(b), the distance is zero.
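The bounding box distance measure described above can be sketched as follows, assuming boxes are axis-aligned and given as (x_min, y_min, x_max, y_max). Treating each box as a filled rectangle makes the point-to-box distance zero when a center lies inside the other box, which reproduces the special case of Figure 5.4(b). The function names are hypothetical.

```python
import math


def point_to_box(px, py, box):
    """Euclidean distance from (px, py) to the nearest point of a filled box (0 if inside)."""
    x0, y0, x1, y1 = box
    dx = max(x0 - px, 0.0, px - x1)
    dy = max(y0 - py, 0.0, py - y1)
    return math.hypot(dx, dy)


def box_distance(a, b):
    """Smaller of center(a)-to-b and center(b)-to-a distances."""
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    return min(point_to_box(ca[0], ca[1], b), point_to_box(cb[0], cb[1], a))
```

For a large box A = (0, 0, 10, 10) and a small box B = (12, 4, 14, 6), the center of B is 3 units from A while the center of A is 7 units from B, so the measure returns 3; taking the smaller value favors matching a small region to a nearby large track.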
Figure 5.4 Bounding box distance measure. (a) The distance between A and B is the smaller of the distance from the center of A to the nearest point on B or from the center of B to the nearest point on A. (b) The distance is zero if either center lies within the other bounding box.
If a foreground region is close enough to only one track and only that region is close enough to the track, i.e., if there is a one-to-one correspondence between the track and the region, then the corresponding row and column each contain a single “1”, as shown in Figure 5.5(a), meaning that the region is associated with the track. However, two regions may both be close enough to one track; this produces two “1”s in a row, as shown in Figure 5.5(b), meaning that both regions are associated with the track.
Similarly, if a region is close enough to two tracks, the region is associated with both tracks. Finally, if two regions are both close enough to the same two tracks, as shown in Figure 5.5(c), then each of the two regions is associated with both tracks. In this way, multiple regions can be properly associated with multiple existing tracks, so that the same user can be identified in consecutive video frames, achieving the goal of user tracking under a single fisheye camera.
The user tracking algorithm using the adopted method is described in Algorithm 5.4.
Figure 5.5 Tracking matrix in different situations. (a) A region is close enough to only one track, and only that region is close enough to the track. (b) Two regions are close enough to a track. (c) Two regions are close enough to the same two tracks.
Algorithm 5.4 User tracking.
Input: The foreground regions C in a frame and a set T of tracks, each associated with at least one foreground region, where Ti denotes the ith track in T and Ci denotes the ith foreground region in C.
Output: Tracks T.
Steps
Step 1. Create a tracking matrix M with all zeros with each row of M corresponding to one track of T, and each column of M corresponding to one region of C.
Step 2. Compute the bounding box distance dij between each Ci and R(Tj) using the bounding box distance measure, where the function R(t) returns the region associated with track t.
Step 3. For each dij, set the element of M at row j and column i to 1 if dij < TD, where TD is a pre-selected distance threshold.
Step 4. Perform the following steps for M.
4.1. For each column i with only one non-zero element, at row j, where row j also has only one non-zero element (at column i), associate Ci with Tj.
4.2. For each column i with all zero elements, create a new track tnew, associate it with Ci, and add tnew into T.
4.3. For each row j with all zero elements, remove Tj from T.
4.4. For the columns i1, i2, …, im which have more than one non-zero element, at rows j1, j2, …, jn, associate Ci1, Ci2, …, Cim with Tj1, Tj2, …, Tjn.
In Step 3, the tracking matrix is binarized by the resulting distances: if two bounding boxes are close enough, the corresponding element is set to one; otherwise, it is set to zero. In Step 4.2, a foreground region that is not associated with any track is regarded as a new object to track. In Step 4.3, tracks that are not associated with any region are removed.
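One frame of Algorithm 5.4 can be sketched in Python as below. Several details are assumptions: tracks are stored as a dictionary from a hypothetical integer track ID to the bounding box of R(Tj), the distance function is passed in as a parameter, and when several regions join one track the track’s box becomes the union of their boxes. Steps 4.1 and 4.4 are merged into a single rule (link each region to every track whose matrix entry is 1), and Step 4.3 happens implicitly because a track with an all-zero row receives no region.

```python
def track_step(region_boxes, tracks, t_d, distance, next_id):
    """One frame of Algorithm 5.4 (hypothetical data layout).

    region_boxes: bounding boxes of this frame's foreground regions C_i.
    tracks: dict track_id -> bounding box of R(T_j).
    distance: bounding box distance function taking two boxes.
    Returns (updated tracks, {region index: [track ids]}, next free track id).
    """
    ids = list(tracks)
    # Steps 1-3: binary tracking matrix, rows = tracks, columns = regions.
    m = [[1 if distance(region_boxes[i], tracks[tid]) < t_d else 0
          for i in range(len(region_boxes))] for tid in ids]
    assoc = {}
    for i in range(len(region_boxes)):
        rows = [j for j in range(len(ids)) if m[j][i]]
        if not rows:  # Step 4.2: an unmatched region starts a new track
            assoc[i] = [next_id]
            next_id += 1
        else:  # Steps 4.1 and 4.4: link the region to every close-enough track
            assoc[i] = [ids[j] for j in rows]
    # Step 4.3 is implicit: a track with an all-zero row gets no region and is dropped.
    new_tracks = {}
    for i, tids in assoc.items():
        for tid in tids:
            box = region_boxes[i]
            if tid in new_tracks:  # assumed: several regions -> union of their boxes
                b = new_tracks[tid]
                box = (min(b[0], box[0]), min(b[1], box[1]),
                       max(b[2], box[2]), max(b[3], box[3]))
            new_tracks[tid] = box
    return new_tracks, assoc, next_id
```

Any distance with the same signature can be plugged in, for example the bounding box distance measure of [19] described earlier in this section.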
5.4.2 Camera Hand-off for Tracking under Multi-cameras
A fisheye camera has a wider field of view, and can observe a wider range than a traditional perspective camera. However, if an object is located outside the view of a