
Chapter 3 Learning of Environments

3.5 Experimental Results

An environment map of our experimental environment obtained by applying Algorithm 3.5 is shown in Figure 3.8. The scaling factor of the map is taken to be 40 pixels/m. The environment map includes six target places for visits (shown as green regions), eight merchandise items (shown as blue labels), and two fisheye cameras (shown as blue circles). Two examples of images captured with the two fisheye cameras are shown in Figure 3.9.


Figure 3.8 Environment map of the experimental environment, where target places are shown as green regions, and cameras as blue circles.

An obstacle avoidance map of the experimental environment obtained by applying Algorithm is shown in Figure 3.10, in which the avoidance directions are shown as green arrows. Blocks without avoidance directions are either obstacle regions or regions far enough away from obstacles.


Figure 3.9 Images captured with the two fisheye cameras of the experimental environment. (a) An image captured with Camera-1 of the map shown in Figure 3.8. (b) An image captured with Camera-2.

Figure 3.10 Obstacle avoidance map of the experimental environment.

Chapter 4 User Identification by Color Image Analysis Using Multicolor Edge Marks on Top of Client Devices

4.1 Ideas of Proposed User Identification Method

For the purpose of providing a multi-user AR-based navigation system, we must identify each of the multiple users in the environment. In this study, we propose a user identification method by color image analysis for the indoor environment. The method is described in this chapter.

We have built a vision-based infrastructure with fisheye cameras affixed to the ceiling. The server-side system can access the omni-images captured with the cameras and detect multiple users in the indoor environment. The proposed user identification method attaches a multicolor edge mark on top of each client device (the mobile device) and detects it in each consecutive video frame, which is an omni-image. We segment the multicolor edge mark from the omni-image and classify the edge mark according to its color pattern.

Sometimes we cannot segment out all the color segments of the mark because the multicolor edge mark may be blocked by the user’s body. Therefore, we also propose a technique for classification error reduction. In this technique, three schemes are used to detect and track multicolor edge marks. These schemes are described in more detail in Section 4.2.3.

4.2 Proposed User Identification Method by Color Image Analysis

4.2.1 Multicolor Edge Mark Detection

Here we describe the proposed method to identify multiple users by the multicolor edge mark on the top of each client device. The colors of the multicolor edge mark have high saturation and high lightness, so they appear to be very prominent in the acquired omni-image, and can be segmented out easily from the image. For example, as shown in Figure 4.1, we can see the yellow-green-pink edge mark very clearly in the omni-image.

Figure 4.1 The clear multicolor edge mark (The yellow-green-pink strip) in the acquired omni-image.

To separate the multicolor edge mark from the rest of an omni-image, we first convert the color space of the omni-image from RGB to HSV. The HSV color model assigns three components to each pixel: hue, saturation, and value. The hue component corresponds to the words we normally use for colors: red, blue, green, etc. The saturation refers to the dominance of the hue in the color. The value indicates the lightness of the color. Using the HSV color model, we can separate the multicolor edge mark from the omni-image more easily, because the colors of the mark have high saturation and high lightness. In addition, we assume that there are three colors on each multicolor edge mark. We detect and classify them to obtain the identification number of each user. The classification scheme is introduced in the next section. Moreover, the multicolor edge mark appears as a strip in the omni-image, so we can also use it to detect the orientation of the user (i.e., the direction the user is facing). The proposed scheme for detecting the user’s orientation is described in Chapter 5.
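The role of the HSV conversion can be illustrated with a minimal sketch. It uses the standard RGB-to-HSV conversion from Python's `colorsys` module, which may differ in scaling from the equations used in this study, and the function names and the thresholds `s_min` and `v_min` are illustrative assumptions rather than values from the experiments.

```python
import colorsys

def rgb_to_hsv_pixel(r, g, b):
    """Convert an 8-bit RGB pixel to (hue in degrees, saturation, value).

    Saturation and value are in [0, 1]. This is the standard conversion,
    used here only as a stand-in for the equations of Algorithm 4.1.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

def is_mark_candidate(r, g, b, s_min=0.6, v_min=0.6):
    """Return True if the pixel is saturated and bright enough to be part of a mark.

    s_min and v_min are hypothetical thresholds chosen for illustration.
    """
    _, s, v = rgb_to_hsv_pixel(r, g, b)
    return s >= s_min and v >= v_min

print(is_mark_candidate(255, 255, 0))    # pure yellow: True
print(is_mark_candidate(128, 128, 128))  # mid gray has zero saturation: False
```

Because a gray floor or wall has low saturation regardless of its brightness, a test of this kind rejects most of the background before any hue comparison is made.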

The following algorithm describes the process to detect the color regions of multicolor edge marks.

Algorithm 4.1. Detection of the multicolor edge mark on top of the client device.

Input: an omni-image I and the foreground region R of a user.

Output: the three middle points P1, P2, and P3 of the color regions on the multicolor edge mark, and an approximating line L for the edge mark.

Steps.

Step 1. Create an image I_hsv by converting the color space of I from the RGB model to the HSV one by the following equations:

max(R, G, B);

connected component in region R, set the middle points P_i to be non-usable.

Step 4. Find the bounding box B_i of C_i.

In the above algorithm, we apply six threshold values to extract the three different color regions of the multicolor edge mark in Step 2, and extract the entire mark in Step 6. An example of the result is shown in Figure 4.2. After we detect the middle points of the extracted color regions, we can classify them to obtain the identification numbers for identification of multiple users, as described next.
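The extraction of color regions and their middle points can be sketched as follows. The sketch assumes that the six threshold values form a lower and an upper hue bound for each of the three colors, that the middle point is the center of the bounding box, and that only the largest connected component of each color is kept; these are simplifying assumptions, and all function names are hypothetical.

```python
from collections import deque

def largest_component(mask):
    """Return the pixels (row, col) of the largest 4-connected component of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    return best

def middle_point(component):
    """Center (row, col) of the bounding box, or None if the region is non-usable."""
    if not component:
        return None
    ys = [p[0] for p in component]
    xs = [p[1] for p in component]
    return ((min(ys) + max(ys)) / 2.0, (min(xs) + max(xs)) / 2.0)

def detect_middle_points(hue, hue_ranges):
    """hue: 2D list of hue values in degrees (None for rejected pixels).
    hue_ranges: {color name: (low, high)}, i.e. two thresholds per color (assumed)."""
    result = {}
    for name, (lo, hi) in hue_ranges.items():
        mask = [[h is not None and lo <= h <= hi for h in row] for row in hue]
        result[name] = middle_point(largest_component(mask))
    return result

hue = [[None, 60, 60, 120, 120, None],
       [None, 60, 60, 120, 120, None]]
ranges = {"yellow": (50, 70), "green": (100, 140), "pink": (300, 340)}
print(detect_middle_points(hue, ranges))
```

In this toy input the pink segment is absent, as happens when part of the mark is blocked by the user's body, so its middle point is reported as `None` (non-usable).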

(a) (b)

Figure 4.2 Detection of the color regions on the color edge mark on top of the mobile device. (a) The three color regions segmented from the omni-image. (b) The three bounding boxes (in red) detected in (a).

4.2.2 Multicolor Edge Mark Classification

We detect the color regions by applying Algorithm 4.1 to each omni-image taken by a fisheye camera on the ceiling and obtain the three middle points of the color regions. The combination of the three colors is then mapped to a user identification number by a pattern classification scheme proposed in this study, as described now. First, we construct a mapping table between the detected color regions and the user identification numbers. To guarantee correct performance of the proposed classifier, we have to omit some combinations of the three colors. In more detail, in theory we have 3³ = 27 user identification numbers. But we have to consider two “ambiguous” cases on the edge mark here.

(1) The three colors on an edge mark are left-right symmetric to those on another (e.g., yellow-green-pink and pink-green-yellow): if two such edge marks appear in the image simultaneously and face symmetrically to each other, they cannot be differentiated as two different marks.

(2) Two or more identical colors are adjacent in the mark (e.g., yellow-yellow-pink): adjacent identical colors on the edge mark are extracted by image processing as a single region in this study, so the edge mark will be regarded as consisting of fewer than three colors. For example, the seemingly three-color combination yellow-yellow-pink will be treated as the two-color combination yellow-pink.

Therefore, only nine effective combinations of colors are left, instead of 27, for use in user identification when three colors are used on the edge mark, as can be seen from the mapping table shown in Table 4.1. Of course, we may use a larger number of colors on the edge mark to increase the number of effective combinations and identify more persons. For example, when four distinct colors are used, it can be figured out that twenty-two effective combinations of them can be used for user identification.
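The count of nine can be checked by direct enumeration. The following sketch lists all three-segment sequences over three colors, discards those with adjacent identical colors (case 2), and keeps a single representative of each mirror-symmetric pair (case 1). The function name and the choice of the lexicographically smaller sequence as the representative are illustrative assumptions.

```python
from itertools import product

def effective_combinations(colors, length=3):
    """Enumerate color sequences usable on an edge mark with `length` segments."""
    seen = set()
    result = []
    for seq in product(colors, repeat=length):
        # Case (2): adjacent identical colors merge into a single region.
        if any(a == b for a, b in zip(seq, seq[1:])):
            continue
        # Case (1): a sequence and its mirror image are indistinguishable,
        # so both are represented by the lexicographically smaller one.
        key = min(seq, seq[::-1])
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

combos = effective_combinations(["yellow", "green", "pink"])
print(len(combos))  # 9
```

Of the 27 raw sequences, 12 have no adjacent repeats; six of these are palindromes such as yellow-green-yellow and the remaining six form three mirror pairs, which gives 6 + 3 = 9.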

The following algorithm describes the process to classify the color regions of the multicolor edge mark.

Algorithm 4.2. Classification of the multicolor edge mark on top of the client device.

Table 4.1 Mapping table of combinations of three colors and identification numbers.

ID 1 2 3 4 5 6 7 8 9 -

Output: a user identification number n.

Steps

Step 1. Find the normal d_l of the approximating line L.

Step 2. Find the direction of d_l if L is not vertical.

Step 3. If L is not vertical, then conduct the following two steps:

(1) for i = 1, 2, 3, perform the following operation:

(2) sort all of those P_ix' whose corresponding original P_i are usable;

else, sort P_iy directly for i = 1, 2, 3.

Step 4. Concatenate the pixels in an order corresponding to the sorting result to obtain a pixel sequence r.

Step 5. Map the sequence of colors of the pixels in r to a user identification number n according to Table 4.1 and take it as the output.
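The ordering and lookup steps can be sketched as below. Instead of rotating the middle points, the sketch sorts them by their projection onto the direction of L, which yields an ordering along the mark for any orientation, including a vertical line. The table entry, the function names, and the canonicalization of a sequence with its mirror image are assumptions made for illustration; the actual entries of Table 4.1 are not reproduced.

```python
import math

def color_sequence(points, angle_deg):
    """Order the usable middle points along the direction of line L.

    points: list of (color, (x, y)) pairs, with None for a non-usable point.
    angle_deg: direction of L in degrees, measured from the x axis.
    """
    ux = math.cos(math.radians(angle_deg))
    uy = math.sin(math.radians(angle_deg))
    usable = []
    for color, p in points:
        if p is not None:
            usable.append((p[0] * ux + p[1] * uy, color))  # scalar projection onto L
    usable.sort()
    return tuple(color for _, color in usable)

def classify(points, angle_deg, table):
    """Map the ordered colors to an identification number, or None if not in the table."""
    seq = color_sequence(points, angle_deg)
    key = min(seq, seq[::-1])  # a mark and its mirror image share one entry
    return table.get(key)

# Hypothetical table entry for illustration only.
table = {("pink", "green", "yellow"): 7}
points = [("pink", (30, 10)), ("yellow", (10, 10)), ("green", (20, 10))]
print(classify(points, 0.0, table))  # 7
```

A mark with a blocked segment produces a shorter sequence, which is looked up in the same way and yields `None` when the table has no entry for it.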

4.2.3 Technique for Classification Error Reduction

We introduced the proposed user identification technique in the previous sections.

However, it has stability and precision problems which cause failures in identifying users. Therefore, we propose further a technique to reduce the errors of classification.

We construct a record to remember the last five results of yielded identification numbers. If the last five identification numbers are all the same, meaning that the person in question has been stably identified already, then the newly-yielded identification number will be discarded.

In addition, sometimes we cannot detect all three colors on a multicolor edge mark successfully because the mark may be blocked by the user’s body. So, we assign a priority sequence to multicolor edge marks: the three-color edge mark, the two-color edge mark, and the one-color edge mark. This priority sequence is used in correcting erroneous classification results. For example, if we detect three colors on a multicolor edge mark that was classified as a two-color edge mark in previous image frames, then it will be corrected to a three-color edge mark in the current frame according to the priority. The following algorithm describes the proposed process to reduce classification errors.

Algorithm 4.3. Reduction of errors in classification of multicolor edge marks on top of the client device.

Input: a sequence of user identification numbers n1, n2, n3, …, nn.

Output: a sequence of corrected user identification numbers n1′, n2′, n3′, …, nn′.

Steps

Step 1. Put n_1, n_2, n_3, n_4, n_5 in a record R and set n_i′ = n_i for i = 1, 2, …, 5.

Step 2. For i > 5, if the following two conditions are satisfied:

(1) n_{i-1}′ = n_{i-2}′ = … = n_{i-5}′; (2) priority(n_{i-1}′) ≥ priority(n_i),

then set n_i′ = n_{i-1}′ to keep the detected identification number;

else set n_i′ = n_i to correct the identification number.

Step 3. Put n_i′ in R.

Step 4. Delete n_{i-5}′ from R if the number of elements in R is larger than five.

Step 5. Repeat Step 2 through Step 4 until the last user identification number has been processed.
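One reading of Algorithm 4.3 can be sketched as follows. The sketch assumes that a new identification number is discarded when the last five recorded numbers agree and the recorded number has a priority at least as high as the new one, and that it is accepted otherwise. The priority function, which here returns the number of colors of the mark behind an identification number, and the example mapping are hypothetical.

```python
from collections import deque

def reduce_errors(ids, priority, window=5):
    """Return the corrected sequence of identification numbers.

    ids: raw identification numbers, one per frame.
    priority: function mapping an identification number to its priority
              (assumed higher for marks with more detected colors).
    """
    record = deque(maxlen=window)  # the oldest entry is dropped automatically (Step 4)
    corrected = []
    for i, n in enumerate(ids):
        if i < window:
            out = n  # Step 1: the first five numbers are taken as they are
        else:
            stable = len(set(record)) == 1
            prev = record[-1]
            if stable and priority(prev) >= priority(n):
                out = prev  # keep the stably identified number
            else:
                out = n     # accept the new number
        record.append(out)  # Step 3
        corrected.append(out)
    return corrected

# Hypothetical priorities: ID 7 is a three-color mark, ID 5 a two-color one.
colors_of = {7: 3, 5: 2}
print(reduce_errors([7, 7, 7, 7, 7, 5, 7, 5], colors_of.get))
# [7, 7, 7, 7, 7, 7, 7, 7]
```

Under this reading a stable three-color result is not overwritten by an occasional two-color detection, whereas a stable two-color result is replaced as soon as all three colors become visible.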

4.3 Proposed Algorithm of User Identification

We introduced three techniques for user identification in the previous sections. In this section, we will describe how we integrate the three techniques to perform the user identification work more reliably.

Algorithm 4.4. Integrated determination of the user identification number.

Input: an omni-image I and the foreground region R of a user.

Output: the user’s identification number n.

Steps

Step 1. Detect the middle points P₁, P₂, and P₃ of the color regions and the approximating line L of each multicolor edge mark Mi in omni-image I by Algorithm 4.1.

Step 2. Use the normal direction d_l of L, the middle points P₁, P₂, and P₃, and the identification number table T to classify the edge mark Mi to obtain a user identification number n by Algorithm 4.2.

Step 3. Reduce the classification errors and correct the identification number n into n′ by Algorithm 4.3.

Step 4. Take n′ as the output.

4.4 Experimental Results

In this section, we show experimental results of user identification obtained with the previously proposed algorithms. Figure 4.3 shows some examples of successful recognition of multicolor edge marks with four different identification numbers. In each case, the obtained user identification number is indicated at the bottom-right corner of the red rectangle enclosing the detected region of the user.

Figure 4.4 shows the results of classification error reduction, where Figures 4.4(a) through 4.4(d) are an image sequence obtained before the proposed classification error reduction technique is applied, from which we can see that the identification numbers obtained in the sequence are not stable with the non-identical results of 5-5-7-5. Figures 4.4(e) through 4.4(h) are an image sequence obtained after classification error reduction is carried out, which shows that the obtained numbers 7-7-7-7 become identical now.

Figure 4.5 shows the results of applying classification error reduction to the case of two users being in the environment, where Figures 4.5(a) through 4.5(d) are an image sequence obtained before classification error reduction, from which we can see that the identification numbers obtained in the sequence are not stable with the results of 5-7-7-5 and 5-3-3. Figures 4.5(e) through 4.5(h) are an image sequence after classification error reduction, which shows that the obtained numbers 7-7-7-7 and 5-5-5 become identical now.


Figure 4.3 User identifications at different locations with different colors on edge marks. (a) Identification number 7. (b) Identification number 9. (c) Identification number 3. (d) Identification number 5.


Figure 4.4 Effect of classification error reduction on user identification for a single-user case. (a) to (d) An image sequence obtained before classification error reduction. (e) to (h) An image sequence obtained after classification error reduction.


Figure 4.5 Effect of classification error reduction on user identification for a two-user case. (a) to (d) An image sequence obtained before classification error reduction. (e) to (h) An image sequence obtained after classification error reduction.

Chapter 5 Multi-user Localization in Indoor Environments by Computer Vision Techniques

5.1 Review of a Previous Work and Idea of Proposed Method

In this study, we propose a multi-user localization method using image-based analysis techniques for AR-based guidance in indoor environments. We have built a vision-based infrastructure with fisheye cameras affixed on the ceiling. The server-side system can access the omni-images captured with the cameras, and conduct detections of both the users’ locations and orientations. We integrate single-user localization techniques as described in Hsieh and Tsai [17] and the proposed multi-user identification technique to localize multiple users in indoor environments.

For multi-user location detection, we perform background/foreground separation to detect foreground images, and then apply connected component analysis to find the users’ activity regions. Then, the users’ foot points in the regions are analyzed and transformed into the GCS. The proposed multi-user location detection scheme is described in more detail in Section 5.2.

For user viewing orientation detection, we combine three different techniques to obtain the viewing orientation of users. The first and simplest is to calculate user motions from the users’ locations detected in consecutive video frames. The second is to use the orientation sensor on the client mobile device to detect the users’ orientation. The last is to attach multicolor edge marks on the mobile devices held by users, and then analyze acquired omni-images to detect the multicolor edge marks, which are used to determine the orientation and identification number of the users. In addition, we use the frustum for user viewing detection to detect the height of the viewing orientation. The proposed scheme for user viewing orientation detection is described in more detail in Section 5.3.

5.2 Multi-user Location Detection

5.2.1 Review of an Algorithm for Single-user Location Detection

In this study, we adopt a single-user location detection scheme from the previous study of Hsieh and Tsai [17]. To detect a user’s position, the user’s body part is first extracted from the input image. For this, a background image is captured in the learning stage as a reference. Then, when the user enters the environment in the navigation stage, his/her body is found in the acquired fisheye-camera image by a process of foreground region detection, including background subtraction, thresholding, and region growing. The user’s foot point in the found body region is then detected according to an optical property of the fisheye camera: in an image acquired with a downward-looking fisheye camera, a space line perpendicular to the ground appears as a radial line going through the image center. Accordingly, because the user is standing on the ground, the axis of his/her body will go through the image center, and so the user’s foot point may be found as the image point in the detected body region nearest to the image center. Finally, the user’s foot point in this region is transformed into the GCS. In this way, we can obtain the user location in the environment.

5.2.2 Proposed Technique for Multi-user Location Detection

Based on the single-user location detection scheme as described previously, we propose a technique for multi-user location detection in this section. The first step is background/foreground separation. As shown in Figure 5.1, we capture a background image before running the server-side system. When users enter the environment, they will be considered as parts of the foreground regions. Therefore, we can obtain the users’ regions by finding the connected components in the foreground image.

Algorithm 5.1 below illustrates the steps to obtain such connected components in an omni-image.

Algorithm 5.1 Finding foreground regions in an omni-image.

Input: An omni-image I captured from a fisheye camera, a background image B captured beforehand, and a pre-selected threshold value T_D.

Output: Foreground regions R_1, R_2, …, R_n in I.

Steps

Step 1. Subtract B from I to get a difference image D.

Step 2. Apply the threshold value T_D on D to get a foreground image F by the following steps:

(1) set F(u, v) = 1, if D(u, v) > T_D; (2) set F(u, v) = 0, otherwise,

where D(u, v) denotes the value of a pixel on D.

Step 3. Apply the erosion operation to F to eliminate noise.

Step 4. Find connected components in F as the desired foreground regions R_1, R_2, …, R_n using a connected component labeling algorithm.

In Step 3, we reduce noise by applying the erosion operation on the foreground image. However, the erosion operation will also eliminate the details of the foreground image. Another way to reduce noise is to set a larger threshold value in Step 2.
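The four steps of Algorithm 5.1 can be sketched on small grayscale grids as follows. The sketch assumes that the difference image is the absolute difference of gray values, that the erosion uses a cross-shaped (4-neighbor) structuring element, and that connectivity is 4-connected; these choices and the function names are illustrative and not taken from the experiments.

```python
from collections import deque

def foreground_mask(image, background, t_d):
    """Steps 1 and 2: absolute difference followed by thresholding with t_d."""
    return [[1 if abs(p - q) > t_d else 0 for p, q in zip(row_i, row_b)]
            for row_i, row_b in zip(image, background)]

def erode(mask):
    """Step 3: keep a pixel only if it and its four neighbors are foreground."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if (mask[r][c] and mask[r - 1][c] and mask[r + 1][c]
                    and mask[r][c - 1] and mask[r][c + 1]):
                out[r][c] = 1
    return out

def connected_components(mask):
    """Step 4: label 4-connected regions and return each as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

def find_foreground_regions(image, background, t_d):
    return connected_components(erode(foreground_mask(image, background, t_d)))

# Two 3x3 "users" and one isolated noise pixel on a uniform background.
background = [[10] * 10 for _ in range(6)]
image = [row[:] for row in background]
for r in range(1, 4):
    for c in list(range(1, 4)) + list(range(6, 9)):
        image[r][c] = 200
image[4][5] = 200
print(find_foreground_regions(image, background, 50))
# [[(2, 2)], [(2, 7)]]
```

The isolated noise pixel disappears after erosion, but each 3×3 region also shrinks to its center pixel, which illustrates the loss of foreground detail mentioned above.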


Figure 5.1 Background/foreground separation. (a) The background image. (b) The image of the environment with two users. (c) The foreground image resulting from subtracting (a) from (b).

With the regions of the users extracted, we continue to find their foot points in the regions to determine the users’ locations. As described in the previous section, we assume that the users of the proposed indoor AR navigation system are standing on the ground all the time, so the axis of each user’s body is perpendicular to the ground, meaning that the axis of his/her body will go through the image center. We can therefore detect the users’ foot points using this property, and then transform them into the GCS.

We can find the users’ locations by the following algorithm using the output of Algorithm 5.1 as the input.

Algorithm 5.2 Computation of the multi-user locations.

Input: The foreground regions R1, R2, …, Rn of multiple users.

Output: The locations of the users in the GCS.

Steps

Step 1. Find the nearest point f_i to the omni-image center in R_i.

Step 2. Project f_i onto the line CC_R to obtain a projection point f_i′, where C is the omni-image center and C_R is the center of the bounding box circumscribing R_i.

Step 3. Transform f_i′ into the GCS as output.

The user location can be computed by the spatial transformation described in Section 3.4.3. An example of the results is shown in Figure 5.2.
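Steps 1 and 2 of Algorithm 5.2 can be sketched for one region as below. The transformation into the GCS in Step 3 depends on the calibration described in Section 3.4.3 and is therefore left out. The function name and the representation of a region as a list of (x, y) pixel coordinates are assumptions for illustration.

```python
def foot_point(region, center):
    """Return the foot point of one user in image coordinates.

    region: list of (x, y) pixels of a foreground region R_i.
    center: (x, y) of the omni-image center C.
    """
    cx, cy = center
    # Step 1: the region pixel nearest to the image center.
    fx, fy = min(region, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    # Center C_R of the bounding box circumscribing the region.
    xs = [p[0] for p in region]
    ys = [p[1] for p in region]
    bx, by = (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0
    # Step 2: orthogonal projection of the nearest pixel onto the line through C and C_R.
    dx, dy = bx - cx, by - cy
    norm2 = dx * dx + dy * dy
    if norm2 == 0:
        return (float(fx), float(fy))  # degenerate case: region centered on C
    t = ((fx - cx) * dx + (fy - cy) * dy) / norm2
    return (cx + t * dx, cy + t * dy)

# A region lying to the right of the image center, symmetric about the x axis.
region = [(x, y) for x in range(10, 21) for y in range(-2, 3)]
print(foot_point(region, (0, 0)))  # (10.0, 0.0)
```

The projection in Step 2 moves the nearest pixel onto the radial line through the region, so a jagged region boundary has less influence on the detected foot point.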


Figure 5.2 Detected foot points of two users (shown as red circles). (a) The original image captured from the camera. (b) The foot points in the MCS.

5.3 Detection of Users’ Viewing Orientations

5.3.1 Review of an Algorithm for User Orientation Detection

The second stage in user localization is orientation determination. We adopt the user orientation detection scheme proposed by Hsieh and Tsai [17]. Three techniques, namely, human motion estimation, magnetic-field sensing, and color edge mark detection, are proposed in this study for use in different situations.

In motion estimation, a user’s position is detected in every navigation cycle as described previously in Section 5.2, resulting in a position sequence, which can be used to compute the user’s orientation by motion estimation. However, the user’s positions detected by image analysis are not always very accurate, and so the user’s orientation computed by this way of motion analysis will not always be smooth. A
