Proposed Framework - Computer Vision Algorithm Development and Hardware Architecture Design for Blind-Assistance Applications

First, a 3D sensor is used to capture both depth images and color images.

The framework consists of two stages. The first stage is object segmentation. Edge detection is applied to the depth image to find discontinuities in depth values in real-world environments. The depth layers of the depth image are then analyzed to distinguish different objects: neighboring pixels with similar depth values are regarded as belonging to the same object. However, the resulting segmented images are noisy. To overcome this problem, noise elimination is performed using the detected edges and the pixel counts of the segmented regions.

The second stage is obstacle extraction. Properties of the detected objects are extracted to determine which of them are obstacles. Once obstacles are confirmed, they are labeled with their distance from the user. The system flow diagram is shown in Fig. 4.9.

2.2.1 3D-sensor

Time-of-Flight (TOF) cameras have recently been widely used as 3D sensors. A TOF camera provides depth information for each pixel, encoded as intensity values, in real time. Despite its physical limitations, it is reliable under general conditions.

In this paper, we use a TOF camera to generate depth images and color images. Video inputs with a resolution of 640x480 are tested in this work.

2.2.2 Object Segmentation

Figure 2.1: Segmented images. In each pair, the left frame is the complete image and the right frame is the ROI. The image pairs in the first row show the noisy segmentation results; the image pairs in the second row show the noise-reduced segmentation results.

The purpose of the first stage is to segment each depth image into several regions, each representing a different object. Discontinuities in the depth values measured by the TOF camera form the boundaries of objects in the real environment. Therefore, the first step is edge detection, which extracts the points with discontinuous depth values. Let I(x) be the depth value of pixel x. We define m(x_c, x_i) as a binary variable that indicates whether the difference between the depth values of the center pixel x_c and its neighboring pixel x_i is larger than a threshold T_V:

$$ m(x_c, x_i) = \begin{cases} 1, & |I(x_c) - I(x_i)| > T_V \\ 0, & \text{otherwise} \end{cases} $$

To find the pixels on object boundaries, we define E(x_c) as a binary value indicating whether x_c lies on an edge. Let N(x_c) be the set of neighboring pixels of the center pixel x_c. If the number of neighboring pixels x_i ∈ N(x_c) with m(x_c, x_i) = 1 is larger than a given threshold T_N, E(x_c) is set to 1. The equation is shown in Eq. 4.2.

$$ E(x_c) = \begin{cases} 1, & \sum_{x_i \in N(x_c)} m(x_c, x_i) > T_N \\ 0, & \text{otherwise} \end{cases} $$
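The definitions of m(x_c, x_i) and E(x_c) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the depth image is a list of rows of floats and that N(x_c) is the 8-connected neighbourhood, because the text does not specify the neighbourhood. The function names `depth_discontinuity` and `detect_edges` are hypothetical.

```python
from typing import List

DepthImage = List[List[float]]

# Assumption: N(x_c) is the 8-connected neighbourhood of the center pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def depth_discontinuity(center: float, neighbour: float, t_v: float) -> int:
    """m(x_c, x_i): 1 if the depth difference exceeds T_V, otherwise 0."""
    return 1 if abs(center - neighbour) > t_v else 0


def detect_edges(depth: DepthImage, t_v: float, t_n: int) -> List[List[int]]:
    """E(x_c): 1 if more than T_N neighbours x_i satisfy m(x_c, x_i) = 1."""
    rows, cols = len(depth), len(depth[0])
    edges = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            count = 0
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols:  # skip pixels outside the image
                    count += depth_discontinuity(depth[y][x], depth[ny][nx], t_v)
            edges[y][x] = 1 if count > t_n else 0
    return edges
```

Consider a 4x4 image whose left two columns have depth 1.0 and right two columns have depth 2.0, with T_V = 0.5 and T_N = 2. Only the four interior pixels beside the step are marked as edges. The pixels on the top and bottom rows beside the step see only two discontinuous neighbours, which does not exceed T_N.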

After all edges in the depth map are found, the edge pixels are eliminated so that the depth image can be segmented more accurately. The next step in object segmentation is depth layer analysis. Compared with watershed and histogram-based segmentation algorithms, Region-Growing [17] is better suited to separating objects in different depth layers and spatial locations. This algorithm spreads several seeds over the image plane and iteratively merges their neighbors. However, it is seed-dependent and fails if the initial seeds are not well placed on the objects. To overcome this drawback, we modified the approach: every pixel is initially set as a seed, denoted s_x. A seed is merged into the same union A_i as a neighboring seed in N(s_x) if the difference between their depth values is smaller than a threshold T, which generates multiple unions. Let A_{i(s_x)} denote the seed set with union index i(s_x). Because of the scan order used in the region-growing step, a seed may not be merged into the union that contains its neighbor with the nearest depth value. After one iteration, some seeds in N(s_x) may have depth values that differ from I(s_x) by less than T but still do not belong to A_{i(s_x)}. We collect every seed s_x that has such a neighbor into the set W, defined as follows.

$$ W = \{\, s_x \mid \exists\, s \in N(s_x) : [\,|I(s) - I(s_x)| < T\,] \wedge [\,i(s) \neq i(s_x)\,] \,\} \qquad (2.3) $$

In our approach, every seed in W is merged again with the neighboring region whose mean depth value is nearest:

$$ i(s_x) \leftarrow i(s'_x) $$

where the selected seed s'_x is a neighboring seed of s_x that lies in the region whose mean depth over all of its pixels is nearest to the depth of s_x. We iteratively update the union index i(s_x) until W = ∅.
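The modified region growing and the W-based refinement can be sketched as below. Several details are assumptions not stated in the text:

- the neighbourhood is 4-connected;
- the first pass is a single raster scan that adopts the label of the upper neighbour, then the left one;
- ties between equally near region means go to the smaller label;
- a `max_iters` cap is added, because in this sketch the loop "until W = ∅" is not guaranteed to terminate when two touching regions keep different means.

All function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Grid = List[List[float]]
Labels = List[List[int]]

# Assumption: N(s_x) is the 4-connected neighbourhood.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def neighbours(y: int, x: int, rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    for dy, dx in OFFSETS:
        ny, nx = y + dy, x + dx
        if 0 <= ny < rows and 0 <= nx < cols:
            yield ny, nx


def initial_pass(depth: Grid, t: float) -> Labels:
    """One raster scan: each seed joins the union of the first visited neighbour
    (up, then left) whose depth differs by less than T; otherwise it starts a new union."""
    rows, cols = len(depth), len(depth[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for y in range(rows):
        for x in range(cols):
            for ny, nx in ((y - 1, x), (y, x - 1)):
                if ny >= 0 and nx >= 0 and abs(depth[ny][nx] - depth[y][x]) < t:
                    labels[y][x] = labels[ny][nx]
                    break
            else:  # no similar visited neighbour: new union
                labels[y][x] = next_label
                next_label += 1
    return labels


def region_means(depth: Grid, labels: Labels) -> Dict[int, float]:
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for row_d, row_l in zip(depth, labels):
        for d, label in zip(row_d, row_l):
            sums[label] = sums.get(label, 0.0) + d
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def refine(depth: Grid, labels: Labels, t: float, max_iters: int = 100) -> Labels:
    """Re-merge each seed in W with the neighbouring region whose mean depth is nearest
    to the seed's depth; stop when W is empty or after max_iters iterations."""
    rows, cols = len(depth), len(depth[0])
    for _ in range(max_iters):
        means = region_means(depth, labels)
        updates: List[Tuple[int, int, int]] = []
        for y in range(rows):
            for x in range(cols):
                close = [labels[ny][nx] for ny, nx in neighbours(y, x, rows, cols)
                         if abs(depth[ny][nx] - depth[y][x]) < t]
                if all(label == labels[y][x] for label in close):
                    continue  # s_x is not in W
                # Region of s'_x: nearest mean depth, ties broken toward the smaller label.
                best = min(close, key=lambda label: (abs(means[label] - depth[y][x]), label))
                updates.append((y, x, best))
        if not updates:  # W is empty
            break
        for y, x, label in updates:  # synchronous update of the union indices
            labels[y][x] = label
    return labels
```

Take a 3x3 U-shaped object at depth 1.0 around a middle column at depth 5.0. The raster scan gives the two arms different labels (1 and 3). The refinement then relabels the right arm one pixel per iteration until both arms share label 1 and W is empty.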

The segmented image, however, is noisy because the depth image measured by the TOF camera is imperfect. Fortunately, noise usually has distinctive characteristics, such as a small number of pixels and sparse locations. Small regions with few pixels that are sparsely located near the edges usually represent noise and are eliminated to ensure correct segmentation. Fig. 2.1 shows the comparison between a noisy segmented frame and a noise-reduced frame.
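The following is a minimal sketch of the noise-elimination rule. The text does not give exact criteria, so this sketch makes its own choices:

- "near the edges" is taken to mean that the region has at least one pixel in the 8-neighbourhood of an edge pixel of E;
- removed noise is relabeled as 0;
- the size threshold `min_pixels` and both function names are hypothetical.

```python
from collections import Counter
from typing import List

Labels = List[List[int]]
EdgeMap = List[List[int]]


def near_edge(edges: EdgeMap, y: int, x: int) -> bool:
    """True if (y, x) or any of its 8 neighbours is an edge pixel (E = 1)."""
    rows, cols = len(edges), len(edges[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and edges[ny][nx]:
                return True
    return False


def remove_noise(labels: Labels, edges: EdgeMap, min_pixels: int) -> Labels:
    """Relabel as 0 every region with fewer than min_pixels pixels that lies next to an edge."""
    sizes = Counter(label for row in labels for label in row)
    rows, cols = len(labels), len(labels[0])
    touches_edge = set()
    for y in range(rows):
        for x in range(cols):
            if near_edge(edges, y, x):
                touches_edge.add(labels[y][x])
    noisy = {label for label, n in sizes.items() if n < min_pixels and label in touches_edge}
    return [[0 if label in noisy else label for label in row] for row in labels]
```

Requiring both conditions keeps small regions that are far from any edge. In the test data, a one-pixel region next to an edge is removed, while a one-pixel region away from edges survives.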

2.2.3 Obstacle Extraction

The purpose of this stage is to extract obstacles using object properties, such as the standard deviation and the mean of depth values. The term "obstacle" is defined as a moving or stationary object that appears within a certain alarm range, which is set to 1 to 2 meters from the user in this paper. The alarm range can be adjusted for different environments. Whenever an obstacle appears within the alarm range, it is captured and labeled with its estimated distance from the user.
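The alarm-range test on its own can be sketched as follows. The sketch makes three assumptions: depth values are in meters, an object's distance is its mean depth, and label 0 marks removed noise. The defaults `near=1.0` and `far=2.0` follow the 1 to 2 meter range above. The function name is hypothetical.

```python
from statistics import mean
from typing import Dict, List


def objects_in_alarm_range(depth: List[List[float]], labels: List[List[int]],
                           near: float = 1.0, far: float = 2.0) -> Dict[int, float]:
    """Return {label: mean depth} for every object whose mean depth lies in [near, far]."""
    groups: Dict[int, List[float]] = {}
    for row_d, row_l in zip(depth, labels):
        for d, label in zip(row_d, row_l):
            if label != 0:  # label 0 = removed noise (assumption of this sketch)
                groups.setdefault(label, []).append(d)
    result: Dict[int, float] = {}
    for label, values in groups.items():
        distance = mean(values)  # estimated distance of the object from the user
        if near <= distance <= far:
            result[label] = distance
    return result
```

With only this test, a floor region whose mean depth falls inside the range is also reported as an obstacle. This is the false alarm addressed below.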


Figure 2.2: Pixel distribution of depth values on the chair (a) and the floor (b).

Intuitively, obstacles can be extracted from the detected objects by calculating the median or mean of their depth values. This simple method does extract all obstacles. However, it causes a high false-alarm rate, because the floor also has a low mean depth and is detected even though it is not what we aim to capture. To solve this problem, the standard deviation of each object is computed to determine whether the object is the floor. Since the depth values of floor pixels are widely scattered, the floor's standard deviation is larger than that of other objects. Fig. 2.2 shows the pixel distributions of depth values on the chair (Fig. 2.2-a) and the floor (Fig. 2.2-b). A threshold on the standard deviation is then established to remove the floor, which reduces the false-alarm rate. An accurate road model is introduced in Section 3; this road model helps extract the road accurately and increases the obstacle detection rate.
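Adding the standard-deviation test to the mean-depth test gives the sketch below. Depth is assumed to be in meters. The text does not state the threshold, so `max_std` is a hypothetical value. Population standard deviation (`statistics.pstdev`) is one reasonable choice of estimator. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def extract_obstacles(depth: List[List[float]], labels: List[List[int]],
                      near: float = 1.0, far: float = 2.0,
                      max_std: float = 0.3) -> Dict[int, float]:
    """Keep objects whose mean depth is in [near, far] and whose depth spread is small."""
    groups: Dict[int, List[float]] = {}
    for row_d, row_l in zip(depth, labels):
        for d, label in zip(row_d, row_l):
            if label != 0:  # label 0 = removed noise (assumption of this sketch)
                groups.setdefault(label, []).append(d)
    obstacles: Dict[int, float] = {}
    for label, values in groups.items():
        mu = mean(values)
        sigma = pstdev(values)
        # Floor pixels spread over many depths, so a large sigma marks the floor.
        if near <= mu <= far and sigma <= max_std:
            obstacles[label] = mu  # label the obstacle with its estimated distance
    return obstacles
```

In the test data, a floor-like region with depths from 0.8 to 2.4 m has a mean of 1.6 m, inside the alarm range. Its standard deviation is about 0.57 m, so it is rejected. A chair-like region at about 1.5 m is kept.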
