In this chapter, we propose a system to analyze freeway scenes, which usually contain sky, trees, road, vehicles, and so on. The functional modules of the proposed system are shown in Fig. 3.1. The main capability of this system is to automatically classify natural objects in the scene by the characteristics of image features, without relying on too many subjective opinions. The details of the system are described in this chapter.
3.1. Feature Selection
Color is one of the most interesting characteristics of the natural world and can be computed in many different ways. Moreover, it is well known that the chromatic characteristics of natural elements are not stable and are highly dependent on color brilliance, reflections from the objects, illumination geometry, viewing geometry, and camera parameters. Unfortunately, up to now no single solution has been found to sufficiently characterize objects belonging to natural scenes. Given an image representing an outdoor scene, the first difficulty in describing it is to choose the most appropriate color space for object characterization. Selecting the best color space is still one of the difficulties in scene analysis [16].
3.1.1. RGB Color Space
Color is perceived by humans as a combination of the tristimuli red, green, and blue, which are usually called the three primary colors. From the RGB representation, we can derive other kinds of color spaces by using either linear or nonlinear transformations.
Fig. 3.1. The functional modules of the scene analysis system.
In the RGB model, each color appears in its primary spectral components of red, green, and blue. This model is based on a Cartesian coordinate system. The RGB color space can be geometrically represented as a three-dimensional cube, as shown in Fig. 3.2. The coordinates of each point inside the cube represent the values of the red, green, and blue components, respectively.
Fig. 3.2. RGB color space represented in a 3 dimensional cube.
RGB is the most commonly used model for television systems and for pictures acquired by digital cameras. RGB is suitable for color display, but not good for color scene analysis because of the high correlation among the R, G, and B components. By high correlation, we mean that if the intensity changes, all three components change accordingly. Also, the measurement of a color in RGB space does not represent color differences on a uniform scale; hence, it is impossible to evaluate the similarity of two colors from their distance in RGB color space.
3.1.2. HSI Color Space
The HSI color space is another commonly used color space in image processing, which is more intuitive to human vision. The HSI color space separates the color information of an image from its intensity information. Color information is represented by hue and saturation values, while intensity, which describes the brightness of an image, is determined by the amount of light. Hue represents basic colors. Saturation is a measure of the purity of the color, and signifies the amount of white light mixed with the hue.
Fig. 3.3. HSI color space represented in cylindrical coordinates.
The HSI color model can be described geometrically as in Fig. 3.3. Generally, hue is considered as an angle between a reference line and the color point in RGB color space. The range of hue values is from 0° to 360°; for example, yellow is 60°, green is 120°, blue is 240°, and magenta is 300°. The saturation component represents the radial distance from the cylinder center. The nearer the point is to the center, the lighter the color. Intensity is the height in the axis direction. The axis of the cylinder describes the gray levels; for example, minimum intensity is black and maximum intensity is white. Each slice of the cylinder perpendicular to the intensity axis is a plane with the same intensity.
The HSI color space has a good capability of representing the colors of human perception [17], because human vision can distinguish different hues easily, whereas the perception of different intensity or saturation does not imply the recognition of different colors. The HSI color space can be transformed from the RGB color space [18]. The formulations for hue, saturation, and intensity are given in [18].
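The hue, saturation, and intensity formulas themselves are not reproduced in this text. As an illustration only, the following minimal sketch uses one widely used arccos-based RGB-to-HSI formulation; we assume, without confirmation from the source, that it is close to the one intended. The function name `rgb_to_hsi` and the convention of returning hue 0 for undefined cases are hypothetical choices.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert R, G, B in [0, 1] to (hue in degrees, saturation, intensity).

    Assumed textbook formulation; not taken from the source document.
    """
    intensity = (r + g + b) / 3.0
    if intensity == 0.0:
        # Black: hue and saturation are undefined, so return 0 by convention.
        return 0.0, 0.0, 0.0
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    # The radicand is non-negative in exact arithmetic; guard against rounding.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0.0:
        # Gray pixel (R = G = B): hue is undefined, so return 0 by convention.
        return 0.0, saturation, intensity
    # Clamp the cosine to [-1, 1] before taking the arccos.
    cosine = max(-1.0, min(1.0, numerator / denominator))
    theta = math.degrees(math.acos(cosine))
    hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity
```

Under this assumed formulation, pure yellow, green, blue, and magenta map to hues of approximately 60°, 120°, 240°, and 300°, which agrees with the hue angles listed for Fig. 3.3.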
Hue reflects the predominant color of an object and has a great capability in subjective color perception. Hue is also the most useful feature in color image processing, since it is less influenced by nonuniform illumination such as shade, shadow, or reflected light.
3.1.3. Gray Level
The gray level Y is a measurement of the luminance of the color, and is a likely candidate for edge detection in a color image. The formulation of gray level Y from the RGB components is given by
Y = 0.299R + 0.587G + 0.114B    (3.5)
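As a small illustration of Eq. (3.5), the sketch below computes the gray level of one pixel. The function name `gray_level` is our own choice, and the 0 to 255 value range in the usage note is an assumption about the image encoding.

```python
def gray_level(r, g, b):
    """Gray level (luminance) Y of one pixel from its R, G, B values, Eq. (3.5)."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Because the three weights sum to 1, a white pixel (255, 255, 255) keeps the full-scale value 255 and a black pixel maps to 0, so Y stays in the same range as the input components.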
3.1.4. Spatial Information
One of the drawbacks of color space clustering is that the cluster analysis does not utilize any spatial information. Therefore, spatial information, which involves vertical position and horizontal position, is suitable for our scene analysis system. However, the horizontal position carries less distinctive information than the vertical position, because natural elements such as a house or a tree may appear anywhere from left to right in the scene but cannot appear anywhere from top to bottom. For this reason, we choose only the vertical position as our spatial information feature.
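To make the chosen features concrete, the following sketch pairs each pixel's gray level with a normalized vertical position. The normalization to [0, 1], with 0 at the top row, and the helper names `vertical_position` and `pixel_features` are assumptions made for illustration; the source does not state how the vertical position is scaled.

```python
def vertical_position(row, height):
    """Normalized vertical position: 0.0 at the top row, 1.0 at the bottom row."""
    if height < 2:
        return 0.0
    return row / (height - 1)


def pixel_features(gray_image):
    """Return a (gray level, vertical position) pair for every pixel.

    gray_image is a list of rows, each row a list of gray levels.
    The horizontal position is deliberately left out of the feature pair.
    """
    height = len(gray_image)
    return [
        [(value, vertical_position(y, height)) for value in row]
        for y, row in enumerate(gray_image)
    ]
```

For a three-row image, every pixel in the middle row receives the vertical position 0.5 regardless of its column, which reflects the decision to ignore the horizontal position.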
3.2. Establish Fuzzy Rule Base
If we want to build a system that is able to understand natural elements in complex scenes, we can generate a fuzzy rule base to describe each natural element. However, scene analysis is known to be one of the most difficult fields in computer vision research. Consequently, we propose the genetic-algorithm-based fuzzy ID3 method to solve this problem.
Consider the gray image in Fig. 3.4(a). We can see this scene with our eyes and understand it with our brain; the scene analysis system, however, cannot recognize anything initially. Therefore, we have to provide a desired recognition result manually. In Fig. 3.4(b), the desired recognition image has three colors that represent three distinct objects in Fig. 3.4(a): white represents sky, green represents tree, and gray represents road. Then, we use the genetic-algorithm-based fuzzy ID3 method to generate fuzzy rules to analyze the image. For the scene analysis system to be meaningful, we expect the fuzzy rules generated by this method to be reasonable and sufficiently accurate.
Fig. 3.4. Results of the proposed approach: (a) original image, (b) desired output image, and (c) resulting image by fuzzy rule base.
In this case, we use Fig. 3.4(a) as the input image and Fig. 3.4(b) as the desired output image, and we choose gray level and vertical position as our input features. After running the genetic-algorithm-based fuzzy ID3 method, the generated decision tree is shown in Fig. 3.5. In addition, the membership function of gray level is shown in Fig. 3.6 and the membership function of vertical position is shown in Fig. 3.7.
Fig. 3.5. The decision tree generated by the scene analysis system on Fig. 3.4(a).
Fig. 3.6. Membership function of gray level (horizontal axis: gray level; vertical axis: membership value; fuzzy sets: Dark, Medium, Light).
Fig. 3.7. Membership function of vertical position (horizontal axis: vertical position; vertical axis: membership value; fuzzy sets: High, Medium, Low).
Consequently, we can infer fuzzy rules from the decision tree shown in Fig. 3.5. These fuzzy rules are as follows.
IF Gray level is Light
THEN Sky with certainty 0.480, Road with certainty 0.520, and Tree with certainty 0.000.
IF Gray level is Dark
THEN Sky with certainty 0.017, Road with certainty 0.015, and Tree with certainty 0.968.
IF Gray level is Medium and Vertical position is High
THEN Sky with certainty 0.700, Road with certainty 0.002, and Tree with certainty 0.280.
IF Gray level is Medium and Vertical position is Low
THEN Sky with certainty 0.002, Road with certainty 0.875, and Tree with certainty 0.123.
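The four rules above can be applied to a pixel roughly as in the following sketch. The rule certainties are taken from the text, but everything else is an assumption made for illustration: the piecewise-linear membership shapes and their breakpoints (the actual shapes are those of Figs. 3.6 and 3.7), the normalization of both features to [0, 1] with vertical position 0 at the top of the image, the use of min for "and", and the aggregation of class scores as a sum of firing strength times certainty.

```python
def falling(x, a, b):
    """Left shoulder: 1 for x <= a, decreasing linearly to 0 at x >= b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def rising(x, a, b):
    """Right shoulder: 0 for x <= a, increasing linearly to 1 at x >= b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangle(x, a, b, c):
    """Triangle with feet at a and c and peak at b."""
    return min(rising(x, a, b), falling(x, b, c))


def memberships(gray, vertical):
    """Hypothetical fuzzy sets; both inputs are assumed to lie in [0, 1]."""
    return {
        ("gray", "Dark"): falling(gray, 0.0, 0.5),
        ("gray", "Medium"): triangle(gray, 0.0, 0.5, 1.0),
        ("gray", "Light"): rising(gray, 0.5, 1.0),
        # vertical = 0 is the top of the image, i.e. a "High" position.
        ("vertical", "High"): falling(vertical, 0.0, 0.5),
        ("vertical", "Low"): rising(vertical, 0.5, 1.0),
    }


# Antecedents and certainties copied from the four rules in the text.
RULES = [
    ([("gray", "Light")], {"Sky": 0.480, "Road": 0.520, "Tree": 0.000}),
    ([("gray", "Dark")], {"Sky": 0.017, "Road": 0.015, "Tree": 0.968}),
    ([("gray", "Medium"), ("vertical", "High")],
     {"Sky": 0.700, "Road": 0.002, "Tree": 0.280}),
    ([("gray", "Medium"), ("vertical", "Low")],
     {"Sky": 0.002, "Road": 0.875, "Tree": 0.123}),
]


def class_scores(gray, vertical):
    """Sum, over all rules, of firing strength times class certainty."""
    mu = memberships(gray, vertical)
    scores = {"Sky": 0.0, "Road": 0.0, "Tree": 0.0}
    for antecedents, certainties in RULES:
        strength = min(mu[term] for term in antecedents)  # "and" as min
        for label, certainty in certainties.items():
            scores[label] += strength * certainty
    return scores


def classify(gray, vertical):
    """Return the class with the largest aggregated score."""
    scores = class_scores(gray, vertical)
    return max(scores, key=scores.get)
```

With these assumed membership functions, a medium-gray pixel at the top of the image is labeled Sky, the same gray level at the bottom is labeled Road, and a very dark pixel is labeled Tree wherever it lies, since the Dark rule does not test the vertical position.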
These fuzzy rules evaluate gray level first and vertical position next, because gray level has a larger feature rank than vertical position. This means that gray level is more discriminative than vertical position in this case: it separates sky, tree, and road better than vertical position does. The training result evaluated by these four fuzzy rules is shown in Fig. 3.4(c). Comparing Fig. 3.4(c) with Fig. 3.4(b), we obtain a region pixel accuracy of 97.7%. It is evident that these four fuzzy rules assign pixels to the appropriate natural elements well. Therefore, we can say that the proposed scene analysis system is efficient and reasonable.
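The source does not define the region pixel accuracy formally. A minimal sketch, assuming it is the fraction of pixels whose label in the resulting image equals the label in the desired output image, is given below; the function name `pixel_accuracy` and the label representation are hypothetical.

```python
def pixel_accuracy(result, desired):
    """Fraction of pixels whose label matches the desired output.

    Both arguments are lists of rows of class labels with the same shape.
    """
    if len(result) != len(desired):
        raise ValueError("images must have the same number of rows")
    total = 0
    correct = 0
    for result_row, desired_row in zip(result, desired):
        if len(result_row) != len(desired_row):
            raise ValueError("rows must have the same length")
        for predicted, expected in zip(result_row, desired_row):
            total += 1
            if predicted == expected:
                correct += 1
    if total == 0:
        raise ValueError("images must not be empty")
    return correct / total
```

For example, if one pixel out of four carries the wrong label, the function returns 0.75.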
3.3. Image Ground-Truthing
Vehicles consist of many inhomogeneous components, such as glass, license plates, tires, lamps, steel plates, and so on. Fuzzy rule inference therefore encounters great difficulty in describing the vehicle class, which is not as pure as the sky, tree, and road classes. For this reason, we introduce image ground-truthing into our scene analysis system to improve the vehicle region accuracy. The ground-truthing rules useful for the proposed scene analysis system are summarized below.
1) The vehicles must run on the road.
2) Any vehicle very probably has a shadow area, and this area contains the darkest pixels in the scene.
3) All kinds of vehicles have a fixed height/width ratio.
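The source does not specify how these three rules are turned into a possible vehicle region. The sketch below is one hypothetical reading for a single vehicle: it takes the darkest pixels lying in image rows that contain road pixels as the shadow, and places a box of fixed height/width ratio on top of that shadow. The function name, the tolerance `dark_tolerance`, the ratio value 0.75, and the "row contains road" test are all assumptions, not values from the text.

```python
def candidate_vehicle_box(gray, labels, dark_tolerance=10, height_width_ratio=0.75):
    """Return an inclusive (top, left, bottom, right) box, or None.

    gray   : list of rows of gray levels
    labels : list of rows of class labels, same shape as gray
    """
    darkest = min(min(row) for row in gray)
    # Rule 1 (assumed reading): the shadow must lie in a row that contains road.
    road_rows = {y for y, row in enumerate(labels) if "road" in row}
    # Rule 2: the shadow is made of the darkest pixels in the scene.
    shadow = [
        (y, x)
        for y, row in enumerate(gray)
        for x, value in enumerate(row)
        if y in road_rows and value <= darkest + dark_tolerance
    ]
    if not shadow:
        return None
    left = min(x for _, x in shadow)
    right = max(x for _, x in shadow)
    bottom = max(y for y, _ in shadow)
    # Rule 3: the box height follows from the shadow width and a fixed ratio.
    width = right - left + 1
    height = max(1, round(width * height_width_ratio))
    top = max(0, bottom - height + 1)
    return top, left, bottom, right
```

A scene with several vehicles would first need the shadow pixels to be grouped into connected components, with one box per component; that step is omitted here to keep the sketch short.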
We describe our scene analysis system in Figs. 3.8(a)–(f). We establish the fuzzy rule base in advance and test it on the input image in Fig. 3.8(a), which is a 256×192 chromatic image. Fig. 3.8(b) is the desired output used to verify the object region analysis accuracy; it includes five colors: white denotes sky, green denotes tree, gray denotes road, yellow denotes vehicle, and red denotes the others. After fuzzy rule inference on each pixel of the image, the testing result is shown in Fig. 3.8(c). We can see that the image recognition accuracy and the vehicle recognition accuracy are not good enough. Therefore, we use the three image ground-truthing rules to find the possible vehicle region and outline it with a black box. The result is shown in Fig. 3.8(d). Within the possible vehicle region, we use the Sobel operator [19] to detect the upper contour of the vehicle and replace the wrong classes inside the detected vehicle region with the vehicle class. Then, we reassign pixels wrongly labeled as vehicle outside the vehicle region to the others class. In Fig. 3.8(e), the vehicle region has been improved by our approach, but some pixels wrongly assigned to the vehicle class remain and need to be changed. Hence, we apply the erosion method [19] to remove the pixels in the wrong class. The final experimental result is shown in Fig. 3.8(f). Comparing the final result in Fig. 3.8(f) with the result of fuzzy rule base inference in Fig. 3.8(c), we find that Fig. 3.8(f) is more similar to the desired output in Fig. 3.8(b) than Fig. 3.8(c) is. In an intelligent transportation system, vehicle recognition is the most important factor to be considered, and we can recognize the vehicle correctly in this case.
Fig. 3.8. (a) Original image, (b) desired output image, (c) resulting image by fuzzy rule base inference, (d) possible vehicle region found by image ground-truthing, (e) vehicle region refined by edge detection, and (f) final scene image obtained by image erosion.
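The two image operations used in this refinement, the Sobel operator and erosion, can be sketched as follows. This is a generic illustration rather than the implementation used for Fig. 3.8: the 3×3 Sobel kernels are the standard ones, while the |Gx| + |Gy| magnitude approximation, the zero-valued border, and the 3×3 square structuring element for erosion are assumptions.

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy|; border pixels are left at 0."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal derivative: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def erode(mask):
    """Binary erosion with a 3x3 square; pixels outside the image count as 0."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # A pixel survives only if its whole 3x3 neighborhood is set.
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1)
                                for dx in (-1, 0, 1)))
    return out
```

Applied to a binary mask of the pixels labeled as vehicle, erosion of this kind removes isolated pixels and thin fragments while shrinking larger regions by one pixel on each side, which is why it suits the removal of small wrongly labeled areas.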