
2.2. Vision-based Techniques for Night Period


In the previous section, we introduced several kinds of methods for vision-based parking space detection. However, none of them can perform the detection task both day and night. In our system, we aim to robustly detect vacant parking spaces during the night period as well. Hence, in this section, we review a few vision-based algorithms that are specifically designed for dark environments.

In a dark environment, the color information degrades. Figure 2-18 shows two images of the same parking lot, one captured during the day and the other at night. We can clearly see the lack of colors and edges in the night-time image.


Figure 2-18 (a) An image of the parking lot at daytime (b) An image of the parking lot at night

In [16], the authors propose a method to enhance images captured at night. Different from traditional enhancement approaches, the authors transform images from the RGB color space into the HSI color space and apply the SSR (Single-Scale Retinex) or MSR (Multi-Scale Retinex) algorithm only to the S (Saturation) and I (Intensity) components. This is because the authors believe that preserving the H (Hue) component can maintain most of the color information of the original image. Figure 2-19 shows a performance comparison between the original SSR method and the authors' work.


Figure 2-19 (a) Original image (b) SSR processing result (c) HSI-SSR processing result in [16]
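As a concrete illustration of this channel-selective enhancement, the sketch below applies a Single-Scale Retinex step to the saturation and intensity-like channels while keeping the hue channel untouched. Note that this is a minimal sketch, not the implementation of [16]: OpenCV offers no direct HSI conversion, so HSV is used as a stand-in, and the Gaussian scale and file names are assumptions.

```python
import cv2
import numpy as np

def ssr(channel, sigma=80):
    """Single-Scale Retinex: log(I) - log(Gaussian-blurred I)."""
    f = channel.astype(np.float32) + 1.0          # avoid log(0)
    blur = cv2.GaussianBlur(f, (0, 0), sigma)     # smooth illumination estimate
    r = np.log(f) - np.log(blur)
    return cv2.normalize(r, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

img = cv2.imread('night.jpg')                     # hypothetical input image
h, s, v = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
# Enhance only S and V (the intensity-like channel); keep H to preserve colors
enhanced = cv2.cvtColor(cv2.merge([h, ssr(s), ssr(v)]), cv2.COLOR_HSV2BGR)
cv2.imwrite('night_enhanced.jpg', enhanced)
```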

With the same concept, the authors in [17] use another method to enhance the image. At first, given a set of day-time images and night-time images, the authors decompose each image into a reflectance term and an illumination term based on the Retinex theory. From the decomposed data, they learn the ratio of illumination between day-time and night-time images. After that, they can decompose any given image into a reflectance term and an illumination term and then modify the illumination term based on the trained ratio. By combining the reflectance term with the modified illumination term, they obtain an enhanced image.

The procedure of this work is illustrated in Figure 2-20.

Figure 2-20 Flowchart in [17]
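The core of this procedure can be summarized in a few lines. The code below is only a rough sketch of the idea in [17] under simplifying assumptions: illumination is estimated with a Gaussian low-pass filter, and the trained ratio is compressed into a single global scalar, whereas the authors learn it from the decomposed training data.

```python
import cv2
import numpy as np

def decompose(img, sigma=60):
    """Retinex-style decomposition: illumination L = low-pass of the image,
    reflectance R = I / L (computed in float to avoid integer division)."""
    f = img.astype(np.float32) + 1.0
    L = cv2.GaussianBlur(f, (0, 0), sigma)
    return f / L, L                                # (reflectance, illumination)

def train_ratio(day_imgs, night_imgs):
    """Average day/night illumination ratio over paired training images."""
    return float(np.mean([decompose(d)[1].mean() / decompose(n)[1].mean()
                          for d, n in zip(day_imgs, night_imgs)]))

def enhance(night_img, ratio):
    R, L = decompose(night_img)
    out = R * (L * ratio)     # recombine reflectance with boosted illumination
    return np.clip(out, 0, 255).astype(np.uint8)
```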

On the other hand, the authors in [18] use a two-layer detector to detect vehicles and measure the traffic flow at night. At first, they identify candidate positions of vehicles by finding headlights. After that, they apply an AdaBoost cascade classifier at each candidate position to determine whether there is a vehicle at that location.

Compared to the original AdaBoost cascade classifier, this method is more efficient and effectively reduces the false positive rate. Figure 2-21 shows a comparison of detection results between the two-layer detector and the AdaBoost cascade classifier.

Figure 2-21 (a)(d) Images at night (b)(e) Results from the two-layer detector (red: candidate positions, black: detected cars) (c)(f) Results from the AdaBoost cascade detector (blue: missing cars, black: detected cars)
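A minimal sketch of this two-layer idea is given below, assuming OpenCV: bright blobs serve as headlight candidates, and a cascade classifier (loaded from a hypothetical trained model file) is evaluated only around those candidates. The intensity threshold and window sizes are illustrative assumptions, not values from [18].

```python
import cv2

cascade = cv2.CascadeClassifier('vehicle_cascade.xml')  # hypothetical model file

def detect_vehicles_at_night(gray):
    """Layer 1: headlight candidates as bright blobs; layer 2: cascade check."""
    _, bright = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)
    n, _, _, centroids = cv2.connectedComponentsWithStats(bright)
    detections = []
    for cx, cy in centroids[1:n]:                 # skip the background component
        # Evaluate the cascade only in a window around each headlight candidate
        x0, y0 = max(int(cx) - 64, 0), max(int(cy) - 48, 0)
        roi = gray[y0:y0 + 96, x0:x0 + 128]
        for (x, y, w, h) in cascade.detectMultiScale(roi, 1.1, 3):
            detections.append((x0 + x, y0 + y, w, h))
    return detections
```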

To extract foreground objects from night-time images, the authors in [19] propose a background modeling method based on spatial-temporal patches. At first, they divide the information in an image sequence into three levels: pixel level, block level, and brick level. Each block consists of a few pixels, and each brick is the accumulation of co-located blocks from different image frames. Figure 2-22 shows an example of the three-level structure of this work.

Figure 2-22 The three-level structure in [19]

After the decomposition, the dimension of a brick containing moving objects will be larger than that of a brick containing background only. Hence, the authors build models of the foreground and the background in advance by using a large number of local bricks. Finally, given an input image, they compare each local brick to these two models. If the difference between the input brick and the background model is smaller, that brick is regarded as a part of the background and is added into the background model. Some experimental results of this work are shown in Figure 2-23.


Figure 2-23 (a) Input frames (b) Result of GMM (c) Result of LBP (d) Result of [19]
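The sketch below illustrates the brick construction and the update rule described above. It is a simplification of [19]: the authors train separate foreground and background models from many local bricks, whereas here the background model is reduced to a single running-mean brick, and the distance threshold is an assumption.

```python
import numpy as np

def to_bricks(frames, block=8):
    """Stack co-located block x block patches from T frames into brick vectors."""
    T, H, W = frames.shape
    return np.array([frames[:, y:y + block, x:x + block].reshape(-1)
                     for y in range(0, H - block + 1, block)
                     for x in range(0, W - block + 1, block)], dtype=np.float32)

class BrickBackground:
    def __init__(self, bg_bricks):
        self.mean = bg_bricks.mean(axis=0)        # one mean brick as the model

    def classify_and_update(self, brick, tau=500.0, alpha=0.05):
        """Background if the brick stays close to the model; then absorb it."""
        if np.linalg.norm(brick - self.mean) < tau:
            self.mean = (1 - alpha) * self.mean + alpha * brick
            return 'background'
        return 'foreground'
```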

Chapter 3. Proposed Method

In the previous chapter, we mentioned four major factors that affect the performance of a vacant parking space detection system: the occlusion effect, the shadow effect, perspective distortion, and the fluctuation of lighting conditions. In this chapter, we will introduce the proposed system, which combines both car-driven and space-driven approaches in order to deal with these four factors and to detect vacant parking spaces day and night. We will introduce the detection algorithms of our proposed method in the second section and its training procedure in the third section. Finally, we will mention a few practical issues in the implementation of our system.

3.1. Basic Concepts

In order to build an all-day system for vacant parking space detection, there are two main issues in the design of our system. The first issue is how to perform robust detection day and night, and the second is how to combine the car-driven and space-driven approaches together.

In an all-day system, it is difficult to deal with the dramatic changes of the environment. In particular, during the night period, the variations of the lighting condition become very complicated and a lot of information, such as edges or colors, may get lost. Hence, we pay great attention to the recovery of information in the night period. Moreover, as mentioned in Chapter 2, the car-driven approach, which tends to capture the characteristics of vehicles, seems to be a good approach for handling the occlusion effect and shadows. On the contrary, the space-driven approach, which focuses on the appearance of the parking spaces themselves, is more easily affected by occlusions. Hence, we seek to combine the strengths of both approaches.

3.1.1. Improvement of Night-Period Images

In a dark environment, some information, such as colors or edges, may get lost. Figure 3-1 shows an image of a parking lot at night. We can observe the loss of colors in the image, and this makes it much more difficult to detect vehicles simply based on the color information.

Figure 3-1 An image of the parking lot at night

Intuitively, if we adjust the exposure setting, we can capture images at different exposure levels. The use of multiple images may help us recover the information that is lost in any single image. Figure 3-2 shows two images captured at two different exposure settings.


Figure 3-2 (a) An image with short exposure (b) An image with long exposure

Basically, if we extend the exposure period when capturing an image, we can get more details in the dark areas. On the contrary, if we shorten the exposure period, we get more details in the very bright areas. Hence, if we capture images at different exposure settings and find a way to combine them, we will be able to obtain an image with more details. With images of improved quality, we will be able to achieve more robust performance in vacant parking space detection.

3.1.2. Combination of Car-driven and Space-driven Methods

Because of the occlusion problem and the perspective distortion in a large-scale parking lot, traditional car-driven methods cannot be directly applied to vacant parking space detection. In Figure 3-3, we show a parking lot with serious occlusions.

Figure 3-3 A parking lot with serious occlusions

However, car-driven detectors can perform very well over images with complicated backgrounds or changes of illumination. Hence, we aim to modify car-driven detectors to make them suitable for parking lots with serious occlusions as well. In our approach, we combine the car-driven approach and the space-driven approach. The car-driven approach checks car-level data over the parking spaces, as shown in Figure 3-4(a), while the space-driven approach checks pixel-level or patch-level data over the parking spaces, as shown in Figure 3-4(b). In comparison, our approach treats the detection unit as a set of composing surfaces: the ground plane of a parking space consists of a ground surface, while a car consists of several surfaces, as illustrated in Figure 3-4(c) and (d).


Figure 3-4 (a) Car-level data (b) Patch-level data (c) Ground surfaces (d) Car surfaces

According to the regularity of occlusion patterns in the parking lot, we can detect vacant parking spaces by determining the label of each surface. Each parking space can be thought of as a cuboid consisting of six surfaces, some of which are shared by neighboring parking spaces. Figure 3-5 shows the six surfaces of a parking space and Figure 3-6 shows the various kinds of patterns that may appear at each of these surfaces.

Figure 3-5 Six surfaces of a parking space

Figure 3-6 Possible patterns at each surface of the parking space

As shown in Figure 3-6, there are fourteen kinds of surface patterns in total. Here, each surface pattern is labeled by its surface name and a pattern index, such as T_1, G_1, or S_1. Based on the observed surface patterns in the parking area, we use the Bayesian Hierarchical Framework (BHF) proposed in [15] to represent the parking block, as illustrated in Figure 3-7.

Figure 3-7 Structural diagram of the surface-based detection framework

There are three layers in the BHF framework: the observation layer, the label layer, and the scene layer. In the observation layer, we treat the whole parking lot as a set of surfaces, with some of these surfaces being shared by neighboring parking spaces. In the label layer, a node represents the label of a surface. In the scene layer, a node indicates the parking status of a parking space, such as "parked" or "vacant".

With the BHF framework, Figure 3-8 illustrates the inference flow of our detection procedure. At the observation layer, we treat the whole parking lot as a set of surfaces and compute, for each surface, the probability of every possible label. Once these label probabilities are obtained, we compute the probability of each parking hypothesis by using the information passed from the scene layer. Here, a hypothesis indicates one combination of the parking statuses of the parking spaces in the parking lot, and we compute the probability of each hypothesis individually. Finally, the hypothesis with the highest probability is chosen and indicates the deduced parking statuses.

Figure 3-8 Inference flow of the proposed algorithm
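The inference step can be summarized as a search for the maximum-probability hypothesis. The sketch below is a simplified illustration of this flow: it enumerates all parked/vacant combinations of a block and scores each one by the surface-label log-likelihoods it implies. The expected_label mapping and the omission of scene-layer priors are simplifying assumptions rather than the full BHF computation.

```python
import itertools
import numpy as np

def infer_block(spaces, surface_loglik, expected_label):
    """MAP search over all 'vacant'/'parked' combinations of a parking block.

    surface_loglik[s][lbl]: log-likelihood of surface s showing pattern lbl
                            (from the observation layer).
    expected_label(s, status): the pattern label surface s should exhibit
                               under a given status assignment.
    """
    best, best_score = None, -np.inf
    for combo in itertools.product(('vacant', 'parked'), repeat=len(spaces)):
        status = dict(zip(spaces, combo))          # one parking hypothesis
        score = sum(lik[expected_label(s, status)]
                    for s, lik in surface_loglik.items())
        if score > best_score:
            best, best_score = status, score
    return best
```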

3.2. Detection Algorithm of the Proposed System

In the previous section, we have introduced the basic concepts of the proposed vacant parking space detection system. In this section, we will introduce the detection procedure of our system. In Figure 3-9, we show the detection flowchart of the proposed method. First, we design an image-capture process that captures images with different exposure settings. After that, these multi-exposure images are fused to obtain an image of improved quality. With the fused images and some 3-D scene information, we obtain the image patch of each surface in the parking area. By extracting features from each image patch and matching them against the pre-trained data, we can estimate the probability of each pattern label. Finally, by computing the probabilities of all parking hypotheses, we can deduce the most likely parking statuses of the parking lot.

Figure 3-9 Flowchart of the proposed algorithm

3.2.1. Image Capture System

To capture multi-exposure images, we use an AXIS M1114 IP camera, which allows us to adjust the exposure value of the captured images. Figure 3-10 shows the camera settings of this camera. By using the Software Development Kit (SDK) provided by AXIS, we design an image capture system which repeatedly captures images at different exposure settings, as illustrated in Figure 3-11.

Figure 3-10 The camera settings of AXIS M1114, where the setting of the exposure value is marked by the red rectangle

Figure 3-11 Details of our image capture system

In our image capture system, we gradually increase the value of the exposure setting from a small initial value. When the exposure value reaches the pre-selected upper bound, it is reset to the initial setting again. With this repetitive process, we can get a bag of multi-exposure images at each moment and only need to wait one exposure period to get an updated bag of multi-exposure images. In our system, the longest exposure period is chosen to be around 3 seconds and we select three different exposure values: EV = 10, 50, and 90. Figure 3-12 shows three images captured with EV=10, EV=50, and EV=90, respectively.


Figure 3-12 (a) The captured image with EV=10 (b) The captured image with EV=50 (c) The captured image with EV=90
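The capture loop itself is straightforward; the sketch below shows its structure. The set_exposure function is a hypothetical stand-in for the AXIS SDK/VAPIX call, and grab_frame stands for whatever frame-fetching routine the camera provides; neither is reproduced from the real SDK.

```python
import time

EXPOSURE_VALUES = [10, 50, 90]   # the three EV settings used in our system
LONGEST_EXPOSURE = 3.0           # seconds; upper bound of one exposure period

def set_exposure(ev):
    """Hypothetical stand-in for the AXIS SDK call that updates the camera's
    exposure value; replace with the vendor API in practice."""
    raise NotImplementedError

def capture_bag(grab_frame):
    """Cycle once through the exposure settings; return one multi-exposure bag."""
    bag = []
    for ev in EXPOSURE_VALUES:
        set_exposure(ev)
        time.sleep(LONGEST_EXPOSURE)   # let the new setting take effect
        bag.append(grab_frame())       # grab_frame(): fetch one frame
    return bag
```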

3.2.2. Exposure Fusion

Once we have obtained a bag of multi-exposure images, we can use the high dynamic range (HDR) imaging technique proposed in [13] to recover the scene information. In [13], the authors measure the quality of each pixel in a set of multi-exposure images and generate the final result by combining the multi-exposure images with different weighting maps. By using contrast, saturation, and well-exposedness as quality factors, the authors can automatically compute the weighting maps. Eq. 3-1 and Eq. 3-2 are the proposed formulas, where $C_{ij,k}$, $S_{ij,k}$, and $E_{ij,k}$ indicate the contrast factor, the saturation factor, and the well-exposedness factor of pixel $(i,j)$ in the $k$-th image, respectively:

$$W_{ij,k} = \left(C_{ij,k}\right)^{\omega_C} \cdot \left(S_{ij,k}\right)^{\omega_S} \cdot \left(E_{ij,k}\right)^{\omega_E} \tag{3-1}$$

$$\hat{W}_{ij,k} = \left[\sum_{k'=1}^{N} W_{ij,k'}\right]^{-1} W_{ij,k} \tag{3-2}$$

Here, $\omega_C$, $\omega_S$, and $\omega_E$ are the corresponding weighting exponents, $W_{ij,k}$ is the weighting map of the $k$-th image, and $\hat{W}_{ij,k}$ is its version normalized over the $N$ input images. However, directly blending the input images with these weighting maps may produce undesired halos around edges. To overcome this problem, the authors use the image pyramid decomposition. Here, the input images are decomposed into Laplacian pyramids and the fusion process is performed at each level of the pyramid.

Finally, the resulting pyramid is used to reconstruct the fused image. Figure 3-13 shows the pyramid-based procedure of exposure fusion and Figure 3-14 shows a result of this method.

In Figure 3-15, we show the fusion result of the three images in Figure 3-12. It is apparent that the fused image captures more details than the original images without fusion. A comparison of the gradient magnitudes between the fused image and the original image (EV=50) is shown in Figure 3-16.

Figure 3-13 The procedure of pyramid-based exposure fusion


Figure 3-14 (a) An image sequence and its weighting maps (b) The fusion result

Figure 3-15 The fusion result of the images in Figure 3-12


Figure 3-16 Comparison of gradient magnitudes between the original image and the fused image (a) The gradient magnitude image of the original image (b) The gradient magnitude image of the fused image
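In practice, this fusion step need not be implemented from scratch: OpenCV ships an implementation of the exposure fusion method of [13]. The minimal sketch below fuses one multi-exposure bag; the file names are placeholders.

```python
import cv2
import numpy as np

# Load one multi-exposure bag (EV = 10, 50, 90); file names are placeholders
imgs = [cv2.imread(p) for p in ('ev10.jpg', 'ev50.jpg', 'ev90.jpg')]

# The constructor arguments weight the contrast, saturation, and
# well-exposedness factors of Eq. 3-1
mertens = cv2.createMergeMertens(1.0, 1.0, 1.0)
fused = mertens.process(imgs)                      # float32 output in [0, 1]
cv2.imwrite('fused.jpg', np.clip(fused * 255, 0, 255).astype(np.uint8))
```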

3.2.3. Feature Extraction

To analyze each surface of the parking area, we first extract image patches from the corresponding parking space. With the known 3-D scene information, we can locate the position and region of every surface in the real world by using the calibration matrix between the real world and the camera images. In Figure 3-17, we show the six surfaces and the corresponding image regions of a parking space in the parking lot.

Figure 3-17 The six surfaces and the corresponding image regions of a parking space

Once we get the image region of each surface, we extract features from it. However, due to the perspective distortion of the camera, surfaces of the same kind in the 3-D world may appear quite different in the captured image. An example of perspective distortion is shown in Figure 3-18.

Figure 3-18 Image regions of the side surface at four different parking spaces

To deal with perspective distortion, we use the calibration method proposed in [9]. Because surfaces of the same kind have the same width, length, and depth in the real world, we can normalize image regions at different positions of the image into image patches of the same size, as illustrated in Figure 3-19. In this figure, we transform image regions of the same surface type into normalized image patches of length "a" and width "b". In Figure 3-20, we show the normalized patches of the image regions at the four different parking spaces in Figure 3-18.

Figure 3-19 Normalization of image regions

Figure 3-20 Normalized image patches of the side surfaces at four different parking spaces
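This normalization amounts to a planar homography per surface. The sketch below illustrates it with OpenCV; the corner coordinates are assumed to come from projecting the known 3-D surface corners through the calibration matrix, and the patch size a x b is an arbitrary example.

```python
import cv2
import numpy as np

def normalize_surface(img, corners, a=64, b=128):
    """Warp the quadrilateral image region of one surface into an a-by-b patch.

    corners: the four image-plane corners of the surface (clockwise from the
    top-left), obtained by projecting its known 3-D corners through the
    camera calibration matrix."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [a - 1, 0], [a - 1, b - 1], [0, b - 1]])
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, H, (a, b))
```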

After transforming all image regions into normalized image patches, we extract features from them. Since we want our features to be robust against shadows and changes of illumination, we use the HOG feature proposed in [3]. In [3], the authors propose a robust method for pedestrian detection, whose goal is to find pedestrians in various kinds of images with complicated backgrounds. Instead of using pixel values or the gradient values between pixels, the authors measure the statistics of a larger image region that covers several local cells. Figure 3-21 shows an image patch that consists of many cells.

Figure 3-21 An image patch consisting of many cells

Over the larger image regions, the authors in [3] compute the histogram of oriented gradients at each cell by counting the distribution of gradient magnitudes over a few gradient orientations. However, when an object appears in an image patch, its appearance may be affected by shadows or changes of illumination. To deal with these interferences, the authors group several cells into a block and perform normalization over each block. Figure 3-22 shows the feature extraction process of the histogram of oriented gradients (HOG) feature, and Figure 3-23 shows an image and its HOG features.

Figure 3-22 Extraction of HOG features


Figure 3-23 (a) Original image (b) HOG features

In the HOG feature, the histogram information of all cells is merged together to form a high-dimensional vector. In our case, however, we make a minor modification to this HOG feature. The cars parked in the parking lot differ in their brands, types, and parked positions within the parking space. To allow such fluctuations, we further group a few cells together to form super-cells. Figure 3-24 shows an example in which we divide a region of 4×8 cells into 2×2 super-cells by adding together the distribution data within each super-cell. After that, we construct the HOG feature based on the distribution data of these four super-cells.

Figure 3-24 (a) The definition of a super-cell (b) An example of merging a region of 4×8 cells into 2×2 super-cells
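A minimal sketch of this super-cell pooling is shown below, using skimage to obtain per-cell histograms. Requesting 1×1 blocks makes skimage normalize each cell on its own, which differs slightly from the block normalization of [3]; the cell size and the 2×4-cell super-cell grouping are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog

def supercell_hog(patch, sy=2, sx=4, bins=9):
    """HOG feature with super-cell pooling for a grayscale 2-D patch.

    With sy=2, sx=4, a 4x8 grid of cells is pooled into a 2x2 grid of
    super-cells, matching the example of Figure 3-24."""
    h = hog(patch, orientations=bins, pixels_per_cell=(8, 8),
            cells_per_block=(1, 1), feature_vector=False)
    cells = h.reshape(h.shape[0], h.shape[1], bins)    # (ny, nx, bins)
    ny, nx, _ = cells.shape
    # Sum the histograms of every sy-by-sx group of cells into one super-cell
    pooled = cells[:ny - ny % sy, :nx - nx % sx].reshape(
        ny // sy, sy, nx // sx, sx, bins).sum(axis=(1, 3))
    return pooled.reshape(-1)                          # final feature vector
```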

3.2.4. Distributions of Surface Patterns

As mentioned before, we have defined fourteen different kinds of surface patterns. For example, for the top surface, there are two possible patterns: the "with a car" pattern and the "without a car" pattern. For the side surface, there are four patterns according to the parking status of that parking space and the parking status of the adjacent parking space. Similarly, there are four possible patterns for the ground surface. On the other hand, the rear surface of a parking space is actually the front surface of the parking space right behind it. Hence, there are four possible surface patterns for either the front surface or the rear surface. Figure 3-25 shows the possible patterns of each surface at a parking space.

Figure 3-25 Possible patterns of each surface at a parking space

In our approach, given a set of training data, we use the features extracted from the normalized image patches to train a probabilistic distribution for each possible surface pattern. With the trained distributions, we can then estimate, for any given patch, the likelihood of its being a certain kind of surface pattern.
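The text does not fix the distribution family at this point, so the sketch below assumes, purely for illustration, a Gaussian mixture model per surface pattern trained with scikit-learn; the number of mixture components is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_pattern_models(training_features, n_components=2):
    """training_features: dict mapping a pattern label (e.g. 'T_1', 'G_1',
    'S_1') to an (N, D) array of features from normalized patches."""
    models = {}
    for label, feats in training_features.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='diag')
        models[label] = gmm.fit(feats)
    return models

def label_log_likelihoods(models, feature):
    # Log-likelihood of one patch feature under every trained pattern model
    return {lbl: float(m.score_samples(feature[None, :])[0])
            for lbl, m in models.items()}
```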

3.2.5. Inference of Parking Statuses

Because of the geometric structure of our parking lot, we divide the parking lot into three major blocks and perform the labeling of each block independently. Figure 3-26 shows the three blocks of our parking lot.

Figure 3-26 Three blocks in our parking lot

Within each parking block, we build a Bayesian Hierarchical Framework (BHF) to infer the parking statuses of the spaces in that block.
