
A Hierarchical Bayesian Generation Framework

for Vacant Parking Space Detection

Ching-Chun Huang and Sheng-Jyh Wang

Abstract—In this paper, from the viewpoint of scene understanding, a three-layer Bayesian hierarchical framework (BHF) is proposed for robust vacant parking space detection. In practice, the challenges of vacant parking space inference come from dramatic luminance variations, shadow effect, perspective distortion, and the inter-occlusion among vehicles. By using a hidden labeling layer between an observation layer and a scene layer, the BHF provides a systematic generative structure to model these variations. In the proposed BHF, the problem of luminance variations is treated as a color classification problem and is tackled via a classification process from the observation layer to the labeling layer, while the occlusion pattern, perspective distortion, and shadow effect are well modeled by the relationships between the scene layer and the labeling layer. With the BHF scheme, the detection of vacant parking spaces and the labeling of scene status are regarded as a unified Bayesian optimization problem subject to a shadow generation model, an occlusion generation model, and an object classification model. The system accuracy was evaluated by using outdoor parking lot videos captured from morning to evening. Experimental results showed that the proposed framework can systematically determine the number of vacant spaces, efficiently label ground and car regions, precisely locate the shadowed regions, and effectively tackle the problem of luminance variations.

Index Terms—Bayesian inference, image labeling, parking space detection, semantic detection.

I. Introduction

Using an intelligent surveillance system to manage parking lots is becoming practical nowadays. A recent technology review of smart parking systems can be found in [1]. To assist users in efficiently finding a vacant parking space, an intelligent parking space management system can not only provide the total number of vacant spaces in the parking lot but also explicitly identify the locations of vacant parking spaces. Among smart parking systems, vision-based systems have gathered great attention in recent years. Unlike systems that use trip sensors or other types of sensors to monitor a parking lot, a vision-based system may provide many value-added services, like parking space guidance and video surveillance.

Manuscript received January 12, 2010; revised April 18, 2010 and June 15, 2010; accepted July 18, 2010. Date of publication October 14, 2010; date of current version January 22, 2011. This work was supported in part by the Ministry of Economic Affairs, under Grant 98-EC-17-A-02-S1-032, and in part by the National Science Council of Taiwan, under Grant 97-2221-E-009-132. This paper was recommended by Associate Editor C. N. Taylor.

The authors are with the Department of Electronics Engineering, Institute of Electronics, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: chingchun.huang3@gmail.com; shengjyh@faculty.nctu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2010.2087510

In practice, the major challenges of vision-based parking space detection come from the occlusion effect, the shadow effect, perspective distortion, and the fluctuation of lighting conditions. In Fig. 1, we show several parking lot images in our dataset. In these images, some environmental factors are mixed together in a sophisticated way. For instance, the illumination on a sunny day is quite different from that on a cloudy day, a parked car may occlude or cast a shadow over the parking space next to it, a shadowed region may be mistakenly recognized as a dark-colored vehicle, and a light-colored vehicle under strong sunlight may look very similar to a vacant parking space.

Up to now, many methods have been proposed to overcome the aforementioned difficulties. These methods can be roughly classified into two categories: car-driven and space-driven. For a car-driven method, cars are the major target and algorithms are developed to detect cars. Based on the result of car detection, vacant parking spaces are determined. To detect objects of interest, numerous object detection algorithms can be used. For example, the object detection method proposed in [2] by Schneiderman and Kanade is a trainable detector based on the statistics of localized parts. The AdaBoost-based detection algorithm [3] is another widely used technique for the detection of specific objects in 2-D images. The method proposed by Felzenszwalb et al. [4] offered an efficient way to match objects based on a part-based model that represents an object well by pictorial structures. A global color-based model was proposed by Tsai et al. [5] to efficiently detect vehicle candidates. In that approach, a Bayesian classifier based on corner features, edge features, and wavelet coefficients was trained to verify the detection of vehicles. In contrast, Lee et al. [6] and Masaki [7] kept tracking and recording the movement of vehicles to identify empty parking spaces. Even though these object detection-based frameworks have achieved impressive results in many circumstances, such as highways and roadways, most of these algorithms are not specifically designed for vacant parking space detection in a typical parking lot. For example, as shown in Fig. 1, the captured images may include some cars with unclear details. Besides, due to the perspective distortion, a car far away from the camera only occupies a small area in the captured image. This fact may also affect the performance of car detection.

Fig. 1. Image shots of a parking lot. (a) Captured on a normal day. (b) Captured on a day with strong sunlight. (c) Captured on a cloudy day.

For a space-driven method, the property of a vacant parking space is the major focus and available parking spaces are detected directly. When the camera is static, several background subtraction algorithms, such as [8]–[10], can be used to detect foreground objects. Typically, these algorithms assume that the variation of the background is statistically stationary within a short period. Unfortunately, this assumption is not always true for an outdoor scene. For example, a passing cloud that blocks the sunlight may suddenly change the brightness. To handle the dynamic variation of an outdoor environment, a possible solution is to build a complete background reference set under all kinds of lighting conditions. However, a lot of memory and heavy computational cost would be needed to support this approach. To solve this problem, Funck et al. [11] proposed an eigen-space representation that models a huge set of background models with much less memory space and computational cost.

With a suitable background model, a typical way to determine the status of a parking space is to check the ratio of the foreground pixel number to the background pixel number. If the ratio is larger than a predefined threshold, that parking space is considered occupied. However, even if the background model is well learned, this kind of method still suffers from the occlusions and shadows caused by neighboring cars. To improve the detection performance, Huang et al. [12] proposed a Bayesian detection framework that takes into account both a ground plane model and a car model. Both the occlusion effect and illumination variation were modeled under that framework. Recently, Bong et al. [13], [14] proposed a car park occupancy information system that uses a "bi-stream" detector to overcome the shadow effect. In their approach, one stream used the background subtraction method to perform car detection, while the other stream adopted edge information to achieve shadow-insensitive detection. By using an "And" operator to combine both detection results, the detection performance was improved.

Some other space-driven methods assume that a vacant parking space possesses a homogeneous appearance and use this property to detect vacant spaces. For example, Yamada and Mizuno [15] designed a homogeneity measure by calculating the area of fragmental segments. In principle, a vacant space has fewer but larger segments, while the area of a parked car has the opposite property. Lee et al. [16] suggested an entropy-based metric to determine the status of each parking space. Once the entropy inside a space region is larger than a threshold, that space is considered occupied. However, these two systems ignored the shadow and occlusion caused by adjacent cars. In [17], Fabian used a segment-based homogeneity measure similar to that in [15] and proposed a method for occlusion handling. By pre-training a weighting map to indicate the image regions that may get occupied by neighboring cars, the influence of the occlusion effect can be reduced. However, due to the perspective distortion, a distant parking space only occupies a small region in the captured image. This leads to unstable measurement of homogeneous areas. In order to overcome the perspective problem, López-Sastre et al. [18] suggested rectifying the perspective effect by transforming the original parking lot image into a top-view image. A Gabor filter bank was used to derive the homogeneity feature for vacant space detection. Even though their homogeneity measure is effective for most parking spaces, the environmental variations, especially the shadow effect and the over-exposure caused by strong sunlight, may invalidate the assumption of homogeneous appearance. In practice, the shadow effect makes a parking space less homogeneous, while the over-exposure effect makes the appearance of a car more homogeneous.

Some other authors tried to detect vacant parking spaces via classification. For example, Dan [19] trained a general support vector machine (SVM) classifier by directly using the cascaded color vectors inside a parking space as the classification feature. However, the occlusion patterns were not well modeled in that approach. In contrast, Wu et al. [20] grouped three neighboring spaces as a unit and defined the color histogram across the three spaces as the feature in their SVM classifier. With this arrangement, the inter-space correlation can be learned beforehand to overcome the inter-occlusion problem. However, the performance of classification is greatly affected by environmental variations. In general, lighting changes may cause variations of object appearance in both brightness and chromaticity. This effect may dramatically degrade the accuracy of classification-based detection.

Besides the aforementioned smart parking lot management (SPLM) systems, which detect vacant parking spaces based on static surveillance cameras installed around the parking lot, the automobile parking (AP) system is another approach, which detects vacant parking spaces based on vehicle-embedded cameras. These in-car AP systems help drivers identify available parking spaces while they are driving. For example, Suhr et al. [21] proposed an optical flow-based method to estimate the 3-D scene of the rear view of a moving car. The reconstructed Euclidean 3-D scene is further analyzed to detect vacant parking spaces. In [22], side-view images are captured for analysis as a car moves along a row of parking spaces. By classifying each captured image frame as either a "vehicle" frame or a "background" frame, they can identify vacant parking spaces. Even though the AP approach provides an interesting way of detecting vacant parking spaces, the focus of our approach is to improve the performance of an SPLM system, which can easily fit into today's parking lot management systems.

In this paper, we propose a hierarchical Bayesian generation framework to model the generation of environmental variations, including the perspective distortion, the occlusion effect, the shadow effect, and the fluctuation of lighting conditions. As will be shown in Section VI, accurate results can be obtained based on the proposed framework.

The rest of this paper is organized as follows. In Section II, we present the main idea of our algorithm. The top-down information from the 3-D scene model is detailed in Section III, while the message from image observation is presented in Section IV. The whole inference procedure is explained in Section V. Experimental results and discussions are presented in Section VI. Finally, Section VII concludes this paper.

II. Algorithm Overview

In our system, scene understanding and vacant parking space detection are accomplished based on the integration of scene prior and image observation. By treating the status of each parking space as a part of the scene parameters, vacant space detection is achieved via the process of scene inference. The general concept of the proposed system is illustrated in Fig. 2. Based on a three-layer Bayesian hierarchical framework, called BHF, the bottom-up messages from image observation and the top-down knowledge from the scene model are effectively integrated. In the BHF, the illumination variations are overcome by transferring the fluctuating red, green, and blue (RGB) observations into meaningful labels. The labeling process is treated as a color classification process between content labeling and image observation. Since the observation difference is mainly caused by the object type and the lighting condition, we decompose the image observation into an object component and a lighting component. The object type is either "car" or "ground," while the lighting condition is either "shadowed" or "unshadowed." To adapt to the time-varying lighting condition, we build the color classification models for object type and lighting condition online. Meanwhile, global knowledge of the 3-D scene offers useful information for the labeling of image pixels. The top-down knowledge is propagated downward to influence the labeling process via the generation of an "expected object map" and an "expected shadow map." Here, we explicitly define a generative model that takes into account the inter-occlusion effect, the expected shadow effect, and the perspective distortion. The relationships among these effects and the status of parking spaces are explicitly modeled via a Bayesian probabilistic model. By compromising between the expected labeling maps and the labeling from image observation, the status hypotheses of each parking space are evaluated.
Finally, to avoid incorrect inference caused by unexpected occlusions, the global status hypotheses from the scene model provide useful constraints to handle partially inconsistent labels. Under the proposed BHF, the vacant parking space detection problem and the optimal image content labeling problem are integrated in a unified manner.

Fig. 2. Concept of the proposed vacant space detection process.

In principle, we can formulate the vacant space detection problem as a status decision process based on image observations from a single camera. Since the status of a parking space may actually affect the inference of neighboring spaces, it is unsuitable to decide the status of each parking space individually. Instead, we analyze the statuses of neighboring parking spaces at the same time. Moreover, both the bottom-up message and the top-down knowledge are modeled as probabilistic constraints in the proposed BHF. The vacant parking space detection process is regarded as a Bayesian inference problem and is solved by finding the most reasonable parking space status that fits both the scene prior and the image observation.

In Fig. 3, we show a simplified three-layer structure to explain the BHF framework. For vacant space detection, we define the image observation layer as IL, where each node IL(m, n) indicates the RGB color feature at the (m, n) pixel of an image of size M × N. We define the labeling layer as HL, where each node HL(m, n) represents the categorization of the image pixel at (m, n). The labeling result of HL(m, n) can be (C, S), (G, S), (C, US), or (G, US), where C denotes "car," G denotes "ground," S denotes "shadowed," and US denotes "unshadowed." Moreover, we define the scene layer as SL, which indicates the status hypotheses of the parking spaces.

Fig. 3. Illustration of the three-layer BHF.

The node SL(i) in SL denotes the status of the ith parking space. Its value can be either 1 (occupied) or 0 (vacant). Note that Fig. 3 is for illustration purposes only. The exact model of the BHF is explained later in Section IV.

In this model, the topology of the inter-layer connections represents the probabilistic constraints between nodes. Given the observation IL, the status of the parking spaces is determined by finding the pair (HL*, SL*) such that

(HL*, SL*) = arg max_{HL, SL} p(HL, SL | IL). (1)

Furthermore, (1) can be reformulated as follows:

(HL*, SL*) = arg max_{HL, SL} ln p(HL, SL | IL)
           = arg max_{HL, SL} ln [p(IL | HL, SL) p(HL | SL) p(SL)]
           = arg max_{HL, SL} ln [p(IL | HL) p(HL | SL) p(SL)]
           = arg max_{HL, SL} [ln p(IL | HL) + ln p(HL | SL) + ln p(SL)]. (2)

In (2), we assume p(IL | HL, SL) = p(IL | HL). That is, we assume the probabilistic property of the observed image data is conditionally independent of the scene model once the pixel labels are determined. Moreover, p(IL | HL) stands for the constraints between the labeling layer and the observation layer. In our approach, the labeling results should be consistent with the RGB values of the observed image, and the labels of adjacent pixels should follow some kind of smoothness constraint. Likewise, p(HL | SL) stands for the constraints between the scene layer and the labeling layer. In our approach, the labeling of parked cars and shadowed regions should match the expected inter-occlusion pattern and the expected shadow pattern in a probabilistic sense. Finally, p(SL) represents the prior knowledge of the parking space status. In our system, we assume that the "occupied" status and the "vacant" status are equally probable for every parking space. With this assumption, the ln p(SL) term in (2) can be ignored. To find the optimal solution of (2), we adopt the graph-cuts technique [23]–[25].
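The decomposition in (2) can be made concrete with a toy example. The sketch below enumerates every labeling and status hypothesis for a 3-pixel, 2-space instance and picks the pair maximizing the log-posterior; the paper instead uses graph cuts [23]–[25] for tractability, and all numbers and the pixel-to-space geometry here are illustrative assumptions, not values from the paper.

```python
import itertools
import math

# Toy instance: 2 parking spaces, 3 pixels, labels 'C' (car) or 'G' (ground).
# With a uniform p(SL), the MAP problem in (2) reduces to maximizing
# ln p(IL|HL) + ln p(HL|SL).
PIXELS, SPACES = 3, 2

# Observation term: per-pixel log-likelihood of each label.
# (Stand-in numbers; in the paper these come from the online color models.)
log_p_obs = [
    {'C': math.log(0.9), 'G': math.log(0.1)},  # pixel 0 looks like a car
    {'C': math.log(0.8), 'G': math.log(0.2)},  # pixel 1 looks like a car
    {'C': math.log(0.2), 'G': math.log(0.8)},  # pixel 2 looks like ground
]

def p_car_given_status(pix, status):
    # Scene term, playing the role of p(hO(m,n) = 1 | SL): pixels 0-1
    # belong to space 0 and pixel 2 to space 1 (illustrative geometry).
    space = 0 if pix < 2 else 1
    return 0.9 if status[space] == 1 else 0.1

def log_posterior(labels, status):
    lp = 0.0
    for pix, lab in enumerate(labels):
        lp += log_p_obs[pix][lab]                       # ln p(IL|HL)
        pc = p_car_given_status(pix, status)
        lp += math.log(pc if lab == 'C' else 1.0 - pc)  # ln p(HL|SL)
    return lp

# Exhaustive search over every (HL, SL) pair, feasible only at toy scale.
best = max(
    ((labels, status)
     for labels in itertools.product('CG', repeat=PIXELS)
     for status in itertools.product((0, 1), repeat=SPACES)),
    key=lambda hs: log_posterior(*hs),
)
print(best)  # (('C', 'C', 'G'), (1, 0)): space 0 occupied, space 1 vacant
```

The joint search illustrates why the labeling and the space statuses must be inferred together: the best labeling depends on which status hypothesis explains it.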

III. Top-down Knowledge from Scene Layer

Since the parking spaces in a parking lot are well structured, we can synthesize an expected object map once we have the 3-D car model and a hypothesis about the status of the parking spaces. Similarly, if we know the lighting condition (sunny or cloudy) and the direction of sunlight, we can also synthesize an expected shadow pattern. In our system, both the expected object map and the expected shadow map are created to help the labeling of image pixels.

In our approach, p(HL | SL) is reformulated as follows:

p(HL | SL) = ∏_m ∏_n p(HL(m, n) | SL) (3)

in which we assume the labeling nodes HL(m, n) are conditionally independent of each other once the knowledge from the scene layer SL is given. Since the object type and the lighting type are physically independent, we formulate p(HL(m, n) | SL) as follows:

p(HL(m, n) | SL) = p(hO(m, n) | SL) p(hL(m, n) | SL). (4)

Physically, the object labeling model p(hO(m, n) | SL) includes the expected car mask and the inter-occlusion effect among neighboring cars, while the light labeling model p(hL(m, n) | SL) includes the expected shadow mask that indicates shadowed pixels. To define these two labeling models, we first introduce a parametric model to define the 3-D structure of a parking lot. Based on the parametric scene model, we propose a generation process to generate the expected object labeling map and the expected shadow labeling map.

A. 3-D Scene Model

In our system, the number of parking spaces (Ns) and their locations on the 3-D ground plane are defined and learned in advance. In a normal situation, a car is parked inside a parking space. To simulate a parked car, we assume each car is a cube in the 3-D world. The length (l), width (w), and height (h) of the cube are modeled as three independent Gaussian random variables, with the probability density functions p(l), p(w), and p(h). Besides, the random vector (l, w, h)T is assumed to be identically and independently distributed at different parking spaces. Here, the probability density functions p(l), p(w), and p(h) are pre-learned based on 120 parked cars. The 3-D ground plane of the parking lot is defined as a 2-D plane (X, Y, 0). Inside the ith parking space, we assume the projection of the car center on the ground plane is represented by (Xi, Yi, 0), where Xi and Yi are modeled as two independent Gaussian random variables with the probability density functions p(Xi) and p(Yi). The mean values of p(Xi) and p(Yi) are set to be the center of the ith parking space on the ground plane. Moreover, we assume the location pattern of parked cars in different parking spaces is similar. That is, we assume the variances of p(Xi) and p(Yi) are independent of i. To train the variance values of p(Xi) and p(Yi), we measured, for each of these 120 cars, the deviation of the car center from the center of the parking space.

To predict the shadowed regions, we model the lighting condition in the 3-D scene. In general, we may assume there are two major types of illumination in an outdoor environment: direct illumination from the sun and ambient illumination from the sky. Each image pixel may be lighted by the skylight only, or by both skylight and sunlight. Basically, shadow reflects the contrast in brightness between regions illuminated by different types of lighting. If sunlight exists in the environment, the regions lighted by skylight only appear to be shadowed. In contrast, when sunlight is absent, we assume there is no shadowed region.

Fig. 4. (a) 3-D car model. (b) Expected car labeling map.

Moreover, when sunlight is present, we assume the direction of sunlight is represented by a 3-D vector [DX(t), DY(t), DZ(t)]T, which is a function of time t. In our approach, the 3-D scene model of a parking lot is determined by the parameter set Θ, where Θ = {DX(t), DY(t), DZ(t), {SL(i), li, wi, hi, Xi, Yi, for i = 1, 2, ..., Ns}}. In Θ, {SL(i)} is the main unknown variable. The detailed deduction of the sunlight direction [DX(t), DY(t), DZ(t)]T is explained later in Section III-C and Appendix I.

B. Generation of Expected Labeling Maps

1) Object Labeling Model: In our system, once the 3-D scene parameters Θ are given, the expected object labeling and the expected shadow labeling on the captured images are automatically generated. Based on the projection matrix of the camera, a synthesized car parked at (Xi, Yi, 0) inside the ith parking space, with length li, width wi, and height hi, is projected onto the camera view to get the projection image Mi(m, n | Xi, Yi, li, wi, hi), which has the value 1 if the pixel (m, n) is within the projected region and 0 otherwise. Since the size parameters (li, wi, hi) and the parked location (Xi, Yi) may vary from car to car, we take into account the prior probabilities p(li), p(wi), p(hi), p(Xi), and p(Yi) and define the expected car labeling map to be a probabilistic map Ci(m, n), which is the expectation value of Mi(m, n | Xi, Yi, li, wi, hi), that is

Ci(m, n) = E_{Xi, Yi, li, wi, hi}[Mi(m, n | Xi, Yi, li, wi, hi)]. (5)

Since the object type of an image pixel is either "car" or "ground," the expected ground labeling map is deduced to be

Gi(m, n) = 1 − E_{Xi, Yi, li, wi, hi}[Mi(m, n | Xi, Yi, li, wi, hi)]. (6)

In our system, we numerically calculate the expectations in (5) and (6) based on the Monte Carlo approach. Based on the prior probabilities p(li), p(wi), p(hi), p(Xi), and p(Yi), we draw a large set of sample tuples. For each sample tuple, say (lk, wk, hk, Xk, Yk), we synthesize a projection image. By averaging the projection images over all sample tuples, we get a probability map that approximates Ci(m, n). In Fig. 4(b), we show the expected car labeling map of the car in Fig. 4(a).

Fig. 5. (a) Expected car labeling. (b) Expected ground labeling.

When taking all parking spaces into consideration, an image pixel at (m, n) in the ith parking space may get occluded not only by a car parked at that parking space but also by a car parked at an adjacent parking space. To model the inter-occlusion effect in the object labeling model, we define the probability

p(hO(m, n) = 0 | SL) = ∏_{i=1}^{Ns} [Gi(m, n)^SL(i)] (7)

where SL(i) is the status of the ith parking space. With (7), the probability of car labeling at (m, n) given the status of all parking spaces can be formulated as follows:

p(hO(m, n) = 1 | SL) = 1 − ∏_{i=1}^{Ns} [Gi(m, n)^SL(i)]. (8)

In Fig. 5(a) and (b), we show examples of p(hO(m, n) = 1 | SL) and p(hO(m, n) = 0 | SL), respectively.
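The Monte Carlo estimate of (5)–(8) can be sketched as follows. Since the true projection of a 3-D cube through the camera matrix is not reproduced here, the sketch substitutes a simple axis-aligned rectangular footprint as a stand-in for Mi; the image size, Gaussian parameters, and space centers are all illustrative assumptions, not values from the paper.

```python
import random

random.seed(0)
W = H = 16  # toy image size

def footprint_mask(cx, cy, w, l):
    # Stand-in for the projected car image Mi: 1 inside an axis-aligned
    # rectangle around (cx, cy), 0 outside. The real Mi comes from
    # projecting a 3-D cube through the camera projection matrix.
    return [[1 if abs(x - cx) <= w / 2 and abs(y - cy) <= l / 2 else 0
             for x in range(W)] for y in range(H)]

def expected_car_map(mu_x, mu_y, n_samples=1000):
    # Monte Carlo estimate of Ci(m, n) in (5): average the projected mask
    # over samples drawn from the Gaussian size and position priors.
    acc = [[0.0] * W for _ in range(H)]
    for _ in range(n_samples):
        m = footprint_mask(random.gauss(mu_x, 0.8),  # p(Xi)
                           random.gauss(mu_y, 0.8),  # p(Yi)
                           random.gauss(4.0, 0.5),   # p(w)
                           random.gauss(7.0, 0.7))   # p(l)
        for y in range(H):
            for x in range(W):
                acc[y][x] += m[y][x]
    return [[v / n_samples for v in row] for row in acc]

C1 = expected_car_map(4, 8)    # space 1 centered at (4, 8)
C2 = expected_car_map(11, 8)   # space 2 centered at (11, 8)
G1 = [[1 - v for v in row] for row in C1]   # ground maps, as in (6)
G2 = [[1 - v for v in row] for row in C2]

# Inter-occlusion model (7)-(8): with both spaces occupied, SL = (1, 1),
# p(hO = 1 | SL) = 1 - G1^SL(1) * G2^SL(2) at every pixel.
status = (1, 1)
p_car = [[1 - (G1[y][x] ** status[0]) * (G2[y][x] ** status[1])
          for x in range(W)] for y in range(H)]
```

Averaging the sampled masks directly approximates the expectation in (5), and the per-pixel product over spaces reproduces the occlusion composition in (7) and (8).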

2) Shadow Labeling Model: Similarly, by using a cube model for a parked car, the expected shadowed regions on the ground plane can be quickly determined in the 3-D space whenever the sunlight direction is known and the statuses of the parking spaces are determined. An example is illustrated in Fig. 6. Here, we define Ti(m, n | Xi, Yi, li, wi, hi) to be the projected shadow labeling image generated by a car parked at (Xi, Yi, 0) inside the ith parking space, with length li, width wi, and height hi. By taking into account the prior probabilities p(li), p(wi), p(hi), p(Xi), and p(Yi), we define the expected shadow labeling map Si(m, n) in a probabilistic manner as follows:

Si(m, n) = E_{Xi, Yi, li, wi, hi}[Ti(m, n | Xi, Yi, li, wi, hi)]. (9)

Similarly to (6), the expected non-shadow labeling map is defined as USi(m, n) = 1 − Si(m, n). In Fig. 6(b), we show the expected shadow labeling map of the car in Fig. 6(a).

Fig. 6. (a) Shadow formation. (b) Expected shadow labeling map.

To model the shadow labeling model p(hL(m, n) | SL) with the consideration of all parking spaces, we define

p(hL(m, n) = 0 | SL) = ∏_{i=1}^{Ns} [USi(m, n)^SL(i)] (10)

where SL(i) is the status of the ith parking space. With (10), the probability of shadow labeling at (m, n) given SL can be modeled by

p(hL(m, n) = 1 | SL) = 1 − ∏_{i=1}^{Ns} [USi(m, n)^SL(i)]. (11)

Fig. 7. (a) Illustration of shadow formation. (b) Expected shadow labeling map. (c) Expected car labeling map. (d) Refined shadow labeling map.

In Fig. 7(a) and (b), we show an example of the 3-D parking lot model and its expected shadow labeling map. To simplify the problem, we ignore the shadows cast upon the parked cars and only consider the shadows cast on the ground plane. With this assumption, a pixel with a higher probability of car labeling is less likely to be shadowed. Hence, we refine the probabilistic shadow labeling map to be

p(hL(m, n) = 1 | SL) = (1 − p(hO(m, n) = 1 | SL)) × (1 − ∏_{i=1}^{Ns} [USi(m, n)^SL(i)]). (12)

A refined shadow labeling map is shown in Fig. 7(d).
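At a single pixel, (10)–(12) reduce to a few products. The helper below evaluates them for one hypothetical pixel; the function name and all numeric values are illustrative assumptions, and the paper applies the same computation at every (m, n).

```python
def refined_shadow_prob(p_car, us_maps, status):
    # Per-pixel shadow probability following (10)-(12); scalar "pixel"
    # version of the map-level computation in the paper.
    prod_us = 1.0
    for us, s in zip(us_maps, status):
        prod_us *= us ** s              # (10): p(hL = 0 | SL)
    p_shadow = 1.0 - prod_us            # (11): p(hL = 1 | SL)
    return (1.0 - p_car) * p_shadow     # (12): suppress shadow on car pixels

# Hypothetical pixel: unlikely to be a car (p_car = 0.1); with both spaces
# occupied, space 1's shadow likely covers it (US1 = 0.2) while space 2's
# does not (US2 = 0.95).
p = refined_shadow_prob(0.1, [0.2, 0.95], (1, 1))
print(round(p, 3))  # 0.729
```

Note that when every status bit is 0, the product in (10) is 1 and the shadow probability vanishes, matching the assumption that an empty lot casts no car shadows.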

C. Sunlight Direction

To generate the expected shadow labeling map, we need the direction of sunlight. The sunlight parameters are available on the Internet, for example, from the U.S. Naval Observatory website [26]. Given the date and the geo-location of the parking lot, including the longitude and latitude from a global positioning system, the web service can provide samples of the sunlight direction for every 10 min.

In our system, we adopt the concept proposed in [27] to calculate the sunlight direction. In principle, the solar motion model and the sunlight direction can be estimated based on the variations of intensity values in a day.

Fig. 8. Illustration of solar movement and sunlight direction.

In a single day, the solar motion follows a circle on the solar plane in the 3-D space, with a constant angular frequency ωs, as illustrated in Fig. 8. The angular frequency depends mainly on the self-rotation of the Earth and is known in advance. The whole set of sunlight directions in a day forms a conical surface, and the cone aperture is equal to π − 2δ, where δ is the sun declination angle approximated as

δ = −23.45° · cos((360°/365) · (Nd + 10)). (13)

In (13), Nd is the number of days counted from January 1 to the current date. With this cone model, the sunlight direction over time can be parametrically represented by

D(t) = −{sin(δ) n + cos(δ)[cos(ωs(t − tθ)) u + sin(ωs(t − tθ)) s]} (14)

where u is a unit reference vector on the solar plane at time tθ, n is the normal vector of the solar plane, and s = n × u.

Moreover, we assume the scene surfaces are mainly Lambertian surfaces. Hence, the intensity value reflected from a surface is proportional to the cosine of the angle between the incident light and the surface normal. The intensity value at an image pixel will climb to its maximum when the subtended angle between the corresponding surface normal vector and the sunlight direction reaches its minimum. As explained in Appendix I, if P is the normal vector of a surface patch in the 3-D scene, the intensity value at the corresponding image pixel can be approximated as follows:

Isun(m, n, t) = B(m, n) cos(ωs t − θp(m, n)) + C(m, n) (15)

which is a scaled cosine function plus a constant offset. Moreover, if θ represents the angle subtended by u and the projection of P on the solar plane, the phase shift θp of the cosine function is equal to θ up to a constant offset. In principle, if we pick three image pixels whose 3-D scene points lie on different surfaces with linearly independent normal vectors, we can deduce the geometric relationship between the solar plane and these three surface normal vectors [27]. For the detailed deduction, please refer to Appendix I.
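Equations (13) and (14) can be evaluated directly. The sketch below computes the declination for a given day and a sunlight direction on the cone, assuming an orthonormal frame (u, s, n) and illustrative values for ωs and tθ; it is a sketch of the geometric model, not the authors' implementation.

```python
import math

def declination_deg(day_of_year):
    # Sun declination angle from (13); day_of_year counts from January 1.
    return -23.45 * math.cos(math.radians((360.0 / 365.0) * (day_of_year + 10)))

def sunlight_direction(t, delta_deg, omega_s, t0, u, n):
    # Cone model (14). u: unit reference vector on the solar plane at time
    # t0; n: unit normal of the solar plane; s = n x u completes the frame.
    d = math.radians(delta_deg)
    s = (n[1] * u[2] - n[2] * u[1],
         n[2] * u[0] - n[0] * u[2],
         n[0] * u[1] - n[1] * u[0])
    c, si = math.cos(omega_s * (t - t0)), math.sin(omega_s * (t - t0))
    return tuple(-(math.sin(d) * n[k] + math.cos(d) * (c * u[k] + si * s[k]))
                 for k in range(3))

# Illustrative frame and angular frequency (one revolution per 24 h).
omega_s = 2.0 * math.pi / 24.0
D = sunlight_direction(0.0, declination_deg(172), omega_s, 0.0,
                       (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(round(declination_deg(172), 2))  # near the June solstice: 23.45
```

With an orthonormal (u, s, n) frame, the direction in (14) stays a unit vector for every t, tracing out the conical surface of aperture π − 2δ described above.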

In Fig. 9(a), we show three manually selected image pixels in the parking lot scene, one from the driveway and two from the bushes. These image pixels are located on three mutually orthogonal planes. The intensity profile of a pixel in the green


APPENDIX III

Table of Variable Notations

Notation: Meaning

1. HL, IL, SL: Labeling layer, observation layer, and scene layer
2. M, N: Image dimensions
3. (m, n): Pixel coordinates
4. C, G: Car label and ground label
5. S, US: Shadowed label and unshadowed label
6. Ns: Number of parking spaces
7. hO(m, n): Object label at (m, n)
8. hL(m, n): Light label at (m, n)
9. l, w, h: Car length, car width, and car height
10. Xi, Yi: Projected position of the car center on the ground plane in the ith parking space
11. Ci(m, n): Expected car labeling map at (m, n) given the ith parking space being occupied
12. Gi(m, n): Expected ground labeling map at (m, n) given the ith parking space being occupied
13. Si(m, n): Expected shadow labeling map at (m, n) given the ith parking space being occupied
14. USi(m, n): Expected non-shadow labeling map at (m, n) given the ith parking space being occupied
15. ωs: Angular frequency of solar movement
16. u: A unit reference vector on the solar plane
17. n: Normal vector of the solar plane
18. ED: Classification energy
19. EA: Adjacency energy
20. Np: Neighborhood around (m, n)
21. IRGB: RGB color features of a pixel
22. IRGBN: Normalized IRGB
23. R: A 3 × 3 matrix depending on surface reflectance
24. i: A vector depending on illumination
25. gO,L: An intensity sample from the object type O under the illumination type L
26. aO1,O2,L: The intensity ratio between the objects O2 and O1 under the illumination type L
27. nO1,O2,L: A zero-mean Gaussian noise that expresses the uncertainty in modeling the intensity ratio
28. ḡO,L, σ̂²gO,L: Sample mean and variance of the intensity training samples

Acknowledgment

The authors would like to thank the Associate Editor C. N. Taylor and the anonymous reviewers for their comments.

References

[1] M. Y. I. Idris, Y. Y. Leng, E. M. Tamil, N. M. Noor, and Z. Razak, "Car park system: A review of smart parking system and its technology," Inform. Technol. J., vol. 8, no. 2, pp. 101–113, 2009.
[2] H. Schneiderman and T. Kanade, "Object detection using the statistics of parts," Int. J. Comput. Vision, vol. 56, no. 3, pp. 151–177, Feb. 2004.
[3] P. Viola and M. Jones, "Robust real-time face detection," Int. J. Comput. Vision, vol. 57, no. 2, pp. 137–154, May 2004.
[4] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part based models," IEEE Trans. Patt. Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010.
[5] L.-W. Tsai, J.-W. Hsieh, and K.-C. Fan, "Vehicle detection using normalized color and edge map," IEEE Trans. Image Process., vol. 16, no. 3, pp. 850–864, Mar. 2007.
[6] C. H. Lee, M. G. Wen, C. C. Han, and D. C. Kou, "An automatic monitoring approach for unsupervised parking lots in outdoors," in Proc. Int. Conf. Security Technol., 2005, pp. 271–274.
[7] I. Masaki, "Machine-vision systems for intelligent transportation systems," IEEE Intell. Syst., vol. 13, no. 6, pp. 24–31, Nov. 1998.
[8] T. Horprasert, D. Harwood, and L. S. Davis, "A statistical approach for real-time robust background subtraction and shadow detection," in Proc. IEEE Int. Conf. Comput. Vision, Sep. 1999, pp. 1–19.
[9] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Int. Conf. Comput. Vision Patt. Recog., vol. 2, Jun. 1999, pp. 246–252.
[10] P. Power and J. Schoonees, "Understanding background mixture model for foreground segmentation," in Proc. Image Vision Comput., 2002, pp. 267–271.
[11] S. Funck, N. Mohler, and W. Oertel, "Determining car-park occupancy from single images," in Proc. IEEE Intell. Vehicles Symp., Jun. 2004, pp. 325–328.
[12] C. C. Huang, S. J. Wang, Y. J. Chang, and T. Chen, "A Bayesian hierarchical detection framework for parking space detection," in Proc. IEEE Int. Conf. Acou., Speech Signal Process., Apr. 2008, pp. 2097–2100.
[13] D. B. L. Bong, K. C. Ting, and N. Rajaee, "Car-park occupancy information system," in Proc. 3rd Real-Time Technol. Applicat. Symp., Dec. 2006, pp. 65–70.
[14] D. B. L. Bong, K. C. Ting, and K. C. Lai, "Integrated approach in the design of car-park occupancy information system," IAENG Int. J. Comput. Sci., vol. 35, no. 1, pp. 7–14, 2008.
[15] K. Yamada and M. Mizuno, "A vehicle parking detection method using image segmentation," Electron. Commun., vol. 84, no. 10, pp. 25–34, Oct. 2001.
[16] C. H. Lee, M. G. Wen, C. C. Han, and D. C. Kuo, "An automatic monitoring approach for unsupervised parking lots in outdoor," in Proc. IEEE Int. Conf. Security Technol., Oct. 2005, pp. 271–274.
[17] T. Fabian, "An algorithm for parking lot occupation detection," in Proc. IEEE Comput. Inform. Syst. Indust. Manage. Applicat., Jun. 2008, pp. 165–170.
[18] R. J. López-Sastre, P. G. Jimenez, F. J. Acevedo, and S. M. Bascon, "Computer algebra algorithms applied to computer vision in a parking management system," in Proc. IEEE Int. Symp. Indust. Electron., Jun. 2007, pp. 1675–1680.
[19] N. Dan, "Parking management system and method," U.S. Patent 20030144890A1, Jul. 2003.
[20] Q. Wu, C. C. Huang, S. Y. Wang, W. C. Chiu, and T. H. Chen, "Robust parking space detection considering inter-space correlation," in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2007, pp. 659–662.
[21] J. K. Suhr, K. Bae, J. Kim, and H. G. Jung, "Free parking space detection using optical flow-based Euclidean 3-D reconstruction," in Proc. IAPR Conf. Mach. Vision Applicat., May 2007, pp. 563–566.
[22] W. Yu and T. Chen, "Parking space detection from video by augmenting training dataset," in Proc. IEEE Int. Conf. Image Process., Nov. 2009, pp. 849–852.
[23] Y. Boykov, O. Veksler, and R. Zabih, "Efficient approximate energy minimization via graph cuts," IEEE Trans. Patt. Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[24] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, no. 2, pp. 147–159, Feb. 2004.
[25] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Trans. Patt. Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, Sep. 2004.
[26] U.S. Naval Observatory. (2010). Naval Oceanography Portal [Online]. Available: http://www.usno.navy.mil/USNO
[27] K. Sunkavalli, F. Romeiro, W. Matusik, T. Zickler, and H. Pfister, "What do color changes reveal about an outdoor scene?" in Proc. IEEE Int. Conf. Comput. Vision Patt. Recog., Jun. 2008, pp. 1–8.
[28] Y. Tsin, R. Collins, V. Ramesh, and T. Kanade, "Bayesian color constancy for outdoor object recognition," in Proc. IEEE Conf. Comput. Vision Patt. Recog., Dec. 2001, pp. 1132–1139.
[29] Y. Boykov, O. Veksler, and R. Zabih, "Markov random fields with efficient approximations," in Proc. IEEE Int. Conf. Comput. Vision Patt. Recog., Jun. 1998, pp. 648–655.
[30] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227, Apr. 1979.
[31] C.-C. Huang. Huang's Projects [Online]. Available: http://140.113.238.220/~chingchun/projects.html
[32] G. D. Finlayson, M. S. Drew, and B. V. Funt, "Color constancy: Generalized diagonal transforms suffice," J. Optic. Soc. Am., vol. 11, no. 11, pp. 3011–3019, Nov. 1994.
[33] M. D'Zmura and G. Iverson, "Color constancy: Basic theory of two-stage linear recovery of spectral descriptions for lights and surfaces," J. Optic. Soc. Am., vol. 10, no. 10, pp. 2148–2165, 1993.

Ching-Chun Huang received the B.S., M.S., and Ph.D. degrees in electrical engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2000, 2002, and 2010, respectively.

His research interests are in image/video processing, computer vision, and computational photography.

Dr. Huang is a member of the IEEE Computer Society.

Sheng-Jyh Wang (M'95) received the B.S. degree in electronics engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1990 and 1995, respectively.

He is currently a Professor with the Department of Electronics Engineering, NCTU. His research interests are in the areas of image processing, video processing, and image analysis.
