
Chapter 2 Drowsiness Detection and Warning System

2.2. System Operation

2.2.5. Reasoning and Decision

Having obtained the values of the facial parameters, we are ready to determine the drowsiness level of the driver from these parametric values. A fuzzy integral process, to be addressed in Section 3.5, is employed for this purpose. However, different parameters have different ranges of values. Before invoking the fuzzy integral process, we therefore transform the parameter ranges into a consistent one. The transfer functions below map parametric values into drowsiness degrees within the range [0, 1].

Percent eye closure over time: the transfer function is defined piecewise with breakpoints at 0 and 0.12.

Gaze degree (mapped into [0, 1]):

D(x) = 0,            if x ≤ 10,
D(x) = 0.05x − 0.5,  if 10 < x < 30,
D(x) = 1,            if x ≥ 30.
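For illustration, a minimal Python sketch of the gaze transfer function defined above (the function name and the standalone-script form are ours):

```python
def gaze_drowsiness_degree(gaze_angle_deg):
    """Map a gaze angle (in degrees) to a drowsiness degree in [0, 1]."""
    if gaze_angle_deg <= 10:
        return 0.0
    if gaze_angle_deg >= 30:
        return 1.0
    return 0.05 * gaze_angle_deg - 0.5   # linear ramp between 10 and 30 degrees

# Example: a 20-degree gaze deviation maps to a drowsiness degree of 0.5.
print(gaze_drowsiness_degree(20))
```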

The fuzzy reasoning step returns a number of integral values, each resulting from a hypothesized degree of drowsiness. In the decision step, the hypothesized drowsiness degree with the largest integral value is regarded as the drowsiness level of the driver.

The system can then take actions according to the determined drowsiness level of the driver, such as starting a ventilator, spreading fragrance, turning on the radio, offering relaxing tunes, or providing entertainment options. In high-drowsiness situations, the system may initiate navigation aids and alert others to the drowsiness of the driver. Currently, our system only emits beeps, the number of which is proportional to the detected level of drowsiness.

Chapter 3.

Implementations

In this chapter, the implementation details of the key techniques, including lighting compensation, color transform, the skin model, facial feature detection, face tracking, and fuzzy reasoning, are addressed.

3.1. Lighting Compensation

Unlike images taken under fixed light sources, the input images to our system are captured under unpredictable light sources. Both the brightness and the chromatic characteristics of our images can vary significantly from image to image. The lighting compensation process reduces the influence of these variations in ambient lighting conditions.

Consider a color image C(R, G, B), where R, G, and B are the three color components of the image. First of all, we estimate the brightness level of the image. To this end, we compute the grayscale version I of the image by I = (R + G + B)/3 and then compute its histogram h(·). Afterwards, we calculate the distribution tendency t of h(·). Different from the skewness of h(·), which measures the asymmetry of h(·) with respect to its mean value (with L denoting the number of gray levels), the distribution tendency of h(·) measures its asymmetry with respect to the domain median m (m = L/2; in this study, m = 127). To begin, we compute the second moment M2 and the third moment M3 of h(·) with respect to m; the distribution tendency t, derived from these moments, can be positive or negative. A positive t indicates a relatively bright image, whereas a negative t indicates a relatively dark image. Refer to Figure 16, in which the first row displays the input images and the second row depicts their histograms and the calculated histogram distribution tendencies.

Fig. 16. Examples for illustrating lighting compensation.

Having obtained the histogram distribution tendency t of an image, if t < −10, we determine a fraction p according to p = α·e^{−β(t+138)}, where α = 0.713 and β = 0.013. Then, for each color component of the image, we search for its pixels with the top p fraction of values and average those values. Let (aR, aG, aB) be the resulting averages of the (R, G, B) color components, respectively. Thereafter, for each image pixel with color values (r, g, b), we transfer the values into (r′, g′, b′) by

r′ = r · 255/aR,  g′ = g · 255/aG,  b′ = b · 255/aB.

The above equations raise the brightness of the image by rescaling the color values of image pixels.

In a similar vein, if t > 50, we determine a fraction p according to p = α·e^{−β(t+138)}, where α = 0.323 and β = 0.011. Then, for each color component of the image, we search for its pixels with the bottom p fraction of values and average those values. Let (aR, aG, aB) be the resulting averages of the (R, G, B) color components, respectively. Thereafter, for each image pixel with color values (r, g, b), we transfer the values into (r′, g′, b′) by a corresponding rescaling with respect to (aR, aG, aB). These equations decrease the brightness of the image by rescaling the color values of image pixels.

No operation is applied to the image if −10 < t < 50. Refer to Figure 16(c), where the lighting compensation results for the images of Figure 16(a) are exhibited. Note that the range of t values is between −138 and 138. The above lighting compensation process distorts the chromatic characteristics of the original image to some extent when its calculated t falls outside (−10, 50), because the compensation process adapts the brightness of an image by rescaling its color components separately.
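To make the brightening branch concrete, the following is a minimal sketch of the top-p averaging and 255/average rescaling described above; it assumes the fraction p has already been obtained from the tendency t, and the function name, NumPy usage, and clipping are ours.

```python
import numpy as np

def brighten(image_rgb, p):
    """Raise the brightness of a dark image by rescaling each color channel.

    image_rgb: H x W x 3 uint8 array; p: fraction (0 < p <= 1) of the brightest
    pixels of each channel whose average serves as the reference value.
    """
    out = np.empty_like(image_rgb)
    for c in range(3):
        channel = image_rgb[..., c].astype(np.float64)
        k = max(1, int(p * channel.size))                    # number of top-p pixels
        reference = np.sort(channel, axis=None)[-k:].mean()  # aR, aG, or aB
        out[..., c] = np.clip(channel * 255.0 / reference, 0, 255).astype(np.uint8)
    return out
```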

3.2. Skin Model and Color Transform

The skin model is defined in terms of three color spaces: RGB, YCrCb, and LÛX̂. The RGB color space is utilized because the input image is represented in terms of R, G, and B components and because skin color has larger R than G values. Therefore, we have the first constraint of the skin model, which states that a pixel is a potential skin pixel if its R value is larger than its G value. This constraint is obviously necessary but not sufficient, because there are non-skin pixels whose R values are also larger than their G values. The RGB color space merges the chrominance and luminance components in one space. However, the appearance of skin color can vary significantly under different illumination conditions, so the skin-tone cluster is only loosely distributed in the RGB space.

There are many color spaces with separated chrominance and luminance components. Among them, we prefer the YCrCb color space because of its perceptual uniformity [Poy96] and because of the low luma dependency and compactness of its skin-tone cluster [Hsu02]. The transformation from the RGB to the YCrCb color space is given by

Y  =  0.2990R + 0.5870G + 0.1140B,
Cb = −0.1687R − 0.3313G + 0.5000B + 128,
Cr =  0.5000R − 0.4187G − 0.0813B + 128,     (1)

where Y is the luminance component and Cr and Cb are the chrominance components. Chai and Ngan [Cha99] suggested the ranges Cr ∈ [133, 173] and Cb ∈ [77, 127] for skin color. However, the Cr range shrinks and shifts as the Y value becomes large or small.

Researchers have attempted to modify the YCrCb space so that the skin-tone cluster becomes luma-independent in the new space. These include the YCrCg space [Dio03], which was linearly transformed from the YCrCb space; the LÛX̂ space [Lie04], which was nonlinearly transformed from the YCrCb space; and the YC′rC′b space [Hsu02], which was obtained by piecewise linearly modifying the YCrCb space. In this study, we consider the LÛX̂ space, which is actually a simplified version of the LUX color space. Both the LUX and LÛX̂ color spaces were proposed by Lievin and Luthon [Lie04].

Starting with the LUX color space, the transformation from RGB into LUX is formulated as follows. The luminance component is

L = (R + 1)^0.3 (G + 1)^0.6 (B + 1)^0.1 − 1,

and U and X are the chrominance components, each computed from L together with the R and B components, respectively. Although the time complexities of the equations associated with U and X are reasonable, those equations involve L, whose own equation has a high computational complexity. Lievin and Luthon [Lie04] therefore suggested replacing L by G because of their close proximity for skin color. Accordingly, the chrominance components U and X are approximated by Û and X̂, respectively.

As mentioned earlier, skin color has larger R than G values. We are hence interested in only the equation

Û = M − M(G + 1) / (2(R + 1)),

with M = 256, and we empirically determine its range [0, 249] for skin color.

We now summarize the skin model as follows. A pixel belongs to the skin model if (1) its R value is larger than its G value, (2) its Cb value is between 77 and 127, and (3) its Û value is between 0 and 249. Therefore, during the color transform of an image, only for the pixels having larger R than G values do we compute the Cb and Û values, by Cb = −0.1687R − 0.3313G + 0.5B + 128 and Û = M − M(G + 1)/(2(R + 1)).
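A sketch of the three-part skin test summarized above; the vectorized NumPy form, the function name, and the assumption M = 256 are ours.

```python
import numpy as np

M = 256  # assumed dynamic range used in the U-hat approximation

def skin_mask(image_rgb):
    """Return a boolean mask of potential skin pixels for an H x W x 3 uint8 image."""
    r = image_rgb[..., 0].astype(np.float64)
    g = image_rgb[..., 1].astype(np.float64)
    b = image_rgb[..., 2].astype(np.float64)

    # Constraint (1): skin pixels have larger R than G values.
    mask = r > g

    # Constraint (2): Cb within [77, 127] (Equation (1)).
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0
    mask &= (cb >= 77) & (cb <= 127)

    # Constraint (3): U-hat within [0, 249] (G substituted for L in the LUX space).
    u_hat = M - M * (g + 1.0) / (2.0 * (r + 1.0))
    mask &= (u_hat >= 0) & (u_hat <= 249)
    return mask
```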

3.3. Facial Feature Detection

During facial feature detection, the eyes and mouth are located using a technique recently reported by Hsu et al. [Hsu02]. This section briefly reviews their technique.

Considering a color image I(R, G, B), the image is first transformed from the RGB space into the YCrCb space using Equation (1). Let I′(Y, Cr, Cb) denote the transformed image.

A. Eye Detection

At the beginning of eye detection, two maps, EyeMapC and EyeMapL, are constructed. Let (x, y) denote the location of any pixel. Following [Hsu02], EyeMapC is built from the chrominance components as

EyeMapC(x, y) = (1/3){Cb(x, y)^2 + (255 − Cr(x, y))^2 + Cb(x, y)/Cr(x, y)},

where each of the three terms is scaled within [0, 255]. The resultant EyeMapC is further enhanced by histogram equalization. Next, EyeMapL is constructed from the luminance component as

EyeMapL(x, y) = (Y(x, y) ⊕ gσ(x, y)) / (Y(x, y) ⊖ gσ(x, y) + 1),

where ⊕ and ⊖ are the morphological dilation and erosion operators, respectively, and gσ(x, y) is the structuring function (a hemispheric structuring element in [Hsu02]). Afterwards, the maps EyeMapC and EyeMapL are integrated by

EyeMap(x, y) = min{EyeMapC(x, y), EyeMapL(x, y)}.

Refer to the example shown in Figure 17, in which the input image, maps EyeMapC , EyeMapL and EyeMap are depicted in Figures (a), (b), (c) and (d), respectively. Note that eyes have been highlighted in the map EyeMap . We next locate the eyes in the map by thresholding (Figure 17(e)), connected component labeling, and size filtering (Figure 17(f)). In general, a number of eye candidates are detected. Figure 17(g) shows the located eye candidates.

(a) input image (b) EyeMapC (c) EyeMapL (d) EyeMap

(e) thresholding (f) size filtering (g) eye candidates Fig. 17. Example for illustrating eye detection.
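A sketch of the eye-map construction reviewed above; the OpenCV calls, the per-term rescaling, and the 7 × 7 elliptical structuring element standing in for gσ are our implementation choices.

```python
import cv2
import numpy as np

def eye_map(y, cr, cb):
    """Build EyeMap from the luma (y) and chroma (cr, cb) planes, all uint8 arrays."""
    cb_f, cr_f = cb.astype(np.float64), cr.astype(np.float64)

    def rescale(x):                                    # scale a term into [0, 255]
        return 255.0 * (x - x.min()) / (x.max() - x.min() + 1e-9)

    # EyeMapC: chroma map, then histogram equalization.
    eye_c = (rescale(cb_f ** 2) + rescale((255.0 - cr_f) ** 2) +
             rescale(cb_f / (cr_f + 1e-9))) / 3.0
    eye_c = cv2.equalizeHist(eye_c.astype(np.uint8))

    # EyeMapL: grayscale dilation over erosion of the luma plane.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))  # stand-in for g_sigma
    dil = cv2.dilate(y, kernel).astype(np.float64)
    ero = cv2.erode(y, kernel).astype(np.float64)
    eye_l = np.clip(255.0 * dil / (ero + 1.0), 0, 255).astype(np.uint8)

    # Integrate the two maps by a pixel-wise minimum.
    return np.minimum(eye_c, eye_l)
```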

B. Mouth Detection

In mouth detection, one map, MouthMap, is computed. Following [Hsu02], it is defined as

MouthMap(x, y) = Cr(x, y)^2 · (Cr(x, y)^2 − η · Cr(x, y)/Cb(x, y))^2,

where η is a weight computed from the averages of Cr^2 and of Cr/Cb over the face region detected earlier. Refer to the example shown in Figure 18, in which the input image and the associated map MouthMap are depicted in Figures (a) and (b), respectively.

The mouth has been emphasized in MouthMap. We locate the mouth in the map by thresholding (Figure 18(c)), connected component labeling, and size filtering (Figure 18(d)). In general, a number of mouth candidates are detected. Figure 18(e) shows the located mouth candidates.

(a) input image (b) MouthMap

(c) thresholding (d) size filtering (e) mouth candidates

Fig. 18. Example for illustrating mouth detection.
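A sketch of the MouthMap computation reviewed above; the chroma normalization, the 0.95 weighting of η (taken from [Hsu02]), and the function name are our implementation choices.

```python
import numpy as np

def mouth_map(cr, cb, face_mask):
    """Compute MouthMap over a face region.

    cr, cb: uint8 chroma planes; face_mask: boolean array marking the face region.
    """
    cr_f = cr.astype(np.float64) / 255.0
    cb_f = cb.astype(np.float64) / 255.0 + 1e-9

    cr2 = cr_f ** 2
    ratio = cr_f / cb_f
    eta = 0.95 * cr2[face_mask].mean() / ratio[face_mask].mean()  # balances the two terms

    m = cr2 * (cr2 - eta * ratio) ** 2
    return (255.0 * m / (m.max() + 1e-9)).astype(np.uint8)        # normalized for display
```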

3.4. Face Tracking

Many techniques [Cap07] are feasible for moving-object tracking, such as Bayesian dynamic models, bootstrap filters, mean shift, particle filters, and sequential Monte Carlo methods. In this study, a linear Kalman filter [Wel95] is employed because of its efficiency and because of the short period of system state change under consideration. Note that the system under our consideration is the driver's face, which is to be tracked over a video sequence. The time interval of a face state change is the period between two successive images. A face state change within such a small interval is assumed to be linear. In the following, we briefly review the linear Kalman filter and then address how to track the located driver's face over a video sequence using the filter.

A. Linear Kalman Filter

There are two models involved in the linear Kalman filter: a system model and a measurement model, both characterized by linear stochastic difference equations. The system model is governed by s_{t+1} = A s_t + w_t, in which s_t is the system state vector at time t, A is the state transition matrix, and w_t represents the system perturbation, assumed to follow a normal probability distribution p(w) = N(0, Q), where 0 is the zero vector and Q is the covariance matrix of the system perturbation. The measurement model is formulated as z_t = H s_t + v_t, in which z_t is the measurement vector at time t, H relates the system state s_t to the measurement z_t, and v_t represents the measurement noise, also assumed to follow a normal probability distribution p(v) = N(0, R), where R is the covariance matrix of the measurement noise. Let ŝ_t denote the estimate of the system state at time t and e_t = s_t − ŝ_t the estimation error. It is desirable that the a posteriori error covariance C_t = E[e_t e_t^T] be minimized, where E[·] denotes expectation and (·)^T indicates transposition.

The Kalman filter proceeds as follows. There are two phases constituting the Kalman filter: the prediction phase and the updating phase. In the prediction phase, the a priori state estimate ŝ_{t+1}^− = A ŝ_t + w_t and the a priori error covariance C_{t+1}^− = A C_t A^T + Q are computed. In the updating phase, the Kalman gain K_{t+1} = C_{t+1}^− H^T (H C_{t+1}^− H^T + R)^{−1} is computed and used to incorporate the measurement z_{t+1}, yielding the a posteriori estimate ŝ_{t+1} = ŝ_{t+1}^− + K_{t+1}(z_{t+1} − H ŝ_{t+1}^−) and the a posteriori error covariance C_{t+1} = (I − K_{t+1} H) C_{t+1}^−. In the above process, the matrices A and H are defined according to the practical application at hand, and the covariance matrices Q and R are empirically determined. Given the initial values of ŝ_0, C_0 and z_0, the above two phases repeat until a stopping criterion is reached, and the result is ŝ_t.
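A compact sketch of one prediction-updating cycle of the linear Kalman filter described above; the sampled perturbation term w_t is omitted from the prediction for simplicity, and the plain-NumPy formulation is ours.

```python
import numpy as np

def kalman_step(s_hat, C, z, A, H, Q, R):
    """One prediction + updating cycle of the linear Kalman filter.

    s_hat: current state estimate; C: its error covariance; z: new measurement.
    Returns the updated state estimate and error covariance.
    """
    # Prediction phase: a priori state estimate and error covariance.
    s_pred = A @ s_hat
    C_pred = A @ C @ A.T + Q

    # Updating phase: Kalman gain, then corrected estimate and covariance.
    K = C_pred @ H.T @ np.linalg.inv(H @ C_pred @ H.T + R)
    s_new = s_pred + K @ (z - H @ s_pred)
    C_new = (np.eye(C.shape[0]) - K @ H) @ C_pred
    return s_new, C_new
```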

B. Tracking

As mentioned, the system under consideration is the located driver's face, which is described in terms of a triangle connecting the two eyes and the mouth of the driver. We hence define the measurement vector z of a face as z = (x_l, y_l, x_r, y_r, x_m, y_m)^T, where (x_l, y_l), (x_r, y_r) and (x_m, y_m) are the locations of the left eye, right eye and mouth, respectively. The state vector s of the face further includes the velocities of the facial features, (u_l, v_l), (u_r, v_r) and (u_m, v_m), i.e.,

s = (x_l, y_l, x_r, y_r, x_m, y_m, u_l, v_l, u_r, v_r, u_m, v_m)^T.

The state transition matrix A stems from the simplified motion equations x_{t+1} = x_t + u_t and u_{t+1} = u_t over one frame interval; in block form, A = [[I_6, I_6], [0_6, I_6]], where I_6 and 0_6 are the 6 × 6 identity and zero matrices. Since the measurement vector z is a sub-vector of the state vector s, the measurement-state relating matrix H is the 6 × 12 matrix H = [I_6 0_6], whose rows (the first being 1 0 0 0 0 0 0 0 0 0 0 0) simply pick out the position components of the state vector.

Since the matrices A and H stem from these simplified motion equations, errors are inevitably introduced into the predicted system state and measurement through the matrices. These errors, among others, are assumed to be compensated for by the system perturbation term w ~ N(0, Q) and the measurement noise term v ~ N(0, R). Empirically, we observed that the location and velocity errors are about four and two pixels, respectively. Accordingly, we define Q and R as the diagonal matrices

Q = diag(16, 16, 16, 16, 16, 16, 4, 4, 4, 4, 4, 4),  R = diag(16, 16, 16, 16, 16, 16),

i.e., variances of 4^2 for the position components and 2^2 for the velocity components. Note that we fix the above two matrices during the iterations because the same motion equations are assumed in each iteration.
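A sketch constructing the face-tracking matrices in the block form reconstructed above; the NumPy layout is ours.

```python
import numpy as np

I6, Z6 = np.eye(6), np.zeros((6, 6))

# Constant-velocity transition over one frame: positions advance by velocities.
A = np.block([[I6, I6],
              [Z6, I6]])                       # 12 x 12 state transition matrix

H = np.hstack([I6, Z6])                        # 6 x 12: measurements are the positions

Q = np.diag([16.0] * 6 + [4.0] * 6)            # 4-pixel position, 2-pixel velocity errors
R = np.diag([16.0] * 6)                        # 4-pixel measurement errors
```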

We next determine the initial values of w_0, C_0, ŝ_0 and z_0. The system perturbations (w_t, t ≥ 0) are random vectors generated from N(0, Q). The initial a posteriori error covariance matrix C_0 is updated over the iterations, so a precise C_0 is not necessary. Empirically, a 10-pixel positional error and a 5-pixel speed error have been assumed, giving C_0 = diag(100, 100, 100, 100, 100, 100, 25, 25, 25, 25, 25, 25).

To determine the initial face state vector

ŝ_0 = (x_l^0, y_l^0, x_r^0, y_r^0, x_m^0, y_m^0, u_l^0, v_l^0, u_r^0, v_r^0, u_m^0, v_m^0)^T,

recall that a face candidate is determined to be the actual face only when it is repeatedly detected in two successive images. Let ((x_l^0, y_l^0), (x_r^0, y_r^0), (x_m^0, y_m^0)) and ((x_l^1, y_l^1), (x_r^1, y_r^1), (x_m^1, y_m^1)) be the locations of the left eye, right eye and mouth at times t_0 and t_1, respectively. Based on the above values, the components of the initial state vector ŝ_0 are given as x_i^0 = (x_i^0 + x_i^1)/2, y_i^0 = (y_i^0 + y_i^1)/2, u_i^0 = x_i^1 − x_i^0, and v_i^0 = y_i^1 − y_i^0, for i ∈ {l, r, m}.
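A small sketch of the state initialization from two successive detections; the input layout (three (x, y) pairs per detection, in the order left eye, right eye, mouth) and the function name are ours.

```python
import numpy as np

def initial_state(detection_t0, detection_t1):
    """Build the initial 12-dimensional face state from two successive detections.

    Each detection is a list of three (x, y) tuples: left eye, right eye, mouth.
    """
    p0 = np.asarray(detection_t0, dtype=np.float64)   # 3 x 2 positions at time t0
    p1 = np.asarray(detection_t1, dtype=np.float64)   # 3 x 2 positions at time t1
    positions = (p0 + p1) / 2.0                       # averaged initial positions
    velocities = p1 - p0                              # per-frame displacements
    # Row-major ravel gives (x_l, y_l, x_r, y_r, x_m, y_m) ordering for each half.
    return np.concatenate([positions.ravel(), velocities.ravel()])

s0 = initial_state([(100, 120), (140, 118), (120, 170)],
                   [(102, 121), (142, 119), (121, 171)])
```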

Recall that the measurement vector z consists of the positions of the facial features, i.e., z = (x_l, y_l, x_r, y_r, x_m, y_m)^T. Considering a facial feature, its location (x_{t−1}, y_{t−1}) and velocity (u_{t−1}, v_{t−1}) at time t − 1 are known. To attain the location (x_t, y_t) of the feature at time t, we decide a rectangular searching space S in image I_t. Let (x_ul, y_ul) and (x_lr, y_lr) denote the coordinates of the upper-left and lower-right corners of S, respectively. The two corners are determined as

(x_ul, y_ul) = (x_{t−1} − l_ee/2 + l_ee·u_{t−1}/10,  y_{t−1} − l_em/a + l_em·v_{t−1}/10),
(x_lr, y_lr) = (x_{t−1} + l_ee/2 + l_ee·u_{t−1}/10,  y_{t−1} + l_em/a + l_em·v_{t−1}/10),

where l_ee is the length between the two eyes, l_em is the length between the center of the two eyes and the mouth, and a is a positive constant (4 for eyes and 3 for the mouth). Both l_ee and l_em are calculated at the beginning of the system operation.
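A sketch of the search-window computation; the corner signs follow the reconstruction above, and the function signature is ours.

```python
def search_window(x_prev, y_prev, u_prev, v_prev, l_ee, l_em, a):
    """Return the (upper-left, lower-right) corners of the rectangular search area.

    a is 4 for the eyes and 3 for the mouth, as stated in the text.
    """
    dx = l_ee * u_prev / 10.0            # velocity-dependent horizontal shift
    dy = l_em * v_prev / 10.0            # velocity-dependent vertical shift
    x_ul, y_ul = x_prev - l_ee / 2.0 + dx, y_prev - l_em / a + dy
    x_lr, y_lr = x_prev + l_ee / 2.0 + dx, y_prev + l_em / a + dy
    return (x_ul, y_ul), (x_lr, y_lr)
```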

Having determined the searching area of a facial feature, we look for the feature within the area by matching edge magnitudes, so as to reduce the effect of illumination variation. The edge magnitude of the facial feature is computed within a window (l_ee/1.5 × l_em/3.5 for eyes and l_ee × l_em/2.5 for the mouth) centered at (x_{t−1}, y_{t−1}) in image I_{t−1}, and the edge magnitude of the searching area is computed in I_t. During the matching of edge magnitudes between the feature and the searching area, the right three-fourths of the searching area is examined for the left eye, the left three-fourths of the searching area is examined for the right eye, and the entire searching area is examined for the mouth. The above three-fourths strategy for the eyes avoids incomplete eye shapes when the driver's head turns substantially (see the example shown in Figure 19).

Fig. 19. Incomplete shapes of eyes when the driver’s head has a large turning.

3.5. Fuzzy Reasoning

Given the parameter values of the driver's facial expression, a fuzzy integral technique is employed to deduce the drowsiness level of the driver. In this section, the fuzzy integral technique is discussed first, and fuzzy reasoning based on the technique is then addressed.

A. Fuzzy Integral

Fuzzy integrals [Zim91] are generalizations of the Lebesgue [Sug77] or Riemann integral [Dub82]. In this study, the Sugeno fuzzy integral, extended from the Lebesgue integral, is considered. Let f: S → [0, 1] be a function defined on a finite set S and g: P(S) → [0, 1] be a set function defined over the power set of S. Function g(·), often referred to as a fuzzy measure function, satisfies the axioms of boundary conditions, monotonicity, and continuity [Wan92]. Sugeno further imposed on g(·) an additional property: for all A, B ⊂ S with A ∩ B = ∅,

g(A ∪ B) = g(A) + g(B) + λ g(A) g(B),  λ > −1.  (2)

The fuzzy integral of f(·) with respect to g(·) is then defined as

∫_S f(s) ∘ g(·) = sup_{α ∈ [0, 1]} [α ∧ g(A_α)],  (3)

where ∧ represents the fuzzy intersection (i.e., the minimum) and A_α = {s ∈ S | f(s) ≥ α}.

The above fuzzy integral provides an elegant nonlinear numeric approach suitable for integrating multiple sources of information or evidence to arrive at a value that indicates the degree of support for a particular hypothesis or decision. Suppose we have several hypotheses, H = {h_i, i = 1, ..., n}, from which a final decision d is to be made. Let e_{h_i} be the integral value evaluated for hypothesis h_i. We then determine the final decision by

d = arg max_{h_i ∈ H} e_{h_i}.

Considering any hypothesis h ∈ H, let S be the set collecting all the information sources at hand. Function f(·), receiving an information source s, returns a value f(s) that reveals the level of support of s for the hypothesis h. Since the degrees of worth of the information sources may differ, function g(·) takes as input a subset of information sources and gives a value that reflects the degree of worth of that subset relative to the other sources. Let d(s) = g({s}). Function d(·) is referred to as the fuzzy density function, and by Equation (2) the measure g of any subset of S can be built up from the densities of its elements once λ is known. Since g(S) = 1, the value of λ can be determined by solving

λ + 1 = ∏_{s ∈ S} (1 + λ d(s)).  (4)

Moreover, if the elements of S are sorted so that f(s′_1) ≥ f(s′_2) ≥ ⋅⋅⋅ ≥ f(s′_{|S|}), the fuzzy integral can be evaluated as

e = max_{i = 1, ..., |S|} [f(s′_i) ∧ g(S′_i)],  where S′_i = {s′_1, ..., s′_i},  (5)

which reduces the number of subsets required to perform the fuzzy integral from 2^|S| (by Equation (3)) to |S|.

B. Reasoning

Recall that eight facial parameters are considered for drowsiness analysis, i.e., the percentage of eye closure over time, eye blinking frequency, eye closure duration, head orientation (including the tilt, pan, and rotation angles), mouth opening duration, and degree of gaze. Let D = {d_1, d_2, ..., d_8} denote the relative degrees of importance of the parameters. Three criteria, worth, accuracy, and reliability, are involved in determining the importance degrees of the parameters. The first criterion is somewhat intuitive, whereas the other two are figured out from the experiments to be discussed in Chapter 4. Accordingly, we define D as D = {0.93, 0.8, 0.85, 0.5, 0.3, 0.3, 0.5, 0.9}. Let V = {v_1, v_2, ..., v_8} be the measured values of the eight parameters, respectively. We transfer V, according to the predefined transfer functions of the parameters, into S = {s_1, s_2, ..., s_8}, where s_i indicates the degree of drowsiness corresponding to the parametric value v_i. The set S here forms what we call the collection of information sources.

Based on the sets D and S, we want to determine, using the fuzzy integral method, the drowsiness level l of the driver, l ∈ H = {m, m + 0.1, m + 0.2, ..., M}, where H is the hypothesis set, in which m and M are determined as m = ⌊10·min_i{s_i}⌋/10 and M = ⌈10·max_i{s_i}⌉/10. Using the importance degrees in D as the fuzzy densities, we determine λ and the measures g(S_i), S_i ⊆ S, of the subsets of information sources by Equation (4). Afterwards, for each hypothesis h_i ∈ H, we perform the fuzzy integral process. First of all, we calculate the support value f_i(s_j) of each information source s_j ∈ S by f_i(s_j) = 1 − |s_j − h_i|. We next sort the information sources according to their support values. Let S′ = {s′_1, s′_2, ..., s′_8} be the sorted version of S such that f_i(s′_1) ≥ f_i(s′_2) ≥ ⋅⋅⋅ ≥ f_i(s′_8). Substituting f_i(s′_i) and g(S′_i) into Equation (5), we obtain the fuzzy integral value e_i of hypothesis h_i. The above process repeats for each hypothesis in H. Finally, the drowsiness level l of the driver is determined as

l = h* = arg max_{h_i ∈ H} e_i.
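A self-contained sketch of the reasoning step, combining Equation (4) (solved here by bisection, our implementation choice) with the sorted max-min form of Equation (5); the importance degrees D are taken from the text, while the helper names and the example readings are ours.

```python
import numpy as np

D = [0.93, 0.8, 0.85, 0.5, 0.3, 0.3, 0.5, 0.9]   # importance degrees of the 8 parameters

def _solve_lambda(d, iters=200):
    """Solve prod(1 + lam*d_i) = lam + 1 for the nontrivial root lam > -1 (Equation (4))."""
    f = lambda lam: np.prod([1.0 + lam * x for x in d]) - (lam + 1.0)
    # If sum(d) > 1 the root lies in (-1, 0); otherwise it lies in (0, +inf).
    lo, hi = ((-1.0 + 1e-12, -1e-12) if sum(d) > 1.0 else (1e-12, 1e6))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if f(lo) * f(mid) <= 0 else (mid, hi)
    return 0.5 * (lo + hi)

LAM = _solve_lambda(D)   # lambda is fixed because D is fixed

def drowsiness_level(s):
    """s: the 8 transferred parameter values (drowsiness degrees in [0, 1])."""
    m = np.floor(10.0 * min(s)) / 10.0
    M = np.ceil(10.0 * max(s)) / 10.0
    best_h, best_e = m, -1.0
    for h in np.arange(m, M + 1e-9, 0.1):                 # hypothesis set H
        supports = [1.0 - abs(sj - h) for sj in s]        # f_i(s_j) = 1 - |s_j - h_i|
        order = np.argsort(supports)[::-1]                # sort sources by support
        g, e = 0.0, 0.0
        for idx in order:
            g = g + D[idx] + LAM * g * D[idx]             # grow g(S'_i) via Equation (2)
            e = max(e, min(supports[idx], g))             # Equation (5)
        if e > best_e:
            best_h, best_e = h, e
    return best_h                                          # l = argmax_h e_h

# Example: moderately drowsy readings mostly around 0.6-0.7.
print(drowsiness_level([0.7, 0.6, 0.65, 0.4, 0.3, 0.35, 0.5, 0.7]))
```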

Chapter 4.

Experimental Results

The proposed driver drowsiness detection system has been developed in Borland C++ and runs on an Intel Core Solo T1300 1.66 GHz PC under Windows XP Professional. The input video sequences are captured at a rate of 30 frames per second, and the size of the video images is 320 × 240 pixels.

We divide our experiments into two parts. The first part investigates the efficiency and accuracy of the individual steps of the system process; its results provide clues for assigning the degrees of importance of the facial parameters used in the fuzzy reasoning step. The second part exhibits the performance of the entire system.

4.1. Individual Steps

Recall that five major steps are involved in the system workflow: preprocessing, facial feature extraction, face tracking, parameter estimation, and reasoning. Of these five steps, the facial feature extraction and face tracking steps dominate the processing speed of the system, whereas the parameter estimation and reasoning steps determine its accuracy.

4.1.1. Facial Feature Extraction and Face Tracking

Refer to Figure 20, where the result of facial feature extraction and face tracking over a video sequence is exhibited. At the beginning, the system repeatedly performs facial feature extraction until the facial features are consistently located in two successive images. Thereafter, the system initiates the face tracking module. The tracking module continues until it fails to detect the right eye of the driver in frame 198 because of a rapid turn of the driver's head. The system immediately invokes the facial feature extraction module again. After successfully locating all the facial features in two successive images (i.e., frames 199 and 200), the face tracking module takes over and tracks the features over the subsequent images. Our current facial feature extraction module takes about 1/8 second to detect the facial features in an image, whereas the face tracking module takes about 1/25 second to locate the facial features in an image.


Fig. 20. Facial feature extraction and face tracking over a video sequence.

Figure 21 shows the robustness of the facial feature extraction module under different illumination conditions (e.g., glaring light, sunny day, cloudy day, twilight, underground passage, and tunnel), head orientations, facial expressions, and the wearing of glasses. The robustness of the facial feature extraction module is primarily due to the use of a face model, which helps to find the remaining features once one or two facial features have been detected. However, there are always uncertainties during facial feature extraction. We confirm a result only when it is repeatedly obtained in two successive images.
