
Chapter 4 Experimental Results

4.1 Individual Steps

4.1.1 Facial Feature Extraction and Face Tracking

Refer to Figure 20, which exhibits the experimental results of facial feature extraction and face tracking over a video sequence. At the beginning, the system repeatedly invokes the facial feature extraction module until all facial features are successfully located in two successive images.

Thereafter, the system initiates the face tracking module. Tracking continues until the module fails to detect the right eye of the driver in frame 198 because of a rapid turn of the driver’s head. The system immediately invokes the facial feature extraction module again. After all the facial features are successfully located in two successive images (i.e., frames 199 and 200), the face tracking module takes over again and tracks the features over the subsequent images. Our current facial feature extraction module takes about 1/8 second to detect the facial features in an image, whereas the face tracking module takes about 1/25 second to locate them.
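The alternation between the two modules can be viewed as a simple control loop: extract features until they are confirmed in two successive images, track afterwards, and fall back to extraction whenever tracking loses a feature. The following Python sketch illustrates this logic only; the helper functions extract_features and track_features are hypothetical placeholders, not the actual modules of the system.

def process_sequence(frames, extract_features, track_features):
    """Alternate between facial feature extraction and face tracking.

    extract_features(frame) -> features or None          (hypothetical helper)
    track_features(frame, previous) -> features or None  (hypothetical helper)
    """
    results = []          # features located in each frame (None if not found)
    candidate = None      # features found in the previous frame, awaiting confirmation
    tracking = False

    for frame in frames:
        feats = None
        if tracking:
            feats = track_features(frame, results[-1])
            if feats is None:        # e.g., an eye is lost after a rapid head turn
                tracking = False     # fall back to full feature extraction
                candidate = None
        if not tracking:
            feats = extract_features(frame)
            # Accept the result only after extraction succeeds in two
            # successive frames; tracking then takes over.
            if feats is not None and candidate is not None:
                tracking = True
            candidate = feats
        results.append(feats)
    return results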


Fig. 20. Facial feature extraction and face tracking over a video sequence.

Figure 21 shows the robustness of the facial feature extraction module under different illumination conditions (e.g., shiny light, sunny day, cloudy day, twilight, underground passage, and tunnel), head orientations, facial expressions, and the wearing of glasses. The robustness of the facial feature extraction module is primarily due to the use of a face model, which helps to locate the remaining features once one or two facial features have been detected. However, there are always uncertainties during facial feature extraction, so we confirm a result only when it is repeatedly obtained in two successive images.

(d) twilight, (e) underground passage, (f) tunnel, (g) head orientations, (h) facial expression, (i) glasses

Fig. 21. Robustness of facial feature extraction under different conditions.

4.1.2 Parameter Estimation

A. Eye Parameters

Figure 22 displays 18 images with closed eyes, which were manually extracted from a video clip of 300 frames. Our system misidentified the closed eyes in frames 102 and 113 as open eyes, while no open eye was misclassified as a closed eye. In this experiment, our system achieved an identification rate of 99.3% (= (300 - 2)/300). The d_e-, v_e-, and b_e-curves associated with the video clip are depicted in Figure 23. Based on these curves, the parameters of blinking frequency, percentage of eye closure over time (PERCLOS), and eye closure duration are calculated as 67.2 times/min, 0.07, and 0.12 seconds, respectively. Similar experiments were conducted on 20 different video clips; the identification rates ranged from 96.4% to 99.7%, with an average of 98.6%.
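To make the computation of these eye parameters concrete, the following Python sketch derives the blinking frequency, PERCLOS, and average eye-closure duration from a binary eye-closure curve such as the b_e-curve (1 denoting a closed eye). It is an illustration only; the frame rate and the function name are assumptions rather than values or interfaces taken from the system.

import numpy as np

def eye_parameters(b_e, fps=25.0):
    """Derive eye parameters from a binary closure curve b_e (1 = eye closed).

    Returns (blinking frequency in times/min, PERCLOS, mean closure duration in s).
    The frame rate fps is an assumed value, not one taken from the system.
    """
    b_e = np.asarray(b_e, dtype=int)
    n = len(b_e)

    # PERCLOS: fraction of frames in which the eyes are closed.
    perclos = b_e.sum() / n

    # Closure episodes are runs of consecutive 1s in the curve.
    padded = np.concatenate(([0], b_e, [0]))
    starts = np.flatnonzero(np.diff(padded) == 1)
    ends = np.flatnonzero(np.diff(padded) == -1)
    n_closures = len(starts)

    # Blinking frequency: closure episodes per minute of video.
    minutes = n / fps / 60.0
    blink_freq = n_closures / minutes if minutes > 0 else 0.0

    # Eye-closure duration: average length of a closure episode in seconds.
    closure_dur = float(np.mean((ends - starts) / fps)) if n_closures else 0.0

    return blink_freq, perclos, closure_dur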

Fig. 22. Images containing closed eyes in a video clip of 300 images.

B. Mouth Parameter

In the next experiment, regarding the estimation of mouth openness duration, we calculated the d_m-, v_m-, and b_m-curves of a mouth from a video clip of 300 frames. Figure 24 shows the calculated curves, which collectively reveal the occurrence of a mouth opening within the video clip. The system returned a duration of 1.95 seconds for the mouth opening. Figure 25 displays the video segment from frames 157 to 205 of the clip, from which the mouth openness duration was manually estimated to be about 2 seconds. Accordingly, our system achieved an accuracy rate of 97.5% (= 1.95/2) in estimating the mouth openness duration in this experiment. Similar accuracy rates were observed for other video clips. However, we still assign a relatively small degree of importance (0.5) to the mouth openness duration because mouth opening can also occur when the driver talks with passengers.
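A mouth-openness duration can be read off the binary b_m-curve in the same way. The short Python sketch below, again under an assumed frame rate, returns the duration of the longest opening episode; it illustrates the bookkeeping only and is not the system's implementation.

import numpy as np

def mouth_openness_duration(b_m, fps=25.0):
    """Duration, in seconds, of the longest mouth-opening episode in a binary
    b_m-curve (1 = mouth open). The frame rate fps is an assumed value."""
    padded = np.concatenate(([0], np.asarray(b_m, dtype=int), [0]))
    starts = np.flatnonzero(np.diff(padded) == 1)   # openings begin
    ends = np.flatnonzero(np.diff(padded) == -1)    # openings end
    if len(starts) == 0:
        return 0.0
    return float((ends - starts).max()) / fps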

Fig. 24. (a) d_m-curve, (b) v_m-curve, and (c) b_m-curve derived from a video clip of 300 images.

Fig. 25. The video segment (frames 157 to 205) containing an open mouth in a video clip of 300 images.

C. Head Parameters

Figure 26 shows some experimental results of head parameter estimation. In this figure, the first column displays example images of drivers and the second column shows the calculated values of their head parameters (including the pan, tilt, and rotation angles). Consider the first example image, in which the driver faces front. The system estimated the pan, tilt, and rotation angles of the driver’s head as 0.0, 7.8, and 4.4 degrees, respectively. These results visually agree with the orientation of the driver’s head in the image. In the second example image, the driver is looking out of the left window. The system returned pan, tilt, and rotation angles of 37.6, 0.0, and 9.4 degrees, respectively. In this example, the estimated pan and tilt angles are reasonable, while the estimated rotation angle seems a little large.

A similar situation was observed in the third example image, in which the driver is looking at the rearview mirror. The system estimated the pan, tilt, and rotation angles as -43.9, 0.0, and 6.8 degrees, respectively. Visually, the rotation angles of the drivers’ heads in the third and fourth example images should both be close to zero degrees. By contrast, the estimated rotation angle (18.3 degrees) for the driver’s head in the last example image seems too large.

Input image    Estimated values of head parameters (degrees)
Example 1      Pan: 0.0     Tilt: 7.8    Rotation: 4.4
Example 2      Pan: 37.6    Tilt: 0.0    Rotation: 9.4
Example 3      Pan: -43.9   Tilt: 0.0    Rotation: 6.8
Example 4      Pan: -3.4    Tilt: 0.0    Rotation: 18.3

Fig. 26. Experimental results of head parameter estimation.

Two major factors can lead to inaccuracy in head parameter estimation: imprecise localization of the facial features and the 3D-from-2D estimation of the parametric values. Empirically, the error ranges of the estimated pan, tilt, and rotation angles are about ±5°, ±5°, and ±10°, respectively. Accordingly, we give higher degrees of importance to the pan and tilt angles than to the rotation angle. However, the tilt and rotation angles are more decisive than the pan angle in determining the level of drowsiness. Based on the above observations, we assign degrees of importance of 0.5, 0.3, and 0.3 to the tilt, pan, and rotation angles, respectively.
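For illustration, a standard way to realize a 3D-from-2D estimation of this kind is to fit a generic 3D face model to the detected 2D feature locations with OpenCV's solvePnP and then convert the recovered rotation into pan, tilt, and rotation angles. The Python sketch below follows this standard approach rather than our actual implementation; the 3D model coordinates, the camera intrinsics, and the angle conventions are placeholder assumptions.

import cv2
import numpy as np

def head_angles(image_points, image_size):
    """Estimate (pan, tilt, rotation) in degrees from six 2D facial feature points.

    image_points: 2D locations of the nose tip, chin, outer eye corners, and
    mouth corners. The 3D model points and camera intrinsics are rough placeholders.
    """
    # Generic 3D face model (nose tip at the origin) -- placeholder coordinates.
    model_points = np.array([
        (0.0, 0.0, 0.0),           # nose tip
        (0.0, -330.0, -65.0),      # chin
        (-225.0, 170.0, -135.0),   # left eye outer corner
        (225.0, 170.0, -135.0),    # right eye outer corner
        (-150.0, -150.0, -125.0),  # left mouth corner
        (150.0, -150.0, -125.0),   # right mouth corner
    ])

    # Approximate pinhole camera from the image size; lens distortion ignored.
    w, h = image_size
    camera_matrix = np.array([[w, 0.0, w / 2.0],
                              [0.0, w, h / 2.0],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros((4, 1))

    ok, rvec, _tvec = cv2.solvePnP(model_points,
                                   np.asarray(image_points, dtype=float),
                                   camera_matrix, dist_coeffs)
    if not ok:
        return None

    # Convert the rotation vector to Euler angles:
    # pan (yaw), tilt (pitch), and rotation (roll) of the head.
    R, _ = cv2.Rodrigues(rvec)
    pan = np.degrees(np.arctan2(-R[2, 0], np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)))
    tilt = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    rotation = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return pan, tilt, rotation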

D. Gaze Parameter

Two video clips are utilized to illustrate the estimation of the gaze parameter. Recall that the gaze parameter is estimated based on the difference (d_g) between the predicted and observed horizontal displacements of the driver’s face provided by the Kalman filter during face tracking. Figure 27 displays the d_g-curves of the two video clips. Clearly, the d_g-curve of video clip 1 has larger distribution magnitudes, as well as a larger variation, than the d_g-curve of video clip 2. A d_g-curve with large distribution magnitudes signifies large movements of the driver’s head, and one with a large distribution variation indicates a high frequency of head movement; both reflect low degrees of gaze. The gaze degree of a video clip is defined as the ratio of the gaze duration occurring in the clip to the time interval of the clip. To calculate the gaze durations of the two video clips, their binary b_g-curves, derived from their d_g-curves, are depicted in Figure 28. The gaze degrees of the two video clips are easily evaluated from the b_g-curves as 0.36 and 0.74, respectively. These results reasonably reflect the gaze situations of the drivers in the two video clips.
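As a concrete illustration of this definition, the Python sketch below thresholds a d_g-curve into a binary b_g-curve (1 marking frames in which the predicted and observed displacements nearly agree, i.e., the head is steady) and takes the gaze degree as the fraction of such frames. The threshold value is an assumption chosen for illustration and does not come from the system.

import numpy as np

def gaze_degree(d_g, threshold=2.0):
    """Gaze degree of a clip: ratio of the gaze duration to the clip length.

    d_g: per-frame differences between the predicted and observed horizontal
         face displacements (from the Kalman filter during face tracking).
    threshold: assumed bound below which the head is regarded as steady.
    """
    d_g = np.abs(np.asarray(d_g, dtype=float))
    b_g = (d_g < threshold).astype(int)   # binary gaze curve (1 = gazing steadily)
    return float(b_g.mean())              # fraction of gazing frames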

Since the parameter of gaze degree plays a critical role in determining the drowsiness level, we assign a degree of importance of 0.9 to the gaze parameter.


Table 1 summarizes the performance of the system on five experimental video sequences. The video sequences were acquired under different illumination conditions: shiny day, sunny day, cloudy day, twilight, and tunnel. The time intervals of the video sequences are indicated in the second column of the table.
