Temporal Difference Image - Human Outline Extraction

Chapter 2 Human Silhouette Extraction…

2.1 Human Outline Extraction

2.1.2 Temporal Difference Image

We extract motion information by forming a temporal difference image from successive frames of a video stream. Given the current image I_t, the previous image I_{t-1} and the next image I_{t+1}, the temporal difference image D(i, j) is given by

\[ D(i, j) = \left| I_t(i, j) - I_{t-1}(i, j) \right| + \left| I_{t+1}(i, j) - I_t(i, j) \right| \]

Fig. 2.3 shows an example of a temporal difference image. Fig. 2.3(d) is the absolute difference image between the current image (b) and the previous image (a), Fig. 2.3(e) is the absolute difference image between the current image (b) and the following image (c), and Fig. 2.3(f) is the sum of (d) and (e), which is called the temporal difference image. Each difference image is displayed in 8-bit grey levels, where the maximum value, 255, is shown in the lightest color. Intensities larger than 255 are also shown in the lightest color.
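The construction of Fig. 2.3(d)-(f) can be sketched in a few lines of NumPy. This is a minimal sketch, assuming single-channel 8-bit frames of equal size; the function names `temporal_difference` and `to_display` are hypothetical and are not taken from the original work. The example also shows why the moving region is wider in the difference image than in any single frame.

```python
import numpy as np

def temporal_difference(prev, curr, nxt):
    """Sum of the two absolute frame differences around the current frame."""
    # Use a wider signed type so the subtraction cannot wrap around
    # and the sum is allowed to exceed 255.
    p = prev.astype(np.int32)
    c = curr.astype(np.int32)
    n = nxt.astype(np.int32)
    d_prev = np.abs(c - p)  # current vs. previous, as in Fig. 2.3(d)
    d_next = np.abs(n - c)  # next vs. current, as in Fig. 2.3(e)
    return d_prev + d_next  # their sum, as in Fig. 2.3(f)

def to_display(diff):
    """Map a difference image to 8-bit grey levels, saturating at 255."""
    return np.clip(diff, 0, 255).astype(np.uint8)

# Synthetic 1x4 frames: one bright pixel moves one step to the right per frame.
prev = np.zeros((1, 4), dtype=np.uint8)
curr = np.zeros((1, 4), dtype=np.uint8)
nxt = np.zeros((1, 4), dtype=np.uint8)
prev[0, 0] = 200
curr[0, 1] = 200
nxt[0, 2] = 200

D = temporal_difference(prev, curr, nxt)
print(D)              # [[200 400 200   0]]
print(to_display(D))  # [[200 255 200   0]]
```

The object occupies a single pixel in each frame, yet the difference image responds at three positions, and the value 400 at the current position is displayed as 255.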

In the difference image, large values occur at locations occupied by the foreground in one image but by the background in the other. Therefore, the human silhouette in the difference image is expanded.

Intuitively, stationary regions are eliminated by the subtraction and only regions that have moved appear in the difference image. In practice, however, the temporal difference image often contains extraneous information caused by changes in illumination and by noise (see Fig. 2.3(f), where there is considerable salt-and-pepper noise in the background). Hence, stationary regions frequently survive the differencing process.

Fig. 2.3. (a)-(c) Three successive frames (previous, current and next) of Daria performing the “running” action, (d) absolute difference image between the current and previous images, (e) absolute difference image between the current and next images, (f) the resulting temporal difference image, obtained by summing (d) and (e).

An adaptive thresholding technique was developed to cope with this problem by analyzing the shape of the histogram of the temporal difference image (i.e. frequency of occurrence versus intensity) [8]. It is assumed that (1) the area of the stationary regions is larger than or equal to the area covered by the regions in motion and (2) the pixels within all the stationary regions undergo approximately the same intensity change with small variation.

Consequently, the pixels from the stationary regions are grouped under a few peaks in the histogram with large area while the pixels within the moving regions are grouped under a number of peaks with relatively small area.

Fig. 2.4(a) illustrates an example with peaks V and W corresponding to the stationary regions and peaks X, Y and Z corresponding to the moving regions. The areas under the peaks are 20, 60, 7, 7 and 6% of the total area, respectively. Fig. 2.4(b) shows the accumulated area from valley point a to valley point f in Fig. 2.4(a). Note that the curve is plotted against the valley points, which are spaced at equal intervals, so the area between two consecutive valley points in Fig. 2.4(a) equals the slope of the line between the two corresponding points in Fig. 2.4(b). The change in slope at a valley point therefore indicates how much the contribution of the next peak differs from that of the previous one. Because of assumptions (1) and (2), the separation between the stationary regions and the moving regions occurs at the valley point with the largest slope change. This valley point is then chosen as the threshold value.
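The arithmetic behind this example can be checked with a short script. It is only a sketch under two assumptions that the text does not state explicitly: the accumulated area starts at 0 at valley point a, and the "largest slope change" is measured in absolute value. The variable names are illustrative.

```python
# Areas under peaks V, W, X, Y and Z as percentages of the total (from the text).
peak_areas = [20, 60, 7, 7, 6]
valleys = ["a", "b", "c", "d", "e", "f"]  # six valley points bound five peaks

# Accumulated area at each valley point (the curve of Fig. 2.4(b)).
accumulated = [0]
for area in peak_areas:
    accumulated.append(accumulated[-1] + area)

# With equally spaced valley points, the slope of each segment is the peak area.
slopes = [accumulated[k + 1] - accumulated[k] for k in range(len(peak_areas))]

# Slope change at each interior valley point b..e.
slope_change = {
    valleys[k + 1]: slopes[k + 1] - slopes[k] for k in range(len(slopes) - 1)
}

# Assumption: the largest change is taken in absolute value.
threshold_valley = max(slope_change, key=lambda v: abs(slope_change[v]))

print(accumulated)       # [0, 20, 80, 87, 94, 100]
print(slope_change)      # {'b': 40, 'c': -53, 'd': 0, 'e': -1}
print(threshold_valley)  # c
```

Under these assumptions the selected valley point is c, which lies between peak W (stationary) and peak X (moving), in agreement with the separation described in the text.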

Fig. 2.4. (a) An example of a temporal difference image histogram, (b) accumulated area chart of (a) [8].


Fig. 2.5 shows a real example of a temporal difference image and the charts used to decide its threshold. Fig. 2.5(a) shows a frame of Daria performing the “running” action and Fig. 2.5(b) shows the temporal difference image of (a). We observe that (1) the background area is larger than or equal to the area covered by the regions in motion and (2) salt-and-pepper noise is spread over the background area, so the assumptions of the adaptive thresholding technique are satisfied in our case. Part of the histogram of Fig. 2.5(b) is shown in Fig. 2.5(d), with the valley points marked by red dots. There are a few peaks with large area and a number of peaks with relatively small area. Part of the accumulated area chart of Fig. 2.5(d) is shown in Fig. 2.5(e), where we mark the valley point with the largest slope change, which is taken as the separation between the background regions and the moving regions. The coordinates of this valley point, (v1, 95.13), mean that the threshold value is chosen to be v1 and that 95.13% of the area of the temporal difference image is labeled as background. Fig. 2.5(c) shows the temporal difference image after thresholding; most of the extraneous information due to illumination changes and noise is eliminated.
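The threshold selection described above can be sketched end to end on a synthetic difference image. This is a simplified illustration, not the procedure of [8]: it assumes that every strict local minimum of the raw histogram (plus the two end bins) counts as a valley point, applies no smoothing, and measures the slope change in absolute value. The helper names `find_valleys` and `adaptive_threshold` are hypothetical. A real histogram is noisy and would probably need smoothing before valley detection.

```python
import numpy as np

def find_valleys(hist):
    """Indices of local minima of the histogram, plus both end bins."""
    valleys = [0]
    for k in range(1, len(hist) - 1):
        if hist[k] <= hist[k - 1] and hist[k] < hist[k + 1]:
            valleys.append(k)
    valleys.append(len(hist) - 1)
    return valleys

def adaptive_threshold(diff):
    """Valley point with the largest change in area between neighbouring peaks."""
    hist = np.bincount(diff.ravel())
    valleys = find_valleys(hist)
    if len(valleys) < 3:
        raise ValueError("need at least one interior valley point")
    # Area between consecutive valley points; the last segment keeps the final bin.
    edges = valleys[:-1] + [len(hist)]
    areas = np.array(
        [hist[edges[k]:edges[k + 1]].sum() for k in range(len(edges) - 1)]
    )
    # Valley points are treated as equally spaced, so each area is a slope.
    slope_change = np.abs(np.diff(areas))
    return valleys[int(np.argmax(slope_change)) + 1]

# Synthetic difference image: 80% low-valued background, 20% moving pixels.
diff = np.concatenate([
    np.full(2000, 1), np.full(6000, 3),                        # stationary
    np.full(700, 100), np.full(700, 150), np.full(600, 200),   # moving
]).reshape(100, 100)

t = adaptive_threshold(diff)
mask = diff >= t                             # pixels kept as moving regions
background_pct = 100.0 * (diff < t).mean()   # share labeled as background
print(t, int(mask.sum()), background_pct)    # 99 2000 80.0
```

The synthetic peak areas reproduce the 20, 60, 7, 7 and 6% split of Fig. 2.4(a), so the selected valley separates the two large background peaks from the three small motion peaks. The printed background percentage plays the same role as the 95.13 reported for Fig. 2.5(e).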

All movements in the Weizmann human action database can be roughly divided into “whole body movement” and “partial movement”. When a person performs a “whole body movement” such as “running”, “walking”, “jumping-jack”, “jumping-in-place-on-two-legs”, “jumping-forward-on-one-leg”, “jumping-forward-on-two-legs” or “galloping-sideways”, every part of the body is displaced and a relatively complete human shape appears in the temporal difference image (see Fig. 2.6).

However, when a person performs a “partial movement”, only part of the body moves and the human shape in the temporal difference image is incomplete (see Fig. 2.7).

During the “waving-two-hands” or “waving-one-hand” movement, only the hands are displaced while the rest of the body stays still. During the “bending” movement, only the upper part of the body is displaced while the lower part stays still.

Source: Human Silhouette Extraction and Head Detection Based on Temporal Difference Images (pp. 24-28)